ChatGPT and tagging in OpenStreetMap

Cartography in OSM is twofold: drawing and tagging. While the former only requires the ability to click with a mouse to draw a line on a screen, tagging a feature is more demanding. There are only two options for finding appropriate tags to describe a feature.

The first is to browse the wiki and find the feature’s page. Once there, you often need to dig through many other pages to find the accepted keys/values that would describe the detailed characteristics of the feature. All of this is very laborious, even for a seasoned contributor, particularly when mapping a feature for the first time.

The second option is to use search tools to find the tags you are looking for (keys or values). You can use a search engine (e.g., Google) by adding “OpenStreetMap Wiki” with your keywords to narrow down the results. You can also use OSM tools, such as TagInfo, TagFinder, OSMantic JOSM Plugin and some others. These tools work well if your keywords match existing keys or values, but it gets trickier if they don’t. However, for a few months now, a new tag search alternative has been offered: ChatGPT.

ChatGPT (Chat Generative Pre-trained Transformer) is a chatbot launched by OpenAI in November 2022 [1]. OpenAI is an artificial intelligence (AI) research and deployment company that gives free access to its chatbot during the “research preview” [2]. Once registered, a user can ask the chatbot any type of question. I decided to test the chatbot on its ability to find appropriate tags for OpenStreetMap features.

Since I’m currently working on tagging businesses, I tried a few. Here is a sequence of questions and answers from ChatGPT.

Q: What would be the tag use in OpenStreetMap for a restaurant serving pizza?

A: In OpenStreetMap, the tag for a restaurant serving pizza would be “amenity=restaurant” and “cuisine=pizza”.

The answer is good, but restaurants and pizza could easily be linked together. So, I decided to make it more difficult…

Q: What would be the tag use in OpenStreetMap for an establishment raising alpacas to make clothes?

A: In OpenStreetMap, the tag for an establishment raising alpacas to make clothes would be “landuse=farm” and “animal=alpaca”. Additionally, the “craft=textile_manufacturing” tag could be used to specify that the alpacas are being raised for textile production.

Wow, although craft=textile_manufacturing does not exist (yet), the value makes sense given the clue I gave the chatbot. Someone tagging the feature for the first time will produce elaborate tags right from the start!

However, the tagging is not always perfect. For instance, here is an example that did not work exactly as expected.

Q: what would be the tag use in OpenStreetMap for describing a building material dealer for doors and windows?

A: In OpenStreetMap, the tag for describing a building material dealer for doors and windows would be “shop=doors” and “shop=windows”. Additionally, the “building_materials=yes” tag could be used to specify that the dealer sells building materials in general.

Well, in this case, the proposed keys and values exist in OpenStreetMap, but the answer is not adequate. First, it suggests using “shop=doors” and “shop=windows”. Since a feature can carry only one value per key, it should at a minimum have suggested “shop=doors” or “shop=windows”, not both (although the latter is deprecated). Regarding building_materials=yes, building_materials is not a key, but the key=shop wiki page uses it in a section header. Either way, this could have directed a contributor to the appropriate wiki pages.
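
The “not both” problem can be caught mechanically, because the tags of a feature behave like a key/value map. Here is a minimal sketch, assuming the chatbot's answer is available as plain text; the regular expression and the function name `find_key_conflicts` are illustrative choices, not part of any OSM tool.

```python
import re
from collections import defaultdict

# Rough pattern for key=value pairs such as shop=doors (an assumption:
# real OSM values can contain characters this pattern does not cover).
TAG_PATTERN = re.compile(r"([a-z][a-z0-9_:]*)=([A-Za-z0-9_:;\-]+)")


def find_key_conflicts(answer: str) -> dict[str, list[str]]:
    """Return the keys that the answer proposes with more than one value."""
    values_by_key = defaultdict(list)
    for key, value in TAG_PATTERN.findall(answer):
        if value not in values_by_key[key]:
            values_by_key[key].append(value)
    return {key: values for key, values in values_by_key.items() if len(values) > 1}


answer = (
    "In OpenStreetMap, the tag for describing a building material dealer "
    "for doors and windows would be “shop=doors” and “shop=windows”. "
    "Additionally, the “building_materials=yes” tag could be used."
)
print(find_key_conflicts(answer))  # {'shop': ['doors', 'windows']}
```

A non-empty result only says that the contributor has to choose one value; it says nothing about whether either value is documented.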

In conclusion, ChatGPT (and potential future competitors such as Bard) can really make tagging in OpenStreetMap easier. Using natural language without prior knowledge of expected tags greatly simplifies finding appropriate keys or values. For the moment, the answers must still be checked with OSM tools, but they should improve over time.

[1] ChatGPT - Wikipedia

[2] ChatGPT: Optimizing Language Models for Dialogue

jfd553 wrote on February 8:

Cartography in OSM is twofold: drawing and tagging. While the former only requires the ability to click with a mouse to draw a line on a screen, tagging a feature is more demanding. There are only two options for finding appropriate tags to describe a feature.

“clicking with a mouse” actually determines where the nodes are, and this can easily make the difference between a very good map and an ugly one (which, despite the missing detail, could still be overnoded). It may not matter much at medium scales (or rather, the relevant threshold distance is at a different scale as well), but at detailed scales there are lots of things that can be done well or badly. It is a bit like saying that making a drawing only requires the ability to draw lines (or dots) on paper :slight_smile:

The second option is to use search tools to find the tags you are looking for (keys or values)…

Since I’m currently working on tagging businesses, I tried a few. Here is a sequence of questions and answers from ChatGPT.

Q: What would be the tag use in OpenStreetMap for a restaurant serving pizza?

A: In OpenStreetMap, the tag for a restaurant serving pizza would be “amenity=restaurant” and “cuisine=pizza”.

The answer is good, but restaurants and pizza could easily be linked together. So, I decided to make it more difficult…

also it misses the opportunity to state whether the oven is wood fired or electrical (or even something else). :wink:

Q: What would be the tag use in OpenStreetMap for an establishment raising alpacas to make clothes?

A: In OpenStreetMap, the tag for an establishment raising alpacas to make clothes would be “landuse=farm” and “animal=alpaca”. Additionally, the “craft=textile_manufacturing” tag could be used to specify that the alpacas are being raised for textile production.

Wow, although craft=textile_manufacturing does not exist (yet), the value makes sense given the clue I gave the chatbot.

indeed impressive

In conclusion, ChatGPT (and potential future competitors such as Bard) can really make tagging in OpenStreetMap easier. Using natural language without prior knowledge of expected tags greatly simplifies finding appropriate keys or values. For the moment, the answers must still be checked with OSM tools, but they should improve over time.

given that the OSM tagging scheme was invented entirely from scratch within OSM, it seems a given that it is protected by copyright. I wonder what this means for ChatGPT, which was trained on this information (naturally, this is a general question for all such systems and their relation to the intellectual property that is fed into them).

It’s interesting that a general model is able to answer questions on “niche” topics like OSM tagging with this accuracy.

The next phase for generative models is “custom-trained models”, where you can train a model on all of the OSM tagging documentation so that it produces even more accurate responses.

I’ve already seen some examples of this, such as a custom model trained to answer questions from the biomedical PubMed dataset that is already more accurate than a human.

Maybe there is an opportunity here for someone to train a model like this and allow integration in existing tools as a “tagging suggestion” feature? :slight_smile:

I have been trying ChatGPT for a few weeks, including for OSM, and I could still find a few errors. It’s a really cool tool, but it should always be used with caution.
I asked how to tag a signpost and the answer was man_made=guidepost. In fact, this combination does not have a single use. So here ChatGPT has improvised a bit :wink:
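
A usage check like this one can be scripted against Taginfo. The sketch below assumes the public Taginfo instance exposes a `tag/stats` endpoint under `/api/4/` and that its JSON response has a `data` list whose entry of type `all` carries a `count`; both the endpoint and the response shape should be confirmed against the Taginfo API documentation before relying on them. The function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the public Taginfo instance (check the API docs).
TAGINFO_TAG_STATS = "https://taginfo.openstreetmap.org/api/4/tag/stats"


def stats_url(key: str, value: str) -> str:
    """Build the request URL for one key=value combination."""
    return TAGINFO_TAG_STATS + "?" + urlencode({"key": key, "value": value})


def total_count(payload: dict) -> int:
    """Read the overall usage count from a decoded response.

    Assumes rows look like {"type": "all", "count": 123}; returns 0 when
    no such row is present.
    """
    for row in payload.get("data", []):
        if row.get("type") == "all":
            return int(row.get("count", 0))
    return 0


def fetch_usage(key: str, value: str, timeout: float = 10.0) -> int:
    """Query Taginfo over the network and return the usage count."""
    with urlopen(stats_url(key, value), timeout=timeout) as response:
        return total_count(json.load(response))
```

Calling `fetch_usage("man_made", "guidepost")` would then report how often the suggested combination is actually used. The server may expect a descriptive User-Agent header, which this sketch does not set.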

Copyright does not protect ideas, it only protects specific expressions of those ideas. When applied to OSM, that means OSM’s tagging scheme is completely unprotected by copyright, but the tag descriptions on wiki.openstreetmap.org are protected (and licensed under the CC-BY-SA 2.0 license).

Copyright does not protect ideas, it only protects specific expressions of those ideas. When applied to OSM, that means OSM’s tagging scheme is completely unprotected by copyright, but the tag descriptions on wiki.openstreetmap.org are protected (and licensed under the CC-BY-SA 2.0 license).

the tagging system is a specific expression, the descriptions and other texts are part of it

Things may be different in Italy, but both the United States and the UK have what’s called the “merger doctrine”: if an idea can only be expressed one way (such as is the case for OSM’s tagging system), then that expression cannot be copyrighted either.

if an idea can only be expressed one way (such as is the case for OSM’s tagging system), then that expression cannot be copyrighted either.

it’s a new language to describe the world in a machine readable way.

The examples sound like a bot, not like an AI. This hints at an idea. Currently, users tag whatever they want according to their own ideas. This results in countless variations in how something can be tagged. Everyone tags “individually”, with reference to the wiki. One idea would be for users to no longer enter tags themselves, but to answer questions or enter information from which the tags are created. That would bring uniformity, a standard.

Example (very simple, not real):
User input: Shop
Bot: what does it sell?
U: lemonade
B: seasonal?
U: no

[submit]
Result:
shop=lemonade
seasonal=no

Or starting from an existing tag:
U: icecream
B: additional or change?
U: change

[submit]
Result:
shop=icecream
seasonal=no
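
The two dialogues above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names are hypothetical, shop=lemonade and shop=icecream are the made-up values from the example, and treating “additional” as a semicolon-separated value is a guess at what the bot was meant to do.

```python
def tags_from_answers(feature: str, sells: str, seasonal: bool) -> dict[str, str]:
    """Turn the answers of the first dialogue into tags."""
    return {feature: sells, "seasonal": "yes" if seasonal else "no"}


def update_tags(tags: dict[str, str], key: str, new_value: str, mode: str) -> dict[str, str]:
    """Apply the 'additional or change?' answer to an existing tag set."""
    if mode not in ("change", "additional"):
        raise ValueError(f"unknown mode: {mode!r}")
    updated = dict(tags)  # leave the caller's tags untouched
    if mode == "change" or key not in updated:
        updated[key] = new_value
    else:
        # Assumption: 'additional' appends to a semicolon-separated value.
        values = updated[key].split(";")
        if new_value not in values:
            values.append(new_value)
        updated[key] = ";".join(values)
    return updated


tags = tags_from_answers("shop", "lemonade", seasonal=False)
print(tags)  # {'shop': 'lemonade', 'seasonal': 'no'}
print(update_tags(tags, "shop", "icecream", "change"))
# {'shop': 'icecream', 'seasonal': 'no'}
```

Keeping the questions separate from the tag-building functions means the same answers could be collected by a form, a bot, or an editor preset.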

While creating human-like suggestions is one of ChatGPT’s greatest strengths, it can also be its greatest weakness. As you’ve demonstrated, it’ll often give valid and correct responses, but in cases where it doesn’t ‘know’ the exact tag it will often make a guess and present it in the same way as recognised tagging like amenity=restaurant.

For this reason, I think it is still important to use the OSM wiki to check for correct tags. However, ChatGPT comes into its own when I’m unsure how to describe a feature, which is exactly when Google or OSM Wiki searches are least useful. I could use ChatGPT as a stepping stone to convert a human description into a common name.
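
To make that wiki check quicker, the pages to look up for a suggested tag can be generated directly. This is a small sketch, assuming the wiki's usual `Key:` and `Tag:` page naming; the helper name `wiki_urls` is made up for illustration.

```python
from urllib.parse import quote

WIKI_BASE = "https://wiki.openstreetmap.org/wiki/"


def wiki_urls(key: str, value: str) -> dict[str, str]:
    """Return the wiki pages where a suggested key=value would be documented."""
    return {
        "key": WIKI_BASE + "Key:" + quote(key, safe=""),
        # The '=' is percent-encoded so the link survives copy and paste.
        "tag": WIKI_BASE + "Tag:" + quote(f"{key}={value}", safe=""),
    }


for page, url in wiki_urls("craft", "textile_manufacturing").items():
    print(page, url)
```

If the `Tag:` page does not exist, that is a hint that the chatbot guessed the value rather than recalled it.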

Another useful feature of ChatGPT is its ability to provide a balanced summary of differing opinions - as an OSM example, some tagging is poorly-defined or controversial. I doubt that it could do this for such a niche topic as specific OSM tags, but as mentioned above, a similar AI trained on OSM Wiki discussions could be very powerful.

As a final note, the original post failed to mention a third option for discovering tagging: editor presets. As an iD user, this is the main method that I use, and it does a very good job of providing the tags I need and suggesting further details that can be added. Wiki research takes up some of my time, but it’s not as prevalent as the OP suggests.

It would be a great idea. I’ve been mapping since 2009 and still have trouble finding the proper tags when mapping a new type of feature. I recently started using TagFinder, but it still often offers dozens of alternatives.

Well, that’s both the strength and the weakness of OSM tagging folksonomy. Your proposal is a good idea (especially for frequently used tag values), but such a bot shouldn’t force contributors to only use a predefined set of values.

I do agree!

I also agree! I’m more of a JOSM user, but I do recall iD proposing such presets. Thanks.

See an example of what was done using the WordPress documentation as training material

Does anyone want to experiment with the OSM tagging docs? :sweat_smile:

If you’re suggesting that remixing portions of the OSM Wiki’s literal descriptions of various tags could raise copyright issues, at least that passes the sniff test and I would defer to a lawyer to figure that one out. There are probably a lot of legal professionals eyeing the field of generative AI at this point. But if you’re suggesting a copyright infringement in using the descriptions to choose existing tags or coin new, compatible tags, then I’d imagine that’s on shakier ground, at least in the U.S. where OpenAI is based.

To use an analogy, many constructed languages such as Lojban and Klingon are described in dictionaries and grammar references that are themselves copyrighted. However, you’d be hard-pressed to find a successful legal case against someone writing beautiful Klingon poetry or making a map of places named in Lojban, and not for lack of trying.

There was concern that a language could be subjected to copyright protection if Google were found in violation of Oracle’s copyright when they created a compatible API guide using the original Java API documentation. But the court sided with Google in that case, ruling that it was fair use, and that wasn’t even about using the language’s grammar and vocabulary to write something new.

(I’m not a lawyer; I just follow the law. :slightly_smiling_face:)

To use an analogy, many constructed languages such as Lojban and Klingon are described in dictionaries and grammar references that are themselves copyrighted. However, you’d be hard-pressed to find a successful legal case against someone writing beautiful Klingon poetry or making a map of places named in Lojban, and not for lack of trying.

what about taxonomies and ontologies, can they be covered by copyright?

That’s less clear. I wouldn’t dismiss the tagging scheme’s copyrightability out of hand, but at the same time, I wouldn’t take it for granted either. If you want an informed take on ChatGPT or want to start the ball rolling on legal action, you could contact the legal-talk list or the LWG directly. But I’ll share some additional food for thought here, because I love geeking out about copyright – apparently I’m not alone. :smiley:

For one thing, I wasn’t assuming that UK or European law applies. While the OSMF is based in the UK, OpenAI is based in California, and the wiki is published under an older Creative Commons license that was drafted based on U.S. copyright law. (Newer versions are based on international conventions.)

As far as the U.S. is concerned, not all copying infringes on copyright, and not every infringement violates copyright law. This 2010 paper about best practices for linked data projects provides an overview of relevant case law (though it may be outdated). Apparently some courts have held that taxonomies and ontologies can be copyrighted, because fashioning an ontology is a creative endeavor, but that using parts of an ontology does not infringe on its copyright. The authors conclude that the merger doctrine wouldn’t apply if a lot of consideration goes into organizing and describing the ontology.

Even if this ChatGPT output does infringe on OSM contributors’ copyright, it might qualify as fair use depending on the circumstances. We don’t talk about fair use very often when assessing contributions to OSM. After all, the OSM database needs to be usable beyond the U.S. and for commercial purposes, so as a matter of policy, we don’t accept contributions of copyrighted material under fair use. But this policy does nothing to stop someone from, say, reproducing the entire access key hierarchy to critique it for a school assignment. If the student ignores the attribution requirements, they may get reprimanded for plagiarism, but they wouldn’t be at risk legally.

None of this is to say that your conclusion was wrong, just that it isn’t obviously right. I think we can greatly simplify this discussion by concluding that ChatGPT could be just as good at performing plagiarism as it is at facilitating it. It already tends to insert boilerplate caveats and disclaimers into certain kinds of responses, so it isn’t unreasonable to expect attribution as part of that.

Very interesting. As is often the case, it boils down to “we don’t currently know for sure”.

by the way, Getty Images currently has a lawsuit with an AI company about images used in their product: