AI-Assisted Tagging in OpenStreetMap: A Case for Responsible Innovation and Copyright Compliance

Also be aware that “an import of information from Wikidata into OSM is generally not permitted because Wikidata has lower standards in terms of copyright than OSM. In addition, Wikidata contains some data from sources we are not allowed to use” (from: Wikidata - OpenStreetMap Wiki)

7 Likes

I’d agree with that, and it’s worth also mentioning that the whole point of linking to wikidata is that this sort of information can be retrieved by data consumers without storing all 7000-odd languages in the world in name:xx fields in OSM.

7 Likes

And the wikidata title of an object often contains a descriptive name to allow different articles for objects with the same name. Most common example being a town and its station.

I spotted this once when planning a route in London with OSMand set to Welsh: the map showed Gorsaf Baker Street.

The station name in English is Baker Street and the paid mapper had blindly copied the word for station into each language tag.

It sounds like the tags were extracted from the Wikidata item’s labels, which are somewhat generic. In principle, a label is just whatever it makes sense to display to end users in their language. The name could come from the original source, a third-party source, an on-the-spot translation, or even an automated transliteration. OpenMapTiles, Mapbox, and Google Maps all use Wikidata labels as fallbacks because they don’t really care about the source or nature of the name, as long as it’s usable as a map label.

On the other hand, if a data consumer needs to distinguish the various kinds of names, it should use statements like name (P2561), official name (P1448), and native label (P1705) where available, as well as any qualifiers such as Georgian national system of romanization (P2126). I just finished adding these statements to the item based on cursory research, so it appears the “AI” did something else to decide between common and official names.
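To make that distinction concrete, here is a minimal sketch of how a data consumer might pull the typed name statements out of a Wikidata entity instead of relying on the generic label. It assumes the entity has already been fetched and parsed in Wikidata's standard entity JSON layout (claims keyed by property ID, monolingual text values with `text` and `language` fields). The function name `typed_names` and the sample entity are made up for illustration and are not taken from any real item.

```python
from typing import Dict, List

# Properties mentioned above, mapped to a human-readable kind.
NAME_PROPERTIES = {
    "P2561": "name",
    "P1448": "official name",
    "P1705": "native label",
}


def typed_names(entity: dict, lang: str) -> Dict[str, List[str]]:
    """Collect name statements in `lang`, grouped by kind of name."""
    found: Dict[str, List[str]] = {}
    claims = entity.get("claims", {})
    for pid, kind in NAME_PROPERTIES.items():
        for statement in claims.get(pid, []):
            snak = statement.get("mainsnak", {})
            # "novalue"/"somevalue" snaks carry no datavalue, so skip them.
            if snak.get("snaktype") != "value":
                continue
            value = snak["datavalue"]["value"]
            if value.get("language") == lang:
                found.setdefault(kind, []).append(value["text"])
    return found


# Hypothetical entity, trimmed to the fields the function reads.
sample_entity = {
    "claims": {
        "P1448": [
            {"mainsnak": {"snaktype": "value", "datavalue": {
                "type": "monolingualtext",
                "value": {"text": "Example Official Name", "language": "en"}}}},
        ],
        "P1705": [
            {"mainsnak": {"snaktype": "value", "datavalue": {
                "type": "monolingualtext",
                "value": {"text": "Example Native Label", "language": "ka"}}}},
            {"mainsnak": {"snaktype": "novalue"}},
        ],
    }
}

english = typed_names(sample_entity, "en")
georgian = typed_names(sample_entity, "ka")
```

A consumer that only wants a map label can keep using the plain label; this kind of lookup only matters when the official and common names need to be told apart. Qualifiers such as P2126 live under each statement's `qualifiers` key and are ignored here.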

In an ordinary written work, most foreign toponyms would be transliterated if they don’t have well-known translations. However, composed names, such as “X Mountains”, “University of X”, and “X National Park”, are particularly amenable to partial translation in many languages. German is somewhat atypical in that it borrows many composed names from other languages without changing a thing. At the other extreme, Chinese speakers aggressively translate every part of a foreign name, going so far as to perform phonosemantic matching in many cases. Most languages are somewhere in the middle, but it isn’t very logical or consistent.

In OSM, we tend to be more conservative in what we put in the name=* key and its localized subkeys, due to the on-the-ground principle and our emphasis on local knowledge. Still, prominent features like countries, capital cities, and famous landmarks often do get tagged with name:*=* in languages that aren’t signposted or used by the local populace, using some of the same techniques that writers and Wikidata use. Everyone has a different idea of what counts as a famous landmark. A country’s national botanical garden might be famous enough, but most mappers would probably agree that an obscure shop’s name (“Minh’s Marvelous Map Shoppe”) should not get translated or transliterated directly inside OSM.

So I guess we need to know where @David_Osipov intends to draw the line, since the stated motivation is to map more efficiently at scale. And if these names are just coming from Wikidata on items that already have wikidata=* tags, then we pretty much know that there’s no practical benefit that would outweigh the maintenance overhead.

Wikidata guidelines allow matching, ambiguous labels. You’ve encountered disambiguating suffixes in labels because many labels were imported from Wikipedia. Cleanup is ongoing, but unfortunately many Wikipedians edit Wikidata unaware of the different standards.

1 Like

Hey everyone, and sorry for not replying. For the time being, I’ve decided to create a feature request for the OSMand app to integrate more closely with Wikidata. The OSM community’s resistance to translating names unless they are officially used by the POI or widely established within a local community, while aimed at accuracy, severely hinders the map’s accessibility for international users who may not be familiar with local naming conventions or scripts.

I hope this will help OSMand users to circumvent OSM limitations on map internationalization for the time being.

Fallback to wikidata is a pretty common approach. OpenMapTiles does this: if tag name:xx does not exist, then attempt to retrieve it from wikidata.
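Roughly, the fallback logic looks like the sketch below. This is not OpenMapTiles' actual code, just an illustration of the idea: `tags` is a feature's OSM tags, `wikidata_labels` is assumed to be a language-to-label mapping already looked up for the item in the `wikidata=*` tag, and the function name and the final fallback to plain `name=*` are my own assumptions.

```python
from typing import Mapping, Optional


def localized_name(
    tags: Mapping[str, str],
    wikidata_labels: Mapping[str, str],
    lang: str,
) -> Optional[str]:
    """Pick a display name for `lang`, preferring OSM over Wikidata."""
    # 1. A name:xx tag entered by a mapper always wins.
    osm_name = tags.get(f"name:{lang}")
    if osm_name:
        return osm_name
    # 2. Otherwise use the Wikidata label, but only if the feature is linked.
    if "wikidata" in tags:
        label = wikidata_labels.get(lang)
        if label:
            return label
    # 3. Last resort (assumption): the plain local name, which may be None.
    return tags.get("name")


# Illustrative values only; "Q0" is a placeholder, not a real item.
station = {"name": "Baker Street", "wikidata": "Q0"}
labels = {"cy": "Baker Street"}

from_wikidata = localized_name(station, labels, "cy")
from_osm = localized_name({**station, "name:cy": "Stryd Baker"}, labels, "cy")
unlinked = localized_name({"name": "Some Shop"}, labels, "cy")
```

The point is that the translated name never has to be stored in OSM: the `wikidata=*` link is enough for the consumer to look it up.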

I see that you are still using AI chatbot-written text. Please stop doing that. It’s rude and offensive to the people who have to read it. It says that you don’t care about anyone else’s time. You could have simply opened a ticket that said:

“I would like OSMAnd to use Wikidata for labels when the name:xx tag is not present in OSM for the language I’m interested in”.

Or rather, you could have written something like:

მინდა, რომ OSMAnd იყენებოდეს Wikidata-ს ლეიბლებისთვის, როდესაც OSM-ში ჩემს ინტერესის ენისთვის name:xx ტეგი არ არის.

And then you could have pasted that into ChatGPT, and it would have translated it to:

I want OSMAnd to use Wikidata labels when there is no name:xx tag in OSM for my language of interest.

I think that’s a good idea (applications using Wikidata as a fallback from OSM name:xx tags).

I am also a fan of AI. It’s a leap forward in technology and we’re going to find all sorts of applications for it. However, it has to be used responsibly and complement, rather than replace, normal human interactions.

What you are doing, writing pages and pages of AI-generated text and pasting it into places for people to read, is not OK.

15 Likes

Actually, my last message wasn’t written with the help of an AI; that’s how I write. The GitHub feature request was, yes.

It is. And it has all the hallmarks of AI written all over it. Instead of a short issue that breaks down what problem you want to solve, there is this wall of text which is hard to read because it’s so smoooooth, and the AI just stretched the words instead of adding content.

8 Likes

This isn’t a school essay. You’re not being rewarded for word count. People in this community will take you more seriously if you just get straight to the point and use your own words concisely.

11 Likes

I am not an OSMand developer, so it is hard for me to say how they handle it, but please be aware that this would by itself be a valid reason to be blocked or banned as a spammer in many places, especially if repeated.

It is much better to post a shorter, actual message: for example, the text that you gave ChatGPT or another automatic text generator to expand.

6 Likes