AI-Assisted Tagging in OpenStreetMap: A Case for Responsible Innovation and Copyright Compliance

Following a Telegram conversation, I encouraged the topic creator to post here because it was something important for the community to discuss, and I am grateful to them for raising it here.

Thanks for that.

I’m sure that’s the case, but ad-hoc translations like that absolutely don’t belong in OSM. Ethnologue suggests that 7174 languages are in use today. It simply isn’t practical to store 7173 translations in different “name” variants in OSM for every object with a name - and it dilutes the information about which names really exist, as opposed to just being ad-hoc translations.

Several approaches have been used successfully to provide translated OSM data (transliteration, on-the-fly translation, links via Wikidata et al.) - and of course you are more than welcome to create a tourist website for Georgia that uses whatever approach you like.
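To make the consumer-side idea concrete, here is a minimal sketch of how a map or website could choose a label without anything extra being stored in OSM. The function name `display_name` and the optional `translate` callback are hypothetical, made up for this illustration; only the `name` and `name:<lang>` keys come from OSM tagging.

```python
from typing import Callable, Mapping, Optional


def display_name(
    tags: Mapping[str, str],
    lang: str,
    translate: Optional[Callable[[str, str], str]] = None,
) -> Optional[str]:
    """Pick a label for rendering; nothing is written back to OSM."""
    # 1. A name that mappers actually recorded for this language wins.
    localized = tags.get(f"name:{lang}")
    if localized:
        return localized

    # 2. Without a plain name there is nothing to show or translate.
    name = tags.get("name")
    if name is None:
        return None

    # 3. Otherwise the data consumer may translate or transliterate on the
    #    fly (hypothetical callback); the result stays on the consumer side.
    if translate is not None:
        return translate(name, lang)

    # 4. Fall back to the local name as mapped.
    return name
```

The point of the design is that a machine-generated label is only ever a display-time fallback, so it never gets mixed up with names that really exist.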

5 Likes

I’m just a small contributor here and I don’t wanna pretend I know much about the legal side of things or the OSM guidelines regarding translation/transliteration, but I did want to briefly share my thoughts.

OSM is one of the few projects that didn’t jump onto the AI hype yet. Of course, the big companies went first. Then smaller ones. Heck, even Mozilla Firefox, the only major FOSS browser still out there, has a built-in chat bot now. Meanwhile, Microsoft is wishing to reopen a nuclear power plant and major disaster site to fulfil the ever-growing energy consumption of these “AI” models.

I, as a mapper, might not be able to add the translation of a clock tower into 25 languages onto OSM, but at least I don’t consume half a litre of water every time I make a decision.

With regards to your post title, there is no “responsible” “AI”, there is no “innovation”, there is no “copyright compliance”. One cannot pretend these computer models consuming excessive amounts of power and water, ever-dwindling resources, are “responsible”; they’re accelerating global warming.

Jumping on a short-term trend can barely be called “innovation” too. Before the AI hype, we had the NFT hype; before that, the cryptocurrency hype. After the AI hype dies, the spatial computing hype will likely be next. These hypes have one primary purpose: financially benefiting big tech companies. We should seriously consider whether we, admittedly indirectly, want to support this trend.

Not to mention that AI models are trained on large amounts of stolen data. A quick search on fediverse scrapers or Clearview sums it up. And then there’s the upcoming EU regulations regarding AI, which will likely make it necessary to indicate what data was produced using AI for at least some applications, which isn’t the case in your example.

Then there are some practical concerns raised by @JeroenHoek’s wonderful post earlier in this thread.

Again, I’m just some random girl from Belgium; I don’t make decisions here, and I wouldn’t be able to deal with the burden of doing so either. But I do believe we should seriously consider the next moves we, as a community (that is what OSM is in the end), make here. Do we really want to utilise models that have proven to have a serious bias towards marginalised groups, consume massive amounts of power and water, and benefit larger tech companies, just so we can easily add translations to OSM objects on a large scale? I don’t believe so. Ethically speaking, I believe OSM should be a project aimed at the essence of producing map data, keeping what’s good, improving what’s bad, but not just jumping on the latest trends just because others do. True innovation lies in distinguishing yourself from others, not copying their behaviour.

Side remark: I can seriously imagine mappers having to clean up AI-generated garbage from OSM tags in the near future and wasting their time on that, because some editors would have implemented AI-assisted mapping.

26 Likes

If you want to improve OSM usefulness for tourists, I would suggest these items instead:

  • Making sure POIs are up to date: Add website, social media, and check_date
  • Make sure end-user apps display how recent POIs are (I tried with Organic Maps, but not an app developer, so gave up)
  • Make sure WikiVoyage is updated (at least OsmAnd use these)
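As a rough illustration of the “display how recent POIs are” suggestion, an app could derive an age from the `check_date` tag. This is only a sketch, assuming `check_date` holds a full `YYYY-MM-DD` date; the helper name `days_since_check` is hypothetical and not taken from any existing app.

```python
from datetime import date
from typing import Mapping, Optional


def days_since_check(tags: Mapping[str, str], today: date) -> Optional[int]:
    """Return how many days ago the POI was last checked, if known."""
    raw = tags.get("check_date")
    if raw is None:
        return None  # never surveyed with check_date, so the age is unknown
    try:
        checked = date.fromisoformat(raw)
    except ValueError:
        return None  # value is not a plain ISO date; treat as unknown
    return (today - checked).days
```

An app could then show something like “checked 30 days ago” next to the POI, or nothing at all when the function returns `None`.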

8 Likes

You know that you can get these advantages without needing to store this data in OSM itself? You can just take OSM data, then feed into whatever AI generation software you want, and make a new map! You can do that today! You don’t need to save this in OSM.

(I second the “keep this AI slop out of OSM” and “dear god don’t post such blatantly eye-roll-ing AI written posts on the forum” comments)

22 Likes

(somewhat drifting off the initial question, but) Vespucci definitely prompts for this, and updating existing POIs with that is really easy.

2 Likes

Thank you everyone for your feedback on my AI-assisted tagging proposal! I appreciate your insights and concerns, especially regarding data accuracy, provenance, and the ethical implications of using AI in OSM.

I acknowledge the initial post was lengthy and drafted with AI assistance to thoroughly analyze the legal aspects of AI-generated content and its fair use.

I’m committed to refining my approach based on your feedback, prioritizing data accuracy, transparency, and community collaboration. Your input is vital as we explore AI’s role in OSM.

Let’s continue this conversation and work together to enhance OSM for everyone.

That was a bold move to reply to complaints about AI assistance with another chatbot-written response. Bravo.¹ Also there are zero grammar errors, which is something that I, in my fifth decade of using the English language, almost never accomplish.

I assume since you are from Georgia that English may not be your first language. Unfortunately, your writing seems to fall into the uncanny valley, that weird zone where something is imperceptibly…off. We don’t want data in OSM to be off. We want it to be real, authentic, and human curated.

So please - write your responses in your native language, and if you must use AI, use it to translate your own words into English! Also if you are going to use AI to compose forum content, at least make an effort for the result to be clever.

The more you plagiarize AI in interacting with the community, the more hostile the community will be to you and your ideas.

¹ Sarcasm, in case it didn't translate.

11 Likes

If any data consumer wants to enrich our data with AI generated stuff, he can go for it. But our internal data should be human created only.

9 Likes

While English is not my native language, I’m trying to communicate effectively and clearly. I use AI assistance in creating my responses to ensure accuracy and clarity, especially when dealing with complex topics. But still the content and ideas expressed are my own. I hope we avoid sarcasm, trolling and off topics and focus on fruitful discussion of the topic instead.

1 Like

Welcome to the forum. I would encourage you to keep exploring how LLMs can be used with OSM. But I think you’re going about it the wrong way. For a start, as others have pointed out, your posts are far too long to get useful feedback on this forum. Try asking more specific questions, for example

Hi all, I would like to build an app where blind people can learn about nearby POIs. This requires me to have a textual description for each POI. I have had good results with using GPT-4 to translate OSM tags into human-readable descriptions for use in my app. Would it be appropriate to store these in the OSM description= tag?

(Great idea for an app but no, please store that in a separate database.)

Or

Instead of visiting a place in person, I’m planning to ask an LLM to confirm if the POI still exists. Is this appropriate?

(No.)

Or.

I used an LLM to generate the website URLs for POIs that were missing a website= tag. I then visit each of these websites myself, to verify that they do in fact exist and belong to the POI in question. Do you think it’s appropriate to add these websites to OSM?

(Probably yes. I’d be curious how often the websites are actually correct.)

11 Likes

I agree with Brian’s overall message - AI use in the forum is not helping the case for AI in OSM in general. But I agree with you, David, that it could be said without the sarcasm and rude tone. Personally my opinion tracks with others’ expressed here, specifically that at most AI should be used as an assistant to a human mapper, at a granular level. (for example in identifying missing features for a human to approve/add.)

David, I recommend that you follow Brian’s suggestion to ditch the AI editing/rephrasing, write by hand in your native tongue, and machine translate the result to English. (Or, in the spirit of this discussion, post it as-is and allow the English-speaking forum readers to translate it ourselves! :wink:)

4 Likes

That points to an issue that I don’t think you have addressed. Your proposal involves greatly expanding the volume and complexity of POI tags. If mappers are already struggling to keep POIs up to date, how will they deal with the increased volume of tags in multiple languages and alphabets? The description tag in one of your examples refers to performances at 12pm and 7pm. This really isn’t the kind of thing we would normally include in a description tag, even in one language. If it changes, who will update the information across 20 languages?

As others have said, a lot of this seems to involve duplicating things that wikidata/wikipedia/wikivoyage are better equipped to handle than OSM (including systematic linking of languages via wikidata).

9 Likes

Can we ban this person? For all we know, even their responses are AI-generated.

Part of the reason that Ireland isn’t part of the UK anymore is that people objected to having foreign-language names imposed on them individually. Whatever about historical exonyms, new exonyms are particularly bad.

I don’t think that would be fair - I explicitly asked them to discuss this issue with the community here.

Well yes - they’ve said that since their first language isn’t English they’re using AI to make their responses seem “more natural”.

I’m sure that they’re doing so with the best of intentions, but unfortunately (as others have said) that doesn’t really work on two counts - one is that it can produce overly verbose text that doesn’t really say anything, and the other is that the result looks very disingenuous - like the spammy descriptions that businesses sometimes use, it raises red flags that suggest “this text is not to be trusted”, even if (in this case) that would be completely unwarranted.

When posting in the forum here I’d echo the suggestions above for everyone to post in their native language and let the built-in translation handle things (it seems to do a pretty good job with those languages that I’m somewhat familiar with). Nuance from the original will be lost if original text is “polished to look nice, rather than accurate, in English”.

Another reason is that one of the anti-spam mechanisms that Discourse seems to use is to treat “pasted text” (which would obviously be the case here) negatively. I’m not saying that it’s a good or a bad thing, but as a moderator in one of the subforums here I’ve regularly had to manually greenlight posts from new users that had fallen foul of that.

Edited to add: One more thing to mention - I get posts to the forum by email, and I’ve just noticed that my email client also thought the “AI generated” posts above were spam.

11 Likes

Thank you @someoneelse for moderation of this post.

I’ll clarify a bit how the initial post has been created, so that people would understand the value of the conducted analysis.

I’ve manually written drafts of sections “Why AI? Addressing OSM’s Challenges and Expanding its Impact”, “My AI-Assisted Tagging Process: An example and a breakdown of the AI-Human Partnership”, “Addressing Specific Concerns from the OSM Community” and “FAQ”. Once I drafted them, I asked AI to restructure the text and clean up the language. I’ve checked my drafts for logical discrepancies using AI several times, out of respect for the community, so that it would receive a good quality analysis. After that I found all the sources in the “Reference” section and conducted a 3-hour-long dialog with an AI to analyze the legal side of copyright of AI output, with several rounds of fact-checking. That’s how the original text in the post was born. It was born to address concerns raised in the Telegram chat about copyright of AI output.

For me, it’s a pity that a witch hunt began. I think inquisitors dismissed their victims’ opinions as well, being firm in their belief that they were hunting monsters rather than humans. It’s interesting that some members who proposed using machine translation to overcome language barriers are also critical of the usage of AI for similar purposes. But the bright side is that the OSM community raised valid points and concerns. I will think about how to properly address the valid concerns and, maybe, propose a new way of dealing with the Babel tower.

In case others weren’t being clear enough, every post has a :globe_with_meridians: button that uses machine translation – a form of artificial intelligence! – to automatically convert the post from its original language to the reader’s preferred language. This technology is supposed to save everyone some effort: you don’t have to scale the Tower of Babel to start a new topic, and others don’t have to in order to respond.

This is a false equivalence. Not all AI technologies are equally useful, and none is an unalloyed good regardless of the context and manner of use.

I think people are nitpicking about your writing process because they found the original post to have – how should I put it? – low information density. Some participants on this forum tend to write lots of fluff manually, even, and the same thing happens: the superlatives and unnecessary detail form a formidable barrier between the idea and the reader.

On top of that, you’re wading into a touchy subject to begin with. As you might suspect, some have already made up their minds about LLM usage in OSM. Overwhelming us with more LLM output isn’t likely to win over the folks among us who already take a dim view toward the technology or its manner of use. Insulting their intelligence with references to the amenity=* and tourism=* documentation may not help much either.

I wasn’t able to digest all that you wrote, but it seems like you have multiple proposals bundled up into one. Maybe start smaller in terms of your idea, and more effective writing will follow.

14 Likes

People don’t like the first post because it is unreadable. With impossible-to-read posts like these, I just read the first two sentences and the last two sentences. You need to understand that if you write such a long post, you are thereby saying that you do not value other people’s time.

Now for AI in OSM: AI could be very useful, for example checking changesets for vandalism. Or comparing OSM data to satellite imagery to see what is outdated and needs an update. And there are probably many more use cases that both respect OSM and could help mappers. This unfortunately is not one of those cases.

21 Likes

Maybe.

But please, do not add these machine-translated names to the OpenStreetMap database, and please remove all machine-generated translations that you have added.

The claim that node 12185682258 (თაბორის რეაბილიტაციის ცენტრი) has a Polish name is highly dubious to me. Are any of these name:* tags real? Is name:ru also machine-translated?

What is weird about liking the use of a tool where it is overall useful (though with some pitfalls and traps) and disliking a clearly harmful use for generating and adding fake data?

The name:pl field is not for a machine-translated value of the name field. It is for actual names in Polish. Not everything has a name in my language.

10 Likes

Based on the feedback received on my proposal, I’ve decided to revert the AI-generated name tags I added earlier, in a week’s time.

I would like to thank members of the community for the constructive criticism.

I also want to mention the harshness, sarcasm, and assumptions I saw in some of the feedback. I see constructive criticism as a valuable resource for moving forward, but personal attacks, assumptions, and negativity don’t help at all.

I’d like to close this thread and focus on exploring other solutions to internationalization of OSM.

10 Likes