I’m an informatics student developing a tool to detect bad language when someone creates or edits the name of a location. For that reason, I’m asking whether there are good examples of this that I could use to implement a prevention mechanism. If this has already been discussed, I’m sorry and would love to follow a suggested link.
Look at the OSMCha list of bad words:
What would you describe as “bad language use”?
A village in Austria renamed itself “Fugging”; the name was formerly written with ‘ck’ in place of ‘gg’.
Would you prevent mappers naming a village with its real name just because the name is “bad language use” in a different language?
Or placenames in the local language.
Most people are probably aware of the Scunthorpe problem, but there are many more examples in the UK: https://en.m.wikipedia.org/wiki/Scunthorpe_problem
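The Scunthorpe problem comes from plain substring matching. Here is a minimal Python sketch. It uses a tiny hypothetical blocklist (`BLOCKLIST` is made up for illustration and is not OSMCha’s actual list). Whole-word matching removes the substring false positives, but it still cannot tell a real place such as Hell, Norway, apart from an insult:

```python
import re

# Hypothetical blocklist for illustration only; NOT OSMCha's actual list.
BLOCKLIST = ["ass", "hell"]


def naive_flag(name: str) -> bool:
    """Flag a name if any blocked word appears anywhere, even inside another word."""
    lowered = name.lower()
    return any(word in lowered for word in BLOCKLIST)


# One precompiled pattern that only matches blocked words as whole words.
WHOLE_WORD = re.compile(
    r"\b(?:" + "|".join(map(re.escape, BLOCKLIST)) + r")\b", re.IGNORECASE
)


def whole_word_flag(name: str) -> bool:
    """Flag a name only if a blocked word appears as a separate word."""
    return WHOLE_WORD.search(name) is not None


for name in ["Passage Road", "Shelley Street", "Hell"]:
    print(f"{name!r}: naive={naive_flag(name)}, whole_word={whole_word_flag(name)}")
```

Whole-word matching is also easier to evade (for example with spaced-out letters or character substitutions), so it trades some missed vandalism for fewer false alarms.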
Yes, it’s very hard to automatically distinguish bad words that shouldn’t be in OSM from bad words that should be. See:
Another challenge is that many words aren’t inherently bad but may be in certain contexts. For example, because of its Dutch heritage, New York State has many rivers named “-kill” and many places named after those rivers. Changeset comments mentioning these places have sometimes been misidentified as somehow condoning violence.
Conversely, someone once deleted a hamlet named Coonville on the assumption that it was a racial slur. That exact name has appeared on places elsewhere in the country as a racial slur, but as far as I can tell, this particular instance was named after Raccoon Creek, which runs past it.
Sometimes things are even named swear words on purpose in the real world. See this restaurant I recently mapped: Node: Burger Bitch (10128505319) | OpenStreetMap. I imagine it’d be pretty hard to design something that didn’t flag this, though I guess that’s what manual review is for.
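The Coonville case suggests that whether a flagged name is legitimate can depend on the specific feature, not just on the word. A manually reviewed exception might therefore be better stored per OSM element than per name. Below is a minimal sketch under that assumption. `VerifiedStore`, `needs_review`, the element IDs and the toy detector are all hypothetical, and “Fugging” is used only because it is a real name from this thread that a naive detector could plausibly trip on:

```python
from typing import Callable

# Hypothetical store of names a human has already verified, keyed by OSM
# element (type, id) rather than by name alone: the same string can be a
# legitimate toponym in one place and a slur somewhere else.
VerifiedStore = dict[tuple[str, int], str]


def normalize(name: str) -> str:
    """Case-fold and collapse whitespace so trivial edits don't change the key."""
    return " ".join(name.casefold().split())


def needs_review(element: tuple[str, int], name: str,
                 detector: Callable[[str], bool],
                 verified: VerifiedStore) -> bool:
    """True if the detector fires and this element does not already carry this verified name."""
    if not detector(name):
        return False
    return verified.get(element) != normalize(name)


def toy_detector(name: str) -> bool:
    # Stand-in for a real profanity check, for this demo only.
    return "fugging" in name.casefold()


# Hypothetical element IDs, for illustration only.
verified: VerifiedStore = {("node", 1): "fugging"}
print(needs_review(("node", 1), "Fugging", toy_detector, verified))  # already verified on this element
print(needs_review(("node", 2), "Fugging", toy_detector, verified))  # same name on a different element
```

Keying on the element means an exception granted to one place does not silently whitelist the same string everywhere else.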
Automatic profanity detection can only go so far. In my experience, though, moderation tools that let whoever is entering data preemptively flag it as such make later manual review much easier. In practice, the editor knows the input will trip the profanity detector but believes in good faith that it is still correct. Someone who wants to vandalize the map generally has little incentive to do something that will immediately flag their changes for review.
I’m not sure what the scope of your project is, but it’s worth thinking about in development how to handle “approved” profanity.
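The self-flagging idea could be modelled as a routing decision. In this sketch, a declared edit still goes to review, but it lands in a separate queue together with the editor’s justification. Everything here is hypothetical: the `NameEdit` fields, the `Route` queues and the toy detector are not taken from any existing OSM tool.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Route(Enum):
    ACCEPT = "accept"                    # detector did not fire
    REVIEW_DECLARED = "review_declared"  # detector fired; editor declared the name intentional
    REVIEW_SUSPECT = "review_suspect"    # detector fired; no declaration


@dataclass
class NameEdit:
    name: str
    # Hypothetical "I know this looks offensive, but it is the real name" checkbox.
    declared_intentional: bool = False
    # Optional evidence shown to the reviewer, e.g. "name is on the shop sign".
    justification: str = ""


def route(edit: NameEdit, detector: Callable[[str], bool]) -> Route:
    """Pick a queue for the edit. Declared edits are still reviewed, just with context."""
    if not detector(edit.name):
        return Route.ACCEPT
    if edit.declared_intentional:
        return Route.REVIEW_DECLARED
    return Route.REVIEW_SUSPECT


def toy_detector(name: str) -> bool:
    # Stand-in for a real profanity check, for this demo only.
    return "bitch" in name.casefold()


print(route(NameEdit("Burger Bitch", True, "name is on the restaurant sign"), toy_detector))
print(route(NameEdit("Burger Bitch"), toy_detector))
print(route(NameEdit("Burger Palace"), toy_detector))
```

Keeping the declaration as context rather than a bypass is one way to handle “approved” profanity. Reviewers can clear the declared queue quickly while the undeclared queue gets closer scrutiny.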
Thank you very much for the information and your interest. I was specifically searching for an example of ‘language vandalism,’ something like this: New York renamed 'Jewtropolis' in map hack - BBC News
You can try the OSMCha profanity filter (you need to log in). You can activate it via Filters › Flags › Reasons for flagging › Profanity tag.
There was a MapRoulette challenge checking the OSMCha profanity detections: 96 % of the 7596 “issues” were false positives.
If you’re curious you can download the tasks with status “Fixed” from MapRoulette: https://maproulette.org/api/v2/challenge/view/8113?status=1&priority=0,1,2&reviewStatus=0,1,2,3,4,5,6,7,-1&invf=&timezone=&filename=challenge_8113.geojson
You can find some minor vandalism among them, but far from all of the “Fixed” tasks are vandalism.
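Here is some rough arithmetic on the MapRoulette figures above. It assumes the 96 % is a rounded share of all 7596 tasks, and it shows how low the filter’s precision was in that challenge:

```latex
\begin{aligned}
\text{FP} &\approx 0.96 \times 7596 \approx 7292 \\
\text{TP} &\approx 7596 - 7292 = 304 \\
\text{precision} &= \frac{\text{TP}}{\text{TP} + \text{FP}} \approx \frac{304}{7596} \approx 0.04
\end{aligned}
```

Not all of the remaining tasks were vandalism either, so the precision for vandalism specifically would be lower still.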
This looked like vandalism, but there seems to be a real sign: Way History: 388029001 | OpenStreetMap
This road really seems to exist with this name: Way History: 32238926 | OpenStreetMap
So it’s not easy to moderate such changes remotely.
There is also a “Pussy Lane”:
If you want really offensive names, trails on Vancouver’s North Shore used to be named after sexual body parts, though many have since been renamed with those words abbreviated.
North Shore mountain bikers are perfectly willing to use what others would consider very offensive language.