Bad Language Example

dav_van_dect · November 3, 2023, 3:30pm

I’m an informatics student developing a tool that detects bad language when someone creates or edits the name of a location. I’m asking whether there are good examples of this that I could use to implement protections against it. If this has already been discussed, I’m sorry, and I would be glad to follow a link.

RicoElectrico · November 3, 2023, 3:46pm

Look at OSMCha’s list of bad words:

and

github.com/willemarcel/osmcha/blob/master/osmcha/suspect_words.yaml

sources:
- google
- nokia
- waze
- apple
- tomtom
- wikimapia
- goo.gl
- navteq
- teleatlas
- yelp
- yandex
- яндекс
- 2gis
- 2гис

common:
- pokemon
- import
- reimport

This file has been truncated.
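For illustration, here is a minimal Python sketch of how a categorized word list like this could be applied to a name or changeset comment. The `SUSPECT_WORDS` dict holds only a few entries copied from the excerpt above; loading the full suspect_words.yaml (for example with PyYAML) is assumed and not shown, and `match_categories` is a hypothetical helper, not part of OSMCha.

```python
import re

# A small subset of the entries shown in the suspect_words.yaml excerpt above.
# The real file has more words; loading it is assumed and not shown here.
SUSPECT_WORDS = {
    "sources": ["google", "waze", "yandex", "яндекс", "2gis"],
    "common": ["pokemon", "import", "reimport"],
}


def match_categories(text: str, word_list: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return, per category, the listed words that occur as whole words in `text`.

    Matching is case-insensitive via str.casefold, which also handles
    Cyrillic entries such as "яндекс".
    """
    # \w is Unicode-aware for str patterns, so Cyrillic words become tokens too.
    tokens = set(re.findall(r"\w+", text.casefold()))
    hits = {}
    for category, words in word_list.items():
        found = [w for w in words if w.casefold() in tokens]
        if found:
            hits[category] = found
    return hits


print(match_categories("Added shops from Google Maps", SUSPECT_WORDS))
# → {'sources': ['google']}
print(match_categories("Fixed road geometry", SUSPECT_WORDS))
# → {}
```

Matching whole tokens rather than substrings means “reimport” matches only the “reimport” entry, not “import” as well; entries containing punctuation (such as goo.gl) would need a different check.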

ToniE · November 3, 2023, 3:46pm

What would you describe as “bad language use”?

A village in Austria renamed itself “Fugging”; it was formerly spelled with ‘ck’ in place of the ‘gg’.

Would you prevent mappers from giving a village its real name just because that name counts as “bad language” in a different language?

trigpoint · November 3, 2023, 4:49pm

Or placenames in the local language.

Most people are probably aware of the Scunthorpe problem, but there are many more examples in the UK: https://en.m.wikipedia.org/wiki/Scunthorpe_problem
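The Scunthorpe problem is easy to reproduce. The sketch below assumes a one-word blocklist containing “sex”, purely for illustration, and compares naive substring matching with whole-word matching on the English county names Essex and Middlesex. Both functions are hypothetical helpers, not taken from any existing tool.

```python
import re

BLOCKLIST = ["sex"]  # illustrative single-entry blocklist (assumption)


def flagged_substring(name: str) -> bool:
    """Naive check: flag if a blocklisted word appears anywhere in the name."""
    lowered = name.casefold()
    return any(word in lowered for word in BLOCKLIST)


def flagged_whole_word(name: str) -> bool:
    """Stricter check: flag only if a blocklisted word stands as a separate word."""
    lowered = name.casefold()
    return any(re.search(rf"\b{re.escape(word)}\b", lowered) for word in BLOCKLIST)


for name in ["Essex", "Middlesex", "Sex Shop"]:
    print(name, flagged_substring(name), flagged_whole_word(name))
# Essex True False
# Middlesex True False
# Sex Shop True True
```

Whole-word matching removes the county false positives, but it still cannot tell a legitimately named place or business from vandalism.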

ezekielf · November 3, 2023, 4:51pm

Yes, it’s very hard to automatically distinguish bad words that shouldn’t be in OSM from bad words that should be in OSM. See:
https://www.reddit.com/r/CasualUK/comments/omeq85/perfect_road_trip_doesnt_exi_all_names_are_real/

Minh_Nguyen · November 3, 2023, 6:34pm

Another challenge is that there are many words that aren’t inherently bad, but in certain contexts they may be. For example, due to its Dutch heritage, New York State has many rivers named “-kill” and many places named after those rivers. Sometimes changeset comments mentioning these places have gotten misidentified as somehow condoning violence.

Conversely, someone once deleted a hamlet named Coonville out of an assumption that it was a racial slur. That exact name has appeared on places elsewhere in the country as a racial slur, but as far as I can tell, this particular instance was named after the Raccoon Creek that runs past it.
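One common mitigation for this context problem is an allowlist of known, legitimate names that are removed from the text before the blocklist check runs. The sketch below assumes a hypothetical allowlist containing the Kill Van Kull, a New York Harbor strait whose name also comes from the Dutch “kill”. A real allowlist would probably need to be tied to specific OSM objects or regions, since the same name can be innocent in one place and offensive in another.

```python
import re

BLOCKLIST = ["kill"]           # illustrative blocklist (assumption)
ALLOWLIST = ["kill van kull"]  # hypothetical known-good names, stored casefolded


def flag_comment(comment: str) -> list[str]:
    """Return blocklisted words found as whole words after removing allowlisted names."""
    text = comment.casefold()
    for name in ALLOWLIST:
        # Blank out allowlisted names so their parts are not matched on their own.
        text = re.sub(rf"\b{re.escape(name)}\b", " ", text)
    return [w for w in BLOCKLIST if re.search(rf"\b{re.escape(w)}\b", text)]


print(flag_comment("Fixed shoreline along the Kill Van Kull"))  # → []
print(flag_comment("Added Fishkill Creek bridge"))              # → []
print(flag_comment("Renamed street to Kill Street"))            # → ['kill']
```

The last, made-up example shows the limit of this approach: any legitimate name missing from the allowlist is still flagged, so manual review remains necessary.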

willkmis · November 3, 2023, 11:59pm

Sometimes things are even named with swear words on purpose in the real world. See this restaurant I recently mapped: Node: Burger Bitch (10128505319) | OpenStreetMap. I imagine it’d be pretty hard to design something that didn’t flag this, though I guess that’s what manual review is for.

oznius · November 4, 2023, 12:44am

Automatic profanity detection can only go so far, but in my experience, moderation tools that let whoever is entering data preemptively flag it as such (basically, they know it will trip the profanity detector, but they believe in good faith that it is still the correct input) make later manual review much easier. Someone who wants to vandalize the map generally has little incentive to do something that will immediately flag their changes for review.

I’m not sure what the scope of your project is, but it’s worth thinking about in development how to handle “approved” profanity.
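The self-flagging idea above could be sketched as a small triage step. This is a hypothetical design, not an existing OSM or OSMCha feature: each edit carries a `mapper_acknowledged` field that the editor would set when the mapper confirms that a name tripping the filter is correct, and the two values together decide which review queue the edit goes to.

```python
from dataclasses import dataclass
from enum import Enum


class Queue(Enum):
    NONE = "no review"
    ACKNOWLEDGED = "acknowledged, low priority"   # mapper vouched for the name
    SUSPICIOUS = "unacknowledged, high priority"  # tripped the filter without explanation


@dataclass
class NameEdit:
    name: str
    tripped_filter: bool       # result of whatever profanity check is used
    mapper_acknowledged: bool  # hypothetical "this looks rude but is correct" flag


def triage(edit: NameEdit) -> Queue:
    """Route an edit to a review queue based on the filter result and the mapper's flag."""
    if not edit.tripped_filter:
        return Queue.NONE
    if edit.mapper_acknowledged:
        return Queue.ACKNOWLEDGED
    return Queue.SUSPICIOUS


print(triage(NameEdit("Burger Bitch", tripped_filter=True, mapper_acknowledged=True)))
# → Queue.ACKNOWLEDGED
```

In this sketch acknowledged edits are still reviewed; the flag only lowers their priority, since nothing stops a vandal from ticking the box too.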

dav_van_dect · November 8, 2023, 10:31am

Thank you very much for the information and your interest. I was specifically searching for an example of ‘language vandalism,’ something like this: New York renamed 'Jewtropolis' in map hack - BBC News

hfs · November 8, 2023, 11:08am

You can try the OSMCha profanity filter (you need to log in). You can activate it via Filters › Flags › Reasons for flagging › Profanity tag.

There was a MapRoulette challenge checking the OSMCha profanity detections: 96 % of the 7596 “issues” were false positives.
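To put that number in perspective: assuming the 96 % figure is a rounded share of all 7596 detections, roughly 7292 were false positives and only about 304 were real problems, so the filter’s precision was around 4 %. The arithmetic as a quick Python check:

```python
total = 7596
false_positive_rate = 0.96  # rounded figure from the MapRoulette challenge

false_positives = round(total * false_positive_rate)
true_positives = total - false_positives
precision = true_positives / total  # share of flagged items that were real problems

print(false_positives, true_positives, f"{precision:.1%}")
# → 7292 304 4.0%
```

Because the 96 % is rounded, these counts are approximate.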

hfs · November 8, 2023, 11:50am

If you’re curious you can download the tasks with status “Fixed” from MapRoulette: https://maproulette.org/api/v2/challenge/view/8113?status=1&priority=0,1,2&reviewStatus=0,1,2,3,4,5,6,7,-1&invf=&timezone=&filename=challenge_8113.geojson
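If you would rather inspect that export in code than in a GIS tool, here is a minimal standard-library sketch. It assumes only that the file is a GeoJSON FeatureCollection (the `filename=...geojson` parameter suggests so); the property names inside each feature are not documented here, so the sketch just counts features and tallies the property keys it finds. `summarize_geojson` and `fetch` are hypothetical helpers.

```python
import json
import urllib.request
from collections import Counter

EXPORT_URL = (
    "https://maproulette.org/api/v2/challenge/view/8113?status=1&priority=0,1,2"
    "&reviewStatus=0,1,2,3,4,5,6,7,-1&invf=&timezone=&filename=challenge_8113.geojson"
)


def summarize_geojson(text: str) -> tuple[int, Counter]:
    """Return the number of features and how often each property key occurs."""
    data = json.loads(text)
    features = data.get("features", [])
    keys = Counter()
    for feature in features:
        # Properties may be null in GeoJSON; treat that as an empty mapping.
        keys.update((feature.get("properties") or {}).keys())
    return len(features), keys


def fetch(url: str = EXPORT_URL) -> str:
    """Download the export (requires network access)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")
```

Usage: `count, keys = summarize_geojson(fetch())`, then `keys.most_common(10)` shows which properties are worth looking at.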

You can find some minor vandalism among them, but far from all of the “Fixed” tasks are vandalism.

This looked like vandalism, but there seems to be a real sign: Way History: 388029001 | OpenStreetMap

This road really seems to exist with this name: Way History: 32238926 | OpenStreetMap

So, it’s not easy to moderate such changes remotely.

whb · November 8, 2023, 1:04pm

There is also a “Pussy Lane”:

pnorman · November 8, 2023, 2:49pm

If you want really offensive names, the Vancouver North Shore trails used to refer to sexual body parts, but many have since been renamed to abbreviate those words.

North Shore mountain bikers are perfectly willing to use what others would consider very offensive language.