TLDR: Many spam edits share very similar attributes, which makes me wonder about their origin. Can we stop or improve these edits at the root?
OSM Spam
Hi there. I’m sure many of us are aware of the low-quality SEO spam features that are relentlessly added to OSM. I’ve recently become interested in detecting them and improving or removing them. Fortunately, many of these edits share common attributes, which makes them easier to detect. It also makes me very curious about them, because it looks like they’re all following the same guide, or are being done by the same marketing company or something similar.
Some examples here (OSMCha). They are very common.
My Question
Is there anything that can be done to investigate further? Perhaps someone with deeper access to OSM’s data can check whether they are all from the same IP? Or maybe someone with better sleuthing skills can find the guide or service that’s the origin of this very distinctive editing style.
More Details
Edit similarities
- Changeset comment is exactly "Add My Business" (or sometimes a big list of SEO terms)
- Brand-new account, username usually some variation on the business name
- iD editor used
- operator= tag almost always used, usually with someone’s name (presumably the business owner’s real name?)
- Always very similar invalid tags:
  - Hours of Operation= (values and formatting vary)
  - Category=
  - Keywords= with a huge list of SEO-type words and phrases
  - Payment Options=
  - Year Established=
- addr:street= value includes the housenumber, e.g. 623 Main St
- No addr:housenumber tag
- Sometimes Country=USA
- Frequently for businesses common in lead-generation scams, like locksmiths, plumbers, etc.
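As a rough illustration, most of the tag-level signals above could be checked automatically once a feature's tags and its changeset comment have been fetched. This is a hypothetical sketch: spam_signals, the key set and the house-number regex are my own assumptions, not part of OSMCha, iD or any other OSM tool. A hit is only a reason to look closer, not proof of spam; operator= in particular is a perfectly legitimate tag, so it is reported as a weak signal.

```python
import re

# Hypothetical key set, copied from the invalid-tags list above.
INVALID_KEYS = {"Hours of Operation", "Category", "Keywords",
                "Payment Options", "Year Established"}

# addr:street values that begin with a house number, e.g. "623 Main St".
NUMBER_PREFIX = re.compile(r"^\d+[A-Za-z]?\s")


def spam_signals(tags: dict[str, str], changeset_comment: str = "") -> list[str]:
    """List which of the observed spam signals one feature shows (hypothetical helper)."""
    signals = []
    if changeset_comment.strip() == "Add My Business":
        signals.append("changeset comment 'Add My Business'")
    # Any of the non-standard, space-containing business keys.
    for key in sorted(INVALID_KEYS.intersection(tags)):
        signals.append(f"invalid key {key!r}")
    # Weak signal on its own: operator= is valid OSM tagging.
    if "operator" in tags:
        signals.append("operator tag present")
    street = tags.get("addr:street", "")
    if NUMBER_PREFIX.match(street) and "addr:housenumber" not in tags:
        signals.append("house number inside addr:street")
    if tags.get("Country") == "USA":
        signals.append("Country=USA")
    return signals


# Made-up example feature shaped like the edits described above.
example = {
    "name": "Acme Locksmith",
    "operator": "Jane Doe",
    "addr:street": "623 Main St",
    "Keywords": "locksmith, emergency locksmith, 24 hour locksmith",
    "Country": "USA",
}
print(spam_signals(example, "Add My Business"))
```

Counting signals rather than returning a yes/no keeps the final call with a human reviewer, which matches how these edits are handled by hand anyway.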
How I detect them
- Overpass queries for many of the invalid tags listed above
- OSMCha saved filters to review recent changesets with “Add My Business” etc.
- Overpass query for descriptions exactly 255 characters long, which usually means the text was copy-pasted in and truncated at OSM’s 255-character limit for tag values
- Overpass query for descriptions with any suspicious words. (Made by @Friendly_Ghost)
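For anyone who wants to reproduce this kind of search, here is a minimal sketch that builds an Overpass QL query for the invalid keys plus a case-insensitive word match on description=, limited to a bounding box. The function names, the example word list and the timeout are my own assumptions, not the actual queries linked above. The 255-character check is done client-side on the returned tags, since matching an exact length inside Overpass regexes is less predictable with multi-byte text.

```python
# Hypothetical inputs: keys from the invalid-tags list, plus an example word list.
INVALID_KEYS = ["Hours of Operation", "Category", "Keywords",
                "Payment Options", "Year Established"]
SUSPICIOUS_WORDS = ["locksmith", "plumber"]  # assumed to be plain words, no regex syntax
OSM_MAX_VALUE_LENGTH = 255  # OSM's limit for tag values, in characters


def _quote(text: str) -> str:
    """Escape backslashes and double quotes for an Overpass QL string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_query(bbox: tuple[float, float, float, float],
                words: list[str] = SUSPICIOUS_WORDS, timeout: int = 60) -> str:
    """Build an Overpass QL query; bbox is (south, west, north, east)."""
    south, west, north, east = bbox
    lines = [f"[out:json][timeout:{timeout}][bbox:{south},{west},{north},{east}];", "("]
    # nwr = nodes, ways and relations; keys with spaces must be quoted.
    for key in INVALID_KEYS:
        lines.append(f'  nwr["{_quote(key)}"];')
    # ",i" makes the regular expression match case-insensitively.
    pattern = "|".join(words)
    lines.append(f'  nwr["description"~"{_quote(pattern)}",i];')
    lines.append(");")
    lines.append("out tags;")
    return "\n".join(lines)


def looks_truncated(tags: dict[str, str]) -> bool:
    """True when description= is exactly at the length limit (likely cut off)."""
    return len(tags.get("description", "")) == OSM_MAX_VALUE_LENGTH


print(build_query((40.0, -75.0, 41.0, -74.0)))
```

The printed query can be pasted into overpass-turbo; running it over a large area may hit the timeout, so a smaller bounding box is the safer starting point.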
How I fix them
By hand every time, as is OSM tradition. Sometimes I remove the offending feature without comment, sometimes I just repair the tags, depending on how bad the issues are and how plausible the tags are. (e.g. Is the address accurate? Does this location make sense for this business? Was any effort at all put into following OSM standards?)
If there’s any question in my mind that a suspicious edit is genuine, I usually leave a comment, flag it in OSMCha, and remove it later if they don’t respond. (They almost never do.)
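When the feature looks genuine and only the tags need repair, the mechanical part of the fix could be prepared in advance and then checked by eye. The sketch below is hypothetical (propose_repair and DROP_KEYS are my own names): it returns a cleaned copy of the tags, drops the invalid keys and splits a house number out of addr:street. It never touches OSM itself. Dropping Hours of Operation= is a simplification; a careful reviewer might instead translate it into opening_hours= by hand, since its formatting varies too much to convert blindly.

```python
import re

# Split "623 Main St" into a house number and a street name.
STREET_WITH_NUMBER = re.compile(r"^(\d+[A-Za-z]?)\s+(.+)$")

# Hypothetical: keys this sketch proposes deleting outright.
DROP_KEYS = {"Hours of Operation", "Category", "Keywords",
             "Payment Options", "Year Established", "Country"}


def propose_repair(tags: dict[str, str]) -> dict[str, str]:
    """Return a repaired copy of tags for manual review; the input is not modified."""
    fixed = {k: v for k, v in tags.items() if k not in DROP_KEYS}
    match = STREET_WITH_NUMBER.match(fixed.get("addr:street", ""))
    # Only split when there is no separate addr:housenumber already.
    if match and "addr:housenumber" not in fixed:
        fixed["addr:housenumber"] = match.group(1)
        fixed["addr:street"] = match.group(2)
    return fixed


before = {"name": "Acme Locksmith", "addr:street": "623 Main St",
          "Keywords": "locksmith, emergency locksmith", "Country": "USA"}
print(propose_repair(before))
```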
What is SEO?
SEO is “Search Engine Optimization” - the attempt to make one’s website appear higher in search results. SEO is the motivator behind lots of OSM’s spam.