Where do I sign up to help clean up spam from the OSM diaries?

OSM diaries see a lot of spam. Where do I sign up to help keep the diaries page spam-free?

4 Likes

Hit “Report this entry” to report it as spam. A moderator will then process your report and hide the diary entry.

5 Likes

The moderators == the DWG?

Mostly yes, I’d say. It’s everyone with a blue star on their profile page, e.g. SomeoneElse | OpenStreetMap

The actual flow of a reported user is to the admins first, so we (the DWG) don’t tend to see spam reports via that route because the admins have already dealt with them.

We do hide the occasional spam diary entry that we might spot ourselves, though.

1 Like

I wonder if it would make sense to split up the roles more, technicalities aside.

  • There is moderation of spam, misbehavior on note and changeset discussions etc.
  • and there is reverting vandalism, copyright claims, etc.

The former, dealing with people, could theoretically be something moderators here on Discourse could do too. With OAuth2 and all this, maybe it would even be easier to integrate that.

1 Like

I know my opinion on this topic is not very popular with some folks.

I would discontinue Diary posts on the Rails port altogether, and create a new category for “Diaries” right here on community osm.

All the moderation, discussion, tagging, translation features are already in place, and they’re way more sophisticated.

On top, Discourse offers self-hosted images, which is a huge pain on the Rails port where you need to resort to some external image hoster.

Improving the Rails port by working out some improved spam removal authorization schema seems like wasted effort to me.

Some random page demoing a blog-like UI on Discourse: Elektronauts

3 Likes

I think it’s a bit early to be discussing porting more things to Discourse while it has so many issues to be worked out. Let’s see how the transition of the forum and help.osm.org goes first.

5 Likes

I was thinking more about becoming part of a moderator team, if that exists, to process / delete the spam that gets reported. Since I would say 90% is very easy to distinguish from “real” content (at least in English), the amount of work involved is not enormous if spread out between a few people. I would be happy to help out with that.

1 Like

Hmm. I personally like the diaries. But the main reason I like them is that they are right there linked on the front page of osm.org. That could be achieved with a different “platform” as well, of course. But I would like to keep encouraging diaries to be a “long form” format. Platforms like Discourse are more aimed at Q&A and starting discussion threads, which I would find less suitable.

1 Like

What exactly is the perceived problem? Pretty much all diary spam is already removed pretty quickly, usually by one of the DWG moderation team hiding the post and then, in most cases, by me closing the user’s account.

Right now the third diary entry is spam (though not so obnoxious: it looks a bit like someone confused about how to add their business to the map).

At basically any time you can use the Spam/Report user - OpenStreetMap Wiki queries to find blatant spam (I just reported several accounts, though most of the spam is in profile descriptions - up to porn website spam with matching avatars)

Ideally spam cleanup would be fast enough so that it is not possible to find 20+ active spammy profiles using Google search.

That diary entry is less than 24 hours old - if nobody had killed it by the time it hit 24 hours then I would have seen it and killed it.

That said, it’s quite borderline anyway because it’s fairly obviously a confused attempt to add a business to the map rather than outright spam, as you seem to have recognised by replying to it.

The wiki page is mostly talking about user descriptions, which is an entirely different problem: they’re much less visible, so they don’t get found so quickly. The flip side is that they’re less of a problem because most people never see them.

Personally, I find that my diary RSS feed contains spam with some regularity. Of course, it has usually been removed by the time I check. So that particular issue is specific to the feed reader use case (where the entry doesn’t get removed once it has been downloaded).

Are there stats on how many accounts per month end up being deleted because of spam, and how long it takes from account creation to account deletion? It would also be good to know about spam diaries and comments. What’s the monthly % of spam accounts/content created vs non-spam?

With this information it would be easier to understand if this is really a problem or just a perception.

There are no such statistics, and as the reason for closing accounts or hiding diary entries is not recorded there is no way to generate them.

Uhm, then maybe it would be worth considering a way/system to document some stats (that don’t require dev work ideally) so an informed conversation can happen.

My personal opinion is that without some numbers it’s impossible to know if there is a problem, and if so how big it is or the sources of the problem.

One idea would be to update a spreadsheet each time an account/content deletion is done; if that’s a time burden, maybe someone can help implement something on the system to keep a log.

Assuming these spam accounts are being deleted, you could download https://planet.openstreetmap.org/users_deleted/ on a daily basis, and run diffs every day.
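A rough sketch of that daily diff, in case it helps. It assumes you have already saved two snapshots of the list as text, and that each snapshot has one numeric user ID per line with any header or comment lines being non-numeric; I have not verified the exact file format, so treat that as an assumption. The function names `parse_ids` and `newly_deleted` are made up for illustration.

```python
def parse_ids(snapshot_text):
    """Collect the numeric user IDs from one snapshot.

    Assumption: one ID per line; anything that is not purely digits
    (blank lines, comments, headers) is skipped.
    """
    ids = set()
    for line in snapshot_text.splitlines():
        line = line.strip()
        if line.isdigit():
            ids.add(int(line))
    return ids


def newly_deleted(old_text, new_text):
    """Return the sorted IDs that appear in the newer snapshot only."""
    return sorted(parse_ids(new_text) - parse_ids(old_text))


if __name__ == "__main__":
    yesterday = "# example header\n101\n102\n"
    today = "# example header\n101\n102\n205\n309\n"
    print(len(newly_deleted(yesterday, today)), "accounts deleted since the last snapshot")
```

Counting the result per day would give the upper limit mentioned below, since the list does not say why an account was deleted.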

Probably a few users also triggered an account deletion on their own, but for the most part, it’s likely to be spam accounts. This might be a good indication to get some upper limit on deleted accounts for a given time interval.

But even with those numbers, it’s absolutely not clear what you would do with them, and at what point you would consider it an issue. As always, users have zero tolerance for spam, so any amount of spam will always get you some annoyed users.

I think you grossly underestimate the number of users that delete their own accounts.

1 Like

If people have time to spend writing code then there are far better things they could spend it on to make the moderation process easier than collecting statistics!

Collecting statistics would actually make life worse, by adding an extra step (to choose a reason) when an action is taken.