The "OSM Standard tile layer" looks wrong (white lines, abusive comments etc.)

I hope we can agree that drawing roads thousands of miles long with obscenities on them is so clearly in the “not OK” bucket that it doesn’t need discussion. Let’s leave the “squishy gray area” stuff to the humans already adjudicating edits and put some controls in to protect against the “obviously wrong” stuff.

I find it hard to believe that there aren’t people “internal to OSM” (whatever that means) with this type of expertise.

It’s a strawman argument that someone is going to come in like a bull in a china shop and break things with some kind of ham-fisted approach that’s wrong for our community. Yes, by all means let’s not ask for help because we’re afraid we won’t get good help.

…which is a case that can easily be resolved with the right access and privilege controls and the ability for trusted overseers like yourself to increase user privileges in exceptional cases when it’s assessed that users are editing in good faith.

Yes, securing and protecting our data while still maximizing the ability of users to contribute and minimizing the burden of administering it all is hard work. But, if done right, it will be less work than the constant whack-a-mole and bad press we get every time some potty-mouth kid at a keyboard has the brilliant idea to draw long lines and type in obscenities.

Which, by the way, could easily be detected with some basic heuristics, like a dirty-word list applied to new users that triggers an automatic account lockout. We won’t get it right immediately, but I’m sure we could compile a list of triggers that are clear vandalism and work out the false positives over time.

Imagine:

“Hello, NewUser123. This is the automated system at OpenStreetMap. Your account has been locked because your account is new and we detected edits that appear to be vandalism. If you think this message is in error, please email data@openstreetmap.org and reference ticket number 123456789”
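
To make that concrete, here is a rough sketch of the kind of heuristic I have in mind (Python, with a placeholder word list, an arbitrary account-age threshold and a made-up lockout action - none of this is existing OSM API code):

```python
# Hypothetical sketch of a dirty-word heuristic for brand-new accounts.
# The word list, account-age threshold and the lockout action are all
# assumptions for illustration, not existing OSM API code.
import re
from datetime import datetime, timedelta, timezone

BLOCKLIST = {"badword1", "badword2"}      # placeholder terms
NEW_ACCOUNT_AGE = timedelta(days=7)       # arbitrary threshold

def is_new_account(created_at: datetime) -> bool:
    return datetime.now(timezone.utc) - created_at < NEW_ACCOUNT_AGE

def contains_blocked_term(text: str) -> bool:
    # Whole-word matching to limit false positives (the classic
    # "Scunthorpe problem").
    words = re.findall(r"[^\W\d_]+", text.lower())
    return any(w in BLOCKLIST for w in words)

def review_edit(account_created_at: datetime, tag_values: list[str]) -> str:
    if is_new_account(account_created_at) and any(
        contains_blocked_term(v) for v in tag_values
    ):
        return "lock_account_and_open_ticket"   # hypothetical action
    return "accept"
```

Exactly because of false positives, something like this should only ever lock the account pending human review, never delete or revert anything on its own.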

2 Likes

Gentlemen, please tone it down a notch. You are both right, in a certain sense.

Brian has a point that we do need security experts to advise us what kind of measures we should apply to prevent large-scale vandalism…

…but then, Andy is also right that we first ought to define “business requirements” for how to separate potentially harmful activity from normal good-faith editing practices, identify gray areas, specify use cases where exceptions may apply (mapathons which involve newbies working under the auspices of experienced users, alternative accounts for automated editing…) and so on.

Before the recent string of incidents (Ukraine/Russia name clashes; vandalism from new accounts belonging to experienced long-term abusers; to name the most prominent ones), our basic defense was “assume good faith”, trusting that malevolent actors are just not interested in OSM and that any revert or DWG intervention will be sufficiently quick. However, that worldview is obviously too naive for today’s world, and we have to work in both directions (cybersecurity expertise and subject-area knowledge) to reduce the future risk.

9 Likes

In my experience this is a problem the OSM community faces quite often: the outsiders who would like to help (be it with validation, data sources, imports, software development or even communications work) are not as useful as we (or they) would have hoped. It takes intentional effort to learn how the OSM community operates, due to it being a semi-anarchy. Having to hand-hold outsiders slows things down.

There are people who can easily get into any new field and not experience any mental blocks, but they’re rare.

4 Likes

If it can’t be done in English, someone will use their own language, so we’ll have to include many languages and the dirty-word list will get very, very long.

That was my first thought, but I have no idea whether such a list is workable in the real world.

Otherwise it sounds like a good idea to me.

Also, limiting a changeset to 1 km² (which sounds rather complicated to implement) and limiting way length (between two connected nodes) to 1 km both sound reasonable to me.
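
For what it’s worth, the geometry side of those limits is cheap to check on its own; here is an illustrative sketch using the haversine formula, with the 1 km / 1 km² thresholds from above and everything else (function names, data shapes) invented:

```python
# Illustrative check for the suggested limits: segment length <= 1 km and
# changeset bounding box <= 1 km^2. Thresholds come from the post above;
# everything else is hypothetical.
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_M = 6_371_000

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in metres."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

def segment_too_long(node_a, node_b, limit_m=1_000):
    # node_a, node_b are (lat, lon) pairs of two connected nodes.
    return haversine_m(node_a[0], node_a[1], node_b[0], node_b[1]) > limit_m

def bbox_too_large(min_lat, min_lon, max_lat, max_lon, limit_m2=1_000_000):
    # Approximate the changeset bbox as height x width at mid-latitude.
    height = haversine_m(min_lat, min_lon, max_lat, min_lon)
    mid_lat = (min_lat + max_lat) / 2
    width = haversine_m(mid_lat, min_lon, mid_lat, max_lon)
    return width * height > limit_m2
```

The hard part is of course not the arithmetic but deciding where in the pipeline such a rule should run and which legitimate edits (long ferry routes, power lines, boundary work) would need exemptions.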

I actually don’t have to imagine because there already is analysis of “new user edits” (over and above the rate limiting at the API level) and yes, as a number of new users can testify, that has led to automatic account blocks.

Unfortunately it’s a little more complicated than “some basic heuristics like a dirty word list”. Taking this account as an example (one of the three involved here), it started vandalising at 2:08 UTC and finished at 2:09. The DWG detected it as a problem at 2:11. The vast majority of the data was reverted within 68 minutes of that (there were, as noted above, a couple of objects that got missed)…

Looking at the initial vandalism by the account above, https://www.openstreetmap.org/relation/6336333/history/144 wouldn’t have triggered even George Carlin’s bosses back in the day, and even if it had, it would have bought us a whole 120 seconds of extra notice. The challenge with map edit-based blocking is that it is by definition after the event, but the alternative (at the API level, or even somehow at signup) is much harder to do. If anyone thinks they can solve that problem, then all suggestions would be gratefully received, but it really needs to be testable code rather than just ideas.

To get back to the original point

I actually tend to agree with that - but more to help e.g. the board think through the effects of some decisions. For example, the tile CDN allows OSM’s “standard” layer to be slurped for free by all sorts of organisations, and by suppliers to organisations, which immediately cuts out of the loop anyone who might have asked “so how up to date do you want the data on your website to be?” and talked through the implications of the answer to that question. The net result is that the DWG then gets lots of emails from people who are upset by what the “free pony” they were given previously is now saying.

Edit: “changeset-based” changed to “map edit-based” above because almost no-one consuming OSM data feeds is waiting until changeset closure; it’s all based on minutely diffs. There’s a separate feed of changeset metadata, but that won’t help you find e.g. a turn restriction relation that has grown dozens of extra objects in it.
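
Purely as an illustration of what “testable code rather than just ideas” could look like (this is not DWG tooling, just a sketch): a scan over a minutely osmChange file that flags relations carrying an unusually large number of members and ways with implausibly long name values. A diff alone doesn’t carry the previous version of an object, so this flags absolute size rather than growth; the thresholds and the follow-up are invented.

```python
# Sketch of a diff-based check (illustration only): scan a minutely
# osmChange file and flag relations with very many members, or ways whose
# name tag is implausibly long. Thresholds are made up for the example.
import gzip
import xml.etree.ElementTree as ET

MEMBER_LIMIT = 200        # arbitrary "this relation is suspiciously big"
NAME_LENGTH_LIMIT = 120   # arbitrary "nobody names a road like this"

def scan_osc(path: str):
    suspicious = []
    with gzip.open(path, "rb") as f:
        root = ET.parse(f).getroot()       # <osmChange> document
    for action in root:                    # <create>/<modify>/<delete>
        if action.tag not in ("create", "modify"):
            continue
        for elem in action:
            if elem.tag == "relation":
                members = elem.findall("member")
                if len(members) > MEMBER_LIMIT:
                    suspicious.append(("relation", elem.get("id"),
                                       f"{len(members)} members"))
            elif elem.tag == "way":
                for tag in elem.findall("tag"):
                    if tag.get("k") == "name" and len(tag.get("v", "")) > NAME_LENGTH_LIMIT:
                        suspicious.append(("way", elem.get("id"),
                                           "implausibly long name"))
    return suspicious

# Usage: flagged = scan_osc("123.osc.gz"); hand the flagged IDs to a human.
```

As noted above, anything like this is still after the event; the best it can buy is a faster alert.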

5 Likes

I think that we have a cybersecurity problem but it is by design. You can’t be safe against anything if you let everyone edit everything without any cross-checks. And we have currently designed our project such that it is easy to edit - for everyone, even the bad guys.

I think it is a fallacy to assume that any degree of cybersecurity expertise will get us to a point where we can enjoy all that is good about “everyone can edit everything” while magically getting rid of the unwanted side effects.

There are a few obvious things that could be rejected by the API before they are ever accepted in the database, with no loss to standard mapping uses, but these don’t need a cybersecurity expert to find - these need a good coder to build a validation engine into the API in a way that doesn’t suck all performance out of it :wink:
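
To illustrate the shape of what I mean (the real API is Rails plus cgimap, so treat this Python as a sketch only): a list of cheap, pure rule functions run over an upload before anything is written to the database. The two example rules mirror limits the API already enforces; new rules would slot in the same way.

```python
# Shape of a pre-commit validation pass (illustrative only; the real OSM
# API is Rails/cgimap). Each rule is a cheap, pure function so the whole
# pass can run before anything is written to the database.
from typing import Callable, Optional

Rule = Callable[[dict], Optional[str]]   # returns an error string or None

def no_giant_ways(element: dict) -> Optional[str]:
    if element.get("type") == "way" and len(element.get("nodes", [])) > 2_000:
        return "way has more than 2000 nodes"          # limit the API already has
    return None

def no_oversized_tag_values(element: dict) -> Optional[str]:
    if any(len(v) > 255 for v in element.get("tags", {}).values()):
        return "tag value longer than 255 characters"  # limit the API already has
    return None

RULES: list[Rule] = [no_giant_ways, no_oversized_tag_values]

def validate_upload(elements: list[dict]) -> list[str]:
    """Return all rule violations; an empty list means the upload passes."""
    return [
        f"{e.get('type')}/{e.get('id')}: {msg}"
        for e in elements
        for rule in RULES
        if (msg := rule(e)) is not None
    ]
```

The performance question is exactly which rules are cheap enough to run inline on every upload and which need to stay in after-the-fact monitoring.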

19 Likes

The Fastly CDN has an option to purge the entire cache, but I guess this wasn’t used because it might turn the vandalism into a DDoS attack on the render servers and degrade the service for everyone.

Instead, Firefishy has specifically written some code for more targeted invalidation in response to this:

I suppose this will also help with future incidents.

There is an open issue to also invalidate tiles on the CDN when expiring them on the render server, but API limits at Fastly might be too tight for our use case:
operations#947 Fastly soft purge tiles based on diff updates
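
For context, a single-URL soft purge against Fastly looks roughly like the sketch below (an HTTP PURGE request carrying the Fastly-Soft-Purge header; the tile URL and token handling are just examples). Needing one such request per expired tile is presumably where the API limits mentioned in the issue start to bite.

```python
# Rough illustration of a per-URL Fastly soft purge (one HTTP PURGE request
# per tile). The tile URL below is only an example.
import requests

def soft_purge(url: str, api_token: str | None = None) -> int:
    headers = {"Fastly-Soft-Purge": "1"}   # mark stale instead of evicting
    if api_token:
        headers["Fastly-Key"] = api_token  # needed if purging is restricted
    resp = requests.request("PURGE", url, headers=headers, timeout=10)
    return resp.status_code

# Example: soft_purge("https://tile.openstreetmap.org/15/16384/10890.png")
```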

3 Likes

The operations team discussed the “purge all” option. The response paraphrased was “hell no” as it would indeed melt the cache servers for an unknown period of time.

The cache invalidation code I added works on tile request: it invalidates the cache of any tiles which originate from before the vandalism was reverted, and a fresh copy is pulled from the backend and cached. It will likely be useful for any future incidents.
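
For anyone curious about the idea (this is not the deployed code, just a sketch with an invented timestamp): on each tile request, any cached copy generated before the clean-up finished is treated as stale and refetched from the render backend.

```python
# Not the code that was deployed, just the idea: on each tile request,
# treat any cached copy generated before the revert finished as stale and
# refetch it from the render backend. The timestamp here is a placeholder.
from datetime import datetime, timezone

REVERT_FINISHED = datetime(2024, 1, 1, 4, 0, tzinfo=timezone.utc)  # placeholder

def cached_tile_is_stale(tile_generated_at: datetime) -> bool:
    """True if the cached tile predates the end of the clean-up."""
    return tile_generated_at < REVERT_FINISHED

# A stale hit is forwarded to the backend and the fresh tile is cached as
# usual, so only tiles that are actually requested get re-rendered.
```

Because it is request-driven, only tiles people actually ask for get re-rendered, which is what keeps it from melting the render servers the way a full purge would.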

10 Likes

I kind of suspect that the better approach would be the other way around: give the DWG/Ops the facility* to stop rendering, and maybe replication, until clean-up has finished. In other words, avoid rendering vandalised data instead of having to clean up in two places (halting replication is likely a bit problematic, but could at least be thought through).

* as in “one click”.

4 Likes

Indeed, the delay between mapping something and seeing it on the Standard view is very short. The alert chain must react quite quickly.

While the replication delay is very low, higher zoom tiles only get rendered on demand, so there can be quite a substantial delay between a data change and an affected tile actually being regenerated.

Imagine something like OSMCha consuming diffs and turning off rendering if it detects something seriously suspicious; that would reduce the reaction time to ~1 minute or so, at which point I suspect almost no tiles would have been re-rendered (yes, that creates a nice DoS opportunity, but TANSTAAFL).
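
A toy version of that control loop, just to show how little machinery it needs (the get_latest_diff, scan_osc, pause_rendering and notify hooks are stand-ins for whatever detection logic and one-click switch Ops might actually provide):

```python
# Toy version of the control loop described above: poll minutely diffs,
# and if a scan flags something clearly bad, hit a "stop rendering" switch
# and page a human. All four callables are hypothetical stand-ins.
import time

def monitor(get_latest_diff, scan_osc, pause_rendering, notify):
    while True:
        diff_path = get_latest_diff()      # e.g. download the newest .osc.gz
        if diff_path is not None:
            findings = scan_osc(diff_path)
            if findings:
                pause_rendering()          # stop serving freshly rendered tiles
                notify(findings)           # a human reviews and reverts
        time.sleep(60)                     # minutely replication cadence
```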

That wouldn’t protect 3rd parties consuming diffs as set up now, but you could naturally add, say, another 5 minutes’ delay before making diffs available on planet.osm.org. This doesn’t solve the issue that everybody consuming diffs would still invalidate a hell of a lot of tiles (once replication was restarted) in a vandalism situation like the one discussed here, but I suspect most would still prefer that.

There absolutely are use cases (e.g. mapper feedback) that mean that delaying diffs on planet.osm.org would be a bad idea - but people consuming those diffs (or worse - just using the “free” tiles at tile.osm.org against the spirit of https://operations.osmfoundation.org/policies/tiles/) absolutely need to know that their decision to use data updated on the fly was a conscious one. Delaying updates is absolutely doable (and directly supported by the raster tile toolchain - I’ve even documented it at switch2osm.org :slight_smile: )

Many of the correspondents who complained to the DWG about the recent round of vandalism were from large commercial companies (or their customers) who presumably based their products around “free” tiles because it was cheaper than actually asking the questions “how up to date should the background map underneath our product be?” and “what are the advantages and disadvantages of the various options?”. I’m sure that most or all could afford their own server infrastructure and the expertise to run it; they chose instead to use resources from a volunteer organisation because it was better for their bottom line.

5 Likes

Yes and no.

The thing is, one shiny day, far in the future, when the OSMF provides GDPR-compliant planet dumps and diffs, instead of providing PI to people who neither want nor need it and immediately throw it away in any case (not touching on the fact that their DP officers would have a fit if they realised the situation), you are going to have a second set of diffs that could easily be delayed and/or paused in scenarios like the one we had here, without impacting mapper feedback.

1 Like

OK, we now need a “flying pig” emoji to go with the :popcorn: one…

5 Likes

In this case I prefer:

1 Like

On the ‘upside’ of the International Rescue efforts, I’ve noticed in the last few days that the Carto 2 km-5 km scales and zooms further out seem to be updating daily (or nightly) now instead of only on Friday or Saturday night. Goodoos

3 Likes

Yes, the OWG (Operations Working Group) had already improved a couple of things in response to past incidents. My subjective selection of what I noticed and found as an outside observer:

API:

  • rate limiting (see the sketch after this list) for
    • signup requests
    • changeset comments
    • edits
  • account deletion cool-down period

Rendering:

  • daily low zoom (0-12) render (chef#627, chef@6410e8b)
  • switched to osm2pgsql expiry and osm2pgsql-replication (operations#987)
    • should fix almost all previous edge cases where some changes hadn’t expired the affected metatiles on the render server, especially:
      • relation changes
      • ways crossing without nodes (relevant here)
    • unfortunately, as more tiles get expired now, the servers odin and ysera can’t really handle the increased load, which has been an issue here
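
For readers unfamiliar with the mechanism, one common way to implement rate limiting like this is a token bucket per account or IP; a minimal generic sketch (not how openstreetmap.org actually implements it, and with invented numbers):

```python
# Minimal generic token-bucket rate limiter of the kind used for signups,
# changeset comments and edits. Capacity/refill values are invented for
# illustration, not the limits actually configured on openstreetmap.org.
import time

class TokenBucket:
    def __init__(self, capacity: float, refill_per_second: float):
        self.capacity = capacity
        self.refill = refill_per_second
        self.tokens = capacity
        self.updated = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.refill)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

# e.g. one bucket per account: TokenBucket(capacity=10, refill_per_second=0.1)
# allows a burst of 10 actions, then roughly one every 10 seconds.
```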

Many thanks to the OWG for your work and the constant improvements!

8 Likes

Bravo, sir: that does seem a, or even the, correct time, place and manner (to own / direct such a major sluice gate of data). If we ever find ourselves in that future, that is. So, thank you for a revealing place to introduce what, in a radio broadcast or national uplink feed, might be the 15-second (10-second, one-hour…) delay.

There truly are times, places and manners to do this (bulk processing, including or only delaying, of live, big data). Does OSM have the computational, automation and human “sift through things” resources (to discern GDPR-ness) to do this now? It seems not, as having / providing sophisticatedly-processed planet diffs is described as “one shiny day, far in the future”, not now.

Maybe there is something I don’t understand, but that sounds like starting with a false premise and basing something upon it, so I’m confused. I’m impressed with learning that “this would be the way to do it, with a slight delay to planet diffs…” is the sort of sluice gate that it is. I like learning new things. API and Rendering tactics, neat-o. I’m sure there’s more and that’s awesome.

Pigs likely don’t fly (it does seem like deep computation or human evaluation on a lot of incoming data), so now that we know a good “choke point”, how do we generate these magical diffs? How do we discover bad actors and vandalism? These questions are highly related, and a lot of great work has been done by genuine heroes in our project who cleverly fit certain pieces of it together; the whole “Shields up!” effort makes a real difference. I once again salute those who defend our data with serious, effective, ongoing efforts.

It is worth it to do so. I also glean that it is a fair bit of effort on our part. Keep it up, everyone, and thank you. We all play a part in this. Good discussion about this is healthy. The multi-pronged, plastic, flexible, nimble, smart, clever, stay-ahead-of-bad-actors approach is working. And while there might be no rest for the wicked (bad actors, vandals…), there is also no rest for the vigilant. I love that so many of us care so deeply about our project.

It’s really strange to read the code discussion and commit from Dec 2023 but only see the effect on zoom 0-12 in the last week, and Ctrl+F5 IS my go-to for getting a stale cache refreshed.

6 posts were split to a new topic: Personal Information and GDPR