Cyber attacks in the OSM space

By its name, a cyber attack.


A cyber attack refers to actions carried out via unauthorized access. That isn’t the case here. This is plain vandalism.


I’ll cite Wikipedia:

A cyberattack is any offensive maneuver that targets computer information systems, computer networks, infrastructures, personal computer devices, or smartphones.

I would assess that the biggest cyber threat to the project comes from users who create accounts and then conduct malicious edits through the front door.


No. The end result is that the geodatabase – the product that we fundamentally provide to the world as a public good – is still damaged.

OSM isn’t a hobby project anymore; it’s a serious and valuable asset that powers apps and services people rely on. It’s also seen as a source of geo-truth and is in some cases in conflict with the preferences of governments in less free parts of the world. So there’s plenty of potential future conflict from people with more of an axe to grind than “mad at Andy”.

These kinds of garden variety trolls by themselves aren’t that interesting. What concerns me is how they highlight that a malicious actor can spend a small amount of their time to waste a large amount of time from others in cleaning up after them. It’s a huge power imbalance and it’s one that’s ripe to be exploited.


Which continues:

An attacker is a person or process that attempts to access data, functions, or other restricted areas of the system without authorization, potentially with malicious intent.

Yes, but that is exactly what we want, just minus the malicious aspect. None of the “attackers” are accessing parts of OSM that require privileged access.

Now, semantics aside, I believe we can all agree that measures that work in more “conventional” settings are not going to be a big help in our scenario. Worst case, we just kick off an arms race. That’s why we need measures that don’t merely raise the bar a bit, but that deny attackers the success of seeing their vandalism on widely used map services.


It’s true that the primary vector for malicious behavior is “authorized access” in a very narrow semantic sense. But it’s also not true that “users are authorized to add fake and obscene geodata”, regardless of the lack of technical controls preventing them from doing so. A separate example: I think anyone would call a distributed denial-of-service attack a cyber attack, even though it requires no unauthorized privileged access to execute.

I use the term “cyber attack” deliberately because I think it’s a mistake for us to group the kind of malicious edits that we’re discussing in this thread separately from other attack vectors that we traditionally take more seriously. I think protecting our data needs to be a holistic consideration.

I’m also increasingly of the opinion that our problem set is fairly unique in that the data is massively crowd-sourced, the “undo” button is slow and labor-intensive, and the problems manifest easily and dramatically. So it’s not like we can just apply a standard template for data protection.

While that is true, using the term “cyber attack” invites a knee-jerk response of “we need a cyber security specialist”, when practically nothing in the repertoire of a cyber security specialist will be what we need. Strong passwords? Antivirus software? Don’t click on attachments in your emails? Make backups? Encrypt your traffic? Review your network?

I therefore prefer the term “vandalism”, which is also a better match for the type of rather brainless attacks we’ve seen: none of them trying to somehow “break OSM” or abuse OSM towards some sinister goal, just crazy nationalism or individuals sulking because we kicked them out.

I’d reserve the term “attack” for when state actors come sending their hacker armies, something that will certainly happen some time, but hasn’t yet.


This is a hyper-simplistic and wrong description of what cyber security entails. What we need is a holistic analysis of the project’s assets and threats, and a measured and thoughtful approach to data protection that protects our ability to crowd-source geodata.

The only knee-jerk reactions we have are rapid attempts to band-aid in controls while attacks are happening, followed by a lull of complacency in which we pat each other on the back until the next attack.

Pooh-poohing an entire field of study only serves to demonstrate the project’s cavalier attitude towards data protection and its lack of recognition that the world and the threat environment have changed.


should vandalism of Wikipedia articles also be called a cyberattack?


I think we should move from a discussion about semantics to the pressing concern that ZeLonewolf actually wants to stress, namely: the OSM database currently has no protections whatsoever against the possibly sophisticated acts of vandalism to come, which may be far worse than anything OSM has seen in the past.

So it is this:

that we should think about, not if we want to call it an attack or not.

And, as an addendum:

Even worse than acts of vandalism would be a situation like, for example, a ransomware attack against the planet OSM database that infects mirrors and backups as well and immediately halts any mapping activity. That is a cybersecurity threat.


The OSM data is full of errors and inaccuracies, and on the mapping side we cannot prevent that without impeding the most important data source: the mapping crowd. Some measures against mapping errors can be taken; some are in place and effective. Most importantly, given the variety of mappers and the BYO-tags policy, there is amazing (though far from complete) unity in basic tagging. Still, the data is inherently flawed. Intentional attacks, introducing false data, are a very visible form of flawed data.

The consumer side (users of apps and applications, non-mappers, persons and organisations) requires a consistent, non-flawed, stable product with verified updates: reliable and available. Content flaws in maps are often accepted as long as they are not too obvious and do not touch the core functionality of the application.

From a helicopter perspective (where I am in the helicopter wearing a business manager’s cap), OSM data is good enough as a background map; for routing and navigation it is acceptable, but it’s way too unreliable for core functionality in other end user apps.

For OSM to open up the larger world of end-user map data applications, it must offer more stable, more reliable, more available data with quality assurance. That means it cannot happen that one malicious common user so easily corrupts the delivered end product and hits all the end-user applications. If OSM does not offer that, this larger world is not within reach. There is a strategic choice here, and I am not sure this choice has been made.

Any solution has to keep the mapper side as open as possible.

That is why I would prefer to solve this conundrum at the data delivery side. And I don’t think it suffices to point out that anyone can set up a filtering system and a cleaned-tile server, because the decision makers will then, as they do now, generally prefer a paid service with no-attack guarantees.


This is precisely why Overture has appeared. It addresses this exact need, for quality assurance of data provided to end users. What I find interesting as an economist is that the current OSMF budget for supporting the project is roughly a million dollars, while the Overture budget for building a cleaned-up version of OSM data augmented by additional data from other sources is multiple millions of dollars.


There are important differences between good-faith errors and intentional vandalism. Sometimes accidents are widely felt too, but we look at them in a different light than vandalism. We can mitigate good-faith errors in a variety of ways without affecting the distribution mechanism: user education, improved editor usability, better documentation, tagging scheme reform. We have room for improvement in all these ways, but we’ve done a good job of managing accidents in general.

The concern expressed in this thread is that we’ve been slow to respond to the rise in malicious edits with systematic measures. We have some guardrails against casual vandals, like iD and Rapid “locking” features that have Wikidata tags, suggesting some degree of notability. But to the extent that we have any countermeasures against more persistent vandals, they’ll inherently grow outdated over time.

Today, our first line of defense against persistent, high-profile vandalism is a relatively small group of elite mappers who know how to use a third-party service[1] to detect spot fires and various arcane revert tools to extinguish them. It’s easy to see how the rate of vandalism could potentially outstrip our capacity to fight it as OSM becomes more prominent and vandalism techniques also become more widely known. We can’t say for sure if or when this will happen, but planning for this scenario wouldn’t be wasted effort, because it would also cut down on the effort to fight ordinary vandalism, which we currently take for granted.

In discussions about the future of countervandalism, we’re quick to dismiss approval systems, no matter how nuanced. This is only natural, because many of us are familiar with the approval systems of other platforms such as Google Maps and we don’t want to be like them. Gating volunteer contributions behind an approval system can fail too, sometimes disastrously. In any case, this is a sledgehammer when perhaps there are scalpels we haven’t considered yet. My suggestion would be to investigate the countermeasures employed by similarly situated projects that also need to maintain a high degree of openness despite being prominent targets for vandalism.

To pick on a project I’m familiar with, many of the technical measures against vandalism in the MediaWiki ecosystem are responses to common behavioral patterns that the Wikipedia community has identified over the years. These measures also ended up helping the project deal with other problems besides vandalism. Abuse filters block many attempts at vandalism but also block SEO spam, a scourge we’re also familiar with. CheckUser helps administrators block sockpuppets whether for block evasion or for astroturfing in a content dispute. Revision scoring not only catches subtle vandalism but also lets Wikimedia claim that they’ve been using AI all this time. :smile:

Our analogues to these tools would look different due to our data model, but they wouldn’t be impossible, and I think it would be possible to avoid a backlash over egalitarianism or transparency.
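To make the analogy concrete, here is a minimal, hypothetical sketch of what an abuse-filter-style rule set over OSM changesets could look like. Everything here is an illustrative assumption, not an existing OSM API: the rule names, the thresholds, and the changeset summary fields (`deleted`, `account_age_days`, `new_name_tags`) are all invented for the example.

```python
import re

# Hypothetical rules over a simplified changeset summary; a real
# filter would derive these fields from the OSM API or a
# replication feed, and the thresholds would need tuning.
RULES = [
    ("mass deletion by new account",
     lambda cs: cs["deleted"] > 500 and cs["account_age_days"] < 7),
    ("profanity in name tags",
     lambda cs: any(re.search(r"\b(?:badword|slur)\b", v, re.I)
                    for v in cs["new_name_tags"])),
]

def flag_changeset(changeset):
    """Return the names of all rules this changeset trips."""
    return [name for name, rule in RULES if rule(changeset)]

suspicious = {
    "deleted": 1200,
    "account_age_days": 1,
    "new_name_tags": ["Some City"],
}
print(flag_changeset(suspicious))  # → ['mass deletion by new account']
```

As on Wikipedia, such rules would be a first sieve for human reviewers, not an automatic block; the point is only that pattern-based filtering transfers to our data model.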

  1. Or does OSMCha count as second-party because of its OSMUS sponsorship? You get my point. ↩︎


I disagree only with this point. We are very fast to respond to malicious edits. It’s just that the tools we have available to respond to a malicious edit are increasingly insufficient to prevent harm to our data and reputation.


If you’re really paranoid you could look at the backdoor that a rogue maintainer put into xz, or at the deepfake impersonators that have been popping up recently, who could be used to disrupt the board or a working group.

In my opinion, the description of

is pretty much accurate and close to what’s in the Yellow Pages thereunder.

If there are studies that go beyond, it is certainly not crowding the market place, perhaps some “basic research”?

So true. My point goes one step further: we have to accept harm to the data, because we do want to keep free and open public editing. Intentional abuse cannot be fully prevented without closing the door, and that would be the end of OpenStreetMap as we know it.
The question is: (how) can we prevent seriously compromised data from reaching end-user applications, causing unacceptable displays (the current case) and broken functionality (less obvious but just as easy)?

End-user application providers face more or less the same problem: currently, they have to assume that serious errors can be present in their data source. If they refresh very often, they get all the improvements fast but are vulnerable to major errors; if they delay the refreshes, they lose currency (important for routers).

Note that at this point there is no difference between intentional and unintentional harm to the data. The criterion is: how bad/harmful/unacceptable is it for the end-user applications using the data?

I think providing a cleaner, more protected source would be the best overall solution. Besides the opportunity to prevent distribution of harmful data, there are other benefits, which are discussed in another topic, I believe.

The Foundation has started to seek out financial support for a three-tiered solution:

  1. A tool to isolate and throw out all of the edits of one or more users in one step
  2. A downstream database that keeps only those objects and object properties (tags vs coordinates) that are stable over a time X, with X being a day, a week, or a month
  3. More tooling to track ongoing edits
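As a rough illustration of tier 2, here is a minimal sketch of such a “stable view”, assuming each object carries the timestamp of its last edit; the function name, the field layout, and the seven-day default window are all invented for the example:

```python
from datetime import datetime, timedelta

def stable_view(objects, now, min_age=timedelta(days=7)):
    """Keep only objects whose last edit is older than min_age.

    `objects` maps an OSM object id to its last-edit timestamp.
    A real implementation would track tags and coordinates
    separately, as the proposal distinguishes object properties.
    """
    return {
        osm_id: ts
        for osm_id, ts in objects.items()
        if now - ts >= min_age
    }

objects = {
    "node/1": datetime(2024, 1, 1),  # long stable, survives
    "node/2": datetime(2024, 3, 1),  # edited yesterday, held back
}
now = datetime(2024, 3, 2)
print(sorted(stable_view(objects, now)))  # → ['node/1']
```

The design trade-off is explicit: a longer X gives vandals less chance of ever appearing in the downstream product, at the cost of delaying legitimate improvements by the same amount.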

I expect those things to take two to three years to come to fruition. The good thing about the federated nature of OpenStreetMap is that everybody is encouraged to produce third-party tools more quickly.

For example, I’m working on that tool to throw out a user’s edits in one go, robust against arbitrary user stacking, partial reverts, and any changeset shaping. It is based on Overpass spitting out an OSM file that can be opened and applied in JOSM. That is not a great UX, but it is definitely the best step to figure out which finer semantics make sense when the time comes for a feature in openstreetmap-website.
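For readers unfamiliar with that workflow, a sketch of the Overpass side might look like the following snippet, which builds an Overpass QL query selecting every object whose *current* version was last touched by one of the given users (`out meta` keeps the version and changeset metadata JOSM needs). The helper name is mine, and a real revert tool needs much more than this query captures: older versions, split ways, and the partial reverts mentioned above.

```python
def overpass_revert_query(usernames, timeout=180):
    """Build an Overpass QL query selecting every node, way, and
    relation whose current version was last edited by one of the
    given users. The result can be saved as an OSM file and
    inspected in JOSM."""
    clauses = "".join(
        f'  {t}(user:"{u}");\n'
        for u in usernames
        for t in ("node", "way", "relation")
    )
    return f"[out:xml][timeout:{timeout}];\n(\n{clauses});\nout meta;"

# "ExampleVandal" is a made-up username for illustration.
print(overpass_revert_query(["ExampleVandal"]))
```

Note the limitation: `(user:...)` only matches objects where the vandal made the latest edit, which is exactly why user stacking (one account editing on top of another) needs extra handling.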

Again, that is expected to take one to three months before something becomes usable. We have already read a lot about the general mood, and it might make sense as a next step to actually implement things. The more third-party tools there are to be scrutinized, the better we understand the actually useful semantics.


I would note two points:

  • we already have a mechanism to redact edits at a very large scale (larger than anything that has happened in more than a decade); instead of creating something new, what about an effort to clean those tools up and make them run in the current environment?

  • we have plenty of existing tooling to detect bad edits; it just has the same problem any newly developed tooling will have: it won’t be run by the OSMF.


What exactly does “stable” mean? Unchanged?