Cyber attacks in the OSM space

And if we think of the “review” queue as nothing more than the current list of changesets, possibly filtered, that anyone can “review” today (and maybe revert if they know how), then what we’re left with is essentially a no-build solution: an extract of OSM elements with a timestamp older than a certain cutoff date (including deleted elements), and presumably a tileset and geocoder that use this extract instead of the usual minutely diffs. It’s not nothing, but if we want it to lower expectations about the rawer database, then we have to raise expectations about this extract so that it sees enough adoption.
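(For illustration only: a rough sketch of how such a delayed extract could be produced with pyosmium. The file names and cutoff date are made up, and this naive version simply drops objects touched after the cutoff instead of rolling them back to their earlier version; a real implementation would work from a full-history file, e.g. with osmium-tool's time-filter command.)

```python
# delayed_extract.py - a rough sketch, not a production tool.
# Keeps only objects whose last edit is older than the cutoff; objects touched
# after the cutoff are simply dropped rather than rolled back, which a proper
# implementation (working from a full-history planet) would have to handle.
from datetime import datetime, timezone

import osmium

CUTOFF = datetime(2024, 1, 1, tzinfo=timezone.utc)  # arbitrary example cutoff


class DelayedExtract(osmium.SimpleHandler):
    def __init__(self, writer):
        super().__init__()
        self.writer = writer

    def node(self, n):
        if n.timestamp <= CUTOFF:
            self.writer.add_node(n)

    def way(self, w):
        if w.timestamp <= CUTOFF:
            self.writer.add_way(w)

    def relation(self, r):
        if r.timestamp <= CUTOFF:
            self.writer.add_relation(r)


if __name__ == "__main__":
    writer = osmium.SimpleWriter("planet-delayed.osm.pbf")  # example output name
    try:
        DelayedExtract(writer).apply_file("planet-latest.osm.pbf")  # example input
    finally:
        writer.close()
```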

The more serious data consumers consume neither the minutely diffs nor the Standard tile layer and Nominatim endpoint that are based on these diffs. This safer OSM distribution would be geared toward the sort of casual data consumer that has been hotlinking the Standard tiles. Some of these data consumers have an unreasonably high expectation of this tileset and we’re hearing all about it; wouldn’t their expectations be even greater if they’re forced to actively migrate to another tileset that we market as safer? Do we think we can convince data consumers to use this tileset instead of one based on a derivative like Overture?

Something like that could make sense, but the problem is that the current notion of a ‘bounding box’ for changes in OSM only has a tenuous relation to the actual geographical extent of meaningful semantic change.
Simply adding a single node to a way causes the extent of the whole way to be added to the bounding box, which has the potential to be huge, even if the way itself has not been meaningfully changed in any way. The same applies to relations, where simply splitting a road could affect a continent-spanning route relation. Another source of big changeset bounding boxes are multiple small isolated edits with a large distance in between that are uploaded at the same time (e.g. StreetComplete changesets where a user makes some tiny changes to a POI, takes a flight halfway around the world and continues mapping there).
To do this properly, we would need to interpret the semantic content of changes and not just blindly count the affected OSM primitives. That is a huge challenge and an enormous amount of effort.

A simple check using the current bounding box definition could be of some use, but it would have to have very big continent-sized limits to keep the false positive rate down.
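To make that concrete, here is a minimal sketch of such a coarse check, assuming the changeset bounding box as the API currently reports it; the 4,000 km diagonal is an arbitrary "continent-sized" example threshold, not a recommendation.

```python
# bbox_sanity_check.py - a rough sketch of a coarse changeset size check.
# Uses the changeset bounding box as currently reported by the API; the
# threshold is an arbitrary "continent-sized" example value.
import math

EARTH_RADIUS_KM = 6371.0
MAX_DIAGONAL_KM = 4000.0  # deliberately huge to keep false positives down


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def changeset_looks_suspicious(min_lat, min_lon, max_lat, max_lon):
    """Flag a changeset whose bbox diagonal exceeds the (very generous) limit."""
    return haversine_km(min_lat, min_lon, max_lat, max_lon) > MAX_DIAGONAL_KM
```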

I don’t think so, at least not ones that are already willing to pay for something curated.

The OSMF could consider throwing its weight behind Daylight or one of these other ones instead of rolling its own ‘safe’ tiles. I would think most mappers still want Carto for quick visual feedback.

Any change in the API that allows rejection of uploaded changes will also need to consider the impact on editor software and the knock-on effects. Besides the obvious user interface changes, the possibility of an upload API call being blocked means that the ability for ‘chunked’ uploads spread out over multiple requests effectively has to be removed from the API. We absolutely do not want broken partial data as a result of one upload call out of a chunked upload being blocked while the rest succeeded.

I thought of changesets as the input side, before the actual change, but of course, after the database changes they are also groups of actually changed objects. So they are an entry point for review/clear-for-release after the fact, and you can apply algorithms and rules such as “should not span more than ?? km / km²”. I smell an AI coming… to raise yellow, orange or red flags and increase the holding time for urgent review by the pool of mappers with review clearance. And the control and engine room crew, because if it’s really a massive attack, decisions will have to be made. Reviewers can only do so much; if the queue grows beyond the review/repair capacity, distribution has to be shut down the old-fashioned way.
Reviewers perform triage:

  1. No critical harm, pass
  2. Harmful, repair/revert individual incident
  3. Critical / massive / vandalism: block and report / flag

This applies to the upload side of the changes, right? Measures on the data distribution side do not affect uploads, I think?
If an after-edit review results in a revert, it can affect others’ edits made in the meantime, but that is no different from now.

Daylight is or was just “OSM without certain categories of data” - it didn’t contain fixes to OSM data that existed only in Daylight. I assume that for each release a manual check was done for “things that might be problematic”. All fixes for problems detected by Facebook were made within OSM itself (which is good!) - which of course means that anyone can run a “staged release” policy for the map on their site using standard, documented OSM tools. You don’t need a “special”, cleaned set of data to achieve this.

(somewhat offtopic here, but based on comments elsewhere) I suspect they want “their own vision for OSM Carto” rather than what OSM Carto is now. :slight_smile:

Hopefully some of the subsequent comments have clarified things like what “reverts”, “redactions” and “the redaction bot” are. I’ve said (paraphrasing) that “consumers of diffs don’t need to explicitly worry about redactions**, because the reverts will be in subsequent planet files and in the feeds” - a set of “current OSM data” obtained from these will be “clean”.

However, if I was maintaining a service that allowed people to look at historical OSM data, and was effectively building an external historic mirror of OSM data, then I’d definitely want to make sure that it didn’t contain historic data that it should not - and I’d suggest that anyone doing that ask themselves that question. For the vast majority of consumers of OSM data who only want “what is in OSM now” it is a non-issue, though.

** an exception might be if moderators ever decided they needed to “unredact” data. Following discussion with other DWG folks, I’m not aware of a case when it has ever happened, and it’s difficult to foresee a situation when it would be needed.

One tool of an agile, vigilant system would be to block suspected editors for a certain period of time. The conditions and time lag could vary depending on the particular ‘turbulent period’.

For example, edits of a suspected editor could be rejected and the user blocked for 15 minutes, giving the DWG time to analyze the situation. The editor would receive a ‘soft’ message like:

 The OSM API was not able to process your changeset edit session.
 Please retry in 15 minutes. If the problem persists, contact the OWG.
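To make the idea concrete, here is a minimal sketch of such a temporary hold, assuming a hypothetical in-memory table of suspended users; in reality this would live in the database and be set by DWG tooling, and the 15-minute window simply mirrors the example above.

```python
# soft_block.py - a rough sketch of a temporary upload hold, not real API code.
# "suspended_until" is a hypothetical store mapping user id -> expiry time;
# in practice this would live in the database and be managed by DWG tooling.
from datetime import datetime, timedelta, timezone

HOLD = timedelta(minutes=15)
suspended_until = {}  # user_id -> datetime when the hold expires


def suspend_user(user_id):
    """Put a suspected account on hold for the review window."""
    suspended_until[user_id] = datetime.now(timezone.utc) + HOLD


def check_upload_allowed(user_id):
    """Return (allowed, message) for an incoming changeset upload."""
    expiry = suspended_until.get(user_id)
    if expiry and datetime.now(timezone.utc) < expiry:
        return False, ("The OSM API was not able to process your changeset edit session. "
                       "Please retry in 15 minutes. If the problem persists, contact the OWG.")
    return True, ""
```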

Originally it was at least considered that this would be possible for people who, in one way or another, decided to change their minds with respect to the licence change. In practical terms, however, most of that happened far too late to be of any real use, and that aspect was never pursued any further (aka “nobody wrote the code to do that”).

I would note that personal insults would be one of the categories of vandalism that I would consider redacting, as even just leaving it in the historic data hands the vandal a win.

In any case, my point at the beginning of this sub-thread was that we already have a mechanism that removes edits more or less completely from public view, and there is no reason to re-invent the wheel.

How can you possibly remove this ability?

I don’t think so. Every “save” to the API gets transferred, checked, and accepted or rejected. If it is more complex, it could be partly rejected.
Each save would be one changeset, but you could also cache the edits until the changeset gets closed and perform the check afterwards. For sure, the editors will need to understand the results of the check and inform the user, though that is nothing specific the API needs to consider.

Taking the actual case, affected people could ask OSM to remove that text from the entire database, not only from the ‘map’. It is probably not time-critical, though: it could be reverted first and redacted afterwards.

I see the danger of accepting parts of a coherent whole - e.g. a building gets drawn, the editor uploads four nodes and one way, the nodes are accepted but the way is rejected. In this particular case the editor could, with generous additional coding, know that the nodes and the way belong together and somehow clean up after, but in cases where e.g. an old building gets deleted and a new building drawn over it, a rejection of either the delete or the create operation would lead to undesirable results.

I was rather thinking that the way would get accepted, as well as the building=yes, but name=asshole would be rejected.

As a first step I would consider geometry as OK, knowing that you can do vandalism with geometry as well. But you could also take this further and kind of roll back the whole object. As I recall, that wasn’t the idea behind the redaction bot, since we wanted to keep as much data as possible.

I’m not sure where the current bounding box definition comes from, but I agree that it’s too broad and at best misleading. For validation purposes, I would argue that the “proper” (narrowly construed) bounding box for a changeset, considering only geometric changes (i.e. excluding tagging), should only include the extent of:

  • all nodes created
  • all nodes deleted
  • former and new positions of all nodes moved.

So basically, it should let you edit parts of very long ways without considering the entire way as the bounding box. Obviously, there are many edge cases to consider (e.g. when you split a way you do not create any nodes, but you appear to have edited one and created another way).

With “my” definition, adding two restaurants in distant parts of the world would still count as a “huge” bounding box, and I would argue that we want to prevent that. Tools such as StreetComplete should not allow such edits.
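To illustrate the narrower definition above, here is a minimal sketch, assuming the changeset has already been broken down into created, deleted and moved nodes (parsing the upload and edge cases like the antimeridian are left out).

```python
# narrow_bbox.py - a rough sketch of the narrower changeset bbox described above.
# Only node-level geometry changes contribute: created nodes, deleted nodes,
# and both the old and new positions of moved nodes. Tag-only edits and
# untouched way/relation members are ignored.

def narrow_bbox(created, deleted, moved):
    """created/deleted: iterables of (lat, lon);
    moved: iterable of ((old_lat, old_lon), (new_lat, new_lon))."""
    points = list(created) + list(deleted)
    for old, new in moved:
        points.append(old)
        points.append(new)
    if not points:
        return None  # tag-only changeset, no geometric extent
    lats = [p[0] for p in points]
    lons = [p[1] for p in points]
    return min(lats), min(lons), max(lats), max(lons)
```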

While a bounding box validation would be relatively simple to implement at the API level, the open question is how the editing tools should be adapted. Currently, when a good-faith changeset created in iD gets rejected by the API (a rare event), I cannot meaningfully split it. The best I can do is undo operations until it becomes valid, which obviously erases part of my edits.

That’s the hard part, isn’t it? It’s not really a problem with the API on the server side but with how it’s (intentionally) used by editors. The necessary changes would more or less all have to be made client-side in the editors, so they no longer automatically split uploads. (Afaik it’s mostly/only JOSM that would be affected here, so this part might not actually be that big of an issue.)
However, that opens up a new problem when the number of pending changes to be uploaded is over the API limit for a single call, which is the main reason why the upload would be split in the first place. You could either 1) increase the limit on the server side or 2) make the user/editor partition the changes so they can be submitted in multiple uploads, where each individual upload still makes sense on its own.
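To illustrate option 2, here is a rough sketch of partitioning pending edits so that each upload request stays self-contained. It assumes a simplified model in which each new way lists the placeholder ids of the new nodes it references; deletions, relations and real dependency analysis are deliberately left out.

```python
# chunk_upload.py - a rough sketch of partitioning pending edits so each upload
# request is self-contained: a new way always travels with its new nodes.
# Simplified model; relations, deletions and shared-node conflicts are ignored.

def self_contained_chunks(new_nodes, new_ways, max_elements):
    """new_nodes: {placeholder_id: node}, new_ways: [(way, [node_placeholder_ids])]."""
    chunks, current, used = [], [], set()

    for way, node_ids in new_ways:
        # bundle the way with any of its new nodes not already packaged
        group = [new_nodes[i] for i in node_ids if i not in used] + [way]
        used.update(node_ids)
        if current and len(current) + len(group) > max_elements:
            chunks.append(current)
            current = []
        # a group larger than the limit cannot be split further here
        current.extend(group)

    # nodes not referenced by any new way can go into whichever chunk has room
    for i, node in new_nodes.items():
        if i in used:
            continue
        if current and len(current) + 1 > max_elements:
            chunks.append(current)
            current = []
        current.append(node)

    if current:
        chunks.append(current)
    return chunks
```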

On the server side, you could limit each changeset to a single upload request only and then automatically close it, but that doesn’t really solve the issue and isn’t actually necessary.

The problem is that a ‘save’ might be split up into multiple upload requests in the background, of which some might be rejected while the rest gets through. Multiple upload requests for the same changeset are not actually a problem as long as they are standalone and do not contain partial changes. (E.g. StreetComplete submitting changes to the same changeset spread out over multiple upload requests is not an issue, since individual changes made in StreetComplete are usually independent of each other.)

Partial rejection is realistically not an option. With the way the OSM data/tagging model works, it’s extremely hard to determine interdependencies between objects or which parts can safely be rejected without affecting the meaning of the rest and you’d just end up breaking things. Storing changes on the server until the changeset is closed and then checking/applying them all at once is also not really feasible for multiple reasons (resource requirements, data conflicts).

Can we change the clickbait title? It is also in the URL which was just posted in chat.

The good thing about the federal nature of OpenStreetMap is that everybody is encouraged to produce third party tools quicker.

“Federal” would mean that non-OSMF-controlled servers can edit OSM, which is not the case. What is described here is the effect of being open data.

The issue of referentially incomplete uploads is not new and is an issue completely independent of the vandalism protection issue, so I would suggest that, if at all, it should be discussed in a separate thread.

I would note that I handle chunked uploads in Vespucci in just as broken a way as JOSM does, and it is 100% due to me being too lazy to do it properly (for a very edgy edge case).

If you are very bored, see the discussion on idempotent uploads, which touches on some of these issues: Idempotency for API 0.6 · Issue #2201 · openstreetmap/openstreetmap-website · GitHub