Proposal: Revert bot for restoring the OSM Israel map

zstadler · October 26, 2023, 3:21pm

The vandalism against the OSM map in Israel does not seem to go away. We could expect it to last for weeks if not more, similar to the ongoing vandalism that still takes place in Ukraine.

To be able to recover from such attacks in a systematic, repeatable way, I would like to propose an algorithm for a revert bot.

This proposal is inspired by the recent reverts performed by @SomeoneElse and @woodpeck.
Thank you !!!

Feedback and comments are highly appreciated.

The problem:

Malicious accounts perform massive random modifications in the OSM data:

Moving nodes
Deleting and modifying tags
Performing the above multiple times in different changesets
Deleting elements
Adding new elements (I didn’t see such activity in the recent attacks)

The proposal

Create a bot that will revert all edits performed by a set of blocked malicious accounts.

The bot will be used to define a “revert version” for modified and deleted elements which would be the newer of:

The version created by a known revert account, if any.
The version just before the earliest un-reverted edit by a malicious account.

Inputs

List of blocked “malicious accounts”.
List of known “revert accounts”

Algorithm

Determine the “earliest time” of the run as the minimal “created_at” value of all changesets created by the malicious accounts
Create the list of elements included in the changesets of the malicious accounts
For each modified or deleted such element, determine the “revert version” by scanning the element’s versions from new to old
1. Set the revert version to “unnecessary”
2. If an edit was done by a malicious user, set the revert version to the previous version and continue with it
3. If the version timestamp is before the earliest time or this version was created by a revert account, store the revert version found earlier and move to the next element.
4. Continue to the next version
Restore modified and deleted elements to their revert version, unless it is unnecessary: nodes, then ways, and then relations
Delete elements added by the malicious accounts, if they still exist: relations, then ways, and then nodes
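The version scan in step 3 can be sketched in Python as below. This is only an illustration under stated assumptions, not code from any existing revert tool: the `Version` record, the function name `find_revert_version`, the use of `None` for “unnecessary” and the use of `0` for “no earlier version exists” (an element created by a malicious account, which belongs to the delete step) are all hypothetical choices.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Version:
    """One entry of an element's history (hypothetical record type)."""
    number: int
    user: str
    timestamp: datetime


def find_revert_version(history, malicious, reverters, earliest_time) -> Optional[int]:
    """Scan versions from new to old and return the version to restore.

    Returns None when no revert is necessary, and 0 (an assumption of this
    sketch) when the element was created by a malicious account.
    """
    revert_version = None  # step 1: "unnecessary"
    for i in range(len(history) - 1, -1, -1):
        v = history[i]
        if v.user in malicious:
            # step 2: fall back to the version just before this edit
            revert_version = history[i - 1].number if i > 0 else 0
            continue
        if v.timestamp < earliest_time or v.user in reverters:
            # step 3: nothing older can matter, keep what was found so far
            break
        # step 4: versions by other accounts are skipped
    return revert_version


def t(month, day):
    return datetime(2023, month, day, tzinfo=timezone.utc)


earliest = t(10, 20)
# Vandal edit, partial revert by a bystander, then another vandal edit.
mixed = [Version(1, "alice", t(1, 1)), Version(2, "vandal", t(10, 20)),
         Version(3, "bob", t(10, 21)), Version(4, "vandal", t(10, 22))]
# Vandal edit already fixed by a known revert account.
fixed = [Version(1, "alice", t(1, 1)), Version(2, "vandal", t(10, 20)),
         Version(3, "revert_bot", t(10, 21))]

mixed_result = find_revert_version(mixed, {"vandal"}, {"revert_bot"}, earliest)
fixed_result = find_revert_version(fixed, {"vandal"}, {"revert_bot"}, earliest)
print(mixed_result, fixed_result)
```

In the first history the bystander's version 3 is skipped, so the element goes back to version 1. In the second history the scan stops at the revert account's version and reports that no revert is needed.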

Notes

Restoring nodes enables the restoring of ways, and restoring the ways as well enables the restoring of relations
Deleting added relations allows the deletion of ways, and deleting the ways as well allows the deletion of nodes
The order of restoring and deleting relations of relations should be addressed
The revert bot should be run by a dedicated revert account, so it can be included in the list of known revert accounts
Edits made by other accounts after a malicious edit, including attempted reverts, will also be reverted
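As a rough illustration of the ordering in the first two notes, the sketch below sorts restore operations as nodes, ways, relations and delete operations in the opposite order. The `(type, id)` tuples and the function names are hypothetical, and relations of relations are deliberately left unsolved here, as in the notes.

```python
# Rank of each element type in the dependency chain: ways need nodes,
# relations may need ways and nodes.
TYPE_RANK = {"node": 0, "way": 1, "relation": 2}


def restore_order(elements):
    """Restore dependencies first: nodes, then ways, then relations."""
    return sorted(elements, key=lambda e: TYPE_RANK[e[0]])


def delete_order(elements):
    """Delete dependents first: relations, then ways, then nodes."""
    return sorted(elements, key=lambda e: -TYPE_RANK[e[0]])


to_restore = [("relation", 7), ("node", 1), ("way", 3)]
to_delete = [("node", 9), ("relation", 8), ("way", 4)]

restore_plan = restore_order(to_restore)
delete_plan = delete_order(to_delete)
print(restore_plan)
print(delete_plan)
```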

Vandalism and blocks in Israel

… in each case my previous revert failed because of reverts attempted by well-meaning people who didn’t always make a very good job of the revert.

Fizzie-DWG · October 27, 2023, 2:45am

I am aware that there is currently another reversion tool under development, which will hopefully correct some of these problems.

One thing, though, with regard to:

Unfortunately, as we have seen this week, other mappers also do partial reverts, that can remove the latest set of bad data, but replace it with earlier bad data, & this revert then also needs to be fixed.

I completely understand your anger & frustration at the damage that has been done to the map in your area, but could I suggest that we all please hold off for a few weeks & see what emerges?

zstadler · October 27, 2023, 5:25am

Exactly!
The approach here is to revert all edits performed after the malicious edits:

This is because only versions made by malicious accounts and revert accounts are considered.
Versions created by other accounts are ignored, including partial reverts done by accounts other than the given revert accounts.

I do assume that the listed revert accounts perform only automated full reverts that should not be ignored. It does require the creation of accounts like SomeoneElse_Revert and wookpacker_repair and their use only for automated full reverts.
If this assumption is not feasible, then the algorithm could be extended to accept a list of changesets by revert accounts that should be ignored in step 3.3 above.

Edit:

Frustration - yes, but no anger. I see these acts of vandalism like I see computer viruses. Both will not go away and we need to act accordingly. The next act of vandalism will happen and we should be prepared, not surprised.

SomeoneElse · October 27, 2023, 3:31pm

Have you had a look at the perl revert scripts (in their latest incarnation, since last night)? They do quite a lot of this already. What do you think is missing / can be done better?

zstadler · October 27, 2023, 3:46pm

I didn’t know they were openly available. Could you share their location?

SomeoneElse · October 27, 2023, 3:54pm

There is a wiki page Revert scripts - OpenStreetMap Wiki

Documentation is lagging somewhat**, but you can play with them safely on the Dev server to see how e.g. undoing the changes of multiple “bad users” at once works.

** this functionality was only added yesterday.

NorthCrab · October 27, 2023, 11:34pm

By running a full revert, whether for one user or multiple users, following osm-revert’s logic, you are assured of returning to the original input, even if it has been partially reverted by other users. This process can be easily accomplished through the Thanos interface. Simply create a single, comprehensive revert task that encompasses all the vandalizing users, and it will automatically resolve the issues.

Currently, there are two main issues to address. The first is the very low rate-limit for moderation users, which is caused by cgimap and significantly extends the duration of the revert process after a certain threshold. The second issue involves the overpass synchronization delay, which, in some scenarios, causes it to skip certain changes. I’m in the final stages of preparing a solution, and all the details, including the project’s code, will be released around November 3-5 next week.

zstadler · October 28, 2023, 7:40am

It seems to me that the first 2 components of the above algorithm will be useful additions to both the osm-revert-scripts library and the osm-revert tool:

The operator’s ability to define the revert targets as a set of accounts, whose changesets may be intermixed in time. As far as I understand, both osm-revert-scripts and osm-revert expect to receive a set of changesets.

NorthCrab:

Simply create a single, comprehensive revert task that encompasses all the vandalizing users, and it will automatically resolve the issues.

I was unable to create such a task simply and with minimal risk of human error, given the need to collect the changesets of many accounts, some of which have dozens of changesets.
The ability to define the revert version based on each element’s history, a set of malicious accounts (“black list”) and a set of revert accounts (“white list”). In particular, avoid reverting a malicious version of a given element if it was followed by a known revert version.

Clearly, the third part, concerning the proper order to revert a set of elements, is common to all properly-programmed revert software. IMHO, code reuse is a programmer’s virtue.

zstadler · October 28, 2023, 5:26pm

Can you elaborate a bit, as I’m a bit confused.

The bojicat389 account was able to modify about 250,000 elements in a bit less than 1:25 hours (10:23:25Z to 11:48:17Z). That’s about 175,000 elements/hour.
On the other hand, the revert has started about 6 hours ago and has reverted about 30,000 elements. That’s about 5,000 elements/hour.

I do hope I have an error in the above numbers. If that’s not the case, I doubt that the 35-fold slower revert is because cgimap is discriminating against moderation users.
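For anyone who wants to check the arithmetic, here is a small Python calculation using only the figures quoted above (the element counts are the approximate ones from this post, and the variable names are just for illustration):

```python
from datetime import datetime

# Timestamps quoted above for the vandal's editing session (UTC).
start = datetime.strptime("10:23:25", "%H:%M:%S")
end = datetime.strptime("11:48:17", "%H:%M:%S")
vandal_hours = (end - start).total_seconds() / 3600

vandal_rate = 250_000 / vandal_hours   # elements/hour modified by the vandal
revert_rate = 30_000 / 6               # elements/hour restored by the revert
slowdown = vandal_rate / revert_rate

print(f"vandal: {vandal_rate:,.0f}/h, revert: {revert_rate:,.0f}/h, "
      f"ratio: {slowdown:.1f}x")
```

The session lasts 5,092 seconds, which gives roughly 177,000 elements/hour for the vandal against 5,000 elements/hour for the revert, a ratio of about 35.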

NorthCrab · October 28, 2023, 7:00pm

osmtools is not part of osm-revert or Thanos. It’s actually one of the earliest revert script tools ever created and is known for its stability, although it tends to be slowish. The rate limit I talk about primarily affects Thanos (a tool made specifically for handling mass-reverts efficiently).

SomeoneElse · October 28, 2023, 7:46pm

That revert (a) isn’t done using a moderator account** so that issue link isn’t relevant and (b) I doubt that the rate limiting that there is is having an effect on it, because it’s essentially reverting object by object.

** which is normal for DWG reverts - I try and do those from a separate account so that it’s different from “normal mapping”.

zstadler · November 2, 2023, 3:45pm

I’ve done a first step in that direction, dealing specifically with elements that were deleted by the vandals at any stage, and restoring them to the latest version prior to any update by a vandal.

A Python program accepts a list of vandal accounts and produces a JOSM file. That format was chosen since it requires only minimal changes to the history XML of the elements in order to enable uploading it using JOSM.

It is also a first step in the sense that it will enable the correct revert of ways and relations which were reverted without some of their components, such as this building that is lacking its 16th node.
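To give an idea of what “minimal changes to the history XML” could look like, here is a small sketch. It is not the actual program: the function name `to_josm_restore` and the sample node are made up, and it assumes the usual JOSM file convention of marking an element for upload with an `action="modify"` attribute, with the `version` attribute set to the element’s current version on the server so the upload is not rejected as a conflict.

```python
import copy
import xml.etree.ElementTree as ET


def to_josm_restore(good_version: ET.Element, current_version: int) -> ET.Element:
    """Turn a good history version into a JOSM 'modify' element (sketch)."""
    elem = copy.deepcopy(good_version)
    elem.set("version", str(current_version))  # must match the server's latest
    elem.set("visible", "true")                # undelete if it was deleted
    elem.set("action", "modify")               # assumed JOSM upload marker
    return elem


# Hypothetical history entry: the last version before any vandal edit.
good = ET.fromstring(
    '<node id="1" version="3" visible="true" lat="32.0" lon="34.8">'
    '<tag k="name" v="example"/></node>'
)

root = ET.Element("osm", version="0.6", generator="restore-sketch")
restored = to_josm_restore(good, current_version=5)
root.append(restored)
josm_xml = ET.tostring(root, encoding="unicode")
print(josm_xml)
```

The coordinates and tags are taken unchanged from the good version, so only three attributes differ from the history XML.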

zstadler · November 7, 2023, 8:26pm

Things look better now