Proposal: Revert bot for restoring the OSM Israel map

The vandalism against the OSM map in Israel does not seem to go away. We could expect it to last for weeks, if not more, similar to the ongoing vandalism that still takes place in Ukraine.

To be able to recover from such attacks in a systematic, repeatable way, I would like to propose an algorithm for a revert bot.

This proposal is inspired by the recent reverts performed by @SomeoneElse and @woodpeck.
Thank you !!!

Feedback and comments are highly appreciated.

The problem:

Malicious accounts perform massive random modifications to the OSM data:

  • Moving nodes
  • Deleting and modifying tags
  • Performing the above multiple times in different changesets
  • Deleting elements
  • Adding new elements (I didn’t see such activity in the recent attacks)

The proposal

Create a bot that will revert all edits performed by a set of blocked malicious accounts.

For each modified or deleted element, the bot will determine a “revert version”, which is the newer of:

  • The version created by a known revert account, if any.
  • The version just before the earliest un-reverted edit by a malicious account.

Inputs

  • List of blocked “malicious accounts”.
  • List of known “revert accounts”.

Algorithm

  1. Determine the “earliest time” of the run as the minimal “created_at” value of all changesets created by the malicious accounts
  2. Create the list of elements included in the changesets of the malicious accounts
  3. For each such element that was modified or deleted, determine the “revert version” by scanning the element’s versions from newest to oldest:
    1. Set the revert version to “unnecessary”.
    2. If the version was created by a malicious account, set the revert version to the previous version and continue the scan from there.
    3. If the version timestamp is before the earliest time, or this version was created by a revert account, keep the revert version found so far and move to the next element.
    4. Continue to the next (older) version.
  4. Restore modified and deleted elements to their revert version, unless it is unnecessary: nodes, then ways, and then relations
  5. Delete elements added by the malicious accounts, if they still exist: relations, then ways, and then nodes
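Step 3 can be sketched in Python as follows. This is a minimal illustration, not code from any existing revert tool: the `Version` record, the function name `revert_version`, and the convention of returning `0` for an element whose first version was made by a malicious account (such elements are handled by step 5 instead) are assumptions made for the example. The history is assumed to be ordered oldest to newest.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional, Sequence

UNNECESSARY = None     # step 3.1: "no revert needed" marker
CREATED_BY_VANDAL = 0  # assumption: no pre-vandal version exists (step 5 deletes it)


@dataclass(frozen=True)
class Version:
    number: int          # version number of the element
    user: str            # account that created this version
    timestamp: datetime  # creation time of this version


def revert_version(history: Sequence[Version], malicious: set, reverters: set,
                   earliest: datetime) -> Optional[int]:
    """Return the version to restore, UNNECESSARY, or CREATED_BY_VANDAL.

    `history` is ordered oldest to newest; it is scanned newest to oldest.
    """
    result = UNNECESSARY                                  # step 3.1
    for i in range(len(history) - 1, -1, -1):
        v = history[i]
        if v.user in malicious:                           # step 3.2
            result = history[i - 1].number if i > 0 else CREATED_BY_VANDAL
        elif v.timestamp < earliest or v.user in reverters:
            break                                         # step 3.3: keep result
        # step 3.4: versions by any other account are simply skipped
    return result


if __name__ == "__main__":
    utc = timezone.utc
    history = [
        Version(1, "alice", datetime(2023, 1, 1, tzinfo=utc)),
        Version(2, "vandal1", datetime(2023, 10, 10, tzinfo=utc)),
        Version(3, "bob", datetime(2023, 10, 11, tzinfo=utc)),  # partial revert
        Version(4, "vandal2", datetime(2023, 10, 12, tzinfo=utc)),
    ]
    earliest = datetime(2023, 10, 9, tzinfo=utc)
    print(revert_version(history, {"vandal1", "vandal2"},
                         {"SomeoneElse_Revert"}, earliest))  # prints 1
```

Because versions by unlisted accounts are skipped rather than treated as stopping points, a partial revert such as version 3 above does not end the scan, so the older bad data under it (version 2) is reverted as well. A malicious version can never satisfy the step 3.3 condition, since its timestamp is never earlier than the earliest malicious changeset, so checking 3.2 first does not change the result.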

Notes

I am aware that there is currently another reversion tool under development, which will hopefully correct some of these problems.

One thing, though, with regard to:

Unfortunately, as we have seen this week, other mappers also do partial reverts, that can remove the latest set of bad data, but replace it with earlier bad data, & this revert then also needs to be fixed.

I completely understand your anger & frustration at the damage that has been done to the map in your area, but could I suggest that we all please hold off for a few weeks & see what emerges?

Exactly!
The approach here is to revert all edits performed after the malicious edits:

This is because only versions made by malicious accounts and revert accounts are considered.
Versions created by other accounts are ignored, including partial reverts done by accounts other than the given revert accounts.

I do assume that the listed revert accounts perform only automated full reverts that should not be ignored. It does require the creation of accounts like SomeoneElse_Revert and wookpacker_repair and their use only for automated full reverts.
If this assumption is not feasible, then the algorithm could be extended to take a list of changesets by revert accounts that should be ignored in step 3.3 above.
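A minimal sketch of that extended check in step 3.3, assuming each version also records the ID of the changeset that created it. `stops_scan` and `ignored_changesets` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Version:
    number: int
    user: str
    changeset: int       # changeset that created this version
    timestamp: datetime


def stops_scan(v: Version, reverters: set, ignored_changesets: set,
               earliest: datetime) -> bool:
    """Extended step 3.3: a version by a revert account ends the scan
    only if its changeset is not on the (hypothetical) ignore list."""
    if v.timestamp < earliest:
        return True
    return v.user in reverters and v.changeset not in ignored_changesets
```

A revert account's changeset on the ignore list is then treated like any other non-malicious edit: skipped, so the scan continues past it.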


Edit:

Frustration - yes, but no anger. I see these acts of vandalism like I see computer viruses. Both will not go away and we need to act accordingly. The next act of vandalism will happen and we should be prepared, not surprised.

Have you had a look at the Perl revert scripts (in their latest incarnation, since last night)? They do quite a lot of this already. What do you think is missing / can be done better?

I didn’t know they were openly available. Could you share their location?

There is a wiki page Revert scripts - OpenStreetMap Wiki

Documentation is lagging somewhat**, but you can play with them safely on the Dev server to see how, e.g., undoing the changes of multiple “bad users” at once works.

** this functionality was only added yesterday.

By running a full revert, whether for one user or multiple users, following osm-revert’s logic, you are assured of returning to the original state, even if the data has been partially reverted by other users. This process can be easily accomplished through the Thanos interface. Simply create a single, comprehensive revert task that encompasses all the vandalizing users, and it will automatically resolve the issues.

Currently, there are two main issues to address. The first is the very low rate limit for moderation users, which is caused by cgimap and significantly extends the duration of the revert process after a certain threshold. The second issue involves the Overpass synchronization delay, which, in some scenarios, causes it to skip certain changes. I’m in the final stages of preparing a solution, and all the details, including the project’s code, will be released around November 3-5 next week.

It seems to me that the first two components of the above algorithm would be useful additions to both the osm-revert-scripts library and the osm-revert tool:

  1. The operator’s ability to define the revert targets as a set of accounts, whose changesets may be intermixed in time. As far as I understand, both osm-revert-scripts and osm-revert expect to receive a set of changesets.

    I was unable to create such a task simply and with minimal risk of human error, given the need to collect the changesets of many accounts, some of which have dozens of changesets.

  2. The ability to define the revert version based on each element’s history, a set of malicious accounts (“black list”) and a set of revert accounts (“white list”). In particular, avoid reverting a malicious version of a given element if it was followed by a known revert version.
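As an illustration of the first component, the sketch below merges the changeset lists of several accounts into one chronologically ordered list and derives the “earliest time” of step 1. It assumes the changeset XML has already been fetched, e.g. from the OSM API 0.6 changeset query (`/api/0.6/changesets?display_name=NAME`), which returns a limited number of changesets per call, so several pages per account may be needed. Fetching and paging are left out, and `parse_changesets` and `collect` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET
from datetime import datetime


def parse_changesets(xml_text):
    """Yield (id, user, created_at) for each <changeset> in an API response."""
    for cs in ET.fromstring(xml_text).iter("changeset"):
        created = datetime.fromisoformat(cs.get("created_at").replace("Z", "+00:00"))
        yield int(cs.get("id")), cs.get("user"), created


def collect(responses, malicious):
    """Merge the changesets of all malicious accounts, oldest first.

    Returns (changeset_ids, earliest_time); earliest_time is step 1's value.
    """
    found = {}
    for xml_text in responses:
        for cs_id, user, created in parse_changesets(xml_text):
            if user in malicious:
                found[cs_id] = created  # keyed by ID, so overlapping pages are harmless
    ids = sorted(found, key=lambda cs_id: found[cs_id])
    earliest = found[ids[0]] if ids else None
    return ids, earliest
```

Keying by changeset ID means the operator can pass in every page for every account without worrying about duplicates, which addresses the risk of human error when assembling such a task by hand.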

Clearly, the third part, concerning the proper order in which to revert a set of elements, is common to all properly programmed revert software. IMHO, code reuse is a programmer’s virtue.
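The ordering rule from steps 4 and 5 can be expressed as a sort. This is a sketch with a hypothetical `plan` function, assuming each element is identified by an `(element_type, element_id)` pair. Restores go nodes first so that restored ways and relations can reference them; deletions go relations first so that no element is deleted while still referenced; and restores come before deletions so that a restored way no longer points at a vandal-added node that is about to be deleted.

```python
RANK = {"node": 0, "way": 1, "relation": 2}


def plan(restores, deletions):
    """Order operations: restores nodes -> ways -> relations, then deletions
    relations -> ways -> nodes. Elements are (element_type, element_id)."""
    restore_ops = sorted(restores, key=lambda e: (RANK[e[0]], e[1]))
    delete_ops = sorted(deletions, key=lambda e: (-RANK[e[0]], e[1]))
    return ([("restore", t, i) for t, i in restore_ops]
            + [("delete", t, i) for t, i in delete_ops])
```

This simple version does not order relations that are members of other relations; a real tool would need to handle that case as well.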

Can you elaborate a bit, as I’m a bit confused.

The bojicat389 account was able to modify about 250,000 elements in a bit less than 1 hour 25 minutes (10:23:25Z to 11:48:17Z). That’s about 175,000 elements/hour.
On the other hand, the revert has started about 6 hours ago and has reverted about 30,000 elements. That’s about 5,000 elements/hour.

I do hope I have an error in the above numbers. If that’s not the case, I doubt that the 35-fold slower revert is because cgimap is discriminating against moderation users.
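The rate figures above can be reproduced with a quick calculation. The dates are not given, so only the times of day are used:

```python
from datetime import datetime

FMT = "%H:%M:%S"
# Duration of the vandalism run, from the timestamps quoted above.
vandal_hours = (datetime.strptime("11:48:17", FMT)
                - datetime.strptime("10:23:25", FMT)).total_seconds() / 3600
vandal_rate = 250_000 / vandal_hours  # elements modified per hour
revert_rate = 30_000 / 6              # elements reverted per hour
ratio = vandal_rate / revert_rate
print(f"{vandal_hours:.2f} h, {vandal_rate:,.0f}/h vs {revert_rate:,.0f}/h, {ratio:.0f}x")
```

The vandalism ran for about 1.41 hours, giving roughly 176,700 elements/hour against 5,000 elements/hour for the revert, a ratio of about 35.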

osmtools is not part of osm-revert or Thanos. It’s actually one of the earliest revert script tools ever created and is known for its stability, although it tends to be slowish. The rate limit I talk about primarily affects Thanos (a tool made specifically for handling mass-reverts efficiently). :slightly_smiling_face:

That revert (a) isn’t done using a moderator account** so that issue link isn’t relevant and (b) I doubt that the rate limiting that there is is having an effect on it, because it’s essentially reverting object by object.

** which is normal for DWG reverts - I try and do those from a separate account so that it’s different from “normal mapping”.

I’ve done a first step in that direction, dealing specifically with elements that were deleted by the vandals at any stage, and restoring them to the latest version prior to any update by a vandal.

A Python program accepts a list of vandal accounts and produces a JOSM file. That format was chosen since it requires only minimal changes to the elements’ history XML to enable uploading it using JOSM.
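The program itself is not shown in the thread, so the following is only a rough sketch of the kind of transformation described: it turns one version from an element's history XML (as returned by `/api/0.6/node/ID/history`, and likewise for ways and relations) into an entry of a JOSM `.osm` file. Everything here is an assumption: in particular, that JOSM uploads an element marked `action="modify"`, that the upload must carry the element's current version number, and that putting an old version's content back with `visible="true"` undeletes a deleted element. `restore_entry`, `josm_file` and the generator string are hypothetical names.

```python
import xml.etree.ElementTree as ET

ELEMENT_TYPES = ("node", "way", "relation")


def restore_entry(history_xml, revert_version):
    """Build a JOSM-style element restoring `revert_version` of one element.

    `history_xml` is assumed to hold all versions of a single element.
    """
    versions = [e for e in ET.fromstring(history_xml) if e.tag in ELEMENT_TYPES]
    latest = max(int(v.get("version")) for v in versions)
    old = next(v for v in versions if int(v.get("version")) == revert_version)
    entry = ET.Element(old.tag, dict(old.attrib))
    entry.extend(list(old))                # tags, way nodes, relation members
    entry.set("version", str(latest))      # assumed: must match the current server version
    entry.set("visible", "true")           # assumed: also undeletes a deleted element
    entry.set("action", "modify")          # assumed: tells JOSM to upload it as a change
    return entry


def josm_file(entries):
    """Wrap restored elements in an <osm> root and serialize it."""
    root = ET.Element("osm", {"version": "0.6", "generator": "revert-sketch"})
    root.extend(entries)
    return ET.tostring(root, encoding="unicode")
```

Only three attributes of the old version are changed, which matches the idea of keeping the changes to the history XML minimal.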

It is also a first step in the sense that it will enable the correct revert of ways and relations that were reverted without some of their components, such as this building that is lacking its 16th node.
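For the incomplete-way case, a restore tool also has to check whether a way's node references point at nodes that are currently deleted, and restore those nodes first. A minimal sketch, where `missing_way_nodes` is a hypothetical name and `deleted_node_ids` is assumed to have been derived beforehand, e.g. from node histories whose latest version has `visible="false"`:

```python
import xml.etree.ElementTree as ET


def missing_way_nodes(way_xml, deleted_node_ids):
    """Return refs of a way's nodes that are currently deleted, in way order
    and without duplicates (a closed way repeats its first node)."""
    missing = []
    for nd in ET.fromstring(way_xml).iter("nd"):
        ref = int(nd.get("ref"))
        if ref in deleted_node_ids and ref not in missing:
            missing.append(ref)
    return missing
```

Relation members would need the same check, using the `ref` and `type` attributes of their `<member>` elements.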

Things look better now