The new rate limiting prevents participants in Missing Maps mapathons from saving buildings

Hi, I just wanted to share that I have written a new section on the MSF OSM wiki page on Assuring Data Quality, where I describe the measures we take. You can check it here: Médecins Sans Frontières - OpenStreetMap Wiki
The limit was a minor issue, or no issue at all, at our last December mapathons last week: the mapathons with corporate partners were too short for anyone to do enough mapping to reach it, and the Greece and Brazil mapathons were too small (no one reached it). At our Swiss Geneva mapathon on 13 Dec, one person reached the limit (user name MoYuTu99), and at the Czech Olomouc mapathon on 13 Dec, also one.

Hi @pnorman @Mateusz_Konieczny @kaartjesman @Kovoschiz @Matija_Nalis and all contributing to this thread who may be interested, here is a scientific analysis from @Hagellach37 published on the HeiGIT blog about Missing Maps new users and how they would be affected by different rate limits. We are curious about your thoughts.

(cc @Jorieke_V @SColchester @Patrik_B @Filip009)

I am skeptical about the analysis. At the end it identifies that about 32k of the 58k changesets were made in Ukraine. To a reasonable approximation, all the high-volume edits from new users in Ukraine during this time were vandalism. Most of the ones in Russia and Belarus would be too, but I didn’t consider those. Many of the users were not blocked because their accounts were deleted. This makes the “users who were blocked” metric meaningless: due to the nature of the most common vandalism, vandals would not show up under this metric. It would have been better to look at users who were blocked or who deleted their account, and then check what this metric was missing.

Because it’s missing this, it makes the user counts not very useful. The percentage of changesets that are vandalism goes up as you go from 1000 to 5000, but it also starts missing a bunch of the vandalism. This makes me curious about the changesets that fall between 1000 and 5000.

Publishing a list of changesets would be useful, as it would allow users to do an analysis of the interesting part.

Overall, the analysis confirms that over 50% of blocked edits are likely vandalism and as the rate limit increases that percentage goes higher.

I also followed

Get in touch with us via ohsome@heigit.org if you have further questions about our analysis and its results.

and wrote to them

EDIT: got a reply - they made a minor tweak to the text and mentioned that they want to look into deleted but not blocked accounts, and I gave them further info about deleted vandal accounts

Thanks for doing that legwork!

I am under the impression that finding a local optimum for the limits is not all that useful, as the situation is a transitory one. Not in a government sense where any transitory situation tends to become permanent :wink: No, instead this is simply hard work that needs to be done and has been acknowledged by devs and the foundation.

I stand by this older post where the longer term vision is to recognize that lone mappers can have much lower limits than mappers doing it in a team with a social element to it. Because the limits are not just about actual intentional vandalism, they are just as much about well intended but maybe harmful changes.
Let’s aim to make new people actually seek out and work with experienced ones if they want to get full access.

The older post:

Hey @pnorman @Mateusz_Konieczny ,
we have updated the blog post and analysis, which now also includes changes from accounts that were deleted. You will find the changes marked in red in the post.

I think to some extent this should answer your open questions.

From a researchers perspective we prefer user blocking by DWG over user account deletions by OSM sysadmins

The account deletions were almost all done by the vandals in an effort to make it more difficult to track what they had done.

Looking at the difference between 1000 and 2500, and under the assumption that all accounts that hit the 1k rate limit and were deleted or blocked were vandals, raising the limit to 2500 would allow through 12k changesets that are vandalism. Looking at it in terms of how selective the rate limit is, at 1000 I find 78% of the changesets are vandalism, and at 2500 95% of the changes are. Most of the gain in selectivity is between 1000 and 1500, and at the same time raising the limit to 1500 would allow through 4.8k changes that are vandalism.
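For anyone who wants to redo this kind of comparison on their own list of users, here is a minimal Python sketch of the two numbers above: selectivity at a given limit, and the vandal changesets that get through when the limit is raised. The `UserStats` fields and both function names are assumptions made up for illustration, and "vandal" here simply means "blocked or deleted", as in the assumption stated above. It is not the code behind the figures in this post.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class UserStats:
    """Hypothetical per-user record; field names are assumptions."""
    changes_first_hour: int  # changes uploaded in the user's first hour
    changesets: int          # changesets the user contributed
    is_vandal: bool          # assumption: account was blocked or deleted


def selectivity(users: List[UserStats], limit: int) -> Optional[float]:
    """Share of changesets from users over `limit` that come from vandals."""
    hit = [u for u in users if u.changes_first_hour > limit]
    total = sum(u.changesets for u in hit)
    if total == 0:
        return None  # nobody hit the limit, so selectivity is undefined
    vandal = sum(u.changesets for u in hit if u.is_vandal)
    return vandal / total


def vandal_changesets_let_through(users: List[UserStats], low: int, high: int) -> int:
    """Vandal changesets caught at `low` that are no longer caught at `high`."""
    return sum(
        u.changesets
        for u in users
        if u.is_vandal and low < u.changes_first_hour <= high
    )
```

Calling `selectivity(users, 1000)` and `selectivity(users, 2500)` on the same list gives the two percentages to compare, and `vandal_changesets_let_through(users, 1000, 2500)` gives the cost of raising the limit.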

Do you have a by-country breakdown for the different rate limits? Another good approximation is that every changeset that hit the rate limit in Ukraine was vandalism.

Ultimately, I’m not sure at what level the vandalism becomes unmanageable, as I’m not one of the people cleaning it up.

What I’d like to see the most is research on better algorithms. The current one has some weaknesses which I will not detail here, but it could also be more selective.

The information available when determining the rate limit is

  • user roles
  • last block
  • time since first changeset
  • active user reports
  • number of changes since some time
  • number of changesets since some time
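To make the list concrete, here is a rough Python sketch of how such inputs could be combined into an hourly allowance. This is not the algorithm OSM actually uses: the role names, every threshold except the 1000 figure discussed in this thread, and the function name `hourly_change_allowance` are invented for illustration only.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional, Set


@dataclass
class UserSignals:
    """The inputs from the list above; field names are assumptions."""
    roles: Set[str]
    time_since_last_block: Optional[timedelta]       # None: never blocked
    time_since_first_changeset: Optional[timedelta]  # None: no changeset yet
    active_reports: int
    recent_changes: int      # changes in the current window
    recent_changesets: int   # listed as available; unused in this sketch


def hourly_change_allowance(s: UserSignals) -> int:
    """Return how many more changes the user may upload in this window."""
    # Hypothetical rule: trusted roles get a very high allowance.
    if s.roles & {"moderator", "importer"}:
        return 1_000_000
    allowance = 1000  # starting point, the figure discussed in this thread
    if s.time_since_first_changeset is not None:
        # Hypothetical rule: grow with account age, capped at 7 days.
        days = s.time_since_first_changeset / timedelta(days=1)
        allowance += int(min(days, 7.0) * 1000)
    # Hypothetical rule: halve the allowance per active report (max 3 halvings).
    allowance //= 2 ** min(s.active_reports, 3)
    # Hypothetical rule: a block within the last 30 days caps the allowance.
    if s.time_since_last_block is not None and s.time_since_last_block < timedelta(days=30):
        allowance = min(allowance, 500)
    return max(allowance - s.recent_changes, 0)
```

Research on better algorithms would amount to replacing the invented rules in the middle with ones that separate vandals from mapathon participants more cleanly.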

I’ve re-read the new version, and I have to say, the narrative there simply does not match what actually happened before and just after the introduction of rate limiting.

The numbers that I can easily test don’t pass the sniff test - for example “There are about 601 users who were blocked by the OSM Data Working Group, but would not have been affected by the rate limit”. A simple check of my OSM account will show a “blocks issued” number of around 8,000. The vast majority of those were issued during the period you’re looking at, and the vast majority of those hadn’t hit any rate limit, or even started editing - they’d been identified as accounts created for vandalism and were blocked before they could do anything. An even larger number were disabled by the admins before anyone (including the DWG) saw them.

That said, it’s definitely worth looking at the effect of rate limiting on “real” usage (see for example the testing that Sam Colchester did last year). If you want to analyse a period of time, I’d definitely suggest picking a period when there were not mass vandalism attempts happening.

–Andy (from the DWG)

Hello everyone.
Today 5 new mappers got blocked at Missing Maps Mapathon #12 Žilina

It was very sad to watch them and see that they want to map, but they cannot :frowning:

I’m wondering whether different limits could be hardcoded for bboxes covering the high-abuse regions (e.g. Ukraine and surroundings).

i.e. trigger at 500 changes for new users mapping Ukraine, compared to 2500 for new users mapping the rest of the world, or something like that; instead of 1000 changes everywhere

Yeah, I know such a kludge wouldn’t be ideal in the long term, where ideally such “abuse areas” should be dynamically specifiable; but it should be good enough for slowing down abusers in conflict regions even more, while greatly reducing (or completely avoiding) false positives in the rest of the world.
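Something along these lines, as a minimal Python sketch of the kludge. The bounding box coordinates are rough placeholders rather than a proposed policy, and `HIGH_ABUSE_BOXES`, `DEFAULT_LIMIT` and `new_user_limit` are made-up names; only the 500 and 2500 figures come from the suggestion above.

```python
from typing import List, Tuple

# (min_lon, min_lat, max_lon, max_lat, limit)
# Placeholder box very roughly covering Ukraine and surroundings.
HIGH_ABUSE_BOXES: List[Tuple[float, float, float, float, int]] = [
    (22.0, 44.0, 40.5, 52.5, 500),
]
DEFAULT_LIMIT = 2500  # new users mapping anywhere else


def new_user_limit(lon: float, lat: float) -> int:
    """Return the change limit for a new user editing at (lon, lat)."""
    for min_lon, min_lat, max_lon, max_lat, limit in HIGH_ABUSE_BOXES:
        if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat:
            return limit
    return DEFAULT_LIMIT
```

With these placeholder numbers, an edit near Kyiv (`new_user_limit(30.52, 50.45)`) would fall under the lower limit, while an edit near Žilina (`new_user_limit(18.74, 49.22)`) would get the default.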

That is quite sad :crying_cat_face:

But did they eventually manage to upload their changes later? I.e. could they have continued editing, even if they couldn’t upload and see their changes go live?

But after some time (an hour?) they (or the organizer) could click upload again, and it should work better? Or did it not work even then?

Yes, they kept trying to upload every minute or so, until they were successful.

Yes, they could continue editing, but the more they add, the later they can upload. So it’s counterproductive to map more.

It would be possible for them to keep mapping and upload the changes an hour later, once they get home.

The organizer can upload their changes, but this would not help with morale, because the organizer would “steal” all the work. They would not see their own results, i.e. how much they have mapped.

But are we really interested in new users barging in and uploading 2500
edits? Is that a healthy approach for good and lasting contributions?
Even in a Mapathon situation, do we want to gamify the whole thing into
a “who maps most” competition? There’s enough overlapping, duplicated,
jarry-angled buildings in OSM as it is. What’s wrong with “you can only
add 100 buildings on your first day, so take your time and do them well”?

They can map more at the next mapathon!

We all own OSM. Every map edit benefits every one of us!

Frederik, the new users are not “barging in”, they are invited by us, representatives of humanitarian organisations (Médecins Sans Frontières in my case, that is why I started this thread) to help us improve the base maps in areas where we either have (mostly medical, in our case) projects and activities ongoing, or are assessing the needs.
And what is wrong with that approach is that you demotivate and likely lose the most capable mappers, the quick learners, who could eventually become either inspiring role models in their communities/for other students (like @Filip009 above) and/or validators. As I already wrote above, at some of our mapathons for Czech & Slovak mappers, the organisers teach JOSM directly, and we do a thorough training and answer all the questions regardless of the mapping editor, so the buildings the new users create are usually orthogonalized and, in general, validators report high-quality mapping. So the limit in such cases tends to be misplaced.

Totally agree to some stringent limits on new accounts without track record lest this gets pre-agreed on incidentals with QA. Came across a village recently which from above looks like 150+ distinct buildings, residentials mostly, sheds, gardens… mapped as 6 following the edges of the streets. MapWithAI wonders or something like ‘map filler’. There’s the case @ivanbranco analysed in Algeria, 156k buildings or so imported with, yes MWA, with way crossings and 131K building overlaps. No responses so guess it’s still an open case. Mopping with the tap full open.

Another recent example is this one in Ecuador. New mappers seem to have been given some particularly poor advice, resulting in duplicated and triplicated buildings, and buildings called “edits”, etc.

Can anyone in this thread help find who organised that activity?

I’d like to set straight some facts that might have got lost.

The rate limit of 1000 is per first hour, not per first 24 hours. The clock starts with the upload of the first object, not with account creation.
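To spell out that timing, here is a minimal Python sketch under the two statements above: the window is one hour and it starts at the first upload. The function name and its parameters are made up for illustration, and what applies after the first hour is deliberately not modelled (the function returns `None` there).

```python
from datetime import datetime, timedelta
from typing import Optional

FIRST_HOUR = timedelta(hours=1)


def remaining_first_hour_allowance(
    first_upload: Optional[datetime],
    now: datetime,
    changes_uploaded: int,
    limit: int = 1000,
) -> Optional[int]:
    """Changes still allowed in the first hour, or None once that hour is over."""
    if first_upload is None:
        # Nothing uploaded yet: the clock has not started, whatever the account age.
        return limit
    if now - first_upload >= FIRST_HOUR:
        # First hour is over; later rules are outside this sketch.
        return None
    return max(limit - changes_uploaded, 0)
```

So an account created weeks ago that has never uploaded anything still starts with the full 1000, and the hour only begins to run with the first uploaded object.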

Any such mechanism must be extremely simple: the algorithm must be such that it scales faster than any potential attack. And more importantly, judgement criteria must be so simple that the total time spent on reviewing complaints does not overwhelm the real people tasked with that. This makes any ideas with bounding box or edit type adaptive rules or whatever too complex to administer.

The rate limits would have been hit in the past 11 years relatively rarely (in the talk after 11 minutes) by benign users. Note that I have hand checked for a substantially higher limit. The lower limit has been hit more often: absolutely too often for manual review, relatively rarely.

Not every hyperactive editing activity is vandalism. There are unusual editing patterns (pattern, not content) that run into the limit accidentally. However, the limit is set such that mappers with an average degree of diligence and current tooling will not run into the limit.

In total we have seen in the past 11 years an estimated 200-2000 benign recurring users (out of a total of almost 2 million mappers) that have uploaded in the first hour more than 1000 edits. The 20k number in the talk includes one-off mappers, imports, bots, and spam, and the 200-2000 are extrapolated from the findings after manually analyzing users as explained in the talk.

The limit is designed to give long-term users and ultimately the DWG enough time to stop vandalism early. This has and must have priority, because these caring users are scarce. The number of now long-term users we might have put off if we had already had the limit in the past is most likely a single-digit figure per year, or zero.

However, it would be a beneficial research task: How many (and which) users that now fulfill the conditions for active users have uploaded over 1000 object versions in the first hour of their mapping activity, structured by number of years of ongoing activity?

Currently the limit is at 1000 nodes.
Would it be a better solution to set the limit to 1000 modified (or created) objects (ways/nodes/relations), or would this make vandalism too easy again (based on the known vandalism patterns)?

Why do you think so? I am pretty sure that ways and relations are already counted too.