The new rate limiting prevents participants in Missing Maps mapathons from saving buildings

Yes, and it was done before; the effects were really poor.

Years ago I remember hearing about an attempted decentralised OSM DB, Portable OSM, but I think nothing came of that.

As far as I know, it was used quite a bit, but it has obvious downsides that make its approach unsuitable for anything that needs to scale even remotely (the same is true of many of the proposals made here).

This is exactly what was on my mind. Thanks for pointing it out.

It is not, for two reasons. First, the API is tag-agnostic by design; second, that information isn’t available from the changeset table. The rate limit function needs to be queried many times, so I don’t see going to the full history tables as practical.

I’m not really following the pseudo-code, but this doesn’t make much sense, as tags aren’t deleted or changed in the API. A new version is uploaded with a complete set of tags.
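To illustrate the point: since each version carries its complete tag set, a “deleted” or “changed” tag can only be inferred by comparing a version with its predecessor, which means reading element history rather than just the changeset table. A minimal Python sketch, where `diff_tags` and the sample tags are made up for illustration:

```python
def diff_tags(old: dict[str, str], new: dict[str, str]) -> dict[str, dict]:
    """Infer tag changes between two full tag sets (hypothetical helper).

    The API never records "tag X was deleted"; every version simply carries
    its complete tag set, so differences have to be computed like this.
    """
    added = {k: new[k] for k in new.keys() - old.keys()}
    removed = {k: old[k] for k in old.keys() - new.keys()}
    modified = {
        k: (old[k], new[k])
        for k in old.keys() & new.keys()
        if old[k] != new[k]
    }
    return {"added": added, "removed": removed, "modified": modified}


if __name__ == "__main__":
    # Hypothetical version 1 and version 2 of the same building way.
    v1 = {"building": "yes", "name": "Clinic"}
    v2 = {"building": "hospital", "amenity": "hospital"}
    print(diff_tags(v1, v2))
```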

Hi, I just wanted to share that I have written a new section on Assuring Data Quality on the MSF OSM wiki page, where I describe the measures we take. You can check it here: Médecins Sans Frontières - OpenStreetMap Wiki
The limit was a minor issue, or no issue at all, at our final December mapathons last week: the mapathons with corporate partners were too short for anyone to map enough to reach it, and the Greece and Brazil mapathons were too small (no one reached it). At our mapathon in Geneva, Switzerland, on 13 Dec, one person reached the limit (user name MoYuTu99), and at the Olomouc, Czech Republic, mapathon on 13 Dec, one person did as well.

Hi @pnorman @Mateusz_Konieczny @kaartjesman @Kovoschiz @Matija_Nalis and all contributing to this thread who may be interested, here is a scientific analysis from @Hagellach37 published on the HeiGIT blog about Missing Maps new users and how they would be affected by different rate limits. We are curious about your thoughts.

(cc @Jorieke_V @SColchester @Patrik_B @Filip009)

I am skeptical about the analysis. At the end, it identifies that about 32k of the 58k changesets were made in Ukraine. To a reasonable approximation, all the high-volume edits from new users in Ukraine during this time were vandalism. Most of the ones in Russia and Belarus would be too, but I didn’t consider those. Many of these users were not blocked because their accounts were deleted. This makes the “users who were blocked” metric meaningless: due to the nature of the most common vandalism, vandals would not show up under it. It would have been better to look at users who were blocked or who deleted their accounts, and then check what this metric was missing.

Because it misses these accounts, the user counts are not very useful. The percentage of changesets that are vandalism goes up as you go from 1000 to 5000, but the limit also starts missing a lot of the vandalism. This makes me curious about the changesets that fall between 1000 and 5000.

Publishing a list of changesets would be useful, as it would allow users to do an analysis of the interesting part.

Overall, the analysis confirms that over 50% of blocked edits are likely vandalism and as the rate limit increases that percentage goes higher.

I also followed

Get in touch with us via ohsome@heigit.org if you have further questions about our analysis and its results.

and wrote to them

EDIT: got a reply. They made a minor tweak to the text and mentioned that they want to look into accounts that were deleted but not blocked, and I gave them further info about deleted vandal accounts.

Thanks for doing that legwork!

I am under the impression that finding a local optimum for the limits is not all that useful, as the situation is a transitory one. Not in the government sense, where any transitory situation tends to become permanent :wink: No, this is simply hard work that needs to be done and that has been acknowledged by the devs and the foundation.

I stand by this older post, where the longer-term vision is to recognize that lone mappers can have much lower limits than mappers working in a team with a social element to it. The limits are not just about intentional vandalism; they are just as much about well-intended but possibly harmful changes.
Let’s aim to make new people actually seek out and work with experienced ones if they want to get full access.

The older post:

Hey @pnorman @Mateusz_Konieczny,
we have updated the blog post and analysis, which now also include changes from accounts that were deleted. You will find the changes marked in red in the post.

I think to some extent this should answer your open questions.

From a researcher’s perspective, we prefer user blocking by the DWG over user account deletions by OSM sysadmins

The account deletions were almost all done by the vandals in an effort to make it more difficult to track what they had done.

Assuming that all changesets which would have hit the 1k rate limit and came from a user who deleted their account were vandalism, I looked at the difference between 1000 and 2500.

Increasing it would allow

Looking at the difference between 1000 and 2500, and assuming that all accounts that hit the 1k rate limit and were deleted or blocked were vandals, raising the limit to 2500 would let through 12k changesets that are vandalism. In terms of how selective the rate limit is, at 1000 I find that 78% of the changesets hitting the limit are vandalism, and at 2500, 95% are. Most of the gain in selectivity comes between 1000 and 1500, and raising the limit to 1500 would let through 4.8k changes that are vandalism.
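The two quantities used above can be made concrete. “Selectivity” here is the share of changesets caught by a limit that are vandalism (precision), and “let through” is the vandalism caught at the lower limit but not at the higher one. A rough Python sketch, assuming each changeset is labelled by some external rule (such as “account blocked or deleted”); the toy records are invented and are not the data from the analysis:

```python
from typing import NamedTuple


class Changeset(NamedTuple):
    changes_in_window: int  # changes counted against the limit (assumed)
    is_vandalism: bool      # label from an external rule (assumed)


def caught(data: list[Changeset], limit: int) -> list[Changeset]:
    """Changesets that would have hit the given limit."""
    return [c for c in data if c.changes_in_window >= limit]


def selectivity(data: list[Changeset], limit: int) -> float:
    """Share of caught changesets that are vandalism (precision)."""
    hits = caught(data, limit)
    if not hits:
        return 0.0
    return sum(c.is_vandalism for c in hits) / len(hits)


def vandalism_let_through(data: list[Changeset], old: int, new: int) -> int:
    """Vandal changesets caught at `old` but no longer caught at `new`."""
    return sum(
        1 for c in data
        if c.is_vandalism and old <= c.changes_in_window < new
    )


if __name__ == "__main__":
    # Made-up records, purely for illustration.
    toy = [
        Changeset(1200, False), Changeset(1300, True), Changeset(1800, True),
        Changeset(2600, True), Changeset(3000, True), Changeset(900, False),
    ]
    for limit in (1000, 1500, 2500):
        print(limit, round(selectivity(toy, limit), 2))
    print("let through 1000->2500:", vandalism_let_through(toy, 1000, 2500))
```

On the toy data this prints a selectivity of 0.8 at 1000 and 1.0 at 1500 and 2500, with two vandal changesets let through between 1000 and 2500; the real figures would come from running the same comparison over the actual changeset list.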

Do you have a by-country breakdown for the different rate limits? Another good approximation is that every changeset that hit the rate limit in Ukraine was vandalism.

Ultimately, I’m not sure at what level the vandalism becomes unmanageable, as I’m not one of the people cleaning it up.

What I’d like to see the most is research on better algorithms. The current one has some weaknesses which I will not detail here, but it could also be more selective.

The information available when determining the rate limit is

  • user roles
  • last block
  • time since first changeset
  • active user reports
  • number of changes since some time
  • number of changesets since some time
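For readers wondering how those inputs might combine, here is a purely hypothetical Python sketch: the limit ramps up linearly with time since the first changeset (or since the end of the last block), moderators get a very high limit, and each open report halves the limit. The constants, names and formula are all assumptions for illustration, not the actual openstreetmap-website code, and the changeset-count input is left out for brevity.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical tuning constants (not real OSM settings).
INITIAL_CHANGES_PER_HOUR = 1000
MAX_CHANGES_PER_HOUR = 100_000
DAYS_TO_MAX = 7
MODERATOR_CHANGES_PER_HOUR = 1_000_000


@dataclass
class UserState:
    roles: set[str]
    first_changeset_at: Optional[datetime]
    last_block_ended_at: Optional[datetime]
    active_reports: int
    changes_last_hour: int


def max_changes_per_hour(user: UserState, now: datetime) -> int:
    """Hypothetical hourly limit derived from the listed inputs."""
    if "moderator" in user.roles:
        return MODERATOR_CHANGES_PER_HOUR
    # Ramp from whichever is later: first changeset or end of last block.
    starts = [t for t in (user.first_changeset_at, user.last_block_ended_at)
              if t is not None]
    if not starts:
        return INITIAL_CHANGES_PER_HOUR
    days = (now - max(starts)) / timedelta(days=1)
    fraction = min(max(days / DAYS_TO_MAX, 0.0), 1.0)
    limit = INITIAL_CHANGES_PER_HOUR + fraction * (
        MAX_CHANGES_PER_HOUR - INITIAL_CHANGES_PER_HOUR)
    # Assumed penalty: each open user report halves the limit.
    limit /= 2 ** user.active_reports
    return int(limit)


def may_upload(user: UserState, now: datetime, batch_size: int) -> bool:
    """Would this batch keep the user within the hypothetical limit?"""
    return user.changes_last_hour + batch_size <= max_changes_per_hour(user, now)
```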
I’ve re-read the new version, and I have to say, the narrative there simply does not match what actually happened before and just after the introduction of rate limiting.

The numbers that I can easily test don’t pass the sniff test: for example, “There are about 601 users who were blocked by the OSM Data Working Group, but would not have been affected by the rate limit”. A simple check of my OSM account will show a “blocks issued” number of around 8,000. The vast majority of those were issued during the period you’re looking at, and the vast majority of those hadn’t hit any rate limit, or even started editing - they’d been identified as accounts created for vandalism and were blocked before they could do anything. An even larger number were disabled by the admins before anyone (including the DWG) saw them.

That said, it’s definitely worth looking at the effect of rate limiting on “real” usage (see for example the testing that Sam Colchester did last year). If you want to analyse a period of time, I’d definitely suggest picking a period when no mass vandalism attempts were happening.

–Andy (from the DWG)

Hello everyone.
Today 5 new mappers got blocked at Missing Maps Mapathon #12 Žilina

It was very sad to watch them and see that they want to map, but they cannot :frowning:

I’m wondering if different limits might be hardcoded for bboxes covering the high-abuse regions (e.g. Ukraine and surroundings).

i.e. trigger at 500 changes for new users mapping Ukraine, compared to 2500 for new users mapping the rest of the world, or something like that, instead of 1000 changes everywhere.

Yeah, I know such a kludge wouldn’t be ideal in the long term, where ideally such “abuse areas” could be specified dynamically; but it should be good enough for slowing down abusers in conflict regions even more, while greatly reducing (or completely avoiding) false positives in the rest of the world.
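A rough Python sketch of that kludge, assuming the changeset bounding box is what gets checked; the Ukraine box is a hand-picked approximation, and the 500/2500 values are just the example numbers from this post, not real settings:

```python
from typing import NamedTuple


class BBox(NamedTuple):
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def intersects(self, other: "BBox") -> bool:
        """True if the two boxes overlap (touching edges count)."""
        return not (other.min_lon > self.max_lon or other.max_lon < self.min_lon
                    or other.min_lat > self.max_lat or other.max_lat < self.min_lat)


# Hypothetical hard-coded high-abuse regions: name -> (box, new-user limit).
# The box is only a rough approximation of Ukraine and its surroundings.
HIGH_ABUSE_REGIONS = {
    "ukraine_approx": (BBox(22.0, 44.0, 40.5, 52.5), 500),
}
DEFAULT_NEW_USER_LIMIT = 2500  # example value from the post


def new_user_limit(changeset_bbox: BBox) -> int:
    """Lowest limit among the regions the changeset's bbox touches."""
    limits = [limit for box, limit in HIGH_ABUSE_REGIONS.values()
              if box.intersects(changeset_bbox)]
    return min(limits, default=DEFAULT_NEW_USER_LIMIT)


if __name__ == "__main__":
    print(new_user_limit(BBox(30.5, 50.4, 30.6, 50.5)))  # Kyiv area
    print(new_user_limit(BBox(18.7, 49.2, 18.8, 49.3)))  # Žilina area
```

One caveat with this approach: the box of an open changeset only grows as edits arrive, so a check like this would probably need the coordinates of the incoming edits rather than just the stored box, and a changeset spanning half a continent would pick up the lower limit even for edits far from the region.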

That is quite sad :crying_cat_face:

But did they eventually manage to upload their changes later? I.e. could they have continued editing, even if they couldn’t upload and see their changes go live?

But after some time (an hour?), they (or the organizer) could click upload again, and it should work better? Or did it not work even then?

Yes, they kept trying to upload every minute or so until they were successful.

Yes, they could continue editing, but the more they add, the later they can upload. So it’s counterproductive to map more.
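The “the more they add, the later they can upload” effect can be sketched with a simplified model: at most `limit` changes in any rolling one-hour window, with each upload accepted or rejected as a whole. This is an assumption for illustration, not the exact server behaviour, and `earliest_upload` is a hypothetical helper:

```python
from typing import Optional

WINDOW_SECONDS = 3600  # assumed rolling window of one hour


def earliest_upload(history: list[tuple[float, int]], batch: int,
                    limit: int, now: float) -> Optional[float]:
    """Earliest time (in seconds) at which `batch` changes fit under `limit`.

    `history` holds (upload_time, change_count) pairs for earlier uploads,
    all at or before `now`. Returns None if the batch can never fit.
    """
    if batch > limit:
        return None  # too big for one upload; it would have to be split
    # The window only frees up when an earlier upload ages out, so those
    # moments (plus "now") are the only times worth checking.
    candidates = sorted(
        {now} | {t + WINDOW_SECONDS for t, _ in history if t + WINDOW_SECONDS > now}
    )
    for t in candidates:
        in_window = sum(n for ts, n in history if ts > t - WINDOW_SECONDS)
        if in_window + batch <= limit:
            return t
    return None  # not reached when all history is at or before `now`


if __name__ == "__main__":
    history = [(0.0, 600)]  # 600 changes uploaded at t=0
    now = 600.0  # ten minutes later
    for batch in (300, 500, 1200):
        print(batch, earliest_upload(history, batch, 1000, now))
```

With a 1000-change limit and 600 changes uploaded ten minutes earlier, this model lets a 300-change batch through immediately, makes a 500-change batch wait until the first upload leaves the window 50 minutes later, and never accepts a 1200-change batch in one go.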

It’s possible for them to map and then upload the changes an hour later, when they get home.

The organizer can upload their changes, but this would not help with morale, because the organizer would “steal” all their work, and they would not see their results: how much they have mapped.

But are we really interested in new users barging in and uploading 2500 edits? Is that a healthy approach for good and lasting contributions? Even in a mapathon situation, do we want to gamify the whole thing into a “who maps most” competition? There are enough overlapping, duplicated, jarry-angled buildings in OSM as it is. What’s wrong with “you can only add 100 buildings on your first day, so take your time and do them well”?

They can map more at the next mapathon!

We all own OSM. Every map edit benefits every one of us!

Frederik, the new users are not “barging in”; they are invited by us, representatives of humanitarian organisations (Médecins Sans Frontières in my case, which is why I started this thread), to help us improve the base maps in areas where we either have ongoing projects and activities (mostly medical, in our case) or are assessing the needs.
And what is wrong with that approach is that you demotivate, and likely lose, the most capable mappers: the quick learners who could eventually become inspiring role models in their communities or for other students (like @Filip009 above) and/or validators. As I already wrote above, at some of our mapathons for Czech & Slovak mappers, the organisers teach JOSM directly, and we do thorough training and answer all questions regardless of the mapping editor, so the buildings the new users create are usually orthogonalized and, in general, validators report high-quality mapping. So in such cases the limit tends to be misplaced.
