Overpass API performance issues

I would think that if the choice is between downloading 3.4 MB from Overpass and 116 MB as a daily planet diff, Overpass is the way to go. Even updating your data hourly might be totally fine (but I would assume they will let you know based on the UA you send). If you send a bbox-related query each time your user shifts the map, that might look different, and I believe that’s what @Nakaner was talking about.

3 Likes

Not by volume, but that:

                headers={"User-Agent": "tap-in-osm/1.0"},

likely should be changed from that generic value to one that indicates your project bot specifically.
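For illustration, here is a minimal sketch of what a more specific User-Agent could look like. The project name, version and contact URL are placeholders (assumptions, not values from this thread), and the request is only prepared, never sent.

```python
import urllib.parse
import urllib.request

# Placeholder values: replace with your own project name and a way to reach you.
APP_NAME = "example-poi-bot"
APP_VERSION = "1.0"
CONTACT = "https://example.org/contact"


def build_user_agent(name: str, version: str, contact: str) -> str:
    """Return a User-Agent that names the project and tells operators whom to contact."""
    return f"{name}/{version} (+{contact})"


def build_request(url: str, query: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST request carrying an Overpass QL query."""
    body = urllib.parse.urlencode({"data": query}).encode("utf-8")
    headers = {"User-Agent": build_user_agent(APP_NAME, APP_VERSION, CONTACT)}
    return urllib.request.Request(url, data=body, headers=headers)
```

The point is only that server operators can tell your bot apart from other clients and have a way to get in touch before they block it.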

6 Likes

Yes, the Overpass result is cached, of course. The request to Overpass is made only once, by the service. Every other user gets only the cached JSON file from my webserver and never touches Overpass. So there is no further traffic or load on Overpass or other community-run servers.
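A rough sketch of that kind of caching, assuming a JSON file on disk and a maximum age you pick yourself. The function and parameter names are made up for this example and are not taken from the actual service; `fetch` stands in for whatever performs the single Overpass request.

```python
import json
import time
from pathlib import Path
from typing import Callable


def load_cached(path: Path, fetch: Callable[[], dict], max_age_s: float) -> dict:
    """Return the cached JSON if it is fresh enough, otherwise fetch once and store it."""
    if path.exists() and time.time() - path.stat().st_mtime < max_age_s:
        return json.loads(path.read_text(encoding="utf-8"))
    data = fetch()  # the only place an upstream request would happen
    tmp = path.parent / (path.name + ".tmp")
    tmp.write_text(json.dumps(data), encoding="utf-8")
    tmp.replace(path)  # swap the file in so readers never see a half-written cache
    return data
```

Every visitor then reads the file through `load_cached`, and the upstream server sees at most one request per `max_age_s`.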

The osmpoidb GitHub repository is also linked above. That is what I was referring to. It would be nice to see some good beginner documentation for how to use osmium to cut that data and keep it updated.

Overpass is (over)used, because it is easy to use and there is a lot of documentation in the wiki.

1 Like

It’s not documentation, but this is at least an example. It doesn’t do the “keep it updated” part but does have examples of osmium tags-filter and osm-tags-transform.

1 Like

We can talk about that, at the latest at the next pub meeting in Munich.

PTNA makes excessive use of ‘pyosmium-up-to-date’, ‘osmium tags-filter’, ‘osmium extract’, … and will be 100% Overpass-free tomorrow, for the nightly reports.
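As a hedged sketch of how those tools can be chained (this is not PTNA's actual setup; the file names, bounding box and filter expression are all assumptions for illustration): update a local extract in place, cut out an area, then keep only the tags you care about.

```python
import subprocess
from typing import List, Sequence, Tuple


def build_pipeline(
    source: str,
    bbox: Tuple[float, float, float, float],
    tag_filters: Sequence[str],
    extract_out: str,
    filtered_out: str,
) -> List[List[str]]:
    """Build the command lines for update -> extract -> tags-filter (nothing is run here)."""
    left, bottom, right, top = bbox
    return [
        # Apply replication diffs to the local file in place.
        ["pyosmium-up-to-date", source],
        # Cut out a bounding box given as LEFT,BOTTOM,RIGHT,TOP.
        ["osmium", "extract", "--bbox", f"{left},{bottom},{right},{top}",
         "-o", extract_out, "--overwrite", source],
        # Keep only objects matching the filter expressions, e.g. "nwr/amenity".
        ["osmium", "tags-filter", "-o", filtered_out, "--overwrite",
         extract_out, *tag_filters],
    ]


def run_pipeline(commands: List[List[str]]) -> None:
    """Run each command in order; any non-zero exit status raises CalledProcessError."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Run from a nightly cron job, something like this replaces the Overpass query with purely local work. Check each tool's documentation for what its exit codes mean before relying on `check=True` for the update step.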

8 Likes

OSM-Revert is now at least somewhat usable thanks to the upstream changes on the Overpass.de server. However, I still consistently get a rate_limited error on what appears to be the last request it makes (the HTTP status code isn’t shown, but I’m assuming it’s a 429). This happens at all times of day when attempting to revert at least some specific sets of two or more changesets (at least when attempting an “advanced revert”, which may perform additional requests), as well as if I don’t wait at least several minutes between 1-changeset reverts.

I rely on OSM-Revert as the linchpin of my SEO-spam fighting arsenal, reviewing and remediating dozens of SEO accounts and up to 100 changesets per day. Without it, I would either spend many times longer performing each revert and writing and posting the appropriate comments manually on every affected changeset, or have to constantly wrangle the significant time and complexity cost of switching OSM accounts in JOSM to automate only the revert part of the equation, and not as thoroughly as OSM-Revert does. The rate limiting seriously hobbles the tool in the very cases where it provides the most value: reverting many or more complex changesets from a spammer in one go and posting appropriate comments on all of them.

Unfortunately, the volunteer maintainer of the tool does not appear to have responded to a number of issues opened about Overpass problems in the past few months, including during the long period when the tool was completely unusable before the recent changes, nor to a contributed PR from 2 months ago adding pause-and-retry support.

Is there anything that can be done on the Overpass server side, at least to point to anything that can be changed in the tool to better follow the rules and reduce the chance of this happening? If not, any recommendations on a path forward here? As a mapper spending sometimes multiple hours a day of volunteer time to mitigate the harms of essentially the same basic type of shameless for-profit abusive spammers that were overloading Overpass, the more we can reduce the community resources (i.e. time and effort) expended per spammer, the more effectively we can stop them and perhaps even deter more spam in the future.

1 Like

“Not using Overpass” sounds like a plan? For individual changesets and interactive reverts Josm “just works”, beyond that the Perl scripts are an option.

You could also stand up your own Overpass server, but that would be more work.

3 Likes

Thanks for the suggestions!

Unfortunately, for the high-volume SEO spam use case the JOSM revert plugin as mentioned has rather severe compromises (slow and tedious–often as much or more so than a manual revert–in the common case of subsequent changes, doesn’t post comments, requires a restart and some hackery to switch to my revert account and back), as do the Perl revert scripts apparently (won’t revert any objects with subsequent changes, don’t post comments, multi-step workflow, Perl). And as an open source scientific software developer in the US right now, given the current situation I unfortunately don’t have spare financial resources or bandwidth to spend on setting up and maintaining my own worldwide Overpass server with the metadata and attic data required for OSM-Revert. Having talked with @jacobwhall about his Overpass setup, supporting attic data is apparently quite non-trivial and would further increase the resource cost, such that none of the pre-built containers/workflows or community-run servers support it (including yours :slightly_smiling_face:). Especially just so I can spend volunteer time and effort helping deal with dozens of SEO spam changesets each day, hehe.

After discussing it with Roman Deev (of Better-OSM-org fame) on Zaczero/osm-revert#62, perhaps the most viable option I see in terms of time and resources is either running OSM-Revert locally from his branch, with any further modifications as necessary to reduce Overpass load and ensure all rules are complied with, or chipping in a few bucks a month to help deploy the same on a VPS. There are a few hurdles to leap through (namely, little to no documentation and seemingly relying on Nix for install/deployment which I’m not personally familiar with), but I’m hopeful we can solve them.

The option to revert objects with subsequent changes is a flag in most of the scripts. It defaults to “not reverting” (a safe default), because if there have been subsequent changes you may need to reconcile subsequent changes (depending on whether a redaction is needed). By definition, any revert affecting objects with subsequent changes has to be a multi-stage process for that very reason, even if the final stage is “asking the community to tidy up after a redaction needed for licence reasons”.

Adding retries to hammer an overloaded community overpass server seems like a spectacularly bad idea.

I know that @Firefishy has been cracking down on spam at the account level so (at the risk of stating the obvious) please do report spammer accounts.

2 Likes

In situations where the available quota fluctuates frequently, this is the only option for this tool. You can’t accurately predict the interval at which you can make requests to the Overpass API, but you can rely on the server’s HTTP responses.

I’d also like to point out that the proposed PR specifies a retry interval of 10 seconds, which is very generous for the server. And the number of retry attempts is limited.
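To make the idea concrete, here is a minimal sketch of pause-and-retry driven by the server's HTTP status. It is not the code from the PR: the 10-second interval comes from the post above, while the cap of 3 attempts and all names are assumptions for this example. `send` stands in for whatever performs one Overpass request and returns `(status, body)`.

```python
import time
from typing import Callable, Tuple

RETRY_INTERVAL_S = 10  # interval mentioned in the thread
MAX_ATTEMPTS = 3       # assumed cap; the thread only says the number is limited


def request_with_retry(
    send: Callable[[], Tuple[int, bytes]],
    max_attempts: int = MAX_ATTEMPTS,
    interval_s: float = RETRY_INTERVAL_S,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call send() until it succeeds, pausing only when the server answers 429."""
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status == 200:
            return body
        # Give up on any other error, or once the attempt budget is used up.
        if status != 429 or attempt == max_attempts:
            raise RuntimeError(f"Overpass request failed with HTTP {status}")
        sleep(interval_s)  # wait before retrying this single request
    raise ValueError("max_attempts must be at least 1")
```

Only the one rate-limited request is repeated, and the loop stops for good after `max_attempts`, so the worst case is a small, bounded number of extra requests.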

I’ll also note that osm-revert typically makes no more than three requests to the Overpass API. And osm-revert is a service that benefits OpenStreetMap and mappers. It’s pointless to argue about a community server when most Overpass users don’t use the Overpass API to improve the map.


P.S. It’s a different matter that the Overpass API isn’t needed for a reverter. Someone should write a reverter that runs entirely in the user’s browser, using only the OSM API…

1 Like

If a community Overpass instance’s “available quota fluctuates frequently” it’s doing that for a reason! I’d suggest “use something that is known to work” is actually the better option.

Saying that “someone should” always generates a :smiley: in OSM! Having some low entry barrier (such as being able to use Josm) before reverting someone else’s work is IMHO a good thing, and with a DWG hat on I’ve seen numerous edit wars played out via the medium of osm-revert. Also, it regularly makes a mess that needs tidying up.

1 Like

I’ve been thinking about that for changeset viewers like OsmCha. But if direct OSM API calls become too common, that could end up overloading the OSM servers, and that is a much worse situation than Overpass being overloaded. It’s probably a good thing that popular tools don’t talk directly to OSM.

2 Likes

Just a note: OsmCha has been generating its own augmented diffs, independently from Overpass, for a while now.

3 Likes

Ah, thanks! I’d thought I’d seen that somewhere, but when I went back to check the docs and wiki page, I couldn’t find any mention of it there or in the script arguments. The code comments I found in complex_revert.pl just discussed the default behavior, and also mentioned that the state of the object is reset to the version immediately prior to the changesets rather than incorporating non-conflicting changes like OSM-Revert does.

Sorry I was unclear; the multi-step process I was referring to was having to download the user’s changesets to a local directory with download_changesets.sh, do any manual pruning or grouping of changesets, and then pipe them into complex_revert.pl (plus then manually commenting on each of them).

As to your comment, I’m not quite sure this is true at least for this use case, since if I’m actually reverting a SEO spam changeset (versus manually fixing it instead), changes since then are generally one of the following that is automatically handled by OSM-Revert:

  • To untouched tags/elements/etc. or to tags that already existed, in which case they are automatically kept
  • Minor or bot fixes to the invalid tags/objects the SEO spammer added, which are automatically removed
  • Derivatives of copyvio content, in the case of content copied from Google Maps (which a lot of SEO spammers do), which are necessarily removed
  • Moot, in case the feature shouldn’t exist on OSM to begin with (no permanent physical presence on the ground, wrong location, etc.)
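To illustrate the first two cases, here is a simplified tag-level sketch of that kind of merge. It is explicitly not OSM-Revert's actual algorithm, only an assumption of how "keep later changes to untouched tags, drop whatever the changeset introduced" could work for a single object's tags.

```python
from typing import Dict

Tags = Dict[str, str]


def revert_tags(before: Tags, after: Tags, current: Tags) -> Tags:
    """Undo what one changeset did to an object's tags, keeping unrelated later edits.

    before:  tags just before the reverted changeset
    after:   tags as the reverted changeset left them
    current: tags as they are now, including any subsequent edits
    """
    result = dict(current)
    for key in set(before) | set(after):
        if before.get(key) == after.get(key):
            continue  # the changeset did not touch this key: keep the current value
        if key in before:
            result[key] = before[key]  # restore the value the changeset overwrote or deleted
        else:
            result.pop(key, None)  # key was introduced by the changeset: remove it, later fixes included
    return result
```

For example, if a spammer added a `website` tag, a bot later "fixed" its URL scheme, and another mapper added `opening_hours`, the result drops `website` and keeps `opening_hours`.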

If other human mappers have already made significant efforts to clean up the SEO spam POI, if the POI only needs minor fixes to begin with, or if it’s a relatively rare corner case, I don’t use OSM-Revert and just clean it up myself.

That seems a tad harsh, considering the requests are for the express purpose of remediating damage by abusive (or at least negligent) actors to the core OSM database that the Overpass server itself relies on, by a tool you and your DWG colleagues seem to use quite extensively for the same purpose as I. And in practice, waiting a period of time and then retrying that single request results in significantly fewer total requests (per revert) since it doesn’t need to repeat all the other requests like when manually retrying. (Conversely, the previously-discussed limitations of other tools that don’t use Overpass means substantially fewer reverts for a given amount of volunteer time.)

Yes indeed. Actually, @Firefishy’s profile-level SEO spam block spree is the origin of my current ~400+ SEO spam account backlog from the past couple of weeks that I’ve been working through, 20-40 accounts (~50-100 changesets) per day (I’m almost halfway through now). These accounts were mass-banned for SEO spam profiles but their changesets were not affected, thus I’ve been reviewing them to see what can stay/was already fixed, what needs (further) minor/manual fixes and what can just be reverted, and dealing with them accordingly.

3 Likes

It’s worth noting, however, that OsmCha continues to use the Overpass API as a fallback. Just open the most recent changeset in the OsmCha feed. It likely won’t have an augmented diff.

1 Like

Linking this here for reference (although it unfortunately won’t help with reverts)

The VK Maps (ex. mail.ru) Overpass API server is back online: VK Maps - maps and location services for your business

Overpass API URL for osm-revert: https://maps.mail.ru/osm/tools/overpass/api/interpreter

4 Likes