My bet: between 10 and 11 o'clock today it starts to move. Typically you either get spare parts within 5 hours, 24x7, or "next business day", so I guess somebody at the ISP saved some money on spare-part handling.
Just a friendly reminder that high availability and redundancy cost extra, so if you can, go to Donate – OpenStreetMap Foundation and donate what you can, so that maybe in the future this can be prevented.
Let the team behind the scenes work in peace. They’ll know how to get out of this mess.
It’s going to take time and I trust them. Good success takes time …
And instead of trying to ‘make a name for yourself’ with ‘clever’ advice, you could help the family decorate the Christmas tree.
The Operations Team has decided to wait for the restoration of ISP services at our primary site in Amsterdam. Our expectation is that services should be restored on Wednesday. We don't have an ETA from the ISP, so the time estimate is based on our own predictions from our communication with them.
While manually recovering services (Postgres + planet diffs) to Dublin remains an option, we have decided for now not to activate this disaster recovery scenario. The risks involved are not yet justified. Data integrity is our priority.
In parallel we are finalising the provisioning of new ISP services in Amsterdam and Dublin.
Just curious, who is operating the “admin” account?
Grant. My login token expired and I used a special admin login method. OAuth2 via OSM.org isn't functioning because the tokens cannot be stored in a read-only database.
The ISP is express-shipping the equipment from California.
By ship?
The good news here is that eventually all of our tokens will expire and you won't have to hear complaints on the forum anymore.
SSO - Single Silence, Offline.
Out of curiosity, can you explain why we can read the database and not write to it? There probably are lessons here for those of us who are involved in managing other platforms.
Considering how many services use our data, one would expect the infrastructure to work a little more efficiently. I’m curious how long it will take to fix, and even more curious what lessons will be learned.
We have a primary (Amsterdam) and a follower (Dublin). The primary's data is synced to the followers (we have multiple). We use asynchronous replication because latency between Amsterdam and Dublin can vary a lot, and synchronous replication would affect the speed at which changes upload. In addition, we have state data for planet diffs and internal diff-tracking state.
When the uplink in Amsterdam failed, not all the data had been synced to Dublin. A small amount of map changes therefore exists only in Amsterdam (we also have a follower database in Amsterdam). If we force Dublin live, that data will be lost. Alternatively, we could manually sync over a 4G + VPN link, but we have deemed this too high-risk for the moment.
Summary: We are running Read-Only in Dublin because not all the map changes had been copied to Dublin when our Amsterdam connection went down. If we force Dublin to Read-Write, we will lose some mapping data.
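To make the asynchronous-replication gap concrete: PostgreSQL tracks write-ahead-log positions as LSNs (log sequence numbers) written as `XXX/YYYYYYYY` hex pairs, and the follower's lag is just the byte distance between the primary's and follower's LSNs. A minimal sketch (not OSM's actual tooling; the LSN values below are made up):

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert a PostgreSQL LSN like '16/B374D848' to an absolute byte position."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) + int(lo, 16)

def replication_lag_bytes(primary_lsn: str, follower_lsn: str) -> int:
    """Bytes of WAL written on the primary but not yet applied on the follower."""
    return lsn_to_bytes(primary_lsn) - lsn_to_bytes(follower_lsn)

# Hypothetical positions: primary (Amsterdam) vs. follower (Dublin)
lag = replication_lag_bytes("16/B374D848", "16/B2F01000")
print(f"follower is {lag} bytes ({lag / 1024:.0f} KiB) behind")
```

With synchronous replication that lag would be forced to (near) zero before each commit returns, which is exactly the upload-speed cost described above; asynchronous replication accepts a nonzero lag, and whatever WAL sits in that gap when the link dies exists only on the primary.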
Tiny budget, tiny team. We do the best we can with the resources we have. We’d appreciate more help.
Is this the point at which we should think about no longer running this part of OSM on a volunteer basis?
Wow, my estimate wasn't too far off. A tough time for anyone here who relies on OpenStreetMap in their daily life, and especially for those who were planning to update public transport lines for the annual schedule change for 2025 (which happened to fall on the exact day the database went down).
Out of curiosity, what are the data integrity risks involved?
I am the only full-time employee of the OpenStreetMap Foundation. The OSMF is interested in hiring additional staff, but I believe it is constrained financially.
Manual Postgres WAL recovery and manual osmdbt state recovery, while ensuring it continues from the right logical DB segment.
We’d also have to hack the 4G link to get it working for this purpose. The 4G modem link has a tiny bandwidth allowance, but can be topped up in 500MB blocks.
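Since the 4G allowance is topped up in fixed 500 MB blocks, the cost of the manual sync is easy to estimate. A small illustrative sketch (the transfer size and remaining allowance are hypothetical, not actual figures from the incident):

```python
import math

BLOCK_MB = 500  # 4G top-up block size mentioned above

def topup_blocks_needed(transfer_mb: float, allowance_mb: float = 0) -> int:
    """Number of 500 MB top-up blocks needed to move `transfer_mb`
    of data over the link, given any allowance already remaining."""
    remaining = max(0.0, transfer_mb - allowance_mb)
    return math.ceil(remaining / BLOCK_MB)

# Hypothetical: syncing ~3.2 GB of WAL with 100 MB of allowance left
print(topup_blocks_needed(3200, allowance_mb=100))  # -> 7
```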
Though some mapping data is also lost through long downtime: edits that simply never get made.
On the other hand, throwing away some unsynchronized data would, I guess, greatly increase the risk of inconsistencies: some edits left only partly applied, some reverts discarded and the vandalism they removed restored, and perhaps other data lost as well (registrations, blocks, etc.).
(note: I am not a highly experienced sysadmin)
I can confirm this.
(though note that I am no longer on the OSMF board, but I expect no dramatic changes here)