Nearly a month in the making, I am super excited to bring this to light.
This project is a transformative rewrite of the foundational OSM infrastructure in Python. Its goal? To simplify contributions, secure against vandalism, and modernize the API, all while remaining backwards compatible.
(Some) New Feature Highlights
100% Python; Move away from the complexities of Ruby, C++, and SQL.
API 0.7; A streamlined, future-proofed update that addresses many 0.6 challenges.
Anti-Vandalism Hardening; A comprehensive 3-stage strategy to combat vandalism.
Optimized Performance; An innovative parallel diff processing algorithm.
Important Update: In the video, I mentioned MongoDB replacing the PostgreSQL database, but that is no longer the case. OpenStreetMap-NG will work with PostgreSQL. Please see the document for the updated information (or watch a State of NG episode).
State of NG
I started a video series where I talk about the current project state and summarize general community discussions. You can find the latest updates here:
SONG #1 — PostgreSQL, Performance, Project Structure (2023-11-15)
I understand that what is obvious to me may not be so for others. Because of that, I will be hosting two open AMA video sessions, collecting questions from you—the community. I will then format all the questions and answers and publish them on the project’s website. Feel free to hop in, even just to chat about anything!
Looks like I’ll be first in what will surely be a busy thread!
First of all - this sort of initiative is surely needed, since the tech stack of OSM is at risk of stagnating, if it has not already! However, I believe the hard part for you begins now. The tech is easy; getting buy-in from the community will be a few orders of magnitude harder.
A few questions as well:
Have you been in contact with the contributors to the current code before this announcement?
What is the technical reason for moving away from PostgreSQL/PostGIS? I do not see an objective benefit that outweighs the work involved in migrating the database engine.
Will there initially be backwards compatibility with API 0.6 clients?
Does your code include an OAuth2 server, or will that be using an off-the-shelf solution like Keycloak or Ory?
Are you aware of the API 0.7 wiki page, which collects suggestions for a new version of the API? Which, if any, of those suggestions have you incorporated?
EDIT: Another question:
I don’t know where in the current API C++ is used, but that’s a language that’s rarely used without a good reason. Have you done some benchmarking of the two APIs to ensure that yours does not introduce a performance regression?
I think that redoing the OSM API is long overdue. So, I applaud your commitment to this goal!
I have watched your video and taken a quick look at the code. What I am missing is some written consideration of the chosen tech stack. I think that if you are making choices like the tech stack, you should write down requirements and compare them against alternative technologies.
And I do think that the code structure could use some work; it seems that most of the retrieval logic is housed in the models. I think that is less than ideal. Your models should define the data model, and that kind of logic should be housed in repositories or, if it involves multiple entities, in services.
But for the rest, I think that the additions you propose to the 0.7 API are very reasonable. And things like better hashing for mods/admins and 2FA are very welcome security improvements.
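To make the repository/service suggestion a bit more concrete, here is a minimal sketch of what I mean. All class and method names are hypothetical and not taken from the project, and the in-memory repositories only stand in for real database access.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class Node:
    """Plain data model: fields only, no retrieval logic."""
    id: int
    lat: float
    lon: float
    changeset_id: int


@dataclass(frozen=True)
class Changeset:
    id: int
    user_id: int


class NodeRepository(Protocol):
    """Retrieval logic for a single entity lives in a repository."""

    def get(self, node_id: int) -> Optional[Node]: ...
    def in_changeset(self, changeset_id: int) -> list[Node]: ...


class ChangesetRepository(Protocol):
    def by_user(self, user_id: int) -> list[Changeset]: ...


class InMemoryNodeRepository:
    """Hypothetical stand-in for a database-backed repository."""

    def __init__(self, nodes: list[Node]) -> None:
        self._nodes = {n.id: n for n in nodes}

    def get(self, node_id: int) -> Optional[Node]:
        return self._nodes.get(node_id)

    def in_changeset(self, changeset_id: int) -> list[Node]:
        return [n for n in self._nodes.values() if n.changeset_id == changeset_id]


class InMemoryChangesetRepository:
    def __init__(self, changesets: list[Changeset]) -> None:
        self._changesets = list(changesets)

    def by_user(self, user_id: int) -> list[Changeset]:
        return [c for c in self._changesets if c.user_id == user_id]


class UserEditService:
    """Logic that spans several entities lives in a service."""

    def __init__(self, changesets: ChangesetRepository, nodes: NodeRepository) -> None:
        self._changesets = changesets
        self._nodes = nodes

    def nodes_edited_by(self, user_id: int) -> list[Node]:
        result: list[Node] = []
        for changeset in self._changesets.by_user(user_id):
            result.extend(self._nodes.in_changeset(changeset.id))
        return sorted(result, key=lambda n: n.id)
```

The upside is that the models stay trivial to test, and a repository can be swapped for a different storage backend without touching the service code.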
Given the use of MongoDB, it seems this is not meant as a proposed replacement?
In general, I suspect that improving the existing codebase would be much better than “let’s replace the codebase, migrate storage, start using a new database, set up a new deployment and use new languages, add more features, change the API, add edit filtering, everything at once” as a viable progression.
For example: imagine planning a theoretical deployment of that project, including the ability to undo it and return to the old setup without data loss.
This alone would be a massive task to plan and prepare. And if many features are added at the same time, it becomes more complex to even test that something is backward-compatible.
I suspect that most responses you get will be very similar to these: very happy and positive that someone is doing something, but skeptical of the scale and tech choices (like @Mateusz_Konieczny, I’m very skeptical of replacing PostgreSQL/PostGIS unless there is a really good and sound argument for it).
I’ve actually done something very similar to this once; trying to come out of nowhere with a huge revamp of an existing system. And I’ve seen plenty of others do similar things.
The results were always the same: Somewhere between going nowhere, and some minor aspects being applied. Which is really sad, as in most of those cases, as in this, change was badly needed (in one case, the main project collapsed shortly after).
I’ve actually played with similar thoughts as you; amongst them re-implementing the OSM API in Python (and specifically FastAPI). So based on the ideas I had, I think you might want to consider this: Begin by going through the current openstreetmap-website and looking at the parts that could be extracted from it; like API, website, and authentication, and then handling each part separately:
For authentication: Security is really, really hard to do well, and there are tons of traps along the way. I don’t think it makes sense for OSM as a project to maintain its own IAM stack. Instead, it would make sense to extract it into some other existing components, like Keycloak or Ory. This would be a project in its own right; establishing the new stack and replacing the code in openstreetmap-website.
For API: The primary goal here is backward compatibility and continuity, because of the number of clients depending on it. To begin with, I again really don’t think replacing PostgreSQL makes sense, any changes in the rest of the stack become so much easier if it is kept. Assuming this, some alternatives are:
Writing a 0.6 compatible API in FastAPI on top of the existing database, run it in parallel with the existing API, then once it seems stable turn off the existing API, new API versions can then be implemented in FastAPI (even while 0.6 is kept alive in both existing and new code)
Going directly to 0.7 in FastAPI; would be less code duplication but might require keeping the existing API around for longer
In both cases, you should be able to keep around a lot/most of your code, just replacing the data access layer.
An additional upside to this approach is that as it can be run in parallel to the existing codebase, it should be easier to convince those in charge of the current OSM infrastructure to run your code (initially in the pre-production environment, but also when graduating to production).
For website: This part I believe is the least urgent, as it is possible to build other frontends (and that is indeed being done, with great websites such as OSMCha etc.). Anyway, it should be possible to work on this (either still in Ruby, or FastAPI or even going for a frontend framework) independently from the rest.
This may be fine, as long as you are prepared that those conversations result in you having to throw away every single line of code you’ve written. You’ll have to enter the conversation ready to do so.
Maybe, maybe not. Just note especially my post regarding splitting the current codebase up into separate components though, a revamp like this would be the perfect opportunity to improve the architecture.
Also, unless you’re quite experienced with Ruby, do consider the possibility that the current codebase is in fact not convoluted, just hard to understand for someone without experience. I’ve personally used Ruby (on Rails) quite a bit in the past (in fact, it was part of the project I did where I came up with a huge revamp from nowhere) but got annoyed with it because there’s too much “magic” going on, making the code hard to follow. This would of course work for your argument that Python/FastAPI would be easier for newcomers to work with.
Sure, and that’s a commitment OSM is in need of.
But that does not address my fear. In my case, I spent at least 20h a week for several years (initially on my revamp, then gradually more and more on working with the existing codebase, though I never got more than a few smaller parts of my original revamp in). The issue is, again, not technical; it is about how others will react to it.
Your entire reasoning is about what you prefer. We want to know why it is objectively better.
Some things that might convince me (and likely others):
Concrete functionality that’s available in a document database but not in a relational database
And note, that since the switch is significant work and risk, you’d have to show a significant improvement, both in terms of performance and functionality, not just a status-quo.
Personally, I much prefer SQL databases over NoSQL-databases, and I could produce an argument just like yours to that effect.
No, it never is. Been part of a few major database migrations, and it’s always a lot harder than you might initially think.
I’d much implore you to reconsider this: Sure, you can implement your own OAuth server, and it’s actually a quite fun exercise. But how sure are you of its robustness? Are there any backdoors you have unintentionally introduced?
Security is not a game or about what’s fun, and the ongoing vandalism against OSM shows that there is an actual target. Imagine the current vandalism, but from someone having gained access to existing user accounts, camouflaging their edits in the stream of the users legitimate commits!
Keycloak is pretty straight-forward, and Ory is a bit more complex but on the other hand has a company behind it, maybe they’d give us some extra help to deploy their stuff, as it would be a great showcase for them?
Repeating myself, but I think it is a really bad idea for OSM to maintain its own IAM stack, and this revamp would be the perfect opportunity to address this.
In that case, I’d expect you to produce a benchmark proving that the performance regression is negligible.
Also, note that a few milliseconds per request is not necessarily a micro-optimization. I don’t know how many requests this service handles or what the best/average/worst case is, but that’s information that’s needed to make a call on that.
Also, just to be clear, I think what you’re doing and your initiative is awesome, so don’t take my criticism as me being against change or wanting you to fail. I just want the result to be as good as practically possible.
Having received the community feedback, I am considering a switch to PostgreSQL; I will post more updates soon. As more and more people expressed the same stance, I believe it’s the only sensible move forward. At the current stage, this should not take more than 2 days to change completely.
Feel free to reach out if you need any assistance with the SQL parts.
Some tips up-front:
You probably already know about it, but there’s a diagram of the current schema here
SQLAlchemy is the most commonly used ORM, and for a while now it has also had an async API. It is, however, somewhat complex (though most of the complexity is there because it is also powerful)
SQLModel is another ORM, developed by the same person as FastAPI and thus having a similar “feel”, however, its development has been slow and I’m not sure I’d trust it for production yet
I’ve personally used Tortoise ORM for a few projects, it’s quite alright but does lack some advanced features
However, you might not even need an ORM! Considering the relative simplicity of the data model of OSM, it might make sense for you to write your own queries behind an appropriate layer of abstraction. As I understand it you are not too comfortable with SQL, but you shouldn’t need more than the basics for a project like this, so I don’t think it’d be out of your reach.
There are some things I’d consider changing about the current schema, for example, I think it might be possible to get rid of the current_ tables using some of the newer PostgreSQL features, but, for easier migration to your codebase I’d stay away from that for now
As already mentioned, I think it would make sense for you to focus on just the “core” parts, i.e. changesets, nodes, ways, relations, and redactions, both current_* and history tables. That, together with keeping the current schema intact, will likely give you a much smoother path toward getting your work deployed.
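As a rough illustration of the "no ORM, hand-written queries behind a layer of abstraction" tip above, here is a minimal sketch. To keep it runnable without a database server it uses the standard-library sqlite3 module as a stand-in for PostgreSQL, and the table layout is a simplified assumption for the example, not the real openstreetmap-website schema. The NodeStore class and its methods are hypothetical names.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    id: int
    lat: float
    lon: float
    version: int


class NodeStore:
    """Thin data-access layer: callers get Node objects and never see SQL."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def get(self, node_id: int) -> Optional[Node]:
        # Parameterized query; with a PostgreSQL driver such as psycopg
        # the placeholder would be %s instead of ?.
        row = self._conn.execute(
            "SELECT id, lat, lon, version FROM current_nodes "
            "WHERE id = ? AND visible = 1",
            (node_id,),
        ).fetchone()
        return Node(*row) if row else None

    def in_bbox(self, min_lon: float, min_lat: float,
                max_lon: float, max_lat: float) -> list[Node]:
        rows = self._conn.execute(
            "SELECT id, lat, lon, version FROM current_nodes "
            "WHERE visible = 1 AND lon BETWEEN ? AND ? AND lat BETWEEN ? AND ? "
            "ORDER BY id",
            (min_lon, max_lon, min_lat, max_lat),
        ).fetchall()
        return [Node(*row) for row in rows]


def make_demo_connection() -> sqlite3.Connection:
    """Create an in-memory database with a simplified, assumed table layout."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE current_nodes ("
        "id INTEGER PRIMARY KEY, lat REAL NOT NULL, lon REAL NOT NULL, "
        "version INTEGER NOT NULL, visible INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO current_nodes VALUES (?, ?, ?, ?, ?)",
        [
            (1, 52.0, 21.0, 1, 1),
            (2, 52.5, 21.5, 3, 1),
            (3, 52.1, 21.1, 2, 0),  # deleted node, must not be returned
            (4, 48.0, 11.0, 1, 1),  # outside the demo bounding box
        ],
    )
    return conn
```

Only basic SELECTs are needed here, and because the SQL is confined to one class, an ORM could still be introduced later without changing the callers.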
However, I personally didn’t find sufficient justification to continue supporting C++. Most of the time, it simply waits for database queries. The most significant time savings come from constructing the API responses (XML encoding). This optimization likely saves a few milliseconds at best per call, which I consider to be in the realm of micro-optimization. Cgimap doesn’t perform any computationally expensive operations itself, making it seem like a bad tool for the job.
This explanation makes me even more curious about the benchmarks you used to make claims about scalability. Were you testing with a full database dump, an extract, or some other mocked-up data? What kind of hardware did you test on?
Besides osm.org, other deployments of openstreetmap-website are running in environments with different characteristics. I’m not in a position to say with any certainty, but anecdotally, CGIMap has very significantly improved the user experience on OpenHistoricalMap. That project was all but forced to switch to CGIMap because the /map command simply couldn’t handle the data volume and density in some parts of that database. Now the browser is the performance bottleneck, which is a good problem to have.
Perhaps your implementation provides a comparable performance improvement, but I would expect hard numbers and steps to reproduce before drawing that conclusion.
To make benchmarks, I first need to have code to benchmark, so coming soon. I used my experience to make that statement. I believe the majority of the performance boost in cgimap comes from handcrafted/optimized SQL calls. Encoding is not computationally expensive and the time saved thanks to C++ is insubstantial. But of course, when proper benchmarks are run, we will have a definite answer.
I’m not in a position to evaluate your experience, but ultimately it isn’t me you need to convince anyways. C++ may not be an essential language for that component versus some other language – cue Rust fans! – but I am taken aback by the implication that CGIMap is much ado about nothing, because it flies in the face of what I’ve seen. Looking forward to your benchmarks.
Do note that encoding is in fact something that’s more efficiently done in languages like C, C++, Rust or Go, thanks to better control over the memory, as well as for reasons similar to why SAX-style approaches are more efficient than DOM-style approaches.
That said, a package such as ujson might bring most of these benefits to Python as well. But the best way to tell is always to benchmark.
A third approach might be doing the encoding directly in PostgreSQL, which may or may not be more efficient, but of course requires writing more advanced SQL.
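For anyone who wants to try the DOM-style vs. SAX-style comparison in Python itself, here is a minimal sketch using only the standard library. The node data is synthetic and the element layout is a simplified assumption, so any timings it prints say nothing about the real API or about cgimap. Which variant wins in Python is exactly what a benchmark like this is meant to find out.

```python
import io
import timeit
import xml.etree.ElementTree as ET
from xml.sax.saxutils import XMLGenerator

# Synthetic test data: (id, lat, lon) tuples, not real OSM nodes.
NODES = [(i, 52.0 + i * 1e-5, 21.0 + i * 1e-5) for i in range(1, 1001)]


def node_attrs(node_id: int, lat: float, lon: float) -> dict:
    return {"id": str(node_id), "lat": f"{lat:.7f}", "lon": f"{lon:.7f}"}


def encode_dom(nodes) -> bytes:
    """DOM-style: build the whole tree in memory, then serialize it."""
    root = ET.Element("osm", {"version": "0.6"})
    for node_id, lat, lon in nodes:
        ET.SubElement(root, "node", node_attrs(node_id, lat, lon))
    return ET.tostring(root, encoding="utf-8")


def encode_stream(nodes, out) -> None:
    """SAX-style: write each element to the output as it is produced."""
    gen = XMLGenerator(out, encoding="utf-8", short_empty_elements=True)
    gen.startDocument()
    gen.startElement("osm", {"version": "0.6"})
    for node_id, lat, lon in nodes:
        gen.startElement("node", node_attrs(node_id, lat, lon))
        gen.endElement("node")
    gen.endElement("osm")
    gen.endDocument()


def encode_stream_bytes(nodes) -> bytes:
    buffer = io.BytesIO()
    encode_stream(nodes, buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    runs = 20
    dom_time = timeit.timeit(lambda: encode_dom(NODES), number=runs)
    stream_time = timeit.timeit(lambda: encode_stream_bytes(NODES), number=runs)
    print(f"DOM-style:    {dom_time / runs * 1000:.2f} ms per response")
    print(f"Stream-style: {stream_time / runs * 1000:.2f} ms per response")
```

Besides raw speed, the streaming variant can write straight to a socket or file without holding the whole response in memory, which matters more as responses get larger.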
So! I have updated the announcement document, the roadmap, and the FAQ questions. Here’s what we’ll do for PostgreSQL support: SQLAlchemy will be used because it has great migration support, is well tested, is super flexible, and has async support too. Alembic is an obvious choice (designed for SQLAlchemy) for the migration tool. I really wanted to use a decent solution for maintaining a single set of models, but the majority of the projects are either less featured than SQLAlchemy or are not well maintained. I will follow a similar pattern as described on the Pydantic website and use the formerly known “ORM Mode”: Models - Pydantic. A switch to Django was also considered, as it provided a uniform way to handle ORM, but it introduced a lot more complexities to the project and, again, wasn’t fully async compatible. I expect the change to take no more than 2 days and will prioritize it before moving forward.
Also, I want to kindly remind everyone that, as with every big community project, discussion is key. I find it somewhat strange that some people are quick to downplay the release without seeing the final result. Everything requires some time and discussion. I kindly ask for your respect and trust in my work. I am dedicated to providing the community with the best update experience and will continue working towards that goal.
This is kind of inevitable if your approach is to come in vigorously rocking the boat with such a polished presentation. You’ve demonstrated skill in identifying problems and putting together a package that includes the kind of productization that most developers frankly are not good at. Even with the best of intentions, you’re going to face some skepticism from people who’ve been working on these problems longer.
Now comes the hard slog of explaining things over and over again, tracking down odd edge cases, and revisiting faulty assumptions you made due to unknown unknowns. You won’t get the same adrenaline rush as when you first debuted this project, but believe me, once you succeed, no one will remember how it started anyways – they’ll remember what you got past the finish line.
I’d note that this community is accustomed to discussions happening in advance of decisions. What you’ve set up here is a project where you make decisions first, and you’re now receiving the feedback, and that sets a different tone and a different barrier to entry. I suspect many people don’t have the time and energy to participate in this format - I don’t, at least.
I think it also limits the utility of the end product. If you’re making decisions on your own, that works fine if you want to either 1) fork the project and deploy your own or 2) use the code as a demonstration project for potential improvements to OSM infrastructure that will be written later to fit into the existing codebase - but where this code is not intended to be the actual replacement. But if you want the code you’re working on to be used in production on openstreetmap.org, I’d encourage you to flip it back around and start with discussion before making decisions - that consensus process drives the community buy-in you’ll need to actually see your code deployed.
So, my question is - is this a demonstration project that’s solely intended for community discussion, or is your intention for your code to be deployed?
I find your comment somewhat surprising: So far, all discussion here has been very civil and respectful from all sides. And I must say that I’m somewhat surprised about the lack of negativity here, OSM tends to have somewhat of a veto-culture where there’ll usually be someone completely opposed to any change, regardless of the details.
Do note also that the discussion so far has mostly been about high-level details (like the choice of database and other architectural choices), the kind of things that usually get discussed before any significant amount of code is written. As such, you should expect discussion around, including pushback on, decisions you have already made. I also feel it is appropriate to add the standard graph on cost of change vs. time:
Based on this comment and your initial post I think you might have gone into this thinking all this is a done deal. Be very careful about that! At this stage (especially considering the relatively low number of participants in this discussion), it might still very much happen that the consensus shifts to improving the current codebase instead, or even writing new code from scratch in a different language, etc. By keeping that in mind you’ll likely find it easier to take the comments you’ll get into consideration, significantly raising the likelihood that this project will come to fruition.
This kind of thing is not easy and I know what you’ve gone/are going through. Starting something like this with a discussion runs the risk of not being taken seriously and being almost ignored, while coming in guns blazing like this runs the risk of having to redo a lot of work. The best is becoming involved with the status-quo first (both to show your abilities and seriousness and to learn about the considerations, tradeoffs and implicit assumptions of the current system), like working on the existing code, but that takes time of course.