State of NG #1 — PostgreSQL, Performance, Project Structure

Today I am posting the first State of NG (SONG) video update. In this series, I will talk about the current state of the project and summarize general community discussions.

Video length: 10 min 48 sec

Video Summary

Database Choice

Kamil begins by thanking the community for their participation in discussions. He acknowledges concerns about using MongoDB as the project’s data store. He admits that MongoDB was chosen for its simplicity but recognizes that it may not be the best choice. The project is transitioning to PostgreSQL, a more widely known database, which should improve developer familiarity and performance. PostgreSQL allows for application-specific optimizations and provides more tooling.

Python Performance

Kamil addresses concerns about potential performance bottlenecks in Python. He mentions the Cython project, which can compile Python code to C. Without static typing, this can result in a 20-50% performance gain, and with static typing, even greater improvements, depending on the function’s complexity. Initial testing has shown 50-1000% performance gains on computationally heavy methods.

  • Pure Python Mode: In the context of Cython 3.0, Kamil mentions the importance of “Pure Python Mode,” which allows for writing Cython modules in a Python-native way. He presents an example.


Looking at future scalability, Kamil considers the possibility of running multiple threads per user request. While not an immediate need, this approach could significantly improve performance when needed.
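As a rough illustration of that idea (a hypothetical sketch, not project code; all function names are made up), independent sub-tasks of a single request could be fanned out to a thread pool instead of running sequentially:

```python
# Hypothetical sketch of serving one user request with multiple threads:
# independent sub-tasks (e.g. fetching nodes and ways) run concurrently.
from concurrent.futures import ThreadPoolExecutor


def fetch_nodes(ids):  # stand-in for an I/O-bound database call
    return [f"node:{i}" for i in ids]


def fetch_ways(ids):  # another independent I/O-bound call
    return [f"way:{i}" for i in ids]


def handle_request(node_ids, way_ids):
    # Both fetches run in parallel; the request finishes when both are done.
    with ThreadPoolExecutor(max_workers=2) as pool:
        nodes = pool.submit(fetch_nodes, node_ids)
        ways = pool.submit(fetch_ways, way_ids)
        return nodes.result() + ways.result()


print(handle_request([1, 2], [3]))  # ['node:1', 'node:2', 'way:3']
```

For I/O-bound work like database queries this helps even under the GIL, which is presumably why it is framed as an option to enable later rather than an architectural necessity now.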

Database Schema

Kamil believes that the new database schema will operate more efficiently, though he cannot guarantee this without performance benchmarks, which will come later.

Client-Side Content Encoding

Kamil introduces the concept of client-side content encoding. By using techniques like Brotli and GZIP, the size of data transferred by users can be significantly reduced. This results in substantial performance gains for users with slower internet connections. An example changeset upload takes 5-15x less time for a user with a 200 kb/s connection.
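A minimal sketch of that flow (assumed client behaviour, not the actual NG client code), using Python’s standard-library gzip to shrink an OsmChange-style payload before upload:

```python
# Sketch of client-side upload compression: gzip-compress an OsmChange
# payload and label it with Content-Encoding so the server can decompress.
import gzip

# A small, repetitive OsmChange-like payload; real changesets compress
# similarly well because XML tags and attribute names repeat heavily.
payload = (
    '<osmChange version="0.6">'
    + '<modify><node id="1" lat="0.0" lon="0.0"/></modify>' * 500
    + "</osmChange>"
).encode("utf-8")

body = gzip.compress(payload, compresslevel=6)
headers = {"Content-Encoding": "gzip", "Content-Type": "text/xml"}

# The compressed body is a small fraction of the original size, which is
# where the 5-15x upload speedup on slow connections comes from.
print(len(payload), len(body))
```

Brotli typically compresses tighter than gzip but is not in the standard library; the mechanism (compress, set `Content-Encoding`, upload) is the same.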

Project Structure

Kamil addresses concerns about the project’s structure, specifically the inclusion of too much logic in models. He initially included this logic to make the transition smoother for current contributors, presenting a Ruby code example for comparison. However, he acknowledges that higher-quality code is essential and mentions that this issue is being addressed as part of the database migration.

Future Discussions

Kamil mentions that future discussions, including API 0.7 and other significant changes, will be held in the general section to encourage more public input and community involvement, as per the forum’s governance decision.


Kamil encourages support for the development through GitHub Sponsors, Patreon, or Liberapay, as he is working on the project full-time. He expresses appreciation for any support, including likes and stars on the project repository.

Useful links

Disclaimer: Please note that this project is not affiliated with the OpenStreetMap Foundation. It’s the result of my voluntary work and personal choices.


Thanks for the update!

I appreciate the client-side content compression. One thing you haven’t mentioned: for most clients, the CPU time spent on compression/decompression will be significantly less than the time saved on transfer. Compression is opt-in, so a client can use uncompressed data if needed (or if that is faster).

To be honest, I don’t know how much of a problem it currently is, though. I think it’d be mostly useful for mobile editors, where changeset size is usually small. It might also come in handy for JOSM users with slow internet.


@NorthCrab Would it be possible for you to also post a written update? I much prefer that over a video, and I know I’m not alone in that.


I added a video summary and will keep that in mind for future updates. Thanks :slightly_smiling_face:!
Pinging everyone who liked your response, as they may be interested in the text version:

@PZP3610 @stevea @arctic-rocinante @soliMM @SomeoneElse


Does this mean that your project will be incompatible with the current schema? If so, I think it would be wise to reconsider; as I’ve previously mentioned, being able to run your project in parallel to the existing code will likely significantly increase the chances of it ever being deployed on any official OSM server.

The current schema can definitely use some work. I’ve been in contact with Andy about that and have started testing a few things out, so feel free to reach out with your ideas so that we can hopefully work in the same direction. That said, until anything changes in the current code, I strongly recommend that you stay true to the current schema (or at least provide a compatibility layer using views/triggers/rules).

I would also be interested to know whether you’ve had any further contact with the maintainers of the current codebase beyond the last thread. What steps have you taken, in terms of communication, organisation, architecture, and code, to ensure your project will work well together with “what already is”, and to avoid alienating those who have spent, and are spending, time and effort on the current codebase?


Yes, to achieve the full scope of the improvements and security enhancements, the schema needs to be changed. I don’t see it as feasible for the project to forcefully stick to the old schema. I don’t want to intentionally make the NG project worse.

Running two projects in parallel will require two database instances, so I don’t see any issue with using a different schema there. One could then simply use a script to compare, for example, planet diffs from the two databases and conclude that they both operate in the same way. Planet integrity is the most important part of the operation.
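A comparison script along those lines could be as simple as hashing the diff output from each instance. This is a sketch under the assumption that both pipelines emit byte-identical, deterministically ordered diff files; the function names and paths are illustrative:

```python
# Sketch of a verification script: two database instances are considered
# equivalent if the planet diffs they produce are byte-for-byte identical.
import hashlib


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large planet diffs fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk_size):
            digest.update(block)
    return digest.hexdigest()


def diffs_match(ruby_diff_path: str, ng_diff_path: str) -> bool:
    """True when both instances produced identical diff output."""
    return file_digest(ruby_diff_path) == file_digest(ng_diff_path)
```

If the two pipelines cannot guarantee identical byte ordering, the script would instead have to parse both diffs and compare the sets of elements, which is more work but the same idea.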

No, I don’t see it as a priority until the feature parity point is reached. Without reaching the feature parity point, I don’t think there is anything meaningful to talk about beyond what has already been said.

I recommend looking a bit more into PostgreSQL; based on your initial posts, I assume you’re quite new to it. It has a great many features that can allow for these kinds of parallel tracks.

If you can provide your current database schema as you’d ideally like it I can take a look and point you in the right direction.

While I understand your thinking, I think you might have to take some trade-offs here. Otherwise you might end up with a technically flawless project that never gets any use.

I disagree with this premise; I think running the projects in parallel against the same database instance* is very much doable (alternatively in separate instances using foreign data wrappers, but you’d anyway need essentially the same setup in terms of compatibility layer so I don’t see much reason for that).

* and to be clear here, I don’t suggest running a new untested project against the production database, as with any new code it’d have to pass through the normal pre-production environments first, likely over a significant timespan to ensure that everything works as expected

I really think you might be shooting yourself in the foot with this approach… But I think I’ve made my point on this topic so I won’t reiterate it.


Here you go:

I believe running two apps on the same database instance would be quite a mess and somewhat dangerous. I also don’t understand what’s wrong with running two database instances and comparing their state with a script. I think the approach you describe is slightly over-engineered and does not bring any clear benefit, yet it limits innovation.

I really don’t know what else is there to say at this point in time. I think we all just need to wait for the feature parity point to be reached before any serious conversation can take place.

Yes, I think that is the likely state. As a software engineer, for something as complex as the OSM database I first look at how it is going to interface with existing projects (e.g. osmdbt, planet-dump-ng, backup systems, monitoring, etc.), because handling cross-team work is the main difficulty in software engineering. I then look at release engineering (deployment strategies, rollback strategies, etc.). I do not see a practical way to deploy this software, as we would need to be able to roll back any changes without data loss.

This project is programming, not software engineering.


Of course! But I kindly ask everyone for some patience as the project is still under active development. I have already stated that compatibility is this project’s top priority, and that has never changed. I don’t think it’s fair to make such judgments when the current project focus is on achieving feature parity with the Ruby release. I am not currently focusing on supporting existing monitoring, upgrading current tools, etc.; this will come at a later time. Doing so now would decrease development efficiency, as any significant change would require more work. My development workflow focuses on efficiency, where I first develop components that serve as the basis for future work.

There are really two reasons why I think being able to run the two projects in parallel against the same database will be pretty much a hard requirement:

  1. OSM is large enough, and especially distributed enough, that any form of migration downtime should be avoided, even if that means a higher cost (technical, operative, etc.). Under an hour might be negotiable, but even that’s a stretch (OSM is global, so it’s always daytime somewhere, and informing all mappers in a suitable way would be a significant and hard task, as well as requiring all commonly used editors to be able to handle the server being down)
  2. You need to remember that you are challenging the status quo, which is proven technology at this point. Even if you can prove that your project is as reliable (which is likely to take at least 6 months to a year of testing), it is still a significant risk for OSM as a whole to do a sudden switch to a different technology, a risk that can be immensely reduced by being able to run in parallel, as it allows for deployment strategies such as canary releases.

As @pnorman stated really well above, you really want to start to consider more than just the code you are writing, otherwise you will hit a dead end. Maybe take a step back from the code for a moment and put some time into writing a migration plan? That would both help us understand how your code could be deployed, and likely be an important thing to guide your architecture.