The routing backend is down again because the filesystem is mounted read-only as a result of a kernel bug. The admin is looking at it.
Edit: it appears that the routing backend is back up, but I don’t know if this serious problem is permanently solved. I’m still looking for hosting sponsors to make the routing backend more robust, so if you can help please contact me.
I could help with hosting, but I don’t meet the memory requirements. So I was wondering: can the hosting be done in a distributed way? 8 GB is a bit on the high side for most hardware around, but if the backend could be distributed, lower memory requirements per machine might become possible.
Thanks Rebroad, but I think that memory is critical. I’ll try to explain:
The route database is about 15 GB, of which about 5 GB covers North and South America; the rest covers Eurasia, Africa and Oceania. If you want to route from somewhere in North America to South America, you need access to the full 5 GB. The data needs to be in RAM to get any performance out of this (this is straightforward, and I’ve done benchmarks that decisively prove it).
On top of this memory payload, the routing application needs additional memory to store intermediate results (see Wikipedia on Dijkstra’s algorithm and A* (Shooting Star); I believe A* is similar to the algorithm used in Gosmore). For long routes these intermediate results require a fair amount of memory as well; up to 4 GB is not uncommon.
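To illustrate where those intermediate results come from, here is a minimal A* sketch in Python. It is not Gosmore’s code; the graph format (a dict of neighbour lists) and the zero heuristic in the test are assumptions for illustration, and with a zero heuristic the search reduces to Dijkstra’s algorithm. The `best_cost` table and the open heap grow with every node the search touches, which on a continental road network can be millions of nodes.

```python
import heapq
from typing import Callable, Dict, Hashable, List, Optional, Tuple

Node = Hashable
Graph = Dict[Node, List[Tuple[Node, float]]]  # node -> [(neighbour, edge cost)]


def a_star(
    graph: Graph,
    start: Node,
    goal: Node,
    heuristic: Callable[[Node], float],
) -> Tuple[Optional[float], int]:
    """Return (cost of cheapest path or None, number of nodes held in best_cost).

    best_cost and open_heap are the "intermediate results": both grow with
    every node the search reaches, which is what drives memory use on long routes.
    """
    best_cost: Dict[Node, float] = {start: 0.0}
    open_heap: List[Tuple[float, float, int, Node]] = [(heuristic(start), 0.0, 0, start)]
    counter = 1  # unique tie-breaker so nodes themselves are never compared
    while open_heap:
        _, cost, _, node = heapq.heappop(open_heap)
        if node == goal:
            return cost, len(best_cost)
        if cost > best_cost.get(node, float("inf")):
            continue  # stale heap entry: a cheaper path to this node was already found
        for neighbour, edge_cost in graph.get(node, []):
            new_cost = cost + edge_cost
            if new_cost < best_cost.get(neighbour, float("inf")):
                best_cost[neighbour] = new_cost
                heapq.heappush(open_heap, (new_cost + heuristic(neighbour), new_cost, counter, neighbour))
                counter += 1
    return None, len(best_cost)
```

On a four-node toy graph the table stays tiny; the point is that its size scales with the area the search has to explore, so a North-to-South-America route touches a large part of the 5 GB dataset and keeps per-node state for all of it.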
So 8 GB of RAM is the minimum for routing in the Americas with a concurrency of 1. A concurrency of 2 would need 12 GB (worst case, when two large requests are handled simultaneously), and so on. A Eurasia-only server would need at least 16 GB if you want to keep the whole route database in RAM and still have some RAM left for the intermediate route results.
If the Gosmore processes require more RAM than the system has free, the system will first start evicting parts of the routing database from RAM, reducing performance. Once the Gosmore processes use more RAM than is physically available, the system starts swapping. Swapping really kills performance; you never want that to happen. It is the reason why the old route server with 2 GB of RAM (which now only serves the front-end website) was often completely unresponsive: it swapped so heavily that sometimes I could not even log in to kill the large requests. As long as each request is processed quickly there is no concurrency, but when one request takes a long time, other requests arrive in the meantime and increase concurrency, which increases RAM usage, which increases swapping, which slows down request handling even more, and so on. You end up in a vicious feedback loop.
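One common way to break such a loop is to cap how many routing processes may run at once and refuse (or queue) the rest, so concurrency, and with it RAM use, cannot grow past what fits in physical memory. The sketch below is hypothetical and not something the current setup is said to do; `MAX_CONCURRENT` and `RouteLimiter` are illustrative names, and in practice such a limit would more likely live in the web server or process manager.

```python
import threading
from typing import Callable, Optional

# Hypothetical limit: how many Gosmore processes fit in RAM next to the
# route database, e.g. roughly (total RAM - database size) / per-request peak.
MAX_CONCURRENT = 2


class RouteLimiter:
    """Admit at most `limit` concurrent route calculations; refuse the rest."""

    def __init__(self, limit: int = MAX_CONCURRENT) -> None:
        self._slots = threading.BoundedSemaphore(limit)

    def try_run(self, job: Callable[[], str]) -> Optional[str]:
        # Non-blocking acquire: when all slots are busy, fail fast instead of
        # starting another memory-hungry process that could push the box into swap.
        if not self._slots.acquire(blocking=False):
            return None  # caller would answer e.g. HTTP 503 "try again later"
        try:
            return job()
        finally:
            self._slots.release()
```

Refusing early keeps response times for the admitted requests short, which is exactly what prevents the next wave of requests from piling up.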
The current route server that provides worldwide routing has 16 GB of RAM and performs pretty well, although parts of the routing database are evicted from RAM from time to time. The number of routing requests also grows every month, so 32 GB of RAM would give a nice margin for growth. Below 16 GB there is not much performance to be had; perhaps it would be doable for the Americas only.
So to come back to your question: there is (currently) almost no option for distributing the routing backend that will seriously reduce the 16+ GB memory requirements.
I’ve received several sponsoring offers, ranging from second-hand hardware to virtual machines, rack space and dedicated servers. That is more than I hoped for and more than I can manage, which is a really nice ‘problem’. Many thanks to everyone who contacted me!
So how’s the routing backend doing? A few weeks ago the backend moved from one server to two: one server handles both Americas and the other handles the rest of the world (hereafter: Eurasia). Besides the doubling of CPUs and memory, the memory taken up by the route database on each server also roughly halves, so the actual routing requests have more memory to work with. I think it’s safe to say that doubling the processing capacity has increased the number of routing requests handled per second by a factor of four. This has already proven very useful: last Friday the service handled the most route requests ever, 360,000 in one day, way more than the usual 80,000.
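With the data split over two servers, the frontend has to pick a backend for each request. A minimal sketch, assuming a crude longitude cut at 30°W (mid-Atlantic) decides which route database covers a point; the cut-off, the placeholder backend names and the functions are hypothetical, not the actual YOURS configuration, and edge cases such as the Bering Strait are ignored.

```python
from typing import Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees

# Hypothetical boundary: longitudes west of -30 degrees are treated as the
# Americas; everything else goes to the Eurasia server.
AMERICAS_MAX_LON = -30.0

BACKENDS = {"americas": "americas-backend", "eurasia": "eurasia-backend"}  # placeholder names


def region_of(point: LatLon) -> str:
    _, lon = point
    return "americas" if lon < AMERICAS_MAX_LON else "eurasia"


def pick_backend(start: LatLon, end: LatLon) -> str:
    """Choose the backend whose route database covers both endpoints."""
    start_region, end_region = region_of(start), region_of(end)
    if start_region != end_region:
        # No road connection between the two databases, so no single server can answer.
        raise ValueError("start and end lie in different route databases")
    return BACKENDS[start_region]
```

Because there is no road link across the oceans, a request never needs both databases, which is what makes this split cheap.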
The plan going forward is:
First, to quickly add a sponsored dedicated machine to share the load with the Eurasia server, which receives the most requests.
Second, I have received two second-hand servers and two rack-space hosting options, and will (hopefully) install the servers next week, which brings the number of available servers to five.
Third, to add another sponsored dedicated server to the pool. This sponsorship has been promised and discussed but has not yet turned into an actual server.
Software-wise, a simple round-robin scheduler will be implemented first to take advantage of all the servers. Availability checking and monitoring must also be implemented. After that I would like to dedicate one or two servers to experimental routing options, like multiple cycling profiles and nautical routing (if possible). Perhaps other developers can be given access to test routing engine updates, different routing engines, etc.
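A round-robin scheduler with availability checking, as planned here, could look roughly like this within a pool of servers that hold the same route database. It is only a sketch: the health check is passed in as a function (in practice it might be a cheap test route or a TCP connect), and the server names are placeholders.

```python
import itertools
from typing import Callable, List


class RoundRobinPool:
    """Hand out backend servers in turn, skipping ones that fail a health check."""

    def __init__(self, servers: List[str], is_alive: Callable[[str], bool]) -> None:
        if not servers:
            raise ValueError("need at least one server")
        self._servers = servers
        self._cycle = itertools.cycle(servers)
        self._is_alive = is_alive

    def next_server(self) -> str:
        # Try each server at most once per call, so a fully-down pool fails fast.
        for _ in range(len(self._servers)):
            server = next(self._cycle)
            if self._is_alive(server):
                return server
        raise RuntimeError("no routing backend available")
```

Plain round-robin ignores how busy each server is; since long routes can take far longer than short ones, a least-connections policy may be worth trying once monitoring is in place.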
These upgrades will greatly improve the performance and reliability of this service.
Another option would be to offer an additional service to commercial parties who require uptime guarantees or bulk routing or …, by implementing API-access authentication and a subscription mechanism. Contributions from these API users could be used to strengthen the service further (e.g. to rent extra servers).
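An authentication and subscription mechanism of this kind usually comes down to an API key per client plus a usage quota. A minimal in-memory sketch; the key, the quota figure and the class names are hypothetical, and a real service would persist the counters and reset them on a schedule.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Subscription:
    daily_quota: int  # hypothetical number of routes per day for this plan
    used_today: int = 0


@dataclass
class ApiKeyRegistry:
    subscriptions: Dict[str, Subscription] = field(default_factory=dict)

    def check_request(self, api_key: str) -> bool:
        """Return True and count the request if the key is known and under quota."""
        sub = self.subscriptions.get(api_key)
        if sub is None or sub.used_today >= sub.daily_quota:
            return False
        sub.used_today += 1
        return True

    def reset_daily_counters(self) -> None:
        # Would be called once per day, e.g. from a scheduled job.
        for sub in self.subscriptions.values():
            sub.used_today = 0
```

Requests without a key could keep using the free, best-effort service, so the subscription layer only adds guarantees for those who pay for them.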
I would like to hear the input of others, so if you have ideas, please share!
Thanks for your thoughts. I’m curious, though: perhaps Apache isn’t the fastest solution, but I think only little time is spent running PHP, as it’s mostly used to pass the request on to the Gosmore binary (run via the shell), which does all the heavy lifting. By a large margin, most requests are handled in between 0.01 and 0.1 seconds, while each request travels between servers that are geographically far apart, so network transmission adds several times the calculation time. How much would an nginx/Apache combo with APC speed this up?
I’m not sure what the Varnish reverse proxy would be used for.
I’ve read a bit about Varnish, and it’s mostly used to cache (semi-)static requests from users. Because most (>99.9%) route requests are unique, I don’t see any added value in a reverse proxy. Or am I seeing this wrong?
No, you are right. Maybe Varnish isn’t that helpful after all; it only helps with identical requests.
The advantage of running PHP in FPM mode behind nginx is not the speed per request, but how many resources your server uses per request, and therefore how many requests per second it can handle. Apache running PHP as mod_php is rather resource-inefficient.
Indeed, launching the Gosmore binary for each request won’t help the speed of the request either.
I will try to install the Netherlands part internally on our side, because we only use the Netherlands and I noticed we were spamming your servers (we are 93.186.184.x). That way I will also look at how it can be optimized, because I think it can be. I’ve already had a look at the Gosmore source, though, and it’s a serious mess.
From our point of view caching is possible, because we calculate distances for many of the same routes: we use it as autocomplete to calculate distances between many recurring addresses. So I’ll build a local cache on our side so that your servers don’t get as many requests from us, and we will use our own server as the primary API. If that doesn’t work, yournavigation will be the fallback.
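The client-side setup described here, a local cache in front of a local routing server with yournavigation.org as fallback, can be sketched as follows. `fetch_local` and `fetch_yours` are hypothetical stand-ins for whatever HTTP calls the application makes; they are passed in as functions so the logic can be tested without a network.

```python
from typing import Callable, Dict, Tuple

Coord = Tuple[float, float]
Fetch = Callable[[Coord, Coord], float]  # returns route distance in km


class DistanceService:
    """Cache distances locally; ask our own server first, YOURS only as fallback."""

    def __init__(self, fetch_local: Fetch, fetch_yours: Fetch) -> None:
        self._fetch_local = fetch_local
        self._fetch_yours = fetch_yours
        self._cache: Dict[Tuple[Coord, Coord], float] = {}

    def distance(self, start: Coord, end: Coord) -> float:
        key = (start, end)  # directional: one-way streets make A->B differ from B->A
        if key in self._cache:
            return self._cache[key]  # repeated autocomplete lookups stop here
        try:
            km = self._fetch_local(start, end)
        except OSError:
            # Network failures (URLError, ConnectionError, timeouts) are OSError
            # subclasses: fall back to the public API.
            km = self._fetch_yours(start, end)
        self._cache[key] = km
        return km
```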
But I understand that on your side caching is not an option. What you can do, however, is add a robots.txt to stop spiders from crawling your API. That will prevent some requests.
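A robots.txt that keeps well-behaved crawlers away from the API can be as short as the two lines in `ROBOTS_TXT` below. The `/api/` path is an assumption about where the API lives; Python’s standard `urllib.robotparser` is used only to check that the rules do what is intended. Keep in mind that robots.txt is advisory: it stops polite spiders, not misbehaving clients.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the API host: block all crawlers from /api/,
# leave the rest of the site crawlable.
ROBOTS_TXT = """\
User-agent: *
Disallow: /api/
"""


def crawler_allowed(url: str, user_agent: str = "Googlebot") -> bool:
    """Check a URL against ROBOTS_TXT the way a compliant crawler would."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)
```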
I will email you the details on installing the Gosmore server with optimizations.
I searched the logs of the last few days (including the day with 350,000 requests) for your IP address, but there is no match. You’re welcome to use the API service; just remember to add an email address or app name to the request so that I can contact you when excessive use is detected (see the API documentation in the OSM wiki). Of course you may also set up your own server, and I offer to assist where possible.
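As an illustration of identifying your application, here is a sketch that builds (but does not send) a route request with Python’s standard library. The endpoint URL, the parameter names (`flat`, `flon`, `tlat`, `tlon`, `v`, `fast`, `format`) and the `X-Yours-client` header are my assumptions about what the API documentation on the OSM wiki describes and should be checked there; the contact string is a placeholder.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint and parameter names; verify against the API documentation on the OSM wiki.
API_URL = "http://www.yournavigation.org/api/1.0/gosmore.php"


def build_route_request(flat: float, flon: float, tlat: float, tlon: float,
                        contact: str, vehicle: str = "motorcar") -> Request:
    """Build a route request that identifies the caller (app name or email)."""
    params = {
        "flat": flat, "flon": flon,  # start latitude / longitude
        "tlat": tlat, "tlon": tlon,  # destination latitude / longitude
        "v": vehicle,                # vehicle / routing profile (assumed name)
        "fast": 1,                   # assumed: 1 = fastest route, 0 = shortest
        "format": "kml",
    }
    # Identify yourself so the operator can contact you about heavy usage (assumed header name).
    return Request(f"{API_URL}?{urlencode(params)}", headers={"X-Yours-client": contact})
```

Sending it is then a matter of `urllib.request.urlopen(req)`; caching the result on your side, as discussed above, keeps repeated lookups off the public server.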
OK, I understand. Apache/PHP isn’t optimal, but considering that I’m not a trained Linux server admin, I’m glad to have Apache/PHP running well. The API frontend server has 2 GB of RAM and 2 cores, which mostly do nothing (relaying requests to the API backend server(s)). The API backend servers are 4- to 8-core machines with 16 GB of RAM, and Gosmore is by far the most memory- and CPU-hungry application on them. So my main question remains: how much does a more complex chain really help?
Regarding Gosmore: it does need some extra time to start up on each request compared to an already running service, but there is a special feature where an external tool keeps the routing database locked and shared in RAM (I don’t think it’s in SVN yet), so the Gosmore instances have the entire routing database available immediately. I therefore think the startup overhead per route request is already quite low.
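The idea of keeping a database resident and shared can be illustrated with a read-only memory mapping: every process that maps the same file shares the same physical pages from the OS page cache, so the data is loaded once rather than per request. This Python sketch is not the external tool mentioned above; it only maps and pre-touches the file and does not lock the pages (that needs `mlock`, which Python’s standard `mmap` module does not expose).

```python
import mmap


def map_route_db(path: str) -> mmap.mmap:
    """Map a route database file read-only so processes can share its pages."""
    with open(path, "rb") as f:
        # The mapping stays valid after the file object is closed.
        db = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # Touch one byte per page so the data is pulled into RAM up front
    # instead of on the first route request.
    for offset in range(0, len(db), mmap.PAGESIZE):
        _ = db[offset]
    return db
```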
The remark about robots.txt is interesting. I do see some crawler activity, although not much, and a robots.txt is simple to add.
I must apologize to ict4schools, because my assertion was incorrect. Yesterday I implemented a very simple cache using tmpfs on the frontend web server, and the logs show that 30-40% of the requests are cache hits. Apparently some applications and/or users produce multiple identical requests within a relatively short time. I don’t know how there can be so many non-unique route requests, but anyway, I’m very happy with the results.
The first version of the cache held about 1000 requests, but the final (current) version simply discards requests that haven’t been used for more than 1 hour. The effects are clearly visible on the route backend server. The cache for 1 hour (about 4000 routes) currently takes less than 50 MB, and its size varies slightly during the day.
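A cache along these lines can be sketched as a directory on tmpfs where each response is stored under a hash of the request’s query string, and entries not used for an hour are deleted. This is a Python sketch of the idea, not the actual (PHP) implementation; the directory, the SHA-1 key and the use of the file modification time as "last used" are assumptions.

```python
import hashlib
import os
import time
from typing import Optional

CACHE_DIR = "/dev/shm/route-cache"  # hypothetical tmpfs-backed directory
MAX_IDLE_SECONDS = 3600              # drop entries unused for more than 1 hour


def _path(cache_dir: str, query: str) -> str:
    return os.path.join(cache_dir, hashlib.sha1(query.encode()).hexdigest())


def cache_get(query: str, cache_dir: str = CACHE_DIR) -> Optional[bytes]:
    path = _path(cache_dir, query)
    try:
        with open(path, "rb") as f:
            data = f.read()
        os.utime(path)  # a hit counts as "used": reset the idle timer
    except FileNotFoundError:
        return None
    return data


def cache_put(query: str, data: bytes, cache_dir: str = CACHE_DIR) -> None:
    os.makedirs(cache_dir, exist_ok=True)
    with open(_path(cache_dir, query), "wb") as f:
        f.write(data)


def evict_idle(cache_dir: str = CACHE_DIR, now: Optional[float] = None) -> int:
    """Delete entries idle for longer than MAX_IDLE_SECONDS; return how many."""
    now = time.time() if now is None else now
    removed = 0
    for name in os.listdir(cache_dir):
        path = os.path.join(cache_dir, name)
        if now - os.path.getmtime(path) > MAX_IDLE_SECONDS:
            os.remove(path)
            removed += 1
    return removed
```

Evicting by idle time rather than by a fixed entry count lets the cache size follow the traffic, which matches the observation that an hour’s worth of routes stays well under 50 MB.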
I guess it’s down to how the requesting applications are implemented: the route isn’t cached in the application, so it is requested again whenever the information is needed.
Can you give me some simple examples of the easiest way to use the yournavigation API on my site, or other recommendations?
I’m impressed by the yournavigation website service!
Hello Konstantin, your browser will block cross-site (cross-domain) requests from your page because of its same-origin security policy. Therefore you’ll need to send the request to yournavigation.org via a proxy script on your own server.
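Such a proxy script only has to forward the route parameters from your own page to yournavigation.org and return the answer from your own domain. A minimal sketch using Python’s standard `http.server`; the upstream URL, the whitelist of parameter names and the identifying header value are assumptions (check the API documentation), and a production proxy would add error handling and rate limiting.

```python
from http.server import BaseHTTPRequestHandler
from urllib.parse import parse_qsl, urlencode, urlsplit
from urllib.request import Request, urlopen

UPSTREAM = "http://www.yournavigation.org/api/1.0/gosmore.php"  # assumed endpoint
ALLOWED = {"flat", "flon", "tlat", "tlon", "v", "fast", "format"}  # assumed parameter names


def upstream_url(request_path: str) -> str:
    """Forward only whitelisted query parameters to the routing API."""
    query = urlsplit(request_path).query
    params = [(k, v) for k, v in parse_qsl(query) if k in ALLOWED]
    return f"{UPSTREAM}?{urlencode(params)}"


class RouteProxy(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        # "my-site.example" is a placeholder identifying the calling site.
        req = Request(upstream_url(self.path), headers={"X-Yours-client": "my-site.example"})
        with urlopen(req, timeout=30) as resp:
            body = resp.read()
            content_type = resp.headers.get("Content-Type", "text/plain")
        # The page's JavaScript now talks to its own origin, so the browser allows it.
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, run `HTTPServer(("127.0.0.1", 8080), RouteProxy).serve_forever()` (with `from http.server import HTTPServer`) and point your page’s JavaScript at `/route?flat=…&flon=…&tlat=…&tlon=…` on your own host. Whitelisting the parameters keeps the proxy from being abused as an open relay.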
I’ve been rebuilding our Gosmore maps every month or so, but the latest updates didn’t work out. I built the program from SVN and downloaded a fresh planet file. The rebuild finishes without errors and produces .pak files, but these are much smaller than normal:
11473176 Mar 10 07:09 america.pak
45057416 Mar 10 04:56 eurasia.pak
It may very well be an error in our setup, but before I go into low-level debugging: is anyone else having trouble rebuilding Gosmore .pak files lately?
Well, updating Osmosis wasn’t enough. It seems that the option “idTrackerType=Bitset” in the update script is not working. Without it, Osmosis uses dynamic id-trackers and the resulting file sizes seem to explode. As a result the machine was swapping madly this morning and I had to reboot it.
I will do another run over the weekend, allowing more runtime, but if anyone has found a solution I am very interested.
BTW: @Lambertus, how is the routing map update for the YOURS website going? Can we learn from that?