Programmatic reconstruction of postal code areas


Lately, I have been working on reverse geocoding and noticed that the quality of postal code area information in OSM is pretty awesome in Germany, but flawed in Austria and Switzerland and completely lacking in Italy, Poland and other countries. As an example, Switzerland has hardly any postal code relations. The next best thing is admin boundaries that carry a postal code tag, which is incorrect because usually postal area != district. Especially in highly populated cities, there are no postal code tags on admin boundaries because the 1:1 matching does not work (e.g. Zurich as one city has multiple postal codes).

Since I’m developing a 3D geometry processing framework anyway, I’ve started a little side project to try to programmatically reconstruct the postal code areas and hopefully at some point contribute the information back into OSM. I’m aiming for a semi-automatic workflow where I import OSM data, calculate the regions based on known data points (i.e. OpenGeoDB and OSM points with a tagged postal code) and then refine the result with both mesh repair/smoothing algorithms and manual editing. The reasoning behind this is that manually cleaning up noise (e.g. an incorrect postal code on a building) and improving lines will produce a quality that would be hard to achieve with algorithms alone. My hope is that when it’s done, I will have a dedicated editor that allows me to process a whole country within a couple of days of tweaking. More detailed information can be found here:
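To make the "calculate the regions based on known data points" step a bit more concrete, here is a minimal sketch in Python. It is not the code of my tool. It assumes a plain nearest-point rule (every location gets the postal code of its closest known point, which amounts to a discrete Voronoi partition) and a very simple noise filter that drops a point when none of its k nearest neighbours share its code. The function names, the coordinates and the codes "A" and "B" are all made up for illustration.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def drop_outliers(seeds, k=3):
    """Keep a seed only if at least one of its k nearest neighbours has the same code."""
    kept = []
    for i, (pos, code) in enumerate(seeds):
        others = seeds[:i] + seeds[i + 1:]
        nearest = sorted(others, key=lambda s: haversine_km(pos, s[0]))[:k]
        if any(c == code for _, c in nearest):
            kept.append((pos, code))
    return kept

def nearest_code(point, seeds):
    """Postal code of the seed closest to point (nearest-point rule)."""
    return min(seeds, key=lambda s: haversine_km(point, s[0]))[1]

def label_grid(seeds, south, west, north, east, steps):
    """Label the centres of a steps x steps grid over the bounding box."""
    grid = []
    for r in range(steps):
        lat = south + (north - south) * (r + 0.5) / steps
        row = []
        for c in range(steps):
            lon = west + (east - west) * (c + 0.5) / steps
            row.append(nearest_code((lat, lon), seeds))
        grid.append(row)
    return grid

# Made-up seed points: ((lat, lon), code). The last one is deliberate noise:
# a "B" point sitting in the middle of the "A" cluster.
seeds = [
    ((47.10, 9.50), "A"), ((47.11, 9.51), "A"),
    ((47.12, 9.50), "A"), ((47.11, 9.49), "A"),
    ((47.20, 9.55), "B"), ((47.21, 9.56), "B"),
    ((47.22, 9.55), "B"), ((47.21, 9.54), "B"),
    ((47.105, 9.505), "B"),
]
cleaned = drop_outliers(seeds)
grid = label_grid(cleaned, 47.09, 9.48, 47.23, 9.57, 4)
```

The brute-force neighbour search is quadratic in the number of points, so a real data set would need a spatial index. The filter is also deliberately conservative so that legitimate points near a boundary survive, which is one reason manual clean-up remains part of the workflow.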

Results so far are pretty nice:

I’ve done some preliminary evaluations (Austria & Switzerland) and while the boundaries are by definition imperfect (they are after all only approximations based on a limited set of data points), using them for real-world reverse geocoding improves the overall quality of the answers a lot. In other words: even if some boundaries are wrong or imprecise (e.g. a small village ends up in the wrong postal code), in total the quality is a lot better than what we have right now! It would also be a great starting point for the community to tweak individual boundaries when more precise information is available (e.g. if someone knows that a certain building is in a different postal code, they could simply move that line).
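For readers wondering how such areas are used for reverse geocoding: the lookup boils down to a point-in-polygon test. The sketch below assumes each postal code area is a single ring of (lon, lat) vertices without holes and uses the standard even-odd ray casting test. The area names and coordinates are invented, and a real lookup would put a spatial index in front of the linear scan.

```python
def point_in_polygon(lon, lat, ring):
    """Even-odd ray casting. ring is a list of (lon, lat) vertices; the closing edge is implied."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the query point?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

def reverse_geocode(lon, lat, areas):
    """Return the code of the first area containing the point, or None."""
    for code, ring in areas.items():
        if point_in_polygon(lon, lat, ring):
            return code
    return None

# Two invented, adjacent areas.
areas = {
    "A": [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
    "B": [(1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (1.0, 1.0)],
}
hit_a = reverse_geocode(0.5, 0.5, areas)
hit_b = reverse_geocode(1.5, 0.5, areas)
miss = reverse_geocode(3.0, 3.0, areas)
```

With this kind of lookup, an imprecise boundary only affects the points close to that line, which fits the observation that the overall answer quality still improves.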

As a first test case though, I’m working on Liechtenstein (currently has no postal code relations at all) because it’s a smaller data set and I can establish the whole workflow pipeline. There is still lots to do (intersecting the cells with the country boundary, merging the result with existing postal code areas, …), but I have two big points where I’m looking for input from the community:
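One possible way to do the "intersecting the cells with the country boundary" step, sketched under a few assumptions: coordinates are already projected to a plane, and each cell is a convex polygon listed counter-clockwise (true for a single nearest-point cell, not for merged areas). Sutherland-Hodgman clipping needs a convex clip polygon but accepts any subject polygon, so the country boundary is passed as the subject and the cell as the clip polygon. This is a swapped-in textbook technique, not necessarily what the tool will end up using.

```python
def clip_to_convex(subject, clip):
    """Sutherland-Hodgman: clip `subject` against the convex, counter-clockwise polygon `clip`.

    Both polygons are lists of (x, y) vertices with the closing edge implied.
    """
    def inside(p, a, b):
        # True if p lies on or to the left of the directed edge a -> b.
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def intersection(p, q, a, b):
        # Point where segment p-q crosses the infinite line through a and b.
        dx1, dy1 = q[0] - p[0], q[1] - p[1]
        dx2, dy2 = b[0] - a[0], b[1] - a[1]
        denom = dx1 * dy2 - dy1 * dx2
        t = ((a[0] - p[0]) * dy2 - (a[1] - p[1]) * dx2) / denom
        return (p[0] + t * dx1, p[1] + t * dy1)

    output = list(subject)
    for i in range(len(clip)):
        if not output:
            break  # nothing left to clip
        a, b = clip[i], clip[(i + 1) % len(clip)]
        current, output = output, []
        prev = current[-1]
        for cur in current:
            if inside(cur, a, b):
                if not inside(prev, a, b):
                    output.append(intersection(prev, cur, a, b))
                output.append(cur)
            elif inside(prev, a, b):
                output.append(intersection(prev, cur, a, b))
            prev = cur
    return output

def polygon_area(ring):
    """Unsigned area via the shoelace formula."""
    total = 0.0
    for i in range(len(ring)):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % len(ring)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2

# Invented shapes: a square "country" and a square cell that overlaps one corner.
country = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
cell = [(1.0, 1.0), (3.0, 1.0), (3.0, 3.0), (1.0, 3.0)]
clipped = clip_to_convex(country, cell)
```

If the boundary is strongly concave, the intersection can fall apart into several pieces, which this algorithm returns as one ring joined by overlapping edges. That case would need a more robust polygon clipping routine.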

  • Is there any further data source that has points of interest or something similar with postal code information? Street name/postal code pairs seem to produce a lot of false information, so I’m looking for longitude/latitude/postal-code triples in CH, AT, NL, PL, IT. Right now, for licensing reasons, I’m only using information that’s already inside OSM plus additional data points from OpenGeoDB. Perhaps a list of restaurants or public places could be used to query OSM for their geo-coordinates…
  • What’s the best way to get the information back into OSM? I guess I could either use a C++ library to import it directly from my tool, or export my data to some shape file format and then use some other means to put it into OSM (e.g. through JOSM or some bot). I’m also more than happy to simply contribute the shape information if someone else has more experience with importing such information into OSM! I unfortunately won’t be able to share the code of the tool…
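To illustrate the export route, here is a small sketch that writes the areas as GeoJSON. GeoJSON is my assumption for "some shape file format", chosen only because it can be produced with the Python standard library. The property names mirror the OSM tags boundary=postal_code and postal_code=*, but whether an importer expects exactly these keys is also an assumption. The code "0000" and the coordinates are placeholders.

```python
import json

def to_geojson(areas):
    """Build a GeoJSON FeatureCollection from {code: [(lon, lat), ...]} rings."""
    features = []
    for code, ring in areas.items():
        closed = [list(p) for p in ring]
        if closed[0] != closed[-1]:
            closed.append(list(closed[0]))  # GeoJSON rings must be explicitly closed
        features.append({
            "type": "Feature",
            "properties": {"boundary": "postal_code", "postal_code": code},
            "geometry": {"type": "Polygon", "coordinates": [closed]},
        })
    return {"type": "FeatureCollection", "features": features}

# Placeholder area: one counter-clockwise ring in (lon, lat) order.
areas = {"0000": [(9.5, 47.1), (9.6, 47.1), (9.6, 47.2), (9.5, 47.2)]}
doc = to_geojson(areas)
text = json.dumps(doc, indent=2)
```

The resulting text can be written to a file and handed to whoever does the actual import, which keeps the reconstruction tool and the import step independent of each other.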

So any input and/or help is very welcome!


I think you’d be better off sending this to the mailing list, as most of the relevant people read that rather than this sub-forum.