However, as the site title says (or said), this was based purely on counting the addresses per municipality and not on an actual comparison of the address data. This week I finally got around to improving it, and I believe it is now substantially more useful.
Major changes:
determines and counts matching addresses based on street/place name and house number (the matching value).
supports , and ; separators in house numbers.
produces a geojson file with missing addresses per municipality.
produces statistics for some common potential errors for addresses that were matched with a GWR address:
a missing addr:postcode tag or a wrong addr:postcode value.
a missing addr:city tag or a wrong addr:city value.
GWR and OSM addresses that are more than 50 meters apart.
use of addr:street when addr:place should have been used.
addresses in OSM that don’t have a counterpart in the GWR data (many of these are typos).
produces a file with geojson data for the warnings above.
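To illustrate the matching value described above, here is a minimal sketch, not the actual implementation. The function names, the lower-casing of names and the sample GWR keys are assumptions made for the example; only the use of street/place name plus house number and the `,` and `;` separators come from the description above.

```python
import re

def split_house_numbers(value):
    """Split an addr:housenumber value on ',' and ';' and normalise each part."""
    parts = re.split(r"[,;]", value)
    return [p.strip().lower() for p in parts if p.strip()]

def match_keys(tags):
    """Return the set of (name, house number) matching values for one OSM object."""
    name = tags.get("addr:street") or tags.get("addr:place")
    number = tags.get("addr:housenumber")
    if not name or not number:
        # No street/place name or no house number: cannot be matched by name.
        return set()
    # Lower-casing is an assumption of this sketch, not a documented rule.
    return {(name.strip().lower(), n) for n in split_house_numbers(number)}

# Illustrative data only.
osm = {"addr:street": "Chemin de Clair Matin", "addr:housenumber": "4a; 4b"}
gwr_keys = {("chemin de clair matin", "4a"), ("chemin de clair matin", "4c")}

matched = match_keys(osm) & gwr_keys          # present in both data sets
missing_in_osm = gwr_keys - match_keys(osm)   # in GWR but not found in OSM
```

With set operations like these, counting matches per municipality and collecting the missing addresses for a geojson file both fall out of the same key comparison.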
Caveats:
addresses without an addr:street or addr:place value currently cannot be matched to the GWR data and will be contained in this “missing” data together with addresses that are completely absent. In a future version we might use a geographic search for nearby addresses in a 2nd pass, however that is likely to be very slow.
addresses in multilingual municipalities, for example in Biel/Bienne, that only contain an addr:street or addr:place tag with a composite name and no language-specific version, e.g. addr:street:fr, cannot be matched with the GWR data, which always contains the name in just one language. While we could attempt to parse composite addr:street values, it is arguably an error not to include the language-specific variants in such situations in any case.
we currently ignore address interpolations.
addresses with just addr:housename cannot be matched with GWR data.
non-standard value separators in addr:housenumber will cause the addresses not to match.
duplicate street/house number tuples are currently dropped. Apart from actual duplicates, these arise because we currently don’t consider the postcode value when trying to determine a match; this is something I intend to improve in the immediate future.
the format of the warnings geojson file is very preliminary and will likely change.
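For the multilingual caveat above, a rough sketch of why the language-specific tags matter. This is an assumption about how such a lookup could work, not how the tool does it; `candidate_names` and `matches_gwr` are hypothetical helpers and the tag values are illustrative.

```python
def candidate_names(tags):
    """Collect every street/place name variant present on an OSM object."""
    names = []
    for key, value in tags.items():
        is_base = key in ("addr:street", "addr:place")
        is_lang = key.startswith(("addr:street:", "addr:place:"))
        if (is_base or is_lang) and value.strip():
            names.append(value.strip())
    return names

def matches_gwr(tags, gwr_name):
    """True if any name variant equals the single-language GWR name."""
    return gwr_name in candidate_names(tags)

# Illustrative tagging with a composite name plus language-specific variants.
with_variants = {
    "addr:street": "Bahnhofstrasse / Rue de la Gare",
    "addr:street:de": "Bahnhofstrasse",
    "addr:street:fr": "Rue de la Gare",
}
# The same address with only the composite name.
composite_only = {"addr:street": "Bahnhofstrasse / Rue de la Gare"}

ok = matches_gwr(with_variants, "Rue de la Gare")
not_ok = matches_gwr(composite_only, "Rue de la Gare")
```

The composite-only object fails because no tag carries the plain single-language name, which is the situation described in the caveat.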
This is a first release of the functionality and is likely to have multiple issues. If you see something odd, please report it here.
Yes, true. Addresses on building:part are a bit novel, but should be easy to support. It is a bit of an issue, as no geometries are currently generated for building:part, so that will need a re-setup and re-import of the database.
Actually while I don’t import a polygon for something that just has a building:part tag, in this case they get imported because of the address tags, so all is good and the addresses will turn up after the next run.
The two issues have been fixed now and further, I’ve removed all addresses/entrances that belong to demolished/planned buildings (including from the full data files).
The issue likely crept in when I changed everything to support the “new” way to retrieve the data from the BfS. In any case it removed 100’000 addresses on the GWR side, a very easy way to improve our coverage :-).
PS: due to the vandalism this morning the database is currently a bit behind, but I expect it will catch up by the morning.
There’s a general issue with false positives with respect to municipality borders. Currently I don’t buffer the borders, because you would then simply get false positives in the other direction, so take any such errors with a grain of salt if the building straddles a border.
As to 4aaa: this is the original data in CSV format as retrieved from the BfS, and it is likely simply a data entry issue, which is to be expected now and then.
280137497 0 103544085 4aaa 10095121 Chemin de Clair Matin Ch. de Clair Matin Cla 9903 1 1009 0 Pully 2024-04-25
Thanks for the explanation. To clarify, I wasn’t expecting a fix; I just wanted to highlight the excellent ratio, with so few false positives in the end!
The very first entry, “Aadorf”, caught my attention. In the locality of Guntershausen, addr:city=Guntershausen is recorded, but the GWR requires “Guntershausen b. Aadorf”.
What should one do in such a case?
Might the place node also need to be corrected? It was renamed in 2010.
There can’t be a really hard rule for such situations.
IMHO, if the regionally common name differs from the official one: put the official name in official_name on the place object (without abbreviations) and the common name in name. If appropriate, use loc_name as well.
BUT: addr:city is the name of the postal locality (i.e. what corresponds to the PLZ6), and that does not necessarily have to exist as a place in OSM. Either way, I would suggest checking what is written on the place-name sign, if there is one, before we give this too much thought.
I just had a minute or two before I’m away and added a Canton column. Just like the other columns, you can sort by it, so you can, for example, sort descending by the Missing column and then by Canton to find the municipality with the most missing addresses in each canton.
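The two-step sort works because of stable sorting: rows that compare equal in the second sort keep the order from the first. A small sketch with made-up rows (the municipality names and numbers are placeholders, and it assumes the table sort on the site is stable as well):

```python
# Made-up rows; the real table has more columns.
rows = [
    {"municipality": "A", "canton": "ZH", "missing": 120},
    {"municipality": "B", "canton": "BE", "missing": 300},
    {"municipality": "C", "canton": "ZH", "missing": 450},
    {"municipality": "D", "canton": "BE", "missing": 80},
]

# Step 1: sort descending by the Missing column.
by_missing = sorted(rows, key=lambda r: r["missing"], reverse=True)

# Step 2: sort by Canton. Python's sort is stable, so within each canton
# the descending Missing order from step 1 is preserved.
by_canton = sorted(by_missing, key=lambda r: r["canton"])

# The first row seen for each canton is the one with the most missing addresses.
worst = {}
for r in by_canton:
    worst.setdefault(r["canton"], r["municipality"])
```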