Duplicated administrative boundaries in Poland

Hi,

apologies for posting in English. I have added a translation by DeepL below, but I cannot even judge if it is useful.

I’m currently reworking how the search engine Nominatim handles administrative
boundaries to form addresses. One of the problems I have tried to solve is
the relationship between admin levels and cities. To give you an example of what I mean:
in Poland admin_level 6 corresponds to ‘county’. But large cities like Krakow
have admin_level 6, too.

Nominatim now solves this by looking at place nodes. If there is an administrative
boundary with admin_level 6 that contains a related place node with place=city,
then the boundary is classified as city. This works pretty well, except for
Poland. You have decided to duplicate the administrative boundaries for levels
that are missing. These duplicates cause problems.
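
To make the place-node rule concrete, here is a minimal sketch in Python. It is not Nominatim's actual code: the `Boundary` class, the `classify` function and the `PL_DEFAULTS` table are names made up for illustration, and the only facts taken from above are that admin_level 6 normally means a county in Poland and that a related place=city node turns the boundary into a city.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical default meaning of admin levels for Poland (only level 6 shown).
PL_DEFAULTS = {6: "county"}


@dataclass
class Boundary:
    name: str
    admin_level: int
    # Value of place=* on the related place node, or None if there is none.
    linked_place: Optional[str] = None


def classify(boundary: Boundary, defaults: dict) -> str:
    """Classify a boundary; a related place=city node overrides the default."""
    if boundary.linked_place == "city":
        return "city"
    return defaults.get(boundary.admin_level, "unknown")


print(classify(Boundary("Kraków", 6, linked_place="city"), PL_DEFAULTS))
print(classify(Boundary("some rural county", 6), PL_DEFAULTS))
```

The first call yields a city, the second falls back to the county default.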

I have reindexed all of Koszalin[1] on nominatim.openstreetmap.org in order to
show you the problem. Take a random street in Koszalin:
https://nominatim.openstreetmap.org/ui/details.html?osmtype=W&osmid=275110865&class=building

You can see that Nominatim has correctly figured out that the admin_level 6
describes a city (= address rank 16). So far so good. But then another rule
comes in: it knows that admin_levels describe a perfect hierarchy. It now finds
a boundary with admin_level 7. It cannot be a city anymore, so it must be a
city borough (= address rank 18). The same happens with admin_level 8. It is
inside a city borough, so it must be a suburb (= address rank 20).
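
The cascade described above can be reproduced with a toy model. This is only a sketch under simplifying assumptions, not the real algorithm: the function `assign_ranks` is hypothetical, levels above the city are simply ignored, and the only numbers used are the three address ranks quoted above (16, 18, 20).

```python
# Address ranks as quoted in the post.
CITY, BOROUGH, SUBURB = 16, 18, 20


def assign_ranks(admin_levels, city_level):
    """Rank nested boundaries, assuming each level lies strictly inside the previous one.

    The boundary at city_level becomes the city; every boundary nested
    below it is pushed one step further down (borough, then suburb).
    """
    ranks = {}
    below_city = iter([BOROUGH, SUBURB])
    for level in sorted(admin_levels):
        if level < city_level:
            continue  # levels above the city are not modelled here
        if level == city_level:
            ranks[level] = CITY
        else:
            ranks[level] = next(below_city, SUBURB)
    return ranks


# Koszalin has boundaries with admin_level 6, 7 and 8 for the same area.
print(assign_ranks([6, 7, 8], city_level=6))
```

For Koszalin this gives rank 16 for level 6, rank 18 for level 7 and rank 20 for level 8, although all three boundaries describe the same city.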

I can (and will) add a rule that detects these duplicates (most likely by
comparing the wikidata tag). Still, I wanted to make you aware that this
duplication of boundaries is a rare exception in OSM. All other countries I
have looked at so far create exactly one boundary and usually set it to the lowest
applicable admin_level. More consistency would be useful. So maybe you can
revisit your decision and see if the duplication is still necessary today.
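
A duplicate check based on the wikidata tag could look roughly like the sketch below. It is an illustration only: the rule does not exist yet, `drop_duplicates` is a hypothetical helper, and the identifiers `Q_CITY` and `Q_REGION` are placeholders rather than real Wikidata IDs.

```python
def drop_duplicates(boundaries):
    """Keep only the lowest admin_level among boundaries sharing a wikidata tag.

    Each boundary is a dict with an 'admin_level' key and an optional
    'wikidata' key. Boundaries without a wikidata tag are always kept.
    """
    seen = set()
    kept = []
    for boundary in sorted(boundaries, key=lambda b: b["admin_level"]):
        qid = boundary.get("wikidata")
        if qid is not None:
            if qid in seen:
                continue  # same entity already seen at a lower admin_level
            seen.add(qid)
        kept.append(boundary)
    return kept


chain = [
    {"admin_level": 8, "wikidata": "Q_CITY"},    # placeholder ID
    {"admin_level": 4, "wikidata": "Q_REGION"},  # placeholder ID
    {"admin_level": 6, "wikidata": "Q_CITY"},
    {"admin_level": 7, "wikidata": "Q_CITY"},
]
print([b["admin_level"] for b in drop_duplicates(chain)])
```

With the duplicates at levels 7 and 8 dropped, only the level 4 and level 6 boundaries remain and the city no longer gets a borough and a suburb of its own.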

[1] Most data on nominatim.openstreetmap.org is still indexed according to the
old algorithm. You will not see the same effect in other cities.

Kind regards

Sarah


Translation by DeepL:

Obecnie pracuję nad tym, jak wyszukiwarka Nominatim obsługuje administrację
granice do tworzenia adresów. Jedną z kwestii, które próbowałem rozwiązać jest
kwestia poziomów administracyjnych i miast. Żeby dać ci przykład, o co mi chodzi:
w Polsce admin_level 6 odpowiada “powiatowi”. Ale duże miasta jak Kraków
mają też admin_level 6.

Nominatim rozwiązuje to teraz, patrząc na węzły miejsca. Jeśli istnieje administracyjny
granica z admin_level 6, który zawiera powiązany węzeł miejsca z place=city,
wtedy granica jest klasyfikowana jako miasto. To działa całkiem nieźle, z wyjątkiem
Polska. Zdecydowaliście się na powielenie granic administracyjnych dla poziomów
które zaginęły. Te duplikaty powodują problemy.

Przekindeksowałem cały Koszalin[1] na nominatim.openstreetmap.org w celu
pokazać ci problem. Weź losową ulicę w Koszalinie:
https://nominatim.openstreetmap.org/ui/details.html?osmtype=W&osmid=275110865&class=building

Widzisz, że Nominatim poprawnie zorientował się, że admin_level 6
opisuje miasto (= stopień adresowy 16). Na razie dobrze. Ale potem inne zasady
wchodzi: wie, że poziomy admin_levels opisują doskonałą hierarchię. Teraz znajduje
granica z admin_level 7. To już nie może być miasto, więc musi to być
gmina miejska (= stopień adresowy 18). To samo dzieje się z admin_level 8. Jest to
wewnątrz dzielnicy miejskiej, więc musi to być przedmieście (= adres rangi 20).

Mogę (i będę) dodać regułę, która wykryje te duplikaty (najprawdopodobniej przez
porównując znacznik wikidata). Mimo to, chciałem ci uświadomić, że to
powielanie granic jest rzadkim wyjątkiem w OSM. Wszystkie inne kraje I
jak do tej pory stworzyli dokładnie jedną granicę i zazwyczaj ustawili ją na najniższą
obowiązujący poziom admin_level. Przydałaby się większa spójność. Więc może uda ci się
zrewidować swoją decyzję i sprawdzić, czy powielanie jest jeszcze dziś konieczne.

[1] Większość danych na nominatim.openstreetmap.org jest nadal indeksowana zgodnie z normą
stary algorytm. Nie zobaczysz tego samego efektu w innych miastach.

Hi, I think the Polish community would be open to discussing changes if we are indeed a “special snowflake” on a world scale. However, I think data consumers tend to stay quiet until something surprises them.

Such a change should be well announced, and even then there will be some silly questions and people bothering us. Outsiders tend to assume that OSM is a static entity with well-defined rules (we know it’s anything but).

Also, I have a feeling that some geocoder developers have already solved the problem you have. Maybe it was Mapzen? I don’t remember.

I guess that it is caused by https://en.wikipedia.org/wiki/City_with_powiat_rights https://pl.wikipedia.org/wiki/Miasto_na_prawach_powiatu

The problem with the proposed “create exactly one boundary and usually set it to the lowest applicable admin_level” is that it would be incorrect: parts of the country would have no admin_level=6 classification, contrary to the actual situation.

Creating one boundary and setting it to admin_level=6 should work. As I understand it, there are not two separate legal objects but one (“Miasto na prawach powiatu jest gminą o statusie miasta, wykonującą zadania powiatu”, that is, “a city with powiat rights is a gmina with city status that performs the tasks of a powiat”).

So duplicates may be actually incorrect!

See

I suggested that because it is the usual legal situation. In Germany, for example, we have ‘city counties’, which are practically a city but administratively a county. So place=city + admin_level=6 is correct. If in Poland the cities are administratively without a county, then admin_level=8 is, of course, more correct. Anything works, really. The question is rather: does the current duplication correctly reflect the administrative situation in Poland?

I’ve just brought up the issue because I fear that the duplicated boundaries may partly be Nominatim’s fault. It has handled this situation very badly for many years. If that is the case then we should get rid of the ‘mapping for the geocoder’.

You are probably thinking of libpostal. It is a cool piece of software and I always keep an eye on what it does, but it is not directly usable for the boundary classification in Nominatim.

Unfortunately it does – in the TERYT database (the official government registry of the administrative division of Poland) Koszalin is at the same time:

  • a county – with teryt:terc=3261
  • a municipality – with teryt:terc=3261011
  • a city – with teryt:simc=0949448

The situation is identical for the other county-cities in Poland (currently 66, if Wikipedia is still correct).

I guess it is more related to the way we try to reflect the administrative status in the OSM database. I personally don’t have a problem with Koszalin being described only with admin_level=6, but I suppose some people would complain if the relations for the gmina miejska with admin_level=7 and the miasto with admin_level=8 were missing.