New England place name inflation

Minh_Nguyen · February 4, 2024, 1:53am

My read on this thread is that we (continue to) have a consensus that the official designations of places in New England should go in border_type=* rather than place=*, out of a desire for some degree of harmonization across regions. This is not to say that towns aren’t places, just that they don’t fit into the same definition of “place” as the other local-scale place=* values. The discussion quickly turned to alternative criteria for place=* classification. I think most participants here are on board with place=* being based on some fuzzy, holistic criteria, just as in highway classification. That’s the easy part.

This is a great list, showing the challenge of adhering to a bespoke place classification system. These are all factors that could influence a single holistic place classification scale, but that isn’t to say we must limit ourselves to place=*.

In general, whenever official designations routinely differ from an OSM-centric classification system, the official designation can go in designation=* or a more specific key. In the U.S., we’ve tended to tag official designations as border_type=* on the boundary relation, because virtually every officially designated place has a boundary. This is more predictable and thus more reusable than conflating, say, New England towns with place=town and expecting data consumers to know the difference between the OSM and New England definitions.

Unfortunately, this requires the boundary to be mapped first, and some countries have official designations for places without definite boundaries. Instead, China pairs place=* with place:CN=* and the Philippines pairs place=* with place:PH=*. These local subkeys also extend to other classification systems: France pairs school=* with school:FR=*. By analogy, we could tag place=municipality place:US-VT=town on a Vermont town node, but a proliferation of hyperlocal subkeys like place:US-CA-Orange-Irvine=village (for Irvine’s “villages”) would be very difficult for editors to support, versus something more unified like place:official=US-CA-Orange-Irvine:village, or simply designation=village.
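To make the data consumer's side of this concrete, here is a minimal Python sketch of how a consumer might look up an official designation across the tagging options mentioned above. The function name, the precedence order (border_type=*, then regional place:*=* subkeys, then designation=*), and the sample tag dictionaries are all assumptions for illustration, not an established convention.

```python
def official_designation(tags, region_codes=()):
    """Return (key, value) for the first official-designation tag found, else None.

    Hypothetical helper: the lookup order below is an assumption, not a
    documented OSM rule.
    """
    # 1. border_type=* as used on U.S. boundary relations.
    if "border_type" in tags:
        return ("border_type", tags["border_type"])
    # 2. Regional subkeys such as place:CN=*, place:PH=*, or place:US-VT=*.
    for code in region_codes:
        key = f"place:{code}"
        if key in tags:
            return (key, tags[key])
    # 3. The generic designation=* key as a last resort.
    if "designation" in tags:
        return ("designation", tags["designation"])
    return None


# Illustrative tag sets only; they do not describe real map features.
vermont_town_node = {"place": "municipality", "place:US-VT": "town"}
boundary_relation = {"boundary": "administrative", "border_type": "town"}

print(official_designation(vermont_town_node, ["US-VT"]))
print(official_designation(boundary_relation))
```

Every additional hyperlocal subkey lengthens the list a consumer has to pass as region_codes, which is the editor-support burden described above.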

Regardless, in New England, the boundary of a town is much more meaningful than its abstract centroid, hence the hemming and hawing about border_type=*.

In OSM, we have two options for subordinating one administrative area to another, and both have drawbacks:

1. One boundary=administrative relation lies within another and has a numerically higher admin_level=* value. This forces us to adhere to a single linear hierarchy of places, which breaks down in many respects. For example, we have no good answer for how to indicate that a CDP boundary is simultaneously the third-level division of the Navajo Nation and equivalent in rank to a third-level division of the State of Arizona, yet the Navajo Nation outranks Arizona and the CDP is only considered administrative by the Navajo Nation. Maybe is_in:*=* tags make a comeback?
2. One boundary relation is a member of another with the role subarea. This enables us to model multiple inheritance, and it conveniently solves the problem that some countries allow an administrative area to subordinate an administrative area that lies outside its boundary. Unfortunately, since our data model requires a relation to list its members, rather than vice versa, we can quickly end up with monstrous, fragile relations, not to mention reference loops. Another disadvantage is the utter lack of software support.
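As a rough illustration of the subarea approach, the Python sketch below treats subarea membership as a parent-to-children mapping, shows that one area can have more than one parent, and checks for reference loops. The mapping format, the function names, and the placeholder IDs "A", "B", and "C" are assumptions for illustration; real relations would need to be parsed from OSM data first.

```python
def parents_of(subareas, rel):
    """List every relation that names `rel` as a subarea member."""
    return sorted(parent for parent, kids in subareas.items() if rel in kids)


def find_subarea_cycle(subareas):
    """Return relation IDs forming a reference loop, or None if there is none.

    `subareas` maps a parent relation ID to the IDs of its subarea members.
    Uses a depth-first search with three visit states.
    """
    UNSEEN, ACTIVE, DONE = 0, 1, 2
    state = {}
    path = []  # relations on the current search path, outermost first

    def visit(rel):
        state[rel] = ACTIVE
        path.append(rel)
        for child in subareas.get(rel, ()):
            child_state = state.get(child, UNSEEN)
            if child_state == ACTIVE:
                # The child is already on the current path: a loop.
                return path[path.index(child):] + [child]
            if child_state == UNSEEN:
                found = visit(child)
                if found:
                    return found
        path.pop()
        state[rel] = DONE
        return None

    for rel in list(subareas):
        if state.get(rel, UNSEEN) == UNSEEN:
            found = visit(rel)
            if found:
                return found
    return None


# Placeholder IDs: "C" is a subarea of both "A" and "B" (multiple inheritance).
shared = {"A": ["C"], "B": ["C"]}
print(parents_of(shared, "C"))
print(find_subarea_cycle(shared))

# A loop: A lists B, B lists C, and C lists A again.
looped = {"A": ["B"], "B": ["C"], "C": ["A"]}
print(find_subarea_cycle(looped))
```

Note that parents_of has to scan every relation, because membership is stored only on the parent. That is the same asymmetry in the data model that makes these relations large and fragile.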

Settlements (place=isolated_dwelling/locality/hamlet/village/town and place=block/neighbourhood/quarter/suburb/city nodes) can form a sensible, usable hierarchy or two if we decouple them from their boundaries. But there’s no official nationwide classification system, other than maybe something tangentially related to core-based statistical areas, so we run up against the limits of trying to be maximally data-driven and objective. For example, we haven’t reached a consensus about the degree to which amenities and services should factor into promoting a place from one classification level to another. But I think this is because we’re too focused on edge cases and not focused enough on the 90% case.