New England place name inflation

It turns out that there are also sometimes duplicative place tags on boundary=census as well. (overpass query)
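For anyone who can’t follow the link, a query along these lines would surface such relations. This is only a sketch and not the linked query itself: the helper name `census_place_query` and the idea of scoping the search to one state by name and admin_level=4 are assumptions made for illustration. It builds the Overpass QL text and does not send it anywhere.

```python
def census_place_query(state_name: str, timeout: int = 60) -> str:
    """Build Overpass QL that finds boundary=census relations carrying any place=* tag.

    Hypothetical helper: scoping by a state area (name + admin_level=4)
    is an assumption, not something taken from the linked query.
    """
    return (
        f"[out:json][timeout:{timeout}];\n"
        f'area["name"="{state_name}"]["admin_level"="4"]->.state;\n'
        # Relations tagged boundary=census that also have a place key.
        f'relation["boundary"="census"]["place"](area.state);\n'
        "out tags;"
    )


if __name__ == "__main__":
    print(census_place_query("Vermont"))
```

The resulting text can be pasted into overpass turbo; swapping `out tags;` for `out geom;` would also return the geometry.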

I noticed this because it was causing these CDPs to appear in Nominatim results. I removed the Vermont ones in 144555737.

Yes, it’s essentially the result of clustering on the server side in OpenMapTiles, which is being opinionated without having any context about things like the font size or spacing between the icon and text label. Vector maps need the flexibility to make collision decisions on the client side at runtime, informed by more objective data.

Renderers don’t use place=* on boundary relations, but geocoders do. At State of the Map 2021, @lonvia gave a great overview of the challenges in supporting diverse place classification strategies in Nominatim. Geocoders need to work with places as areas of some sort, since you aren’t necessarily in a given city based solely on your proximity to its center. In the U.S., we aren’t mapping urban areas or postal cities, so OSM-based geocoders focus on administrative units. Nominatim recognizes place=* on the boundary relation, but we’ve already established that settlements can differ so markedly from administrative structures that conflating the two creates more problems than it solves. Alternatively, Nominatim can match the place node to the boundary relation based on wikidata=*, the label role, or some heuristics involving name=*.
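To make the three linking strategies concrete, here is a rough sketch of how a geocoder might pair a boundary relation with a place node. It is not Nominatim’s actual code: the dict layout (`id`, `tags`, `label_node`), the function name, and the order in which the strategies are tried are all assumptions, and the name heuristic is reduced to a case-insensitive exact match.

```python
from __future__ import annotations


def link_place_node(boundary: dict, candidates: list[dict]) -> dict | None:
    """Return the candidate place node that corresponds to the boundary, if any.

    Illustrative only: the data layout and the priority order are assumptions.
    """
    # 1. Same wikidata=* value on both objects.
    qid = boundary["tags"].get("wikidata")
    if qid:
        for node in candidates:
            if node["tags"].get("wikidata") == qid:
                return node

    # 2. Node is a member of the relation with the "label" role.
    label_id = boundary.get("label_node")
    if label_id is not None:
        for node in candidates:
            if node["id"] == label_id:
                return node

    # 3. Fallback heuristic: identical name=*, ignoring case.
    name = boundary["tags"].get("name", "").casefold()
    if name:
        for node in candidates:
            if node["tags"].get("name", "").casefold() == name:
                return node

    return None


if __name__ == "__main__":
    # Made-up objects, not real OSM data.
    boundary = {"id": 1, "label_node": None, "tags": {"name": "Exampleville"}}
    nodes = [
        {"id": 10, "tags": {"name": "Exampleville Center", "place": "village"}},
        {"id": 11, "tags": {"name": "exampleville", "place": "town"}},
    ]
    print(link_place_node(boundary, nodes))
```

The point is that none of these strategies needs place=* on the relation itself, so removing that tag does not have to break the association.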

As you remove place=* from the boundary relations, make sure there’s some way for a geocoder to reassociate the boundary with a settlement if there’s a strong association in reality. For example, I regularly relate place=* POIs representing unincorporated areas with boundary=census relations representing CDPs while keeping them unrelated to boundary=administrative relations.

On the other hand, some place=* values represent space-filling places rather than population centers: county, state, and country are usually mapped as independent nodes at centroids, presumably as a compatibility shim for data consumers that don’t process boundary relations. I suppose place=municipality could be used in the same manner, but I haven’t bothered to do that. Instead, I’ve been distinguishing between Midwestern townships (analogous to New England towns) and other local places using border_type=* on the boundary relation.

If a boundary relation has a place tag, the relation includes a place node, and the boundary ways have a place tag, is it OK to remove the place tag from the relation and the ways? I’m seeing contradictions on many, probably because someone changed one or the other and didn’t realize it was duplicated.
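A quick way to spot these contradictions before deciding what to remove is to collect every place=* value found on the relation, its place node, and its member ways. The sketch below assumes the tags have already been loaded into plain dicts; the function name and data layout are made up for illustration.

```python
def place_values(relation_tags: dict, node_tags: dict, way_tags_list: list) -> list:
    """Return the distinct place=* values across a relation, its place node and its ways.

    More than one value in the result means the objects contradict each other.
    """
    values = {
        tags["place"]
        for tags in [relation_tags, node_tags, *way_tags_list]
        if "place" in tags
    }
    return sorted(values)


if __name__ == "__main__":
    # Made-up tags: the relation and one way say "town", the node says "village".
    found = place_values(
        {"boundary": "administrative", "place": "town"},
        {"place": "village"},
        [{"place": "town"}, {"boundary": "administrative"}],
    )
    print(found, "contradiction" if len(found) > 1 else "consistent")
```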

This makes sense. Colorado has 4 types of Municipalities. You’re saying this could go in border_type, it doesn’t have to match the place node.

Yes, I agree that border_type=* can be used as stated here but it has gotten conflated with legacy “other things” and at least for land borders, isn’t very reliable. For maritime borders, it is more reliable.

I continue to believe that place=city should only be used for incorporated municipalities. Other things can be “also true” of a place so tagged (it is a capital city, a consolidated city/county, an independent city, “named a town” (in California, “city” and “town” are synonymous by law), etc.). But as “city” is so elastic, we should continue to insist that at least “incorporated” is true, considering we also use it for population=50000 to population=30000000 (or so…I don’t know what Tokyo is, something like that). That’s a wide range, and constraining “city” to at least “incorporated” (as we do in the fifty states) keeps some sane, sensible bounds on things which are not especially tame to have bounds put on them.

Edit: Unless they already exist (and I think they do, at least are paid attention to by some renderers), it may be time for OSM (along with USA, EU, others…) to craft some “large_city” and maybe “mega_city” values for Earth’s 5M+ and 20M+ cities, for example. If / as renderers coalesce on these, paying attention to some consensus we (globally) hammer out, we win all around.

I’m not sure how much this rule would accomplish, other than blurring the line further between administrative areas and settlements. Not many unincorporated settlements are large and important enough to justify place=city anyways. But for example urban Honolulu is place=city by common sense, even though it’s unincorporated and only the surrounding county is considered a “city”. It’s similar to the situation in many New England towns and Puerto Rican municipios that aren’t fully urbanized.

I’ve been following along with interest, as we’ve had further conversations along similar lines.

A question for you all though, please.

There’s been lots of mention of “incorporated” vs “unincorporated” places - what’s the difference?

The Wikipedia article “Incorporated town” seems to say that if the local government is elected, then it’s incorporated?

Is that it?

What I am about to say is likely true only in many or most of the 50 US states. It does not apply to Australia (though maybe aspects of it do, as English common law has some overlaps; I’m not sure how much or where) or to other non-USA places.

An incorporated city has received a charter from its state and is allowed to elect its own officials. It is a “body corporate” (similar to a corporation, but we might say “municipal law, not corporate law” or “municipal bond, not corporate bond”). Things like cities, towns, villages, charter townships (in Michigan) can be incorporated. It is not simple to explain in 50 words or less, but that’s largely it. Things vary from state to state (here, in the USA, where we attempt to apply this).

There isn’t a single answer for the whole country. In the context of this conversation, it would probably be more accurate to speak of municipalities rather than incorporated places. Unincorporated places have no direct self-government, though they may be served by any number of overlapping special-purpose districts (schools, fire protection, water, etc.).

A lot of this discussion has been about administrative units that are not populated places per se. A rural New England town or a Midwestern township or county has directly elected officials but (depending on the state) isn’t incorporated. All that means is that the administrative unit is legally part of the state government, rather than an independent municipal corporation organized under state law.

(Note that Puerto Rico is divided into municipios, which are kind of like New England towns rather than municipalities on the mainland.)

Getting likely a bit into the weeds here (with some readers), but you brought up municipios (in PR), Minh, and we document these in our wiki:

• “municipalities” exist in the 50 states, (not quite Hawaii, it can be explained, we do [1]) as usually- (always-?) incorporated bodies (which have been given their charter by their state and allow them to elect officials to govern “locally”),

• “municipalities” exist in the USA’s territories and commonwealths (non-state divisions more strongly associated with federal-level governance), where municipalities in these go by many different names:

Outside of the 50 states, and as each of these is a “territorial municipality”: districts of U.S. Virgin Islands, municipios of Puerto Rico, villages of Guam, districts and unorganized atolls of American Samoa (overpass turbo) and islands and an island group in Northern Mariana Islands are tagged admin_level=6, the same as counties in the 50 states. Puerto Rico’s municipios and wards were imported from Puerto Rico Planning Board data, while Guam’s village data (names end in “Municipality”) are from the Census Bureau. A territorial municipality boundary relation’s name=* tag should include the word “Municipality” (or one of its flavors: district, municipio, village, unorganized atoll, island or island group).

Full disclosure in case you don’t already know this: Minh (and to a small extent, I) substantially wrote “Boundaries” (wiki), which is more descriptive; I and many others substantially wrote “United States admin level” (wiki), which is more prescriptive. So, a lot of people have already said a great deal about this. We can say more, but the consensus we have achieved with these wiki pages and present (and ongoing) tagging, while not exactly fragile, has been “carefully wrought.” It can be bent, but it can also be bent so far that it breaks.

[1] United States admin level - OpenStreetMap Wiki
[2] https://wiki.openstreetmap.org/wiki/United_States_Boundaries

Aw, shucks, I think my contributions to that page account for only a small fraction of what’s there now. In any case, the page is primarily about boundaries, which are only somewhat related to the populated places we’re classifying in this discussion.

New England is a head-scratcher when it comes to devising a hierarchy of populated places, but it’s downright easy compared to some other states when it comes to rationalizing boundaries into administrative “levels”.

Thanks!

I think I’ll just forget that I even asked! 🤪 🥴 😵‍💫

I disagree; this is an arbitrary rule that I see having no place in defining the relative importance and size of a population center. That place= node isn’t a statement about legal structures, it’s a statement about the cluster of population and infrastructure that exists in a place. I think these are completely independent concepts, especially considering how widely government structures vary.

I think we will find there are places that aren’t incorporated as a city but should be place=city as well as places that ARE incorporated as a city but should not be place=city.

There does seem to be a “ceiling” on an unincorporated town: it simply shouldn’t try to “stand on its toes” to become tall enough to be a city if it isn’t incorporated. I say that because of how large place=city can become (millions), which is so far away from an unincorporated town (maybe 10,000 people, maybe stretched that far by a university or hospital). Town (unincorporated) and city (incorporated) seem to have a sort of “hard boundary” separating them, one that the idea of incorporation captures.

I either need to have examples of both presented to me, or stretch my mind a bit (it wouldn’t be the first time I’ve had to stretch my mind in OSM, or change it, or deal with a new concept and reassess a decision I’ve made). And I’d be fine with doing it again! For the former, would it be that they are (simply?) a “big town,” and so we call it a city? For the latter, is it because they are incorporated, but very (VERY) small?

Every city in Hawaii would be the most classic example. Honolulu, Kona, Hilo would all be examples. Hawaii only has state and county level governments. The place node for Honolulu represents the urban area located south of the Ko’olau mountain range and east of Aiea.

An excellent example, and one I know you can attest to personally. Though (and it isn’t a deal-breaker), 49 of the USA’s 50 states don’t have this “direct by the state kind of governance” quite like Hawaii does, where only the island of Oahu is incorporated out of all of the islands, and even then, only certain parts (zones?) of Oahu are “urbanized,” which often is associated (sometimes closely, sometimes not) with the concept of “incorporated.”

I realize I’m conflating a lot, but I’m also doing my best to keep certain concepts separate here, which is what I believe incorporation does (for municipalities): make “cities” (urbanized areas) self-governing and “a body separate from the state.” Yes, “relative importance and size of a population center” (in that order) certainly does guide OSM quite strongly about “what is a city?” (in the USA). What I continue to listen to is whether others think this should also include “incorporation” and perhaps “being urbanized” or “as a larger conurbation.”

Again, it’s important to emphasize “what makes a settlement a conurbation?” (and related discussions) can be pretty squishy concepts and that stretching definitions to fit all cases can be difficult when our goal is to achieve wide consensus. And New England might not be exactly right for the rest of USA (though as was said, perhaps that ship has sailed, at least in this topic / thread), and agreements that can be achieved to work in USA likely won’t work in Australia or EU or India or elsewhere.

To expand on this, I think that a higher place= value can “encompass” areas which include lesser place= values. In other words, the nodes do not represent distinct bounded areas, they represent the center of a population cluster which holds identity influence across a wide or narrow area.

For example, suppose someone lives in the Corey Hill neighborhood of Brookline, MA, which is itself technically a separate town that for all intents and purposes is part of Boston, even though it’s not within city limits.

If that person were in California, they would say that they’re from Boston. If that person were in Boston, they’d say they live in Brookline, and if that person were in Brookline, they’d say they live in Corey Hill. This to me is the essence of what the place nodes represent - the degree of their identity influence as a named place (with apologies for making this even squishier than it already is).

I think necessarily that means the surrounding context has a lot of influence over how we decide upon a population center’s identity significance.

I drew this chart (which I hereby release as CC0) to lay out my personal mental model about how I think about place=* nodes:

That small settlement in the middle of nowhere, long distances from anything else, can indeed be a place=city, while that same-sized place might get a =hamlet node in a dense conurbation.

This uplifts Barnstable (the place) rather than Hyannis, which is the urban centre. I note this because I recently noticed that Hyannis had been downgraded to place=village and a town node added for Barnstable in more or less the same place.

I notice other places on the Cape need more work. The town of Yarmouth probably only really exists as an administrative geography; the actual commercial & population centre looks to be South Yarmouth (but all CDPs are significantly larger than hamlets, and are the names people use locally & for addresses). Where the admin geography matters is in things like local residents’ parking permits for beaches within the boundary. I think better mapping of schools, places of worship & retail areas can assist in weighting places (see Stefan Keller’s SotM talk on Areas of Interest).

As I think about this more, I believe it is important to emphasize OSM distinguishes the differences between a node and a (multi)polygon tagged place=city. For the latter, I do believe it must be incorporated, and the boundary=* relation reflects the “boundary of incorporation” (which may also include a label=* node for renderers to “place” the node). For a node so tagged (city, not label), I’m tempted to say “don’t do this, except as a shortcut, as there is a better, richer dataset expressible as a (multi)polygon, so do that instead.” For [town, village, hamlet], it isn’t very likely there is such a boundary, so these values can be “restricted” to only being on a node when there is no legal boundary. That’s a fundamental difference which our wiki states, but is good to reiterate here.

Does the “curvy” aspect of Brian’s (excellent) diagram accommodating the “how sparsely or densely settled is the region?” dimension of this capture at least part of what the OP was asking about? Yes, quite well. And it may be that regional or statewide “more local” preferences (New England being a particular example) best accommodate this. Adam’s maps show us there may be others.

Edit: Clarified that [town, village, hamlet] may or may not find themselves on either a node or a polygon — it isn’t always.

I am of the opinion that there is nothing particularly wrong and everything particularly correct with putting a place=* tag on a boundary relation; this isn’t a problem. What we see in that OT rendering of New England (particularly much of Connecticut, Rhode Island, western Massachusetts, southern New Hampshire and much of Maine) is that there are many such boundary relations, including for such things as town, village and even hamlet (I believe). As these data are correct, it is appropriate for OSM to have such a type=boundary relation (could be multipolygon-flavored with outer and inner members, even as it is tagged type=boundary rather than type=multipolygon).

But where these data (in the politics of the real world) are indistinct or not legally or otherwise definable, and the “location extent” of a town, village or hamlet is much more “amorphous,” well, OSM solves that quite well by tagging a node with place=* [town, village, hamlet].

Since the same words are used to mean different things below, I use monospace text to indicate an OSM tag and the ® symbol to indicate a type of municipal government. Thanks to @Minh_Nguyen for this idea.

It seems to me that we are talking about two related but distinct concepts here:

  1. Administrative boundaries of incorporated municipalities
  2. Settlements of various size/importance with somewhat blurry bounds

My mental model is that concept 1 is mapped with a boundary relation and tagged boundary=administrative + admin_level=8 (or sometimes a different number) and may additionally include border_type=city|town|village|township|etc to indicate what type of municipality governs this incorporated area. I don’t consider a place=* tag necessary on such a boundary relation as the previously mentioned tags include all the necessary information. If I were to include a place= tag on a municipal boundary, then I suppose place=municipality would be the appropriate one as it is documented as an administratively declared place on the wiki.

I wouldn’t include place=city|town|village on a municipal boundary because my understanding is that these tags represent concept 2, populated settlements with blurry bounds. The municipality I live in is called Colchester and it is modeled as an admin boundary in OSM. Although this municipality uses a Town® government and is officially named Town of Colchester, there is no distinct, dense settlement that fits the description of a place=town within this municipal area. There are two distinct settlements within the Town® of Colchester that fit the description of place=village. One is called Malletts Bay and the other is called Colchester. These are both modeled with place=village nodes in OSM. The village of Colchester is a different entity than the Town® of Colchester although they are related. The distinction between a settlement and a related administrative boundary becomes blurrier when an incorporated municipality is called a Town® and also contains a dense enough settlement to fit the description of a place=town, but even then these are still distinct (though related) concepts to me.
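As a rough illustration of this two-concept model, the Colchester example can be written out as plain tag dictionaries. This is a sketch transcribed from the description above, not an export of the live OSM objects: `type=boundary`, `border_type=town`, and the relation’s `name` value are assumptions, and `mixes_concepts` is a made-up helper name.

```python
from __future__ import annotations

# Settlement values that, in this mental model, belong on place nodes only.
SETTLEMENT_VALUES = {"city", "town", "village"}

# Concept 1: the municipal boundary relation (tag values partly assumed).
colchester_boundary = {
    "type": "boundary",
    "boundary": "administrative",
    "admin_level": "8",
    "border_type": "town",  # assumed from "uses a Town government"
    "name": "Colchester",
}

# Concept 2: the settlements inside it, each modeled as a place node.
settlement_nodes = [
    {"place": "village", "name": "Malletts Bay"},
    {"place": "village", "name": "Colchester"},
]


def mixes_concepts(boundary_tags: dict[str, str]) -> bool:
    """True if a boundary relation also carries a settlement-style place=* value."""
    return boundary_tags.get("place") in SETTLEMENT_VALUES


if __name__ == "__main__":
    print(mixes_concepts(colchester_boundary))
    print(mixes_concepts({**colchester_boundary, "place": "town"}))
```

Under this model, place=municipality on the relation would not be flagged, since it describes an administratively declared place rather than a settlement.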
