New England place name inflation

That feels intuitively correct to me, looking at it. I think for New England the Micro/Metropolitan NECTAs (available in TIGERweb) provide better results and map more closely to the MSAs for the rest of the country.

NECTAs and MSAs (really, any data published by the Census Bureau) can certainly guide us, but shouldn’t “rule” us. Let’s decide for ourselves what the right (statewide, regional-area…) “balances” are. If Census Bureau data bolsters those balances because OSM agrees with it, that’s OK, and it’s some confirmation that OSM is on the right track. But let’s not be wed to always agreeing with the Census Bureau, or to faithfully representing its data. It and OSM have different goals and different definitions for what each of us says exists.

I continue to think this is an awesome discussion. I very much agree (and it’s good that primary authors, movers and shakers of Americana as a renderer are here) that “how place names render” really IS an important consideration in all this, despite OSM’s admonishment not to tag for any given renderer. “Better” (with consensus) decisions should come earlier, and the correctly-rendered renderings should come later. We can do this.

Setting aside the thresholds, I think it might be wise to rewrite the United States/Tags – Places preamble to more strongly disambiguate the concept of municipalities (generally modeled as boundary relations) and place POIs.

In the global context boundary relations are a high bar to map and a single place=* node is a good first pass, but where boundary relations do exist for municipalities, we should be clearer about the usage of the place=* node. As coverage of municipal boundaries expands across the US, we are more able to treat these concepts of settlements/localities and municipalities independently.

I think the situations can be distilled into a few buckets:

  • (1) A municipality is mapped as a boundary relation and the main settlement it covers fills (or mostly fills) its extent.

    Examples:

  • (2) A municipality is mapped as a boundary relation and the settlement[s] it covers are only a small portion of the municipal extent.

    Examples of New England “Town” municipalities:

    • Middlebury, VT (boundary) – The main settlement called “Middlebury” is in the northwest corner of the municipality. A separate settlement of East Middlebury also exists within the municipality, as well as a named hamlet of Farmingdale. Much farmland in between makes for distinct settlements.
    • Royalton, VT (boundary) – The municipality is “Royalton”, but the largest and primary settlement is “South Royalton”. There are smaller settlements of “Royalton” (in the center) and “North Royalton” as well.
  • (3) A municipality is mapped as a boundary relation, but there are either no settlements or no distinctive settlements
    Examples:

    • Lewis, VT (boundary) – Lewis is actually unincorporated and managed by the state as it has no population, but it is still a municipal boundary exclusive of the neighboring Towns.
    • Lower Frankfort Township, PA (boundary) – The 1,757 residents are spread out across rolling farmland with no historical center or particularly dense cluster of settlement. There is no village to speak of and the town garage is on a random rural road.
  • (4) A municipality exists, but it hasn’t had its boundary mapped yet.

  • (5) There is a settlement without a municipality strongly associated

    • smaller settlements within a municipality
    • settlements in unincorporated areas.

While in all cases the POI place node represents the settlement rather than the municipality, I do think that in bucket (1) there is a very strong association between them and disambiguating what tags go on the boundary relation versus the POI place node isn’t necessarily obvious.

The second bucket (2) feels much more clear to me – I don’t think the POI place node should have the tags related to the municipality (total municipal population, municipal Wikidata id, etc). Those municipal values should be on the boundary relation and the POI place node should have only those related to the settlement itself (if known).

For bucket (3) I don’t think it makes sense to have any POI place node as there really isn’t any single point that one can show up to and say “I’ve arrived at ____”. Placing a POI at the geographic center is a falsehood that only serves labeling, which can be done automatically from the boundary extent anyway.

Bucket (4) is probably a compromise where the POI just holds a conglomeration until disambiguation is possible.

Bucket (5), like bucket (2), would have the POI node carry only tags related to the settlement itself.
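
The five buckets can be summarized as a small lookup. This is only a sketch of the proposal as described above: the function name `place_tags` and the tag dictionaries are hypothetical, and treating bucket (1) the same way as bucket (2) is an assumption, since that case is still open.

```python
def place_tags(bucket, municipal, settlement):
    """Return (relation_tags, node_tags) for one place.

    None means no such object is tied to this place.
    bucket     -- 1 to 5, following the list above
    municipal  -- tags describing the municipality (hypothetical dict)
    settlement -- tags describing the settlement itself (hypothetical dict)
    """
    if bucket in (1, 2):
        # Bucket 2 is the clear case. Handling bucket 1 identically is an
        # assumption of this sketch; the discussion leaves it open.
        return dict(municipal), dict(settlement)
    if bucket == 3:
        # No distinct settlement: boundary relation only, no POI node.
        return dict(municipal), None
    if bucket == 4:
        # Boundary not mapped yet: the node holds everything for now.
        return None, {**municipal, **settlement}
    if bucket == 5:
        # Settlement with no strongly associated municipality.
        return None, dict(settlement)
    raise ValueError(f"unknown bucket: {bucket}")
```

For example, `place_tags(3, municipal, {})` yields no node at all, which matches the point that a bucket (3) label can be derived from the boundary extent.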

This is a great breakdown of different settlement/municipal boundary patterns, @Adam_Franco. Thank you.

Another bucket would be a settlement that extends beyond the core municipal boundary. This can happen in metropolitan areas where all the land separating a city from surrounding towns is developed to the point that they effectively merge together. Sometimes the core municipality absorbs the surrounding towns. Other times it doesn’t. For example the Boston admin boundary represents the municipality, but the Boston place node represents a more general sense of the urban area which probably includes a number of suburbs that aren’t technically within Boston city limits.

Most of the cases I’ve come across are Bucket 1. For those, most have a relation with a boundary way and a place node. There is usually a place tag on the node and a place tag on the relation. (I found one with a place tag on the boundary way also). Perhaps the place tag should only be on the node?

Here, the admin boundary (if there is one) and the place are tightly coupled. Buckets 2 and 3 don’t seem common around here, but I’m sure we could find some.

Is it OK that this discussion is broader than New England or would you like to keep it focused?

Good question. I think that yes, this discussion is broader than New England, but it is possible that New England is the only part of the country that is completely out of whack due to its history of development and municipalities being named “Town” rather than “Township” in a way that confuses people trying to assign values to the place hierarchy. As well, many New England states have little or no unincorporated area and no county governments, so “Towns” are large and sometimes sparsely-populated administrative areas here.

In other parts of the country with County-level government, Towns/Cities/Boroughs were formed by carving out a densely populated part of the County for self-government, leaving the less populated portions outside of the incorporated territory. This makes those towns much more likely to fit the OSM place=town definition than the “New England Town” as they had a dense enough population to bother incorporating.

Using this Overpass rendering of place=* nodes, it doesn’t seem that the rest of the country is as miscategorized as New England:

Mid-Atlantic and Southeast seem like they have a reasonable distribution of cities/towns/villages/hamlets:


I don’t have enough local context to know if the Midwest and plains are over-classified or not:


The west looks much more distributed overall, but Wyoming has a bunch of place=town with populations < 100 which may or may not be appropriate given how little else there is out there:


Long story short, New England is a particular problem but we should try to solve the overall classification with an eye toward a national standard.

Absolutely fantastic work here, @Adam_Franco . Thank you for all you are showing us!

Considering the fairly natural distribution of places throughout states with appropriately sized incorporated municipalities, I feel like we could include one or more tags based on population or some other relative measure of size or importance. Adding the second tag would override the zoom level at which the normal place value is rendered. This would prevent confusion by shifting importance to another tag instead of artificially changing the incorporation type.

So in the example of New England states, the designated legal incorporation of “town” would appear as the default value of the place tag, though it would be fully or partially ignored in favor of the importance= value.

Taking a look at Addison County: compared to Middlebury, Vergennes looks less important to me based on what I can see on OSM. So I would rather see Middlebury as a city and Vergennes as a town. Wikipedia follows our Carto-Map, which seems to follow the boundary relation, not the place node.

The relation for Vergennes is treated as a city, but the place node for Vergennes is a village. :smiley:

It’s too late… OSMUS Slack has already hung a whole #place-classification channel off this thread, with an eye toward writing new national guidelines along the lines of the 2021 highway classification overhaul. The “inflation” that prompted this thread has already been reverted anyhow.

There’s no easy answer. A couple years ago, there was a proposal to change all the population=* tags on place nodes to reflect the overall metropolitan area’s population. The idea was that it would make it easier for OSM-only renderers like osm-carto to prioritize major cities where the population mostly resides outside the city limits, but the idea fell apart because there isn’t a neat one-to-one correlation between metropolitan areas and cities.

Some cartographers dream of being able to rely on OSM to say that, for example, San Francisco takes precedence over San José, so they suggest including the whole MSA. Except San José is in a different metropolitan area, so the whole CSA would be included in San Francisco’s population and the other cities just get the populations in their city limits. But this would result in the perverse situation where San José’s population gets transferred to a much smaller city 50 miles away in a different metropolitan area, only because that city is more famous. The global community has previously rallied against such arbitrary accounting tricks, preferring population=* to retain its more literal meaning.

A similar suggestion that sometimes comes up is to demote all the satellite cities to place=town and use place=city more judiciously. This is an elegant solution, but it doesn’t solve the edge cases. Again, if San Francisco becomes the only place=city in the Bay Area, it would erase the fact that San José suburbs like Gilroy have nothing to do with San Francisco. This problem tends to occur along major transportation corridors and along coasts, where the population doesn’t have a clear nucleus.

To the extent that a renderer or geocoder needs something more nuanced than a raw population within the city limits, it should look beyond OSM. Traditionally, data consumers like Mapbox have simply hard-coded preferences for some world-class cities over others. But a data consumer can produce more data-driven results by consulting the linked Wikidata items, which come with plenty of demographic and economic data on multiple axes that we don’t tag in OSM, along with data that can be used as proxies for notability, such as Wikipedia article page views.

Note that place=* does not specify the legal incorporation type of a municipality. Rather it indicates what type of place a settlement is in a more abstract general sense. A place=town can correspond to a municipality that is incorporated as a City®, Village®, Township®, Borough®, or Town® depending on the legal terms used in a given state. For example, an incorporated Village® in my area recently reincorporated as a City®, but regardless of legal status the appropriate place tag for this settlement is town based on the population and services available there.

At under 3000 residents place=village does seem appropriate for Vergennes. However, it does have a full size supermarket and serves as a small commercial hub for the immediate surrounding area (municipal boundary is just 2 square miles) so it could be considered a small place=town. It is incorporated as a City®, but that just means it uses a city form of government and is recognized as a city by the State of Vermont. Clearly it’s quite confusing that the terms City®, Town®, and Village® indicate specific legal status of a municipality while in OSM place tags these same words indicate settlement importance/size.

Since mappers really seem to want to record the type of government a municipality uses, perhaps establishing a separate key for this would be good. I’m imagining something like protection_title that indicates the type of protected area in freeform text (“State Park”, “Wildlife Sanctuary”, etc).

This comment stood out to me. I think it’s fine in principle for a settlement in a sparsely-populated area to have a higher place= value than the exact same settlement in a dense area. I.e. if these settlements are the most important features in the area, it makes sense for them to be more prominent than usual, and vice-versa. The devil is in the details of course about how to accomplish this.

Sounds like a good idea to separate the legal state of a place from its importance.

The San Francisco Bay Area foiled the last attempt at redefining population=*, but Boston is no less troublesome. If we ignore notability and economy and focus on population density for a moment, the Boston urban area would be the most appropriate statistical unit upon which to calculate a population for the purpose of label sizing. Urban areas are what many maps shade or highlight to represent a built-up area. But like many urban areas, Boston’s doesn’t follow any rational lines. It extends well beyond the Boston city boundary, even venturing into southern New Hampshire.

Ordinarily, metropolitan and micropolitan statistical areas do a decent job of answering the question, “Would a town in this area be considered a suburb of one of the principal cities?” Some large, multifocal statistical areas are further divided into metropolitan divisions to help answer the question of which principal city it’s a suburb of; San José–San Francisco–Oakland and Boston–Worcester–Providence are divided thus. But each of these structures fails to contain the Boston urban area:

To some extent, we can chalk it up to the unusual town government structure in New England, which doesn’t really aim to reflect population growth. But NECTAs and NECTA divisions are not much better at containing the urban area:

The urban area comprehensively accounts for all the built-up area around Boston, but we probably shouldn’t tag population=* based on urban areas sometimes and based on administrative structures other times. Neither mappers nor data consumers would be able to intuit that distinction reliably, even if we could agree on which populations should be based on which heuristic.

For simplicity, we could rely solely on urban areas to determine population=* tags, but ignoring everything besides demographics creates problems of its own. Notice the “Leominster–Fitchburg” urban area to the west of Boston: lots of urban areas represent population centers that have conjoined over the years. In theory, we could fairly distribute the population=* among the multiple named cities, but that’s a lot of implicit original research for a single numeric tag.

Taking a step back, labeling populated places on a map is essentially a point clustering problem, just without the colored bubbles. When clustering points, you have to defer the decision to show or hide a label until the last possible moment. Baking this decision into the data means making assumptions that may not always hold. Whether or not you think Newport is an important city like Providence, it’s no problem that this map labels both at zoom level 6, because there’s enough room for both labels to fit comfortably. But if someone who doesn’t have perfect eyesight sets their phone to show text at a slightly larger size, suddenly Newport looks a little less essential at this zoom level.

Normal-sized labels for Providence and Newport in Rhode Island.
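
To make the font-size point concrete, here is a minimal sketch of a greedy label-collision pass in Python. It is not how OpenMapTiles or any particular renderer works; the screen positions, population figures, and character metrics are all made-up placeholders chosen only to show the effect.

```python
from dataclasses import dataclass


@dataclass
class Place:
    name: str
    x: float          # hypothetical screen position in pixels
    y: float
    population: int   # placeholder figure, not census data


def label_box(place, font_scale, char_w=7.0, line_h=14.0):
    """Approximate the label's bounding box, centered on the point."""
    w = len(place.name) * char_w * font_scale
    h = line_h * font_scale
    return (place.x - w / 2, place.y - h / 2, place.x + w / 2, place.y + h / 2)


def overlaps(a, b):
    """True if two (min_x, min_y, max_x, max_y) boxes intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def visible_labels(places, font_scale=1.0):
    """Place labels from most to least populous, skipping any that collide."""
    placed = []
    for p in sorted(places, key=lambda p: p.population, reverse=True):
        box = label_box(p, font_scale)
        if not any(overlaps(box, other) for _, other in placed):
            placed.append((p.name, box))
    return [name for name, _ in placed]


places = [
    Place("Providence", 100, 100, 200_000),
    Place("Newport", 110, 120, 25_000),
]
print(visible_labels(places, font_scale=1.0))
print(visible_labels(places, font_scale=1.5))
```

With these placeholder numbers both labels fit at the default size, while the larger text size drops the less populous one. The decision depends on a value only the client knows at runtime, which is why it is hard to bake into the data.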

The place points can be clustered based on their literal populations, but a sophisticated renderer can consult Wikidata for more nuanced data to support whatever fancy accounting tricks it desires. Wikidata already directly or indirectly links cities to their containing metropolitan statistical areas, which are tagged with up-to-date population figures. There’s nothing stopping Wikidata from even covering NECTAs and urban areas.

As a demonstration, I’ve fleshed out items for all the MSAs, μSAs, and metropolitan divisions in the Boston–Worcester–Providence CSA (Q123564318); all the metropolitan and micropolitan NECTAs and NECTA divisions in the Boston–Worcester–Providence combined NECTA (Q123565597); and the Boston urban area (Q123566801). This SPARQL query visualizes the items as a tree:

This SPARQL query visualizes the effect of boosting a place’s population if it’s the first place that appears in the name of a statistical area:

Boston, Cambridge, and Providence enter the exclusive million-plus club by virtue of the statistical areas that are named after them. After Boston and vicinity, there isn’t much work left to cover the rest of New England. Tilesets generated by OpenMapTiles and Planetiler already run a similar SPARQL query against the Wikidata Query Service to retrieve translations of place names. A U.S.-focused tileset could quite reasonably adapt this query to pull in the enhanced populations seen here.
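
The boosting rule itself is simple enough to sketch offline. The Python below is a local paraphrase of the idea, not the SPARQL query: the function name is hypothetical, the figures are obviously fake placeholders in arbitrary units, and splitting area names on an en dash is a simplifying assumption.

```python
def boosted_populations(places, areas):
    """Raise a place's population to that of any statistical area whose
    name lists that place first.

    places -- {place name: population}
    areas  -- {statistical area name: population}; names are assumed to
              join their principal cities with an en dash
    """
    boosted = dict(places)
    for area_name, area_population in areas.items():
        first_named = area_name.split("–")[0].strip()
        if first_named in boosted:
            # Keep the largest figure if several areas lead with this place.
            boosted[first_named] = max(boosted[first_named], area_population)
    return boosted


# Placeholder numbers only, not real population figures.
places = {"Boston": 600, "Cambridge": 100, "Worcester": 200}
areas = {
    "Boston–Worcester–Providence": 8000,
    "Cambridge–Newton–Framingham": 2000,
}
print(boosted_populations(places, areas))
```

Worcester keeps its literal figure here because it is named second, which is the same asymmetry the query produces.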

In that case, I’ll add one more requirement: the distribution of place=city nodes and of place=town nodes shouldn’t make the state’s borders very obvious if there isn’t an abrupt change in population density on the ground. Unless something about the water in New York discourages town-building, the state’s eastern border proves a need for reclassification, probably on the New England side. By contrast, things are looking OK in much of the rest of the country, despite a lack of hard-and-fast rules.

This is similarly an implicit goal of highway classification: some states like Kentucky and Louisiana include a much larger proportion of the public road network in their state highway networks than surrounding states do. For a couple years after the TIGER import and its literal mapping of route networks to highway=* values, you could easily make out the shape of Kentucky in the form of orange-colored roads – one of the more jarring qualities of OSM back in those days.

A municipality inherently has a boundary; the boundary relation’s border_type=* tag captures this official classification in an open-ended but still machine-readable format.

As a native Southern New Englander, this map brings a few surprises in its choice of which cities end up with a label.

Now while Newport is a city in the legal sense (has a head of government called “mayor”, uses “City of” in its name, etc), its inclusion is out of place at this scale.

Instead, along that southern coast, I would expect to see the cities of New Haven, New London, and New Bedford (the English settlers were far from creative in their naming choices), as these places are certainly more significant population centers. Perhaps Newport might have more cultural prominence based on its popularity as a tourist destination, but that feels like a stretch.

My assumption is that this unexpected display choice is the result of label collisions from clustering algorithms. I can imagine that Providence collided out New Bedford’s label and therefore left room for Newport to be drawn. Bridgeport is much harder to explain, as New Haven is CLEARLY the more significant population center, and the missing New London makes even less sense.

Now if I were to stand in downtown Newport (something I do quite literally several times a week), the scene I see around me does not exactly scream “city” in any traditional sense. My pride would not be hurt if for these sorts of reasons we said “this is a city, but in OSM it’s a place=town” as part of a systematic reclassification.

I also note the missing Fall River label, which is certainly (and correctly) collided out by the Providence label. However, it’s definitely a city in the traditional and classic “looking around at downtown” sense and would make sense to encode as a city in the same way we would with San Francisco and Oakland, or Dallas and Fort Worth, as distinct urban centers.

I’m hopeful that we can find a more systematic, if not algorithmic, way to draw these distinctions than just eyeballing things…

It turns out that the addition of place=* tags on the boundaries was something added by other users independently of the changes I reverted. Since it isn’t rendered by the Carto stack it wasn’t obvious that such tagging was used, but thanks for noticing! Adding place=* to boundaries is discouraged in the wiki, but we should probably take an active stand on that as it is duplicative and likely to get missed in any updating of place nodes.

Searching via Overpass, it looks like New England also has a particular problem of place=* being placed on boundary relations.

Once guidelines are more solid it would make sense to remove these duplicative place tags from the boundary relations if that is what the guidance is.
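
For anyone who wants to check their own state, a search along these lines can be sketched as below. This is not the query linked above; the helper names are hypothetical, and selecting the state by its `ISO3166-2` tag is an assumption about how to scope the search.

```python
def place_on_boundary_query(iso_code, timeout=60):
    """Build an Overpass QL query for administrative boundary relations
    that also carry a place=* tag, limited to one state."""
    return (
        f"[out:json][timeout:{timeout}];\n"
        f'area["ISO3166-2"="{iso_code}"]->.state;\n'
        'relation["boundary"="administrative"]["place"](area.state);\n'
        "out tags;"
    )


def relations_with_place(elements):
    """From the 'elements' list of an Overpass JSON response, return the
    ids of relations that have a place tag."""
    return sorted(
        e["id"]
        for e in elements
        if e.get("type") == "relation" and "place" in e.get("tags", {})
    )


print(place_on_boundary_query("US-VT"))
```

The query string can be pasted into Overpass turbo or sent to an Overpass API endpoint; the second helper only filters a response that has already been downloaded.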

Following up on this, I’ve removed the duplicative place=* tags from boundary relations in Vermont where a POI node with the place tag also exists. (changeset)

Broadly, we wouldn’t want to drop the place=* tag from a boundary relation if that is the only way the settlement is mapped, but having the tag on two objects is duplicative and likely to conflict when edits to one aren’t reflected in the other (as we just saw).

It turns out that there are also sometimes duplicative place tags on boundary=census as well. (overpass query)

I noticed this because it was causing these CDPs to appear in Nominatim results. I removed the Vermont ones in 144555737.

Yes, it’s essentially the result of clustering on the server side in OpenMapTiles, which is being opinionated without having any context about things like the font size or spacing between the icon and text label. Vector maps need the flexibility to make collision decisions on the client side at runtime, informed by more objective data.

Renderers don’t use place=* on boundary relations, but geocoders do. At State of the Map 2021, @lonvia gave a great overview of the challenges in supporting diverse place classification strategies in Nominatim. Geocoders need to work with places as areas of some sort, since you aren’t necessarily in a given city based solely on your proximity to its center. In the U.S., we aren’t mapping urban areas or postal cities, so OSM-based geocoders focus on administrative units. Nominatim recognizes place=* on the boundary relation, but we’ve already established that settlements can differ so markedly from administrative structures that conflating the two creates more problems than it solves. Alternatively, Nominatim can match the place node to the boundary relation based on wikidata=*, the label role, or some heuristics involving name=*.
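
As a rough illustration of that last sentence, the three signals could be tried in order as in the Python sketch below. This is not Nominatim's actual implementation: the function name, the data shapes, and the order of precedence are assumptions, and a real name heuristic would also check that the node lies inside the boundary.

```python
def match_place_node(relation, nodes):
    """Pick the place node that belongs to a boundary relation.

    relation -- {"tags": {...}, "members": [{"type", "ref", "role"}, ...]}
    nodes    -- {node id: tags} for candidate nodes
    Returns a node id, or None if nothing matches.
    """
    # 1. An explicit member with the "label" role wins.
    for member in relation.get("members", []):
        if (member.get("type") == "node"
                and member.get("role") == "label"
                and member.get("ref") in nodes):
            return member["ref"]

    tags = relation.get("tags", {})

    # 2. Otherwise, a place node sharing the relation's wikidata=* value.
    qid = tags.get("wikidata")
    if qid:
        for node_id, node_tags in nodes.items():
            if "place" in node_tags and node_tags.get("wikidata") == qid:
                return node_id

    # 3. Last resort: a place node with exactly the same name=*.
    name = tags.get("name")
    if name:
        for node_id, node_tags in nodes.items():
            if "place" in node_tags and node_tags.get("name") == name:
                return node_id

    return None
```

If none of the three signals is present, the function returns `None`, which is the situation to avoid when removing place=* from a boundary that really is strongly tied to a settlement.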

As you remove place=* from the boundary relations, make sure there’s some way for a geocoder to reassociate the boundary with a settlement if there’s a strong association in reality. For example, I regularly relate place=* POIs representing unincorporated areas with boundary=census relations representing CDPs while keeping them unrelated to boundary=administrative relations.

On the other hand, some place=* values represent space-filling places rather than population centers: county, state, and country are usually mapped as independent nodes at centroids, presumably as a compatibility shim for data consumers that don’t process boundary relations. I suppose place=municipality could be used in the same manner, but I haven’t bothered to do that. Instead, I’ve been distinguishing between Midwestern townships (analogous to New England towns) and other local places using border_type=* on the boundary relation.