Framework for aligning New England place nodes to census categories

ezekielf · April 29, 2024, 1:26pm

That makes sense. The circles I drew are meant as city candidates subject to other criteria as well, not places I definitely think should be cities. I can definitely see Lynn not having enough density prominence to be separate from Boston, same as Quincy.

ZeLonewolf · April 29, 2024, 1:57pm

We seem to be coalescing on a definition of place=city that looks maybe like this:

place=city is used for place nodes that:

Represent the most significant center of population (the “urban core”) of an extended urban area with a sufficiently high population that it should be categorized at the same level of significance with other cities in the region.
Have sufficient separation from other place=city nodes that they represent a separate urban core and extended urban area - OR - represent an additional urban core within a larger extended urban area where the urban core is of similar magnitude to the principal city.

Minh_Nguyen · April 29, 2024, 2:52pm

For the most part, I think that definition is reasonable and fairly obvious.

This gestures at the notion that smaller places in sparser areas should get a boost. The Urban Area algorithm already has a tool for evening out the density of places across denser and sparser parts of the country, up to a point: it’s everywhere census block housing density is used as an input. Conceptually, this assumes that a certain urban density is necessary to establish that character of a place so big you can only know it partially, to paraphrase @Adam_Franco.

By contrast, ad hoc measures of regional significance tend to expose differences of perspective rather than guiding us toward consensus. We can only assign each place a single place=* value regardless of zoom level, but any time you zoom in, a larger set of places becomes regionally significant. People from one region will disagree markedly about the significance of places in another region, based on renown alone. Places can take on added significance in mass media or transportation that doesn’t greatly affect other aspects of life, as in Bangor and its special status as an emergency stop for transatlantic flights.

uk4sm · April 29, 2024, 6:33pm

Before dismissing NECTAs, you should consider treating Micropolitan areas as cities too, rather than Metros only. Also, since the US Census uses a 50,000-population core as the standard that separates metros from micros, the same threshold could be applied to separate major cities from huge metros, such as Nashua, Lowell, and Cambridge from the Boston metro. And of course, the rare cases that need exceptions, like the tiny Vermont capital of Montpelier, would have exceptions. This would almost match up perfectly with the current map. For example:

Maine Cities

Portland, ME (metro, South Portland included)
Lewiston, ME (metro, Auburn included)
Bangor, ME (metro)
Sanford, ME (micro)
Brunswick, ME (micro)
Augusta, ME (micro)
Waterville, ME (micro)

Removes Rockland, Presque Isle, and Biddeford from current map.
Adds Sanford to current map.

Vermont Cities

Burlington, VT (metro, South Burlington included)
Montpelier, VT (would be separate from Barre due to capital city status)
Barre, VT (micro)
Bennington, VT (micro)
Rutland, VT (micro)

Removes nothing from current map.
Adds Bennington to current map.

New Hampshire Cities

Manchester, NH (metro)
Nashua, NH (would separate from Boston metro, due to a core pop. greater than 50,000 people)
Concord, NH (micro)
Berlin, NH (micro)
Claremont, NH (micro)
Dover, NH (metro, Durham included)
Laconia, NH (micro)
Keene, NH (micro)
Lebanon, NH (micro)
Portsmouth, NH (metro)

Removes Rochester from current map.
Adds Berlin to current map.

ZeLonewolf · April 29, 2024, 8:21pm

Including such a broad list of places as place=city is counter to the sentiment in New England place name inflation and commentary that place nodes east of the Mississippi are overclassified.

Including locations such as Berlin, NH (population < 10,000), would then cause me to insist on adding not only Newport, RI, but also other small locations that have similar compact centers such as Wakefield, RI, Westerly, RI, Woonsocket, RI, and Pawtucket, RI.

Since it’s suggested to have Cambridge be a separate city despite it being a clear outgrowth of Boston, I would also similarly insist that the Providence suburbs of Warwick and Cranston be similarly tagged place=city.

Minh_Nguyen · April 29, 2024, 9:23pm

I don’t believe this comports with your earlier point, which I’ve been trying to make as well:

The OMB and Census Bureau do not intend for CBSAs (MSAs and μSAs) to be used in cartography and do not consider them to be an urban classification:

OMB establishes and maintains these areas solely for statistical purposes. In reviewing and revising these areas, OMB does not take into account, or attempt to anticipate, any public or private sector nonstatistical uses of the delineations. …

Furthermore, the MSA and µSA delineations do not produce an urban-rural classification, and confusion of these concepts has the potential to affect the ability of a program to effectively target either urban or rural areas, if that is the program goal. …

In 2020, the OMB rejected an official recommendation to raise the minimum population to qualify an MSA from 50,000 to 100,000, pending further study. While acknowledging that 100,000 would be a more accurate reflection of population, they were concerned about disrupting the administration of ongoing federal programs. We have no such backwards compatibility concerns.

Moreover, 50,000 is the minimum population of the urban area at the core of a CBSA, not the minimum population of the CBSA itself. As explained in the 2010 standards, this threshold was carried over from the former 50,000 threshold between urban clusters and urbanized areas. In other words, this is the threshold that had been frozen since 1950 and finally tossed out as antiquated when defining the 2020 urban areas. The country’s population has more than doubled since 1950; there’s no reason OSM needs to turn back the clock by 75 years.

I will let this observation speak for itself.

AntiCompositeNumber · April 30, 2024, 1:45am

Previous discussions on this forum and on the Slack have had good agreement that Berlin is just too small to be a place=city since the end of the timber boom. The largest three employers are the hospital, the prison, and the other prison.

uk4sm · April 30, 2024, 2:36am

By that logic, Presque Isle should not be tagged as a city, either. And unlike the latter, at least Berlin has a defined Micropolitan area.

uk4sm · April 30, 2024, 2:37am

If Berlin, NH, a city with a defined micropolitan area, is unacceptable, then why is Presque Isle, ME still tagged as a city?

The reason Berlin has a NECTA is because of its isolation. The distance between Berlin, NH and Concord, NH is over twice the distance between Newport, RI and Providence, RI. Wakefield, Westerly, Woonsocket, and Pawtucket are even closer together. There’s a reason they’re all part of the Providence metropolitan area. The entire state of Rhode Island is smaller in size than the Bangor, ME and Portland, ME metro areas.

ZeLonewolf · April 30, 2024, 2:51am

Agreed, Presque Isle should not be tagged as a city.

uk4sm · April 30, 2024, 2:52am

I don’t think you’re aware that there have already been attempts to downsize New England city labels in the past to only include larger cities with metro areas. Because of cases such as Montpelier and Augusta, it will never be accepted.

New England has less of a city inflation problem, and much more of a lack of standardisation problem. I support a standard that comes from an independent, government source, that doesn’t drastically change the map that currently exists.

If you try to limit a state like Vermont to Burlington only, a state like Maine to Portland, Bangor, and Lewiston only, you will never get a large enough consensus to implement those changes, at least if you expect them to last more than a week.

ZeLonewolf · April 30, 2024, 2:56am

On the contrary, there is a fair amount of consensus emerging from the people that actually live here as we continue to explore the issue.

I would suggest toning down the combative attitude if you seriously intend to collaborate.

ezekielf · April 30, 2024, 4:06am

The OMB has demoted Berlin as it no longer even meets the population threshold for a micropolitan area. See: Is Your Locality Impacted by the Changes to the 2023 Core Based Statistical Area Definitions? | Chmura.

Minh_Nguyen · May 3, 2024, 5:37am

A prerequisite for classifying the place=* points in New England is making sure that they actually represent what they purport to represent. The place=* values more or less fall into three orthogonal hierarchies:

| Administrative areas | Population centers | Parts of population centers |
| --- | --- | --- |
| `country` `state` `county` `municipality` | `city` `town` `village` `hamlet` | `borough` `suburb` `quarter` `neighbourhood` |

The vast majority of place=city/town/village/hamlet nodes were imported from the Populated Place class in GNIS. The coordinates are located at a “downtown” rather than the centroid of any administrative area, and the Feature ID in gnis:feature_id=* specifically represents a population center. (The Civil class in GNIS represents administrative areas, but we never imported those features.) To the extent that these population centers correspond well to TIGER-imported administrative areas in name and function, they are label members of administrative boundary relations.

If you were to map a place point for an administrative area, it would be located at the area’s centroid, regardless of where people live or work, and it would be the label member of an administrative boundary relation. Most states and counties have such place points. This is also an option for the cities and towns that evenly partition New England counties, which would be tagged as place=municipality, but to date only two have been mapped as points: Clarksburg, Massachusetts, and Colchester, Vermont. (place=municipality is the standard tagging for towns in New York and Wisconsin.)

Unfortunately, the TIGER import sometimes incorrectly conflated some places with larger, identically named places elsewhere in the state, inflating both the population and place classification. Compounding the problem, subsequent mass edits conflated many more populated place points with administrative areas by the same name, even though only a small portion of the administrative area is built up.

In the Midwest, many sleepy unincorporated communities wound up with population=*, wikipedia=*, and wikidata=* tags corresponding to the entire surrounding township, even though the point remained at a tiny population center in one corner of the township. Even today, in New England, this QLever query finds 413 place=city/town/village/hamlet/neighbourhood points that are linked to cities or towns in Wikidata and generally have the populations of those entire cities or towns, as if the feature were really tagged place=municipality. These points should ideally match CDPs that the Census Bureau has created to approximate the town’s central village.

In some cases, this overconflation may be the best we can do. In Maine, the Census Bureau abolishes any CDPs within a town when it reorganizes as a city, even if nothing has changed about the city’s population distribution. But ignoring that problem, this QLever query still found 48 place=town/village/hamlet/neighbourhood points that were linked to towns in Wikidata despite there being a CDP by the same name within the town.

I’ve gone through these results, replacing each point’s population=*, wikidata=*, and wikipedia=* tags with those of the CDP; changing its role in any administrative boundary relation to admin_centre; and adding it to a boundary=census relation, if available, as a label member. In some cases, the population rose as I simply updated the figure from 2006 estimates to 2020 figures, but in most cases, the population fell – in some cases, to as little as two percent of the previous figure.
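The retagging steps above can be sketched roughly as follows. This is only an illustration on plain dictionaries, not the tooling actually used: the function names, the member layout, the node ID, and the `Q_TOWN`/`Q_CDP` placeholders are all hypothetical.

```python
from copy import deepcopy

# Tags that tend to describe the whole municipality and should come from the CDP.
CDP_TAGS = ("population", "wikidata", "wikipedia")


def retag_place_node(node_tags, cdp_tags):
    """Return a copy of node_tags with population/wikidata/wikipedia taken from the CDP.

    Keys the CDP lacks are removed rather than left pointing at the town.
    """
    updated = dict(node_tags)
    for key in CDP_TAGS:
        if key in cdp_tags:
            updated[key] = cdp_tags[key]
        else:
            updated.pop(key, None)
    return updated


def set_member_role(relation, node_id, role):
    """Return a copy of the relation with the node's role changed, or the node appended."""
    rel = deepcopy(relation)
    for member in rel["members"]:
        if member["type"] == "node" and member["ref"] == node_id:
            member["role"] = role
            return rel
    rel["members"].append({"type": "node", "ref": node_id, "role": role})
    return rel


# Hypothetical example data; none of these values come from OSM or Wikidata.
node = {"place": "village", "name": "Exampleville",
        "population": "5000", "wikidata": "Q_TOWN", "wikipedia": "en:Exampleville"}
cdp = {"population": "100", "wikidata": "Q_CDP"}
admin = {"tags": {"boundary": "administrative"},
         "members": [{"type": "node", "ref": 1, "role": "label"}]}
census = {"tags": {"boundary": "census"}, "members": []}

node = retag_place_node(node, cdp)                # population and wikidata now follow the CDP
admin = set_member_role(admin, 1, "admin_centre")  # no longer the label of the town
census = set_member_role(census, 1, "label")       # label of the CDP boundary instead
```

Note that the place=* value is deliberately left alone, matching the approach of fixing the figures first and revisiting classifications afterwards.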

I haven’t changed any place=* classifications, so some of them stick out more when you compare their population figures to a global distribution of place classifications. For example, at 17,000 residents, Brunswick, Maine, would be a median town globally but is tagged as a city. Hillsboro, New Hampshire, with a population of 2,200, is tagged as a town but is about one standard deviation away from both town and village. The 23 residents of Bolton, Vermont, can hardly claim to be a hamlet, let alone a village as currently tagged.
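To make the "standard deviations away" comparison concrete, here is a minimal sketch. It assumes each place class is summarized by a median and a spread on a log10 population scale, which may not be how the global analysis was actually done, and the numbers in `CLASS_STATS` are placeholders for illustration rather than figures from that analysis.

```python
import math

# Hypothetical per-class summaries: (log10 of median population, std dev in log10 units).
# Placeholders only, NOT values taken from the global analysis.
CLASS_STATS = {
    "city": (5.0, 0.5),
    "town": (4.0, 0.5),
    "village": (3.0, 0.5),
    "hamlet": (2.0, 0.5),
}


def deviations(population, stats=CLASS_STATS):
    """Return how many standard deviations log10(population) lies from each class median."""
    log_pop = math.log10(population)
    return {cls: (log_pop - center) / spread for cls, (center, spread) in stats.items()}


def nearest_class(population, stats=CLASS_STATS):
    """Return the class whose median is closest in standard-deviation terms."""
    devs = deviations(population, stats)
    return min(devs, key=lambda cls: abs(devs[cls]))
```

With real per-class statistics substituted in, calling `deviations(2200)` would show at a glance whether a place sits between two classes, as described for Hillsboro above.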

With these more accurate population figures, we have a more sound basis for evaluating the current classifications and any replacement criteria.

Adam_Franco · May 3, 2024, 12:23pm

For those that missed this, I believe the standard deviation Minh is referring to is from Brian’s global analysis of the Distribution of primary populated place values.

Adam_Franco · May 3, 2024, 12:55pm

Thank you Minh for this clear disambiguation of these population figures between CDPs (which generally align with more densely populated places) and their enclosing Towns.

Unfortunately, many New England Towns have densely populated places in them that the Census Bureau has not defined CDPs for. In these cases it has been common practice in the past to attribute the Town’s population and wikidata id to the place node representing the dense settlement place rather than the administrative boundary.

For reference, here is the Census map showing CDPs in Vermont; note the numerous Towns without CDPs.

Two examples of this problem are Moretown (Town, settlement) and Middlesex (Town, settlement) Vermont. Both of these are tiny settlements on the border of hamlet and village surrounded by the large rural municipal area that likely has a greater population than the settlement itself.

In these cases I think that for internal consistency we should move the population and Wikidata tags to the administrative boundary relation as the settlements do not have a known-to-the-Census population and are not the municipality. An additional task would be to create new Wikidata items for the unincorporated settlements themselves if that was desired.

Minh_Nguyen · May 3, 2024, 2:44pm

Agreed. For example, Madawaska, Caribou, Maine, had gotten conflated with a town by the same name elsewhere in Aroostook County. Since Caribou is a city, there’s no CDP for this Madawaska and therefore no convenient population figure to tag. Caribou’s central village similarly has no specific data. I stripped the population tag off Madawaska but haven’t stripped them off these cities’ central villages yet, since I was focusing on more incontrovertible changes.

In some cases, we could figure out which census blocks correspond to which places within the city. If the Urban Area fits within the city limits, it probably approximates what the place point represents and we can use its population directly. If the city organized since the 2000 census, we could track down the geometry of the former CDPs and correlate them to current census blocks, but that would be more time-consuming.

Wikidata generally doesn’t have preexisting items about places like Madawaska yet, because Wikipedia never considered them notable enough for an article that Wikidata could import. So I created an item for it based on GNIS. We could automate this item creation using a GNIS dump and Open Refine.

Adam_Franco · May 3, 2024, 4:11pm

This might be a little tricky, but I found P.L. 94-171 County Block Map (2020 Census) which provides detailed maps of census blocks. It looks like these blocks generally are bounded by through-roads, rivers, and rail lines.

Here is a screenshot from Washington County Vermont with the Moretown village area roughly circled:

If a data set with these census blocks and their populations was available in a format that was browseable in QGIS, then it may be fairly straightforward to make up our own CDP-equivalents to get estimates of place populations. Definitely not perfect, but probably more realistic than just taking the entire municipality’s population. Please post a link if anyone has found such a dataset!
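As a rough sketch of that idea, assuming the block centroids and populations have already been loaded from some source (the field names and sample values below are hypothetical), one could sum the blocks whose centroid falls inside a hand-drawn outline. Coordinates are treated as planar, which should be tolerable for an area the size of a village.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: True if point (x, y) lies inside polygon, a list of (x, y) vertices."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's y value can cross the ray going right.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def estimate_population(blocks, outline):
    """Sum the population of blocks whose centroid falls inside the outline."""
    return sum(b["population"] for b in blocks if point_in_polygon(b["centroid"], outline))


# Hypothetical outline and blocks, for illustration only.
outline = [(0, 0), (4, 0), (4, 4), (0, 4)]
blocks = [
    {"centroid": (1, 1), "population": 40},
    {"centroid": (3, 2), "population": 25},
    {"centroid": (6, 1), "population": 300},  # outside the outline, so not counted
]
print(estimate_population(blocks, outline))  # 65
```

Assigning whole blocks by centroid is a simplification: a block that straddles the outline is counted entirely or not at all, so the result is an estimate at best.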

ezekielf · May 3, 2024, 4:29pm

I did this for Colchester village within the Town of Colchester, VT (this place is like Barnstable, MA), but at the census tract level. This is not ideal as the tract is much larger than the actual village area so it’s probably a significant over-estimate. I’d be perfectly happy somehow indicating that a place node’s population is unknown. I’m mostly concerned with clearly distinguishing the difference between a hamlet/village and a surrounding municipality that shares the same name. Separate wikidata items certainly seem a reasonable way to handle this.

A place=municipality node at the admin boundary centroid with the label role also seems like a reasonable way to head off future mappers adding or upgrading nodes to place=town just because we call municipalities Towns here. A label node for each municipality shouldn’t be strictly necessary since data consumers can calculate a centroid from the boundary polygon, but with counties, states, and nations all having these centroid place nodes it seems ok for municipal boundaries to have them as well.

Minh_Nguyen · May 3, 2024, 4:31pm

Just to be clear, while we can go down the rabbit hole of obtaining super-precise population figures, I’m only updating these population tags so that we can get people out of the mindset of oversimplifying the whole town as a monolith. Ultimately, if we rely on Urban Areas to classify places, then the places that lack CDPs will naturally be classified as village or less, so the precise population becomes less relevant to rendering or geocoding.