Framework for aligning New England place nodes to census categories

ZeLonewolf · April 23, 2024, 9:39pm

Hi folks,

I would like to start laying on the table some concrete criteria for classifying place names in New England (and potentially with an eye towards elsewhere if folks feel it applies). This follows other discussion threads on the topic of place classification.

I’d like folks to poke and prod on this scheme and see if we can get something that works. Some of these will be black and white while some will be necessarily subjective.

Proposed:

place=city:

Any location listed first in the name of a Metropolitan Statistical Area

place=town:

Any location listed 2nd or later in the name of a Metropolitan Statistical Area
Any location listed in the name of a Micropolitan Statistical Area
Any location that is otherwise locally significant. Examples:
a. is the state capital (really couldn’t let Vermont’s capital be a village)
b. is a “control city” on highway signs (e.g. White River Junction)
c. regularly receives traffic jams with traffic destined for that location (e.g. Newport, RI)

place=suburb

Any location clearly part of the conurbation of a place=city and not listed in the name of its Metropolitan Statistical Area, even if it corresponds to an administrative entity in its own right (for example, Brookline, MA or Cranston, RI). Locations that are in the conurbation of a place=town should be tagged village or hamlet based on relative size.

place=village
place=hamlet
place=isolated_dwelling

All lower categories subjectively classified. If there are classification strategies, they are defined and applied locally.

Below are all Metro and Micropolitan Statistical Areas in New England. Let’s try this on for size and see how people feel that things fit.

Metropolitan Statistical Areas (MSAs)

Boston-Cambridge-Newton, MA-NH
Bridgeport-Stamford-Norwalk, CT
Burlington-South Burlington, VT
Hartford-West Hartford-East Hartford, CT
New Haven-Milford, CT
Norwich-New London, CT
Pittsfield, MA
Providence-Warwick, RI-MA
Springfield, MA
Worcester, MA-CT
Portland-South Portland, ME
Manchester-Nashua, NH
Bangor, ME
Lewiston-Auburn, ME

Micropolitan Statistical Areas

Claremont-Lebanon, NH-VT
Concord, NH
Keene, NH
Laconia, NH
Bennington, VT
Rutland, VT
Barre, VT
Torrington, CT
Rockingham-Strafford, NH
Berlin, NH-VT
Vineyard Haven, MA
Nantucket, MA
Greenfield Town, MA
Athol, MA
North Adams, MA
Augusta-Waterville, ME
Bangor, ME
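
As a rough illustration of how the proposed rules could be applied mechanically to the titles listed above, here is a minimal Python sketch. The function names `title_places` and `classify` are hypothetical, only a few titles are copied in, and the parsing assumes that hyphens in a title always separate place names (true of the lists above, but not of every title nationwide). It also matches on bare names, so it would confuse same-named places in different states. The "locally significant" rule is modeled as a hand-maintained set, since the proposal leaves it subjective.

```python
from typing import Iterable, List, Optional

# A few titles copied from the lists above, for illustration only.
METRO_TITLES = [
    "Boston-Cambridge-Newton, MA-NH",
    "Burlington-South Burlington, VT",
    "Manchester-Nashua, NH",
    "Pittsfield, MA",
]
MICRO_TITLES = [
    "Claremont-Lebanon, NH-VT",
    "Concord, NH",
    "Barre, VT",
]


def title_places(title: str) -> List[str]:
    """Split 'A-B-C, ST-ST' into ['A', 'B', 'C'].

    Assumption: hyphens only ever separate place names.
    """
    names, _states = title.rsplit(", ", 1)
    return names.split("-")


def classify(
    name: str,
    metro_titles: Iterable[str],
    micro_titles: Iterable[str],
    locally_significant: Iterable[str] = (),
) -> Optional[str]:
    """Return 'city', 'town', or None when the rules do not decide."""
    metro_places = [title_places(t) for t in metro_titles]
    # Rule 1: listed first in a metropolitan title -> city.
    if any(name == places[0] for places in metro_places):
        return "city"
    # Rule 2: listed second or later in a metropolitan title -> town.
    if any(name in places[1:] for places in metro_places):
        return "town"
    # Rule 3: listed anywhere in a micropolitan title -> town.
    if any(name in title_places(t) for t in micro_titles):
        return "town"
    # Rule 4: otherwise locally significant (capital, control city, ...).
    if name in set(locally_significant):
        return "town"
    # suburb / village / hamlet are left to local judgment.
    return None


for place in ["Boston", "Nashua", "Concord", "Brookline"]:
    print(place, classify(place, METRO_TITLES, MICRO_TITLES))
```

Returning `None` for everything below town keeps the sketch honest about the parts of the proposal that are explicitly subjective.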

AntiCompositeNumber · April 23, 2024, 10:25pm

Nashua, NH is the second largest city in New Hampshire, so demoting it to town because it’s listed second doesn’t make sense. Rockingham and Strafford are the names of the two Seacoast NH counties, currently containing multiple cities and towns.

I think this classification is too strict for NH at least, and probably the rest of NNE as well. For NH I would generally include at least Manchester, Nashua, Concord, Dover, and Rochester based on population alone, excluding Derry because the size of their CDP is too small. I like that Portsmouth, Keene, and Lebanon are currently mapped as cities because of their regional significance. Laconia and Claremont could be argued down.

ezekielf · April 24, 2024, 3:50am

Thanks for laying out a first draft, Brian. In addition to Metropolitan and Micropolitan Statistical Areas, we should probably also consider New England City and Town Areas (NECTA). They seem to break things down in more detail.

I think limiting place=city nodes to only the first named city of a Metropolitan Statistical Area would leave us with a very sparse list. How about something a bit broader like this:

A named settlement will qualify for place=city
IF  any of the following use its name:
    Metropolitan Statistical Area
    Metropolitan Division
    Metropolitan NECTA
    Metropolitan NECTA Division
    Micropolitan Statistical Area
    Micropolitan NECTA
AND any of the following use its name: 
    Census Designated Place >= 30,000 population
    Urban Area >= 30,000 population
    Municipality >= 30,000 population

The addition of micropolitan areas and metropolitan divisions would include more places, while the population threshold would still exclude those of lower significance. Under this framework, in New Hampshire, Concord and Lebanon would qualify, while Keene and Laconia would not. In Vermont, none of our micro area core cities would qualify due to low population. Burlington would qualify based on its urban area population. However, South Burlington’s lower municipal population would exclude it. These parameters can be adjusted up or down of course, but I think some sort of multifaceted criteria like this is needed to strike a good balance.
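
The two-part test above (a statistical area uses the name, AND some population figure reaches the threshold) could be sketched roughly as follows. This is only an illustration: the `Settlement` record and `qualifies_for_city` function are hypothetical, and the example figures are made-up placeholders, not census data.

```python
from dataclasses import dataclass
from typing import Optional

POPULATION_THRESHOLD = 30_000  # adjustable, per the proposal above


@dataclass
class Settlement:
    name: str
    # True if any metro/micro statistical area, division, or NECTA uses the name.
    names_statistical_area: bool
    # Population of the same-named CDP, Urban Area, and municipality, if any.
    cdp_population: Optional[int] = None
    urban_area_population: Optional[int] = None
    municipal_population: Optional[int] = None


def qualifies_for_city(s: Settlement, threshold: int = POPULATION_THRESHOLD) -> bool:
    """Both halves of the test must hold: a named area AND a large enough population."""
    populations = (s.cdp_population, s.urban_area_population, s.municipal_population)
    meets_population = any(p is not None and p >= threshold for p in populations)
    return s.names_statistical_area and meets_population


# Placeholder figures, chosen only to exercise each branch.
examples = [
    Settlement("Example A", True, urban_area_population=45_000, municipal_population=12_000),
    Settlement("Example B", True, municipal_population=20_000),
    Settlement("Example C", False, municipal_population=80_000),
]
for s in examples:
    print(s.name, qualifies_for_city(s))
```

Keeping the threshold as a parameter makes it easy to try other cutoffs and see which places move in or out of the city list.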

Minh_Nguyen · April 24, 2024, 3:50am

If there’s a view toward eventually extending this approach beyond New England, then I would definitely agree that restricting city to the first titled city might be too strict in many cases. Some MSAs are essentially tied between two or three cities. Oakland already lost all its major league teams, and its city-owned airport is now ashamed to call it home; demoting the city to a town in the shadow of San Francisco would be quite the salt in the wound.

I would support limiting city to the MSA’s titled cities, as opposed to all the MSA’s principal cities, especially if there are multiple metropolitan divisions, as in the case of San Francisco–Oakland–Berkeley. Per the latest OMB bulletin, that would allow both Manchester and Nashua to be city, potentially. But we probably need some other criterion, maybe involving central counties or urban areas, if we want to declutter the map beyond that.

stevea · April 25, 2024, 8:25am

I, too, thank Brian for throwing a serious-sized and -scoped dart at the dartboard.

One thing I want to urge if / as we go down any path that includes Census Bureau data is that the Census Bureau is an agency in the USA’s Department of Commerce (so, for the benefit of commercial activity, not the actual, legal, factual basis for the communities we are discussing here). This starts to go down what some might consider a slippery slope of confusing a legal entity (an incorporated town, an unincorporated gore…) with a purely statistical “blob,” with no basis in reality (but rather statistics): one whose definition is amorphous, ephemeral and therefore essentially stale very quickly after it is defined.

We can do this, but it must always be done (and “slapped with stickers” everywhere if we do) saying “these are statistical entities, not actual, legal entities.” Census data are OK to enter OSM, but ONLY (in my opinion) when they are clearly labelled as such. We already do this, and thankfully, for example with distinctions between boundary=census and boundary=administrative: these really are different things, and we importantly agree to denote them as distinct. Actually say “CDP” somewhere if true, put “Census Bureau” or TIGER or whatever in a source tag: make it clear.

So, using “census” anything is OK, but only when clearly identified as such. With place=*, we do already do this, but I cannot stress strongly enough that we must keep “census-like” data as distinctly marked (tagged, identifiable as such) in OSM going forward.

TheSwavu · April 25, 2024, 8:34am

Correct me if I am wrong, but place=* is not a legal classification.

stevea · April 25, 2024, 8:42am

It’s not. But when you say place=city because you have the exact boundaries of the incorporated limits, that simply isn’t the same thing as place=town where the town might “simply” be a CDP (Census Designated Place, a statistical entity, not anything legal, but “a conurbation of a not-so-certain size”).

In California, where “town” and “city” are “synonymous by law,” this doesn’t make sense. But we have 50 states here where there are not only legal distinctions (which place=* isn’t, I agree), but because of the ease with which, say, “town” means so many (slightly different) things in so many different places, OSM should be very, very careful to enforce strict denotation that “this is a statistical entity.” We can do that, I’m saying “let’s be very, very careful if we do this using Census Bureau data.” It will continue to enforce clarity that is sorely needed on this topic.

I thank you for making this distinction, as it is important. We don’t want to confuse place=* as either always or never meaning “legal classification is what is meant here” but it can and does mean that sometimes. Let’s say so when, where and how, please, and now OSM is doing our data a solid, rather than a potentially confusing disservice.

stevea · April 25, 2024, 8:58am

For nodes tagged place=* that are part of a boundary=administrative (for example, “city”), we already have conventions that make this clear (like also using a label tag). As I think about this topic being about nodes (I suppose only) and the emphasis is on “census categories,” I’m somewhat thinking out loud as I type here. Please forgive me if my intersection of the data types with the wide potential of tags doesn’t result in perfection, but I continue to believe that the distinction of Census data should be clearly indicated as such in OSM. This includes our many-flavored Census Bureau entities (MSAs, PSAs, µPSAs…there are a lot of these and my mind fogs over as I likely get some wrong…) which really seem to be a stretch to enter into OSM. It may very well be that the Census Bureau has already done some of the hard work we seem to be wanting to invent for ourselves here, so it might make sense to leverage it. But not only must I (and our wider community, really) be convinced of the value of this; the distinction that these are Census Bureau-derived data really should be kept in OSM if we choose to use them.

Minh_Nguyen · April 25, 2024, 2:52pm

Yes, to be clear, this topic is only about a framework for choosing a value among place=city/town/village/hamlet/suburb on points representing populated places, not administrative boundary relations. There is no push to map statistical area boundaries per se. Classifying populated places is inherently related to demography, which is what we turn to the Census Bureau for.

The Core-Based Statistical Areas we’re discussing (metropolitan and micropolitan areas) are defined by the White House Office of Management and Budget based on Urban Areas defined by the Census Bureau. CBSAs are used by multiple agencies, including the Census Bureau and the Bureau of Labor Statistics. CBSAs are defined solely for statistical use. In previous discussions, we’ve only considered CBSAs’ names and principal cities, not their boundaries.

CBSAs are notoriously imprecise in New England, hence the creation of NECTAs based on town boundaries. Unfortunately, NECTAs were deprecated last year and will no longer receive updates going forward. Apparently the reason was that tabulating labor data on both CBSAs and NECTAs would have run the risk of deanonymizing individual employers or employees, so the BLS never released parallel datasets; thus the Census Bureau could no longer justify the cost of maintaining NECTAs. I don’t expect the NECTAs to grow outdated very fast, but in the long term, any reliance on them will increasingly seem arbitrary.

UAs are used in a broader range of applications. For example, the Department of Transportation uses them to determine whether an area is required to provide fixed public transit service or qualifies for rural road funding. State DOTs use UAs as one of the factors for functional highway classification. UAs have some considerations for cartography. For example, originally, urban areas were artificially kept contiguous by including sparsely populated areas, but this changed in 2020, when it was determined that newer GIS software could suitably label discontiguous areas.

miela404 · April 25, 2024, 2:52pm

Both Metropolitan and Micropolitan areas should be classified as cities, with anything else being a town, at least in New England.

ezekielf · April 25, 2024, 4:55pm

The proposal here is not to add census designated statistical entities into OSM. It is only to see if we can glean some information from these statistical entities that will help us decide whether to class a given place node as high significance (city) or medium significance (town).

This is correct. When I tell people I live in Burlington, VT I am not referring to the legally incorporated municipality of Burlington because I don’t live within that boundary. Instead I’m referring to the cultural and economic conurbation which is centered around Burlington, VT and is decently approximated by the Burlington, VT Urban Area. This general cultural and economic sense of place is what the Burlington, VT place=city node represents. Meanwhile the Burlington, VT boundary relation represents the municipality. Two distinct but related concepts.

Minh_Nguyen · April 26, 2024, 2:22am

Not sure if you realize this, but you linked to a map of NECTAs, which are distinct from metropolitan and micropolitan statistical areas. I agree that NECTAs are a much more accurate representation of New England’s places than MSAs and μSAs. Unfortunately, NECTAs have been officially deprecated since last year. If we use them at all, it would be based on how well we feel they continue to reflect reality, not based on their official authoritativeness.

ElliottPlack · April 26, 2024, 2:41am

I’d like to point out that on the NECTA map, Barre, VT would trump the state capital Montpelier, which is within the Barre area. I actually lived in this area for about two years in the early 2000s. At the time, Barre did feel more like the economic center of the small region, yet the state capital status of Montpelier cannot be overlooked.

uk4sm · April 26, 2024, 2:41am

“To qualify as an urban area, the territory identified according to criteria must encompass at least 2,000 housing units or have a population of at least 5,000.”

This is also from the US Census, linked below. It appears that this is what @edops did to towns in Maine. I say well done, 5,000 and up (besides metros and micros) gives a good idea where “towns” are in rural areas like Maine, and is a good distinction from smaller villages with town names. I agree, however, that @edops was off with only using metro areas for cities, like I and others have said before, simply because it would have the tiny capitals, like Augusta and Montpelier, appear as towns, since they are micropolitans.

I think that cities are areas with micros and metros, defined by the US census. If a metro has two cities with populations of over 50,000 (number from the census) then both should be marked as cities. If not, like Burlington and South Burlington, only the larger, more prominent city is marked.

Also defined by the US census, settlements with a population above 5,000 people (classified by the Census as being “urban” as opposed to rural) will receive the town classification, with the obvious exception of the micro and metro areas previously mentioned. Anything below 5,000 people should, by the census, be regarded as rural, and be tagged as villages.
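
Read literally, the scheme described above reduces to two thresholds plus a "core city" flag. Here is a minimal sketch under that reading. The function name and parameters are hypothetical, the treatment of a population of exactly 5,000 as a town is an assumption (the text only says "above" and "below"), and the example populations are placeholders rather than census figures.

```python
CITY_POPULATION = 50_000  # second city in a metro must exceed this
TOWN_POPULATION = 5_000   # urban/rural cutoff cited above


def classify_place(population: int, is_core_city: bool = False, in_metro: bool = False) -> str:
    """Classify a settlement under the 50,000 / 5,000 scheme.

    is_core_city: the larger, more prominent city of a metro or micro area.
    in_metro: the settlement lies within a metropolitan area.
    """
    # The core city of any metro or micro area is a city regardless of size.
    if is_core_city:
        return "city"
    # Another city in the same metro counts only if it is over 50,000.
    if in_metro and population > CITY_POPULATION:
        return "city"
    # Assumption: exactly 5,000 is treated as a town.
    if population >= TOWN_POPULATION:
        return "town"
    return "village"


# Placeholder populations, chosen only to exercise each branch.
print(classify_place(18_000, is_core_city=True))  # small micro core
print(classify_place(110_000, in_metro=True))     # large second city
print(classify_place(20_000, in_metro=True))      # smaller second city
print(classify_place(3_000))                      # rural settlement
```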

When do villages become hamlets? That, I am not sure: perhaps below 1,000? Maybe below 100? Maybe on a case-by-case basis? Regardless, cities and towns are more pressing for the time being, and the information that has been proposed, although rejected by some already, has no better alternative and, so far, no other proposal to rival it. I think it is a strong standard that makes sense. It is third party, coming from the Census, meaning it will be easy to enforce. Plus, the 50,000 and 5,000 is somewhat fitting, and it also allows small cities below 50,000 to be left as cities, since they have metro and micro areas. If a place doesn’t even have a micropolitan area that is classified by the US government, it should not be labeled as a city.

I would like to see someone come up with something better, because I can’t think of anything better than using a system laid out by the US government, that can actually be enforced and cited, given our current situation, where small towns of less than 7,000 people, with no notable surrounding populated areas, are being marked as cities.

Urban and Rural.

uk4sm · April 26, 2024, 2:50am

In Maine, for example, this would have

Portland, Bangor, Lewiston, Sanford, Brunswick, Augusta, and Waterville marked as cities.

This is a solid, sensible list, that is already similar to what is currently displayed, with a few key changes needed.

As of now, all of these areas are marked that way, except for Sanford for some reason, despite it having its own micropolitan area. I think whoever made those changes forgot about it. Biddeford is a part of the Portland metro, and since it doesn’t have a large enough population to be separated from the Portland metro, it too, like South Portland, Westbrook, and other suburbs of the greater city, should be marked as a town.

Also, for some reason Rockland and Presque Isle have been marked as cities, and that doesn’t make sense at all by any agreed upon metric. I think @edops was completely correct in their complaints about it. Neither have metro or micro areas, neither of them are even inside other metro or micro areas, and neither have significant populations themselves.

uk4sm · April 26, 2024, 2:58am

To clear any confusion, smaller cities like South Portland and Auburn are secondary to their respective cities, Portland and Lewiston, and are in the same metro area. Neither South Portland nor Auburn has a population large enough (over 50,000 according to the census) to be considered its own separate city, in the classification system I’m proposing. They should be marked as towns, as they already are. However, if two cities in the same metro, like Boston and Cambridge, have populations over 50,000 (both of them do, obviously) then they should both be marked as separate cities. Over 100,000 people live in Cambridge; it is just too large to be marked as a town. 100,000 is the cap used by the OSM display system to note larger cities on the map. 50,000 is a good number: very few cities in New England are that large, and it’s a number that comes from the US census classification for a metro area. Also, as I said before, the other US census classification for urban areas that should be used for towns is 5,000, and there’s something fitting about 50,000 and 5,000 being used as benchmarks, with the other metros and micros below 50,000, such as Augusta and Bangor, still remaining as cities, as they should.

Also, to note what @ElliottPlack commented on, there simply needs to be a clearly defined exception for state capitals, to compensate for the Montpelier/Barre situation. Montpelier is a rare exception. It is tiny, and would be an insignificant small town in Vermont if not for its status as the state capital. It is so small, that its micropolitan area is not even called the Barre-Montpelier Micropolitan area (it is simply the Barre Micropolitan area). An exception should be made because it’s the capital.

It seems that somewhere in the OSM display system, capital cities already get some sort of precedence. If you zoom out, you’ll notice that the Montpelier label covers up the Barre label. Normally, population determines this, and yet even though Barre is bigger, Montpelier covers it up when you zoom out further. Since the exception for state capitals already exists in the OSM system somewhere, there is no reason not to keep it.

Minh_Nguyen · April 26, 2024, 3:41am

For what it’s worth, there isn’t anything particularly special about the 5,000-resident threshold on its own. This is the official criterion you’re citing from federal regulation:

An area will qualify as urban if it contains at least 2,000 housing units or has a population of at least 5,000.

The term “area” in this criterion refers to an earlier step that first identifies an “urban area core”:

Aggregation of census blocks with a housing unit density of 425 [per square mile]. Use of land cover data to identify territory with a high degree of imperviousness.

The early steps are based on housing unit density, which means they technically identify residential areas, not necessarily populated places in an all-encompassing sense. I’m sure some New England cities can be characterized as sleepy bedroom communities, but most are defined also by commercial activity. A later step tries to correct for this bias by bringing in employment statistics:

Inclusion of groups of census blocks with at least 1,000 jobs (per Longitudinal Employer-Household Dynamics Origin-Destination Employment Statistics (LODES) data) and that are within 0.5 miles of an urban area.

and by splitting agglomerations based on commuting patterns and feedback from local officials:

Potential splits and merges are identified using Longitudinal Employer-Household Dynamics worker flow data between 2010 Census urban area pairs. If necessary, split location is guided by commuter-based communities.

Even so, the Census Bureau recognizes that the resulting Urban Areas still don’t line up perfectly with traditional notions of populated places. So there’s a whole section of the code for giving the UA a recognizable title based on administrative area names or CDPs. A UA’s primary name could even potentially be a place with fewer than 2,500 residents.

I’ve glossed over many details that you can read in the regulation and other documents. The end result is that Urban Areas are far more nuanced than equating place=town to any named municipality over 5,000 people. If we really have to stick to a simple numerical population threshold for New England, maybe 5,000 is a better cutoff than 10,000, but ultimately it’s just an arbitrarily chosen proxy for something. Maybe you’re right that it’s a good fit for what we’re trying to do with place classification. (What are we trying to do with place classification?) If so, I wouldn’t take this number out of context by claiming that it has anything to do with Urban Areas.
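
To make the difference concrete, here is a small sketch contrasting the qualification criterion quoted earlier (housing units OR population) with a population-only cutoff applied to a municipality. It models only that final qualification step, not the density, land cover, or employment steps that delineate the area in the first place. Both function names are hypothetical and the figures are placeholders, not census data.

```python
def qualifies_as_urban(housing_units: int, population: int) -> bool:
    """Final qualification step as quoted: either condition is sufficient."""
    return housing_units >= 2_000 or population >= 5_000


def population_only_test(municipal_population: int, cutoff: int = 5_000) -> bool:
    """The simplified proxy: a bare population threshold on a municipality."""
    return municipal_population >= cutoff


# Placeholder figures: an area with many housing units but few
# year-round residents qualifies as urban yet fails the simple cutoff.
housing_units, population = 2_100, 4_200
print(qualifies_as_urban(housing_units, population))  # True
print(population_only_test(population))               # False
```

The two tests also apply to different geographies (a delineated area versus a municipality), so even identical thresholds would not select the same set of places.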

By “OSM display system”, you’re probably referring to OpenStreetMap Carto, one of many OSM-based map stylesheets. Carto does give capitals a boost. Other stylesheets may or may not make a similar exception. After all, maps have always been very diverse in terms of how they present place labels.

uk4sm · April 26, 2024, 3:50am

Actually I was referring to the regular OSM map, not OSM carto. Maybe it’s different for others, but when I zoom out it is Montpelier that supersedes Barre, not the other way around as you’d expect.

And fair points on the 5,000 classification for urban areas. Towns might be trickier, but I still think 5,000 is a start, even if it’s not based on US Census classification as much as I thought it was. I still stand by my points about cities. Right now, the Micropolitan and Metropolitan census areas are the best we’ve got that isn’t arbitrary. We need something that is clearly defined, preferably by something as official and as neutral as the census, so that it can actually be enforced in the future.

Minh_Nguyen · April 26, 2024, 3:58am

Yes, that’s OSM Carto under the hood, unless you’ve used the Layers sidebar to switch to a different featured stylesheet. If you take a look around this forum, you’ll get a sense of how polarizing Carto has been lately within the community (not specifically about place labels though). One of the most cited principles in OSM is against tagging for the renderer, essentially to discourage making choices based solely on this stylesheet at the expense of other data consumers. Still, it is an influential stylesheet, so you aren’t wrong to want to use it as a point of reference.

This need isn’t unique to New England, which is why other parts of the country have occasionally popped in and out of these conversations. There’s some desire to align on a consistent framework across regions, even if the specifics might differ due to local circumstances (such as the irrelevance of counties in most of New England).

ConradWard · April 26, 2024, 11:22am

Just reading through all this now - I apologize if it looks like I was trying to fracture the discussion with my discussion about Maine.
Lots of good points made. I don’t have much specific to add at the moment other than I’d say anytime you try to create a formula for such a large diverse place, you’re going to have a lot of exceptions to the rules.