Framework for aligning New England place nodes to census categories

I think the path is to establish a formulaic approach that’s 80-90% right and then explore the outliers that we think are exceptions. Then we ask: WHY are these exceptions? And then that answer gets worked back into the criteria. Some degree of subjectivity will be unavoidable but at least we should be able to come up with what the subjective criteria are.


Just to bring up my personal favorite exception: the Town of Barnstable.

It looks like following NECTA, Barnstable would warrant a place=city node. The most logical place to put that would probably be on top of Hyannis – the most commercially significant and populous of the Town of Barnstable’s villages. However, if I put Barnstable in my GPS and ended up in Hyannis, I’d think someone had made a mistake, as “Barnstable” typically refers to the Village of Barnstable, a smaller population center north of Route 6.

When previously discussed on the OSMUS slack, the consensus reached was that the Town of Barnstable didn’t warrant a place node at all, as it is basically just an administrative grouping of villages that share a school system/police department/etc, and representing Hyannis as “Downtown Barnstable” didn’t make a lot of sense to anyone.

I can’t think of other cases like this in New England (though I was also surprised to learn that Hyannis was just a village in the Town of Barnstable when I dove into this rabbit hole).


NECTAs were named after cities and towns (the C and T in that acronym), specifically the most populous city or town in the NECTA, which might not contain the largest settlement. The Barnstable Town NECTA was named after the Town of Barnstable, not the Village of Barnstable. In order to determine which place(s) to promote as the place=town or place=city based on the NECTA, you’d have to find out which Urban Area(s) formed the core area(s) that gave rise to the NECTA – probably Hyannis. So the Barnstable Town NECTA wouldn’t boost anything named Barnstable.

Hyannis is probably the best example, but you can be forgiven for not knowing the name of the municipality in a few other places. White River Junction, Vermont is another one (actually in the town of Hartford). Stratton, ME, located in Eustis, and Terryville, CT, located in Plymouth, are other examples I’ve encountered.
Hope I’m not getting too off topic.

Given this description it’s quite odd that the Census represents Barnstable Town only as a single incorporated place rather than broken up into separate CDPs as all the neighboring towns are.

From TIGERWeb. Black is for municipal boundary, orange is for CDP, and red is for incorporated place.

Also the census designated UA in the area is titled Barnstable Town yet it sprawls far beyond. This urban area seems somewhat unique in being a distributed collection of towns and villages. Hyannis may be centrally located and the largest of them, but it hardly seems to be an urban core in the same way nearby New Bedford or Providence is.

I think NECTA is a poor choice for algorithmically determining place=city for several reasons, including those noted…

I’ve been working on a spreadsheet analysis to come up with criteria for place=city on the basis of MSAs. I used the following factors to assign a population center place=city:

  1. If the city is the first listed in an MSA.
  2. If a city is within an MSA (either listed in the name or geographically located within), and it’s within a threshold of size to the principal city.
  3. If a city was listed first in a μSA and was above a certain size threshold with respect to its state.
  4. I adjusted the population of each city upwards based on its state’s population density and land area. A city in a sparsely populated state was weighted higher than those in more dense states.
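For anyone who wants to poke at the logic outside a spreadsheet, here is a minimal sketch of how the four factors might combine. It is not the actual spreadsheet: every constant, the `City` fields, and the square-root density boost are hypothetical placeholders, and the boost uses state density only (land area is left out for brevity). Factor 2 is compared on raw populations, which is also an assumption.

```python
from dataclasses import dataclass

# All constants are hypothetical placeholders, not the spreadsheet's values.
SIZE_RATIO_TO_PRINCIPAL = 0.5  # factor 2: minimum share of the principal city's population
MICRO_STATE_SHARE = 0.03       # factor 3: minimum share of the state's population
REFERENCE_DENSITY = 500.0      # factor 4: state density (people/sq mi) that gets no boost


@dataclass
class City:
    name: str
    population: int
    state_density: float           # people per square mile in the city's state
    state_population: int
    area_type: str                 # "MSA", "uSA" (micropolitan), or "none"
    first_listed: bool             # first name in the MSA/uSA title
    principal_population: int = 0  # population of the MSA's first-listed city


def adjusted_population(city: City) -> float:
    """Factor 4: weight cities in sparsely populated states upward (never downward)."""
    boost = max(1.0, (REFERENCE_DENSITY / city.state_density) ** 0.5)
    return city.population * boost


def is_city_candidate(city: City) -> bool:
    if city.area_type == "MSA":
        if city.first_listed:  # factor 1
            return True
        # factor 2: close enough in size to the principal city
        return city.population >= SIZE_RATIO_TO_PRINCIPAL * city.principal_population
    if city.area_type == "uSA" and city.first_listed:
        # factor 3: large enough relative to its state, after the density boost
        return adjusted_population(city) >= MICRO_STATE_SHARE * city.state_population
    return False
```

Tuning then amounts to changing the three constants and re-running the candidate test over the whole list of cities.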

I then tuned the weights up and down until I got a set of cities that made sense to me as a New Englander and represented at least an 80-90% solution.


Massachusetts

  • Boston, MA
  • Fall River, MA
  • New Bedford, MA
  • Springfield, MA
  • Worcester, MA
  • Pittsfield, MA

Rhode Island

  • Providence, RI


Connecticut

  • Hartford, CT
  • Bridgeport, CT
  • Stamford, CT
  • Norwalk, CT
  • New Haven, CT
  • Norwich, CT
  • New London, CT


Maine

  • Portland, ME
  • Bangor, ME
  • Auburn, ME
  • Lewiston, ME


Vermont

  • Burlington, VT

New Hampshire

  • Manchester, NH
  • Nashua, NH
  • Concord, NH

Algorithmically, this excluded a few places that might normally be included in lists of New England cities. For example, Portsmouth, NH and Newport, RI (which are of roughly identical population), failed to make the cut. Same for Keene, NH as well as Barre and Bennington, VT, and any of the locations in the White River Junction area. However, we may also conclude that these places don’t quite make the cut compared to the city list above.

It also includes Pittsfield, MA, which is questionable to me, but hey, it’s the lead city in its MSA.

I also plugged in a couple places in the spreadsheet outside of New England as a sanity check (and added an additional criterion to catch “twin cities” like Oakland and St. Paul).

Adding to the Maine portion of the discussion, here’s my take on the most prominent municipalities.

The three major urban centers of the state, Portland, Bangor, and Lewiston-Auburn, are all shoo-ins. Auburn is not a certainty, though, as one could argue that Lewiston is a little more prominent given its larger population, hospitals, and Bates College, and because it is unfortunately a bit better known due to the shootings. At the same time, they share the same urban core, separated only by the river. One could argue that including one but not the other as a city would be like separating conjoined twins. It doesn’t really break anything to include both as cities, so I’m leaning yes to Auburn being a city.

Then the regional cities. Biddeford, Saco, and Sanford in southern Maine, Brunswick (technically a town), Augusta, and Waterville further up. I will include Presque Isle way up in Aroostook County as well. Some of these areas may not be too far from Portland with the advent of the Maine Turnpike, but all of them have mostly separate economies and residents would probably be, at best, displeased if you called them a suburb of another city. Biddeford, Saco, Sanford, Brunswick, and Waterville all originally developed around the ample water power available and became bustling centers of industry, while Augusta is the state capital. Presque Isle, though it has fewer than 10,000 residents, is the largest city in Aroostook County/Northern Maine and home to a public university and a commercial airport.

It’s a little tricky with Saco, Biddeford, and Sanford because they’re all relatively close to each other so there might be some concerns of ‘crowding.’ But they are all relatively equal in stature so we couldn’t really just pick one as a city to represent the area. I am neutral to slightly in favor as to whether or not they should be cities. Having Portsmouth, Dover, and Rochester, NH as cities gives ammo to the case of these being cities, but one could just as easily make a case that these don’t rise to the level of importance required of a city.

Brunswick is an important area economically, the gateway to the Maine coast and home to Bowdoin College. Formerly home to an air force base and right down the road from Bath, the center of the Maine ship building industry. Nevertheless, it is a town by incorporation. I am neutral to slightly in favor of it being a city. Proximity to Lewiston-Auburn and Portland could be used in an argument against being a city.

Augusta: state capital so it rises to the level of importance to be a city.

Waterville: probably the second easiest regional population center to justify as a city given that it’s a little more isolated. Nevertheless, it’s the second smallest population on my list.

Presque Isle: By some measures, this passes the test with flying colors: economic and cultural center and largest population center in the northern part of the state. Nearly two and a half hours to a larger city in Maine, though there is a larger Canadian city an hour and 15 minutes away. Anecdotally, this part of the state is so far out there that I’ve never visited despite being a Mainer for over 20 years. However, the factor going against it is that it’s starting to become a stretch calling it an urban area at all given its sub-10k population. I’m going to remain neutral until hearing more perspectives.

If we were just making a map of Maine, I think most would absolutely include all of these as cities with the largest font. However, being part of the broader framework of OSM, it gets a little trickier. I think that including the regional Maine cities as OSM cities will allow us to spread out the usage of the hierarchies, giving a more nuanced picture of the state on OSM. Just having four cities and lumping all of the other regional cities in with the smaller towns could be cause for confusion. I will be interested to see what happens to the regional cities of New Hampshire and Vermont because these states are most comparable to Maine. I think if most of the regional cities remain as cities there, then they should probably remain cities in Maine too.

Not necessarily. We have discussed demoting Montpelier to place=town for low prominence as well. Remember that the purpose of place=city, town, village, and hamlet is simply to group populated places into bins by relative magnitude. I.e. consider them to be place=big, place=medium, place=small, place=tiny where each category has a progressively higher density of places on the map.

It’s unfortunate that the tag values use English language words that have meaning in people’s minds, because it’s hard to separate. There are plenty of places in New England that are “cities” in our vernacular that ought not be place=city.


Without any local knowledge, I can’t evaluate your tuning for accuracy. However, I would caution that this exercise reminds me of how geocoders and routers are typically implemented, by weighing factors until certain representative queries or routes yield the desired results. Essentially, it’s an act of curation rather than data gathering. To the extent that a geocoder would factor place=* into its weights, it would want the tag to communicate some objective fact that it could not derive itself. But a renderer might appreciate a subjective hint, because algorithmic curation is a Hard Problem. I could see this becoming a source of tension in the long term, since both kinds of data consumers rely heavily on this same key.

The inputs to your spreadsheet are the populations of cities or towns and the populations of CBSAs, both of which have been shown to be poor proxies for the geographies of populated places, and thus inaccurate sources of population figures. You’ve attempted to correct for this inaccuracy by multiplexing them together. How stable would these weights be as populations change? And if we extend this approach beyond New England, can we be sure that it won’t unduly bias the map for bedroom communities at the expense of commercial hubs?

These are problems that the demographers at the Census Bureau have attempted to solve with their Urban Areas. I must admit that I have no idea what LODES data looks like, but the overall process seems to do a better job of identifying non-rural areas than we could on our own. What if you repeat your experiment but this time with UAs instead of official city and town boundaries? Would you be able to get away with fewer fudge factors? There’s still the issue of UA titles that don’t quite line up with populated place names, but it doesn’t seem as severe as with CBSAs.

Right, (imo), “curation rather than data gathering.” That really is how this best evolves. (Again, imo). Thanks, Minh.

And look at what Maine’s libraries do: Urban/Rural Designations: Maine E-rate for Libraries as even they have an “appeals process” for Mainer (Maniac?) input on whether any area is urban or rural. And “there we go again” at noticing that it is both wide-area opinion as well as individual opinion which tussle with each other at where the fuzzy line actually gets drawn.

I’ve been to Orono, Maine (university “town” or “city”?) and while I think the population is over 10,000, is that not a city in this context? Or is town correct? (I don’t want to get lost in a single example, and it was a long time ago I was there, so I’m not a local with local perspectives by a long shot).

I would urge folks to consider broadly the criteria to which we divide population centers into categories at a higher level and consider the lists I’m generating here as a thought exercise to consider places which are on the margin between =city and town and what factors we think are important in distinguishing them. The goal here is separating the most important population centers from lesser cities.

There seems to be a consensus at least that context matters - a population center in a sparse area gets higher weight than the same population center in a more densely settled region. There also seems to be a consensus that the presence of border_type=city administrative boundaries (i.e. places that are called “city” and have a mayor and so forth) is irrelevant to the decision to assign place=city. After all, some very tiny places (like the city of Palmer, MA, population 12K, which I’ve personally never heard of) are legally cities.

The issue, and the original reason why this discussion is ongoing, is that there is a general sense that we have disproportionately too many place=city nodes in New England compared to the importance level that place=city has compared to place=town nodes in other areas. That means this is not solved unless we re-categorize some subset of cities tagged place=city as place=town.

As such, we have successfully “curated” way too many cities on the map for the reality of what these population centers are, and curation is useless without criteria, whether objective, subjective, or more likely a combination of both.

It seems to me that, over the years, overclassification east of the Mississippi and underclassification west of it has always stemmed from a naïve overreliance on administrative areas or their population counts. Your experiment also relies on population counts of administrative areas, namely, the populations of cities and towns, counties (aggregated into CBSAs), and states. However, it uses a formula for allowing more city nodes per MSA depending on a state’s population density based on some qualitative “bonus factors”.

As a dabbling map designer, I appreciate that you’ve replicated the label density scrubber found in some GIS tools, and that I long for renderers like MapLibre to provide. Unfortunately, there’s no way for any of us to judge whether these factors are versatile enough for OSM, other than the one constraint that “the map” “looks right”. This reduces place=* to a purely presentational attribute, rather than one about a populated place’s function in society. If place classification formerly suffered from “garbage in, garbage out”, this greener recycling process will require more transparency the moment anyone disagrees with its results. I’m not entirely sure we’ll ever be able to hone place classification into a science, but any formula we come up with deserves scrutiny.

In the other threads linked at the top, I’ve floated a half-baked idea for how we could classify places nationally with a minimum of fudge factors and magic numbers. We could restrict place=town/city/suburb to places within Urban Areas. Within a UA, place=city would be subject to a simple test of whether any surrounding places would be considered its suburbs or those of another city within the UA, and perhaps the same would extend to choosing a place=town in a smaller UA. This most likely aligns with the Census Bureau’s practice of titling UAs based on their “high-density nuclei”, except that we wouldn’t reclassify any MCD or directional place name as place=town or city. Palmer, Massachusetts, falls within the Springfield UA and would not be its place=city.

Where I get stuck is how to draw the line between a UA that has at least one place=city at its core and a UA that has only a place=town at its core. The same uncertainty recently caused the Census Bureau to drop its distinction between Urban Clusters and Urbanized Areas, which had been set at a population of 50,000 since 1950. They say we’re now free to categorize UAs based on any population threshold we want. Thanks a lot, Census Bureau!

Maybe we could scale up the old threshold to 109,515, based on the country’s population growth since 1950. That’s right around the median UA population of 101,536 and the formerly documented place=city cutoff of 100,000. Maybe we set a budget for the number of place=city nodes within a UA, based on the UA’s population divided by that threshold. But I think these arbitrary cutoffs are only useful to the extent that they align with real-world differences in how places function.
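For anyone who wants to check the arithmetic, here is a minimal sketch of the scaled threshold and the per-UA budget. The two national population constants are my assumption about which census totals were used (they do reproduce 109,515), and `scaled_threshold` and `city_budget` are hypothetical helper names. The UA populations are taken from the Massachusetts table in this thread.

```python
# Decennial census resident population totals; an assumption about
# which figures were used to arrive at 109,515.
US_POP_1950 = 151_325_798
US_POP_2020 = 331_449_281
OLD_THRESHOLD = 50_000  # Urbanized Area cutoff in use since 1950


def scaled_threshold(old: int = OLD_THRESHOLD) -> int:
    """Scale the old cutoff by national population growth since 1950."""
    return round(old * US_POP_2020 / US_POP_1950)


def city_budget(ua_population: int, threshold: int) -> int:
    """Upper bound on place=city nodes in a UA: population divided by threshold."""
    return ua_population // threshold


threshold = scaled_threshold()
print("threshold:", threshold)
for name, pop in [
    ("Boston, MA–NH", 4_382_009),
    ("New Bedford, MA", 155_491),
    ("Pittsfield, MA", 50_720),
]:
    print(name, city_budget(pop, threshold))
```

Read this way, the budget is a ceiling rather than a target: a very large UA gets a high number, but nothing says that many places inside it would pass any other test.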

There have also been concerns that some sparsely populated regions of the country would go blank if we rely on UAs, which like CBSAs require a certain minimum housing density. Nome, Alaska, would be relegated to a village unless we come up with some exception for it. In general, though, I don’t think our goal should really be to pad out the map artificially. A stylesheet should pull in place=village and rely on symbol collision if it needs to maintain an even label density everywhere.

Perhaps we could also consider how Natural Earth classifies places at its three scales. I would rather leave the subjective curation to them and focus on value that we can add independently as a data-driven project, but many renderers mash up Natural Earth at low zoom levels and OSM at high zoom levels, so some degree of alignment may benefit the broader ecosystem. As well, some consistency between regions would benefit our users. Whether it’s our methodology or the resulting density that is consistent, predictability will encourage more data consumers to make more thorough use of our data.

Mmmm, sausage! Seriously, I like how we certainly get right to it!

In the USA, for about 15 years, we’ve been tweaking these. They work best when they run as Brian says with a relative magnitude sense: (each of which is suffixed with “around here” which is deliberately squishy as to how far out, but really means something well- and widely-understood in a more-local context) large, medium, small, tiny. This is their place.

Then, there is their admin_level and boundary and so on, which while I know is related in some sense we have as mappers to map correctly, has serious overlap and blur with the concept and actual key of place in many minds. I keep saying that, we keep saying that, these are choppy waters like that around here sometimes.

It seems like we both want to and can and do but fully haven’t in some places all of the above. It gets better with admin_level, and some smoothing of place is underway.

It’s true that how place names render on a map has a fair amount to do with this. That’s why our brains visually trigger with the sense of relative big, medium, small, tiny -ness. We see our sense of place around us and that can be powerful. In some sense, we are hallucinating into existence our own sense of togetherness and community as we do this. It’s freakin’ awesome to watch.

Below is the list of Urban Areas in Massachusetts with population listed. There’s also a few other columns in the raw data like land and water area and population density. I included UAs that have the primary city in another state but include “MA” in the descriptive name.

Name Population
Boston, MA–NH 4,382,009
Providence, RI–MA 1,285,806
Worcester, MA–CT 482,085
Springfield, MA–CT 442,145
Barnstable Town, MA 303,269
Nashua, NH–MA 242,984
New Bedford, MA 155,491
Leominster–Fitchburg, MA 111,790
Amherst Town–Northampton–Easthampton Town, MA 90,570
Pittsfield, MA 50,720
North Adams, MA 25,432
Greenfield, MA 22,294
Southbridge Town, MA 20,789
Vineyard Haven–Edgartown–Oak Bluffs, MA 14,064
Athol, MA 13,557
Nantucket, MA 12,011
Ipswich, MA 9,380
Spencer, MA 8,196
Lee, MA 8,119
Ware, MA 5,662
Provincetown, MA 5,698
Sunderland–South Deerfield, MA 5,048
Winchendon, MA 4,866
Pepperell, MA 6,103

If I were curating this, I would snap the city/town threshold between New Bedford and Leominster/Fitchburg (it’s pronounced Lemminstah in case you’re wondering). I consider Leominster/Fitchburg to be more like two sprawled out towns that just happen to cover enough territory that they managed to collect up a decent amount of people. But it’s not an urban center at all, in the way that Fall River or Burlington, VT are (to pick examples with similar population).

I would also exclude Barnstable Town for all the reasons discussed elsewhere and for the fact that it’s basically just a sprawled suburban area on Cape Cod.

I would also include Fall River, MA, which has a population of 94,000 and is included in the Providence, RI-MA UA and has a significant urban center with urban character. Excluding cities in other states, that would leave Massachusetts with five cities - Boston, Worcester, Springfield, New Bedford, and Fall River.

Some of the smallest cities on that list may well even qualify as place=village. I’ve never even heard of Sunderland.

Moving north to Vermont, here’s the table:

Name Population
Burlington, VT 118,032
Lebanon, NH–VT 30,299
Barre–Montpelier, VT 20,014
Rutland, VT 19,550
Bennington, VT 13,759
St. Albans, VT 11,368
Brattleboro, VT 10,285
Milton, VT 6,417
Middlebury, VT 6,154
Springfield, VT 5,140
St. Johnsbury, VT 4,883
Bellows Falls, VT–NH 3,978

On this list, I would include only Burlington as a city, and I would even make Montpelier a place=town despite being the state capital.

Now, moving on to Maine:

Name Population
Portland, ME 205,356
Dover–Rochester, NH–ME 72,391
Portsmouth, NH–ME 95,090
Bangor, ME 61,539
Lewiston, ME 60,743
Brunswick, ME 31,361
Augusta, ME 24,005
Waterville, ME 25,529
Sanford, ME 15,067
North Windham, ME 10,271
Rockland, ME 9,868
Camden, ME 4,660
Skowhegan, ME 4,795
South Paris, ME 4,371
Houlton, ME 4,281
Rumford, ME 5,585
South Berwick, ME–NH 5,584
Presque Isle, ME 5,361
Millinocket, ME 3,812
Belfast, ME 3,754
Boothbay Harbor, ME 3,067

Of this list, I would likely only include Portland, Portsmouth (NH), Bangor and mayyyybe Lewiston. I hesitate on Lewiston based on the utter lack of notability outside of Maine (based on the very hand-wavy concept of, if you asked a New Englander outside of Maine to name cities in Maine, most people won’t come up with it). But with a population of 60K in spacious Maine (the UA includes adjacent Auburn), it’s hard to argue against it.

If we’re excluding Montpelier, there’s a good argument for excluding Augusta on similar grounds (motto: “at least we’ve technically got more people than Montpelier”).

I would also exclude Dover/Rochester (which are both in NH but include a little Maine). It’s high on the list, but if you take a look at a map, the Census Bureau has thrown a big enough lasso around two distinct, spread-out areas to collect a decent population count. But a significant population center it is not. I also assess those places to have very low name recognition outside of New Hampshire.

This leaves Maine with three cities: Portland, Bangor, and Lewiston.


This would validate the threshold of around 100,000 to 110,000 that I floated above. If we want to apply the same framework nationwide, we’d need more gut checks along these lines.

In order to choose which titled places within the Leominster–Fitchburg UA would qualify for place=city, we would essentially have to split the UA in two. It certainly wouldn’t make sense to double-count the entire UA’s population by classifying both places as city. Roughly halving the UA’s population would likely cause it to fall well below the threshold for any city within the UA, matching your expectations.

Likewise, if we roughly halve the Sunderland–South Deerfield UA’s population, it would easily fall below the threshold for a place=village within the UA.

Other double- and triple-barreled UAs may be less obvious just judging by title or appearance. Of course, it would be great if we could more precisely divvy up the UA’s population rather than “roughly halving” it. The process for defining a UA involves creating urban area agglomerations that probably correspond to lobes like the ones above, but I don’t think they publish those intermediate geographies.
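The “roughly halving” step can be sketched as an even split across the places named in a UA’s title. This is only a rough stand-in for the real distribution of population: the even split, the helper names, and the use of 100,000 (the formerly documented place=city cutoff mentioned earlier) as the test value are all assumptions. The populations come from the Massachusetts table.

```python
CITY_THRESHOLD = 100_000  # formerly documented place=city cutoff; an assumed test value


def titled_places(ua_name: str) -> list:
    """Split 'Leominster–Fitchburg, MA' into ['Leominster', 'Fitchburg'].

    The state suffix is dropped first, so en dashes between state
    abbreviations (as in 'MA–NH') are never treated as place separators.
    """
    places_part = ua_name.rsplit(", ", 1)[0]
    return [place.strip() for place in places_part.split("–")]


def per_place_share(ua_name: str, ua_population: int) -> dict:
    """Divide the UA population evenly among the places in its title."""
    places = titled_places(ua_name)
    share = ua_population / len(places)
    return {place: share for place in places}


def places_over_threshold(ua_name: str, ua_population: int,
                          threshold: int = CITY_THRESHOLD) -> list:
    """Titled places whose even share of the UA population meets the threshold."""
    shares = per_place_share(ua_name, ua_population)
    return [place for place, share in shares.items() if share >= threshold]


print(places_over_threshold("New Bedford, MA", 155_491))
print(places_over_threshold("Leominster–Fitchburg, MA", 111_790))
```

With an even split, each half of Leominster–Fitchburg gets 55,895, well under the test value, while single-title New Bedford keeps its full count. A better split would need the intermediate geographies, which do not appear to be published.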

Did you find my “would have suburbs around it” test useful in identifying Fall River as a place=city within that UA? If not, we could look for a more rigorous standard. Through the 2000 census, the Census Bureau designated central places within Urban Clusters and Urbanized Areas. They dropped these lists in 2010, considering them redundant to the principal cities in CBSAs. Should we reintroduce the previous central place concept or adopt CBSA principal cities to supplement the places in the UA’s title? Previously, I found that principal cities tended to overpopulate some of the larger MSAs with too many place=city contenders, 19 in the Los Angeles–Long Beach–Anaheim MSA alone.

Barnstable Town is an MCD. By the rules for naming UAs, this means the Census Bureau couldn’t find within the UA a specific populated place with at least 2,500 people, or even a CDP. On that basis, we can say that no place within Barnstable Town should be a place=city, even if we set the floor for a city-based UA at 100,000 or so.

Do you feel strongly about also including Portsmouth and Bangor as place=city? The Dover–Rochester UA’s population would be split between the two cities, so there’s no problem denying both that coveted city status. But Bangor would really blow a hole in any threshold around 100,000. Bangor would also frustrate other heuristics that have been proposed in the past, like the presence of an international airport, so maybe there’s just something special about it that shouldn’t sway a regional or national classification standard.

For some of the low hanging fruit in Maine, I think Rockland and Westbrook being towns instead of cities would be pretty close to unanimously supported.


This feels right to me as a Vermont resident.

Metro Burlington is the only one of these that feels like it is big enough that if you ate out at a restaurant once per week you would never be able to visit them all (including fast food establishments) once turnover is considered. The rest of these have a few dozen restaurants, maybe 100 at most.

Around a population of 100,000 or more (for the Urban Area, not just municipal boundary) seems reasonable to me. One place=city in Vermont (Burlington) makes sense. However, I have heard voices arguing that some regionally significant medium sized places like Concord, NH or Lewiston, ME should qualify for place=city. Seems like we have some consensus building to do on exactly what we are aiming for the place=city tag to represent. Should it be only for the most significant, densely populated, urban centers in a region? Or should it be for any regionally significant urban center including smaller ones in more sparsely populated regions? I’m ok with either definition but if we don’t establish a consensus on this point I think we’ll continue to talk past each other about what the qualifying factors should be.

This is to say: There is some concept of scale of place that a single person can know intimately and keep track of themselves as new buildings are constructed and businesses open and close. I feel like a “town” is at least potentially knowable by someone regularly out in it and exploring it.

In contrast, a person in a large “city” may only be able to intimately know several distinct neighborhoods that they frequent while other neighborhoods ebb and flow out of sight.

There is certainly a fuzzy threshold between a small city and a large town, but thinking of scale in this knowability way feels like the ~100,000 UA population is about right for this kind of distinction between town and city.


First, a disclaimer: I am not a native New Englander, so feel free to take my opinion with a grain of salt, or disregard it. But I have visited quite a few times, so I consider myself moderately familiar with the region.

It seems like this discussion is mostly focusing on the city vs town distinction among distinct urban cores. One thing I haven’t seen mentioned is the status of somewhere like Cambridge, MA, currently tagged as place=city. The administrative city has a population of 120,000, making it the 4th largest municipality in Massachusetts. Cambridge would certainly be described as being “in the Boston area”, but it’s certainly no sleepy bedroom community where all residents commute into Boston: it has its own economic engines, major destinations, business districts, and (from what I can tell) a distinct sense of community identity. When I visited a few months ago, it certainly seemed sufficiently large as to feel ‘cosmopolitan’, like you could be unaware of what was going on on the other side of the town, but that’s hard to say from the outside. I also suspect that many folks living in surrounding smaller places like Arlington or Watertown commute to Cambridge, not Boston. It’s certainly notable in the sense of “people from outside of the region have heard of it”, probably more so than somewhere like New Bedford or even Springfield IMO, although places with major universities are always going to be outliers with that sort of thing. Cambridge is listed second in the MSA and NECTA names after Boston.

I guess I would find it helpful to delineate what the consensus is on places like Cambridge, and why. Is it impossible for it to be a place=city because it is across a river from another place=city thought of as more prominent/that has 5x the municipal population? What would allow the urban area to have multiple place=city places? Is there some other factor that finds Cambridge to have insufficient amenities to be considered a city? To avoid seeming like I’m just playing devil’s advocate, I’d say that I would probably tag Cambridge or somewhere like it as place=city. But my read is that others do not share this opinion, so I’m interested to hear arguments that could change my mind.

Then there are other places, also currently tagged city, that have slightly lower municipal populations (but all >100k) but are also further from central Boston while remaining in the census Urban Area, like Quincy, Lynn, and Lowell. Are all of these clearly place=town, on par in importance with an isolated place with 10,000 people? Or is there some combination of population x distance that allows for one to be a city? Does there have to be undeveloped land as you go out from Boston for it to be one? Some other factor determining them to be center-like in character, rather than fully subordinate to Boston? I’m less personally familiar with these places, so I don’t have a strong stance, but they seem like they could reasonably be place=city to me.