Toward a national system for functionally classifying populated places

Over the past couple months, we’ve seen quite a bit of uncertainty and disagreement about how to apply the place=city and place=town tags in the U.S. I’d like to consolidate the backstory and walk you through some of the ideas I’ve explored for classifying places more systematically and deterministically, as well as the struggle to balance the needs of urban and rural parts of the country.

An early standard

From 2006 to 2013, the “core values” of place=* were globally defined by population on a logarithmic scale:

  • place=city for settlements with 100,000 inhabitants or more
  • place=town for settlements with between 10,000 and 100,000 inhabitants
  • place=village for settlements with between 1,000 and 10,000 inhabitants
  • place=hamlet for settlements with fewer than 1,000 inhabitants

There was plenty of debate about whether the rubric should be strictly based on population, legal designation, both, or neither. At one point, someone tried to reframe these population thresholds as merely a general observation about typical populations associated with each legal designation.

In 2009, the vast majority of place=* points in the U.S. were imported from GNIS. These points were classified based on the documented scale according to population data from the Census Bureau’s 2006 American Community Survey, falling back to place=hamlet if no matching entry in the ACS could be found.

GNIS turned out to have systemic errors. For example, old railroad stations and junctions were tagged as place=hamlet because both types of features had similar-looking labels on old USGS topographic maps:

The import compounded these problems. In the import, GNIS features were matched to ACS populations based on FIPS codes, but whereas GNIS distinguishes between populated places, administrative areas, and statistical census areas, the import tended to conflate these concepts. It also tended to conflate places with the same name, which distorted population figures of places across the Midwest (where a township and a place within often share a name) and New York State (where every town has a village by the same name, almost as a rule).

Accommodating local tastes

Even when an accurate match took place, the population figure only corresponds to the place’s incorporated territory. Boston is tagged with the population of the city proper, whereas a “Boston” label on any small-scale (zoomed-out) map represents the entire metropolitan area, including the suburbs. Worse, the “city”, “town”, and “village” terminology misleads many new mappers into thinking that places should be classified based on their official designations, particularly in states where these terms carry legal meaning. Changes along these lines rarely lasted long because they made the map look bizarre, especially across state lines.

A more holistic approach still leaves something to be desired. Some state capitals and county seats are tiny compared to their peers, but to many mappers, they deserve equal treatment. And look at all this empty space – is OSM saying there’s nothing there but trees, if that?

The U.S. documentation continued to mention the logarithmic population scale long after it disappeared from the global documentation. Nevertheless, mappers in some regions adopted rubrics for reclassifying places based on some exceptions to this scale. Some boosted county seats, so that renderers would show the full set of administratively similar places at the same zoom level. Others tried to account for the majority of the urban population living in the suburbs, outside the incorporated city limits. Still others arbitrarily promoted places to fill in perceived gaps.

Generally, these have been ad hoc adjustments, with the unwritten expectation that any change to existing classifications is drastic enough to warrant prior discussion with the local community. And so launched scores of long discussions, often in response to mass edits and mass reversions. But there is very little coordination. If anything, we have created a visual demonstration of the “Lake Wobegon effect”: every place is of above-average importance to someone.

It should come as no surprise that there are more place=city nodes in the eastern half of the country and along the West Coast than in the Rockies. However, I can’t help but wonder what the large mass of city in Southern California has to do with the smaller clusters of city along the Bos–Wash corridor and the specks of city in Montana.

In 2017, there was an attempt to formalize many of these exceptions. On a wiki page unrelated to place classification, the minimum threshold for place=city was halved to 50,000 for unincorporated areas. By 2022, this became the documented standard for all place=city in the country. I was unable to find any discussion about this change, which might explain why mappers have implemented it inconsistently: of the places between 50,000 and 100,000 in population, 233 are place=town and 265 are place=city.

We need a system

Over the years, many mappers have expressed dissatisfaction that OSM’s place classification system is unclear, poorly documented, arbitrary, or based on unwritten rules. To some extent, they’re right: many places that aren’t classified according to a population scale essentially either got reclassified under the radar or were curated based on handshake agreements behind closed doors (such as PMs on osm.org or more public threads on Discord or Slack). This leaves us with little that we can document for newcomers to make sense of. They want something that, if not straightforward, is at least explainable.

A systematic approach to place classification isn’t just for wannabe urban planners in armchairs. Data consumers would be able to use place nodes more effectively and creatively if there’s some rhyme or reason to how they’re tagged. Cartographers want to declutter conurbations, but all they have is the sledgehammer of ranking by population and enforcing a minimum distance between labels – or eschewing OSM data in favor of Natural Earth at certain zoom levels. They need more context with which to make these decisions, which the place=city tag could provide if applied more judiciously.

Most OSM-based renderers are poor tools for evaluating our place classification choices. Interactive maps don’t try very hard to show a coherent selection of place labels at each zoom level. At low zoom levels, they only need to show enough place labels to suggest further detail that you can access by zooming in. They tend not to label places any more aggressively than that, because otherwise the map starts to fill up with insignificant places that are tagged similarly. In an already sparsely populated region, this approach creates a perverse incentive for mappers to inflate classifications just so that a bare minimum number of places show up on the map, to achieve a minimal level of usability.

Not every renderer is interactive. There’s still an enduring need for static renders that get incorporated into slide decks, news articles, doctoral theses, textbooks, and other printed material. Many static maps preserve the traditional practice of filling in the gaps with labels that you’d normally see after zooming in. With this higher information density, there’s an even greater need to establish a visual hierarchy, to prevent information overload. Sometimes the reader needs to be able to glance at the map to see what’s most significant; other times, they need to be able to obtain more details by squinting at the same map.

To be sure, this map is quite educational, informing the reader of many places they wouldn’t otherwise know of. There’s nothing wrong with a small-scale map labeling Havre, Montana; Show Low, Arizona; Enid, Oklahoma; Rolla, Missouri; Athens, Ohio; Key West, Florida; or Presque Isle, Maine. Same with populous suburbs like Hayward, California; Olathe, Kansas; West Palm Beach, Florida; and Lakewood Township, New Jersey, if there’s room. But are these place=city nodes so essential for getting one’s bearings that they all definitely deserve a top-tier typographic treatment?

Like the functional values of highway=*, the functional values of place=* should ideally measure connectivity: how well the place binds its surroundings together and leads it in a common identity. We call a suburb a suburb because it lacks that pull, no matter how populous it is. Instead, it’s somehow dependent on a city that has a broader sphere of influence. Meanwhile, a rural town has nothing in its vicinity to pull. How do we quantify this connectivity?

Defining population

As long as place classification is influenced by population at all, then each population figure needs to align with what the place point is supposed to represent, and populations need to be comparable from one place to the next. Our approach thus far has relied heavily on population figures of incorporated places. Though these figures are readily accessible and easy to apply, they misrepresent most cities, taking them out of context unless you’re zoomed in far enough to see each of the suburbs called out separately. Moreover, an incorporated place in one state isn’t directly comparable to an incorporated place in another state, because the two places are subject to very different rules for incorporation and annexation.

Fortunately, the Census Bureau has calculated a set of 2,644 urban areas that attempt to approximate the coherent built-up area surrounding a place. Each urban area takes into account population, housing, and employment data, plus a dash of history, but almost entirely disregards administrative boundaries.[1] Urban areas are designed to be comparable to each other regardless of the state. The Census Bureau has published papers examining the history of metropolitan and urban classification since the 1910. It’s a good look at the challenges facing the kind of classification we’ve signed up to do.

This interactive visualization based on the 2010 census and a different urban area definition nonetheless does a great job of illustrating the difference between incorporated places, urban areas, and metropolitan statistical areas in terms of area and population:

In 2021, @Joseph_E proposed that we tag each urban area’s population directly on one of the places that it’s named after – the principal or secondary places – in order to fix cities that have counterintuitive populations. Though I appreciated the out-of-the-box thinking, I was concerned that redefining population=* proper would lead us down the path to “fuzzy math”. Almost one in ten urban areas has a double- or triple-barreled title, such as Minneapolis–St. Paul and Los Angeles–Long Beach–Anaheim. Would we give all the suburban population to the largest city, even though some of these are close ties, or would we divide the spoils more evenly? Even when an urban area is named after just one place, would we strip all the suburban place=* nodes of their population=* tags, so as not to double-count suburbanites?

Despite my reservations, I think Joseph was onto something. Urban areas are too nuanced for a quantitative key like population=*, but if we want place=* to be more holistic, urban areas could be a useful check on our classifications, to prevent that “Lake Wobegon” effect. If we restrict place=town and place=city to places in urban areas, we can immediately weed out 1,675 place=town nodes that the federal government considers to be unambiguously rural for a variety of purposes. Maybe we should demote these places to place=village and even less significant places to place=hamlet.

Many of these places were promoted from place=village to place=town despite their populations, because they serve as county seats. But thanks to the quick action of this community over the past couple weeks, data consumers can already give county seats an automatic boost without requiring us to make all these exceptions. If a place’s status as a county seat doesn’t already translate to enough population, housing, or employment to qualify as an urban area, then there isn’t a very strong rationale for promoting it anyways.

On the other hand, if there really is some unusual reason for promoting a place that isn’t apparent in the usual data sources, then this seemingly rigid rule will prompt a discussion, allowing us to distinguish that real-world consideration from an arbitrary one.

Putting suburbia in its place

If limiting place=town to urban areas cleans up rural areas, we still have to contend with inflated classifications in urban areas. In the grand scheme of things, when a suburb is located next to a much larger city, its raw population means a lot less than the population of a standalone town. And when we’re zoomed in far enough to consider only one metropolitan area at a time, any distinction between place=city and place=town wouldn’t be very relevant anyways. Instead, the more relevant distinction would be between the inner city (i.e., place=suburb) and outlying suburbs (place=town).

What if we define place=city by restricting it to the urban area’s titular places, which are designated based on the places’ population and housing unit counts? This heuristic goes a long way toward singling out the places that laypeople would consider to be the urban core. If you were to enumerate the places that meet this new criterion in a state that you’ve never lived in, you’d recognize much more of this list than before. To me, that’s a positive sign, showing that this criterion helps us achieve the goal of classifying based on importance.

The Dallas–Fort Worth–Arlington urban area and vicinity as of April 2024 and with the proposed change. Eight places remain classified as cities. Seven other places are demoted from cities to towns.
The Los Angeles–Long Beach–Anaheim urban area and vicinity as of April 2024 and with the proposed change. Twenty places remain classified as cities. Twenty other places are demoted from cities to towns.
The Kansas City urban area and vicinity as of April 2024 and with the proposed change. One place remains classified as a city. Thirteen other places are demoted from cities to towns.
The Miami–Fort Lauderdale urban area and vicinity as of April 2024 and with the proposed change. Three places remain classified as cities. Fifteen other places are demoted from cities to towns.

How big is big?

Unfortunately, the Census Bureau’s definition of “urban” is very generous, resembling the traditional place=town definition more than any reasonable definition of place=city. Hundreds of communities qualify as urban areas for having more than 2,000 housing units, despite a population below 5,000. Munds Park, Arizona (population 1,096), has its own urban area, the least populous in the country at only 773 inhabitants. This is a typical bedroom community attached to an outdoor recreation destination, almost exclusively residential, with a skyline that bears little resemblance to the sort of place typically documented as place=town, let alone city.

At this point, it’s worth remembering that a place’s remoteness doesn’t inherently cause it to become more important. Despite its isolation, Munds Park is still a bedroom community dependent on Flagstaff for big-city services. On a rendered map, Munds Park may still appear in order to fill in some empty space. Or on a map that prioritizes natural features and deemphasizes populated places, it may not appear at all.

Clearly, urban area titles alone are inadequate for defining place=city. Unless we distinguish Munds Park and hundreds of communities like it from big cities, deemphasizing suburbs would lead to a bizarre imbalance between rural and urban America. We need some other criterion for distinguishing between a major urban area and a minor one.

Obviously, a big city is more populous than a small town, but beyond that, there are other factors that determine whether a place functions more like a big city or more like a small town. Unfortunately, this determination has defied a quantitative definition, Dairy Queen and Whataburger notwithstanding. It seems like all roads eventually lead back to population, even after acknowledging other nuances.

Two official sizes of urban areas

In the 2000 and 2010 censuses, the Census Bureau did make a distinction between larger urbanized areas and smaller urban clusters, but it was essentially an accident of history. When the concept of an urbanized area was first introduced in 1950, the Census Bureau was operating under a tight deadline while relying on paper maps, so they opted to only compute urbanized areas based on incorporated places with 50,000 inhabitants or more. They just picked a nice round number as any project manager would. Later censuses kept that threshold because population figures need stable definitions to be compared across years. By 2000, the bureau had adopted more efficient computerized methods, so they introduced urban clusters as a supplement. In 2020, they unified the two concepts into a single urban area definition.

As an experiment to see if a population threshold would even help, I arbitrarily set the threshold for a major urban area at 101,536 inhabitants, which is the median population of a 2020 urban area. To avoid overweighting the inhabitants of twin cities, I divided the urban area’s population by the number of places in its title before comparing it to the threshold. I also put place classifications on a sliding scale based on the urban area classification:

Environment Principal place Secondary place Other place
Major urban area city city town
Minor urban area town town village
Rural area village

Indeed, for the most part, the nominally urban areas get downgraded, and we’re left with a less unwieldy city layer:

Unsurprisingly, this distinction results in far fewer place=city across the board. Rural states like Montana and Alaska are left with even fewer cities than they had before. Wyoming and the U.S. Virgin Islands have no city at all. Fortunately, enough place=town also get demoted to place=village that you can still look at a map of dots and reckon which town is Bismarck or Casper, something you couldn’t easily do if the only change we make is to restrict town to urban areas. Meanwhile, populous states like California and Florida have lost many of their cities too, with a similarly enhanced discernibility.

Rules are made to be broken

Sometimes, urban areas are more granular than what locals consider to be a coherent metropolitan area. For example, laypeople regard the San Francisco Bay Area as a single metropolitan area anchored by San Jose, San Francisco, and Oakland. But the OMB’s combined statistical areas are too coarse for this cultural region, while metropolitan statistical areas are too granular, to say nothing of the Census Bureau’s urban areas. Like many California cities, Concord and Walnut Creek are very populous by national standards, but locals would probably consider them to be suburbs in the Bay Area, if not specifically of Oakland.

It’s entirely possible that, at some stage in the Census Bureau’s process for calculating the San Francisco–Oakland urban area, it actually included Concord–Walnut Creek, but then it got split out for the same reason that San Jose remains separate: the process somewhat respects historical distinctions between urban areas that used to be separated by a greater distance and used to have weaker economic ties before they merged together.

I think cases like this are rare enough that the overall approach would still be a marked improvement over the status quo. There’s nothing sacrosanct about the number 101,536, but any purely population-based heuristic is a balancing act between decluttering rural regions and decluttering populous regions. We’d have to increase the threshold to a whopping 269,292 to make Concord–Walnut Creek a minor urban area and equate its two titular places with Oakland’s other suburbs. However, that would probably rob several more rural states of their last place=city. Alternatively, we could try setting additional constraints based on population density, housing density, or proximity.

Or… we could make an exception. You’re probably snickering at this point, but my goal isn’t to completely eliminate exceptions. Rather, I want us to establish a better general rule that satisfactorily classifies the vast majority of places, minimizing the number of exceptions. I want us to limit the exceptions to cases that are more self-explanatory than the Show Lows and Presque Isles we have today.

Just the beginning

As mappers, we aren’t accustomed to applying tags that depend on countless other tags hundreds of miles away. We specialize in zooming way in, mapping from the lowest possible vantage point, applying the most hyperlocal knowledge we have. For years, we used to document tags like highway=trunk and highway=primary using a set of ostensibly representative street-level photos, before we realized that such photos misleadingly prioritize construction quality over function and connectivity. Our preconceptions of what counts as a big city are shaped by our own life experiences. Within the U.S. community, I’d guess that relatively few of us have lived in different regions and experienced both urban and rural America. Those of us who have still carry our own biases that aren’t necessarily relevant to a map user.

Whatever further adjustments we try, I think we should try to start from first principles and the wealth of statistical data we’re lucky enough to have available in the public domain and see where the chips fall. It would be tempting to do the reverse, to decide how we want each chip to fall and work backwards from there. But we’ve already seen how difficult it is to reach consensus among mappers with diverse perspectives. An overfitted heuristic without a foundation won’t stand up to scrutiny, or dissent.

If nothing else, hopefully I’ve inspired folks to think outside the box about statistical data. In the colophon that follows, you’ll notice that the maps marked “Proposed” barely make use of OSM’s existing place=* tags. Even if nothing ever comes of this exploration, I’ve outlined an algorithm that data consumers could substitute for OSM’s place classifications, with enough effort. It just wouldn’t benefit from the well-considered exceptions we’d make to such an algorithm.


  1. If two urban areas have grown so that they’ve collided into each other, the boundary between them may follow city limits. ↩︎

14 Likes

Colophon

My primary source for this exploration was the Census Bureau. To facilitate future research, I imported the entire set of 2020 urban areas into Wikidata. This required getting a new property approved for urban area census codes.

As far as I know, Wikidata is the only place online where you can obtain the principal and secondary places that appear in the title of each urban area. Unless you game out the complex rules for determining an urban area, it isn’t necessarily obvious that, for example, the bi-state Kansas City urban area is named after the city in Missouri but not the one in Kansas, whereas the Texarkana urban area is named after both Texarkanas. This required many hours of manual conflation in OpenRefine.

To complete the etymological data for the whole country, I had to lavish special attention on Puerto Rico. Puerto Rico has no incorporated places, only municipios (analogous to New England towns), which are partitioned into barrios, one of which is typically the barrio-pueblo, the seat of government. Since none of these structures corresponds to where people live, the Census Bureau has defined two different kinds of CDPs: a zona urbana includes the barrio-pueblo and seat of government, while a comunidad does not.

I had to import all the zonas urbanas into Wikidata and rework many comunidades that had been poorly represented by the Cebuano Wikipedia. Meanwhile, OSM had incorrectly conflated each populated place with its surrounding municipio. As in New England, this resulted in inflated population=* tags, sometimes by orders of magnitude. I retagged each zona urbana and any comunidad that appeared in the title of an urban area, but I haven’t gone back and added population figures to the remaining comunidades.

I used QGIS to produce the custom maps in this post. They’re just a quick demonstration, nothing particularly rigorous from a cartographic perspective, but if you want to reproduce it, I threw the project up on GitHub along with notes for reproducing the various layers. This repository contains a GeoJSON file of all the OSM place points in the United States annotated with the major/minor/rural environment and any relationship to an urban area title. You can use this file to create a mashup that overlays the proposed places on other OSM data. Another GeoJSON file contains all the current place=town nodes that lie in rural areas.

3 Likes

A quite thorough rediscovery of the nuances of city hierarchy that geographers and cartographers have been trying to deal with for decades (like a lot of OSM debates). The observations in this piece are excellent, and more interesting than reading Christaller.

You are correct that if OSM is going to be useful for multiscale cartography (which it was not originally designed to do), then the relative importance property needs to be represented. But the place=city/town/village/hamlet distinction is totally insufficient to do that for many reasons:

  1. you need a lot more levels to be useful. If you’re webmapping, you need at least one for every zoom level (at least, every level where you would show places as points)
  2. In many states and countries, these terms have precise legal meanings as different types of government incorporation. Of course, every legal system is different, so there is no consistent system you can use.
  3. However, generally these are vaguely defined words with meanings that vary widely by region, culture, and dialect. Personally, I never saw the word “hamlet” where i grew up, even though I was surrounded by what other people would call hamlets.
  4. I think most people would agree that their personal definitions of these words (i.e., the prototypical urban landscape each invokes) have more to do with size than prominence. Anaheim is much more “city” than Bismarck even though the latter appears on many more maps.

It sounds like past debate on this has already raised these points. Nailing down technical meanings for anything in common language is always fraught; reinventing meanings rarely turns out well. It is likely impossible to make place=* intuitive & consistent & useful.

Net result: the scalerank used in Natural Earth works much better for this particular purpose (no surprise, this is exactly what it was designed for). If OSM wanted to add something like importance=*, with a new rubric for entering and using values, that would be great.

2 Likes

Yes, from a cartographic perspective, limiting place=* to a single value at a time is a challenge. For that matter, some well-designed print maps filter and size places purely based on simple heuristics like population and symbol collision plus a boost for capitals, which skirts the whole idea of data-driven place classification and also has a side benefit of a more intuitive legend.

So there’s an inherent limit to how well we can or would want to fit place classification to cartographic needs. But I figure that, as long as we classify populated places at all, at least it should be predictable enough to serve as an input. Currently it borders on chaos in some places, breeding conflict among mappers, which is no good.

One of my goals with this writeup is to raise awareness that a place=* value isn’t necessarily tied to whether the place shows up at enough zoom levels to be discoverable. I think this misconception is a major source of discontent and a reason for fudging the definitions. Maybe we can solve that problem by simply promoting alternative renderers that surface labels more aggressively, as in the demonstration maps I posted. That could lower the stakes so that even the admittedly simplistic thresholds we had in 2006 would satisfy most mappers.

Interestingly, we’ve had folks from the Plains states express a very different opinion, that for example Casper is a big city on par with the likes of Denver. Maybe from a certain perspective, these words have more to do with a local maximum than an absolute value. With enough coaxing, I think we could find a compromise between these extremes that makes Southern Californians, Midwesterners, and New Englanders equally unhappy. :sweat_smile:

This exploration has clarified for me that we need to better distinguish between suburbs and the cities they depend on, rather than relying on a place’s size to communicate that information. Otherwise, a renderer that naïvely categorizes places into buckets by population will get some cities wrong. For example, 90% of the Atlanta urban area’s inhabitants live in suburbs outside the Atlanta city limits, especially in large suburbs like Sandy Springs, which few outside Georgia have ever heard of. Hundreds of cities have only a plurality of the population within the principal or secondary city limits.

As the Census Bureau colorfully illustrates in its history of metropolitan areas, the notion of central and peripheral places has been around longer than central place theory, and laypeople are quite familiar with it intuitively, even if we can’t quite put our finger on a definition. Unfortunately, a suburb is more similar to a place=town than a place=suburb, which is part of the city rather than an adjunct to it.

This is not to say that our place points must comprehensively represent every suburbanite. The Barnstable Town urban area is a classic example of standalone, aimless suburban sprawl. But I think one of the advantages of the approach I explored is that the urban areas often indicate where people self-identify with a larger place. The population of Kirtland, New Mexico, “dropped” by over 90% overnight when the small core of the CDP incorporated in 2015. The outlying residents don’t go by a different name. Their location relative to the town boundary is adequately communicated by the boundary itself, but it would be great if we could somehow gesture at their contribution to the town’s stature in a deterministic way.

Oh hi there

I guess I’m not “most people”, as my city definition is definitely tied to prominence rather than population. I guess it comes with growing up in an “urban area” with a population that doesn’t even show up when you click on the profile (around 12 to 15,000, not including the college). The most populous urban area I’ve lived in has a current population of ~250,000 (Tri-Cities, WA), so I will admit I don’t really have much experience with big cities.

The issue for me with raising the “city” threshold is that (for example) having Rapid City, Summerset, and Box Elder at the same level of classification doesn’t really make a whole lot of sense (granted some of that is on the Census Bureau for granulating that area so finely, I don’t think Summerset and Box Elder deserve their own urban areas outside of Rapid). Maybe adding some sort of state population/population density adjustment to the proposed major/minor urban area cutoff value would help mollify small-state concerns? If, say, SD was an independent country our “city” threshold would be around 10,000 people (which is how the classification works out now, which also corresponds roughly to the high school sports classification (go Spartans)) but that’s not realistic for the rest of the US.

As an aside, the urban areas drawn by the Census Bureau aren’t the most accurate in places (at least where I’ve looked in SD). Doubt it would affect a scheme like this all that much (probably only missing a few hundred people here and there) but just wanted to get that on the record.

1 Like

It’s a choice between two values, so the question is more whether lumping Box Elder and Rapid City into the same category makes more or less sense then lumping Rapid City into the same category as Denver or Minneapolis. Or we could decide that Box Elder isn’t even deserving of town and should be village instead, but then we’d need to establish a heuristic for that other than whether there’s an urban area.

Just to clarify, 10,000 is the traditional minimum population for a town, not a city. If South Dakota has been classifying anything over 10,000 as a city, then that helps to explain some of the current discrepancy.

Could you clarify or give an example? The urban area extents are merely approximations based on census tracts and housing density. They aren’t intended to include everyone, and the exact shape is something you’d never want to map with precision. They work nice as highlights at low and mid zoom levels, but you wouldn’t want to depict them at high zoom levels. I also had to buffer them generously (by a kilometer) in order to classify OSM places as urban versus rural.

I think we a separate scale of prominence. Maybe some type of population multipler. Something that would boost the zoom level at which the place would appear. This would allow each region to have thier own set of relationships. It could also be used to deemphasize less important places in higher population areas. All without modifing the actual place population value.

In general, spinning out a separate key is a sound approach to addressing a case of a skunked tag. But there’s always the question of what to do about the original tag – which, by the way, the rest of the world is still using. If place should be strictly based on population, what population thresholds should there be, and should there be any exceptions? Why should a data consumer use place rather than population?

That makes sense; for the most part they’re good it’s just in the fine details that I start to scratch my head. Like I said, no big deal, just bothering my personal fastidiousness (Why does the Spearfish one include a field between the service road and the interstate all the way out to Broken Boot? Why does it cut through half of Sandstone? Why is Deer Meadows included but not Country School?, etc.).

So building off my idea of a population-based scaling factor, I took the median state/DC/PR population (4,371,546) and density (110.2) (from Population Density of the 50 States, the District of Columbia, and Puerto Rico: 1910 to 2020 (census.gov)) and scaled the median urban area population based on those values. This resulted in this lovely table:

Scaling by population in some way makes more sense looking at it than scaling by density (cities with population of 1,200 in AK!) but this is what a straight scaling factor would look like. I’m not as good with the techy stuff so I can’t output this into a nice map unfortunately. Still needs some tweaking (the straight scale cutoff for TX means only Houston, San Antonio, Dallas, Austin, Fort Worth, and El Paso would be cities, which seems light) but it’s a starting point.

I think most of these of these are the result of the “hops and jumps” methodology that the Census Bureau used to grow an urban area beyond its initial urban core (which is based on a group of census blocks), specifically what’s known as a road connection. On the other end of the spectrum, apparently it wasn’t aggressive enough to link Concord–Walnut Creek to San Francisco–Oakland through the Caldecott Tunnel across a mountain range, which was a point of contention in Slack.

This is reminiscent of something @ZeLonewolf tried recently:

Putting aside the taste-testing, one problem with this approach is that it prioritizes state boundaries that are more or less artificial. Is Texarkana half a city, half a town, because it straddles the line between the big state of Texas and the small state of Arkansas?

If we want to scale place classification to achieve an even density, we would instead use something spatially uniform like quadtiles or hexbinning. But data consumers like OpenMapTiles and external datasets like Natural Earth can already implement scale ranking more effectively, even varying it by zoom level and projection. It goes back to the question of why uniformity would matter for a key like place=*. As I demonstrated earlier, a map can still achieve a usable information hierarchy without uniformity.

By the way, I’m encouraged to hear that you also find the Census Bureau’s definition of “urban” to be a stretch. Maybe as a first step, we could reach a consensus that the 1,675 place=town nodes that fall outside an urban area should be demoted to place=village, regardless of population? This would be a straightforward bulk edit, except that many of these nodes in New England have incorrectly conflated a town with a town and need to be retagged as place=municipality.

Isn’t adjusting incorporation levels just as bad. There should a much simple scaling method? Something where locations are linked together. That way people are not temped to mess objective facts just to mess wit it’s final zoom level. Possibly use the one of the methods mentioned earlier that is already buit-in the rendering software.

Can you clarify what you mean by “adjusting incorporation levels”? Newcomers do sometimes mass-retag places according to incorporated status and other legal details, but no one here is proposing to do that. In fact, I’m hoping we can come up with a classification system intuitive enough that newcomers will be less prone to thinking we meant to classify according to legal status but got it wrong, so it needs to be fixed.