Cleanup of old imported key "census:population" in the United States

Let’s discuss the key census:population. The wiki does a good job documenting how the key came to be in the database and has some notes under “cleanup” so I recommend reading that.

Here are some stats from poking around a bit:

  • 18,397 features still have this key
  • 18,392 of these features are in the “United States” according to Overpass areas
  • 62 features with the tag census:population do not have the tag population
  • 15,730 features have census:population and population values that agree
  • Spot checking ~30 random features where census:population and population disagree leads one to believe that the vast majority of such cases are caused by the census:population value being wildly out of date.
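For anyone who wants to reproduce tallies like these, here is a rough Python sketch of the bucketing. It assumes each feature has already been fetched as a plain dict of tags (for example from an Overpass export) and that census:population values look like `value;year`. The function names and the sample data are made up for illustration; this is not the script behind the numbers above.

```python
from collections import Counter


def census_value(raw):
    """Return the number before the first semicolon, or None if it is not numeric.

    Assumes the value is formatted as "value;year" (an assumption, see above).
    """
    head = raw.split(";", 1)[0].strip()
    return int(head) if head.isdigit() else None


def classify(tags):
    """Place one feature's tags into one of the buckets used in the stats."""
    if "census:population" not in tags:
        return "no_census_key"
    if "population" not in tags:
        return "missing_population"
    census = census_value(tags["census:population"])
    population = tags["population"].strip()
    if census is not None and population.isdigit() and int(population) == census:
        return "agree"
    return "disagree"


def tally(features):
    """Count how many features fall into each bucket."""
    return Counter(classify(tags) for tags in features)


# Hypothetical sample data, not real OSM features.
sample = [
    {"census:population": "770;2006", "population": "770"},
    {"census:population": "770;2006", "population": "812"},
    {"census:population": "141;2000"},
    {"population": "5000"},
]
counts = tally(sample)
```

Anything that fails to parse lands in "disagree", which matches the idea of sending odd values to manual review rather than guessing.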

Given the above, my personal feeling is that this is now a dead and useless tag. My first proposal would be to remove this tag from every feature that has a population tag and adjust the other 62 features manually.

What do folks think? If there are no major concerns or ongoing discussion, I’ll probably get around to this next week.

Would you also add any missing population:date=* or source:population=* at the same time? population=* on its own isn’t very helpful. That was the reason census:population=* wound up with this weird semicolon syntax to begin with.

Some additional observations:

I think there’s a second issue with cleanup and updating population tags generally. I suspect I will chew off small pieces of that over the coming months. Getting things standardized will make whatever the next bit is somewhat easier… something something “eating an elephant”.

For cases where census:population and population match, I am happy to add population:date (of whatever the date portion is) and source:population (US Census?).

I will add the 4 cases where the population:date is later than the one indicated in census:population to my list for manual review and cleanup.

I started working on this today. Took the 2 easiest cases and knocked those out:

  1. No conflicting population tags of any kind. Split the tag into the appropriate tags and added a source tag. Changeset: 147392324 | OpenStreetMap
  2. The population values agree and there is no other source tagging. Moved the date to population:date, deleted the tag, and added a source tag. Changeset: 147396213 | OpenStreetMap, Changeset: 147396201 | OpenStreetMap
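The two cases above can be sketched roughly as the following tag transformation. This is only an illustration, not the code behind those changesets: it assumes the old value is formatted as `value;year`, the function name is hypothetical, and the `"US Census"` source string is a placeholder for whatever source value is actually used.

```python
def split_census_tag(tags, source="US Census"):
    """Return a new tag dict with census:population split out.

    Returns None when the feature should go to manual review instead:
    a malformed value, a conflicting population, or existing source tagging.
    """
    raw = tags.get("census:population")
    if raw is None:
        return dict(tags)  # nothing to clean up

    # Assumed format: "value;year", e.g. "770;2006".
    parts = [part.strip() for part in raw.split(";")]
    if len(parts) != 2 or not all(part.isdigit() for part in parts):
        return None  # malformed, e.g. a bare "2006"
    value, year = parts

    existing = tags.get("population")
    if existing is not None and existing.strip() != value:
        return None  # population disagrees with the census value
    if "population:date" in tags or "source:population" in tags:
        return None  # other source tagging already present

    new_tags = {k: v for k, v in tags.items() if k != "census:population"}
    new_tags["population"] = value
    new_tags["population:date"] = year
    new_tags["source:population"] = source  # placeholder value
    return new_tags
```

Calling it on `{"name": "Example", "census:population": "770;2006"}` covers case 1, and the same input with `"population": "770"` added covers case 2. Everything else returns None and is left for a human.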

I have added a single hashtag #CensusTagCleanup to these in case we need to track them down for easy revert.

The wiki suggests removal. Do we have any wiki experts that could add the template to that article saying that the tag is discouraged or deprecated? @Minh_Nguyen maybe?

I updated the article and data item to reflect the key’s deprecated status.

I am working my way through the remaining items. There’s lots of clusters where folks updated chunks of states so it’s not quite going one by one. Thankfully.

The bulk of the time goes to looking at edit histories, trying to sort out any inconsistencies and adding source tags where I can reliably infer source info from the original edits. Some are just busted, and for those I have manually added the real data from the 2020 Census.

I hope to get another big check-in done in the next day or two.

Currently looking at features with minimal tagging (no source:population etc)…

There are ~800 features that have this key from the original import (2006 as the year) and a mismatched value for population, but also have a wikidata tag. I have spot checked these and they seem to have been mostly populated from a MapRoulette task that folks worked through over the intervening years.

Under the presumption that the population tag is now out of sync because it has been updated post 2006… I am tempted to bulk delete the old census:population key instead of going one by one through features of this type. Spot checks of histories of randomly selected places in this category seem to indicate this is sensible. Thoughts?

Here’s a good example: Node History: Star (150973584) | OpenStreetMap
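To make the selection rule for this category concrete, here is a small Python sketch of the filter as described: original import year, a population that no longer matches, a wikidata tag, and otherwise minimal tagging. The function name, the `value;year` format, and the exact list of "source" keys are assumptions for illustration only.

```python
# Keys whose presence means the feature is not "minimally tagged" (assumed list).
SOURCE_KEYS = ("source:population", "population:date")


def is_bulk_delete_candidate(tags, import_year="2006"):
    """True if census:population looks safe to drop without a per-feature review."""
    raw = tags.get("census:population")
    if raw is None or ";" not in raw:
        return False  # missing or malformed: handle by hand
    value, year = (part.strip() for part in raw.split(";", 1))
    if year != import_year:
        return False  # not from the original import
    if "wikidata" not in tags or "population" not in tags:
        return False
    if any(key in tags for key in SOURCE_KEYS):
        return False  # has source tagging, belongs to a different bucket
    # Only a mismatch suggests population was updated after the import.
    return tags["population"].strip() != value
```

A filter like this only narrows the list; spot checking edit histories is still what justifies the bulk delete.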

This is the changeset for the work described earlier where I was looking at various clusters of updated population tagging. Changeset: 147474022 | OpenStreetMap

  • There was a huge update to population data in NY by one user who helpfully left a source:population tag with a reasonable and consistent value (Changeset: 118146251 | OpenStreetMap)
  • A few users had picked various states and updated major cities in a predictable pattern
  • A user modified a big chunk of CA populations but then used census:population as a population:date field… mostly 2015. This also happened in MT (Changeset: 104278469 | OpenStreetMap)
  • etc

Okay, I have gone ahead and cleaned up the “2006” tagged things as mentioned above. I fixed up a handful of validator warnings about mismatched wikidata entries etc. Changeset: 147559075 | OpenStreetMap

This leaves ~70 items needing review and a small handful of items I’m still tracking down updated census info for. For example: Node History: Stillwell (153596554) | OpenStreetMap doesn’t seem to have updated 2020 census info on their web portal.

There’s a lot going on here. This node was originally imported as a place=village named Hamlet and located where USGS topographical maps show a survey control point named Stillwell. This point is just 1 mile to the southeast of an unincorporated community also named Stillwell, which was imported as a place=hamlet node with no population tag. The topos show no sign of a “Hamlet” in the vicinity.

However, in GNIS, Hamlet has two different coordinates on either side of the LaPorte–Starke county line, unusually for a feature in the Populated Place class. The other coordinate, at the actual Town of Hamlet, was apparently never imported, so it had to be added manually. Back at the survey control point, the village node for Hamlet was tagged with a population of 770, which probably corresponded to the town, not the unincorporated community. The Census Bureau only publishes population figures for an unincorporated community if a CDP or urban area has been defined for it, but that doesn’t appear to be the case for Stillwell.

In 2016, Mapbox renamed Hamlet to “Stillwell”, citing coordinates in the Wikipedia article by that name. In 2022, it was merged with the hamlet, keeping the population=* tag and both feature IDs and elevations but leaving a malformed census:population=2006. Finally, last year, you deduplicated the tags on the merged node, choosing the town’s elevation and population but the unincorporated community’s feature ID.

Overall, OSM is in better shape than GNIS. But ele=* should be changed back to 215 meters and population=* and census:population=* should both be removed, unless you can find a non-census population estimate. The feature ID 435652 can be tagged on the actual Hamlet.

I suppose it’s nice that my initial impression of the history was “this is weird and bad somehow”. You know you’ve really made it as an OSMer when you show up two or more times in a summary of some issue. I’ll sort it out and get things recombobulated.

I’ll also go review whatever edits I was doing that caused the merge.

An OSMUS Slack user has commented that Indiana has an estimate for Stillwell of 141, which seems sensible.

I think I have managed to vanquish this key as well as address the issue above. If anyone finds something amiss, please let me know. Happy to get things sorted.

I have added a note about the cleanup to the wiki with a link back to this thread.

There is a wiki page for this key in Russian… should I try and sort that out? What’s the typical way to handle this?

You can try to sort it out yourself, or slap a {{Translation out of sync}} on it to alert readers.

I have taken the liberty of cleaning up all instances of population:census:NNNN where NNNN is a year. There’s a ton more of that worldwide but I don’t feel comfortable at this time blasting everything off of nodes in Ukraine.
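For reference, matching keys of the form population:census:NNNN can be sketched with a regular expression like the one below. This is a minimal illustration with a hypothetical function name; it does not model the geographic restriction mentioned above, which would have to be applied when selecting features.

```python
import re

# Matches exactly "population:census:" followed by a four-digit year.
YEAR_KEY = re.compile(r"population:census:\d{4}")


def strip_census_year_keys(tags):
    """Return a copy of the tags without any population:census:NNNN keys."""
    return {k: v for k, v in tags.items() if not YEAR_KEY.fullmatch(k)}
```

Using `fullmatch` keeps near misses such as `population:census` or a five-digit suffix untouched, so only the exact year pattern is removed.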