Let’s discuss the key census:population. The wiki does a good job documenting how the key came to be in the database and has some notes under “cleanup” so I recommend reading that.
Here are some stats from poking around a bit:
18,397 features still have this key
18,392 of these features are in the “United States” according to Overpass areas
62 features with the tag census:population do not have the tag population
15,730 features have a census:population and population value that agree on the population
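A tally like the one above can be reproduced offline once the tagged features have been downloaded. The sketch below is only illustrative: it assumes each feature is a plain dict of tags and that census:population uses the `value;year` semicolon form, and `census_value` and `tally` are hypothetical helper names, not part of any OSM tool.

```python
from collections import Counter


def census_value(tags):
    """Return the population part of census:population (text before ';'), or None."""
    raw = tags.get("census:population")
    if raw is None:
        return None
    return raw.split(";")[0].strip()


def tally(features):
    """Count features by how census:population relates to population."""
    counts = Counter()
    for tags in features:
        value = census_value(tags)
        if value is None:
            continue  # feature does not carry the key at all
        counts["has_key"] += 1
        if "population" not in tags:
            counts["missing_population"] += 1
        elif tags["population"].strip() == value:
            counts["agree"] += 1
        else:
            counts["disagree"] += 1
    return counts


# Made-up sample data, purely for illustration.
sample = [
    {"census:population": "770;2006", "population": "770"},
    {"census:population": "770;2006", "population": "812"},
    {"census:population": "141;2006"},
    {"name": "No census tag"},
]
print(tally(sample))
```

The "disagree" bucket is the one worth spot checking by hand, since a string comparison cannot tell which of the two values is the stale one.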
Spot checking ~30 random features where census:population and population disagree leads one to believe that the vast majority of such cases are caused by the census:population value being wildly out of date.
Given the above, my personal feeling is that this is now a dead and useless tag. My first proposal would be to remove this tag from every feature that has a population tag and adjust the other 62 features manually.
What do folks think? If there are no major concerns and no ongoing discussion, I’ll probably get around to this next week.
Would you also add any missing population:date=* or source:population=* at the same time? population=* on its own isn’t very helpful. That was the reason census:population=* wound up with this weird semicolon syntax to begin with.
Some additional observations:
16,114 features have a census:population=* older than 2010 – two decennial censuses ago.
Among the features that additionally have population=*, 3 have matching dates but mismatching population figures in population=* and census:population=*.
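Both observations can be expressed as simple checks on a feature's tags. This is a rough sketch, assuming the `value;year` form for census:population and a population:date that starts with a four-digit year; the function names are hypothetical.

```python
def parse_census_population(raw):
    """Split 'value;year' into (value, year) ints; return None if malformed."""
    parts = [p.strip() for p in raw.split(";")]
    if len(parts) != 2 or not all(p.isdigit() for p in parts):
        return None
    return int(parts[0]), int(parts[1])


def is_stale(tags, cutoff_year=2010):
    """True if the census:population year is older than cutoff_year."""
    parsed = parse_census_population(tags.get("census:population", ""))
    return parsed is not None and parsed[1] < cutoff_year


def same_date_different_value(tags):
    """True if population:date matches the census year but the figures differ."""
    parsed = parse_census_population(tags.get("census:population", ""))
    if parsed is None or "population" not in tags:
        return False
    value, year = parsed
    date = tags.get("population:date", "")
    return date[:4] == str(year) and tags["population"].strip() != str(value)
```

Malformed values such as a bare `2006` fail the parse and are reported by neither check, so they still need a separate look.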
I think there’s a second issue with cleanup and updating population tags generally. I suspect I will chew off small pieces of that over the coming months. Getting things standardized will make whatever the next bit is somewhat easier… something something “eating an elephant”.
For cases where census:population and population match, I am happy to add population:date (of whatever the date portion is) and source:population (US Census?).
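The matching case could look roughly like this. It is a sketch under assumptions, not the actual edit procedure: the `value;year` form is assumed, `migrate_tags` is a hypothetical name, and the `"US Census"` default is only a placeholder for whatever source:population value gets agreed on.

```python
def parse_census_population(raw):
    """Split 'value;year' into (value, year) strings; return None if malformed."""
    parts = [p.strip() for p in raw.split(";")]
    if len(parts) != 2 or not all(p.isdigit() for p in parts):
        return None
    return parts[0], parts[1]


def migrate_tags(tags, source="US Census"):
    """Fold census:population into population:date / source:population.

    Returns the tags unchanged whenever the case is not clear-cut,
    so that it can go on the manual review list instead.
    """
    parsed = parse_census_population(tags.get("census:population", ""))
    if parsed is None:
        return dict(tags)  # missing or malformed value
    value, year = parsed
    if tags.get("population", "").strip() != value:
        return dict(tags)  # figures disagree
    existing_date = tags.get("population:date")
    if existing_date is not None and existing_date[:4] != year:
        return dict(tags)  # dates disagree
    migrated = {k: v for k, v in tags.items() if k != "census:population"}
    migrated.setdefault("population:date", year)
    migrated.setdefault("source:population", source)
    return migrated


print(migrate_tags({"population": "770", "census:population": "770;2006"}))
```

Using `setdefault` means an existing source:population left by a mapper is never overwritten.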
I will add the 4 cases where the population:date is later than the one indicated in census:population to my list of manual review and cleanup.
The wiki suggests removal. Do we have any wiki experts that could add the template to that article saying that the tag is discouraged or deprecated? @Minh_Nguyen maybe?
I am working my way through the remaining items. There are lots of clusters where folks updated chunks of states, so it’s not quite going one by one. Thankfully.
The bulk of the time goes to looking at edit histories, trying to sort out any inconsistencies and adding source tags where I can reliably infer source info from the original edits. Some are just busted and I have manually added the real data from the 2020 Census.
I hope to get another big check-in done in the next day or two.
Currently looking at features with minimal tagging (no source:population etc)…
There are ~800 features that have this key from the original import (2006 as the year) and a mismatched value for population, but that also have a wikidata tag. I have spot checked these and the population values seem mostly populated from a MapRoulette task that folks worked through over the intervening years.
Under the presumption that the population tag is now out of sync because it has been updated post 2006… I am tempted to bulk delete the old census:population key instead of going one by one through features of this type. Spot checks of histories of randomly selected places in this category seem to indicate this is sensible. Thoughts?
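For anyone wanting to check the selection, the category described here boils down to a three-part filter. A minimal sketch, assuming the `value;year` form and a hypothetical function name:

```python
def is_bulk_delete_candidate(tags):
    """True for features with a 2006 census:population, a different
    population value, and a wikidata tag."""
    raw = tags.get("census:population", "")
    value, sep, year = raw.partition(";")
    return (
        sep == ";"
        and year.strip() == "2006"
        and "population" in tags
        and tags["population"].strip() != value.strip()
        and "wikidata" in tags
    )
```

Features that fail any of the three conditions fall outside this category and would stay on the one-by-one list.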
This is the changeset for the work described earlier where I was looking at various clusters of updated population tagging. Changeset: 147474022 | OpenStreetMap
There was a huge update to population data in NY by one user who helpfully left a source:population tag with a reasonable and consistent value (Changeset: 118146251 | OpenStreetMap)
A few users had picked various states and updated major cities in a predictable pattern
A user modified a big chunk of CA populations but then used census:population as a population:date field… mostly 2015. This also happened in MT (Changeset: 104278469 | OpenStreetMap)
Okay, I have gone ahead and cleaned up the “2006” tagged things as mentioned above. I fixed up a handful of validator warnings about mismatched wikidata entries etc. Changeset: 147559075 | OpenStreetMap
This leaves ~70 items needing review and a small handful of items I’m still tracking down updated census info for. For example, Stillwell (Node History: Stillwell (153596554) | OpenStreetMap) doesn’t seem to have updated 2020 census info on their web portal.
There’s a lot going on here. This node was originally imported as a place=village named Hamlet and located where USGS topographical maps show a survey control point named Stillwell. This point is just 1 mile to the southeast of an unincorporated community also named Stillwell, which was imported as a place=hamlet node with no population tag. The topos show no sign of a “Hamlet” in the vicinity.
However, in GNIS, Hamlet has two different coordinates on either side of the LaPorte–Starke county line, unusually for a feature in the Populated Place class. The other coordinate, at the actual Town of Hamlet, was apparently never imported, so it had to be added manually. Back at the survey control point, the village node for Hamlet was tagged with a population of 770, which probably corresponded to the town, not the unincorporated community. The Census Bureau only publishes population figures for an unincorporated community if a CDP or urban area has been defined for it, but that doesn’t appear to be the case for Stillwell.
In 2016, Mapbox renamed Hamlet to “Stillwell”, citing coordinates in the Wikipedia article by that name. In 2022, it was merged with the hamlet, keeping the population=* tag and both feature IDs and elevations but leaving a malformed census:population=2006. Finally, last year, you deduplicated the tags on the merged node, choosing the town’s elevation and population but the unincorporated community’s feature ID.
Overall, OSM is in better shape than GNIS. But ele=* should be changed back to 215 meters and population=* and census:population=* should both be removed, unless you can find a non-census population estimate. The feature ID 435652 can be tagged on the actual Hamlet.
I suppose it’s nice that my initial impression of the history was “this is weird and bad somehow”. You know you’ve really made it as an OSMer when you show up two or more times in a summary of some issue. I’ll sort it out and get things recombobulated.
I’ll also go review whatever edits I was doing that caused the merge.
An OSMUS Slack user has commented that Indiana has an estimate for Stillwell of 141, which seems sensible.
I think I have managed to vanquish this key as well as address the issue above. If anyone finds something amiss, please let me know. Happy to get things sorted.
I have added a note about the cleanup to the wiki with a link back to this thread.
I have taken the liberty of cleaning up all instances of population:census:NNNN where NNNN is a year. There’s a ton more of that worldwide but I don’t feel comfortable at this time blasting everything off of nodes in Ukraine.
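Matching that key pattern is straightforward with a regular expression. The sketch below assumes that "cleaning up" means dropping the keys and that NNNN is exactly four digits; `strip_census_year_keys` is a hypothetical helper name.

```python
import re

# Matches keys such as population:census:2010, but not population:census alone.
CENSUS_YEAR_KEY = re.compile(r"population:census:\d{4}")


def strip_census_year_keys(tags):
    """Return a copy of the tags without any population:census:NNNN keys."""
    return {k: v for k, v in tags.items() if not CENSUS_YEAR_KEY.fullmatch(k)}


# Made-up tags, purely for illustration.
tags = {
    "name": "Example",
    "population": "100",
    "population:census:2001": "90",
    "population:census": "80",
}
print(strip_census_year_keys(tags))
```

Using `fullmatch` keeps longer or differently shaped keys untouched, which matters when the filter is restricted by region rather than run worldwide.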