I’m pleased to announce a new utility for finding issues with boundary relations in the United States. I operate a data consumer that uses boundary data, so having good boundary data is important to me. This utility inspects boundaries with admin_level values between 7 and 9 as well as CDPs tagged boundary=census. I also cross-reference these boundaries with Wikidata and, in the case of CDPs, the US Census Bureau.
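As a rough illustration of the scope described above, here is a minimal Python sketch; it is not the utility's actual code. The function name `in_scope`, the requirement that admin_level boundaries also carry boundary=administrative, and treating "between 7 and 9" as inclusive are all assumptions made for the example.

```python
def in_scope(tags: dict[str, str]) -> bool:
    """Return True if a relation's tags fall within the scope described above.

    Assumptions (not confirmed by the tool's author): admin_level boundaries
    are tagged boundary=administrative, and the 7..9 range is inclusive.
    """
    boundary = tags.get("boundary")
    if boundary == "census":
        return True
    if boundary == "administrative":
        try:
            level = int(tags.get("admin_level", ""))
        except ValueError:
            # Missing or non-numeric admin_level: treat as out of scope.
            return False
        return 7 <= level <= 9
    return False
```

For example, `in_scope({"boundary": "administrative", "admin_level": "8"})` is true, while a relation with admin_level=6 or with no boundary tag at all is skipped.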
At this point, I am most interested in hearing feedback on the utility itself: whether the checkers are picking up the right things and, if not, how the logic should work instead (the more specific, the better). I am also interested in any additional checks that I might implement at the tagging and correlation level, as there are other tools for checking geometry validity. If you are motivated to correct any issues identified by this tool, that is of course most welcome!
This tool makes use of external data from Wikidata and the US Census Bureau. We may find that a problem flagged in this tool is a Wikidata issue rather than an OSM issue, so I expect that, state by state, someone knowledgeable will need to manually inspect these findings to figure out how they should be resolved, or whether the tool is reporting a false positive.
Great tool! I looked at some of the missing ones - they are already entered as “place=neighborhood” nodes. Would the proposed correction be to find the latest CDP boundary from TIGER and convert the node to some type of area polygon?
Since we’ve mapped CDPs (initially as a mistake but over time we’ve kept them), I would recommend bringing in the latest CDP boundaries from the census bureau and maintaining them, as they can change over time. However, the place node may still be appropriate if it represents a named population center.
I wouldn’t know what checks would be appropriate to apply in those cases, but if you have a good handle on what the logic should be, I would welcome a writeup describing it.
I wonder if that’s an artifact of the old TIGER data. The new CDP boundaries are supposed to be distinct from “incorporated” places, as recently discussed on Slack.
Just to put it out there (this is a pet peeve of mine): the TIGER boundaries are on the original NAD83 and don’t “fit” GPS data or imagery... there is a noticeable offset (this was the case with the original import as well). I have the 2023 TIGER boundaries converted to WGS84(G2139) up on Google Drive at TIGER 2023 Places - WGS84(G2139)
Thanks for this QA tool! I’ll be chipping away at Vermont boundaries.
Why do you recommend this? CDPs don’t seem a great fit for OSM to me as they have no on the ground evidence of existence (at least that I’ve seen). The place name corresponding with a CDP will have on the ground evidence, but the actual polygon designated by the Census Bureau is generally not something marked in the real world unless it is an exact match for an admin boundary (in which case we’d map that).
It’s also kind of unclear what to do when a CDP boundary in TIGER seems to want to follow some real-world feature but doesn’t. Sometimes it’s just the usual TIGER exaggerations; other times, an adjacent municipality has annexed into the CDP, but the CDP hasn’t been officially updated yet.
Retagging the imported CDPs as boundary=census was a concession for the CDPs that fit a popular notion of a named place’s extent, such as Bethesda, Maryland. We didn’t want the boundary to seem authoritative and administrative, but it wasn’t necessarily a call to action for completeness. For those named places with well-defined boundaries, there’s also boundary=place, which applies even where there’s no CDP, like in this Buffalo neighborhood.
What I mean to say is that since, in the 15 years since TIGER 2008 when these all showed up, we haven’t managed to form any consensus to remove them, we may as well maintain them if they’re going to be in the database.
With my data consumer hat on, I will say that they’re useful in places where you need municipal boundaries where they don’t otherwise exist (for example, Hawaii and Maryland). This is also not a specific argument to keep them, but I did want to point out that they are selectively in use.
It would be easy enough to tune the QA tool if we decided to do away with these, but I’d advise holding that discussion in a dedicated thread.
The US continues to be polished to a high gloss in OSM: I find it exciting that we have such enthusiasm to improve / clean-up both TIGER-imported boundaries (sometimes boundary=census data makes sense to include in OSM, sometimes less so, especially as these do not age well at all), as well as boundary polygon relations in general.
I thank all the tool builders, mappers and dialog / discussion participants on how we best do this, as it both has resulted in and continues to impress many that OSM can produce really high-quality data that emerges with both consensus among us (we, the contributors / owners of these data) as well as virtually continual improvement that makes our data better and better as the years go by. As someone who watches, participates, maps, wiki-writes and tool-builds myself (e.g. MapRoulette), this isn’t simply self-congratulatory (I don’t like to “toot my own horn”), I really mean to say “thanks to many” here. Yeah!
There are at least two boundaries in Maryland that are both a military base and a census designated place. How should this be handled on the Wikidata side? Example
The issue is that the Wikidata link is for the base, not the CDP. Should I create a separate Wikidata item for the CDP and use P1889 to flag it as different?
Bug report? There is one flagged as not being on the CDP list for Maryland (Relation: Woodlawn (133521) | OpenStreetMap). I think it is a bug because this item is on that list. There are two CDPs with the same name. A possible bug is that your QA tool is missing the second occurrence of the same name.
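As a guess at how a second occurrence could go missing (this is not based on the tool's actual code), a lookup keyed only by name would silently keep one entry per name. A small Python sketch of the difference follows; the rows and the `cdp-A` style identifiers are hypothetical placeholders, not real Census codes.

```python
from collections import defaultdict

# Hypothetical (name, identifier) rows standing in for a state's CDP list.
census_rows = [("Woodlawn", "cdp-A"), ("Woodlawn", "cdp-B"), ("Bethesda", "cdp-C")]

# A plain dict keyed by name keeps only the last row seen for each name.
by_name_lossy = {name: ident for name, ident in census_rows}

# Grouping identifiers into lists keeps every row that shares a name.
by_name = defaultdict(list)
for name, ident in census_rows:
    by_name[name].append(ident)

print(by_name_lossy["Woodlawn"])  # cdp-B
print(by_name["Woodlawn"])        # ['cdp-A', 'cdp-B']
```

If something like this is the cause, matching on name plus another attribute (county, or a Census identifier) would tell the two apart.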
Why do I have the feeling that as Brian does this, he (and the rest of us, by osmosis of using his tool) are going to get some serious schooling in many other examples of “what the hell, (fill in the place with its very own boundary quirks)”?
But, (to quote Martha Stewart): “that’s a good thing.”
Here’s another one. Maybe a Wikidata expert can help. Bennsville, Maryland is misspelled by the US Census. This is even noted in the article. Is there a Wikidata property to say “this is often misspelled as”? The flag will remain since Wikidata has the correct spelling but USCB is wrong.
I’ve said this many times in the context of OSM (-US), as it is true and widely acknowledged: there is “what is” and there is “what the Census Bureau says there is.” Now, I realize that sometimes what the Census Bureau says is quite helpful, especially in the (sometimes quite narrow) context in which it states its data, and so what the differences are can be helpful to be pointed out. Even at the granularity of a case-by-case basis, it frequently behooves OSM to carefully ask ourselves “well, so what? that the Census Bureau says (fill in the blank).” And then, there are times and places where Census Bureau data are quite valuable to OSM.
My father served in the US Army and was once told by his commanding officer, “There’s the right way, the wrong way, and the Army way.” OSM data and Census Bureau data (as we compare and consider whether correct or appropriate or not) are kind of like that.