TIGER data quality

Continuing the discussion from Overturemaps.org - big-businesses OSMF alternative:

This was raised in the Foundation section by @Richard. Thought it would be appropriate to continue the discussion here.

I don't know the exact pipeline by which the Census Bureau gets its data and ultimately aggregates it into the TIGER data set. But it's pretty clear from other Americans I've discussed it with that the quality issues seem to be very localized. In my state (RI), the number of TIGER data errors I've found has been very small. At one point I set myself the challenge of running every single street in two towns in my state, and after this exhaustive survey there were only a small number of errors, the most common being dead ends that went further on the map than in reality.

Some classes of problems were common to the entire TIGER import, but we largely fixed those. Off the top of my head: abbreviations in road names, disconnections at county lines, divided roads intentionally digitized as single carriageways, disconnections at railroad crossings. Probably the only one that's still very widespread is the name_* tags, because that requires a lot of manual review.

Otherwise, the classes of problems that remain are specific to individual counties (albeit many counties) scattered throughout the country. For example, overnodedness happens in many counties but is difficult to clean up en masse. Many counties also have every private driveway tagged as a highway=residential. Some have every parking lot outlined as a highway=residential. It's easy to perceive these problems as being common to the entire import because of our personal experiences focusing on counties where these things occur.

Agreed that the problems are very localized. At this point the best we can do is therefore come up with local solutions. I've done state-wide cleanup challenges in #maproulette before but those prove too daunting... tens of thousands of tasks. Here in Utah the main problem remaining, and it's a big one, is the huge number of "made up" rural roads: geometry that does not represent anything currently existing on the ground, and that I would not recommend anyone rely on for anything. I've considered proposing deleting everything in Utah that was imported from TIGER and never looked at by a human, either as a bulk operation or as something more targeted (not sure what that would look like). Any experience with localized "reverting" of TIGER import data?

Would it be possible to build some sort of heatmap overlay showing the density of potential remaining TIGER issues? This could help interested mappers find areas to focus on.
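
One rough way to approach such an overlay, as a sketch only: bin the centroids of suspect ways into a latitude/longitude grid and count them per cell, then colour the cells by count. The function names, the 0.1-degree cell size, and the sample coordinates below are all made up for illustration; real input would come from an Overpass or planet extract.

```python
from collections import Counter
import math


def cell_for(lat, lon, cell_deg=0.1):
    """Return the (row, col) index of the grid cell containing a point."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def density_grid(points, cell_deg=0.1):
    """Count suspect ways per grid cell from their (lat, lon) centroids."""
    return Counter(cell_for(lat, lon, cell_deg) for lat, lon in points)


# Hypothetical centroids of suspect ways (invented coordinates, not real data).
sample = [(37.21, -109.93), (37.24, -109.98), (37.26, -109.91), (38.57, -109.55)]
grid = density_grid(sample)
hottest = grid.most_common(1)[0]  # (cell index, number of suspect ways)
```

A fixed-degree grid is the simplest choice; cells shrink in ground area toward the poles, so a tile- or county-based aggregation may be a better fit for a national overlay.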

So the big issue is A41-class roads in rural areas, particularly (though not exclusively) away from the coasts. These were imported as highway=residential. Some are indeed residential highways! But most aren't. Some would be better as highway=track, some as highway=unclassified, surface=unpaved (or something more nuanced), and some would be better simply deleted.

My interest is that this is particularly problematic for cycle routing. Car routing is less affected because it prefers the higher routes in the hierarchy (trunk, primary, etc.), which are mostly fixed. Bike routing is the opposite - it prefers unclassified/residential etc. - and if you try and route across the US on highway=residential, you will basically die of dysentery somewhere on the way to Oregon.

We're unfortunately a long way past anything that can be sanely reverted. There have been lots of incremental little fixups over the years, plus the blizzard of corporate edits relating to driveways, that mean most ways have indeed been edited somehow over time. As an example, there are several counties where (say) a maxspeed=45 mph tag has been added to every highway in the county because that's the local ordinance. This means you might find a drainage ditch tagged with highway=residential, maxspeed=45 mph :wink:

A smart(ish) data consumer can go a long way to alleviating these issues with several heuristics. cycle.travel is very distrustful of highway=residential with tiger:reviewed=no (and no surface tag) in rural areas. The upshot is that you can route across the US with cycle.travel and you will probably not die of dysentery. But it would be better to improve the source data.
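
To make the idea concrete, here is a minimal sketch of how a data consumer might express that kind of distrust as a routing cost multiplier. This is not cycle.travel's actual code: the function name, the `rural` flag (which the caller would have to derive somehow), and the multiplier of 5.0 are assumptions chosen purely for illustration.

```python
def residential_penalty(tags, rural):
    """Return a cost multiplier for a way; values above 1 make a router avoid it.

    `tags` is a dict of OSM tags, `rural` is a caller-supplied boolean.
    The 5.0 multiplier is an arbitrary illustrative value.
    """
    suspect = (
        rural
        and tags.get("highway") == "residential"
        and tags.get("tiger:reviewed") == "no"
        and "surface" not in tags
    )
    return 5.0 if suspect else 1.0


# An unreviewed rural residential with no surface tag gets penalised...
untrusted = residential_penalty(
    {"highway": "residential", "tiger:reviewed": "no"}, rural=True
)
# ...while the same way with a surface tag is treated normally.
trusted = residential_penalty(
    {"highway": "residential", "tiger:reviewed": "no", "surface": "asphalt"},
    rural=True,
)
```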

I'm not against carefully targeted automated edits when the point is to fix issues with an earlier automated edit, i.e. TIGER. For example, a few states (Colorado springs to mind) publish open data on road surfaces. This could be sanely brought into OSM. The imagery-derived surface detection mentioned in another thread also looks really promising. And, of course, maybe Overture Maps will be releasing some relevant open data... who knows.

But anything automated would have to be very carefully reviewed for fear of blatting the few usable heuristics we have at the moment.

You should be, and you'd know because you've looked into this pretty deeply. My heuristic for separating wheat from chaff is:

  1. highway=residential
  2. No name=*
  3. existence of tiger:cfcc and / or tiger:reviewed=no
  4. Last touched date long ago (I usually query for <2012)

There are a bunch of "advanced" Overpass queries that have more elaborate criteria; @Minh_Nguyen would know where to find these.
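
For anyone post-processing downloaded data rather than querying Overpass, the four criteria above can be sketched as a simple predicate. The function name and the input shape (a tag dict plus a timezone-aware last-edit timestamp) are assumptions for illustration; the 2012 cutoff follows the "<2012" in the list.

```python
from datetime import datetime, timezone

# Ways last touched before this date count as "long ago" (criterion 4).
CUTOFF = datetime(2012, 1, 1, tzinfo=timezone.utc)


def looks_untouched(tags, last_edit):
    """Return True if a way matches all four 'chaff' criteria."""
    has_tiger = "tiger:cfcc" in tags or tags.get("tiger:reviewed") == "no"
    return (
        tags.get("highway") == "residential"  # 1. highway=residential
        and "name" not in tags                # 2. no name=*
        and has_tiger                         # 3. tiger:cfcc and/or tiger:reviewed=no
        and last_edit < CUTOFF                # 4. last touched long ago
    )


old_edit = datetime(2009, 5, 1, tzinfo=timezone.utc)
candidate = looks_untouched({"highway": "residential", "tiger:cfcc": "A41"}, old_edit)
named = looks_untouched(
    {"highway": "residential", "tiger:cfcc": "A41", "name": "Main Street"}, old_edit
)
```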

My last experience with this was in 2009 when I asked for all of Greene County, Ohio, to be deleted, because it had been imported twice, every road duplicating another copy of the road without any connections between them. I had already made many edits in the area and had haphazardly deleted many roads from one or the other import, but it only took me a couple months to recover. I don't think we could've done something that clean in 2011. Granted, Greene County is much more developed than some of the counties in Utah where you map.

Here are some Overpass queries for unedited TIGER ways. As @Richard points out, there are many false negatives because of driveway editing and such. The public Overpass instance can't handle querying for TIGER unedited ways beyond a small area.

A couple years ago, I developed some SPARQL queries to determine the most deserted TIGER desert counties, and even refined it down to individual ZIP codes. Unfortunately, Sophox is no longer reliable for these queries, but I made this snapshot in 2020 that might still be useful.

This is an Overpass query Iā€™ve used in the past that trades off reasonable speed for reasonable accuracy:

rel(161993);map_to_area->.a; // state of utah
way // consider ways that...
  [highway=residential] // are residential 
  [!name] // do not have name tag
  ["tiger:cfcc"] // have tiger:cfcc tag which was created as part of the import
  (if:timestamp() < "2013-01-01T00:00:00Z") // have timestamp before 2013
  (area.a); // are in the defined area
out meta geom; // output geometry and metadata

This yields 2176 ways. If you leave out the highway=residential criterion it's up to 8000+.
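
To reuse the query above for another area or cutoff, one option is to generate the text from a small helper. This is only a sketch: `build_query` and its parameters are hypothetical names, and it only builds the query string; sending it to an Overpass endpoint is left out.

```python
def build_query(relation_id, cutoff="2013-01-01T00:00:00Z", residential_only=True):
    """Build the Overpass QL text for untouched TIGER ways inside one relation."""
    lines = [
        f"rel({relation_id});map_to_area->.a;",  # turn the boundary relation into an area
        "way",
    ]
    if residential_only:
        lines.append("  [highway=residential]")  # optional: residential ways only
    lines += [
        "  [!name]",                             # no name tag
        '  ["tiger:cfcc"]',                      # tag created by the TIGER import
        f'  (if:timestamp() < "{cutoff}")',      # last edited before the cutoff
        "  (area.a);",                           # inside the area defined above
        "out meta geom;",                        # output geometry and metadata
    ]
    return "\n".join(lines)


utah_query = build_query(161993)  # 161993 is the Utah relation used above
utah_all_highways = build_query(161993, residential_only=False)
```

Dropping `residential_only` reproduces the broader variant mentioned above, where the highway=residential criterion is left out.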

There's possibly an argument that some of these highway=residential roads should be automatically retagged as highway=road - which itself is effectively a fixme tag.

NM is probably the state with the most un-road-like A41s IME. Some of the geometry in WV is pretty shocking though...

I can see that it might make sense to remove untouched TIGER in 'wilderness' areas, but even then by someone in the region who has familiarity.

But locally it would make no sense to take any sort of automated or mass action. I've been going over counties and setting the surface and geometry where not hidden under trees, using the US Tasking manager. Even that requires knowledge of regional road construction, soil, and any possible better local authoritative sources of road names and geometry. I've also found that the commercial 'driveway mappers' have often taken the time to improve roads nearby where the original TIGER was wonky.

Welcome to the new forums, @MikeN !

As a compromise, I think it is feasible to do MapRoulette challenges for smaller areas. I discovered an interesting dividing line at the Navajo Nation boundary where inside NN there are many old, untouched TIGER residential roads, and outside NN (at least on the Utah side) almost none. I don't know if this is just selective mapper activity or inconsistencies in the TIGER data coverage, or something else. But I made a MapRoulette challenge to encourage people to help with building a better road network for the Navajo Nation: Martijn van Exel: "Mapping Inequality The map below shows SE Utah w..." - En OSM Town | Mapstodon for OpenStreetMap

This is a neat little snippet of OT code, thanks Martijn. I've entered something similar for my county (in California), linked in our (county-level) wiki; it produces a somewhat "richer / deeper" set of data (both nodes and ways).

I really, really miss the wonderful, deprecated (summer of '19?) ITO World "TIGER Cleanup" (I think it was called) renderer. I used this to clean (and clean, and clean, and clean...) my county until I got to something like 75% or 80% "done" (I might give my efforts a solid B-?!) and then that particular renderer quit. So sad.

I've looked for other "prettified" helpers / renderers to aid in TIGER cleanup, as in many cases, automation is quite ad hoc, specific to a county, state, aboriginal_land (again, Martijn, thanks for the tip about Navajo Nation, I'll go take a look). Alas, there aren't any renderers that suit my fancy, so what little work I now do (in my county, really) to improve TIGER is from my OT query. Somehow, because it isn't as pretty as that ITO World version, I clean up TIGER less than I used to. I think it was the color-scheme (red, orange, light-blue, dark-blue, I think) and rather clever reasons (including "3 year aging since last edited") that made it truly useful. I know if we got a replication or close to it, I (for one) would slash away towards 90%, then towards 100% (again, in my county, where I concentrate my mapping efforts, especially for TIGER fixup).

I do recall one august volunteer in this project (I have a lot of offline email conversations with him) calling TIGER, in many cases, "not much better than an hallucination."

Anyway, I'm dedicated to improving TIGER data, locally, more widely (statewide, and indeed, there is a lot to be said for state-by-state "divide and conquer," as we've done a decent job of whacking rail data from TIGER down, though there's still tens of thousands of rail miles to go, and these aren't getting easier to quantify). I'd love to know that better tools are available. OT queries are good, but they're wonky and largely used by the more geek-inclined (no offense to geeks, I actually proudly have the word on a license plate of a car of mine).

Yes, it might be the 2040s before we clean it all up. My sleeves are rolled up, and have been for a while.

Richard, I don't know if you've been to West Virginia or know much about it, but it's an outlier among states in some interesting ways. For one example, it seems to be deliberately "radio signal quiet," I believe part of that was or is for a radiotelescope near there that needs to attenuate interference, improving its signal-to-noise ratio. A number of things seem to "fall off the map" when you enter West Virginia, it's hard to explain. I'm sure there are reasons for such things, they seem beyond me. Maybe there's an article written about why.

cycle.travel can be used in some areas. It shows unfixed TIGER residentials in rural areas as a faint grey dashed line, like this:

But it won't be 100% reliable for this purpose - in many areas it has additional heuristics to guess what might be a usable road, and it updates roughly once a month so it's not ideal for real-time fixing.

While I've perused cycle.travel on my little county before (I did develop and propose to the transportation commission the "CycleNet" local bicycle route numbering protocol), thanks to your "unfixed TIGER residential = faint grey dashed lines," it visually now makes much more sense! I'm not sure how you determine / calculate "rural areas," but my eyeballs are quickly getting retrained as they parse your semiotics. Thanks!

I think this is a reference to the National Radio Quiet Zone, which also extends into a good chunk of Virginia. It is indeed an area where you're guaranteed to lose cell reception, but the data quality issues in West Virginia aren't limited to this zone by any means. I've cleaned up many roads that corresponded to old mining roads or roads predating mountaintop removal. Even the many roads that legitimately exist have poor geometry because most roads follow winding rivers in dense woodlands within narrow hollows, which is tough for both GPS reception and aerial survey.

TIGER's data quality issues are generally endemic to specific counties, but for West Virginia they seem to be pretty consistent statewide. I wonder if this is because West Virginia maintains the entire public road network outside incorporated cities, in contrast to most states that rely more heavily on local highway departments, which are typically responsible for sending road network data to TIGER.

Yes, Minh: the Green Bank (West Virginia) radiotelescope et al. Thanks for your link, thanks for your mapping of the "quiet zone" polygons.

Making an explicit reply here because Zeke's excellent suggestion got a like from me and resonates with my experience with that (no longer functional) ITO World render I mentioned. Goin' through a bit of "already" ground here (for over five years of history) at TIGER Edited Map - OpenStreetMap Wiki. The topmost render is what I'm talking about. That "overview" or "heat map" (something about those red-and-orange turning to sky-blue, then darker-blue as we're done) clinched it as "visually parsable semiotics which make a lot of sense to my mind," driving forward TIGER cleanup.

It worked, is what I'm saying. Replication (or something close) WOULD rekindle that fire, fairly easily, I speak for myself.

I can speak generally about some local governments I consult for. The Census Bureau maintains relationships with GIS managers in local administrative jurisdictions (call them Counties in Maryland). They routinely exchange data, not just centerline but also boundary annexations, things like that.

However, not all jurisdictions participate. Some smaller ones do not have the resources to pass quality data back and forth with the Census. Some really small ones may just have a single GIS person and all of their CL could be in an incompatible format. Lots of these tiny jurisdictions around.
