Tiger cleanup - tags

Richard · January 29, 2023, 3:09pm

And surface. Unpaved highway=residential TIGER A41s are the bane of bike routing in the USA.

DUGA · January 29, 2023, 7:26pm

https://wiki.openstreetmap.org/wiki/TIGER_fixup

Most of the TIGER tags are junks and should be deleted. That being said, what I’ve done is:

Fix the geometry.
Verify if the name is correct or not.
Check tags which may contain historical information.
After that, remove everything related to TIGER.

tiger:cfcc: Useless information.
tiger:name_base: You need to verify and correct name if needed, then remove it.
tiger:reviewed: You need to remove it after everything is set.
tiger:separated: Automatically removed if you edit with iD.
tiger:source: Automatically removed if you edit with iD.
tiger:tlid: Automatically removed if you edit with iD.
tiger:zip*: Useless information, USPS is the source of the truth.
tiger:upload_uuid: Automatically removed if you edit with iD.
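
To make the list above concrete, here is a minimal sketch of how those groupings could be written down in Python. The function names, the action labels and the sample way are all hypothetical, and the groupings only restate this post; they are not an official OSM policy or the behavior of any particular editor.

```python
# Sketch only: the groupings mirror the list in this post, not an official policy.

# Keys the post describes as stripped automatically when a way is edited in iD.
AUTO_DISCARDED = {"tiger:separated", "tiger:source", "tiger:tlid", "tiger:upload_uuid"}

# Keys the post says need a human check before they are removed.
NEEDS_REVIEW_FIRST = {"tiger:name_base", "tiger:reviewed"}


def classify_tiger_key(key: str) -> str:
    """Return a (hypothetical) action label for a single tag key."""
    if not key.startswith("tiger:"):
        return "keep"
    if key in AUTO_DISCARDED:
        return "auto-removed by iD"
    if key in NEEDS_REVIEW_FIRST:
        return "remove after review"
    if key == "tiger:cfcc" or key.startswith("tiger:zip"):
        return "remove (described as useless in this post)"
    return "check manually"


def strip_tiger_tags(tags: dict[str, str], reviewed: bool) -> dict[str, str]:
    """Drop every tiger:* key, but only once a person has reviewed the way."""
    if not reviewed:
        return dict(tags)
    return {k: v for k, v in tags.items() if not k.startswith("tiger:")}


# Made-up sample way, for illustration only.
way = {
    "highway": "residential",
    "name": "Example Street",
    "tiger:cfcc": "A41",
    "tiger:reviewed": "no",
    "tiger:zip_left": "00000",
}
for key in way:
    print(key, "->", classify_tiger_key(key))
print(strip_tiger_tags(way, reviewed=True))
```

The `reviewed` flag is deliberately a manual input: the geometry and name checks in the steps above are human judgments that the code does not try to automate.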

aighes · January 29, 2023, 7:38pm

As a cyclist I would personally agree, though I don’t think adding a surface should be mandatory before anyone can remove tiger:reviewed=no. Let’s hope the recent changes to OSM Carto will give a boost to surface usage.

stevea · January 29, 2023, 9:05pm

It was years ago, but I do recall using some of the tiger:cfcc tags when doing rail classification and “better tagging” (as in rail tagging conventions that didn’t exist or weren’t paid attention to when TIGER data were imported, or those that evolved in OpenRailwayMap tagging conventions years after the import). I have cleaned up thousands of miles of USA rail that came into OSM by the TIGER import, and that tag proved quite helpful in certain wide-scale OT queries. I agree that for the most part, tiger:cfcc tags seem “junks” or “useless” (as @DUGA says; welcome to Discourse, DUGA). But as can be true with such data, we don’t know all possible cases for how or when they might be useful. In some cases (rare, admittedly, but the one I note above did actually happen), they are surprisingly useful.

That holds until, as with tiger:reviewed, we review the data and it becomes “fine enough for OSM”; then it is absolutely proper that we delete the tag. I agree with @aighes that “if it is already correct, it doesn’t need to be further corrected.” The other data? Let’s continue to discuss, as the conversation about “which tags might be disposed of” has begun. We should be careful, though: we must try to imagine use cases, but that is fraught with the danger that we don’t imagine everything possible, quite a likely occurrence.

I’d say tiger:zip* could be deleted with a mechanical edit (and yesterday). There is no reason for these data to be in OSM, whether they are, might be or are not correct. It’s simply incorrect for them to be in OSM; no need to make a determination if they are “correct.”

watmildon · January 29, 2023, 9:29pm

Someone in another forum mentioned renaming the TIGER zip tags to postal_code tags which I think is also likely not helpful?

stevea · January 29, 2023, 9:34pm

Renaming “zip” to “postal_code” simply obfuscates / further confuses their origin, which is “mail delivery numeric algorithm acting as imposter data for geographic area.” Making such geographies (polygons) is impossible to do, but people continue to try to “map” (logically and geometrically) ZIP codes to geographic areas, always as “estimated” or as is stated to be more blatantly true, “incorrectly.”

I’ll repeat my strong opinion: tiger:zip is a no-brainer for “can be deleted with a mechanical edit,” but of course, we’d need to achieve wider consensus on that before doing so. Other tiger: keys are more difficult to make such a determination, but they all lean heavily towards “let’s do what we have to do, even taking years to get there if we need to, so that we can show these tags the exit door out of OSM.”

The tricky part, and why we are (still) 15 years into this discussion (and discussion, and discussion…) is to “wring out” the maximal amount of “mapping value” out of these tags before their demise. That’s pretty hard, as we can’t possibly imagine every use case. And simultaneously, the data DO need substantial review and/or improvement.

watmildon · January 29, 2023, 9:50pm

It seems easy to conflate two discussions together:

What tags are safe to remove after a “thorough” review of an object?
What tags do various folks think should just be removed from the DB?

Focusing only on the first class, it seems that there’s some set of folks that derive some (maybe small) value from county tagging and that cfcc is somewhat helpful if the surface isn’t well tagged. Everything else seems to be getting the axe by most folks. Fortunately this seems to match the wiki.

For the second class there’s probably some relatively easy housecleaning among the top TIGER:* tags. Looking at you: tlid (2.7mil entries), source (2.7mil), upload_uuid (2.4mil). And then some perhaps less easy but maybe not too contentious cleanup… maybe: zips (200-300k)?

Everything on taginfo with fewer than 100k entries is totally unfamiliar to me so I won’t even speculate! I was surprised to see the wider range of TIGER:* tags listed there.
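
For anyone who wants to reproduce this kind of tally on a local extract rather than reading it off taginfo, a small counting sketch follows. It assumes the elements have already been loaded as plain tag dictionaries; the loader is not shown, and the function name and sample data are hypothetical.

```python
from collections import Counter
from typing import Iterable


def count_tiger_keys(elements: Iterable[dict[str, str]]) -> Counter:
    """Count how many elements carry each tiger:* key (prefix matched case-insensitively)."""
    counts: Counter = Counter()
    for tags in elements:
        for key in tags:
            if key.lower().startswith("tiger:"):
                counts[key] += 1
    return counts


# Made-up sample elements, for illustration only.
sample = [
    {"highway": "residential", "tiger:tlid": "1", "tiger:source": "tiger_import"},
    {"highway": "residential", "tiger:tlid": "2", "tiger:zip_left": "00000"},
    {"highway": "service"},
]
for key, n in count_tiger_keys(sample).most_common():
    print(f"{key}: {n}")
```

Sorting with `most_common()` puts the high-volume keys first, which is the order in which the housecleaning candidates are discussed here.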

stevea · January 29, 2023, 9:54pm

It’s kind of a side-topic to this thread, but it is absolutely true that paying wider attention to “the entire data path” of OSM nodes, ways and relations turning into a rendered map is a real requirement for how the feedback loop of effective mapping happens. Yes, this gives rise to our “don’t tag for the renderer” no-no, but it also rears its head when Carto begins to support a tag as a rendered feature: in effect, this says “it is important to get the tagging right on such features, as NOW, they render” in our “standard” renderer, which by doing so, carries some clout on “what is important for Contributors to tag, and how.”

This is a complex topic, as OSM does a certain amount of “trying to hide” the complexities and difficulties of “what is rendered” (OSM is not a map of specific rendering, it is a database), especially by saying “don’t tag for the renderer,” yet at the same time, it (appears to?) influence the way that specific tagging does or will happen, by making choices in the renderer. That feedback (at the end of the pipeline, “a rendered road surface,” for example) reaches all the way back to the mapper, by saying “it is somewhat important to be careful tagging this feature” (because it is rendered). It is a very tricky balancing act, one that people like Paul Norman, Joseph Eisenberg and other author-contributors to Carto know quite well.

Again, this is a side topic; back to the main thread.

stevea · January 29, 2023, 10:02pm

For the tags that are safe to remove after a “thorough” review on a datum, I think it is accepted that tiger:reviewed=no should be removed when a conscientious OSM editor (aren’t we all?!) feels the datum meets the requirement of “good enough to enter into OSM.” It’s the equivalent of “whether it came in from TIGER or it came in because I created it from scratch, there is no reason for tiger:reviewed=no to remain on this datum: it is ‘high enough quality to be in OSM’ and a tiger:reviewed=no tag either directly contradicts that or leaves it questionable or ambiguous.”

For the tiger:zip_code data, let’s continue to discuss whether a mechanical edit to delete these is appropriate; I’d say yes to that.

For the others, I’m glad we are discussing these, but let’s not be too glib or easy, as there really may be some use cases we haven’t (yet?) imagined that could make them useful. If they are, let’s “wring out” as much semantic usefulness as possible (with new, better tagging, or better position, or whatever) and make them contain in an OSM-correct method whatever those tiger:* tags purported to impart, and then delete the tiger:* tag. This won’t be a quick, socially-lubricated-as-easy process. It will be fraught with chin-scratching and a wide variety of opinions that fall widely upon a spectrum of what is best we should do.

15 years in, hm…another 15? We could shoot for a finish line in 10, though I think 5 years is ambitious.

Richard · January 30, 2023, 12:59pm

I use tiger:reviewed for the latter (for cycle.travel), not :cfcc.

watmildon · January 30, 2023, 7:05pm

Absolutely sensible.

SherbetS · January 30, 2023, 7:19pm

Yeah. I’d definitely look for tiger:reviewed to find roads that don’t have trustworthy surface info. cfcc is pretty useless considering every road that you’d have to worry about adding surface data to would just be… A41

In due time I’d love to consider a mass edit to just remove the tags to stop confusing mappers.
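
As a rough illustration of the lookup described above, the sketch below builds an Overpass QL query for highways that still carry tiger:reviewed=no and have no surface tag. The function name and the bounding box values are made up, and the code only builds the query text; sending it to an Overpass server is left out.

```python
def unreviewed_without_surface_query(
    south: float, west: float, north: float, east: float, timeout: int = 25
) -> str:
    """Build an Overpass QL query for unreviewed TIGER highways lacking a surface tag."""
    bbox = f"({south},{west},{north},{east})"
    return (
        f"[out:json][timeout:{timeout}];\n"
        # ["key"="value"] matches a tag exactly; [!"key"] matches elements without that key.
        f'way["highway"]["tiger:reviewed"="no"][!"surface"]{bbox};\n'
        "out tags center;"
    )


# Hypothetical bounding box, for illustration only.
print(unreviewed_without_surface_query(37.0, -122.5, 37.5, -122.0))
```

Keeping the bounding box small is a practical assumption here, since a query like this over a whole state could return a very large result.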

stevea · January 30, 2023, 7:52pm

I’m beginning to nod my head in agreement, but let’s be careful here before we all scream for tag deletion bots running deep into the night. What it appears we’ve “built” with TIGER data (tags) are a whole bunch of assumptions that keep stacking ever higher. For example, yes, @Richard using tiger:reviewed for making more than a coin-flip’s worth of determination about surface is sensible, but what about the converse: when there is no such tag, it may still be true that if the surface isn’t well tagged, it’s still a big unknown w.r.t. surface. You’d have to dig into the history of the datum to see if tiger:reviewed ever WAS on it, and when, and perhaps what might have happened so that it was removed, with only poor-worthiness “guesswork-level” logic, most likely.

Stated succinctly (nearly impossible with TIGER, unless you go very deep, and I won’t here): you can’t always remove tiger:* tags willy-nilly without some consequences, but eventually we want to do exactly that.
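
The “dig into the history” step mentioned above could be sketched roughly like this. It assumes the history is already available as a list of tag dictionaries ordered oldest to newest, with list position standing in for the version number; fetching that history is not shown, and the function name and sample data are hypothetical.

```python
from typing import Optional, Sequence


def first_version_without_reviewed(history: Sequence[dict[str, str]]) -> Optional[int]:
    """Return the 1-based version at which tiger:reviewed first disappeared.

    Returns None if the key was never present, or is present in every
    version from the point it first appears.
    """
    seen = False
    for version, tags in enumerate(history, start=1):
        if "tiger:reviewed" in tags:
            seen = True
        elif seen:
            return version
    return None


# Made-up history, for illustration only.
history = [
    {"highway": "residential", "tiger:reviewed": "no"},
    {"highway": "residential", "tiger:reviewed": "no", "name": "Example Street"},
    {"highway": "residential", "name": "Example Street"},
]
print(first_version_without_reviewed(history))
```

Note that this only says when the tag went away, not why; whether the surface was actually checked at that point is still the guesswork described above.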

Adamant1 · January 30, 2023, 7:52pm

I’ll delete tiger:reviewed if I add surface tags, but I usually keep it if the edit is just a minor adjustment.

SherbetS · January 30, 2023, 8:35pm

What it appears we’ve “built” with TIGER data (tags) are a whole bunch of assumptions that keep stacking ever higher.

Absolutely. IMO the best move is to make focused guidance on how to handle the tags, so we can methodically separate ourselves from the tiger:* stuff.

There’s another thread on the US forums where they’re using MR to add surface data in California.

I feel that going forward, this kind of work may be an important part of shedding all the old tiger:* tagging.

–SherbetS

Carnildo · January 30, 2023, 10:46pm

Worse than “not helpful”, it’s “actively misleading”. “tiger:zip” codes aren’t ZIP codes, they’re ZCTA codes. These are a Census-derived approximation to ZIP codes, useful mainly for interpreting Census-derived statistics.

aighes · January 31, 2023, 12:53am

How is the rest of the world doing it without tiger:reviewed? Consider the case where there is no more tiger:reviewed because a mapper removed it. Either the mapper set a surface, in which case there should be no issue, or no surface is defined, in which case the renderer might make assumptions. The assumptions that tiger:cfcc makes possible were already made during the import. As I see it, you are not going to lose anything here. In any case, you will have the missing-surface issue for everything that was not taken over from TIGER, more or less like everywhere else. So it would make sense to delete everything except tiger:reviewed, and to keep tiger:reviewed as a machine-readable status indicator of whether the object has been reviewed.
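
The reasoning above can be sketched as two small helpers: one that a data consumer might use to decide how much to trust the surface, and one that keeps tiger:reviewed while dropping the other tiger:* keys. The function names and status labels are hypothetical, and this is only one reading of the argument, not how any renderer or router actually behaves.

```python
def surface_status(tags: dict[str, str]) -> str:
    """Classify how much is known about a way's surface, following the cases above."""
    if "surface" in tags:
        return "tagged"
    if tags.get("tiger:reviewed") == "no":
        return "unreviewed import"
    return "untagged"  # the consumer has to assume, as everywhere else in the world


def keep_only_reviewed(tags: dict[str, str]) -> dict[str, str]:
    """Drop every tiger:* key except tiger:reviewed, the machine-readable status flag."""
    return {
        k: v
        for k, v in tags.items()
        if not k.startswith("tiger:") or k == "tiger:reviewed"
    }


# Made-up sample way, for illustration only.
way = {"highway": "residential", "tiger:cfcc": "A41", "tiger:reviewed": "no"}
print(surface_status(way))
print(keep_only_reviewed(way))
```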

stevea · January 31, 2023, 1:46am

Mmmm, mostly true, I’ll say, but not necessarily always true. Many people forget a significant amount of imported TIGER data were railway=rail, and while they were “fairly noisy,” (with what are now well-understood and surmountable limitations, thanks to newer tags like owner=* and operator=*) they were also “good enough” to have become seriously improved over the last 15 years: today these data are now “pretty decent and still improving.” It is estimated TIGER-imported rail data (in the USA, of course, that’s the only place we have such data) are approximately 75% completely reviewed, but that is a “compromise estimate.” It remains difficult to derive hard numbers about this, because the data have spent 15 years getting both “smeared” and “deliberately improved,” resulting in “much, much better data, but messy as to a measurable value of their actual level of improvement since import.”

In the rail examples I gave, the tiger:cfcc turned out to be quite helpful to make determinations between things like rail mainlines, sidings, industrial spurs and other similarly and quite distinctly taggable ways in OSM, and so they largely have been. This is the best example I can think of for “hey, don’t delete a tiger:* tag until you’re 100% certain there are absolutely no more use cases where semantic value can be ‘wrung out’ of them.” (Imagine a slightly-damp washcloth that you think of as containing “no more moisture” and you twist and squeeze it as hard as you can and get a few more precious drops of fluid). I’m pretty sure we have neither imagined fully nor squeezed out all of the semantic value that (still) might remain in tiger:*=* tagging. It’s just plain difficult to “fully imagine” all the possible use cases where semantic value might be extracted before the final “goodnight” of deletion happens.

And that is (just) for tiger:cfcc. (Which I’ll be the first to say, is actually a fairly useful tag, at least for what I did with some rail data improvements). Other tiger:*=* tags? Well, please chime in here and now, everyone.

aighes · January 31, 2023, 1:57am

True, yes I was only talking about the highways, not about the railways.

Minh_Nguyen · February 3, 2023, 5:11pm

Better to verify the name against the latest TIGER Roads overlay or a more reliable source. Often TIGER 2005 did things like assign a road name to the next road over (typical with minor residential side streets off a major road).

On the other hand, tiger:name_base can be handy for sorting out a mess where someone long ago merged multiple ways incorrectly or deduplicated county line roads without using name:left and name:right. You can easily tell which tiger:name_base_# goes on the left or the right based on the corresponding tiger:county in the version before deduplication.

I eventually managed to tease out the names on either side of this road, but it required trawling through the histories of all the connected ways that it had been split from over the years – and using Potlatch 1’s deleted ways visualization, which we no longer have access to. This would’ve been more difficult if someone had already stripped out tiger:county from all the roads.