Tiger cleanup - tags

Has the consensus evolved on whether to keep or delete the tiger tags such as
tiger:cfcc, tiger:county, tiger:name_* ?
The wiki implies there was no consensus so when I touch a highway I’ve been deleting tiger:reviewed, but leaving the rest. I’m wondering if I should start deleting them.


My approach, which is based on advice from others in the community, is to remove all the TIGER tags when cleaning up a section of highway. They are all findable in the history of the object, so I think it is fine to remove them.

I like to at least add surface=* before removing them.


I always delete all tiger:* tags except tiger:reviewed=no. If I’ve ensured that the geometry matches a real road reasonably well and verified the name is correct, I remove that tag as well.


I find tiger:county to be useful when looking at raw data and changeset diffs. It lets me figure out where I’m looking without needing to zoom out.


I would ask myself: would there be a benefit to OSM in keeping those tags in line with reality? For example, if the name is wrong, is there any benefit in also changing the tiger:name_* tags, or is it enough to maintain the name tag? The same goes for the road class, etc. Usually the same object already has this data available in “OSM language”. For county data we have admin boundaries; if there is a matching one, I remove the county tag as well. In those cases I don’t see a need to maintain the same information twice. Additionally, there is a risk of ending up with two conflicting pieces of information. So removing this kind of data seems the better approach.

For tiger:reviewed=no, I remember there is a “definition” in the wiki: it should be removed after you have confirmed the existence, highway classification, position and so on…

As address data is pretty bad in my area, I usually leave the zip codes in, as they might be helpful later on for someone who is adding them.

(-("tiger:reviewed"=*) and (surface=*)) OR ("highway"="footway"|"service"|"path"|"cycleway")

I’m currently doing the same thing; that filter might help you if you are working with JOSM.


I wrote a bit about some novel TIGER name cleanup you all may find interesting. watmildon's Diary | Using the US National Address Database to assist TIGER tag cleanup | OpenStreetMap.

As for the original question, I don’t find any TIGER tags particularly useful once the review is done. cfcc is better expressed through highway=; tiger:county=, while historically interesting when tools weren’t as mature, is generally better served by the now robust Overpass service; and tiger:name_* just leaves a minefield of non-updated tags if a roadway changes its name later.
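To make the stale-name problem concrete, here is a rough sketch of a check that flags ways whose name no longer contains the tiger:name_base value. The function name is made up for illustration, and splitting multiple base names on ";" is an assumption about how the import stored them, so treat it as a starting point rather than a finished tool.

```python
from __future__ import annotations


def has_stale_tiger_name(tags: dict[str, str]) -> bool:
    """Return True when tiger:name_base no longer appears in the current name.

    Hypothetical helper: assumes multiple TIGER base names, if present,
    are separated by ';' inside a single tag value.
    """
    name = tags.get("name")
    base = tags.get("tiger:name_base")
    if not name or not base:
        # Nothing to compare, so nothing can be called stale.
        return False
    # Keep only non-empty candidate base names.
    candidates = [part.strip() for part in base.split(";") if part.strip()]
    if not candidates:
        return False
    lowered = name.casefold()
    # Stale means none of the TIGER base names survive in the current name.
    return not any(part.casefold() in lowered for part in candidates)


if __name__ == "__main__":
    renamed = {"name": "Oak Avenue", "tiger:name_base": "Main"}
    print(has_stale_tiger_name(renamed))
```

A hit here does not say which tag is wrong, only that the two disagree and the way deserves a look.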

While I chose to respond by specifically clicking on the @aighes icon, this reply addresses more than simply one person’s comments here.

I remove tiger:reviewed=no when I’ve substantially improved either or both of naming or geometry of the way (road or rail, and I improve both). Other TIGER keys like :cfcc or :county I generally leave alone. And although it is true what @ElliottPlack says, digging through histories is fraught with more difficulty than it is to see the tag right here, right now. I don’t believe a consensus has emerged about :cfcc (which I have found to be useful in some quirky cases) and :county really can be useful (as @Carnildo said) “when zoomed at a particular level” (seems like a “weak” reason, but OK, I’m nodding my head).

Now, ZIP codes simply don’t belong in OSM at all, whether associated with a TIGER tag or not. ZIP codes are not even geographic areas (they don’t delineate these, which is why drawing their boundaries is so problematic). ZIP codes are more like a routing algorithm for efficient postal mail delivery. Their intersection with OSM should be zero.

I read @watmildon’s Diary as linked; good stuff there; thank you. And welcome to posting in Discourse.

15 years after TIGER, there still remains a LOT to clean up from this, and we are only a bit closer to consensus on what to do with the tags: it does seem to be more-universal to delete :reviewed when the way actually has become “reviewed,” rather than the “uh, some do this, some do that…” non-consensus we had five years ago. Maybe by about 2045 we’ll have cleaned up all aspects of TIGER. Maybe later.

Again, I really miss the old ITO World rendering that helped fix TIGER; it was great and highly effective. But we can continue to blue-sky good ideas here and in our wiki (TIGER Edited Map - OpenStreetMap Wiki), where the ITO World rendering is the topmost example. For example, there are fairly nice Overpass Turbo (OT) queries (for local areas; a county-at-a-time query doesn’t cripple the servers too badly). I can continue TIGER cleanup in my county with these, but I feel OSM can do much better “visually” (as in ITO World-like renderings). We’ve had a few “stabs” at quick renderers that attempt to do TIGER review; is anybody feeling ambitious enough to whip up an ITO World clone?
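For anyone who has not tried the county-at-a-time approach, a query along these lines is one way to do it. This sketch only builds the Overpass QL text, which can then be pasted into Overpass Turbo. The function name and the example county are placeholders, and it assumes the county is mapped as an admin_level=6 boundary whose name tag matches exactly; adjust it if your county is tagged differently.

```python
def build_tiger_review_query(county_name: str, timeout_s: int = 60) -> str:
    """Build Overpass QL for highways in one county still tagged tiger:reviewed=no.

    Hypothetical helper: assumes the county is an administrative boundary
    with admin_level=6 and a name tag equal to county_name.
    """
    # Escape backslashes and double quotes so the name is safe inside a quoted QL string.
    safe_name = county_name.replace("\\", "\\\\").replace('"', '\\"')
    return (
        f"[out:json][timeout:{timeout_s}];\n"
        f'area["boundary"="administrative"]["admin_level"="6"]["name"="{safe_name}"]->.county;\n'
        'way["highway"]["tiger:reviewed"="no"](area.county);\n'
        "out geom;\n"
    )


if __name__ == "__main__":
    # "Example County" is a placeholder, not a real suggestion.
    print(build_tiger_review_query("Example County"))
```

Keeping the search to a single county is what keeps the load on the servers reasonable; widening the area to a whole state is where the timeouts start.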

But we might decide on good strategies on “what to do with tiger: keys” as a continuing discussion here.

And as I don’t want to write seven more chapters to this book (here and now), I’ll stop.

Additionally, it should be fine to simply verify that the existing geometry/name/highway is sufficient. If something is already correct, there is no need for any substantial improvement.

Where do you see the benefit of tiger:cfcc? So far it seems to me to be the same thing as our highway classification, but less precise.

And surface. Unpaved highway=residential TIGER A41s are the bane of bike routing in the USA.

https://wiki.openstreetmap.org/wiki/TIGER_fixup

Most of the TIGER tags are junks and should be deleted. That being said, what I’ve done is:

  1. Fix the geometry.
  2. Verify if the name is correct or not.
  3. Check tags which may contain historical information.
  4. After that, remove everything related to TIGER.

  • tiger:cfcc: Useless information.
  • tiger:name_base: You need to verify and correct the name if needed, then remove it.
  • tiger:reviewed: You need to remove it after everything is set.
  • tiger:separated: Automatically removed if you use iD.
  • tiger:source: Automatically removed if you use iD.
  • tiger:tlid: Automatically removed if you use iD.
  • tiger:zip*: Useless information; USPS is the source of truth.
  • tiger:upload_uuid: Automatically removed if you use iD.
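The last step of that workflow could be sketched roughly as below for anyone scripting their own review tooling. It works on a plain dict of tags; the function name and the `keep` parameter are invented for this example, and it is not how iD or JOSM actually implement their cleanup. tiger:reviewed is only dropped when the caller says the way has really been reviewed.

```python
from __future__ import annotations


def strip_tiger_tags(
    tags: dict[str, str],
    reviewed: bool,
    keep: frozenset[str] = frozenset(),
) -> dict[str, str]:
    """Return a copy of tags without tiger:* keys.

    Hypothetical helper: keys listed in keep survive, and tiger:reviewed
    survives unless reviewed is True.
    """
    cleaned: dict[str, str] = {}
    for key, value in tags.items():
        if not key.startswith("tiger:"):
            cleaned[key] = value  # ordinary OSM tag, always kept
        elif key in keep:
            cleaned[key] = value  # mapper explicitly wants this one
        elif key == "tiger:reviewed" and not reviewed:
            cleaned[key] = value  # review not finished, keep the marker
    return cleaned


if __name__ == "__main__":
    way = {
        "highway": "residential",
        "name": "Main Street",
        "tiger:cfcc": "A41",
        "tiger:county": "Example, XX",
        "tiger:reviewed": "no",
    }
    print(strip_tiger_tags(way, reviewed=True))
```

Passing `keep=frozenset({"tiger:county"})` would cover the mappers in this thread who still find the county tag handy. The removed values stay in the object history either way.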

As a cyclist I would personally agree, though I don’t think it’s something anyone should be required to add in order to remove tiger:reviewed=no. Let’s hope the recent changes to OSMcarto will give a boost to surface usage :wink:

It was years ago, but I do recall using some of the tiger:cfcc tags when doing some rail classification and “better tagging” (as in existing rail tagging conventions that didn’t exist or weren’t paid attention to when TIGER data were imported, or those that evolved in OpenRailwayMap tagging conventions years after the import), as I have cleaned up thousands of miles of USA Rail that came into OSM by the TIGER import, and that tag proved useful in certain wide-scale OT queries: quite helpful, I recall. I agree that for the most part, tiger:cfcc tags seem “junks” or “useless” (as @DUGA says; welcome to Discourse, DUGA). But as can be true with such data, we don’t know all possible cases for how or when they might be useful. And as in some cases (rare, admittedly, but in the case I note above, actually do happen), they are useful, because they (surprisingly) can be.

Until, as with tiger:reviewed, we review the data and it becomes “fine enough for OSM,” at which point it is absolutely proper that we delete it. I agree with @aighes that “if it is already correct, it doesn’t need to be further corrected.” The other data? Let’s continue to discuss, as the conversation about which tags might be disposed of has begun. We should be careful, though: we must try to imagine use cases, but that is fraught with the danger that we don’t imagine everything possible, quite a likely occurrence.

I’d say tiger:zip* could be deleted with a mechanical edit (and yesterday). There is no reason for these data to be in OSM, whether they are, might be or are not correct. It’s simply incorrect for them to be in OSM; no need to make a determination if they are “correct.”

Someone in another forum mentioned renaming the TIGER zip tags to postal_code tags which I think is also likely not helpful?

Renaming “zip” to “postal_code” simply obfuscates / further confuses their origin, which is “mail delivery numeric algorithm acting as imposter data for geographic area.” Making such geographies (polygons) is impossible to do, but people continue to try to “map” (logically and geometrically) ZIP codes to geographic areas, always as “estimated” or as is stated to be more blatantly true, “incorrectly.”

I’ll repeat my strong opinion: tiger:zip is a no-brainer for “can be deleted with a mechanical edit,” but of course, we’d need to achieve wider consensus on that before doing so. Other tiger: keys are more difficult to make such a determination, but they all lean heavily towards “let’s do what we have to do, even taking years to get there if we need to, so that we can show these tags the exit door out of OSM.”

The tricky part, and why we are (still) 15 years into this discussion (and discussion, and discussion…) is to “wring out” the maximal amount of “mapping value” out of these tags before their demise. That’s pretty hard, as we can’t possibly imagine every use case. And simultaneously, the data DO need substantial review and/or improvement.


It seems easy to conflate two discussions together:

  1. What tags are safe to remove after a “thorough” review of an object?
  2. What tags do various folks think should just be removed from the DB?

Focusing only on the first class, it seems that there’s some set of folks that derive some (maybe small) value from county tagging and that cfcc is somewhat helpful if the surface isn’t well tagged. Everything else seems to be getting the axe by most folks. Fortunately this seems to match the wiki.

For the second class there’s probably some relatively easy housecleaning among the top TIGER:* tags. Looking at you: tlid (2.7mil entries), source (2.7mil), upload_uuid (2.4mil). And then some perhaps less easy but maybe not too contentious cleanup… maybe: zips (200-300k)?

Everything on taginfo with fewer than 100k entries is totally unfamiliar to me so I won’t even speculate! I was surprised to see the wider range of TIGER:* tags listed there.

It’s kind of a side topic to this thread, but it is absolutely true that paying wider attention to “the entire data path” of OSM nodes, ways and relations turning into a rendered map is a real requirement for how the feedback loop of effective mapping happens. Yes, this gives rise to our “don’t tag for the renderer” no-no, but it also rears its head when Carto begins to support a tag as a rendered feature: in effect, this says “it is important to get the tagging right on such features, as NOW, they render” in our “standard” renderer, which by doing so carries some clout on “what is important for Contributors to tag, and how.”

This is a complex topic, as OSM does a certain amount of “trying to hide” the complexities and difficulties of “what is rendered” (OSM is not a map of specific rendering, it is a database), especially by saying “don’t tag for the renderer,” yet at the same time, it (appears to?) influence the way that specific tagging does or will happen, by making choices in the renderer. That feedback (at the end of the pipeline, “a rendered road surface,” for example) reaches all the way back to the mapper, by saying “it is somewhat important to be careful tagging this feature” (because it is rendered). A very tricky balancing act, people like Paul Norman, Joseph Eisenberg and other author-contributors to Carto know quite well.

Again, this is a side topic; back to the main thread.

For the tags that are safe to remove after a “thorough” review on a datum, I think it is accepted that tiger:reviewed=no should be removed when a conscientious OSM editor (aren’t we all?!) feels the datum meets the requirement of “good enough to enter into OSM.” It’s the equivalent of “whether it came in from TIGER or it came in because I created it from scratch, there is no reason for tiger:reviewed=no to remain on this datum: it is ‘high enough quality to be in OSM’ and a tiger:reviewed=no tag either directly contradicts that or leaves it questionable or ambiguous.”

For the tiger:zip_code data, let’s continue to discuss whether a mechanical edit to delete these is appropriate; I’d say yes to that.

For the others, I’m glad we are discussing these, but let’s not be too glib or easy, as there really may be some use cases we haven’t (yet?) imagined that could make them useful. If they are, let’s “wring out” as much semantic usefulness as possible (with new, better tagging, or better position, or whatever) and make them contain in an OSM-correct method whatever those tiger:* tags purported to impart, and then delete the tiger:* tag. This won’t be a quick, socially-lubricated-as-easy process. It will be fraught with chin-scratching and a wide variety of opinions that fall widely upon a spectrum of what is best we should do.

15 years in, hm…another 15? We could shoot for a finish line in 10, though I think 5 years is ambitious.


I use tiger:reviewed for the latter (for cycle.travel), not :cfcc.


Absolutely sensible.