Proposal: Undo 4-year-old parcel import in Dallas, TX

A large-scale (and, as best I can tell, undocumented) parcel import was done by user Mark@MJS about 4 years ago across a bunch of edits. That user has been inactive for 4 years. User @citrula brought this import up in the OSM-US Slack a few days ago, and there is renewed enthusiasm to clean it out of the database.

While there have been edits to various portions of this in the intervening time (e.g., a park parcel may now have address tagging), the vast majority of items are small ways with place=plot and version 1 or 2 (another user did a bulk tag adjustment at one point). My plan is to move through chunks of the city, removing things as I go and determining whether any extra details need to be extracted or moved. I may set up a project on the OSM US Tasking Manager to help track things, but that may not end up being necessary.
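A rough way to triage the candidates, sketched in Python under some assumptions: each way is represented as a hypothetical `Way` record (id, version, tags), and the set of keys the import and the later bulk retag applied (`IMPORT_TAG_KEYS`) is a guess that would need checking against the actual changesets. Untouched plots go to a deletion pile; anything edited since goes to a review pile so extra details can be extracted first.

```python
from dataclasses import dataclass, field


@dataclass
class Way:
    """Hypothetical, simplified view of an OSM way (only the fields used here)."""
    id: int
    version: int
    tags: dict = field(default_factory=dict)


# Assumption: keys applied by the import or the later bulk retag.
# Any other key suggests a later mapper added real detail.
IMPORT_TAG_KEYS = {"place", "landuse", "zoning"}


def classify(way: Way) -> str:
    """Return 'delete', 'review', or 'skip' for one candidate way."""
    if way.tags.get("place") != "plot":
        return "skip"  # not one of the imported plots
    extra_keys = set(way.tags) - IMPORT_TAG_KEYS
    if way.version <= 2 and not extra_keys:
        return "delete"  # untouched apart from the bulk tag change
    return "review"  # edited since: extract details before removing
```

The version check is deliberately conservative: a plot at version 3 or later with no extra tags still lands in `review`, since someone may have adjusted its geometry on purpose.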

I plan to do this myself over a series of edits but others are welcome to join. I will start this work sometime next week presuming there are no unresolved objections.

Thoughts? Suggestions? Concerns?

An example area

The scope of the import:

Yikes, that’s no fun to edit. Depending on how manually you’re going to carry out this edit, would there be any benefit to aligning any non-parcel areas to the parcel boundaries? I’m thinking of something like a schoolground (amenity=school or landuse=education) that would’ve been drawn in roughly according to the property line but without the benefit of this data. This would only be a good idea if the parcels seem well aligned and are suitably licensed. (I take it that this is the same parcel dataset that was used to prepare the Dallas building import.)

I think that’s a lovely secondary goal and will depend on how much effort is required for the baseline goal of: get this outta here.

Not only is that unverifiable (looking at the example area), but it’s just nonsensical… Way 755677911 (and its neighbors) is tagged “zoning=grass”… and sits in the middle of an appropriately mapped “natural=wood”.

I’d say start out with a mass deletion of everything claimed to be zoned as grass…

Yep. Fortunately, the mechanics of this should be pretty straightforward as the tagging is very consistent.

Looking at taginfo… “zoning=grass” is used over 28,000 times… and I promise you, “grass” is not a zoning classification. Just terminating those with prejudice would be a huge start.

Way 764375733 (zoning=grass) is a paved parking lot.

The zoning=grass ways are from a tag change another user made when they realized that landuse=grass was randomly being applied to the parcels. I do agree that zoning=grass is nonsensical and I am pretty sure it was introduced by this edit.

IMO, the entire idea of “zoning” as a tag is completely wrong… we map facts as they exist on the ground, not theoretical stuff. An empty lot with commercial zoning is just an empty lot.

I would argue that parks probably fall into that category as well. I would happily contribute to a MapRoulette challenge or Tasking Manager project to merge that sort of POI into the imported parcels, and to fix up imported parcels that have more tags on them. I don’t expect there to be many of these, because the vast majority of the place=plot ways are untouched.

I have asked the DWG for the importer role to be added to one of my accounts to help make this a bit smoother. (~700k nodes and ~250k ways… oof)
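For planning the upload, here is a minimal sketch of the batching arithmetic. It assumes the OSM API's per-changeset limit of 10,000 changes (the live value is published at `/api/capabilities`) and assumes `node_ids` contains only untagged nodes used by nothing but the ways being deleted. At ~700k nodes plus ~250k ways, that works out to roughly 95 full changesets. Ways are ordered before nodes so that, if batches are uploaded in order, a node is never deleted while a way in the set still references it.

```python
import math

# Assumption: per-changeset element cap of OSM API 0.6; check
# /api/capabilities before relying on it.
MAX_CHANGES_PER_CHANGESET = 10_000


def changesets_needed(n_elements: int, cap: int = MAX_CHANGES_PER_CHANGESET) -> int:
    """Minimum number of changesets needed to hold n_elements changes."""
    return math.ceil(n_elements / cap)


def deletion_batches(way_ids: list[int], node_ids: list[int],
                     cap: int = MAX_CHANGES_PER_CHANGESET) -> list[list[tuple[str, int]]]:
    """Split deletions into changeset-sized batches with all ways before any node.

    Assumes node_ids holds only nodes that belong exclusively to the ways
    in way_ids; shared or tagged nodes must be filtered out beforehand.
    """
    ordered = [("way", i) for i in way_ids] + [("node", i) for i in node_ids]
    return [ordered[start:start + cap] for start in range(0, len(ordered), cap)]


print(changesets_needed(700_000 + 250_000))  # 95
```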

Parks (especially what have come to be known as “proto-parks”: land which may become a park in the future but which today is somewhat poorly known or not well publicized elsewhere, except maybe in a state- or federal-level “protected lands” database) don’t necessarily “fall into that category” as well. OSM has parks (a leisure=park is a usually urban, smallish, manicured area for human recreation which is pleasing in its natural arrangement); we have boundary=protected_area (of various sorts); there are leisure=nature_reserve areas aplenty; there is boundary=national_park, which, depending on a park’s character, might be applied to state parks or even county parks; and there are even more of these things. It is a vast, complicated and truly large dataset.

A contentious issue in OSM can be that there is a great deal of history behind how certain tagging has evolved (though we all agree that zoning=grass is nonsensical), yet many contributors are anxious to do fairly major re-tagging. Major re-tagging (as in TIGER Review, huge undertakings like public lands, vast networks like rail and road tagging, which essentially have to be broken down state by state…) can and should happen in OSM. But when, if and as it does, good dialog, deep wiki reading (in many cases), historical research and a sensitivity to “what was right, is right and will be right” truly are required. It takes a fair number of us nodding our heads together in agreement to make major changes to our map. We call this consensus. It does take some effort to build consensus, and this doesn’t (usually) happen quickly.

We have already developed much hammered-out documentation about our efforts (like United States Public Lands, and more…), with dozens, even hundreds, of contributors who have stirred their spoons in this kitchen. Yet I feel the need to say this when I see “parks probably…”, when parks don’t necessarily fall into that category (at least not until you check our wiki).

Go nuts deleting (well, cleaning up) in Dallas, please. And be careful making far-reaching statements about parks. OSM-US has had enough grief with misunderstandings about how we should tag parks. Our linked wiki is messy, incomplete, rambling and even complicated. That’s because reality is that way. And with the strength of many (not one, two, or a few) we continue to improve these data. Keep up the good work, everybody. And build consensus slowly, as we must not hurry to make our map data real. Quality takes time.

Yes, I am talking about park in the OSM sense, not in the way it’s often used in American English. I think these are still a great opportunity to use parcel data, since tree cover often prevents you from seeing the limits of the property. I agree that future parks and nature reserves are not leisure=park, and I didn’t intend to suggest that.

Thanks for your quick reply, and sorry to everyone else in this thread for skidding off-topic.

Good of you to know that American English and “park” in OSM have a messy history. When you say “these” are a great opportunity to use parcel data, I’m not quite sure what you mean. I agree that an “edge” of mapping efforts includes tree cover and “park” (protected_area, nature_reserve…) boundaries in some cases; I was simply urging caution here, like using PAD (Protected Area Databases) to do this more authoritatively (in some jurisdictions). We do need to be careful; this includes being precise in our language about these things, and about how we might tune them up. I appreciate your clarification that you didn’t intend to suggest a blurring of things. OSM has high standards to uphold wherever we can!

Hello, will add some comments and insights within a few days.

Best regards, Mark.

A largescale (and as best I can tell undocumented)…

I used the local appraisal districts’ public datasets to create accurate landuse objects for part of the Dallas area. These datasets contain classifications like ‘Residential’, ‘Commercial’, ‘Farm’, etc. that correspond well to OSM landuse tags like ‘residential’, ‘commercial’, ‘farmland’, etc. Doing it by hand is just guessing, and as I remember, landuse tags didn’t cover much of the Dallas area. Here’s a screenshot from years ago: PROJ4 Visualizer. The landuse coverage is much improved now.

parcel import was done…

When you say “parcel import”, I say “landuse areas derived from local appraisal districts’ public datasets” (now known as LADFLADPD). There’s no reference to any cadastre, just objects created with mostly correct landuse tags. The landuse objects created are as granular as possible: for example, in the city of Dallas you might find a commercial area right between two residential areas with houses on the same street, and it’s nice to be able to differentiate that if you’re interested. The ‘lines’ dividing identically tagged landuse objects (that some people complain about) are a useful side effect of LADFLADPD. Even though unreferenced, they do denote an important real-world boundary and give context to buildings and other such objects. Another example would be looking at multiple building outlines that are close together and wondering whether they are related or part of the same entity. The LADFLADPD lines imply the answer. I estimate the data overhead to be similar to that of the building objects they enclose.

As to the quality of LADFLADPD, I believe it’s mostly error-free and well constructed. The source data is neither error-free nor suited to the OSM format; I’ve succeeded in resolving most issues and optimizing for OSM. I did a random quality check against Google Maps, which also displays the offending lines, and I would say Google doesn’t do the same error corrections.

Interestingly, the Dallas GIS data does not change much. In another effort I tracked the monthly GIS changes, and they were either new development or the GIS office doing the same fixups I did to network the geometry. Adding a new subdivision to OSM is something an individual could do; the granular landuse data would not need to be updated often, if at all.
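Tracking monthly changes like this can be as simple as diffing two snapshots keyed by the appraisal district’s account or parcel ID. Below is a minimal Python sketch. The snapshot schema (an ID mapped to a comparable record such as a land-use code plus a geometry hash) is hypothetical, not the actual Dallas GIS format.

```python
def diff_snapshots(old: dict, new: dict) -> dict:
    """Compare two snapshots that map a parcel ID to a comparable record.

    Returns sorted lists of IDs that were added, removed, or whose
    record changed between the two snapshots.
    """
    old_ids, new_ids = set(old), set(new)
    return {
        "added": sorted(new_ids - old_ids),
        "removed": sorted(old_ids - new_ids),
        "changed": sorted(i for i in old_ids & new_ids if old[i] != new[i]),
    }


# Hypothetical example records: (land-use code, geometry hash).
jan = {"A": ("RES", "h1"), "B": ("COM", "h2"), "C": ("RES", "h3")}
feb = {"A": ("RES", "h1"), "B": ("RES", "h2"), "D": ("RES", "h4")}
print(diff_snapshots(jan, feb))
# {'added': ['D'], 'removed': ['C'], 'changed': ['B']}
```

Under this schema, a new subdivision would most likely appear under `added`, while geometry fixups by the GIS office would appear under `changed`.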

I plan to do this myself over a series of edits but others are welcome to join. I will start this work sometime next week presuming there are no unresolved objections.
Thoughts? Suggestions? Concerns?

My vote is to leave the LADFLADPD alone. If you asked OSM users which version of an OSM map they preferred, with granular landuse or without, I predict most would say keep the granular version because it’s useful. It conveys a lot of information in a subtle way. Again, it’s not and hasn’t ever been cadastre.

Just to clarify, are you arguing for keeping something like way 755677911?

I am not entirely sure, since you seem to be talking about fine-grained landuse areas while other things discussed here clearly are not. Regarding fine-grained landuse areas, keeping a small commercial landuse between residential ones is fine, but having lots of micro residential landuses next to each other instead of one larger residential landuse doesn’t make sense to me.

Hello Mark, thanks for your detailed response. The primary concern is that on OSM, landuse areas do not map directly to parcel boundaries; the landuse area is meant to correspond to the usage on the ground. For example, consider a parcel zoned for commercial use where only 25% of the parcel is actually used that way. On OSM, we map only the 25% that is actively used that way as landuse=commercial. This is in line with the “On the Ground” principle, which guides mapping on the project as a whole (not without contention 😄).

For a more concrete example, here is the particular area that made me look into this in the first place, which was linked in the parent post:


This is an almost entirely abandoned area of the city in the middle of a floodplain. These were imported as landuse=grass which they probably have not been for several decades. I am sure with a bit of perusing we could find similar quality issues across the import.

I understand your concern with landuse coverage, but I think that a direct import like this is not the correct way forward, simply because the data source is not precise enough for OSM’s expectations of landuse mapping. I would definitely be interested to hear about your process for “cleaning up” the Dallas CAD data, though - I think this could be an interesting and useful way to validate landuse mapping in Dallas or other places with a similar data source.

We can have a reasonable debate about where to split chunks of landuse=residential but having it at the level of each plot is incorrect for OSM.

I’ll post some more specific issues here from just a quick scan. It’s very easy to find issues like this. I am not cherry picking.

  1. There are 250k items with a “zoning” tag. It seems a large swathe of them came from another user editing after your import… not sure what the story there is.
  2. Here’s a parking lot and grassy/shrubby area with a lot of added geometry but the tagging is incorrect.
  3. A cemetery relation that is weirdly cut into 2 pieces.
  4. A “grass” relation that is area:highway at best and likely should be removed entirely.
  5. A schoolground that the import says is “commercial”.
  6. An area around the airport that the import thinks should be “grass”, but which is meaningless geometry as it currently stands.
    etc.

I absolutely think there’s some value in this dataset but this import leaves a lot to be desired.

This seems like a pretty obvious candidate for revert. I would recommend a heuristic search for the changesets involved and then just dump them into https://revert.monicz.dev/.
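Below is a minimal Python sketch of such a heuristic search. It assumes the changeset metadata has already been fetched (for example from the OSM API’s changeset query for a given user) and reduced to plain dicts with `id`, `user`, `created_at` and `changes_count` keys. Those key names, the time window and the `min_changes` threshold are assumptions for illustration. The output is only a candidate list to review by hand before handing anything to a revert tool.

```python
from datetime import datetime


def candidate_changesets(changesets: list[dict], user: str,
                         start: datetime, end: datetime,
                         min_changes: int = 1000) -> list[int]:
    """Return IDs of changesets by `user`, opened in [start, end), oldest first.

    Only changesets with at least `min_changes` edits are kept: bulk-import
    changesets tend to be large, so the threshold filters out ordinary
    small edits by the same account.
    """
    hits = [
        cs for cs in changesets
        if cs["user"] == user
        and start <= cs["created_at"] < end
        and cs["changes_count"] >= min_changes
    ]
    return [cs["id"] for cs in sorted(hits, key=lambda cs: cs["created_at"])]
```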

This is going to be a heck of a job to nuke this.

It was a challenge just getting this into JOSM. Kudos to the JOSM developers on making such a robust piece of software.

Reverting this is going to remove 250,000 objects, not including any child nodes and ways, so several million total for sure.

While I wait for the upload dialog in JOSM to appear, it would be awesome if someone could lift the hourly upload cap for me 😃
