Mechanical edit proposal to clean up the street=* tag

The tag street=* is used over 3500 times despite being undocumented. According to the documentation of addr:street=*, it is a possible tagging mistake. Here is an Overpass query showing all elements with street=*, and here is an Overpass query generating a corresponding table.

I did some analysis of the usage of street=*. The 3670 total uses are distributed as follows:

  1. 2844x street=* is set but addr:street=* is not set.
  2. 206x street=* and addr:street=* are both set but have different values.
  3. 620x street=* and addr:street=* are both set and have the same value.
  4. 1784x street=* is used in this amenity=refugee_site in Jordan (these uses are also counted in the cases above, which already add up to 3670). Most of the time it is street=street_... or street=Al_....
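
The split into cases 1 to 3 can be sketched roughly as follows. This is only a minimal illustration, assuming each element's tags are available as a plain Python dict (for example parsed from an Overpass JSON response). The function name `classify_street_usage` and the sample elements are made up for this sketch and are not the analysis that produced the numbers above.

```python
from collections import Counter

def classify_street_usage(tags):
    """Return the case (1, 2 or 3) an element falls into, or None.

    `tags` is assumed to be a dict of OSM tags for one element.
    """
    if "street" not in tags:
        return None  # element does not use street=* at all
    if "addr:street" not in tags:
        return 1  # street=* is set, addr:street=* is not
    if tags["street"] != tags["addr:street"]:
        return 2  # both set, but with different values
    return 3  # both set with the same value

# Hypothetical sample elements, for illustration only
sample = [
    {"street": "yes"},
    {"street": "Main Street", "addr:street": "Main Road"},
    {"street": "High Street", "addr:street": "High Street"},
    {"addr:street": "High Street"},
]

counts = Counter(classify_street_usage(tags) for tags in sample)
```

The comparison is an exact string match, so values that differ only in case or whitespace would land in case 2 and be left for a manual check.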

For 1., street=* is probably a synonym for addr:street=* most of the time, but as there is also street=yes (89x), I don’t think that this should be mechanically edited. For this I propose to create a MapRoulette-challenge for checking everything by hand.

For 2., manual checks are necessary to determine the situation and what to do. I propose to include these cases in the MapRoulette-challenge for checking everything by hand.

For 3. I see a chance of cleaning this up mechanically. I propose to remove street=* where addr:street=* stores the same value.
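
As a rough sketch of what such a mechanical edit would do per object, assuming the tags are handled as a Python dict: the helper name `remove_duplicate_street` is hypothetical, and this is not the tooling that would actually be used for the edit.

```python
def remove_duplicate_street(tags):
    """Return a copy of `tags` without street=* if it duplicates addr:street=*.

    Elements where the two values differ, or where addr:street=* is
    missing, are returned unchanged and left for manual review.
    """
    if "street" in tags and tags.get("addr:street") == tags["street"]:
        return {key: value for key, value in tags.items() if key != "street"}
    return dict(tags)

# Hypothetical example object
before = {"street": "High Street", "addr:street": "High Street", "building": "yes"}
after = remove_duplicate_street(before)
```

Only the redundant street=* key is dropped; addr:street=* and all other tags stay untouched, so no information is lost.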

For 4. I already tried to reach out to the mappers but so far with no success.

What do you think about this plan? Could more cleanup be done mechanically? Should nothing be cleaned up mechanically? Should MapRoulette be used here?

IMO, using MapRoulette to instigate a challenge or two (depending on how you might or do combine them) can act as a proxy for showing interest in the other distribution components. There is pretty much zero downside to doing this, it’s a “win all around.”

I suggest you get that going, see what additional feedback you get from a wider community here and “let this further unfold.” There isn’t any rush, and in fact I’d say (and do, from first-hand experience) that “letting things further unfold” (sometimes, even over many years) is much preferable to a hasty, often ill-conceived quick solution.

Hi @stevea and thank you for your reply.

If I understand you right, you support the plan to use MapRoulette for cleaning up the street=*-tag. But I am not sure what your thoughts are about the proposed mass edit of street=* where addr:street=* contains the same value.

What is “that”? My proposal or your thoughts about MapRoulette?

I agree. It is better to have a slow solution than fast damage to the database. That is why I limit my mass edit proposal to cases where street=* and addr:street=* have the same value. A lot of these values contain the words “Road”, “Avenue” or “Street”. That is something that would be expected for addr:street=*-values.

“That” is indeed starting with MapRoulette. See how it goes, see how much interest is generated. The more that can be completed by a well-designed and well-constructed MapRoulette task, the better (compared to a mechanical edit). If you can’t tell, I’m a big fan of gamification (in OSM), especially MapRoulette: it is a well-developed (continuous development, really) and excellent tool. It not only “gets the job done,” but it can be used to gauge (how much, how fast) interest there is in any given problem domain.

I partly agree. For tagging problems without an absolutely clear solution it is definitely better not to do a mass edit. But I am sure that where addr:street=* is equal to street=*, street=* can be safely removed as it stores the same data but in an uncommon tagging scheme.

As of now one person gave me a “thumbs up”, which I interpret as agreement, and one person suggested to do the whole cleanup with MapRoulette. How do others see this?

No, I didn’t suggest to do the whole cleanup with MapRoulette. But it’s unanimous (so far) that for some aspects of this (which the OP denoted), MapRoulette would at least be a good start.

And yes, the mentioned “uncommon tagging scheme” I’ll agree is exactly that. Let’s let others chime in.

Can you identify a root cause for these issues? Poor documentation? Incorrect editor presets? Some organized (or semi-organized) editing where participants have been given poor instructions? Is there anything else the community can do to reduce the chance of this happening in the future, such as additional editor validation rules?

I agree that a MR challenge is a good place to start making fixes. Many of the examples I looked at had other issues, such as city=* when addr:city=* is probably correct.

An example in that area was added in this import. You also commented on the original changeset, but unfortunately, that user only made 6 edits 9 years ago.

You might have more luck with the last editor of that node, as they were active only a couple of weeks ago. With a bit of luck they might know something about the imported data; if you’re unlucky they were just tagfiddling away the “place” tag.

street and friends have a lot of messy data, with tags used and abused in many different and sometimes unintuitive ways. I don’t like the idea of a mass edit for that, unless there is a very specific and well thought out combination of tags, whitelists and blacklists.

This sounds like a good fit for MapRoulette though! Either the whole thing or in part.

Another point in favour of making all of cases 1-3 an MR challenge. Case 3 only adds 20% to the overall size of the task, so it would seem easier and safer all round to use MR.

You might want to exclude relations, or specifically associatedStreet (not worth the complication).

The only documentation I was able to find about the street=*-key is that it is a possible tagging error according to addr:street=*. So as far as I can tell it is basically undocumented.

I am not aware of any, but I also do not know how to search for this. Is there a list where I can see which tags are in some way recommended by iD or JOSM?

There is this refugee camp I already mentioned. The data import was in 2014 and is documented here. None of these objects are part of my mechanical edit proposal. I will keep trying to contact the people responsible for this. If that does not work out, I would include it in the MapRoulette-Challenge for others to have a look.

There is this area in London. About 90% of the objects where street=* is equal to addr:street=* seem to be in this area, added by one user. I just reached out to this user in a changeset discussion and am now waiting for feedback.

As long as street=* is undocumented, an editor could display a warning like “This is an unusual tag. Consider using addr:street=* or highway=* instead.” After the MapRoulette-Challenge is finished or close to being finished, the people who participated could probably come up with some ideas for such warnings.
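
To illustrate the idea, here is a minimal sketch of such a warning rule. It is not how iD or JOSM implement their validators; the names `UNUSUAL_KEY_HINTS` and `validate_tags` are made up, and the tags are assumed to be a plain Python dict.

```python
# Hypothetical rule table: unusual key -> hint shown to the mapper.
# Further keys could be added once their usage is better understood.
UNUSUAL_KEY_HINTS = {
    "street": "This is an unusual tag. Consider using addr:street=* or highway=* instead.",
}

def validate_tags(tags, hints=UNUSUAL_KEY_HINTS):
    """Return one warning string per unusual key found in `tags`."""
    return [
        f"{key}={tags[key]}: {hint}"
        for key, hint in hints.items()
        if key in tags
    ]

# Hypothetical example object
warnings = validate_tags({"street": "High Street", "building": "yes"})
```

Keeping the hints in a table rather than in code would make it easy to extend the rule later without touching the check itself.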

Maybe tags such as housenumber=*, postcode=* or city=* are also worth a look. There are in total 7495 objects using at least one of these tags. But at first glance I can’t see much of a concept. Sometimes it is city=name of city, so city=* should be name=*. Sometimes it is city=yes, probably to tell that this object is a city. A lot of housenumber=* seem to have a numeric value, as you would expect from an addr:housenumber=*. I suggest to include some more of these keys that lack the addr: prefix in the MapRoulette-Challenge.

I just reached out to them via changeset discussion.

I understand that. That is why I limited my initial proposal to cases where addr:street=* carries the same information as street=*.

If it takes you 30 seconds for every correction via MapRoulette, ca. 600 objects still add up to ca. 5 hours of work. And if there is a safe way to automate this without the risk of damaging data, I am sure that this is something that is worth doing.
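
The estimate can be checked with a few lines. This is just the arithmetic from the paragraph above, with a made-up helper name `manual_effort_hours`; the 30 seconds per object is the assumption stated there.

```python
def manual_effort_hours(objects, seconds_per_object):
    """Total manual effort in hours for fixing `objects` one by one."""
    return objects * seconds_per_object / 3600

# Rounded figure from the text: ca. 600 objects at 30 seconds each
estimate_rounded = manual_effort_hours(600, 30)

# Exact count of case 3 (620 objects) at the same assumed speed
estimate_case_3 = manual_effort_hours(620, 30)
```

With 600 objects this gives 5 hours, and with the exact 620 objects of case 3 slightly more.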

There are exactly 8 street=* on relations, so including/excluding them will probably not change much. But I will consider this.

I think the purpose of the MR challenge is to gain insight from random mappers. It might make sense to group suspected tag duplications by their location, so that within each group the tag is probably being used in the same way.

The same tag may be used in different ways as there is no documentation to form a baseline reference.

The instructions should ask for a simple reason why a change was made. The responses should be enough to start forming a picture of how the tag is being understood in different places.
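
One simple way to do such a grouping is a coarse latitude/longitude grid. The following is only a sketch: the function name `group_by_grid_cell`, the cell size and the coordinates are made up, and each element is assumed to be a dict with `lat` and `lon` keys.

```python
import math
from collections import defaultdict

def group_by_grid_cell(elements, cell_size_deg=1.0):
    """Group elements into square lat/lon cells of `cell_size_deg` degrees."""
    groups = defaultdict(list)
    for element in elements:
        cell = (
            math.floor(element["lat"] / cell_size_deg),
            math.floor(element["lon"] / cell_size_deg),
        )
        groups[cell].append(element)
    return dict(groups)

# Made-up coordinates, for illustration only
elements = [
    {"id": 1, "lat": 51.5, "lon": -0.1},
    {"id": 2, "lat": 51.6, "lon": -0.2},
    {"id": 3, "lat": 32.3, "lon": 36.3},
]

groups = group_by_grid_cell(elements)
```

Each cell could then become its own batch of tasks, so that a mapper sees objects that were probably tagged with the same intention.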

street seems to be safe to remove automatically here

I would like to do the possible mechanical cleanup before creating a MapRoulette-Challenge. The other way around would lead to a MapRoulette-Challenge with lots of already fixed problems.

Instead of a widespread discussion about how to design a MapRoulette-Challenge for further cleanup, I would like to focus the discussion for now on one question: Should street=* be removed by a mechanical edit from objects where addr:street=* stores the same value? Even though this is a consensus-based discussion rather than a democracy, a poll is a good orientation for that:

  • I endorse a mechanical edit where street=* is equal to addr:street=*
  • I oppose a mechanical edit where street=* is equal to addr:street=*

I documented the proposed mechanical edit in the wiki and added an example.

As the poll was open for more than 24 hours and got 9 votes, all supporting the mechanical edit, I executed the edit with changeset 145806155.

@os-emmer Just out of curiosity, could you link the MR challenges once you have created them?

That was my plan :stuck_out_tongue:


I had a look into the usage of housenumber=*. It is used 306x at the moment. About half of these objects seem to have been added by a single user. These look a lot like typos for addr:housenumber=*. I contacted the user in the changeset discussion. There is no object where housenumber=* and addr:housenumber=* store the same value.

Another 80x housenumber=* is used in combination with tags of emergency=* for some kind of fire hydrant or water tank. As this combination is used by several users in different places (all in Russia), I wonder if these were intended uses. Does anyone know anything about this? The examples I checked were created by accounts that have only a few edits and have been inactive for a long time now.

Most of the uses of housenumber=* are probably typos for addr:housenumber=*. Some of them may also be addr:housename=* or ref. I think that this can be included in one MapRoulette-Challenge with the remaining street=*.


postcode=* is undocumented and used exactly 7x, spread around the globe. As the number is so low, it can be included in the MapRoulette-Challenge.

One such edit in Greece was done by a Kaart member. It was properly tagged addr:housenumber initially, but they changed it to housenumber. (That Kaart member specifically has made quite a few mistakes, so it’s not fully surprising.)

Thank you anyway for this search you are conducting.