Implementation of new tagging scheme of archaeological_site=

Casey_boy, January 17

ChillyDL:

As of today, 75,000 of the occurrences have already been mechanically re-tagged

Probably worth just making the distinction that (I think) b-unicycling wasn’t using mechanical/automated re-tagging. They were systematically but, as far as I am aware, manually re-tagging items.

It was mechanical because they were searching for objects with the site_type tag and then removed this tag and added an archaeological_site tag, on every item, until they got too much flak and stopped.

How is this not “mechanical”?

This is, essentially, a terminology change. It doesn’t break the OSM database.

it breaks every single data consumer who expects “site_type” tags

I think @ChillyDL’s plan is a good idea.

Out of curiosity, could you share the script with us here?

The automated edits code of practice implies automated/mechanical edits are:

[changes] made to objects in the database without review individually by the person controlling the edits

I don’t think this is the case here. b-unicycling was manually (individually) editing objects and, one could argue, therefore reviewing them as they did so. Although, I accept, it will depend on one’s definition of what counts as a review…

To be absolutely clear: I still think they should have discussed this change and come up with a transition plan. An accepted proposal isn’t enough. Their edits were systematic but I don’t think they were “mechanical”. Just a semantic distinction.

Again, not disagreeing, see the other part of my post…

But I mentioned “not breaking OSM” because the code of conduct (as written) is about the database not 3rd parties

The purpose of this policy is to avoid the database being damaged.

Again, to be clear. I don’t think this was the right way to go about things but I think it was probably a misunderstanding that an accepted proposal was enough to make widespread changes.

yes, I am aware of this wording, I agree the guidelines are probably not really helpful in the case of “retagging” from tag A to B with the intention to not change anything, e.g. from phone to contact:phone.

If you take the guideline literally, doing so one by one (implying “individual review”) would not be forbidden by the guideline, but if the spirit is to not have a few individuals override what thousands of mappers have “voted with their feet” for, then it doesn’t matter if you do it one by one or in a bigger batch, and you must get approval for such mass editing.

Regarding the individual review, it is very likely that b-unicycling didn’t know all those thousands of objects, so if they were wrongfully tagged as a site_type they will be wrong as an archaeological_site=* now (i.e. maybe no actual review has taken place)

In the end, I decided not to use the script, since Word and Excel would freeze because of the sheer amount of 650,000 lines to process.

I used regular expressions in Visual Studio Code instead, into which I copied the output from level0 for processing:

  1. run overpass query for POIs with tag archaeological_site NOT with tag site_type, load into level0, copy plain text from there.

  2. duplicate lines containing archaeological_site and replace archaeological_site with site_type in the original lines, using this regex in Visual Studio Code:
    Find: (.*)archaeological_site =(.*)
    Replace with: $1site_type =$2\n$1archaeological_site =$2

  3. run overpass query for POIs with tag site_type NOT with tag archaeological_site, load into level0, copy plain text from there.

  4. duplicate lines containing site_type and replace site_type with archaeological_site in the original lines, using this regex in Visual Studio Code:
    Find: (.*)site_type =(.*)
    Replace with: $1archaeological_site =$2\n$1site_type =$2

  5. combine both texts in level0 and upload to OSM.
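For anyone who wants to try the duplication from step 4 outside an editor, here is a minimal sketch of the same find/replace in Python. The sample text is made up and only assumed to resemble level0's "key = value" layout; the names FIND, REPLACE and duplicate_tag are hypothetical and not part of the documented procedure.

```python
import re

# Same pattern as step 4: capture whatever precedes and follows "site_type =".
# MULTILINE makes ^ and $ work per line, like an editor's find/replace.
FIND = re.compile(r"^(.*)site_type =(.*)$", re.MULTILINE)

# Emit the new archaeological_site line first, then the original site_type line.
REPLACE = r"\g<1>archaeological_site =\g<2>\n\g<1>site_type =\g<2>"


def duplicate_tag(level0_text: str) -> str:
    """Add an archaeological_site line for every site_type line."""
    return FIND.sub(REPLACE, level0_text)


# Made-up sample, assumed to look like level0 plain-text output.
sample = (
    "node 1: 50.0, 8.0\n"
    "  historic = archaeological_site\n"
    "  site_type = villa\n"
)

result = duplicate_tag(sample)
print(result)
```

Note that the value line "historic = archaeological_site" is left alone, because the pattern only matches the key followed by " =". Running the function twice on the same text would duplicate the tag again, so it should only be applied to objects that do not yet carry both keys.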

Further details in the documentation.

Just for completeness, you need to search for historic=archaeological_site + site_type=* as some usage of site_type=* is not for historical sites (and so we wouldn’t want to add archaeological_site to them).

And, of course, commercial projects may not want to document their tag-parsing “secret sauce” on taginfo.

(unrelated to archaeological sites, as was the complaint, but)

Experience suggests that that is unlikely to occur. In the meantime it probably makes sense to manually review anything that you have contributed by email (and maybe wait until you can contribute not by email).

(back on topic)

Yes, and that actually applies to pretty much any proposal that suggests “deprecating” existing usage of a particular tag without giving any thought to how the proposal can be implemented. Another recent example was here - good idea (in this case to harmonise tags for “diplomatic” entities), poorly documented proposal and even poorer implementation that was only spotted due to “objects disappearing”.

I am a bit stuck - I am about to upload the POIs with duplicate tags now, but level0 refuses to even load the data (too many?), while in JOSM I keep running into conflicts while uploading so I have to restart every time.
Is there a way I can just upload the non-conflicting POIs in JOSM?

Or would you have any other idea?

I absolutely would not use a “shoot yourself in the foot” tool like level0 for something like this. I’d use JOSM, and the workflow I’d use would be - query data according to a published overpass query that people have agreed to, change the data that people have agreed to, upload the data.

However, before doing anything else, I’d make sure that the discussion was complete, and I’m not sure it is, yet (I don’t think you’ve addressed this comment, and the fact that you’re even thinking about using level0 for this suggests a misunderstanding of the best way to work with OSM data).

Thanks for the point. I take care of this in two ways:

  • there are no POIs tagged site_type= that are not part of the historic= group
  • the regex in the overpass query makes sure only POIs in the historic= group are selected:
nwr[~":*historic$"~".*"]["site_type"]
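To see which keys the key pattern in that query would accept, here is a small sketch that applies the same regular expression in Python. It assumes Overpass treats the key regex as an unanchored search, which for a pattern this simple should behave like Python's re.search; the list of keys is made up for illustration and key_selected is a hypothetical helper.

```python
import re

# Key pattern copied from the query: optional colons, then "historic" at the end.
KEY_PATTERN = re.compile(r":*historic$")


def key_selected(key: str) -> bool:
    """True if the key pattern matches somewhere in the key."""
    return KEY_PATTERN.search(key) is not None


# Made-up keys, for illustration only.
keys = ["historic", "was:historic", "historic:civilization", "prehistoric"]
matches = {key: key_selected(key) for key in keys}

for key, hit in matches.items():
    print(f"{key}: {hit}")
```

Under that assumption the pattern accepts any key that ends in "historic", including prefixed ones, and the value part ".*" accepts any value. It constrains only the key, so it does not by itself require historic=archaeological_site.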

That’s what I am doing.
I am at “upload the data” at the moment, and I wonder if there is a way to deal with the conflicts later and get the rest uploaded already. That is my question.

The best way to deal with conflicts is not to have them in the first place. The fact that you’re seeing conflicts presumably means that something has gone wrong.

Can you give an example of a conflict that you are getting?

I suppose the way to go is to not have everything in one global changeset with ca. 100,000 POIs. With this amount of POIs, it is just rather likely that somebody will upload a change while I am working with the data. I suppose this is what happened. Also, smaller changesets seem easier to digest for the other mappers reviewing.
I just had to leave it now and will only come back to it after the weekend. Better not to do this in a hurry.
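Splitting the work into smaller pieces could look roughly like this. It is only a sketch of the batching idea: it does not talk to the OSM API or to JOSM, the function name batches is hypothetical, and the batch size of 10 is an arbitrary value for illustration.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def batches(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Made-up object ids; a real run would take them from the agreed query result.
object_ids = list(range(1, 26))
chunks = list(batches(object_ids, 10))

for number, chunk in enumerate(chunks, start=1):
    print(f"batch {number}: {len(chunk)} objects")
```

Each batch would then be downloaded fresh, edited and uploaded as its own changeset, which keeps the window for conflicting edits by other mappers short.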

The requirement for previous discussions of large-scale edits is, among other things, to ensure that those intending to perform the edit can retrieve the necessary advice in advance, rather than starting something half-baked, attracting complaints, and then stopping midway :wink:

I am reverting your edits now in the hope that they can either be done by someone more knowledgeable or that you acquire the necessary skills before you try again. I also found it a bit problematic that you used this thread as a justification for your mass edit even though the discussion had not concluded.

Note that there are also some (few) cases where site_type:<language> has been used.

Example: Node: Römischer Gutshof (3457334492)

@woodpeck Thank you!

For something like this, I would first expect a comparison of the old and new tagging, documented in a suitable place, e.g. the wiki. That’s how I would do it if I were considering the hot potato of mass edits.
It must also be ensured that all common editors, such as iD and JOSM, evaluate the new key. Applications such as Historische Objekte are part of this as well, hence the documentation.

Sven

these weren’t part of the proposal anyway :wink:

in all languages these are less than 20 on the globe…

All right, I will hold my horses. Seriously, I had the impression the discussion had come to an end, with no votes against after a week and nobody having said anything for two days.

I feel some might be unaware of the documentation of the suggested mechanical edit; quite a few of the concerns raised here have been addressed there already. I am posting it again.

For the record, apparently it is not possible to upload just the non-conflicting POIs with JOSM and deal with the conflicts later. Just in case you, too, ever want to pick up the 70,000 pieces after a not-so-fortunate mass edit …

Sorry, votes? What votes? Were we supposed to “vote” somewhere on your proposed implementation?

For my part I was waiting for you to follow what it says here - document how you were going to perform your planned edit. Earlier I had said

I (and possibly other people) were waiting for you to publish the overpass query that you were going to use.

Note also that the code of conduct says “Execute only a small number of edits with a new bot at beginning before proceeding with larger edits” so I absolutely wasn’t expecting you to try and do the whole planet at once …
