Implementation of the new tagging scheme for archaeological_site=

In the end, I decided not to use the script, since Word and Excel would freeze under the sheer volume of 650,000 lines to process.

Instead, I used regular expressions in Visual Studio Code, into which I copied the output from level0 for processing:

  1. Run an Overpass query for POIs with the tag archaeological_site but NOT the tag site_type, load the result into level0, and copy the plain text from there.

  2. Duplicate every line containing archaeological_site, so that each object ends up with both a site_type and an archaeological_site line carrying the same value, using this find/replace regex in Visual Studio Code (a short sketch of the transformation follows after this list):
    Find: (.*)archaeological_site =(.*)
    Replace with: $1site_type =$2\n$1archaeological_site =$2

  3. Run an Overpass query for POIs with the tag site_type but NOT the tag archaeological_site, load the result into level0, and copy the plain text from there.

  4. Duplicate every line containing site_type, so that each object ends up with both an archaeological_site and a site_type line carrying the same value, using this find/replace regex in Visual Studio Code:
    Find: (.*)site_type =(.*)
    Replace with: $1archaeological_site =$2\n$1site_type =$2

  5. Combine both texts in level0 and upload to OSM.
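To make the find/replace in steps 2 and 4 concrete, here is a minimal Python sketch of the same transformation on a made-up level0-style snippet; the node, its coordinates and the tag value are hypothetical, and the actual workflow uses Visual Studio Code rather than a script.

```python
import re

# Hypothetical level0-style input; the real text is what gets copied out of
# level0 (node id, coordinates and tag value are made up for illustration).
level0_text = """node 1: 50.0, 8.0
  historic = archaeological_site
  archaeological_site = tumulus
"""

# Mirrors the Find/Replace of step 2: every archaeological_site line is
# duplicated, with one copy renamed to site_type.  The historic line is left
# alone because it does not contain "archaeological_site =".
pattern = re.compile(r"(.*)archaeological_site =(.*)")
result = pattern.sub(r"\1site_type =\2\n\1archaeological_site =\2", level0_text)

print(result)
# node 1: 50.0, 8.0
#   historic = archaeological_site
#   site_type = tumulus
#   archaeological_site = tumulus
```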

Further details in the documentation.

1 Like

Just for completeness, you need to search for historic=archaeological_site + site_type=* as some usage of site_type=* is not for historical sites (and so we wouldn’t want to add archaeological_site to them).

2 Likes

And, of course, commercial projects may not want to document their tag-parsing “secret sauce” on taginfo.

2 Likes

(unrelated to archaeological sites, as was the complaint, but)

Experience suggests that that is unlikely to occur. In the meantime it probably makes sense to manually review anything that you have contributed by email (and maybe wait until you can contribute by means other than email).

(back on topic)

Yes, and that actually applies to pretty much any proposal that suggests “deprecating” existing usage of a particular tag without giving any thought to how the proposal can be implemented. Another recent example was here: a good idea (in this case, to harmonise tags for “diplomatic” entities), a poorly documented proposal, and an even poorer implementation that was only spotted due to “objects disappearing”.

1 Like

I am a bit stuck: I am about to upload the POIs with duplicate tags now, but level0 refuses even to load the data (too many objects?), while in JOSM I keep running into conflicts during upload, so I have to restart every time.
Is there a way I can upload just the non-conflicting POIs in JOSM?

Or would you have any other idea?

I absolutely would not use a “shoot yourself in the foot” tool like level0 for something like this. I’d use JOSM, and the workflow I’d use would be: query the data according to a published Overpass query that people have agreed to, make the changes that people have agreed to, and upload the data.

However, before doing anything else, I’d make sure that the discussion was complete, and I’m not sure it is, yet (I don’t think you’ve addressed this comment, and the fact that you’re even thinking about using level0 for this suggests a misunderstanding of the best way to work with OSM data).

Thanks for the point. I take care of this in two ways:

  • there are no POIs tagged site_type= that are not part of the historic= group
  • the key regex in the Overpass query makes sure only POIs in the historic= group are selected (a sketch of the complete query, with assumed settings, follows below):
nwr[~":*historic$"~".*"]["site_type"]

That’s what I am doing.
I am at “upload the data” at the moment, and I wonder if there is a way to deal with the conflicts later and get the rest uploaded already. That is my question.

The best way to deal with conflicts is not to have them in the first place. The fact that you’re seeing conflicts presumably means that something has gone wrong.

Can you give an example of a conflict that you are getting?

I suppose the way to go is not to have everything in one global changeset with ca. 100,000 POIs. With this number of POIs, it is rather likely that somebody will upload a change while I am working with the data; I suppose this is what happened. Also, smaller changesets seem easier to digest for the other mappers reviewing.
I just had to leave it now and will only come back to it after the weekend. Better not to do this in a hurry.

The requirement to discuss large-scale edits in advance exists, among other things, so that those intending to perform the edit can get the necessary advice beforehand, rather than starting something half-baked, attracting complaints, and then stopping midway :wink:

I am reverting your edits now in the hope that they can either be done by someone more knowledgeable or that you acquire the necessary skills before you try again. I also found it a bit problematic that you used this thread as a justification for your mass edit even though the discussion had not concluded.

4 Likes

Note that there are also some (few) cases where site_type:<language> has been used.

Example: Node: Römischer Gutshof (3457334492) | OpenStreetMap

@woodpeck Thank you!

I would expect a comparison of old to new first, with something like this, documented in a suitable place, e.g. the wiki. That’s how I would do it if I were taking on the hot potato of mass edits.
It must also be ensured that all common editors, such as iD and JOSM, evaluate the new key as well. Applications such as Historische Objekte are also part of this, hence the documentation.

Sven

Translated with DeepL (free version)

these weren’t part of the proposal anyway :wink:

across all languages, these number fewer than 20 on the globe


All right, I will hold my horses. Seriously, I had the impression the discussion had come to an end, with no votes against after a week and nobody having said anything for two days.

I feel some might be unaware of the documentation of the suggested mechanical edit – quite a few of the concerns raised here have already been addressed there. I am posting it again.

For the record, apparently it is not possible to upload just the non-conflicting POIs with JOSM and deal with the conflicts later. Just in case you, too, ever want to pick up the 70,000 pieces after a not-so-fortunate mass edit


Sorry, votes? What votes? Were we supposed to “vote” somewhere on your proposed implementation?

For my part I was waiting for you to follow what it says here - document how you were going to perform your planned edit. Earlier I had said

I (and possibly other people) were waiting for you to publish the overpass query that you were going to use.

Note also that the code of conduct says “Execute only a small number of edits with a new bot at beginning before proceeding with larger edits” so I absolutely wasn’t expecting you to try and do the whole planet at once 


1 Like

“Votes” as in “vota”, “opinions”. There was broad support, that was my point, and then two days of silence.

The documentation on the planned edit had been on the wiki for days already at that time, including the Overpass query, the pilot I ran after some smaller tests, and the global scope of my suggestion, the latter two of which I had also posted right here above: 1st 2nd

@ChillyDL To me it seems there is support for giving users some time to adapt before any mechanical edit happens. (As always, correct me if I’m wrong.)

In the Germany forum, @streckenkundler suggested simultaneously running a quality check on the values of site_type= while re-tagging. To keep the discussion together, I would like to address this here.

To me, the task here is effectively to re-name the tag site_type=. Due to the circumstances, this will be done with an interim step so data consumers can adapt, but it still is a re-naming task. There is built-in quality assurance in my above proposal, as only those POIs that bear tags of the historic= group will be re-named in the bulk edit. Presently all do.

I suggest keeping changes to the values of site_type out of this bulk edit. Of course, there should be quality control of the values of site_type= / later archaeological_site=, and I have busied myself with exactly this over the last two years in my daily editing routine and intend to keep doing so. But I would keep this separate from the present re-naming operation.

@streckenkundler named three values specifically that raised his concern: rock_painting, roman_road and industrial. With rock_painting and roman_road, I believe it is safe to trust their historic= tagging and go ahead with the tag re-naming. Of industrial, there is only a handful left tagged historic=archaeological_site; I looked through them individually and they seem alright (a sketch of such a check follows below).
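As a rough illustration of that manual check (my sketch, not something quoted from the thread), an Overpass query listing the remaining industrial cases could be assembled like this:

```python
# Hypothetical Overpass query listing the remaining site_type=industrial
# objects tagged as archaeological sites, for a final manual check before
# the rename.
query = """
[out:json][timeout:120];
nwr["historic"="archaeological_site"]["site_type"="industrial"];
out tags center;
"""
print(query)  # paste into overpass turbo or send to the Overpass API
```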

You can find a list of the affected values in the documentation.

I agree that semantic and syntactic edits should be separate steps.

1 Like