Automated edit proposal - convert non-standard dashes to standard dashes

Looks like a good idea to me! Does anyone know more than I do about querying this specific data in Overpass?

This could be a starting point:

In keeping with the mood in this thread, you could add all the other quotation mark characters as alternatives to the ASCII double straight quotation mark. :smile:


Something like this I think: nwr[opening_hours!~"[\"”„“]"] (I don’t have them all on my keyboard)


This doesn’t exclude tags such as opening_hours=opens on fridays, of course.

I finally got around to working on the data. I manually checked these 37+44 elements through MapRoulette and “fixed” the dashes (I also fixed the opening hours where I could, although I left many of the more complex ones untouched).

MapRoulette 1
MapRoulette 2

After checking all these elements, I can safely affirm that an automatic change would not negatively affect the data at all. Of course, as a first iteration, it was good to be more conservative, but in the future, an automatic edit can be used even for these more specific cases.

The next step is to write the wiki page with the proposal. When I have some free time again, I’ll work on that and let you all know.


Initially I just want to work with the opening_hours tag, and I don’t want to touch dashes in other tags (e.g. name). Since I was having trouble with regular expressions and text editors, I lazily came up with (AKA asked ChatGPT for) a very simple Python script:

I made some manual checks and it seems to work. It also covers all the dashes listed on the Wikipedia page mentioned here, so it’s an improvement.
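For readers who want to see the idea, here is a minimal sketch of a dash-normalizing script. It is not the script used for the changeset: the function names `normalize_dashes` and `fix_tags` are hypothetical, and the character list is an assumption based on common Unicode dash-like characters rather than the exact set taken from the Wikipedia page.

```python
# Assumed set of dash-like characters; the real edit may have used a different list.
NON_STANDARD_DASHES = (
    "\u2010"  # hyphen
    "\u2011"  # non-breaking hyphen
    "\u2012"  # figure dash
    "\u2013"  # en dash
    "\u2014"  # em dash
    "\u2015"  # horizontal bar
    "\u2212"  # minus sign
    "\uFE58"  # small em dash
    "\uFE63"  # small hyphen-minus
    "\uFF0D"  # fullwidth hyphen-minus
)

# Translation table mapping every character above to the ASCII hyphen-minus.
_DASH_TABLE = str.maketrans({ch: "-" for ch in NON_STANDARD_DASHES})


def normalize_dashes(value):
    """Replace each non-standard dash in value with an ASCII hyphen-minus."""
    return value.translate(_DASH_TABLE)


def fix_tags(tags):
    """Return a copy of tags in which only opening_hours is normalized."""
    fixed = dict(tags)
    if "opening_hours" in fixed:
        fixed["opening_hours"] = normalize_dashes(fixed["opening_hours"])
    return fixed
```

Restricting the change to the opening_hours key keeps dashes in tags such as name untouched, which matches the scope described above.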

I uploaded a changeset covering just Brazil, and you can find the wiki page here.

Anything else I should consider?

Out of curiosity, how many of the non-standard dashes are just regular time interval separators?

/\d\d:\d\d\s?NONSTANDARD\s?\d\d:\d\d/

(or equivalently, weekday intervals)

If that’s the vast majority of the cases, it’s perhaps worth just dealing with those “simple” cases mechanically and reviewing everything else.
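One way to answer that question is to check, for each value, whether every non-standard dash sits inside a plain time range or weekday range. The sketch below does this in Python; the dash set, the two-letter weekday list, and the function name `only_simple_dashes` are assumptions for illustration, and the input values would come from an Overpass export.

```python
import re

# Assumed set of non-standard dashes to look for.
DASHES = "\u2010\u2011\u2012\u2013\u2014\u2015\u2212"
DASH = f"[{DASHES}]"

# hh:mm DASH hh:mm, with an optional space on either side of the dash.
TIME_RANGE = re.compile(rf"\d\d:\d\d\s?{DASH}\s?\d\d:\d\d")

# Weekday DASH weekday, e.g. Mo–Fr.
DAY = r"(?:Mo|Tu|We|Th|Fr|Sa|Su)"
DAY_RANGE = re.compile(rf"{DAY}\s?{DASH}\s?{DAY}")

ANY_DASH = re.compile(DASH)


def only_simple_dashes(value):
    """True if no non-standard dash remains after removing simple ranges.

    Values that contain no non-standard dash at all also return True.
    """
    remainder = TIME_RANGE.sub("", value)
    remainder = DAY_RANGE.sub("", remainder)
    return ANY_DASH.search(remainder) is None
```

Values for which this returns False would be the ones left for manual review.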

I’m not sure I understood properly, but using an online regex service on the latest output from Overpass, I couldn’t find any matches. Did you take a look as well?

10 days since last message, so I performed the edits.

You can find them all here (check the latest edits from my user): Changesets by matheusgomesms-import | OpenStreetMap

It was a good exercise and an easy fix that I think will correct many POIs. Obviously there are many things still to be fixed, so a MapRoulette task could be used for those (local language knowledge required, though).

Also, what stood out was that in Japan some opening hours were in the wrong format because of the different character set used there. A more focused task could also be done there (converting the digit and colon characters, for example).
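As a rough illustration of what such a focused task might involve, the sketch below maps fullwidth digits, the fullwidth colon, the fullwidth hyphen-minus, and the ideographic space to their ASCII counterparts. This is an assumption about which characters matter; it was not part of the edits described here, and `to_halfwidth` is a hypothetical name.

```python
# Fullwidth digits U+FF10..U+FF19 map to ASCII 0..9.
FULLWIDTH_TABLE = {0xFF10 + i: ord("0") + i for i in range(10)}
FULLWIDTH_TABLE[0xFF1A] = ord(":")  # fullwidth colon
FULLWIDTH_TABLE[0xFF0D] = ord("-")  # fullwidth hyphen-minus
FULLWIDTH_TABLE[0x3000] = ord(" ")  # ideographic space


def to_halfwidth(value):
    """Replace the fullwidth characters listed above with ASCII equivalents."""
    return value.translate(FULLWIDTH_TABLE)
```

An explicit table is used instead of a blanket Unicode normalization so that only the listed characters change and everything else in the value is left alone.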

I intend to repeat this maybe every 6 months or once a year. Let’s see what the situation in OSM looks like in 6 months.


Here are some numbers on opening_hours (plus collection_times etc.). The data should be quite recent, not sure if your change is included in the numbers, though.

  • there is a total of 3502562 opening hours strings in OSM (not including opening hours strings within conditional strings)

  • 96.10% are considered valid¹

  • 2.99% are invalid but can usually be unambiguously parsed by a lenient parser

Of the latter, here is a list (the number in front is the number of times that unique string appears in OSM):

https://raw.githubusercontent.com/westnordost/osm-opening-hours/master/src/jvmTest/resources/invalid_but_unambiguous_opening_hours.tsv


¹ What exactly is considered valid differs a bit from parser to parser. E.g. the “reference implementation” parser does not understand everything that is in the spec, while understanding other constructs that are not in the spec. In this case: considered valid by my own osm-opening-hours parser.


FWIW, StreetComplete considers opening hours that can be unambiguously parsed but are invalid according to the spec as immediately due for re-survey, i.e. an opening hours quest is created. (And completely invalid opening hours strings anyway.)

When the user then acknowledges that the displayed times are still correct or edits the opening hours, a valid opening_hours in canonical form is saved.

In general, the app asks once every year whether the opening hours are still correct. This only works if either the shop hasn’t been edited for at least one year or a check_date:opening_hours with a date more than one year old has been set. I.e. for most shops whose opening hours syntax you just corrected, the app won’t ask whether the opening hours are still correct for another year.


6 months later, here I am again!

Interestingly, according to Overpass there are now 526 nodes, 144 ways, and 1 relation with non-standard dashes. I expected a smaller number after 6 months, but it is what it is.

Does anybody object to my doing another round of fixes? I believe the first one didn’t break anything; quite the contrary.


Well, I do wonder how these dashes get inserted in the first place. An en dash or em dash usually doesn’t exist on a normal keyboard, right? So maybe a certain app or automated import is causing this. If that is the case, it would be more meaningful to root out the source.

The obvious downside of a mass edit is that the last-edit date of the elements is also updated, making those elements appear up to date to software that evaluates it (like StreetComplete) even if they are not.

Also, there are many more opening hours strings that are invalid according to the spec but still unambiguous enough that a lenient parser can understand them. From the numbers above, this would be about 100,000 opening hours. So, your edit would just fix less than 1% of these invalid but unambiguous opening hours.
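The estimate above can be reproduced with a few lines of arithmetic. The sketch uses only figures quoted in this thread, and it assumes that the element count from Overpass and the string counts from the parser statistics are comparable, even though they were taken at different times.

```python
total_strings = 3_502_562           # total opening hours strings quoted above
invalid_unambiguous_share = 0.0299  # 2.99% invalid but leniently parseable
dash_elements = 526 + 144 + 1       # nodes + ways + relation with non-standard dashes

# Estimated number of invalid but unambiguous strings.
invalid_unambiguous = total_strings * invalid_unambiguous_share

# Assumption: every element with a non-standard dash falls in that group.
dash_share = dash_elements / invalid_unambiguous

print(f"invalid but unambiguous: about {invalid_unambiguous:,.0f}")
print(f"share fixed by the dash edit: about {dash_share:.1%}")
```

This gives roughly 105,000 strings in the invalid-but-unambiguous group, of which the dash fix covers well under 1%.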


From a visual inspection, I believe the main source is people copying and pasting from websites. Some mappers just paste as-is; others fix the syntax but forget or can’t see the different dashes (the difference is almost impossible to see in iD, for example).

I also believe some languages use different chars, leading to this problem too.

I’m not trying to fix all ambiguous OH, the idea here is to quickly fix non-standard dashes (right now I’m taking more time to create this message than to perform the fix).

At a quick glance in Overpass, you can see some cases where the opening hours will be COMPLETELY fixed by this quick fix:

Doesn’t StreetComplete use check_date for that?


Other automated edits can be performed to fix other parts of Opening Hours (such as MO → Mo; Monday to Friday → Mo-Fr etc), but this fix is not intended to fix those cases.

Copy and paste from something on the internet would be my guess - but you’re right that “actually asking people about the source” is the way to go.

Another source of these characters is that some operating systems helpfully convert typewriter punctuation to “smart” punctuation. iD explicitly disables the autocorrection feature in Safari on macOS and iPadOS, but both Go Map!! and Every Door insert smart punctuation by default on iOS. This mainly affects curly quotes, but you can also get an em dash by typing --, for example.


See Automated edit proposal - convert non-standard dashes to standard dashes - #14 by SimonPoole

After this new round of discussion, I did the fix again:

https://www.openstreetmap.org/user/matheusgomesms-import/history#map=2/19.8/10.5

547 nodes, 142 ways, and 1 relation were edited. It took me 30 min, because I was manually trying to upload smaller bboxes (some of them are not small and for sure I’ll receive some complaints). If not for that, it would take me about 2 minutes to edit all of them in the whole world.

Let’s see how it goes again in 6 months. Thank you all for the discussion! :smiley:


You seem to have missed some.

There is also the question of what to do with opening_hours:covid19. Is that still a thing anywhere?


Have you tried following the advice of @SimonPoole? If some editors keep adding these characters, maybe it would be worth reporting that to the maintainers of the editing software.