What do you think about importing opening_hours data from AllThePlaces?

I added these concerns to the list in the first post.

As for “vs ground truth”: is it actually something that happens? In my experience these chains list accurate info on their websites.
Tiny shops, though, often have outdated info.

As for divergence of effort: it is balanced by the fact that effort that would go into ATP anyway is also useful for OSM.

There’s a postal service here whose local branch has different opening hours than the ones listed on the chain’s main website. In this case the hours posted on the door are the correct ones, not the ones on the website.
It may be a rare occurrence, but because it exists, I would assume that it does actually happen.

1 Like

This aspect can be addressed by checking the source listed by the person who added the original hours (whether it includes Survey or Local Knowledge) or, more strictly, by simply combing through conflicting opening hours manually.

The thing is, you never know how accurate “web truth” is in this case. OSM data surveyed on the ground might also be outdated after a while. The only truth you can actually trust is current ground truth. So as soon as OSM has existing data, the only suitable way of updating it is checking locally.

I can understand, there are other services with a better coverage. Maybe just use that service and not OSM for that problem.

For myself, I value ground truth in OSM, because then I know that on date xyz someone surveyed the opening hours locally as 8 to 5. Based on the survey date I can form my own opinion about how much I trust it.

Importing stuff from a website makes this impossible. The only thing left is that it was copied from that service on date xyz. But it might have been surveyed/scraped 5 years ago or yesterday.

You could also add opening hours data only to objects that don’t yet have an opening_hours value, if I’m understanding correctly, and then leave existing data to be checked manually if it conflicts with ATP. That solves to some degree the “overwriting better on-the-ground data” problem, and it’s what I did for all of my ATP imports in the US.
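To make the rule concrete, here is a minimal sketch of that decision for one matched object. The function name and the action labels are made up for illustration and are not part of ATP or any OSM tool. It also compares the raw strings, which is an assumption: two differently written opening_hours values can describe the same hours, so a real import would want to normalise them first.

```python
def plan_opening_hours_edit(osm_tags, atp_hours):
    """Decide what to do with one OSM object matched to an ATP entry.

    Hypothetical helper: osm_tags is the object's tag dict,
    atp_hours is the opening_hours string from ATP.
    """
    existing = osm_tags.get("opening_hours")
    if existing is None:
        # No hours mapped yet: safe to add the ATP value.
        return ("add", atp_hours)
    if existing == atp_hours:
        # Naive string comparison; nothing to change.
        return ("skip", None)
    # Existing value conflicts with ATP: never overwrite automatically.
    return ("manual_review", None)
```

With this rule an automated edit can only ever fill gaps; every conflict ends up on a list for a human to look at.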

6 Likes

I agree with this sentiment. There are clearly challenges with such an import, but from the discussion so far, I get the impression that these are on @Mateusz_Konieczny’s radar. And given his prior experience with automated edits, I’m confident it wouldn’t be conducted recklessly.

Current ground truth is the ideal. But we often don’t have that, and in the absence of current ground truth, I prefer “web truth” over outdated ground truth or no data at all.

So while one obviously shouldn’t overwrite recently surveyed opening hours, I do believe importing opening hours data for objects which lack them, or which haven’t had their opening_hours or check_date updated for a long time, would be helpful. One shouldn’t have to use other services instead of OSM for this common use case.

13 Likes

I disagree with this point. It can’t be in our interest that a mapper needs to frequently update pseudo-data in order to protect what he mapped in OSM. If someone wants to update existing data, he needs to prove that the new data is better.

1 Like

Why would the 6 years old opening_hours=Mo-Fr 09:00-13:00, 14:00-18:00 be assumed better than opening_hours=Mo-Th 08:00-12:30, 13:00-17:00; Fr 08:00-14:00 from the website?

I think we should basically always add check_date:opening_hours to make this decision easier.

10 Likes

I am fine with everything else written in your comment but:

I have often found webpages with outdated opening_hours. Additionally, this data could simply be wrong because of typos, etc.

Unless we have the information when the opening_hours on the website were last updated, I would assume it should be Jan 1st 1970 and therefore the existing OSM data would be more up-to-date.

1 Like

If the ATP Dataset reaches a certain level of quality AND the existing data has not been added/changed/checked in some time, I would accept this as “proof”.

I agree, but to be safer we should specify for this proposal how long ago “some time” is.
Personally, I would treat opening hours that haven’t been updated for at least 4 years as possibly outdated, especially due to the pandemic, which altered the operating hours of businesses/services even after the curfews stopped being applied.
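A rough sketch of how such a cutoff could be checked, assuming we know when opening_hours was last changed (e.g. from the object history) and also honour check_date:opening_hours when it is present. The names and the 4-year constant are only illustrative and would need to be agreed on.

```python
from datetime import date, timedelta

# "At least 4 years" as suggested above; an assumption to be tuned.
STALE_AFTER = timedelta(days=4 * 365)

def last_confirmed(tags, last_hours_edit):
    """Latest date on which the hours were edited or explicitly checked."""
    dates = [last_hours_edit]
    raw = tags.get("check_date:opening_hours")
    if raw:
        try:
            dates.append(date.fromisoformat(raw))
        except ValueError:
            pass  # ignore check dates that are not full YYYY-MM-DD
    return max(dates)

def is_stale(tags, last_hours_edit, today):
    """True if the hours were not edited or checked within the cutoff."""
    return today - last_confirmed(tags, last_hours_edit) > STALE_AFTER
```

A mapper who re-checks the hours and sets check_date:opening_hours resets the clock without touching the opening_hours value itself.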

3 Likes

Unless it’s a single Mom & Pop store (which ATP doesn’t really import, it’s really only chain stores), I would think the stores maintain the opening hours on their websites. That is after all where people would look them up, and they’d get feedback from customers if they’re wrong.

At least in Norway we have no issue importing this regularly with spiders similar to ATP.

4 Likes

I think it really depends on the source quality. If some chain’s opening hours are known to be updated daily on its website, we could shorten the duration a lot. If ATP quality were so bad that we regarded even 4-year-old opening hours as better, then we should stop this adventure right now. I guess (and hope) ATP is better. Maybe someone could share some examples of how good or bad it is?

3 Likes

I tested data for some chains in Poland, and in general there is a 100% match for brand chains between actual opening hours and what is on their websites (Żabka, Lidl, Auchan, Biedronka, OBI, Castorama and so on). This is just personal local knowledge as a result of living there.

A major Poland-specific problem is the concept of trading Sundays, which cannot be modelled in the OSM opening_hours model, but it also affects surveying opening hours.

Żabka opening hours were broken in ATP and got fixed after I reported the problems. For the others I was holding off on a deeper investigation of the quality of ATP crawling, where some errors may remain. I was expecting the new government to get rid of non-trading Sundays, but it seems that is not happening any time soon.

I spot-checked some opening hours when I travelled to Germany and they matched between reality, ATP and websites.

When travelling across Europe in my experience claimed opening hours on websites of major chains matched exactly actual opening hours. For non-chain shops it was common to see outdated websites, but these are not in ATP. I had no opportunity to compare these with ATP (except limited checks in Germany)


So far every single case that I have seen where website of major chain mismatched OSM - OSM was wrong.

13 Likes

If you can make sure the website was updated after the OSM survey, or is more accurate than what the mapper surveyed, go for it. Just keep in mind that a website can also be 10 years old and contain outdated data.

1 Like

With big chain shops I have fewer concerns. They usually have enough manpower or tool chains to keep the opening hours on their website correct. My concerns are more about individual shops, as they do not frequently update their websites. Just scraping those websites without validating might not be a good solution.

Just to note that many of the chains on ATP operate on a franchise model, where individual store owners have an agreement with the brand owner to use the branding and sell the brand’s products. In such cases, there may be a weaker link between the store info and what’s shown on the brand’s website. The opening hours on the brand’s website may rely on the owner remembering to log in somewhere and update them whenever they change, and they might not always remember to do this.

I’ve also come across cases where a chain doesn’t remove pages for old stores or has duplicate pages (with different opening hours) for the same physical store. There may also be issues with ATP interpreting the opening hours data on the store web pages.

So there are certainly issues with the data in ATP. I suspect if we want to use it in OSM, we’d need some sort of manual review on a chain-by-chain basis, to decide how trustworthy it is. There could be different levels, ranging from “don’t import at all” to e.g. “overwrite manual changes made more than a week ago”.
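As a sketch of what such per-chain levels might look like in practice: a reviewed table that says, for each brand, how old existing OSM hours must be before they may be overwritten. The brand names and thresholds below are placeholders, not review results, and the whole structure is just one possible way to encode it.

```python
from datetime import timedelta

# Hypothetical outcome of a manual chain-by-chain review.
# None means "don't import at all"; otherwise the value is the minimum
# age of the existing OSM hours before they may be overwritten.
BRAND_POLICY = {
    "ExampleFranchise": None,                    # website data not trusted
    "ExampleChain": timedelta(days=4 * 365),     # only replace very old hours
    "ExampleDailyFeed": timedelta(days=7),       # trusted, updated daily
}

def may_overwrite(brand, age_of_osm_hours):
    """True if existing OSM hours of this age may be replaced for this brand."""
    if brand not in BRAND_POLICY:
        return False  # brands nobody has reviewed are excluded by default
    min_age = BRAND_POLICY[brand]
    if min_age is None:
        return False
    return age_of_osm_hours > min_age
```

Defaulting unreviewed brands to "no" keeps the import opt-in per chain, which seems to match the cautious approach discussed here.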

10 Likes

It’s hard to say globally. In France, post office opening hours are published as Open Data and are always up to date. I selected an office at random. It was created 11 years ago thanks to OD. In 2020, David Faure created a bot to retrieve the opening hours from another OD data set. That was the first time opening hours were added to this object. The 7 following updates of the opening hours were also made by the bot. The object has been edited in the meantime by others, but not this attribute. I doubt a contributor would have recorded that on 2023 Oct 12 it was open from 14:00 to 17:30. So the source is not ATP, but the logic is the same: data based on another service.

For one supermarket chain, the opening hours on the global website used to show what the hours could be rather than what they actually were. This chain uses a franchise model, which may explain a lot. I see that the opening hours on the website are now correct, while the ones in OSM (last updated in 2021) are not, so ATP would have been a plus now.
So, let’s check brand by brand to see if some should be excluded.

3 Likes

OK, brand-by-brand checking would definitely be needed.

It seems that I should not assume that the IT of shop brands works as well globally as it does in Poland and the places I visited.

“The world is more complicated than you expected” appears once again.

I already have some ideas for how to do it without expending a monstrous effort on verification. I “only” need to implement this and test whether it works well.

7 Likes

Well, that is a rather pessimistic view (that the website hasn’t been updated since 1970-01-01), but it might be a safe bet.


On the other hand, one can automatically detect changes in the ATP dataset – e.g. if a shop had one opening_hours value in ATP on 2025-01-15 and a different one on 2025-01-16, it can automatically be determined that the opening_hours likely changed on 2025-01-16, and thus if the OSM opening_hours wasn’t updated after 2025-01-16, it is probably incorrect and the recently updated web source should be preferred.

(or perhaps ATP already keeps track of such metadata about when store information last changed? It would save a step)
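A minimal sketch of that comparison, assuming we keep our own list of dated ATP snapshots per shop (I don’t know whether ATP exposes this itself). The function names and the data layout are made up for illustration.

```python
from datetime import date

def last_change_date(snapshots):
    """Date of the most recent change in a shop's ATP opening_hours.

    snapshots: list of (date, opening_hours) tuples sorted by date.
    Returns None if the value never changed across the snapshots.
    """
    changed_on = None
    for (_, prev_hours), (day, hours) in zip(snapshots, snapshots[1:]):
        if hours != prev_hours:
            changed_on = day
    return changed_on

def osm_probably_outdated(snapshots, osm_last_updated):
    """True if the web source changed and OSM was not updated after that."""
    changed_on = last_change_date(snapshots)
    return changed_on is not None and osm_last_updated <= changed_on
```

Using the dates from the example above:

```python
snapshots = [
    (date(2025, 1, 15), "Mo-Fr 09:00-17:00"),
    (date(2025, 1, 16), "Mo-Fr 08:00-16:00"),
]
osm_probably_outdated(snapshots, date(2024, 6, 1))  # OSM edit predates the change
```

The detected date is only as precise as the snapshot interval, so with weekly snapshots the real change could be up to a week older.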

3 Likes