On web data vs ground truth: is it actually something that happens? In my experience, these chains list accurate info on their websites.
Though tiny shops often have outdated info.
On divergence of effort: it is balanced by the fact that effort that would go into ATP anyway is also useful for OSM.
There's a postal service here whose local branch has different opening hours than the ones listed on the chain's main website. The local hours are the correct ones in this case, as you can read them on the door, not the ones on the website.
It may be a rare occurrence, but since it exists, I would assume that it does actually happen.
This aspect can be addressed by checking the source listed by the person who added the original hours (whether it includes Survey or Local Knowledge) or, more strictly, by simply combing through conflicting opening hours manually.
The thing is, you never know how accurate "web truth" is in this case. OSM data surveyed on the ground might also be outdated after a while. The only truth you can actually trust is current ground truth. So as soon as OSM has existing data, the only suitable way of updating it is checking locally.
I can understand, there are other services with a better coverage. Maybe just use that service and not OSM for that problem.
For myself, I value ground truth in OSM, because then I know that on date xyz someone surveyed the opening hours locally as 8 to 5. Based on the survey date, I can form my own opinion about how much trust I put into it.
Importing stuff from a website makes this impossible. The only thing left is that it was copied from that service on date xyz. But it might have been surveyed/scraped 5 years ago or yesterday.
You could also add opening hours data only to objects that don't yet have an opening_hours value, if I'm understanding correctly, and then leave existing data to be checked manually if it conflicts with ATP. That solves to some degree the "overwriting better on-the-ground data" problem, and it's what I did for all of my ATP imports in the US.
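A minimal sketch of that conservative rule, assuming the OSM object and the ATP record have already been matched and are available as plain tag dictionaries. The function name `plan_opening_hours_edit` and the action labels are made up for illustration; they are not part of any ATP or OSM tooling.

```python
def plan_opening_hours_edit(osm_tags, atp_tags):
    """Decide what to do with one matched OSM object / ATP record pair.

    Returns a tuple (action, value):
      ("add", hours)    - OSM has no opening_hours, the ATP value can be added
      ("review", hours) - both exist and differ, leave it for a human to check
      ("skip", None)    - nothing to do
    """
    atp_hours = (atp_tags.get("opening_hours") or "").strip()
    osm_hours = (osm_tags.get("opening_hours") or "").strip()

    if not atp_hours:
        return ("skip", None)        # ATP has nothing to offer
    if not osm_hours:
        return ("add", atp_hours)    # fill the gap, never overwrite
    if osm_hours != atp_hours:
        return ("review", atp_hours) # conflict: do not touch existing data
    return ("skip", None)            # already identical
```

Note that this compares the raw strings, so two values that mean the same thing but are written differently would still end up in the manual review pile. That errs on the safe side.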
I agree with this sentiment. There are clearly challenges with such an import, but from the discussion so far, I get the impression that these are on @Mateusz_Konieczny's radar. And given his prior experience with automated edits, I'm confident it wouldn't be conducted recklessly.
Current ground truth is the ideal. But we often don't have that, and in the absence of current ground truth, I prefer "web truth" over outdated ground truth or no data at all.
So while one obviously shouldn't overwrite recently surveyed opening hours, I do believe importing opening hours data for objects which lack them, or which haven't had their opening_hours or check_date updated for a long time, would be helpful. One shouldn't have to use other services instead of OSM for this common use case.
I disagree with this point. It can't be in our interest that a mapper needs to frequently update pseudo-data in order to protect what he mapped in OSM. If someone wants to update existing data, he needs to prove that the new data is better.
Why would the 6 years old opening_hours=Mo-Fr 09:00-13:00, 14:00-18:00 be assumed better than opening_hours=Mo-Th 08:00-12:30, 13:00-17:00; Fr 08:00-14:00 from the website?
I think we should basically always add check_date:opening_hours to make this decision easier.
I am fine with everything else written in your comment but:
I have often found webpages with outdated opening_hours. Additionally, this data could simply be wrong because of typos, etc.
Unless we have the information when the opening_hours on the website were last updated, I would assume it should be Jan 1st 1970 and therefore the existing OSM data would be more up-to-date.
If the ATP dataset reaches a certain level of quality AND the existing data has not been added/changed/checked in some time, I would accept this as "proof".
I agree, but to be safer we should specify for this proposal how long ago "some time" is.
Personally, I would consider opening hours that haven't been updated for at least 4 years as possibly outdated, especially due to the pandemic, which altered the operating hours of businesses/services even after the curfews stopped being applied.
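To make the "some time" threshold concrete, here is a rough sketch. It assumes the caller already knows when opening_hours was last changed in the object's history (`last_hours_edit`), and it also honours check_date:opening_hours or check_date when present. The function name `is_stale` and the 4-year default are only illustrative, taken from the suggestion above rather than from any agreed rule.

```python
from datetime import date, timedelta

MAX_AGE_YEARS = 4  # value floated in this thread, not an agreed rule


def is_stale(osm_tags, last_hours_edit, today, max_age_years=MAX_AGE_YEARS):
    """Return True if the existing opening_hours looks old enough to replace.

    last_hours_edit: date when opening_hours was last changed in OSM
                     (assumed to be extracted from history by the caller).
    """
    newest = last_hours_edit
    checked = osm_tags.get("check_date:opening_hours") or osm_tags.get("check_date")
    if checked:
        try:
            newest = max(newest, date.fromisoformat(checked))
        except ValueError:
            pass  # unparsable check_date: fall back to the edit date

    cutoff = today - timedelta(days=round(365.25 * max_age_years))
    return newest < cutoff
```

A per-brand threshold could be passed in via `max_age_years` if some sources turn out to be more reliable than others.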
Unless it's a singular Mom & Pop store (which ATP doesn't really import; it's really only chain stores), I would think the stores maintain the opening hours on their websites. It is after all where people would look them up, and they'd get feedback from customers if they were wrong.
At least in Norway we have no issue importing this regularly with spiders similar to ATP.
I think it really depends on the source quality. If some chain's opening hours are known to be updated daily on its website, we could shorten the duration a lot. If ATP quality were so bad that we would regard even 4-year-old opening hours as better, then we should stop this adventure right now. I guess (and hope) ATP is better. Maybe someone could share some examples of how good or bad it is?
I tested data for some chains in Poland, and in general there is a 100% match for brand chains between actual opening hours and what is on their websites (Żabka, Lidl, Auchan, Biedronka, OBI, Castorama and so on). This is just personal local knowledge as a result of living there.
A major Poland-specific problem is the concept of trading Sundays, which cannot be modelled in the OSM opening_hours model, but it also affects surveying opening hours.
Żabka opening hours were broken in ATP and got fixed after I reported the problems. For the others, I was waiting before investigating the quality of ATP crawling more deeply, so some errors may remain. I was expecting the new government to get rid of non-trading Sundays, but it seems this is not happening any time soon.
I spot-checked some opening hours when I travelled to Germany and they matched between reality, ATP and websites.
When travelling across Europe, in my experience the opening hours claimed on websites of major chains matched the actual opening hours exactly. For non-chain shops it was common to see outdated websites, but these are not in ATP. I had no opportunity to compare these with ATP (except for limited checks in Germany).
So far, in every single case I have seen where the website of a major chain mismatched OSM, OSM was wrong.
If you can make sure the website was updated after the OSM survey, or is more accurate than what the mapper surveyed, go for it. Just keep in mind that a website can also be 10 years old and contain outdated data.
With big chain shops I have fewer concerns. They usually have enough manpower or tool chains to keep the opening hours on their website correct. My concerns are more about individual shops, as they do not update their websites frequently. Just scraping those websites without validating might not be a good solution.
Just to note that many of the chains on ATP operate on a franchise model, where individual store owners have an agreement with the brand owner to use the branding and sell the brand's products. In such cases, there may be a weaker link between the store info and what's shown on the brand's website. The opening hours on the brand's website may rely on the owner remembering to log in somewhere and update them whenever they change, and they might not always remember to do this.
I've also come across cases where a chain doesn't remove pages for old stores or has duplicate pages (with different opening hours) for the same physical store. There may also be issues with ATP interpreting the opening hours data on the store web pages.
So there are certainly issues with the data in ATP. I suspect if we want to use it in OSM, we'd need some sort of manual review on a chain-by-chain basis, to decide how trustworthy it is. There could be different levels, ranging from "don't import at all" to e.g. "overwrite manual changes made more than a week ago".
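Such per-chain levels could be sketched roughly as below. Everything here is hypothetical: the level names, the brand names in `BRAND_LEVEL`, and the `may_write` helper are placeholders for whatever a chain-by-chain review would actually produce.

```python
from datetime import date, timedelta

# Hypothetical trust levels, from "don't import at all" upwards.
POLICIES = {
    "untrusted": {"add_missing": False, "overwrite_older_than": None},
    "add_only": {"add_missing": True, "overwrite_older_than": None},
    "trusted": {"add_missing": True, "overwrite_older_than": timedelta(days=7)},
}

# Placeholder outcome of a manual review; unreviewed brands are untrusted.
BRAND_LEVEL = {
    "Example Brand A": "trusted",
    "Example Brand B": "add_only",
}


def may_write(brand, osm_hours, osm_last_checked, today):
    """Return True if the ATP value may be written for this brand and object."""
    policy = POLICIES[BRAND_LEVEL.get(brand, "untrusted")]
    if not osm_hours:
        return policy["add_missing"]
    limit = policy["overwrite_older_than"]
    if limit is None:
        return False  # this level never overwrites existing data
    return today - osm_last_checked > limit
```

Defaulting unknown brands to the most restrictive level means a new ATP spider would not touch OSM until someone has looked at it.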
It's hard to say globally. In France, post office opening hours are in Open Data and always up to date. I selected an office randomly. It was created 11 years ago thanks to OD. In 2020, David Faure created a bot to retrieve the opening hours from another OD data set. This was the first time the opening hours were added. The following 7 updates were also made by the bot. The object has been updated in the meantime by others, but not this attribute. I doubt a contributor would have said that on 2023 Oct 12, it was open from 14:00 to 17:30. So not ATP as a source, but the same logic: data based on another service.
For one supermarket chain, the opening hours on the global website showed what they could be rather than what they actually were. This chain uses a franchise model, which may explain a lot. I now see that the opening hours on the website are correct while those in OSM are not (last update in 2021), so ATP would have been a plus here.
So, let's check by brand to see if some should be excluded.
OK, brand by brand checking would be definitely needed.
It seems that I should not assume that the IT of shop brands globally works as well as in Poland and the places I visited.
"The world is more complicated than you expected" appears once again.
I already have some ideas for how to do it without expending monstrous effort on verification. I "only" need to implement this and test whether it works well.
Well, that is a rather pessimistic view (that the website wasn't updated since 1970-01-01), but it might be a safe bet.
On the other hand, one can automatically detect changes in the ATP dataset. E.g. if a shop had one opening_hours value in ATP on 2025-01-15 and a different one on 2025-01-16, it can automatically be determined that the opening_hours likely changed on 2025-01-16. Thus, if the OSM opening_hours wasn't updated after 2025-01-16, it is probably incorrect and the recently updated web source should be preferred.
(Or perhaps ATP already keeps track of such metadata about when store information last changed? It would save a step.)
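A rough sketch of that snapshot comparison, assuming each ATP run can be reduced to a mapping from some stable per-store identifier to its opening_hours string. The function names and the input shape are assumptions for illustration; this does not reflect how ATP actually stores its output.

```python
from datetime import date


def last_change_dates(snapshots):
    """Find when each store's opening_hours last changed between snapshots.

    snapshots: list of (snapshot_date, {store_ref: opening_hours}),
               sorted oldest first.
    Returns {store_ref: date of the snapshot where the value last differed
    from the previous one}. Stores that never changed are absent, since
    their real update date is unknown (the "1970-01-01" case).
    """
    previous = {}
    changed_on = {}
    for snap_date, hours_by_ref in snapshots:
        for ref, hours in hours_by_ref.items():
            if ref in previous and previous[ref] != hours:
                changed_on[ref] = snap_date
            previous[ref] = hours
    return changed_on


def web_value_is_newer(ref, changed_on, osm_last_edit: date):
    """True only if the web value is known to have changed after the OSM edit.

    Same-day OSM edits are treated as possibly newer, to stay conservative.
    """
    web_changed = changed_on.get(ref)
    return web_changed is not None and osm_last_edit < web_changed
```

With the two dates from the example above, a store whose hours differ between the 2025-01-15 and 2025-01-16 snapshots gets 2025-01-16 as its change date, and an OSM value last edited before that day would be a candidate for replacement.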