What do you think about importing opening_hours data from AllThePlaces?

If you disagree with the entire idea of importing opening_hours from ATP, please comment now (ideally after reading through this post) rather than later, when significant effort has already been invested into this.

See also Improving OpenStreetMap shop coverage with AllThePlaces

why it would be a good idea:

  • so that we have up-to-date opening hours

Personally, this is one of the main reasons I use the Google Maps app. opening_hours in OSM are often far more outdated than what businesses list on their websites and what Google Maps has.

I want to change this

why it may be a bad idea

ATP has too many mistakes and errors

Well, in its current form that is true, but with some work it is possible to fix it and identify the usable parts.

See alltheplaces/DATA_FORMAT.md at master · alltheplaces/alltheplaces · GitHub for just some of the issues. Also, as mentioned in the section of that document below the linked one, some data will be parsed incorrectly or will simply be wrong on the official website.

It will take significant effort to process ATP data, compare it with OSM, check whether ATP is actually parsing websites correctly, select the part of the data where an import would make sense, and so on.
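To make the "select part of the data" step more concrete, here is a minimal Python sketch. It assumes ATP output is a GeoJSON FeatureCollection whose feature properties carry OSM-style tags such as opening_hours and brand:wikidata; the function names and the deliberately narrow syntax pattern are hypothetical, chosen only for illustration. Values that do not fit the simple pattern are skipped rather than judged invalid, trading coverage for fewer broken imports.

```python
import json
import re

# Hypothetical, deliberately narrow pattern: accepts only simple rules such as
# "Mo-Fr 08:00-20:00", "Sa 09:00-12:00,13:00-17:00" or "Su off", joined by "; ".
# Real opening_hours syntax is far richer; anything outside this pattern is
# treated as "not safe to use automatically", not as invalid.
DAY = r"(Mo|Tu|We|Th|Fr|Sa|Su)"
DAYS = rf"{DAY}(-{DAY})?(,{DAY}(-{DAY})?)*"
TIME = r"([01]\d|2[0-3]):[0-5]\d"
SPAN = rf"{TIME}-({TIME}|24:00)"
RULE = rf"{DAYS} ({SPAN}(,{SPAN})*|off)"
SIMPLE_OH = re.compile(rf"^{RULE}(; {RULE})*$")


def looks_simple_and_sane(value: str) -> bool:
    """Return True only for opening_hours values matching the narrow pattern."""
    return bool(SIMPLE_OH.match(value))


def load_atp_file(path: str) -> dict:
    """Load one ATP GeoJSON output file (assumed format)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def select_candidates(geojson: dict) -> list:
    """Keep features that have brand:wikidata and a simple, sane opening_hours value."""
    candidates = []
    for feature in geojson.get("features", []):
        props = feature.get("properties", {})
        oh = props.get("opening_hours")
        if not oh or not props.get("brand:wikidata"):
            continue  # nothing to import, or no way to group the feature by brand
        if not looks_simple_and_sane(oh):
            continue  # skip rather than risk importing a parsing failure
        candidates.append(feature)
    return candidates
```

For example, select_candidates(load_atp_file(path)) would return only the features that pass both checks.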

I am willing to do this, though I want to confirm with the OSM community that it is not a dead end and that the effort would not be wasted.

Ground truth vs web truth

There is a risk of replacing surveyed data with what is listed on the website, even when the ground survey was accurate. I have plans for how to reduce that risk.
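One possible safeguard, shown here only as an illustrative sketch and not as the planned approach, is to leave OSM values alone when they were recently confirmed on the ground. The sketch assumes the OSM tags check_date:opening_hours or check_date hold a YYYY-MM-DD survey date; the one-year threshold and the function name may_overwrite are hypothetical.

```python
from datetime import date, timedelta

# Hypothetical threshold: survey confirmations newer than this are treated as authoritative.
RECENT_SURVEY = timedelta(days=365)


def may_overwrite(osm_tags: dict, today: date) -> bool:
    """Return False when the OSM opening_hours look recently confirmed on the ground."""
    for key in ("check_date:opening_hours", "check_date"):
        value = osm_tags.get(key)
        if value is None:
            continue
        try:
            checked = date.fromisoformat(value)
        except ValueError:
            return False  # unparseable survey date: be conservative and skip
        if today - checked < RECENT_SURVEY:
            return False  # recently surveyed: keep the ground-truth value
    return True
```

Treating an unparseable date as "do not touch" errs on the side of keeping surveyed data.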

we cannot use ATP data in OSM

It seems that we can. At least partially.

See https://osmfoundation.org/wiki/Licensing_Working_Group/Minutes/2023-08-14#Ticket#2023081110000064_—_First_party_websites_as_sources

Note that it applies to first-party websites, so it does not cover the entire ATP dataset.

Also, they may be wrong after all.

But as I understand it, at least specifically for opening_hours it should be fine?

And I expect that in the worst case I will waste part of my life on this and the edits would be reverted, which would be simple as they would (at this stage) only change one attribute.

this effort can clearly be better spent on something else

If you have a specific OSM idea, please mention it. Note, though, that my skills may be a poor fit for it, or I may disagree about how important or useful it would be.

But I would be happy to discover some other challenges.

Diverts effort into ATP improvements

If this were done, it would encourage OSM contributors to spend time improving ATP. This was mentioned as a bad thing for the OpenStreetMap project.

(Though, at least in my opinion, this is balanced by the fact that effort put into ATP would also be useful for OSM.)

reliance on a third party

reliance on a third party doing the right thing

Disclaimers

This is not an import/bot edit proposal. There are not enough details yet to propose it. But as it will take a fairly large effort to get to that stage, I want to check with the OSM community first.

???

Do you see problems with this plan, or other obstacles not mentioned here? Please comment.

Are you interested in helping?

You can follow what is described at the beginning of Improving OpenStreetMap shop coverage with AllThePlaces

You can also send a PR to matkoniecz/list_how_openstreetmap_can_be_improved_with_alltheplaces_data: Code for generating comparison between OSM and ATP, focused on how OSM can be improved - missing shops and tags, skipping dubious data - Codeberg.org or to ATP, for example fixing one of the data problems I spotted and reported there (Issues · alltheplaces/alltheplaces · GitHub).


Sorry if this has been answered before, but how is the connection to be made between OSM feature and AllThePlaces feature? Is this something inherent to AllThePlaces or something you will have to consider yourself?

Is it possible to detect features mapped as ways rather than nodes?

Can we detect features that are mapped with a large offset from their OSM counterpart? Will the correction be automated or have to be done by hand?

Basically, I am unclear how much of this will be done in an automated manner and how much by hand.

Are we to expect a utility similar to this but calibrated to opening hours only?
How do you plan to account for false positives?


I don’t see where you get this additional quality from. Surely the only thing we gain from copying the opening hours from a website is that the opening hours are the same as on the website. Whether they are correct and up to date can’t be determined that way.


There are a few parts to this:

  • in my experience, major brands typically manage to keep the opening hours listed on their own websites up to date
  • I assume this also holds for brands whose websites I have not used
  • I plan to research how well this assumption holds, for example by detecting recently updated opening hours in OSM and comparing them with the company website and the ATP result
  • part of the major effort mentioned above would be detecting and blacklisting cases where OSM in fact has more up-to-date opening hours than the company itself. In my experience this happens rarely, and I have never found such a case for brands with many shops
  • another part will be dropping cases where ATP failed to parse the data properly. This is a fairly common problem
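The research step from the third point could be approximated as below: for OSM objects whose opening_hours were edited recently, compare the OSM value with the ATP value and count agreement per brand, so that low-agreement brands can be inspected or blacklisted. This is a rough, hypothetical sketch; the input format and function names are assumptions, and exact string comparison after trivial normalization will undercount values that are equivalent but written differently.

```python
from collections import defaultdict


def normalize(oh: str) -> str:
    """Very rough normalization: trim each ';'-separated rule and drop empty ones."""
    rules = [rule.strip() for rule in oh.split(";")]
    return "; ".join(rule for rule in rules if rule)


def agreement_by_brand(pairs):
    """pairs: iterable of (brand, osm_opening_hours, atp_opening_hours).

    Returns {brand: (matching, total)} for review before any edit is made.
    """
    counts = defaultdict(lambda: [0, 0])
    for brand, osm_oh, atp_oh in pairs:
        entry = counts[brand]
        entry[1] += 1
        if normalize(osm_oh) == normalize(atp_oh):
            entry[0] += 1
    return {brand: (matching, total) for brand, (matching, total) in counts.items()}
```

A brand where recently surveyed OSM values rarely agree with ATP would be a signal that either the website or the ATP parser is unreliable for it.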

It would be an extension of the already existing project that you linked at the end of your message.

The plan is to spend significant manual effort to set up a system that would then run mostly automatically, though some ongoing manual upkeep is expected.

Any highly suspicious cases would be skipped, mostly to reduce ongoing manual upkeep and bad edits.

My matcher already supports areas - both ways and multipolygons.

The matcher is partially implemented, and it is my own project (it generates that linked report). Further improvements to it will likely be needed to throw away dubious or suspicious data.

I would prefer to add 100 000 valid and correct opening hours and 1 bad one over adding 100 000 000 valid and 1000 invalid ones.
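To illustrate the "skip anything suspicious" policy, here is a toy matcher. It is not the matcher described above (that is a separate project); it only assumes that each ATP and OSM object has a brand:wikidata value and a single lat/lon point, with ways and multipolygons reduced to a precomputed representative point. The 150 m threshold and all names are hypothetical.

```python
import math

MAX_DISTANCE_M = 150  # hypothetical matching radius


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    r = 6_371_000
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def match_unambiguously(atp, osm_objects):
    """Return the single OSM object matching an ATP entry, or None.

    atp: dict with "lat", "lon" and "brand:wikidata".
    osm_objects: dicts with the same keys; for ways and multipolygons,
    "lat"/"lon" are assumed to be a precomputed representative point.
    """
    brand = atp.get("brand:wikidata")
    if not brand:
        return None  # without a brand there is nothing reliable to match on
    candidates = [
        obj for obj in osm_objects
        if obj.get("brand:wikidata") == brand
        and haversine_m(atp["lat"], atp["lon"], obj["lat"], obj["lon"]) <= MAX_DISTANCE_M
    ]
    # Zero candidates: nothing to update. Several: ambiguous, so skip rather
    # than guess, trading coverage for fewer bad edits.
    return candidates[0] if len(candidates) == 1 else None
```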


The lack of up-to-date POI information in OSM is definitely a big barrier to the viability of OSM-based apps being a real competitor in those everyday “I’m out and about and want to go somewhere to eat, and not be disappointed when I get there because it’s closed” moments, so I object to this objection!

I think this would be a very valuable contribution to OSM.


I received a PM from someone worried that

  • ATP is actually a substantial extract from databases covered by UK/EU database rights
  • data from the UK may be based on data from Ordnance Survey and Royal Mail, and may therefore be tainted and should not be imported
  • especially in the UK, the fact that businesses needed this data for their own operations does not really imply that it can be extracted in this way
  • scraping done despite ToS forbidding it may actually affect the legal status of the extracted information

Therefore I am planning not to process any data for the UK. For other areas I am going, at least for now, to assume that the LWG interpretation makes sense. But I will try to get some legal advice on this, or at least look into the situation.
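Excluding the UK could be done with a filter like the following. It assumes each ATP feature carries an addr:country property; features with no country value are also dropped, since they cannot be shown to lie outside the UK. This is a hypothetical sketch, and a real pipeline might instead check coordinates against a country boundary.

```python
# "GB" is the ISO 3166-1 alpha-2 code for the United Kingdom;
# "UK" is included as a common non-standard variant.
EXCLUDED_COUNTRIES = {"GB", "UK"}


def allowed_by_country(properties: dict) -> bool:
    """Drop UK features, and features whose country is unknown."""
    country = (properties.get("addr:country") or "").strip().upper()
    return bool(country) and country not in EXCLUDED_COUNTRIES
```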

(have I mentioned that I deeply dislike overly complex laws that benefit primarily lawyers?)


While all the rest could be true, if the data is actually scraped from websites operated by the brand owner, that would seem to be an interpretation that goes against the ECJ’s view of the regulation (that databases created in the course of normal business are not afforded sui generis protection).


It would be helpful to link to that 🙂


Thank you for your initiative and for your efforts to integrate ATP in general.

One small problem I can anticipate is that opening hours data in ATP usually doesn’t include closed days as far as I could see.

In countries where shops are closed on public holidays, it’s helpful for international data consumers to include that info as PH off in OSM’s opening hours.

If a supermarket has opening hours “Mo-Sa 07:00-22:00; Su, PH off”, I’m worried we might lose this information if it gets overwritten with “Mo-Sa 07:00-22:00” from ATP.


I would assume PH off even if it is not stated explicitly, and I see no real point in adding it in manual or automatic OH edits.

If the OSM data includes PH off and ATP does not, this would be detected as a mismatch and would block any import (unless, after discussion with the local community, it is decided that such PH off is not really needed).
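For the PH case, the mismatch check described above could look roughly like this. It is a simplified, hypothetical sketch: it only looks for a PH token in ';'-separated rules and does not parse the full opening_hours grammar, so a real implementation would need a proper parser.

```python
def rules(oh: str) -> list:
    """Split an opening_hours value into trimmed ';'-separated rules."""
    return [rule.strip() for rule in oh.split(";") if rule.strip()]


def mentions_ph(oh: str) -> bool:
    """True if any rule contains a PH (public holiday) selector token."""
    return any("PH" in rule.replace(",", " ").split() for rule in rules(oh))


def blocks_update(osm_oh: str, atp_oh: str) -> bool:
    """OSM says something about public holidays that ATP does not,
    so overwriting would lose information: block the update."""
    return mentions_ph(osm_oh) and not mentions_ph(atp_oh)
```

With the supermarket example from the previous post, blocks_update("Mo-Sa 07:00-22:00; Su, PH off", "Mo-Sa 07:00-22:00") returns True, so that object would be skipped.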


In England and Wales (I have no idea about Scotland and Northern Ireland), supermarkets are usually open on public holidays, but may open for shorter hours.

They are only closed on Christmas Day (maybe a public holiday if it’s Monday-Friday) and Easter Sunday (not a public holiday).

Then there are the Christmas Day and Boxing Day bank holidays, with substitute days in lieu when they fall at the weekend.

Other than that, it very much depends and is very hard to document, but I would not blindly add PH off without local knowledge.


I meant the opposite: in my editing I have not been adding a PH off clause, even when I know that the POI is closed on public holidays.

You should 😉.
Some people are already doing something similar to what you intend to do:

Currently it does not cover all brands, only some verified ones, and only for France. Part of the work consists of homogenizing shops in OSM (discrepancies in names, brands, etc.).

The earliest decision I know of is the Fixtures Marketing one (EUR-Lex - 62002CJ0444 - EN - EUR-Lex), but there have been a number of others in which a significant investment in obtaining the material in the database has been required for protection.

Now obviously on the high seas and before the court … but it would seem to be rather difficult to argue that you made any kind of notable investment in obtaining the opening hours of your own shops.

AFAIK the LWG issued an opinion on the suitability of ATP as a data source, so I’m not quite sure why the actual lawyers are being second-guessed here.


To be clear, what they actually said was “Copying the opening hours of a business from its own website is fine” and then explained why that was OK. I agree; it’s entirely reasonable to think that a business knows its own opening hours and doesn’t have to go to a third party, but that isn’t true of everything that you might find on a website.

PS: It wasn’t me that sent the PM mentioned above; I hadn’t thought that there was (and still don’t think that there is) a licensing issue regarding “importing opening_hours from AllThePlaces” (to quote the thread title). There may be data matching or out-of-date data issues, but there are ways of dealing with those, as discussed above.


I’m on the record as finding the idea problematic; it is just that the sui generis database protection argument would seem to be rather contrived (as long as it isn’t a third-party site being scraped). In the UK there is naturally the sweat of the brow doctrine to be considered too, but even that would seem to be far-fetched in this case.

From a data quality perspective, a legal status perspective, as a general idea, or for some other reason?

  • legal: reliance on a third party doing the right thing
  • goals: working on fixing ATP instead of OSM
  • quality: replacement of ground truth by web truth

I would not consider it a show stopper though. The PH property is somewhat blurry anyway, because there are different kinds of holidays, some of which see all or most places closed, while others don’t. Also it may be difficult for data consumers to track local public holidays, like those that are observed only in a single city compared to those that are observed nationwide.
