Automatically update opening hours of businesses

Especially during covid lockdowns (at least in my country), the opening hours of businesses change somewhat regularly. As you might guess, the data in OSM are rarely up-to-date. Since most businesses publish their up-to-date opening hours on their website, I thought about writing a pipeline that automatically visits their websites (IMO daily or even weekly would be just fine) and changes the opening hours in OSM accordingly. Since (most) businesses don’t have an API to pull this data from, I guess web scraping is the way to go (unfortunately). This means the pipeline needs a mechanism to handle website changes somewhat gracefully (even if this is just a notification to a developer that something failed).

Furthermore, you probably still want to manually review these changes (in order to prevent serious spamming in case of a malfunction). IMO, the most sensible approach is to change the opening hours of the businesses directly but mark the changeset as “to be reviewed by someone”.

Since I don’t think a centralized version, where one instance checks the opening hours of a lot of businesses, would work well, I’d propose a more decentralized approach (see the following paragraphs).

Before I create even a first prototype I wanted to ask: Is this something the community is interested in? Or do you think it’s not necessary and not the goal of OSM? (I certainly think that up-to-date opening hours are valuable, as this would decrease my dependency on Google Maps even more. But since I’m not actively working on OSM I’d love to get feedback from someone more experienced within OSM.)

I’d be more than happy to create a first prototype of this decentralized pipeline (probably on GitHub) and test it on a small subset of businesses around me. This includes proper error handling, e.g. because the website changed. The goal would be to have working code that everyone can use to add additional parsers for specific websites. These parsers should contain as little code as possible. Ideally, you would only have to supply a URL, a CSS selector to locate the opening hours, and a format string indicating how the opening hours are structured.

To summarize, if someone wants to create their own parser for local businesses, they would simply have to clone the GitHub repo, add (URL, CSS selector, and format string) for every website they want to check, and that’s it!
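To make that concrete, a parser entry could look roughly like this (a Python sketch using requests and BeautifulSoup; the site, selector, format string and function names are all made up):

```python
import re

import requests
from bs4 import BeautifulSoup

# Hypothetical parser definitions: one entry per website to check.
PARSERS = [
    {
        "url": "https://example-bakery.example/contact",  # page listing the hours
        "selector": ".opening-hours li",                  # CSS selector for the hours
        # regex describing how each line is structured, e.g. "Mo-Fr: 08:00-18:00"
        "format": r"(?P<days>\w{2}(?:-\w{2})?):\s*(?P<hours>[\d:\- ]+)",
    },
]

def scrape_hours(parser: dict) -> list[str]:
    """Fetch the page and return the raw opening-hours strings it contains."""
    response = requests.get(parser["url"], timeout=10)
    response.raise_for_status()  # fail loudly so a maintainer gets notified
    soup = BeautifulSoup(response.text, "html.parser")
    matches = []
    for node in soup.select(parser["selector"]):
        m = re.search(parser["format"], node.get_text(" ", strip=True))
        if m:
            matches.append(f"{m['days']} {m['hours']}")
    if not matches:
        # The selector or format no longer matches - most likely the
        # website changed, so notify whoever maintains this parser.
        raise ValueError(f"parser for {parser['url']} found no opening hours")
    return matches
```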

I’m more than happy to hear your thoughts on this topic!

If you haven’t yet, I recommend taking a look at the Automated Edits code of conduct as well as the Import Guidelines.
It should also be noted that web scraping can lead to questions of copyright and licence for which you might find a few answers over at Licence and Legal FAQ.

This is not meant as dismissal but you should be aware of these references.

Many thanks for your response, especially for the links! In fact, I hadn’t looked into these wiki pages, but I’ve done so now. I’ll wait a few more days and see if anyone replies to this thread with any suggestions/ideas/feedback. If not, I guess I’ll write up the things specified in the “Automated Edits code of conduct” wiki page and send an email to the mailing list. I don’t want to rush anything – I’ll try to be as cautious as possible and create a long-lasting enhancement to OSM.

It’s proven very challenging to keep POIs up-to-date, and a good solution for automatically comparing opening hours between OSM and business websites is something I’ve long been hoping for. So I think better tools for this task would be very valuable, and even though the rest of my post is going to focus on the challenges I see and how you could deal with them, I’d really like to encourage you to work on this!

In my mind, there are two important caveats: Usability and community expectations for automated edits.

To elaborate, the most important factor in whether such a tool succeeds is probably the user experience, and I suspect the workflow you describe would prevent it from being widely adopted. Fetching a GitHub repo and working with CSS selectors and format strings are things that would likely scare off the vast majority of mappers.

So if you want this to attract a significant amount of users, I expect you would have to do most of the following:

  • Rely on OSM tags where possible instead of storing this information separately. To get the URL, use the (rare) opening_hours:url key if it’s present on a POI, and fall back to website=* if it’s not (see the sketch after this list).

  • Figure out how to detect the opening hours on a website automatically, at least some of the time. Perhaps use the existing opening_hours value so you know what to look for on the website. The selectors + format strings might still be how your tool remembers this internally, and there could even be a chance for power users to define them manually to deal with websites that your tool cannot make sense of by itself. But you really want autodetection for common formats. That way, your tool could already find outdated opening hours in a mapper’s region of interest the first time they check it out! They wouldn’t have to first invest a lot of effort that won’t pay off until months or years later when the opening hours change. I think this is absolutely crucial for adoption.

  • Make it super-easy to try out and run. This probably means that your tool would be either web-based or integrated into a popular editor (e.g. JOSM or iD, maybe StreetComplete; JOSM has the benefit of allowing plugins, so you don’t need to convince the maintainers to add it).
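To illustrate the first bullet, the URL lookup could be as simple as this sketch (a hypothetical function, in Python, operating on a POI’s tag dictionary):

```python
def url_for_poi(tags: dict[str, str]) -> str | None:
    """Pick the page to scrape, preferring the dedicated opening_hours:url tag."""
    # opening_hours:url is rare, but when present it points directly
    # at the page listing the hours.
    if "opening_hours:url" in tags:
        return tags["opening_hours:url"]
    # Otherwise fall back to the general website tag (or its contact: variant).
    return tags.get("website") or tags.get("contact:website")
```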

Second, kartonage has already pointed out the community expectations surrounding automated edits. I suspect that any solution which writes to the database without prior manual review would face opposition. For better or worse, the OSM community holds automated edits to a much higher standard than manual edits and expects them to be almost flawless – something that’s usually correct but occasionally adds errors isn’t going to be accepted. Flagging your automated changes for manual review would likely not address those concerns; it would just be an admission that your tool isn’t up to that standard. So I expect you will have to set it up so that it only suggests changes to mappers and doesn’t apply them until the mapper okays them. Once your tool has been used productively and reliably for a longer time, that may change – people may be open to more automation at that point.

Finally, because of the legal situation, you probably want at least a blacklist of sites to never copy opening hours from even if someone enters that URL (e.g. Google, Facebook or any of the large business directory sites).
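A simple host-based check could cover that; here’s a sketch (the blocklist entries are just examples, not a vetted list):

```python
from urllib.parse import urlparse

# Example entries only - the real list would need legal review.
BLACKLISTED_HOSTS = {"google.com", "facebook.com", "yelp.com"}

def is_blacklisted(url: str) -> bool:
    """Refuse to scrape hosts we must never copy opening hours from."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in BLACKLISTED_HOSTS)
```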


So a bot would be nice that crawls the URLs tagged per Key:opening_hours:url - OpenStreetMap Wiki and searches their pages for JSON-LD to import the current opening hours from.
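For reference, pulling schema.org opening hours out of a page’s JSON-LD could look roughly like this (a Python sketch with requests and BeautifulSoup; the function name is made up):

```python
import json

import requests
from bs4 import BeautifulSoup

def jsonld_opening_hours(url: str) -> list:
    """Collect schema.org openingHours values from a page's JSON-LD blocks."""
    page = requests.get(url, timeout=10)
    page.raise_for_status()
    soup = BeautifulSoup(page.text, "html.parser")
    hours = []
    for script in soup.find_all("script", type="application/ld+json"):
        try:
            data = json.loads(script.string or "")
        except json.JSONDecodeError:
            continue  # skip malformed blocks
        for obj in data if isinstance(data, list) else [data]:
            if not isinstance(obj, dict):
                continue
            # schema.org allows a plain openingHours string/list or
            # structured openingHoursSpecification objects.
            for key in ("openingHours", "openingHoursSpecification"):
                if key in obj:
                    hours.append(obj[key])
    return hours
```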

Automatically updated opening hours would be really nice IMO!

What the community may accept in terms of automated vs manual updates may depend on the local communities in each country IMO. It’s likely not a good idea to try this in the UK or Germany, which have very active communities, but as a tourist visiting places like Mexico and Vietnam, I’d love to have reliable data. Those communities are smaller and could be more accepting of automated imports. I would also be positive about this in Norway, where we already have bots updating quite a few chain stores.

You should start by checking out alltheplaces.xyz, which already has spiders that collect opening hours.

However, there have been very few imports from this dataset so far, though there are some in the US.

Note that AllThePlaces seems quite promising, but so far I’ve spent quite a lot of time on improving the matcher and the ATP data itself.

See my current preliminary outcome at Experimental All The Places <-> OpenStreetMap matcher (I can add more areas for review if anyone is interested, and yes, a better country selector is planned).

Also, feel free to report any data problems (some were already fixed, some known issues are in the process of being fixed).

I am also trying to fully verify the legal status; see Maybe mention in readme that ATP takes data from first part websites? · Issue #8790 · alltheplaces/alltheplaces · GitHub

For now I plan to import website tags; other values are not reliable enough for me, but hopefully that will change (no bot proposals exist yet).


Anecdotally, I have found this isn’t the case anymore in my area (southern Ontario). Small businesses might not have websites at all, or might have a website that’s out of date. Managers usually only update Google Maps when they change hours (and probably Facebook too, I guess, but I can’t check that).

It is probably more reliable for chain businesses that have many locations and a dedicated website.


Some small businesses also only maintain a Facebook page or Instagram account, and this is where they publish updated opening hours.

Since Overture sources data from, e.g., Facebook, I checked whether they publish opening hours as part of the Place dataset. But that’s unfortunately not a property it includes.

(Besides, making a manual importer for Overture Places could be an interesting project. While the quality of everything there is not always good, it does have links to the Facebook pages, so it should be trivial to verify whether a business is still operating.)
