Especially during COVID lockdowns (at least in my country), the opening hours of businesses change somewhat regularly. As you might guess, the data in OSM is rarely up to date. Since most businesses have their current opening hours on their website, I thought about writing a pipeline that automatically visits their websites (IMO daily or even weekly will be just fine) and changes the opening hours in OSM accordingly. Since (most) businesses don’t have an API to pull this data from, I guess web scraping is the way to go (unfortunately). This means that the pipeline needs a mechanism to handle website changes reasonably gracefully (even if this is just a notification to a developer that something failed).
Furthermore, you probably still want to manually review these changes (in order to prevent serious spamming in case of a malfunction). IMO, the most sensible way to go is to change the opening hours of the businesses directly but mark the changeset as “to be reviewed by someone”.
Since I don’t think a centralized version, where one instance checks the opening hours of a lot of businesses, is practical, I’d propose a more decentralized approach (see the following paragraphs).
Before I create even a first prototype I wanted to ask: Is this something the community is interested in? Or do you think it’s not necessary and not the goal of OSM? (I certainly think that up-to-date opening hours are valuable, as this would decrease my dependency on Google Maps even more. But since I’m not actively working on OSM I’d love to get feedback from someone more experienced within OSM.)
I’d be more than happy to create a first prototype of this decentralized pipeline (probably on GitHub) and test it on a small subset of businesses around me. This includes proper error handling, e.g. for the case that a website has changed. The goal would be to have working code that everyone can use to add additional parsers for specific websites. These parsers should contain as little code as possible. Ideally, you would only have to supply a URL, a CSS selector to get the opening hours and a format string indicating how the opening hours are structured.
To summarize, if someone wants to create their own parser for local businesses, they would simply have to fetch the GitHub repo and add a URL, a CSS selector and a format string for every website they want to check, and that’s it!
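To make the idea a bit more concrete, here is a minimal sketch of what such a parser entry could look like. Everything in it is an assumption for illustration: the `SiteParser` name, the example URL and HTML, the choice of a regular expression with named groups as the “format string”, and the restriction to `#id` selectors (a real tool would more likely use a proper CSS selector library). It only uses the Python standard library and fails loudly when nothing matches, which is the “website changed” case.

```python
import re
from dataclasses import dataclass
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}
DAY_CODES = {"monday": "Mo", "tuesday": "Tu", "wednesday": "We", "thursday": "Th",
             "friday": "Fr", "saturday": "Sa", "sunday": "Su"}


@dataclass
class SiteParser:
    url: str       # page that lists the opening hours
    selector: str  # only "#some-id" selectors are supported in this sketch
    pattern: str   # regex with the named groups: days, open, close


class _IdTextExtractor(HTMLParser):
    """Collects the text inside the first element that has the wanted id."""

    def __init__(self, wanted_id):
        super().__init__()
        self.wanted_id = wanted_id
        self.depth = 0      # > 0 while we are inside the wanted element
        self.found = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:            # void tags never get a closing tag
            if self.depth:
                self.chunks.append("\n")
            return
        if self.depth:
            self.depth += 1
        elif not self.found and dict(attrs).get("id") == self.wanted_id:
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract_opening_hours(html, parser):
    """Turn the selected part of a page into an OSM-style opening_hours value."""
    if not parser.selector.startswith("#"):
        raise ValueError("this sketch only understands '#id' selectors")
    extractor = _IdTextExtractor(parser.selector[1:])
    extractor.feed(html)
    extractor.close()
    text = "".join(extractor.chunks)
    rules = []
    for m in re.finditer(parser.pattern, text):
        day_names = re.split(r"\s*-\s*", m["days"].strip())
        days = "-".join(DAY_CODES[d.lower()] for d in day_names)
        rules.append(f"{days} {m['open']}-{m['close']}")
    if not rules:
        # Nothing matched: most likely the page layout changed, so tell a human.
        raise LookupError(f"no opening hours found at {parser.url}")
    return "; ".join(rules)


html = ('<div id="hours"><p>Monday - Friday: 09:00 to 18:00</p>'
        '<p>Saturday: 10:00 to 14:00</p></div>')
parser = SiteParser(
    url="https://example.org/contact",
    selector="#hours",
    pattern=r"(?P<days>[A-Za-z]+(?:\s*-\s*[A-Za-z]+)?):\s*"
            r"(?P<open>\d{2}:\d{2}) to (?P<close>\d{2}:\d{2})",
)
print(extract_opening_hours(html, parser))  # Mo-Fr 09:00-18:00; Sa 10:00-14:00
```

The `LookupError` is where a notification to the maintainer of that parser entry would hook in. Day names outside the English ones in `DAY_CODES` would raise a `KeyError` here, so a real version would need per-language tables.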
I’m more than happy to hear your thoughts on this topic!
Many thanks for your response, especially for the links! In fact, I hadn’t looked into these wiki pages before, but I have now. I’ll wait a few more days and see if anyone replies to this thread with any suggestions/ideas/feedback. If not, I guess I’ll write up the things specified in the “Automated Edits code of conduct” wiki page and send a mail to the mailing list. I don’t want to rush anything – I’ll try to be as cautious as possible and create a long-lasting enhancement to OSM.
It’s proven very challenging to keep POIs up to date, and a good solution for automatically comparing opening hours between OSM and business websites is something I’ve long been hoping for. So I think better tools for this task would be very valuable, and even though the rest of my post is going to focus on the challenges I see and how you could deal with them, I’d really like to encourage you to work on this!
In my mind, there are two important caveats: usability and community expectations for automated edits.
To elaborate, the most important factor in whether such a tool succeeds is probably the user experience, and I suspect the workflow you describe would prevent it from being widely adopted. Fetching a GitHub repo and working with CSS selectors and format strings are both things that would likely scare off the vast majority of mappers.
So if you want this to attract a significant number of users, I expect you would have to do most of the following:
Rely on OSM tags where possible instead of storing this information separately. To get the URL, use the (rare) opening_hours:url key if it’s present on a POI, fall back to website=* if it’s not.
Figure out how to detect the opening hours on a website automatically at least some of the time. Perhaps use the existing opening_hours value so you know what to look for on the website. The selectors + format strings might still be how your tool remembers this internally, and there could even be an option for power users to define them manually to deal with websites that your tool cannot make sense of by itself. But you really want autodetection for common formats. That way, your tool could already find outdated opening hours in a mapper’s region of interest the first time they check it out! They wouldn’t have to first invest a lot of effort that won’t pay off until months or years later when the opening hours change. I think this is absolutely crucial for adoption.
Make it super-easy to try out and run. This probably means that your tool would be either web-based or integrated into a popular editor (e.g. JOSM or iD, maybe StreetComplete; JOSM has the benefit of allowing plugins so you don’t need to convince the maintainers to add it).
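As a rough sketch of the first two points: the URL lookup follows the tag order described above, and the second function is one very crude way to use the existing opening_hours value as a hint, by checking which of its times still appear on the page. The function names, the example tags and the page text are made up for illustration, and real websites will use many formats (12-hour clocks, “9 am”, other languages) that this does not handle.

```python
import re

TIME_RE = re.compile(r"\b\d{1,2}:\d{2}\b")


def pick_hours_url(tags):
    """Prefer opening_hours:url, fall back to website, else return None."""
    return tags.get("opening_hours:url") or tags.get("website")


def times_missing_from_page(opening_hours, page_text):
    """Return the times from the OSM value that do not occur in the page text.

    An empty result suggests the OSM value may still be current; a non-empty
    one flags the POI as worth a closer look by a mapper.
    """
    on_page = {t.zfill(5) for t in TIME_RE.findall(page_text)}   # "9:00" -> "09:00"
    wanted = {t.zfill(5) for t in TIME_RE.findall(opening_hours)}
    return sorted(wanted - on_page)


tags = {
    "name": "Example Bakery",
    "website": "https://example.org",
    "opening_hours": "Mo-Fr 09:00-18:00; Sa 10:00-14:00",
}
page_text = "Monday to Friday 9:00 - 17:00, Saturday 10:00 - 14:00"

print(pick_hours_url(tags))                                      # https://example.org
print(times_missing_from_page(tags["opening_hours"], page_text))  # ['18:00']
```

Because this needs nothing beyond the tags already in OSM, it could flag candidates the first time a mapper runs it, without anyone having written a selector beforehand.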
Second, kartonage has already pointed out the community expectations surrounding automated edits. I suspect that any solution which writes to the database without prior manual review would face opposition. For better or worse, the OSM community holds automated edits to a much higher standard than manual edits and expects them to be almost flawless – something that’s usually correct, but occasionally adds errors, isn’t going to be accepted. Flagging your automated changes for manual review would likely not address those concerns; it would just be an admission that your tool isn’t up to that standard. So I expect you will have to set it up so that it only suggests changes to mappers, but doesn’t apply them until the mapper okays them. Once your tool has been used productively and reliably for a longer time, that may change – people may be open to more automation at that point.
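A suggest-only workflow could be sketched roughly like this. The `Suggestion` class, the function names and the element id are hypothetical, and the actual upload to OSM is deliberately left out; the point is only that nothing reaches the upload step unless a mapper has approved it.

```python
from dataclasses import dataclass


@dataclass
class Suggestion:
    element: str            # e.g. "node/1" (made-up id)
    current: str            # opening_hours value currently in OSM
    proposed: str           # value derived from the website
    source_url: str         # where the proposed value came from
    approved: bool = False  # only ever set to True by a human reviewer


def normalise(value):
    """Collapse whitespace so purely cosmetic differences are ignored."""
    return " ".join(value.split())


def suggest(element, tags, scraped, source_url):
    """Return a Suggestion if the scraped value differs from OSM, else None."""
    current = tags.get("opening_hours", "")
    if normalise(current) == normalise(scraped):
        return None
    return Suggestion(element, current, normalise(scraped), source_url)


def changes_to_upload(suggestions):
    """Only suggestions a mapper has approved are turned into tag changes."""
    return [(s.element, {"opening_hours": s.proposed})
            for s in suggestions if s.approved]


s = suggest("node/1", {"opening_hours": "Mo-Fr 09:00-18:00"},
            "Mo-Fr 09:00-17:00", "https://example.org")
print(changes_to_upload([s]))  # [] because nothing has been approved yet
s.approved = True              # the mapper reviewed the website and okayed it
print(changes_to_upload([s]))
```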
Finally, because of the legal situation, you probably want at least a blacklist of sites to never copy opening hours from even if someone enters that URL (e.g. Google, Facebook or any of the large business directory sites).
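Such a check could be as simple as the following sketch. The two domains are only illustrative entries taken from the examples above, not a vetted list, and the function name is made up.

```python
from urllib.parse import urlsplit

# Illustrative entries only; a real list would need to be maintained carefully.
BLOCKED_DOMAINS = {"google.com", "facebook.com"}


def is_blocked(url):
    """True if the URL's host is a blocked domain or one of its subdomains."""
    if "://" not in url:
        url = "//" + url  # lets urlsplit find the host in "facebook.com/page"
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS)


print(is_blocked("https://www.facebook.com/somebakery"))  # True
print(is_blocked("https://example.org/hours"))            # False
```

Note that matching on the domain suffix does not cover country-specific domains of the same company, so those would need their own entries.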