What do you think about importing opening_hours data from AllThePlaces?

The specific case I am looking into, starting with an import of website tags, is Empik in Poland.

See empik_pl website import candidates (the entire website is a work in progress and experimental, use at your own risk. I may filter out more data and slap more disclaimers on it; let me know if anything like that should be changed.)

Looking at, say, Salon Empik - Wrocław Zwycięska may ring alarm bells, as the page has

<iframe width="100%" height="100%" style="border: 0px;" loading="lazy" allowfullscreen="" src="https://www.google.com/maps/embed/v1/place?key=AIzaSyBl5C-uONKhzn9Nmn3DJXAP42tYWejo5YU&amp;q=Empik,Wrocław Zwycięska,ul. Zwycięska 33"></iframe>

and as I understand it, this displays the location on Google Maps, with geolocation done by Google Maps.

I definitely would not want to do imports based on addresses geolocated with Google Maps, and I would revert any such import that I spotted.

But… the spider is defined at alltheplaces/locations/spiders/empik_pl.py (alltheplaces/alltheplaces on GitHub, at commit 350fd8686ccf52039969ccaf39317bbae22fc2fb), and it actually pulls data from Empik itself.

It can be replicated with the following Python script:

import json

import requests
import rich

# The same CSRF value is sent both as a cookie and as the X-CSRF-TOKEN header.
CSRF_TOKEN = "42adc778-4158-4646-8ca9-e97ce140da75"

response = requests.post(
    "https://www.empik.com/ajax/delivery-point/empik?query=",
    headers={"X-CSRF-TOKEN": CSRF_TOKEN},
    cookies={"CSRF": CSRF_TOKEN},
)

try:
    data = response.json()
    rich.print(data)
except json.JSONDecodeError:
    # The server answered with something other than JSON (e.g. an error page).
    print("Failed to decode JSON:", response.text)

The response has entries like

    {
        'id': 123,
        'deliveryPointType': 11,
        'city': 'Żywiec',
        'name': 'Żywiec Lider (SP)',
        'address': 'ul. Zielona 3',
        'phone': '695550266',
        'faxNumber': None,
        'cellPhone': None,
        'email': 'zywiec.lider@empik.com',
        'postCode': '34-300',
        'mondayWorkingHours': '9:00-20:00',
        'tuesdayWorkingHours': '9:00-20:00',
        'wednesdayWorkingHours': '9:00-20:00',
        'thursdayWorkingHours': '9:00-20:00',
        'fridayWorkingHours': '9:00-20:00',
        'saturdayWorkingHours': '9:00-20:00',
        'sundayWorkingHours': '10:00-18:00',
        'phoneVisible': True,
        'longitude': 19.2006465,
        'latitude': 49.6909741,
        'lastUsedDate': None,
        'blockedTemporarily': False,
        'temporarilyBlockedForSmallGauge': False,
        'temporarilyBlockedForAverageGauge': False,
        'manuallyBlockedByBusiness': False,
        'empikStoreType': 1,
        'storePage': '/salony-empik/zywiec/zywiec-lider-sp,123,e',
        'closed': False,
        'default': False
    }

This data looks safe to use to me.
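To make the opening_hours question more concrete, here is a rough sketch of how the per-day fields from such an entry could be turned into an OSM opening_hours value. This is not how the AllThePlaces spider does it; `normalize` and `to_opening_hours` are names I made up for illustration, and I am assuming that every value is either empty or a single range like '9:00-20:00'. Days with a missing or unparseable value are simply left out, which may or may not match what Empik means by them.

```python
import re

# OSM day abbreviations paired with the field names seen in the Empik response.
DAYS = [
    ("Mo", "mondayWorkingHours"),
    ("Tu", "tuesdayWorkingHours"),
    ("We", "wednesdayWorkingHours"),
    ("Th", "thursdayWorkingHours"),
    ("Fr", "fridayWorkingHours"),
    ("Sa", "saturdayWorkingHours"),
    ("Su", "sundayWorkingHours"),
]

RANGE = re.compile(r"\s*(\d{1,2}):(\d{2})\s*-\s*(\d{1,2}):(\d{2})\s*")


def normalize(hours):
    """Turn '9:00-20:00' into '09:00-20:00'; return None if it does not parse."""
    if not hours:
        return None
    match = RANGE.fullmatch(hours)
    if match is None:
        return None
    h1, m1, h2, m2 = match.groups()
    return f"{int(h1):02d}:{m1}-{int(h2):02d}:{m2}"


def to_opening_hours(entry):
    """Build an opening_hours string, merging consecutive days with equal hours."""
    runs = []  # each run is [first_day, last_day, hours]; None marks a gap
    for day, key in DAYS:
        hours = normalize(entry.get(key))
        if hours is None:
            runs.append(None)  # assumption: unknown days are skipped, not "off"
        elif runs and runs[-1] is not None and runs[-1][2] == hours:
            runs[-1][1] = day  # extend the current run to this day
        else:
            runs.append([day, day, hours])
    parts = []
    for run in runs:
        if run is None:
            continue
        first, last, hours = run
        days = first if first == last else f"{first}-{last}"
        parts.append(f"{days} {hours}")
    return "; ".join(parts)


sample = {
    "mondayWorkingHours": "9:00-20:00",
    "tuesdayWorkingHours": "9:00-20:00",
    "wednesdayWorkingHours": "9:00-20:00",
    "thursdayWorkingHours": "9:00-20:00",
    "fridayWorkingHours": "9:00-20:00",
    "saturdayWorkingHours": "9:00-20:00",
    "sundayWorkingHours": "10:00-18:00",
}
print(to_opening_hours(sample))  # Mo-Sa 09:00-20:00; Su 10:00-18:00
```

For the Żywiec entry above this gives Mo-Sa 09:00-20:00; Su 10:00-18:00. Anything like public holidays or trading-ban Sundays is not in the source fields, so it cannot come out of a conversion like this.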
