Is the license of alltheplaces suitable?

Matija_Nalis · June 14, 2023, 5:35pm

I agree that either obtaining, verification or presentation is enough. However, you seem to miss the (IMHO crucial) part: “to prevent extraction and/or re-utilization of the whole or of a substantial part”. IOW, it does not seem to trigger if the substantial investment was made for some other purpose that has nothing to do with preventing extraction and/or re-utilization. Or do you read that differently?

My point being that a shop chain which publicly publishes its list of shops and their opening hours does not do so in order to prevent extraction and/or re-utilization, thus that clause does not trigger (but if it were e.g. putting the data behind a paywall, or using some other means of preventing extraction or re-utilization, then it would trigger).

Again, you seem to miss the (crucial, to me) part which says “which conflict with a normal exploitation of that database or which unreasonably prejudice the legitimate interests of the maker of the database shall not be permitted”

Yes, scraping is exactly “extraction and/or re-utilization of […] parts of the contents of the database”, I agree with you on that. However, such scraping is totally fine, unless:

such scraping conflicts with normal exploitation of that database (which I argue it does not, as making such a DB available to potential customers and other interested parties is exactly why the shop is publishing its coordinates and opening hours), or
such scraping unreasonably prejudices the legitimate interests of the maker of the database (which again I argue it does not; to the contrary, it is exactly the legitimate interest of the shop and the whole reason why they published the location and opening hours of their shops)

Absolutely. Also note that it is just my opinion, and IANAL either.

don't get me wrong either

(I also do not mean anything bad by emphasizing the parts that I find important but that you do not seem to have given the weight I think they deserve. I otherwise find the rest of your post quite agreeable. Hopefully that should be taken for granted, but as my wording doesn’t look quite ideal to me, other people might take it wrongly, so I want to clarify that I meant nothing insulting by the “you seem to have missed” phrase. It’s just that I’m unable to come up with a more benign/non-attacking way [I’m willing to learn, so feel free to PM me suggestions] of emphasizing a part of the clause that does not seem to have gotten adequate attention.)

Matija_Nalis · June 14, 2023, 5:47pm

I agree, I see no issue in such a case. If that data is unencumbered by copyright/database rights, one could do anything with it. For example, if NASA puts some data in the public domain, I can use it everywhere however I like, even if I live in the EU and ESA would not put similar data in the public domain – that is irrelevant.

The only nexus to a jurisdiction with sui generis database rights is that it would end up getting distributed by a project with servers in the UK and EU, or that a Mapbox or OsmAnd would sell products based on it in the same jurisdictions.

True, but the only sui generis database rights that might possibly be created in such a case would belong to the entity that created such a database (e.g. some alltheplaces-like scraper based in the EU), and not to the original unprotected-data publishers (e.g. the US McDonald’s chain or whatever). Thus, if such a hypothetical alltheplaces.eu decided to release their rights under CC0, such data would again be free to import into OSM, as the only rights that existed were relinquished by that CC0 license.

Minh_Nguyen · June 14, 2023, 5:53pm

I’m not angry, just trying to understand if there’s a workable compromise that would allow some of the AllThePlaces scrapers to be used but not others. It isn’t difficult to determine whether a brick-and-mortar store chain operates in Europe.

If you expand that quote a little, you’ll see that it was taken out of context:

I never claimed that all of AllThePlaces would be admissible. Just AllThePlacesOfSomeOfTheChains. When considering an import of Amerigas locations via ATP, what does it matter that ATP also scrapes Aldi? Unlike OSM, ATP distributes a separate file for each company whose website got scraped, so it’s straightforward to jettison data about any company that does business in the EU for instance.
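Since the per-company split is what makes this workable, here is a minimal sketch of how a mapper might drop everything except a hand-reviewed set of chains. It assumes one file per scraper, named after the scraper; the file names, the `output/` directory and the `ALLOWED` set are all hypothetical and would have to be checked against what ATP actually publishes.

```python
from pathlib import PurePath

# Hypothetical allowlist: scraper names a human has reviewed and judged
# to belong to chains with no EU/UK presence. Not an official list.
ALLOWED = {"amerigas", "aaa"}


def select_files(paths, allowed=ALLOWED):
    """Keep only the files whose base name (without extension) is allowlisted."""
    return [p for p in paths if PurePath(p).stem in allowed]


# Illustrative file names only; the real layout may differ.
candidates = [
    "output/amerigas.geojson",
    "output/aldi_sud_de.geojson",
    "output/aaa.geojson",
]
kept = select_files(candidates)
```

An allowlist rather than a blocklist seems like the safer default here: a newly added scraper is excluded until someone has actually looked at it.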

SomeoneElse · June 14, 2023, 6:21pm

I don’t think so - for completeness the full quote is here. It is clear from my link that people in UK/Europe are adding European data to “alltheplaces” (see e.g. here) and are also updating OSM with similar data.

My concern is actually more than the titular “Is the license of alltheplaces suitable” - it is “Would the way the project is being run (including CC0 claims here) allow third-parties to spread FUD about the suitability of OSM data for commercial use?”.

So you’re saying that the general answer to the titular question is “no” - but with the caveat that in some cases (perhaps where none of the parties involved have any EU/UK presence) it may be “yes”?

Matija_Nalis · June 14, 2023, 7:03pm

Or that the general answer is “yes”, but with the caveat that it is “no” in some cases (depending on how you phrase it).

It should be relatively trivial to determine whether some point is inside the USA, and to mark a dataset as potentially problematic if it is not.
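A rough sketch of such a check, assuming points arrive as (longitude, latitude) pairs. The bounding boxes are approximate values picked for illustration, not authoritative borders: they also cover parts of Canada and Mexico, and they ignore territories and the Aleutian islands west of the antimeridian, so a real check would want proper boundary polygons.

```python
# Approximate boxes as (min_lon, min_lat, max_lon, max_lat).
# Illustrative values only; they over-cover neighbouring countries.
US_BOXES = {
    "contiguous": (-125.0, 24.4, -66.9, 49.4),
    "alaska": (-170.0, 51.0, -129.9, 71.5),
    "hawaii": (-160.3, 18.9, -154.8, 22.3),
}


def in_usa_roughly(lon, lat):
    """True if the point falls inside any of the rough boxes."""
    return any(
        min_lon <= lon <= max_lon and min_lat <= lat <= max_lat
        for min_lon, min_lat, max_lon, max_lat in US_BOXES.values()
    )


def points_outside_usa(points):
    """Return the points that fall outside every box.

    A non-empty result means the dataset should be marked as
    potentially problematic and looked at by a human.
    """
    return [(lon, lat) for lon, lat in points if not in_usa_roughly(lon, lat)]


sample = [(-77.03, 38.90), (2.35, 48.85)]  # Washington DC, Paris
suspicious = points_outside_usa(sample)
```

This only flags datasets for review. A point inside the boxes says nothing about where the chain itself does business.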

So I’d say the succinct answer would be “mostly harmless”

Minh_Nguyen · June 14, 2023, 7:05pm

To be crystal clear: I claimed that AllThePlaces has scraped some store locators of chains that have no European presence, such as Amerigas or the American Automobile Association. (They don’t all start with “A”; I’m just being lazy.) My intention is to find a compromise that would preserve the ability to consider scraped locations of these specific chains.

You correctly pointed out that ATP has merged some scrapers for chains that are in Europe, but I never claimed otherwise. Others can feel free to debate the admissibility of ATP’s European coverage in OSM, but I’m not convinced that database rights are relevant to the likes of Amerigas and the AAA when ATP keeps these datasets separate. A mapper could even take the source code and scrape just one of the websites individually, never allowing their hard drive to become tainted with the other companies’ data.

The general case is too big for me to comment on intelligibly. The non-EU/UK edge case is what I’m most interested in. I’m not a very ambitious armchair lawyer!

This is an interesting question. Perception is a funny thing, and preempting FUD can become self-defeating after a certain point. From my perspective, AllThePlaces is actually quite conservative and cautious, relying as it does on the chains’ published store locators. Meanwhile, there’s a whole cottage industry of companies that scrape all the Web about all the places. But at least here in the States, we have such a wealth of unencumbered data to import – assuming U.S. local governments don’t enjoy database rights in Europe – that importing all of AllThePlaces doesn’t strike me as an urgent priority.

Mateusz_Konieczny · January 5, 2024, 8:21am

Note https://osmfoundation.org/wiki/Licensing_Working_Group/Minutes/2023-08-14#Ticket#2023081110000064_—_First_party_websites_as_sources

LWG official position
Copying the opening hours of a business from its own website is fine. There are no copyright rights in factual information like opening hours. There’s no investment in the database for a business for its own opening hours, because that is something that the business has to have for its purpose of operating. A business does not have additional investment in a database, so there are no database rights to protect. Scraping/spidering is legal in many jurisdictions, including the US. Even where the legal status of scraping is uncertain, it does not impact whether or not OSM can use the resulting data - it’s just a matter of the personal risk of the person running the scraper.

All LWG members that were present voted in favour of the following:

LWG official position on community.osm.org thread
From a legal risk perspective, we do not consider accepting this information to be a legal risk to OSMF and therefore DWG is not going to revert these edits.

SomeoneElse · January 5, 2024, 9:49am

Just to be clear, you’ve answered a general “is the licence of X compatible” question with a statement from the LWG that “data of one specific type will be compatible with OSM under some specific circumstances”.

It doesn’t address the general question asked at the top (“Is the license of alltheplaces suitable?”) or make any statement about non-opening hours data on first-party websites (as discussed above).

Minh_Nguyen · January 5, 2024, 3:31pm

I think @Mateusz_Konieczny’s post is on-topic. The LWG fielded an inquiry from an AllThePlaces developer that recounted several specific objections to the use of AllThePlaces data in OSM, specifically citing the original post in this thread, and this was their response. It wasn’t limited to opening hours (emphasis on the word “like”). Perhaps they wanted to address these questions specifically so they wouldn’t have to deliver an opinion on every similar scraper that comes along. You could demand a yes or no answer from the LWG if you feel it would make a difference.

SomeoneElse · January 6, 2024, 1:10am

To be clear, the specific request was quoted here and the questions asked there*** were:

I guess my questions are: 1) Can we use first party websites as sources for independent POIs?
2) Can we use first party websites as sources for chain POIs?
3) Are urls copyrighted?
4) Can we collect opening hours off doors?

The LWG’s answer can be seen below that, starting

LWG official position
Copying the opening hours of a business from its own website is fine…

That, I’m sure, is non-controversial. However some parts of what follows are somewhat “interesting”, not least:

… Even where the legal status of scraping is uncertain, it does not impact whether or not OSM can use the resulting data - it’s just a matter of the personal risk of the person running the scraper

which seems to imply that data can be “license-washed” by being included in a third-party database such as “alltheplaces”. Perhaps it would help for the LWG to expand on what they actually mean there? Maybe I’ve just misunderstood the sense of it.

However, as noted in the request to the LWG, where the business owners want this data to be public, it is extremely unlikely that there will be complaints from them that this data is being used by OSM et al**. The challenge occurs when the data isn’t actually the business owner’s to distribute under a licence of their choosing (see e.g. the Postcode Address File question above****). That’s why it’s important not to assume that “any data that might have got into alltheplaces is therefore freely licenced for use in OSM regardless of what licence it was originally under” and instead to read what the LWG actually said.

Best Regards,
Andy (for the avoidance of doubt, writing in a personal capacity, and most definitely not a lawyer)

** although there are exceptions; with a DWG hat on I can think of a few, in most cases invalid, complaints.

*** Incidentally, the “As far as I know [alltheplaces is] not used as a source to add data to OSM” statement in there is at best “incompletely informed”; a search through changeset tags for “%alltheplaces%” finds quite a few.

**** and when I browse data locally I see almost no opening hours in alltheplaces data but lots of postcodes.

Minh_Nguyen · January 6, 2024, 1:42am

Scraping is not a violation of copyright law per se – that’s the domain of contract law, terms of service, computer abuse laws, etc. What they said is that someone engaging in scraping takes on some personal risk, depending on the jurisdiction, but that the result of that scraping isn’t necessarily tainted by association, nor does it necessarily make the scraped content free.

A lawyer by training (but not my lawyer) once made an analogy to being handed a photocopy of a public domain book that had been shoplifted from a bookstore. There are undoubtedly limits to this analogy, but yes, license-washing is a thing, and there’s a fair chance that your computer exists because of it. To be clear, it’s not something I personally care to devote my time to, because it’s a bit of a Rube Goldberg contraption compared to what I’m most interested in doing.

Edit: This turns out to be a poor analogy altogether, and I misinterpreted the term “license-washing”. The point about scraping being orthogonal to copyright still stands, however.

KoiAndBlueBird · January 6, 2024, 1:49am

Probably a stupid question but couldn’t the LWG just make a final decision on this one? I feel like they are the most qualified for that, right?

Fizzie41 · January 6, 2024, 2:18am

Just had a look in my locale & noticed:

https://www.alltheplaces.xyz/map/#14.65/-28.08091/153.43921

All good, except that the marked Night Owl is actually located in the same group of shops as Brumbies Bakery, Coles & Shell in the bottom left corner!

Mateusz_Konieczny · January 6, 2024, 5:25am

IANAL, IANAL, IANAL, this is my personal understanding of situation, not consulted with anyone

SomeoneElse:

… Even where the legal status of scraping is uncertain, it does not impact whether or not OSM can use the resulting data - it’s just a matter of the personal risk of the person running the scraper
which seems to imply that data can be “license-washed” by being included in a third-party database such as “alltheplaces”.

My understanding is that license-washing (someone taking copyrighted data, falsely claiming that it is openly licensed, and publishing it under a license that does not apply to it) is distinct from both clean room design and scraping.

In clean room design something is recreated by people who have not seen original implementation, and therefore it is provable that new work is not tainted by copyrighted original (and only non-copyrightable parts are used as inspiration - for example, as I understand, it is legal and fine to look at Google Maps public transport routing and decide to add similar functionality to Organic Maps but using leaked Google Maps code or even implementing it after looking at leaked Google Maps code would be problematic).
In terms of map data clean room design would be going to some location after hearing that say “Google Maps has many shops mapped at Foobar Street, OSM has none”, without looking at Google Maps (so shops are clearly not copied from there).
So clean room design is kind of opposite of license washing.

Scraping does not affect the legality of the copied data, but doing it may break some rules - and depending on the type of scraping and local laws it may be illegal.
In terms of other mapping activity - it can be similar to mapping military installation where it is not copyright issue but mapper may break some local laws while doing it (analogy is not ideal as map with military installations may be illegal according to local law, while only scraping is problematic and there is no trouble with using product of it).
Maybe “flying a drone to get image for mapping” may be better analogy? Flying drone may or may not be illegal, depending on how it was done and local law. But mapping done based on that will not be affected by how images were collected.

(though I guess that some jurisdiction could rule that data obtained by scraping/people with the wrong religion/illegal drone flights is illegal, cannot be used and taints any products? But as I understand it, in typical jurisdictions this does not apply)

So LWG commented, as I understand, that ATP is not license washing because it was not copyrighted or database rights protected in the first place.

IANAL, IANAL, IANAL, this is my personal understanding of situation, not consulted with anyone

Minh_Nguyen · January 6, 2024, 5:42am

Thanks for the clarification. I wasn’t aware that this was a formal term for that practice specifically.

Ah nice analogy, and more relevant to our project too.

Mateusz_Konieczny · January 6, 2024, 5:48am

Not sure whether it is a formal term, but that is how it was used by the Wikimedia Commons community.

I found now

SimonPoole · January 6, 2024, 9:46am

The limits are that there are not even claimed intellectual property rights involved in the example and it illustrates exactly nothing of relevance to this thread.

The whole point of (conventional) copyright is that it is a near universal, state guaranteed, set of exclusive rights that does not rely on contracts between the parties to take effect. There are some corner cases wrt terms, and the US “fair use” terms tend to be substantially more relaxed than in other countries, but as said, corner cases.

I find the statement from the LWG a bit unfortunate as it could be taken as saying that material on a website is not protected at all, but naturally that is not the case, images, text, audio, visual etc. material that is eligible for copyright protection naturally doesn’t lose that protection just because the rights owner decided to use them on a website, or licence such use. So if you scrape images from websites and try to reuse them in a form that isn’t an exception in copyright protection you are asking for trouble.

What the LWG is referring to is the extraction of information, or if you so will “data”, from websites and that has already been discussed at length in this thread.

Minh_Nguyen · January 6, 2024, 3:39pm

OK, fair enough. I’ve redacted it from my earlier post.

ramseraph1 · January 8, 2024, 2:15am

Doesn’t look like this question was ever appropriately answered… If a citizen of a country where there are no database rights were to want to add data to OSM, from a database consisting of non-copyrightable data, from an organization which only operates in that country (let’s say the government of that country)… is it allowed?

EDITS: grammar and punctuation… hope it is more readable now.

IanH · January 8, 2024, 6:16am

The largest problem is that the OSM community is not one entity. Even with the official organization being registered as a charitable organization in the UK, each chapter is bound by the laws of the respective countries.

Then there is the official database that carries data with multiple copyrights that have been traditionally used with software. Licenses that have never been truly tested in most jurisdictions.

That doesn’t even include the different members with pseudo-official websites and tools, each maintained by their own subgroups.

At the end of the day, the best we can do is show that we are working in good faith, doing our best to follow the spirit of the respective laws in whatever jurisdictions the map might be edited or accessed. I doubt any multinational corporation’s legal department would want to deal with our situation. That is assuming they could even figure it all out if they wanted to. We likely get a pass on most of this, mostly due to being a charity that provides so much value to those in so many jurisdictions.

For that reason alone, most of the government agencies and corporations we partner with have an interest in preventing any real litigation from moving forward. They lose a lot when inoffensive volunteers get served with legal paperwork. In a way, OSM has become the UN of mapping by holding ourselves to a higher standard.