Is the license of alltheplaces suitable?

dieterdreist · June 14, 2023, 12:49pm

You just need to read what I wrote.

you seem to defend the status quo, to paraphrase: we have to respect European law because it is relevant for „big markets“ (I guess this is about user numbers, or is it revenue?).

Actually I tend to agree that realistically we have to respect western law as it is where most of our contributors and data users are, and our mission implies we will likely not respect other law if it would mean refraining from mapping.

Minh_Nguyen · June 14, 2023, 2:38pm

I’m not sure I understand the risk that we’re concerned about here. In the scenario I focused on, the dataset is produced and published by a U.S. entity without a significant EU (or UK) presence, scraped by a U.S.-based project, and contributed by a U.S.-based contributor. The only nexus to a jurisdiction with sui generis database rights is that it would end up getting distributed by a project with servers in the UK and EU, or that a Mapbox or OsmAnd would sell products based on it in the same jurisdictions.

Does the EU or UK claim a sort of universal jurisdiction over the protection of database rights? If so, what does this mean for just about any data that Americans have imported from their own government? The federal and state laws that enjoin agencies against copyrighting their work don’t necessarily apply overseas, and our laws are silent on the matter of database rights in Europe.

快乐的老鼠宝宝 · June 14, 2023, 3:18pm

If the Foundation thinks that removing China from the data would make OSM cleaner from a legal perspective, it is not impossible to split it off as a separate project

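If the Foundation feels that carving China out of OSM on its own and leaving that area blank would avoid legal risk, that is not out of the question either.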

Of course, this is only an extreme case
But my personal opinion is that the fact that OSM is not legal in some areas does not affect it as open data

SimonPoole · June 14, 2023, 4:13pm

I’m not offering an opinion on if alltheplaces contains potentially infringing material or not, I’ve already pointed out the limits of sui generis protection (you surely can get somebody to vet the data in question).

I’m just noting that if it does, all your offshore claims and anger over the EU meddling in your affairs (bad case of the US pot calling the EU kettle black) are irrelevant.

SomeoneElse · June 14, 2023, 4:27pm

Github begs to differ with some of that, the “without a significant EU (or UK) presence” part.

Mateusz_Konieczny · June 14, 2023, 5:31pm

There is a difference between following a local law that makes some data sources unavailable and following a law that makes it impossible to map anything or requires us to add false data.

Matija_Nalis · June 14, 2023, 5:35pm

I agree that any one of obtaining, verification or presentation is enough. However, you seem to miss the (IMHO crucial) part of “to prevent extraction and/or re-utilization of the whole or of a substantial part”. IOW, it does not seem to trigger if there was substantial investment for some other purpose, one that is not about preventing extraction and/or re-utilization. Or do you read that differently?

My point being that a shop chain which publicly publishes its list of shops and their opening hours does not do so in order to prevent extraction and/or re-utilization, thus that clause does not trigger. (But if it were e.g. putting the list behind a paywall, or using some other means of preventing extraction or re-utilization, then it would trigger.)

Again, you seem to miss the (crucial, to me) part which says “which conflict with a normal exploitation of that database or which unreasonably prejudice the legitimate interests of the maker of the database shall not be permitted”

Yes, scraping is exactly “extraction and/or re-utilization of […] parts of the contents of the database”, I agree with you on that. However, such scraping is totally fine, unless:

such scraping conflicts with normal exploitation of that database (which I argue it does not, as making such a DB available to potential customers and other interested parties is exactly why the shop is publishing its coordinates and opening hours), or
such scraping unreasonably prejudices the legitimate interests of the maker of the database (which again I argue it does not; to the contrary, it is exactly the legitimate interest of the shop and the whole reason why they published the location and opening hours of their shops)

Absolutely. Also note that it is just my opinion, and IANAL either.

don't get me wrong either

(I also do not mean anything bad by emphasizing the parts that I find important but that you do not seem to have given the weight I think they deserve. I otherwise find the rest of your post quite agreeable. Hopefully that can be taken for granted, but as my wording doesn’t look quite ideal to me, other people might take it wrongly, so I want to clarify that I meant nothing insulting by the “you seem to have missed” phrase. It’s just that I’m unable to come up with a more benign/non-attacking way of emphasizing a part of the clause that does not seem to have gotten adequate attention [I’m willing to learn, so feel free to PM me suggestions].)

Matija_Nalis · June 14, 2023, 5:47pm

I agree, I see no issue in such a case. If that data is unencumbered by copyright/database rights, one could do anything with it. E.g. if NASA puts some data in the public domain, I can use it everywhere however I like, even if I live in the EU and ESA would not put similar data in the public domain. That is irrelevant.

The only nexus to a jurisdiction with sui generis database rights is that it would end up getting distributed by a project with servers in the UK and EU, or that a Mapbox or OsmAnd would sell products based on it in the same jurisdictions.

True, but the only sui generis database rights that might possibly be created in such a case would belong to the entity that created such a database (e.g. some alltheplaces-alike scraper based in the EU), and not to the original unprotected-data publishers (e.g. the US McDonald’s chain or whatever). Thus, if such a hypothetical alltheplaces.eu decided to release their rights under CC0, such data would again be free to import into OSM, as the only rights that existed were relinquished by that CC0 license.

Minh_Nguyen · June 14, 2023, 5:53pm

I’m not angry, just trying to understand if there’s a workable compromise that would allow some of the AllThePlaces scrapers to be used but not others. It isn’t difficult to determine whether a brick-and-mortar store chain operates in Europe.

If you expand that quote a little, you’ll see that it was taken out of context:

I never claimed that all of AllThePlaces would be admissible. Just AllThePlacesOfSomeOfTheChains. When considering an import of Amerigas locations via ATP, what does it matter that ATP also scrapes Aldi? Unlike OSM, ATP distributes a separate file for each company whose website got scraped, so it’s straightforward to jettison data about any company that does business in the EU for instance.
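As a rough illustration of what jettisoning per-company files could look like, here is a minimal Python sketch. It assumes each company's data sits in its own file named after the company; the file names and the exclusion list below are hypothetical placeholders, not ATP's actual layout or a vetted list of chains.

```python
from pathlib import PurePath


def admissible_files(paths, excluded_brands):
    """Keep only per-company files whose base name is not on the exclusion list.

    `paths` and `excluded_brands` are supplied by the caller; deciding which
    companies do business in the EU/UK is a manual step this sketch does not do.
    """
    excluded = {name.lower() for name in excluded_brands}
    return [p for p in paths if PurePath(p).stem.lower() not in excluded]


# Hypothetical file names, for illustration only.
files = ["output/amerigas.geojson", "output/aldi.geojson"]
print(admissible_files(files, {"aldi"}))
```

The exclusion list is the hard part: it encodes a judgment about each company, which is exactly what is under discussion here.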

SomeoneElse · June 14, 2023, 6:21pm

I don’t think so - for completeness the full quote is here. It is clear from my link that people in UK/Europe are adding European data to “alltheplaces” (see e.g. here) and are also updating OSM with similar data.

My concern is actually more than the titular “Is the license of alltheplaces suitable” - it is “Would the way the project is being run (including CC0 claims here) allow third-parties to spread FUD about the suitability of OSM data for commercial use?”.

So you’re saying that the general answer to the titular question is “no”, but with the caveat that in some cases (perhaps where none of the parties involved have any EU/UK presence) it may be “yes”?

Matija_Nalis · June 14, 2023, 7:03pm

Or that the general answer is “yes”, but with the caveat that it is “no” in some cases (depending on how you phrase it).

It should be relatively trivial to determine if some point is inside the USA, and to mark a dataset as potentially problematic if it is not.
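A minimal Python sketch of such a check, assuming a crude bounding-box test is good enough for a first pass. The box coordinates are approximate values chosen for illustration (they also cover parts of Canada and Mexico and omit US territories); a real check would test against actual boundary polygons.

```python
# Approximate (min_lon, min_lat, max_lon, max_lat) boxes; rough assumptions only.
US_BOXES = {
    "contiguous": (-125.0, 24.4, -66.9, 49.4),
    "alaska": (-180.0, 51.0, -129.9, 71.5),
    "hawaii": (-160.3, 18.9, -154.8, 22.3),
}


def in_usa_roughly(lon, lat):
    """Return True if the point falls inside any of the rough US boxes."""
    return any(
        min_lon <= lon <= max_lon and min_lat <= lat <= max_lat
        for min_lon, min_lat, max_lon, max_lat in US_BOXES.values()
    )


def is_potentially_problematic(points):
    """Flag a dataset if any of its (lon, lat) points lies outside the boxes."""
    return any(not in_usa_roughly(lon, lat) for lon, lat in points)


# One point in New York, one in Berlin: the dataset gets flagged.
print(is_potentially_problematic([(-74.0, 40.7), (13.4, 52.5)]))
```

A flag here only means "look more closely"; it says nothing about whether database rights actually apply.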

So I’d say the succinct answer would be “mostly harmless”

Minh_Nguyen · June 14, 2023, 7:05pm

To be crystal clear: I claimed that AllThePlaces has scraped some store locators of chains that have no European presence, such as Amerigas or the American Automobile Association. (They don’t all start with “A”; I’m just being lazy.) My intention is to find a compromise that would preserve the ability to consider scraped locations of these specific chains.

You correctly pointed out that ATP has merged some scrapers for chains that are in Europe, but I never claimed otherwise. Others can feel free to debate the admissibility of ATP’s European coverage in OSM, but I’m not convinced that database rights are relevant to the likes of Amerigas and the AAA when ATP keeps these datasets separate. A mapper could even take the source code and scrape just one of the websites individually, never allowing their hard drive to become tainted with the other companies’ data.

The general case is too big for me to comment on intelligibly. The non-EU/UK edge case is what I’m most interested in. I’m not a very ambitious armchair lawyer!

This is an interesting question. Perception is a funny thing, and preempting FUD can become self-defeating after a certain point. From my perspective, AllThePlaces is actually quite conservative and cautious, relying as it does on the chains’ published store locators. Meanwhile, there’s a whole cottage industry of companies that scrape all the Web about all the places. But at least here in the States, we have such a wealth of unencumbered data to import – assuming U.S. local governments don’t enjoy database rights in Europe – that importing all of AllThePlaces doesn’t strike me as an urgent priority.

Mateusz_Konieczny · January 5, 2024, 8:21am

Note https://osmfoundation.org/wiki/Licensing_Working_Group/Minutes/2023-08-14#Ticket#2023081110000064_—_First_party_websites_as_sources

LWG official position
Copying the opening hours of a business from its own website is fine. There are no copyright rights in factual information like opening hours. There’s no investment in the database for a business for its own opening hours, because that is something that the business has to have for its purpose of operating. A business does not have additional investment in a database, so there are no database rights to protect. Scraping/spidering is legal in many jurisdictions, including the US. Even where the legal status of scraping is uncertain, it does not impact whether or not OSM can use the resulting data - it’s just a matter of the personal risk of the person running the scraper.

All LWG members that were present voted in favour of the following:

LWG official position on community.osm.org thread
From a legal risk perspective, we do not consider accepting this information to be a legal risk to OSMF and therefore DWG is not going to revert these edits.

SomeoneElse · January 5, 2024, 9:49am

Just to be clear, you’ve answered a general “is the licence of X compatible” question with a statement from the LWG that “data of one specific type will be compatible with OSM under some specific circumstances”.

It doesn’t address the general question asked at the top (“Is the license of alltheplaces suitable?”) or make any statement about non-opening hours data on first-party websites (as discussed above).

Minh_Nguyen · January 5, 2024, 3:31pm

I think @Mateusz_Konieczny’s post is on-topic. The LWG fielded an inquiry from an AllThePlaces developer that recounted several specific objections to the use of AllThePlaces data in OSM, specifically citing the original post in this thread, and this was their response. It wasn’t limited to opening hours (emphasis on the word “like”). Perhaps they wanted to address these questions specifically so they wouldn’t have to deliver an opinion on every similar scraper that comes along. You could demand a yes or no answer from the LWG if you feel it would make a difference.

SomeoneElse · January 6, 2024, 1:10am

To be clear, the specific request was quoted here and the questions asked there*** were:

I guess my questions are: 1) Can we use first party websites as sources for independent POIs?
2) Can we use first party websites as sources for chain POIs?
3) Are urls copyrighted?
4) Can we collect opening hours off doors?

The LWG’s answer can be seen below that, starting

LWG official position
Copying the opening hours of a business from its own website is fine…

That, I’m sure, is non-controversial. However some parts of what follows are somewhat “interesting”, not least:

… Even where the legal status of scraping is uncertain, it does not impact whether or not OSM can use the resulting data - it’s just a matter of the personal risk of the person running the scraper

which seems to imply that data can be “license-washed” by being included in a third-party database such as “alltheplaces”. Perhaps it would help for the LWG to expand on what they actually mean there? Maybe I’ve just misunderstood the sense of it.

However, as noted in the request to the LWG, the business owners want this data to be public, so it is extremely unlikely that there will be complaints from them that this data is being used by OSM et al**. The challenge occurs when the data isn’t actually the business owner’s to distribute under a licence of their choosing (see e.g. the Postcode Address File question above****). That’s why it’s important not to assume that “any data that might have got into alltheplaces is therefore freely licenced for use in OSM regardless of what licence it was originally under” and instead to read what the LWG actually said.

Best Regards,
Andy (for the avoidance of doubt, writing in a personal capacity, and most definitely not a lawyer)

** although there are exceptions; with a DWG hat on I can think of a few, in most cases invalid, complaints.

*** Incidentally, the “As far as I know [alltheplaces is] not used as a source to add data to OSM” statement in there is at best “incompletely informed”; a search through changeset tags for “%alltheplaces%” finds quite a few.

**** and when I browse data locally I see almost no opening hours in alltheplaces data but lots of postcodes.

Minh_Nguyen · January 6, 2024, 1:42am

Scraping is not a violation of copyright law per se – that’s the domain of contract law, terms of service, computer abuse laws, etc. What they said is that someone engaging in scraping takes on some personal risk, depending on the jurisdiction, but that the result of that scraping isn’t necessarily tainted by association, nor does it necessarily make the scraped content free.

A lawyer by training (but not my lawyer) once made an analogy to being handed a photocopy of a public domain book that had been shoplifted from a bookstore. There are undoubtedly limits to this analogy, but yes, license-washing is a thing, and there’s a fair chance that your computer exists because of it. To be clear, it’s not something I personally care to devote my time to, because it’s a bit of a Rube Goldberg contraption compared to what I’m most interested in doing.

Edit: This turns out to be a poor analogy altogether, and I misinterpreted the term “license-washing”. The point about scraping being orthogonal to copyright still stands, however.

KoiAndBlueBird · January 6, 2024, 1:49am

Probably a stupid question but couldn’t the LWG just make a final decision on this one? I feel like they are the most qualified for that, right?

Fizzie41 · January 6, 2024, 2:18am

Just had a look in my locale & noticed:

https://www.alltheplaces.xyz/map/#14.65/-28.08091/153.43921

All good, except that the marked Night Owl is actually located in the same group of shops as Brumbies Bakery, Coles & Shell in the bottom left corner!

Mateusz_Konieczny · January 6, 2024, 5:25am

IANAL, IANAL, IANAL, this is my personal understanding of situation, not consulted with anyone

SomeoneElse:

… Even where the legal status of scraping is uncertain, it does not impact whether or not OSM can use the resulting data - it’s just a matter of the personal risk of the person running the scraper
which seems to imply that data can be “license-washed” by being included in a third-party database such as “alltheplaces”.

My understanding is that license-washing (someone taking copyrighted data, falsely claiming that it is openly licensed and publishing it under a license not applicable to it) is distinct from both clean room design and scraping.

In clean room design something is recreated by people who have not seen original implementation, and therefore it is provable that new work is not tainted by copyrighted original (and only non-copyrightable parts are used as inspiration - for example, as I understand, it is legal and fine to look at Google Maps public transport routing and decide to add similar functionality to Organic Maps but using leaked Google Maps code or even implementing it after looking at leaked Google Maps code would be problematic).
In terms of map data, clean room design would be going to some location after hearing that, say, “Google Maps has many shops mapped at Foobar Street, OSM has none”, without looking at Google Maps (so the shops are clearly not copied from there).
So clean room design is kind of the opposite of license washing.

Scraping does not affect the legality of the copied data, but doing it may break some rules, and depending on the type of scraping and local laws it may be illegal.
In terms of other mapping activity, it can be similar to mapping a military installation, where it is not a copyright issue but the mapper may break some local laws while doing it (the analogy is not ideal, as a map with military installations may itself be illegal according to local law, while here only the scraping is problematic and there is no trouble with using its product).
Maybe “flying a drone to get an image for mapping” is a better analogy? Flying a drone may or may not be illegal, depending on how it was done and on local law. But mapping done based on the images will not be affected by how they were collected.

(Though I guess that some jurisdictions could rule that data obtained by scraping/people with the wrong religion/illegal drone flights is illegal, cannot be used and taints any products? But as I understand it, in typical jurisdictions this does not apply.)

So the LWG commented, as I understand it, that ATP is not license washing because the data was not protected by copyright or database rights in the first place.

IANAL, IANAL, IANAL, this is my personal understanding of situation, not consulted with anyone