Automated edit proposal - get rid of google.com/maps image urls

Friendly_Ghost · June 7, 2023, 11:53pm

The title says it all: I propose to mass remove all image tags that contain google.com/maps.
These links do not lead to picture files and they contain copyrighted content from Google.

The Overpass query I propose to use for this automated edit is https://overpass-turbo.eu/s/1vQg. A simpler query with out count gives me 444 results, so the scale of this edit will be relatively small.
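For readers who want to reproduce the count, here is a minimal Python sketch that builds a count query of the same shape as the queries quoted in this thread and reads the total out of the reply. It does not reproduce the saved query behind the short link above. The function names are hypothetical, and the JSON layout of an `out count` reply (a single element of type `count` whose `tags` hold string totals) is an assumption that should be checked against a real response.

```python
import json


def build_count_query(url_fragment: str = "google.com/maps") -> str:
    """Return an Overpass QL query that counts objects whose image tag matches."""
    # Case-insensitive regex match on the image value, as in the thread's queries.
    return f'[out:json];nwr[image~"{url_fragment}",i];out count;'


def parse_total(response_text: str) -> int:
    """Extract the total from an Overpass JSON 'out count' reply (layout assumed)."""
    data = json.loads(response_text)
    for element in data.get("elements", []):
        if element.get("type") == "count":
            # Counts are assumed to arrive as strings, so convert explicitly.
            return int(element["tags"]["total"])
    raise ValueError("no count element found in response")
```

The query text returned by `build_count_query()` can be pasted into Overpass Turbo or sent to any Overpass API endpoint; no network call is made here.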

After removing the image tags I can run JOSM’s validator (shortcut shift + V) to find issues that can reasonably be fixed within the same changeset. I can then share the resulting changeset here and on Mechanical edits/Friendly Ghost on the Wiki.

SomeoneElse · June 8, 2023, 9:20am

Something that would also be useful would be to investigate the actual source of these. A couple of local ones, here and here, seem to be SEO by “one edit per account” editors. Often when that happens there will be some obvious rubbish in the wrong place that can just be deleted altogether (though these two have both been edited subsequently and clearly aren’t “rubbish in the wrong place”).

Tordanik · June 8, 2023, 9:31am

Are you volunteering to do that? Otherwise, let’s please not prevent a useful automated edit with a vague “someone who’s not me could someday put a lot of effort into reviewing these individually and perhaps discover something interesting along the way”.

SomeoneElse · June 8, 2023, 9:53am

With a DWG hat on I’ve already commented on one of these, yes. However, OSM is a community project and local communities are key to maintaining OSM data quality. A list of “things that may or may not be wrong” is worth looking at with the benefit of local knowledge. The one I commented on was relatively local to me and I was able to use that to see that one of the values was obviously wrong (not malicious; just an error).

Er, who is preventing anything? Overpass can run with a date parameter, so this query will still be possible whether or not this automated edit has happened.
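As a rough illustration of the date parameter, the Python sketch below prefixes a query body with Overpass QL's `[date:"..."]` setting, which asks the server to answer from the database state at that moment. The helper name and the timestamp in the usage note are hypothetical, and the sketch assumes the body passed in does not already begin with its own settings statement.

```python
def with_date(query_body: str, timestamp: str) -> str:
    """Prefix an Overpass QL query body with a [date:...] setting.

    Assumes query_body carries no settings statement of its own, since
    settings are expected once at the very start of a query.
    """
    return f'[date:"{timestamp}"];{query_body}'


# Hypothetical usage: the image query as of a moment before any mass edit.
historic_query = with_date(
    'nwr[image~"google.com/maps",i];out count;',
    "2023-06-07T00:00:00Z",  # illustrative timestamp, not taken from the thread
)
```

Running `historic_query` after the edit should still list what matched beforehand, which is the point being made here.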

What I’m rather more worried about is whether there’s any use of Google sources in OSM here, which would require redaction, rather than just a mechanical edit.

SimonPoole · June 8, 2023, 10:48am

The problem is it isn’t necessarily a useful edit.

Literally the 1st one I checked was Node: VELO-ART.CH GMBH (10578653180) | OpenStreetMap. Would it be “nicer” if it linked to the same images hosted on a more appropriate platform? Yes, but that’s about it; it is just as useful or useless as any other image link.

Essentially all image tags contain links to copyrighted material (the narrow exception being images out of copyright and US federal government material); in other words, that is not a reason.

If there was a policy / general agreement that links in OSM should only point to “open” content, then I could get behind removing any such problematic tags, except that means that links to the majority of websites and other content would have to go.

Tordanik · June 8, 2023, 11:15am

In my opinion, links in OSM should indeed point to either a first-party resource (a shop’s official website in a website=* tag would be the typical example) or else openly licensed third-party content (such as links to Wikipedia, Wikidata, Commons, etc.). Links to third-party, proprietary content do not feel like a good fit for OSM.

You are correct, however, that no such policy exists.

Taya_S · June 8, 2023, 12:25pm

A bit off topic from the original proposed edit maybe, but out of curiosity I ran the query:

nwr[source~"google",i];
(._;>;);
out meta;

and got 60MB worth of results and ~40k objects, all listing Google in some form in the source tag. That’s quite a few objects. It might also be an idea to sort those out sometime. No clue how best to deal with that, since copyright is involved.

This included sources such as:
source=Google Streetview June 2015 image
source=https://drive.google.com/file/RestOfTheUrlRedacted
source=Google Workspace Updates: New community features for Google Chat and an update on Currents (lol)
(sidenote: also found over 500 dead google+ website links)
source=Google
source=Google maps and local knowledge
source=Bing and Google
source=Google Sattelite

SomeoneElse · June 8, 2023, 1:03pm

Yes, I am working through those. For completeness, be aware of https://overpass-turbo.eu/s/1vRQ, which includes post Haiti 2010 earthquake mapping there. See WikiProject Haiti/Earthquake map resources - OpenStreetMap Wiki.

Edit: Also, some of the others may be valid - some information that someone has themselves stored at Google Drive presumably doesn’t become invalid just because they stored it on Google Drive? Google Plus might (maybe - not sure what the Ts and Cs were) count as a “website” in the way that for some businesses, their “website” is at Facebook?

Taya_S · June 8, 2023, 1:20pm

Oh wow yeah, that’s a full half of all the Google sources I found.

I wasn’t suggesting those were invalid, mostly just highlighting some of the common sources I found. The Google Drive ones just depend on whatever they link to, and the Google+ ones, as far as I can tell, are now just dead links.

SomeoneElse · June 8, 2023, 1:23pm

… and that would be a useful project - going through the Google Plus links and figuring out what the businesses in question have done with their “online presence” - created another website? Moved to Facebook / Instagram / Something else?

Taya_S · June 8, 2023, 1:32pm

Yeah, it sounds like it would. Would MapRoulette be fit for this? I haven’t really used the platform much, so I’m not fully sure. If it is, I’ll have a look at creating a challenge for it there.

Edit: I created this challenge MapRoulette

SimonPoole · June 8, 2023, 2:59pm

Just for those that didn’t actually follow the link, it points to the shop’s gmap presence that they very likely added or at least updated themselves. It is splitting hairs a bit to declare that “bad” and hosting the same content somewhere non-Google “good”.

All that said, this would likely actually be a reasonable thing to systematically check: see if the links still work and point to an alternative location and/or simplify them if possible. For example for the shop in question https://www.google.com/maps/contrib/114608459294167716655/photos/ seems to work.

Friendly_Ghost · June 8, 2023, 8:17pm

How so?

Fair enough. The other reason I gave still stands.

That’s wishful thinking since most map objects never make it past version 1 or 2. We (the mappers who are actively involved in QA) don’t have enough manpower to review all tags of objects on such a list manually. If you believe otherwise, then I have some MR tasks that will keep you busy for the rest of the year.
Getting rid of some wrong data while not noticing each and every other issue in sight is still a good step forward.

SomeoneElse · June 8, 2023, 8:19pm

Not me guv - too busy outside mapping

Friendly_Ghost · June 12, 2023, 10:38am

Can the presence of image~google.com/maps tags help in determining whether there’s any use of Google sources in OSM, or can the tags be removed without hindering this other validation process?

SomeoneElse · June 12, 2023, 10:55am

Without looking at the list of what values are stored, it’s difficult to comment. Above there were 444 that matched your original query, so someone would need to go through that.

Friendly_Ghost · June 12, 2023, 8:21pm

So I’ve been clicking for a while on a whole bunch of different objects that match the query and to changesets in which these objects were originally added. Each of them that I checked seems to be valid, meaning that they were likely mapped without the use of Gmaps, but mappers probably wanted to add a nice picture to whatever they mapped in the hope that it would show up on Mapcarta or something.

The query nwr[image~"google.com/maps",i][source~"google",i];out count; also returned zero results.

There are some weird objects in there, like objects with only a name and an image tag, which should probably be revised. I already contacted a mapper of one such case here.

I can limit the mass edit to objects that contain at least one of the keys listed on Map Features on the Wiki or can otherwise be considered valid, and leave the invalid map objects aside for closer review by mappers with more local knowledge.
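A minimal Python sketch of that split, assuming the objects have already been downloaded as dictionaries with a `tags` mapping (the layout Overpass JSON uses for elements). The key set below is a small illustrative subset chosen for this example, not the full Map Features list, and both function names are hypothetical.

```python
# Illustrative subset only; the real list would come from Map Features on the Wiki.
FEATURE_KEYS = {"amenity", "shop", "tourism", "building", "highway", "leisure", "office"}


def has_feature_key(tags: dict) -> bool:
    """True if at least one tag key is a recognised feature key."""
    return any(key in FEATURE_KEYS for key in tags)


def split_candidates(objects: list) -> tuple:
    """Split objects into (eligible for the mass edit, left for manual review)."""
    eligible, needs_review = [], []
    for obj in objects:
        tags = obj.get("tags", {})
        if has_feature_key(tags):
            eligible.append(obj)
        else:
            # e.g. objects carrying only a name and an image tag
            needs_review.append(obj)
    return eligible, needs_review
```

Objects that land in the second list would stay untouched by the automated edit and could be handed to local mappers.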

It’s also notable that there are no cases of image~google.com/maps at all in France, which leads me to believe that the French have already made a successful effort to purge these tags.

Matija_Nalis · June 13, 2023, 9:29pm

If I understand correctly, you propose to remove image=* tags from some objects. Image tags point to images of those objects, which are useful for some use(r)s. (Or am I misunderstanding, and does the general usefulness of image tags need defending?) Thus:

the proposed edit, by the very definition of “removing information”, reduces the value of OSM data (if we agree?)
Also, people who put effort into adding useful data to the map usually get sad when that effort is annihilated; and making contributors sad is usually bad in itself if one wants to grow the community.

So, as I see it, in order to be a "useful edit", such an automated edit would need to accomplish positive things that outweigh the negative things, agreed? So what seems to me to be disputed is whether the positive things the automated edit would do indeed outweigh the negative things it would do.

Can you elaborate on what those positive things would be, and why you think they outweigh the negative things? These are the ones I can see from your post and from my understanding:

dislike of links to copyrighted data (discussed before and below)
dislike of non-direct links to images (see “links do not lead to picture files”, discussed below)
it would make the active database slightly smaller (although it will also make the history database slightly larger - not discussed here as I find it too minuscule to be worth discussing).

Are there any more positive reasons?

I’d agree with Simon there; as much as I dislike Google, it should have similar treatment as other copyrighted content (be it link to Facebook or to any website with “all rights reserved” or similar - most website, contact:facebook, opening_hours:url etc. would have to go too if we agreed that links to copyrighted material are not OK).

So, the other reason being “Links do not lead to picture files”, if I read that correctly:

But I’m not sure what you mean by “Links do not lead to picture files”. I click on those links from your overpass query (e.g. this random one) and my web browser produces a picture of that storefront. Doesn’t it for you? I’d say that this link leads to picture files. Wouldn’t you?

Or do you allude to the fact that direct URL endpoint is not Content-type: image/jpeg but something else which only includes image? That is true, but note that:

it seems to be explicitly allowed by the image=* wiki wording (i.e. “(or) URL or URI of a page containing the image alongside copyright information and other details”)
the wiki mentions it as an alternative for wikimedia_commons (which does pretty much the same thing: encapsulates the image in some html/javascript container with extra stuff; which reinforces the idea that it is to be explicitly allowed)
and actual usage of the tag shows lots of image=* links working in a similar manner, i.e. not being a direct link to an image but containing the image indirectly (e.g. all the image=* which link to wikipedia.org images, commons.wikimedia.org, flickr, mapillary, google drive, image hosting sites, redirectors like tinyurl etc). Update: a rough statistic of image=* says that of a total of ~301k image values, at least 170k of the top-20 sites are complex html+js+css+image, and only about 31k of the top-20 sites are simple direct-to-image links; i.e. only about 18% of the values used are simple images.

Or am I missing something?
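The direct-versus-indirect distinction discussed in this post can be sketched as a check on the Content-Type header a URL returns. This Python helper is only an illustration: the function name is hypothetical, it inspects a header string that the caller has already fetched, and treating every `image/*` media type as a "direct" image is an assumption made for this example.

```python
def is_direct_image(content_type_header: str) -> bool:
    """True if a Content-Type header value denotes an image media type.

    Parameters such as '; charset=UTF-8' are ignored, and the comparison
    is case-insensitive because media type names are case-insensitive.
    """
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    return media_type.startswith("image/")
```

A page that merely embeds a picture would typically report `text/html` and so count as indirect, which by the statistic above is how most image=* values behave.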

Friendly_Ghost · June 13, 2023, 11:29pm

I think we can drop the point I tried to make about copyrighted images. I concede that current policy allows this.

That is my main concern. Creating consistency in how image=* is used (by removing Gmaps links in this case) will benefit many users.

Most of these can be retagged to URLs in one way or another, but that’s not possible with Gmaps content. That’s why I propose to remove these instead of finding a way to re-tag them.

Personally I would discourage users from tagging any indirect links and I would put all Wikimedia Commons into the wikimedia_commons key, but those ideas are beyond the scope of this automated edit proposal.

Inconsistent use of the image=* key is unhelpful for users and makes OSM less practical to use. Removing this data is actually beneficial.

Would you prefer a re-tagging of Gmaps image links to a separate key instead (following the ATYL guidelines)?

Matija_Nalis · June 14, 2023, 6:47pm

In which way exactly would such data removal benefit them, could you elaborate? Because I completely fail to see how removing information can benefit anyone, besides making the download (insignificantly in this case) smaller.

I can see exactly two situations:

the data consumer is able to parse more complex image links (indirect image linking), e.g. by opening the image link in a browser where the user can see the image. In that case, your proposed automated edit would make the situation worse for the data consumer, by taking useful information away from them
the data consumer is unable to parse more complex image links (i.e. it can only display directly-linked images). In this case, it would obviously not display the image to the user, which would be exactly the same result/usefulness to the data consumer as if you did the proposed automated edit and removed the image tag. Thus, data consumers wouldn’t benefit.

Thus, doing such a proposed automated edit will make the situation worse for some data consumers, while making no difference for other data consumers. It won’t make the situation better in either case. Thus, its impact can only be negative, never positive. As such, it sounds like a bad idea.

No, I would not, as I don’t see how it would make the situation better (even if the proponent committed to do PRs implementing support for the new tag for the top-25 data consumers, the result would still be worse than, or at best equally good/bad as, the current situation)

I don’t see why you think it’s impossible. It’s possible, as with the others (e.g. wikimedia, wikipedia, mapillary…).
E.g. that google.com/maps example from your overpass that I’ve linked earlier? You can get a direct Content-Type: image/jpeg of that storefront picture, like this.

It’s not even especially hard. The main problem, however, is that such deep-linking loses context: when you link directly to a .jpg on Wikimedia Commons, you lose all the metadata, as opposed to when you link to File: on Wikimedia Commons. And if the link was to a Category (like this Google Maps example is), you lose all but the first picture. (Another problem is that those might get blocked at some time in the future.)

So, it would seem that your proposed automated edit stems from the fact that you don’t like how image=* is defined on the wiki, and you’d like it to be simpler. Am I correct?

Well, it is what it is, and it is a little late to change how it is supposed to be used. Trying to enforce that imagined simplicity, which would break much more than just Google Maps links, is not a good idea, IMHO (and subjectively enforcing rules in one case and not in other cases is an idea I find even more disturbing). Let’s not go there.

If you have a dream of such simplicity, you might want to propose a new tag like image_direct which would have a simpler syntax, different from the one specified by image=*, but I’d also consider that a waste of time and effort (it would require more work all around, while not accomplishing anything that isn’t already possible with much less work)

TL;DR: it ain’t broken, don’t fix it.

What might be a good idea instead, if you’re feeling up to it, is to go (and encourage other people to go) visit those locations, upload your own (better!) pictures to Wikimedia Commons, and then replace that image=https://maps.google.... with wikimedia_commons=File: (or wikimedia_commons=Category:). That would replace unwanted Google copyright-restricted stuff with free Wikimedia Commons stuff, and at the same time clear that parsing complexity from those image=* tags (well, it would move the complexity to the wikimedia_commons=* tag, but it is always expected there)
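The replacement step suggested here could look roughly like the Python sketch below, applied to one object's tags once a freely licensed photo has been uploaded. The function name is hypothetical, the Commons title must be supplied by the mapper, and matching on the substring `google.com/maps` simply mirrors the thread's Overpass query; other Google URL forms would need their own check.

```python
def retag_to_commons(tags: dict, commons_title: str) -> dict:
    """Return a copy of tags with a Google Maps image link replaced by wikimedia_commons.

    commons_title must start with 'File:' or 'Category:'. Tags whose image
    value is not a Google Maps link are returned unchanged (as a copy).
    """
    if not commons_title.startswith(("File:", "Category:")):
        raise ValueError("commons_title must start with 'File:' or 'Category:'")
    new_tags = dict(tags)  # never mutate the caller's dictionary
    image = new_tags.get("image", "")
    if "google.com/maps" in image.lower():
        del new_tags["image"]
        new_tags["wikimedia_commons"] = commons_title
    return new_tags
```

Because this needs a real photo for each object, it is a manual, per-object workflow rather than something to run as a mass edit.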