Restrict wikimedia_commons URLs as image=* tag values?

I’m thinking the same.

In fact, the wikimedia_commons links can represent various types of content, including categories, images, videos, and audio. I see it as something more versatile and generic. On the other hand, the image link is more specific; it explicitly indicates that you’ll receive a single image and nothing else.

Please read my whole message and don’t extract only what agrees with your position. I also said that I’d use the wikimedia_commons tag for files too. To be honest, it doesn’t really matter what type of file a user uploads: a video or a picture could both be useful for the end user, and only a madman would link an audio or PDF file. Moreover, who can say what a link to a file hosted on another platform might contain? It could point to malware or spyware, whereas a Wikimedia Commons file is safer.

I suppose you misunderstood my intention. I quoted the part I agree with in order to point that out.

The image tag effectively conveys that the content is an image, which is crucial information for visualizers.

Let’s consider the common scenario where a visualizer wishes to display an image that accurately represents an object. In this context, the wikimedia_commons tag may be too generic, as it can encompass a wide range of content. For instance, if it’s a collection, it doesn’t specify which image serves as the most representative. In contrast, the image tag provides precisely what a visualizer needs.

Regarding security, the choice of security checks to be performed is at the discretion of the visualizer. They can still filter by URL if it aligns with their specific use case. It’s essential to keep in mind that just because a file is hosted on Wikimedia Commons doesn’t necessarily mean it’s suitable or safe for display within the OpenStreetMap context. Many types of images, such as medical images, could raise concerns.

That’s why displaying the whole category (OsmAnd does this, for example) could be more useful than choosing only one picture.

I don’t think this applies to OSM, so why use it as an argument? Moreover, most Commons images have a name that makes it clear what they represent, which could make vandalism easier to spot. How can you tell what image a URL such as imgur.io/ag5ja1 (random string) links to?

It depends on the specific context, and one way doesn’t exclude the other. You can have wikimedia_commons with a collection of images, and at the same time image with the image most representative of the object. Then the visualizer can choose what it needs.

I brought it up to emphasize that an image being in wikimedia_commons doesn’t guarantee safety, so using safety as an argument in favor of wikimedia_commons over image is weak. Neither of them provides a safety guarantee. It’s the tool or visualizer making use of such links that should take care of this aspect.

While it’s true that Wikimedia Commons contains some NSFW content, it is a much more predictable and trustworthy source of images than the Internet writ large. At least there is some degree of moderation to prevent vandalism from persisting too long. Some mappers even trust Commons enough that they expect renderers to display any arbitrary Commons image that appears in a wiki:symbol tag – directly on the map:

When fetching arbitrary content from the Internet, a data consumer doesn’t only have to worry about NSFW content and vandalism. The content can go away at any time, because the host has no way of knowing that it’s being used on OSM and has made no commitment to keeping it online. Commons has a system to track which files are used on OSM so administrators are aware of this usage; they can avoid arbitrarily deleting the image or they can update OSM if they have to delete the image for copyright reasons.

Worse, an arbitrary domain could get squatted and start serving up malware or otherwise violate your privacy. It’s a really bad idea to hotlink this content and load it automatically without the user’s consent.

This is not a theoretical concern. OpenHistoricalMap has customized the OSM frontend to embed the contents of image verbatim in the sidebar when you visit an element’s page. Unfortunately, many of the images tagged in OHM are broken links. The vast majority of the images that still work are from Commons:

I diverted from the main argument (you wouldn’t have a string of random characters for a Commons image name, so this is a non-problem for the discussed point), so I’ll bring it back on rails.

Here is why I think using a Commons tag is better than using image: string length. OSM tag values are limited to 255 characters. The obligatory prefix https://commons.wikimedia.org/wiki/ takes up 35 of them, which isn’t a lot, but they would be saved by using the wikimedia_commons tag.
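To put numbers on the length argument, here is a small Python sketch. The 255-character limit and the prefix come from the post above; the file name is borrowed from the traffic sign example elsewhere in this thread, and the variable names are just for illustration.

```python
# Limit on OSM tag values, as stated in the post above.
MAX_VALUE_LENGTH = 255

# Prefix that every image=* value pointing at a Commons page has to carry.
URL_PREFIX = "https://commons.wikimedia.org/wiki/"

# Example page name, taken from the traffic sign example in this thread.
page_name = (
    "File:Lane_use_diagram_sign_at_Interstate_280_"
    "and_Almaden_Plaza_Way,_San_Jose,_California.jpg"
)

image_value = URL_PREFIX + page_name   # what image=* would hold
commons_value = page_name              # what wikimedia_commons=* would hold

print(len(URL_PREFIX))                        # 35
print(len(image_value) - len(commons_value))  # characters saved by dropping the prefix
print(MAX_VALUE_LENGTH - len(image_value))    # headroom left in image=*
```

The saving is constant, so it only matters for very long file names, but those do exist on Commons.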

Moreover, the correct URL to put in image isn’t obvious for Wikimedia Commons–hosted files. For example, this traffic sign node links to the following page on Commons via wikimedia_commons=*:

If I were to convert this tag to image=*, which URL format would a data consumer expect me to use?

  • https://commons.wikimedia.org/wiki/File:Lane_use_diagram_sign_at_Interstate_280_and_Almaden_Plaza_Way,_San_Jose,_California.jpg
  • https://upload.wikimedia.org/wikipedia/commons/d/d2/Lane_use_diagram_sign_at_Interstate_280_and_Almaden_Plaza_Way%2C_San_Jose%2C_California.jpg
  • https://commons.wikimedia.org/wiki/Special:Redirect/file/Lane_use_diagram_sign_at_Interstate_280_and_Almaden_Plaza_Way,_San_Jose,_California.jpg
Format | Problem | Prevalence
https://commons.wikimedia.org/wiki/File:… | Points to an HTML page, not an image per se. | 74,748
https://upload.wikimedia.org/wikipedia/commons/… | Not a permalink: if someone uploads a new version of the image, for example to touch it up, then they’ll break this URL. (Old image revisions are moved to an archive/ directory.) Hotlinking this file violates its license. | 23,428
https://commons.wikimedia.org/wiki/Special:Redirect/file/… | No one knows about Special:Redirect. Hotlinking this file violates its license. | 0

Sure, a data consumer could sniff out one of these URL formats and convert it to the desired format – either a link to the image description page, which contains the legally required attribution and license, or an API call that fetches the attribution along with the raw image URL. But parsing URLs is error-prone, and our general tendency is to prefer structured tags over freeform ones.
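For illustration, this is roughly what that URL sniffing could look like in Python. It is only a sketch: the function name is made up, it covers just the three formats listed above, and it assumes that underscores and spaces are interchangeable in MediaWiki titles. Anything it doesn't recognize (thumbnails, archive/ paths, index.php URLs, other hosts) comes back as None.

```python
from urllib.parse import unquote, urlsplit


def commons_file_title(url):
    """Return 'File:<name>' for the three URL formats above, else None.

    Hypothetical helper for illustration only; real-world image=* values
    contain many more variants than this handles.
    """
    parts = urlsplit(url)
    path = unquote(parts.path)  # turns %2C back into a comma, etc.

    if parts.netloc == "commons.wikimedia.org":
        for prefix in ("/wiki/Special:Redirect/file/", "/wiki/File:"):
            if path.startswith(prefix):
                name = path[len(prefix):]
                return "File:" + name.replace("_", " ") if name else None
    elif parts.netloc == "upload.wikimedia.org":
        # Expected shape: /wikipedia/commons/<x>/<xy>/<file name>
        segments = path.split("/")
        if len(segments) == 6 and segments[1:3] == ["wikipedia", "commons"]:
            return "File:" + segments[5].replace("_", " ")
    return None


name = "Lane_use_diagram_sign_at_Interstate_280_and_Almaden_Plaza_Way"
urls = [
    f"https://commons.wikimedia.org/wiki/File:{name},_San_Jose,_California.jpg",
    f"https://upload.wikimedia.org/wikipedia/commons/d/d2/{name}%2C_San_Jose%2C_California.jpg",
    f"https://commons.wikimedia.org/wiki/Special:Redirect/file/{name},_San_Jose,_California.jpg",
]
for url in urls:
    print(commons_file_title(url))
```

All three URLs should print the same title, which is the point: three spellings, two escaping styles, one file.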

What you say is right, but those advantages are a consequence of using Wikimedia Commons as the host, not of using the wikimedia_commons tag.

If you put a Wikimedia Commons link in the image tag, you get the same advantages.

Wikimedia Commons discourages hotlinking, so the widely accepted best practice is to link to the description page (the one you call an “HTML page”).

Data consumers such as OsmAnd can typically handle this page. This process is no more complex than handling the wikimedia_commons tag, which doesn’t provide a direct link to the image either. In fact, the information conveyed is exactly the same.

If so, this key has been misused tens of thousands of times. That should be cleaned up before consolidating another key into it.

Are you referring to OsmAnd’s “Images nearby” panel? That isn’t coming from image tags on features in OSM. For example, this spaceship hangar is tagged with image=http://commons.wikimedia.org/wiki/File:Last_Look_at_Hangar_One.jpg to indicate what it looks like today (note insecure URL):

But OsmAnd shows this historical image instead, based on the linked Wikidata item:

This robotic company’s office has no wikidata tag, but it does have image = https://www.inorbit.ai/hubfs/Jackal%20on%20the%20street%2001.jpg:


Yet no image appears on OsmAnd:

In order for an application to present an image in the UI using OSM tags, it would need to isolate the file name or manipulate the URL to use Special:Redirect to get the raw image and quite possibly the attribution string. By contrast, the wikimedia_commons key contains a page name that can be used verbatim with the Wikimedia Commons API. If we don’t see this as an advantage, then I don’t really see the advantage of image over description or url.
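As a rough sketch of the "used verbatim" case, here is how a wikimedia_commons value could be dropped into a MediaWiki Action API query in Python. It only builds the request URL and makes no network call. The function name and the default thumbnail width are assumptions; prop=imageinfo with iiprop=url|extmetadata and iiurlwidth is one way to ask for the image URL, a thumbnail, and the license and attribution metadata. It only makes sense for File: values, not Category: ones.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://commons.wikimedia.org/w/api.php"


def imageinfo_query_url(wikimedia_commons_value, thumb_width=320):
    """Build an imageinfo query for a 'File:...' page name (hypothetical helper)."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "imageinfo",
        "titles": wikimedia_commons_value,  # the tag value, unmodified
        "iiprop": "url|extmetadata",        # raw URL plus license/attribution fields
        "iiurlwidth": thumb_width,          # also ask for a thumbnail of this width
    }
    return API_ENDPOINT + "?" + urlencode(params)


print(imageinfo_query_url("File:Last Look at Hangar One.jpg"))
```

The only string handling is the percent-encoding that any HTTP client does anyway; there is no pattern matching on the tag value.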

Additionally, editors such as iD and Go Map!! have fancy preset fields that let you pick a file to add to wikimedia_commons. This makes the process less error-prone for mappers too. image has no such convenience because its format is more freeform.

My bad, I had an outdated download. The image does appear now:

But here’s another example, a building last edited a decade ago to add image=http://commons.wikimedia.org/wiki/File:Bank_of_Italy_%28Livermore,_CA%29.JPG (note insecure URL):

So OsmAnd will fetch direct image URLs but will not resolve URLs to image description pages.

I have to correct you on this. OsmAnd may do it only in some cases, such as for certain objects, but it’s definitely able to get images from the description page.

See for example this guidepost: https://www.openstreetmap.org/node/10969256248

It has this image tag: image=https://commons.wikimedia.org/wiki/File:20230608-serina_cornalba_giro_redo-163.jpg

This is what I see in OsmAnd when I click on it (note that you have to click exactly on the guidepost to select it, and not nearby)

I stand corrected. Based on code inspection, OsmAnd is very optimistically looking for a Commons file name in any URL that very loosely matches the pattern it’s looking for. It assumes, for instance, that every image on the OSM Wiki is coming from Commons, which could lead to some weirdness whenever the OSM Wiki happens to have a different image with the same name. On the other hand, it fails to extract the Commons image name when the URL mentions MediaWiki’s index.php endpoint.

Meanwhile, 25,034 features simply set image to a Wikimedia Commons page name instead of a full URL. One mapper was confused enough about this usage that they went around deleting the tags, possibly thinking they were local file URLs.

Even if OsmAnd is capable of munging inconsistent data like this, I think this fragility rather hurts the argument for consolidating wikimedia_commons into image. If anything, we should consider consolidating image into url since almost a third of its occurrences point to HTML webpages anyways.

To be honest, I’m a bit puzzled about what exactly needs fixing here. Currently, we can effectively utilize the image tag, and it’s being correctly recognized by major data users like Waymarkedtrails and OsmAnd. This represents the best outcome we can hope for.

The proposals I’ve come across seem to suggest that these images might disappear, which would be a disservice to mappers who have taken the time to contribute these images (myself included) and to the users who benefit from having these images displayed.

Recognizing my direct involvement, I acknowledge that my perspective might not be entirely objective, but this change appears to have a strongly negative impact in my view.

A potential compromise solution could involve acknowledging the current state of affairs and allowing each mapper to decide whether to include an image based on their specific reasons for doing so.

OsmAnd also supports wikimedia_commons, albeit inconsistently, and it even knows how to fetch the image from the image (P18) statement of the Wikidata item mentioned in wikidata. That is the best outcome based on Postel’s law.

The Waymarked Trails developer balked at supporting wikimedia_commons out of fear of a slippery slope. Indeed, both flickr and mapillary have seen significant uptake by mappers. These other keys have different considerations, but to me a dedicated key containing more structured information is always preferable to making data consumers munge URLs into a more usable format, just as a “proliferation” of structured tags is preferable to freeform tagging in description.

Another option would be to tolerate some redundancy between image and wikimedia_commons, just as we tolerate some redundancy between wikipedia and wikidata.

We agree on the principles. What we don’t agree on is whether adherence to those principles can come at the expense of image visualization.

We have to accept that we are not in control of what data users are doing, and to some degree this limits the changes we can make without breaking their functionality. If this requires us to tolerate minimal inconsistencies, then so be it.

Unfortunately, having both image and wikimedia_commons pointing to the same image results in OsmAnd displaying duplicate images.

Anyway, as the main problem is the visualization of guideposts in Waymarkedtrails, another possible solution is to limit the use of image keys pointing to Wikimedia Commons files to information=guidepost nodes. All the other image keys can be converted to wikimedia_commons.

This really limits the inconsistency to the minimum necessary.

What do you and others think?

I agree that backwards compatibility is important and have often made that argument in the context of routing. However, backwards compatibility isn’t absolute: as a community, we also reserve the right to evolve tagging schemes when necessary. The good news is that Waymarked Trails is more or less a single application, rather than a reusable library that has been deployed in countless applications as routing engines have. Unlike some routing engines, waymarkedtrails-backend is actively maintained, which means an agreed-upon migration period could be effective.

Taking a step back, not too many data consumers display images linked from OSM data yet. Waymarked Trails is a mature codebase, so naturally there’s a reluctance to fix what ain’t broke. But as a mapper, I care about not only supporting existing data consumers but also promoting new interesting data consumers.

As a developer making a new application, I’m probably going to care that:

  • Loading arbitrary images from the Web introduces a vector for security vulnerabilities. It’s my responsibility to protect users from malicious mappers, even as I trust the OSM community as a whole.
  • Displaying images without attribution, without even honoring an “All rights reserved”, subjects me to legal liability.
  • Loading arbitrarily sized images eats up my users’ bandwidth for no good reason.

OsmAnd and Waymarked Trails may have no such qualms, but that’s their choice. Based on these observations, I may choose to only display images from a trusted source, such as Wikimedia Commons, which has:

  • Technical and social controls to mitigate malicious files to some extent
  • A strong privacy policy and track record of protecting reader privacy
  • An API to obtain the license, attribution, and thumbnail for any image hosted there
  • A terms of service that declares a hyperlink to be sufficient attribution with respect to Creative Commons–licensed files

In order to limit my application to Commons files, displaying the thumbnail along with attribution, my options are:

  1. Insert the wikimedia_commons value into an API call.
  2. Match multiple URL patterns with different character escaping rules to extract the file name, then insert it into that API call.

Neither approach is particularly bothersome, but option (2) is hackier, which some programmers care about. And we’ve already seen that no one gets it quite correct, especially if they try to account for Commons images on other MediaWiki sites, as OsmAnd does. If only option (2) is available, we’re less likely to end up with those new, interesting, correct implementations.

The OSM-Wikidata Map Framework, which powers sites like Open Etymology Map, supports only wikimedia_commons but not image. Maybe this is because of the downsides of image that I mentioned earlier, or maybe it’s because of the site’s editorial focus on Wikimedia projects. It can’t hurt that wikimedia_commons has formal editor support, and that wikimedia_commons is 2½ times more common than Commons URLs in image. Anyways, it seems unfair to ask one set of developers to implement hacky, less straightforward code to keep another set of developers from having to literally concatenate two strings together.

Let’s not elevate a simple bug to an immutable constraint on OSM. OsmAnd already avoids duplicating the image if OSM and Wikidata link a given feature to the same image, so a fix for this bug doesn’t seem far-fetched – just open an issue in their issue tracker about it. Even if the OsmAnd developers object to deduplicating the images for some reason, this duplication doesn’t seem like a big problem in the grand scheme of things.
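To show how small such a fix could be, here is a sketch of deduplication by normalized title in Python. This is not OsmAnd's code. The function names are hypothetical, and it assumes each source (image, wikimedia_commons, Wikidata) has already been reduced to a Commons "File:..." title.

```python
def normalize_title(title):
    """Treat underscores and spaces as the same character, as MediaWiki does."""
    return title.replace("_", " ").strip()


def distinct_titles(titles):
    """Drop repeated Commons titles while keeping the original order."""
    seen = set()
    distinct = []
    for title in titles:
        key = normalize_title(title)
        if key not in seen:
            seen.add(key)
            distinct.append(key)
    return distinct


# The same file reached via image=* and via wikimedia_commons=*.
candidates = [
    "File:Bank_of_Italy_(Livermore,_CA).JPG",
    "File:Bank of Italy (Livermore, CA).JPG",
]
print(distinct_titles(candidates))
```

A real implementation would probably also want to resolve redirects on the Commons side, which this sketch ignores.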

This is a practical compromise, but I think most mappers would see it as a blatant example of tagging for the renderer. Even though neither key is inaccurate, there’s a limit to how far we should bend over backward for a particular data consumer.

I would see this as a pragmatic first step to demonstrate to the developer that the community strongly prefers a specific tagging scheme. (I would not codify this as a rule, just filter out such cases in the initial cleanup.)

I agree with the idea of not codifying this as a strict rule.

Combining the proposals, our plan could be to change all the image keys that reference Wikimedia Commons images to use the wikimedia_commons key. However, for the information=guidepost nodes, we will also retain the existing image key, essentially introducing a form of duplication.

This is done to prevent any disruptions to the current functionality of Waymarkedtrails while also allowing other data users to rely solely on the wikimedia_commons key if they choose to do so.
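To make the plan concrete, here is a minimal Python sketch of the per-feature rule. It assumes the Commons title has already been extracted from the image value by some other step and is passed in as commons_title (None when image doesn't point to Commons). The function name and the choice to skip features with a conflicting wikimedia_commons value are assumptions, not an agreed procedure.

```python
def plan_migration(tags, commons_title):
    """Return the tags a feature would have after the proposed cleanup.

    tags: dict of the feature's current OSM tags.
    commons_title: 'File:...' extracted from image=*, or None if the image
    value is not a Wikimedia Commons file.
    """
    new_tags = dict(tags)
    if commons_title is None:
        return new_tags  # not a Commons image: leave image=* alone

    existing = new_tags.get("wikimedia_commons")
    if existing is not None and existing != commons_title:
        return new_tags  # conflicting values: leave for manual review

    new_tags["wikimedia_commons"] = commons_title
    if new_tags.get("information") != "guidepost":
        new_tags.pop("image", None)  # guideposts keep image=* for Waymarkedtrails
    return new_tags


guidepost = {
    "tourism": "information",
    "information": "guidepost",
    "image": "https://commons.wikimedia.org/wiki/File:20230608-serina_cornalba_giro_redo-163.jpg",
}
print(plan_migration(guidepost, "File:20230608-serina cornalba giro redo-163.jpg"))
```

Guideposts end up with both keys, everything else ends up with wikimedia_commons only, and non-Commons images are untouched.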

I will create a new support request to notify Waymarkedtrails about the current situation, with the hope that this approach will be sufficient to persuade them to make the necessary changes.

Can we all agree on this plan?
