Restrict wikimedia_commons URLs as image=* tag values?

Back to the original discussion:

I think “recommended” is a bit too soft; what about “should” or “strongly recommended”?

My planet file from a few days ago shows 276,332 objects with image=* (closely matching taginfo). In it I can find only 28,830 entries with ‘image=File:’ and … 19,423 objects with ‘/commons/’ in the URL.

Searching the “image” key in taginfo for “File” gives 109k results, and for “File:” 88k. I’m not sure, but I believe it searches for the substring anywhere in the value, not just at the beginning.

The wikimedia_commons tag has 83k values in total, of which 28k contain the string “category:”, so there are about 55k images.
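
The subtraction above assumes every wikimedia_commons value is either a category or a single file. A minimal Python sketch of that split, assuming values look like `Category:…` or `File:…` (the namespace prefix is matched case-insensitively, a simplification); anything else, such as full URLs or gallery page names, is counted as "other":

```python
from collections import Counter
from typing import Iterable


def commons_namespace(value: str) -> str:
    """Classify a wikimedia_commons value by its namespace prefix."""
    head, sep, _ = value.strip().partition(":")
    if not sep:
        return "other"  # no namespace at all, e.g. a gallery page name
    ns = head.strip().lower()
    if ns == "category":
        return "category"
    if ns in ("file", "image"):  # "Image:" is MediaWiki's legacy alias for "File:"
        return "file"
    return "other"  # includes full URLs, whose "prefix" is "https"


def tally(values: Iterable[str]) -> Counter:
    """Count how many values fall into each namespace bucket."""
    return Counter(commons_namespace(v) for v in values)
```

Feeding it every wikimedia_commons value (for example from a taginfo export) would give the category/file split directly instead of by subtraction.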

I can reproduce those numbers by searching the “image” key in taginfo for “File”/“File:”, but just looking at the first entry, I am pretty sure they are bogus:

I got my number by using osmium tags-filter to extract an .opl file with all image=* objects, then grepping that file for ‘image=File:’ and counting the matching lines with wc.

I can confirm your numbers:

    osmium cat 220902-planet.osm.pbf -f opl | grep "image=File:" | wc -l
    28913
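
The grep pipeline counts lines that contain the substring image=File: anywhere, so a tag whose key merely ends in image (say, a hypothetical xyz_image=File:…) would be counted too. A minimal Python sketch that parses the tags field of each OPL line and looks only at the image key, assuming the OPL layout osmium writes (space-separated fields, the tag list in the field starting with an upper-case `T`, special characters escaped as `%hex%`). The two substring criteria simply mirror the ones used earlier in this thread:

```python
import re
from typing import Dict, Iterable

# OPL escapes special characters as %<hex code point>%, e.g. %20% for a space.
_ESCAPE = re.compile(r"%([0-9a-fA-F]+)%")


def _unescape(text: str) -> str:
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


def opl_tags(line: str) -> Dict[str, str]:
    """Return the tags of one OPL object line as a dict."""
    for field in line.rstrip("\n").split(" "):
        if field.startswith("T"):  # tags field; lower-case "t" is the timestamp
            tags: Dict[str, str] = {}
            body = field[1:]
            if not body:
                return tags  # object without tags
            for pair in body.split(","):
                key, _, value = pair.partition("=")
                tags[_unescape(key)] = _unescape(value)
            return tags
    return {}


def count_image_values(lines: Iterable[str]) -> Dict[str, int]:
    """Count objects by the exact image=* value, not by substring on the line."""
    counts = {"image": 0, "file_prefix": 0, "commons_substring": 0}
    for line in lines:
        value = opl_tags(line).get("image")
        if value is None:
            continue
        counts["image"] += 1
        if value.startswith("File:"):
            counts["file_prefix"] += 1
        if "/commons/" in value:
            counts["commons_substring"] += 1
    return counts
```

It can be run over the extracted file by passing an open file handle (any .opl path) to count_image_values; expect it to be much slower than grep on a planet-sized extract.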

It seems your argument is basically “tagging for the renderer”.
Having the link in a separate tag makes it a lot easier for all data consumers.
You can create a Greasemonkey script for sites like Waymarked Trails if you are not happy with their interface ignoring the tag you are using.

Not all of them: it makes things harder for those already using image, and only image.

(I would not treat it as a blocker, but pretending that it does not exist is not helpful)

Here is a suggestion for a possibly more long-term and well-structured solution for guideposts: Wikidata items for all guideposts?

I’m not sure duplication of information is a bad thing. Redundancy can help in many cases: if Commons has a temporary downtime, an alternate link could provide the same information so the user can still see the image. But I agree that having both image and wikimedia_commons link to the same file is quite useless. I’m not sure what my favourite choice is in this case, though. I think that using wikimedia_commons only for categories or gallery pages could be good (although a linked Wikidata item could suffice and provide images from the linked category), but I’d use it for files hosted on Commons too.

For a start, many OSM items have neither.

Also, a few times I have used a specific image to match the exact state (after a rebuild) or perspective.

I know, but that wasn’t the point of my message.

I’m thinking the same.

In fact, the wikimedia_commons links can represent various types of content, including categories, images, videos, and audio. I see it as something more versatile and generic. On the other hand, the image link is more specific; it explicitly indicates that you’ll receive a single image and nothing else.
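
To make the point concrete, here is a minimal Python sketch of how a visualizer might guess what a wikimedia_commons value points to before deciding whether to display it. The extension lists are an assumption and deliberately incomplete; .ogg is ambiguous on Commons (it can hold audio or video), so it is reported as such:

```python
from pathlib import PurePosixPath

# Assumed, deliberately incomplete extension lists; Commons accepts other formats too.
IMAGE_EXT = {".jpg", ".jpeg", ".png", ".gif", ".svg", ".tif", ".tiff", ".webp"}
VIDEO_EXT = {".webm", ".ogv", ".mpg", ".mpeg"}
AUDIO_EXT = {".mp3", ".wav", ".flac", ".oga", ".opus"}
DOCUMENT_EXT = {".pdf", ".djvu"}


def media_kind(value: str) -> str:
    """Guess what kind of content a wikimedia_commons value points to."""
    namespace, sep, name = value.strip().partition(":")
    if not sep:
        return "unknown"
    namespace = namespace.strip().lower()
    if namespace == "category":
        return "category"
    if namespace != "file":
        return "unknown"
    ext = PurePosixPath(name).suffix.lower()
    if ext in IMAGE_EXT:
        return "image"
    if ext in VIDEO_EXT:
        return "video"
    if ext in AUDIO_EXT:
        return "audio"
    if ext in DOCUMENT_EXT:
        return "document"
    if ext == ".ogg":
        return "audio_or_video"  # Ogg files on Commons may contain either
    return "unknown"
```

An image=* value, by contrast, needs no such guessing about the intended kind of content, although the URL itself may still need checking.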

Please read my whole message and don’t extract only what agrees with your position. I also stated that I’d use the wikimedia_commons tag for files too (to be honest, it doesn’t really matter what type of file a user uploads: be it a video or a picture, both could be useful for the end user; only a madman would link an audio or a PDF file). Moreover, who can say what a link pointing to a file hosted on another platform contains? It could lead to malware or spyware, whereas Wikimedia Commons files are safer.

I suppose you misunderstood my intention. I quoted the part I agree with to emphasize it.

The image tag effectively conveys that the content is an image, which is crucial information for visualizers.

Let’s consider the common scenario where a visualizer wishes to display an image that accurately represents an object. In this context, the wikimedia_commons tag may be too generic, as it can encompass a wide range of content. For instance, if it’s a collection, it doesn’t specify which image serves as the most representative. In contrast, the image tag provides precisely what a visualizer needs.

Regarding security, the choice of security checks to be performed is at the discretion of the visualizer. They can still filter by URL if it aligns with their specific use case. It’s essential to keep in mind that just because a file is hosted on Wikimedia Commons doesn’t necessarily mean it’s suitable or safe for display within the OpenStreetMap context. Many types of images, such as medical images, could raise concerns.
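
As a sketch of what such a consumer-side URL filter could look like, here is a minimal Python check that accepts only HTTPS URLs whose host is on an allowlist. The allowlist below is a hypothetical choice for illustration. Host filtering limits where content is fetched from; it does not address NSFW content:

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; each data consumer would choose its own.
ALLOWED_HOSTS = {"commons.wikimedia.org", "upload.wikimedia.org"}


def is_allowed_image_url(url: str, allowed=ALLOWED_HOSTS) -> bool:
    """Accept only https:// URLs whose exact host is on the allowlist."""
    parts = urlsplit(url.strip())
    if parts.scheme != "https":
        return False
    # .hostname is lower-cased and strips any user info and port
    return (parts.hostname or "") in allowed
```

Comparing the exact host (rather than searching for "wikimedia.org" in the string) avoids accepting look-alike domains such as commons.wikimedia.org.example.com.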

That’s why displaying the whole category (OsmAnd does this, for example) could be more useful than choosing only one picture.
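
For the category case, the files in a Commons category can be listed through the MediaWiki Action API (`list=categorymembers` with `cmtype=file`). A minimal Python sketch that only builds the request URL; this is not a description of how OsmAnd actually does it, and the default limit of 20 is an arbitrary choice:

```python
from urllib.parse import urlencode

API = "https://commons.wikimedia.org/w/api.php"


def category_files_url(category: str, limit: int = 20) -> str:
    """Build an API URL listing up to `limit` files in a Commons category."""
    if not category.lower().startswith("category:"):
        category = "Category:" + category
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmtype": "file",  # only files, not subcategories or pages
        "cmlimit": str(limit),
        "format": "json",
    }
    return API + "?" + urlencode(params)
```

The JSON response lists file titles, which a viewer could then page through; choosing which one is "most representative" is still left to the consumer.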

I don’t think this applies to OSM, so why use it as an argument? Moreover, most Commons images have a name that makes clear what they represent, which could make vandalism easier to spot. What can you tell about the image behind a URL such as imgur.io/ag5ja1 (a random string)?

It depends on the specific context, and one approach doesn’t exclude the other. You can have wikimedia_commons pointing to a collection of images and, at the same time, image pointing to the image most representative of the object. Then the visualizer can choose what it needs.
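
A minimal Python sketch of that consumer-side choice, assuming a hypothetical policy: use image=* when present, otherwise fall back to a single wikimedia_commons=File:… value, and treat a Category: value as a gallery rather than a single representative picture:

```python
from typing import Dict, Optional, Tuple
from urllib.parse import quote

COMMONS_WIKI = "https://commons.wikimedia.org/wiki/"


def _commons_page(title: str) -> str:
    # Commons page URLs use underscores for spaces; keep ":" "," "()" readable.
    return COMMONS_WIKI + quote(title.replace(" ", "_"), safe=":,()")


def pick_display(tags: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Return (kind, target) for what to show, or None if nothing applies."""
    image = tags.get("image", "").strip()
    if image:
        return ("single", image)  # the mapper's explicit choice wins
    commons = tags.get("wikimedia_commons", "").strip()
    ns, sep, _ = commons.partition(":")
    if sep and ns.lower() == "file":
        return ("single", _commons_page(commons))
    if sep and ns.lower() == "category":
        return ("gallery", _commons_page(commons))
    return None
```

A consumer preferring galleries could simply swap the order of the checks; the tags themselves do not force either policy.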

I mentioned it to emphasize that an image being on Wikimedia Commons doesn’t guarantee safety, so using safety as an argument in favor of wikimedia_commons over image is weak. Neither of them guarantees safety. It’s the tool or visualizer making use of such links that should take care of this aspect.

While it’s true that Wikimedia Commons contains some NSFW content, it is a much more predictable and trustworthy source of images than the Internet writ large. At least there is some degree of moderation to prevent vandalism from persisting too long. Some mappers even trust Commons enough that they expect renderers to display any arbitrary Commons image that appears in a wiki:symbol tag – directly on the map:

When fetching arbitrary content from the Internet, a data consumer doesn’t only have to worry about NSFW content and vandalism. The content can go away at any time, because the host has no way of knowing that it’s being used on OSM and has made no commitment to keeping it online. Commons has a system to track which files are used on OSM so administrators are aware of this usage; they can avoid arbitrarily deleting the image or they can update OSM if they have to delete the image for copyright reasons.

Worse, an arbitrary domain could get squatted and start serving up malware or otherwise violate your privacy. It’s a really bad idea to hotlink this content and load it automatically without the user’s consent.

This is not a theoretical concern. OpenHistoricalMap has customized the OSM frontend to embed the contents of image verbatim in the sidebar when you visit an element’s page. Unfortunately, many of the images tagged in OHM are broken links. The vast majority of the images that still work are from Commons:

I digressed from the main argument (you wouldn’t have a string of random characters as a Commons image name, so this is a non-problem for the point under discussion), so I’ll bring it back on track.

Here is why I think using a Commons tag is better than using image: string length. OSM tag values are limited to 255 characters. The obligatory prefix https://commons.wikimedia.org/wiki/ takes 35 of them, which is not a lot, but it could be saved by using the wikimedia_commons tag.
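
The arithmetic is easy to check. A minimal Python sketch, assuming the 255-character limit applies to the whole value and is counted in Unicode characters (Python's len counts code points), comparing the budget left for the file title in each form:

```python
MAX_VALUE_LEN = 255  # OSM limit on a tag value, counted in Unicode characters
COMMONS_WIKI_PREFIX = "https://commons.wikimedia.org/wiki/"


def value_for(title: str, as_url: bool) -> str:
    """Build the tag value: a bare title for wikimedia_commons, a full URL for image."""
    return COMMONS_WIKI_PREFIX + title if as_url else title


def fits(title: str, as_url: bool) -> bool:
    """True if the resulting value stays within the OSM length limit."""
    return len(value_for(title, as_url)) <= MAX_VALUE_LEN
```

With the prefix, 220 characters remain for "File:…"; without it, the full 255. Percent-encoded URLs shrink the budget further, since each encoded character (such as %2C for a comma) takes three characters.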

Moreover, the correct URL to put in image isn’t obvious for Wikimedia Commons–hosted files. For example, this traffic sign node links to the following page on Commons via wikimedia_commons=*:

If I were to convert this tag to image=*, which URL format would a data consumer expect me to use?

  • https://commons.wikimedia.org/wiki/File:Lane_use_diagram_sign_at_Interstate_280_and_Almaden_Plaza_Way,_San_Jose,_California.jpg
  • https://upload.wikimedia.org/wikipedia/commons/d/d2/Lane_use_diagram_sign_at_Interstate_280_and_Almaden_Plaza_Way%2C_San_Jose%2C_California.jpg
  • https://commons.wikimedia.org/wiki/Special:Redirect/file/Lane_use_diagram_sign_at_Interstate_280_and_Almaden_Plaza_Way,_San_Jose,_California.jpg
| Format | Problem | Prevalence |
| --- | --- | --- |
| https://commons.wikimedia.org/wiki/File:… | Points to an HTML page, not an image per se. | 74,748 |
| https://upload.wikimedia.org/wikipedia/commons/… | Not a permalink: if someone uploads a new version of the image, for example to touch it up, then they’ll break this URL. (Old image revisions are moved to an archive/ directory.) Hotlinking this file violates its license. | 23,428 |
| https://commons.wikimedia.org/wiki/Special:Redirect/file/… | No one knows about Special:Redirect. Hotlinking this file violates its license. | 0 |

Sure, a data consumer could sniff out one of these URL formats and convert it to the desired format – either a link to the image description page, which contains the legally required attribution and license, or an API call that fetches the attribution along with the raw image URL. But parsing URLs is error-prone, and our general tendency is to prefer structured tags over freeform ones.
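
A minimal Python sketch of that sniffing, covering only the three formats listed above (thumbnail, archive and other URL shapes return None) and assuming the file name is the last path segment. It normalizes to a File: title, from which the description page URL and a MediaWiki Action API query for the raw file URL plus metadata (`prop=imageinfo` with `iiprop=url|extmetadata`) can be built. Its special cases illustrate how fragile this kind of parsing is:

```python
from typing import Optional
from urllib.parse import quote, unquote, urlencode, urlsplit

WIKI = "https://commons.wikimedia.org/wiki/"
API = "https://commons.wikimedia.org/w/api.php"


def commons_title(url: str) -> Optional[str]:
    """Map a Commons file URL to a 'File:Name' title, or None if unrecognized."""
    parts = urlsplit(url.strip())
    path = unquote(parts.path)  # e.g. %2C -> ","
    if parts.hostname == "commons.wikimedia.org":
        if path.startswith("/wiki/File:"):
            name = path[len("/wiki/File:"):]
        elif path.startswith("/wiki/Special:Redirect/file/"):
            name = path[len("/wiki/Special:Redirect/file/"):]
        else:
            return None
    elif parts.hostname == "upload.wikimedia.org" and path.startswith("/wikipedia/commons/"):
        segments = path.split("/")
        # Expect /wikipedia/commons/<x>/<xy>/<name>; thumb/ and archive/ paths differ.
        if len(segments) != 6:
            return None
        name = segments[-1]
    else:
        return None
    return "File:" + name.replace("_", " ")


def description_url(title: str) -> str:
    """Description page, which carries the attribution and license."""
    return WIKI + quote(title.replace(" ", "_"), safe=":,()")


def imageinfo_api_url(title: str) -> str:
    """API query returning the raw file URL plus metadata such as license."""
    params = {
        "action": "query",
        "titles": title,
        "prop": "imageinfo",
        "iiprop": "url|extmetadata",
        "format": "json",
    }
    return API + "?" + urlencode(params)
```

All three example URLs for the traffic sign map to the same title, so a consumer could store or compare that title instead of the raw URL.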

What you say is right, but those advantages come from using Wikimedia Commons as the host, not from using the wikimedia_commons tag.

If you put a Wikimedia Commons link in the image tag, you get the same advantages.

Wikimedia Commons discourages hotlinking, so the widely accepted best practice is to link to the description page (the one you call the “HTML page”).

Data consumers such as OsmAnd can typically handle this page. Doing so is no more complex than handling the wikimedia_commons tag, and it doesn’t provide a direct link to the image. In fact, the information conveyed is exactly the same.