Restrict wikimedia_commons URLs as image=* tag values?

I stand corrected. Based on code inspection, OsmAnd is very optimistically looking for a Commons file name in any URL that very loosely matches the pattern it’s looking for. It assumes, for instance, that every image on the OSM Wiki is coming from Commons, which could lead to some weirdness whenever the OSM Wiki happens to have a different image with the same name. On the other hand, it fails to extract the Commons image name when the URL mentions MediaWiki’s index.php endpoint.

Meanwhile, 25,034 features simply set image to a Wikimedia Commons page name instead of a full URL. One mapper was confused enough about this usage that they went around deleting the tags, possibly thinking they were local file URLs.

Even if OsmAnd is capable of munging inconsistent data like this, I think this fragility rather hurts the argument for consolidating wikimedia_commons into image. If anything, we should consider consolidating image into url since almost a third of its occurrences point to HTML webpages anyways.


To be honest, I’m a bit puzzled about what exactly needs fixing here. Currently, we can effectively utilize the image tag, and it’s being correctly recognized by major data users like Waymarkedtrails and OsmAnd. This represents the best outcome we can hope for.

The proposals I’ve come across seem to suggest that these images might disappear, which would be a disservice to mappers who have taken the time to contribute these images (myself included) and to the users who benefit from having these images displayed.

Recognizing my direct involvement, I acknowledge that my perspective might not be entirely objective, but this change appears to have a strongly negative impact in my view.

A potential compromise solution could involve acknowledging the current state of affairs and allowing each mapper to decide whether to include an image based on their specific reasons for doing so.

OsmAnd also supports wikimedia_commons, albeit inconsistently, and it even knows how to fetch the image from the image (P18) statement of the Wikidata item mentioned in wikidata. That is the best outcome based on Postel’s law.

The Waymarked Trails developer balked at supporting wikimedia_commons out of fear of a slippery slope. Indeed, both flickr and mapillary have seen significant uptake by mappers. These other keys have different considerations, but to me a dedicated key containing more structured information is always preferable to making data consumers munge URLs into a more usable format, just as a “proliferation” of structured tags is preferable to freeform tagging in description.

Another option would be to tolerate some redundancy between image and wikimedia_commons, just as we tolerate some redundancy between wikipedia and wikidata.


We agree on the principles. What we don’t agree on is whether adherence to such principles should come at the expense of displaying the images.

We have to accept that we are not in control of what data users are doing, and to some degree this limits the changes we can make without breaking their functionality. If this requires us to tolerate minimal inconsistencies, then so be it.

Unfortunately, having both image and wikimedia_commons pointing to the same image results in OsmAnd displaying duplicate images.

Anyway, as the main problem is the visualization of guideposts in Waymarkedtrails, another possible solution is to keep image keys that point to Wikimedia Commons files only on information=guidepost nodes. All the other such image keys can be converted to wikimedia_commons.

This really limits the inconsistency to the minimum necessary.

What do you and others think?

I agree that backwards compatibility is important and have often made that argument in the context of routing. However, backwards compatibility isn’t absolute: as a community, we also reserve the right to evolve tagging schemes when necessary. The good news is that Waymarked Trails is more or less a single application, rather than a reusable library that has been deployed in countless applications as routing engines have. Unlike some routing engines, waymarkedtrails-backend is actively maintained, which means an agreed-upon migration period could be effective.

Taking a step back, not too many data consumers display images linked from OSM data yet. Waymarked Trails is a mature codebase, so naturally there’s a reluctance to fix what ain’t broke. But as a mapper, I care about not only supporting existing data consumers but also promoting new interesting data consumers.

As a developer making a new application, I’m probably going to care that:

  • Loading arbitrary images from the Web introduces a vector for security vulnerabilities. It’s my responsibility to protect users from malicious mappers, even as I trust the OSM community as a whole.
  • Displaying images without attribution, without even honoring an “All rights reserved”, subjects me to legal liability.
  • Loading arbitrarily sized images eats up my users’ bandwidth for no good reason.

OsmAnd and Waymarked Trails may have no such qualms, but that’s their choice. Based on these observations, I may choose to only display images from a trusted source, such as Wikimedia Commons, which has:

  • Technical and social controls to mitigate malicious files to some extent
  • A strong privacy policy and track record of protecting reader privacy
  • An API to obtain the license, attribution, and thumbnail for any image hosted there
  • A terms of service that declares a hyperlink to be sufficient attribution with respect to Creative Commons–licensed files

In order to limit my application to Commons files, displaying the thumbnail along with attribution, my options are:

  1. Insert the wikimedia_commons value into an API call.
  2. Match multiple URL patterns with different character escaping rules to extract the file name, then insert it into that API call.

Neither approach is particularly bothersome, but option (2) is hackier, which some programmers care about. And we’ve already seen that no one gets it quite correct, especially if they try to account for Commons images on other MediaWiki sites, as OsmAnd does. If only option (2) is available, we’re less likely to end up with those new, interesting, correct implementations.
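
To make the difference between the two options concrete, here is a minimal Python sketch. The function names commons_api_url and title_from_url are made up for illustration, and so is the default thumbnail width. The query parameters (action=query, prop=imageinfo, iiprop, iiurlwidth) are standard MediaWiki API parameters, but the URL patterns handled in option (2) are only a sample and certainly not exhaustive, which is rather the point.

```python
from urllib.parse import parse_qs, unquote, urlencode, urlsplit

COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def commons_api_url(title, thumb_width=320):
    """Option 1: build the API call from a wikimedia_commons value.

    `title` is expected to look like 'File:Example.jpg'.
    """
    params = {
        "action": "query",
        "format": "json",
        "prop": "imageinfo",
        "iiprop": "url|extmetadata",  # file/thumbnail URLs plus license and attribution
        "iiurlwidth": thumb_width,    # ask for a thumbnail of this width
        "titles": title,
    }
    return COMMONS_API + "?" + urlencode(params)


def title_from_url(url):
    """Option 2: try to recover a 'File:...' title from an image=* URL.

    Returns None when the URL does not match one of the handled patterns.
    Underscores are normalized to spaces (an assumption of this sketch;
    MediaWiki treats the two as equivalent in titles).
    """
    parts = urlsplit(url)
    host = parts.netloc.lower()

    if host == "commons.wikimedia.org":
        if parts.path.startswith("/wiki/"):
            # Path form: percent-encoded, '+' is a literal plus sign.
            title = unquote(parts.path[len("/wiki/"):])
        else:
            # index.php form: query-string encoding, decoded by parse_qs.
            title = parse_qs(parts.query).get("title", [""])[0]
        title = title.replace("_", " ")
        return title if title.startswith("File:") else None

    if host == "upload.wikimedia.org":
        # /wikipedia/commons/a/ab/Name.jpg
        # /wikipedia/commons/thumb/a/ab/Name.jpg/320px-Name.jpg
        segments = parts.path.split("/")
        if segments[1:3] != ["wikipedia", "commons"]:
            return None
        rest = segments[3:]
        if rest and rest[0] == "thumb":
            rest = rest[1:]
        if len(rest) < 3:
            return None
        return "File:" + unquote(rest[2]).replace("_", " ")

    return None
```

Option (1) is the three lines that assemble the query string. Everything else exists only to get back to the value that wikimedia_commons would have provided directly.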

The OSM-Wikidata Map Framework, which powers sites like Open Etymology Map, supports only wikimedia_commons but not image. Maybe this is because of the downsides of image that I mentioned earlier, or maybe it’s because of the site’s editorial focus on Wikimedia projects. It can’t hurt that wikimedia_commons has formal editor support, and that wikimedia_commons is 2½ times more common than Commons URLs in image. Anyways, it seems unfair to ask one set of developers to implement hacky, less straightforward code to keep another set of developers from having to literally concatenate two strings together.

Let’s not elevate a simple bug to an immutable constraint on OSM. OsmAnd already avoids duplicating the image if OSM and Wikidata link a given feature to the same image, so a fix for this bug doesn’t seem far-fetched – just open an issue in their issue tracker about it. Even if the OsmAnd developers object to deduplicating the images for some reason, this duplication doesn’t seem like a big problem in the grand scheme of things.

This is a practical compromise, but I think most mappers would see it as a blatant example of tagging for the renderer. Even though neither key is inaccurate, there’s a limit to how far we should bend over backward for a particular data consumer.


As a pragmatic first step, I would demonstrate to the developer that the community strongly prefers a specific tagging scheme. That would not involve codifying this as a rule, just filtering out such cases in the initial cleanup.


I agree with the idea of not codifying this as a strict rule.

Conflating the proposals, our plan could be to change all the image keys that reference Wikimedia Commons images to use the wikimedia_commons key. However, for the information=guidepost nodes, we will also retain the existing image key, essentially introducing a form of duplication.

This is done to prevent any disruptions to the current functionality of Waymarkedtrails while also allowing other data users to rely solely on the wikimedia_commons key if they choose to do so.
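
As a rough illustration of the per-feature rule this plan implies, here is a short Python sketch. The function name apply_plan and the commons_title parameter are hypothetical: commons_title stands for the "File:..." title that the feature's image value resolves to (or None if it does not point to Commons), and how that title is derived is deliberately left out. Leaving conflicting features untouched is also an assumption of the sketch, not part of the plan as stated.

```python
def apply_plan(tags, commons_title):
    """Return a new tag dict with the proposed conversion applied.

    tags: the feature's OSM tags as a plain dict.
    commons_title: 'File:...' title that the image value refers to,
        or None if image does not point to Wikimedia Commons.
    """
    if commons_title is None or "image" not in tags:
        return dict(tags)  # nothing to convert

    existing = tags.get("wikimedia_commons")
    if existing is not None and existing != commons_title:
        return dict(tags)  # conflicting values: leave for manual review

    result = dict(tags)
    result["wikimedia_commons"] = commons_title
    if tags.get("information") != "guidepost":
        # Guideposts keep image as well, so that consumers reading
        # only image continue to work.
        del result["image"]
    return result
```

For a guidepost the result carries both keys; for everything else image is dropped once wikimedia_commons holds the same file.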

I will create a new support request to notify Waymarkedtrails about the current situation, with the hope that this approach will be sufficient to persuade them to make the necessary changes.

Can we all agree on this plan?


I would change only the ones that reference existing files and leave the broken ones for review.

I already made a start with this a while ago with a few thousand cases of image~File:....
From this experience I can say it’s a really simple tag conversion. If you can write a script to convert the full URLs into page titles, you can technically do all of it in a single mass edit.

Make sure to skip the German-speaking countries and let them discuss the topic for another year or so. Then, when the project is already finished everywhere else on the planet, they can perform their own tag updates. This approach worked very well for the riverbank project.


Sure. I will make changes only in Italy.

I’ve already converted about 3000 image keys for images I created (excluding guideposts).

There are still about 500 in Italy, but unfortunately, some of them are unsolvable. There are cases where both the image and wikimedia_commons keys are already present and point to different URLs. Typically, the image points to a File, and wikimedia_commons points to a Category.
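
To sort such leftovers for manual review, something like the following Python sketch could help. The function name classify_pair and the three labels are invented for illustration, and it assumes the image value has already been reduced to a "File:..." title.

```python
def classify_pair(image_title, commons_value):
    """Describe how image and wikimedia_commons relate on one feature.

    image_title: 'File:...' title that the image value refers to.
    commons_value: the raw wikimedia_commons tag value.
    """
    def norm(title):
        # MediaWiki treats underscores and spaces in titles as equivalent.
        return title.replace("_", " ").strip()

    if norm(commons_value).startswith("Category:"):
        return "file_vs_category"  # both keys may be legitimate
    if norm(image_title) == norm(commons_value):
        return "same"              # safe to drop the duplicate image key
    return "different_files"       # needs a human decision
```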

For example: Node: Orimento (728285746) | OpenStreetMap

Besides guideposts, this is an additional inconsistency that we will have to accept.