amadvance:
To emphasize: an image being in `wikipedia_commons` doesn’t guarantee safety, so using safety as an argument in favor of `wikipedia_commons` over `image` is weak. Neither of them provides a safety guarantee. It’s the tool or visualizer making use of such links that should take care of this aspect.
While it’s true that Wikimedia Commons contains some NSFW content, it is a much more predictable and trustworthy source of images than the Internet writ large. At least there is some degree of moderation to prevent vandalism from persisting too long. Some mappers even trust Commons enough that they expect renderers to display any arbitrary Commons image that appears in a `wiki:symbol` tag directly on the map:
Hi,
a group of Outdoors contributors is preparing proposed additions to Waymarkedtrails so as to support symbols defined through the wiki:symbol tag. We are hitting security issues and would like to propose restricting the use of this tag, and at least leaving URLs out.
The wiki:symbol tag is aimed at indicating that the symbol used for a given route must be fetched in SVG format from the OSM wiki. That is very useful for symbols that can’t be approximated with the mini-language offered by os…
When fetching arbitrary content from the Internet, a data consumer doesn’t only have to worry about NSFW content and vandalism. The content can go away at any time, because the host has no way of knowing that it’s being used on OSM and has made no commitment to keeping it online. Commons has a system to track which files are used on OSM so administrators are aware of this usage; they can avoid arbitrarily deleting the image or they can update OSM if they have to delete the image for copyright reasons.
Worse, an arbitrary domain could get squatted and start serving up malware or otherwise violate your privacy. It’s a really bad idea to hotlink this content and load it automatically without the user’s consent.
This is not a theoretical concern. OpenHistoricalMap has customized the OSM frontend to embed the contents of `image` verbatim in the sidebar when you visit an element’s page. Unfortunately, many of the `image` tags in OHM point to broken links. The vast majority of the images that still work are from Commons:
Opened 16 Aug 2023, 09:04 PM UTC. Labels: compliance, inspector, security, images.
As noted in https://github.com/OpenHistoricalMap/issues/issues/581#issuecomment-1679783531, the inspector automatically embeds any URL in an `image:0`, `image:1`, `image:2`, etc. tag as an image. #583 would at least check whether it’s an image before displaying it, but otherwise we have no idea what the URL points to. The inspector should only embed the image if it comes from a domain on some project-wide whitelist.
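As a rough illustration of that behavior, the TypeScript sketch below gathers every `image:#` value from an element's tags, i.e. the URLs an inspector working as described would embed with no further checks. It is not the inspector's actual code: the `Tags` type and `collectImageUrls` function are hypothetical, and treating a bare `image` key like the numbered ones is an assumption.

```typescript
// Hypothetical shape of an element's tags as an inspector might receive them.
type Tags = { [key: string]: string };

// Matches a bare "image" key and numbered keys such as "image:0", "image:1".
// Treating the bare key like "image:#" is an assumption of this sketch.
const IMAGE_KEY = /^image(?::(\d+))?$/;

// Returns the values of image keys: the bare key first, then by number.
function collectImageUrls(tags: Tags): string[] {
  const found: { order: number; value: string }[] = [];
  for (const key of Object.keys(tags)) {
    const match = IMAGE_KEY.exec(key);
    if (match === null) continue; // e.g. "name" or "image:caption"
    const order = match[1] === undefined ? -1 : Number(match[1]);
    found.push({ order, value: tags[key].trim() });
  }
  // Numeric sort, so "image:10" comes after "image:2".
  found.sort((a, b) => a.order - b.order);
  return found.map((entry) => entry.value);
}

const tags: Tags = {
  name: "Example fort",
  "image:1": "https://example.org/second.jpg",
  "image:0": "https://example.org/first.jpg",
  "image:caption": "not a URL",
};
// Logs the image:0 URL on the first line and the image:1 URL on the second.
console.log(collectImageUrls(tags).join("\n"));
```

Nothing in this function looks at where a URL points, which is exactly the gap the issue describes.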
## Problem
In general, it’s quite risky for us to hotlink an image from an unknown source, especially since we present it as a signature part of the website content, rather than as part of something most laypeople would recognize as a user-contributed post.
### Durability
[This way](https://www.openhistoricalmap.org/way/198636092) links directly to [an image hosted on Google Photos](https://lh3.googleusercontent.com/2QH8W9HluQJILPpfKbFjIRc453VlFORuvrzXpb4g4YCmpG7R3RqmBvWWz3zK7wqN0tHreb6BHsV9hEJU0YVSubEBzCYUIdX3E4Sb55Yk1noUw0Ugo3MRDB4kl_j8HwlSrTVeDWYlX5w=w2400). Except for an earlier version that linked to [the share page](https://photos.google.com/share/AF1QipOj4uozDyQdvlLYDxCOy--xQXK2YRKpuwGHjBfH4N--ybfyedLTAu2lqxYSS6XVlA/photo/AF1QipOJO4EzAxXsuO-umZ-LLxOAOny_N2WiOb7MTD7C?key=clA4WEIwVkFfYzdYV21oVEUzajl0RUN5UFY1RDFn), we wouldn’t know whose Google Photos account it’s on or whether it’ll remain there long-term. For all we’d know, the mapper got the photo from somewhere else on the Internet, and the Google Photos account owner has no way of knowing that deleting their photo would break OHM.
Images on personal photo hosting services are not the only images at risk of breakage, but more public-facing services can be archived by the Wayback Machine or kept in sync with OHM through other means: https://github.com/OpenHistoricalMap/issues/issues/581#issuecomment-1681131541. By explicitly listing the domains that the inspector _does_ support, we have a clearer strategy for keeping track of any widespread changes, such as [imgur culling old images](https://community.openstreetmap.org/t/imgur-is-going-to-delete-old-images/98229).
### Privacy
The page for the [Segedunum](https://www.openhistoricalmap.org/relation/2694442#map=18/54.98786/-1.53233&layers=O&date=335-12-11&daterange=10-01-01,2023-12-31) relation (the example given in https://github.com/OpenHistoricalMap/issues/issues/581#issuecomment-1679886563) embeds an image from [this URL shortened by Bitly](https://bit.ly/pl_89288).[^bitly] If a user views this page in OHM, Bitly automatically sets a tracking cookie in the user’s browser, even if the user never clicks on it. It also sets `referrer-policy` to `unsafe-url`, which [potentially undermines HTTPS security](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Referrer-Policy#unsafe-url).
<img width="547" alt="set-cookie: _bit=…; Domain=bit.ly; Expires=Mon, 12 Feb 2024 17:55:18 GMT" src="https://github.com/OpenHistoricalMap/issues/assets/1231218/afbac76a-75a5-4d3b-a289-485ff97ea5b4">
The OHM website neither mentions [Bitly’s cookie policy](https://bitly.com/pages/cookies) nor links to its opt-out screen, which I assume would be required in some jurisdictions. Perhaps the feature could be tagged with `image:1:cookie_policy` or some such, but I don’t imagine that mappers would be very interested in doing this kind of accounting for an image just to have it show up in the inspector.
URL shorteners aren’t the only websites that set HTTP cookies on images – so does Wikimedia Commons. However, Bitly’s business model is to use these cookies for tracking purposes. I suspect it would be a lot easier for an OHM privacy policy to link out to a fixed set of third-party privacy policies than to have to automatically discover the privacy policy for any site linked from `image:#`.
### Security
Many websites in OSM’s `website` tag have gone away, replaced by squatters who run the gamut from benign to malicious. Tracking these changes has been a challenge and [an unsolved problem for OSM](https://community.openstreetmap.org/t/is-there-a-procedure-to-prevent-link-rot/100886). Most of the occurrences of `image:#` in OHM are recent enough that they haven’t had time to rot yet, but it’s only a matter of time.
Even now, a number of `image:#` occurrences refer to HTTP URLs instead of HTTPS URLs. Google Chrome refuses to load images from HTTP in an HTTPS webpage.
## Proposal
Maintain a JSON file in some repository under the OpenHistoricalMap organization. The JSON file would include an array of whitelisted domains.[^blacklist] [In the inspector](https://github.com/OpenHistoricalMap/ohm-inspector/blob/94e546bc8ad4b05a09ad052cffff0cec1a8c8217/openhistoricalmap-inspector.js#L117), fetch this file (caching if necessary) and match an `image:#` URL against the whitelist before showing it.
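A minimal sketch of the proposed check, assuming a JSON file shaped like `{"domains": [...]}`. The schema, the `parseWhitelist` and `isWhitelisted` names, and the two placeholder domains are hypothetical; they are not taken from the proposal or the linked comment. The sketch also rejects plain-HTTP URLs, following the mixed-content problem noted under Security, and treats subdomains of a listed domain as whitelisted, a design choice the whitelisting criteria would need to settle.

```typescript
// Hypothetical contents of the proposed whitelist file. The real file's
// name, location, schema, and domain list are not specified in the proposal;
// these two domains are illustrative placeholders.
const WHITELIST_JSON = `{
  "domains": ["upload.wikimedia.org", "web.archive.org"]
}`;

interface Whitelist {
  domains: string[];
}

// Parses the file and checks that "domains" is an array of strings.
function parseWhitelist(json: string): Whitelist {
  const data: unknown = JSON.parse(json);
  if (typeof data !== "object" || data === null) {
    throw new Error("whitelist must be a JSON object");
  }
  const domains = (data as { domains?: unknown }).domains;
  if (!Array.isArray(domains) || !domains.every((d) => typeof d === "string")) {
    throw new Error('whitelist must contain a "domains" array of strings');
  }
  return { domains: domains.map((d: string) => d.toLowerCase()) };
}

// True only for HTTPS URLs whose host is a listed domain or a subdomain of one.
// Counting subdomains as whitelisted is an assumption of this sketch.
function isWhitelisted(rawUrl: string, list: Whitelist): boolean {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // Not an absolute, parseable URL.
  }
  // Reject plain HTTP, since browsers may refuse it on an HTTPS page.
  if (url.protocol !== "https:") return false;
  const host = url.hostname.toLowerCase();
  return list.domains.some((domain) => host === domain || host.endsWith("." + domain));
}

const whitelist = parseWhitelist(WHITELIST_JSON);
console.log(isWhitelisted("https://upload.wikimedia.org/wikipedia/commons/x.jpg", whitelist)); // true
console.log(isWhitelisted("https://bit.ly/pl_89288", whitelist)); // false: not listed
console.log(isWhitelisted("http://upload.wikimedia.org/x.jpg", whitelist)); // false: plain HTTP
console.log(isWhitelisted("https://upload.wikimedia.org.example.com/x.jpg", whitelist)); // false: lookalike host
```

Matching on the parsed hostname, rather than on a string prefix of the URL, is what keeps a lookalike host such as `upload.wikimedia.org.example.com` from slipping through.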
Populate the whitelist with the domains listed in https://github.com/OpenHistoricalMap/issues/issues/583#issuecomment-1680008099 after a cursory review to make sure each one is OK.
Document the process for OHM contributors to get another domain listed in this file, including the criteria for getting whitelisted. If necessary, publish the repository to GitHub Pages on a subdomain of openhistoricalmap.org to keep the file lightweight and independent of the usual deployment process.
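The proposal only says to fetch the file "caching if necessary". One way to do that, sketched here under the assumption that the inspector runs in a browser or another runtime with Promises, is a small time-to-live cache around whatever function downloads the file. `DomainSource`, `withCache`, and the injectable clock are hypothetical names for illustration, not part of the inspector.

```typescript
// Anything that resolves to the whitelisted domains, e.g. a wrapper around
// fetch() for wherever the JSON file ends up being hosted (an assumption).
type DomainSource = () => Promise<string[]>;

// Wraps a source so the file is downloaded at most once per ttlMs.
// Concurrent callers share one in-flight request; a failed request is not
// cached, so the next call retries.
function withCache(
  source: DomainSource,
  ttlMs: number,
  now: () => number = Date.now,
): DomainSource {
  let cached: string[] | null = null;
  let fetchedAt = 0;
  let pending: Promise<string[]> | null = null;

  return () => {
    // Serve the cached copy while it is still fresh.
    if (cached !== null && now() - fetchedAt < ttlMs) {
      return Promise.resolve(cached);
    }
    // Start a download unless one is already running.
    if (pending === null) {
      pending = source().then(
        (domains) => {
          cached = domains;
          fetchedAt = now();
          pending = null;
          return domains;
        },
        (error: unknown) => {
          pending = null;
          throw error;
        },
      );
    }
    return pending;
  };
}

// Example wiring (WHITELIST_URL is hypothetical):
// const loadDomains = withCache(
//   () => fetch(WHITELIST_URL).then((r) => r.json()).then((j) => j.domains),
//   60 * 60 * 1000,
// );
```

The TTL trades freshness for load: the longer it is, the longer a newly whitelisted domain takes to take effect for someone who already has the page open.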
## Alternatives
@jeffreyameyer suggested in https://github.com/OpenHistoricalMap/issues/issues/581#issuecomment-1679886563 that the inspector should continue to show the tagged image in general, unless the domain is on a blacklist. This would give mappers instant gratification when including images from lesser-known websites. Unfortunately, I don’t think this approach would address the durability, privacy, and security concerns, and it would create maintenance overhead of a worse nature than the proposal above: if we don’t get around to approving a pull request to the whitelist, the user may be discouraged until we respond. But if we don’t get around to approving a PR to the blacklist, OHM potentially faces a reputational issue in the meantime.
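The difference between the two approaches comes down to the default for a domain that nobody has reviewed yet. In this minimal sketch (hypothetical function names, exact hostname matching for brevity), a pending whitelist request leaves the image hidden, while a pending blacklist request leaves it visible:

```typescript
type Decision = "show" | "hide";

// Proposal: default-deny. A domain nobody has reviewed yet stays hidden.
function decideWithWhitelist(host: string, whitelist: Set<string>): Decision {
  return whitelist.has(host) ? "show" : "hide";
}

// Alternative: default-allow. A domain nobody has reviewed yet is shown.
function decideWithBlacklist(host: string, blacklist: Set<string>): Decision {
  return blacklist.has(host) ? "hide" : "show";
}

// Hypothetical host that no one has reviewed (e.g. a freshly squatted domain).
const unreviewed = "squatted.example";
console.log(decideWithWhitelist(unreviewed, new Set(["upload.wikimedia.org"]))); // hide
console.log(decideWithBlacklist(unreviewed, new Set(["bit.ly"]))); // show
```

Whichever list lags behind, the whitelist's failure mode is a missing image, while the blacklist's failure mode is a problematic image on the page.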
[^bitly]: This is a kind of Rube Goldberg-style image reference. Besides the URL shortening service, [the rehydrated URL](https://web.archive.org/web/20210203035412/https://d279tnhy9skgzk.cloudfront.net/d8KIUrQXIdG4wBcxYD3BC0Yw9ZU=/600x0/https://s3-eu-west-1.amazonaws.com/atwam-images-files/production/images/content/segedunumromanfort/2015-06/5687.jpg) also goes through the Internet Archive and CloudFront on its way to an AWS S3 bucket. At only 231 characters in length, the rehydrated URL would’ve satisfied OHM’s 255-character tag value length limit. Furthermore, the resulting image is a mere 600-pixel-wide thumbnail, whereas [the full image](https://s3-eu-west-1.amazonaws.com/atwam-images-files/production/images/content/segedunumromanfort/2015-06/5687.jpg) at a much shorter URL is 2,000 pixels wide, allowing the user to see many more details than in the thumbnail.
[^blacklist]: A JSON file would allow us to potentially add a complementary blacklist of URL patterns in the future, in case there’s something from a trusted domain that should be tagged but would be problematic to show in the inspector for some reason.