Replace certain OSM tags by Wikidata/Wikimedia links

Copntinuing the discussion from Is there support for removing all name tags on objects with a Wikidata QID present?:

I agree to both statements. Interlinking open source databases is a great thing and gives access to lots of valuable information. Replacing important tags like “name” in the OSM database by such links is another thing.

Nevertheless there are certain tags which could be understood as redundant information to Wikidata or Wikimedia commons links. One could be the species-tag. Using the species:wikidata tag instead provides a link to the correct species plus additional information:

Anyhow in the wiki it is said that additionally to this link the species-tag should be set. What would be the addtional value by doing so?

Another thing are pix on Wikimedia Commons. In this case the wiki page for key:image encourages the link to a Wikimedia Commons pic or category (with several pix) instead of adding a link by using the tag image=** . Nevertheless I have seen a lot of objects with both. Does that make sense? Would the image=** tag qualify for being removed in such cases, specially if that image is a link to Wikimedia Commons again?

2 Likes

To repeat from that thread:

No, for start Wikidata is a licensing quagmire as far as database rights are concerned.

Community there basically pretends that problem does not exist, making their dataset problematic to use in UK and UE where sui generis database rights exist.

Despite presenting dataset as CC0 and not mentioning this issue.

(they are importing directly and indirectly databases covered by databases rights - which is entirely legal in USA but resulting work is unfree in countries where database rights exist)

Also, not sure about their data quality - but I tried to use part of Wikidata data and it was of really low quality ( User:Mateusz Konieczny/failing testcases - Wikidata ).

Also, editing experience in Wikidata is quite miserable and expecting people to have separate accounts in a separate service to maintain basic data is a terrible idea.

here situation is different, as it is the same in a different format

See Restrict wikimedia_commons URLs as image=* tag values? for thread discussing this

4 Likes

When I build or update the database for my map, I work with Openstreetmap data. If I use another source, that would increase my workload and reduce my spare time.

5 Likes

it is human readable and its validity does not depend on a third party-service with own rules, moderators, editors, missing data, internal politics and occasional damage of valid data. Third party account on other platforms should not be needed to edit OSM objects.

Also, scientific names are a standard way to refer to taxons, while wikidata ids are not.

6 Likes

Thanks for your feedback.

The quality of the species-data in Wikidata is way better then many of the data entered manually in the species-tag. And of course the scientific name is part of the Wikidata item, otherwise it would not make any sense to link it.

I have to admit that I was not aware that a simple link to another website could create problems related to database right violation. To my knowledge such problems would only arise when copying data and publishing them on your own website. So if links to Wikidata could create such problems would it be your advise to abstain from links to Wikidata in general?

Or - in the context of my question - forget about species:wikidata and enter only species manually instead?

The validity always depends (at least until nowadays) on human beings whereever they do their work, but I got your point.

Thanks for the link to the image discussion. I had checked the forum for Wikidata but forgot to do the same for Wikimedia Commons otherwise I would have found that topic myself. Sorry for that.

1 Like

The link from OSM to Wikidata does not cause any problem (the only license it carries is OSM’s own license). The problem is that if you remove the text version of the information from OSM (e.g. species=*) in favor of the Wikidata link (e.g. species:wikidata=*) you are forcing any user (like a map that needs to show the species) to take that information from Wikidata, which carries another license.

By including both the text version and the link to other sources you are

  • technically simplifying the usage of the data (not forcing users to combine data from other sources), which is good
  • legally simplifying the usage of the data (not forcing users to combine data with other licenses), which is good
  • still allowing users that need extra info and are ok with the extra technical and legal overhead to use it, which is good
  • introducing redundancy, which is bad (because it allows to include inconsistent data and forces to do extra work to keep them consistent over time)

IMO there is no perfect choice about which fields to keep, but keeping both is still the best one

11 Likes

Instead of “only” linking to Wikidata, it would be nice if editors would download information from there and offer appropriate tag-suggestions. Like … if you enter the wikidata of a certain tree, it could offer you the correct species,=* if already in Wikidata and so on. Just a thought.

4 Likes

Good onya mate, on of the best replies I have received so far. Comprehensive and understandable and not omitting the negative issue, a double thumbs up for that.

3 Likes

link is not a problem, current situation is fine

but making mandatory to use external Wikidata data to get species info starts becoming problematic, and that would happen in case of replacing species by wikidata:species - as anyone wishing to use this data would be forced to use Wikidata (as @Danysan95 writes)

that sadly runs partially into legal limbo of Wikidata as far as database rights go (there were some attempts to decide on that in their community, basically it went nowhere and keeps being ignored)

1 Like

Very much this. Working with OSM data right now is “download a .pbf for the area you’re interested in and feed it into your tool of choice”. Adding an extra preprocessing step where you have to ingest and cross-reference another database, even just to render a map of one small region… nope. Big nope.

4 Likes

For the most part I agree with you, but do we really want all sorts of brand:*=* and operator:*=* tags in OSM to the point where we start tagging brand logos onto every POI, or do we just tag brand=*/operator=* + brand:wikidata=*/operator:wikidata=* and leave it at that?

I’m not joking about the brand logos btw; a quick Overpass query for [image~logo] gives me nearly 1800 results, and I already removed a whole bunch recently. Clearly there’s at least some demand for it, but my question is if we want to have this level of detailed info about places in OSM, while it can also be achieved by linking to Wikidata?

1 Like

Using Wikipedia and Wikidata more than just descriptive info is bad for OSM. Wikipedia and Wikidata are unreliable sources, prone to change. We already met issues that we had links to Wiki data in OSM and then those links became invalid because Wiki changed.

OSM should be self-sufficient and reliable database. Dependance on Wiki data can only hurt OSM reliability.

2 Likes

on what basis did you remove the logo tags?

I think the key here is defining “what is OSM” (easy one for a Tuesday morning, I know).

OSM is a collection of geographic facts. A company logo is not a geographic fact. Nor is a restaurant menu, or the specification of a church organ, or the history of a building. All those things can go into Wikidata (or wikiwhatever) and be linked from OSM via a Wikidata entry.

There is a road outside my house called Market Street that goes from one lat,lon to another lat,lon. That is a geographic fact, at least as we define it in OSM.

Obviously there are grey areas. OSM, as ever, defines the dividing line by convention rather than by explicit documentation. But by and large we have it right at the moment. I don’t think we should significantly shift the line towards either “more stuff in Wikidata” or “more stuff in OSM”.

6 Likes
  1. Image tags in OSM must refer to pictures of places. Logos are not that.
  2. I checked the images. No blind mass edits were involved. There were also a few broken links.
3 Likes

I can agree with this statement.

| Richard
January 31 |

  • | - |

I think the key here is defining “what is OSM” (easy one for a Tuesday morning, I know).

OSM is a collection of geographic facts. A company logo is not a geographic fact. Nor is a restaurant menu, or the specification of a church organ, or the history of a building. All those things can go into Wikidata (or wikiwhatever) and be linked from OSM via a Wikidata entry.

There is a road outside my house called Market Street that goes from one lat,lon to another lat,lon. That is a geographic fact, at least as we define it in OSM.

Not sure if “geographic facts” is helpful, we are describing the world, and things like a name, a road surface, the current restaurant menu, the kind of cuisine, the postcode, the color of the roof tiles or start date of a building are details that further describe the thing, and that can be verified on the ground.

Obviously there are grey areas. OSM, as ever, defines the dividing line by convention rather than by explicit documentation. But by and large we have it right at the moment. I don’t think we should significantly shift the line towards either “more stuff in Wikidata” or “more stuff in OSM”.

+1, in particular I would be concerned if we’d remove information we currently have in OSM because it could eventually be (or is currently) also available in other databases (like wikidata).

1 Like

Unfortunately that assumes that : a) wikidata & wikispecies tags are correct; b) that the species name used on wiki projects is the one recognised by the botanical authorities in a given country. Botanists have opinions on these things. People interested in tagging this sort of information are likely to use one of the regular floras for their region, as will the experts who compile tree registers which get imported into OSM. OSM is helped by keeping names which people recognise, and, rather importantly, in validate in the field. My botany books don’t have Q124 for each species in the margin.

A few examples below, and if you think these sound like tagging discussions you’re not that far out.

A recent example is that iNaturalist, wikipedia etc. have all accepted a revision of the genus Sorbus which split it into 7 genera. This is not accepted by British and Irish botanists (including one of the leading experts on the genus and the author of the principal flora Stace 2019 p.213 : not because the revision is based on faulty data, but because there is not enough to make a judgement which means these names will be stable).

Another example is the London Plane, which can be either Platanus x hispanica or Platanus x acerifolia. I’ve never got the bottom of why both names still exist, but the former is standard in the UK, the latter in Germany.

Worse are that different countries (really author(s) of floras) will have different species concepts. So for instance the Holm Oak Quercus ilex is split into two species by some botanists: Qu. ilex sensu strictu and Qu. rotundifolia. This may mean, for instance, that trees labelled Qu. ilex in the UK may be from either species, because I suspect no-one has looked overclosely. Apparently the species concepts of Willows, Salix in the UK is very outdated compared to those of botanists elsewhere in Europe and many, many names may change. Only recently has some progress been made in harmonising concepts of various Rosa species.

Replacing the name used by the original mapper with a wikispecies key may therefore have the effect of obscuring the data not just for regular data consumption, but hiding inaccuracies as well. A large quantity of species tags, perhaps most, originate from imported data. Maintained tree registers will always use the standard species names from the main floristic work for a country.

For linking data using the current in-use species name is often also more useful as there is much higher quality data available on things like distribution and a whole range of attributes keyed on that name.

6 Likes

Identifying species is an advanced task in real life, but species presents laypeople with a sometimes impossibly high standard for mapping. I distinctly recall cleaning up after a whole high school class on an assignment to map the locations of foraging options around town. They did a good job identifying roses and crabapple trees, but this class taught geography, not biology or Latin, so all they tagged were common names in English.

The students could’ve tagged species:en and left it to another mapper to translate the common names into scientific names. But the more experienced mapper would’ve likely relied on a copyrighted, “all rights reserved” source to make this translation, if they could even do so without conducting a followup survey. A common name like “palm” can refer to many species and subspecies, and different authorities sometimes disagree on the proper classification of a species. At least species:wikidata can be useful no matter the value’s precision.

(I was really stumped about the sheer number of species=banana they had identified in this snowy climate where tropical fruits fare poorly. It turns out the kids had identified a kind of weed commonly called a “plaintain”. I lost my appetite at this point.)

By the way, species can sometimes be limiting when mapping athletic fields. This American football field is surfaced in a specific trademarked hybrid cultivar of grass, apparently significant because the last variety failed spectacularly. With all the attention that golf mappers pay to the finer points of gameplay, maybe a future game could even use Wikidata statements about the grass variety’s specifications to adjust the golf ball’s behavior. :man_shrugging:

I assume you’re referring to the problem of Wikipedia tags becoming outdated because of articles getting renamed. Wikidata IDs are much more stable, generally only changing for good reasons and redirecting from the old ID just in case. Of course, no self-respecting identifier scheme for the sum of human knowledge can be perfectly permanent, and to put things in perspective, OSM’s are less persistent than industry norms.

4 Likes

In a similar vein, wikidata’s categorization of gender is… pretty old fashioned, and causes :face_with_raised_eyebrow::roll_eyes: from LGBTQ+ people. (It uses “woman or transwoman” classification). This caused a problem for EqualStreets Brussels a few years ago.


I’m also opposed to this idea for the data licencing, and “don’t need to work with other datasources” reasons above.