Replace certain OSM tags by Wikidata/Wikimedia links

The link from OSM to Wikidata does not cause any problem (the only license it carries is OSM’s own license). The problem is that if you remove the text version of the information from OSM (e.g. species=*) in favor of the Wikidata link (e.g. species:wikidata=*) you are forcing any user (like a map that needs to show the species) to take that information from Wikidata, which carries another license.

By including both the text version and the link to other sources you are

  • technically simplifying the usage of the data (not forcing users to combine data from other sources), which is good
  • legally simplifying the usage of the data (not forcing users to combine data with other licenses), which is good
  • still allowing users that need extra info and are ok with the extra technical and legal overhead to use it, which is good
  • introducing redundancy, which is bad (because it allows inconsistent data to creep in and forces extra work to keep the two consistent over time)

IMO there is no perfect choice about which fields to keep, but keeping both is still the best one.
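To make the redundancy point concrete, here is a rough sketch of the kind of check that keeping both tags calls for. It assumes a local lookup table from Wikidata IDs to scientific names; the table, the IDs in it and the function name are all made up for illustration, not taken from any real tool.

```python
from typing import Optional

# Placeholder lookup: the IDs below are invented, NOT real Wikidata QIDs.
# A real check would fill this from a Wikidata dump or API.
WIKIDATA_NAMES = {
    "Q_LONDON_PLANE": {"Platanus x hispanica", "Platanus x acerifolia"},
    "Q_HOLM_OAK": {"Quercus ilex"},
}


def check_species_tags(tags: dict) -> Optional[str]:
    """Return a warning if species=* and species:wikidata=* disagree, else None."""
    name = tags.get("species")
    qid = tags.get("species:wikidata")
    if name is None or qid is None:
        return None  # only one of the two tags is present, nothing to compare
    known = WIKIDATA_NAMES.get(qid)
    if known is None:
        return f"unknown Wikidata ID {qid}"
    if name not in known:
        return f"species={name} does not match {qid}"
    return None


print(check_species_tags({"species": "Quercus robur", "species:wikidata": "Q_HOLM_OAK"}))
```

A consumer who only needs the text tag never has to run anything like this; it only matters to whoever maintains both tags.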

11 Likes

Instead of “only” linking to Wikidata, it would be nice if editors would download information from there and offer appropriate tag-suggestions. Like … if you enter the wikidata of a certain tree, it could offer you the correct species=* if already in Wikidata and so on. Just a thought.
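Roughly what I have in mind, as a sketch only: the lookup table stands in for whatever an editor would actually fetch from Wikidata, and the ID in it is an invented placeholder, not a real QID. The function name is hypothetical too.

```python
# Placeholder data: an editor would fetch this from Wikidata instead.
# "Q_WEEPING_WILLOW" is an invented ID, not a real QID.
SPECIES_BY_QID = {
    "Q_WEEPING_WILLOW": "Salix babylonica",
}


def suggest_tags(tags: dict, lookup: dict = SPECIES_BY_QID) -> dict:
    """Suggest species=* when species:wikidata=* is set and species=* is missing."""
    suggestions = {}
    qid = tags.get("species:wikidata")
    # Never overwrite what the mapper already entered; only fill the gap.
    if qid in lookup and "species" not in tags:
        suggestions["species"] = lookup[qid]
    return suggestions


print(suggest_tags({"natural": "tree", "species:wikidata": "Q_WEEPING_WILLOW"}))
```

The suggestion is only an offer: the mapper still decides whether the name matches what they saw on the ground.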

4 Likes

Good onya mate, one of the best replies I have received so far. Comprehensive and understandable and not omitting the negative issue, a double thumbs up for that.

3 Likes

link is not a problem, current situation is fine

but making it mandatory to use external Wikidata data to get species info starts becoming problematic, and that would happen in case of replacing species by species:wikidata - as anyone wishing to use this data would be forced to use Wikidata (as @Danysan95 writes)

that sadly runs partially into legal limbo of Wikidata as far as database rights go (there were some attempts to decide on that in their community, basically it went nowhere and keeps being ignored)

1 Like

Very much this. Working with OSM data right now is “download a .pbf for the area you’re interested in and feed it into your tool of choice”. Adding an extra preprocessing step where you have to ingest and cross-reference another database, even just to render a map of one small region… nope. Big nope.

4 Likes

For the most part I agree with you, but do we really want all sorts of brand:*=* and operator:*=* tags in OSM to the point where we start tagging brand logos onto every POI, or do we just tag brand=*/operator=* + brand:wikidata=*/operator:wikidata=* and leave it at that?

I’m not joking about the brand logos btw; a quick Overpass query for [image~logo] gives me nearly 1800 results, and I already removed a whole bunch recently. Clearly there’s at least some demand for it, but my question is if we want to have this level of detailed info about places in OSM, while it can also be achieved by linking to Wikidata?
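For anyone who wants to reproduce the count, the shorthand above expands to something like the query built below. This is a sketch: the helper name and the timeout value are my own choices, and the resulting string is meant to be pasted into an Overpass frontend rather than sent from this snippet.

```python
def build_overpass_query(key: str, pattern: str, timeout: int = 25) -> str:
    """Build an Overpass QL query that counts elements whose `key` value matches `pattern`."""
    return (
        f"[out:json][timeout:{timeout}];\n"
        f'nwr["{key}"~"{pattern}"];\n'  # nwr = nodes, ways and relations
        "out count;"
    )


print(build_overpass_query("image", "logo"))
```

The regex is case-sensitive as written, so values containing “Logo” would need a separate pattern.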

1 Like

Using Wikipedia and Wikidata for more than just descriptive info is bad for OSM. Wikipedia and Wikidata are unreliable sources, prone to change. We have already had cases where links to wiki data in OSM became invalid because the wiki changed.

OSM should be a self-sufficient and reliable database. Dependence on wiki data can only hurt OSM’s reliability.

2 Likes

on what basis did you remove the logo tags?

I think the key here is defining “what is OSM” (easy one for a Tuesday morning, I know).

OSM is a collection of geographic facts. A company logo is not a geographic fact. Nor is a restaurant menu, or the specification of a church organ, or the history of a building. All those things can go into Wikidata (or wikiwhatever) and be linked from OSM via a Wikidata entry.

There is a road outside my house called Market Street that goes from one lat,lon to another lat,lon. That is a geographic fact, at least as we define it in OSM.

Obviously there are grey areas. OSM, as ever, defines the dividing line by convention rather than by explicit documentation. But by and large we have it right at the moment. I don’t think we should significantly shift the line towards either “more stuff in Wikidata” or “more stuff in OSM”.

6 Likes
  1. Image tags in OSM must refer to pictures of places. Logos are not that.
  2. I checked the images. No blind mass edits were involved. There were also a few broken links.
3 Likes

I can agree with this statement.

Richard wrote on January 31:

I think the key here is defining “what is OSM” (easy one for a Tuesday morning, I know).

OSM is a collection of geographic facts. A company logo is not a geographic fact. Nor is a restaurant menu, or the specification of a church organ, or the history of a building. All those things can go into Wikidata (or wikiwhatever) and be linked from OSM via a Wikidata entry.

There is a road outside my house called Market Street that goes from one lat,lon to another lat,lon. That is a geographic fact, at least as we define it in OSM.

Not sure if “geographic facts” is helpful, we are describing the world, and things like a name, a road surface, the current restaurant menu, the kind of cuisine, the postcode, the color of the roof tiles or start date of a building are details that further describe the thing, and that can be verified on the ground.

Obviously there are grey areas. OSM, as ever, defines the dividing line by convention rather than by explicit documentation. But by and large we have it right at the moment. I don’t think we should significantly shift the line towards either “more stuff in Wikidata” or “more stuff in OSM”.

+1, in particular I would be concerned if we’d remove information we currently have in OSM because it could eventually be (or is currently) also available in other databases (like wikidata).

1 Like

Unfortunately that assumes that: a) wikidata & wikispecies tags are correct; b) the species name used on wiki projects is the one recognised by the botanical authorities in a given country. Botanists have opinions on these things. People interested in tagging this sort of information are likely to use one of the regular floras for their region, as will the experts who compile tree registers which get imported into OSM. OSM is helped by keeping names which people recognise and, rather importantly, can validate in the field. My botany books don’t have Q124 for each species in the margin.

A few examples below, and if you think these sound like tagging discussions you’re not that far out.

A recent example is that iNaturalist, Wikipedia etc. have all accepted a revision of the genus Sorbus which split it into 7 genera. This is not accepted by British and Irish botanists, including one of the leading experts on the genus and the author of the principal flora (Stace 2019, p. 213): not because the revision is based on faulty data, but because there is not yet enough evidence to judge whether these names will be stable.

Another example is the London Plane, which can be either Platanus x hispanica or Platanus x acerifolia. I’ve never got to the bottom of why both names still exist, but the former is standard in the UK, the latter in Germany.

Worse is that different countries (really the author(s) of floras) will have different species concepts. So for instance the Holm Oak Quercus ilex is split into two species by some botanists: Qu. ilex sensu stricto and Qu. rotundifolia. This may mean, for instance, that trees labelled Qu. ilex in the UK may be from either species, because I suspect no-one has looked too closely. Apparently the species concepts of willows (Salix) in the UK are very outdated compared to those of botanists elsewhere in Europe, and many, many names may change. Only recently has some progress been made in harmonising concepts of various Rosa species.

Replacing the name used by the original mapper with a wikispecies key may therefore have the effect of obscuring the data not just for regular data consumption, but hiding inaccuracies as well. A large quantity of species tags, perhaps most, originate from imported data. Maintained tree registers will always use the standard species names from the main floristic work for a country.

For linking data using the current in-use species name is often also more useful as there is much higher quality data available on things like distribution and a whole range of attributes keyed on that name.

6 Likes

Identifying species is an advanced task in real life, but species presents laypeople with a sometimes impossibly high standard for mapping. I distinctly recall cleaning up after a whole high school class on an assignment to map the locations of foraging options around town. They did a good job identifying roses and crabapple trees, but this class taught geography, not biology or Latin, so all they tagged were common names in English.

The students could’ve tagged species:en and left it to another mapper to translate the common names into scientific names. But the more experienced mapper would’ve likely relied on a copyrighted, “all rights reserved” source to make this translation, if they could even do so without conducting a followup survey. A common name like “palm” can refer to many species and subspecies, and different authorities sometimes disagree on the proper classification of a species. At least species:wikidata can be useful no matter the value’s precision.

(I was really stumped about the sheer number of species=banana they had identified in this snowy climate where tropical fruits fare poorly. It turns out the kids had identified a kind of weed commonly called a “plantain”. I lost my appetite at this point.)

By the way, species can sometimes be limiting when mapping athletic fields. This American football field is surfaced in a specific trademarked hybrid cultivar of grass, apparently significant because the last variety failed spectacularly. With all the attention that golf mappers pay to the finer points of gameplay, maybe a future game could even use Wikidata statements about the grass variety’s specifications to adjust the golf ball’s behavior. :man_shrugging:

I assume you’re referring to the problem of Wikipedia tags becoming outdated because of articles getting renamed. Wikidata IDs are much more stable, generally only changing for good reasons and redirecting from the old ID just in case. Of course, no self-respecting identifier scheme for the sum of human knowledge can be perfectly permanent, and to put things in perspective, OSM’s are less persistent than industry norms.

4 Likes

In a similar vein, wikidata’s categorization of gender is… pretty old fashioned, and causes :face_with_raised_eyebrow::roll_eyes: from LGBTQ+ people. (It uses “woman or transwoman” classification). This caused a problem for EqualStreets Brussels a few years ago.


I’m also opposed to this idea for the data licencing, and “don’t need to work with other datasources” reasons above.

Thanks for all your replies, some of which would be well worth adding to the documentation in the wiki. A final remark to avoid misunderstandings: my question did not aim at replacing (i.e. removing) existing species tags, but at using species:wikidata instead of species for new objects only.

One of the reasons to look at this tag specially is the fact that it is quite challenging for everyone not being a biologist or the like as already mentioned in earlier posts. Pushing mappers to look for species in wikidata would make sure that the value entered into the OSM list of tags is a valid species at least and not just a genus or anything else. I have looked up quite a list of various species in wikidata and did not find a single mistake so I would not say the data quality found there is worse than the data quality found in OSM tags.

Anyhow, I agree with your explanations - under the given circumstances it makes sense to use species:wikidata as an additional tag but not as a replacement.

All my previous objections apply to this too! It’s still multiple levels of indirection, the name on wikidata may not match that of the book in your hand …

If you’re not competent to identify trees to species then don’t try, but you can add leaf_cycle, leaf_type and perhaps know the genus. Almost everyone actually mapping these things will be interested in them, own books, perhaps have been on courses, or learnt from others. Additionally, quite a lot of planted trees do not have species names anyway : e.g., most flowering cherries, such as Prunus ‘Kanzan’. Some, such as this very well-known variety will have entries in wikidata, but others will not, although quite a few have recently been added. Others can easily be identified more precisely than the species, e.g. copper-leaved European Beech trees.

Supporting validation pick lists is entirely different, and a task for editors, but one can easily make use of one of the many well-developed observation apps which already provide lists correctly tailored to individual recording areas (iNaturalist, observation.org, iRecord …), such as the UK Species Inventory curated by the Natural History Museum in London. It is also possible to make suitable extracts from such lists which could be used with OSM editors.

The database rights issue and the US vs EU legal situation are - I think - documented already. Feel free to document others!

Maybe you did not note that I had already confirmed:

Who defines if someone is competent enough to give it a try?

If that is so, who tagged 6780 Pla, 6177 palm, 5168 TG1, 4634 Palm, 3756 japonica etc. etc.?

The key species is not marked as a key to be exclusively used by specialists. Everyone can make use of it and apparently a lot of mappers do without having the expertise you are asking for.
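Values like these are easy to pick out mechanically. Below is a rough heuristic, sketched under the assumption that a usable species value looks like a two-part scientific name (optionally with a hybrid marker). The pattern and the function names are mine, and it will wrongly reject legitimate cultivar names such as Prunus ‘Kanzan’, so treat it as a first filter only.

```python
import re
from collections import Counter

# Heuristic only: capitalised genus, optional hybrid marker (x or ×), lowercase epithet.
BINOMIAL = re.compile(r"[A-Z][a-z]+ (?:[x×] )?[a-z-]+")


def looks_like_species(value: str) -> bool:
    """True if the value has the shape of a binomial name; says nothing about validity."""
    return BINOMIAL.fullmatch(value) is not None


def tally_suspect_values(values) -> Counter:
    """Count the species=* values that do not look like a binomial name."""
    return Counter(v for v in values if not looks_like_species(v))


sample = ["Pla", "palm", "TG1", "Palm", "japonica", "Salix babylonica", "palm"]
print(tally_suspect_values(sample))
```

It checks the shape of the value only; whether the name is accepted by any flora is exactly the question a regex cannot answer.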

I can easily identify an orange tree without memorizing Citrus × sinensis, a weeping willow without ever having heard of Salix babylonica, a palm tree without having an opinion on this menagerie of tags. Applications like iNaturalist are very useful but shouldn’t be a prerequisite for adding this sort of information to the map.

Editors already give mappers the opportunity to jot down the common name using keys like species:en. But data consumers require a more dependable scientific name. It’s that translation that typically requires an external resource, if not Wikidata then something more encumbered. If the editor could persuade the mapper to volunteer a little more detail, then I don’t see how this would be worse than telling mappers not to bother. If iD’s existing field for the species’ Wikidata ID would display descriptions like JOSM or images like Wikidata itself, then mappers and data consumers would be less likely to confuse plantain the weed with plantain the fruit tree.

2 Likes