Persistent and stable identifiers

Are a Wikidata item and an OSM item “always same as”? Would it make sense to support SKOS?

They are similar, but Wikidata and OpenStreetMap items, while (ideally) somehow referring to the same thing, aren’t the “same” in a mathematical sense. They are defined independently through different systems, which may contradict each other if you compare their properties and the meanings of those properties and relationships. And both systems change all the time, independently of each other.

Wikipedia / Wikidata aren’t even consistent within themselves. This Wikidata item allegedly matches these two Wikipedia entries, yet those pages have a very different idea of the extent of the country that they describe. OSM has objects for both, and Wikipedia in a sense has too (it has different language pages), but both Wikipedia pages are linked to one Wikidata item, which makes no sense.

Another example: Oldmoor Wood. It was originally added to Wikidata as a “human settlement” and only changed to “woodland” after 6 years. This shows the item refers to two things: a name, which can be erroneously attached to other (or non-existent) things, and a wood. Obviously, someone decided that the name was more important for persistence. If I’d been doing it, this would have been marked as obsolete and a new valid identifier created with some link to the previous value. In this case the resolution is straightforward, but I’ve come across many similar examples in Wikidata where it is not simple. The commonest is where a Wikidata item refers both to a village or other settlement and to an administrative boundary with the same name.

There are also plenty of cases the other way around. Consider all those historic buildings that nowadays contain a museum. There is a Wikipedia/Wikidata entry for the building and another one for the museum, but in OSM there is only one polygon describing everything.

It depends - sometimes that makes sense, sometimes not. If multiple entities share the same physical space there are plenty of ways in OSM to split them.

The purist view of Wikidata is that it’s just a collection of claims made by others. So if there are two competing ideas about the area covered by “Serbia”, however defined, then those claims can coexist with qualifiers saying who supports or rejects each claim, or in what context each claim is valid.

If the conflicting statements get at the fundamental question of what Serbia is, then ideally there would be separate items. I spend as much time conflating and deconflating items in Wikidata as I conflate and deconflate dual-tagged features in OSM. Both projects are a work in progress.
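
To make the “coexisting claims” idea concrete, here is a minimal Python sketch of a claim that carries qualifiers. This is not Wikidata’s actual data model or API; the `Claim` class, the property name, the values and the qualifier keys are all placeholders made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    """One statement about an item, plus qualifiers giving its context."""
    prop: str
    value: str
    qualifiers: dict = field(default_factory=dict)


# Two competing claims about the extent of a single item.
# All names below are hypothetical placeholders, not real Wikidata IDs.
claims = [
    Claim("territory", "extent A", {"supported by": ["party X"]}),
    Claim(
        "territory",
        "extent B",
        {"supported by": ["party Y"], "rejected by": ["party X"]},
    ),
]


def claims_supported_by(all_claims, party):
    """Return the values of the claims that `party` supports."""
    return [
        c.value
        for c in all_claims
        if party in c.qualifiers.get("supported by", [])
    ]
```

Both claims live on the same item; a consumer decides which context it cares about instead of the data forcing a single answer.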

To be fair, I think it is also correct to say that “wide understanding of ontologies like Wikidata” (and other authority controls) is also “a work in progress.” Speaking personally, even for someone somewhat-well-versed in the use and evolving science of libraries, classification, ontologies, linguistics, computer science, cartography and related digital technologies, these aren’t necessarily new concepts, but they are evolving and emerging. Especially as we are to consider them to be “widely understood.”

They are “somewhat understood,” by a certain segment of “classification geeks” (no offense to geeks, I consider myself a member of geek communities). They are getting to be better understood by a wider segment of people who use such systems, but I would not say that they are “widely understood.” Yet.

Agreed, but don’t we have a nice use case with Wikidata and OpenStreetMap, two dedicated open communities where we could maybe get SKOS working and learn more…

As I said, I feel Wikipedia/Wikidata has not adopted SKOS as much as we should

  • even though Wikipedia articles on the same subject differ between languages, we don’t use it, etc…
  • with > 7500 external identifiers, I feel Wikidata is a good place to start learning more about SKOS and also persistent identifiers… I see it as a work in progress

A somewhat related blog post that Denny Vrandečić, one of the architects of WD, wrote: A categorical imperative?

@salgo60 would you be interested in creating a dedicated thread here to discuss what a SKOS version of the tagging schema we have today on the Wiki would look like? I think it would be worth it.

Context

Some frontend applications also work with SKOS, such as Skosmos, TemaTres, iQVoc and SkoHub, which are used by organizations like FAO (https://agrovoc.fao.org/), UNESCO (https://vocabularies.unesco.org/), etc. Not sure which one the European Union uses, but EuroVoc (Browse by EuroVoc - EUR-Lex) seems to be a custom app that exports whatever they use for humans to input the data.

Compared to formats that allow strict semantic inference (OWL, at some level RDFS, SHACL, …), SKOS is good enough to be viable for encoding (and likely even somewhat automating) data mining from the Wiki, e.g. generating daily data dumps in an encoding people are unlikely to disagree with (they may disagree with the data, but not with how the data is encoded). We may never fully agree on an “upper ontology”, but SKOS would be plausible. However, this would require some conventions on the output, because obviously this would allow some extra reusability outside TagInfo (like if people try to create “pocket dictionaries/thesauri”).


Post edit: I’m commenting on this because I would be willing to write the software implementation (public domain license) of the data mining for it, so I’m not just saying this and expecting someone else to do it.
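
As a rough illustration of what such an export could look like, here is a minimal sketch that turns one documented tag into a SKOS concept in Turtle. The base URI, the `tag_to_skos` helper and the choice to model the key as the broader concept of each key=value pair are assumptions, not an agreed convention.

```python
PREFIXES = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."


def _escape(text):
    """Escape backslashes, double quotes and newlines for a Turtle literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


def tag_to_skos(key, value, description, base="https://example.org/osm-tag/"):
    """Render one key=value tag as a SKOS concept (Turtle syntax).

    `base` is a hypothetical namespace; the key is assumed to be the
    broader concept of every key=value pair under it.
    """
    concept = f"<{base}{key}={value}>"
    parent = f"<{base}{key}>"
    return "\n".join(
        [
            f"{concept} a skos:Concept ;",
            f'    skos:prefLabel "{key}={value}"@en ;',
            f'    skos:definition "{_escape(description)}"@en ;',
            f"    skos:broader {parent} .",
        ]
    )


print(PREFIXES)
print(tag_to_skos("tourism", "camp_site", "An area where people can camp"))
```

The conventions mentioned above (URI scheme, which label language to use, how keys relate to values) are exactly the parts that would need agreement before a daily dump is useful to others.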

You are more than welcome… I have spent 6 years on Wikidata and created > 25 external identifiers (user salgo60), and my feeling is that persistent identifiers are a must, but it’s also important to define how two knowledge domains are connected, using e.g. SKOS.

WD and OSM can never be Same As in the sense that if x is not identical to y, then there must be some property that they do not share. Maybe we need better semantics to say that this WD object is the same as the one defined on a map in OSM, or that it’s a narrower term…

I would think that “better” ontology description (systems) allow exactly this: how to more-precisely describe that two things are not the same, and/or that one object has “a more narrowly-defined scope.”
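
For readers new to SKOS: it defines mapping properties (`skos:exactMatch`, `skos:closeMatch`, `skos:broadMatch`, `skos:narrowMatch`, `skos:relatedMatch`) that say how a concept in one scheme relates to a concept in another. Below is a minimal sketch of writing such a link as an N-Triples line. Using the OSM element URL as the identifier is an assumption (OSM IDs are not guaranteed to be stable), and the helper names and IDs are made up.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"

# The five SKOS mapping properties.
MAPPING_PROPERTIES = {
    "exactMatch",
    "closeMatch",
    "broadMatch",
    "narrowMatch",
    "relatedMatch",
}


def osm_uri(osm_type, osm_id):
    """Assumed identifier for an OSM element: its browse URL."""
    return f"https://www.openstreetmap.org/{osm_type}/{osm_id}"


def wikidata_uri(qid):
    """Wikidata entity URI for an item ID such as 'Q42'."""
    return f"http://www.wikidata.org/entity/{qid}"


def mapping_triple(osm_type, osm_id, relation, qid):
    """Return one N-Triples line: OSM element --relation--> Wikidata item.

    With 'narrowMatch' the Wikidata item is the narrower concept;
    with 'broadMatch' the Wikidata item is the broader concept.
    """
    if relation not in MAPPING_PROPERTIES:
        raise ValueError(f"not a SKOS mapping property: {relation}")
    subject = osm_uri(osm_type, osm_id)
    obj = wikidata_uri(qid)
    return f"<{subject}> <{SKOS}{relation}> <{obj}> ."


# Hypothetical IDs, for illustration only.
print(mapping_triple("relation", 123, "narrowMatch", "Q999999999"))
```

The point is that the link itself carries the nuance: “close but not identical” or “OSM covers more than the Wikidata item” becomes data instead of an unstated assumption.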

Some potentially good reading: Ontology - an overview | ScienceDirect Topics. You’ll note that the very first thing that needs doing in “Ontology Engineering” is identification of purpose: at the outset it is important to be clear about why the ontology is being built and what its intended uses are. If we don’t start with (or have already) at the very least THIS for both OSM and Wikidata (or whatever…TemaTres, SkoHub…), you’ve not only likely lost the audience, you may be somewhat lost yourself. I don’t say this to insult, I merely wish to build on strong foundations. (And by the way, I don’t know what SKOS is, and maybe many others don’t, either, even as we do our best to follow this thread, including self-education and following the trail to read up on SKOS. Similarly, Wikidata, at least in my experience, “arrived suddenly” into OSM and I felt very much like “hey, fellow OSMers, figure out why this ontology has crashed in here on your own, as, I’m not going to explicitly tell you”).

I know it can be tedious to give everyone a primer on what one is talking about (all the time? no,…) but it can be helpful in a forum like this and on topics like these to do a bit of that. Not spoon-feeding, but a few breadcrumbs on the trail can and does help.

While linking with Wikidata makes sense for some things, I have more than once encountered problems where someone on Wikidata abused OSM as a geometry storage for Wikidata items, creating relations in OSM that have no place here just to “have something to link to”.

I think that forcing stable identifiers on our mappers would create an extra burden that would detract from our purpose. Every single object would become a potential link target and you’d never know who links to it and with what purpose. I like the current situation better, where we can explicitly link to Wikidata where we think it makes sense, and when we decide to split or merge objects we can determine if and where to keep the link.

Obviously, any method that would make it harder to add stuff to OSM would be an unacceptable burden for mappers, like having to obtain an ID from some ID authority, or a requirement to add distinguishing information. OSM is a project for everyone to participate in. What enables you to make good contributions to OSM is local knowledge, not subject matter knowledge. You do not have to be an expert in the domain of the thing you’re adding to the map - you can add a tree without knowing what kind it is, and you can add a transformer without knowing how many secondary coils it has.

Anyone can take OSM data and do with it what they please on the “output side”, but manipulating the “input side” so that some non-OSM purpose is easier to reach will always be a problem.

To be honest, I am quite happy with Wikidata and OSM being different worlds with different approaches, and would prefer them to be kept at arm’s length. Or maybe the length of a bargepole :wink: While there are some people invested in both projects, the Wikidata mindset is often very different from that in OSM. Wikidata folks in OSM are more likely to run imports and mass edits from the comfort of their office chair than to go out and map, and that’s not a good influence on OSM.

I’ve said this in other places and in other ways, and I don’t want to detract from @woodpeck’s clearly-heartfelt enthusiasm for OSM more-or-less “as it is now.” And while I share that myself, too, I also see an all-or-nothing kind of bifurcation, when it doesn’t have to be that way.

Part of what I find so fascinating about this new(er) Discourse instance is discovering new-for-me and deeply intellectual and technically-advanced topics (like this, again, for me). And I don’t want to sound curmudgeonly (brittle, bad-tempered like an old person who wants to see no change) so I’m very open to “testing” (as ideas, in my mind first) this exciting, new stuff that intersects with OSM. I do so not looking forward to eschewing it completely, saying, “oh, no, too far different from the OSM I’ve known for so many years.” Rather, I want to find a sweet spot: let’s say I change 1% of my efforts to “something new” (an approach, a tool, a new structure for data…) and keep 99% of what I do the same, and I get a great deal more value for changing that 1%. Maybe it’s 4% or even 10%, but if I “double my value” by using a different tool 5% of my time (with no more time invested in my OSM efforts, except the time it takes to invest in learning something new), I very well might do that.

The sweet spot would certainly include “OSM being there, largely for the people who expect it to be there as they know it, AS they know it.” I don’t want to “go too far” in a wild, radical new direction, but I’d be willing to stick my toe in the water if it is exciting, positive and highly leveraged. I can always “pull back” and there is that sense of “balance” and “feedback loops work” (when a human and some technology are mixed up together). I suspect I am not alone in this desire to find this harmonious balance, while exploring new technologies, new paradigms and new methods for “how I OSM.” Maybe I’m dreaming, but really, I think I’m simply looking towards our future, but with both excitement and caution. We can adapt to change, in fact, we should embrace it, knowing we have both a gas pedal and a brake pedal.

I’m an old mapper who started with Wikidata a few years back and learned about the tools, imports, constraints on the infrastructure etc. Through it all I have had a focus on hiking trails, campsites and related amenities.

I have followed a number of discussions about imports into OSM and Wikidata. I’m quite careful when doing edits in OSM en masse. It’s very different from Wikidata, where automated jobs are easily reversible on a large scale. In OSM that is not the case, so seeking consensus before making changes at scale is very important.

I like to craft the map and improve it by moving about and collecting data myself. I found that it helps me keep in shape and have something meaningful to do :grinning:

With Wikidata everything is done from an armchair. :sweat_smile:

When I made my hiking trail matcher I would have really benefited from a good stable identifier for every single segment of hiking trails in the world, but no such identifier exists. It is thus quite difficult at times to match hiking trails and their segments between OSM and WD.
I have had a very hard time finding good datasets for trails in Sweden. That pretty much makes it impossible to get good coverage of Swedish hiking trails in Wikidata unless I manually investigate a host of websites and scrape or manually extract the information and put it into Wikidata, which I would like to avoid.

The advantage of having official data in Wikidata about hiking trails is that we could potentially find trails that are currently completely missing in OSM and create notes or similar so we can improve the coverage.
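
To show why matching is fragile without a shared identifier, here is a minimal sketch of a fallback strategy: use the explicit `wikidata` tag on the OSM element when it exists, otherwise fall back to fuzzy name matching. This is not how my matcher actually works; the `match_trail` function, the cutoff value and the IDs are assumptions for illustration.

```python
from difflib import get_close_matches


def match_trail(osm_tags, wikidata_labels, cutoff=0.85):
    """Match one OSM trail to a Wikidata item.

    osm_tags:        tags of the OSM route relation, e.g. {"name": "..."}
    wikidata_labels: mapping of item ID -> label (hypothetical input)
    Returns (item ID or None, how the match was made).
    """
    # 1. An explicit link is the only reliable signal.
    qid = osm_tags.get("wikidata")
    if qid in wikidata_labels:
        return qid, "explicit"

    # 2. Otherwise guess by name similarity, which can be wrong when
    #    several trails share a name or a trail has several names.
    name = osm_tags.get("name", "").casefold()
    by_label = {label.casefold(): q for q, label in wikidata_labels.items()}
    hits = get_close_matches(name, list(by_label), n=1, cutoff=cutoff)
    if hits:
        return by_label[hits[0]], "name"
    return None, "unmatched"


# Made-up item IDs, for illustration only.
labels = {"Q1": "Kungsleden", "Q2": "Padjelantaleden"}
print(match_trail({"name": "Kungsleden"}, labels))
```

Anything that lands in the "name" or "unmatched" bucket still needs a human to check, and matching individual segments is harder still.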

Could you link to the OSM elements? I don’t understand the problem. Have you raised the issue on the talk page in Wikidata?

Hi. You can read more about SKOS here: Simple Knowledge Organization System - Wikipedia.

In short it is a better, more advanced way to link between two heterogeneous datasets like OSM elements and Wikidata items because most of the time the element in OSM is not exactly the same as in Wikidata.

Magnus gave an example above. Here is another: how are campsites mapped in OSM? Are a firepit and a bench element a campsite?

How do we link between the campsite item in Wikidata and OSM? Would it be ok for other mappers if I create a new relation with the firepit and bench and grass and give it a name and link it to the corresponding wikidata item?

Is the campsite in OSM broader because it even includes the trashcan but the Wikidata item does not?
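
One way to think about the broader/narrower question is to compare what each side includes. The toy heuristic below is only a sketch: SKOS does not define concepts by lists of parts, and `suggest_mapping` and the part names are hypothetical.

```python
def suggest_mapping(osm_parts, wikidata_parts):
    """Suggest a SKOS mapping property from the OSM side to the Wikidata side.

    Both arguments are collections of the things each description includes.
    Returns a SKOS mapping property name, or None if nothing overlaps.
    """
    osm_parts, wikidata_parts = set(osm_parts), set(wikidata_parts)
    if osm_parts == wikidata_parts:
        return "exactMatch"
    if osm_parts > wikidata_parts:
        # OSM covers more, so the Wikidata item is the narrower concept.
        return "narrowMatch"
    if osm_parts < wikidata_parts:
        # OSM covers less, so the Wikidata item is the broader concept.
        return "broadMatch"
    if osm_parts & wikidata_parts:
        # Partial overlap: similar, but neither contains the other.
        return "closeMatch"
    return None


# The campsite from the question: OSM includes the trashcan, Wikidata does not.
print(suggest_mapping({"firepit", "bench", "trashcan"}, {"firepit", "bench"}))
```

In practice a mapper would make this call by judgment, but the example shows that “same as” is only one of several possible answers.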

Thank you greatly for that link; I’m devouring the article like I’m hungry!

See the claimed administrative boundary of Serbia and the administrative boundary that Serbia controls. See here for a bit of background, but I’d suggest trying to read a bit more widely around the topic, because different points of view are a great help here.

No. I’d have a more productive conversation with next door’s cat** :slight_smile: .

** I have previously attempted to draw attention to geographical errors in wikipedia (“XYZ place that is listed as a village isn’t actually a village”), but it essentially fell on deaf ears. Wiki* cares less about accuracy than the fact that there is something that can be cited (ob XKCD), even if the thing being cited is completely out of context.

When I started seeing this kind of misuse, especially with boundary types that aren’t very suitable for OSM, I was very tempted to riff on your well-written essay on the difference between relations and categories but couldn’t find the words.

Fortunately, the Wikidata project recognized the problem too and introduced an alternative to OSM linking, “geoshape” statements that link to GeoJSON files hosted on Wikimedia Commons. Some of these files are ODbL-licensed Overpass query results. (This is also an alternative to the English Wikipedia’s previous practice of scraping Google Maps directions into a KML file and dumping it into a wiki template. :face_vomiting:) Geoshapes could still use better documentation and awareness among Wikipedians.

In principle, Wikipedia and Wikidata are more appropriate projects than OSM for accommodating different points of view. To the extent that OSM includes both sides of a dispute, it’s to keep the peace within our project or because the “ground truth” is seriously contested. But Wikipedia and Wikidata are also capable of including points of view that are notable despite having fewer facts on the ground.

Case in point: when the Afghan government fell last year, there was very intense edit warring over every Wikidata item related to the country’s government, especially this item about the national flag. Aside from applying semi-protected status (preventing new or anonymous users from editing the item), the project defused the situation by creating an item about each historical variant of the flag and indicating which group accepts or rejects each one.

This nuanced approach directly benefited OSM because some Afghan flags had been mapped as part of flag displays (UN headquarters, Afghan embassies, world-class hotels, mosques, community centers, etc.), but these establishments didn’t suddenly start flying the Taliban flag! The nuance on Wikidata ensured a measure of stability for projects that use Wikidata in conjunction with OSM. The name suggestion index made sure mappers tagged flagpoles with flag:wikidata set to the more specific item about the 2013 flag design, so that people wouldn’t see an out of place, offensive Taliban flag abroad based on country=AF alone.
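
The lookup order described here can be sketched roughly as follows: prefer the specific `flag:wikidata` item and only fall back to `country`. The function name, the lookup tables and the item ID are placeholders, not how any particular renderer or the name suggestion index actually works.

```python
def pick_flag(tags, flags_by_item, flags_by_country):
    """Choose a flag image for a flagpole from its OSM tags.

    flags_by_item:    Wikidata item ID -> image (hypothetical table)
    flags_by_country: country code -> default image (hypothetical table)
    """
    # The specific flag design wins when it is tagged and known.
    item = tags.get("flag:wikidata")
    if item is not None and item in flags_by_item:
        return flags_by_item[item]
    # Otherwise fall back to whatever the country currently maps to.
    return flags_by_country.get(tags.get("country"))


# Placeholder ID and file names, for illustration only.
flags_by_item = {"Q_EXAMPLE_2013_FLAG": "afghanistan_2013.svg"}
flags_by_country = {"AF": "afghanistan_current.svg"}

embassy_flagpole = {"country": "AF", "flag:wikidata": "Q_EXAMPLE_2013_FLAG"}
print(pick_flag(embassy_flagpole, flags_by_item, flags_by_country))
```

A flagpole tagged only with `country=AF` would follow whatever the country-level default becomes, which is exactly the surprise the more specific tag avoids.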

OSM’s mechanisms for describing geopolitical disputes are comparatively underdeveloped. When keys like disputed_by were first proposed, the proposal relied on very strict criteria about who can dispute a boundary, in an effort to distill these disputes down to a set of simple two-letter codes. But what is to be done about the many disputes between subnational entities? To illustrate this point ad absurdum, I invoked “any tags you like” and tagged the boundary disputes between neighborhood councils within Cincinnati, Ohio, using ad-hoc identifiers. Wikidata identifiers would’ve been more usable and self-documenting, even for humans. :wink: