I took some time to read about past attempts to “make OpenStreetMap data more semantic” in the RDF sense. The ones that are still in use today actually delivered something complete; the more academic and/or buzzword-heavy the terminology, the less likely the project was to be adopted. There are also practical issues: some were actually tried in production, but couldn’t match the performance of the alternatives.
Because OpenStreetMap data is almost always explicitly anchored by spatial relations, even an optimized representation of it in a traditional triplestore would be massive. Wikidata is already looking for alternatives for its query engine, and there is no tool today able to handle the full OSM data. The closest to this is this paper
but it takes as much as 48 days to process the full planet. So, realistically speaking, it is better to work toward using the formal description of how concepts are represented in tags to rewrite queries so they can run very efficiently on Overpass. Since such queries would be too complex for humans to write by hand, I think we could already use the concept representations for validation, and in a few years for full query rewriting. If necessary we can optimize some caching for Overpass, but generic SPARQL databases will never cope with OSM.
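To make the query-rewriting idea concrete, here is a minimal sketch of translating an abstract concept into an Overpass QL query string, so humans never write the raw query by hand. All names here (the `osmconcept:` IRIs, the mapping table, the function) are hypothetical illustrations, not an existing API:

```python
# Hypothetical mapping from a concept IRI to the OSM tag filters it implies.
# In a real system this table would be generated from the formal
# descriptions of the concepts, not hand-written.
CONCEPT_TO_TAGS = {
    "osmconcept:PrimaryHighway": [("highway", "primary")],
    "osmconcept:DrinkingWater": [("amenity", "drinking_water")],
}

def concept_to_overpass(concept_iri, bbox):
    """Rewrite a concept IRI into an Overpass QL query for a bounding box."""
    filters = CONCEPT_TO_TAGS[concept_iri]
    tag_part = "".join(f'["{k}"="{v}"]' for k, v in filters)
    south, west, north, east = bbox
    return (
        "[out:json];"
        f"nwr{tag_part}({south},{west},{north},{east});"
        "out center;"
    )

print(concept_to_overpass("osmconcept:PrimaryHighway", (51.0, 7.0, 51.1, 7.1)))
```

The point of the indirection is that when the formal definition of a concept changes (say, a country-specific variant), only the mapping table changes, not every downstream query.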
One problem is that pretty much every buzzword we could use to explain something new was, at some point in the past, promised as revolutionary by someone who then didn’t deliver. It might feel hostile at first, but even the DWG’s stance against imports (from what I’m perceiving) exists because imports are by far the easiest way to break OSM, so the caution makes sense. On Wikidata, bad imports are far harder to perceive than on OpenStreetMap, because on OSM anyone can see things on the map; how do you visualize abstract things that may be lost on Wikidata? If we assume that it is easier to see errors on OSM than on Wikidata, and that average non-experts, not massive imports, tend to improve the result, things make sense.
With all this said, and being realistic: even Overpass developers have argued in public that it is hard for newcomers’ ideas to get “buy-in” from OSM developers. But it seems this is not about hostility to new ideas; developers simply have a strong culture of “talk is cheap, show me the code”. I do understand people got some hope from Sophox being somewhat fast, however Blazegraph was one of the worst triplestores at achieving GeoSPARQL compliance.
So yes, I fully agree we could try ways to improve the data, like the field separators and other simpler things. This also has massive potential to be reused inside OpenStreetMap, because every developer will prefer it (but, again, unless things are very broken, we should assume the original data is kept unchanged).
About better formalizing the tags and concepts themselves
Ok, going back to what I think might be easier: the closest thing to a place to store how values are expected to look is the OpenStreetMap Wiki. TagInfo is the closest production-ready use of data from the Wiki (and even the Data Items got stuck). Also, even Wikibase and Semantic MediaWiki somewhat store data as if it were wiki content (they’re far more formal than infoboxes, but the underlying storage is still just plain text in an SQL database).
However, the current implementation of the equivalent of OpenStreetMap infoboxes doesn’t hold sufficient information. And if we start adding new parameters, since this would obviously be used to check the consistency of the data, we need to check the consistency of what checks the consistency. Add to this that even if we restrict who can add new parameters, over the years things might conflict with each other, so we would literally need to plan the whole thing ahead.
Also, some proposals on OpenStreetMap, like the idea of attaching a specific tag to a Wikidata item, were discarded because they were often used wrong. This means that eventually even the identifier for what the “primary highway” concept represents must never be the tag we use for it; otherwise it is like arguing to a human ontologist that the name of a person is the person (TL;DR: making the IRI that represents the abstract idea of “primary highway” different from the tag allows stricter checks, including formalizing differences between countries). So my argument here is that, since we cannot rely on external identifiers (such as a code to express the concept of World/Continent/Country-name), we need to formalize them ourselves, because they are used by other rules.
I know this might feel hard, but structural concepts cannot be offloaded, not even to Wikidata. Labels and translations, ok, but not something that could break a continuous integration pipeline; that would never get “buy-in”. I mean, we could eventually start with the low-hanging fruit, like the concept of a road, then attach default rules, so that if someone in the world types one additional zero, as in highway=residential + maxspeed=200, then in the worst case the generic rule would apply and reject that data.
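A toy sketch of such a generic default rule follows. The threshold and the function name are illustrative assumptions for this example, not an agreed-upon OSM rule:

```python
def check_maxspeed(tags):
    """Return an error string if the tags fail the default rule, else None.

    Illustrative rule: a residential road with a numeric maxspeed above
    120 km/h is almost certainly a typo (one extra zero, e.g. 20 -> 200).
    """
    if tags.get("highway") != "residential":
        return None
    raw = tags.get("maxspeed", "")
    if not raw.isdigit():
        return None  # values with units like "30 mph" would need real parsing
    if int(raw) > 120:
        return f"maxspeed={raw} is implausible for highway=residential"
    return None

print(check_maxspeed({"highway": "residential", "maxspeed": "200"}))
print(check_maxspeed({"highway": "residential", "maxspeed": "30"}))   # None
```

The value of a default rule is exactly that it fires even where no country-specific rule was ever written; a country profile could later override the threshold.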
In the case of the initial proposal (about how to split fields), we might not even need full RDF; we could just encode such information with the tags themselves (and, in the case of regexes, since there is more than one flavor of regex, we would need to recommend a pattern for every popular language). And for things that are too complicated to express as rules, the proposal must also deliver snippets of software that anyone could use to apply the rule.
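As a sketch of what such a published snippet could look like, here is a Python-flavor regex validating a semicolon-separated multi-value tag (e.g. cuisine=pizza;kebab). The pattern itself is an illustrative assumption; the point is that each popular language would get its own vetted equivalent:

```python
import re

# Illustrative pattern: one or more non-empty values separated by ";".
# Rejects empty items (";;") and leading/trailing separators.
MULTI_VALUE = re.compile(r"^[^;]+(;[^;]+)*$")

def is_valid_multi_value(value):
    """Check that a multi-value tag string is well-formed."""
    return bool(MULTI_VALUE.match(value))

print(is_valid_multi_value("pizza;kebab"))   # True
print(is_valid_multi_value("pizza;;kebab"))  # False: empty item
```

Shipping the snippet alongside the pattern avoids each data consumer re-translating the rule into their regex flavor and getting subtly different behavior.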