Question on database schema standardization

HarryKane · January 14, 2018, 4:54pm

Hi,

I often see that many different ways of tagging are used for one and the same thing, for example:

the many keys to express the date an entry was checked the last time, see the Wiki,
ATMs can additionally be mapped as atm=yes at their bank’s node, but it is not advised to map a separate shelter after adding shelter=yes to a bus stop, although the shelter might be useful even if one doesn’t wait for the bus,
the wiki is a place where one searches for the right way to tag but on the other hand I sometimes read that it is not a reference.
I have experienced both sides of maintaining a database - the input as a mapper and the output as a database programmer and user. For mappers it is complicated to map things right and for the programmers it is hard to write proper query statements which return a complete dataset.
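To illustrate the query side with the ATM example above, here is a minimal Python sketch. It assumes the features have already been loaded as plain dicts of tags; the `features` list and the `has_atm` helper are hypothetical and not part of any OSM library. The point is only that a query has to accept both the standalone `amenity=atm` feature and the `atm=yes` attribute to come close to a complete result.

```python
def has_atm(tags):
    """Return True if the tags indicate an ATM under either tagging style."""
    # Style 1: the ATM is mapped as its own feature.
    if tags.get("amenity") == "atm":
        return True
    # Style 2: the ATM is recorded as an attribute of another feature,
    # for example on the bank's node.
    return tags.get("atm") == "yes"


# Hypothetical sample data, not taken from the real database.
features = [
    {"amenity": "atm", "operator": "Example Bank"},
    {"amenity": "bank", "atm": "yes"},
    {"amenity": "bank", "atm": "no"},
    {"highway": "bus_stop", "shelter": "yes"},
]

atm_features = [f for f in features if has_atm(f)]
print(len(atm_features))
```

A query that only checks one of the two styles would silently miss part of the data, which is the kind of incompleteness I mean.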
These examples and some forum posts point to an underlying problem: either nobody is taking care of the database schema, or whoever is responsible exercises only weak control. Could somebody confirm my conclusion and - if there are actual efforts to overcome these issues - tell me what is being done to maintain the schema?

Thank you in advance,

Sven

R0bst3r · January 14, 2018, 7:29pm

Knowing the problems you are facing, I can only recommend using the data that is usable for you and ignoring the rest.
Unfortunately you are right that none of those responsible will take care of this, although there are sometimes some tendencies in that direction within the community.

The problem goes back to https://wiki.openstreetmap.org/wiki/Any_tags_you_like
That’s why, as far as I know, no editor currently checks user input for valid keys and values. Only some low-level checks are available.

The next problem is https://wiki.openstreetmap.org/wiki/Automated_edits
Why it is not allowed is explained here https://wiki.openstreetmap.org/wiki/What%27s_the_problem_with_mechanical_edits%3F
Unfortunately even manual corrections of typos are defined as mechanical edits as long as you’re not a local and can’t verify the object on the ground.

Efforts to overcome these issues? Help yourself and don’t talk about it. That worked best for me.

HarryKane · January 14, 2018, 7:50pm

Oh gosh, this database seems to be rotting like any hacker-driven FOSS project. :rolleyes:

SK53 · January 14, 2018, 8:15pm

OSM does not have a defined database schema, it has free-format tags: therein lies the power of OSM. If you don’t like it you are free to choose alternative technology.

In practice around 98-99% of tags in given categories (e.g., shop, highway) are readily usable with a little light post-processing. The nature of any post-processing will depend on your application’s needs. These may happen to be completely different from those of the person who added the tags to OSM for their own purposes.
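As a rough idea of what light post-processing can look like for the "last checked" dates mentioned at the start of the thread, here is a minimal Python sketch. The key list in `CHECK_DATE_KEYS` is an assumption for illustration only (consult the wiki or taginfo for what is really in use), and `last_checked` is a hypothetical helper, not an existing API. It folds several candidate keys into one value and skips anything that does not parse as a YYYY-MM-DD date.

```python
from datetime import date

# Assumed, illustrative list of keys used for "last checked" dates.
CHECK_DATE_KEYS = ("check_date", "survey:date", "last_checked")


def last_checked(tags):
    """Return the most recent parseable date under any known key, or None."""
    found = []
    for key in CHECK_DATE_KEYS:
        value = tags.get(key)
        if value is None:
            continue
        try:
            found.append(date.fromisoformat(value.strip()))
        except ValueError:
            # Free-format tags: ignore values that are not valid ISO dates.
            continue
    return max(found) if found else None


print(last_checked({"survey:date": "2017-06-01", "check_date": "2018-01-14"}))
```

Which keys to include, and what to do with unparseable values, is exactly the part that depends on the application.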

The fundamental problems with defining a suitable fixed database schema for OSM are:

It would take forever (see the tagging mailing list).
It would probably have to be so generic that exactly the same issues would arise with schema data as with tags.
It would be very unlikely to cover a fraction of the uses for which people contribute data to OSM.
If it was so phenomenally useful, someone would be offering this as a service. Closest is OpenCageData which is a relatively tiny enterprise.
The suggestion that somehow harmonising tagging in the database will improve database quality is erroneous. All the evidence suggests that it removes data and erodes quality.
A one-stop database ready for a given application domain to consume direct is a fantasy.
Any application consuming data from a very large database MUST make provision for post-processing because every database will have errors or inconsistencies once over a certain size.
OSM is multinational and multilingual. The complexity of achieving any coherent design across countries, use cases, languages etc. would have been very high.

In the days when I regularly designed large databases, something of OSM’s complexity would have taken several years just to get a sensible starting design. Once one starts populating such databases there is an inevitable series of revisions, as data appears which doesn’t behave as one was told during design. Rejigging formal schemas on a production database is an absolutely horrible task.

Richard · January 21, 2018, 7:52pm

I think, Harry, you might be better advised to stick to the day job rather than branching out into database critique. I mean, come on, Southampton are in the relegation zone and you still couldn’t manage a win today.