surface=paving_stones:30 is a bad idea - detail such as colour or exact size of paving stones or their shape or exact type of concrete/stone should not go into surface=*
Yes, aliases of other tags (and for nearly all uses it is a duplicate of surface=paving_stones) are not a large problem. Having some tag aliases is not even a top 10 problem for new data consumers or a top 100 problem for the OpenStreetMap community. But it is one of the easier ones to reduce in scale.
IIRC the proposed new tagging is what the Dutch community also agreed on. 30*30cm paving stones are basically omnipresent here and we have or had a lot of paving_stones:30 tags. I think the automated edit is a good idea.
I am really not very happy with this, and it flags up a fundamental issue with this sort of tag reworking and bot-fiddling.
As far as I can tell, there has been no outreach to data consumers about this. With my cycle.travel hat on, I haven’t received an email, or a DM, or anything. I presume other significant commercial routers (RWGPS, Komoot, Strava, mapy, etc. etc.) haven’t. I haven’t seen any attempt at PRs for open-source routers either.
This is a problem because you have effectively just worsened routing for a vast number of people relying on OSM for their commute, or leisure ride, or whatever. cycle.travel had subtly different weighting for surface=paving_stones and surface=paving_stones:30. That has now been sacrificed on an altar of tidiness with, in practice, no warning whatsoever.
It is not realistic to expect every data consumer to stumble across an obscure thread posted in the “General talk” area, in amongst the dozens of other messages there every day. As it happens I read these boards and spotted it. But I’m pretty sure that most data consumers won’t.
I understand that this comes from a spirit of “trying to make things better”, but OSM isn’t a tidy data playground any more. We can’t just rewrite big swathes of the database to make it look neater. OSM data is mission-critical to thousands, millions of people and we need to act like it is.
I would ask people to stop this sort of edit until some form of outreach method has been devised and implemented. Maybe this is actually something OSMF could do.
(Secondly, surface=paving_stones:30 is a good tag! You are basically repeating the mistake of highway=path here, replacing meaningful, easily understood tags in widespread usage with complex, fragile structures. Those who do not learn history’s mistakes are condemned to repeat them, etc. But that’s just day-to-day OSM facepalm and not really the issue here.)
Most commercial data consumers don’t document their usage at taginfo. Put simply, I’m not going to document my “special sauce” so that the €300m gorilla that is Komoot can just copy it.
Changes like this need an announce list, or something like that. At the very least I would expect an announcement in WeeklyOSM. Not a beware-of-the-leopard thread hidden among hundreds on the forum.
yes, and I am sad that it apparently has not gone so well and I am interested in improving things here
I am not promising to stop such ideas completely but I am willing to change how I am approaching them.
(and I would say that I am already putting in much more effort than is typical, see Category:Automated edits log - OpenStreetMap Wiki where about half of the pages are mine - and it is not because I am making over half of the automated edits in OpenStreetMap!)
“this sort of edit” - can you give a more specific definition of what kind of edit you consider bad here?
Is this one (changing shop=empty to shop=vacant) also bad and harmful and should be stopped?
Are you asking to stop all automated edits altogether?
I am not promising to stop edits until it happens but I am willing to put even more effort into contacting everyone relevant. On the other hand there is a limit to how much is reasonable to do.
I am surprised that there is so much secret sauce in just listing all used tags. And maybe some extra not actually used to confuse competition.
But I am not running a commercial OSM-based product so I am not going to claim expertise here.
So maybe it is not a solution? But then, how is one supposed to find relevant data consumers interested in being informed?
this does not change much of your complaint and is only a very partial defense on my part, but the vast majority was sacrificed about a year ago via Paving_stones:30 en gebruik (which, if anything, shows that better communication would be great)
Maybe I will use WeeklyOSM but I am sure it will not be considered enough by someone else.
I disagree - it had neither clear value nor a good design, it was not easily understood, it was not intuitive, it made as much sense as surface=asphalt_painted_red. Or surface=paving_stones:30:yellow:NS-direction or surface=reddish_sand or surface=granite_paving_stones. Or even less, as these (hopefully theoretical) tags are mostly more intuitive in their meaning.
But even if that was a terrible one, it would still be preferable for data consumers to become aware before the edit runs, not when everything has been retagged. Or a few years later.
How can it be achieved?
If you think that listing specific tags is dangerous, maybe at least some general listing of tags important for you is possible?
For example, I am currently looking into roof, building:roof:shape, roof:type and roof_shape. Which data consumers should I notify when I propose a bot edit (at some point, depending on when I have free time that I decide to use this way)? Should I contact you? How am I supposed to know that?
How much effort should I put into tracking down projects which for one reason or another decided not to list anything on taginfo but use it? No time at all? 10 minutes? 10 hours?
If you mean mapy.cz, I am not going to spend any extra effort for them until they start attributing OpenStreetMap as a source, in a place seen by a typical user (doing this when the map is in an area based solely on OSM data + elevation model would be ideal).
I would still contact them if they listed themselves as using this tag on taginfo; I am just not going to spend extra unpaid research time on that. Yes, I already tried politely contacting them about attribution.
I sense an opportunity here for the OSMF to increase its corporate membership. “Using OSM commercially but don’t want to give away your secret sauce? Become a bronze+ member and we send you a monthly newsletter with all the latest tag developments in the OSM world.”
You could, you know, just stop, and go out and map something new instead.
Where you’re consolidating values that aren’t typos you are not adding any value to OSM - you are degrading the quality of the database. If you think that there is a better way of tagging something without losing information then by all means propose it (but anything that does not add value, such as clarifying something previously difficult to tag, will be an uphill struggle).
That doesn’t mean that anything a user has entered must always be kept - I have and will continue to fix building=yesq to yes (followed by squaring the building), but if they have chosen a very specific tag that has a very specific meaning we need to make sure that meaning is not lost in any replacement tags (and ask ourselves what the point of the tagfiddling was in the first place).
Like I say, some sort of announcement list, whether that be as part of WeeklyOSM - which already has a critical mass of readers and the relevant mailing infrastructure - or repurposing the existing announce@ mailing list, or whatever.
Right now I worry that the situation is feeding into the hands of resellers like Overture etc. “You can’t rely on OSM data because the spec keeps arbitrarily changing… use our quality-controlled dataset™ instead!” That’s not a great place for OSM to be in.
and while doing this I encounter from time to time some tagging that just makes such mapping harder in one way or another
sometimes (quite rarely) the problem is fixable in an automated fashion
well, in this case that is not the problem - the problem is annoyance for data consumers
ironically, one of the reasons why I think that limited and smart automated retagging may make sense is to limit the usefulness of Overture repackaging OSM data
I had my share of telling people “well, and this info can be expressed as the following alias”. Yes, it is not in the top 10 or probably not even the top 50 problems for established data consumers, but it is one of the annoying things for someone starting to use OSM data.
I disagree, I think that merging shop=vacancy, shop=unused and shop=closed into shop=vacant is adding some tiny value.
The same for shop=flowers into shop=florist
(do you think that such changes are also unhelpful or are in fact correcting typos?)
Well, at the very least I will try to give more weight to “how useful it actually is and is it worth spending effort on that”
A messy data model full of legacy cruft is also a strong argument against OSM. And sadly, that describes the current state of the OSM data model quite well.
I would actually agree that we need better involvement of data consumers in the evolution of our data model. Neither the wiki voting process nor the “watch 5k wiki pages and 20 discussion channels or someone will make a random change which breaks your software” process really succeed at that goal.
However, any such process must be lightweight enough that improvements keep happening.
Discovering all the tags relevant to your use case should not involve any “special sauce”. If only people who have been around for decades to watch all the idiosyncrasies evolve in real time can use the OSM database to its full potential, that’s a serious issue.
Actually, that’s arguably the most serious problem with your approach - you’re making automated changes to the database without even looking at the objects and in some cases (although not here) that can miss a major category error that has been made. Beginners often make mistakes, especially someone who’s been dragged against their will into a school or HOT mapathon and told to mindlessly draw things. Classic examples of this sort of thing include “fixing the tagging on a fake ATM in the middle of the Sahara Desert” and “changing an obviously spam website from http to https”.
There absolutely is value in looking at the “long tail” of usage - you only need to go as far as page 4 of information to find information=Unterflurhydrant am Straßenrand vor dem Haus Am Gwend 18 which is … surely not ideal. Here after looking at the data you can see a combination of a misuse of information (there are a few other ones in the data) and the usual JOSM copy X to Y footgun that has caught us all out at some time.
That was not a talk I saw as an aspiration when I watched it. Overture Maps at least share the output of their secret sauce even if they don’t share the content.
yes, and it makes sense only if the error rate on such objects is lower than what is easily detectable in other ways
that is why it makes sense to retag sidewalk:right=seperate to separate and shop=flowers into shop=florist - the error rate there is lower than what can be easily found with other methods. For example, notes and various automated QA make it easier to find serious real problems than checking those. And the supply of such detections is much greater than we can process, so nothing is lost by retagging those in an automated fashion.
In other words it is better to retag 500 cases like this automatically and look at 50 notes, fixing 10 issues, rather than spend the same time retagging these 500 cases manually and fixing 5 issues of similar importance. (The numbers are illustrative, but I believe that the general proportion holds here.)
Meanwhile, landuse=wood and natural=forest have much higher error rates, so automatic retagging is not helpful there. And there is a real benefit from checking them one by one. An automated edit makes no sense there as it would hide many real issues.
In other words it is better to retag 50 cases like this manually and fix 40 issues, rather than spend the same time looking at 50 notes and fixing 10 issues of similar importance. (The numbers are also illustrative, based on my experience with such data.)
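The two comparisons above can be written out as a small worked example. This is only a sketch using the illustrative numbers from the post; the option labels and the `better_option` helper are made up for the illustration, and each entry is the number of issues fixed for the same time budget.

```python
# Issues fixed for the same time budget.
# All numbers are the illustrative ones from the post, not measurements.
low_error_rate = {  # e.g. sidewalk:right=seperate, shop=flowers
    "automated retag of 500 + review of 50 notes": 10,
    "manual retag of 500": 5,
}
high_error_rate = {  # e.g. landuse=wood, natural=forest
    "manual retag of 50": 40,
    "review of 50 notes": 10,
}


def better_option(issues_fixed_by_option: dict[str, int]) -> str:
    """Return the label of the option that fixes the most issues."""
    return max(issues_fixed_by_option, key=issues_fixed_by_option.get)


print(better_option(low_error_rate))
print(better_option(high_error_rate))
```

With these numbers the automated edit wins only where manual checking finds fewer issues than the same time spent on notes would.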
are you sure that even limited tag listing would cause such problems?
maybe at least release some subset? For example, relevant surface values. If you, say, use surface=chipseal, it would be useful to state that information on taginfo, even if for one reason or another you do not want to release the full set of relevant tags.
you can name the project “partial listing of tags used by XYZ”
As a general rule, if it’s conceivably useful for cycle routing, I probably use it. substation=minor_distribution is used somewhere in the cycle.travel weightings, for example.
Thank you for the work you do. The gardening (as it was called) of these worldwide forgotten map mistakes/errors is a thankless and much needed task. I see there is still overt hostility from some of the old guard about it.
I did a fair bit of this 10+ years ago. The hassle you needed to go through was insane.
To fix a simple and obvious typo in a tag, you officially had to do the following for every change:
Create a wiki page explaining you are changing, say, building=yess to building=yes.
Message every country’s mailing list, in their native language, describing the change affecting the tag.
Get consensus approval from each country’s mailing list. It was never defined what a consensus was; some thought 90+%.
You were also encouraged to contact every mapper that had edited that element to get feedback on why they did it.
Once (or if) you got the approval for each country, you could make the change.
Let’s just say I didn’t do that.
Even about 5 years ago this was still the case, but I see the guidelines have thankfully changed to allow fixing typos now. Progress!
Some say post-processing should be the approach: every data consumer should take account of every possible typo for every tag and convert them to a more readable form for themselves. I think this is ridiculous. It just gatekeeps in favour of the data consumers that already do this, and it is not in their interest to have the data fixed, as their app etc. then works better than their competitors’. Some people will just vote for bad data.
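For illustration, the kind of per-consumer post-processing described above might look like the following minimal sketch. The alias pairs are the examples mentioned in this thread; the `ALIASES` table and the `normalize_tags` function are hypothetical names, not part of any existing tool, and every consumer would have to maintain a table like this on their own.

```python
# Hypothetical alias table: (key, value) -> preferred value.
# The pairs are the examples mentioned in this thread, not a complete list.
ALIASES = {
    ("shop", "flowers"): "florist",
    ("shop", "vacancy"): "vacant",
    ("shop", "unused"): "vacant",
    ("shop", "closed"): "vacant",
    ("shop", "empty"): "vacant",
    ("sidewalk:right", "seperate"): "separate",
    ("building", "yess"): "yes",
    ("building", "yesq"): "yes",
}


def normalize_tags(tags: dict[str, str]) -> dict[str, str]:
    """Return a copy of an object's tags with known aliases replaced.

    Values that are not listed in ALIASES pass through unchanged.
    """
    return {key: ALIASES.get((key, value), value) for key, value in tags.items()}


print(normalize_tags({"shop": "flowers", "name": "Example Shop"}))
```

The lookup is keyed on the key/value pair rather than on the value alone, so a value such as `closed` is only rewritten under `shop` and not under unrelated keys.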
The world has moved on from that world now and wants good data. Once the data is better more data consumers will come.
Things have also improved: the QA tools and apps, better tagging schema presets, NSI, etc. have all done wonders for getting standardization, errors fixed, etc. in the past few years. There are still things they miss, but people like yourself save the data.
The world has moved on from that world now and wants good data. Once the data is better more data consumers will come.
what are you talking about? It just doesn’t make any difference at all whether a thing is tagged building=building or building=yes.
We already have a lot of data consumers, be it Apple, Amazon, Microsoft, Facebook, Tomtom, the Department of Defense, or a myriad of others including German television ARD, Italian map provider tutto città, numerous hiking associations and apps, the same for cycling, …
Those that have a need for map data but do not use OpenStreetMap are not refraining because of some tagging typos but because of other issues (license, data coverage, not official, …)
You never had to create a wiki page to fix obvious typos like yess to yes or residental to residential, but often the “gardeners” do “fixes” that can change the meaning significantly rather than just fixing typos, or can distort community consensus (usage numbers) on what a tag should be called. Or they remove good data because of a wrong interpretation made remotely.