Auto-generated Changeset Comments

I suggest that we move hashtags to the hashtags=* tag on the changeset (see the wiki for docs on that) and still require a regular human-written changeset comment in the comment=* field. As the wiki points out and @Anton_Khorev hinted at above, iD automatically moves hashtags in the comment to the changeset hashtags=* tag.
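
To make the split concrete, here is a minimal sketch of how an editor might separate hashtags from the free-text comment. The function names, the regular expression, and the semicolon-joined `hashtags` value are illustrative assumptions, not iD's actual implementation.

```python
import re

# Assumption: a hashtag is "#" followed by letters, digits, "_" or "-".
HASHTAG_PATTERN = re.compile(r"#[\w-]+")


def split_hashtags(comment):
    """Return changeset tags with hashtags copied into their own tag."""
    tags = {"comment": comment}
    hashtags = HASHTAG_PATTERN.findall(comment)
    if hashtags:
        # Assumed convention: multiple values joined with semicolons.
        tags["hashtags"] = ";".join(hashtags)
    return tags


def is_hashtag_only(comment):
    """True when nothing but hashtags (and whitespace) was written."""
    remainder = HASHTAG_PATTERN.sub("", comment)
    return bool(comment.strip()) and not remainder.strip()
```

With this sketch, `is_hashtag_only("#added #benches")` is true while `is_hashtag_only("added benches #mapathon")` is false, which is the distinction an editor would need before asking for a human-written comment.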

All that said, I am a frequent user of the EveryDoor and StreetComplete mobile apps. Both of those apps automatically set comments based on the work performed. I am OK with that. I think they do a good job in summarizing the work done. What else could be written that is not already stated in a comment like:

Updated a restaurant, a compressed_air, and a doityourself shop; Confirmed 3 restaurants, a variety_store shop, and 7 other objects

On the other hand I would consider this a good example of a worthless comment that just as well could have been generated automatically from the changeset on the fly, literally not worth the bits it is using in the database.

Critical things I want to know from a comment: was this a remote edit or on the ground, what was it based on if it was remote, for example was it based on material gathered during a survey. Was there any noteworthy motivation for the edits? And so on. In other words, meta information that is not contained in the changeset edit data, and if that is bland and boring most of the time, then so be it.

Maps.me/Every Door/StreetComplete autogenerated comments have even less effort put into them than hashtag comments. Seemingly, anyone who is against hashtag comments should also be against autogenerated ones. However, people are usually more tolerant of them. Why?

  1. maybe because they know that these comments can’t be edited, thus it’s pointless to demand writing something else
  2. maybe these comments are good enough

1 can be addressed by allowing updates to closed changesets, at least for some time after they are closed. The app won’t let you edit your changeset comment and will close the changeset. Currently you can’t change the tags of closed changesets. But their metadata still changes when they are commented on. Maybe it’s OK to update the tags too, let’s say for a day after the changeset was closed? Then you can also ask those who write hashtag-only comments to update them, something that’s currently impossible.

For 2, which comments are actually useful? The ones where you declare the scope of your edit, and then if you deviate from that scope, it’s probably an error. You declare that you’re adding things and if your changeset contains deletions, other users will more readily consider that as a mistake and correct it. Autogenerated comments are typically produced by apps that let you do a rather narrow range of edits. You are less likely to deviate from what you intended to do. You probably won’t accidentally delete many things in StreetComplete. Therefore requiring writing comments for such apps is not as important as for JOSM or iD, where you can do anything (and make any mistakes).

I would offer the case of StreetComplete being used to “magically” add 100s (if not 1000s) of missing street names in Germany in its early days.

Given that it might not have been actual malicious intent and just a misunderstanding, having to add a couple of words would have at least given them a fighting chance of expressing what they thought they were doing, which on all counts is better than just having to accept automatically generated mumbo jumbo.

Note that StreetComplete:

  • records whether user was close to object or far away (according to GPS)
  • is supposed to be used on survey only, other use is misuse and app asks to be used only on survey
  • is very limited in what it can do

Not sure whether that makes things good enough.

Yes, they could have been auto generated. In fact, in my experience it would seem that in most cases changeset comments would benefit from being automatically generated, compared to what currently gets into them (useless fixed text like “fixed the map” or “updated the data”, a wrong/reused previous comment, comments lacking any useful information etc.)

It is not that many bits; but I’ll give you that it is certainly nicer and faster to have it in one easy human-readable line than to have to open and parse all the data to auto-generate that summary.

Note that I said “in most cases”, not “all cases”. (See e.g. my comment in the issue requesting the possibility to modify the changeset comment: Confirmation popup before upload · Issue #68 · Zverik/every_door · GitHub.) But in the majority of cases, the comment field could (and maybe even should? It’d have much less human error that way!) be autogenerated IMHO.

Critical things I want to know from a comment:

OK, let’s consider what the actual purpose of changeset comments is, as opposed to other changeset metadata.

was this a remote edit or on the ground, what was it based on if it was remote, for example was it based on material gathered during a survey.

There is source=* changeset metadata for that, so IMHO there is no pressing need to duplicate it in the human-readable comment field (or at least no need to type it manually; it can be added automatically by the editor for convenience if we find it useful). The same goes for imagery_used=* etc.

Was there any noteworthy motivation for the edits?

In the majority of cases (not all of them, but say certainly more than 90% in my experience) it is “because the mapper thought the map should be made better here”. Is there any value in adding that text to 90%+ of changeset comments?

Specifics in most cases are not important, and might be privacy-invading (“I added those tracks because I went there by bicycle and plan to recommend it to my friends”, “I added tactile_paving here because I have blind friends who use this route sometimes”, “Added roof shapes and colors because I get a kick out of looking at https://streets.gl”, or “categorized pubs here as I plan to visit all of them in the next month!”).

And so on.

Feel free to specify more, I do not think reasons mentioned so far add useful information compared to autogenerated comments.

I too agree that autogenerated comments are just fine for 90%+ of the cases (more likely 98%+ in many cases, but let’s be conservative here).

Looking at suggested data to add:

For firm believers in “manual changeset comments only”, let’s take a random example of a common changeset of mine (95%+ of my changesets these days are like that) with an autogenerated comment; it says Survey whether benches have backrests, which IMHO clearly explains what it has done.

It also has the following changeset metadata:

  • source=survey - indicating I did this as an on-the-ground survey
  • created_by=StreetComplete_ee 55.0 - app I used
  • StreetComplete:quest_type=AddBenchBackrest - machine readable version of changeset comment
  • locale=en-HR - hint about preferred language for changeset discussion
  • (bbox of the changeset, of course)

How would proponents of manually specified changeset comments propose that this changeset comment be improved according to Good changeset comments - OpenStreetMap Wiki?

Even if I had the option to add/update those changeset comments manually, I am at a loss as to what I would change to be a better netizen.

  • The wiki page above says Added Trees is a good changeset comment, so Survey whether benches have backrests also looks good to me, does it not? If not, could someone elaborate why?
  • I could duplicate location information from the bbox too (e.g. add in Park pravednika među narodima in Zagreb, Croatia); but that can be autogenerated too if people think it is valuable, and I don’t see much value in that duplication (a visual bbox looks more useful to me).
  • I could also add a reason why I decided to map that (because I have issues with my back after prolonged walking, so those backrests are useful for planning my walking trips or I have OCD issues with improving OSM map), but that would be completely unnecessary detail which should be nobody’s concern (and likely a huge privacy violation if one were forced to disclose their reasons for mapping).

Note that if it is decided to force users to manually enter changeset comments, there is a price to pay: instead of mapping 1000 details on some walk, if I had to manually type 100 changeset comments for it (because StreetComplete creates one changeset per type of quest), the number of improvements would likely be much lower.

Would the tradeoff of enforcing manually typed changeset comments be worth the possible increase in comment quality (and how likely is it that such technical enforcement would actually lead to that outcome, rather than to changeset comments even worse than autogenerated ones)?

Also, would forcing 1000 different changes into one huge changeset be beneficial compared to the current one-autogenerated-comment-per-quest-type approach (e.g. is “done about 100 different kinds of POI and way updates in Zagreb” better than 100 changesets named Survey whether benches have backrests, Survey surface type, Is the road lit etc. respectively)? I’m not convinced that it would.


(That is my use case with my just-get-out-and-map hat on; I guess that armchair mappers might have different use cases and their percentages might differ; e.g. if they are doing imports or using different data sources, those should always be specified, perhaps some of that even in the changeset comment.)

I think the focus should be on whether the auto-generated changeset comment is good/useful for other mappers, and not about whether the user had the technical capability to change it manually. For example, if exactly the same changeset comment were to be:

does that mean the latter case is just fine, but former is bad and OSM would be better without such users (as seems to be implied by comment here)? IMHO, it should not be so – either the changeset comment is good (enough), and it doesn’t matter which editor pre-filled it; or the changeset comment is bad (and again it doesn’t matter which editor pre-filled it).

“if you are against X then you must also be against Y and HERE’S MY ARGUMENT WHY BEING AGAINST Y IS BAD”

But the issue is that both X and Y have the same Z result, and the premise seems to be that “Z is considered bad”. How can it be decoupled? If we condemn “Z” (e.g. “'#added 10 shops' is a bad changeset comment and must not be allowed”), we condemn both “X” (e.g. OsmAnd) and “Y” (e.g. StreetComplete).

It surely must be totally unacceptable that we treat the same changeset comment as a violation to be harshly punished in one case, but completely ignore it in the other? Such preferential treatment should not be acceptable IMHO, or do you disagree?

Just a quick digression: EveryDoor does use hashtags if you specify them (see Configuration / Changeset hashtags), but as you note, most users are likely going with the default auto-generated comments without customizing hashtags; and we’re straying away from the original subject.

You are correct; however posts/links I reacted to seem to link one to the other (paraphrasing: “because they’re all auto-generated crap!”).

To rephrase the issue with that: is an "added benches" changeset comment radically better than "#added #benches", and thus the former should be allowed while the latter is forbidden? Or is the root of the problem actually something else, unrelated to the use of the # character?

And is:
"#hotosm-project-15476 #moroccoearthquake2023 #OPSGIS2023"
so much worse than:
"HOTOSM project 15476 mapping Morocco earthquake in 2023 #OPSGIS2023"?

Because that is a change that I predict will happen to sources generating such changesets if technical measures to prevent hashtag-only-changeset-comments are implemented. Is that the result we want to accomplish, and if not, what should we do in order to avoid it happening?

IOW, are we barking at the wrong #tree here?

I have created a separate thread to discuss auto-generated comments, which, while related, deviated from the original topic of deprecating hashtag-only changeset comments. The separate thread was created at the request of @ElliottPlack.

I disagree. A changeset that modifies existing information, especially contentious items like (e.g.) highway classification, in a nonobvious way should certainly have a comment with information about motivation and sourcing, but a StreetComplete user adding addresses is just trying to add the information to the database. Does it matter what their motivation is? For places that are incompletely mapped and where the changeset is just adding missing information like buildings and addresses, I think it’s OK if the changeset comment just says what was added. What else is there to say? Even the parts mentioned in Simon’s comment have separate fields for that metadata, suggesting it doesn’t belong in the comment itself: I indicate whether I’m doing a survey or using imagery with the data sources tags. Editors add the imagery used.

To the argument then that autogenerated summaries aren’t worth the bits, I encourage you to imagine the workflows others use, even if it doesn’t add value to your own. For me, a summary of additions greatly speeds up reviews of changesets in my area. I can see that someone added a bunch of buildings and decide if I want to review that, rather than clicking on every changeset to see what’s in it. I can, at a glance, see that work was completed by a StreetComplete user based on the changesets, see what kind of work they were doing, and know that, in most cases, I won’t be able to validate it remotely except for obvious vandalism.

Summary changeset comments provide value to me, even if they don’t provide value to you all, and I’ve seen others express that they provide value too. It may just be a difference in the tasks we work on or the completeness of the maps in our areas.

Thanks @WarpathPeacock for splitting this out because I did want to continue the conversation about auto-generated comments but not in that hashtag discussion. Forgive me for forking the original conversation.

I would suggest that for apps that are narrow in focus, it should be good enough. For instance, the EveryDoor app is oriented around ground survey of businesses and some other amenity POI that are best field surveyed. I always felt the fact that I am using this app gives the edit the same credence as saying source=survey would.

Thanks also from me, for the same reasons as Elliott put ^.

It should (?) be doable to program things to create an automatic comment based on what has been done in an edit (ChatGPT anybody? :crazy_face:): “Added 15 buildings, sidewalk, crossings …”, then include “Do you want to use this comment Yes / No?” Yes saves it, No clears it but you must then add a comment before uploading, possibly even with a minimum length (10 characters?)? Sure, it won’t stop people from adding “nnnnnnnnnnnnnnnnnnnnnnnnnnn” but if somebody overwrites an auto comment with that, I’d take it as grounds for a block & revert! :smiling_imp:
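
The Yes / No flow described above could look roughly like the following. This is only a sketch: `resolve_comment`, its parameters, and the 10-character floor are hypothetical and not taken from any existing editor.

```python
# Assumption: the "10 characters?" floor floated above.
MIN_COMMENT_LENGTH = 10


def resolve_comment(generated, accept_generated, manual=""):
    """Pick the comment to upload after the "use this comment?" prompt.

    generated        -- the automatically generated summary
    accept_generated -- True if the mapper answered "Yes"
    manual           -- what the mapper typed after answering "No"
    """
    if accept_generated:
        return generated
    manual = manual.strip()
    if len(manual) < MIN_COMMENT_LENGTH:
        # Block the upload until a long enough comment is entered.
        raise ValueError(
            f"Comment must be at least {MIN_COMMENT_LENGTH} characters"
        )
    return manual
```

As noted, a length check cannot tell a meaningful comment from filler; it only guarantees that rejecting the generated text costs the mapper at least a few keystrokes.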

Just make the autogenerated comment editable before upload. This is how OsmAnd works.

I appreciate this thought process because it reflects a degree of trust that I think we’d like to have among mappers.

For years, I used to habitually label my changesets as “Bing updates” or less commonly “NAIP updates”. My process was to roam around the countryside looking for things to update based on Bing imagery – anything that came up – until the changeset got so large that Potlatch could no longer handle it. My typical changeset included a mix of everything from roads to dog sheds to ponds to grain silos to retaining walls. A reviewer didn’t need me to enumerate each change and they usually didn’t need to know which specific imagery layer I was using; they benefited more from knowing that I was doing whatever they would do if they also roamed the countryside with the latest Bing imagery. They could trust me based on my process, not my motivation.

It was only in the last year or so that “bing” finally dropped out of my top five changeset comment words in HDYC. I started entering more descriptive comments because iD automatically adds an imagery_used tag. More importantly, as the project has grown, mappers have become more specialized; I can’t rely on other mappers inferring the same meaning from “Bing updates” as I implied. This is the heart of the matter, I think: we need descriptive changeset comments to avoid misunderstandings between mappers with very different approaches. Formulaic changeset comments can only help to a limited extent, whether written by hand or generated by software.

(And of course, there are those pesky “changesets” that were synthesized from individual changes I uploaded before there was such a thing as a changeset. I don’t think I can describe those as anything other than “Yahooooooo!!!”)

The status quo is already a tradeoff, and the proposed assisted approach would also be a tradeoff. We keep talking about some other mapper being lazy, and there certainly are mappers who insist on being uninformative in their comments. But the truth is that each of us sometimes gets a bit lazy and just wants to get the changeset over with, whether because my laptop is about to run out of battery power or because the pizza is at the door and I’m hungry. If software can generate something more informative than “10 extra chars to please software”, changeset reviewers do benefit to some extent. It’s kind of like what we all experience here on the forum:

On the flip side, we would be incentivizing mappers to put in a little less effort sometimes. Fortunately, there are plenty of smart UX approaches to reduce these perverse incentives. For example, the editor could avoid prefilling the generated comment, giving the user an opportunity to fill in something manually. Whereas currently the editor disables the Save button, it could instead enable it but require the user to bonk the button a couple times to confirm that they really don’t have anything useful to say. (MediaWiki does this optionally.) If they insist on an empty comment, then the editor could generate one. (MediaWiki does this with some limited types of edits, like creating a redirect.) To avoid unexpectedly putting words into the mapper’s mouth, the editor could display the generated comment as grayed-out placeholder text in the comment box.

I think StreetComplete and Every Door have gotten away with these comments so far because these are more guided applications. We can easily imagine the process by which the updates, additions, and confirmations took place: by filling out a very specific form, and not at a desk on another continent. We also understand that it can be very challenging to type up a cogent explanation on a tiny phone keyboard, with the same autocorrect that ambushes you and replaces “raster tiles” with “Easter tule”.

The point was that a summary can be generated from the changeset itself when you are browsing it on the website. And given that such a list can be quite long, that has the added benefit that it wouldn’t have to be cut off after 255 characters. Furthermore, having such a summary available would make it easy to spot inconsistencies between an existing comment and what was really changed.

I’m arguing for better tooling for mappers, not worse.

Haha, especially with our changeset format.

literally not worth the bits it is using in the database

95 per cent of nodes without tags, say hello.

That is an issue if you only have the changes in OSC format, but the API/Website has the full information (aka previous and current state) available. As I sense a bit of a misunderstanding: we don’t actually retain the OSC format information “as such” post-processing and what you see in the changeset display is not generated from the uploaded diffs directly.

And there is no reason you couldn’t summarize geometry changes, something like U ways created, V ways modified, W ways deleted, X nodes created, Y nodes moved, Z nodes deleted.
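
As a rough illustration of such a summary, the sketch below counts elements per action in an osmChange (OSC) document. The function name and the sample data are made up for this example. Because it only looks at the diff, it can report nodes as modified but cannot tell whether they were moved; that would need the previous state, which the API has available.

```python
import xml.etree.ElementTree as ET
from collections import Counter

ACTION_LABELS = {"create": "created", "modify": "modified", "delete": "deleted"}
ELEMENT_KINDS = ("way", "node", "relation")


def summarize_osmchange(xml_text):
    """Build a one-line summary such as '2 nodes created, 1 way deleted'."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for action in root:
        if action.tag not in ACTION_LABELS:
            continue
        for element in action:
            if element.tag in ELEMENT_KINDS:
                counts[(element.tag, action.tag)] += 1
    parts = []
    for kind in ELEMENT_KINDS:
        for action, label in ACTION_LABELS.items():
            n = counts[(kind, action)]
            if n:
                plural = "" if n == 1 else "s"
                parts.append(f"{n} {kind}{plural} {label}")
    return ", ".join(parts)


# Hypothetical sample data, for illustration only.
SAMPLE = """
<osmChange version="0.6">
  <create>
    <node id="-1" lat="45.80" lon="15.97"/>
    <node id="-2" lat="45.81" lon="15.98"/>
    <way id="-3"><nd ref="-1"/><nd ref="-2"/></way>
  </create>
  <modify>
    <node id="10" version="2" lat="45.82" lon="15.99"/>
  </modify>
  <delete>
    <way id="20" version="3"/>
  </delete>
</osmChange>
"""

print(summarize_osmchange(SAMPLE))
# prints: 1 way created, 1 way deleted, 2 nodes created, 1 node modified
```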

Yes. And it would have other advantages too; e.g. if the format is improved at some point in time, all previously created changesets without comments will benefit. Or changeset comments can be auto-generated in language of the person using UI (instead of forcing English) etc.

However, it introduces a significant disadvantage too: before showing any changeset summary, every changeset would need to be fully analyzed by such an automatic parser. Given that even a single click on the history button fetches dozens of changeset summaries, if that forced parsing of all the changeset change data, I think it would introduce a significant load on the OSM servers.

That load can be greatly reduced by precomputing the auto-generated comment at changeset creation time and storing it in the comment field in the table, but then we are back at square one regarding “wasted bits” et al. In addition, we rob the editors (programs like StreetComplete, EveryDoor etc.) of the ability to auto-generate specific comments that are better tailored for their use case, instead forcing the generation into a single place (the OSM server), which would create one-size-fits-all comments (which never really works all that well, IMHO).

Anyway, if such comment-autogenerated-by-OSM-servers-if-blank is implemented, for best results we should re-educate the users (including technical measures - e.g. iD/JOSM having comment hidden in separate tab, clearing it on each use, and/or warning the user “you have entered manual comment override. Are you sure it is necessary? Unless you are adding extra information which cannot be obtained by looking at source tag and imagery used and hashtags and data itself, please leave this field blank in order to generate ideal comment automatically” ) to never enter manual comment unless absolutely needed. (i.e. more or less exactly the opposite of what we are telling them now).

Not really (e.g. in case of malicious user, which would obviously enter fake manual comment to prevent automated comment from showing what was really done in hopes to avoid detection completely or at least for longer). There are few ways around that problem, each with its own disadvantages:

  • disallow manual comments completely and only use autogenerated ones which can then always be trusted (disadvantage: that kills most of the purpose of changeset comment field – to indicate things that could not be determined from data and other metadata. also, old pre-change comments would need to be handled somehow)
  • always show both auto-generated (trusted) comments as well as any manually generated comments by user (untrusted) (disadvantage: double the text to parse for user)
  • show auto-generated (trusted) comment in place of manually entered comments, but only in case where manually entered comment is missing, but mark it with some special checkmark/flag (which cannot be confused/faked with UTF-8 in manually entered comment). (disadvantage: if user enters anything as manual comment, the advantage is lost; also schema and tooling / data users change needed to support new flag)

I’m still not sure which I’d dislike the least (Probably one of the last two)
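
For comparison, the last two options can be sketched as display logic. Everything here is hypothetical (the `DisplayedComment` type, the function names, the `trusted` flag); it only illustrates that the flag has to live outside the comment text so it cannot be faked by typing special characters.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DisplayedComment:
    text: str
    trusted: bool  # True only for server-generated summaries


def show_both(manual: Optional[str], generated: str) -> List[DisplayedComment]:
    """Option two: always show the generated summary, plus any manual text."""
    shown = [DisplayedComment(generated, trusted=True)]
    if manual and manual.strip():
        shown.append(DisplayedComment(manual.strip(), trusted=False))
    return shown


def fallback_when_missing(manual: Optional[str], generated: str) -> DisplayedComment:
    """Option three: show the generated summary only if no manual comment exists."""
    if manual and manual.strip():
        return DisplayedComment(manual.strip(), trusted=False)
    return DisplayedComment(generated, trusted=True)
```

The sketch makes the stated disadvantage of option three visible: as soon as `manual` contains anything at all, the trusted summary is never shown.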

I’m arguing for better tooling for mappers, not worse.

Yes. As are we all :smiling_face: