Required tagging for imports suggestion

Hello Community,

I am writing to suggest a set of metadata tags that importers would be required to add to their changesets in the future.

It’d be simple to add something like this to the guidelines, but I want input on what would be good tags to use for this purpose.

My initial idea is as follows:

  • source - used for putting in the name of the source dataset
  • source:url - used for linking to the source dataset
  • import=yes - required on all changesets, for tracking.
  • import:page - used for linking the import’s wiki page

Additionally, this tag may be worth considering:

  • source:date - if known, used to specify the age of the parent dataset
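To make the proposal concrete, here is a minimal sketch of a tag check in Python. It assumes that all four tags in the first list would be required and that source:date stays optional; the thread has not settled that split. The function name and every example value (dataset name, URLs, date) are hypothetical.

```python
# Tags proposed in this thread. Treating all four as required is an assumption.
REQUIRED_TAGS = ("source", "source:url", "import", "import:page")
OPTIONAL_TAGS = ("source:date",)


def missing_import_tags(changeset_tags):
    """Return the proposed tags that are absent, empty, or have a wrong value."""
    problems = [
        key for key in REQUIRED_TAGS
        if not changeset_tags.get(key, "").strip()
    ]
    # import=yes is the only value suggested so far, so flag anything else.
    if changeset_tags.get("import", "").strip() not in ("", "yes"):
        problems.append("import")
    return problems


# Hypothetical changeset tags for illustration only.
example = {
    "comment": "Import of example address points",
    "source": "Example County Address Points",
    "source:url": "https://example.org/data/addresses",
    "import": "yes",
    "import:page": "https://example.org/wiki/Example_import",
    "source:date": "2023-01-15",
}

print(missing_import_tags(example))
```

A check like this could run in an import script before upload, which matters because the tags have to be right while the changeset is still open.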

Let me know your thoughts,
–James


Should this second instance of source:url be something like source:date?

Yes, I wasn’t awake enough to catch that error, haha.

source:date is what was meant.

Good idea! Let’s use changeset tags more. Some ideas:

  • source:licence - textual description, or the SPDX licence identifier for the licence, so we can easily find (e.g.) all CC0 data.
  • source:ct_compatible=yes/no - Is the source data compatible with the OSM Contributor Terms. This is relevant if OSM wants to change the licence (again). It’s also required to be documented (as per OSMF Board meeting 2022-11)
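As a rough illustration of why an SPDX identifier helps, the sketch below filters changesets by source:licence. The changeset records are hypothetical plain dictionaries, not the output of any real OSM API client, and the function name is made up for this example. CC0-1.0 and ODbL-1.0 are real SPDX identifiers.

```python
def changesets_with_licence(changesets, spdx_id):
    """Return the ids of changesets whose source:licence equals spdx_id."""
    return [
        cs["id"] for cs in changesets
        if cs["tags"].get("source:licence") == spdx_id
    ]


# Hypothetical changeset records for illustration only.
changesets = [
    {"id": 101, "tags": {"import": "yes", "source:licence": "CC0-1.0"}},
    {"id": 102, "tags": {"import": "yes", "source:licence": "ODbL-1.0"}},
    {"id": 103, "tags": {"import": "yes", "source:licence": "public domain, see wiki"}},
    {"id": 104, "tags": {"import": "yes", "source:licence": "CC0-1.0"}},
]

print(changesets_with_licence(changesets, "CC0-1.0"))  # [101, 104]
```

Changeset 103 shows the trade-off: a free-text description is allowed by the suggestion above, but an exact-match query like this one will not find it.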

Thanks for the SPDX website link! I’ll definitely include that.

How are users expected to use source:ct_compatible in practice? It’s not like you can update changeset tags once the changeset is closed.

–James

I think there is a bit of a misunderstanding on the part of the board on this topic, because two different aspects of licensing third-party data are at play with respect to CT compatibility.

On the one hand, there is a specific issue with ODbL imports: as you would expect from share-alike terms, they fix the licence to the ODbL (no other share-alike licence is even remotely compatible, which is why the ODbL is the only licence in this category), and that would directly interfere with changing the licence as provided for by the contributor terms.

On the other hand, neither the ODbL nor any of the CC licenses (except CC0) allows sub-licensing, so an importer of data licensed this way is NEVER importing on the basis of the contributor terms to begin with and needs special permission from the OSMF to do so.

A ref:* tag containing an identifier that uniquely refers to the individual source record that was used to produce the OSM data should be included in every feature imported into OSM.

This will be very helpful in the future when the source data changes and the corresponding OSM data needs to be updated.

This will also be helpful when new records are added to the source data, and mappers need to make a determination whether the corresponding features are already present in OSM or whether they need to be added.
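A minimal sketch of how such an identifier could be used when the source is refreshed, assuming source records and OSM features are available as plain dictionaries. The key ref:example, the function name and the sample data are all hypothetical; a real import would use its own ref:* key.

```python
def compare_by_ref(source_records, osm_features, ref_key="ref:example"):
    """Split source ids into those to add, those to review, and stale OSM refs."""
    osm_refs = {
        feature["tags"][ref_key]
        for feature in osm_features
        if ref_key in feature["tags"]
    }
    source_ids = {record["id"] for record in source_records}

    to_add = sorted(source_ids - osm_refs)            # in source, not yet in OSM
    to_review = sorted(source_ids & osm_refs)         # in both, may need updating
    gone_from_source = sorted(osm_refs - source_ids)  # in OSM, dropped by source
    return to_add, to_review, gone_from_source


# Hypothetical data for illustration only.
source_records = [{"id": "A1"}, {"id": "A2"}, {"id": "A3"}]
osm_features = [
    {"tags": {"ref:example": "A2", "amenity": "school"}},
    {"tags": {"ref:example": "A4", "amenity": "school"}},
    {"tags": {"amenity": "school"}},  # mapped by hand, no ref tag
]

print(compare_by_ref(source_records, osm_features))
# (['A1', 'A3'], ['A2'], ['A4'])
```

Features without the ref tag, like the third one here, fall outside the comparison and would still need a manual check for duplicates.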

OSM contributors chose a share-alike data license because they didn’t want others to use their data without giving anything back. I think we really won this battle: Microsoft, Google, Overture and even IGM (the national mapping agency for Italy) chose (or were bound) to release data under the ODbL.

But we cannot import these data because the importer has no right to grant OSM the ability to relicense that data in future, as explained here: Open Database License/Contributor Terms/Open Issues - OpenStreetMap Wiki

I think we have a huge problem. Either we chose the wrong license, which I doubt, or we have the wrong CT, because we cannot take advantage of our own license.

Moreover, a third party could use all this ODbL-licensed data (OSM, Microsoft, Google, IGM, etc.) and produce a geospatial dataset that is better than ours, because obviously they are not bound to follow our CT. And, maybe this is just science fiction, the same third party could attract our contributor base because they have a “better map”.

Isn’t it time to address and solve this issue?


As I already pointed out, the no-sub-licensing issue is not unique to the ODbL, and secondly, having a single licensor as a goal is completely reasonable.

What is reasonable for you may not be reasonable at all for others. Just to be clear, is this your personal opinion, the LWG’s opinion or the OSMF’s opinion?

That’s kind of a silly question, given that the LWG and OSMF wrote the contributor terms and put them into force, and that the concept was clearly what was intended.

And no, I had nothing to do with the drafting of the TCs. But as somebody who has had a lot to do with data consumers large and small, it is clear to me that having one licensor would be far preferable for them.