CoRE Stack Import Proposal Review: Hydrology Boundaries for India

Hello everyone,

We are students from IIT Delhi - Sweety Kumar and Devanshi Malik (@Devanshi_Malik), working under the guidance of Prof. Aaditeshwar Seth (@aaditeshwar_seth), on importing data from CoRE Stack, a rich environmental and ecological dataset for India, into OpenStreetMap.

Just as OpenStreetMap provides administrative boundaries such as countries, states, and districts, we are developing similar datasets for hydrology-related spatial units, including watersheds, micro-watersheds, waterbodies, and related layers. Some of these datasets are currently hosted on Google Earth Engine and are accessible through the CoRE Stack platform (https://core-stack.org/).

Our objective is to contribute these datasets to OpenStreetMap through the proper import process. We believe this can help make the data more accessible to a wider community, enable collaborative improvement and support useful analyses through tools such as Overpass API, including watershed relationships, drainage connectivity, waterbodies within watersheds and related use cases.

We have prepared detailed import documentation outlining the project scope, data sources, methodology, quality assurance process, workflow, and compliance plan. We have also made the associated code publicly available for transparency and review.

Documentation: Import/Catalogue/CoREStack to OSM - OpenStreetMap Wiki
GitHub Repository: GitHub - lmaokisnotcute-4-lbhai/OSM_corestack: OSM import pipeline and documentation for integrating CoRE Stack watershed data into OpenStreetMap · GitHub

We would sincerely appreciate feedback, suggestions and guidance from the OSM India community before proceeding further.

Thank you.

Hello,

Could you please add the link to the explicit permission you mention? As mentioned here, CC-BY 4.0 is not inherently compatible with ODbL.

Your methodology of uploading data using the raw OSM API instead of common, well-tested tools such as JOSM is highly discouraged and prone to errors. Please don’t use custom scripts when you don’t need to. In fact, looking at your earlier import attempts and your code, you always create a relation with a single way as its member. You also never reuse nodes you’ve previously added, even if two boundaries are next to each other and the nodes share the exact same coordinates. Why not just split all these ways up into reusable segments like virtually all boundaries on OSM? Why not reuse nodes? I don’t believe you should be using some custom script for this. Regular JOSM should suffice. A lot of this code also seems AI generated, so how can we trust that you know what you are doing with regards to this import and OSM in general?
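To illustrate the node reuse point, here is a minimal sketch, not taken from the import code under review. The function name `build_shared_nodes` and the two example squares are hypothetical. It assigns one node id per distinct coordinate across all boundary rings, so adjacent boundaries end up referencing the same nodes. It assumes coordinates are rounded to 7 decimal places (the precision OSM stores) and uses negative ids as placeholders for objects that have not been uploaded yet. Splitting rings into shared way segments is a further step that is not shown here.

```python
from typing import Dict, List, Tuple

Coord = Tuple[float, float]  # (lat, lon)


def build_shared_nodes(rings: List[List[Coord]], precision: int = 7):
    """Assign a single placeholder node id to each distinct coordinate.

    Returns (node_ids, ring_refs): a mapping from rounded coordinate to
    node id, and for each ring the list of node ids it references.
    """
    node_ids: Dict[Coord, int] = {}
    ring_refs: List[List[int]] = []
    for ring in rings:
        refs = []
        for lat, lon in ring:
            key = (round(lat, precision), round(lon, precision))
            if key not in node_ids:
                # Negative ids mark objects that do not exist on the server yet.
                node_ids[key] = -(len(node_ids) + 1)
            refs.append(node_ids[key])
        ring_refs.append(refs)
    return node_ids, ring_refs


# Two hypothetical unit squares that share one edge.
square_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0), (0.0, 0.0)]
square_b = [(0.0, 1.0), (0.0, 2.0), (1.0, 2.0), (1.0, 1.0), (0.0, 1.0)]

node_ids, ring_refs = build_shared_nodes([square_a, square_b])
shared = set(ring_refs[0]) & set(ring_refs[1])
print(len(node_ids), "distinct nodes,", len(shared), "shared between the rings")
```

With these two squares, 6 nodes are created instead of 8, and the two corner nodes on the common edge are referenced by both rings.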

Please do not create a wiki page for every single object you upload. There is no reason to do this. Those examples you provided 1 and 2 are completely meaningless to anyone visiting the page. Creating these is just unwanted spam.

The tags you list are incomplete. You’re suggesting basically just 4 OSM tags, one of which is deprecated and should instead be used on changesets, not on objects. Just having a tag with type=boundary + source=https://core-stack.org/ doesn’t mean much.

I’m also not sure about the wikipedia tags you added here as that page is not about a watershed. Could you give us an example of microwatersheds you intend to add that already have their own wikipedia page? (Edit: I see now that you want to add wikipedia tags that link to the OSM wiki <tag k="wikipedia" v="https://wiki.openstreetmap.org/wiki/CoREStack/{uid}"/>, this is not how the wikipedia tag works. And again, please don’t create OSM wiki pages to represent objects on the map. That is not what it is used for)
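For reference, the wikipedia tag expects a value of the form `language:Article title` (for example `en:Ganges`), not a URL. A rough sketch of a check for that shape follows. The function name `looks_like_wikipedia_tag` and the regular expression are my own assumptions for illustration, not an official validator, and the check does not confirm that the article actually exists.

```python
import re

# Heuristic only: a language code, a colon, then a non-empty title.
_WIKIPEDIA_TAG = re.compile(r"^[a-z][a-z0-9-]*:\S.*$")


def looks_like_wikipedia_tag(value: str) -> bool:
    """Return True if value has the 'lang:Article title' shape."""
    # URLs would otherwise match, because "https:" looks like a language prefix.
    if value.startswith(("http://", "https://")):
        return False
    return bool(_WIKIPEDIA_TAG.match(value))


print(looks_like_wikipedia_tag("en:Ganges"))
print(looks_like_wikipedia_tag("https://wiki.openstreetmap.org/wiki/CoREStack/{uid}"))
```

The first call passes the check and the second is rejected, which is the problem with the value in the proposed tagging.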

The CoRE-Specific Tags you mention seem unnecessary to me. If this is just a one-time import, then these tags are not needed. Maybe a single ref at most. It is also unclear how exactly you want to tag things. You mention those 3 tags (core_entity, core_id, core_updated) specifically, but then start talking about a core: namespace?

In your previous import attempt, I found the geometry you added to be highly dubious. Firstly, it was incredibly blocky, but more importantly, the data does not seem to correspond to anything verifiable on the ground. A lot of your boundaries randomly cross rivers or seem arbitrary in nature. Could you explain what exactly ‘microwatersheds’ are, and how one could verify those?

Hi Taya,

Thank you for taking the time to review our proposal and early import attempts so thoroughly. Your feedback has been incredibly helpful. We are relatively new to the operational side of OSM, and we clearly made some missteps in our initial approach. We have overhauled our pipeline, tagging schema, and workflow based on your guidance.

The document below lays out the changes we propose to our import plan to address your concerns:

To address the incompatibility between the CC BY 4.0 and ODbL licenses, we have obtained a signed waiver from the owners of the CoRE Stack for the use of their data:

Thank you for your response, but please share your reply in this thread instead of linking to a separate document. The forum is meant for discussions after all.

Based on the clarification you provided in that text document, I do not believe this data would benefit our project and I don’t believe it belongs on OpenStreetMap in the first place.

One of our most important principles is that things need to be verifiable. An average person should be able to visit the object and recognise it. I do not believe these boundaries truly exist ‘on the ground’, but rather are arbitrarily sized groupings of how an algorithm believes the water should flow.

What is to stop an update to this algorithm from completely changing the size and number of these microwatersheds? Looking at the data you provided, some of these microwatershed boundaries make no sense to me at all and do not correspond with what a watershed should look like. I also do not believe that relying on elevation data alone can capture the complexity of the real world, where things like canals can ‘bypass’ the natural flow of water.

I believe that the best way to represent watershed information on OpenStreetMap is indirectly, by ‘simply’ ensuring that the waterways are properly mapped. Projects such as waterwaymap already do something similar, by calculating the actual connectivities of waterways mapped on OpenStreetMap. This aligns much better with our ‘on the ground’ principle than importing algorithmically generated boundaries.
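To make the idea of deriving connectivity from mapped waterways concrete, here is a small sketch. It is not a description of how waterwaymap is implemented. The function name `connected_waterway_groups` and the example way and node ids are hypothetical. It groups ways that share at least one node using a union-find structure, and it ignores flow direction entirely.

```python
from collections import defaultdict
from typing import Dict, List


def connected_waterway_groups(ways: Dict[int, List[int]]) -> List[List[int]]:
    """Group way ids into sets of ways connected through shared nodes.

    `ways` maps a way id to the ordered list of node ids it references.
    """
    parent: Dict[int, int] = {}

    def find(x: int) -> int:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: int, b: int) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # All nodes of one way belong to the same component.
    for node_refs in ways.values():
        for ref in node_refs:
            union(node_refs[0], ref)

    groups = defaultdict(list)
    for way_id, node_refs in ways.items():
        if node_refs:  # skip ways without nodes
            groups[find(node_refs[0])].append(way_id)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical data: ways 1 and 2 meet at node 12, way 3 is isolated.
example_ways = {1: [10, 11, 12], 2: [12, 13], 3: [20, 21]}
print(connected_waterway_groups(example_ways))  # [[1, 2], [3]]
```

Because the grouping depends only on what is actually mapped, it improves as the waterways themselves are improved.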

I would also appreciate a swifter response. It is very difficult to have a discussion if you only provide a response after 2 whole weeks have passed.

I think Taya might perhaps not have said it clearly enough, so just to add that clarity: algorithmically deduced data has no place in OSM. Do not upload.

Hi,

Thanks a lot for your feedback. We understand the concerns about verifiability and suitability of algorithmically derived data and we won’t push the import further.

We are now considering a different approach and would value the community’s perspective before committing to it.

The idea is to self-host the OSM software stack (following the switch2osm.org guides) and use it to host the CoRE Stack hydrology layers privately, with an OSM India extract as a base map underneath. Editing would happen through JOSM/iD against our own server, with our own user accounts and changeset history.

We think the OSM toolchain is a great fit for collaborative refinement of this kind of data, even if the data itself is not suited to the public database.

Before we go down this path, would the community see any concerns with this direction? Is there anyone with experience running private OSM instances whom we could learn from?

Thank you again for your time on this.

Best,
Sweety

Actually, the pages there are mostly about rendering OSM data. You probably want to start here.