How to include anonymous nodes in a GeoDesk feature library?

Tordanik · December 4, 2022, 12:35am

According to the GeoDesk documentation on Feature Subtypes, the default behavior is to not include anonymous nodes:

By default, feature libraries omit the IDs of such nodes to save space, in which case id() returns 0.

This is a reasonable default given the performance benefits yielded by omitting over 97% of all nodes. However, my application does currently expect untagged nodes to have an identity to guarantee correct semantics in situations where two or more nodes have identical coordinates. And if I understand the documentation correctly, this is not a situation in which node IDs will be preserved – only having tags or being a relation member is mentioned as a reason for preserving IDs.

So, is there an option to override this default behavior?

mmd · December 4, 2022, 9:41am

This is being discussed here: Provide option to store IDs of all nodes in ways · Issue #57 · clarisma/geodesk · GitHub

GeoDeskTeam · December 5, 2022, 12:21pm

Your understanding is correct: Currently, GeoDesk discards the identity of all untagged nodes (unless they are relation members) – they simply exist as locations on ways. Way.nodes() returns them as AnonymousNode objects, which return 0 as the ID.

As @mmd pointed out above, an enhancement to retain the identity of all nodes is on our roadmap. (This is a prerequisite for incremental updates, as .osc files only provide the locations of changed nodes, not of existing untagged nodes referenced in ways.)

The open question is whether to store node IDs in the GOL itself (which would make them discoverable via Way.nodes(), but would increase file size by 20%), or to store them separately (which could save space via compression and would let users discard them if updating is no longer needed, but would make them harder to access via the API).

Strictly speaking, two nodes may have the same location (same exact longitude/latitude) only if they are at different elevations, levels or layers. This would require tagging the nodes, which turns them into features with an identity. Two untagged nodes at the same location would be an error (though it appears that editors aren’t enforcing this – at least iD won’t complain if I disconnect two features, creating new nodes with the same locations, and try to upload this change).

(The above was an official guideline on the OSM Wiki, but I cannot find the link to it now.)

A possible compromise solution could be for GeoDesk to promote untagged nodes to features, if they have the same location as one or more other nodes (possibly tagging them with geodesk:error=duplicate_location).
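To make the proposed rule concrete, here is a minimal sketch of how such nodes could be picked out. It is not GeoDesk code: the class DuplicateLocations, the RawNode type and the integer coordinate representation are all hypothetical, chosen only to show the grouping step.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch, not GeoDesk code: finds the untagged nodes that
// would need to keep their IDs because they share a location.
final class DuplicateLocations {

    // Hypothetical input type: OSM node ID, integer coordinates, and
    // whether the node carries any tags.
    static final class RawNode {
        final long id;
        final int lon;
        final int lat;
        final boolean tagged;

        RawNode(long id, int lon, int lat, boolean tagged) {
            this.id = id;
            this.lon = lon;
            this.lat = lat;
            this.tagged = tagged;
        }
    }

    // Returns the IDs of untagged nodes whose exact location is also
    // occupied by at least one other node (tagged or untagged).
    static Set<Long> idsToPromote(List<RawNode> nodes) {
        Map<Long, List<RawNode>> byLocation = new HashMap<>();
        for (RawNode n : nodes) {
            // Pack both coordinates into one 64-bit key.
            long key = ((long) n.lon << 32) | (n.lat & 0xFFFFFFFFL);
            byLocation.computeIfAbsent(key, k -> new ArrayList<>()).add(n);
        }
        Set<Long> promote = new TreeSet<>();
        for (List<RawNode> group : byLocation.values()) {
            if (group.size() < 2) continue; // location is unique
            for (RawNode n : group) {
                if (!n.tagged) promote.add(n.id);
            }
        }
        return promote;
    }
}
```

Tagged nodes are already features, so the sketch only reports the untagged ones in each group.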

Can you tell me more about your use case? Are you building a QA tool?

Tordanik · December 5, 2022, 6:42pm

I’m building a 3D + indoor renderer called OSM2World. That’s why I’m looking so closely at nodes with different 3D locations, but identical 2D locations.

I agree that different elevations are the reason for this situation. However, my impression is that adding level or layer tags to otherwise untagged nodes isn’t really common practice. That is, the nodes of a bridge way with layer=1 usually do not have a layer tag, and the nodes of a room tagged with level=15 usually do not have a level tag. Mappers seem to assume that these implicitly inherit their parent way’s elevation, and so far, I’ve been following that assumption.

That would work for me. If I had that option available, I would likely use it instead of keeping all anonymous nodes, at least in production. (During development, it can be convenient to have node IDs for debugging.)

GeoDeskTeam · December 6, 2022, 5:20pm

Wow, this looks amazing! Is WebGL support live yet?

I’ve researched the above alternative (upgrading duplicate nodes to feature status) and created this GitHub issue.

I’m assuming you’re dealing with issues like staircases connecting different levels, or slanted roofs where the sides connect to lines at different elevations. Mappers could (should?) remove the ambiguity by assigning level/elevation/min_height to nodes, but they don’t, so you need to fall back to topological analysis to resolve it (for which you need the node IDs if two or more nodes share the same location). Does this accurately reflect your use case? Do you have any examples of buildings where this happens?
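As an illustration of why the IDs matter for that topological analysis, here is a minimal sketch. It is neither GeoDesk nor OSM2World code; WayConnectivity and NodeRef are hypothetical names, and a way is modelled simply as a list of node references.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: two ways of deciding whether ways are connected.
final class WayConnectivity {

    // Hypothetical node reference: OSM node ID (0 if the ID was
    // discarded) plus integer coordinates.
    static final class NodeRef {
        final long id;
        final int lon;
        final int lat;

        NodeRef(long id, int lon, int lat) {
            this.id = id;
            this.lon = lon;
            this.lat = lat;
        }
    }

    // Topological test: connected only if both ways reference the same
    // node, i.e. share a non-zero node ID.
    static boolean connectedById(List<NodeRef> wayA, List<NodeRef> wayB) {
        Set<Long> ids = new HashSet<>();
        for (NodeRef n : wayA) {
            if (n.id != 0) ids.add(n.id);
        }
        for (NodeRef n : wayB) {
            if (n.id != 0 && ids.contains(n.id)) return true;
        }
        return false;
    }

    // Geometric test: all that is left once IDs are gone. It also
    // reports a connection when distinct nodes merely share a location.
    static boolean connectedByLocation(List<NodeRef> wayA, List<NodeRef> wayB) {
        Set<Long> locations = new HashSet<>();
        for (NodeRef n : wayA) locations.add(pack(n));
        for (NodeRef n : wayB) {
            if (locations.contains(pack(n))) return true;
        }
        return false;
    }

    // Packs both coordinates into one 64-bit key.
    private static long pack(NodeRef n) {
        return ((long) n.lon << 32) | (n.lat & 0xFFFFFFFFL);
    }
}
```

For two flights of steps on different levels that end at the same longitude/latitude but use different nodes, the ID-based test says "not connected" while the location-based test says "connected".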

Osmose identifies duplicate untagged nodes as potential errors. All cases I’ve seen so far appear to be (benign) issues (mostly created by imports), where nodes could be consolidated or left as-is without consequences – but this is just based on cursory observation. Other QA tools (OSM Inspector, KeepRight) don’t seem to screen for these cases.

It probably makes sense to treat duplicate untagged nodes like path/stream crossings without waterway=ford, or street end nodes close to a road (but without a noexit tag): potential mapping mistakes where tagging could clarify the mapper’s intent.

As for anonymous nodes in general, do you know of other use cases where the ID of untagged nodes may be significant?

Tordanik · December 19, 2022, 7:32pm

Thanks! Not yet – one of the missing building blocks was a backend that provides the data for on-demand rendering of tiles, but it looks like I may have found a solution for that.

Yes, you’ve summed up the staircase example well. Slanted roofs are actually mapped differently so they’re not affected.

This topic was actually touched a few months ago at the indoor workshop in the context of the study on potential data model changes. One of the ideas proposed there would be to have ways store their own geometry instead of referencing nodes, which would cause similar challenges. We concluded that this staircase situation was an example of something that would have to be mapped differently to accommodate this change (adding, say, level information + incline direction to each way of steps would make connectivity unambiguous).

It’s surprisingly hard to find real examples of this, though – there’s not that much indoor data in the first place. Here’s one where there are actually small differences between the nodes on each level, so it’s not really affected, but it still illustrates the general issue (thanks to @VolkerKrause for finding it):

I’m also using IDs so users can select and interact with features. (Imagine users seeing a “view this on OSM” link, for example). Because an anonymous node, such as a junction between highways, can expand to a pretty large patch of asphalt + lane connections across it, it is associated with geometry that the user may click on. Of course, I can think of alternative solutions that do not rely on having the node ID stored alongside the geometry.
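A minimal sketch of the link case, assuming a hypothetical helper class OsmLinks (not part of OSM2World or GeoDesk). It uses the usual openstreetmap.org node page URL and returns nothing for a node whose ID was discarded and is reported as 0.

```java
import java.util.Optional;

// Hypothetical helper: builds a "view this on OSM" link for a node.
final class OsmLinks {

    // Returns the openstreetmap.org page for the node, or empty if the
    // ID is 0 (an anonymous node without a stored ID).
    static Optional<String> nodeUrl(long id) {
        if (id == 0) return Optional.empty();
        return Optional.of("https://www.openstreetmap.org/node/" + id);
    }
}
```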

Otherwise, the brief list of use cases in your issue is all I can think of at the moment.

GeoDeskTeam · January 7, 2023, 3:16pm

Quick update: --tag-duplicate-nodes is now supported in Version 0.1.4.

If you specify this option for gol build, untagged nodes that share a location with another node and are not part of relations (and therefore would otherwise be discarded) will be tagged geodesk:duplicate=yes, which means they will be treated as proper features: they retain their true ID, and are also returned by spatial queries.