Sometimes two ways should be joined together but they do not share a node. In order to compute connectivity one must assume an intersection tolerance. Is there a standardized tolerance value that is recommended?
Better to fix the OSM data than overcome faults in it with algorithmic methods.
Ways can be very close without having an intersection, for example when separated by a wall or fence. Assuming connectivity here will introduce additional errors. It is generally better to fix the data than to introduce some random guessing about the original intention of the mapper. Having an intersection tolerance might fix some of these problems but cannot fix all of them and might even introduce new ones.
+1 for all above. Since OSM data is free to edit, we prefer to fix connectivity errors rather than make assumptions that introduce new ones.
and with tools like keepright and JOSM-validator it’s very easy to find these kinds of data errors.
I guess people are not understanding the issue, so let me rephrase.
I’ve observed that there are three different methods for specifying connectivity of ways in OSM:
- Ways that share a node with each other (this is about 20%)
- Ways on the ground layer that overlap (i.e., geometrically intersect) each other (this is probably about 30%)
- Ways that come close to intersecting (this is about 60%).
The percentages are my rough guess approximations based on my work over the past several months in building a new routing algorithm on top of OSM. I haven’t actually calculated the true percentages.
Case #3 occurs almost any time a road has a side street coming off it. The side street is created after the main street, and nobody bothers to split the main street into two ways joined by a node at the intersection. Mappers also don’t extend the side street across the main street to force an actual geometric intersection, because that would effectively create a short “virtual” side street on the opposite side. And with floating-point arithmetic you cannot expect the side street to end exactly on the main street without slightly overshooting or falling short. Therefore, tolerances are needed.
The point is that joining ways via a shared node is not universal practice in OSM. It is not obvious to many of the people who contribute, and hence it cannot be assumed to hold all the time. I am not going to build a broken routing system that ignores 60% of the connections just on the principle that “they should have done it differently”…
So, with that said, anyone who converts OSM into a graph structure must choose an intersection tolerance, and for this purpose there should be a standard value, so that we can say “if the gap or overshoot is less than this amount, then it is good data; if it is more, then it is bad data.”
Currently I’m using 1 meter as my tolerance.
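For concreteness, here is a minimal sketch (in Python) of the kind of endpoint-to-segment tolerance check I mean. It is simplified in one big way: it assumes coordinates are already projected to a local metric x/y plane, which raw OSM lat/lon is not. The function names are my own, not from any library:

```python
import math

def point_segment_distance(p, a, b):
    """Distance from point p to segment a-b, all in projected (x, y) metres."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        # Degenerate segment: just a point.
        return math.hypot(px - ax, py - ay)
    # Projection of p onto the line, clamped to the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    cx, cy = ax + t * dx, ay + t * dy
    return math.hypot(px - cx, py - cy)

def endpoint_connections(way_a, way_b, tolerance=1.0):
    """Yield (endpoint, segment_index) pairs where an endpoint of way_a
    comes within `tolerance` metres of a segment of way_b."""
    for endpoint in (way_a[0], way_a[-1]):
        for i in range(len(way_b) - 1):
            if point_segment_distance(endpoint, way_b[i], way_b[i + 1]) <= tolerance:
                yield endpoint, i

# Example: a side street ending 0.5 m short of a main street.
main = [(0.0, 0.0), (100.0, 0.0)]
side = [(50.0, 30.0), (50.0, 0.5)]
print(list(endpoint_connections(side, main)))  # [((50.0, 0.5), 0)]
```

In the real system the candidate segments come from a spatial index rather than a nested loop, of course, but the tolerance comparison is the same.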
Method 1 above is the only one considered a connection by the OSM data model and, AFAIK, by the currently existing OSM-based routing engines.
Method 2 is flagged as an error by most QA tools whenever the two ways are on the same layer, because then you cannot decide whether the ways actually connect and someone forgot the connecting node, or whether they cross on different levels and someone forgot the layer tag.
Method 3 is flagged as a potential error by most OSM QA tools, because ways that come very close without connecting do exist, but editing errors are more common.
The percentages you observed for the different connections do not match my observations; it might be that the area you are working on has particularly bad data.
Good to know.
This is supposed to be the only method for specifying connectivity of ways in OSM, and in regions with a sufficient number of experienced, active contributors, this will be applied correctly in well above 99% of intersections.
Yes, it happens that new contributors don’t understand connectivity requirements, but that’s not the main cause of connectivity errors. These types of errors are very prevalent precisely in those areas where most data is not created by human mappers, but imported.
Are you using import-derived data for testing your system? Say, data from the USA, particularly from one of the parts of the USA where most roads are still TIGER data? That might explain why you observe a number of errors which would be uncharacteristically high for data in regions with a high mapper density.
If you want to build a routing system that works with OSM data even in those parts of the world where our data is mostly broken, then you probably need to use heuristics like that. Naturally, since this is not the intended way of mapping, you won’t find any recommended tolerance values.
But if you are ever going to publish software based on your routing system, please switch the heuristic off at least in those parts of the world that are generally well maintained. It will inevitably introduce some new errors (say, routes jumping over a wall or down a cliff), and I wouldn’t want contributors to feel compelled to “fix” entirely correct data.
Also a piece of more specific advice: If a node or the way containing it has a noexit=yes tag, then you certainly should not apply any proximity-based heuristics. These tags are also used by quality checkers such as JOSM validator to rule out accidentally incorrect connectivity (basically, a mapper can override false positive “these nodes are very close to each other, are you sure the ways are not connected?”-style warnings with that tag), and can therefore sometimes be found on very close, but unconnected, way end nodes.
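Good point. To make that concrete, a guard like the following could sit in front of any snapping heuristic. The `Element` class here is just a hypothetical stand-in for whatever node/way objects a converter uses; the only assumption is that tags are available as a plain dict:

```python
from dataclasses import dataclass, field

@dataclass
class Element:
    """Minimal stand-in for an OSM node or way (hypothetical data model)."""
    tags: dict = field(default_factory=dict)

def snapping_allowed(node: Element, way: Element) -> bool:
    """Apply the proximity heuristic only when neither the end node nor
    the way containing it carries noexit=yes, i.e. when the mapper has
    not explicitly confirmed a dead end."""
    return node.tags.get("noexit") != "yes" and way.tags.get("noexit") != "yes"

print(snapping_allowed(Element({"noexit": "yes"}), Element()))              # False
print(snapping_allowed(Element(), Element({"highway": "residential"})))     # True
```

The same check can also be used the other way around: nodes that pass it but still sit very close to another way are the ones worth surfacing to a human for manual fixing.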
Based on this feedback I’m going to switch back over to the strict method of connections without using heuristics in order to do a more thorough evaluation. If the level of errors overall is much lower than I thought then we’ll just fix them manually. Thanks guys!