Join osm_ids between points file and multi-poly files


Im trying to do a dump into tableau of the central america data on openstreetmap to create some custom visualisation.
For that, ive converted the .pbf file on central america from openstreetmap into an excel form and now tried to do a simple left join in sql between the dataset files containing points (which has the longitude and lattitude coordinates) and the other dataset containing multipolygons.

However when i do the left join, im using the osm_id across the 2 files to make the join (this seems to be the only unique identifier i could find in the 2 datasets)
On executing the script, the join finds no pairings.
I also tried doing an inner join and got no results.

From what i can see, it seems it cannot find any links between the 2 datasets. Is the osm_id not the right field to be using to make the join?
Is it actually possible to join the 2? If so, how?
Im trying to do this because i want to pull in some info from the multipolygons dataset around, but since that one doesnt have coordinates i have to join it with the points dataset to get the longitude and latitude.

For context, the points dataset has about 800k rows, and the multipolygons dataset has about 1.4million rows.

Appreciate the help many thanks

I’m not familiar with the joins you’re trying to do, but I suspect you may be skipping a step. A multipolygon relation would typically contain ways, not nodes. If you look at one of the multipolygons, you’ll likely see references to the member ways. You’d need to look up those ways first, and then look up the nodes that compose those ways. Trying to jump straight from a relation to nodes won’t work for multipolygons.

Since you seem to be somewhat familiar with SQL and databases, the explanation is simplified: your are missing join conditions and tables.

You need to extract the ways (paths within the multipolygons) as well as the multipolygons and the nodes (points).

The database schema should help with the SQL table connections.

Diagram symbols for table relations

A OSM relation is unordered, mixed types collection of OSM entities, including other relations.

A way is an ordered list of nodes, which are points with latitudes, longitudes and possibly other data.

Multipolygons are a particular [type of relation](Types of relation).

In your situation, the connections are




CURRENT_WAY_NODES->way_id (sequence_id is used to maintain order in a way)/node_id->

CURRENT_NODES with the latitude and longitude directly in the node and the other data in the key(k) and value(v) pair tables.

You also need some logic to compute the areas described by the multipolygon. This logic isn’t simple and has to handle several possibilities.
I think tools like Osm2pgsql do this for you.

Many thanks for the reply guys.
I have to admit, im still a bit stumped.

So in the .pbf file for the Central America region, there are 5 layer files as such:

layer name* (geometry type)*
lines (LineString)
multilinestrings (MultiLineString)
multipolygons (MultiPolygon)
other_relations (GeometryCollection)
points (Point)

the “points” file is the only one with x and y coordinates which also contains ‘osm_id’ field.
The “multipolygons” file has both an ‘osm_id’ field and an ‘osm_way_id’ field.

However when i do a join between the 2 osm_id fields across points file and multipolygons file i dont get a match.

You said before i was missing a step and a table?
Do i need to bring in another one of those tables and do a join with osm_way_id first before trying to join with the osm_id?

Many thanks for your help.

Extract from PBF Format of the missing PBF table:

Another extract from PBF Format:

Relations may contain relations. It is a problem if a relation contains itself directly or indirectly.

A multipolygon is represented by a relation which has among its key value pairs a pair of type=multiipolygon.

A multipolygon contains only ways as I remember. Therefore the Membertype is alwyas WAY and the memid is of the ways.

All the ways in a multipolygon ought closed ways, i.e., form a loop.

The relation’s memid field is equal to the id field of the way id field.

The way refs field contain the ordered list of the node id fields.

The nodes contain the location and other data in key value pairs.

I think this is the source of your confusion. While these fields are labelled the same, they’re not the same IDs.

There are three different object types in the OSM database: nodes, ways, and relations. Each of the object types has its own ID numbering starting at 1. For example, you can have node #1, way #1, and relation #1, and these all refer to different objects in the database.

For the multipolygon relations you’re looking at, each relation has its own ID (“osm_id”) within the relation numbering. This multipolygon relation contains a number of member way objects, each referenced by its ID within the way numbering (“osm_way_id”). Going further, each of these ways contains an ordered list of the nodes that compose the way, each referenced by its ID within the node numbering. Therefore, if you want the nodes that ultimately compose a multipolygon, you’ll first need to do a join to look up the ways that compose the relation, and then look up the nodes that compose those ways.