I’m new to postgres/postgis (I’m comfortable with mariadb/mysql, but they’re based on very different human-factors criteria). While trying to load US regions, osm2pgsql 0.81 stumbled over a duplicate key in ways_pkey and gave up. Is this correctable by me, and if so, how?
There is nothing PostgreSQL-specific happening here. The primary key of the table is ways_pkey, and the same feature evidently appears in two separate US regional files. Loading the second one fails because it breaks the primary key constraint.
Your alternatives are to clean the duplicates of the same features out of the regional datasets (though I do not know how to do that), or to import the datasets into separate tables with different prefixes and combine the data later in PostgreSQL. Or you can use the North American data file, which should not contain duplicate features: http://download.geofabrik.de/north-america-latest.osm.bz2.
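For the separate-tables route, something along these lines could work. The database name, prefixes, and file names here are just placeholders; check your own setup and osm2pgsql options:

```shell
# Import each regional extract under its own table prefix
# (the -p/--prefix option changes the planet_osm_* table names).
osm2pgsql -d gis -p east us-east.osm.bz2
osm2pgsql -d gis -p west us-west.osm.bz2

# Later, combine the matching tables inside PostgreSQL, e.g.:
#   CREATE TABLE us_line AS
#     SELECT * FROM east_line
#     UNION
#     SELECT * FROM west_line;
```

The UNION (rather than UNION ALL) quietly drops rows that are identical in both extracts, which is exactly the duplication you ran into.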
Thanks for your prompt response.
I think what I was perhaps really trying to ask (I shouldn’t try to make sense when I’m tired) is whether there’s some way to keep osm2pgsql from throwing up its hands and quitting when it can’t store a duplicate record. From a human-factors standpoint, that’s rather awful behavior.
Slurping up North America is a fallback possibility, but it also represents a lot of work, since I don’t want Canada and certainly don’t want Greenland (which is less a part of logical North America than Mexico is).
I would try to push the North American data through Osmosis, using the USA polygon for filtering: http://wiki.openstreetmap.org/wiki/Osmosis
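As a rough sketch of that filtering step (the file names, and the existence of a usa.poly boundary file in Osmosis .poly format, are assumptions):

```shell
# Clip the continental extract to a USA boundary polygon.
# Osmosis detects the .bz2 compression from the file extension.
osmosis --read-xml file=north-america-latest.osm.bz2 \
        --bounding-polygon file=usa.poly \
        --write-xml file=usa.osm.bz2
```

The resulting usa.osm.bz2 can then be fed to osm2pgsql as a single file, sidestepping the duplicate-key problem entirely.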
The error has not much to do with osm2pgsql, except that it creates the database tables so that osm_id is used as a primary key. No database accepts duplicate primary keys. On the other hand, osm_id suits a primary key very well, better than some local PK, because it makes updating the database from diff files safe. Moreover, I believe that osm2pgsql does transactions in big chunks to make the process fast. If one feature in a transaction is a duplicate, the whole transaction fails. By doing inserts row by row it would be possible to discard only the duplicates, but that would be awfully slow and you would not like it either.
I guess that the state extracts are made with buffered polygons, to guarantee that all the right features are selected for sure, and that leads to duplicates along the borders. The Geofabrik extract system is not tuned for combining state extracts into a valid USA extract. Start from a bigger dataset, select what you need from it, and you should be happy.
Thanks. That sounds like good advice.
Nope, it’s standard database technology, not bad human factors: it’s the sort of thing that stops your bank from debiting the same transaction from your account twice. osm2pgsql is part of a toolchain, and behaves as it does because the data is supposed to be clean before loading.
Duplicate ways occur between state extracts because highways cross borders and so belong to both states. You need to merge the data before importing, and Osmosis is the tool to use.