Hm, reading more: First, apparently this is a known problem; I hadn’t seen all the Wiki pages and comments about it.

Second, reading the code: It seems that the problem is all the data correlation that osm2pgsql has to do. If it could be just a few giant COPY statements, rather than that plus a truly massive quantity of prepared select’s, it would be hugely faster.

Some thoughts on that: First, why is osm2pgsql single-pass? I feel like it’d be a great deal faster to scan the file once and fill up the cache with nodes from one region and process them, then repeat for a second region, etc. This would require stitching the regions together in the end, but I’d expect that to be considerably cheaper?

Second, if osm2pgsql has to do all of this correlation to convert between formats, maybe this isn’t the best format for this data? Does anyone publish equivalent PostgreSQL dumps of this data?; that seems like the simple solution to me. Also, has anyone considered denormalizing the OSM format a bit, just enough that it can actually be read sequentially without needing to refer to past parts of the file?