OpenStreetMap Simple Feature Access and Spatial Extensions

daveb1034 · January 14, 2016, 8:21pm

Good evening. I am currently preparing to write a dissertation for my MSc in GIS. The main focus of my proposal is to investigate whether the OSM model could be migrated into a structure that is OGC Simple Feature Access compliant and allows the use of spatial extensions. I am planning on using PostgreSQL with PostGIS.

I primarily use ArcGIS for analysis and mapping and I love using OSM data in my workflows. I have had issues importing into ArcGIS and rely primarily on the OpenStreetMap editor tools written by ESRI professional services. These are OK for editing but really inefficient for analysis and display, especially over larger areas.

I am keen to hear any ideas / opinions from the community in order to shape the research.

I am anticipating that the migration and maintenance of the data will require some custom code, and I will post this all to my GitHub site.

I look forward to any comments you may have.

Dave

mboeringa · January 16, 2016, 6:11pm

I don’t know if you have any real understanding of ESRI technology, but basically, once you have used the ArcGIS Editor for OpenStreetMap, and put your data by means of ArcGIS for Server / ArcSDE in a spatial database, whether a File or Enterprise Geodatabase, the data is essentially OGC Simple Feature compliant (as far as there is actually a true standard, each vendor and spatial database has in fact its own implementation and idiosyncrasies).

If you want to have some better understanding of ESRI technology, I wrote and posted two PDF documents that I published on ESRI’s GeoNet. Some of it is outdated by now, especially some things concerning CAD compatibility, and the Spatial Data Server component mentioned in the original document probably was the shortest lived ESRI product ever, as after just one release as a separate product, it got fully integrated with ArcGIS for Server at the next release, but I think it is still relevant enough to be a good read, especially in combination with the second small “future” document highlighting some of the changes. You can find them here on GeoNet:

“The ESRI Geodatabase Framework”
https://geonet.esri.com/message/416634#416634

“The ESRI Geodatabase Framework”
Future developments at ArcGIS 10.2 and 11 - kind of a supplement to the first document, correcting or predicting some (future) changes.
https://geonet.esri.com/message/416639#416639

What issues do you have importing? I have been working on a personal project for the past 3 years creating an ArcGIS Renderer for OpenStreetMap, and put both my renderer and ESRI’s editor through their paces by importing multi-GB OSM data extracts downloaded from Geofabrik using the Load OSM File tool of the toolbox, resulting in some cases in File Geodatabases of well over 100 GB (the largest one being DACH - Germany, Austria, Switzerland) with no real problems, except possibly considerable processing times when using a traditional hard drive.

Although the details are pretty outdated by now, and I developed it into a style based multi-scale renderer by now, you can still see the first announcement of my ArcGIS Renderer and render results in this thread:
http://forum.openstreetmap.org/viewtopic.php?id=26451

What do you mean with “inefficient”? There are two aspects you really need to take care of if using larger extracts. The first is setting appropriate min-max display scales in ArcMap or ArcGIS Pro. This is no different than what all the Open Source renderers and styles like OpenStreetMap-carto do. You really cannot use millions of complex vector objects without limiting what is displayed. This has been a key aspect of the development of my personal - still private since it is not finished - renderer. The second aspect is indexing. By default, the ArcGIS Editor only creates a spatial index, and an index on the OSMID field. If you use complex SQL statements as part of the Definition Query property of a layer, or in Query Layers, you really need to index the relevant attribute fields in your database using the Add Attribute Index (http://pro.arcgis.com/en/pro-app/tool-reference/data-management/add-attribute-index.htm) tool of ArcGIS to maintain performance. In some cases, you may also find some use for the Sort (https://desktop.arcgis.com/en/desktop/latest/tools/data-management-toolbox/sort.htm) tool’s spatial defragmentation options (e.g. PEANO sorting), although from what I have seen so far, the benefits are not as big as those of indexing, primarily since data in OSM seems quite spatially clustered already (sometimes due to extensive imports, e.g. whole cadastral building data of countries like the Netherlands and France).
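To make the indexing point concrete without needing ArcGIS, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in database. It is not the Add Attribute Index tool itself; the table name `ways`, its columns and the index name `idx_ways_highway` are all made up for the illustration. It only shows the general effect: the same attribute filter goes from a full table scan to an index lookup once the filtered field is indexed.

```python
import sqlite3

# In-memory stand-in for a feature table; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ways (osm_id INTEGER PRIMARY KEY, highway TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO ways (osm_id, highway, name) VALUES (?, ?, ?)",
    [(i, "primary" if i % 50 == 0 else "residential", f"way {i}") for i in range(1, 5001)],
)

# The kind of attribute filter a definition query would apply.
QUERY = "SELECT osm_id, name FROM ways WHERE highway = 'primary'"


def query_plan(connection, sql):
    """Return SQLite's own description of how it would execute the query."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(str(row[-1]) for row in rows)


plan_before = query_plan(conn, QUERY)  # no attribute index yet: table scan
conn.execute("CREATE INDEX idx_ways_highway ON ways (highway)")
plan_after = query_plan(conn, QUERY)   # the planner can now use the index

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan differs between SQLite versions, so the sketch prints it rather than assuming a particular text.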

As said, I actually have no real idea what code would be needed for the specific target you set (get OSM data into PostGIS). ArcGIS for Server / ArcSDE already does this in combination with ESRI’s Editor…

Anyway, one last write-up by me you may like to know about as it is not well documented: if you have ArcGIS Standard license minimum, you can have a free “ArcSDE Personal Server”, as it is actually part of your license (not well known this, but it really is, and even for ArcGIS for Home Use). You are limited though to SQL Server Express and its max. capabilities… so putting data in PostGIS by means of this, is unfortunately not an option. It is a great playground though for getting to know ESRI Enterprise Geodatabase functionality.

I have described in detail how to set this up here on GeoNet:
https://geonet.esri.com/message/118404#118404

Yes, I know, if you don’t want, nor have the possibility of using ArcGIS for Server, then using Query Layers (http://desktop.arcgis.com/en/desktop/latest/map/working-with-layers/creating-a-query-layer.htm) in ArcGIS to view and use the data, and non-ESRI tools to put the data in PostGIS, is another option, but you will be limited in some functionalities of ArcGIS (e.g. no versioned editing), as the Query Layers don’t support the ESRI geodatabase model.

Marco

daveb1034 · January 17, 2016, 1:19pm

Marco,

Thank you for your excellent post and feedback. I have been working with ESRI software for the past 8 years so I am pretty familiar with the capabilities.

Some of the issues I have been having with the OSM Editor tools relate to the time taken to process, as you rightly point out. The biggest extract I have imported was the whole of Africa (from Geofabrik), and this took 7 days to load, populate additional attributes and construct a Network Dataset.

The renderer results you put up in your related threads are excellent. Have you published the models / scripts? I wasn’t able to see a link anywhere. A friend and I have been working on a set of layers for ArcGIS based on the Humanitarian OSM layer. This was a reverse engineering of the CartoCSS style here. The layers are here. They are still a work in progress and require a lot of work to finish.

I agree that the data is in ESRI’s implementation of SFA when loaded using these tools. The main issue is that you’re presented with three feature classes, one of which maintains every node in the extract so that the data can be edited and uploaded back to OSM. This leads on to the main issues I have been having, which relate to drawing speed and accessing the attributes based on the definition queries required for each layer. I believe the most efficient way would be to extract set themes on loading to minimise the number of features in each layer.

My main aim of the research is to determine whether the whole planet could be maintained in a theme-based (i.e. transportation, population, …) data model in PostGIS. This would then remove the preprocessing step and allow users to either view the data directly in QGIS or use query layers in ArcGIS.

I am also interested to see whether this sort of structure could be easily maintained and kept in sync with the core OSM database, giving users an alternative way of accessing this excellent data source.

Dave

mboeringa · March 2, 2016, 9:18am

Sorry for the delay in responding. No, you’re right, there is not yet anything to find, as I haven’t published it yet. The development has been a major undertaking, and it would be really hard to publish a non-finished product. ModelBuilder doesn’t lend itself very well to cooperative development either, so I’ve decided to make sure it is as close as possible to a real finished product before anything is released.

Yeh, well, tell me, I have been working on this for three years… (and enjoying gorgeous maps at the same time)

You are right ESRI’s Editor maintains all nodes (just like all other editors). However, the “osmSupportingElements” field, with values “yes/no” determines whether a node is simply a non-tagged “supporting” node of a way, or has tags of its own and is a “real” feature worth rendering. You can therefore use this field to filter out the bulk of the performance killing nodes, which probably means getting rid of some 90% of all nodes.
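As a rough illustration of that filter, here is a small Python sketch. The node records are invented sample data, and the `supporting_flag` helper only mimics the yes/no logic described above; it is not the Editor's actual code or schema. In ArcGIS itself the equivalent would presumably be a definition query on the osmSupportingElements field.

```python
# Invented sample nodes; in the Editor these would be rows of the point feature class.
nodes = [
    {"osm_id": 1, "tags": {}},
    {"osm_id": 2, "tags": {"amenity": "cafe", "name": "Corner Cafe"}},
    {"osm_id": 3, "tags": {}},
    {"osm_id": 4, "tags": {"highway": "traffic_signals"}},
]


def supporting_flag(node):
    """Mimic the yes/no field: 'yes' for an untagged node that only supports a way."""
    return "no" if node["tags"] else "yes"


def renderable_nodes(all_nodes):
    """Keep only nodes that carry tags of their own and are worth rendering."""
    return [node for node in all_nodes if supporting_flag(node) == "no"]


kept = renderable_nodes(nodes)
print([node["osm_id"] for node in kept])
```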

I actually use this in my ArcGIS Renderer for the same reasons. I also implemented a flexible approach where users can choose between rendering from 1) the three base tables unmodified 2) tables without supporting elements and 3) thematic layers / tables like you suggest. In fact, if choosing to render to dedicated thematic tables only, with the current advanced style, you end up with close to 400 layers / tables in your database, because that is how many I have defined for the renderer! (and that was an absolute necessity for the advanced rendering you see).

I have little doubt this is possible. In fact, the mere existence of the OpenStreetMap project and multiple global renderings, is proof of it. I don’t completely understand what you mean with “no preprocessing”. You will need to pre-process the data if you desire to create a “theme based data model in postgis”…

However, more generally speaking, dealing with ArcGIS, you need to make a choice, as there are two alternative strategies possible:

1) Create an ordinary “non-Geodatabase” type spatial database in PostGIS. This database is not Geodatabase aware, and can thus not be used in any (versioned) editing workflow in ArcGIS, except through non-versioned editing as Feature Services in ArcGIS. You would most likely create such a database using osm2pgsql, and access the data through read-only Query Layers in ArcGIS, giving you a lot of flexibility in defining your queries. In this scenario, you would NOT use the ArcGIS Editor for OpenStreetMap at all, as that creates ESRI Geodatabases! This scenario will allow minutely updates, as you would essentially be running the ordinary database software stack almost any other OSM website is using, so you can use the current tools for synchronization and updates. In this scenario using osm2pgsql to create and maintain your database, it would be best to implement HStore (or JSONB) key-value storage from the beginning, as that will allow easy access to an arbitrary set of keys. Otherwise you run into the issue the Carto team currently faces, that is the need for an entire new database import to implement HStore and allow more flexible OSM key access.
2) Use ESRI’s ArcGIS Editor for OpenStreetMap to create an ESRI Geodatabase. This is the approach I have taken. I use the Editor’s functionality to flexibly create an arbitrary database schema. Since the Editor contains its own variant of an “HStore” like key-value storage, you have access to all keys. A disadvantage of this approach is that you probably won’t be able to do something like minutely diffs on a global scale easily. Although there is a “diff” tool in the Editor’s toolbox, I don’t think it is yet capable of global scale automated minutely processing (I must admit I only looked very superficially at it, but my first impressions were not positive in this respect). Also, if you update from a secondary geodatabase rather than going the “diff” approach, it would require very specific ESRI experience to properly implement the geodatabase replication workflow that is most likely needed to sync updates to the render database. In my own ArcGIS Renderer for OpenStreetMap, I haven’t yet worked out a real updating workflow, and certainly not a minutely one. My current approach is simply to do an entire database re-load. As my renderer makes a database re-load a really painless and easy experience (just a lengthy one!), I won’t dive into how to maintain a “minutely” diff processing flow in an ESRI Geodatabase. It would be really hard to implement I think, a major undertaking (but again, I must admit I need to look better at the “diff” tool of the Editor).
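To illustrate why key-value storage matters in the first scenario, here is a minimal Python sketch. It does not use osm2pgsql or PostgreSQL; `FIXED_COLUMNS` and both import functions are hypothetical and only model the difference between a schema with a fixed column list and one that also keeps all tags in a key-value field.

```python
# Hypothetical column list chosen at import time (stand-in for a fixed table schema).
FIXED_COLUMNS = ("highway", "name")


def import_fixed(tags):
    """Keep only the pre-selected keys; every other tag is dropped at import."""
    return {column: tags.get(column) for column in FIXED_COLUMNS}


def import_with_key_values(tags):
    """Keep the fixed columns and also store all tags in one key-value field."""
    row = import_fixed(tags)
    row["tags"] = dict(tags)
    return row


osm_tags = {"highway": "cycleway", "name": "Canal Path", "surface": "asphalt", "lit": "yes"}

fixed_row = import_fixed(osm_tags)
kv_row = import_with_key_values(osm_tags)

# A style that later wants "surface" can only get it from the key-value row;
# the fixed-column row would need a new import with an extended column list.
print(fixed_row.get("surface"), kv_row["tags"].get("surface"))
```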

Well, yes, I think there are possibilities in the ArcGIS product line for this, but setting it up won’t be that easy. See the two scenarios above. It seems you have chosen scenario 1) I discussed there, which will be more in line with current approaches, and thus make true syncing and minutely updates a more realistic scenario.

Geonick · March 6, 2016, 1:04am

Hi Dave,

mboeringa is right that once you have OSM data in ArcGIS it’s more or less compliant with OGC’s Simple Features Access (SFA).

But be aware that the core issue is that OSM has a special data model which needs to be mapped to the quite different model used in GIS: the OSM model is based on topological geometry and Entity-Attribute-Value (key-value) storage, whereas GIS databases are relational and have SFA geometry (types point, linestring, polygon, multipolygon…).
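A minimal Python sketch of such a mapping, turning OSM-style ways into SFA-style WKT text, might look like this. The sample nodes and ways are invented, and `is_area` is a deliberately crude, hypothetical rule; real import tools use much more elaborate logic to decide whether a closed way is a polygon.

```python
# Invented sample data: node id -> (lon, lat), and ways as ordered node references.
nodes = {1: (4.0, 52.0), 2: (4.1, 52.0), 3: (4.1, 52.1)}
ways = {
    10: {"refs": [1, 2, 3], "tags": {"highway": "residential"}},
    11: {"refs": [1, 2, 3, 1], "tags": {"building": "yes"}},
}


def is_area(tags):
    """Hypothetical, simplified rule for treating a closed way as a polygon."""
    return tags.get("area") == "yes" or "building" in tags or "landuse" in tags


def way_to_wkt(way, node_table):
    """Resolve node references to coordinates and emit SFA-style WKT."""
    coords = [node_table[ref] for ref in way["refs"]]
    text = ", ".join(f"{lon} {lat}" for lon, lat in coords)
    closed = len(coords) >= 4 and way["refs"][0] == way["refs"][-1]
    if closed and is_area(way["tags"]):
        return f"POLYGON (({text}))"
    return f"LINESTRING ({text})"


for way_id, way in ways.items():
    print(way_id, way_to_wkt(way, nodes))
```

Note what is lost in the output: the WKT holds only coordinates, so the node ids, and with them the knowledge that both ways share the same nodes, are gone.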

So any import of OSM data into a GIS needs a mapping usually with information loss. That’s why it’s difficult to edit OSM data unless you have dedicated editors like iD and JOSM.

There are several other tools and services outside OSM Editor for ArcGIS, like osm2pgsql http://wiki.openstreetmap.org/wiki/Osm2pgsql , OGR http://www.gdal.org/drv_osm.html or Spatialite https://www.gaia-gis.it/fossil/spatialite-tools/wiki?name=OSM+tools .

And this is an incomplete list of data services which offer download of GIS formats (Shapefiles) which implies this mapping: http://download.geofabrik.de/ , http://export.hotosm.org/ and soon http://download.bbbike.org/osm/ and http://giswiki.hsr.ch/Osmaxx .

Now when I look at your GitHub repo https://github.com/daveb1034/OpenStreetMap-ArcGIS you seem to be interested in styling, which is another issue. Any style is based on a data model (where there’s no common one as explained above).

Unfortunately there’s no common styling language either (except OGC’s SLD/SE). But there’s one thing you can do: Publish your point symbols as SVG and TTF. That’s what we’re going to do too in the Osmaxx project (https://github.com/geometalab/osmaxx-docs/ ).

mboeringa · March 6, 2016, 9:51am

Good additional remarks. Totally agree.

The remark about the “core issue” regarding the gap in spatial data modelling, also in relation to editors, is actually the prime reason why ESRI’s ArcGIS Editor for OpenStreetMap doesn’t support editing of OSM (multipolygon) relations. Many people who try the tool and have worked with iD or JOSM are baffled by this and don’t understand why there is this limitation in the ArcGIS Editor, but I think it would be really hard to implement in the type of relational SFA database that ESRI’s Geodatabases represent. OSM “multipolygons” (relations) in an ESRI geodatabase are actually derived features, built from their primitives (nodes and ways) into an SFA geometry, just like osm2pgsql does for the render database. In fact, that the ESRI Editor allows editing at all, albeit only of the OSM “primitives” of nodes/points and ways/polylines, is quite remarkable. It is a kind of “hybrid” database model, and a bit of an exception in the world of OSM.

People often also don’t understand that OSM’s main edit database is in fact no OGC SFA implementation and certainly no true PostGIS database containing (multi)polygons at all! As far as I can tell, the edit database’s PostgreSQL relational model is completely custom to OpenStreetMap (it has taken time for me to realize this as well…).

SimonPoole · March 6, 2016, 3:16pm

It is not only “no true PostGIS” DB; it doesn’t use or require PostGIS at all.

Simon

mboeringa · March 6, 2016, 8:11pm

Yes, thanks, that was actually what I was trying to say…

daveb1034 · March 29, 2016, 3:40pm

mboeringa, Geonick, Simon

I want to thank you all for the comments. They have really helped. I have been doing a lot more reading and trying to get to the root of what I want to achieve.

Having discussed the ArcGIS editor tools with the developers at ESRI, I have more of an understanding of the issues they had to deal with just to be able to edit the OSM primitives.

I really want to try and assess whether it would be possible to migrate the existing method of storing and editing OSM from the relational structure to a fully spatially enabled database that allows full SFA functionality.

There are many tools to load data into PostGIS and other spatial databases. The main ones I have used are osm2pgsql and the ESRI tools.

The main issue I want to try and avoid is a preprocessing step, i.e. having to load the data into a spatial database in order to carry out any spatial functions on it.

Mapping all the data across to a new structure is a massive task and one that would likely cause controversy and may end up moving away from one of the core concepts of OSM being the free tagging of features.

Whilst the ESRI tools do this, they do not scale to large datasets very well.

From my research so far I think editing the data model in a fully migrated dataset will have issues in updating the core OSM database. This is due to the requirement to maintain a list of all node ids that form the features and needing to reference these in any edit call to the API. Updates from OSM to the new structure should be easier to handle through the use of the osmid and version numbers.
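The OSM-to-new-structure direction could be sketched roughly as below. This is only an outline under my own assumptions: the shape of the change records and the `apply_changes` function are hypothetical, and a real implementation would also have to rebuild the geometry of every feature that references a changed node.

```python
def apply_changes(local, changes):
    """Apply OSM change records to a local feature store.

    local maps (element_type, osm_id) -> {"version": int, "tags": dict}.
    A change is applied only if its version is newer than the stored one.
    Returns the keys that were created, updated or deleted.
    """
    touched = []
    for change in changes:
        key = (change["type"], change["osm_id"])
        current = local.get(key)
        if change.get("deleted"):
            if current is not None and change["version"] > current["version"]:
                del local[key]
                touched.append(key)
        elif current is None or change["version"] > current["version"]:
            local[key] = {"version": change["version"], "tags": change["tags"]}
            touched.append(key)
    return touched


# Invented sample data.
local = {("way", 100): {"version": 2, "tags": {"highway": "primary"}}}
changes = [
    {"type": "way", "osm_id": 100, "version": 3, "tags": {"highway": "secondary"}},
    {"type": "way", "osm_id": 100, "version": 2, "tags": {"highway": "primary"}},  # stale, skipped
    {"type": "node", "osm_id": 7, "version": 1, "tags": {"amenity": "cafe"}},
]

touched = apply_changes(local, changes)
print(touched)
```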

I hope this has provided a bit more clarity on what I am trying to achieve and let me say again thank you for all of your comments.

Dave

Gertjan_Idema · March 30, 2016, 9:50pm

Hi Dave,

Both the Topological model and the Simple Feature model have their pros and cons. For OpenStreetMap, the pros of the topological model outweigh the cons by far. It would be very complex, if not impossible, to represent the full OSM database in a Simple Feature model.

One important issue: Routing.
In the topological model, connected ways are related because they share the same node (point) object. This knowledge is used to efficiently calculate a route between two locations.
In the Simple Feature model, there is no easy method to find out whether two ways are connected. You could state that two points are connected if they have the same coordinates, but that information is hard to retrieve efficiently and even harder to maintain. If a way moved by only a tenth of a millimetre, its node would no longer have the same coordinates as the related node in the connected way, and the connection would be lost for the router. Adding a tolerance to solve this would introduce new problems. A nearby node could mean that the ways are connected, but it might as well belong to the opposite lane, or to an overpass.
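The difference can be shown with a small Python sketch. The data is invented and the two functions are simplified illustrations, not how any particular router works: one finds connections through shared node ids, the other has only coordinates to compare, with an optional tolerance.

```python
from itertools import combinations


def connected_by_node_id(ways):
    """Topological model: two ways connect when they reference the same node id."""
    return {
        (a, b)
        for (a, refs_a), (b, refs_b) in combinations(sorted(ways.items()), 2)
        if set(refs_a) & set(refs_b)
    }


def connected_by_coordinates(lines, tolerance=0.0):
    """Simple Feature model: only coordinates exist, so compare vertices pairwise."""
    def close(p, q):
        return abs(p[0] - q[0]) <= tolerance and abs(p[1] - q[1]) <= tolerance

    return {
        (a, b)
        for (a, pts_a), (b, pts_b) in combinations(sorted(lines.items()), 2)
        if any(close(p, q) for p in pts_a for q in pts_b)
    }


# Topological data: road and side street share node 3; the overpass has its own nodes.
ways = {"road": [1, 2, 3], "side_street": [3, 4], "overpass": [5, 6, 7]}

# The same features as independent linestrings. The side street's first vertex has
# drifted by a tiny amount, and the overpass has a vertex almost on top of the road.
lines = {
    "road": [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
    "side_street": [(2.0, 0.0000001), (2.0, 1.0)],
    "overpass": [(1.0, -1.0), (1.0, 0.00005), (1.0, 1.0)],
}

print(connected_by_node_id(ways))                          # real junction only
print(connected_by_coordinates(lines))                     # exact match: junction lost
print(connected_by_coordinates(lines, tolerance=0.0001))   # tolerance: overpass joins too
```

With exact matching the real junction disappears, and with a tolerance the overpass is wrongly treated as connected to the road, which is exactly the trade-off described above.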

Another issue: Border/Coastlines
In the Simple Feature model, border lines and coastlines would each be represented by one single object. Downloading an area of 1x1 kilometre at the coast of Canada would involve downloading the following large objects:
The coastline of North-America
The borderline of Canada
The borderline of the state
etc. involving a lot of data you don’t need
Also, if a state borders another state, moving a part of the border in one of the states wouldn’t automatically update the border of the neighbouring state. That would be up to the user, resulting in a lot of gaps and overlaps between states. The same goes for land-use.
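The shared border problem can be sketched in a few lines of Python. The two square "states" and their node ids are invented for the illustration; the point is only that in the topological model both outlines reference the same node, while in the Simple Feature model each polygon holds its own copy of the coordinates.

```python
# Invented data: two adjacent square "states" sharing the border between nodes 2 and 3.
nodes = {
    1: (0.0, 0.0), 2: (1.0, 0.0), 3: (1.0, 1.0),
    4: (0.0, 1.0), 5: (2.0, 0.0), 6: (2.0, 1.0),
}
state_a = [1, 2, 3, 4, 1]  # rings stored as node references
state_b = [2, 5, 6, 3, 2]


def resolve(ring, node_table):
    """Turn a ring of node ids into a list of coordinates."""
    return [node_table[ref] for ref in ring]


# Simple Feature style: each polygon gets its own coordinate copy.
sf_a = resolve(state_a, nodes)
sf_b = resolve(state_b, nodes)

# An editor moves the shared border point (node 2).
nodes[2] = (1.2, 0.0)

# Topological model: both outlines follow automatically.
topo_a = resolve(state_a, nodes)
topo_b = resolve(state_b, nodes)

# Simple Feature model: the editor updates state A only; state B keeps the old point.
sf_a = [(1.2, 0.0) if point == (1.0, 0.0) else point for point in sf_a]

print(topo_a[1], topo_b[0])  # same point in both outlines
print(sf_a[1], sf_b[0])      # the two polygons now disagree about the border
```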

Because of these and other issues, the OGC Simple Feature model is in my opinion just not suited for OpenStreetMap.
The OGC Topological model can handle the issues above, but I’m not sure if it could represent the ‘unlimited’ number of tags on an object that OSM allows.

Gertjan Idema

daveb1034 · April 3, 2016, 7:13pm

Gertjan,

Thank you for the comments. The more I read into the design of OpenStreetMap and the wide variety of uses for the data, the more I realise that Simple Feature on its own may not be the way to go. I am planning on looking at whether Simple Feature combined with the use of topology in PostGIS will have any benefits, but I am not discounting that the solution may be to use the existing data model, and that I will need to use tools to process the data for use as I currently do.

I agree the issue of routing is an important one and I wouldn’t want to lose that capability when migrating to Simple Feature. The unlimited tagging and no fixed ontology is one of the core principles of OSM and I have a feeling that moving to a wholly Simple Feature approach may compromise that.

Thanks again

Dave