How to get predominant land use for given area?

Hi @all!

First of all I’d like to apologize if my question is stupid, in the wrong place or has been asked before. At least I couldn’t find anything via search.

Part of my master’s thesis is to develop an Android app that plays music from different playlists, depending on what kind of area the user is driving through, e.g. residential, woods, …

That is how my academic supervisor thought it should operate (it’s in German but I think you get the idea):

He also told me that there is an OpenStreetMap API against which I could fire a set of coordinates and miraculously get back an XML data stream with all the information I would need. While I found Overpass and Xapi, I can’t for the life of me figure out how to get the data I would need to calculate those percentages, or whether that is even possible.

In addition, it seems that I can’t rely solely on the landuse tag. As I found here, a “bunch of trees standing next to each other” could be tagged either landuse=forest or natural=wood.

By the way, the app should only differentiate between city/residential, countryside and woods. So I guess everything else should be divided into those categories.

If anyone could point me in the right direction it would be most appreciated.

Is it a requirement that you use OpenStreetMap data and APIs? I think that data like Corine Land Cover (CLC) would suit your needs better. CLC is well documented, the data covers the whole EU, the same methodology is used everywhere, and every place has been classified into one major land cover class. The resolution should be good enough for your needs.

http://www.eea.europa.eu/publications/COR0-landcover/at_download/file
http://www.eea.europa.eu/publications/tech40add/at_download/file

As far as I know there is no public service or API delivering CLC data, but setting one up is not a big deal if demo scale is enough. It could be based on either vector data or classified raster data. I can suggest some easy alternatives if you are interested.

OSM data is richer and also more accurate in some places, and it could be used for verifying and enhancing CLC data. You could, for example, check the distance to the closest fuel station or tourism attraction from OSM data and fine-tune the standard playlist accordingly. One alternative could be to query the pixel colour values of the OSM slippy map. Some averaging and filtering would be needed, but at least the residential and industrial areas should be easy to find: http://osm.org/go/0MVjfpI
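Just to illustrate the tile-colour idea, here is a rough Java sketch. The tile maths is the standard slippy-map formula; the coordinates and zoom level are only placeholders, the actual colour classification is left open, and a real application should identify itself properly to the tile server.

import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.net.URL;

public class TileColourProbe {

    // Standard slippy-map conversion from longitude/latitude to tile numbers.
    static int lonToTileX(double lon, int zoom) {
        return (int) Math.floor((lon + 180.0) / 360.0 * (1 << zoom));
    }

    static int latToTileY(double lat, int zoom) {
        double latRad = Math.toRadians(lat);
        return (int) Math.floor(
                (1 - Math.log(Math.tan(latRad) + 1 / Math.cos(latRad)) / Math.PI) / 2 * (1 << zoom));
    }

    public static void main(String[] args) throws Exception {
        double lat = 52.2744, lon = 10.5252;   // placeholder position inside the example bbox
        int zoom = 14;

        // Note: the public OSM tile servers expect a meaningful User-Agent in real use.
        URL url = new URL("https://tile.openstreetmap.org/" + zoom + "/"
                + lonToTileX(lon, zoom) + "/" + latToTileY(lat, zoom) + ".png");
        BufferedImage tile = ImageIO.read(url);

        // Average the RGB values over the whole 256x256 tile.
        long r = 0, g = 0, b = 0;
        for (int y = 0; y < tile.getHeight(); y++) {
            for (int x = 0; x < tile.getWidth(); x++) {
                int rgb = tile.getRGB(x, y);
                r += (rgb >> 16) & 0xff;
                g += (rgb >> 8) & 0xff;
                b += rgb & 0xff;
            }
        }
        int n = tile.getWidth() * tile.getHeight();
        System.out.printf("average colour: r=%d g=%d b=%d%n", r / n, g / n, b / n);
        // Deciding "greyish = built-up, greenish = wood/field" from these averages is the
        // filtering step mentioned above; the thresholds depend on the map style.
    }
}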

Thanks for the quick answer, JRA!

It’s not specifically stated that the data must be retrieved from OpenStreetMap, but it must come from an already existing source on the interweb. So setting up my own little web server is out of the question.

Furthermore, the whole idea behind using OSM to determine the type of area is that my academic supervisor thought it would be the easiest way. For implementing the part of the app that calculates the percentages of land use around the user, I have two weeks at most. My supervisor told me that this would be the easy part.

Analyzing the pixel color values could be an option. But I need to get a new reading every ten seconds, so downloading an image and processing it would use up too many resources. Besides, the map is rendered from raw data, so there should be a way to get only the information I need from that data.

Ignoring for a minute that I can’t rely only on the landuse tag, and doing just that for argument’s sake: would the following be a feasible approach?

Query a bbox:

 http://www.overpass-api.de/api/xapi?*[landuse=*][bbox=10.52412,52.27387,10.52635,52.27505]

Calculate, for each resulting polygon, the area of its intersection with the bbox
The tag value with the highest total area is the predominant land use in that bbox

If that were a feasible approach, I could search the wiki for other tags I could use and include them in the algorithm.
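In (hypothetical) code, I imagine the aggregation step looking roughly like this. This is only a sketch to make the question concrete: it assumes the polygons have already been built with some geometry library (JTS here, with its classic com.vividsolutions packages) and grouped by their tag value.

import com.vividsolutions.jts.geom.Envelope;
import com.vividsolutions.jts.geom.Geometry;
import com.vividsolutions.jts.geom.GeometryFactory;

import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PredominantLanduse {

    // polygonsByTag maps a tag value (e.g. "residential", "forest") to all polygons
    // carrying that tag. Each polygon is clipped to the bbox, the clipped areas are
    // summed per tag, and the tag with the largest total wins.
    static String predominant(Map<String, List<Geometry>> polygonsByTag, Envelope bbox) {
        Geometry bboxGeom = new GeometryFactory().toGeometry(bbox);

        Map<String, Double> areaByTag = new HashMap<String, Double>();
        for (Map.Entry<String, List<Geometry>> entry : polygonsByTag.entrySet()) {
            double total = 0;
            for (Geometry polygon : entry.getValue()) {
                total += polygon.intersection(bboxGeom).getArea();
            }
            areaByTag.put(entry.getKey(), total);
        }

        String best = null;     // null means: no landuse polygon intersects the bbox
        double bestArea = 0;
        for (Map.Entry<String, Double> e : areaByTag.entrySet()) {
            if (e.getValue() > bestArea) {
                bestArea = e.getValue();
                best = e.getKey();
            }
        }
        return best;
    }
}

The areas would come out in square degrees, but as long as the bbox is small and I only compare them against each other, that should not matter.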

Any thoughts?

I fear that you will have to look at other tags than just landuse: natural for sure, but those two may be enough for a proof of concept. For a more accurate analysis some other keys would probably be needed too; historic, sport, leisure and amenity can all contain big landuse-like polygons. And then you will need a default landuse, because huge areas have no landuse polygon at all.
You will need a program or library for resolving polygons from ways and relations and for intersecting them with your “region of interest” geometry. Let’s ignore for now that sometimes, somewhere, polygons may overlap.

You do not have very much time. I am certainly biased because GDAL is the tool I know best, but you may be interested in giving it a try. Here is an example which runs your xapi query, finds two polygons with a landuse tag and prints their areas and some other information. Both polygons happen to be residential. You can do the same from your own computer; all you need is GDAL version 1.10 or a dev version from trunk. The command must be given on one line.

ogrinfo "/vsicurl_streaming/http://www.overpass-api.de/api/xapi?*[landuse=*][bbox=10.52412,52.27387,10.52635,52.27505]" -dialect sqlite -sql "select landuse, st_area(geometry) from multipolygons"
Had to open data source read-only.
INFO: Open of `/vsicurl_streaming/http://www.overpass-api.de/api/xapi?*[landuse=*][bbox=10.52412,52.27387,10.52635,52.27505]'
      using driver `OSM' successful.

Layer name: SELECT
Geometry: None
Feature Count: 2
Layer SRS WKT:
(unknown)
landuse: String (0.0)
st_area(geometry): Real (0.0)
OGRFeature(SELECT):0
  landuse (String) = residential
  st_area(geometry) (Real) = 2.62613661499937e-006

OGRFeature(SELECT):1
  landuse (String) = residential
  st_area(geometry) (Real) = 2.71783572899986e-005


With SQL you can do everything that is supported by SQLite and SpatiaLite. For your thesis I suppose you could do quite a lot without heavy extra programming.

The areas are in square degrees, which is an odd unit, but I don’t think it is an issue because you are interested in finding the dominant landuse within a small area. If you want, you can reproject the polygons into any other projection before calculating the area by adding “ST_Transform” to the SQL.

I quite agree with JRA: whilst it is possible to extract landuse data from OSM, the whole process can be very complicated. You should be aware that in the recent past the research group of Professor Alexander Zipf at Heidelberg (http://www.geog.uni-heidelberg.de/gis/index_en.html) has investigated using data-mining techniques for determining some landuse classes from OpenStreetMap: it’s definitely a research problem! I’d also endorse JRA’s other comments about processing the data (e.g., GDAL or SQLite), but they have more experience with these tools than I do.

However, you don’t need very elaborate transformations to reduce the available information to a discrete set of landuse classes. A good place to start would be to look at the relationship between CORINE (https://wiki.openstreetmap.org/wiki/Corine_Land_Cover#Tagging) and Urban Atlas (https://wiki.openstreetmap.org/wiki/User:SK53/Urban_Atlas, derived from https://wiki.openstreetmap.org/wiki/WikiProject_Poland/Urban_Atlas).

I found the following rules of thumb work quite well:

  • Assume that farmland is the default (least likely thing to be mapped)
  • Implement a virtual Painter’s Algorithm for choosing one type of landuse over another (this is hard to do in your set-up, but I think you could hack something simple which demonstrates the principle; see the sketch after this list). I don’t think the rules I used are that well documented, but there may be something in my slides here: http://2011.sotm-eu.org/slides/38_JerryClough_UrbanAtlas_SK53.pdf. (The problem you have is determining whether polygons overlap, which is complicated processing.)
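To make the painter’s algorithm point a bit more concrete, here is a rough Java sketch; the paint order below is invented purely for illustration and is not the set of rules I actually used:

import java.util.Arrays;
import java.util.Collection;
import java.util.List;

public class LandusePrecedence {

    // "Paint order": later entries are drawn on top, i.e. they win when polygons overlap.
    // This particular ordering is only an example.
    private static final List<String> PAINT_ORDER = Arrays.asList(
            "farmland",        // the assumed default / background
            "forest",
            "residential",
            "industrial",
            "retail"
    );

    // Of all the classes found at one location, return the one painted last (on top).
    // Unknown classes never win; if nothing is mapped at all, fall back to farmland.
    static String resolve(Collection<String> classesFound) {
        String top = "farmland";
        for (String cls : classesFound) {
            if (PAINT_ORDER.indexOf(cls) > PAINT_ORDER.indexOf(top)) {
                top = cls;
            }
        }
        return top;
    }
}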

You will have to put a range of values into your main Overpass query, but that shouldn’t be too bad if you restrict the size of the bbox. If you want more info, email me directly at SK53.osm at gmail.com and I can probably find all the combinations I used for Urban Atlas (which gives you, IIRC, 15 categories).

I wonder whether Overpass will be fast enough to deliver the data (although every 10 seconds seems overkill to me; every minute is perhaps more realistic). You could of course grab a bigger polygon (perhaps in the direction of travel) and slice it up later, which might reduce any latency introduced by Overpass. Anyway, I find what you are trying to do rather interesting.

Sk53

A big “Thank you” to both of you guys!

Upon receiving the assignment I thought it would be fairly easy. Once I started looking at OSM, that changed quite a bit. And it changed a lot more when I realized that I had, rather stupidly, been thinking of nodes, edges, polygons, coordinates, areas and distances the way you would in math class: on a flat piece of paper. It simply didn’t cross my mind that the math for distances on a sphere is a little different from that on a flat surface, let alone on a sphere which the universe has pressed a little out of shape. OK, I knew that the math for a sphere is different, but I was thinking too “small scale”, because in my day-to-day life everything is meters and kilometers and not coordinates on a giant squeezed ball.

Furthermore, I think I gave you guys the wrong impression of what my assignment is about. It’s not so much about what is around me in the real world, but more about showcasing location awareness, particularly based on user-generated content from the Web 2.0 (or arguably Web 3.0). Therefore it is not too big a problem if not everything is tagged or if there are some inconsistencies in the tagging. If I can show the basic principle behind the idea with landuse and natural, it would obviously be possible to refine the algorithm with more tags, but that is not required for the assignment.

After a little chat with my academic supervisor, the “every 10 seconds” went out the window. We agreed on fetching the data only a few seconds before the currently playing song ends, i.e. every 3 to 5 minutes, depending on the duration of the song. If no landuse or natural tag is found at that moment, the current playlist simply continues. And because the bbox is only 100 meters in diameter, I don’t even have to consider the curvature of the earth; it wouldn’t make much of a difference.
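For completeness, this is roughly how I intend to build that small bbox around the current position. It is a simple flat-earth approximation (the constants are the usual metres-per-degree values and the example coordinates are placeholders), so it is only valid for boxes of about this size:

public class SmallBbox {

    // Returns {minLon, minLat, maxLon, maxLat} for a square box of the given diameter
    // (in metres) centred on lat/lon. One degree of latitude is roughly 111,320 m and
    // one degree of longitude shrinks with cos(latitude); good enough at ~100 m.
    static double[] bboxAround(double lat, double lon, double diameterMetres) {
        double half = diameterMetres / 2.0;
        double dLat = half / 111320.0;
        double dLon = half / (111320.0 * Math.cos(Math.toRadians(lat)));
        return new double[] { lon - dLon, lat - dLat, lon + dLon, lat + dLat };
    }

    public static void main(String[] args) {
        double[] box = bboxAround(52.27446, 10.52524, 100);
        // Order matches the XAPI bbox parameter: left,bottom,right,top
        System.out.printf("bbox=%.5f,%.5f,%.5f,%.5f%n", box[0], box[1], box[2], box[3]);
    }
}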

So basically all of the “hard parts” went out the window. :slight_smile: While I totally agree with SK53 that accurately determining the land use around you is interesting, it turns out that accuracy isn’t what matters for my little assignment. But I think that maybe, after my thesis, I could look at the topic again and come up with an app that people would actually want to use. Maybe even something that could give a little back, data-wise, to the OSM project. Because, let’s face it: a free map that anyone can improve is cool stuff! I live in a one-way street whose direction was changed decades ago, and Google Maps Navigation still wants me to drive the wrong way!

One last question: I looked at GDAL and SpatiaLite. Both could do everything I would need for my app, but they are very complex. I also searched Google for libraries that would fit my needs with less overhead, but so far I have only found libraries that either generate graphical representations of OSM data or are just as complex as those mentioned above. While I’m a big fan of SQLite and have developed a corporate-wide information system with it, for my purpose it seems a bit complicated. After all, I don’t need to store anything; I only need to process the data I get at any given moment. Translating to SQL wouldn’t be difficult, but if anyone knows a Java/Android library that can do the following natively in Java, I would be very happy if they would point me in the right direction:

  • create polygons from coordinates
  • calculate unions and intersections of those polygons
  • calculate area for polygons
  • optional: calculate bbox or return coordinates for distances in meters

Thanks in advance and thank you very much, JRA and SK53! Both of you really helped me out a great deal.

Hi,

JTS is one candidate: http://tsusiatsoftware.net/jts/main.html. The jar file seems to be about 680 kB. Perhaps you could play a bit with OpenJUMP, which is built on top of JTS, test the functions with the GUI and have a look at the source code. The OpenJUMP Plus version can even read OSM data directly. OSM support is included only in the snapshots at the moment: http://sourceforge.net/projects/jump-pilot/files/OpenJUMP_snapshots/.

Thanks, JRA!

That was exactly what I was thinking of. I tested JTS with its TestBuilder GUI and fed it both polygons from the XML file above. Guess what: it matched your GDAL area figures to the last digit! It obviously can’t read OSM XML directly, but it should be fairly easy to put all nodes into a HashMap and then generate polygon objects from the way part of the XML. As I understand it, nodes always come before ways in OSM XML. From there on it’s only a matter of calling some of JTS’s methods.
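In case it helps anyone later, here is a bare-bones sketch of what I mean. There is no error handling, the file name is a placeholder, the package names are those of the classic com.vividsolutions JTS releases, and it only looks at closed ways carrying a landuse tag:

import com.vividsolutions.jts.geom.Coordinate;
import com.vividsolutions.jts.geom.GeometryFactory;
import com.vividsolutions.jts.geom.Polygon;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

import javax.xml.parsers.DocumentBuilderFactory;
import java.io.File;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class OsmXmlToPolygons {

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(new File("xapi-result.osm"));

        // 1. All <node> elements first: id -> coordinate (x = lon, y = lat).
        Map<Long, Coordinate> nodes = new HashMap<Long, Coordinate>();
        NodeList nodeElems = doc.getElementsByTagName("node");
        for (int i = 0; i < nodeElems.getLength(); i++) {
            Element n = (Element) nodeElems.item(i);
            nodes.put(Long.parseLong(n.getAttribute("id")),
                    new Coordinate(Double.parseDouble(n.getAttribute("lon")),
                                   Double.parseDouble(n.getAttribute("lat"))));
        }

        // 2. Then the <way> elements: collect the referenced coordinates and the landuse tag.
        GeometryFactory gf = new GeometryFactory();
        NodeList wayElems = doc.getElementsByTagName("way");
        for (int i = 0; i < wayElems.getLength(); i++) {
            Element way = (Element) wayElems.item(i);

            String landuse = null;
            NodeList tags = way.getElementsByTagName("tag");
            for (int j = 0; j < tags.getLength(); j++) {
                Element tag = (Element) tags.item(j);
                if ("landuse".equals(tag.getAttribute("k"))) {
                    landuse = tag.getAttribute("v");
                }
            }
            if (landuse == null) continue;   // only interested in landuse ways here

            List<Coordinate> ring = new ArrayList<Coordinate>();
            NodeList nds = way.getElementsByTagName("nd");
            for (int j = 0; j < nds.getLength(); j++) {
                Element nd = (Element) nds.item(j);
                ring.add(nodes.get(Long.parseLong(nd.getAttribute("ref"))));
            }
            // A closed way repeats its first node at the end, which is exactly
            // what JTS expects for a LinearRing.
            Polygon polygon = gf.createPolygon(
                    gf.createLinearRing(ring.toArray(new Coordinate[0])), null);
            System.out.println(landuse + ": area = " + polygon.getArea() + " square degrees");
        }
    }
}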

I guess (with the help of both you guys) it’s almost as easy as I initially thought. :smiley:

Thanks again!

Good. Perhaps you also want to check how many JTS features have been ported to JavaScript: https://github.com/bjornharrtell/jsts.