Small script for UrbanAtlas data

I made a small Python script to convert some fields from UrbanAtlas ( https://land.copernicus.eu/local/urban-atlas ) into an OSM-style format that can be opened in JOSM.

If anyone has use for it, please go ahead. It is currently adapted to UrbanAtlas 2012, but should be easy to adapt to the 2018 dataset (a rough sketch of the needed changes follows after the script).

(Also, if this type of topic should be in some other part of the forum, please let me know.)

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import sys

import geopandas as gpd

infilename = sys.argv[1]
outfilename = sys.argv[2]

gdf = gpd.read_file(infilename, encoding="utf-8")

attrs = ["landuse", "harbour", "aeroway", "natural", "leisure"]
attrs_to_delete = [
    #    "fid",
    "country",
    "fua_name",
    "fua_code",
    "code_2012",
    "class_2012",
    "prod_date",
    "identifier",
    "perimeter",
    "area",
    "comment",
    "Pop2012",
]

map_attr = "code_2012"
# code; desc; osm-tag
mapper = {
    "11100": {"landuse": "residential"},  # Continuous Urban Fabric (S.L. > 80%)
    "11210": {
        "landuse": "residential"
    },  # Discontinuous Dense Urban Fabric (S.L. : 50% - 80%)
    "11220": {
        "landuse": "residential"
    },  # Discontinuous Medium Density Urban Fabric (S.L. : 30% - 50%)
    "11230": {
        "landuse": "residential"
    },  # Discontinuous Low Density Urban Fabric (S.L. : 10% - 30%)
    "11240": {
        "landuse": "residential"
    },  # Discontinuous Very Low Density Urban Fabric (S.L. < 10%)
    "11300": None,  # Isolated Structures
    "12100": {
        "landuse": "industrial"
    },  # Industrial, commercial, public, military and private units
    "12210": None,  # "Fast transit roads and associated land
    "12220": None,  # "Other roads and associated land
    "12230": {"landuse": "railway"},  # Railways and associated land
    "12300": {"harbour": "yes"},  # Port areas
    "12400": {"aeroway": "aerodrome"},  # Airports
    "13100": {"landuse": "quarry"},  # Mineral extraction and dump sites
    "13300": {"landuse": "construction"},  # Construction sites
    "13400": {"landuse": "brownfield"},  # Land without current use
    "14100": {"landuse": "park"},  # Green urban areas
    "14200": {"leisure": "sports_centre"},  # Sports and leisure facilities
    "21000": {"landuse": "farmland"},  # Arable land (annual crops)
    "22000": {
        "landuse": "orchard"
    },  # Permanent crops (vineyards, fruit trees, olive groves)
    "23000": {"landuse": "meadow"},  # Pastures
    "24000": {"landuse": "farmland"},  # Complex and mixed cultivation patterns
    "25000": {"landuse": "orchard"},  # Orchards at the fringe of urban classes
    "31000": {"landuse": "forest"},  # Forests
    "32000": {
        "natural": "grassland"
    },  # Herbaceous vegetation associations (natural grassland, moors...)
    "33000": None,  # "Open spaces with little or no vegetations (beaches, dunes, bare rocks, glaciers)
    "40000": {"natural": "wetland"},  # Wetland
    "50000": {"natural": "water"},  # Water bodies
}

# JOSM loads polygons faster if they are in WGS84
gdf = gdf.to_crs("EPSG:4326")

# create the OSM tag columns, initialised to None
for attr in attrs:
    gdf[attr] = None

# for each OSM key, map the UrbanAtlas class code to a tag value and write a
# shapefile containing only the features where that key got a value
for attr in attrs:
    temp_map = {}
    for key, valdict in mapper.items():
        if valdict is not None:
            try:
                temp_map[key] = valdict[attr]
            except KeyError:
                pass
    gdf[attr] = gdf[map_attr].map(temp_map)
    gdf[~gdf[attr].isna()].drop(columns=attrs_to_delete).to_file(
        f"{outfilename}_{attr}.shp", driver="ESRI Shapefile"
    )

gdf = gdf.drop(columns=attrs_to_delete)

gdf.to_file(f"{outfilename}_ALL.shp", driver="ESRI Shapefile")

Unless I missed something, it looks as if you are transforming all “Continuous” and “Discontinuous Urban Fabric” classes into residential, but in OpenStreetMap terms these could just as well be retail or commercial landuse? Similarly, “industrial” in this dataset can also mean commercial. From looking at the data in my area, I think it gives a rough impression but does not fit the OSM map well geometrically (there are weird polygons that do not seem to represent anything on the ground). On top of this, the most recent version is 5 years old.
While I could imagine this script being useful for (locally) filling gaps in our landuse data, I do not understand why you would want to have it in JOSM rather than loading it directly into your rendering (or other) toolchain?

Hi, yes, you are right: some areas may be either residential or commercial and so on, so the importer of course needs to check against satellite imagery. Still, it may speed up the editing to simply be able to copy geometries rather than drawing them by hand. There is also some ambiguity when editing “manually” about whether an area is, for instance, residential or commercial, since it may be mostly residential but still contain some commercial parts…
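
For those ambiguous classes, one small addition (just a sketch, not something I have tested) would be to also carry a fixme tag into the output, so the polygons that need a closer check against imagery are flagged once they are in JOSM:

# Sketch: flag ambiguous UrbanAtlas classes with a fixme tag for manual review.
# These lines would go right after the mapper definition in the script above;
# the output loop picks up whatever keys appear in the mapper dicts, so adding
# "fixme" to attrs is the only other change needed.
attrs = ["landuse", "harbour", "aeroway", "natural", "leisure", "fixme"]

mapper["11100"] = {
    "landuse": "residential",
    "fixme": "UrbanAtlas 11100: check imagery, could also be retail or commercial",
}
mapper["12100"] = {
    "landuse": "industrial",
    "fixme": "UrbanAtlas 12100: industrial/commercial/public/military mix",
}

The same could be done for any other class one does not fully trust.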

The main point is of course to use this for areas that are lacking in landuse coverage, not to replace anything already present in OSM.
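
To help enforce that, one could also clip the converted polygons against whatever landuse is already in OSM before opening anything in JOSM. A rough sketch, assuming the existing OSM landuse polygons have been exported to a file that geopandas can read (for example a GeoJSON from Overpass turbo; the file names below are placeholders):

# Rough sketch, separate from the script above: keep only the parts of the
# converted UrbanAtlas polygons that do not overlap landuse already in OSM.
import geopandas as gpd

ua = gpd.read_file("myarea_landuse.shp")  # one of the per-tag files written by the script
osm = gpd.read_file("existing_osm_landuse.geojson").to_crs(ua.crs)

gaps_only = gpd.overlay(ua, osm[["geometry"]], how="difference")
gaps_only.to_file("myarea_landuse_gaps.shp", driver="ESRI Shapefile")

The slivers this leaves behind still need a manual look, of course.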

The point of importing rather than rendering it as an additional layer is to be able to complete OSM in areas where this type of data is missing. For areas that are already well mapped in OSM, there is no reason to use this, of course.

I guess it depends on the area. Around here I think I would have to touch every single node and would be quicker drawing what I want myself, and since the landuse classes don’t match one-to-one you can also expect that a geometry has to be split when it covers more than one OSM landuse. But if nothing was there before, it could be a start :+1:

On the other hand, if you import errors into an area where there are no mappers who care for landuse mapping, they will remain there for a long time and suggest we have data when in fact it is not mapped according to our standards. It can be better to clearly have no data than to provide misleading information (and to be seen as a model to copy by people in the area who are just starting to map).

Yes, there is a delicate balance between the benefit of importing and the risk of errors. On the other hand, OSM is often full of errors, mainly in areas that were edited a long time ago, when satellite imagery had coarser resolution and larger positional errors. I do not suggest we should import things we know are very incorrect, but something like landuse will always have a certain degree of ambiguity, and it will be up to each editor what they consider a suitable choice of landuse category (and so on).

What are “our standards”, exactly? Is there a standard? How accurate must the satellite data or GPS traces be?