Exporting streets from OpenStreetMap

I am using the following command to extract street names from OpenStreetMap .osm files into a CSV file, which I then import into MySQL:

./osmfilter /home/vmosin/geo/europe-latest.osm --keep="addr:country= and addr:city= and addr:street=" --ignore-dependencies --drop-relations |./osmconvert - --csv="\+ @otype \+ @lon @lat addr:street \+ addr:city \+ addr:country" -o=europe.csv

This command works perfectly except for the encoding: I get gibberish instead of Unicode characters (such as German umlauts). How can I solve this?
Thank you in advance!

(Just in case: this is the German subforum, we all speak German here.)

Did you double-check the encoding of your europe-latest.osm?

What happens when you don't use a pipe, but instead write the output of osmfilter to disk and then feed that file to osmconvert? This is only practical for smaller extracts, I think, since otherwise you get huge intermediate files, but it can show where exactly the umlauts break.

I have only just started learning German :)

How can I check it? I downloaded it from http://download.geofabrik.de/

I will check the raw output now…

So far the problem occurs only with German umlauts… I tried extracting streets from a Russian city and everything looks fine.

That's strange. osmfilter doesn't do anything with the character encoding; it leaves everything as it is.

What you might want to do: convert the .osm files to .o5m format before filtering. It's much faster then.

Actually it's the subforum for Germany, though it's also used for more general discussions in German. Since German umlauts appear mostly in Germany, this subforum seems a reasonable choice to me :wink:

@nKognito: Maybe start with a test file containing only a single node, and check after every step whether it is correctly encoded (e.g. 0xc3 0xa4 for ‘ä’) after

  • saving the .osm file,

  • osmfilter,

  • osmconvert without using the pipe,

  • osmconvert with using the pipe,

  • importing to MySQL?
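The byte check in the list above could be done on the single-node test file with a small shell helper. This is only a sketch: classify_umlaut is a made-up name, and the 0xe4 test is only meaningful for a tiny file whose sole non-ASCII character is ‘ä’, since 0xe4 can also start a longer UTF-8 sequence.

```shell
#!/bin/sh
# classify_umlaut FILE (hypothetical helper): report how 'ä' is stored in FILE.
# Intended for a tiny test file; the whole hex dump is held in a variable.
classify_umlaut() {
    # Dump the file as hex bytes separated by single spaces, all on one line.
    hex=$(od -An -v -tx1 "$1" | tr -s '\n' ' ')
    case "$hex" in
        *"c3 83 c2 a4"*) echo "double-encoded" ;; # UTF-8 read as Latin-1, then re-encoded
        *"c3 a4"*)       echo "utf-8" ;;          # the expected two-byte sequence
        *"e4"*)          echo "latin-1" ;;        # single byte, as in ISO-8859-1
        *)               echo "not found" ;;
    esac
}
```

Running it on the output of each step (the saved .osm file, the osmfilter output, the CSV) should show the first step at which the answer stops being "utf-8".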

I bet it’s a MySQL import encoding problem. I only remember that encodings are a mess with MySQL (names encoding, connection encoding, collation settings etc.), but not how I solved it once.
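If the bytes are still correct in the CSV, one thing to try is stating the encoding explicitly during the import. A minimal sketch, not a confirmed fix for this case: it assumes a database named osm and a table named streets whose columns match the CSV (both names are hypothetical), and the tab separator that osmconvert writes by default. build_load_sql is a made-up helper that only prints the statement.

```shell
#!/bin/sh
# build_load_sql FILE (hypothetical helper): print a LOAD DATA statement that
# tells MySQL the file is UTF-8. The table name "streets" is an assumption.
build_load_sql() {
    cat <<EOF
LOAD DATA LOCAL INFILE '$1'
INTO TABLE streets
CHARACTER SET utf8mb4
FIELDS TERMINATED BY '\t';
EOF
}

# Possible usage (database name "osm" is an assumption), also setting the
# connection character set:
#   build_load_sql europe.csv | mysql --local-infile=1 --default-character-set=utf8mb4 osm
```

The table columns themselves also need a UTF-8 character set, otherwise the umlauts can still be mangled when they are stored.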

The Geofabrik extracts (like XML downloaded with JOSM) use UTF-8 encoding:

<?xml version='1.0' encoding='UTF-8'?>
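To confirm this on a downloaded file, the declaration can be read from the first line, and the actual bytes can be validated with iconv. A rough sketch; both function names are made up, and the iconv check reads the whole file, so it is better run on a small extract than on the full Europe file.

```shell
#!/bin/sh
# declared_encoding FILE (hypothetical helper): print the encoding named in
# the XML declaration on the first line, if there is one.
declared_encoding() {
    head -c 200 "$1" | sed -n "1s/.*encoding=['\"]\([^'\"]*\)['\"].*/\1/p"
}

# is_valid_utf8 FILE (hypothetical helper): succeed only if every byte
# sequence in FILE is valid UTF-8. iconv fails on the first illegal sequence.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}
```

For example, `declared_encoding extract.osm` followed by `is_valid_utf8 extract.osm && echo ok`, where extract.osm stands for whatever small file is being tested.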

In some terminals I have files with UTF-8 characters but can’t display them… Are you sure your bash language settings are UTF-8? What’s the result of

set | grep "LC_\|LANG"

Could you show me the result of

file europe.csv
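A quick way to answer the terminal question in a script is to ask the locale for its character map. A small sketch; locale_is_utf8 is a made-up name, and the accepted spellings of "UTF-8" are a guess at what different systems print.

```shell
#!/bin/sh
# locale_is_utf8 (hypothetical helper): succeed if the current locale's
# character map is UTF-8.
locale_is_utf8() {
    case "$(locale charmap 2>/dev/null)" in
        UTF-8|utf-8|utf8) return 0 ;;
        *)                return 1 ;;
    esac
}

if locale_is_utf8; then
    echo "terminal locale is UTF-8"
else
    echo "terminal locale is not UTF-8, umlauts may only look broken on screen"
fi
```

If the locale is not UTF-8, the file itself may be fine and only its display in the terminal is wrong.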

You’re right. I just wanted to open the door to speaking German if he wanted to, because it sometimes happens to me that I write English posts in a German forum when I don’t realize I’m in a German-speaking community and my default internet language kicks in. :slight_smile: