Hi everyone.
I’m planning an import of schools in Sweden with data from Skolverket’s API. That’s also why I am keeping this post in English, so I can link it from the import wiki page later on.
The license is CC0, so that should not be a problem.
I’ve written myself a small script that extracts all the information, filters it and creates a GeoJSON file that can be imported into JOSM.
Now my question: Would someone be willing to have a look at this file to manually check some schools in your area to see if all my tags make sense? I did extensive checks myself (and will later also of course publish the script) but I’d like to get some eyes on it already now…
Here’s my preliminary file: schools.geojson - Google Drive
I’m looking at the file now and the first two schools are already in OSM as areas. How will this be dealt with?
Preferably the tags from the imported nodes should be added to the area, if it exists.
There also has to be a search for addresses since the address tags can already be in the school polygon, on individual buildings or as a separate node.
Any thoughts about that?
Yes, of course. The tags from the point data will need to be conflated with pre-existing features. The Conflation and “replace geometry” plugins in JOSM are the right tools for that.
I looked at the dataset in general and found some issues. Here are the comments:
All operators write “Kommun” with a capital letter. Since all operators capitalize the first letter of every word, I assume you have scripted this somehow. For example, every “AB” is written “Ab”.
In “addr:city”, some place names are in ALL CAPS.
isced:level should probably be sorted. There are examples such as “0;1;3;2” and “2;0;1”.
A single school is missing http/https: “www.yrkesgymnasiet.se”
Some schools appear to have several names separated by semicolons. That does not work well for “name”. Extra names must be moved to “alt_name”.
Some schools seem to be combined on one point, e.g.: “Internationella Engelska Skolan Värmdö;Internationella Engelska Skolan Kungsbacka;Internationella Engelska Skolan Solna;Internationella Engelska Skolan Sigtuna”, which sit on a single node (together with two other schools) in Täby.
One school has ref: “21938009;21798613;94206100;98702729;78193353;89487045;60521412;52719739;48773104;49474076;17950126;96392396;94600352;60796098;52798556;28310176;56167054;39701961;28887899;43055932;72624322;60712249;62009354;28004366;46399291;90347955;29794705;27675952;91687391;41600127;48863032;65244856;43553236;61927182;45526612;82589201;10245367;79956863;46133825;44027818;45813524;23680908;53285375;33143689;33016914;97480702;97724398;98702223;22474777;65313627;91306166;75401552”
That type of combined school also has combined grades: “grades=0,1,2,3,4,5,6;7,8,9;4,5,6,7,8,9;0,1,2,3;4,5,6;6,7,8,9;0,1,2,3,4,5”
Then there are schools that are split in the dataset but only exist as a single area in OSM. What do you do then? Example: name=Kyrkebyskolan 4-6 and name=Kyrkebyskolan F-3 in Arvika.
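A few of the points above (unsorted isced:level, the missing URL scheme, several names in “name”) could probably be handled in the generator script. The following is only a minimal sketch: the function names are made up for illustration, it assumes the tags are held in a plain dict, it assumes the first name in the list is the primary one, and it assumes the website actually answers on https.

```python
def sort_multivalue(value: str) -> str:
    """Sort and de-duplicate a semicolon-separated list of integers."""
    parts = {part.strip() for part in value.split(";") if part.strip()}
    # key=int raises ValueError on non-numeric parts, which flags bad input early.
    return ";".join(sorted(parts, key=int))


def ensure_scheme(url: str, scheme: str = "https") -> str:
    """Prefix a URL with a scheme if it has none (assumes the site supports it)."""
    url = url.strip()
    if url.lower().startswith(("http://", "https://")):
        return url
    return f"{scheme}://{url}"


def split_names(tags: dict) -> dict:
    """Keep the first name in 'name' and move the remaining ones to 'alt_name'."""
    names = [n.strip() for n in tags.get("name", "").split(";") if n.strip()]
    if len(names) <= 1:
        return dict(tags)
    out = dict(tags)
    out["name"] = names[0]
    out["alt_name"] = ";".join(names[1:])
    return out
```

Note that when the semicolon list comes from several different schools sharing one coordinate, splitting the name is not the right fix; those need to become separate nodes instead.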
Then I looked at my local schools. They look OK’ish if they are conflated into existing data but here are some comments on specific schools:
“Adolfsfors skola” is misplaced in a residential area (it should be further to the east, by the river bend).
Sulviks skola is in an intersection too far south.
Holmedals skola is way out in the forest, 1.4 km north of its real location.
That’s for my local area, but I randomly looked at some other places and:
Rättviks grundsärskola looks misplaced.
Many schools are of course spot on, but it seems like there are a large number of outliers that are severely misplaced because you don’t have to look for very long to find them.
Wow. Very detailed feedback. I love it. Let me see:
About the grades: Yes the 1,2,3,4… was the easiest to generate from how I get the data from skolverket. I’ll see what I can do.
There’s a general issue in the data, that some values are ALL CAPS and some are not. That’s why I capitalize the values. It seems that I missed some fields and that it has some weird side effects. Maybe I can add some special cases for “kommun” and “AB”.
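As a rough illustration of what such special cases could look like, here is a small sketch. The function name and the two word sets are assumptions for the example, not the actual script; it capitalizes each word but keeps a few words in a fixed case.

```python
# Words that should always be upper case or lower case, regardless of input.
# Both sets are illustrative and would need to be extended from real data.
KEEP_UPPER = {"ab"}
KEEP_LOWER = {"kommun"}


def normalize_case(value: str) -> str:
    """Capitalize each word, except for a few words with a fixed case."""
    words = []
    for word in value.split():
        key = word.lower()
        if key in KEEP_UPPER:
            words.append(key.upper())
        elif key in KEEP_LOWER:
            words.append(key)
        else:
            # str.capitalize() lower-cases the rest, so "ARVIKA" becomes "Arvika".
            words.append(word.capitalize())
    return " ".join(words)
```

One known limitation of this approach: hyphenated names only get their first part capitalized, so those would need extra handling.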
Schools that are on one single point: Yeah, that seems to be an issue with Skolverket’s data. Not sure what to do with this. That seems to be the same issue as with the very long “ref”: That’s nothing that comes from me; it seems that JOSM recognized that a lot of points had the same coordinates and combined them into one.
I’ll try to see how big the problem is before continuing.
Schools that are split into separate entities (example: name=Kyrkebyskolan 4-6 and name=Kyrkebyskolan F-3 in Arvika): That’s also something that comes from Skolverket. I’d say that would be something the importer has to fix manually since I can’t fix this in code.
Misplaced schools: That’s also a problem in the underlying dataset then. I wonder if that’s a showstopper or if this is something that the importing person could fix? @wulfmorn what do you think?
In general, I think I need to clarify that my intention is not to dump the dataset as it is into OSM but to use a manual import (conflation) process, similar to the one that has been used by the NVDB project.
Strömbackaskolan in Piteå: in your data it has five names and five refs. I don’t know if the separate buildings count as separate schools in the Skolverket data, but it looks a bit strange. Same issue as your no. 3 above, I guess. Way: Strömbacka (313931035) | OpenStreetMap
OK great. So what I hear from all of you is that the quality is not great for a direct import.
I just ran the conflation plugin in JOSM with all of the schools in Sweden and this dataset. While I see that there are a lot of problems in the dataset, they are easy to spot when actually looking at the map.
More importantly: There are a lot of smaller schools in villages that seem to be correct (looking at satellite images and other maps for reference) but are not mapped in OSM. So maybe the focus of this manual import should rather only be to import the ones that are not mapped at all instead of updating schools which already are well mapped? WDYT?
If the tagging data is clean and “appealing” then I think it would be a nice dataset to import from (conflate from). Large offsets are corrected by the user, and in return the schools get more complete tagging.
So if you can make sure that:
there are no combined nodes anymore,
the tagging is pretty,
then users can start picking away at these schools.
I would guess that the ref tag would be a unique new element that could be used to detect rate of completion - as well as post-check for schools that have not been imported (could indicate that the school is no longer there).
Speaking of which, the ref looks rather anonymous. Perhaps it should be ref:skolverket to be more clear about its origin.
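To illustrate the idea of using the ref for tracking, here is a minimal sketch. It assumes you already have two collections of ref values, one from the Skolverket dataset and one extracted from OSM; how they are obtained is left out, and the function name is made up.

```python
def import_progress(source_refs, osm_refs):
    """Compare refs in the source dataset with refs already present in OSM."""
    source = set(source_refs)
    osm = set(osm_refs)
    imported = source & osm
    return {
        "imported": imported,        # in the dataset and already in OSM
        "missing": source - osm,     # in the dataset but not yet in OSM
        "unknown": osm - source,     # in OSM but not in the dataset (any more)
        "rate": len(imported) / len(source) if source else 0.0,
    }
```

The “unknown” group is the one worth a closer look after an update of the dataset, since it may point to schools that have closed.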
Since the import is manual I would also like to flag a common issue with schools.
The AREA of the school is amenity=school
It has all contact information, grades etc. tags
The BUILDINGS of the school are building=school
It only has tags about the building, such as levels, roof type etc.
The buildings are usually not named and when they are they are probably called something else than the actual school.
ONLY if the school and the building are the same area should the building have all the tags.
If the school has MULTIPLE AREAS but the same name it should be a multipolygon and all tagging should be moved from the way to the relation.
If a building has MANY amenities (or similar) within the same building it may be relevant to only map the school as a node.
I don’t know the scale of the problem in Sweden because I’ve only worked on it in Norway - but double schools were super common there, and lots of buildings were called the same as the school and often tagged with amenity=school themselves. The problem is when you try to list out schools as, for example, POIs and there are duplicates of the same school. Sometimes there are several duplicates because several buildings have been tagged as the school.
Tag counter:
ONE amenity per school
ONE name per school
JOSM merges nodes with the exact same coordinate when loading a geojson file. The usual fix is to build a set of nodes/coordinates in the generator script as it iterates the POIs and then relocate each new POI a meter or so until there is no conflict.
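A minimal sketch of that fix, assuming the POIs are available as (lon, lat) tuples in degrees; the function name, the eastward direction and the 7-decimal rounding used to detect identical coordinates are choices made for this example only.

```python
import math


def spread_duplicates(points, step_m=1.0):
    """Shift points that share a coordinate until every coordinate is unique.

    points: iterable of (lon, lat) tuples in degrees.
    Each conflicting point is moved east in steps of roughly step_m metres.
    """
    seen = set()
    result = []
    for lon, lat in points:
        key = (round(lon, 7), round(lat, 7))
        while key in seen:
            # One degree of longitude is about 111,320 m * cos(latitude).
            lon += step_m / (111_320 * math.cos(math.radians(lat)))
            key = (round(lon, 7), round(lat, 7))
        seen.add(key)
        result.append((lon, lat))
    return result
```

The first POI at a location keeps its original position; only the later ones are moved, so the order of the input decides which school stays in place.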
If the quality of the POI position/coordinates varies, it might help to geocode using the provided addresses. I often have to do that with POIs. For example, HERE offers a large free quota on its geocoding API. For such a large dataset, it will likely pay off in terms of hours saved during the manual import.
Fixing names to the extent possible in the script usually also pays off. Another benefit of fixing it in the script is that the fix will be reused in next year’s update.
Schools are important POIs and popular routing targets, so this is a commendable project.
I’m now creating a Wiki page with the instructions for the import for everyone to look at and to contribute.
Let me know if you have any objections or other input around the data.
As Skolverket’s geopositions are really unreliable (I emailed them, they know about the problem and said they will not fix it), I went along with the suggestion by @NKA to use a geocoding API.
@Wulfmorn I also added some instructions regarding your comment above to fix tagging when visiting a school. Please have a look at the “workflow” section and adjust as needed.
I checked the schools I know anything about (Karlskoga + Piteå).
No more duplicate refs for the same nodes. One ref per node now.
924 schools are missing “grades”? I suspect that it is missing in the skolverket data too.
This is very well structured. Here are a few comments - hope some of it is useful:
After this discussion in the Swedish community, the import plan (the wiki) is required to be submitted for review on the import mail list. There is no formal approval, but any potential issues will be addressed.
In the tagging section of the wiki there is usually a table which shows which source data is used and how it is tagged, but it is currently lacking. This table will be expected, and it is useful for later reference.
In step 6 of the workflow I would include the following items to clarify the mapping of amenity=school and related tags during import:
Either draw an area around the outdoor perimeter of the school
Or create a node in a central position for the school, ideally the geographical point to which you would like a router to guide you if you selected the school.
Avoid tagging the school building with amenity=school because there are usually multiple buildings at the location which would result in duplication of the school (already included in the wiki).
Include the ref:se:skolverket=* tag (and other tags) even if the school already exists in OSM.
There are often licensing issues when geocoding (for example, Google geocoding is not permitted for OSM). The geocoding you have used here is based on the official Swedish post addresses, but you may want to reduce the geocoding section in the wiki and on GitHub to avoid creating any misunderstandings.
addr:municipality=* is not a standard address tag (even if it exists in the wiki). I propose to rename it to MUNICIPALITY=* and for it to be deleted before uploading. If the information is available, a COUNTY=* tag would also be useful to structure the import.
I guess website=* should have “https”, not “http” (it will be updated by a bot, but might as well fix it up-front).
I think the school names should have better quality before they are imported. My experience is that most users will not fix them during an import, so it needs to be done in the code. The “Tag multiple objects” window in JOSM is useful to get an overview of issues. Examples:
A title conversion seems to have been done, resulting in capital letters in the wrong places, which need fixing, such as “Ock”, “I”, “Skolan”, “Gymnasiet”, “Friskola” etc.
Remove “AB”.
Fix abbreviations, for example “Ihgr”, “Isgr”, “Abb”, “Ess” etc.
Fix spacing in relation to commas.
Fix special cases through a simple conversion table.
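The points above could be combined into one clean-up function in the script. This is only a sketch under several assumptions: the word lists are illustrative and loosely based on the examples above, the abbreviations are assumed to be written in upper case, “AB” is only removed at the end of a name, and the conversion table is left empty because its entries would have to come from a review of the real names.

```python
import re

# Illustrative word lists; extend them after reviewing the real names.
LOWERCASE_WORDS = {"och", "i", "skolan", "gymnasiet", "friskola"}
UPPERCASE_WORDS = {"ihgr", "isgr", "abb", "ess"}
# Conversion table for names that no rule handles: {raw name: corrected name}.
SPECIAL_CASES = {}


def fix_word(word: str, is_first: bool) -> str:
    """Apply the word lists to one word, keeping a trailing comma."""
    core = word.rstrip(",")
    tail = word[len(core):]
    key = core.lower()
    if key in UPPERCASE_WORDS:
        core = core.upper()
    elif key in LOWERCASE_WORDS and not is_first:
        core = key
    return core + tail


def clean_name(name: str) -> str:
    """Clean a title-cased school name."""
    name = name.strip()
    if name in SPECIAL_CASES:
        return SPECIAL_CASES[name]
    # Remove a trailing company suffix such as "AB" or "Ab".
    name = re.sub(r"\s+AB$", "", name, flags=re.IGNORECASE)
    # Normalize spacing around commas: no space before, one space after.
    name = re.sub(r"\s*,\s*", ", ", name)
    words = name.split()
    return " ".join(fix_word(w, i == 0) for i, w in enumerate(words))
```

Lower-casing words like “Skolan” will be wrong for names where the capital letter is part of the official name, so such names are candidates for the conversion table.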
Some of the school addresses do not enable a direct geocoding hit, for example if the address text does not match or if it lacks a house number. In these cases the school will be geolocated to a point which is representing the street, the post code area or the municipality, or similar. It would then be useful for the import to provide a FIXME=* tag which also contains the result type returned by the geocoder (place, locality, street, administrativeArea, addressBlock, intersection, postalCodePoint).
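A small sketch of how this could look in the script. It assumes the geocoder reports its result type as one of the strings listed above and that any other value (for example a house-number hit) counts as precise; the function name and the FIXME wording are made up.

```python
# Result types that indicate an approximate position (taken from the list above).
IMPRECISE_TYPES = {
    "place", "locality", "street", "administrativeArea",
    "addressBlock", "intersection", "postalCodePoint",
}


def add_fixme(tags: dict, result_type: str) -> dict:
    """Return a copy of tags with a FIXME tag when the geocoding hit is imprecise."""
    out = dict(tags)
    if result_type in IMPRECISE_TYPES:
        out["FIXME"] = f"Verify position, geocoded to: {result_type}"
    return out
```

Since FIXME is in upper case, it is easy to spot in JOSM and to strip before uploading once the position has been verified.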
Looking in my surroundings, I found a lot of amenity=school areas that do not refer to a school that is actually in use. But legally, all the regulations concerning school buildings are still in place.
That makes me wonder whether it would be meaningful to add a tag like “active=yes” or something similar to the import.
Or is there another way to tag a school building that is not currently used as a school? Generally speaking, we never know what a municipality is about to do with its buildings. They discuss many school closures, but on the other hand they have used the buildings, for example, to teach language to immigrants, or reactivated old buildings for sudden needs, such as damage to the regular building forcing a quick relocation.