New public transport ids/modelling in Switzerland

I mentioned this at the Fondue evening in Zürich last week; just FYI at this stage: SKI+/SBB/BAV have moved to a new stop modelling schema (of interest mainly for buses) that uses different ids from the previous UIC numbers (which are still maintained, though). In particular, it seems that they now model the actual stop locations and not just an overall higher-level entity, i.e. more OSM-like :-).

There is some interest on their side in us including these new ids too, and we (a couple of people from SOSM) will have an initial meeting, likely in January, to work out whether and how to proceed. If we make progress I suspect we’ll have a further meeting with a wider audience, but right now things are not really concrete enough for that to make sense.

And the new IDs are still somewhat based on the UIC numbers, aren’t they? e.g. SLOID ch:1:sloid:2684::1 and UIC 8502684. Or am I mixing something up here?

Yes it seems so.
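
To make the apparent relationship concrete, here is a minimal sketch in Python. It assumes, based only on the example above, that a Swiss stop SLOID has the form `ch:1:sloid:<number>[:...]` and that the UIC number is the country code 85 followed by `<number>` zero-padded to five digits. The function name `sloid_to_uic` is made up for illustration, and the rule may well not hold for every stop.

```python
import re
from typing import Optional

# Assumed shape of a Swiss SLOID: "ch:1:sloid:<number>" optionally followed
# by further ":"-separated parts (e.g. platform-level suffixes).
SLOID_RE = re.compile(r"^ch:1:sloid:(\d+)(?::.*)?$")


def sloid_to_uic(sloid: str) -> Optional[int]:
    """Guess the UIC number for a Swiss SLOID, or return None if it does not fit.

    Assumption (not confirmed by SKI+/SBB): UIC = 85 + five-digit stop number.
    """
    match = SLOID_RE.match(sloid)
    if match is None:
        return None
    number = int(match.group(1))
    if number > 99999:
        # Does not fit into five digits, so the assumed rule cannot apply.
        return None
    return 8500000 + number


print(sloid_to_uic("ch:1:sloid:2684::1"))
```

Non-Swiss IDs such as `de:14713:8098205` simply yield `None` here.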

Very cool! Actually, I started looking into this last January (as in walking over to BAV, chatting with them over coffee, and then doing some actual work). I didn’t quite finish, but let me write a brain dump below, in case someone else wants to take over. :slight_smile: :slight_smile: :slight_smile:

  • SLOIDs and IFOPT IDs: Swiss SLOIDs are actually just the local incarnation of a pan-European effort called IFOPT. In OSM, these new IDs can be tagged as ref:IFOPT, which is currently used on 318K OSM features globally, but only 141 times in Switzerland. In the Wikidata schema, these IDs are modeled as property IFOPT Stop ID (P12393). Note that the same station/platform can carry multiple IFOPT IDs. For example, Basel Badischer Bahnhof has both an IFOPT ID (=SLOID) from SBB/SKI+ and a different IFOPT ID from Deutsche Bahn/Delphi. To my knowledge (which may be wrong), there’s no effort to unify these. Still, both OSM and Wikidata support multiple IDs for the same feature (OSM with semicolon-separated values), so it’s not a problem.
  • Data sources for IFOPT IDs: Last time I checked (a year ago), there was no global aggregation of IFOPT identifiers. Rather, one had to get the data from each country separately, such as OpenTransportData.swiss from Switzerland.
  • Licensing: OpenTransportData.swiss has an unusual licensing model: The data is available for public download, but you need to ask for permission to integrate it into a “database work.” Once the permission is granted, though, that database can freely re-distribute the Swiss transit data under its own terms. For OSM, that permission has already been granted (not sure who had asked for it; perhaps Stefan Keller?). For Wikidata and AllThePlaces, I filed a legal request, which was promptly granted.
  • SBOIDs: In Switzerland, stations/platforms (SLOIDs) refer to their operators via “SBOIDs”. These correspond to railway companies, regional transit agencies, cable car operators, and various other organizations in travel/transit. There are also many SBOIDs for companies that operate no SLOIDs, such as travel agencies or hotel booking sites. In OSM, the operator for a SLOID gets tagged as operator and operator:wikidata. However, the BAV/OpenTransportData.swiss data model maintains entities for individual business units of transit agencies, which seems too fine-grained for either OSM or Wikidata. For example, various business departments and accounting groups of SBB each have their own SBOID; this distinction is irrelevant outside of SBB accounting. (Likewise for Deutsche Bahn and others.) So we need a mapping from fine-grained SBOIDs to coarse-grained operators for OSM and Wikidata, and a way to maintain this mapping. Wikidata seems like a good place for such things, so I added a new property SBOID (P13221) to the Wikidata schema.
  • SBOIDs in Wikidata: Next, I wrote a script that compares Swiss SBOIDs between Wikidata and OpenTransportData.swiss. The script runs once per day in a Wikimedia datacenter, and you can find its output online here. Then, I made an effort to add the Swiss transit operators (i.e., SBOIDs that are actually operating some SLOIDs according to OpenTransportData.swiss) to Wikidata. The intention was to automatically assign operator and operator:wikidata in OSM given a SLOID’s operating SBOID code. I completed this effort up to SBOIDs having >=2 SLOIDs, but did not bother to do it for the remaining micro-agencies (mostly aerial lifts in the Alps). If anyone wants to do it, see this list and clean it up in Wikidata.
  • Deep links by SLOID/IFOPT ID: SBB/SKI+ has a public tool called Atlas to inspect the data for a SLOID. However, it does not support deep-linking by SLOID, and SBB did not want to implement this. So, to make IFOPT IDs clickable on Wikidata, I built a little URL redirector. For IFOPT IDs with a ch: prefix, the tool redirects to the correct Atlas page based on a mapping table downloaded daily from OpenTransportData.swiss. For IFOPT IDs starting with de:, the tool redirects to a random (not very good) German webpage. This URL redirector now runs in Wikimedia’s datacenters, and supports links for SLOIDs/IFOPT IDs such as ch:1:sloid:7000:0:229097 or de:14713:8098205. It would be nice if someone else could extend this tool to support other countries beyond Switzerland and Germany, and possibly also extend it to display something real instead of just a redirect. Pull requests welcome :slight_smile:
  • Swiss stops and platforms converted to OSM schema: As the next step, I wrote a script (see source code) that downloads all stations and platforms from OpenTransportData.swiss. Its output is a GeoJSON FeatureCollection whose features have properties with OpenStreetMap tags, including ref:IFOPT but also, e.g., OSM tags for wheelchair accessibility. To populate operator and operator:wikidata, the script queries Wikidata to build a mapping table keyed by SBOID. That script’s source code is here. It gets run once per week by AllThePlaces, and its output is part of the weekly AllThePlaces build. Unfortunately, though, it looks like my script is currently crashing, so there’s been no output recently. I’ll have a look at this next week; it shouldn’t be hard to fix. Feel free to send pull requests, though. :slight_smile:
  • Conflation with OSM: So we already have the Swiss transit data converted to GeoJSON with OSM tags (if the script doesn’t crash, ahem). The next step would be conflation with OSM. I haven’t looked into this yet. Stefan Keller had a master’s student working on this problem in general, also handling other data from AllThePlaces. They wrote a nice thesis, but not really a working system that could be used in daily production. I’ll start working on a conflator next week, but will initially focus on stores and restaurants, not transit stations. If someone else wants to conflate transit stations, that would be awesome! There’s a nice paper here that uses Random Forests (Machine Learning) for this, but other/simpler approaches might work too.
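
For anyone picking this up, the tagging steps described in the list above can be sketched roughly as follows in Python. This is not the actual AllThePlaces script, just an illustration under assumptions: the `OPERATORS` table, the SBOID strings, the operator name, and the Wikidata value are made-up placeholders, and the helper names `merge_ifopt` and `stop_feature` are hypothetical. It shows two things: collapsing several fine-grained SBOIDs onto one coarse operator, and keeping multiple IFOPT IDs on one feature as a semicolon-separated value.

```python
from typing import Dict, Optional

# Placeholder mapping from fine-grained SBOIDs to a coarse operator.
# None of these identifiers are real; a real table would be built from Wikidata.
EXAMPLE_OPERATOR = {"operator": "Example Transit", "operator:wikidata": "Q_PLACEHOLDER"}
OPERATORS: Dict[str, Dict[str, str]] = {
    "sboid-example-accounting": EXAMPLE_OPERATOR,
    "sboid-example-passenger": EXAMPLE_OPERATOR,
}


def merge_ifopt(existing: Optional[str], new_id: str) -> str:
    """Add new_id to a semicolon-separated ref:IFOPT value, skipping duplicates."""
    ids = [part.strip() for part in (existing or "").split(";") if part.strip()]
    if new_id not in ids:
        ids.append(new_id)
    return ";".join(ids)


def stop_feature(sloid: str, lon: float, lat: float, sboid: str) -> dict:
    """Build a GeoJSON Feature whose properties are OSM-style tags."""
    tags = {"ref:IFOPT": sloid}
    # Unknown SBOIDs simply get no operator tags instead of a wrong guess.
    tags.update(OPERATORS.get(sboid, {}))
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": tags,
    }


feature = stop_feature("ch:1:sloid:2684::1", 8.0, 47.0, "sboid-example-passenger")
print(feature["properties"])
```

The coordinates in the usage line are arbitrary. A conflator would then compare such features against existing OSM objects, which is the part nobody has built yet.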

Anyhow, so much for my brain dump. Hope you’ll find it useful, and good luck!

— Sascha

Is the origin country deducible from the id? (If it isn’t, that’s a minor issue.)
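
Going only by the ch: and de: examples earlier in the thread, a prefix check might look like the Python sketch below. It assumes that every IFOPT ID starts with a two-letter code followed by a colon, which the thread does not confirm for all countries; the function name `ifopt_prefix` is made up. For a border station such as Basel Badischer Bahnhof, which carries IDs from both SBB/SKI+ and Deutsche Bahn, the prefix would presumably reflect who issued the ID rather than where the stop physically is.

```python
from typing import Optional


def ifopt_prefix(ifopt_id: str) -> Optional[str]:
    """Return the leading two-letter code of an IFOPT ID, or None if absent.

    Assumption: IDs look like "<two letters>:<rest>", as in the thread's examples.
    """
    prefix, separator, _rest = ifopt_id.partition(":")
    if separator and len(prefix) == 2 and prefix.isalpha():
        return prefix.lower()
    return None


for example in ("ch:1:sloid:7000:0:229097", "de:14713:8098205", "8502684"):
    print(example, ifopt_prefix(example))
```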

I arranged that literally years ago.

If we actually proceed with adding the ids to OSM objects, a one-time conflation will be necessary, but the more immediate concern is determining which objects are going to get these ids going forward, i.e. which PTv1/PTv2 objects get the id(s). Given that foreign keys are always an issue in OSM, we need to keep this to the minimum required.