Hello everyone! I’ve been working on a plan to import and merge the bus stops from King County Metro’s GTFS feed in order to improve the quality and quantity of bus stop data in the area. I’ve already reached out to folks on the OSM World Discord and in the OSM US Slack and received great feedback from them. I believe this is almost ready to go, so now I’m sharing it here. (Some time in the future I will probably get to also importing the GTFS routes, but this import’s scope is limited to just the stops.)
To facilitate the import, I’ve been making a tool called GTFS Janitor that helps automate the process of downloading, filtering, and matching bus stops to their representations in OpenStreetMap. It also guides the user through manually resolving any ambiguous matches the tool finds. Here’s what it looks like:
The goal is that once this initial import is complete, semi-automated maintenance of the data can be performed on a regular basis to keep it synced with KCM.
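To give a rough idea of the matching step described above, here is a minimal sketch. It is not GTFS Janitor's actual code: the `Stop` class, the `classify` function, the 50 m threshold, and the idea of comparing the GTFS `stop_id` against an OSM `ref=*` tag are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Stop:
    ref: str    # GTFS stop_id, or the OSM node's ref=* value (assumed convention)
    lat: float
    lon: float


def haversine_m(a, b):
    """Great-circle distance between two stops, in meters."""
    r = 6_371_000  # mean Earth radius in meters
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dp = p2 - p1
    dl = math.radians(b.lon - a.lon)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def classify(gtfs_stop, osm_stops, max_dist_m=50.0):
    """Return ("match", node), ("review", candidates) or ("create", None)."""
    by_ref = [n for n in osm_stops if n.ref and n.ref == gtfs_stop.ref]
    if len(by_ref) == 1:
        # Exactly one OSM node carries this ref: treat it as an unambiguous match.
        return ("match", by_ref[0])
    nearby = [n for n in osm_stops if haversine_m(gtfs_stop, n) <= max_dist_m]
    candidates = by_ref or nearby
    if not candidates:
        # Nothing with this ref and nothing close by: a new node is needed.
        return ("create", None)
    # Several nodes share the ref, or only unconfirmed nearby nodes exist.
    return ("review", candidates)


# Illustrative coordinates only, not real KCM stops.
osm = [Stop("1234", 47.6098, -122.3331), Stop("", 47.6099, -122.3330)]
decision, _ = classify(Stop("1234", 47.6097, -122.3331), osm)
```

Anything that lands in the "review" bucket is what a human would resolve by hand; the threshold would need tuning against real data.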
You can learn all about the details and decisions for the import plan on the wiki, including some QA tools and MapRoulette challenges to ensure data integrity: Automated edits/tjhorner-import - OpenStreetMap Wiki
And you can check out GTFS Janitor on GitHub (see the issues for outstanding import-related discussions):
I’m impressed by this tool and look forward to it becoming available for other transit systems over time. Down here in Santa Clara County, California, we investigated using GO_Sync to keep VTA’s ever-changing network up to date, but we couldn’t quite get it to work. I think my local community would be very interested in working with you to refine this tool once you’ve proven it out in King County.
That sounds great! I am definitely interested in making this a general-purpose tool that can work with any agency. Recently I took the first step in making it customizable by ripping out all of the hard-coded King County Metro stuff and introducing import profiles to customize the behavior of the conflation process.
After this import is complete I’d love to work with other mapping communities to get feedback on this spec so that profiles are both easy to write and customizable enough for all situations.
This is great! I can work on the necessary details to get King County Metro added, then send it over to you when it’s ready. It will be a convenient tool to have.
After additional testing, feedback from the community, and some manual cleanup of existing bus stop data, I made some tweaks to reach the right balance of automation and human review. I imported KCM’s latest GTFS feed and ran through a conflation session; here are the resulting osmChanges if anyone wishes to review:
existing-and-new-stops.osc includes changes to update stops matched against existing nodes, creation of nodes for stops that could not be matched, and deletions from the human review process.
disused-stops.osc includes changes to nodes for stops that may no longer be in service (it adds a disused:* lifecycle prefix to these). This one is exported separately as it’s intended to be reviewed by a human with more scrutiny.
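For anyone unfamiliar with what such a change looks like, here is a hedged sketch of applying a disused:* prefix and wrapping the result in a minimal osmChange `<modify>` block. The set of prefixed keys (`highway`, `public_transport`) and the function names are assumptions for illustration; the actual files may prefix different keys and carry more attributes (such as a changeset ID, which editors normally fill in on upload).

```python
import xml.etree.ElementTree as ET

# Assumption: only these keys get the lifecycle prefix; the real plan may differ.
PREFIXED_KEYS = ("highway", "public_transport")


def mark_disused(tags):
    """Return a copy of tags with the main feature keys moved under disused:*."""
    return {
        (f"disused:{k}" if k in PREFIXED_KEYS else k): v
        for k, v in tags.items()
    }


def modify_node(node_id, version, lat, lon, tags):
    """Build a minimal osmChange document that modifies a single node."""
    root = ET.Element("osmChange", version="0.6")
    modify = ET.SubElement(root, "modify")
    node = ET.SubElement(
        modify, "node",
        id=str(node_id), version=str(version), lat=str(lat), lon=str(lon),
    )
    for k, v in tags.items():
        ET.SubElement(node, "tag", k=k, v=v)
    return ET.tostring(root, encoding="unicode")


# Hypothetical node ID, version, and tags, for illustration only.
old_tags = {"highway": "bus_stop", "public_transport": "platform", "name": "Example Stop"}
print(modify_node(1, 2, 47.6, -122.3, mark_disused(old_tags)))
```

Other tags such as the name are left untouched, which is what makes the prefix a comparatively non-destructive change.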
I’m spot checking the disused-stops.osc, and so far they all do appear to be out of service.
One issue I’m noticing is that the physical infrastructure of these stops (the flag/pole, the shelter, the waste basket, the bench, etc.) varies a lot: at some stops everything is still present, while at others it has all been removed.
Because of this, I’m glad the disused stops are separated out because they require, as you say, more scrutiny.
This tool is looking really promising, keep up the great work!
Indeed. I think we would ideally be able to crowdsource verification of the exact state of the stops, through e.g. people completing StreetComplete quests. “Is this bus stop still here?” would be a perfect question to quickly answer while out surveying.
I remember we had a discussion about this on GitHub and agreed that using a tag like fixme=* would be perfect for this kind of thing since many editors and QA tools will flag these for review. But fixme=* is explicitly not for automated edits, and there doesn’t seem to be an alternative tag for “this is probably not here anymore, please check”.
I believe it would be valid to apply the disused:* lifecycle prefix automatically, even though we can’t assume that the physical traces of the stop haven’t been removed. We want to make some sort of change indicating that the stop is no longer in service, since we know that piece of information for certain, and disused:* is the least destructive change we can make to represent that (unless there’s some other lifecycle prefix or tag I’m not aware of).
Since we can’t use fixme=*, we can possibly come up with our own tag and associated MapComplete challenge, or some other way to crowdsource the verification of the specific stop state. Previous imports have used this special tag method, for example the (somewhat notorious) tiger:reviewed=no tag.
Another option: since the wiki shouldn’t be treated as gospel, we (as a community) can evaluate the scope of the disused stop edits and grant an exception to using fixme=* for automated edits if we agree that it’s not too big of an edit to be a nuisance.
I’d love feedback on this point to determine what the best way to deal with out-of-service stops is. I just want to achieve these goals, basically:
Indicate somehow on the stop nodes that they are out-of-service.
Use crowdsourced knowledge to verify the state of each stop.
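As one possible way to support the second goal, surveyors could pull a list of the flagged stops from Overpass. The sketch below only builds the query text; the function name and the bounding box values are illustrative assumptions, and it presumes the stops end up tagged with `disused:highway=bus_stop`.

```python
def disused_stops_query(south, west, north, east):
    """Build an Overpass QL query for disused bus stop nodes in a bounding box."""
    bbox = f"{south},{west},{north},{east}"
    return (
        "[out:json][timeout:25];\n"
        f'node["disused:highway"="bus_stop"]({bbox});\n'
        "out body;"
    )


# Rough, illustrative bounding box; adjust to the area being surveyed.
print(disused_stops_query(47.4, -122.5, 47.8, -122.1))
```

The resulting text can be pasted into Overpass Turbo or sent to an Overpass API endpoint to get the nodes that still need a visit.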
That wiki page gives some keys from national-scale imports as examples to avoid. But a GTFS-based import is likely to be very local in nature, so as long as the local community has a realistic plan to review and resolve these fixmes in a reasonable timeframe, I don’t think there’s a real issue.
Though not an import in a traditional sense, the MapRoulette-based Santa Clara County POI import also has participants adding fixme=* to POIs that we know are present somewhere along a strip mall or inside an office building, but we don’t know exactly where. Some of us periodically go out on field surveys to specifically track down POIs with these tags.
This license is less restrictive than the one on King County’s website. Many of the requirements here don’t apply to “end users,” but it’s sort of unclear what that means in this context. I can email ST/KCM for clarification and explicit permission.
Others have used KCM GTFS data in the past, so I’ve just been operating on the assumption that someone looked at the license and determined it’s compatible. I’ve attempted to contact the users who’ve imported/edited from the GTFS feed in the past about this, but they’ve been unresponsive.
I’ll email Sound Transit about this. They actually contribute to OSM themselves, so I would be surprised if they say no.
Did some digging… looks like King County granted permission to use “any King County-derived data in OpenStreetMap” back in 2012, according to this wiki page: Contributors - OpenStreetMap Wiki
So I think we’re good to go on the licensing front.