Mechanical edit proposal: ref=* → trip_ref=* for certain passenger train routes

For public transit route relations like route=train, ref=* is intended to be a route number or identifier. For example, RE 8 is a Regional Express service in Berlin and Brandenburg, Germany. Any train that follows this stopping pattern in either direction will be signed as RE 8, regardless of what time of day it shows up.

However, for some train operators in the United States, we’ve been using schedule numbers instead of route identifiers in ref=*. Certain passenger train operators like Amtrak assign names instead of numbers to routes, but give each scheduled service of the day its own unique number. For example, Amtrak Cascades is a named Amtrak route, which has the following daily trip numbers: 500, 503, 504, 505, 507, 508, 516, 517, 518, 519. Even numbers are northbound trips, and odd ones are southbound. Trips 500 and 508 have the same stopping pattern, so they’re mapped as one relation with ref=500;508. Two return trips also share a stopping pattern, ref=503;505. All other Amtrak Cascades trips have unique stopping patterns and are mapped as individual relations.
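As a rough illustration of how trips sharing a stopping pattern end up in one relation, here is a minimal Python sketch. The stop lists are placeholders (single letters), not real Amtrak Cascades timetable data, and `group_trips_by_pattern` is a hypothetical helper name.

```python
from collections import defaultdict


def group_trips_by_pattern(trip_stops):
    """Group trip numbers by identical stop sequence.

    Returns one semicolon-joined value per distinct stopping pattern,
    i.e. one value per route relation under the current tagging.
    """
    groups = defaultdict(list)
    for trip, stops in trip_stops.items():
        groups[tuple(stops)].append(trip)
    return sorted(";".join(sorted(trips, key=int)) for trips in groups.values())


# Placeholder stop sequences, NOT real timetable data.
trip_stops = {
    "500": ["A", "B", "C"],
    "508": ["A", "B", "C"],  # same pattern as 500, so they share a relation
    "504": ["A", "C"],       # skips B, so it gets its own relation
}

refs = group_trips_by_pattern(trip_stops)
print(refs)
```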

Since these trains have schedule numbers but not route numbers, mappers have been putting the schedule numbers in ref=*, leading to confusion in quality assurance tools (after all, 5;6 does look more number-y than California Zephyr). This is a departure from international norms, as the only foreign train network that follows this tagging scheme is Canada’s Via Rail, and they were likely following our lead on that one. It’s also given us some quite absurd and unmaintainable tags like ref=405;409;417;451;463;465;467;473;475;479;497;4403;4405;4407;4415;6401;6403;6411;6455 for the southbound Hartford Line.

While schedule numbers are valuable information, they probably don’t belong in ref=* since they aren’t route identifiers. If a route has a name, then the ref=* should be the name, or an abbreviation of it (as commonly referred to by locals, or defined by train operators). I’ve gotten messages from other mappers (cc @Mundilfari) who concur with this opinion and would like to have ref=* values match routes rather than trips.

So I’m proposing to move ref=* to trip_ref=*, and assign a new ref=* that matches the name of the route or an abbreviation thereof, for the following train routes and operators:

  • Amtrak
  • Sounder (Sound Transit)
  • Caltrain
  • Altamont Corridor Express
  • Metrolink (Southern California)
  • South Shore Line (Northern Indiana)
  • Virginia Railway Express
  • MARC
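The tag change itself could look something like the following Python sketch. It works on a plain dict of tags rather than any real editing API, `migrate_tags` is a hypothetical helper, and the new value `Cascades` is only an illustrative guess at an abbreviation, not an agreed-upon ref.

```python
def migrate_tags(tags, route_ref):
    """Return a copy of `tags` with the old ref moved to trip_ref.

    `route_ref` is the new route-level identifier (name or abbreviation).
    Relations that already carry trip_ref are returned unchanged so the
    function is safe to run twice.
    """
    new_tags = dict(tags)
    if "trip_ref" in new_tags:
        return new_tags  # already migrated
    old_ref = new_tags.get("ref")
    if old_ref is not None:
        new_tags["trip_ref"] = old_ref  # keep the schedule numbers
    new_tags["ref"] = route_ref         # route identifier replaces them
    return new_tags


before = {
    "type": "route",
    "route": "train",
    "name": "Amtrak Cascades",
    "ref": "500;508",
}
# "Cascades" is an assumed abbreviation used only for this example.
migrated = migrate_tags(before, "Cascades")
print(migrated)
```

Because the function skips relations that already have trip_ref, rerunning it over a partially migrated set would not overwrite the trip numbers.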

Thoughts?

I’m familiar with this numbering system from riding Caltrain from a station that ACE also serves. Caltrain and ACE refer to these as “train numbers” (not to be confused with the numbers assigned to the locomotives). They use the train numbers in timetables and departure boards but not wayfinding signage and not visibly on the trainsets themselves. The closest analogy I can think of is flight numbers in aviation. These numbers aren’t so useful for active wayfinding, but they remain just as useful as the route numbers for distinguishing multiple concurrent routes in a route master relation.

I think it’s fine to shunt these numbers over to a new key, considering the different semantics. Editors will need to add support for trip_ref everywhere they currently support ref. This is especially true of iD, which is trying to avoid perpetuating PTv2’s naming-for-the-editor scheme by falling back to keys such as ref when labeling relations. I think editors will pretty much be the only immediate consumers of this information. Train numbers are relevant to routers when doing trip planning, but the train number you care about when getting a route depends on the time of day, so this information would come from a GTFS feed rather than OSM. Maybe this key could help routers match GTFS entries to relations.

I don’t know if “trip” is the right terminology, but I can’t think of anything more appropriate.

“Trip number” for each journey and “run number” for each vehicle used in a day or shift are quite standard transit operations terms found around the world, for both trains and buses. If your train number combines trip and run, though, something else should be used.
But fundamentally I wonder whether this should be kept at all. OSM is not the best place for full scheduling info. GTFS is where detail down to each service belongs. At most, show the prefix and range.

This will be a longer-term “migration,” Clay. But it will be a welcome one, especially as ref=* tags better harmonize with how they are used more widely in the world (especially Europe and Germany).

As you and I have done a great deal of mutual rail / train improvements in USA, I’m glad to see you proposing this; thank you.

I think I could quickly get used to this, and like many such tag-migration schemes, it gets easier as the ball gets rolling and the momentum picks up. It won’t happen in a week or a month or maybe even a year, but longer-term, yes.

Some wiki documentation wouldn’t hurt.

I’m in favor of this but would like to point to a scheme already used here in Germany and other countries in Europe:

  • ref_trips

is used instead of trip_ref. I don’t know who introduced it first, but we may want to think about migrating to a tag name with global consensus.

Re: GTFS. There is a set of gtfs:* tags already documented on the Wiki:

  • gtfs:feed, the name of the dataset
  • gtfs:route_id, the ID of the route from routes.txt, …
  • gtfs:trip_id, the ID of a single trip
  • gtfs:trip_id:sample, a sample trip_id that serves as a placeholder; other trips with the same shape but different departure times can be derived from it
  • gtfs:trip_id:like, an SQL LIKE string to search for trip_ids with the same shape
    • useful if the trip_ids follow a syntax that allows similar trips to be derived from a substring of the trip_id, as is usually seen in GTFS created with MentzDV software (a service provider for “Transport for London”, DE-BY-MVV, Switzerland, …)
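To show how an SQL LIKE string such as the one in gtfs:trip_id:like could select similar trips, here is a small Python sketch using an in-memory SQLite table. The trip_id values and the pattern are invented for illustration and do not come from any real feed; `trips_like` is a hypothetical helper.

```python
import sqlite3


def trips_like(trip_ids, like_pattern):
    """Return the trip_ids that match an SQL LIKE pattern, sorted."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trips (trip_id TEXT)")
    conn.executemany("INSERT INTO trips VALUES (?)", [(t,) for t in trip_ids])
    rows = conn.execute(
        "SELECT trip_id FROM trips WHERE trip_id LIKE ? ORDER BY trip_id",
        (like_pattern,),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


# Invented trip_ids; a real feed would supply these in trips.txt.
trip_ids = ["12-603-1-T0-001", "12-603-1-T0-002", "12-604-1-T0-001"]

# In LIKE, "%" matches any run of characters and "_" matches exactly one.
matches = trips_like(trip_ids, "12-603-%")
print(matches)
```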

PTNA supports both approaches

I’m in a rush now, I can provide more details later.

I’m happy to steer things towards an already-established global consensus (so, +1). If ref_trips=* is the better key, let’s use it. Clay, I’ll be the first to say “it’s OK” to have reversed the syntax components (and to have skipped the plural that ref_trips=* appears to use). And Toni, I’ll be the first to say “thank you” for calling this to our attention here and suggesting good harmonization. Nice!

Right. Follow something like a database convention so that ref can refer to as many things as you want. I think it should be ref:trip=abc, just as the source tag is being divided into source:names, source:geometry, source:ref, etc.

I don’t claim that this is actually the better key. I’ll be ok with any other, globally harmonized one.

ref_trip= is not ideal, but ref:trip= won’t work at all, because a suffix by convention means it is a variation on ref=, not a ref= of something else. E.g., Trip.com exists, so ref:trip= might be some ID on that website (that’s the dominant use of ref:*=; cf. name:*=). Or, as I said, when the train number contains both run and trip numbers, *:ref:trip= might be used to show the trip part of the train number (this may be compared to addr:*= for different parts of an address).
source:*= has a different logic. It is a namespace used to group such info. In this sense, it is similar to lifecycle prefixes. (I personally don’t like it. It can technically be interpreted as how the tags are recorded in the source=, as in import prefixes, e.g. tiger:*= and naptan:*=.)

Almost all of the additions are jumps, which seem to be imports or mass edits. That’s not a strong case for keeping them. (See ref_trips under Keys on OpenStreetMap Taginfo.)

For completeness, there’s also railway:ref, which sometimes has to be suffixed too.
This station served by Caltrain, ACE, and Amtrak has a different station code according to each. I’m not familiar with any examples of airline-style codeshare agreements among public transit operators in the U.S. that would apply to these route relations, but I think it could happen in Europe.

These were not imports but rather heavy use of the key during a short period of improving bus relations in southern Germany, by a small group of mappers (including me) who had access to a new and valid source of information. The total number is still quite small compared to the total number of route relations in that area.

Are you thinking of something like ‘Bus 603/50/27’ with ref=603/50/27 and ref:MVV=603 and ref:VLK=50 and ref:LAVV=27, where MVV and VLK and LAVV are networks?

If those are route numbers, that would probably be modeled as separate concurrent route relations. Rather, I’m referring to something like how, in aviation, a single flight (trip) might have a different flight number according to Delta than according to KLM. There are a number of air–rail alliances, resulting in the IATA assigning codes to railway stations, but I don’t know if this extends to trip numbers.

Those are route numbers, and network=MVV;VLK;LAVV.

Not sure I fully understand the point you’re making, but yes, there are definitely examples of trains being assigned flight numbers.
One example is the air+rail alliance between Lufthansa and DB in Germany. Lufthansa “flight” LH3424 is really a train. I’m sure there are more cases like that.

It’s particularly helpful for North American train routes. Some routes have a variety of stopping patterns for each individual train, and it’s hard to map them on OSM without keeping track of schedule numbers.

I’m inclined to stick with trip_ref=* here because it matches other types of ref, like loc_ref=* and route_ref=*.

I appreciate adding it, but it won’t always work nicely. Take one issue: the USA often has more limited frequencies, while other countries could have many more numbers. ref=405;409;417;451;463;465;467;473;475;479;497;4403;4405;4407;4415;6401;6403;6411;6455 already has a value of 83 characters, nearing 1/3 of the 255-character limit. In this case, you would be able to split it further by Amtrak and CTrail (should it be?) to make it as short as possible. In other cases, using ranges and rules may be inevitable because of the length constraint, and desirable for ease of editing. Listing them all out is less convenient for humans, inelegant to read, and error-prone to enumerate. The SQL format available from GTFS is not always the simplest solution, is not necessary for every number format, and is not easily understood by most.
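For anyone who wants to check how close a given value is to the limit, a short Python sketch follows. It assumes the OSM API limit of 255 characters per tag value and uses the Hartford Line value quoted in this thread; the variable names are only for illustration.

```python
# Assumed limit: OSM tag values may hold at most 255 characters.
MAX_TAG_VALUE_LENGTH = 255

# Southbound Hartford Line value as quoted in this thread.
value = (
    "405;409;417;451;463;465;467;473;475;479;497;"
    "4403;4405;4407;4415;6401;6403;6411;6455"
)

trip_count = len(value.split(";"))        # number of listed trips
used = len(value)                         # characters in the value only
remaining = MAX_TAG_VALUE_LENGTH - used   # room left before the limit

print(f"{trip_count} trips, {used} characters used, {remaining} remaining")
```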

The way I see it, this isn’t really about adding information (the trip numbers are already present on e.g. the Amtrak routes).

Doesn’t the issue you’re describing (the value of the tag eventually overflowing the max character limit) exist regardless of whether the proposed change is implemented (move the trip numbers to trip_ref=*) or not (keep them in ref=*)?
In my opinion, the proposed change neither improves nor worsens this; strictly speaking it’s a different issue, only coincidentally affecting the same tag(s).
In my opinion this is primarily about clearing the ref=* tag of the trip numbers so ref=* can be used for matching values in both the route master and all member routes.
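A quality assurance check along these lines might be sketched as follows in Python. The tags are plain dicts, the ref value `Cascades` is an illustrative assumption, and `mismatched_members` is a hypothetical helper rather than part of any existing tool.

```python
def mismatched_members(master_tags, member_tags_list):
    """Return indexes of member routes whose ref differs from the route master's."""
    master_ref = master_tags.get("ref")
    return [
        index
        for index, tags in enumerate(member_tags_list)
        if tags.get("ref") != master_ref
    ]


# Illustrative tags only; "Cascades" is an assumed ref value.
master = {"type": "route_master", "route_master": "train", "ref": "Cascades"}
members = [
    {"type": "route", "ref": "Cascades", "trip_ref": "500;508"},  # migrated
    {"type": "route", "ref": "503;505"},                          # not yet migrated
]

mismatches = mismatched_members(master, members)
print(mismatches)
```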

Whether the trip numbers should be kept at all is a discussion worth having in general, but it could be had independently of the changes proposed here. (Personally, I’m with clay_c here: having the trip numbers feels like it can be valuable, at least in the US. I can see this being different in other parts of the world, but other local communities could just choose not to tag trips in their areas.)

I went ahead and made the changes:

I’ll follow up by updating the wiki where applicable.
