Proposal to replace documented syntax for telephone extensions

Last November, the phone=* syntax was unilaterally redefined to accommodate telephone extensions and more theoretical details. The changes are technically problematic in several respects. I propose to undo this change and welcome the community’s ideas on what to replace it with.

History

Since 2008, the documentation for phone=* has recommended formatting the value according to the International Telecommunication Union’s E.123 standard for human-readable phone numbers. The documentation also noted that separating groups of digits by spaces is consistent with the German DIN 5008 standard for German-language publications, and that separating groups of digits by hyphens is consistent with RFC 3966. (This was a bit of a non-sequitur, because E.123 tolerates hyphen separators according to national norms, while RFC 3966 is a standard for machine-readable tel: URIs.)

In the years since, quite a few data consumers have added support for both syntaxes, but there’s been a lot of confusion about how to tag PBX telephone extensions. Although the documentation never mentioned it, E.123 recommends a decidedly non-machine-readable syntax:

To show an extension number of a PABX without direct in-dialling, the nationally used word or abbreviation for “extension” should be written immediately after the telephone numbers and on the same line as the word “telephone”, followed by the extension number itself.
[…]
Example 2: Telephone international +22 607 123 4567 ext. 876

In 2009, contact:*=* was first documented with an example requiring the DIN 5008 syntax for telephone extensions:

+<country_code> <national_destination_code> <subscriber_number>-<direct_inward_dialing>

Unfortunately, this syntax conflicts with E.123’s tolerance for hyphen-separated digits.
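To make the conflict concrete, here is a minimal Python sketch of a hypothetical consumer that follows the DIN 5008 rule literally and treats everything after the last hyphen as the extension. The function `split_din5008` is illustrative and not taken from any real project. It handles the German example but misreads an ordinary hyphen-separated North American number:

```python
from typing import Optional, Tuple


def split_din5008(value: str) -> Tuple[str, Optional[str]]:
    """Hypothetical DIN 5008-style split: text after the last hyphen is the extension."""
    main, sep, ext = value.rpartition("-")
    if not sep:
        # No hyphen at all: the whole value is the main number.
        return value, None
    return main, ext


print(split_din5008("+49 3831 2681-0"))  # ('+49 3831 2681', '0')
print(split_din5008("+1 859-255-0270"))  # ('+1 859-255', '0270'), a misreading
```

Nothing in the string itself tells the consumer which reading is intended; it would have to guess from the country code.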

Last November, in response to a question about whether it’s a good idea to tag a telephone extension, the following parameters were added to the phone=* documentation, ostensibly borrowed from the RFC 3966 standard:

| Parameter | Description | Example |
|---|---|---|
| \;ext= | Extension | phone=+1 859-255-0270;ext=2 |
| \;isub= | ISDN subaddress | Irrelevant to OSM?[1] |
| \;phone-context= | Emergency number or other service code (e.g., 9-1-1 or 4-1-1 in North America) | None[2] |

As far as I know, this syntax was added to the documentation without a proposal or any informal discussion beforehand.

As of writing, the \;ext= syntax has seen some uptake but not nearly as much as the alternatives:

| Syntax | Example | Prevalence (count) |
|---|---|---|
| DIN 5008 | +49 3831 2681-0 | 83,928 |
| E.123 | +43 1 71613 ext. 0 | 756 |
| Informal North American | +1-905-688-5550x3369 | 174 |
| Escaped semicolon | +34 96 352 54 78\;ext=4298 | 151 |

Problem

Of all the possible syntaxes we could’ve chosen for telephone extensions, \;ext= is the least intuitive and least interoperable syntax:

  • Unlike the rest of the phone=* syntax, it is designed for machine readability at the expense of human readability.

  • It conflicts with both the ext. syntax from E.123 and the - syntax from DIN 5008.

  • It cherry-picks a part of RFC 3966 that suffers from poor support in mobile operating systems – the very operating systems that matter for this key. Instead, the industry best practice is to use just a comma or semicolon, for example, tel:+15555555555,2.

  • The semicolon has long been documented as separating multiple phone numbers, so a backslash was added to escape the semicolon, even though the longstanding general syntax for multiple values uses ;; as the escape sequence. Thus this edit also introduces a novel escape syntax for the semicolon that is valid only in phone number keys but nowhere else.
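A minimal Python sketch of why the backslash escape is novel: a consumer that applies the general multi-value rule (split on every `;`) breaks the escaped value apart, so it has to special-case `\;` for phone keys only. Both functions are hypothetical illustrations, not code from any existing data consumer:

```python
import re


def split_generic(value):
    # General OSM multi-value handling: every semicolon separates values.
    return [part.strip() for part in value.split(";")]


def split_phone_with_backslash(value):
    # Phone-only rule from the reverted docs: "\;" is a literal semicolon,
    # so only split at semicolons that are not preceded by a backslash.
    parts = re.split(r"(?<!\\);", value)
    return [part.strip().replace("\\;", ";") for part in parts]


value = "+34 96 352 54 78\\;ext=4298"
print(split_generic(value))               # ['+34 96 352 54 78\\', 'ext=4298']
print(split_phone_with_backslash(value))  # ['+34 96 352 54 78;ext=4298']
```

Every consumer would need the second function for phone keys and the first everywhere else.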

As far as I can tell, \;ext= is currently unsupported by OSM data consumers. The main OSM website is unable to detect the value as a phone number. Overpass turbo fails to remove the backslash, resulting in a malformed URI.

Both Organic Maps and OsmAnd remove the \;ext=, causing iOS to treat the extension as part of the subscriber number. In North America, the public telephone system will ignore these extra digits, so the user will have to enter the extension manually. On the other hand, in a country with variable-length subscriber numbers, the additional digits may connect the user to an altogether different line elsewhere in the country.
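The two failure modes above can be sketched roughly as follows. These Python functions are hypothetical stand-ins and do not reproduce the actual code of Overpass turbo, Organic Maps, or OsmAnd:

```python
def naive_keep_backslash(value):
    # Strips spaces only; the backslash (not a valid URI character) survives.
    return "tel:" + value.replace(" ", "")


def naive_drop_marker(value):
    # Deletes the "\;ext=" marker, gluing the extension onto the subscriber number.
    return "tel:" + value.replace("\\;ext=", "").replace(" ", "")


value = "+34 96 352 54 78\\;ext=4298"
print(naive_keep_backslash(value))  # tel:+34963525478\;ext=4298
print(naive_drop_marker(value))     # tel:+349635254784298
```

In the second result, nothing distinguishes the extension digits from the subscriber number, which is why the call may reach a different line in countries with variable-length numbers.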

Certainly, we could write off these issues as mere bugs, not our problem. But honestly I think these are symptoms of a poorly thought-out redefinition of the longstanding phone=* syntax.

Proposal

The documentation for phone=* should stop recommending the \;ext= syntax. These edits should be undone.

Since there’s clearly a need for telephone extensions in phone numbers, we should come up with something to replace \;ext=. Here are some options that come to mind:

  • Refer mappers to the E.123 notation, ext., or its local equivalent, expecting data consumers to treat any such localized abbreviation as equivalent to ext. and extract the extension number.
  • Refer mappers to the E.123 notation, ext., but require a particular language such as British English rather than allowing it to be localized.
  • Allow an actual tel: URI for any phone number that can’t be represented by E.123 notation, such as phone=tel:+1-888-828-4798,2334.
  • Choose some other delimiter that doesn’t already have a special meaning, such as a comma (which would be consistent with tel: URIs) or x (a very common North American notation).
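For comparison, here is a rough Python sketch of how a consumer might recognize several of the candidate delimiters above (ext., x, and a comma) and split off the extension. The regular expression and the name `split_extension` are assumptions for illustration; localized abbreviations other than ext. would have to be added per language:

```python
import re

# Hypothetical pattern: a "+"-prefixed main number, then "ext.", "x", or ","
# followed by the extension digits at the end of the value.
EXTENSION_RE = re.compile(
    r"^(?P<main>\+[\d\s\-()]+?)\s*(?:ext\.|x|,)\s*(?P<ext>\d+)$",
    re.IGNORECASE,
)


def split_extension(value):
    match = EXTENSION_RE.match(value.strip())
    if match is None:
        # No recognized delimiter: treat the whole value as the main number.
        return value.strip(), None
    return match.group("main").strip(), match.group("ext")


print(split_extension("+43 1 71613 ext. 0"))    # ('+43 1 71613', '0')
print(split_extension("+1-905-688-5550x3369"))  # ('+1-905-688-5550', '3369')
```

The DIN 5008 hyphen is deliberately not treated as a delimiter here, for the reason given in the next paragraph.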

Note that the hyphen specified by DIN 5008 would not be viable as a global recommendation, as it conflicts with the hyphens used throughout the North American Numbering Plan Area as part of E.123 notation. However, as a practical matter, I’m not currently proposing to deprecate DIN 5008 notation in German-speaking regions.

(FYI to those who have participated in related discussions or touched the relevant wiki documentation recently: @bkil @Kovoschiz @Mateusz_Konieczny @user_5589.)


  1. In theory, an ISDN subaddress would be analogous to an extension. However, I have never seen a shop, office, or other POI advertise an ISDN subaddress as its public point of contact. ↩︎

  2. Both emergency:phone=* and emergency_telephone_code=* are typically set to the number verbatim, without any attempt to conform to a broader scheme. ↩︎

A quick question: I’m German and therefore not very familiar with the American telephone system. Do I understand correctly that, for a number like +22 607 123 4567 ext. 876, I call +22 607 123 4567 and then, once connected, have to say or enter 876 to reach the right line?

That’s not how I know it, because extension systems are generally unusual in Europe. Here, every telephone has its own number that can be dialed directly from outside. So a company doesn’t have one external telephone number but instead books a block of 10, 100, or 1,000 numbers.
You have the main number, e.g. +49 391 54 XXXX (the Magdeburg city administration), with the direct dial-in digits for each line included directly in the telephone number. If you dialed just +49 391 54, you would reach a completely different subscriber (if that number is assigned) rather than the city administration.

That’s how it works in the NANPA (most of North America). If you don’t pause for a couple seconds before entering the extension, then the public phone system will simply drop the rest of the numbers. This is called “overspell”, and it’s how some businesses get away with advertising phonewords that are longer than seven or ten digits.

I understand that the regions with this system seem to be mutually exclusive of the regions that use hyphens in written phone numbers. However, the problem is primarily that mappers and software developers have been misled over the years.

The wiki documentation calls the extra digits after the hyphen “direct inward dialing”, but DID is not exclusive to the situation you describe. A business in the U.S. can set up DID, but the caller is still required to pause before dialing the extension. From the caller’s perspective, it’s no different than a PBX that doesn’t use DID. So at the very least, that portion of the format needs to be described differently.

Anyhow, that’s tangential to my proposal to remove and replace the current guidance about \;ext=.

I agree that the change to the wiki should be reverted. Coming from Germany, though, what you’re trying to achieve just isn’t a thing here, so I don’t have a strong opinion on it. Strictly speaking, from what I understand, the extension isn’t actually part of the phone number, so maybe phone:ext=* could be used instead of adding things like commas or text to the phone field. Please correct me if I’m misunderstanding this.
The reason I am proposing this is that the phone number seems to work even without the extension; you just reach some sort of “central hub”, correct?

Possibly, but if the business sets up DID, then it might be the difference between calling a national hotline and a local landline that might have its own (hidden) ten-digit number.

Practically speaking, if an office’s sign indicates an extension, then we have to assume that the extension is required for contacting the office. By analogy, we set website=* to the full URL to the store location’s webpage, not the homepage of the store chain (brand).

We already use addr:*=* subkeys to make mailing addresses more structured, but I don’t think we should extend this approach to phone=*. For all these years, mappers and software developers have had every reason to believe that phone=* is self-contained. They would definitely be surprised if we redefine phone=* to be just part of a phone number.

Definitely, but data consumers could also be surprised if the allowed characters now included letters instead of only digits, + and -. If the extension was in phone:ext, old libraries would still parse the phone=*-part fine, and new ones would be able to infer the full number.
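A minimal Python sketch of this phone:ext=* idea, assuming a hypothetical phone:ext=* key that holds only the extension digits and a phone=* value that contains a single number. Older consumers would keep reading phone=* unchanged, while newer ones could assemble a dialable tel: URI. The comma as pause separator follows the convention mentioned in the opening post; `tel_uri` is an illustrative name:

```python
import re


def tel_uri(tags):
    # Keep only "+" and digits from the human-readable phone=* value.
    main = re.sub(r"[^\d+]", "", tags["phone"])
    ext = tags.get("phone:ext")  # hypothetical subkey, not an established tag
    if ext:
        # A comma asks most mobile dialers to pause before sending the extension.
        return f"tel:{main},{ext}"
    return f"tel:{main}"


print(tel_uri({"phone": "+1 888-828-4798", "phone:ext": "2334"}))  # tel:+18888284798,2334
```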

But as I said: I’m not affected by this, just wanted to add some thoughts. The whole thing sounds a bit “hacky” to me :slight_smile:

Unrelated: “;;” can’t be used as a universal escape sequence, because some tags may have a list of values in which individual values are empty (e.g. destination tagging).

This seems like the best option to me.

That might be the best option, if we figure out that there are more special cases like this somewhere in the world.

It’s an awkward situation, to be sure: we insist on making phone=* human-readable to aid in data entry but then turn around and attempt to parse it as if it’s machine-readable. Ideally, we would standardize on tel: URIs for all phone numbers and rely on editors and data consumers to pretty-print the URIs using off-the-shelf libraries. There are off-the-shelf libraries for detecting and parsing phone numbers, too, but they reject this exotic \;ext=* syntax.

In case it’s any comfort, an extension is normally written as part of a NANP phone number, as either “1-555-555-5555 ext. 123” or “1-555-555-5555 x123”. Phone number libraries already handle these notations without any problem. I would illustrate this point with a photo of either notation on a shop sign, but extremely few shops would have an extension in their primary phone number. That’s a more common practice among obscure offices.

Unrelated: “;;” can’t be used as a universal escape sequence, because some tags may have a list of values in which individual values are empty (e.g. destination tagging).

you can use „;;;;“ for these :wink:
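The ambiguity behind this objection can be sketched in a few lines of Python. Under a hypothetical rule where ;; escapes a literal semicolon, the same string can also be read as a list containing an empty value. Both function names are illustrative:

```python
def split_plain(value):
    # Plain multi-value split: empty entries survive, as destination tagging allows.
    return value.split(";")


def split_doubled_escape(value):
    # Hypothetical rule: ";;" stands for a literal ";" inside a single value.
    placeholder = "\x00"  # assumes tag values never contain a NUL character
    parts = value.replace(";;", placeholder).split(";")
    return [part.replace(placeholder, ";") for part in parts]


print(split_plain("A;;B"))           # ['A', '', 'B']
print(split_doubled_escape("A;;B"))  # ['A;B']
```

The same four characters yield two different lists, so a consumer can only apply the ;; escape if it knows whether the key permits empty values.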

Here are some examples of popular software libraries for detecting, parsing, and formatting phone numbers:

These libraries are used very widely across the software landscape and would be very unlikely to add any special affordances for an OSM-specific format. Some of them have live demos that you can use to prototype potential syntaxes.

I’ve reverted the change to the documentation that introduced the RFC 3966–inspired syntax, since there didn’t seem to be any objections here but there were quite a few expressions of support. I left in a footnote mentioning the syntax that had been in place for almost a year, in case someone comes across an example of it. As far as I can tell, most occurrences were the result of someone working through an unrelated MapRoulette challenge about phonewords.

The next step would be to agree on what should replace it for the purpose of representing PBX telephone extensions.[1] Before I start a straw poll here, does anyone have any suggestions besides the ones in my original post?

Note that the most popular mobile operating systems all treat the comma , as pause (pause for a fixed amount of time) and the semicolon ; as wait (wait for a tone). In practice, you can either pause or wait to enter an extension.
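A minimal Python sketch of how a consumer might read the post-dial part of a dial string under this comma-as-pause, semicolon-as-wait convention. The name `post_dial_steps` and its return shape are assumptions; the sketch also assumes the string starts with the main number and ignores how it would be separated from other values in phone=*:

```python
import re


def post_dial_steps(dial_string):
    """Split e.g. "+18888284798,,2334" into the main number and post-dial steps."""
    # Alternate runs of separators (",", ";") and runs of everything else.
    tokens = re.findall(r"[,;]+|[^,;]+", dial_string)
    main, steps = tokens[0], []
    for separators, digits in zip(tokens[1::2], tokens[2::2]):
        # A semicolon means wait; otherwise each comma is one fixed-length pause.
        kind = "wait" if ";" in separators else "pause"
        steps.append((kind, len(separators), digits))
    return main, steps


print(post_dial_steps("+18888284798,,2334"))  # ('+18888284798', [('pause', 2, '2334')])
print(post_dial_steps("+18888284798;2334"))   # ('+18888284798', [('wait', 1, '2334')])
```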

The distinction probably only matters when doing something interactive, such as entering a password to get into a Zoom teleconference or changing the colors on the light sculpture atop this apartment building:

If we do accept a semicolon as part of the syntax, we would need to agree on an escape syntax so that it doesn’t get confused with the multi-value separator, which is already very commonly used in phone=*. One option is to interpret ; as a wait unless followed by a +, since we require phone numbers to begin with a plus sign and country calling code. Another option is to require tel: URIs to be percent-encoded. RFC 3986 requires both , and ; to be percent-encoded. Either way, we should probably clarify at the same time whether website=* and similar keys are expected to be percent-encoded, since this is a frequent point of uncertainty for developers of editors and data consumers.
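Here is a rough Python sketch of both options, assuming for the first that a semicolon starts a new number only when it is followed (after optional spaces) by a plus sign, and is otherwise part of the dial string. The second uses the standard library’s `urllib.parse.quote`. The function names are illustrative, and neither rule has been agreed on:

```python
import re
from urllib.parse import quote


def split_numbers(value):
    # Option 1: split only at semicolons that introduce another "+<country code>".
    return [part.strip() for part in re.split(r";\s*(?=\+)", value)]


def percent_encode(dial_string):
    # Option 2: percent-encode "," and ";" (anything outside the safe set) as %XX.
    return quote(dial_string, safe="+")


print(split_numbers("+1-888-828-4798;2334;+43 1 71613"))
# ['+1-888-828-4798;2334', '+43 1 71613']
print(percent_encode("+1-888-828-4798,2334"))  # +1-888-828-4798%2C2334
```

The first option keeps phone=* readable at the cost of a context-sensitive rule; the second is unambiguous but much harder for mappers to type by hand.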


  1. I don’t think there’s any need for a syntax for ISDN or service codes. emergency:phone=911 is definitely preferable to phone=911\;phone-context=+1. ↩︎

Perhaps a subkey like phone:tel=+1-888-828-4798,2334 for machine-readable numbers in the tel: URI format? Doesn’t solve the issue of old parsers, but buys us flexibility to leave the main phone key human-readable.

The character | is used as a delimiter for lanes, or rather their properties. I think that usage parallels its use here and fits better than ;. (If | separated the different phone numbers, ; could be allowed again.)

I’d expect data consumers to do that mostly anyhow, so having this for phone is probably fine.

With my American perspective in mind, I’ll say I find \;ext=* (and ext. to a lesser degree) very clumsy. A simple delimiter (that doesn’t introduce problems the way ; does) like , or x is much more likely to be added correctly because it’s more human-readable.

I definitely wouldn’t want to type out \;ext=* into EveryDoor while I’m trying to not spend 10 minutes loitering at a shop’s entrance :laughing:


In August, I added a note on percent-encoding under the Best Practices section after running into that issue myself:

Addendum: The comma and semicolon both originated as special modifiers on the Dial command in the Hayes AT command set, which is ubiquitous among modems. I haven’t found any formal standard calling for this syntax as part of a tel: URI. Instead, RFC 3601 specifies p and w, which are very easy to confuse with P and W as part of phonewords. However, we can expect the operating system to recognize the comma and semicolon in a tel: URI, perhaps more reliably than p and w.

I’ve never seen the p/w syntax in the wild. The ,/; syntax is only used in machine-readable contexts: either the phone number is being used as a tel: URI for automatic dialing, or the user is expected to copy-paste the phone number to place a call manually. In other contexts, where human readability is a priority, the ext. and x syntaxes predominate.

I did manage to find the ;ext= and ;phone-context= syntax in a guide published by the IT department of one California county to help county staff insert tel: URIs in official publications. But that’s about it.

I guess we need to decide first of all whether phone=* should optimize for human readability or machine readability. On the one hand, we explicitly allow punctuation characters for human readability, knowing that a mapper will enter this value by hand and that some applications will show the value verbatim. On the other hand, we insist on strict adherence to DIN 5008 or E.123 notation and relegate phonewords to phone:mnemonic=* to make phone=* more predictable for machine readability.

Illustrating the contradiction, openstreetmap-website has a routine that converts already RFC 3966–conformant dial strings to tel: URIs by prepending the tel: scheme, as if the value is not quite machine-readable, but not quite non-machine-readable either:

There’s a real-world distinction too. A single pause lasts either one or two seconds, but an extension technically doesn’t say how long to pause, just that the dialer needs to pause long enough that the subsequent digits won’t get ignored. Sometimes I see phone numbers given with two consecutive commas, just in case. In some cases, you can dial the main number and verbally tell the operator the extension so they can redirect you. That human interaction can’t necessarily be encoded as an automated sequence. If we prioritize human readability, then the written conventions around ext. or x matter more, whereas if we prioritize machine readability, then perfecting the timing via , or ,, matters more.

Incidentally, I hope this discussion will eventually lead editors and data consumers to format and validate phone numbers automatically and not as naïvely, making human readability less of a concern for raw tagging.

My opinion is clear: phone=* should be for human consumption and manual entry, ideally showing exactly what the user has to type in. This excludes mnemonics, which help the user remember the number; the user is expected to type in digits, not letters.

Some phone numbers happen to be in tel: format, but that doesn’t preclude a phone:tel tag. For clarification, as in other cases, we could stipulate that

if the number in phone is not in the tel: format, specify also phone:tel with the number in tel: format.

This way we avoid duplication. Note however that the tel: format disallows whitespace, hence the post-processing in your excerpt. We could allow those as exceptions.

Where the number cannot be called fully automatically due to this pause, perhaps a subkey phone:pause could provide clarification and enable proper user instructions. E.g.

phone=+1-888-828-4798 ext. 2334|+1-777-717-3687 ext. 1223
phone:tel=+1-888-828-4798,,2334|
phone:pause=2s|operator

The first number needs a 2s delay at least, the second number needs the user to say the extension to an operator. Since the second number isn’t fully automatic, I have excluded it from phone:tel.
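A rough Python sketch of how a consumer might read the three keys in this example position by position, assuming | separates entries and an empty or missing phone:tel entry means the number cannot be dialed automatically. phone:tel=* and phone:pause=* are proposals from this post rather than established tags, and the function names are hypothetical:

```python
def _at(values, i):
    # Return the i-th |-separated entry, or None if it is missing or empty.
    if i < len(values) and values[i].strip():
        return values[i].strip()
    return None


def read_phone_entries(tags):
    phones = tags.get("phone", "").split("|")
    tels = tags.get("phone:tel", "").split("|")
    pauses = tags.get("phone:pause", "").split("|")
    # Pair the lists by position; phone=* decides how many entries there are.
    return [
        {"display": phone.strip(), "tel": _at(tels, i), "pause": _at(pauses, i)}
        for i, phone in enumerate(phones)
    ]


tags = {
    "phone": "+1-888-828-4798 ext. 2334|+1-777-717-3687 ext. 1223",
    "phone:tel": "+1-888-828-4798,,2334|",
    "phone:pause": "2s|operator",
}
for entry in read_phone_entries(tags):
    print(entry)
```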
What do you think?

In my experience, most hotlines these days immediately present you with a phone tree. One common automated prompt is: “If you know the extension of the person you want to call, enter the three-digit number now. Otherwise, stay on the line to speak to an operator.”

I have also called smaller offices where I’m greeted by the receptionist, who asks me for the name or extension. This isn’t automatable. I suppose one could encode that as a wait, since Android shows the user a dialog box asking whether they want to dial the subsequent digits.

Anyways, my point is that the pause interval isn’t formally a part of the phone number. It’s just something people adjust based on trial and error, but to a machine, that’s more important than whatever the numbers after the pause actually mean. In general, a mapper is unlikely to know exactly how to interact with the PBX; all that’s printed on the office door or business card is “ext.” Translating that to a pause interval is probably best done by a data consumer, which can assume a one-second delay or choose to add more commas to err on the safe side. None of this is implemented yet, but it could be once we have consensus on a workable syntax.
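A minimal Python sketch of this consumer-side approach: the database only records that there is an extension (here in the ext. or x notation), and the app decides how many comma pauses to insert, perhaps from a user preference. The name `dial_uri` and the default of one pause are assumptions, not an existing implementation:

```python
import re


def dial_uri(value, pauses=1):
    # Hypothetical: split "main ext. NNN" or "main xNNN" into number and extension.
    match = re.fullmatch(r"(.+?)\s*(?:ext\.|x)\s*(\d+)", value.strip(), re.IGNORECASE)
    main, ext = (match.group(1), match.group(2)) if match else (value, None)
    digits = re.sub(r"[^\d+]", "", main)
    if ext is None:
        return f"tel:{digits}"
    # Each comma is one pause; more commas err on the safe side.
    return f"tel:{digits}{',' * pauses}{ext}"


print(dial_uri("+1-888-828-4798 ext. 2334"))            # tel:+18888284798,2334
print(dial_uri("+1-888-828-4798 ext. 2334", pauses=2))  # tel:+18888284798,,2334
```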

It’s difficult to see what your issue is. If the phone number isn’t automatic (or it’s unknown), phone:tel is empty and the data consumer has to somehow parse phone.

And how does the data consumer (I guess you mean an app, not the actual customer) do that? If you somehow overlooked my phone:pause suggestion, I’d like you to re-read that section. Please clarify what is missing.

I’m not arguing with you, just thinking through the different cases we’d need to handle. My current assumption is that, if all the data consumer knows is that there’s an extension, it would encode a minimum of however many seconds it wants before the extension. The specific interval might as well be a user preference rather than hard-coded in the database. So that speaks to your preference for keeping phone=* human-readable, and maybe not even bothering with the subkeys. But I am curious how non-OSM-based databases and applications like Yelp and Google Maps handle these situations.

Alright, yes, there are many cases to think through.
Keeping phone:pause could serve as a hint whether to pause/wait/talk with an operator. The exact time to pause could be set somewhere else.

Indeed, it would be interesting to see how others are doing it, though chances are they won’t tell us. For us, it’s probably best to keep it simple.

The example given at the wiki discussion that Minh linked earlier was the Portsmouth Olympic Marina in Kingston, Ontario:

:man_shrugging:
