Phone= tag formatting in the US, +1 NXX NXX-XXXX vs +1-NXX-NXX-XXXX vs?

I am once again foolishly attempting to standardize the formatting of a tag. I presume there’s been some discussion about which is preferable etc.

Standard formatting makes it easier to discover actual data entry issues and direct folks more cleanly to things needing human intervention and investigation (phone numbers missing digits, etc).
I have a set of validator rules that standardize to the all dashes form but that just happened to be the most common in the dataset I first pulled (King County WA I think?). I am happy to modify them to whatever the consensus is.

I will not be attempting to solve the way folks tag extensions at this time.

Just for fun, here’s a list of the various formats that are most common in my experience:

(213) 456 7890
(213) 456-7890
(213)456-7890
+1 (213) 456-7890
+1 213 4567890
+1 213-456-7890
+1 2134567890
+1-213-4567890
+1.213.456.7890
+1213-456-7890
+12134567890
1-213-456-7890
12134567890
213 456 7890
213 456-7890
213-456 7890
213-456-7890
213.456.7890
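To make the goal concrete, here is a minimal Python sketch (not my actual validator rules) of collapsing every variant in the list above into one canonical form. The function name `normalize_nanp`, the `sep` parameter, and the choice of the all-dashes output are assumptions for illustration; swap the separator for whatever the consensus turns out to be.

```python
import re

# Optional "+1" or "1" prefix, then 3-3-4 digits. Each group may be separated
# by a single space, dot, or dash, and the area code may be in parentheses.
NANP_RE = re.compile(
    r"^(?:\+?1[ .-]?)?\(?([0-9]{3})\)?[ .-]?([0-9]{3})[ .-]?([0-9]{4})$"
)


def normalize_nanp(raw, sep="-"):
    """Return '+1<sep>NXX<sep>NXX<sep>XXXX', or None if raw is not a 10-digit number."""
    match = NANP_RE.match(raw.strip())
    if match is None:
        return None
    return "+1" + sep + sep.join(match.groups())


if __name__ == "__main__":
    for sample in ["(213) 456-7890", "+1.213.456.7890", "12134567890", "456 1234"]:
        print(repr(sample), "->", normalize_nanp(sample))
```

Anything that comes back as `None` (seven-digit numbers, extensions, stray characters) is what would be left for a human to look at.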

The wiki gives quite a bit of latitude on the separators saying only that there should be one between the country code and the rest of the number.

I personally like +1.213.456.7890 which I started using when I was working with customers in a few countries around the world and it seemed to be the one format that everyone could use without asking me what it meant.

However it seems that the wiki slightly prefers spaces for separators so I use +1 213 456 7890 when entering phone tag values. Given the expectations of US mappers I doubt that very many are in that format. The only time I correct other mappers is when they leave off the international dialing code (“+”) and the country code (“1”).

By the way, the “1” code is actually for all countries participating in the North American Numbering Plan which includes Canada and some Caribbean countries.

Here’s the output of my validator rules when run on KY. Note that the first rule detects every combination of space and dash delimiters that still isn’t +1-NNN-NNN-NNNN (e.g. +1-NNN-NNN NNNN).

If you take a look at the 1399 phone= tags in Kentucky (picking a random state I haven’t touched), the most common format is all dashes, at 292. The second most common is all dashes but missing the +1, followed in third place by (NNN) NNN-NNNN. All dots is extremely uncommon, as it turns out.

This matches my experience: all dashes is the most common format for phone numbers in the US, though not a majority.
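For anyone who wants to reproduce this kind of tally on their own extract, here is a rough Python sketch. It assumes you already have the phone= values as a list of strings (for example from an Overpass export); the helper names `shape` and `tally_shapes` are made up for this post.

```python
import re
from collections import Counter


def shape(value):
    """Replace digits with 'N' so numbers that share a layout compare equal.

    A leading '+1' is kept literal so it stays visible in the tally.
    """
    prefix = "+1" if value.startswith("+1") else ""
    return prefix + re.sub(r"[0-9]", "N", value[len(prefix):])


def tally_shapes(values):
    """Count how often each layout occurs, most common first."""
    return Counter(shape(v) for v in values).most_common()


if __name__ == "__main__":
    # Placeholder values, not real data from any state.
    sample = ["+1-213-456-7890", "+1-502-555-0100", "(213) 456-7890"]
    for layout, count in tally_shapes(sample):
        print(count, layout)
```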

And to illustrate what we get for having one format… Once the rules are applied as outlined here, it becomes super easy to see the residual numbers that need non-robot attention:

My vote is for this format. I think it’s recognizable to Americans who don’t often encounter the +1, while making the “core” of the number obvious. I have a JOSM validation rule that changes every other format you have listed to it, although I think it leaves +1-NXX-NXX-XXXX and +1 NXX NXX-XXXX alone. Very easy to change, obviously, if there is a different consensus.

I am completely happy with that.

Would you mind sharing your version of the rules? No need to have unnecessary duplication on my part.

Just as a hint: there is an ITU-T E.164 phone number standard. It does not say a lot about formatting, but it is another data point for the discussion.

E.164 is about the digits, not the number’s presentation. E.123 is the ITU standard for formatting numbers; however, it gives considerable leeway to regional norms, particularly the North American practice of grouping digits with hyphens. (The hyphens conflict with the DIN 5008 standard, which is for German phone numbers in German-language publications. Bizarrely, parts of the wiki call for conformance to DIN 5008 in every language and region.)
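To illustrate the digits-versus-presentation distinction, here is a small Python sketch that throws away the presentation characters and keeps only the E.164-style digit string for a NANP number. The function name `to_e164_nanp` and the set of characters it strips are my own assumptions, not something taken from either standard.

```python
import re


def to_e164_nanp(raw):
    """Return '+1' followed by ten digits, or None if raw does not fit that shape.

    Only spaces, parentheses, dots, dashes, and plus signs are treated as
    presentation; any other non-digit makes the value unparseable.
    """
    stripped = re.sub(r"[ ().+-]", "", raw)
    if not re.fullmatch(r"[0-9]{10,11}", stripped):
        return None
    if len(stripped) == 10:
        stripped = "1" + stripped
    elif not stripped.startswith("1"):
        return None
    return "+" + stripped


if __name__ == "__main__":
    for sample in ["+1 213-456-7890", "(213) 456-7890", "456 1234"]:
        print(repr(sample), "->", to_e164_nanp(sample))
```

Every formatting style discussed in this thread reduces to the same digit string, which is why the formatting question is really one of house style.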

A couple years ago, @Kovoschiz sent me down a rabbit hole investigating the correct format for phone and fax numbers in the NANP region. The short story is that there isn’t a single standard, because OSM is trying to have it both ways: most consumers of phone numbers strive either for human readability, in which case consistency is a matter of house style, or for machine readability, in which case there are no delimiters. But OSM tries to strike a compromise between both formats, probably because editors and data consumers haven’t historically done much to format the numbers themselves.

In the real world, people rarely specify the +1 calling code, because the area code is already part of an international scheme. The 1 mostly appears as part of a toll-free number, such as a 1-800 number, in which case most people habitually include the hyphen.

Here are my notes from researching this topic in the past. I’ve marked this as a wiki post so you can add to it as we learn more.

Government agencies and other regulatory bodies:

Telecom industry:

Style guides for writers and journalists:

  • AP Stylebook: 775‑784‑4040
  • Canadian Press Style Book: 819‑555‑5555
  • Chicago Manual of Style: (000) 000‑0000 or (1-000) 000‑0000 or 000‑000‑0000 or 1‑000‑000‑0000 or (for an international audience) +1 607 000 0000
  • Medical Library Association: 312.419.9094
  • Modern Language Association: don’t use commas
  • UPI Style Book: (212) 682‑0400 or 682‑0400
  • Wikivoyage: +1 YYY XXX‑XXXX or +1 YYY‑XXX‑XXXX (if the NPA requires ten-digit dialing) or +1‑YYY‑XXX‑XXXX (if the area requires 11-digit dialing, or for a toll-free number)

Style guides for software and technical writers:

  • Apple: 800‑282‑2732
  • Google: (800) 555‑0175 (non-breaking space and non-breaking hyphen)
  • IBM: 1‑800‑426‑4968
  • Microsoft: 612‑555‑0175

Other relevant standards for software:

  • RFC 3966: tel: URIs MAY include hyphens for readability, MUST NOT include spaces
    • This affects any mobile software that wants to make it easy to dial a phone tag in OSM.
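As a rough illustration of the RFC 3966 point, this Python sketch turns a phone tag value into a tel: URI by replacing spaces with hyphens. The function name `tel_uri` is hypothetical, and the sketch only handles values that already start with "+"; it does not attempt extensions or multiple semicolon-separated values.

```python
import re


def tel_uri(phone_tag):
    """Build a tel: URI from a '+'-prefixed phone value, replacing spaces with hyphens."""
    value = phone_tag.strip()
    if not value.startswith("+"):
        raise ValueError("expected an international number starting with '+'")
    body = re.sub(r" +", "-", value[1:])
    # After the '+', allow only digits and the separators used in this thread
    # (hyphen, dot, parentheses), and require at least one digit.
    if not re.fullmatch(r"[0-9().-]*[0-9][0-9().-]*", body):
        raise ValueError("unexpected character in phone value")
    return "tel:+" + body


if __name__ == "__main__":
    print(tel_uri("+1 213 456-7890"))
```

The all-dashes format passes through unchanged, while the space-separated format needs this one substitution before it can be dialed.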

Extensions

Entity Extraction Analysis

It may be informative to know what other consumers support and recognize as phone numbers. The test formats are:

| Format | iOS | Slack |
| --- | --- | --- |
| (213) 456 7890 | :white_check_mark: | :white_check_mark: |
| (213) 456-7890 | :white_check_mark: | :white_check_mark: |
| (213)456-7890 | :white_check_mark: | :white_check_mark: |
| 1 (213) 456-7890 | :white_check_mark: | :white_check_mark: |
| +1 213 4567890 | :white_check_mark: | :white_check_mark: |
| +1 213-456-7890 | :white_check_mark: | :white_check_mark: |
| +1 2134567890 | :white_check_mark: | :white_check_mark: |
| +1-213-4567890 | :white_check_mark: | :white_check_mark: |
| +1.213.456.7890 | :white_check_mark: | :white_check_mark: |
| +1213-456-7890 | :white_check_mark: | :white_check_mark: |
| +12134567890 | :white_check_mark: | :white_check_mark: |
| 1-213-456-7890 | :white_check_mark: | :white_check_mark: |
| 12134567890 | :white_check_mark: | :heavy_multiplication_x: |
| 213 456 7890 | :white_check_mark: | :white_check_mark: |
| 213 456-7890 | :white_check_mark: | :white_check_mark: |
| 213-456 7890 | :white_check_mark: | :white_check_mark: |
| 213-456-7890 | :white_check_mark: | :white_check_mark: |
| 213.456.7890 | :white_check_mark: | :white_check_mark: |

Yep, here are my phone rules, apologies for the dump:

/* flag potentially non-US phone numbers */
*["phone"]["phone"!~/^(?:\+?1[ .-]?)?(?:\(?([0-9]{3})\)?[ .-]?)?([0-9]{3})[ .-]?([0-9]{4})$/][inside("US")] {
  assertMatch: "node phone=\"383382039\"";
  assertMatch: "node phone=\"(383) 380398\"";
  assertNoMatch: "node phone=\"1.383.382.0398\"";
  assertNoMatch: "node phone=\"(383) 3820398\"";
  assertNoMatch: "node phone=\"+1 303-355-1765\"";
  
  throwWarning: tr("{0} has too few or many digits", "{0.tag}");
  group: "Phone number formatting";
}

/* reformat valid phone numbers -> +1 NXX-NXX-XXXX  */
*["phone"]["phone"=~/^(?:\+?1[ .-]?)?(?:\(?([0-9]{3})\)?[ .-]?)([0-9]{3})[ .-]?([0-9]{4})$/]["phone"!~/^\+1[ -][0-9]{3}[ -][0-9]{3}-[0-9]{4}$/][inside("US")] {
  assertMatch: "node phone=\"1 (383) 382-0398\"";
  assertMatch: "node phone=\"+1 383 382 0398\"";
  assertMatch: "node phone=\"3833820398\"";
  assertMatch: "node phone=\"1.383.382.0398\"";
  assertNoMatch: "node phone=\"+1 383 382-0398\"";
  assertNoMatch: "node phone=\"+1 383-382-0398\"";
  assertNoMatch: "node phone=\"+1-383-382-0398\"";

  throwWarning: tr("{0} valid, but improper format", "{0.value}");
  fixRemove: "phone";
  fixAdd: concat("phone=+1 ", get(regexp_match("^(?:\\+?1[ .-]?)?(?:\\(?([0-9]{3})\\)?[ .-]?)([0-9]{3})[ .-]?([0-9]{4})$", tag("phone")), 1), "-", get(regexp_match("^(?:\\+?1[ .-]?)?(?:\\(?([0-9]{3})\\)?[ .-]?)([0-9]{3})[ .-]?([0-9]{4})$", tag("phone")), 2), "-", get(regexp_match("^(?:\\+?1[ .-]?)?(?:\\(?([0-9]{3})\\)?[ .-]?)([0-9]{3})[ .-]?([0-9]{4})$", tag("phone")), 3));
  group: "Phone number formatting";
}

/* flag missing area codes */
*["phone"]["phone"=~/^(?:[0-9]{3})[ .-]?(?:[0-9]{4})$/][inside("US")] {
  assertMatch: "node phone=\"456 1234\"";
  assertNoMatch: "node phone=\"123 456 1234\"";

  throwWarning: tr("{0} missing area code", "{0.value}");
  group: "Phone number formatting";
}
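If you want to poke at the patterns outside JOSM, here is a rough Python port of the three rules as a single classifier. This is only a sketch for experimenting: the `classify` function and its labels are invented here, and the `inside("US")` check is not reproduced. One detail worth noting is that the delimiter class is written `[ .-]` so the hyphen is literal; written as `[ -.]`, a character class denotes the range from space to dot, which also admits characters such as `(` and `+`.

```python
import re

PREFIX = r"(?:\+?1[ .-]?)?"
AREA = r"(?:\(?([0-9]{3})\)?[ .-]?)"
LOCAL = r"([0-9]{3})[ .-]?([0-9]{4})"

# Rule 1 flags anything that does NOT match this (area code optional).
PLAUSIBLE = re.compile("^" + PREFIX + AREA + "?" + LOCAL + "$")
# Rule 2 reformats anything that matches this (area code required) ...
TEN_DIGIT = re.compile("^" + PREFIX + AREA + LOCAL + "$")
# ... unless it is already in one of the formats that rule leaves alone.
ACCEPTED = re.compile(r"^\+1[ -][0-9]{3}[ -][0-9]{3}-[0-9]{4}$")
# Rule 3 flags bare seven-digit numbers.
SEVEN_DIGIT = re.compile(r"^[0-9]{3}[ .-]?[0-9]{4}$")


def classify(value):
    """Return which rule, if any, would fire for a phone value."""
    if not PLAUSIBLE.search(value):
        return "wrong digit count"
    if SEVEN_DIGIT.search(value):
        return "missing area code"
    if ACCEPTED.search(value):
        return "ok"
    if TEN_DIGIT.search(value):
        return "reformat"
    # Passes rule 1 but is caught by neither rule 2 nor rule 3, e.g. "1 456 1234".
    return "unflagged"


if __name__ == "__main__":
    for sample in ["383382039", "1.383.382.0398", "+1 383-382-0398", "456 1234", "1 456 1234"]:
        print(repr(sample), "->", classify(sample))
```

The "unflagged" bucket is the interesting one: a seven-digit number with a leading 1 passes the first rule but is not picked up by the other two.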

If I’m reading this code correctly, it’ll flag any phone number that contains an extension. This is the other rabbit hole I went down. There are many styles for indicating an extension; I added a section on extensions to the wiki post above.

Yeah, it’ll throw a warning, but there’s no autofix for that one so presumably you’d take a look and leave it alone. Didn’t want to make that regular expression even more complicated :slight_smile:

(this is now phone validator chat digression)
I wanted to see what the delta is between your much more compact rule and my own method, mostly to make sure I wasn’t botching anything too badly. I downloaded all of the numbers in Tennessee and did two runs through validation, once with @whammo’s rules and once using the methodology above.

The only big difference is that there’s an “aggressive” phase outlined in my method that strips out all the common delimiters and then tries to format a number from what remains. This means a number like +(901) 249-0549 will eventually get to a format that is “correct”. A small difference is that I flag numbers that contain [A-Z] and have a formatter that converts those to digits while moving the original number to phone:mnemonic.
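For the curious, the letter-to-digit step looks roughly like this in Python. This is a simplified sketch rather than my actual formatter: the function name `split_mnemonic` is made up, the tags are a plain dict, and the mapping is the standard telephone keypad. Only uppercase letters are converted, so something like "ext." is left untouched.

```python
KEYPAD = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
}
LETTER_TO_DIGIT = {
    letter: digit for digit, letters in KEYPAD.items() for letter in letters
}


def split_mnemonic(tags):
    """Return a copy of tags with uppercase letters in phone=* converted to digits.

    The original value is preserved under phone:mnemonic. Tags whose phone
    value contains no uppercase letters are returned unchanged.
    """
    phone = tags.get("phone", "")
    if not any(ch in LETTER_TO_DIGIT for ch in phone):
        return dict(tags)
    converted = "".join(LETTER_TO_DIGIT.get(ch, ch) for ch in phone)
    return {**tags, "phone": converted, "phone:mnemonic": phone}


if __name__ == "__main__":
    print(split_mnemonic({"phone": "1-800-FLOWERS"}))
```

The converted value still needs to go through the normal formatting pass afterwards, since the digits come out as one undelimited run.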

You may consider adjusting your fixup to not include numbers that have a “1” in places that make it an invalid phone number, but that’s also an easy check to write on its own.
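Here is roughly what that standalone check could look like in Python. I’m assuming the relevant constraint is the NANP rule that neither the area code nor the exchange may begin with 0 or 1; the function name is hypothetical and the sketch ignores other reserved codes.

```python
import re


def has_valid_nanp_leading_digits(raw):
    """True if raw has ten digits (after an optional leading 1) and neither
    the area code nor the exchange starts with 0 or 1."""
    digits = re.sub(r"[^0-9]", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    if len(digits) != 10:
        return False
    return digits[0] in "23456789" and digits[3] in "23456789"


if __name__ == "__main__":
    for sample in ["+1 213-456-7890", "+1 113-456-7890", "213-156-7890"]:
        print(repr(sample), "->", has_valid_nanp_leading_digits(sample))
```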

Extensions are a massive annoyance and I’d love for it to just be " ext. "

Here are the residual phone numbers for both methods.

@whammo:

@watmildon :

I was curious if there was a “definitely avoid” formatting that some very common software stack would NOT recognize as a phone number. I added a couple to the wiki post. Feel free to add as you see fit.

For humor:
ChatGPT: 15/18 - says 15 of them are phone numbers when asked how many are phone numbers
WolframAlpha: 0/18 - when asked “is +1 213 4567890 a valid phone number in the united states?” etc

Until I researched this topic, I never quite appreciated the reasons for all these different formats. Clearly, some variants like 000.000.0000 and 000/000-0000 are purely stylistic. But other variants communicate subtle nuances about local dialing procedures, at least according to some style guides:

| Format | Dialing procedure | Trunk prefix | Area code |
| --- | --- | --- | --- |
| 1 000 000-0000<br>1 (000) 000-0000<br>(1-000) 000-0000 | 7-digit dialing | Optional | Optional |
| 1 000-000-0000<br>(+1) 000-000-0000 | 10-digit dialing | Optional | Required |
| 1-000-000-0000 | 11-digit dialing | Required | Required |

I suppose these nuances can be the responsibility of a software library that includes a table of dialing procedures by NPA. But that raises the question of why we’re even bothering to format these tags for human consumption. Omitting any punctuation would be less error-prone and more compatible with the tel: URI format that smartphones actually use to place calls. Another consideration is that NPAs sometimes adopt ten-digit dialing; either we’d have to make a bulk edit, or every software application would need to update its table of dialing procedures.

Last year, I created a validator for phone numbers in Ukraine – Rules/UkrainePhoneNumbers – JOSM

Feel free to utilize it for your needs.

Osmose has a pin for ‘phone number not in the expected format’ (in Italy). I don’t know whether it has some sort of AI routine that learns the most common formats for numbers of different digit counts, with the city 0 prefix (which has to be dialed even from abroad) or without it for mobile numbers. For the US/Canada, I can only remember it complaining if the international +1 prefix is missing.

This has reminded me to extend checks to fax= and contact:phone= as well as detect toll free numbers. Thanks!
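A rough Python sketch of what I have in mind: loop over the phone-like keys and pick out values whose area code is one of the toll-free codes currently in use (800, 833, 844, 855, 866, 877, 888). The names `PHONE_KEYS`, `TOLL_FREE_NPAS`, and `toll_free_values` are placeholders, and the sketch does not split multiple semicolon-separated values.

```python
import re

PHONE_KEYS = ("phone", "contact:phone", "fax")
TOLL_FREE_NPAS = {"800", "833", "844", "855", "866", "877", "888"}


def toll_free_values(tags):
    """Return (key, value) pairs for phone-like tags with a toll-free area code."""
    result = []
    for key in PHONE_KEYS:
        value = tags.get(key)
        if value is None:
            continue
        digits = re.sub(r"[^0-9]", "", value)
        if len(digits) == 11 and digits.startswith("1"):
            digits = digits[1:]
        if len(digits) == 10 and digits[:3] in TOLL_FREE_NPAS:
            result.append((key, value))
    return result


if __name__ == "__main__":
    # Placeholder tags for illustration only.
    print(toll_free_values({"phone": "+1 800-555-0100", "fax": "+1 213-456-7890"}))
```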

The exception to the intl prefix check is numbers entered under the phone:tollfree tag, which cannot be dialled from abroad.

edit: For entry quality improvement there’s a phonenumber plugin for JOSM. Certainly worth knowing about what it does in terms of formatting.

RE: the phonenumber plugin

The good news: it does great work formatting numbers into +1 NNN-NNN-NNNN format!

The bad news: It does not appear to work if you do a partial download from overpass. Bug filed here: Validator hits null exception when run on overpass downloaded data set latest JOSM · Issue #7 · gabortim/josm-phonenumber · GitHub
