Phone= tag formatting in the US, +1 NXX NXX-XXXX vs +1-NXX-NXX-XXXX vs?

watmildon · September 22, 2023, 6:38pm

I am once again foolishly attempting to standardize the formatting of a tag. I presume there’s been some discussion about which is preferable etc.

Standard formatting makes it easier to discover actual data entry issues and direct folks more cleanly to things needing human intervention and investigation (phone numbers missing digits, etc).
I have a set of validator rules that standardize to the all dashes form but that just happened to be the most common in the dataset I first pulled (King County WA I think?). I am happy to modify them to whatever the consensus is.

I will not be attempting to solve the way folks tag extensions at this time.

Just for fun, here’s a list of the various formats that are most common in my experience:

(213) 456 7890
(213) 456-7890
(213)456-7890
+1 (213) 456-7890
+1 213 4567890
+1 213-456-7890
+1 2134567890
+1-213-4567890
+1.213.456.7890
+1213-456-7890
+12134567890
1-213-456-7890
12134567890
213 456 7890
213 456-7890
213-456 7890
213-456-7890
213.456.7890

n76 · September 23, 2023, 3:06am

The wiki gives quite a bit of latitude on the separators saying only that there should be one between the country code and the rest of the number.

I personally like +1.213.456.7890 which I started using when I was working with customers in a few countries around the world and it seemed to be the one format that everyone could use without asking me what it meant.

However it seems that the wiki slightly prefers spaces for separators so I use +1 213 456 7890 when entering phone tag values. Given the expectations of US mappers I doubt that very many are in that format. The only time I correct other mappers is when they leave off the international dialing code (“+”) and the country code (“1”).

By the way, the “1” code is actually for all countries participating in the North American Numbering Plan which includes Canada and some Caribbean countries.

watmildon · September 23, 2023, 3:44am

Here’s the output of my validator rules when run on KY. Note that the first one detects all combinations of space and dash as delimiter where the format still isn’t +1-NNN-NNN-NNNN (e.g., +1-NNN-NNN NNNN).

If you take a look at the 1399 phone= tags in Kentucky (picking a random state I haven’t touched), the most common formatting is the all-dashes format at 292. The second most common is all dashes but missing the +1, followed in third by (NNN) NNN-NNNN. All dots is extremely uncommon as it turns out.

This is pretty typical of my experience: all dashes is the most common format (but not a majority) for phone numbers in the US.
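For anyone who wants to reproduce this kind of count on their own extract, here is a minimal sketch in Python. It is not the validator rules referred to above; `shape` and `format_histogram` are hypothetical helper names, and the values in the usage note are made up for illustration rather than taken from the Kentucky data.

```python
import re
from collections import Counter


def shape(value: str) -> str:
    """Reduce a phone value to its layout: digits become N, a leading +1 is kept."""
    prefix = "+1" if value.startswith("+1") else ""
    return prefix + re.sub(r"[0-9]", "N", value[len(prefix):])


def format_histogram(values):
    """Count how often each layout occurs in an iterable of phone= values."""
    return Counter(shape(v) for v in values)
```

Calling `format_histogram(["+1-213-456-7890", "213-456-7890", "+1-502-555-0100"]).most_common()` lists the layouts from most to least frequent, which is enough to see whether one format dominates in a given region.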

watmildon · September 23, 2023, 3:52am

And to illustrate what we get for having one format… Once the rules are applied as outlined here, it becomes super easy to see the residual numbers that need non-robot attention:

whammo · September 23, 2023, 3:12pm

My vote is for this format. I think it’s both recognizable to Americans who don’t often encounter the +1, while making the “core” of the number obvious. I have a JOSM validation rule that changes every other format you have listed to it, although I think it leaves +1-NXX-NXX-NXXX and +1 NXX NXX-NXXX alone. Very easy to change obviously if there is a different consensus.

watmildon · September 23, 2023, 7:33pm

I am completely happy with that.

Would you mind sharing your version of the rules? No need to have unnecessary duplication on my part.

flohoff · September 23, 2023, 8:07pm

Just as a hint - there is an ITU-T E.164 phone number standard. It does not say a lot about formatting but just as another datapoint for the discussion.

Minh_Nguyen · September 23, 2023, 10:24pm

E.164 is about the digits, not the number’s presentation. E.123 is the ITU standard for formatting numbers; however, it gives considerable leeway to regional norms, particularly the North American practice of grouping digits with hyphens. (The hyphens conflict with the DIN 5008 standard, which is for German phone numbers in German-language publications. Bizarrely, parts of the wiki call for conformance to DIN 5008 in every language and region.)

A couple years ago, @Kovoschiz sent me down a rabbit hole investigating the correct format for phone and fax numbers in the NANP region. The short story is that there isn’t a single standard, because OSM is trying to have it both ways: most consumers of phone numbers either strive for human readability, in which case consistency is a matter of house style, or machine readability, in which there are no delimiters. But OSM tries to strike a compromise between both formats, probably because editors and data consumers haven’t historically done much to format the numbers themselves.

In the real world, people rarely specify the +1 calling code, because the area code is already part of an international scheme. The 1 mostly appears as part of a toll-free number, such as a 1-800 number, in which case most people habitually include the hyphen.

Minh_Nguyen · September 23, 2023, 10:25pm

Here are my notes from researching this topic in the past. I’ve marked this as a wiki post so you can add to it as we learn more.

Government agencies and other regulatory bodies:

International Telecommunication Union: (819) 555 5555
- A footnote expresses resignation to the fact that Americans will never give up their dashes and dots.
North American Numbering Plan Administration: 819‑555‑5555
Canadian Numbering Administrator: 819‑555‑5555
Translation Bureau (Canada): 819‑555‑5555 or 1‑800‑555‑5555
Office québécois de la langue française: 418 123‑4567
U.S. federal government:
- Government Publishing Office: 1–703–555–6593 (en dashes)
- Department of Energy: 303‑275‑3658 or 1‑800‑555‑5555
- Department of Veterans Affairs: 212‑123‑1234 or +1‑201‑123‑1234 (when writing for an international audience)
- Department of Health and Human Services: 212‑123‑1234 or +1‑201‑123‑1234 (when writing for an international audience)

Telecom industry:

Telecommunications Alliance (Canada): 819 555‑5555 or 1 800 555‑5555 for nongeographic area codes (non-breaking spaces)

Style guides for writers and journalists:

AP Stylebook: 775‑784‑4040
Canadian Press Style Book: 819‑555‑5555
Chicago Manual of Style: (000) 000‑0000 or (1-000) 000‑0000 or 000‑000‑0000 or 1‑000‑000‑0000 or (for an international audience) +1 607 000 0000
Medical Library Association: 312.419.9094
Modern Language Association: don’t use commas
UPI Style Book: (212) 682‑0400 or 682‑0400
Wikivoyage: +1 YYY XXX‑XXXX or +1 YYY‑XXX‑XXXX (if the NPA requires ten-digit dialing) or +1‑YYY‑XXX‑XXXX (if the area requires 11-digit dialing, or for a toll-free number)

Style guides for software and technical writers:

Apple: 800‑282‑2732
Google: (800) 555‑0175 (non-breaking space and non-breaking hyphen)
IBM: 1‑800‑426‑4968
Microsoft: 612‑555‑0175

Other relevant standards for software:

RFC 3966: tel: URIs MAY include hyphens for readability, MUST NOT include spaces
- This affects any mobile software that wants to make it easy to dial a phone tag in OSM.
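As a rough illustration of what a data consumer has to do, the sketch below turns a US phone tag value into a tel: URI, using hyphens as the only separators. The function name `to_tel_uri` is hypothetical, and the sketch assumes the value is a plain NANP number with no extension.

```python
import re


def to_tel_uri(value: str) -> str:
    """Build a tel: URI from a NANP phone value, hyphenated and without spaces."""
    digits = re.sub(r"[^0-9]", "", value)  # drop every separator, bracket and "+"
    if len(digits) == 10:
        digits = "1" + digits  # assume a missing country code means +1
    if len(digits) != 11 or not digits.startswith("1"):
        raise ValueError(f"not a 10- or 11-digit NANP number: {value!r}")
    return f"tel:+1-{digits[1:4]}-{digits[4:7]}-{digits[7:]}"
```

With this, `to_tel_uri("+1 213 456 7890")` and `to_tel_uri("(213) 456-7890")` both give `tel:+1-213-456-7890`, so the separator chosen in the tag does not matter much to software that normalizes before dialing.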

Extensions

International Telecommunication Union: (302) 123‑4567 ext. 876 (“ext.” or “extension” or any translation thereof)
U.S. Department of Veterans Affairs: 202‑123‑1234, ext. 9
U.S. Department of Health and Human Services: 202‑123‑1234, ext. 9
AP Stylebook: 212‑621‑1500, ext. 2
Chicago Manual of Style: (000) 000‑0000, ext. 0000
UPI Style Book: (212) 682‑0400 extension 364
Apple: 800‑282‑2732 extension 987 (or “ext.”, but not “x”)
Google: (415) 555‑0132, extension 987
RFC 3966: tel: URIs are identified by a separate parameter, analogous to a separate subkey in OSM
- In practice, operating systems like iOS use a comma.
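To show how the styles above could be told apart from the main number, here is a small sketch. `split_extension` and `tel_uri_with_ext` are hypothetical names; the pattern only covers the "ext.", "extension" and "x" spellings with ASCII punctuation, and the `;ext=` parameter is the one RFC 3966 defines for extensions.

```python
import re

# Main number, optional separators, an extension keyword, then the extension digits.
EXT = re.compile(
    r"^(?P<number>.*?)[ ,;]*(?:extension|ext\.?|x)\s*(?P<ext>[0-9]+)$",
    re.IGNORECASE,
)


def split_extension(value: str):
    """Return (main number, extension or None) for a phone value."""
    value = value.strip()
    match = EXT.match(value)
    if match is None:
        return value, None
    return match.group("number").strip(), match.group("ext")


def tel_uri_with_ext(tel_uri: str, ext: str) -> str:
    """Append an RFC 3966 ext parameter to an existing tel: URI."""
    return f"{tel_uri};ext={ext}"
```

For example, `split_extension("202-123-1234, ext. 9")` separates the value into `202-123-1234` and `9`, which could then be stored or dialed separately.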

Entity Extraction Analysis

It may be informative to know what other consumers support and recognize as phone numbers. The test formats are:

Format	iOS	Slack
`(213) 456 7890`
`(213) 456-7890`
`(213)456-7890`
`1 (213) 456-7890`
`+1 213 4567890`
`+1 213-456-7890`
`+1 2134567890`
`+1-213-4567890`
`+1.213.456.7890`
`+1213-456-7890`
`+12134567890`
`1-213-456-7890`
`12134567890`
`213 456 7890`
`213 456-7890`
`213-456 7890`
`213-456-7890`
`213.456.7890`

whammo · September 23, 2023, 11:49pm

Yep, here are my phone rules, apologies for the dump:

/* flag potentially non-US phone numbers */
*["phone"]["phone"!~/^(?:\+?1[ -.]?)?(?:\(?([0-9]{3})\)?[ -.]?)?([0-9]{3})[ -.]?([0-9]{4})$/][inside("US")] {
  assertMatch: "node phone=\"383382039\"";
  assertMatch: "node phone=\"(383) 380398\"";
  assertNoMatch: "node phone=\"1.383.382.0398\"";
  assertNoMatch: "node phone=\"(383) 3820398\"";
  assertNoMatch: "node phone=\"+1 303-355-1765\"";
  
  throwWarning: tr("{0} has too few or many digits", "{0.tag}");
  group: "Phone number formatting";
}

/* reformat valid phone numbers -> +1 NXX-NXX-NXXX  */
*["phone"]["phone"=~/^(?:\+?1[ -.]?)?(?:\(?([0-9]{3})\)?[ -.]?)([0-9]{3})[ -.]?([0-9]{4})$/]["phone"!~/^\+1[ -][0-9]{3}[ -][0-9]{3}-[0-9]{4}$/][inside("US")] {
  assertMatch: "node phone=\"1 (383) 382-0398\"";
  assertMatch: "node phone=\"+1 383 382 0398\"";
  assertMatch: "node phone=\"3833820398\"";
  assertMatch: "node phone=\"1.383.382.0398\"";
  assertNoMatch: "node phone=\"+1 383 382-0398\"";
  assertNoMatch: "node phone=\"+1 383-382-0398\"";
  assertNoMatch: "node phone=\"+1-383-382-0398\"";

  throwWarning: tr("{0} valid, but improper format", "{0.value}");
  fixRemove: "phone";
  fixAdd: concat("phone=+1 ", get(regexp_match("^(?:\\+?1[ -.]?)?(?:\\(?([0-9]{3})\\)?[ -.]?)([0-9]{3})[ -.]?([0-9]{4})$", tag("phone")), 1), "-", get(regexp_match("^(?:\\+?1[ -.]?)?(?:\\(?([0-9]{3})\\)?[ -.]?)([0-9]{3})[ -.]?([0-9]{4})$", tag("phone")), 2), "-", get(regexp_match("^(?:\\+?1[ -.]?)?(?:\\(?([0-9]{3})\\)?[ -.]?)([0-9]{3})[ -.]?([0-9]{4})$", tag("phone")), 3));
  group: "Phone number formatting";
}

/* flag missing area codes */
*["phone"]["phone"=~/^(?:[0-9]{3})[ -.]?(?:[0-9]{4})$/][inside("US")] {
  assertMatch: "node phone=\"456 1234\"";
  assertNoMatch: "node phone=\"123 456 1234\"";

  throwWarning: tr("{0} missing area code", "{0.value}");
  group: "Phone number formatting";
}
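
For readers who do not use JOSM, the reformatting rule above can be approximated in Python. This is a sketch rather than a port: `normalize` is a hypothetical name, it rewrites every matching value to +1 NXX-NXX-XXXX (the MapCSS rule leaves a few already-acceptable variants alone), and it writes the separator class as `[ .-]` because `[ -.]` is read by the regex engine as a range from space to period, which also admits characters such as `(` and `+`.

```python
import re
from typing import Optional

# Optional +1 or 1 prefix, optional parentheses around the area code,
# and an optional space, dot or hyphen between the groups.
NANP = re.compile(
    r"^(?:\+?1[ .-]?)?\(?([0-9]{3})\)?[ .-]?([0-9]{3})[ .-]?([0-9]{4})$"
)


def normalize(value: str) -> Optional[str]:
    """Return the value as +1 NXX-NXX-XXXX, or None if it needs a human."""
    match = NANP.match(value.strip())
    if match is None:
        return None
    area, exchange, line = match.groups()
    return f"+1 {area}-{exchange}-{line}"
```

Values that return `None` (too few digits, a missing area code, an extension) correspond to the cases the rules above only warn about.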

Minh_Nguyen · September 24, 2023, 12:22am

If I’m reading this code correctly, it’ll flag any phone number that contains an extension. This is the other rabbit hole I went down. There are many styles for indicating an extension; I added a section on extensions to the wiki post above.

whammo · September 24, 2023, 12:51am

Yeah, it’ll throw a warning, but there’s no autofix for that one so presumably you’d take a look and leave it alone. Didn’t want to make that regular expression even more complicated.

watmildon · September 24, 2023, 3:35am

(this is now phone validator chat digression)
I wanted to see what the delta is between your much more compact rules and my own method, mostly to make sure I wasn’t botching anything too badly. I downloaded all of the numbers in Tennessee and did 2 runs through validation, once with @whammo’s rules and once using the methodology above.

The only big difference is there’s an “aggressive” phase outlined in my method that will replace out all the common delimiters and then try to format a number out of it. This means a number like +(901) 249-0549 will eventually get to a format that is “correct”. A small difference is that I flag numbers that have [A-Z] and have a formatter to convert those to digits while moving the original number to phone:mnemonic.
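
A minimal Python sketch of that idea, for illustration only: it is not the actual rule set, `aggressive_fix` is a hypothetical name, the delimiter list is my guess at "common", and the letter mapping is the standard telephone keypad. A value with a spelled-out extension would wrongly go through the letter conversion here, so extensions would need to be split off first.

```python
import re

# Standard keypad letters, inverted into a letter -> digit lookup.
KEYPAD = {
    letter: digit
    for digit, letters in {
        "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
        "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
    }.items()
    for letter in letters
}


def aggressive_fix(tags: dict):
    """Return a fixed copy of the tags, or None if no 10-digit number emerges."""
    value = tags.get("phone", "")
    fixed = dict(tags)
    if re.search(r"[A-Za-z]", value):
        fixed["phone:mnemonic"] = value  # keep the original spelling
        value = "".join(KEYPAD.get(ch.upper(), ch) for ch in value)
    digits = re.sub(r"[\s().+/-]", "", value)  # strip the common delimiters
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the country code
    if not re.fullmatch(r"[0-9]{10}", digits):
        return None
    fixed["phone"] = f"+1 {digits[:3]}-{digits[3:6]}-{digits[6:]}"
    return fixed
```

Here `aggressive_fix({"phone": "+(901) 249-0549"})` yields a `phone` value of `+1 901-249-0549`, and a value containing letters also gains a `phone:mnemonic` tag holding the original text.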

You may consider adjusting your fixup to not include numbers that have “1” in places that make it an invalid phone number, but that’s also an easy check to write on its own.
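
As a sketch of that standalone check: in the NANP, neither the area code nor the exchange (the second group of three) may start with 0 or 1. The function name `plausible_nanp` is hypothetical, and this only tests those two leading digits, not whether the area code is actually assigned.

```python
def plausible_nanp(digits: str) -> bool:
    """Check a bare 10-digit national number for an impossible leading 0 or 1."""
    if not (len(digits) == 10 and digits.isascii() and digits.isdigit()):
        return False
    area_ok = digits[0] in "23456789"      # first digit of the area code
    exchange_ok = digits[3] in "23456789"  # first digit of the exchange
    return area_ok and exchange_ok
```

So `plausible_nanp("2134567890")` passes, while `"1234567890"` and `"2131567890"` are rejected and could be flagged instead of reformatted.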

Extensions are a massive annoyance and I’d love for it to just be " ext. "

Here’s the residual phone numbers for both methods.

@whammo:

@watmildon :

watmildon · September 24, 2023, 3:58am

I was curious if there was a “definitely avoid” formatting that some very common software stack would NOT recognize as a phone number. I added a couple to the wiki post. Feel free to add as you see fit.

For humor:
ChatGPT: 15/18 - says 15 of them are phone numbers when asked how many are phone numbers
WolframAlpha: 0/18 - when asked “is +1 213 4567890 a valid phone number in the united states?” etc

Minh_Nguyen · September 24, 2023, 4:50am

Until I researched this topic, I never quite appreciated the reasons for all these different formats. Clearly, some variants like 000.000.0000 and 000/000-0000 are purely stylistic. But other variants communicate subtle nuances about local dialing procedures, at least according to some style guides:

Format	Dialing procedure	Trunk prefix	Area code
1 000 000-0000, 1 (000) 000-0000, or (1-000) 000-0000	7-digit dialing	Optional	Optional
1 000-000-0000 or (+1) 000-000-0000	10-digit dialing	Optional	Required
1-000-000-0000	11-digit dialing	Required	Required

I suppose these nuances can be the responsibility of a software library that includes a table of dialing procedures by NPA. But that raises the question of why we’re even bothering to format these tags for human consumption. Omitting any punctuation would be less error-prone and more compatible with the tel: URI format that smartphones actually use to place calls. Another consideration is that NPAs sometimes adopt ten-digit dialing; either we’d have to make a bulk edit, or every software application would need to update its table of dialing procedures.
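
To make the trade-off concrete, here is a sketch of what such a library might look like if the tag stored only digits. Everything here is an assumption: `TEMPLATES`, `render` and `machine_readable` are hypothetical names, each template picks one of the formats from the table above, and the dialing procedure is passed in by the caller because no per-NPA table is included.

```python
# One display template per dialing procedure, taken from the table above.
TEMPLATES = {
    "7-digit": "1 ({npa}) {nxx}-{line}",
    "10-digit": "1 {npa}-{nxx}-{line}",
    "11-digit": "1-{npa}-{nxx}-{line}",
}


def render(digits: str, procedure: str) -> str:
    """Format a bare 10-digit number for display under a dialing procedure."""
    if not (len(digits) == 10 and digits.isascii() and digits.isdigit()):
        raise ValueError(f"expected 10 digits, got {digits!r}")
    return TEMPLATES[procedure].format(
        npa=digits[:3], nxx=digits[3:6], line=digits[6:]
    )


def machine_readable(digits: str) -> str:
    """The punctuation-free form: country code plus ten digits."""
    return "+1" + digits
```

When an NPA switches to ten-digit dialing, only the procedure handed to `render` changes; the stored digits stay the same, which is the argument for keeping punctuation out of the data.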

andygol · September 24, 2023, 8:13am

Last year, I created a validator for phone numbers in Ukraine – Rules/UkrainePhoneNumbers – JOSM

Feel free to utilize it for your needs.

SekeRob · September 24, 2023, 8:36am

Osmose has this pin for ‘phone number not in the expected format’ (in Italy). I don’t know if it has some sort of AI routine learning what the most common formats are for numbers of different digit counts, and whether they have the city 0 prefix (which has to be dialed even from abroad) or lack it, as mobile numbers do. For the US/Canada, I can only remember it complaining if the international +1 prefix is missing.

watmildon · September 24, 2023, 11:31pm

This has reminded me to extend checks to fax= and contact:phone= as well as detect toll free numbers. Thanks!
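
A rough sketch of what that extension could look like, in Python rather than validator rules. `toll_free_keys` is a hypothetical name, the key list is just the three keys mentioned here, and the area codes are the toll-free NPAs currently in service.

```python
import re

TOLL_FREE_NPAS = {"800", "833", "844", "855", "866", "877", "888"}
PHONE_KEYS = ("phone", "contact:phone", "fax")


def toll_free_keys(tags: dict) -> list:
    """Return the phone-like keys whose value is a toll-free NANP number."""
    hits = []
    for key in PHONE_KEYS:
        digits = re.sub(r"[^0-9]", "", tags.get(key, ""))
        if len(digits) == 11 and digits.startswith("1"):
            digits = digits[1:]  # drop the country code
        if len(digits) == 10 and digits[:3] in TOLL_FREE_NPAS:
            hits.append(key)
    return hits
```

For an object tagged `phone=+1 800-555-0175` and `fax=213-456-7890`, this returns only `phone`.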

SekeRob · September 25, 2023, 5:22am

The exception to the intl prefix check is the numbers entered under the phone:tollfree tag which can not be dialled from abroad.

edit: For entry quality improvement there’s a phonenumber plugin for JOSM. Certainly worth knowing about what it does in terms of formatting.

watmildon · September 26, 2023, 4:45am

RE: the phonenumber plugin

The good news: it does great work formatting numbers into +1 NNN-NNN-NNNN format!

The bad news: It does not appear to work if you do a partial download from overpass. Bug filed here: Validator hits null exception when run on overpass downloaded data set latest JOSM · Issue #7 · gabortim/josm-phonenumber · GitHub