BCP-47 compliant language tag predicates

Simon Johnston <johnstonskj@gmail.com>

 (require langtag) package: langtag

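This module provides predicates that determine whether a given string or symbol is a valid language tag as defined by RFC 5646 (BCP 47), the tag format used by HTTP, HTML, XML, RDF, and many other standards. It also provides predicates for the individual components of a tag and a procedure that breaks a tag into its parts.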

References

predicate

(language-tag? val) → boolean?

  val : (or/c symbol? string?)
Returns #t if val is a valid BCP-47 language tag, and #f otherwise.

Examples:
> (require langtag)
> (language-tag? "en")

#t

> (language-tag? "en-US")

#t

> (language-tag? "en-US-boont")

#t

> (language-tag? "en-Latn-US")

#t

> (language-tag? "i-klingon")

#t

> (language-tag? "x-private")

#t
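
Because language-tag? is an ordinary one-argument predicate, it can be passed to list functions to screen input. The following is a minimal sketch, assuming the langtag package is installed. The helper split-tags is hypothetical and not part of the package, and the sketch assumes that an ill-formed string such as "not a tag" yields #f rather than raising an error.

```racket
#lang racket/base
(require langtag racket/list)

;; Hypothetical helper: split candidate strings into those that
;; language-tag? accepts and those it rejects (returns two values).
(define (split-tags candidates)
  (partition language-tag? candidates))

;; "not a tag" is assumed to be rejected with #f, not an exception.
(define-values (valid invalid)
  (split-tags '("en" "en-US" "x-private" "not a tag")))

(printf "valid: ~s\ninvalid: ~s\n" valid invalid)
```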

1 Components

predicate

(normal-use? val) → boolean?

  val : (or/c symbol? string?)

predicate

(private-use? val) → boolean?

  val : (or/c symbol? string?)

predicate

(grandfathered? val) → boolean?

  val : (or/c symbol? string?)
Each predicate returns #t if val matches the corresponding top-level production of the Language-Tag rule: langtag for normal-use?, privateuse for private-use?, and grandfathered for grandfathered?.

Examples:
> (require langtag)
> (for-each (lambda (val) (displayln (format "~s ~s ~s"
                                             (normal-use? val)
                                             (private-use? val)
                                             (grandfathered? val))))
            '("en-US" "x-private" "i-klingon"))

#t #f #f

#f #t #f

#f #f #t
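
The three predicates can be combined into a single classifier. This is a minimal sketch, assuming the langtag package is installed; tag-kind is a hypothetical helper and not part of the package.

```racket
#lang racket/base
(require langtag)

;; Hypothetical helper: return a symbol naming the first top-level
;; production whose predicate accepts val, or #f if none does.
(define (tag-kind val)
  (cond
    [(normal-use? val) 'normal-use]
    [(private-use? val) 'private-use]
    [(grandfathered? val) 'grandfathered]
    [else #f]))

(for ([tag (in-list '("en-US" "x-private" "i-klingon"))])
  (printf "~a: ~a\n" tag (tag-kind tag)))
```

Given the predicate results shown above, the three tags would be reported as normal-use, private-use, and grandfathered respectively. The order of the cond clauses matters for the regular grandfathered tags such as "art-lojban": the grammar notes that they also match the langtag production, and how the package classifies them is not shown here.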

predicate

(language-part? val) → boolean?

  val : (or/c symbol? string?)

predicate

(language-script-part? val) → boolean?

  val : (or/c symbol? string?)

predicate

(language-region-part? val) → boolean?

  val : (or/c symbol? string?)

predicate

(language-variant-part? val) → boolean?

  val : (or/c symbol? string?)

predicate

(language-extension-part? val) → boolean?

  val : (or/c symbol? string?)

predicate

(language-private-use-part? val) → boolean?

  val : (or/c symbol? string?)
Each predicate returns #t if val corresponds to the named component of a normal-use language tag: language, script, region, variant, extension, or private use.
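
One way to explore these predicates is to ask which of them accept a given subtag. The sketch below assumes the langtag package is installed and, more importantly, assumes that each predicate takes a bare subtag without a leading hyphen; the documentation above does not confirm this. The names part-predicates and matching-parts are hypothetical.

```racket
#lang racket/base
(require langtag)

;; Pair each component predicate with a label for reporting.
(define part-predicates
  (list (cons 'language language-part?)
        (cons 'script language-script-part?)
        (cons 'region language-region-part?)
        (cons 'variant language-variant-part?)
        (cons 'extension language-extension-part?)
        (cons 'private-use language-private-use-part?)))

;; Hypothetical helper: list the labels of every predicate that accepts val.
(define (matching-parts val)
  (for/list ([entry (in-list part-predicates)]
             #:when ((cdr entry) val))
    (car entry)))

;; Assumption: bare subtags are the expected input form.
(for ([subtag (in-list '("en" "Latn" "US" "boont"))])
  (printf "~a: ~s\n" subtag (matching-parts subtag)))
```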

2 Matching

procedure

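(language-tag-match val)

 → (list symbol? string? (or/c (listof (cons/c symbol? string?)) none/c))
  val : (or/c symbol? string?)
Matches val against the language tag grammar and returns a list describing the result. As the examples below show, the first element is a symbol naming the kind of tag (lang, grandfathered-i, or private-use) and the second is the tag as a string. For a normal-use tag, a third element holds an association list that maps component names such as language, script, region, and variant to the matching subtags.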

Examples:
> (require langtag)
> (language-tag-match "en")

'(lang "en" ((language . "en")))

> (language-tag-match "en-US")

'(lang "en-US" ((language . "en") (region . "US")))

> (language-tag-match "en-US-boont")

'(lang "en-US-boont" ((language . "en") (region . "US") (variant . "boont")))

> (language-tag-match "en-Latn-US")

'(lang "en-Latn-US" ((language . "en") (script . "Latn") (region . "US")))

> (language-tag-match "i-klingon")

'(grandfathered-i "i-klingon")

> (language-tag-match "x-private")

'(private-use "x-private")
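
The result shapes shown above lend themselves to pattern matching. The following sketch pulls a single component out of a match result. It works on quoted copies of the results from the examples, so it runs without the package; match-component is a hypothetical helper and not part of langtag.

```racket
#lang racket/base
(require racket/match)

;; Hypothetical helper: look up one component (e.g. 'region) in a
;; language-tag-match result. Returns #f when the result is not a
;; normal-use match or the component is absent.
(define (match-component result key)
  (match result
    [(list 'lang _ parts)
     (let ([entry (assq key parts)])
       (and entry (cdr entry)))]
    [_ #f]))

;; Results copied from the examples above. With the package installed
;; they would come from (language-tag-match "en-Latn-US") and so on.
(define en-latn-us
  '(lang "en-Latn-US" ((language . "en") (script . "Latn") (region . "US"))))
(define klingon '(grandfathered-i "i-klingon"))

(match-component en-latn-us 'script)  ; "Latn"
(match-component en-latn-us 'variant) ; #f, no variant subtag
(match-component klingon 'language)   ; #f, not a normal-use match
```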

3 Appendix: Definition

The syntax of the language tag, from [RFC5646], in ABNF [RFC5234] is:

Language-Tag  = langtag             ; normal language tags
              / privateuse          ; private use tag
              / grandfathered       ; grandfathered tags

langtag       = language
                ["-" script]
                ["-" region]
                *("-" variant)
                *("-" extension)
                ["-" privateuse]

language      = 2*3ALPHA            ; shortest ISO 639 code
                ["-" extlang]       ; sometimes followed by
                                    ; extended language subtags
              / 4ALPHA              ; or reserved for future use
              / 5*8ALPHA            ; or registered language subtag

extlang       = 3ALPHA              ; selected ISO 639 codes
                *2("-" 3ALPHA)      ; permanently reserved

script        = 4ALPHA              ; ISO 15924 code

region        = 2ALPHA              ; ISO 3166-1 code
              / 3DIGIT              ; UN M.49 code

variant       = 5*8alphanum         ; registered variants
              / (DIGIT 3alphanum)

extension     = singleton 1*("-" (2*8alphanum))

                                    ; Single alphanumerics
                                    ; "x" reserved for private use
singleton     = DIGIT               ; 0 - 9
              / %x41-57             ; A - W
              / %x59-5A             ; Y - Z
              / %x61-77             ; a - w
              / %x79-7A             ; y - z

privateuse    = "x" 1*("-" (1*8alphanum))

grandfathered = irregular           ; non-redundant tags registered
              / regular             ; during the RFC 3066 era

irregular     = "en-GB-oed"         ; irregular tags do not match
              / "i-ami"             ; the 'langtag' production and
              / "i-bnn"             ; would not otherwise be
              / "i-default"         ; considered 'well-formed'
              / "i-enochian"        ; These tags are all valid,
              / "i-hak"             ; but most are deprecated
              / "i-klingon"         ; in favor of more modern
              / "i-lux"             ; subtags or subtag
              / "i-mingo"           ; combination
              / "i-navajo"
              / "i-pwn"
              / "i-tao"
              / "i-tay"
              / "i-tsu"
              / "sgn-BE-FR"
              / "sgn-BE-NL"
              / "sgn-CH-DE"

regular       = "art-lojban"        ; these tags match the 'langtag'
              / "cel-gaulish"       ; production, but their subtags
              / "no-bok"            ; are not extended language
              / "no-nyn"            ; or variant subtags: their meaning
              / "zh-guoyu"          ; is defined by their registration
              / "zh-hakka"          ; and all of these are deprecated
              / "zh-min"            ; in favor of a more modern
              / "zh-min-nan"        ; subtag or sequence of subtags
              / "zh-xiang"

alphanum      = (ALPHA / DIGIT)     ; letters and numbers
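
The simpler productions in this grammar translate almost directly into regular expressions. The sketch below does so for script, region, variant, and privateuse using only racket/base. The helper names are hypothetical and not part of the langtag package, and this is not necessarily how the package implements its predicates. Since ABNF string literals are case-insensitive under RFC 5234, the "x" singleton is matched in either case.

```racket
#lang racket/base

;; script = 4ALPHA
(define (script-subtag? s)
  (regexp-match? #px"^[A-Za-z]{4}$" s))

;; region = 2ALPHA / 3DIGIT
(define (region-subtag? s)
  (regexp-match? #px"^(?:[A-Za-z]{2}|[0-9]{3})$" s))

;; variant = 5*8alphanum / (DIGIT 3alphanum)
(define (variant-subtag? s)
  (regexp-match? #px"^(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3})$" s))

;; privateuse = "x" 1*("-" (1*8alphanum))
(define (privateuse-tag? s)
  (regexp-match? #px"^[xX](?:-[A-Za-z0-9]{1,8})+$" s))

(script-subtag? "Latn")       ; #t
(region-subtag? "419")        ; #t, a UN M.49 code
(variant-subtag? "1994")      ; #t, DIGIT followed by 3 alphanum
(privateuse-tag? "x-private") ; #t
(script-subtag? "Lat")        ; #f, only three letters
```

These checks cover well-formedness only; whether a subtag is actually registered (for example, whether "Latn" is a known ISO 15924 code) is outside the grammar.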