Scope of denotation for language identifiers

A language identifier represents one or more language names, all of which designate the same specific language. The ultimate objects of identification are languages themselves; language names are the formal means by which the languages denoted by language identifiers are designated.

Languages are not static objects: they vary over time, across space, and across social groups, so every language corresponds to some range of variation in linguistic expression. In this part of ISO 639, then, a language identifier denotes some range of language varieties. The range of varieties that are denoted can have three different scopes: individual language, macrolanguage or collection.

Individual languages

In this part of ISO 639, most identifiers are assumed to denote distinct individual languages. Furthermore, it is a goal for this part of ISO 639 to provide an identifier for every distinct human language that has been documented, whether living, extinct, or constructed, and whether its modality is spoken, written or signed.

There is no one definition of "language" that is agreed upon by all and appropriate for all purposes. As a result, there can be disagreement, even among speakers or linguistic experts, as to whether two varieties represent dialects of a single language or two distinct languages. For this part of ISO 639, judgments regarding when two varieties are considered to be the same or different languages are based on a number of factors, including linguistic similarity, intelligibility, a common literature, and the views of speakers concerning the relationship between language and identity. The following basic criteria are followed:

Some of the distinctions made on this basis may not be considered appropriate by some users or for certain applications. These basic criteria are thought to best fit the intended range of applications for this standard.


Macrolanguages

Other parts of ISO 639 include identifiers that are designated as individual language identifiers but correspond in a one-to-many manner with individual language identifiers in this part of ISO 639. For instance, this part of ISO 639 contains over 30 identifiers designated as individual language identifiers for distinct varieties of Arabic, while ISO 639-1 and ISO 639-2 each contain only one identifier for Arabic, "ar" and "ara" respectively, which are designated as individual language identifiers in those parts of ISO 639. It is assumed here that the single identifiers for Arabic in parts 1 and 2 of ISO 639 correspond collectively to the many identifiers for distinct varieties of Arabic in part 3 of ISO 639.

In this example, it may appear that the single identifiers in ISO 639-1 and ISO 639-2 should be designated as collective language identifiers. That is not assumed here. In various parts of the world, there are clusters of closely related language varieties that, based on the criteria discussed above, can be considered distinct individual languages, yet in certain usage contexts a single language identity for all of them is needed. Typical situations in which this need can occur include the following:

Where such situations exist, an identifier for the single, common language identity is considered in this part of ISO 639 to be a macrolanguage identifier.

Macrolanguages are distinguished from language collections in that the individual languages that correspond to a macrolanguage must be very closely related, and there must be some domain in which only a single language identity is recognized.
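
The one-to-many relationship between a macrolanguage identifier and its individual languages can be sketched as a simple lookup table. The Python snippet below is only an illustration, not part of the standard: the names `MACROLANGUAGE_MEMBERS`, `macrolanguage_of` and `same_language_identity` are hypothetical, and the member list shown for Arabic is a deliberately small subset of the more than 30 identifiers mentioned above.

```python
from typing import Dict, FrozenSet, Optional

# Illustrative subset only: [ara] covers many more individual languages
# than the three listed here (Standard, Egyptian and Moroccan Arabic).
MACROLANGUAGE_MEMBERS: Dict[str, FrozenSet[str]] = {
    "ara": frozenset({"arb", "arz", "ary"}),
}


def macrolanguage_of(code: str) -> Optional[str]:
    """Return the macrolanguage identifier covering `code`, or None."""
    for macro, members in MACROLANGUAGE_MEMBERS.items():
        if code in members:
            return macro
    return None


def same_language_identity(a: str, b: str) -> bool:
    """True if two identifiers are equal or fall under one macrolanguage.

    A macrolanguage identifier is treated as belonging to its own group,
    so "ara" and "arz" are considered to share an identity.
    """
    group_a = macrolanguage_of(a) or a
    group_b = macrolanguage_of(b) or b
    return group_a == group_b
```

An application that only needs a single "Arabic" identity could compare identifiers with `same_language_identity`, while one that needs to distinguish varieties would compare the individual identifiers directly.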

Collections of languages

A collective language code element is an identifier that represents a group of individual languages that are not deemed to be one language in any usage context. Whereas ISO 639-2 includes three-letter identifiers for such collections of languages, this part of ISO 639 provides identifiers for individual languages and macrolanguages only.


Dialects

The linguistic varieties denoted by each of the identifiers in this part of ISO 639 are assumed to be distinct languages and not dialects of other languages, even though for some purposes some users may consider a variety listed in this part of ISO 639 to be a "dialect" rather than a "language". In this standard, the term dialect is used as in the field of linguistics where it simply identifies any sub-variety of a language such as might be based on geographic region, age, gender, social class, time period, or the like. This contrasts with a popular usage in which "dialect" is typically construed to connote a substandard or undeveloped form of language.

The dialects of a language are included within the denotation represented by the identifier for that language. Thus, each language identifier represents the complete range of all the spoken or written varieties of that language, including any standardized form.

Reserved for local use

Identifiers qaa through qtz are reserved for local use, to be used in cases in which there is no suitable existing code in ISO 639. There are no constraints as to scope of denotation. These identifiers may only be used locally, and may not be used in interchange without a private agreement.
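
Because the local-use range is contiguous, membership can be checked mechanically. The following Python sketch assumes identifiers are written as three lowercase ASCII letters; the function name `is_local_use` is hypothetical and chosen only for this illustration.

```python
import re

# qaa..qtz: first letter "q", second letter "a".."t", third letter "a".."z".
_LOCAL_USE = re.compile(r"q[a-t][a-z]")


def is_local_use(code: str) -> bool:
    """Return True if `code` lies in the reserved range qaa through qtz."""
    return _LOCAL_USE.fullmatch(code) is not None
```

A system exchanging data with outside parties could use such a check to reject or flag local-use identifiers unless a private agreement covers them.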

Special situations

ISO 639-2 defines three code elements for other special situations. The identifier [mul] (multiple languages) should be applied when many languages are used and it is not practical to specify all the appropriate language codes. The identifier [und] (undetermined) is provided for those situations in which a language or languages must be indicated but the language cannot be identified. The identifier [zxx] (no linguistic content) may be applied in a situation in which a language identifier is required by system definition, but the item being described does not actually contain linguistic content.
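
One way a cataloguing system might choose among these special identifiers is sketched below in Python. This is an assumption-laden illustration rather than a rule from the standard: the function name `choose_identifiers` is hypothetical, and the `max_listed` threshold is an invented stand-in for the judgment of when listing every language is "not practical".

```python
from typing import List, Sequence


def choose_identifiers(has_linguistic_content: bool,
                       known_codes: Sequence[str],
                       max_listed: int = 3) -> List[str]:
    """Pick language identifiers for an item, falling back to special codes.

    `max_listed` is a hypothetical cut-off; the standard does not define one.
    """
    if not has_linguistic_content:
        return ["zxx"]  # no linguistic content
    if not known_codes:
        return ["und"]  # language present but not identified
    if len(known_codes) > max_listed:
        return ["mul"]  # too many languages to list individually
    return list(known_codes)
```

For example, an instrumental recording would be tagged with [zxx], a text in an unidentified language with [und], and an anthology spanning more languages than the chosen threshold with [mul].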