Scope of denotation for language identifiers
- Individual languages
- Collections of languages
- Reserved for local use
- Special situations
A language identifier represents one or more language names, all of which designate the same specific language. The ultimate objects of identification are languages themselves; language names are the formal means by which the languages denoted by language identifiers are designated.
Languages are not static objects; they vary temporally, spatially, and socially, and every language corresponds to some range of variation in linguistic expression. In this part of ISO 639, then, a language identifier denotes some range of language varieties. The range of varieties denoted can have one of three scopes: individual language, macrolanguage, or collection.
In this part of ISO 639, most identifiers are assumed to denote distinct individual languages. Furthermore, it is a goal for this part of ISO 639 to provide an identifier for every distinct human language that has been documented, whether living, extinct, or constructed, and whether its modality is spoken, written or signed.
There is no one definition of "language" that is agreed upon by all and appropriate for all purposes. As a result, there can be disagreement, even among speakers or linguistic experts, as to whether two varieties represent dialects of a single language or two distinct languages. For this part of ISO 639, judgments regarding when two varieties are considered to be the same or different languages are based on a number of factors, including linguistic similarity, intelligibility, a common literature, and the views of speakers concerning the relationship between language and identity. The following basic criteria are followed:
- Two related varieties are normally considered varieties of the same language if speakers of each variety have inherent understanding of the other variety (that is, can understand based on knowledge of their own variety without needing to learn the other variety) at a functional level.
- Where spoken intelligibility between varieties is marginal, the existence of a common literature or of a common ethnolinguistic identity with a central variety that both understand can be strong indicators that they should nevertheless be considered varieties of the same language.
- Where there is enough intelligibility between varieties to enable communication, the existence of well-established distinct ethnolinguistic identities can be a strong indicator that they should nevertheless be considered to be different languages.
Some of the distinctions made on this basis may not be considered appropriate by some users or for certain applications. These basic criteria are thought to best fit the intended range of applications for this standard.
Other parts of ISO 639 have included identifiers designated as individual language identifiers that correspond in a one-to-many manner with individual language identifiers in this part of ISO 639. For instance, this part of ISO 639 contains over 30 identifiers designated as individual language identifiers for distinct varieties of Arabic, while ISO 639-1 and ISO 639-2 each contain only one identifier for Arabic, "ar" and "ara" respectively, which are designated as individual language identifiers in those parts of ISO 639. It is assumed here that the single identifiers for Arabic in parts 1 and 2 of ISO 639 correspond to the many identifiers collectively for distinct varieties of Arabic in part 3 of ISO 639.
In this example, it may appear that the single identifiers in ISO 639-1 and ISO 639-2 should be designated as collective language identifiers. That is not assumed here. In various parts of the world, there are clusters of closely-related language varieties that, based on the criteria discussed above, can be considered distinct individual languages, yet in certain usage contexts a single language identity for all is needed. Typical situations in which this need can occur include the following:
- There is one variety that is more developed and that tends to be used for wider communication by speakers of various closely related languages; as a result, there is a perceived common linguistic identity across these languages. For instance, there are several distinct spoken Arabic languages, but Standard Arabic is generally used in business and media across all of these communities, and is also an important aspect of a shared ethno-religious unity.
- There is a common written form used for multiple closely-related languages. For instance, multiple Chinese languages share a common written form.
- There is a transitional socio-linguistic situation in which sub-communities of a single language community are diverging, creating a need for some purposes to recognize distinct languages while, for other purposes, a single common identity is still valid. For instance, in some contexts it is necessary to make a distinction between Bosnian, Croatian and Serbian languages, yet there are other contexts in which these distinctions are not discernible in language resources that are in use.
Where such situations exist, an identifier for the single, common language identity is considered in this part of ISO 639 to be a macrolanguage identifier.
Macrolanguages are distinguished from language collections in that the individual languages that correspond to a macrolanguage must be very closely related, and there must be some domain in which only a single language identity is recognized.
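The relationship between a macrolanguage identifier and its individual languages can be sketched as a simple lookup. The following Python example is a minimal illustration, not part of the standard: the table `MACROLANGUAGE_MEMBERS` and the function `matches` are hypothetical names, and the table lists only a small subset of the individual languages registered under each macrolanguage.

```python
# Illustrative subset only: each macrolanguage identifier maps to a few of
# the individual language identifiers it encompasses. The full ISO 639-3
# code tables contain many more members (over 30 for Arabic alone).
MACROLANGUAGE_MEMBERS = {
    "ara": {"arb", "arz", "ary"},  # Arabic: Standard, Egyptian, Moroccan
    "zho": {"cmn", "yue", "wuu"},  # Chinese: Mandarin, Yue, Wu
    "hbs": {"bos", "hrv", "srp"},  # Serbo-Croatian: Bosnian, Croatian, Serbian
}


def matches(requested: str, actual: str) -> bool:
    """Return True if content tagged `actual` satisfies a request for `requested`.

    A request for a macrolanguage is satisfied by any of its individual
    languages; a request for an individual language is satisfied only by
    that same identifier.
    """
    if requested == actual:
        return True
    return actual in MACROLANGUAGE_MEMBERS.get(requested, set())
```

Under these assumptions, `matches("ara", "arz")` holds because Egyptian Arabic falls within the Arabic macrolanguage, while `matches("arz", "ara")` does not: the broader identity does not imply any one individual language.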
A collective language code element is an identifier that represents a group of individual languages that are not deemed to be one language in any usage context. Whereas ISO 639-2 includes three-letter identifiers for such collections of languages, this part of ISO 639 provides identifiers for individual languages and macrolanguages only.
The linguistic varieties denoted by each of the identifiers in this part of ISO 639 are assumed to be distinct languages and not dialects of other languages, even though for some purposes some users may consider a variety listed in this part of ISO 639 to be a "dialect" rather than a "language". In this standard, the term dialect is used as in the field of linguistics where it simply identifies any sub-variety of a language such as might be based on geographic region, age, gender, social class, time period, or the like. This contrasts with a popular usage in which "dialect" is typically construed to connote a substandard or undeveloped form of language.
The dialects of a language are included within the denotation represented by the identifier for that language. Thus, each language identifier represents the complete range of all the spoken or written varieties of that language, including any standardized form.
Identifiers qaa through qtz are reserved for local use, for cases in which there is no suitable existing code in ISO 639. There are no constraints as to scope of denotation. These identifiers may be used only locally, and may not be used in interchange without a private agreement.
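A check for the local-use range can be written as a short pattern match. This is a minimal sketch: the function name `is_local_use` is hypothetical, and the example assumes identifiers are written in lowercase, as they are throughout this text.

```python
import re

# qaa through qtz: first letter "q", second letter "a" to "t",
# third letter "a" to "z".
_LOCAL_USE = re.compile(r"q[a-t][a-z]")


def is_local_use(identifier: str) -> bool:
    """Return True if `identifier` lies in the reserved range qaa..qtz."""
    return _LOCAL_USE.fullmatch(identifier) is not None
```

The range covers 20 × 26 = 520 identifiers. A system that exchanges data with other parties might use such a check to reject or flag local-use identifiers at its boundary unless a private agreement covers them.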
ISO 639-2 defines three code elements for other special situations. The identifier [mul] (multiple languages) should be applied when many languages are used and it is not practical to specify all the appropriate language codes. The identifier [und] (undetermined) is provided for those situations in which a language or languages must be indicated but the language cannot be identified. The identifier [zxx] (no linguistic content) may be applied in a situation in which a language identifier is required by system definition, but the item being described does not actually contain linguistic content.
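The choice among these special identifiers can be sketched as a small decision function. This Python example is only one possible reading of the text above: the function `choose_identifiers` and its `max_listed` threshold are hypothetical, since the standard leaves the judgment of what is "practical" to the application.

```python
SPECIAL_CODES = {
    "mul": "multiple languages",
    "und": "undetermined",
    "zxx": "no linguistic content",
}


def choose_identifiers(languages, has_linguistic_content=True, max_listed=3):
    """Pick the identifiers to record for an item.

    languages: identifiers of the languages found in the item (may be empty).
    has_linguistic_content: False for items such as instrumental music.
    max_listed: assumed application-specific limit on how many languages
        it is practical to list individually.
    """
    if not has_linguistic_content:
        return ["zxx"]
    if not languages:
        # Language is present but could not be identified.
        return ["und"]
    if len(languages) > max_listed:
        return ["mul"]
    return list(languages)
```

For instance, with the default threshold an item in English and French would keep both identifiers, while an item in four or more languages would be recorded as `mul`.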