# Introduction to Two-level Phonology[1]

Evan L. Antworth

May 1991

Two-level phonology is a linguistic tool developed by computational linguists. Its primary use is in systems for natural language processing such as PC-KIMMO, a program recently been published by SIL (Antworth 1990). This article describes the linguistic and computational basis of two-level phonology.[2]

## Computational and linguistic roots

As the fields of computer science and linguistics have grown up together during
the past several decades, they have each benefited from cross-fertilization.
Modern linguistics has especially been influenced by the formal language theory
that underlies computation. The most famous application of formal language
theory to linguistics was Chomsky's (1957) transformational generative grammar.
Chomsky's strategy was to consider several types of formal languages to see if
they were capable of modeling natural language syntax. He started by
considering the simplest type of formal languages, called *finite state*
languages. As a general principle, computational linguists try to use the least
powerful computational devices possible. This is because the less powerful
devices are better understood, their behavior is predictable, and they are
computationally more efficient. Chomsky (1957:18ff) demonstrated that natural
language syntax could not be effectively modeled as a finite state language;
thus he rejected finite state languages as a theory of syntax and proposed that
syntax requires the use of more powerful, non-finite state languages. However,
there is no reason to assume that the same should be true for natural language
phonology. A finite state model of phonology is especially desirable from the
computational point of view, since it makes possible a computational
implementation that is simple and efficient.
While various linguists proposed that generative phonological rules could be
implemented by finite state devices (see Johnson 1972, Kay 1983), the most
successful model of finite state phonology was developed by Kimmo Koskenniemi,
a Finnish computer scientist. He called his model two-level morphology
(Koskenniemi 1983). His use of the term morphology should be understood to
encompass both what linguists would consider morphology proper (the
decomposition of words into morphemes) and phonology (at least in the sense of
morphophonemics). Koskenniemi's motivation for developing the two-level model
was eminently practical. Finnish is a highly agglutinative language in which
words can have thousands of inflected forms. Natural language processing
systems for Finnish could get nowhere without first parsing its morphology.
This is in contrast to English, whose relatively impoverished inflectional
morphology can be handled in an ad hoc fashion.

Koskenniemi's two-level model comprises two components:

- a rules component, which contains phonological rules represented as finite state devices, and
- a lexical component, or lexicon, which lists lexical items (indivisible words and morphemes) in their underlying forms, and encodes morphotactic constraints.

The two components work together to perform both generation
(production) and recognition (parsing) of word forms.
Our main interest in this article is the phonological formalism used by the
two-level model, hereafter called *two-level phonology*. Two-level
phonology traces its linguistic heritage to `classical' generative phonology as
codified in *The Sound Pattern of English* (Chomsky and Halle 1968). The
basic insight of two-level phonology is due to the phonologist C. Douglas
Johnson (1972,) who showed that the *SPE* theory of phonology could be
implemented using finite state devices by replacing sequential rule application
with simultaneous rule application. At its core, then, two-level phonology is a
rule formalism, not a complete theory of phonology. The following sections of
this article describe the mechanism of two-level rule application by
contrasting it with rule application in classical generative phonology. It
should be noted that Chomsky and Halle's theory of rule application became the
focal point of much controversy during the 1970s with the result that current
theories of phonology differ significantly from classical generative phonology.
The relevance of two-level phonology to current theory is an important issue,
but one that will not be fully addressed here. Rather, the comparison of
two-level phonology to classical generative phonology is done mainly for
expository purposes, recognizing that while classical generative phonology has
been superseded by subsequent theoretical work, it constitutes a historically
coherent view of phonology that continues to influence current theory and
practice.

One feature that two-level phonology shares with classical generative phonology is linear representation. That is, phonological forms are represented as linear strings of symbols. This is in contrast to the nonlinear representations used in much current work in phonology, namely autosegmental and metrical phonology (see Goldsmith 1990). On the computational side, two-level phonology is consistent with natural language processing systems that are designed to operate on linear orthographic input.

## Two-level rule application

We will begin by reviewing the formal properties of generative rules. Stated
succinctly, generative rules are *sequentially ordered rewriting rules.*
What does this mean?
First, *rewriting* rules are rules that change or transform one symbol
into another symbol. For example, a rewriting rule of the form *a -> b*
interprets the relationship between the symbols *a* and *b* as a
dynamic change whereby the symbol *a* is rewritten or turned into the
symbol *b*. This means that after this operation takes place, the symbol
*a* no longer `exists,' in the sense that it is no longer available to
other rules. In linguistic theory generative rules are known as process rules.
Process rules attempt to characterize the relationship between levels of
representation (such as the phonemic and phonetic levels) by specifying how to
transform representations from one level into representations on the other
level.

Second, generative phonological rules apply *sequentially,* that is, one
after another, rather than applying simultaneously. This means that each rule
creates as its output a new intermediate level of representation. This
intermediate level then serves as the input to the next rule. As a consequence,
the underlying form becomes inaccessible to later rules.

Third, generative phonological rules are *ordered;* that is, the
description specifies the sequence in which the rules must apply. Applying
rules in any other order may result in incorrect output.

As an example of a set of generative rules, consider the following rules:

Vowel Raising 1. e -> i / ___C0i Palatalization 2. t -> c / ___i

Rule 1 (Vowel Raising) states that *e* becomes (is rewritten as) *i* in
the environment preceding *Ci *(where *C* stands for the set of
consonants and C0 stands for zero or more consonants). Rule 2 (Palatalization)
states that *t* becomes *c* preceding *i*. A sample derivation
of forms to which these rules apply looks like this (where UR stands for
Underlying Representation, SR stands for Surface Representation):[3]

UR: temi Rule 1: timi Rule 2: cimi SR: cimi

Notice that in addition to the underlying and surface levels, an intermediate level
has been created as the result of sequentially applying rules 1 and 2. The
application of rule 1 produces the intermediate form *timi*, which then
serves as the input to rule 2.
Not only are these rules sequential, they are ordered, such that rule 1 must
apply before rule 2. Rule 1 has a *feeding* relationship to rule 2; that
is, rule 1 increases the number of forms that can undergo rule 2 by creating
more instances of *i*. Consider what would happen if they were applied in
the reverse order. Given the input form *temi*, rule 2 would do nothing,
since its environment is not satisfied. Rule 1 would then apply to produce the
incorrect surface form *timi*.

Two-level rules differ from generative rules in the following ways. First, whereas generative rules apply in a sequential order, two-level rules apply simultaneously, which is better described as applying in parallel. Applying rules in parallel to an input form means that for each segment in the form all of the rules must apply successfully, even if only vacuously.

Second, whereas sequentially applied generative rules create intermediate levels of derivation, simultaneously applied two-level rules require only two levels of representation: the underlying or lexical level and the surface level. There are no intermediate levels of derivation. It is in this sense that the model is called two-level.

Third, whereas generative rules relate the underlying and surface levels by
rewriting underlying symbols as surface symbols, two-level rules express the
relationship between the underlying and surface levels by positing direct,
static correspondences between pairs of underlying and surface symbols. For
instance, instead of rewriting underlying *a* as surface *b,* a
two-level rule states that an underlying *a* corresponds to a surface
*b* . The two-level rule does not change *a* into *b*, so
*a* is available to other rules. In other words, after a two-level rule
applies, both the underlying and surface symbols still `exist.'

Fourth, whereas generative rules have access only to the current intermediate
form at each stage of the derivation, two-level rules have access to both
underlying and surface environments. Generative rules cannot `look back' at
underlying environments or `look ahead' to surface environments. In contrast,
the environments of two-level rules are stated as lexical-to-surface
correspondences. This means that a two-level rule can easily refer to an
underlying *a* that corresponds to a surface *b,* or to a surface
*b* that corresponds to an underlying *a*. In generative phonology,
the interaction between a pair of rules is controlled by requiring that they
apply in a certain sequential order. In two-level phonology, rule interactions
are controlled not by ordering the rules but by carefully specifying their
environments as strings of two-level correspondences.

Fifth, whereas generative, rewriting rules are unidirectional (that is, they
operate only in an underlying to surface direction), two-level rules are
*bidirectional.* Two-level rules can operate either in an underlying to
surface direction (generation mode) or in a surface to underlying direction
(recognition mode). Thus in generation mode two-level rules accept an
underlying form as input and return a surface form, while in recognition mode
they accept a surface form as input and return an underlying form. The
practical application of bidirectional phonological rules is obvious: a
computational implementation of bidirectional rules is not limited to
generation mode to produce words; it can also be used in recognition direction
to parse words.

## Two-level rules and declarative representation

Two-level rules are not process rules like generative rules but more like the realization rules of stratificational linguistics. The linguistic opposition between process rules and realization rules is mirrored in computer science in the opposition between imperative and declarative programming. A typical imperative programming language is Pascal, while Prolog is an example of a declarative language. An imperative program is an operation that transforms input data objects into the desired output objects. In contrast, a declarative program merely expresses what must be true of the relationship between the input objects and output objects. When writing an imperative program, the programmer must specify an ordered sequence of commands that the computer will execute in order to arrive at the correct result. But when writing a declarative program, the programmer merely states constraints among the data objects, leaving it up to the computer to figure out what operations are needed to get output that is consistent with the constraints.

A significant consequence of declarative programming is that programs in a declarative language such as Prolog can run bidirectionally. For example, consider the problem of converting Fahrenheit temperatures to Celsius temperatures, and vice-versa. An imperative program that does these operations must contain two separate procedures: one to convert Fahrenheit to Celsius and another to convert Celsius to Fahrenheit. A declarative program, however, will simply state the relationship between Fahrenheit and Celsius equivalents in such a way that a single function can accept as input a Fahrenheit temperature and return as output the Celsius equivalent or accept a Celsius temperature and return a Fahrenheit temperature. Thus many relationships are more appropriately represented by a declarative formalism than an imperative one. Two-level phonology, then, permits phonological rules to be implemented declaratively as static, two-level rules, rather than imperatively as dynamic, process rules.

## How a two-level description works

To understand how a two-level phonological description works, we will use the
example given above involving Raising and Palatalization. The two-level model
treats the relationship between the underlying form *temi* and the surface
form *cimi* as a direct, symbol-to-symbol correspondence:

UR: t e m i SR: c i m i

Each pair of lexical and surface symbols is a *correspondence pair*. We refer
to a correspondence pair with the notation <*underlying
symbol>*:<*surface symbol>*, for instance *e:i* and
*m:m*. There must be an exact one-to-one correspondence between the
symbols of the underlying form and the symbols of the surface form. Deletion
and insertion of symbols (explained in detail in the next section) is handled
by positing correspondences with zero, a null segment. The two-level model uses
a notation for expressing two-level rules that is similar to the notation
linguists use for phonological rules. Corresponding to the generative rule for
Palatalization (rule 2 above), here is the two-level rule for the *t:c*
correspondence:

Palatalization 3. t:c <=> ___ @:i

This rule is a statement about the distribution of the pair *t:c* on the left
side of the arrow with respect to the context or environment on the right side
of the arrow. A two-level rule has three parts: the *correspondence,* the
*operator,* and the *environment*. The correspondence part of rule 3
is the pair *t:c*, which is the correspondence that the rule sanctions.
The operator part of rule 3 is the double-headed arrow. It indicates the nature
of the logical relationship between the correspondence and the environment
(thus it means something very different from the rewriting arrow -> of
generative phonology). The <=> arrow is equivalent to the biconditional
operator of formal logic and means that the correspondence occurs always and
only in the stated context; that is, *t:c *is allowed if and only if it is
found in the context* ___@:i*. In short, rule 3 is an obligatory rule. The
environment part of rule 3 is everything to the right of the arrow. The long
underline indicates the gap where the pair *t:c* occurs. Notice that even
the environment part of the rule is specified as two-level correspondence pairs.
The environment part of rule 3 requires further explanation. Instead of using a
correspondence such as *i:i*, it uses the correspondence *@:i*. The
*@* symbol is a special `wildcard' symbol that stands for any phonological
segment included in the description. In the context of rule 3, the
correspondence *@:i* stands for all the feasible pairs in the description
whose surface segment is *i*, in this case *e:i* and *i:i*. Thus
by using the correspondence *@:i*, we allow Palatalization to apply in the
environment of either a lexical *e* or lexical *i*. In other words,
we are claiming that Palatalization is sensitive to a surface (phonetic)
environment rather than an underlying (phonemic) environment. Thus rule 3 will
apply to both underlying forms *timi* and *temi* to produce a surface
form with an initial *c*.

Corresponding to the generative rule for Raising (rule 1 above) is the
following two-level rule for the *e:i *correspondence:

Vowel Raising 4. e:i <=> ___ C:C* @:i

(The asterisk in *C:C* *indicates zero or more instances of the correspondence
*C:C) *Similar to rule 3 above, rule 4 uses the correspondence *@:i*
in its environment. Thus rule 4 states that the correspondence *e:i*
occurs preceding a surface *i*, regardless of whether it is derived from a
lexical *e* or *i*. Why is this necessary? Consider the case of an
underlying form such as *pememi*. In order to derive the surface form
*pimimi, *Raising must apply twice: once before a lexical *i* and
again before a lexical *e*, both of which correspond to a surface
*i.* Thus rule 4 will apply to both instances of lexical* e*,
capturing the regressive spreading of Raising through the word.
By applying rules 3 and 4 in parallel, they work in consort to produce the
right output. For example,

UR: t e m i | | | | Rules: 3 4 | | | | | | SR: c i m i

Conceptually,
a two-level phonological description of a data set such as this can be
understood as follows. First, the two-level description declares an alphabet of
all the phonological segments used in the data in both underlying and surface
forms, in the case of our example, *t, m, c, e, *and *i. *Second, the
description declares a set *feasible pairs,* which is the complete set of
all underlying-to-surface correspondences of segments that occur in the data.
The set of feasible pairs for these data is the union of the set of *default
correspondences,* whose underlying and surface segments are identical
(namely *t:t, m:m, e:e, *and *i:i*) and the set of *special
correspondences,* whose underlying and surface segments are different
(namely *t:c* and *e:i*). Notice that since the segment *c* only
occurs as a surface segment in the feasible pairs, the description will
disallow any underlying form that contains a *c.*

A minimal two-level description, then, consists of nothing more than this
declaration of the feasible pairs. Since it contains all possible
underlying-to-surface correspondences, such a description will produce the
correct output form, but because it does not constrain the environments where
the special correspondences can occur, it will also allow many incorrect output
forms. For example, given the underlying form *temi,* it will produce the
surface forms *temi, timi, cemi, *and *cimi,* of which only the last
is correct.

Third, in order to restrict the output to only correct forms, we include rules
in the description that specify where the special correspondences are allowed
to occur. Thus the rules function as constraints or filters, blocking incorrect
forms while allowing correct forms to pass through. For instance, rule 3
(Palatalization) states that a lexical *t *must be realized as a surface
*c *when it precedes *@:i; *thus, given the underlying form
*temi* it will block the potential surface output forms *timi
*(because the surface sequence *ti* is prohibited) and* cemi
*(because surface *c* is prohibited before anything except surface
*i*). Rule 4 (Raising) states that a lexical *e *must be realized as
a surface *i* when it precedes the sequence *C:C @:i; *thus, given
the underlying form *temi* it will block the potential surface output
forms *temi *and *cemi *(because the surface sequence *emi* is
prohibited). Therefore of the four potential surface forms, three are filtered
out; rules 3 and 4 leave only the correct form *cimi*.

Two-level phonology facilitates a rather different way of thinking about phonological rules. We think of generative rules as processes that change one segment into another. In contrast, two-level rules do not perform operations on segments, rather they state static constraints on correspondences between underlying and surface forms. Generative phonology and two-level phonology also differ in how they characterize relationships between rules. Rules in generative phonology are described in terms of their relative order of application and their effect on the input of other rules (the so-called feeding and bleeding relations). Thus the generative rule 1 for Raising precedes and feeds rule 2 for Palatalization. In contrast, rules in the two-level model are categorized according to whether they apply in lexical versus surface environments. So we say that the two-level rules for Raising and Palatalization are sensitive to a surface rather than underlying environment.

## With zero you can do (almost) anything

Phonological processes that delete or insert segments pose a special challenge
to two-level phonology. Since an underlying form and its surface form must
correspond segment for segment, how can segments be deleted from an underlying
form or inserted into a surface form? The answer lies in the use of the special
null symbol *0* (zero). Thus the correspondence *x:0* represents the
deletion of *x,* while *0:x* represents the insertion of *x*.
(It should be understood that these zeros are provided by rule application
mechanism and exist only internally; that is, zeros are not included in input
forms nor are they printed in output forms.) As an example of deletion,
consider these forms from Tagalog (where *+* represents a morpheme
boundary):

UR: m a n + b i l i SR: m a m 0 0 i l i

Using
process terminology, these forms exemplify phonological coalescence, whereby
the sequence *nb* becomes *m.* Since in the two-level model a
sequence of two underlying segments cannot correspond to a single surface
segment, coalescence must be interpreted as simultaneous assimilation and
deletion. Thus we need two rules: an assimilation rule for the correspondence
*n:m* and a deletion rule for the correspondence *b:0* (note that the
morpheme boundary *+* is treated as a special symbol that is always
deleted).

Nasal Assimilation 5. n:m <=> ___ +:0 b:@ Deletion 6. b:0 <=> @:m +:0 ___

Notice
the interaction between the rules: Nasal Assimilation occurs in a lexical
environment, namely a lexical *b *(which can correspond to either a
surface *b* or *0*), while Deletion occurs in a surface environment,
namely a surface *m *(which could be the realization of either a lexical
*n* or *m*). In this way the two rules interact with each other to
produce the correct output.
Insertion correspondences, where the lexical segment is *0*, enable one to
write rules for processes such as stress insertion, gemination, infixation, and
reduplication. For example, Tagalog has a verbalizing infix *<um>*
that attaches between the first consonant and vowel of a stem; thus the infixed
form of *bili* is *bumili*. To account for this formation with
two-level rules, we represent the underlying form of the infix *<um>
*as the prefix *X+,* where *X* is a special symbol that has no
phonological purpose other than standing for the infix. We then write a rule
that inserts the sequence *um* in the presence of *X+,* which is
deleted. Here is the two-level correspondence:

UR: X + b 0 0 i l i SR: 0 0 b u m i l i

and
here is the two-level rule, which simultaneously deletes *X* and inserts
*um*:

Infixation 7. X:0 <=> ___ +:0 C:C 0:u 0:m V:V

These examples involving deletion and insertion show that the invention of zero is just as important for phonology as it was for arithmetic. Without zero, two-level phonology would be limited to the most trivial phonological processes; with zero, the two-level model has the expressive power to handle complex phonological or morphological phenomena (though not necessarily with the degree of felicity that a linguist might desire).

## Two-level phonology as a linguistic tool

Shieber (1986) describes two classes of linguistic formalisms: linguistic
*tools* and linguistic *theories*. A linguistic tool is used to
describe natural languages. A linguistic theory, on the other hand, is intended
to define the class of possible natural languages. From this point of view,
two-level phonology is best regarded as a linguistic tool rather than a theory.
Its job is to provide the expressive power needed to describe the phonological
phenomena of natural languages. Issues such as characterizing the class of
possible natural language phonologies, constraining possible analyses, and
evaluating competing descriptions must be resolved by the theory which the tool
serves.
As described below, two-level phonology has been used to build PC-KIMMO, a
computational system for producing and recognizing words. But PC-KIMMO is not a
linguistic theory either (though it is modeled on linguistic concepts), rather
it is a practical application for natural language processing. Thus it is
inappropriate to compare PC-KIMMO with, say, the theory of Lexical Phonology.
However, two-level phonology *per se *is not inconsistent with a theory
such as Lexical Phonology. While this article has described the two levels of
two-level phonology as corresponding to the underlying and surface levels of
classical generative phonology, the general point to understand is that the two
levels can actually be any two levels as defined by a certain linguistic
theory. For example, Lexical Phonology does not have a single underlying level
and a single surface level. Rather, the model allows multiple, ordered
morphological levels. At each level, morphological rules such as affixation are
applied accompanied by the application of the phonological rules relevant to
that level (this summary leaves out many important details; for an overview of
Lexical Phonology see Kaisse and Shaw 1985 or Kroeger 1990). So on each
morphological level, phonological rules apply to `underlying' forms and produce
`surface' forms, which are then fed into the next morphological level. These
phonological rules could be implemented as two-level rules. It is in this sense
that two-level phonology can be used as a tool to computationally implement a
linguistic theory.

## Doing two-level phonology on a computer

Earlier in this article two-level phonology was described as a type of finite
state phonology. The importance of this observation lies in the fact that
finite state devices can be effectively constructed on a computer. Various
computer implementations of Koskenniemi's two-level model have been done, but
they have all required large, expensive computers. In order to bring the power
of the two-level model to individual linguists who do not have access to a
large computer, SIL has recently released PC-KIMMO, a computer program that
runs the two-level model on personal computers, namely IBM PC compatibles and
the Apple Macintosh. It is named after Kimmo Koskenniemi, the originator of the
two-level model. The program is included with the book entitled *PC-KIMMO: A
Two-level Processor for Morphological Analysis* (Antworth 1990). The book is
a tutorial on developing two-level descriptions with PC-KIMMO. It teaches how
to write two-level rules in the notation used above and then how to translate
them into finite state tables, which is the notation the computer actually
uses. For example, rules 3 and 4 above translate into the following tables:

Rule 3: Palatalization t t @ @ c @ i @ 1: 3 2 1 1 2: 3 2 0 1 3. 0 0 1 0 Rule 4: Vowel Raising e e C @ @ i @ C i @ 1: 4 2 1 1 1 2: 4 2 3 1 1 3: 4 2 1 0 1 4. 0 0 5 0 0 5. 0 0 0 1 0

Describing what these tables mean and how to construct them is beyond the scope of this article. Suffice it to say that while an ordinary, working linguist can learn to translate two-level rules into finite state tables, it does require motivation and a commitment of time. And what practical uses does PC-KIMMO have? Here are two:

- Field linguists can use PC-KIMMO as a tool for developing and testing phonological and morphological descriptions.
- Applications based on PC-KIMMO can be developed that will morphologically analyze text in preparation for interlinear glossing or dialect adaptation.

Neither two-level phonology nor PC-KIMMO is the ultimate answer to the challenges of phonological description or computational word parsing. While phonological theory has advanced beyond the classical generative theory that two-level phonology grew out of, two-level phonology is still consistent with many generally accepted and widely practised views of phonology. In addition, its formalism for rule application provides an alternative to generative rule application that can be computationally implemented in practical natural language processing systems.

## References

Antworth, Evan L. 1990.*PC-KIMMO: a two-level processor for morphological analysis.*Occasional Publications in Academic Computing No. 16. Dallas, TX: Summer Institute of Linguistics.

Chomsky, Noam. 1957. *Syntactic structures.* The Hague: Mouton.

_____, and Morris Halle. 1968. *The sound pattern of English.* New
York: Harper and Row.

Goldsmith, John A. 1990. *Autosegmental and metrical phonology.
*Oxford and Cambridge, MA: Basil Blackwell.

Johnson, C. Douglas. 1972. *Formal aspects of phonological description.
*The Hague: Mouton.

Kaisse, Ellen M. and Patricia A. Shaw. 1985. On the theory of Lexical
Phonology. *Phonology Yearbook *2:1-30.

Kay, Martin. 1983. When meta-rules are not meta-rules. In Karen Sparck
Jones and Yorick Wilks, eds., *Automatic natural language parsing,*
94-116. Chichester: Ellis Horwood Ltd. See pages 100-104.

Koskenniemi, Kimmo. 1983. *Two-level morphology: a general computational
model for word-form recognition and production.* Publication No. 11.
Helsinki: University of Helsinki Department of General Linguistics.

Kroeger, Paul. 1990. Lexical Phonology and the rebirth of the phoneme.
*Notes on Linguistics *50: 11-24.

Shieber, Stuart M. 1986. *An introduction to unification-based approaches
to grammar. *CSLI Lecture Notes No. 4. Stanford, CA: Center for the Study of
Language and Information.

[1]This article appeard in *Notes on Linguistics *No.
53, May 1991, pp. 4-18, Copyright 1991 SIL, Inc.

[2]I would like to thank those who read and commented on drafts of this paper: Gary Simons, Stuart Milliken, and David Payne.

[3]This made-up example is used for expository purposes. To
make better phonological sense, the forms should have internal morpheme
boundaries, for instance *te+mi *(otherwise there would be no basis for
positing an underlying *e)*. See the section below on the use of zero to
see how morpheme boundaries are handled.