Hermit Crab Parsing Engine Specification

Mike Maxwell

24 February 1999

1. Introduction

The morpher/ lexical lookup module is also referred to as the "morpher module" in this specification. Its function is to analyze each word of the input into a stem plus possible affixes. Conceptually, this is done by applying morphological and (morpho-)phonological rules in analysis order (i.e. the reverse of the order linguists usually think of) until the morpher discovers a string matching the lexical entry of some stem in the user's dictionary. The rules are applied in this reverse order in as many ways as possible to generate all possible analyses of each word. Each lexical entry discovered in this way is then acted on by the rules in synthesis order, so that criteria which are more conveniently tested when the lexical entry is known can be checked. (The algorithm assumed here is thus a generate-and-test algorithm.) The output is the set of analyses, in the form of lexical entries for the input word.
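The generate-and-test idea can be illustrated with a toy sketch in Python. Nothing here is Hermit Crab's actual interface: the names LEXICON, RULES, candidates, synthesize and analyze are hypothetical, forms are plain strings rather than phonetic feature matrices, and the only "rules" are suffixes that require a part of speech.

```python
# Hypothetical real entries: shape -> part of speech.
LEXICON = {"walk": "V", "cat": "N"}
# Hypothetical suffixation rules: suffix -> part of speech required of the input.
RULES = {"ed": "V", "s": "N"}

def candidates(word):
    """Analysis phase: undo suffix rules in every possible way."""
    agenda = [(word, [])]
    while agenda:
        form, undone = agenda.pop()
        yield form, undone
        for suffix in RULES:
            if form.endswith(suffix) and len(form) > len(suffix):
                agenda.append((form[: -len(suffix)], [suffix] + undone))

def synthesize(stem, suffixes):
    """Synthesis phase: reapply the rules, testing conditions that need the entry."""
    pos, form = LEXICON[stem], stem
    for suffix in suffixes:
        if RULES[suffix] != pos:
            return None  # the part of speech is only known after lookup
        form += suffix
    return form

def analyze(word):
    """Keep the candidates whose stem is listed and whose re-synthesis yields the word."""
    return [(stem, suffixes) for stem, suffixes in candidates(word)
            if stem in LEXICON and synthesize(stem, suffixes) == word]
```

In this sketch the analysis phase over-generates (it strips "ed" from the made-up form "cated" without knowing that "cat" is a noun), and the synthesis phase rejects the candidate once the lexical entry is known.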

The user is free to provide lexical entries for roots, stems, or partially or completely inflected/ derived words. Because of this freedom on the part of the user to provide both inflected and uninflected lexical entries, the lexical entries into which the morpher module analyzes input words are of one of two types: real entries, and virtual entries. A real lexical entry is one which the user has listed in the dictionary, while a virtual entry is one which the morpher has constructed from a dictionary entry plus one or more affixes.

The dictionary is then the repository of all real (as opposed to virtual) lexical entries. Since the dictionary is potentially very large, it may not be stored in the lexical module itself, but may be a separate module (perhaps a database program).

Regardless of whether the dictionary is actually internal to the morpher or not, the morpher may handle access to the lexical entries of the dictionary. That is, the morpher may serve as the front end to the dictionary. Dictionary commands are therefore listed together with other morpher commands in the following specification.

2. Linguistic Characteristics of the Morpher

This section describes the linguistic characteristics of the morpher module in general terms. Succeeding sections provide a more rigorous definition of these capabilities.

Morphological and phonological rules are discussed in this specification from the viewpoint of the linguist. That is, the "input" and "output" of rules are seen from the viewpoint of the generation of surface forms from underlying forms. (However, the term "input to the morpher module" refers to the unanalyzed tokens read in by the morpher, while the term "output of the morpher" refers to the lexical entries written out by the morpher.)

The morpher may be used to model either an Item-and-Process theory or an Item-and-Arrangement theory.

2.1. Cyclicity, Strata, and Ordering

The user may define various strata of rule application, where a "stratum" of rules refers to a set of rules which apply in a block, before or after the application of rules of other strata.

A morphological rule applies in just one stratum, while a phonological rule may apply in more than one stratum. Which stratum (or strata) a given rule applies in is designated by the user, as is the order of application of the various strata.

Linguistic theories may vary in the number of strata they assume. A structuralist theory, for instance, might have a stratum of allophonic phonological rules and another stratum of morphological and morphophonemic rules. The theory of The Sound Pattern of English (Chomsky and Halle 1968, henceforth SPE), on the other hand, assumes that morphological and phonological rules exist in at least two strata, a cyclic stratum and a postcyclic stratum. (Some generative phonologists would propose a stratum of precyclic rules as well.)

Within each stratum, the user (or the shell) may define several types of rule interaction, including cyclic and non-cyclic application. (Cyclic application, as implemented by Hermit Crab, is not precisely the same as that described in SPE. Under Hermit Crab, each cycle of phonological rules applies immediately after each morphological rule, not after all the morphological rules of the cyclic stratum have applied. If a morphological rule is sensitive to the phonetic form of the word to which it attaches, this leaves open the possibility that a preceding cycle of phonological rules will feed or bleed that morphological rule.) Cyclic phonological rules, in addition to applying as a block after each application of a cyclic morphological rule, are constrained by Kiparsky's Strict Cycle Condition (see below, Cyclic Phonological Rules, 2.3.2).

Within each stratum, morphological rules may be specified as being ordered in a linear fashion, or as being unordered (i.e. as potentially applying whenever their structural description is met). Similarly, phonological rules may be specified as being linearly ordered, as applying whenever their structural description is met, or as applying simultaneously (the latter option being unique to phonological rules). If linear order is specified for morphological and/or phonological rules, the relative ordering of individual rules must be specified.
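The difference between linear and simultaneous application can be seen in a small Python sketch. It assumes context-free rules that rewrite a single character, which is far simpler than the rules this specification describes; the function names are hypothetical.

```python
def apply_linear(form, rules):
    """Apply each rule in turn; later rules see the output of earlier ones."""
    for target, replacement in rules:
        form = form.replace(target, replacement)
    return form

def apply_simultaneous(form, rules):
    """Apply all rules at once; every rule sees only the original form."""
    table = dict(rules)
    return "".join(table.get(segment, segment) for segment in form)

RULES = [("a", "b"), ("b", "c")]  # hypothetical rules: a -> b, then b -> c
```

With these two rules and the input "ab", linear application lets the first rule feed the second, while simultaneous application does not.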

Finally, subsets of the phonological rules in a given stratum may be specified as applying disjunctively. Within such a set of rules, the order is linear; and as soon as one such rule has applied once, no other rule in the set may apply to the same position in the phonetic shape of the lexical entry (except that in a cyclic stratum, the entire set may be applied again on the next cycle, subject to the Strict Cycle Condition).

2.2. Morphological Rules

Morphological rules analyze the phonetic (or phonological) shape of their input (a lexical entry) into one or more substrings, and output a lexical entry whose phonetic shape is the concatenation of one or more phonological substrings. These output substrings may be copies of the original substrings, copies altered by the modification of designated features, or entirely new sequences of segments and boundary markers. A morphological rule may also change the syntactic feature content, part of speech, etc. of a lexical entry.

Morphological Rules have one or more subrules, which apply disjunctively to a given form: the first subrule to match a given form is the only subrule which can apply. This mechanism can be used to encode variant forms of a rule whose application depends on the phonetic form of their input (e.g. English pluralization), conjugation class membership (e.g. Spanish verb classes), etc.
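Disjunctive subrule application can be sketched in Python as follows. This is an illustration only: the name apply_disjunctive is hypothetical, the conditions are regular expressions over spelling rather than phonetic feature matrices, and the plural conditions are deliberately simplified.

```python
import re

# Hypothetical subrules for English pluralization, tried in the order listed.
# Each pair is (condition on the input form, suffix to attach).
PLURAL_SUBRULES = [
    (re.compile(r"(s|z|x|ch|sh)$"), "es"),  # sibilant-final stems
    (re.compile(r""), "s"),                  # elsewhere case: matches any form
]

def apply_disjunctive(form, subrules):
    """Apply only the first subrule whose condition matches the form."""
    for condition, suffix in subrules:
        if condition.search(form):
            return form + suffix
    return None  # no subrule matched, so the rule does not apply
```

Because the search stops at the first match, a sibilant-final stem never reaches the elsewhere subrule.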

Sequences of phonetic segments in a morphological rule are specified in terms of their phonetic features. These sequences are matched against a translation into phonetic features of the string representing the phonetic shape of the rule’s input.

A morphological rule may require or prohibit the presence of Morphological Rule Features or syntactic features; may require that the input belong to a certain part of speech; and may require that the input have certain syntactic subcategorization properties.

2.2.1. Affix Types

All the following affix types can be analyzed by the morpher: prefixes, suffixes, circumfixes, infixes, suprafixes, replacives, reduplication, and null affixes.

However, care should be taken in writing null affixation rules, lest they cause the morpher to loop infinitely. For instance, if a language had a null affixation rule that derived nouns from verbs, and another null affixation rule that derived verbs from nouns, with no further stipulation the morpher could enter an infinite regress of deriving nouns from verbs and vice versa. Such looping can be prevented by the use of assigned and prohibited features.
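One way to picture the use of assigned and prohibited features is the Python sketch below. The Entry and NullRule classes and the feature name "zero" are hypothetical, and the sketch runs in the synthesis direction only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entry:
    shape: str
    pos: str
    mpr: frozenset = frozenset()  # MPR features present on the entry

@dataclass(frozen=True)
class NullRule:
    in_pos: str
    out_pos: str
    prohibited: str  # MPR feature whose presence blocks the rule
    assigned: str    # MPR feature added to the output

    def apply(self, entry):
        if entry.pos != self.in_pos or self.prohibited in entry.mpr:
            return None
        return Entry(entry.shape, self.out_pos, entry.mpr | {self.assigned})

# Both null rules assign and prohibit the same hypothetical feature.
V2N = NullRule("V", "N", prohibited="zero", assigned="zero")
N2V = NullRule("N", "V", prohibited="zero", assigned="zero")
```

Once either rule has applied, its output carries the feature "zero", so neither rule can apply to it again and the verb-to-noun-to-verb regress stops after one step.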

2.2.2. Reverse Application of Morphological Rules

As discussed in more detail below (see section 2.3.1, Reverse Application of Phonological Rules), when parsing, Hermit Crab operates part of the time in analysis mode, undoing rules, rather than applying rules to an underlying form (as linguists are accustomed to doing). When a rule specifies a change in the value of some feature (e.g. that a particular segment in the stem becomes voiced), Hermit Crab "undoes" this rule by leaving the value for voicing of that segment unspecified. This is because the original (underlying) value of that feature is unknown: the morphological rule may apply in the synthesis direction to one underlying form by changing the feature specification (in this case changing an underlying [– voiced] segment to [+ voiced]), while to another underlying form, the rule may apply vacuously. The original voicing contrast thus becomes neutralized in this context. When features which have become uninstantiated are referred to by another rule, Hermit Crab assumes the rule applies without actually instantiating the features in question to all possible combinations of values.

2.2.3. Cyclicity and Ordering

As mentioned above, a morphological rule may be assigned to any one stratum. Within each stratum, morphological rules may be specified as linearly ordered or as unordered (i.e. as applying whenever their structural description is met).

If the user has specified multiple strata, there is a linear order among those strata, and no morphological rule from an earlier stratum may apply after a rule from a later stratum. "Looping back" (as advocated in Halle and Mohanan 1985) is therefore not provided for. This can lead to problems. For example, a common analysis of English is that rules of the cyclic stratum precede rules of the post-cyclic stratum. For instance, –al is a cyclic suffix (in SPE terms, it is attached with a + boundary), while –ment is a post-cyclic suffix (in terms of the SPE system, it is attached with a # boundary). However, in the word developmental, the –al suffix attaches outside the –ment suffix, an impossibility if there is no looping back. In defense of the non-provision for looping back in Hermit Crab, it should be said that there is no clear answer in morphological theory to such ordering paradoxes. (An ad hoc solution is to regard –mental as a single post-cyclic suffix. Alternatively, development can be listed in the lexicon as a stem.)

A similar ordering paradox occurs across word boundaries in compound words like transformational grammarian—a phrase which refers to a person who studies transformational grammar, not a grammarian who undergoes transformations. Again, theory provides no simple answer, nor does Hermit Crab.

Hermit Crab also lacks any direct provision for orders of affixes (i.e. position classes, such as first order suffixes, second order suffixes, etc.). However, it is possible to enforce an order of affixes in two ways. One is by the use of features. For instance, first order affix rules could assign the pseudo-syntactic feature (level (one)), and second order affix rules could require the presence of the feature (level (one)), assigning the new feature value two to the feature level, resulting in the new feature-name feature-value pair (level (two)). If a third order suffix could attach outside either a second order suffix or a first order suffix, such a rule could require the presence of either a (level (two)) or a (level (one)) feature. More precisely, the rule would have among its Required Syntactic Features the feature (level (one two)), and would assign the feature (level (three)).
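The feature-based ordering can be sketched in Python as below. The helper make_affix_rule is hypothetical, the value of the feature level is modeled as a plain list, and the sketch only covers entries that already carry a value for level.

```python
def make_affix_rule(required, assigned):
    """Build a rule that requires one of the `required` level values
    and assigns the single value `assigned` to its output."""
    def rule(level):
        if not any(value in required for value in level):
            return None    # required feature is not unifiable: rule fails
        return [assigned]  # the output value overrides the input value
    return rule

second_order = make_affix_rule(["one"], "two")
third_order = make_affix_rule(["one", "two"], "three")
```

Under these assumptions a second order rule cannot apply outside a third order affix, while a third order rule can attach outside either lower order.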

A more straightforward way of assigning orders to affixes is by linearly ordering the morphological rules that attach the affixes. A rule attaching a first order suffix would be ordered before the rule attaching a second order suffix, etc.

2.3. Phonological Rules

Phonological rules are of the general form

X → Y / W __ Z

where X, Y, W and Z represent (possibly optional) sequences of phonetic segments (specified as phonetic feature matrices) and/ or boundary markers. These sequences cannot include any segments of morphemes which have not yet been attached by a morphological rule, i.e. which are outside of the "current" stem. The application of phonological rules may also be restricted by requiring the presence or absence of specified MPR features (morphological/ phonological rule features; see section 2.4) and/ or part of speech. Syntactic features (i.e. head and foot features) are invisible to phonological rules.

"Phonological rules", as here defined, encompass the allophonic and morpho-phonemic rules of structuralist linguistics, as well as the phonological rules of generative phonology.

Phonetic features are specified in phonological rules by a unique value (such as ‘+’ or ‘-’), or by alpha variables (as used e.g. in Chomsky and Halle 1968).

As mentioned above, phonological rules are allowed to apply at one or more strata. Within each stratum, phonological rules may be specified as being ordered linearly, or as applying simultaneously; disjunctive subsets of rules may also be defined.

2.3.1. Reverse Application of Phonological Rules

One peculiarity of Hermit Crab's application of phonological rules is due to its operation in analysis mode. Rather than starting with an underlying form and applying phonological and morphological rules to unambiguously synthesize a surface form, Hermit Crab begins with a known surface form, and attempts to analyze it into one or more underlying forms. As with morphological rules, phonological rules are neutralizing in the synthesis mode: they assign a value to a feature regardless of what the previous value (if any) of that feature may have been. In undoing the application of such a rule, i.e. applying it in an analysis manner, one doesn't know what feature values the affected segment had prior to the rule's application, at least until lexical lookup. The rule may have applied by changing the feature values in question, but it is equally possible the rule applied vacuously. When un-applying a phonological or morphological rule, Hermit Crab un-instantiates such features. When another rule needs to know the value of an un-instantiated feature, the morpher simply assumes that the un-instantiated feature has the required value. This may result in over-application of phonological rules, but any incorrect application will be discovered when the rules are applied in synthesis mode to looked-up lexical entries.

At any rate, a deeper rule will require the value of a feature which was uninstantiated by a shallower rule only in the case of opaque rule orderings, i.e. when one rule counterbleeds or counterfeeds another rule. For instance, consider two rules A → B / C __ D and C → E / F __ G, applying in that (synthesis) order, and suppose that there are at least some forms in which both rules will apply. When Hermit Crab undoes the application of the second (shallower) rule, it will uninstantiate the features of C. But when Hermit Crab comes to the first rule, the values of those features required by that rule are unknown, so Hermit Crab simply assumes that the rule applies. (For further background, see Maxwell 1991.)

The situation is more complex in the case of length-changing rules (e.g. deletion rules, epenthesis rules, and diphthongization rules). Due to the internal representation used by Hermit Crab, the morpher will need to explore two search paths: one for which the length-changing rule is unapplied, and one for which it is not. This is true regardless of the interaction of the length-changing rule with other rules.

When doing lexical lookup of a partially instantiated phonetic feature matrix, Hermit Crab looks for all lexical forms which would match the feature matrix, ignoring unspecified feature values. For instance, suppose a language has both front rounded and front unrounded vowels, and Hermit Crab has undone a rule which rounds vowels in some environment. Then given that the rule has been "un-applied" to a form with surface [ü], the morpher will create a high front vowel with unknown rounding, and attempt to find a lexical entry with either an [i] or an [ü] in that position. (If the phonological analysis which Hermit Crab were modeling in this example used archiphonemes, the morpher would also attempt to find a lexical entry with the appropriate archiphoneme in that position.)
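The front rounded vowel example can be sketched in Python. The three-feature inventory, the use of None for an un-instantiated feature, and the function names are assumptions made for illustration; they are not the morpher's internal representation.

```python
# Hypothetical vowel inventory described by three binary features.
SEGMENTS = {
    "i": {"high": "+", "back": "-", "round": "-"},
    "ü": {"high": "+", "back": "-", "round": "+"},
    "u": {"high": "+", "back": "+", "round": "+"},
}

def uninstantiate(matrix, feature):
    """Undo a rule that set `feature`: its earlier value is now unknown."""
    result = dict(matrix)
    result[feature] = None
    return result

def compatible(matrix):
    """Return the segments matching the matrix, ignoring unknown features."""
    return sorted(
        name for name, features in SEGMENTS.items()
        if all(value is None or features[feature] == value
               for feature, value in matrix.items())
    )
```

Un-applying the rounding rule to surface [ü] leaves a high front vowel of unknown rounding, which matches both [i] and [ü] but not [u].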

2.3.2. Cyclic Phonological Rules

Cyclic phonological rules apply once at the beginning of a cyclic stratum, and once after each application of a cyclic morphological rule; they are ordered among themselves as specified by the user. The morpher further constrains the application of cyclic phonological rules on all but the first cycle by Kiparsky's (1982) Strict Cycle Condition, given below:

Cyclic phonological rules apply only to derived representations.

A representation X is derived with respect to phonological rule R in cycle j iff X meets the structural analysis of R by virtue of a combination of morphemes introduced in cycle j or by virtue of the application of a previous cyclic phonological rule in cycle j (even if that application was vacuous).

2.3.3. Non-cyclic rules

Non-cyclic phonological rules are applied as a block after any applicable morphological rules of the same stratum have applied. Their order among themselves may be specified by the user.

2.3.4. Boundary Markers

Theories differ as to the number and kind of boundary markers they countenance. Hermit Crab makes no commitment to any of these theories, save that there is no provision for treating boundary markers as segments with features (as in Chomsky and Halle 1968).

Boundary markers are inserted as strings (not phonetic feature matrices).

Boundary markers in the phonetic shape of a lexical entry are ignored when matching that lexical entry against a phonological rule, unless the rule explicitly requires the boundary.
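A minimal Python sketch of this matching behavior follows. It assumes forms and patterns are strings of single-character symbols and that "+" and "#" are the only boundary markers; the function name matches_at is hypothetical.

```python
BOUNDARIES = {"+", "#"}  # assumed boundary markers for this sketch

def matches_at(pattern, form, start):
    """True if `pattern` matches `form` beginning at index `start`.

    Boundary markers in the form are skipped unless the pattern
    explicitly names a boundary at that point.
    """
    i = start
    for symbol in pattern:
        if symbol not in BOUNDARIES:
            # The pattern expects a segment: skip over any boundaries.
            while i < len(form) and form[i] in BOUNDARIES:
                i += 1
        if i >= len(form) or form[i] != symbol:
            return False
        i += 1
    return True
```

A pattern that mentions no boundary matches with or without an intervening boundary in the form, whereas a pattern that names a boundary matches only when that boundary is present.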

Boundary markers are erased at the end of each cycle and stratum.

Both morphological rules and phonological rules may insert boundary markers. However, the use of phonological rules to insert or alter boundary markers (i.e. readjustment rules) is discouraged, as it may lead to computational intractability.

2.3.5. Deletion Rules

In the absence of other restrictions, the fact that phonological rules can delete segments puts phonology into the domain of an unrestricted rewrite grammar. Since such a grammar would be impossible to parse, Hermit Crab places arbitrary (i.e. nonlinguistic) restrictions on deletion rules. (This is not to say that we have placed sufficient restrictions on such rules—a sufficiently ingenious linguist may still find some way of putting the morpher into an infinite loop, perhaps by including a deletion rule in a cyclic stratum!)

For the purposes of this discussion, a deletion rule is any rule which deletes part of its input, i.e. where the number of segments in the output of the rule is less than the number of segments in its input.

Understanding the arbitrary restriction that Hermit Crab uses requires an understanding of the way in which unapplication of deletion rules proceeds. Unlike all other rules, deletion rules are always unapplied as if they had been applied simultaneously. That is, during unapplication to a form X, X is scanned for all places where the deletion rule could be unapplied, and the rule is unapplied to those places, resulting in the new form X'. By default, that is the end of it; deletion rules cannot be unapplied again. However, if the user is brave, he can set the variable *del_re_apps* to some number greater than zero (its default); then the deletion rule is unapplied to X', and to X'', etc. *del_re_apps* times.

The above definition is couched in terms of simultaneous (un-)application of the deletion rule. However, if *del_re_apps* is set to a sufficiently large number, unapplication of a deletion rule will generate from a surface form all the underlying forms (and more) from which iterative application of the deletion rule might have generated the surface form (I think!).
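The effect of *del_re_apps* can be pictured with the Python sketch below. It assumes a deletion rule with a left context only, treats forms as strings, and returns each successive form X', X'', etc.; whether the morpher keeps every intermediate form as a candidate is an assumption of the sketch, as is the function name.

```python
import re

def unapply_deletion(form, context, deleted, del_re_apps=0):
    """Unapply a rule that deletes `deleted` after `context`.

    `context` is used as a regular expression. Each pass restores the
    deleted segment at every matching site at once (simultaneous
    unapplication); there are 1 + del_re_apps passes in total.
    """
    results = []
    for _ in range(1 + del_re_apps):
        form = re.sub(context, lambda m: m.group(0) + deleted, form)
        results.append(form)
    return results
```

For a hypothetical rule deleting h after a, the surface form "ta" yields only "tah" by default. With del_re_apps set to 1 it also yields "tahh", an underlying form from which iterative application of the deletion rule would produce the same surface form.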

2.4. Syntactic and Phonological/ Morphological Rule Features

Morphological/ phonological rule features (abbreviated as MPR features) and syntactic features are arbitrary features assigned by the user. MPR features govern the application of morphological and phonological rules, while syntactic features govern the application of morphological and syntactic rules. Syntactic features bear values, while MPR features do not bear values (i.e. if an MPR feature name appears on the MPR list of a given lexical entry, its value is implicitly +, while if it is absent, its value is implicitly –). Syntactic features include Head Features and Foot Features. However, this distinction is essentially invisible to the Morpher Module; morphological rules can assign features as either Head- or Foot-features in their output, but make no use of the distinction. (The distinction is, however, visible to the Parser Module.)

The value of a syntactic feature is a list (this is an extension of many theories, in which syntactic features are atomic valued; atomic valued features can be simulated by lists of length one). The interpretation of a list value whose length is greater than one is that the feature in question is ambiguous between (or among) the values listed.

Typical examples of features are tense (past present) (a syntactic feature-name feature-value pair) and verb_class_3 (an MPR feature).

A morphological rule may require that the syntactic features of the lexical entry which constitutes its input be unifiable with the features specified in the input of the rule. The rule may also require the presence or absence of specified MPR feature names.

A phonological rule may require the presence or absence of designated MPR features; syntactic features are invisible to phonological rules.

There are three ways in which features can become attached to a lexical entry. First, syntactic and MPR feature values may be assigned in the user's dictionary (i.e. lexically); syntactic feature assignments may later be changed by unification with specified features of a morphological rule’s input. Second, syntactic features have default values (see below); if a morphological rule calls for the unification of a specified feature with the value of that feature in a lexical entry, but the lexical entry has not received any values for that feature thus far in the derivation, then the unification of the rule’s feature specification with the default feature value becomes the new value of the feature in the lexical entry. Third, both syntactic and MPR features may be introduced in the output of a morphological rule. Note, however, that assignment of syntactic feature values in the output side of a morphological rule overrides feature values (if any) previously assigned to the lexical entry. (E.g. a rule may change a singular noun into a plural noun.)

There is no restriction on the meaning of features. For instance, the English suffix –ee is restricted to verbs which take animate direct or indirect objects: employee, *tearee. This restriction might be encoded with the ad hoc MPR feature AnimObj.

Feature value assignment, together with null affixation rules, allows Hermit Crab to distinguish between true null affixes, such as the plural marker on sheep, and optional affixes. That is, one analysis of English would hold that there are two words sheep: one singular, and one plural. The null affix pluralization rule for words like sheep, deer, antelope, reindeer, bison etc. would require that the value of the feature number of the input be unifiable with the value (singular), while the output would be assigned the feature number (plural). The lexical entry for the singular noun sheep (in the user's lexicon) would bear the feature number (singular). The surface string sheep would then be ambiguous between two lexical entries, one the singular noun sheep (listed in the user's lexicon), and the other the plural noun sheep (derived from the singular form by the rule of null affixation).

On the other hand, in a language in which the plural suffix was optional, the syntax will require that an unsuffixed word be unmarked (and therefore ambiguous) for the feature number (so as to support both singular and plural number agreement between the subject and the verb, for instance). Likewise, the morphological rule for plural affixation would require that the lexical entry which serves as its input be unifiable with the feature number (singular) (so as not to pluralize a noun already marked plural); the output of this rule would assign the feature number (plural). Under the system described in this chapter, if unsuffixed noun lexical entries bear no value for the feature number, they will be unifiable with the value (singular); i.e. a feature with no value serves as the identity feature under unification.

2.4.1. Default Feature Values

By default, any syntactic features not specifically assigned values are treated as having a maximal set of values for purposes of unification, i.e. the unification of A and B, where A is a feature with no values assigned and B is a feature of the same name with one or more values assigned, is B.
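The default behavior can be sketched in Python. The sketch assumes that the unification of two list values is the list of values they share and that an empty result signals failure; this section does not spell out either point, so both are assumptions, as is the use of None for a feature with no values assigned.

```python
def unify(a, b):
    """Unify two list-valued features of the same name.

    None stands for a feature with no values assigned, which behaves
    as the identity. An empty list signals that unification failed.
    """
    if a is None:
        return b
    if b is None:
        return a
    return [value for value in a if value in b]
```

For example, an unsuffixed noun with no value for number unifies with (singular), and an entry that is ambiguous between past and present is narrowed by a rule requiring present.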

The grammar writer may assign other default values to any feature names by use of the function assign_default_morpher_feature_value (see section 6.1.11). There is no provision for making default feature assignment dependent on part of speech or on the values of other features, although this is a possible future enhancement.

2.5. Exceptions

2.5.1. Irregular and Suppletive Forms

Irregular or semi-regular forms may be treated in two ways:

(1) By specifying morphological or phonological rules which only apply to (or which fail to apply to) forms marked with specified features (see section 2.4, Syntactic and Phonological/ Morphological Rule Features); and

(2) By listing irregular forms in the lexicon.

Method (1) might be used for verb classes that take different suffixes (e.g. Spanish –ar, –er and –ir verbs), while (2) might be used for a highly irregular verb, such as the English verb be.

However, it is not sufficient for the morpher to merely recognize irregular forms; it must also not analyze a given string as if it were the regular form of an irregular word. For instance, not only must the morpher recognize the English word saw as the past tense of see, it must not morph the English word seed as if it were a regular past tense of see. This situation is treated in terms of the blocking of the analyzed form by an irregular form listed in the lexicon (see section 3.4, Families of Lexical Entries). Blocking allows for words which are irregular in their phonology, morphology, or subcategorization (for arguments that a form can be irregular in its subcategorization, see Carlson and Roeper 1981).

2.5.2. Blocking of Affixes in Phonological Environments

A morphological rule may require that the stem to which it attaches have a certain phonetic form.

However, occasionally affixes will attach to any morpho-phonological form except a certain one. An example is the English suffix –al, which does not attach to a stem ending in the suffix –ism: *fatalismal (Aronoff 1976). Aronoff's solution is a negative phonological condition on the rule attaching –al: the stem must not be analyzable into a root + the suffix –ism.

Hermit Crab does not allow negative conditions on the phonological composition of stems, but this particular case could easily be handled by having the –ism suffixation rule assign the ad hoc MPR feature ISM, and having the –al suffixation rule forbid that feature. An alternative analysis of this case (suggested by Siegel 1974) is that the –ism rule is ordered after the –al rule. Either of these solutions would fail if the negative condition were purely phonological (which it is not in this case: cf. baptismal). It is not clear whether affixes in natural languages can have purely phonetic negative conditions on their attachment (but see Scalise 1986: 46-48 for some possible examples). At any rate, Hermit Crab does not provide for negative phonetic conditions.

2.5.3. Paradigm Gaps

Languages will occasionally have gaps in their paradigms. A paradigm gap occurs when there is no form for a given position of the paradigm. For instance, the English phrasal verb have got lacks a past tense (J.D. Fodor 1978).

Provided the nonexistent forms would not be derivable by rule from the existing forms (perhaps because the morphological rules that would derive them are blocked by MPR features), the nonexistent paradigm forms could be blocked by listing all and only the existing inflected forms in the lexicon. Beyond this, there is no special provision for handling paradigm gaps in the morpher. This is in part because there is no widely accepted theoretical explanation for this phenomenon.

2.5.4. Idioms, Compound Nouns, and Incorporation

Morphological rules may be written in Hermit Crab for compounding and incorporation processes, i.e. processes which combine two lexical entries to form a derived word, provided that the word is written solid (i.e. with no internal white space).

However, there is no provision for lexical entries for idioms and compound nouns which are not written solid. Such idioms and compound nouns must be handled syntactically (for instance by selecting one word as the head, and having that word subcategorize a special syntactic idiom rule).

3. Lexical Entries and Lexical Lookup

This section defines various kinds of lexical entries.

Lexical entries represent words, stems, or roots, including their phonological, morphological and syntactic properties (plus any additional information added by the linguist).

As used in this specification, the term dictionary refers to a permanent repository of lexical information; this may be contained in one or more files. The lexicon, on the other hand, appears to the user as a temporary repository of information during a given session. The lexicon may be loaded from the dictionary or from a portion of the dictionary (such as a single file containing only nouns). Additions, deletions and changes to lexical entries affect only the lexicon until the lexicon is saved to the dictionary. This specification has little to say about the structure of the dictionary, except that the lexicon must be derivable from the dictionary. (The dictionary might be used as the lexicon, except that changes would be stored only in the main memory until saved.)

The actual form of the lexicon is not specified here; it may be in memory, in temporary disk files, or some combination of the two. What is specified is the form of the lexical entries which the lexicon contains.

Lexical entries may be classified as real (listed in the user's lexicon) or virtual (constructed from other lexical entries on the basis of morphological and phonological rules). Both real and virtual lexical entries may be cross-classified as complete entries, which correspond to full words in the target language, and incomplete entries, which correspond to roots or stems.

The following subsections further describe this classification of lexical entries. For a definition of lexical entries as data structures, see section 5.2.

3.1. Real Lexical Entries

A Real Lexical Entry is a lexical entry which is listed in the lexicon. A Real Lexical Entry must be a Storable Lexical Entry (as defined below). Real Lexical Entries are added to the lexicon by the user (see section 6.4.1, load_lexical_entry; section 6.5.2, load_dictionary_from_text_file; and section 6.5.3, merge_text_file_with_dictionary).

3.2. Virtual Lexical Entries

A Virtual Lexical Entry is a lexical entry which is derived from another lexical entry (either real or virtual) by the application of one or more morphological or phonological rules (see section 4.2, Definitions of Morphological Rule Application, and section 4.4 Definitions of Phonological Rule Application).

3.3. Storable Lexical Entries

A storable lexical entry is one which is a candidate for entry in the user's dictionary. In most cases, economy of storage (and the patience of the user) will dictate that only roots and irregular forms will actually be stored in the lexicon. However, lexical lookup is attempted for each storable lexical entry found in the analysis of an input word.

3.4. Families of Lexical Entries

Each Real Lexical Entry may specify a Family Name. The set of all real lexical entries which have the same Family Name is referred to as a Family of Lexical Entries, and the individual members of that family are each other's Relative Lexical Entries.

The purpose of having families of lexical entries is to allow for blocking of regular derivations by the presence of irregular lexical entries listed in the lexicon. For instance, consider the English word seed. This word is properly formed as a noun, but not as the past tense of the verb see, since it is blocked by the irregular past tense saw. It would not be sufficient to simply list the irregular form saw in the lexicon, since that would not prevent morphing seed as a past tense verb. Rather, it is necessary to block the incorrect morphing by setting up the irregular form as the unique past tense of see.

Suppose the morpher is analyzing some surface form. Once a real lexical entry has been looked up in the course of analysis, its Family Name (if any) is known. The morpher can then compare the various storable lexical entries which it produces in the course of the derivation which synthesizes the surface form from this real lexical entry against the relative lexical entries (i.e. all lexical entries with the same Family Name as that of the real lexical entry which it found). If any relative lexical entry has the same Part of Speech, Subcategorization, Head and Foot Features as one of the storable lexical entries in the derivation, then that Relative Lexical Entry represents an irregular form which blocks the derivation.

Note: There is nothing to prevent the user from redundantly listing a regular form in the lexicon as a relative lexical entry. Such a regular form will be found at lexical lookup, and will block its own derivation by rule from some other real lexical entry, which at least prevents duplicate analyses of a given word. One situation where it might be desirable to list productive forms is the case where two forms of a given word exist (due to historical change or dialectal variation). Examples in English include hanged–hung and learned–learnt. If both forms are listed, either form will be correctly analyzed (since real lexical entries do not block each other).

The mechanism of blocking is detailed below (see section 3.6, Analyzable Word).

3.5. Complete Lexical Entries

A Complete Lexical Entry potentially represents a fully inflected word, as opposed to an Incomplete Lexical Entry, which represents a form that is not fully inflected, i.e. a stem or root. ("Potentially", because it may in fact be blocked by an irregular form; see 3.6, Analyzable Word.)

A Complete Lexical Entry results from the application of zero or more morphological and phonological rules to some Real Lexical Entry, provided all Obligatory Features required by that Real Lexical Entry and the morphological rules which applied in the derivation are instantiated in the Complete Lexical Entry. The sequence of lexical entries beginning with the Real Lexical Entry, followed by a series of zero or more Virtual Lexical Entries, and terminating in the Complete Lexical Entry, represents the derivation of that Complete Lexical Entry.

More specifically, a lexical entry L is a Complete Lexical Entry if:

(1) it is a lexical entry of the *surface* stratum;

(2) it is derived from a Real Lexical Entry by the application of zero or more morphological rules and the corresponding phonological rules in accordance with the definitions of Morphological Rule Application and of Phonological Rule Application; and

(3) for each feature name in its Obligatory Head Features list, that feature name has been assigned a value in its Head Features list.

Note: Under part (3) above, it is not sufficient that a feature have a default value; it must have been assigned some value in the Real Lexical Entry from which the Complete Lexical Entry is derived, or by a morphological rule. (Default feature values may be assigned by the function assign_default_morpher_feature_value, section 6.1.11.)

Example of the use of Obligatory Features: Suppose that in some language, count nouns are obligatorily marked with a number suffix. Then the obligatory_features list of all count noun stems should contain the feature name number.

This mechanism provides a means of distinguishing between obligatory number marking (but where a null affix may indicate the unmarked value of number), and the situation in which number marking is optional (so that the lack of a number marking affix indicates ambiguity as to number). In the former case, all count noun stems would be listed in the lexicon (or would be designated by some derivational rule) as requiring a value for the feature number, and there would be one or more rules attaching number affixes, of which rules one might be a rule of null affixation providing the unmarked (default) value of number. All lexical entries for count nouns which lack a value for the feature number would be incomplete lexical entries.

In the second case, in which number marking is optional, noun stems would not be listed as requiring the feature name number, and a noun to which a number affixation rule has not applied is simply unmarked for number. Such a noun would (all other requirements being met) be a Complete Lexical Entry, ambiguous for number.
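The obligatory-feature condition, part (3) of the definition above, can be illustrated with a minimal Python sketch. The dictionary-based entry layout and the names is_complete, head_features and obligatory_head_features are assumptions made for illustration; they are not the data structures of section 5.2. Only condition (3) is checked; conditions (1) and (2) are taken as given.

```python
def is_complete(entry):
    """Check condition (3): every obligatory head feature has an assigned value.

    Default values are deliberately not consulted, following the note on part (3).
    """
    assigned = entry.get("head_features", {})
    obligatory = entry.get("obligatory_head_features", [])
    return all(name in assigned for name in obligatory)


# Obligatory number marking: a bare stem is incomplete until some rule
# (possibly a null affixation rule) assigns a value to number.
bare_stem = {"head_features": {}, "obligatory_head_features": ["number"]}
null_suffixed = {
    "head_features": {"number": {"singular"}},
    "obligatory_head_features": ["number"],
}

# Optional number marking: the stem does not list number as obligatory,
# so the unmarked form already satisfies condition (3).
optional_stem = {"head_features": {}, "obligatory_head_features": []}
```

Under these assumptions, bare_stem fails the check, while null_suffixed and optional_stem pass it.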

3.6. Analyzable Word

An input word is analyzable if it can be matched by the morpher with one or more complete lexical entries.

An input token (word) matches a complete lexical entry if the phonetic shape of the complete lexical entry is identical to the input token's shape.

4. Results of Morpher Application

This section defines the application of morphological and phonological rules to lexical entries.

4.1. Phonetic Representation

Externally, there are two different representations for sequences of phonetic segments in the morpher. Input words (tokens) and the phonetic shape of Real Lexical Entries are represented as strings, in which each segment and/or suprasegmental is represented by one or more string characters. Phonological and morphological rules, on the other hand, use a Phonetic Template data structure (defined below), in which each segment is defined in terms of its phonetic features. These differing representations are made compatible internally to the morpher by being translated into a Phonetic Sequence (also defined below). At the other "end", the phonetic shape of a virtual lexical entry (i.e. a lexical entry derived by the application of phonological and/or morphological rules) is translated from a Phonetic Sequence into a string before lexical lookup. We therefore begin with definitions of the correspondences among these phonetic representations: strings of characters, phonetic templates, and phonetic sequences.

4.1.1. Definition of Translation between String and Phonetic Sequence

The translation between a string and its representation as a Phonetic Sequence makes use of the Character Definition Table (defined below). The translation from string to phonetic sequence is unambiguous; the reverse translation may be ambiguous.

The translations are defined here in algorithmic form for convenience. (Hermit Crab need not use the same algorithm internally.)

4.1.1.1. Translation from String to Phonetic Sequence

The translation of the string representing an input word into a phonetic sequence, defined in this section, is unambiguous.

The phrase "exit with error, returning X" means return an error message containing X. Error messages for this translation process are listed under the command morph_and_lookup_word.

Let Str be a string consisting of string characters C1...Cm. (String characters are defined in chapter two.) This string may be translated into the Phonetic Sequence PS = (F1...Fn), where each Fi is a boundary marker or a set of phonetic features, by the following procedure.

(1) Set PS equal to the empty list.

(2) Remove from Str the longest sequence of characters C = C1...Cj beginning at the left of Str and matching a Character Sequence in the Character Definition Table. (Note that Str is now of length m–j.) If no sequence beginning at the left end of Str matches any Character Sequence in the Character Definition Table, exit with error, returning the first character of Str.

(3) If sequence C matches the Character Sequence of a Segment Definition Record, append the Phonetic Features field of that Segment Definition Record to the right end of PS. If sequence C matches the Character Sequence of a Boundary Definition Record, append C to the right end of PS. (Boundary markers are not associated with any phonetic features, hence the character(s) which represent them in Str are also used to represent them in PS.)

(4) If Str is non-empty, go to step (2). Else exit with success, returning PS.

Note that some features in PS may be uninstantiated for some segments.
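The procedure above can be sketched in Python as a greedy longest-match loop. This is an illustrative sketch only: the two lookup structures (a mapping from character sequences to feature dictionaries, and a set of boundary-marker strings) stand in for the Character Definition Table, and the names TranslationError and string_to_phonetic_sequence are hypothetical.

```python
class TranslationError(Exception):
    """Raised when no Character Sequence matches at the left end of the string."""


def string_to_phonetic_sequence(text, segments, boundaries):
    """Translate text into a phonetic sequence by greedy longest match.

    segments maps a character sequence to a feature dict; boundaries is a set
    of boundary-marker character sequences (both are assumed layouts).
    """
    # Try longer character sequences first so the longest match wins.
    known = sorted((c for c in set(segments) | set(boundaries) if c),
                   key=len, reverse=True)
    ps = []                                     # step (1)
    while text:                                 # step (4)
        match = next((c for c in known if text.startswith(c)), None)  # step (2)
        if match is None:
            raise TranslationError(text[0])     # return the first character of Str
        text = text[len(match):]
        if match in segments:                   # step (3): segment
            ps.append(dict(segments[match]))
        else:                                   # step (3): boundary marker
            ps.append(match)
    return ps


SEGMENTS = {
    "c": {"cons": "+", "cont": "-"},
    "h": {"cons": "+", "cont": "+"},
    "ch": {"cons": "+", "strident": "+"},
    "a": {"cons": "-"},
    "t": {"cons": "+", "voiced": "-"},
}
BOUNDARIES = {"+"}

sequence = string_to_phonetic_sequence("cha+t", SEGMENTS, BOUNDARIES)
```

In this example the digraph "ch" is consumed as a single segment rather than as "c" followed by "h", which is the effect of the longest-match requirement in step (2).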

4.1.1.2. Translation from Phonetic Sequence to a Regular Expression

In the following definition of the translation from phonetic sequence to a regular expression, no translation is defined for a Phonetic Sequence which contains an Optional Segment Sequence record. Phonetic sequences containing Optional Segment Sequence records should appear only in rule environments, not in the structural change of rules or in lexical entries, and therefore will never need to be translated into a regular expression. (However, traces of rule unapplication may contain optional segments resulting from the unapplication of epenthesis or deletion rules; see section 5.8.3.2, Phonological Rule Analysis Trace Record--Rule Input.)

Let PS = (F1..Fn) be a Phonetic Sequence. This list may be translated into the Regular Expression RegExpr consisting of the terms C1..Cm by the following algorithm. (If each Fi is sufficiently instantiated to be unambiguously translated into a segment, RegExpr will represent a single string.)

(1) Set RegExpr equal to the empty string, and i = 1.

(2) (a) If Fi is a string (i.e. a boundary marker), append it to the right end of RegExpr (bracketing it with ASCII 2 (STX) and ASCII 3 (ETX) to the left and right respectively if it is marked "optional"), and go to step (3).

(b) Else, let SDR = {SDR1...SDRj} be the set of all Segment Definition Records whose Phonetic Features field is a superset of Fi, and let CS = {CS1...CSj} be the set of Character Sequences of the members of SDR. Then if SDR is of length one (i.e. Fi is unambiguously translatable into a segment), set RegExpr equal to the result of appending CS1 to the right end of RegExpr; else (if SDR is of length greater than one, meaning Fi is ambiguously translatable), set RegExpr equal to RegExpr plus an ASCII 28 (FS) plus the members of CS, each separated by an ASCII 29 (GS), plus an ASCII 30 (RS). If the segment(s) is/are marked as optional, enclose the segment or the bracketed list of segments in ASCII 2 (STX) and ASCII 3 (ETX) to the left and right respectively. If there is no Segment Definition Record whose features are a superset of Fi, exit with error, returning Fi.

(3) If i < n, set i = i+1 and go to step (2). Else exit with success, returning RegExpr.
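The algorithm above can be sketched in Python as follows. The sketch assumes that a phonetic sequence is a list whose items are either strings (boundary markers) or feature dictionaries, that the segment table maps character sequences to feature dictionaries, and that optionality is passed as a set of item positions; these layouts and the function name are illustrative assumptions, not part of the specification.

```python
STX, ETX = "\x02", "\x03"                 # ASCII 2 and 3: brackets for optional terms
FS, GS, RS = "\x1c", "\x1d", "\x1e"       # ASCII 28, 29, 30: alternative-list delimiters


class TranslationError(Exception):
    """Raised when no Segment Definition Record is a superset of a feature set."""


def phonetic_sequence_to_regexp(ps, segments, optional=frozenset()):
    """Translate a phonetic sequence into the control-character notation.

    optional holds the 0-based positions of items marked "optional".
    """
    terms = []                                            # step (1)
    for i, item in enumerate(ps):
        if isinstance(item, str):                         # step (2a): boundary marker
            term = item
        else:                                             # step (2b): feature set
            candidates = [chars for chars, feats in segments.items()
                          if all(feats.get(k) == v for k, v in item.items())]
            if not candidates:
                raise TranslationError(item)
            if len(candidates) == 1:
                term = candidates[0]
            else:
                term = FS + GS.join(candidates) + RS
        if i in optional:
            term = STX + term + ETX
        terms.append(term)                                # step (3): next item
    return "".join(terms)


SEGMENTS = {
    "p": {"place": "labial", "voiced": "-"},
    "b": {"place": "labial", "voiced": "+"},
    "a": {"place": "none", "voiced": "+"},
}

ambiguous = phonetic_sequence_to_regexp(
    [{"place": "labial"}, "+", {"voiced": "-"}], SEGMENTS)
```

Here the first item is compatible with both "p" and "b", so it is emitted as a delimited list of alternatives, while the last item is compatible only with "p".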

4.1.2. Definition of the Partition of a Phonetic Sequence by a Phonetic Template

Let PSTSeq = (PST1...PSTm) be a Phonetic Sequence of a Phonetic Template, and let INIT and FINAL be the values of the init and final fields of that Phonetic Template. Furthermore, let PSLSeq = (PSLx...PSLy) (the Lexical Sequence) be a subsequence of the Phonetic Sequence PSL1...PSLz of a lexical entry. Then PSTSeq partitions PSLSeq into the list PART = (BMs1 Part1...BMsm Partm BMsm+1), where each BMsi is a list of zero or more Boundary Markers, and each Parti is a variable-free phonetic sequence, iff:

(1) If INIT is true, the left-most segment of the left-most non-empty Parti in PART is PSLx (i.e. PSTSeq must match PSLSeq beginning at the left-most segment of PSLSeq);

(2) If FINAL is true, the right-most segment of the right-most non-empty Parti in PART is PSLy (i.e. PSTSeq must match PSLSeq ending with the right-most segment of PSLSeq);

(3) If PSTi is a Simple Context, then Parti contains a single segment Seg such that PSTi is a subset of Seg (i.e. every feature in PSTi has that same value in Seg);

(4) If PSTi is a string of one or more boundary markers, then Parti is that same string of boundary markers;

(5) If PSTi is an Optional Segment Sequence, let MIN and MAX be the values of the Minimum Occurrence and Maximum Occurrence fields of PSTi (default 0 and 1, respectively), and let PSTSeq be the Optional Sequence of PSTi. Then Parti is a list divisible into between MIN and MAX nonoverlapping adjacent subsequences, each of which matches PSTi; and

(6) For all i, BMsi is a list of zero or more boundary markers. (Boundary markers in the lexical sequence need not be accounted for by the template; this corresponds to the generally accepted notion that phonological rules can apply freely across morpheme boundaries. However, the definition of the application of a phonetic rule to a lexical entry, as given below, requires that the portion of a phonetic sequence matched by the input of a phonetic rule must not contain a boundary marker unless the marker is specifically required by the rule.)

Note 1: The above definition assumes synthesis order, whereas rules must be applied in analysis order to the morpher's input. In particular, when (un-)applying rules in analysis order, boundary markers which the input side of a phonological rule may call for are unlikely to be present in the lexical form.

Note 2: By step (3) above, a template which requires a feature-value pair (Fi Vi) will not match (during synthesis) against a segment for which Fi does not have an instantiated value.

4.2. Definitions of Morphological Rule Application

This section describes the lexical entry generated by applying a morphological rule to another lexical entry.

In the following subsections, the application of a morphological rule MR is defined in terms of its application to an input lexical entry ILE, resulting in an output lexical entry OLE. ILE may be a real or virtual lexical entry; OLE will be a virtual lexical entry. (The terms "input" and "output" are here used in the synthesis sense.)

4.2.1. Blocking

A morphological rule may be blocked under certain circumstances. When blocking occurs, the input lexical entry is replaced by a different lexical entry, and the derivation continues as if the rule had already applied.

Blocking of morphological rules is defined as follows. (Blocking of affix templates is defined separately, see section 4.3.)

Let DLE be a Derived Lexical Entry to which morphological rule MR has just applied, and let StemSet be the Family of DLE.

Then DLE is replaced with a member RLE of StemSet if:

MR is a blockable rule; and

the Stratum, Part of Speech, and Subcategorization of RLE are identical to the corresponding fields of DLE; and

the Head and Foot Features of DLE are subsets of the corresponding fields of RLE.

Example of the Use of Blocking: Suppose that the word seed has been (incorrectly) analyzed as being derived from the verb see by the application of the morphological rule attaching the –ed suffix, a rule which adds the Head Feature tense (past) and changes the phonetic form of this stem to seed. Suppose further that the lexical entry for see and the lexical entry for the verb saw are Relative Lexical Entries, with the entry for saw identical to the lexical entry for see save for its phonetic form and the addition of the head feature tense (past). Then the analysis of seed as the past tense of see will be blocked by the lexical entry for saw; that is, in the derivation of the past tense of see, the Derived Lexical Entry seed is replaced by the Lexical Entry for saw. (If Hermit Crab is parsing seed, i.e. running the command morph_and_lookup_word, the resulting word saw will not match the input, and the derivation will fail. If Hermit Crab is instead generating the past tense of see, i.e. it is running the command generate_word, the output will be saw instead of seed.)
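The blocking conditions of this section can be sketched in Python. The entry layout (dictionaries with stratum, pos, subcat, head and foot fields) and the function name find_blocker are assumptions for illustration; the sketch returns the first family member that satisfies the conditions, and does not address what happens if several members qualify.

```python
def find_blocker(dle, family, rule_is_blockable):
    """Return a family member that blocks the derived entry dle, or None."""
    if not rule_is_blockable:
        return None
    for rle in family:
        # Stratum, Part of Speech and Subcategorization must be identical.
        same_fields = all(rle[f] == dle[f] for f in ("stratum", "pos", "subcat"))
        # Head and Foot Features of dle must be subsets of those of rle.
        subsumed = all(
            rle[f].get(name) == value
            for f in ("head", "foot")
            for name, value in dle[f].items()
        )
        if same_fields and subsumed:
            return rle
    return None


see = {"shape": "see", "stratum": "surface", "pos": "V", "subcat": ["trans"],
       "head": {}, "foot": {}}
saw = {"shape": "saw", "stratum": "surface", "pos": "V", "subcat": ["trans"],
       "head": {"tense": "past"}, "foot": {}}
# The entry derived from see by the (blockable) -ed rule.
seed = {"shape": "seed", "stratum": "surface", "pos": "V", "subcat": ["trans"],
        "head": {"tense": "past"}, "foot": {}}

blocker = find_blocker(seed, [see, saw], rule_is_blockable=True)
```

The entry for see itself does not block seed in this sketch, because it lacks the head feature tense (past) that the derived entry carries.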

4.2.2. Definition of Feature Unification

This section defines the unification of the Head (or Foot) Features of an Input Lexical Entry ILE with the head (foot) features of the Required Head (Foot) Features field of a subrule SR of a morphological rule. (The result of this operation is then combined with the Head (Foot) Features of the subrule to create the Head (Foot) Features of the Output Lexical Entry; see 4.2.6 below.)

Note that features may be either uninstantiated or instantiated. An instantiated feature is a feature which either has one or more values, or whose value is the designated atom ‘*NONE*’. (The latter is used in Required Features to ensure that no value has been assigned to a lexical entry’s Head Features.)

We first define the unification of a single Required Feature (RFN RFV) with the Head (or Foot) Features LF= (LFN1 LFV1...LFNn LFVn) of a Lexical Entry.

If RFV is the atom ‘*NONE*’, then

if RFN is not included in (LFN1...LFNn) and there is no default value for RFN, unification succeeds with the value (RFN ‘*NONE*’);

else if RFN is included in (LFN1...LFNn) with the value ‘*NONE*’, unification succeeds with the value (RFN ‘*NONE*’);

else if RFN has the default value ‘*NONE*’, unification succeeds with the value (RFN ‘*NONE*’);

otherwise (RFN is included in (LFN1...LFNn) but has a value other than ‘*NONE*’, or it is not included in LF but there is a default value for RFN other than ‘*NONE*’), unification is said to fail (and the value of the unification is undefined).

Otherwise (if RFV is not ‘*NONE*’), then

If the feature name RFN is included in (LFN1...LFNn), let the set intersection of RFV and the value of RFN in the Head (Foot) Features of the Lexical Entry be OFV. If OFV is non-empty, unification succeeds with the value (RFN OFV); otherwise, unification fails;

else (if RFN is not included in (LFN1...LFNn)), then if RFN has a default value, then let the set intersection of RFV and the default value of RFN be OFV. If OFV is non-empty, unification succeeds with the value (RFN OFV); otherwise (if OFV is empty), unification fails;

else (if RFN does not appear among the Head (Foot) Features of the Lexical Entry, and it does not have a default value), unification succeeds with the value (RFN RFV). (That is, an uninstantiated feature in the lexical entry acts as the identity element under unification.)

The unification of a set of Required Features (RFN1 RFV1...RFNn RFVn) with the Head (or Foot) Features of a Lexical Entry succeeds if the unification of each of the Required Features with the Head (or Foot) Features of a Lexical Entry succeeds, producing an output set of features OF = (OFN1 OFV1...OFNm OFVm) determined as follows:

For every RFNi, OF includes the unification of (RFNi RFVi) with LF;

and for every LFNi not included in (RFN1...RFNn), OF includes (LFNi LFVi) (that is, the features of the Lexical Entry not mentioned in the Required Features of the rule remain unchanged in the output features).

The output features OF are used as the new features of the lexical entry for the purposes of applying the morphological rule.

In (hopefully!) more intuitive terms, unification means that any features in the Input Lexical Entry’s Head Features which are incompatible with the Required Head Features of the morphological subrule are removed; if the result is empty, unification fails. Furthermore, if the Lexical Entry lacks a specified value for any Required Feature, the default value (if any) is used in place of a specified value; failing a default value, the value of the feature in the Lexical Entry is treated as compatible with anything, which is to say the value of the Required Feature is taken to be the value of the actual feature. The special value ‘*NONE*’ is used when it is required that a feature have no assigned value (e.g. if an affix attaches to a noun only if the noun does not as yet bear any marking for number).
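The definition of unification above can be sketched in Python. The sketch assumes that feature values are Python sets, that the atom ‘*NONE*’ is represented by the string constant NO_VALUE, and that failure is signalled by returning Python's None; the function names are hypothetical. The definition does not say what happens when a rule requires an ordinary value and the lexical entry (or the default) carries ‘*NONE*’; the sketch treats that case as failure, which is an assumption.

```python
NO_VALUE = "*NONE*"   # the designated atom; Python's None signals failure below


def unify_feature(name, required, lexical, defaults):
    """Unify one Required Feature (name, required) with the features `lexical`."""
    if required == NO_VALUE:
        if name in lexical:
            return NO_VALUE if lexical[name] == NO_VALUE else None
        if name not in defaults or defaults[name] == NO_VALUE:
            return NO_VALUE
        return None
    if name in lexical:
        current = lexical[name]
    elif name in defaults:
        current = defaults[name]
    else:
        return set(required)          # uninstantiated feature: identity element
    if current == NO_VALUE:
        return None                   # assumption: *NONE* conflicts with a real value
    common = set(required) & set(current)
    return common or None


def unify(required, lexical, defaults):
    """Unify a set of Required Features with `lexical`; return OF or None."""
    out = dict(lexical)               # unmentioned features pass through unchanged
    for name, value in required.items():
        result = unify_feature(name, value, lexical, defaults)
        if result is None:
            return None
        out[name] = result
    return out


noun = {"number": {"sg", "pl"}, "gender": {"fem"}}
narrowed = unify({"number": {"pl"}}, noun, defaults={})
```

In the example, requiring number (pl) of an entry that is ambiguous between sg and pl narrows number to pl and leaves gender untouched.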

4.2.3. Definition of Match between a Morphological Rule and a Lexical Entry

An Ordinary (non-realizational) Morphological Rule R applies to a Lexical Entry ILE if:

(1) If a Part of Speech is specified on the input side of R, it is identical to the Part of Speech of ILE;

(2) If the Required Subcategorization Rules list of R is non-empty, the Subcategorization field of ILE contains at least one of the syntactic rule names contained in the Required Subcategorization Frame field of R;

(3) The Required Head and Foot Features lists of R have been successfully unified with the Head and Foot Features lists of ILE (as defined above, see section 4.2.2, Definition of Feature Unification);

(4) The value of the Multiple Application field of R is greater than the number of times the Rule Name of R appears in the Morphological Rules list of ILE; and

(5) The Rule Stratum of R is one deeper than or the same as the Morphological Stratum of ILE. (See section 3.3, Storable Lexical Entries for a more detailed definition of when a morphological rule may apply to a lexical entry of a given stratum.)

If the Morphological Rule applies to the Lexical Entry, its subrules are applied disjunctively. That is, the Input Side of each of the Subrules is checked in order for a match (see below); if there is a match, that Subrule is applied, and the application of the Morphological Rule is complete. (It is not an error if the Morphological Rule as a whole applies to the Lexical Entry, but none of its subrules apply.)

4.2.4. Definition of Match between the Input Side of a Morphological Subrule and a Lexical Entry

Let the Phonetic Template MRITemp (= Morphological Rule Input Template) be the Required Phonetic Input of a subrule SR of a morphological rule, and let the Phonetic Sequence PLSeq be the Phonetic Shape of the Lexical Entry ILE.

Then subrule SR matches against ILE iff:

(1) MRITemp matches against PLSeq;

(2) For each atom in SR's Required Morphological Rule Features list, ILE must contain that same atom in its MPR Features list;

(3) For each atom in SR's Excluded Morphological Rule Features list, ILE must not contain that atom in its MPR Features list.
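The three conditions above, together with the disjunctive ordering of subrules described in section 4.2.3, can be sketched in Python. The phonetic match of condition (1) is not implemented here; it is represented by a caller-supplied predicate stored under the hypothetical key matches_shape. All field and function names are illustrative assumptions.

```python
def subrule_matches(entry, subrule):
    """Check conditions (1)-(3) for one subrule against a lexical entry."""
    mpr = set(entry["mpr_features"])
    return (
        subrule["matches_shape"](entry["shape"])        # condition (1), supplied by caller
        and set(subrule["required_mpr"]) <= mpr         # condition (2)
        and mpr.isdisjoint(subrule["excluded_mpr"])     # condition (3)
    )


def first_matching_subrule(entry, subrules):
    """Subrules are tried in order; only the first match is applied."""
    return next((sr for sr in subrules if subrule_matches(entry, sr)), None)


subrules = [
    {"name": "after-sibilant", "matches_shape": lambda s: s.endswith("s"),
     "required_mpr": [], "excluded_mpr": ["irregular"]},
    {"name": "elsewhere", "matches_shape": lambda s: True,
     "required_mpr": [], "excluded_mpr": ["irregular"]},
]

bus = {"shape": "bus", "mpr_features": []}
cat = {"shape": "cat", "mpr_features": []}
ox = {"shape": "ox", "mpr_features": ["irregular"]}
```

With these example subrules, bus selects the first subrule, cat falls through to the second, and ox matches neither because it carries an excluded feature.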

4.2.5. Definition of Transformation of a Phonetic Sequence by a Morphological Rule

Note: The following definition is given in terms of synthesis of a derived phonetic sequence from another phonetic sequence (that of the stem) plus the phonetic sequence of an affix (given by a morphological rule).

Let the Phonetic Template MRITemp = MRI1...MRIm be the Required Phonetic Input of a subrule SR of a morphological rule MR, and MROList = MRO1...MROn be the Phonetic Output of SR. (Note that while MRITemp is a phonetic template, MROList is a list of integers, simple contexts, lists of integers plus feature specifications, and lists of strings plus the name of a character definition table; cf. Morphological Rule Notation—Phonetic Output.) Further let PLISeq be the Phonetic Sequence which represents the Phonetic Shape of some lexical entry LE, let PartI = (BMI1 PI1...BMIm PIm BMIm+1) be the partition of PLISeq by MRITemp, and let PLOSeq be the Phonetic Sequence which is to represent the transformation of PLISeq according to rule SR.

Then the rule SR transforms PartI into PartO = (BMO1 PO1...BMOn POn BMOn+1), a list of boundary markers (BMOq) and phonetic sequences (POq), according to the following rules:

(1) If MROq is an integer p, POq = PIp; (boundary markers in the input phonetic sequence which are not mentioned in the rule associate with the segments to their left if).

(2) If MROq is a list composed of an integer p followed by a feature list FL, then POq is identical to PIp except that for every Simple Context Sk in POq, and for every feature-name feature-value pair {FN FV} in FL, the value of FN in Sk is FV; and boundary markers associate as per (1). (The feature values specified in FL are inserted in each segment of POq, replacing the values of those same features, if any, in PLISeq. Note that any boundary markers in PIp are simply copied over into POq.)

(3) If MROq is a list composed of a string s followed by the name of a character definition table CT, then POq is the sequence of segments into which the string s is translated using the specified character definition table.

(4) If MROq is a Simple Context, POq is identical to MROq (i.e. it is a single segment whose features are those of MROq);

(5) If MROq is a boundary marker (string), POq is identical to MROq.

(6) BMO1 = BMI1; and all BMOq not specified above are empty.

Finally, SR transforms PLISeq into PLOSeq iff PLOSeq is the phonetic sequence composed by concatenating all the members of the list PartO.
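Rules (1) through (5) can be sketched in Python under simplifying assumptions. The sketch takes the parts PI1...PIm as a list of lists, in which segments are feature dictionaries and boundary markers are strings; it ignores the separate boundary-marker lists BMIq and BMOq of rule (6); and for rule (3) it takes the character definition table itself (a mapping from single characters to feature dictionaries) rather than a table name. The function name and these layouts are illustrative only.

```python
def transform(parts, output):
    """Build the output phonetic sequence from input parts and an output list."""
    result = []
    for item in output:
        if isinstance(item, int):                       # rule (1): copy part p
            result.extend(dict(s) if isinstance(s, dict) else s
                          for s in parts[item - 1])
        elif isinstance(item, tuple) and isinstance(item[0], int):
            p, features = item                          # rule (2): copy with new values
            for seg in parts[p - 1]:
                # Boundary markers (strings) are copied over unchanged.
                result.append({**seg, **features} if isinstance(seg, dict) else seg)
        elif isinstance(item, tuple):                   # rule (3): string plus table
            text, table = item
            result.extend(dict(table[ch]) for ch in text)
        elif isinstance(item, dict):                    # rule (4): a single segment
            result.append(dict(item))
        else:                                           # rule (5): boundary marker
            result.append(item)
    return result


TABLE = {"d": {"cons": "+", "voiced": "+"}}
stem = [[{"cons": "+", "voiced": "-"}, {"cons": "-", "voiced": "+"}]]  # one part

suffixed = transform(stem, [1, "+", ("d", TABLE)])
devoiced = transform(stem, [(1, {"voiced": "-"})])
```

The first call models suffixation (copy the stem, add a boundary marker, then the segments of the affix); the second overrides a feature value on every segment of the copied part.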

Note: It is unwise to have a morphological rule delete optional segment sequences. One reason is that it is computationally expensive to insert (during analysis) an unknown number of unknown segments. There is also the undesirable possibility of inadvertently deleting boundary markers during synthesis.

4.2.6. Definition of Application of a Morphological Rule to a Lexical Entry

The following definition is written in the synthesis sense: rule MR attaches an affix to ILE to produce OLE. Note also that this defines a single application of MR; in some cases, a morphological rule may apply more than once (see section 4.2.8, Definition of Application of a Set of Non-Realizational Morphological Rules).

Rule MR transforms the input lexical entry ILE into the output lexical entry OLE iff for SR, the first subrule of MR to match lexical entry ILE (as defined above, see section 4.2.4, Definition of Match between the Input Side of a Morphological Subrule and a Lexical Entry):

(1) The phonetic sequence representing the Phonetic Shape of ILE has been transformed into the Phonetic Sequence of OLE by the application of SR (as defined above, see section 4.2.5, Definition of Transformation of a Phonetic Sequence by a Morphological Rule);

(2) The Lexical Entry ID of OLE is the same as the Lexical Entry ID of ILE;

(3) The Stratum of OLE is the same as the Rule Stratum of MR;

(4) The Gloss String of OLE is the result of concatenating the Gloss String of SR to the right of the Gloss String of ILE, with a space separating the two;

(5) The Part of Speech of OLE is the same as the Part of Speech of the output of SR if that field is non-empty; otherwise it is the same as the Part of Speech of ILE.

(6) If there is a Subcategorization field in the output of SR, the Subcategorization field of OLE consists of (1) all atomic members of the Subcategorization field of the output of SR, (2) the second member (if any) of each sublist of that field for which the first member of the sublist is a member of the Subcategorization field of ILE, and (3) any members of the Subcategorization field of ILE which are not mentioned in the Subcategorization field of SR. Otherwise (if there is no Subcategorization field in the output of SR), the Subcategorization field of OLE is the same as the Subcategorization field of ILE.

Note: If the Subcategorization field of ILE is absent, it is considered to be empty, i.e. the Subcategorization of OLE = the Subcategorization of SR. If, however, the Subcategorization field of the output record of SR is the empty list, the above definition implies that the Subcategorization field of OLE will be empty.

(7) The Morphological Rules list field of OLE consists of the Morphological Rules list of ILE appended to (the left of) a list containing the Rule Name of MR.

(8) The MPR Features list of OLE is the set union of the MPR Features list of ILE and the MPR Features list of SR.

(9) The Head Features list of OLE is the Head Features to be realized on ILE, plus any non-conflicting features of the Head Features list of SR, plus any non-conflicting features of the Head Features list of ILE as modified by the unification of the Required Head Features of the input of SR with the previous Head Features of ILE (see section 4.2.2, Definition of Feature Unification). (That is, the Head Features to be realized on ILE take precedence over the Head Features of SR, which in turn take precedence over any other Head Features of ILE.)

(10) The Foot Features list of OLE is the Foot Features list of SR plus any non-conflicting features of the Foot Features list of ILE, as modified by the unification of the Required Foot Features of the input of SR with the previous Foot Features of ILE (see section 4.2.2, Definition of Feature Unification).

(11) The Obligatory Features list of OLE is the set union of the Obligatory Features lists of ILE and SR.

Note: The Head- and Foot-features fields of OLE bear only values which have been assigned to them by virtue of percolation from a real lexical entry. Default values are not listed in lexical entries, and therefore are not output by the morpher module.
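The simpler clauses of the definition above, namely (2), (3), (4), (5), (7), (8) and (11), can be sketched in Python. The phonetic shape (1), subcategorization (6) and the head and foot features (9) and (10) are omitted. The dictionary field names and the function name are assumptions for illustration, not the record formats of the specification.

```python
def derive_fields(ile, rule_name, rule_stratum, sr):
    """Compute some fields of OLE from ILE, the rule MR and its subrule SR."""
    return {
        "id": ile["id"],                                    # (2) same Lexical Entry ID
        "stratum": rule_stratum,                            # (3) Rule Stratum of MR
        "gloss": ile["gloss"] + " " + sr["gloss"],          # (4) SR gloss to the right
        "pos": sr.get("pos") or ile["pos"],                 # (5) SR's output POS if given
        "rules": ile["rules"] + [rule_name],                # (7) append the Rule Name
        "mpr_features": set(ile["mpr_features"]) | set(sr["mpr_features"]),  # (8)
        "obligatory": set(ile["obligatory"]) | set(sr["obligatory"]),        # (11)
    }


walk = {"id": "walk1", "gloss": "walk", "pos": "V", "rules": [],
        "mpr_features": {"regular"}, "obligatory": {"tense"}}
ed_subrule = {"gloss": "PAST", "pos": None, "mpr_features": set(),
              "obligatory": set()}

walked = derive_fields(walk, "past_ed", "surface", ed_subrule)
```

Because the example subrule specifies no output Part of Speech, the derived entry keeps the Part of Speech of its input.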

4.2.7. Compounding Rules

A compounding rule is a morphological rule with two input fields: one Head field and one Non-head field. Such a rule analyzes a word into two lexical entries; for computational reasons, the Non-head field is required to be a Real Lexical Entry. (This is probably linguistically motivated, as well.) Compounding rules are applied in the same way as other morphological rules, except for the differences specified in the following subsections.

For these subsections, SRH and SRNH refer to the Head and Non-head fields respectively of SR, and ILEH and ILENH refer to the corresponding input lexical entries.

4.2.7.1. Unification of Head and Foot Features

The Head and Foot Features of ILEH and ILENH must be unifiable with the Required Head and Required Foot Features of SRH and SRNH respectively, as defined above (see section 4.2.2, Definition of Feature Unification).

4.2.7.2. Match between a Compounding Rule and Lexical Entries

ILEH and ILENH must each be partitionable by SRH and SRNH respectively, as defined above (section 4.2.4, Definition of Match between the Input Side of a Morphological Subrule and a Lexical Entry). (Given the specification of compounding rules given later, SRNH cannot contain a Multiple Application field.)

4.2.7.3. Transformation of Phonetic Sequences by a Compounding Rule

OLE is formed by appending the partition of the Phonetic Sequence of ILEH by SRH to the left of the partition of the Phonetic Sequence of ILENH by SRNH, and transforming the resulting partition as if it were the input to an ordinary morphological rule (section 4.2.5, Definition of Transformation of a Phonetic Sequence by a Morphological Rule). (This does not imply that the non-head word will appear to the right of the head word, but is only a convention to standardize application of compounding rules.)

4.2.7.4. Application of Compounding Rule to Lexical Entries

The result of applying a compounding rule to two lexical entries is the same as the result of applying an ordinary morphological rule to a single lexical entry (section 4.2.6, Definition of Application of a Morphological Rule to a Lexical Entry), with the following exceptions:

The Phonetic Sequence of OLE is as defined in the section immediately above (see 4.2.7.3, Transformation of Phonetic Sequences by a Compounding Rule).

The Gloss String of OLE is the result of concatenating the Gloss String of ILENH to the right of the Gloss String of ILEH; the two Gloss Strings are separated by a space (ASCII 32).

The Lexical Entry ID, Part of Speech, Subcategorization, Morphological Rules list, MPR Features, Head Features, Foot Features, and Obligatory Features fields of OLE are as specified above for ordinary morphological rules, but substituting ILEH for ILE.

Finally, ILENH must be a Real Lexical Entry.

4.2.8. Definition of Application of a Set of Non-Realizational Morphological Rules

This section specifies the application of a set of ordinary and/or compounding (but not realizational) morphological rules of a given stratum.

Let the set of morphological rules of the stratum be MRSet = {MR1,...MRn}, and let ILE be the Input Lexical Entry to which MRSet applies to produce the Output Lexical Entry OLE. (Again, "input" and "output" are used here in the synthesis sense.) Each subsection below defines the application of one or more rules of MRSet, according to the ordering of morphological rules for the stratum.

Note: Additional applications of phonological rules, not described in the following subsections, may be necessary to generate a Storable Lexical Entry; see section 3.3, Storable Lexical Entries.

4.2.8.1. Linearly Ordered Morphological Rules

This definition applies if the value of the m_rule_order field of the current stratum is linear.

Let MRList = MR1...MRn be the list of morphological rules in MRSet in their order of application. Then ILE is related to OLE by the following algorithm:

(1) Set InterLE = ILE.

(2) If MRList is empty, set OLE = InterLE and exit, returning InterLE. Otherwise set CurRule to one of the rules in MRList, and remove CurRule and all rules preceding it from MRList. Set NumApplics = 0.

(3) Apply CurRule to InterLE, set InterLE equal to the result, and increment NumApplics by 1.

(4) If the current stratum is cyclic, apply the phonological rules of the current stratum to InterLE, and set InterLE equal to the result.

(5) If NumApplics is less than the Multiple Application Field of CurRule, optionally go to step (3).

(6) Go to step (2).
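The steps above can be sketched in Python as follows. This is an illustration only, not Hermit Crab's implementation: a lexical entry is reduced to a plain string, the names apply_linear, mr_list and choices are hypothetical, and the nondeterministic choices of steps (2) and (5) are passed in as an argument rather than searched for. The sketch also assumes that the derivation may stop once the chosen rules have been exhausted.

```python
from typing import Callable, List, Optional, Tuple

Entry = str                          # hypothetical stand-in for a lexical entry
RuleFn = Callable[[Entry], Entry]


def apply_linear(ile: Entry,
                 mr_list: List[Tuple[str, RuleFn, int]],
                 choices: List[Tuple[str, int]],
                 cyclic_phonology: Optional[RuleFn] = None) -> Entry:
    """Apply the chosen rules of a linearly ordered stratum to ILE.

    mr_list: (name, rule, Multiple Application limit), in order of application.
    choices: (name, number of applications) for each rule chosen to apply.
    cyclic_phonology: the stratum's phonological rules, if the stratum is cyclic.
    """
    names = [name for name, _, _ in mr_list]
    inter_le = ile                                   # step (1)
    position = 0                                     # rules before this index are gone
    for name, count in choices:                      # step (2): choose CurRule
        try:
            index = names.index(name, position)
        except ValueError:
            raise ValueError(f"rule {name!r} is no longer available") from None
        _, rule, limit = mr_list[index]
        if not 1 <= count <= limit:
            raise ValueError(f"rule {name!r} may apply 1 to {limit} times")
        position = index + 1                         # drop CurRule and all before it
        for _ in range(count):                       # steps (3) and (5)
            inter_le = rule(inter_le)
            if cyclic_phonology is not None:         # step (4)
                inter_le = cyclic_phonology(inter_le)
    return inter_le                                  # OLE


# Toy rules: a prefix that may apply twice and a suffix that may apply once.
toy_rules = [("re", lambda s: "re" + s, 2), ("pl", lambda s: s + "s", 1)]
print(apply_linear("do", toy_rules, [("re", 2), ("pl", 1)]))   # reredos
```

Choosing the rules out of order, for example [("pl", 1), ("re", 1)], raises an error in this sketch, because selecting a rule removes it and every rule preceding it from the list.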

4.2.8.2. Unordered Application of Morphological Rules

This definition applies if the value of the m_rule_order field of the current stratum is unordered.

For each rule MRi in MRSet, applics(MRi) represents the number of times MRi has applied. Then OLE is derivable from ILE by the following algorithm:

(1) Set MRSub equal to any subset (including the empty set) of MRSet. For all MRi in MRSub, set applics(MRi) = 0. Set InterLE = ILE.

(2) If MRSub is empty, set OLE = InterLE and exit. If MRSub contains only rules whose Multiple Application Field is greater than one, optionally set OLE = InterLE and exit. Otherwise set CurRule to any rule of MRSub. Increment applics(CurRule); if the result is equal to the Multiple Application Field of CurRule, remove CurRule from MRSub.

(3) Apply CurRule to InterLE, and set InterLE equal to the result.

(4) If the current stratum is cyclic, apply the phonological rules of the current stratum to InterLE, and set InterLE equal to the result.

(5) Go to step (2).

Warning: Because every subset of the rules may be tried in every possible order, this algorithm can be very slow. In practice, the situation is not quite as bad as it might seem, because Hermit Crab will either be given a particular ordering of rules to use (if it is running the command generate_word), or it will have chosen a particular order of rules based on the analysis of a surface form. (However, the analysis may be indeterminate if the stratum in question contains null affixes.)
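The size of the search space can be seen in a small Python sketch. It is an illustration under simplifying assumptions, not Hermit Crab's implementation: lexical entries are plain strings, the name unordered_derivations is hypothetical, and the choice of MRSub together with the optional exit of step (2) is read as "each rule applies at most as many times as its Multiple Application Field allows, in any order".

```python
from typing import Callable, Dict, Iterator, Optional, Tuple

Entry = str                          # hypothetical stand-in for a lexical entry
RuleFn = Callable[[Entry], Entry]


def unordered_derivations(ile: Entry,
                          rules: Dict[str, Tuple[RuleFn, int]],
                          cyclic_phonology: Optional[RuleFn] = None
                          ) -> Iterator[Tuple[Entry, Tuple[str, ...]]]:
    """Yield every (OLE, names of rules applied) pair derivable from ILE.

    rules maps a rule name to (rule, Multiple Application limit).
    """
    def step(inter_le, remaining, applied):
        yield inter_le, applied                      # stop here ...
        for name, left in remaining.items():         # ... or pick any usable rule
            if left == 0:
                continue
            rule, _ = rules[name]
            result = rule(inter_le)                  # step (3)
            if cyclic_phonology is not None:         # step (4)
                result = cyclic_phonology(result)
            yield from step(result, {**remaining, name: left - 1},
                            applied + (name,))

    yield from step(ile, {name: limit for name, (_, limit) in rules.items()}, ())


# Two toy suffixing rules: "a" may apply once, "b" may apply twice.
toy_rules = {"a": (lambda s: s + "a", 1), "b": (lambda s: s + "b", 2)}
for form, names in unordered_derivations("x", toy_rules):
    print(form, names)
```

Even with two rules the generator yields nine derivations, and the number grows rapidly as rules are added, which is the point of the warning above.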

4.3. Definition of Application of an Affix Template

Realizational Morphological Rules are applied according to an Affix Template. The Affix Template of a given Stratum applies after all relevant ordinary Morphological Rules of that Stratum have been applied, but before any Phonological Rules of a non-cyclic Stratum have been applied.

Let Templates = T1...Tk be the list of Affix Templates of a Stratum. (Note that Templates may be empty, in which case there are no Realizational Rules to be applied for this Stratum.) Also let LE be a Lexical Entry to which the Stratum is being applied, and let RzF be the set of features to be realized in the derivation.

Then a stem LE' is selected as follows: Let StemSet be the set of lexical entries in the family of LE. Then set LE' to the member of StemSet whose Head and Foot Features are a superset of those of LE and the largest subset of RzF. (This should be a unique lexical entry; if more than one lexical entry matches this description, an error results. If RzF is empty, this step is skipped.) If there is no such lexical entry in StemSet, then set LE' to LE.

The application of the Realizational Morphological Rules of the Stratum to LE' is as follows. Templates is scanned for an Affix Template whose Required Part of Speech matches the Part of Speech of LE', and whose Required Subcategorized Rules are a (possibly improper) subset of the Subcategorized Rules of LE'. (It is not an error if no Affix Template matches against LE', but an error will occur if more than one Template matches.) Let T be the selected Template.

Let Slots = S1...Sm be the list of Slots of T. The Slots are scanned in order. For Slot Sj, let Rules = R1...Rn be the list of Realizational Rules. Rules are then applied in disjunctive order, that is: the Head Features of R1 are checked against RzF. If the Realizational Features of R1 are a subset of RzF, and not also a subset of the Head Features of LE', the rule is applied, and if the Stratum is cyclic, the phonological rules of the Stratum are then applied. Processing then continues with Slot Sj+1. If the Realizational Features of R1 are not a subset of RzF, rule R2 is checked, and so forth. If none of the rules of slot Sj match, processing continues with Slot Sj+1. (It is not an error if none of the rules of a given slot apply, nor is it an error if a rule of a slot matches LE', but none of its subrules matches. Note that the test of the Realizational Features is not a unification test; any features of the Realizational Features of the rule must be present with that same value in the Realizational Features of the derivation.)

After processing the slots, set the head features of the resulting word equal to RzF plus any nonconflicting Head Features of LE'. (An alternative would be to assign the Head Features of each Realizational Rule as it is applied, which would have the effect of allowing one affix to block attachment of a later affix. It is not clear which of these approaches is correct.)

The reason for requiring that the Realizational Features of R1 are not a subset of the Head Features of LE', is to allow blocking of inflectional affixation if the stem is inherently specified for all the features which the inflectional affix would realize. For instance, on the assumption that oxen is listed in the lexicon and bears the feature [+plural], the plural suffix -s should be prevented from attaching to it to give *oxens; see Anderson (1992: 134, example (20)).
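The slot-by-slot processing and the blocking condition can be illustrated with a short Python sketch. It is not Hermit Crab's implementation: features are modeled as a dictionary from feature name to value, the phonetic shape is a plain string, and the names is_subset and apply_template are hypothetical. The sketch assumes a noncyclic stratum (no phonological rules between slots) and assumes that a rule blocked by the Head Features of LE' is simply passed over in favor of the next rule of the slot, a case the text leaves open.

```python
from typing import Callable, Dict, List, Tuple

Features = Dict[str, str]                                  # feature name -> value
RealizationalRule = Tuple[Features, Callable[[str], str]]  # (features, affixation)


def is_subset(small: Features, big: Features) -> bool:
    """True if every feature of small occurs in big with the same value."""
    return all(big.get(name) == value for name, value in small.items())


def apply_template(shape: str, head_features: Features,
                   slots: List[List[RealizationalRule]],
                   rzf: Features) -> Tuple[str, Features]:
    """Apply the slots of one Affix Template to LE' (shape plus Head Features)."""
    for rules in slots:                          # slots are scanned in order
        for features, affix in rules:            # rules of a slot are disjunctive
            realizes = is_subset(features, rzf)
            inherent = is_subset(features, head_features)
            if realizes and not inherent:
                shape = affix(shape)
                break                            # at most one rule per slot
    # RzF plus any Head Features of LE' that do not conflict with it.
    return shape, {**head_features, **rzf}


plural_slot = [({"num": "pl"}, lambda s: s + "s")]
print(apply_template("dog", {}, [plural_slot], {"num": "pl"}))
# ('dogs', {'num': 'pl'})
print(apply_template("oxen", {"num": "pl"}, [plural_slot], {"num": "pl"}))
# ('oxen', {'num': 'pl'})
```

The second call shows the blocking case discussed above: because the listed stem already bears the plural feature, the suffixing rule is not applied.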

4.4. Definitions of Phonological Rule Application

The application of a phonological rule to a lexical entry changes the phonetic form of the input lexical entry.

The following subsections define the application of a phonological rule PR to an input lexical entry ILE, resulting in the output lexical entry OLE. (ILE may be a Real or a Virtual Lexical Entry, and OLE will be a Virtual Lexical Entry.)

4.4.1. Phonetics of Phonological Rule Application

This section describes the phonetic effects of the application of a phonological rule to a lexical entry.

4.4.1.1. Definition of Match between a Phonological Rule and a Lexical Entry

Let the Phonetic Template PRLTemp = <LInit, LFinal, (PRL1...PRLi)> be the Left Environment of phonological rule PR, the Phonetic Sequence PRISeq = (PRI1...PRIj) be the Phonetic Input Sequence of PR, and the Phonetic Template PRRTemp = <RInit, RFinal, (PRR1...PRRk)> be the Right Environment of PR. Let the Phonetic Sequence PrevWord be the prev_word field (if any) of PR, and let the Phonetic Sequence NextWord be the next_word field (if any) of PR.

Further let the Phonetic Template PETemp = <LInit, RFinal, (PRL1...PRLi PRI1...PRIj PRR1...PRRk)> be the combined template for PR, where the Phonetic Sequence of PETemp is the concatenation of the Phonetic Sequences of the Left Environment template + Phonetic Input Sequence + Right Environment. (Note that LFinal and RInit are ignored. Also, either PRLSeq or PRRSeq may be omitted in PR, and any of the Phonetic Sequences of the environments or of the input of PR may be empty; in that case, the Phonetic Sequence of PETemp consists of the concatenations of the non-empty fields. PESeq itself should never be empty, since then the rule would apply everywhere.)

Then phonological rule PR matches against the Phonetic Sequence PLSeq = PL1...PLn, a subsequence of the phonetic sequence representing the Phonetic Shape of the lexical entry ILE, iff:

(1) PETemp partitions PLSeq (section 4.1.2, Definition of Partition of a Phonetic Sequence by a Phonetic Template);

(2) If PLISeq = PLw...PLx is the subsequence of PLSeq that matches PRI1...PRIj (the input sequence of PR), then PLISeq does not contain any boundary markers not specifically required in PRI1...PRIj. (Unlike the part of the phonetic shape which matches against the rule’s environment, the portion which matches the rule’s input cannot contain any boundary markers not called for by the rule.)

(3) If the Phonetic Sequence of the Input Template of PR is empty (i.e. a rule of epenthesis), and if PRR1 is not a boundary marker, then if PLy...PLz is the subsequence of PLSeq that matches PRRSeq (the Right Environment of PR), then PLy is not a boundary marker. (In a rule of epenthesis, the epenthesized segment(s) is (arbitrarily) attached to the right of a boundary marker not specifically mentioned in the rule); and

(4) The Stratum of ILE must be included in the Rule Strata of PR.

(5) If PrevWord has a value, then PrevWord matches the Phonetic Shape of the word preceding the word being analyzed, if there is one; if there is no preceding word and PrevWord has a value, it is the atom *null*.

(6) If NextWord has a value, then NextWord matches the Phonetic Shape of the word following the word being analyzed, if there is one; if there is no following word and NextWord has a value, it is the atom *null*.

The sub-sequence PLISeq of PLSeq which matches against the Input Sequence of PR is referred to as the "input stretch" of PLSeq. (There may be more than one input stretch in a given lexical entry.)

4.4.1.2. Definition of the Phonetics of a Single Application of a Phonological Rule

In this section, the single application of a phonological rule to a phonetic sequence is defined. This is an abstraction from the more general situation in which a phonological rule may apply multiple times to a single phonetic sequence; that case is defined in the next section, based on the definition given here. It is also an abstraction from the application of a disjunctive set of phonological rules to a lexical entry, which is described in the second section following.

Note: The following definition is given in terms of the synthesis of a derived lexical entry by applying a phonological rule to another (underlying) lexical entry.

Let the variable-free Phonetic Sequence PRISeq = (PRI1...PRIm) be the Phonetic Input Sequence of rule PR, and PROSeq = (PRO1...PROn) be its Phonetic Output Sequence, and let the variable-free phonetic sequences PLISeq = (PLI1...PLIi) and PLOSeq = (PLO1...PLOj) be the Input Stretch of some lexical entry LE and its transformation according to rule PR.

(There is no guarantee that m=i or that n=j, since the Input Lexical Sequence and its transformation may have a boundary marker not mentioned in the rule; and there is no guarantee that m=n or that i=j, since segments may be epenthesized or deleted by the rule.)

Then PR transforms PLISeq into PLOSeq iff:

(1) Rule PR matches LE, with PLISeq being the Input Stretch according to this match (see definition in section 4.4.1.1, Match between a Phonological Rule and a Lexical Entry);

(2) If PRISeq and PLISeq are the empty list (a rule of epenthesis), PLOSeq = PROSeq;

(3) If PROSeq is the empty list (a rule of deletion), PLOSeq is the empty list;

(4) If PRISeq and PROSeq are non-empty phonetic sequences of the same length, then each PLOk is identical to PLIk except that for each segment PLIk matched to the corresponding simple context PRIl, each feature-name feature-value pair in PROli is substituted into PLOk in place of the corresponding feature of the same name (if any) in PLIk;

(5) If PRISeq is of length one and PROSeq is of length greater than one (for instance, a diphthongization rule), then PLOSeq consists of the same number of segments as PROSeq, and each segment PLOk bears all the features of PLI1 except that the feature-name feature-value pairs given in PROk have been substituted for the features of the same name (if any) in PLI1; or

(6) If PRISeq and PLISeq are of length greater than one, and PROSeq is of length one (for instance, a rule of degemination), PLOSeq is of length one, and its features are those of PRO1 plus any non-conflicting features from the intersection of the feature-name feature-value pairs of the set of all segments in PLISeq.

Note 1: There is no provision for a rule which takes as input two or more segments, and transforms them into some different number of segments greater than one.

Note 2: For reasons of computational tractability, the use of phonological rules to add, delete or change boundary markers is not recommended.
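Clauses (2) through (6) can be illustrated with a small Python sketch. It is not Hermit Crab's implementation: a segment is modeled as a dictionary from feature name to feature value, the name transform_stretch is hypothetical, and the sketch assumes that the Input Stretch contains no boundary markers, so that it lines up segment for segment with the rule's Phonetic Input Sequence. The match of clause (1) is taken as given.

```python
from typing import Dict, List

Segment = Dict[str, str]             # feature name -> feature value


def transform_stretch(pli_seq: List[Segment], pro_seq: List[Segment]) -> List[Segment]:
    """Return PLOSeq, given the Input Stretch PLISeq and the rule output PROSeq."""
    if not pli_seq:                              # clause (2): epenthesis
        return [dict(seg) for seg in pro_seq]
    if not pro_seq:                              # clause (3): deletion
        return []
    if len(pli_seq) == len(pro_seq):             # clause (4): feature change
        return [{**pli, **pro} for pli, pro in zip(pli_seq, pro_seq)]
    if len(pli_seq) == 1:                        # clause (5): one to many
        return [{**pli_seq[0], **pro} for pro in pro_seq]
    if len(pro_seq) == 1:                        # clause (6): many to one
        shared = set(pli_seq[0].items())         # pairs common to all input segments
        for seg in pli_seq[1:]:
            shared &= set(seg.items())
        return [{**dict(shared), **pro_seq[0]}]  # features of PRO1 take precedence
    raise ValueError("no provision for many-to-many rules (Note 1)")


# Voicing of a single segment (clause (4)).
print(transform_stretch([{"cons": "+", "voice": "-"}], [{"voice": "+"}]))
# [{'cons': '+', 'voice': '+'}]
```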

4.4.1.3. Definition of Phonetics of Multiple Application of a Phonological Rule

If its structural description is met more than once in a given input, a phonological rule will apply to that sequence multiple times (cf. Kenstowicz and Kisseberth 1979, chapter 8). The way multiple application works in Hermit Crab depends on the setting of the field mult_applic for the rule (section 4.4.1.3 Definition of Phonetics of Multiple Application of a Phonological Rule). This field may have the value simultaneous (section 4.4.1.3.1), lr_iterative (section 4.4.1.3.2), or rl_iterative (section 4.4.1.3.3). Left-to-right iterative application is the default. The following subsections define the application of a phonological rule to a phonetic sequence under these three settings of the mult_applic field.

For the purposes of this specification, a rule is said to apply to a form when one of the following algorithms has been applied, regardless of whether the rule actually changes the input form. In other words, a rule "applies" whenever it is tried against an input string, regardless of whether its structural description is met by any part of that string.

The definitions below refer to application of phonological rules. Because of the difficulty of parsing forms to which deletion rules have been applied, Hermit Crab imposes an arbitrary restriction on the unapplication of deletion rules. (A deletion rule is one whose Phonetic Output Sequence is the empty list.) The application of deletion rules remains unchanged, but there is the possibility that during the analysis phase, a form will not be found that would have produced the correct surface form during the synthesis phase. This could happen if the variable *del_re_app* were set to zero (the default) and a deletion rule was self-opaquing (by virtue of deleting part of its own environment through multiple application). The solution is to set the variable *del_re_app* to a number higher than zero (probably one; setting it too high will cause the search space to expand greatly and likely result in severe slowing). This will cause the morpher to generate further forms in which the deletion rule has been unapplied to its own output, and should generate the forms from which iterative application of the deletion rule can later generate the surface form. See Phonological Rules—Deletion Rules (section 2.3.5) for further details.

As a result of the application of a set of phonological rules, the stratum to which a lexical entry belongs may change; see Storable Lexical Entries (section 3.3).

The application of a disjunctive rule set to a lexical entry differs from the application of a (simple) phonological rule (which is modeled as a disjunctive rule with a single subrule); see Definition of Phonetics of Application of a Disjunctive List of Phonological Rules, section 4.4.1.4.

4.4.1.3.1. Simultaneous Application

If the mult_applic variable for the rule has the value simultaneous, the following describes the application of a phonological rule to a phonetic sequence.

Phonological rule PR transforms the phonetic sequence ILESeq into the phonetic sequence OLESeq, iff ILESeq is identical to OLESeq except that for every phonetic sub-sequence SSi = Seg1...Segj of ILESeq which matches against rule PR (see Definition above of a Match between a Phonological Rule and a Lexical Entry, section 4.4.1.1); and which, if the stratum is cyclic, contains one or more segments which have been changed or inserted since the beginning of this cycle, or which has had one or more segments deleted between Seg1 and Segj since the beginning of the cycle, the Input Stretch I1...Im of SSi has been transformed into the Phonetic Sequence O1...On by the application of PR.

Note 1: The special condition on the application of a cyclic phonological rule approximates the Strict Cycle Condition.

Note 2: There is no guarantee that the portions of ILESeq that matched against the Left and Right Environments of PR will still match in OLESeq. In other words, "why" opacity may occur.

Note 3: The input stretch of SSi should not overlap the input stretch of SSi+1. (This possibility can arise only if the input stretches contain more than one segment. The results of simultaneous application of a rule to overlapping sequences of segments are in the general case ill-defined.)
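As an illustration, simultaneous application can be sketched in Python over a deliberately simplified model: segments are single characters, a rule is four literal strings (left environment, input, output, right environment), and the stratum is noncyclic. The names matches_at and apply_simultaneous are hypothetical. The sketch covers only rules with a non-empty input, and it resolves overlapping input stretches by keeping the leftmost one, which is an assumption, since Note 3 leaves that case ill-defined.

```python
def matches_at(seq: str, start: int, left: str, inp: str, right: str) -> bool:
    """True if the rule's input stretch begins at index start of seq."""
    end = start + len(inp)
    return (start >= len(left)
            and seq[start - len(left):start] == left
            and seq[start:end] == inp
            and seq[end:end + len(right)] == right)


def apply_simultaneous(seq: str, left: str, inp: str, out: str, right: str) -> str:
    """Rewrite every input stretch of seq at once; all matching is done on the input."""
    if not inp:
        raise ValueError("this sketch does not cover epenthesis rules")
    pieces = []
    last_end = 0                                 # end of the previous input stretch
    for start in range(len(seq) - len(inp) + 1):
        if start < last_end:                     # would overlap the previous stretch
            continue
        if matches_at(seq, start, left, inp, right):
            pieces.append(seq[last_end:start])   # unchanged material
            pieces.append(out)                   # transformed input stretch
            last_end = start + len(inp)
    pieces.append(seq[last_end:])
    return "".join(pieces)


# a -> b / b __ : only the "a" that follows "b" in the input is affected.
print(apply_simultaneous("baaa", "b", "a", "b", ""))   # bbaa
```

Because environments are checked against the input only, the "b" created by the rule does not trigger a further change; under iterative application the same rule would propagate through the whole form.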

4.4.1.3.2. Left-to-right Iterative Application

If the mult_applic variable for the rule has the value lr_iterative (the default), the following describes the application of a rule to a phonetic sequence.

Phonological rule PR transforms the phonetic sequence ILESeq into the phonetic sequence OLESeq, by the following algorithm:

(1) Set TempSeq = ILESeq, and set CurSeg = the first segment of TempSeq.

(2) If PR matches against TempSeq, then set InStretch = the left-most input stretch of TempSeq such that the first segment of InStretch is CurSeg or to the right of CurSeg, and either

(a) the current rule stratum is noncyclic, or

(b) the portion of TempSeq which PR partitions with InStretch as its input stretch contains one or more segments which have been changed or inserted since the beginning of this cycle, or one or more segments have been deleted from that stretch since the beginning of this cycle,

then set OutStretch = the result of applying PR to InStretch, and then replace InStretch in TempSeq with OutStretch.

Otherwise (if PR does not match against TempSeq while meeting the above requirements), then set OLESeq = TempSeq and exit.

(3) Else set CurSeg to the first segment after OutStretch and go to step (2).

Note 1: Condition (2b) approximates the Strict Cycle Condition.
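The algorithm can be sketched in Python over a deliberately simplified model: segments are single characters, a rule is four literal strings (left environment, input, output, right environment), and the stratum is noncyclic, so that condition (2a) always holds. The names matches_at and apply_lr_iterative are hypothetical, and the sketch covers only rules with a non-empty input; it is an illustration, not Hermit Crab's implementation.

```python
def matches_at(seq: str, start: int, left: str, inp: str, right: str) -> bool:
    """True if the rule's input stretch begins at index start of seq."""
    end = start + len(inp)
    return (start >= len(left)
            and seq[start - len(left):start] == left
            and seq[start:end] == inp
            and seq[end:end + len(right)] == right)


def apply_lr_iterative(ile_seq: str, left: str, inp: str, out: str, right: str) -> str:
    """Apply one rule left to right, letting each application feed the next."""
    if not inp:
        raise ValueError("this sketch does not cover epenthesis rules")
    temp_seq, cur = ile_seq, 0                   # step (1)
    while True:
        start = None
        # Step (2): left-most input stretch beginning at or after CurSeg.
        for candidate in range(cur, len(temp_seq) - len(inp) + 1):
            if matches_at(temp_seq, candidate, left, inp, right):
                start = candidate
                break
        if start is None:
            return temp_seq                      # no further match: OLESeq
        temp_seq = temp_seq[:start] + out + temp_seq[start + len(inp):]
        cur = start + len(out)                   # step (3): first segment after OutStretch


# a -> b / b __ : each new "b" provides the environment for the next change.
print(apply_lr_iterative("baaa", "b", "a", "b", ""))   # bbbb
```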

4.4.1.3.3. Right-to-left Iterative Application

If the mult_applic variable for the rule has the value rl_iterative, the rule is applied iteratively from right to left. The algorithm is identical to that for left-to-right iterative application (see above), except for the obvious difference of direction.

4.4.1.4. Definition of Phonetics of Application of a Disjunctive List of Phonological Rules

For any given segment in a lexical entry, a disjunctive list of phonological rules may apply only once in a given stratum (unless the disjunctive rule belongs to a cyclic stratum, in which case it may apply only once in each cycle, as allowed by the principle of Strict Cyclicity). Furthermore, only one subrule of the disjunctive list may apply to that segment. (Note that "ordinary" phonological rules are modeled by disjunctive rules with a single subrule.)

Let disjunctive rule R be a list of subrules (R1...Rn), and LESeq a phonetic sequence (the input sequence). Then R applies to LESeq by the following algorithm:

(1) Set CurSeg = the first segment of LESeq.

(2) Set CurRule = R1.

(3) Test CurRule for a match beginning with CurSeg in LESeq.

(4) If CurRule matches LESeq beginning with CurSeg, let InStretch be the input stretch of LESeq beginning with CurSeg. Then set CurSeg to the first segment following InStretch; set LESeq to the result of applying CurRule to InStretch. If this moves CurSeg past the end of the word, exit, returning LESeq; else go to step 2.

Else (if CurRule does not match LESeq beginning with CurSeg), set CurRule = the next rule after CurRule. If there is no rule after CurRule, set CurRule = R1 and set CurSeg = the next segment after CurSeg. If this moves CurSeg past the end of the word, exit, returning LESeq; else go to step 2.

If the current rule stratum is cyclic, the stretch of LESeq matching CurRule must contain one or more segments which have been changed or inserted since the beginning of the cycle, or one or more segments must have been deleted from that stretch since the beginning of this cycle.

Note 1: Step 4 provides for vacuous application of a subrule to count as application, i.e. the first subrule which applies blocks other subrules even if it only applies vacuously.

Note 2: The above algorithm (like all algorithms in this specification) is not necessarily the most computationally efficient way to implement the process in question.
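By way of illustration, the algorithm can be sketched in Python over a deliberately simplified model: segments are single characters, each subrule is four literal strings (left environment, input, output, right environment), and the stratum is noncyclic. The names matches_at and apply_disjunctive are hypothetical, the sketch covers only subrules with a non-empty input, and it assumes that after a subrule applies, CurSeg is the first segment following the rewritten stretch.

```python
from typing import List, Tuple

Subrule = Tuple[str, str, str, str]      # (left, input, output, right)


def matches_at(seq: str, start: int, left: str, inp: str, right: str) -> bool:
    """True if the subrule's input stretch begins at index start of seq."""
    end = start + len(inp)
    return (start >= len(left)
            and seq[start - len(left):start] == left
            and seq[start:end] == inp
            and seq[end:end + len(right)] == right)


def apply_disjunctive(le_seq: str, subrules: List[Subrule]) -> str:
    """At each position, apply only the first subrule that matches there."""
    if any(not inp for _, inp, _, _ in subrules):
        raise ValueError("this sketch does not cover epenthesis subrules")
    cur = 0                                          # step (1)
    while cur < len(le_seq):
        for left, inp, out, right in subrules:       # steps (2) and (3)
            if matches_at(le_seq, cur, left, inp, right):
                # Step (4): rewrite the input stretch and move past the result.
                le_seq = le_seq[:cur] + out + le_seq[cur + len(inp):]
                cur += len(out)
                break                                # later subrules are blocked here
        else:
            cur += 1                                 # no subrule matched at CurSeg
    return le_seq


# k -> s before i; otherwise k -> g.
toy_subrules = [("", "k", "s", "i"), ("", "k", "g", "")]
print(apply_disjunctive("kika", toy_subrules))   # siga
```

Replacing the first subrule with a vacuous one, ("", "k", "k", "i"), leaves the first "k" unchanged and still blocks the second subrule at that position, as Note 1 describes.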

4.4.2. Definition of Application of a Phonological Rule to a Lexical Entry

The application of a phonological rule PR to an input lexical entry ILE translates ILE into an output lexical entry OLE iff the application of rule PR to the Phonetic Shape of ILE results in the Phonetic Shape of OLE (see Definition of the Phonetics of Multiple Application of a Phonological Rule, section 4.4.1.3).

4.4.3. Definition of Application of a Set of Phonological Rules

This section specifies the application of a set of phonological rules of a given stratum.

The ordering of such sets of rules of different strata or in different cycles with respect to each other, and with respect to morphological rules, is defined above (see Storable Lexical Entries, section 3.3).

Let the set of phonological rules of the stratum be PRSet = {PR1...PRn}, and let ILESeq be the input Phonetic Shape to which PRSet applies to produce the output Phonetic Shape OLESeq. (Again, "input" and "output" are used in the synthesis sense.) Each subsection below then defines the application of PRSet, according to the rule ordering of phonological rules for the current stratum, whether linear or simultaneous.

In addition to linear and simultaneous ordering, it is logically possible that a set of rules would be freely ordered, that is, the set would reapply to a given form until they produced no further change. In Kenstowicz and Kisseberth (1979, chapter 8), this is referred to as "the Free Reapplication Hypothesis." Hermit Crab does not implement this form of ordering, because (1) it is computationally expensive (and can lead to nontermination); and (2) few if any phonologists have proposed such ordering.

4.4.3.1. Linearly Ordered Rules

This definition applies to PRSet if the value of the p_rule_order field of the current stratum is linear.

Let PR1...PRn be the list of phonological rules in PRSet in order of application. Then OLESeq is the result of first applying rule PR1 to ILESeq, then applying PR2 to the output of PR1, etc., and finally applying PRn to the output of PRn–1.

Note: In Kenstowicz and Kisseberth (1979, chapter 8), this is referred to as "the Ordered Rule Hypothesis."
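Linear ordering amounts to composing the rules in sequence, which the following Python sketch illustrates. Each rule is modeled as a hypothetical function from one string to another (standing in for the application of a single phonological rule to a Phonetic Shape), and the name apply_linear_rules is likewise an assumption of the sketch.

```python
from functools import reduce
from typing import Callable, List

PhonRule = Callable[[str], str]      # stand-in for one phonological rule


def apply_linear_rules(ile_seq: str, rules: List[PhonRule]) -> str:
    """Apply PR1 to ILESeq, PR2 to the output of PR1, and so on through PRn."""
    return reduce(lambda seq, rule: rule(seq), rules, ile_seq)


def assibilate(seq: str) -> str:     # toy rule: t -> s before i
    return seq.replace("ti", "si")


def voice(seq: str) -> str:          # toy rule: s -> z everywhere
    return seq.replace("s", "z")


print(apply_linear_rules("ati", [assibilate, voice]))   # azi
print(apply_linear_rules("ati", [voice, assibilate]))   # asi
```

The two calls differ only in rule order: in the first, the output of one rule feeds the other; in the second, it does not.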

4.4.3.2. Simultaneous Application of Rules

This definition applies to PRSet if the value of the p_rule_order field of the current stratum is simultaneous.

OLESeq is derived from ILESeq by the set of phonological rules PRSet iff, for every rule PRi in PRSet which matches against ILESeq, that rule has been applied to ILESeq to produce OLESeq.

Warning: Hermit Crab does not prevent two rules with contradictory effects from applying in such a way that one rule undoes the effect of the other, nor does Hermit Crab signal this situation.

Note: In Kenstowicz and Kisseberth (1979, chapter 8), this is referred to as "the Direct Mapping Hypothesis."

4.5. Definition of Application of a Stratum

The following defines the application of a single Stratum of rules to a Lexical Entry.

4.5.1. Application of a Noncyclic Stratum

Let Si be a noncyclic stratum. Then the application to a lexical entry from stratum Si of one morphological rule of stratum Si produces a storable lexical entry of stratum Si. If stratum Si+1 is a cyclic stratum, then the application to a lexical entry from stratum Si of the relevant Affix Template (if any) of Si, followed by the application of all the phonological rules of stratum Si, followed by the erasure of any boundary markers, followed by the application of all the phonological rules of stratum Si+1, produces a storable lexical entry of stratum Si+1. Otherwise (if stratum Si+1 is a non-cyclic stratum), the application to a lexical entry from stratum Si of the relevant Affix Template (if any) of Si, followed by all the phonological rules of stratum Si, followed by the erasure of any boundary markers, produces a storable lexical entry of stratum Si+1.

4.5.2. Application of a Cyclic Stratum

Let Sj be a stratum of cyclic rules. Then the application to a storable lexical entry from stratum Sj of one or more cycles is also a storable lexical entry of stratum Sj. (A "cycle" is defined as the application of one morphological rule of the stratum, followed by the application of all phonological rules of that stratum, followed by the erasure of any boundary markers.) If stratum Sj+1 is also a cyclic stratum, then the application of all the phonological rules of stratum Sj+1 to a storable lexical entry of stratum Sj, followed by the application of the relevant Affix Template (if any) of Sj, is a storable lexical entry of stratum Sj+1. Otherwise (if stratum Sj+1 is a non-cyclic stratum), then a storable lexical entry of stratum Sj to which the relevant Affix Template (if any) of Sj has been applied is also a storable lexical entry of stratum Sj+1.

4.6. Definition of Generation of a Surface Lexical Entry

For convenience, the pseudo-stratum *surface* is defined as the final stratum; it has no rules and is considered a non-cyclic stratum for purposes of the following definition. (That is, a lexical entry belonging to the *surface* stratum may have no further rules, morphological or phonological, applied to it. The user should not define another stratum with the name *surface*.)

Let LE be a lexical entry of stratum S1 to which no morphological or phonological rules have applied, and let RzHF be a set of Head Features which are to be realized on LE. Then LE may be converted into a Derived Lexical Entry of the Surface Stratum by first setting the Head Features list of LE to RzHF plus any non-conflicting features of the existing Head Features of LE, then applying all the Strata beginning with S1 through the Surface Stratum in order.

5. Data Structures

5.1. Input Data Format

The data input to the morpher module for the commands morph_and_lookup_word and morph_and_lookup_list is the output of the Preprocessor module (see chapter five), and contains the data to be morphed. To summarize that chapter: the input to the morpher is a list of one or more Token Record data structures, each containing the print form of the word and its normalized form, and representing a single word of the input string.

The Phonetic Shape field of those records is visible to the morpher, while the Orthographic Shape field is invisible to the morpher rules (although the morpher module passes it on to downstream modules in the Orthographic Shape field of Lexical Entry records).

The function morph_and_lookup_word accepts a list of length three; each member of the list is a Token Record data structure, and together the members represent a single input word, plus the preceding and following words, in that order. The function morph_and_lookup_list accepts a list of Token Record data structures of any length. The morpher morphs each word separately; the previous word and the following word (if any) are, however, accessible to phonological rules through the phonological rule fields prev_word and next_word.

The input to the morpher module for the commands generate_word, apply_stratum, and apply_morpher_rule is similar, but is described under each command.

5.2. Lexical Entry Data Structure

Lexical Entries are record structures; as described above (see Lexical Entries, section 3), each lexical entry represents a root, stem or word. The Lexical Entry data structure is used in the lexicon and in the output of the morpher. (A nearly identical structure is used in the syntactic parser to represent terminal nodes; see chapter seven, Parse Tree Format—Terminal Node Record Structure.)

This section describes the record structure of a lexical entry.

Note: The Lexical Entry structure may be augmented in future versions of Hermit Crab by the addition of fields, e.g. for indicating functional structure.

Record Label: lexical_entry

Fields:

5.2.1. Lexical Entry ID

Optionality: obligatory

Label: id

Type: string

Contents: A code which uniquely identifies this lexical entry data structure.

Purpose: used in debugging to refer to lexical entries.

A derived lexical entry inherits the lex ID of the lexical entry from which it is derived.

A real lexical entry's lex ID remains valid during a single session of Hermit Crab; a virtual lexical entry's lex ID remains valid only until the next time either the function morph_and_lookup_word or the function morph_and_lookup_list is called. Deleting a (real) lexical entry also causes its lex ID to become invalid, as does resetting the lexicon (see reset_lexicon, section 6.4.6).

5.2.2. Phonetic Shape

Optionality: obligatory in Real Lexical Entries; pertains to Virtual Lexical Entries only during debugging

Label: sh

Type: string

Contents: A string which represents the phonological form of the lexical entry. For lexical entries which represent entire tokens in the input, this field is copied from the field of the same name in the input Token Record data structure; in the case of lexical entries in the lexicon, it is the result of lexical lookup. In the case of virtual lexical entries, this field is translated from the phonetic sequence which represents its phonological form; this translation is only necessary when matching a storable lexical entry against a real lexical entry, or during debugging.

Implementation note: The translation of the phonetic sequence of a virtual lexical entry into a string may be ambiguous; see Translation from Phonetic Sequence to Regular Expression, section 4.1.1.2.

5.2.3. Family

Optionality: optional, used only in Real Lexical Entries

Label: fam

Type: atom

Contents: Gives the family to which a given (real) lexical entry belongs.

Purpose: To allow blocking of derivations by irregular forms listed in the lexicon.

It may be useful for the shell to treat families of lexical entries as units when the user is editing lexical entries, so that changes to one member of the family are consistently propagated to others. An inheritance schema is one way this might be implemented.

5.2.4. Gloss

Optionality: optional

Label: gl

Type: string

Contents: A translation of the lexical item as listed in the dictionary (for real lexical entries) or as morphed (for virtual lexical entries).

If this field is empty in a real lexical item, the default string "?" is used, as described below (see Morphological Rule Notation—Gloss String, section 7.2.1.14).

Purpose: To represent the morpher's analysis of the word's meaning. The intention is that it will contain the translation of one or more of the morphemes composing the word. This field may also be used by the Display Module as a label for the word.

Glosses are shown in Hermit Crab’s output if the global variable *show_glosses* is true (default), otherwise they are not included.

5.2.5. Part of Speech

Optionality: obligatory

Label: pos

Type: atom

Contents: The name of the part of speech of the lexical item.

5.2.6. Subcategorization

Optionality: optional

Label: sub

Type: list

Contents: A list of atoms, each one of which is the name of a syntactic (parser) rule which the lexical item subcategorizes. If this field is absent, the lexical item does not subcategorize any rules.

Purpose: To allow the lexical item to subcategorize certain syntactic rules. Morphological rules may also be constrained to require that the lexical entry to which they apply subcategorize a specified rule.

Warning: The morpher does not check whether the rules in this list actually exist in the parser's rulebase.

5.2.7. Grammatical Function Information

Optionality: empty

Label: gf

Type: atom

Purpose: This field is meant to carry information specified in syntactic rules as to the function of this node. This information is added by the Parser and/or Functional Structure Modules; the field is always empty in the Morpher module, and may therefore be omitted from all lexical entries within this module. (It is mentioned here only for completeness.)

5.2.8. Morphological Rules

Optionality: optional (defaults to "?")

Label: mrs

Type: list

Contents: The names (atoms) of the morphological rules (if any) which have applied to form this lexical entry; left-to-right order of this list represents the order in which morpher rules applied to produce this lexical entry (in the synthesis sense). This field will often be the empty list for real lexical entries. However, if a real lexical entry represents a stem, rather than a root, it may be desirable to indicate the morphological rules which "would have" applied, in order to prevent their applying. (For instance, if the irregular past tense verb ran is listed in the lexicon, its lexical entry might list the past tense rule as having applied, to avoid generating *ranned.)

Purpose: Used to prevent multiple application of morphological rules, and in debugging.

5.2.9. Stratum

Optionality: obligatory

Label: str

Type: atom

Contents: The name of a rule stratum.

Purpose: This encodes the stratum of rules which may apply to this lexical entry.

The value of *surface* means that no more rules may apply to the lexical entry (it is a surface form).

For real lexical entries, the value of this field must be supplied by the user. For virtual lexical entries, the value is automatically supplied by the morpher.

See also: Storable Lexical Entries (section 3.3)

5.2.10. Morphological/ Phonological Rule Features

Optionality: optional

Label: rf

Type: list

Contents: zero or more atoms, each of which is the name of a Morphological/ Phonological Rule (MPR) feature.

Purpose: These rule features govern which morphological or phonological rules a lexical entry will exceptionally undergo or not undergo. They may be used to encode such things as conjugation class and gender.

If this field is absent, the lexical entry has no MPR features.

If membership in a conjugation class or gender class is important in the syntax, the class membership should be indicated as a Head Feature, since syntactic rules make reference only to Head and Foot Features. Head and Foot Features are visible both to morphological and phonological rules, and to syntactic (phrase structure) rules, whereas MPR features are visible only to morphological/ phonological rules.

5.2.11. Head Features

Optionality: optional

Label: hf

Type: list-valued feature list

Purpose: This list represents the assigned (non-default) Head Features of the lexical entry.

If this field is absent, the values of all Head Features of the lexical entry are the default values.

See also: Foot Features (section 5.2.12); Morphological/ Phonological Rule Features (5.2.10)

5.2.12. Foot Features

Optionality: optional

Label: ff

Type: list-valued feature list

Purpose: This list represents the assigned (non-default) Foot Features of the lexical entry.

If this field is absent, the values of all Foot Features of the lexical entry are the default values.

Foot features are invisible to phonological rules.

See also: Head Features (section 5.2.11); Morphological/ Phonological Rule Features (5.2.10)

5.2.13. Obligatory Head Features

Optionality: optional

Label: of

Type: list

Contents: A list of atoms, each of which is the name of a Head Feature.

Purpose: For each feature-name listed, some value must be assigned to that feature by the end of the derivation (in the synthesis sense, i.e. to the complete word). This feature value will usually be supplied by an affix yet to be attached to the stem represented by this lexical entry. If at the end of the derivation, no value has been assigned to such a feature, the derivation is ruled out.

If this field is omitted, there are no obligatory Head Features.

See also: Complete Lexical Entries (section 3.5); Morphological Rule Notation—Output Side Record Structure—Obligatory Features (section 7.2.1.13)
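
The end-of-derivation check described above can be sketched as follows. This is only an illustration, not the morpher's implementation: the function names are hypothetical, and the Head Features are assumed to be held as a Python dictionary mapping each feature name to its list of assigned values.

```python
def unassigned_obligatory(of, hf):
    """Return the names in `of` that have no assigned value in `hf`.

    `of` is the Obligatory Head Features list (feature names).
    `hf` is assumed to map feature names to lists of values; a missing
    name or an empty list both count as "no value assigned".
    """
    return [name for name in of if not hf.get(name)]


def derivation_allowed(of, hf):
    """A derivation is ruled out if any obligatory feature is still unassigned."""
    return not unassigned_obligatory(of, hf)
```

For instance, a stem whose of field lists tense is rejected as a complete word until some affix has supplied a value for tense.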

5.2.14. Pseudo Lexical Entry Flag

Optionality: optional, relevant only to storable lexical entries

Label: ps

Type: Boolean

Default: false

Purpose: This field has the value false for Real Lexical Entries and for all Storable Lexical Entries which are derived from Real Lexical Entries. Storable Lexical Entries not derived from Real Lexical Entries (visible to the user only with the function show_morphings) have the value true for this field. In other words, this field is used in debugging to flag lexical entries which are not derivable from the lexicon.

See also: show_morphings (section 6.6.10).

5.3. Superentries

The notion of a superentry of a lexical entry is used in the debugging function show_morphings, and in the lexicon function find_lexical_entries. Intuitively, one lexical entry is a superentry of a template lexical entry if the superentry is a (possibly more specific) instantiation of the second entry. The template will usually be supplied by the user, who (for instance) may wish to find all lexical entries in the lexicon matching the template.

More precisely, lexical entry X is a superentry of a (possibly partially specified) lexical entry Template iff:

(1) If Template specifies a Phonetic Shape, then the Phonetic Shape of Template is a substring of the Phonetic Shape of X. ("Substring" here includes the case where the two strings match exactly, i.e. an improper substring.) There is no provision for "wildcards." However, the special character # (ASCII 35) at the beginning or end of the string representing the Phonetic Shape of Template must correspond to the respective terminus of the string representing the Phonetic Shape of X. (In other words, # represents the boundary of the stem.)

(2) If Template specifies a Family, it is identical to the Family of X.

(3) If Template specifies a Gloss, it is identical to the Gloss of X.

(4) If Template specifies a Part of Speech, it is identical to the Part of Speech of X.

(5) If Template specifies a Subcategorization, its Subcategorization list is a subset of the Subcategorization list of X.

(6) If Template specifies a Morphological Rules field, it is a subset of the Morphological Rules list specified for X. (The order of the two lists need not be identical.)

(7) If Template specifies a Stratum, it is identical to the Stratum of X.

(8) If Template specifies lists of MPR Features, Head Features, Foot Features, and/or Obligatory Head Features, each such list is a subset (not necessarily in the same order) of the corresponding list of X.
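
Conditions (1) through (8) can be read as a single predicate. The following is a minimal sketch under stated assumptions: the Entry class and its attribute names are hypothetical stand-ins for a lexical entry record, None stands for "field not specified", and feature lists are simplified to lists of hashable items so that the subset tests can use Python sets.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    """Hypothetical stand-in for a lexical entry; None means 'not specified'."""
    shape: Optional[str] = None     # Phonetic Shape
    fam: Optional[str] = None       # Family
    gl: Optional[str] = None        # Gloss
    pos: Optional[str] = None       # Part of Speech
    stratum: Optional[str] = None   # Stratum (label str in the record)
    sub: Optional[List] = None      # Subcategorization
    mrs: Optional[List] = None      # Morphological Rules
    rf: Optional[List] = None       # MPR Features
    hf: Optional[List] = None       # Head Features (simplified to hashable items)
    ff: Optional[List] = None       # Foot Features (simplified likewise)
    of: Optional[List] = None       # Obligatory Head Features


def shape_matches(template_shape, x_shape):
    """Condition (1): substring test, with # anchoring to an end of the stem."""
    left = template_shape.startswith("#")
    right = len(template_shape) > 1 and template_shape.endswith("#")
    core = template_shape[1 if left else 0:len(template_shape) - (1 if right else 0)]
    if left and right:
        return x_shape == core
    if left:
        return x_shape.startswith(core)
    if right:
        return x_shape.endswith(core)
    return core in x_shape


def is_superentry(x, template):
    """True if x satisfies every condition that the template specifies."""
    if template.shape is not None:
        if x.shape is None or not shape_matches(template.shape, x.shape):
            return False
    # Conditions (2), (3), (4), (7): identity where specified.
    for name in ("fam", "gl", "pos", "stratum"):
        wanted = getattr(template, name)
        if wanted is not None and wanted != getattr(x, name):
            return False
    # Conditions (5), (6), (8): order-insensitive subset where specified.
    for name in ("sub", "mrs", "rf", "hf", "ff", "of"):
        wanted = getattr(template, name)
        if wanted is not None and not set(wanted) <= set(getattr(x, name) or []):
            return False
    return True
```

A template specifying only shape "#ra" would accept an entry with shape "ran", while "#an" would not, since # pins the match to the left edge of the stem.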

5.4. Character Definition Table

Character Definition Tables define the translation between sequences of one or more characters, each of which represents a single segment, and sets of phonetic features representing those segments.

Record Label: char_table

Fields:

5.4.1. Character Table Name

Optionality: obligatory

Label: name

Type: atom

Purpose: To refer to this table. For instance, the definition of a stratum must specify the Character Definition Table it uses (see Stratum Property Setting Record, section 5.5).

Warning: The morpher enforces uniqueness of character definition table names; no two tables can have the same name.

5.4.2. Encoding

Optionality: obligatory

Label: encoding

Type: string

Purpose: This field is not used in Hermit Crab; it appears solely so that LevelOfRepr objects can be passed back and forth between Hermit Crab and Cellar (which does utilize this field).

5.4.3. Segment Definitions

Optionality: optional

Label: seg_defs

Type: list

Contents: Each member of this list is itself a list of length two. The first member of the sublist is a string, whose characters define the external representation of a segment. The other member of the sublist is an atomic-valued feature list, which defines the features of the segment. Any features which do not appear are undefined for that segment.

Default: the empty list.

Purpose: The character representation of segments is used in lexical entries and in input tokens. They may also be used in the Phonetic Output field of morphological rules to represent affixal material.

If this field does not appear, the character definition table does not define any segments (such a table is unlikely to be of much use).

5.4.4. Boundary Definitions

Optionality: optional

Label: bdry_defs

Type: list

Contents: Zero or more strings, each of which represents a boundary marker. (There is no provision for the representation of boundary markers as sets of features.)

Default: the empty list.

Purpose: Boundary markers may be used in morphological and phonological rules. They may also appear in lexical entries and in input tokens, although such usage is likely to be rare.

If this field does not appear, the character definition table does not define any boundary markers.
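
As an illustration of how a character definition table might be used, the sketch below splits a string into segments and boundary markers. It is not the morpher's actual algorithm: the function name is hypothetical, and the greedy longest-match strategy is an assumption made for this example (a real implementation may need to consider several segmentations when representations overlap).

```python
def segment_string(shape, seg_defs, bdry_defs=()):
    """Split `shape` into (kind, representation, features) triples.

    seg_defs:  list of (string, feature-dict) pairs, as in the seg_defs field.
    bdry_defs: list of boundary marker strings, as in the bdry_defs field.
    Assumption: at each position the longest defined representation wins.
    """
    table = {rep: feats for rep, feats in seg_defs}
    reps = [r for r in list(table) + list(bdry_defs) if r]
    reps.sort(key=len, reverse=True)  # try longer representations first
    out = []
    i = 0
    while i < len(shape):
        for rep in reps:
            if shape.startswith(rep, i):
                if rep in table:
                    out.append(("seg", rep, table[rep]))
                else:
                    out.append(("bdry", rep, None))  # boundaries carry no features
                i += len(rep)
                break
        else:
            raise ValueError(f"no definition for {shape[i]!r} at position {i}")
    return out
```

With segment definitions for "t", "s", "ts" and "a" and the boundary marker "+", the string "tsa+t" comes out as the affricate "ts", then "a", the boundary, and "t".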

5.5. Stratum Property Setting Record

A Stratum Property Setting Record specifies, for a given stratum, one of the following properties: the name of its character definition table; whether it is cyclic or noncyclic; whether its morphological or phonological rules are linearly ordered; or the set of Affix Templates pertaining to the Stratum. The reason for setting these properties individually, rather than loading them together in some sort of stratum record, is that some sorts of changes (e.g. the character table) have much more far-reaching repercussions than others (e.g. the rule order). The use of individual property setting records allows the latter properties to be changed without having to reset the former properties.

Record Label: stratum_setting

Fields:

5.5.1. Stratum Name

Optionality: obligatory

Label: nm

Type: atom

Purpose: To determine the stratum which is being set.

5.5.2. Setting Type

Optionality: obligatory

Label: type

Type: atom: ctable, cyclicity, mrule, prule or templates

Purpose: Specifies the property of the stratum that is being set: its character definition table; its cyclicity; the ordering of its morphological or phonological rules; or its Affix Templates.

Warning: Resetting the character definition table for a stratum means that any lexical entries or rules for that stratum are invalidated, and must be reloaded.

5.5.3. Value

Optionality: obligatory

Label: value

Type: If the Setting Type is ctable, an atom (the name of a character definition table); if the Setting Type is cyclicity, one of the atoms cyclic or noncyclic; if the type is mrule, one of the atoms unordered or linear; if the type is prule, one of the atoms linear or simultaneous; or if the type is templates, a list of Affix Templates.

Default: None. Except for the character definition table, the properties of the stratum have defaults; but the use of this command requires an explicit value.

The defaults for the Stratum itself are as follows: the Cyclicity is ‘noncyclic’; the Morphological Rule Order is ‘unordered’; the Phonological Rule Order is ‘simultaneous’; and there are no Affix Templates.
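
The interaction between setting records and stratum defaults can be sketched as follows. The function and variable names are hypothetical, and two points are assumptions made for this example only: that a setting record naming an unknown stratum creates it with the defaults listed above, and that a reload is needed only when an already-set character table is replaced by a different one.

```python
import copy

# Defaults as stated above; the character definition table has no default.
STRATUM_DEFAULTS = {
    "ctable": None,
    "cyclicity": "noncyclic",
    "mrule": "unordered",
    "prule": "simultaneous",
    "templates": [],
}

# Permitted atoms for the atom-valued setting types.
ALLOWED_ATOMS = {
    "cyclicity": {"cyclic", "noncyclic"},
    "mrule": {"unordered", "linear"},
    "prule": {"linear", "simultaneous"},
}


def apply_stratum_setting(strata, nm, type_, value):
    """Apply one stratum_setting record to `strata` (a dict keyed by stratum name).

    Returns True when lexical entries and rules of the stratum would have
    to be reloaded, i.e. when an existing character table is replaced.
    """
    if type_ not in STRATUM_DEFAULTS:
        raise ValueError(f"unknown setting type: {type_}")
    if type_ in ALLOWED_ATOMS and value not in ALLOWED_ATOMS[type_]:
        raise ValueError(f"invalid value {value!r} for {type_}")
    if type_ == "templates" and not isinstance(value, list):
        raise ValueError("templates must be a list of Affix Templates")
    stratum = strata.setdefault(nm, copy.deepcopy(STRATUM_DEFAULTS))
    must_reload = type_ == "ctable" and stratum["ctable"] not in (None, value)
    stratum[type_] = value
    return must_reload
```

Keeping each property in its own record means that changing, say, the rule order touches one key and never triggers the reload that a character table change would.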

5.6. Natural Class

Each Natural Class is a record data structure which defines a set of features, which may then be used in Phonetic Sequences (see below).

Record Label: nat_class

Fields:

5.6.1. Name

Optionality: obligatory

Label: name

Type: atom

Purpose: To identify the natural class to which a simple context may refer.

5.6.2. Features

Optionality: obligatory

Label: features

Type: atomic-valued feature list. This list represents a set of phonetic features constituting a single segment.

For instance, the list

(consonantal + pt_of_articulation alveolar)

might refer to an alveolar consonant.

A Natural Class features list which is empty matches any segment.

There is no provision for extra-segmental structure (such as syllabic tree structure), nor for feature structure (such as tiers of features), although these are possible future enhancements.
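
Membership of a segment in a natural class amounts to a subset test on feature-value pairs. The sketch below assumes (hypothetically) that feature lists such as the one above are read into Python dictionaries; the function names are illustrative only.

```python
def parse_feature_list(flat):
    """Turn a flat list like [name, value, name, value, ...] into a dict."""
    if len(flat) % 2:
        raise ValueError("feature list must alternate names and values")
    return dict(zip(flat[0::2], flat[1::2]))


def in_natural_class(class_features, segment_features):
    """True if every feature of the class has the same value in the segment.

    An empty class feature list therefore matches any segment.
    """
    return all(segment_features.get(name) == value
               for name, value in class_features.items())
```

Features that the segment has but the class does not mention are simply ignored, which is what lets a short feature list pick out a whole class of segments.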

5.7. Phonetic Sequences and Phonetic Templates

5.7.1. Purpose of Phonetic Sequences and Phonetic Templates

A phonetic sequence consists of a list of boundary segments, natural classes, and optional sequences, and represents a sequence of phonetic segments and/or phonological boundaries. Such sequences are used internally to the morpher to represent the phonetic form of words being analyzed. In addition, phonetic sequences are used both internally and externally to define the input of morphological rules, and the input and output of phonological rules.

A phonetic template is a phonetic sequence augmented with a specification as to whether it must match against a phonetic sequence beginning at the left end of the phonetic sequence and/or ending at its right end. Phonetic templates are used internally and externally in the left environment, right environment, previous word, and next word fields of phonological rules.

The term "phonetic" is used here as a convenient cover term for sequences which are interpretable in phonetic terms. No theoretical stance is implied as to the existence of a distinction among morphophonemic, phonemic, or systematic phonetic levels.

The input to the morpher also represents a sequence of phonetic segments. However, as described in the chapter on the Preprocessor module, this input consists of a sequence of tokens, each containing a pair of strings (one of which represents the phonetic shape); there is no provision in the input to the morpher for the explicit representation of phonetic features. Likewise, the phonetic shape of a lexical entry is given as a string, not a phonetic sequence. However, the morpher uses phonetic sequences internally to represent an input word or its analysis. Finally, the Phonetic Output field of morphological rules allows affixal material to be specified by strings (to be translated into segments using a specified character definition table), as well as by phonetic sequences.

5.7.2. Definitions

A Phonetic Sequence is defined in terms of Simple Contexts, Segments, Boundary Markers, and Optional Segment Sequences. These subparts are first defined, followed by the definitions of Phonetic Sequences and Phonetic Templates.

5.7.2.1. Definition of Simple Context

Each Simple Context is a record data structure which defines a class of segments.

Record Label: simp_cntxt

Fields:

5.7.2.1.1. Natural Class

Optionality: obligatory

Label: class

Type: atom (name of a natural class)

Purpose: To define the invariant features of the simple context. The natural class named here itself defines that feature content (see Natural Class—Features, section 5.6.2).

5.7.2.1.2. Alpha variables

Optionality: optional

Label: alpha_vars

Type: alpha variable list

Purpose: To define the variable-valued features of the simple context. Variable-valued features are often used in assimilation rules; they are the "alpha" variables of classical generative phonology. Note that, unlike the standard notation for phonological rules, only the alpha variable and its polarity (+ or –) will appear here; the feature name to which the alpha variable refers is defined elsewhere in the rule (in the var_fs field).

5.7.2.2. Segment

A segment data structure represents a single segment.

Record Label: seg

Fields:

5.7.2.2.1. Representation

Optionality: obligatory

Label: rep

Type: string

Purpose: This gives the string representation of the segment, which should match the string representation of a segment in the appropriate character definition table.

5.7.2.2.2. Character table

Optionality: obligatory

Label: ctable

Type: atom

Purpose: This gives the name of the character definition table in which the segment is defined.

5.7.2.3. Boundary Marker

A boundary marker data structure represents a single boundary marker.

Record Label: bdry

Fields:

5.7.2.3.1. Representation

Optionality: obligatory

Label: rep

Type: string

Purpose: This gives the string representation of the boundary marker, which should match the string representation of a boundary marker in the appropriate character definition table.

5.7.2.3.2. Character table

Optionality: obligatory

Label: ctable

Type: atom

Purpose: This gives the name of the character definition table in which the boundary marker is defined.

5.7.2.4. Definition of an Optional Segment Sequence

An optional segment sequence is a record data structure which represents a sequence of one or more optional segments and/or boundary markers. Included in the data structure is an indication of how many times the optional segment sequence may appear within the string being matched.

Record Label: opt_seq

Fields:

5.7.2.4.1. Minimum Occurrence

Optionality: optional

Label: min

Type: integer

Purpose: This defines the minimum number of times that the optional sequence must appear at the corresponding position in the string being matched.

If this field is omitted, the minimum number of times the optional sequence may appear is zero.

5.7.2.4.2. Maximum Occurrence

Optionality: optional

Label: max

Type: integer

Purpose: This defines the maximum number of adjacent appearances of the optional sequence at the corresponding position in the string being matched.

If this field is omitted, the maximum number of times the optional sequence may appear is one.

Setting this field to -1 allows the optional sequence to match a stretch of segments any number of times. An optional sequence whose maximum occurrence is -1, and which contains an empty set of features (a set which matches any segment) will match any string. This will often be useful in morphological rules, where the entire length of a stem must be matched against, but only a small part of that stem need be specified in terms of its segmental content. An example is a prefix that attaches to any stem beginning with a consonant. The required phonetic input would be a single simple context for a consonant, followed by an optional sequence of any number of segments:

(<simp_cntxt class cons> <opt_seq max -1 seq (<simp_cntxt class mt>)>)

(assuming the natural classes "cons" and "mt" had been defined previously).

5.7.2.4.3. Optional Sequence

Optionality: obligatory

Label: seq

Type: list

Contents: One or more names of simple contexts and/or boundary markers.

Purpose: This specifies the series of segments which are optional. The optional sequence is itself a phonetic sequence.

The simple context whose specification is the list (()) represents an optional sequence composed of any segment, and matches either a phonetic segment or a boundary marker. One use of such a sequence is described above under Maximum Occurrence.

5.7.2.5. Definition of Phonetic Sequence

A Phonetic Sequence is a list representing a sequence of zero or more segments and/or boundary markers. Each member of the list is:

a Segment Record (defined above);

a Boundary Marker Record (defined above);

the name of a Simple Context (defined above); or

an Optional Segment Sequence (defined above).

Phonetic sequences are used in the input of morphological rules, the input and output of phonological rules, and in phonetic templates. However, Phonetic Sequences used for certain purposes must not contain Optional Segment Sequences; see section 5.7.2.6, Definition of Variable-Free Phonetic Sequences.

A phonetic sequence which is an empty list matches any single segment or boundary marker. These may be used in phonological rules of epenthesis or deletion, or in phonetic templates which represent variables in morphological rules, but would not normally appear in phonetic templates in the environment of phonological rules. (An empty environment can simply be omitted from a phonological rule.)

Example: The following phonetic sequence:

(<simp_cntxt class high_vowels> <bdry rep "+" ctable table1>

<opt_seq min 0 max 2 seq (<simp_cntxt class cons>

<simp_cntxt class vowel>)>)

might represent a high vowel, followed by a morpheme boundary, followed by between zero and two open syllables.
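
To make the matching behaviour of optional sequences concrete, here is a minimal sketch of a matcher. It is not the morpher's algorithm: the tuple encoding of pattern items, the function names, and the feature names are all hypothetical. Pattern items are ("class", name), ("bdry", rep), or ("opt", min, max, subsequence), with max -1 meaning unbounded; the target is a list of ("seg", features) and ("bdry", rep) pairs.

```python
def match_item(item, target, pos, classes):
    """Return the set of positions where `item` can end when it starts at `pos`."""
    kind = item[0]
    if kind == "opt":
        _, lo, hi, seq = item
        # With max -1, no more repetitions than remaining elements are useful.
        limit = hi if hi != -1 else max(lo, len(target) - pos)
        ends = {pos} if lo == 0 else set()
        current = {pos}
        for count in range(1, limit + 1):
            current = {e for p in current
                       for e in match_ends(seq, target, p, classes)}
            if not current:
                break
            if count >= lo:
                ends |= current
        return ends
    if pos >= len(target):
        return set()
    tkind, tval = target[pos]
    if kind == "bdry":
        return {pos + 1} if tkind == "bdry" and tval == item[1] else set()
    if kind == "class":
        feats = classes[item[1]]
        # An empty feature set matches any segment or boundary marker.
        ok = not feats or (tkind == "seg" and
                           all(tval.get(k) == v for k, v in feats.items()))
        return {pos + 1} if ok else set()
    raise ValueError(f"unknown pattern item: {kind}")


def match_ends(pattern, target, start, classes):
    """Return every position at which `pattern` can end, matching from `start`."""
    positions = {start}
    for item in pattern:
        positions = {e for p in positions
                     for e in match_item(item, target, p, classes)}
        if not positions:
            break
    return positions
```

A pattern covers the whole target when len(target) is among the positions returned by match_ends(pattern, target, 0, classes). Returning a set of end positions, rather than a single one, is what lets an optional sequence be tried at every permitted repetition count.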

5.7.2.6. Definition of Variable-Free Phonetic Sequences

A Variable-Free Phonetic Sequence is defined as a Phonetic Sequence which does not contain any Optional Segment Sequences. Variable-free phonetic sequences are used internally to the morpher to represent sequences of segments in stems and words, and in certain parts of some rules.

5.7.2.7. Definition of Phonetic Template

Functionally, a Phonetic Template is a phonetic sequence augmented with boundary conditions indicating whether it must match against another (variable-free) phonetic sequence beginning at the latter's left end and/or ending at the latter's right end. Structurally, a Phonetic Template is the record structure described below.

Phonetic Templates are used in the left environment, right environment, previous word, and next word fields of phonological rules.

Record Label: ptemp

Fields:

5.7.2.7.1. Initial Boundary Condition

Optionality: optional

Label: init

Type: Boolean

Default: false

Purpose: If this field is true, the phonetic template must match against a phonetic sequence beginning with the left-most segment of the latter, i.e. word-initially.

5.7.2.7.2. Final Boundary Condition

Optionality: optional

Label: fin

Type: Boolean

Default: false

Purpose: If this field is true, the phonetic template must match against a phonetic sequence ending with the right-most segment of the latter, i.e. word-finally.

5.7.2.7.3. Phonetic Sequence

Optionality: obligatory

Label: pseq

Type: phonetic sequence

Purpose: Represents the sequence of segments and/or boundary markers which must match against a phonetic sequence.

5.8. Trace Structures

Trace structures are record structures used to output information resulting from the tracing of one or more phonological and/or morphological rules during analysis (unapplication) or synthesis (application), blocking, or from the tracing of lexical lookup. How much information is available in the trace is a function of the particular algorithm used by the morpher, particularly in the case of tracing of a rule in analysis mode.

The output of the functions morph_and_lookup_word and morph_and_lookup_list changes when tracing of rules, lexical lookup or blocking is turned on: in addition to the usual output structure produced by these commands, a root trace record is produced, which contains the input argument of morph_and_lookup_word, plus any further trace records. The trace record is embedded in a call to the function pretty_print (see chapter four), and is output before the normal command + data, for instance:

(pretty_print <trace ...>) (other commands... <word_analysis...>)

If the analysis results in an error message being output, the trace structure will still be output up to the point at which the error is detected, but the trace structure will be terminated by the error message, rather than by the usual close parenthesis etc.:

(pretty_print <trace... (send_error_msg message)

A Rule Analysis Trace Record is produced for each unapplication of each rule being traced in analysis mode, and a Rule Synthesis Trace Record is produced for each application of each rule being traced in synthesis mode. (Multiple application of a phonological rule to a single form, whether iterative or simultaneous, counts as a single application for the purposes of tracing.) If tracing is turned on for lexical lookup, a Lexical Lookup trace record is produced for each attempted lexical lookup; and if tracing of blocking is turned on, a Blocking Trace record is produced for each storable lexical entry built during the synthesis phase. Finally, a Lexical Entry Trace record is produced for each successfully analyzed word; this record is identical to that produced by the function morph_and_lookup_word when tracing is not turned on. All these records are contained recursively. For instance, if tracing of lexical lookup is turned on, the lexical lookup records will be contained in the cont field of the root trace record, and lexical entry trace records representing successful analyses of those lexical lookups will be contained in the cont field of the lexical lookup structures which resulted in those successful analyses.

The output of the function generate_word is also altered when tracing of rules in synthesis mode or tracing of blocking is turned on. A Successful Lookup Trace record is produced as the root record of the trace, which contains as its Virtual Lexical Entry field the input argument of the function generate_word. In addition, one Rule Synthesis Trace Record is produced for each application of each rule being traced in synthesis mode. If tracing of blocking is turned on, a Blocking trace record is also produced for each storable lexical entry produced. If tracing of strata is turned on, one Strata Analysis and/or Synthesis Trace Record is produced at the beginning and at the end of each stratum (except for the *surface* stratum, for which no trace record is produced). Finally, a lexical entry record is produced for the output word, if such a word is successfully generated; this lexical entry is identical to that produced by the function generate_word when tracing is not turned on.

It is not intended that a trace record be presented directly to the user. Rather, the shell is responsible for presenting the information in some useful form; which form will depend on the capabilities of the display device, the linguistic theory being represented, etc.

5.8.1. Root Trace Record

The root trace record is the 'outer' data structure used to output information when tracing the function morph_and_lookup_word. It gives the phonetic and orthographic form of the input word (as specified in the input token which is the argument of this function), as well as any continuations. A continuation is another trace record, and represents the tracing of rules, lexical lookup, or blocking of the input word.

Record Label: trace

Fields:

5.8.1.1. Orthographic Shape

Optionality: obligatory

This field is identical to the field of the same name in the Lexical Entry data structure.

5.8.1.2. Phonetic Shape

Optionality: obligatory

This field is identical to the field of the same name in the Lexical Entry data structure.

5.8.1.3. Continuations

Optionality: optional

Label: cont

Type: list

Contents: Each member of this list is a rule unapplication, lexical lookup, rule application, or blocking record structure.

Purpose: Each member of this list represents a continuation by traced rules, lexical lookup, or blocking of this input word.

If this field is absent, there were no further continuations by traced rules, etc., nor was the input successfully analyzed.
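
Because trace records nest through their cont fields, a shell can present a trace by walking it recursively. The sketch below assumes, purely for illustration, that each record has been read into a Python dictionary with a "label" key holding its record label and an optional "cont" key holding its continuations; this in-memory form and the function name are hypothetical.

```python
def walk_trace(record, depth=0):
    """Yield (depth, record) for a trace record and all of its continuations.

    Records are visited in pre-order, so each record is followed by the
    continuations that resulted from it. A missing cont field means that
    there were no continuations.
    """
    yield depth, record
    for child in record.get("cont", []):
        yield from walk_trace(child, depth + 1)
```

A shell could, for example, indent each record label by its depth to give the user an outline of the rules that were tried.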

5.8.2. Stratum Analysis Trace Record

The Stratum Analysis Trace Record structure is used to output information resulting from the tracing of strata during analysis. One such record is produced at the beginning of a user-defined stratum, and another at the end of the stratum, after all rules of the stratum have been unapplied and lexical lookup has been done; the two records are marked to distinguish them. In addition, a single Stratum Trace Record is produced for the *surface* stratum; this is treated as the output of that stratum (there is no Stratum Trace Record for the input of the *surface* stratum).

Note that the phonetic shape shown in a trace at the output of one stratum may be different from the phonetic shape shown at the input of the next stratum, since the two strata may use different character sets.

Record Label: sua

Fields:

5.8.2.1. Stratum Name

Optionality: obligatory

Label: nm

Type: atom

Purpose: This gives the name of the stratum whose input or output this record represents.

5.8.2.2. Input vs. Output

Optionality: obligatory

Label: io

Type: atom, either in or out

Purpose: This field tells whether this trace record represents the input to the stratum (i.e. the lexical form before any rules of the stratum have been unapplied) or the output.

5.8.2.3. Lexical Form

Optionality: obligatory

Label: lex

Type: lexical entry record

Purpose: This field represents the virtual lexical entry which was the input or output (in the analysis sense) of the stratum. This lexical entry may be only partially instantiated.

See also comments under Rule Analysis Trace Record--Input (section 5.8.3.2) concerning optional or ambiguous segments.

5.8.2.4. Continuations

Optionality: optional

Label: cont

Type: list

Contents: Each member of this list is a stratum unapplication, rule unapplication, lexical lookup, or blocking record.

Purpose: Each such member represents a continuation by traced rules, lexical lookup, or traced blocking of the form shown, resulting from this form.

If this field is absent, there were no continuations. This will happen if the trace record represents the output of the deepest stratum, and there were no successful lexical lookups.

5.8.3. Phonological Rule Analysis Trace Record

The Phonological Rule Analysis Trace Record structure is used to output information resulting from the tracing of one or more phonological rules during analysis.

Record Label: pua

Fields:

5.8.3.1. Rule Name

Optionality: obligatory

Label: nm

Type: atom

Purpose: This gives the name of the traced rule whose attempted application this record represents.

5.8.3.2. Rule Input

Optionality: obligatory

Label: in

Type: lexical entry record

Purpose: This field represents the virtual lexical entry which was the input (in the analysis sense) of the rule being traced. Depending on the algorithm used, only certain fields of the lexical entry will be instantiated, and some instantiated fields may be only partially instantiated (see Morphological Rule Analysis Trace Record, section 5.8.4).

Hermit Crab uses a limited form of regular expressions to encode information which is ambiguous in the shape field of virtual lexical entries. If a segment in the shape field has been unepenthesized or undeleted by some rule, it will be marked as optional, meaning that its presence cannot be determined until lexical lookup. This optionality will be encoded by bracketing the segment with an ASCII 2 (STX) to the left and an ASCII 3 (ETX) to the right (see Translation from Phonetic Sequence to Regular Expression, section 4.1.1.2). If a feature bundle is ambiguous between two or more segments, these segments will be separated by an ASCII 29 (GS) and the set of segments bracketed with ASCII 28 (FS) to the left and ASCII 30 (RS) to the right. If a feature bundle is both optional and ambiguous, the optionality brackets (STX and ETX) are outermost.
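
A shell that wants to display or test such a shape string could convert the control characters into a conventional regular expression. The following is a sketch only, using Python's re module; the function name is hypothetical, and the details of Hermit Crab's own translation (section 4.1.1.2) may differ.

```python
import re

# Control characters used in the shape field, by ASCII code.
STX, ETX = "\x02", "\x03"               # brackets around an optional segment
FS, GS, RS = "\x1c", "\x1d", "\x1e"     # open, separator, close of alternatives


def shape_to_regex(shape):
    """Translate a shape string with control characters into a Python regex."""
    pieces = []
    for ch in shape:
        if ch == STX or ch == FS:
            pieces.append("(?:")        # open a non-capturing group
        elif ch == ETX:
            pieces.append(")?")         # close the group and make it optional
        elif ch == GS:
            pieces.append("|")          # separate alternative segments
        elif ch == RS:
            pieces.append(")")          # close the group of alternatives
        else:
            pieces.append(re.escape(ch))
    return "".join(pieces)
```

For example, the shape "k", optional "a", "t", then a bundle ambiguous between "s" and "z" becomes k(?:a)?t(?:s|z), which matches "kats" and "ktz" but not "kat".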

Note: This field is not output if the value of the global variable *trace_inputs* is false (the default value of this variable is true).

5.8.3.3. Rule Output

Optionality: optional

Label: out

Type: lexical entry record

Purpose: This field represents the virtual lexical entry which was the output (in the analysis sense) of the application of the rule being traced. Like the input field, this lexical entry may be only partially instantiated.

See also comments under Rule Input (section 5.8.3.2) concerning optional or ambiguous segments.

Implementation note: If a rule does not alter its input (i.e. it fails to apply), an implementation may substitute the atom ‘*NA*’ for a lexical entry record.

5.8.4. Morphological Rule Analysis Trace Record

The Morphological Rule Analysis Trace Record structure is used to output information resulting from the tracing of one or more morphological rules during analysis.

What counts as an attempted application of a rule will depend on the specific algorithm for selecting candidate rules. For instance, consider a morpher processing English which has encountered the word fasters (i.e. people who fast). After stripping the –s suffix, the morpher presumably knows that faster may be either a noun or a verb, but not an adjective or adverb. If the morpher uses this information as a guide to selecting candidate suffix rules, it will not even try applying the –er comparative rule. On the other hand, if the morpher did not use the part of speech information to select candidate rules, it might try applying the –er rule, although that rule would ultimately fail because of the conflicting requirements for part of speech.

Note that we do not distinguish between the application of realizational rules and "ordinary" morphological rules.

Record Label: mua

Fields:

5.8.4.1. Rule Name

Optionality: obligatory

Label: nm

Type: atom

Purpose: This gives the name of the traced rule whose attempted application this record represents.

5.8.4.2. Rule Input

Optionality: obligatory

Label: in

Type: lexical entry record

Purpose: This field represents the virtual lexical entry which was the input (in the analysis sense) of the rule being traced. Depending on the algorithm used, only certain fields of the lexical entry will be instantiated, and some instantiated fields may be only partially instantiated. For instance, if a morphological rule produces [+subjunctive] verbs, it is possible to deduce that the input of this rule (if it is derivable at all) must have the feature [+subjunctive], and a part of speech of verb. However, the implementation may not instantiate all of this information during analysis.

See Phonological Rule Analysis Trace Record (section 5.8.3) for the use of regular expressions to encode information which is ambiguous in the shape field of virtual lexical entries.

Note: This field is not output if the value of the global variable *trace_inputs* is false (the default value of this variable is true).

5.8.4.3. Rule Output

Optionality: optional

Label: out

Type: lexical entry record

Purpose: This field represents the virtual lexical entry which was the output (in the analysis sense) of the application of the rule being traced. Like the input field, this lexical entry may be only partially instantiated.

If this field is absent, the attempted application of the morphological rule being traced failed.

See also comments under Rule Input (section 5.8.4.2) concerning optional or ambiguous segments.

Implementation note: It would be desirable to show why a morphological rule failed to apply. There is no provision for this at present, but this may be added in the future.

5.8.4.4. Continuations

Optionality: optional

Label: cont

Type: list

Contents: Each member of this list is a rule unapplication, lexical lookup, or blocking record.

Purpose: Each such member represents a continuation by traced rules, lexical lookup, strata, or traced blocking of the result of unapplying this rule.

If this field is absent, there were no continuations. (This field will always be absent if the Output field is empty.)
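Illustration (not part of the specification): one possible in-memory form of the mua record, written in Python. The class name, the attribute name in_ (used because "in" is a Python keyword), and the use of None and the empty list to stand for absent fields are all assumptions of this sketch.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class MorphRuleAnalysisTrace:
    """Hypothetical in-memory form of a 'mua' record."""
    nm: str                        # rule name (obligatory)
    in_: Optional[Any] = None      # 'in' field; None when *trace_inputs* is false
    out: Optional[Any] = None      # 'out' field; None means the attempt failed
    cont: List[Any] = field(default_factory=list)  # empty list stands for "absent"

    def __post_init__(self):
        # The text states that cont is always absent when there is no output.
        if self.out is None and self.cont:
            raise ValueError("a failed rule unapplication cannot have continuations")

    def failed(self) -> bool:
        """True if the attempted unapplication of the rule failed."""
        return self.out is None
```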

5.8.5. Lexical Lookup Record

The lexical lookup record shows what storable lexical entries the morpher found during analysis, and what real lexical entries matched those storable lexical entries.

Record Label: ll

Fields:

5.8.5.1. Virtual Lexical Entry

Optionality: obligatory

Label: v

Type: lexical entry

Purpose: This field represents a virtual storable lexical entry which the morpher constructed during the analysis phase, and then attempted to look up in the lexicon. Only those fields of the lexical entry which the morpher instantiates during analysis will be instantiated, and those only partially.

See comments under Phonological Rule Analysis Trace Record--Rule Input (section 5.8.3.2) concerning optional or ambiguous segments.

5.8.5.2. Continuations

Optionality: optional

Label: cont

Type: list

Contents: one or more successful lookup structures (defined below)

Purpose: This field lists the real lexical entries which the morpher found in the lexicon and which matched against the virtual lexical entry, together with their continuations.

If this field is absent, no real lexical entries matched the virtual lexical entry.

5.8.5.2.1. Successful Lookup Structures

This record contains a real lexical entry found during traced lexical lookup resulting from the application of the function morph_and_lookup_word, together with any traces continuing from it.

A successful lookup structure also serves as the root structure when tracing the function generate_word; the lexical entry which is the argument of that function serves as the contents of the Real field of this record. (Note that this may not be an actual lexical entry in the lexicon, if the user has supplied a lexical entry as the argument of generate_word.)

Record Label: sll

Fields:

5.8.5.2.1.1. Real Lexical Entry

Optionality: obligatory

Label: real

Type: lexical entry

Purpose: This is either a real lexical entry found during the tracing of the function morph_and_lookup_word, or the argument of the function generate_word.

5.8.5.2.1.2. Realizational Features

Optionality: obligatory

Label: rf

Type: list-valued features list

Purpose: This field tells what morphosyntactic features are to be realized.

5.8.5.2.1.3. Continuations

Optionality: optional

Label: cont

Type: list

Contents: Each member of this list is a rule application, or blocking record structure, or else the atom ‘duplicate_analysis’. (A duplicate analysis is one which represents an identical successful lexical lookup with identical morphological rule applications to one found elsewhere in the analysis. Such duplication can happen because of ambiguities in the unapplication of phonological rules.)

Purpose: Each member of this list represents a continuation by rules being traced during synthesis, tracing of blocking, or tracing of strata during synthesis, of this lexical entry. (Any such continuations will be more 'surfacy' than the form represented by this record.)

If this field is absent, there were no further continuations (no traced rules applied during synthesis, and no blocking lexical entries were found if tracing of blocking is turned on).

5.8.6. Stratum Synthesis Trace Record

The Stratum Synthesis Trace Record structure is used to output information resulting from the tracing of strata during synthesis. One such record is produced at the beginning of a stratum, and another at the end of the stratum, after all rules of the stratum have been applied; the two records are marked to distinguish them.

The form of a Stratum Synthesis Trace Record is the same as that of a Stratum Analysis Trace Record, except for the record label.

Record Label: sa

Fields: All fields are the same as those of the Stratum Analysis Trace Record, except that there is no explicit continuation field. The trace record immediately following the Stratum Synthesis Trace Record, if any, will be its continuation. A Stratum Synthesis Trace Record may fail to have such a continuation if a morphological rule fails to apply; and the output Stratum Synthesis Trace Record of the shallowest user-defined stratum will not be followed by a Surface Analysis Record if the phonetic shape of the output lexical entry does not match the input word.

The lexical entry of the lex_form record will be at least as fully instantiated as the lexical entry taken from the dictionary.

The Stratum Synthesis Trace Record for the ‘*surface*’ stratum is special. It will of course have no phonological or morphological rule applications. Its input field represents the final form of the lexical entry generated by applying all the rules of the preceding stratum, but in the encoding of the *surface* stratum. Its output field, if present, indicates that the derived lexical entry passes all the final tests (specifically, for the output of the command morph_and_lookup_word, the phonetic form of the derived lexical entry matches that of the original input word). If the output field is not present, the derived lexical entry did not pass the final tests.

5.8.7. Template Analysis Trace Record

The Template Analysis Trace Record structure is used to output information resulting from the tracing of templates during analysis. One such record is produced at the beginning of a template, and another at the end of the template, after all slots of the template have been unapplied; the two records are marked to distinguish them.

For any one input Template Analysis Trace, there may be several output Template Analysis Traces. This is because one such trace record is produced every time a slot is unapplied. (If morphological rules are being traced during analysis, the Template output trace records will appear in the continuations field of the trace of each slot’s rules; if morphological rules are not being traced, the Template output trace records will appear in the continuations field of the input record.)

Record Label: tua

Fields:

5.8.7.1. Template Name

Optionality: obligatory

Label: nm

Type: atom

Purpose: This gives the name of the template whose input or output this record represents.

5.8.7.2. Input vs. Output

Optionality: obligatory

Label: io

Type: atom, either in or out

Purpose: This field tells whether this trace record represents the input to the template (i.e. the lexical form before any slots of the template have been unapplied) or the output.

5.8.7.3. Realizational Features

Optionality: obligatory

Label: rf

Type: list-valued features list

Purpose: This field tells what realizational features have been discovered (during analysis).

5.8.7.4. Lexical Form

Optionality: obligatory

Label: lex

Type: lexical entry record

Purpose: This field represents the lexical entry which was the input or output of the template. This lexical entry may be only partially instantiated.

5.8.7.5. Continuations

Optionality: optional

Label: cont

Type: list

Contents: Each member of this list is another trace record.

Purpose: Each such member represents a continuation from this form.

If this field is absent, there were no continuations.

5.8.8. Template Synthesis Trace Record

The Template Synthesis Trace Record structure is used to output information resulting from the tracing of templates during synthesis. One such record is produced at the beginning of a template, and another at the end of the template, after all slots of the template have been applied; the two records are marked to distinguish them.

Record Label: ta

Fields:

5.8.8.1. Template Name

Optionality: obligatory

Label: nm

Type: atom

Purpose: This gives the name of the template whose input or output this record represents.

5.8.8.2. Input vs. Output

Optionality: obligatory

Label: io

Type: atom, either in or out

Purpose: This field tells whether this trace record represents the input to the template (i.e. the lexical form before any slots of the template have been applied) or the output.

5.8.8.3. Lexical Form

Optionality: obligatory

Label: lex

Type: lexical entry record

Purpose: This field represents the lexical entry which was the input or output of the template. This lexical entry may be only partially instantiated during analysis.

5.8.9. Rule Synthesis Trace Record

The Rule Synthesis Trace Record is used to output information resulting from the tracing of a phonological or morphological rule during synthesis of surface lexical entries from Real Lexical Entries (or from the lexical entry argument to the function generate_word).

There are two kinds of Rule Synthesis Trace Record, depending on whether the rule being traced is a morphological or a phonological rule. They differ only in the Record Label (ma or pa).

A Rule Synthesis Trace Record does not have an explicit continuation; the trace record immediately following it, if any, is its continuation.

Record Label: ma (for a morphological rule) or pa (for a phonological rule)

Fields:

5.8.9.1. Rule Name

Optionality: obligatory

Label: nm

Type: atom

Purpose: This gives the name of the traced rule whose attempted application this record represents.

5.8.9.2. Rule Input

Optionality: obligatory

Label: in

Type: lexical entry record

Purpose: This field represents the virtual lexical entry which was the input (in the synthesis sense) of the rule being traced.

Note: This field is not output if the value of the global variable *trace_inputs* is false (the default value of this variable is true).

5.8.9.3. Rule Output

Optionality: optional

Label: out

Type: lexical entry record

Purpose: This field represents the virtual lexical entry which was the output (in the synthesis sense) of the application of the rule being traced.

If this field is absent, the attempted application of the morphological rule being traced failed. (If the rule being traced is a phonological rule, this field will always be present, since a phonological rule cannot fail, although it may leave its input unchanged.)

Implementation note: If a rule does not alter its input (i.e. it fails to apply), an implementation may substitute the atom ‘*NA*’ for a lexical entry record. It would be desirable to show why the rule failed to apply. There is no provision for this at present, but this may be added in the future.

5.8.10. Blocking Records

Blocking data structures have two uses:

A Blocking data structure is used to show when the output of a morphological rule is blocked by a storable lexical entry representing an irregular form listed in the lexicon. For instance, if the morpher were analyzing the (ungrammatical) form runned, the morpher might succeed in morphing this as the verb stem run + the past tense suffix –ed, but this would be blocked by the irregular past tense form ran, listed in the lexicon. If tracing of blocked lexical entries is turned on (see the function trace_blocking, section 6.6.8), when runned is analyzed a blocking record will be produced with the virtual lexical entry for runned. If the rule is traced, the rule’s trace will show the real lexical entry for ran.

A Blocking data structure is also used to show when a derived lexical entry is replaced by a real (stored) lexical entry prior to the application of an inflectional template. Consider again the example of the previous paragraph, but assuming past tense affixation was handled by an inflectional template. Prior to the application of the template, the relative lexical entries of the stem run would be checked to see if any were identical to the lexical entry for run except for bearing the realizational feature [past tense]. Assuming ran was such a relative lexical entry, it would be substituted for run.

When tracing of blocking is turned on, a blocking record is produced for each application (but not unapplication) of a morphological rule whose output is blocked. The blocking record is output immediately after the rule application’s output field, and shows the storable lexical entry which blocks the rule’s normal output. No blocking record is produced when a morphological rule’s output is not blocked.

A blocking record is also produced when one stem is substituted for another immediately prior to the application of a template.

Record Label: block

Fields:

5.8.10.1. Type

Optionality: obligatory

Label: type

Type: atom: ‘rule’ or ‘template’

Purpose: To identify the type of blocking.

Contents: If a rule is being blocked by a stored lexical entry, this is signaled by the word ‘rule’ in this field; or if a derived lexical entry is being replaced by a stored lexical entry just prior to the application of a template, the word ‘template’ appears.

5.8.10.2. Blocking Lexical Entry

Optionality: obligatory

Label: bl

Type: lexical entry structure

Contents: The storable lexical entry which blocks the virtual lexical entry output by the rule.

6. Command Language Functions and Variables

Message numbers for the morpher module begin with hc6000.

Error message hc6000 is the message "Morpher error: Unknown error", and is used for instance when a command expects one type of argument but is given a different sort of argument.

Error message hc6058 is "Morpher error: Incomplete command: <cmd>.", where <cmd> is the command. (This may be triggered e.g. if end-of-file is reached before a command is complete.)

6.1. Variable Functions

6.1.1. morpher_set

Summary: Set a morpher variable to a given value.

Argument: list of length two:

var-name (obligatory): a variable name (atom; the variable names used in the morpher are given in the sections below)

var-value (obligatory): the value to which the variable should be set (Boolean, atom, list, or record structure)

Purpose: To set morpher variables to non-default values.

Normal output: Message hc6502 "Morpher: Variable <var_name> set."

Abnormal output: hc6001 "Morpher error: unknown morpher variable: <var_name>."

hc6002 "Morpher error: inappropriate value <var_value> for morpher variable <var_name>." (The value to be assigned is of an inappropriate type; the variable retains its original value.)

hc6032 "Morpher error: inappropriate argument <arg> to morpher_set." (The command morpher_set was called with <arg>, but this is not a list of length two.)

Error messages specific to the individual variables are listed with those variables.

Warning: All assignments to a variable wipe out previous values of that variable.

The morpher will silently ignore any unknown fields in arguments.

See also: morpher_show (section 6.1.2)
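Illustration (not part of the specification): the argument checking described for morpher_set can be sketched in Python as follows. The table KNOWN_VARIABLES is a hypothetical two-entry subset, the types assigned to the variables are assumptions, and the order in which the checks are made is also an assumption; only the message numbers and texts are taken from this section.

```python
# Hypothetical subset of morpher variables and the value types assumed for them.
KNOWN_VARIABLES = {
    "*trace_inputs*": bool,
    "*pfeatures*": list,
}


def morpher_set(variables, arg):
    """Return a (message number, message text) pair; update `variables` on success."""
    if not isinstance(arg, (list, tuple)) or len(arg) != 2:
        return ("hc6032",
                f"Morpher error: inappropriate argument {arg} to morpher_set.")
    name, value = arg
    if name not in KNOWN_VARIABLES:
        return ("hc6001", f"Morpher error: unknown morpher variable: {name}.")
    if not isinstance(value, KNOWN_VARIABLES[name]):
        # The variable retains its original value.
        return ("hc6002",
                f"Morpher error: inappropriate value {value} "
                f"for morpher variable {name}.")
    variables[name] = value  # any previous value is wiped out
    return ("hc6502", f"Morpher: Variable {name} set.")
```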

6.1.2. morpher_show

Summary: Shows the value of a morpher variable.

Argument: var-name (obligatory): a variable name (atom)

Purpose: To determine the value of a morpher variable.

Normal output: message hc6503 "Morpher: Variable <var_name> has the value <var_value>.", where <var_value> is the value of <var_name>.

Abnormal output:

hc6003 "Morpher error: unknown morpher variable: <var_name>."

See also: morpher_set (section 6.1.1)

6.1.3. load_char_def_table

Summary: Reads in a character definition table. Any new table with the same name as an old table replaces that old table.

Argument: char_table (obligatory): character definition table structure

Purpose: To allow the user to change definitions of characters.

Normal output: If the new table replaces an old one, message hc6520: "Morpher: Old character definition table <table_name> replaced by new table.", where <table_name> is the name (atom) of the table. Otherwise message hc6501 "Morpher: New character definition table <table_name> loaded."

Abnormal output:

hc6020 "Morpher error: Character definition table cannot be loaded, because variable *pfeatures* has not been set." (The variable *pfeatures* must be set before a character definition table can be loaded.)

hc6010 "Morpher error: Unknown feature <fname> <fvalue> used in character definition table <table_name>." (The feature name/ value combination is not listed in the variable *pfeatures*.)

Warnings: The variable *pfeatures* must be set before any character definition tables can be loaded.

In the case of multiple definitions for a single character sequence, the last definition read in supersedes earlier ones. This situation should not arise.

See also: Character Definition Table (section 5.4); show_char_def_table (section 6.1.4); del_char_def_table (section 6.1.5)

6.1.4. show_char_def_table

Summary: Outputs a character definition table.

Argument: name (atom)

Purpose: debugging

Normal output: A character definition table.

Abnormal output:

hc6004 "Morpher error: Character definition table <name> is unknown." (No character definition table by that name has been loaded.)

Warning: There is no guarantee that the segment and boundary definitions in the output table will be in the same order as the input definitions, nor that the features in each segment definition will be in the original order.

6.1.5. del_char_def_table

Summary: Removes a character definition table.

Argument: name (atom)

Purpose: To allow the user to remove a character definition table.

Normal output: hc6509: "Morpher: Character definition table <table_name> removed."

Abnormal output: hc6004 "Morpher error: Character definition table <name> is unknown." (No character definition table by that name has been loaded.)

Warnings: There is no undelete function.

See also: load_char_def_table (section 6.1.3)

6.1.6. load_nat_class

Summary: Reads in a natural class definition. If the new natural class has the same name as an old class, it replaces that old definition.

Argument: natural class (obligatory): natural class record

Purpose: To allow the user to change definitions of natural classes.

Normal output: If the new definition replaces an old one, message hc6540: "Morpher: Old natural class definition <nat_class_name> replaced by new definition.", where <nat_class_name> is the name of the natural class definition. Otherwise message hc6541 "Morpher: New natural class definition <nat_class_name> loaded."

Abnormal output:

hc6039 "Morpher error: Natural class definition cannot be loaded, because variable *pfeatures* has not been set." (The variable *pfeatures* must be set before a natural class definition can be loaded.)

hc6014 "Morpher error: Unknown feature <fname> <fvalue> used in natural class definition <nat_class_name>." (The feature name/ value combination is not listed in the variable *pfeatures*.)

Warnings: The variable *pfeatures* must be set before any definitions of natural classes can be loaded.

The definition of a natural class must be loaded before its name is mentioned in a simple context of a rule.

6.1.7. show_nat_class

Summary: Outputs the definition of a natural class.

Argument: name (atom)

Purpose: debugging

Normal output: The definition of a natural class.

Abnormal output:

hc6041 "Morpher error: Natural class <name> is unknown." (No natural class by that name has been loaded.)

6.1.8. remove_nat_class

Summary: Removes the definition of a natural class.

Argument: name (obligatory): atom

Purpose: Removes the named natural class definition from the morpher, without substituting another definition of the same name.

The removed definition cannot be used by any rule which referred to it.

Normal output: Message hc6542 "Morpher: Natural class definition <nat_class_name> removed."

Abnormal output:

hc6041 "Morpher error: Natural class definition <name> is unknown." (No definition by that name has been loaded.)

Warnings: The morpher does not provide an "undo" to restore the definition after it has been deleted.

Hermit Crab does not check whether a natural class which is about to be removed is still used in some rule.

See also: load_nat_class (section 6.1.6)

6.1.9. *pfeatures*

Summary: The *pfeatures* variable lists all the phonetic feature names, together with their possible values.

Default: There is no default.

Possible values: A list-valued feature list. Each odd-numbered member is a feature name, and each even-numbered member is a list of possible values for that feature name.

When this variable is changed, any already loaded character definition tables, lexical entries, or rules, are invalidated (unloaded).

Warning: A list of possible values of length one would not correspond to linguistic reality in most theories, and should therefore be avoided. Hermit Crab does not check for this.

There will probably be implementation-specific limits on the number of possible feature/value combinations.
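Illustration (not part of the specification): the alternating layout of a list-valued feature list (feature name, then list of values) can be unpacked as in the following Python sketch. The function names and the use of a Python list to hold the value of *pfeatures* are assumptions of the example; the feature names in the usage note are invented.

```python
def parse_pfeatures(flat):
    """Turn [name1, values1, name2, values2, ...] into {name: [values]}."""
    if len(flat) % 2 != 0:
        raise ValueError("feature list must alternate names and value lists")
    table = {}
    # Odd-numbered members (1st, 3rd, ...) are names; even-numbered are value lists.
    for name, values in zip(flat[0::2], flat[1::2]):
        table[name] = list(values)
    return table


def is_known(table, fname, fvalue):
    """True if the feature name/value combination is listed in the table."""
    return fvalue in table.get(fname, ())
```

With the invented value ["voiced", ["+", "-"], "place", ["labial", "coronal"]], the pair (voiced, +) is known and (voiced, 0) is not; the latter is the kind of combination reported by errors such as hc6010 and hc6014.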

6.1.10. *cur_lang*

Summary: Setting the *cur_lang* variable changes the language whose phonology and morphology are being modeled.

Default: *unknown*

Possible values: A string (the name of a language).

When this variable is changed, all subsequent commands refer to that language until this variable is set again, or until the command open_language is issued. Information associated with a language (rules, lexicon, variable settings etc.) remains in existence during a session, until the command close_language is issued for the named language.

Abnormal output: hc6023 "Morpher error: Unknown language <lname>." (The specified language has not already been opened by the command open_language; the current language remains whatever it was before.)

Warning: Attempting to perform any commands except setting or showing this variable, opening a language, redirecting input or output, or terminating the morpher when this variable has the default value *unknown* will result in error message hc6036 "Morpher error: There is no current language."

6.1.11. assign_default_morpher_feature_value

Summary: Assigns a default feature value to a feature name.

Argument: list:

feature-name (obligatory): the name (atom) of a head or foot feature

default-value (obligatory): the default value(s) of the feature (list)

Purpose: By default, the morpher assigns a default value of () (the empty set) to all morphosyntactic feature names. This default value is used as the value of the feature in the absence of a specified value in a Lexical Entry data structure. This function allows the user to assign a different default value to a feature name in the morpher.

Normal output: Message hc6507 "Morpher: Feature <feature_name> assigned the default value <default_value>."

Abnormal output: There is no function-specific error checking.

Warnings: The morpher does not check that the feature-name is used anywhere in the grammar. Hermit Crab also does not check that the same default values are assigned to a given feature name in the morpher and parser.

Example: (assign_default_morpher_feature_value (person (3)))

See also: show_default_morpher_feature_value (section 6.6.11)
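Illustration (not part of the specification): the effect of a default feature value on lookup can be sketched in Python as follows. The names DEFAULTS, assign_default, and feature_value are hypothetical, as is the representation of a lexical entry's features as a dictionary.

```python
DEFAULTS = {}  # feature name -> default value list (hypothetical store)


def assign_default(feature_name, default_value):
    """Record a default value list for a feature name."""
    DEFAULTS[feature_name] = list(default_value)


def feature_value(entry_features, feature_name):
    """Value of a feature for a lexical entry, falling back to the default.

    entry_features: dict of the features explicitly specified on the entry.
    A feature with no assigned default falls back to the empty list,
    mirroring the default value () described above.
    """
    if feature_name in entry_features:
        return entry_features[feature_name]
    return DEFAULTS.get(feature_name, [])
```

After assign_default("person", [3]), an entry that does not specify person is treated as having the value [3], while an entry specifying person as [1] keeps that value.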

6.1.12. set_stratum

Summary: Sets the properties of a named stratum.

Argument: a Stratum Property Setting record.

Purpose: To assign a character definition table for a stratum, set a stratum to cyclic or noncyclic application, determine whether the phonological or morphological rules of a stratum are ordered, or to set the inflectional templates for a stratum.

Normal output: If the property being set is the stratum’s character definition table, message hc6550: "Morpher: Character definition table for stratum <sname> set to table <ctable_name>."

If the property being set is the stratum’s cyclicity, and the value is true, message hc6551 "Morpher: Stratum <sname> set to cyclic rule application."; or if the value of the cyclicity argument is false, message hc6552 "Morpher: Stratum <sname> set to noncyclic rule application."

If the property being set is the ordering of phonological rules, either message hc6553 "Morpher: Phonological rule ordering for stratum <sname> set to simultaneous." or hc6555 "Morpher: Phonological rule ordering for stratum <sname> set to linear."; or if the property is the ordering of morphological rules, either message hc6554 "Morpher: Morphological rule ordering for stratum <sname> set to unordered." or hc6556 "Morpher: Morphological rule ordering for stratum <sname> set to linear."

If the property being set is the Affix Templates of the Stratum, message hc6565 "Morpher: The affix templates for stratum <sname> have been loaded."

Abnormal output: For all properties, message hc6021 "Morpher error: Unknown stratum <sname>."

If the property is the stratum’s character definition table, message hc6004 "Morpher error: Character definition table <table_name> is unknown."

If the property specified in the Stratum Property Setting record is not ctable, cyclic, prule, mrule, or template, or the value specified is not appropriate for the property, hc6034 "Morpher error: Attempt to assign inappropriate value <value> or inappropriate property <property> to stratum <sname>."

Warnings: Resetting (even vacuously) the character definition table assigned to any stratum will result in all lexical entries assigned to all strata being invalidated. (The reason for resetting the lexicons for all strata, not just the stratum being changed, is that a file of lexical entries may contain entries for more than one stratum, and it will therefore not be sufficient to simply reload the file, since that might result in duplicate lexical entries in other strata.)

Hermit Crab does not check that all the rules named in the Affix Templates are rules which have already been loaded. However, if a rule which is named in a Template has not been loaded by the time parsing or generation takes place, and a word whose part of speech appears in the parts of speech field of the Template is being derived (generated), an error will occur.

Hermit Crab intentionally does not check for appearance of a rule name more than once in a given Template. The assumption is that the same affix might apply in more than one slot, e.g. if the same affix is used to mark a given person of both subject and object, but the subject and object affixes appear in different slots.

Hermit Crab also does not check that all the parts of speech of the Affix Template are actually the part of speech of some lexical entry (or the output of some morphological rule). However, this should not result in an error message during parsing. Nor does it perform any check on the subcategorized rules in the Template’s Required Subcategorized Rules field.

6.1.13. show_stratum

Summary: Show the properties of a named stratum.

Argument: An atom, the name of a stratum.

Purpose: To display certain properties of a stratum, namely its character definition table, its cyclicity setting, the ordering of its phonological or morphological rules, and its affix template.

Normal output: One of the following messages:

hc6557 "Morpher: For stratum <sname>, the character table is <table_name>, the stratum is cyclic, phonological rules are simultaneously ordered, morphological rules are unordered, and the affix templates are <templates>."

hc6558 "Morpher: For stratum <sname>, the character table is <table_name>, the stratum is noncyclic, phonological rules are simultaneously ordered, morphological rules are unordered, and the affix templates are <templates>."

hc6559 "Morpher: For stratum <sname>, the character table is <table_name>, the stratum is cyclic, phonological rules are linearly ordered, morphological rules are unordered, and the affix templates are <templates>."

hc6560 "Morpher: For stratum <sname>, the character table is <table_name>, the stratum is noncyclic, phonological rules are linearly ordered, morphological rules are unordered, and the affix templates are <templates>."

hc6561 "Morpher: For stratum <sname>, the character table is <table_name>, the stratum is cyclic, phonological rules are simultaneously ordered, morphological rules are linearly ordered, and the affix templates are <templates>."

hc6562 "Morpher: For stratum <sname>, the character table is <table_name>, the stratum is noncyclic, phonological rules are simultaneously ordered, morphological rules are linearly ordered, and the affix templates are <templates>."

hc6563 "Morpher: For stratum <sname>, the character table is <table_name>, the stratum is cyclic, phonological and morphological rules are linearly ordered, and the affix templates are <templates>."

hc6564 "Morpher: For stratum <sname>, the character table is <table_name>, the stratum is noncyclic, phonological and morphological rules are linearly ordered, and the affix templates are <templates>."

All values will be instantiated, including any default values not explicitly set by the user. (But see below concerning error message hc6048.)
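Illustration (not part of the specification): the eight messages above differ only in three two-way settings, and their numbers happen to follow a regular pattern. The Python sketch below exploits that pattern; the function name and the Boolean parameters are assumptions of the example, and the arithmetic is an observation about the numbers listed here, not a rule stated by the specification.

```python
def show_stratum_message_number(cyclic, prules_linear, mrules_linear):
    """Select the show_stratum message number for a stratum's settings.

    cyclic:        True for cyclic, False for noncyclic rule application.
    prules_linear: True if phonological rules are linearly ordered,
                   False if simultaneously ordered.
    mrules_linear: True if morphological rules are linearly ordered,
                   False if unordered.
    """
    number = 6557
    if not cyclic:
        number += 1
    if prules_linear:
        number += 2
    if mrules_linear:
        number += 4
    return f"hc{number}"
```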

Abnormal output: Message hc6021 "Morpher error: Unknown stratum <sname>."

If the character definition table for a stratum has not been set, hc6048 "Morpher error: Stratum <sname> has not been assigned a character definition table.".

See also: show_stratum_prules (section 6.2.4), show_stratum_mrules (section 6.2.5)

6.2. Rule Loading Functions and Variables

The editing of rules is to be provided by the shell; no rule editing facilities, only rule loading functions, are provided in the morpher itself. The assumption underlying this is that the format of the rules visible to the user will be determined by the linguistic theory which the system is emulating, and will be different from Hermit Crab's internal format.

Similarly, no facility for permanently storing rules is provided. The shell is responsible for tracking changed rules (alternatively, the current version of a rule may be obtained from the morpher using the function show_active_phon_rules or show_active_morph_rules), and saving such changes as necessary.

Finally, it is assumed that the form of the rules being loaded is basically correct, e.g. that the phonetic output of a morphological rule does not refer to a nonexistent element of its phonetic input.

6.2.1. load_morpher_rule

Summary: Loads a single rule into the morpher's rulebase.

Argument: rule (obligatory): a morphological or phonological rule structure (defined below)

Purpose: Loads a single morphological or phonological rule from the shell into the morpher, replacing any existing rule of the same name. A new rule is appended to the end of the list of phonological or morphological rules (the value of the variable *prules* or *mrules*); a rule which replaces an existing rule of the same type is retained in the same position in the list of phonological or morphological rules. However, Realizational Morphological Rules are only loaded, not added to the list of morphological rules; in order to be used after being loaded, they must be added to an Affix Template (see set_stratum, section 6.1.12).

Normal output: If the new rule replaces an old rule, message hc6504 "Morpher: Rule <rname> replaced by new rule.", where <rname> is the rname field of the rule. If the new rule does not replace an old rule, message hc6505 "Morpher: New morpher rule <rname> loaded." (If a phonological rule replaces a morphological rule, or vice versa, the newly loaded rule will be treated as a new rule for purposes of rule ordering, since phonological and morphological rules are kept in separate lists for purposes of ordering. However, it is unwise to do such cross-type rule replacement.)
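The ordering behavior described above (append a new rule, keep the position of a same-type replacement, treat a cross-type replacement as new) can be illustrated with a minimal Python sketch. The class RuleBase, the kind labels "phon" and "morph", and the body argument are hypothetical names chosen for illustration; Realizational Morphological Rules and all error checking are left out.

```python
class RuleBase:
    """Toy model of the morpher's rulebase and its two ordered name lists."""

    def __init__(self):
        self.rules = {}                          # rule name -> (kind, body)
        self.order = {"phon": [], "morph": []}   # stand-ins for *prules*, *mrules*

    def load_morpher_rule(self, name, kind, body):
        old = self.rules.get(name)
        self.rules[name] = (kind, body)
        if old is None:
            # Brand-new rule: append to the end of the list for its type.
            self.order[kind].append(name)
            return f"hc6505 Morpher: New morpher rule {name} loaded."
        if old[0] != kind:
            # Cross-type replacement: ordered as if the rule were new.
            self.order[old[0]].remove(name)
            self.order[kind].append(name)
        # Same-type replacement leaves the name where it already was.
        return f"hc6504 Morpher: Rule {name} replaced by new rule."
```

In this sketch, reloading a phonological rule under the same name leaves the phonological list unchanged, whereas reloading it as a morphological rule moves its name to the end of the morphological list.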

Abnormal output:

hc6009 "Morpher error: Failure to translate character <char> of string <string> of item <item> into a phonetic sequence using character table <ctable_name>.", where <char> is the character which could not be translated, <string> is the string in the (morphological) rule which could not be translated, <item> is the rule name, and <ctable_name> is the name of the character table to which the string belongs. (The translation from string to phonetic sequence failed because a character could not be found in the Character Definition Table; see Translation from String to Phonetic Sequence, section 4.1.1.1.)

hc6015 "Morpher error: Rules cannot be loaded, because variable *pfeatures* has not been set."

hc6019 "Morpher error: Rules cannot be loaded, because strata have not been defined."

hc6037 "Morpher error: Segment or boundary marker <segment> used in rule <rname> is not defined in character definition table <ctable>." (The segment is not defined in the character definition table specified in the segment record.)

hc6038 "Morpher error: Rule <rname> cannot be processed because length of input = <in_length> and length of output = <out_length>." (Hermit Crab allows the input and output of phonological rules to differ in length only if one or the other is of length zero or one; see Definition of the Phonetics of a Single Application of a Phonological Rule, 4.4.1.2.)

hc6040 "Morpher error: Unknown feature <fname> used in declaration of variable <var_name> in rule <rname>."

hc6042 "Morpher error: Unknown natural class <nat_class_name> used in rule <rname>." (The specified natural class name appears in one of the phonetic sequences of the named rule, but it has not been defined; see load_nat_class, section 6.1.6.)

hc6047 "Morpher error: Unknown alpha variable <var_name> used in a context of rule <rname>." (The specified alpha variable appears in the alpha_vars field of a simple context of the named rule, but it is not defined in the var_fs field of that rule.)

Note: Unknown fields in morpher rules will be silently ignored. The assumption is that some morphing algorithms may require control information which other algorithms do not use, that this information will be passed by fields not described in this specification, and that this information can be safely ignored by a version of the morpher using another algorithm.

The variable *pfeatures* must be set before any rules are loaded, and any character definition tables and natural classes used by the rules must have been loaded.

Warnings: The morpher does not provide an "undo" to restore the original rule of the same name (if any).

If an attempt is made to replace one rule by another, and the new rule fails to load, the old rule will not be replaced.

The morpher does not check that the character table of any segments in the phonetic sequences in the rule corresponds to the character tables of the strata to which the rule belongs.

It is normally unwise to replace a phonological rule with a morphological rule of the same name, or vice versa.

See also: remove_morpher_rule (section 6.2.2)

6.2.2.remove_morpher_rule

Summary: Removes a rule from the morpher's rule base.

Argument: rule-name (obligatory): atom

Purpose: Removes the named rule from the morpher's rule base, and its name from all lists of rules, without substituting another rule of the same name.

Specifically, the removed rule's name is automatically deleted from the set of traced rules, from the *prules* and *mrules* variables, and from the list of morphological and phonological rules of each stratum.

Normal output: Message hc6506 "Morpher: Rule <rname> removed."

Abnormal output:

hc6005 "Morpher error: Unknown morpher rule <rname>."
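As a rough illustration of the bookkeeping involved, the following Python sketch removes a rule name from every place listed above. The layout of the state dictionary (keys "rules", "traced", "prules", "mrules", "strata") is an assumption made for this example, not Hermit Crab's internal representation.

```python
def remove_morpher_rule(state, name):
    """Remove a rule and every reference to its name (hypothetical state layout)."""
    if name not in state["rules"]:
        return f"hc6005 Morpher error: Unknown morpher rule {name}."
    del state["rules"][name]
    state["traced"].discard(name)            # set of traced rule names
    for key in ("prules", "mrules"):         # global ordering variables
        state[key] = [r for r in state[key] if r != name]
    for stratum in state["strata"].values():  # per-stratum rule lists
        for key in ("prules", "mrules"):
            stratum[key] = [r for r in stratum[key] if r != name]
    return f"hc6506 Morpher: Rule {name} removed."
```

Because the sketch looks the name up in a single table, it shares the property noted in the warnings below: a phonological rule and a morphological rule with the same name cannot be told apart.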

Warnings: The morpher does not provide an "undo" to restore the rule after it has been deleted.

Since this function (like other rule loading functions) does not distinguish between morphological and phonological rules, one should not give identical names to rules of the two types.

Example: (remove_morpher_rule plur_suffix)

See also: load_morpher_rule (section 6.2.1)

6.2.3.reset_morpher_rulebase

Summary: Resets the rulebase to be empty.

Argument: none

Purpose: To allow the user to load in an entirely new rulebase without dropping out of Hermit Crab. After performing this command, the morpher has no morphological or phonological rules loaded. This may be of use in doing historical linguistic research, when a set of proto-forms is to be transformed into words of two or more daughter languages.

Normal output: Message hc6531 "Morpher: rulebase has been reset to null."

Abnormal output: There is no function specific error checking.

Warnings: Any changes will be lost.

Implementation note: It is not considered an error to reset an empty rulebase; therefore, calling this function twice will not cause an error.

6.2.4.show_stratum_prules

Summary: Shows the names of the phonological rules assigned to the named stratum.

Argument: Stratum name (atom)

Purpose: To display the names of the phonological rules assigned to a particular stratum.

Normal output: A (possibly empty) list of rule names, in their order of application during synthesis (assuming linear rule application).

Abnormal output: hc6021 "Morpher error: Unknown stratum <sname>."

See also: *prules* (section 6.2.6)

6.2.5.show_stratum_mrules

Summary: Shows the names of the morphological rules assigned to the named stratum.

Argument: Stratum name (atom)

Purpose: To display the names of the morphological rules assigned to a particular stratum.

Normal output: A (possibly empty) list of rule names, in their order of application during synthesis (assuming linear rule application).

Abnormal output: hc6021 "Morpher error: Unknown stratum <sname>."

See also: *mrules* (section 6.2.7)

6.2.6.*prules*

Summary: The *prules* variable gives the names of all phonological rules in the order of their application (in synthesis, and assuming linear ordering), regardless of their assignment to strata. This variable may only be set after any phonological rules whose names appear have already been loaded.

Default: The default is the order in which the rules were loaded; if no phonological rules have been loaded, the value is the empty list. Any new phonological rules which are loaded after this variable has been set will be appended to the end of the list of phonological rules. Deleting a rule with the command remove_morpher_rule will result in the rule’s name being deleted from this list.

Possible values: A list of rule names.

Abnormal output: hc6007 "Morpher error: Unknown phonological rule: <rname>." (Note: Hermit Crab is only guaranteed to find the first such unknown rule.)

hc6043 "Morpher error: Phonological rule <rname> does not appear in the variable *prules*." (The phonological rule has been loaded, but its name is missing from the variable’s list of rules.)
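The two checks can be sketched in Python as follows. The function name check_prules and its arguments are hypothetical, and checking for unknown names before missing names is an assumption; the specification does not fix the order.

```python
def check_prules(new_order, loaded_phon_rules):
    """Validate a proposed value for *prules*; return an error message or None."""
    loaded = set(loaded_phon_rules)
    for name in new_order:
        if name not in loaded:
            # Only the first unknown rule is reported.
            return f"hc6007 Morpher error: Unknown phonological rule: {name}."
    listed = set(new_order)
    for name in loaded_phon_rules:
        if name not in listed:
            return (f"hc6043 Morpher error: Phonological rule {name} "
                    "does not appear in the variable *prules*.")
    return None
```

The same shape of check would apply to *mrules*, with messages hc6008 and hc6044 in place of hc6007 and hc6043.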

Note: If the order of phonological rules for some stratum is simultaneous, setting this variable will have no effect on that stratum. Such rules may be intermingled among other rules in any order.

See also: show_stratum_prules (section 6.2.4)

6.2.7.*mrules*

Summary: The *mrules* variable lists the names of the ordinary morphological rules (i.e. not the realizational rules) in the order of their application (in synthesis, and assuming linear ordering), regardless of their assignment to strata. This variable may only be set after any morphological rules whose names appear have already been loaded.

Default: The default is the order in which the rules were loaded; if no morphological rules have been loaded, the value is the empty list. Any new morphological rules which are loaded after this variable has been set will be appended to the end of the list of morphological rules. Deleting a rule with the command remove_morpher_rule will result in the rule’s name being deleted from the list of morphological rules.

Possible values: A list of rule names.

Abnormal output: hc6008 "Morpher error: Unknown morphological rule: <rname>." (Note: Hermit Crab is only guaranteed to find the first such unknown rule.)

hc6044 "Morpher error: Morphological rule <rname> does not appear in the variable *mrules*." (The morphological rule has been loaded, but its name is missing from the variable’s list of rules.)

Note: If the morphological rules for some stratum are unordered, setting this variable will have no effect on that stratum. Such rules may be intermingled among other rules in any order.

See also: show_stratum_mrules (section 6.2.5)

6.3. Morphing Functions and Variables

6.3.1.morph_and_lookup_word

Summary: Causes the morpher to morph a single word and look up the residue in the dictionary, in as many ways as possible.

Argument: input token (obligatory): a list consisting of three token records, as output by the Preprocessor (see Input Data Format, section 5.1). The first token record is the word to be morphed, while the second and third token records represent the previous and following words, respectively. If the word is to be morphed as if it is the first word of an utterance, the second member of the list may instead be the atom *null*. Likewise, if the word is to be morphed as if it is the last word of an utterance, the third member of the list may be replaced by the atom *null*. If it is desired that no rules apply which depend on the preceding and/or following word of the utterance (or if there are no such rules in the grammar), the second and/or third arguments should be replaced by the atom *NA*.

Purpose: The function morph_and_lookup_word performs an exhaustive morphing and lookup of the word represented by the second token in its input argument, by applying the following generate-and-test algorithm:

Attempt to look up the input word in the lexicon.

Unapply all phonological rules of the top-most stratum and zero or more morphological rules of that stratum, and attempt lexical lookup of all storable lexical entries produced, saving any lexical entries found on lookup.

Repeat the previous step for each lower stratum.

For each form looked up in the lexicon, apply the morphological rules which produced its analysis and the phonological rules of the relevant strata, throwing away any forms which are blocked by irregular forms in the lexicon. If the resulting surface form is identical to the input word, the path which produced it represents a successful analysis.

The result is a set of lexical entries representing the analysis of the input word.
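The generate-and-test idea can be illustrated with a deliberately small Python sketch. It assumes a single stratum, suffixing rules only, no phonological rules, and no blocking; the lexicon, the rule records, and their field names ("shape", "cat", "suffix") are invented for the example and are not Hermit Crab data structures.

```python
LEXICON = [
    {"id": "love_n", "shape": "lov", "cat": "N"},
    {"id": "love_v", "shape": "lov", "cat": "V"},
]
RULES = [
    {"name": "plural", "suffix": "z", "cat": "N"},
    {"name": "3sg", "suffix": "z", "cat": "V"},
]

def unapply(form):
    """Analysis: strip zero or one suffix, ignoring what the stem might be."""
    yield form, []
    for rule in RULES:
        suffix = rule["suffix"]
        if form.endswith(suffix) and len(form) > len(suffix):
            yield form[: -len(suffix)], [rule]

def synthesize(entry, path):
    """Synthesis: re-apply the rules, now checking the entry's category."""
    form = entry["shape"]
    for rule in path:
        if entry["cat"] != rule["cat"]:
            return None
        form += rule["suffix"]
    return form

def morph_and_lookup(word):
    analyses = []
    for residue, path in unapply(word):          # generate candidates
        for entry in LEXICON:                    # lexical lookup
            if entry["shape"] == residue and synthesize(entry, path) == word:
                analyses.append((entry["id"], [r["name"] for r in path]))  # test passed
    return analyses
```

Calling morph_and_lookup("lovz") in this sketch yields two analyses, the noun stem with the plural rule and the verb stem with the third person singular rule; the candidates pairing the noun with the verbal suffix (and vice versa) are generated during analysis but rejected during synthesis, once the category of the lexical entry is known.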

Normal output: If there is at least one morphing, a list whose first element is the atom word_analyses, and whose remaining members are lexical entries, one for each successful analysis. (For the definition of lexical entries, see section 5.2, Lexical Entry Data Structure.)

The tracing of morpher rules produces additional output in the form of a call to pretty_print plus a root trace structure; see the discussion in section 5.8 of the Trace Record Structure.

Abnormal output:

hc6006 "Morpher error: Unknown word: <printform>.", where <printform> is the string which represents the (internal) printform of the word. (There was no successful morphing.) If tracing is turned on, a trace record is produced regardless of whether the word was successfully analyzed.

hc6016 "Morpher error: Failure to translate character <char> of word to be parsed <word> into a phonetic sequence using character table <ctable_name>.", where <char> is the character which could not be translated, <word> is the printform of the word which could not be translated, and <ctable_name> is the name of the character table used for the translation. (The translation from string to phonetic sequence failed because a character could not be found in the Character Definition Table; see Translation from String to Phonetic Sequence, section 4.1.1.1.)

hc6011 "Morpher error: Failure to translate the set of phonetic features <features> into a character using character table <table_name>.", where <features> is the set of features which could not be translated, and <table_name> is the name of the character table which was being used. (See Translation from Phonetic Sequence to Regular Expression, section 4.1.1.2.)

hc6035 "Morpher error: Failure to unambiguously translate the set of phonetic features <features> into a character using character table <table_name>; the ambiguous translation is <ambiguous translation>." (There was a translation, but it was ambiguous; the final translation into a surface form cannot be ambiguous.)

hc6022 "Morpher error: No strata defined."

hc6033 "Morpher error: Stratum <sname> must be assigned a character definition table."

hc6024 "Morpher error: Lexical entry <lex-id> assigned to unknown stratum <stratum>.", where <lex-id> is the value of the Lexical Entry ID field of the offending lexical entry, and <stratum> is the name of the unknown stratum. (One of the real lexical entries which was looked up had a stratum specified which was not listed as a stratum name in the *strata* variable.) This error message may also be generated by the function load_lexical_entry, which should prevent such a lexical entry from being added in the first place. However, this error message can also be generated by the present function if lexical entries were added to the external database by some other program which did not check for correctness of strata.

hc6042 "Morpher error: Unknown natural class <nat_class_name> used in rule <rname>." (The specified natural class name appears in one of the phonetic sequences of the named rule, but it is not defined. Since it had to have been defined when the rule was loaded (see load_morpher_rule, section 6.2.1), it must have been removed by remove_nat_class.)

hc6050 "Morpher error: Boundary marker in phonetic representation is unknown in character definition table <table_name>." (This may occur when a rule is traced, if a boundary marker is introduced by a morphological rule, which marker does not belong to the stratum of the lexical entry. It should be avoided by only specifying a character definition table which will be available in the stratum to which the rule applies. Note that the boundary marker itself cannot be printed out, because its character definition table is unknown to the lexical entry.)

hc6051 "Morpher error: Deletion rule <rname> deleted all segments and/or boundaries from phonetic sequence of lexical entry."

hc6052 "Morpher error: A deletion rule has deleted all segments and/or boundaries from phonetic sequence of lexical entry." (Message hc6051 should be used if the deletion rule which caused this error can be determined; message hc6052 may be used otherwise, e.g. if the error only became apparent at the end of the stratum, when boundary markers are erased.)

hc6053 "Morpher error: Rule <rname> requires agreement in the feature <fname>, but the feature is uninstantiated in the environment." (During synthesis, a feature must be instantiated in at least one place at the point the agreement rule is supposed to apply. For instance, if the target is supposed to agree in point of articulation with the following segment, then the point of articulation of that following segment must be instantiated when the rule applies.)

hc6055 "Morpher error: Ambiguous application of Affix Templates to Lexical Entry <lexid>; the following Templates matched: <template_names>." (More than one Template matched; all matching names are shown.)

hc6059 "Morpher error: Unknown rule <rname> in an Affix Template for the Stratum <sname>." (The named realizational rule appears in one of the slots of one of the Affix Templates, but the rule itself is not currently loaded.)

Example:

(morph_and_lookup_word

(*NA* <token orth "loves" shape "lovz"> *NA*))

Assuming the appropriate rules and lexical entries, this should return two analyses, one with loves as a plural noun (as in "the many loves of Dobie Gillis"), the other as a third person singular present tense verb.

See also: show_morphings (section 6.6.10), generate_word (section 6.3.3)

6.3.2.morph_and_lookup_list

Summary: Maps the function morph_and_lookup_word (see above, section 6.3.1) over a list of words.

Argument: words (obligatory): a list containing one or more token records, as output by the Preprocessor (see section 5.1, Input Data Format).

Purpose: This function morphs a series of words, such as a sentence. The output is intended to be usable as the input of the parser module.

Normal output: If each word in the input list is successfully morphed, the output is a list whose first element is the atom word_analyses, and whose second element is a list of lists. Each sublist is a list of lexical entries, one for each successful morphing of an input word.

Note that the output of this function is a list (with the identifier word_analyses) of lists of lists of lexical entries, while the output of morph_and_lookup_word is a list (with the same identifier) of lists of lexical entries.

If tracing is turned on, one call to the command pretty_print plus a root trace structure is produced for each word in the input to this command. A trace structure may be prematurely terminated by an error message.

Abnormal output:

hc6012 "Morpher error: Unknown word(s): <words>.", where <words> are the printforms of any unknown words, each separated by a space. (One or more words in the input could not be morphed; analyses of any words which were successfully morphed are not output.)

Again, the output of this function is not identical to what would result if morph_and_lookup_word were simply mapped over the input list, as mapping would result in a separate error message for each unknown word.
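The difference from a plain mapping can be sketched in Python. Here tokens are represented simply by their printform strings, and morph_word is a hypothetical stand-in for morph_and_lookup_word that returns a (possibly empty) list of analyses; both are assumptions made for the example.

```python
def morph_and_lookup_list(tokens, morph_word):
    """Morph each token; report all unknown words in one message."""
    per_word, unknown = [], []
    for token in tokens:
        analyses = morph_word(token)
        if analyses:
            per_word.append(analyses)
        else:
            unknown.append(token)
    if unknown:
        # One aggregated error; successful analyses are not output.
        return "hc6012 Morpher error: Unknown word(s): " + " ".join(unknown) + "."
    return ["word_analyses", per_word]
```

With two unknown words in the input, this sketch returns a single hc6012 message naming both, rather than two separate hc6006 messages.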

Errors in translation between strings and phonetic sequences return the same error messages as morph_and_lookup_word.

Example:

(morph_and_lookup_list

(<token orth "John" shape "john">

<token orth "'ll" shape "l">

<token orth "go" shape "go">))

This should return a list whose first member is the atom word_analyses, and whose other member is a list of lists of lexical entries: one such sublist for John, another for 'll (the verb will), and finally a sublist of lexical entries for go.

6.3.3.generate_word

Summary: Generates a derivation (in the synthesis sense) from a lexical entry to a surface form.

Argument: list:

lex-entry or lex-id (obligatory): a lexical entry record, or a string designating a lexical entry in the current lexicon;

morph-rules (obligatory): a list of lists of rule names of morphological rules to be applied. Each sublist of this list consists of the names of the morphological rules of a stratum which are to be applied. The rules of the first such sublist must belong to the stratum of the lexical entry, the rules of the next sublist must belong to the next (more surface) stratum, etc. There should NOT be a sublist for the *surface* pseudo-stratum.

realizational-features (obligatory): A List-Valued Features list. This represents the set of Head Features which are to be realized by Realizational Rules, and are added to the Head Features of the Lexical Entry, superseding any conflicting Feature Values already present. Also, they may not be overwritten by features assigned by affixes. (Normally, each sublist will contain the name of a single feature value, so that an atomic-valued feature list would suffice; the list-valued feature list is an extension, since a list can always contain a single value. No sublist should be empty.)

prev-word (optional, unless next-word is supplied): token record (as output by the Preprocessor; see section 5.1, Input Data Format), representing the preceding word in the utterance (for alternatives, see morph_and_lookup_word, 6.3.1)

next-word (optional): token record, representing the next word in the utterance

Purpose: To allow the user to test the rules by synthesizing a surface lexical entry from an underlying lexical entry. If the first argument of this function is a lex-id, the underlying lexical entry will be taken from the current lexicon; if the first argument is a lexical entry, that lexical entry will be used as the underlying form (it may or may not be in the current lexicon). This should be useful for debugging, but it may also be useful for historical reconstruction and Computer Assisted Related Language Adaptation (CARLA).

Normal output: A lexical entry data structure representing the surface form derived from the underlying form by the application of the specified morphological rules and any relevant phonological rules. (If tracing is turned on, a trace record is output before the normal output; see Trace Record Structure, section 5.8.)

If the variable *blocking* is set to true (the default), generation of a surface form from an underlying form is blocked by a blocking lexical entry. If the variable is set to *substitute*, when the morpher encounters a blocking lexical entry, it substitutes that blocking entry for the blocked lexical entry, and continues with the derivation.

Abnormal output: Any of the error messages which may be output by morph_and_lookup_word, except for hc6006. (Note that hc6024 "Morpher error: Lexical entry with phonetic shape <pform> assigned to unknown stratum <stratum>." will be triggered if the lex-entry in the argument list of generate_word refers to a nonexistent stratum.) Additional error messages which may appear:

hc6013 "Morpher error: Unknown lexical entry: <lex_id>." (The user supplied a lexical id string as the first argument, but the specified lexical id could not be found.)

hc6025 "Morpher error: Incorrect number of strata in list of morphological rules to be applied." (The number of sublists in the morph-rules argument must equal the number of strata to be applied to the lexical entry. Specifically, there must be a sublist for the stratum to which the lexical entry belongs, and one sublist for each higher stratum, not counting the *surface* stratum.)

hc6026 "Morpher error: Unknown morphological rule <rname> for stratum <stratum> specified in list of morphological rules to be applied.", where <rname> is the name of the rule in the morph-rules list argument to this function, and <stratum> is the name of the stratum. (There may be more than one unknown rule in the morph-rules list; only the first unknown rule is shown.)
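The checks behind hc6025 and hc6026 might look roughly like the following Python sketch. The function name, the argument layout, and the reading of "unknown rule" as "not among the morphological rules of that stratum" are assumptions made for illustration; the stratum of the lexical entry is assumed to be a known stratum.

```python
def check_morph_rules(strata, stratum_mrules, entry_stratum, morph_rules):
    """strata: stratum names in synthesis order, without *surface*.
    stratum_mrules: stratum name -> names of its morphological rules."""
    # Strata to be applied: the entry's own stratum and every higher one.
    remaining = strata[strata.index(entry_stratum):]
    if len(morph_rules) != len(remaining):
        return ("hc6025 Morpher error: Incorrect number of strata in list of "
                "morphological rules to be applied.")
    for stratum, sublist in zip(remaining, morph_rules):
        for rname in sublist:
            if rname not in stratum_mrules[stratum]:
                # Only the first unknown rule is reported.
                return (f"hc6026 Morpher error: Unknown morphological rule {rname} "
                        f"for stratum {stratum} specified in list of morphological "
                        "rules to be applied.")
    return None
```

Note that the sketch checks membership only, not the order of the rules within a sublist.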

hc6055 "Morpher error: Ambiguous application of Affix Templates to Lexical Entry <lexid>; the following Templates matched: <template_names>." (More than one Template matched; all matching names are shown.)

hc6056 "Morpher error: <fieldname> field is missing from lexical entry with shape <pshape>." (The user forgot to specify some obligatory field; note that for this command, the lex_id is not obligatory.)

Warnings: The morpher does not check that the morphological rules for each stratum are in the same order as that which was specified in a set_stratum command. This is intentional, so as to allow the user to explore varying rule orders.

See also: morph_and_lookup_word (section 6.3.1)

6.3.4.*strata*

Summary: The *strata* variable lists the names of the rule strata in the order of their application (in synthesis).

The morpher defines a pseudo-stratum *surface*, which corresponds to the surface (input) form of words. This stratum has a character definition table, but no morphological or phonological rules. It does not need to be given in the list of strata assigned to the *strata* variable.

Default: There is no default; there must be at least one stratum, not counting the *surface* stratum.

Possible values: A list of stratum names.

Abnormal output: There is no error checking uniquely associated with this variable.

Warning: Resetting this variable causes the lexical entry database to be reset.

6.3.5.*del_re_app*

Summary: When deletion rules are unapplied, it may be impossible to tell when to stop unapplying them, if the unapplication of such a rule creates an environment for its repeated unapplication. For instance, consider the following deletion rule (written with the usual linguistic abbreviations):

C → 0 / C__C

If this rule is unapplied to a sequence ...C1C2..., it will generate the sequence ...C1C3C2..., where C3 is the undeleted consonant. But now the rule may be unapplied again, once between C1 and C3, and once between C3 and C2; and so on ad infinitum.

The *del_re_app* variable imposes an arbitrary (i.e. linguistically unmotivated) upper limit on such feeding unapplication by limiting the number of times deletion rules can be re-unapplied to their own output. Should the default (0) prove too low, the user may set this variable to a higher value, although that will probably slow parsing.
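The effect of the limit can be illustrated with a small Python sketch of the rule above. It is a simplification under stated assumptions: segments are single characters, anything that is not one of the five vowel letters counts as a consonant, the letter "C" stands for an undeleted consonant of unknown quality, and each pass undeletes one segment, with every pass after the first counted as a re-unapplication.

```python
VOWELS = set("aeiou")

def is_consonant(segment):
    # "C" (an undeleted consonant) also counts as a consonant.
    return segment not in VOWELS

def unapply_once(form):
    """Undelete one consonant between any pair of adjacent consonants."""
    results = set()
    for i in range(len(form) - 1):
        if is_consonant(form[i]) and is_consonant(form[i + 1]):
            results.add(form[: i + 1] + "C" + form[i + 1 :])
    return results

def unapply_deletion(form, del_re_app=0):
    found = {form}                  # the rule may not have applied at all
    frontier = unapply_once(form)   # first unapplication, always allowed
    found |= frontier
    for _ in range(del_re_app):     # re-unapplications to the rule's own output
        frontier = {g for f in frontier for g in unapply_once(f)} - found
        found |= frontier
    return found
```

In this sketch, the form "apta" with the limit at 0 gives only "apta" and "apCta"; each increment of the limit adds one more undeleted consonant ("apCCta", then "apCCCta"), which shows why an unbounded unapplication would never terminate.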

Default: 0

Possible values: integer

See also: Deletion Rules (section 2.3.5)

6.3.6.*show_glosses*

Summary: Determines whether the gloss field is shown on lexical entries in traces and word_analyses. If true, glosses are shown, else not.

Type: atom

Default: true

Possible values: true or false

6.4. Lexicon Functions

6.4.1.load_lexical_entry

Summary: Adds a new lexical entry to the lexicon.

Argument: lex-entry (obligatory): a lexical entry record

Purpose: This function allows the user to add a new (real) lexical entry to the (temporary) lexicon. It replaces any existing lexical entry having the same lexical id.

Normal output: If no lexical entry with the lexical id <lex_id> has been loaded, message hc6512 "Morpher: Adding new lexical entry with lexical id <lex_id>.". If a lexical entry with that lexical id has already been loaded, message hc6523 "Morpher: Replacing old lexical entry <lex_id> with new one."

Abnormal output:

hc6009 "Morpher error: Failure to translate character <char> of string <string> of item <item> into a phonetic sequence using character table <ctable_name>.", where <string> is the shape of the lexical entry and <item> is its id. (Note that <char> may have been intended to associate with the preceding or following character.)

hc6024 "Morpher error: Lexical entry with lexical id <lex_id> assigned to unknown stratum <stratum>.", where <lex_id> is the Lexical Entry ID of the lexical entry, and <stratum> is the name of the unknown stratum.

hc6027 "Morpher error: Lexical entries cannot be loaded, because variable *strata* has not been set."

hc6029 "Morpher error: Lexical entry with phonetic shape <pform> must have a lexical ID."

hc6057 "Morpher error: <fieldname> field is missing from lexical entry <lexid>." (An obligatory field is missing.)
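A minimal Python sketch of the add-or-replace behavior and three of the checks above follows. The lexicon is modeled as a dictionary keyed by lexical id, and the field names "id", "shape" and "stratum" are hypothetical; the order of the checks is likewise an assumption, and the translation check (hc6009) and the missing-field check (hc6057) are omitted.

```python
def load_lexical_entry(lexicon, strata, entry):
    """lexicon: dict keyed by lexical id; strata: list of known stratum names."""
    if not strata:
        return ("hc6027 Morpher error: Lexical entries cannot be loaded, "
                "because variable *strata* has not been set.")
    if "id" not in entry:
        return (f"hc6029 Morpher error: Lexical entry with phonetic shape "
                f"{entry['shape']} must have a lexical ID.")
    if entry["stratum"] not in strata:
        return (f"hc6024 Morpher error: Lexical entry with lexical id {entry['id']} "
                f"assigned to unknown stratum {entry['stratum']}.")
    replaced = entry["id"] in lexicon
    lexicon[entry["id"]] = entry     # same id: the old entry is replaced
    if replaced:
        return f"hc6523 Morpher: Replacing old lexical entry {entry['id']} with new one."
    return f"hc6512 Morpher: Adding new lexical entry with lexical id {entry['id']}."
```

Keying the lexicon on the lexical id means that loading the same id twice leaves a single entry, while two entries that are identical apart from their ids would both be kept.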

Warnings: The lexical entry is not automatically saved (see section 6.5.1, dump_dictionary_to_file).

Loading a lexical entry identical to one already in the lexicon is a good way to introduce spurious ambiguity.

The fact that a lexical entry successfully loads is no guarantee that it is correct, since the morpher may or may not check all parts of the entry as it loads it. Errors in other parts of the lexical entry may not become apparent until a word fails to be morphed which should have used that lexical entry. Such errors can be very difficult to track down.

Note: This function is intended to allow entering new lexical entries interactively. It is likely that many users will also wish to enter lexical entries from existing dictionaries, perhaps using the morpher to strip suffixes from full word entries. For this purpose, the function trace_morph (defined below) may be useful for suggesting possible roots or stems.

Implementation note: The shell may use the morpher to help the user create lexical entries. For instance, rather than simply asking the user if a root provided by the user has irregular inflected forms, the morpher may be used to generate a set of regularly inflected forms, and the user queried as to whether the forms are correct.

See also: remove_lexical_entry (section 6.4.2); show_lexical_entry (section 6.4.4); load_dictionary_from_text_file (section 6.5.2); merge_text_file_with_dictionary (section 6.5.3)

6.4.2.remove_lexical_entry

Summary: Removes a designated real lexical entry.

Argument: lex-id (obligatory): string

Purpose: Allows the user to remove a lexical entry, specified by its Lexical Entry ID.

Normal output: Message hc6513 "Morpher: Lexical entry <lex_id> with printform <printform> removed.", where <printform> is the Phonetic Shape of the lexical entry which was removed.

Abnormal output:

hc6013 "Morpher error: Unknown lexical entry: <lex_id>."

Warnings: The revised lexicon is not automatically saved (see section 6.5.1, dump_dictionary_to_file).

There is no undelete function.

Once a lexical entry is removed, its Lexical Entry ID becomes invalid.

See also: find_lexical_entries (section 6.4.3); load_lexical_entry (section 6.4.1); show_lexical_entry (section 6.4.4)

6.4.3.find_lexical_entries

Summary: Returns a list of real lexical entries from the lexicon which are superentries of a template lexical entry.

Argument: template (obligatory): a lexical entry record (perhaps partially instantiated)

Purpose: This enables the user to get a list of all real lexical entries which are superentries of a given lexical entry. Such a list might be used to find entries needing editing or removal from the lexicon.

Normal output: A list whose first member is the identifier lexical_entries, and whose second member is a (possibly empty) list of lexical entries. The Lexical Entry ID field of these entries is used as an argument to the functions remove_lexical_entry, show_lexical_entry, and show_relative_lexical_entries.

An empty list as output is not considered an error, but implies that there are no lexical entries which are superentries of the given lexical entry.
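As a rough illustration, the following Python sketch treats lexical entries as flat dictionaries and assumes that an entry is a superentry of the template when it carries every field value that the template instantiates. That reading of "superentry", like the field names in the usage below, is an assumption made for the example; the precise definition is given with the lexical entry data structure.

```python
def is_superentry(entry, template):
    """True if the entry has every field value the template instantiates."""
    return all(
        field in entry and entry[field] == value
        for field, value in template.items()
    )

def find_lexical_entries(lexicon, template):
    """lexicon: iterable of entry dicts; returns the tagged result list."""
    matches = [entry for entry in lexicon if is_superentry(entry, template)]
    return ["lexical_entries", matches]
```

An empty template matches every entry in this sketch, which also suggests why a highly uninstantiated template can be slow on a large lexicon: nothing narrows the search.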

Abnormal output: There is no function specific error checking.

Warnings: The morpher is not guaranteed to remember the Lexical Entry ID field of the lexical entries returned by this function from one Hermit Crab session to another. The identifier for a given lexical entry also becomes invalid if that lexical entry is deleted by remove_lexical_entry, or when the function reset_lexicon is called.

This function can be very slow if the lexicon is large, and the template highly uninstantiated.

6.4.4.show_lexical_entry

Summary: Outputs a copy of a (real) lexical entry.

Argument: lex-id (obligatory): string

Purpose: The morpher has no built-in provision for editing a lexical entry, since editing involves interaction between the shell and the user. Instead, this function sends a copy of the lexical entry to the shell, where the user may edit it. The modified lexical entry may then be loaded back in with the function load_lexical_entry, and the original lexical entry removed with the function remove_lexical_entry if desired.

Normal output: a lexical entry record.

Abnormal output:

hc6013 "Morpher error: Unknown lexical entry: <lex_id>."

See also: find_lexical_entries (section 6.4.3); load_lexical_entry (section 6.4.1); remove_lexical_entry (section 6.4.2)

6.4.5.merge_in_dictionary_file

Summary: Loads in a named dictionary file, adding it to the contents (if any) of the current lexicon.

Argument: dictionary-name (obligatory): string

Purpose: Loads a new dictionary file into the lexicon. New lexical entries loaded in do not overwrite previous lexical entries, even if identical. The ability to load in multiple dictionary files may be of use when working with multiple dialects, languages or semantic domains.

Normal output: Message hc6515 "Morpher: Loaded <n> lexical entries from the dictionary file <fname>.", where n is the number of lexical entries loaded.

Abnormal output: See load_lexical_entry (section 6.4.1). If possible, a single erroneous lexical entry should not cause the morpher to stop loading the file.

Warnings: The morpher does not check for duplicate lexical entries.

Implementation notes: "Loading" a dictionary file merely means that its information is now accessible; it is analogous to opening a file for reading. The file need not be copied into memory. Accordingly, it may be necessary to lock the file against writes, so that the locations of lexical entries in the file do not change. Alternatively, the "file" may actually be a database table, in which case the table should either be locked against updates, or any updates communicated to the morpher by means of load_lexical_entry and remove_lexical_entry.

See also: reset_lexicon (section 6.4.6); load_dictionary_from_text_file (section 6.5.2)

6.4.6.reset_lexicon

Summary: Resets the lexicon to be empty.

Argument: none

Purpose: To allow the user to load in an entirely new dictionary without dropping out of Hermit Crab. Immediately after performing this command, the morpher has no lexical entries; in other words, all lexical lookups will fail. This does NOT erase any dictionary files; it merely severs any connection the morpher module may have had with those files (for instance by closing them).

Normal output: Message hc6516 "Morpher: Lexicon has been reset to null."

Abnormal output: Message hc6028 "Morpher error: Lexicon cannot be reset until variable *strata* has been set."

Warnings: Any changes (i.e. lexical entries which have been added or removed) since the last time the lexicon was saved will be lost.

Implementation note: It is not considered an error to reset an empty dictionary, i.e. calling this function twice will not cause an error. This command can be called even if the strata have not been assigned character definition tables, even though in that case there can be no valid lexicon (because lexical entries cannot be loaded until the strata have been assigned character definition tables). The result will simply be no change.

See also: merge_in_dictionary_file (section 6.4.5)

6.5. Dictionary Functions

As discussed above, the dictionary is the permanent repository of lexical information. The user can maintain multiple dictionary files (e.g. for different semantic domains or different languages), and any number of dictionary files may be loaded into the morpher at any point.

There is no constraint on the form of a dictionary file, nor even any guarantee that such a "file" is one file on disk.

The functions discussed in the following subsections serve as the interface between the dictionary and other applications programs, enabling the user to convert the dictionary to or from a standard format (e.g. ASCII text). Because of the wide variety of formats an external program might use, no attempt is made to convert between the internal dictionary format and some "standard" format. Instead, the text format of the dictionary is the lexical entry format described above (see Lexical Entry Record Structure, section 5.2). Conversion between this format and other formats (e.g. standard format markers) should be trivial.

The dictionary may or may not be internal to the morpher. If it is internal, the morpher is responsible for the execution of these commands. If a separate dictionary module maintains the dictionary, then these commands will be executed by that module.

6.5.1.dump_dictionary_to_file

Summary: Dumps a saved dictionary file to a text file (e.g. an ASCII file).

Argument: list:

dict-file-name (obligatory): string

text-file-name (obligatory): string

Purpose: To write the specified dictionary (which may be stored in a non-text file, such as a commercial database) to a plain text file, for transfer to other computers, editing, publishing, etc.

Normal output: Message hc6517 "Morpher: Dictionary <dict_name> saved in text format to file <fname>."

Abnormal output: Operating system errors (such as file system full) should be trapped and output as errors.

Implementation notes: The format for writing lexical entries to a file is not fixed, except that each lexical entry should conform to the lexical entry format as defined above (see Lexical Entry Record Structure, section 5.2). Preferably, lexical entries should be separated by whatever character(s) the operating system uses to indicate a newline, and a single lexical entry should not be broken by a newline. (These two recommendations are to make it easier to use line-oriented tools like grep.) There is no required order for fields within a lexical entry, although it is suggested that the order in which the fields are presented in this specification be used. Any whitespace character may be used to separate fields, but tabs are recommended. (This would allow a program like awk to readily distinguish fields.) There is no need to write empty fields, and not writing them will save time and space.

The dictionary process may itself interact with the user to set defaults (e.g. maximum line length, character set, specific format, etc.).
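The formatting recommendations in the implementation notes above can be sketched as follows. This is a minimal illustration only, not part of the specification: the function name write_entries and the representation of an entry as a list of already formatted field strings are assumptions, and the actual syntax of each field is that of section 5.2.

```python
def write_entries(entries, out):
    """Write one lexical entry per line, with tab-separated fields.

    `entries` is an iterable of lists of field strings (a hypothetical
    representation); each string is assumed to be already in the lexical
    entry format. Empty fields are not written.
    Returns the number of entries written.
    """
    count = 0
    for fields in entries:
        non_empty = [field for field in fields if field]
        if not non_empty:
            continue  # a wholly empty entry produces no line
        # In text mode, Python translates "\n" to the platform's newline.
        out.write("\t".join(non_empty) + "\n")
        count += 1
    return count
```

Keeping each entry on a single line with tab-separated fields is what lets line-oriented tools such as grep and awk process the dump directly.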

See also: load_dictionary_from_text_file (section 6.5.2), merge_text_file_with_dictionary (section 6.5.3)

6.5.2.load_dictionary_from_text_file

Summary: Loads a text file in lexical entry format into the specified dictionary file, replacing the current contents (if any) of that file.

Argument: list:

text-file-name (obligatory): string

dict-file-name (obligatory): string

Purpose: To load a dictionary in text file format into a dictionary file. The text file may have been transferred from a different computer or format.

Normal output: Message hc6518 "Morpher: Dictionary <dict_name> loaded from text file <fname>."

Abnormal output: Operating system errors (such as invalid file name) should be trapped and output as errors.

Implementation note: The format for lexical entries in a text file is not fixed, except that each lexical entry must conform to the lexical entry format as defined above (see Lexical Entry Record Structure, section 5.2). This function should be able to accept files in a variety of formats, including variant order of fields and various whitespace characters.

See also: dump_dictionary_to_file (section 6.5.1), merge_text_file_with_dictionary (section 6.5.3)

6.5.3.merge_text_file_with_dictionary

Summary: Loads a text file in lexical entry format into the specified dictionary file, merging it with the current contents (if any) of that file.

Argument: list:

text-file-name (obligatory): string

dict-file-name (obligatory): string

Purpose: To load a text dictionary file, which may have been transferred from a different computer or format, adding it to the current dictionary.

Normal output: Message hc6519 "Morpher: Text file <text_fname> merged into current dictionary <dict_name>."

Abnormal output: Operating system errors (such as invalid file name) should be trapped and output as errors.

Warnings: The morpher does not attempt to find duplicate entries. This is because making decisions as to when near-duplicate entries should be merged is too difficult.

Implementation note: The format for lexical entries in a text file is not fixed, except that each lexical entry must conform to the lexical entry format as defined above (see Lexical Entry Record Structure, section 5.2). This function should be able to accept files in a variety of formats, including variant order of fields and various whitespace characters.

See also: dump_dictionary_to_file (section 6.5.1); load_dictionary_from_text_file (section 6.5.2); merge_in_dictionary_file (section 6.4.5)

6.6. Debugging Functions and Variables

6.6.1.show_active_morph_rules

Summary: Shows all morphological rules in the rulebase matching a given template.

Argument: template (optional): morphological rule record (possibly partially instantiated)

Purpose: This function outputs all active morphological rules which match the template. A morphological rule matches the template if:

1. The template's Rule Name (if any) is the same as the rule's Rule Name. (There are no "wildcards.")

2. The template's Rule Stratum (if any) is the same as the rule's Rule Stratum.

3. The template's Blockability (if given) is the same as the rule's Blockability. (A value of true (the default) in the template matches an empty field in a rule.)

4. The template's Required Phonetic Input and Phonetic Output (if any) are identical to the corresponding fields of the rule.

5. The template's Required Part of Speech (if any) is the same as the rule's Required Part of Speech.

6. The template's Required Subcategorized Rules, Required Head Features, Required Foot Features, Required Morphological Rule Features, and Excluded Morphological Rule Features (if any) are subsets of the corresponding fields of the rule.

7. The template's (output) Part of Speech (if any) is the same as the rule's (output) Part of Speech. If the template's output Part of Speech is the special atom *null*, the rule does not have an output Part of Speech.

8. The template's (output) Subcategorization, Head Features, Foot Features, MPR Features, and Obligatory Features (if any) are subsets of the corresponding fields of the rule.

9. The template's Gloss String and Morphemic Representation (if any) are the same as the corresponding fields of the rule.

If no template argument is given, this function lists all active morphological rules.

A rule is active if it has been loaded and has not been removed.
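The matching conditions above mix two kinds of comparison: some fields must be equal, while others need only be subsets of the rule's fields. A minimal sketch of that logic follows. It is an illustration under stated assumptions, not the specified implementation: rules and templates are represented as Python dicts, all key names are hypothetical, only a few representative fields are shown, and feature fields are simplified to flat sets.

```python
NULL = "*null*"  # special atom: the rule must have no output Part of Speech

# Hypothetical field names; only a representative subset is shown.
EQUAL_FIELDS = ("rule_name", "stratum", "required_pos", "output_pos", "gloss")
SUBSET_FIELDS = ("required_head_features", "output_head_features")


def matches(template, rule):
    """Return True if `rule` matches the partially instantiated `template`.

    A field absent from the template places no constraint on the rule.
    """
    for name in EQUAL_FIELDS:
        if name not in template:
            continue
        wanted = template[name]
        if name == "output_pos" and wanted == NULL:
            if rule.get(name) is not None:
                return False
        elif wanted != rule.get(name):
            return False
    # An empty Blockability field in a rule counts as true (the default).
    if "blockable" in template and template["blockable"] != rule.get("blockable", True):
        return False
    for name in SUBSET_FIELDS:
        if name in template and not set(template[name]) <= set(rule.get(name, ())):
            return False
    return True


def show_active_rules(rules, template=None):
    """Return the identifier plus the (possibly empty) list of matching rules."""
    template = template or {}
    return ["morphological_rules", [r for r in rules if matches(template, r)]]
```

With no template, every active rule is returned; a template that matches nothing yields an empty sublist rather than an error.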

Normal output: A list consisting of the identifier (atom) morphological_rules plus a list of rule structures matching the template. If the pattern does not match any rules, this sublist will be empty. This is not considered an error.

Abnormal output: hc6042 "Morpher error: Unknown natural class <nat_class_name> used in rule <rname>." (The specified natural class name appears in one of the phonetic sequences of the named rule, but it is not defined. Since it had to have been defined when the rule was loaded (see load_morpher_rule, section 6.2.1), it must have been removed by remove_nat_class.)

See also: show_active_phon_rules (section 6.6.2)

6.6.2.show_active_phon_rules

Summary: Shows all phonological rules in the rulebase matching a given template.

Argument: template (optional): phonological rule record (possibly partially instantiated)

Purpose: This function outputs all active phonological rules which match the template. A phonological rule matches the template if:

1. The template's Rule Name (if any) is the same as the rule's Rule Name. (There are no "wildcards.")

2. The template's Rule Strata (if any) is a subset of the rule's Rule Strata.

3. The template's Left Environment, Right Environment, Phonetic Input Sequence, and Phonetic Output Sequence (if any) are identical to the corresponding fields of the rule.

4. The template's Previous Word and Next Word fields (if any) are identical to the corresponding fields of the rule.

5. The template's Required Phonological Rule Features and Excluded Phonological Rule Features (if any) are subsets of the corresponding fields of the rule.

If no template argument is given, this function lists all active rules.

A rule is active if it has been loaded and has not been removed.

Normal output: A list consisting of the identifier (atom) phonological_rules plus a list of rule structures matching the template. If the pattern does not match any rules, this sublist will be empty. This is not considered an error.

Abnormal output: hc6042 "Morpher error: Unknown natural class <nat_class_name> used in rule <rname>." (The specified natural class name appears in one of the phonetic sequences of the named rule, but it is not defined. Since it had to have been defined when the rule was loaded (see load_morpher_rule, section 6.2.1), it must have been removed by remove_nat_class.)

See also: show_active_morph_rules (section 6.6.1)

6.6.3.trace_morpher_rule

Summary: Provides a trace facility for tracing a named morpher rule.

Argument: list:

analysis_mode (obligatory): Boolean

generate_mode (obligatory): Boolean

rule_name (optional): atom

Purpose: This function allows the user to trace the operation of a phonological or morphological rule.

If a rule_name is provided as an argument, tracing is turned on for that rule in analysis mode if analysis_mode is true, and off for analysis mode otherwise; and it is turned on for generate mode if generate_mode is true, and off otherwise. If no rule_name is provided as an argument, tracing is turned on (exhaustive tracing) or off for all rules.

Normal output: One of the following messages, depending on the arguments:

hc6532 "Morpher: Tracing of morpher rule <rname> turned off for analysis and synthesis modes."

hc6533 "Morpher: Tracing of morpher rule <rname> turned off for analysis mode and on for synthesis mode."

hc6534 "Morpher: Tracing of morpher rule <rname> turned on for analysis mode and off for synthesis mode."

hc6535 "Morpher: Tracing of morpher rule <rname> turned on for analysis and synthesis modes."

hc6536 "Morpher: Tracing of all morpher rules turned off for analysis and synthesis modes."

hc6537 "Morpher: Tracing of all morpher rules turned off for analysis mode and on for synthesis mode."

hc6538 "Morpher: Tracing of all morpher rules turned on for analysis and off for synthesis mode."

hc6539 "Morpher: Tracing of all morpher rules turned on for analysis and synthesis modes."

When tracing is turned on for one or more rules, a trace data structure is output before the normal output of morph_and_lookup_word and generate_word (see Trace Data Structures, section 5.8).

Abnormal output:

hc6017 "Morpher error: Tracing status changed on unknown morpher rule: <rname>."

Warnings: If the rule base is at all complex, turning on tracing for all rules is likely to be more confusing than enlightening.

Implementation notes: It is not an error for tracing to be turned on for a rule which was already being traced, or off for a rule which is not being traced.

It is not an error to turn tracing of all rules on or off when there are no rules.

If a new rule is loaded with the same name as a rule currently being traced (presumably a corrected version of that rule), the new rule is traced. However, tracing is not turned on for any new rules with new names which may be loaded after trace_morpher_rule is called, even if tracing had been turned on globally. (This is because trace_morpher_rule may be called to turn off tracing on individual rules, even if tracing had previously been turned on globally.)

Exhaustive tracing can be selectively untraced; hence the implementation of exhaustive tracing must mark each rule as being traced, rather than turning on a global flag.

If a rule is deleted from the rulebase by the function remove_morpher_rule, tracing of that rule is automatically turned off (and will remain off until explicitly turned on, even if another rule of the same name is later added).
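The bookkeeping described in these notes might be sketched as below. This is an assumption-laden illustration, not the specified implementation: the class RuleTracer and its methods load_rule and remove_rule are hypothetical stand-ins for the rulebase, and only the message codes (not the message texts) are returned. Note that exhaustive tracing marks each rule individually instead of setting a global flag, as the notes require.

```python
class RuleTracer:
    """Per-rule trace flags (hypothetical stand-in for the rulebase)."""

    def __init__(self, rule_names):
        # rule name -> [traced in analysis mode?, traced in synthesis mode?]
        self.flags = {name: [False, False] for name in rule_names}

    def trace_morpher_rule(self, analysis_mode, generate_mode, rule_name=None):
        """Set the trace flags and return the code of the resulting message."""
        # off/off -> 0, off/on -> 1, on/off -> 2, on/on -> 3
        offset = 2 * int(analysis_mode) + int(generate_mode)
        if rule_name is None:
            # Exhaustive tracing: mark every rule currently loaded.
            for flags in self.flags.values():
                flags[:] = [analysis_mode, generate_mode]
            return "hc%d" % (6536 + offset)
        if rule_name not in self.flags:
            return "hc6017"
        self.flags[rule_name][:] = [analysis_mode, generate_mode]
        return "hc%d" % (6532 + offset)

    def load_rule(self, rule_name):
        # A reloaded rule keeps its flags; a rule with a new name starts untraced.
        self.flags.setdefault(rule_name, [False, False])

    def remove_rule(self, rule_name):
        # Removing a rule discards its flags, so tracing stays off if it is re-added.
        self.flags.pop(rule_name, None)

    def list_traced_morpher_rules(self):
        analysis = sorted(n for n, f in self.flags.items() if f[0])
        synthesis = sorted(n for n, f in self.flags.items() if f[1])
        return [analysis, synthesis]
```

Because the flags live on the individual rules, a rule loaded after a global trace request is untraced, and any single rule can be untraced without disturbing the others.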

See also: list_traced_morpher_rules (section 6.6.6)

6.6.4.trace_morpher_strata

Summary: Provides a trace facility for tracing of strata.

Argument: list:

analysis_mode (obligatory): Boolean

generate_mode (obligatory): Boolean

Purpose: This function allows the user to trace the operation of strata.

If analysis_mode is true, tracing is turned on during analysis mode, and it is turned off for analysis mode otherwise; it is turned on for generate mode if generate_mode is true, and off otherwise.

Normal output: One of the following messages, depending on the arguments:

hc6545 "Morpher: Tracing of strata turned off for analysis and synthesis modes."

hc6546 "Morpher: Tracing of strata turned off for analysis mode and on for synthesis mode."

hc6547 "Morpher: Tracing of strata turned on for analysis mode and off for synthesis mode."

hc6548 "Morpher: Tracing of strata turned on for analysis and synthesis modes."

When tracing is turned on for strata, a trace data structure is output at the beginning and end of each stratum (see Trace Data Structures, section 5.8).

Abnormal output:

There is no function-specific error checking.

Implementation notes: It is not an error for tracing to be turned on for strata when it was already turned on, or off if it was already turned off.

6.6.5.trace_morpher_templates

Summary: Provides a trace facility for tracing of templates.

Argument: list:

analysis_mode (obligatory): Boolean

generate_mode (obligatory): Boolean

Purpose: This function allows the user to trace the application of templates.

If analysis_mode is true, tracing is turned on during analysis mode, and it is turned off for analysis mode otherwise; it is turned on for generate mode if generate_mode is true, and off otherwise.

Normal output: One of the following messages, depending on the arguments:

hc6566 "Morpher: Tracing of templates turned off for analysis and synthesis modes."

hc6567 "Morpher: Tracing of templates turned off for analysis mode and on for synthesis mode."

hc6568 "Morpher: Tracing of templates turned on for analysis mode and off for synthesis mode."

hc6569 "Morpher: Tracing of templates turned on for analysis and synthesis modes."

When tracing is turned on for templates, a template trace data structure is output each time a template matching the input is applied or unapplied (see Trace Data Structures, section 5.8).

Abnormal output:

There is no function-specific error checking.

Implementation notes: It is not an error for tracing to be turned on for templates when it was already turned on, or off if it was already turned off.

6.6.6.list_traced_morpher_rules

Summary: Returns a list of the names of all morpher rules being traced.

Argument: none

Normal output: A list of two lists, each sublist containing zero or more rule names. The first sublist is the list of rules being traced in analysis mode, and the second is the list of rules being traced in synthesis mode.

Abnormal output: There is no function-specific error checking.

See also: trace_morpher_rule (section 6.6.3)

6.6.7.trace_lexical_lookup

Summary: Turns on or off the tracing of lexical lookup; all the storable lexical entries into which the morpher analyzes the input word will appear in the trace data structure output by the function morph_and_lookup_word.

Argument: on (optional): Boolean (default false)

If the argument is true, tracing of lexical lookup is turned on; otherwise, it is turned off.

Purpose: If the morpher fails to analyze a word, this function can be used to determine what storable lexical entries the morpher attempts to look up.

Another possible use is to scan a text known to contain a number of words (i.e. roots or stems) not in the dictionary. The morpher would make a pass through the text in batch mode, and the unknown words would then be separated out (e.g. using grep to pull out all lines in the output containing the phrase "unknown word", then using awk to separate the unknown word itself). The unknown words are then sorted, duplicates removed, and the resulting list again passed through the morpher, this time with tracing of lexical lookup turned on. The result is a list of possible roots and stems for each unknown word, from which the correct ones can be manually selected for inclusion in the dictionary.
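The extraction step of this batch workflow (the part the text assigns to grep, awk, sorting and duplicate removal) could equally be done in a few lines of Python. The sketch below is illustrative only. It assumes that each unknown word is reported on its own line with wording like that of message hc6006 ("Unknown word: <printform>."); the pattern would need adjusting for a different message file, and the function name is hypothetical.

```python
import re

# Assumed wording, modelled on message hc6006; adjust for other message files.
UNKNOWN_WORD = re.compile(r"Unknown word: (.+?)\.\s*$")


def unknown_words(output_lines):
    """Return the sorted, duplicate-free list of words reported as unknown."""
    found = set()
    for line in output_lines:
        match = UNKNOWN_WORD.search(line)
        if match:
            found.add(match.group(1))
    return sorted(found)
```

The resulting list is what would be passed through the morpher a second time with tracing of lexical lookup turned on.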

Normal output: If the argument is true, message hc6527 "Morpher: Tracing of lexical lookup turned on." Otherwise, message hc6528 "Morpher: Tracing of lexical lookup turned off."

It is not an error to turn tracing off when it is already off, nor to turn it on when it is already on.

Abnormal output: There is no function-specific error checking.

6.6.8.trace_blocking

Summary: Turns on or off the tracing of blocking.

Argument: on (optional): Boolean (default false)

If the argument is true, tracing of blocking is turned on; otherwise, it is turned off.

Purpose: The user may use this function to follow the blocking of virtual lexical entries by real lexical entries listed in the lexicon.

When the tracing of blocking is turned on, the functions morph_and_lookup_word and generate_word output trace data structures before their normal output. For each morphological rule whose output is actually blocked by a stored lexical entry, a blocking record structure appears in the trace structure.

Normal output: If the argument is true, message hc6529 "Morpher: Tracing of blocking turned on." Otherwise, message hc6530 "Morpher: Tracing of blocking turned off."

It is not an error to turn tracing off when it is already off, nor to turn it on when it is already on.

Abnormal output: There is no function-specific error checking.

6.6.9.show_derivations

Summary: Shows all the morphological and phonological rules that applied to successfully derive a given word.

Argument: word (obligatory): a list consisting of a single token record, as output by the Preprocessor (see section 5.1, Input Data Format).

Purpose: This function shows how the input word was analyzed into one or more real lexical entries, and how the morphological and phonological rules applied in the analyses. Its output is similar to the trace data structure output by morph_and_lookup_word when tracing of lexical lookup and rule applications is turned on, but less voluminous: (1) Unsuccessful analyses are not shown; and (2) the input to each rule is not shown, since it is identical to the output of the preceding rule.

Normal Output: A list whose first member is the identifier derivations, and whose second member is a list of one or more derivations of the word which was the function's argument. Each derivation corresponds to an analysis which resulted in a complete unblocked lexical entry for the word, and is a list. The first member of that list is a sublist containing the real lexical entry which was looked up. For each rule which applied (vacuously or not) in a given derivation, the list will contain an additional sublist for that rule, consisting of the rule name followed by the lexical entry resulting from the application of that rule.

Abnormal Output:

hc6006 "Morpher error: Unknown word: <printform>.", where <printform> is the string which represents the (internal) printform of the word. (There was no successful morphing.)

Implementation note: The complete output of this function is likely to be more copious than helpful. The shell should therefore present the output selectively. For instance, the application of a phonological rule to a lexical entry only changes the latter's Phonetic Shape field, and therefore only the rule's name and this field should be displayed. (And even this field need not be displayed if it is unchanged.)
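One way a shell might present this output selectively is sketched below. The sketch assumes the list structure described under Normal output, represents each lexical entry as a Python dict, and uses the hypothetical key "shape" for the Phonetic Shape field. For brevity it treats every rule as if it were phonological, reporting only the rule name and the shape, and suppressing the shape when the rule left it unchanged.

```python
def summarize(derivations_output):
    """Condense the output of show_derivations.

    Returns one (starting shape, steps) pair per derivation, where each step
    is (rule name, new shape), or (rule name, None) if the shape is unchanged.
    """
    tag, derivations = derivations_output
    if tag != "derivations":
        raise ValueError("not a derivations list")
    summaries = []
    for derivation in derivations:
        # The first member is a sublist holding the real lexical entry.
        start = derivation[0][0]["shape"]
        shape = start
        steps = []
        for rule_name, entry in derivation[1:]:
            new_shape = entry["shape"]
            steps.append((rule_name, new_shape if new_shape != shape else None))
            shape = new_shape
        summaries.append((start, steps))
    return summaries
```

A fuller display would also need to show the other fields that a morphological rule can change; those are omitted here.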

6.6.10.show_morphings

Summary: Shows all the lexical lookups that were attempted.

Argument: word (obligatory): a list consisting of a single token record, as output by the Preprocessor (see section 5.1, Input Data Format).

Purpose: To show the possible roots that could underlie a given input word.

Note: This command may be superfluous, since much the same thing can be accomplished using morph_and_lookup_word with tracing of lexical lookup turned on.

6.6.11.show_default_morpher_feature_value

Summary: Shows the default feature-value for a given feature-name.

Argument: feature-name (obligatory): (atom) a feature name

Normal output: Message hc6524 "Morpher: Feature name <feature_name> has the default feature value <feature_value>.", where <feature_value> is the default value.

Abnormal output: There is no function-specific error checking.

Warnings: The morpher does not know whether the specified feature name is valid (i.e. is used anywhere in the grammar); if the user has not assigned a default value to a feature name, the morpher will assume the default value is the global default value, namely () (the empty set), regardless of whether that feature is actually used.

See also: assign_default_morpher_feature_value (section 6.1.11)

6.6.12.*trace_inputs*

Summary: Setting the *trace_inputs* variable determines whether the input field of rule application and unapplication traces is sent to the output.

Default: true

Possible values: Boolean.

Purpose: When this variable is true, the input field of each rule application and unapplication is output for all rules for which tracing is turned on. When this variable is false, the input fields of rule applications and unapplications are not shown. (The inputs of lexical lookup and strata traces are unaffected.) This may be useful to reduce the amount of text output if full tracing is turned on, since the input of each rule application or unapplication is redundant (being shown in the previous application or unapplication, or in the input to the stratum).

6.7. Miscellaneous Functions and Variables

6.7.1.*quit_on_error*

Summary: If the *quit_on_error* variable is true, when a command is executed that terminates abnormally, calling send_error_message, the morpher writes the error message to the current output. It then returns to the top-level read-eval-print loop, closing any files from which it has been reading input or to which it has been sending output. All further input comes from standard input, and all further output is sent to standard output. If the *quit_on_error* variable is false, when a command is executed that terminates abnormally, the morpher issues the error message but continues with the next command in the current input source, and continues writing output to the current output.

Note that *quit_on_error* is defined on a per-language basis, not globally.

Default: true

Possible values: true or false

6.7.2.load_msg_file

Summary: Loads a file of messages in some language convenient for the user.

Argument: file name (obligatory) (string)

Normal output: Message hc6514 "Morpher: Loaded message file <fname>.", where <fname> is the name of the file.

Abnormal output: Message hc6049 "Morpher error: Unable to load message file <fname>." This may be because the file does not exist, or because it was in the wrong format.

Note: The format for error messages in a file of messages follows the example below:

msg_text(hc6041, 'Morpher error: Natural class «name» is unknown.').

where « (ASCII 174) and » (ASCII 175) enclose the labels of the arguments, which will be replaced by the actual arguments when the message is output. The argument labels may appear in any order. Argument labels should not be omitted; if they are, the corresponding arguments will appear at the end of the message.

If no message file is loaded, the messages will be printed out in the form

<hc6041 name foobar>

This format will also be used for any messages not listed in the most recently loaded message file.
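The substitution behaviour described above can be sketched as follows. This is a hedged illustration, not the specified implementation: the function name format_message, the dict of loaded message texts, and the layout of the fallback form for messages with more than one argument are all assumptions.

```python
import re


def format_message(code, args, templates):
    """Render message `code` with `args` (a dict mapping labels to values).

    `templates` maps message codes to texts containing «label» placeholders,
    as loaded from a message file (hypothetical in-memory form).
    """
    text = templates.get(code)
    if text is None:
        # Message not listed in any loaded file: use the bare form.
        parts = [code] + ["%s %s" % (label, value) for label, value in args.items()]
        return "<" + " ".join(parts) + ">"
    used = set()

    def substitute(match):
        label = match.group(1)
        if label in args:
            used.add(label)
            return str(args[label])
        return match.group(0)  # a label with no argument is left untouched

    text = re.sub(r"«([^»]*)»", substitute, text)
    # Arguments whose labels were omitted from the text go at the end.
    extras = [str(value) for label, value in args.items() if label not in used]
    return " ".join([text] + extras)
```

For example, with the hc6041 text shown above and the argument name = foobar, the result is "Morpher error: Natural class foobar is unknown."; with no message file loaded, the same call yields <hc6041 name foobar>.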

6.7.3.open_language

Summary: Adds a language to the list of possible languages that the morpher variable *cur_lang* may refer to, and sets that variable to point to the new language.

Argument: name of the language (obligatory): string

Normal output: Message hc6510 "Morpher: Opened new language <lname>." All following commands refer to this named language until a new language is opened with this command, or until the variable *cur_lang* is changed.

Abnormal output: Message hc6045 "Morpher error: Language <lname> has already been opened." (The variable *cur_lang* retains whatever value it had before this command was issued.)

hc6046 "Morpher error: Invalid language name <lname>." (Note that there may be an implementation-specific limitation.)

6.7.4.initialization_complete

Summary: Indicates that the current language has been initialized.

Argument: none

Normal output: Message hc6525 "Morpher: Initialization complete."

Abnormal output: None. (Calling this function twice without closing and re-opening the language between times will not cause an error.)

Purpose: If an error occurs between the time a language has been opened (with open_language) and when this command is issued, while the variable *quit_on_error* is true, the language will be closed. This ensures against a language being partially initialized.

6.7.5.close_language

Summary: Removes a language from the list of languages that the morpher variable *cur_lang* may refer to. After this command, if the value of the variable *cur_lang* was the language which was closed, the variable’s new value is *unknown*; otherwise the variable retains its previous value.

Argument: name of the language (obligatory): string

Normal output: Message hc6511 "Morpher: Closed language <lname>; current language is <cur_lname>." where <lname> was the name given in this command’s argument, and <cur_lname> is the new value of the variable *cur_lang*. (See comments above concerning the new value of *cur_lang*.)

Abnormal output: hc6023 "Morpher error: Unknown language <lname>." (The specified language has not already been opened by the command open_language; the current language remains whatever it was before.)

Warning: While the variable *cur_lang* has the default value *unknown*, attempting any command other than setting *cur_lang*, redirecting input or output, or terminating the morpher will result in error message hc6036 "Morpher error: There is no current language."
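The interaction between open_language, close_language and the variable *cur_lang* can be summarized in a small sketch. It is illustrative only: the class name Languages and the method check_current are hypothetical, only message codes are returned, and treating the empty string as the sole invalid language name is an assumption (the real limitation is implementation-specific).

```python
UNKNOWN = "*unknown*"  # default value of *cur_lang*


class Languages:
    """Registry of opened languages (hypothetical stand-in)."""

    def __init__(self):
        self.opened = set()
        self.cur_lang = UNKNOWN

    def open_language(self, name):
        if not name:  # assumed validity test
            return "hc6046"
        if name in self.opened:
            return "hc6045"  # *cur_lang* keeps its previous value
        self.opened.add(name)
        self.cur_lang = name
        return "hc6510"

    def close_language(self, name):
        if name not in self.opened:
            return "hc6023"  # the current language is unchanged
        self.opened.remove(name)
        if self.cur_lang == name:
            self.cur_lang = UNKNOWN
        return "hc6511"

    def check_current(self):
        """Return hc6036 if there is no current language, else None."""
        return "hc6036" if self.cur_lang == UNKNOWN else None
```

Closing a language other than the current one leaves *cur_lang* untouched, so commands continue to apply to the current language.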

6.7.6.morpher_input_from_file

Summary: Loads a file of morpher commands. Commands are taken from the named file until end of file, bypassing any upstream modules (e.g. the preprocessor). Morpher_input_from_file commands may be nested.

Argument: fname (obligatory) (string)

Normal output: Message hc6508 "Morpher: Taking input from file <fname>." Any messages resulting from processing of the morpher commands in the file will follow this message. At end of file, message hc6521 "Morpher: Finished taking input from file <fname>."

Abnormal output: If the designated file is already open for output, the function send_error_msg is called with the message hc6030 "Morpher error: unable to open file <fname> for input, because it is already open for output."

Operating system errors (such as invalid file name) should be trapped and output as errors.

See also: *quit_on_error* (section 6.7.1)

6.7.7.morpher_output_to_file

Summary: Redirects all output (including error output) produced by processing morpher commands to a file or to standard output. If the file already exists, output is appended to the end. (It is assumed the shell will provide a means of deleting files if it is desired instead to overwrite them.) If output is currently to a file (not standard output), that file is closed. (Thus, only one file can be open for output at a time. This should not be a limitation, as a given file can be opened any number of times for output.)

Argument: fname (obligatory) (string). The special string "stdout" redirects output to standard output for the morpher module, i.e. to whatever module is downstream from the morpher.

Normal output: Message hc6522 "Morpher: Sending output to file <fname>." This message is sent to whatever the output is being sent to before the command takes effect. (There is no message when a file is closed.)

Abnormal output: If the designated file is already open for input, the function send_error_msg is called with the message hc6031 "Morpher error: unable to open file <fname> for output, because it is already open for input."

Operating system errors (such as invalid file name) should be trapped and the error message sent to standard output (not to the file). It is not an error to call morpher_output_to_file on a file which is already open for output.
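The redirection behavior described above can be sketched as follows. This is only an illustration in Python, not part of the specification: the class OutputRouter and its methods are hypothetical names, and falling back to standard output after a failed open is an assumption.

```python
import sys


class OutputRouter:
    """Hypothetical helper: tracks the single file open for morpher output."""

    def __init__(self):
        self._file = None  # None means output goes to standard output

    def write(self, text):
        stream = self._file if self._file is not None else sys.stdout
        stream.write(text + "\n")
        stream.flush()

    def redirect(self, fname):
        # The announcement is sent to the destination in effect *before*
        # the command takes effect.
        self.write(f"Morpher: Sending output to file {fname}.")
        if self._file is not None:
            self._file.close()  # only one output file is open at a time
            self._file = None
        if fname == "stdout":
            return
        try:
            # Append mode: an existing file is extended, never overwritten.
            self._file = open(fname, "a", encoding="utf-8")
        except OSError as err:
            # Trapped operating system errors go to standard output.
            sys.stdout.write(f"Morpher error: {err}\n")
```

Reopening the same file later simply appends to it, which is why having only one open output file at a time is not a practical limitation.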

6.7.8. terminate_morpher

Summary: Causes the process running the morpher to quit.

Argument: none

Normal output: none

Abnormal output: none

Implementation note: This function may be superfluous under some operating systems.

7. Morpher Rule Notation

Morpher rules are of two types: morphological rules and phonological rules.

7.1. Affix Templates

An Affix Template represents a sequence of Affix Slots which apply to a stem with a given part of speech in a given Stratum. The normal use will be to model inflectional morphology, particularly in inflecting languages which lend themselves to position class analysis.

The Stratum to which an Affix Template belongs is defined by the Stratum Property Setting Record which contains the Affix Template (see set_stratum, section 6.1.12). That is, all the templates for a given Stratum are loaded together with a single Stratum Property Setting Record. The intention is to ensure that there is no ambiguity in the use of the r_pos and r_subcat fields, since if Affix Templates were loaded individually, it would be unclear whether a given template should replace a previously loaded template.

An Affix Slot represents a block of affixes which are in complementary distribution and which, in some sense, appear in the same ‘position’. The reason for the hedging ("in some sense") and the scare quotes is that it may be possible for a prefix and a suffix (or more likely, a prefix and an infix) to compete for attachment to a given word, but obviously not to appear in the same position. (See Anderson 1992 page 131.)

Affix slots allow disjunctive application based on the morphosyntactic properties of the stem, not on its phonological properties. Each morphological rule may have subrules to enforce phonologically-based allomorphy. The assumption is that it is never necessary to distinguish between two affixes on the basis of both their morphosyntactic and their phonological properties.

It is permissible for an affix (rule name) to appear in more than one slot. For instance, the same person marking affix might be used for both subjects and objects, but the subject and object slots would be distinct.

Record Label: affix_template

Fields:

7.1.1. Template Name

Optionality: obligatory

Label: nm

Type: atom

Purpose: Template names are used strictly for purposes of identification during tracing.

7.1.2. Required Parts of Speech

Optionality: optional

Label: r_pos

Type: list of atoms

Contents: The names of parts of speech.

Purpose: This defines the parts of speech that the lexical entry which is the input to the rules of this template must belong to. The use of a list, rather than an atom, allows the use of more finely divided parts of speech (e.g. distinguishing among various subcategorizations of verbs by means of their parts of speech), while still allowing certain rules to apply to a general category (e.g. all verbs).

If this field is omitted, there is no requirement that the input belong to any particular part of speech.

7.1.3. Required Subcategorized Rules

Optionality: optional

Label: r_subcat

Type: list

Contents: A list of atoms, each of which is the name of a syntactic rule.

Purpose: The rule will apply to a lexical entry only if the lexical entry subcategorizes at least one of the rules in this list. This is useful for a template that requires that the stem to which it attaches have certain transitivity properties, e.g. object agreement.

If this field is omitted, there is no requirement that the input lexical entry subcategorize any particular rules.

7.1.4. Slots

Optionality: obligatory

Label: slots

Type: list of lists, each sublist being a list of atoms, each of which is the name of a realizational morphological rule.

Purpose: This defines the realizational inflectional rules which may apply to a stem of the given part of speech. The first realizational rule whose realizational features are a subset of the features to be realized of the derivation will be applied to the input lexical entry (see Definition of Application of an Affix Template, section 4.3).

Warning: Hermit Crab does not sort the realizational rules of a given slot from most specific to least specific; they are stored in the order given in this argument. Since they are disjunctively ordered, it is up to the user to ensure that they are in the correct order.
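The effect of slot ordering can be illustrated with a small Python sketch. It is not the Hermit Crab implementation: the function name select_slot_rules is hypothetical, features are simplified to sets of (feature, value) pairs, and skipping a slot in which no rule matches is an assumption (the governing definition is in section 4.3).

```python
def select_slot_rules(slots, rule_features, to_realize):
    """For each slot, pick the first rule whose realizational features
    are a subset of the features still to be realized.

    slots         -- list of lists of rule names (the 'slots' field)
    rule_features -- dict: rule name -> set of (feature, value) pairs
    to_realize    -- set of (feature, value) pairs for the derivation
    """
    chosen = []
    for slot in slots:
        for name in slot:
            if rule_features[name] <= to_realize:
                chosen.append(name)
                break  # disjunctive: later rules in this slot are not tried
    return chosen


features = {
    "3sg_pres": {("tense", "pres"), ("person", "3"), ("number", "sg")},
    "pres": {("tense", "pres")},
    "neg": {("polarity", "neg")},
}
target = {("tense", "pres"), ("person", "3"), ("number", "sg")}

# Most specific rule listed first: the 3sg form is selected.
print(select_slot_rules([["3sg_pres", "pres"], ["neg"]], features, target))
# General rule listed first: it pre-empts the more specific one.
print(select_slot_rules([["pres", "3sg_pres"], ["neg"]], features, target))
```

The second call shows why the user must list the rules of a slot from most specific to least specific.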

7.2. Morphological Rule Notation

Morphological rules are of three types: "ordinary" morphological rules (which attach an affixal morpheme), realizational rules (which realize a set of morphosyntactic features, and are typically inflectional rules), and compounding rules (including incorporation rules).

Note: The morphological rule notation may be augmented in future versions of Hermit Crab by adding additional fields, e.g. for indicating functional structure templates.

7.2.1. Ordinary (affixal) Morphological Rules

A morphological rule may have more than one subrule; such subrules apply disjunctively: the first subrule which can be applied applies, and no others do.

Record Label: mrule

Fields:

7.2.1.1. Rule Name

Optionality: obligatory

Label: nm

Type: atom

Purpose: Rule names are used to identify the rule which performed a given operation (for debugging), and to delete individual rules from the morpher's rule base.

Warnings: The morpher enforces uniqueness of morphological/phonological rule names: if two rules of the same name are loaded, the first one will be deleted. (This allows rules to be changed by loading a new version with the same Rule Name.) Note that morphological and phonological rules occupy the same namespace, i.e. it is not possible to have a phonological rule with the same name as a morphological rule.

See also: remove_morpher_rule (section 6.2.2); show_active_morph_rules (section 6.6.1); show_active_phon_rules (section 6.6.2)
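The replacement behavior for rule names might be modeled as below. This is a sketch under stated assumptions: RuleBase and its methods are hypothetical names, and it assumes that a later rule of either kind silently replaces an earlier rule of the same name, since both kinds share one namespace.

```python
class RuleBase:
    """Hypothetical in-memory rule store keyed by rule name."""

    def __init__(self):
        self._rules = {}  # name -> (kind, rule record)

    def load(self, kind, name, record):
        # One namespace for morphological and phonological rules: loading
        # a rule deletes any earlier rule of the same name.
        replaced = name in self._rules
        self._rules[name] = (kind, record)
        return replaced

    def remove(self, name):
        # True if a rule of that name was present and has been deleted.
        return self._rules.pop(name, None) is not None

    def names(self, kind):
        return [n for n, (k, _) in self._rules.items() if k == kind]
```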

7.2.1.2. Rule Stratum

Optionality: obligatory

Label: str

Type: atom

Contents: The name of one of the strata defined in the global variable *strata*.

Purpose: Tells which stratum the rule applies in. Unlike phonological rules, a given morphological rule may not apply in more than one stratum.

See also: Phonological Rule Notation—Rule Strata (section 7.3.2)

7.2.1.3. Blockability

Optionality: optional

Label: blockable

Type: Boolean

Default: true

Purpose: If the value of this field is true, the output (in the synthesis sense) of this rule can be blocked by an irregular form listed in the lexicon; otherwise not. Some highly productive derivational affixes are not blockable, e.g. the English suffix –ness: curiousness is a possible word even though the word curiosity exists.

See also: Lexical Entries and Lexical Lookup—Analyzable Word (section 3.6)

7.2.1.4. Multiple Application

Optionality: optional

Label: mult_applic

Type: integer

Default: 1

Purpose: By default, a morphological rule may apply only once. Rarely there are affixes which may be repeated (e.g. honorifics and causatives). By setting this field to an integer greater than one, this rule may be applied to a lexical entry up to that number of times.

Warning: This field should not be set to > 1 on a rule of null affixation, lest the morpher postulate unneeded forms.

See also: Complete Lexical Entries (section 3.5)

7.2.1.5. Required Parts of Speech

Optionality: optional

Label: r_pos

Type: list of atoms

Contents: The names of parts of speech.

Purpose: This defines the parts of speech that the lexical entry which is the input to the rule must belong to. The use of a list, rather than an atom, allows the use of more finely divided parts of speech (e.g. distinguishing among various subcategorizations of verbs by means of their parts of speech), while still allowing certain rules to apply to a general category (e.g. all verbs).

If this field is omitted, there is no requirement that the input belong to any particular part of speech.

7.2.1.6. Required Subcategorized Rules

Optionality: optional

Label: r_subcat

Type: list

Contents: A list of atoms, each of which is the name of a syntactic rule.

Purpose: The rule will apply to a lexical entry only if the lexical entry subcategorizes at least one of the rules in this list. This is useful for an affix that requires that the stem to which it attaches have certain transitivity properties. (For instance, the English prefix un– attaches only to transitive verbs: uncover, but *unsleep.)

If this field is omitted, there is no requirement that the input lexical entry subcategorize any particular rules.

7.2.1.7. Required Head Features

Optionality: optional

Label: r_hf

Type: list-valued features list

Purpose: This list gives the Features with which the Head Features of the lexical entry input to this rule must be unifiable in order to undergo the rule. The unification of these features with those of the input lexical entry is available for percolation to the output lexical entry. (Note that an uninstantiated Head Feature on the lexical entry is considered to unify with these features, unless the feature has a default value which does not unify; see Definition of Feature Unification, section 4.2.2.)

If this field is omitted, there are no required Head Features.

See also: Required Foot Features (section 7.2.1.8); Required MPR Features (section 7.2.1.15.1.4.2); Output Obligatory Features (section 7.2.1.13)

7.2.1.8. Required Foot Features

Optionality: optional

Label: r_ff

Type: list-valued features list

Purpose: This list gives the Foot Features with which the Foot Features of the lexical entry input to this rule must be unifiable in order to undergo the rule. The unification of these features with those of the input lexical entry is available for percolation to the output lexical entry.

If this field is omitted, there are no required Foot Features.

See also: Required Head Features (section 7.2.1.7); Required MPR Features (section 7.2.1.15.1.4.2); Output Obligatory Features (section 7.2.1.13)

7.2.1.9. Output Part of Speech

Optionality: optional

Label: pos

Type: atom

Purpose: This gives the Part of Speech which the virtual lexical entry output (in the generation sense) by this rule will belong to.

If this field is omitted, the Part of Speech of the output of the rule is the same as the Part of Speech of the input.

Note that unlike the Required Parts of Speech field, this is an atom, not a list.

7.2.1.10. Output Subcategorization

Optionality: optional

Label: sub

Type: list

Contents: A list of atoms and/or lists. Each atom is the name of a syntactic rule; each sublist contains one or two atoms, which are names of syntactic rules.

Purpose: This defines the subcategorization of the lexical entry output by this rule. If this field is present, the subcategorization of the output lexical entry consists of:

all the atomic members of this list; plus:

for each list member of this field with length two, the second rule name of the sublist if the first rule name of the sublist was a member of the input lexical entry's subcategorization list; plus

all the members of the input lexical entry’s subcategorization list which are not present in this list.

(In order to prevent a rule name in the input lexical entry’s subcategorization list from appearing in the output lexical entry, that rule name should appear in a sublist of length one in this field.)

If the value of this field is the empty list, the subcategorization of the lexical entry output by this rule is empty.

If this field is omitted, the subcategorization list of the output of this rule is the same as the subcategorization of the input lexical entry.

Warning: The morpher does not check that any of the atoms in this list or in its sublists are names of actual syntactic (parser) rules.
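The computation of the output subcategorization can be sketched in Python as follows. The function name output_subcat is hypothetical, None stands for an omitted field, and a rule name is treated as "present in this list" if it is mentioned anywhere in the field, including inside a sublist; that reading is an assumption.

```python
def output_subcat(sub, input_subcat):
    """Sketch of the Output Subcategorization computation.

    sub          -- the 'sub' field: atoms (str) and sublists of one or
                    two atoms; None if the field is omitted
    input_subcat -- list of rule names on the input lexical entry
    """
    if sub is None:        # field omitted: subcategorization is unchanged
        return list(input_subcat)
    if not sub:            # empty list: output subcategorization is empty
        return []

    atoms = [m for m in sub if isinstance(m, str)]
    sublists = [m for m in sub if not isinstance(m, str)]

    result = list(atoms)
    mentioned = set(atoms)
    for pair in sublists:
        mentioned.update(pair)
        # A sublist [a, b] contributes b only if the input had a; a
        # sublist of length one merely suppresses its member.
        if len(pair) == 2 and pair[0] in input_subcat:
            result.append(pair[1])
    # Input rule names not mentioned in the field are carried over.
    result.extend(r for r in input_subcat if r not in mentioned)
    return list(dict.fromkeys(result))  # drop duplicates, keep order


print(output_subcat([["trans", "intrans"], ["ditrans"]],
                    ["trans", "ditrans", "x"]))
```

Here the input's trans is rewritten as intrans, ditrans is suppressed, and x is carried over unchanged.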

7.2.1.11. Output Head Features

Optionality: optional

Label: hf

Type: list-valued feature list

Purpose: This lists the Head Features which the morphological rule adds to the lexical entry output by the rule. In case of conflict between the features specified in this list and Head Features passed up from lexical entries from which this lexical entry is built, the features in this list "win": the features specified in this list override any values percolated up from the stem on which this lexical entry is built.

If this field is omitted, the rule does not assign any additional Head Features to the lexical entry output by the rule.

7.2.1.12. Output Foot Features

Optionality: optional

Label: ff

Type: list-valued feature list

Purpose: This defines the Foot Features which the morphological rule adds to the lexical entry output by the rule. In case of conflict between the features specified in this list and Foot Features passed up from the lexical entry from which this lexical entry is built, the features in this list "win": the features specified in this list override any values percolated up from the stem.

If this field is omitted, the rule does not assign any additional Foot Features to the lexical entry output by the rule.

7.2.1.13. Output Obligatory Features

Optionality: optional

Label: of

Type: list

Contents: a list of atoms, each of which is the name of a Head Feature

Purpose: The atoms in this list are added to the Obligatory Features field of the lexical entry output by this rule (see Lexical Entry Record Structure—Obligatory Head Features, section 5.2.13). This field encodes the requirement that for each feature-name listed, some value must be assigned to that feature by the end of the derivation (i.e. by another affix or by percolation from the stem to which this affix is attached). If at the end of the derivation (in the synthesis sense), no value has been assigned to such a feature, the derivation is ruled out. (This field only requires that some value for a feature be assigned by the end of the derivation, whereas the Required Head Features field (section 7.2.1.7) requires that a specific value (or values) be present in the lexical entry to which this rule applies.)

Example: Suppose that in some language, the addition of a present tense suffix to a verb means that a person suffix must also be added. Then the rule attaching the present tense suffix should contain the feature name person in its Obligatory Features list.
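The end-of-derivation test implied by this field might look like the following Python sketch. The function name and the representation of Head Features as a dict from feature name to a list of values are assumptions made for illustration; an empty list is treated as "no value assigned".

```python
def unsatisfied_obligatory(obligatory, head_features):
    """Return the obligatory feature names that still lack a value.

    obligatory    -- list of Head Feature names (the accumulated 'of' lists)
    head_features -- dict: feature name -> list of values on the final entry
    """
    return [name for name in obligatory if not head_features.get(name)]


# The present tense suffix was attached, but no person suffix followed:
print(unsatisfied_obligatory(["person"], {"tense": ["pres"]}))
# A person suffix supplied a value, so the derivation survives:
print(unsatisfied_obligatory(["person"], {"tense": ["pres"], "person": ["3"]}))
```

A derivation would be ruled out whenever the returned list is non-empty.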

7.2.1.14. Gloss String

Optionality: optional

Label: gl

Type: string

Contents: The gloss of the morpheme attached by this rule.

Purpose: This Gloss String will be appended to the right of the Gloss field of the input lexical entry, with a space as a separator.

If this field is empty, the morpher supplies the default string "?".

7.2.1.15. Subrules

Optionality: obligatory

Label: subrules

Type: List of one or more Morphological Subrule structures (defined below); each sublist represents a subrule.

Purpose: The subrules apply disjunctively; each subrule is tried beginning with the first rule of the list until one applies or the end of the list is reached. Only one subrule may apply. The subrules are intended to represent allomorphy.

7.2.1.15.1. Morphological Subrule Structure

The Morphological Subrule defines the specific phonological environment in which the morphological rule applies (including any restrictions due to rule features), and the output of the rule. If a morphological rule has more than one subrule, the subrules apply to a stem disjunctively; that is, the first subrule whose structural description is met applies, and no others do.

Record Label: msub

Fields:

7.2.1.15.1.1. Input Side

Optionality: obligatory

Label: m_lhs

Type: Input Side Record Structure (defined below)

Contents: The left-hand side of the rule.

Purpose: This record structure defines what a linguist would think of as the input side of the rule. In reality, it defines the input as the rule is used for synthesis, and the output as the rule is used for analysis.

This field is defined in more detail below (section 7.2.1.15.1.4).

7.2.1.15.1.2. Output Side

Optionality: obligatory

Label: m_rhs

Type: output side record structure (defined below)

Contents: The right-hand side of the rule.

Purpose: This represents what a linguist would think of as the output of the rule. In reality, it defines the output of the rule only as the rule is used for synthesis; it represents input as the rule is used for analysis.

This field is defined in more detail below (section 7.2.1.15.1.5).

7.2.1.15.1.3. Variable Features

Optionality: optional

Label: var_fs

Type: list

Content: Each odd-numbered member of the list is the name (atom) of an alpha variable. Each even-numbered member is the name (atom) of a phonetic feature.

Purpose: This lists the alpha variables which may appear in the subrule, and assigns a feature name to each. The use of the name of an alpha variable later in the rule (inside a Natural Class) indicates agreement (if the name is followed by the atom +) or disagreement (if the name is followed by the atom –) with the value of the alpha variable elsewhere in the rule.

There is no provision for using the same alpha variable name with different features in different parts of the rule.

Warnings: An alpha variable which appears zero or one times in a rule will have no effect, since no agreement could be enforced. Hermit Crab does not check for this.

An alpha variable must be instantiated before it can be used to assign a value to a feature. Normally, this will be accomplished by enforcing agreement between a feature in the input of the rule and one in the output; alpha variables should not be used to enforce agreement between two parts of the output.
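The layout of the Variable Features list, and the check which Hermit Crab itself does not perform, can be sketched in Python. Both function names are hypothetical, and the list of variable occurrences passed to the second function is assumed to have been collected from the rule's Natural Classes by some other means.

```python
def pair_alpha_variables(var_fs):
    """Turn the flat var_fs list into a dict: variable name -> feature name.

    Odd-numbered members (1st, 3rd, ...) are alpha variable names;
    even-numbered members are phonetic feature names.
    """
    if len(var_fs) % 2 != 0:
        raise ValueError("var_fs must alternate variable and feature names")
    mapping = {}
    for var, feature in zip(var_fs[0::2], var_fs[1::2]):
        # One variable name cannot stand for two different features.
        if var in mapping and mapping[var] != feature:
            raise ValueError(f"alpha variable {var!r} bound to two features")
        mapping[var] = feature
    return mapping


def underused_variables(mapping, uses):
    """Variables used fewer than two times enforce no agreement."""
    return [var for var in mapping if uses.count(var) < 2]


table = pair_alpha_variables(["alpha", "back", "beta", "round"])
print(table)
print(underused_variables(table, ["alpha", "alpha", "beta"]))
```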

7.2.1.15.1.4. Input Side Record Structure

Record Label: m_lhs

Fields:

7.2.1.15.1.4.1. Required Phonetic Input

Optionality: obligatory

Label: pseq

Type: A list of lists, each sublist of which is a Phonetic Sequence (as defined in section 5.7.2.5)

Purpose: This field represents the required phonetic shape of the stem which is to be affected by the rule. The entire length of the stem must match against this required shape. Variables (i.e. an optional sequence which can appear an indefinite number of times; see Phonetic Sequence—Definition of an Optional Segment Sequence, section 5.7.2.4.3) may be used to match against those portions of the stem whose phonetic form is irrelevant.

With each sublist of the list is associated a unique integer beginning with one at the left end of the list and increasing up to the length of the list. In other words, it is as if the sublists were numbered sequentially. These numbers are used in the output of the rule to represent the corresponding elements of the input. (Note that the use of sublists allows a single number to correspond to a sequence of boundary markers, simple contexts, and optional segment sequences, thereby potentially standing for some hierarchical structure, such as a syllable.)

See also: Output Side Record Structure—Phonetic Output (section 7.2.1.15.1.5.1)

7.2.1.15.1.4.2. Required MPR Features

Optionality: optional

Label: r_rf

Type: list

Contents: Each member of the list is the name (an atom) of a Morphological/ Phonological Rule (MPR) feature.

Purpose: This encodes positive rule feature requirements, such as conjugation class membership or gender.

In order for this rule to apply to a lexical entry, the lexical entry must contain in its MPR Features list all the feature names of this list.

If this field is omitted, there are no required MPR Features.

See also: Excluded MPR Features (section 7.2.1.15.1.4.3)

7.2.1.15.1.4.3. Excluded MPR Features

Optionality: optional

Label: x_rf

Type: list

Contents: Each member of the list is the name (an atom) of a Morphological/ Phonological Rule (MPR) feature.

Purpose: This encodes negative rule feature requirements, such as conjugation class membership or gender.

In order for this rule to apply to a lexical entry, the lexical entry must not contain in its MPR Features list any of the feature names of this list.

If this field is omitted, there are no excluded MPR features.

Warning: The names in the Required MPR Features list and this list should be mutually exclusive. The morpher does not check for this.

See also: Required MPR Features (section 7.2.1.15.1.4.2)
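Taken together, the Required and Excluded MPR Features amount to a pair of set tests, sketched here in Python. The function names are hypothetical; the second function performs the consistency check that the morpher itself omits.

```python
def mpr_features_allow(entry_mpr, required=(), excluded=()):
    """True if a lexical entry's MPR features satisfy a subrule.

    All required names must be present on the entry, and none of the
    excluded names may be. Omitted fields impose no restriction.
    """
    entry = set(entry_mpr)
    return set(required) <= entry and entry.isdisjoint(excluded)


def conflicting_names(required=(), excluded=()):
    """Names listed as both required and excluded (a rule-writing error)."""
    return sorted(set(required) & set(excluded))


print(mpr_features_allow(["conj1", "fem"], required=["conj1"], excluded=["masc"]))
print(mpr_features_allow(["conj1", "masc"], excluded=["masc"]))
print(conflicting_names(["conj1", "fem"], ["fem"]))
```

A subrule whose two lists share a name can never apply, which is why the lists should be kept mutually exclusive.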

7.2.1.15.1.5. Output Side Record Structure

Record Label: m_rhs

Fields:

7.2.1.15.1.5.1. Phonetic Output

Optionality: obligatory

Label: p_out

Type: list

Contents: Each member of this list is:

an integer,

a Simple Context,

a list of length two whose first member is an integer and whose second member is a Simple Context, or

a list of length two whose first member is a string and whose second member is the name (atom) of a character definition table.

The interpretation of each such member is as follows:

Integer: An integer N implies that the stretch of the Phonetic Shape of the input which matches the Nth member of the Required Phonetic Input field of this rule should be copied to the output in this position. (The members of the list which constitutes the Required Phonetic Input of the morphological rule—or the lists, if there are two Input records for this rule—are implicitly numbered from one to the length of the list (or to the combined lengths of the two lists; see Input Side Record Structure—Required Phonetic Input, section 7.2.1.15.1.4.1). It is an error for an integer member of this Phonetic Output field to be larger than the length of that Phonetic Input list.)

Simple Context: A member of the Phonetic Output list which is a simple context is inserted into the output phonetic sequence at this position. The simple context given in this rule may underspecify the segment, so long as the segment is no longer underspecified by the time it must be translated into a character string, i.e. when the derivation is complete (in the generation sense). For instance, in a language with vowel harmony, a simple context might insert a high vowel without specifying the roundness or backness of that vowel—in essence, inserting an archiphoneme. But this archiphoneme must become a fully specified phoneme through the application of phonological rules by the time the derivation is complete.

List of integer plus Simple Context: the integer corresponds to one of the members of the Required Phonetic Input list (in the same way as an integer member of this field, see above). The interpretation is that the matching stretch of the Phonetic Shape of the input to this rule is copied to the output in this position, but with the values of the phonetic features given in the Simple Context substituted for the features of the same name (if any) in each segment of the matched stretch of the Phonetic Shape of the input. If the member of the Required Phonetic Input of the rule to which this member of the Phonetic Output list corresponds is an optional segment sequence, it may match multiple segments in the phonetic input; the features are assigned to all such segments. If the optional sequence to which this list corresponds matches zero segments in the input, the features of this list are ignored.

List of string plus name of a character definition table: The string is translated into a sequence of segments using the specified character definition table, and inserted into the output of the rule at this position.

Purpose: This field represents the changes in phonetic form caused by the application of the rule.

See also: Input Side Record Structure—Required Phonetic Input (section 7.2.1.15.1.4.1)
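The four kinds of Phonetic Output member can be illustrated with a Python sketch of how the output sequence is assembled in synthesis. Everything here is a simplifying assumption rather than the specified data model: segments and Simple Contexts are plain dicts of feature values, the matched stretches are supplied by the caller, and each character definition table maps exactly one character to one segment.

```python
def build_output(p_out, matched, tables):
    """Assemble the output segments from a p_out list.

    p_out   -- list of members: int, dict (Simple Context),
               [int, dict], or [str, table name]
    matched -- dict: N -> list of segments matched by the Nth member
               of the Required Phonetic Input
    tables  -- dict: table name -> {character: segment dict}
    """
    out = []
    for member in p_out:
        if isinstance(member, int):
            # Copy the matched stretch unchanged.
            out.extend(dict(seg) for seg in matched[member])
        elif isinstance(member, dict):
            # Insert a (possibly underspecified) segment.
            out.append(dict(member))
        else:
            first, second = member
            if isinstance(first, int):
                # Copy the stretch, overriding the listed features on every
                # segment; an empty stretch contributes nothing.
                out.extend({**seg, **second} for seg in matched[first])
            else:
                # Translate a string through the named character table.
                out.extend(dict(tables[second][ch]) for ch in first)
    return out


matched = {1: [{"cons": "+", "voice": "-"}, {"cons": "-"}], 2: []}
tables = {"t": {"u": {"seg": "u"}, "n": {"seg": "n"}}}
p_out = [["un", "t"], [1, {"voice": "+"}], [2, {"nasal": "+"}], {"high": "+"}]
print(build_output(p_out, matched, tables))
```

The third member refers to an optional sequence that matched zero segments, so its features are ignored, as described above.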

7.2.1.15.1.5.2. Morphological/ Phonological Rule Features

Optionality: optional

Label: rf

Type: list

Contents: zero or more atoms, each of which is the name of a Morphological/ Phonological Rule (MPR) feature.

Purpose: This field lists the mp-rule features added to the output lexical entry by this morphological rule.

If this field is omitted, the rule does not assign any additional mp-rule features to the lexical entry output by the rule.

See also: Output Head Features (section 7.2.1.11)

7.2.2. Realizational Rules

A Realizational Rule is one which realizes a given set of morphosyntactic features for a certain part of speech. A set of Realizational Rules belongs to a given Slot of an Affix Template; hence all the Realizational Rules of a given Template apply to the same part of speech. As such they do not specify an Input Part of Speech, nor do they specify morphosyntactic properties of their output.

The specification of a stratum for a Realizational Rule is redundant, inasmuch as the assignment of an Affix Template to a Stratum carries with it the names of the realizational rules which will be applied. However, the redundancy is retained to ensure that a given Realizational Rule is used in only one stratum.

In order for a Realizational Rule to be used, it must first be loaded, then added to at least one affix template of its stratum.

Record Label: rz_rule

Fields:

The fields of a Realizational Rule are identical to those of an ordinary Morphological Rule, except that the following fields may not appear: Required Parts of Speech; Required Subcategorized Rules; Output Part of Speech; Output Subcategorization; Output Head Features; Output Foot Features; Output Obligatory Features; and Multiple Application.

In addition, there is another field, Realizational Features:

7.2.2.1. Realizational Features

Optionality: obligatory

Label: rz_f

Type: list-valued features list

Purpose: This list gives the morphosyntactic features of which the Realizational Features of the input to this rule must be a superset in order to undergo the rule.

7.2.3. Compounding Rules

Compounding Rules are variants of ordinary Morphological Rules, with several differences. The subsections below list the fields; in most cases they are the same as the corresponding fields of ordinary Morphological Rules, and the reader is referred there for a description.

Record Label: comp_rule

Fields:

7.2.3.1. Rule Name

Same as for ordinary Morphological Rules.

7.2.3.2. Rule Stratum

Same as for ordinary Morphological Rules.

7.2.3.3. Multiple Application

Same as for ordinary Morphological Rules. (However, the default value of one will probably always be correct, so there should be little if any need for this field in Compounding Rules.)

7.2.3.4. Blockability

Same as for ordinary Morphological Rules.

7.2.3.5. Head Part of Speech

Optionality: optional

Label: head_pos

Type: list of atoms

Contents: The names of parts of speech.

If this field is omitted, there is no requirement that the head belong to any particular part of speech.

7.2.3.6. Non-Head Part of Speech

Optionality: optional

Label: nonhead_pos

Type: list of atoms

Contents: The names of parts of speech.

If this field is omitted, there is no requirement that the non-head belong to any particular part of speech.

7.2.3.7. Head Subcategorized Rules

Optionality: optional

Label: head_subcat

Type: list

Contents: A list of atoms, each of which is the name of a syntactic rule.

Purpose: The rule will apply to a lexical entry as head only if the lexical entry subcategorizes at least one of the rules in this list. This is useful for instance for an incorporation rule that requires that the verb be transitive.

If this field is omitted, there is no requirement that the head lexical entry subcategorize any particular rules.

7.2.3.8. Non-Head Subcategorized Rules

Optionality: optional

Label: nonhead_subcat

Type: list

Contents: A list of atoms, each of which is the name of a syntactic rule.

Purpose: The rule will apply to a lexical entry as non-head only if the lexical entry subcategorizes at least one of the rules in this list.

If this field is omitted, there is no requirement that the non-head lexical entry subcategorize any particular rules.

7.2.3.9. Head Required Head Features

Optionality: optional

Label: head_r_hf

Type: list-valued features list

Purpose: This list gives the Features with which the Head Features of the head lexical entry must be unifiable in order for it to undergo the rule. The unification of these features with those of the input lexical entry is available for percolation to the output lexical entry. (Note that an uninstantiated Head Feature on the lexical entry is considered to unify with these features, unless the feature has a default value which does not unify; see section 4.2.2, Definition of Feature Unification.)

If this field is omitted, there are no required Head Features for the head lexical entry.

7.2.3.10. Non-Head Required Head Features

Optionality: optional

Label: nonhead_r_hf

Type: list-valued features list

Purpose: This list gives the Features with which the Head Features of the non-head lexical entry must be unifiable in order for it to undergo the rule.

If this field is omitted, there are no required Head Features for the non-head lexical entry.

7.2.3.11. Head Required Foot Features

Optionality: optional

Label: head_r_ff

Type: list-valued features list

Purpose: This list gives the Foot Features with which the Foot Features of the head lexical entry must be unifiable in order to undergo the rule. The unification of these features with those of the input lexical entry is available for percolation to the output lexical entry.

If this field is omitted, there are no required Foot Features for the head lexical entry.

7.2.3.12. Non-Head Required Foot Features

Optionality: optional

Label: nonhead_r_ff

Type: list-valued features list

Purpose: This list gives the Foot Features with which the Foot Features of the non-head lexical entry must be unifiable in order to undergo the rule.

If this field is omitted, there are no required Foot Features for the non-head lexical entry.

7.2.3.13. Output Part of Speech

Same as for ordinary Morphological Rules.

7.2.3.14. Output Subcategorization

Same as for ordinary Morphological Rules.

7.2.3.15. Output Head Features

Same as for ordinary Morphological Rules.

7.2.3.16. Output Foot Features

Same as for ordinary Morphological Rules.

7.2.3.17. Output Obligatory Features

Same as for ordinary Morphological Rules.

7.2.3.18. Subrules

Same as for ordinary Morphological Rules, except that the members of the list composing this field are Compound Subrules (defined below).

7.2.3.18.1. Compound Subrules

Compound Subrules serve as the subrules of Compounding Rules.

A Compound Subrule has a Head record, a Non-Head record, and an Output Side record. The Non-Head record's Required Phonetic Input field is implicitly numbered beginning with the next integer following the last number of the Head record's Required Phonetic Input field.
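The implicit numbering across the two input records can be sketched in Python. The function name is hypothetical, and each Phonetic Sequence is represented simply as a list.

```python
def number_compound_inputs(head_pseq, nonhead_pseq):
    """Number the Required Phonetic Input members of a compound subrule.

    The Head record's sublists are numbered from 1; the Non-Head
    record's sublists continue from the next integer.
    """
    numbered = {}
    for i, seq in enumerate(head_pseq, start=1):
        numbered[i] = ("head", seq)
    for i, seq in enumerate(nonhead_pseq, start=len(head_pseq) + 1):
        numbered[i] = ("nonhead", seq)
    return numbered


print(number_compound_inputs([["X"], ["Y"]], [["Z"]]))
```

With two Head sublists and one Non-Head sublist, an integer 3 in the Phonetic Output therefore refers to the Non-Head's first sublist.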

Record Label: comp_subrule

Fields:

7.2.3.18.1.1. Head Record Structure

Record Label: head

Fields: The fields of the Head record are identical to those of the Input Side Record for an ordinary morphological rule (see section 7.2.1.15.1.4).

7.2.3.18.1.2. Non-Head Record Structure

Record Label: nonhead

Fields: The fields of the Non-Head record are identical to those of the Input Side Record for an ordinary morphological rule (see section 7.2.1.15.1.4), except that the following fields cannot appear: Required MPR Features and Excluded MPR Features.

7.2.3.18.1.3. Output Side Record Structure

Record Label: c_rhs

Fields: The Output Side Record Structure is identical to the field of the same name in an ordinary morphological rule.

7.2.3.18.1.4. Variable Features

Same as for ordinary Morphological Rules.

7.3. Phonological Rule Notation

Phonological rules are those rules which modify the phonological structure of a lexical entry without changing its meaning or grammatical features. In a structuralist grammar, they would include both morphophonemic and allophonic rules. (If the input to the morpher is in a true phonemic orthography, there would be no need for allophonic rules in the morpher.)

Phonological rules may undergo multiple application (see Phonetics of Phonological Rule Application—Definition of Phonetics of Multiple Application of a Phonological Rule, section 4.4.1.3).

Phonological rules may be defined to have more than one subrule, in which case they act as disjunctive rules. Only one such subrule may apply to a given segment in a phonetic sequence (again see Phonetics of Phonological Rule Application—Definition of Phonetics of Multiple Application of a Phonological Rule, section 4.4.1.3).

Record Label: prule

Fields:

7.3.1. Rule Name

Optionality: obligatory

Label: nm

Type: atom

Purpose: Rule names are used to identify the rule which performed a given operation (for debugging), and to delete individual rules from the morpher's rule base.

The rule name refers to the set of disjunctive rules, not to the individual (sub-)rules.

Warnings: The morpher enforces uniqueness of phonological/morphological rule names: if two rules of the same name are loaded, the first one will be deleted. (This allows rules to be changed by loading a new version with the same rule name.) Note that morphological and phonological rules occupy the same namespace, i.e. it is not possible to have a phonological rule with the same name as a morphological rule.

See also: remove_morpher_rule (section 6.2.2); show_active_phon_rules (section 6.6.2)
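The naming behavior described in the warning can be sketched in Python as a single table keyed by rule name. This is an illustration of the shared namespace, not Hermit Crab's implementation; the class and method names are hypothetical, and the rule bodies are placeholder strings.

```python
class RuleBase:
    """Toy rule base: one namespace shared by morphological and
    phonological rules (hypothetical names throughout)."""

    def __init__(self):
        self._rules = {}  # rule name -> (kind, definition)

    def load(self, name, kind, definition):
        """Load a rule; an earlier rule of the same name is deleted,
        whatever its kind. Returns the deleted rule, or None."""
        replaced = self._rules.pop(name, None)
        self._rules[name] = (kind, definition)
        return replaced

    def remove(self, name):
        """Delete a rule by name; returns True if it existed."""
        return self._rules.pop(name, None) is not None

    def get(self, name):
        return self._rules.get(name)


rules = RuleBase()
rules.load("voicing", "prule", "version 1")
old = rules.load("voicing", "prule", "version 2")  # replaces version 1
```

Because the table has one key per name, loading a morphological rule called voicing would likewise displace the phonological rule of that name.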

7.3.2. Rule Strata

Optionality: obligatory

Label: str

Type: list

Content: Each member of the list is the name of a stratum, as defined in the global variable *strata*. They are ordered from deepest to shallowest.

Purpose: This lists which strata the rule applies in.

See also: Morphological Rule Notation—Rule Stratum (section 7.2.1.2)

7.3.3. Variable Features

Optionality: optional

Label: var_fs

Type: list

Content: Each odd-numbered member of the list is the name (atom) of an alpha variable. Each even-numbered member is the name (atom) of a phonetic feature.

Purpose: This lists the alpha variables which may appear in the rule, and assigns a feature name to each. The use of the name of an alpha variable later in the rule (inside a Natural Class) indicates agreement (if the name is followed by the atom +) or disagreement (if the name is followed by the atom –) with the value of the alpha variable elsewhere in the rule.

There is no provision for using the same alpha variable name with different features in different parts of the rule. Thus, it would not be possible to write a rule in which one segment assimilates in the value of the feature voiced with the value of the feature sonorant in some other segment.

Warning: An alpha variable which appears zero or one times in the body of a rule will have no effect, since no agreement could be enforced. Hermit Crab does not check for this.
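As an illustration, the Python sketch below pairs up the members of a Variable Features list and reports alpha variables used fewer than twice. Hermit Crab itself performs no such check, so this is the kind of test a rule writer might run separately. The function names are hypothetical, and the list of uses is assumed to have been collected from the rule body by some other means.

```python
from collections import Counter


def parse_var_fs(var_fs):
    """Pair each odd-numbered member (alpha variable name) with the
    even-numbered member that follows it (phonetic feature name)."""
    if len(var_fs) % 2 != 0:
        raise ValueError("var_fs must contain variable/feature pairs")
    mapping = {}
    for name, feature in zip(var_fs[0::2], var_fs[1::2]):
        if name in mapping:
            # One alpha variable cannot stand for two features.
            raise ValueError(f"alpha variable {name!r} listed twice")
        mapping[name] = feature
    return mapping


def underused_alpha_variables(var_fs, uses):
    """Return the declared variables that occur zero or one times in
    `uses`, the variable names found in the body of the rule."""
    counts = Counter(uses)
    return [name for name in parse_var_fs(var_fs) if counts[name] < 2]


declared = ["alpha", "voiced", "beta", "nasal"]
pairs = parse_var_fs(declared)
weak = underused_alpha_variables(declared, ["alpha", "alpha", "beta"])
```

Here beta is flagged because it occurs only once in the rule body, so it cannot enforce agreement with anything.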

7.3.4. Multiple Application Order

Optionality: optional

Label: mult_applic

Type: atom: lr_iterative, simultaneous, or rl_iterative

Default: lr_iterative

Purpose: This defines the way the rule will behave if its structural description is met more than once in a given lexical entry.

See also: Definition of Phonetics of Multiple Application of a Phonological Rule (section 4.4.1.3)
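The difference between the three values can be illustrated with a toy Python sketch. It assumes the conventional meanings of the terms (iterative application feeds itself while scanning in one direction; simultaneous application evaluates every position against the unchanged input). The authoritative definition remains section 4.4.1.3. The rule, hard-coded here, rewrites a as b when the preceding segment is b; the function name is hypothetical.

```python
def apply_toy_rule(word, order="lr_iterative"):
    """Apply the toy rule  a -> b / b _  to `word` (synthesis direction)."""
    segs = list(word)

    def matches(source, i):
        # Structural description: an "a" immediately preceded by "b".
        return i > 0 and source[i] == "a" and source[i - 1] == "b"

    if order == "simultaneous":
        # Every position is tested against the original string.
        original = segs[:]
        return "".join("b" if matches(original, i) else s
                       for i, s in enumerate(original))
    if order == "lr_iterative":
        positions = range(len(segs))
    elif order == "rl_iterative":
        positions = range(len(segs) - 1, -1, -1)
    else:
        raise ValueError(f"unknown order: {order}")
    for i in positions:
        # Each test sees the changes made at earlier positions.
        if matches(segs, i):
            segs[i] = "b"
    return "".join(segs)
```

On the input baaa, left-to-right iteration spreads the change through the whole word, while the other two orders change only the first a, since nothing to its right has a b before it when that position is examined.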

7.3.5. Phonetic Input Sequence

Optionality: obligatory

Label: in_pseq

Type: a variable-free phonetic sequence

Purpose: This defines the phonetic input to the phonological rule.

If the input is null (i.e. a rule of epenthesis), the input sequence is represented by the empty list. The phonetic input field is shared by all the subrules.

Restrictions: Either the Phonetic Input Sequence or the Phonetic Output Sequence must be a list of length zero or one, or else both must have the same length.

See also: Output Side Record Structure—Phonetic Output (section 7.3.7.4)
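The length restriction can be stated as a one-line test. The Python sketch below is illustrative; the function name is hypothetical, and sequences are represented as plain lists of placeholder segments.

```python
def lengths_allowed(in_pseq, output_pseq):
    """True if either sequence has length zero or one, or if both
    sequences have the same length."""
    n_in, n_out = len(in_pseq), len(output_pseq)
    return n_in <= 1 or n_out <= 1 or n_in == n_out
```

Under this test an epenthesis rule (empty input) and a deletion rule (empty output) are always acceptable, while a rule mapping two segments onto three is not.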

7.3.6. Subrules

Optionality: obligatory

Label: subrules

Type: List. Each member of the list is a Phonological Subrule Structure (defined below).

Purpose: This contains the list of subrules to be applied disjunctively. The subrules are tried in order, beginning with the first. At most one subrule may apply to each segment (even if it only applies vacuously).

7.3.7. Phonological Subrule Structure

The Phonological Subrule structure includes the environment in which the rule applies (broadly construed to include any restrictions due to rule features, part of speech etc., in addition to the left and right phonetic environments), as well as its output (structural change). If a phonological rule has more than one subrule, the subrules apply to a given segment or sequence of segments disjunctively; that is, the first subrule whose structural description is met applies, and no others do. (In the case of a disjunctive rule of epenthesis, since there is no input segment, each possible position of epenthesis acts in the same manner, although it would be unusual for a rule of epenthesis to have more than one subrule.)
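Disjunctive application can be sketched in Python as a first-match loop over the subrules for each input segment. The sketch makes several simplifying assumptions: the input is a single segment, environments are tested against the unchanged input, and each subrule is modeled as a pair of a hypothetical environment test and an output list.

```python
def apply_disjunctively(segments, target, subrules):
    """For each occurrence of `target`, apply the first subrule whose
    environment test succeeds, and no other."""
    out = []
    for i, seg in enumerate(segments):
        if seg != target:
            out.append(seg)
            continue
        for environment_ok, output in subrules:
            if environment_ok(segments, i):
                out.extend(output)
                break  # later subrules are not tried for this segment
        else:
            out.append(seg)  # no subrule applied
    return out


def before_p(segs, i):
    return i + 1 < len(segs) and segs[i + 1] == "p"


def anywhere(segs, i):
    return True


# Subrule 1: n -> m before p.  Subrule 2: n -> N elsewhere.
subrules = [(before_p, ["m"]), (anywhere, ["N"])]
result = apply_disjunctively(list("npn"), "n", subrules)
```

The first n meets both environments but undergoes only the first subrule; the final n falls through to the second.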

Record Label: psub

Fields:

7.3.7.1. Required Parts of Speech

Optionality: optional

Label: r_pos

Type: List of atoms

Contents: The names of parts of speech.

Purpose: This defines the parts of speech that the lexical entry which is the input to the rule must belong to. The use of a list, rather than an atom, allows the use of more finely divided parts of speech (e.g. distinguishing among various subcategorizations of verbs by means of their parts of speech), while still allowing certain rules to apply to a general category (e.g. all verbs).

If this field is omitted, there is no requirement on the part of speech of the input lexical entry.

7.3.7.2. Required MPR Features

Optionality: optional

Label: r_rf

Type: list

Contents: Each member of the list is the name (an atom) of a Morphological-Phonological Rule (MPR) feature.

Purpose: This encodes positive rule feature requirements, such as conjugation class membership or gender.

In order for this subrule to apply to a lexical entry, the lexical entry must contain in its MPR Features list all the feature names of this list.

If this field is omitted, there are no required MPR features.

See also: Excluded MPR Features (7.3.7.3)

7.3.7.3. Excluded MPR Features

Optionality: optional

Label: x_rf

Type: list

Contents: Each member of the list is the name (an atom) of a Morphological-Phonological Rule (MPR) feature.

Purpose: This encodes negative rule feature requirements, such as conjugation class membership or gender.

In order for this subrule to apply to a lexical entry, the lexical entry must not contain in its MPR Features list any of the feature names of this list.

If this field is omitted, there are no excluded MPR features.

Warning: The names in the Required MPR Features list and this list should be mutually exclusive. The morpher does not check for this.

See also: Required MPR Features (section 7.3.7.2)
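Taken together, the Required and Excluded MPR Features fields amount to a subset test and a disjointness test. The Python sketch below is illustrative; the function names and the feature names in the example are hypothetical. The second function finds names that appear in both lists, a condition the morpher does not check and under which the subrule could never apply.

```python
def mpr_features_allow(entry_features, r_rf=(), x_rf=()):
    """True if the entry has every required feature and none of the
    excluded ones. An omitted field imposes no requirement."""
    features = set(entry_features)
    return set(r_rf) <= features and features.isdisjoint(x_rf)


def conflicting_features(r_rf=(), x_rf=()):
    """Names listed as both required and excluded."""
    return set(r_rf) & set(x_rf)


entry = ["conj1", "fem"]
allowed = mpr_features_allow(entry, r_rf=["conj1"], x_rf=["masc"])
blocked = mpr_features_allow(entry, r_rf=["conj1"], x_rf=["fem"])
```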

7.3.7.4. Phonetic Output

Optionality: obligatory

Label: output_pseq

Type: a variable-free phonetic sequence

Purpose: This defines the phonetic output of the subrule (see Phonetics of Phonological Rule Application, section 4.4.1), and represents what a linguist would think of as the output of the rule. In reality, it defines the output of the subrule only as the rule is used for synthesis; it represents input as the rule is used for analysis.

If the output is null (i.e. a rule of deletion), the output sequence is represented by the empty list.

Restrictions: See Phonetic Input Sequence (section 7.3.5)

7.3.7.5. Left Environment

Optionality: optional

Label: left_environ

Type: Phonetic template (see definition in section 5.7.2.7)

Purpose: This field represents the left-hand phonetic environment of the subrule. It is identical in form to the right environment field.

If this field is omitted, there is no constraint on the left environment of the subrule.

The left environment cannot extend across a word boundary (i.e. white space). This means that this phonetic template should not contain a value of true for the Final Boundary Condition field. Hermit Crab will silently ignore such a value in that field.

See also: Right Environment (section 7.3.7.6); Previous Word (section 7.3.7.7)

7.3.7.6. Right Environment

Optionality: optional

Label: right_environ

Type: Phonetic template (see definition in section 5.7.2.7)

Purpose: This field represents the right-hand phonetic environment of the subrule. It is identical in form to the left environment field.

If this field is omitted, there is no constraint on the right environment of the subrule. The right environment cannot extend across a word boundary (i.e. white space). This means that this phonetic template should not contain a value of true for the Initial Boundary Condition field. Hermit Crab will silently ignore such a value in that field.

See also: Left Environment (section 7.3.7.5); Next Word (section 7.3.7.8)

7.3.7.7. Previous Word

Optionality: optional

Label: prev_word

Type: Phonetic template (see definition in section 5.7.2.7) or the atom *null*.

Purpose: This field represents the required (surface) Phonetic Shape of the preceding word in the input. It is identical in form to the Next Word field.

If this field is omitted, there is no constraint on the preceding word.

This field is intended for sandhi rules, in which the phonetic form of the preceding word is relevant to the application of a phonological rule. There is no provision for specifying properties other than the phonetic form of the preceding word (which is probably inadequate, but it is not clear from linguistic theory what would be adequate).

For a rule which must apply to only the first word of the input, this field should be the atom *null*.

See also: Next Word (section 7.3.7.8)

7.3.7.8. Next Word

Optionality: optional

Label: next_word

Type: Phonetic template (see definition in section 5.7.2.7) or the special atom *null*.

Purpose: This field represents the required (surface) Phonetic Shape of the following word in the input. It is identical in form to the Previous Word field.

If this field is omitted, there is no constraint on the following word.

This field is intended for sandhi rules, in which the phonetic form of the following word is relevant to the application of a phonological rule. This situation is typical of clitics (e.g. the Spanish pronominal clitic le becomes se before the clitic pronouns lo and la), as well as tone sandhi rules. There is no provision for specifying properties other than the phonetic form of the following word.

For a rule which must apply to only the last word of the input, this field should be the atom *null*.

See also: Previous Word (section 7.3.7.7)
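The three possible states of the Previous Word and Next Word fields (omitted, *null*, or a phonetic template) can be sketched in Python as follows. This is a simplified illustration: a phonetic template is stood in for by an ordinary predicate over a word's surface form, the function names are hypothetical, and it is assumed that a template cannot be satisfied when there is no adjacent word.

```python
NULL = "*null*"  # stands for the atom *null*


def neighbour_ok(constraint, neighbour):
    """constraint: None (field omitted), NULL, or a predicate standing
    in for a phonetic template. neighbour: the surface form of the
    adjacent word, or None if there is no such word."""
    if constraint is None:
        return True
    if constraint == NULL:
        return neighbour is None
    return neighbour is not None and constraint(neighbour)


def word_context_ok(words, i, prev_word=None, next_word=None):
    """Check both word-level constraints for the word at index i."""
    prev = words[i - 1] if i > 0 else None
    nxt = words[i + 1] if i + 1 < len(words) else None
    return neighbour_ok(prev_word, prev) and neighbour_ok(next_word, nxt)


def is_lo_or_la(word):
    return word in ("lo", "la")


words = ["le", "lo", "dio"]
clitic_context = word_context_ok(words, 0, next_word=is_lo_or_la)
last_word_only = word_context_ok(words, 2, next_word=NULL)
```

The first check mirrors the Spanish clitic example: le is followed by lo, so the word-level condition is met. The second shows *null* restricting a rule to the last word of the input.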

8. References

Anderson, Stephen R. 1992. A-Morphous Morphology. Cambridge Studies in Linguistics 62. Cambridge: Cambridge University Press.

Carlson, Greg, and Thomas Roeper. 1981. "Morphology and Subcategorization: Case and the unmarked Complex Verb." In Teun Hoekstra, Harry van der Hulst and Michael Moortgat (eds.) Lexical Grammar. Publications in Language Sciences 3. Dordrecht: Foris Publications.

Maxwell, Michael. 1991. "Phonological Analysis and Opaque Rule Orders." In Proceedings of the Second International Workshop on Parsing Technologies, pages 110-116. Computer Science Department, Carnegie Mellon University.

Scalise, S. 1986. Generative Morphology. Second edition. Dordrecht: Foris Publications. NB: I have not been able to verify this reference—MM.