
Hermit Crab

Hermit Crab is a morphological parser and generator for classical generative phonology and morphology.

Say what? A morphological parser is a tool for going from the surface ("phonetic") representation of a word to its underlying representation, which includes breaking the word into its "morphemes" and undoing any phonological rules that have applied. The term "morphemes" is in quotes because Hermit Crab treats affixes as morphological processes. Indeed, some of the "morphemes" may simply be morphosyntactic features (under a realizational approach to morphology).

A morphological generator is a tool for going in the reverse direction, that is, from an underlying representation to a surface representation. The underlying representation of a word consists of a lexical entry for a stem, a series of morphological rules to be applied, and (optionally) a set of morphosyntactic features to be realized by additional inflectional affixes.
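To make the two directions concrete, here is a minimal sketch in Python. It is not Hermit Crab's own rule language or algorithm: the lexicon, the rule names (`PL`, `REDUP`), and the functions `generate` and `parse` are all hypothetical, and the parser simply tries every stem and rule sequence and keeps those that regenerate the surface form, whereas a real parser would un-apply the rules.

```python
from itertools import permutations

# Hypothetical toy lexicon: underlying stem -> gloss.
LEXICON = {"kat": "cat", "dog": "dog"}

# Morphological rules are processes (functions on forms), not stored affixes.
MORPH_RULES = {
    "PL": lambda form: form + "z",           # suffixation as a process
    "REDUP": lambda form: form[:2] + form,   # partial reduplication
}

VOICELESS = set("ptkfs")


def devoice(form):
    """Phonological rule: z becomes s after a voiceless consonant."""
    out = []
    for ch in form:
        if ch == "z" and out and out[-1] in VOICELESS:
            ch = "s"
        out.append(ch)
    return "".join(out)


PHON_RULES = [devoice]


def generate(stem, rule_names):
    """Underlying to surface: morphological rules first, then phonology."""
    form = stem
    for name in rule_names:
        form = MORPH_RULES[name](form)
    for rule in PHON_RULES:
        form = rule(form)
    return form


def parse(surface, max_rules=2):
    """Surface to underlying, by exhaustive generate-and-test (toy only)."""
    analyses = []
    for stem in LEXICON:
        for n in range(max_rules + 1):
            for names in permutations(MORPH_RULES, n):
                if generate(stem, names) == surface:
                    analyses.append((stem, names))
    return analyses


print(generate("kat", ["PL"]))  # kats
print(parse("kats"))            # [('kat', ('PL',))]
```

Note that a surface form can have more than one analysis: in this toy grammar, "kakats" is produced by applying the two rules to "kat" in either order, so `parse("kakats")` returns both.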

The term 'classical generative phonology' is used to mean pre-autosegmental generative phonology, where each segment is represented by a set of phonetic (or distinctive) features, as in The Sound Pattern of English (SPE). There is no spreading (as opposed to copying) of features, nor is there any hierarchical feature representation. Strata of rules are, however, supported. For more on the phonological capabilities and limits, click here.
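As an illustration of segments as feature sets and of feature copying, the sketch below applies one SPE-style rule, roughly [-syllabic, +continuant] → [αvoice] / [-syllabic, αvoice] __. The five-segment inventory, the three features, and the function names are assumptions made for this example and say nothing about how Hermit Crab stores or applies its rules. The rule applies simultaneously: contexts are read from the input, not from partly rewritten output.

```python
# Hypothetical inventory: each segment is a bundle of binary features.
INVENTORY = {
    "t": {"syllabic": False, "voice": False, "continuant": False},
    "d": {"syllabic": False, "voice": True, "continuant": False},
    "s": {"syllabic": False, "voice": False, "continuant": True},
    "z": {"syllabic": False, "voice": True, "continuant": True},
    "a": {"syllabic": True, "voice": True, "continuant": True},
}


def to_segments(word):
    """Turn a string into a list of feature bundles (copied, so safe to edit)."""
    return [dict(INVENTORY[ch]) for ch in word]


def to_string(segments):
    """Map each feature bundle back to the symbol with identical features."""
    return "".join(
        next(sym for sym, feats in INVENTORY.items() if feats == seg)
        for seg in segments
    )


def matches(segment, pattern):
    """True if the segment has every feature value named in the pattern."""
    return all(segment.get(f) == v for f, v in pattern.items())


def copy_from_left(segments, target, left_context, feature):
    """Copy `feature` from the preceding segment onto each matching target.

    The value is copied from the neighbour; there is no shared
    (autosegmental) node, so this is copying rather than spreading.
    """
    result = [dict(seg) for seg in segments]
    for i in range(1, len(segments)):
        if matches(segments[i], target) and matches(segments[i - 1], left_context):
            result[i][feature] = segments[i - 1][feature]
    return result


def voice_assimilation(word):
    segs = copy_from_left(
        to_segments(word),
        target={"syllabic": False, "continuant": True},
        left_context={"syllabic": False},
        feature="voice",
    )
    return to_string(segs)


print(voice_assimilation("atz"))  # ats
print(voice_assimilation("ads"))  # adz
```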

'Classical generative morphology' is probably a misnomer, since in some sense there was no such thing. The intended meaning is that autosegmental or prosodic-type morphology is not supported. What is supported is process-style morphology (as opposed to item-style morphology, although the latter can be modeled as a degenerate case of the former) and realizational morphology. In realizational morphology, rather than specifying a set of inflectional affixes to be attached to a stem, you specify a set of morphosyntactic features to be realized, and these in turn trigger the relevant morphological rules. For more on the morphological capabilities and limits, click here.
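The realizational idea can be sketched as follows. The caller supplies a stem and a set of morphosyntactic features, and each rule fires only if the features it requires are present. The rule table, the feature names, and the invented affixes are hypothetical; Hermit Crab's actual rule notation is described in the specification.

```python
from typing import Callable, Dict, NamedTuple


class RealizationRule(NamedTuple):
    requires: Dict[str, str]        # feature values that trigger the rule
    process: Callable[[str], str]   # the morphological process it applies


# Hypothetical rules for an invented language, listed in order of application.
RULES = [
    RealizationRule({"tense": "past"}, lambda form: form + "ta"),
    RealizationRule({"number": "pl"}, lambda form: form + "ni"),
    RealizationRule({"polarity": "neg"}, lambda form: "ma" + form),
]


def realize(stem, features):
    """Apply, in order, every rule whose required features are all present."""
    form = stem
    for rule in RULES:
        if all(features.get(k) == v for k, v in rule.requires.items()):
            form = rule.process(form)
    return form


print(realize("sul", {"tense": "past", "number": "pl"}))  # sultani
print(realize("sul", {"polarity": "neg"}))                # masul
```

The caller never names an affix. Adding `"polarity": "neg"` to the first call would yield "masultani" without any change to the way the request is expressed.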

Hermit Crab consists of two parts, a parsing engine and a user interface. The current user interface is through LinguaLinks, an SIL-developed integrated environment for doing linguistics. (Nothing would prevent someone from generating input in some other way, such as with Emacs. The parsing engine's native language vaguely resembles SGML, so you don't want to write your rules with Windows Notepad!) The parsing engine is written in 16-bit Arity Prolog and Microsoft C, and runs under Microsoft Windows 3.1 and Windows 95 (and presumably Windows 98, although it hasn't been tested there). It is open source, so here is the source code (zipped; see the license and the ReadMe file), the regression testing files (also zipped), and the executable (zipped as well). The Hermit Crab parsing engine may some day be ported to a 32-bit version of Prolog and C, assuming there is any interest on the part of users. (Since it's open source, perhaps you'd like to take on that task!)

To see what the LinguaLinks user interface looks like, see these pictures.

The specification for how Hermit Crab was (and is) supposed to work is here as a Word97 document, and here as an HTML doc. (Apologies: the HTML doc is just a dump of the Word doc, and is not very well formatted.) Most of the specification has been implemented, but not all of it. (See the description of the phonological capabilities and limits and the description of the morphological capabilities and limits for what hasn't been implemented.) There is also a Help file for the LinguaLinks version (excerpted from the LinguaLinks Help itself). Some of the Help is specific to the LinguaLinks user interface, but some of the general discussion is more broadly relevant (it explains the theory of realizational morphology, for instance). The Help doc is in the form of a Windows Help file, but it may be available in the near future as an HTML doc. (Note: This Help file requires either the Win95 help engine, or Help v4.0 or later for Windows 3.1.)

The LinguaLinks version of Hermit Crab ships with some sample data, so you can see what a Hermit Crab analysis might look like. Unfortunately, version 3.0 of LinguaLinks shipped with an outdated sample data file, which will not load into version 3.0. Here is an updated zipped sample data file. Unzip this to create the file SMPLDATA.CLR, then import that file into LinguaLinks.

Some additional papers describing the algorithm behind Hermit Crab are here.