
The Linguist's Shoebox

Integrated data management and analysis for the field linguist


The Shoebox parser works according to the longest-match principle.

In Shoebox, the parser resolves much of the potential ambiguity by selecting the parse that "cuts off" the longest affix. When a word can be divided into pieces in more than one way, the longest affix can "win" over the longest root. Depending on the morphology of the language you are analyzing, this "greedy" matching can work for you or against you. Here are two potential parsing problems:

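Ironically, the longest-match principle itself provides the solution to these parsing problems. In the lexical database, you add data fields containing alternate and underlying forms. (It might be helpful to think of the "alternate" form as a surface form.) Because a longer alternate form takes precedence under longest match, it can steer the parser toward the analysis you want.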
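
To make the idea concrete, here is a minimal Python sketch of greedy longest-match segmentation. It is a toy illustration only, not Shoebox's actual algorithm or file format: the function `parse`, the dictionary layout (surface form mapped to a list of underlying forms), and the made-up morphemes "a-", "ab-", "at", and "bat" are all assumptions chosen for the example. The toy parser also never backtracks, which is a simplification.

```python
def parse(word, lexicon):
    """Segment `word` left to right, always taking the longest lexicon match.

    `lexicon` maps a surface (alternate) form to a list of underlying forms.
    Returns the list of underlying forms, or None if some part of the word
    matches no entry. This toy parser does not backtrack.
    """
    result = []
    pos = 0
    while pos < len(word):
        # Try the longest remaining piece first, then shorter ones.
        for end in range(len(word), pos, -1):
            piece = word[pos:end]
            if piece in lexicon:
                result.extend(lexicon[piece])
                pos = end
                break
        else:
            return None  # nothing in the lexicon matches at this position
    return result


# Hypothetical lexicon: two prefixes and two roots.
lexicon = {
    "a": ["a-"],
    "ab": ["ab-"],
    "at": ["at"],
    "bat": ["bat"],
}

# Greedy matching takes the longer prefix "ab-", so the root "bat" loses.
greedy = parse("abat", lexicon)

# Adding a longer alternate (surface) form with its own underlying forms
# lets longest match pick the analysis we actually want.
lexicon_with_alternate = dict(lexicon)
lexicon_with_alternate["abat"] = ["a-", "bat"]
fixed = parse("abat", lexicon_with_alternate)

print(greedy)  # ['ab-', 'at']
print(fixed)   # ['a-', 'bat']
```

In the first call the longest prefix wins, which is the behavior that can work against you. In the second call the added entry is the longest match of all, so the same greedy rule now produces the intended analysis.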

For more information: Read Parsing with Shoebox and pages 247–250 in the Shoebox Tutorial.

Index of tips: alternate forms; longest-match principle; parsing; underlying forms