The Linguist's Shoebox

Integrated data management and analysis for the field linguist


Text-based linguistics is a solid foundation for understanding a language and culture.

Shoebox integrates the text, lexical, and cultural data produced by text-based analysis (e.g., through interlinearizing, jumping and data links, word lists, and concordances). Text-based analysis should not be your only strategy for building a good lexical database, but it is highly productive and reliable as a primary source of words, for checking senses, and for investigating semantic and grammatical collocations.

A caution if you build a lexicon primarily by interlinearizing texts: morpheme-level glossing tends to encourage researchers to ignore compounds and phrasal lexemes and to overlook sense discrimination. After interlinearizing a text (parsing it by morphemes), pass through the text a second time to identify polymorphemic words, compounds, and phrases, and enter them into the lexical database as separate headwords.

For more information, read page 8 and section 4.4 in Making Dictionaries.
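Part of the second pass described above can be sketched outside Shoebox. The Python below is only an illustration, not a Shoebox feature: it assumes the interlinear parse has already been exported into plain (word, morphemes) pairs and that the lexicon's headwords are available as a set. The function name `second_pass_candidates` and the English sample data are hypothetical. It lists polymorphemic words that are not yet headwords, most frequent first, so that a person can review them. It does not find phrasal lexemes or distinguish senses; those still need a reader's judgment.

```python
from collections import Counter


def second_pass_candidates(parsed_words, headwords):
    """Count polymorphemic words that have no headword of their own.

    parsed_words: iterable of (word, [morpheme, ...]) pairs (assumed export format).
    headwords: set of lowercase headwords already in the lexical database.
    Returns (word, count) pairs, most frequent first, ties in alphabetical order.
    """
    counts = Counter()
    for word, morphemes in parsed_words:
        key = word.lower()
        # More than one morpheme means the word was split during parsing.
        if len(morphemes) > 1 and key not in headwords:
            counts[key] += 1
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical sample data, standing in for an interlinearized text.
parsed = [
    ("blackbird", ["black", "bird"]),
    ("sing", ["sing"]),
    ("Blackbird", ["black", "bird"]),
    ("unkind", ["un-", "kind"]),
]
lexicon = {"black", "bird", "sing", "kind", "un-"}

for word, count in second_pass_candidates(parsed, lexicon):
    print(word, count)
```

Each word in the resulting list is only a candidate: whether it deserves its own headword remains a lexicographic decision.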
