The Linguist's Shoebox

Integrated data management and analysis for the field linguist


Options for matching characters depend on the sort order and case associations of a language encoding.

In most computer programs, you can select two basic options for matching characters by using the Match Case check box in the Find and Replace dialog boxes. In Shoebox, you can select options for matching characters in the following dialog boxes:

To meet the needs of researchers who work with writing systems that include diacritics (e.g., à) and who distinguish roots from affixes in lexical and interlinear text databases (e.g., in vs. in-), Shoebox provides four options. Because the options are named with the unfamiliar terms primary grouping, secondary ordering, and ignored characters, their effects are not obvious. The second option corresponds to turning Match Case off in ordinary software, and the third corresponds to turning it on. The options are listed from the least strict (i.e., most inclusive) to the most strict (i.e., most exclusive) criteria.

When you select an option for matching characters, Shoebox interprets each data field according to the sort order and case associations of that field's language encoding. For example, you could use the Find command to look for occurrences of the letter c in all French data fields. The following table shows which characters would and would not be found under each option, according to the properties of the French language encoding:

Option                          Matches      But not            Relationship to language encoding
By primary grouping only        C, c, Ç, ç   A, a, B, b, etc.   same line in the Primary characters box
Disregarding case               C, c         Ç, ç               same line in the Case Associations box
Exactly by secondary ordering   c            C                  matches the individual character
Even those normally ignored      c            c-, -c             Ignore characters in the Sort Order
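The four options can be thought of as four increasingly strict comparison keys. The following Python sketch is a hypothetical illustration, not Shoebox's implementation. The PRIMARY_LINES, CASE_PAIRS, and IGNORED values are an assumed, simplified stand-in for the French language encoding. The Match names are invented labels for the four options. The sketch also compares whole strings rather than searching within a field.

```python
from enum import Enum

# ASSUMPTION: simplified, hypothetical stand-in for a French language encoding.
# Each string is one line of the Primary characters box; characters on the
# same line belong to one primary group.
PRIMARY_LINES = ["AaÀà", "Bb", "CcÇç"]
# Each pair is one line of the Case Associations box: (uppercase, lowercase).
CASE_PAIRS = [("A", "a"), ("À", "à"), ("B", "b"), ("C", "c"), ("Ç", "ç")]
# Characters the sort order normally ignores (e.g., the affix hyphen).
IGNORED = {"-"}

# Map every character to the first character of its primary line.
PRIMARY_OF = {ch: line[0] for line in PRIMARY_LINES for ch in line}
# Map each uppercase character to its associated lowercase character.
LOWER_OF = {upper: lower for upper, lower in CASE_PAIRS}


class Match(Enum):
    """Hypothetical labels for the four options, least to most strict."""
    PRIMARY_GROUPING_ONLY = 1
    DISREGARDING_CASE = 2
    EXACT_SECONDARY = 3
    INCLUDING_IGNORED = 4


def match_key(s: str, option: Match) -> str:
    """Reduce a string to the information the chosen option compares."""
    if option is not Match.INCLUDING_IGNORED:
        # Options 1-3 drop ignored characters before comparing.
        s = "".join(ch for ch in s if ch not in IGNORED)
    if option is Match.PRIMARY_GROUPING_ONLY:
        # Collapse every character to its primary group.
        return "".join(PRIMARY_OF.get(ch, ch) for ch in s)
    if option is Match.DISREGARDING_CASE:
        # Fold only characters linked in the case associations.
        return "".join(LOWER_OF.get(ch, ch) for ch in s)
    # Options 3 and 4 keep each remaining character distinct.
    return s


def matches(pattern: str, text: str, option: Match) -> bool:
    """True if pattern and text are equal under the chosen option."""
    return match_key(pattern, option) == match_key(text, option)


candidates = ["C", "c", "Ç", "ç", "A", "b", "c-", "-c"]
for option in Match:
    print(option.name, [t for t in candidates if matches("c", t, option)])
# PRIMARY_GROUPING_ONLY ['C', 'c', 'Ç', 'ç', 'c-', '-c']
# DISREGARDING_CASE ['C', 'c', 'c-', '-c']
# EXACT_SECONDARY ['c', 'c-', '-c']
# INCLUDING_IGNORED ['c']
```

Each option discards less information than the one before it. The first collapses a character to its primary group, the second folds only associated case pairs, and the third keeps each character distinct. Only the fourth stops discarding ignored characters such as the hyphen, which is why c- and -c are found under every option except the last.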
