Putting the Cookies on a Lower Shelf--

Getting Started with Conc

Making Concordances, Indices, and Word Lists

By Ed Beach

Ed_Beach@sil.org

Contents

Introductory Level Tutorial

Intermediate Level Tutorial

Advanced Level Tutorial

One of the most useful and powerful functions of a computer is to generate concordances, indices of words and characters, and word lists. I am frequently surprised at how many computer users do not know how easy this is to do, and how many Bible translators let the making of a word list go until their translation is practically done, supposing that this is one of the *final* steps for checking a translation and that it has to be performed by an expert. If you are one of those people, then here's good news for you! The Conc concordance software for Macintosh by John Thomson of SIL is an easy-to-use tool that computer neophytes and experts alike can profit from greatly.

This tutorial is organized into three consecutive levels: Introductory Level, Intermediate Level, and Advanced Level.


Introductory Level Tutorial

Here are some typical uses of Conc, all of which can be done by a linguist who is a novice Mac user:

Conc is safe for anyone to use since it merely views your original document and cannot change it in any way. No need to worry that you or a coworker will accidentally alter a document!

Conc's learning curve is very short. I recently worked with a Mother Tongue Translator who was not particularly computer savvy, but within one hour, he was generating his own concordances.

Some of the features that Conc provides include:

Conc comes with a user's manual in electronic form that contains far more information than can be included in NOAM. You can print it out or use it on disk. Conc has been designed so that it can be translated into other languages using Apple's free resource editor, ResEdit (available from JAARS). Computer-wise, this is an astoundingly easy process; the challenge, however, lies in choosing the terminology to be used in menus, dialogs, and messages. For this reason, translation of Conc into other languages should be done by those who are familiar with computer terminology in the target language, with Conc, and with Macintosh interface standards.

The last general release of Conc is version 1.76, dating back to 1993. Various beta versions have been available since then which have added some features, but 1.76 was the last widely tested and debugged version. Little development work has been done on Conc in recent years due to the need to focus on development of SIL's LinguaLinks software.

Preparing to Use Conc

Conc can be obtained from any of the following sources:

Conc is RAM-based software. (RAM means "Random Access Memory" and refers to the little memory chips inside a computer. This is where Conc does almost all its work, except when it slows down to access information on a disk, e.g., when you open a text document or when you save/export a concordance or index.) This simply means that Conc works with all text, concordances, and indices entirely in your computer's pure electronic memory. An important implication of this is that the source text(s), concordance, and index must all fit into the RAM space allotted for Conc. Generally speaking, you need at least three times as much room for all this as the size of your source text. Until Mac OS 8 is available in 1996, you have to check and set this manually. Here's how:

Before starting Conc, select its icon and choose "Get Info" from the Finder's File menu. In the area of the lower right corner called "Memory Requirements," edit the number to be at least three times the size of your source text. Once the Info Dialog is closed, the setting is saved as the future default. If your Mac has only 4 Mb of RAM, then you may run into difficulty using Conc to process large amounts of text material.
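The "three times the source text" rule of thumb is simple arithmetic, and a few lines of Python can do it for a set of files. This is only a sketch: the function name is made up for illustration, the factor of 3 is the rule of thumb from this tutorial, and it assumes the memory figure is entered in kilobytes (1 K = 1024 bytes).

```python
import math

def suggested_memory_kb(source_sizes_bytes, factor=3):
    """Suggest a memory allotment in kilobytes for the given source texts.

    source_sizes_bytes: sizes of all texts to be opened or appended, in bytes.
    factor: rule-of-thumb multiplier (three times the source text size).
    """
    total_bytes = sum(source_sizes_bytes)
    # Round up so the suggestion never falls below the rule of thumb.
    return math.ceil(total_bytes * factor / 1024)

# Example: two text files of 400,000 and 250,000 bytes.
print(suggested_memory_kb([400_000, 250_000]))
```

Treat the result as a minimum; a concordance with long context lines may need more room.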

Memory sizer

For language work in general, linguists should have at least 8 Mb of RAM for a pre-PowerPC Mac such as mine (I also use RAM Doubler), or at least 16 Mb for a PowerPC. (If you think that's expensive, note that this is basically what JAARS is recommending (see NOC 14.3.28) and that it is what LinguaLinks will require.)

First Look

Here is a sample of Conc's Concordance and Index windows. The top window shows the text of Acts 1. Below that is a concordance that has been generated and which shows the key words in bold, aligned down the center, and with chapter and verse references along the left side. Below that is an alphabetized index of words showing how many times each occurs and the chapter and verse references of where it occurs.

Note that in the text window I have clicked on one of the "Spirit" entries in the concordance. Conc instantly selected that item in the text as well as in the index.

It took just one minute to do this! In that time, I...

Here's how you can do that...

Seven Easy Steps From Text to Concordance and Index

Overview
1. Start Conc by double-clicking its icon.

2. Open a source document from Conc's File menu. (Formatted Nisus documents are okay; others have to be "plain text.") Include additional documents by choosing Append.

File Open

3. Define a custom sort order, if needed.

4. Define reference markers (e.g., chapter and verse) and word separator characters.

5. Define parameters for specialized concordances.

6. Tell Conc to make the concordance.

7. Tell Conc to make an index.

The Details
Step 1: Start Conc
Conc can be started up in various ways:
Step 2: Open a source document from Conc's File menu.
Do this from *Conc's* File menu! If you double-click the source text itself, you will merely open it in the application with which you created it. Using Conc's File "Open" command lets Conc look at a document created by another application.

Conc normally requires that the source text be a text-only document, but any *formatted* Nisus or Nisus Writer document can be used directly by Conc. Documents created by most other word processors contain hidden formatting information that will confuse Conc. To make a text-only document in most word processors, open the document in that application; then open the "Save As..." dialog from the File menu, choose the setting for a text-only version, and give the text a new name so as not to overwrite the original.

Saving a document as "text only" in the "Save As" dialog of Microsoft Word

Note about Microsoft Word: Always check the "Make Backup" check box (or whatever its equivalent is in your version of Word). Word's "Fast Save" option can lead to various problems, not the least of which is that other word processors have trouble reading documents saved with Word's Fast Save option. The text-only version of your source text is now ready for use by Conc.

Step 3: Define a custom sort order.
Select "Sorting..." on Conc's Options menu. When the dialog opens, note that the Font menu is active and can be used to set the font in this dialog for special orthographies such as seen in the example below.
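To see what a custom sort order amounts to, here is a minimal Python sketch. It is not how Conc works internally; the function name and the sample alphabet are invented for illustration, and it handles single characters only (no digraphs and no secondary ordering).

```python
def make_sort_key(alphabet):
    """Build a sort key that orders words by the given letter sequence.

    Characters that are not listed in the alphabet sort after all listed ones.
    """
    rank = {ch: i for i, ch in enumerate(alphabet)}
    unknown = len(alphabet)

    def key(word):
        # Compare words letter by letter using the custom ranks.
        return [rank.get(ch, unknown) for ch in word.lower()]

    return key

# Hypothetical orthography in which "ñ" sorts directly after "n".
alphabet = "abcdefghijklmnñopqrstuvwxyz"
words = ["oso", "ñame", "nube", "nada"]
print(sorted(words, key=make_sort_key(alphabet)))
```

A plain character-code sort would put "ñame" last; with the custom order it falls between the n-words and the o-words.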

Options menu: Sorting...

Note the following about this dialog:

Step 4: Define reference markers (e.g., chapter and verse) and word separator characters.
Open the Text Properties dialog from Conc's Options menu. Here are the settings I use for SIL Standard Format (SF) Scripture when making a single concordance/index from multiple books:

Step 5: Define parameters for specialized concordances.
This step may lead you to select one or two more dialogs accessed from the Options menu:

The Include Words dialog is a veritable powerhouse which will be discussed later. First-timers should select "Include all words."

Important: Words specified here will be included only if they are not excluded by the Omit Words dialog.

This dialog is a handy way to make an initial spell check by enabling you to limit a concordance/index to rare words--likely suspects for spelling mistakes. If this is your first time using Conc, leave at least the first two items unchecked.
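The idea of using rare words as spelling suspects can be sketched in Python. This is an illustration of the principle, not Conc's own procedure; the function name, the sample sentence, and the rule "a word is any run of letters" are all assumptions.

```python
import re
from collections import Counter

def rare_words(text, max_count=1):
    """Return the words that occur no more than max_count times, sorted."""
    # Assumption: a "word" is any unbroken run of letters.
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(words)
    return sorted(w for w, n in counts.items() if n <= max_count)

sample = "the spirit came and the spirt spoke and the people heard"
print(rare_words(sample))
```

In this made-up sample the misspelling "spirt" shows up among the words that occur only once, where it is easy to spot next to "spirit."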

Layout menu: Display...

Step 6: Tell Conc to make a concordance.
This is the fun part! Simply choose "Word concordance" on the Build menu.

Before building a large concordance, ensure that you have plenty of hard disk space! Concordances can easily be twice as long as the original file.

Use the standard Save and Print items on the File menu for saving and printing your new concordance. Note that you can also export it as a text file to use in a word processor or other program.

Troubleshooting: If you wind up with a concordance that differs from what you expected, check all the option settings. Also make sure you are using a text-only file as your starting point.

Getting Statistics: If you would like to know some statistics about your concordance or index, choose Conc's "Statistics" command on the Build menu.
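For readers who wonder what a word concordance actually consists of, here is a rough Python sketch of the idea: every word is listed with its reference and a little context on either side, with the key word lined up in a fixed column. This is not Conc's code. The function name, the sample verses, the context width, and the rule "a word is any run of letters" are assumptions made for illustration.

```python
import re

def build_concordance(verses, width=20):
    """Build a simple key-word-in-context concordance.

    verses: list of (reference, text) pairs.
    Returns a list of (keyword, reference, line) sorted by keyword.
    """
    entries = []
    for ref, text in verses:
        for m in re.finditer(r"[^\W\d_]+", text):
            left = text[max(0, m.start() - width):m.start()]
            right = text[m.end():m.end() + width]
            # Right-justify the left context so every key word
            # starts in the same column.
            line = f"{left:>{width}}{m.group()}{right}"
            entries.append((m.group().lower(), ref, line))
    # The sort is stable, so entries for one word stay in text order.
    entries.sort(key=lambda e: e[0])
    return entries

verses = [
    ("1:5", "you shall be baptized with the Holy Spirit"),
    ("1:8", "when the Holy Spirit has come upon you"),
]
for word, ref, line in build_concordance(verses):
    if word == "spirit":
        print(ref, line)
```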

Step 7: Tell Conc to make an index.
An index of words or characters tells how many of each occurs and where they occur. To create such an index in Conc, you first build a concordance (as done above) and then simply choose "Index" on the Build menu.
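An index of this kind can be pictured as a table that maps each word to its count and its list of references. The Python sketch below illustrates that idea only; the function name, the sample verses, and the tab-separated printout are assumptions and do not reproduce Conc's actual index layout.

```python
import re
from collections import defaultdict

def build_index(verses):
    """Map each word to (number of occurrences, list of references).

    verses: list of (reference, text) pairs.
    The result is ordered alphabetically by word.
    """
    refs = defaultdict(list)
    for ref, text in verses:
        # Assumption: a "word" is any unbroken run of letters.
        for word in re.findall(r"[^\W\d_]+", text.lower()):
            refs[word].append(ref)
    return {word: (len(where), where) for word, where in sorted(refs.items())}

verses = [
    ("1:5", "you shall be baptized with the Holy Spirit"),
    ("1:8", "when the Holy Spirit has come upon you"),
]
for word, (count, where) in build_index(verses).items():
    print(f"{word}\t{count}\t{', '.join(where)}")
```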

Use the standard Save and Print items on the File menu for saving and printing your new index. Note that you can also export it as a text file to use in a word processor or other program.

If what you really wanted was a plain vanilla word list, i.e., a list of unique words without any references or the number of occurrences, then refer to the Intermediate Level tutorial below. (Most people use indices. A true "word list" is actually a quasi power user tool.)

This is the end of the Introductory Level tutorial for Conc. You are invited to continue on to the Intermediate Level.


Intermediate Level Tutorial

Working Smart with a Concordance and Index

Now that you have a concordance and an index, you can find things very fast in your document.

Want to quickly change the look of your concordance or index?

Need a word list? This is useful for importing into a user dictionary in other software such as Nisus. Strip an index of references and numbers of occurrences by doing the following. (This assumes you are working from a document with no spelling errors!)

You can save time by saving your options settings in a file where they can all be retrieved at once. Simply choose "Save All Options" from Conc's File menu.

Next time you start Conc, do so by double-clicking on your options document.

Restore a given set of options merely by using the File menu's Open command to open the needed options file, or simply double-click it in the Mac's Finder window.

The Revert command restores the options and concordance that were in effect the last time the concordance was saved. (It does nothing with an unsaved concordance.)

Note: Conc 1.76 beta may leave Save and Save As grayed out and unavailable when an options dialog is open. Generating or opening a concordance should solve the problem.

Building a Large Concordance

Long documents are normally saved in pieces. Conc enables you to easily build concordances and indices of such multi-file documents. Use the File "Open" command as usual to open the first text file for your concordance or index. Now use "Append" on the File menu to open additional texts to be included in the concordance/index.

Exporting a Concordance or Index

The "Export..." command on the File menu creates a plain text file of your concordance or index, depending on which window is active. This can be useful for printing it with other software in more sophisticated ways than Conc is capable of doing. I frequently export concordances and indices for processing with Nisus. If you have already saved the document in Conc format, this command appears as "Export file name As," with the name of your document in place of "file name."

Exporting can produce very large files, more than ten times the size of the original text. You can halt the export process by clicking the Abort button on the progress indicator. Consider whether you could better use "Export Selection" (becomes active when part of a concordance or index has been selected by dragging over it) to save just part of the concordance.

If you desire to get a word list from Conc, then select "Index..." on the Options menu. In the dialog, choose "Index entries show at most 0 references," then go ahead and make your index. Finally, you will have to export the index and strip it of the numbers of occurrences (and their preceding tabs) using a word processor. You may have to fill in the number zero. Don't forget to set the Index Options dialog back to "Index entries show all references" as a default!
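If you would rather not strip the numbers by hand in a word processor, a short script can do it. The Python sketch below assumes that each line of the exported index consists of a word, a tab, and the number of occurrences; check your own export before relying on it. The function name and the sample data are invented for illustration.

```python
def index_to_word_list(exported_text):
    """Strip the tab and everything after it from each exported index line."""
    words = []
    for line in exported_text.splitlines():
        # Keep only what precedes the first tab (assumed layout: word<TAB>count).
        word = line.split("\t", 1)[0].strip()
        if word:
            words.append(word)
    return words

exported = "baptized\t1\nholy\t2\nspirit\t2\n"
print("\n".join(index_to_word_list(exported)))
```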

Printing Concordances and Indices

"Print" on the File menu prints the current concordance or index, depending on which window is active. To print just a part of a concordance or index, select the lines you want (by dragging or shift-clicking as usual) and then choose "Print Selection" on the File menu.

The Page Setup command is stock standard. Page Layout and Header/Footer dialogs are self-explanatory.


Advanced Level Tutorial

Pouring on the Power with Pattern Matching

A key to Conc's usefulness is its ability to generate special concordances based on user-defined words or, more strictly speaking, rules called "patterns." Finding word or character sequences that fulfill rules is called "pattern matching." Pattern matching is accomplished by:

Conc's Include Words dialog has boxes ("fields") for two patterns and a third box where whole words can be listed. Words matching whichever of these three is activated will be included in the concordance.

The simplest pattern is just a group of ordinary letters. For example, specifying the pattern "ing" will create a concordance of words containing "ing."

Formulaic patterns are even more powerful. This is an advanced feature of Conc, so if you don't feel ready for it, skip this section until your felt need for it is high enough to motivate you. Be forewarned that Conc is no smarter in pattern matching than you make it! The burden for creating proper formulas is completely on the analyst.

In pattern matching formulas, special meanings are assigned to certain characters. Conc comes with a default set of special characters already set up for this purpose. (In general, these are the worldwide standard GREP (Global Regular Expression Print) symbols from the world of Unix.) For instance, "ing$" is the pattern that matches all words with "ing" endings because Conc knows that $ means "end of word."
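The difference between the plain pattern "ing" and the anchored pattern "ing$" can be tried out away from Conc. The Python sketch below is not Conc's matching code; the word list is made up, and ordinary string tests stand in for the two patterns.

```python
words = ["singing", "finger", "king", "nothing", "single"]

# Plain pattern "ing": the letters may occur anywhere in the word.
containing = [w for w in words if "ing" in w]

# Anchored pattern "ing$": the letters must come at the end of the word.
ending = [w for w in words if w.endswith("ing")]

print(containing)
print(ending)
```

Every word in this sample contains "ing," but "finger" and "single" drop out once the end-of-word anchor is added.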

Select the Pattern Matching dialog on the Options menu to see Conc's default pattern matching symbols.

You may be tempted to change the characters in this dialog. However, it's best to at least start out using these defaults since they are a standard in the world of computing. (Note the similarity to Nisus Writer's PowerSearch Pro find mode.)

Hint: Leave the dialog open while you write pattern matching formulas. (If you change the settings, you will need to click OK and then reopen it.)

Normally, a pattern is written to find single words that match it. You can, however, look for multiple word sequences by specifying a number of words to include in the comparison. Do this by filling in a number other than "1" in the Include Words dialog item that reads "Include groups of __ words."
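Matching against groups of words can be pictured as sliding a window along the text and testing each group as a unit. The Python sketch below shows the idea only. It uses Python's own regular expression syntax rather than Conc's, it assumes that the words in a group are joined by single spaces, and the function name and sample text are invented.

```python
import re

def match_word_groups(words, pattern, group_size=2):
    """Return every run of group_size consecutive words that matches pattern."""
    hits = []
    for i in range(len(words) - group_size + 1):
        group = " ".join(words[i:i + group_size])
        if re.search(pattern, group):
            hits.append(group)
    return hits

words = "the Holy Spirit came and the holy people heard".split()
# Two-word groups in which "holy" (either case) is followed by an s-word.
print(match_word_groups(words, r"^[Hh]oly [Ss]"))
```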

Sample Patterns

bapti
All occurrences of baptism, baptist, baptize, baptized, baptizing.

^[aA]
All words beginning with either a or A.

^[A-Z]
All words beginning with an uppercase letter (in ASCII range of A through Z).

[^aeiou][^aeiou][^aeiou]
All strings of at least three consonants.

[!?."][ ]*[a-z]
All words beginning with a lowercase letter after zero or more spaces following end-of-sentence punctuation. (You have to set the search to include groups of two words.)

^a_*p$
All words that start with a and end with p. The _* matches zero or more word-forming characters in between.

^\([aeiou]\)_*\1
Vowel-initial words, provided the same vowel occurs elsewhere in the word.

b[aeiou]%b
All strings where a b is followed by one or more vowels and then another b.

Here is how Conc's default pattern matching works:

Matching with special characters \ [ . _ # ^ * $

1. The . (period) matches any character.

2. The _ (underline) matches any character considered to be part of a word.

3. The # matches any character that is not part of a word.

4. The backslash \ followed by any character--except a digit or the parenthesis characters--matches the character that follows the backslash. This is useful if you want to look for characters that are normally special.

Matching Sets of Characters

5. Square brackets identify a set of characters.

6. Shorthand, such as [a-s], may be used where a and s are in ascending ASCII order and a-s represents the inclusive range of ASCII characters. (Unfortunately, this means exactly what it says in the present version of Conc. Hence, if you have a special character in a font that alphabetically occurs between, say, a and s, but has an ASCII number higher than s, then it would not be included in the set. A future version of Conc or its successor will most likely employ a user-defined collating sequence.)
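The same pitfall can be demonstrated with any tool that defines ranges by character code. The Python sketch below is an analogy rather than Conc itself: Python compares Unicode code points instead of the Mac's character codes, and the function name and choice of "ñ" are only for illustration.

```python
import re

def in_a_to_s(ch):
    """True if the single character ch falls inside the range [a-s]."""
    return re.fullmatch(r"[a-s]", ch) is not None

for ch in ["m", "ñ", "t"]:
    # Show each character, its code number, and whether [a-s] includes it.
    print(ch, ord(ch), in_a_to_s(ch))
```

Alphabetically "ñ" belongs between n and o, but its code number is higher than that of s, so the range leaves it out.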

Matching Numbers of Occurrences

7. An element of a pattern that is followed by * matches a sequence of 0 or more occurrences of that element.

8. If an element of a pattern is followed by % then it matches one or more occurrences of that pattern element.

Matching Copies of Character Strings

9. Are you ready to adjust your eyeballs? Backslashes paired with parentheses \( and \) are opening and closing brackets that cause Conc to remember whatever is between them for further comparison. (Note that Nisus has a beautiful way to handle this in its PowerSearch mode. I have found the best way to get accustomed to this feature of Conc is using the "found" feature in Nisus PowerSearch.)

10. This is used in combination with the next convention: A backslash \ followed by a digit n matches a copy of the string that the bracketed pattern beginning with the nth \( matched. This is useful when what is inside the brackets could match several things (it includes other special characters) and you later want to check for a repeat of the thing that matched.

Matching Character Positions

11. Any pattern such as mentioned above that is preceded by the caret symbol ^ is restricted to matches at the beginning of words (or group of words, if that option is selected).

12. Any pattern such as mentioned above that is followed by $ is restricted to matches at the end of words (or group of words).
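One way to get a feel for the twelve rules above is to translate Conc's default symbols into the regular expression syntax of another tool and experiment there. The Python sketch below does this under stated assumptions: the function names are invented, Python's \w and \W stand in for Conc's word-forming and non-word characters (which Conc lets you define yourself), each word is tested on its own, and the sort-group and word-separator check boxes are ignored. It is a rough approximation, not a description of how Conc is implemented.

```python
import re

def conc_to_regex(pattern):
    """Translate a pattern in Conc's default symbols to Python regex syntax."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if nxt in "()":
                out.append(nxt)             # \( and \) remember a match
            elif nxt in "123456789":
                out.append("\\" + nxt)      # \n repeats the nth remembered match
            else:
                out.append(re.escape(nxt))  # \x means a literal x
            i += 2
            continue
        if ch == "[":
            end = pattern.find("]", i + 1)
            if end == -1:
                raise ValueError("unclosed [ in pattern")
            out.append(pattern[i:end + 1])  # sets are written the same way
            i = end + 1
            continue
        if ch == "_":
            out.append(r"\w")               # any word-forming character
        elif ch == "#":
            out.append(r"\W")               # any non-word character
        elif ch == "%":
            out.append("+")                 # one or more occurrences
        elif ch in ".*":
            out.append(ch)                  # same meaning in both syntaxes
        elif ch == "^" and i == 0:
            out.append("^")                 # beginning of the word
        elif ch == "$" and i == len(pattern) - 1:
            out.append("$")                 # end of the word
        else:
            out.append(re.escape(ch))       # ordinary character
        i += 1
    return "".join(out)

def conc_filter(words, pattern):
    """Return the words that match a Conc-style pattern."""
    rx = re.compile(conc_to_regex(pattern))
    return [w for w in words if rx.search(w)]

words = ["asleep", "apple", "area", "upon", "baobab", "abba"]
print(conc_filter(words, r"^a_*p$"))
print(conc_filter(words, r"^\([aeiou]\)_*\1"))
print(conc_filter(words, r"b[aeiou]%b"))
```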

Other Options in the Pattern Matching dialog

The check box item "Characters within primary sort groups are distinctive for pattern matching" in the Pattern Matching dialog functions independently of the corresponding check box item in the Sorting dialog.

If certain characters are specified for the secondary sort sequence, they are ignored for pattern matching if "Characters...are distinctive" is off.

The check box item "Include word separation characters" controls whether characters that are not considered to be in words are included in the match.

Progressively Restricting Word Inclusion

A very useful strategy for analyzing text is to make progressively more focused concordances. For example, suppose you want all the words that contain a and e and i, in any order. There is no single pattern match that will do this. However, if you first limit the concordance to words containing a, then build a concordance from that one of words containing e, and then a concordance based on that of words containing i, you can easily get the required result.
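The step-by-step narrowing just described can be mimicked in a few lines of Python. This is only an analogy to building one concordance from another; the function name and the word list are made up.

```python
def restrict(words, letter):
    """Keep only the words that contain the given letter."""
    return [w for w in words if letter in w]

words = ["praise", "disciple", "heaven", "Galilee", "testified", "said"]

step1 = restrict(words, "a")   # words containing a
step2 = restrict(step1, "e")   # ...of those, words containing e
step3 = restrict(step2, "i")   # ...of those, words containing i
print(step3)
```

Each pass works only on the survivors of the previous pass, which is why the order of the letters in the word does not matter.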

Building one concordance from another is as easy as having a concordance open, and then selecting the Build Word Concordance command. Conc will ask if you want to use the current concordance or the original text as the starting point for your new concordance, and you will click on "Present concordance."

A particularly valuable way to use the "Present concordance" option is to begin your study of a large text by creating a concordance containing all the words you ever expect to be interested in (possibly all the words in the document). Then save this as a base concordance, using the Save command on the File menu. Then, for each set of words of interest, open this base concordance, change the set of words to include, and choose "Present concordance." Use the Revert command on the File menu when you want to start again. In most cases (especially for large files and if you have a hard disk) the combination of Revert followed by building a new concordance based on the current one will be much faster than building a new concordance based on the original text.

Current Limitations of Conc


Date created: 14-Dec-1995
Last modified: 14-Dec-1995
URL: http://www.sil.org/computing/conc/tutorial.html
Questions/Comments: WWW@sil.org
Copyright © 1995, Ed Beach and Summer Institute of Linguistics