
Conceptual modeling versus visual modeling:
a technological key to building consensus

A paper presented at: Consensus ex Machina, Joint International Conference of the Association for Literary and Linguistic Computing and the Association for Computers and the Humanities
Paris, 19-23 April 1994

Gary F. Simons
Academic Computing Department
Summer Institute of Linguistics
7500 W. Camp Wisdom Road
Dallas, TX 75123
Copyright 1994 by Summer Institute of Linguistics, Inc.


Debate has long been a hallmark of the academic endeavor. The recent introduction of computers into academic life has not been the deus ex machina to bring sudden resolution to these debates. There is a new computing technology, however, that has some promise in this regard. It is called conceptual modeling. This paper (see endnotes) demonstrates how a computer-based model of a problem domain can lead to consensus when competing approaches to the domain can be encapsulated in different visual models that are applied to the same underlying conceptual model.

1. Conceptual modeling languages

Humanists have been using computer systems for decades to model the things in the "real world" which they study. A conceptual model is a formal model in which every entity being modeled in the real world has a transparent and one-to-one correspondence to an object in the model. Relational databases do not have this property; they spread the information about a single entity across multiple tables of a normalized database. Nor do conventional programming languages; even though the records of Pascal and the structures of C offer a means of storing all the state information for a real world entity in a single data storage object, other aspects of the entity like its behavior and constraints on its relationships to other objects are spread throughout the program.

A conceptual modeling language, like an object-oriented language, encapsulates all of the information about a real world entity (including its behavior) in the object itself. A conceptual modeling language goes beyond a standard object-oriented language (like Smalltalk) by replacing the simple instance variables with attributes that encapsulate integrity constraints and the semantics of relationships to other objects. Because conceptual modeling languages map directly from entities in the real world to objects in the computer-based model, they make it easier to design and implement systems. The resulting systems are easier to use since they are semantically transparent to users who already know the problem domain. See Borgida (1985) for a survey of conceptual modeling languages and a fuller discussion of their features.

2. The requirements for conceptual modeling in literary and linguistic research

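To do conceptual modeling in the domains of literary and linguistic research, we need a modeling formalism that can adequately model the kinds of information we deal with. The following are five of the most fundamental features of the data we work with (both the primary textual data and our analyses of them) and the demands they put on the formalism:

1. The data are multilingual, so the formalism must keep track of which language each piece of text is in.
2. The data are sequential, so the formalism must be able to represent the order of elements.
3. The data are hierarchically structured, so the formalism must be able to represent arbitrarily deep part-whole hierarchies.
4. The data elements are multidimensional, so the formalism must allow a single element to carry a number of simultaneous attributes, some of which may themselves be complex elements.
5. The data are richly interconnected, so the formalism must support associative links between related elements, with constraints on what may be linked to what.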

See Simons (forthcoming) for an in-depth discussion of these requirements.

It is possible to find software systems that meet some of these requirements for data modeling, but we are not aware of any that can meet them all. Some word processors (like Nota Bene, Multilingual Scholar, and those that use the Macintosh WorldScript system) can deal well with multilingualism (point 1). All word processors deal adequately with sequence (point 2). A few word processors can handle arbitrary hierarchy (point 3), but most cannot. The areas of multidimensional data elements and associative linking (points 4 and 5) do not even fall within the purview of word processors. This is where database management systems excel, but they typically do not support multilingualism, sequence, and hierarchy adequately.

The academic community has recognized the potential of SGML (ISO 1986) for the conceptual modeling of literary and linguistic data. The Text Encoding Initiative (TEI) is a large-scale international project to develop SGML-based standards for encoding textual data, including their analysis and interpretation (Sperberg-McQueen and Burnard 1994). The TEI guidelines handle the multilingualism problem by providing a LANG= attribute on all data elements. SGML already handles sequence and arbitrary hierarchy well. SGML offers partial, but not complete, solutions to the last two points---multidimensionality and associative linking.

With regard to multidimensionality of data elements, SGML does provide a mechanism for tagging a data element with a number of simultaneous attributes. However, the attributes of SGML elements cannot themselves store other complex elements; they may only be simple values like strings, numbers, or pointers. Thus the multidimensional nature of complex data elements must be modeled as hierarchical containment.

As for the associative links between related data elements, SGML offers a pointing mechanism (through IDs and IDREFs in attribute values), but there is no semantic validation of pointers. Any pointer can point to any element; there is no mechanism for specifying constraints on pointers in the Document Type Definition (DTD). The only relationships between element types that can be modeled and validated in the DTD are sequence and hierarchical inclusion.

Despite these shortcomings, SGML can be viewed as a conceptual modeling language that can be used for modeling literary and linguistic data, and indeed the TEI has used it as such. But SGML-based representations of information are too abstract and too cumbersome for the average researcher to work with directly. There is another fundamental requirement for a conceptual modeling system that will meet the needs of literary and linguistic research:

6. The system must present information to researchers through user-friendly views, so that they can work with the data without having to manipulate its underlying representation directly.

Until this requirement is met, it will be difficult for the community of literary and linguistic researchers to reach a widespread consensus on conceptual models such as those embodied in the TEI guidelines.

3. A computing environment for literary and linguistic research

The Summer Institute of Linguistics (through its Academic Computing Department) has embarked on a project to build a computing environment that meets the above six requirements. The environment is called CELLAR---for Computing Environment for Linguistic, Literary, and Anthropological Research. At the heart of CELLAR is an object-oriented knowledge base for storing multilingual textual information (Rettig, Simons, and Thomson 1993).

To build an application in CELLAR, one does not write a program in the conventional sense of a structure of imperative commands. Rather, one builds a declarative model of the problem domain. A complete domain model contains the following four components:

Conceptual model.
Declares all the object classes in the problem domain and their attributes, including integrity constraints on attributes that store values and built-in queries on those that compute their values on-the-fly.
Visual model.
Declares one or more ways in which objects of each class can be formatted for display to the user.
Encoding model.
Declares one or more ways in which objects of each class can be encoded in plain text files so that users can import data from external sources.
Manipulation model.
Declares one or more tools which translate the interactive gestures of the user into direct manipulation of objects in the knowledge base.

Because CELLAR is an object-oriented system, every object encapsulates all the knowledge and behavior associated with its class. Thus any object can answer questions, whether from the programmer or the end user, like: "What queries can you answer about yourself?" "What ways do you have of displaying yourself?" "What text encoding formats do you support?" "What tools do you offer for manipulating yourself?" For programmers this greatly enhances the reusability of previously coded classes. For end users this greatly enhances the ability to explore all the capabilities of an application without having to read manuals.
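The self-describing behavior just described can be approximated in a conventional object-oriented language. The following Python sketch is an analogy under stated assumptions, not CELLAR's mechanism: each class keeps its own registries of named views and encodings (the names `ModelObject`, `register`, `capabilities`, and `show` are hypothetical), so any instance can report what it supports.

```python
class ModelObject:
    """Hypothetical base class whose instances can describe their own capabilities."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Each class gets its own registries; a fuller version would also
        # consult the registries of its base classes.
        cls.views = {}
        cls.encodings = {}

    @classmethod
    def register(cls, kind, name):
        """Decorator that records a function as a named view or encoding of cls."""
        def decorator(fn):
            getattr(cls, kind)[name] = fn
            return fn
        return decorator

    def capabilities(self):
        # Answers "What ways do you have of displaying yourself?"
        # and "What text encoding formats do you support?"
        return {"views": sorted(self.views), "encodings": sorted(self.encodings)}

    def show(self, view_name):
        return self.views[view_name](self)


class TaggedWordForm(ModelObject):
    def __init__(self, form, tag):
        self.form = form
        self.tag = tag


@TaggedWordForm.register("views", "formOnly")
def form_only(word):
    return word.form


@TaggedWordForm.register("encodings", "BNC")
def bnc_encoding(word):
    return f"{word.form}&{word.tag};"


word = TaggedWordForm("spent", "VVD")
print(word.capabilities())  # {'views': ['formOnly'], 'encodings': ['BNC']}
print(word.show("formOnly"))  # spent
```

Because the registries hang off the class, code that knows nothing about TaggedWordForm can still discover its views at run time, which is the property the paragraph credits with aiding reuse and exploration.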

4. Building consensus via multiple visual models

Experience to date with CELLAR has shown that the distinction between conceptual model and visual model is crucial for reaching consensus among designers and users alike. The first step in implementing a computerized model for a problem is for domain experts to develop the conceptual model, which sets up the classes of objects and their attributes. The second step is to develop a set of visual models, each of which displays the data stored in the knowledge base in a different way.

When two domain experts agree on a conceptual model developed for their domain, we could say they have achieved a "direct consensus." But often there are different points of view (whether due to differences in perspective, area of focus, terminology, or notation) that make it difficult to achieve such consensus. In such cases it may be possible to achieve an "indirect consensus" by building different visual models that are applied to a common conceptual model which governs a single knowledge base. For instance, two scholars who have different focuses of interest can use different visual models that present different selections of the information in the knowledge base. Two scholars who use different notations and different terminology can use different visual models which present the same information but using different notations and terms. Even when two scholars are not ready to agree that the underlying conceptual model is their preferred model, they have found consensus in a common conceptual model when each can agree that their preferred visual model gives them the view of the domain they are looking for.

This approach of finding indirect consensus through multiple visual models contrasts with the approach which the TEI has had to take. Because the complete computing environment which presents user-friendly views of information is missing, the TEI has been forced to forge direct consensus on the SGML representation of conceptual models. To find consensus it has been necessary to offer multiple conceptual models (that is, multiple ways of tagging the same information). The result falls short of the ideal for information interchange that lies behind the TEI, for when two scholars working in the same domain interchange SGML files that reflect different conceptual models, it is likely that the specialized software each has been using will not accept the encoding scheme of the other.

To illustrate building consensus through alternate visual models, the next two sections present sample applications developed in CELLAR. The first is from the domain of corpus linguistics; the second concerns textual criticism.

5. An example from corpus linguistics

The first example deals with the treatment of grammatical tagging in a tagged text corpus. Some of the debate in this domain has centered around what set of tags to use and whether it is better to use full feature structures rather than simple tags to represent the grammatical analysis of wordforms (see, for instance, Leech 1993:276-277, 280). The text used in the accompanying screen shots (see appendix 1) is taken from the British National Corpus. The feature analysis of the BNC tag set is taken from the feature structures chapter of the TEI Guidelines (Langendoen 1994). Both the text and the feature analysis were loaded into CELLAR from TEI-conformant SGML files.

5.1 The conceptual model

The conceptual model for the CELLAR implementation is shown in figure 1.1 (in appendix 1). The rectangles represent classes of objects. The class name is given at the top of the rectangle; the attributes are listed inside. When nothing follows the name of an attribute, its value is a simple string. An ellipsis following an attribute name means that its value is a complex object, but the detail is not shown. Arrows indicate that the attribute value is an instance of another class. A single-headed arrow means that there is only one value; a double-headed arrow indicates that a sequence of values is expected. Solid arrows represent "owning" attributes; these model the part-whole hierarchy of larger objects composed of aggregates of smaller objects. Dashed arrows represent "reference" attributes; these model the network of relationships that hold between objects. In a CELLAR knowledge base, every object is owned by exactly one object of which it is a part, but may be referred to by an arbitrary number of other objects to which it is related.
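The owning/reference distinction can be made concrete with a small Python sketch. It is an illustration under assumptions, not CELLAR's implementation: the class `KBObject` and its methods `own` and `refer` are hypothetical. Ownership is enforced as single-valued, while references accumulate back pointers.

```python
class KBObject:
    """Hypothetical knowledge-base object with owning and reference attributes."""

    def __init__(self, name):
        self.name = name
        self.owner = None     # back pointer along the single ownership link
        self.referrers = []   # back pointers from objects that refer to this one
        self.parts = {}       # owning attributes: name -> list of owned objects
        self.refs = {}        # single-valued reference attributes: name -> target

    def own(self, attr, part):
        """Add part to an owning attribute; an object may have only one owner."""
        if part.owner is not None:
            raise ValueError(f"{part.name} is already owned by {part.owner.name}")
        part.owner = self
        self.parts.setdefault(attr, []).append(part)
        return part

    def refer(self, attr, target):
        """Set a reference attribute; any number of objects may refer to one target."""
        old = self.refs.get(attr)
        if old is not None:
            old.referrers.remove(self)  # keep back pointers consistent on reassignment
        self.refs[attr] = target
        target.referrers.append(self)
        return target


corpus = KBObject("corpus")
vvd = corpus.own("tagSet", KBObject("VVD"))
for form in ("spent", "went"):
    KBObject(form).refer("tag", vvd)
print(vvd.owner.name, [r.name for r in vvd.referrers])  # corpus ['spent', 'went']
```

A second attempt to own `vvd` raises an error, mirroring the rule that every object has exactly one owner, whereas the two word objects can share it by reference.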

Beginning at the left edge of figure 1.1, the conceptual model for our corpus linguistics example has TaggedTextCorpus as its top-level object. The text attribute contains a number of Texts which contain a number of Divisions which contain a number of Paragraphs which contain a number of Segments which contain a number of TaggedWordForms. A Division also has a heading which is a single Segment. Segments have the attribute n which stores an identifying name or number for the segment (which generally corresponds to an orthographic sentence). This model of text structure is essentially that of the TEI.

A TaggedWordForm stores its form as a string, but stores its tag as a reference to a FeatureStructure. A FeatureStructure is the object that represents a grammatical tag. The BNCtag attribute stores a string which is the three-letter tag used by the British National Corpus. The BrownTag attribute gives the equivalent for this tag in the coding system of the Brown Corpus (Kucera and Francis 1967). The complete set of FeatureStructures is stored in the tagSet attribute of the TaggedTextCorpus. Note that one instance of each tag (as a FeatureStructure) is owned by the corpus and then referred to by each TaggedWordForm that uses the tag. This achieves a normalization of the database in that the specification of each tag (including, for instance, how its BNCtag is spelled) occurs only once in the database. Changing the specification of the tag in that one place will change it in every use (that is, reference to it) throughout the tagged corpus.

Each tag (as a FeatureStructure) also has a featureSpecification attribute. This gives the analysis of what the tag means as a set of feature-value pairs. Any tag set is based (whether implicitly or explicitly) on a feature system. This model of a TaggedTextCorpus states that the corpus also has a featureSet. This owns a set of Features which are all the possible features that can be used in the specification of a tag. Each Feature has a name and a set of possible values. The values attribute stores a set of FeatureValues. Each FeatureValue has an id (used in parsing SGML files), an abbreviation, and a fullName. Note that we again achieve a normalization by having a single instance of each possible FeatureValue owned by the Feature of which it can be a value, and then referring to it from the FeatureStructure for every tag that has it in its feature specification. Note, too, that every pointer in the CELLAR knowledge base, whether owning or referring, has an automatically maintained back pointer. Thus the feature specification need only point to its FeatureValues and not also to its Features, since each FeatureValue can easily produce its Feature by following the ownership link backwards.
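The normalization and the back-pointer lookup described in the last two paragraphs can be sketched in Python as follows. This is a hypothetical analogue, not CELLAR code; the feature names and values are invented for illustration and are not the TEI feature analysis of the BNC tag set.

```python
class Feature:
    def __init__(self, name):
        self.name = name
        self.values = []  # owning attribute: the possible FeatureValues

    def add_value(self, value_id, abbreviation, full_name):
        value = FeatureValue(value_id, abbreviation, full_name, owner=self)
        self.values.append(value)
        return value


class FeatureValue:
    def __init__(self, value_id, abbreviation, full_name, owner):
        self.id = value_id
        self.abbreviation = abbreviation
        self.full_name = full_name
        self.owner = owner  # back pointer to the owning Feature

    @property
    def feature(self):
        # Not stored separately: recovered by following ownership backwards.
        return self.owner


class FeatureStructure:
    def __init__(self, bnc_tag, specification):
        self.bnc_tag = bnc_tag
        self.specification = specification  # references to shared FeatureValues

    def pairs(self):
        return [(v.feature.name, v.full_name) for v in self.specification]


class TaggedWordForm:
    def __init__(self, form, tag):
        self.form = form
        self.tag = tag  # reference into the corpus tag set, not a private copy


# Hypothetical feature system: names and values are illustrative only.
category = Feature("category")
tense = Feature("tense")
verb = category.add_value("v", "V", "verb")
past = tense.add_value("pst", "PST", "past")

tag_set = {"VVD": FeatureStructure("VVD", [verb, past])}  # one instance per tag
words = [TaggedWordForm("spent", tag_set["VVD"]), TaggedWordForm("went", tag_set["VVD"])]

print(words[0].tag.pairs())  # [('category', 'verb'), ('tense', 'past')]
tag_set["VVD"].bnc_tag = "VVD-past"  # edit the tag in one place...
print([w.tag.bnc_tag for w in words])  # ...every use sees it: ['VVD-past', 'VVD-past']
```

Because both words hold a reference to the same FeatureStructure, the single edit is visible everywhere, and each FeatureValue finds its Feature without storing a second pointer.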

5.2 The visual models

Appendix 1 gives a series of 22 screen shots which demonstrate a CELLAR implementation of this conceptual model of tagged texts. Figure 1.2 shows the normal view of the text that is used in this series of examples, "Memoirs of a Dog Shrink." The CELLAR tool offers many possible ways of viewing the same text; the View menu is used to select these options (figure 1.3). The "Select Display Options" command brings up a dialog that sets view options (figure 1.4). Choosing "Display words as subscripted tags" causes the grammatical tag associated with each word to be displayed as a subscript in a smaller type size (figure 1.5). Each tag that appears in the display is actually a minimal view of a FeatureStructure object; clicking on the tag launches a small window which displays a full view of its information content, giving both its full feature specification and the equivalent Brown Corpus tags. Figure 1.6 is the result of clicking on the VVD tag following spent.

Going back to the "Select Display Options" dialog we can also control whether or not the sentence numbers are displayed (figure 1.7). Figure 1.8 shows a view of the text in which the BNC's standard id numbers are displayed at the beginning of each text Segment. "Select Display Options" also allows us to display the words with their tags as interlinear annotations (figure 1.9). Figure 1.10 shows what this view looks like. "Select Display Options" also gives us control over whether we see the BNC tags or their Brown Corpus equivalents (figure 1.11). Figure 1.12 shows the interlinear format with Brown tags. Note that when a single BNC tag has more than one equivalent in the Brown tag set, all the possible Brown tags are shown separated by hyphens. Even when displaying Brown tags, clicking on a tag still launches a full view of the feature structure. Figure 1.13 shows that clicking on the VBD tag beneath spent produces the same feature structure as clicking the BNC tag in figure 1.6.
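The key point in these display options is that the choice between BNC and Brown tags belongs to the tag's own view, not to the stored data. A minimal Python sketch under assumptions: `form_TAG` is a plain-text stand-in for a subscripted tag, the function names are hypothetical, and the AT0 mapping is an invented multi-tag example (only VVD/VBD comes from the figures cited above).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FeatureStructure:
    bnc_tag: str
    brown_tags: List[str]  # a BNC tag may have several Brown equivalents

    def tag_button(self, use_brown: bool) -> str:
        """The label shown for this tag under the current display option."""
        return "-".join(self.brown_tags) if use_brown else self.bnc_tag


@dataclass
class TaggedWordForm:
    form: str
    tag: FeatureStructure


def show_text(words, show_tags=True, use_brown=False):
    # "form_TAG" is a plain-text stand-in for a subscripted tag.
    if not show_tags:
        return " ".join(w.form for w in words)
    return " ".join(f"{w.form}_{w.tag.tag_button(use_brown)}" for w in words)


# Illustrative mapping only; the AT0 entry is a hypothetical multi-tag example.
VVD = FeatureStructure("VVD", ["VBD"])
AT0 = FeatureStructure("AT0", ["AT", "ATI"])
words = [TaggedWordForm("spent", VVD), TaggedWordForm("a", AT0)]

print(show_text(words, show_tags=False))  # spent a
print(show_text(words))                   # spent_VVD a_AT0
print(show_text(words, use_brown=True))   # spent_VBD a_AT-ATI
```

The same objects back all three outputs; only the view parameters change, which is why the Brown display still leads to the same feature structure.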

The final option for displaying the words of the text is to view each as a feature structure (figure 1.14). Figure 1.15 shows the result. Each word is displayed as a feature structure containing the wordform as the value of the form feature, plus the feature specification for the grammatical analysis associated with the word's tag.

The View menu contains a second command, "Export to SGML" (figure 1.16). From CELLAR's perspective, putting information into an interchange format is just another view of the information. The "Export to SGML" command brings up a dialog that offers three choices (figure 1.17). All three options are TEI-compatible formats; they differ with respect to the treatment of the tags. The first choice (figures 1.18 and 1.19) outputs the text without any grammatical tags. The second choice (figures 1.20 and 1.21) outputs a tag as an in-line entity reference immediately following each word. The third choice (figures 1.22 and 1.23) outputs each word as the content of a <w> element; the associated grammatical tag is output as the value of a tag= attribute on <w>.

5.3 A look behind the scenes

This section gives a look behind the scenes to show how some of the behavior demonstrated in the preceding section is actually implemented in CELLAR. We will look into the implementation of the TaggedWordForm class since this is where most of the action is. For instance, it turns out that all other classes have only a single view definition for all the alternatives in the "Select Display Options" dialog; the basic differences are achieved by switching between different views of the TaggedWordForm class.

Programming in CELLAR begins with the definition of the conceptual model. Figure 1.1 gives a graphical representation of the conceptual model. The following is how the conceptual model for TaggedWordForm is actually expressed as source code in the CELLAR system:

class TaggedWordForm has
   owning    form   : String
   reference tag    : FeatureStructure
			 owned in tagSet of my corpus
   virtual   corpus : TaggedTextCorpus
			 means corpus of my owner

The definition of an attribute has four parts: its type (owning, reference, or virtual), its name, its signature (that is, the class of object it stores, points to, or returns), and additional information. For a reference attribute, the additional information may declare where in the knowledge base the objects to be pointed to are found. Looking at the conceptual model diagram in figure 1.1, we see that the FeatureStructures to which the tag attribute refers are owned in the tagSet attribute of the TaggedTextCorpus. We can find the TaggedTextCorpus object from any given TaggedWordForm by following the chain of ownership links backwards. The corpus virtual attribute is defined for this purpose. It is a virtual attribute because it returns a value like any other attribute, but the value is not actually stored in the object; it is computed on-the-fly when needed. It is defined as corpus of my owner; the corpus attribute is similarly defined on all the higher-level classes, except that on TaggedTextCorpus it is defined to mean self. Thus in searching for the top-level corpus object, each object passes the request up to its owner until the TaggedTextCorpus finally says, "Here I am."
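The recursive "corpus of my owner" definition maps naturally onto a computed property. The following is a Python sketch, not CELLAR: `Owned` is a hypothetical stand-in for Text, Division, Paragraph, and Segment, and the tag set holds a placeholder string instead of a real FeatureStructure.

```python
class Owned:
    """Hypothetical base for every object below the corpus in the ownership tree."""

    def __init__(self, owner):
        self.owner = owner

    @property
    def corpus(self):
        # Virtual attribute "corpus of my owner": computed on demand, never stored.
        return self.owner.corpus


class TaggedTextCorpus:
    def __init__(self):
        self.tag_set = {}  # owning attribute: tag name -> FeatureStructure

    @property
    def corpus(self):
        # On the top-level class the same attribute means "self": "Here I am."
        return self


class TaggedWordForm(Owned):
    def __init__(self, owner, form, tag_name):
        super().__init__(owner)
        self.form = form
        # Reference attribute "owned in tagSet of my corpus": look the tag up there.
        self.tag = self.corpus.tag_set[tag_name]


corpus = TaggedTextCorpus()
corpus.tag_set["VVD"] = "FeatureStructure for VVD"  # placeholder object
text = Owned(corpus)
division = Owned(text)
paragraph = Owned(division)
segment = Owned(paragraph)
word = TaggedWordForm(segment, "spent", "VVD")
print(word.corpus is corpus, word.tag)  # True FeatureStructure for VVD
```

Each lookup walks one ownership link per level, so no object needs to store a direct pointer to the corpus.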

Section 3 above speaks of the encoding model, which declares how objects of each class can be encoded in plain text files so that users can import data from external sources. The text for this example came from the British National Corpus in TEI-compatible SGML markup. To load data into CELLAR objects from an external text file, one writes a parser for each class involved, which tells how to map the contents of the text file onto the attributes of the objects. For instance, the tagged wordforms for the title of the text had the following format in the input file:

Memoirs&NN2; of&PRF; a&AT0; Dog&NN1; 

The following is the CELLAR parser for mapping the span of a text file that represents a TaggedWordForm into an instance of a TaggedWordForm:

enrich TaggedWordForm with
   parser BNC : build matching (
      '&' <tag=find(String.upTo(';'))> ';' ?blank )

The angle brackets enclose an expression that sets an attribute value. The find keyword instructs CELLAR to follow the owned in declaration for the reference attribute to find the object for which the matched string is the name. (This implementation depends on FeatureStructure defining a virtual attribute for name which means my BNCtag.) String.upTo(literal) is a built-in parser for class String which produces a string containing all the characters up to the next occurrence of the literal. '&' and ';' match those literal characters. The question mark signifies that the following element is optional; blank matches any number of spaces, tabs, or newlines.
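For readers more familiar with regular expressions, here is a rough Python analogue of the parser. It is a sketch under assumptions, not CELLAR's parser: `parse_bnc` and `TOKEN` are hypothetical names, the tag set is a plain dictionary standing in for `find` following the owned in declaration, and, unlike the CELLAR parser shown, which spells out only the tag portion, this sketch also captures the wordform before the `&`.

```python
import re

# One tagged wordform: the form, '&', the tag name up to ';', ';', then optional blanks.
TOKEN = re.compile(r"([^&\s]+)&([^;]+);\s*")


def parse_bnc(text, tag_set):
    """Parse 'form&TAG;' sequences into (form, tag) pairs.

    Tag names are resolved in tag_set, standing in for CELLAR's `find`.
    """
    text = text.strip()
    words, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"cannot parse input at offset {pos}: {text[pos:pos + 20]!r}")
        form, tag_name = m.groups()
        if tag_name not in tag_set:
            raise KeyError(f"unknown tag {tag_name!r}")
        words.append((form, tag_set[tag_name]))
        pos = m.end()
    return words


tags = {name: f"<FeatureStructure {name}>" for name in ("NN1", "NN2", "PRF", "AT0")}
for form, tag in parse_bnc("Memoirs&NN2; of&PRF; a&AT0; Dog&NN1; ", tags):
    print(form, tag)
```

Raising an error for an unknown tag name plays the role of the integrity check that the reference attribute's declaration provides.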

The four choices for "Display words" in the "Select Display Options" dialog are implemented by four different views of TaggedWordForm. The simplest, formOnly, displays only the form of the TaggedWordForm:

enrich TaggedWordForm with
   view formOnly : my form

The other views of TaggedWordForm show two pieces of information: the form and the tag. To handle a view with multiple components we need a way of specifying how they are to be laid out. CELLAR, following the lead of Donald Knuth's (1986) TeX system, builds a display as a structure of boxes within boxes. There are three kinds of grouping boxes: a row places its component boxes side by side; a pile places its component boxes one over the other; and a paragraph places its component boxes side by side until it reaches the limit of available space, at which point it starts another line of boxes below the first, and so on. The specification of the layout of a view is called a template in CELLAR.
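The three grouping boxes differ only in how they combine the sizes of their components. The following Python sketch computes box sizes under simplifying assumptions that are not CELLAR's: one character is one unit wide, one line is one unit high, and a `space=0` argument stands in for glue. The functions `leaf`, `row`, `pile`, and `paragraph` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    width: int
    height: int


def leaf(text: str) -> Box:
    # Hypothetical metrics: each character is one unit wide, each line one unit high.
    return Box(len(text), 1)


def row(boxes: List[Box], space: int = 1) -> Box:
    # Side by side: widths add (plus default spacing), height is the tallest component.
    # space=0 plays the role of glue, suppressing the default space.
    gaps = space * max(len(boxes) - 1, 0)
    return Box(sum(b.width for b in boxes) + gaps, max((b.height for b in boxes), default=0))


def pile(boxes: List[Box]) -> Box:
    # One over the other: heights add, width is the widest component.
    return Box(max((b.width for b in boxes), default=0), sum(b.height for b in boxes))


def paragraph(boxes: List[Box], limit: int, space: int = 1) -> Box:
    # Fill a line until the next box would overflow `limit`, then start a new line;
    # the finished lines are rows piled one over another.
    lines, current = [], []
    for b in boxes:
        if current and row(current + [b], space).width > limit:
            lines.append(row(current, space))
            current = []
        current.append(b)
    if current:
        lines.append(row(current, space))
    return pile(lines)


title = [leaf(w) for w in "Memoirs of a Dog Shrink".split()]
print(row(title))                          # Box(width=23, height=1)
print(paragraph(title, limit=12))          # Box(width=12, height=2)
print(pile([leaf("spent"), leaf("VVD")]))  # Box(width=5, height=2)
```

A paragraph is thus just rows stacked in a pile, which is why the three box kinds compose so freely in view templates.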

The subscriptedTag view is a row. It shows the form followed by the tag with no intervening space; glue is placed in the row template between the two elements to suppress the space that would otherwise occur by default. The tag is displayed using tagButton, which names the view to be used for the retrieved object. Since my tag retrieves a FeatureStructure, tagButton is a view of FeatureStructure. The tagButton view of FeatureStructure is what knows whether to display the BNC tag or the Brown equivalent, and it is what launches the "Feature Structure" window when the tag is clicked. The with keyword is used to express property settings that alter the details of formatting. In this case the text of the tag is set in a smaller type size (10 points) and given a superscript offset of negative 5 points, which lowers it below the baseline.

enrich TaggedWordForm with
   view subscriptedTag : 
      row showing (
	 my form, glue,
	 my tag using tagButton 
	    with text[size=10, superscript=-5])

The interlinearTag view is a pile. It shows the form over the tag. As before, the tag is shown using its tagButton view so that it can launch the "Feature Structure" window and be sensitive to the choice of BNC tags or Brown equivalents.

enrich TaggedWordForm with
   view interlinearTag : 
      pile showing (
	 my form,
	 my tag using tagButton with text.size=12 )
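As a rough plain-text analogue of the interlinearTag view (not CELLAR's renderer), each word can be treated as a pile whose width is that of its widest component, with the piles laid side by side. The sketch assumes fixed-width characters, uses a hypothetical `interlinear` function, and does not wrap lines as a paragraph box would.

```python
def interlinear(words, gap=2):
    """Lay out (form, tag) piles side by side: forms on one line, tags under them."""
    top, bottom = [], []
    for form, tag in words:
        width = max(len(form), len(tag))  # a pile is as wide as its widest component
        top.append(form.ljust(width))
        bottom.append(tag.ljust(width))
    separator = " " * gap
    return separator.join(top).rstrip() + "\n" + separator.join(bottom).rstrip()


print(interlinear([("Memoirs", "NN2"), ("of", "PRF"), ("a", "AT0"), ("Dog", "NN1")]))
```

Padding every component to its pile's width keeps each tag aligned under the start of its word.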

The featureStructure view is also a pile. It shows a row composed of "form =" followed by the form attribute placed over the pile that results from showing the FeatureStructure which is the value of my tag with its featurePile view. To get the square brackets which are conventionally displayed around a feature structure, this pile is placed inside of a frame which selects "bracket" as its border style and turns off the display of border segments above and below the enclosed box.

enrich TaggedWordForm with
   view featureStructure :
      frame showing (
	 pile showing (
	    row showing ('form =', my form),
	    my tag using featurePile) )
      with frame[borderStyle=bracket,
		 above=false, below=false]

The three formats for SGML export use the formOnly view and two new views: tagAsEntity and tagInW. Both are simply a row which glues together data strings and the appropriate literal strings. Note that my tag returns a FeatureStructure and not a printable string; these views use BNCtag of my tag to get the right string out of the FeatureStructure.

enrich TaggedWordForm with
   view tagAsEntity : 
      row showing (
	 my form, glue,
	 '&', glue, BNCtag of my tag, glue, ';' )

enrich TaggedWordForm with
   view tagInW :
      row showing (
	 '<w tag=', glue, BNCtag of my tag, glue, '>',
	 glue, my form, glue, '</>' )
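The three export formats amount to three word-level string templates applied to the same data. In this hypothetical Python sketch, each pair carries the BNCtag string directly (standing in for BNCtag of my tag), and the single space between words is an assumption about how the enclosing views lay them out.

```python
def form_only(form, tag):
    return form


def tag_as_entity(form, tag):
    # Form glued to the tag written as an entity reference, e.g. "Dog&NN1;".
    return f"{form}&{tag};"


def tag_in_w(form, tag):
    # Form as the content of <w>, tag as the value of tag=; "</>" is an SGML empty end-tag.
    return f"<w tag={tag}>{form}</>"


def export(words, view):
    """Apply one word-level view to every (form, BNC tag) pair, space-separated."""
    return " ".join(view(form, tag) for form, tag in words)


title = [("Memoirs", "NN2"), ("of", "PRF"), ("a", "AT0"), ("Dog", "NN1")]
for view in (form_only, tag_as_entity, tag_in_w):
    print(export(title, view))
```

The tag_as_entity output reproduces the input format shown earlier for the title, so export and import are two views of the same objects.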

These source code samples should serve to illustrate the nature of programming in CELLAR. A major aim of the project has been to develop a high-level programming system that makes it relatively easy to build new applications. The tagged text corpus application illustrated in figures 1.2 through 1.23 was implemented in about 20 hours; the text critical application (see next section) was similarly implemented in about 20 hours.

6. An example from textual criticism

The second example is from the domain of textual criticism. A point of debate in this domain has been whether the proper product of such research should be a diplomatic edition which presents an extant manuscript with critical comment, or should be an eclectic edition which reconstructs the editor's notion of the original text (see, for instance, Speer 1991).

In this example an electronic edition (or, following Faulhaber 1991, a "hyperedition") of a passage from the Second Epistle of Clement has been constructed; it records all variant readings attested in the three extant manuscript witnesses as well as the critical choices of four modern editions. The three manuscripts are the Codex Alexandrinus (5th century), the Codex Constantinopolitanus (11th century), and the Syriac Version (translated in the 8th century, with only one extant manuscript, from the 12th century); the sigla A, C, and S are used (respectively) for these three manuscripts. The four editions are Lightfoot (1890), the Loeb edition (Lake 1912), Bihlmeyer (1970), and Wengst (1984); the sigla L, Lb, B, and W are used (respectively) for these editions.

6.1 The conceptual model

The conceptual model for the CELLAR implementation of this critical text is given in figure 2.1. The graphic notation is explained above in section 5.1. Beginning at the left, a CriticalText has header information, a body, and a set of authorities. The body of the text is composed of Chapters, which are in turn composed of Verses. Both have the attribute n to store an identifying number. The content of a Verse is a sequence of two kinds of objects: Strings and TextVariations. A String is used for a span of text which is the same in all the manuscripts and editions. A TextVariation is used when there are variant readings for a span of text. A TextVariation has two attributes: a noteSymbol, which returns the letter that identifies the variation in the critical apparatus view, and the set of readings, which encode the variants. Each Reading stores a set of witnesses and its text, which may be either a simple String or a combination of Strings and embedded TextVariations.

The witnesses attribute of Reading is a reference attribute. It points to Authorities which are owned by the CriticalText itself. This achieves a normalization in which each Authority occurs only once in the database and is referred to by all Readings which use it. An Authority has three attributes: a siglum which represents it in displays of variant readings, a brief one-line description, and a lengthier source which gives a full bibliographic reference and other relevant background information. The conceptual model diagram shows two subclasses emanating from the bottom of the Authority class: Manuscript and Edition. These two classes have no unique attributes; they inherit all of their attributes from Authority. They are distinguished in the implementation by generating different views; for instance, the siglum for a Manuscript is always presented in bold type, while the siglum for an Edition is in normal type.
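Subclasses that add no attributes but change how an object presents itself are a familiar object-oriented pattern. A minimal Python sketch, assuming `**...**` as a plain-text stand-in for bold type and a hypothetical `siglum_view` method; the sigla and descriptions come from the list of authorities given above.

```python
class Authority:
    def __init__(self, siglum, description, source=""):
        self.siglum = siglum
        self.description = description  # brief one-line description
        self.source = source            # fuller bibliographic background

    def siglum_view(self):
        return self.siglum


class Manuscript(Authority):
    # No new attributes; only the view differs. **...** stands in for bold type.
    def siglum_view(self):
        return f"**{self.siglum}**"


class Edition(Authority):
    pass  # inherits both the attributes and the plain-type siglum view


authorities = [
    Manuscript("A", "Codex Alexandrinus (5th century)"),
    Manuscript("C", "Codex Constantinopolitanus (11th century)"),
    Manuscript("S", "Syriac Version"),
    Edition("L", "Lightfoot (1890)"),
    Edition("Lb", "Loeb edition (Lake 1912)"),
    Edition("B", "Bihlmeyer (1970)"),
    Edition("W", "Wengst (1984)"),
]
print(" ".join(a.siglum_view() for a in authorities))  # **A** **C** **S** L Lb B W
```

Any display that asks an authority for its siglum view gets the right typography without testing which subclass it holds.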

6.2 The visual models

Appendix 2 gives a series of 19 screen shots which demonstrate a CELLAR implementation of this conceptual model of a critical text. Figure 2.2 gives a view of the seventh chapter of Second Clement which shows all the textual variants as bracketed alternatives. Each reading is followed by a list of the sigla for its witnesses. Clicking on one of these sigla launches a view of that authority which provides a description of it (figure 2.3).

The View menu is used to select other possible views of the text (figure 2.4). The "Single reading" command generates a view which shows just one reading of the text; the example in figure 2.5 shows the text according to the Codex Alexandrinus. The underlined words, phrases, and dashes (which represent an omission) signify lemmas for which other authorities have a variant reading. Clicking on one of these lemmas launches a window which shows a view of all the variants (figure 2.6). From this embedded window as well it is possible to click on the siglum for an authority to get its full description (figure 2.7).

The "Select authorities" command in the View menu (figure 2.8) brings up a dialog that allows the user to choose a different authority to serve as the base text in the display. The example in figure 2.9 is selecting Lightfoot's edition as the authority to follow. The result (figure 2.10) shows the text following Lightfoot's edition.
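Following a chosen authority through the conceptual model of section 6.1 is a short recursive walk: at each TextVariation, take the Reading whose witnesses include that authority, descending into any nested variations. A Python sketch under assumptions: the verse is invented placeholder data (not the text of Second Clement), `single_reading` is a hypothetical name, `_..._` stands in for underlining, and `--` marks an omission.

```python
from dataclasses import dataclass
from typing import List, Set, Union


@dataclass
class Reading:
    witnesses: Set[str]      # sigla of the authorities that attest this reading
    text: List["Segment"]    # Strings and (possibly nested) TextVariations


@dataclass
class TextVariation:
    readings: List[Reading]


Segment = Union[str, TextVariation]


def single_reading(segments: List[Segment], authority: str, mark: bool = True) -> str:
    """Render the text as one authority reads it; lemmas are wrapped in _..._ when mark is set."""
    parts = []
    for seg in segments:
        if isinstance(seg, str):
            parts.append(seg)
            continue
        reading = next((r for r in seg.readings if authority in r.witnesses), None)
        if reading is None:
            raise ValueError(f"no reading attested by {authority}")
        inner = single_reading(reading.text, authority, mark) or "--"  # "--" marks an omission
        parts.append(f"_{inner}_" if mark else inner)
    return " ".join(parts)


# Invented placeholder verse (not the text of Second Clement) with two variation points.
verse = [
    "so then",
    TextVariation([Reading({"A", "L"}, ["let us"]), Reading({"C"}, ["we shall"])]),
    "strive",
    TextVariation([Reading({"A"}, []), Reading({"C", "L"}, ["well"])]),
]
print(single_reading(verse, "A"))              # so then _let us_ strive _--_
print(single_reading(verse, "L", mark=False))  # so then let us strive well
```

Switching the base authority is just a different argument to the same function, which parallels how the "Select authorities" dialog changes the display without touching the data.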

Another choice in the View menu, "Compare readings" (figure 2.11), allows the user to display a comparative view of the text according to two authorities. In the example in figure 2.12, the Codex Alexandrinus is selected as the base authority and Lightfoot's edition is selected as the second authority to compare to it. Figure 2.13 shows the resulting view. Underline signifies a lemma for which there are variants, but on which these two authorities agree. Brackets show the variant readings of the two authorities when they do not agree. Clicking on a bracketed variant launches a view of all the variants and their witnesses so that one can see how the other authorities compare (figure 2.14).
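The comparative view reduces to one test per variation: do the two authorities attest the same Reading? A Python sketch under assumptions, using invented placeholder data and hypothetical function names; `_..._` stands in for underlining and `--` for an omission.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Reading:
    witnesses: Set[str]
    text: list  # strings and nested TextVariations


@dataclass
class TextVariation:
    readings: List[Reading]


def reading_of(variation, authority):
    for r in variation.readings:
        if authority in r.witnesses:
            return r
    raise ValueError(f"no reading attested by {authority}")


def plain(segments, authority):
    # The text as one authority reads it, with no markup.
    return " ".join(
        s if isinstance(s, str) else (plain(reading_of(s, authority).text, authority) or "--")
        for s in segments)


def compare(segments, base, other):
    """Underline (_..._) lemmas where the two agree; bracket their readings where they differ."""
    parts = []
    for seg in segments:
        if isinstance(seg, str):
            parts.append(seg)
            continue
        rb, ro = reading_of(seg, base), reading_of(seg, other)
        if rb is ro:  # both authorities attest the same Reading object
            parts.append("_" + (compare(rb.text, base, other) or "--") + "_")
        else:
            parts.append(f"[{plain(rb.text, base) or '--'} {base} | "
                         f"{plain(ro.text, other) or '--'} {other}]")
    return " ".join(parts)


verse = [  # invented placeholder data, not the text of Second Clement
    "so then",
    TextVariation([Reading({"A", "L"}, ["let us"]), Reading({"C"}, ["we shall"])]),
    "strive",
    TextVariation([Reading({"A"}, []), Reading({"C", "L"}, ["well"])]),
]
print(compare(verse, "A", "L"))  # so then _let us_ strive [-- A | well L]
```

Recursing with `compare` inside an agreed reading lets nested variations still show disagreements below the level where the two authorities agree.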

Another choice in the View menu, "Text with apparatus" (figure 2.15), produces a conventional view of a textual edition with critical apparatus. When the base authority is a manuscript the result is a diplomatic edition (figure 2.16). When the base authority is an edition, the result is an eclectic edition (figure 2.17). In this implementation, the difference between a diplomatic edition and an eclectic edition boils down to what authority is chosen for the base text; both are produced from the same view definition applied to a common database.
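The claim that diplomatic and eclectic editions differ only in the chosen base authority can be checked with a small Python sketch. Assumptions: invented placeholder data, flat readings (no nested variations), letters `a`, `b`, ... standing in for the noteSymbol attribute (up to 26 notes), `^a` as a plain-text note marker, `om.` for an omitted reading, and a hypothetical `text_with_apparatus` function.

```python
from dataclasses import dataclass
from string import ascii_lowercase
from typing import List, Set


@dataclass
class Reading:
    witnesses: Set[str]
    text: List[str]  # flat text only, to keep the sketch short


@dataclass
class TextVariation:
    readings: List[Reading]


def text_with_apparatus(segments, base):
    """Base text with note letters, followed by one apparatus entry per variation."""
    text, notes = [], []
    for seg in segments:
        if isinstance(seg, str):
            text.append(seg)
            continue
        symbol = ascii_lowercase[len(notes)]  # stand-in for the noteSymbol attribute
        chosen = next((r for r in seg.readings if base in r.witnesses), None)
        if chosen is None:
            raise ValueError(f"no reading attested by {base}")
        text.append(f"{' '.join(chosen.text) or '--'}^{symbol}")
        entries = [f"{' '.join(r.text) or 'om.'} {' '.join(sorted(r.witnesses))}"
                   for r in seg.readings]
        notes.append(f"{symbol} " + "; ".join(entries))
    return "\n".join([" ".join(text)] + notes)


verse = [  # invented placeholder data, not the text of Second Clement
    "so then",
    TextVariation([Reading({"A", "L"}, ["let us"]), Reading({"C"}, ["we shall"])]),
    "strive",
    TextVariation([Reading({"A"}, []), Reading({"C", "L"}, ["well"])]),
]
print(text_with_apparatus(verse, "A"))  # diplomatic: manuscript A as base
print(text_with_apparatus(verse, "L"))  # eclectic: Lightfoot's edition as base
```

Only the first line changes between the two calls; the apparatus entries are identical because both come from the same data.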

Finally, the View menu provides an "Export" command for dumping the data into an SGML format for interchange (figure 2.18). The export view has been implemented to produce markup that follows (at least in spirit) the TEI guidelines. Figure 2.19 shows the beginning of the marked-up output, which contains the witness list. Figure 2.20 shows the encoding of the text itself, with <app> and <rdg> tags representing the TextVariation and Reading objects, respectively. The Greek text is encoded in TLG beta code.
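An export view can be sketched as one more rendering of a hypothetical model of this kind. The element and attribute names used here (<witList>, <witness sigil=...>, <app>, <rdg wit=...>) are loosely modeled on the TEI critical-apparatus tags but are assumptions of this sketch, not a reproduction of the markup CELLAR produced. Conversion of Greek into TLG beta code is omitted, and the data are invented.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Union
from xml.sax.saxutils import escape


@dataclass
class Reading:
    text: str                  # "" stands for an omission
    witnesses: FrozenSet[str]  # sigla of the authorities attesting this reading


@dataclass
class TextVariation:
    readings: List[Reading]


Item = Union[str, TextVariation]


def export_witness_list(witnesses: Dict[str, str]) -> str:
    """Emit one <witness> element per authority, keyed by its siglum."""
    entries = "".join(
        f'<witness sigil="{siglum}">{escape(desc)}</witness>'
        for siglum, desc in witnesses.items())
    return f"<witList>{entries}</witList>"


def export_text(items: List[Item]) -> str:
    """Emit invariant text as-is and each TextVariation as an <app> element
    containing one <rdg> per Reading, with its witnesses in the wit attribute."""
    parts = []
    for item in items:
        if isinstance(item, str):
            parts.append(escape(item))
            continue
        rdgs = []
        for r in item.readings:
            wit = " ".join(sorted(r.witnesses))
            rdgs.append(f'<rdg wit="{wit}">{escape(r.text)}</rdg>')
        parts.append("<app>" + "".join(rdgs) + "</app>")
    return " ".join(parts)


# Invented toy data with hypothetical authorities X, Y, and E.
text = [
    "so then",
    TextVariation([Reading("brethren", frozenset({"X", "E"})),
                   Reading("", frozenset({"Y"}))]),
    "let us",
    TextVariation([Reading("love", frozenset({"X"})),
                   Reading("honour", frozenset({"E", "Y"}))]),
]
print(export_witness_list({"X": "Manuscript X", "E": "Edition E"}))
print(export_text(text))
```

An omission is exported here as an empty <rdg>, which keeps every Reading object in the model matched by exactly one element in the output.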

7. Conclusion

Conceptual modeling of problem domains is a technology that can lead to consensus among researchers working in that domain. When such a consensus is implemented computationally, it can in turn lead to seamless interchange of information among scholars. Indeed, the Text Encoding Initiative is attempting to do just this for a broad range of domains in literary and linguistic research. But conceptual models alone do not seem to be enough. As the sample applications implemented in CELLAR illustrate, the real key to achieving consensus appears to be providing many alternative visual models for a common underlying conceptual model.


I am indebted to many for making the research reported herein possible. Terry Langendoen and Dominic Dunlop assisted materially in providing the data files for the tagged text example. Robin Cover has played a crucial role as provider of inspiration, tutelage, and source materials for the example from textual criticism, a domain that falls outside my own area of expertise. John Thomson has collaborated in the design of CELLAR ever since the idea was conceived late in 1987. He has also led the team of Smalltalk programmers who began working in 1990 to implement CELLAR. Members of this team (listed with length of service) have included: Larry Waswick (4 years), Sharon Correll (3 years), Marc Rettig (2 years), Jim Heimbach (9 months), and Nathan Miles (6 months). The development of CELLAR has been funded in large part through a grant from Wycliffe Bible Translators (Huntington Beach, CA).

References

Bihlmeyer, Karl. 1970. Die Apostolischen Vaeter, Erster Teil. Zweiter Klemensbrief, pages 71-81. Tuebingen: J. C. Mohr.

Borgida, Alexander. 1985. Features of languages for the development of information systems at the conceptual level. IEEE Software 2(1):63-72.

Faulhaber, Charles B. 1991. Textual criticism in the 21st century. Romance Philology 45(1):123-148.

ISO. 1986. Information processing: text and office systems: Standard Generalized Markup Language (SGML). ISO 8879:1986(E). Geneva: International Organization for Standardization, and New York: American National Standards Institute.

Knuth, Donald E. 1986. The TeXbook. Volume A of Computers & Typesetting. Reading, MA: Addison-Wesley Publishing Co.

Kucera, Henry and W. Nelson Francis. 1967. Computational analysis of present-day English. Providence, RI: Brown University Press.

Lake, Kirsopp. 1912. The Apostolic Fathers with an English translation by Kirsopp Lake. The second epistle of Clement to the Corinthians, volume 1, pages 123-163. Loeb Classical Library. Cambridge: Harvard University Press.

Langendoen, D. Terence. 1994. Feature structures. Chapter 18 of Sperberg-McQueen and Burnard (1994).

Leech, Geoffrey. 1993. Corpus annotation schemes. Literary and Linguistic Computing 8(4):275-281.

Lightfoot, J. B. 1890. The so-called second epistle of S. Clement. The Apostolic Fathers: Clement, Ignatius, Polycarp (2nd edition), part 1, volume 2, pages 191-268. Macmillan. (Reprinted 1989 by Hendrickson Publishers, Peabody, MA.)

Rettig, Marc, Gary Simons, and John Thomson. 1993. Extended objects. Communications of the ACM 36(8):19-24.

Simons, Gary. Forthcoming. The nature of linguistic data and the requirements of a computing environment for linguistic research. To appear in John Lawler and Helen Dry (eds.), Computing and the Ordinary Working Linguist. Mouton de Gruyter.

Speer, Mary B. 1991. Editing Old French texts in the eighties: theory and practice. Romance Philology 45(1):7-43.

Sperberg-McQueen, C. M. and Lou Burnard. 1994. Guidelines for the encoding and interchange of machine-readable texts. Text Encoding Initiative, document number TEI P3. Sponsored by Association for Computers and the Humanities, Association for Computational Linguistics, and Association for Literary and Linguistic Computing.

Wengst, Klaus. 1984. Didache (Apostellehre). Barnabasbrief. Zweiter Klemensbrief. Schrift an Diognet. Pages 205-280. Munich: Koesel-Verlag.

Appendix 1: Demonstrating a tagged text application

Appendix 2: Demonstrating a critical text application

Document date: 31-Oct-1995