The Use of a Book of Photos in Initial Comprehension Learning
by Greg Thomson

© 1989 Greg Thomson. Used by permission.



This essay by Greg Thomson tells you how to set up and use a photo book for second language acquisition. It describes the kind of pictures to take and how you can use them in various ways with a speaker of another language to learn to understand a variety of vocabulary and grammatical structures. This information, in connection with other writings by Greg Thomson, can help you quickly develop your listening comprehension in a second language.

Setting up the photo book
Nouns, transitive subjects and objects
Other basic sentence types, locations, instruments
Summary of first week
More complex structures
You can't get everything by one method
Unstructured use
Monolingual use
Later use, and better planned photo books
Back Matter

Setting up the photo book

There are many ways in which a photo book could be set up. Furthermore, different photo books could be designed for different purposes. The set-up which I will describe here is that of a photo book for use from the first day of language learning. It will be especially useful during the first four to six weeks, as the learner attempts to acquire a recognition vocabulary of over 1,000 words, and enough grammatical structure for minimal communicative functionality. The photo book should be used in conjunction with other communicative techniques, such as Total Physical Response.

The photo book on display at this conference contains about a hundred photos. It took about two hours to take the pictures, and another hour or two to arrange them in the scrap-book.

The aim in collecting the photos was that each would contain one or more humans as main characters, and that the humans would, in most cases at least, be particularly involved with either another human, or with a non-human object (typically the latter). This would set the stage for simple transitive sentences.

Nouns, transitive subjects and objects

First pass: Identifying humans in the pictures

Only the more generic nouns are used: man, men, woman, women, boy, girl, people, children, and so on. One statement per picture is plenty. Helper says: “This is a man; this is a woman; this is a man; this is a woman; this is a boy; this is a man; this is a boy and a man; this is a girl; this is a boy and a girl; this is a boy and a girl and a woman and a man; these are some children; these are some men; these are some men and some women.”

Second pass: Identify objects which are especially associated with the people in the pictures

Helper says: “This is a pail; this is a shovel; show me the pail; show me the shovel; this is a cart; show me the shovel; show me the cart; this is a bicycle; show me the pail; show me the bicycle; show me the cart; show me the shovel; show me the cart; show me the pail; the bicycle; the pail; the cart....”

Drill on each page until the vocabulary is familiar. The helper will need to spend adequate time on each page.
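The pattern in the helper's speech above (name one new object, then keep asking for objects already named) can be sketched as a small prompt generator. This is only an illustrative sketch: the function name `drill_prompts`, the `reviews_per_item` setting, and the fixed random seed are assumptions made for the example, and the prompts are in English only as a stand-in for what the helper would say in the target language.

```python
import random

def drill_prompts(items, reviews_per_item=3, seed=0):
    """Return helper prompts that introduce items one at a time with review.

    Each new item is named once ("This is a ..."), then followed by
    "Show me the ..." prompts: first for the new item, then for items
    picked at random from everything introduced so far.
    """
    rng = random.Random(seed)  # fixed seed so a session plan is repeatable
    introduced = []
    prompts = []
    for item in items:
        introduced.append(item)
        prompts.append(f"This is a {item}.")
        # Ask for the newest item once, so it is tested straight away.
        prompts.append(f"Show me the {item}.")
        # Then mix in review of anything introduced so far.
        for _ in range(reviews_per_item - 1):
            prompts.append(f"Show me the {rng.choice(introduced)}.")
    return prompts

# Hypothetical page: the four objects named in the example above.
page_items = ["pail", "shovel", "cart", "bicycle"]
for line in drill_prompts(page_items):
    print(line)
```

Raising `reviews_per_item` is the simplest way to model the heavy repetition the method calls for. A real helper would of course vary the wording and respond to the learner's hesitations rather than follow a fixed list.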

Third pass: Simple transitive sentences

It may turn out that, strictly speaking, one ends up with something other than transitive sentences. Use only a handful of transitive verbs: have, use, touch, see are good possibilities.

Learner answers “true” or “false” in his or her own language: “This man has a shovel. This boy has a pail. These people have chairs. This man is touching a pail. This woman is looking at a pail.”

If relative clauses are transparent enough, this pass might even include: “Show me the man who is using the pail. Show me the man who is not using the pail.” The language learner is now comprehending sentences with subjects, objects, and verbs.

Other basic sentence types, locations, instruments

Fourth pass: Lots of verbs

The comprehension of simple transitive sentences is a good foundation. The next steps build on this foundation. As with the non-human nouns in the second pass, learning a wider variety of verbs from the photos requires spending enough time on each page for vocabulary to become familiar. At this point both intransitive and transitive verbs should be included.

“This man is sitting. This woman is standing. Who is sitting?” (Language learner answers by pointing.) “Show me who is standing. This man is working. This woman is resting. Show me who is standing. Show me who is not standing. Show me who is resting. Show me who is working. Is this man standing? Is this woman sitting?” (Learner answers by a nod of the head, or in the contact language.)

Important note: in my experience the language helper always finds it hard to believe the amount of repetition that is needed, and so must be trained to provide adequate repetitions, and frequently reminded.

Fifth pass: Existential sentences, more nouns, locations, instruments

At this point the learner may recognize many dozens of words, mostly nouns and verbs. There are potentially a very large number of nouns lurking in these photos. Existential sentences and location phrases provide natural contexts for using these nouns. The “true or false” method can be used. Yes/no questions can be asked. Note that answering in the target language can dramatically decrease retention. In most cases it is best to answer in the contact language. Also, “Show me...”, “Point to....”, “Touch...” imperatives can be used. The language helper's instructions may require the learner to turn to a different page to search for the appropriate photo.

“In this picture (on this page) there is a man (true or false).” “He is sitting on a bench.” “There is a bicycle in front of him.” “Show me a window.” “Touch the ground in this picture.” “Show me a man who is working with a shovel.”

New vocabulary should be introduced gradually, with lots of repetition, of course. A good goal in the early weeks is a daily increase in “recognition vocabulary” of thirty items.
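As a rough check, the goal of thirty new items a day can be set against the earlier target of a recognition vocabulary of over 1,000 within four to six weeks. The sketch below does that arithmetic. The number of session days per week is not stated in this paper, so the values tried here are assumptions, and the function name `weeks_to_target` is purely illustrative.

```python
import math

def weeks_to_target(target_words, words_per_day, session_days_per_week):
    """Weeks needed to reach a recognition vocabulary at a steady daily rate."""
    days = math.ceil(target_words / words_per_day)  # whole session days required
    return days / session_days_per_week

# Assumed schedules: five, six, or seven session days per week.
for days_per_week in (5, 6, 7):
    weeks = weeks_to_target(1000, 30, days_per_week)
    print(f"{days_per_week} session days/week: about {weeks:.1f} weeks")
```

At thirty items a day, 1,000 items take 34 session days, which falls inside the four-to-six-week window only if sessions are held on six or seven days a week. On a five-day schedule the same rate takes closer to seven weeks.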

Summary of first week

We have covered about a week's use of the picture book. During the same week language sessions will include physical response drills as well, also following a grammatical syllabus, using real props. The learner will spend perhaps an hour preparing for each session, one or two hours in the session, two or three hours reviewing with the tape recorder, and an hour may be left for making notes of a more analytical nature.

Going on--emphasis still on simplex sentences

Now we will no longer talk about individual passes through the photo book. The further grammatical items we will discuss could be covered in various ways and in various orders.

Pronominal categories; agreement categories

Pictures lend themselves most naturally to third person sentences, singular and plural. However, given that the photo book is being used in an interactional context, it is easy to involve communication using first and second person sentences.

“I am touching a pail, and you are touching a shovel (true/false)”. “Can you see an old man? Can I see an old man?” (Learner and helper have agreed to look at separate pages--the learner looks at the teacher's page only in order to answer the question).

It is necessary to have two learners, or a volunteer native speaker in order to most effectively include second person plural and first person plural, inclusive and exclusive. Having a third person in the session, either a fellow learner or another native speaker, provides a number of important advantages. (One learner with whom I worked added imaginary characters to the communication environment by taping papers to the wall around the room. This allowed a lot of flexibility as to who and what they were.)

Further practice with first and second person can come from the helper talking about what the people in the photos might be thinking. This will be especially helpful when combined with tense/aspect distinctions below.

When I speak of agreement categories, I am primarily thinking of gender at this point, and will use masculine and feminine to illustrate. In Urdu, for example, the pronouns do not mark gender, but gender agreement is shown in verbs and in many adjectives.

Learner and helper look at a page of photos together. It is important that there be more than one possible picture to which a statement could conceivably apply, with gender being the decisive factor in determining which picture the helper is referring to. The learner responds by pointing. “He/she is sitting.” “He/she/they are near a baby.” “He/she is tall/short.” “It (object) is red (masculine/feminine).”

As with all of these exercises, the learner can sit down after the session with photo book and tape-recording and respond to the tape as he or she responded to the helper. This is especially helpful in cases where the helper is available for only a limited time each day. It also can spare the helper from boredom and frustration.

Tense/aspect

There will be one or two tense or aspect or tense/aspect combinations which are natural in reference to the photos. The learner will have had much exposure to these by this point. It is easy to learn to recognize a variety of tense/aspect combinations in conjunction with the photo book. While following a grammatical syllabus in this way, we always have in mind the learner's minimal needs for communicative functionality by the third month, using his or her “personal pijin”. In this regard, although there may be a wide range of tense/aspect distinctions, the first essential minimum is that the learner be able to speak of what has happened, what is happening, or what will happen. The sentences can relate to what is happening in the pictures and the surrounding (preceding and following) circumstances, or they can relate to what the teacher or learner is doing or is about to do. Frames such as “When this picture was taken”, “Before this picture was taken”, “After this picture was taken” are helpful.

“Before this, this man folded the cloth.” “After this the man will spread the cloth.” If the man is ironing cloth, then for both of these statements the correct response would be “false”, since he would have first spread the cloth, and later folded it. “I am going to touch a man who is working... Did I do what I said?” “I just touched someone old.”

Constituents of noun phrases

Much of this sort of thing will come up without special effort to include it. Also, not everything has to be covered by this method. For instance, physical response methods work far better for learning numerals. By using currency, it is possible to cover all the numbers from one to whatever in communicative exercises.

“All of the people in this picture are eating.” “In this picture there are three small boys and three large boys.” “Someone gave this man a cup of tea.” “This lady is cleaning some rice.”

Negation, questions, commands, modality, voice

Sentence negation will have occurred without any planning during the first week. Along with this will be words for “no”. Also we have already been using yes/no questions. Content questions, and tag questions (or functionally similar constructions) can be handled easily.

“Who/what is he looking at?” (answers should be in the contact language). “Where is he sitting?” “What is he sitting on?” “What is she working with?” “This man is sitting in front of some firewood, right?”

There will have been much experience with commands by this point, especially since it is assumed that the photo book is being complemented by physical response drills. The learner should not miss the opportunity to begin learning about politeness phenomena. Requests and orders may involve various non-imperative forms. Role play of different relationships during sessions, e.g. mother and child, teacher and student, citizen and governor, etc., allows variety in regard to such phenomena.

Modality refers to probability, possibility, certainty and such things, as in “Perhaps this man might possibly be making shoes.” The construction of relevant exercises is left as an exercise.

The main voice we are typically concerned with is passive. There will be plenty of active verbs! Probably the most common use of passive forms involves avoiding mention of the agent. “This tree was cut down.” Complications are typically in the verbal morphology. If the learner learns to understand agentless passives, later learning about adding agents, if that is allowed, will probably be fairly trivial. Using frames such as “When this picture was taken...” may give lots of exposure to passive forms.

Coordination, and related phenomena

Verbs may be conjoined. However, there may be constructions in which there is a string of subordinated events, with a single event marked as the main one. (“He is sitting on a bench and eating an apple” could conceivably take the learner in various directions grammatically, depending on the language.) At the phrase level, nouns can be conjoined as subject, object, location, instrument, etc. Sentences can be linked by “neutral” conjunctions.

Other NP business

A close to maximal noun phrase might be something like “all of those seven frightened children”. Languages differ in what they allow as minimal noun phrases. The learner has enough resources after a couple weeks or a month to experiment with using “all”, “all those”, “all seven”, “those frightened”, “those”, and NULL as possible noun phrases (subject position is best). Once the helper has the idea of shortening noun phrases he can go through the picture book using the range of possibilities the language allows.

If the helper is told to tell a three or four sentence “story” using vocabulary he thinks the learner knows, in relation to each picture, this will allow the learner to observe “definiteness” phenomena and anaphoric reference, as well as features which may distinguish topical from non-topical noun phrases. Remember, the goal is comprehension, not analysis, so these things do not need to be mastered. However, it is good for the learner to be able to check off these sorts of things from the grammatical checklist, meaning that exposure to them is occurring in communication, with good comprehension.

Finally, generic nouns can be used as in “This man likes to drink tea. This man raises cattle.”

Noun roles

We are more interested in the oblique roles here, since the roles typically filled by subjects and objects will have been exemplified early on.

“With whom is this man talking?” (Interlocutor). “On the next page I will touch beside a lady who is making bread for her husband.” (Beneficiary). “Show me a boy who is standing with his father.” (Company). “The man in this picture is traveling by cart.” (Means). “These people are walking to the well.” (Destination). “They are probably coming from their house.” (Source).

More complex structures

Many complex sentence forms will be used right along. In using the checklist, the goal is to make sure to cover those which have not already come up.

If pictures are discussed in regard to what the people in them are thinking or wanting, or what they might say about what they are doing, this will illustrate complement clauses.

“He is going to tell these children to go away” (Embedded command). “He is thinking, 'Why is this man taking my picture?'” (Embedded statement). “This person does not know what he is drinking.” (Embedded question). Embedded questions may well provide another frame for providing exposure to all of the question words.

An example of a perceptual verb with a complement would be “This man could not see me taking his picture.” A stative subject clause would be “It is good that this woman is making bread for her family.” A modal main predicate would be “It is certain that this man will buy some vegetables.”

Relative clauses have also been used in early examples. At some point the learner will want to see that at least subject relatives, object relatives, and oblique relatives have been covered.

“On this page there are three people who are making things” (Subject relative). “Show me someone whom a child is pushing” (Object relative). “Show me a window in front of which no one is standing” (Oblique relative).

There are also alternatives to relative clauses (e.g. participles). Many languages have expressions such as “eating bread” in “A man eating bread was sitting across the street.” In such cases the verb is used adjectivally. If this is a common sort of expression, it should be easy to get the language helper to construct sentences with participles modifying the main character in each photo.

Finally, we will consider examples of clauses that are commonly labelled adverbial. As with tense/aspect distinctions, our goal is not necessarily to cover everything at this point in language learning. The goal is for the learner to have enough resources for functional conversational ability by the third month or so. Languages are often over-equipped for such a minimal need. Reason clauses, for example, may have a form like “He was angry over my coming.” But there may be other options such as, “He was angry because I came.” The portions “over my coming” and “because I came” are examples of what we refer to as “small adverbial clauses”. The learner needs to become familiar with some pattern for expressing the reason relation. By the end of the first year he or she will be familiar with all of the major patterns for such things.

“This woman is making bread because she needs to feed her family” (reason); “Because this lady's children are thirsty and she needs to wash clothes, therefore she is carrying water” (reason and result); “This woman is making bread in order to feed her family” (purpose); “This woman is taking care of her family by making bread” (means); “Getting up and walking across the street, this man got on his motorcycle” (participial/background).

The following illustrate what we have called “large adverbial clauses” (often called “dependent”). “Even though it is hot, still this man is working” (concession, contra-expectation); “If this man drops his tea cup, then it will break” (conditional); “If this man were rich, then he would not be working in the sun” (counter-factual conditional).

A few further comments are in order with regard to complex sentences. First, in training learners to use the methods, it is good to have a group of them brainstorm as many reasonable sentences of each type as might be associated with each picture. The more fully predictable the sentence is from the scene in the photo, the better, but as long as the sentence is connected with the photo, the photo serves to provide the context which aids comprehension.

To use these methods so early in language learning assumes, of course, a helper who is bilingual in the trade language. My observations suggest that learners who start out working monolingually experience much slower progress than those who start out working bilingually and then “go monolingual” (even when relating to bilinguals) after three or four months. This should be obvious, but often one gets the sense that field-workers feel “monolingual” is the ideal right from the beginning. It is not that the learning process itself benefits from the bilingualism. The helper does not use the contact language during the actual communication activities. The advantage of the contact language is that the learner can explain and exemplify what he desires the helper to do. When this is possible, my impression is that most helpers can quickly get on with, for example, using a reason and result construction that is fairly obviously related to each picture. This should, of course, employ vocabulary and grammar the helper feels the learner will understand. Every time a sentence is not understood, the helper can supply the meaning in the contact language. A complete pass can be made through the photo book for each such construction, or for a combination of such constructions, or less than a complete pass can be used, depending on how much is required before the learner is convinced that he or she has recognition ability for the construction or constructions being focussed on. When the sentences so used are captured on tape, it provides further opportunity for relaxed comprehension practice using the tape recorder and photo book together.

You can't get everything by one method

There are some constructions which do not lend themselves well to the photo book method. I would suggest indirect objects (and the related sentence types), reflexives, reciprocals, destination, source, company, and direction as examples of relations and roles which are more easily learned through the Total Physical Response method. In addition, most of the constructions which can be learned by the photo book method can also be learned by the Total Physical Response method. Again, however, some matters included in the grammar checklist may not lend themselves as well to physical response methods as to the photo book method. Between the two methods, the learner can quickly develop the ability to comprehend a substantial core of grammatical constructions.

Unstructured use

One learner with considerable experience in ESL chose to use a book with pictures (not photos) in a much less structured and more natural way. In connection with each picture, going through them in the order they were found in the book, the helper would point and talk, and the learner would learn to comprehend all sorts of information in connection with each picture. Using such a method, it would be possible for linguistically trained language learners to keep an eye on a grammatical checklist. Others might need more help from a consultant in using such a tool.

Also, by using carefully planned line drawings illustrating cultural scenes, it is possible to achieve more detail and ethnographic sophistication than is achieved in my photos. However, it would take longer to produce the same quantity of pictures in that way.

Monolingual use

Unfortunately, I have not yet had the opportunity to employ such a photo book in a monolingual situation. There are ideas I would like to try, but as I have not done so, I will not discuss them. In truly monolingual situations, I am told that some people have difficulty interpreting photos, and that line drawings seem to work better. At any rate, using pictures and the “here and now” principle, I feel the learner has more control over the content of the learning sessions than if he or she tries to learn entirely while sitting by the campfire or working in the crops from the very outset.

Later use, and better planned photo books

This paper has described the use of a photo book in the early weeks of language learning, as one aid to high-speed comprehension learning. The main advantage is that more is learned more quickly than by methods which involve memorizing, drilling, and learning to speak right from the outset. Another advantage is that all language learning from the first session involves real communication: receiving messages, processing them, and responding to them. Sometimes the response is no more than being able to keep up, connecting each sentence meaningfully to each photo. But even this minimal response means processing the sentences, and means that real communication is taking place.

The initial exercises, as we have used them, are contrived and unnatural. This applies to the physical response methods as well. A sentence such as, “If you are wearing anything green, then your wife would like you to pat her on the head, after which she will pat you on the head” is an excellent Total Physical Response sentence, involving real communication--comprehension, processing, responding. But it is contrived.

The learner will rapidly be able to comprehend a broad range of vocabulary and constructions. After that, things can become much more natural and less contrived. Endless, freewheeling conversation can revolve around those same photos, including talking about them for real, i.e. who the people are in them, various facts about those people, the situation surrounding the snapping of the photo, and so on. Furthermore, the photo book can be a conversation piece with visitors, and provide a basis for conversation and learning for an indefinite period. This would, however, be a fairly minor language learning activity once the learner is able to follow simple stories and series method texts and so on.

Our first picture book, made of pictures clipped from magazines, was not as useful as a book of photos. We found we did much better by quickly shooting three rolls of film and arranging the photos in a scrap-book. I have long felt a need to approach this in a more planned manner. It would be possible to take photos illustrating all steps in a procedure, or the major events in the daily cycle, yearly cycle, or life cycle, or the major differences in stages of the life cycle. Major cultural happenings could be photographed in great detail and the photos arranged in a logical, spatial, or chronological manner. (Such resources could be shared among language learners in the same culture area.) Alternatively such photos could be loose, rather than pasted in a scrap-book, with comprehension activities (and later, production activities as well) being built around sorting them into order.


This paper describes a language learning method for linguistically trained language learners. Certainly, a consultant could adapt it to the needs of others as well. It assumes that language is acquired through communication experiences. Thus it contributes to the goal of having all language learning activities consist of real communication, right from the outset. Yet it takes advantage of the analytical ability of the learner. Inevitably, the linguistically trained learner will be noticing patterns and making generalizations. Going through a hundred photos using, for example, oblique relative clauses, contributes to this, and allows substantial observations to be made and conclusions to be drawn. But it aims at optimal monitor use. Discovering the pattern for oblique relative clauses purely through analytical work, and then employing it in speech to consciously construct utterances containing oblique relative clauses, amounts to over-use of the monitor. Linguistically trained language learners are not in danger of being under-users of the monitor. Methods such as this allow the learner to benefit from his analytical skills and ability to generalize, even while experiencing language as communication. (See Dulay, Burt and Krashen 1982 for a discussion of the monitor concept.)

Our pre-field training needs to do more to counteract popular concepts still brought to the field by some field-workers. In particular I have in mind the language-as-subject-matter concept of “language study”, the desire to make “lessons” to be “learned” in non-communicative ways. Learners will say that this is their learning style. Yet there is little evidence that pre-field training provides enough experience with, and confidence in, any other style, to allow them to really make this judgement. Hence methods such as this one are not greeted with uniform enthusiasm, except when major efforts are made on the field to supplement pre-field training in language learning.


Dulay, Heidi, Marina Burt, and Stephen Krashen. 1982. Language Two. Oxford: Oxford University Press.

