A HOLISTIC APPROACH TO LANGUAGE
Brian D. Josephson
Cavendish Laboratory, Madingley Road, Cambridge CB3 0HE, England.
email : bdj10@cam.ac.uk
and
David G. Blair
Maritime Operations Division, DSTO, PO Box 44, Pyrmont, NSW
2009, Australia.
email : david.blair@dsto.defence.gov.au
The following progress report views language acquisition as primarily the attempt to create processes that fruitfully connect linguistic input with other activity. The representations made of linguistic input are thus those that are optimally effective in mediating such interconnections. An effective Language Acquisition Device should contain mechanisms specific to the task of creating the desired interconnection processes in the linguistic environment in which the language learner finds himself or herself. Analysis of this requirement gives clear indications as to what these mechanisms may be.
Original date of preparation of this report: November 29th,
1982.
Corrections added 1995, abstract 1996.
We were fortunate in having available at the outset a little known general account of many aspects of intelligence, including language (Mahesh Yogi 1974). Some clarifications of this very abstract description were gained from the work of Skemp (1979) and that of Eckblad (1981). The need to integrate very general principles with attention to matters of detail was met by utilising ideas from the work of Dik (1978) and Marcus (1980).
Little can be done in this introduction to summarise the theory, except to say that the rules for processing language, and the mechanisms for their acquisition, play a central role; and further, that by keeping attention simultaneously on the level of very abstract principles and on that of less abstract general facts about language, one is led along a subtle trail on which all the general rules come to be explained in an essentially simple and direct manner.
The general form of this work involves types of argument very different from the usual analytical-deductive ones (which contrive to 'prove' a given hypothesis to be the correct one), and many of the ideas are contained implicitly in the arguments rather than being stated explicitly. Exclusively left-brain thinkers should perhaps stop reading right away. For the remaining readers, it is suggested that more attention should be given to absorbing the basic ideas than to their critical analysis. It is important that the mind be allowed to work on the ideas spontaneously. For ourselves, we find the degree of fit that can be obtained between the generally intuitively appealing hypotheses and the patterns in the linguistic and other data to be a good sign that the ideas are a reasonable approximation to the truth. It is not feasible at this stage in the research to attempt to make the theory perfect and to study exhaustively all permutations and combinations of the concepts involved; there are therefore a number of gaps in the theory as formulated in this report, and it should be viewed as a convenient stopping place on the way to perfection (which is a very long way away from where we are now). The present work can be only a first approximation to the truth, and future research will be needed to amend any errors and add further necessary detail.
Linguistic exchange is a process of great utility to the users of a language. This utility is contingent upon the receiver being able to translate the message into a format in which it can guide his actions: the message must be translated from the code of the language into an inner code. This translation we refer to as the process of interpretation. The translation process and the processes auxiliary to it involve many specific subprocesses, which are learnt during the acquisition of the language concerned. Language learners use only some of these processes, and they may use simplified or erroneous processes, which lead to failure to understand or to misunderstanding.
The inner code, in order that it can guide actions, must in some sense consist of the individual's representation of what is or what may be (or what was). The existence of inner code is independent of language; language is merely an alternative way of generating it. Language acquisition is in essence learning the right way to connect inner code and external language. But there is no unique 'right way', since in reality language users have a diversity of processing systems. Their environments differ, and so what they acquire is different. But their environments also have similarities, and different users' systems are in general equivalent enough for effective information transfer to take place. Nevertheless, the 'meaning' assigned to an utterance is in reality (except in special cases) a subjective entity (and to a degree an intersubjective one), rather than an objective one as assumed in many treatments of language.
Grammar plays an important but not exclusive role in regulating the initiation of the interpretative processes. Inappropriate input is ignored in the first instance, greatly reducing the number of possibilities that are considered by the listener. The appropriateness of input is governed mainly by linguistic categories. By selecting the categories in his linguistic output, as a speaker of a language learns to do, a speaker can determine which combinations of entities a listener will assemble.
Language has a pragmatic as well as a semantic component (as well as syntactic and phonological components). That is to say, certain linguistic forms presuppose that the listener will do certain things with the information supplied. Users learn the advantages of doing these particular things, embodying the information in their system of rules or processes, while the fact that listeners know and apply these conventions is used by speakers when they make use of language.
Having noted this, we must go on to observe that any rules discovered by such means may be wrong, or valid only under certain conditions. Therefore the system must go on to learn (and apply) further knowledge which restricts the application of the rules. The system must learn two distinct things, which we shall term the domain of applicability of the rules and their conditions of applicability. The sense of the former is that of the domain of a function in mathematics: it is only within a certain set of entities that the rule should be applied at all, while the question of whether a rule should be applied to an entity which is inside the domain is a matter of other conditions, which may apply independently of the entity being operated upon by the rule. From the point of view of learning the domains and conditions it is crucial to observe that when a rule is applied to an entity outside its domain, a significant outcome can result only by chance, while for an entity within the domain a significant outcome will occur with greater than chance frequency (depending on the probability that the appropriate condition is satisfied). In principle, then (we do not wish to consider details here), a system can be designed to discover the domain and conditions of a rule. Note that if an erroneous rule is by chance acquired, the criteria given will lead to the domain becoming non-existent.
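How such discovery might work can be made concrete with a small sketch. The following is purely illustrative and ours, not a specification from the theory: it assumes outcomes can be classified as 'significant' or not, and admits a category to a rule's domain only when the observed rate of significant outcomes clearly exceeds an assumed chance baseline (all constants and names are invented for the illustration).

```python
# Learning a rule's domain from outcome statistics: a category is kept in the
# domain only if applying the rule to its members yields significant outcomes
# at a rate clearly above chance.
from collections import defaultdict

CHANCE_RATE = 0.05   # assumed rate of accidental 'significant' outcomes
MARGIN = 0.10        # assumed required excess over chance

class RuleDomainLearner:
    def __init__(self):
        self.trials = defaultdict(int)      # category -> rule applications
        self.successes = defaultdict(int)   # category -> significant outcomes

    def record(self, category, significant):
        self.trials[category] += 1
        if significant:
            self.successes[category] += 1

    def domain(self, min_trials=10):
        # categories whose success rate exceeds chance by the margin
        return {c for c, n in self.trials.items()
                if n >= min_trials
                and self.successes[c] / n > CHANCE_RATE + MARGIN}

learner = RuleDomainLearner()
for _ in range(20):
    learner.record('noun', significant=True)        # inside the domain
    learner.record('particle', significant=False)   # outside: chance hits only
print(learner.domain())   # {'noun'}
```

An erroneous rule, on this scheme, never accumulates above-chance successes for any category, so its domain remains empty, as noted above.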
For completeness, it should be noted that a system may extend its concept of the domain of a rule to take into account the dependence of the rule on a parameter, that is to say, the case where the domain is parameter dependent: the system extends its first approximation of a fixed domain accordingly. This point is exemplified by the fact that the rules for assigning a phoneme to a sound are context dependent, and also dependent on factors such as the accent of the speaker. The mechanics involved will not be discussed further here.
Our approach demands the existence of processes which produce modifications or variants of rules. A number of such rule-changing or variant creating processes are (i) making a new rule in which a constant in the old rule becomes a variable (whose domain of variation is subsequently determined); this is a form of generalisation; (ii) adding extra variables: this applies when the system learns a construction slightly more complex than one it can already handle; (iii) making a mutant rule with some constant changed; (iv) rules made by 'process splitting', a mechanism which involves partitioning a previously learnt process into two components. Examples will be given later.
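As a minimal sketch (our own, ahead of the linguistic examples promised above), the first three variant-creating operations can be pictured as operations on a rule represented as a tuple of pattern slots; the representation and the example rule are assumptions for illustration only, and process splitting (iv) is treated in the text later.

```python
# Three of the variant-creating operations, on a rule pictured as a tuple of
# slots: constants are words, variables are Var instances.
class Var:
    """A slot whose domain of variation is determined subsequently."""
    def __repr__(self):
        return '<var>'

def generalise(rule, position):
    # (i) a constant in the old rule becomes a variable
    slots = list(rule)
    slots[position] = Var()
    return tuple(slots)

def add_variable(rule):
    # (ii) an extra variable, for a slightly more complex construction
    return tuple(rule) + (Var(),)

def mutate(rule, position, new_constant):
    # (iii) a mutant rule with some constant changed
    slots = list(rule)
    slots[position] = new_constant
    return tuple(slots)

rule = ('bring', 'the', 'red', 'ball')
print(generalise(rule, 3))       # ('bring', 'the', 'red', <var>)
print(mutate(rule, 2, 'blue'))   # ('bring', 'the', 'blue', 'ball')
```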
The mechanisms just discussed can give rise to the kind of structure noted in systemic grammar. The mutant process is one mechanism for this. Also it may be the case that one of the conditions for one rule is that another rule should be operating at the same time; this gives rise to the embedding of one process in another.
We consider first the effect of the word 'move', in an utterance of the form 'move X to Y, using Z'. Our model is that (at a certain level of its functioning, at any rate) the system first creates a code L(move) corresponding to the word move. It subsequently translates this code into a code S(move) representing the meaning of move, i.e. the process of moving something. The second, semantic, code is one which under the right conditions is capable of activating programs for moving objects. Such programs will require at particular points information such as that indicated by the linguistic expressions X, Y and Z. When the utterance is heard and interpreted, this information must be stored away (in either long-term or short-term memory) in such a way that it automatically reappears when it is needed.
The memory problem is not one specifically connected with linguistic input: the system must equally be able to remember actions which it has discovered for itself by trial and error. We assume that collections of information are stored in independently addressable memory blocks, and that within a block specific items are stored and retrieved by means of specific 'enquiry codes'. If this is the case, it is natural to assume that information acquired linguistically is stored in memory blocks using these same enquiry codes.
But what codes exactly are stored corresponding to X, Y and Z in our example? Requiring that the elementary constituents of processes be as simple as possible leads us to the hypothesis that the system uses a set of codes x, y and z which act as codes not only for the linguistic form of the unit (X, Y or Z) but also for its meaning. The codes x, y and z act as labels for blocks of memory which contain all the relevant information. The assumption is that when a linguistic unit such as that represented by X is recognised, a label is assigned to it, with a corresponding block of memory, which is used to store, with the aid of suitable enquiry codes, the information indicated. Interpreting units such as X, Y and Z is a matter of taking information out of a given block of memory, transforming it in accordance with the relevant interpretation rule, and inserting the result back in the same block of memory (though not in identical memory locations, since the enquiry codes for the different types of information are different).
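A minimal sketch of this storage scheme, assuming only what the text states (independently addressable blocks, items indexed within a block by enquiry codes); the particular labels and codes are illustrative.

```python
# A labelled memory block whose items are stored and retrieved by enquiry codes.
class MemoryBlock:
    def __init__(self, label):
        self.label = label   # e.g. the code x assigned to the unit X
        self.items = {}      # enquiry code -> stored information

    def store(self, enquiry_code, information):
        self.items[enquiry_code] = information

    def recall(self, enquiry_code):
        return self.items.get(enquiry_code)

block_x = MemoryBlock('x')
block_x.store('linguistic-form', 'the red ball')
block_x.store('semantic-content', ('object', 'ball', ('colour', 'red')))
print(block_x.recall('semantic-content'))
```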
The nature of a typical interpretation rule or process may be seen by returning to our example 'move X to Y, using Z'. The input to the process is the collection of codes L(move), x, y, z (stored under suitable enquiry codes), and the output is the collection S(move), x, y, z, again stored under suitable enquiry codes. The operation involved is the translation of the code for move from the L version to the S version, together with the copying of the codes x, y and z from their original enquiry code locations into new enquiry code locations associated with semantic information.
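The rule can be sketched directly in these terms. The enquiry-code names below are our own placeholders; what the sketch preserves from the text is the two-part operation: translate L(move) to S(move), and copy x, y and z across to semantic enquiry-code locations.

```python
# The 'move X to Y, using Z' interpretation rule as a record transformation.
L_TO_S = {'L(move)': 'S(move)'}   # translation table, one entry shown

# enquiry-code relabelling assumed for this rule (names are placeholders)
RELABEL = {'1st-np': 'object-moved',
           '2nd-np': 'destination',
           '3rd-np': 'instrument'}

def interpret(record):
    out = {'action': L_TO_S[record['action']]}
    for old_code, new_code in RELABEL.items():
        out[new_code] = record[old_code]   # copy the label x, y or z across
    return out

heard = {'action': 'L(move)', '1st-np': 'x', '2nd-np': 'y', '3rd-np': 'z'}
print(interpret(heard))
# {'action': 'S(move)', 'object-moved': 'x', 'destination': 'y',
#  'instrument': 'z'}
```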
The effect of the interpretation process is to produce a set of information codes, in turn marked by enquiry codes, held either in short term memory or in a labelled block of long-term memory. The system must be such that it can treat the stored codes such as x, y and z as labels for subroutines and make the necessary arrangements for returning at the end. The most plausible mechanism is one which conforms to the intuitive idea of making a specific plan of action out of available knowledge (cf. Eckblad's discussion relating to the computer language SIMULA 67 (p. 104 of Eckblad 1981)).
It may be asked why we do not treat the word 'move' in the same way as X, Y and Z, i.e. as a variable and not a constant. There are a number of reasons. The most basic one is that the semantic enquiry codes are intimately related to the meaning of the key word such as move: they indicate the semantic role of the variables. Thus the interpretation rule needs in any case to depend explicitly on the key word. In addition, Dik's researches show that the domains over which the rules apply are also dependent upon the key word. As an example, in a sentence such as 'The meeting lasted for three hours' the variable denoting duration is obligatory (because the verb 'to last' is used), whereas it is not obligatory in a sentence such as 'They discussed the matter for three hours'. Both these facts indicate that the key word should be explicit in the specification of the rules. We assume, then, that the system which knows the rule is one which contains a mechanism which can carry out the process or transformation indicated. The next points to consider are how the mechanism gets its input and what triggers it off. By definition, the process ought to be triggered off if a signal is present that is within its domain and if the conditions relevant to the rule are satisfied. Clearly pattern detectors must first operate to see if patterns falling within the domain of a given rule are present. The information must be held in a suitable buffer, as in the Marcus (1980) model of parsing. Codes representing information under examination must be held by a mechanism which preserves the information concerning original temporal order (all-important in linguistic processing), and which presents to view the immediately required information concerning the constituents of the buffer.
We know (e.g. from transformational grammar) that the general basis for recognising forms is that of constituent category (plus temporal order). The question of how categories and patterns are actually learnt will be left till later, and we shall consider for the moment the mechanics of transference from the buffer. A significant clue is provided by consideration of related forms such as the active and passive forms of the same utterance. It is normally assumed that the semantic representations of two such utterances are identical (apart perhaps from differentiating markers), representing the fact that their meanings are known to be identical. It is unreasonable to suppose that each of the very large number of semantic interpretation rules present in a given language should be duplicated, while the existence of a transformation mechanism which shunts information around into a different order seems inefficient (though valid as a piece of linguists' apparatus). Instead we can make contact with Dik's theory and postulate an intermediate labelling process applied to those elements in the buffer which belong to a pattern which is going to be subsequently interpreted.
This will be illustrated with the example of the two related sentences
1: John kicked the ball
2: The ball was kicked by John
We suppose that preliminary processes recognise 'John', 'kicked', 'the ball' and 'was kicked' (in example 2) as units and assign labels to them, which are held in the Marcus buffer. The relationship between 'kicked' and 'was kicked' is acknowledged by their sharing a code L(kick), and their being distinguished by features, which include markers for active and passive respectively. The codes in the buffer include information about their grammatical categories (e.g. NP, VP). This information, together with that of the order in the buffer, is sufficient to indicate the different 'presemantic functions' of the units, viz. verb, agent and goal in Dik's notation (we use the word presemantic in preference to Dik's semantic, since on our model these attributions are not precise determiners of semantic role; instead, the meaning is generated by the interpretation process/rule operating on the presemantic marker). These presemantic labels (presemantic enquiry codes) are attached to the items in the buffer and used to extract the information. Two modes are possible: immediate interpretation, on the basis of a known interpretation process, or delayed interpretation, in which case a label is attached to the uninterpreted codes, which are stored away using presemantic codes as markers, so that an interpretation process can operate on them later. In the first case, the information in its semantic form can be assessed immediately, so that if necessary an alternative pattern or interpretation can be tried. When a plausible interpretation has been arrived at, the information can be stored in a block of memory indexed by a label.
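A minimal sketch of the labelling step for sentence 1, assuming a simple lookup from category sequences to presemantic codes; the pattern table, and the use of None to signal delayed interpretation, are our illustrative reconstruction.

```python
# Presemantic labelling: category sequence plus order selects the codes.
PATTERNS = {
    ('NP', 'VP', 'NP'): ('agent', 'verb', 'goal'),   # active declarative
}

def label_buffer(buffer):
    categories = tuple(category for category, code in buffer)
    roles = PATTERNS.get(categories)
    if roles is None:
        return None   # no known pattern: store for delayed interpretation
    return dict(zip(roles, (code for category, code in buffer)))

buffer = [('NP', 'L(John)'), ('VP', 'L(kick)+past'), ('NP', 'L(the-ball)')]
print(label_buffer(buffer))
# {'agent': 'L(John)', 'verb': 'L(kick)+past', 'goal': 'L(the-ball)'}
```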
We pause to consider how the Marcus buffer is operated. A suggestive possibility is that the mechanism of rehearsal is used. The contents of the buffer normally decay with time, but may be kept topped up by a process which repeatedly scans the contents of the buffer in an order which represents the associated time sequence. Items are no longer maintained in the buffer once they have been assembled into a group, and the code for the group may be added into the buffer in the corresponding place. New items, derived by a unit-identification process from acoustic input, are inserted at the end of the buffer. (A particular implementation of this model, based on labelling the contents of the buffer with ordinal numerals, allows the contents of the buffer to be moved continually towards the beginning as space becomes available.)
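The rehearsal model might be sketched as follows; the decay and threshold constants, and the exact replacement policy for groups, are assumptions of ours rather than claims of the theory.

```python
# A decaying buffer kept alive by a rehearsal scan; grouped items are
# replaced by a single code for the group, in the corresponding place.
class RehearsalBuffer:
    DECAY, TOP_UP, THRESHOLD = 0.8, 1.0, 0.3   # invented constants

    def __init__(self):
        self.items = []   # (code, activation), earliest first

    def tick(self, rehearse=True):
        # activations decay; the scan, when running, restores them in order
        self.items = [(c, a * self.DECAY) for c, a in self.items]
        self.items = [(c, a) for c, a in self.items if a >= self.THRESHOLD]
        if rehearse:
            self.items = [(c, self.TOP_UP) for c, a in self.items]

    def append(self, code):
        # new items from the unit-identification process join at the end
        self.items.append((code, self.TOP_UP))

    def group(self, member_codes, group_code):
        # grouped items drop out; the group's code takes their place
        position = min(i for i, (c, _) in enumerate(self.items)
                       if c in member_codes)
        self.items = [(c, a) for c, a in self.items if c not in member_codes]
        self.items.insert(position, (group_code, self.TOP_UP))

buf = RehearsalBuffer()
for word_code in ['L(the)', 'L(red)', 'L(ball)']:
    buf.append(word_code)
buf.tick()
buf.group({'L(the)', 'L(red)', 'L(ball)'}, 'L(the-red-ball)')
print([code for code, _ in buf.items])   # ['L(the-red-ball)']
```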
What happens to the resultant information, which has now been interpreted, or merely grouped on the basis of pattern discovery? This is a matter of the pragmatics of the use of linguistic information. One case is that in which the unit that has been processed is part of a larger pattern (for example, either of the above utterances might have been preceded by the words 'I know that'). The nature of natural language is such that the correct procedure is simply to replace the information in the buffer by a code representing the whole (as in the Marcus parser). This is implied by the fact that a unit such as a phrase can be treated as a single entity, with a particular meaning of its own.
The alternative is that the unit is not embedded in a larger construction, and thus functions as a complete whole. It may, for example, be a request such as 'come here', or a declaration such as 'The Moon will be full tonight'. The label giving access to the meaning of the utterance tags a potentially valuable piece of information, and the listener must determine what to do with the label (equivalently, what to do with the information). This question goes under the general heading of pragmatics. As we see from these two examples, pragmatic action is sometimes a matter of social or linguistic convention, and sometimes a matter of general intelligence (which is more or less outside the terms of reference of the current theory). In the first example the convention is to treat the action described as one which is to be carried out, if not overruled by other factors (this raises a general point to the effect that one aspect of pragmatics is to decide the validity, reliability or suitability for use of information supplied by a speaker, and to act upon the outcome of the assessment in an appropriate manner). In the second case, the response is not a matter of convention, but one of determining the manner in which the information supplied might be useful, and storing the outcome of any deductions or decisions made, so that similar operations may be tried out in the future.
It is largely the fact that rules can be transcended that gives language its attributes of flexibility and adaptability. The connection between words and meaning is a non-rigid one, the listener's intelligence discovering what is not explicitly indicated by the rules. This flexibility allows the speaker to create new rules, which allow new configurations of ideas to be linked up to language. Furthermore, the fact that a listener has the means on occasion to interpret an item of language even when he does not know the relevant rules is the principal means by which he is able actually to discover rules.
Clearly anaphora (the use of part of an utterance to refer to another part, by use of a pronoun or the repetition of a word) is a related problem. The fact that anaphora works efficiently indicates that the method of recording an utterance in memory includes processes designed for easy subsequent recall by anaphoric reference -- what may be called anticipatory memory. Codes which it is known from experience may have to be recalled soon are stored temporarily in registers indexed by the words to which the codes refer (so that they are recalled if the word is repeated) or by features such as number and gender so that they may be recalled by subsequent use of a pronoun bearing the same features. Only one item may be stored in a given register, and the pragmatics of the language presumably determine whether a given item is displaced from a register or not.
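A sketch of such anticipatory memory, assuming registers indexed both by words and by feature values, each holding one item and displaced on reuse; the feature inventory is illustrative.

```python
# Anticipatory memory: one register per index, each holding at most one item.
class AnaphoraRegisters:
    def __init__(self):
        self.registers = {}   # index -> code of the most recent referent

    def store(self, code, word, features):
        # index by the word itself and by each feature; a new item
        # displaces any earlier occupant of the register
        for index in (word, *features):
            self.registers[index] = code

    def recall(self, index):
        return self.registers.get(index)

registers = AnaphoraRegisters()
registers.store('x17', 'ball', features=('singular', 'neuter'))
print(registers.recall('ball'))     # 'x17': the word is repeated
print(registers.recall('neuter'))   # 'x17': recalled via the pronoun 'it'
```

Whether a stored item may be displaced is left, as in the text, to the pragmatics of the language.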
(i) It is clearly most efficient for pattern detection to be carried out by a number of independent processors operating in parallel. This remark is subject to certain provisos, however. As already discussed, there are cases where one rule is discovered as a variant of another. In such cases systemic networks of processes are involved, and different patterns may share processing elements.
(ii) The fact that only one pattern out of a number of possibilities is chosen as the basis for further processing indicates the use of the mechanism of reciprocal inhibition (see the sketch following this list).
(iii) We can (to a limited degree) recover from errors by backtracking. This requires mechanisms to remember intermediate states in processing and for effectively erasing errors.
(iv) Patterns may be detected, in principle, either serially (using the original input, or the output of the rehearsal process) or by a parallel process using logic circuitry. Probably both methods are used to some extent.
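Point (ii) can be given a toy form. The following winner-take-all dynamics, with invented constants, merely illustrates how mutual inhibition can leave a single pattern as the basis for further processing; it is not proposed as the actual mechanism.

```python
# Reciprocal inhibition among competing pattern detectors: each detector is
# suppressed in proportion to its competitors' activity until one survives.
def winner_take_all(activations, inhibition=0.5, threshold=0.05, steps=50):
    a = dict(activations)
    for _ in range(steps):
        total = sum(a.values())
        a = {k: max(0.0, v - inhibition * (total - v)) for k, v in a.items()}
        alive = {k: v for k, v in a.items() if v > threshold}
        if len(alive) <= 1:
            return next(iter(alive), None)
        a = alive
    return max(a, key=a.get)

print(winner_take_all({'active-pattern': 0.9, 'passive-pattern': 0.6}))
# 'active-pattern'
```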
We begin with the idea that, for the language learner, order is gradually discovered among what is initially patternless random noise, as the cognitive structures able to represent the significance and form of language develop. At the beginning, speech is represented only at the level of sound. As correlations between sound and representations of the world are discovered, response becomes possible to language represented at the level of its sound.
Next, attention is given to the patterns visible in language, and as the meaningful patterns become familiar, the processes for detecting patterns and using them are acquired. Thus knowledge of grammar is acquired.
When it becomes possible to pick out the patterns in speech (whose meanings in general are unknown), attention is given to finding out the meanings in other ways, and the diverse rules which a language employs for relating words to meaning are discovered.
Finally, when rules for literal meanings have become familiar, attention goes to discovering what speakers actually intend to imply (often going beyond the literal meaning) by what they say, and thus are learnt the pragmatic aspects of a language (which to a greater extent than the semantics are the shared property of a social group).
(Comment: the above description is not a reference to four separate stages of development. All that is implied is the manner in which acquisition of knowledge at one level is dependent on knowledge gained at previous levels.)
Having given this general outline, we proceed to describe the detailed mechanisms. We shall not attempt any general formulation, but discuss instead a number of examples which will illustrate the mechanisms involved.
In the case of familiar objects, such as tables or balls, or categories such as food, or known people, or common actions such as walking or eating, it may be expected that the listener possesses already codes which represent them. Excitation of such codes is significantly correlated with the presence of the corresponding words. We invoke a mechanism which couples receptors responding selectively to the sounds with the corresponding semantic codes. Thus is acquired a primitive type of linguistic rule, one which relates sound and meaning. The supposed coupling means that a representation of the object, etc., can be invoked by the hearing of the word, and corresponding responses will follow. To the extent that such outcomes produce positive results (especially of the nature of a concordance between expectation and reality), the rule is reinforced (in the terms of reference of the previous discussion, the conditions of validity of the rule are being established).
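A sketch of this primitive rule acquisition, assuming a simple correlational update in which the coupling between a word and a semantic code is strengthened when the two occur together and weakened otherwise; the learning rule and rate are illustrative assumptions of ours.

```python
# Coupling word detectors to semantic codes by correlation with the scene.
class SoundMeaningCoupling:
    def __init__(self, rate=0.2):
        self.rate = rate
        self.strength = {}   # (word, semantic code) -> coupling strength

    def observe(self, word_heard, codes_excited, all_codes):
        # strengthen couplings confirmed by the scene, weaken the rest
        for code in all_codes:
            key = (word_heard, code)
            current = self.strength.get(key, 0.0)
            target = 1.0 if code in codes_excited else 0.0
            self.strength[key] = current + self.rate * (target - current)

coupling = SoundMeaningCoupling()
for _ in range(10):   # 'ball' is heard while the ball code is excited
    coupling.observe('ball', {'S(ball)'}, {'S(ball)', 'S(food)'})
print(coupling.strength[('ball', 'S(ball)')]
      > coupling.strength[('ball', 'S(food)')])   # True
```

Positive outcomes (concordance between expectation and reality) play the reinforcing role here, as in the discussion of conditions of validity above.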
The next stage is the discovery of patterns. The only patterns which are useful to know are those with specific interpretations, which convey more information as a group than is conveyed altogether by the elements of the pattern taken separately. We assume that it is on this basis that the patterns in language are discovered.
A pattern is an abstraction artificially imposed on a structure: for any given pattern there exists a defining process which determines whether or not the pattern is present. We therefore propose (bearing in mind the linguistic evidence concerning the nature of linguistic patterns) that the following general procedure is followed (a sketch of the whole loop, under simplifying assumptions, is given after the list):
(i) a meaningful constellation is discovered (according to procedures discussed later);
(ii) a pattern-defining rule which applies to the given constellation is created;
(iii) the rule is applied to subsequent input, etc., to look for more instances of patterns conforming to the same rule;
(iv) these new instances are examined to see to what extent they fit the purpose for which (by virtue of the way the parts of the system work together to perform an overall function) they are intended;
(v) rule modifications are carried out (with the aim of improving the overall utility of the rule, in terms of increasing the number of positive outcomes and reducing the number of negative ones);
(vi) pattern-defining rules generally involve categories (such as nouns or auxiliaries); the categories come into existence by a process of expansion from a prototypical example, as will be explained in due course.
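The promised sketch follows, under strong simplifying assumptions of ours: a constellation is a sequence of (category, word) pairs, the defining rule (ii) is a match on the category sequence, utility (iv)-(v) reduces to whether matched instances can be interpreted, and category expansion (vi) is omitted (it is treated below).

```python
# The pattern-discovery procedure as a single loop.
def make_rule(constellation):
    # (ii) the defining rule: match the same sequence of categories
    categories = tuple(category for category, word in constellation)
    return categories, (lambda segment:
                        tuple(category for category, word in segment)
                        == categories)

def discover(constellation, stream, interpretable):
    categories, rule = make_rule(constellation)              # (ii)
    instances = [seg for seg in stream if rule(seg)]         # (iii)
    outcomes = [interpretable(seg) for seg in instances]     # (iv)
    positives = sum(outcomes)
    # (v) crude modification: keep the rule only if positives dominate
    return categories if positives > len(outcomes) - positives else None

stream = [[('ADJ', 'red'), ('N', 'ball')],
          [('ADJ', 'big'), ('N', 'dog')],
          [('N', 'ball'), ('ADJ', 'red')]]
print(discover([('ADJ', 'red'), ('N', 'ball')], stream, lambda seg: True))
# ('ADJ', 'N')
```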
This procedure will first be discussed for the case of standard units of input (such as morphemes, words or phrases). This is a degenerate case of a pattern, in which all examples of the pattern are closely similar to each other (e.g. all the pronunciations of a particular word). We suppose that the system has already learnt to respond to the word, etc. on the basis of its sound. This means that detectors which respond selectively to the sound are linked to codes which function as the meaning. The system does not yet represent the word, etc. as a unit -- in fact it has no code to represent the unit as a unit of speech. The change comes about when it does tentatively assign a code for the relevant segment of input. At the time that it does this it must store certain correlations which will function as processes or rules: one which associates features of the acoustic input with the new code, and one which associates the new code with the semantic code (the two associations can be learnt at different times, the second after the first). As hinted at previously, the system has now split its original rule up into two. But the first one incorporates the feature of checking that the input is in agreement with that recorded previously, when the new code was first assigned.
The new code for the word, etc. has the purpose, in a larger context, of indicating that the word has a particular meaning. We now invoke the principle that the system tries varying the rule which defines whether the new code should be assigned to given input, noting whether the input is still associated with the expected meaning. This leads to the system defining the limits over which the pronunciation of the word may range (which may be context dependent, e.g. dependent on accent). We end up with a system capable of responding selectively to the given word. If the word has more than one meaning, it can learn a new semantic rule which links the new meaning with the previous code for the word, without having to relearn how to identify the word.
A similar argument, which will not be expounded in detail, shows that these ideas are applicable to the perception of phonemes in speech. In the above exposition we have implicitly talked in terms of phones (units of sound). The system can in principle learn, by means similar to those already discussed, to interpolate a phoneme detection stage, but here the process which is split in two is the perception of input rather than its interpretation. The hardware for this is almost certainly innate, as is possibly also hardware for working with syllables (the hardware consisting of special memory units, signal channels and so on).
We now come to the subtler problem of learning to detect syntactic patterns. Our theory presupposes that the first stage is that of discovering a meaningful constellation of units. The Marcus buffer is a natural tool for doing this.
We suppose that the system has already learnt to recognise a number of common words, knows their meanings, and has also assigned codes to them, which can be held in the Marcus buffer. When a sentence referring to a current scene is heard, codes for some of the words are held in the buffer. The language learner examines his environment looking for something corresponding to one of the codes in the buffer, focusses attention on this part of the environment, and then looks again in the buffer to discover further codes corresponding to this part of the scene. In this way he tends to find a meaningful group, e.g. one code may be the name for an object and the other a code for its colour.
The next stage is that changes occur in the system of a type which leads ultimately both to pattern recognition abilities and to the acquisition of an interpretation process for a group. A number of developments, which need not all occur at the same time, are needed. The processes will be illustrated by means of a particular concrete example. Suppose the listener hears a sentence (e.g. 'bring the red ball') containing the words 'red' and 'ball' in the order indicated, with (for reasons which will become clear) no intervening words. Suppose also that a red ball is visible. The buffer contains codes for 'red' and 'ball' (L1 and L2, say), and by the processes already indicated, the system is able to pick out this pair of codes as being a meaningful group. On the non-linguistic, or semantic, side, the system is assumed to have the ability to represent certain facts about the corresponding reality, e.g. that the ball is an object and red is a colour, for which it uses enquiry codes S1 (for colour) and S2 (for object). Suppose now that the two words of the group are also assigned arbitrary codes P1, P2, which will later function as presemantic codes. The system also notes the crucial ordering of the special elements of the buffer. Tentatively, we hypothesise that codes representing ordinal numbers are used, so that 'red' gets the code 'first' and 'ball' the code 'second', provided that no interstitial words are present.
The system can now acquire a number of processes/rules by noting the correlations present in the example. At first sight, the codes for red, first, colour and the presemantic code S1 should all be correlated together to act as a potential process, but a little thought indicates that the actual role of ordinal information in natural language is to provide a constraint for a rule to apply (e.g. for a given type of pattern to be presumed to be present). The net effect of all these processes is to allow the combination 'red ball' to be translated into the correct semantic form (including the enquiry codes), going via the intermediate stage of the presemantic codes (a sketch of this translation is given below). The next stage is for the system to generalise its processes to work with categories of words, instead of with fixed words as in this example. Recall the treatment given for sentences of the type 'move X to Y, using Z' (though here we are dealing with a simpler kind of construction). The new process involves labelling the particular occurrence of one of the words with an arbitrary label, and treating this label as a variable to be re-recorded in semantic form under the relevant enquiry codes, under a new (generalised) version of the interpretation process (of the form previously described). We want the system to be able to make the correct semantic representation of all groups of the type 'red X'. The hypothesised mechanism for this is that the system collects under a particular category label all instances for which the given interpretation process works, and regards this label as an indication of the domain for which the rule is to be applied.
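A sketch of the fixed-word case, using the codes named above (L1, L2 for the word occurrences, P1, P2 for the presemantic codes, S1, S2 for the semantic enquiry codes); the dictionaries and the form of the ordinal constraint are our reconstruction.

```python
# 'red ball' translated into semantic form via presemantic codes.
buffer = [('L1', 'red'), ('L2', 'ball')]   # codes, in temporal order

def assign_presemantic(buf):
    # the ordinal constraint: 'red' first, 'ball' second, nothing interstitial
    if [word for _, word in buf] == ['red', 'ball']:
        return {'P1': buf[0][0], 'P2': buf[1][0]}
    return None   # pattern not present

P_TO_S = {'P1': 'S1', 'P2': 'S2'}   # S1 = colour, S2 = object

presemantic = assign_presemantic(buffer)
semantic = {P_TO_S[p]: code for p, code in presemantic.items()}
print(semantic)
# {'S1': 'L1', 'S2': 'L2'}: the colour slot holds the code for 'red',
# the object slot the code for 'ball'
```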
The procedure just described constitutes the first procedure for category expansion, and is based on essentially semantic considerations: all the elements of the category have similar semantics. If the criterion of similar semantics is dropped, expansion is possible to larger categories, ending up ultimately with the actual linguistic categories of the language concerned. The fact about language that is relevant here is that the rules of a given language which define which types of patterns are meaningful observe syntactic categories rather than semantic ones. For example, the fact that in English the sequence preposition - noun phrase is a grammatical combination manifests itself in the two phrases 'under the bridge' and 'under the circumstances', whose semantics are totally unrelated. The implication of this fact is that a procedure of tentatively expanding a category by adding a new word which would fit into an already known pattern if it were placed in that category will often pick up a meaningful combination (though it may also pick up nonsense combinations instead by picking on an inappropriate set to group together). We are led to hypothesise that the system finds tentative category elements in this way, and confirms or disconfirms them by attempting to assign a meaning to the group. This gives a general procedure by which grammatical categories can be established (any erroneous assignments picked up by the means described will tend ultimately to be extinguished, by virtue of the fact that such groupings cannot in general be interpreted by any reasonable rule).
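A sketch of tentative expansion and confirmation, assuming a category object that holds confirmed and tentative members; the confirmation signal (whether the group received an interpretation) is supplied from outside, as in the text.

```python
# Tentative category expansion, confirmed by successful interpretation.
class Category:
    def __init__(self, prototype):
        self.members = {prototype}
        self.tentative = set()

    def propose(self, word):
        # the word fits a known pattern if placed in this category
        self.tentative.add(word)

    def confirm(self, word, interpreted_ok):
        self.tentative.discard(word)
        if interpreted_ok:
            self.members.add(word)   # erroneous assignments are extinguished

adjective = Category('red')
adjective.propose('big')
adjective.confirm('big', interpreted_ok=True)      # 'big ball' gets a meaning
adjective.propose('under')
adjective.confirm('under', interpreted_ok=False)   # 'under ball' gets none
print(adjective.members)   # {'red', 'big'}
```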
The above process of category discovery, combined with that of pattern discovery, is a cumulative one. The process starts with a few very basic combinations, such as the adjective-noun one discussed above, which lead to the discovery of a few basic categories. These original categories allow new patterns to be discovered, leading to the acquisition of more and more complex patterns, and also those additional categories which apply to particular patterns. It is worth noting, though we shall not go into the matter in detail here, that the gradual growth of a given language probably proceeds through similar pathways.
One of the known regularities of language is that particular patterns generally belong to particular categories (the pattern being regarded as a single entity). For example, in English the combination ADJ, NP belongs to the category NP. Such relevant facts about the language can be discovered by means similar to those already described.
We note briefly the application of our ideas to a problem mentioned earlier, that of having the system know that, for example, active and passive versions of a given sentence have a similar meaning. Let us take as an example the sentences given previously:
1: John kicked the ball
2: The ball was kicked by John
In the first sentence, the pattern noun - verb - noun phrase is recognised and the applicable rule for presemantic labelling generates the labels agent, verb and goal respectively (the relevant rule giving the correspondences 1st --> agent, etc.). The interpretation rules link these presemantic codes to the semantic codes which define the actual semantic roles. Now we hypothesise that a person hearing the second construction but not yet familiar with it attempts to assign presemantic markers to the items in the Marcus buffer in such a way that the previous interpretation rule will still apply. This leads to the last item of the group now being characterised as agent instead of the first. The new rule relating order and presemantic codes is learnt. Investigation of its domain of validity leads to the correlation with the use of the passive form being discovered, and to the specification of this new category.
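A sketch of this relearning step, assuming the learner searches the possible assignments of the known presemantic codes to buffer positions and accepts the one under which the interpretation fits the observed situation; the 'scene' check stands in for the full interpretation machinery.

```python
# Learning the passive rule by remapping buffer positions to presemantic codes.
from itertools import permutations

ACTIVE_RULE = {0: 'agent', 1: 'verb', 2: 'goal'}   # the known active rule

def fits(assignment, scene):
    # the interpretation is accepted if it matches the observed scene
    return assignment == scene

def learn_passive(items, scene):
    positions = range(len(items))
    for roles in permutations(ACTIVE_RULE.values()):
        assignment = {roles[i]: items[i] for i in positions}
        if fits(assignment, scene):
            return {i: roles[i] for i in positions}   # the new rule
    return None

# 'The ball was kicked by John', checked against the scene John-kicks-ball
items = ['L(the-ball)', 'L(kick)+passive', 'L(John)']
scene = {'goal': 'L(the-ball)', 'verb': 'L(kick)+passive', 'agent': 'L(John)'}
print(learn_passive(items, scene))   # {0: 'goal', 1: 'verb', 2: 'agent'}
```

The correlation of this new rule with the passive marker is then established by the domain-learning mechanism discussed earlier.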
On the other hand, we do not need to invoke in our theory Chomsky's 'core grammar'. We have already indicated how knowledge of simple grammatical categories and forms could be acquired.
(i) Words with no concrete reference. In some theories of language it is supposed that words have meanings by virtue of having specific referents, a proposition which encounters considerable difficulties if one tries to implement it. In the present theory, meaning is in the first instance subjective, and the link which is encompassed by the word meaning is one which connects together internal signals in the system. Thus the domain of meaning is that of the internal language of the system, and any codes it uses can be associated with speech units. In this way we can understand readily how words such as 'not', 'easy', 'round', etc. acquire meaning in their users' linguistic systems.
(ii) The difference between children's and adults' understanding of words (question raised by K. Sparck-Jones).
A typical example is the word 'dictionary'. A child on first contact with the word can have no concept of its actual meaning, but nevertheless can recognise easily a particular dictionary and assign a code to it. In our theory the change which occurs when a child learns the real meaning of the word is an instance of updating of a rule. The original rule which assigned to the word the code for the particular book is changed to a rule which assigns a better code to the word, one which represents its real meaning.
This is not the only type of change possible. In a situation where the word is understood differently because more is learnt about the entity to which the word is applied, the appropriate change is that of storing more information away in the block of memory to which the code for the meaning gives access.
(iii) Language used as a secret code (objection raised by grant-awarding committee -- the illustrative example is ours).
A foreign agent spots the spy he is due to meet, approaches him and remarks 'We last met in Pittsburgh, I believe'. The spy replies 'No, it was in Berlin in 1944' (the exchange being a prearranged one to establish identities). The explanation of this interesting piece of linguistic behaviour clearly requires taking into account particular preceding instructions, which we may represent symbolically as follows: the spy is told 'your contact will say A, and then you will reply B'. Without going into full details we can see that the correct pragmatic effect of interpreting this information is to set up a new interpretation rule, which involves treating the sentence A as a special one, having its own individual code a, and assigning a meaning code a' to it, which gives access to a block of memory in which is stored information such as the fact that the speaker is a foreign agent and that the spy should answer with B. (Psychological arousal mechanisms can be invoked to account for the fact that the normal response (treating the statement at its face value) is inhibited).
Dik, S.C. (1978), Functional Grammar, North-Holland, Amsterdam.
Eckblad, G. (1981), Scheme Theory, Academic Press, London.
Josephson, B.D. (1982), On Target, Cavendish Laboratory Progress Report TCM/29/1982.
Mahesh Yogi, Maharishi (1974), numerous videotaped lectures, in particular the lecture course entitled The Science of Creative Intelligence, lecture 25 of which is concerned specifically with speech.
Marcus, M.P. (1980), A theory of syntactic recognition for natural language, M.I.T. Press, Cambridge, Mass. and London.
Skemp, R.R. (1979), Intelligence, Learning and Action, Wiley, Chichester and New York.
Wilks, Y. (1973), An Artificial Intelligence Approach to Machine Translation, in Computer Models of Thought and Language, ed. R.C. Schank and K.M. Colby, Freeman, San Francisco.