Intonational phonology

Cambridge University Press

9780521861175 - Intonational phonology - Second Edition - By D. Robert Ladd

Excerpt

1 Introduction to intonational phonology

Research on intonation has long been characterised by a number of unresolved basic issues and fundamental differences of approach. For many years, these precluded the emergence of any widely accepted framework for the description of intonational phenomena, or even any general agreement on what the interesting phenomena are. Since the mid 1970s, however, several lines of research have converged on a set of broadly shared assumptions and methods, and studies on a variety of languages are now yielding new discoveries expressed in comparable terms. This emerging viewpoint – which it is perhaps only slightly premature to characterise as the standard theory of intonational structure – is the subject of this book.

As the book's title suggests, the heart of this theory is the idea that intonation has a phonological organisation. This idea requires some justification, since intonation sits uneasily with many ordinary linguistic assumptions. For one thing, it is closely linked to a paralinguistic vocal code: sometimes against our will, pitch and voice quality help signal information about our sex, our age, and our emotional state, as part of a parallel communicative channel that can be interpreted by listeners (even some non-human ones) who do not understand the linguistic message. Yet we know that in languages like Chinese or Thai or Yoruba it is also fairly simple to identify a small inventory of phonological elements – tones – that are phonetically based on pitch or voice quality but are otherwise quite analogous to segmental phonemes, and that these tones function alongside the more universal paralinguistic effects of pitch and voice quality. The question is thus not whether pitch and voice quality can have phonological structure – we know that they can – but whether intonation does have phonological structure in languages like English or French. By claiming that it does, we focus in some sense on the distinction between linguistic and paralinguistic functions of pitch, and claim that intonation belongs to the former.

Another apparent obstacle to talking about intonational phonology is that the phonetic substance of intonation somehow seems less concrete than the properties involved in consonants and vowels. In the case of pitch, instead of the complex constellations of articulatory settings or acoustic parameters that identify a [t] or an [o], we find only a simple scale of up and down, which can differ conspicuously from speaker to speaker and occasion to occasion; somehow the phonological properties of pitch have to be defined relative to the speaker and the occasion. In the case of stress, its very definition seems to depend on a comparison between one word and another or one syllable and another: the identification of a given syllable as stressed often seems to depend on its perceived prominence relative to some other syllable that is not stressed. Understanding the ways in which these features need to be defined in relative terms will require us to go beyond the traditional paradigmatic basis of phonology – contrast based on the choice of one element rather than another from a set of possibilities – to consider syntagmatic contrasts that depend on the structural relation between one element and another in the same utterance.
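The idea that pitch must be defined relative to the speaker can be made concrete with a small sketch. It is not taken from the text: it assumes, purely for illustration, that each speaker's fundamental frequency values (in Hz) are re-expressed in semitones above that speaker's own lowest value, using the standard relation of 12 semitones per doubling of frequency. The function names and the sample values are hypothetical.

```python
import math


def hz_to_semitones(f0_hz, reference_hz):
    """Express a frequency in semitones relative to a reference frequency."""
    return 12.0 * math.log2(f0_hz / reference_hz)


def normalise_contour(f0_values_hz):
    """Rescale F0 values to semitones above the lowest value in the list.

    Using the speaker's own minimum as the reference is an assumption made
    for this sketch; other reference points are equally possible.
    """
    floor = min(f0_values_hz)
    return [hz_to_semitones(f, floor) for f in f0_values_hz]


# Hypothetical F0 samples (Hz) for the same rise-fall shape from two speakers.
low_voice = [110.0, 165.0, 110.0]
high_voice = [220.0, 330.0, 220.0]

print(normalise_contour(low_voice))
print(normalise_contour(high_voice))
```

Under these assumptions the two contours differ greatly in absolute frequency but come out identical once each is expressed relative to its own speaker, which is one simple way of seeing why a description in raw Hz cannot by itself be the phonological description.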

These two dichotomies – paralinguistic versus linguistic, and syntagmatic versus paradigmatic – lie at the heart of most of the issues discussed in the book. Once we understand them better, I believe that it will seem as natural to talk about the phonology of intonation as to talk about the phonology of ordinary words. After that, we should eventually reach the point of being able to describe in explicit and testable terms how intonation affects the meaning and function of utterances. What follows can be thought of as a kind of report on our progress toward that goal.

1.1 Intonation

1.1.1 Three defining characteristics

We begin with a definition. Intonation, as I will use the term, refers to the use of suprasegmental phonetic features to convey ‘postlexical’ or sentence-level pragmatic meanings in a linguistically structured way. The three key points in this definition are the terms suprasegmental, sentence-level (or ‘postlexical’), and linguistically structured:

(1) Suprasegmental: I follow phonetic tradition in restricting my attention to suprasegmental features – features of fundamental frequency (F₀), intensity, and duration, according to a common definition. Although this restriction is traditional, it is not without problems, which I will only mention here. First, there is a problem of definition. Lehiste (1970) defines suprasegmentals as features of ‘pitch, stress, and quantity’. The difference between her definition and the one I have given raises the more general question of the relations among physical, psychophysical, and phonetic properties. ‘Stress’ is clearly a phonetic property (i.e. a complex perceptual amalgam only indirectly relatable to psychophysical and physical dimensions); ‘loudness’ is psychophysical; ‘intensity’ is physical. Similar distinctions can be drawn in the case of ‘pitch’ and ‘F₀’, or ‘quantity’ and ‘duration’. In all these cases, it is often unclear which terms of reference are most appropriately used in talking about suprasegmental phenomena. For the most part I have avoided this issue in what I have written here; in particular, I have made no attempt to distinguish rigorously between pitch and F₀. Strictly speaking, F₀ is a physical property and pitch is its psychophysical correlate, but in many contexts outside psychophysics little ambiguity arises if the terms are used interchangeably, and this accords with much recent phonetic work.¹

The other problem with restricting our attention to suprasegmental features is that there are other phenomena that might otherwise be covered by the definition of intonation proposed here. For example, it has long been observed that many languages use segmental morphemes to convey the kinds of meanings that in other languages can often be signalled intonationally. Two obvious examples are question particles and focus particles (see König 1991); there are several reports of detailed similarities between typical intonational functions and the function of particles in certain languages, such as German (Schubiger 1965, 1980) or Russian (Arndt 1960). It may be that the functional similarity between such particles and intonation as defined here should outweigh the clear phonetic and syntactic differences. Similarly, research on sign language phonology has suggested a three-way distinction closely comparable to the lexical–intonational–paralinguistic distinction that I will attempt to justify below (see Liddell 1977; Wilbur 1994a, 1994b). If such a comparison is valid, it is clearly important not to define intonation solely in terms of phonetic suprasegmentals. As in other areas of phonology, sign language research may be able to yield important insights into what is essential about intonation in spoken language, and what is accidental. However, throughout the book I have deferred to phonetic tradition, and have excluded both particles and sign language from any detailed consideration.

(2) Sentence-level or postlexical: intonation conveys meanings that apply to phrases or utterances as a whole, such as sentence type or speech act, or focus and information structure. By this definition, intonation excludes features of stress, accent, and tone that are determined in the lexicon, which serve to distinguish one word from another. For example, English permit (noun) and permit (verb) are composed of identical strings of phonemes, and distinguished by whether stress falls on the first syllable or the second; Standard Chinese huā ‘flower’ and huà ‘speech, language’ are segmentally identical, but are distinguished by the fact that the former has high level pitch (‘Tone 1’) while the latter has sharply falling pitch (‘Tone 4’). Intonational features are, by definition, never involved in signalling such distinctions. Phonetically, of course, lexical features of stress, accent, and tone interact with intonational features in many ways. In general, however, the two types can be kept distinct in a description.

(3) Linguistically structured: intonational features are organised in terms of categorically distinct entities (e.g. low tone or boundary rise) and relations (e.g. stronger than/weaker than). They exclude ‘paralinguistic’ features, in which continuously variable physical parameters (e.g. tempo and loudness) directly signal continuously variable states of the speaker (e.g. degree of involvement or arousal). Like lexical features, paralinguistic features interact with intonational features. Unlike lexical features, paralinguistic aspects of utterances are often exceedingly difficult to distinguish from properly intonational ones, and it is a matter of considerable controversy which aspects are which, or whether such a distinction is even possible. I will return to discuss this at length at the end of the chapter, and at various places throughout the book.

1.1.2 Pitch and relative prominence

Formally and functionally, the phenomena covered by the three-part definition just given have two orthogonal and independently variable aspects, which we might refer to as ‘pitch’ and ‘relative prominence’. These two aspects are illustrated by the four intonational possibilities of the simple utterance five pounds informally sketched in figure 1.1.

Figure 1.1 (image not available). Tune and relative prominence as two independently variable aspects of intonation, illustrated on the English phrase five pounds.

Pitch. The two pitch patterns shown are by no means the only possibilities in English, but they are clearly distinct. The ‘falling’ pitch pattern is the one that would normally be used in a straightforward reply to a question, for example in answer to a question like How much does it cost? The ‘rising’ pattern would normally be used to convey doubt, uncertainty, or some other ‘questioning’ modality: it could be used to ask for confirmation that the speaker has heard correctly (Did you say) five pounds? Alternatively, in the relatively recent usage sometimes known as ‘uptalk’ or ‘High Rising Terminal (HRT)’, the ‘rising’ pattern can also be used on five pounds in answer to a question like How much does it cost? In this usage, the rising pattern would signal that the speaker is not sure of the answer, or that the price seems unreasonable, or more generally would invite feedback from the questioner about whether the price is acceptable. This shows that it is not possible to identify pitch patterns with sentence types in any simple way. It does not, however, undermine the point being made here, since the two patterns are still clearly distinct: even in the popular press notice has been taken of ‘uptalk’, suggesting that the distinction between falling and rising is obvious to the casual native-speaker observer.²

Relative prominence. The two prominence patterns are also clearly distinct. The first, weak–strong, is the ‘neutral’ stress pattern, used when there is no particular reason to emphasise either five or pounds, or (to put it somewhat differently) when the focus is on the phrase as a whole. This is the stress pattern that would normally be found if the phrase were used to answer a wide range of questions, like How much does it cost? or What did you give him? or What have you got there? The second pattern of prominence, strong–weak, focuses on five for contextual reasons, and would normally only be used in a discourse context where a specific number of pounds was under discussion: that is, as an answer to a question like Did you say four pounds? In either pattern, it is possible to bring about perceptible gradual modifications of the phonetic ‘prominence’ of the individual words five or pounds by gradual changes in various acoustic parameters, adding a slight nuance of emphasis to one word or the other, but the prominence pattern of the utterance as a whole must fall into one of the two categories shown: either we have narrow focus on five, or we do not. In describing the situation this way, I am not ignoring the fact that the weak–strong pattern can be used to focus on pounds, for example in reply to a question like Did you say five euros? Rather, I am claiming that there is a clear asymmetry in the linguistic effects of the two prominence patterns, a point that will be discussed at some length in chapters 6 and 7.

The distinctions of pitch and relative prominence shown in figure 1.1 fit all three points of the definition of intonation presented above. First, of course, the features under discussion are obviously suprasegmental. Second, the meanings conveyed are clearly not lexical: the meanings of five and pounds are unaffected by the intonational changes, and the differences of pitch and of relative prominence affect the meaning of the utterance as a whole. Finally, the distinctions are linguistically structured, in the sense that we are dealing with categories such as rising versus falling, or weak–strong versus strong–weak. Detailed phonetic differences of prominence on the individual words occur and are meaningful, but they work within the phonological framework of the two possibilities strong–weak and weak–strong. The extent to which a categorical structure is involved in intonation is, as I said, a point of some controversy, but in these specific examples it seems fairly clear that we are dealing with sharp rather than gradual distinctions. By our definition, then, pitch and relative prominence are at the heart of intonation, and the organisation of the book's chapters is based on the centrality of these two clusters of phenomena.

Nevertheless, two points require further comment. First, in much earlier work it is often assumed that there are three main aspects to intonation rather than two. In a three-way division of intonational function, the third major function of intonation is said to be the division of the stream of speech into intonationally marked chunks (‘intonational phrases’, ‘tone groups’, and related terms). In the American phonemic tradition, for example, the three aspects were called ‘pitch’, ‘stress’, and ‘juncture’ (e.g. Trager and Smith 1951); juncture phonemes were supposed to be phonetically definable boundary markers of one sort or another. Halliday (1967a) states explicitly that the intonation of an utterance involves features of ‘tone’ (my ‘pitch pattern’), ‘tonicity’ (part of what I am calling ‘relative prominence’), and ‘tonality’ (the division of the utterance into tone groups). Other writers have made similar distinctions.

I do not deny of course that there are phonetic cues to the division of the stream of speech into smaller chunks, but I regard this fact as following from the existence of phonological structure, of the sort that has been extensively discussed in the literature on prosodic structure since the late 1970s (e.g. Selkirk 1980, 1984; Nespor and Vogel 1986; Truckenbrodt 1999). That is, I assume that utterances have a phonological constituent structure (or prosodic structure), and that the prosodic constituents have various phonetic properties, both segmental and suprasegmental. Intonation has no privileged status in signalling prosodic structure – indeed, much of the work on ‘prosodic phonology’ (e.g. Nespor and Vogel 1986) deals with segmental sandhi rules (rules describing phonetic adjustments at word and morpheme boundaries, such as the palatalisation that yields gotcha from got + you). Moreover, I assume that constituent boundaries in prosodic structure are in the first instance abstractions, not actual phonetic events: intonational features of pitch and relative prominence are distributed in utterances in ways allowed by the prosodic structure. In some cases this means that conspicuous phonetic breaks occur at major constituent boundaries, but this is neither the essence of the boundary nor the only factor governing the distribution of the intonational features. I will return to the issue of prosodic structure and its relation to intonational features in chapters 7 and 8.

The second point on which comment is required has to do with the relation between phonological and phonetic description. In distinguishing pitch from relative prominence and treating the two aspects of intonation as ‘independent’ and ‘orthogonal’, we are making a phonological abstraction. As can be seen from figure 1.1, there is a great deal of phonetic interaction between the two sides of the intonational coin: in short utterances like five pounds, the relative prominence is actually cued perceptually primarily by the pitch contour (see section 2.2). However, the fact that the pitch pattern and the prominence pattern can vary independently shows that we are dealing with two distinct phenomena: that is, in a general account of intonation, it is useful to posit an abstract prominence pattern, distinct from the pitch contours that may serve to realise it phonetically. To put this distinction in fairly traditional terms, the ‘sentence stress’ or ‘nuclear stress’ on five or pounds can be referred to independently of the ‘pitch accent’ or ‘nuclear tone’ by which it is phonetically manifested.
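The claim that pitch pattern and prominence pattern vary independently can be pictured as a simple cross-classification. The sketch below is only an illustration, not part of the original analysis: the labels are taken from the discussion of five pounds above, while the constant and function names are hypothetical.

```python
from itertools import product

# Category labels taken from the discussion of "five pounds"; the names of
# these constants and of the function below are illustrative only.
PITCH_PATTERNS = ("falling", "rising")
PROMINENCE_PATTERNS = ("weak-strong", "strong-weak")


def intonational_possibilities(phrase):
    """Return every (phrase, pitch pattern, prominence pattern) combination."""
    return [
        (phrase, pitch, prominence)
        for pitch, prominence in product(PITCH_PATTERNS, PROMINENCE_PATTERNS)
    ]


for possibility in intonational_possibilities("five pounds"):
    print(possibility)
```

Because each choice is made from its own set, two pitch patterns and two prominence patterns give the four possibilities sketched in figure 1.1. The sketch deliberately says nothing about how a prominence pattern is cued phonetically, which is where the two dimensions interact.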

1.1.3 Intonational phonology

This brings us to the term ‘intonational phonology’. Until the late 1970s there was not really any such notion, and even now it is not obvious to some intonation researchers that intonation has a phonology worth discussing. Since the publication of the first edition of this book, the actual collocation intonational phonology has become more common, but as I pointed out in the preface to the present edition, some people have apparently understood it as designating a school of thought rather than a set of phenomena. It is therefore necessary to demystify this term quite explicitly.

At a minimum, a complete phonological description includes (a) a level of description in which the sounds of an utterance are characterised in terms of a relatively small number of categorically distinct entities – phonemes, features, or the like – and (b) a mapping between such a description and a physical description of the utterance in terms of continuously varying parameters, such as an acoustic waveform or tracks of the movement of the articulators. I emphasise that this characterisation of phonology is not intended to be controversial, although admittedly it has a laboratory bias that not all readers may share. I also emphasise that it is intended to apply to phonological phenomena of any sort, not just intonation. It obviously deals mostly with issues of ‘postlexical’ phonology and phonetic realisation, and consequently leaves out all sorts of aspects of morphophonemics or ‘lexical’ phonology that would be needed for a characterisation of phonology as a whole. Nevertheless, the parts it leaves out are, by the definition given in section 1.1.1, irrelevant to intonation, and it will therefore serve as an adequate notion of phonology for our purposes here.
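The two components of this minimal characterisation, a string of categorically distinct entities and a mapping from that string to continuously varying parameters, can be illustrated with a toy example. Everything specific in it is an assumption made for the sketch rather than a claim of the text: the categories are just 'H' (high) and 'L' (low), each is realised as a fixed F0 target at the top or bottom of a hypothetical speaker's range, and the contour between targets is filled in by linear interpolation.

```python
def realise_tones(tones, speaker_low_hz, speaker_high_hz, samples_per_interval=4):
    """Map a categorical tone string, e.g. ['H', 'L'], to a list of F0 values in Hz.

    Assumptions of this sketch: each tone has a single target at the bottom
    ('L') or top ('H') of the speaker's range, and F0 moves in a straight
    line from one target to the next.
    """
    level = {"L": speaker_low_hz, "H": speaker_high_hz}
    targets = [level[tone] for tone in tones]

    contour = []
    # Sample each stretch between two neighbouring targets.
    for start, end in zip(targets, targets[1:]):
        for step in range(samples_per_interval):
            fraction = step / samples_per_interval
            contour.append(start + fraction * (end - start))
    # Close the contour with the final target itself.
    contour.append(targets[-1])
    return contour


# The same categorical description realised for two hypothetical speakers.
print(realise_tones(["H", "L"], speaker_low_hz=100.0, speaker_high_hz=200.0))
print(realise_tones(["H", "L"], speaker_low_hz=180.0, speaker_high_hz=320.0))
```

The point of the sketch is the division of labour: the list of tones is the same for both speakers, and only the mapping, here parameterised by each speaker's range, produces the different physical contours.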

Minimal though such a phonology may be, it is not something that is encountered very often in past work on intonation. Until the late 1970s there were two essentially separate approaches to studying intonation, which in their own way both failed to include a description that we might call phonological according to the characterisation just given. The two approaches also largely ignored each other. For want of better terms I will refer to these as the ‘instrumental’ approach and the ‘impressionistic’ approach, though – anticipating my conclusions a bit – I might also designate the two views as ‘phonetic’ and ‘proto-phonological’. It will be useful to sketch these two approaches briefly.

The ‘instrumental’ or ‘phonetic’ tradition was that of experimental psychologists and phoneticians interested in speech perception and in identifying the acoustic cues to intonational phenomena. An excellent review of this work up to the late 1960s is Lehiste 1970. Much of this work has focused on discovering the acoustic cues to several specific intonational phenomena, in particular: (a) syntactic/pragmatic notions like ‘finality’, ‘continuation’, and ‘interrogation’ (e.g. Hadding-Koch and Studdert-Kennedy 1964; Delattre 1963; Lieberman 1967); (b) emotional states such as anger, surprise, and boredom (e.g. Lieberman and Michaels 1962; Williams and Stevens 1972); and (c) word and sentence stress (e.g. Fry 1958; Lieberman 1960). In none of these cases can a clear understanding be said to have resulted, though there are some fairly general findings that are well established, such as the fact that active emotions like anger or surprise are generally signalled by higher overall pitch (Uldall 1964; Williams and Stevens 1972), or that the duration of pauses at intonational breaks correlates well with the syntactic ‘strength’ of the boundary (Cooper and Paccia-Cooper 1980). But more conclusive findings seemed elusive, and fundamental uncertainty remained about such questions as the acoustic nature of stress.