Does Cued Speech Entail Speech? An Analysis of Cued and Spoken Information in Terms of Distinctive Features
By Earl Fleetwood, M.A., and Melanie Metzger, Ph.D.

Part 2

For the purposes of this study, the visible attributes of cued test items were defined in the following terms: an isolated cued English allophone was considered the product of (1) the simultaneous pairing of a particular mouth formation with a particular hand shape (at any of 4 specific placements) to represent a specific consonant phoneme or (2) the simultaneous pairing of a particular mouth formation with a particular hand placement (with any of 8 specific hand shapes) to represent a specific vowel phoneme. The articulators that produce a cued phonemic referent were assumed to be visible. It was assumed that visual access to the articulators of cueing constitutes access to a phonemic referent (cued allophone) produced by the articulators of cuem. These assumptions were used as the basis for predicting the responses of the deaf native English cuers to both the N and the C test items. As part of this study, the efficacy of these assumptions was determined by whether or not actual responses coincided with predicted responses. See Appendix A to view a single cued allophone6 for each phonemic value in American English.
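The two-part definition above can be modeled schematically. The following Python sketch is purely illustrative; the type names and placement labels are hypothetical and do not appear in the study or in Cued Speech notation:

```python
from dataclasses import dataclass

# Hypothetical labels; the study defines no notation like this.
HAND_SHAPES = tuple(range(1, 9))                       # 8 contrastive hand shapes
HAND_PLACEMENTS = ("side", "chin", "mouth", "throat")  # 4 contrastive placements

@dataclass(frozen=True)
class CuedConsonant:
    """Consonant referent: mouth formation paired with a hand shape;
    the hand shape may be articulated at any of the 4 placements."""
    mouth_formation: str
    hand_shape: int   # contrastive: selects the consonant value
    placement: str    # non-contrastive for consonants

@dataclass(frozen=True)
class CuedVowel:
    """Vowel referent: mouth formation paired with a hand placement;
    any of the 8 hand shapes may accompany it."""
    mouth_formation: str
    placement: str    # contrastive: selects the vowel value
    hand_shape: int   # non-contrastive for vowels

# A consonant referent targets the same phoneme at any placement:
c1 = CuedConsonant("closed lips", hand_shape=1, placement="side")
c2 = CuedConsonant("closed lips", hand_shape=1, placement="chin")
assert (c1.mouth_formation, c1.hand_shape) == (c2.mouth_formation, c2.hand_shape)
```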

For the purposes of this study, the acoustic attributes of spoken test items were defined in the following terms: an isolated spoken English allophone was considered the product of employing a mouth, teeth, tongue, throat, and soft-palate formation through which exhaled air is channeled and which is simultaneously voiced or non-voiced. The articulators that produce a spoken allophone were assumed to be not completely visible. The exhalation of air was assumed to be not completely visible. The channeling of air was assumed to be not completely visible. Whether the referent is voiced or non-voiced was assumed to be not visible. It was assumed that visual access to the articulators of speaking does not constitute access to an allophone produced by the articulators of speech. Simply put, it was assumed that the sounds generated by speech production cannot be seen. These assumptions were used as the basis for predicting the responses of the hearing native English speakers to both the N and the C test items. Again, as part of this study, the efficacy of these assumptions was to be determined by whether or not actual responses coincided with predicted responses.
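The visibility assumptions for speech can be summarized in the same schematic fashion. In this illustrative sketch (the feature names are hypothetical), a spoken allophone is treated as a bundle of articulatory features, and visual access to the allophone would require every constitutive feature to be visible:

```python
# Hypothetical feature bundle for a spoken allophone;
# True = visible to an observer, False = not (completely) visible.
SPOKEN_ALLOPHONE_FEATURES = {
    "mouth_formation": True,   # partially visible; treated here as visible
    "tongue_position": False,
    "soft_palate":     False,
    "exhaled_airflow": False,
    "air_channeling":  False,
    "voicing":         False,
}

def fully_visible(features):
    """Visual access to an allophone requires that every
    constitutive feature be visible."""
    return all(features.values())

# The study's assumption: speech cannot be fully seen.
assert not fully_visible(SPOKEN_ALLOPHONE_FEATURES)
```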

During a pre-study review of the testing material, flaws were noted in cued test items numbered 23 and 25; unintended hand shapes and/or hand placements were articulated. This likely reflects the cognitive difficulty of producing mismatched cued-spoken information; the cuer who presented the test items does not ordinarily attempt to generate two different linguistic messages simultaneously. Although participants were exposed to the flawed test items, those items are not included in the results and analysis portions of the current study, as they do not satisfy the test material parameters.

Procedure

A VHS recording was prepared in advance of the experiment. The videotape contained simultaneously recorded video and audio tracks. The video track consisted of visual (cued) information, specifically the cued representation of isolated phonemes, isolated words, and short phrases. The audio track consisted of acoustic (spoken) information, specifically the spoken representation of isolated phonemes, isolated words, and short phrases. Participants were exposed twice to the same videotaped test items and were not told that the same test items would be used in both trials. Each participant's first trial was conducted exclusively via his or her native mode of communication (i.e., cued/visual for the deaf participants, spoken/acoustic for the hearing participants). Participants who used assistive listening devices as a part of regular life routines continued to use them for both trials in this study.

If members of either group were given access to the other's native mode of communication (i.e., speech or cuem7), it would be unclear whether written responses were products of (a) the distinctive features of cueing, (b) the distinctive features of speech, or (c) a mixture of the distinctive features of cueing and of speaking. By eliminating sound from the test items received by deaf native cuers, the acoustic features of speech were removed as a consideration with regard to what prompted their written responses. By eliminating the video image from the test items received by hearing native speakers, the visible features of cuem were removed as a consideration with regard to what prompted their written responses. Because the cued and the spoken test items were co-presented, a comparison of the responses of the two groups (the deaf native cuers and the hearing native speakers) can be revealing with regard to (a) whether or not speech and cuem exist autonomously, (b) whether the distinctive features of speech and of cuem differ or are one and the same, and (c) whether the distinctive features of speech and of cuem can exist independently yet co-occur. Thus, initial exposure to the test stimuli was provided exclusively via the native mode of a given test subject.

During the deaf participants’ first trial, the television monitor was adjusted such that the video image could be seen to the satisfaction of participants. Additionally, during the deaf participants’ first trial, the volume on the television monitor was set to zero (no sound).

During the hearing control group's first trial, the volume on the television monitor was adjusted such that the sound track of the video recording matched the decibel level that the participants were using in pre-test conversation. Additionally, during the hearing participants' first trial, the picture on the television monitor was dimmed completely.

Each participant was also provided a second trial with the same test stimuli. The second trial co-presented spoken and cued test items. This was done to determine whether or not participants who are native to the distinctive features of a particular mode (i.e., speech or cuem) are influenced by the distinctive features of their non-native mode, should the distinctive features of one mode prove to be different from the distinctive features of the other. Without simultaneous exposure to both the spoken and the cued test items, the results of the current study would be subject to scrutiny on the grounds that the test design had systematically segregated features into two pre-determined sets and then concluded that participants naturally defer to those sets. Thus, the second trial allowed participants to demonstrate to which distinctive features they deferred.

During the deaf participants' second trial, the volume on the television monitor was adjusted such that the sound track of the video recording matched the decibel level that the hearing control group used in pre-test conversation. Deaf participants who make use of residual hearing and/or assistive listening devices had the opportunity to do so. The television monitor was adjusted such that the video image could be seen to the satisfaction of participants.

During the hearing control group's second trial, the television monitor was adjusted such that the characteristics of the video image matched what the deaf participants had seen. The volume on the television monitor was adjusted such that the sound track of the video recording matched the decibel level that the hearing participants were using in pre-test conversation.

Both groups of participants were exposed twice to 16 stimulus items, each exposure constituting a trial. The end of the first trial and the beginning of the second trial were unmarked, and participants were told that the stimuli contained 32 items. Specifically, participants were directed via a blank answer sheet numbered from 1 to 32 as per the following written8 English instructions: “Using the video/audio recording as a source, for each number below, write the information that you receive.”
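Schematically, the presentation conditions across the two trials can be tabulated as follows. This is an illustrative reconstruction in Python; the group and trial labels are ours, not the authors':

```python
# Each trial presented the same 16 items; only channel access differed.
N_ITEMS = 16

CONDITIONS = {
    ("deaf native cuers", "trial 1"): {"video": True,  "audio": False},
    ("deaf native cuers", "trial 2"): {"video": True,  "audio": True},
    ("hearing speakers",  "trial 1"): {"video": False, "audio": True},
    ("hearing speakers",  "trial 2"): {"video": True,  "audio": True},
}

# Participants saw one continuous answer sheet, so the two trials
# appeared to them as a single list of 32 items.
answer_sheet = list(range(1, 2 * N_ITEMS + 1))
assert answer_sheet[-1] == 32
```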

The purpose of providing two trials was to allow both intra- and inter-group comparisons. Such comparisons provided evidence for determining whether or not spoken (acoustic) information affected the responses of deaf native English cuers. Although not a question for the current study, they also provided evidence of whether cued (visual) information affected the responses of the hearing, native English-speaking control group.

It is worth noting that controlling for access to a given participant's non-native mode was not fully possible. Both sighted deaf and sighted hearing individuals have complete access to the products of cueing (e.g., visible symbols) and, thus, to the visible features of cuem. However, sighted deaf individuals do not have complete access to the acoustic features of speech; the acoustic symbols produced by speaking are not visible. Thus, despite attempts to provide equivalent controls for exposure to the non-native mode, such equivalency is, at best, uncontrolled with regard to the deaf participants' second trial. Fortunately, the first trial provided the controls necessary to determine whether or not cueing entails the distinctive features of speaking. The second trial was simply an attempt to discover to which set of distinctive features the two groups of participants deferred when visible and acoustic features were co-presented.

Test stimuli were presented by a 35-year-old, female, hearing native speaker of American English. The presenter had 20 years of experience communicating language via the visual mode, including functioning as a signed language interpreter (interpreting between American Sign Language and spoken English) certified by the Registry of Interpreters for the Deaf, Inc.; as a cued language transliterator (transliterating between cued and spoken English) certified by the Testing, Evaluation, and Certification Unit, Inc.; as a cued language transliterator educator teaching graduate and undergraduate courses; and as a professor of signed language interpretation in a graduate program. The presenter made no errors on the Basic Cued Speech Proficiency Rating (Beaupré, 1983), a standardized test that evaluates knowledge and skills with regard to cueing mechanics.

Results

Participants in the hearing, native English-speaking control group recorded the same responses for both trials (see Table 2). Given that the only difference between the trials was the addition of the video image, it appears that the rendering of cued information did not affect their linguistic judgments. This provides evidence that hearing native users of spoken English who have not previously seen or studied cueing do not consider the articulatory features of cuem when making linguistic decisions. This was true of the hearing control group with regard to their phonemic, morphemic, and syntactic decisions (see Table 2). It provides evidence that the visible articulatory features of Cued Speech (i.e., hand shape, hand placement, and mouth formation) are not entailed by speech.

It is not surprising that the linguistic decisions of hearing participants appeared unaffected by the presence of hand shapes and hand placements; no member of the hearing control group had experience with cueing. It is also not surprising that the mouth shapes serving as one feature of the cued stimuli did not confound the hearing participants’ predicted responses to the spoken stimuli. Even when linguistic values for co-presented test items did not match, the mouth shapes coincided with what is likely the hearing participants’ experience with seeing speech while hearing it produced (for reviews, see Summerfield, 1987; Massaro, 1987).

The fact that the hearing participants consistently provided predicted responses to the acoustic stimuli presented in the absence of the video image is an interesting result. Where mouth shape is not accessible, one could expect that spoken allophones representing isolated phonemes such as /m/ might instead be perceived as spoken allophones of isolated phonemes such as /n/ (cite). The fact that this did not occur simply suggests that although visual information might be used when available, it is not a requirement for the reception of spoken language, at least not for the test items as provided to the hearing control group in this study.

Deaf native users of cued English recorded the same responses for their first and second trials (see Table 2). Given that the only difference between stimuli in the second trial was the addition of the sound-track information, it can be said that for the deaf participants the inclusion of spoken information was linguistically unnecessary with regard to the rendering and/or comprehension of cued information. This was true regardless of their hearing acuity. Because deaf participants provided responses in keeping with a different set of features than those yielding messages in the acoustic mode for the N test items, it appears that the articulatory features of speech are not salient to linguistic decision making by deaf native English cuers. This is true with regard to their phonemic, morphemic, and syntactic decisions. It provides evidence that the acoustic articulatory features that constitute speech are not entailed by Cued Speech. Such evidence is significant, as it runs counter to the previously described implicit assumption that Cued Speech entails articulatory features and/or products of speech.

Although no interviews with the participants were conducted, two of the deaf native cuers made comments via cued American English as they departed the test site. They expressed the belief that the acoustic information provided in the second trial influenced, and even changed, their judgments about what had been cued. Given that deaf native cuers provided the same written responses in the first trial as they did in the second trial, the test data do not bear out the aforementioned belief held by these two deaf participants; the deaf native cuers did not defer to acoustic information when making linguistic judgments.

Perhaps any acoustic information that a deaf participant might have received was accessible only to the degree that it served a redundant and secondary function. For example, if the deaf participants could hear the number of syllables rendered in the spoken test items (e.g., two syllables in the spoken word hilltop), perhaps they used this information as confirmation of the number of syllables they had seen cued (e.g., two syllables cued in the co-presented word kingdom). If the number of syllables did not coincide (e.g., the cued four-syllable phrase for insurance co-presented with the spoken five-syllable phrase foreign journalist), perhaps this went unnoticed, or perhaps the visual (i.e., cued) information was simply prioritized over the acoustic (i.e., spoken) information in linguistic decision making. Whatever the impressions of these two deaf participants when co-presented cued and spoken test items, the written responses of all the deaf participants suggest not only that they do not make linguistic judgments as products of acoustic (i.e., spoken) information but also that they do not make such judgments as products of the visible articulators of speech. What the deaf native cuers could see of the articulation of speech did not necessarily elicit the same linguistic responses as did what the hearing native speakers could hear of the products of speech articulation.

Where all other participants provided one set of responses to the test material, one deaf native cuer provided two sets of responses to selected items: one set was provided in response to the first trial, and two sets were provided in response to the second trial. For each test item found in the second trial — the exposure that included acoustic information — this deaf native cuer provided two responses, writing “cued” next to one written response and “spoken” next to a second written response for a given test item. At first glance, it appears that this deaf native cuer was able to simultaneously receive and process visible (i.e., cued) and acoustic (i.e., spoken) information and separate them in terms of their linguistic values.

It is noteworthy, however, that this deaf native cuer's written responses to the second trial seem to assume that all of the test items were, in effect, N test items (i.e., cued-spoken mismatches); while this participant might have differentiated what was seen from what was heard with regard to the actual N test items, it is possible that this participant's mismatch responses for the C test items are simply products of patterns he/she noted in his/her mismatch responses to the actual N (i.e., dissociated) test items. Regardless, as with all of the other deaf native cuers, all of this participant's responses to the cued test items coincided with the predicted responses for those test items. This test subject's seeming ability to simultaneously process and recall two distinct messages (for the N test items) provides information consistent with other data collected in the current study and serves as unique evidence in terms of addressing the focus of this study.

Responses of the deaf native cuers coincided with responses of the hearing native speakers for the C test items. However, responses of the deaf native cuers did not coincide with those of the hearing native speakers for the N test items. The former finding suggests that the linguistic values rendered via the articulators of cuem can coincide with the linguistic values rendered via the articulators of speech. However, the latter finding suggests that this linguistic correlation is not the product of a coinciding articulatory system. Reconciling this with the earlier finding that deaf native English cuers do not consider the articulatory features of speech when making linguistic decisions, it appears that cueing and speaking work autonomously in rendering linguistic values. Apparently, any linguistic correlation between cueing and speaking is an option rather than a requirement. Regardless of whether a simultaneously cued and spoken message can yield information of equivalent linguistic value, results of the current study suggest that the salient articulatory features of Cued Speech are functionally distinct from those of speech.

The result of presenting visual and acoustic mismatches to sighted hearing participants has been noted by McGurk and MacDonald (1976). The McGurk Effect refers to the finding that simultaneous exposure to both the visual and the acoustic channels can create a percept found in neither channel exclusively. The McGurk Effect suggests that sighted, hearing people who are presented information simultaneously in two channels are affected by the information originating in both channels in terms of what they receive, perceive, and process.

While simultaneous cueing and speaking also presents information in two channels, the current study finds no evidence of the McGurk Effect where participants are sighted deaf individuals. The sighted deaf participants provided responses consistent with information found in the visual channel regardless of the presence of information in the acoustic channel. In light of the McGurk Effect, this suggests that while simultaneous cueing and speaking constitutes the bimodal information presented to the sighted deaf individuals, in this study it does not constitute the information that they receive, perceive, and process.

A comparison of responses between the two groups tested yields predictable differences at the phonemic, lexical, and phrasal levels for the N test items. Because phrases are sequences of lexical items and because lexical items are sequences of phonemic referents (i.e., allophones), it follows that differences in the perception of phonemic sequences (lexical and phrasal) might be founded in perceptual differences with regard to isolated allophones. Thus, the production of allophones is the starting point of the analysis.

According to the data, for every isolated allophone tested, responses were consistent within the group of deaf native English cuers. Likewise, for every isolated allophone tested, responses were consistent within the group of hearing native English speakers. However, as predicted, responses were not consistent between the two groups; for the N test items, phonemes indicated by the group of deaf native English cuers did not coincide with those indicated by the hearing native English speakers with regard to the simultaneously cued and spoken test items. Where simultaneously cued and spoken information differed linguistically, the written responses of the two groups (a) targeted different values and (b) were consistent within a given group.
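The comparison logic can be stated compactly: responses must be uniform within each group, and for the N test items the two groups' targets diverge. The following is a minimal illustrative check in Python; the response strings are placeholders, not study data:

```python
def uniform(responses):
    """All members of a group wrote the same value for an item."""
    return len(set(responses)) == 1

def groups_coincide(deaf, hearing):
    """Both groups uniform and targeting the same value (the C-item pattern)."""
    return uniform(deaf) and uniform(hearing) and deaf[0] == hearing[0]

# N-item pattern reported above: within-group consistency,
# between-group divergence.
deaf = ["x", "x", "x"]
hearing = ["y", "y", "y"]
assert uniform(deaf) and uniform(hearing)
assert not groups_coincide(deaf, hearing)
```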

The data also reveal that at least some articulatory features associated with spoken allophones were irrelevant to and/or disregarded by the group of deaf native English cuers. For example, one of the simultaneously cued and spoken test items included a two-second acoustic rendering that was identified by all of the hearing native English speakers as m. The simultaneously produced two-second visible rendering was identified by all of the deaf native English cuers as p. Thus, it seems that the articulatory features of cuem accept visual allophones of /p/ that contain a durative aspect. This is unlike the articulatory features of speech, which, for spoken American English, present allophones of /p/ only as non-durative plosives or stops. The data show that, with regard to the representation of at least some phonemes, the articulators of cuem accept as having a durative (i.e., +continuant) quality that which speech regards as strictly a plosive or stop (i.e., -continuant). Thus, it appears that cueing and speaking differ with regard to manner of articulation.
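In distinctive-feature terms, the contrast can be sketched as follows. The feature value for spoken /p/ is standard; treating cued /p/ as unspecified for continuancy is our reading of the data, expressed in hypothetical notation:

```python
# Spoken /p/ is a stop: [-continuant], so a two-second (durative)
# rendering is not an allophone of it in spoken American English.
spoken_p = {"continuant": "-"}

# Cued /p/: duration is not contrastive among the articulatory
# features of cuem, modeled here as an unspecified feature.
cued_p = {"continuant": None}

def accepts_durative(features):
    """A durative rendering is acceptable unless the system marks
    the segment as strictly [-continuant]."""
    return features.get("continuant") != "-"

assert not accepts_durative(spoken_p)  # speech rejects a durative /p/
assert accepts_durative(cued_p)        # the deaf cuers accepted one
```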

Differences between the articulatory features of cuem and those of speech go to the point of the current study. However, in addition to behaviors that distinguish cueing and speaking in an articulatory sense, it is also noteworthy that these differences have linguistic implications. Where the articulators of cuem accommodate duration in instances where speech might not, it is the deaf native cuers who subconsciously decide whether the visible symbols generated have been ascribed linguistic value. In the example above, the deaf participants performed a subconscious linguistic exercise when they accepted as a cued allophone of /p/ a visible symbol in which duration is a feature. This piece of evidence provides some insight into the nature of cued allophones, how the attributes of cued allophones are articulator-specific, and why attributes of cued allophones are not likely to map onto attributes of spoken allophones.

Another articulatory feature associated with spoken allophones, that of +voice, is apparently not an articulatory feature of cued allophones. For at least one of the simultaneously cued and spoken renditions of an isolated phonemic referent, the deaf native cuers wrote p, apparently indicating /p/ as the target phoneme, while the hearing native speakers wrote m, apparently indicating /m/ as the target. This suggests that +voice did not override visible information used by the deaf native cuers in making linguistic decisions. It appears that +voice is neither necessary nor salient to the representation of cued allophones. Thus, it appears that voice is a distinctive articulatory feature neither of cuem nor of cued English.
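The same schematic treatment applies to voicing. Again, this is hypothetical notation; the feature values for spoken /m/ are standard, and the cued feature labels are placeholders:

```python
# The hearing group heard [+voice] (and identified /m/); the deaf
# group's target, cued /p/, is built from features that include no
# voicing dimension at all.
spoken_m = {"voice": "+", "nasal": "+"}
cued_p = {"hand_shape": "HS-p", "mouth_formation": "closed lips"}  # placeholder labels

# "voice" has no counterpart among the articulatory features of cuem,
# so it cannot be distinctive there.
assert "voice" not in cued_p
```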

Test item 22 included the simultaneous rendering of the cued word kingdom and the spoken word hilltop. In keeping with predicted responses, the deaf native English cuers perceived a cued allophone of /k/, transcribing k in the word-initial position while the hearing native English speakers perceived a spoken allophone of /h/, writing h in the same word-initial position. This difference in linguistic categorization by the two test groups suggests that a cued allophone of /k/ need not exist as the product of a velar production (since none was rendered). This is evidence that place of articulation differs where cueing is compared with speaking.

Given the design of Cued Speech, the aforementioned evidence regarding place of articulation is particularly intriguing. By design, Cued Speech allows spoken messages and cued messages to co-occur. Where the mouth as an articulator is concerned, a commonly held notion is that place of articulation co-occurs as well. After all, the same visible mouth configuration is used to generate simultaneously rendered spoken allophones and cued allophones.

Nevertheless, the data provide evidence that the visible features constituting a given mouth configuration and the place-of-articulation features used to generate a given allophone are not systematically one and the same where speech is concerned. For example, a visible mouth shape accompanies the production of spoken allophones for each of the phonemes /h/, /g/, /ng/, and /k/. Certainly, mouth shape is relevant to accurately generating cued allophones representing each of these phonemes. However, and in contrast, where spoken allophones of /h/, /g/, /ng/, and /k/ are concerned, mouth shape does not constitute place of articulation. Test item 22, noted above, provides evidence of this reality.

The pattern of consistency within the two groups and inconsistency between them continues with regard to lexical and phrasal responses. Still, for some isolated lexical items as well as for some lexical items within the phrasal test stimuli, written responses by the deaf native English cuers were consistent with those indicated by the hearing native English speakers. It appears that simultaneous production of cued allophones and spoken ones is possible. It also appears that cued allophones and spoken allophones can target orthographic symbols representing the same phonemic values. This is not surprising as it is consistent with the findings of previous studies. However, it is noteworthy that past studies have only examined the production and reception of cued information where the linguistic values presented in the cued mode coincided with those presented in the spoken mode.

It is, perhaps, this ability of cuem and speech to simultaneously represent equivalent linguistic values that has led researchers and others to assume or conclude that Cued Speech entails speech. Nevertheless, the current study provides evidence that information produced via cued English and information simultaneously produced via spoken English need not coincide linguistically. Even when cueing and speaking English co-occur, representations of the same phonemic values need not be produced by the sender or perceived by a deaf native English cuer and a hearing native English speaker.

At the lexical level, disparity exists between the two groups tested with regard to the identification of both consonant (e.g., trend /trend/ vs. dread /dred/) and vowel (e.g., pig /pIg/ vs. beak /bik/) phonemes. At the phrasal level, the two groups also differed in their perception of word boundaries (e.g., I paid for insurance. vs. I met a foreign journalist.) and grammatical function (e.g., It could happen. vs. It's a good habit.). Given the differences across groups in the identification of phonemes, it follows that perceptual differences at the lexical and phrasal levels would occur; phonemes build the lexicon, and the lexicon entails and builds the syntax of English.

Because correspondence between the two groups with regard to the identification of phonemes and word boundaries need not occur, it can be said that the articulatory features that convey these structural aspects of cued English do not correspond with the articulatory features that convey the same aspects of spoken English. Ultimately, it appears that place, manner, and voicing as they describe speaking (a) do not describe and (b) are not salient to cueing. Again, it appears that Cued Speech does not entail the distinctive features of speech.

The authors of the current study are quick to note the perfect correlation between expected and actual responses to test items. Such a correlation suggests the possibility of ceiling effects. It is, therefore, important to consider whether such effects, if real, would counter evidence relevant to the current question. Toward that end, the possibility must be considered that counter-indicative data would be generated in response to different test items and/or in alternate test conditions.

Counter-indicative data might yield evidence suggesting that acoustic or articulatory products of speech influence the deaf participants in at least some instances of their linguistic decision making. However, in order to serve as evidence that is actually counter to the current finding, the nature and degree of this apparent influence would need to be very specific. The influence would need to be (a) systematic and predictable within the population(s) tested, (b) other than explainable by the McGurk Effect, and (c) distinctly primary, overcoming the possibility that the acoustic products of speech that influence deaf native cuers serve the same function as the visible products of speech employed by hearing native speakers.9 In other words, evidence could only be considered counter to the current finding if it found distinctive features of speech requisite to receiving, perceiving, and comprehending cued information.

Absent such evidence, findings of the current study suggest that the linguistic decision making of the deaf native cuers predictably deferred to an autonomous set of visibly distinctive features. Moreover, because the generation, reception, and perception of phonetic and phonemic information was successfully segregated into two distinct modes, the potential impact of possible ceiling effects does not appear to be significant to the current findings.

6 Allophones are symbols that (a) are generated by a set of articulators and (b) function as referents to phonemic values. The features that constitute a given allophone are articulator-specific. Because cueing includes articulators that are not used to produce speech, cued allophones are defined by a different set of features than are spoken allophones, even if cueing and speaking co-occur. As with spoken allophones, the number and nature of cued allophones is driven by the rendering of a given language by the given set of articulators. Appendix A illustrates only one possible cued allophone for each of the phonemes of American English.

7 In order to more easily refer to the visible features of Cued Speech independently of the speech, speechreading, and/or sound references and assumptions found in traditional definitions, discussions, and most research, Fleetwood and Metzger (1998) use the term cuem (hand cues + mouth formations) to refer to the strictly visible phenomenon. This phenomenon is characterized by the coordinated manipulation of articulators, including hand shapes, hand placements, and non-manual signals found on the mouth, used to visibly render the phonology, and subsequently the morphology and syntax, of approximately 60 of the world’s major languages and dialects. The term cuem is used to clearly differentiate between the articulators of spoken messages (e.g., lips, teeth, tongue: speech) and the articulators of cued messages (e.g., hand shapes, placements, and mouth formations: cuem) under investigation.

8 Collecting data in written form confined the responses of all participants to a common medium. As a result, spoken responses of the hearing control group were not measured against cued responses of the group of deaf cuers. Additionally, transcription of the data did not depend on a transcriber's competence, or lack thereof, in the reception of cued or spoken information. Handwritten responses originating with the participants, and the deciphering thereof, sufficiently allowed for collection of data relevant to the study question while helping to limit possible error resulting from miscomprehension of the data.

9 Visual information might be used redundantly, or even confoundingly, by the hearing speakers of a given language. Regardless, for the hearing person, visual information is not required for receiving, processing, and comprehending spoken messages. Likewise, for the deaf participants, the articulatory and acoustic information produced by speech was not compulsory for receiving, processing, and comprehending cued messages.