Natural Fast And Time Compressed Speech English Language Essay

Published: November 21, 2015 Words: 1094

Human speech perception is flexible: listeners adapt quickly whenever the speaker, the speech rate, or the listening conditions change. There is evidence for rapid adaptation to time-compressed speech, noise-vocoded speech, foreign-accented speech, and synthetic speech. Within a conversation, speakers often vary their speech rate considerably. These variations lead to changes in coarticulation and assimilation, deletion of segments, shortened vowel durations, and reduction of unstressed vowels. As a consequence, listeners must rely on a normalization process that involves short-term automatic compensation. To process speech effectively, listeners must be able to adjust to changes in speech rate extremely rapidly. When variations in speech rate are small, as in natural speech, listeners accomplish this without apparent effort. For extremely fast rates, however, adaptation becomes considerably more difficult. Adaptation to fast rates is usually measured with artificially time-compressed speech, that is, speech whose duration has been artificially shortened without affecting the fundamental frequency of the signal.
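The idea of shortening a signal without changing its fundamental frequency can be illustrated with a minimal sketch, assuming plain overlap-add (OLA) rather than the algorithm of any particular tool. Short windowed frames are read from the input at a larger hop than the one at which they are written to the output, so the result is shorter while the waveform inside each frame, and hence its local pitch, stays as it was. The function names, the frame length, and the hop size are illustrative assumptions, and plain OLA produces audible artifacts that pitch-synchronous methods are designed to avoid.

```python
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def ola_time_compress(signal, ratio, frame_len=400, synth_hop=100):
    """Shorten `signal` to roughly `ratio` times its duration with plain OLA.

    ratio is target duration / original duration, e.g. 0.5 for half.
    frame_len and synth_hop are illustrative defaults (assumptions).
    """
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")
    # Frames are read further apart than they are written.
    analysis_hop = synth_hop / ratio
    window = hann(frame_len)
    n_frames = max(1, int((len(signal) - frame_len) / analysis_hop) + 1)
    out_len = (n_frames - 1) * synth_hop + frame_len
    out = [0.0] * out_len
    norm = [0.0] * out_len
    for k in range(n_frames):
        read_pos = int(round(k * analysis_hop))
        write_pos = k * synth_hop
        for i in range(frame_len):
            if read_pos + i >= len(signal):
                break
            out[write_pos + i] += signal[read_pos + i] * window[i]
            norm[write_pos + i] += window[i]
    # Divide by the summed window so overlapping frames do not change the level.
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]


# Illustrative input: one second of a 120 Hz tone sampled at 16 kHz.
tone = [math.sin(2 * math.pi * 120 * t / 16000) for t in range(16000)]
compressed = ola_time_compress(tone, ratio=0.5)
print(len(tone), len(compressed))
```

With a ratio of 0.5 the output is about half as long as the input; a ratio of 0.38 would correspond to compression to 38% of the original duration.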

Listeners can adapt to sentences compressed to as little as 38% of their original duration within 10-20 sentences (Dupoux & Green, 1997). Natural fast speech is more difficult to process than speech that is artificially time-compressed to the same rate (Janse, 2004), because it differs from normal-rate speech in both spectral and temporal characteristics (Koreman, 2006; Wouters & Macon, 2002). Peelle & Wingfield (2005) demonstrated that perceptual learning is comparable in young and older adults, but that maintenance and transfer of this learning decline with age. Time-compressed speech has a temporal and a segmental processing advantage over naturally produced fast speech (Janse, 2003). Studies suggest that adaptation to one type of fast speech facilitates adaptation to, and/or general performance on, the other type (Adank & Janse, 2009). Recent studies have focused on adaptation in quiet, whereas adaptation in noise, particularly when the noise is a competing voice and thus reflects more realistic listening situations, has received less attention. The persistence of intelligibility despite great changes in background noise, talker's voice, speech rate, dialect, etc. has to be part of any reasonable account of speech perception.

NEED FOR THE STUDY

Previous research has shown that when listeners hear artificially speeded speech, their performance improves over the course of 10-15 sentences, as if their perceptual system were adapting to these fast rates of speech. An extension of the Adank & Janse (2009) study is needed to examine adaptation to natural fast speech before and after exposure to time-compressed speech in the presence of multitalker babble. Since few studies have examined adaptation to time-compressed speech in Indian languages, the present study focuses on listeners' ability to adapt to natural fast speech before and after time-compressed speech in the presence of background noise, and on the relation between adaptation to natural fast speech and adaptation to time-compressed speech.

METHOD

Twenty-four participants (8 males and 16 females; age range 20 to 30 years; mean age 23.6 years) took part in the study. All were native speakers of Kannada with no history of speech, language, or hearing impairment, or of neurological or psychiatric disease. Kannada QuickSin sentences were recorded in three modes by a male Kannada speaker using PRAAT software, version 5.1.22 (Weenink & Boersma, 1992). Four-talker babble at 0 dB SNR was added to all the recorded sentences using Matlab code (Narne, 2007). To measure adaptation, a total of 75 sentences, 25 in each of the three speech modes, were presented in one of two orders: 1) normal speech, natural fast speech, time-compressed speech; and 2) normal speech, time-compressed speech, natural fast speech. Stimulus presentation and reaction time measurement were done using DMDX software (version 4.0.3.0). Participants' repetitions of the words were recorded on a Sony Vaio laptop computer with a Frontech external microphone in order to calculate reaction time and word frequency accuracy.
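Adding babble at a fixed signal-to-noise ratio amounts to scaling the noise so that the ratio of the speech level to the noise level matches the target. The sketch below is not the Matlab code used in the study; it is a minimal Python illustration that assumes RMS level as the measure of intensity, and the function names and the synthetic signals are hypothetical.

```python
import math
import random


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))


def mix_at_snr(speech, babble, target_snr_db):
    """Scale `babble` so that speech-to-babble RMS ratio equals target_snr_db.

    Returns the mixture and the scaled babble (both as long as `speech`).
    """
    if len(babble) < len(speech):
        raise ValueError("babble must be at least as long as the speech")
    babble = babble[:len(speech)]
    # SNR in dB is 20 * log10(rms_speech / rms_noise); solve for the noise gain.
    gain = rms(speech) / (rms(babble) * 10 ** (target_snr_db / 20))
    scaled = [gain * b for b in babble]
    mixture = [s + b for s, b in zip(speech, scaled)]
    return mixture, scaled


def measured_snr_db(speech, noise):
    """SNR in dB computed from RMS levels."""
    return 20 * math.log10(rms(speech) / rms(noise))


# Synthetic stand-ins for a recorded sentence and a babble track (assumptions).
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 150 * t / 16000) for t in range(16000)]
babble = [0.2 * rng.uniform(-1.0, 1.0) for _ in range(20000)]

mixture, scaled_babble = mix_at_snr(speech, babble, target_snr_db=0.0)
print(round(measured_snr_db(speech, scaled_babble), 6))
```

At 0 dB SNR the babble has the same RMS level as the speech, which is a demanding listening condition.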

RESULTS

To measure accuracy, percent-correct scores for sentences were obtained for all three conditions. A repeated-measures ANOVA was used to assess the significance of differences among the three conditions. Scores in the normal speech mode were generally better than in the other two conditions in both experimental designs. Analysis of the speech recognition data revealed that the assumption of sphericity was violated, so the Greenhouse-Geisser correction was used to interpret the results. The results indicate a significant main effect of condition in both the first [F(1.74, 19.14) = 295.21, p < 0.001] and the second experimental design [F(1.76, 19.36) = 65.58, p < 0.001]. Each condition was then compared with the other two using pairwise comparisons with Bonferroni adjustment. The post hoc tests showed statistically significant differences (p < 0.05) between the normal speech mode and both the natural fast and time-compressed modes in both experimental designs.
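The Greenhouse-Geisser correction multiplies both degrees of freedom of the repeated-measures F test by an estimate, epsilon, of how far the data depart from sphericity. The sketch below is a minimal illustration using the textbook formulas, not the statistical software used in the study; the function name is hypothetical and the scores are invented for illustration only. The reported degrees of freedom are consistent with this scheme: 1.74 and 19.14 are what 2 and 22 become when multiplied by an epsilon of 0.87, which would correspond to three conditions and 12 listeners per presentation order (an inference, not something the text states).

```python
def rm_anova_gg(scores):
    """One-way repeated-measures ANOVA with Greenhouse-Geisser epsilon.

    scores[i][j] is the score of listener i in condition j.
    Returns (F, epsilon, corrected df for condition, corrected df for error).
    """
    n, k = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n * k)
    cond_means = [sum(row[j] for row in scores) / n for j in range(k)]
    subj_means = [sum(row) / k for row in scores]

    # Partition the total sum of squares into condition, subject and error.
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_cond - ss_subj

    df_cond, df_error = k - 1, (k - 1) * (n - 1)
    f_value = (ss_cond / df_cond) / (ss_error / df_error)

    # Sample covariance matrix of the k conditions.
    cov = [[sum((row[a] - cond_means[a]) * (row[b] - cond_means[b])
                for row in scores) / (n - 1)
            for b in range(k)] for a in range(k)]
    # Double-center the covariance matrix (it is symmetric, so row means
    # equal column means).
    row_means = [sum(r) / k for r in cov]
    cov_mean = sum(row_means) / k
    centered = [[cov[a][b] - row_means[a] - row_means[b] + cov_mean
                 for b in range(k)] for a in range(k)]
    trace = sum(centered[a][a] for a in range(k))
    sum_sq = sum(v * v for r in centered for v in r)
    epsilon = trace ** 2 / ((k - 1) * sum_sq)

    return f_value, epsilon, epsilon * df_cond, epsilon * df_error


# Invented percent-correct scores: 4 listeners x 3 conditions
# (normal, natural fast, time-compressed). Not data from the study.
example = [
    [96, 60, 72],
    [92, 52, 70],
    [100, 64, 68],
    [88, 56, 76],
]
f_value, epsilon, df1, df2 = rm_anova_gg(example)
print(f"F({df1:.2f}, {df2:.2f}) = {f_value:.2f}, epsilon = {epsilon:.2f}")
```

Epsilon lies between 1/(k - 1) and 1 for k conditions; a value of 1 means no correction is applied, and smaller values make the test more conservative.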

When the compression condition preceded the natural condition, there was a significant difference; however, improvement was observed in the subsequent fast condition. When the fast-rate condition preceded the natural condition, deterioration was observed, and no improvement in the compression condition was noted. Overall, a significant difference (t = 5.67, p < 0.001) was found between responses to the fast-rate condition before and after presentation of the time-compressed mode.

DISCUSSION

The results highlight several points. The study shows that listeners can adapt to natural fast speech. Natural fast speech involves greater variation in both spectral and temporal aspects than artificially time-compressed speech. In addition, the present study included speech babble at 0 dB SNR, which made adaptation to natural fast speech more difficult for the listeners. The study confirms previous research by Janse (2004) and Adank (2009). The scores obtained for natural fast speech in the present study were lower than those reported by Adank (2009), which can be attributed to the introduction of speech babble at 0 dB SNR.

The study clearly showed a relation between adaptation to natural fast speech and adaptation to time-compressed speech. Accuracy scores were better for participants who were presented with time-compressed speech before natural fast speech (group 2: normal, time-compressed, natural fast). Listeners in this group had already adapted to variation in temporal aspects, and this adaptation led to better scores in group 2 than in group 1, in which the sentences that were both temporally and spectrally altered (natural fast speech) were presented before the time-compressed sentences.

CONCLUSION

The results are relevant because they confirm previous research. The study shows how listeners process a naturalistic distortion of the speech signal, which contributes to understanding of the general human ability to comprehend speech. The results also demonstrate the ability of the speech perception system to adapt to variations in the speech signal. Older adults generally have more trouble with time-compressed speech, and perhaps also with natural fast speech.