NOTE: I’ve edited this post less than the last one, so it may be harder to read.
Do articulatory constraints play a role in speech errors? (Slis & Van Lieshout) — Past research (by Goldstein and others) has shown that vowel context influences whether or not speech errors occur. Using EMA data, the authors showed that there is tongue tip movement during production of /k/ and tongue dorsum movement during production of /t/, at least in words that contain both of these consonants. Let’s call these non-matching articulations. They then looked at non-matching articulations in a variety of vowel contexts for English speakers. The amount of movement varied by vowel. The follow-up question is whether this variation is sometimes aberrant, and whether (as we expect based on the past research by Goldstein and others) these aberrant articulations occur more often with some vowels than with others. For example, sometimes the amount of non-matching articulation is much greater than a standard deviation above typical non-matching articulation. The basic idea is to build a system of automatic speech error recognition that is based on the kinematics and consistent with past research showing errors are conditioned by vowels. This next step is currently ongoing and should be completed shortly.
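To make the flagging idea concrete, here is a minimal sketch of what a standard-deviation criterion could look like. This is entirely my own illustration, not the authors’ system: the function name, the split into a “typical” baseline set versus tokens to be tested, and the threshold `k` are all assumptions.

```python
import numpy as np

def flag_aberrant(baseline, tokens, k=1.0):
    """Flag tokens whose non-matching articulation amount exceeds
    the typical (baseline) mean by more than k standard deviations.
    A hypothetical stand-in for the authors' criterion."""
    mu = np.mean(baseline)
    sd = np.std(baseline, ddof=1)  # sample SD of typical tokens
    return [bool(t > mu + k * sd) for t in tokens]

# e.g., typical movement amounts near 1.0; a token at 2.0 is flagged
flag_aberrant([1.0, 1.1, 0.9, 1.0], [1.05, 2.0])  # [False, True]
```

A real system would presumably compute the baseline separately per vowel context, since the amount of non-matching movement varied by vowel.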
(I missed the title) Poster on automatic recognition of consonants from ultrasound data. Jeff Berry, Diana Archangeli, Ian Fasel, maybe others (I didn’t write down all the authors). — The poster looks at three methods for classifying ultrasound data (2D tongue images, usually in the sagittal plane, so you see from the tongue root to the tongue tip). Models were tested on their ability to classify ultrasound images of the consonants /p t k l r/. The first method matched the ultrasound images to wavelets (how much does the tongue match specific line types) and then used support vector machines to classify them. The second method applied eigentongues (akin to Principal Components Analysis) to generate features from the images and used those features to classify tongue images. The third method used a Deep Value Network (a neural net that allows for probabilistic classification of the data into categories). The best classification results came from the support vector machine, but the authors note that the analyses are preliminary, so they will continue to look at other approaches to automatic classification. My input: Note that, as in previous work by Jeff Mielke & Ying Lin (in JASA), ultrasound is primarily being used to classify consonants based on place of articulation.
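For readers unfamiliar with the eigentongue idea, here is a toy sketch of the second method’s pipeline: flatten each ultrasound frame to a pixel vector, extract PCA (“eigentongue”) features, then classify in the reduced space. This is my own illustration with made-up 8-pixel “frames,” and I substitute a simple nearest-centroid classifier where the poster actually used SVMs and other models; all function names are hypothetical.

```python
import numpy as np

def fit_eigentongues(X, n_components=2):
    """PCA on flattened frames: returns the mean image and the top
    principal directions ('eigentongues') as rows."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def features(X, mean, comps):
    """Project frames onto the eigentongues."""
    return (np.asarray(X, dtype=float) - mean) @ comps.T

def fit_centroids(feats, labels):
    """Mean feature vector per consonant label."""
    labels = np.array(labels)
    return {c: feats[labels == c].mean(axis=0) for c in set(labels)}

def classify(feat, centroids):
    """Nearest centroid in eigentongue space (stand-in for the SVM)."""
    return min(centroids, key=lambda c: np.linalg.norm(feat - centroids[c]))

# Toy data: 'tip raised' frames (/t/-like) vs. 'dorsum raised' (/k/-like)
train = [[1.0, 1.0, 0, 0, 0, 0, 0.0, 0.0],
         [0.9, 1.1, 0, 0, 0, 0, 0.0, 0.0],
         [0.0, 0.0, 0, 0, 0, 0, 1.0, 1.0],
         [0.0, 0.0, 0, 0, 0, 0, 1.1, 0.9]]
labels = ['t', 't', 'k', 'k']

mean, comps = fit_eigentongues(train)
cents = fit_centroids(features(train, mean, comps), labels)
test_feat = features([[1.0, 0.9, 0, 0, 0, 0, 0.1, 0.0]], mean, comps)[0]
classify(test_feat, cents)  # 't'
```

Real ultrasound frames are of course far larger (tens of thousands of pixels), which is exactly why a dimensionality-reduction step like eigentongues is attractive before classification.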
Shadowing, phonetic imitation, and the role of articulatory gestures in speech perception (Mitterer & Musseler) — The authors present three experiments in which they argue against direct perception of phonetic gestures, a theory usually associated with Carol Fowler. In experiment 1, German participants imitated words ending in –ig (sorry, no gloss). The phonetic realization of this suffix varied between [ik] and [ic] (c is a palatal fricative). Direct perception of gestures would predict slowing based on a mismatch between what participants hear and what they say. However, the authors found no evidence of a reaction time (RT) cost for mismatch. This replicates earlier work the authors did on Dutch /r/, where the variation was between the alveolar and uvular trills. The second experiment was on imitation of pitch and speaking rate variation. Participants saw a matrix of four pictures and had to either answer a question about one of the pictures or repeat the last word of the question (for example, one picture was of a red sweater; if participants heard “What color is the sweater?”, they either had to answer “red” or repeat “sweater”). Direct perception predicts faster RTs for repetition of the last word, but the opposite was found. Also, the questions varied in speech rate (some questions were asked in rapid speech, some in slower speech). Participants were faster to answer when the question was asked in rapid speech. Thus there is evidence of imitation of abstract phonetic qualities, and evidence of pragmatic facilitation (because it’s much more sensible for the instructions to be to answer the question than to repeat the last word), but there is no evidence of direct perception. The third experiment looked at whether joint action eliminates the repetition advantage. This experiment was similar to the second, but on some trials, participants saw an X appear in the middle of the four-picture matrix.
When that happened, a word appeared immediately and participants had to ignore the other instructions and just repeat that word. I’ll call this the repeat condition. The word that appeared either matched the original instructions or did not. And, of course, those instructions could have been to answer the question or to repeat the last word of the question. For the repeat condition, the finding was clear facilitation in saying the word after the X when the instructions were to repeat the last word and the word matched that last word, compared to when the word-to-be-repeated was the answer to the question from the original instructions. In contrast, there was a nonsignificant trend toward faster production of the word-to-be-repeated when the instructions were to answer the question and the word-to-be-repeated was that answer. My input: this was a lot of data in one poster, although all of the results are very interesting. I think, though, that Fowler would probably expect to get all of these effects because of the task demands, and she would argue they do not bear on the question of direct perception. If you’re confused by my explanation, it might be worth Googling Mitterer or Musseler (the u is an umlaut) to see if they have more information or a manuscript related to these experiments.