I’ve been reading a lot about the ‘psychological reality’ of phonotactic constraints lately. Something that’s puzzling me is the diversity of on-line repair strategies for constraint violations. In particular, for phonotactically illegal clusters, sometimes epenthesis is observed (e.g., Japanese listeners perceive ebzo as ebuzo; Dupoux and colleagues), and other times consonants are altered (e.g., /dl/->/gl/ for French speakers; Halle and colleagues).
My question is why such diversity is observed–that is, what triggers the different repair strategies. N.B. I’m not expecting a single strategy to be used; I will place my hand on the good book (P&S 1993) and swear an oath to “homogeneity of target, heterogeneity of process.”
In a recent paper, Kabak + Idsardi argued that perceptual epenthesis is driven by syllable structure constraints, not by consonantal contact. So, for a cluster C1C2, it is the ill-formedness of C1 in coda that drives epenthesis, not the contact of C1.C2. In fact, they argue (based on confusion data) that no epenthesis occurs when there’s simply a syllable contact violation (*C1.C2).
This accounts for their data as well as the Dupoux et al. results; however in many other cases a “contact” violation does trigger ‘epenthesis.’ In production, Lisa Davidson’s shown that onset clusters violating English phonotactic constraints are repaired not by changing the consonants but by altering the temporal relationship of the gestures (not true epenthesis, which is why I used the scare quotes above). Perhaps more directly comparable to K+I, Berent et al. have shown perceptual epenthesis (i.e., confusions between lbif and lebif).
And of course epenthesis is not the only way to fix clusters. Lots of other studies have shown perceptual confusions between featurally similar clusters (e.g., */dl/-/gl/)–e.g., Moreton in English.
Any thoughts on this? What’s driving the heterogeneity of processes?
My feeling based on my production studies is that there are a few factors involved. First, people likely do want to preserve information in production, so they choose epenthesis when they are certain of the components of the cluster. When they perceive both segments, they could choose deletion, but they rarely do. Second, if there’s a lot of competition from a very similar real cluster, that knowledge can interfere. For example, /zC/ sequences are not surprisingly repaired to /sC/ a lot, since it’s a very minimal change and the voicing of C2 already is pretty close to the C2 of an sC cluster. (Shameless self-promotion: see my website for my papers looking at both the perception and production of non-native sequences.)
I believe that production and perception are VERY different things. I think that with these non-native sequences task effects can force people into just about anything, so if you give them zgamo~zəgamo, they’ll tell you those are the same even if in a production task, they’d repair zgamo as [skamo]. Likewise, as in Elliott Moreton’s 2003 Cognition study, if you give them a bl~bw continuum, that’s what they have to work with in the task, and that’s what they’ll adapt to. It doesn’t necessarily mean they would repair [bw] as [br]. Maybe they’d make it [dw]. I guess these perception tasks tap into the fact that (a) perceptually, these non-native things are similar to several different possible legal sequences, especially in English, and (b) the phonological system is variable, and could lead to multiple types of repair depending on what ranking you have at the moment. As interesting as they are, I don’t necessarily think that the Kabak & Idsardi, Berent et al., and Dupoux et al. perceptual epenthesis studies should be taken as definitive indication of how one’s phonology deals with non-native clusters, since production studies show a much wider range of possibilities.
At Arto Anttila’s LSA workshop, I’ll be presenting a poster with data on English speakers’ production of a wider variety of word initial non-native sequences (stop-initial and fricative-initial) than I’ve looked at in the past. In addition, there’s both an audio-only input condition, and an audio+orthography input condition. Interestingly, the results essentially showed that overall there were slightly more errors in the audio-only condition, but that the types of errors and their relationship to one another were the same in both types of input condition. I take this to suggest that the perceptual input causes a slight decrement in performance, but the phonological mechanism underlying the variable production of non-native clusters is the same.
One point you bring up that I’d like to ask you about…I’m sure that some of the perception/production asymmetries come from the fact that in perceptual studies subjects often explicitly compare two alternatives. But what about transcription experiments, where the response format is relatively free? In these cases, you still see a multitude of repair patterns. For example, Halle et al. show that French speakers transcribe /dl/ as /gl/ (only 6% of responses are epenthesis), but Berent et al. (in footnote 5) show that illegal clusters tend to be repaired by epenthesis in transcriptions (only 5% of responses show just alteration of one consonant).
Well, the case of transcription is similar to speaking studies, I think—I’ve done both transcription and (speech) production of the same clusters, and it’s true that people don’t always repair the same cluster exactly the same way. This could have to do with small phonetic differences in individual stimuli (e.g. in [zgamo], the [z] was slightly longer and sounded more like an [s] than in [zgane]). But in my cases, the speaker (and language) of the stimuli is always the same, so small phonetic differences are more likely to be the cause.
In the bigger picture of Halle vs. Berent, well, there are a couple of issues. One is a task issue again: Was Halle only playing /dl/ and /gl/ tokens? Or were they mixed in with lots of fillers? (I don’t remember, and I don’t have the paper here.) If there were no fillers, you probably have the same task effect as for the perception studies. But on a broader level, I suspect that language background plays a huge role in repair strategies. Halle’s speakers were French, and Berent’s were English (or Spanish) speaking (and in her case the stimuli were the same for both groups). I’m not going to try to speculate on why French is different than Spanish or English in terms of repair strategies, but I do think that’s probably a large factor in these differences.
I certainly agree with Matt and Lisa’s points, especially about the nature of the tasks in the experiments, and the differences between production and perception. At least in the perception cases, if we’re looking for an overall framework to cast the results into, I think we have to look to probabilistic, Bayesian accounts, which bear some superficial resemblance to tableau calculations. So, in perception we’re looking for p(/X/|[Y]) relying in part on our knowledge about the forward model, p([Y]|/X/). So I don’t think that it’s coincidental that you get perceptual epenthesis of vowels in languages where vowels can be devoiced or deleted (i.e., where p(∅|V) > 0). As to our particular finding for Korean, the syllable account seems sufficient (and it’s very interesting, as in Moreton’s study, how well the subjects deal with SOME 0 frequency data). So one possibility is that under the particular time pressures of the experiment the subjects don’t (yet) do a full phonological parse, but just try to get a sequence of valid syllables, which could coincide with a relatively earlier stage of processing.
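To make the Bayesian framing a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration only: the probabilities are invented, and the names `prior`, `likelihood`, and `posterior` are hypothetical rather than taken from any of the studies mentioned. It computes p(/X/|[Y]) from a forward model p([Y]|/X/) and a phonotactic prior p(/X/), for a listener who hears [ebzo] and whose language disfavors the cluster.

```python
def posterior(prior, likelihood, heard):
    """Return p(/X/ | [heard]) for each candidate underlying form /X/."""
    # Unnormalized score for each candidate: p([Y] | /X/) * p(/X/)
    scores = {x: likelihood[x].get(heard, 0.0) * p for x, p in prior.items()}
    total = sum(scores.values())
    if total == 0:
        raise ValueError("no candidate can generate the heard form")
    return {x: s / total for x, s in scores.items()}

# Invented prior: the illegal cluster is very unlikely in this grammar.
prior = {"ebzo": 0.001, "ebuzo": 0.999}

# Invented forward model: /ebuzo/ sometimes surfaces with its vowel
# devoiced or deleted, so p([ebzo] | /ebuzo/) > 0.
likelihood = {
    "ebzo": {"ebzo": 1.0},
    "ebuzo": {"ebuzo": 0.9, "ebzo": 0.1},
}

post = posterior(prior, likelihood, "ebzo")
best = max(post, key=post.get)  # the parse with the highest posterior
```

With these made-up numbers the epenthesized parse wins even though the stimulus contained no vowel. If the forward model gave /ebuzo/ no chance of surfacing as [ebzo], the faithful parse would win instead, which is the sense in which vowel devoicing or deletion in the language matters.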
Thanks Lisa + Bill for helping me think about some of the issues involved. I think a big take-home point from this work is that it’s critical to think about the exact task situation when evaluating data. No behavior is a ‘pure’ reflection of any one cognitive process, be it ‘grammar’ or some purely perceptual or production process–behavior reflects a (task appropriate) integration of multiple sources of knowledge. Not that I know how that works…but I think a Bayesian perspective is a nice general framework to think about the problem.
It looks like I missed the main part of this discussion, but I have a bit of data to add. I’m currently working on consonant clusters in Spanish, and I found some interesting production patterns. I recorded speakers uttering (essentially illegal) word-medial sequences of two voiced stops, e.g. /tagdo/. There were a variety of strategies. Sometimes the two were both produced as stops, with variability in the release of the first; this is in contrast to legal sequences of voiceless stops, which were always produced with two clear bursts. More often one or the other voiced stop was spirantized, which is a frequent realization of Spanish voiced stops in other contexts, and would presumably alleviate some of the articulatory difficulty posed by sequences of voiced stops. Just as interesting is the repair strategy that was not used: subjects basically never inserted an epenthetic vowel / open transition between the two segments. This is especially surprising because such a transition is a frequent (actually invariant in my recordings) repair for C-tap (Cr) clusters in Spanish. This raises the question of why some repair strategies are OK (or even mandatory) for some sequences in a language (voiced stop – tap), but not for very similar sequences in the same language (voiced stop – voiced stop). This is interesting to me because the voiced stop sequences don’t seem to be repaired by analogy to other cluster repairs (i.e., intrusive schwa), nor by changing the segments to acceptable sequences (i.e., devoicing). I’m leaning toward the view that the ‘intrusive schwa’ in Cr sequences is not a repair associated with a tap, but is actually part of a single-cycle trill. This would explain why intrusive schwa is not generally available as a repair in the language, but it still leaves the question of where the attested repairs (non-release, spirantization of C1 and/or C2) come from.
Matt asks why Berent et al.’s listeners repaired illegal onset clusters with epenthesis ([lbIf] –> /l@bIf/), while Halle et al.’s changed the place feature ([dlapto] –> /glapto/). Lisa suggests that it may have to do with the response set available to the listeners, and Lisa and Bill both propose that the listeners’ native-language repair strategies (or faithfulness rankings, if you prefer) would also have an influence.
I would like to add that the kind of violation being repaired may matter too (Kabak & Idsardi 2003, 2005 make this point for medial clusters). The Halle et al. stimuli were un-French because of onset place co-occurrence restrictions; they could be fixed by changing the place of one of the consonants. The Berent et al. stimuli were un-English because of sonority restrictions. Changing consonant place won’t fix that problem; you have to insert something, delete something, or change consonant manner. (Sounds like McCarthy’s Diola Fogny problem set!)
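The point about violation type can be sketched with a toy grammar in Python. This is only an illustration under invented assumptions: the four-consonant inventory, the sonority values, the single banned place pair, and the names `legal_onset` and `place_repairs` are all hypothetical and do not describe French or English. It checks which two-consonant onsets become legal when the place of exactly one consonant is changed.

```python
# Toy grammar: every value below is an illustrative assumption.
SONORITY = {"b": 1, "d": 1, "g": 1, "l": 3}  # stops < liquids
# Consonants with the same manner but a different place (none for /l/ here).
PLACE_ALTERNANTS = {"b": "dg", "d": "bg", "g": "bd", "l": ""}
BANNED_PLACE_PAIRS = {("d", "l")}  # coronal stop + lateral

def legal_onset(c1, c2):
    """Legal in this toy grammar if sonority rises and the pair is not place-banned."""
    return SONORITY[c1] < SONORITY[c2] and (c1, c2) not in BANNED_PLACE_PAIRS

def place_repairs(c1, c2):
    """Return legal onsets reachable by changing the place of exactly one consonant."""
    candidates = [(a, c2) for a in PLACE_ALTERNANTS[c1]]
    candidates += [(c1, b) for b in PLACE_ALTERNANTS[c2]]
    return [a + b for a, b in candidates if legal_onset(a, b)]

dl_fixes = place_repairs("d", "l")  # violates the place co-occurrence ban
lb_fixes = place_repairs("l", "b")  # violates sonority sequencing
```

In this toy setup `dl_fixes` contains place-changed onsets such as "gl", while `lb_fixes` is empty, so a sonority violation has to be handled by insertion, deletion, or a manner change. The sketch says nothing about which of the available fixes listeners actually prefer.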
This hypothesis can explain why place didn’t change in the Berent et al. experiments, but it doesn’t explain why epenthesis wasn’t used by Halle et al.’s listeners. The perceptual parse has to compromise between sounding like the stimulus and sounding like English (French, Spanish, …), and maybe a change of place just sounds less different than an epenthesis, even if your language allows all of the possibilities as surface forms. This predicts that (1) speakers of Polish or Russian should find it easier to distinguish [dli] vs. [d@li] and [gli] vs. [g@li] than [dli] vs. [gli] and [d@li] vs. [g@li], and (2) if you did a written version of the Halle et al. experiment (French speakers; flash a string on the screen; participants write it down) you might get different results from what Halle et al. got — people might read DLAPTO as DELAPTO rather than as GLAPTO.