I’m struggling to find a reasonable rubric for determining lexical neighbors in languages with contrastive segment length. The basic issue is whether substituting one geminate segment for another qualifies as a within-neighborhood change. For example, are osso and otto neighbors?
Quick background: neighbors are similar words, but the notion of “similarity” can be variously defined. A standard working definition of neighbor is a lexical item that differs from your target by one phoneme (or in reading comprehension, one letter), and “differ” means the addition, deletion, or substitution of a single segment. A measurable variable using neighbors is neighborhood density: the count of neighbors of a given lexical item.
In experimental phonetics/phonology, neighborhood density is associated with Luce, Pisoni, and Goldinger (1989, 1990, et seq, et coauthors, et various subsets thereof) as a correlate of latency in spoken word recognition. The concept also appears in the literature on written word recognition at least as early as Coltheart et al (1977). Because density is implicated in word recognition, it needs to be controlled for in phoneme discrimination tasks involving non-words (which have neighborhoods of real words in the subject’s language). At the risk of sweeping generalization, it is also presumably correlated directly with frequency and inversely with word length in many languages.
Consider an example from English. Here are the phonemic neighbors of scam; i.e. the existing English words which differ from scam by the addition, deletion, or substitution of a single phoneme: cam, sam, scamp, scram, spam, slam, swam, scum, skim, scheme, scab, scat, scan.
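To make the working definition concrete, here is a minimal Python sketch of a one-segment neighbor check. The tiny lexicon and its rough transcriptions are my own illustrative assumptions (one symbol per phoneme), not a real pronunciation dictionary, and the function names are hypothetical.

```python
def is_neighbor(a, b):
    """True if segment sequences a and b differ by exactly one
    addition, deletion, or substitution of a segment."""
    if a == b:
        return False
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        # Same length: exactly one position may differ (substitution).
        return sum(x != y for x, y in zip(a, b)) == 1
    # Lengths differ by one: deleting one segment from the longer
    # sequence must yield the shorter one (addition/deletion).
    if len(a) > len(b):
        a, b = b, a
    return any(a == b[:i] + b[i + 1:] for i in range(len(b)))


# Hypothetical toy lexicon: rough phonemic transcriptions, for illustration only.
LEXICON = {
    "scam": ("s", "k", "æ", "m"),
    "cam": ("k", "æ", "m"),
    "scamp": ("s", "k", "æ", "m", "p"),
    "scum": ("s", "k", "ʌ", "m"),
    "scan": ("s", "k", "æ", "n"),
    "stamp": ("s", "t", "æ", "m", "p"),  # two changes away from scam
    "skip": ("s", "k", "ɪ", "p"),        # two substitutions away
}


def neighbors(word, lexicon):
    """Alphabetical list of the words in lexicon that are neighbors of word."""
    target = lexicon[word]
    return sorted(w for w, segs in lexicon.items() if is_neighbor(target, segs))


print(neighbors("scam", LEXICON))
print("density:", len(neighbors("scam", LEXICON)))
```

Because the check operates on sequences of segments rather than on raw strings, the question of what counts as one segment is pushed into the transcription itself, which is exactly where the length problem below arises.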
Suppose then you are working with a language in which vowel length is contrastive, and you want to find the neighbors of a word like, say, paak. It shouldn’t be too controversial to allow paa, aak, taak, laak, maak, naak, paat, paap, and paan as neighbors. What, then, of manipulating the vowel? For example, which of pak, piik, and pik should also count as neighbors?
I guess the answer depends on your conception of length. Let’s assume that [a] in pak is a substitution for [aa]. This can only be the case if they are considered different kinds of segments, and [aa] is a single segment for the purposes of neighborliness. It would follow that [ii] is also a single segment, and thus [i] and [ii] can both serve as substituted segments, therefore pak, piik, and pik are all neighbors of paak.
If you instead consider pak to be a deleted neighbor of paak (like paa or aak), then you are relying on the assumption that aa is two segments. As a result, neither pik nor piik can be considered neighbors: both involve two segment changes rather than one (a deletion and a substitution in pik, and two substitutions in piik). Maybe it doesn’t matter, but if you are working in a 7-vowel system, this reduces the set of potential neighbors for any word by 12 for each long vowel in the word (the six other vowel qualities, short and long), and by 6 for each short vowel (which can still swap with the other short vowels, but no longer with their long counterparts).
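The two conceptions of length can be compared directly. The sketch below assumes a toy transcription in which a doubled letter marks a long segment, and a hypothetical 7-vowel inventory (a, e, i, o, u, ɛ, ɔ) chosen only for illustration. It counts how many of the 13 vowel variants of paak come out as neighbors under each segmentation.

```python
def segment(word, long_as_one):
    """Split a word into segments. A doubled letter is one long segment
    if long_as_one is True, and two identical segments otherwise."""
    if not long_as_one:
        return tuple(word)
    segs = []
    for ch in word:
        if segs and segs[-1] == ch:
            segs[-1] = ch + ch  # merge the pair into a single long segment
        else:
            segs.append(ch)
    return tuple(segs)


def is_neighbor(a, b):
    """True if a and b differ by one addition, deletion, or substitution."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        return sum(x != y for x, y in zip(a, b)) == 1
    if len(a) > len(b):
        a, b = b, a
    return any(a == b[:i] + b[i + 1:] for i in range(len(b)))


VOWELS = "aeiouɛɔ"  # assumed 7-vowel inventory, for illustration only

# Every p_k form with a short or long vowel, except the target itself.
candidates = [f"p{v}k" for v in VOWELS] + [f"p{v}{v}k" for v in VOWELS]
candidates.remove("paak")


def vowel_neighbors(target, long_as_one):
    t = segment(target, long_as_one)
    return [w for w in candidates if is_neighbor(t, segment(w, long_as_one))]


single = vowel_neighbors("paak", long_as_one=True)
double = vowel_neighbors("paak", long_as_one=False)
print(len(single), len(double), len(single) - len(double))
```

Under the single-segment reading, pak, pik, and piik all qualify. Under the two-segment reading only pak survives (as a deletion), which is where the difference of 12 for a long vowel comes from.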
I guess one way around this is to code two variables: a length-as-single-segment measure and a length-as-two-segments measure. Another way around it is to look for precedent, which I have started to do. There is some work on neighborhood density in languages with contrastive length, in particular Italian and Japanese. (I checked Arabic and Hebrew in LLBA but came up dry.)
The Japanese research (Kawakami, Masahiro 2002) suggests an inhibitory effect on lexical decision of kana-level neighborhood density (i.e., orthographic neighborhood), but not of phoneme-level neighborhood density. I can’t tell if the decision tasks were both reading tasks, but that would explain the different results across tasks. I also don’t know whether this research considers a change of segment length to be substitution or addition/deletion, as I have not been able to determine whether long segments appeared or were manipulated in any test items. Judging from the short discussion I found here, length was left out of the equation.
The Italian research (Barca et al 2002) is also focused on reading tasks and orthographic neighborliness. It would thus seem that geminate consonants (spelled with two letters) count as two units. So anno ‘year’ and inno ‘hymn’ are considered neighbors, since they differ by one letter, but osso ‘bone’ and otto ‘eight’ are not, since they differ by two.
If ss and tt are both treated as single segments, they could substitute for each other, and either could substitute for s or t. Ultimately this treatment would increase the neighborhood size of many items in the lexicon, but individual items could be affected in different ways. Perhaps what I’ll do is code my data both ways, rank lexical items in order of density, and see if the ordering changes drastically between the two codings.
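A rough sketch of that comparison, in Python, might look like the following. The word list is a small hand-picked set of Italian words, and I am using spelling as a stand-in for phonemic transcription, which is only an approximation. The function names and the tie-breaking rule (alphabetical) are my own assumptions.

```python
def segment(word, long_as_one):
    """Doubled letters are one long segment if long_as_one, else two."""
    if not long_as_one:
        return tuple(word)
    segs = []
    for ch in word:
        if segs and segs[-1] == ch:
            segs[-1] = ch + ch
        else:
            segs.append(ch)
    return tuple(segs)


def is_neighbor(a, b):
    """True if a and b differ by one addition, deletion, or substitution."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        return sum(x != y for x, y in zip(a, b)) == 1
    if len(a) > len(b):
        a, b = b, a
    return any(a == b[:i] + b[i + 1:] for i in range(len(b)))


# Toy word list (spelling used as a proxy for transcription; illustration only).
WORDS = ["osso", "otto", "esso", "asso", "atto", "oso", "anno", "inno", "nono"]


def neighborhoods(words, long_as_one):
    """Map each word to the set of its neighbors under one coding of length."""
    segs = {w: segment(w, long_as_one) for w in words}
    return {w: {v for v in words if is_neighbor(segs[w], segs[v])} for w in words}


def ranking(hoods):
    """Words ordered by descending density, ties broken alphabetically."""
    return sorted(hoods, key=lambda w: (-len(hoods[w]), w))


single = neighborhoods(WORDS, long_as_one=True)
double = neighborhoods(WORDS, long_as_one=False)

for w in WORDS:
    print(f"{w:5} single={len(single[w])} double={len(double[w])}")
print("single-segment ranking:", ranking(single))
print("two-segment ranking:   ", ranking(double))
```

On a real lexicon the two rankings could be compared with a rank correlation rather than by eye. With a list this small, the per-word counts are enough to show that some items lose more neighbors than others when length is coded as two segments.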