neighborhood update

A few weeks ago I posted some thoughts on neighborhoods for languages with contrastive segment length. The issue is that the number of neighbors you calculate for a given item presumably differs depending on how you conceive of a geminate: is it a pair of segments, or a single segment? I had thought that the geminate-as-single-segment approach would generally yield a higher neighbor count, and a trial on an artificial lexicon gives preliminary support for that.

The clarifying example I used is from Italian: if geminates are considered two segments for neighborhood purposes, then occo and osso are not neighbors, but inna and anno are. But if geminates are considered one segment, occo and osso are neighbors.

The research I have come across so far treats geminates as two segments, mainly because it deals with reading comprehension rather than spoken language. But since I’m a phonologist rather than an orthographer, I decided to challenge myself to work on the one-segment idea. Getting neighbors by hand is not a task anyone should have to do, so the quick way is to write a script that does it for you. It turns out the implementation is fairly clunky: the two-segment procedure is far simpler than the one-segment procedure. It took a lot of trial and error, but I now have a script that treats geminates as single segments.
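For readers who want to experiment, here is a minimal Python sketch of how the two approaches could be set up. It is not the script described above. The function names (`segments`, `is_neighbor`, `neighbors`) are made up for illustration, and it rests on two assumptions: a geminate is any run of identical adjacent letters, and two words are neighbors when they differ by exactly one substitution, deletion, or addition of a segment.

```python
from itertools import groupby


def segments(word, geminates_as_one=False):
    """Split a word into segments (assumed: one letter = one short segment)."""
    if not geminates_as_one:
        return tuple(word)
    # Collapse each run of identical letters into one long segment,
    # e.g. "feet" -> ("f", "ee", "t").
    return tuple("".join(run) for _, run in groupby(word))


def is_neighbor(a, b):
    """True if segment tuples a and b differ by one substitution, deletion, or addition."""
    if a == b:
        return False
    if len(a) == len(b):
        # Substitution: exactly one position differs.
        return sum(x != y for x, y in zip(a, b)) == 1
    if abs(len(a) - len(b)) != 1:
        return False
    # Deletion/addition: removing one segment from the longer form gives the shorter.
    shorter, longer = sorted((a, b), key=len)
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))


def neighbors(word, lexicon, geminates_as_one=False):
    """Return the items of lexicon that are neighbors of word."""
    target = segments(word, geminates_as_one)
    return [w for w in lexicon
            if is_neighbor(target, segments(w, geminates_as_one))]


print(neighbors("feet", ["fet", "foot"]))                         # ['fet']
print(neighbors("feet", ["fet", "foot"], geminates_as_one=True))  # ['fet', 'foot']
```

Under the one-segment setting a long vowel or consonant counts as a different segment from its short counterpart, so swapping `ee` for `oo` or for `e` is a single substitution, while the two-segment setting sees `feet` and `foot` as two substitutions apart.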

My original intuition seems to be confirmed: most words have more neighbors when you allow geminate substitutions. I made a small artificial lexicon for this, designed to contain words that test the accuracy of the comparison procedure. With a little tweaking, the script outputs the number of neighbors for each item, as well as what the neighbors actually are.

In the artificial lexicon, the average word has 0.87 more neighbors when geminates are considered single segments. That average is brought down by pairs that are neighbors only under the two-segment approach, such as pont and poot. The average absolute difference is 1.27.
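The two averages can be re-derived from the neighbor counts in the listings below. This is just a quick arithmetic check in Python, with the counts copied by hand in the alphabetical order of the lexicon; the variable names are my own.

```python
# Neighbor counts per word, in lexicon order (akk, asdfl, feet, ..., put).
two_seg = [0, 0, 1, 1, 2, 2, 2, 4, 3, 0, 1, 3, 1, 2, 1,
           0, 6, 1, 3, 4, 2, 3, 2, 1, 3, 2, 3, 3, 2, 4]
one_seg = [0, 0, 2, 2, 2, 4, 1, 3, 2, 1, 2, 6, 3, 3, 3,
           3, 7, 1, 3, 6, 1, 5, 2, 2, 8, 1, 2, 5, 2, 6]

# Per-word difference: one-segment count minus two-segment count.
diffs = [o - t for o, t in zip(one_seg, two_seg)]

mean_diff = sum(diffs) / len(diffs)                      # 26 / 30
mean_abs_diff = sum(abs(d) for d in diffs) / len(diffs)  # 38 / 30

print(round(mean_diff, 2), round(mean_abs_diff, 2))  # 0.87 1.27
```

The signed mean is smaller than the absolute mean because a handful of words (impil, ipil, ippil, pont, ppat, ppil) lose a neighbor under the one-segment approach.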

For the curious, I provide the lexicon below, along with neighbor data under both approaches.

Neighbors, considering geminates as two segments:

      akk 0 []
      asdfl 0 []
      feet 1 [fet]
      fet 1 [feet]
      fook 2 [foot, pook]
      foot 2 [poot, fook]
      impil 2 [ipil, ippil]
      ipil 4 [impil, pil, ppil, ippil]
      ippil 3 [impil, ipil, ppil]
      oopoot 0 []
      paak 1 [paat]
      paat 3 [paak, ppat, pat]
      paloota 1 [palota]
      palota 2 [paloota, palta]
      palta 1 [palota]
      paluuta 0 []
      pat 6 [pit, ppat, put, pt, pato, paat]
      pato 1 [pat]
      pil 3 [ipil, pit, ppil]
      pit 4 [pil, put, pt, pat]
      pont 2 [punt, poot]
      pook 3 [pooko, poot, fook]
      pooko 2 [pook, pookoo]
      pookoo 1 [pooko]
      poot 3 [foot, pook, pont]
      ppat 2 [pat, paat]
      ppil 3 [pil, ipil, ippil]
      pt 3 [pit, put, pat]
      punt 2 [pont, put]
      put 4 [pit, pt, pat, punt]

Neighbors, considering geminates as single segments:

      akk 0 []
      asdfl 0 []
      feet 2 [foot, fet]
      fet 2 [foot, feet]
      fook 2 [foot, pook]
      foot 4 [feet, poot, fook, fet]
      impil 1 [ipil]
      ipil 3 [impil, pil, ippil]
      ippil 2 [ipil, ppil]
      oopoot 1 [poot]
      paak 2 [pook, paat]
      paat 6 [paak, pit, put, pt, pat, poot]
      paloota 3 [palota, palta, paluuta]
      palota 3 [paloota, palta, paluuta]
      palta 3 [paloota, palota, paluuta]
      paluuta 3 [paloota, palota, palta]
      pat 7 [pit, ppat, put, pt, poot, pato, paat]
      pato 1 [pat]
      pil 3 [ipil, pit, ppil]
      pit 6 [pil, put, pt, pat, poot, paat]
      pont 1 [punt]
      pook 5 [pooko, paak, pookoo, poot, fook]
      pooko 2 [pook, pookoo]
      pookoo 2 [pooko, pook]
      poot 8 [foot, pook, oopoot, pit, put, pt, pat, paat]
      ppat 1 [pat]
      ppil 2 [pil, ippil]
      pt 5 [pit, put, pat, poot, paat]
      punt 2 [pont, put]
      put 6 [pit, pt, pat, punt, poot, paat]
