I am looking for input (data) on tautosyllabic consonant clusters. Suppose that a syllable begins with two adjacent consonants, followed by a vowel: CCV. Technically this is called an initial demisyllable. I am aware of two competing claims/proposals about what kinds of consonants are cross-linguistically unmarked or preferred in this type of situation, both based on the notion of relative sonority. For the sake of simplicity, let us assume a common five-way sonority scale, from least to most sonorous: obstruents (O) < nasals (N) < liquids (L) < glides (G) < vowels (V).
(1) One approach posits that specific languages can place a minimum sonority distance requirement on onset clusters, such that Spanish, for example, allows OL but not *ON or *NL, since obstruents differ from liquids by two steps on the sonority scale, whereas obstruents plus nasals or nasals plus liquids differ by only one sonority rank. One implication of this is that there could or should exist languages in which the only permissible onset clusters consist of an obstruent followed by a glide, such as /py/, /kw/, etc., whereas OL onsets, such as /pl/ or /tr/, are not attested. Works such as Steriade (1982) and Selkirk (1984) are examples of this general theory.
(2) A different approach is the Sonority Dispersion Principle proposed by Clements (1990). In this theory the three segments in an initial CCV demisyllable prefer to be evenly spaced apart in terms of relative sonority. This leads to the claim that OL (obstruent + liquid) syllable-initial clusters are universally preferred over OG (obstruent + glide). One implication of this is that there could or should exist languages in which the only permissible onset clusters consist of an obstruent followed by a liquid, such as /pl/ or /tr/, whereas *OG onsets, such as /py/, /kw/, etc., systematically do not occur.
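To make the two predictions concrete, here is a minimal Python sketch. It assumes the five-way scale with integer ranks O=0, N=1, L=2, G=3, V=4, and it assumes that dispersion is measured as the sum of inverse squared sonority distances over all pairs of segments in the demisyllable (lower values meaning more evenly spaced), which is how I understand Clements' measure. The function names are my own and purely illustrative.

```python
from itertools import combinations

# Assumed integer ranks on the five-way sonority scale.
SONORITY = {"O": 0, "N": 1, "L": 2, "G": 3, "V": 4}


def sonority_distance(c1, c2):
    """Rise in sonority from the first consonant to the second."""
    return SONORITY[c2] - SONORITY[c1]


def allowed_onsets(min_distance):
    """Two-consonant onsets permitted under a minimum sonority distance."""
    classes = ["O", "N", "L", "G"]
    return [a + b for a in classes for b in classes
            if sonority_distance(a, b) >= min_distance]


def dispersion(demisyllable):
    """Sum of 1/d^2 over all segment pairs (assumed measure; lower = better).

    Undefined when two segments share a sonority rank.
    """
    ranks = [SONORITY[s] for s in demisyllable]
    if len(set(ranks)) != len(ranks):
        raise ValueError("dispersion is undefined for sonority plateaus")
    return sum(1 / (a - b) ** 2 for a, b in combinations(ranks, 2))


print(allowed_onsets(2))   # minimum distance of two steps
print(allowed_onsets(3))   # minimum distance of three steps
print(dispersion("OLV"), dispersion("OGV"))
```

Under these assumptions, a minimum distance of three steps admits only OG, while the dispersion measure gives OLV a lower (better) value than OGV, so the two approaches diverge on exactly the OL versus OG comparison.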
I am preparing to carry out a major cross-linguistic study in which I test the claims of these two competing approaches on a robust sample of languages, preferably a set of languages which is genetically and geographically balanced. Evidence for or against these two theories could potentially come from different areas of the phonology:
(1) inventory of attested syllable patterns
(2) relative frequency of different types of syllable patterns
(3) child language acquisition data
(4) dynamic morphophonemic alternations
The last of these, for example, would hypothetically involve an underlying combination of morphemes that might otherwise be expected to surface as OL (obstruent + liquid) but is instead realized phonetically as OG (obstruent + glide), or vice versa. To illustrate: /pla/ > [pwa] or /kwa/ > [kra].
The general research question I am trying to answer is this: which type of initial cluster, OL or OG, is truly unmarked in the languages of the world? My general impression at this point is that the answer is mixed, with some languages showing a preference for OL and others indicating that OG is the default. I think there are also languages in which the two types of clusters are more or less equally preferred.
What I am looking for is hard empirical and statistical evidence bearing on this question, from individual languages or, better yet, from many languages. I would especially like to know whether any published surveys or typological databases already exist which address this issue, or which would allow me to perform searches to answer these questions. In addition, I would be happy to hear about electronic dictionaries and/or text corpora in relevant languages which would lend themselves to easily counting unique words (types) or tokens of forms containing such clusters (OL and/or OG).
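As an illustration of the kind of count I have in mind, here is a rough Python sketch that tallies types and tokens of word-initial OL and OG clusters in a list of transcribed words. The segment classes are a toy inventory that I made up for the example, each segment is assumed to be written with a single character, and only word-initial CCV is examined, since word-internal clusters would require a syllabification step that is not shown. A real study would substitute a language-specific inventory.

```python
# Hypothetical one-character-per-segment classes; replace per language.
OBSTRUENTS = set("ptkbdgfsz")
LIQUIDS = set("lr")
GLIDES = set("wy")
VOWELS = set("aeiou")


def classify_onset(word):
    """Return 'OL' or 'OG' for a word-initial CCV shape, else None."""
    if len(word) < 3 or word[0] not in OBSTRUENTS or word[2] not in VOWELS:
        return None
    if word[1] in LIQUIDS:
        return "OL"
    if word[1] in GLIDES:
        return "OG"
    return None


def count_clusters(tokens):
    """Return (type_counts, token_counts) for OL and OG onsets."""
    token_counts = {"OL": 0, "OG": 0}
    type_sets = {"OL": set(), "OG": set()}
    for tok in tokens:
        word = tok.lower()
        kind = classify_onset(word)
        if kind is not None:
            token_counts[kind] += 1      # every occurrence counts as a token
            type_sets[kind].add(word)    # each distinct word counts once
    type_counts = {kind: len(words) for kind, words in type_sets.items()}
    return type_counts, token_counts


# Made-up toy input, not data from any language.
types, tokens = count_clusters(["pla", "pla", "tri", "kwa", "ma", "pya"])
print(types, tokens)
```

The sketch deliberately says nothing about whether a given glide belongs to the onset or the nucleus; it simply counts surface sequences.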
By the way, I am aware of the difficult issue of interpreting OGV sequences, such as [kwa], in terms of whether the [w] is really in the onset or the nucleus, whether it is a separate consonant or just labialization of the [k], etc. So I would especially value cases of languages in which there is a clear answer to these questions.
Thank you very much,
Graduate Institute of Applied Linguistics and SIL International