Still more distributional arguments

I’m teaching a grad seminar on assimilation this quarter, and this week we discussed Jaye Padgett’s unabridged “Feature classes in phonology” (an abridged version was published in Language in 2002; the unabridged paper dates back several years earlier). Something came up in the discussion that I’ve been thinking about for a while, related to my two posts from a while back about distributional arguments in phonology.

First, a little background on Padgett’s paper. (If you’d like to skip this background, feel free to jump ahead.)

Back in the heyday of feature geometry, there seemed to be plenty of good reasons why you would want to group a set of features that pattern (e.g., spread) together under a class node. Perhaps the most compelling of these reasons can be loosely characterized as follows. Borrowing an example from Padgett, suppose you’re looking at Turkish high vowels and you want to express the generalization that both [back] and [round] spread. If these are completely separate features with no class node, the best you could do would be to write two rules, one for each feature. By lumping [back] and [round] under the class node Color, however, you could write one rule spreading Color, which takes both [back] and [round] along for the ride.

So what? Well, the feature-geometric heyday of which we speak was still under the heavy influence of the SPE evaluation metric: roughly, a grammar with fewer and simpler rules (that express linguistically significant generalizations) is more highly valued than a grammar with more and more complex rules (that do not express linguistically significant generalizations). Translated into OT, the evaluation metric is (I think) still only at the level of an analytical heuristic: more general constraints and more explanation-from-interaction are more highly valued than more specific and ad hoc constraints. (The value of the metric was arguably also primarily heuristic in pre-OT days, but there seems to have been more pseudo-serious philosophy-of-science-type stuff written about the significance of the evaluation metric then than now.)

One of Padgett’s points is that standard feature geometry has a problem with partial class behavior of the type you find in Turkish: [back] spreading affects high and nonhigh vowels alike, while [round] spreading only affects high vowels. Padgett’s proposal is that feature classes like Color are sets that can be referred to by gradiently violable constraints: Spread(Color) prefers spreading of both [back] and [round], but [round] is prevented from spreading to nonhigh vowels, in which case it’s better to at least spread [back] than to spread nothing at all. This results from the following ranking:

  • *Nonhigh/round >> Spread(Color) >> { Ident(back), Ident(round) }
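The effect of this ranking can be made concrete with a toy evaluation. The sketch below is mine, not Padgett’s formalism: the candidate representation (which Color features spread to the target), the violation counts, and the lexicographic comparison of constraint profiles are all illustrative assumptions.

```python
# Toy OT evaluation of gradient Spread(Color) under the ranking
# *Nonhigh/round >> Spread(Color) >> {Ident(back), Ident(round)}.
# Candidates are modeled simply as the set of Color features spread
# to the target vowel -- an illustrative simplification.

COLOR = {"back", "round"}

def nonhigh_round(spread, target_high):
    # *Nonhigh/round: violated if [round] reaches a nonhigh target.
    return 0 if target_high else int("round" in spread)

def spread_color(spread):
    # Gradient Spread(Color): one violation per Color feature NOT spread.
    return len(COLOR - spread)

def ident(spread):
    # Ident(back)/Ident(round): one violation per feature changed on the target.
    return len(spread)

def evaluate(target_high):
    candidates = [frozenset(), frozenset({"back"}),
                  frozenset({"round"}), frozenset(COLOR)]
    # Strict domination: compare violation profiles lexicographically,
    # highest-ranked constraint first.
    profile = lambda c: (nonhigh_round(c, target_high),
                         spread_color(c), ident(c))
    return min(candidates, key=profile)

print(sorted(evaluate(target_high=True)))   # high target: both features spread
print(sorted(evaluate(target_high=False)))  # nonhigh target: only [back] spreads
```

For a high-vowel target the winner spreads both features; for a nonhigh target, *Nonhigh/round rules out any candidate spreading [round], and the [back]-only candidate’s single Spread(Color) violation beats the two violations incurred by spreading nothing — it’s better to at least spread [back] than to spread nothing at all.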

As Cahill & Parkinson (1997) point out, the really significant part of Padgett’s proposal is the bit about gradient evaluation of Spread(Class), which they argue is not incompatible with standard feature geometry. (There’s more to their argument, and to Padgett’s, that I’m not going into here; see also Halle’s 1995 article in LI for a model of feature geometry that allows partial class behavior in a rule-based framework.)

3 thoughts on “Still more distributional arguments”

  1. Adam Albright

    I was interested to read this, because I had actually ended up telling my undergrad class something along these same lines on Thursday, and had then been pondering whether I really was telling them the right thing. The discussion was slightly different because it was in the context of rule-based grammar, but it had a similar flavor.

    Essentially, after rehearsing the argument that a good feature set should make it easier to express common processes, I had them ponder why common classes should be easier to express. Aside from convenience and economy, there seemed to be two possible arguments (one stupid and one maybe not so stupid):

    1) If rules were constructed by some random process in which it was increasingly unlikely to add additional feature specifications to them, then most of the rules in the world’s languages would be simple ones. (A purely formal generative process for generative phonologies…)

    2) If learners had a bias to assume simple rules, then no matter how the pattern actually arose, the scenario in (1) would be played out in the course of acquisition: whatever the phonetic basis, there would be a tendency to reanalyze processes as involving easier-to-state classes. This would be a strong argument if we observed that common classes were also phonetically unnatural, since it would point to distinct cognitive biases for how sounds should be classified.

    In OT, this would translate into an argument not about the distribution of grammars across the world’s languages, but rather, learning biases in choosing among competing characterizations of the context. (There is a connection here with the various proposals in the ranking literature for picking out the correct level of generality for faithfulness constraints; e.g., Prince & Tesar 1999, Hayes 1999.)
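    The scenario in (1) can be sketched as a tiny simulation: if each additional feature specification is added to a rule with some fixed probability, rule complexity follows a geometric distribution, so most randomly generated rules come out simple. The probability value and the two-feature cutoff below are arbitrary assumptions for illustration.

    ```python
    # Toy version of the "random rule construction" scenario: start each
    # rule with one feature specification and keep adding more with fixed
    # probability p, so larger rules are increasingly unlikely.
    import random

    random.seed(0)

    def random_rule_size(p=0.3):
        size = 1
        while random.random() < p:  # add another feature with probability p
            size += 1
        return size

    sizes = [random_rule_size() for _ in range(10_000)]
    share_simple = sum(s <= 2 for s in sizes) / len(sizes)
    print(f"rules with at most 2 features: {share_simple:.0%}")  # roughly 90%
    ```

    With p = 0.3, about 91% of rules end up with at most two feature specifications, so the typology generated by such a process would be dominated by simple rules even with no learning bias at all.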

  2. Marc Ettlinger

    I’ve also been considering some of the same questions re: classes recently and I think one alternate way of thinking about it, actually analogous to Adam’s first point (which I assume was the stupid one), is in terms of category theory. If there is a rule, or a process, or a generalization that operates over some thing, that thing must be defined as a category. Classical category theory (which I think most closely corresponds to feature theory) has long fallen out of favor with cognitive psychologists who study categorization, and there are a number of alternate proposals out there, including exemplar-based models, prototype models, modified feature-based theories, and even Bayesian network models.
    I’m far from familiar with the subtleties of the debate, but considering whether /b, d, g/ or /b, d, k/ can be better acquired as a category based on these different models may go a long way towards answering the question. So, in that sense it parallels Adam’s first suggestion in that it does become increasingly difficult to learn a category that has more tenuous, numerous, or irregular associations among its members. Instead of a bias towards simpler rules, which to me has an element of circularity (what makes a rule simple? being easy to learn?), the bias is towards learnable categories. Whether this could be translated into the exemplar-based models that have become popular in Lx recently is a question I’ve been trying to think about. And perhaps this is what Adam meant by simpler rules, and I am rather just suggesting an appeal to category theories as a way of defining “simple.”
    I don’t totally follow Adam’s point regarding how simple rules would translate into the language of OT – some clarification would be great!

  3. Pingback: phonoloblog»Blog Archive » Distributional arguments noch einmal
