Sean Burke and Sheri Wells wrote a fascinating article about converting (English) text to Braille. They consider both rule- and constraint-based approaches, though only a rule-based approach is implemented.
What makes the problem interesting is that more than one “contraction” rule could potentially apply to a given substring of the input word. To use B&W’s example, there are contractions for the strings ‘the’, ‘th’, and ‘er’—so how should the end of the word ‘leather’ be contracted? (Answer: by contracting ‘the’, not ‘th’ and ‘er’.) Conflicts like this could in principle be resolved by either rule ordering (for convenience, I’ll refer to contraction rules by putting !! around the strings they contract: !the! bleeds !th! and !er!) or constraint ranking (*the >> *th, *er).
The Elsewhere Condition applies: rules with more-specific structural descriptions bleed those with less-specific SDs, as in the ‘leather’ example, where !the! precedes !th!. Of course, if the ordering were the reverse (!th! then !the!), the system wouldn’t appear to have !the! at all. B&W use identity rules to implement lexical exceptions. Similarly, exceptional sub-word strings like ‘shead’ (as in ‘brideshead’), whose ‘sh’ fails to contract, are handled by special identity rules such as ‘shead’ → ‘shead’ that are ordered before the regular !sh! rule.
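To make the ordering argument concrete, here is a minimal sketch in Python. It is my own toy, not B&W’s program: the function name `contract`, the bracket notation for contracted strings, and the five-rule list are all assumptions made for illustration, and real Braille has many more rules. At each position, the first rule in the list that matches is applied.

```python
def contract(word, rules):
    """Scan left to right; at each position apply the first matching rule."""
    out = []
    i = 0
    while i < len(word):
        for pattern, replacement in rules:
            if word.startswith(pattern, i):
                out.append(replacement)
                i += len(pattern)
                break
        else:
            # No rule matched here: copy the letter through unchanged.
            out.append(word[i])
            i += 1
    return "".join(out)


# Toy rule list (hypothetical subset); brackets mark a contracted string.
RULES = [
    ("shead", "shead"),  # identity rule: protects the 'sh' of 'brideshead'
    ("the", "[the]"),    # more specific, so ordered before !th!
    ("th", "[th]"),
    ("sh", "[sh]"),
    ("er", "[er]"),
]

# The same rules with !th! ordered before !the!.
REVERSED = [
    ("shead", "shead"),
    ("th", "[th]"),
    ("the", "[the]"),
    ("sh", "[sh]"),
    ("er", "[er]"),
]

print(contract("leather", RULES))         # lea[the]r
print(contract("leather", REVERSED))      # lea[th][er]
print(contract("brideshead", RULES))      # brideshead
print(contract("brideshead", RULES[1:]))  # bride[sh]ead
```

With !th! ordered first, !the! never gets a chance to apply, which is the sense in which the system would appear not to have it. Dropping the identity rule lets the regular !sh! rule contract the ‘sh’ of ‘brideshead’.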
Less trivial is the fact that in the ‘leather’ example, !er! mustn’t bleed !the!. B&W seem to get this with left-to-right directionality: since ‘the’ matches first, it’s replaced, bleeding !er!. That made me wonder about a case where the shorter string is to the left of the longer string, like ‘aesthete’: would !st! bleed !the!? Apparently not: B&W use a special rule for ‘sthe’ which contracts its substring ‘the’ (and that rule bleeds both !st! and, vacuously, !the!). But that could be because ‘th’ is a digraph and there’s a dispreference for splitting digraphs. As far as I can tell, the only other test case (that doesn’t involve a morpheme-specific rule) is !bb! vs. !ble!, as in ‘rabble’, and again B&W have a special rule for ‘bble’ that contracts the substring ‘ble’. I can’t think of any reason why that should be true except that there’s a preference for either applying longer rules or getting a shorter result.
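The directionality point can be sketched with a small left-to-right scanner in Python. Again, this is an illustration under my own assumptions rather than B&W’s code: `apply_rules`, the bracket notation, and the tiny rule lists are hypothetical. The replacements ‘s[the]’ and ‘b[ble]’ stand in for the special ‘sthe’ and ‘bble’ rules described above.

```python
def apply_rules(word, rules):
    """Left-to-right scan; the first rule in list order that matches at a position wins."""
    out = []
    i = 0
    while i < len(word):
        for pattern, replacement in rules:
            if word.startswith(pattern, i):
                out.append(replacement)
                i += len(pattern)
                break
        else:
            # No rule matched at this position: keep the letter as is.
            out.append(word[i])
            i += 1
    return "".join(out)


# Hypothetical general rules only; brackets mark a contracted string.
GENERAL = [
    ("the", "[the]"),
    ("th", "[th]"),
    ("st", "[st]"),
    ("ble", "[ble]"),
    ("bb", "[bb]"),
]

# Special rules that contract only the 'the' / 'ble' substring, ordered first.
SPECIAL = [
    ("sthe", "s[the]"),
    ("bble", "b[ble]"),
]

print(apply_rules("aesthete", GENERAL))            # ae[st]hete
print(apply_rules("rabble", GENERAL))              # ra[bb]le
print(apply_rules("aesthete", SPECIAL + GENERAL))  # aes[the]te
print(apply_rules("rabble", SPECIAL + GENERAL))    # rab[ble]
```

Note that in this sketch, ordering !the! before !st! (or !ble! before !bb!) in the list doesn’t help, because the shorter string starts further to the left and so matches first. Only a rule that begins at the same position, like ‘sthe’ or ‘bble’, can pre-empt it.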
I can’t tell what effect morpheme boundaries have on rule application (except when letters become adjacent across a morpheme boundary without having their usual phonetic value, as in the ‘ar’ of ‘tearoom’, or the ‘th’ of ‘meathead’). For example, the ‘st’ in ‘mistyp’ (as in ‘mistype’) is blocked from contracting, but the ‘st’ in ‘mistitle’ is not (perhaps because ‘mistitle’ is too infrequent to have made it into the rule list). ‘Mistreat’, ‘mistime’, and ‘mistrust’, by contrast, have such high-frequency stems that they’re handled by the rules !treat!, !time!, and !trust!. There’s no special rule blocking !gh! from applying in words like ‘prighood’, but that could be an oversight due to the low frequency of such words.
There’s also a brief discussion about doing hyphenation with constraints, which could be useful for teaching beginning OT. (It uses constraint weighting rather than strict ranking, so the examples would have to be modified slightly.)
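For anyone adapting those examples, the difference between weighting and strict ranking can be sketched as follows. Everything here is hypothetical: the two candidates, their violation counts, and the weights are invented to show the contrast and are not taken from B&W’s hyphenation constraints.

```python
# Violation counts per candidate, listed from the highest-ranked
# constraint to the lowest-ranked one (made-up numbers).
candidates = {
    "cand_a": (1, 0),  # one violation of the top constraint
    "cand_b": (0, 3),  # three violations of the lower constraint
}

# Made-up weights, in the same constraint order.
weights = (2.0, 1.0)


def strict_winner(cands):
    """Strict ranking: compare violation tuples lexicographically."""
    return min(cands, key=lambda name: cands[name])


def weighted_winner(cands, weights):
    """Weighting: pick the candidate with the lowest weighted violation sum."""
    return min(
        cands,
        key=lambda name: sum(w * v for w, v in zip(weights, cands[name])),
    )


print(strict_winner(candidates))             # cand_b
print(weighted_winner(candidates, weights))  # cand_a
```

Under strict ranking, no number of lower-ranked violations can outweigh a single higher-ranked one, so cand_b wins. Under weighting, three violations at weight 1.0 cost more than one violation at weight 2.0, so cand_a wins.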
All in all, worth a read! One quibble: the accuracy of B&W’s program is assessed by comparing its output to that of an existing program, not to the intuitions of Braille users, so we don’t know if the program’s output for tricky cases is truly correct.