
I’ve recently had need to compile SRILM on my Mac, which runs OS X 10.9 (Mavericks).  I have Mac Ports installed but found that I had to change to the out-of-box C/C++ compiler in order to get SRILM to compile.  It was simple: in common/Makefile.machine.macosx I simply changed

CC = cc $(GCC_FLAGS)

to

CC = clang $(GCC_FLAGS)

May it work for you as well!
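If you prefer to script the change, a sed one-liner from the top of the SRILM source tree does the same edit (the path is from the text above; a .bak backup is kept):

```shell
# Switch the Mavericks build from cc to clang in place.
sed -i.bak 's/^CC = cc /CC = clang /' common/Makefile.machine.macosx
```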

My laptop died recently, and I am stuck with a new hard drive and a fresh OS X 10.9 (Mavericks) installation.  I wanted to install lme4.0. This turned out to be trickier than I’d expected, because Xcode 5.0 doesn’t include a Fortran compiler. This can be solved by installing a separate Fortran compiler. Many thanks to Simon Urbanek for this advice.  A couple of other potentially useful notes:

  • I find that you can no longer install Command Line Tools from within Xcode.  I also was unsuccessful in running xcode-select --install, which used to work; I get the error “Can’t install the software because it is not currently available from the Software Update server”. However, downloading the Command Line Tools package directly does work for me.
  • MacPorts offers Fortran compilers within recent gcc versions — e.g., gcc49 installs a Fortran compiler at /opt/local/bin/gfortran.  But these won’t meet R’s needs; Simon Urbanek states:

    The CRAN build is a native build of R and only supports native compilers.

Update 6 March 2014: Recently Yan Chen used this advice to install lme4.0 on her Mac. She reports that the complete sequence of actions that led to success for her was:

  1. Install Command Line Tools
  2. Install the gfortran compiler
  3. Modify the Makeconf file in /Library/Frameworks/R.framework/Resources/etc to contain the lines


  4. Upgrade the Matrix package in R to version 1.0.0 or higher
  5. Build lme4.0 following the directions in the tarball
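(The post doesn’t reproduce the exact Makeconf lines from step 3. As a rough, unverified sketch — paths assumed from the default gfortran installer location, so adjust to your system — the edit typically points R’s Fortran variables at the newly installed compiler:

```
FC = /usr/local/bin/gfortran
F77 = /usr/local/bin/gfortran
```

Check which variables your Makeconf actually defines before editing.)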

We hope that this may work for you too!


I often want to make a PDF version of Word documents I’m writing, but when there are differently formatted sections, Word spits out multiple PDFs.  To fix this so that you get only one PDF out, do the following:

1) Open File->Page Setup…

2) From the “Settings” dropdown box on the resulting window, choose “Microsoft Word”.

3) From the “Apply Page Setup settings to” dropdown box on the resulting window, choose “Whole Document.”

Kudos to dbenoit who posted this solution here.

Despite all the advantages of pdflatex, I still do a lot of LaTeX writing in which I take the latex -> dvips -> ps2pdf route. This is primarily because of pstricks, which I use for drawing syntactic trees, dependency graphs, and various pictures that I’ve already written out. (I’ve never had complete success with the “pdf” option for pstricks or with xelatex, perhaps because I use other packages within pstricks.) When I write Sweave files, however, I’ve had the trouble that R’s default postscript driver doesn’t support transparency, which is problematic for graphs produced with ggplot2.

However, I found a work-around: one can direct Sweave to use R’s cairo_ps driver, which *does* support transparency. Putting the following code at the beginning of your .Rnw file accomplishes this:

cairo <- function(name, width, height, ...) grDevices::cairo_ps(file = paste(name, "eps", sep = "."), width = width, height = height, ...)

Then, in the header of the R code snippets that generate your graphs, use the options fig=T,grdevice=cairo,pdf=F, and you’re good to go.
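For concreteness, a chunk using these options might look like this (the chunk name and plot are invented for illustration):

```
<<transparency-demo, fig=T, grdevice=cairo, pdf=F>>=
library(ggplot2)
print(qplot(rnorm(1000), geom = "density", alpha = I(0.5)))
@
```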

That being said, I have a long-term goal to abandon pstricks and move completely over to tikz, because tikz is clearly the future (as Google Trends shows).

Unix find is a powerful tool, but I always forget how to use it. I’ve recently found my favorite page ever, with a wide variety of usage examples:

I hope that this is as useful to you as it is to me!

Also, FWIW I have recently found recoll to be a very powerful Spotlight-from-the-command-line tool on both Linux and OS X.

After 4.5 years of struggling with OS X’s input source menu, I have finally figured out how to bind key combinations to selection of specific input sources (read: English, traditional Chinese through Pinyin, Japanese Hiragana, …).  Here are the steps:

  1. Note the specific names of the input sources on the input menu you want to use.  Mine, for example, are “U.S.”, “Russian”, “Pinyin – Traditional”, “Hiragana”, and so forth
  2. Install Quicksilver.
  3. Open Script Editor (Spotlight will find it).
  4. Within Script Editor, add the contents
    on changeKeyboardLayout(layoutName)
        tell application "System Events" to tell process "SystemUIServer"
            tell (1st menu bar item of menu bar 1 whose description is "text input") to {click, click (menu 1's menu item layoutName)}
        end tell
    end changeKeyboardLayout

    changeKeyboardLayout("your_input_source_name")

    where your_input_source_name is replaced with the name of one of your input sources. For example, I might use Pinyin - Traditional. Note that you have to get the name of the input source EXACTLY right.

  5. Save this script as your_input_source_name.scpt in a location of your preference, where you replace your_input_source_name with a mnemonic string of your preference (I might just use chinese-traditional).
  6. Open Quicksilver through Spotlight.  Push ⌘’ to open the Triggers window.
  7. On the bottom of the Triggers window, hit “+” and choose HotKey to add a HotKey binding.
  8. For “select an item”, type in the name of your saved script until Quicksilver finds it.
  9. For “Action”, just leave it as “Run”.
  10. Don’t put anything for Target.  Click Save.
  11. You’ll now have a new item in the Triggers window called Run Script: your_input_source_name.scpt.  Now double-click on the “Trigger” column of that item, click “Settings” in the new sub-window popping out to the left, click in the Hot Key window, and type in a hot key combination (e.g., I might use ^⌘T).
  12. Repeat steps 4-11 for each input source for which you want a keyboard shortcut.

Kudos to Asmus who posted the very simple AppleScript code used above at this link.

from Gmail Online Service <>
date Mon, Jan 10, 2011 at 12:35 PM
Gmail technology team has recently launched Google web software to protect and secure all Gmail Accounts. This system also enhanced efficient networking and fully supported browser. You need to upgrade to a fully supported browser by filling out the details below for validation purpose and to confirm your details on the new webmaster Central system.
User id:
User Password:
Date of birth:
Note: Your account will be disabled permanently if you failed to provide the details required above within 72hours. Gmail will not be held responsible for your negligence.
The Google web Service.

Dear Senator Kyl:

I grew up in Arizona and now live in California.  My mother and many of my childhood friends still live and vote in Arizona.

Today on Face the Nation (about 6:45 in) you criticized Pima County Sheriff Clarence Dupnik for suggesting that vitriolic public discourse may have played a role in inciting the shooting of Gabrielle Giffords.  Sheriff Dupnik is an elected public official, just like you are.  What gives you but not him the right to pontificate about the slaughter of Giffords, Roll, and several other bystanders?

Submitted to Jon Kyl here.

Dear Sheriff Dupnik,

I grew up in Tucson, graduated from University High School in 1991 (I overlapped with Gabrielle Giffords for one year in high school, though I didn’t know her), and now live in San Diego.  I have been following media coverage of the shooting of Congresswoman Giffords and others, and I have listened to your remarks on the anger, bigotry, and vitriol in the United States today, and how Arizona has become a mecca for prejudice.  I am extremely grateful that you are standing up and telling the truth as you see it.  The country needs more sober voices like yours.  I am saddened to see what is happening to my home state.  Please know that you have my full support in any more public statements that you make on this issue, and I hope that you continue to tell the truth as you see it.

Yours truly,

Roger Levy

You can send Sheriff Dupnik a message of support here:

Backdrop: Gabrielle Giffords was shot today in Tucson.

So I walk into this taco shop this afternoon to get a burrito.  At the ordering counter is a white guy, maybe late 50s, white-haired in a Harvard sweatshirt and horn-rimmed glasses.  Seems mild-mannered enough. He’s talking in crap Spanish to the staff, and I recognize that he’s trying to talk about the Giffords shooting.  I say, “I know what you’re talking about, the shooting in Tucson.”  Eventually he switches to English and says, nineteen people got shot at this rally.  The congresswoman was shot point-blank in the head, he says, and she survived.

I’m surprised: I’d heard that Giffords had died.  “She’s alive?” I ask.

He shrugs his shoulders.  “We all knew she was empty-headed!” he replies.  Presumably he was trying to tell this joke to the restaurant staff in his crap Spanish before I got involved.

Needless to say, I did not react well to this, told him that this was the Congressional representative my mother had voted for, and asked him to get the hell out of the restaurant.  His responses were telling: “Don’t you have a sense of humor?”  and “By the end of the year the writing’s going to be in the streets, and hundreds or thousands of people are going to be dead”, because the government is screwing us over and people aren’t going to take it anymore.

I told him that “you’re what’s wrong with this country,” which I now recognize is a pretty crummy comeback, because the immediate response was “no, you’re what’s wrong with this country; you’re a lamb being led blindly by the government.”

The conversation continued in ways that aren’t really worth describing in further detail; but crucially, at no point did this man express any condolence or disapproval of the Tucson gunman’s action.  I’m still parsing the details of the incident in my mind, but one basic take-home message strikes me above all else.  If here in California I can run into someone who essentially approves the ruthless slaughter of an elected official and numerous other bystanders on the grounds that he disagrees with the official’s political stances within a few hours of the event, then there are far more wackos out there than anyone thinks, and this country is in serious danger.

It can’t read your mind, but the Attachment Scanner Plugin is available here and will catch you some of the time :-)

Incredibly useful for Firefox and X11:

David Brooks has just written approvingly of Obama’s capitulation to Republicans on tax cuts for the wealthy.  He cites the most recent Gallup poll’s estimate that 67% of independents and 52% of Democrats support extending all the tax cuts.  (In this poll, the average support over all political persuasions was 66%.)

But you would think that someone who devotes extensive and laudatory coverage to social science would account for the basic fact that how questions in a poll are worded can have a huge impact on what answers are given.  In this most recent Gallup poll, those surveyed were given only the option to vote yes or no on a law to “extend the federal income tax cuts passed in 2001 and 2003 for all Americans for two years”.

Less than three weeks ago, another Gallup poll gave participants three options: to keep tax cuts for all Americans (chosen by 40%), to keep tax cuts but set new limits for the wealthy (44%), or to let tax cuts expire for all (13%).  So according to this poll, a 57% majority of the public wants to let tax cuts for the wealthy expire.  And I’ll give you ten-to-one odds that the results of a three-option poll today would look more like November’s three-option poll than like December’s two-option poll.

And while he’s busy oversimplifying public opinion polls, why doesn’t Brooks mention that over two-thirds of the population supports repeal of “Don’t ask, don’t tell”? — and this chart says it all:

This is a persistent blog entry that I’ll keep editing to collect examples of how terrible WebCT software is.  It also serves as a memory aid for myself (and perhaps an FAQ for other hapless faculty) as to how to work around WebCT’s innumerable gotchas.  Though it is useful for a few things — such as automatically communicating grades to students with proper privacy settings on a secure (hopefully!) website — the list of bad things about WebCT is so long that it stuns me that universities pay money for this product.

  • As a teacher, you can “hide” a column in a grade sheet.  This does not mean hide the column from the student’s view (the totally independent “Release to Student” setting is for that) — it means hide it from yourself!  Although there may be some good reason to do this, there is no excuse for the fact that it is hard to figure out how to unhide a column!  It turns out that unhiding a column cannot be done from the Column Settings page (where it would intuitively belong), but rather only from the Reorder Columns page.  Changing the “Release to Student” setting, on the other hand, can be done only from the Column Settings page, and not from the Reorder Columns page.  Go figure.
  • WebCT is extraordinarily inconsistent as to what is required to save a change.  To take the previous example: on the Column Settings page, you change the Release to Student setting for a grade-sheet column by clicking on the current value of the setting; this switches the setting (Yes changes to No, No changes to Yes), reloads the page, displays the new setting, and the new setting is thereby saved.  On the Reorder Columns page, in contrast, a currently hidden column will have a Show Column button that you can click, and vice versa for a currently unhidden column.  As with the Column Settings page, clicking on this button will change the setting and reload the page with the new setting.  But you have to click on an unobtrusive Save button at the bottom of the page to save this change! Either behavior would be fine, but please make the behavior consistent!

I love Sweave for writing LaTeX documents with embedded R code, and R’s cacheSweave package helps tremendously in avoiding repetition of time-consuming computations.  (Thank you, Roger Peng, for writing cacheSweave!)  One catch that is evident from a careful perusal of the cacheSweave vignette but which I keep forgetting is that cacheSweave will not cache R code chunks with side effects! Side effects include (but are not limited to):

  • print() and related display-to-screen commands
  • plot() and other graphical-output commands

This failure to cache is silent, so it can be confusing; I keep on forgetting about it.  So this post is as much as anything else a note to myself to remember.  The simple solution is to separate out the long-computation chunks from the side-effect chunks and cache only the former (see the vignette for details).
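Concretely, the fix looks like this in the .Rnw file (chunk and function names are invented for illustration): put the slow computation in its own cached chunk, and the side-effecting display in a separate, uncached one:

```
<<fit-model, cache=TRUE>>=
fit <- some.very.slow.computation(dat)  # expensive; safe to cache
@

<<show-fit, fig=TRUE>>=
plot(fit)  # side effect: not cached, but now cheap to re-run
@
```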

One of the things I do to my Sweave output is to make scientific notation more transparent.  I have a special R function for this which I use in my Sweave documents:

myFormat <- function(...) {
  tmp <- format(...)
  return(sub("e(.*)", "\\\\\\\\times 10^{\\1}", tmp))
}
Note in particular the ridiculous number of backslashes required in the call to sub() to obtain the backslashes in the string output.  The count gets halved twice: once when R parses the string literal, and once more because sub() treats backslashes in the replacement text as escapes.  Hopefully this will save someone else the headache of figuring out the magic number by trial and error!
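To see the behavior concretely, here is the function applied to a small number at the R prompt (the input value is just an example):

```r
myFormat <- function(...) {
  tmp <- format(...)
  return(sub("e(.*)", "\\\\\\\\times 10^{\\1}", tmp))
}

# Two literal backslashes survive into the output string:
cat(myFormat(1.23e-05), "\n")  # prints: 1.23\\times 10^{-05}
```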

Earlier I posted on combining natbib and apacite to approximate APA citation style. Well, I became aware that this approach doesn’t handle the rule that in-text citations should be “AuthorA and AuthorB (year)” whereas parenthetical citations should be “(AuthorA & AuthorB, year)”. I can’t figure out how to get this to work while using natbib.

So I’ve concluded that just using apacite is probably better. The disadvantage is that its citation commands aren’t robust. But the makerobust package helps with that. Here is what the relevant part of my preamble currently looks like:


and I don’t need to change any of the in-document citation commands from the natbib names I’m used to.

I publish in a number of journals that use the rules of APA style.  I prefer to write my papers in LaTeX, and to use BibTeX together natbib to manage citations and bibliographic references.  One of the rules of APA style says that the first citation of a bibliographic reference must be the “long” citation (all authors listed, up to five).  Natbib has a longnamesfirst option that is supposed to do this automatically, but two of the most popular natbib style files — apalike and newapa — silently fail to use the option.

The solution is to use the apacite style file. It works. That is, you should have the following line in your preamble:

\usepackage[longnamesfirst]{natbib}

and the following line before you use the \bibliography command:

\bibliographystyle{apacite}
People often ask me about the relative merits of R versus other statistics & mathematical programming environments (Matlab, SPSS, and so forth).  While I love R and use it almost every day, I always insist that it has a(n unnecessarily) steep learning curve and can bite you in ways that are hard to recognize.  I also say that R’s practice of silent coercion and sometimes misleading textual presentation of objects contributes to this steep learning curve. Today I stumbled across a very concrete example which has bitten me for the umpteenth time.  The example involves the function tapply() and conversion to factors. Suppose that we have a number of observations, each of which belongs to one of K groups:

> K <- 6
> dat <- data.frame(group=rep(1:K, each=4))

There are underlying group-mean differences, and noise at the level of the observations as well:

> group.means <- 100 + runif(K, min=-20, max=20)
> dat$y <- group.means[dat$group] + rnorm(dim(dat)[1], 0, 40)

A very common thing to want to do is to compute some aggregate statistic of each group, and then record the group-specific aggregate statistic on each individual observation. This is naturally done with tapply():

> obs.means <- with(dat,tapply(y,group,mean)) ## compute the observed mean for each group
> dat$group.mean <- obs.means[dat$group] ## side note: if you have more than one grouping factor, you need to use cbind() inside the brackets

No trouble so far, and inspection of the new vectors shows everything’s ok:

> obs.means
        1         2         3         4         5         6
 75.32891 102.14953  89.51449 102.49337  90.25102 117.25358
> tail(dat)
   group         y group.mean
19     5 105.34542   90.25102
20     5 119.21220   90.25102
21     6 164.36802  117.25358
22     6  41.89392  117.25358
23     6 130.04878  117.25358
24     6 132.70359  117.25358

But now, let’s imagine that your group indices happen to start with some number other than 1 (e.g., you found that group 1 was tainted and you throw it  out before computing the aggregate statistic):

> dat1 <- subset(dat, group > 1)
> obs.means <- with(dat1,tapply(y,group,mean))
> dat1$group.mean <- obs.means[dat1$group]

Everything seems ok. But if you didn’t bother to inspect the results, you might never guess that something is horribly wrong:

> obs.means
        2         3         4         5         6
102.14953  89.51449 102.49337  90.25102 117.25358
> tail(dat1)
   group         y group.mean
19     5 105.34542   117.2536
20     5 119.21220   117.2536
21     6 164.36802         NA
22     6  41.89392         NA
23     6 130.04878         NA
24     6 132.70359         NA

What happened? This unhappy situation results from two sins and one proper behavior:

  1. tapply() silently coerces elements of its second argument into factors. So the line

    > obs.means <- with(dat1,tapply(y,group,mean))

    turned the numeric vector dat1$group into a factor with 5 levels, named “2”, “3”, “4”, “5”, and “6”. obs.means in turn is an array of length 5. [SIN]

  2. Indexing a vector or array with a numeric vector does not ever coerce to factor. So the operation

    > dat1$group.mean <- obs.means[dat1$group]

    gives the mean for group 3 when dat1$group is 2, the mean for group 4 when dat1$group is 3, and so forth. [PROPER BEHAVIOR]

  3. When you try to take elements of a vector using indices outside of the vector’s range, R silently gives you NAs back:

    > c(1,2)[3]
    [1] NA

    So the operation

    > dat1$group.mean <- obs.means[dat1$group]

    silently gives NA for elements where dat1$group is 6, since obs.means is only of length 5. [SIN]

Now, there are good arguments for each of these behaviors (coercion to factor by tapply(), non-coercion for numeric indexation of vectors, and — most questionably IMHO — returning NA for indexation outside a vector’s range). But the fact that R does each of these things silently makes it easy to lead even the expert user astray — and note that in this case, the user might not discover any sign of an error until much further downstream (if at all). From the user’s point of view, one way to avoid these dangers is to ensure that your data are of the correct type from the outset: if the differences among your groups are categorical rather than numeric, then you should be using a factor in the first place to represent them, not a numeric vector. But it is incredibly easy to make this mistake, especially because so many other programs use integer values to encode categorical variables upon data export. R could make life easier for all its users by providing explicit warning messages in some of these cases — coercion, indexing from outside the range of a vector — instead of being silent.
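For what it’s worth, one defensive habit avoids the positional-indexing sin entirely: index the tapply() result by *name* rather than by position, converting the group values to character first. A self-contained sketch with made-up data mirroring the example above:

```r
set.seed(42)
dat1 <- data.frame(group = rep(2:6, each = 4))       # group indices start at 2
dat1$y <- 100 * dat1$group + rnorm(nrow(dat1))

obs.means <- with(dat1, tapply(y, group, mean))      # names are "2".."6"

bad  <- obs.means[dat1$group]                # positional: misaligned, NA for group 6
good <- obs.means[as.character(dat1$group)]  # by name: correctly aligned, no NAs

stopifnot(any(is.na(bad)), !any(is.na(good)))
```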

I recently have had occasion to write a Microsoft Word 2008 document with floating figures. This has been a rather harrowing experience. A few pieces of advice for those of you out there struggling with it:

  1. If you create a caption, you can group the figure with its caption as you would two objects in PowerPoint.  This keeps the two together.  However, you cannot do this with tables.  (Don’t ask me whose bright idea this must have been.)
  2. Word doesn’t have real floats.  The closest approximation is wrapping text around the figure.  You can access this with Format Object -> Layout.  There are extra options (e.g., Top and Bottom wrapping style, which is decent) hidden under Advanced.
  3. While you’re in Format Object -> Layout, you can get better control over the figure’s positioning using Picture Position options than you can by dragging the figure around with the mouse.  For example, selecting Vertical Alignment to Top relative to Margin is basically the best way of demanding float to top.  Note that this doesn’t seem to keep the figure there reliably if you edit the preceding text.
  4. Doing this kind of figure stuff can wreak havoc with formatting of later text, causing bizarre early page breaks and such.  You can undo some of this bizarreness by selecting the text after the pagebreak, selecting Format -> Paragraph | Line and Page Breaks, and unchecking some of the boxes including Keep with Next, Page break before, Keep lines together.  Which ones to uncheck?  Trial and error.
  5. After you have done this, ask yourself a hard question: why are you using this shoddy, poor excuse for document-production software?  Consider switching to LaTeX.

Today’s New York Times Grammar Blog critiques an article for the following circumlocution:

He is on a pace to finish with more than 1,800 yards for Stanford, which hosts cross-bay rival California on Saturday, is 7-3 and guaranteed its first bowl appearance since 2001.

Parallelism problem; in this case, we needed to repeat “is” before “guaranteed.”

The “missing” is creates a coordination of unlike categories. The critique is interesting, though — it’s well-established that there are lots of cases of coordination of unlike categories that are perfectly natural, such as the following (famous in linguistic circles) sentence:

Pat is a Republican and proud of it. (Sag, Gazdar, Wasow, & Weisler 1985)

What’s particularly unusual about the Times’ example, though, is that it’s not clear what syntactic category 7-3 is. Is it a noun phrase? An adjective phrase? Perhaps the reason that the example seemed so unnatural to Philip Corbett is that the unlike-category coordination draws attention to this uneasy question of category status?

With the aid of a couple of helpful blogs (here and here), I have managed to get command-line email-sending capability via sendmail going, at least as long as I’m on UCSD campus:

$ sudo postfix start
$ sudo postconf -e myhostname=
$ sudo postconf -e relayhost=

And sendmail just works now! Pretty cool, eh?

UPDATE: it’s a bit tricky to get Bcc: effects with sendmail, but I’ve found a way to do it. Specify all the addresses that you want to receive the email as arguments of sendmail, and then include explicit To: and From: lines in standard input; only the addresses given as arguments actually receive the message, regardless of what the headers say. For example:

*** Filename: sendmail.test ***
To: Jane Doe
From: John Doe
Subject: bcc test
(put what you want here)
*** End file sendmail.test ***

$ sendmail < sendmail.test

The January 2010 LSA preliminary program is available online, and UC San Diego has another strong showing, with nine presentations overall.  This compares favorably with UC Davis (4), UCLA (5), and UC Santa Cruz (6), and we’re just behind MIT and Stanford (10 each), Maryland and Johns Hopkins (11), and UMass (12).  Once again UC Berkeley is at the top, though, with 16 presentations!  Congratulations UCB!

In fact, given the current UC crisis, it’s worth noting that UC schools taken together have (co-)authors on 42 presentations total, out of (from my estimate) 333 oral presentations + 85 posters = 418 presentations in all.  That means that the University of California has had a hand in over 10% of the scholarly output in the premier annual scholarly meeting for the scientific study of language. Nothing to shake a stick at.

Since switching from PC to Mac three years ago, probably the single most annoying user-interface feature on Mac has been the loss of flexibility in using Tab and Shift+Tab to move focus: to buttons and menus in web browsers, and between buttons in pop-up dialogue windows.

Thanks to Tony Spencer, I am now able to do this!  It’s a simple setting: in System Preferences, select Keyboard & Mouse > Keyboard Shortcuts.  At the bottom of the window there will be a button for changing Full Keyboard Access from Text boxes and lists only to All controls.  Selecting the latter will give you full flexibility of changing focus with Tab and Shift+Tab.

One other crucial tidbit: on pop-up windows, when you have moved focus to a non-default button, in order to select that button you need to press Space rather than Return/Enter.

Amazing that it took me three years to figure this out…!

Here are some slides that I wrote on the current UC budget crisis — what it is, how we (and the state of California) got to where it is now, why you (the student) should care, and what you can do about it.  My goal is to be informative rather than polemic — the underlying issues are complex.

Over the past several weeks there’s been a lot of talk in the media and at the University of California about the origins of the current budget crisis.  One of the prime suspects has been Proposition 13, which passed in 1978 and capped real estate taxes by limiting appreciation of the base for property taxes (the “assessed value”) to 2% per year.  I spent a bit of spare time this week quantifying how much this cap has actually cost California.  Using data from the Los Angeles Almanac, we can visualize this loss within San Diego:

Housing prices

The black line is actual median housing prices in San Diego since 1982; the dotted magenta line is the appreciated assessed value of a home that was median-price in 1982. The area between the two lines is the value of the home that was immune to taxation due to Proposition 13.

Prop 13 also capped property tax at 1%. Dividing the area between the two lines by 100, we find that the state has lost $31,220 (not accounting for inflation) in property tax on a median-value home over the last twenty-seven years. (Because most of the rise in housing prices happened recently, inflation isn’t so important; the inflation-adjusted figure is $40,000 in 2008 dollars.)

Now for the caveats:

  1. This estimate is probably biased upward by the fact that Prop 13 inflates housing prices — in an alternative reality without Prop 13, people would have less incentive not to sell houses they’ve held onto for a long time, which would increase supply and presumably push prices down.  It’s hard to know the size of this effect.
  2. The median home sale price isn’t the right statistic: it’s based on a different set of homes every year, whereas the correct statistic involves changes in prices to a fixed set of homes over time.  I don’t know how to get data for this correct statistic, however.  I believe that the effect here is most likely to bias the estimate downward, because between 1982 and 2006 (thus excepting the recent downturn), the set of homes being sold was probably becoming an ever lower-quantile sampling from among San Diego homes.  This is because there was lots of new property being built and sold, but it was being built disproportionately in lower-cost regions (=far from the coast), and catered largely to families who were being priced out of the booming market.

If you use LaTeX for linguistics paper writing, and use either the tipa package for IPA or the Sweave package for interleaving R code with LaTeX, and you also use the linguex package for formatting examples, you may occasionally encounter the problem of odd formatting of examples.  Just be sure to call both \usepackage{tipa} and \usepackage{Sweave} before \usepackage{linguex}, not after!
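In preamble form, the safe ordering is simply:

```
\usepackage{tipa}
\usepackage{Sweave}
\usepackage{linguex}   % after tipa and Sweave
```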

Who says it doesn’t happen?

First Black Mayor in City Known for Klan Killings

I boggled…

(12:51am 22 May 2009)

A newly declassified CIA report states that waterboarding was used 83 times on Abu Zubaydah and 183 times on Khalid Shaikh Mohammed.  In 2007, John Kiriakou stated that waterboarding was used only once on Zubaydah before he began to talk.  It doesn’t take more than elementary logic to conclude that either waterboarding doesn’t work well, the CIA was torturing its prisoners unnecessarily, or John Kiriakou made up lies to tell on national public television to get the CIA and the Bush Administration off the hook.  Or, of course, more than one of the above.

Sad times we live in.

Here’s a shout-out to Nathaniel Smith’s xpra. I’ve managed to get it working with a remote Debian server and my Mac laptop as a client. I have at least the following packages installed in support of xpra (all installed using MacPorts):

  • python25
  • python26
  • py26-pyrex
  • py26-gtk
  • xorg-libXtst
  • py25-gobject
  • py25-gtk
  • py25-nose
  • py26-nose
  • xorg-libXdamage
  • xorg-libXcomposite
  • xorg-libXfixes

No guarantee that all of these are really necessary. I suspect that all the py25-* packages are necessary, because it seemed that I needed py25-gtk even after installing py26-gtk.  On the other hand, there is no py25-pyrex.

I currently run OS X 10.5.6 and my TeX distribution is teTeX from MacPorts.  I recently found out that my pdflatex is generating A4 paper by default, even if I use the letter option with \documentclass.  Solution: call texconfig-sys and choose Paper -> Letter.  Unfortunately I couldn’t get it to write locally to my home directory — I set TEXMFCONFIG to $HOME/Library/texmf but texconfig-sys didn’t seem to care.  That’s unfortunate, but I am the only user of the computer so I just set it system-wide.

I’m using the Papers program for the first time, on a 30-day trial basis.  It seems like one of those programs whose benefits aren’t totally obvious until you really have been using it for a little while.  The problem for me has generally not been having papers scattered on my hard drive — I already have a “papers” directory for this — but rather things like multiple copies and, most crucially, BibTeX export.  I’m banking on BibTeX export saving me enough time to make Papers worth it for that feature alone.  Some thoughts:

  • Computational linguistics conference proceedings papers should be obtained through the ACM repository.
  • Psycholinguistics journal articles work well through Google Scholar.
  • JSTOR doesn’t work yet :(
  • The journal Language can be obtained through Project MUSE.
  • I prefer Skim to Preview for PDF reading.  Cmd-Alt-O will open the currently displayed paper in Skim.

The CUNY 2009 conference schedule came out a few days ago, and UCSD is pretty well represented among the talks (this is a single-track conference).  Here are some quick numbers:

  • Stanford: 2
  • MPI – Nijmegen: 2
  • UC Davis: 1
  • Wisconsin: 1
  • UMass: 1
  • Ohio State: 1
  • Penn: 1
  • NYU: 1
  • York: 1
  • Maryland: 1
  • San Diego State: 1

Tied for #1 with Rochester…not bad…:)

I’ve decided that the best get-started tutorial for WinBUGS is this one:

Also, kudos to Yarden Katz for publishing a webpage on how to use WinBUGS under Darwine on OS X:

(of course, I am actually using JAGS for most things these days.)

I’m learning how to use JAGS for a variety of hierarchical Bayesian models.  I had a bit of trouble figuring out how to install it on my Debian (etch) server, so I thought I would share how I got it to work (thanks to Brian Ripley and Martyn Plummer for suggestions):

# 1. configure, build, and install JAGS
./configure --with-jags-modules=/usr/local/lib/JAGS/modules --libdir=/usr/local/lib64
make
make check
sudo make install

# 2. install rjags (configure options are passed via --configure-args)

sudo R CMD INSTALL --configure-args='--with-jags-modules=/usr/local/lib/JAGS/modules/' rjags_1.0.3-4.tar.gz

And this worked.  Critical was to make sure that the JAGS .so files wind up in /usr/local/lib64.

I’m a big fan of Sweave for writing LaTeX documents with R code embedded inside.  But there is a really annoying gotcha: if you use statements that return R objects inside a figure chunk, the resulting LaTeX won’t compile :(

Examples pending…

I usually write my presentations in PowerPoint, but I use LaTeX to generate good-looking equations.  I’d been looking for a way to generate equations from LaTeX that are trimmed to a tight bounding box, so that they paste into PowerPoint properly (uniform size and transparent background).  And I found what I wanted: pdfcrop.  Yeah!
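A typical invocation looks like this (filenames are placeholders; assumes pdflatex and pdfcrop are installed):

```shell
# Compile a .tex file containing just the equation, then crop to a tight bounding box
pdflatex equation.tex
pdfcrop equation.pdf equation-cropped.pdf
```

The cropped PDF pastes into PowerPoint with a uniform size and transparent background.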

OK — I have a new favorite app: Air Sharing.  Allows you to use the iPhone as a wireless hard drive.  What’s the first thing I do with it?  Naturally, put a bunch of my own PDF papers on it :)  Powerpoint file viewing still has some issues.  But that’s just one more reason to go with PDF as universal standard, I guess.

Way cool.  Now I’d just like to be able to save files downloaded from Safari directly into Air Sharing.

As of today (10 August 2008), I am told by their customer service people that Capital One credit cards still have no foreign transaction fees.  I am going to at least one foreign conference this academic year, and this time around I have promised myself to be prepared!

You’re not seeing what you think you’re seeing. Believe it?

I downloaded iPhone 2.0 onto my first-generation iPhone yesterday.  I seem to have avoided some of the hiccups that others complained about.  I’ve only had time to explore the new iPhone a bit, but my top 3 impressions:

  1. Zenbe lists are pretty cool, and fill the function gap left by the fact that iCal doesn’t have to-do lists on the iPhone.
  2. iCal now has multiple calendars.  Hooray!
  3. Camera now says “Camera would like to use your current location” — yech.  Has iPhone been geotagging my pictures?  I don’t want this.  At least I can say “no” now, but it’d be nice to be able to turn this off by default.

Social/behavioral/cognitive scientists such as myself sometimes grumble that research in our profession is difficult because theories and measurements involving people are squishy, not like in the hard sciences like physics.  On the other hand, my father is a physicist and he often tells me that physics is not as clean as we think it is.   I just read this article on the Mpemba effect–the phenomenon where cold water can freeze more slowly than hot water–and it inclines me to think that my father is right.  In grade school I learned that hot water can indeed freeze more quickly, and it’s because it evaporates while it cools.  But the story is not actually that simple.  Who ever imagined that life for physicists was this complicated?

Stanley Fish, in his most recent column, “Memo to the Superdelegates: No Principles, Please“, said several highly controversial things, most notably that the Democratic primary process isn’t really democracy, or to the extent that it is, it’s the kind that the Founding Fathers feared, and the superdelegates should limit that democracy by voting politically rather than by following the will of the people. However, he also ignited a small firestorm with a wh-pronoun, in the following sentence:

Whom do I think would make the best president?

and a couple more like it. Here’s a sampling from the comments:

  • It’s interesting–and a bit dismaying–to read a piece that espouses pure principles of law and history, then lapses into a genteelism that confounds pure principles of grammar. A superdelegate might properly ask himself “Who do I think would make the best president…” but not “whom,” as “who” is the subject of the verb in this question rather than the object of “think.”
  • Didn’t you used to teach English at a university somewhere in the US? So how come, Stan, you write, “Whom do you think would make/will be . . . ?

[ironically, this second comment brings up yet another murky grammatical corner of English, one that I've been perplexed about since high school: what is the categorial status of "used to"? My current belief is that it is an adverb with no flexibility in positioning, the main piece of evidence being that it doesn't inflect, at least in writing. Assimilation/deletion makes it hard to figure out the situation in speech.]

  • “Whom do I think would make the best president?” or “Whom do I think will be the best general election candidate?” are, as you say, appropriate political questions. They are, however, ungrammatical ones. Get after your copy editor. The correct pronoun in each case (no pun intended) is “Who.”
  • [N]ot content shamelessly to shill for Sen. Clinton, Prof. Fish wholly abjures his celebrated expertise in English by writing, “Whom do I think would make the best President?”
  • Yes, Fish used the wrong (Who|Whom) form, because he’s pretentious, but in reality it doesn’t make any semantic difference. “Whom” is used as a signal to say “I’m being a classy writer, look at me!” and not for any expressive power that it lends to the grammar of English.

Curiously, these critiques are the opposite of the usual prescriptivist attack in which a self-styled educated language maven scolds the uneducated public for using who when whom is normatively appropriate. Is this an instance of populist prescriptivism? The last poster puts her finger on a truly important point: that the word whom truly can be a target of hypercorrection in English — that is, overusing a form that is socially associated with prestige.

However, I’m not completely convinced that hypercorrection is necessarily the source of Fish’s “error”. If we look at the unextracted (non-question) version of the sentence structure, we see that the only way of testing for the case of the position from which the wh-pronoun is extracted is using a regular pronoun:

I think (that) (he/she/I/we/they/*him/*her/*me/*us/*them) will be busy tomorrow.

On the other hand, it seems to me that it’s possible to imagine more “immediately deictic” contexts in which the accusative pronoun doesn’t seem totally out of place:

A: Who do you think we should send to the conference to present our new work, you or C?

B: I think “me” would make the best choice to send to the conference because I know that audience better.

I’m not saying that this exchange reads beautifully, but I feel as if I could imagine hearing an exchange like this. The situation reminds me a bit of a presentation I have heard Larry Horn give on reflexives, though I can’t remember where the presentation was.  If I’m at all correct, then Fish may have had in mind a meaning that would correspond to something like the following unextracted form:

I think (Clinton/Obama) would make the best choice for president.

And, by analogy, having this context in mind could have led to the offending whom.

Of course, it could just be that Fish is pretentious…

I’ve found that xzgv is pretty nice!

I’ve said before how much I like the streamlined browser Skim. One of the best parts is that it will auto-reload PDFs that change on disk (e.g., when you recompile a LaTeX document into a PDF). I just found out how to make it even better, so that Skim never asks you whether to reload, it just does it automatically:

$ defaults write -app Skim SKAutoReloadFileUpdate -boolean true


For a long time I have been unsuccessful at using the crossref field in BibTeX properly for the situations where I need it most: when I have multiple chapters from a book consisting of an edited collection of articles in my database, and I want to provide book-level information through a cross-reference. I always got the error

Warning--empty booktitle in <your_favorite_key_here>

which was rather frustrating.

However, I’ve finally figured out that for a book or proceedings, you need to specify both title and booktitle fields in the book entry. Most of the time the contents of these fields will be identical.  Looking back at the Guide to LaTeX, this is a no-brainer, but it stumped me for a very long time. May this blog post save you similar agony!
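Concretely, the cross-referenced entry pair looks something like this (keys and field values are illustrative):

```bibtex
@incollection{chapter-key,
  author   = {Ann Author},
  title    = {A Chapter Title},
  pages    = {1--20},
  crossref = {collection-key}
}

@book{collection-key,
  editor    = {Ed Editor},
  title     = {The Collected Volume},
  booktitle = {The Collected Volume},
  publisher = {Some Press},
  year      = {2008}
}
```

The booktitle field duplicates title in the book entry, but without it the cross-referencing chapters trigger the empty-booktitle warning.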

Read about it at

After much searching I have finally found how to use BUGS (Bayesian Inference using Gibbs Sampling) on OS X. Thanks to Tom Palmer for this document, which explains how in a few easy steps.

Instructions taken from here:

  1. Click to select one of the files for which you’d like to change the associated default application
  2. Press Command+I to open the Get Info window
  3. Under Open with select the application you’d like to use
  4. Under Use this application to open all documents like this click on Change All… and follow the prompts

I wondered whether this was possible. Well, it is:

Unfortunately, he didn’t realize how firm the ground was he had his feet on.

– To Catch a Thief, about 25 minutes into the movie (spoken by Mrs. Stevens)

Who says that X -> X Conj X????

Either Senator Barack Obama will be the first African-American or Senator Hillary Rodham Clinton will be the first woman to win the presidential nomination of a major American political party.

(From the New York Times, 13 January 2008, online here)

One cool thing I just found out is that emacs has version control integration. Most things are bound to C-x v <key>. Some useful commands:

  • C-x v i: add
  • C-x v v: commit (followed by C-c C-c once you’ve written your comment)
  • C-x v l: view log
  • C-x v =: view diff with repository version

A more complete list can be found here.

I’m pretty psyched about my iPhone but there are a few simple things that would really improve it. Here’s a running list:

  1. A functioning To-Do list. I’ve been using Toodledo as a web-based to-do list, but it’s not the same thing as having it on your phone.
  2. The ability to send text messages to multiple recipients at once.
  3. Improvements to the Calendar functionality:
    1. Ability to customize event recurrence as in the OS X iCal application.
    2. Ability to distinguish among multiple calendars, as in iCal as well.
  4. A search function for email

UPDATE: as of mid-January 2008, we can now send text messages to multiple recipients. Good for Apple!

Looking at the preliminary program for the annual meeting of the Linguistic Society of America shows terrific representation by UC San Diego: nine presentations (out of a total 300) are by UCSD-affiliated researchers.  Among other things, this number is ahead of UC Santa Cruz (with four), MIT (five), UMass (also five), and UCLA (seven), tied with Cornell, and only one behind Stanford and Chicago.  UC Berkeley takes the cake, however, with 18.

Apart from R, I do most of my research programming in Java. There are two major reasons for this:

  1. The Stanford NLP lab group I was in during grad school programmed in Java, and old habits die hard.
  2. Java has some very nice IDEs — IntelliJ and Eclipse.

One thing that bugs me a lot about Java, though, is all the type casting that you need to do. I got very excited when Java 5 introduced generics, and I use them so much that I need few casts in my programs. But then I got exposed to Ocaml and type inference. What a language! What a feature! It is the closest thing I’ve ever seen to being able to say that if your program will compile, then it will work properly.

With the right IDE available, I would give up Java and adopt Ocaml right now — (1) above I can deal with, but (2) is hard to leave behind. Sadly, no language with type inference seems to have a functioning IDE with the same power as IntelliJ or Eclipse. Anyone out there reading this blog who feels inspired, go and work on Eclipse integration for Ocaml! Here is the list of critical IDE features I’d want, in order of importance:

  1. Safe rename
  2. Safe move
  3. Auto extraction of functions/expressions
  4. Quick navigating to types (or functions, modules, etc. in Ocaml)
  5. Outline views
  6. Integration with debugger/output (e.g., cross-referencing of error messages to code lines across windows)
  7. Auto-generation of code—we’d need a lot less of this in Ocaml or the like.

The best prospect right now may be ODT.

I just received a Goldtouch ergonomic USB split keyboard for the Macintosh. It’s really nice, but it has only one Control key and it’s tucked away in the lower left corner. For Emacs addicts such as myself, this is unacceptable.

I have dealt with the problem by remapping the right-hand Alt key to Control, but this is suboptimal. Goldtouch, if you’re reading this, please redesign your keyboard with a more traditional Mac Control/Alt/Command key layout (one of each on both sides of the space bar)!

I encountered Will Lewis’s Online Database of Interlinear Text (ODIN) for the first time today. What a terrific idea: electronic resources are scarce for the vast majority of the world’s languages, but annotated corpora for these languages are sitting under our noses in the form of glossed examples from linguistics papers. ODIN scours the web to collect these and turn them into a searchable database. There is more to be done — both in data cleaning and in improving the search capabilities — before this resource will be usable for serious work, but it’s already a nice way of getting to online papers that cover a particular set of languages and/or linguistic phenomena. Now what would be great is a manual submission facility, with all linguists encouraged to submit their papers…

May 18, 2007, The New York Times, “Violence Continues in Gaza”:

Two rockets fired from Gaza had landed near Sderot, and the government bused many residents to hotels in what it refused to call an evacuation.

Interesting how this headless relative clause doesn’t technically contain the information required to identify the referent, yet through pragmatic inference you can identify it anyway.

I write almost everything in LaTeX. In Windows I generally previewed the output as Postscript using Ghostview, but OS X is much more PDF-oriented. I’ve found this terrific little PDF viewer called Skim, which I like more than anything else for LaTeX output. The key reason is that it will automatically reload the PDF file any time it changes on disk. That means you can work with just two or three windows — the text editor, Skim, and maybe a terminal window — and always have a pretty up-to-date view of your document.

Better yet, you can automatically switch focus from a terminal window to Skim with Applescript:

osascript -e ‘tell application “Skim” to activate’

This means you can construct a shell script that automatically compiles your LaTeX document into PDF and then shifts into Skim so that you’re looking at a freshly updated version.   Pretty sweet!  The only thing left is to figure out how to bind this set of commands correctly into Emacs…
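A minimal version of such a script might look like this (the document name is a placeholder; assumes pdflatex and Skim are installed):

```shell
#!/bin/sh
# Compile the document to PDF; if compilation succeeds, bring Skim to the
# front so the freshly auto-reloaded PDF is immediately visible.
pdflatex mydoc.tex && osascript -e 'tell application "Skim" to activate'
```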

I just learned about flyspell for on-the-fly spell checking in LaTeX. It’s not perfect, but still pretty sweet!

One of those “howdoI”s I come across all the time is how to manipulate PDF files (e.g., concatenating several) from the command line without too much pain. The best tool I’ve found so far is Pdftk. My favorite command concatenates several PDFs together:

pdftk 1.pdf 2.pdf 3.pdf cat output 123.pdf
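pdftk handles other common manipulations with the same cat syntax; for example, extracting a page range (filenames hypothetical):

```shell
# Pull pages 2 through 5 out of input.pdf into excerpt.pdf
pdftk input.pdf cat 2-5 output excerpt.pdf
```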

For a while I had been living without color-coding in my OS X X11 xterm windows to distinguish directories from executable files from other files. A bit of playing around got me color-coding back, though. Try the following in your .bashrc:

# color
export TERM=xterm-color                  # a terminal type that advertises color support
export CLICOLOR=1                        # turn on colorized ls output (BSD/OS X ls)
export LSCOLORS=ExFxCxDxBxegedabagacad   # per-filetype colors (directories, symlinks, executables, ...)