Re Publish data

Thanks, Eric, for setting this up.

This is partly a test and partly a follow-up to Paul de Lacy’s query about data availability.

I’ve been facing a similar issue lately and think the means of addressing it depends on the medium of the data. For transcribed data, it may be pretty simple to make it available online, but not for audio or video.

I’ve been involved in compiling a database of English nicknames. It’s transcribed data: all the representations are orthographic, and phonemic ones are not too difficult to extract from them. It’s getting really big (2000+), approaching the size of a small dictionary or phrasebook. We decided to make it public since it has some variability as well as counterexamples to apparently exceptionless generalizations. (Also, it is unlikely to be published in its full form anyway.)

I posted it on my own site as a password-protected PDF, so anyone who wants it needs to email me – that way I can kind of track who’s got it, and give the reader additional advice depending on whether they’re a linguist. (No one’s asked yet.)

It was pretty simple to generate, and I’d probably be glad to put it on an archive (it’s only 217K), but I don’t know if that should precede or follow submission of the work for review. But I think it makes sense that if the data are new, crucial, and unpublished, they should be available anyway (and perhaps cited with caveats to that effect).

Now as for video … fully sharing video data in any way seems just about impossible, unless you put everything on VHS and mail it. Uncompressed video data runs about 4.2 min for 1 GB (treating each frame as a 100%-quality JPEG, with no between-frame compression). If you try compressing the data you run the risk of compromising it – not enough to be noticeable in real time, but enough that it could affect frame-by-frame and pixel-by-pixel decisions.

We recently ran a study of 10 subjects for about 6 minutes of video footage each – roughly 15 GB of data, which suddenly got our tech support intrigued. Hypothetically I’d be willing to share the data, but wouldn’t want to burn 4 data DVDs, and would not want to post it online (nor have anyone else do so) because of the space and bandwidth needs. 1 GB on a busy T1 can take 20 minutes to an hour to transfer. I guess the only feasible means of sharing the data would be to compress it onto a single DVD for home-theatre play, which sounds like a fun project, but would tie up the burner even longer. Plus, you can’t pull data directly off that kind of DVD. It’s also not much different from the VHS idea.
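For anyone who wants to redo the back-of-the-envelope arithmetic, here is a rough Python sketch. It is only an illustration: the 4.2 minutes per gigabyte comes from the figures above, while the DVD capacity (4.7 GB, single-layer) and the T1 rate (the nominal 1.544 Mbit/s) are assumptions, and a gigabyte is taken to be 10^9 bytes. The function names are made up for this example.

```python
import math

# All constants are assumptions: the first three come from the post,
# the last two are nominal hardware figures.
MIN_PER_GB = 4.2        # minutes of uncompressed footage per gigabyte
SUBJECTS = 10
MIN_PER_SUBJECT = 6
DVD_GB = 4.7            # nominal capacity of a single-layer data DVD
T1_MBIT_PER_S = 1.544   # nominal T1 line rate; a busy line delivers less


def footage_gb(minutes):
    """Gigabytes needed for the given minutes of footage."""
    return minutes / MIN_PER_GB


def dvds_needed(gigabytes):
    """Whole data DVDs needed to hold that many gigabytes."""
    return math.ceil(gigabytes / DVD_GB)


def transfer_minutes(gigabytes, mbit_per_s=T1_MBIT_PER_S):
    """Minutes to send the data at the given line rate."""
    megabits = gigabytes * 8000  # 1 GB = 10^9 bytes = 8000 megabits
    return megabits / mbit_per_s / 60


def playback_mbit_per_s():
    """Data rate consumed while simply watching the footage."""
    return 8000 / (MIN_PER_GB * 60)


total = footage_gb(SUBJECTS * MIN_PER_SUBJECT)
print(f"{total:.1f} GB on {dvds_needed(total)} DVDs")
print(f"1 GB over a T1: {transfer_minutes(1):.0f} min")
print(f"playback rate: {playback_mbit_per_s():.1f} Mbit/s")
```

Under these assumptions the study comes to about 14.3 GB on 4 discs. Watching the footage consumes roughly 32 Mbit/s, about twenty times the nominal T1 rate. At that nominal rate a full gigabyte takes closer to 86 minutes, so the transfer estimate above may be on the optimistic side.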

(Suddenly it seems weird that it would take longer to transmit video data than to view it – does this mean video cards are faster than disc drives?)

Bob K
