LingPipe Blog
Interannotator Agreement for Chunking Tasks Like Named Entities and Phrases
From the Emailbox Krishna writes, I have a question about using the chunking evaluation class for inter annotation agreement: how can you use it when the annotators might have missing chunks, i.e., if one of the files contains more chunks than the other? The answer’s not immediately obvious because the usual application of interannotator [...]
Categories: Blogroll
Standard Output Ruins Everything!
The title is a paraphrase from Dirk Eddelbuettel on the Rcpp mailing list (an interface tool for R and C++), but the lesson also applies to Java. Don’t Write to Standard Output! One of the first lessons of writing an API (as opposed to something that only runs from the command line) is that [...]
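As a rough illustration of the "don't write to standard output" lesson in Java, here is a minimal sketch. The class `TokenCounter` and its methods are hypothetical names made up for this example, not anything from LingPipe or from the post. The idea is that library code either returns data or writes to a stream the caller hands in, so the caller decides where output goes.

```java
import java.io.PrintWriter;
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical example class: nothing in here refers to System.out.
final class TokenCounter {
    // TreeMap keeps tokens sorted so the report order is deterministic.
    private final Map<String, Integer> counts = new TreeMap<>();

    // Record one occurrence of a token.
    void add(String token) {
        counts.merge(token, 1, Integer::sum);
    }

    // Preferred for API use: hand the data back and let the caller format it.
    Map<String, Integer> counts() {
        return Collections.unmodifiableMap(counts);
    }

    // If text output is needed, write to a destination chosen by the caller.
    void report(PrintWriter out) {
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            out.print(e.getKey());
            out.print('\t');
            out.print(e.getValue());
            out.print('\n');
        }
        out.flush();
    }
}
```

A command-line wrapper can still print by passing `new PrintWriter(System.out, true)`, while a server or a test can pass a `StringWriter`-backed writer instead.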
Categories: Blogroll
Mavandadi et al. (2012) Distributed Medical Image Analysis and Diagnosis through Crowd-Sourced Games: A Malaria Case Study
I found a link from Slashdot of all places to this forthcoming paper: Mavandadi, Sam, Stoyan Dimitrov, Steve Feng, Frank Yu, Uzair Sikora, Oguzhan Yaglidere, Swati Padmanabhan, Karin Nielsen, and Aydogan Ozcan. (2012) Distributed Medical Image Analysis and Diagnosis through Crowd-Sourced Games: A Malaria Case Study. PLoS ONE. The main body of the paper is [...]
Categories: Blogroll
Quick iPad 3 Review: Wow!
My iPad 3 arrived Friday afternoon. I’ve been using the iPad 1 for the past year and a half or so for all of my technical reading. Mirroring My Old iPad After synching my iPad 1 with my Macbook Air, when I plugged in the iPad 3 for the first time, it gave me the [...]
Categories: Blogroll
All Aboard for Quasi-Productive Stemming
One of the words Becky and I are having annotated for word sense (collecting 25 non-spam Mechanical Turk responses per word) is the nominal (noun) use of “board”. One of the examples was drawn from a text with a typo where “aboard” was broken into two words, “a board”. I looked at the example, and [...]
Categories: Blogroll
Natural Language Generation for Spam
In a recent comment on an earlier post on licensing, we got this spam comment. I know it’s spam because of the links and the URL. It makes faculty adage what humans can do with it. We’ve approved to beacon bright of that with LingPipe’s authorization — we artlessly can’t allow the attorneys to adapt [...]
Categories: Blogroll
Cross Validation vs. Inter-Annotator Agreement
Time, Negation, and Clinical Events Mitzi’s been annotating clinical notes for time expressions, negations, and a couple other classes of clinically relevant phrases like diagnoses and treatments (I just can’t remember exactly which!). This is part of the project she’s working on with Noemie Elhadad, a professor in the Department of Biomedical Informatics at Columbia. [...]
Categories: Blogroll
Settles (2011): Closing the Loop: Fast, Interactive Semi-Supervised Annotation with Queries on Features and Instances
Whew, that was a long title. Luckily, the paper’s worth it: Settles, Burr. 2011. Closing the Loop: Fast, Interactive Semi-Supervised Annotation With Queries on Features and Instances. EMNLP. It’s a paper that shows you how to use active learning to build a reasonably high-performance classifier with only minutes of user effort. Very cool and right up [...]
Categories: Blogroll
How to Prevent Overflow and Underflow in Logistic Regression
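Logistic regression is a perilous undertaking from the floating-point arithmetic perspective. Logistic Regression Model The basic model of a binary outcome $y_n \in \{0, 1\}$ with predictor or feature (row) vector $x_n \in \mathbb{R}^K$ and coefficient (column) vector $\beta \in \mathbb{R}^K$ is $y_n \sim \mathrm{Bernoulli}(\mathrm{logit}^{-1}(x_n \beta))$, where the logistic sigmoid (i.e., the inverse logit function) is defined by $\mathrm{logit}^{-1}(\alpha) = 1 / (1 + \exp(-\alpha))$ and where the Bernoulli distribution is defined over support $\{0, 1\}$ so that [...]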
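To make the floating-point hazard concrete, here is a minimal Java sketch of one common remedy. It is not necessarily the approach the full post takes, and the class name `StableLogistic` and its method names are assumptions for this example. The log of the logistic sigmoid is computed with `Math.log1p` and a branch on the sign of the input, so that `Math.exp` is only ever called on a non-positive argument and cannot overflow.

```java
// Hypothetical helper class illustrating numerically stable logistic computations.
final class StableLogistic {
    private StableLogistic() {}

    // Inverse logit 1 / (1 + exp(-x)), arranged so exp() never overflows.
    static double invLogit(double x) {
        if (x >= 0.0) {
            return 1.0 / (1.0 + Math.exp(-x));
        }
        double e = Math.exp(x); // x < 0, so e is in [0, 1)
        return e / (1.0 + e);
    }

    // log(invLogit(x)) without forming invLogit(x) first.
    // For x >= 0: -log1p(exp(-x)); for x < 0: x - log1p(exp(x)).
    static double logInvLogit(double x) {
        if (x >= 0.0) {
            return -Math.log1p(Math.exp(-x));
        }
        return x - Math.log1p(Math.exp(x));
    }

    // log(1 - invLogit(x)), using the identity 1 - invLogit(x) = invLogit(-x).
    static double log1mInvLogit(double x) {
        return logInvLogit(-x);
    }

    // Bernoulli log likelihood of outcome y (0 or 1) given linear predictor x.
    static double logLikelihood(int y, double x) {
        return y == 1 ? logInvLogit(x) : log1mInvLogit(x);
    }
}
```

For comparison, the naive expression `Math.log(1.0 / (1.0 + Math.exp(1000.0)))` evaluates to negative infinity because `Math.exp(1000.0)` overflows, whereas `StableLogistic.logInvLogit(-1000.0)` returns a finite value.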
Categories: Blogroll



