For those interested in performing NLP on source code, in particular Javadoc comments, we just released a Doclet [5] at the NLP Frameworks workshop [6] last week.
Its main feature is that it generates an XML corpus from Java source code, optimised for processing in an NLP framework (GATE [7] in our case, but it should work with any framework that accepts XML as input).
For more information and the download, have a look at the Web page [5]; for details, background, and an application example, see our paper [1].
We currently use it for the automatic quality assessment of source code comments, but there are obviously many other use cases as well.
