Text/corpora research

I’ve been asked for some good examples of what can be done in this area, so I’m putting up a page with a few choice links as they occur to me.

Firstly, there are some interesting tools that let you experiment with text yourself.

Google’s Ngram Viewer allows you to view the frequency of occurrence of a phrase over time in the Google Books corpus, or to compare more than one phrase.
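
To give a feel for what the Ngram Viewer is plotting, here is a minimal Python sketch. It is my own toy version, not Google’s code. It counts a phrase in a small collection of dated texts and divides by the number of n-grams of the same length in each year, which is roughly how the Viewer normalises, so that years with more text don’t automatically score higher. The function name `phrase_frequency_by_year` and the sample data are invented for illustration.

```python
from collections import Counter


def phrase_frequency_by_year(docs, phrase):
    """For (year, text) pairs, return {year: occurrences of the phrase as a
    percentage of all n-grams of the same length in that year}.
    Hypothetical helper for illustration only."""
    target = tuple(phrase.lower().split())
    n = len(target)
    hits = Counter()
    totals = Counter()
    for year, text in docs:
        tokens = text.lower().split()
        # Number of n-grams of length n in this text (0 if the text is too short).
        totals[year] += max(len(tokens) - n + 1, 0)
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) == target:
                hits[year] += 1
    return {year: 100 * hits[year] / totals[year]
            for year in sorted(totals) if totals[year] > 0}


# Toy, made-up data purely to show the shape of the output.
docs = [
    (1900, "the motor car is new"),
    (1900, "a horse and cart"),
    (2000, "the motor car is everywhere the motor car"),
]
print(phrase_frequency_by_year(docs, "motor car"))
```

Comparing several phrases is just a matter of calling the function once per phrase over the same documents.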

Wordle has been used here to compare the frequencies of common words in two different corpora.
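
Wordle draws word clouds, but the comparison behind them is easy to approximate. This minimal Python sketch (hypothetical helper names, a deliberately tiny stop-word list) counts the words in two texts, converts the counts to relative frequencies, and lists the words whose share differs most between the two.

```python
import re
from collections import Counter

# A deliberately tiny, illustrative stop-word list; real tools use much longer ones.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it"}


def word_counts(text):
    """Count lower-cased words in text, ignoring the stop words above."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def biggest_differences(text_a, text_b, top=10):
    """Return (word, share_in_a, share_in_b) for the words whose relative
    frequency differs most between the two texts. Hypothetical helper."""
    a, b = word_counts(text_a), word_counts(text_b)
    total_a = max(sum(a.values()), 1)
    total_b = max(sum(b.values()), 1)
    rows = [(w, a[w] / total_a, b[w] / total_b) for w in set(a) | set(b)]
    # Largest absolute difference first; ties broken alphabetically for stable output.
    rows.sort(key=lambda r: (-abs(r[1] - r[2]), r[0]))
    return rows[:top]


print(biggest_differences("the whale and the sea and the ship",
                          "love letters and love songs"))
```

Using relative rather than raw counts matters when the two corpora differ in size, as they usually do.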

A favourite example of mine is the automated testing of an author’s gender from their text, reported on here. Texts by the thriller writer Dick Francis came out as female-authored – but he writes in collaboration with his wife.

Researchers are using computers to unpick the multiple authorship of parts of the Old Testament, assigning passages of text to different authors.
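
The Old Testament work uses its own methods, which I won’t try to reproduce. The general idea of computer-assisted attribution can be shown with a much cruder technique: build a profile of how often each candidate author uses common function words, then assign an unattributed passage to the author whose profile it most resembles, here measured by cosine similarity. The word list, function names and sample texts below are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical choice of function words; real studies choose their own features.
FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in", "that", "he", "it", "for"]


def profile(text):
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = max(len(words), 1)
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def attribute(passage, samples):
    """samples maps author -> text known to be theirs.
    Return the author whose function-word profile is closest to the passage's."""
    p = profile(passage)
    return max(samples, key=lambda author: cosine(p, profile(samples[author])))


samples = {
    "Author A": "the king went to the city of the north and the people followed",
    "Author B": "and he said that it was good and that he would go for it",
}
print(attribute("the road to the sea and the city of the plain", samples))
```

Function words are a common choice for this kind of test because authors use them largely unconsciously, whatever the subject matter.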

The vocabulary of Iris Murdoch’s novels declined strikingly as her dementia set in. Here is a study comparing her late writing to that of Agatha Christie and P. D. James.
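
Vocabulary richness can be measured in several ways, and I’m not claiming this is what the study used. One simple measure is the type-token ratio (distinct words divided by total words). Because it falls as texts get longer, this sketch averages it over fixed-size sliding windows, the moving-average type-token ratio, so that books of different lengths can be compared. The function name and the default window of 1,000 words are my own assumptions.

```python
import re


def moving_average_ttr(text, window=1000):
    """Moving-average type-token ratio: the share of distinct words in each
    run of `window` consecutive words, averaged over all such runs.
    Falls back to the plain type-token ratio for texts shorter than one window."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    if len(words) < window:
        return len(set(words)) / len(words)
    ratios = [len(set(words[i:i + window])) / window
              for i in range(len(words) - window + 1)]
    return sum(ratios) / len(ratios)


print(moving_average_ttr("a b a b", window=2))  # 1.0
print(moving_average_ttr("a a a a", window=2))  # 0.5
```

Run over the full text of an early and a late novel, a lower score for the later book would indicate a narrower working vocabulary.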

