Joe Loewenstein and our colleagues in the Humanities Digital Workshop have won a $780,000 digital infrastructure grant from the NSF to refine the metadata associated with EarlyPrint, the comprehensive WashU-Northwestern corpus of early English printed texts.
Together with colleagues at the Linguistic Data Consortium, they plan to parse every sentence in the corpus and to disambiguate each of its 1.65 billion words. This monumental machine-learning undertaking will also entail unambiguously identifying every person, place, and event mentioned in the corpus and linking each of these entities to knowledge bases such as the Oxford English Dictionary, Wikipedia, and the Oxford Dictionary of National Biography. The result will be a unified index to printed discourse in the first 2.3 centuries of English print culture.
Congratulations!