MIT Libraries guide to freely available text resources

MIT Libraries has created a detailed guide to freely available resources for text and data mining. Expand the entries in their 'Freely available resources and tools' list to view additional information such as:

  • Description/coverage notes
  • Means of access
  • Access restrictions, if any
  • Limitations
  • Contact for technical questions
  • Additional information links

English-Corpora.org

English-Corpora.org hosts a diverse collection of full-text data sets, from news content to the full text of Wikipedia to soap opera transcripts. Though some content is still under copyright, English-Corpora.org removes 5% of the text and argues that this process transforms the content and eliminates its market value.

Downloadable Text Data

This is a selective list; many other open access sources of downloadable data exist. Particular thanks go to the Carnegie Mellon Libraries Guide to Text and Data Mining.