MIT Libraries guide to freely available text resources
MIT Libraries has created a detailed guide to freely available resources for text and data mining. Expand the entries in their 'Freely available resources and tools' list to view additional information such as:
- Description/coverage notes
- Means of access
- Access restrictions, if any
- Contact for technical questions
- Additional information links
English-Corpora.org hosts a diverse collection of full-text data sets, from news content to the full text of Wikipedia to soap opera transcripts. Though some content is still under copyright, English-Corpora removes 5% of the text and argues that this process transforms the content and eliminates its market value.