Downloadable Text Data
This is only a selective list; there are many open access sources of downloadable data. With particular thanks to the Carnegie Mellon Libraries Guide to Text and Data Mining.
Caselaw Access Project
Based at Harvard University, an amazing and fully downloadable database of 360 years of United States caselaw. Access via API or bulk data download.
Over 250,000 full-text, peer-reviewed BioMed Central articles are available for text and data mining.
A substantive list of open data repositories across many disciplines.
English Broadside Ballad Archive
16th and 17th century English broadside ballads, hosted and provided by the University of California at Santa Barbara.
Features content in many file formats, including figures, datasets, media, papers, posters, presentations and filesets.
Folger Digital Texts
Shakespeare's plays, sonnets, and poems, downloadable in multiple formats. Provided by the Folger Shakespeare Library.
Hathi Trust: Public Domain Data
Northeastern has a Hathi Trust membership, which gives access to all data, but Hathi Trust also provides a subset of public domain items for any researcher.
Over 8 million ebooks and texts in the public domain.
JSTOR Data for Research
JSTOR provides some freely available data and tools, including those for visualization and bulk downloads. See the FAQ for more information.
Public Library of Science (PLOS)
Data available via two APIs: one for search (bring content into other web applications) and one for Article-Level Metrics (usage stats, citation counts, social media coverage).
Text Creation Partnership
16th, 17th, and 18th century English works, transcribed and encoded by libraries and released to the public domain. Hosted by the University of Oxford; contact the TCP for a bulk download. Includes the new EEBO
University of Pennsylvania Online Books
"Listing over 2 million free books on the Web", not necessarily with bulk download features but a good source for textual data nonetheless.