Text and Data Mining Library Databases: Known Vendor Policies

Guidance for text and data mining subscription resources

Known Policies

This is not an exhaustive list, and we cannot promise this list will always have the most current information. Please contact us if you have questions or for information on other possible data sources.

Procedures and policies can change quickly; this list is intended only to give a sense of the different approaches to obtaining text mining data. As the library works with vendors, we will provide a synopsis of their procedures and policies below.

DO NOT use web-crawling or spidering bots to gather materials. These activities violate our licenses and can lead to the entire university losing access to a resource. Please contact a librarian, who will help you work with our vendors.

Contact Information

For questions, please contact:

  • Amanda Rust, Assistant Director, Digital Scholarship Group and Digital Humanities Librarian, or
  • Jen Ferguson, Research Data Management Librarian

American Medical Association (JAMA)

Authorized users may register for limited rights to text and data mine online content licensed by Northeastern on the AMA site for non-commercial purposes only. Instructions on how to create the account and the license to which the researcher must agree may be found here:

American Psychological Association (APA)

For APA E-books, extracting and manipulating information for research analysis is allowed. Please contact APA for guidance.

The PsycINFO database may not be mined in any way.

Association for Computing Machinery (ACM)

Products:  ACM Digital Library, ACM Transactions

Text & data mining may be available upon request. The researcher must first contact ACM with a project description to request access. If ACM agrees, the researcher will negotiate and sign a license directly with ACM.


Elsevier

Products:  ScienceDirect, Compendex/INSPEC, ClinicalKey, etc.

Working with Elsevier's APIs, text and data mining research can be performed on all content that Northeastern University has access to. Before proceeding with TDM projects, researchers must request API keys and more information at

Gale Cengage

Products:  Artemis Primary Sources, 17th, 18th, and 19th C. British Newspapers, Times of London Archive, etc.

Gale Primary Source collections are available for text and data mining projects for a fee paid by the researcher. Contact a librarian for more information.

(Journal collections are currently prohibited from use in text and data mining projects.)


HathiTrust

HathiTrust offers datasets for download, and APIs for text and data mining are available upon request and registration. See the HathiTrust Research Center and HathiTrust Research Center Analytics for more information.


Data mining is permitted with prior written agreement of the publisher. Please contact a librarian for assistance.


IEEE

Products:  IEEE Xplore Digital Library

Northeastern users may perform text & data mining on our licensed IEEE content. IEEE must be notified in advance of the activity, at which time IEEE will decide how the Authorized User may access and use the product for text & data mining. Output may not be used for commercial purposes or as a substitute for a subscription to IEEE content. Content may not be put on an externally facing server, nor may a third party be permitted to harvest or use the data. Please contact a librarian for assistance initiating any IEEE project.


JSTOR

Products:  Academic articles across disciplines

JSTOR Data for Research allows researchers to create limited datasets for data mining projects. Large and full-text datasets may be provided upon request and agreement to terms. Information can be found at JSTOR Data for Research.


Data mining is permitted but a University license will be required for each project.  Please contact the Library for assistance.


ProQuest

Products:  PsycINFO, American Periodical Series Online, Dissertations and Theses Online, Academic Video Online, Ebook Central, etc.

For a significant fee, ProQuest may agree to work with a researcher on text and data mining projects after negotiation and agreement on terms. Contact ProQuest directly.


SAGE

Authorized users may use the licensed material to perform and engage in text or data mining activities for legitimate academic research and other educational purposes. Non-educational use requires SAGE's permission. Please contact a librarian for assistance.


Springer Nature

No registration is required for text mining; full-text content can be accessed by API based on the content's Digital Object Identifier. See Text and Data Mining at Springer Nature for more information.
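
As a rough illustration of what "accessed by API based on the content's Digital Object Identifier" can look like in practice, the sketch below turns a DOI into a single request URL. The endpoint `https://api.example.org/fulltext`, the `doi` query parameter, and the function name `build_fulltext_url` are hypothetical placeholders, not the publisher's actual API; take the real base URL and parameters from the vendor's own text and data mining documentation. The sketch only builds the URL and does not send any request.

```python
from urllib.parse import urlencode

# Hypothetical placeholder endpoint (an assumption for illustration only).
# Replace it with the base URL given in the vendor's TDM documentation.
BASE_URL = "https://api.example.org/fulltext"


def build_fulltext_url(doi: str, base_url: str = BASE_URL) -> str:
    """Return a request URL for one article, identified by its DOI."""
    doi = doi.strip()
    # Accept DOIs pasted as resolver links or with a "doi:" prefix.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Basic sanity check: DOIs start with "10." and contain a slash.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {doi!r}")
    # urlencode percent-encodes the slash so the DOI is a single parameter value.
    return f"{base_url}?{urlencode({'doi': doi})}"


if __name__ == "__main__":
    # Made-up DOI used only to show the URL shape.
    print(build_fulltext_url("doi:10.1007/s00000-000-0000-0"))
```

Requesting articles one at a time by DOI through a documented API, rather than crawling a vendor's website, is what keeps this kind of access within the license terms described at the top of this guide.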