Text and Data Mining Library Databases: Known Vendor Policies

Guidance for text and data mining subscription resources

Known Policies

This is not an exhaustive list, and we cannot promise this list will always have the most current information. Please contact us if you have questions or for information on other possible data sources.

Procedures and policies can change quickly; this list is meant only to give a sense of the different approaches vendors take to providing text mining data. As the library works with vendors, we will provide a synopsis of procedures and policies below.

Do not use web crawling or spidering bots to gather materials. These activities violate our licenses and can lead to the entire university losing access to a resource. Please contact a librarian, who will help you work with our vendors.

American Medical Association (JAMA)

Authorized users may register for limited rights to text and data mine online content licensed by Northeastern on the AMA site for non-commercial purposes only. Instructions on how to create the account and the license to which the researcher must agree may be found here: https://jamanetwork.com/pages/about-tdm

American Psychological Association (APA)

For APA E-books, extracting and manipulating information for research analysis is allowed. Please contact APA for guidance.

The PsycINFO database may not be mined in any way.

American Society of Civil Engineers (ASCE)

Any requests regarding data mining should be directed to ascelibrary@asce.org and are handled outside of the University’s licensed access.

Art & Architecture ePortal (Yale University Press)

Researchers must contact Yale University Press for permission to data mine this resource.

Association for Computing Machinery (ACM)

Products:  ACM Digital Library, ACM Transactions

Text & data mining may be available upon request. Researchers must first contact ACM with a project description to request access. If ACM agrees, the researcher will negotiate and sign a license directly with ACM.

BioCyc

Authorized Users may use and make copies of the Licensed Materials to perform and engage in computational analysis (including text and data mining) for academic research, scholarship, and other educational purposes, but not for commercial use. Authorized Users may also distribute, display, and otherwise use (publicly or otherwise) the results, provided that such results do not reproduce the whole or a substantial part of any of the Licensed Materials and that any full copies of Licensed Materials are promptly deleted once the computational analysis is complete.

Cambridge University Press Journals

Authorized Users may download, extract, store and index the Products for the purposes of TDM and may mount, load, integrate and analyze the results of TDM on their personal devices or Secure Network. Any copies of the Products accessed or reproduced by an Authorized User for the purposes of TDM must be deleted once the analysis of the results of the TDM is complete.
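
As a rough illustration of the deletion requirement above, the following Python sketch removes everything in a project folder except a folder of derived results. The function name, the folder layout, and the "results" folder name are hypothetical choices for this example, not part of any vendor license. Ordinary deletion like this may also not satisfy every license, so check the specific terms with a librarian.

```python
import shutil
import tempfile
from pathlib import Path


def purge_tdm_copies(workdir: Path, keep_dir_name: str = "results") -> list[str]:
    """Delete every top-level entry in workdir except the derived-results folder.

    Returns the names of the entries that were removed, in sorted order.
    """
    removed = []
    for entry in sorted(workdir.iterdir()):
        if entry.name == keep_dir_name:
            continue  # keep derived output such as counts or statistics
        if entry.is_dir():
            shutil.rmtree(entry)  # downloaded copies stored in a subfolder
        else:
            entry.unlink()  # loose files such as a single downloaded article
        removed.append(entry.name)
    return removed


if __name__ == "__main__":
    # Demonstration on a throwaway directory, not on real licensed content.
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp)
        (project / "downloads").mkdir()
        (project / "downloads" / "article.xml").write_text("<article/>")
        (project / "results").mkdir()
        (project / "results" / "term_counts.csv").write_text("term,count\n")
        print(purge_tdm_copies(project))
```

Keeping source copies and derived results in separate folders from the start of a project makes this kind of cleanup simpler. The results folder itself should not contain copies of the licensed content.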

Elsevier

Products:  ScienceDirect, Compendex/INSPEC, ClinicalKey, etc.

Working with Elsevier's APIs, text and data mining research can be performed on all content that Northeastern University has access to. Before proceeding with TDM projects, researchers must request API keys and more information at https://dev.elsevier.com/.
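
To give a sense of what an API-based request might look like once a key has been issued, here is a minimal Python sketch that only builds a request URL and headers and sends nothing. The endpoint path and the `X-ELS-APIKey` header name are assumptions to confirm against the documentation at https://dev.elsevier.com/. The function name, the DOI, and the key shown are hypothetical placeholders.

```python
from urllib.parse import quote

# Assumed base URL for article retrieval by DOI; confirm at dev.elsevier.com.
ARTICLE_BASE = "https://api.elsevier.com/content/article/doi/"


def build_article_request(doi: str, api_key: str) -> tuple[str, dict[str, str]]:
    """Return the URL and headers for one full-text request (nothing is sent)."""
    # Percent-encode unusual characters but keep the slash inside the DOI.
    url = ARTICLE_BASE + quote(doi.strip(), safe="/")
    headers = {
        "X-ELS-APIKey": api_key,  # assumed header name for the issued key
        "Accept": "text/xml",     # ask for XML rather than PDF
    }
    return url, headers


if __name__ == "__main__":
    # Placeholder DOI and key, for illustration only.
    url, headers = build_article_request("10.1016/j.example.2020.01.001", "YOUR-API-KEY")
    print(url)
```

The returned values can be passed to an HTTP client, for example `urllib.request.Request(url, headers=headers)`. Any rate limits or usage terms that come with the API key still apply.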

Gale Cengage

Products:  Artemis Primary Sources, 17th, 18th, and 19th C. British Newspapers, Times of London Archive, etc.

Gale Primary Source collections are available for text and data mining projects for a fee paid by the researcher. Contact a librarian for more information.

(Journal collections are currently prohibited from use in text and data mining projects.)

HathiTrust

HathiTrust has datasets available for download and APIs for text and data mining available upon request and registration. See the HathiTrust Research Center and HathiTrust Research Center Analytics for more information.

To obtain public domain datasets, see HathiTrust datasets for guidelines on proposal submissions and other agreements. The University has completed the distribution agreement with Google referenced in the instructions, so the Google public domain content is available to Northeastern researchers.

HistoryMakers

Data mining is permitted with prior written agreement of the publisher. Please contact a librarian for assistance.

IEEE

Products:  IEEE Xplore Digital Library

Northeastern users may perform text & data mining on our licensed IEEE content. IEEE must be notified in advance of the activity, at which time IEEE will decide how the Authorized User may access and use the product for text & data mining. Output may not be used for commercial purposes or to substitute for a subscription to IEEE content. Content may not be put on an externally facing server, nor may a third party be permitted to harvest or use the data. Please contact a librarian for assistance initiating any IEEE project.

JSTOR

Products:  Academic articles across disciplines

JSTOR (and some other) content can be accessed for text mining via Constellate.

JSTOR Data for Research allows researchers to create limited datasets for data mining projects. Large and full-text datasets may be provided upon request and agreement to terms. Information can be found at JSTOR Data for Research.

Karger Publishers

Contact orders@karger.ch for text and data mining inquiries.

Microbiology Society

May use the Licensed Material to perform and engage in text mining/data mining activities for academic research and other Educational Purposes and allow Authorized Users to mount, load and use the results in accordance with this License. Text-mining and data-mining output can be displayed and distributed on any electronic network, including the internet, provided that such output does not contain copies of copyright works owned or licensed to the Publisher.

MIT Press Direct

Authorized users may use the Subscribed Content to perform and engage in text and/or data-mining activities ("TDM") for academic research, scholarship, and other educational, noncommercial purposes; utilize and share the results of TDM in the Authorized User’s scholarly work with proper attribution; and make the results available for use by others for the foregoing purposes; all provided the Authorized User’s work product is not such as would substitute for the Subscribed Content, a substantial portion of any monograph included in the Subscribed Content, or any material subset of the Subscribed Content. TDM may be conducted only where special access has been granted by MIT Press; no extra fees shall be charged for such access. TDM conducted by the use of scripts or scraping in order to extract content from MIT Press web sites is not permitted. Downloaded metadata and Subscribed Content may be stored locally for the duration of a TDM project but must be permanently deleted upon completion of the project.

OCLC

Data mining is permitted but a University license will be required for each project.  Please contact the Library for assistance.

Ovid (Wolters Kluwer Health)

Text & data mining is allowed only for journals owned by Ovid’s affiliated company, Wolters Kluwer Health, Inc., or for which Wolters Kluwer Health, Inc. has an exclusive license to publish. Authorized Users may use the Products to perform and engage in text and/or data mining activities for academic research, scholarship, and other educational purposes.

Project Syndicate

Contact the publisher before any attempts to text or data mine.

ProQuest

Products:  PsycINFO, American Periodical Series Online, Dissertations and Theses Online, Academic Video Online, Ebook Central, etc.

For a significant fee, ProQuest may agree to work with a researcher on text and data mining projects after negotiation and agreement of terms. Contact ProQuest directly.

Royal Society Publishing

Authorized Users may use Text and Data Mining technologies to derive information from the Licensed Materials, meaning: download, extract and index information from the Licensed Materials to which the Authorized User has access under this License. Where required, mount, load and integrate the results on a server used for the Authorized User’s text-mining system and evaluate and interpret the Text and Data Mining Output for access and use by Authorized Users. The Authorized User shall ensure compliance with Publisher's Usage policies. Text and data mining may be undertaken on either locally loaded Licensed Materials or as mutually agreed. Electronic copies of the Licensed Materials may be locally stored for this purpose only during the lifetime of any TDM project.

SAGE

Authorized users may use the licensed material to perform and engage in text or data mining activities for legitimate academic research and other educational purposes. Non-educational use shall require SAGE's permission. Please contact a librarian for assistance.

SciFinder-n

Please contact a librarian for assistance.

Metadata from CAS Records may be downloaded for a Data Mining procedure in compliance with CAS or STN product licenses. The data may be used in CAS or STN Data Mining Tools. To use CAS Metadata in third party Data Mining Tools, you must download CAS Records through STN AnaVist by using a "Download feature", and agree to the terms and conditions for the use of this information. Any other uses of CAS data in Data Mining procedures are prohibited and require that you contact CAS.

Any Records retained from the task are subject to the limits specified in the SciFinder terms document. CAS Records retained as a result of the Data Mining process must be consistent with these Policies and display the STN AnaVist copyright. CAS Records or Metadata may not be used in Data Mining with non-CAS or non-STN tools unless the User or Information Professional has downloaded the Records via STN AnaVist. For uses of CAS Information outside of this procedure the User or Information Professional must contact CAS.

Springer

No registration is required for text mining; full-text content can be accessed by API based on the content's Digital Object Identifier. See Text and Data Mining at Springer Nature for more information. 

University of Michigan

An authorized user wishing to perform text & data mining against the University of Michigan e-book collections must contact the University of Michigan (fulcrum-info@umich.edu) for permission and so that it can facilitate the data extraction.

(Participating Institutions and their Authorized Users may, subject to prior notification and approval by the Licensor, using reasonable practices, engage in text processing, which is any kind of analysis of natural language text. The Licensor will make appropriate arrangements prior to the start of this activity to account for heavy usage and ensure continued access for the user. This may include but not be limited to a process by which information may be derived from text by identifying patterns and trends within natural language through text categorization, statistical pattern recognition, concept or sentiment extraction, and the association of natural language with indexing terms. Technology will not be used to hinder any uses granted under this section.)

Web of Science (Clarivate)

A User may request access to the API through the Clarivate Developer Portal: https://developer.clarivate.com/

This allows access to certain data fields (see the Web of Science & InCites APIs document). Any other data mining needs that cannot be met through the API must be requested directly from Clarivate.

Contact Information

Please contact your subject specialist or Jen Ferguson with questions.