Known Policies

This is not an exhaustive list, and we cannot promise this list will always have the most current information. Please contact us if you have questions or for information on other possible data sources.

Procedures and policies can change quickly; this list is intended only to give a sense of the different approaches to obtaining text mining data. As the library works with vendors, we will provide a synopsis of procedures and policies below.

Do not use web crawling or spidering bots to gather materials. These activities violate our licenses and can lead to the entire university losing access to a resource. Please contact a librarian, who will help you work with our vendors.

American Medical Association (JAMA)

Authorized users may register for limited rights to text and data mine online content licensed by Northeastern on the AMA site for non-commercial purposes only. Instructions on how to create the account and the license to which the researcher must agree may be found here: https://jamanetwork.com/pages/about-tdm

American Psychological Association (APA)

For e-books and handbooks, download rights include extraction and manipulation of information for the purpose of illustration, explanation, example, comment, criticism, teaching, research or analysis. Please contact APA for guidance.

The PsycINFO database may not be mined in any way.

American Society of Civil Engineers (ASCE)

Any requests regarding data mining should be directed to ascelibrary@asce.org and are handled outside of the University’s licensed access.

Art & Architecture ePortal (Yale University Press)

Researchers must contact Yale University Press for permission to data mine this resource.

Association for Computing Machinery (ACM)

Authorized Users may use the Licensed Materials to perform and engage in text and/or data mining activities for academic research, scholarship, and other educational purposes and may utilize and share the results of text and/or data mining (“TDM Output”) with research collaborators, and in their scholarly work and make the results available for use by others, so long as the purpose is not to create a product for use by third parties that would substitute for the Licensed Materials. Authorized Users may include TDM Output as part of original works of scholarship, e.g. articles that describe, analyze, and interpret research, presentations at academic conferences, and inclusion in an academic thesis. ACM will cooperate with Universities and Authorized Users as reasonably necessary in making the Licensed Materials available in a manner and form most useful to the Authorized Users for computational access. ACM shall provide this access without any additional fees. Universities and Authorized Users shall be able to conduct TDM by an API provided by the vendor or a mutually agreed third-party provider.

Please contact the library before beginning any TDM projects with ACM so we can work with the vendor to obtain an API key.

Biochemical Society Journals

Authorized users are permitted to download and make copies of the whole or any parts of the Licensed Material for the purposes of, and to perform and engage in computational analysis (including text and data mining) using the Licensed Material for the purpose of research and other Educational and Research Purposes but not for Commercial Use, and to permit Authorised Users to distribute and display and otherwise use (publicly or otherwise), other than for Commercial Use, the results, provided that such results do not reproduce the whole or a substantial part of any Licensed Content. Copies of Licensed Content made under this Clause shall be deleted promptly after the computational analysis has been completed. The Publisher hereby acknowledges that any copyright and other intellectual property rights, of whatever nature, arising from any computational analysis (including any text mining/data mining) of the Licensed Material referred to above shall, as between the Institution and Authorised User on the one hand, and the Publisher (and any licensor of the Publisher or other rights holder in the Licensed Material), on the other, be the property of the relevant Authorised Users or the Institution, as the case may be.

BioCyc

Authorized Users may use and make copies of the Licensed Materials to perform and engage in computational analysis (including text and data mining) for academic research, scholarship, and other educational purposes but not for commercial use, and to permit Authorized Users to distribute, display, and otherwise use (publicly or otherwise) the results, provided that such results do not reproduce the whole or a substantial part of any of the Licensed Materials and any full copies of Licensed Materials are promptly deleted once the computational analysis is complete.

BioOne

Authorized Users may use Text and Data Mining (TDM) technologies to derive information from the Licensed Materials for academic research, scholarship, and other educational purposes, utilize and share the results of text and/or data mining in their scholarly work, and make the results available for use by others, so long as the purpose is not to create a product for use by third parties that would substitute for the Licensed Materials.

Cambridge University Press Journals

Authorized Users may download, extract, store and index the Products for the purposes of TDM and may mount, load, integrate and analyze the results of TDM on their personal devices or Secure Network. Any copies of the Products accessed or reproduced by an Authorized User for the purposes of TDM must be deleted once the analysis of the results of the TDM is complete.

Company of Biologists

The Licensee and the Authorized Users may use Licensed Materials for text and data mining, namely downloading, extracting and indexing information from the Licensed Materials, for text and data mining purposes and mounting, loading and integrating the results of text and data mining on Licensee’s Secure Network used for the Licensee’s text and data mining systems, as well as evaluating and interpreting the text and data mining output, for access and use by Authorised Users only. Text and data mining may be undertaken only on Licensed Materials within the Licensee’s Secure Network. Such text and data mining is permitted for internal research purposes only, and the outputs from such text and data mining may not be sold, licensed, assigned, transferred, or disposed of in any way, or otherwise disclosed to any third party.

Elsevier

Products: ScienceDirect, Compendex/INSPEC, ClinicalKey, etc.

Working with Elsevier's APIs, text and data mining research can be performed on all content that Northeastern University has access to. Before proceeding with TDM projects, researchers must request API keys and more information at https://dev.elsevier.com/.

Upon completion of the relevant text and data mining project, the user will immediately and permanently delete all copies (including back-ups or otherwise) of the ScienceDirect Subscribed Product dataset.
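Several of the licenses on this page (Elsevier, Cambridge University Press, MIT Press Direct, and others) require that local copies be deleted when a project ends. The following is a minimal sketch in Python of one way a researcher might clean up, assuming all downloaded content and backups live in directories the researcher controls. The function name and any directory names are hypothetical, and removing files this way does not reach copies held in cloud sync services, external drives, or institutional backups, which must be handled separately.

```python
import shutil
from pathlib import Path


def delete_tdm_copies(project_dirs):
    """Delete each listed directory tree and return the paths that were removed.

    project_dirs: directories holding downloaded licensed content,
    including any backup copies (hypothetical layout, adjust as needed).
    """
    removed = []
    for d in project_dirs:
        path = Path(d)
        if path.exists():
            # Remove the directory and everything inside it.
            shutil.rmtree(path)
            removed.append(str(path))

    # Verify that nothing was left behind before reporting success.
    leftover = [str(Path(d)) for d in project_dirs if Path(d).exists()]
    if leftover:
        raise RuntimeError(f"Could not delete: {leftover}")
    return removed
```

A call might look like delete_tdm_copies(["sciencedirect_corpus", "sciencedirect_corpus_backup"]), where both directory names are placeholders. Directories that do not exist are skipped, so the function can be run again safely.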

Gale Cengage

Products: Artemis Primary Sources, 17th, 18th, and 19th C. British Newspapers, Times of London Archive, etc.

Gale Primary Source collections are available for text and data mining projects for a fee paid by the researcher. Contact a librarian for more information.

(Journal collections are currently prohibited from use in text and data mining projects.)

HathiTrust

HathiTrust has datasets available for download and APIs for text and data mining available upon request and registration. See the HathiTrust Research Center and HathiTrust Research Center Analytics for more information.

To obtain public domain datasets, see HathiTrust datasets for guidelines on proposal submissions and other agreements. The University has completed the distribution agreement with Google referenced in the instructions so the Google public domain content is available to Northeastern researchers.

HistoryMakers

Data mining is permitted with prior written agreement of the publisher. Please contact a librarian for assistance.

IEEE

Authorized Users may perform and engage in TDM provided that Licensee notifies IEEE prior to performance or engagement of each TDM activity or operation. Licensee and IEEE will mutually determine how Licensee and its Authorized Users may access and use the Licensed Products to perform or engage in TDM. Licensee and its Authorized Users may make available or share and/or utilize TDM Output so long as the results are not used for commercial purposes or to substitute for the Licensed Products. Authorized Users may NOT utilize the TDM Output to enhance institutional or subject repositories in a way that would compete with the value of the final peer-reviewed journal article, or have the potential to substitute and/or replicate any other existing IEEE product, service and/or solution; make the results of any TDM Output available on an externally facing server or website other than as permitted above; or permit a third party to harvest any TDM Output. Please contact a librarian for assistance initiating any IEEE project.

JSTOR

Products: Academic articles across disciplines

JSTOR (and some other) content can be accessed for text mining via Constellate.

JSTOR Data for Research allows researchers to create limited datasets for data mining projects. Large and full-text datasets may be provided upon request and agreement to terms. Information can be found at JSTOR Data for Research.

Karger Publishers

Contact orders@karger.ch for text and data mining inquiries.

Knovel Library

Text and data mining is not allowed.

Microbiology Society

The Licensee may use the Licensed Material to perform and engage in text mining/data mining activities for academic research and other Educational Purposes and allow Authorized Users to mount, load and use the results in accordance with this License. Text-mining and data-mining output can be displayed and distributed on any electronic network, including the internet, provided that such output does not contain copies of copyright works owned or licensed to the Publisher.

MIT Press Direct

Authorized users may use the Subscribed Content to perform and engage in text and/or data-mining activities ("TDM") for academic research, scholarship, and other educational, noncommercial purposes; utilize and share the results of TDM in the Authorized User’s scholarly work with proper attribution; and make the results available for use by others for the foregoing purposes; all provided the Authorized User’s work product is not such as would substitute for the Subscribed Content, a substantial portion of any monograph included in the Subscribed Content, or any material subset of the Subscribed Content. TDM may be conducted only where special access has been granted by MIT Press; no extra fees shall be charged for such access. TDM conducted by the use of scripts or scraping in order to extract content from MIT Press web sites is not permitted. Downloaded metadata and Subscribed Content may be stored locally for the duration of a TDM project but must be permanently deleted upon completion of the project.

Nikkei Asia

Content is not encoded, indexed, or cataloged. Institution may, directly or through a vendor, provide indexing and data mining work to aid in searching Content; provided, however, that Institution shall disclose to Nikkei prior to commencement of work the full scope of the indexing and data mining Institution plans to perform, ensure that performance of such services shall not adversely affect Nikkei in any manner, bear responsibility for all costs and any damage sustained by Nikkei as the result of such undertaking, and obtain written approval from Nikkei in advance before undertaking any indexing or data mining. Nikkei, may, in its sole discretion, deny all such requests from Institution.

OCLC

Data mining is permitted, but a University license will be required for each project. Please contact the Library for assistance.

Ovid (Wolters Kluwer Health)

Text and data mining is allowed only for journals owned by Ovid’s affiliated company, Wolters Kluwer Health, Inc., or for which Wolters Kluwer Health, Inc. has an exclusive license to publish. Authorized Users may use the Products to perform and engage in text and/or data mining activities for academic research, scholarship, and other educational purposes.

Policy Commons

Authorized users may use the Product to perform and engage in text and/or data mining activities for academic research, scholarship, and other educational purposes, utilize and share the results of text and/or data mining in their scholarly work, and make the results available for use by others, provided such results do not compete with or compromise the salability or value of the Product. Licensor will cooperate with licensee and Authorized Users as reasonably necessary in making the Product available in a manner and form most useful to the Authorized User. If licensee or Authorized Users request the licensor to deliver or otherwise prepare copies of the Product for text and data mining purposes, any fees charged by licensor shall be solely for preparing and delivering such copies on a time and materials basis.

Project Syndicate

Contact the publisher before any attempts to text or data mine.

ProQuest TDM Studio

Data available for text mining includes the ProQuest databases and publications that the library subscribes to: current and historical newspapers and wire services, dissertations and theses, scholarly journals, trade publications, and primary sources.

Registration with a Northeastern email address is required.

Royal Society Publishing

Authorized Users may use Text and Data Mining technologies to derive information from the Licensed Materials, meaning: download, extract and index information from the Licensed Materials to which the Authorized User has access under this License. Where required, mount, load and integrate the results on a server used for the Authorized User’s text-mining system and evaluate and interpret the Text and Data Mining Output for access and use by Authorized Users. The Authorized User shall ensure compliance with Publisher's Usage policies. Text and data mining may be undertaken on either locally loaded Licensed Materials or as mutually agreed. Electronic copies of the Licensed Materials may be locally stored for this purpose only during the lifetime of any TDM project.

SAGE

Authorized Users shall be permitted to extract or use information contained in the Product for Educational Purposes, including, but not limited to, text and data mining, extraction and manipulation of information for the purposes of illustration, explanation, example, comment, criticism, teaching, research, or analysis. SAGE shall provide either online at its web site, through a third-party service (including, but not limited to, CLOCKSS or Portico), or to Institutions in mutually agreed physical media, one full copy of the Licensed Materials in raw data format. The raw data may be used by Institutions and Authorized Users to perform text, image, and/or data mining functions and algorithms for academic research and other educational purposes in accordance with the terms of this Agreement.

Researchers should contact the library or SAGE directly for assistance.

SciFinder-n

Please contact a librarian for assistance.

Metadata from CAS Records may be downloaded for a Data Mining procedure in compliance with CAS or STN product licenses. The data may be used in CAS or STN Data Mining Tools. To use CAS Metadata in third party Data Mining Tools, you must download CAS Records through STN AnaVist by using a "Download feature", and agree to the terms and conditions for the use of this information. Any other uses of CAS data in Data Mining procedures are prohibited and require that you contact CAS.

Any Records retained from the task are subject to the limits specified in the SciFinder terms document. CAS Records retained as a result of the Data Mining process must be consistent with these Policies and display the STN AnaVist copyright. CAS Records or Metadata may not be used in Data Mining with non-CAS or non-STN tools unless the User or Information Professional has downloaded the Records via STN AnaVist. For uses of CAS Information outside of this procedure the User or Information Professional must contact CAS.

Springer

No registration is required for text mining; full-text content can be accessed by API based on the content's Digital Object Identifier. See Text and Data Mining at Springer Nature for more information.

University of Michigan

An authorized user wishing to perform text and data mining against the University of Michigan e-book collections must contact the University of Michigan (fulcrum-info@umich.edu) for permission and so that they can facilitate the data extraction.

(Participating Institutions and their Authorized Users may, subject to prior notification and approval by the Licensor, using reasonable practices, engage in text processing, which is any kind of analysis of natural language text. The Licensor will make appropriate arrangements prior to the start of this activity to account for heavy usage and ensure continued access for the user. This may include but not be limited to a process by which information may be derived from text by identifying patterns and trends within natural language through text categorization, statistical pattern recognition, concept or sentiment extraction, and the association of natural language with indexing terms. Technology will not be used to hinder any uses granted under this section.)

Web of Science (Clarivate)

A User may request access to the API through the Clarivate Developer Portal: https://developer.clarivate.com/

This allows for access to certain data fields. See the attached document below. Any data mining needs that cannot be met through the API must be requested directly from Clarivate.

Wiley

Authorized Users who wish to text and data mine the Licensed Electronic Products for non-commercial purposes may do so using the Wiley TDM API. Authorized Users will need to accept Wiley’s Text and Data Mining Agreement to receive an API token. See here for further details: https://onlinelibrary.wiley.com/library-info/resources/text-and-datamining