Get Started
On this page, you will find info about:
- How to find data
- Northeastern Subscribed Databases with Data Sets
- U.S. Government Data
- Preserving U.S. Government Data - including alternate access points for archived datasets
- Other Free Data Sets - including data from global organizations
- State and Local Data for Massachusetts
- Data Mining Resources
If you can't find what you're looking for, please contact us!
How to Find Data
Data and data analysis are an important part of many types of research. Finding, using, and evaluating data can be a bit different from using other types of sources. Consult our Data Research Tutorials for more information and explore a variety of data collections below.
Northeastern Subscribed Databases with Data Sets
- IBISWorld: Economic, demographic, and market data on thousands of industries worldwide.
- ICPSR (Inter-university Consortium for Political and Social Research): A non-profit, membership-based data archive based at the University of Michigan. ICPSR maintains and provides access to a vast archive of social science data for research and instruction. You must register for a myData account in order to access some reports.
- MarketLine Advantage: Company, industry, country, and financial data for every major marketplace in the world. Includes company SWOTs, company overviews, industry profiles, market research, case studies, financial deals, country analysis, news, and statistics.
- Mergent Intellect: A fully searchable database with detailed information on businesses, including both active and inactive companies, as well as daily updates on executives.
- Mintel U.S. Reports & Global New Products: Category-specific reports with quantitative and qualitative market, brand, and consumer insights, as well as a global database of new consumer packaged goods launches in 86 markets.
- Passport (Euromonitor): International marketing data and analysis for countries, consumers, and industries. For example, search industries and products to locate leading companies in different regions of the world. What's included: charts and data exporting.
- Roper Center Public Opinion Archives (with iPOLL): Search opinion polls from all over the world using this database from Cornell University. Includes poll results from news organizations, governments, private foundations, academic institutions, and more. The scope of topics spans politics, culture, workplaces, and social life. Coverage runs from the 1930s to the present.
- Sage Data: Sage Data is a collection of U.S. and international datasets sourced from governmental, commercial, and private organizations. Sage Data allows you to search and browse millions of datasets, compare and contrast variables of interest, and create customized exportable charts and tables. Includes the Claritas Consumer Profiles dataset.
- SimplyAnalytics: A mapping and data visualization application including demographic data from the US Census dating back to 1980, the American Community Survey (ACS), consumer spending data from the Consumer Expenditure Survey (CEX), CDC PLACES health data, and D&B's Points-of-Interest business directory. Additionally, users have access to the MRI-SimmonsLOCAL consumer behavior dataset, which includes data on over 8,000 brands, 450 categories, and detailed lifestyle data. This resource is limited to 2 users at a time. Registration is required to save work between sessions.
- Statista: Brings together data and graphs on every imaginable topic, from business and government to surveys, sports, and scientific topics. Offers easy-to-use, exportable charts and data, plus recommended citations.
- Web of Science Data Citation Index: The Data Citation Index provides a single point of access to research data from repositories across disciplines and around the world.
U.S. Government Data
U.S. government sites are traditionally a reliable source of publicly available data on a wide range of topics. As government organizations adapt to executive actions and other federal funding and policy changes, access to some datasets has been changing as well.
When working on a project, regardless of source, the best practice is to save all the data you will need when you first find it and record the date on which you accessed the data.
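As one way to put this practice into action, the following is a minimal Python sketch, not an official library tool. It assumes you already have the dataset's contents as bytes, and it writes a small "sidecar" file next to your saved copy that records the source URL, the access date, and a checksum. The function name `save_with_access_record` and the `.source.json` suffix are hypothetical choices made for illustration.

```python
import hashlib
import json
from datetime import date
from pathlib import Path


def save_with_access_record(data: bytes, source_url: str, dest: Path, accessed=None) -> Path:
    """Save a local copy of `data` and note where and when it was obtained.

    Hypothetical helper: the name and the sidecar layout are illustrative only.
    """
    # Default to today's date if the caller does not supply an access date.
    accessed = accessed or date.today()

    # Save the dataset itself, creating the folder if needed.
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_bytes(data)

    # Record provenance; the checksum lets you confirm later that the
    # saved copy has not changed.
    record = {
        "source_url": source_url,
        "accessed": accessed.isoformat(),
        "sha256": hashlib.sha256(data).hexdigest(),
        "bytes": len(data),
    }
    sidecar = dest.with_name(dest.name + ".source.json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar
```

The bytes could come from a file you downloaded by hand or from a call such as `urllib.request.urlopen(url).read()`. Either way, the access date in the sidecar file is what you would cite alongside the dataset.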
There are efforts underway to preserve access to U.S. government data via other open web archives. If you have trouble accessing the data you need, you can learn more about other potential access points, or ask a librarian for help.
- CDC Data and Statistics: The Centers for Disease Control and Prevention maintains a database on cause of death. The data can be segmented in almost every way imaginable: age, race, year, and so on.
- BLS Databases, Tables, and Calculators by Subject (Bureau of Labor Statistics): Includes materials on employment, unemployment, prices and inflation, productivity, pay and benefits, and workplace injuries. Also includes international comparative data.
- Data.gov: Home of U.S. Government Open Data. Here you will find data, tools, and resources to conduct research, develop web and mobile applications, design data visualizations, and more, with access to over 300,000 datasets.
- DataCite: Search for datasets across more than 1,000 major data centers and repositories, such as ICPSR, Harvard Dataverse, and Data-Planet.
- DOE Data Explorer: Search tool for finding publicly available scientific data records submitted by data centers, repositories, and other organizations funded by the U.S. Department of Energy.
- Energy Information Administration (EIA): Provides statistics and other data relating to major energy sectors, including petroleum, natural gas, coal, nuclear, and renewables.
- FBI Crime Data Explorer: The Crime Data Explorer is part of the FBI’s broader effort to modernize the reporting of national crime data. It allows you to view trends, download bulk data, and access the Crime Data API for reported crime at the national, state, and agency levels.
- HealthData.gov: The U.S. Department of Health and Human Services provides data on a wide range of topics, including environmental health, medical devices, Medicare & Medicaid, social services, community health, mental health, and substance abuse.
- NASA Open Data Portal: Catalog of publicly available NASA datasets.
- National Center for Biotechnology Information (NCBI): One-stop shop for finding, browsing, and downloading genomic data.
- National Center for Science and Engineering Statistics: Data on research and development, the science and engineering workforce, the condition and progress of STEM education, and U.S. competitiveness in science, engineering, technology, and R&D.
- NOAA National Centers for Environmental Information (NCEI): Publicly available geophysical data and information for the U.S.
- ResearchDataGov: ResearchDataGov is a web portal for discovering and requesting access to restricted microdata from federal statistical agencies.
- United States Census: Business, economic, and population data.
- USDA Geospatial Data Gateway: Environmental and natural resources data curated by the U.S. Department of Agriculture.
Preserving U.S. Government Data
Listed below are some organizations working to preserve access to data that may be important for engineering research.
For more information about these efforts, explore the Data Rescue Project, a coordinated effort among a group of data organizations, including IASSIST, RDAP, and members of the Data Curation Network. Its goal is to serve as a clearinghouse for preservation efforts and for access points to public U.S. government data that are currently at risk.
If you are looking for a specific dataset, check the Data Rescue Project Tracker.
- CDC Data on the Internet Archive: An archive of CDC datasets from before January 28, 2025.
- Climate Change and Health Research Coordinator Center (CAFE) Collection at Harvard Dataverse: Includes datasets from multiple federal agencies at the intersection of climate and human health.
- Data.gov archive by Harvard Library Innovation Lab Team: Released on February 6, 2025 on Source Cooperative, it includes over 311,000 datasets harvested during 2024 and 2025. It will be updated daily as new datasets are added to data.gov.
- DataLumos: Crowdsourced repository for government data, provided by ICPSR at the University of Michigan's Institute for Social Research.
- Open Energy Data Initiative: Provides free access to data generated from efforts funded by the U.S. Department of Energy and support projects and partnerships.
- PolicyMap: PolicyMap is a mapping tool for accessing data on demographics, real estate, health, jobs, and more. Supports research about communities across the U.S.
- Public Environmental Data Project: Partners with the Environmental Data & Governance Initiative (EDGI) to make working copies of key tools available, including the Climate and Economic Justice Screening Tool (CEJST), CDC's Social Vulnerability Index and Environmental Justice Index, and the Council on Environmental Quality EJScorecard.
- Social Explorer: Geographic resource that brings together a vast and growing amount of quantitative data with an intuitive visual interface to make demographic research, the analysis of social trends, and comparison of neighborhoods, communities, counties, and other areas accessible and interactive. Access is limited to 5 users at a time.
- The Climate Mirror: Contains NOAA (National Oceanic and Atmospheric Administration) data from 2017.
Free Data Sets
- CDP Cities, States and Regions Open Data Portal: Climate change and sustainability data from more than 1,200 city, state, and regional governments, provided by CDP (formerly the Carbon Disclosure Project).
- Common Crawl: Free, open repository of web crawl data.
- Cooperative Association for Internet Data Analysis (CAIDA): Large-scale data collection, curation, and data distribution, based at the San Diego Supercomputer Center at UC San Diego.
- Data.World: One key differentiator of data.world is the tools they have built to make working with data easier. You can write SQL queries within their interface to explore data and join multiple data sets. They also have SDKs for R and Python to make it easier to acquire and work with data in your tool of choice.
- DataCite: Search for datasets across more than 1,000 major data centers and repositories, such as ICPSR, Harvard Dataverse, and Data-Planet.
- DataOne: Earth and environmental data provided by a community of member repositories.
- Eurostat: Statistics and data on Europe, provided by the statistical office of the European Union.
- FAOStat: Free access to food and agriculture data for over 245 countries and territories from 1961 to the most recent year available, from the Food and Agriculture Organization of the United Nations.
- Figshare: Features content in many file formats, including figures, datasets, media, papers, posters, presentations, and filesets.
- GitHub Awesome Public Datasets: List of topic-centric (mostly free) public data sources.
- Google Dataset: Google's search engine for datasets.
- The Government Finance Database: Prepared data set based on census data for government finance.
- Harvard Dataverse: Free data repository maintained by Harvard University, open to all researchers from any discipline, worldwide, for sharing, archiving, and accessing research data.
- IPPSR Correlates of State Policy: The Correlates of State Policy Project aims to compile, disseminate, and encourage the use of data relevant to U.S. state policy research, tracking policy differences across the 50 states and changes over time. We have gathered more than 3000 variables from various sources and assembled them into one large, useful dataset.
- Kaggle: Publicly available datasets on a wide variety of topics, founded and run by an online community of data scientists and machine learning practitioners.
- Mikulski Archive for Space Telescopes (MAST): Astronomical data archive focused on the optical, ultraviolet, and near-infrared. MAST hosts data from over a dozen missions like Webb, Hubble, TESS, Kepler, and, in the future, Roman.
- Our World in Data: Our World in Data is produced as a collaborative effort between researchers at the University of Oxford, who are the scientific contributors of the website content, and the non-profit organization Global Change Data Lab, which owns, publishes, and maintains the website and the data tools.
- Pew Research Center Datasets: Raw data from Pew's research into American life.
- Registry of Open Data on AWS: Publicly available datasets hosted via AWS resources, from organizations like Allen Institute for Artificial Intelligence (AI2), Digital Earth Africa, Data for Good at Meta, NASA Space Act Agreement, NIH STRIDES, NOAA Open Data Dissemination Program, Space Telescope Science Institute, and Amazon Sustainability Data Initiative.
- Stanford Large Network Dataset Collection (SNAP): Datasets mostly scraped from the web for analysis of large social and information networks.
- United Nations Data (UNdata): World data about population, education, labor, and more from a variety of global organizations.
- World Bank Databank: Datasets covering population demographics and a huge number of economic and development indicators from across the world.
- World Bank Open Data: "Free and open access to data about development in countries around the globe"
- World Health Organization - Data and Statistics: Offers world hunger, health, and disease statistics.
- Zenodo: Open repository developed under the European OpenAIRE program and operated by CERN.
State and Local Data for Massachusetts
- Analyze Boston: The City of Boston provides an open data hub. Locate datasets and other projects built on this open data.
- Boston Indicators Project: Provides key indicators for data trends in Boston. The reports tend to include data from the fields of health, education, transportation, etc.
- The Health of Boston: Includes reports that provide descriptive information about the health status and factors that influence the health of Boston residents.
- Massachusetts Office of Data Management and Outcomes Assessment: ODMOA facilitates and coordinates the collection, access to, and use of public health data in order to monitor and improve population health.
- State Budget Sources (The Volcker Alliance): State Budget Sources is designed to provide improved tools for public officials, policy advocates, journalists, academics, and concerned citizens researching the critical fiscal decisions that governors and legislators must make.
Data Mining Resources
- Article discussing the challenges of data mining library resources: McCracken, P. & Raub, E. (2023). “Licensing Challenges Associated With Text and Data Mining: How Do We Get Our Patrons What They Need?”, Journal of Librarianship and Scholarly Communication 11(1). doi: https://doi.org/10.31274/jlsc.15530
- Google Books Ngram Viewer: Search Google's text collection, including printed sources published between 1500 and 2019 in several languages.
- HathiTrust Data Availability and APIs: Perform text mining on HathiTrust's collection through a variety of channels.
- English-Corpora: Large collections of text for text mining.