How to Find Data
Data and data analysis are an important part of many types of research. Finding, using, and evaluating data can be a bit different from using other types of sources. Consult our Data Research Tutorials for more information and explore a variety of data collections below.
Northeastern Subscribed Databases with Data Sets
- IBISWorld: Economic, demographic, and market data on thousands of industries worldwide.
- ICPSR (Inter-university Consortium for Political and Social Research): A non-profit membership-based data archive based at University of Michigan. ICPSR maintains and provides access to a vast archive of social science data for research and instruction. You must register for a myData account in order to access some reports.
- MarketLine Advantage: Company, industry, country and financial data for every major marketplace in the world. Includes company SWOTs, company overviews, industry profiles, market research, case studies, financial deals, country analysis, news and statistics.
- Mergent Intellect: A fully searchable database with detailed information on businesses including both active/inactive companies as well as daily updates on executives.
- Mintel U.S. Reports & Global New Products: Category specific reports with quantitative and qualitative market, brand, and consumer insights as well as a global database of new consumer packaged goods launches in 86 markets.
- Passport (Euromonitor): International marketing data and analysis for countries, consumers, and industries. For example, you can search industries and products to locate leading companies in different regions of the world. Charts and data exporting are available.
- Roper Center Public Opinion Archives (with iPOLL): Search opinion polls from all over the world, using this database from Cornell University. Poll results from news organizations, governments, private foundations, academic institutions, and more. The scope of topics spans politics, culture, workplaces, and social life. 1930s to the present.
- Sage Data: Sage Data is a collection of U.S. and international datasets sourced from governmental, commercial, and private organizations. Sage Data allows you to search and browse millions of datasets, compare and contrast variables of interest, and create customized exportable charts and tables. Includes the Claritas Consumer Profiles dataset.
- SimplyAnalytics: A mapping and data visualization application including demographic data from the US Census dating back to 1980, the American Community Survey (ACS), consumer spending data from the Consumer Expenditure Survey (CEX), CDC PLACES health data, and D&B's Points-of-Interest business directory. Additionally, users have access to the MRI-SimmonsLOCAL consumer behavior dataset, which includes data on over 8,000 brands, 450 categories and detailed lifestyle data. This resource is limited to 2 users at a time. Registration is required to save work between sessions.
- Statista: Brings together data and graphs on every imaginable topic from business to government, surveys, sports, and scientific topics. Easy to use, exportable charts and data, recommended citations.
- Web of Science Data Citation Index: The Data Citation Index provides a single point of access to research data from repositories across disciplines and around the world.
Free Data Sets
- CDC Data and Statistics: The Centers for Disease Control and Prevention maintains a database on cause of death. The data can be segmented in almost every way imaginable: age, race, year, and so on.
- BLS Databases, Tables, and Calculators by Subject (Bureau of Labor Statistics): Includes materials on employment, unemployment, prices and inflation, productivity, pay and benefits, and workplace injuries. Also includes international comparative data.
- CDP Cities, States and Regions Open Data Portal: Formerly the Carbon Disclosure Project; climate change and sustainability data from more than 1,200 city, state and regional governments.
- Common Crawl: Free, open repository of web crawl data
- Cooperative Association for Internet Data Analysis (CAIDA): Large-scale data collection, curation, and data distribution, based at the San Diego Supercomputer Center at UC San Diego.
- Data.gov: Home of U.S. Government Open Data. Here you will find data, tools, and resources to conduct research, develop web and mobile applications, design data visualizations, and more with access to over 300,000 datasets.
- Data.World: One key differentiator of data.world is the set of tools they have built to make working with data easier: you can write SQL queries within their interface to explore data and join multiple data sets. They also have SDKs for R and Python to make it easier to acquire and work with data in your tool of choice.
- DataCite: Search for datasets across over 1000 different major data centers and repositories including ICPSR, Harvard Dataverse, Data-Planet, etc.
- DataOne: Earth and environmental data provided by a community of member repositories
- DOE Data Explorer: Search tool for finding publicly available, scientific data records submitted by data centers, repositories, and other organizations funded by the U.S. Department of Energy
- Energy Information Administration (EIA): Provides statistics and other data relating to major energy sectors, including petroleum, natural gas, coal, nuclear, and renewables
- Eurostat: Statistics and data on Europe, provided by the statistical office of the European Union.
- FAOStat: Free access to food and agriculture data for over 245 countries and territories from 1961 to the most recent year available, from the Food and Agriculture Organization of the United Nations.
- FBI Crime Data Explorer: The Crime Data Explorer is part of the FBI’s broader effort to modernize the reporting of national crime data. It allows you to view trends, download bulk data, and access the Crime Data API for reported crime at the national, state, and agency levels.
- Figshare: Features content in many file formats, including figures, datasets, media, papers, posters, presentations and filesets.
- GitHub Awesome Public Datasets: List of topic-centric (mostly free) public data sources
- Google Dataset: Google's search engine for datasets.
- The Government Finance Database: Prepared data set based on census data for government finance
- Harvard Dataverse: Free data repository maintained by Harvard University, open to all researchers from any discipline, worldwide, for sharing, archiving, and accessing research data
- HealthData.gov: The U.S. Department of Health and Human Services provides data on a wide range of topics, including environmental health, medical devices, Medicare & Medicaid, social services, community health, mental health, and substance abuse.
- IPPSR Correlates of State Policy: The Correlates of State Policy Project aims to compile, disseminate, and encourage the use of data relevant to U.S. state policy research, tracking policy differences across the 50 states and changes over time. We have gathered more than 3000 variables from various sources and assembled them into one large, useful dataset.
- Kaggle: Publicly available datasets on a wide variety of topics, founded and run by an online community of data scientists and machine learning practitioners
- Mikulski Archive for Space Telescopes (MAST): Astronomical data archive focused on the optical, ultraviolet, and near-infrared. MAST hosts data from over a dozen missions like Webb, Hubble, TESS, Kepler, and in the future Roman.
- NASA Open Data Portal: Catalog of publicly available NASA datasets
- National Center for Biotechnology Information (NCBI): One-stop shop for finding, browsing, and downloading genomic data
- National Center for Science and Engineering Statistics: Data on research and development, the science and engineering workforce, the condition and progress of STEM education, and U.S. competitiveness in science, engineering, technology, and R&D
- NOAA National Centers for Environmental Information (NCEI): Publicly available geophysical data and information for the U.S.
- Our World in Data: Our World in Data is produced as a collaborative effort between researchers at the University of Oxford, who are the scientific contributors of the website content, and the non-profit organization Global Change Data Lab, which owns, publishes and maintains the website and the data tools.
- Pew Research Center Datasets: Raw data from Pew's research into American life.
- Registry of Open Data on AWS: Publicly available datasets hosted via AWS resources, from organizations like Allen Institute for Artificial Intelligence (AI2), Digital Earth Africa, Data for Good at Meta, NASA Space Act Agreement, NIH STRIDES, NOAA Open Data Dissemination Program, Space Telescope Science Institute, and Amazon Sustainability Data Initiative.
- ResearchDataGov: ResearchDataGov is a web portal for discovering and requesting access to restricted microdata from federal statistical agencies.
- Stanford Large Network Dataset Collection (SNAP): Datasets mostly scraped from the web for analysis of large social and information networks
- United Nations Data (UNdata): World data about population, education, labor and more from a variety of global organizations.
- United States Census: Business, economic, and population data.
- USDA Geospatial Data Gateway: Environmental and natural resources data curated by the U.S. Department of Agriculture
- World Bank Databank: Datasets covering population demographics and a huge number of economic and development indicators from across the world.
- World Bank Open Data: Free and open access to global development data
- World Health Organization - Data and Statistics: Offers world hunger, health, and disease statistics.
- Zenodo: Open repository developed under the European OpenAIRE program and operated by CERN
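The Data.World entry above mentions writing SQL queries to explore data and join multiple data sets. As a rough sketch of what such a join looks like, the example below uses Python's built-in `sqlite3` module rather than any particular platform's interface. The table names, column names, region labels, and numbers are all hypothetical placeholders, not values from any resource listed here.

```python
import sqlite3

# Hypothetical rows standing in for two downloaded data sets
# (region labels and counts are invented for illustration only).
population = [("A", 7000000), ("B", 1400000), ("C", 650000)]
hospitals = [("A", 100), ("B", 30)]


def join_datasets(population_rows, hospital_rows):
    """Join two small tables on region and compute hospitals per million residents."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE population (region TEXT PRIMARY KEY, residents INTEGER)")
        conn.execute("CREATE TABLE hospitals (region TEXT PRIMARY KEY, hospital_count INTEGER)")
        conn.executemany("INSERT INTO population VALUES (?, ?)", population_rows)
        conn.executemany("INSERT INTO hospitals VALUES (?, ?)", hospital_rows)
        # LEFT JOIN keeps regions that have no matching hospital row,
        # so gaps in one data set show up as None instead of vanishing.
        query = """
            SELECT p.region, p.residents, h.hospital_count,
                   h.hospital_count * 1000000.0 / p.residents AS per_million
            FROM population AS p
            LEFT JOIN hospitals AS h ON h.region = p.region
            ORDER BY p.region
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in join_datasets(population, hospitals):
        print(row)
```

A left join is used here so that missing coverage in one source stays visible, which is often worth checking when evaluating data combined from different providers.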
Data Mining Resources
- Article discussing the challenges of data mining library resources: McCracken, P. & Raub, E. (2023). “Licensing Challenges Associated With Text and Data Mining: How Do We Get Our Patrons What They Need?” Journal of Librarianship and Scholarly Communication 11(1). doi: https://doi.org/10.31274/jlsc.15530
- Google Books Ngram Viewer: Search Google's text collection, including printed sources published between 1500 and 2019 in several languages.
- Hathi Trust Data Availability and APIs: Perform text mining on Hathi Trust's collection through a variety of channels.
- English-Corpora: Large collections of text for text mining.
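For readers new to text mining, the basic idea behind n-gram counts can be sketched in a few lines of Python. This is only an illustration of the general technique using the standard library; it is not how any of the tools above are implemented, and the function name and sample sentence are made up for the example.

```python
import re
from collections import Counter


def ngram_counts(text, n=2):
    """Count n-grams (runs of n consecutive words) in a piece of text."""
    # Lowercase the text and keep only runs of letters and apostrophes as tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    # Pair each token with its n-1 successors to form overlapping n-grams.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


if __name__ == "__main__":
    sample = "the data set and the data archive"
    print(ngram_counts(sample, 2).most_common(3))
```

Real corpora usually need more careful tokenization (punctuation, hyphenation, OCR errors), so treat this as a starting point rather than a finished method.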
State and Local Data for Massachusetts
- Analyze Boston: The City of Boston provides an open data hub. Locate datasets and other projects built on this open data.
- Boston Indicators Project: Provides key indicators for data trends in Boston. The reports tend to include data from the fields of health, education, transportation, etc.
- The Health of Boston: Includes reports that provide descriptive information about the health status and factors that influence the health of Boston residents.
- Massachusetts Office of Data Management and Outcomes Assessment: ODMOA facilitates and coordinates the collection, access to, and use of public health data in order to monitor and improve population health.
- State Budget Sources (The Volcker Alliance): State Budget Sources is designed to provide improved tools for public officials, policy advocates, journalists, academics, and concerned citizens researching the critical fiscal decisions that governors and legislators must make.