Best Practices
When working on a project, save all the data you will need when you first find it. Record the date on which you accessed the data. Use this date when citing the data.
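One way to follow this practice is to store a small provenance record next to each saved file. The sketch below is only an illustration, not a prescribed workflow: the function names `save_with_provenance` and `citation_note`, the `.provenance.json` sidecar naming, and the citation wording are all assumptions made for this example. It uses only the Python standard library and leaves the actual download step to you.

```python
import hashlib
import json
from datetime import date
from pathlib import Path
from typing import Optional


def save_with_provenance(content: bytes, dest: Path, source: str,
                         accessed: Optional[date] = None) -> Path:
    """Write content to dest plus a sidecar JSON noting the source and access date.

    Hypothetical helper: the sidecar name and its fields are choices made for
    this sketch, not a standard.
    """
    accessed = accessed or date.today()  # default to the day the data was saved
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_bytes(content)
    record = {
        "file": dest.name,
        "source": source,
        "accessed": accessed.isoformat(),
        # The checksum lets you confirm later that the saved copy is unchanged.
        "sha256": hashlib.sha256(content).hexdigest(),
    }
    sidecar = dest.with_name(dest.name + ".provenance.json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar


def citation_note(title: str, sidecar: Path) -> str:
    """Build a simple access statement from the sidecar (format is an assumption)."""
    record = json.loads(sidecar.read_text(encoding="utf-8"))
    return f"{title}. {record['source']} (accessed {record['accessed']})."
```

For example, after obtaining the bytes of a dataset by whatever means the resource provides, call `save_with_provenance(data, Path("data/example.txt"), "https://example.org/record/1")` (a placeholder URL), then use `citation_note` to pull the recorded access date into your citation. Adapt the output to the citation style your project requires.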
Bioinformatics and Cheminformatics Resources
- AlphaFold Protein Structure Database
Provides open access to protein structure predictions for the human proteome and 20 other key organisms
- BindingDB
Public database of measured binding affinities for biomolecules, genetically or chemically modified biomolecules, and synthetic compounds
- BioCyc
Over 20,000 pathway/genome databases (PGDBs). BioCyc encyclopedias integrate a diverse range of data and provide a high level of curation for important microbes. Data can be downloaded and queried, and Pathway Tools can be installed to create your own local database.
- Biological Macromolecule Crystallization Database
Stores information on protein and nucleic acid crystals that have been reported in the literature or deposited in the Protein Data Bank
- Biological Magnetic Resonance Databank
Collects, annotates, archives, and disseminates spectral and quantitative data derived from NMR spectroscopic investigations of biological macromolecules and metabolites
- BRENDA: Comprehensive Enzyme Information System
Free database containing information on over 6500 enzymes: nomenclature, EC and registry numbers, reaction and specificity, inhibitors, structure, isolation, literature references, and more
- ChEMBL
Brings together chemical, bioactivity and genomic data to aid the translation of genomic information into effective new drugs
- ChemDB Chemoinformatics Portal
A suite of chemical datasets and learning tools, including a chemical search feature for compounds from vendor catalogs
- Chemical Entities of Biological Interest (ChEBI)
Dictionary of small molecular entities that are natural or synthetic products used to intervene in the processes of living organisms
- Chemical Probes Portal
Tool to find and use evaluated small-molecule reagents called chemical probes in biomedical research and drug discovery
- Comparative Toxicogenomics Database (CTD)
Provides manually curated information about chemical–gene/protein interactions, chemical–disease and gene–disease relationships, integrated with functional and pathway data
- EMBL-EBI
Offers the ability to query large biological data resources programmatically
- ENZYME
Repository of information relative to the nomenclature of enzymes, primarily based on the recommendations of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (IUBMB). Describes each type of characterized enzyme for which an EC (Enzyme Commission) number has been provided.
- Enzyme Nomenclature
Recommendations of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology on the nomenclature and classification of enzymes by the reactions they catalyse. Browse and search for enzyme names using EC numbers.
- Expasy
Provides access to databases and software tools developed by Swiss Institute of Bioinformatics (SIB) groups
- GenBank
The NIH genetic sequence database, an annotated collection of all publicly available DNA sequences
- Joint Genome Institute Portal (JGI Portal)
Search metadata for over 13 PB of top-quality plant, algal, fungal, and microbial genomic and metagenomic data.
- Nucleotide
A collection of nucleotide sequences from several sources, including GenBank, RefSeq, the Third Party Annotation (TPA) database, and PDB. Searching the Nucleotide Database will yield available results from each of its component databases.
- Online Mendelian Inheritance in Man
Comprehensive, authoritative compendium of human genes and genetic phenotypes that is freely available and updated daily. The full-text, referenced overviews in OMIM contain information on all known mendelian disorders and over 16,000 genes.
- Human Metabolome Database (HMDB)
A freely available electronic database containing detailed information about small molecule metabolites found in the human body
- Integrated Resource for Reproducibility in Macromolecular Crystallography
A comprehensive repository and website designed to archive raw data, including metadata from macromolecular diffraction experiments
- Lipid Maps
Provides access to lipid nomenclature, databases, tools, protocols, standards, tutorials, meetings, publications, and other resources
- MassBank
Open source mass spectral library for the identification of small chemical molecules of metabolomics, exposomics, and environmental relevance
- Molinspiration Cheminformatics
Free web site with Java-based (JME) interface for searching substructure, similarity and pharmacophore similarity on a collection of molecules. Also offers a chemical property calculation function for determining estimated logP (octanol-water partition coefficient), PSA, and other characteristics.
- NCBI Datasets
NCBI Datasets is an experimental resource for finding and building datasets. Its web interface allows you to download genome sequences and annotations for eukaryotic organisms. For access to data for all organisms, including bacteria and viruses, use its command-line tool and RESTful APIs.
- Natural Products Atlas
Open access database designed to cover all microbially-derived natural products published in the peer-reviewed primary scientific literature. This encompasses bacterial, fungal and cyanobacterial compounds, but does not include compounds from plants, invertebrates or other higher organisms unless these compounds have also been explicitly identified from a microbial source. Compounds from lichens and mushrooms and other higher fungi are included. Compounds from marine macro algae and diatoms are excluded.
- MarinLit (Royal Society of Chemistry)
MarinLit is a database dedicated to marine natural products research. It contains a comprehensive range of data, along with powerful dereplication features.
- ChemSpider
A free chemical structure database providing fast access to over 120 million structures, along with properties and associated information.
- Nucleic Acid Knowledgebase (NAKB)
Portal for 3D structural information about nucleic acids and successor to the Nucleic Acid Database (NDB). Provides search, report, statistics, atlas and visualization pages for all nucleic-acid-containing experimentally determined 3D structures held by NDB and by the Protein Data Bank (PDB), including all major methods: X-ray, NMR, and electron microscopy.
- PeptideAtlas
Multi-organism, publicly accessible compendium of peptides identified in a large set of tandem mass spectrometry proteomics experiments
- PIR (Protein Information Resource)
Protein informatics site intended to support genomic, proteomic, and systems biology research
- Proteopedia
A wiki site that aims to collect, organize and disseminate structural and functional knowledge about protein, RNA, DNA, and other macromolecules, and their assemblies and interactions with small molecules
- RCSB Protein Data Bank (RCSB PDB)
Protein Data Bank archive: information about the 3D shapes of proteins, nucleic acids, and complex assemblies
- SABIO-RK
A curated database containing structured information about biochemical reactions and their corresponding kinetics. It describes participants and modifiers of the reactions, as well as measured kinetic data (including kinetic rate equations) embedded in their experimental and environmental context.
- SCOPe (Structural Classification of Proteins — extended)
Classifies many newer structures through a combination of automation and manual curation, and corrects some errors in SCOP, aiming to have the same accuracy as the hand-curated SCOP releases. SCOPe also incorporates and updates the Astral database.
- UniProt
A free resource of protein sequence and functional information from EMBL-EBI, PIR, and SIB
- ZINC15
A free database of commercially-available compounds for virtual screening
Databases at the Library
- BioCyc
Over 20,000 pathway/genome databases (PGDBs). BioCyc encyclopedias integrate a diverse range of data and provide a high level of curation for important microbes. Data can be downloaded and queried, and Pathway Tools can be installed to create your own local database.
- Data Citation Index (1900 - current) (Clarivate)
Citations and abstracts to quality research data from sources around the world in the sciences, social sciences, arts and humanities.
- InCites - Essential Science Indicators
This unique compilation of science performance statistics and science trends data is based on journal article publication counts and citation data from Clarivate.
- PolicyMap
PolicyMap is a mapping tool for accessing data on demographics, real estate, health, jobs, and more. Supports research about communities across the U.S.
- Sage Data
Sage Data is a collection of U.S. and international datasets sourced from governmental, commercial, and private organizations. Sage Data allows you to search and browse millions of datasets, compare and contrast variables of interest, and create customized exportable charts and tables. Includes the Claritas Consumer Profiles dataset.
External Databases and Organizations
CDC datasets uploaded before January 28th, 2025
A special archive created on Internet Archive of all CDC datasets publicly available as of January 28, 2025
Datasets in Dataverse
Data uploaded by the Climate Change and Health Research Coordinating Center (CAFE). Includes the CDC's Social Vulnerability Index data. Most of this data focuses on health and the environment.
IPUMS
Provides census and survey data from around the world.