Key Resources: NCBI Databases and Tools for Bioinformatics
Please note: Use of databases from off-campus requires logging in with your myNortheastern account user name and password.
The National Center for Biotechnology Information (NCBI) maintains a number of useful databases for bioinformatics. A complete list is available here, and selected databases are linked below.
1000 Genomes Browser
An interactive graphical viewer that allows users to explore variant calls, genotype calls and supporting evidence (such as aligned sequence reads) that have been produced by the 1000 Genomes Project.
The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences as well as help identify members of gene families.
A resource to provide a public, tracked record of reported relationships between human variation and observed health status with supporting evidence.
dbSNP (Database of Short Genetic Variations) includes single nucleotide variations, microsatellites, and small-scale insertions and deletions. dbSNP contains population-specific frequency and genotype data, experimental conditions, molecular context, and mapping information for both neutral variations and clinical mutations.
dbVar (Database of Genomic Structural Variation) has been developed to archive information associated with large scale genomic variation, including large insertions, deletions, translocations and inversions. In addition to archiving variation discovery, dbVar also stores associations of defined variants with phenotype information.
Database of Genotypes and Phenotypes (dbGaP) is an archive and distribution center for the descriptions and results of studies that investigate the interaction of genotype and phenotype. These studies include genome-wide association studies (GWAS), medical resequencing, and molecular diagnostic assays, as well as associations between genotype and non-clinical traits.
A searchable database of genes, focusing on genomes that have been completely sequenced and that have an active research community to contribute gene-specific data. Information includes nomenclature, chromosomal localization, gene products and their attributes (e.g., protein interactions), associated markers, phenotypes, interactions, and links to citations, sequences, variation details, maps, expression reports, homologs, protein domain content, and external databases.
Contains sequence and map data from the whole genomes of over 1000 organisms. The genomes represent both completely sequenced organisms and those for which sequencing is in progress. All three main domains of life (bacteria, archaea, and eukaryota) are represented, as well as many viruses, phages, viroids, plasmids, and organelles.
Organizes information related to human medical genetics, such as attributes of conditions with a genetic contribution.
A collection of nucleotide sequences from several sources, including GenBank, RefSeq, the Third Party Annotation (TPA) database, and PDB. Searching the Nucleotide Database will yield available results from each of its component databases.
The Protein database is a collection of sequences from several sources, including translations from annotated coding regions in GenBank, RefSeq and TPA, as well as records from SwissProt, PIR, PRF, and PDB. Protein sequences are the fundamental determinants of biological structure and function.
Key Resources: Additional Literature Databases for Bioinformatics
Please note: Use of databases from off-campus requires logging in with your myNEU account user name and password.
This is a list of additional suggested databases, but not a complete list of all databases for Bioinformatics. For the 'best bets' literature databases relevant to Bioinformatics, please click here. For a complete list of databases at Northeastern, click here.
If you are having trouble finding what you need, you can make an appointment (in person or via web) with a librarian here.
Use this link for direct full-text access to Northeastern's resources through PubMed.
PubMed comprises more than 29 million citations for biomedical literature from MEDLINE, life science journals, and online books. Citations may include links to full-text content from PubMed Central and publisher web sites. For tips on searching efficiently and effectively in PubMed, click here.
Embase is a versatile and up-to-date biomedical research database covering the most important international biomedical literature from 1947 to the present day, with more than 32 million records from 8,200 journals and ‘grey literature’ from over 2.4 million conference abstracts. Embase includes unique non-English content and coverage of the most important types of evidence, such as randomized controlled trials, controlled clinical trials, Cochrane reviews and meta-analyses.
BMC has an evolving portfolio of some 300 peer-reviewed, open access journals, sharing discoveries from research communities in science, technology, engineering and medicine.
Not-for-profit collaboration bringing together scientific societies, publishers, and libraries to provide access to critical, peer-reviewed research in the biological, ecological, and environmental sciences.
An open access, peer-reviewed journal published by the Public Library of Science. It features works in all areas of biological science, including works that interface with other disciplines such as chemistry, medicine and mathematics.
ScienceDirect is the web site for selected journal titles from the scholarly publisher Elsevier and its affiliates. Learn how to download articles directly to your mobile device.
Web of Science
Citations and abstracts from scholarly literature in the sciences, social sciences, arts, and humanities. Includes conference proceedings, symposia, seminars, colloquia, workshops, and conventions. One of the most comprehensive databases of academic research.
Other Resources: Databases and Tools
NCBI provides a variety of resources that allow developers to access and manipulate NCBI data in their applications. Use this resource for information on APIs, code libraries, and data formats.
NCBI GitHub Repository
The GitHub Repository from NCBI
E-Utilities API for NCBI
The Entrez Programming Utilities (E-utilities) are a set of eight server-side programs that provide a stable interface into the Entrez query and database system at the National Center for Biotechnology Information (NCBI). The E-utilities use a fixed URL syntax that translates a standard set of input parameters into the values necessary for various NCBI software components to search for and retrieve the requested data. The E-utilities are therefore the structured interface to the Entrez system, which currently includes 38 databases covering a variety of biomedical data, including nucleotide and protein sequences, gene records, three-dimensional molecular structures, and the biomedical literature.
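As a rough illustration of the fixed URL syntax described above, the following Python sketch assembles a request URL for ESearch, one of the E-utilities. The helper name `build_esearch_url` and the example query are hypothetical and chosen only for illustration; the base URL and the `db`, `term`, `retmax`, and `retmode` parameters follow NCBI's E-utilities documentation as best understood here, so check that documentation before relying on them. The sketch only builds the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Base address of the E-utilities, per NCBI's public documentation.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(db, term, retmax=20):
    """Return an ESearch URL for an Entrez database and query term.

    Hypothetical helper for illustration: it only assembles the URL
    and makes no network request.
    """
    params = {
        "db": db,            # Entrez database name, e.g. "pubmed" or "gene"
        "term": term,        # Entrez query string
        "retmax": retmax,    # maximum number of record IDs to return
        "retmode": "json",   # ask for JSON rather than the default XML
    }
    # urlencode percent-escapes brackets and turns spaces into "+".
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    # Example query (an assumption): PubMed records with
    # "bioinformatics" in the title.
    print(build_esearch_url("pubmed", "bioinformatics[Title]", retmax=5))
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return a list of matching record IDs. NCBI asks heavy users to limit their request rate and to register for an API key, so consult the E-utilities usage guidelines before scripting many requests.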
The Encyclopedia of DNA Elements (ENCODE) Consortium is an international collaboration of research groups funded by the National Human Genome Research Institute (NHGRI). The goal of ENCODE is to build a comprehensive parts list of functional elements in the human genome, including elements that act at the protein and RNA levels, and regulatory elements that control cells and circumstances in which a gene is active.
Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and annotation data. The UniProt databases are the UniProt Knowledgebase (UniProtKB), the UniProt Reference Clusters (UniRef), and the UniProt Archive (UniParc). UniProt is a collaboration between the European Bioinformatics Institute (EMBL-EBI), the SIB Swiss Institute of Bioinformatics and the Protein Information Resource (PIR).
REACTOME is an open-source, open access, manually curated and peer-reviewed pathway database. Our goal is to provide intuitive bioinformatics tools for the visualization, interpretation and analysis of pathway knowledge to support basic and clinical research, genome analysis, modeling, systems biology and education.
Broad Institute of Harvard and MIT
The Broad Institute of Harvard and MIT shares some data and software tools produced with the larger scientific community.
Genome Analysis Toolkit - Broad Institute
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Firehose - Broad Institute
A suite of tools and pipelines developed for processing and analyzing various types of large-scale genomic and proteomic data.
Firebrowse - Broad Institute
A tool to explore and visualize cancer data generated by Broad GDAC Firehose. Provides graphical tools such as viewGene, to explore expression levels, and iCoMut, to explore a comprehensive mutation analysis of each TCGA disease, as well as an API for programmers.
KEGG: Kyoto Encyclopedia of Genes and Genomes
KEGG is a database resource for understanding high-level functions and utilities of the biological system, such as the cell, the organism and the ecosystem, from molecular-level information, especially large-scale molecular datasets generated by genome sequencing and other high-throughput experimental technologies.
TCGA Computational Tools - National Cancer Institute
The Cancer Genome Atlas (TCGA) catalyzed considerable growth and advancement in the computational biology field by supporting the development of high-throughput genomic characterization technologies, generating a massive quantity of data, and fielding teams of researchers to analyze the data. At this link is a collection of some of the tools developed by TCGA network researchers and collaborators that were used to analyze TCGA data.
Finding journal articles
Includes all journals, print and electronic, available at NU.
Complete A-Z list of databases
Off-campus access to resources
Using your Northeastern login, you should be able to access resources off campus. Click the above link for help with troubleshooting any issues you may have with off-campus access.
Get started with Scholar OneSearch
Scholar OneSearch can be a great place to start your research. It contains resources provided by Northeastern, but may not include results from specific databases. Search results include books (both print and electronic), articles, videos, data sets, and more.
If we don't have access to an article you need...
Try using Northeastern's Interlibrary Loan system to request books or articles not available at NU. Articles can be delivered electronically.