Bioinformatics: Bioinformatics Databases
A guide to library resources and tools for the Bioinformatics program at Northeastern.
The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences as well as help identify members of gene families.
dbSNP (Database of Short Genetic Variations) includes single nucleotide variations, microsatellites, and small-scale insertions and deletions. dbSNP contains population-specific frequency and genotype data, experimental conditions, molecular context, and mapping information for both neutral variations and clinical mutations.
dbVar (Database of Genomic Structural Variation) has been developed to archive information associated with large scale genomic variation, including large insertions, deletions, translocations and inversions. In addition to archiving variation discovery, dbVar also stores associations of defined variants with phenotype information.
Database of Genotypes and Phenotypes (dbGaP) is an archive and distribution center for the description and results of studies which investigate the interaction of genotype and phenotype. These studies include genome-wide association (GWAS), medical resequencing, molecular diagnostic assays, as well as association between genotype and non-clinical traits.
A searchable database of genes, focusing on genomes that have been completely sequenced and that have an active research community to contribute gene-specific data. Information includes nomenclature, chromosomal localization, gene products and their attributes (e.g., protein interactions), associated markers, phenotypes, interactions, and links to citations, sequences, variation details, maps, expression reports, homologs, protein domain content, and external databases.
Contains sequence and map data from the whole genomes of over 1000 organisms. The genomes represent both completely sequenced organisms and those for which sequencing is in progress. All three main domains of life (bacteria, archaea, and eukaryota) are represented, as well as many viruses, phages, viroids, plasmids, and organelles.
A collection of nucleotide sequences from several sources, including GenBank, RefSeq, the Third Party Annotation (TPA) database, and PDB. Searching the Nucleotide Database will yield available results from each of its component databases.
The Protein database is a collection of sequences from several sources, including translations from annotated coding regions in GenBank, RefSeq and TPA, as well as records from SwissProt, PIR, PRF, and PDB. Protein sequences are the fundamental determinants of biological structure and function.
Key Resources: Additional Literature Databases for Bioinformatics
Please note: Use of databases from off-campus requires logging in with your myNEU account user name and password.
This is a list of additional suggested databases, but not a complete list of all databases for Bioinformatics. For the 'best bets' literature databases relevant to Bioinformatics, please click here. For a complete list of databases at Northeastern, click here.
If you are having trouble finding what you need, you can make an appointment (in person or via web) with a librarian here.
Use this link for direct full-text access to Northeastern's resources through PubMed.
PubMed comprises more than 32 million citations for biomedical literature from MEDLINE, life science journals, and online books. Citations may include links to full-text content from PubMed Central and publisher web sites. For tips on searching efficiently and effectively in PubMed, click here.
Embase is a versatile and up-to-date biomedical research database covering the most important international biomedical literature from 1947 to the present day, with more than 32 million records from 8,200 journals and ‘grey literature’ from over 2.4 million conference abstracts. Embase includes unique non-English content and coverage of the most important types of evidence, such as randomized controlled trials, controlled clinical trials, Cochrane reviews and meta-analyses.
Not-for-profit collaboration bringing together scientific societies, publishers, and libraries to provide access to critical, peer-reviewed research in the biological, ecological, and environmental sciences.
An open access, peer-reviewed journal published by the Public Library of Science; features works in all areas of biological science, including works that interface with other disciplines such as chemistry, medicine and mathematics.
Citations and abstracts from scholarly literature in the sciences, social sciences, arts, and humanities. Includes conference proceedings, symposia, seminars, colloquia, workshops, and conventions. One of the most comprehensive databases of academic research.
The Entrez Programming Utilities (E-utilities) are a set of eight server-side programs that provide a stable interface into the Entrez query and database system at the National Center for Biotechnology Information (NCBI). The E-utilities use a fixed URL syntax that translates a standard set of input parameters into the values necessary for various NCBI software components to search for and retrieve the requested data. The E-utilities are therefore the structured interface to the Entrez system, which currently includes 38 databases covering a variety of biomedical data, including nucleotide and protein sequences, gene records, three-dimensional molecular structures, and the biomedical literature.
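As a rough illustration of the fixed URL syntax described above, the following minimal Python sketch assembles an E-utilities request URL from a set of input parameters. The helper name `build_eutils_url` and the example search term are hypothetical and chosen only for illustration; the base URL and the `db`, `term`, `retmax`, and `retmode` parameters are those documented by NCBI for the ESearch utility. The sketch only builds the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Base address shared by the E-utilities, per NCBI's documentation.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_eutils_url(utility, **params):
    """Build the URL for one E-utility call (hypothetical helper).

    `utility` is the program name, e.g. "esearch" or "efetch";
    `params` are the input parameters for that program.
    Parameters are sorted by name so the resulting URL is deterministic.
    """
    query = urlencode(sorted(params.items()))
    return f"{EUTILS_BASE}/{utility}.fcgi?{query}"


# Example: an ESearch query against the PubMed database.
# The search term is illustrative only.
search_url = build_eutils_url(
    "esearch",
    db="pubmed",
    term="BRCA1[gene]",
    retmax=5,
    retmode="json",
)
print(search_url)
```

Opening the printed URL in a browser, or passing it to an HTTP client, should return the matching record identifiers. NCBI's usage guidelines ask heavy users to limit request rates, so consult the E-utilities documentation before scripting large numbers of requests.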
The Encyclopedia of DNA Elements (ENCODE) Consortium is an international collaboration of research groups funded by the National Human Genome Research Institute (NHGRI). The goal of ENCODE is to build a comprehensive parts list of functional elements in the human genome, including elements that act at the protein and RNA levels, and regulatory elements that control cells and circumstances in which a gene is active.
Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and annotation data. The UniProt databases are the UniProt Knowledgebase (UniProtKB), the UniProt Reference Clusters (UniRef), and the UniProt Archive (UniParc). UniProt is a collaboration between the European Bioinformatics Institute (EMBL-EBI), the SIB Swiss Institute of Bioinformatics and the Protein Information Resource (PIR).
REACTOME is an open-source, open access, manually curated and peer-reviewed pathway database. Our goal is to provide intuitive bioinformatics tools for the visualization, interpretation and analysis of pathway knowledge to support basic and clinical research, genome analysis, modeling, systems biology and education.
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
A tool to explore and visualize cancer data generated by Broad GDAC Firehose. Provides graphical tools such as viewGene to explore expression levels and iCoMut to explore a comprehensive mutation analysis of each TCGA disease, as well as an API for programmers.
KEGG is a database resource for understanding high-level functions and utilities of the biological system, such as the cell, the organism and the ecosystem, from molecular-level information, especially large-scale molecular datasets generated by genome sequencing and other high-throughput experimental technologies.
The Cancer Genome Atlas (TCGA) catalyzed considerable growth and advancement in the computational biology field by supporting the development of high-throughput genomic characterization technologies, generating a massive quantity of data, and fielding teams of researchers to analyze the data. At this link is a collection of some of the tools developed by TCGA network researchers and collaborators that were used to analyze TCGA data.
Using your Northeastern login, you should be able to access resources off campus. Click the above link for help with troubleshooting any issues you may have with off-campus access.
Get started with Scholar OneSearch
Scholar OneSearch can be a great place to start your research. It contains resources provided by Northeastern, but may not include results from specific databases. Search results include books (both print and electronic), articles, videos, data sets, and more.
If we don't have access to an article you need...
Try using Northeastern's Interlibrary Loan system to request books or articles not available at NU. Articles can be delivered electronically.