Additional Resources: NCBI Databases and Tools for Bioinformatics
The National Center for Biotechnology Information (NCBI) maintains a number of useful databases for bioinformatics. Selected databases are linked below. See a complete list of databases.
- BioCyc: Over 20,000 pathway/genome databases (PGDBs). BioCyc encyclopedias integrate a diverse range of data and provide a high level of curation for important microbes. Data can be downloaded and queried, and Pathway Tools can be installed to create your own local database.
- BLAST: The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences as well as to help identify members of gene families.
- ClinVar: A resource that provides a public, tracked record of reported relationships between human variation and observed health status, with supporting evidence.
- dbGaP: The Database of Genotypes and Phenotypes (dbGaP) is an archive and distribution center for the descriptions and results of studies that investigate the interaction of genotype and phenotype. These studies include genome-wide association studies (GWAS), medical resequencing, and molecular diagnostic assays, as well as associations between genotype and non-clinical traits.
- dbSNP: dbSNP (Database of Short Genetic Variations) includes single nucleotide variations, microsatellites, and small-scale insertions and deletions. dbSNP contains population-specific frequency and genotype data, experimental conditions, molecular context, and mapping information for both neutral variations and clinical mutations.
- dbVar: dbVar (Database of Genomic Structural Variation) was developed to archive information associated with large-scale genomic variation, including large insertions, deletions, translocations, and inversions. In addition to archiving variation discovery, dbVar also stores associations of defined variants with phenotype information.
- Gene: A searchable database of genes, focusing on genomes that have been completely sequenced and that have an active research community to contribute gene-specific data. Information includes nomenclature, chromosomal localization, gene products and their attributes (e.g., protein interactions), associated markers, phenotypes, interactions, and links to citations, sequences, variation details, maps, expression reports, homologs, protein domain content, and external databases.
- Genome: Contains sequence and map data from the whole genomes of over 1000 organisms. The genomes represent both completely sequenced organisms and those for which sequencing is in progress. All three main domains of life (bacteria, archaea, and eukaryota) are represented, as well as many viruses, phages, viroids, plasmids, and organelles.
- MedGen: Organizes information related to human medical genetics, such as attributes of conditions with a genetic contribution.
- Nucleotide: A collection of nucleotide sequences from several sources, including GenBank, RefSeq, the Third Party Annotation (TPA) database, and PDB. Searching the Nucleotide database will yield available results from each of its component databases.
- Protein: The Protein database is a collection of sequences from several sources, including translations from annotated coding regions in GenBank, RefSeq, and TPA, as well as records from SwissProt, PIR, PRF, and PDB. Protein sequences are the fundamental determinants of biological structure and function.
Key Resources: Additional Literature Databases for Bioinformatics
This is a list of additional suggested databases, not a complete list of all databases for bioinformatics. Try the 'best bets' literature databases relevant to bioinformatics, and visit the complete list of databases at Northeastern.
- PubMed (NU customized): More than just medical literature, PubMed was developed by the National Center for Biotechnology Information (NCBI) and publishers of life sciences literature. PubMed consists of 26 million citations for biomedical literature from MEDLINE, life science journals, and online books.
- Embase (Elsevier): An up-to-date biomedical research database covering the most important international biomedical literature from 1947 to the present day. Supports structured and natural language searching.
- BioMed Central: An independent online publishing house that provides immediate free access to peer-reviewed biological and medical research. Northeastern's membership reduces article-processing charges for Northeastern-affiliated authors who publish in BMC journals.
- BioOne Complete: Critical, peer-reviewed research in the biological, ecological, and environmental sciences.
- PLoS Biology: An open access, peer-reviewed journal published by the Public Library of Science. Features works in all areas of biological science, including works that interface with other disciplines such as chemistry, medicine, and mathematics.
- ScienceDirect Books and Journals (Elsevier): ScienceDirect is the website for selected journal titles from the scholarly publisher Elsevier and its affiliates. Learn how to download articles directly to your mobile device.
- Web of Science, Core Collection 1975-present: Use for citation tracking, finding seminal literature, data visualizations, author alerts, institutional affiliations, and impact factors. Links to full text for Northeastern-subscribed journals.
- Annual Reviews: Scholarly overviews in 37 academic subjects, mostly in the biomedical, physical, and social sciences. Excellent for finding authoritative overviews of new topics.
Other Resources: Databases and Tools
- NCBI Develop: NCBI provides a variety of resources that allow developers to access and manipulate NCBI data in their applications. Use this resource for information on APIs, code libraries, and data formats.
- NCBI GitHub Repository: The GitHub repository from NCBI.
- E-Utilities API for NCBI: The Entrez Programming Utilities (E-utilities) are a set of eight server-side programs that provide a stable interface into the Entrez query and database system at the National Center for Biotechnology Information (NCBI). The E-utilities use a fixed URL syntax that translates a standard set of input parameters into the values necessary for various NCBI software components to search for and retrieve the requested data. The E-utilities are therefore the structured interface to the Entrez system, which currently includes 38 databases covering a variety of biomedical data, including nucleotide and protein sequences, gene records, three-dimensional molecular structures, and the biomedical literature.
- ENCODE: The Encyclopedia of DNA Elements (ENCODE) Consortium is an international collaboration of research groups funded by the National Human Genome Research Institute (NHGRI). The goal of ENCODE is to build a comprehensive parts list of functional elements in the human genome, including elements that act at the protein and RNA levels, and regulatory elements that control cells and circumstances in which a gene is active.
- UniProt: The Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and annotation data. The UniProt databases are the UniProt Knowledgebase (UniProtKB), the UniProt Reference Clusters (UniRef), and the UniProt Archive (UniParc). UniProt is a collaboration between the European Bioinformatics Institute (EMBL-EBI), the SIB Swiss Institute of Bioinformatics, and the Protein Information Resource (PIR).
- REACTOME: REACTOME is an open-source, open access, manually curated, and peer-reviewed pathway database. Its goal is to provide intuitive bioinformatics tools for the visualization, interpretation, and analysis of pathway knowledge to support basic and clinical research, genome analysis, modeling, systems biology, and education.
- Broad Institute of Harvard and MIT: The Broad Institute of Harvard and MIT shares some of the data and software tools it produces with the larger scientific community.
- Genome Analysis Toolkit - Broad Institute: Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
- Firehose - Broad Institute: A suite of tools and pipelines developed for processing and analyzing various types of large-scale genomic and proteomic data.
- Firebrowse - Broad Institute: A tool to explore and visualize cancer data generated by Broad GDAC Firehose. Provides graphical tools like viewGene to explore expression levels and iCoMut to explore a comprehensive mutation analysis of each TCGA disease, as well as an API for programmers.
- KEGG (Kyoto Encyclopedia of Genes and Genomes): KEGG is a database resource for understanding high-level functions and utilities of the biological system, such as the cell, the organism, and the ecosystem, from molecular-level information, especially large-scale molecular datasets generated by genome sequencing and other high-throughput experimental technologies.
- TCGA Computational Tools - National Cancer Institute: The Cancer Genome Atlas (TCGA) catalyzed considerable growth and advancement in the computational biology field by supporting the development of high-throughput genomic characterization technologies, generating a massive quantity of data, and fielding teams of researchers to analyze the data. At this link is a collection of some of the tools developed by TCGA network researchers and collaborators that were used to analyze TCGA data.
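To make the "fixed URL syntax" mentioned in the E-utilities entry above more concrete, here is a minimal Python sketch that assembles an ESearch request URL from a set of input parameters. It only builds and prints the URL; it does not contact NCBI. The helper name `build_eutils_url` and the example search term are illustrative assumptions, not part of any NCBI library. Check the E-utilities documentation for the parameters each utility accepts and for NCBI's current usage guidelines before sending real requests.

```python
from urllib.parse import urlencode

# Base address shared by the E-utilities endpoints.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_eutils_url(utility, **params):
    """Build the URL for one E-utility call (hypothetical helper).

    `utility` is the program name, for example "esearch" or "efetch".
    Keyword arguments become URL query parameters; None values are skipped.
    """
    query = {key: value for key, value in params.items() if value is not None}
    return f"{EUTILS_BASE}/{utility}.fcgi?{urlencode(query)}"


# Example: search PubMed for records with "BLAST" in the title,
# asking for at most 5 IDs returned as JSON.
search_url = build_eutils_url(
    "esearch",
    db="pubmed",
    term="BLAST[Title] AND bioinformatics",
    retmax=5,
    retmode="json",
)
print(search_url)
```

The same helper could be pointed at another utility or database by changing the `utility` and `db` arguments, since every E-utility shares the same base address and query-string pattern.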
If we don't have access to an article you need...
Try using Northeastern's Interlibrary Loan system to request books or articles not available at NU. Articles can be delivered electronically.