One of the three categories used by the Gene Ontology project, cellular
component encompasses subcellular structures, locations, and
macromolecular complexes. Examples include nucleus,
telomere, and origin recognition complex.
European Molecular Biology Laboratory. The EMBL
Nucleotide Sequence database is a comprehensive database of DNA and
RNA sequences. The database is produced in collaboration with GenBank and the DNA Data Bank of Japan (DDBJ).
The Entrez Search System was developed by NCBI. Entrez allows you to
retrieve molecular biology data and bibliographic citations from
integrated nucleotide (GenBank, DDBJ, EMBL), protein (Swiss-Prot, PIR, PRF, PDB), and bibliographic (PubMed) databases. Within SMD database
pages, external links are provided to one or more of these databases.
The Gene Ontology (GO) project was established to provide a
common language to describe aspects of a gene product's biology. The
use of a consistent vocabulary allows genes from different species to
be compared based on their GO annotations. For each of three
categories of biological information--function, process, and cellular
component--a set of terms has been selected and organized. Each set of
terms uses a controlled vocabulary, and parent-child relationships
between terms are defined. This combination of a controlled vocabulary
with defined relationships between items is an ontology. Within an
ontology, a child may be a "part of" or an example ("instance") of its
parent. There are three independently organized controlled
vocabularies, or gene ontologies: one for function, one for process, and one for cellular component. Many-to-many
parent-child relationships are allowed in the process and cellular
component ontologies. A gene may be annotated to any level in an
ontology, and to more than one item within an ontology. The Gene
Ontology project is a collaboration between three model organism
databases, FlyBase (Drosophila), Saccharomyces Genome Database (SGD)
and Mouse Genome Informatics (MGI).
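The parent-child structure described above can be sketched in a few lines of Python. This is only an illustrative toy, not actual GO data or any GO project API: the term names, the relationship table, and the gene identifiers (GENE_A and so on) are all hypothetical. It shows a child with more than one parent and how annotating a gene at any level lets it be found through a more general term.

```python
# Toy ontology: each term maps to a list of (relationship, parent) pairs.
# Hypothetical data for illustration only; not taken from the GO files.
PARENTS = {
    "nucleus": [("part_of", "cell")],
    "chromosome": [("part_of", "cell")],
    # A child may have several parents (many-to-many relationships).
    "nuclear chromosome": [("instance_of", "chromosome"), ("part_of", "nucleus")],
    "telomere": [("part_of", "chromosome")],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following parent links upward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relation, parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def genes_annotated_under(term, annotations, parents=PARENTS):
    """Genes annotated to `term` itself or to any of its descendants."""
    return sorted(
        gene
        for gene, terms in annotations.items()
        if any(t == term or term in ancestors(t, parents) for t in terms)
    )


# Hypothetical annotations: a gene may be annotated at any level.
ANNOTATIONS = {
    "GENE_A": ["telomere"],
    "GENE_B": ["nuclear chromosome"],
    "GENE_C": ["nucleus"],
}

print(genes_annotated_under("chromosome", ANNOTATIONS))
```

Because annotations are resolved through ancestors, a search on a general term also returns genes annotated to its more specific children.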
A keyword is a word identified as particularly informative about an
object. In a sequence, a keyword often relates to the identity of a
gene or the function of the gene product. References often have a
list of keywords that are Medline MeSH terms. Keywords are useful
terms for text searches.
The National Center for
Biotechnology Information (NCBI) is part of the National Library
of Medicine (NLM) in the National Institutes of Health (NIH). Its
mission is to develop new information technologies to aid in the
understanding of fundamental molecular and genetic processes that
control health and disease. NCBI developed and maintains the Entrez Search System and PubMed database.
An ORF (Open Reading Frame) corresponds to a stretch of DNA that
could potentially be translated into a polypeptide; i.e., it begins
with an ATG "start" codon and terminates with one of the three "stop"
codons. For an ORF to be considered as a good candidate for coding
a bona fide cellular protein, a minimum size requirement is often
set, e.g., many of the systematic sequencing groups define an ORF as a
stretch of DNA that would code for a protein of 100 amino acids or
more. An ORF is not usually considered equivalent to a gene or locus
until there has been shown to be a phenotype associated with a
mutation in the ORF, and/or an mRNA transcript or a gene product
generated from the ORF's DNA has been detected. See ORF naming
conventions for how ORFs are named in Saccharomyces.
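As a rough illustration of the definition above, the following Python sketch scans one strand of a DNA string for stretches that begin with ATG and end at the first in-frame stop codon, keeping only those that meet a minimum length. The function name find_orfs and its min_codons parameter are assumptions made for this example; the sketch checks only the three forward reading frames and ignores the reverse strand, so it is not a description of how any sequencing group actually calls ORFs.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=100):
    """Return (start, end, frame) for each ORF on the given strand.

    start and end are 0-based, end-exclusive, and include the stop codon.
    min_codons counts coding codons (the ATG included, the stop excluded).
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if n_codons >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


# Tiny made-up sequence: ATG + 4 codons + TAA, offset by two bases.
example = "CC" + "ATG" + "GCT" * 4 + "TAA" + "GG"
print(find_orfs(example, min_codons=5))
```

With the default min_codons=100, the same call would return an empty list, which mirrors the 100 amino acid cutoff mentioned above.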
All S. cerevisiae ORFs are designated by
a symbol consisting of three uppercase letters followed by a number
and then another letter, as follows: Y (for "Yeast"); A - P for the
chromosome upon which the ORF resides (where "A" is chromosome I, up
to "P" for chromosome XVI); L or R (for Left or Right arm); a 3-digit
number corresponding to the order of the open reading frame on the
chromosome arm (starting from the centromere and counting out to the
telomere); and W or C for whether the open reading frame is on the
"Watson" or "Crick" strand (where "Watson" runs 5' to 3' from left
telomere to right telomere). Most ORF designations by the systematic
sequencing groups use a predicted 100 amino acid polypeptide as the
minimum size limit, except when a smaller gene has already been
characterized and localized to the chromosomal sequence. When a new
ORF is discovered on a chromosome that has already had its ORFs
named, the new ORF will usually be named by taking the name of an
adjacent ORF and adding an "A" or "B" to the end of it (this avoids
re-numbering all the distal ORFs).
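The naming scheme above is regular enough to decode mechanically. The Python sketch below splits a systematic name into its parts; the function name parse_orf_name and the returned field names are hypothetical, and the handling of the trailing "A" or "B" suffix (accepted with or without a hyphen) is an assumption, since the text does not specify the exact punctuation.

```python
import re

# Chromosome letters A..P correspond to chromosomes I..XVI.
ROMAN = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII",
         "IX", "X", "XI", "XII", "XIII", "XIV", "XV", "XVI"]

# Y, chromosome letter, arm, 3-digit number, strand, optional suffix letter.
ORF_NAME = re.compile(r"Y([A-P])([LR])(\d{3})([WC])(?:-?([A-Z]))?")


def parse_orf_name(name):
    """Decode a systematic ORF name, or return None if it does not match."""
    match = ORF_NAME.fullmatch(name.strip().upper())
    if match is None:
        return None
    chrom, arm, number, strand, suffix = match.groups()
    return {
        "chromosome": ROMAN[ord(chrom) - ord("A")],
        "arm": "left" if arm == "L" else "right",
        "order_from_centromere": int(number),
        "strand": "Watson" if strand == "W" else "Crick",
        "suffix": suffix,  # None unless a later-discovered ORF was inserted
    }


print(parse_orf_name("YAL001C"))
```

Returning None for names that do not fit the pattern keeps the caller free to decide how to treat non-systematic gene names.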
Protein Data Bank. The PDB archive contains information about experimentally determined
structures of proteins, nucleic acids, and complex assemblies. As a
member of the wwPDB, the RCSB PDB curates and annotates PDB data
according to agreed-upon standards.
Saccharomyces Genome Database. The SGD project collects
information and maintains a database of the molecular biology of the
yeast Saccharomyces cerevisiae. This database includes a
variety of genomic and biological information. SGD is funded by the
National Center for Human Genome Research (NCHGR) at the U.S. National
Institutes of Health. The SGD is in the Department of Genetics at the
School of Medicine, Stanford University. The SGD Homepage is located
Stanford Microarray Database. The SMD project stores raw and normalized data from microarray experiments, as well as their corresponding image files. In addition, SMD provides interfaces for data retrieval, analysis, and visualization.
SMD is funded by the National Cancer Institute at the US National Institutes of Health, the National Science Foundation, and the Howard Hughes Medical Institute. The database is a joint project of the Departments of Biochemistry and Genetics at the School of Medicine, Stanford University.
The SMD Homepage is located at http://smd.Stanford.EDU/.
SwissProt is an annotated protein sequence database. Within a Locus page, an
external link is provided (at the "SwissProt" tag) to the SwissProt
entry for the gene, which includes the amino acid sequence for the
protein encoded by the gene.
SMD uses an asterisk "*" as a wildcard symbol. In a search, the
wildcard character marks a position where any text may appear. For
example, searching for the category "DNA*" will return all categories that begin with "DNA". Because the database otherwise requires exact matches for searches to be productive, many types of searches depend on careful use of the "*" wildcard.
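The matching behavior described above can be approximated in Python as follows. This is a sketch of the general technique only, not SMD's actual search code: the function names and the sample category list are hypothetical, and case-sensitive matching is assumed because the text says searches otherwise require exact matches.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a pattern where '*' means 'any text' into a compiled regex.

    Every other character is matched literally.
    """
    literal_parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile(".*".join(literal_parts))


def search(pattern, categories):
    """Return the categories that match the whole wildcard pattern."""
    regex = wildcard_to_regex(pattern)
    return [c for c in categories if regex.fullmatch(c)]


# Hypothetical category names, for illustration only.
CATEGORIES = ["DNA repair", "DNA replication", "RNA processing", "mitochondrial DNA"]

print(search("DNA*", CATEGORIES))   # categories beginning with "DNA"
print(search("*DNA*", CATEGORIES))  # categories containing "DNA" anywhere
print(search("DNA", CATEGORIES))    # no wildcard: exact match required
```

The last call returns nothing because no category is exactly "DNA", which is why a wildcard is needed for many searches.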
The Yeast Protein
Database (YPD) is maintained by Proteome, Inc. YPD contains
physical, functional and some genetic information about
Saccharomyces cerevisiae. SMD provides direct links to YPD into gene name search