Origin of the promoter sequences

The C. intestinalis promoter sequences present in DBTGR were originally obtained from the JGI genome version 1.00. After the release of version 2.00, the corresponding sequences were obtained by BLAST searches, keeping only hits with E-values smaller than 10^-100. The exact location of the 3'-end of each promoter was then mapped by a second BLAST search using the last 25 nucleotides of the promoter. Not all promoter sequences could be mapped to the newer genome version. In DBTGR, the latest available version is used for motif searches and cross-species sequence alignments. When two versions of a promoter exist, both are available for download.

The C. savignyi promoter sequences were obtained from the VISTA prealignment of the C. savignyi genome and C. intestinalis genome version 1.00. These sequences should therefore be treated with caution, as they may not represent the desired promoter, or may represent it only partially.

Promoter sequence features

When the promoter sequence is available, it is displayed along with the identified regulatory elements. The TSS and TATA boxes are shown as highlighted bases, while binding sites are shown as arrows, either above or below the nucleotide sequence, depending on the strand on which they are found. A question mark is added next to predicted binding sites. Moving the mouse over a specific binding site displays information about it.
Sites resulting from a motif search are shown as highlighted text in the sequence, with different colors depending on the strand on which they are found.

Cross-species promoter sequence alignment

The sequence alignments between C. savignyi and C. intestinalis promoter sequences were obtained using ClustalW with the default parameters.
Bases can be highlighted based on two parameters: the minimal length of an exact match sequence and the maximum distance between two such sequences. Using these two parameters, a visual overview of the sequence conservation can be obtained.
Clickable arrows representing the different identified binding sites are displayed above or under the two aligned sequences.
The arrows above the alignment refer to the upper sequence while the ones under the alignment refer to the lower sequence.
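
The two highlighting parameters described above can be illustrated with a minimal sketch. It is not the DBTGR implementation: the function name conserved_blocks, the treatment of gap characters as mismatches, and the choice to highlight the columns between two merged matches are all assumptions made for illustration.

```python
def conserved_blocks(seq_a, seq_b, min_len, max_dist):
    """Return (start, end) column ranges (end exclusive) to highlight.

    seq_a and seq_b are the two rows of a pairwise alignment, gaps as '-'.
    Assumption: a gap column never counts as a match.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")

    # Step 1: collect runs of identical columns that reach min_len.
    runs = []
    start = None
    for i, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper())):
        same = a == b and a != "-"
        if same and start is None:
            start = i
        elif not same and start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    if start is not None and len(seq_a) - start >= min_len:
        runs.append((start, len(seq_a)))

    # Step 2: merge runs separated by at most max_dist columns.
    merged = []
    for run in runs:
        if merged and run[0] - merged[-1][1] <= max_dist:
            merged[-1] = (merged[-1][0], run[1])
        else:
            merged.append(run)
    return merged


upper = "ACGTTTGACCA-GT"
lower = "ACGTATGACCATGT"
print(conserved_blocks(upper, lower, min_len=3, max_dist=0))
print(conserved_blocks(upper, lower, min_len=3, max_dist=1))
```

Raising the minimal length hides short chance matches, while raising the maximum distance joins neighbouring matches into larger conserved regions.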

Sequence search

The scores returned by either the consensus or the weight matrix search are normalized to range from 0 to 1.

Consensus search
The following letters can be used in the consensus sequence:

A, C, G, T
K (G/T), M (A/C), R (A/G), S (C/G), W (A/T), Y (C/T)
B (C/G/T), D (A/G/T), H (A/C/T), V (A/C/G)
N (A/C/G/T)

Other letters will be counted as a mismatch.

The scores of the consensus searches represent the number of matches between the query and the sequence divided by the length of the query.
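
The consensus score defined above can be sketched as follows. The letter table comes from the list above; the function names, the forward-strand-only scan, and the min_score threshold are assumptions made for illustration and do not describe the DBTGR code itself.

```python
# Bases accepted by each letter allowed in a consensus query.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "K": "GT", "M": "AC", "R": "AG", "S": "CG", "W": "AT", "Y": "CT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def consensus_score(query, window):
    """Number of matching positions divided by the query length."""
    if not query:
        raise ValueError("query must not be empty")
    matches = sum(
        1
        for q, base in zip(query.upper(), window.upper())
        # Letters outside the table map to "" and so count as a mismatch.
        if base in IUPAC.get(q, "")
    )
    return matches / len(query)


def consensus_search(query, sequence, min_score=1.0):
    """Scan the forward strand; return (start, score) for each hit."""
    hits = []
    for start in range(len(sequence) - len(query) + 1):
        score = consensus_score(query, sequence[start:start + len(query)])
        if score >= min_score:
            hits.append((start, score))
    return hits


print(consensus_score("TATAWA", "TATAAA"))
print(consensus_search("TATAWA", "GGTATATAAGG"))
```

Because the score is a fraction of the query length, a query of six letters with one mismatch scores 5/6, whatever the position of the mismatch.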

Weight matrix search
In addition to using the provided weight matrices, a user-defined matrix can be given. It should consist of four lines of integer values separated by spaces. Each line corresponds to one base, in the order ACGT. The first number of a line is the number of times the corresponding base occurs at the first position of the aligned sequences used to generate the matrix. The second number corresponds to the second position, and so on.

The odds score for a position is calculated based on the method presented in Mount, D. W. (2001). Bioinformatics: Sequence and Genome Analysis. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, USA.

The scores of the weight matrix searches are normalized using the best possible score for the matrix used.
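
The matrix format and the normalization described above can be sketched as follows. Only the four-line ACGT count format is taken from this page. The log2 odds formula with an add-one pseudocount, the uniform background of 0.25, and the rescaling between the worst and best possible scores are assumptions chosen to give values from 0 to 1; the exact formula used by DBTGR is not stated here. All function names are hypothetical.

```python
import math

BASES = "ACGT"


def parse_count_matrix(text):
    """Parse four lines of space-separated integers, in the order A, C, G, T."""
    rows = [[int(v) for v in line.split()] for line in text.strip().splitlines()]
    if len(rows) != 4 or len({len(r) for r in rows}) != 1:
        raise ValueError("expected four rows of equal length (A, C, G, T)")
    return dict(zip(BASES, rows))


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Turn counts into per-position log2 odds (assumed formula)."""
    matrix = []
    for pos in range(len(counts["A"])):
        total = sum(counts[b][pos] for b in BASES)
        column = {}
        for b in BASES:
            freq = (counts[b][pos] + pseudocount) / (total + 4 * pseudocount)
            column[b] = math.log2(freq / background)
        matrix.append(column)
    return matrix


def normalized_score(matrix, window):
    """Score a window and rescale so the best possible window scores 1."""
    if len(window) != len(matrix):
        raise ValueError("window and matrix must have the same length")
    best = sum(max(col.values()) for col in matrix)
    worst = sum(min(col.values()) for col in matrix)
    if best == worst:  # flat matrix: every window scores the same
        return 1.0
    # Assumption: letters other than A, C, G, T get the column's lowest score.
    raw = sum(
        col.get(base, min(col.values()))
        for col, base in zip(matrix, window.upper())
    )
    return (raw - worst) / (best - worst)


user_matrix = """
8 0 0
0 8 0
0 0 8
0 0 0
"""
pwm = log_odds_matrix(parse_count_matrix(user_matrix))
print(normalized_score(pwm, "ACG"))
print(normalized_score(pwm, "ACT"))
```

In this example the matrix was built from eight aligned sites that all read ACG, so ACG receives the best possible score and any departure from it lowers the normalized value.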

Searching DBTGR

Searching the database with an empty query will result in the listing of all entries.

When searching the database, the following modifiers can be applied to the query terms:
+	will return only rows where the modified term is present (e.g. +muscle)
-	will return only rows where the modified term is absent (e.g. -muscle)
*	will return matches where words starting with the modified term are present (e.g. neur* will match both neuron and neural)
"..."	will return only matches containing the enclosed phrase (e.g. "central nervous system")

On the result page, the arrowheads above and below a field name allow the results to be reordered based on the data in that field.
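
The effect of the query modifiers listed above can be illustrated with a small sketch that tests a single row of text against a query. It is not the DBTGR search engine. The function names are hypothetical, and the handling of terms without a modifier (at least one must match when no + term is given) is an assumption, since this page does not describe it.

```python
import re


def _compile(term):
    """Build a case-insensitive whole-word pattern for one query term."""
    if term.startswith('"'):
        pattern = r"\b" + re.escape(term.strip('"')) + r"\b"  # exact phrase
    elif term.endswith("*"):
        pattern = r"\b" + re.escape(term[:-1]) + r"\w*"  # word prefix
    else:
        pattern = r"\b" + re.escape(term) + r"\b"
    return re.compile(pattern, re.IGNORECASE)


def parse_query(query):
    """Split a query into required (+), excluded (-) and optional patterns."""
    required, excluded, optional = [], [], []
    for sign, term in re.findall(r'([+-]?)("[^"]+"|\S+)', query):
        target = {"+": required, "-": excluded}.get(sign, optional)
        target.append(_compile(term))
    return required, excluded, optional


def row_matches(row_text, query):
    """Return True if the row satisfies the query."""
    required, excluded, optional = parse_query(query)
    if any(not p.search(row_text) for p in required):
        return False
    if any(p.search(row_text) for p in excluded):
        return False
    if optional and not required:
        # Assumption: with no + term, at least one plain term must match.
        return any(p.search(row_text) for p in optional)
    return True  # an empty query matches every row


rows = ["muscle actin promoter", "neural tube marker",
        "central nervous system enhancer"]
print([r for r in rows if row_matches(r, "neur* -muscle")])
print([r for r in rows if row_matches(r, '"central nervous system"')])
```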

Obtaining information about binding motifs

From the detail page of a promoter, information about a binding motif can be obtained by clicking on either the name of a site in the binding sites lists, or on the arrow displayed with the promoter sequence. The information available includes a list of transcription factors known to bind the motif, a list of all occurrences of this motif in the database and a position-specific weight matrix computed from the binding sites known to be involved in regulation.

On the detail page, the arrowheads above and below a field name allow the listed binding sites to be reordered based on the data in that field. Checking or unchecking the box in the display field and then clicking on the reload button at the end of the list selects which sites are shown in the sequence view.

Obtaining information about transcription factors

From the detail page of a promoter, information about a transcription factor can be obtained by clicking on its name in the binding sites lists. The information available includes a list of motifs known to be bound by the transcription factor, as well as a list of all the genes it is known to regulate.

Contributors

Nicolas Sierro	Human Genome Center, Institute of Medical Science, University of Tokyo
Riu Yamashita	Human Genome Center, Institute of Medical Science, University of Tokyo
Keun-Joon Park	Human Genome Center, Institute of Medical Science, University of Tokyo
Takehiro Kusakabe	Department of Life Science, Graduate School of Life Science, University of Hyogo
Kengo Kinoshita	Human Genome Center, Institute of Medical Science, University of Tokyo
Kenta Nakai	Human Genome Center, Institute of Medical Science, University of Tokyo