FAQ - frequently asked questions:


What are transposable elements?


SINE - short interspersed repeats (SINEs) are short DNA sequences (<500 bases) derived from reverse-transcribed RNA molecules that were originally transcribed by RNA polymerase III, such as tRNA, 5S rRNA and other small nuclear RNAs. SINEs do not encode a functional reverse transcriptase and rely on other mobile elements for their transposition. The most common SINEs in primates are the Alu sequences.

LINE - long interspersed repeats (LINEs) are long DNA sequences (>5 kb) derived from reverse-transcribed RNA molecules that were originally transcribed by RNA polymerase II into mRNA. LINEs encode 2 proteins: one that binds single-stranded RNA, and another with reverse transcriptase and endonuclease activities. These enable LINEs to copy both themselves and noncoding SINEs such as Alu elements. A typical LINE contains a 5' UTR (untranslated region), 2 ORFs (open reading frames) and a 3' UTR. The 5' UTR contains an internal RNA polymerase II promoter, while the 3' UTR contains a polyadenylation signal (AATAAA) and a poly-A tail.


LTR retroposons (long terminal repeat) - these elements are flanked by long terminal direct repeats that contain all of the necessary transcriptional regulatory elements. The autonomous elements (retrotransposons) contain gag and pol genes, which encode a protease, reverse transcriptase, RNase H and integrase. Exogenous retroviruses seem to have arisen from endogenous retrotransposons by acquiring a cellular envelope gene (env). Transposition occurs through the retroviral mechanism: reverse transcription takes place in a cytoplasmic virus-like particle and is primed by a tRNA (in contrast to the nuclear location and chromosomal priming of LINEs).


DNA - DNA transposons resemble bacterial transposons: they have terminal inverted repeats and encode a transposase that binds near the inverted repeats and mediates mobility through a 'cut-and-paste' mechanism.


How do I search for a gene in the TranspoGene database?

TranspoGene offers 3 search options:
1. Search for a single gene, e.g.: NADK.
2. Search for a group of genes separated by commas, e.g.: CA1, Car1. Here Car1 is the murine ortholog of the human gene CA1, so both genes can be compared in the same query for the frequency and location of TEs.
3. Search using a wildcard (the '%' sign), e.g.: brca%. All genes with the prefix 'brca' are searched, so this query returns the brca1 and brca2 genes.
Comment: Since TranspoGene holds only one name for genes with multiple names, if you don't find your gene of interest, try using RefSeq or Swiss-Prot accessions, or enter the genomic positions of the gene.
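The three search modes can be emulated with a small Python sketch. It assumes (this is not TranspoGene's actual code) that the '%' sign behaves like the SQL LIKE wildcard. It uses an in-memory SQLite database with a hypothetical `genes` table and hypothetical helpers `build_demo_db` and `search`. SQLite's LIKE is case-insensitive for ASCII letters, which is why 'brca%' matches BRCA1 and BRCA2.

```python
import sqlite3
from typing import List


def build_demo_db() -> sqlite3.Connection:
    # Hypothetical in-memory table standing in for TranspoGene's gene index.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE genes (name TEXT)")
    conn.executemany(
        "INSERT INTO genes VALUES (?)",
        [("NADK",), ("CA1",), ("Car1",), ("BRCA1",), ("BRCA2",), ("TP53",)],
    )
    return conn


def search(conn: sqlite3.Connection, query: str) -> List[str]:
    """Hypothetical helper: single name, comma-separated names, or a '%' pattern."""
    terms = [t.strip() for t in query.split(",") if t.strip()]
    results: List[str] = []
    for term in terms:
        # In LIKE, '%' matches any run of characters; a term without '%' (or '_')
        # acts as an exact, ASCII case-insensitive match in SQLite.
        rows = conn.execute(
            "SELECT name FROM genes WHERE name LIKE ? ORDER BY name", (term,)
        )
        results.extend(row[0] for row in rows)
    return results


if __name__ == "__main__":
    db = build_demo_db()
    print(search(db, "NADK"))       # ['NADK']
    print(search(db, "CA1, Car1"))  # ['CA1', 'Car1']
    print(search(db, "brca%"))      # ['BRCA1', 'BRCA2']
```

Splitting on commas first and then matching each term separately mirrors the way a query like "CA1, Car1" compares two orthologs side by side.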

How do I read the results?

TE family - the species name, followed by an underscore and the family of the transposed element (for example: Alu, MIR)

TE name - the subfamily of the transposed element (for example: AluJb, MIRb)

Chromosome - the chromosome number on which the transposed element is located

TE strand - the strand on which the transposable element is located

Repeat position - the coordinates of the transposed element within the genome.

Example of transcript including the exon - an EST or cDNA accession that contains the repetitive element.

Transcript strand - the strand of the gene containing the transposed element.

Exon start - the start coordinate of the exon within the transcript

Exon end - the end coordinate of the exon within the transcript

3' splice site - the 3' splice site sequence of the exon, covering positions -6 to -1 relative to the beginning of the exon.

Exon sequence: overlapping part with TE seq is red, the rest in black - the sequence of the exon, in which the part derived from the transposed element is marked in red and the rest is marked in black.

5' splice site - the 5' splice site sequence of the exon, covering positions -3 to +6 relative to the end of the exon.
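The two splice-site windows can be illustrated with a short Python sketch. It assumes 0-based, half-open genomic coordinates on the plus strand (a minus-strand gene would additionally need reverse-complementing), and the function name `splice_site_windows` is hypothetical. The position numbering skips 0: -1 is the last base before the boundary and +1 the first base after it.

```python
from typing import Tuple


def splice_site_windows(genome: str, exon_start: int, exon_end: int) -> Tuple[str, str]:
    """Return (3' splice site, 5' splice site) windows for an exon.

    Assumes 0-based, half-open coordinates on the plus strand, i.e. the exon
    is genome[exon_start:exon_end].
    """
    if exon_start < 6 or exon_end + 6 > len(genome):
        raise ValueError("not enough flanking sequence around the exon")
    # Positions -6..-1 relative to the first exon base: the last 6 intronic bases.
    three_ss = genome[exon_start - 6:exon_start]
    # Positions -3..+6 relative to the exon end: last 3 exonic + first 6 intronic bases.
    five_ss = genome[exon_end - 3:exon_end + 6]
    return three_ss, five_ss


if __name__ == "__main__":
    # Toy sequence: intron tail, 6-base exon "ATGCAG", then intron start.
    genome = "AAATTTCAG" + "ATGCAG" + "GTAAGTAAA"
    print(splice_site_windows(genome, 9, 15))  # ('TTTCAG', 'CAGGTAAGT')
```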

TE sequence: overlapping part of exon is in black, the rest in red - the sequence of the transposed element, in which the part overlapping the resulting exon is marked in black and the intronic parts are marked in red.
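The red/black colouring of the exon and TE sequences comes down to intersecting two intervals. The Python sketch below splits an exon into TE-derived and non-TE pieces. It assumes 0-based, half-open coordinates with the exon and TE on the same strand, and the name `split_exon_by_te` is hypothetical, not part of TranspoGene.

```python
from typing import List, Tuple


def split_exon_by_te(exon_seq: str, exon_start: int,
                     te_start: int, te_end: int) -> List[Tuple[str, bool]]:
    """Split an exon into (segment, is_TE_derived) pieces.

    Assumes 0-based, half-open coordinates on the same strand for exon and TE.
    """
    exon_end = exon_start + len(exon_seq)
    # Intersection of the exon interval with the TE interval.
    ov_start = max(exon_start, te_start)
    ov_end = min(exon_end, te_end)
    if ov_start >= ov_end:
        return [(exon_seq, False)]  # no overlap: the whole exon is non-TE
    a, b = ov_start - exon_start, ov_end - exon_start
    parts = [(exon_seq[:a], False), (exon_seq[a:b], True), (exon_seq[b:], False)]
    return [p for p in parts if p[0]]  # drop empty pieces


if __name__ == "__main__":
    # Exon at positions 100-109; the TE covers 95-103, i.e. the first 4 exon bases.
    print(split_exon_by_te("ACGTACGTAC", 100, 95, 104))
```

Pieces flagged True correspond to the part of the exon shown in red. Taking the same intersection from the TE's side gives the part of the TE sequence shown in black.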

TE orientation to gene - transposed elements on the opposite strand to the transcript are marked as antisense, and those on the same strand are marked as sense (i.e. if TE strand ≠ Transcript strand, then TE orientation to gene = antisense; otherwise it is sense).
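The orientation rule is a single strand comparison. This Python sketch assumes strands are written as '+' and '-' (the encoding used on the results page may differ), and `te_orientation` is a hypothetical name.

```python
def te_orientation(te_strand: str, transcript_strand: str) -> str:
    """Return 'sense' if the TE and transcript share a strand, else 'antisense'.

    Assumes strands are encoded as '+' or '-'.
    """
    valid = {"+", "-"}
    if te_strand not in valid or transcript_strand not in valid:
        raise ValueError("strands must be '+' or '-'")
    return "sense" if te_strand == transcript_strand else "antisense"


if __name__ == "__main__":
    print(te_orientation("+", "-"))  # antisense
    print(te_orientation("-", "-"))  # sense
```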

Gene region - whether the exon is located in the coding sequence (CDS) or in an untranslated region (UTR).

Number of transcripts holding exon - the number of ESTs/cDNAs that contain the TE exon.

Number of transcripts skipping exons - the number of ESTs/cDNAs that do not contain the TE exon.

Inclusion level - calculated by the equation Ni/(Ni+Ne)

where Ni = the number of ESTs/cDNAs that contain the TE exon

and Ne = the number of ESTs/cDNAs that do not contain the TE exon
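The inclusion level formula Ni/(Ni+Ne) translates directly into code. In this Python sketch the name `inclusion_level` is hypothetical. Raising an error when no ESTs/cDNAs cover the exon (Ni + Ne = 0) is a choice of this sketch, because the ratio is undefined in that case.

```python
def inclusion_level(ni: int, ne: int) -> float:
    """Ni / (Ni + Ne): the fraction of ESTs/cDNAs that include the TE exon."""
    if ni < 0 or ne < 0:
        raise ValueError("counts must be non-negative")
    total = ni + ne
    if total == 0:
        raise ValueError("no ESTs/cDNAs cover this exon; inclusion level is undefined")
    return ni / total


if __name__ == "__main__":
    # 3 transcripts include the TE exon, 9 skip it: 3 / (3 + 9) = 0.25
    print(inclusion_level(3, 9))  # 0.25
```

A value of 1.0 means that every transcript covering the region contains the TE exon, and 0.0 means that none does.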

Splice mode - whether the exon is a skipped exon (exon skipping), a constitutive exon, an alternative 5' splice site (5'ss) exon or an alternative 3' splice site (3'ss) exon.

Organism - the genome in which this exon was detected: human, mouse, chicken, zebrafish, fly, nematode or Ciona intestinalis.

Gene name - the name of the gene in which the exonization occurs.

Gene strand - the strand from which the gene is transcribed

RefSeq accession protein - the RefSeq accession number of the protein encoded by the gene.

Protein description - a description of the protein's function; usually this is what the gene name stands for.

Gene related diseases - diseases associated with malfunction of this gene.

Expression in normal tissues in Affymetrix exon chip - the tissues in which this gene is expressed according to published Affymetrix microarray data.

Location of TE on refseq mRNA - the number of the intron or exon in which the transposed element is located within the RefSeq sequence (for example, a transposed element may be located within the first of the 10 introns of the gene's RefSeq transcript).
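Finding the intron or exon that holds a TE is a walk over the transcript's exon intervals. This Python sketch assumes sorted, 0-based, half-open exon coordinates on the plus strand, so for a minus-strand gene the numbering would have to be reversed. It reports an exon if the TE overlaps it at all, which is a simplification of this sketch. The function `te_location` is hypothetical.

```python
from typing import List, Optional, Tuple


def te_location(exons: List[Tuple[int, int]], te_start: int, te_end: int) -> Optional[str]:
    """Return 'exon k' or 'intron k of N' for a TE, or None if outside the gene.

    Assumes sorted, 0-based, half-open exon intervals on the plus strand.
    """
    # Any overlap with an exon is reported as that exon (1-based numbering).
    for k, (start, end) in enumerate(exons, start=1):
        if te_start < end and start < te_end:
            return f"exon {k}"
    n_introns = len(exons) - 1
    # Otherwise look for an intron that fully contains the TE.
    for k in range(1, len(exons)):
        intron_start, intron_end = exons[k - 1][1], exons[k][0]
        if intron_start <= te_start and te_end <= intron_end:
            return f"intron {k} of {n_introns}"
    return None


if __name__ == "__main__":
    exons = [(100, 200), (500, 600), (900, 1000)]
    print(te_location(exons, 250, 400))  # intron 1 of 2
```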

Alignment to consensus sequence - the RepeatMasker (www.repeatmasker.org) alignment of the transposed element sequence to its consensus sequence. The upper row (containing the name of the link) is the current TE sequence.



How do I interpret the TE alignment files?


The TE alignment files contain the alignment obtained when the current TE is aligned to its consensus sequence using the RepeatMasker program. To open an alignment file, click the link labeled with the internal identifier of the current TE sequence. Each row in an alignment file starts with either the TE identifier or the TE type: rows starting with the TE identifier show the sequence of the current TE in the genome, and rows starting with the TE type show the consensus sequence of the TE. The "i", "v" and "-" signs mark transitions (purine->purine or pyrimidine->pyrimidine), transversions (purine->pyrimidine or vice versa) and gaps, respectively. The numbers at the beginning of each row mark the position in the relevant sequence.
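The "i"/"v"/"-" marks can be reproduced from two aligned rows with a short Python sketch. It assumes both rows are already aligned, of equal length, and use '-' for gaps. A space for identical bases and '?' for ambiguous bases such as N are assumptions of this sketch, not taken from the RepeatMasker documentation. The name `mark_alignment` is hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def mark_alignment(genomic: str, consensus: str) -> str:
    """Build the marker row between a TE row and its consensus row."""
    if len(genomic) != len(consensus):
        raise ValueError("aligned rows must have equal length")
    marks = []
    for g, c in zip(genomic.upper(), consensus.upper()):
        if g == "-" or c == "-":
            marks.append("-")  # gap in either sequence
        elif g == c:
            marks.append(" ")  # identical bases
        elif {g, c} <= PURINES or {g, c} <= PYRIMIDINES:
            marks.append("i")  # transition: purine<->purine or pyrimidine<->pyrimidine
        elif (g in PURINES and c in PYRIMIDINES) or (g in PYRIMIDINES and c in PURINES):
            marks.append("v")  # transversion: purine<->pyrimidine
        else:
            marks.append("?")  # ambiguous base (assumption of this sketch)
    return "".join(marks)


if __name__ == "__main__":
    print(repr(mark_alignment("AGCT-A", "GGTTCC")))  # 'i i -v'
```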



What citation should I use for the website?

Please cite the paper:
Levy A., Sela N., Ast G.
TranspoGene and microTranspoGene: transposed elements influence on the transcriptome of seven vertebrates and invertebrates,

Nucl. Acids Res., 36, D47-D52 (2008)

and the website: