Below is an example of the detailed output one receives by searching for 'EST2' and following the link to the putative PRF signal at position 1995. Each field of the page is described below.
Example entry from the PRFdb
Species: Saccharomyces cerevisiae
Accession: SGDID:S0004310
Position: 1995
Algorithm: pknots
Barcode: 112233
Sequence Length: 100
Slippery Site: UUUAAAA
Base Pairs: 21
MFE: -16.3 kcal/mol
Z Score: -0.73
Randomized Mean MFE: -13.9 ± 0.3 kcal/mol
PPCC: 0.9632
Sequence: CUGGCUGACGAUUUCCUUAUAAUAUCAACAGACCAACAGCAAGUGAUCAAUAUCAAAAAGCUUGCCAUGGGCGGAUUUCAAAAAUAUAAUGCGAAAGCCA
Parens: .((((((.....................))).))).((((((((...............))))))..))(((...((((.............))))))).
Parsed: .111111.....................111.111.22222222...............222222..22333...3333.............3333333.
Feynman: [Feynman diagram of the predicted structure]
Most information is kept on the 'detail' page. Once a particular
gene and putative PRF site have been chosen, this page shows more
specific information about that PRF signal.
Each group of information concerns one folding performed for the
given PRF signal, so the gene's name, species, accession, and
position are shown again.
The 'algorithm' field specifies which minimum free energy folding
program was used to fold this specific sequence window.
'Barcode' provides a shorthand notation to quickly identify how many
stems exist in the putative secondary structure and how they are
oriented. In this notation, a simple stem is '11' and an H-type
pseudoknot is '1212'.
'Sequence length' and 'Slippery site' provide some information about
the context of the given sequence window.
'Base pairs' and 'MFE' show the measurements provided by the minimum
free energy heuristics of the given prediction algorithm.
'Z score,' 'Randomized Mean MFE,' and 'PPCC' provide some
statistical information about how this particular sequence window
compares to a distribution of randomized sequences folded in a
similar fashion.
'Z score' is defined as (actual MFE - mean MFE of the randomized
sequences) / standard deviation of the randomized MFEs.
As a result, the more negative the Z score, the more interesting
the sequence window is likely to be.
PPCC is a measurement which can give a sense of how likely the Z
score is to be useful. A Z score assumes its data is normally
distributed. The PPCC provides a measurement of how close to
'normal' a given set of values actually is.
These last few fields are summarized in the graph on the right side
of the table. This bar graph shows a series of 'bins' and what
percentage of the randomized sequences have minimum free energies
falling into each 'bin.' A red line provides a guide to how many
entries one would expect in each bin if the minimum free energy
values were normally distributed. The green line shows where on
this distribution the actual minimum free energy lies. If the
actual MFE is more negative than the mean of the distribution of
random values, that is a good sign for the significance of the
putative PRF signal.
Below this graph lie a sequence view and several visualizations of
the putative mRNA secondary structure: a bracket notation, a numeric
stem diagram, and a linear Feynman diagram.
Species
The 'species' field gives the species of the gene. Currently the
only species one is likely to see are Saccharomyces cerevisiae,
Homo sapiens, and Mus musculus.
Graph
Every sequence window of the PRFdb is randomized using one of a few
randomization strategies (simple shuffling, or randomization that
maintains nucleotide frequencies, dinucleotide frequencies, or the
reading frame). In each case, the sequence window is randomized and
refolded with the given algorithm 100 times. This plot shows how
often the randomized sequences produce each range of minimum free
energies. In the example graph, 10 of the 100 randomized sequences
have a minimum free energy of -15.4 kcal/mol when folded with
pknots. The red line shows an idealized normal distribution: how
many sequences one would expect in each range if the randomized
MFEs were normally distributed. Finally, the green bar marks the
actual minimum free energy calculated by pknots at this position.
The farther to the left of the mean this value lies, the more
significant it is likely to be with respect to the randomized
sequences.
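The randomize-and-bin procedure behind this plot can be sketched in Python. This is a minimal sketch under stated assumptions, not the PRFdb code: `shuffle_preserving_composition` and `mfe_histogram` are hypothetical names, the shuffle is a plain permutation (it keeps nucleotide frequencies but not dinucleotide frequencies or the reading frame), the folding step is omitted because it needs pknots itself, the MFE values in the usage lines are synthetic, and 10 equal-width bins is an arbitrary choice. The expected counts play the role of the red line.

```python
import random
from statistics import NormalDist, mean, stdev

def shuffle_preserving_composition(seq, rng=random):
    """Random permutation of seq; nucleotide counts stay the same."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)

def mfe_histogram(mfes, n_bins=10):
    """Return bin edges, observed counts, and the counts expected per bin
    under a normal distribution with the sample mean and standard deviation."""
    lo, hi = min(mfes), max(mfes)
    if hi == lo:
        raise ValueError("all MFE values are identical")
    width = (hi - lo) / n_bins
    edges = [lo + k * width for k in range(n_bins + 1)]
    observed = [0] * n_bins
    for x in mfes:
        k = min(int((x - lo) / width), n_bins - 1)  # the maximum goes in the last bin
        observed[k] += 1
    normal = NormalDist(mean(mfes), stdev(mfes))  # idealized normal ("red line")
    expected = [len(mfes) * (normal.cdf(edges[k + 1]) - normal.cdf(edges[k]))
                for k in range(n_bins)]
    return edges, observed, expected

# Usage with made-up numbers: the MFEs are synthetic draws standing in for
# the values a folding program would return for 100 randomized windows.
rng = random.Random(1)
window = "CUGGCUGACGAUUUCCUUAUAAUAUCAACAGACCAACAGCAAGUGAUCAAUAUCAAAAAGCUUGCCAUGGGCGGAUUUCAAAAAUAUAAUGCGAAAGCCA"
shuffled = shuffle_preserving_composition(window, rng)
synthetic_mfes = [rng.gauss(-13.9, 3.3) for _ in range(100)]
edges, observed, expected = mfe_histogram(synthetic_mfes)
```

With 100 randomizations, each bin's count is also the percentage of randomized sequences falling in that bin.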
Accession
The accession provides the GenBank or SGD accession of the given
gene. Clicking on the accession itself takes you back to the gene
view.
Position
This number gives the position, in bases from the beginning of the
mRNA, at which the putative PRF signal is found. To get a sense of
where this lies within the gene, it may be most useful to follow the
accession link and look at the MFE minima graph or the text
representation of the message.
Algorithm
The algorithm may be one of 'pknots,' 'nupack,' or 'hotknots.'
These secondary structure prediction programs provide a putative
mRNA structure as well as a putative minimum free energy.
Barcode
The 'barcode' is a shorthand representation of the secondary
structure of a given sequence window. Each stem is numbered, and its
number is written once for each of its two arms, in the order the
arms occur along the sequence. A sequence consisting of a single
stem thus shows up as '11', an H-type pseudoknot as '1212', and a
SARS-like pseudoknot with a secondary stem as '121332'.
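One way to reproduce a barcode from the Parens and Parsed strings, assuming each arm of a stem contributes one digit: walk the paired positions and write the stem number whenever the (stem, bracket direction) combination changes. `barcode` is a hypothetical helper, not the PRFdb implementation; it treats '(', '[', '{' and '<' as opening brackets and skips unpaired positions, so a bulge inside an arm does not start a new arm.

```python
OPEN_BRACKETS = "([{<"

def barcode(parens, parsed):
    """Write a stem's number each time a new arm starts. An arm is a run of
    paired positions (unpaired ones skipped) sharing one stem number and one
    bracket direction."""
    if len(parens) != len(parsed):
        raise ValueError("parens and parsed strings must have equal length")
    digits = []
    previous = None
    for bracket, stem in zip(parens, parsed):
        if stem == ".":
            continue  # unpaired base, e.g. a loop or bulge
        arm = (stem, bracket in OPEN_BRACKETS)  # (stem number, opening arm?)
        if arm != previous:
            digits.append(stem)
            previous = arm
    return "".join(digits)

# The example entry's strings, written with repeat counts so the 100
# positions are easy to check against the page.
PARENS = (".((((((" + "." * 21 + "))).)))." + "((((((((" + "." * 15 + "))))))..))"
          + "(((...((((" + "." * 13 + ")))))))" + ".")
PARSED = (".111111" + "." * 21 + "111.111." + "22222222" + "." * 15 + "222222..22"
          + "333...3333" + "." * 13 + "3333333" + ".")

print(barcode(PARENS, PARSED))  # 112233
print(barcode("((((..{{{{..))))..}}}}", "1111..2222..1111..2222"))  # 1212
```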
Sequence Length
The sequence length is usually 100 bases. However, it is variable
for Saccharomyces cerevisiae sequences.
Slipsite
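There are 24 distinct slippery heptamers, which follow the motif
NNNWWWH: three identical nucleotides of any kind (NNN), then three
identical A or U nucleotides (WWW), then any nucleotide except G (H),
giving 4 × 2 × 3 = 24 combinations. This field shows which heptamer
is used by this sequence window.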
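The 24 heptamers can be enumerated directly from the motif and used to scan a sequence. A minimal sketch: `slippery_heptamers` and `find_slippery_sites` are hypothetical helper names, and positions are reported 0-based, which may not match the convention of the Position field.

```python
from itertools import product

def slippery_heptamers():
    """All heptamers NNNWWWH: three identical bases of any kind, three
    identical A or U bases, then any base except G."""
    return {n * 3 + w * 3 + h for n, w, h in product("ACGU", "AU", "ACU")}

def find_slippery_sites(seq):
    """(0-based start, heptamer) for every slippery heptamer in seq."""
    motifs = slippery_heptamers()
    return [(i, seq[i:i + 7]) for i in range(len(seq) - 6) if seq[i:i + 7] in motifs]

print(len(slippery_heptamers()))            # 24
print(find_slippery_sites("GGUUUAAAACC"))   # [(2, 'UUUAAAA')]
```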
Number of Base Pairs
This field provides the number of base pairs in the predicted
structure. In the example entry, the 21 base pairs involve 42 paired
bases.
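The relationship between the bracket notation and this field can be checked in a few lines, assuming (as the example entry suggests) that the field counts pairs rather than paired bases: each opening bracket starts exactly one pair. `count_base_pairs` and `count_paired_bases` are hypothetical names.

```python
def count_base_pairs(parens):
    """Each opening bracket starts exactly one base pair."""
    return sum(parens.count(c) for c in "([{<")

def count_paired_bases(parens):
    """Every non-dot position is one partner in some pair."""
    return sum(c != "." for c in parens)

# Parens string of the example entry (100 positions).
PARENS = (".((((((" + "." * 21 + "))).)))." + "((((((((" + "." * 15 + "))))))..))"
          + "(((...((((" + "." * 13 + ")))))))" + ".")

print(count_base_pairs(PARENS), count_paired_bases(PARENS))  # 21 42
```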
MFE
The minimum free energy is calculated by summing the free energy
changes associated with known structural elements, including base
stacking, GC/AU/GU base pairs, internal loops and bulges, hairpin
loops, and known motifs such as tetraloops. Each secondary structure
prediction algorithm makes different assumptions about these
contributions.
This field gives the putative minimum free energy calculated for
this sequence window by the selected algorithm (pknots in this
example).
Z Score
The Z score of a given sequence is calculated as the MFE of the
given sequence minus the mean MFE of the randomized sequences,
divided by the standard deviation of the randomized MFEs. Thus, more
negative values are more significant with respect to the randomized
sequences.
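The Z score formula translates directly to Python. This sketch assumes the sample standard deviation (`statistics.stdev`); the page does not say whether the sample or the population value is used. `z_score` is a hypothetical name.

```python
from statistics import mean, stdev

def z_score(mfe, randomized_mfes):
    """(actual MFE - mean randomized MFE) / std. dev. of randomized MFEs."""
    return (mfe - mean(randomized_mfes)) / stdev(randomized_mfes)

# A window that folds more stably than its randomized versions scores negative.
print(z_score(-18.0, [-12.0, -14.0, -16.0]))  # -2.0
```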
Randomized MFE
This field shows the mean of the randomized MFE values along with
its standard error, which is calculated as the standard deviation
divided by the square root of the number of randomizations. With the
usual 100 randomizations, this is the standard deviation divided by
10. The field is therefore a compact summary of the values that go
into the Z score calculation.
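A short sketch of this summary, plus a consistency check against the example entry. `randomized_summary` is a hypothetical name and the sample standard deviation is assumed. The check back-calculates the standard deviation implied by the example's MFE (-16.3), randomized mean (-13.9) and Z score (-0.73), about 3.29 kcal/mol; dividing by the square root of 100 gives about 0.33, in line with the reported ± 0.3.

```python
from math import sqrt
from statistics import mean, stdev

def randomized_summary(randomized_mfes):
    """Mean of the randomized MFEs and its standard error (sd / sqrt(n))."""
    n = len(randomized_mfes)
    return mean(randomized_mfes), stdev(randomized_mfes) / sqrt(n)

# Consistency check using the example entry's reported values.
mfe, randomized_mean, z = -16.3, -13.9, -0.73
implied_sd = (mfe - randomized_mean) / z   # roughly 3.29 kcal/mol
implied_se = implied_sd / sqrt(100)        # roughly 0.33, reported as ± 0.3
```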
PPCC
PPCC stands for Probability Plot Correlation Coefficient. It is
calculated as the correlation coefficient between an idealized
normal distribution and the actual distribution of the randomized
MFE values. The i-th value of the idealized distribution is
i/(n+1), where i runs from 1 to n and n is the number of
randomizations (usually 100). The values of the actual distribution
are obtained by sorting the randomized MFE values and, for each one,
calculating the upper-tail probability of the standard normal
distribution at (MFE - mean MFE) / standard deviation. The closer
the coefficient is to 1.0, the more closely the randomized values
follow a normal distribution.
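The PPCC as described can be sketched as follows. This is an illustration under assumptions rather than the PRFdb code: `ppcc` and `pearson` are hypothetical names, the sample standard deviation is assumed, and the sketch pairs ascending sorted values with lower-tail (cumulative) probabilities. Pairing ascending values with upper-tail probabilities would give a coefficient near -1 for normal data; upper-tail probabilities paired with values sorted in descending order give exactly the same coefficient as this version. Because it compares probabilities rather than quantiles, this is a P-P style correlation.

```python
from statistics import NormalDist, mean, stdev

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def ppcc(randomized_mfes):
    """Correlate the ideal plotting positions i/(n+1) with the normal
    probabilities of the sorted, standardized randomized MFEs."""
    values = sorted(randomized_mfes)
    n = len(values)
    mu, sd = mean(values), stdev(values)
    ideal = [i / (n + 1) for i in range(1, n + 1)]
    observed = [NormalDist().cdf((x - mu) / sd) for x in values]  # lower tail
    return pearson(ideal, observed)

# Values placed exactly at normal quantiles should score very close to 1.0.
normal_like = [NormalDist().inv_cdf(i / 101) for i in range(1, 101)]
print(round(ppcc(normal_like), 4))
```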
Sequence Window
The sequence window that was actually folded. A FASTA file of it can be downloaded from the link to the right.
Parens
Some MFE prediction algorithms present a parenthesis shorthand in which '()' marks bases participating in stems and '{}' marks bases participating in stems that are part of a pseudoknot. For those who prefer this output, it is provided here, along with a link to download it.
Parsed
The 'parsed' output is a shorthand way to describe each stem: each base is shown either as '.' (unpaired) or as the number of the stem it belongs to.
Feynman
A Feynman diagram of this secondary structure is generated for each sequence window. The structure can be downloaded in .bpseq format beside this image.
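Converting the bracket notation to .bpseq is a matter of matching brackets. In the BPSEQ format each base gets one line with its 1-based index, the nucleotide, and the 1-based index of its partner (0 if unpaired). The sketch below uses hypothetical function names and keeps a separate stack per bracket type, so '{}' pseudoknot brackets are matched independently of '()'. It is not the PRFdb's exporter, and it does not write the header lines that some .bpseq files carry.

```python
CLOSE_TO_OPEN = {")": "(", "]": "[", "}": "{", ">": "<"}

def pair_table(parens):
    """1-based partner of every position (0 = unpaired). Each bracket type
    has its own stack, so pseudoknot brackets such as '{}' are handled."""
    stacks = {opener: [] for opener in CLOSE_TO_OPEN.values()}
    partner = [0] * len(parens)
    for i, c in enumerate(parens):
        if c in stacks:
            stacks[c].append(i)
        elif c in CLOSE_TO_OPEN:
            stack = stacks[CLOSE_TO_OPEN[c]]
            if not stack:
                raise ValueError(f"unmatched {c!r} at position {i + 1}")
            j = stack.pop()
            partner[i], partner[j] = j + 1, i + 1
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket")
    return partner

def to_bpseq(seq, parens):
    """One 'index base partner' line per base."""
    if len(seq) != len(parens):
        raise ValueError("sequence and structure lengths differ")
    partner = pair_table(parens)
    return "\n".join(f"{i} {base} {p}"
                     for i, (base, p) in enumerate(zip(seq, partner), start=1))

print(to_bpseq("GGGAAACCC", "(((...)))"))
```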