
Below is an example of the detailed output one can receive by searching for 'EST2' and following the link to the putative PRF signal at position 1995. Click on any portion of the page below to receive a short description of that field.

Example entry from the PRFdb.
Species:
Saccharomyces cerevisiae
Accession:
SGDID:S0004310
Position:
1995
Algorithm:
pknots
Barcode:
112233
Sequence Length:
100
Slippery Site:
UUUAAAA
Base Pairs:
21
MFE:
-16.3 kcal/mol
Z Score:
-0.73
Randomized Mean MFE:
-13.9 ± 0.3 kcal/mol
PPCC:
0.9632
         10        20        30        40        50        60        70        80        90        100
Sequence CUGGCUGACGAUUUCCUUAUAAUAUCAACAGACCAACAGCAAGUGAUCAAUAUCAAAAAGCUUGCCAUGGGCGGAUUUCAAAAAUAUAAUGCGAAAGCCA Download subsequence
Parens .((((((.....................))).))).((((((((...............))))))..))(((...((((.............))))))). Download brackets
Parsed .111111.....................111.111.22222222...............222222..22333...3333.............3333333. Download parsed
Feynman Download bpseq

Most information is kept on the 'detail' page. Once a particular gene and putative PRF site has been chosen, it is possible to view some more specific information about that PRF signal here.
Each group of information concerns one folding performed for the given PRF signal. Thus we see once again the gene's name, species, accession, and position.
The 'algorithm' field specifies which minimum free energy folding program was used to fold this specific sequence window.
'Barcode' provides a shorthand notation to quickly identify how many stems exist in the putative secondary structure and how they are oriented. In this notation, a simple stem is '11' and an H-type pseudoknot is '1212'.
'Sequence length' and 'Slippery site' provide some information about the context of the given sequence window.
'Base pairs' and 'MFE' show the measurements provided by the minimum free energy heuristics of the given prediction algorithm.
'Z score,' 'Randomized Mean MFE,' and 'PPCC' provide some statistical information about how this particular sequence window compares to a distribution of randomized sequences folded in a similar fashion.
'Z score' is defined as (actual MFE - mean MFE of the randomized sequences) / standard deviation of the randomized MFEs. As a result, the more negative a given Z score is, the more interesting it should prove.
PPCC is a measurement which can give a sense of how likely the Z score is to be useful. A Z score assumes its data is normally distributed. The PPCC provides a measurement of how close to 'normal' a given set of values actually is.
These last few fields are summarized in the graph on the right side of the table. This bar graph shows a series of 'bins' and what percentage of the randomized sequences have minimum free energies falling into each 'bin.' A red line provides a guide to how many entries one would expect in each bin if the minimum free energy values were normally distributed. The green line shows where on this distribution the actual minimum free energy lies. If the actual MFE is more negative than the mean of the distribution of random values, that is a good sign for the significance of the putative PRF signal.
Below this graph lies a sequence view and a few ways to visualize the putative mRNA secondary structure, including a bracket notation, a numeric stem diagram, and a linear Feynman image.


Species
The 'species' field gives the species the sequence comes from. Currently the only species one is likely to see are Saccharomyces cerevisiae, Homo sapiens, and Mus musculus.

Graph
Every sequence window in the PRFdb is randomized using one of a few randomization strategies (shuffling, maintaining nucleotide frequencies, maintaining dinucleotide frequencies, or maintaining the reading frame). In each instance, the sequence window is randomized and refolded with the given algorithm 100 times. This plot shows how often the randomized sequences fold to each minimum free energy. Thus, we can see that 10 of the 100 randomized sequences have a minimum free energy of 15.4 kcal/mol when folded with pknots. The red line provides an idealized normal distribution of how many sequences one would expect in each range if the sequences are properly randomized. Finally, the green bar shows the actual minimum free energy calculated using pknots at this position. The further to the left of the mean this value lies, the more significant it should be with respect to its randomized sequences.
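As a rough illustration of the simplest strategy listed above, the sketch below shuffles a sequence window while keeping its nucleotide counts. The function name `shuffle_window` and the per-iteration seeds are assumptions made for this example. The PRFdb's own randomization code is not shown on this page, and refolding each shuffled sequence would still require an external folding program such as pknots.

```python
import random
from collections import Counter


def shuffle_window(sequence, seed=None):
    """Return a random permutation of the bases in `sequence`.

    A plain shuffle keeps the count of each nucleotide, but it does not
    preserve dinucleotide frequencies or the reading frame.
    """
    rng = random.Random(seed)
    bases = list(sequence)
    rng.shuffle(bases)
    return "".join(bases)


# The first 20 bases of the example window, shuffled 100 times.
window = "CUGGCUGACGAUUUCCUUAU"
randomized = [shuffle_window(window, seed=i) for i in range(100)]

# Every shuffled sequence has the same base composition as the original.
same_composition = all(Counter(s) == Counter(window) for s in randomized)
```

Each of the 100 shuffled sequences would then be folded, and the resulting minimum free energies binned to produce the bar graph.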

Accession
The accession provides the GenBank or SGD accession of the given gene. Clicking on the accession itself will take you back to the gene view.

Position
This number defines how many bases from the beginning of the mRNA the actual PRF signal may be found. In order to get a sense of where this is in the gene, it may be most useful to follow the accession link and look at the MFE minima graph or the text representation of the message.

Algorithm
The algorithm may be one of 'pknots,' 'nupack,' or 'hotknots.' These secondary structure prediction programs provide a putative mRNA structure as well as a putative minimum free energy.

Barcode
The 'barcode' is a shorthand representation of the secondary structure of a given sequence window. A sequence consisting of a single stem would thus show up as '11', while an H-type pseudoknot is '1212' and a SARS-like pseudoknot with a secondary stem would be '121332'.
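One way to read the notation is that a digit is written each time the structure moves to a different stem half (5' side or 3' side). The sketch below derives a barcode that way from the bracket and parsed strings of the example entry. It is an assumption about how the barcode relates to those two strings, not the PRFdb's own code, and it assumes single-digit stem numbers.

```python
def barcode(brackets, parsed):
    """Collapse a structure into a barcode string.

    `brackets` uses '(' or '{' for 5' stem halves and ')' or '}' for 3'
    halves; `parsed` gives the stem number of each paired base ('.' if
    unpaired). A digit is emitted whenever the (stem, side) pair changes.
    """
    digits = []
    previous = None
    for symbol, stem in zip(brackets, parsed):
        if stem == ".":
            continue  # unpaired bases do not interrupt a stem half
        side = "5'" if symbol in "({" else "3'"
        if (stem, side) != previous:
            digits.append(stem)
            previous = (stem, side)
    return "".join(digits)


# Bracket and parsed strings copied from the example entry.
PARENS = ".((((((.....................))).))).((((((((...............))))))..))(((...((((.............)))))))."
PARSED = ".111111.....................111.111.22222222...............222222..22333...3333.............3333333."

example_barcode = barcode(PARENS, PARSED)

# Hypothetical toy structures for the two pseudoknot types named above.
h_type = barcode("((((..{{{{..))))..}}}}", "1111..2222..1111..2222")
sars_like = barcode("((({{{)))((()))}}}", "111222111333333222")
```

Under this reading the example entry yields '112233', matching its 'Barcode' field, and the two toy structures yield '1212' and '121332'.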

Sequence Length
The sequence length is usually 100 bases. However, it varies among Saccharomyces cerevisiae sequences.

Slipsite
There are 24 separate slippery heptamers which follow the motif NNNWWWH. This field shows which heptamer is used by this sequence window.
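The count of 24 can be reproduced if the motif is read the way slippery sites are commonly written, X XXY YYZ: the three N positions hold the same base, the three W positions hold the same base (A or U), and H is any base except G. That reading is an assumption here rather than something this field description states. Under it, 4 × 2 × 3 = 24.

```python
from itertools import product

N_BASES = "ACGU"  # N: any base, repeated three times (assumed)
W_BASES = "AU"    # W: A or U, repeated three times (assumed)
H_BASES = "ACU"   # H: any base except G

# Build every heptamer of the form NNN WWW H under the assumptions above.
slippery_sites = [
    n * 3 + w * 3 + h
    for n, w, h in product(N_BASES, W_BASES, H_BASES)
]
```

The example entry's slippery site, UUUAAAA, is one of the 24 heptamers produced.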

Number of Base Pairs
This field provides the number of base pairs in the predicted structure. Each base pair joins two bases, so the number of paired bases is twice this value.
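The value can be checked against the bracket string, since every base pair contributes exactly one opening bracket. The helper below is a small sketch with a made-up name, `count_base_pairs`, and it treats both '(' and '{' as opening brackets.

```python
def count_base_pairs(brackets):
    """Count base pairs as the number of opening brackets."""
    return sum(brackets.count(symbol) for symbol in "({")


# Bracket string copied from the example entry.
PARENS = ".((((((.....................))).))).((((((((...............))))))..))(((...((((.............)))))))."

pairs = count_base_pairs(PARENS)
```

For the example entry this gives 21, the value shown in its 'Base Pairs' field.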

MFE
The minimum free energy is calculated by summing the free energy change associated with a group of known base pair configurations including: stacking bases, GC/AU/GU base pairs, internal bulges, hairpin loops, and known motifs like tetraloops. Each secondary structure prediction algorithm makes different assumptions about these.
The putative minimum free energy calculated by pknots for this sequence window is provided here.

z score
The z score of a given sequence is calculated as the MFE of the given sequence minus the mean MFE of the randomized sequences, divided by the standard deviation of the randomized MFEs. Thus values which are more negative are more significant with respect to the randomized sequences.
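The calculation can be sketched as follows. The randomized MFE values are invented for illustration, and the use of the sample standard deviation (rather than the population form) is an assumption, since the page does not say which one is used.

```python
from statistics import mean, stdev


def z_score(actual_mfe, randomized_mfes):
    """(actual MFE - mean randomized MFE) / standard deviation."""
    return (actual_mfe - mean(randomized_mfes)) / stdev(randomized_mfes)


# Hypothetical randomized MFEs in kcal/mol; a real run would have about 100.
randomized = [-12.0, -13.0, -14.0, -15.0, -16.0]
z = z_score(-17.0, randomized)
```

Here the actual MFE is more negative than every randomized value, so the z score comes out negative.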

Randomized MFE
This field shows the mean of the randomized MFE values together with its standard error. The standard error is calculated as the standard deviation divided by the square root of the number of iterations. Because 100 iterations are usually performed, in most situations this is the standard deviation divided by 10. As a result this is a nice summary of what is going into the z score calculation.
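Working backwards from the example entry shows how these fields fit together. The sketch below assumes 100 iterations and uses the rounded values displayed on the page, so the result is only approximate.

```python
import math

actual_mfe = -16.3       # 'MFE' field, kcal/mol
randomized_mean = -13.9  # 'Randomized Mean MFE' field, kcal/mol
standard_error = 0.3     # the ± value in the same field
iterations = 100         # assumed number of randomizations

# Standard error = standard deviation / sqrt(iterations), so invert it.
standard_deviation = standard_error * math.sqrt(iterations)
approx_z = (actual_mfe - randomized_mean) / standard_deviation
```

This gives a standard deviation of about 3.0 kcal/mol and a z score of about -0.8, near the reported -0.73. The gap is consistent with the standard error having been rounded to one decimal place before display.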

PPCC
PPCC stands for Probability Plot Correlation Coefficient. It is calculated by taking the correlation coefficient of an idealized normal distribution and the actual distribution provided by the randomized MFE values. Each value of the idealized normal distribution is i/(n+1), where n is the number of randomized sequences (usually 100) and i runs from 1 to n. Each value of the actual distribution is calculated by sorting the randomized MFE values and calculating the upper probability of a normal distribution at (MFE value - mean MFE) / standard deviation. The closer the coefficient is to 1.0, the more similar the array of randomized values is to a normal distribution.
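A minimal sketch of this kind of calculation is shown below. It departs from the description in one labeled way: it uses the cumulative (lower-tail) normal probability of the ascending sorted values, so that a normal sample gives a coefficient near +1. That choice, the sample standard deviation, the helper names, and both test inputs are assumptions for illustration, not the PRFdb's implementation.

```python
from statistics import NormalDist, mean, stdev


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def ppcc(randomized_mfes):
    """Correlate expected positions i/(n+1) with the normal probabilities
    of the sorted, standardized MFE values."""
    values = sorted(randomized_mfes)
    n = len(values)
    mu, sigma = mean(values), stdev(values)
    expected = [i / (n + 1) for i in range(1, n + 1)]
    standard_normal = NormalDist()
    # Assumption: lower-tail probability, not the upper-tail one.
    observed = [standard_normal.cdf((v - mu) / sigma) for v in values]
    return pearson(expected, observed)


# Hypothetical inputs: 100 evenly spaced normal quantiles and a skewed set.
normal_like = [NormalDist(-13.9, 3.0).inv_cdf(i / 101) for i in range(1, 101)]
skewed = [-(1.05 ** i) for i in range(100)]

ppcc_normal = ppcc(normal_like)
ppcc_skewed = ppcc(skewed)
```

The normal-like input scores very close to 1.0, while the strongly skewed input scores noticeably lower, which is the behavior the field is meant to capture.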

Sequence Window
The actual sequence window folded. You can download a FASTA file of it to the right.

Parens
Some MFE prediction algorithms present a parenthesis shorthand in which '()' marks bases participating in stems and '{}' marks bases participating in stems that are part of a pseudoknot. For those who prefer this output, it is provided here, along with a link to download it.

Parsed
The 'parsed' output is a shorthand way to describe each stem. Each base is shown either as '.' (unpaired) or as the number of the stem it belongs to.

Feynman
A Feynman diagram of this secondary structure is generated for each sequence window. The .bpseq format of the secondary structure can be downloaded beside this image.
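For readers who want to relate the bracket notation to a base-pair listing, the sketch below converts a bracket string into rows of index, base, and partner index, with 0 meaning unpaired. This three-column layout is the common .bpseq convention. Whether the PRFdb download adds header lines or differs in other details is not stated here, and the function name and toy sequences are made up for the example.

```python
def brackets_to_bpseq(sequence, brackets):
    """Return (index, base, partner) rows; partner is 0 for unpaired bases.

    '()' and '{}' are matched on separate stacks so that pseudoknotted
    stems are allowed to cross.
    """
    openers = {")": "(", "}": "{"}
    stacks = {"(": [], "{": []}
    partner = [0] * len(brackets)
    for position, symbol in enumerate(brackets, start=1):
        if symbol in stacks:
            stacks[symbol].append(position)
        elif symbol in openers:
            mate = stacks[openers[symbol]].pop()
            partner[position - 1] = mate
            partner[mate - 1] = position
    return [
        (i, base, partner[i - 1])
        for i, base in enumerate(sequence, start=1)
    ]


# A toy hairpin and a toy pseudoknot (hypothetical sequences).
rows = brackets_to_bpseq("GGGAAACCC", "(((...)))")
knot = brackets_to_bpseq("GGCCGGCC", "(({{))}}")
text = "\n".join(f"{i} {base} {mate}" for i, base, mate in rows)
```

In the hairpin, base 1 pairs with base 9 and the three loop bases have partner 0. In the pseudoknot, base 3 pairs with base 8 even though its stem crosses the first one.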