Information about Sequence

Statistics about an opened sequence can be found on the Statistics tab of the Options Panel. When a region is selected in the sequence, the statistics are recalculated interactively for this region only. The following information is available:

Common statistics (length, molecular weight, etc.) - see the detailed description below
Characters occurrence
Dinucleotides occurrence - available for nucleotide sequences only

Note that all data displayed on the Statistics tab can be selected with the mouse and copied. Use the copy item in the context menu or a shortcut: Ctrl+C on Windows or Linux, Cmd+C on macOS.

Nucleotide sequence common statistics

The following common statistical information is calculated for a nucleotide sequence:

Length
GC content
Molecular weight
Extinction coefficient
Melting temperature
nmole/OD260
μg/OD260

GC content

The percentage of guanine (G) and cytosine (C) bases within the sequence or its selected region, for example:

GC-content("ACGTAC") = ((0 + 1 + 1 + 0 + 0 + 1) / 6) * 100% = 50%

If the sequence contains degenerate base characters, average values are used, for example:

GC-content("ACGNBCT") = ((0 + 1 + 1 + 1/2 + 2/3 + 1 + 0) / 7) * 100% ~= 59.52%

In this example "1/2" is used for "N" (any nucleotide) and "2/3" is used for "B" (which means "C", "G", or "T" according to the IUPAC nucleotide code).
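The averaging rule above can be sketched in a few lines of Python. This is only an illustration of the formula, not the application's actual code; the names IUPAC and gc_content are assumptions made for this example.

```python
# Illustrative sketch only: IUPAC and gc_content are hypothetical names.
# Each IUPAC nucleotide code is mapped to the bases it can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def gc_content(sequence: str) -> float:
    """Return the GC content of the sequence as a percentage."""
    if not sequence:
        return 0.0
    total = 0.0
    for symbol in sequence.upper():
        bases = IUPAC[symbol]
        # A degenerate character contributes the fraction of its
        # possible bases that are G or C (for example, 2/3 for "B").
        total += sum(base in "GC" for base in bases) / len(bases)
    return 100.0 * total / len(sequence)


print(round(gc_content("ACGTAC"), 2))   # 50.0
print(round(gc_content("ACGNBCT"), 2))  # 59.52
```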

Molecular weight

Molecular weight for a single-stranded molecule is calculated as the sum of the atomic masses of the molecule's components:

DNA molecular weight = nA*251.24 + nT*242.23 + nC*227.22 + nG*267.24 + (n-1)*61.97

RNA molecular weight = nA*267.24 + nU*244.20 + nC*243.22 + nG*283.24 + (n-1)*61.97

Here "nA", "nT", "nC", "nG", "nU" denote the number of the corresponding nucleotides in the molecule, "n" is the total number of bases, and 61.97 is the weight of an internal phosphate.

Note that for degenerate base characters the average nucleotide weight is used. For example, if a DNA sequence also contains "Y" characters (that is, "C" or "T"), the sum includes one more summand: "nY*(242.23 + 227.22)/2".

Molecular weight for a double-stranded molecule is calculated as the sum of the molecular weights of the two single strands.
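A minimal Python sketch of the molecular weight formulas above is given below. The function and table names are assumptions made for this illustration, only a few degenerate characters are covered, and the complement strand is derived with a simple translation table.

```python
# Illustrative sketch only; all names here are hypothetical.
DNA_WEIGHTS = {"A": 251.24, "T": 242.23, "C": 227.22, "G": 267.24}
RNA_WEIGHTS = {"A": 267.24, "U": 244.20, "C": 243.22, "G": 283.24}
PHOSPHATE = 61.97  # weight of an internal phosphate

# Assumption: only a small subset of degenerate characters is handled.
DEGENERATE = {"Y": "CT", "R": "AG", "N": "ACGT"}
DNA_COMPLEMENT = str.maketrans("ACGTYRN", "TGCARYN")


def single_strand_weight(sequence: str, is_rna: bool = False) -> float:
    """Molecular weight of a single-stranded DNA or RNA molecule."""
    if not sequence:
        return 0.0
    weights = RNA_WEIGHTS if is_rna else DNA_WEIGHTS
    total = 0.0
    for symbol in sequence.upper():
        bases = DEGENERATE.get(symbol, symbol)
        if is_rna:
            bases = bases.replace("T", "U")
        # Degenerate characters contribute the average weight.
        total += sum(weights[base] for base in bases) / len(bases)
    return total + (len(sequence) - 1) * PHOSPHATE


def double_strand_weight(sequence: str) -> float:
    """Sum of the weights of a DNA strand and its complement."""
    complement = sequence.upper().translate(DNA_COMPLEMENT)
    return single_strand_weight(sequence) + single_strand_weight(complement)


print(round(single_strand_weight("ATGCA"), 2))  # 1487.05
```

Strand orientation does not matter here, because the weight depends only on the base composition and the length.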

Extinction coefficient

To calculate the extinction coefficient (molar extinction coefficient), the approach proposed by Richard Owczarzy is used: http://www.owczarzy.net/extinctionDNA.htm. For a single-stranded molecule:

Extinction coefficient = sum(extinction coefficients of all dinucleotides) - sum(extinction coefficients of inner mononucleotides)

The table below specifies the extinction coefficients for dinucleotides and mononucleotides:

DNA		RNA
Stack or monomer	Extinction coefficient	Stack or monomer	Extinction coefficient
pdA	15400	pA	15400
pdC	7400	pC	7200
pdG	11500	pG	11500
pdT	8700	pU	9900
dApdA	27400	ApA	27400
dApdC	21200	ApC	21000
dApdG	25000	ApG	25000
dApdT	22800	ApU	24000
dCpdA	21200	CpA	21000
dCpdC	14600	CpC	14200
dCpdG	18000	CpG	17800
dCpdT	15200	CpU	16200
dGpdA	25200	GpA	25200
dGpdC	17600	GpC	17400
dGpdG	21600	GpG	21600
dGpdT	20000	GpU	21200
dTpdA	23400	UpA	24600
dTpdC	16200	UpC	17200
dTpdG	19000	UpG	20000
dTpdT	16800	UpU	19600

For example, let's calculate the molar extinction coefficient ("ε") for "ATGCA":

ε(ATGCA) = ε(AT) + ε(TG) + ε(GC) + ε(CA) - ε(T) - ε(G) - ε(C) =

= 22800 + 19000 + 17600 + 21200 - 8700 - 11500 - 7400 =

= 53000

As for the other statistics, average values are used in case of degenerate base characters.
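The single-strand calculation can be sketched in Python as follows. The sketch uses the DNA column of the table only and does not handle degenerate characters; the names MONO, DI and extinction_coefficient are assumptions made for this illustration.

```python
# Illustrative sketch only; names are hypothetical.
# DNA values are taken from the table above.
MONO = {"A": 15400, "C": 7400, "G": 11500, "T": 8700}
DI = {
    "AA": 27400, "AC": 21200, "AG": 25000, "AT": 22800,
    "CA": 21200, "CC": 14600, "CG": 18000, "CT": 15200,
    "GA": 25200, "GC": 17600, "GG": 21600, "GT": 20000,
    "TA": 23400, "TC": 16200, "TG": 19000, "TT": 16800,
}


def extinction_coefficient(sequence: str) -> int:
    """Molar extinction coefficient of a single-stranded DNA molecule."""
    sequence = sequence.upper()
    if not sequence:
        return 0
    if len(sequence) == 1:
        # Assumption: a single nucleotide is reported as its monomer value.
        return MONO[sequence]
    # Sum over all neighbouring pairs (dinucleotides).
    pairs = sum(DI[sequence[i:i + 2]] for i in range(len(sequence) - 1))
    # Subtract the inner mononucleotides (all bases except the two ends).
    inner = sum(MONO[base] for base in sequence[1:-1])
    return pairs - inner


print(extinction_coefficient("ATGCA"))  # 53000
```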

Extinction coefficient for a double-stranded molecule is calculated as the sum of the extinction coefficients of the two single strands (e_s1 + e_s2) multiplied by (1 - h_260nm), where h_260nm is the hypochromicity at 260 nm. The hypochromicity effect can be taken into account as follows:

h_260nm = 0.287 * f_AT + 0.059 * f_GC, where f_AT and f_GC are fractions of AT and GC base pairs, respectively.
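The double-strand correction can be sketched as follows. The function names are assumptions made for this illustration; the strand coefficients in the usage example (53000 for "ATGCA" and 46300 for its reverse complement "TGCAT") were computed by hand from the DNA table above.

```python
# Illustrative sketch only; names are hypothetical.
def hypochromicity(f_at: float, f_gc: float) -> float:
    """Hypochromicity at 260 nm from the AT and GC base pair fractions."""
    return 0.287 * f_at + 0.059 * f_gc


def duplex_extinction(e_s1: float, e_s2: float,
                      f_at: float, f_gc: float) -> float:
    """Extinction coefficient of a double-stranded molecule."""
    return (e_s1 + e_s2) * (1.0 - hypochromicity(f_at, f_gc))


# "ATGCA" paired with "TGCAT": 3 of 5 base pairs are AT, 2 of 5 are GC.
print(duplex_extinction(53000, 46300, f_at=3 / 5, f_gc=2 / 5))
```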

Melting temperature

The melting temperature is calculated as follows. For sequences of length 15 or longer:

T_m = 64.9 + 41 * (nG + nC - 16.4) / (nA + nT + nG + nC)

For shorter sequences:

T_m = (nA + nT) * 2 + (nG + nC) * 4

Here "nA", "nT", "nC", "nG" denote the number of the corresponding nucleotide.
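The two melting temperature formulas can be sketched in Python as shown below. The function name is an assumption made for this illustration; the sketch counts only A, T, C and G, and it assumes that the length threshold refers to the total number of characters in the sequence.

```python
# Illustrative sketch only; melting_temperature is a hypothetical name.
def melting_temperature(sequence: str) -> float:
    """Melting temperature computed from the base counts."""
    sequence = sequence.upper()
    n_a, n_t, n_c, n_g = (sequence.count(base) for base in "ATCG")
    total = n_a + n_t + n_c + n_g
    if total == 0:
        return 0.0
    # Assumption: "length" is the number of characters in the sequence.
    if len(sequence) >= 15:
        return 64.9 + 41 * (n_g + n_c - 16.4) / total
    return float((n_a + n_t) * 2 + (n_g + n_c) * 4)


print(melting_temperature("ATGCA"))  # 14.0
print(round(melting_temperature("ATGCATGCATGCATGCATGC"), 2))  # 51.78
```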

nmole/OD260

The amount of DNA or RNA, in nanomoles, that gives 1 unit of absorbance at 260 nm when dissolved in a 1 ml cuvette with a 1 cm pathlength:

nmole/OD260 = 1000000 / molarExtCoef

μg/OD260

The amount of DNA or RNA, in micrograms, per 1 unit of absorbance at 260 nm:

μg/OD260 = nmole/OD260 * molecularWeight * 0.001
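Both OD260 conversions can be sketched together in Python. The function names are assumptions made for this illustration; the input values in the usage example (53000 and 1487.05) are the extinction coefficient and the single-strand molecular weight of "ATGCA" obtained from the formulas earlier on this page.

```python
# Illustrative sketch only; names are hypothetical.
def nmole_per_od260(molar_ext_coef: float) -> float:
    """Nanomoles of DNA or RNA per 1 unit of absorbance at 260 nm."""
    return 1_000_000 / molar_ext_coef


def microgram_per_od260(molar_ext_coef: float,
                        molecular_weight: float) -> float:
    """Micrograms of DNA or RNA per 1 unit of absorbance at 260 nm."""
    return nmole_per_od260(molar_ext_coef) * molecular_weight * 0.001


print(round(nmole_per_od260(53000), 2))               # 18.87
print(round(microgram_per_od260(53000, 1487.05), 2))  # 28.06
```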

Amino acid sequence common statistics

The following common statistical information is calculated for an amino acid sequence:

Length
Molecular weight
Isoelectric point
