Call Variants with SAMtools Element

Calls SNPs and INDELs with SAMtools mpileup and bcftools.
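
The parameter descriptions on this page tag each option with the tool stage it applies to: mpileup, bcf view, or varFilter. As a rough orientation, the Python sketch below assembles the classic three-stage pipeline described in the legacy SAMtools 0.1.x documentation. How the element itself invokes the tools is not stated on this page, so the function names, the file names (`ref.fa`, `aln.bam`), and the choice of the `-u`, `-f`, `-v`, `-c` and `-g` flags are assumptions made for illustration only.

```python
import shlex


def pipeline_commands(reference, bam):
    """Return the three stages as argument lists (nothing is executed)."""
    # Stage 1: pileup of the assembly against the reference (assumed flags -u, -f).
    mpileup = ["samtools", "mpileup", "-u", "-f", reference, bam]
    # Stage 2: legacy "bcftools view" reads the pileup from stdin ("-") and calls variants.
    call = ["bcftools", "view", "-v", "-c", "-g", "-"]
    # Stage 3: filtering of the called variants.
    var_filter = ["vcfutils.pl", "varFilter"]
    return [mpileup, call, var_filter]


def as_shell_pipeline(stages):
    """Join the stages with pipes, quoting each argument for a POSIX shell."""
    return " | ".join(shlex.join(stage) for stage in stages)


# Hypothetical input files, used only to show the shape of the command line.
stages = pipeline_commands("ref.fa", "aln.bam")
print(as_shell_pipeline(stages))
```

The sketch only builds strings; it does not run SAMtools. The mpileup, bcf view and varFilter options documented on this page would be appended to the first, second and third stage respectively.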

Parameters in GUI

| Parameter | Description | Default value |
|---|---|---|
| Output variants file | The URL of the file with the extracted variations. | |
| Reference | Specify a file with the reference sequence. The sequence will be used as the reference for all datasets with NGS assemblies. | |
| Use reference from | Specify "File" to set a single reference sequence for all input NGS assemblies. The reference should be set in the "Reference" parameter. Specify "Input port" to be able to set different references for different NGS assemblies. The references should be input via the "Input sequences" port (e.g. use datasets in the "Read Sequence" element). | File |
| Illumina-1.3+ encoding | Assume the quality is in the Illumina 1.3+ encoding (mpileup)(-6). | False |
| Count anomalous read pairs | Do not skip anomalous read pairs in variant calling (mpileup)(-A). | False |
| Disable BAQ computation | Disable probabilistic realignment for the computation of base alignment quality (BAQ). BAQ is the Phred-scaled probability of a read base being misaligned. Applying this option greatly helps to reduce false SNPs caused by misalignments (mpileup)(-B). | False |
| Mapping quality downgrading coefficient | Coefficient for downgrading mapping quality for reads containing excessive mismatches. Given a read with a Phred-scaled probability q of being generated from the mapped position, the new mapping quality is about sqrt((INT-q)/INT)*INT. A zero value disables this functionality; if enabled, the recommended value for BWA is 50 (mpileup)(-C). | 0 |
| Max number of reads per input BAM | At a position, read at most this number of reads per input BAM (mpileup)(-d). | 250 |
| Extended BAQ computation | Extended BAQ computation. This option helps sensitivity, especially for MNPs, but may hurt specificity a little (mpileup)(-E). | False |
| BED or position list file | BED or position list file containing a list of regions or sites where pileup or BCF should be generated (mpileup)(-l). | |
| Pileup region | Only generate pileup in region STR (mpileup)(-r). | |
| Minimum mapping quality | Minimum mapping quality for an alignment to be used (mpileup)(-q). | |
| Minimum base quality | Minimum base quality for a base to be considered (mpileup)(-Q). | 13 |
| Gap extension error | Phred-scaled gap extension sequencing error probability. Reducing INT leads to longer indels (mpileup)(-e). | 20 |
| Homopolymer errors coefficient | Coefficient for modeling homopolymer errors. Given an l-long homopolymer run, the sequencing error of an indel of size s is modeled as INT*s/l (mpileup)(-h). | 100 |
| No INDELs | Do not perform INDEL calling (mpileup)(-I). | False |
| Max INDEL depth | Skip INDEL calling if the average per-sample depth is above INT (mpileup)(-L). | 250 |
| Gap open error | Phred-scaled gap open sequencing error probability. Reducing INT leads to more indel calls (mpileup)(-o). | 40 |
| List of platforms for indels | Comma-delimited list of platforms (determined by @RG-PL) from which indel candidates are obtained. It is recommended to collect indel candidates from sequencing technologies that have a low indel error rate, such as ILLUMINA (mpileup)(-P). | |
| Retain all possible alternate | Retain all possible alternate alleles at variant sites. By default, the view command discards unlikely alleles (bcf view)(-A). | False |
| Indicate PL | Indicate PL is generated by r921 or before (ordering is different) (bcf view)(-F). | False |
| No genotype information | Suppress all individual genotype information (bcf view)(-G). | False |
| A/C/G/T only | Skip sites where the REF field is not A/C/G/T (bcf view)(-N). | False |
| List of sites | List of sites at which information is output (bcf view)(-l). | |
| QCALL likelihood | Output the QCALL likelihood format (bcf view)(-Q). | False |
| List of samples | List of samples to use. The first column in the input gives the sample names and the second gives the ploidy, which can only be 1 or 2. When the second column is absent, the sample ploidy is assumed to be 2. In the output, the ordering of samples will be identical to the one in FILE (bcf view)(-s). | |
| Min samples fraction | Skip loci where the fraction of samples covered by reads is below FLOAT (bcf view)(-d). | 0 |
| Per-sample genotypes | Call per-sample genotypes at variant sites (bcf view)(-g). | True |
| INDEL-to-SNP Ratio | Ratio of INDEL-to-SNP mutation rate (bcf view)(-i). | -1 |
| Max P(ref\|D) | A site is considered to be a variant if P(ref\|D) < FLOAT (bcf view)(-p). | 0.5 |
| Prior allele frequency spectrum | Prior allele frequency spectrum. STR can be full, cond2, flat, or a file consisting of the error output from a previous variant calling run (bcf view)(-P). | full |
| Mutation rate | Scaled mutation rate for variant calling (bcf view)(-t). | 0.001 |
| Pair/trio calling | Enable pair/trio calling. For trio calling, option -s usually needs to be applied to configure the trio members and their ordering. In the file supplied to the option -s, the first sample must be the child, the second the father and the third the mother. The valid values of STR are ‘pair’, ‘trioauto’, ‘trioxd’ and ‘trioxs’, where ‘pair’ calls differences between two input samples, and ‘trioxd’ (‘trioxs’) specifies that the input is from the X chromosome non-PAR regions and the child is a female (male) (bcf view)(-T). | |
| N group-1 samples | Number of group-1 samples. This option is used for dividing the samples into two groups for contrast SNP calling or association test. When this option is in use, the following VCF INFO fields will be output: PC2, PCHI2 and QCHI2 (bcf view)(-1). | 0 |
| N permutations | Number of permutations for association test (effective only with -1) (bcf view)(-U). | 0 |
| Min P(chi^2) | Only perform permutations for P(chi^2) < FLOAT (bcf view)(-X). | 0.01 |
| Minimum RMS quality | Minimum RMS mapping quality for SNPs (varFilter)(-Q). | 10 |
| Minimum read depth | Minimum read depth (varFilter)(-d). | 2 |
| Maximum read depth | Maximum read depth (varFilter)(-D). | 10000000 |
| Alternate bases | Minimum number of alternate bases (varFilter)(-a). | 2 |
| Gap size | Filter SNPs within INT bp around a gap (varFilter)(-w). | 3 |
| Window size | Window size for filtering adjacent gaps (varFilter)(-W). | 10 |
| Strand bias | Minimum P-value for strand bias (given PV4) (varFilter)(-1). | 0.0001 |
| BaseQ bias | Minimum P-value for baseQ bias (varFilter)(-2). | 1e-100 |
| MapQ bias | Minimum P-value for mapQ bias (varFilter)(-3). | 0 |
| End distance bias | Minimum P-value for end distance bias (varFilter)(-4). | 0.0001 |
| HWE | Minimum P-value for HWE (plus F < 0) (varFilter)(-e). | 0.0001 |
| Log filtered | Print filtered variants into the log (varFilter)(-p). | False |
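
The "Mapping quality downgrading coefficient" description gives the adjusted mapping quality as roughly sqrt((INT-q)/INT)*INT. A minimal Python sketch of that arithmetic follows. The function name `downgraded_mapq` is hypothetical, and returning 0 when q exceeds INT is an assumption made here so that the square root stays defined; it is not documented SAMtools behavior.

```python
import math


def downgraded_mapq(q, coefficient=50):
    """Approximate new mapping quality: sqrt((INT - q) / INT) * INT.

    q is the Phred-scaled value from the parameter description and
    coefficient is INT (50 is the value recommended for BWA).
    """
    if coefficient <= 0:
        raise ValueError("a zero coefficient disables downgrading")
    if q > coefficient:
        # Assumption for this sketch only: clamp instead of taking
        # the square root of a negative number.
        return 0.0
    return math.sqrt((coefficient - q) / coefficient) * coefficient


# Print the adjusted value for a few sample inputs with INT = 50.
for q in (0, 18, 32, 50):
    print(q, round(downgraded_mapq(q), 1))
```

With INT = 50, for instance, q = 18 gives sqrt(32/50)*50 = 0.8*50 = 40.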

Parameters in Workflow File

Type: call_variants

| Parameter | Parameter in the GUI | Type |
|---|---|---|
| illumina13-encoding | Illumina-1.3+ encoding | boolean |
| use_orphan | Count anomalous read pairs | boolean |
| disable_baq | Disable BAQ computation | boolean |
| capq_thres | Mapping quality downgrading coefficient | numeric |
| max_depth | Max number of reads per input BAM | numeric |
| ext_baq | Extended BAQ computation | boolean |
| bed | BED or position list file | string |
| reg | Pileup region | string |
| min_mq | Minimum mapping quality | numeric |
| min_baseq | Minimum base quality | numeric |
| extQ | Gap extension error | numeric |
| tandemQ | Homopolymer errors coefficient | numeric |
| no_indel | No INDELs | boolean |
| max_indel_depth | Max INDEL depth | numeric |
| openQ | Gap open error | numeric |
| pl_list | List of platforms for indels | string |
| keepalt | Retain all possible alternate | boolean |
| fix_pl | Indicate PL | boolean |
| no_geno | No genotype information | boolean |
| acgt_only | A/C/G/T only | boolean |
| bcf_bed | List of sites | string |
| qcall | QCALL likelihood | boolean |
| samples | List of samples | string |
| min_smpl_frac | Min samples fraction | numeric |
| call_gt | Per-sample genotypes | boolean |
| indel_frac | INDEL-to-SNP Ratio | numeric |
| pref | Max P(ref\|D) | numeric |
| ptype | Prior allele frequency spectrum | string |
| theta | Mutation rate | numeric |
| ccall | Pair/trio calling | string |
| n1 | N group-1 samples | numeric |
| n_perm | N permutations | numeric |
| min_perm_p | Min P(chi^2) | numeric |
| min-qual | Minimum RMS quality | numeric |
| min-dep | Minimum read depth | numeric |
| max-dep | Maximum read depth | numeric |
| min-alt-bases | Alternate bases | numeric |
| gap-size | Gap size | numeric |
| window | Window size | numeric |
| min-strand | Strand bias | numeric |
| min-baseQ | BaseQ bias | string |
| min-mapQ | MapQ bias | numeric |
| min-end-distance | End distance bias | numeric |
| min-hwe | HWE | numeric |
| print-filtered | Log filtered | boolean |
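
Read together with the "Parameters in GUI" section, each mpileup-stage workflow parameter corresponds to one command-line flag. The Python sketch below pairs them up and renders a dictionary of workflow values as an argument list. The helper name `mpileup_flags` and the sample values are hypothetical; the code only illustrates the correspondence between the two tables and is not how the element itself assembles its command line.

```python
# Workflow-file parameter -> (mpileup flag, type), combined from the
# two parameter tables on this page.
MPILEUP_PARAMS = {
    "illumina13-encoding": ("-6", "boolean"),
    "use_orphan": ("-A", "boolean"),
    "disable_baq": ("-B", "boolean"),
    "capq_thres": ("-C", "numeric"),
    "max_depth": ("-d", "numeric"),
    "ext_baq": ("-E", "boolean"),
    "bed": ("-l", "string"),
    "reg": ("-r", "string"),
    "min_mq": ("-q", "numeric"),
    "min_baseq": ("-Q", "numeric"),
    "extQ": ("-e", "numeric"),
    "tandemQ": ("-h", "numeric"),
    "no_indel": ("-I", "boolean"),
    "max_indel_depth": ("-L", "numeric"),
    "openQ": ("-o", "numeric"),
    "pl_list": ("-P", "string"),
}


def mpileup_flags(settings):
    """Translate workflow parameter values into mpileup arguments."""
    args = []
    for name, value in settings.items():
        if name not in MPILEUP_PARAMS:
            raise KeyError(f"not an mpileup-stage parameter: {name}")
        flag, kind = MPILEUP_PARAMS[name]
        if kind == "boolean":
            # Boolean parameters are bare switches: emitted only when true.
            if value:
                args.append(flag)
        elif value != "":
            # Numeric and string parameters take a value; empty means unset.
            args.extend([flag, str(value)])
    return args


# Hypothetical settings, chosen only to exercise each parameter type.
example = {"disable_baq": True, "no_indel": False, "min_baseq": 20, "reg": "chr1:1-1000"}
print(mpileup_flags(example))
```

The same approach would extend to the bcf view and varFilter parameters, using the flags listed for those stages.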

Input/Output Ports

The element has 2 input ports:

Name in GUI: Input assembly

Name in Workflow File: in-assembly

Slots:

| Slot in GUI | Slot in Workflow File | Type |
|---|---|---|
| Dataset name | dataset | string |
| Source url | url | string |

Name in GUI: Input sequences

Name in Workflow File: in-sequence

Slots:

| Slot in GUI | Slot in Workflow File | Type |
|---|---|---|
| Source url | url | string |

And 1 output port:

Name in GUI: Output variations

Name in Workflow File: out-variations

Slots:

| Slot in GUI | Slot in Workflow File | Type |
|---|---|---|
| Variation track | variation-track | variation |
