
In general, DIAMOND is a sequence aligner for protein and translated DNA searches, similar to the NCBI BLAST software tools. However, it can run up to 20,000 times faster than BLAST.
Using this workflow element, one can use DIAMOND for taxonomic classification of short DNA reads and longer sequences such as scaffolds.

Kraken is a taxonomic sequence classifier that assigns taxonomic labels to short DNA reads. It does this by examining the k-mers within a read and querying a database with those.
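
The k-mer lookup mentioned above can be illustrated with a minimal sketch. It is not Kraken's actual algorithm, which resolves hits through a taxonomy tree; it only shows how a read is split into overlapping k-mers and how each k-mer is looked up in a database. The function names and the toy database (a plain dictionary from k-mer to taxonomy ID) are assumptions made for illustration.

```python
from collections import Counter


def kmers(read: str, k: int) -> list[str]:
    """Return every overlapping substring of length k in the read, in order."""
    if k <= 0:
        raise ValueError("k must be positive")
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def count_hits(read: str, k: int, kmer_to_taxon: dict[str, int]) -> Counter:
    """Count how many k-mers of the read map to each taxonomy ID.

    kmer_to_taxon is a hypothetical in-memory stand-in for a real database.
    K-mers that are absent from the database are ignored.
    """
    hits = Counter()
    for kmer in kmers(read, k):
        taxon = kmer_to_taxon.get(kmer)
        if taxon is not None:
            hits[taxon] += 1
    return hits


if __name__ == "__main__":
    toy_db = {"ACG": 562, "GTA": 562, "CGT": 1280}  # made-up entries
    print(count_hits("ACGTA", 3, toy_db))
```

A real classifier would then turn these per-taxon counts into a single label; that step is left out here because the page does not describe it.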

Parameters in GUI

Parameter	Description	Default value
Database	Input a binary DIAMOND database file.
Genetic code	Genetic code used for translation of query sequences (--query-gencode).	The standard genetic code
Sensitive mode	The sensitive modes (--sensitive, --more-sensitive) are generally recommended for aligning longer sequences. The default mode is mainly designed for short read alignment, i.e. finding significant matches of >50 bits on 30-40aa fragments.	Default
Frameshift	Penalty for frameshift in DNA-vs-protein alignments. Values around 15 are reasonable for this parameter. Enabling this feature will have the aligner tolerate missing bases in DNA sequences and is most recommended for long, error-prone sequences like MinION reads.	Skipped
Expected value	Maximum expected value to report an alignment.	0.0010
Matrix	Scoring matrix (--matrix).	BLOSUM62
Gap open penalty	Gap open penalty (--gapopen).	Default
Gap extension penalty	Gap extension penalty (--gapextend).	Default
Block size	Block size in billions of sequence letters to be processed at a time (--block-size). This is the main parameter for controlling the program’s memory usage. Larger values increase the use of memory and temporary disk space, but also improve performance. The program can be expected to use roughly six times this number in memory (in GB).	2
Index chunks	The number of chunks for processing the seed index (--index-chunks). This option can be additionally used to tune the performance. It is recommended to set this to 1 on a high memory server, which will increase performance and memory usage, but not the usage of temporary disk space.	4
Number of threads	Number of CPU threads (--threads).
Input data	To classify single-end (SE) reads or scaffolds obtained by de novo assembly of reads, set this parameter to "SE reads or scaffolds". To classify paired-end (PE) reads, set the value to "PE reads". One or two slots of the input port are used depending on the value of the parameter. Pass URL(s) to data to these slots. The input files should be in FASTA or FASTQ format.	SE reads or scaffolds
Database	A path to the folder with the Kraken database files.
Quick operation	Stop classification of an input read after a certain number of hits. The value can be specified in the "Minimum number of hits" parameter.	False
Load database into memory	Load the Kraken database into RAM (--preload). This can be useful to improve the speed. The database size should be less than the RAM size. The other option to improve the speed is to store the database on ramdisk. Set this parameter to "False" in this case.	True
Number of threads	Use multiple threads (--threads).	8
Output file	Specify the output file name.	auto
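
The "Block size" description gives a rule of thumb: memory use is roughly six times the block size, in GB. The sketch below only turns that rule into arithmetic for planning purposes. The function names are hypothetical, and the factor of six is the approximate figure quoted in the table, not a measured value.

```python
def estimate_memory_gb(block_size: float, factor: float = 6.0) -> float:
    """Rough memory estimate in GB for a given block size.

    block_size is in billions of sequence letters; factor is the
    approximate multiplier quoted in the parameter description.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return block_size * factor


def max_block_size(available_gb: float, factor: float = 6.0) -> float:
    """Largest block size that fits the rule of thumb for the given RAM."""
    if available_gb <= 0:
        raise ValueError("available_gb must be positive")
    return available_gb / factor


if __name__ == "__main__":
    print(estimate_memory_gb(2))   # default block size from the table
    print(max_block_size(48))      # a machine with 48 GB of RAM
```

With the default block size of 2, the estimate comes to about 12 GB, so a machine with less memory would need a smaller value.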

Parameters in Workflow File

Type: diamond-classify / kraken-classify

Parameter	Parameter in the GUI	Type
database	Database	string
genetic-code	Genetic code	number
sensitive-mode	Sensitive mode	string
frame-shift	Frameshift	number
e-value	Expected value	number
matrix	Matrix	string
gap-open	Gap open penalty	number
gap-extend	Gap extension penalty	number
block-size	Block size	number
index-chunks	Index chunks	number
input-data	Input data	string
database	Database	string
quick-operation	Quick operation	bool
preload	Load database into memory	bool
threads	Number of threads	number
output-url	Output file	string
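
To show how the workflow-file parameters relate to the DIAMOND command-line options named in the GUI table, here is a minimal sketch that builds an argument list from a parameter dictionary. Several points are assumptions rather than statements from this page: the use of the `blastx` subcommand, the `--db`, `--evalue` and `--frameshift` flags (these exist in DIAMOND but are not mentioned above), the string values accepted for `sensitive-mode`, and the idea that the element invokes DIAMOND this way at all.

```python
# Workflow parameter name -> DIAMOND option taking one value.
# "--evalue" and "--frameshift" are not named on this page (assumption).
OPTION_FLAGS = {
    "genetic-code": "--query-gencode",
    "frame-shift": "--frameshift",
    "e-value": "--evalue",
    "matrix": "--matrix",
    "gap-open": "--gapopen",
    "gap-extend": "--gapextend",
    "block-size": "--block-size",
    "index-chunks": "--index-chunks",
    "threads": "--threads",
}

# Hypothetical string values for the "sensitive-mode" parameter.
SENSITIVITY_FLAGS = {
    "sensitive": "--sensitive",
    "more-sensitive": "--more-sensitive",
}


def build_diamond_args(params: dict) -> list[str]:
    """Translate workflow parameters into a DIAMOND argument list.

    Parameters that are absent or None are skipped, so DIAMOND's own
    defaults apply to them.
    """
    args = ["diamond", "blastx", "--db", str(params["database"])]
    mode = params.get("sensitive-mode", "default")
    if mode in SENSITIVITY_FLAGS:
        args.append(SENSITIVITY_FLAGS[mode])
    for name, flag in OPTION_FLAGS.items():
        value = params.get(name)
        if value is not None:
            args += [flag, str(value)]
    return args


if __name__ == "__main__":
    example = {"database": "db.dmnd", "sensitive-mode": "sensitive",
               "block-size": 2, "index-chunks": 4}
    print(" ".join(build_diamond_args(example)))
```

The query and output files are left out on purpose, since in the element they come from the input port and the "Output file" parameter.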

Input/Output Ports

The element has 1 input port:

Name in GUI: Input sequences

URL(s) to FASTQ or FASTA file(s) should be provided. The input files may contain single-end (SE) reads, scaffolds, or paired-end (PE) reads. For SE reads or scaffolds, use the "Input URL 1" slot only. For PE reads, pass the "left" reads to "Input URL 1" and the "right" reads to "Input URL 2". See also the "Input data" parameter of the element.

Name in Workflow File: in

Slots:

Slot in GUI	Slot in Workflow File	Type
Input URL	url	string
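
The slot rules for single-end and paired-end input can be summarized in a short sketch. The function name is hypothetical; the mode strings and the slot names "Input URL 1" and "Input URL 2" are taken from the text of this page, and the validation rules are an assumption about how the "Input data" parameter and the slots fit together.

```python
def assign_input_slots(input_data: str, urls: list[str]) -> dict[str, str]:
    """Map input file URLs to port slots according to the "Input data" mode."""
    if input_data == "SE reads or scaffolds":
        if len(urls) != 1:
            raise ValueError("SE reads or scaffolds need exactly one URL")
        return {"Input URL 1": urls[0]}
    if input_data == "PE reads":
        if len(urls) != 2:
            raise ValueError("PE reads need two URLs: left reads, then right reads")
        # The "left" reads go to slot 1 and the "right" reads to slot 2.
        return {"Input URL 1": urls[0], "Input URL 2": urls[1]}
    raise ValueError(f"unknown Input data value: {input_data!r}")


if __name__ == "__main__":
    print(assign_input_slots("PE reads", ["left.fastq", "right.fastq"]))
```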

The element has 1 output port:

Name in GUI: DIAMOND Classification / Kraken Classification

A map of sequence names to the associated taxonomy IDs, as classified by DIAMOND or Kraken.

Name in Workflow File: out

Slots:

Slot in GUI	Slot in Workflow File	Type
Taxonomy classification data	tax-data	tax-classification
