In general, DIAMOND is a sequence aligner for protein and translated DNA searches, similar to the NCBI BLAST software tools. However, it can run up to 20,000 times faster than BLAST. Using this workflow element, one can use DIAMOND for taxonomic classification of short DNA reads and longer sequences such as scaffolds.
Kraken is a taxonomic sequence classifier that assigns taxonomic labels to short DNA reads. It does this by examining the k-mers within a read and querying a database with those k-mers.
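To illustrate the k-mer lookup idea described above, here is a minimal Python sketch. It is a simplification and not Kraken's implementation: the toy database, its taxonomy IDs and the function names are hypothetical, and a plain majority vote over k-mer hits stands in for Kraken's own scoring of paths in the taxonomy tree.

```python
from collections import Counter
from typing import Dict, Iterator


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every overlapping k-mer of seq, uppercased."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def classify_read(read: str, kmer_db: Dict[str, int], k: int) -> int:
    """Return the taxonomy ID hit most often by the read's k-mers, or 0 if none hit.

    Simplified majority vote; Kraken itself scores root-to-leaf taxonomy paths.
    """
    hits = Counter(kmer_db[km] for km in kmers(read, k) if km in kmer_db)
    if not hits:
        return 0  # 0 marks an unclassified read
    taxid, _count = hits.most_common(1)[0]
    return taxid


# Hypothetical toy database: k-mers mapped to illustrative taxonomy IDs.
toy_db = {"ACGT": 562, "CGTA": 562, "GTAC": 1280}
print(classify_read("ACGTACG", toy_db, k=4))  # two hits for 562, one for 1280
```

Real classifiers use much longer k-mers and a precomputed database; the toy values only show the lookup-and-vote flow.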
Parameters in GUI
DIAMOND parameters:
Parameter | Description | Default value
---|---|---
Database | Input a binary DIAMOND database file. |
Genetic code | Genetic code used for translation of query sequences (--query-gencode). | The standard genetic code
Sensitive mode | The sensitive modes (--sensitive, --more-sensitive) are generally recommended for aligning longer sequences. The default mode is mainly designed for short-read alignment, i.e. finding significant matches of >50 bits on 30-40 aa fragments. | Default
Frameshift | Penalty for frameshifts in DNA-vs-protein alignments. Values around 15 are reasonable for this parameter. Enabling this feature makes the aligner tolerate missing bases in DNA sequences; it is most recommended for long, error-prone sequences such as MinION reads. | Skipped
Expected value | Maximum expected value to report an alignment. | 0.0010
Matrix | Scoring matrix (--matrix). | BLOSUM62
Gap open penalty | Gap open penalty (--gapopen). | Default
Gap extension penalty | Gap extension penalty (--gapextend). | Default
Block size | Block size in billions of sequence letters to be processed at a time (--block-size). This is the main parameter for controlling the program's memory usage. Larger values increase memory and temporary disk space usage, but also improve performance. The program can be expected to use roughly six times this value in GB of memory. | 2
Index chunks | The number of chunks for processing the seed index (--index-chunks). This option can additionally be used to tune performance. It is recommended to set this to 1 on a high-memory server, which will increase performance and memory usage, but not the usage of temporary disk space. | 4
Number of threads | Number of CPU threads (--threads). |
Output file | Specify the output file name. | auto

Kraken parameters:
Parameter | Description | Default value
---|---|---
Input data | To classify single-end (SE) reads or scaffolds obtained by de novo assembly of reads, set this parameter to "SE reads or scaffolds". | SE reads or scaffolds
Database | A path to the folder with the Kraken database files. |
Quick operation | Stop classification of an input read after a certain number of hits. | False
Load database into memory | Load the Kraken database into RAM (--preload). | True
Number of threads | Use multiple threads (--threads). | 8
Output file | Specify the output file name. | auto
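The DIAMOND parameters above correspond to DIAMOND command-line options. The following Python sketch assembles a `diamond blastx` argument list from them and estimates memory use from the block size (roughly six times the block size, in GB). It is an illustration under assumptions: the function names are hypothetical, the standard genetic code is assumed to be NCBI translation table 1, the `--db`, `--query`, `--out`, `--evalue` and `--frameshift` option names are DIAMOND's own flags rather than anything listed in the table, and UGENE's actual invocation may add further options (for example, an output format suited to taxonomic classification).

```python
from typing import List, Optional

# Hypothetical mapping from "Sensitive mode" values to DIAMOND flags.
SENSITIVITY_FLAGS = {
    "default": [],
    "sensitive": ["--sensitive"],
    "more-sensitive": ["--more-sensitive"],
}


def estimated_memory_gb(block_size: float) -> float:
    """Rough memory estimate: about six times the block size, in GB."""
    return 6 * block_size


def diamond_args(database: str, query: str, output: str, threads: int,
                 genetic_code: int = 1, sensitive_mode: str = "default",
                 frameshift: Optional[int] = None, evalue: float = 0.001,
                 matrix: str = "BLOSUM62", gap_open: Optional[int] = None,
                 gap_extend: Optional[int] = None, block_size: float = 2,
                 index_chunks: int = 4) -> List[str]:
    """Build a DIAMOND blastx argument list; None means "leave DIAMOND's default"."""
    if sensitive_mode not in SENSITIVITY_FLAGS:
        raise ValueError(f"unknown sensitive mode: {sensitive_mode}")
    args = ["diamond", "blastx", "--db", database, "--query", query,
            "--out", output, "--query-gencode", str(genetic_code),
            "--evalue", str(evalue), "--matrix", matrix,
            "--block-size", str(block_size),
            "--index-chunks", str(index_chunks),
            "--threads", str(threads)]
    args += SENSITIVITY_FLAGS[sensitive_mode]
    # Optional settings are passed only when set ("Skipped"/"Default" in the GUI).
    for flag, value in (("--frameshift", frameshift), ("--gapopen", gap_open),
                        ("--gapextend", gap_extend)):
        if value is not None:
            args += [flag, str(value)]
    return args


print(" ".join(diamond_args("nr.dmnd", "reads.fasta", "hits.tsv", threads=8,
                            sensitive_mode="sensitive", frameshift=15)))
print(estimated_memory_gb(2))  # about 12 GB for the default block size
```

Leaving unset options out of the list, rather than passing guessed values, keeps DIAMOND's own defaults in effect, which matches the "Default" entries in the table.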
Parameters in Workflow File
DIAMOND parameters:
Type: diamond-classify
Parameter | Parameter in the GUI | Type
---|---|---
database | Database | string
genetic-code | Genetic code | number
sensitive-mode | Sensitive mode | string
frame-shift | Frameshift | number
e-value | Expected value | number
matrix | Matrix | string
gap-open | Gap open penalty | number
gap-extend | Gap extension penalty | number
block-size | Block size | number
index-chunks | Index chunks | number
threads | Number of threads | number
output-url | Output file | string

Kraken parameters:
Type: kraken-classify
Parameter | Parameter in the GUI | Type
---|---|---
input-data | Input data | string
database | Database | string
quick-operation | Quick operation | bool
preload | Load database into memory | bool
threads | Number of threads | number
output-url | Output file | string
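The Type column above can be checked mechanically before a workflow is run. The Python sketch below is hypothetical and not part of UGENE: it assumes that the parameters from database through index-chunks, together with threads and output-url, belong to the DIAMOND element, and that input-data through output-url belong to the Kraken element. It reports unknown names and values whose Python type does not match the declared string, number or bool type.

```python
from typing import Any, Dict, List

# Declared types copied from the "Parameters in Workflow File" table.
DIAMOND_PARAMS = {
    "database": "string", "genetic-code": "number", "sensitive-mode": "string",
    "frame-shift": "number", "e-value": "number", "matrix": "string",
    "gap-open": "number", "gap-extend": "number", "block-size": "number",
    "index-chunks": "number", "threads": "number", "output-url": "string",
}
KRAKEN_PARAMS = {
    "input-data": "string", "database": "string", "quick-operation": "bool",
    "preload": "bool", "threads": "number", "output-url": "string",
}


def matches(value: Any, declared: str) -> bool:
    """True if value fits the declared workflow type."""
    if declared == "bool":
        return isinstance(value, bool)
    if declared == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    return isinstance(value, str)


def type_errors(params: Dict[str, Any], schema: Dict[str, str]) -> List[str]:
    """List unknown parameter names and type mismatches."""
    errors = []
    for name, value in params.items():
        if name not in schema:
            errors.append(f"unknown parameter: {name}")
        elif not matches(value, schema[name]):
            errors.append(f"{name}: expected {schema[name]}, got {type(value).__name__}")
    return errors


print(type_errors({"threads": 8, "preload": "yes"}, KRAKEN_PARAMS))
```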
Input/Output Ports
The element has 1 input port:
Name in GUI: Input sequences:
URL(s) to FASTQ or FASTA file(s) should be provided. The input files may contain single-end reads, scaffolds, or "left" reads in case of paired-end sequencing. In case of SE reads or scaffolds, use the "Input URL 1" slot only. In case of PE reads, input "left" reads to "Input URL 1" and "right" reads to "Input URL 2". See also the "Input data" parameter of the element.
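How the input slots relate to the "Input data" parameter can be sketched as a Kraken command line. This Python sketch assumes the Kraken 1 command-line flags (`--db`, `--threads`, `--output`, `--preload`, `--quick`, `--min-hits`, `--paired`); the function name and the "PE reads" value for the input mode are hypothetical, and UGENE's actual invocation may differ.

```python
from typing import List, Optional

SE_MODE = "SE reads or scaffolds"
PE_MODE = "PE reads"  # hypothetical label for the paired-end setting


def kraken_args(database: str, output: str, url1: str,
                url2: Optional[str] = None, input_data: str = SE_MODE,
                threads: int = 8, preload: bool = True, quick: bool = False,
                min_hits: Optional[int] = None) -> List[str]:
    """Build a Kraken 1 argument list from the element's parameters and input slots."""
    args = ["kraken", "--db", database, "--threads", str(threads),
            "--output", output]
    if preload:
        args.append("--preload")  # "Load database into memory"
    if quick:
        args.append("--quick")  # "Quick operation"
        if min_hits is not None:
            args += ["--min-hits", str(min_hits)]
    if input_data == SE_MODE:
        if url2 is not None:
            raise ValueError('SE reads or scaffolds use the "Input URL 1" slot only')
        args.append(url1)
    elif input_data == PE_MODE:
        if url2 is None:
            raise ValueError('PE reads need "right" reads in "Input URL 2"')
        args += ["--paired", url1, url2]  # "left" reads first, then "right"
    else:
        raise ValueError(f"unknown input data mode: {input_data}")
    return args


print(kraken_args("kraken_db", "out.txt", "left.fq", "right.fq", input_data=PE_MODE))
```

The defaults (8 threads, database preloaded, quick operation off) follow the Kraken parameter table; rejecting a second URL in SE mode mirrors the "Input URL 1 only" rule above.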
Name in Workflow File: in
Slots:
Slot in GUI | Slot in Workflow File | Type
---|---|---|
Input URL | url | string |
The element has 1 output port:
Name in GUI: DIAMOND Classification or Kraken Classification, depending on the element:
A map of sequence names to their associated taxonomy IDs, as classified by DIAMOND or Kraken.
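The output is a map from sequence names to taxonomy IDs. As a sketch of how such a map can be read from a classifier's tab-separated report, the Python function below assumes Kraken's standard output layout (a C/U classified flag, the sequence name, the taxonomy ID, then further columns). The function name and the sample lines are hypothetical, and the sketch does not cover DIAMOND's report layout.

```python
import csv
import io
from typing import Dict, TextIO


def parse_kraken_output(handle: TextIO) -> Dict[str, int]:
    """Map sequence names to taxonomy IDs from Kraken-style tab-separated output.

    Assumed columns: C/U flag, sequence name, taxonomy ID, then further columns.
    Unclassified sequences (flag "U") are mapped to taxonomy ID 0.
    """
    result = {}
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row:
            continue  # skip blank lines
        flag, name, taxid = row[0], row[1], int(row[2])
        result[name] = taxid if flag == "C" else 0
    return result


# Hypothetical sample: one classified and one unclassified read.
sample = "C\tread1\t562\t100\t562:70\nU\tread2\t0\t100\t0:70\n"
print(parse_kraken_output(io.StringIO(sample)))
```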
Name in Workflow File: out
Slots:
Slot in GUI | Slot in Workflow File | Type
---|---|---|
Taxonomy classification data | tax-data | tax-classification |