Classify Sequences with DIAMOND

DIAMOND is a sequence aligner for protein and translated DNA searches, similar to the NCBI BLAST software tools, but it can run up to 20,000 times faster than BLAST.
This workflow element uses DIAMOND for taxonomic classification of short DNA reads and of longer sequences such as scaffolds.

Parameters in GUI

| Parameter | Description | Default value |
|---|---|---|
| Database | Input a binary DIAMOND database file. | |
| Genetic code | Genetic code used for translation of query sequences (--query-gencode). | The standard genetic code |
| Sensitive mode | The sensitive modes (--sensitive, --more-sensitive) are generally recommended for aligning longer sequences. The default mode is mainly designed for short read alignment, i.e. finding significant matches of >50 bits on 30-40 aa fragments. | Default |
| Frameshift | Penalty for a frameshift in DNA-vs-protein alignments. Values around 15 are reasonable for this parameter. Enabling this feature makes the aligner tolerate missing bases in DNA sequences; it is mainly recommended for long, error-prone sequences such as MinION reads. | Skipped |
| Expected value | Maximum expected value to report an alignment. | 0.001 |
| Matrix | Scoring matrix (--matrix). | BLOSUM62 |
| Gap open penalty | Gap open penalty (--gapopen). | Default |
| Gap extension penalty | Gap extension penalty (--gapextend). | Default |
| Block size | Block size in billions of sequence letters to be processed at a time (--block-size). This is the main parameter for controlling the program’s memory usage. Bigger numbers increase the use of memory and temporary disk space, but also improve performance. The program can be expected to use roughly six times this number in memory (in GB). | 2 |
| Index chunks | The number of chunks for processing the seed index (--index-chunks). This option can additionally be used to tune performance. It is recommended to set this to 1 on a high-memory server, which increases performance and memory usage, but not the usage of temporary disk space. | 4 |
| Number of threads | Number of CPU threads (--threads). | 8 |
| Output file | Specify the output file name. | auto |
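To make the relationship between the GUI parameters and the DIAMOND command line more concrete, the following is a minimal sketch in Python. It only illustrates how the options named in the table could be combined; it is not how the workflow element itself launches DIAMOND. The function names `build_diamond_command` and `estimated_memory_gb` are hypothetical. The use of `diamond blastx` with `--outfmt 102` (DIAMOND's taxonomic classification output), and the options `--db`, `--query`, `--out`, `--evalue` and `--frameshift`, are assumptions about the underlying call that are not stated on this page.

```python
def build_diamond_command(database, query, output, *, genetic_code=1,
                          sensitive_mode="default", frameshift=None,
                          evalue=0.001, matrix="BLOSUM62", gap_open=None,
                          gap_extend=None, block_size=2.0, index_chunks=4,
                          threads=8):
    """Assemble a DIAMOND argument list from the parameters in the table.

    Assumption: the element runs `diamond blastx` with taxonomic output
    (--outfmt 102); this page does not confirm the exact invocation.
    """
    cmd = [
        "diamond", "blastx",
        "--db", database,
        "--query", query,
        "--out", output,
        "--outfmt", "102",                     # assumed: taxonomic classification
        "--query-gencode", str(genetic_code),  # 1 = the standard genetic code
        "--evalue", str(evalue),
        "--matrix", matrix,
        "--block-size", str(block_size),
        "--index-chunks", str(index_chunks),
        "--threads", str(threads),
    ]

    # "Default" sensitivity adds no flag; the other modes map to one flag each.
    mode_flags = {
        "default": None,
        "sensitive": "--sensitive",
        "more-sensitive": "--more-sensitive",
    }
    if sensitive_mode not in mode_flags:
        raise ValueError(f"unknown sensitive mode: {sensitive_mode}")
    if mode_flags[sensitive_mode]:
        cmd.append(mode_flags[sensitive_mode])

    # Options left at "Skipped" or "Default" are omitted from the command.
    if frameshift is not None:
        cmd += ["--frameshift", str(frameshift)]
    if gap_open is not None:
        cmd += ["--gapopen", str(gap_open)]
    if gap_extend is not None:
        cmd += ["--gapextend", str(gap_extend)]
    return cmd


def estimated_memory_gb(block_size):
    """Rule of thumb from the table: roughly six times the block size, in GB."""
    return 6 * block_size


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(build_diamond_command("uniref.dmnd", "reads.fastq", "out.tsv",
                                         sensitive_mode="sensitive", frameshift=15)))
    print(estimated_memory_gb(2), "GB expected for the default block size")
```

With the default block size of 2, the rule of thumb gives about 12 GB of memory, which may help when choosing between a larger block size and a smaller number of index chunks.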

Parameters in Workflow File

Type: diamond-classify

| Parameter | Parameter in the GUI | Type |
|---|---|---|
| database | Database | string |
| genetic-code | Genetic code | number |
| sensitive-mode | Sensitive mode | string |
| frame-shift | Frameshift | number |
| e-value | Expected value | number |
| matrix | Matrix | string |
| gap-open | Gap open penalty | number |
| gap-extend | Gap extension penalty | number |
| block-size | Block size | number |
| index-chunks | Index chunks | number |
| threads | Number of threads | number |
| output-url | Output file | string |
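The parameter names and types listed above can be checked mechanically before a workflow file is written. The sketch below is one possible way to do that in Python; the helper `validate_parameters` and the dictionary `PARAMETER_TYPES` are hypothetical and not part of any UGENE API. It assumes that "number" covers both integers and floating-point values.

```python
# Workflow-file parameter names and their documented types.
# Assumption: "number" accepts both int and float.
NUMBER = (int, float)
PARAMETER_TYPES = {
    "database": str,
    "genetic-code": NUMBER,
    "sensitive-mode": str,
    "frame-shift": NUMBER,
    "e-value": NUMBER,
    "matrix": str,
    "gap-open": NUMBER,
    "gap-extend": NUMBER,
    "block-size": NUMBER,
    "index-chunks": NUMBER,
    "threads": NUMBER,
    "output-url": str,
}


def validate_parameters(params):
    """Return a list of problems found in a diamond-classify parameter dict."""
    errors = []
    for name, value in params.items():
        expected = PARAMETER_TYPES.get(name)
        if expected is None:
            errors.append(f"unknown parameter: {name}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(value, bool) or not isinstance(value, expected):
            kind = "string" if expected is str else "number"
            errors.append(f"{name}: expected {kind}, got {type(value).__name__}")
    return errors


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(validate_parameters({"database": "/data/uniref.dmnd", "threads": "8"}))
```

A check like this catches misspelled names (for example `treads` instead of `threads`) and values given with the wrong type.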

Input/Output Ports

The element has 1 input port:

Name in GUI: Input sequences

URL(s) to FASTQ or FASTA file(s) should be provided.

The input files may contain single-end reads, scaffolds, or the "left" reads in the case of paired-end sequencing (see the "Input data" parameter of the element).

Name in Workflow File: in

Slots:

| Slot in GUI | Slot in Workflow File | Type |
|---|---|---|
| Input URL | url | string |
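Because the input port accepts both FASTA and FASTQ files, it can be useful to confirm the format of a file before passing its URL to the element. The following sketch relies only on the general file-format conventions that FASTA records start with `>` and FASTQ records start with `@`; the function `sniff_sequence_format` is hypothetical and is not part of the element.

```python
import io


def sniff_sequence_format(stream):
    """Guess whether a text stream holds FASTA or FASTQ data.

    Only the first non-empty line is inspected, so this is a quick
    sanity check rather than a full validation of the file.
    """
    for line in stream:
        stripped = line.strip()
        if not stripped:
            continue  # skip leading blank lines
        if stripped.startswith(">"):
            return "fasta"
        if stripped.startswith("@"):
            return "fastq"
        return "unknown"
    return "unknown"  # empty input


if __name__ == "__main__":
    # In-memory examples; a real file would be opened with open(path).
    print(sniff_sequence_format(io.StringIO(">scaffold_1\nACGTACGT\n")))
    print(sniff_sequence_format(io.StringIO("@read_1\nACGT\n+\nIIII\n")))
```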

The element has 1 output port:

Name in GUI: DIAMOND Classification

A list of sequence names with the associated taxonomy IDs, classified by DIAMOND.

Name in Workflow File: out

Slots:

| Slot in GUI | Slot in Workflow File | Type |
|---|---|---|
| Taxonomy classification data | tax-data | tax-classification |
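As an illustration of what "a list of sequence names with the associated taxonomy IDs" might look like when processed downstream, here is a minimal Python sketch. It assumes the classification is stored as tab-separated text with the sequence name in the first column and the taxonomy ID in the second, which mirrors DIAMOND's own taxonomic output format. This page does not specify the layout of the element's output file, so treat that as an assumption; the function `read_classification` is hypothetical.

```python
import csv
import io


def read_classification(stream):
    """Map each sequence name to its taxonomy ID.

    Assumed layout: tab-separated rows, sequence name in column 1 and
    taxonomy ID in column 2; any further columns are ignored.
    """
    classification = {}
    for row in csv.reader(stream, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        classification[row[0]] = int(row[1])
    return classification


if __name__ == "__main__":
    # Made-up example rows, for illustration only.
    example = io.StringIO("read_1\t562\t1e-30\nread_2\t0\t0\n")
    print(read_classification(example))
```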
