Build a Kraken database from a genomic library or shrink a Kraken database.
CLARK (CLAssifier based on Reduced K-mers) is a tool for supervised sequence classification based on discriminative k-mers.
UGENE provides a GUI for the CLARK and CLARK-l variants of the CLARK framework, which assign metagenomic reads to known genomes.
Parameters in GUI
Parameter | Description | Default value |
---|---|---|
Mode | Select "Build" to create a new database from a genomic library (--build). | Build |
Database | Name of the output Kraken database (corresponds to --db, which is used with --build, and to --new-db, which is used with --shrink). | |
Genomic library | Genomes that should be used to build the database. | |
K-mer length | K-mer length in bp (--kmer-len). | 31 |
Minimizer length | Minimizer length in bp (--minimizer-len). The minimizers serve to keep k-mers that are adjacent in query sequences close to each other in the database, which allows Kraken to exploit the CPU cache. | 15 |
Maximum database size | By default, a full database build is done. To shrink the database before the full build, input the size of the database in Mb (this corresponds to the --max-db-size parameter, but Mb is used instead of Gb). The size is specified together for the database and the index. | No limit |
Clean | Remove unneeded files from a built database to reduce the disk usage (--clean). | True |
Work on disk | Perform most operations on disk rather than in RAM (this slows down the build in most cases). | False |
Jellyfish hash size | The "kraken-build" tool uses the "jellyfish" tool. This parameter specifies the hash size for Jellyfish. Supply a smaller hash size to Jellyfish if you encounter problems with allocating enough memory during the build process (--jellyfish-hash-size). | Skip |
Number of threads | Use multiple threads (--threads). | 8 |
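The mapping from the parameters above to `kraken-build` flags can be sketched as follows. This is a minimal illustration, not UGENE's actual implementation: the function name `kraken_build_commands` and its argument names are hypothetical, the `--work-on-disk` flag is taken from the `kraken-build` help rather than from the table, the Mb-to-Gb conversion assumes 1 Gb = 1024 Mb, and `--clean` is assumed to run as a separate `kraken-build` call after the build. Adding genomes to the library is left out.

```python
def kraken_build_commands(db, kmer_len=31, minimizer_len=15,
                          max_db_size_mb=None, clean=True,
                          work_on_disk=False, jellyfish_hash_size=None,
                          threads=8):
    """Return kraken-build argument lists for "Build" mode (hypothetical helper)."""
    build = [
        "kraken-build", "--build", "--db", db,
        "--kmer-len", str(kmer_len),
        "--minimizer-len", str(minimizer_len),
        "--threads", str(threads),
    ]
    if max_db_size_mb is not None:
        # Assumption: the GUI value is in Mb, the flag expects Gb (1 Gb = 1024 Mb).
        build += ["--max-db-size", f"{max_db_size_mb / 1024:g}"]
    if work_on_disk:
        # Assumption: flag name taken from the kraken-build help, not from the table.
        build.append("--work-on-disk")
    if jellyfish_hash_size is not None:
        build += ["--jellyfish-hash-size", str(jellyfish_hash_size)]

    commands = [build]
    if clean:
        # Assumption: cleaning is a separate kraken-build task run after the build.
        commands.append(["kraken-build", "--clean", "--db", db])
    return commands


# Print the command lines for a database limited to 2048 Mb.
for cmd in kraken_build_commands("my_kraken_db", max_db_size_mb=2048):
    print(" ".join(cmd))
```

Returning argument lists rather than one shell string keeps each value a separate argument, so database paths with spaces need no quoting when passed to `subprocess.run`.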
Parameters in Workflow File
Type: kraken-build
Parameters in GUI
Parameter | Description | Default value |
---|---|---|
Input data | To classify single-end (SE) reads or scaffolds obtained by de novo read assembly, set this parameter to "SE reads or scaffolds". | SE reads or scaffolds |
Classification tool | Use CLARK-l on workstations with limited memory ("l" stands for light). This variant provides precise classification on small metagenomes. It works with a sparse or "light" database (up to 4 GB of RAM) while still delivering accurate and fast results. | CLARK-l |
Database | A path to the folder with the CLARK database files (-D). | |
Minimum k-mer frequency | Minimum k-mer frequency/occurrence for the discriminative k-mers (-t). For example, for 1 (or 2), the program discards any discriminative k-mer that appears only once (or fewer than twice). | 0 |
Mode | Set the mode of the execution (-m). | Default |
Gap | "Gap" or number of non-overlapping k-mers to pass when creating the database (-g). | 4 |
Load database into memory | Request loading of the database file as a memory-mapped file (--ldm). This option speeds up loading but requires a significant additional amount of RAM. It also allows the database to be loaded by multiple threads (see also the "Number of threads" parameter). | False |
Number of threads | Use multiple threads for the classification and, with the "Load database into memory" option enabled, for the loading of the database into RAM (-n). | 8 |
Output file | Specify the output file name. | auto |
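As a rough sketch, the classification parameters above translate into command-line options like this. The helper `clark_options` and its argument names are hypothetical, the executable names `CLARK` and `CLARK-l` and the gap flag `-g` are assumptions based on the CLARK distribution, and the arguments for the targets, input reads and result file are deliberately omitted because the table does not list their flags.

```python
def clark_options(database_dir, min_kmer_freq=0, mode=None, gap=4,
                  load_into_memory=False, threads=8, light=True):
    """Build the option part of a CLARK call (hypothetical helper)."""
    # Assumption: the two variants are shipped as "CLARK" and "CLARK-l".
    executable = "CLARK-l" if light else "CLARK"
    args = [
        executable,
        "-D", database_dir,         # folder with the CLARK database files
        "-t", str(min_kmer_freq),   # minimum k-mer frequency
        "-g", str(gap),             # gap (flag name is an assumption)
        "-n", str(threads),         # number of threads
    ]
    if mode is not None:
        args += ["-m", str(mode)]   # execution mode, passed through unchanged
    if load_into_memory:
        args.append("--ldm")        # load the database as a memory-mapped file
    return args


# Options for CLARK-l with the database preloaded into memory.
print(" ".join(clark_options("/data/clark_db", load_into_memory=True)))
```

The mode is passed through as given instead of being mapped from a GUI label, since the table does not state which values `-m` accepts.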
Parameters in Workflow File
Type: clark-classify
Parameter | Parameter in the GUI | Type |
---|---|---|
mode | Mode | string |
sequencing-reads | Input data | string |
tool-variant | Classification tool | number |
database | Database | string |
genomic-library | Genomic library | url-datasets |
k-mer-length | K-mer length | number |
k-min-freq | Minimum k-mer frequency | number |
minimizer-length | Minimizer length | number |
maximum-database-size | Maximum database size | bool |
mode | Mode | bool |
gap | Gap | number |
clean | Clean | bool |
work-on-disk | Work on disk | bool |
jellyfish-hash-size | Jellyfish hash size | number |
preload | Load database into memory | bool |
threads | Number of threads | number |
output-url | Output file | string |
Input/Output Ports
The kraken-build element has 1 output port:
Name in GUI: Output Kraken database
Name in Workflow File: out
Slots:
Slot in GUI | Slot in Workflow File | Type |
---|---|---|
Output URL | url | string |
The clark-classify element has 1 input port:
Name in GUI: Input sequences
URL(s) to FASTQ or FASTA file(s) should be provided. For SE reads or scaffolds, use the "Input URL 1" slot only.
For PE reads, supply the "left" reads to "Input URL 1" and the "right" reads to "Input URL 2". See also the "Input data" parameter of the element.
Name in Workflow File: in
Slots:
Slot in GUI | Slot in Workflow File | Type |
---|---|---|
Input URL 1 | url | string |
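The slot rule for the input port can be illustrated with a small sketch. The function `assign_input_slots` is hypothetical and only mirrors the rule described for the port: one URL goes to "Input URL 1" for SE reads or scaffolds, and the left and right files go to "Input URL 1" and "Input URL 2" for PE reads.

```python
def assign_input_slots(urls, paired_end=False):
    """Map input file URLs to the GUI slot names (hypothetical helper)."""
    if paired_end:
        if len(urls) != 2:
            raise ValueError("PE reads need exactly two files: left and right")
        left, right = urls
        return {"Input URL 1": left, "Input URL 2": right}
    if len(urls) != 1:
        raise ValueError("SE reads or scaffolds use the 'Input URL 1' slot only")
    return {"Input URL 1": urls[0]}


# Paired-end example: the left reads come first, the right reads second.
print(assign_input_slots(["reads_1.fastq", "reads_2.fastq"], paired_end=True))
```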
The element has 1 output port:
Name in GUI: CLARK Classification:
A map of sequence names with the associated taxonomy IDs, classified by CLARK.
Name in Workflow File: out
Slots:
Slot in GUI | Slot in Workflow File | Type |
---|---|---|
Taxonomy classification data | tax-data | tax-classification |