Attention: Metagenomics was available before v40.
The workflow sample, described below, takes FASTQ files with metagenomic NGS reads as input and processes them as follows:
If you haven't used the workflow samples in UGENE before, see the "How to Use Sample Workflows" section of the documentation.
The workflow sample "Serial NGS Reads Classification" can be found in the "NGS" section of the Workflow Designer samples.
The opened workflow for single-end reads looks as follows:
<center> <br> <img src="/wiki/download/attachments/22061358/Serial NGS Reads Classification.jpg"/> <br> </center>
The opened workflow for paired-end reads looks as follows:
<center> <br> <img src="/wiki/download/attachments/22061358/Serial NGS Reads Classification_1.jpg"/> <br> </center>
The wizard has 5 pages.
Input data: On this page, input files must be set.
<center> <br> <img src="/wiki/download/attachments/22061358/Serial NGS Reads Classification_2.jpg"/> <br> </center>
Trimmomatic settings: The Trimmomatic parameters can be changed here.
<center> <br> <img src="/wiki/download/attachments/22061358/Serial NGS Reads Classification_3.jpg"/> <br> </center>
To configure the trimming steps, use the following button:
<center> <br> <img src="/wiki/download/attachments/22062035/De novo Assemble Illumina PE Reads_3.jpg"/> <br> </center>
The following dialog will appear:
<center> <br> <img src="/wiki/download/attachments/22059547/Improve Reads with Trimmomatic Element.png"/> <br> </center>
Click the Add new step button and select a step. The following options are available:
Each step has its own parameters:
AVGQUAL: This step drops a read if its average quality is below the specified level.
Input the following values:
CROP: This step removes bases, regardless of quality, from the end of the read, so that the read has at most the specified length after the step has been performed. Steps performed after CROP may further shorten the read.
Input the following values:
HEADCROP: This step removes the specified number of bases, regardless of quality, from the beginning of the read.
Input the following values:
ILLUMINACLIP: This step is used to find and remove Illumina adapters.
Trimmomatic first compares short sections of an adapter and a read. If they match well enough, the entire alignment between the read and the adapter is scored. For paired-end reads, the "palindrome" approach is also used to improve the result. See the Trimmomatic manual for details.
Input the following values:
There are also two optional parameters for palindrome mode: Min adapter length and Keep both reads. To set them, press the Optional button, which opens the following dialog:
<center> <br> <img src="/wiki/download/attachments/22059547/Improve Reads with Trimmomatic Element_1.jpg"/> <br> </center>
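For readers who also run Trimmomatic from the command line, the adapter-clipping settings above correspond to a single colon-separated step specification. The following is a minimal Python sketch of how such a string could be composed. The function name and the adapter file name `TruSeq3-PE.fa` are illustrative assumptions, and the sketch does not show how UGENE itself passes the values to Trimmomatic.

```python
from typing import Optional


def illuminaclip_step(
    adapters_fasta: str,
    seed_mismatches: int,
    palindrome_clip_threshold: int,
    simple_clip_threshold: int,
    min_adapter_length: Optional[int] = None,
    keep_both_reads: Optional[bool] = None,
) -> str:
    """Compose an ILLUMINACLIP step in Trimmomatic's colon-separated syntax.

    Hypothetical helper for illustration only.
    """
    # The two palindrome-mode options are positional, so "keep both reads"
    # can only be given when "min adapter length" is given as well.
    if keep_both_reads is not None and min_adapter_length is None:
        raise ValueError("keep_both_reads requires min_adapter_length")

    parts = [
        "ILLUMINACLIP",
        adapters_fasta,
        str(seed_mismatches),
        str(palindrome_clip_threshold),
        str(simple_clip_threshold),
    ]
    if min_adapter_length is not None:
        parts.append(str(min_adapter_length))
        if keep_both_reads is not None:
            parts.append("true" if keep_both_reads else "false")
    return ":".join(parts)


if __name__ == "__main__":
    # Required parameters only.
    print(illuminaclip_step("TruSeq3-PE.fa", 2, 30, 10))
    # With both optional palindrome-mode parameters.
    print(illuminaclip_step("TruSeq3-PE.fa", 2, 30, 10, 8, True))
```

Leaving both optional arguments unset produces the four-parameter form, so Trimmomatic's own defaults for palindrome mode apply.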
LEADING: This step removes low-quality bases from the beginning of the read. As long as a base has a quality value below the specified threshold, the base is removed and the next base is examined.
Input the following values:
MAXINFO: This step performs an adaptive quality trim, balancing the benefits of retaining longer reads against the costs of retaining bases with errors. See the Trimmomatic manual for details.
Input the following values:
MINLEN: This step removes reads that fall below the specified minimum length. If required, it should normally be placed after all other processing steps. Reads removed by this step are counted and included in the "dropped reads" count.
Input the following values:
SLIDINGWINDOW: This step performs sliding-window trimming, cutting once the average quality within the window falls below a threshold. Because multiple bases are considered, a single poor-quality base will not cause the removal of high-quality data later in the read.
Input the following values:
TOPHRED33: This step (re)encodes the quality part of the FASTQ file to base 33.
TOPHRED64: This step (re)encodes the quality part of the FASTQ file to base 64.
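Re-encoding to base 33 or base 64 amounts to shifting the ASCII offset of every character in the FASTQ quality line. The Python sketch below illustrates that idea under this assumption only. The function name is hypothetical, and this is not Trimmomatic's own code.

```python
def reencode_quality(quality: str, from_offset: int, to_offset: int) -> str:
    """Shift each quality character from one ASCII offset to another.

    For example, from_offset=64 and to_offset=33 converts a base-64
    quality string to base 33.
    """
    converted = []
    for ch in quality:
        score = ord(ch) - from_offset  # numeric quality score of this base
        if score < 0:
            raise ValueError(
                f"character {ch!r} is not valid for offset {from_offset}"
            )
        converted.append(chr(score + to_offset))
    return "".join(converted)


if __name__ == "__main__":
    # 'h' is score 40 and 'B' is score 2 in base 64.
    print(reencode_quality("hhB", 64, 33))
```

The numeric scores are unchanged by the conversion; only the characters used to represent them differ.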
TRAILING: This step removes low-quality bases from the end of the read. As long as a base has a quality value below the specified threshold, the base is removed and the next base (i.e. the preceding one) is examined. This approach can be used to remove the special Illumina "low-quality segment" regions (which are marked with a quality score of 2), but SLIDINGWINDOW or MAXINFO are recommended instead.
Input the following values:
To remove a step, use the Remove selected step button. Pink highlighting means that a required parameter has not been set.
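The quality-based trimming steps described above (removing low-quality bases from the beginning or end of a read, and sliding-window trimming) can be illustrated with a short Python sketch that operates on a list of per-base quality scores. It is a simplified illustration and not Trimmomatic's implementation. The function names are hypothetical, and edge cases, such as reads shorter than the window, may be handled differently by the real tool.

```python
from typing import List


def trim_leading(quals: List[int], threshold: int) -> List[int]:
    """Drop bases from the start while their quality is below the threshold."""
    start = 0
    while start < len(quals) and quals[start] < threshold:
        start += 1
    return quals[start:]


def trim_trailing(quals: List[int], threshold: int) -> List[int]:
    """Drop bases from the end while their quality is below the threshold."""
    end = len(quals)
    while end > 0 and quals[end - 1] < threshold:
        end -= 1
    return quals[:end]


def sliding_window_trim(
    quals: List[int], window_size: int, required_quality: float
) -> List[int]:
    """Cut the read at the first window whose mean quality is too low.

    Simplification: reads shorter than the window are returned unchanged.
    """
    for start in range(len(quals) - window_size + 1):
        window = quals[start:start + window_size]
        if sum(window) / window_size < required_quality:
            return quals[:start]
    return quals


if __name__ == "__main__":
    quals = [2, 2, 30, 32, 31, 5, 30, 30, 3, 2]
    print(trim_leading(quals, 10))
    print(trim_trailing(quals, 10))
    # A single poor base inside good sequence does not trigger a cut.
    print(sliding_window_trim([30, 30, 5, 30, 30, 30], 4, 20))
    # A sustained drop in quality does.
    print(sliding_window_trim([30, 30, 30, 30, 10, 10, 10, 10], 4, 20))
```

The last two calls show why the windowed average is more tolerant than a per-base threshold: an isolated low score is averaged out, while a run of low scores pulls the window mean below the cutoff.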
Kraken settings: Default Kraken parameters can be changed here.
<center> <br> <img src="/wiki/download/attachments/22061358/Serial NGS Reads Classification_4.jpg"/> <br> </center>
The following parameters are available:
Database | A path to the folder with the Kraken database files. |
Quick operation | Stop classification of an input read after a certain number of hits. |
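As a rough command-line counterpart to these two parameters, the Python sketch below assembles an argument list for the Kraken 1 executable. It assumes the `--db`, `--quick`, `--min-hits`, `--fastq-input` and `--output` options as they are commonly documented for Kraken 1, so check them against `kraken --help` for your installation. The function name and file names are hypothetical, and the sketch does not show how UGENE itself invokes Kraken.

```python
from typing import List, Optional


def kraken_command(
    database_dir: str,
    reads_fastq: str,
    output_file: str,
    min_hits: Optional[int] = None,
) -> List[str]:
    """Build an argument list for a Kraken 1 classification run.

    Assumed flags: --db, --fastq-input, --output, --quick, --min-hits.
    """
    cmd = [
        "kraken",
        "--db", database_dir,      # folder with the Kraken database files
        "--fastq-input",
        "--output", output_file,
    ]
    if min_hits is not None:
        # Quick operation: stop after this many hits for a read.
        cmd += ["--quick", "--min-hits", str(min_hits)]
    cmd.append(reads_fastq)
    return cmd


if __name__ == "__main__":
    print(" ".join(kraken_command("kraken_db", "reads.fastq", "out.txt", 1)))
```

Returning a list instead of a single string lets the result be passed directly to `subprocess.run` without shell quoting problems.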
CLARK settings: Default CLARK parameters can be changed here.
<center> <br> <img src="/wiki/download/attachments/22061358/Serial NGS Reads Classification_5.jpg"/> <br> </center>
The following parameters are available:
Database | A folder that should be used to store the database files. |
K-mer length | This value is critical for classification accuracy and speed. For high sensitivity, it is recommended to set this value to 20 or 21 (along with the "Full" mode). However, if precision and speed are the main concern, use any value between 26 and 32. Note that the higher the value, the higher the RAM usage. As a good tradeoff between speed, precision, and RAM usage, it is recommended to set this value to 31 (along with the "Default" or "Express" mode). |
Minimum k-mer frequency | Minimum k-mer frequency/occurrence for the discriminative k-mers (-t). |
Mode | Set the mode of the execution (-m):
Sampling factor value | |
Gap | "Gap", or the number of non-overlapping k-mers to pass when creating the database (-g). |
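To make the k-mer length and minimum k-mer frequency parameters more concrete, here is a minimal Python sketch that counts the overlapping k-mers in a set of sequences and keeps only those seen at least a given number of times. It illustrates the general idea only. It is not CLARK's algorithm, which additionally selects k-mers that discriminate between targets, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def count_kmers(sequences: Iterable[str], k: int) -> Counter:
    """Count every overlapping substring of length k in the sequences."""
    if k <= 0:
        raise ValueError("k must be positive")
    counts: Counter = Counter()
    for seq in sequences:
        # A sequence of length n contains n - k + 1 overlapping k-mers.
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def filter_by_min_frequency(counts: Counter, min_frequency: int) -> Dict[str, int]:
    """Keep only k-mers that occur at least min_frequency times."""
    return {kmer: n for kmer, n in counts.items() if n >= min_frequency}


if __name__ == "__main__":
    counts = count_kmers(["ACGTACGT"], k=4)
    print(dict(counts))
    print(filter_by_min_frequency(counts, 2))
```

Counting at a larger k yields fewer k-mers per sequence, each of them more specific to its source.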
Output Files Page: On this page, you can select an output directory:
<center> <br> <img src="/wiki/download/attachments/22061358/Serial NGS Reads Classification_6.jpg"/> <br> </center>