TopHat is a program for mapping RNA-Seq reads to a long reference sequence. It uses Bowtie or Bowtie2 to map the reads and then analyzes the mapping results to identify splice junctions between exons.

Provide URL(s) to FASTA or FASTQ file(s) with NGS RNA-Seq reads to the input port of the element, and set the reference sequence in the parameters. The result is saved to the specified BAM file, and the URL to that file is passed to the output port. Several UCSC BED tracks are also produced: junctions, insertions, and deletions.
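
For orientation, several parameters of this element correspond to options of the standalone TopHat command line. The sketch below only assembles such a command as an argument list and does not run it. The index basename, read file names, and output folder are hypothetical placeholders, and the exact command that UGENE builds internally is not specified in this document.

```python
from typing import List, Optional


def build_tophat_command(
    index_basename: str,
    reads: str,
    paired_reads: Optional[str] = None,
    out_dir: str = "tophat_out",          # hypothetical output folder name
    mate_inner_distance: int = 50,        # default from the parameter list
    mate_std_dev: int = 20,               # default from the parameter list
    library_type: str = "fr-unstranded",  # default from the parameter list
    max_multihits: int = 20,              # default from the parameter list
    segment_length: int = 25,             # default from the parameter list
) -> List[str]:
    """Assemble a standalone TopHat invocation as an argument list."""
    cmd = [
        "tophat",
        "--output-dir", out_dir,
        "--mate-inner-dist", str(mate_inner_distance),
        "--mate-std-dev", str(mate_std_dev),
        "--library-type", library_type,
        "--max-multihits", str(max_multihits),
        "--segment-length", str(segment_length),
        index_basename,
        reads,
    ]
    # The second reads file is only given for paired-end data.
    if paired_reads is not None:
        cmd.append(paired_reads)
    return cmd


# Hypothetical index and read file names, for illustration only.
print(" ".join(build_tophat_command("genome_index", "reads_1.fastq", "reads_2.fastq")))
```

Building the command as a list rather than a single string keeps each argument separate, which avoids quoting problems if the list is later passed to a process launcher.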

Element type: tophat

Parameters

| Parameter | Description | Default value | Parameter in Workflow File | Type |
|---|---|---|---|---|
| Reference input type | Select "Sequence" to input a reference genome as a sequence file. Any sequence file format supported by UGENE is allowed (FASTA, GenBank, etc.), and the index is generated automatically in this case. Select "Index" to input already generated index files specific to the tool. | Index | reference-input-type | string |
| Bowtie index folder | The folder with the Bowtie index for the reference sequence. | | bowtie-index-dir | string |
| Bowtie index basename | The basename of the Bowtie index for the reference sequence. | | bowtie-index-basename | string |
| Output folder | The base name of the output folder. A suffix may be appended to it. | | out-dir | |
| Mate inner distance | The expected (mean) inner distance between mate pairs. | 50 | mate-inner-distance | numeric |
| Mate standard deviation | The standard deviation for the distribution of inner distances between mate pairs. | 20 | mate-standard-deviation | numeric |
| Library type | Specifies the RNA-Seq protocol. | fr-unstranded | library-type | numeric |
| No novel junctions | Only look for reads across junctions indicated in the supplied GFF or junctions file. This parameter is ignored unless Raw junctions or Known transcript file is set. | False | no-novel-junctions | boolean |
| Raw junctions | The list of raw junctions. | | raw-junctions | string |
| Known transcript file | A set of gene model annotations and/or known transcripts. | | known-transcript | string |
| Max multihits | Instructs TopHat to allow up to this many alignments to the reference for a given read, and suppresses all alignments for reads with more than this many alignments. | 20 | max-multihits | numeric |
| Segment length | Each read is cut up into segments, each at least this long. These segments are mapped independently. | 25 | segment-length | numeric |
| Fusion search | Turn on fusion mapping. | False | fusion-search | boolean |
| Transcriptome only | Only align the reads to the transcriptome and report only those mappings as genomic mappings. | False | transcriptome-only | boolean |
| Transcriptome max hits | Maximum number of mappings allowed for a read when aligned to the transcriptome (any reads found with more than this number of mappings will be discarded). | 60 | transcriptome-max-hits | numeric |
| Prefilter multihits | When mapping reads on the transcriptome, some repetitive or low-complexity reads that would be discarded in the context of the genome may appear to align to the transcript sequences and thus may end up reported as mapped to those genes only. This option directs TopHat to first align the reads to the whole genome in order to determine and exclude such multi-mapped reads (according to the value of the Max multihits option). | False | prefilter-multihits | boolean |
| Min anchor length | The anchor length. TopHat will report junctions spanned by reads with at least this many bases on each side of the junction. Note that individual spliced alignments may span a junction with fewer than this many bases on one side. However, every junction involved in spliced alignments is supported by at least one read with this many bases on each side. | 8 | min-anchor-length | numeric |
| Splice mismatches | The maximum number of mismatches that may appear in the anchor region of a spliced alignment. | 0 | splice-mismatches | numeric |
| Read mismatches | Final read alignments having more than this many mismatches are discarded. | 2 | read-mismatches | numeric |
| Segment mismatches | Read segments are mapped independently, allowing up to this many mismatches in each segment alignment. | 2 | segment-mismatches | numeric |
| Solexa 1.3 quals | As of the Illumina GA pipeline version 1.3, quality scores are encoded in Phred-scaled base-64. Use this option for FASTQ files from pipeline 1.3 or later. | False | solexa-1-3-quals | boolean |
| Bowtie version | Specifies which Bowtie version should be used. | Bowtie2 | bowtie-version | numeric |
| Bowtie -n mode | TopHat uses -v in Bowtie for initial read mapping (the default), but with this option, -n is used instead. Read segments are always mapped using the -v option. | Use -v mode | bowtie-n-mode | numeric |
| Bowtie tool path | The path to the Bowtie external tool. | default | bowtie-tool-path | string |
| SAMtools tool path | The path to the SAMtools tool. Note that the tool is available in the UGENE External Tool Package. | default | samtools-tool-path | string |
| TopHat tool path | The path to the TopHat external tool in UGENE. | default | path | string |
| Temporary folder | The directory for temporary files. | default | temp-dir | string |
| Samples map | The map which divides all input datasets into samples. Every sample has a unique name. | | | |
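
Two of the parameters above can be illustrated with a small worked example. The sketch assumes the usual definition of the mate inner distance (fragment length minus the lengths of both reads) and models Max multihits exactly as described in the parameter list: reads with more alignments than the limit lose all of their alignments. The function names, read names, and coordinates are hypothetical.

```python
from typing import Dict, List


def mate_inner_distance(fragment_length: int, read_length: int) -> int:
    """Inner distance between mates: fragment length minus both read lengths."""
    return fragment_length - 2 * read_length


def apply_max_multihits(
    alignments: Dict[str, List[str]], max_multihits: int
) -> Dict[str, List[str]]:
    """Keep reads with at most max_multihits alignments; drop the others entirely."""
    return {
        read: hits
        for read, hits in alignments.items()
        if len(hits) <= max_multihits
    }


# Hypothetical library: 300 bp fragments sequenced as 2 x 50 bp reads,
# so the value to enter for "Mate inner distance" would be 200.
print(mate_inner_distance(300, 50))

# Hypothetical alignments: read_b has three hits, above the limit of 2,
# so all of its alignments are suppressed.
toy_alignments = {
    "read_a": ["chr1:100"],
    "read_b": ["chr1:5", "chr2:9", "chr3:7"],
}
print(apply_max_multihits(toy_alignments, max_multihits=2))
```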




Input/Output Ports

The element has 1 input port:

Name in GUI: Input reads

Name in Workflow File: in-assembly

Slots:

| Slot in GUI | Slot in Workflow File | Type |
|---|---|---|
| Dataset name | dataset | string |
| Input reads | first.in | assembly |
| Input reads url | in-url | string |
| Input paired reads url | paired-url | string |
| Input paired reads | second.in | assembly |

And 1 output port:

Name in GUI: TopHat output

Name in Workflow File: out-assembly

Slots:

| Slot in GUI | Slot in Workflow File | Type |
|---|---|---|
| Accepted hits | accepted.hits | assembly |
| Accepted hits url | hits-url | string |
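
As a rough guide to what appears in the output folder, the sketch below lists the result files by the names standalone TopHat uses for them: the BAM file with accepted hits and the three BED tracks. It assumes that UGENE keeps these file names; the folder name is a hypothetical placeholder, and the actual folder may carry a suffix as noted for the Output folder parameter.

```python
from pathlib import PurePosixPath
from typing import List

# File names written by standalone TopHat (assumed unchanged under UGENE).
TOPHAT_OUTPUT_NAMES = [
    "accepted_hits.bam",
    "junctions.bed",
    "insertions.bed",
    "deletions.bed",
]


def expected_outputs(out_dir: str) -> List[str]:
    """Return the expected result file paths inside the given output folder."""
    base = PurePosixPath(out_dir)
    return [str(base / name) for name in TOPHAT_OUTPUT_NAMES]


# Hypothetical output folder name, for illustration only.
for path in expected_outputs("tophat_out"):
    print(path)
```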