Child pages
  • RNA-seq Analysis with Tuxedo Tools

...

  • The pipeline is currently available on Linux and Mac OS X only.
  • Before proceeding, please make sure Python 2.7 is installed on the target computer.
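The Python requirement above can be checked with a short script. This is a minimal sketch, not part of UGENE: the helper name `is_python_2_7` is hypothetical, and the script only reports on the interpreter that runs it, which is assumed (not confirmed by this page) to be the one the pipeline will use.

```python
import sys


def is_python_2_7(version_info=None):
    """Return True if the version tuple (default: the running interpreter) is Python 2.7."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) == (2, 7)


if __name__ == "__main__":
    # Print the version of the interpreter running this script.
    print("Python %d.%d.%d" % tuple(sys.version_info[:3]))
    if is_python_2_7():
        print("Python 2.7 found")
    else:
        print("Python 2.7 is required by the pipeline")
```

Run the script with the same `python` command that is available on the target computer.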

Select the Samples tab on the Workflow Designer Palette and double-click the "RNA-seq analysis with Tuxedo tools" sample. The following configuration wizard appears:

HTML
<center>
  <br>
  <img src="/wiki/download/attachments/3244623/RNA-seq analysis with Tuxedo tools_1.png"/>
  <br> 
</center>

Here, choose the analysis type and the short reads type, then click Setup. There are two short reads types: single-end and paired-end. Each of them supports three analysis types:

...

HTML
<center>
  <br>
  <img src="/wiki/download/attachments/3244623/RNA-seq analysis with Tuxedo tools_1.png"/>
  <br> 
</center>

For the Full Tuxedo Pipeline analysis type and paired-end reads, the following workflow appears:

HTML
<center>
  <br>
  <img src="/wiki/download/attachments/3244623/RNA-seq analysis with Tuxedo tools_2.png"/>
  <br> 
</center>

For the Single-sample Tuxedo Pipeline analysis type and single-end reads, the following workflow appears:

HTML
<center>
  <br>
  <img src="/wiki/download/attachments/3244623/RNA-seq analysis with Tuxedo tools_3.png"/>
  <br> 
</center>

For the Single-sample Tuxedo Pipeline analysis type and paired-end reads, the following workflow appears:

HTML
<center>
  <br>
  <img src="/wiki/download/attachments/3244623/RNA-seq analysis with Tuxedo tools_4.png"/>
  <br> 
</center>

For the No-new-transcripts Tuxedo Pipeline analysis type and single-end reads, the following workflow appears:

HTML
<center>
  <br>
  <img src="/wiki/download/attachments/3244623/RNA-seq analysis with Tuxedo tools_5.png"/>
  <br> 
</center>

For the No-new-transcripts Tuxedo Pipeline analysis type and paired-end reads, the following workflow appears:

HTML
<center>
  <br>
  <img src="/wiki/download/attachments/3244623/RNA-seq analysis with Tuxedo tools_6.png"/>
  <br> 
</center>

The workflow wizard guides you through the parameter setup. Click the Show wizard button on the Workflow Designer toolbar to open it:

HTML
<center>
  <br>
  <img src="/wiki/download/attachments/3244623/RNA-seq analysis with Tuxedo tools_7.png"/>
  <br> 
</center>

All of these workflows have similar wizards. For the Full Tuxedo Pipeline analysis type and paired-end reads, the following first wizard page appears:

...

Here, specify the RNA-seq short reads in FASTA or FASTQ format. You can add multiple datasets with different reads. Click the Next button. The next page appears:

 

HTML
<center>
  <br>
  <img src="/wiki/download/attachments/3244623/RNA-seq analysis with Tuxedo tools_9.png"/>
  <br> 
</center>

Here you need to divide the input datasets into samples for running Cuffdiff. There must be at least 2 samples. It is not necessary to have the same number of datasets (replicates) for each sample. Cuffdiff uses the sample names as labels in the various output files it produces. Click the Next button. The next page appears:

HTML
<center>
  <br>
  <img src="/wiki/download/attachments/3244623/RNA-seq analysis with Tuxedo tools_10.png"/>
  <br> 
</center>

Here you can configure TopHat settings. To show additional parameters, click the + button. The following parameters are available: 

...

HTML
<center>
  <br>
  <img src="/wiki/download/attachments/3244623/RNA-seq analysis with Tuxedo tools_11.png"/>
  <br> 
</center>

The following parameters are available:  

 

Reference annotation

Tells Cufflinks to use the supplied reference annotation to estimate isoform expression. Cufflinks will not assemble novel transcripts and the program will ignore alignments not structurally compatible with any reference transcript.

RABT annotation

Tells Cufflinks to use the supplied reference annotation to guide Reference Annotation Based Transcript (RABT) assembly. Reference transcripts will be tiled with faux-reads to provide additional information in assembly. Output will include all reference transcripts as well as any novel genes and isoforms that are assembled.

Library type

Specifies RNA-seq protocol.

Mask file

Ignore all reads that could have come from transcripts in this file. It is recommended to include in this file any annotated rRNA, mitochondrial transcripts, and other abundant transcripts you wish to ignore in your analysis. Due to the variable efficiency of mRNA enrichment methods and rRNA depletion kits, masking these transcripts often improves the overall robustness of transcript abundance estimates.

Multi-read correct

Tells Cufflinks to do an initial estimation procedure to more accurately weight reads mapping to multiple locations in the genome.

Min isoform fraction

After calculating isoform abundance for a gene, Cufflinks filters out transcripts that it believes are very low abundance, because isoforms expressed at extremely low levels often cannot reliably be assembled, and may even be artifacts of incompletely spliced precursors of processed transcripts. This parameter is also used to filter out introns that have far fewer spliced alignments supporting them.

Frag bias correct

Providing Cufflinks with a multifasta file via this option instructs it to run the bias detection and correction algorithm which can significantly improve accuracy of transcript abundance estimates.

Pre-mRNA fraction

Some RNA-Seq protocols produce a significant amount of reads that originate from incompletely spliced transcripts, and these reads can confound the assembly of fully spliced mRNAs. Cufflinks uses this parameter to filter out alignments that lie within the intronic intervals implied by the spliced alignments. The minimum depth of coverage in the intronic region covered by the alignment is divided by the number of spliced reads, and if the result is lower than this parameter value, the intronic alignments are ignored.

Cufflinks tool path

The path to the Cufflinks external tool in UGENE.

Temporary directory

The directory for temporary files.
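The two threshold parameters above, Min isoform fraction and Pre-mRNA fraction, can be illustrated with a short sketch. It is not Cufflinks code: the function names and the sample numbers are hypothetical, and measuring isoform abundance relative to the most abundant isoform of the gene is an assumption about how the fraction is applied. The pre-mRNA rule follows the wording of the description above.

```python
def filter_low_abundance(isoform_abundance, min_isoform_fraction):
    """Keep isoforms whose abundance is at least min_isoform_fraction of the
    most abundant isoform of the gene (assumed interpretation)."""
    if not isoform_abundance:
        return {}
    major = max(isoform_abundance.values())
    return dict(
        (name, abundance)
        for name, abundance in isoform_abundance.items()
        if abundance >= min_isoform_fraction * major
    )


def keep_intronic_alignments(min_intron_coverage, spliced_reads, pre_mrna_fraction):
    """Return False when the intronic alignments should be ignored.

    The minimum coverage depth inside the intron is divided by the number of
    spliced reads; a result below pre_mrna_fraction means "ignore".
    """
    if spliced_reads == 0:
        # Assumption: without spliced reads no intron is implied, so nothing is filtered.
        return True
    ratio = float(min_intron_coverage) / spliced_reads
    return ratio >= pre_mrna_fraction


# Hypothetical gene with three isoforms and a 10% threshold.
kept = filter_low_abundance({"iso1": 100.0, "iso2": 5.0, "iso3": 20.0}, 0.1)
print(sorted(kept))  # iso2 falls below 10% of iso1 and is dropped
```

With these numbers, an intron with a minimum coverage of 2 and 40 spliced reads gives a ratio of 0.05, so a threshold of 0.15 would ignore the intronic alignments.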

Configure parameters, if necessary, and click Next. On the next page you may configure Cuffmerge settings:

HTML
<center>
  <br>
  <img src="/wiki/download/attachments/3244623/RNA-seq analysis with Tuxedo tools_12.png"/>
  <br> 
</center>

The following parameters are available: 

 

Minimum isoform fraction

Discard isoforms with abundance below this.

Reference annotation

Merge the input assemblies together with this reference annotation.

Reference sequence

The genomic DNA sequences for the reference. It is used to assist in classifying transfrags and excluding artifacts (e.g. repeats). For example, transcripts consisting mostly of lower-case bases are classified as repeats.

Cuffcompare tool path

The path to the Cuffcompare external tool in UGENE.

Cuffmerge tool path

The path to the Cuffmerge external tool in UGENE.

Temporary directory

The directory for temporary files.

Configure parameters, if necessary, and click Next. On the next page you may configure Cuffdiff settings:

HTML
<center>
  <br>
  <img src="/wiki/download/attachments/3244623/RNA-seq analysis with Tuxedo tools_13.png"/>
  <br> 
</center>

The following parameters are available: 

 

Time series analysis

If set to True, instructs Cuffdiff to analyze the provided samples as a time series, rather than testing for differences between all pairs of samples. Samples should be provided in increasing time order.

Upper quartile norm

If set to True, normalizes by the upper quartile of the number of fragments mapping to individual loci instead of the total number of sequenced fragments. This can improve robustness of differential expression calls for less abundant genes and transcripts.

Hits norm

Instructs how to count all fragments. Total specifies to count all fragments, including those not compatible with any reference transcript, towards the number of mapped fragments used in the FPKM denominator. Compatible specifies to use only compatible fragments. Selecting Compatible is generally recommended in Cuffdiff to reduce certain types of bias caused by differential amounts of ribosomal reads which can create the impression of falsely differentially expressed genes.

Frag bias correct

Providing the sequences your reads were mapped to instructs Cuffdiff to run the bias detection and correction algorithm, which can significantly improve the accuracy of transcript abundance estimates.

Multi read correct

Do an initial estimation procedure to more accurately weight reads mapping to multiple locations in the genome.

Library type

Specifies RNA-Seq protocol.

Mask file

Ignore all reads that could have come from transcripts in this file. It is recommended to include in this file any annotated rRNA, mitochondrial transcripts, and other abundant transcripts you wish to ignore in your analysis. Due to the variable efficiency of mRNA enrichment methods and rRNA depletion kits, masking these transcripts often improves the overall robustness of transcript abundance estimates.

Min alignment count

The minimum number of alignments in a locus needed to conduct significance testing on changes in that locus observed between samples. If no testing is performed, changes in the locus are deemed not significant, and the locus’ observed changes don’t contribute to correction for multiple testing.

FDR

Allowed false discovery rate used in testing.

Max MLE iterations

Sets the number of iterations allowed during maximum likelihood estimation of abundances.

Emit count tables

Include information about the fragment counts, fragment count variances, and fitted variance model in the report.

Cuffdiff tool path

The path to the Cuffdiff external tool in UGENE.

Temporary directory

The directory for temporary files.
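How Time series analysis, Min alignment count, and FDR interact can be illustrated with a short sketch. It is not Cuffdiff code and all names are hypothetical. Two points are assumptions rather than statements from this page: that a time series is compared between consecutive samples only, and that the false discovery rate is controlled with the Benjamini-Hochberg procedure.

```python
from itertools import combinations


def comparisons(samples, time_series=False):
    """List the sample pairs to test: consecutive pairs for a time series
    (assumed), otherwise all pairs."""
    if time_series:
        return list(zip(samples, samples[1:]))
    return list(combinations(samples, 2))


def benjamini_hochberg(p_values, fdr):
    """Return one boolean per p-value: True if significant at the given FDR."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, 1):
        # Remember the largest rank whose p-value passes its threshold.
        if p_values[i] <= fdr * rank / m:
            cutoff_rank = rank
    significant = [False] * m
    for rank, i in enumerate(order, 1):
        if rank <= cutoff_rank:
            significant[i] = True
    return significant


def call_significant(loci, min_alignment_count, fdr):
    """loci maps a locus name to (alignment_count, p_value).

    Loci below min_alignment_count are not tested: they are reported as not
    significant and do not take part in the multiple-testing correction.
    """
    tested = [name for name in loci if loci[name][0] >= min_alignment_count]
    flags = benjamini_hochberg([loci[name][1] for name in tested], fdr)
    result = dict((name, False) for name in loci)
    result.update(zip(tested, flags))
    return result


print(comparisons(["t0", "t1", "t2"], time_series=True))
print(comparisons(["t0", "t1", "t2"]))
```

In this sketch, three samples give two comparisons in time series mode and three comparisons otherwise, so fewer tests are performed for a time series.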

Configure parameters, if necessary, and click Next. The last page of the wizard appears:

HTML
<center>
  <br>
  <img src="/wiki/download/attachments/3244623/RNA-seq analysis with Tuxedo tools_14.png"/>
  <br> 
</center>

Choose output directories for each tool and click Finish.

Note that the default button reverts all parameters to their default settings.

Now let’s validate and run the workflow. To validate that the workflow is correct and all parameters are set properly, click the Validate workflow button on the Workflow Designer toolbar:

 

HTML
<center>
  <br>
  <img src="/wiki/download/attachments/3244623/RNA-seq analysis with Tuxedo tools_15.png"/>
  <br> 
</center>

If there are some errors, they will be shown in the Error list at the bottom of the Workflow Designer window, for example:

HTML
<center>
  <br>
  <img src="/wiki/download/attachments/3244623/RNA-seq analysis with Tuxedo tools_16.png"/>
  <br> 
</center>

However, if you have set all the required parameters, there should be no errors. After that, you can estimate the workflow. To run the estimation, click the Estimate workflow button:

HTML
<center>
  <br>
  <img src="/wiki/download/attachments/3244623/RNA-seq analysis with Tuxedo tools_17.png"/>
  <br> 
</center>

To run a valid workflow, click the Run workflow button on the Workflow Designer toolbar:

HTML
<center>
  <br>
  <img src="/wiki/download/attachments/3244623/RNA-seq analysis with Tuxedo tools_18.png"/>
  <br> 
</center>

As soon as the task is finished, a notification and a dashboard will appear. The dashboard contains information about the workflow: the input and output files and all information about the task.

Info

The work on the Tuxedo pipeline was supported by grant RUB1-31097-NO-12 from NIAID.