Download and install the UGENE NGS package to use this pipeline.
The ChIP-seq pipeline "Cistrome" integrated into UGENE performs the following analysis steps: peak calling and annotation, motif search, and gene ontology analysis. ChIP-seq analysis starts with the MACS tool. CEAS then takes the peak regions and the signal wiggle file and checks which chromosomes are enriched with binding/modification sites, whether binding events are significant at gene features such as promoters, gene bodies, exons, introns, or UTRs, and how the signal aggregates at gene transcription start/end sites or over meta-gene bodies (averaged over all genes). Then peaks are investigated in these ways:
Note that the pipeline is based on the General ChIP-seq pipeline from the public Cistrome installation on the Galaxy workflow platform.
If you haven't used the workflow samples in UGENE before, look at the "How to Use Sample Workflows" section of the documentation (/wiki/display/WDD28/How+to+Use+Sample+Workflows).
The workflow sample "ChIP-seq Analysis with Cistrome Tools" can be found in the "NGS" section of the Workflow Designer samples.
For the "treatment tags only" analysis type, the workflow looks as follows:
For the "treatment and control tags" analysis type, the workflow looks as follows:
The wizards are the same for both types of workflows.
Input data: Here you need to provide a file with treatment and control annotations for MACS.
MACS: Here you can change the default MACS parameters.
The following parameters are available:
Genome size (Mbp) | Homo sapiens: 2700 Mbp; Mus musculus: 1870 Mbp; Caenorhabditis elegans: 90 Mbp; Drosophila melanogaster: 120 Mbp. This is the mappable (effective) genome size, defined as the part of the genome that can be sequenced. Because of repetitive features on the chromosomes, the actual mappable genome size is smaller than the original size, about 90% or 70% of the genome size. |
P-value | P-value cutoff. The default is 0.00001; for looser results, try 0.001 instead. |
Tag size (optional) | Length of reads. Determined from the first 10 reads if not specified (input 0). |
Keep duplicates | Controls the MACS behavior towards duplicate tags at the exact same location (the same coordinate and the same strand). The default "auto" option makes MACS calculate the maximum number of tags at the exact same location based on a binomial distribution, using 1e-5 as the p-value cutoff; the "all" option keeps every tag. If an integer is given, at most this number of tags will be kept at the same location. |
Use model | Whether or not to use the MACS paired-peaks model. |
Model fold | Select the regions within the MFOLD range of high-confidence enrichment ratio against background to build the model. Model fold is available when Use model is true; it is the fold-change range used to choose paired peaks for building the paired-peaks model. Set a lower (smaller) and an upper (larger) fold-change value so that MACS will only use the peaks within this fold-change range to build the model. |
Wiggle output | If this flag is on, MACS will store the fragment pileup in wiggle format for the whole-genome data instead of for each chromosome separately. |
Wiggle space | By default, the resolution for saving wiggle files is 10 bps, i.e., MACS will save the raw tag count every 10 bps. You can change it together with the Wiggle output parameter. |
Shift size | An arbitrary shift value used as half of the fragment size when the model is not built. Shift size is available when Use model is false, and it represents HALF of the fragment size of your sample. If your sonication and size-selection size is 300 bps, after you trim out nearly 100 bps of adapters, the fragment size is about 200 bps, so you can specify 100 here. |
Band width | The band width used to scan the genome for model building. You can set this parameter to the sonication fragment size expected from the wet experiment. Used only while building the shifting model. |
Use lambda | Whether to use the local lambda model, which uses the local bias at peak regions to throw out false positives. |
Small nearby region | The small nearby region, in base pairs, used to calculate the dynamic lambda. It captures the bias near the peak summit region. Invalid if there is no control data. |
Auto bimodal | Whether to turn on the auto paired-model process. If set, when MACS fails to build the paired model, it will use the nomodel settings, i.e., the "Shift size" parameter, to shift and extend each tag. |
Scale to large | When set, scale the smaller sample up to the bigger sample. By default, the bigger dataset will be scaled down towards the smaller dataset, which will lead to smaller p/q-values and more specific results. Keep in mind that scaling down will bring down background noise more. |
CEAS: The next page allows you to configure CEAS parameters.
The following parameters are available:
Gene annotations table | Path to the gene annotation table (e.g. a refGene table in sqlite3 db format). |
Span size | Span from TSS and TTS in the gene-centered annotation (base pairs). ChIP regions within this range from TSS and TTS are considered when calculating the coverage rates in promoter and downstream regions. |
Wiggle profiling resolution | Wiggle profiling resolution. WARNING: a value smaller than the wig interval (resolution) may cause aliasing errors. |
Promoter/downstream interval | Promoter/downstream intervals for ChIP region annotation. Either three values or a single value can be given. If a single value is given, it will be segmented into three equal fractions (e.g. 3000 is equivalent to 1000,2000,3000). |
BiPromoter ranges | Bidirectional-promoter sizes for ChIP region annotation. Either two values or a single value can be given. If a single value is given, it will be segmented into two equal fractions (e.g. 5000 is equivalent to 2500,5000). |
Relative distance | Relative distance to TSS/TTS in WIGGLE file profiling. |
Gene group files | Gene groups of particular interest in wig profiling. Each gene group file must have gene names in the first column. The file names are separated by commas. |
Gene group names | The names of the gene groups from the "Gene group files" parameter. These names appear in the legends of the wig profiling plots. Leave this parameter empty to use the default values. Value range: comma-separated list of strings. Default value: 'Group 1, Group 2, ..., Group n'. |
Peak2Gene and Gene Ontology: The next page allows you to configure Peak2Gene and Gene Ontology parameters.
The following parameters are available:
Output type | The directory to store Conduct GO results. |
Official gene symbols | Output the official gene symbol instead of the RefSeq name. |
Distance | Distance in bases: the refGenes within this many bases of the peak center are reported. |
Genome file | Select a genome file (sqlite3 file) in which to search for refGenes. |
Title | The title is used to name the output files, so make it meaningful. |
Gene Universe | Select a gene universe. |
Conservation plot: On this page you can modify Conservation Plot parameters.
The following parameters are available:
Title | Title of the figure. |
Label | Label of the data in the figure. |
Assembly version | The directory to store phastCons scores. |
Window width | Width of the window centered at the middle of the regions. |
Height | Height of the plot. |
Width | Width of the plot. |
SeqPos motif tool: On this page you can modify SeqPos motif parameters.
The following parameters are available:
Genome assembly version | UCSC database version. |
De novo motifs | Run a de novo motif search. |
Motif database | Known motif collections. |
Region width | Width of the region to be scanned for motifs; depends on the resolution of the assay. |
P-value cutoff | P-value cutoff for motif significance. |
Output data: On this page you can modify output parameters.
The following parameters are available:
MACS output:
Output directory | Directory to save MACS output files. |
Name | Name string of the experiment. MACS will use this string NAME to create output files like 'NAME_peaks.xls', 'NAME_negative_peaks.xls', 'NAME_peaks.bed', 'NAME_summits.bed', 'NAME_model.r' and so on. So please avoid any conflict between these filenames and your existing files. |
CEAS output:
Output report file | Path to the report output file with the results of the CEAS analysis. |
Output annotations file | Name of the tab-delimited output text file containing a row of annotations for every RefSeq gene. Note that the file is not generated if there are no input peak regions. |
Conservation Plot output:
Output file | File to store phastCons results (BMP). |
SeqPos motif tool output:
Output directory | Directory to store SeqPos results. |
Output file name | Name of the output file that stores new motifs found during a de novo search. |
Peak2Gene output:
Gene annotations | Location of the Peak2Gene gene annotations data file. |
Peak annotations | Location of the Peak2Gene peak annotations data file. |
Conduct GO output:
Output directory | Directory to store Conduct GO results. |
The work on this pipeline was supported by grant RUB1-31097-NO-12 from NIAID.