TFAv1.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="http://deep.mpi-inf.mpg.de/DAC/files/style/deep_process_style.css"?>
<process>
    <name>TFA</name>
	<version>1</version>
	<author>
		<name>Florian Schmidt</name>
		<email>fschmidt@mmci.uni-saarland.de</email>
	</author>

    <description>
        The TFA (Transcription Factor Annotation) process computes scores describing the affinity of transcription factors (TFs) to given regions of the genome and to a predefined window around the transcription start site (TSS) of each gene in a set of genes.
		We use TRAP (Transcription Factor Affinity Prediction, cf. http://bioinformatics.oxfordjournals.org/content/23/2/134.long) to obtain these scores. The process takes a region file as input (e.g. DNase peaks),
		a reference genome, a gene annotation file, a set of position weight matrices (PWMs, e.g. from the JASPAR database), and a set of precomputed parameters fitted to the PWMs, which are required for the pValue computation in TRAP.
		In addition, the user has to specify the SampleID and the number of cores that should be used for running TRAP in parallel.
		To run the process, use the script TFA.sh. The command line is:
		sh TFA.sh {genome_reference} {region_file} {SampleID} {Number_of_cores} {pwms} {TRAP_pValue_parameters} {genome_annotation} {TFRank}
	</description>

	<inputs>
		<filetype>
			<identifier>region_file</identifier>
			<format>bed</format>
			<quantity>single</quantity>
			<comment>TF prediction will be carried out in all regions specified in this file.</comment>
		</filetype>
	</inputs>
	<references>
		<filetype>
			<identifier>genome_reference</identifier>
			<format>fa</format>
			<quantity>single</quantity>
			<comment>The reference genome of the analysed organism.</comment>
		</filetype>
		<filetype>
			<identifier>genome_annotation</identifier>
			<format>gtf</format>
			<quantity>single</quantity>
			<comment>A genome annotation file used to extract the position of TSS.</comment>
		</filetype>
		<filetype>
			<identifier>pwms</identifier>
			<format>txt</format>
			<quantity>single</quantity>
			<comment>A file containing PWMs, e.g. the vertebrate PWMs of the JASPAR database.</comment>
		</filetype>
		<filetype>
			<identifier>TRAP_pValue_parameters</identifier>
			<format>txt</format>
			<quantity>single</quantity>
			<comment>A file containing precomputed parameters needed for the pValue computation within TRAP. These parameters have to be specified for each PWM contained in the pwms file.</comment>
		</filetype>
	</references>
	<outputs>
		<filetype>
			<identifier>affinity_file_peak_view</identifier>
			<format>txt</format>
			<quantity>single</quantity>
			<comment>This file contains the affinity scores computed by TRAP for all regions in the region_file.</comment>
		</filetype>
		<filetype>
			<identifier>pAffinity_file_peak_view</identifier>
			<format>txt</format>
			<quantity>single</quantity>
			<comment>This file contains the -log() transformed pValues computed by TRAP for all regions in the region_file. The pseudocount (to avoid -log(0)) is set to 0.001.</comment>
		</filetype>
		<filetype>
			<identifier>affinity_file_gene_view</identifier>
			<format>txt</format>
			<quantity>single</quantity>
			<comment>Using the genome_annotation, an additive affinity score is computed for each gene by overlapping all regions of the region_file with a window of size 3000bp centered at the TSS of the gene.</comment>
		</filetype>
		<filetype>
			<identifier>pAffinity_file_gene_view</identifier>
			<format>txt</format>
			<quantity>single</quantity>
			<comment>Using the genome_annotation, an additive score based on -log transformed pValues is computed for each gene by overlapping all regions of the region_file with a window of size 3000bp centered at the TSS of the gene.</comment>
		</filetype>
		<filetype>
			<identifier>affinity_file_rank</identifier>
			<format>txt</format>
			<quantity>single</quantity>
			<comment>This file contains, for each gene in the annotation file, the top x TFs ranked by TRAP affinity score. The number of TFs x is specified by the user through the variable TFRank.</comment>
		</filetype>
		<filetype>
			<identifier>pAffinity_file_rank</identifier>
			<format>txt</format>
			<quantity>single</quantity>
			<comment>This file contains, for each gene in the annotation file, the top x TFs ranked by -log transformed TRAP pValue. The number of TFs x is specified by the user through the variable TFRank.</comment>
		</filetype>
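		<!--
		The per-gene ranking described for the rank files can be sketched as follows.
		This is a minimal illustration, not the code of annotateTSS.py: the function
		name top_tfs and the dictionary input are hypothetical, and ties are broken
		by insertion order here, which the original script may handle differently.

```python
import heapq


def top_tfs(scores, tf_rank):
    """Return the names of the tf_rank highest-scoring TFs, best first.

    scores: dict mapping TF name to its score for one gene (hypothetical layout).
    tf_rank: number of TFs to keep, i.e. the x set through TFRank.
    """
    return heapq.nlargest(tf_rank, scores, key=scores.get)
```

		The same helper would apply to both rank files, once with affinity scores
		and once with -log transformed pValues as the score.
		-->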

	</outputs>
	<software>
		<tool>
			<name>bedtools getfasta</name>
			<version>2.23.0</version>
			<command_line><![CDATA[bedtools getfasta -fi {genome_reference} -bed {region_file} -fo temp_file]]></command_line>
            <loop></loop>
			<comment>Creates a fasta file containing the sequences for all regions listed in the region_file.</comment>
		</tool>
		<tool>
			<name>convertRYtoN.py</name>
			<version>1.0</version>
			<command_line><![CDATA[python convertInvalidCharacterstoN.py temp_file temp_file2]]></command_line>
            <loop></loop>
			<comment>As TRAP accepts only A, C, G, T and N, all characters other than A, C, G, or T are replaced by N.</comment>
		</tool>
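		<!--
		The character replacement performed in this step can be sketched as follows.
		This is an illustration under assumptions, not the actual script: the helper
		name sanitize_fasta_lines is hypothetical, and converting sequence lines to
		upper case before the replacement is an assumption that the original script
		may not share.

```python
import re

# Anything that is not A, C, G, T or N counts as invalid (after upper-casing).
_INVALID = re.compile(r"[^ACGTN]")


def sanitize_fasta_lines(lines):
    """Yield FASTA lines with invalid bases replaced by N.

    Header lines (starting with '>') are passed through unchanged.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield line
        else:
            yield _INVALID.sub("N", line.upper())
```
		-->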
		<tool>
			<name>TRAP.R3script, R3</name>
			<version>1.0, 3.1.2</version>
			<command_line><![CDATA[R3script TRAP.R3script temp_file2 {pAffinity_file_peak_view} {affinity_file_peak_view} {Number_of_cores} {pwms} {TRAP_pValue_parameters}]]></command_line>
            <loop></loop>
			<comment>This script uses the package TRAP to compute transcription factor binding affinity. It requires the libraries tRap, parallel and Biostrings. FORK is used in the parallelisation. A manual on how to install tRap can be found here: http://trap.molgen.mpg.de/download/TRAP_R_package/tRap-tutorial.html.</comment>
		</tool>
		<tool>
			<name>annotateTSS.py</name>
			<version>1.0</version>
			<command_line><![CDATA[python annotateTSS.py {genome_annotation} {affinity_file_peak_view} {pAffinity_file_peak_view} {affinity_file_gene_view} {affinity_file_rank} {pAffinity_file_gene_view} {pAffinity_file_rank} {windowSize} {TFRank}]]></command_line>
            <loop></loop>
			<comment>This Python script computes gene-specific TF scores using both files obtained from TRAP.R3script (affinity_file_peak_view and pAffinity_file_peak_view) and the genome_annotation.
					 There are two types of output: the gene_view files and the optional rank files. The gene_view files provide affinity scores and pValues for each TF and each gene in the genome_annotation.
					 The rank files contain the top TFRank transcription factors, individually for each gene. One file is ranked by affinity and the other by pValue. If TFRank is set to none, the rank files are not generated.
					 The window size determines the size of the window around each TSS.</comment>
		</tool>
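		<!--
		The additive gene score can be sketched as a sum over all regions that
		overlap the window centered at the TSS. This is a simplified illustration,
		not the code of annotateTSS.py: the function name gene_score, the tuple
		layout of peaks, and the use of half-open BED-style coordinates are
		assumptions, and the sketch ignores chromosome and strand.

```python
def gene_score(tss, peaks, window_size=3000):
    """Sum the scores of all peaks overlapping a window centered at tss.

    peaks: iterable of (start, end, score) with half-open coordinates.
    window_size: full width of the window, i.e. the windowSize parameter.
    """
    half = window_size // 2
    win_start, win_end = tss - half, tss + half
    # Two half-open intervals overlap if each starts before the other ends.
    return sum(
        score
        for start, end, score in peaks
        if start < win_end and end > win_start
    )
```

		With tss = 10000 and the default window, a peak spanning 8000 to 8600
		contributes to the score, while a peak starting at 11500 does not.
		-->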
	</software>
</process>