
How to set up the TOuCAN configuration file

renewiegandt edited this page Jul 25, 2018 · 23 revisions

General information about Nextflow configuration files

Before you start writing the TOuCAN configuration file, it is recommended to read the Nextflow documentation on configuration files.

Environment variables

  • path_bowtie2

    • Path to directory where bowtie2 is installed. For example: /mnt/software/x86_64/packages/bowtie2/2.3.3.1/. Do not add the executable to the path.
  • path_python

    • Path to directory where python is installed. For example: /mnt/software/x86_64/packages/python/2.7.8/bin.
      Do not add the executable to the path.
  • path_bwa

    • Path to directory where bwa is installed. For example: /mnt/software/x86_64/packages/bwa/0.7.12/bin.
      Do not add the executable to the path.
  • path_samtools

    • Path to directory where SAMtools is installed. For example: /mnt/software/x86_64/packages/samtools/1.3.1/bin.
      Do not add the executable to the path.
  • path_bedtools

    • Path to directory where BEDtools is installed. For example: /mnt/software/x86_64/packages/bedtools/2.27.1/bin.
      Do not add the executable to the path.
  • path_R

    • Path to directory where R is installed. For example: /usr/R/bin.
      Do not add the executable to the path.
  • path_bin

    • Path to the bin directory of TOuCAN. For example: ./TOuCAN/bin.
  • path_genome

    • Path to the full genome in FASTA format together with its bwa index. The name of the FASTA file has to be the basename of the bwa index files. For example:
      ./index_bwa/
          ./GRCm38.p5.genome_whitelist.fa
          ./GRCm38.p5.genome_whitelist.fa.amb
          ./GRCm38.p5.genome_whitelist.fa.ann
          ./GRCm38.p5.genome_whitelist.fa.bwt
          ./GRCm38.p5.genome_whitelist.fa.pac
          ./GRCm38.p5.genome_whitelist.fa.rbwt
          ./GRCm38.p5.genome_whitelist.fa.rpac
          ./GRCm38.p5.genome_whitelist.fa.rsa
          ./GRCm38.p5.genome_whitelist.fa.sa
  • path_gtf

    • Path to the GENCODE GTF file.
  • path_T2C_restriction_maps

    • Path to the restriction maps. If the string is empty, TOuCAN will generate the restriction maps.
  • uropa_feature, uropa_anchor, uropa_dist_1, uropa_dist_2, uropa_strand, uropa_direction, uropa_filter_attr, uropa_attr_value, uropa_show_attr

    • Parameters for the uropa configuration file. It is recommended to read the uropa documentation, especially the part about the uropa configuration file. If a uropa parameter contains a list, separate the items with commas.
      For example: uropa_show_attr = "gene_id,gene_type,gene_name"
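
As a rough sketch, the entries above might look as follows in the configuration file. This assumes they are declared inside Nextflow's env scope, which is inferred from the section title and not confirmed here. The tool paths are the examples given above; the GTF path is a hypothetical placeholder, and pointing path_genome at the FASTA file itself is also an assumption.

```groovy
// Sketch only: assumes these variables belong in the Nextflow 'env' scope.
env {
    // Directories only, without the executable name.
    path_bowtie2  = "/mnt/software/x86_64/packages/bowtie2/2.3.3.1/"
    path_python   = "/mnt/software/x86_64/packages/python/2.7.8/bin"
    path_bwa      = "/mnt/software/x86_64/packages/bwa/0.7.12/bin"
    path_samtools = "/mnt/software/x86_64/packages/samtools/1.3.1/bin"
    path_bedtools = "/mnt/software/x86_64/packages/bedtools/2.27.1/bin"
    path_R        = "/usr/R/bin"
    path_bin      = "./TOuCAN/bin"

    // Assumption: the value points at the FASTA file whose name is the
    // basename of the bwa index files in the same directory.
    path_genome   = "./index_bwa/GRCm38.p5.genome_whitelist.fa"

    // Hypothetical location of the GENCODE annotation.
    path_gtf      = "./annotation/gencode.annotation.gtf"

    // Empty string: TOuCAN generates the restriction maps itself.
    path_T2C_restriction_maps = ""

    // List-valued uropa parameters are written as one comma-separated string.
    uropa_show_attr = "gene_id,gene_type,gene_name"
}
```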

Params / Commandline parameter

  • sample_extension
    • Regular expression for the sample extension (e.g. "_R[12]_001" or "_R[12]"). The regex has to match both cases, the forward and the reverse fastq file, for example "_R1" and "_R2" or "_A" and "_B". This parameter is only required if the input files are in fastq format.
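
A minimal sketch of this setting, assuming it is declared in Nextflow's params scope as the section title suggests. The file names in the comment are hypothetical.

```groovy
// Sketch only: assumes the parameter is set in the Nextflow 'params' scope.
params {
    // One pattern covering both mates of a pair, e.g. the hypothetical
    // files sampleA_R1_001.fastq and sampleA_R2_001.fastq.
    sample_extension = "_R[12]_001"
}
```

Because Nextflow lets any params entry be overridden with a double-dash option on the command line, the same value could presumably also be passed as --sample_extension '_R[12]_001' when launching the pipeline.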

Enzyme Information

  • enzyme_a_name
    • Name of the first restriction enzyme. (e.g. HindIII)
  • enzyme_a_sequence
    • Sequence of first restriction enzyme cutting site. (e.g. AAGCTT)
  • enzyme_b_name
    • Name of the second restriction enzyme. (T2C only)
  • enzyme_b_sequence
    • Sequence of the second restriction enzyme cutting site. (T2C only)
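
For illustration, an enzyme block could look like the sketch below. Placing these entries in the params scope is an assumption, and DpnII (recognition site GATC) is chosen only as an example of a second enzyme, not as the pipeline's default.

```groovy
// Sketch only: scope placement is assumed, not confirmed by this page.
params {
    enzyme_a_name     = "HindIII"
    enzyme_a_sequence = "AAGCTT"

    // Second enzyme is needed for T2C only; DpnII is an illustrative choice.
    enzyme_b_name     = "DpnII"
    enzyme_b_sequence = "GATC"
}
```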

Minor fixed parameters for BWA and SAMtools

  • bwa_T2C_options
    • bwa options for T2C Analysis. The alignment is done with bwa aln. For further information follow this link.
  • sort_options
    • Parameters for SAMtools sort. If more than one parameter should be set, separate them with whitespace. For example: "--threads 4" or "--threads 4 --par1 val1". Usually you do not have to change these.
  • library_label = "capture"
  • platform_label = "ILLUMINA"
  • center_label = "ECB"
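
Put together, the values in this section might be written as in the following sketch. The params scope is an assumption, and the bwa aln thread option (-t 4) is only an example value; the three labels are the values listed above.

```groovy
// Sketch only: scope placement is assumed, not confirmed by this page.
params {
    // Options handed to 'bwa aln'; -t sets the number of threads.
    bwa_T2C_options = "-t 4"

    // Options handed to 'samtools sort', separated by whitespace.
    sort_options    = "--threads 4"

    library_label   = "capture"
    platform_label  = "ILLUMINA"
    center_label    = "ECB"
}
```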

Parameter for normalization and plotting T2C

  • plot_options_T2C
    • Parameter for plotting the interaction matrix. [Insert documentation here!]
  • norm_method
    • Select which normalisation is used for the T2C Matrix. Select one of: "FPM", "log", "fpm", "array" and "none".

Parameter for uropa

  • uropa_threads
    • Number of threads for the uropa run. Keep in mind that uropa is executed twice for each sample.

Parameter for HiC matrix

  • hicBuildMatrix_options
    • Parameters for hicBuildMatrix from the tool HiCExplorer. If more than one parameter should be set, separate them with whitespace. For example: --threads 4 --inputBufferSize 100000.
      For further information follow this link.
  • bwa_HiC_options
    • bwa options for HiC analysis. The alignment is done with bwa mem. Multiple parameters should be separated by whitespace. For further information follow this link.
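
A short sketch of the two HiC settings, again assuming they live in the params scope. The hicBuildMatrix options are the example given above; the bwa mem thread option (-t 4) is only an illustrative value.

```groovy
// Sketch only: scope placement is assumed, not confirmed by this page.
params {
    // Passed through to HiCExplorer; options are separated by whitespace.
    hicBuildMatrix_options = "--threads 4 --inputBufferSize 100000"

    // Options handed to 'bwa mem'; -t sets the number of threads.
    bwa_HiC_options        = "-t 4"
}
```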

Full Example

For a full example of a TOuCAN configuration file, follow this link.
