forked from loosolab/TOBIAS_snakemake
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Moved snakemake from TOBIAS main to separate repository; Added Wilson visualization rules
0 parents
commit bf02df0
Showing 111 changed files with 3,794 additions and 0 deletions.
@@ -0,0 +1,7 @@
*.pyc
*.c
.snakemake/
build/
dist/
*.egg
*.egg-info
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2017 MPI for Heart and Lung Research

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
@@ -0,0 +1,28 @@
TOBIAS Snakemake pipeline
=======================================

Introduction
------------

ATAC-seq (Assay for Transposase-Accessible Chromatin using high-throughput sequencing) is a sequencing assay for investigating genome-wide chromatin accessibility. The assay uses the Tn5 transposase to insert sequencing adapters into accessible chromatin, enabling mapping of regulatory regions across the genome. Additionally, the local distribution of Tn5 insertions carries information about transcription factor binding: insertions are visibly depleted around sites bound by proteins, producing patterns known as _footprints_.

**TOBIAS** is a collection of command-line bioinformatics tools for performing footprinting analysis on ATAC-seq data. Please see the [TOBIAS github repository](https://github.molgen.mpg.de/loosolab/TOBIAS/) for details about the individual tools.

Snakemake how-to:
-----------------

To use the snakemake pipeline, make sure the included conda environments are installed:
```bash
$ conda env create -f environments/tobias.yaml
$ conda env create -f environments/macs.yaml
```

You can use the example config (TOBIAS_example.config) as a starting point and adapt it to your own data by replacing the value of each key. Run the pipeline with:
```bash
$ conda activate TOBIAS_ENV
$ snakemake --configfile example_config.yaml --cores [number of cores]
```

Contact
------------
Mette Bentsen (mette.bentsen (at) mpi-bn.mpg.de)
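When adapting the config, the Snakefile in this commit expects a `data` section (one entry per condition, each listing BAM file(s)) and a `run_info` section with the keys `organism`, `fasta`, `blacklist`, `gtf`, `motifs` and `output`. A minimal pre-flight check could look like the sketch below. It assumes the config has already been loaded into a Python dict (for example with a YAML parser). The function `find_config_problems` and the example values are hypothetical and not part of TOBIAS.

```python
from typing import Any, Dict, List, Tuple

# Key paths that the Snakefile in this commit checks for.
REQUIRED: List[Tuple[str, ...]] = [
    ("data",),
    ("run_info",),
    ("run_info", "organism"),
    ("run_info", "fasta"),
    ("run_info", "blacklist"),
    ("run_info", "gtf"),
    ("run_info", "motifs"),
    ("run_info", "output"),
]


def find_config_problems(config: Dict[str, Any]) -> List[str]:
    """Hypothetical pre-flight check; an empty result means the layout looks complete."""
    problems: List[str] = []
    for key_path in REQUIRED:
        node: Any = config
        for key in key_path:
            if not isinstance(node, dict) or key not in node:
                problems.append("missing key: " + ":".join(key_path))
                break
            node = node[key]
        else:
            # The whole key path exists; flag it if the value was left empty.
            if node is None:
                problems.append("empty value: " + ":".join(key_path))

    # Each condition under "data" should list at least one BAM file.
    data = config.get("data")
    if isinstance(data, dict):
        if not data:
            problems.append("no conditions under data")
        for condition, bams in data.items():
            if not bams:
                problems.append("no bamfiles for data:" + str(condition))
    return problems


# Hypothetical example values, for illustration only.
example = {
    "data": {"condition1": ["condition1_rep1.bam"], "condition2": []},
    "run_info": {"organism": "human", "fasta": "genome.fa", "blacklist": None,
                 "gtf": "genes.gtf", "motifs": "motifs/", "output": "results"},
}
print(find_config_problems(example))
# ['empty value: run_info:blacklist', 'no bamfiles for data:condition2']
```

A check like this reports every problem in one pass, whereas the Snakefile stops at the first key it cannot find.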
@@ -0,0 +1,171 @@
"""
Upper level TOBIAS snake
"""

import os
import sys
import subprocess
import itertools

#Set config (remember which file was used so error messages can name it)
if workflow.overwrite_configfile is not None:
    configfile: str(workflow.overwrite_configfile)
    CONFIGFILE = str(workflow.overwrite_configfile)
else:
    configfile: 'TOBIAS.config'
    CONFIGFILE = 'TOBIAS.config'

#Snake modules used to setup run
include: "snakefiles/helper.snake"

#shell.prefix("")

#-------------------------------------------------------------------------------#
#------------------------- CHECK FORMAT OF CONFIG FILE -------------------------#
#-------------------------------------------------------------------------------#

required = [("data",),
            ("run_info",),
            ("run_info", "organism"),
            ("run_info", "fasta"),
            ("run_info", "blacklist"),
            ("run_info", "gtf"),
            ("run_info", "motifs"),
            ("run_info", "output"),
            ]

#Check if all keys are existing and contain information
for key_list in required:
    lookup_dict = config
    for key in key_list:
        try:
            lookup_dict = lookup_dict[key]
        except (KeyError, TypeError):
            print("ERROR: Could not find key(s) \"{0}\" in configfile {1}. Please check that your configfile has the right format for TOBIAS.".format(":".join(key_list), CONFIGFILE))
            sys.exit(1)
        if lookup_dict is None:
            print("ERROR: Missing input for key {0}".format(":".join(key_list)))

#Check if there is at least one condition with bamfiles
if config["data"]:
    for condition in config["data"]:
        if not config["data"][condition]:
            print("ERROR: Could not find any bamfiles in \"{0}\" in configfile {1}".format(":".join(("data", condition)), CONFIGFILE))
            sys.exit(1)
else:
    print("ERROR: Could not find any conditions (\"data:{{condition}}\") in configfile {0}".format(CONFIGFILE))
    sys.exit(1)


#-------------------------------------------------------------------------------#
#------------------------- WHICH FILES/INFO WERE INPUT? ------------------------#
#-------------------------------------------------------------------------------#

input_files = []

#Files related to experimental data (bam)
CONDITION_IDS = list(config["data"].keys())
for condition in CONDITION_IDS:
    if not isinstance(config["data"][condition], list):
        config['data'][condition] = [config['data'][condition]]
    input_files.extend(config['data'][condition])


#Flat files independent from experimental data (run_info)
FASTA = config['run_info']['fasta']
BLACKLIST = config['run_info']['blacklist']
GTF = config['run_info']['gtf']
OUTPUTDIR = config['run_info']["output"]
MOTIFDIR = config['run_info']['motifs']

input_files.extend([FASTA, BLACKLIST, GTF])


#---------- Test that input files exist -----------#
for file in input_files:
    if file is not None:
        full_path = os.path.abspath(file)
        if not os.path.exists(full_path):
            sys.exit("ERROR: The following file given in config does not exist: {0}".format(full_path))


#--------------------------------- MOTIFS --------------------------------------#
#Identify IDs of motifs
files = os.listdir(MOTIFDIR)
MOTIF_FILES = {}
for file in files:
    full_file = os.path.join(MOTIFDIR, file)
    with open(full_file) as f:
        for line in f:
            if line.startswith("MOTIF"):
                columns = line.rstrip().split()
                ID = columns[2] + "_" + columns[1]
            elif line.startswith(">"):
                columns = line.replace(">", "").rstrip().split()
                ID = columns[1] + "_" + columns[0]
            else:
                continue    # only header lines define a motif ID
            ID = filafy(ID)
            MOTIF_FILES[ID] = full_file

TF_IDS = list(MOTIF_FILES.keys())


#-------------------------------------------------------------------------------#
#------------------------ WHICH FILES SHOULD BE CREATED? -----------------------#
#-------------------------------------------------------------------------------#

output_files = []


id2bam = {condition:{} for condition in CONDITION_IDS}
for condition in CONDITION_IDS:
    config_bams = config['data'][condition]
    sampleids = [os.path.splitext(os.path.basename(bam))[0] for bam in config_bams]
    id2bam[condition] = dict(zip(sampleids, config_bams))    # Link sample ids to bams

PLOTNAMES = expand("{condition}_{plotname}", condition=CONDITION_IDS, plotname=["aggregate"])
if len(CONDITION_IDS) > 1:
    PLOTNAMES.extend(["heatmap_comparison", "aggregate_comparison_all", "aggregate_comparison_bound"])

output_files.extend(expand(os.path.join(OUTPUTDIR, "footprinting", "{condition}_footprints.bw"), condition=CONDITION_IDS))

#output_files.append(os.path.join(OUTPUTDIR, "TFBS", "bindetect_results.txt"))
#output_files.append(os.path.join(OUTPUTDIR, "overview", "bindetect_results.txt"))

#Visualization
output_files.extend(expand(os.path.join(OUTPUTDIR, "TFBS", "{TF}", "plots", "{TF}_{plotname}.pdf"), TF=TF_IDS, plotname=PLOTNAMES))
output_files.extend(expand(os.path.join(OUTPUTDIR, "overview", "all_{plotname}.pdf"), plotname=PLOTNAMES))

#Wilson
output_files.extend(expand(os.path.join(OUTPUTDIR, "wilson", "data", "{TF}_overview.clarion"), TF=TF_IDS))
output_files.append(os.path.join(OUTPUTDIR, "wilson", "HOW_TO_WILSON.txt"))

#-------------------------------------------------------------------------------#
#------------------------ DEAL WITH SPECIAL ENVIRONMENTS -----------------------#
#-------------------------------------------------------------------------------#

sys_env = subprocess.check_output(['conda', 'env', 'list'], universal_newlines=True)
env_list = [line.split()[0] for line in sys_env.split("\n") if len(line.split()) > 0]

# default TOBIAS environment
if "TOBIAS_ENV" not in env_list:
    print("Creating TOBIAS environment for the first time")
    subprocess.call(["conda", "env", "create", "--file", "environments/tobias.yaml"])

# python 2 related envs
if "MACS_ENV" not in env_list:
    print("Creating macs environment for the first time")
    subprocess.call(["conda", "env", "create", "--file", "environments/macs.yaml"])


#-------------------------------------------------------------------------------#
#---------------------------------- RUN :-) ------------------------------------#
#-------------------------------------------------------------------------------#

include: "snakefiles/preprocessing.snake"
include: "snakefiles/footprinting.snake"
include: "snakefiles/visualization.snake"
include: "snakefiles/wilson.snake"

rule all:
    input:
        output_files
    message: "Rule all"
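The Snakefile above derives its targets in two steps. First it reads TF IDs from header lines in the motif directory: `MOTIF` lines (as in MEME-format files) yield `<name>_<id>` from the third and second columns, and `>` lines (as in JASPAR-style matrices) yield `<name>_<id>` from the second and first columns. Then it crosses those IDs with the plot names through Snakemake's `expand`, which by default forms every combination of the wildcard values, like `itertools.product`. The stdlib-only sketch below mimics both steps. `make_safe` is a hypothetical stand-in for `filafy` (presumably defined in the included `snakefiles/helper.snake`, whose behaviour is not shown in this commit), and `expand_product` is a hypothetical, simplified stand-in for `expand`.

```python
import itertools
import re
from typing import Iterable, List


def make_safe(name: str) -> str:
    """Hypothetical stand-in for filafy(): replace characters unsafe in filenames with '_'."""
    return re.sub(r"[^A-Za-z0-9_.\-]", "_", name)


def motif_ids(lines: Iterable[str]) -> List[str]:
    """Collect <name>_<id> labels from 'MOTIF' (MEME-style) and '>' (JASPAR-style) headers."""
    ids = []
    for line in lines:
        if line.startswith("MOTIF"):
            columns = line.rstrip().split()  # MOTIF <id> <name>
            ids.append(make_safe(columns[2] + "_" + columns[1]))
        elif line.startswith(">"):
            columns = line.replace(">", "").rstrip().split()  # ><id> <name>
            ids.append(make_safe(columns[1] + "_" + columns[0]))
    return ids


def expand_product(pattern: str, **wildcards: List[str]) -> List[str]:
    """Simplified stand-in for expand(): fill the pattern with every combination of values."""
    names = list(wildcards)
    return [pattern.format(**dict(zip(names, combo)))
            for combo in itertools.product(*(wildcards[n] for n in names))]


headers = ["MOTIF MA0004.1 Arnt",
           "letter-probability matrix: alength= 4 w= 6",
           ">MA0006.1 Ahr::Arnt"]
tf_ids = motif_ids(headers)
# ['Arnt_MA0004.1', 'Ahr__Arnt_MA0006.1']

conditions = ["ctrl", "treat"]
plotnames = expand_product("{condition}_{plotname}", condition=conditions, plotname=["aggregate"])
targets = expand_product("TFBS/{TF}/plots/{TF}_{plotname}.pdf", TF=tf_ids, plotname=plotnames)
print(len(targets))  # 4
print(targets[0])    # TFBS/Arnt_MA0004.1/plots/Arnt_MA0004.1_ctrl_aggregate.pdf
```

The sketch leaves out the comparison plot names (`heatmap_comparison` and the two `aggregate_comparison_*` entries) that the Snakefile appends when more than one condition is configured; adding them to `plotnames` multiplies the target list accordingly.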
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
@@ -0,0 +1,2 @@
chr4 49118760 49119010
chr4 49120790 49121130
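The two lines above are chr4 intervals in the three-column (chromosome, start, end) layout of BED files; their role in the pipeline is not stated in this commit. Assuming standard BED semantics (0-based start, end-exclusive), the sketch below parses such lines and reports each interval's length. `read_bed3` is a hypothetical helper, not part of TOBIAS.

```python
from typing import Iterable, List, Tuple


def read_bed3(lines: Iterable[str]) -> List[Tuple[str, int, int]]:
    """Hypothetical helper: parse chrom/start/end, skipping blank, comment and header lines."""
    intervals = []
    for line in lines:
        fields = line.split()  # BED is tab-separated; split() also accepts spaces
        if not fields or fields[0].startswith(("#", "track", "browser")):
            continue
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if end < start:
            raise ValueError(f"end before start: {line.strip()}")
        intervals.append((chrom, start, end))
    return intervals


regions = read_bed3(["chr4 49118760 49119010", "chr4 49120790 49121130"])
for chrom, start, end in regions:
    # 0-based, end-exclusive coordinates: the length is simply end - start.
    print(chrom, start, end, end - start)
# chr4 49118760 49119010 250
# chr4 49120790 49121130 340
print(sum(end - start for _, start, end in regions))  # 590
```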