# SV conflict workflow
## Filetree
The project is organized like other Snakemake workflows.
Root
- config
- resources
- results
- workflow
### Config
The [`config.yaml`](config/config.yaml) contains the global configuration for running the
pipeline. Currently, the following settings are available:
```yaml
read_cut_padding: # int, padding left and right of the conflict region
# path to a conflict file
region_file: # str
# path to a vcf file with known SVs, e.g. from the HGSVC
alternative_haplotype_vcf: # str
# path to liftover directory (explained below)
liftover_root: # str, defaults to 'resources/references/liftover'
clustering:
  eps: 0.6 # float, the higher the bigger the clusters can be
# Settings for the pairwise alignment
pw_alignment:
  match: # positive integer
  gap: # negative integer
  mismatch: # negative integer
```
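As a minimal sketch of how these settings could be sanity-checked before a run, the function below validates a config that has already been loaded into a Python dict. The function name `validate_config` and the error messages are hypothetical and not part of the workflow; the checks only mirror the type and sign comments in the YAML above.

```python
def validate_config(config: dict) -> list:
    """Return a list of problems found in an already-loaded config dict."""
    problems = []

    def is_int(value):
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)

    if not is_int(config.get("read_cut_padding")):
        problems.append("read_cut_padding must be an integer")

    for key in ("region_file", "alternative_haplotype_vcf"):
        if not isinstance(config.get(key), str):
            problems.append(f"{key} must be a path string")

    # liftover_root is optional; fall back to the documented default.
    root = config.get("liftover_root") or "resources/references/liftover"
    if not isinstance(root, str):
        problems.append("liftover_root must be a path string")

    eps = (config.get("clustering") or {}).get("eps")
    if isinstance(eps, bool) or not isinstance(eps, (int, float)):
        problems.append("clustering.eps must be a number")

    scores = config.get("pw_alignment") or {}
    if not (is_int(scores.get("match")) and scores["match"] > 0):
        problems.append("pw_alignment.match must be a positive integer")
    if not (is_int(scores.get("gap")) and scores["gap"] < 0):
        problems.append("pw_alignment.gap must be a negative integer")
    if not (is_int(scores.get("mismatch")) and scores["mismatch"] < 0):
        problems.append("pw_alignment.mismatch must be a negative integer")

    return problems
```

An empty list means that every documented setting has the expected type and sign.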
**Region files** should be text files in which each line is a single region
string (like `chr12:456-876`).
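The region format can be illustrated with a small parser. This is only a sketch: the names `Region`, `parse_region` and `parse_region_lines` are hypothetical, and skipping blank lines is an assumption rather than documented behavior of the workflow.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int
    end: int


# Matches strings such as "chr12:456-876".
_REGION_RE = re.compile(r"^(?P<chrom>[^:\s]+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_region(text: str) -> Region:
    """Parse one region string into its chromosome, start and end."""
    match = _REGION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a region string: {text!r}")
    region = Region(match["chrom"], int(match["start"]), int(match["end"]))
    if region.start > region.end:
        raise ValueError(f"start is greater than end: {text!r}")
    return region


def parse_region_lines(lines):
    """Parse an iterable of lines, skipping blank ones (an assumption)."""
    return [parse_region(line) for line in lines if line.strip()]
```

For a file on disk, `parse_region_lines(open(path))` would yield one `Region` per non-empty line.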
### Resources
Put all resources that Snakemake is going to use here.
#### References
In the `references` directory, put the FASTA files and their indexes for
each reference that you want to use. In addition, `references/liftover` should
include a file called `liftover_config.json`, which defines the liftover
path from one reference to another.
For example, assuming hg38 is the main reference used in the workflow and
t2tv1.1 is another reference, we need:
```
references/
    hg38.fa
    hg38.fa.fai
    t2t.fa
    t2t.fa.fai
    liftover/
        hg38_to_t2t_v1.0.chain
        t2t_v1.0_to_t2t_v1.1.chain
        liftover_config.json
```
where `liftover_config.json` looks like:
```json
{
    "t2t":
    [
        "hg38_to_t2t_v1.0.chain",
        "t2t_v1.0_to_t2t_v1.1.chain"
    ]
}
```
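To show how such a liftover path might be consumed, the sketch below turns the JSON content into an ordered list of chain file paths for one target reference. The function name `chain_paths` is hypothetical, and the assumption that chain files are applied in list order is inferred from the example above, not confirmed by the workflow's code.

```python
import json
from pathlib import Path


def chain_paths(config_text, target, liftover_root="resources/references/liftover"):
    """Return the chain files for `target`, in the order they are listed."""
    liftover_config = json.loads(config_text)
    if target not in liftover_config:
        raise KeyError(f"no liftover path defined for reference {target!r}")
    # Chain file names are assumed to be relative to the liftover directory.
    return [Path(liftover_root) / name for name in liftover_config[target]]
```

With the example configuration, `chain_paths(text, "t2t")` would return the hg38-to-T2T v1.0 chain followed by the v1.0-to-v1.1 chain.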
#### Samples
The `samples` subdirectory should contain a folder for each sample that
is going to be used in the workflow.
Each sample's folder in turn contains a folder for each reference, where
the mapped reads (`.bam` files) for the different technologies reside.
If not all samples should be checked at the same regions, a region file
can be put into each sample's folder. (TODO!!!)
E.g.:
```
samples/
    Sample1/
        hg38/
            PacBio.bam
            PacBio.bam.bai
            Illumina.bam
            Illumina.bam.bai
        t2t/
            PacBio.bam
            PacBio.bam.bai
            Illumina.bam
            Illumina.bam.bai
        regions.txt
```
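A layout like this can be enumerated with a single glob. The following is a minimal sketch, assuming the `samples/<sample>/<reference>/<technology>.bam` convention shown above; the helper name `find_bams` is hypothetical and not part of the workflow.

```python
from pathlib import Path


def find_bams(samples_root):
    """Map (sample, reference, technology) to the matching BAM path."""
    found = {}
    # "*.bam" does not match the "*.bam.bai" index files, which end in ".bai".
    for bam in sorted(Path(samples_root).glob("*/*/*.bam")):
        sample, reference = bam.parts[-3], bam.parts[-2]
        found[(sample, reference, bam.stem)] = bam
    return found
```

Files directly inside a sample's folder, such as `regions.txt`, are ignored because the pattern only matches files two levels below the samples root.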
### Results
The `results` directory will contain all results after the workflow has
run. The structure will look like the following:
```
results/
    Sample1/
        chr1_123_234/
            results.pdf
            ...
```
There will be more files (distance matrices, screenshots, FASTA files,
etc.) in each region's directory, but the most
important file is `results.pdf`, which collects all
gathered information.
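The report for a given sample and region can be located with a small helper. This sketch assumes that a region string such as `chr1:123-234` maps to the directory name `chr1_123_234`; that mapping is inferred from the example tree and is not stated by the workflow. The names `region_dir_name` and `report_path` are hypothetical.

```python
from pathlib import Path


def region_dir_name(region):
    """Turn "chr1:123-234" into "chr1_123_234" (assumed naming scheme)."""
    chrom, _, span = region.partition(":")
    start, _, end = span.partition("-")
    return f"{chrom}_{start}_{end}"


def report_path(sample, region, results_root="results"):
    """Return the expected location of results.pdf for one sample and region."""
    return Path(results_root) / sample / region_dir_name(region) / "results.pdf"
```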