# SV conflict workflow
## Filetree
The project follows the standard Snakemake workflow layout. The root directory contains:
- `config`
- `resources`
- `results`
- `workflow`
### Config
The [`config.yaml`](config/config.yaml) file holds the global settings for running the
pipeline. Currently, the following options are available:
```yaml
read_cut_padding: # int, padding left and right of the conflict region
# path to a region file listing the conflict regions (see below)
region_file: # str
# path to a VCF file with known SVs, e.g. from the HGSVC
alternative_haplotype_vcf: # str
# path to the liftover directory (explained below)
liftover_root: # str, defaults to 'resources/references/liftover'
clustering:
  eps: 0.6 # float, the higher the value, the larger the clusters can become
# Settings for the pairwise alignment
pw_alignment:
  match: # positive integer
  gap: # negative integer
  mismatch: # negative integer
```
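This README does not say which aligner consumes the `pw_alignment` settings. Purely as an illustration, the sketch below shows how `match`, `mismatch` and `gap` typically enter a global (Needleman-Wunsch) alignment score with a linear gap penalty, and why their signs matter. `nw_score` is a hypothetical helper, not part of the workflow.

```python
def nw_score(a: str, b: str, match: int, mismatch: int, gap: int) -> int:
    """Global (Needleman-Wunsch) alignment score with a linear gap penalty.

    Illustrative only: the workflow's actual aligner may differ.
    """
    # Same sign convention as config.yaml: reward matches, penalize the rest.
    if match <= 0 or mismatch >= 0 or gap >= 0:
        raise ValueError("expected match > 0, mismatch < 0 and gap < 0")
    # prev[j] holds the best score for aligning a[:i-1] with b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # best of: (mis)match, gap in b, gap in a
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]
```

For instance, `nw_score("ACGT", "AGT", match=1, mismatch=-1, gap=-1)` returns 2 (three matches and one gap).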
**Region files** are plain-text files with one region string per line
(e.g. `chr12:456-876`).
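A minimal sketch for reading a region file and widening each region by `read_cut_padding`. It assumes 1-based inclusive coordinates and that the padding is simply subtracted from the start (clamped at 1) and added to the end. `Region`, `parse_region`, `read_region_file` and `pad` are hypothetical names.

```python
import re
from typing import List, NamedTuple

# Accepts strings like "chr12:456-876" (thousands separators are not handled).
REGION_RE = re.compile(r"^(?P<chrom>[^:\s]+):(?P<start>\d+)-(?P<end>\d+)$")


class Region(NamedTuple):
    chrom: str
    start: int
    end: int


def parse_region(text: str) -> Region:
    m = REGION_RE.match(text.strip())
    if m is None:
        raise ValueError(f"not a region string: {text!r}")
    region = Region(m["chrom"], int(m["start"]), int(m["end"]))
    if region.start > region.end:
        raise ValueError(f"start > end in {text!r}")
    return region


def read_region_file(path: str) -> List[Region]:
    """One region per line; blank lines are skipped."""
    with open(path) as fh:
        return [parse_region(line) for line in fh if line.strip()]


def pad(region: Region, padding: int) -> Region:
    """Widen a region by `padding` on both sides, clamping the start at 1."""
    return Region(region.chrom, max(1, region.start - padding), region.end + padding)
```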
### Resources
This directory holds all resources the workflow uses.
#### References
Put the FASTA files and their indexes for each reference you want to use
into the `references` directory. In addition, `references/liftover` must
contain a file called `liftover_config.json`, which defines the liftover
path from the main reference to each other reference as an ordered list
of chain files.
For example, assuming hg38 is the main reference used in the workflow and
T2T v1.1 (called `t2t` below) is another reference, you need:
```
references/
  hg38.fa
  hg38.fa.fai
  t2t.fa
  t2t.fa.fai
  liftover/
    hg38_to_t2t_v1.0.chain
    t2t_v1.0_to_t2t_v1.1.chain
    liftover_config.json
```
where `liftover_config.json` looks like:
```json
{
  "t2t": [
    "hg38_to_t2t_v1.0.chain",
    "t2t_v1.0_to_t2t_v1.1.chain"
  ]
}
```
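The snippet below sketches how `liftover_config.json` could be read and its chain files resolved relative to `liftover_root`. It assumes, as the example suggests, that each list is ordered, so the chains are applied one after another starting from the main reference. `load_liftover_chains` is a hypothetical helper. Note that Python's `json` module is strict and rejects a trailing comma after the last list element.

```python
import json
from pathlib import Path
from typing import Dict, List


def load_liftover_chains(liftover_root: str) -> Dict[str, List[Path]]:
    """Map each target reference to its ordered list of chain files.

    Assumption: chains are applied in list order, starting from the main reference.
    """
    root = Path(liftover_root)
    with open(root / "liftover_config.json") as fh:
        config = json.load(fh)
    chains = {}
    for target, files in config.items():
        paths = [root / name for name in files]
        # Fail early instead of discovering a missing chain mid-run.
        missing = [str(p) for p in paths if not p.is_file()]
        if missing:
            raise FileNotFoundError(f"missing chain files for {target!r}: {missing}")
        chains[target] = paths
    return chains
```

Usage: `load_liftover_chains("resources/references/liftover")["t2t"]` would yield the two chain paths in the order listed.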
#### Samples
The `samples` subdirectory should contain one folder per sample used in
the workflow.
Each sample folder in turn contains one folder per reference, holding the
mapped reads (`.bam` files with their `.bai` indexes) for the different
sequencing technologies.
If not all samples should be checked at the same regions, a region file
can be placed in each sample's folder (TODO: not yet implemented).
For example:
```
samples/
  Sample1/
    hg38/
      PacBio.bam
      PacBio.bam.bai
      Illumina.bam
      Illumina.bam.bai
    t2t/
      PacBio.bam
      PacBio.bam.bai
      Illumina.bam
      Illumina.bam.bai
    regions.txt
```
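To illustrate the expected layout, the following sketch collects the BAM files per sample, reference and technology. It assumes the technology name is the BAM file name without `.bam` and, as an arbitrary choice, skips BAMs that lack a `.bai` index. `discover_bams` is hypothetical.

```python
from pathlib import Path
from typing import Dict


def discover_bams(samples_root: str) -> Dict[str, Dict[str, Dict[str, Path]]]:
    """Return {sample: {reference: {technology: bam_path}}}.

    Technology is taken from the file name (PacBio.bam -> "PacBio").
    BAMs without a matching .bai index are skipped (an assumption).
    """
    layout: Dict[str, Dict[str, Dict[str, Path]]] = {}
    for sample_dir in sorted(Path(samples_root).iterdir()):
        if not sample_dir.is_dir():
            continue
        for ref_dir in sorted(sample_dir.iterdir()):
            if not ref_dir.is_dir():
                continue  # e.g. a per-sample regions.txt
            for bam in sorted(ref_dir.glob("*.bam")):
                if not bam.with_name(bam.name + ".bai").is_file():
                    continue
                refs = layout.setdefault(sample_dir.name, {})
                refs.setdefault(ref_dir.name, {})[bam.stem] = bam
    return layout
```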
### Results
After the workflow has run, the `results` directory contains all outputs,
structured as follows:
```
results/
  Sample1/
    chr1_123_234/
      results.pdf
      ...
```
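The region directory names appear to be derived from the region strings by replacing `:` and `-` with `_` (inferred from the example above, not confirmed by the workflow code). A hypothetical helper for building the path of a region's `results.pdf` under that assumption:

```python
def region_dir_name(region: str) -> str:
    """Turn "chr1:123-234" into "chr1_123_234" (naming scheme is an assumption)."""
    chrom, _, span = region.partition(":")
    start, _, end = span.partition("-")
    if not (chrom and start.isdigit() and end.isdigit()):
        raise ValueError(f"not a region string: {region!r}")
    return f"{chrom}_{start}_{end}"


def result_pdf(sample: str, region: str, results_root: str = "results") -> str:
    """Path of the summary PDF for one sample and region (hypothetical helper)."""
    return f"{results_root}/{sample}/{region_dir_name(region)}/results.pdf"
```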
Each region's directory also contains further files (distance matrices,
screenshots, FASTA files, etc.), but the most important one is
`results.pdf`, which collects all gathered information.