diff --git a/Extension_3UTR/README.md b/Extension_3UTR/README.md
index 6ba2ce7..812348c 100644
--- a/Extension_3UTR/README.md
+++ b/Extension_3UTR/README.md
@@ -1,7 +1,7 @@
 # Analysis of 3'End sequencing and extension of gene annotation
 
 ## Library
-Library called [MACE](http://genxpro.net/sequencing/transcriptome/mace-massive-analysis-of-cdna-ends/) (Massive Analysis of cDNA Ends) is used to prepare RNA samples from whole turtle or lizzard brain. Reads were 68 nucleotides long and filtered for PCR duplicates by unique molecule identifiers (UMIs).
+Library called [MACE](http://genxpro.net/sequencing/transcriptome/mace-massive-analysis-of-cdna-ends/) (Massive Analysis of cDNA Ends) is used to prepare RNA samples from whole turtle or lizard brain. Reads were 68 nucleotides long and filtered for PCR duplicates by unique molecule identifiers (UMIs).
 
 ## Preprocessing
 
@@ -35,7 +35,7 @@ To evaluate potential internal priming events, a poly(A/T) mask of a 10 consecut
 bowtie /path_to_genome/bowtie_index/genome PolyTailMask.fa -f -v 2 --all --sam --threads 6 |samtools view -buS - | samtools sort - PolyTailMask
 ```
 
-The resulted Bam file si converted to Bed format and compressed with [BGzip](https://github.com/samtools/htslib).
+The resulted Bam file is converted to Bed format and compressed with [BGzip](https://github.com/samtools/htslib).
 
 ```
 bedtools bamtobed -i PolyTailMask.bam -split | bedtools merge -i - -s -c 4 -o count | sort -k1,1 -k2,2n | awk -F"\t" 'OFS="\t"{print $1,$2,$3,sprintf("poly%06d",NR),$5,$4}' | bgzip > species_PolyRegions.bed.gz
@@ -107,7 +107,7 @@ Library called [MACE](http://genxpro.net/sequencing/transcriptome/mace-massive-a
 ### Contribution of extended 3'UTRs
 ![figure_LengthContribution](./figures/figureX_relativeExtension.png)
 
-The plot represent a cummulative distribution of the relative distance from annotated 3'End. All upstream identified 3'UTRs will have negative distance and will represent an alternative polyadenylation site (~70% of all sites). All downstream identified 3'UTRs will have positive distance and will represent extended PASS (~30% of all sites). Both annotations overcome the same degree of extension. Range of [-5,+5] kB contributes to more than 95% of all sites.
+The plot represents a cummulative distribution of the relative distance from annotated 3'End. All upstream identified 3'UTRs will have negative distance and will represent an alternative polyadenylation site (~70% of all sites). All downstream identified 3'UTRs will have positive distance and will represent extended PASS (~30% of all sites). Both annotations overcome the same degree of extension. Range of [-5,+5] kB contributes to more than 95% of all sites.
 
 ![figure_FeaturesContribution](./figures/figureX_DistributionOfGeneFeatures.png)
 