diff --git a/README.md b/README.md index ad2fb05..e2ef7d6 100644 --- a/README.md +++ b/README.md @@ -131,22 +131,24 @@ featureCounts parameters: -Q 255 -T 6 -Command run: +Change first to directory containing the sorted BAMs: ```bash -cd /storage/schu/ProjectGroups/RNA/Data/RNASequencing/Novogene/NHHW162910/alignments/bams/bamsSortedByName +cd /storage/schu/ProjectGroups/RNA/Data/RNASequencing/Novogene/NHHW162910/alignments/bamsSortedByName ``` + +Run featureCounts on every BAM file. The files will be read in alphanumerical order (1,2,3...10,11,12...): ```bash -featureCounts -Q 255 -T 6 -p -a /storage/scic/Data/External/Bioinformatics/Rat/Genomes/rn5_Mar2012/annotation/annRefSeq_rn5_20Apr2016.gtf -o /storage/schu/ProjectGroups/RNA/Data/RNASequencing/Novogene/NHHW162910/gene_counts/gene_counts_130117.txt $(echo *.bam) +featureCounts -Q 255 -T 6 -p -a /storage/scic/Data/External/Bioinformatics/Rat/Genomes/rn5_Mar2012/annotation/annRefSeq_rn5_20Apr2016.gtf -o /storage/schu/ProjectGroups/RNA/Data/RNASequencing/Novogene/NHHW162910/gene_counts/gene_counts_180117.txt $(ls *.bam | sort -n -k1.2 | paste -s -d ' ' -) ``` #### Extract mapped features only Once the counting of the features/genes is done, we extract only those which have at least one read mapped to them: ```bash -awk -F"\t" 'OFS="\t"{if(($7+$8+$9+$10+$11+$12+$13+$14+$15+$16+$17+$18+$19+$20+$21+$22)>0){print $1,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16,$17,$18,$19,$20,$21,$22}}' gene_counts_130117.txt > gene_counts_130117_only_mapped.txt +awk -F"\t" 'OFS="\t"{if(($7+$8+$9+$10+$11+$12+$13+$14+$15+$16+$17+$18+$19+$20+$21+$22)>0){print $1,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16,$17,$18,$19,$20,$21,$22}}' gene_counts_180117.txt > gene_counts_180117_only_mapped.txt ``` -Then we add a header to that file with the sample names: +Then we add a header to that file with labels and sample names: ```bash -paste -d '\t' <(echo "gene_name") <(head -1 gene_counts_130117.txt.summary | cut -f2-) | cat - gene_counts_130117_only_mapped.txt > gene_counts_130117_only_mapped_with_header.txt +paste -d '\t' <(echo "GeneId") <(echo "Length") <(head -1 gene_counts_180117.txt.summary | cut -f2- | tr -s '\t' '\n' | awk 'OFS="\t"{split($1, a, "_"); print a[1]}' | paste -s -d '\t' -) | cat - gene_counts_180117_only_mapped.txt > gene_counts_180117_only_mapped_with_header.txt ```