Collection of statistics

Jens Preußner edited this page Apr 6, 2018 · 1 revision

HTStream

HTStream collects statistics in JSON format as it runs through the chained tasks. Each task produces a named dictionary of task-specific statistics, for example:

{
"hts_Stats_14865": 
  {"Notes": "BeforeQC", "totalFragmentsInput": 138720, "totalFragmentsOutput": 138720},
"hts_PolyATTrim_14866":
  {"Notes": "polyATTrim", "totalFragmentsInput": 138720, "totalFragmentsOutput": 138263, "Single_end": {"SE_rightTrim": 290652, "SE_discarded": 457}}
}

sc-preprocess extracts selected values from the collected JSON and presents them in a table with one row per sample:

Sample BeforeQC_totalFragmentsOutput polyATTrim_totalFragmentsOutput polyATTrim_Single_end:SE_discarded
Cell1 138720 138263 457
Cell2 523741 522535 1206
Cell3 259680 259265 415

The htstream section of the config file lists the columns (dictionary keys) to extract:

htstream:
  collect:
    - totalFragmentsOutput
    - Single_end:SE_discarded

A colon (:) in a column name denotes nested dictionaries, which are traversed key by key to fetch the right value. In addition, the value of the Notes key is used as the column prefix (e.g. BeforeQC and polyATTrim in the example above).
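
The traversal and prefixing rules can be illustrated with a minimal Python sketch. It is not the actual sc-preprocess code; the function names get_nested and collect_row are hypothetical, and the sketch assumes that a task lacking a requested key simply contributes no column, which is consistent with the example table above.

```python
import json


def get_nested(stats, column):
    """Follow a colon-separated key path; return None if any key is missing."""
    value = stats
    for key in column.split(":"):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def collect_row(htstream_stats, columns):
    """Build one table row (column name -> value) for a single sample."""
    row = {}
    for task in htstream_stats.values():
        # The value of the Notes key becomes the column prefix.
        prefix = task.get("Notes", "")
        for column in columns:
            value = get_nested(task, column)
            if value is not None:  # assumption: skip tasks without this key
                row[f"{prefix}_{column}"] = value
    return row


# The HTStream example from above, used as input for one sample.
example = json.loads("""
{
"hts_Stats_14865":
  {"Notes": "BeforeQC", "totalFragmentsInput": 138720, "totalFragmentsOutput": 138720},
"hts_PolyATTrim_14866":
  {"Notes": "polyATTrim", "totalFragmentsInput": 138720, "totalFragmentsOutput": 138263,
   "Single_end": {"SE_rightTrim": 290652, "SE_discarded": 457}}
}
""")

row = collect_row(example, ["totalFragmentsOutput", "Single_end:SE_discarded"])
for name, value in row.items():
    print(name, value)
```

With the two configured columns, the resulting row holds the same three values as the Cell1 line of the table.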

Salmon

Like HTStream, Salmon provides its statistics in JSON format, and values of interest are extracted in a similar way.

The configuration

salmon:
  collect:
    - 'num_processed'
    - 'num_mapped'
    - 'percent_mapped'

will result in a table with four columns: Sample, num_processed, num_mapped and percent_mapped.
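
As a rough sketch of how such a table could be assembled, the Python snippet below assumes that each sample's Salmon statistics have already been loaded into a flat dictionary. The function name salmon_table is hypothetical, and the numbers are made-up placeholder values, not real Salmon output.

```python
def salmon_table(stats_by_sample, columns):
    """Return a header row followed by one row per sample."""
    rows = [["Sample"] + list(columns)]
    for sample, stats in stats_by_sample.items():
        # Only the configured keys are kept; everything else is ignored.
        rows.append([sample] + [stats.get(column) for column in columns])
    return rows


# Columns as listed under salmon: collect: in the config file.
columns = ["num_processed", "num_mapped", "percent_mapped"]

# Placeholder values for illustration only.
stats_by_sample = {
    "Cell1": {
        "num_processed": 1000,
        "num_mapped": 800,
        "percent_mapped": 80.0,
        "other_key": "ignored",
    },
}

table = salmon_table(stats_by_sample, columns)
for table_row in table:
    print("\t".join(str(cell) for cell in table_row))
```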