Collection of statistics

HTStream

HTStream can collect statistics in JSON format as it runs through its chained tasks. Each task produces a named dictionary of task-specific statistics, for example:

{
"hts_Stats_14865": 
  {"Notes": "BeforeQC", "totalFragmentsInput": 138720, "totalFragmentsOutput": 138720},
"hts_PolyATTrim_14866":
  {"Notes": "polyATTrim", "totalFragmentsInput": 138720, "totalFragmentsOutput": 138263, "Single_end": {"SE_rightTrim": 290652, "SE_discarded": 457}}
}

sc-preprocess contains logic to extract information from the collected JSON and present it as a table with one row per sample:

Sample	BeforeQC_totalFragmentsOutput	polyATTrim_totalFragmentsOutput	polyATTrim_Single_end:SE_discarded
Cell1	138720	138263	457
Cell2	523741	522535	1206
Cell3	259680	259265	415

The htstream section of the config file lists the columns (dictionary keys) to extract:

htstream:
  collect:
    - totalFragmentsOutput
    - Single_end:SE_discarded

A colon (:) in a column name indicates nested dictionaries, which are traversed level by level to fetch the value. In addition, the value of the Notes key is used as the column prefix (e.g. BeforeQC and polyATTrim in the example above).
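The lookup rule described above can be sketched in Python as follows. This is only an illustration, not the actual sc-preprocess code: the function names lookup and collect_row are hypothetical, and skipping a key that a task does not provide is an assumption, chosen because it reproduces the column set of the example table.

```python
def lookup(task_stats, column, sep=":"):
    """Follow a colon-separated key path through nested dictionaries.

    Returns None when any part of the path is missing.
    """
    value = task_stats
    for key in column.split(sep):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def collect_row(sample, stats, columns):
    """Build one table row for a sample from its HTStream statistics."""
    row = {"Sample": sample}
    for task_stats in stats.values():
        # The value of the Notes key becomes the column prefix.
        prefix = task_stats.get("Notes", "")
        for column in columns:
            value = lookup(task_stats, column)
            if value is not None:  # assumption: absent keys are skipped
                row[f"{prefix}_{column}"] = value
    return row


# Statistics taken from the HTStream example above.
stats = {
    "hts_Stats_14865": {
        "Notes": "BeforeQC",
        "totalFragmentsInput": 138720,
        "totalFragmentsOutput": 138720,
    },
    "hts_PolyATTrim_14866": {
        "Notes": "polyATTrim",
        "totalFragmentsInput": 138720,
        "totalFragmentsOutput": 138263,
        "Single_end": {"SE_rightTrim": 290652, "SE_discarded": 457},
    },
}
columns = ["totalFragmentsOutput", "Single_end:SE_discarded"]
row = collect_row("Cell1", stats, columns)
```

With the example statistics, row holds the same four columns as the Cell1 line of the table: Sample, BeforeQC_totalFragmentsOutput, polyATTrim_totalFragmentsOutput and polyATTrim_Single_end:SE_discarded.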

Salmon

Similar to HTStream, Salmon also provides its statistics in JSON format, and values of interest are extracted in the same way.

The configuration

salmon:
  collect:
    - 'num_processed'
    - 'num_mapped'
    - 'percent_mapped'

will result in a table with four columns: Sample, num_processed, num_mapped and percent_mapped.
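As a rough sketch of how such a table could be assembled, the following Python snippet writes one tab-separated row per sample from already-parsed Salmon statistics. It is not the sc-preprocess implementation: the function name salmon_table is hypothetical, the "NA" placeholder for a missing key is an assumption, and the numbers are made up for illustration.

```python
import csv
import io


def salmon_table(stats_by_sample, columns):
    """Return a tab-separated table with a Sample column plus the given keys."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["Sample", *columns])
    for sample, stats in stats_by_sample.items():
        # Assumption: a key missing from the statistics is reported as "NA".
        writer.writerow([sample, *(stats.get(c, "NA") for c in columns)])
    return buffer.getvalue()


# Illustrative values only, not real Salmon output.
example = {
    "Cell1": {
        "num_processed": 1000,
        "num_mapped": 800,
        "percent_mapped": 80.0,
        "some_other_key": 5,  # not listed under collect, so it is ignored
    },
}
table = salmon_table(example, ["num_processed", "num_mapped", "percent_mapped"])
```

Only the keys listed under collect end up in the table; anything else in the JSON is ignored.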
