Collection of statistics
HTStream is able to collect statistics in JSON format while running through the chained tasks. Each task results in a (named) dictionary with task-specific statistics, for example:
{
"hts_Stats_14865":
{"Notes": "BeforeQC", "totalFragmentsInput": 138720, "totalFragmentsOutput": 138720},
"hts_PolyATTrim_14866":
{"Notes": "polyATTrim", "totalFragmentsInput": 138720, "totalFragmentsOutput": 138263, "Single_end": {"SE_rightTrim": 290652, "SE_discarded": 457}}
}
sc-preprocess extracts selected values from the collected JSON and summarizes them in a table with one row per sample:
Sample | BeforeQC_totalFragmentsOutput | polyATTrim_totalFragmentsOutput | polyATTrim_Single_end:SE_discarded |
---|---|---|---|
Cell1 | 138720 | 138263 | 457 |
Cell2 | 523741 | 522535 | 1206 |
Cell3 | 259680 | 259265 | 415 |
The htstream section of the config file specifies which columns (dictionary keys) to extract:
htstream:
collect:
- totalFragmentsOutput
- Single_end:SE_discarded
A colon (:) in the column name indicates nested dictionaries, which are traversed level by level to fetch the right value. Additionally, the value of the Notes key is used as the column prefix (e.g. BeforeQC and polyATTrim in the example above).
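The lookup described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the actual sc-preprocess code: the function names fetch and collect_htstream are hypothetical, and skipping tasks that lack a requested key (as well as falling back to the task name when Notes is missing) are assumptions.

```python
import json


def fetch(stats, key, sep=":"):
    """Walk nested dictionaries along a colon-separated key.

    Returns None if any part of the path is missing.
    """
    value = stats
    for part in key.split(sep):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def collect_htstream(report, keys):
    """Build one table row; each column is named '<Notes>_<key>'."""
    row = {}
    for task_name, stats in report.items():
        # Assumption: fall back to the task name if "Notes" is absent.
        prefix = stats.get("Notes", task_name)
        for key in keys:
            value = fetch(stats, key)
            # Assumption: a task without the requested key adds no column.
            if value is not None:
                row[f"{prefix}_{key}"] = value
    return row


# The JSON example from above, shortened to the relevant fields.
report = json.loads("""
{
  "hts_Stats_14865":
    {"Notes": "BeforeQC", "totalFragmentsInput": 138720, "totalFragmentsOutput": 138720},
  "hts_PolyATTrim_14866":
    {"Notes": "polyATTrim", "totalFragmentsInput": 138720, "totalFragmentsOutput": 138263,
     "Single_end": {"SE_rightTrim": 290652, "SE_discarded": 457}}
}
""")

row = collect_htstream(report, ["totalFragmentsOutput", "Single_end:SE_discarded"])
print(row)
```

With the two keys from the config above, the row contains the same three value columns as the Cell1 line of the table.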
Like HTStream, Salmon provides its statistics in JSON format, and values of interest are extracted in the same way.
The configuration
salmon:
collect:
- 'num_processed'
- 'num_mapped'
- 'percent_mapped'
will result in a table with four columns: Sample, num_processed, num_mapped and percent_mapped.
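As a rough sketch of how such a table could be assembled from per-sample statistics, the following Python snippet writes one row per sample. The function name salmon_table, the tab-separated output, the NA placeholder for missing keys, and all numbers are assumptions made for illustration; they are not taken from sc-preprocess or from a real Salmon run.

```python
import csv
import io


def salmon_table(per_sample, keys):
    """Render a tab-separated table: a Sample column followed by the requested keys."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["Sample", *keys])
    for sample, stats in per_sample.items():
        # Assumption: keys missing from a sample's JSON are reported as "NA".
        writer.writerow([sample, *(stats.get(key, "NA") for key in keys)])
    return out.getvalue()


# Made-up statistics for illustration only; "other_field" stands in for
# any additional entries in the JSON that are not collected.
per_sample = {
    "Cell1": {"num_processed": 1000, "num_mapped": 800, "percent_mapped": 80.0, "other_field": 1},
    "Cell2": {"num_processed": 2000, "num_mapped": 1500, "percent_mapped": 75.0, "other_field": 2},
}

table = salmon_table(per_sample, ["num_processed", "num_mapped", "percent_mapped"])
print(table)
```

Only the keys listed under collect end up as columns; everything else in the JSON is ignored.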