MARMoSET

A package to extract meta data from Thermo Fisher raw files. Meta data here refers to the parts of a raw file that hold the file header, sample information, instrument method and mass spectrometer tune data.

Works with the following instruments:

Thermo EASY-nLC
RSLCnano
Q Exactive Plus - Orbitrap MS
Q Exactive HF - Orbitrap MS
Q Exactive - Orbitrap MS
Orbitrap Fusion Lumos
LTQ Orbitrap MS
LTQ Orbitrap Velos MS

Motivation:

In the context of evaluability and reproducibility in mass spectrometry, standardized reporting of instrument settings and other metadata is common. The Xcalibur software can access the instrument settings stored in a raw file, but it has no feature to extract them, let alone to compare the settings of multiple raw files. MARMoSET addresses this by providing an easy way to extract meta data from raw files.

Raw files:

A raw file starts with meta data. Further information about the instrument settings, along with the chromatography data, sits on a deeper layer of the file. The file therefore contains all the information about how the run was set up in addition to the results. To access this format and extract this information, the interface 'IRawDataPlus' was defined. Together with Thermo Fisher's RawFileReader it can return all valuable data in a raw file.

Installation:

You can install MARMoSET from GitHub as follows:

install.packages('remotes')

remotes::install_github('loosolab/MARMoSET', host='github.molgen.mpg.de/api/v3')

Usage:

The C# command line tool can be called from R with generate_json() to create a JSON file containing only the meta data of the grouped raw files. It takes one argument, path_data: a path to a raw file or to a directory containing raw files.

This JSON file can also be created by running MARMoSET.exe externally.

Due to the design of the RawFileReader, MARMoSET.exe, and therefore this command, runs as 64-bit code on Windows only. To use this package on another operating system, the JSON file must already have been created. To open a JSON file in R, the package jsonlite provides the function fromJSON().

library(MARMoSET)

if(.Platform$OS.type == 'windows')
{
  # representative of any raw file / directory containing raw files
  data <- system.file(
      file.path('extdata', 'testfile.raw'),
      package = 'MARMoSET', mustWork = TRUE)
  
  json <- generate_json(path_data = data)
} else
{
  # non-Windows users need to read in the JSON file here
  json <- MARMoSET::testfile_json
}
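
On a non-Windows system with a JSON file produced elsewhere, reading it in might look like the following minimal sketch. The file written here is a tiny hypothetical stand-in, not real MARMoSET.exe output, and this README does not state which parsing options the other MARMoSET functions expect, so the default fromJSON() call is an assumption.

```r
# Hypothetical stand-in file; in practice this would be a JSON file
# created by MARMoSET.exe on a Windows machine.
json_path <- tempfile(fileext = ".json")
writeLines('{"example_key": "example_value"}', json_path)

# jsonlite is not part of base R, so check that it is installed first.
if (requireNamespace("jsonlite", quietly = TRUE)) {
  json <- jsonlite::fromJSON(json_path)
  str(json)
}
```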

To allow easier access to the JSON content, it needs to be flattened. This is done with flatten_json(), which takes a single argument: json.

flat_json <- flatten_json(json = json)
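
As a general illustration of what flattening a nested structure means, the base R sketch below collapses a small nested list into a single level with compound names. The list is a hypothetical stand-in, and unlist() is used only for illustration; the actual structure returned by flatten_json() is not described here and may differ.

```r
# Hypothetical nested structure, loosely resembling parsed JSON.
nested <- list(
  header = list(instrument = "Example MS", software = "1.0"),
  method = list(flow = "300")
)

# unlist() collapses the nesting and joins the names with dots,
# giving "header.instrument", "header.software" and "method.flow".
flat <- unlist(nested)
names(flat)

# Every value is now reachable with a single key.
flat[["method.flow"]]
```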

Raw files include a huge amount of meta data, of which journals require only a part, so the data needs to be filtered. A table that links each essential piece of information to its location in the flattened JSON is therefore useful. Such a table, referred to here as a 'term matching table', can be created with create_term_match_table(), which takes two arguments. The first, instrument_list, takes a vector with the names of the instruments represented in the JSON file. The second, origin_key, specifies which requirements should be met. If it is left empty, all journals are selected. 'jpr_guidelines_ms' stands for the requirements of the Journal of Proteome Research, and 'miape' for the Minimum Information About a Proteomics Experiment (MIAPE) from the Proteomics Standards Initiative.

term_matching_table <- create_term_match_table(
  instrument_list = c('Thermo EASY-nLC', 'Q Exactive - Orbitrap_MS'),
  origin_key = 'jpr_guidelines_ms')

The names of the instruments can be shown for each group with instrument_names(), which takes the json and the group number as arguments.

instrument_names(json = json, group_number = 1)
#> [1] "Thermo EASY-nLC"          "Q Exactive - Orbitrap MS"

To combine the JSON with the term matching table and get a clear table, there are two options. The function one_group_match_terms() creates one table with the information chosen in the term matching table for one group, while match_terms() creates a vector with a table for each group. Accordingly, match_terms() takes the flattened JSON as its first and the term matching table as its second argument, while one_group_match_terms() additionally takes the group number as a third argument. Note that match_terms() requires all raw files in the JSON to have been recorded with the same combination of instruments. If this is not the case, each group needs its own term matching table and therefore one_group_match_terms().

meta_data_table <- one_group_match_terms(flat_json = flat_json, 
                                         term_matching_table = term_matching_table, 
                                         group_number = 1)

vector_of_group_tables <- match_terms(flat_json = flat_json,
                                      term_matching_table = term_matching_table)

knitr::kable(meta_data_table[1:3,])

Term	Value
High Performance Liquid Chromatography (HPLC) Instrument	Thermo EASY-nLC
HPLC Vendor	Thermo Scientific
HPLC Model	NA
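
When groups were recorded with different instrument combinations, the per-group route could be sketched as a loop like the one below. It uses only functions named in this README, but it assumes that the names returned by instrument_names() can be passed directly as instrument_list, which is suggested here but not confirmed. The code only runs if MARMoSET is installed and uses the bundled test JSON.

```r
# Sketch only: collects one matched table per group.
group_tables <- list()

if (requireNamespace("MARMoSET", quietly = TRUE)) {
  json <- MARMoSET::testfile_json
  flat_json <- MARMoSET::flatten_json(json = json)

  for (i in seq_len(MARMoSET::group_count(json))) {
    # Assumption: the names reported for a group are valid
    # instrument_list entries for create_term_match_table().
    instruments <- MARMoSET::instrument_names(json = json, group_number = i)

    # Build a term matching table specific to this group's instruments.
    table_i <- MARMoSET::create_term_match_table(
      instrument_list = instruments,
      origin_key = 'jpr_guidelines_ms')

    # Match the terms for this group only.
    group_tables[[i]] <- MARMoSET::one_group_match_terms(
      flat_json = flat_json,
      term_matching_table = table_i,
      group_number = i)
  }
}
```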

The meta data of the 'Thermo EASY-nLC' includes a gradient table. This table can be retrieved either with one_gradient_table() for one specified group or with gradient_tables() for all groups at once. The output is either one table or a vector of tables.

gradient_table <- one_gradient_table(flat_json = flat_json, 
                                     group_number = 1, 
                                     lc_pump = "Thermo EASY-nLC")

vector_of_gradient_tables <- gradient_tables(flat_json = flat_json,
                                             lc_pump = "Thermo EASY-nLC")

knitr::kable(gradient_table)

Time[mm:ss]	Duration[mm:ss]	Flow[nl/min]	Mixture[%B]
00:00	00:00	300	5
01:00	01:00	300	5
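
The gradient table reports times as mm:ss strings. For plotting or further calculation it may help to convert them to seconds, as in the minimal sketch below. The helper mmss_to_seconds() is hypothetical and not part of MARMoSET, and it works on a plain character vector because the exact column names of the returned table are not confirmed here.

```r
# Hypothetical helper: convert "mm:ss" strings to seconds.
mmss_to_seconds <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(
    parts,
    function(p) as.numeric(p[1]) * 60 + as.numeric(p[2]),
    numeric(1)
  )
}

# Time values taken from the gradient table shown above.
seconds <- mmss_to_seconds(c("00:00", "01:00"))
seconds
#> [1]  0 60
```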

Finally, to export all tables in a vector, or one specific table, the functions save_all_groups() and save_group_table() are helpful. The output of gradient_tables() or match_terms() is passed as groups_vector, while the second argument, output_path, expects a path including a filename. save_all_groups() automatically adds the group number to each filename. save_group_table() takes the group number as a third argument but does not add it to the filename.

The output file is a tab delimited .txt file for each group.

partial_path_for_output <- tempfile()

save_group_table(groups_vector = vector_of_group_tables, 
                 output_path = partial_path_for_output, 
                 group_number = 1)

save_all_groups(groups_vector = vector_of_group_tables,
                output_path =  partial_path_for_output)
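
Because the exported files are tab delimited, they can be read back with base R. The sketch below writes a small stand-in file with write.table() and reads it with read.delim(). The exact filenames and write settings used by save_group_table() and save_all_groups() are not stated here, so the path and the write options are assumptions for illustration only.

```r
# Stand-in table using two rows from the example output above.
example_table <- data.frame(
  Term = c("HPLC Vendor", "HPLC Model"),
  Value = c("Thermo Scientific", NA),
  stringsAsFactors = FALSE
)

# Hypothetical path; real exports end up wherever output_path points.
example_path <- tempfile(fileext = ".txt")
write.table(example_table, file = example_path, sep = "\t",
            quote = FALSE, row.names = FALSE)

# read.delim() assumes a header row and tab-separated fields.
read_back <- read.delim(example_path, stringsAsFactors = FALSE)
read_back
```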

Additional functions are group_count(), which returns how many groups the JSON file contains, and files_in_group(), which returns a list of the filenames in the group given by the second parameter, group_number. Both functions work with the original JSON, not the flattened one.

group_count(json)
#> [1] 1

files_in_group(json, 1)
#> [1] "C:/Users/mkiwele/Documents/R/win-library/3.5/MARMoSET/extdata/testfile.raw"

How to cite:

Kiweler M, Looso M and Graumann J. MARMoSET – Extracting Publication-Ready Mass Spectrometry Metadata from RAW Files. Molecular & Cellular Proteomics (2019), DOI: 10.1074/mcp.TIR119.001505
