MARMoSET

A package to extract metadata from Thermo Fisher raw files. Metadata here means the information stored in a raw file as part of the file header, sample information, instrument method, and mass spectrometer tune data.

Works with the following instruments:

  • Thermo EASY-nLC
  • Q Exactive Plus - Orbitrap MS
  • Q Exactive HF - Orbitrap MS
  • Q Exactive - Orbitrap MS

Motivation:

In the context of evaluability and reproducibility in mass spectrometry, standardized reporting of instrument settings and other metadata is common. The Xcalibur software can read the instrument settings stored in a raw file, but it has no feature to extract them, let alone to compare the settings of multiple raw files. MARMoSET addresses this by providing an easy way to extract metadata from raw files.

Raw files:

A raw file starts with metadata. Deeper layers of the file hold further information about the instrument settings and the chromatography data. Besides the results, a raw file therefore contains all the information about how the run was set up. To access this format and extract this information, the interface 'IRawDataPlus' was defined. Together with Thermo Fisher's RawFileReader, it can return all valuable data in a raw file.

Installation

You can install MARMoSET from GitHub as follows:

install.packages('remotes')

remotes::install_github('loosolab/MARMoSET', host='github.molgen.mpg.de/api/v3')

Usage:

The C# command line tool can be called from R with generate_json() to create a JSON file that contains only the metadata of the grouped raw files. It takes one argument, path_data: a path to a raw file or to a directory containing raw files.

This JSON file can also be created by running MARMoSET.exe directly, outside of R.

Due to the design of the RawFileReader, MARMoSET.exe, and therefore this command, runs as 64-bit code on Windows only. To use this package on another operating system, the JSON file must already have been created. To open a JSON file in R, the package jsonlite provides the function fromJSON().

library(MARMoSET)

if(.Platform$OS.type == 'windows')
{
  # representative of any raw file / directory containing raw files
  data <- system.file(
      file.path('extdata', 'testfile.raw'),
      package = 'MARMoSET', mustWork = TRUE)
  
  json <- generate_json(path_data = data)
} else
{
  # non-Windows users need to read in the JSON file here
  json <- MARMoSET::testfile_json
}
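For the non-Windows case, reading an existing JSON file with jsonlite could look like the following minimal sketch. The file content is a made-up stand-in, not real MARMoSET output, and whether MARMoSET's functions expect the simplified or the unsimplified form returned by fromJSON() is an assumption to check against the package documentation.

```r
library(jsonlite)

# Hypothetical stand-in for a JSON file created by MARMoSET.exe on Windows;
# the content is invented only to keep this example runnable.
json_path <- tempfile(fileext = ".json")
writeLines('[{"group": 1, "files": ["a.raw", "b.raw"]}]', json_path)

# simplifyVector = FALSE keeps the nested list structure instead of
# collapsing JSON arrays into vectors or data frames (an assumed choice).
json <- fromJSON(json_path, simplifyVector = FALSE)

str(json, max.level = 2)
```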

For easier access, the JSON needs to be flattened. This is done with flatten_json(), which takes a single argument: json.

flat_json <- flatten_json(json = json)
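To illustrate what flattening means in general, the sketch below collapses a small nested list into a single level with base R's unlist(). This is not the implementation of flatten_json(), and the list content and key names are invented for the example; the actual structure of the flattened JSON may differ.

```r
# Hypothetical nested metadata, loosely shaped like parsed JSON.
nested <- list(
  header = list(instrument = "Q Exactive", version = "2.8"),
  method = list(gradient = list(flow = 300, mixture = 5))
)

# unlist() walks the nested list and joins the names along each path
# with ".", giving one named element per leaf value.
# Mixed types are coerced to character.
flat <- unlist(nested)

print(names(flat))
```

After flattening, each value can be addressed by a single key such as flat[["method.gradient.flow"]], which is what makes a lookup table of keys practical.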

Raw files include a huge amount of metadata, but journals require only part of it, so the data needs to be filtered. A table that links each essential piece of information to its location in the flattened JSON is useful for this. Such a table, referred to here as a 'term matching table', can be created with create_term_match_table(), which takes two arguments. The first, instrument_list, takes a vector with the names of the instruments represented in the JSON file. The second, origin_key, specifies which requirements should be met. If it is left empty, all journals are selected. 'jpr_guidelines_ms' stands for the requirements of the Journal of Proteome Research, and 'miape' for the Minimum Information About a Proteomics Experiment (MIAPE) from the Proteomics Standards Initiative.

term_matching_table <- create_term_match_table(
  instrument_list = c('Thermo EASY-nLC', 'Q Exactive - Orbitrap_MS'),
  origin_key = 'jpr_guidelines_ms')

The instrument names of each group can be shown with instrument_names(), which takes the json and the group number as arguments.

instrument_names(json = json, group_number = 1)
#> [1] "Thermo EASY-nLC"          "Q Exactive - Orbitrap MS"

There are two ways to combine the JSON with the term matching table into a clear table. The function one_group_match_terms() creates one table with the information chosen in the term matching table for one group, while match_terms() creates a vector with a table for each group. match_terms() takes the flattened JSON as first and the term matching table as second argument; one_group_match_terms() additionally takes the group number as third argument. Note that match_terms() requires all raw files in the JSON to have been recorded with the same combination of instruments. If that is not the case, each group needs a separate term matching table and therefore one_group_match_terms().

meta_data_table <- one_group_match_terms(flat_json = flat_json, 
                                         term_matching_table = term_matching_table, 
                                         group_number = 1)

vector_of_group_tables <- match_terms(flat_json = flat_json,
                                      term_matching_table = term_matching_table)

head(meta_data_table, 3)
#>                                                       term
#> 1 High Performance Liquid Chromatography (HPLC) Instrument
#> 2                                              HPLC Vendor
#> 3                                               HPLC Model
#>               value
#> 1   Thermo EASY-nLC
#> 2 Thermo Scientific
#> 3              <NA>
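The idea behind matching terms can be sketched in base R as a lookup of keys in a flattened, named vector. The key names and table columns below are hypothetical and do not reflect MARMoSET's internal term matching table; the sketch only shows why a term without a matching entry ends up as NA, as in the output above.

```r
# Hypothetical flattened metadata: one named element per value.
flat <- c(hplc.name   = "Thermo EASY-nLC",
          hplc.vendor = "Thermo Scientific")

# Hypothetical term matching table: a reporting term and the key
# under which its value is expected in the flattened metadata.
term_table <- data.frame(
  term = c("HPLC Instrument", "HPLC Vendor", "HPLC Model"),
  key  = c("hplc.name", "hplc.vendor", "hplc.model"),
  stringsAsFactors = FALSE
)

# Indexing a named vector by a missing name yields NA,
# so terms without metadata stay visible in the result.
result <- data.frame(
  term  = term_table$term,
  value = unname(flat[term_table$key]),
  stringsAsFactors = FALSE
)

print(result)
```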

The metadata of the 'Thermo EASY-nLC' contains a gradient table. It can be retrieved either with one_gradient_table() for one specified group or with gradient_tables() for all groups at once. The output is either one table or a vector of tables.

gradient_table <- one_gradient_table(flat_json = flat_json, 
                                     group_number = 1)

vector_of_gradient_tables <- gradient_tables(flat_json = flat_json)

head(gradient_table, 3)
#>   Time[mm:ss] Duration[mm:ss] Flow[nl/min] Mixture[%B]
#> 1       00:00           00:00          300           5
#> 2       01:00           01:00          300           5
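For plotting or comparing gradients, the mm:ss columns may need to be numeric. The sketch below converts such a column to seconds in base R. It assumes the time values are stored as character strings, which is not confirmed here, and it rebuilds the two rows shown above by hand instead of calling the package.

```r
# Hand-built copy of the two rows printed above (assumed column types).
gradient <- data.frame(
  "Time[mm:ss]" = c("00:00", "01:00"),
  "Mixture[%B]" = c(5, 5),
  check.names = FALSE,          # keep the bracketed column names as they are
  stringsAsFactors = FALSE
)

# Split "mm:ss" at the colon and combine minutes and seconds.
parts <- strsplit(gradient[["Time[mm:ss]"]], ":", fixed = TRUE)
seconds <- vapply(
  parts,
  function(p) as.numeric(p[1]) * 60 + as.numeric(p[2]),
  numeric(1)
)

print(seconds)
```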

Finally, the functions save_group_table() and save_all_groups() export one specific table or all tables in a vector. The output of gradient_tables() or match_terms() is passed as groups_vector, whereas the second argument, output_path, expects a path with a filename. save_all_groups() adds the group number to each filename automatically; save_group_table() takes the group number as a third argument but does not add it to the filename.

The output file is a tab delimited .txt file for each group.

partial_path_for_output <- tempfile()

save_group_table(groups_vector = vector_of_group_tables, 
                 output_path = partial_path_for_output, 
                 group_number = 1)

save_all_groups(groups_vector = vector_of_group_tables,
                output_path =  partial_path_for_output)
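A tab-delimited text file of this kind can be written and read back with base R alone, as in the sketch below. The table content and the "_group_1.txt" suffix are hypothetical; the filenames that save_all_groups() actually produces are not specified here.

```r
# Hypothetical one-row table standing in for a group table.
tbl <- data.frame(
  term  = "HPLC Vendor",
  value = "Thermo Scientific",
  stringsAsFactors = FALSE
)

# Assumed naming pattern for illustration only.
out_file <- paste0(tempfile(), "_group_1.txt")

# sep = "\t" makes the file tab delimited; quoting and row names are
# switched off so the file contains only the header and the values.
write.table(tbl, file = out_file, sep = "\t",
            quote = FALSE, row.names = FALSE)

# read.delim() uses tab as separator by default.
back <- read.delim(out_file, stringsAsFactors = FALSE)
print(back)
```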

Additional functions are group_count(), which returns how many groups the JSON file contains, and files_in_group(), which returns a list of the filenames in the group selected with the second parameter, group_number. Both functions work with the original JSON, not the flattened one.

group_count(json)
#> [1] 1

files_in_group(json, 1)
#> [1] "C:/Users/mkiwele/Documents/R/win-library/3.5/MARMoSET/extdata/testfile.raw"