---
output: github_document
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
  )
```

# MARMoSET
A package to extract metadata from Thermo Fisher raw files.
Metadata here refers to what a raw file stores in its file header, sample information, instrument method and mass spectrometer tune data.

Works with the following instruments:

* Thermo EASY-nLC
* RSLCnano
* Q Exactive Plus - Orbitrap MS
* Q Exactive HF - Orbitrap MS
* Q Exactive - Orbitrap MS
* Orbitrap Fusion Lumos

## Motivation:

In the context of evaluability and reproducibility in mass spectrometry, standardized reporting of instrument settings and other metadata is common. However, the Xcalibur software can read the instrument settings stored in a raw file but offers no feature to extract them, let alone to compare the settings of multiple raw files. MARMoSET addresses this gap and provides an easy way to extract metadata from raw files.

## Raw files:

The raw file format starts with metadata. Deeper layers of the file hold further information about the instrument settings as well as the chromatography data. A raw file therefore contains all the information about how the run was set up, in addition to the results. To access this format and extract this information, the interface `IRawDataPlus` was defined. Together with Thermo Fisher's RawFileReader, it can return all valuable data in a raw file.

## Installation

You can install MARMoSET from GitHub as follows:

```{r, eval=FALSE}
install.packages('remotes')

remotes::install_github('loosolab/MARMoSET', host='github.molgen.mpg.de/api/v3')
```

## Usage:

The C# command line tool can be called from R with `generate_json()` to create a JSON file that contains only the metadata of grouped raw files. It takes one argument, `path_data`: a path to a raw file or to a directory containing raw files.

This JSON file can also be created by running MARMoSET.exe externally.

Due to the design of the RawFileReader, MARMoSET.exe, and therefore `generate_json()`, runs as 64-bit code on Windows only. To use this package on another operating system, the JSON file must already have been created. To open a JSON file in R, the package `jsonlite` provides the function `fromJSON()`.

```{r}
library(MARMoSET)

if(.Platform$OS.type == 'windows')
{
  # representative of any raw file / directory containing raw files
  data <- system.file(
      file.path('extdata', 'testfile.raw'),
      package = 'MARMoSET', mustWork = TRUE)

  json <- generate_json(path_data = data)
} else
{
  # non-Windows users need to read in the JSON file here
  json <- MARMoSET::testfile_json
}

```
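
On systems other than Windows, reading an existing JSON file might look like the sketch below. It writes a placeholder JSON to a temporary file only so that the snippet is self-contained. The placeholder content, `json_path` and `json_from_file` are invented for illustration and do not follow the real MARMoSET layout. Whether the default `fromJSON()` settings suit a real MARMoSET file is an assumption.

```{r, eval=FALSE}
library(jsonlite)

# Placeholder JSON standing in for a file produced by MARMoSET.exe;
# its content is invented and does not follow the real MARMoSET layout.
json_path <- tempfile(fileext = ".json")
writeLines('[{"placeholder_key": "placeholder_value"}]', json_path)

# With a real file, point fromJSON() at its path instead of json_path.
json_from_file <- fromJSON(json_path)
str(json_from_file)
```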

To make the JSON easier to access, it needs to be flattened. This is done with `flatten_json()`, which takes a single argument: `json`.

```{r}
flat_json <- flatten_json(json = json)
```

Raw files contain a huge amount of metadata, only some of which is required by journals, so the relevant entries need to be selected. A table that links each required piece of information to its location in the flattened JSON is therefore useful. Such a table, referred to here as a 'term matching table', can be created with `create_term_match_table()`, which takes two arguments. The first, `instrument_list`, takes a vector with the names of the instruments represented in the JSON file.
The second, `origin_key`, specifies which requirements should be met. If it is left empty, all journals are selected. `'jpr_guidelines_ms'` stands for the requirements of the Journal of Proteome Research and `'miape'` for the Minimum Information About a Proteomics Experiment (MIAPE) from the Proteomics Standards Initiative.

```{r}
term_matching_table <- create_term_match_table(
  instrument_list = c('Thermo EASY-nLC', 'Q Exactive - Orbitrap_MS'),
  origin_key = 'jpr_guidelines_ms')
```

The instrument names of each group can be shown with `instrument_names()`, which takes the JSON and the group number as arguments.

```{r}
instrument_names(json = json, group_number = 1)
```

There are two ways to combine the JSON with the term matching table into a clear table. The function `one_group_match_terms()` creates one table with the information chosen in the term matching table for a single group, while `match_terms()` creates a vector with one table per group.
Accordingly, `match_terms()` takes the flattened JSON as its first and the term matching table as its second argument, while `one_group_match_terms()` additionally takes the group number as a third argument.
Note that `match_terms()` requires all raw files in the JSON to have been recorded with the same constellation of instruments. If that is not the case, each group needs its own term matching table and therefore `one_group_match_terms()`.

```{r}
meta_data_table <- one_group_match_terms(flat_json = flat_json,
                                         term_matching_table = term_matching_table,
                                         group_number = 1)

vector_of_group_tables <- match_terms(flat_json = flat_json,
                                      term_matching_table = term_matching_table)

knitr::kable(meta_data_table[1:3,])
```
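
When groups were recorded with different instrument constellations, the per-group approach could be sketched as a loop over all groups. This is a minimal sketch, not part of the package: it assumes that `instrument_names()` returns a character vector that can be passed directly as `instrument_list`, and that `group_count()` returns a single number. The names `group_ids`, `group_term_table` and `tables_per_group` are chosen here for illustration.

```{r, eval=FALSE}
library(MARMoSET)

# Example data shipped with the package, as used earlier in this README
json <- MARMoSET::testfile_json
flat_json <- flatten_json(json = json)

# One index per group in the JSON
group_ids <- seq_len(group_count(json))

tables_per_group <- lapply(group_ids, function(group_number) {
  # Assumption: instrument_names() yields names usable as instrument_list
  group_term_table <- create_term_match_table(
    instrument_list = instrument_names(json = json,
                                       group_number = group_number),
    origin_key = 'jpr_guidelines_ms')

  # Build the table for this group with its own term matching table
  one_group_match_terms(flat_json = flat_json,
                        term_matching_table = group_term_table,
                        group_number = group_number)
})
```

The result is a plain list with one table per group, so each group can use a term matching table that fits its own instruments.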

The metadata of the 'Thermo EASY-nLC' contains a gradient table. This table can be retrieved either with `one_gradient_table()` for one specified group or with `gradient_tables()` for all groups at once. The output is either one table or a vector of tables.

```{r}
gradient_table <- one_gradient_table(flat_json = flat_json,
                                     group_number = 1,
                                     lc_pump = "Thermo EASY-nLC")

vector_of_gradient_tables <- gradient_tables(flat_json = flat_json,
                                             lc_pump = "Thermo EASY-nLC")

knitr::kable(gradient_table)
```

Finally, the functions `save_group_table()` and `save_all_groups()` export one specific table or all tables in a vector. The output of `gradient_tables()` or `match_terms()` is passed as `groups_vector`, and the second argument, `output_path`, expects a path including a filename. `save_all_groups()` automatically adds the group number to each filename, whereas `save_group_table()` takes the group number as a third argument but does not add it to the filename.

The output is one tab-delimited .txt file per group.

```{r}
partial_path_for_output <- tempfile()

save_group_table(groups_vector = vector_of_group_tables,
                 output_path = partial_path_for_output,
                 group_number = 1)

save_all_groups(groups_vector = vector_of_group_tables,
                output_path = partial_path_for_output)
```
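
A tab-delimited file can be read back into R with base R's `read.delim()`. The sketch below uses a placeholder table written with `write.table()` rather than a real export, so `toy_table`, `toy_path`, `read_back` and all column names and values are invented. Whether the files written by MARMoSET carry a header row is an assumption; set `header = FALSE` if they do not.

```{r, eval=FALSE}
# Placeholder table standing in for an exported group table
# (column names and values are invented for illustration).
toy_table <- data.frame(
  term = c("placeholder term A", "placeholder term B"),
  value = c("1", "2"),
  stringsAsFactors = FALSE)

# Write it as a tab-delimited .txt file
toy_path <- paste0(tempfile(), ".txt")
write.table(toy_table, file = toy_path, sep = "\t",
            quote = FALSE, row.names = FALSE)

# read.delim() defaults to tab-separated input with a header row
read_back <- read.delim(toy_path, stringsAsFactors = FALSE)
```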

Two additional functions are `group_count()`, which returns how many groups the JSON file contains, and `files_in_group()`, which returns a list of the filenames in the group selected by the second parameter, `group_number`. Both functions work with the original JSON, not the flattened one.

```{r}
group_count(json)

files_in_group(json, 1)

```

## How to cite:

Kiweler M, Looso M and Graumann J. MARMoSET – Extracting Publication-Ready Mass Spectrometry Metadata from RAW Files. Molecular & Cellular Proteomics (2019), DOI: 10.1074/mcp.TIR119.001505