
spike_in

Kai Schmid edited this page Mar 25, 2019 · 9 revisions

The function spike_in() produces a test data set that can be analyzed by the workflow.

A noisy data set is generated first. All parameters can be set manually so that the simulated data match the biological data as closely as possible.

table <- spike_in(noise = 500, spike = 25, numberpatients = 500, proportion_noise = 0.7, proportion_spike = 0.7, overlap = 0.65, noise_FC = 2, spike_FC = 2, numberspikeins = 1)

The table below briefly describes each parameter.

| argument | description | default |
| --- | --- | --- |
| noise | number of bimodal genes in the noise | 500 |
| spike | number of bimodal genes in the spike | 25 |
| numberpatients | number of patients in both spike and noise | 500 |
| proportion_noise | proportion of the bimodality in the noise | 0.7 |
| proportion_spike | proportion of the bimodality in the spike | 0.7 |
| overlap | enhanced overlap between the spike genes, as a fraction (0.65 = 65 %) | 0.65 |
| noise_FC | distance between the means of the modalities in the noise | 2 |
| spike_FC | distance between the means of the modalities in the spike | 2 |
| numberspikeins | number of unconnected spikes to be added to the noise | 1 |

The data set is built in two steps. First, a table is constructed that holds all static values: gene name, component, size, proportion, variance, mean, and fold change (FC). Second, the patient composition of each gene is generated and written to the group member column. Every patient is tagged with a consecutive number at the end of the name. The first gene is filled with a random selection of the configured number of patients. The composition of each following gene is produced with sample(): the desired connection strength is set by how many patients are drawn with sample() from the corresponding, already defined gene. The rest of the table is filled with consecutively numbered patients.
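The second step can be illustrated with a minimal sketch in base R. It is not the code of spike_in(): the helper name make_spike_groups() and the patient name prefix "patient_" are hypothetical. It also assumes that overlap is the fraction of a gene's group members shared with the first gene, and that the remaining slots are filled by a random draw from patients outside the first gene, which simplifies the fill rule described above.

```r
# Hypothetical helper, not part of the package: builds the group members of
# `n_genes` spike genes so that every following gene shares a fixed number
# of patients with the first gene.
make_spike_groups <- function(n_genes, numberpatients, proportion, overlap) {
  # Patients carry a consecutive number at the end of their names.
  patients <- paste0("patient_", seq_len(numberpatients))
  group_size <- round(numberpatients * proportion)
  n_shared <- round(group_size * overlap)

  # Enough patients must remain outside the first gene to fill the rest.
  stopifnot(group_size - n_shared <= numberpatients - group_size)

  # First gene: random draw (without replacement) from all patients.
  first <- sample(patients, group_size)
  groups <- vector("list", n_genes)
  groups[[1]] <- first

  for (i in seq_len(n_genes)[-1]) {
    # Connection strength: number of patients drawn from the first gene.
    shared <- sample(first, n_shared)
    # Assumption: remaining slots come from patients not in the first gene.
    others <- sample(setdiff(patients, first), group_size - n_shared)
    groups[[i]] <- c(shared, others)
  }
  groups
}

set.seed(1)
groups <- make_spike_groups(n_genes = 3, numberpatients = 100,
                            proportion = 0.5, overlap = 0.6)
```

With these example values, each gene has 50 group members and every following gene shares 30 of them with the first gene.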

[Figure: spike_graphic]

A spike-in is then inserted into this noise. Its proportion, fold change, size, and, most importantly, its patient composition can be changed. Several spikes can also be matched to one another. For each gene in the noise, patients are drawn at random from the configured number of patients without replacement. The same step is applied to the spikes so that they are part of the noise. To generate the connection between the spiked-in nodes, new patients are selected.
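Drawing patients for the noise genes can be sketched as follows. This is an illustration under stated assumptions, not the implementation of spike_in(): the helper name draw_noise_groups() and the patient name prefix are hypothetical, and the group size is assumed to be proportion_noise times numberpatients.

```r
# Hypothetical helper, not part of the package: draws one patient group per
# noise gene from the configured pool of patients.
draw_noise_groups <- function(noise, numberpatients, proportion_noise) {
  patients <- paste0("patient_", seq_len(numberpatients))
  group_size <- round(numberpatients * proportion_noise)

  # replace = FALSE: within one gene, a drawn patient is not put back,
  # so no patient appears twice in the same group.
  replicate(noise,
            sample(patients, group_size, replace = FALSE),
            simplify = FALSE)
}

set.seed(42)
noise_groups <- draw_noise_groups(noise = 5, numberpatients = 20,
                                  proportion_noise = 0.5)
```

Because each gene is drawn independently, groups of different genes overlap only by chance, which is what lets a spike with deliberately shared patients stand out.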

The generated table has the same structure as the output file of readjsonsheet().

[Figure: table_out]

This table is intended as input for calcscorematrix().