spike_in

The function spike_in() produces a synthetic test data set that can be analyzed by the workflow.

A noisy data set is generated first. All of its parameters can be set manually so that it matches the biological data as closely as possible.

table <- spike_in(noise = 500, spike = 25, numberpatients = 500, proportion_noise = 0.7, proportion_spike = 0.7, overlap = 0.65, noise_FC = 2, spike_FC = 2, numberspikeins = 1)

The table below gives a short description of each parameter.

argument	description	default
noise	number of bimodal genes in the noise	500
spike	number of bimodal genes in the spike	25
numberpatients	number of patients in both spike and noise	500
proportion_noise	proportion of the bimodality in the noise	0.7
proportion_spike	proportion of the bimodality in the spike	0.7
overlap	enhanced overlap between the spike genes, given as a fraction (0.65 = 65 %)	0.65
noise_FC	distance between the means of the modalities in the noise	2
spike_FC	distance between the means of the modalities in the spike	2
numberspikeins	number of unconnected spikes to be added to the noise	1
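
To make the arguments more concrete, the following minimal sketch shows how they could translate into one row of static values per gene. It is not the implementation of spike_in(): the column names (gene, type, proportion, FC, size) and the reading of size as "number of patients in the bimodal group" are assumptions made for illustration only.

```r
# Argument values mirroring the defaults listed in the table above.
noise <- 500
spike <- 25
numberpatients <- 500
proportion_noise <- 0.7
proportion_spike <- 0.7
noise_FC <- 2
spike_FC <- 2

# One row per gene. Column names are hypothetical, not the real output schema.
static <- data.frame(
  gene       = c(paste0("noise_", seq_len(noise)),
                 paste0("spike_", seq_len(spike))),
  type       = rep(c("noise", "spike"), times = c(noise, spike)),
  proportion = rep(c(proportion_noise, proportion_spike),
                   times = c(noise, spike)),
  FC         = rep(c(noise_FC, spike_FC), times = c(noise, spike)),
  stringsAsFactors = FALSE
)

# Assumed meaning of "size": patients that fall into the bimodal group.
static$size <- round(numberpatients * static$proportion)

# Show the first noise rows and the last spike rows.
print(head(static, 3))
print(tail(static, 3))
```

Under these assumptions, changing proportion_spike or spike_FC only affects the rows of type "spike", which is what allows the spike to be tuned independently of the noise.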

The table is built in two steps. First, a table is constructed that contains all static values, such as gene name, component, size, proportion, variance, mean, and FC. Second, the patient composition is generated and written to the group member column. Every patient name ends in a consecutive number. The first gene is filled at random with the selected number of patients. The composition of each following gene is produced with the sample() function: the desired connection strength determines how many patients are drawn with sample() from the corresponding, already defined gene. The rest of the table is filled with consecutively numbered patients.
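
The second step can be sketched for two spike genes as follows. This is an illustration under stated assumptions, not the code of spike_in(): the patient label format ("patient_1", ...), the variable names, and the choice to fill the remainder of the second gene with the lowest-numbered patients that are not in the first gene are all hypothetical.

```r
set.seed(1)  # fixed seed so the sketch is reproducible

numberpatients   <- 500
proportion_spike <- 0.7
overlap          <- 0.65

# Hypothetical patient names, each ending in a consecutive number.
patients <- paste0("patient_", seq_len(numberpatients))

# Assumed group size: share of all patients in the bimodal group.
group_size <- round(numberpatients * proportion_spike)

# First gene: filled at random with the selected number of patients.
gene1 <- sample(patients, group_size)

# Connection strength: how many patients the next gene shares with gene1.
n_shared <- round(group_size * overlap)
shared   <- sample(gene1, n_shared)

# Remainder: consecutively numbered patients not already in gene1
# (an assumption; it keeps the overlap at exactly n_shared).
fill  <- head(setdiff(patients, gene1), group_size - n_shared)
gene2 <- c(shared, fill)

# Realized overlap between the two genes, as a fraction of the group size.
realized_overlap <- length(intersect(gene1, gene2)) / group_size
print(realized_overlap)
```

Drawing the shared patients from gene1 rather than from the full patient list is what fixes the connection strength between the two genes. Sampling both genes independently would leave the overlap to chance.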

[Figure: spike_graphic]

A spike-in is inserted into this noisy part. Its proportion, fold change, size, and, most importantly, its patient composition can all be changed, and several spikes can be attuned to one another. For each gene in the noise, patients are drawn at random from the configured number of patients without replacement. The same step is applied to the spikes to ensure that they are part of the noise. To generate the connection between the spiked-in nodes, new patients are selected.
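
Drawing patients without replacement for every noise gene could look like the sketch below. It assumes a small number of noise genes for readability, hypothetical patient and gene labels, and a group size derived from proportion_noise. None of these details are confirmed by the spike_in() source.

```r
set.seed(42)  # fixed seed so the sketch is reproducible

noise            <- 5     # reduced from the default of 500 for readability
numberpatients   <- 500
proportion_noise <- 0.7

# Hypothetical patient names, each ending in a consecutive number.
patients <- paste0("patient_", seq_len(numberpatients))

# Assumed group size per noise gene.
group_size <- round(numberpatients * proportion_noise)

# One independent draw per gene; replace = FALSE means a patient
# cannot be picked twice within the same gene.
noise_groups <- lapply(seq_len(noise), function(i) {
  sample(patients, size = group_size, replace = FALSE)
})
names(noise_groups) <- paste0("noise_gene_", seq_len(noise))

# Check: no gene contains a duplicated patient.
dups <- vapply(noise_groups, anyDuplicated, integer(1))
print(dups)
```

Because each gene is drawn independently, any overlap between two noise genes arises by chance alone, which is the background against which the deliberately connected spike genes are meant to stand out.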

The generated table has the same structure as the output file of readjsonsheet().

[Figure: table_out]

[Figure: spike_in]
