spike_in
The function spike_in() produces a test data set that can be analyzed by the workflow.
A noisy data set is generated first. All parameters can be set manually so that it matches the biological data as closely as possible.
table <- spike_in(noise = 500, spike = 25, numberpatients = 500, proportion_noise = 0.7, proportion_spike = 0.7, overlap = 0.65, noise_FC = 2, spike_FC = 2, numberspikeins = 1)
The table below gives a short description of each parameter.
argument | description | default |
---|---|---|
noise | number of bimodal genes in the noise | 500 |
spike | number of bimodal genes in the spike | 25 |
numberpatients | number of patients in both spike and noise | 500 |
proportion_noise | proportion of the bimodality in the noise | 0.7 |
proportion_spike | proportion of the bimodality in the spike | 0.7 |
overlap | enhanced overlap between the spike genes, given as a fraction (0.65 = 65 %) | 0.65 |
noise_FC | distance between the means of the modalities in the noise | 2 |
spike_FC | distance between the means of the modalities in the spike | 2 |
numberspikeins | number of unconnected spikes to be added to the noise | 1 |
The data set is built in two steps. First, a table is constructed that contains all static values: gene name, component, size, proportion, variance, mean, and FC. Second, the patient composition is generated and written to the group member column. Every patient name ends in a consecutive number. The first gene is filled with a random selection of the configured number of patients. The composition of each following gene is drawn with the sample() function, and the desired connection strength is set by how many patients are drawn from the corresponding reference gene. The rest of the table is filled with consecutively numbered patients.
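The two steps can be illustrated with a minimal base R sketch. It is not the implementation of spike_in(): all variable names are hypothetical, the reference gene is assumed to be simply the previous gene, and the overlap is assumed to be the fraction of a gene's group that is reused.

```r
# Illustrative sketch only; names and the exact overlap rule are assumptions.
set.seed(1)

n_patients <- 20    # total number of patients (cf. numberpatients)
n_genes    <- 3     # number of genes to build
proportion <- 0.5   # share of patients in a gene's group (cf. proportion_noise)
overlap    <- 0.6   # share of a group reused by the next gene (cf. overlap)

# Patients carry a consecutive number at the end of their names.
patients   <- paste0("patient_", seq_len(n_patients))
group_size <- round(n_patients * proportion)

# Step 1: table with the static values.
tab <- data.frame(
  gene       = paste0("gene_", seq_len(n_genes)),
  size       = group_size,
  proportion = proportion,
  stringsAsFactors = FALSE
)

# Step 2: patient composition per gene.
members <- vector("list", n_genes)
members[[1]] <- sample(patients, group_size)   # first gene: random fill
n_shared <- round(group_size * overlap)

for (i in seq_len(n_genes)[-1]) {
  # Patients drawn from the reference gene set the connection strength.
  shared <- sample(members[[i - 1]], n_shared)
  # The remaining slots are filled with patients outside the reference gene.
  fresh  <- sample(setdiff(patients, members[[i - 1]]), group_size - n_shared)
  members[[i]] <- c(shared, fresh)
}

# Store the group members as one comma-separated string per gene.
tab$group_members <- vapply(members, paste, character(1), collapse = ",")
```

With these settings, two consecutive genes share exactly n_shared patients, so raising overlap strengthens the connection between them.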
A spike-in is then inserted into this noise. Its proportion, fold change, size, and, most importantly, its patient composition can be changed. Several spikes can also be coordinated with each other. For each gene in the noise, patients are drawn at random from the configured number of patients without replacement. The same step is applied to the spikes to ensure that they are part of the noise. To generate the connection between the spiked-in nodes, new patients are selected.
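Drawing patients without replacement can be sketched with base R's sample(). The snippet below is only an illustration: the patient naming and the rule that the group size equals the proportion times the number of patients are assumptions, not taken from the spike_in() source.

```r
# Illustrative sketch only; the group-size rule is an assumption.
set.seed(42)

numberpatients   <- 500
proportion_noise <- 0.7

patients   <- paste0("patient_", seq_len(numberpatients))
group_size <- round(proportion_noise * numberpatients)

# replace = FALSE means a drawn patient is not put back into the pool,
# so no patient can appear twice within one gene.
group <- sample(patients, size = group_size, replace = FALSE)

no_duplicates <- anyDuplicated(group) == 0
```

Because replace = FALSE is the default of sample(), spelling it out here only makes the intent explicit.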
The generated table has the same structure as the output file of readjsonsheet().