Graph.py

The implemented python script builds a graph and searches for structures inside it by:

python Graph.py -i <input directory> -o <output directory>

The python script performs four different generative models for the given graphs. The implementation of the algorithm are taken from Graph-tool

Those generative models are then compared by their minimum description length and the best fitting algorithm is used for the ongoing analyses.

The first two models are statistic block models (SBM). The first is the default version and the second a variation where a vertex can be part of several clusters/blocks. In this approach, the assumption that the number of clusters B, which is nearly √N – N is the number of vertices, respectively genes -, is made by the interpretation of the clusters of the model as nodes, and the edge counts between the clusters as edges. The used implementation parametrizes the partition of the nodes into clusters and the edges between the clusters. The nodes’ membership is then randomly switched, while the micro canonical value of the graph is computed and compared to the former value.

sbm

SBM where the blue noise can be clearly separated from the yellow vertices which represent the spike.

The other models are nested block models (NBM). The first analyses is again the default version of a NBM. The second in this case is a nbm where a step of equilibration is performed. A NBM is also an SBM, but in this case the SBM uses different priors. By implementing a flow of information from higher abstracted forms of the graph, a prominent problem of the nonhierarchical model can be reduced. The scale of detectable blocks in this approach is limited to √N.Including the hierarchical information flow, the detection minimum of nodes per cluster can be lowered to ~ log(N). Which is needed to find small structures in large scale graphs.

For all models a normal and a degree corrected model is compared and only the more precise model is used.

nbm NBM where each dot on the circle represents a vertex and the connections between the edges are represent by the lines between them.

For each of the models a graph is drew and the meta data of the graph are also written to the output directory for a later inspection.

The results of the models are saved to files named blockmember_.csv and the best fitting algorithm is saved also as blockmember.csv for the read-in of readcluster().

Graph.py

Clone this wiki locally