documentation
proost committed Jan 22, 2018
1 parent ce091a1 commit 22218f3
Showing 3 changed files with 65 additions and 2 deletions.
67 changes: 65 additions & 2 deletions docs/building/005_comparative_genomics.md
@@ -1,3 +1,66 @@
# Adding Gene Families and Trees
# Adding OrthoGroups with Trees and Gene Families

When building OrthoGroups and gene families for CoNekT, the easiest approach is
to add the species to the database first and then export the protein FASTA files
from CoNekT. This ensures that the sequence IDs in the clustering output match
those in the database.

## Importing OrthoGroups and Gene Families
Output from [OrthoFinder](https://github.com/davidemms/OrthoFinder) and (tribe)MCL can be imported directly. Add a fitting
description, select the type of data you wish to import, and select the file. Click
**Add Families** to upload the file and create the gene families in the database.

For OrthoFinder, select Orthogroups.txt from the output. For (tribe)MCL, pick the
file with the final output (all members of a gene family on one line).

![add_gf](../images/add_gf.png)
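
To check a file before uploading it, the two layouts can be read with a few lines of Python. This is only a sketch, not CoNekT's importer: it assumes OrthoFinder lines look like `OG0000000: gene1 gene2 ...` and that MCL lines hold whitespace-separated members with no family name. The function name `read_families` and the generated `MCL_000001` style names are hypothetical.

```python
def read_families(lines, kind):
    """Return {family_name: [member, ...]} for kind 'orthofinder' or 'mcl'."""
    families = {}
    mcl_index = 0
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip empty lines
        if kind == "orthofinder":
            # Assumed layout: "OG0000000: geneA geneB geneC"
            name, _, members = line.partition(":")
            families[name.strip()] = members.split()
        elif kind == "mcl":
            # Assumed layout: one family per line, members separated by whitespace.
            # MCL lines carry no name, so a hypothetical one is generated here.
            mcl_index += 1
            families[f"MCL_{mcl_index:06d}"] = line.split()
        else:
            raise ValueError(f"unknown kind: {kind}")
    return families


if __name__ == "__main__":
    # Made-up IDs, for illustration only.
    print(read_families(["OG0000000: a1 b1 c1"], "orthofinder"))
    print(read_families(["a1\tb1", "c1"], "mcl"))
```

Comparing the member IDs returned here against the IDs exported from CoNekT is a quick way to spot mismatches before importing.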

## Importing Trees

[OrthoFinder](https://github.com/davidemms/OrthoFinder)'s phylogenetic trees can
be imported into CoNekT. To do so, **first create a gzip file** containing all the
trees. You will also need to locate the file **SequenceIDs.txt**, which is
used to convert OrthoFinder's internal IDs back to CoNekT's.
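
To illustrate what this conversion involves, here is a minimal sketch; it is not CoNekT's own code. It assumes each line of SequenceIDs.txt has the form `0_0: <original FASTA header>` and that leaf labels in the tree are those internal IDs. The helper names `read_sequence_ids` and `relabel_newick` are hypothetical.

```python
import re


def read_sequence_ids(lines):
    """Map internal IDs (e.g. '0_0') to the first word of the original header."""
    mapping = {}
    for raw in lines:
        line = raw.strip()
        if ": " not in line:
            continue  # skip empty or unexpected lines
        internal, _, header = line.partition(": ")
        mapping[internal] = header.split()[0]
    return mapping


def relabel_newick(newick, mapping):
    """Replace internal IDs of the form <digits>_<digits> with mapped names."""
    pattern = re.compile(r"\b\d+_\d+\b")
    # IDs missing from the mapping are left unchanged.
    return pattern.sub(lambda m: mapping.get(m.group(0), m.group(0)), newick)


if __name__ == "__main__":
    # Made-up IDs, for illustration only.
    ids = read_sequence_ids(["0_0: geneA some description", "1_3: geneB"])
    print(relabel_newick("((0_0:0.1,1_3:0.2):0.05);", ids))
```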

![add_trees](../images/add_trees.png)

First, select the **OrthoFinder families** you wish to add trees to. Next, **add a
description**, and finally select the **gzip file** with all trees and
**SequenceIDs.txt**.

Currently, adding trees to other types of gene families is not supported.
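
The text above only says "gzip file". The sketch below assumes this means a gzip-compressed tar archive (`.tar.gz`), since a single archive has to hold many tree files; verify this against your CoNekT version before relying on it. The function name `bundle_trees`, the `*.txt` pattern and the directory layout are hypothetical.

```python
import tarfile
from pathlib import Path


def bundle_trees(tree_dir, archive_path, pattern="*.txt"):
    """Write every file in tree_dir matching pattern into one .tar.gz archive."""
    tree_files = sorted(Path(tree_dir).glob(pattern))
    with tarfile.open(str(archive_path), "w:gz") as archive:
        for tree_file in tree_files:
            # arcname drops the directory part so the archive stays flat.
            archive.add(str(tree_file), arcname=tree_file.name)
    # Return the bundled file names so the caller can check nothing is missing.
    return [tree_file.name for tree_file in tree_files]
```

A call such as `bundle_trees("path/to/trees", "trees.tar.gz")` (paths are placeholders) returns the list of bundled file names, which can be compared with the number of families you expect to have trees for.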

## Adding Clades

For clades to be detected correctly, clade definitions need to be stored in the
database via 'Add' -> 'Clades'. This is done using a JSON object structured like the
example here:

```json
{
  "Arabidopsis": {
    "species": ["ath"],
    "tree": null
  },
  "Poplar": {
    "species": ["ptr"],
    "tree": null
  },
  "Rice": {
    "species": ["osa"],
    "tree": null
  },
  "Rosids": {
    "species": ["ptr", "ath"],
    "tree": "(ptr:0.01, ath:0.01);"
  },
  "Angiosperms": {
    "species": ["ptr", "ath", "osa"],
    "tree": "((ptr:0.03, ath:0.03):0.01, osa:0.04);"
  }
}
```

**Dictionary keys** are the clade names. Within each entry you have to specify two
things: **species**, an array with the short names of the species in that
clade, and **tree**, a newick tree of that clade (`null` for the single-species
clades in the example above).
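
Before pasting a clade definition into the form, it can help to check that it is internally consistent. The following is a rough sketch rather than a check CoNekT itself performs: it assumes the leaf labels of each newick tree should be exactly the short names listed under `species`. The function names `newick_leaves` and `validate_clades` are hypothetical.

```python
import json
import re


def newick_leaves(newick):
    """Return the leaf labels of a newick string (labels right after '(' or ',')."""
    return set(re.findall(r"[(,]\s*([^\s():,;]+)", newick))


def validate_clades(clades):
    """Return a list of problems; an empty list means no inconsistency was found."""
    problems = []
    for name, clade in clades.items():
        species = clade.get("species")
        if not isinstance(species, list) or not species:
            problems.append(f"{name}: 'species' must be a non-empty list")
            continue
        if "tree" not in clade:
            problems.append(f"{name}: missing 'tree'")
            continue
        tree = clade["tree"]
        if tree is None:
            continue  # no tree given, as for the single-species clades
        if newick_leaves(tree) != set(species):
            problems.append(f"{name}: tree leaves do not match 'species'")
    return problems


if __name__ == "__main__":
    text = '{"Rosids": {"species": ["ptr", "ath"], "tree": "(ptr:0.01, ath:0.01);"}}'
    print(validate_clades(json.loads(text)))
```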

*UNDER CONSTRUCTION*
Binary file added docs/images/add_gf.png
Binary file added docs/images/add_trees.png
