Improving documentation
proost committed Jan 13, 2018
1 parent d23892e commit 6352e7b
Showing 6 changed files with 166 additions and 138 deletions.
File renamed without changes
4 changes: 4 additions & 0 deletions conekt/models/gene_families.py
@@ -179,6 +179,10 @@ class GeneFamily(db.Model):
interpro_domains = db.relationship('Interpro', secondary=family_interpro, lazy='dynamic')
xrefs = db.relationship('XRef', secondary=family_xref, lazy='dynamic')

# Other properties
# go_annotations from .relationships.family_go FamilyGOAssociation
# interpro_annotations from .relationships.family_intpro FamilyInterproAssociation

def __init__(self, name):
self.name = name

21 changes: 21 additions & 0 deletions docs/building/001_GO_InterPro_domains.md
@@ -0,0 +1,21 @@
# Adding GO term and InterPro domain definitions

Descriptions for GO terms and InterPro domains must be added before any
functional annotation, so complete this step first. In the top menu
click on 'Add' and select 'Functional Data'.

![Add functional data](../images/add_functional_data.png "Adding functional data")

The GO descriptions can be obtained from the Gene Ontology Consortium's
official website in OBO format [here](http://geneontology.org/page/download-ontology).
InterPro domains and descriptions (called the **Entry list**) are found on EBI InterPro's download pages [here](https://www.ebi.ac.uk/interpro/download.html).
Decompress the .gz file prior to uploading.
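
If no command-line tool is at hand, the decompression step can be done with
Python's standard library. This is only a sketch: the function name
`decompress_gz` and the file name `entry.list.gz` are illustrative
assumptions, not something CoNekt provides or requires.

```python
import gzip
import shutil
from pathlib import Path


def decompress_gz(src, dst=None):
    """Decompress a .gz file and return the path of the decompressed copy."""
    src = Path(src)
    # Default output name: drop the trailing ".gz" (entry.list.gz -> entry.list)
    dst = Path(dst) if dst is not None else src.with_suffix("")
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        # Stream the data so large files are never loaded into memory at once
        shutil.copyfileobj(fin, fout)
    return dst


if __name__ == "__main__":
    # Hypothetical file name; use whatever the download was saved as.
    if Path("entry.list.gz").exists():
        print(decompress_gz("entry.list.gz"))
```

The decompressed file is the one to select in the upload form.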

Click the buttons on the page and select the corresponding files. Next,
click 'Add functional data' to upload the files to your server and
import them into the database. This process can take some time; do not
close the browser window.

**Note: The existing tables will be cleared before adding the new
definitions. Do not update this information if GO/InterPro data has
already been added to a species!**
72 changes: 72 additions & 0 deletions docs/building/002_species_functional_data.md
@@ -0,0 +1,72 @@
# Adding a new species and functional data

Adding a species requires multiple steps. Follow the steps below for
each species.

## Adding the species and sequences

On the 'Admin panel', under 'Add' select 'Species'. Fill in the
full scientific name (or the name you wish to appear on the website) for
the species and select a 'three letter code' that is unique for the
species. We recommend a combination of genus and species, e.g. **H**omo
**sa**piens = hsa. While three characters are recommended, this is
not required; longer codes are possible.

Some visualizations require a color specific to each species. These can
be entered using the controls below (clicking the box opens a color
picker, so there is no need to enter hex values manually).

Finally, select a fasta file with **coding sequences**. Each fasta
header must contain the gene name and nothing else. These are
the names genes will receive on the website.

```
>Gene1
ATG...
>Gene2
ATG...
```

To upload the data and add the species to the database, click 'Add species'.
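
Before uploading, it can help to check that every fasta header holds a bare
gene name. The sketch below is not part of CoNekt; `check_fasta_headers` is
a hypothetical helper, and it assumes that "only the name" means the header
contains no whitespace (and therefore no trailing description).

```python
def check_fasta_headers(lines):
    """Return (line_number, header) pairs for headers that are not a bare gene name."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.startswith(">"):
            continue  # sequence line, nothing to check
        header = line[1:].rstrip("\r\n")
        # Assumption: a bare gene name is non-empty and contains no whitespace
        if not header or any(ch.isspace() for ch in header):
            problems.append((number, header))
    return problems


if __name__ == "__main__":
    example = [">Gene1\n", "ATGAAA\n", ">Gene2 some description\n", "ATGCCC\n"]
    for number, header in check_fasta_headers(example):
        print(f"line {number}: unexpected header {header!r}")
```

For a real file, pass an open file handle instead of the example list; an
empty result means no header was flagged.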

## Adding descriptions to sequences

In the 'Admin panel', go to 'Add' -> 'Sequence descriptions'.

Select the **species** and a tab-delimited file with one gene per line: the gene ID followed by
its description (example below). Click **Add descriptions** to upload the file and add the
descriptions to the database.

```
gene01 gene01 description
gene02 gene02 description
gene03 gene03 description
...
```

Note: This step can be very time-consuming if Whooshee indexing is enabled in your config! When
setting up a database with multiple species, consider disabling indexing while building the DB, then
enabling it and rebuilding the index afterwards (found under controls in the admin panel).
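
If the descriptions come from another source (for example a spreadsheet or
an annotation pipeline), the two-column file can be produced with a few
lines of Python. This is a minimal sketch; `format_description_lines` and
the output file name are hypothetical, and collapsing internal whitespace is
a precaution assumed here rather than a documented CoNekt requirement.

```python
def format_description_lines(descriptions):
    """Yield 'gene_id<TAB>description' lines from a {gene_id: description} mapping."""
    for gene_id, description in descriptions.items():
        # Tabs or line breaks inside a description would break the two-column layout,
        # so collapse every run of whitespace into a single space.
        clean = " ".join(description.split())
        yield f"{gene_id}\t{clean}"


if __name__ == "__main__":
    example = {
        "gene01": "gene01 description",
        "gene02": "gene02 description",
    }
    # Hypothetical output file name
    with open("descriptions.tsv", "w") as handle:
        for line in format_description_lines(example):
            handle.write(line + "\n")
```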

## Adding functional annotation to sequences

**GO** terms can be imported from tab delimited files, containing 3
columns: gene name, GO label and the evidence tag.

```
Gene1 GO:0004621 IEA
Gene1 GO:0004622 IEA
Gene2 GO:0000227 ISS
...
```

On the 'Admin panel', under 'Add' -> 'GO Genes', such a file can be
uploaded for a species. Additionally, a description of where the GO
terms in the file originate from (the source) needs to be provided.

![GO gene](../images/add_go_gene.png "Adding GO terms for a species")

**InterPro** domains can be imported directly from [InterProScan](http://www.ebi.ac.uk/interpro/download.html) output.
To do so, in the 'Admin panel', under 'Add' -> 'InterPro genes', select a species, select the file and click 'Add InterPro'.

![InterPro gene](../images/add_interpro_gene.png "Adding InterPro terms for a species")
60 changes: 60 additions & 0 deletions docs/building/003_expression_profiles.md
@@ -0,0 +1,60 @@
# Adding expression data

Expression data should be processed using [LSTrAP](https://github.molgen.mpg.de/proost/LSTrAP).
This will generate the expression matrix, coexpression networks and
clusters that can be directly imported into CoNekt. Note that in
some cases additional files, containing meta information, need to be
provided.

## Adding expression profiles

In the 'Admin panel', go to 'Add' -> 'Expression profiles'. Select the
species and the source (currently only LSTrAP expression matrices are supported).

Next, select the expression matrix (generated using LSTrAP). Using a
normalized (TPM or RPKM) matrix is strongly recommended!

Furthermore, two additional files need to be provided. The first links the
run identifiers to specific conditions. This tab-delimited file should
be structured as indicated below: a one-line header (which is ignored)
and two columns, the first with the sample ID and the second with a short
description of the condition sampled. Samples with the same description
will be treated as replicates! Omitting the condition description will
exclude the sample from the profiles.


```
SampleID ConditionDescription
SRR068987 Endosperm
SRR314813 Seedlings, 11 DAG
SRR314814
SRR314815 Flowers (floral buds)
SRR314816
...
```
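
To preview how such a file will be interpreted, the rules above (header
ignored, identical descriptions treated as replicates, samples without a
description excluded) can be mimicked in a few lines. This is a sketch of
the described behavior, not CoNekt's actual import code, and
`group_samples_by_condition` is a hypothetical name.

```python
def group_samples_by_condition(lines):
    """Map each condition description to the list of sample IDs that share it."""
    groups = {}
    rows = iter(lines)
    next(rows, None)  # the one-line header is ignored
    for line in rows:
        fields = line.rstrip("\r\n").split("\t")
        sample_id = fields[0].strip()
        condition = fields[1].strip() if len(fields) > 1 else ""
        if not sample_id or not condition:
            continue  # samples without a description are left out
        groups.setdefault(condition, []).append(sample_id)
    return groups


if __name__ == "__main__":
    example = [
        "SampleID\tConditionDescription\n",
        "SRR068987\tEndosperm\n",
        "SRR314813\tSeedlings, 11 DAG\n",
        "SRR314814\n",
        "SRR314815\tFlowers (floral buds)\n",
    ]
    for condition, samples in group_samples_by_condition(example).items():
        print(condition, samples)
```

Conditions that end up with a single sample have no replicates, and sample
IDs missing from the result will not appear in the profiles.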

For profile plots on the website, a custom order of conditions is most
likely preferred (we usually order tissues from bottom to top). A file to
specify this needs to be provided, with the conditions stated in the
order in which they should appear in the plot.
Furthermore, a color for each condition in the plot needs to be added in
rgba() format. See the example below.

```
Roots (apex), 7 DAG rgba(153, 51, 0, 0.5)
Roots (differentation zone), 4 DAP rgba(153, 51, 0, 0.5)
Roots (elongation zone), 4 DAP rgba(153, 51, 0, 0.5)
Roots (meristematic zone), 4 DAP rgba(153, 51, 0, 0.5)
Roots (QC cells), 6 DAS rgba(153, 51, 0, 0.5)
Roots (stele cells), 7 DAS rgba(153, 51, 0, 0.5)
Roots (tip) rgba(153, 51, 0, 0.5)
Leaves (rosette), 21 DAG rgba(0, 153, 51, 0.5)
Leaves (rosette), 29 DAG rgba(0, 153, 51, 0.5)
...
```
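
The order and color file can be checked with a small parser before
uploading. This sketch makes two assumptions that the text above does not
state: the condition and the color are separated by a tab, and the color
components follow the usual rgba() ranges (0 to 255 for red, green and blue,
0 to 1 for alpha). `parse_condition_colors` is a hypothetical helper, not a
CoNekt function.

```python
import re

# rgba(r, g, b, a) with integer color channels and a decimal alpha value
RGBA = re.compile(
    r"^rgba\(\s*(\d{1,3})\s*,\s*(\d{1,3})\s*,\s*(\d{1,3})\s*,\s*(\d*\.?\d+)\s*\)$"
)


def parse_condition_colors(lines):
    """Return [(condition, (r, g, b, a)), ...] in file order; raise ValueError on bad lines."""
    ordered = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # ignore blank lines
        # Assumption: a tab separates the condition from its color
        condition, sep, color = line.partition("\t")
        match = RGBA.match(color.strip())
        if not sep or match is None:
            raise ValueError(f"line {number}: expected 'condition<TAB>rgba(r, g, b, a)'")
        red, green, blue = (int(value) for value in match.groups()[:3])
        alpha = float(match.group(4))
        if max(red, green, blue) > 255 or alpha > 1:
            raise ValueError(f"line {number}: color component out of range")
        ordered.append((condition.strip(), (red, green, blue, alpha)))
    return ordered


if __name__ == "__main__":
    example = ["Roots (tip)\trgba(153, 51, 0, 0.5)\n"]
    print(parse_condition_colors(example))
```

Because the list preserves file order, it also shows the order in which the
conditions would be plotted.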

Once all files are selected, click 'Add Expression Profiles' to upload the
data and add everything to the database.


![Add expression profiles](../images/add_expression_profiles.png)
147 changes: 9 additions & 138 deletions docs/building_conekt.md
@@ -3,7 +3,8 @@
## Using the admin panel to build CoNekt

Make sure *LOGIN_ENABLED=True* in *config.py* and the database was built
with an admin account. Next, go to the website, log in and (once logged
with an admin account (check [here](install_linux.md) for instructions
on how to add an admin account). Next, go to the website, log in and (once logged
in) click on the username (admin) in the top right corner. Select 'Admin
Panel' from the drop-down menu.

@@ -13,148 +14,18 @@ Panel' from the drop-down menu.
The Admin Panel will welcome you with a large warning. Deleting data,
overwriting or changing entries here can ruin a carefully set up
database. Make sure to read instructions on pages and this documentation
to avoid losing work. When working with an existing database, make sure
to back up the database before proceeding.
to avoid losing work.

### Adding GO terms and InterPro domains
**When working with an existing database, make sure
to back up the database before proceeding.**

Descriptions for GO terms and InterPro domains should be added before
adding functional annotation. This step should be completed first. In
the top menu click on 'Add' and select 'Functional Data'.

![Add functional data](./images/add_functional_data.png "Adding functional data")
Step-by-step instructions

The GO descriptions can be obtained from the Gene Ontology Consortium's
official website in OBO format [here](http://geneontology.org/page/download-ontology).
InterPro domains and descriptions (called the **Entry list**) are found on EBI InterPro's download pages [here](https://www.ebi.ac.uk/interpro/download.html),
decompress the .gz file prior to uploading.
* [Adding GO term and InterPro domain definitions](./building/001_GO_InterPro_domains.md)
* [Adding a new species and functional data](./building/002_species_functional_data.md)
* [Adding expression profiles](./building/003_expression_profiles.md)

Click the buttons on the page and select the corresponding files, next
click 'Add functional data' to upload the files to your server and
import them in the database. This process can take some time, do not
close the browser window.

**Note: The existing tables will be cleared before adding the new
definitions. Do not update this information if GO/InterPro data is
already added to species !**


### Adding a new species and functional data

Adding a species requires multiple steps, follow the steps below for
each species.

#### Adding the species and sequences

On the 'Admin panel', under 'Add' select 'Species'. Fill in the
full scientific name (or the name you wish to appear on the website) for
the species and select a 'three letter code', that is unique for the
species (we recommend a combination of genus and species cfr. **H**omo
**sa**piens = hsa, note that while three characters is recommended it is
not required, longer codes are possible).

Some visualizations require a color specific for each species, these can
be entered using the controls below (clicking the box opens a color
picker, there is no need to manually add in hex values).

Finally, select a fasta file with **coding sequences**, in the fasta
headers the gene name (and only the name) needs to be present. These are
the names genes will receive on the website.

```
>Gene1
ATG...
>Gene2
ATG...
```

To upload the data and add the species to the database click 'Add species'



#### Adding functional Annotation to sequences

**GO** terms can be imported from tab delimited files, containing 3
columns: gene name, GO label and the evidence tag.

```
Gene1 GO:0004621 IEA
Gene1 GO:0004622 IEA
Gene2 GO:0000227 ISS
...
```

On the 'Admin panel', under 'Add' -> 'GO Genes' such a file can be
uploaded, for a species. Additionally a description needs to be provided
from where the GO terms in the file originate from (the source).

![GO gene](./images/add_go_gene.png "Adding GO terms for a species")

**InterPro** domains can be imported directly from [InterProScan](http://www.ebi.ac.uk/interpro/download.html) output.
To do so, in the 'Admin panel', under 'Add' -> 'InterPro genes' select a species, select the file and click 'Add InterPro'

![InterPro gene](./images/add_interpro_gene.png "Adding InterPro terms for a species")

### Adding expression data

Expression data should be processed using [LSTrAP](https://github.molgen.mpg.de/proost/LSTrAP),
this will generate the expression matrix, coexpression networks and
clusters that can be directly imported into CoNekt. Note that in
some cases additional files, containing meta information, need to be
provided.

#### Adding expression profiles

In the 'Admin panel', under 'Add' -> 'Expression profiles'. Select the
species and the source (currently only LSTrAP expression matrices are supported).

Next, select the expression matrix (generated using LSTrAP). Using a
normalized (TPM or RPKM) matrix is strongly recommended !

Furthermore two additional files need to be provided, one that links the
run identifiers to specific conditions. This tab delimited file should
be structured as indicated below, a one-line header (which is ignored)
and two columns, the first with the sample ID and the second with a short
description of the condition sampled. Samples with the same description
will be treated as replicates ! Omitting the condition description will
exclude the sample from the profiles.


```
SampleID ConditionDescription
SRR068987 Endosperm
SRR314813 Seedlings, 11 DAG
SRR314814
SRR314815 Flowers (floral buds)
SRR314816
...
```

For profile plots on the website most likely a custom order of conditions
is preferred. (We usually order tissues from bottom to top) A file to
specify this needs to be provided, conditions need to be stated in the
order they should appear in the plot.
Furthermore a color for that condition in the plot needs to be added in
rgba() format. See the example below.

```
Roots (apex), 7 DAG rgba(153, 51, 0, 0.5)
Roots (differentation zone), 4 DAP rgba(153, 51, 0, 0.5)
Roots (elongation zone), 4 DAP rgba(153, 51, 0, 0.5)
Roots (meristematic zone), 4 DAP rgba(153, 51, 0, 0.5)
Roots (QC cells), 6 DAS rgba(153, 51, 0, 0.5)
Roots (stele cells), 7 DAS rgba(153, 51, 0, 0.5)
Roots (tip) rgba(153, 51, 0, 0.5)
Leaves (rosette), 21 DAG rgba(0, 153, 51, 0.5)
Leaves (rosette), 29 DAG rgba(0, 153, 51, 0.5)
...
```

If all files are selected click 'Add Expression Profiles' to upload the
data and add everything to the database.


![Add expression profiles](./images/add_expression_profiles.png)


#### Adding coexpression network