UPDATE DATABASES
Help file for updating the DBs (MTI DBs, miRBase, UniProtKB) used by LimiTT.
Order of updating:
* MTI DBs
* miRBase
* run the "do_after_update.py" script
* UniProtKB
-----------
MTI DBs
-----------
All current DB files were downloaded, converted to tab-delimited text files and saved in the "files" folder.
During the LimiTT process, the script "db_call.py" accesses these files and filters the matching information out of them.
When updating one of the DBs, adjust the file name within the "db_call.py" script. If the columns have changed,
either change the column numbers within the script or rearrange the new file to match the layout of the old file.
!!!!!!!!!!!!!!!!!
AFTER updating one or more of the DBs (or miRBase), please run the script "do_after_update.py".
!!!!!!!!!!!!!!!!!
-- TarBase
miRNA-mimat, ensgid, gene-name, reporter_gene, nothern_blot, western_blot, qPCR, proteomics, microarray, sequencing, degradome_seq, other
Counting from 0, all columns except column 1 (ensgid) are used.
-- miRTarBase
miRTarBase ID, miRNA, Species (miRNA), Target Gene, Target Gene (Entrez Gene ID), Species (Target Gene), Experiments, Support Type, References (PMID)
Counting from 0, the columns 1 (miRNA), 6 (Experiments) and 8 (References (PMID)) are used.
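The column selection described above can be sketched in Python as follows. This is only an illustration and not the code of "db_call.py": the function name select_columns and the constant MIRTARBASE_COLUMNS are hypothetical, and the sketch assumes a tab-delimited file with one header line.

```python
import csv

# Column indices (counted from 0) as listed in this help file for miRTarBase.
MIRTARBASE_COLUMNS = (1, 6, 8)  # miRNA, Experiments, References (PMID)


def select_columns(handle, columns, skip_header=True):
    """Yield tuples holding only the requested columns of a tab-delimited file.

    `handle` is any iterable of text lines, e.g. an open file.
    """
    # QUOTE_NONE: treat quote characters in the DB files as ordinary text.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    if skip_header:
        next(reader, None)
    for row in reader:
        if len(row) <= max(columns):
            continue  # skip empty or truncated lines
        yield tuple(row[i] for i in columns)
```

If a new release changes the column order, only the index tuple would need to be adjusted in such a layout.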
-- miRecords
Pubmed_id, Target gene_species_scientific, Target gene_name, Target gene_Refseq_acc, Target site_number, miRNA_species, miRNA_mature_ID, miRNA_regulation, Reporter_target gene/region, Reporter link element, Test_method_inter, Target gene mRNA_level, Original description, Mutation_target region, Post mutation_method, Original description_mutation_region, Target site_position, miRNA_regulation_site, Reporter_target site, Reporter link element, Test_method_inter_site, Original description_inter_site, Mutation_target site, Post mutation_method_site, Original description_mutation_site, Additional note
Counting from 0, the columns 0 (Pubmed_id), 2 (Target gene_name), 5 (miRNA_species) and 6 (miRNA_mature_ID) are used.
-- starBase
The content of starBase could only be downloaded as separate files for each organism and each stringency (number of CLIP-Seq experiments supporting the MTI), resulting in 9 files with the following columns:
name, geneName, targetScanSites, picTarSites, RNA22Sites, PITASites, miRandaSites, CancerNum
Only the columns 0 (name) and 1 (geneName) are important.
Steps for updating starBase:
* Download each file and append the stringency information as "_stringX" to the file name (e.g. starBase_Human_xycx_string1.xls).
* Save all files in one folder that contains no other .xls files.
* Start the script "starBase_toFile.py".
This script concatenates all files into one text file and saves each entry once, with its highest stringency. Columns of the resulting file:
nameMiRNA, NameGene, Stringency
LimiTT will automatically add the experimental method, which is solely CLIP-Seq for starBase.
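The merging step can be sketched roughly as below. This is not the code of "starBase_toFile.py"; the function names are hypothetical, and the sketch assumes that the stringency is taken from the "_stringX" part of the file name and that each record has already been reduced to (miRNA name, gene name, stringency).

```python
import re

# Matches the "_stringX" suffix described in the update steps.
STRINGENCY_PATTERN = re.compile(r"_string(\d+)")


def stringency_from_name(file_name):
    """Return the stringency encoded in a file name such as "..._string1.xls"."""
    match = STRINGENCY_PATTERN.search(file_name)
    if match is None:
        raise ValueError("no _stringX suffix in file name: %s" % file_name)
    return int(match.group(1))


def merge_by_stringency(records):
    """Keep each (miRNA, gene) pair once, with its highest stringency.

    `records` is an iterable of (mirna, gene, stringency) tuples.
    Returns a list of (mirna, gene, stringency) sorted by miRNA and gene.
    """
    best = {}
    for mirna, gene, stringency in records:
        key = (mirna, gene)
        if key not in best or stringency > best[key]:
            best[key] = stringency
    return [(mirna, gene, s) for (mirna, gene), s in sorted(best.items())]
```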
--------
miRBase
--------
The content of miRBase is needed to convert the miRBase accession numbers used by TarBase into mature miRNA identifiers.
For that, the mature.fa file from http://www.mirbase.org/ftp.shtml was downloaded and converted to a Python dictionary (hash) with MIMAT accessions as keys and the corresponding miRNA identifiers as values.
* Download the mature.fa file
* Start the "miRBase_to_dict.py" script
* Run the script "do_after_update.py"
* Replace the old mimat_miRNA.dict file in the "files" folder with the new one
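A minimal sketch of the conversion is shown below. It is not the code of "miRBase_to_dict.py"; the function name is hypothetical, and the sketch assumes that each FASTA header in mature.fa starts with ">", followed by the miRNA identifier and then the MIMAT accession, separated by whitespace.

```python
def mimat_to_mirna(fasta_lines):
    """Build a dictionary MIMAT accession -> mature miRNA identifier.

    `fasta_lines` is any iterable of text lines, e.g. an open mature.fa file.
    Assumed header layout: ">identifier accession further description".
    """
    mapping = {}
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence lines are not needed for the mapping
        fields = line[1:].split()
        if len(fields) < 2:
            continue  # header without accession, skip it
        mirna_id, accession = fields[0], fields[1]
        mapping[accession] = mirna_id
    return mapping
```

It could be called as mimat_to_mirna(open("mature.fa")), after which mapping[accession] returns the identifier for a given MIMAT accession.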
--------
UniProt
--------
The whole content of UniProtKB (Swiss-Prot and TrEMBL) was downloaded and converted to a shortened list per entry. Columns:
Review status, Accession(s), Organism, Gene names, Protein names, EC number, Existence, GO-IDs, RefSeq, UniGene, Ensembl, GeneID, KEGG
Subsequently, this shortened file was used to create a database-like structure with Python dictionaries to enable the
mapping of
- all possible gene symbols to UniProtKB entries and thus to UniProt accessions (UniProtAcc)
- UniProtAcc to UniProtKB entries.
This "database-like structure" consists of three dictionary (or dictionary-like) structures with the following key -> value mappings:
- Gene Symbol -> ID(s) of UniProtKB entry
- UniProtAcc -> ID of UniProtKB entry
- ID of UniProtKB entry -> shortened UniProtKB entry
Only entries whose gene symbols, synonyms or cross-references can be linked to target symbols of
the MTI DBs were used. The files (entry.shelve, id.dict, uni.dict) are saved in the "files" folder.
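How the three mappings work together can be illustrated with plain Python dictionaries. This is a sketch only: the function names and all example keys and values are made up, and which of the three files holds which mapping is not stated here. The name entry.shelve suggests that the entry store is a shelve file, which would behave like a dictionary with string keys.

```python
# Made-up toy data, for illustration only.
SYMBOL_TO_IDS = {"GENE_A": ["1", "2"], "GENE_B": ["2"]}  # Gene Symbol -> entry ID(s)
ACC_TO_ID = {"ACC0001": "1", "ACC0002": "2"}             # UniProtAcc -> entry ID
ID_TO_ENTRY = {                                          # entry ID -> shortened entry
    "1": "reviewed\tACC0001\tGENE_A",
    "2": "unreviewed\tACC0002\tGENE_A GENE_B",
}


def entries_for_symbol(symbol, symbol_to_ids, id_to_entry):
    """Return all shortened entries linked to a gene symbol (empty list if none)."""
    ids = symbol_to_ids.get(symbol, [])
    return [id_to_entry[i] for i in ids if i in id_to_entry]


def entry_for_accession(accession, acc_to_id, id_to_entry):
    """Return the shortened entry for a UniProt accession, or None if unknown."""
    entry_id = acc_to_id.get(accession)
    if entry_id is None:
        return None
    return id_to_entry.get(entry_id)
```

Keeping the entries in a separate ID-keyed store means each entry is saved once, even when several gene symbols and accessions point to it.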
Steps for updating:
* If the MTI DBs or miRBase were updated before, be sure that "do_after_update.py" has finished.
* Download the zipped Swiss-Prot and TrEMBL .dat files from UniProt via FTP (.dat.gz files)
* Start the "unip_to_tab.py" script to create the shortened table
* Start the "uniProt_to_dict.py" script to create the database-like structured dictionaries
* Replace the old dictionary files in the "files" folder with the new ones
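For orientation, reading entries from a UniProtKB flat file (.dat) can be sketched as follows. This is not the code of "unip_to_tab.py"; the function name is hypothetical, and the sketch only extracts the entry name, the review status and the accessions. It assumes the usual flat-file layout: a two-letter line code, content starting at column 6, and "//" closing each entry.

```python
def iter_entries(lines):
    """Yield (entry_name, review_status, accessions) for each flat-file entry.

    `lines` is any iterable of text lines, e.g. a file opened with
    gzip.open(path, "rt") for a .dat.gz download.
    """
    entry_name, status, accessions = None, None, []
    for line in lines:
        code, content = line[:2], line[5:].strip()
        if code == "ID":
            parts = content.split()
            entry_name = parts[0]
            # The second field is the status, e.g. "Reviewed;".
            status = parts[1].rstrip(";") if len(parts) > 1 else None
        elif code == "AC":
            # Accessions are separated by ";" and may span several AC lines.
            accessions.extend(a.strip() for a in content.split(";") if a.strip())
        elif code == "//":
            if entry_name is not None:
                yield entry_name, status, accessions
            entry_name, status, accessions = None, None, []
```

Because the function is a generator, the large TrEMBL file can be processed entry by entry without loading it into memory.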