Validation of Metadata Documents

Purpose

After generating a new Process file, you should check that the document adheres to the DEEP Process specification defined in the XML schema file template/deep_process_schema.xsd. The small Python validation script mdvalid.py performs this check with little to no extra effort.
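For orientation, checking an XML document against an XSD schema with lxml can be sketched as follows. This is a minimal illustration and not the code of mdvalid.py; the function name validate_process is hypothetical, and the sketch assumes lxml is installed.

```python
def validate_process(schema_path, process_path):
    """Return (is_valid, messages) for an XML file checked against an XSD.

    Hypothetical helper for illustration; mdvalid.py may work differently.
    """
    # lxml is a third-party package; it is imported here so that a missing
    # installation only surfaces when validation is actually attempted.
    from lxml import etree

    schema = etree.XMLSchema(etree.parse(schema_path))
    document = etree.parse(process_path)
    is_valid = schema.validate(document)
    # The error log lists every schema violation with its line number.
    messages = ["line {}: {}".format(entry.line, entry.message)
                for entry in schema.error_log]
    return is_valid, messages
```

Calling validate_process("template/deep_process_schema.xsd", "process_file.xml") would return a boolean together with a list of human-readable violation messages.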

Installation

The script itself requires no installation or configuration and works with Python 2.7+ or Python 3.2+. It uses the lxml package, which is not part of the Python standard library; please install lxml as appropriate for your software environment. For your convenience, we provide two YAML files that set up a complete environment for Python 2 or Python 3 (whichever you prefer) using the Conda package manager.
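To confirm that lxml is importable in the interpreter you intend to use, a short check along these lines may help. It is only a sketch; the function name lxml_version is hypothetical and not part of mdvalid.py.

```python
import importlib


def lxml_version():
    """Return the installed lxml version as a string, or None if missing."""
    try:
        etree = importlib.import_module("lxml.etree")
    except ImportError:
        return None
    # LXML_VERSION is a tuple of integers such as (3, 6, 4, 0).
    return ".".join(str(part) for part in etree.LXML_VERSION)


if __name__ == "__main__":
    version = lxml_version()
    print("lxml " + version if version else "lxml is not installed")
```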

Attention

Note from: 2016-12-29

Due to dependency issues between the lxml package and Conda, lxml currently has to be installed from within the activated Conda environment. That is why the line - lxml=3.* is commented out in the Conda YAML files; see also Github issue 4093.

If you want to use the Conda package manager, you can create the environment as follows:

$ conda env create -f conda_py3_mdvalid.yml
## wait for setup to complete
$ source activate py3valid
## next line: workaround solution - 2016-12-29
$ pip install lxml
## wait for installation to complete
## run the script; see below
$ source deactivate

Execution

The script can be executed on a command line as follows:

$ python mdvalid.py [Options]

Please read the help for details:

$ python mdvalid.py --help

The Python code has been developed on a Debian Linux system:

x86_64 Linux (Debian 7.5 "Wheezy")

with Python versions

Python 2.7.3 (lxml.etree 2.3.2)
Python 3.2.3 (lxml.etree 2.3.2)

Note that this test setup refers to the released version. lxml 3.6.4, the current version at the time of writing, works as well.

Examples:

Validate a process XML:

$ python mdvalid.py --schema schema_file.xsd --process process_file.xml

Validate a process XML and a corresponding analysis metadata file:

$ python mdvalid.py --schema schema_file.xsd --process process_file.xml --analysis analysis_file.amd.tsv

Validate many analyses of the same type:

$ python mdvalid.py --schema schema_file.xsd --process process_file.xml --analysis file1.amd.tsv file2.amd.tsv file3.amd.tsv

In case of errors, you can get more information by running the script in debug mode:

$ python mdvalid.py --debug [Options]
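When many files need to be checked, the command lines shown above can also be assembled from a small Python helper. The sketch below uses only the options that appear in this document (--schema, --process, --analysis, --debug); the function name build_mdvalid_command is hypothetical, and the helper assumes that mdvalid.py is located in the current working directory.

```python
import sys


def build_mdvalid_command(schema, process, analyses=(), debug=False):
    """Assemble the argument list for one mdvalid.py invocation.

    Hypothetical helper; assumes mdvalid.py is in the working directory.
    """
    command = [sys.executable, "mdvalid.py"]
    if debug:
        command.append("--debug")
    command += ["--schema", schema, "--process", process]
    if analyses:
        # All analysis files follow a single --analysis option.
        command.append("--analysis")
        command.extend(analyses)
    return command
```

The resulting list can be passed to subprocess.call, for example with the analysis files collected via sorted(glob.glob("*.amd.tsv")).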