Updated documentation
Klaus Thoden committed Aug 1, 2018
1 parent 6a09b63 commit 03f2727
Showing 1 changed file with 46 additions and 22 deletions.
68 changes: 46 additions & 22 deletions README.md
@@ -1,10 +1,24 @@
# EOASkripts
This is a set of scripts which forms the central part of the conversion workflow in the Edition Open Access.
This is a set of scripts which forms the central part of the
conversion workflow in the Edition Open Access.

The first part uses
[tralics](https://www-sop.inria.fr/marelle/tralics/) to convert the
TeX source to XML. The original DocBook output is enriched by various
EOA-specific elements.
We currently accept and support manuscripts in two different formats:
LaTeX and DocX (as used in Microsoft Word).

![The EOA workflow](doc/eoa_intermediate_workflow.png)

## The LaTeX workflow
The LaTeX workflow is based on a reduced set of LaTeX commands which
are defined in a preamble and help keep the book production
workflow consistent. A sample project is found at
<https://github.molgen.mpg.de/EditionOpenAccess/eoa_sample_project>.

The PDF version is created directly with `xelatex`.

For the creation of the other formats,
[tralics](https://www-sop.inria.fr/marelle/tralics/) is used to
convert the TeX source to XML. The original DocBook output is enriched
by various EOA-specific elements.

This intermediate XML file is subsequently used by three additional
programs which turn it into TEI-XML, EPUB and Django-XML,
@@ -16,7 +30,25 @@ The EPUB files can be put together to form an ebook. The script

The conversion to TEI is still a work in progress.

![The EOA workflow](doc/eoa_intermediate_workflow.png)
## The DocX workflow
This workflow is based on Microsoft Word documents which are created
following the guidelines of a template found at
<http://edition-open-access.de/media/support/files/EOA_Word_Template.docx>.
Currently, the webservice at <http://www.tei-c.org/oxgarage/#> is used
to convert it into TEI P5.

As in the LaTeX workflow, we require the authors to hand in their
bibliographic references in a database format, such as BibTeX. The
Word template explains in detail how citations should be entered.

The script `fix_tei` corrects some artifacts of the oxgarage
conversion and expands the shorthand codes for references and figures
to XML tags.

After that, a PDF document can be obtained by using an XSL script to
create a LaTeX file, or the TEI file can be converted into the
customized DocBook format of the LaTeX workflow above so that the
existing tools can be used.
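
To illustrate the kind of expansion `fix_tei` performs, here is a minimal sketch. The actual shorthand codes are defined in the Word template and are not reproduced in this README, so both the `[cite:key]` pattern and the resulting tag are assumptions made for illustration only; this is not the script's real code.

```python
import re

# Hypothetical shorthand such as "[cite:Einstein1905]". The real codes are
# defined in the EOA Word template and may look different.
SHORTHAND = re.compile(r"\[cite:(?P<key>[A-Za-z0-9_:-]+)\]")


def expand_citations(text: str) -> str:
    """Replace each assumed shorthand code with an illustrative XML tag."""
    # \g<key> inserts the citation key captured by the named group.
    return SHORTHAND.sub(r'<bibl><ref target="#\g<key>"/></bibl>', text)


if __name__ == "__main__":
    print(expand_citations("As argued in [cite:Einstein1905], ..."))
```

A regular expression is enough for this sketch; a real implementation would more likely walk the parsed TEI tree so that codes inside attributes or comments are left alone.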

## Exemplary workflow

@@ -30,12 +62,7 @@ in your TeX distribution) and `xelatex` two more times. This will give
you the PDF version of the document.

Next, comment out line 9 in `EOASample.tex` (the EOA preamble) and
uncomment line 10 (the XML preamble) and run the older version of
biber (biber v2.1).

biber_2.1 EOASample

Now, you are ready to run `eoaconvert.py`:
uncomment line 10 (the XML preamble) and run `eoaconvert.py`:

eoaconvert.py -f EOASample

@@ -48,15 +75,12 @@ If everything went well, you can also try and run
These scripts don't take any arguments and will produce output in the
`CONVERT` directory.
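
If the conversion step is to be scripted, the `eoaconvert.py -f EOASample` call from the example could be wrapped roughly as follows. This is a sketch under the assumption that `eoaconvert.py` is on the `PATH`; the helper names `eoaconvert_command` and `run_conversion` are hypothetical and not part of EOASkripts.

```python
import subprocess
from typing import List


def eoaconvert_command(basename: str) -> List[str]:
    """Build the argument list for the conversion call shown in the example."""
    return ["eoaconvert.py", "-f", basename]


def run_conversion(basename: str, dry_run: bool = True) -> List[str]:
    """Return the command and, unless dry_run is set, execute it."""
    cmd = eoaconvert_command(basename)
    if not dry_run:
        # check=True raises CalledProcessError if the script exits non-zero.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Dry run by default: only print what would be executed.
    print(" ".join(run_conversion("EOASample")))
```

Defaulting to a dry run keeps the sketch safe to try on a machine where the scripts are not installed yet.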

# Convert TEI to EOADjango #

Suite of functions to get from TEI-encoded XML into the workflow of Edition Open Access. The main output file is an XML file called `IntermediateXML.xml` which can subsequently be processed with `tralics2django`, a tool found in the `EOASkripts` repository.

Code written in Python3.

External dependencies
---------------------
- lxml
- BeautifulSoup
- pandoc
- pandoc-citeproc
See INSTALL.md for details.

- lxml (<https://pypi.org/project/lxml/>)
- bibtexparser (<https://pypi.org/project/bibtexparser/>)
- BeautifulSoup (<https://pypi.org/project/bs4/>)
- pandoc (<https://pandoc.org/>)
- pandoc-citeproc (<https://hackage.haskell.org/package/pandoc-citeproc>)
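
A quick way to see which of these dependencies are present is sketched below. It assumes the usual import names (`lxml`, `bibtexparser`, and `bs4` for BeautifulSoup) and that `pandoc` and `pandoc-citeproc` are installed as executables on the `PATH`; the function name `check_dependencies` is hypothetical. INSTALL.md remains the authoritative source for installation.

```python
import importlib.util
import shutil
from typing import Dict

# Import names of the Python packages listed above (assumed, see lead-in).
PYTHON_MODULES = ["lxml", "bibtexparser", "bs4"]
# External programs expected on the PATH.
EXECUTABLES = ["pandoc", "pandoc-citeproc"]


def check_dependencies() -> Dict[str, bool]:
    """Map each dependency name to True if it can be located."""
    status = {}
    for module in PYTHON_MODULES:
        # find_spec returns None when a top-level module is not installed.
        status[module] = importlib.util.find_spec(module) is not None
    for exe in EXECUTABLES:
        # shutil.which returns None when the program is not on the PATH.
        status[exe] = shutil.which(exe) is not None
    return status


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```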
