From 511a25ed07b340d91f5edc89ff47eed5c5e00281 Mon Sep 17 00:00:00 2001
From: EsGeh
Date: Tue, 12 Nov 2019 17:12:27 +0100
Subject: [PATCH] updated README

---
 INSTALL.md                 |  49 --------
 README.md                  | 101 ++++++++--------
 doc/COSMOS.md              |  74 ------------
 doc/XSL.md                 |  76 -------------
 doc/bibliography.md        |  25 ----
 doc/create_tmpbib.md       |  19 ----
 doc/datapickle.md          | 228 -------------------------------------
 doc/eoaclassic-workflow.md |  35 ------
 doc/eoadocx-workflow.md    |   6 -
 doc/fix_tei.md             |  33 ------
 doc/metadatamapping.md     |  32 ------
 doc/tei2imxml.md           |  13 ---
 12 files changed, 47 insertions(+), 644 deletions(-)
 delete mode 100644 INSTALL.md
 delete mode 100644 doc/COSMOS.md
 delete mode 100644 doc/XSL.md
 delete mode 100644 doc/bibliography.md
 delete mode 100644 doc/create_tmpbib.md
 delete mode 100644 doc/datapickle.md
 delete mode 100644 doc/eoaclassic-workflow.md
 delete mode 100644 doc/eoadocx-workflow.md
 delete mode 100644 doc/fix_tei.md
 delete mode 100644 doc/metadatamapping.md
 delete mode 100644 doc/tei2imxml.md

diff --git a/INSTALL.md b/INSTALL.md
deleted file mode 100644
index 586835d..0000000
--- a/INSTALL.md
+++ /dev/null
@@ -1,49 +0,0 @@
-# Installation instructions
-
-## Python Dependencies
-
-All programs require the installation of Python 3, which can be
-obtained at or through package
-managers (depending on the operating system). Current installations
-already include `pip` which is used to install further packages.
-
-    $ pip install -r requirements.txt
-
-to install the required packages.
- -## Runtime Dependencies - -Additionally, you will need these extra tools installed ( = in your systems $PATH ): -* [graphicsmagick](http://www.graphicsmagick.org/) -* [tralics](https://www-sop.inria.fr/marelle/tralics/) -* [pandoc](https://pandoc.org/) -- [pandoc-citeproc](https://hackage.haskell.org/package/pandoc-citeproc) -* curl -* from your latex distribution: - - - pdfcrop - - xelatex - -For a successful run of the whole conversion you will also need a -slightly older version (Version 2.1) of the biber tool. You can -download the source and/or binaries at -https://sourceforge.net/projects/biblatex-biber/files/biblatex-biber/2.1/ - -In order to run the xsl scripts found in the -[data](https://github.molgen.mpg.de/EditionOpenAccess/EOASkripts/tree/master/data) -directory, an XSL processor like -[Saxon](http://saxon.sourceforge.net/) is required. See -[doc/XSL.md](https://github.molgen.mpg.de/EditionOpenAccess/EOASkripts/blob/master/doc/XSL.md) -for details. - -In later versions, the file format was changed. We still need to adapt -to the new format. - -## TEI components -Clone the repositories -- https://github.com/TEIC/TEI.git -- https://github.com/TEIC/Stylesheets.git - -`Stylesheets` contains the tools for conversion. - -See for further documentation. diff --git a/README.md b/README.md index 395501c..694e380 100644 --- a/README.md +++ b/README.md @@ -7,76 +7,69 @@ LaTeX and DocX (as used in Microsoft Word). ![The EOA workflow](doc/eoa_intermediate_workflow.png) -## The LaTeX workflow -The LaTeX workflow is based on a reduced set of LaTeX commands which -are defined in a preambel and help keeping the book production -workflow consistent. A sample project is found at -. +# Installation -The PDF version is created directly with `xelatex`. +In order to provide a consistent environment we are using docker. +A set of python scripts is used for automatisation. +Installing and running the project without Docker or the scripts is possible, but not recommended. 
In this case you are on your own. -For the creation of the other format, -[tralics](https://www-sop.inria.fr/marelle/tralics/) is used to -convert the TeX source to XML. The original DocBook output is enriched -by various EOA-specific elements. +## Prerequisites -This intermediate XML file is subsequently used by three additional -programs which turn it into TEI-XML, EPUB and Django-XML, -respectively. The Django-XML format is ingested into the database of -the EOA site where it will show up as an online publication. +- Python 3 +- Docker, Docker Compose -The EPUB files can be put together to form an ebook. The script -[data/misc/epub.sh](https://github.molgen.mpg.de/EditionOpenAccess/EOASkripts/blob/master/data/misc/epub.sh) performs the required steps. +## Initialise the Repository -The conversion to TEI is still work in progress. + $ ./scripts/init.py [--build] -## The DocX workflow -This workflow is based on Microsoft Word documents which are created -following the Guidelines of a template found at -. -Currently, the webservice at is used -to convert it into TEI P5. +This will pull remote repositories and resources, initialize the database, etc. +Add `--build` to force recreation of the Docker image, e.g. if `Dockerfile` or `requirements.txt` has changed. -Similar to the LaTeX workflow we require the authors to hand in their -bibliographic references in a database format, such as BibTeX. The -Word template explains in detail how citations should be entered. +## Run the Docker Container -The script [fix_tei](https://github.molgen.mpg.de/EditionOpenAccess/EOASkripts/blob/master/fix_tei.py) corrects some artifacts of the oxgarage -conversion and expands the shorthand codes for references and figures -to XML tags. + $ ./scripts/run.py -After that, a PDF document can be obtained by using an XSL script to -create a LaTeX file, or the TEI file can be converted into the -customized DocBook format from above workflow so that the existing -tools can be used. 
+The container should now be up and running. -See [doc/XSL.md](https://github.molgen.mpg.de/EditionOpenAccess/EOASkripts/blob/master/doc/XSL.md) for a documentation of the XSL workflow. +## Run a Command in the Docker Container -## Examplary workflow + $ ./scripts/exec_in_container.py [-- CMD ...] -To install the whole toolchain, clone at least this repository as well -as the 'advanced' branch of -[EOA sample project](https://github.molgen.mpg.de/EditionOpenAccess/eoa_sample_project). -Follow the installation instructions in [INSTALL.md](./INSTALL.md). +Follow your workflow by running the appropriate scripts to compile your publications (see chapter 'Workflows'). +Inside the container you have a well-defined environment with all necessary evil dependencies available. +Don't forget to stop the container when you are done. -In `eoa_sample_project`, run `xelatex`, `biber` (the version included -in your TeX distribution) and `xelatex` two more times. This will give -you the PDF version of the document. +## Stop the Docker Container -Next, comment line 9 in `EOASample.tex` (the EOA preambel) and -uncomment line 10 (the XML preambel) and run `eoaconvert.py`: + $ ./scripts/stop.py - eoaconvert.py -f EOASample +## Clean the Repository -If everything went well, you can also try and run + $ ./scripts/exit.py - tralics2django.py - tralics2epub.py - tralics2tei.py +This should remove all remote repositories and resources not part of this repository. +Docker images are not deleted, though. -These scripts don't take any arguments and will produce output in the -`CONVERT` directory. +## Configuration -External dependencies ---------------------- -See [INSTALL.md](./INSTALL.md) for details. +- Environment variables: see file `.env` + + These variables are available inside the `docker-compose.yaml` file and are also loaded into the Python scripts. + The file is created with default settings by the `init.py` script. 
+ +# Workflows + +Compiling your documents involves following a workflow which can consist of several steps. +For all further explanations we assume you are in the Docker container. + +## The LaTeX workflow (TODO) + +TODO + +## The DocX workflow (TODO) + +TODO + +## Exemplary workflow (TODO) + +TODO diff --git a/doc/COSMOS.md b/doc/COSMOS.md deleted file mode 100644 index da61e8e..0000000 --- a/doc/COSMOS.md +++ /dev/null @@ -1,74 +0,0 @@ -# The EOA cosmos - -An inventory of all programs related to EOA and how they play together - -## EOASkripts -Contains the most important programs for the whole document conversion workflow. - - - `bib_add_keyword.py`: This script adds one or more keywords to all entries of a bibtex file. - - `create_tmpbib.py` : A helper script that creates a temporary bibtex file from a formatted list of references. - - `eoatex2pdf.py` : Wrapper script for LaTeX conversion. - - `eoatex2imxml.py` : Converts Latex files into a customized DocBook XML file. - - `fix_tei.py` : This program is a processing step after the conversion from docx to TEI. - - `imxml2django.py` : Create an XML file that can be inserted into the Django database of an EOAv1 installation. - - `imxml2epub.py` : Convert a customized DocBook XML file into a set of files that constitute the contents of an EPUB file. - - `imxml2tei.py` : Unfinished program to convert a customized DocBook XML to TEI-XML. - - `utils/libeoabibitem.py`: A library for the formatting of bibliographical references. - - `utils/libeoaconvert.py` : A collection of functions for the different conversion steps - - `mkimage.py` : Create an automatically generated dummy cover to be used during testing. - - `tei2eoatex.xsl` : An XSL converter from TEI-XML to EOATeX - - `tei2imxml.py` : A converter from TEI to customized DocBook XML. - - `find_chapters.py` : Use LaTeX auxiliary files to split a PDF into chapters. - - `idassigner.py` : Assign xml:ids to various elements fo TEI file. 
- - `parsezotero.py` : Convert Zotero citations to TEI bibl elements. - -## eoa-csl -A nearly-abandoned version of a CSL configuration for EOA. Currently -slightly customized version of `chicago-author-date.csl` is used. - -## eoa-django -Contains the Django platform which is currently re-developed. The -functionality offered via the `manage.py` interface is currently -expanded and offers - - `bib2tei` : Converts a bibtex bibliography file to a TEI-XML `listBibl` structure - - `check_tei_output` - - `publication_add_chapter_frontpages` : Generates frontmatters for chapter downloads - - `publication_export_tei` : Exports a publication from the database as an TEI-XML file - - `publication_list` - - `publicationimport` - - `statistic` - -## eoa-latexclass -A LaTeX document class for Edition Open Access books. This is used for -the conversion from TEI to PDF. It is clearly modeled after the -pre_eoa preambel of the EOATeX path, but uses mostly standard LaTeX -commands, so that the tei2latex script can be re-used. - -## eoa-django2-gitlab -A prototype for the new EOA2 platform. It accepts TEI-XML documents as input. - -## eoa-opds-module -A proof-of-concept to create files for an OPDS catalogue service. - -## eoa-tei-schema -Contains the RELAX NG schema for the EOA publications. - -## eoa-umbrellapage -A small Django-CMS platform for the umbrella page. - -## eoa-utilities -A collection of small tools and files - -## eoa_sample_project -A EOATeX sample project that contains all the possible markup -possibilities. Also a testing ground for new features. The text markup -is congruent with the example file found in `eoa-tei-schema` - -## eoa_skeleton -Provides the basic structure for an EOATeX project. - -## latex2eoa -A tool to convert common LaTeX markup into EOATeX markup. - -## rssfixer -A small tool to correct the links in the rssfeeds. 
diff --git a/doc/XSL.md b/doc/XSL.md deleted file mode 100644 index afc8848..0000000 --- a/doc/XSL.md +++ /dev/null @@ -1,76 +0,0 @@ -# The XSL workflow - -A 2.0 XSL processor is required to perform the transformation of the -TEI manuscript. It has been successfully used with -Saxon-HE 9.7.0.21J, found at -[sourceforge](https://sourceforge.net/projects/saxon/files/Saxon-HE/9.7/SaxonHE9-7-0-21J.zip/download). - -This workflow is used to convert the TEI source both to LaTeX and HTML versions. - -## LaTeX -There are two files to choose from, depending on whether the highest -element of a publication is part ([tei2eoatex-parts.xsl](https://github.molgen.mpg.de/EditionOpenAccess/EOASkripts/blob/master/data/tei2eoatex-parts.xsl)) or chapter -([tei2eoatex-noparts.xsl](https://github.molgen.mpg.de/EditionOpenAccess/EOASkripts/blob/master/data/tei2eoatex-noparts.xsl)). Both scripts then import -[tei2eoatex-common.xsl](https://github.molgen.mpg.de/EditionOpenAccess/EOASkripts/blob/master/data/tei2eoatex-common.xsl) which contains the rest of the templates. - -### Command line switches -Four command line switches are available -#### Document structure (required) -This switch is required and tells if the document structure contains parts or not. - -Use value `1` if there are parts. - - parts=1 - -Use value `0` if there are no parts. - - parts=0 -#### Frontmatter (optional) -Toggles the creation and inclusion of a frontmatter, which is -generated from the TEI header. Default: `false`. To enable it, use - - frontmatter=1 -#### List of contributors (optional) -Toggles the creation and inclusion of a List of Contributors (mainly -used in edited books). It is optional set to `false` by default. -Enable it by adding - - contributors=1 - -to the command line. -#### Verbosity (optional) -Selects the amount of output messages during runtime. Possible values are `0`, `1`, `2`. 
Example: - - verbosity=1 -### Output - -The transformation will create a number of files that closely -resembles the structure in the classic EOATeX way. There is one main -file that links to the separate chapter files which are located in a -directory `texfiles`. - -### Working with the LaTeX files - -The LaTeX files are considered temporary files to create a PDF version -of the TEI text. All changes to the contents and structure of the text -should be made in the master TEI file. However, some elements need to -be fixed at the LaTeX level, e.g. the placement and size of images, -line and page breaks or hyphenation. - -## HTML -The HTML conversion is as of March 2018 still incomplete. -### Command line switches -Up until now, only the the optional `verbosity` switch is available. - -### Example file -The example file, available in -[eoa-publication-model](https://github.molgen.mpg.de/EditionOpenAccess/eoa-publication-model) -can be used to check the conversion of all available commands. - - `java -jar …/saxon9.jar -t -s:…/eoa-publication-model/examples/exampleTEI.xml -xsl:…/EOASkripts/tei2html.xsl` - -The script will create a directory structure similar to the one on the -publication platform. To enjoy the webdesign, a directory containing -stylesheets, javascript and images is necessary. A version of this can -be downloaded from -. diff --git a/doc/bibliography.md b/doc/bibliography.md deleted file mode 100644 index 70e1579..0000000 --- a/doc/bibliography.md +++ /dev/null @@ -1,25 +0,0 @@ -# How the bibliography is made - -References are stored in a bibtex file. The XML file contains -citations that are similar to a LaTeX citation command. - -During processing, the bibtex file and the bibliography and citation -type (anthology or monograph, numeric or author/year) are read out of -the XML source file. - -`pandoc-citeproc` is used to construct a JSON file out of the bibtex -database. This is helpful during the next parts of processing. 
- - - -The next step formats the bibliography, depending on the bibliography -type, as a complete list or as per-chapter-list. In the second case, -several markdown and html files are created, one for each chapter. -They are named after the `xml:id` of the chapter. - -The function `format_citations` then creates two outputs: the list of -references, formatted in html and a dictionary with the citekey as -headword and three manifestations of the data (author-year-citation, -year-citation and title). diff --git a/doc/create_tmpbib.md b/doc/create_tmpbib.md deleted file mode 100644 index 8c15ae6..0000000 --- a/doc/create_tmpbib.md +++ /dev/null @@ -1,19 +0,0 @@ -# Create temporary bibliography - -In cases where authors hand in a formatted version of the bibliography -(rather than a reference database), this tool can help creating a -database in BibTeX format. - -Required argument is a textfile with a formatted bibliography (one -entry per line). The option `k` is there to supply an entry with a -keyword, for example the name of the chapter author. - -We require authors to use shortcuts in their docx manuscript when -citing, including the use of a citekey (`LASTNAME_YEAR`), there should -already be citekeys in the manuscript. When running `fix_tei.py`, -these citekeys are gathered together and can be used as an input to -this tool. - -The tool creates temporary citekeys out of the formatted bibliography -and in an interactive session, the user selects the most likely entry. -With this, rudimentary entries can be created. diff --git a/doc/datapickle.md b/doc/datapickle.md deleted file mode 100644 index 119c644..0000000 --- a/doc/datapickle.md +++ /dev/null @@ -1,228 +0,0 @@ -# Documentation for the contents of `data.pickle` - -The file data.pickle is created during a run of `eoatex2imxml.py` or `fix_tei.py` and primarily assigns numbers to elements. 
For example, the thirteenth figure in the first (numbered) chapter, that carries the id `uid17` is assigned the human readable reference `1.13`. - -The original list of stored items is -- chapterdict -- figdict -- eqdict -- fndict -- listdict -- pagelabeldict -- secdict -- tabdict -- theoremdict - -## eoatex2imxml.py -In the classic variant, the file contains these fields: - - data["chapterdict"] = chapterdict - data["figdict"] = figdict - data["eqdict"] = eqdict - data["fndict"] = fndict - data["listdict"] = listdict - data["pagelabeldict"] = pagelabeldict - data["secdict"] = secdict - data["tabdict"] = tabdict - data["theoremdict"] = theoremdict - -See below for how the contents look like. - -## fix_tei.py and tei2imxml.py -In this workflow, the file contains additional information, namely `citekeys_not_in_bib` and `used_citekeys`. However, they are currently not evaluated. - -In `tei2imxml.py`, the data is filled in by the `assign_ids` function. - -# Example data -## chapterdict -`{ 'chapterdict': {'uid1': '1', 'uid18': '2', 'uid33': '3', 'uid67': - '4' 'uid114': '5', 'uid223': '6', 'uid257': '7', 'uid304': '8',} ` - -## eqdict -` - 'eqdict': {} -` - -## figdict -` 'figdict': { 'uid3': '1.1', 'uid4': '1.2', 'uid6': '1.3', 'uid7': - '1.4', 'uid9': '1.5' 'uid10': '1.6', 'uid11': '1.7', 'uid12': - '1.8', 'uid13': '1.9', 'uid14': '1.10', 'uid15': '1.11', 'uid16': - '1.12', 'uid17': '1.13', } ` - -## fndict -` -'fndict': {} -` - -## listdict -` 'listdict': { 'uid100': '31', 'uid101': '32', 'uid102': '33', - 'uid103': '34', 'uid104': '35', 'uid105': '36', 'uid106': '37', - 'uid107': '38', 'uid116': '1', 'uid117': '2', 'uid118': '3', - 'uid119': '4', 'uid120': '5', 'uid121': '6', 'uid122': '1', 'uid123': - '2', 'uid124': '3', 'uid125': '4', 'uid126': '5', 'uid127': '6', - 'uid128': '7', 'uid129': '8', 'uid130': '9', 'uid131': '10', - 'uid132': '11', 'uid133': '12', 'uid134': '13', 'uid135': '14', - 'uid136': '15', 'uid137': '16', 'uid138': '17', 'uid139': '18', - 
'uid140': '19', 'uid141': '20', 'uid142': '21', 'uid143': '22', - 'uid144': '23', 'uid145': '24', 'uid146': '25', 'uid147': '26', - 'uid148': '27', 'uid149': '28', 'uid150': '29', 'uid151': '30', - 'uid152': '1', 'uid153': '2', 'uid154': '3', 'uid155': '4', 'uid156': - '5', 'uid157': '1', 'uid158': '2', 'uid159': '3', 'uid160': '4', - 'uid161': '5', 'uid162': '6', 'uid163': '7', 'uid164': '8', 'uid165': - '9', 'uid166': '10', 'uid167': '11', 'uid168': '12', 'uid169': '1', - 'uid170': '2', 'uid171': '3', 'uid172': '4', 'uid173': '5', 'uid174': - '6', 'uid175': '7', 'uid176': '8', 'uid177': '9', 'uid178': '10', - 'uid179': '11', 'uid180': '12', 'uid181': '13', 'uid182': '14', - 'uid183': '15', 'uid184': '16', 'uid185': '17', 'uid186': '18', - 'uid187': '19', 'uid188': '20', 'uid189': '21', 'uid190': '22', - 'uid191': '23', 'uid192': '24', 'uid193': '25', 'uid194': '26', - 'uid195': '27', 'uid196': '1', 'uid197': '2', 'uid198': '1', - 'uid199': '2', 'uid200': '3', 'uid201': '4', 'uid202': '5', 'uid203': - '6', 'uid204': '7', 'uid205': '1', 'uid206': '2', 'uid207': '3', - 'uid208': '4', 'uid209': '5', 'uid210': '6', 'uid211': '7', 'uid212': - '8', 'uid213': '9', 'uid214': '1', 'uid215': '2', 'uid216': '3', - 'uid217': '4', 'uid218': '5', 'uid219': '6', 'uid220': '1', 'uid221': - '2', 'uid224': '1', 'uid225': '2', 'uid226': '3', 'uid227': '4', - 'uid228': '5', 'uid229': '6', 'uid230': '7', 'uid231': '8', 'uid232': - '9', 'uid233': '10', 'uid234': '11', 'uid235': '12', 'uid236': '13', - 'uid237': '14', 'uid238': '15', 'uid239': '16', 'uid240': '17', - 'uid241': '18', 'uid242': '1', 'uid243': '2', 'uid244': '3', - 'uid245': '4', 'uid246': '5', 'uid247': '6', 'uid248': '7', 'uid249': - '8', 'uid250': '9', 'uid251': '10', 'uid252': '11', 'uid253': '12', - 'uid254': '13', 'uid255': '14', 'uid256': '15', 'uid260': '1', - 'uid261': '2', 'uid262': '3', 'uid263': '4', 'uid264': '5', 'uid265': - '6', 'uid266': '7', 'uid267': '8', 'uid268': '9', 'uid269': '10', - 'uid270': '11', 
'uid271': '12', 'uid272': '13', 'uid273': '14', - 'uid274': '15', 'uid275': '16', 'uid276': '17', 'uid277': '18', - 'uid278': '19', 'uid279': '20', 'uid280': '21', 'uid281': '22', - 'uid282': '23', 'uid283': '24', 'uid284': '25', 'uid285': '26', - 'uid286': '27', 'uid287': '28', 'uid288': '29', 'uid289': '30', - 'uid290': '31', 'uid291': '32', 'uid292': '33', 'uid293': '34', - 'uid294': '35', 'uid295': '36', 'uid296': '37', 'uid297': '38', - 'uid306': '1', 'uid307': '2', 'uid308': '3', 'uid309': '4', 'uid310': - '5', 'uid311': '6', 'uid312': '1', 'uid313': '2', 'uid314': '3', - 'uid315': '4', 'uid316': '5', 'uid317': '6', 'uid318': '7', 'uid319': - '8', 'uid320': '9', 'uid321': '10', 'uid322': '11', 'uid323': '12', - 'uid324': '13', 'uid325': '14', 'uid326': '15', 'uid327': '16', - 'uid328': '17', 'uid329': '18', 'uid330': '19', 'uid331': '20', - 'uid332': '21', 'uid333': '22', 'uid334': '23', 'uid335': '24', - 'uid336': '25', 'uid337': '26', 'uid338': '27', 'uid339': '28', - 'uid34': '1', 'uid340': '29', 'uid341': '30', 'uid342': '1', - 'uid343': '2', 'uid344': '3', 'uid345': '4', 'uid346': '5', 'uid347': - '1', 'uid348': '2', 'uid349': '3', 'uid35': '2', 'uid350': '4', - 'uid351': '5', 'uid352': '6', 'uid353': '7', 'uid354': '8', 'uid355': - '9', 'uid356': '10', 'uid357': '11', 'uid358': '12', 'uid359': '1', - 'uid36': '3', 'uid360': '2', 'uid361': '3', 'uid362': '4', 'uid363': - '5', 'uid364': '6', 'uid365': '7', 'uid366': '8', 'uid367': '9', - 'uid368': '10', 'uid369': '11', 'uid37': '4', 'uid370': '12', - 'uid371': '13', 'uid372': '14', 'uid373': '15', 'uid374': '16', - 'uid375': '17', 'uid376': '18', 'uid377': '19', 'uid378': '20', - 'uid379': '21', 'uid38': '5', 'uid380': '22', 'uid381': '23', - 'uid382': '24', 'uid383': '25', 'uid384': '26', 'uid385': '27', - 'uid386': '1', 'uid387': '2', 'uid388': '3', 'uid389': '4', 'uid39': - '6', 'uid390': '5', 'uid391': '6', 'uid392': '7', 'uid393': '1', - 'uid394': '2', 'uid395': '3', 'uid396': '4', 'uid397': '5', 
'uid398': - '6', 'uid399': '7', 'uid40': '7', 'uid400': '8', 'uid401': '9', - 'uid402': '1', 'uid403': '2', 'uid404': '3', 'uid405': '4', 'uid406': - '5', 'uid407': '6', 'uid408': '1', 'uid409': '2', 'uid41': '8', - 'uid42': '9', 'uid43': '10', 'uid44': '11', 'uid45': '12', 'uid46': - '13', 'uid47': '14', 'uid48': '15', 'uid49': '16', 'uid50': '17', - 'uid51': '18', 'uid52': '1', 'uid53': '2', 'uid54': '3', 'uid55': - '4', 'uid56': '5', 'uid57': '6', 'uid58': '7', 'uid59': '8', 'uid60': - '9', 'uid61': '10', 'uid62': '11', 'uid63': '12', 'uid64': '13', - 'uid65': '14', 'uid66': '15', 'uid70': '1', 'uid71': '2', 'uid72': - '3', 'uid73': '4', 'uid74': '5', 'uid75': '6', 'uid76': '7', 'uid77': - '8', 'uid78': '9', 'uid79': '10', 'uid80': '11', 'uid81': '12', - 'uid82': '13', 'uid83': '14', 'uid84': '15', 'uid85': '16', 'uid86': - '17', 'uid87': '18', 'uid88': '19', 'uid89': '20', 'uid90': '21', - 'uid91': '22', 'uid92': '23', 'uid93': '24', 'uid94': '25', 'uid95': - '26', 'uid96': '27', 'uid97': '28', 'uid98': '29', 'uid99': '30'} ` - -## pagelabeldict -` 'pagelabeldict': { 'img001': ' on input line 3', 'img002': ' on - input line 4', 'img003': ' on input line 5', 'img004': ' on input - line 6', 'img005': ' on input line 7', 'img006': ' on input line 8', - 'img007': ' on input line 9', 'img008': ' on input line 10', - 'img009': ' on input line 11', 'img010': ' on input line 12', - 'img011': ' on input line 13', 'img012': ' on input line 14', - 'img013': ' on input line 15', 'img014': ' on input line 16', - 'img015': ' on input line 17', 'img016': ' on input line 18', - 'img017': ' on input line 19', 'img018': ' on input line 20', - 'img019': ' on input line 21', 'img020': ' on input line 22', - 'img021': ' on input line 23', 'img022': ' on input line 24', - 'img023': ' on input line 25', 'img024': ' on input line 26', - 'img025': ' on input line 27', 'img026': ' on input line 28', - 'img027': ' on input line 29', 'img028': ' on input line 30', - 'img029': ' on input 
line 31', 'img030': ' on input line 32', - 'img031': ' on input line 33', 'img032': ' on input line 34', - 'img033': ' on input line 35', 'img034': ' on input line 36', - 'img035': ' on input line 37', 'img036': ' on input line 38', - 'img037': ' on input line 39', 'img038': ' on input line 40', - 'img039': ' on input line 41', 'img040': ' on input line 42', - 'img041': ' on input line 43', 'img042': ' on input line 44', - 'img043': ' on input line 45', 'img044': ' on input line 46', - 'img045': ' on input line 47', 'img046': ' on input line 48', - 'img047': ' on input line 49', 'img048': ' on input line 50', - 'img049': ' on input line 51', 'img050': ' on input line 52', - 'img051': ' on input line 53', 'img052': ' on input line 54', - 'img053': ' on input line 55', 'img054': ' on input line 56', - 'img055': ' on input line 57', 'img056': ' on input line 58', - 'img057': ' on input line 59', 'img058': ' on input line 60', - 'img059': ' on input line 61', 'img060': ' on input line 62', - 'img061': ' on input line 63', 'img062': ' on input line 64', - 'img063': ' on input line 65', 'img064': ' on input line 66', - 'img065': ' on input line 67', 'img066': ' on input line 68', - 'img067': ' on input line 69', 'img068': ' on input line 70', - 'img069': ' on input line 71', 'img070': ' on input line 72', - 'img071': ' on input line 73', 'img072': ' on input line 74', - 'img073': ' on input line 75', 'img074': ' on input line 76', - 'img075': ' on input line 77', 'img076': ' on input line 78', - 'img077': ' on input line 79', 'img078': ' on input line 80', - 'img079': ' on input line 81', 'img080': ' on input line 82', - 'img081': ' on input line 83', 'img082': ' on input line 84', - 'img083': ' on input line 85', 'img084': ' on input line 86', - 'img085': ' on input line 87', 'img086': ' on input line 88', - 'img087': ' on input line 89', 'img088': ' on input line 90', - 'img089': ' on input line 91', 'img090': ' on input line 92', - 'img091': ' on input line 93', 
'img092': ' on input line 94', - 'img093': ' on input line 95', 'img094': ' on input line 96', - 'img095': ' on input line 97', 'img096': ' on input line 98', - 'img097': ' on input line 99', 'img098': ' on input line 100', - 'img099': ' on input line 101', 'img100': ' on input line 102', - 'img101': ' on input line 103', 'img102': ' on input line 104', - 'img103': ' on input line 105', 'img104': ' on input line 106', - 'img105': ' on input line 107', 'img106': ' on input line 108', - 'img107': ' on input line 109', 'img108': ' on input line 110', - 'img109': ' on input line 111', 'img110': ' on input line 112', - 'img111': ' on input line 113', 'img112': ' on input line 114', - 'img113': ' on input line 115', 'img114': ' on input line 116', - 'img115': ' on input line 117', 'img116': ' on input line 118', - 'img117': ' on input line 119', 'img118': ' on input line 120', - 'img119': ' on input line 121', 'sec10:Figure10': '41', - 'sec11:Figure11': '43', 'sec12:Figure12': '45', 'sec13:Figure13': - '47', 'sec1:Figure1': '11', 'sec2:Figure2': '13', 'sec3:Figure3': - '17', 'sec4:Figure4': '26', 'sec5:Figure5': '31', 'sec6:Figure6': - '32', 'sec7:Figure7': '33', 'sec8:Figure8': '35', 'sec9:Figure9': - '37' } ` - -## secdict -` 'secdict': { 'uid2': '1.1', 'uid5': '1.2', 'uid8': '1.3', 'uid19': - '2.1', 'uid20': '2.2', 'uid21': '2.2.1', 'uid22': '2.2.2', 'uid23': - '2.2.3', 'uid24': '2.2.4', 'uid25': '2.2.5', 'uid26': '2.3', 'uid27': - '2.3.1', 'uid28': '2.3.2', 'uid29': '2.3.3', 'uid30': '2.3.4', - 'uid31': '2.3.5', 'uid32': '2.3.6', 'uid68': '4.1', 'uid69': '4.2', - 'uid108': '4.3', 'uid109': '4.4', 'uid110': '4.5', 'uid111': '4.6', - 'uid112': '4.7', 'uid113': '4.8', 'uid115': '5.1', 'uid222': '5.2', - 'uid258': '7.1', 'uid259': '7.2', 'uid298': '7.3', 'uid299': '7.4', - 'uid300': '7.5', 'uid301': '7.6', 'uid302': '7.7', 'uid303': '7.8', - 'uid305': '8.1', 'uid410': '8.2', } ` - -## tabdict -` -'tabdict': {} -` - -## theoremdict -` - 'theoremdict': {}} -` diff --git 
a/doc/eoaclassic-workflow.md b/doc/eoaclassic-workflow.md deleted file mode 100644 index cdcb588..0000000 --- a/doc/eoaclassic-workflow.md +++ /dev/null @@ -1,35 +0,0 @@ -# The EOA classic workflow -This document documents the different parts of the *EOA classic* workflow, which is based on EOATeX files. - -## eoatex2imxml.py -### Steps in this program -- Preparation, setup and checks -- Tralics conversion of EOATeX source - - Includes slight output correction -- Processing elements - - Mostly chapter by chapter -- Creation of bibliographical entries -- Cleanup -- Write output files and data -### Available functions -- getchildren -- TeX2PNG -- makebibchecker -- sanitizebibentry -- createBibEntryAuthorYear -- createBibEntryNumeric -- pdf_burst -- progress -- cleanup -## imxml2django.py -## imxml2epub.py -## imxml2tei.py -## Libraries -### utils/libeoabibitem.py -### utils/libeoaconvert.py -## Other files -### config -### debug -### mkimage.py -### data -### tmp_files diff --git a/doc/eoadocx-workflow.md b/doc/eoadocx-workflow.md deleted file mode 100644 index 24d7728..0000000 --- a/doc/eoadocx-workflow.md +++ /dev/null @@ -1,6 +0,0 @@ -# The EOA DocX workflow -This document documents the different parts of the *EOA TEI* workflow, which is based on DocX files. - - -tei2eoatex.xsl -tei2imxml.py diff --git a/doc/fix_tei.md b/doc/fix_tei.md deleted file mode 100644 index 73df376..0000000 --- a/doc/fix_tei.md +++ /dev/null @@ -1,33 +0,0 @@ -# Document preparation -Conversion of docx documents to TEI XML - -Used metypeset with parameters `--prettytei --puretei`. This tool, however, removes the div structure that is important for the sectioning of the work. - -Therefore, another attempt with oxgarage which retains the div structure. 
- -# General info -The script takes three arguments: -* `teifile`: the TEI file -* `bibfile`: the bibliography file in bibtex format -* `figdir`: the place where the images of that publication are stored - -In a first step, some artifacts from the conversion are removed from the file. The output is written to `tmp_files/`$TEIFILE`-cleaned.xml` for inspection. - -Next, some modifications are done on a string version of the XML tree. These are adjustments on the way the citations and references to figures are entered in the text. The output is written to `tmp_files/`$TEIFILE`-modified.xml` for inspection. Some XML parsing errors might show up here which should be taken care of in the original TEI file and the whole script is then to be run again. - -In a final step, a report about missing citations, missing figures and non-parseable page ranges are gathered and other relevant data (infos about document structure, figures, footnotes, citations) are stored in `tmp_files/dict.pickle`. - -The resulting XML file is suffixed with `-out` and can be then given to the next part of the workflow. - -# Handling of citations -We use bibtex to store bibliographic data. When producing PDF, we can use the LaTeX tools to format citations and references. - -For the HTML view, a similar workflow was used (tralics etc), but the output format of biber has been changed, we have not yet adapted to it. - -One can use pandoc in conjunction with pandoc-citeproc to do the formatting. - -The prepare_tei.py script produces a markdown file that only contains the references and being run with - - pandoc -o ldaston.html -t html --filter=pandoc-citeproc --bibliography=03_daston.bib 03_daston-citations.md - -Will produce an easily parseable html file that we can use to extract the formatted bibliography and references from. 
diff --git a/doc/metadatamapping.md b/doc/metadatamapping.md deleted file mode 100644 index b63f377..0000000 --- a/doc/metadatamapping.md +++ /dev/null @@ -1,32 +0,0 @@ -This is a mapping of the fields in `publication.cfg` to their counterparts in a TEI header - -# Mandatory values (according to database schema) - - PublicationDate: //t:teiHeader/t:fileDesc/t:publicationStmt/t:date/@when - - PublicationYear: //t:teiHeader/t:fileDesc/t:publicationStmt/t:date/@when truncate - - Language: //t:teiHeader/t:profileDesc/t:langUsage/t:language/@ident - - License: //t:teiHeader/t:fileDesc/t:publicationStmt/t:availability/t:licence/text() - - Number: //t:teiHeader/t:fileDesc/t:seriesStmt/t:idno[@type='number']/text() - - Serie: //t:teiHeader/t:fileDesc/t:seriesStmt/t:title/text() - - Title: //t:teiHeader/t:fileDesc/t:titleStmt/t:title[@type='main']/text() - -# Optional values (according to database schema) - - Subtitle: //t:teiHeader/t:fileDesc/t:titleStmt/t:title[@type='sub']/text() - - ISBN: //t:teiHeader/t:fileDesc/t:publicationStmt/t:idno[@type='isbn']/text() - - Price: //t:teiHeader/t:fileDesc/t:extent/t:measure[@type='price']/@quantity + //t:teiHeader/t:fileDesc/t:extent/t:measure[@type='price']/@unit - - Pages: //t:teiHeader/t:fileDesc/t:extent/t:measure[@commodity='pages']/@quantity - - Shoplink: //t:teiHeader/t:fileDesc/t:publicationStmt/t:idno[@type='shoplink']/text() + //t:teiHeader/t:fileDesc/t:publicationStmt/t:distributor/t:orgName/text() - - BriefDescription: //t:teiHeader/t:profileDesc/t:abstract[@n='brief']/p/text() - - DetailedDescription: //t:teiHeader/t:profileDesc/t:abstract[@n='detailed']/p/text() - - Dedication: //t:text/t:front/t:div[@type='dedication']/t:ab/text() - - Submitter: //t:teiHeader/t:fileDesc/t:titleStmt/t:editor[@role='submitter']/@ref - - EditorialCoordination: //t:teiHeader/t:fileDesc/t:titleStmt/t:editor[@role='editorialcoordinator']/@ref - - Copyediting: //t:teiHeader/t:fileDesc/t:titleStmt/t:editor[@role='copyeditor']/@ref - - 
Translator: //t:teiHeader/t:fileDesc/t:titleStmt/t:editor[@role='translator']/@ref - - Author1..5: //t:teiHeader/t:fileDesc/t:titleStmt/t:author/@ref or //t:teiHeader/t:fileDesc/t:titleStmt/t:editor[@role='volumeeditor']/@ref - - Zusatz: set if //t:teiHeader/t:fileDesc/t:titleStmt/t:editor[@role='volumeeditor']/@ref is present - -# Fields that are currently ignored by publicationimport - - Keyword1..6: //t:teiHeader/t:profileDesc/t:textClass/t:keywords/t:list/t:item - - PublicationManagement: //t:teiHeader/t:fileDesc/t:titleStmt/t:editor[@role='publicationmanager']/@ref - - PublicationAssistants: //t:teiHeader/t:fileDesc/t:titleStmt/t:editor[@role='publicationassistant']/@ref - - AdditionalInformation: //t:teiHeader/t:profileDesc/t:abstract[@n='additional']/p/text() diff --git a/doc/tei2imxml.md b/doc/tei2imxml.md deleted file mode 100644 index 2d29dd6..0000000 --- a/doc/tei2imxml.md +++ /dev/null @@ -1,13 +0,0 @@ -# Documentation - -This program converts a TEI-XML file to an intermediate XML file (extended Docbook format) so that it can be integrated into the current EOA Django workflow. - -## Steps performed in this script - -### Rendering of bibliography -The referenced bibtex file is converted to a JSON version and rendered as an HTML document (through `pandoc-citeproc`, using a CSL stylesheet). - -### Transformation of text body, assignment of IDs -Here, the TEI tags are converted to their counterparts, which is mostly a 1:1 translation. Also, IDs are assigned to the relevant elements. Information about the document structure is added to `tmp_files/dict.pickle`. - -The resulting file is `tmp_files/IntermediateXMLFile.xml` and can be picked up by the next scripts in the workflow, such as `tralics2django.py` and `tralics2epub.py`.
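The mapping from `publication.cfg` fields to TEI header XPaths listed in the removed `metadatamapping.md` could be applied along the following lines. This is a minimal sketch using only the Python standard library, not the project's own import code. The names `MANDATORY_FIELDS` and `extract_metadata` are hypothetical, the prefix `t` is assumed to be bound to the TEI namespace, and "truncate" for `PublicationYear` is assumed to mean the first four characters of the `@when` date. Because `xml.etree.ElementTree` supports only a subset of XPath, attributes are read with `.get()` instead of a trailing `/@attr` step.

```python
import xml.etree.ElementTree as ET

# Assumption: the prefix "t" used in the mapping is bound to the TEI namespace.
NS = {"t": "http://www.tei-c.org/ns/1.0"}

# (config field, element path, attribute name or None for text content)
MANDATORY_FIELDS = [
    ("PublicationDate", ".//t:teiHeader/t:fileDesc/t:publicationStmt/t:date", "when"),
    ("Language", ".//t:teiHeader/t:profileDesc/t:langUsage/t:language", "ident"),
    ("License", ".//t:teiHeader/t:fileDesc/t:publicationStmt/t:availability/t:licence", None),
    ("Number", ".//t:teiHeader/t:fileDesc/t:seriesStmt/t:idno[@type='number']", None),
    ("Serie", ".//t:teiHeader/t:fileDesc/t:seriesStmt/t:title", None),
    ("Title", ".//t:teiHeader/t:fileDesc/t:titleStmt/t:title[@type='main']", None),
]


def extract_metadata(root: ET.Element) -> dict:
    """Collect the mandatory fields from a parsed TEI document (hypothetical helper)."""
    values = {}
    for field, path, attribute in MANDATORY_FIELDS:
        element = root.find(path, NS)
        if element is None:
            values[field] = None  # missing in the header; the caller decides how to report it
        elif attribute is not None:
            values[field] = element.get(attribute)
        else:
            values[field] = (element.text or "").strip()
    # Assumption: "truncate" means keeping the year part of an ISO date such as 2019-11-12.
    date = values["PublicationDate"]
    values["PublicationYear"] = date[:4] if date else None
    return values
```

The optional fields could be handled the same way by extending the table. Fields that combine two values, such as `Price`, would need a small amount of extra logic.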