.. toctree::
   :maxdepth: 2
   :caption: Contents:

This tool generates files for different purposes based on the metadata of publications.
At the moment the following formats are supported:
- BibTeX files and subsequently a formatted suggested citation of publications
- JSON files to create DOIs on DataCite
- A PDF page that is prepended to the downloadable chapter files
- ONIX files for insertion into OJS/OMP (unmaintained as of 2021)
- XHTML files that show a table of contents (unmaintained as of 2021)

The following software is required:

- texlive-xetex
- python3-pypdf2
- python3-lxml
- python3-bibtexparser
- pandoc >= 2.11

The program requires a config file that stores some paths and, if a PostgreSQL database is used, the credentials for it. If several publication platforms are used, create a separate config file for each instance.
It should contain at least the following fields::

    [output]
    output_directory:

    [server]
    media_dir:
    production_url:

Output files are written to ``output_directory``. The entry ``media_dir`` in the ``server`` section should point to the path of the Django media directory on the server, while ``production_url`` points to the root URL of the publication platform.
When using SQLite, you can specify the database file like this::

    [sqlite]
    database_file:

Insert the PostgreSQL database credentials like this::

    [postgres]
    database_name:
    user:
    host:
    password: ""

If a test repository for DOIs is available, the prefix for testing can be given here::

    [doi]
    testprefix: "10.80956"

All subsequent examples assume that an SQLite database is used together with a configuration file called ``apress.cfg``. Following the naming scheme of EOA publications, an exemplary publication, Studies 23, is used for demonstration.
In some cases it will be easier to work locally on a desktop/laptop computer instead of directly on the server. In these cases, the SQLite database can be copied to the local machine.
The config ``apress.cfg`` might look like this (fictional values are used here)::

    [output]
    output_directory: "generated_files"

    [server]
    media_dir: "/var/www/apress/eoapp/media/"
    production_url: "http://example.com:9090"

    [sqlite]
    database_file: "~/apress.db"

    [doi]
    testprefix: "12.2342"

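As an illustration of the file format, the following minimal sketch reads such a configuration with Python's standard ``configparser`` module. It is not taken from ``metadator.py``: the helper ``load_settings`` and the decision to strip the surrounding double quotes from values are assumptions made for this example.

```python
import configparser

# Fictional values, mirroring the example configuration shown above.
EXAMPLE_CFG = """
[output]
output_directory: "generated_files"

[server]
media_dir: "/var/www/apress/eoapp/media/"
production_url: "http://example.com:9090"

[sqlite]
database_file: "~/apress.db"

[doi]
testprefix: "12.2342"
"""


def load_settings(text):
    """Parse config text into {section: {key: value}} with quotes stripped.

    Hypothetical helper for illustration only.
    """
    # interpolation=None keeps any "%" characters in values literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return {
        section: {key: value.strip('"') for key, value in parser[section].items()}
        for section in parser.sections()
    }


settings = load_settings(EXAMPLE_CFG)
print(settings["server"]["media_dir"])
print(settings["doi"]["testprefix"])
```

Only the first ``:`` on a line separates key and value, so a URL that contains a port number is read as a single value.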
With the configuration in place, formatted citations will be generated like this::

    python3 metadator.py --sqlite -f apress.cfg -b studies23

The tool ``pandoc`` is used in the background. For further convenience, a ``bibtex`` file is created along the way.
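To give an idea of what a BibTeX entry built from publication metadata looks like, here is a small sketch. The function ``bibtex_entry`` and the field values are hypothetical. The entries the tool actually writes come from the database and may use different fields.

```python
def bibtex_entry(key, entry_type, fields):
    """Format one BibTeX entry from a citation key, a type and a field dict.

    Hypothetical helper for illustration only.
    """
    lines = ["@%s{%s," % (entry_type, key)]
    for name, value in fields.items():
        lines.append("  %s = {%s}," % (name, value))
    lines.append("}")
    return "\n".join(lines)


# Made-up metadata; real values would come from the publication database.
entry = bibtex_entry(
    "studies23",
    "book",
    {"title": "An Example Title", "year": "2021"},
)
print(entry)
```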
JSON files for the generation of DOIs for DataCite are created like so::

    python3 metadator.py --sqlite -f apress.cfg -j -t studies23

The ``-t`` option will use the test repository. URLs for DataCite are hardcoded in the program code. Two shell scripts are created along the way: ``studies23_doiupload_test.sh`` is for batch uploading the JSON files into DataCite, while ``studies23_test_deletedraft.sh`` can be used to delete the test DOIs again after checking.
In both cases, the ``curl`` option ``--netrc`` is used, which reads the credentials from a file called ``~/.netrc``. It contains entries like::

    machine api.test.example.com login LOGIN password *****

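To check that an entry of this shape is well-formed, it can be parsed with Python's standard ``netrc`` module. The sketch below writes a placeholder entry to a temporary file, so that the real ``~/.netrc`` is left untouched, and looks up the credentials for the host. The host name and credentials are the placeholders from the example, not real values.

```python
import netrc
import os
import tempfile

# Placeholder entry in the same shape as the one shown above.
NETRC_TEXT = "machine api.test.example.com login LOGIN password PASSWORD\n"

# Write to a temporary file so the real ~/.netrc is not read or modified.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "netrc")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(NETRC_TEXT)
    # authenticators() returns a (login, account, password) tuple,
    # or None if the host has no entry.
    credentials = netrc.netrc(path).authenticators("api.test.example.com")

login, _account, password = credentials
print(login)
```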
If the test DOI entries look good, the proper DOIs can be created with either::

    python3 metadator.py --sqlite -f apress.cfg -j studies23

or::

    python3 metadator.py --sqlite -f apress.cfg -j -i studies23

The ``-i`` option will set the state of the DOIs instantly to ``publish`` rather than ``hide``. With the first command (without ``-i``), the state of each entry has to be changed manually through the DataCite web interface. Published DOIs cannot be deleted anymore. However, all the metadata can be modified at any time. This can also be done by re-using the JSON files (they should be kept alongside all the other data of the publication in version control), changing the relevant piece of information (stored in the example below as ``update.json``) and running a ``curl`` command similar to this one, which uses the URL https://api.datacite.org/dois/10.34663/9783945561577-00 as an example::

    curl --netrc --request PUT --header "Content-Type: application/vnd.api+json" --url https://api.datacite.org/dois/10.34663/9783945561577-00 -d @update.json

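The same kind of update request can be sketched in Python with the standard library. The example only builds the request object and does not send anything. The payload shape (a ``data`` object with ``type`` and ``attributes``) and the changed field are assumptions for illustration, and the helper ``build_update_request`` is hypothetical. In practice the content of ``update.json`` would be used, and authentication would have to be added before sending.

```python
import json
import urllib.request

# Example DOI URL taken from the text above.
DOI_URL = "https://api.datacite.org/dois/10.34663/9783945561577-00"


def build_update_request(url, payload):
    """Return a PUT request carrying the JSON payload; nothing is sent.

    Hypothetical helper for illustration only.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/vnd.api+json"},
    )


# Assumed payload shape with a made-up changed field.
update = {"data": {"type": "dois", "attributes": {"publicationYear": 2021}}}
request = build_update_request(DOI_URL, update)
print(request.get_method(), request.full_url)
```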
For your convenience, this script will not only create the frontmatter. It will also attach it to the existing chapter PDFs right away. This is in fact a relic of when the tool was first created and the complete backlist needed to be handled. Thus, this tool still requires that the chapter PDFs have been uploaded into the platform. A shell script will then be used to exchange the two PDF files.
The simplest command here is::

    python3 metadator.py --sqlite -f apress.cfg -p studies23

This checks the database for the chapters of that publication that have a PDF attached. Each of these PDFs is downloaded and the frontmatter is prepended to it. The PDF is also enriched with meaningful metadata.
Two more options are available:

- ``-k`` will not delete the intermediate LaTeX files in case manual intervention is necessary
- ``-r`` will remove the first page of the downloaded PDF in case an existing frontmatter is to be updated

Based on the information from the config file (the ``server/media_dir`` key), a file called ``studies23_copycommand.sh`` is created, which will back up the existing file and copy the new PDF file into place. Access to the server is necessary here.
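The idea behind such a copy script can be sketched as follows. This is not the code used by the tool: the function ``copy_commands``, the ``.bak`` backup suffix and the file names below ``media_dir`` are assumptions chosen for the example.

```python
import posixpath
import shlex


def copy_commands(media_dir, relative_pdf, new_pdf):
    """Return shell lines that back up the old PDF and copy the new one.

    Hypothetical helper for illustration only.
    """
    target = posixpath.join(media_dir, relative_pdf)
    return [
        "#!/bin/sh",
        # Keep a backup of the file that is currently in place.
        "cp %s %s" % (shlex.quote(target), shlex.quote(target + ".bak")),
        # Copy the newly generated PDF over the old one.
        "cp %s %s" % (shlex.quote(new_pdf), shlex.quote(target)),
    ]


# Made-up file names; the real layout below media_dir is not documented here.
script = "\n".join(
    copy_commands(
        "/var/www/apress/eoapp/media/",
        "studies23/chapter01.pdf",
        "generated_files/chapter01.pdf",
    )
)
print(script)
```

Quoting each path with ``shlex.quote`` keeps the generated script safe for file names that contain spaces or other shell metacharacters.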