EpiDoc Parser
Python parser for EpiDoc (epigraphic documents in TEI XML).
For example, idp.data-sheet uses the parser to generate a single CSV sheet from the Papyri.info Integrating Digital Papyrology data.
Usage
Installation
Install the package
pip install git+https://github.com/Xennis/epidoc-parser
Load a document
Load a document from a file
import epidoc
with open("my-epidoc.xml") as f:
    doc = epidoc.load(f)
Load a document from a string
import epidoc
my_epidoc = """<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="http://www.stoa.org/epidoc/schema/8.13/tei-epidoc.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:id="hgv74005">
[...]
</TEI>
"""
doc = epidoc.loads(my_epidoc)
Get data from a document
Access the document's attributes, for example:
>>> doc.title
"Ordre de paiement"
>>> doc.material
"ostrakon"
>>> doc.languages
{"en": "Englisch", "la": "Latein", "el": "Griechisch"}
>>> [t.get("text") for t in doc.terms]
["Anweisung", "Zahlung", "Getreide"]
>>> doc.origin_place.get("text")
"Kysis (Oasis Magna)"
>>> doc.origin_dates[0]
{"notbefore": "0301", "notafter": "0425", "precision": "low", "text": "IV - Anfang V"}
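The attribute values shown above are plain strings, dicts, and lists, so they can be flattened into a tabular row, in the spirit of the CSV use case mentioned at the top. The following is a minimal sketch, not part of the package: `to_row` and the column names are hypothetical, and a `SimpleNamespace` stands in for a loaded document so the snippet runs without any input file.

```python
import csv
import io
from types import SimpleNamespace

# Stand-in for a loaded document, filled with the example values shown above.
# With the real package this would be the object returned by epidoc.load().
doc = SimpleNamespace(
    title="Ordre de paiement",
    material="ostrakon",
    languages={"en": "Englisch", "la": "Latein", "el": "Griechisch"},
    terms=[{"text": "Anweisung"}, {"text": "Zahlung"}, {"text": "Getreide"}],
    origin_place={"text": "Kysis (Oasis Magna)"},
    origin_dates=[
        {"notbefore": "0301", "notafter": "0425", "precision": "low", "text": "IV - Anfang V"}
    ],
)


def to_row(doc):
    """Flatten selected attributes into a dict of plain strings (hypothetical helper)."""
    first_date = doc.origin_dates[0] if doc.origin_dates else {}
    return {
        "title": doc.title,
        "material": doc.material,
        # Keep only the language codes, sorted for a stable column value.
        "languages": ";".join(sorted(doc.languages)),
        "terms": ";".join(t.get("text", "") for t in doc.terms),
        "origin_place": doc.origin_place.get("text", ""),
        "not_before": first_date.get("notbefore", ""),
        "not_after": first_date.get("notafter", ""),
    }


row = to_row(doc)

# Write a header line and one data line to an in-memory CSV.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)
csv_text = buffer.getvalue()
print(csv_text)
```

Joining multi-valued fields with `;` is only one possible choice; a real export may prefer separate columns or a different delimiter.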
Documentation
Field | EpiDoc source element (XPath) |
---|---|
commentary | //body/div[@type='commentary' and @subtype='general'] |
edition_foreign_languages | //body/div[@type='edition']//foreign/@xml:lang |
edition_language | //body/div[@type='edition']/@xml:lang |
idno | //teiHeader/fileDesc/publicationStmt/idno |
authority | //teiHeader/fileDesc/publicationStmt/authority |
availability | //teiHeader/fileDesc/publicationStmt/availability |
languages | //teiHeader/profileDesc/langUsage/language |
material | //teiHeader/fileDesc/sourceDesc/msDesc/physDesc/objectDesc//support/material |
origin_dates | //teiHeader/fileDesc/sourceDesc/msDesc/history/origin/origDate |
origin_place | //teiHeader/fileDesc/sourceDesc/msDesc/history/origin/origPlace |
provenances | //teiHeader/fileDesc/sourceDesc/msDesc/history/provenance |
reprint_from | //body/ref[@type='reprint-from'] |
reprint_in | //body/ref[@type='reprint-in'] |
terms | //teiHeader/profileDesc/textClass//term |
title | //teiHeader/fileDesc/titleStmt/title |
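To make the mapping in the table concrete, the sketch below evaluates three of the listed paths against a tiny, made-up TEI document using only the standard library. It illustrates what the XPath expressions select and is not the parser's actual implementation. The sample XML, the `tei` prefix, and the use of the `ident` attribute as the language key are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# TEI elements live in this namespace, so every path step needs a prefix.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Made-up minimal document containing only the elements queried below.
sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Ordre de paiement</title></titleStmt>
      <sourceDesc><msDesc><physDesc><objectDesc>
        <supportDesc><support><material>ostrakon</material></support></supportDesc>
      </objectDesc></physDesc></msDesc></sourceDesc>
    </fileDesc>
    <profileDesc>
      <langUsage>
        <language ident="en">Englisch</language>
        <language ident="el">Griechisch</language>
      </langUsage>
    </profileDesc>
  </teiHeader>
</TEI>"""

root = ET.fromstring(sample)

# title: //teiHeader/fileDesc/titleStmt/title
title = root.find(
    ".//tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title", TEI_NS
).text

# material: //teiHeader/fileDesc/sourceDesc/msDesc/physDesc/objectDesc//support/material
material = root.find(
    ".//tei:teiHeader/tei:fileDesc/tei:sourceDesc/tei:msDesc"
    "/tei:physDesc/tei:objectDesc//tei:support/tei:material",
    TEI_NS,
).text

# languages: //teiHeader/profileDesc/langUsage/language
# Assumption: the language code is taken from the "ident" attribute.
languages = {
    el.get("ident"): el.text
    for el in root.findall(
        ".//tei:teiHeader/tei:profileDesc/tei:langUsage/tei:language", TEI_NS
    )
}

print(title, material, languages)
```

Note that the paths in the table omit the namespace prefix for readability; with `xml.etree.ElementTree` each step has to be qualified as shown.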
Development
Create a virtual environment, activate it, and install the dependencies:
python3 -m venv venv
. venv/bin/activate
pip install --requirement requirements.txt
Run the tests:
make unittest
LICENSE
Code
See LICENSE.
Test data
The test data in this project comes from the idp.data project by Papyri.info. This data is made available under a Creative Commons Attribution 3.0 License, with copyright and attribution to the respective projects.