Digitale Ethik

15 & 16 September 2025

Middle Persian Corpus & Dictionary Project (MPCD): a corpus-based digital dictionary

Iris Colditz, Thomas Jügel & Raha Musav

Keywords: MPCD, Middle Persian, Zoroastrianism, digital corpus, dictionary

Abstract: The DFG-funded long-term project "Zoroastrian Middle Persian: Digital Corpus and Dictionary (MPCD)" is developing an online and open-access corpus of Zoroastrian Middle Persian texts written in Pahlavi script. The corpus provides access to nearly all Zoroastrian Middle Persian texts, including transcription, transliteration, and digitized manuscripts. Comprising approximately 800,000 tokens, it includes morphological, semantic, and partial syntactic annotations. A Middle Persian-English dictionary provides a detailed semantic view of these texts. The digital corpus and dictionary serve as closely interlinked analysis tools for researching linguistic and conceptual-historical questions and enable collaborative processing and research through a web-based working environment.

With this corpus, the original texts and their manuscripts become accessible to the general public. This overcomes the obstacle that Middle Persian texts have so far been scattered across numerous publications accessible only to specialists in the field. Multidimensional cross-references between the datasets make the interrelations among many anonymous and barely datable sources visible, which gives new insight into their chronology and thereby into the Zoroastrian literary tradition.

On this poster, we present the structure of the MPCD app (mpcorpus.org) and visualise its functionalities, in particular:

- the module for text annotation, together with the manuscript viewer
- the module for syntactic annotation
- examples of dictionary entries
- the search engine
- charts with statistics generated from our corpus
- the system of semantic taxonomy
