BONAPARTE ONDAREKO ESKUIZKRIBUAK

Universidad de Deusto - Deustuko Unibertsitatea

BONAPARTE ONDAREKO ESKUIZKRIBUAK - FONDO BONAPARTE

The POS tagging

Access to the texts lets the user choose between the two tagged parts of each document: the header and the text itself.

The header includes all the bibliographical information gathered in the introductory sections of the printed edition. In addition, the project as a whole sits under a general header that contains bibliographical information on previous editions, a brief description of the project and a digital taxonomy of the corpus. This classification (see Description of the Corpus) is shown in the main header of the project and establishes hierarchies and connections between the digitised texts, taking into account dialects, sub-dialects, varieties and text typology.

The tagging structure is based on XML (Extensible Markup Language) and follows the principles of the TEI (Text Encoding Initiative), with the aim of using a standard encoding language. At first we followed the principles of TEI MASTER (Manuscript Access through Standards for Electronic Records), but this was later replaced by the latest version of the TEI Guidelines, P5 (http://www.tei-c.org).

The TEI guidelines have also been followed when tagging the texts, depending on their typology. For instance, within the text (<text>) and body (<body>) of prose documents, the structure comprises the title of the document (<head>) and chapter divisions (<div type="kapitulua" n="">) with their corresponding numbers. Each chapter contains a title (<head>) and a list of verses (<list type="bertsikulua">), all of them numbered and preceded, where applicable, by a short header (<p>) that summarises their content. These tags are not needed in every case: in Gure Aita, for example, only the title (<head>) and a single paragraph (<p>) appear.
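
The prose structure described above can be sketched with a short Python script. This is only an illustration: the text content, the chapter number and the n attributes on <item> are invented, and the TEI namespace is left out to keep the example small. Only the element names and the type values "kapitulua" and "bertsikulua" come from the description above.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the text content and the n attributes on <item>
# are assumptions made for illustration, not taken from the corpus.
SAMPLE = """
<text>
  <body>
    <head>Document title</head>
    <div type="kapitulua" n="1">
      <head>Chapter title</head>
      <p>Short summary of the verses that follow</p>
      <list type="bertsikulua">
        <item n="1">First verse</item>
        <item n="2">Second verse</item>
      </list>
    </div>
  </body>
</text>
"""


def list_verses(xml_text):
    """Return (chapter number, verse number, verse text) tuples."""
    root = ET.fromstring(xml_text)
    verses = []
    for chapter in root.iter("div"):
        # Only divisions tagged as chapters are of interest here.
        if chapter.get("type") != "kapitulua":
            continue
        for verse_list in chapter.findall("list[@type='bertsikulua']"):
            for item in verse_list.findall("item"):
                text = (item.text or "").strip()
                verses.append((chapter.get("n"), item.get("n"), text))
    return verses


for chapter_n, verse_n, verse_text in list_verses(SAMPLE):
    print(f"{chapter_n}.{verse_n} {verse_text}")
```

A document such as Gure Aita, with only a <head> and a single <p>, would simply yield an empty list from this function, since it contains no chapter divisions.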

The following chapters of the TEI Guidelines describe the features that have been applied to the structure of the documents.

Chapter 2: The TEI Header deals with the problems of describing an encoded work, so that the text itself, its source and its revisions are all documented.
Chapter 3: Elements available in all TEI documents describes the elements that may appear in any kind of text and the tags used to mark them in all TEI documents.
Chapter 4: Default text structure describes the default high-level structure for all TEI documents.
Chapter 7: Performance texts covers the encoding of printed dramatic texts, screenplays or radio scripts, and written transcriptions of any other form of performance.
Chapter 10: Manuscript description defines the elements used to provide detailed descriptive information about handwritten primary sources.
Chapter 15: Language corpora describes the options for combining the corpus, its headers, and a text or group of texts within a single TEI document.
Chapter 16: Linking, segmentation and alignment describes a number of ways in which encoders may represent analyses of the structure of texts, either external or internal, which are not necessarily linear or hierarchical.

Footnotes, which are consistently relevant in this edition, have been kept as they appear in the printed edition, with the necessary adaptations. In addition, they have been classified by type into four colour-coded groups, which makes searching for references much easier:

Textual notes in brown.
Historical notes in yellow.
Linguistic notes in red.
Mixed notes, which combine the three previous types, in blue.

Every sic from the printed version has been kept, except for a few that have been corrected.

All instances of sic are also highlighted in green so that the user can identify them more easily.

The tagging language is English, except for attribute values that the guidelines leave undefined, which have been given in Basque (for example, type="kapitulua"). This may be a first step towards tagging in Basque.

Finally, to facilitate comparison between dialects, a visual interface provides a collation of up to four versions of the same text.
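
A collation of this kind can be pictured as aligning the same numbered verses across versions. The following is a minimal sketch under stated assumptions, not the project's interface: the function name collate, the data layout (one dictionary of verse number to text per version) and the sample strings are all hypothetical.

```python
MAX_VERSIONS = 4  # the interface described above collates up to four versions


def collate(versions):
    """Align verses by number across versions.

    `versions` maps a version label to a dict of verse number -> verse text.
    Returns a list of (verse number, [text per version]) rows; a verse that
    is missing from a version is represented by an empty string.
    """
    if len(versions) > MAX_VERSIONS:
        raise ValueError(f"at most {MAX_VERSIONS} versions can be collated")
    labels = list(versions)
    # Collect every verse number that occurs in at least one version.
    numbers = sorted({n for verses in versions.values() for n in verses})
    return [(n, [versions[label].get(n, "") for label in labels]) for n in numbers]


# Hypothetical sample data, invented for illustration.
sample = {
    "version A": {1: "verse 1 in A", 2: "verse 2 in A"},
    "version B": {1: "verse 1 in B"},
}

for number, texts in collate(sample):
    print(number, " | ".join(texts))
```

Keying the rows on verse number keeps the versions aligned even when one of them lacks a verse, which is the case a side-by-side comparison most needs to make visible.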
