This paper outlines the first phase of the PRAGMATEXT project. The aim of PRAGMATEXT is to introduce pragmatic knowledge into the transcriptions of C-ORAL-ROM, a corpus of spontaneous spoken Spanish. The paper is divided into four sections. The first section presents the most relevant features of the C-ORAL-ROM corpus. The second describes the pragmatic-discursive annotation model. The phenomena tagged are: emotional discourse, argumentative operations, modalization operations, evidentiality, phraseological units with metaphoric meaning, and speech acts in interrogative clauses. The third section resolves three challenges that arise when implementing this annotation model in XML: (1) pragmatic-discursive operations are expressed at different grammatical levels (lexicon, prosody, syntax, etc.); (2) a single linguistic unit can carry several types of pragmatic information as attributes; (3) pragmatic knowledge is not expressed by a closed word class. The fourth section discusses future work and mentions some uses of a corpus tagged with pragmatic knowledge, such as man-machine conversational systems and the teaching of Spanish as a foreign language.