This paper presents a pragmatic annotation model and its implementation in XML. So far, the model has been applied to the spontaneous spoken corpus C-ORAL-ROM, which covers a wide range of communicative situations and a large number of linguistic registers. The pragmatic phenomena tagged are emotions, discourse relations, modalization, evidentiality, phraseological units, metaphor, and speech acts in interrogative utterances. In the XML PRAGMATEXT implementation, the structure of the text is tagged at the element level, whereas the pragmatic information is introduced at the attribute level, so that more than one pragmatic phenomenon can be recorded simultaneously for a single word. Sentence modifiers, such as conjunctions, interjections, and sentence adverbs, have also been studied. A telephone conversation has been manually tagged and will be used to validate the performance of the XML annotation model. This example is included in Appendix C.
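The split between element-level structure and attribute-level pragmatic information can be illustrated with a minimal sketch. The element name `w`, the `utterance` wrapper, and the attribute names and values below are hypothetical, since the actual PRAGMATEXT tag set is not listed in this section; the sketch only shows how several pragmatic labels can sit on one word without nesting extra elements.

```python
import xml.etree.ElementTree as ET


def annotate_word(parent, text, **pragmatic):
    """Append a word element; pragmatic labels go in attributes, not child elements.

    The element name "w" and all attribute names are illustrative assumptions.
    """
    word = ET.SubElement(parent, "w", {k: str(v) for k, v in pragmatic.items()})
    word.text = text
    return word


def build_example():
    """Build a two-word utterance in which one word carries two phenomena."""
    utterance = ET.Element("utterance", {"speaker": "A"})
    # One word, two pragmatic phenomena recorded side by side as attributes.
    annotate_word(utterance, "quizás", modalization="doubt", evidentiality="inferred")
    # A word with no pragmatic annotation still keeps its structural element.
    annotate_word(utterance, "llueva")
    return utterance


if __name__ == "__main__":
    print(ET.tostring(build_example(), encoding="unicode"))
```

Because each phenomenon occupies its own attribute, adding a further label to a word does not alter the element structure of the text.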
The second part of the project, the semiautomatic tagging of the corpus, is currently underway. The tagging strategies for recognizing pragmatic information are based on context rules that draw on prosodic and categorical information. The final part of the project focuses on two possible applications: the improvement of human-machine conversational systems and the teaching of Spanish as a foreign language.
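As a rough illustration of what a context rule combining prosodic and categorical information might look like, the sketch below applies two invented rules to a token sequence. The category labels (`INTJ`, `ADV`, `VERB`), the contour values, and both rules are assumptions made for the example and are not the project's actual rules; in a semiautomatic workflow the proposed labels would still be reviewed by an annotator.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    form: str
    category: str  # part-of-speech label (hypothetical tag set)
    contour: str   # prosodic contour label, e.g. "rising", "falling", "flat"
    tags: dict = field(default_factory=dict)  # proposed pragmatic attributes


def rule_interjection_emotion(tokens, i):
    """Illustrative rule: an interjection with a rising contour suggests surprise."""
    t = tokens[i]
    if t.category == "INTJ" and t.contour == "rising":
        return ("emotion", "surprise")
    return None


def rule_adverb_modalization(tokens, i):
    """Illustrative rule: an utterance-initial adverb followed by a verb
    suggests epistemic modalization."""
    if (i == 0 and tokens[i].category == "ADV"
            and len(tokens) > 1 and tokens[1].category == "VERB"):
        return ("modalization", "epistemic")
    return None


RULES = [rule_interjection_emotion, rule_adverb_modalization]


def apply_rules(tokens, rules=RULES):
    """Run every rule at every position and store the proposed labels."""
    for i, tok in enumerate(tokens):
        for rule in rules:
            result = rule(tokens, i)
            if result is not None:
                attribute, value = result
                tok.tags[attribute] = value
    return tokens
```

Each rule returns an attribute-value pair, so its output maps directly onto an attribute-level annotation of the corresponding word.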
References
Albelda, M. (2002) La intensificación en el español coloquial. http://www.tdx.cesca.es/TDX-0701105-125232/
Briz, A. (1995) La atenuación en la conversación coloquial. Una categoría pragmática. In Cortés, L. (ed.), El español coloquial. Actas del I Simposio sobre análisis del discurso oral. Almería: Servicio de Publicaciones, pp. 103–122; expanded in Briz (1998), chs. 4 and 6.
Corpas, L. (1994) Manual de fraseología española. Madrid: Gredos.
Cresti, E. et al. (2002) The C-ORAL-ROM project. New methods for spoken language archives in a multilingual romance corpus. Proceedings of LREC 2002, Las Palmas de Gran Canaria.
Devillers, L., Vasilescu, I. and Lamel, L. (2002) Annotation and detection of emotion in a task oriented human-human dialog corpus. International Standards for Language Engineering, Edinburgh.