Work Outlines

The present study is organized as follows. After this introduction, the second section explains the guidelines that define our theoretical pragmatic framework. Building on this framework, the third section describes the typology adopted for classifying discourse markers and how it is reflected in PRAGMATEXT, the XML annotation model for pragmatic information. The fourth section discusses the characteristics of the corpus and the methodology used for the automatic detection, classification and annotation of discourse markers in the three languages. This approach addresses two types of challenges. The first is the degree of ambiguity of discourse markers, especially in Spanish, which serves as our starting point for tagging the Arabic corpus. The second is the set of technical strategies adopted to identify and tag discourse markers in the three languages: Spanish, English and Arabic. Results of the Arabic tagging are also reported.

Finally, the last section draws conclusions and outlines future work, pointing out some of the applications that could benefit from such a resource.

