This article presents a computational analysis of how discourse markers are translated in a parallel Spanish-Arabic corpus. The research is divided into three main sections. The first section describes the resources used in the study, including the main characteristics of the corpus and the pragmatic annotation model (PRAGMATEXT) used to tag and classify the discourse markers. The annotation model addresses six semantic-pragmatic phenomena: emotional language, discursive relations, speech acts, modalization, evidentiality, and deixis. The second section analyzes the discourse markers in the Spanish corpus in detail, identifying the discursive phenomena encoded by the different markers and their frequencies in the corpus. Finally, the third section covers the technical and computational strategies adopted to detect equivalent discourse markers in the aligned Arabic corpus, based on the pragmatic information tagged in the Spanish corpus, followed by an evaluation of these strategies. The article closes with conclusions on the benefits of building resources enriched with pragmatic information.
