The phenomena described here are:

  1. Emotional discourse
  2. Discourse relations
  3. Modalization
  4. Evidentiality
  5. Phraseological Units
  6. Metaphor
  7. Speech Acts in interrogative clauses

Each phenomenon is analyzed following the methodology adopted to design the tagset: description of the pragmatic phenomenon; classification of its pragmatic categories; inventory of the linguistic forms in which the phenomenon is codified; and formalization in XML.

Emotional discourse

The analysis of emotions has been addressed from different knowledge areas and with different theoretical frameworks. However, scientific research on emotions, and its systematization, has been pursued only very recently.

In the linguistic domain, pragmatics has focused on the study of emotions through interjections. The most interesting contributions highlight the scope of these particles over the interpretation of the statements:

  1. Franco ha muerto ¡bien! → Franco has died, great!
  2. Franco ha muerto ¡qué pena! → Franco has died, what a pity!
  3. ¡ah! Franco ha muerto → Oh! Franco has died

In the above examples, the statements are interpreted in very different ways. The same fact (Franco’s death) causes different reactions in the hearer: happiness in (1), sorrow in (2) and surprise in (3).

There is neither a closed list of these emotions nor a complete description of the different uses of each interjection in Spanish that would allow the meaning of these particles to be systematized and the emotions they codify to be classified. On the contrary, the consulted literature insists that the meaning ultimately depends on the context.

The PRAGMATEXT project established two types of interjections: a first group which presents a positive or negative evaluation of a particular situation; and a second group which expresses other emotions such as surprise or wish. When the interjection meaning is difficult to define, its context of use is described in a special tag.

Apart from interjections, other categories are also used to express emotions, such as sentence adverbs, exclamatory sentences, and prepositional phrases. This is a brief list of linguistic forms from the inventory produced: hala, hostia, joder, de puta madre, genial, madre mía de mi vida y de mi corazón, Jesús bendito, afortunadamente, desgraciadamente, superbien, fatal, de pena, etc.

In XML, these linguistic forms are tagged in the following way:

  • <PI ED=evaluative/interjective type=positive/negative/other context=description>

An example:

  • <PI ED=emotion type=other context=surprise>¡dios mío!</PI>

<PI ED=emotion type=other context=surprise>¡oh my god!</PI>
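The tag notation used in these examples leaves attribute values unquoted, so a standard XML parser would reject it. A minimal sketch of how such annotations could be read back is given here; the function name `parse_tags` and the regular expressions are illustrative assumptions, not part of PRAGMATEXT, and the sketch assumes attribute values contain no spaces.

```python
import re

# One tagged span in the notation shown above, e.g.
# <PI ED=emotion type=other context=surprise>¡dios mío!</PI>
# Assumption: attribute values are unquoted and contain no whitespace.
TAG_RE = re.compile(
    r"<(?P<name>\w+)(?P<attrs>(?:\s+\w+=[^\s>]+)*)\s*>(?P<text>.*?)</(?P=name)>",
    re.DOTALL,
)
ATTR_RE = re.compile(r"(\w+)=([^\s>]+)")


def parse_tags(annotated):
    """Return a list of (tag name, attribute dict, tagged text) triples."""
    spans = []
    for match in TAG_RE.finditer(annotated):
        attrs = dict(ATTR_RE.findall(match.group("attrs")))
        spans.append((match.group("name"), attrs, match.group("text")))
    return spans


if __name__ == "__main__":
    sample = "<PI ED=emotion type=other context=surprise>¡dios mío!</PI>"
    for name, attrs, text in parse_tags(sample):
        print(name, attrs, text)
```

The same function would also read the IP tags used in the following sections, since it does not depend on a particular tag or attribute name.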

Discourse relations

This section describes how human beings argue their perception of the world. The study of discourse relations has often been approached through the analysis of discourse markers. However, according to the theoretical principles mentioned in section 2, the model operates from the abstract plane to the concrete one: a typology of discourse relations is designed first and then mapped to the linguistic forms that codify it. The model does not use the discourse marker category because it is ambiguous: discourse markers codify not only discourse relations but also other discursive phenomena such as modalization.

In the model, the discourse relations are: reformulation, condition, change of focus, manner, contrast, result and purpose.

Conjunctions are the most important category used to express discourse relations, although others can be found, such as prepositional phrases, adverbs and noun phrases. For example: en primer lugar, aunque, si, pero, y, además, incluso, finalmente, porque, o sea, es decir, para, con el fin de, a pesar de, como consecuencia de, de tal modo que, primero, en cuanto a x, a este respecto, etc.

The formalization in XML looks as follows:

  • <IP DR={reason/purpose/hypothesis/reformulation/etc.}>

An example of a linguistic form with argumentative meaning:

  • <IP DR=change_focus>en cuanto a</IP>

<IP DR=change_focus>as regards</IP>
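Because many of the listed forms are multiword units, a lexicon-based tagger has to match the longest form first. The sketch below illustrates one way to do this; the function `tag_discourse_relations` and the lexicon are hypothetical, and only the pairing of "en cuanto a" with change_focus comes from the text, while the other category assignments are assumptions made for illustration.

```python
import re

# Hypothetical mini-lexicon. Only "en cuanto a" -> change_focus is taken from
# the text; the remaining assignments are illustrative assumptions.
DR_LEXICON = {
    "en cuanto a": "change_focus",
    "con el fin de": "purpose",
    "para": "purpose",
    "pero": "contrast",
    "es decir": "reformulation",
}


def tag_discourse_relations(text, lexicon=DR_LEXICON):
    """Wrap every lexicon form found in text in an <IP DR=...> tag."""
    # Longer forms are tried first so that multiword units win over their parts.
    forms = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(form) for form in forms) + r")\b",
        re.IGNORECASE,
    )

    def wrap(match):
        form = match.group(1)
        return f"<IP DR={lexicon[form.lower()]}>{form}</IP>"

    return pattern.sub(wrap, text)


if __name__ == "__main__":
    print(tag_discourse_relations("En cuanto a eso, pero no"))
```

A pure lookup of this kind ignores context, so its output would still need manual revision, in line with the semi-automatic tagging the model aims at.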

Modalization: hedges, boosters and interactives

This group is made up of phenomena related to social interaction, in which two or more people share information and expect to be understood and, at the same time, to reinforce their emotional bonds.

Through modalization, human beings express agreement or disagreement with the truth of their own and each other's statements. Two types of modalization are distinguished: hedges and boosters. Hedges are used for two reasons: when the speaker is unsure of the truth of a statement, or when the speaker disagrees with the interlocutor but does not intend to create conflict. Boosters are employed to reinforce commitment to the truth of a statement expressed or heard, but also to strengthen solidarity ties with the interlocutor.

Finally, the last group in this section comprises the interactive forms. These linguistic forms serve to open or maintain the communicative channel, offer options, confirm that the message has been heard and ask for the hearer’s agreement.

Some examples of each type follow:

  1. Hedges: bueno, o sea, quizás, etc.
  2. Boosters: claro, evidentemente, por supuesto, seguramente, indudablemente, clarísimamente, etc.
  3. Interactives: ah, eh?, no?, sabes?, vale?, ok?, venga, hola, oye, adiós, buenos días, …

In XML format:

  • <IP MOD=hedge/booster/interactive>

Example:

  • <IP MOD=interactive>vale?</IP>

<IP MOD=interactive>ok?</IP>
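Some forms appear in more than one inventory: "o sea" is listed both among the hedges and among the forms that express discourse relations. The following sketch shows how a lookup could return every candidate tag for a form and leave the final choice to an annotator. The names `build_index` and `candidate_tags` are hypothetical, and assigning "o sea" and "es decir" to reformulation is an assumption, since the text does not name their category.

```python
from collections import defaultdict

# Inventories copied from the examples given in the text.
MODALIZATION = {
    "hedge": ["bueno", "o sea", "quizás"],
    "booster": ["claro", "evidentemente", "por supuesto", "seguramente",
                "indudablemente", "clarísimamente"],
    "interactive": ["ah", "eh?", "no?", "sabes?", "vale?", "ok?", "venga",
                    "hola", "oye", "adiós", "buenos días"],
}
# Assumption: these two forms express reformulation; the text lists them among
# discourse-relation forms without stating which relation they codify.
DISCOURSE = {"reformulation": ["o sea", "es decir"]}


def build_index():
    """Map each form to the list of (attribute, value) pairs it may receive."""
    index = defaultdict(list)
    for attribute, inventory in (("MOD", MODALIZATION), ("DR", DISCOURSE)):
        for value, forms in inventory.items():
            for form in forms:
                index[form].append((attribute, value))
    return index


def candidate_tags(form, index=None):
    """Return every tag the lexicon allows for form, in no preferred order."""
    index = build_index() if index is None else index
    pairs = index.get(form.lower(), [])
    return [f"<IP {attr}={value}>{form}</IP>" for attr, value in pairs]


if __name__ == "__main__":
    for tag in candidate_tags("o sea"):
        print(tag)
```

Returning a list rather than a single tag keeps the ambiguity visible instead of resolving it silently.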


Evidentiality

The degree of confidence in the truth of a statement depends, to a great extent, on the origin of the knowledge. Languages mark this fact through what linguists have called evidentiality, defined as the verbal codification of the knowledge source.

Unlike in other languages, evidentiality in Spanish is not expressed through a closed system of grammatical markers, so it is a difficult phenomenon to recognize. At present, although we have access to a great deal of information, we are sometimes unable to assess whether that information is reliable. Evidentiality tagging may be a profitable way to address this problem; hence a typology must be established for it. The literature on this subject offers some models; however, a useful typology is one that makes semi-automatic tagging possible. It is critical to find a clear correspondence between the phenomenon, its categories and the linguistic forms through which it is codified.

The model uses the following classification of knowledge:

  1. experienced through the senses
  2. obtained through intellectual inference
  3. third-party knowledge
  4. written knowledge
  5. oral or popular knowledge

The formalization in XML looks as follows:

  • <IP Evidentiality={1/2/3/4/5}></IP>


  • <IP Evidentiality=5>al parecer</IP>

<IP Evidentiality=5>apparently</IP>
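The numeric codes can be made self-documenting with an enumeration. The sketch below assumes that codes 1 to 5 follow the order of the knowledge classification listed above, which is consistent with "al parecer" receiving code 5; the class and function names are hypothetical. It serializes with Python's standard `xml.etree.ElementTree`, which quotes attribute values as well-formed XML requires, unlike the abbreviated notation used in the text.

```python
from enum import IntEnum
import xml.etree.ElementTree as ET


class Evidentiality(IntEnum):
    # Assumption: the codes follow the order of the classification in the text.
    SENSES = 1
    INFERENCE = 2
    THIRD_PARTY = 3
    WRITTEN = 4
    ORAL_OR_POPULAR = 5


def tag_evidential(form, source):
    """Return form wrapped in an IP element carrying its evidentiality code."""
    element = ET.Element("IP", {"Evidentiality": str(int(source))})
    element.text = form
    return ET.tostring(element, encoding="unicode")


if __name__ == "__main__":
    print(tag_evidential("al parecer", Evidentiality.ORAL_OR_POPULAR))
```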

Linguistic convention: phraseological units and metaphor

Second-language students realize that it is not enough to learn the grammar and vocabulary of a language: it is also necessary to know how its community usually acts in a specific situation and to select the appropriate register or style, in short, the most common way of saying something. Therefore, metaphors and phraseological units play a very important role in how language becomes conventional.

The model distinguishes three types of phraseological units: collocations, locutions and proverbs. Both collocations and locutions are groups of words that usually appear together, but the former have a compositional meaning and the latter do not. Proverbs, on the other hand, are treated as sentences rather than groups of words.

Cognitive Linguistics has highlighted that metaphorical language is a way of expressing our knowledge of reality. Through metaphor, abstract concepts are expressed as concrete concepts, which are also culture-dependent.

This phenomenon, together with phraseological units, is the most typical part of a language. Because of this, the model uses tags for the metaphorical expressions included in phraseological units. When tagging a phraseological unit, the annotator must decide:

  1. if it is a collocation, a locution or a proverb
  2. if the expression contains a metaphor
  3. the semantic domains present in the metaphor

The formalization in XML looks as follows:

  • <IP PU={collocation/locution/proverb} MET=true Source={body, space, nature, etc.} Target={time, prudence, wisdom…}></IP>


  • <IP PU=proverb MET=true Source=nature Target=prudence>más vale pájaro en mano que ciento volando</IP>

<IP PU=proverb MET=true Source=nature Target=prudence>a bird in the hand is worth two in the bush</IP>
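The three decisions an annotator makes for a phraseological unit can be captured in a small record with consistency checks. The sketch below is illustrative only: the class `PhraseologicalUnit` and its field names are hypothetical, and it assumes that Source and Target are given only when the unit contains a metaphor and that the MET attribute is simply omitted otherwise, which the text does not specify.

```python
from dataclasses import dataclass
from typing import Optional

PU_TYPES = {"collocation", "locution", "proverb"}


@dataclass
class PhraseologicalUnit:
    text: str
    pu: str                        # decision 1: collocation, locution or proverb
    metaphor: bool = False         # decision 2: does it contain a metaphor?
    source: Optional[str] = None   # decision 3: source semantic domain
    target: Optional[str] = None   # decision 3: target semantic domain

    def __post_init__(self):
        if self.pu not in PU_TYPES:
            raise ValueError(f"unknown PU type: {self.pu}")
        # Assumption: Source and Target are only meaningful when MET=true.
        if self.metaphor and not (self.source and self.target):
            raise ValueError("a metaphor needs both source and target domains")
        if not self.metaphor and (self.source or self.target):
            raise ValueError("source/target given without a metaphor")

    def to_tag(self):
        """Render the unit in the abbreviated tag notation used in the text."""
        attrs = [f"PU={self.pu}"]
        if self.metaphor:
            attrs += ["MET=true", f"Source={self.source}", f"Target={self.target}"]
        return f"<IP {' '.join(attrs)}>{self.text}</IP>"


if __name__ == "__main__":
    unit = PhraseologicalUnit(
        "más vale pájaro en mano que ciento volando",
        "proverb", True, "nature", "prudence",
    )
    print(unit.to_tag())
```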

Interrogative speech acts

A speech act is a communicative unit that expresses the speaker’s intention. There is no one-to-one relation between the linguistic form of a statement and its communicative intention. For this reason, many linguists think the automatic recognition of speech acts is impossible, because it depends on the context and, above all, on power relationships. Until now, most initiatives have been developed for specific domains. At this stage, PRAGMATEXT focuses only on interrogative utterances, which are studied in all domains of the CORAL ROM corpus.

The formalization in XML looks as follows, for example:

  • <IP SA=request>¿Puedo abrir la ventana?</IP>

<IP SA=request>May I open the window, please?</IP>

