1. Introduction

This paper is the result of an experiment carried out by a group of transcribers at the Laboratorio de Lingüística Informática (LLI) of the Universidad Autónoma de Madrid, after the recording and transcription phases of the C-ORAL-ROM project had been completed.

The goal of the experiment was to test certain hypotheses that had arisen after the transcription process, when the team tried to determine which communicative interactions had caused the most difficulties during transcription, and why.

The original hypothesis related these transcription problems to the frequency of occurrence of two kinds of linguistic phenomena typical of spoken interaction:

  • Production features, such as fragmented words, supports, retractings, etc.
  • Interaction features, such as the number of turns or the amount of overlapping speech (Llisterri, 1997).

If this held, it would follow that the more frequent these phenomena were in a spoken interaction, the more time and effort the linguist would need to transcribe it.

However, the team aimed to rationalize these impressions and to confirm their causes empirically. Thus, the following objectives were set:

  • Defining orality.
  • Developing a computational tool to help establish a relation between conversational genres and orality features.
  • Showing how these features vary within the corpus depending on register.
  • Analyzing the data and verifying how orality relates to the difficulties encountered in the transcription process.
  • Establishing a typology of transcription problems based on the results of the analysis.

Before describing the experiment in more detail and analyzing its results, however, it is necessary to discuss some features of the corpus used, particularly those related to its design and distribution.

