
TYPOLING - Newsletter No. 139

- International Workshop on Multilingual Corpora Annotation, Monday 3 October 2011
Conference room, Building D
CNRS campus, Villejuif.

Multilingual corpora concentrate most of the problems raised by monolingual corpora and add some challenges of their own. The central issues concern variation and non-standard forms, which large national corpora often ignore or control only through rather general criteria (such as broad typologies of texts and discourses defining the kind of data collected).
These variations go beyond the internal variation observable within a single language; they often even call into question the categorization of linguistic forms as belonging to one language or variety rather than another.
Corpora containing code-switching, code-mixing, hybridization phenomena, and heterogeneous uses of a lingua franca variably mobilized within diverse social practices and linguistic competences raise a range of methodological and theoretical questions, such as problems of identification, notation, transcription, and categorization of hybrid forms.
These problems have crucial consequences for the annotation of corpora, for the definition and delimitation of what a multilingual corpus is, for the choice of relevant contexts of practice to be documented, etc.

The workshop aims to debate these problems on the basis of:
a) databases of multilingual corpora that have already been completed, for which examples of the problems and solutions will be given;
b) excerpts of multilingual corpora on which problems of transcription, annotation, and exploitation will be illustrated and discussed.
