
TYPOLING - Newsletter no. 131

The linguistics federations are pleased to announce the following event:

Workshop on multilingual corpus annotation, Paris, 3 October 2011

Co-organized by the two linguistics federations: Isabelle Léglise (Fédération Typologie et Universaux linguistiques) and Lorenza Mondada (Institut de Linguistique Française).

Free admission. So that a room of suitable size can be booked, anyone interested is asked to contact the organizers in advance.


Workshop on multilingual corpus annotation, Paris, 3 October 2011

Organizers: Isabelle Léglise and Lorenza Mondada, representing the two consortia that federate the CNRS research labs in linguistics, Federation TUL (Typologie et Universaux Linguistiques) and ILF (Institut de Linguistique Française).

Multilingual corpora concentrate most of the problems raised by monolingual corpora, together with some additional challenges of their own. Central issues concern variation and non-standard forms, which large national corpora often ignore or handle only through quite general criteria (such as broad typologies of texts and discourse that define the kind of data collected). This variation goes beyond the internal variation observable within a single language: it often calls into question whether a given linguistic form belongs to one language or variety rather than another. Corpora containing code-switching, code-mixing, hybridization phenomena, or heterogeneous uses of a lingua franca, mobilized in varying ways across diverse social practices and levels of linguistic competence, raise a range of methodological and theoretical questions, such as how to identify, notate, transcribe, and categorize hybrid forms. These problems have crucial consequences for corpus annotation, for defining and delimiting what counts as a multilingual corpus, for choosing which contexts of practice to document, and so on.

The workshop aims to debate these problems on the basis of a) existing databases of multilingual corpora, for which examples of problems and their solutions will be presented, and b) excerpts of multilingual corpora used to illustrate and discuss problems of transcription, annotation, and exploitation.