Workplan 1
Evaluating language technology methods in the development of bilingual
dictionaries.
Duration: 2008/08/01-2008/12/30.
Partners involved: RIL
Deliverable: Technical report
Description:
This WP assesses and compares methods for automating the development of bilingual dictionaries from existing machine-readable language resources. The main purpose of the research is to determine how far language technology methods can replace manual work and contribute to the automatic creation of bilingual dictionaries. The outcome of the WP is a detailed feasibility study and a case study on a prototype method, together with the resources that can be used efficiently in the dictionary development process. The WP concludes with a proposal for the most usable methodology and resources.
Main Activities:
1. Exploring current methods for the automatic creation of lexical resources
This task is principally a survey of related research, focusing on two main directions:
(i) research investigating how language technology methods can enable the automatic mapping of dictionary entries between two languages using existing dictionaries as intermediary resources, e.g. in a hub-and-spoke model, which creates a third dictionary by linking two existing bilingual lexical databases.
(ii) research on the automatic construction of bilingual resources from aligned parallel corpora. Over the last decade several models have been developed (see References below) which could be used in bilingual dictionary development.
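The linking step described under direction (i) can be illustrated with a minimal sketch. It assumes that both existing dictionaries are available as simple headword-to-translations mappings and that they share one language (the hub). The function name link_via_hub and the German-English-French toy entries are hypothetical and serve only as illustration; they are not part of the WP resources.

```python
from collections import defaultdict


def link_via_hub(src_to_hub, hub_to_tgt):
    """Compose two bilingual dictionaries through a shared hub language.

    Both arguments map a headword to a set of translations. The result maps
    each source headword to its candidate target words, and each candidate
    to the set of hub words that support it.
    """
    candidates = defaultdict(lambda: defaultdict(set))
    for src_word, hub_words in src_to_hub.items():
        for hub_word in hub_words:
            # Hub words missing from the second dictionary yield no candidates.
            for tgt_word in hub_to_tgt.get(hub_word, ()):
                candidates[src_word][tgt_word].add(hub_word)
    # Convert the nested defaultdicts to plain dicts for the caller.
    return {src: dict(tgts) for src, tgts in candidates.items()}


# Hypothetical toy dictionaries: German-English and English-French.
de_en = {
    "Bank": {"bank", "bench"},
    "Ufer": {"bank", "shore"},
}
en_fr = {
    "bank": {"banque", "rive"},
    "bench": {"banc"},
    "shore": {"rive", "rivage"},
}

de_fr = link_via_hub(de_en, en_fr)
for headword, targets in sorted(de_fr.items()):
    for target, hubs in sorted(targets.items()):
        print(headword, target, sorted(hubs))
```

In this toy data the polysemous hub word "bank" links "Bank" to "rive" and "Ufer" to "banque", neither of which is a valid translation. The number of supporting hub words is one simple signal a method could use to rank candidates: "Ufer" and "rive" are supported by two hub words, the spurious pairs by only one.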
2. Studying the usability and the structure of available resources
By experimenting with a candidate language pair, this task focuses on a case study investigating how well selected methods perform in the construction of resources and to what extent they can alleviate the need for manual development.
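As an illustration of what such a case study might measure, the sketch below extracts translation candidates from a sentence-aligned corpus and scores them against a manually compiled list. It uses the Dice coefficient over sentence-level co-occurrence, a generic association measure chosen here for brevity; it is not claimed to be the method of any of the works listed under References. The function names, the threshold, the toy English-German corpus and the gold list are all assumptions made for the example.

```python
from collections import Counter
from itertools import product


def dice_lexicon(sentence_pairs, threshold=0.5):
    """Propose word translations from sentence-aligned text.

    Each word pair is scored with the Dice coefficient
    2 * joint / (source_count + target_count), where the counts are the
    numbers of sentence pairs in which the words occur.
    """
    src_freq, tgt_freq, pair_freq = Counter(), Counter(), Counter()
    for src_sent, tgt_sent in sentence_pairs:
        # Count word types, so a word is counted once per sentence pair.
        src_types = set(src_sent.lower().split())
        tgt_types = set(tgt_sent.lower().split())
        src_freq.update(src_types)
        tgt_freq.update(tgt_types)
        pair_freq.update(product(src_types, tgt_types))

    lexicon = {}
    for (src_word, tgt_word), joint in pair_freq.items():
        score = 2 * joint / (src_freq[src_word] + tgt_freq[tgt_word])
        if score >= threshold:
            lexicon.setdefault(src_word, {})[tgt_word] = score
    return lexicon


def precision(lexicon, gold_pairs):
    """Share of proposed pairs that also appear in the manual gold list."""
    proposed = {(s, t) for s, targets in lexicon.items() for t in targets}
    if not proposed:
        return 0.0
    return len(proposed & gold_pairs) / len(proposed)


# Hypothetical toy corpus (English-German) and manually compiled gold list.
corpus = [
    ("the house is small", "das haus ist klein"),
    ("the house is old", "das haus ist alt"),
    ("the car is small", "das auto ist klein"),
    ("a car", "ein auto"),
]
gold = {
    ("the", "das"), ("house", "haus"), ("is", "ist"), ("small", "klein"),
    ("old", "alt"), ("car", "auto"), ("a", "ein"),
}

lexicon = dice_lexicon(corpus, threshold=1.0)
print(lexicon)
print(precision(lexicon, gold))
```

In this toy corpus "the" and "is" always occur together, so the measure cannot separate "das" from "ist" and proposes both for each. Errors of this kind are what the manual check in a case study would have to quantify.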
References
Dan Tufiş, Ana Maria Barbu, Radu Ion. 2004. Extracting Multilingual Lexicons from Parallel Corpora. Computers and the Humanities. 38:2. 163-189.
Dan Melamed. 1997. A Word-to-Word Model of Translational Equivalence. Proceedings of ACL 1997. 490-497.
D. Hiemstra. 1996. Using statistical methods to create a bilingual dictionary. Master's thesis. University of Twente.
P. Koehn. 2005. Europarl: A Parallel Corpus for Statistical Machine Translation. MT Summit.
Jean-Michel Renders, Hervé Déjean, Eric Gaussier. 2003. Assessing automatically extracted bilingual lexicons for CLIR in vertical Domains. Lecture Notes in Computer Science. Advances in Cross-Language Information Retrieval. 363-371.
Ismael Pascual Nieto, Michael O'Donnell. 2007. Flexible statistical construction of bilingual dictionaries. Revista de la Sociedad Española para el Procesamiento del Lenguaje Natural (SEPLN). 249-258.
Helena M. Caseli, Maria das Graças V. Nunes, Mikel L. Forcada. 2008. From free shallow monolingual resources to machine translation systems: easing the task. Mixing Approaches To Machine Translation, MATMT2008. 41-48.
Helena M. Caseli, Maria das Graças V. Nunes, Mikel L. Forcada. 2006. Automatic induction of bilingual resources from aligned parallel corpora: application to shallow-transfer machine translation. Machine Translation. 20:4. 227-245.
