
Progress report submitted to the 6th Annual EFNIL Conference

Progress Report of the EFNILEX Project for 2008



Project members:
Tamás Váradi, project coordinator, Hungarian Academy of Sciences
Enikő Héja, Hungarian Academy of Sciences
Johan Van Hoorde, Nederlandse Taalunie, Netherlands
Annemieke Hoorntje, Nederlandse Taalunie, Netherlands
Sabine Kirchmeier, Dansk Sprognævn, Denmark
John Simpson, Oxford English Dictionary, UK
Jolanta Zabarskaitė, Lietuvių kalbos institutas, Lithuania

Mission
The objective of the EFNILEX project is to develop a modern, cost-effective method for producing bilingual and multilingual dictionaries for the general public, covering language pairs for which no such dictionaries exist yet. The intended dictionaries would focus on areas of public life related to social mobility and contain 25,000 to 40,000 entries. They should be made freely available on the internet.

Resources
The Executive Committee has allocated €15,000 for the operation of the project. Originally this amount was meant to cover the cost of project group meetings, but savings on travel costs (achieved by holding the project group meeting in Ghent in April alongside the meeting of the EC, and by holding the second meeting as a teleconference) made it possible to shift part of the project group budget to hiring personnel.
As a result, computational linguist Enikő Héja was hired as of 1 August to work under the supervision of project leader Tamás Váradi.

Tasks
At the EFNILEX Project Group meeting on 23rd April 2008 it was decided that the project would launch activities in the following areas:
Starting an inventory of the available dictionaries, corpora and tools
A survey constructed for the Scandinavian languages was used as a point of departure. After the phone conference held on 14 September 2008, we modified the survey following the suggestions of Annemieke Hoorntje and prepared online questionnaires, which make data collection less time-consuming. The questionnaires are available at the following links:
EFNILEX Questionnaire (Resources)
http://spreadsheets.google.com/viewform?key=pjQPtI17W_lmlNQP1igMTVQ&hl=en_GB
EFNILEX Questionnaire (Tools)
http://spreadsheets.google.com/viewform?key=pjQPtI17W_lkCQU_DS3FFAg&hl=en_GB

Exploring language technology methods to facilitate the construction of bilingual dictionaries
The objective of this task is to automate the construction of bilingual dictionaries as much as possible by using existing materials.
The preliminary results presented at the teleconference of 14 September 2008 suggest that the Hub-and-Spoke model (see International Journal of Lexicography, 20(3)) does not meet our expectations: some of its prerequisites, e.g. the existence of large-scale monolingual databases, make applying the model rather costly.
An alternative approach is word alignment of parallel corpora. Our investigation of these methods concluded that, because they are highly language-independent, they may be appropriate for our purposes.
Initial results seem encouraging. A manual check of a random sample from an English-French dictionary generated fully automatically from the Hansard English-French parallel corpus showed 86.66% accuracy.
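The report does not say which alignment model or tool produced the Hansard dictionary. As a rough illustration of how word alignment can yield a bilingual word list, the sketch below implements IBM Model 1, the simplest of the classic statistical word-alignment models, trained with expectation-maximisation, and then keeps the most probable translation of each source word. The NULL token handling, the 0.5 probability threshold and the three-sentence toy corpus are assumptions chosen for illustration only; a real run would use a large sentence-aligned corpus such as the Hansard, and its output would still need the kind of manual checking described above.

```python
from collections import defaultdict

NULL = "<NULL>"  # empty source word, lets a target word stay unaligned


def train_ibm_model1(bitext, iterations=10):
    """Estimate word translation probabilities t(f | e) with EM (IBM Model 1).

    bitext: list of (source_tokens, target_tokens) sentence pairs.
    Returns a dict mapping (target_word, source_word) -> probability.
    """
    target_vocab = {f for _, fs in bitext for f in fs}
    uniform = 1.0 / len(target_vocab)
    t = defaultdict(lambda: uniform)  # uniform initialisation of t(f | e)

    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        for es, fs in bitext:
            es = [NULL] + es
            for f in fs:
                # E-step: share one alignment of f among all candidate source words
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalise the expected counts per source word
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return dict(t)


def extract_dictionary(t, threshold=0.5):
    """Keep, for each source word, its most probable translation above threshold."""
    best = {}
    for (f, e), p in t.items():
        if e == NULL or p < threshold:
            continue
        if e not in best or p > best[e][1]:
            best[e] = (f, p)
    return {e: f for e, (f, _) in best.items()}


# Hypothetical toy corpus standing in for a real parallel corpus
toy_bitext = [
    ("the house".split(), "la maison".split()),
    ("the flower".split(), "la fleur".split()),
    ("a house".split(), "une maison".split()),
]
print(extract_dictionary(train_ibm_model1(toy_bitext)))
```

Even on this tiny corpus, co-occurrence across sentence pairs should let EM pair "house" with "maison" and "the" with "la". Model 1 ignores word order and works only on tokens, which is one reason such methods are largely language-independent; raising the threshold trades coverage for precision in the extracted word list.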


Exploring funding possibilities
Searching for action programs and other funding opportunities into which our project proposal could fit is a high-priority task for the project group. One opportunity that has presented itself is cooperation with the CLARIN infrastructure project (www.clarin.eu).
In particular, a group within the CLARIN project, led by Erhard Hinrichs (University of Tübingen), works on lexical resources (WP5.2). It was suggested that the members of the EFNILEX group should join WP5.2.




Tamás Váradi
EFNILEX project coordinator