Minutes of the PROJECT GROUP EFNILEX
Meeting 23 April 2008 Ghent University, Faculty of Arts and Philosophy, Department of Nordic Studies, Rozier 44, Ghent
Participants of the Meeting:
Sabine Kirchmeier, Dansk Sprognævn, Denmark
John Simpson, Oxford English Dictionary, UK
Jolanta Zabarskaitė, Lietuvių Kalbos Institutas (Institute of the Lithuanian Language), Lithuania
Tamás Váradi, Research Institute for Linguistics, Hungarian Academy of Sciences, Hungary
Johan Van Hoorde, Nederlandse Taalunie, Belgium-Netherlands
Annemieke Hoorntje, Nederlandse Taalunie, Belgium-Netherlands
Invited guest: Prof. Godelieve Laureys, Ghent University
Agenda
1. Welcome – introduction
2. Agenda
3. Rationale of the project
4. Financial Resources
5. Towards a full project plan
6. Division of labour
7. Conclusions / next meeting
8. Draft report
Minutes
1. Welcome - introduction
Tamás Váradi welcomed everybody and announced that a presentation and demonstration had been arranged with Professor Godelieve Laureys about a related project involving the dictionaries produced in her department. A short introduction round revealed that all participants had substantial experience with the compilation of either monolingual or bilingual dictionaries.
2. Agenda
The agenda was approved.
3. Rationale of the project
There was general agreement on the project description that had been sent to the cabinet of Commissioner Leonard Orban. No further reactions had been received from the Commission so far.
4. Financial Resources
4.1. EFNIL
In the EFNIL budget for 2007-2008 a reservation of € 15.000 has been made for this project. A similar budget could be expected in the next few years. It was agreed that the inventory of existing dictionaries, corpora, lexicons and lexicographical tools (editors, lemmatisers, etc.) had the highest priority.
4.2. European Commission
One of the tasks of the project group would be to search for action programmes and other subsidy schemes into which our project proposal would fit. The cabinet of Commissioner Orban promised to assist us in this search for European financial resources. Johan Van Hoorde would stay in touch with them and look for further possibilities. John Simpson offered to look for other relevant programmes.
4.3. Other possibilities
Tamás Váradi and Sabine Kirchmeier-Andersen mentioned the EU project CLARIN, which aims at providing research infrastructure for the humanities by making technology and language resources such as monolingual and bilingual corpora, dictionaries, etc. available and readily usable (see: www.clarin.eu). An association with this project would be very relevant for our project. The financial model, however, is not simple. Basically, CLARIN will only get funded by the EU if there is a commitment through national funding in the member states. Therefore, it is important that as many EFNIL members as possible are aware of this possibility and take action in order to support existing national initiatives. In Denmark substantial national funding has already been provided for the digitisation of Danish texts and the establishment of monolingual and bilingual corpora, spoken language corpora and various multimodal resources in relation to CLARIN. Tamás Váradi is coordinator of one of the CLARIN work packages. There is also one group working on lexical resources (WP5.2), led by Erhard Hinrichs (University of Tübingen). It was suggested that the members of the EFNILEX group could join WP5.2, and that we should coordinate our efforts with this group.
Another source could be dictionary publishers, who may find the compilation of dictionaries for exotic language combinations too costly but would nevertheless be interested in printing the results. They might therefore be willing to donate bilingual dictionaries to the project without charge. Contacts should be made with the different publishing houses to test this hypothesis.
Finally, the WORDNET project was mentioned as an interesting source of data. In various countries semantic word nets have been established containing information for the semantic description and disambiguation of word senses. The European word nets are mainly contrastive and usually publicly available and could be a valuable supplement.
Some participants expressed doubts about the use of WORDNET as a primary tool / resource and seemed to be more in favour of using parallel corpora, especially those in which the words in the source language are aligned with the words in the target language. A limited pilot project could be set up in order to compare two (or more) work methods.
5. Towards a full project plan
It was agreed that the first step should be an inventory of existing dictionary resources within the EFNIL community. A survey for the Scandinavian languages was conducted last autumn and could be used as a point of departure. Furthermore, internet-based survey tools are now available which make the collection of such data less time-consuming. Last year’s prize draw among those who answered the questionnaire in time could be repeated.
Regarding the size of the envisaged bilingual dictionaries: a medium-sized dictionary of 20.000-45.000 entries was suggested, the size depending also on the available monolingual resources for the languages. Depth of the microstructure of the dictionary has priority over size. The dictionaries will contain the most frequently used words in the general language domain. Some terminology should be included. A focus could be on public domains related to social mobility.
Regarding languages: after the inventory of existing dictionaries, a priority list of missing language combinations would be made. No languages would be excluded.
Regarding methodology:
- the aim is to try to automate as much as possible by using existing materials.
- the group was very impressed by the results of the Dutch-Finnish/Finnish-Dutch dictionary project at Ghent University. Maritta Moisio, editor of the dictionary project, demonstrated the editing tool OMBI, and Prof. Godelieve Laureys gave a more general overview of the work of the CLVV. The CLVV (Commissie voor Lexicografische Vertaalvoorzieningen, Committee for Interlingual Lexicographical Resources) is a Dutch/Flemish intergovernmental committee which, in the period 1993-2003, funded and supervised about 19 bilingual dictionary projects pairing Dutch with a foreign language, among them the Dutch-Finnish project. The CLVV defined criteria for prioritising languages in order to arrive at a selection of languages and an action plan. Furthermore, the editor OMBI and a Dutch reference lexicon were developed. With the reversal function of OMBI, bilingual dictionaries can be created in a more time- and cost-effective way; manual editing is, however, still time-consuming. The so-called hub-and-spoke model goes a step further and aims at the creation of a third dictionary (e.g. Danish-Finnish) by linking two existing bilingual lexical databases (i.e. Dutch-Danish and Dutch-Finnish).
- the use of parallel corpora as a base was mentioned. EU documents are translated into and available in all EU languages and could be further processed with tools.
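The reversal and hub-and-spoke ideas mentioned above can be illustrated with a minimal sketch. It is not the OMBI implementation: the function names, the data layout (plain headword-to-translations mappings) and the toy entries are all assumptions made for illustration, and a real lexical database would presumably link word senses rather than bare headwords.

```python
from collections import defaultdict


def reverse_dictionary(entries):
    """Reverse a source->targets mapping into a target->sources mapping.

    Hypothetical stand-in for a reversal function; not OMBI's actual logic.
    """
    reversed_entries = defaultdict(set)
    for source, targets in entries.items():
        for target in targets:
            reversed_entries[target].add(source)
    return {word: sorted(sources) for word, sources in reversed_entries.items()}


def link_via_hub(hub_to_a, hub_to_b):
    """Propose A->B translation candidates for words that share a hub headword."""
    candidates = defaultdict(set)
    for hub_word, a_words in hub_to_a.items():
        b_words = hub_to_b.get(hub_word, [])
        for a_word in a_words:
            candidates[a_word].update(b_words)
    # Drop words for which the hub offered no translation at all.
    return {a: sorted(bs) for a, bs in candidates.items() if bs}


# Toy data (invented for this sketch), with Dutch as the hub language.
dutch_danish = {"huis": ["hus"], "boek": ["bog"], "bank": ["bank", "bænk"]}
dutch_finnish = {"huis": ["talo"], "boek": ["kirja"], "bank": ["pankki", "penkki"]}

danish_dutch = reverse_dictionary(dutch_danish)
danish_finnish = link_via_hub(dutch_danish, dutch_finnish)

for danish_word, finnish_words in danish_finnish.items():
    print(danish_word, "->", ", ".join(finnish_words))
```

Because the Dutch headword "bank" covers both a financial institution and a bench, the sketch pairs each of its Danish translations with both Finnish ones. Candidates of this kind are why linking at headword level still needs the manual editing noted above.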
It was agreed that a close cooperation with the projects initiated by the CLVV would be very relevant for EFNILEX.
It was suggested that the resources that would be built within EFNILEX should as far as possible be made freely available to the public on the internet.
6. Division of labour
The main tasks were identified as:
1. drawing up an elaborated project description, including project procedure and methodology.
2. making a survey of available resources
a. in EFNIL member organisations
b. in publishing houses
c. Wordnets
3. keeping in touch with related projects
a. CLARIN
b. CLVV
c. others
4. investigating possible sources for funding (EU, national funding)
5. setting up a collaborative site for the EFNILEX project in connection with the EFNIL website
Task | Responsible |
Elaborated project description | Tamás, Johan |
Survey of available resources within EFNIL | Sabine, Annemieke |
Survey of available resources in publishing houses | ? |
Contact with related projects CLARIN | Tamás |
Contact with related projects CLVV/Wordnet | Tamás (WORDNET)? |
Contact with related projects others | Everybody |
Investigating possible sources of funding | Johan, John |
Setting up a collaborative site | Tamás, Johan |
7. Conclusions / next meeting
The next meeting will be held in the second half of September.
