This database is intended for the evaluation of front-end feature extraction algorithms in background noise, but may also be used more widely by speech researchers to evaluate and compare the performance of noise-robust speech recognition algorithms.
AURORA-CD0004-01 AURORA Project Database - Aurora 4a - Evaluation Package
The Aurora 4a database is based on the Wall Street Journal data, with noise added artificially over a range of signal-to-noise ratios. It contains both clean and multiple-condition training sets and 14 evaluation sets with different noise types and microphones.
AURORA-CD0004-02 AURORA Project Database - Aurora 4b - Evaluation Package
The Aurora 4b contains noisy versions of the Nov'92 Wall Street Journal development set.
AURORA-CD0005 AURORA-5
The AURORA-5 database has been developed mainly to investigate how hands-free speech input in noisy room environments affects the performance of automatic speech recognition. In addition, two test conditions are included to study the influence of transmitting the speech in a mobile communication system. It contains artificially distorted versions of the recordings from adult speakers in the TI-Digits speech database, a set of recordings of digit sequences uttered by different speakers in hands-free mode in a meeting room, and a set of scripts for running recognition experiments on those speech data.
ELRA-E0002 TC-STAR 2005 Evaluation Package - ASR English
This package includes the material used for the TC-STAR 2005 Automatic Speech Recognition (ASR) first evaluation campaign for the English language. It includes resources, protocols, scoring tools, results of the official campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself.
ELRA-E0011 TC-STAR 2006 Evaluation Package - ASR English
This package includes the material used for the TC-STAR 2006 Automatic Speech Recognition (ASR) second evaluation campaign for the English language. It includes resources, protocols, scoring tools, results of the official campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself.
ELRA-E0025 TC-STAR 2007 Evaluation Package - ASR English
This package includes the material used for the TC-STAR 2007 Automatic Speech Recognition (ASR) third evaluation campaign for the English language. It includes resources, protocols, scoring tools, results of the official campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself.
ELRA-S0001 ACCOR – English
This resource is an acoustic and articulatory English database recorded as part of the ESPRIT-ACCOR project investigating cross-language acoustic-articulatory correlations in coarticulatory processes.
ELRA-S0009 COST232
This resource is a multi-English speech database with 797 calls received in Italy and in the UK, using different types of collecting equipment. It consists of repetitions of the same vocabulary from the "TI (Texas Instruments) words" (digits + yes, no, go, etc.).
ELRA-S0239 N4 (NATO Native and Non Native) database
This database comprises speech data recorded in the naval transmission training centers of four countries (Germany, the Netherlands, the United Kingdom and Canada) during naval communication training sessions in 2000-2002. The material consists of native and non-native speakers using NATO Naval English procedure between ships, and reading from a text, "The North Wind and the Sun," in both English and the speaker's native language. The audio files have been manually transcribed and annotated.
ELRA-L0010 MULTEXT Lexicons
This resource contains a set of lexicons developed in the MULTEXT project financed by the European Commission (LRE 62-050). The set contains the following languages:
- English: 66,214 Word forms
- French: 306,795 Word forms
- German: 233,861 Word forms
- Italian: 145,530 Word forms
- Spanish: 510,710 Word forms
ELRA-W0003 CRATER corpus
This is a multilingual aligned corpus made up of 1,000,000-token corpora for English, French and Spanish, with morphosyntactical annotations.
An extended version of CRATER (ref. ELRA-W0003) is available in CRATER 2 (ref. ELRA-W0033).
ELRA-W0033 CRATER 2 Corpus
The CRATER 2 parallel corpus is an extension of the CRATER corpus, available in the catalogue under reference W0003. It consists of 1,500,000 tokens for English and French and of 1,000,000 tokens for Spanish, with morphosyntactical annotations.
CRATER 2 (ref. ELRA-W0033) includes CRATER (ref. ELRA-W0003).
ELRA-W0023 MLCC Multilingual and Parallel Corpora
The first set contains articles from 6 European newspapers: Het Financieele Dagblad (Dutch, 8.5 million words), The Financial Times (English, 30 million words), Le Monde (French, 10 million words), Handelsblatt (German, 33 million words), Il Sole 24 Ore (Italian, 1.88 million words), Expansion (Spanish, 10 million words).
The second set consists of a parallel corpus of translated data in the nine European official languages (1992-1994) divided into 2 sub-corpora: written questions (10.2 million words) and parliamentary debates (5 to 8 million words per language).
ELRA-W0048 TUNA Corpus
The TUNA Corpus of Referring Expressions was built from the contributions of 50 native or fluent speakers of English and contains about 2000 descriptions (referring expressions). Participants described objects (targets) in visual domains by typing and submitting referring expressions that distinguished them from other objects that were shown simultaneously (distractors). Each description is annotated with semantic information.
Make sure you download the end-user agreement for these LRs.