1.6.5. Sphinx4 Evaluation Tool
Sphinx4 is an open-source speech recognition toolkit written in Java.
This tool recognizes speech in sets of sound files and can compare the results with a keyword file. It is very similar to the Speech Recognition Tool.
It needs some dependencies, such as speech models or a grammar, to work. You can supply your own or simply use the provided language models.
You will need an acoustic model, a dictionary, and either a language model or a grammar. Using a grammar excludes using a language model. These parameters are set via the configuration type.
You can also use a modified Sphinx configuration to adjust the recognizer.
1.6.5.2. Interfaces
Input/Output:
Attention
The sound files you want to recognize must be mono .wav files with a 16 kHz sample rate!
- Sphinx recognizes speech from sound files and stores the result in a text file.
- The recognition results are written to a new folder “hypotheses” inside the data folder you specified with the -d parameter.
- The result files are named after your parameters.
- If you enable keyword output, the result file will contain only the recognized keywords.
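To inspect the results of a run from a script, something like the following minimal sketch could be used. It only relies on the “hypotheses” folder name mentioned above; the assumptions that result files are UTF-8 plain text, and the file name `example_result.txt` used in the demo, are hypothetical and not taken from the tool.

```python
from pathlib import Path
import tempfile


def read_hypotheses(data_dir):
    """Return {result file name: recognized text} for one data folder."""
    hyp_dir = Path(data_dir) / "hypotheses"  # folder name as described above
    if not hyp_dir.is_dir():
        return {}  # no run has written results here yet
    return {
        p.name: p.read_text(encoding="utf-8").strip()  # assumption: UTF-8 text files
        for p in sorted(hyp_dir.iterdir())
        if p.is_file()
    }


# Demo on a throwaway folder; the result file name is made up for illustration.
with tempfile.TemporaryDirectory() as tmp:
    empty_results = read_hypotheses(tmp)
    hyp = Path(tmp) / "hypotheses"
    hyp.mkdir()
    (hyp / "example_result.txt").write_text("hallo welt\n", encoding="utf-8")
    results = read_hypotheses(tmp)
```

The function returns an empty dictionary when no “hypotheses” folder exists, which makes it easy to tell folders that have not been processed from those that have.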
1.6.5.3. Language Model Information
The language models are German.
The two speech models “Verbmobil” and “Cocolab” can be found at speechmodel-repo and are installed at /vol/csra/releases/trusty/lsp-csra-nightly/etc/speechconfig/
The default model is the Verbmobil speech model.
1.6.5.4. Examples
The following scenario is the basis for all examples below.
You have a folder /root/datafolder/ with the contents shown in the table below.
/root/datafolder/ | | | |
---|---|---|---|
set01 | set02 | set03 | set04 |
set05 | new_folder | crypt_folder | fileXY.wav |
log.log | picture.png | set06 | trash_folder |
This folder contains the sound files with speech that you want to evaluate.
Example 1:
Now you want to use Sphinx to recognize them.
To start Sphinx, run: ./sphinxevaluator -d /root/datafolder/
The -d parameter tells it where your audio data is located.
Sphinx then looks in /root/datafolder/ for .wav files and tries to recognize them one after another.
It ignores every file that does not have the .wav suffix.
When it has finished, you will find the results in a new folder, which is created inside the folder that contains the recognized data.
Example 2:
Some of your sound sets are stored in the set0X folders.
In that case you can run: ./sphinxevaluator -d /root/datafolder/ -r true
Sphinx then also looks for .wav files in all subdirectories, so every set0X folder gets a new directory called hypotheses in which its results are saved. If other folders contain matching .wav files, Sphinx will try to recognize them as well.
Example 3:
Suppose you also know that trash_folder and crypt_folder do not contain any interesting data.
You can then extend your command line to: ./sphinxevaluator -d /root/datafolder/ -r true -e 'trash_folder;crypt_folder'
The recognizer will now ignore the given directories!
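The file selection described in Examples 1 to 3 can be pictured with the following sketch. It is an illustration of the documented behavior, not the tool's actual implementation: the function name `select_wav_files`, the case-sensitive match on `.wav`, and the rule that an excluded folder name is ignored at any depth are assumptions, and the files `a.wav`, `b.wav`, and `c.wav` are made up for the demo.

```python
from pathlib import Path
import tempfile


def select_wav_files(data_dir, recursive=False, exclude=""):
    """List the .wav files a run over data_dir would consider.

    recursive mirrors `-r true`; exclude mirrors the semicolon-separated
    folder names passed to `-e`.
    """
    root = Path(data_dir)
    excluded = {name for name in exclude.split(";") if name}
    pattern = "**/*.wav" if recursive else "*.wav"
    selected = []
    for path in root.glob(pattern):
        if not path.is_file():
            continue
        # Folder names between data_dir and the file itself
        parent_parts = path.relative_to(root).parts[:-1]
        if excluded.intersection(parent_parts):
            continue  # assumption: excluded names apply at any depth
        selected.append(path.relative_to(root).as_posix())
    return sorted(selected)


# Demo: rebuild part of the example folder in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel in ["fileXY.wav", "log.log", "set01/a.wav",
                "trash_folder/b.wav", "crypt_folder/c.wav"]:
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()
    flat = select_wav_files(root)                       # like -d only
    deep = select_wav_files(root, recursive=True)       # like -r true
    filtered = select_wav_files(root, recursive=True,
                                exclude="trash_folder;crypt_folder")  # like -e
```

With this layout, the non-recursive call sees only fileXY.wav, the recursive call also picks up the files in the subfolders, and the exclusion list removes the two unwanted folders again.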
Example 4:
If you want to know whether the recognizer understood particular keywords, start the tool like this: ./sphinxevaluator -d /root/datafolder/ -k /root/keywordfile
It will recognize the files as usual, but compare its results with the keyword file and write only the spotted keywords to the result file.
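The idea behind keyword output can be sketched as a simple filter over a recognition result. This is only an illustration under stated assumptions: the keyword file format is not documented here, so the sketch takes a plain list of single-word keywords, splits the hypothesis on whitespace, and matches case-insensitively. The function name `spot_keywords` and the sample sentence are hypothetical.

```python
def spot_keywords(hypothesis, keywords):
    """Return the words of hypothesis that are keywords, in spoken order."""
    # Assumption: keywords are single words and matching ignores case.
    wanted = {k.strip().lower() for k in keywords if k.strip()}
    return [word for word in hypothesis.split() if word.lower() in wanted]


keywords = ["licht", "tür"]  # hypothetical keyword list
spotted = spot_keywords("mach bitte das Licht an", keywords)
nothing = spot_keywords("guten morgen", keywords)
```

A result containing none of the keywords yields an empty list, which corresponds to an empty keyword result for that file.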
Todo
how to set up and where to store keyword-file
1.6.5.5. Things to keep in mind
- If Sphinx gives you no results, consider the following:
  - sound file in the wrong format
  - sound file too noisy
  - sound file too long (speech blurred)
The recognizer spends at most ten seconds on each file. If this limit is exceeded, the file is skipped.
Using a grammar is notably faster than using a language model.
Grammar usage excludes language-model usage.
Sphinx has considerable difficulty recognizing files that contain noise.
The sound files you want to recognize must be mono .wav files with a 16 kHz sample rate!
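Since a wrong file format is a common reason for missing results, it can help to check files before a run. The sketch below uses Python's standard `wave` module to test the sample rate and channel count. The helper name `check_wav_format` is hypothetical, and the demo files are generated on the fly purely for illustration.

```python
import os
import tempfile
import wave


def check_wav_format(path, rate=16000, channels=1):
    """Return a list of problems; an empty list means the format fits."""
    problems = []
    # wave.open raises wave.Error if the file is not a PCM WAV file.
    with wave.open(str(path), "rb") as wav:
        if wav.getframerate() != rate:
            problems.append(
                f"sample rate is {wav.getframerate()} Hz, expected {rate} Hz")
        if wav.getnchannels() != channels:
            problems.append(
                f"{wav.getnchannels()} channels, expected {channels}")
    return problems


# Demo: write one conforming and one non-conforming file, then check both.
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.wav")
    bad = os.path.join(tmp, "bad.wav")
    for name, n_channels, sample_rate in [(good, 1, 16000), (bad, 2, 44100)]:
        with wave.open(name, "wb") as out:
            out.setnchannels(n_channels)
            out.setsampwidth(2)           # 16-bit samples
            out.setframerate(sample_rate)
            out.writeframes(b"\x00\x00" * n_channels * 160)  # short silence
    good_problems = check_wav_format(good)
    bad_problems = check_wav_format(bad)
```

Files that report problems would need to be converted to 16 kHz mono before they are passed to the recognizer.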