1.6.5. Sphinx4 Evaluation Tool

Sphinx4 is an open-source speech recognition toolkit written in Java.

This tool recognizes data sets of sound files and can compare the results with a keyword file. It is very similar to the Speech Recognition Tool.

It depends on resources such as speech models or a grammar. You can supply your own or simply use the provided language models.

You will need an acoustic model, a dictionary, and either a language model or a grammar. Using a grammar excludes using a language model. These parameters are set via the configuration type.

You can also use a modified Sphinx configuration file (see the -c option) to adjust the recognizer.

1.6.5.2. Interfaces

Input/Output:

Attention

The sound files you want to recognize must be mono .wav files sampled at 16 kHz!
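Before a long run it can help to check the input files yourself. The sketch below uses Python's standard `wave` module to test exactly the two properties stated above (16 kHz sample rate, one channel). The function name `is_valid_input` is hypothetical and not part of the tool; it does not check anything else the recognizer might require.

```python
import wave

def is_valid_input(path: str) -> bool:
    """Return True if `path` is a mono WAV file sampled at 16 kHz."""
    try:
        with wave.open(path, "rb") as wav:
            return wav.getframerate() == 16000 and wav.getnchannels() == 1
    except (wave.Error, EOFError):
        # Not a readable PCM WAV file (wrong container, truncated, or compressed).
        return False
```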

Sphinx recognizes speech from sound files and stores the result in a text file.
A new folder “hypotheses” is created in the data folder you specified with the -d parameter. This folder contains the recognition results.
The result files are named after the parameters you used.
If you enable keyword output, the result file contains only the recognized keywords.

1.6.5.3. Language Model Information

The language of the language model is German.

The two speech models “Verbmobil” and “Cocolab” are available in the speechmodel-repo and are installed at /vol/csra/releases/trusty/lsp-csra-nightly/etc/speechconfig/. The default model is Verbmobil.

1.6.5.4. Examples

All of the following examples are based on this scenario: you have a folder /root/datafolder/ with the contents shown below.

/root/datafolder/
set01	set02	set03	set04
set05	new_folder	crypt_folder	fileXY.wav
log.log	picture.png	set06	trash_folder

This folder contains sound files with speech that you want to evaluate.

Example 1:

You want Sphinx to recognize them.

To start the recognizer, run ./sphinxevaluator -d /root/datafolder/. The -d parameter tells it where your audio data is.

Sphinx then looks for .wav files in /root/datafolder/ and recognizes them one after another. Everything that does not have the .wav suffix is ignored.

When it is finished, you will find the results in a new folder. This result folder is created in the same folder as the recognized data.
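The file selection in Example 1 can be mimicked to preview which files a non-recursive run will pick up. This is a minimal sketch, not the tool's code. It assumes the suffix match is case-sensitive (".wav" only) and sorts the files for a stable listing; the documentation does not state the processing order. The helper names are hypothetical.

```python
from pathlib import Path

def wav_files_in(data_path: str) -> list:
    """List the .wav files directly inside data_path (subfolders are not searched)."""
    return sorted(p for p in Path(data_path).iterdir()
                  if p.is_file() and p.suffix == ".wav")

def results_dir(data_path: str) -> Path:
    # Results are written to a "hypotheses" folder inside the data folder.
    return Path(data_path) / "hypotheses"
```

In the example scenario, only fileXY.wav would be listed. log.log and picture.png are skipped because of their suffix, and the set0X folders are skipped because they are directories.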

Example 2:

You also have some sound sets stored in the set0X folders. To include them, run ./sphinxevaluator -d /root/datafolder/ -r true

The tool then also searches all subdirectories for .wav files. Each set0X folder gets a new directory called “hypotheses” in which its results are saved. If other folders contain .wav files, Sphinx recognizes them as well.
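The recursive crawl in Example 2 can be pictured as a directory walk that groups .wav files by the folder that contains them. Each group corresponds to one folder that gets its own results directory. This sketch uses `os.walk` and a hypothetical helper name. It is an illustration under that assumption, not the tool's implementation.

```python
import os
from pathlib import Path

def wav_files_by_folder(data_path: str) -> dict:
    """Map each folder under data_path (including itself) to the .wav files it directly contains."""
    groups = {}
    for root, dirs, files in os.walk(data_path):
        wavs = sorted(f for f in files if f.endswith(".wav"))
        if wavs:
            # Every folder with at least one .wav file would receive its own results folder.
            groups[Path(root)] = wavs
    return groups
```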

Example 3:

You also know that trash_folder and crypt_folder do not contain any interesting data. You can exclude them by extending your command line to ./sphinxevaluator -d /root/datafolder/ -r true -e 'trash_folder;crypt_folder'. The recognizer now ignores these directories.
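The -e value is a semicolon-separated list of folder names. The sketch below splits such a value and prunes those folders during a recursive walk. It assumes that names are matched against folder names at any depth, not against full paths. The documentation does not say which rule the tool applies, and the helper names are hypothetical.

```python
import os
from pathlib import Path

def parse_exclusions(value: str) -> set:
    """Split an -e value such as 'trash_folder;crypt_folder' into folder names."""
    return {name.strip() for name in value.split(";") if name.strip()}

def wav_files_excluding(data_path: str, excluded: set) -> list:
    """Recursively collect .wav files, skipping folders whose name is in `excluded`."""
    found = []
    for root, dirs, files in os.walk(data_path):
        # Editing dirs in place stops os.walk from descending into excluded folders.
        dirs[:] = [d for d in dirs if d not in excluded]
        found.extend(Path(root) / f for f in files if f.endswith(".wav"))
    return sorted(found)
```

Quoting the value on the command line, as in the example, keeps the shell from treating the semicolon as a command separator.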

Example 4:

To check whether the recognizer understood specific keywords, start the tool like this: ./sphinxevaluator -d /root/datafolder/ -k /root/keywordfile. It then recognizes the files as usual, but compares its results with the keyword file and writes only the spotted keywords to the result file.
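The keyword comparison can be sketched as filtering a recognized sentence against a keyword list. The keyword file format is not documented yet (see the Todo below). This sketch therefore assumes one keyword per line, case-insensitive matching, and single-word keywords matched as whole words. All of these are assumptions, and the function names are hypothetical.

```python
def load_keywords(path: str) -> list:
    """Read one keyword per line (assumed format), ignoring blank lines."""
    with open(path, encoding="utf-8") as fh:
        return [line.strip().lower() for line in fh if line.strip()]

def spotted_keywords(hypothesis: str, keywords: list) -> list:
    """Return the keywords that occur as whole words in the hypothesis, in keyword-file order."""
    words = set(hypothesis.lower().split())
    return [kw for kw in keywords if kw in words]
```

For example, with the keywords "licht" and "fenster", the hypothesis "bitte das Licht an" yields only "licht".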

Todo

How to set up the keyword file and where to store it.

1.6.5.5. Things to keep in mind

If Sphinx gives you no results, consider the following:

The sound file is in the wrong format.

The sound file is too noisy.

The sound file is too long (speech blurred).

The recognizer spends at most ten seconds recognizing a file. If this limit is exceeded, the file is skipped.

Using a grammar is notably faster than using a language model.

Grammar usage excludes language-model usage.

Sphinx has great difficulty recognizing files that contain noise.

The sound files you want to recognize must be mono .wav files sampled at 16 kHz!
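When a run produces no results, a quick report on every input file can narrow down the causes listed above. This sketch prints the sample rate, channel count and duration of each .wav file in a folder, and flags files that are not 16 kHz mono. The duration only helps spot unusually long recordings. It does not measure the ten-second recognition limit, which applies to processing time. The helper name is hypothetical.

```python
import wave
from pathlib import Path

def report(data_path: str) -> list:
    """Describe each .wav file in data_path and flag those that are not 16 kHz mono."""
    lines = []
    for path in sorted(Path(data_path).glob("*.wav")):
        try:
            with wave.open(str(path), "rb") as wav:
                rate, channels = wav.getframerate(), wav.getnchannels()
                seconds = wav.getnframes() / rate
        except (wave.Error, EOFError) as exc:
            lines.append(f"{path.name}: unreadable ({exc})")
            continue
        status = "ok" if rate == 16000 and channels == 1 else "WRONG FORMAT"
        lines.append(f"{path.name}: {rate} Hz, {channels} ch, {seconds:.1f} s, {status}")
    return lines
```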

`-d <ABSOLUTE_PATH>, --data-path <ABSOLUTE_PATH>`
	The location of the sound files you want to recognize. This is also where the results are stored.
`-k <ABSOLUTE_PATH>, --keywords <ABSOLUTE_PATH>`
	The path to a keyword file. The result then contains only keywords.
`-e <FOLDER;ANOTHER_FOLDER>, --folder-exclusion <FOLDER;ANOTHER_FOLDER>`
	Names of directories to ignore, separated by semicolons.
`-r <BOOLEAN>, --recursive-folder-crawling <BOOLEAN>`
	Whether Sphinx searches every directory inside the data path for sound files.
`-m <NAME_OF_MODEL>, --speechmodel-type <NAME_OF_MODEL>`
	Speech model configuration. You can use “-m verbmobil” or “-m cocolab”.
`-g <'NAME PATH_TO_GRAMMAR_FILE'>, --grammar <'NAME PATH_TO_GRAMMAR_FILE'>`
	The grammar Sphinx should use. Using a grammar excludes using a language model.
`-c <PATH_TO_CONFIG_FILE>, --sphinxconfig <PATH_TO_CONFIG_FILE>`
	Configuration file to adjust the recognizer's behaviour.
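If you drive the evaluator from a script, it can help to assemble the argument list from the options above. This is a minimal sketch with a hypothetical `build_command` helper. It uses only the documented flags and checks the absolute-path requirement of -d and -k. It passes the -g value as a single "NAME PATH" argument, which matches the quoted form shown above. It assumes the script lives at ./sphinxevaluator, as in the examples, and that POSIX path rules apply.

```python
import os
from typing import Optional, Sequence, Tuple

def build_command(data_path: str,
                  keywords: Optional[str] = None,
                  exclude: Sequence[str] = (),
                  recursive: bool = False,
                  model: Optional[str] = None,
                  grammar: Optional[Tuple[str, str]] = None,
                  config: Optional[str] = None) -> list:
    """Assemble an argv list for ./sphinxevaluator from the documented options."""
    if not os.path.isabs(data_path):
        raise ValueError("-d expects an absolute path")
    cmd = ["./sphinxevaluator", "-d", data_path]
    if keywords:
        if not os.path.isabs(keywords):
            raise ValueError("-k expects an absolute path")
        cmd += ["-k", keywords]
    if exclude:
        cmd += ["-e", ";".join(exclude)]  # folder names joined by semicolons
    if recursive:
        cmd += ["-r", "true"]
    if model:
        cmd += ["-m", model]              # e.g. "verbmobil" or "cocolab"
    if grammar:
        name, path = grammar
        cmd += ["-g", f"{name} {path}"]   # one argument: 'NAME PATH_TO_GRAMMAR_FILE'
    if config:
        cmd += ["-c", config]
    return cmd
```

The resulting list can be passed to `subprocess.run(..., check=True)`. Passing a list avoids shell quoting issues with the semicolon in -e and the space in -g.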