
Speech To Text with Pocketsphinx

Talking to our devices is the future.

No more clunky interfaces to confuse us, no more searching for that one button you were looking for.

I'm going to walk you through creating a simple speech-to-text demo using Pocketsphinx.

Note: Make sure you install pocketsphinx and sphinxbase first. Follow the installation instructions here: http://cmusphinx.sourceforge.net/wiki/tutorialpocketsphinx#installation

We'll start off with some boilerplate code. We include the pocketsphinx header file and a main function.

#include <pocketsphinx.h>

int main(int argc, char *argv[]) {
    return 0;
}

We'll need a configuration object for our decoder,

// create configuration
cmd_ln_t *config = cmd_ln_init(NULL, ps_args(), TRUE,
   "-hmm",  MODELDIR "/hmm/en_US/hub4wsj_sc_8k",
   "-lm",   MODELDIR "/lm/en/turtle.DMP",
   "-dict", MODELDIR "/lm/en/turtle.dic",
   "-logfn", "/dev/null", // discard log output
   NULL); // the argument list must end with NULL

and the decoder itself.

// instantiate decoder
ps_decoder_t *decoder = ps_init(config);
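A quick aside on those MODELDIR paths in the configuration: C joins adjacent string literals at compile time, so MODELDIR "/lm/en/turtle.dic" becomes one string once MODELDIR is defined as a quoted path. Here is a minimal sketch of that mechanism on its own, without Pocketsphinx. The fallback directory and the helper names (has_model_prefix, print_model_paths) are hypothetical and only there for illustration.

```c
#include <stdio.h>
#include <string.h>

/* MODELDIR is normally passed in by the compiler with -DMODELDIR=\"...\".
   The fallback below is a hypothetical placeholder, not a real install path. */
#ifndef MODELDIR
#define MODELDIR "/usr/local/share/pocketsphinx/model"
#endif

/* Adjacent string literals are concatenated at compile time. */
const char HMM_PATH[]  = MODELDIR "/hmm/en_US/hub4wsj_sc_8k";
const char LM_PATH[]   = MODELDIR "/lm/en/turtle.DMP";
const char DICT_PATH[] = MODELDIR "/lm/en/turtle.dic";

/* Returns 1 if path starts with the model directory, 0 otherwise. */
int has_model_prefix(const char *path) {
    return strncmp(path, MODELDIR, sizeof(MODELDIR) - 1) == 0;
}

/* Prints the three resolved paths, one per line. */
void print_model_paths(FILE *out) {
    fprintf(out, "hmm:  %s\n", HMM_PATH);
    fprintf(out, "lm:   %s\n", LM_PATH);
    fprintf(out, "dict: %s\n", DICT_PATH);
}
```

Calling print_model_paths(stdout) before creating the decoder is an easy way to confirm the model files are where you expect them to be.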

Next, we open the raw audio file and hand it to the decoder. ps_decode_raw returns the number of samples it read from the file.

// open the raw audio file
FILE *file = fopen("goforward.raw", "rb");
if (file == NULL) {
    perror("goforward.raw");
    return 1;
}

// decode raw file
int samples = ps_decode_raw(decoder, file, NULL, -1);
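Since a raw file has no header, nothing stops you from feeding the decoder the wrong file. Here is a rough sanity check you could run before decoding, using only the standard library. The helper names (raw_file_bytes, looks_like_s16_audio) are my own and not part of Pocketsphinx, and the check only catches empty or odd-sized files.

```c
#include <stdio.h>

/* Returns the size of an open file in bytes and rewinds it to the start,
   or -1 if the file is NULL or the size cannot be determined. */
long raw_file_bytes(FILE *file) {
    if (file == NULL) {
        return -1;
    }
    if (fseek(file, 0L, SEEK_END) != 0) {
        return -1;
    }
    long size = ftell(file);
    rewind(file); /* leave the file ready to be read from the beginning */
    return size;
}

/* Returns 1 if the size is a non-empty, whole number of 16-bit samples. */
int looks_like_s16_audio(long bytes) {
    return bytes > 0 && bytes % 2 == 0;
}
```

In the demo you would call looks_like_s16_audio(raw_file_bytes(file)) right after fopen and bail out if it returns 0.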

And the most important part: getting the decoder's best guess at what was said (the hypothesis), along with its score.

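// create hypothesis
char const *utterance_id;
int32 score;

char const *hypothesis = ps_get_hyp(decoder, &score, &utterance_id);

// ps_get_hyp returns NULL when no hypothesis is available
printf("Recognized: %s\n", hypothesis ? hypothesis : "(nothing)");

// clean up
fclose(file);
ps_free(decoder);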

This should print: Recognized: go forward twenty meters

You can now give your computer the ability to understand human language. Pretty neat.

You can also record your own voice to see whether the demo recognizes it. The rec command ships with SoX, which is available on most Unix-flavored operating systems.

Pocketsphinx requires a single-channel, little-endian, unheadered 16-bit signed PCM audio file sampled at 16000 Hz.

rec -b 16 -r 16000 -c 1 goforward.raw
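To make that format a bit more concrete, here is a small sketch of what it means in bytes: every sample is two bytes, low byte first, and 16000 samples make up one second. The helper names are hypothetical and the code doesn't touch Pocketsphinx at all. It just shows how to turn bytes into samples and a sample count into a duration.

```c
#include <stddef.h>
#include <stdint.h>

/* Sample rate from the format description above. */
#define SAMPLE_RATE_HZ 16000u

/* Decodes one little-endian 16-bit signed sample from two raw bytes. */
int16_t sample_from_le_bytes(unsigned char lo, unsigned char hi) {
    uint16_t u = (uint16_t)((unsigned)lo | ((unsigned)hi << 8));
    /* map 0x8000..0xFFFF onto the negative range explicitly */
    int32_t v = (u >= 0x8000u) ? (int32_t)u - 0x10000 : (int32_t)u;
    return (int16_t)v;
}

/* Number of whole samples in a mono 16-bit file of the given size. */
size_t sample_count(size_t file_bytes) {
    return file_bytes / 2;
}

/* Duration in milliseconds for a sample count, or 0 if the rate is 0. */
unsigned long duration_ms(size_t samples, unsigned rate_hz) {
    if (rate_hz == 0) {
        return 0;
    }
    return (unsigned long)((samples * 1000UL) / rate_hz);
}
```

For example, a 64000-byte recording holds 32000 samples, which is 2000 ms of audio at 16000 Hz. The same duration_ms helper works on the sample count returned by ps_decode_raw.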

Let's add one more thing:

A Makefile to make compilation easier.

all:
	gcc -o demo demo.c \
		-DMODELDIR=\"`pkg-config --variable=modeldir pocketsphinx`\" \
		`pkg-config --cflags --libs pocketsphinx sphinxbase`

clean:
	rm -rf demo

Now run make to compile the demo, and ./demo to run it.

Source code is available on GitHub: https://github.com/aperturescience/sphinx-demo

