Custom Words | Mod9 ASR Engine

Mod9 ASR Engine: Custom Words

By default, requests to the Engine use a pre-built vocabulary with a very large number of words and a generic English grammar. Once these ASR models are loaded, the Engine supports commands for customizing them by adding out-of-vocabulary words and by updating the bias weights of loaded words.

Add Words

The add-words command allows a client to add new words to a loaded ASR model. For example, you could add the words "xcommand", "mod9", and "janin" to the "en_video" model with the following command:

echo '{
  "command": "add-words",
  "asr-model": "en_video",
  "words": [
    { "word": "xcommand" },
    { "word": "mod9", "soundslike": "mod nine" },
    { "word": "janin", "phones": "JH AE N IH N"}
  ]
}' | jq -c . | nc $HOST $PORT

The words field consists of an array of lexical entries. Each lexical entry gives the spelling in the word field (e.g. "janin" in the above). The spelling can be almost any string; it is what the Engine prints when it recognizes that word. If you don't provide any other options, the pronunciation will be generated automatically. See Custom Pronunciations for more details on how to specify how each word is pronounced using either the "soundslike" or "phones" options.
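For clients that cannot shell out to jq and nc, the same request can be sent from Python. The following is a minimal sketch using only the standard library. The helper names encode_request and send_request are illustrative and not part of any Mod9 library, and the sketch assumes the Engine closes the connection after replying (if it does not, the read ends with a timeout).

```python
import json
import socket

# The same request body as in the shell example above.
ADD_WORDS_REQUEST = {
    "command": "add-words",
    "asr-model": "en_video",
    "words": [
        {"word": "xcommand"},
        {"word": "mod9", "soundslike": "mod nine"},
        {"word": "janin", "phones": "JH AE N IH N"},
    ],
}


def encode_request(request):
    """Serialize a request as one compact JSON line, as `jq -c .` does."""
    return (json.dumps(request, separators=(",", ":")) + "\n").encode("utf-8")


def send_request(host, port, request, timeout=10.0):
    """Send one request over TCP and return the raw reply text.

    Assumption: the Engine closes the connection after replying. If it
    keeps the connection open, recv() raises a timeout instead.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(encode_request(request))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # peer closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8")
```

With HOST and PORT set as in the shell examples, `print(send_request(HOST, PORT, ADD_WORDS_REQUEST))` would show the Engine's reply.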

The options include:

Option | Type | Default | Description
--- | --- | --- | ---
asr-model | string | N/A | The ASR model to modify.
cost | number | 5.0 | Controls the frequency for all words in the words list. The lower the cost, the more frequently the words appear.
id | string | None | (Optional) If provided, you can use this string to later remove the added words using drop-words.
words | array of objects | N/A | A list of words along with their pronunciations.

Each of the objects in the words array has the following options:

Option | Type | Default | Description
--- | --- | --- | ---
cost | number | 0.0 | An adjustment for the likelihood of this pronunciation.
phones | string | N/A | A space-delimited phonetic sequence representing the pronunciation of the word.
soundslike | string | N/A | English-like "sounds out" pronunciation.
word | string | N/A | The spelling of the word.

See Custom Pronunciations for more information on the "soundslike" and "phones" options.

If a word has multiple pronunciations, they can all be added in a single add-words request by including multiple lexical entries, each using the same word but a different phones entry for each variant. The relative likelihoods of these can be tuned with their respective cost values.
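As an illustration of multiple pronunciations, the sketch below builds one lexical entry per variant of the same word. The second phone sequence is a made-up variant for demonstration only (valid phone symbols are described in Custom Pronunciations), and the sketch assumes the per-entry cost follows the same convention as the request-level cost, where a higher value means less likely.

```python
import json


def pronunciation_entries(word, variants):
    """Build one lexical entry per (phones, cost) pronunciation variant."""
    return [
        {"word": word, "phones": phones, "cost": cost}
        for phones, cost in variants
    ]


request = {
    "command": "add-words",
    "asr-model": "en_video",
    "words": pronunciation_entries(
        "janin",
        [
            ("JH AE N IH N", 0.0),  # pronunciation used earlier on this page
            ("Y AA N IH N", 1.0),   # hypothetical variant, assumed less likely
        ],
    ),
}

# One compact JSON line, ready to send to the Engine.
print(json.dumps(request, separators=(",", ":")))
```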

The response to the add-words request can have the following fields:

Field | Type | Description
--- | --- | ---
added | int | The number of unique out-of-vocabulary words added to the ASR model through this request.
updated | int | The number of unique in-vocabulary words updated in the ASR model through this request.

When the Engine receives an add-words request, it will immediately begin to recognize the added words. Any recognitions currently in progress that are using the modified ASR model will immediately start recognizing those words. The additions will persist until the Engine is shut down. If you want the Engine to recognize the new words in a future session, you must send the add-words command to the new Engine instance.
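Because additions do not survive an Engine restart, a client may want to keep its customization requests on disk and replay them against each new instance. The following is one possible sketch that stores requests as JSON lines. The file layout and function names are assumptions of this example, not an Engine feature.

```python
import json


def save_customizations(path, requests):
    """Write each request as one compact JSON line, preserving order."""
    with open(path, "w", encoding="utf-8") as f:
        for request in requests:
            f.write(json.dumps(request, separators=(",", ":")) + "\n")


def load_customizations(path):
    """Read the requests back in the order they were saved."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Each loaded request can then be sent to the freshly started Engine in the saved order, so that words are added before any later request refers to them.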

Note that the add-words command adds the words regardless of whether they're already in the vocabulary. Adding a word that already is in vocabulary may increase the likelihood of that word being output (but will never decrease it).

Cost

Words already in vocabulary have been tuned such that they should appear with the correct frequency. For words you add to the vocabulary with add-words, you can specify a cost term that controls the frequency for all words in the words list. If the cost is too low, the new words will appear where they shouldn't, and memory use and run time will increase. If the cost is too high, the new words will fail to appear when they should. The cost is loosely related to the negative of the log of the likelihood of the added words and must be nonnegative. A cost of 0 will almost certainly cause the words to appear much more frequently than desired. Costs above around 14 are high enough that the new words will never appear. The default is 5.

If the same word/pronunciation is added multiple times, the version with lowest cost will have precedence. You cannot adjust the cost once it has been set. Instead, you must remove the previous words with drop-words (see below) and add them back with the new cost.
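The drop-then-add sequence described above can be sketched as a small helper that returns the two requests in order. The function name is hypothetical, and the sketch assumes the words were originally added with an id, since drop-words needs one.

```python
def cost_change_requests(asr_model, group_id, words, new_cost):
    """Return the requests that re-add a word group with a new cost.

    The group is dropped first and then added back, because the cost of
    already-added words cannot be adjusted in place.
    """
    if new_cost < 0:
        raise ValueError("cost must be nonnegative")
    drop = {
        "command": "drop-words",
        "asr-model": asr_model,
        "id": group_id,
    }
    add = {
        "command": "add-words",
        "asr-model": asr_model,
        "id": group_id,
        "cost": new_cost,
        "words": words,
    }
    return [drop, add]
```

For example, `cost_change_requests("en_video", "proper names", [{"word": "janin"}], 8.0)` yields a drop-words request followed by an add-words request with cost 8.0.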

We recommend setting the cost high enough that the added word does not slow down processing, but low enough that the added word still shows up in the alternatives lists. The accuracy of the speech recognition can then be tuned further by adjusting the bias of the word using bias-words.

See the Tuning Costs and Biases section for more information.

Memory Usage

Each call to add-words increases the memory used by the Engine by at least 50 MB, regardless of how many words that call adds. You can save memory by making fewer calls to add-words and including as many words as possible in each call.

Passing an id (which allows the words to later be removed with a call to drop-words) increases the memory usage slightly. Calling drop-words should free all the memory used by the corresponding call to add-words with the same id.
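To keep the number of add-words calls low, several word lists can be merged into one request before sending. The following is a minimal sketch with a hypothetical helper name. Note the trade-off: the merged request has a single cost and, if given, a single id, so all of its words share one cost and are dropped together.

```python
def merged_add_words_request(asr_model, word_lists, cost=5.0, group_id=None):
    """Combine several lists of lexical entries into one add-words request."""
    words = [entry for word_list in word_lists for entry in word_list]
    request = {
        "command": "add-words",
        "asr-model": asr_model,
        "cost": cost,
        "words": words,
    }
    if group_id is not None:
        # Optional: lets the whole batch be removed later with drop-words.
        request["id"] = group_id
    return request
```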

Drop Words

Words that were added with add-words that included an id may later be deleted by calling drop-words with the same id. For example:

echo '{
  "command": "add-words",
  "asr-model": "en_video",
  "id": "proper names",
  "words": [
    { "word": "janin" },
    { "word": "stiggs" },
    { "word": "vanceson" }
  ]
}' | jq -c . | nc $HOST $PORT

echo '{
  "command": "drop-words",
  "asr-model": "en_video",
  "id": "proper names"
}' | jq -c . | nc $HOST $PORT

If you do not provide an id with add-words, there is no way short of unloading and reloading the model to remove the added words.

Bias Words

Words that are in an ASR model's vocabulary, whether in the base model, or added in an add-words request, can be re-weighted or biased with the bias-words command. The bias-words command will bias the word everywhere in the model, changing the likelihood of the word being output by the recognizer in every context.

The following command will bias the model so that "basketball" and "hoop" are output more often, while "alien" is output less often.

echo '{
  "command": "bias-words",
  "asr-model": "en_video",
  "words": [
    {"word":"basketball", "bias":2.3},
    {"word":"hoop", "bias":3},
    {"word":"alien", "bias":-2}
  ]
}' | jq -c . | nc $HOST $PORT

The words option takes an array of objects, each specifying a word and a bias amount. Each call to bias-words sets (rather than adds to) the bias of the given word, so the bias-words command is idempotent. A bias can be any number. A positive bias will cause the biased word to be output more frequently by the Engine, and a negative bias will cause the biased word to be output less frequently. By default, all words in a model start with a bias of 0.

Option | Type | Default | Description
--- | --- | --- | ---
words | array of objects | N/A | A list of words along with biases.
asr-model | string | N/A | The ASR model to modify.

Each of the objects in the words array has the following options:

Option | Type | Default | Description
--- | --- | --- | ---
word | string | N/A | The spelling of the word.
bias | number | N/A | The new bias for the given word.

Similar to add-words requests, once the Engine completes a bias-words request, the new biases will immediately apply to all requests using the modified model.

To undo or reset a bias for a word, send a bias-words command to set the bias to the default of 0. Stopping and restarting the Engine will cause all biases to be reset to 0.
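Setting and resetting biases both use the same request shape, which the sketch below builds from a plain word-to-bias mapping. The helper names are illustrative only.

```python
def bias_words_request(asr_model, biases):
    """Build a bias-words request from a {word: bias} mapping."""
    return {
        "command": "bias-words",
        "asr-model": asr_model,
        "words": [{"word": w, "bias": b} for w, b in biases.items()],
    }


def reset_biases_request(asr_model, words):
    """Build a request that returns each word to the default bias of 0."""
    return bias_words_request(asr_model, {w: 0 for w in words})


# The example above, followed by a request that undoes it.
apply_request = bias_words_request(
    "en_video", {"basketball": 2.3, "hoop": 3, "alien": -2}
)
undo_request = reset_biases_request("en_video", ["basketball", "hoop", "alien"])
```

Because bias-words sets the bias instead of adding to it, sending the undo request restores the default regardless of how many times the first request was sent.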

Lookup Word

The lookup-word command allows a client to query an ASR model for information about a given word.

echo '{"command":"lookup-word", "word":"euphoria", "asr-model":"en_video"}' | nc $HOST $PORT

Option | Type | Default | Description
--- | --- | --- | ---
word | string | N/A | The word to look up.
asr-model | string | N/A | The ASR model to query.

The Engine responds with fields indicating whether the queried word is in the ASR model's vocabulary. If the word is in the model's vocabulary, the response may have additional fields indicating other information.

{
  "bias": 0,
  "found": true,
  "asr_model": "en_video",
  "status": "completed",
  "word": "euphoria"
}
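A client might use lookup-word to decide whether a word needs add-words before biasing it. The sketch below parses a reply like the one above. It assumes the reply is a single JSON object, and the shape of a "not found" reply used in the usage note is a guess rather than documented output.

```python
import json


def parse_lookup_reply(reply_text):
    """Return (found, bias) from a lookup-word reply.

    bias is None when the word is not found or no bias field is present.
    """
    reply = json.loads(reply_text)
    found = bool(reply.get("found", False))
    bias = reply.get("bias") if found else None
    return found, bias


EXAMPLE_REPLY = """
{
  "bias": 0,
  "found": true,
  "asr_model": "en_video",
  "status": "completed",
  "word": "euphoria"
}
"""

print(parse_lookup_reply(EXAMPLE_REPLY))  # prints (True, 0)
```

If a reply has "found" set to false, or omits it, the function returns (False, None), which a client could treat as a signal to send add-words first.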

Tuning Costs and Biases

Unfortunately, it is difficult to know a priori what the cost and bias parameters for a given set of words should be: they depend not only on how common the words are, but also on details of how the base model was constructed and optimized. Finding good values of cost and bias may take some trial and error.

Add-words cost

Words that are not in an ASR model's vocabulary should be added to the model with an add-words request. We recommend adding the words with a relatively high cost so that the newly added words do not show up too often as false positives, and so that the added words do not noticeably slow down the speech recognizer. The cost should still be low enough that the added words show up in the phrase-alternatives when they are spoken in the audio.

Once the added word shows up reliably in the phrase-alternatives, further tuning should be done with bias-words.

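The cost of an added word cannot be raised in place. To set a higher cost, either remove the word with drop-words (possible only if it was added with an id) or restart the Engine, and then call add-words with the new cost.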
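The trial-and-error search for a cost can be sketched as a loop that starts high and steps down until the word appears. The word_appears callback is hypothetical: it stands for whatever your client does to add the words at a given cost, run a test recognition, check the phrase-alternatives, and drop the words again. The starting value of 12 is an arbitrary choice below the point where added words stop appearing.

```python
def highest_working_cost(word_appears, start=12.0, step=1.0, floor=0.0):
    """Return the highest tried cost for which word_appears(cost) is true.

    Returns None if the word never appears at any tried cost.
    """
    cost = start
    while cost >= floor:
        if word_appears(cost):
            return cost
        cost -= step
    return None


# Stand-in for a real check against the Engine: here the word is
# pretended to show up in the alternatives at any cost of 7 or less.
def fake_check(cost):
    return cost <= 7.0


best = highest_working_cost(fake_check)
```

Since each add-words call uses additional memory, a real check would ideally pass an id and call drop-words between attempts.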

Bias-words bias

To tune the bias value, send a recognition request with audio containing the specified words, and with phrase alternatives and biases requested. The bias scores reported for each alternative can then be used to choose the value for the bias-words command. The idea is to set the bias just high enough that the correct phrase outscores the competing alternatives.

For example, suppose the file novavax.wav contains the audio "Trials for Novavax are under way", but "Novavax" is not showing up in the transcript. To correct this, we run the audio through the Engine with "phrase-alternatives": 8 and "phrase-alternatives-bias":true.

(echo '{"phrase-alternatives": 8, "phrase-alternatives-bias":true}'; cat novavax.wav) | nc $HOST $PORT | jq .
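The shell command above sends the request options as one JSON line followed directly by the audio bytes. A minimal Python sketch of building that same byte stream is shown here; the function name is illustrative.

```python
import json


def recognition_payload(options, audio_bytes):
    """Return the options as one JSON line followed by the raw audio bytes."""
    header = json.dumps(options, separators=(",", ":")) + "\n"
    return header.encode("utf-8") + audio_bytes


OPTIONS = {"phrase-alternatives": 8, "phrase-alternatives-bias": True}
```

With the audio read via `open("novavax.wav", "rb").read()`, the resulting bytes can be written to a TCP connection to the Engine in place of the echo and cat pipeline.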

Find the phrase where Novavax appears. Here's an example:

...
"alternatives": [
  {
    "bias": {
      "am": 0,
      "lm": 0
    },
    "phrase": "nova vacs"
  },
  {
    "bias": {
      "am": 0,
      "lm": 2.056
    },
    "phrase": "Novavax"
  },
...

In this case, the best scoring phrase is "nova vacs", which has a total bias of 0.0. The sum for the next best, "Novavax", is 0 + 2.056 = 2.056. This indicates that we can make "Novavax" the best scoring phrase by setting its bias to a value greater than 2.056.

echo '{"command":"bias-words", "asr-model":"en_video", "words":[{"word":"Novavax", "bias":2.1}]}' | nc $HOST $PORT

If we send the previous recognition request again...

(echo '{"phrase-alternatives": 8, "phrase-alternatives-bias":true}'; cat novavax.wav) | nc $HOST $PORT | jq .
...
"alternatives": [
  {
    "bias": {
      "am": 0,
      "lm": 0
    },
    "phrase": "Novavax"
  },
  {
    "bias": {
      "am": 0,
      "lm": 0.044
    },
    "phrase": "nova vacs"
  },
...

We see that "Novavax" is now the best scoring phrase alternative, as desired.
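The arithmetic in this walkthrough can be sketched in a few lines. The helper names and the margin parameter are assumptions of this example. The margin of 0.044 is chosen only so that the result matches the 2.1 used above.

```python
def total_bias(alternative):
    """Sum the am and lm bias components reported for one alternative."""
    bias = alternative["bias"]
    return bias["am"] + bias["lm"]


def bias_needed(alternatives, phrase, margin=0.044):
    """Return a bias slightly above the total reported for the target phrase.

    Returns None if the phrase is not among the alternatives.
    """
    for alternative in alternatives:
        if alternative["phrase"] == phrase:
            return round(total_bias(alternative) + margin, 3)
    return None


# Alternatives from the first recognition request in this walkthrough.
before = [
    {"bias": {"am": 0, "lm": 0}, "phrase": "nova vacs"},
    {"bias": {"am": 0, "lm": 2.056}, "phrase": "Novavax"},
]

needed = bias_needed(before, "Novavax")

# Gap between the applied bias and the previously reported total.
leftover = round(needed - total_bias(before[1]), 3)
```

Here needed comes out at about 2.1, and leftover at about 0.044, which equals the lm bias reported for "nova vacs" in the second response. That match is an observation about this one example; whether the reported values always shift this way is an assumption.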

Note that if the bias is set too high, the biased word might start showing up unexpectedly in undesired places.


©2019-2022 Mod9 Technologies (Version 1.9.5)