Add-Grammar Command | Mod9 ASR Engine


Mod9 ASR Engine: The add-grammar Command

Note that the add-grammar command is currently in beta and is subject to change in future releases.

Typical models used in the Engine recognize generic conversational English. The add-grammar command lets you modify an existing model so that it recognizes a highly structured grammar in addition to the default conversational grammar. A good example is US telephone numbers. The phone number (415) 721-0127 would rarely be spoken as "forty one fifty seven two one hundred one twenty seven", but may well be spoken as "four one five seven twenty one oh one two seven". Exploiting this constrained structure can improve accuracy, especially if the audio quality is poor.

The add-grammar command is similar to the existing custom grammar feature, with two differences. First, a custom grammar would recognize only phone numbers, whereas add-grammar allows recognition of phone numbers interspersed with conversation. For example, if the audio is just "four one five seven two one oh one two seven", then a custom grammar is appropriate, whereas if the audio is "you can reach me at four one five seven two one oh one two seven thanks", then add-grammar is appropriate. Second, a custom grammar is sent along with a recognition request and does not modify the model, whereas add-grammar is its own command and does modify a model. So, like add-words, add-grammar affects all recognitions after the add-grammar command completes (until the Engine is terminated or the model is reloaded).

To modify a model in an Engine with a grammar, send the add-grammar command with a "words" option containing the pronunciations of all the words in the grammar, a "grammar" option describing the grammar, and an "asr-model" option naming the model to be modified. The "words" and "grammar" options are documented for custom grammars, and we recommend reading through the examples on that page to get a better idea of how a grammar is constructed. Note that for add-grammar, the grammar "type" option must be "graph".
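The options listed above can be checked locally before a request is sent. The following is a minimal sketch, assuming jq is installed; the function name check_add_grammar_request and the file name request.json are hypothetical. It checks only the option names and the graph type mentioned above, not whether the grammar itself is valid.

```shell
# Return success if the JSON request in file $1 has the options described above.
# Hypothetical helper: it checks option names only, not the grammar's contents.
check_add_grammar_request() {
  jq -e '
    .command == "add-grammar"
    and has("asr-model")
    and has("words")
    and has("grammar")
    and .grammar.type == "graph"
  ' "$1" > /dev/null
}

# Usage (request.json is a hypothetical file holding the full request):
#   check_add_grammar_request request.json && jq -c . request.json | nc localhost 9900
```

The jq -e flag makes the exit status reflect the result of the expression, so the function can be used directly in a shell conditional.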

Excerpt from a phone number add-grammar command

The following example is an excerpt of the command that would add a phone number grammar to the mod9/en-US_phone-smaller model of an Engine running locally on port 9900. This example is not meant to be standalone; rather, it demonstrates the format and structure of the add-grammar command.

echo '{"command": "add-grammar", "asr-model": "mod9/en-US_phone-smaller",
       "words": [
         { "word": "eighty", "phones": "EY T IY" },
         { "word": "four", "phones": "F AO R" },
         ...
         ],
       "grammar": {
         "type": "graph",
         "start": 0,
         "exits": [ "1", "12", ... ],
         "arcs": [
           { "from": "0", "to": "1", "word": "one" },
           { "from": "10", "to": "20", "word": "nineteen"},
           ...
           ]
         }
      }' | jq -c | nc localhost 9900
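Because the arcs of a grammar like this are repetitive, they can be generated instead of written by hand. The following is only a sketch of that idea, not how the published phone number grammar was built: the function name digit_arcs, the state numbers, and the list of digit words are all assumptions chosen for illustration. Each word used in an arc would also need a pronunciation entry in "words".

```shell
# Print one arc per spoken digit, all leading from state $1 to state $2.
# The word list is illustrative; it is not taken from the published grammar.
digit_arcs() {
  for word in zero oh one two three four five six seven eight nine; do
    printf '{ "from": "%s", "to": "%s", "word": "%s" }\n' "$1" "$2" "$word"
  done
}

# Arcs for a single digit position, from state 0 to state 1.
digit_arcs 0 1
```

The output is one arc object per line; the lines can be joined with commas (for example with paste -sd, -) before being placed inside the "arcs" array.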

If you provide an id string to the add-grammar request, you can later call drop-grammar with the same asr-model and id to remove the grammar. Note that providing an id increases memory usage slightly.
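As a sketch of the removal step, the request below names the same asr-model and id that were given to add-grammar. The id value "phone-numbers" and the function name drop_grammar_request are hypothetical, and the layout of the request (top-level "asr-model" and "id" options) is an assumption based on the description above.

```shell
# Hypothetical identifier; it must match the id given in the add-grammar request.
GRAMMAR_ID="phone-numbers"
MODEL="mod9/en-US_phone-smaller"

# Print a drop-grammar request as a single JSON line.
# Values are interpolated verbatim, so they must not contain quotes or backslashes.
drop_grammar_request() {
  printf '{"command": "drop-grammar", "asr-model": "%s", "id": "%s"}\n' "$1" "$2"
}

drop_grammar_request "$MODEL" "$GRAMMAR_ID"
# To send it to a local Engine:
#   drop_grammar_request "$MODEL" "$GRAMMAR_ID" | nc localhost 9900
```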

Worked example

The following example shows how to add a real phone number grammar to an existing model in an Engine. It uses the same phone number grammar as is described for a custom grammar.

First, download the phone number grammar.

curl -sO https://mod9.io/phone-number-grammar.json

Next, download an audio file of a person saying, "ah yes this is adam and you can reach me at 415 721-0127 thanks".

curl -sO https://mod9.io/voicemail.wav

Since the add-grammar command modifies a model in a running Engine, it's best to start a new Engine for testing. The example below uses the mod9/en-US_phone-smaller model because the regular models are accurate enough that the differences can be hard to see. Also note --models.mutable=true: this argument exists so that you can protect the Engine against clients modifying the models unexpectedly.

docker run -d -p 9900:9900 mod9/asr \
  engine --models.asr=mod9/en-US_phone-smaller --models.nlp= --models.mutable=true

Now run recognition on the audio file without the added phone number grammar and notice the many errors (mostly due to the use of the small model).

cat voicemail.wav | nc localhost 9900 | jq -r .transcript
# oh yeah this is that um and you can reach me for one by seventeen wine zero onto so

To add the phone number grammar, we use the jq command to add the required components to phone-number-grammar.json, and pass it on to the Engine:

jq -sc '.[0] + .[1]' phone-number-grammar.json <(echo '{"command": "add-grammar", "asr-model": "mod9/en-US_phone-smaller"}') | nc localhost 9900
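If the grammar may need to be removed later, the same merge can also attach an id. This is a sketch, assuming jq is installed; the function name add_grammar_request and the id "phone-numbers" are hypothetical, and placing "id" at the top level of the request is an assumption.

```shell
# Merge the command, model name ($2), and a client-chosen id ($3)
# into the grammar file ($1). Prints the request as one compact JSON line.
add_grammar_request() {
  jq -c --arg model "$2" --arg id "$3" \
    '. + {"command": "add-grammar", "asr-model": $model, "id": $id}' "$1"
}

# Usage ("phone-numbers" is a hypothetical id):
#   add_grammar_request phone-number-grammar.json mod9/en-US_phone-smaller phone-numbers | nc localhost 9900
```

Passing the values with --arg lets jq handle the JSON quoting, which avoids the nested shell quoting needed when the object is written inline.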

At this point, the mod9/en-US_phone-smaller model has been modified to support US phone numbers. Recognition requests that use this Engine and that model will now handle not just conversational English, but also phone numbers.

cat voicemail.wav | nc localhost 9900 | jq -r .transcript
# oh yeah this is that um and you can reach me four one five seven two one zero one two seven

©2019-2022 Mod9 Technologies (Version 1.9.5)