Note that the add-grammar command is currently in beta and is
subject to change in future releases.
Typical models used in the Engine recognize generic conversational
English. The add-grammar command allows you to modify an existing
model to recognize a highly structured grammar in addition to the
default conversational grammar. A good example is US telephone
numbers. The phone number (415) 721-0127 would rarely be spoken as
"forty one fifty seven two one hundred one twenty seven", but may
well be spoken "four one five seven twenty one oh one two seven".
Exploiting this constrained structure can improve accuracy, especially
if the audio quality is poor.
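To make the constrained structure concrete, here is a minimal Python sketch that maps a spoken digit sequence back to a digit string. It is only an illustration of the spoken forms discussed above, not a description of how the Engine works; the function name `spoken_to_digits` and the word tables are hypothetical.

```python
# Hypothetical helper: map spoken number words back to digits.
ONES = {"zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
        "four": "4", "five": "5", "six": "6", "seven": "7",
        "eight": "8", "nine": "9"}
TEENS = {"ten": "10", "eleven": "11", "twelve": "12", "thirteen": "13",
         "fourteen": "14", "fifteen": "15", "sixteen": "16",
         "seventeen": "17", "eighteen": "18", "nineteen": "19"}
TENS = {"twenty": "2", "thirty": "3", "forty": "4", "fifty": "5",
        "sixty": "6", "seventy": "7", "eighty": "8", "ninety": "9"}


def spoken_to_digits(text):
    """Convert e.g. 'seven twenty one' to '721'."""
    tokens = text.lower().split()
    digits = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok in TENS:
            # "twenty one" is read as 21; a bare "twenty" is read as 20.
            # This sketch always prefers the two-digit reading.
            if nxt in ONES and nxt not in ("zero", "oh"):
                digits.append(TENS[tok] + ONES[nxt])
                i += 2
            else:
                digits.append(TENS[tok] + "0")
                i += 1
        elif tok in TEENS:
            digits.append(TEENS[tok])
            i += 1
        elif tok in ONES:
            digits.append(ONES[tok])
            i += 1
        else:
            raise ValueError("not a number word: " + tok)
    return "".join(digits)


print(spoken_to_digits("four one five seven twenty one oh one two seven"))
```

Both "seven two one" and "seven twenty one" map to the same digits, which is the kind of variation a phone number grammar has to enumerate.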
The add-grammar command is similar to the existing custom grammar
feature. The difference is that a
custom grammar would only recognize phone numbers, whereas
add-grammar allows recognition of phone numbers interspersed with
conversation. For example, if the audio is just "four one five seven
two one oh one two seven", then a custom grammar is appropriate,
whereas if the audio is "you can reach me at four one five seven two
one oh one two seven thanks", then add-grammar is appropriate. Also,
a custom grammar is sent along with a recognition request and
does not modify the model, whereas add-grammar is its own
command and does modify the model. So, like add-words, add-grammar
affects all recognitions after the add-grammar command completes
(until the Engine is terminated or the model is reloaded).
To modify a model in an Engine with a grammar, you pass the
add-grammar command with a "words" option containing the
pronunciations of all the words in the grammar, a "grammar" option
with the description of the grammar, and the "asr-model" option with
the model to be modified. The "words" and "grammar" options are
documented for custom grammars, and we recommend reading
through the examples on that page to get a better idea of how a grammar
is constructed. Note that for add-grammar, the grammar "type"
option must be "graph".
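As a rough sketch of how these three options fit together, the Python below assembles an add-grammar request and checks the two constraints stated above: the grammar "type" must be "graph", and every word in the grammar needs a pronunciation in "words". The helper name `build_add_grammar_request` and the two-word toy grammar are hypothetical; the field names and value types mirror the excerpt on this page, and the pronunciations are the two shown there.

```python
import json


def build_add_grammar_request(asr_model, words, grammar):
    """Assemble an add-grammar request as one line of compact JSON."""
    # add-grammar only accepts graph grammars.
    if grammar.get("type") != "graph":
        raise ValueError('add-grammar requires grammar "type" to be "graph"')
    # Every word used on an arc needs a pronunciation in "words".
    known = {w["word"] for w in words}
    missing = sorted({a["word"] for a in grammar.get("arcs", [])} - known)
    if missing:
        raise ValueError("no pronunciation for: " + ", ".join(missing))
    request = {
        "command": "add-grammar",
        "asr-model": asr_model,
        "words": words,
        "grammar": grammar,
    }
    return json.dumps(request, separators=(",", ":"))


# Toy grammar (hypothetical): accepts only the phrase "eighty four".
WORDS = [
    {"word": "eighty", "phones": "EY T IY"},
    {"word": "four", "phones": "F AO R"},
]
GRAMMAR = {
    "type": "graph",
    "start": 0,
    "exits": ["2"],
    "arcs": [
        {"from": "0", "to": "1", "word": "eighty"},
        {"from": "1", "to": "2", "word": "four"},
    ],
}

print(build_add_grammar_request("mod9/en-US_phone-smaller", WORDS, GRAMMAR))
```

The printed line is what would be sent to the Engine; the checks are a client-side convenience in this sketch, not a statement about which errors the Engine itself reports.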
The following example is an excerpt of a command that would add a
phone number grammar to the mod9/en-US_phone-smaller model of an Engine
running locally on port 9900. This example is not meant to be run
standalone; it only demonstrates the format and structure of
the add-grammar command.
echo '{"command": "add-grammar", "asr-model": "mod9/en-US_phone-smaller",
"words": [
{ "word": "eighty", "phones": "EY T IY" },
{ "word": "four", "phones": "F AO R" },
...
],
"grammar": {
"type": "graph",
"start": 0,
"exits": [ "1", "12", ... ],
"arcs": [
{ "from": "0", "to": "1", "word": "one" },
{ "from": "10", "to": "20", "word": "nineteen"},
...
]
}
}' | jq -c | nc localhost 9900
If you provide an id string to the add-grammar request, you can
later call drop-grammar with the same asr-model and id to remove
the grammar. Note that providing an id increases memory usage
slightly.
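The pairing of add-grammar and drop-grammar might look like the following Python sketch. The option name "id" is taken from the wording above and is an assumption about the exact field name, as are the helper names and the example id value; check the command reference before relying on them.

```python
import json


def with_id(add_grammar_request, grammar_id):
    """Return a copy of an add-grammar request tagged with an id."""
    tagged = dict(add_grammar_request)
    tagged["id"] = grammar_id  # assumed field name
    return tagged


def drop_grammar_request(asr_model, grammar_id):
    """Build the request that removes a previously added grammar."""
    # Must use the same asr-model and id as the add-grammar request.
    return {"command": "drop-grammar", "asr-model": asr_model, "id": grammar_id}


# Words and grammar are omitted here for brevity.
add_req = with_id(
    {"command": "add-grammar", "asr-model": "mod9/en-US_phone-smaller"},
    "phone-numbers",  # hypothetical id value
)
drop_req = drop_grammar_request(add_req["asr-model"], add_req["id"])
print(json.dumps(drop_req, separators=(",", ":")))
```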
The following example shows how to add a real phone number grammar to an existing model in an Engine. It uses the same phone number grammar as is described for a custom grammar.
First, download the phone number grammar.
curl -sO https://mod9.io/phone-number-grammar.json
Next, download an audio file of a person saying, "ah yes this is adam and you can reach me at 415 721-0127 thanks".
curl -sO https://mod9.io/voicemail.wav
Since the add-grammar command modifies a model in a running Engine,
it's best to start a new Engine for testing. The example below
uses the mod9/en-US_phone-smaller model because the regular models have
high enough accuracy that it can be hard to see the differences. Also
note --models.mutable; this argument is provided to allow you
to protect the Engine against clients modifying the models unexpectedly.
docker run -d mod9/asr \
engine --models.asr=mod9/en-US_phone-smaller --models.nlp= --models.mutable=true
Now run recognition on the audio file without the added phone number grammar and notice the many errors (mostly due to the use of the small model).
cat voicemail.wav | nc localhost 9900 | jq -r .transcript
# oh yeah this is that um and you can reach me for one by seventeen wine zero onto so
To add the phone number grammar, we use the jq command to add the
required components to phone-number-grammar.json, and pass it on to the
Engine:
jq -sc '.[0] + .[1]' phone-number-grammar.json <(echo '{"command": "add-grammar", "asr-model": "mod9/en-US_phone-smaller"}') | nc localhost 9900
At this point, the mod9/en-US_phone-smaller model has been modified to support
US phone numbers. Recognition requests that use this Engine and that
model will now recognize not just conversational English, but also
phone numbers.
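For readers less familiar with jq, the merge step can be sketched in Python. The `'.[0] + .[1]'` filter adds two JSON objects, with keys from the second object winning on conflict, and the -c flag prints the result on one line. The helper name `merge_request` is hypothetical, and the stand-in grammar document below only mimics the shape of phone-number-grammar.json; in practice you would load that file with json.load.

```python
import json


def merge_request(grammar_doc, header):
    """Shallow-merge two JSON objects, like jq's '.[0] + .[1]'."""
    # Keys in header override keys in grammar_doc, as with jq's + on objects.
    merged = {**grammar_doc, **header}
    # Compact, single-line output, like jq -c.
    return json.dumps(merged, separators=(",", ":")) + "\n"


# Stand-in for the contents of phone-number-grammar.json (hypothetical).
grammar_doc = {
    "words": [{"word": "four", "phones": "F AO R"}],
    "grammar": {"type": "graph", "start": 0, "exits": ["1"],
                "arcs": [{"from": "0", "to": "1", "word": "four"}]},
}
header = {"command": "add-grammar", "asr-model": "mod9/en-US_phone-smaller"}

line = merge_request(grammar_doc, header)
print(line, end="")
```

The resulting line is what the shell example pipes to nc.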
cat voicemail.wav | nc localhost 9900 | jq -r .transcript
# oh yeah this is that um and you can reach me four one five seven two one zero one two seven
©2019-2022 Mod9 Technologies (Version 2.0.0)