Speech Recognition Instructions
Automatic Speech Recognition (ASR) is the process of converting speech into text. In the CI110X chip series, the CI1102 supports 50~80 voice commands and the CI1103 supports up to 300 voice commands. This description applies to all CI110X SDKs.
The SDK contains several recognition-related algorithm files that need special attention:
- ASR decoding library: located in components\asr\decoder in the SDK package and provided as a static library (*.a). It uses a deep learning algorithm to recognize speech and output the corresponding strings. Chipintelli maintains this file, and users do not need to modify it;
- Acoustic model file: located in project name\firmware\dnn in the SDK package (for example, sample\internal\sample_light\firmware\dnn) and provided as a binary file (*.bin). It is generated by server-side training and maintained by Chipintelli. Users can choose the acoustic model file Chipintelli provides for their application scenario or language;
- Language model file: contains the wake-up words, which wake the device, and the command words, which are recognized as control commands. It is located in project name\firmware\asr in the SDK package (for example, sample\internal\sample_light\firmware\asr) and provided as a data file (*.dat). Users must generate this file online with the Chipintelli voice AI platform; for instructions, refer to the Command Word and Firmware Making Guide in this documentation center.
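To keep the three file types apart, the sketch below sorts files by extension, the way a packaging or sanity-check script might. The enum and function names are hypothetical and not part of the SDK, and the check looks only at the extension, not at the directory the file sits in.

```c
#include <string.h>

/* Hypothetical labels for the three recognition files described above. */
typedef enum {
    ASR_FILE_UNKNOWN = 0,
    ASR_FILE_DECODER_LIB,    /* static library, maintained by Chipintelli */
    ASR_FILE_ACOUSTIC_MODEL, /* .bin model, supplied by Chipintelli */
    ASR_FILE_LANGUAGE_MODEL  /* .dat model, generated on the voice AI platform */
} asr_file_kind;

/* Return nonzero if `name` ends with `suffix`. */
static int ends_with(const char *name, const char *suffix)
{
    size_t n = strlen(name), s = strlen(suffix);
    return n >= s && strcmp(name + n - s, suffix) == 0;
}

/* Classify a file by its extension only. */
asr_file_kind asr_classify_file(const char *name)
{
    if (ends_with(name, ".a"))   return ASR_FILE_DECODER_LIB;
    if (ends_with(name, ".bin")) return ASR_FILE_ACOUSTIC_MODEL;
    if (ends_with(name, ".dat")) return ASR_FILE_LANGUAGE_MODEL;
    return ASR_FILE_UNKNOWN;
}

/* Only the language model is meant to be regenerated by users. */
int asr_file_is_user_generated(asr_file_kind kind)
{
    return kind == ASR_FILE_LANGUAGE_MODEL;
}
```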
The following sections describe the configuration and operations related to speech recognition:
1. Generate the command word language model file
Before using the speech recognition function in the SDK, first generate the language model file from your wake-up words and command words. For instructions, refer to the Command Word and Firmware Making Guide in this documentation center.
2. Configure the confidence threshold of command words
The SDK uses the generated language model file together with the decoding library to perform recognition. The default confidence threshold is 25: when the recognition score of a command word is greater than this value, the command word is considered recognized. If some command words are easily misrecognized or hard to recognize in practice, you can adjust the confidence threshold to improve recognition. The lower the threshold, the more sensitive recognition becomes and the higher the risk of false recognition; the higher the threshold, the lower the false recognition rate. For example, if a command word is misrecognized, raise its threshold slightly, such as to 28; if a command word is hard to recognize, lower its threshold, such as to 22. Keep the threshold between 20 and 30; values outside this range reduce how well recognition adapts to regional accents.
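The threshold rule can be sketched in C as follows. This is illustrative only: the decoder's real scoring interface is not described in this guide, so the integer score, the asr_command_cfg struct, and all function names are assumptions. Only the default of 25, the strict "greater than" comparison, and the recommended 20~30 range come from the text above.

```c
#include <stdbool.h>

#define ASR_DEFAULT_THRESHOLD 25 /* default stated in this guide */
#define ASR_THRESHOLD_MIN     20 /* recommended lower bound */
#define ASR_THRESHOLD_MAX     30 /* recommended upper bound */

/* Hypothetical per-command setting, mirroring one row of the command table. */
typedef struct {
    const char *command;
    int threshold; /* 0 means "use the default" */
} asr_command_cfg;

/* Effective threshold: the per-command value if set, else the default. */
int asr_effective_threshold(const asr_command_cfg *cfg)
{
    return cfg->threshold > 0 ? cfg->threshold : ASR_DEFAULT_THRESHOLD;
}

/* A command counts as recognized only when its score is strictly greater than the threshold. */
bool asr_is_recognized(const asr_command_cfg *cfg, int score)
{
    return score > asr_effective_threshold(cfg);
}

/* True when a configured threshold stays inside the recommended range. */
bool asr_threshold_in_range(int threshold)
{
    return threshold >= ASR_THRESHOLD_MIN && threshold <= ASR_THRESHOLD_MAX;
}
```

Raising a misrecognized command from 25 to 28 in this model simply means scores of 26~28 no longer trigger it.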
To modify the configuration:
Open the table file [60000]{Smart Housekeeper}cmd_info.xls in the cmd_info directory of the SDK package. Its path is project name\firmware\user_file\cmd_info (for example, sample\internal\sample_light\firmware\user_file\cmd_info). In the confidence threshold column, change the value for each command word you want to tune, save the file, and then repackage the firmware.
3. Configure the special word count of command words
If some of your command words share the same words, such as "up and down sweeping" and "up and down sweeping stop", or "twenty degrees" and "twenty nine degrees", you must configure the special word count. For such command words, find the consecutive words that differ, such as "stop" in "up and down sweeping" versus "up and down sweeping stop", and "nine" in "twenty degrees" versus "twenty nine degrees". Let D be the number of these consecutive differing words. D is counted in characters of the original Chinese command words, which is why the single English word "stop" gives D = 2 in the example below. Differing words that are separated from each other are not added together (for example, for "turn on the air conditioner" and "turn on the air conditioner", D should be 1; for "quickly turn on the air conditioner" and "turn on the air conditioner", D should be 2). When D = N, set the special word count between 15 × N and 20 × N.
For example, when D is 1, as with the differing word "nine" between "twenty degrees" and "twenty nine degrees", the count can be set to 15~20; when D is 2, as with the differing word "stop" (two characters in the original Chinese command) between "up and down sweeping" and "up and down sweeping stop", the count can be set to 30~40. Note, however, that the higher the special word count, the slower the response may be in noisy conditions. Choose command words that share as few words with each other as possible; if shared words are unavoidable, keep D at 2 or less. The special word count should preferably stay between 15 and 45.
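The rule for D and the resulting count range can be sketched as below. The function names are hypothetical. Each command is passed as an array of units (assumed to be one Chinese character per unit; the usage shows them as pinyin), the shorter command is assumed to appear in order inside the longer one, and shared units are matched greedily from left to right. That suits the common case of one inserted phrase, but it is not guaranteed to match how Chipintelli's tools count D.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: D for two command words where every unit of `shorter`
 * appears, in order, inside `longer`. Each shared unit resets the run, so
 * separated differing units are not added together. Returns the longest run
 * of consecutive extra units, or -1 if `shorter` is not contained in `longer`.
 */
int special_word_d(const char *const *shorter, size_t n_short,
                   const char *const *longer, size_t n_long)
{
    size_t j = 0;
    int run = 0, d = 0;
    for (size_t i = 0; i < n_long; i++) {
        if (j < n_short && strcmp(longer[i], shorter[j]) == 0) {
            j++;      /* shared unit: ends the current run */
            run = 0;
        } else {
            run++;    /* unit that appears only in the longer command */
            if (run > d)
                d = run;
        }
    }
    return j == n_short ? d : -1;
}

/*
 * Recommended special word count for a given D: 15*D to 20*D.
 * Returns 0 and leaves the outputs untouched when D is outside 1..2,
 * the range this guide recommends.
 */
int special_word_count_range(int d, int *lo, int *hi)
{
    if (d < 1 || d > 2)
        return 0;
    *lo = 15 * d;
    *hi = 20 * d;
    return 1;
}
```

With units {"er", "shi", "du"} and {"er", "shi", "jiu", "du"}, special_word_d returns 1 and special_word_count_range then gives 15~20.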
To modify the configuration:
Open the table file [60000]{Smart Housekeeper}cmd_info.xls in the cmd_info directory of the SDK package. Its path is project name\firmware\user_file\cmd_info (for example, sample\internal\sample_light\firmware\user_file\cmd_info). In the special word count column, update the values for all command words that contain one another, save the file, and then repackage the firmware.
4. Select an appropriate acoustic model
Acoustic models are provided by Chipintelli. Some are included in the SDK package, and others can be downloaded from the voice AI platform. The following table lists the application scenarios and precautions for each acoustic model; select the model that matches your product's application scenario.
Main scenario | General home environment | Children’s toys | Kitchen range hood environment | Bathroom water environment |
---|---|---|---|---|
Application description | Products used in ordinary home environments (such as bedrooms and living rooms) | Products used mainly for children's interaction, such as toys | Products used near kitchen range hoods | Products used in bathrooms (water heaters, etc.) |
Precautions | Noise reduction can be enabled when ambient noise is steady at 60~65 dB | Noise reduction cannot be enabled | Noise reduction must be enabled | Noise reduction must be enabled |
Chinese model | GE-CH-S-V00146 | GE-CH-S-V00138 | YJ-CH-S-V00129 | WR-CH-S-V00130 |
English model | GE-EN-S-V00131 | | | |
Japanese model | GE-JP-S-V00141 | | | |
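For products that ship in several scenarios, the table can be captured as a lookup, as sketched below. The enum, struct, and function names are hypothetical; the model IDs and noise-reduction rules are copied from the table and should be rechecked against the voice AI platform, since Chipintelli updates the models.

```c
#include <stddef.h>
#include <string.h>

/* Scenarios from the table above. */
typedef enum { SCENE_HOME, SCENE_TOYS, SCENE_KITCHEN, SCENE_BATHROOM } asr_scene;

/* Noise-reduction guidance from the "Precautions" row. */
typedef enum { NR_OPTIONAL, NR_NOT_ALLOWED, NR_REQUIRED } nr_policy;

typedef struct {
    const char *language; /* "CH", "EN" or "JP", as in the model IDs */
    asr_scene scene;
    const char *model_id;
} model_entry;

static const model_entry k_models[] = {
    {"CH", SCENE_HOME,     "GE-CH-S-V00146"},
    {"CH", SCENE_TOYS,     "GE-CH-S-V00138"},
    {"CH", SCENE_KITCHEN,  "YJ-CH-S-V00129"},
    {"CH", SCENE_BATHROOM, "WR-CH-S-V00130"},
    {"EN", SCENE_HOME,     "GE-EN-S-V00131"},
    {"JP", SCENE_HOME,     "GE-JP-S-V00141"},
};

/* Model ID for a language/scenario pair, or NULL if the table lists none. */
const char *asr_pick_model(const char *language, asr_scene scene)
{
    for (size_t i = 0; i < sizeof k_models / sizeof k_models[0]; i++)
        if (k_models[i].scene == scene && strcmp(k_models[i].language, language) == 0)
            return k_models[i].model_id;
    return NULL;
}

/* Noise-reduction rule for a scenario. */
nr_policy asr_noise_reduction_policy(asr_scene scene)
{
    switch (scene) {
    case SCENE_TOYS:     return NR_NOT_ALLOWED;
    case SCENE_KITCHEN:
    case SCENE_BATHROOM: return NR_REQUIRED;
    default:             return NR_OPTIONAL; /* home: only for steady 60~65 dB noise */
    }
}
```

Returning NULL for unlisted pairs (such as an English toy model) makes a missing model an explicit case for the caller to handle.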
5. Configure frame-by-frame mode
If your application needs better recognition of fast speech, you can enable frame-by-frame mode, provided the number of command words is small and no other task consumes significant CPU resources. To check the size of the command word set, connect the development board's serial port, power on the board, and read the node count in the boot log: it is the number after the keyword "states". Frame-by-frame mode is recommended only when the node count is below 2000 for CI1102 solutions, or below 4000 for CI1103 solutions. Use this mode with care: it requires more CPU resources, so recognition responses will be slower than normal, usually by 100~200 ms.
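The node-count check can be automated on a host that captures the serial log, as in the sketch below. This guide does not specify the exact log line format, so the parser assumes the count follows "states" after optional spaces, ':' or '='. The function and type names are hypothetical; the 2000 and 4000 limits are the recommendations above.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Extract the node count after the keyword "states" in one boot-log line.
 * Assumed formats: "states: 1520" or "states=1520". Returns -1 if the keyword
 * or a following number is missing.
 */
long asr_parse_states(const char *line)
{
    const char *p = strstr(line, "states");
    if (p == NULL)
        return -1;
    p += strlen("states");
    while (*p == ' ' || *p == '\t' || *p == ':' || *p == '=')
        p++;                              /* skip assumed separators */
    if (!isdigit((unsigned char)*p))
        return -1;
    return strtol(p, NULL, 10);
}

#define CI1102_MAX_NODES 2000 /* recommended limit from this guide */
#define CI1103_MAX_NODES 4000 /* recommended limit from this guide */

typedef enum { CHIP_CI1102, CHIP_CI1103 } asr_chip;

/* Nonzero when the node count is below the recommended limit for the chip. */
int frame_mode_recommended(asr_chip chip, long nodes)
{
    long limit = (chip == CHIP_CI1102) ? CI1102_MAX_NODES : CI1103_MAX_NODES;
    return nodes >= 0 && nodes < limit;
}
```

Passing the parser's -1 straight into frame_mode_recommended yields "not recommended", so a missing log line fails safe.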
To modify the configuration:
Open the asr_api.c file in the SDK package, located in components\asr.
In this file, set USE_DSPK_EN to 1, as shown below, to enable frame-by-frame mode; set it to 0 to disable it.
#define USE_DSPK_EN 1 /* Enable frame-by-frame mode */
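If you would rather switch the mode from the build (for example, with a -DUSE_DSPK_EN=0 compiler flag) than edit asr_api.c each time, the define could be guarded as sketched below. Whether the SDK's build scripts pass such flags is an assumption, and asr_frame_mode_enabled is a hypothetical helper, not an SDK function.

```c
/* Fall back to the value shown above when the build does not define it. */
#ifndef USE_DSPK_EN
#define USE_DSPK_EN 1 /* 1: frame-by-frame mode on, 0: off */
#endif

/* Reject anything other than 0 or 1 at compile time. */
#if (USE_DSPK_EN != 0) && (USE_DSPK_EN != 1)
#error "USE_DSPK_EN must be 0 or 1"
#endif

/* Hypothetical accessor so application code can log which mode was built. */
int asr_frame_mode_enabled(void)
{
    return USE_DSPK_EN;
}
```

The #error guard turns a typo such as USE_DSPK_EN=2 into a build failure instead of a silently enabled mode.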