Voice Recognition User Guide

Automatic Speech Recognition (ASR) is the process of converting speech into text. This guide applies to all CI13XX series chip SDKs.

Important ASR-related algorithm files in the SDK:

ASR decoder library: Located at \components\asr\decoder\ in the SDK package, provided as static library files (*.a). It uses deep learning algorithms to recognize speech and output the corresponding text string. It is maintained and updated by Chipintelli; users do not need to modify these files.
Acoustic model files: Located at \project_name\firmware\dnn\ (e.g., \projects\offline_asr_pro_sample\firmware\dnn\), provided as binary files (*.fefixbin3632). These are generated through training and updated by Chipintelli. Users can select different acoustic models provided by Chipintelli for various application scenarios or languages.
Language model files: These define the wake words and command words. Wake words wake up the device; command words are used for control. The files are located at \project_name\firmware\asr\ (e.g., \projects\offline_asr_pro_sample) as *.dat files and must be generated online via the Chipintelli AI Speech Development Platform. For usage, see the Command Word and Firmware Production Guide (《命令词和固件制作指南》).

Voice recognition configuration and operations:

1. Generate the command-word language model files

Before using the SDK’s voice recognition feature, you must generate the language model files from your wake words and command words. For the generation procedure, see the Command Word and Firmware Production Guide (《命令词和固件制作指南》).

2. Configure confidence thresholds for command entries

With the generated language model files, the SDK uses the decoder library to perform recognition. Each command has a default confidence threshold configured in software (recommended range in SDK projects: 1–199; the appropriate value depends on command length). If a command’s recognition score is greater than or equal to its threshold, the result is accepted as valid; otherwise, it is rejected. In real applications, if certain commands are prone to false recognition or are hard to recognize, adjust their confidence thresholds to optimize performance. In general, a lower threshold makes recognition more sensitive but increases false positives, while a higher threshold reduces false positives but may lower the recognition rate.
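The acceptance rule above can be sketched in Python as follows. This is an illustration only, assuming integer scores; the actual check is performed inside the SDK and decoder library, and the function and constant names here are hypothetical, not SDK APIs.

```python
THRESHOLD_MIN = 1    # lower end of the recommended range in this guide
THRESHOLD_MAX = 199  # upper end of the recommended range in this guide


def is_accepted(score: int, threshold: int) -> bool:
    """Return True if a recognition result should be treated as valid.

    Hypothetical helper for illustration: the real check runs inside the
    SDK/decoder library.
    """
    if not THRESHOLD_MIN <= threshold <= THRESHOLD_MAX:
        raise ValueError(
            f"threshold {threshold} is outside {THRESHOLD_MIN}-{THRESHOLD_MAX}"
        )
    # A score equal to the threshold counts as valid.
    return score >= threshold


print(is_accepted(40, 40))  # True
print(is_accepted(39, 40))  # False
```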

Quick tuning of confidence thresholds:

For example, suppose the command “open the air conditioner” has a default threshold of 40 (defaults may vary by acoustic model). If you observe frequent false positives, raise its threshold to 42; this reduces false triggers with minimal impact on recognition. Conversely, if the command “close the TV” has a default threshold of 40 but is hard to recognize, lower its threshold to 38 to improve recognition, provided false positives do not increase noticeably. Keep all adjusted values within 1–199.
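The quick-tuning step amounts to nudging a threshold up or down while keeping it inside the 1–199 range. A minimal sketch, where `tune_threshold` is a hypothetical helper rather than part of the SDK; the result is the value you would then enter in cmd_info.xlsx.

```python
def tune_threshold(current: int, delta: int, lo: int = 1, hi: int = 199) -> int:
    """Raise (delta > 0) or lower (delta < 0) a threshold, clamped to [lo, hi].

    Hypothetical helper for illustration; not an SDK function.
    """
    return max(lo, min(hi, current + delta))


# "open the air conditioner": too many false triggers, raise 40 to 42
print(tune_threshold(40, +2))  # 42
# "close the TV": hard to recognize, lower 40 to 38
print(tune_threshold(40, -2))  # 38
```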

Precise tuning principle:

Choose each threshold to achieve the best overall trade-off between the recognition rate and the false-recognition rate, rather than optimizing either one in isolation.
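One way to make this principle concrete, assuming you have test recordings and have collected the score a command receives on each of them, is to sweep candidate thresholds and keep the one with the largest recognition rate minus a weighted false-recognition rate. The objective, the `false_weight` parameter, and the function name are assumptions for illustration, not Chipintelli's tuning procedure.

```python
from typing import Sequence


def best_threshold(
    target_scores: Sequence[int],
    nontarget_scores: Sequence[int],
    false_weight: float = 1.0,
    lo: int = 1,
    hi: int = 199,
) -> int:
    """Return the threshold in [lo, hi] maximizing
    recognition_rate - false_weight * false_rate.

    target_scores: scores the command got on recordings that contain it.
    nontarget_scores: scores it got on recordings that do not contain it.
    """
    if not target_scores or not nontarget_scores:
        raise ValueError("both score lists must be non-empty")
    best_t, best_benefit = lo, float("-inf")
    for t in range(lo, hi + 1):
        # Fraction of genuine commands that would be accepted at threshold t.
        recog_rate = sum(s >= t for s in target_scores) / len(target_scores)
        # Fraction of non-command audio that would falsely trigger at t.
        false_rate = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        benefit = recog_rate - false_weight * false_rate
        # Strict '>' keeps the lowest threshold when several tie.
        if benefit > best_benefit:
            best_t, best_benefit = t, benefit
    return best_t
```

Raising `false_weight` above 1.0 favors fewer false triggers at the cost of recognition rate; lowering it does the opposite.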

Steps to adjust confidence thresholds:

Open the cmd_info Excel file [60000]cmd_info.xlsx in the SDK package at \project_name\firmware\user_file\cmd_info\ (e.g., \projects\offline_asr_pro_sample). In the Confidence Threshold column, modify the values for the target commands. Save the file, then repack and flash the firmware.

3. Configure special word counts for commands

If two commands begin with the same characters (or the same pronunciation), or one command contains the other, and both appear in the command set, the shorter command must be assigned a special word count. Counting rule: 8 + 4 × (difference in the number of characters), with a maximum of 20.

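Examples: “上下扫风” (vertical swing) and “上下扫风停止” (stop vertical swing) differ by 2 characters → 8 + 4 × 2 = 16.

“打开空调” (turn on the air conditioner) and “打开空调扇” (turn on the air-conditioning fan) differ by 1 character → 8 + 4 × 1 = 12.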

These values are general guidance. In practice, you may fine-tune them up or down based on observed recognition results, but do not exceed 20; excessively large values can slow the response.
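The counting rule can be expressed as a short Python sketch. Python's `len()` counts Unicode characters, so each Chinese character counts as one. Both helper names are hypothetical, and `overlaps_textually` covers only the containment case; commands that merely share some leading characters or sound alike still have to be identified by listening.

```python
def special_word_count(cmd_a: str, cmd_b: str, cap: int = 20) -> int:
    """Starting special word count for the shorter of two overlapping commands.

    Rule from this guide: 8 + 4 * (difference in character count), capped at 20.
    """
    diff = abs(len(cmd_a) - len(cmd_b))
    return min(8 + 4 * diff, cap)


def overlaps_textually(cmd_a: str, cmd_b: str) -> bool:
    """True if the shorter command appears inside the longer one (e.g. as its start)."""
    shorter, longer = sorted((cmd_a, cmd_b), key=len)
    return shorter != longer and shorter in longer


print(special_word_count("上下扫风", "上下扫风停止"))  # 16
print(special_word_count("打开空调", "打开空调扇"))  # 12
```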

Adjustment method:

Open the cmd_info Excel file [60000]cmd_info.xlsx at: \project_name\firmware\user_file\cmd_info\ (e.g., \projects\offline_asr_pro_sample). Modify the Special Word Count values for the relevant commands. Save, then repack and flash the firmware.

4. Select an appropriate acoustic model

Acoustic models are provided by Chipintelli. Some are included in the SDK package; others can be downloaded from the AI Speech Development Platform. The table below lists scenarios and notes to help you choose the right model for your product.

Primary Scenario	General Home Environment	Kitchen Range Hood Environment	Bathroom/Water Sound Environment
Description	Use in typical home environments (e.g., bedrooms, living rooms)	Use in kitchen range hood environments	Use in bathroom environments (e.g., water heaters)
Notes	Enable noise reduction when steady noise ≥ 65 dB	Noise reduction required	Noise reduction required
Chinese Model	中文普通话通用V1_1.4M_V00487	中文普通话烟机V2_1.4M_V00567	中文普通话卫浴V1_1.4M_V00583
English Model	英文标准通用V1_1M_V00488	\	\
Japanese Model	\	\

Model selection:

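Naming rule: LanguageScenario_Type_Version_ModelSize_ModelID_Remarks
Example: 中文取暖器通用_pro3_V1_0.5M_V01052_仅支持算法sdkV2.2.6 (language and scenario: Chinese, heater, general-purpose; type: pro3; version: V1; model size: 0.5M; model ID: V01052; remark: only supports algorithm SDK V2.2.6)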
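To check a model name against the naming rule, its fields can be split on underscores. A minimal sketch, assuming the six-field layout above with the remark optional; `parse_model_name` and `ModelName` are hypothetical names. The names in the table above use a shorter form without a separate type field (e.g., 中文普通话通用V1_1.4M_V00487), and this parser rejects them rather than guessing. The model ID is converted to the short number form described in the usage notes below.

```python
from typing import NamedTuple


class ModelName(NamedTuple):
    language_scenario: str  # e.g. 中文取暖器通用
    model_type: str         # e.g. pro3
    version: str            # e.g. V1
    size: str               # e.g. 0.5M (kept as text)
    model_id: int           # V01052 becomes 1052
    remarks: str            # may be empty


def parse_model_name(name: str) -> ModelName:
    """Split a model name that follows the six-field naming rule."""
    parts = name.split("_", 5)  # maxsplit=5: any further '_' stays in the remarks
    if len(parts) < 5:
        raise ValueError(f"{name!r} does not follow the full naming rule")
    lang_scenario, model_type, version, size, raw_id = parts[:5]
    remarks = parts[5] if len(parts) == 6 else ""
    if not (raw_id.startswith("V") and raw_id[1:].isdecimal()):
        raise ValueError(f"unexpected model ID field {raw_id!r}")
    return ModelName(lang_scenario, model_type, version, size, int(raw_id[1:]), remarks)


m = parse_model_name("中文取暖器通用_pro3_V1_0.5M_V01052_仅支持算法sdkV2.2.6")
print(m.model_type, m.model_id)  # pro3 1052
```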

Usage notes:

(1) Type: Different types denote different model architectures. With the same size, higher type numbers usually perform better.
(2) Version: Prefer the latest version (a higher number is newer, e.g., V2 over V1).
(3) Model size: For the same type (e.g., pro3), larger models contain more parameters and generally yield higher accuracy.
(4) Model ID: A unique identifier. Use it to refer to a model unambiguously when communicating (e.g., V01052 can be referred to as model 1052).
(5) Remarks: As technology iterates, some new architectures are only supported by newer SDKs. Choose an SDK version compatible with the model.
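For note (5), a rough compatibility check can compare your SDK version with the version named in a model's remark. This sketch assumes the remark states the minimum algorithm SDK version the model requires (an interpretation, not confirmed by this guide) and that versions are written like V2.2.6; the function names are hypothetical. Confirm the actual requirement with Chipintelli before relying on it.

```python
import re


def parse_version(text: str) -> tuple[int, ...]:
    """Extract the first 'V<digits>[.<digits>...]' in text, e.g. V2.2.6 -> (2, 2, 6)."""
    match = re.search(r"[Vv](\d+(?:\.\d+)*)", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def sdk_is_compatible(sdk_version: str, model_remark: str) -> bool:
    """ASSUMPTION: the remark names the minimum algorithm SDK version required.

    A remark without any version is treated as imposing no SDK constraint.
    """
    try:
        required = parse_version(model_remark)
    except ValueError:
        return True
    # Tuples compare field by field: (2, 3, 0) >= (2, 2, 6).
    return parse_version(sdk_version) >= required


print(sdk_is_compatible("V2.3.0", "仅支持算法sdkV2.2.6"))  # True
print(sdk_is_compatible("V2.2.5", "仅支持算法sdkV2.2.6"))  # False
```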

Other notes:

Enabling features such as dual mic, acoustic echo cancellation (AEC), noise reduction, or self-learning may reduce the number of command entries that can be supported and slow down recognition; reduce the number of command entries accordingly.
When self-learning is enabled, the raw audio must not be altered, so do not enable other software algorithms such as AEC, noise reduction, or dual mic.
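These constraints can be checked before building the firmware. A minimal sketch, in which the feature labels and the function name are hypothetical, not SDK configuration keys, and the conflict list covers only the algorithms named above (the guide's "etc." may include others):

```python
# Hypothetical feature labels for illustration; these are not SDK configuration keys.
AUDIO_ALTERING = {"aec", "noise_reduction", "dual_mic"}
CAPACITY_AFFECTING = AUDIO_ALTERING | {"self_learning"}


def review_feature_config(enabled: set[str]) -> list[str]:
    """Return warnings for a set of enabled feature labels (empty list if none)."""
    warnings = []
    if enabled & CAPACITY_AFFECTING:
        warnings.append(
            "extra features enabled: fewer command entries may be supported and "
            "recognition may slow; review the number of command entries"
        )
    if "self_learning" in enabled:
        conflicts = sorted(enabled & AUDIO_ALTERING)
        if conflicts:
            warnings.append(
                "self-learning needs unaltered raw audio; disable: " + ", ".join(conflicts)
            )
    return warnings


print(review_feature_config({"self_learning", "aec"}))
```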