Command Words and Firmware Production Guide

1. Speech Recognition Process and Required Resources

The figure below shows the speech recognition process and the resources it requires. The microphone converts speech into a digital signal, which is passed to the NN (Neural Network) for recognition. NN recognition requires two resources: an acoustic model and a language model. The NN outputs a string, which the system looks up in the command word information table. If the string is not found, the result is treated as a misrecognition and ignored. If it is found, the recognition is valid: the system retrieves the information associated with the command word, performs the corresponding application function, and finally calls the prompt tone player to play the corresponding prompt.

Figure 1-1 Resources and Flowchart

Note

Language Model: Generated based on command words, used for NN recognition.
Acoustic Model: Used for NN recognition, typically related to factors such as language and application scenarios.
Command Word Information Table: Stores information related to command words, such as command strings, wake-up word status, corresponding prompt tones, etc.
Prompt Tones: Used for feedback after command word recognition. Currently supports MP3 format.
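
The lookup step described above can be sketched on the host side as follows. This is only an illustrative model in Python, not SDK code: the `CommandEntry` type, the dictionary-based table, and the two callbacks are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class CommandEntry:
    """One row of a hypothetical in-memory command word information table."""
    command: str      # command word string produced by NN recognition
    is_wakeup: bool   # wake-up word status
    prompt_id: int    # ID of the prompt tone used for feedback


def handle_recognition(
    result: str,
    table: Dict[str, CommandEntry],
    apply_function: Callable[[CommandEntry], None],
    play_prompt: Callable[[int], None],
) -> Optional[CommandEntry]:
    """Look up the recognized string; ignore it if absent, otherwise act on it."""
    entry = table.get(result)
    if entry is None:
        # Not in the table: treated as a misrecognition and ignored.
        return None
    apply_function(entry)         # application function processing
    play_prompt(entry.prompt_id)  # feedback through the prompt tone player
    return entry
```

Calling `handle_recognition` with a string that is missing from the table returns `None` and triggers neither callback, which mirrors the "ignored" branch of the flowchart.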

The following sections will explain how to generate the required resources.

2. Creating Command Word Model Files

2.1. Accessing the Model Creation Interface

Log in to ☞Chipintelli AI Speech Development Platform. As shown in the figure below, click the (Language Model Development) icon to enter the language model interface:

Figure 2-1 Language Model Menu

Create a new project:

Figure 2-2 Create New Language Model Project

Model Creation Interface: First, fill in the basic information, then edit the command words, and finally submit for processing.

Figure 2-3 Model Creation Interface

2.2. Downloading the Acoustic Model

In the command word editing interface, select the acoustic model and download it.

Figure 2-4 Download Acoustic Model Interface

Extract the downloaded acoustic model package to obtain the acoustic model:

Figure 2-5 Extract Downloaded Acoustic Model

2.3. Downloading the Language Model

Select the language type from the dropdown (Chinese, English, Japanese, Korean) and download the corresponding template file.

Figure 2-6 Download Template File

Open the downloaded Chinese template file and edit the command words according to the file format:

Figure 2-7 Edit Command Words

Note

Command Word: The string of the command word to be recognized.
Command Word Type: Specifies whether the command word is a wake-up word. If a wake-up word is specified, the platform will generate dual-network data.
Dual-Network: To improve the recognition effect of wake-up words, a separate recognition model is created for wake-up words, while other command words share another model. This means the project uses two neural network models, hence the term “dual-network”.
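
To make the dual-network idea concrete, the sketch below groups command words by type, the way the platform conceptually separates wake-up words from ordinary command words. The function name and the row format are assumptions for illustration; the group numbers follow the SDK demo convention described under "Model Group ID" in Section 4.1 (0 for command words, 1 for wake-up words).

```python
from typing import Dict, List, Tuple

COMMAND_GROUP = 0  # default command word model in the SDK demo
WAKEUP_GROUP = 1   # wake-up word model in the SDK demo


def split_by_type(rows: List[Tuple[str, bool]]) -> Dict[int, List[str]]:
    """Group (command word, is_wakeup) rows into one word list per model."""
    groups: Dict[int, List[str]] = {}
    for word, is_wakeup in rows:
        group = WAKEUP_GROUP if is_wakeup else COMMAND_GROUP
        groups.setdefault(group, []).append(word)
    return groups
```

If no row is marked as a wake-up word, only group 0 is produced, which corresponds to a single-network project.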

Upload the command word list file and submit for processing:

Figure 2-8 Upload Command Word List File

Uploading the command word list file is optional. You can instead add command words directly in the command word information table by clicking “Add Row”, then click “Submit” to proceed with model creation.
The platform then loads the command words and generates the model. Wait until the current process shows “Completed”, then click “Download Files”:

Figure 2-9 Download Generated Language Model Files

Extract the downloaded language model package to obtain the command word information table file: “[60000]cmd_info.xlsx” and the command word/wake-up word language model file “asr_chinese_xxx_xxx.dat”:

Figure 2-10 Extract Downloaded Language Model Files

3. Generating Prompt Tones (Voice)

Access the “Prompt Tone Synthesis” interface from the platform’s main menu:

Figure 3-1 Prompt Tone Synthesis Menu

Create a project:

Figure 3-2 Create Prompt Tone Synthesis Project

Fill in the basic information:

1. Language Type: Select Chinese, English, or Japanese from the dropdown.

2. Download Sample File: Samples in Chinese, English, or Japanese will appear based on the selected language.

3. Voice Selection: Choose from different voice types.

4. Preview Prompt Tone: Listen to the selected voice.

Figure 3-3 Prompt Tone Information

Download the sample file:

Figure 3-4 Download Sample File

Open the downloaded template file. Examples for Chinese and English are shown below:

Figure 3-5 Chinese and English Examples

Edit the prompt tones to be generated according to the template format:

Figure 3-6 Edit Prompt Tones to be Generated

Note

Audio ID: Specifies the ID number of the generated audio file.
Audio Name: Specifies the filename of the generated audio file.
Synthesis Content: The text content to be converted to speech (can include commas but no spaces between words).

Tip

Prompt tones can be organized to reduce firmware size and save FLASH space. The SDK supports combined playback and selective playback, so a phrase shared by several prompts can be stored once as its own audio file. For example, “Turn on air conditioner,” “Turn on TV,” “Turn on fan,” “Turn on desk lamp,” “Turn on living room light,” and “Turn on study light” all contain the phrase “Turn on.” You can create a separate file for “Turn on” and associate it with each command word in the command word information table using combined playback. Similarly, a startup prompt such as “I am xxx, you can use xxx to wake me up” can be split into 4 audio files:

1. I am

2. xxx

3. you can use

4. to wake me up

In the command word information table, enter “1+2+3+2+4” as the prompt tone ID association. Because “xxx” can be any of several names, the combined and selective playback features let you choose the appropriate name at runtime without generating a separate full set of prompt tones for each one.
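
The “1+2+3+2+4” notation can be modeled with a short Python sketch that expands an ID expression into an ordered playlist. The function names and the audio name mapping are hypothetical and serve only as illustration; the limit of 16 tones per combination is the one stated for the Prompt Tone ID field in Section 4.1.

```python
from typing import Dict, List

MAX_COMBINED_TONES = 16  # limit stated for the Prompt Tone ID field


def parse_prompt_ids(expr: str) -> List[int]:
    """Split an expression such as '1+2+3+2+4' into a list of audio IDs."""
    parts = [p.strip() for p in expr.split("+")]
    if any(not p.isdecimal() for p in parts):
        raise ValueError(f"invalid prompt tone ID expression: {expr!r}")
    ids = [int(p) for p in parts]
    if len(ids) > MAX_COMBINED_TONES:
        raise ValueError("too many prompt tones in one combination")
    return ids


def build_playlist(expr: str, audio_names: Dict[int, str]) -> List[str]:
    """Map each ID in the expression to its audio name, preserving order."""
    return [audio_names[i] for i in parse_prompt_ids(expr)]
```

With the mapping `{1: "i_am", 2: "name", 3: "you_can_use", 4: "to_wake_me_up"}`, the expression “1+2+3+2+4” expands to five playback steps in which the name clip is reused twice.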

Return to the “Prompt Tone Synthesis” interface on the platform, fill in the project information, upload the edited audio string file, and click the “Submit” button. The platform will start generating the speech, which may take some time.

Figure 3-7 Upload Audio String File

Wait for the page to indicate successful generation, and a “Download Synthesized Files” button will appear. Click to download:

Figure 3-8 Generation Successful

The following audio files will be downloaded:

Figure 3-9 Download Audio Files

4. Creating Firmware

4.1. Editing the Command Word Information Table File

Copy the command word information table file “[60000]cmd_info.xlsx” downloaded in Section 2.3 to the path: %SDK_PATH%\projects\sample_xxx\firmware\user_file\cmd_info\, replacing the original file starting with [60000]. Then modify it as the project logic requires, mainly by associating prompt tones, setting wake-up words, and adjusting recognition sensitivity.

Figure 4-1 Command Word Information Table File

Note

Model Name: Used to set the model name corresponding to the current set of command words. Currently, there are two: NN ID (Acoustic Model File ID) and ASR ID (Language Model File ID).
Model ID: Used to set the model ID number corresponding to the current set of command words. You can fill in 0 or a number greater than 0, but it must match the [ID] prefix of the file. For example, the file [3]asr_xxx_cmd.dat has an ID of 3, so the Model ID for ASR ID should be 3.
Command Word: The command word string.
Command Word ID: A user-defined ID for the command word, making it easier to implement logic quickly. By default, different command words cannot use the same Command Word ID. If necessary, you can modify the script file “cmd_info.bat” by adding --no-cmd-id-duplicate-check after the cmd_info.exe command.
Semantic ID: A Chipintelli-defined string semantic ID that is unique. If the product considers home networking, this ID can resolve command word conflicts across multiple devices.
Confidence: Used to adjust the recognition sensitivity of command words to address misrecognitions.
Wake-up Word: Specifies the wake-up word.
Combined Word: Specifies a combined word that serves as both a wake-up word and a command word, eliminating the need for a separate wake-up step.
Expected Word: Used when certain command words are particularly difficult to recognize.
Unexpected Word: Used when certain command words are easily recognized but may cause misrecognition of other similar correct command words.
Special Word Count: Used for short command words that might intercept longer command words with the same content. For example, “Heat” and “Heat for three minutes” might both be recognized as “Heat.” The solution is to set a special word count for the “Heat” command. After recognizing “Heat,” the system waits briefly to see if a similar command follows. If so, it discards the “Heat” recognition. However, setting this value too high can significantly increase the response time for “Heat.”
Prompt Tone Type: Mainly used for multiple selective prompts to specify the selection method. Currently, two types are supported: “Random Selection” (set select_index to -1 when calling the playback interface) and “Custom Selection” (set select_index to the desired value when calling the playback interface).
Prompt Tone ID: The ID of the prompt tone file (i.e., the Audio ID from Chapter 3). Use ‘+’ to connect multiple prompt tones (up to 16 tones in combination). If there are multiple selective prompts, each option occupies one column, with a maximum of 127 columns.
Model Group ID: Used for model switching. In the SDK demo, 0 is the default command word model, and 1 is the wake-up word model.
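
The rule that a Model ID must match the [ID] prefix of the model file can be checked with a small host-side helper. This is a minimal sketch, not part of the SDK tooling; `file_id` and `model_id_matches` are hypothetical names.

```python
import re
from typing import Optional

_ID_PREFIX = re.compile(r"^\[(\d+)\]")


def file_id(filename: str) -> Optional[int]:
    """Return the numeric [ID] prefix of a resource file name, or None."""
    match = _ID_PREFIX.match(filename)
    return int(match.group(1)) if match else None


def model_id_matches(filename: str, model_id: int) -> bool:
    """Check that the table's Model ID equals the file's [ID] prefix."""
    return file_id(filename) == model_id
```

For the example in the note above, `file_id("[3]asr_xxx_cmd.dat")` yields 3, so the ASR ID entered in the table should also be 3.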

Tip

If there are prompts not associated with any command word, you can create a dummy command word. The command word string is not used to generate the language model and will not be recognized but can be played back.
The ID in the command word information table filename must be 60000 and cannot be modified, e.g., [60000]cmd_info.xlsx.
Use combined playback, selective playback, and model switching appropriately to reduce firmware size and save FLASH space.
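
The two Prompt Tone Type modes can be modeled as follows. This Python sketch only illustrates the selection semantics described above (a `select_index` of -1 for random selection, any other value for custom selection); it is not the SDK playback interface, and treating `select_index` as a zero-based column index is an assumption.

```python
import random
from typing import List, Optional

MAX_OPTIONS = 127    # maximum number of selective prompt columns
RANDOM_SELECT = -1   # select_index value that requests random selection


def select_prompt(
    options: List[str],
    select_index: int,
    rng: Optional[random.Random] = None,
) -> str:
    """Pick one prompt tone ID expression from the selective prompt columns."""
    if not options or len(options) > MAX_OPTIONS:
        raise ValueError("expected between 1 and 127 selective prompts")
    if select_index == RANDOM_SELECT:
        return (rng or random).choice(options)
    if not 0 <= select_index < len(options):
        raise IndexError("select_index is outside the available columns")
    # Assumption: custom selection indexes the columns from zero.
    return options[select_index]
```

Each option is itself a Prompt Tone ID expression, so selective playback and combined playback can be used together, for example choosing between “1+2+4” and “1+5+4”.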

Figure 4-2 Associate Prompt Tones

4.2. Editing Code to Implement Project Requirements

User logic is mainly implemented in the system_msg_deal.c file.
The UserTaskManageProcess function is the user logic processing task, where various messages are handled, such as speech recognition messages, button messages, and UART messages.
Locate the message to be processed and implement the corresponding logic, such as I/O control, prompt tone selection, model switching, parameter adjustment, and UART reporting.
If there is information that needs to be saved before shutdown, use the ci_nvdm module. Refer to the volume setting code in the standard demo in the SDK.

Note

If you use command words to switch models and the model-switching commands have voice prompts, pay attention to the order of calling the model-switching interface and the voice prompt function, which depends on the model where the prompt tone is located.

4.3. Building and Flashing the Firmware

4.3.1. Copying Resource Files

The firmware production directory is shown below:

Figure 4-3 Firmware Production Directory

Place the language model file (asr_zn_214_CI13LC.dat) generated in Chapter 2 into the asr directory under the firmware directory. Set the file ID according to the ASR ID in the command word information table edited in Section 4.1, e.g., [0]asr_zn_214_CI13LC.dat. If using dual-network, place both the wake-up word and command word language models in this directory.
Place the acoustic model file (GE-CH-S-V00214.fefixbin3676) downloaded in Chapter 2 into the dnn directory under the firmware directory. Set the file ID according to the NN ID in the command word information table edited in Section 4.1, e.g., [0]GE-CH-S-V00214.fefixbin3676. If the model already exists in the dnn directory, no replacement is needed.
Place the prompt tone files (WAV format audio files from the TTS_wav directory) generated in Chapter 3 into the voice directory under the firmware directory. Set the folder ID according to the VOICE GROUP in the command word information table edited in Section 4.1, e.g., [0]voice.
Compile the project code to generate user_code.bin in the user_code directory.
Make the partition bin files by double-clicking “make_partition_bin.bat”. After completion, bin files with the same names as the directories will be generated in the asr, dnn, user_file, and voice directories.
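
The copy-and-rename part of the steps above can be scripted. The following is a minimal sketch under stated assumptions: `prefixed_name` and `stage_resource` are hypothetical helpers, not SDK tools, and the directory passed in is whichever of the asr or dnn directories the file belongs to.

```python
import re
import shutil
from pathlib import Path


def prefixed_name(name: str, file_id: int) -> str:
    """Return the file name with an [ID] prefix, replacing any existing one."""
    bare = re.sub(r"^\[\d+\]", "", name)
    return f"[{file_id}]{bare}"


def stage_resource(src: Path, dest_dir: Path, file_id: int) -> Path:
    """Copy a model file into a firmware subdirectory under its [ID] name."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / prefixed_name(src.name, file_id)
    shutil.copyfile(src, target)
    return target
```

For example, staging asr_zn_214_CI13LC.dat with `file_id=0` into the asr directory produces [0]asr_zn_214_CI13LC.dat, matching the ASR ID 0 entered in the command word information table.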

4.3.2. Packaging the Firmware

Double-click “Package Firmware.bat” to package the upgrade. In the pop-up interface, select “CI13LC,” choose the corresponding chip model (e.g., CI13162), and then select “Firmware Packaging.” For more information on using the tool, press F1 to view the help:

Figure 4-4 Upgrade Tool - Select Chip Type

Enter the packaging interface:

Figure 4-6 Upgrade Tool Packaging Interface

Note

Config: Software and hardware information area.
User, ASR, DNN, Voice, UserFile, etc.: Firmware partition information area.
Menu bar.

Packaging Steps:

1. Fill in the software and hardware information in the version information area.
2. Select or fill in the paths to the partition bin files.
3. Click “Package Firmware.”
4. If a pop-up indicates an address conflict, click “Auto Allocate” or manually adjust the partition sizes, then repeat step 3.
5. A pop-up saying “Firmware generated successfully” indicates successful packaging.
For more information about the firmware packaging interface and troubleshooting, refer to the SDK documentation ☞《UART Upgrade Tool User Guide》.
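
To illustrate what an address conflict between partitions means, the sketch below detects overlapping partitions and lays partitions out back to back. It does not reproduce the tool's “Auto Allocate” behavior; the 4 KiB alignment, the partition tuple format, and the function names are all assumptions made for this example.

```python
from typing import List, Tuple

ALIGN = 0x1000  # assumption: align each partition to a 4 KiB boundary

Partition = Tuple[str, int, int]  # (name, start address, size in bytes)


def find_conflicts(parts: List[Partition]) -> List[Tuple[str, str]]:
    """Return every pair of partitions whose address ranges overlap."""
    conflicts = []
    for i, (name_a, start_a, size_a) in enumerate(parts):
        for name_b, start_b, size_b in parts[i + 1:]:
            if start_a < start_b + size_b and start_b < start_a + size_a:
                conflicts.append((name_a, name_b))
    return conflicts


def auto_allocate(sizes: List[Tuple[str, int]], base: int = 0) -> List[Partition]:
    """Place partitions one after another, each on an aligned start address."""
    layout = []
    addr = base
    for name, size in sizes:
        addr = (addr + ALIGN - 1) // ALIGN * ALIGN  # round up to alignment
        layout.append((name, addr, size))
        addr += size
    return layout
```

A layout produced by `auto_allocate` never overlaps, so `find_conflicts` returns an empty list for it. Enlarging one partition by hand without moving its neighbors is the typical way a conflict appears.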

4.3.3. Flashing the Firmware

Click “Firmware Update” in the pack and update tool:

Figure 4-7 Firmware Update

1. Select or fill in the firmware path.
2. Check the serial port to which the device to be upgraded is connected.
3. Other options: Update all partitions, encrypt.
4. Switch the module to upgrade mode (short the PG and EN pins).
5. Restart the device to be upgraded to start the upgrade.
6. Wait for the upgrade to complete. If successful, the device will automatically boot into the firmware code. If there is a power-on prompt tone, you will hear it.