Product Development Process
Overview
As the most natural interaction method, speech recognition is increasingly accepted by users. However, developing an intelligent voice product differs from traditional logic development: integrating speech recognition requires collaboration across multiple areas to create a successful product. This document introduces the design and product development of Chipintelli’s ASR solutions. It is intended for new entrants to intelligent voice products, IDH solution providers, and product developers.
If you are new to ASR solutions, it is recommended to first view the Beginner’s Guide. If you are familiar with ASR solutions, you can refer to development cases for specific software development. This document mainly describes the standard development process for an actual product. For specific products, refer to the respective domain product solutions.
The development process generally follows these steps:
- Demo Testing and Requirement Analysis: Obtain a demo development kit, test the demo in the actual product environment, and analyze the application requirements based on the product scenario.
- Solution Selection: Choose an appropriate solution, including the chip and/or hardware module.
- R&D Testing: Clarify product requirements; design the solution across hardware, structure, VUI (Voice User Interface), and product logic; build a complete product prototype; test the electrical and recognition performance; and conduct hardware-related tests.
- Firmware Confirmation and Placing Production Orders: After testing, confirm the firmware documentation, conduct relevant hardware reliability tests, perform small batch trials, and place production orders. Incoming chips/modules can be tested using fixtures.
- Quality and After-Sales: If quality issues arise, communicate with our after-sales department.
We also provide corresponding hardware and software checklists for checking structure, schematics, PCB, and software during the design stage. Download the checklist from the Chipintelli AI Speech Development Platform.
We can also provide automated testing solutions for speech recognition and automated module flashing tests. For more information, refer to Product Testing.
1. Demo Testing and Application Requirement Analysis
1.1 Demo Testing
After purchasing the demo board from Sample Purchase, test and examine the demo kit with reference to the demo materials. In actual application scenarios, test the recognition rate, wake-up rate, and false wake-ups in quiet and noisy conditions to evaluate the real-world speech recognition experience. For detailed testing standards, refer to Speech Recognition Performance Test. Pay special attention to the following during testing:
- Deliberately inducing misrecognition with similar-sounding words should not be used as a basis for optimizing false recognition. The current deep learning algorithm, trained on large data sets, inherently has a degree of recognition ambiguity. This ambiguity improves its general applicability and ensures relatively good recognition for users who do not speak standard Mandarin or English.
- The test and demo boards we provide are for testing only, not for production. For mass production, please contact our sales team.
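The three figures mentioned above can be tallied during a demo session and reduced to comparable numbers. The following is a minimal sketch, not a Chipintelli tool: the struct, its field names, and the choice to express false wake-ups per 24 hours are assumptions made for illustration. The applicable pass criteria are the ones given in Speech Recognition Performance Test.

```c
#include <stdio.h>

/* Hypothetical tally of one demo test session (all names are illustrative). */
typedef struct {
    unsigned wake_attempts;      /* times the wake word was spoken */
    unsigned wake_successes;     /* times the device actually woke up */
    unsigned command_attempts;   /* command words spoken while awake */
    unsigned command_successes;  /* command words recognized correctly */
    unsigned false_wakeups;      /* wake-ups with no wake word spoken */
    double   idle_hours;         /* length of the false wake-up observation */
} asr_test_tally_t;

/* Ratio as a percentage; returns 0 when nothing was attempted. */
double percent(unsigned hits, unsigned attempts)
{
    return attempts == 0 ? 0.0 : 100.0 * (double)hits / (double)attempts;
}

double wake_rate(const asr_test_tally_t *t)
{
    return percent(t->wake_successes, t->wake_attempts);
}

double recognition_rate(const asr_test_tally_t *t)
{
    return percent(t->command_successes, t->command_attempts);
}

/* Assumption: false wake-ups are normalized to a 24-hour period. */
double false_wakeups_per_day(const asr_test_tally_t *t)
{
    if (t->idle_hours <= 0.0) {
        return 0.0;
    }
    return 24.0 * (double)t->false_wakeups / t->idle_hours;
}

/* Print one line per test condition, e.g. "quiet" or "noisy". */
void print_report(const char *label, const asr_test_tally_t *t)
{
    printf("%s: wake-up %.1f%%, recognition %.1f%%, false wake-ups %.2f per 24 h\n",
           label, wake_rate(t), recognition_rate(t), false_wakeups_per_day(t));
}
```

Keeping one tally per condition (quiet, noisy) makes it easy to compare the demo board against the product prototype later under the same conditions.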
1.2 Analyzing Requirements
After gaining insights from the demo board, evaluate whether to integrate intelligent speech into the product, considering the following factors:
- What is the product, and what is the inherent noise level when the product is in use?
- What is the distance between the user and the product in the real application scenario?
- What functionalities need to be achieved through voice commands?
- What is the application scenario of the product, and is it suitable for speech recognition technology?
- What is the accuracy requirement for speech recognition, and would false recognition pose significant risks?
2. Product Solution Selection
If the evaluation shows that the product should integrate intelligent voice, proceed to select a specific ASR solution.
Chipintelli offers various offline ASR solutions, including offline voice single microphone, offline voice dual microphone, offline voice + Bluetooth playback, offline voice + IoT, and offline + online ASR. The chip also supports custom (user-defined) firmware development, allowing connection to various modules through the UART interface. For some products, you can also refer to Product Solution Development Overview.
3. R&D Testing
3.1 Solution Selection
Select the product’s solution based on its application requirements, as exemplified below:
For rapid development, we recommend using a voice module that communicates with the MCU over UART. In this solution, only the speech part is developed on the voice module, which communicates with the MCU via the UART port, reducing development and debugging time. During development, the voice module and the MCU can each be simulated with a serial debug tool on a computer; once testing is complete, the actual modules can be connected.
The communication process of this solution:
- Sound is transmitted through the microphone to the voice module, which recognizes the voice commands.
- The recognized voice command information is communicated to the MCU via the UART port.
- The MCU executes the relevant actions.
- Based on the execution status, the MCU informs the voice module to play the corresponding voice prompt.
- The voice module plays the voice prompt.
Advantages:
- The voice module plays the corresponding voice prompt based on the MCU’s status, providing optimal user experience.
- If the MCU is controlled via control buttons or a remote control, the voice module can also play voice prompts.
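The communication process above can be sketched from the MCU's side as follows. This is only an illustration: the 4-byte frame layout, the header value, the command and prompt IDs, and the additive checksum are all hypothetical and are not the Chipintelli UART protocol. Because the frames are plain byte sequences, the same bytes could be typed into a serial debug tool to stand in for either side during development.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 4-byte frame: header, ID, argument, checksum.
 * NOT the Chipintelli UART protocol; for illustration only. */
#define FRAME_HEADER 0xA5u
#define FRAME_LEN    4u

/* Voice module -> MCU: recognized command (hypothetical IDs). */
enum { CMD_LIGHT_ON = 0x01, CMD_LIGHT_OFF = 0x02 };

/* MCU -> voice module: prompt to play (hypothetical IDs). */
enum { PROMPT_LIGHT_ON = 0x81, PROMPT_LIGHT_OFF = 0x82, PROMPT_FAILED = 0xFF };

/* Simple additive checksum over the first three bytes. */
static uint8_t checksum(const uint8_t *f)
{
    return (uint8_t)(f[0] + f[1] + f[2]);
}

void frame_build(uint8_t out[FRAME_LEN], uint8_t id, uint8_t arg)
{
    out[0] = FRAME_HEADER;
    out[1] = id;
    out[2] = arg;
    out[3] = checksum(out);
}

int frame_valid(const uint8_t f[FRAME_LEN])
{
    return f[0] == FRAME_HEADER && f[3] == checksum(f);
}

/* Simulated product state owned by the MCU. */
typedef struct {
    int light_on;
} product_state_t;

/* The MCU executes the received command, then builds the frame that
 * tells the voice module which prompt to play.
 * Returns 0 on success, -1 if the received frame is invalid. */
int mcu_handle_frame(product_state_t *st,
                     const uint8_t rx[FRAME_LEN],
                     uint8_t tx[FRAME_LEN])
{
    uint8_t prompt;

    if (!frame_valid(rx)) {
        return -1;
    }
    switch (rx[1]) {
    case CMD_LIGHT_ON:
        st->light_on = 1;
        prompt = PROMPT_LIGHT_ON;
        break;
    case CMD_LIGHT_OFF:
        st->light_on = 0;
        prompt = PROMPT_LIGHT_OFF;
        break;
    default:
        prompt = PROMPT_FAILED;  /* unknown command: report failure */
        break;
    }
    frame_build(tx, prompt, 0);
    return 0;
}
```

Letting the MCU choose the prompt, rather than having the voice module play one immediately after recognition, is what allows the prompt to reflect whether the action actually succeeded.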
If you prefer an integrated board solution or other communication methods like IIC, you can develop it yourself. Please contact our technical support if you encounter issues.
3.2 VUI Design
3.2.1 Command and Wake Words
The words users are expected to say to the device, which can be recognized as operational commands, are command words. Among command words, those used to wake up the device, akin to the device’s “name,” are defined as wake words.
For design methods of both types of words, see: Voice UI Design Reference
3.2.2 Voice Prompt Design
When users say a command word, the device needs to play the corresponding voice prompt. The following suggestions are provided for designing this voice prompt:
- The welcome sound when powered on should include the wake word to inform users how to wake the device immediately upon power-up.
- Keep response voice prompts concise so that frequent use does not disturb users.
- Add an exit voice prompt so that users know the device has returned to sleep mode and that they must say the wake word again before the next command (an indicator light, if present, can also show the wake-up status).
- Include the “command word” in the voice prompt to reinforce user memory.
- Add a “mute mode” command word, allowing “beep” sounds to replace voice prompts, minimizing disturbance to users.
- Add a “voice navigation” command word that tells users which voice commands are available, in case they forget or lose the instructions.
- Add “increase volume” and “decrease volume” command words so that users can adjust the voice prompt volume to their preference.
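The wake-up, confirmation, and exit prompts described above fit a small state machine. The sketch below is an assumption-laden illustration, not SDK code: the state, event, and prompt names are hypothetical, and the 10-second awake timeout is an arbitrary placeholder.

```c
#include <stdint.h>

typedef enum { VUI_ASLEEP, VUI_AWAKE } vui_state_t;
typedef enum { EV_WAKE_WORD, EV_COMMAND, EV_TICK_1S } vui_event_t;
typedef enum { PROMPT_NONE, PROMPT_WELCOME, PROMPT_CONFIRM, PROMPT_EXIT } vui_prompt_t;

/* Hypothetical: seconds without a command before returning to sleep. */
#define AWAKE_TIMEOUT_S 10u

typedef struct {
    vui_state_t state;
    uint32_t    idle_s;  /* seconds since the last wake word or command */
} vui_t;

void vui_init(vui_t *v)
{
    v->state = VUI_ASLEEP;
    v->idle_s = 0;
}

/* Feed one event; returns the prompt the device should play, if any. */
vui_prompt_t vui_step(vui_t *v, vui_event_t ev)
{
    switch (ev) {
    case EV_WAKE_WORD:
        v->state = VUI_AWAKE;
        v->idle_s = 0;
        return PROMPT_WELCOME;
    case EV_COMMAND:
        if (v->state != VUI_AWAKE) {
            return PROMPT_NONE;       /* commands are ignored while asleep */
        }
        v->idle_s = 0;
        return PROMPT_CONFIRM;        /* short confirmation for the command */
    case EV_TICK_1S:
        if (v->state == VUI_AWAKE && ++v->idle_s >= AWAKE_TIMEOUT_S) {
            v->state = VUI_ASLEEP;
            return PROMPT_EXIT;       /* tell the user the device is asleep */
        }
        return PROMPT_NONE;
    }
    return PROMPT_NONE;
}
```

The exit prompt is produced exactly once, at the transition back to sleep, which is the moment the user needs to learn that the wake word is required again.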
3.2.3 Product Manual Design
The product manual should include the following:
- Describe the interaction process of the voice module: the device must be woken up before other command words can be used for control.
- Highlight the wake word in the manual for easy memory, and indicate that voice navigation can be used to obtain other instructions.
- List wake words and key command words in a table, with explanations for keywords. Choose words with good recognition performance as keywords.
- Use a single-page format for the voice manual for easy viewing.
- Design the manual attractively to enhance product image.
- Remind users to speak at a normal speed when giving voice commands to the device.
3.2.4 Tips
- Attach/hang command word stickers to the device, to remind users of the command words.
- Add a QR code on the device for scanning to view the manual and command words.
3.3 Product Structure Design
Noise control in product design not only affects user comfort and health but also directly impacts market competitiveness and technological innovation.
Therefore, noise control is an important aspect of developing new products. The structural design of microphones and speakers directly affects recognition performance and requires special attention. The microphone structure should prevent noise from entering, and the design of the speaker’s sound outlet holes should avoid degrading playback. For details, refer to Product Structure Design. If you are developing an intelligent voice product for the first time, we strongly recommend consulting our technical support to get the best overall result for the product.
When designing products, use our recommended key materials as much as possible. Relevant information is as follows:
- Microphone selection: We recommend an analog microphone with a sensitivity of -32±3 dB and an SNR greater than 70 dB. For details, refer to Microphone Compatibility List.
- Speaker selection: Choose a speaker with nominal power matching the amplifier chip. If AEC is required, select a speaker with low distortion. For details, refer to Speaker Compatibility List.
3.4 Hardware Design
Key points for hardware design:
- Reserve upgrade ports: Intelligent voice products are more likely to need firmware modifications than traditional products, so we strongly recommend reserving upgrade interfaces. CI110X, CI112X, and CI13XX series chips require reserving UART0.
- GPIO usage caution: Some products are sensitive to default IO levels (for example, an IO used for motor drive), so pay attention to the default IO level values.
- GPIO input mode caution: CI110X and CI112X series chips require adding pull-up/down resistors when GPIO is used as input.
- Interface level matching: For example, many electrical controls use 5V levels. CI110X and CI112X series chips require level matching circuits, while CI13XX series chips can directly support 5V levels after software settings.
Important: Intelligent voice products are more likely to need firmware modifications than traditional products. We strongly recommend reserving upgrade interfaces.
For more information, see Hardware Design Reference.
3.5 Software Development
If you are using Chipintelli solutions for the first time, refer to Software Development for software development learning, or log in to Chipintelli AI Speech Development Platform to watch ASR development introduction videos.
Basic introductions for software development:
- SDKs are currently divided into multiple versions. You can use the standard SDK directly or generate a customized SDK using the customization feature on the Chipintelli AI Speech Development Platform. Current standard SDK versions are as follows:
| SDK Name | Version | Download Link |
|---|---|---|
| CI13XX offline voice recognition SDK | CI13XX_SDK_ASR_Offline_V2.2.0 | ☞Chipintelli AI Platform |
| CI13XX offline multi-algorithm SDK | CI13XX_SDK_ASR_ALG_V2.6.3 | ☞Chipintelli AI Platform |
| CI13XX offline-online large model SDK | CI13XX_SDK_LLM_AIoT_V1.0.10 | ☞Chipintelli AI Platform |
| CI13XX noise reduction SDK | CI13XX_SDK_NN_ENC_V2.1.8 | ☞Chipintelli AI Platform |
| CI13LC offline voice recognition SDK | CI13LC_SDK_ASR_Offline_V2.0.15 | ☞Chipintelli AI Platform |
| CI13LC offline infra-red dongle remote control SDK | CI13LC_SDK_IR_V2.0.15 | ☞Chipintelli AI Platform |
Note: The version numbers of the above SDKs may be upgraded. Please refer to the SDK version number downloaded from the platform.
- During software development, write your code in \sample\internal\sample_xxx to minimize code changes when the SDK is updated.
- Main function description, user code: sample\internal\sample_1102\src\user_msg_deal.c
// Process an ASR result based on its semantic ID
uint32_t deal_asr_msg_by_semantic_id(sys_msg_asr_data_t *asr_msg, cmd_handle_t cmd_handle, uint32_t semantic_id);

// Process an ASR result based on its command word ID
uint32_t deal_asr_msg_by_cmd_id(sys_msg_asr_data_t *asr_msg, cmd_handle_t cmd_handle, uint16_t cmd_id);

// Application message processing: route each message type to its handler
uint32_t deal_userdef_msg(sys_msg_t *msg)
{
    uint32_t ret = 1;

    switch (msg->msg_type)
    {
        /* Key message */
        case SYS_MSG_TYPE_KEY:
        {
            sys_msg_key_data_t *key_rev_data;
            key_rev_data = &msg->msg_data.key_data;
            userapp_deal_key_msg(key_rev_data);
            break;
        }
#if MSG_COM_USE_UART_EN
        /* CI UART Protocol Message */
        case SYS_MSG_TYPE_COM:
        {
            sys_msg_com_data_t *com_rev_data;
            com_rev_data = &msg->msg_data.com_data;
            userapp_deal_com_msg(com_rev_data);
            break;
        }
#endif
        /* CI IIC Protocol Message */
#if MSG_USE_I2C_EN
        case SYS_MSG_TYPE_I2C:
        {
            sys_msg_i2c_data_t *i2c_rev_data;
            i2c_rev_data = &msg->msg_data.i2c_data;
            userapp_deal_i2c_msg(i2c_rev_data);
            break;
        }
#endif
        /* Other message types are ignored */
        default:
            break;
    }
    return ret;
}
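One way to organize the body of a handler such as deal_asr_msg_by_cmd_id is a lookup table from command word ID to action. The sketch below is standalone and does not use the SDK types: the IDs, the fan actions, the table, and the return convention (1 for handled, 0 for unknown) are hypothetical. In a real project the IDs come from the command word list of your firmware.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*cmd_action_t)(void);

typedef struct {
    uint16_t     cmd_id;  /* hypothetical command word ID */
    cmd_action_t action;  /* what the product does for that command */
} cmd_entry_t;

/* Stand-ins for real product state. */
int g_fan_on;
int g_fan_speed;

static void fan_on(void)     { g_fan_on = 1; }
static void fan_off(void)    { g_fan_on = 0; }
static void fan_faster(void) { if (g_fan_speed < 3) { g_fan_speed++; } }

/* One row per supported command word. */
static const cmd_entry_t k_cmd_table[] = {
    { 2u, fan_on },
    { 3u, fan_off },
    { 4u, fan_faster },
};

/* Look up cmd_id and run its action.
 * Returns 1 if the command was handled, 0 if it is unknown
 * (this return convention is an assumption of the sketch). */
uint32_t dispatch_cmd_id(uint16_t cmd_id)
{
    size_t i;

    for (i = 0; i < sizeof k_cmd_table / sizeof k_cmd_table[0]; i++) {
        if (k_cmd_table[i].cmd_id == cmd_id) {
            k_cmd_table[i].action();
            return 1u;
        }
    }
    return 0u;
}
```

Keeping the table in your own sample directory means that adding a command word only adds a row, which limits the changes needed when the SDK is updated.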
3.6 Precautions for Product Testing
Speech recognition performance is affected by factors such as environmental noise, equipment placement, reverberation in the environment, and whether the tester’s pronunciation is clear and accurate. Identify all influencing factors when testing the complete product sample. Pay special attention to the following:
- During testing, the signal-to-noise ratio should exceed 10 dB to obtain reliable test results;
- The equipment should be placed at the same height as the sound source, with the microphone facing the sound source and nothing blocking the path between them, so that the sound source is within the microphone pickup range;
- The test environment should avoid smooth walls, such as glass walls. Smooth walls will cause reverberation, negatively impacting recognition performance;
- Ensure the test room does not have excessive reverberation; large rooms with smooth walls tend to have severe reverberation issues;
- Test personnel should articulate clearly and avoid using unsupported dialects during testing;
- If recordings are used for testing, high-fidelity equipment should be used for both recording and playback to accurately replicate human speech characteristics, minimizing any frequency alterations by the equipment that could affect recognition performance;
- During the test, place the microphone away from vibration, noise sources, and airflow.
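The 10 dB signal-to-noise requirement can be checked from two short recordings, one with speech and one with background noise only. Since SNR in dB is 10·log10(P_signal / P_noise), a 10 dB SNR corresponds to a power ratio of exactly 10, so the check needs no logarithm. The following is a minimal sketch that assumes 16-bit PCM buffers; the function names are hypothetical, and treating the speech recording as pure signal (it also contains noise) is a simplification.

```c
#include <stddef.h>
#include <stdint.h>

/* 10 dB corresponds to a power ratio of exactly 10, because
 * SNR_dB = 10 * log10(P_signal / P_noise). */
#define MIN_SNR_POWER_RATIO 10.0

/* Mean-square power of a block of 16-bit PCM samples. */
double mean_power_i16(const int16_t *samples, size_t n)
{
    double sum_sq = 0.0;
    size_t i;

    if (n == 0) {
        return 0.0;
    }
    for (i = 0; i < n; i++) {
        sum_sq += (double)samples[i] * (double)samples[i];
    }
    return sum_sq / (double)n;
}

/* Returns 1 if the SNR exceeds 10 dB, 0 otherwise.
 * Simplification: the speech recording is treated as pure signal,
 * although in practice it also contains the background noise. */
int snr_exceeds_10db(const int16_t *speech, size_t n_speech,
                     const int16_t *noise, size_t n_noise)
{
    double p_speech = mean_power_i16(speech, n_speech);
    double p_noise = mean_power_i16(noise, n_noise);

    if (p_noise <= 0.0) {
        return 0;  /* cannot judge without a usable noise measurement */
    }
    return p_speech / p_noise > MIN_SNR_POWER_RATIO;
}
```

Both recordings should be taken at the microphone position of the device under test, since that is where the recognition engine receives its input.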
Special reminder:
Please do the following before finalizing and sealing the complete product sample:
- When the device is operating under full load (including display, motor, etc.), ensure that the power supply ripple for the speech component is less than 200 mV.
- Verify that the structure of the microphone and speaker complies with our recommendations.
- Conduct speech recognition performance testing in an environment similar to the product’s real application environment (e.g., high reverberation in a bathroom).
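The ripple limit above can be checked against voltage samples captured under full load, for example exported from an oscilloscope. The sketch below assumes the 200 mV limit is meant as a peak-to-peak value, which the text does not state explicitly, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Limit from the text above; peak-to-peak interpretation is an assumption. */
#define MAX_RIPPLE_MV 200

/* Peak-to-peak ripple of supply-voltage samples given in millivolts. */
int32_t ripple_pp_mv(const int32_t *mv, size_t n)
{
    int32_t lo;
    int32_t hi;
    size_t i;

    if (n == 0) {
        return 0;
    }
    lo = hi = mv[0];
    for (i = 1; i < n; i++) {
        if (mv[i] < lo) { lo = mv[i]; }
        if (mv[i] > hi) { hi = mv[i]; }
    }
    return hi - lo;
}

/* Returns 1 if samples exist and the ripple is below the limit. */
int ripple_ok(const int32_t *mv, size_t n)
{
    return n > 0 && ripple_pp_mv(mv, n) < MAX_RIPPLE_MV;
}
```

The capture should cover the moments when the display, motor, and other loads switch, because those transitions are where the largest excursions tend to occur.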
If the test results indicate poor recognition performance, please refer to ☞Chipintelli AI Speech Development Platform Online Support -> Problem Location to solve the problem. If the issue persists, collect the board’s baseline noise information and send it to us for analysis and optimization. A recording board is needed to collect the baseline noise. For detailed instructions, please refer to ☞Instructions for Recording Boards.
4. Firmware Confirmation
After the product has reached mass production status through prior development and optimization, if the firmware was developed by us or our solution provider, we recommend following the firmware confirmation process outlined below. This ensures the production firmware’s accuracy and prevents rework due to issues found during or after production.
- Test and verify the firmware provided by us or our solution provider. If the firmware is satisfactory, inform us or our solution provider to finalize the firmware.
- We or our solution provider conduct recognition testing on the firmware to ensure its effectiveness.
- We or our solution provider send a firmware confirmation letter to the client for confirmation and signature, indicating that the client confirms that the current firmware meets the requirements.
- Upon receiving the signed confirmation from the client, we or our solution provider archive the firmware internally and make it available to the production department.
In summary, plan the firmware confirmation process in advance when you have an order. If there are custom modifications to the microphone and speaker, communicate them early to guarantee timely delivery of the order.
5. Production Test
Our company provides a complete product production automation test solution. For details, please refer to ☞Automated Speech Recognition Performance Test.
For customers using our standard modules:
- If only incoming (warehousing) sampling inspection is performed, only one test fixture needs to be purchased.
- If firmware flashing is required, test tooling must be purchased.
For customers using our chips for module research and development:
- You can obtain the materials for the automated speech recognition testing tool from our company.
- You can purchase the test panel part of the automated test tooling from our company, and customize the test fixture according to your module.
Our self-developed automated testing equipment, when configured with a 1-to-24 panel setup, can produce and test approximately 10,000 units per day. To purchase it, please contact us by email at globalsupport@Chipintelli.com.
6. Contact Information
We recommend that you follow our latest news and stay in touch with us to ensure a smooth development process and above-standard recognition performance for your products. For the latest company information, visit the ☞Chipintelli Official website or follow the WeChat official account “Chipintelli” (WeChat: Chipintelli).
For more development materials and information, please log in to ☞Chipintelli AI Speech Development Platform.
For any further assistance, please contact us through the following way.
