Product Scheme Development Process

Overview

As the most natural way of interaction, speech recognition is being accepted by more and more users. However, developing a speech scheme differs from traditional logic development: it takes cooperation among several parties to create a good product. This document introduces the scheme design and project development process for Chipintelli’s speech recognition schemes. It is intended for finished-product customers, IDH scheme providers, and scheme developers who are new to speech.

If you are new to speech schemes, we recommend reading the ☞novice guide first. If you already have a preliminary understanding, refer to the development case for specific software development. This document describes the standard development process of an actual product; for a specific product, refer to the product scheme for that domain.

During scheme development, the following process can be used:

Demo testing and demand analysis: obtain the Demo development version, test the Demo in the actual product running environment, and conduct demand analysis according to the actual application scenario;
Scheme selection: select appropriate schemes, chips and hardware modules;
R&D test: define the product requirements, design the scheme, carry out research and development covering hardware, structure, VUI (speech dialogue) and product logic, build the finished machine, test its electrical performance and recognition performance, and conduct hardware-related tests;
Firmware confirmation and production order placement: after the test, confirm the firmware, conduct the relevant hardware reliability tests as needed, then carry out small-batch trial production and place the production order. After the order is placed, incoming materials can be spot-checked with the test fixture;
After-sales quality: if quality problems arise, contact our after-sales service.

Our company also provides software and hardware checklists that you can use to check the structure, schematic diagram, PCB and software at the design stage. For the checklists, please go to ☞Chipintelli Speech AI Development Platform.

Our company can also provide an automatic speech recognition test scheme and an automatic burning and test scheme for modules. For details, please refer to ☞Product Test.

Each step of the process is described below.

1 Demo test and demand analysis

1.1 Demo test

After purchasing the Demo, you can refer to the Demo materials for a preliminary experience test. In the actual application scenario, test the recognition rate, wake-up rate and false wake-up rate under quiet and noisy conditions to understand the actual experience of speech recognition. For detailed test standards, refer to ☞recognition effect test. Please pay special attention to:

Tests that deliberately induce misrecognition with similar-sounding words are not used as a basis for optimizing false recognition. The current deep learning algorithm, trained on large amounts of data, is intentionally somewhat tolerant in recognition; this makes it more universal and practical, and places less strict requirements on the end user’s Mandarin.
The test boards and Demo boards provided by our company are only for testing, not for production. Please contact our company to confirm the model before mass production.
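
The three rates mentioned above can be tallied from a simple test log. The following is a minimal sketch only: the type `asr_test_tally_t`, its fields and the function names are hypothetical and are not part of the Chipintelli SDK, and expressing false wake-ups per hour of noise-only observation is an assumption; use the units required by ☞recognition effect test.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tally of one Demo test session (not an SDK type). */
typedef struct {
    uint32_t wake_attempts;   /* times the wake-up word was spoken */
    uint32_t wake_successes;  /* times the device actually woke up */
    uint32_t cmd_attempts;    /* command words spoken after wake-up */
    uint32_t cmd_successes;   /* command words recognized correctly */
    uint32_t false_wakeups;   /* wake-ups when no wake-up word was spoken */
    double   idle_hours;      /* hours of noise-only observation (assumed unit) */
} asr_test_tally_t;

/* Percentage of successes over attempts; 0 when nothing was attempted. */
static double percent(uint32_t successes, uint32_t attempts)
{
    assert(successes <= attempts);
    if (attempts == 0u) {
        return 0.0;
    }
    return 100.0 * (double)successes / (double)attempts;
}

/* Wake-up rate in percent. */
double wake_up_rate(const asr_test_tally_t *t)
{
    return percent(t->wake_successes, t->wake_attempts);
}

/* Command word recognition rate in percent. */
double recognition_rate(const asr_test_tally_t *t)
{
    return percent(t->cmd_successes, t->cmd_attempts);
}

/* False wake-ups per hour of noise-only observation. */
double false_wakeups_per_hour(const asr_test_tally_t *t)
{
    assert(t->idle_hours > 0.0);
    return (double)t->false_wakeups / t->idle_hours;
}
```

Keeping one tally for the quiet condition and one for the noisy condition makes the two results directly comparable.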

1.2 Analyze speech requirements

After testing and experiencing the Demo, you can consider whether to add speech based on the type of product, mainly considering the following:

What is the product, and how noisy is it?
From what distance must the product be usable?
Which functions need to be controlled by speech?
What is the application scenario of the product, and is it suitable for speech recognition technology?
What accuracy does the product require of speech recognition? Would false recognition bring serious risks?

2 Scheme selection

After the preliminary Demo evaluation and experience test, if you decide to add speech to the product, the next step is to select a specific speech scheme.

Chipintelli provides a variety of recognition schemes, such as offline single microphone, offline dual microphone, offline speech + Bluetooth playback, offline speech + IoT, and offline + online recognition. The chip itself also supports secondary development and can connect to various modules through serial ports. For the development of some products, please refer to ☞Overview of Product Scheme Development.

The standard offline single-microphone product scheme is described below as an example:

The offline single-microphone scheme has low structural requirements, is mature and stable, is simple to use, recognizes well, and is widely applicable. It is currently used in household appliances, lighting, infrared sockets, central control, fans, intelligent toilets, range hoods, bathroom heaters (Yuba), heating tables and other fields, and has been mass produced and sold in the market.

For specific scheme selection, please refer to the description of chip selection and module selection in ☞Hardware Selection Guide.

3 R&D test

3.1 Determine the scheme

Based on the product requirements, determine which solution the product will use. For example:

If you want to achieve rapid development, we recommend the serial communication scheme between the speech module and the electronic control. With this scheme, only the speech part needs to be developed on the speech module, and the existing, mature electronic control is reused over serial communication, which reduces development and debugging time. During development, a serial port tool on a computer can stand in for the other side, so that the speech module and the electronic control module can each be developed and tested separately. After development and testing are completed, the actual modules can be connected.

Communication process of the scheme:

The speech is transmitted to the speech module through the microphone, and the speech module recognizes the words;
The speech module sends the recognized word information to the electronic control over the serial port;
The electronic control executes the relevant action;
The electronic control informs the speech module of the content to be played according to the execution of the action;
The speech module plays according to the electronic control feedback.
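
The communication process above can be prototyped with a small fixed-length frame before the real modules are connected. The sketch below is illustrative only: the frame layout (header byte, type, 16-bit ID in little-endian order, additive checksum), the constants and the function names are all assumptions made for this example and are not the Chipintelli serial protocol. Refer to the SDK documentation for the actual protocol.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 5-byte frame: head | type | id low | id high | checksum. */
#define FRAME_HEAD 0xA5u
#define FRAME_LEN  5u

enum {
    FRAME_TYPE_RECOGNIZED = 0x01, /* speech module -> electronic control */
    FRAME_TYPE_PLAY       = 0x02  /* electronic control -> speech module */
};

/* Sum of the given bytes, truncated to 8 bits. */
static uint8_t frame_checksum(const uint8_t *buf, size_t len)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum = (uint8_t)(sum + buf[i]);
    }
    return sum;
}

/* Build a frame carrying a recognized-word ID or a playback ID. */
void frame_encode(uint8_t type, uint16_t id, uint8_t out[FRAME_LEN])
{
    assert(out != NULL);
    out[0] = FRAME_HEAD;
    out[1] = type;
    out[2] = (uint8_t)(id & 0xFFu);
    out[3] = (uint8_t)(id >> 8);
    out[4] = frame_checksum(out, FRAME_LEN - 1u);
}

/* Validate a received frame; on success, return its type and ID. */
bool frame_decode(const uint8_t in[FRAME_LEN], uint8_t *type, uint16_t *id)
{
    assert(in != NULL && type != NULL && id != NULL);
    if (in[0] != FRAME_HEAD) {
        return false;
    }
    if (in[4] != frame_checksum(in, FRAME_LEN - 1u)) {
        return false;
    }
    *type = in[1];
    *id = (uint16_t)(in[2] | ((uint16_t)in[3] << 8));
    return true;
}
```

With a frame like this, a serial port tool on a computer can send `FRAME_TYPE_PLAY` frames to the speech module, or `FRAME_TYPE_RECOGNIZED` frames to the electronic control, so each side can be tested without the other.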

Advantages:

The speech module plays content that matches the actual state of the electronic control, so the feedback is the most appropriate and the experience is good;
If the electronic control is operated by keys or remote control, the speech module can also announce the status.

If you want to use the all-in-one board solution or other communication methods such as IIC, you can also develop it yourself. If you have any questions, please contact our technical support.

3.2 VUI design

3.2.1 Command words and wake-up words

Command words are the operation instructions that the device can recognize, that is, the words the user is expected to say to the device. Among them, the word used to wake up the device, which serves as the device’s “name”, is called the wake-up word.

For how to design both kinds of words, please refer to ☞Reference for speech UI Design.

3.2.2 Play feedback tone

After the user says a command word, the device needs to play the feedback tone for that entry. The following suggestions apply to the design of the feedback tone:

The power-on welcome sound should include the wake-up word where possible, so that the user learns how to wake up the device the first time the product is powered on.
Keep the reply tone as short as possible, so that it does not disturb users in frequent use.
Add an announcement when the device exits the wake-up state, so that the user knows when to say the wake-up word again (if there is an indicator light, it can also be used to show the wake-up status).
The spoken feedback for a command word should include the command word itself where possible, so that users remember the command words better.
Add a “silent mode” command word. In silent mode, a short “Didi” beep replaces the spoken feedback, so that users are less disturbed.
Add a “speech navigation” command word. When it is spoken, the product reads out its available command words, so that users who have lost the manual can still find out what they can say.
Add “volume increase” and “volume decrease” command words, so that users can adjust the feedback volume to their habits.
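
The silent mode and volume suggestions above amount to a small piece of state kept by the application. The following sketch shows one possible way to model it; the type names, the command enumeration and the volume range of 1 to 7 are assumptions made for illustration and do not come from the SDK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define VOLUME_MIN 1
#define VOLUME_MAX 7 /* assumed number of volume steps */

/* What the device plays after a command word. */
typedef enum { FEEDBACK_VOICE, FEEDBACK_BEEP } feedback_kind_t;

/* Hypothetical VUI commands that change the feedback behavior. */
typedef enum {
    CMD_SILENT_ON,
    CMD_SILENT_OFF,
    CMD_VOLUME_UP,
    CMD_VOLUME_DOWN,
    CMD_OTHER
} vui_cmd_t;

typedef struct {
    bool silent_mode; /* true: reply with a "Didi" beep only */
    int  volume;      /* current feedback volume step */
} feedback_state_t;

/* Update the state for one command and report which feedback to play. */
feedback_kind_t feedback_apply(feedback_state_t *s, vui_cmd_t cmd)
{
    assert(s != NULL);
    switch (cmd) {
    case CMD_SILENT_ON:
        s->silent_mode = true;
        break;
    case CMD_SILENT_OFF:
        s->silent_mode = false;
        break;
    case CMD_VOLUME_UP:
        if (s->volume < VOLUME_MAX) {
            s->volume++;
        }
        break;
    case CMD_VOLUME_DOWN:
        if (s->volume > VOLUME_MIN) {
            s->volume--;
        }
        break;
    default:
        break;
    }
    return s->silent_mode ? FEEDBACK_BEEP : FEEDBACK_VOICE;
}
```

Clamping the volume at both ends avoids wrap-around when a user repeats “volume increase” or “volume decrease” many times.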

3.2.3 Product manual

The product manual is recommended to include the following contents:

The interaction process: the speech module must be woken up first and can then be controlled with other command words.
The wake-up word must be clearly marked in the manual so that users remember it easily, and the manual should also state that the other command words can be obtained through speech navigation.
List the wake-up word and the key command words in a table, with a description of each key command word. Try to choose words with good recognition performance as the key command words.
The speech manual should fit on a single page where possible, so that it is easy for users to read.
To improve the product image, the manual should be designed as attractively as possible.
Remind users that the product supports Putonghua (Mandarin), including lightly accented Putonghua, and that they should not speak too fast.

3.2.4 Other suggestions

A command word sticker can be made and pasted on or near the device, so that users do not forget the relevant words.
A QR code can be added to the device, which users scan to view the instructions and command entries.

3.3 Product structure design and key material selection

The structural design of the microphone and loudspeaker directly affects the recognition performance of the product and requires special attention. The microphone structure should keep noise out, and the design of the speaker’s sound outlet needs care to avoid poor sound quality. For details, please refer to ☞Product Structure Design. If you are making speech products for the first time, it is strongly recommended that you communicate with our technical support personnel in detail.

Please use the key materials recommended by our company wherever possible when designing products. The relevant information is as follows:

Microphone selection: select an analog microphone with a sensitivity of -32 ± 3 dB. For microphones with a signal-to-noise ratio above 70 dB, we likewise recommend a sensitivity of -32 ± 3 dB. For details, please refer to ☞Microphone Compatibility List.
Speaker selection: select a speaker whose nominal power matches that of the power amplifier chip. If the product has AEC requirements, the speaker distortion should be as small as possible. For details, please refer to ☞Speaker Compatibility List.
Selection of NorFlash: if the CI110X chip is used, select 8 MB or 4 MB Flash. For details, please refer to ☞NorFlash Compatibility List. CI1122 and CI130X have built-in Flash; please select according to the chip model in ☞Hardware Selection Guide.

3.4 Hardware design

The hardware design has the following key points:

Reserved upgrade interface: speech recognition products are more likely to modify firmware than traditional products. It is strongly recommended to reserve upgrade interfaces. UART 0 should be reserved for CI110X, CI1122, and CI130X.
Note for use of GPIO: some products are sensitive to the default level of an IO, for example an IO that drives a motor. In this case, pay attention to the default level of the IO.
Note for GPIO input mode: when a GPIO of CI110X or CI1122 is used as an input, pull-up or pull-down resistors must be added.
Interface level matching: many electronic controls use 5 V levels, for example. In that case, CI110X and CI1122 need a level matching circuit, while CI130X can support 5 V levels directly once configured in software.

We stress again that speech recognition products are more likely to need firmware changes than traditional products, and it is strongly recommended to reserve upgrade interfaces.

For more information, please also refer to ☞Hardware Design Reference.

3.5 Software development

If you are using the Chipintelli scheme for the first time, you can learn about software development by referring to ☞Software Development, or you can log in to ☞Chipintelli Speech AI Development Platform and watch the speech development introduction video.

The basic introduction of some software development is as follows:

Currently, there are multiple SDKs. You can directly use the standard SDKs, or you can use ☞Chipintelli Speech AI Development Platform . The current standard SDK has the following versions:

SDK package	Version name	Download address
CI230X_AIOT_SDK	CI230X_audio_aiot_sdk_release_v1.1.1	☞Chipintelli Speech AI Development Platform
CI230X_IOT_SDK	CI230X_audio_iot_sdk_release_v1.1.1	☞Chipintelli Speech AI Development Platform
CI130X offline SDK	CI130X_SDK_V1.5.9	☞Chipintelli Speech AI Development Platform
CI130X infrared socket SDK	CI130X_SDK_baseV1.5.9_IR_V1.1	☞Chipintelli Speech AI Development Platform

Note: the SDK version numbers listed above may be upgraded. The version number of the SDK downloaded from the platform prevails.

When developing software, please try to write your code under sample\internal\sample_xxx, so that you do not need to change your code when the SDK is updated.
The main functions are described below. User code: sample\internal\sample_1102\src\user_msg_deal.c

  // Process a recognition result according to its semantic ID (body omitted)
  uint32_t deal_asr_msg_by_semantic_id(sys_msg_asr_data_t *asr_msg, cmd_handle_t cmd_handle, uint32_t semantic_id);
  // Process a recognition result according to its command word ID (body omitted)
  uint32_t deal_asr_msg_by_cmd_id(sys_msg_asr_data_t *asr_msg, cmd_handle_t cmd_handle, uint16_t cmd_id);
  // Application message processing: dispatch each message to its handler by type
  uint32_t deal_userdef_msg(sys_msg_t *msg)
  {
      uint32_t ret = 1;
      switch(msg->msg_type)
      {
      /* Key Message */
      case SYS_MSG_TYPE_KEY:
      {
          sys_msg_key_data_t *key_rev_data;
          key_rev_data = &msg->msg_data.key_data;
          userapp_deal_key_msg(key_rev_data);
          break;
      }
      #if MSG_COM_USE_UART_EN
      /* CI Serial Port Protocol Message */
      case SYS_MSG_TYPE_COM:
      {
          sys_msg_com_data_t *com_rev_data;
          com_rev_data = &msg->msg_data.com_data;
          userapp_deal_com_msg(com_rev_data);
          break;
      }
      #endif
      #if MSG_USE_I2C_EN
      /* CI IIC Protocol Message */
      case SYS_MSG_TYPE_I2C:
      {
          sys_msg_i2c_data_t *i2c_rev_data;
          i2c_rev_data = &msg->msg_data.i2c_data;
          userapp_deal_i2c_msg(i2c_rev_data);
          break;
      }
      #endif
      default:
          /* Other message types are ignored here */
          break;
      }
      return ret;
  }
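
Product logic for individual command words is typically added inside the command word ID handler. The sketch below shows one table-driven way to organize it so that new command words only need a new table row. It is self-contained and does not use SDK types: the command IDs 2 and 3, the `light_on`/`light_off` actions, the function `user_dispatch_cmd_id` and its return convention are all hypothetical. In a real project the IDs come from the command word list of your firmware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Action executed when a command word is recognized. */
typedef void (*cmd_action_t)(void);

/* One row of the dispatch table: command word ID -> action. */
typedef struct {
    uint16_t     cmd_id;
    cmd_action_t action;
} cmd_entry_t;

static int g_light_on; /* stand-in for real product state */

static void light_on(void)  { g_light_on = 1; }
static void light_off(void) { g_light_on = 0; }

/* Hypothetical IDs; real ones come from the firmware's command word list. */
static const cmd_entry_t k_cmd_table[] = {
    { 2u, light_on  },
    { 3u, light_off },
};

/* Run the action for cmd_id. Returns 1 if handled, 0 if the ID is unknown. */
uint32_t user_dispatch_cmd_id(uint16_t cmd_id)
{
    const size_t count = sizeof k_cmd_table / sizeof k_cmd_table[0];
    for (size_t i = 0; i < count; i++) {
        if (k_cmd_table[i].cmd_id == cmd_id) {
            assert(k_cmd_table[i].action != NULL);
            k_cmd_table[i].action();
            return 1u;
        }
    }
    return 0u;
}
```

A function like this could be called from `deal_asr_msg_by_cmd_id` with the `cmd_id` it receives, which keeps the product-specific logic in your own sample directory.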

3.6 Precautions for complete machine sample test

Speech recognition performance is affected by factors such as environmental noise, equipment placement, reverberation in the environment, and whether the tester’s pronunciation is clear and accurate. All of these factors need to be identified when testing the whole machine sample. Pay special attention to the following:

Keep the signal-to-noise ratio above 10 dB during testing where possible; recognition results are good under this condition;
Place the equipment at the same height as the sound source where possible, with the microphone facing the sound source and nothing blocking the path, so that the sound source is within the microphone pickup range;
The test environment should avoid smooth walls, such as glass walls. Smooth walls cause serious reverberation, which has a great impact on recognition;
The reverberation in the test room should not be too strong; larger rooms with smooth walls have relatively serious reverberation;
The tester should pronounce as clearly and accurately as possible, and avoid using unsupported dialects for the test;
If recordings are used for testing, use high-fidelity equipment for both recording and playback, so that the characteristics of human speech are preserved as far as possible and the frequency response of the equipment does not affect the recognition result;
During the test, place the microphone away from vibration, noise sources and wind.
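
The 10 dB signal-to-noise guideline above can be checked from two short recordings taken at the microphone position, one with speech and one with background noise only. Since 10 dB corresponds to a power ratio of 10^(10/10) = 10, the check reduces to comparing mean powers. This is a minimal sketch: the function names are hypothetical, the samples are assumed to be 16-bit PCM, and the speech recording is treated as the signal even though it also contains some noise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mean power (average of squared samples) of a 16-bit PCM buffer. */
static double mean_power(const int16_t *pcm, size_t n)
{
    assert(pcm != NULL && n > 0u);
    double acc = 0.0;
    for (size_t i = 0; i < n; i++) {
        acc += (double)pcm[i] * (double)pcm[i];
    }
    return acc / (double)n;
}

/* True when the speech power is more than 10 times the noise power,
 * which is equivalent to a signal-to-noise ratio above 10 dB. */
bool snr_exceeds_10db(const int16_t *speech, size_t n_speech,
                      const int16_t *noise, size_t n_noise)
{
    return mean_power(speech, n_speech) > 10.0 * mean_power(noise, n_noise);
}
```

If the check fails, lowering the noise source or moving the speaker closer to the microphone before testing recognition gives more meaningful results.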

Special reminder:

The speech recognition function is not a purely logical function. Please complete the following checks before the final sample of the whole machine is sealed:

Confirm that, with the whole machine operating under full load (display, motor, etc.), the power supply ripple of the speech circuit is below 200 mV.
Check whether the structure of the microphone and loudspeaker meets our suggestions.
Test the product in its actual application scenarios (such as a bathroom with high reverberation) and confirm the recognition performance of the whole machine.
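
The ripple check above can be automated if the supply voltage is sampled while the machine runs under full load, for example from an oscilloscope export. The sketch below assumes the samples are in millivolts and that the 200 mV limit is a peak-to-peak value; both are assumptions, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RIPPLE_LIMIT_MV 200 /* limit from the checklist, assumed peak-to-peak */

/* Peak-to-peak ripple: highest sample minus lowest sample, in mV. */
int ripple_peak_to_peak_mv(const int *samples_mv, size_t n)
{
    assert(samples_mv != NULL && n > 0u);
    int lo = samples_mv[0];
    int hi = samples_mv[0];
    for (size_t i = 1; i < n; i++) {
        if (samples_mv[i] < lo) {
            lo = samples_mv[i];
        }
        if (samples_mv[i] > hi) {
            hi = samples_mv[i];
        }
    }
    return hi - lo;
}

/* True when the measured ripple is below the 200 mV limit. */
bool ripple_within_limit(const int *samples_mv, size_t n)
{
    return ripple_peak_to_peak_mv(samples_mv, n) < RIPPLE_LIMIT_MV;
}
```

The capture should cover the moments when the display and motor switch on and off, since that is when ripple tends to be largest.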

If the test shows that the recognition performance is not good, please refer to ☞Chipintelli Speech AI Development Platform Online Support -> Problem Location to solve the problem. If the problem still cannot be solved, please collect the noise floor (background noise) data of the board and send it to our company for analysis and optimization. A recording board is used to collect the noise floor. For specific instructions, please refer to ☞Instructions for Recording Boards.

4 Firmware confirmation

After the preceding development and optimization, the product reaches mass production status. If your firmware is developed by our company or our solution provider, we suggest that you follow the firmware confirmation process below. It ensures that the production firmware is correct and avoids rework caused by firmware problems found during or after production.

The user tests and confirms the firmware provided by our company or our solution provider. If the firmware is OK, the user notifies our company or our solution provider to freeze the firmware.
Our company or our solution provider conducts a recognition test on the firmware to confirm its recognition performance.
Our company or our solution provider sends a firmware confirmation letter to the user for confirmation and signature, indicating that the user confirms that the current firmware meets the requirements.
After receiving the firmware confirmation letter signed by the user, our company or our solution provider archives the firmware internally and provides it to the production department.

In summary, when you have an order, please plan the time for firmware confirmation in advance. If the microphone and speaker are customized, please arrange them in advance as well to ensure timely delivery of the order.

5 Production test

Our company provides a complete automated production test scheme. For details, please refer to ☞Automated Identification Test.

For customers using our standard modules:

If only warehousing sampling inspection is performed, only one test fixture needs to be purchased.
If firmware burning is required, test tooling shall be purchased.

For customers who use our chips for module research and development:

You can obtain the documentation for the automated test tooling from our company.
You can purchase the test panel part of the automated test tooling from our company, and customize the test fixture according to your module.

When the automated test tooling developed by our company is assembled in a one-to-24 configuration (one host driving 24 stations), about 10000 sets can be produced and tested every day. If you need to purchase it, please contact the relevant person in charge at our company by phone at +86 15107119906 or by email at support@chipintelli.com.

6 Contact information

We recommend that you follow our company’s latest information and communicate with us often, to keep your product development on track and achieve good results. For the latest company information, visit the ☞Chipintelli Official website or follow the WeChat official account “Chipintelli (WeChat: Chipintelli)”.

For more development materials and information, please log in to ☞Chipintelli Speech AI Development Platform to obtain them.

If you have any questions during use, please contact us in the following ways.

Tel.: +86-028-61375925

Email: support@chipintelli.com