
Learn About Intelligent Speech

Basic knowledge

Intelligent speech is an important branch of artificial intelligence. It covers speech recognition, semantic understanding, natural language processing, speech interaction, and related technologies. Its goal is to let devices perceive their surroundings through hearing and interact with people in the most natural way, making device control and everyday life more convenient.

Intelligent speech is built on neural network technology, which raises the accuracy of speech recognition. Semantic understanding then analyzes the user's intent so that the corresponding operation can be carried out. For feedback, the device either plays a preset sound or uses speech synthesis to speak the result.

Intelligent speech can currently be processed in two main ways: online and offline. Because speech processing requires considerable computing power, cloud servers were used at first to provide the computing power for recognition and semantic processing. A common processing flow is shown in the following figure.

[Figure: Speech processing flow]

Common intelligent speech processing flow

As the technology has developed, dedicated on-device intelligent speech chips have emerged. Using their on-chip computing power, they run speech recognition, semantic understanding, and other functions directly on the terminal device, and offline speech has grown as a result. Because offline speech protects user privacy, responds quickly, and works without a network, it has become the standard speech control mode for many control devices. In the future, more speech processing will move to the edge, reducing server load and network bandwidth and saving resources overall. The cloud, as a provider of services and content, will work together with on-device speech to serve people's lives.


Offline speech introduction

The offline speech scheme performs speech recognition and related functions locally. It needs no network connection and offers faster response and better privacy than the online scheme. It requires an intelligent speech chip to run these functions and is best suited to control devices such as household appliances (air conditioners, sockets, and so on).

A comparison of offline speech and online speech functions is shown in the following table.

| Item | Offline speech | Online speech |
| --- | --- | --- |
| Network connection | Not required | Required |
| Response speed | Very fast (usually about 0.2 s) | Fast (affected by network quality) |
| Number of instructions (speech library) | 1 to 1000 (local speech library) | Unlimited (cloud speech library) |
| Fuzzy recognition | Not supported; fixed terms must be used | Supported |
| Speech analysis | Single-chip microcontroller queries a local speech database | Cloud computing queries a database |
| Extended functions | None | Entertainment, life services, and other functions |
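
The "fixed term must be used" row can be illustrated with a minimal sketch. It assumes the recognizer has already produced text and that commands live in a small in-memory table; the phrases, device names, and function names are hypothetical and do not describe any particular chip's firmware. The 1000-entry limit is the upper bound given in the table.

```python
# Hypothetical local speech library: fixed phrase -> (device, action).
# All entries are illustrative, not taken from a real product.
LOCAL_COMMANDS = {
    "turn on the air conditioner": ("air_conditioner", "power_on"),
    "turn off the air conditioner": ("air_conditioner", "power_off"),
    "turn on the socket": ("socket", "power_on"),
}

MAX_COMMANDS = 1000  # upper bound quoted in the comparison table


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variations still match."""
    return " ".join(text.lower().split())


def match_offline(text: str):
    """Return (device, action) for an exact fixed phrase, or None.

    No fuzzy matching is attempted: a phrase that is not in the table
    is simply not recognized, as in the offline column above.
    """
    return LOCAL_COMMANDS.get(normalize(text))
```

A phrase such as "please switch on the socket" returns `None` here, because only the exact stored wording is accepted.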

Our company currently offers several offline speech solutions; an application block diagram is shown below.

[Figure: Offline speech solution application block diagram]


Combined offline and online speech introduction

Offline speech needs no network and responds quickly, while online speech gives access to rich cloud content and services. A practical scheme can combine the two: control functions are handled by offline speech, and content and services are handled by online speech. Basic functions then work independently of the network and protect user privacy, while content and services are fetched over the network only under the user's control and with the user's permission. This approach is already used in household devices such as smart home appliances.
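
The split between offline control and online content can be sketched as a simple router. This is an illustration under stated assumptions, not the actual solution: the phrase table, the `route` function, and the `cloud_handler` callback are all hypothetical, and recognized text is assumed to be available already.

```python
from typing import Callable

# Hypothetical fixed control phrases that are handled locally.
CONTROL_PHRASES = {
    "turn on the light": "light:on",
    "turn off the light": "light:off",
}


def route(text: str, network_up: bool, cloud_allowed: bool,
          cloud_handler: Callable[[str], str]) -> str:
    """Try the local command table first, then fall back to the cloud.

    The cloud is contacted only when the network is available and the
    user has granted permission (cloud_allowed).
    """
    key = " ".join(text.lower().split())
    local = CONTROL_PHRASES.get(key)
    if local is not None:
        return f"offline:{local}"  # works with no network at all
    if network_up and cloud_allowed:
        return f"online:{cloud_handler(key)}"
    return "unhandled"
```

Checking the local table first means control commands never leave the device, and a content request is dropped rather than sent when permission is missing.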

Our company currently offers a combined offline and online speech solution. It provides hundreds of service skills covering high-frequency everyday scenarios such as online music, video, social networking, news, encyclopedia, stocks, recipes, and children's education, and it can meet the needs of most products. An application block diagram is shown below.

[Figure: Offline speech AIoT solution application block diagram]


AIoT speech introduction

The Internet of Things is now very mature. Devices of all kinds can be connected over Ethernet, Wi-Fi, Bluetooth, and other links for interconnected control. However, IoT control, especially of household equipment, still depends on a mobile phone or similar device as the central controller. In practice, and especially when the device is right in front of you, reaching for a phone is not the most convenient way to control it. In addition, when the phone or other central device fails, the devices have no way to control one another and cannot be used, which is a real limitation.

Speech, as the most natural interaction mode, can be combined with IoT to ease network provisioning and to reduce the dependence on a central device. It also lets interconnected devices serve the user together, so that a single speech entry device can conveniently control all IoT devices. With the emergence of dedicated intelligent speech chips in particular, the cost of such schemes has dropped sharply, and they are now widely used in IoT equipment such as central control screens, panels, sockets, and large and small household appliances.
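
The idea of one speech entry device controlling several IoT devices without a phone can be sketched as a small dispatcher. The `Device` and `SpeechEntry` classes, the "turn on/off the <device>" phrase pattern, and the device names are assumptions made for illustration; real devices would be reached over Wi-Fi, Bluetooth, or another link rather than through in-process objects.

```python
class Device:
    """Hypothetical IoT device with a simple on/off state."""

    def __init__(self, name: str):
        self.name = name
        self.is_on = False

    def apply(self, action: str) -> None:
        if action not in ("on", "off"):
            raise ValueError(f"unsupported action: {action}")
        self.is_on = (action == "on")


class SpeechEntry:
    """Hypothetical speech entry point that dispatches to registered devices."""

    def __init__(self):
        self._devices = {}

    def register(self, device: Device) -> None:
        self._devices[device.name] = device

    def handle(self, phrase: str) -> bool:
        """Parse 'turn on|off the <device>' and dispatch it.

        Returns True if a registered device handled the phrase.
        """
        words = phrase.lower().split()
        if (len(words) < 4 or words[0] != "turn"
                or words[1] not in ("on", "off") or words[2] != "the"):
            return False
        device = self._devices.get(" ".join(words[3:]))
        if device is None:
            return False
        device.apply(words[1])
        return True
```

Because the dispatcher holds the device registry itself, control keeps working when no phone or other central device is present.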

Our company currently offers a speech AIoT solution; an application block diagram is shown below.

[Figure: Offline speech AIoT solution application block diagram]