Learn About Intelligent Speech

Fundamentals

Intelligent speech is an important branch of artificial intelligence that covers speech recognition, semantic understanding, natural language processing, and speech interaction. Its core purpose is to let devices perceive the world through sound and interact with people in the most natural way, making device control and daily life more convenient.

Intelligent speech technology relies on neural networks to improve speech recognition accuracy, and on semantic understanding to interpret the user's intent and carry out the corresponding action. For feedback, the system either plays preset audio or uses speech synthesis (text-to-speech, TTS) to generate a spoken response.
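The recognize, understand, act, and respond flow can be sketched in Python. This is a minimal illustration under stated assumptions, not a real implementation. Speech recognition is replaced by a function that receives an already-transcribed string, and semantic understanding is a simple keyword match. Speech synthesis is represented by returning the text a TTS engine would speak. All names (`recognize`, `understand`, `act`, `feedback`, `PRESET_CLIPS`, and the clip file names) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical preset clips for fixed confirmations (file names are placeholders).
PRESET_CLIPS: Dict[str, str] = {
    "light_on": "light_on.wav",
    "light_off": "light_off.wav",
}


@dataclass
class Response:
    intent: Optional[str]
    content: str       # clip file name, or text to hand to a TTS engine
    synthesized: bool  # True if the content should be spoken via TTS


def recognize(transcript: str) -> str:
    """Stand-in for neural speech recognition: input is already transcribed."""
    return " ".join(transcript.lower().split())


def understand(text: str) -> Optional[str]:
    """Stand-in for semantic understanding: map text to an intent by keywords."""
    if "light" in text and "turn on" in text:
        return "light_on"
    if "light" in text and "turn off" in text:
        return "light_off"
    if "temperature" in text:
        return "report_temperature"
    return None


def act(intent: Optional[str], state: Dict[str, object]) -> None:
    """Carry out the action associated with the intent by updating device state."""
    if intent == "light_on":
        state["light"] = True
    elif intent == "light_off":
        state["light"] = False


def feedback(intent: Optional[str], state: Dict[str, object]) -> Response:
    """Choose between a preset clip and a synthesized reply."""
    if intent in PRESET_CLIPS:
        return Response(intent, PRESET_CLIPS[intent], synthesized=False)
    if intent == "report_temperature":
        # Dynamic content cannot be prerecorded, so it is sent to TTS.
        text = f"The temperature is {state['temperature']} degrees."
        return Response(intent, text, synthesized=True)
    return Response(None, "Sorry, I did not understand that.", synthesized=True)


def handle(transcript: str, state: Dict[str, object]) -> Response:
    """Run one utterance through recognition, understanding, action, and feedback."""
    text = recognize(transcript)
    intent = understand(text)
    act(intent, state)
    return feedback(intent, state)


if __name__ == "__main__":
    device = {"light": False, "temperature": 24}
    print(handle("Turn on the light", device))
    print(handle("What is the temperature?", device))
```

In this sketch, fixed confirmations map to prerecorded clips. Replies that contain changing data, such as a temperature reading, go through synthesis because they cannot be recorded in advance.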

Intelligent speech can currently be processed in several ways, including online automatic speech recognition (ASR) and offline ASR. Because speech recognition and semantic understanding require significant computing power, early implementations relied primarily on cloud servers for these tasks. A typical intelligent speech processing flow is shown in the diagram below.

Common intelligent speech processing flow

With ongoing technological advances, dedicated edge AI speech integrated circuits (ICs) have emerged. These chips let speech recognition and semantic understanding be processed directly on the device using the AI chip's computing power, which has accelerated the adoption of offline speech processing. Offline voice processing offers better privacy protection, faster response times, and the ability to operate without an internet connection. It has become a standard voice control method for many types of functional devices. In the future, more speech processing will move to the edge, reducing server load and network bandwidth usage. As a provider of services and content, the cloud will continue to work alongside offline speech processing to improve the user experience.

Offline ASR introduction

Offline ASR solutions perform speech recognition and related functions locally, without requiring a network connection. Compared with online ASR solutions, they offer faster response times and better privacy protection. These solutions typically rely on intelligent voice chips and are particularly well suited to functional devices such as air conditioners, smart plugs, and other smart home appliances.

A comparison of offline ASR and online ASR functions is shown in the following table.

Item	Offline ASR	Online ASR
Network connection	Not required	Required
Response speed	Very fast (usually about 0.2 s)	Fast (latency depends on network quality)
Number of voice commands	1 to 1000	Unlimited
Speech processing	Powered by AI voice chips	Powered by the cloud
Extended functions	Limited	Access to entertainment and other online content

Our company currently offers several offline ASR solutions. An application block diagram is shown below.

Hybrid ASR introduction

Offline ASR works without a network and responds quickly, while online ASR excels at accessing rich cloud-based content and services. In practice, the two solutions can be combined to leverage their respective strengths: offline ASR handles control-related functions, while online ASR handles content and service retrieval.
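This division of labor can be sketched as a simple router. The sketch rests on three assumptions. First, it uses a hypothetical fixed command-word table on the device. Second, a hypothetical `cloud_query` callback stands in for the online service. Third, matching is exact on normalized text, whereas a real offline ASR chip performs its own recognition. Utterances that match a local command are handled offline, and anything else is forwarded to the cloud when a network is available.

```python
from typing import Callable, Dict

# Hypothetical offline command-word table. Offline ASR supports a limited
# command set (1 to 1000 in the comparison table); four are shown here.
OFFLINE_COMMANDS: Dict[str, str] = {
    "turn on the air conditioner": "ac_on",
    "turn off the air conditioner": "ac_off",
    "raise the temperature": "temp_up",
    "lower the temperature": "temp_down",
}


def normalize(utterance: str) -> str:
    """Lowercase and collapse whitespace so matching ignores spacing."""
    return " ".join(utterance.lower().split())


def route(utterance: str, network_up: bool,
          cloud_query: Callable[[str], str]) -> str:
    """Handle control phrases offline; send everything else online."""
    text = normalize(utterance)
    # 1. Control commands are matched locally: fast, and no network needed.
    if text in OFFLINE_COMMANDS:
        return "offline:" + OFFLINE_COMMANDS[text]
    # 2. Other requests (music, news, recipes, ...) need cloud content.
    if network_up:
        return "online:" + cloud_query(text)
    # 3. Without a network, only the offline command set is available.
    return "offline:unavailable"


def fake_cloud(query: str) -> str:
    """Placeholder for a real online service."""
    return "results for " + query


if __name__ == "__main__":
    print(route("Turn on the air conditioner", False, fake_cloud))
    print(route("Play some music", True, fake_cloud))
```

Checking the local table first keeps control commands fast and available offline. The network is used only for requests the device cannot answer by itself.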

We have launched a hybrid ASR solution that supports both offline control and online services. It covers many frequently used everyday functions, including music, video, social networking, news, encyclopedias, stock updates, recipes, and children’s education, meeting the needs of most products. The following is a system application diagram.

AIoT ASR introduction

As the IoT ecosystem matures, devices can be interconnected via Ethernet, Wi-Fi, Bluetooth, and other technologies. However, IoT control, especially in smart home scenarios, still often relies on a smartphone or a similar central device. This is inconvenient when the device to be controlled is close at hand. If the central device fails, the other devices become inoperable because there is no fallback control method.

Voice interaction, as a natural and intuitive interface, addresses many of these challenges. It simplifies network setup, removes the dependency on a central device, and enables seamless control of interconnected devices through a single voice entry point. With the emergence of dedicated AIoT ASR chips, the cost of such solutions has dropped significantly. As a result, voice control is now widely adopted in control panels, smart displays, sockets, and many other home appliances.
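The idea of a single voice entry point can be sketched as a hub that parses a recognized phrase and forwards the action to a registered device. This is an illustrative assumption, not a description of how the AIoT ASR solution is built. The `VoiceHub` class and the "turn on/off [the] <device>" phrase pattern are hypothetical. So are the per-device `send` callbacks, which stand in for a Wi-Fi or Bluetooth transport.

```python
from typing import Callable, Dict


class VoiceHub:
    """Hypothetical voice entry point that forwards commands to devices.

    Each device is registered with a callback that stands in for the real
    transport (Wi-Fi, Bluetooth, ...) and returns a status string.
    """

    def __init__(self) -> None:
        self.devices: Dict[str, Callable[[str], str]] = {}

    def register(self, name: str, send: Callable[[str], str]) -> None:
        """Make a device reachable by its spoken name."""
        self.devices[name.lower()] = send

    def handle(self, utterance: str) -> str:
        """Parse 'turn on|off [the] <device>' and dispatch the action."""
        words = utterance.lower().split()
        if len(words) < 3 or words[0] != "turn" or words[1] not in ("on", "off"):
            return "unrecognized command"
        action = words[1]
        # Everything after the action, minus "the", is the device name.
        name = " ".join(w for w in words[2:] if w != "the")
        send = self.devices.get(name)
        if send is None:
            return f"no device named '{name}'"
        return send(action)


if __name__ == "__main__":
    hub = VoiceHub()
    hub.register("socket", lambda action: "socket " + action)
    hub.register("living room light", lambda action: "light " + action)
    print(hub.handle("Turn on the socket"))
    print(hub.handle("turn off the living room light"))
```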

We have launched an AIoT ASR solution. An application block diagram is shown below.