Introduction to Offline Speech Command Recognition
1. Offline Speech Recognition
ASR (Automatic Speech Recognition) is the technology that uses algorithms to convert human speech into text. Offline speech command recognition is a form of speech recognition that processes audio signals entirely on the local device, with no cloud server or internet connection required, and converts them into predefined text commands. The device’s built‑in model parses spoken keywords or phrases directly and triggers the corresponding local actions.
2. Terminology
VAD (Voice Activity Detection): Detects the start and end points of human speech segments in an audio stream.
Wake Word: The keyword that activates the speech system (e.g., “Chipintelli”), putting the device into command‑listening mode.
Command Word: A voice instruction recognized after wake‑up that triggers a specific action (e.g., “Turn on the air conditioner”).
OneShot: The wake word and command word are spoken in a single utterance (e.g., “Chipintelli turn on the air conditioner”), rather than saying the wake word first, waiting for the device to wake up, and then saying the command (e.g., “Chipintelli”, then “turn on the air conditioner”).
Confidence Threshold: The minimum confidence score that a recognition result must exceed to trigger an action. Scores range from 0 to 255. Example: If the threshold is 40, the command “Turn on the air conditioner” is executed only when its score exceeds 40. Raising the threshold reduces false triggers but increases missed recognitions; lowering it does the opposite.
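To show how these terms fit together, here is a minimal Python sketch of the decision logic that sits after the recognizer. It is an illustration under stated assumptions, not the device firmware or any vendor API: `Result`, `CommandListener`, `COMMANDS`, and the action names `AC_ON` and `AC_OFF` are hypothetical, and the recognizer is assumed to deliver already-decoded text with a 0–255 score. Whether the device stays awake after a command, or wakes up on a OneShot utterance, is not specified above; the sketch keeps the state unchanged in both cases.

```python
from dataclasses import dataclass
from typing import Optional

WAKE_WORD = "chipintelli"

# Hypothetical command table: recognized phrase -> local action name.
COMMANDS = {
    "turn on the air conditioner": "AC_ON",
    "turn off the air conditioner": "AC_OFF",
}


@dataclass
class Result:
    """One recognition result (assumed shape): decoded text plus a 0-255 score."""
    text: str
    score: int


class CommandListener:
    def __init__(self, threshold: int = 40) -> None:
        self.threshold = threshold
        self.awake = False  # True once the wake word has been accepted

    def handle(self, result: Result) -> Optional[str]:
        """Return the action to run, or None if nothing should be triggered."""
        # Confidence threshold: the score must strictly exceed the threshold.
        if result.score <= self.threshold:
            return None

        text = result.text.lower().strip()

        if text.startswith(WAKE_WORD):
            rest = text[len(WAKE_WORD):].strip()
            if not rest:
                # Wake word alone: enter command-listening mode.
                self.awake = True
                return None
            # OneShot: wake word and command word in a single utterance.
            # Assumption: the awake state is left unchanged here.
            return COMMANDS.get(rest)

        # A plain command word is honored only after wake-up.
        if self.awake:
            return COMMANDS.get(text)
        return None
```

A short usage example: a command spoken before wake-up is ignored, a score equal to the threshold does not trigger, and a OneShot utterance works without a prior wake-up.

```python
listener = CommandListener(threshold=40)
listener.handle(Result("turn on the air conditioner", 200))  # None: not awake yet
listener.handle(Result("Chipintelli", 120))                  # None: device is now awake
listener.handle(Result("turn on the air conditioner", 40))   # None: 40 does not exceed 40
listener.handle(Result("turn on the air conditioner", 41))   # "AC_ON"

CommandListener().handle(Result("Chipintelli turn on the air conditioner", 90))  # "AC_ON"
```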
