AI Platform Attachment Filling Specifications
1. Product Firmware Rapid Development
1.1 Command Word and Voice Prompt List V3 - Chinese Template
| Chinese Command Words |
|---|
| 1. Optimal length is 4-6 characters (4 is ideal) - shorter phrases risk misrecognition, longer ones reduce usability |
| 2. Maximize consonant differentiation between adjacent characters |
| 3. Use natural, common phrases that are specific and direct |
| 4. Avoid casual expressions (e.g., “吃饭”) |
| 5. Exclude rare characters and zero-initial characters (e.g., “语音” in “语音识别”) |
| 6. Omit modal particles (e.g., “啊”, “呢”) |
| 7. Avoid reduplicated words (e.g., “你好你好”) |
| 8. Use pure Chinese characters only - no spaces or punctuation |
| 9. Express numbers in Chinese characters (e.g., “调高一度”, “二十六度”) |
| 10. For reference: Recommended Command Words |
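
Several of the rules above are mechanical enough to check before a list is uploaded. The following is a minimal sketch, not the platform's own validation: the function name, the messages, and the two-entry particle set (taken from the examples in rule 6) are assumptions, and rules about consonant contrast or naturalness are not covered.

```python
import re

# Illustrative particle set taken from the examples in rule 6; extend as needed.
MODAL_PARTICLES = set("啊呢")
# CJK Unified Ideographs only: rejects digits, Latin letters, spaces, punctuation.
HAN_ONLY = re.compile(r"[\u4e00-\u9fff]+")

def check_chinese_command(word: str) -> list[str]:
    """Return a list of rule violations; an empty list means none were found."""
    problems = []
    if not HAN_ONLY.fullmatch(word):
        problems.append("must contain Chinese characters only")
    if not 4 <= len(word) <= 6:
        problems.append("length should be 4-6 characters")
    if any(ch in MODAL_PARTICLES for ch in word):
        problems.append("contains a modal particle")
    half = len(word) // 2
    # A phrase whose first half equals its second half is treated as reduplicated.
    if half > 0 and len(word) % 2 == 0 and word[:half] == word[half:]:
        problems.append("reduplicated phrase")
    return problems
```

For instance, `check_chinese_command("调高1度")` reports the Arabic digit as a non-Chinese character, which is the situation rule 9 asks you to avoid.
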
| Chinese Voice Prompts |
|---|
| 1. Should be under 10 words - longer prompts degrade user experience |
| 2. Use commas for natural pauses (e.g., “好的,打开空调”) |
| 3. Punctuation only affects timing, not intonation |
| 4. [=*] specifies pinyin pronunciation (e.g., 打开空调[=tiao2], 音调[=diao4]上升) |
| 5. [n*] controls number pronunciation (e.g., 1300[n1] as “1300”, 1300[n2] as “one three zero”) |
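
The two inline marks can be picked out of a prompt with regular expressions. This is a rough sketch for checking prompt text offline; the function name and the assumption that a pinyin mark is lowercase letters followed by one tone digit (1-5) are illustrative, not a description of the platform's parser.

```python
import re

# One character followed by [=pinyin+tone], e.g. 调[=tiao2].
PINYIN_MARK = re.compile(r"(.)\[=([a-z]+[1-5])\]")
# A run of digits followed by [n<digit>], e.g. 1300[n1].
NUMBER_MARK = re.compile(r"(\d+)\[n(\d)\]")

def extract_marks(text: str):
    """Return (pinyin_marks, number_marks, text_without_marks) for one prompt."""
    pinyin = PINYIN_MARK.findall(text)
    numbers = NUMBER_MARK.findall(text)
    plain = re.sub(r"\[(?:=[a-z]+[1-5]|n\d)\]", "", text)
    return pinyin, numbers, plain
```

The third return value is the prompt with the marks stripped, which is convenient when counting the visible length of a prompt.
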
1.2 Command Word and Voice Prompt List V3 - English Template
| English Command Words |
|---|
| 1. Optimal length is 2-4 words (4-6 syllables) - too short may cause misrecognition, too long is difficult to remember |
| 2. Maximize syllable differentiation between command words for better recognition |
| 3. Should match natural user language patterns - specific, direct and commonly used |
| 4. Avoid generic greetings like “HI” or “HELLO” |
| 5. Avoid phonetically similar commands (e.g., don’t use both TURN-ON and TURN-OFF) |
| 6. Avoid repeated words like “HI-HI” |
| 7. Format: ALL CAPS, with hyphens connecting multi-word commands (e.g., HELLO-JENNY) |
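
The formatting rule and a few of the countable rules can be applied with a short helper. This is a sketch under assumptions: both function names are hypothetical, only letters are kept, and syllable counts and phonetic similarity are not checked because they cannot be derived reliably from spelling.

```python
import re

def to_command_word(phrase: str) -> str:
    """Uppercase the phrase and join its words with hyphens."""
    words = re.findall(r"[A-Za-z]+", phrase)
    return "-".join(w.upper() for w in words)

def check_english_command(command: str) -> list[str]:
    """Return a list of rule violations; an empty list means none were found."""
    problems = []
    if not re.fullmatch(r"[A-Z]+(?:-[A-Z]+)*", command):
        problems.append("use ALL CAPS words joined by hyphens")
    words = command.split("-")
    if not 2 <= len(words) <= 4:
        problems.append("use 2-4 words")
    # Flag immediately repeated words such as HI-HI.
    if any(a == b for a, b in zip(words, words[1:])):
        problems.append("repeated word")
    return problems
```

Running `to_command_word` first and `check_english_command` second keeps formatting fixes separate from rule violations that need a human decision.
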
| English Voice Prompts |
|---|
| 1. Should generally be under 10 words for optimal user experience |
| 2. Format: all lowercase with words separated by spaces |
| 3. Use commas for natural pauses (e.g., “yes, I am here”) |
| 4. Note: Punctuation only affects timing, not intonation |
2. Product Firmware and SDK In-Depth Development
2.1 Command Word and Voice Prompt Protocol List V3 - Chinese Template
| Chinese Command Words |
|---|
| 1. Optimal length is 4-6 characters (4 is ideal) - shorter phrases risk misrecognition, longer ones reduce usability |
| 2. Maximize consonant differentiation between adjacent characters |
| 3. Use natural, common phrases that are specific and direct |
| 4. Avoid casual expressions (e.g., “吃饭”) |
| 5. Exclude rare characters and zero-initial characters (e.g., “语音” in “语音识别”) |
| 6. Omit modal particles (e.g., “啊”, “呢”) |
| 7. Avoid reduplicated words (e.g., “你好你好”) |
| 8. Use pure Chinese characters only - no spaces or punctuation |
| 9. Express numbers in Chinese characters (e.g., “调高一度”, “二十六度”) |
| 10. For reference: Recommended Command Words |
| Semantic Tag |
|---|
| 1. Semantic tags mark command words that share the same meaning |
| 2. Command words with the same meaning must carry the same semantic tag. For example, several differently worded commands that all mean “turn on the air conditioner” must share one tag |
| 3. The semantic tag is a positive integer in the range 1~65535 |
| 4. Command words that share a semantic tag may have the same or different voice prompts. When the prompts are identical, the platform automatically deduplicates them by content; when they differ, the system plays one of them at random whenever any command word with that tag is recognized |
| 5. Command words that share a semantic tag must have the same send protocol and the same receive protocol. If they differ, the platform automatically flags the mismatch |
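
The semantic tag rules amount to grouping rows by tag. The sketch below shows one way to pre-check a list under stated assumptions: the five-field row layout, the function names, and the protocol strings used in any test data are hypothetical and do not reflect the platform's file format.

```python
import random
from collections import defaultdict

# Hypothetical row layout: (command_word, semantic_tag, prompt, send_protocol, receive_protocol).
def group_by_tag(rows):
    """Validate tags, collect distinct prompts per tag, and list tags with mixed protocols."""
    groups = defaultdict(lambda: {"prompts": [], "protocols": set()})
    for word, tag, prompt, send, recv in rows:
        if not (isinstance(tag, int) and 1 <= tag <= 65535):
            raise ValueError(f"{word}: semantic tag must be an integer in 1..65535")
        group = groups[tag]
        if prompt not in group["prompts"]:  # identical prompts are deduplicated
            group["prompts"].append(prompt)
        group["protocols"].add((send, recv))
    # A tag with more than one (send, receive) pair breaks rule 5.
    mismatched = sorted(tag for tag, g in groups.items() if len(g["protocols"]) > 1)
    return groups, mismatched

def pick_prompt(groups, tag, rng=random):
    """Pick one prompt at random among those sharing a tag."""
    return rng.choice(groups[tag]["prompts"])
```

`mismatched` lists the tags whose rows disagree on protocols, which is the condition the platform is described as flagging.
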
| Command Word Type |
|---|
| 1. There are three types of command words: wake-up words, command words, and negative words. Wake-up words wake the voice system, for example “Smart assistant”. Command words are voice commands, for example “turn on the air conditioner”. Negative words reduce false recognition of speech that is not a command: in a noisy environment, saying “turn on TV” may trigger “turn on the air conditioner” even though “turn on TV” is not a command word. Marking “turn on TV” as a negative word reduces such false triggers of “turn on the air conditioner” |
| 2. There are two types of voice prompts: welcome speech and rest speech. The welcome speech is played at power-on and indicates that the system has started successfully. The rest speech is played when the voice system switches from the awake state to the non-awake state |
| 3. When a voice prompt is marked as type “Welcome”, the corresponding command word field can be filled with the placeholder “Welcome”; when it is marked as type “resting words”, the command word field can be filled with the placeholder “resting words” |
| Chinese Voice Prompts |
|---|
| 1. Should be under 10 words - longer prompts degrade user experience |
| 2. Use commas for natural pauses (e.g., “好的,打开空调”) |
| 3. Punctuation only affects timing, not intonation |
| 4. [=*] specifies pinyin pronunciation (e.g., 打开空调[=tiao2], 音调[=diao4]上升) |
| 5. [n*] controls number pronunciation (e.g., 1300[n1] as “1300”, 1300[n2] as “one three zero”) |
| Voice Prompt |
|---|
| 1. There are two voice prompt modes: active and passive |
| 2. In active mode, the voice system plays the corresponding voice prompt as soon as it recognizes a command word |
| 3. In passive mode, the voice system plays nothing when it recognizes a command word; the corresponding voice prompt is played only when the specified protocol is received |
| Send Protocol |
|---|
| The send protocol refers to the process where, upon recognizing a specific command word, the voice system sends the corresponding protocol to the MCU via the communication serial port. |
| Receive Protocol |
|---|
| The receive protocol refers to the process where the voice system receives a protocol through the serial port, and then either announces the corresponding voice prompt or executes the function associated with that protocol. |
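
The interaction between prompt mode, send protocol, and receive protocol can be pictured as two small handlers. This is a sketch only: the byte frames are made-up placeholders, the table and function names are hypothetical, and `uart_write` and `play` stand in for whatever serial and audio calls the firmware actually provides. The receive path here only plays prompts and leaves out the case where a received protocol executes a function.

```python
# All frames below are invented placeholders; real frames come from the product's protocol list.
SEND_TABLE = {"打开空调": bytes.fromhex("AA0101")}        # command word -> frame sent to the MCU
RECEIVE_TABLE = {bytes.fromhex("AA8101"): "好的,打开空调"}  # frame from the MCU -> prompt to play
ACTIVE_PROMPTS = {"打开空调": "好的,打开空调"}             # prompts played directly in active mode

def on_command_recognized(command, active, uart_write, play):
    """Send path: forward the frame to the MCU; in active mode also play the prompt."""
    frame = SEND_TABLE.get(command)
    if frame is not None:
        uart_write(frame)
    if active and command in ACTIVE_PROMPTS:
        play(ACTIVE_PROMPTS[command])

def on_frame_received(frame, play):
    """Receive path: play the prompt bound to a frame that arrived on the serial port."""
    prompt = RECEIVE_TABLE.get(frame)
    if prompt is not None:
        play(prompt)
```

In passive mode the MCU decides when the user hears a confirmation, for example only after the appliance has actually changed state.
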
| Other Hidden Functions |
|---|
| 1. If the command word list contains volume-control commands, the system implements the corresponding functions automatically. For example, if “increase volume” or “decrease volume” is in the list, the voice system adjusts the voice prompt volume by itself when it is awake and recognizes either command |
| 2. If a command word is not recognized reliably enough, adjust its confidence threshold on the page to raise its recognition sensitivity (non-automatic optimization mode). You can also add more command words with the same semantics to improve recognition generalization |
2.2 Command Word and Voice Prompt Protocol List V3 - English Template
| English Command Words |
|---|
| 1. Optimal length is 2-4 words (4-6 syllables) - too short may cause misrecognition, too long is difficult to remember |
| 2. Maximize syllable differentiation between command words for better recognition |
| 3. Should match natural user language patterns - specific, direct and commonly used |
| 4. Avoid generic greetings like “HI” or “HELLO” |
| 5. Avoid phonetically similar commands (e.g., don’t use both TURN-ON and TURN-OFF) |
| 6. Avoid repeated words like “HI-HI” |
| 7. Format: ALL CAPS, with hyphens connecting multi-word commands (e.g., HELLO-JENNY) |
| Semantic Tag |
|---|
| 1. Semantic tags are used to mark command words with the same semantics |
| 2. Command words with the same semantics must have the same semantic tags, such as “TURN-ON-THE-LIGHT” and “SWITCH-ON-THE-LIGHT” |
| 3. The semantic tag is a positive integer. The value range is 1~65535 |
| 4. Command words that share a semantic tag may have the same or different voice prompts. When the prompts are identical, the platform automatically deduplicates them by content; when they differ, the system plays one of them at random whenever any command word with that tag is recognized |
| 5. Command words that share a semantic tag must have the same send protocol and the same receive protocol. If they differ, the platform automatically flags the mismatch |
| Command Word Type |
|---|
| 1. There are three types of command words: wake-up words, command words, and negative words. Wake-up words wake the voice system, for example “HELLO-JENNY”. Command words are voice commands, for example “TURN-ON-THE-LIGHT”. Negative words reduce false recognition of speech that is not a command: in a noisy environment, saying “TURN-ON-TELEVISION” may trigger “TURN-ON-THE-LIGHT” even though “TURN-ON-TELEVISION” is not a command word. Marking “TURN-ON-TELEVISION” as a negative word reduces such false triggers of “TURN-ON-THE-LIGHT” |
| 2. There are two types of voice prompts: welcome speech and rest speech. The welcome speech is played at power-on and indicates that the system has started successfully. The rest speech is played when the voice system switches from the awake state to the non-awake state |
| 3. When a voice prompt is marked as type “Welcome”, the corresponding command word field can be filled with the placeholder “WELCOME”; when it is marked as type “rest words”, the command word field can be filled with the placeholder “BYE” |
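
The three word types behave differently once a phrase is recognized. The sketch below models that with a small dispatcher; the enum, the word table, and the rule that commands only execute while the system is awake are assumptions made for illustration, using the example phrases from the table above.

```python
from enum import Enum

class WordType(Enum):
    WAKE_UP = "wake-up word"
    COMMAND = "command word"
    NEGATIVE = "negative word"

# Hypothetical word list built from the examples in the table above.
WORDS = {
    "HELLO-JENNY": WordType.WAKE_UP,
    "TURN-ON-THE-LIGHT": WordType.COMMAND,
    "TURN-ON-TELEVISION": WordType.NEGATIVE,
}

def handle(recognized: str, awake: bool):
    """Return (action, awake) for one recognition result."""
    kind = WORDS.get(recognized)
    if kind is WordType.WAKE_UP:
        return "wake", True
    if kind is WordType.COMMAND and awake:
        return "execute", awake
    # Negative words, unknown phrases, and commands heard while not awake trigger nothing.
    return "ignore", awake
```

Because “TURN-ON-TELEVISION” has its own entry, the recognizer has a closer match to report than “TURN-ON-THE-LIGHT”, and the dispatcher then discards it.
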
| English Voice Prompts |
|---|
| 1. Should generally be under 10 words for optimal user experience |
| 2. Format: all lowercase with words separated by spaces |
| 3. Use commas for natural pauses (e.g., “yes, I am here”) |
| 4. Note: Punctuation only affects timing, not intonation |
| Voice Prompt |
|---|
| 1. There are two voice prompt modes: active and passive |
| 2. In active mode, the voice system plays the corresponding voice prompt as soon as it recognizes a command word |
| 3. In passive mode, the voice system plays nothing when it recognizes a command word; the corresponding voice prompt is played only when the specified protocol is received |
| Send Protocol |
|---|
| The send protocol refers to the process where, upon recognizing a specific command word, the voice system sends the corresponding protocol to the MCU via the communication serial port. |
| Receive Protocol |
|---|
| The receive protocol refers to the process where the voice system receives a protocol through the serial port, and then either announces the corresponding voice prompt or executes the function associated with that protocol. |
| Other Hidden Functions |
|---|
| 1. If the command word list contains volume-control commands, the system implements the corresponding functions automatically. For example, if “VOLUME-UP” or “VOLUME-DOWN” is in the list, the voice system adjusts the voice prompt volume by itself when it is awake and recognizes either command |
| 2. If a command word is not recognized reliably enough, adjust its confidence threshold on the page to raise its recognition sensitivity (non-automatic optimization mode). You can also add more command words with the same semantics to improve recognition generalization |
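
To see why a per-command confidence threshold changes sensitivity, consider a recognizer that reports a score for each candidate. The sketch below is a generic illustration, not the platform's algorithm: the 0-1 score scale, the default of 0.5, and the function name are all assumptions, and it assumes that a lower threshold makes a command easier to accept.

```python
def accept(scores: dict[str, float], thresholds: dict[str, float], default: float = 0.5):
    """Return the best-scoring command if its score reaches its threshold, else None."""
    best = max(scores, key=scores.get, default=None)
    if best is None or scores[best] < thresholds.get(best, default):
        return None
    return best
```

With a score of 0.42, “VOLUME-UP” is rejected at a threshold of 0.5 and accepted at 0.4. The trade-off is that a lower threshold also lets more false triggers through, which is where negative words and same-semantics variants help.
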
3. Language Model Development
3.1 Command Word List - Chinese Template
| Chinese Command Words |
|---|
| 1. Use 4-6 characters; 4 is ideal. Shorter phrases are prone to misrecognition, and longer ones are hard for users to say and remember |
| 2. The more the consonants of adjacent characters in a command word differ, the better |
| 3. Follow the user's language habits: use common phrases that are specific and direct |
| 4. Avoid everyday expressions, such as “吃饭” |
| 5. Avoid rare characters and zero-initial characters where possible. For example, both characters of “语音” in “语音识别” are zero-initial characters |
| 6. Leave modal particles such as “啊” and “呢” out of command words |
| 7. Avoid reduplicated words, such as “你好你好” |
| 8. A Chinese command word may consist of Chinese characters only, with no spaces, commas, or other characters |
| 9. Write numbers in Chinese characters, such as “调高一度” and “二十六度” |
| 10. If you have not yet decided on your command words, choose them from the platform's “Recommended Command Words” list |
3.2 Command Word List - English Template
| English Command Words |
|---|
| 1. Optimal length is 2-4 words (4-6 syllables) - too short may cause misrecognition, too long is difficult to remember |
| 2. Maximize syllable differentiation between command words for better recognition |
| 3. Should match natural user language patterns - specific, direct and commonly used |
| 4. Avoid generic greetings like “HI” or “HELLO” |
| 5. Avoid phonetically similar commands (e.g., don’t use both TURN-ON and TURN-OFF) |
| 6. Avoid repeated words like “HI-HI” |
| 7. Format: ALL CAPS, with hyphens connecting multi-word commands (e.g., HELLO-JENNY) |
| English Voice Prompts |
|---|
| 1. Should generally be under 10 words for optimal user experience |
| 2. Format: all lowercase with words separated by spaces |
| 3. Use commas for natural pauses (e.g., “yes, I am here”) |
| 4. Note: Punctuation only affects timing, not intonation |
3.3 Command Word List - Japanese Template
| Japanese Command Word |
|---|
| 1. A Japanese command word should consist of 4-6 Japanese syllables. Shorter phrases are easily misrecognized, and longer ones are hard for users to say and remember |
| 2. The more the syllables of different command words differ, the better |
| 3. Follow the user's language habits: use common phrases that are specific and direct |
| 4. Avoid everyday words, such as “はい” and “おはよ” |
| 5. Avoid similar-sounding syllables and choose words that are pronounced clearly. For example, “下げて” (sa-ge-te) and “上げて” (a-ge-te) differ only in the first syllable |
| 6. Avoid reduplicated words, such as “ラボ-ラボ” |
| 7. Negative words are phrases that occur frequently in daily life and tend to be misrecognized as wake-up words or command words. They are added specifically to prevent false recognition and false wake-ups |
3.4 Command Word List - Korean Template
| Korean Command Word |
|---|
| 1. A Korean command word should consist of 4-6 Korean syllables. Shorter phrases are easily misrecognized, and longer ones are hard for users to say and remember |
| 2. The more the syllables of different command words differ, the better |
| 3. Follow the user's language habits: use common phrases that are specific and direct |
| 4. Negative words are phrases that occur frequently in daily life and tend to be misrecognized as wake-up words or command words. They are added specifically to prevent false recognition and false wake-ups |
| 5. The phrases within a command word are connected by hyphens |
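
Korean syllable counts are easy to verify because each precomposed Hangul block is one syllable. The sketch below is illustrative: the function names are hypothetical, the sample phrase “에어컨-켜줘” (turn on the air conditioner) is not from the platform, and it assumes the connector in rule 5 is the same hyphen the English template uses in HELLO-JENNY.

```python
def hangul_syllables(command: str) -> int:
    """Count precomposed Hangul syllable blocks (U+AC00..U+D7A3)."""
    return sum(1 for ch in command if "\uac00" <= ch <= "\ud7a3")

def check_korean_command(command: str) -> list[str]:
    """Return a list of rule violations; an empty list means none were found."""
    problems = []
    if not 4 <= hangul_syllables(command) <= 6:
        problems.append("use 4-6 syllables")
    if " " in command:
        problems.append("join phrases with hyphens, not spaces")
    return problems
```

Hyphens are not counted as syllables, so “에어컨-켜줘” counts as five.
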
4. Language Model Optimization
4.1 Command Word List_Chinese Template
| Chinese Command Words |
|---|
| 1. Use 4-6 characters; 4 is ideal. Shorter phrases are prone to misrecognition, and longer ones are hard for users to say and remember |
| 2. The more the consonants of adjacent characters in a command word differ, the better |
| 3. Follow the user's language habits: use common phrases that are specific and direct |
| 4. Avoid everyday expressions, such as “吃饭” |
| 5. Avoid rare characters and zero-initial characters where possible. For example, both characters of “语音” in “语音识别” are zero-initial characters |
| 6. Leave modal particles such as “啊” and “呢” out of command words |
| 7. Avoid reduplicated words, such as “你好你好” |
| 8. A Chinese command word may consist of Chinese characters only, with no spaces, commas, or other characters |
| 9. Write numbers in Chinese characters, such as “调高一度” and “二十六度” |
| 10. If you have not yet decided on your command words, choose them from the platform's “Recommended Command Words” list |
4.2 Command Word List_English Template
| English Command Words |
|---|
| 1. Optimal length is 2-4 words (4-6 syllables) - too short may cause misrecognition, too long is difficult to remember |
| 2. Maximize syllable differentiation between command words for better recognition |
| 3. Should match natural user language patterns - specific, direct and commonly used |
| 4. Avoid generic greetings like “HI” or “HELLO” |
| 5. Avoid phonetically similar commands (e.g., don’t use both TURN-ON and TURN-OFF) |
| 6. Avoid repeated words like “HI-HI” |
| 7. Format: ALL CAPS, with hyphens connecting multi-word commands (e.g., HELLO-JENNY) |
| English Voice Prompts |
|---|
| 1. Should generally be under 10 words for optimal user experience |
| 2. Format: all lowercase with words separated by spaces |
| 3. Use commas for natural pauses (e.g., “yes, I am here”) |
| 4. Note: Punctuation only affects timing, not intonation |
4.3 Command Word List_Japanese Template
| Japanese Command Word |
|---|
| 1. A Japanese command word should consist of 4-6 Japanese syllables. Shorter phrases are easily misrecognized, and longer ones are hard for users to say and remember |
| 2. The more the syllables of different command words differ, the better |
| 3. Follow the user's language habits: use common phrases that are specific and direct |
| 4. Avoid everyday words, such as “はい” and “おはよ” |
| 5. Avoid similar-sounding syllables and choose words that are pronounced clearly. For example, “下げて” (sa-ge-te) and “上げて” (a-ge-te) differ only in the first syllable |
| 6. Avoid reduplicated words, such as “ラボ-ラボ” |
| 7. Negative words are phrases that occur frequently in daily life and tend to be misrecognized as wake-up words or command words. They are added specifically to prevent false recognition and false wake-ups |
4.4 Command Word List_Korean Template
| Korean Command Word |
|---|
| 1. A Korean command word should consist of 4-6 Korean syllables. Shorter phrases are easily misrecognized, and longer ones are hard for users to say and remember |
| 2. The more the syllables of different command words differ, the better |
| 3. Follow the user's language habits: use common phrases that are specific and direct |
| 4. Negative words are phrases that occur frequently in daily life and tend to be misrecognized as wake-up words or command words. They are added specifically to prevent false recognition and false wake-ups |
| 5. The phrases within a command word are connected by hyphens |
5. Voice Prompt Synthesis
5.1 List of Voice Prompts_Chinese Template
| Chinese Voice Prompt |
|---|
| 1. In the “speech synthesis” sheet, the first column is the audio serial number, the second column is the audio name, and the third column is the text to be synthesized |
| 2. Keep the audio name short and free of spaces. The text to be synthesized should not exceed 40 words |
| 3. [=*] specifies the pinyin of the preceding Chinese character, for example 打开空调[=tiao2] and 音调[=diao4]上升. The digit is the tone: 1~5 are supported, and 5 is the neutral tone |
| 4. [n*] specifies how the number before the mark is read. For example, 1300[n1] is read as “1300” and 1300[n2] as “one three zero” |
| 5. The audio name cannot contain any of the following characters: / : * ? " < > \| |
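
The naming and length limits can be checked row by row before synthesis. This is a minimal sketch with assumptions stated up front: the function name is hypothetical, the 40-word limit is interpreted as 40 characters for Chinese text, and inline marks such as [=tiao2] are counted as written rather than stripped first.

```python
# Characters the table above lists as not allowed in an audio name.
FORBIDDEN = set('/:*?"<>|')

def check_prompt_row(name: str, text: str) -> list[str]:
    """Return a list of rule violations for one (audio name, text) pair."""
    problems = []
    bad = sorted(set(name) & FORBIDDEN)
    if bad:
        problems.append("forbidden characters in audio name: " + " ".join(bad))
    if any(ch.isspace() for ch in name):
        problems.append("audio name must not contain spaces")
    if len(text) > 40:
        problems.append("text exceeds 40 characters")
    return problems
```

The listed characters match those that commonly cannot appear in file names, so a name that passes this check is also likely to be safe to use as a file name for the synthesized audio.
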
5.2 List of Voice Prompts_English Template
| English Voice Prompt |
|---|
| 1. In the “speech synthesis” sheet, the first column is the audio serial number, the second column is the audio name, and the third column is the text to be synthesized |
| 2. The audio name in the second column must be in capital letters with words connected by hyphens. The text in the third column must be all lowercase with words separated by spaces |
| 3. Keep the audio name short and free of spaces. The text to be synthesized should not exceed 40 words |
| 4. The audio name cannot contain any of the following characters: / : * ? " < > \| |
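
A row for the English sheet can be normalized to these conventions with a few lines. The sketch below is an assumption-laden illustration: the function name and the three-field tuple are hypothetical, and the name is rebuilt from letters and digits only, which drops spaces and the forbidden characters as a side effect.

```python
import re

def format_english_row(index: int, name: str, text: str) -> tuple[int, str, str]:
    """Normalize one row: NAME-IN-CAPS joined by hyphens, text in lowercase with single spaces."""
    audio_name = "-".join(re.findall(r"[A-Za-z0-9]+", name)).upper()
    prompt = " ".join(text.lower().split())
    if len(prompt.split()) > 40:
        raise ValueError("text to be synthesized should not exceed 40 words")
    return index, audio_name, prompt
```

For example, `format_english_row(1, "power on", "Yes,  I am here")` yields the audio name `POWER-ON` and a lowercase prompt with single spaces.
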
5.3 List of Voice Prompts_Japanese Template
| Japanese Voice Prompt |
|---|
| 1. In the “speech synthesis” sheet, the first column is the audio serial number, the second column is the audio name, and the third column is the text to be synthesized |
| 2. Keep the audio name short and free of spaces. The text to be synthesized should not exceed 40 words |
| 3. The audio name cannot contain any of the following characters: / : * ? " < > \| |