Chipintelli Non-Chinese Language Product Recognition Acceptance Criteria
Table 1. Near-field Dataset ONLY (distance ≤ 1 m)
| # Utterances per command | 50-99 | 100-200 | ≥200 |
| --- | --- | --- | --- |
| Quiet recognition rate (background noise 35-45 dB, SNR ≥15 dB, distance <3 m) | >85 % | >90 % | >95 % |
| Noisy recognition rate (ambient noise 55-60 dB, SNR ≥15 dB, distance <3 m) | >60 % | >70 % | >80 % |
Table 2. Far-field + Near-field Dataset (distance ≤ 5 m)
| # Utterances per command | 50-99 | 100-200 | ≥200 |
| --- | --- | --- | --- |
| Quiet recognition rate (background noise 35-45 dB, SNR ≥15 dB, distance <5 m) | >85 % | >90 % | >95 % |
| Noisy recognition rate (ambient noise 55-60 dB, SNR ≥15 dB, distance <5 m) | >60 % | >70 % | >80 % |
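
The thresholds in Tables 1 and 2 are numerically identical, so an acceptance check could be sketched as a single lookup. This is a minimal illustration, not an official test tool. The function names and the input layout (a dict of command word to correct and total counts) are hypothetical. It assumes the recognition rate is the mean of the per-command rates, and it assumes a count of exactly 200 utterances falls in the ≥200 tier, since the table lists 200 in both columns.

```python
from statistics import mean

# Thresholds from Tables 1 and 2 (identical in both), highest tier first:
# (minimum utterances per command, quiet threshold %, noisy threshold %).
# ASSUMPTION: exactly 200 utterances is treated as the ">=200" tier.
THRESHOLDS = [
    (200, 95.0, 80.0),
    (100, 90.0, 70.0),
    (50, 85.0, 60.0),
]


def required_rates(utterances_per_command):
    """Return (quiet %, noisy %) thresholds, or None below 50 utterances."""
    for min_count, quiet, noisy in THRESHOLDS:
        if utterances_per_command >= min_count:
            return quiet, noisy
    return None


def recognition_rate(results):
    """Mean per-command recognition rate in percent.

    results maps each command word to (correct, total) counts.
    ASSUMPTION: the rate is averaged per command, not pooled over utterances.
    """
    return mean(100.0 * correct / total for correct, total in results.values())


def passes(utterances_per_command, quiet_results, noisy_results):
    """True if both rates are strictly above the thresholds for this tier."""
    required = required_rates(utterances_per_command)
    if required is None:
        return False  # fewer than 50 utterances: no tier defined in the tables
    quiet_req, noisy_req = required
    return (recognition_rate(quiet_results) > quiet_req
            and recognition_rate(noisy_results) > noisy_req)
```

For example, `passes(60, {"turn on": (9, 10)}, {"turn on": (7, 10)})` compares a 90 % quiet rate and a 70 % noisy rate against the 50-99 tier (>85 % and >60 %).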
False Wake-up Requirement (applies to both scenarios)
Noise-source test: <3 false wake-ups within 24 h
Test conditions: TV programme as the noise source; 1.5 m from the microphone; on-axis with the microphone (0°); single noise source; the noise source must NOT contain command words or acoustically similar content.
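
Because the noise source contains no command words, every wake-up logged during the test counts as a false wake-up. A minimal sketch of the pass/fail check follows. It assumes wake-up events are available as timestamps from the device log; the function name and log format are hypothetical.

```python
from datetime import datetime, timedelta

MAX_FALSE_WAKEUPS = 3                 # requirement: fewer than 3
TEST_DURATION = timedelta(hours=24)   # length of the noise-source test


def false_wakeup_pass(start, wake_times):
    """True if fewer than 3 wake-ups fall inside [start, start + 24 h)."""
    end = start + TEST_DURATION
    count = sum(1 for t in wake_times if start <= t < end)
    return count < MAX_FALSE_WAKEUPS


# Usage: two wake-ups during the 24 h window meet the requirement.
start = datetime(2024, 1, 1, 9, 0)
events = [start + timedelta(hours=2), start + timedelta(hours=15)]
print(false_wakeup_pass(start, events))
```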
General Recording & Test Notes
- Record in a normal, quiet home environment; pronounce each command word 3-5 times at a relatively fast speaking rate.
- Speakers must pronounce each command clearly and with standard pronunciation; maintain a balanced male/female ratio.
- Command words should be 4-8 syllables long and phonetically distinct to reduce confusion and false wake-ups.
- Recognition rate = correct-recognition rate averaged across all command words.
- Near-field data: at least one microphone located ≤1 m from the speaker.
- Far-field data: at least two microphones located ≤5 m from the speaker, with the farther mic ≥3 m away; recommend 3-4 devices at different distances recording simultaneously.
- Audio data must be saved in WAV format with a sample rate no lower than 44.1 kHz.
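
The audio-format note can be checked automatically before a dataset is submitted. The sketch below uses Python's standard `wave` module, which reads uncompressed PCM WAV files; files in other encodings would be reported as failing here even if they carry a `.wav` extension. The function name `check_wav` is hypothetical.

```python
import wave

MIN_SAMPLE_RATE_HZ = 44100  # "no lower than 44.1 kHz"


def check_wav(path):
    """True if path is a readable PCM WAV file sampled at >= 44.1 kHz."""
    try:
        with wave.open(path, "rb") as wav:
            return wav.getframerate() >= MIN_SAMPLE_RATE_HZ
    except (wave.Error, EOFError):
        # Not a WAV file, an unsupported encoding, or a truncated file.
        return False
```

Running `check_wav` over every recording in a delivery folder gives a quick list of files that need to be re-recorded or re-exported.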