Chipintelli Training Data Collection Environment and Method Description V1.1
Training Data Requirements
1、Command Word Text Format
(1)Provide all command word texts corresponding to the recorded audio data according to project requirements.
(2)Command words of 4 to 8 syllables are recommended. In English, each vowel sound forms one syllable; for example, “air-con-di-tion-er” has 5 syllables.
(3)For languages other than Chinese and English, provide Chinese or English translations of the command words.
(4)For loanwords, provide the spelling in the recording language's own script whenever possible, such as:
- “timer” in Korean can be written as “타이머”.
- “massage” in Vietnamese can be written as “Mát xa”.
(5)Write out all Arabic numerals in the text as words in the recording language, such as:
- “1 hour” in English should be changed to “one hour”.
- “1 시간” (1 hour) in Korean should be changed to “한 시간” (one hour).
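The numeral rule in item (5) can be checked or applied automatically before the text is handed to speakers. The following is a minimal sketch for English only, assuming command words contain only whole numbers from 0 to 99; the function names `number_to_words` and `normalize_command` are hypothetical and not part of any Chipintelli tool.

```python
import re

# English number words; this sketch only covers 0-99 (an assumption).
ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer between 0 and 99 in English."""
    if not 0 <= n <= 99:
        raise ValueError(f"only 0-99 is supported in this sketch, got {n}")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def normalize_command(text: str) -> str:
    """Replace every run of ASCII digits in a command word with words."""
    return re.sub(r"[0-9]+", lambda m: number_to_words(int(m.group())), text)


print(normalize_command("1 hour"))
print(normalize_command("set 25 degrees"))
```

Other recording languages need their own number words and counting rules (the Korean example above uses the native numeral “한”), so a per-language table reviewed by a native speaker is a safer choice than a generic converter.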
2、Recording Personnel and Process
(1)Speakers for the training set should mainly be aged 18-60;
(2)For training data collection for a less widely spoken language, recordings from at least 50 people are recommended; if the language has many special sounds such as guttural sounds and trills, at least 100 people are recommended;
(3)For training data collection to enhance a Chinese-language project, at least 150 people are recommended; for a new application field, at least 300 people are recommended;
(4)Record in the local accent, but avoid heavy regional accents as far as possible;
(5)Speakers should be able to read and pronounce the text without difficulty, speaking fluently without stuttering or reading one word at a time;
(6)Record each command word at least 3 times: once at normal speed and twice at fast speed;
(7)Record each wake-up word at least 10 times per person;
(8)The recording of a single command word should not be interrupted;
(9)Speaking rate should be steady, and the volume should not vary much;
(10)Gender ratio should be 1:1;
(11)The speaker should face the microphone and speak at a volume of about 65-80 dB;
(12)During collection, record each speaker's gender, age, and place of origin, together with the recording equipment and model used (e.g. Roland R44 recorder or Huawei Mate50), and match this information one-to-one with the audio files;
(13)Provide the recording text in the same order as it is spoken in the audio;
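The speaker requirements above (minimum head count, age range, 1:1 gender ratio, per-speaker metadata) can be checked against a speaker list before recording starts. The sketch below assumes a hypothetical `Speaker` record holding the fields named in item (12); the 10% tolerance on the gender ratio is an illustrative assumption, since the source only states 1:1, and the age check is reported as a warning because the 18-60 range is a recommendation.

```python
from dataclasses import dataclass


@dataclass
class Speaker:
    # Fields follow item (12); the class itself is a hypothetical helper.
    speaker_id: str
    gender: str   # "F" or "M"
    age: int
    origin: str   # place of origin
    device: str   # recording equipment and model


def check_speaker_pool(speakers, min_speakers=50, age_range=(18, 60),
                       ratio_tolerance=0.10):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if len(speakers) < min_speakers:
        problems.append(
            f"only {len(speakers)} speakers, at least {min_speakers} recommended")

    low, high = age_range
    outside = [s.speaker_id for s in speakers if not low <= s.age <= high]
    if outside:
        problems.append(f"warning: speakers outside {low}-{high}: {outside}")

    if speakers:
        female_share = sum(s.gender == "F" for s in speakers) / len(speakers)
        # ratio_tolerance is an assumption; the guideline only says 1:1.
        if abs(female_share - 0.5) > ratio_tolerance:
            problems.append(f"gender ratio off: {female_share:.0%} female")
    return problems
```

Passing `min_speakers=100`, `150`, or `300` covers the other head counts listed in items (2) and (3).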
3、Recording Equipment and Environment Requirements (High-fidelity recorder)
(1)A home environment is recommended, with reverberation between 0.3 and 0.6;
(2)Room noise should be between 35 and 45 dB;
(3)Use a Roland high-fidelity recorder, and adjust the gain and sensitivity of the equipment to suit the environment;
(4)Use four Superlux ECM999 microphones, one for each recording distance;
(5)Place the microphones 0.5 m, 1 m, 3 m, and 5 m from the sound source; if necessary, add microphones at different angles;
(6)The environment should be quiet, with no obvious interference noise, and the environment should be built as follows:
4、Recording Equipment and Environment Requirements (Mobile Phone/Computer/High-fidelity recorder)
(1)Mobile phone or computer recordings should preserve high audio quality, with a sampling rate of no less than 44.1 kHz, and the audio must not be degraded during transfer;
(2)When using a handheld high-fidelity recorder, select single-channel recording at a 44.1 kHz sampling rate;
(3)Place the recording device 3 m from the speaker;
(4)A home environment is recommended, with reverberation between 0.3 and 0.6;
(5)Room noise should be between 35 and 40 dB;
(6)Pronunciation should be clear, and the audio should not be cut off at the start or end, to avoid losing data;
(7)If multiple command words are recorded consecutively, leave an interval of 1-2 seconds between sentences; if a command word is misread, pause for 10 seconds and then read it again;
(8)The environment should be quiet, with no obvious interference noise;
(9)Store audio in WAV format with a sampling rate of no less than 44.1 kHz;
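The file-format requirements above (WAV storage, sampling rate of at least 44.1 kHz, single channel for the handheld recorder) can be verified from the file header before data is delivered. This is a minimal sketch using Python's standard `wave` module; the function name `check_wav` is hypothetical, and the check does not measure room noise or reverberation, which need calibrated equipment.

```python
import wave

MIN_SAMPLE_RATE_HZ = 44_100  # "no less than 44.1 kHz"


def check_wav(path, require_mono=True):
    """Return a list of format problems for one file; empty means OK."""
    try:
        with wave.open(path, "rb") as wav:
            rate = wav.getframerate()
            channels = wav.getnchannels()
            frames = wav.getnframes()
    except (wave.Error, EOFError, OSError) as exc:
        # wave only reads uncompressed PCM WAV files.
        return [f"cannot read as PCM WAV: {exc}"]

    problems = []
    if rate < MIN_SAMPLE_RATE_HZ:
        problems.append(f"sampling rate {rate} Hz is below {MIN_SAMPLE_RATE_HZ} Hz")
    if require_mono and channels != 1:
        problems.append(f"expected 1 channel, found {channels}")
    if frames == 0:
        problems.append("file contains no audio frames")
    return problems
```

Set `require_mono=False` for multi-microphone recordings where several channels are expected.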
5、Data Storage Notes
(1)When the number of command words is small, it is recommended to store each command word as a separate file and provide the corresponding text;
(2)When the number of command words is large, the recording may be kept as one long, uncut audio file, provided that the recording text is supplied in the same order as the recording.
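Both storage modes can be described by one manifest that pairs each audio file with its text in recording order. The sketch below is one possible layout, not a format required by Chipintelli: the column names `file`, `position`, and `text` and the example file names are assumptions.

```python
import csv
import io


def manifest_rows(recordings):
    """Build manifest rows from {audio file name: [texts in spoken order]}.

    A file holding one command word yields one row; a long, uncut
    recording yields one row per command word, numbered in order.
    """
    rows = []
    for file_name, texts in recordings.items():
        for position, text in enumerate(texts, start=1):
            rows.append({"file": file_name, "position": position, "text": text})
    return rows


def write_manifest(rows, stream):
    """Write the rows as CSV to any text stream (file or StringIO)."""
    writer = csv.DictWriter(stream, fieldnames=["file", "position", "text"])
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical example: one per-command file and one long recording.
recordings = {
    "spk001_turn_on.wav": ["turn on the light"],
    "spk001_session.wav": ["turn on the light", "turn off the light", "one hour"],
}
buffer = io.StringIO()
write_manifest(manifest_rows(recordings), buffer)
print(buffer.getvalue())
```

Speaker metadata such as gender, age, place of origin, and device could be added as further columns so that each audio file stays matched to its speaker record.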
