1. Sound is a mechanical wave that the human ear can hear. Audible vibration frequencies lie between about 20 Hz and 20,000 Hz.
2. The process of speech production
Air expelled from the lungs enters the throat, passes the vocal cords into the vocal tract, and is finally radiated from the mouth as sound waves, forming speech.
3. Classification of speech sounds (concept: understand and memorize)
Voiced sounds: the vocal cords are tensed, and the airflow makes the glottis open and close periodically, producing a periodic excitation airflow, as in a, o;
(sounds produced by vocal cord vibration), including all vowels and some consonants.
Unvoiced sounds: the vocal cords are fully relaxed, and part of the vocal tract narrows into a constriction that makes the airflow turbulent, as in t, d;
(sounds not produced by vocal cord vibration)
Plosives: the vocal cords are fully relaxed and the vocal tract is completely closed at some point. When the closure is suddenly released, the built-up air pressure escapes quickly, as in b, p.
4. Two important acoustic characteristics of speech: pitch frequency and formants (memorize)
Pitch frequency (fundamental frequency, F0): determined by the size, properties, and tension of the vocal cords. Its value is the reciprocal of the duration of one open-close cycle of the vocal cords (that duration is the pitch period). The human pitch frequency ranges from about 80 to 500 Hz.
Formant (Fn, n=1, 2,...): the vocal tract is a resonant cavity that amplifies some frequency components of the sound stream and attenuates others. The amplified frequencies are called formant frequencies, or resonance peak frequencies.
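The reciprocal relation between pitch period and pitch frequency can be checked with a small sketch. The sampling rate and the function names below are assumptions made for illustration; they do not come from the notes.

```python
def pitch_frequency(period_samples, fs):
    """Return F0 in Hz for a pitch period given in samples at sampling rate fs."""
    if period_samples <= 0:
        raise ValueError("pitch period must be positive")
    return fs / period_samples


def pitch_period_samples(f0, fs):
    """Return the pitch period in samples (rounded) for a pitch frequency f0 in Hz."""
    if f0 <= 0:
        raise ValueError("pitch frequency must be positive")
    return round(fs / f0)


fs = 8000  # assumed sampling rate in Hz
# The 80 to 500 Hz range quoted above, expressed as pitch periods in samples.
longest_period = pitch_period_samples(80, fs)
shortest_period = pitch_period_samples(500, fs)
print(longest_period, shortest_period)
```

At an assumed 8 kHz sampling rate, the quoted pitch range corresponds to pitch periods between 16 and 100 samples.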
5. Formant characteristics (understand)
Formants are an important acoustic characteristic of the vocal tract. The response of the vocal tract to an excitation signal can be approximated by a linear system with several pairs of complex-conjugate poles, each pair corresponding to one formant frequency. The frequency response of this linear system is called the formant characteristic; it determines the overall shape of the signal spectrum, that is, the spectral envelope.
The frequency characteristics of speech are determined mainly by the formants. The formant characteristics of the vocal tract determine the spectral characteristics of the sound produced, i.e. its timbre.
The timbre and the distinguishing features of vowels are determined mainly by the formant characteristics of the vocal tract. These can be observed in the magnitude-frequency response obtained by spectral analysis of the speech signal.
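The link between a pole pair and a formant can be made concrete with a short sketch. It assumes the usual textbook mapping for a pole at radius r and angle theta: formant frequency F = theta * fs / (2 * pi) and bandwidth B = -ln(r) * fs / pi. The function names, the sampling rate, and the 500 Hz / 80 Hz example values are illustrative assumptions.

```python
import cmath
import math


def pole_from_formant(freq_hz, bandwidth_hz, fs):
    """Upper-half-plane pole for a formant; its conjugate completes the pair."""
    r = math.exp(-math.pi * bandwidth_hz / fs)   # pole radius set by bandwidth
    theta = 2.0 * math.pi * freq_hz / fs         # pole angle set by frequency
    return cmath.rect(r, theta)


def formant_from_pole(pole, fs):
    """Recover (formant frequency, bandwidth) in Hz from one pole of a pair."""
    freq_hz = abs(cmath.phase(pole)) * fs / (2.0 * math.pi)
    bandwidth_hz = -math.log(abs(pole)) * fs / math.pi
    return freq_hz, bandwidth_hz


def resonator_denominator(pole):
    """Coefficients [1, a1, a2] of (1 - p z^-1)(1 - conj(p) z^-1)."""
    return [1.0, -2.0 * pole.real, abs(pole) ** 2]


fs = 8000  # assumed sampling rate in Hz
p = pole_from_formant(500.0, 80.0, fs)
print(formant_from_pole(p, fs))
print(resonator_denominator(p))
```

The closer the pole lies to the unit circle, the narrower the bandwidth and the sharper the corresponding peak in the spectral envelope.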
6. The complete digital model of speech signal production (be able to draw the diagram and explain the characteristics of each part)
The speech signal can be regarded as the output of a linear shift-invariant system excited by either a quasi-periodic pulse sequence or a random noise sequence. The model has three parts: the excitation model, the vocal tract model, and the radiation model.
Complete digital model of the speech signal (key point)
First, the excitation model
a. Voiced excitation: as the airflow passes the tensed vocal cords it sets them vibrating, forming a periodic pulse train at the glottis that excites the vocal tract.
Each pulse in the train resembles a skewed triangular pulse, so the excitation is generated from a unit impulse train whose period is the pitch period.
b. Unvoiced excitation: the vocal cords are relaxed and do not vibrate, so the airflow passes straight through the glottis into the vocal tract.
When an unvoiced sound is produced, a constriction in the vocal tract makes the airflow turbulent, so the excitation can be modeled as random white noise.
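The two excitation sources can be sketched as follows. This is a minimal illustration that leaves out glottal pulse shaping; the function names, the fixed random seed, and the 80-sample pitch period are assumptions.

```python
import random


def voiced_excitation(n_samples, pitch_period):
    """Unit impulse train: 1.0 every pitch_period samples, 0.0 elsewhere."""
    if pitch_period <= 0:
        raise ValueError("pitch period must be positive")
    return [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_samples)]


def unvoiced_excitation(n_samples, seed=0):
    """Zero-mean, unit-variance Gaussian white noise (seeded for repeatability)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]


# 200 samples of each source; an 80-sample period is 100 Hz at an assumed 8 kHz rate.
voiced = voiced_excitation(200, 80)
unvoiced = unvoiced_excitation(200)
print([n for n, v in enumerate(voiced) if v == 1.0])  # [0, 80, 160]
```

In the full model each source is scaled by its own gain before it drives the vocal tract filter.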
Second, the vocal tract model
a. Acoustic tube model: the vocal tract is treated as a chain of connected tubes with different cross-sectional areas.
b. Formant model: the vocal tract is treated as a resonant cavity, and the formants are the resonant frequencies of this cavity.
Cascade type
Suitable for ordinary vowels. The vocal tract is treated as a set of second-order resonators connected in series, which gives an all-pole model.
Parallel type
Suitable for other vowels and most consonants. When these sounds are produced the vocal tract shows anti-resonances, so zeros must be added to the model.
This gives a pole-zero model.
Hybrid type
The series or parallel path is selected automatically according to the sound being produced. The parallel part also contains a direct path whose amplitude control factor is AB.
The direct path is intended for phonemes with a relatively flat spectrum, such as [f], [p], [b].
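The cascade type can be sketched as a chain of second-order all-pole resonators. The formant frequencies and bandwidths below are illustrative assumptions, and normalizing each section to unity gain at 0 Hz is one common convention rather than something the notes specify.

```python
import math


def resonator_coeffs(freq_hz, bandwidth_hz, fs):
    """Return (gain, a1, a2) for H(z) = gain / (1 + a1 z^-1 + a2 z^-2)."""
    r = math.exp(-math.pi * bandwidth_hz / fs)
    theta = 2.0 * math.pi * freq_hz / fs
    a1 = -2.0 * r * math.cos(theta)
    a2 = r * r
    gain = 1.0 + a1 + a2  # makes the response equal to 1 at 0 Hz
    return gain, a1, a2


def resonator(x, freq_hz, bandwidth_hz, fs):
    """Filter the sequence x through one second-order resonator."""
    gain, a1, a2 = resonator_coeffs(freq_hz, bandwidth_hz, fs)
    y1 = y2 = 0.0  # previous two output samples
    out = []
    for sample in x:
        y = gain * sample - a1 * y1 - a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def cascade_vocal_tract(x, formants, fs):
    """Pass x through one resonator per (frequency, bandwidth) pair, in series."""
    for freq_hz, bandwidth_hz in formants:
        x = resonator(x, freq_hz, bandwidth_hz, fs)
    return x


fs = 8000  # assumed sampling rate in Hz
formants = [(500.0, 80.0), (1500.0, 90.0), (2500.0, 120.0)]  # assumed values
impulse = [1.0] + [0.0] * 1999
impulse_response = cascade_vocal_tract(impulse, formants, fs)
print(sum(impulse_response))  # close to 1.0, the gain at 0 Hz
```

A parallel or hybrid structure would instead sum the outputs of separately weighted resonators, which is how zeros enter the transfer function.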
Third, the radiation model
As the airflow formed in the vocal tract radiates from the lips to the listener's ear, the sound signal is attenuated, and the radiation has a high-pass filtering characteristic.
A first-order digital high-pass filter is commonly used to model it.
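A first-order high-pass filter of this kind can be sketched as a first difference, R(z) = 1 - r z^-1. The coefficient value r = 0.98 and the function names are assumptions; the notes do not give a specific filter.

```python
import cmath
import math


def radiation_filter(x, r=0.98):
    """Apply y[n] = x[n] - r * x[n-1] to the sequence x."""
    out = []
    prev = 0.0
    for sample in x:
        out.append(sample - r * prev)
        prev = sample
    return out


def radiation_gain(freq_hz, fs, r=0.98):
    """Magnitude of R(z) = 1 - r z^-1 at the given frequency."""
    z_inv = cmath.exp(-2j * math.pi * freq_hz / fs)
    return abs(1.0 - r * z_inv)


fs = 8000  # assumed sampling rate in Hz
for f in (100.0, 1000.0, 3000.0):
    print(f, radiation_gain(f, fs))
```

The gain grows with frequency, so low frequencies are attenuated relative to high ones, which is the high-pass behaviour attributed to lip radiation.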
Model summary:
1. This model is not complete: it does not suit some sounds, such as voiced fricatives, which need voiced and unvoiced excitation at the same time, combined in a way that is not a simple superposition. Such sounds can be simulated with a more accurate model.
2. The gain controls (Av or AN) in the digital model of speech production represent the acoustic intensity of the output speech;
the time-varying linear system mainly models the characteristics of the vocal tract;
3. Two basic problems in digital speech processing, namely speech analysis and speech synthesis, are based on this model;
4. Features of this digital model:
System parameters are treated as fixed over a short interval, which is the basis of short-time analysis;
All-pole nature: a zero can be approximated by multiple poles;
The excitation source and the vocal tract are independent of each other, which suits most digital speech processing.
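The claim that a zero can be approximated by multiple poles follows from the geometric series: for |a| < 1, 1 - a z^-1 = 1 / (1 + a z^-1 + a^2 z^-2 + ...), and truncating the series leaves an all-pole filter. The sketch below compares the two frequency responses numerically; the value a = 0.5 and the function names are assumptions.

```python
import cmath
import math


def zero_response(a, omega):
    """Frequency response of the single zero 1 - a z^-1 at z = e^{j omega}."""
    return 1.0 - a * cmath.exp(-1j * omega)


def all_pole_response(a, order, omega):
    """Response of 1 / (1 + a z^-1 + ... + a^order z^-order), which has `order` poles."""
    denom = sum((a ** k) * cmath.exp(-1j * omega * k) for k in range(order + 1))
    return 1.0 / denom


def max_relative_error(a, order, n_points=256):
    """Largest relative mismatch between the two responses over 0..pi."""
    worst = 0.0
    for i in range(n_points):
        omega = math.pi * i / (n_points - 1)
        exact = zero_response(a, omega)
        approx = all_pole_response(a, order, omega)
        worst = max(worst, abs(approx - exact) / abs(exact))
    return worst


for order in (2, 5, 10):
    print(order, max_relative_error(0.5, order))
```

The mismatch shrinks as poles are added, and it shrinks more slowly when the zero lies closer to the unit circle.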
7. Definition and characteristics of narrowband and wideband spectrograms (understand thoroughly)
Spectrogram: a time-frequency picture of the speech signal. The horizontal axis is time, the vertical axis is frequency, and the darkness of each point shows the energy at that time and frequency.
Narrowband spectrogram: a spectrogram is computed with the Fourier transform. With a longer analysis window (about 20 ms, corresponding to a bandwidth of about 45 Hz), the frequency resolution is high and the individual harmonics are visible: the spectrogram shows evenly spaced dark and light horizontal stripes whose spacing is the fundamental frequency (F0).
Wideband spectrogram: with fewer samples in the transform (an analysis window of about 3 ms, corresponding to a bandwidth of about 300 Hz), the harmonics are no longer resolved and the evenly spaced horizontal stripes disappear. The frequency resolution is lower, but the time resolution is higher, and vertical striations become visible.
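The trade-off between window length and frequency resolution can be seen in a minimal short-time Fourier transform sketch. The Hamming window, the 5 ms hop, the 8 kHz sampling rate, and the 500 Hz test tone are assumptions. The DFT bin spacing fs / N is only a rough stand-in for the analysis bandwidth, which also depends on the window shape.

```python
import cmath
import math


def hamming(n):
    """Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def dft_magnitudes(frame):
    """Magnitudes of DFT bins 0..N/2 of a real frame (direct O(N^2) sum)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
        mags.append(abs(acc))
    return mags


def spectrogram(signal, fs, window_ms, hop_ms):
    """Return (frames, bin_spacing_hz); each frame is a list of bin magnitudes."""
    win_len = int(round(fs * window_ms / 1000.0))
    hop = int(round(fs * hop_ms / 1000.0))
    window = hamming(win_len)
    frames = []
    for start in range(0, len(signal) - win_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(win_len)]
        frames.append(dft_magnitudes(frame))
    return frames, fs / win_len


fs = 8000  # assumed sampling rate in Hz
tone = [math.sin(2.0 * math.pi * 500.0 * n / fs) for n in range(800)]
narrow, narrow_spacing = spectrogram(tone, fs, 20.0, 5.0)
wide, wide_spacing = spectrogram(tone, fs, 3.0, 5.0)
print(narrow_spacing, len(narrow[0]))  # finer frequency grid
print(wide_spacing, len(wide[0]))      # coarser frequency grid
```

With these assumed settings the 20 ms window gives 50 Hz bin spacing and the 3 ms window about 333 Hz, the same order as the 45 Hz and 300 Hz bandwidths quoted above.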
Formants:
In the frequency domain, formants lie where the energy is concentrated, which appears as the darker regions of the spectrogram.
When a vowel is pronounced the sound intensity is high and the vocal cords vibrate, so the fundamental frequency and its harmonics appear, the formants are clearly visible, and the energy is concentrated at low frequencies.
When a consonant is produced without vocal cord vibration, no harmonics are visible. Consonants usually have low intensity, so they appear lighter, and their energy is concentrated at high frequencies.
During silent gaps the spectrogram is blank.