WO2022218254A1 - Voice signal enhancement method, apparatus, and electronic device - Google Patents

Voice signal enhancement method, apparatus, and electronic device

Info

Publication number
WO2022218254A1
Authority
WO
WIPO (PCT)
Prior art keywords
signal
voice signal
gain
spectrum
power spectrum
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2022/086098
Other languages
English (en)
French (fr)
Inventor
杨闳博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Vivo Mobile Communication Co Ltd
Original Assignee
Vivo Mobile Communication Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Vivo Mobile Communication Co Ltd
Priority to EP22787480.7A (patent EP4325487A4)
Publication of WO2022218254A1
Priority to US18/484,927 (patent US12597433B2)

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0224 Processing in the time domain
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0232 Processing in the frequency domain
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/21 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being power information
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/93 Discriminating between voiced and unvoiced parts of speech signals

Definitions

  • the present application belongs to the field of communication technologies, and in particular relates to a voice signal enhancement method, device and electronic device.
  • electronic devices can improve a noisy voice signal by performing noise reduction on it.
  • the noise component can be removed from the noisy voice signal, so as to ensure the quality of the obtained voice signal.
  • however, the quality of the original speech signal in the noisy speech signal may be damaged during noise reduction, so that the original speech signal obtained by the electronic device is distorted, thereby causing the speech signal output by the electronic device to be of poor quality.
  • the purpose of the embodiments of the present application is to provide a voice signal enhancement method, device and electronic device, which can solve the problem of poor quality of the voice signal output by the electronic device.
  • an embodiment of the present application provides a voice signal enhancement method.
  • the voice signal enhancement method includes: performing noise reduction processing on a first voice signal according to a first time spectrum and a first power spectrum to obtain a second voice signal, where the first time spectrum is used to indicate the time domain feature and frequency domain feature of the first voice signal, and the first power spectrum is the power spectrum of the noise signal in the first voice signal; determining a voiced signal from the second voice signal and performing gain compensation on the voiced signal, where the voiced signal is a signal in the second voice signal whose cepstral coefficient is greater than or equal to a preset threshold; and determining a damage compensation gain of the second voice signal according to the gain-compensated voiced signal, and performing gain compensation on the second voice signal based on the damage compensation gain.
  • an embodiment of the present application provides a voice signal enhancement apparatus, where the voice signal enhancement apparatus includes: a processing module, a determination module, and a compensation module.
  • the processing module is configured to perform noise reduction processing on the first voice signal according to the first time spectrum and the first power spectrum to obtain a second voice signal, where the first time spectrum is used to indicate the time domain feature and frequency domain feature of the first voice signal, and the first power spectrum is the power spectrum of the noise signal in the first voice signal.
  • the determining module is configured to determine a voiced signal from the second voice signal obtained by the processing module, where the voiced signal is a signal whose cepstral coefficient is greater than or equal to a preset threshold in the second voice signal.
  • the compensation module is used to perform gain compensation on the voiced sound signal determined by the determination module.
  • the determining module is further configured to determine the damage compensation gain of the second speech signal according to the voiced signal after the gain compensation.
  • the compensation module is further configured to perform gain compensation on the second speech signal based on the damage compensation gain determined by the determination module.
  • embodiments of the present application provide an electronic device, the electronic device includes a processor, a memory, and a program or instruction stored on the memory and executable on the processor, where the program or instruction, when executed by the processor, implements the steps of the method according to the first aspect.
  • an embodiment of the present application provides a readable storage medium, where a program or an instruction is stored on the readable storage medium, and when the program or instruction is executed by a processor, the steps of the method according to the first aspect are implemented.
  • an embodiment of the present application provides a chip, the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to run a program or an instruction to implement the method described in the first aspect.
  • in the embodiments of the present application, after the second voice signal is obtained by noise reduction, a voiced signal is determined from the second voice signal so that gain compensation can be performed on the voiced signal, and an impairment compensation gain of the second voice signal is determined according to the gain-compensated voiced signal so that gain compensation can be performed on the second voice signal based on the impairment compensation gain.
  • the electronic device can first perform noise reduction processing on the noisy speech signal (for example, the first speech signal) to reduce its noise components and obtain a pure original speech signal; the electronic device can then perform damage gain compensation on the obtained original speech signal to correct the speech damage produced during noise reduction, so as to obtain the final enhanced speech signal. In this way, distortion of the original speech signal obtained by the electronic device can be avoided, thereby improving the quality of the speech signal output by the electronic device.
  • FIG. 1 is one of the schematic diagrams of a voice signal enhancement method provided by an embodiment of the present application.
  • FIG. 2 is the second schematic diagram of a voice signal enhancement method provided by an embodiment of the present application.
  • FIG. 3 is a third schematic diagram of a voice signal enhancement method provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of a voice signal enhancement apparatus provided by an embodiment of the present application.
  • FIG. 6 is a second schematic diagram of a hardware structure of an electronic device provided by an embodiment of the present application.
  • the terms "first", "second" and the like in the description and claims of the present application are used to distinguish similar objects, and are not used to describe a specific order or sequence. It is to be understood that the data so used are interchangeable under appropriate circumstances, so that the embodiments of the present application can be practiced in sequences other than those illustrated or described herein. The objects distinguished by "first", "second", etc. are usually of one type, and the number of objects is not limited; for example, the first object may be one or more than one.
  • "and/or" in the description and claims indicates at least one of the connected objects, and the character "/" generally indicates that the associated objects are in an "or" relationship.
  • Cepstrum: a spectrum obtained by taking the Fourier transform of a signal, applying a logarithm to the resulting spectrum, and then applying the inverse Fourier transform.
  • MCRA: Minima Controlled Recursive Averaging
  • IMCRA: Improved Minima Controlled Recursive Averaging
  • FFT: Fast Fourier Transform
  • Short-time Fourier transform (STFT): a mathematical transformation related to the Fourier transform, used to determine the frequency and phase of the sine waves in a local region of a time-varying signal.
  • the short-time Fourier transform truncates the original signal into multiple segments in the time domain and performs the Fourier transform on each segment to obtain the frequency domain characteristics of each segment (that is, the correspondence between the time domain and the frequency domain is known at the same time).
  • Minimum mean-square error estimation (minimum mean-square error, MMSE): Based on a given observation value, an estimate of a random variable is obtained.
  • the common method in the existing estimation theory is to seek a transformation function to minimize the mean square error.
  • Minimum mean-square error log-spectral amplitude (MMSE-LSA): first, the speech signal is divided into frames according to its quasi-stationary characteristics, so that each frame can be treated as stationary; then the short-time spectrum of each frame is computed and its characteristic parameters are extracted; next, a speech detection algorithm judges whether each frame is a noise signal or a noisy speech signal, and the MMSE method is used to estimate the short-time spectral amplitude of the pure speech signal; finally, exploiting the fact that the human ear is insensitive to speech phase, the speech signal is reconstructed from the short-time spectral phase of the speech signal and the estimated short-time spectral amplitude, thereby obtaining the enhanced speech signal.
  • voice enhancement technologies based on voice noise reduction have been gradually applied.
  • among traditional speech enhancement techniques, spectral subtraction, Wiener filtering, and statistical-model-based noise reduction methods are widely used because they are simple, effective, and computationally cheap.
  • the single-microphone noise reduction scheme obtains the prior signal-to-noise ratio and the posterior signal-to-noise ratio by estimating the noise power spectrum in the input signal, then uses a traditional noise reduction method to calculate the noise reduction gain and applies it to the input signal to obtain the noise-reduced speech signal.
  • the multi-microphone noise reduction scheme uses spatial information to beamform the input multi-channel signals; the single-microphone noise reduction scheme is then applied to the single-channel signal produced by beamforming, that is, a traditional noise reduction method is used to calculate the noise reduction gain, which is applied to the beamformed signal to obtain the noise-reduced speech signal.
  • the technical implementation of the traditional noise reduction method is described below by taking the single-microphone noise reduction scheme as an example.
  • the noisy speech signal received by the microphone is $y(t) = x(t) + n(t)$, where $x(t)$ is the clean speech signal and $n(t)$ is the additive random noise.
  • the posterior signal-to-noise ratio $\gamma(f,k)$ (which can also be written as $\gamma(f)$) is given by formula 3: $\gamma(f,k) = P_{yy}(f,k) / P_{nn}(f,k)$
  • the prior signal-to-noise ratio $\xi(f,k)$ (which can also be written as $\xi(f)$) is given by formula 4: $\xi(f,k) = P_{xx}(f,k) / P_{nn}(f,k)$
  • $P_{nn}(f,k)$ is the estimated value of the noise power spectrum
  • $P_{yy}(f,k)$ is the power spectrum of the noisy speech signal (known)
  • $P_{xx}(f,k)$ is the power spectrum of the clean speech signal (unknown)
  • the commonly used strategy for noise power spectrum estimation is as follows: first, voice activity detection is performed on the input signal (that is, the noisy speech signal); in time periods containing only noise, the power spectrum of the noise signal in the input signal is set equal to the power spectrum of the pure noise signal; in time periods containing only speech, the power spectrum of the noise signal is not updated; in time periods between the two, the power spectrum of the noise signal is updated according to a specific constant.
  • this estimation strategy may follow the noise power spectrum estimation methods of MCRA and IMCRA.
  • the prior signal-to-noise ratio $\xi(f,k)$ can be calculated from $\gamma(f,k)-1$, where $\gamma(f,k)$ is the posterior signal-to-noise ratio, by recursive smoothing with the prior signal-to-noise ratio $\xi(f,k-1)$ of the previous frame using the decision-directed method; the specific algorithm is:
  • the noise reduction gain G(f) can be calculated in the following ways:
  • the electronic device can obtain the voice signal after noise reduction as:
  • the traditional noise reduction method can obtain sufficient noise reduction gain and ensure small speech distortion.
  • however, in loud-noise, low signal-to-noise-ratio scenarios (that is, where the power of the clean speech signal is less than or equal to the power of the noise signal) and in scenarios where the noise intensity and probability distribution vary over time (such as a passing car or a subway train starting), it is difficult to achieve accurate and real-time noise power spectrum estimation.
  • the estimate is limited by factors such as the accuracy and convergence time of the voice activity detection and noise power spectrum estimation methods themselves, which may lead to deviations in the noise power spectrum estimate.
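The quantities used by the traditional single-microphone scheme above can be illustrated with a minimal sketch. It assumes the textbook Wiener and power spectral subtraction gains, which are not necessarily the gain formulas of this application; the function name `snr_and_gains` and the three-bin test frame are hypothetical.

```python
import numpy as np

def snr_and_gains(p_yy, p_nn, p_xx_est, eps=1e-12):
    """Per-bin posterior/prior SNR and two textbook noise reduction gains."""
    gamma = p_yy / (p_nn + eps)            # posterior SNR: noisy power over noise power
    xi = p_xx_est / (p_nn + eps)           # prior SNR: clean power over noise power
    wiener = xi / (1.0 + xi)               # Wiener filter gain (assumed textbook form)
    # Power spectral subtraction gain, floored at zero so it stays real.
    spec_sub = np.sqrt(np.maximum(1.0 - 1.0 / np.maximum(gamma, eps), 0.0))
    return gamma, xi, wiener, spec_sub

# Hypothetical three-bin frame: noise only, equal speech and noise, speech-dominated.
p_nn = np.array([1.0, 1.0, 1.0])
p_xx = np.array([0.0, 1.0, 9.0])
gamma, xi, wiener, spec_sub = snr_and_gains(p_xx + p_nn, p_nn, p_xx)
```

In this idealized case, where the noisy power is exactly the sum of clean and noise power, the squared spectral subtraction gain coincides with the Wiener gain; with an estimated noise power spectrum the two generally differ.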
  • the electronic device may perform frame-by-frame windowing and a Fast Fourier Transform (FFT) on the acquired noisy speech signal, so as to convert the noisy speech signal from a time-domain signal to a frequency-domain signal and obtain the time spectrum of the noisy speech signal; it then determines the power spectrum of the noisy speech signal from this time spectrum, and obtains the power spectrum of the noise signal in the noisy speech signal by recursively smoothing the minimum value of the power spectrum of the noisy speech signal.
  • the electronic device can convert the noise-reduced speech signal from the time-frequency domain to the cepstral domain by performing homomorphic forward analysis on it, thereby obtaining the cepstral coefficients of the noise-reduced speech signal.
  • the signal corresponding to the larger cepstral coefficient among these cepstral coefficients is determined as the voiced signal, and the cepstral coefficient of the voiced signal is then gain-amplified to perform gain compensation on the voiced signal, so as to obtain the enhanced logarithmic time spectrum of the noise-reduced speech signal.
  • the electronic device can then obtain the damage compensation gain from the difference between the logarithmic time spectra before and after the homomorphic filtering enhancement, and perform gain compensation on the noise-reduced speech signal according to the damage compensation gain to obtain the final enhanced speech signal.
  • the electronic device can first perform noise reduction processing on the noisy speech signal (for example, the first speech signal) to reduce its noise components and obtain a pure original speech signal; the electronic device can then perform damage gain compensation on the obtained original speech signal to correct the speech damage produced during noise reduction, so as to obtain the final enhanced speech signal. In this way, distortion of the original speech signal obtained by the electronic device can be avoided, thereby improving the quality of the speech signal output by the electronic device.
  • FIG. 1 shows a flowchart of a voice signal enhancement method provided by an embodiment of the present application, and the method can be applied to an electronic device.
  • the voice signal enhancement method provided by this embodiment of the present application may include the following steps 201 to 204.
  • Step 201 The electronic device performs noise reduction processing on the first voice signal according to the first time spectrum and the first power spectrum to obtain a second voice signal.
  • the first time spectrum is used to indicate the time domain feature and the frequency domain feature of the first voice signal
  • the first power spectrum is the power spectrum of the noise signal in the first voice signal
  • while the user is making a voice call through the electronic device, the electronic device can detect the voice signal of the call in real time to obtain a noisy voice signal (for example, the first voice signal), and perform noise reduction processing on the noisy voice signal according to its signal parameters (such as the time spectrum of the entire noisy voice signal and the power spectrum of the noise signal in the noisy voice signal) to obtain the noise-reduced voice signal, thereby realizing gain compensation for the noisy voice signal.
  • the above-mentioned first time spectrum can be understood as the time spectrum of the frequency domain signal corresponding to the first speech signal (for example, the frequency domain signal obtained by the short-time Fourier transform of the first speech signal described in the following embodiments).
  • that the first time spectrum is used to indicate the time domain feature and frequency domain feature of the first voice signal can be understood as follows: the first time spectrum reflects both the time domain feature and the frequency domain feature of the first voice signal.
  • the voice signal enhancement method provided by the embodiment of the present application further includes the following steps 301 to 303.
  • Step 301 The electronic device performs short-time Fourier transform on the first voice signal to obtain a first time spectrum.
  • the electronic device converts the first voice signal received through the microphone into a digital signal and applies a short-time Fourier transform (i.e., frame-by-frame windowing followed by a Fast Fourier Transform (FFT)) to convert it from a time-domain signal to a frequency-domain signal, where $Y_1(f,k)$ is the frequency domain signal corresponding to the first speech signal and $y(n)$ is the first speech signal (the time domain signal), thereby obtaining the time spectrum of the first speech signal.
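Step 301 (frame-by-frame windowing followed by an FFT) can be sketched as follows. The frame length, hop size, and Hann window are assumptions chosen for illustration; the text does not specify them, and the function name `stft` is hypothetical.

```python
import numpy as np

def stft(y, frame_len=256, hop=128):
    """Frame-by-frame windowing followed by an FFT of each frame."""
    window = np.hanning(frame_len)                       # assumed analysis window
    n_frames = 1 + (len(y) - frame_len) // hop
    frames = np.stack([y[k * hop:k * hop + frame_len] * window
                       for k in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T                 # shape: (frequency bins f, frames k)

# Hypothetical input: one second of a 1 kHz tone sampled at 8 kHz.
fs = 8000
t = np.arange(fs) / fs
y = np.sin(2 * np.pi * 1000 * t)
Y1 = stft(y)                                             # time spectrum of the signal
```

With these settings each frequency bin is 31.25 Hz wide, so the 1 kHz tone falls on bin 32 of every frame.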
  • Step 302 The electronic device determines the power spectrum of the first voice signal according to the first time spectrum, and determines the target power spectrum from the power spectrum of the first voice signal.
  • the above-mentioned target power spectrum is the power spectrum of the signal with the smallest power spectrum among the signals within the preset time window.
  • the electronic device may use a first preset algorithm (Formula 11 below) to determine the power spectrum $P_{yy}(f,k)$ of the first voice signal from the time spectrum of the first voice signal, and then determine the target power spectrum $P_{ymin}(f)$ from $P_{yy}(f,k)$.
  • the signal within the preset time window may be the entire first voice signal or a part of the first voice signal.
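Step 302 can be sketched as a per-bin minimum search over a window of frames. The squared magnitude is used here as an assumed stand-in for the first preset algorithm, whose formula is not reproduced in this text; `target_power_spectrum` and `window_frames` are hypothetical names.

```python
import numpy as np

def target_power_spectrum(Y1, window_frames):
    """Per-bin minimum of the power spectrum over the last `window_frames` frames."""
    p_yy = np.abs(Y1) ** 2                          # assumed stand-in for the first preset algorithm
    p_ymin = p_yy[:, -window_frames:].min(axis=1)   # minimum within the preset time window
    return p_yy, p_ymin

# Hypothetical time spectrum with 2 frequency bins and 3 frames.
Y1 = np.array([[3 + 4j, 1 + 0j, 2 + 0j],
               [0 + 1j, 0 + 2j, 0 + 3j]])
p_yy, p_ymin = target_power_spectrum(Y1, window_frames=2)
```

Setting `window_frames` to the total number of frames corresponds to the case where the preset time window covers the entire first voice signal.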
  • Step 303 The electronic device performs recursive smoothing processing on the target power spectrum to obtain a first power spectrum.
  • the electronic device may perform recursive smoothing on the target power spectrum $P_{ymin}(f)$ with the smoothing coefficient $\alpha_s$ to obtain the power spectrum $P_{nn}(f)$ of the noise signal in the first speech signal (i.e., the first power spectrum); the recursive smoothing algorithm is:
  • the smoothing coefficient $\alpha_s$ is controlled by the speech existence probability of the current frame.
  • $\alpha_s$ is close to 0.
  • the noisy speech signal is composed of a pure speech signal and a noise signal.
  • the pure speech signal and the noise signal in the noisy speech signal can be determined, that is, which frames of the noisy speech signal are pure speech signals and which frames are noise signals.
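A minimal sketch of one recursive smoothing step for step 303 follows. The update form and the direction of the dependence are assumptions: the smoothing coefficient (called `alpha_s` here) is taken equal to the speech presence probability, so the noise estimate follows the target power spectrum in noise-only frames and is held while speech is present. The smoothing formula of this application is not reproduced in the text and may differ.

```python
import numpy as np

def smooth_noise_psd(p_nn_prev, p_ymin, speech_prob):
    """One recursive smoothing step of the noise power spectrum for the current frame."""
    # Assumption: alpha_s equals the speech presence probability, clipped to [0, 1].
    alpha_s = np.clip(speech_prob, 0.0, 1.0)
    return alpha_s * p_nn_prev + (1.0 - alpha_s) * p_ymin

p_nn_prev = np.array([2.0, 2.0])     # hypothetical previous noise estimate
p_ymin = np.array([1.0, 4.0])        # hypothetical target power spectrum
noise_only = smooth_noise_psd(p_nn_prev, p_ymin, speech_prob=0.0)   # follows p_ymin
speech = smooth_noise_psd(p_nn_prev, p_ymin, speech_prob=1.0)       # keeps p_nn_prev
mixed = smooth_noise_psd(p_nn_prev, p_ymin, speech_prob=0.5)        # halfway between
```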
  • the electronic device may perform a short-time Fourier transform on the first voice signal (that is, the noisy voice signal) picked up by the microphone to obtain the time spectrum of the first voice signal (that is, the first time spectrum), determine the power spectrum of the first voice signal from the first time spectrum using the first preset algorithm, and determine from it the power spectrum of the signal with the smallest power spectrum within the preset time window (that is, the target power spectrum). Recursive smoothing of the target power spectrum then yields the power spectrum of the noise signal in the first voice signal (that is, the first power spectrum), so that the electronic device can perform noise reduction processing on the first voice signal using the first time spectrum and the first power spectrum.
  • step 201 may be specifically implemented through the following steps 201a to 201c.
  • Step 201a the electronic device determines the posterior signal-to-noise ratio corresponding to the first voice signal according to the first power spectrum and the power spectrum of the first voice signal, and performs recursive smoothing processing on the posterior signal-to-noise ratio to obtain the prior signal-to-noise ratio corresponding to the first voice signal.
  • Step 201b the electronic device determines the target noise reduction gain according to the posterior SNR and the prior SNR.
  • the target noise reduction gain $G_1(f,k)$ can be calculated from the prior signal-to-noise ratio and the posterior signal-to-noise ratio, and the specific algorithm is:
  • Step 201c the electronic device performs noise reduction processing on the first voice signal according to the first time spectrum and the target noise reduction gain to obtain a second voice signal.
  • the electronic device may use the second preset algorithm (the following formula 17) to perform noise reduction processing on the first voice signal (that is, the frequency domain signal corresponding to the first voice signal) according to the first time spectrum and the target noise reduction gain, so as to obtain the second voice signal $Y_2(f,k)$ (that is, the signal obtained after noise reduction processing of the frequency domain signal corresponding to the first voice signal).
  • the electronic device may determine the posterior signal-to-noise ratio corresponding to the first voice signal according to the power spectrum of the noise signal in the first voice signal and the power spectrum of the first voice signal, and obtain the prior signal-to-noise ratio corresponding to the first voice signal by recursively smoothing the posterior signal-to-noise ratio. It then determines the target noise reduction gain from the posterior and prior signal-to-noise ratios, and uses the second preset algorithm to perform noise reduction processing on the first voice signal according to the time spectrum of the first voice signal and the target noise reduction gain, obtaining the noise-reduced voice signal. In this way, noise reduction processing reduces the noise component in the noisy voice signal, a pure original voice signal is obtained, and the quality of the voice signal output by the electronic device is improved.
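Steps 201a to 201c can be sketched end to end under stated assumptions: the prior SNR is obtained by first-order recursive smoothing of the posterior SNR minus one (floored at zero), and a Wiener gain stands in for the target noise reduction gain, since the gain formula of this application is not reproduced in the text. The function name `denoise` and the smoothing factor `alpha` are hypothetical.

```python
import numpy as np

def denoise(Y1, p_nn, alpha=0.9, eps=1e-12):
    """Return (Y2, G1) for a time spectrum Y1 (bins, frames) and a noise PSD p_nn (bins,)."""
    p_yy = np.abs(Y1) ** 2
    gamma = p_yy / np.maximum(p_nn[:, None], eps)   # step 201a: posterior SNR
    instant = np.maximum(gamma - 1.0, 0.0)          # instantaneous prior SNR estimate
    xi = np.empty_like(instant)
    prev = instant[:, 0]                            # initialise from the first frame
    for k in range(instant.shape[1]):               # step 201a: recursive smoothing over frames
        prev = alpha * prev + (1.0 - alpha) * instant[:, k]
        xi[:, k] = prev
    G1 = xi / (1.0 + xi)                            # step 201b: Wiener gain as a stand-in
    return G1 * Y1, G1                              # step 201c: apply the gain per bin

# Hypothetical input: bin 0 at the noise level, bin 1 well above it.
Y1 = np.array([[1, 1, 1], [3, 3, 3]], dtype=complex)
p_nn = np.array([1.0, 1.0])
Y2, G1 = denoise(Y1, p_nn)
```

The bin at the noise level is fully attenuated, while the bin with a prior SNR of 8 keeps a gain of 8/9, which shows how the gain trades noise suppression against speech attenuation.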
  • Step 202 The electronic device determines a voiced signal from the second speech signal, and performs gain compensation on the voiced signal.
  • the voiced signal is a signal whose cepstral coefficient is greater than or equal to a preset threshold in the second speech signal.
  • the electronic device may first determine the cepstral coefficients of the second speech signal and then determine the signal with a larger cepstral coefficient in the second speech signal as the voiced signal, so as to perform gain compensation on the voiced signal, thereby realizing gain compensation for the second speech signal.
  • the electronic device can preset a decision threshold (i.e., a preset threshold) for the voiced signal and determine, from the second speech signal, a signal whose cepstral coefficient is greater than or equal to the decision threshold as a voiced signal.
  • the voiced signal has obvious fundamental and harmonic characteristics in the time-frequency domain and the cepstral domain.
  • step 202 may be specifically implemented through the following steps 202a to 202c.
  • Step 202a the electronic device performs homomorphic forward analysis processing on the second speech signal to obtain the target cepstral coefficient of the second speech signal.
  • the target cepstral coefficient includes at least one cepstral coefficient, and each cepstral coefficient corresponds to a frame of signal in the second speech signal. It should be noted that the electronic device may divide the second voice signal into at least one voice segment, and each voice segment may be understood as one frame of the second voice signal.
  • the electronic device may perform homomorphic forward analysis processing on the frequency domain signal $Y_2(f,k)$ corresponding to the second speech signal to obtain the cepstral coefficient $Q(c,k)$ of the second speech signal, where $c$ is the time index of the cepstral coefficient, and the specific algorithm is:
  • as shown in (A) in FIG. 2, a waveform diagram of the first speech signal (which may also be referred to as a noisy speech time-domain signal) is shown; after noise reduction, the second voice signal is obtained, and its logarithmic time spectrum, shown in (B) in FIG. 2, is obtained through logarithmic calculation; homomorphic forward analysis processing is then performed to obtain the cepstrum of the second speech signal, shown in (C) in FIG. 2 (the horizontal axis is the time index, and the vertical axis is the cepstral coefficient).
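The homomorphic positive (forward) analysis of step 202a can be sketched from the cepstrum definition given earlier: take the logarithm of the magnitude spectrum and apply an inverse Fourier transform. This is a sketch of that definition, not necessarily the exact formula of this application; the function name `cepstrum`, the small constant `eps`, and the synthetic frame are assumptions.

```python
import numpy as np

def cepstrum(Y2, eps=1e-12):
    """Cepstral coefficients Q(c, k) from a one-sided spectrum Y2 of shape (bins, frames)."""
    log_spec = np.log(np.abs(Y2) + eps)     # logarithmic time spectrum; eps avoids log(0)
    return np.fft.irfft(log_spec, axis=0)   # inverse transform along frequency gives the cepstrum

# Hypothetical frame whose log spectrum ripples with a period of 64/10 bins,
# which places a cepstral peak at time index c = 10.
f = np.arange(33)
Y2 = np.exp(1.0 + np.cos(2 * np.pi * 10 * f / 64))[:, None]
Q = cepstrum(Y2)
```

A harmonic (voiced) spectrum has regularly spaced peaks, so its log spectrum is roughly periodic along frequency, and that periodicity shows up as a single strong cepstral coefficient at the pitch period.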
  • Step 202b the electronic device determines the maximum cepstral coefficient from the target cepstral coefficient, and determines the signal corresponding to the maximum cepstral coefficient in the second speech signal as a voiced signal.
  • each frame of signal in the second speech signal corresponds to a cepstral coefficient; the electronic device can search for the maximum cepstral coefficient among the obtained at least one cepstral coefficient, so that the frame of signal corresponding to the maximum cepstral coefficient is determined to be a voiced signal.
  • the electronic device may preset the search range of the speech pitch frequency to 70 Hz to 400 Hz, so that the corresponding range of cepstral time indices is [Fs/400, Fs/70], where Fs is the sampling frequency. The electronic device searches the cepstral coefficients within this range of the target cepstral coefficients for the maximum cepstral coefficient $Q_{max}$, whose time index is $c_{max}$. Assuming the discrimination threshold of the voiced signal is $h$, when $Q_{max}(c,k) > h$ the signal corresponding to the maximum cepstral coefficient is determined to be a voiced signal (for example, the signal at the pitch period position in (C) in FIG. 2); the voiced signal has obvious fundamental and harmonic characteristics in the frequency domain and the cepstral domain.
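The search in step 202b can be sketched for a single frame. The 70 Hz to 400 Hz range and the threshold comparison come from the text; the function name `find_voiced`, the rounding of the range bounds to whole samples, and the test values are assumptions.

```python
import numpy as np

def find_voiced(Q_frame, fs, h, f_lo=70.0, f_hi=400.0):
    """Search the pitch range of one frame's cepstrum and apply the voiced threshold h."""
    c_lo = int(np.ceil(fs / f_hi))                 # shortest pitch period in samples (Fs/400)
    c_hi = int(np.floor(fs / f_lo))                # longest pitch period in samples (Fs/70)
    c_max = c_lo + int(np.argmax(Q_frame[c_lo:c_hi + 1]))
    q_max = float(Q_frame[c_max])
    return q_max > h, c_max, q_max

fs = 16000
Q_frame = np.zeros(512)
Q_frame[5] = 2.0       # large value outside the pitch range, ignored by the search
Q_frame[100] = 0.3     # peak at a 100-sample period, i.e. 160 Hz at 16 kHz
is_voiced, c_max, q_max = find_voiced(Q_frame, fs, h=0.1)
```

Restricting the search to the pitch range keeps the large low-index coefficients, which describe the spectral envelope rather than the pitch, from being mistaken for a pitch peak.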
  • Step 202c the electronic device performs a gain amplification process on the maximum cepstral coefficient, so as to perform gain compensation on the voiced sound signal.
  • the electronic device when it is determined that a certain frame signal in the second speech signal is a voiced signal, the electronic device performs a gain amplification process on the maximum cepstral coefficient corresponding to the voiced signal, so as to realize gain compensation for the voiced signal.
  • the algorithm is:
  • g is the gain coefficient, and g is used to control the size of the compensation gain, for example, the value of g can be 1.5.
  • the electronic device may perform forward homomorphic analysis on the second speech signal to obtain the cepstral coefficients of the second speech signal, determine the maximum cepstral coefficient among these cepstral coefficients, and determine the signal corresponding to the maximum cepstral coefficient in the second speech signal to be a voiced signal. The electronic device can then perform gain compensation on the voiced signal by applying gain amplification to the maximum cepstral coefficient, thereby compensating the gain of the speech signal after noise reduction.
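The gain amplification of step 202c might look like the following sketch. The formula itself is not reproduced in this text, so the multiplicative form (coefficient times g) is an assumption, as are the function name `boost_pitch_peak` and the decision to scale the mirrored coefficient as well; the latter is a common precaution so that the cepstrum stays symmetric and the reconstructed logarithmic spectrum stays real.

```python
def boost_pitch_peak(cepstrum, c_max, g=1.5):
    """Return a copy of the cepstrum with the pitch peak amplified by g.

    Assumption: the gain is applied multiplicatively; g = 1.5 is the example
    value given in the text. The input list is left unmodified.
    """
    boosted = list(cepstrum)
    boosted[c_max] *= g
    # Assumed extra step: scale the mirrored coefficient too, so that the
    # cepstrum of a real log spectrum remains symmetric.
    mirror = len(cepstrum) - c_max
    if 0 < mirror < len(cepstrum) and mirror != c_max:
        boosted[mirror] *= g
    return boosted
```

All other cepstral coefficients pass through unchanged, which matches the description of the "first cepstral coefficients" used later in step 203a.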
  • Step 203 The electronic device determines an impairment compensation gain of the second speech signal according to the voiced signal after the gain compensation, and performs gain compensation on the second speech signal based on the impairment compensation gain.
  • in step 203 above, determining the impairment compensation gain of the second speech signal according to the gain-compensated voiced signal can be specifically implemented by the following steps 203a and 203b.
  • Step 203a: the electronic device performs inverse homomorphic analysis on the first cepstral coefficients and the gain-amplified maximum cepstral coefficient to obtain a first logarithmic time-frequency spectrum.
  • the first cepstral coefficients are the target cepstral coefficients other than the maximum cepstral coefficient.
  • the electronic device performs inverse homomorphic analysis on the target cepstral coefficients other than the maximum cepstral coefficient, together with the gain-amplified maximum cepstral coefficient, to obtain the logarithmic time-frequency spectrum LY_2E(f,k) of the enhanced second speech signal (that is, the first logarithmic time-frequency spectrum); the specific algorithm is:
  • Step 203b: the electronic device determines the logarithmic time-frequency spectrum of the second voice signal from the time-frequency spectrum of the second voice signal, and determines the damage compensation gain from the difference between the first logarithmic time-frequency spectrum and the logarithmic time-frequency spectrum of the second voice signal.
  • the electronic device may determine the logarithmic time-frequency spectrum LY_2(f,k) of the second voice signal from the time-frequency spectrum of the second voice signal, and then determine the damage compensation gain from the difference between the logarithmic time-frequency spectrum of the enhanced second voice signal and the logarithmic time-frequency spectrum of the second voice signal.
  • the electronic device can use a function F to calculate the damage compensation gain from the logarithmic time-frequency spectra before and after the cepstral coefficient enhancement, that is,
  • the F function can be implemented in two ways.
  • in the first implementation, the difference between the logarithmic spectra is converted into a linear coefficient that serves as the damage compensation gain (specific algorithm: Formula 23); in the second implementation, a gain constraint range is added on top of the logarithmic spectrum difference, that is, the logarithmic spectrum difference is limited to the gain constraint range so as to control the maximum and minimum gain at each frequency point, ensuring that the damage compensation gain G_c(f,k) stays within a reasonable range.
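Both variants of the F function can be illustrated with one short Python sketch. The following is an assumption-laden example rather than the formula from the application: it assumes the logarithmic spectra use the natural logarithm of the amplitude, and the name `damage_compensation_gain` and the `limits` parameter are hypothetical.

```python
import math


def damage_compensation_gain(log_enhanced, log_original, limits=None):
    """Per-frequency-point gain from two logarithmic spectra of one frame.

    log_enhanced : values of the enhanced log spectrum (LY_2E in the text)
    log_original : values of the log spectrum before enhancement (LY_2)
    limits       : optional (min_diff, max_diff) gain constraint range, applied
                   to the log-spectrum difference (second implementation)
    Assumption: natural logarithm, so exp() converts back to a linear gain.
    """
    gains = []
    for ly2e, ly2 in zip(log_enhanced, log_original):
        diff = ly2e - ly2
        if limits is not None:
            lo, hi = limits
            diff = min(max(diff, lo), hi)  # clamp to the constraint range
        gains.append(math.exp(diff))       # log difference -> linear gain
    return gains
```

With `limits=None` the sketch corresponds to the first implementation; passing a range such as `(0.0, math.log(1.5))` would cap the gain at each frequency point between 1.0 and 1.5.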
  • the logarithmic time-frequency spectra before and after the inverse homomorphic analysis are shown, that is, the logarithmic time-frequency spectra before and after homomorphic filtering enhancement.
  • after the electronic device applies gain amplification to the maximum cepstral coefficient to perform gain compensation on the voiced signal, the electronic device can perform inverse homomorphic analysis on the target cepstral coefficients other than the maximum cepstral coefficient, together with the gain-amplified maximum cepstral coefficient, to obtain the logarithmic time-frequency spectrum of the enhanced second speech signal (that is, the first logarithmic time-frequency spectrum), as shown in (A) in FIG.
  • after performing gain compensation on the voiced signal in the second voice signal, the electronic device may determine the damage compensation gain of the second voice signal and perform gain compensation on the second speech signal based on that gain, so as to obtain the final enhanced speech signal, which improves the quality of the speech signal.
  • An embodiment of the present application provides a voice signal enhancement method.
  • after the electronic device performs noise reduction on the first voice signal according to the time-frequency spectrum of the first voice signal and the power spectrum of the noise signal in the first voice signal to obtain the second voice signal, it may determine the voiced signal from the second voice signal and perform gain compensation on the voiced signal, and may determine the damage compensation gain of the second voice signal according to the gain-compensated voiced signal, so that gain compensation is performed on the second voice signal based on the damage compensation gain.
  • the electronic device can first perform noise reduction on the noisy speech signal (for example, the first speech signal) to reduce its noise components and obtain a clean original speech signal; the electronic device can then perform damage gain compensation on the obtained original voice signal to correct the voice damage produced during noise reduction, so as to obtain the final enhanced voice signal. In this way, distortion of the original voice signal obtained by the electronic device can be avoided, thereby improving the quality of the voice signal output by the electronic device.
  • the total energy of the voice signal output by this solution (the signal after voice enhancement) is greater than the total energy of the input voice signal, and the spectrum of the voiced part (including the fundamental and harmonic components) of the output voice signal is larger than that of the input voice signal (that is, the output voice signal is enhanced). A traditional noise reduction method only attenuates the noise signal in the input voice signal, so the energy of its output voice signal is less than or equal to the energy of the input voice signal; the quality of the voice signal output by this solution is therefore higher than that of the traditional solution.
  • the second voice signal is a signal obtained by performing noise reduction processing on a target frequency domain signal
  • the target frequency domain signal is a signal obtained by performing short-time Fourier transform on the first voice signal.
  • Step 204 The electronic device performs inverse time-frequency transform processing on the gain-compensated second speech signal to obtain a target time-domain signal, and outputs the target time-domain signal.
  • an inverse time-frequency transform is applied to the gain-compensated second voice signal (that is, the enhanced frequency-domain signal) to obtain the voice-enhanced time-domain signal, thereby outputting the enhanced voice signal Y_3(f,k); the specific algorithm is:
  • the following uses MCRA and MMSE-LSA as examples to describe the noise reduction process.
  • the coefficient is controlled by the speech existence probability of the current frame signal.
  • the value of ⁇ s is close to 0.
  • the posterior signal-to-noise ratio γ(f,k) = P_yy(f,k)/P_nn(f,k)
  • the noise reduction gain G_1(f,k) is calculated from the prior SNR and the posterior SNR, namely
  • the electronic device can preset the search range of the speech pitch (fundamental frequency) to [70 Hz, 400 Hz]; the corresponding range of cepstral time indices is [Fs/400, Fs/70]. The maximum cepstral coefficient in the search range is recorded as Q_max, its time index is denoted c_max, and the discrimination threshold for a voiced signal is set to h.
  • when Q_max(c,k) > h, the current frame is determined to be a voiced signal, that is, the current frame has obvious fundamental and harmonic characteristics in the frequency domain and the cepstral domain.
  • the F function can be implemented in many ways. One is to convert the difference between the logarithmic spectra into a linear coefficient that is used as the damage compensation gain. Another is to add a gain constraint range on top of the logarithmic spectrum difference, that is, to limit the logarithmic spectrum difference to the gain constraint range so as to control the maximum and minimum gain at each frequency point, thereby ensuring that the damage compensation gain G_c(f,k) stays within a reasonable range.
  • the resulting signal Y_3(f,k) is processed by an inverse time-frequency transform to obtain the time-domain signal after speech enhancement.
  • the execution body may be a voice signal enhancement apparatus, or a control module in the voice signal enhancement apparatus for executing the voice signal enhancement method.
  • the voice signal enhancement method provided by the embodiment of the present application is described by taking the voice signal enhancement method performed by the voice signal enhancement device as an example.
  • FIG. 4 shows a possible schematic structural diagram of the apparatus for enhancing speech signals involved in the embodiments of the present application.
  • the voice signal enhancement apparatus 70 may include: a processing module 71 , a determination module 72 and a compensation module 73 .
  • the processing module 71 is configured to perform noise reduction on the first voice signal according to the first time-frequency spectrum and the first power spectrum to obtain a second voice signal, where the first time-frequency spectrum indicates the time-domain features and frequency-domain features of the first voice signal, and the first power spectrum is the power spectrum of the noise signal in the first speech signal.
  • the above determining module 72 is configured to determine a voiced signal from the second speech signal obtained by the processing module 71, where the voiced signal is a signal whose cepstral coefficient is greater than or equal to a preset threshold in the second speech signal.
  • the above compensation module 73 is used to perform gain compensation on the voiced sound signal determined by the determination module 72 .
  • the above determining module 72 is further configured to determine the damage compensation gain of the second speech signal according to the voiced signal after the gain compensation.
  • the compensation module 73 is further configured to perform gain compensation on the second speech signal based on the damage compensation gain determined by the determination module 72 .
  • An embodiment of the present application provides a voice signal enhancement device. Noise reduction can first be performed on a noisy voice signal (for example, a first voice signal) to reduce its noise components and obtain a clean original voice signal; damage gain compensation can then be performed on the obtained original voice signal to correct the voice damage produced during noise reduction, so as to obtain the final enhanced voice signal. In this way, distortion of the obtained original voice signal can be avoided, thereby improving the quality of the output voice signal.
  • the above-mentioned processing module 71 is further configured to perform a short-time Fourier transform on the first voice signal to obtain the first time-frequency spectrum, before noise reduction is performed on the first voice signal according to the first time-frequency spectrum and the first power spectrum.
  • the above-mentioned determination module 72 is further configured to determine the power spectrum of the first voice signal according to the first time-frequency spectrum, and determine the target power spectrum from the power spectrum of the first voice signal, where the target power spectrum is the power spectrum of the signal with the smallest power spectrum within the preset time window.
  • the above-mentioned processing module 71 is further configured to perform recursive smoothing on the target power spectrum determined by the determination module 72 to obtain the first power spectrum.
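The two operations just described (picking the minimum power within a preset time window, then recursive smoothing) can be sketched for a single frequency bin as follows. This is a simplified illustration, not the application's algorithm: the function name `estimate_noise_power`, the window length, and the smoothing constant `beta` are all assumed values chosen for readability.

```python
def estimate_noise_power(frame_powers, window=50, beta=0.9):
    """Track a noise power estimate for one frequency bin.

    frame_powers : per-frame power values of the noisy signal at this bin
    window       : number of most recent frames searched for the minimum
                   (assumed value; the text only says "preset time window")
    beta         : recursive smoothing constant (assumed value)
    Returns one noise power estimate per frame.
    """
    noise = []
    prev = None
    for k in range(len(frame_powers)):
        start = max(0, k - window + 1)
        target = min(frame_powers[start:k + 1])  # minimum within the window
        # First frame initializes the estimate; later frames are smoothed.
        prev = target if prev is None else beta * prev + (1.0 - beta) * target
        noise.append(prev)
    return noise
```

Because speech bursts rarely fill an entire window, the windowed minimum tends to follow the noise floor, and the smoothing step suppresses frame-to-frame jitter in that minimum.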
  • the above-mentioned processing module 71 is specifically configured to: determine the posterior signal-to-noise ratio corresponding to the first voice signal according to the first power spectrum and the power spectrum of the first voice signal, and perform recursive smoothing on the posterior signal-to-noise ratio to obtain the prior signal-to-noise ratio corresponding to the first speech signal; determine the target noise reduction gain according to the posterior signal-to-noise ratio and the prior signal-to-noise ratio; and perform noise reduction on the first voice signal according to the first time-frequency spectrum and the target noise reduction gain.
  • the above compensation module 73 is specifically configured to: perform forward homomorphic analysis on the second speech signal to obtain the target cepstral coefficients of the second speech signal; determine the maximum cepstral coefficient among the target cepstral coefficients, and determine the signal corresponding to the maximum cepstral coefficient in the second speech signal to be a voiced signal; and perform gain amplification on the maximum cepstral coefficient so as to perform gain compensation on the voiced signal.
  • the compensation module 73 is specifically configured to: perform inverse homomorphic analysis on the first cepstral coefficients and the gain-amplified maximum cepstral coefficient to obtain the first logarithmic time-frequency spectrum, where the first cepstral coefficients are the target cepstral coefficients other than the maximum cepstral coefficient; determine the logarithmic time-frequency spectrum of the second voice signal according to the time-frequency spectrum of the second voice signal; and determine the impairment compensation gain from the difference between the first logarithmic time-frequency spectrum and the logarithmic time-frequency spectrum of the second speech signal.
  • the above-mentioned second speech signal is a signal obtained by performing noise reduction processing on the target frequency domain signal
  • the above-mentioned target frequency domain signal is a signal obtained by performing short-time Fourier transform on the first speech signal
  • the voice signal enhancement apparatus 70 provided in the embodiment of the present application further includes an output module.
  • the above processing module 71 is specifically configured to, after the compensation module 73 performs gain compensation on the second speech signal based on the damage compensation gain, perform inverse time-frequency transform processing on the gain-compensated second speech signal to obtain the target time-domain signal.
  • the above-mentioned output module is used to output the target time domain signal obtained by the processing module 71 .
  • the voice signal enhancement device in this embodiment of the present application may be a device, or may be a component, an integrated circuit, or a chip in a terminal.
  • the apparatus may be a mobile electronic device or a non-mobile electronic device.
  • the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palmtop computer, an in-vehicle electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook, or a personal digital assistant (PDA);
  • the non-mobile electronic device may be a server, network attached storage (NAS), a personal computer (PC), a television (TV), a teller machine, or a self-service machine, which is not specifically limited in the embodiments of this application.
  • the voice signal enhancement device in the embodiment of the present application may be a device with an operating system.
  • the operating system may be the Android operating system, the iOS operating system, or another possible operating system, which is not specifically limited in the embodiments of the present application.
  • the voice signal enhancement apparatus provided in the embodiments of the present application can implement the various processes implemented in the foregoing method embodiments, and can achieve the same technical effect. In order to avoid repetition, details are not repeated here.
  • an embodiment of the present application further provides an electronic device 90, including a processor 91, a memory 92, and a program or instruction stored in the memory 92 and executable on the processor 91. When the program or instruction is executed by the processor 91, each process of the above method embodiment is implemented, and the same technical effect can be achieved. To avoid repetition, details are not repeated here.
  • the electronic devices in the embodiments of the present application include the aforementioned mobile electronic devices and non-mobile electronic devices.
  • FIG. 6 is a schematic diagram of a hardware structure of an electronic device implementing an embodiment of the present application.
  • the electronic device 100 includes but is not limited to components such as a radio frequency unit 101, a network module 102, an audio output unit 103, an input unit 104, a sensor 105, a display unit 106, a user input unit 107, an interface unit 108, a memory 109, and a processor 110.
  • the electronic device 100 may also include a power source (such as a battery) for supplying power to the various components; the power source may be logically connected to the processor 110 through a power management system, so that functions such as charging management, discharging management, and power consumption management are implemented through the power management system.
  • the structure of the electronic device shown in FIG. 6 does not constitute a limitation on the electronic device; the electronic device may include more or fewer components than those shown in the figure, combine some components, or arrange the components differently, which will not be repeated here.
  • the processor 110 is configured to: perform noise reduction on the first voice signal according to the first time-frequency spectrum and the first power spectrum to obtain a second voice signal, where the first time-frequency spectrum indicates the time-domain features and frequency-domain features of the first voice signal, and the first power spectrum is the power spectrum of the noise signal in the first voice signal; determine a voiced signal from the second voice signal and perform gain compensation on the voiced signal, where the voiced signal is a signal in the second voice signal whose cepstral coefficient is greater than or equal to a preset threshold; and determine an impairment compensation gain of the second speech signal according to the gain-compensated voiced signal, and perform gain compensation on the second speech signal based on the impairment compensation gain.
  • the embodiment of the present application provides an electronic device. The electronic device can first perform noise reduction on a noisy speech signal (for example, a first speech signal) to reduce its noise components and obtain a clean original voice signal; the electronic device can then perform damage gain compensation on the obtained original voice signal to correct the voice damage produced during noise reduction, so as to obtain the final enhanced voice signal. In this way, distortion of the original voice signal can be avoided, thereby improving the quality of the voice signal output by the electronic device.
  • the processor 110 is further configured to: perform a short-time Fourier transform on the first speech signal to obtain the first time-frequency spectrum, before noise reduction is performed on the first speech signal according to the first time-frequency spectrum and the first power spectrum; determine the power spectrum of the first voice signal according to the first time-frequency spectrum, and determine a target power spectrum from the power spectrum of the first voice signal, where the target power spectrum is the power spectrum of the signal with the smallest power spectrum within the preset time window; and perform recursive smoothing on the target power spectrum to obtain the first power spectrum.
  • the processor 110 is specifically configured to: determine the posterior signal-to-noise ratio corresponding to the first voice signal according to the first power spectrum and the power spectrum of the first voice signal, and perform recursive smoothing on the posterior signal-to-noise ratio to obtain the prior signal-to-noise ratio corresponding to the first speech signal; determine the target noise reduction gain according to the posterior signal-to-noise ratio and the prior signal-to-noise ratio; and perform noise reduction on the first speech signal according to the first time-frequency spectrum and the target noise reduction gain.
  • the processor 110 is specifically configured to: perform forward homomorphic analysis on the second speech signal to obtain the target cepstral coefficients of the second speech signal; determine the maximum cepstral coefficient among the target cepstral coefficients, and determine the signal corresponding to the maximum cepstral coefficient in the second speech signal to be a voiced signal; and perform gain amplification on the maximum cepstral coefficient so as to perform gain compensation on the voiced signal.
  • the processor 110 is specifically configured to: perform inverse homomorphic analysis on the first cepstral coefficients and the gain-amplified maximum cepstral coefficient to obtain a first logarithmic time-frequency spectrum, where the first cepstral coefficients are the target cepstral coefficients other than the maximum cepstral coefficient; determine the logarithmic time-frequency spectrum of the second voice signal according to the time-frequency spectrum of the second voice signal; and determine the impairment compensation gain from the difference between the first logarithmic time-frequency spectrum and the logarithmic time-frequency spectrum of the second speech signal.
  • the second voice signal is a signal obtained by performing noise reduction processing on a target frequency domain signal
  • the target frequency domain signal is a signal obtained by performing short-time Fourier transform on the first voice signal.
  • the processor 110 is specifically configured to perform inverse time-frequency transform processing on the gain-compensated second speech signal after gain compensation is performed on the second speech signal based on the damage compensation gain to obtain a target time-domain signal.
  • the audio output unit 103 is used for outputting the target time domain signal.
  • the electronic device provided by the embodiments of the present application can implement the various processes implemented by the foregoing method embodiments, and can achieve the same technical effect. To avoid repetition, details are not described here.
  • the input unit 104 may include a graphics processor (Graphics Processing Unit, GPU) 1041 and a microphone 1042. The graphics processor 1041 processes still-picture or video image data obtained by an image capture device (such as a camera).
  • the display unit 106 may include a display panel 1061, which may be configured in the form of a liquid crystal display, an organic light emitting diode, or the like.
  • the user input unit 107 includes a touch panel 1071 and other input devices 1072 .
  • the touch panel 1071 is also called a touch screen.
  • the touch panel 1071 may include two parts, a touch detection device and a touch controller.
  • Other input devices 1072 may include, but are not limited to, physical keyboards, function keys (such as volume control keys, switch keys, etc.), trackballs, mice, and joysticks, which are not described herein again.
  • Memory 109 may be used to store software programs as well as various data including, but not limited to, application programs and operating systems.
  • the processor 110 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, the user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 110.
  • Embodiments of the present application further provide a readable storage medium on which a program or an instruction is stored. When the program or instruction is executed by a processor, each process of the foregoing method embodiments is implemented, and the same technical effect can be achieved. To avoid repetition, details are not repeated here.
  • the processor is the processor in the electronic device described in the foregoing embodiments.
  • the readable storage medium includes a computer-readable storage medium, such as a computer read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk or an optical disk, and the like.
  • An embodiment of the present application further provides a chip, where the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to run a program or an instruction to implement each process of the foregoing method embodiments , and can achieve the same technical effect, in order to avoid repetition, it is not repeated here.
  • the chip mentioned in the embodiments of the present application may also be referred to as a system-level chip, a system chip, a chip system, or a system-on-chip, or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

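A voice signal enhancement method and apparatus, an electronic device, a readable storage medium, and a chip. The method includes: performing noise reduction on a first voice signal according to a first time-frequency spectrum and a first power spectrum to obtain a second voice signal, where the first time-frequency spectrum indicates time-domain features and frequency-domain features of the first voice signal, and the first power spectrum is the power spectrum of a noise signal in the first voice signal (201); determining a voiced signal from the second voice signal and performing gain compensation on the voiced signal, where the voiced signal is a signal in the second voice signal whose cepstral coefficient is greater than or equal to a preset threshold (202); and determining a damage compensation gain of the second voice signal according to the gain-compensated voiced signal, and performing gain compensation on the second voice signal based on the damage compensation gain (203).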

Description

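Voice signal enhancement method, apparatus, and electronic device
Cross-reference to related applications
This application claims priority to Chinese Patent Application No. 202110410394.8, filed in China on April 16, 2021, the entire contents of which are incorporated herein by reference.
Technical field
This application belongs to the field of communication technologies, and specifically relates to a voice signal enhancement method, an apparatus, and an electronic device.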
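Background
With the development of terminal technologies, users have increasingly high requirements for the call quality of electronic devices. To improve the quality of the voice that an electronic device captures during a call, in traditional speech enhancement technology the electronic device reduces the noise components in a noisy voice signal so as to obtain a clean original voice signal from it, thereby ensuring the quality of the obtained voice signal.
However, the process of reducing the noise components in the noisy voice signal may damage the quality of the original voice signal contained in it, so that the original voice signal obtained by the electronic device is distorted and the quality of the voice signal output by the electronic device is poor.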
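Summary
The purpose of the embodiments of this application is to provide a voice signal enhancement method, an apparatus, and an electronic device that can solve the problem of poor quality of the voice signal output by an electronic device.
To solve the above technical problem, this application is implemented as follows:
According to a first aspect, an embodiment of this application provides a voice signal enhancement method. The method includes: performing noise reduction on a first voice signal according to a first time-frequency spectrum and a first power spectrum to obtain a second voice signal, where the first time-frequency spectrum indicates time-domain features and frequency-domain features of the first voice signal, and the first power spectrum is the power spectrum of a noise signal in the first voice signal; determining a voiced signal from the second voice signal and performing gain compensation on the voiced signal, where the voiced signal is a signal in the second voice signal whose cepstral coefficient is greater than or equal to a preset threshold; and determining a damage compensation gain of the second voice signal according to the gain-compensated voiced signal, and performing gain compensation on the second voice signal based on the damage compensation gain.
According to a second aspect, an embodiment of this application provides a voice signal enhancement apparatus, including a processing module, a determining module, and a compensation module. The processing module is configured to perform noise reduction on a first voice signal according to a first time-frequency spectrum and a first power spectrum to obtain a second voice signal, where the first time-frequency spectrum indicates time-domain features and frequency-domain features of the first voice signal, and the first power spectrum is the power spectrum of a noise signal in the first voice signal. The determining module is configured to determine a voiced signal from the second voice signal obtained by the processing module, where the voiced signal is a signal in the second voice signal whose cepstral coefficient is greater than or equal to a preset threshold. The compensation module is configured to perform gain compensation on the voiced signal determined by the determining module. The determining module is further configured to determine a damage compensation gain of the second voice signal according to the gain-compensated voiced signal. The compensation module is further configured to perform gain compensation on the second voice signal based on the damage compensation gain determined by the determining module.
According to a third aspect, an embodiment of this application provides an electronic device, including a processor, a memory, and a program or instruction stored in the memory and executable on the processor, where the program or instruction, when executed by the processor, implements the steps of the method according to the first aspect.
According to a fourth aspect, an embodiment of this application provides a readable storage medium storing a program or instruction, where the program or instruction, when executed by a processor, implements the steps of the method according to the first aspect.
According to a fifth aspect, an embodiment of this application provides a chip, including a processor and a communication interface, where the communication interface is coupled to the processor, and the processor is configured to run a program or instruction to implement the method according to the first aspect.
In the embodiments of this application, after the electronic device performs noise reduction on the first voice signal according to the time-frequency spectrum of the first voice signal and the power spectrum of the noise signal in the first voice signal to obtain the second voice signal, it can determine a voiced signal from the second voice signal, perform gain compensation on the voiced signal, determine the damage compensation gain of the second voice signal according to the gain-compensated voiced signal, and perform gain compensation on the second voice signal based on the damage compensation gain. With this solution, the electronic device can first perform noise reduction on a noisy voice signal (for example, the first voice signal) to reduce its noise components and obtain a clean original voice signal; the electronic device can then perform damage gain compensation on the obtained original voice signal to correct the voice damage produced during noise reduction, so as to obtain the final enhanced voice signal. In this way, distortion of the original voice signal obtained by the electronic device can be avoided, thereby improving the quality of the voice signal output by the electronic device.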
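Brief description of the drawings
FIG. 1 is a first schematic diagram of a voice signal enhancement method according to an embodiment of this application;
FIG. 2 is a second schematic diagram of a voice signal enhancement method according to an embodiment of this application;
FIG. 3 is a third schematic diagram of a voice signal enhancement method according to an embodiment of this application;
FIG. 4 is a schematic structural diagram of a voice signal enhancement apparatus according to an embodiment of this application;
FIG. 5 is a first schematic diagram of the hardware structure of an electronic device according to an embodiment of this application;
FIG. 6 is a second schematic diagram of the hardware structure of an electronic device according to an embodiment of this application.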
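Detailed description
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the accompanying drawings in the embodiments of this application. Obviously, the described embodiments are some rather than all of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application without creative effort fall within the protection scope of this application.
The terms "first", "second", and the like in the specification and claims of this application are used to distinguish between similar objects, and are not used to describe a specific order or sequence. It should be understood that data used in this way are interchangeable where appropriate, so that the embodiments of this application can be implemented in orders other than those illustrated or described here. Objects distinguished by "first", "second", and the like are usually of one type, and the number of objects is not limited; for example, there may be one first object or multiple first objects. In addition, "and/or" in the specification and claims indicates at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the associated objects before and after it.
Some concepts and/or terms involved in the voice signal enhancement method, apparatus, and electronic device provided in the embodiments of this application are explained below.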
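Cepstrum (cepstrum, CESP): the spectrum obtained by taking the logarithm of the Fourier transform spectrum of a signal and then applying an inverse Fourier transform.
Minima controlled recursive averaging (MCRA): past values of the power spectrum are averaged using a smoothing parameter that is adjusted according to the speech presence probability in each subband. If a speech signal is present in a subband of a given frame, the noise power spectrum is left unchanged; if no speech signal is present in a subband of a given frame, the noise estimate of the previous frame is used as the noise estimate of the current frame.
Improved minima controlled recursive averaging (IMCRA): builds on MCRA by using two smoothing passes and minimum statistics tracking for noise estimation.
Fast Fourier transform (FFT): a fast algorithm for the discrete Fourier transform, obtained by improving the discrete Fourier transform algorithm based on its odd, even, imaginary, and real properties.
Short-time Fourier transform (STFT): a mathematical transform related to the Fourier transform, used to determine the frequency and phase of the sinusoidal components in local sections of a time-varying signal. The STFT cuts the signal into multiple segments in the time domain and applies a Fourier transform to each segment to obtain its frequency-domain characteristics (so that the correspondence between the time domain and the frequency domain is known).
Minimum mean-square error (MMSE) estimation: estimating a random variable from given observations. A common approach in estimation theory is to seek the transform function that minimizes the mean-square error.
Minimum mean-square error log-spectral amplitude (MMSE-LSA) estimation: the speech signal is first divided into frames according to its quasi-stationary nature, so that each frame can be regarded as stationary. The short-time spectrum of each frame is then computed and feature parameters are extracted. A speech detection algorithm determines whether each frame is a noise signal or a noisy speech signal, and the MMSE method is used to estimate the short-time spectral amplitude of the clean speech signal. Finally, exploiting the insensitivity of the human ear to speech phase, the speech signal is reconstructed from the short-time spectral phase of the speech signal and the estimated short-time spectral amplitude, giving the enhanced speech signal.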
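The speech signal enhancement method provided in the embodiments of this application is described in detail below with reference to the accompanying drawings, through specific embodiments and their application scenarios.
In scenarios where an electronic device is used for voice calls, speech enhancement techniques centered on noise reduction have gradually been adopted. In conventional speech enhancement, noise reduction methods based on spectral subtraction, Wiener filtering, and statistical models are widely used because they are simple, effective, and computationally cheap. For example, in a single-microphone noise reduction scheme, the noise power spectrum of the input signal is estimated to obtain the a priori and a posteriori signal-to-noise ratios (SNRs); a noise reduction gain is then computed with a conventional method and applied to the input signal to obtain the noise-reduced speech signal. As another example, in a multi-microphone noise reduction scheme, spatial information is used to beamform the multiple input channels; after coherent noise is filtered out, the single-microphone scheme is applied to the beamformed single-channel signal, that is, a noise reduction gain is computed with a conventional method and applied to the beamformed signal to obtain the noise-reduced speech signal. The single-microphone scheme is used below as an example to describe the technical implementation of conventional noise reduction.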
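The noisy speech signal received by the microphone is:
y(t) = x(t) + n(t);   (Formula 1)
where x(t) is the clean speech signal and n(t) is additive random noise. After framing, windowing, and an FFT, the noisy speech signal is transformed into the time-frequency domain:
Y(f,k) = FFT[y(t)] = X(f,k) + N(f,k);   (Formula 2)
where k is the frame index.
The a posteriori SNR γ(f,k) (also written γ(f)) is defined in Formula 3 and the a priori SNR ξ(f,k) (also written ξ(f)) in Formula 4, where P_nn(f,k) is the estimate of the noise power spectrum, P_yy(f,k) is the power spectrum of the noisy speech signal (known), and P_xx(f,k) is the power spectrum of the clean speech signal (unknown):
γ(f) = P_yy(f) / P_nn(f);   (Formula 3)
ξ(f) = P_xx(f) / P_nn(f).   (Formula 4)
A common strategy for estimating the noise power spectrum is as follows. Voice activity detection is first performed on the input signal (the noisy speech signal). In time-frequency regions that contain only noise, the noise power spectrum of the input signal equals the power spectrum of the pure noise signal. In time-frequency regions that contain only speech, the noise power spectrum is not updated. In time-frequency regions between pure speech and pure noise, the noise power spectrum is updated with a specific constant. For this strategy, see the noise power spectrum estimation methods in MCRA and IMCRA.
The a priori SNR ξ(f,k) can be derived from the a posteriori SNR as γ(f,k) - 1 and, using the decision-directed method, recursively smoothed with the a priori SNR ξ(f,k-1) of the previous frame:
ξ(f,k) = α * ξ(f,k-1) + (1-α) * max(0, γ(f,k) - 1),   (Formula 5)
where α is the smoothing coefficient.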
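After the a priori and a posteriori SNRs have been computed from the noise power spectrum, the noise reduction gain G(f) can be computed in one of the following ways:
1) The noise reduction gain in spectral subtraction form is:
[Equation image PCTCN2022086098-appb-000001]   (Formula 6)
2) The noise reduction gain in Wiener filter form is:
[Equation image PCTCN2022086098-appb-000002]   (Formula 7)
3) The noise reduction gain in statistical model form (for example, MMSE log-spectral amplitude estimation) is:
[Equation image PCTCN2022086098-appb-000003]   (Formula 8)
where
[Equation image PCTCN2022086098-appb-000004]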
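The three gain expressions listed above survive here only as equation image references. For orientation, the forms below are the ones commonly given in the speech enhancement literature for power spectral subtraction, the Wiener filter, and the MMSE log-spectral amplitude estimator. They are an assumption about what the images show and may differ in detail from the patent's own formulas; the labels G_SS, G_W, G_LSA and the auxiliary variable v are introduced here for illustration only.

```latex
\begin{align}
G_{\mathrm{SS}}(f)  &= \sqrt{\max\!\left(0,\; 1 - \frac{1}{\gamma(f)}\right)} \\
G_{\mathrm{W}}(f)   &= \frac{\xi(f)}{1 + \xi(f)} \\
G_{\mathrm{LSA}}(f) &= \frac{\xi(f)}{1 + \xi(f)}
                       \exp\!\left(\frac{1}{2}\int_{v(f)}^{\infty} \frac{e^{-t}}{t}\,dt\right),
\qquad v(f) = \frac{\xi(f)}{1 + \xi(f)}\,\gamma(f)
\end{align}
```

All three depend on the noise estimate only through γ(f) and ξ(f), which is why an error in P_nn(f) carries straight through to the gain.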
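From the input signal and the noise reduction gain, the electronic device can obtain the noise-reduced speech signal:
[Equation image PCTCN2022086098-appb-000005]   (Formula 9)
As the above gain formulas show, all of these ways of computing the noise reduction gain depend indirectly on accurate estimation and tracking of the noise power spectrum. The error propagates from P_nn(f) to G(f) along the chain P_nn(f) → γ(f) → ξ(f) → G(f).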
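When the noise power spectrum is estimated accurately (for example, in stationary noise), conventional noise reduction methods achieve sufficient noise reduction while keeping speech distortion small. In practical scenarios, however, such as loud noise with low SNR (where the power of the clean speech signal is less than or equal to the power of the noise signal) or noise whose intensity and probability distribution vary over time (for example, a passing car, or a subway train starting and stopping), it is difficult to estimate the noise power spectrum accurately and in real time. The estimate is limited by the accuracy and convergence time of voice activity detection and of the noise power spectrum estimation method itself, so the estimated noise power spectrum may be biased.
From the error propagation chain from the noise power spectrum P_nn(f) to the noise reduction gain G(f) described above, it follows that:
In the first case, when the noise power spectrum is underestimated, the a priori SNR is too high and the conventional method applies too little noise reduction. The clean speech signal is damaged only slightly, but the noise is insufficiently suppressed.
In the second case, when the noise power spectrum is overestimated, the a priori SNR is too low and the conventional method applies too much noise reduction. This damages the quality of the clean speech signal and distorts it.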
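The two cases can be made concrete with a small numeric sketch. It assumes a Wiener-type gain factor ξ/(1+ξ) and an unsmoothed a priori SNR max(0, γ - 1); both are simplifications chosen for illustration, and the function name `wiener_gain` is hypothetical. A gain factor closer to 1 means less suppression, a factor closer to 0 means more suppression.

```python
def wiener_gain(p_yy, p_nn_est):
    """Multiplicative gain factor from a noisy power and a noise power estimate."""
    gamma = p_yy / p_nn_est          # a posteriori SNR
    xi = max(0.0, gamma - 1.0)       # instantaneous a priori SNR (no smoothing, for clarity)
    return xi / (1.0 + xi)

# One frequency bin with equal speech and noise power (0 dB SNR)
p_xx, p_nn = 1.0, 1.0
p_yy = p_xx + p_nn

g_true = wiener_gain(p_yy, p_nn)          # accurate noise estimate
g_under = wiener_gain(p_yy, 0.5 * p_nn)   # noise underestimated: too little suppression
g_over = wiener_gain(p_yy, 1.5 * p_nn)    # noise overestimated: too much suppression

print(g_under, g_true, g_over)
```

With these numbers the gain factor moves from 0.5 to about 0.75 when the noise is underestimated and to about 0.25 when it is overestimated, so in the second case the speech component in that bin is attenuated along with the noise.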
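In summary, reducing the noise component of the noisy speech signal as far as possible inevitably runs into the clean speech damage described in the second case.
To solve this technical problem, in the embodiments of this application the electronic device may apply framing, windowing, and a fast Fourier transform (FFT) to the captured noisy speech signal to convert it from a time-domain signal to a frequency-domain signal and obtain its time-frequency spectrum. The electronic device then determines the power spectrum of the noisy speech signal from the time-frequency spectrum, obtains the power spectrum of the noise signal in the noisy speech signal by recursively smoothing the minimum of the noisy power spectrum, computes the noise reduction gain from the noise power spectrum, and obtains the noise-reduced speech signal from the noisy speech signal and the noise reduction gain. After noise reduction, the electronic device may convert the noise-reduced speech signal from the time-frequency domain to the cepstral domain: it performs forward homomorphic analysis on the noise-reduced speech signal to obtain its cepstral coefficients, determines the signal corresponding to the larger cepstral coefficients as the voiced signal, and amplifies the cepstral coefficients of the voiced signal to apply gain compensation to it, which yields the logarithmic time-frequency spectrum of the enhanced speech signal. The electronic device may then obtain the damage compensation gain from the difference between the logarithmic time-frequency spectra before and after homomorphic filtering enhancement, and use the noise-reduced speech signal and the damage compensation gain to apply gain compensation to the noise-reduced speech signal, giving the final enhanced speech signal.
With this solution, the electronic device first performs noise reduction on the noisy speech signal (for example, the first speech signal) to reduce its noise component and obtain a clean original speech signal. It then performs damage gain compensation on the obtained original speech signal to correct the speech damage introduced during noise reduction, yielding the final enhanced speech signal. This avoids distortion of the original speech signal obtained by the electronic device and thus improves the quality of the speech signal it outputs.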
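An embodiment of this application provides a speech signal enhancement method. FIG. 1 shows a flowchart of the method, which may be applied to an electronic device. As shown in FIG. 1, the method may include the following step 201 to step 204.
Step 201: The electronic device performs noise reduction on a first speech signal according to a first time-frequency spectrum and a first power spectrum to obtain a second speech signal.
In this embodiment, the first time-frequency spectrum indicates the time-domain and frequency-domain features of the first speech signal, and the first power spectrum is the power spectrum of the noise signal in the first speech signal.
In this embodiment, while a user is on a voice call through the electronic device, the electronic device may monitor the speech signal of the call in real time to obtain a noisy speech signal (for example, the first speech signal), and perform noise reduction on the noisy speech signal according to its signal parameters (for example, the time-frequency spectrum of the whole noisy speech signal and the power spectrum of the noise signal in it) to obtain the noise-reduced speech signal, thereby implementing gain compensation of the noisy speech signal.
It should be noted that the first time-frequency spectrum can be understood as the time-frequency spectrum of the frequency-domain signal corresponding to the first speech signal (for example, the frequency-domain signal obtained by applying the short-time Fourier transform to the first speech signal, as described in the embodiments below). That the first time-frequency spectrum indicates the time-domain and frequency-domain features of the first speech signal can be understood as follows: the first time-frequency spectrum reflects both the time-domain features and the frequency-domain features of the first speech signal.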
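Optionally, in this embodiment, before step 201, the speech signal enhancement method further includes the following step 301 to step 303.
Step 301: The electronic device applies the short-time Fourier transform to the first speech signal to obtain the first time-frequency spectrum.
In this embodiment, the electronic device converts the first speech signal received through the microphone into a digital signal. The digital signal is converted from a time-domain signal to a frequency-domain signal by the short-time Fourier transform (that is, framing and windowing followed by a fast Fourier transform (FFT)):
Y_1(f,k) = STFT(y(n)),   (Formula 10)
where Y_1(f,k) is the frequency-domain signal corresponding to the first speech signal and y(n) is the first speech signal (the time-domain signal). This yields the time-frequency spectrum of the first speech signal.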
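Step 302: The electronic device determines the power spectrum of the first speech signal according to the first time-frequency spectrum, and determines a target power spectrum from the power spectrum of the first speech signal.
In this embodiment, the target power spectrum is the power spectrum of the signal with the smallest power spectrum among the signals within a preset time window.
In this embodiment, the electronic device may use a first preset algorithm (Formula 11 below) to determine the power spectrum P_yy(f,k) of the first speech signal from its time-frequency spectrum, and determine the power spectrum P_ymin(f) of the signal with the smallest power spectrum within the preset time window (the target power spectrum) according to Formula 12:
P_yy(f,k) = |Y_1(f,k)|^2,   (Formula 11)
P_ymin(f) = min[P_yy(f,k), P_yy(f,k-1), …, P_yy(f,k-N_min)],   (Formula 12)
where N_min is an integer smaller than k (N_min = 0, 1, 2, …, k-1).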
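Formulas 11 and 12 can be sketched as follows. This is a minimal illustration, assuming each frame is given as a list of complex STFT bins and that the history of per-frame power spectra is kept in a list with the newest frame last; the function names and the toy data are hypothetical.

```python
def power_spectrum(frame_bins):
    """Formula 11: P_yy(f, k) = |Y_1(f, k)|^2 for every frequency bin of one frame."""
    return [abs(y) ** 2 for y in frame_bins]

def windowed_minimum(p_yy_history, n_min):
    """Formula 12: per-bin minimum over the current frame and the n_min frames before it."""
    recent = p_yy_history[-(n_min + 1):]
    return [min(per_bin) for per_bin in zip(*recent)]

# Three toy frames with two frequency bins each (oldest first)
frames = [[3 + 4j, 1 + 0j], [1 + 1j, 2 + 0j], [0 + 2j, 0 + 3j]]
history = [power_spectrum(f) for f in frames]   # approx. [[25, 1], [2, 4], [4, 9]]

p_ymin_short = windowed_minimum(history, n_min=1)   # looks at the last two frames
p_ymin_long = windowed_minimum(history, n_min=2)    # looks at all three frames
print(p_ymin_short, p_ymin_long)
```

A longer window lets the minimum reach back to quieter frames, which makes the estimate less likely to include speech energy but slower to follow rising noise.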
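It should be noted that the signals within the preset time window may be the whole first speech signal or part of the first speech signal.
Step 303: The electronic device recursively smooths the target power spectrum to obtain the first power spectrum.
In this embodiment, the electronic device may recursively smooth the target power spectrum P_ymin(f) with the coefficient α_s to obtain the power spectrum P_nn(f) of the noise signal in the first speech signal (the first power spectrum). The recursive smoothing algorithm is:
P_nn(f,k) = α_s * P_nn(f,k-1) + (1-α_s) * P_ymin(f),   (Formula 13)
where the smoothing coefficient α_s is controlled by the speech presence probability of the current frame; when the speech presence probability approaches 1, α_s approaches 0.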
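The recursion in Formula 13 is a first-order smoother applied independently to each frequency bin. The sketch below takes α_s as an input, because the text does not give the mapping from speech presence probability to α_s; the function name is hypothetical.

```python
def update_noise_psd(p_nn_prev, p_ymin, alpha_s):
    """Formula 13: P_nn(f,k) = alpha_s * P_nn(f,k-1) + (1 - alpha_s) * P_ymin(f)."""
    return [alpha_s * prev + (1.0 - alpha_s) * cur
            for prev, cur in zip(p_nn_prev, p_ymin)]

p_nn_prev = [1.0, 2.0]   # noise estimate from the previous frame
p_ymin = [3.0, 4.0]      # windowed minimum of the noisy power spectrum

print(update_noise_psd(p_nn_prev, p_ymin, alpha_s=0.5))   # halfway between the two
print(update_noise_psd(p_nn_prev, p_ymin, alpha_s=1.0))   # keeps the previous estimate
print(update_noise_psd(p_nn_prev, p_ymin, alpha_s=0.0))   # jumps to the tracked minimum
```

The two extreme settings show what α_s controls: how much of the previous noise estimate is retained versus how quickly the estimate follows the tracked minimum.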
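It should be noted that a noisy speech signal consists of a clean speech signal and a noise signal. The speech presence probability can be estimated for each frame to separate the two, that is, to determine which frames of the noisy speech signal are clean speech and which frames are noise.
In this embodiment, the electronic device may apply the short-time Fourier transform to the first speech signal picked up by the microphone (the noisy speech signal) to obtain its time-frequency spectrum (the first time-frequency spectrum). Using the first preset algorithm, it determines the power spectrum of the first speech signal from the first time-frequency spectrum, determines from that power spectrum the power spectrum of the signal with the smallest power spectrum within the preset time window (the target power spectrum), and recursively smooths the target power spectrum to obtain the power spectrum of the noise signal in the first speech signal (the first power spectrum). The electronic device can then perform noise reduction on the first speech signal using the first time-frequency spectrum and the first power spectrum.
Optionally, in this embodiment, step 201 may be implemented through the following step 201a to step 201c.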
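Step 201a: The electronic device determines the a posteriori SNR corresponding to the first speech signal according to the first power spectrum and the power spectrum of the first speech signal, and recursively smooths the a posteriori SNR to obtain the a priori SNR corresponding to the first speech signal.
In this embodiment, the a posteriori SNR is given by Formula 14 and the a priori SNR by Formula 15, with smoothing factor α = 0.7.
γ(f,k) = P_yy(f,k) / P_nn(f,k),   (Formula 14)
ξ(f,k) = α * ξ(f,k-1) + (1-α) * max(0, γ(f,k) - 1),   (Formula 15)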
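For a single frequency bin, Formulas 14 and 15 reduce to two one-line functions. The sketch below uses the stated α = 0.7; the small `eps` guard against division by zero and the function names are assumptions added for illustration.

```python
def posterior_snr(p_yy, p_nn, eps=1e-12):
    """Formula 14: gamma(f,k) = P_yy(f,k) / P_nn(f,k); eps avoids division by zero."""
    return p_yy / max(p_nn, eps)

def prior_snr(xi_prev, gamma, alpha=0.7):
    """Formula 15: recursive smoothing of max(0, gamma - 1) with the previous frame's xi."""
    return alpha * xi_prev + (1.0 - alpha) * max(0.0, gamma - 1.0)

gamma = posterior_snr(p_yy=4.0, p_nn=1.0)   # 4.0
xi = prior_snr(xi_prev=2.0, gamma=gamma)    # 0.7 * 2.0 + 0.3 * 3.0, about 2.3
xi_floor = prior_snr(xi_prev=0.0, gamma=0.5)  # gamma below 1 is floored, giving 0.0

print(gamma, xi, xi_floor)
```

The `max(0, ...)` term keeps the a priori SNR non-negative in bins where the noisy power falls below the noise estimate.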
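Step 201b: The electronic device determines a target noise reduction gain according to the a posteriori SNR and the a priori SNR.
In this embodiment, the target noise reduction gain G_1(f,k) can be computed from the a priori SNR and the a posteriori SNR as follows:
[Equation image PCTCN2022086098-appb-000006]   (Formula 16)
where
[Equation image PCTCN2022086098-appb-000007]
Step 201c: The electronic device performs noise reduction on the first speech signal according to the first time-frequency spectrum and the target noise reduction gain to obtain the second speech signal.
In this embodiment, the electronic device may use a second preset algorithm (Formula 17 below) to perform noise reduction on the first speech signal (that is, on the frequency-domain signal corresponding to the first speech signal) according to the first time-frequency spectrum and the target noise reduction gain, obtaining the second speech signal Y_2(f,k) (the frequency-domain signal after noise reduction):
Y_2(f,k) = Y_1(f,k) * G_1(f,k).   (Formula 17)
In this embodiment, the electronic device may determine the a posteriori SNR corresponding to the first speech signal according to the power spectrum of the noise signal in the first speech signal and the power spectrum of the first speech signal, recursively smooth the a posteriori SNR to obtain the corresponding a priori SNR, determine the target noise reduction gain from the two SNRs, and then use the second preset algorithm to perform noise reduction on the first speech signal according to its time-frequency spectrum and the target noise reduction gain. Performing noise reduction on the noisy speech signal in this way reduces its noise component and yields a clean original speech signal, which improves the quality of the speech signal output by the electronic device.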
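Step 202: The electronic device determines a voiced signal from the second speech signal and performs gain compensation on the voiced signal.
In this embodiment, the voiced signal is a signal in the second speech signal whose cepstral coefficient is greater than or equal to a preset threshold.
In this embodiment, the electronic device may first determine the cepstral coefficients of the second speech signal, then determine the signal with the larger cepstral coefficient in the second speech signal as the voiced signal and perform gain compensation on it, thereby implementing gain compensation of the second speech signal.
It can be understood that the electronic device may preset a decision threshold for voiced signals (the preset threshold), identify the signal in the second speech signal whose cepstral coefficient is greater than or equal to this threshold, and determine that signal as the voiced signal. The voiced signal has distinct pitch and harmonic features in both the time-frequency domain and the cepstral domain.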
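Optionally, in this embodiment, step 202 may be implemented through the following step 202a to step 202c.
Step 202a: The electronic device performs forward homomorphic analysis on the second speech signal to obtain target cepstral coefficients of the second speech signal.
In this embodiment, the target cepstral coefficients include at least one cepstral coefficient, and each cepstral coefficient corresponds to one frame of the second speech signal. It should be noted that the electronic device may divide the second speech signal into at least one speech segment; one speech segment can be understood as one frame of the second speech signal.
In this embodiment, the electronic device may perform forward homomorphic analysis on the frequency-domain signal Y_2(f,k) corresponding to the second speech signal to obtain the cepstral coefficients Q(c,k) of the second speech signal, where c is the time index of the cepstral coefficient:
Q(c,k) = iFFT[log(|Y_2(f_1,k)|, |Y_2(f_2,k)|, …, |Y_2(f_n,k)|)].   (Formula 18)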
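Formula 18 can be sketched for one frame using only the standard library. The naive O(N^2) inverse DFT stands in for the iFFT, the input is assumed to be the full two-sided spectrum of one frame, and the small `floor` that protects the logarithm is an added assumption; all names are hypothetical.

```python
import cmath
import math

def idft(values):
    """Naive O(N^2) inverse DFT; adequate for a short illustration."""
    n = len(values)
    return [
        sum(v * cmath.exp(2j * math.pi * i * m / n) for m, v in enumerate(values)) / n
        for i in range(n)
    ]

def real_cepstrum(spectrum, floor=1e-12):
    """Q(c) = iDFT[log|Y(f)|] for one frame; floor keeps log() away from zero."""
    log_mag = [math.log(max(abs(y), floor)) for y in spectrum]
    return [q.real for q in idft(log_mag)]

# Synthetic frame: the log-magnitude has a ripple that repeats 8 times over 64 bins,
# which is how regularly spaced harmonics appear in a log spectrum.
N = 64
spectrum = [cmath.exp(0.5 * math.cos(2 * math.pi * 8 * m / N)) for m in range(N)]
cepstrum = real_cepstrum(spectrum)

peak = max(range(1, N // 2), key=lambda c: cepstrum[c])
print(peak, cepstrum[peak])
```

The periodic ripple in the log spectrum collapses to a single peak at index 8 in the cepstrum (with a mirror image at index 56), which is the property the voiced-signal detection in the following steps relies on.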
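For example, (A) in FIG. 2 shows the waveform of the first speech signal (also called the noisy time-domain speech signal). After performing noise reduction on this signal, the electronic device obtains the second speech signal and, by taking the logarithm, obtains the logarithmic time-frequency spectrum of the second speech signal shown in (B) in FIG. 2. The electronic device may then perform forward homomorphic analysis on the second speech signal to obtain its cepstrum, shown in (C) in FIG. 2 (horizontal axis: time index; vertical axis: cepstral coefficient).
Step 202b: The electronic device determines the maximum cepstral coefficient from the target cepstral coefficients, and determines the signal in the second speech signal that corresponds to the maximum cepstral coefficient as the voiced signal.
In this embodiment, each frame of the second speech signal corresponds to one cepstral coefficient. The electronic device may search the obtained cepstral coefficients for the maximum cepstral coefficient and determine the frame corresponding to it as the voiced signal.
Optionally, in this embodiment, the electronic device may preset the pitch search range of speech to [70 Hz, 400 Hz], which corresponds to the cepstral index range [Fs/400, Fs/70], where Fs is the sampling frequency. The electronic device searches the target cepstral coefficients within this range for the maximum cepstral coefficient Q_max, whose time index is c_max. With h as the decision threshold for voiced signals, when Q_max(c,k) > h the signal corresponding to the maximum cepstral coefficient is determined to be a voiced signal (for example, the signal at the pitch period position in (C) in FIG. 2). The voiced signal has distinct pitch and harmonic features in the frequency domain and the cepstral domain.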
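The search for c_max and the voiced decision can be sketched as below. The rounding of Fs/400 and Fs/70 to integer indices, the example threshold value, and the function names are assumptions; the text specifies only the range [Fs/400, Fs/70] and the comparison Q_max > h.

```python
import math

def pitch_quefrency_range(fs, f_lo=70.0, f_hi=400.0):
    """Cepstral index range [Fs/400, Fs/70] for a 70-400 Hz pitch search."""
    return int(math.ceil(fs / f_hi)), int(math.floor(fs / f_lo))

def find_voiced_peak(cepstrum, fs, threshold):
    """Return (c_max, Q_max, is_voiced) for one frame's cepstrum.

    Assumes the cepstrum is long enough to cover the lower end of the search range.
    """
    c_lo, c_hi = pitch_quefrency_range(fs)
    c_hi = min(c_hi, len(cepstrum) - 1)
    c_max = max(range(c_lo, c_hi + 1), key=lambda c: cepstrum[c])
    q_max = cepstrum[c_max]
    return c_max, q_max, q_max > threshold

fs = 16000
cepstrum = [0.0] * 512
cepstrum[100] = 0.3            # synthetic peak: pitch period of 100 samples
cepstrum[10] = 0.9             # large low-index value outside the search range

c_max, q_max, is_voiced = find_voiced_peak(cepstrum, fs, threshold=0.1)
print(pitch_quefrency_range(fs), c_max, fs / c_max, is_voiced)
```

At Fs = 16 kHz the search covers indices 40 to 228, and the peak at index 100 corresponds to a pitch of 160 Hz. Restricting the search range keeps the large low-index coefficients, which describe the spectral envelope, from being mistaken for the pitch peak.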
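Step 202c: The electronic device amplifies the maximum cepstral coefficient to perform gain compensation on the voiced signal.
In this embodiment, when a frame of the second speech signal is determined to be a voiced signal, the electronic device amplifies the maximum cepstral coefficient corresponding to that voiced signal to perform gain compensation on it:
Q(c_max,k) = g * Q(c_max,k),   (Formula 19)
where g is the gain coefficient, which controls the magnitude of the compensation gain; for example, g may be 1.5.
In this embodiment, the electronic device may perform forward homomorphic analysis on the second speech signal to obtain its cepstral coefficients, determine the maximum cepstral coefficient among them, and determine the signal in the second speech signal corresponding to the maximum cepstral coefficient as the voiced signal. The electronic device can then perform gain compensation on the voiced signal by amplifying the maximum cepstral coefficient, which prepares for gain compensation of the noise-reduced speech signal.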
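Step 203: The electronic device determines the damage compensation gain of the second speech signal according to the gain-compensated voiced signal, and performs gain compensation on the second speech signal based on the damage compensation gain.
Optionally, in this embodiment, "the electronic device determines the damage compensation gain of the second speech signal according to the gain-compensated voiced signal" in step 203 may be implemented through the following step 203a and step 203b.
Step 203a: The electronic device performs inverse homomorphic analysis on the first cepstral coefficients and the amplified maximum cepstral coefficient to obtain a first logarithmic time-frequency spectrum.
In this embodiment, the first cepstral coefficients are the target cepstral coefficients other than the maximum cepstral coefficient.
In this embodiment, the electronic device performs inverse homomorphic analysis on the target cepstral coefficients other than the maximum cepstral coefficient together with the amplified maximum cepstral coefficient, to obtain the logarithmic time-frequency spectrum LY_2E(f,k) of the enhanced second speech signal (the first logarithmic time-frequency spectrum):
LY_2E(f,k) = FFT[Q(c_1,k), Q(c_2,k), …, Q(c_max,k), …, Q(c_n,k)].   (Formula 20)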
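Step 203b: The electronic device determines the logarithmic time-frequency spectrum of the second speech signal according to the time-frequency spectrum of the second speech signal, and determines the damage compensation gain according to the difference between the first logarithmic time-frequency spectrum and the logarithmic time-frequency spectrum of the second speech signal.
In this embodiment, the electronic device may determine the logarithmic time-frequency spectrum LY_2(f,k) of the second speech signal from its time-frequency spectrum according to Formula 21, and determine the damage compensation gain according to the difference between the logarithmic time-frequency spectrum of the enhanced second speech signal and that of the second speech signal.
LY_2(f,k) = log(|Y_2(f,k)|)   (Formula 21)
Specifically, the electronic device may compute the damage compensation gain by applying a function F to the logarithmic time-frequency spectra before and after cepstral coefficient enhancement:
G_c(f,k) = F(LY_2(f,k), LY_2E(f,k)).   (Formula 22)
It should be noted that the function F can be implemented in two ways. In the first implementation, the difference between the logarithmic spectra is converted to a linear coefficient, which is used as the damage compensation gain, as in Formula 23. In the second implementation, a gain constraint range is added on top of the logarithmic spectral difference: the difference is limited to the gain constraint range so as to control the maximum and minimum gain at each frequency bin and keep the damage compensation gain G_c(f,k) within a reasonable range.
[Equation image PCTCN2022086098-appb-000008]   (Formula 23)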
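The chain from cepstral amplification to the damage compensation gain can be sketched end to end for one frame. Several points are assumptions made for illustration: the logarithm is natural, so the first implementation of F is taken to be exp(LY_2E - LY_2); the mirror coefficient at index N - c_max is scaled together with c_max so that the enhanced log spectrum stays real; the constraint limits of 0.5 and 2.0 are placeholder values; and the function names are hypothetical.

```python
import cmath
import math

def dft(values, inverse=False):
    """Naive O(N^2) DFT (or inverse DFT); adequate for a short illustration."""
    n = len(values)
    sign = 1 if inverse else -1
    scale = n if inverse else 1
    return [
        sum(v * cmath.exp(sign * 2j * math.pi * i * m / n) for m, v in enumerate(values)) / scale
        for i in range(n)
    ]

def damage_compensation_gain(log_mag, c_max, g=1.5, gain_min=0.5, gain_max=2.0):
    """G_c from the log spectrum before (LY_2) and after (LY_2E) scaling the pitch peak."""
    n = len(log_mag)
    cep = [q.real for q in dft(log_mag, inverse=True)]    # forward homomorphic analysis
    for c in {c_max, (n - c_max) % n}:                    # scale the peak and its mirror
        cep[c] *= g
    log_enh = [v.real for v in dft(cep)]                  # inverse homomorphic analysis
    lo, hi = math.log(gain_min), math.log(gain_max)       # constraint on the log difference
    return [math.exp(min(max(e - o, lo), hi)) for e, o in zip(log_enh, log_mag)]

# Synthetic log spectrum with harmonic ripple whose cepstral peak sits at index 8
N = 64
log_mag = [0.5 * math.cos(2 * math.pi * 8 * m / N) for m in range(N)]

gains = damage_compensation_gain(log_mag, c_max=8)
boost_only = damage_compensation_gain(log_mag, c_max=8, gain_min=1.0)
print(gains[0], gains[4], boost_only[4])
```

In this example the gain exceeds 1 at the ripple maxima (the harmonic peaks) and falls below 1 in the valleys between them. Setting `gain_min=1.0` turns the compensation into a boost-only correction, which is one way the second implementation of F could be configured.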
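For example, (A) in FIG. 3 shows the logarithmic time-frequency spectra before and after inverse homomorphic analysis, that is, before and after homomorphic filtering enhancement. After amplifying the maximum cepstral coefficient to perform gain compensation on the voiced signal, the electronic device may perform inverse homomorphic analysis on the target cepstral coefficients other than the maximum cepstral coefficient together with the amplified maximum cepstral coefficient, obtaining the logarithmic time-frequency spectrum of the enhanced second speech signal (the first logarithmic time-frequency spectrum) shown in (A) in FIG. 3, where LY_2 denotes the logarithmic time-frequency spectrum before homomorphic filtering enhancement and LY_2E denotes the logarithmic time-frequency spectrum after it. From the difference between the logarithmic time-frequency spectrum of the enhanced second speech signal (LY_2E) and that of the second speech signal (LY_2), the electronic device may determine the damage compensation gain G_c shown in (B) in FIG. 3, and use it to perform gain compensation on the second speech signal.
In this embodiment, after performing noise reduction on the first speech signal to obtain the second speech signal, the electronic device may further perform gain compensation on the voiced signal in the second speech signal to determine the damage compensation gain of the second speech signal, and then perform gain compensation on the second speech signal based on that gain to obtain the final enhanced speech signal, which improves the quality of the speech signal.
An embodiment of this application provides a speech signal enhancement method. After performing noise reduction on the first speech signal according to its time-frequency spectrum and the power spectrum of the noise signal in it to obtain the second speech signal, the electronic device may determine a voiced signal from the second speech signal, perform gain compensation on the voiced signal, determine the damage compensation gain of the second speech signal according to the gain-compensated voiced signal, and perform gain compensation on the second speech signal based on that gain. The electronic device first performs noise reduction on the noisy speech signal (for example, the first speech signal) to reduce its noise component and obtain a clean original speech signal. It then performs damage gain compensation on the obtained original speech signal to correct the speech damage introduced during noise reduction, yielding the final enhanced speech signal. This avoids distortion of the original speech signal obtained by the electronic device and thus improves the quality of the speech signal it outputs.
In the conventional solution, the quality of the original speech signal is damaged during noise reduction. With this solution, the total energy of the output speech signal (the signal after speech enhancement) is greater than the total energy of the input speech signal, and the spectrum of the voiced part (including the pitch component and the harmonic components) of the output speech signal is larger than that of the input speech signal (that is, the output speech signal is enhanced). A conventional noise reduction method only attenuates the noise signal in the input speech signal, so the energy of its output speech signal is less than or equal to the energy of the input speech signal. The quality of the speech signal output by this solution is therefore higher than that of the speech signal output by the conventional solution.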
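Optionally, in this embodiment, the second speech signal is the signal obtained by performing noise reduction on a target frequency-domain signal, and the target frequency-domain signal is the signal obtained by applying the short-time Fourier transform to the first speech signal. After step 203, the speech signal enhancement method further includes the following step 204.
Step 204: The electronic device applies an inverse time-frequency transform to the gain-compensated second speech signal to obtain a target time-domain signal, and outputs the target time-domain signal.
In this embodiment, an inverse time-frequency transform is applied to the gain-compensated second speech signal (the enhanced frequency-domain signal) to obtain the enhanced time-domain signal, so that the enhanced speech signal is output. The enhanced frequency-domain signal Y_3(f,k) is:
Y_3(f,k) = Y_1(f,k) * G_1(f,k) * G_c(f,k).   (Formula 24)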
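Formula 24 applies two real-valued gains to each complex bin. The sketch below shows this for one frame; the bin values and gains are made up, and the function name is hypothetical.

```python
import cmath

def apply_gains(y1, g1, gc):
    """Formula 24: Y_3(f,k) = Y_1(f,k) * G_1(f,k) * G_c(f,k), bin by bin."""
    return [y * a * b for y, a, b in zip(y1, g1, gc)]

y1 = [2 + 2j, 0 + 1j]      # noisy STFT bins of one frame
g1 = [0.5, 0.25]           # noise reduction gain per bin
gc = [1.2, 1.0]            # damage compensation gain per bin

y3 = apply_gains(y1, g1, gc)
print(y3)
print(cmath.phase(y1[0]), cmath.phase(y3[0]))   # same phase before and after
```

Because both gains are real and non-negative, only the magnitude of each bin changes and the phase of the noisy signal is carried through unchanged to the inverse time-frequency transform.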
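The specific process of the homomorphic-filtering-based speech signal enhancement method provided in the embodiments of this application is described below. In an electronic device with a sound capture function, the electronic device converts the noisy speech signal received by the microphone (for example, the first speech signal) into a digital signal, and then applies framing, windowing, and a fast Fourier transform to the digital signal to convert the noisy speech signal from a time-domain signal to a frequency-domain signal, that is, Y_1(f,k) = STFT(y(n)). The electronic device then performs noise power spectrum estimation and noise reduction gain calculation on the time-frequency spectrum of the noisy speech signal; MCRA and MMSE-LSA are used below as examples to describe the noise reduction process. The power spectrum of the noisy speech signal is P_yy(f,k) = |Y_1(f,k)|^2. An observation time window is set as in MCRA, and the electronic device observes the minimum of the noisy power spectrum within the preset time window, that is, P_ymin(f) = min[P_yy(f,k), P_yy(f,k-1), …, P_yy(f,k-N_min)]. The noise power spectrum P_nn is obtained by recursively smoothing P_ymin(f) with α_s, that is, P_nn(f,k) = α_s * P_nn(f,k-1) + (1-α_s) * P_ymin(f), where the smoothing coefficient α_s is controlled by the speech presence probability of the current frame; when the speech presence probability approaches 1, α_s approaches 0. The a posteriori SNR is defined as γ(f,k) = P_yy(f,k) / P_nn(f,k) and the a priori SNR as ξ(f,k) = α * ξ(f,k-1) + (1-α) * max(0, γ(f,k) - 1), where α = 0.7. In the MMSE-LSA method, the noise reduction gain G_1(f,k) is computed from the a priori SNR and the a posteriori SNR, that is,
[Equation image PCTCN2022086098-appb-000009]
where
[Equation image PCTCN2022086098-appb-000010]
The signal after noise reduction (the second speech signal) is Y_2(f,k) = Y_1(f,k) * G_1(f,k), and its logarithmic time-frequency spectrum is LY_2(f,k) = log(|Y_2(f,k)|). The electronic device performs forward homomorphic analysis on Y_2(f,k) to obtain the cepstral coefficients Q(c,k) of the noise-reduced signal, that is, Q(c,k) = iFFT[log(|Y_2(f_1,k)|, |Y_2(f_2,k)|, …, |Y_2(f_n,k)|)], where c is the time index of the cepstral coefficient. The electronic device may preset the pitch search range of speech to [70 Hz, 400 Hz], corresponding to the cepstral index range [Fs/400, Fs/70]. Within the search range, the maximum cepstral coefficient is denoted Q_max and its time index c_max, and the decision threshold for voiced signals is set to h. When Q_max(c,k) > h, the current frame is determined to be a voiced signal, that is, the current frame has distinct pitch and harmonic features in the frequency domain and the cepstral domain. When the current frame is determined to be a voiced signal, the electronic device amplifies the cepstral coefficient at position c_max (the cepstral coefficient of the voiced signal), that is, Q(c_max,k) = g * Q(c_max,k), where g is the gain coefficient that controls the magnitude of the compensation gain; for example, g may be 1.5. The electronic device performs inverse homomorphic analysis on the cepstral coefficients in the search range other than the maximum cepstral coefficient together with the amplified maximum cepstral coefficient to obtain the enhanced logarithmic time-frequency spectrum, that is, LY_2E(f,k) = FFT[Q(c_1,k), Q(c_2,k), …, Q(c_max,k), …, Q(c_n,k)]. The speech damage compensation gain can be computed by applying a function F to the logarithmic time-frequency spectra before and after cepstral coefficient amplification, that is, G_c(f,k) = F(LY_2(f,k), LY_2E(f,k)). The function F can be implemented in several ways. One implementation converts the difference between the logarithmic spectra to a linear coefficient, which is used as the damage compensation gain, that is,
[Equation image PCTCN2022086098-appb-000011]
Another implementation adds a gain constraint range on top of the logarithmic spectral difference: the difference is limited to the gain constraint range so as to control the maximum and minimum gain at each frequency bin and keep the damage compensation gain G_c(f,k) within a reasonable range. Through the above process, the electronic device obtains the final enhanced signal Y_3(f,k) = Y_1(f,k) * G_1(f,k) * G_c(f,k), and applies an inverse time-frequency transform to Y_3(f,k) to obtain the enhanced time-domain signal.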
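It should be noted that the speech signal enhancement method provided in the embodiments of this application may be performed by a speech signal enhancement apparatus, or by a control module in the apparatus that is used to perform the method. In the embodiments of this application, the speech signal enhancement apparatus is described using the example in which the apparatus performs the speech signal enhancement method.
FIG. 4 shows a possible schematic structure of the speech signal enhancement apparatus in the embodiments of this application. As shown in FIG. 4, the speech signal enhancement apparatus 70 may include a processing module 71, a determining module 72, and a compensation module 73.
The processing module 71 is configured to perform noise reduction on a first speech signal according to a first time-frequency spectrum and a first power spectrum to obtain a second speech signal, where the first time-frequency spectrum indicates the time-domain and frequency-domain features of the first speech signal, and the first power spectrum is the power spectrum of the noise signal in the first speech signal. The determining module 72 is configured to determine a voiced signal from the second speech signal obtained by the processing module 71, where the voiced signal is a signal in the second speech signal whose cepstral coefficient is greater than or equal to a preset threshold. The compensation module 73 is configured to perform gain compensation on the voiced signal determined by the determining module 72. The determining module 72 is further configured to determine the damage compensation gain of the second speech signal according to the gain-compensated voiced signal. The compensation module 73 is further configured to perform gain compensation on the second speech signal based on the damage compensation gain determined by the determining module 72.
An embodiment of this application provides a speech signal enhancement apparatus. The apparatus first performs noise reduction on the noisy speech signal (for example, the first speech signal) to reduce its noise component and obtain a clean original speech signal. It then performs damage gain compensation on the obtained original speech signal to correct the speech damage introduced during noise reduction, yielding the final enhanced speech signal. This avoids distortion of the obtained original speech signal and thus improves the quality of the output speech signal.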
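In a possible implementation, the processing module 71 is further configured to apply the short-time Fourier transform to the first speech signal to obtain the first time-frequency spectrum before noise reduction is performed on the first speech signal according to the first time-frequency spectrum and the first power spectrum. The determining module 72 is further configured to determine the power spectrum of the first speech signal according to the first time-frequency spectrum, and determine a target power spectrum from the power spectrum of the first speech signal, where the target power spectrum is the power spectrum of the signal with the smallest power spectrum among the signals within a preset time window. The processing module 71 is further configured to recursively smooth the target power spectrum determined by the determining module 72 to obtain the first power spectrum.
In a possible implementation, the processing module 71 is specifically configured to: determine the a posteriori SNR corresponding to the first speech signal according to the first power spectrum and the power spectrum of the first speech signal, and recursively smooth the a posteriori SNR to obtain the a priori SNR corresponding to the first speech signal; determine a target noise reduction gain according to the a posteriori SNR and the a priori SNR; and perform noise reduction on the first speech signal according to the first time-frequency spectrum and the target noise reduction gain.
In a possible implementation, the compensation module 73 is specifically configured to: perform forward homomorphic analysis on the second speech signal to obtain target cepstral coefficients of the second speech signal; determine the maximum cepstral coefficient from the target cepstral coefficients, and determine the signal in the second speech signal corresponding to the maximum cepstral coefficient as the voiced signal; and amplify the maximum cepstral coefficient to perform gain compensation on the voiced signal.
In a possible implementation, the compensation module 73 is specifically configured to: perform inverse homomorphic analysis on the first cepstral coefficients and the amplified maximum cepstral coefficient to obtain a first logarithmic time-frequency spectrum, where the first cepstral coefficients are the target cepstral coefficients other than the maximum cepstral coefficient; determine the logarithmic time-frequency spectrum of the second speech signal according to the time-frequency spectrum of the second speech signal; and determine the damage compensation gain according to the difference between the first logarithmic time-frequency spectrum and the logarithmic time-frequency spectrum of the second speech signal.
In a possible implementation, the second speech signal is the signal obtained by performing noise reduction on a target frequency-domain signal, and the target frequency-domain signal is the signal obtained by applying the short-time Fourier transform to the first speech signal. The speech signal enhancement apparatus 70 provided in the embodiments of this application further includes an output module. The processing module 71 is specifically configured to apply an inverse time-frequency transform to the gain-compensated second speech signal to obtain a target time-domain signal, after the compensation module 73 performs gain compensation on the second speech signal based on the damage compensation gain. The output module is configured to output the target time-domain signal obtained by the processing module 71.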
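The speech signal enhancement apparatus in the embodiments of this application may be an apparatus, or may be a component, an integrated circuit, or a chip in a terminal. The apparatus may be a mobile electronic device or a non-mobile electronic device. For example, the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palmtop computer, an in-vehicle electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook, or a personal digital assistant (PDA); the non-mobile electronic device may be a server, a network attached storage (NAS) device, a personal computer (PC), a television (TV), a teller machine, or a self-service machine. This is not specifically limited in the embodiments of this application.
The speech signal enhancement apparatus in the embodiments of this application may be an apparatus with an operating system. The operating system may be the Android operating system, the iOS operating system, or another possible operating system. This is not specifically limited in the embodiments of this application.
The speech signal enhancement apparatus provided in the embodiments of this application can implement each process of the foregoing method embodiments and achieve the same technical effects. To avoid repetition, details are not described here again.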
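Optionally, as shown in FIG. 5, an embodiment of this application further provides an electronic device 90, including a processor 91, a memory 92, and a program or instructions stored in the memory 92 and executable on the processor 91. When executed by the processor 91, the program or instructions implement each process of the foregoing method embodiments and achieve the same technical effects. To avoid repetition, details are not described here again.
It should be noted that the electronic device in the embodiments of this application includes the mobile electronic devices and non-mobile electronic devices described above.
FIG. 6 is a schematic diagram of the hardware structure of an electronic device implementing the embodiments of this application.
The electronic device 100 includes but is not limited to components such as a radio frequency unit 101, a network module 102, an audio output unit 103, an input unit 104, a sensor 105, a display unit 106, a user input unit 107, an interface unit 108, a memory 109, and a processor 110.
A person skilled in the art will understand that the electronic device 100 may further include a power supply (such as a battery) that powers the components. The power supply may be logically connected to the processor 110 through a power management system, so that charging, discharging, and power consumption are managed through the power management system. The structure shown in FIG. 6 does not limit the electronic device; the electronic device may include more or fewer components than shown, combine some components, or arrange the components differently. Details are not described here.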
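The processor 110 is configured to: perform noise reduction on a first speech signal according to a first time-frequency spectrum and a first power spectrum to obtain a second speech signal, where the first time-frequency spectrum indicates the time-domain and frequency-domain features of the first speech signal, and the first power spectrum is the power spectrum of the noise signal in the first speech signal; determine a voiced signal from the second speech signal and perform gain compensation on the voiced signal, where the voiced signal is a signal in the second speech signal whose cepstral coefficient is greater than or equal to a preset threshold; and determine the damage compensation gain of the second speech signal according to the gain-compensated voiced signal, and perform gain compensation on the second speech signal based on the damage compensation gain.
An embodiment of this application provides an electronic device. The electronic device first performs noise reduction on the noisy speech signal (for example, the first speech signal) to reduce its noise component and obtain a clean original speech signal. It then performs damage gain compensation on the obtained original speech signal to correct the speech damage introduced during noise reduction, yielding the final enhanced speech signal. This avoids distortion of the original speech signal obtained by the electronic device and thus improves the quality of the speech signal it outputs.
Optionally, in this embodiment, the processor 110 is further configured to: before performing noise reduction on the first speech signal according to the first time-frequency spectrum and the first power spectrum, apply the short-time Fourier transform to the first speech signal to obtain the first time-frequency spectrum; determine the power spectrum of the first speech signal according to the first time-frequency spectrum, and determine a target power spectrum from the power spectrum of the first speech signal, where the target power spectrum is the power spectrum of the signal with the smallest power spectrum among the signals within a preset time window; and recursively smooth the target power spectrum to obtain the first power spectrum.
Optionally, in this embodiment, the processor 110 is specifically configured to: determine the a posteriori SNR corresponding to the first speech signal according to the first power spectrum and the power spectrum of the first speech signal, and recursively smooth the a posteriori SNR to obtain the a priori SNR corresponding to the first speech signal; determine a target noise reduction gain according to the a posteriori SNR and the a priori SNR; and perform noise reduction on the first speech signal according to the first time-frequency spectrum and the target noise reduction gain.
Optionally, in this embodiment, the processor 110 is specifically configured to: perform forward homomorphic analysis on the second speech signal to obtain target cepstral coefficients of the second speech signal; determine the maximum cepstral coefficient from the target cepstral coefficients, and determine the signal in the second speech signal corresponding to the maximum cepstral coefficient as the voiced signal; and amplify the maximum cepstral coefficient to perform gain compensation on the voiced signal.
Optionally, in this embodiment, the processor 110 is specifically configured to: perform inverse homomorphic analysis on the first cepstral coefficients and the amplified maximum cepstral coefficient to obtain a first logarithmic time-frequency spectrum, where the first cepstral coefficients are the target cepstral coefficients other than the maximum cepstral coefficient; determine the logarithmic time-frequency spectrum of the second speech signal according to the time-frequency spectrum of the second speech signal; and determine the damage compensation gain according to the difference between the first logarithmic time-frequency spectrum and the logarithmic time-frequency spectrum of the second speech signal.
Optionally, in this embodiment, the second speech signal is the signal obtained by performing noise reduction on a target frequency-domain signal, and the target frequency-domain signal is the signal obtained by applying the short-time Fourier transform to the first speech signal. The processor 110 is specifically configured to apply an inverse time-frequency transform to the gain-compensated second speech signal to obtain a target time-domain signal, after performing gain compensation on the second speech signal based on the damage compensation gain. The audio output unit 103 is configured to output the target time-domain signal.
The electronic device provided in the embodiments of this application can implement each process of the foregoing method embodiments and achieve the same technical effects. To avoid repetition, details are not described here again.
For the beneficial effects of the implementations in this embodiment, refer to the beneficial effects of the corresponding implementations in the foregoing method embodiments. To avoid repetition, details are not described here again.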
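It should be understood that, in the embodiments of this application, the input unit 104 may include a graphics processing unit (GPU) 1041 and a microphone 1042. The graphics processing unit 1041 processes image data of still pictures or video obtained by an image capture apparatus (such as a camera) in video capture mode or image capture mode. The display unit 106 may include a display panel 1061, which may be configured in the form of a liquid crystal display, an organic light-emitting diode, or the like. The user input unit 107 includes a touch panel 1071 and other input devices 1072. The touch panel 1071 is also called a touchscreen and may include two parts: a touch detection apparatus and a touch controller. The other input devices 1072 may include but are not limited to a physical keyboard, function keys (such as volume control keys and power keys), a trackball, a mouse, and a joystick. Details are not described here. The memory 109 may be configured to store software programs and various data, including but not limited to application programs and an operating system. The processor 110 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may alternatively not be integrated into the processor 110.
An embodiment of this application further provides a readable storage medium storing a program or instructions. When executed by a processor, the program or instructions implement each process of the foregoing method embodiments and achieve the same technical effects. To avoid repetition, details are not described here again.
The processor is the processor in the electronic device described in the foregoing embodiments. The readable storage medium includes a computer-readable storage medium, such as a computer read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
An embodiment of this application further provides a chip, including a processor and a communication interface coupled to the processor, where the processor is configured to run a program or instructions to implement each process of the foregoing method embodiments and achieve the same technical effects. To avoid repetition, details are not described here again.
It should be understood that the chip mentioned in the embodiments of this application may also be called a system-level chip, a system chip, a chip system, or a system-on-chip.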
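It should be noted that, in this document, the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or apparatus that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes the element. In addition, the scope of the methods and apparatuses in the embodiments of this application is not limited to performing functions in the order shown or discussed; functions may also be performed substantially simultaneously or in reverse order depending on the functions involved. For example, the described methods may be performed in an order different from that described, and steps may be added, omitted, or combined. Features described with reference to some examples may also be combined in other examples.
From the description of the foregoing implementations, a person skilled in the art will clearly understand that the methods of the foregoing embodiments can be implemented by software together with a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the preferred implementation. Based on this understanding, the technical solutions of this application, in essence or in the part that contributes to the prior art, may be embodied as a computer software product. The computer software product is stored in a storage medium (such as a ROM/RAM, magnetic disk, or optical disc) and includes several instructions that cause a terminal (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods described in the embodiments of this application.
The embodiments of this application have been described above with reference to the accompanying drawings, but this application is not limited to the foregoing specific implementations, which are merely illustrative rather than restrictive. Inspired by this application, a person of ordinary skill in the art can devise many other forms without departing from the purpose of this application and the scope protected by the claims, all of which fall within the protection of this application.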

Claims (16)

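  1. A speech signal enhancement method, comprising:
    performing noise reduction on a first speech signal according to a first time-frequency spectrum and a first power spectrum to obtain a second speech signal, wherein the first time-frequency spectrum indicates time-domain features and frequency-domain features of the first speech signal, and the first power spectrum is the power spectrum of a noise signal in the first speech signal;
    determining a voiced signal from the second speech signal, and performing gain compensation on the voiced signal, wherein the voiced signal is a signal in the second speech signal whose cepstral coefficient is greater than or equal to a preset threshold; and
    determining a damage compensation gain of the second speech signal according to the gain-compensated voiced signal, and performing gain compensation on the second speech signal based on the damage compensation gain.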
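  2. The method according to claim 1, wherein before the performing noise reduction on a first speech signal according to a first time-frequency spectrum and a first power spectrum, the method further comprises:
    applying a short-time Fourier transform to the first speech signal to obtain the first time-frequency spectrum;
    determining a power spectrum of the first speech signal according to the first time-frequency spectrum, and determining a target power spectrum from the power spectrum of the first speech signal, wherein the target power spectrum is the power spectrum of the signal with the smallest power spectrum among signals within a preset time window; and
    recursively smoothing the target power spectrum to obtain the first power spectrum.
  3. The method according to claim 1 or 2, wherein the performing noise reduction on a first speech signal according to a first time-frequency spectrum and a first power spectrum comprises:
    determining an a posteriori signal-to-noise ratio corresponding to the first speech signal according to the first power spectrum and the power spectrum of the first speech signal, and recursively smoothing the a posteriori signal-to-noise ratio to obtain an a priori signal-to-noise ratio corresponding to the first speech signal;
    determining a target noise reduction gain according to the a posteriori signal-to-noise ratio and the a priori signal-to-noise ratio; and
    performing noise reduction on the first speech signal according to the first time-frequency spectrum and the target noise reduction gain.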
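  4. The method according to claim 1, wherein the determining a voiced signal from the second speech signal, and performing gain compensation on the voiced signal comprises:
    performing forward homomorphic analysis on the second speech signal to obtain target cepstral coefficients of the second speech signal;
    determining a maximum cepstral coefficient from the target cepstral coefficients, and determining the signal in the second speech signal that corresponds to the maximum cepstral coefficient as the voiced signal; and
    amplifying the maximum cepstral coefficient to perform gain compensation on the voiced signal.
  5. The method according to claim 4, wherein the determining a damage compensation gain of the second speech signal according to the gain-compensated voiced signal comprises:
    performing inverse homomorphic analysis on first cepstral coefficients and the amplified maximum cepstral coefficient to obtain a first logarithmic time-frequency spectrum, wherein the first cepstral coefficients are the target cepstral coefficients other than the maximum cepstral coefficient; and
    determining a logarithmic time-frequency spectrum of the second speech signal according to a time-frequency spectrum of the second speech signal, and determining the damage compensation gain according to the difference between the first logarithmic time-frequency spectrum and the logarithmic time-frequency spectrum of the second speech signal.
  6. The method according to claim 1 or 2, wherein the second speech signal is a signal obtained by performing noise reduction on a target frequency-domain signal, and the target frequency-domain signal is a signal obtained by applying a short-time Fourier transform to the first speech signal; and
    after the performing gain compensation on the second speech signal based on the damage compensation gain, the method further comprises:
    applying an inverse time-frequency transform to the gain-compensated second speech signal to obtain a target time-domain signal, and outputting the target time-domain signal.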
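  7. A speech signal enhancement apparatus, comprising a processing module, a determining module, and a compensation module, wherein:
    the processing module is configured to perform noise reduction on a first speech signal according to a first time-frequency spectrum and a first power spectrum to obtain a second speech signal, wherein the first time-frequency spectrum indicates time-domain features and frequency-domain features of the first speech signal, and the first power spectrum is the power spectrum of a noise signal in the first speech signal;
    the determining module is configured to determine a voiced signal from the second speech signal obtained by the processing module, wherein the voiced signal is a signal in the second speech signal whose cepstral coefficient is greater than or equal to a preset threshold;
    the compensation module is configured to perform gain compensation on the voiced signal determined by the determining module;
    the determining module is further configured to determine a damage compensation gain of the second speech signal according to the gain-compensated voiced signal; and
    the compensation module is further configured to perform gain compensation on the second speech signal based on the damage compensation gain determined by the determining module.
  8. The apparatus according to claim 7, wherein the processing module is further configured to apply a short-time Fourier transform to the first speech signal to obtain the first time-frequency spectrum before noise reduction is performed on the first speech signal according to the first time-frequency spectrum and the first power spectrum;
    the determining module is further configured to determine a power spectrum of the first speech signal according to the first time-frequency spectrum, and determine a target power spectrum from the power spectrum of the first speech signal, wherein the target power spectrum is the power spectrum of the signal with the smallest power spectrum among signals within a preset time window; and
    the processing module is further configured to recursively smooth the target power spectrum determined by the determining module to obtain the first power spectrum.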
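  9. The apparatus according to claim 7 or 8, wherein the processing module is specifically configured to: determine an a posteriori signal-to-noise ratio corresponding to the first speech signal according to the first power spectrum and the power spectrum of the first speech signal, and recursively smooth the a posteriori signal-to-noise ratio to obtain an a priori signal-to-noise ratio corresponding to the first speech signal; determine a target noise reduction gain according to the a posteriori signal-to-noise ratio and the a priori signal-to-noise ratio; and perform noise reduction on the first speech signal according to the first time-frequency spectrum and the target noise reduction gain.
  10. The apparatus according to claim 7, wherein the compensation module is specifically configured to: perform forward homomorphic analysis on the second speech signal to obtain target cepstral coefficients of the second speech signal; determine a maximum cepstral coefficient from the target cepstral coefficients, and determine the signal in the second speech signal that corresponds to the maximum cepstral coefficient as the voiced signal; and amplify the maximum cepstral coefficient to perform gain compensation on the voiced signal.
  11. The apparatus according to claim 10, wherein the compensation module is specifically configured to: perform inverse homomorphic analysis on first cepstral coefficients and the amplified maximum cepstral coefficient to obtain a first logarithmic time-frequency spectrum, wherein the first cepstral coefficients are the target cepstral coefficients other than the maximum cepstral coefficient; determine a logarithmic time-frequency spectrum of the second speech signal according to a time-frequency spectrum of the second speech signal; and determine the damage compensation gain according to the difference between the first logarithmic time-frequency spectrum and the logarithmic time-frequency spectrum of the second speech signal.
  12. The apparatus according to claim 7 or 8, wherein the second speech signal is a signal obtained by performing noise reduction on a target frequency-domain signal, and the target frequency-domain signal is a signal obtained by applying a short-time Fourier transform to the first speech signal; and the apparatus further comprises an output module;
    the processing module is specifically configured to apply an inverse time-frequency transform to the gain-compensated second speech signal to obtain a target time-domain signal, after the compensation module performs gain compensation on the second speech signal based on the damage compensation gain; and
    the output module is configured to output the target time-domain signal obtained by the processing module.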
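  13. An electronic device, comprising a processor, a memory, and a program or instructions stored in the memory and executable on the processor, wherein the program or instructions, when executed by the processor, implement the steps of the speech signal enhancement method according to any one of claims 1 to 6.
  14. A readable storage medium storing a program or instructions, wherein the program or instructions, when executed by a processor, implement the steps of the speech signal enhancement method according to any one of claims 1 to 6.
  15. A speech signal enhancement apparatus, configured to perform the speech signal enhancement method according to any one of claims 1 to 6.
  16. A chip, comprising a processor and a communication interface coupled to the processor, wherein the processor is configured to run a program or instructions to implement the speech signal enhancement method according to any one of claims 1 to 6.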
PCT/CN2022/086098 2021-04-16 2022-04-11 Speech signal enhancement method and apparatus, and electronic device Ceased WO2022218254A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP22787480.7A EP4325487A4 (en) 2021-04-16 2022-04-11 METHOD AND APPARATUS FOR ENHANCING VOICE SIGNAL, AND ELECTRONIC DEVICE
US18/484,927 US12597433B2 (en) 2021-04-16 2023-10-11 Speech signal enhancement method and apparatus, and electronic device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110410394.8 2021-04-16
CN202110410394.8A CN113241089B (zh) 2021-04-16 2021-04-16 Speech signal enhancement method and apparatus, and electronic device

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/484,927 Continuation US12597433B2 (en) 2021-04-16 2023-10-11 Speech signal enhancement method and apparatus, and electronic device

Publications (1)

Publication Number Publication Date
WO2022218254A1 true WO2022218254A1 (zh) 2022-10-20

Family

ID=77128304

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/086098 Ceased WO2022218254A1 (zh) 2021-04-16 2022-04-11 Speech signal enhancement method and apparatus, and electronic device

Country Status (4)

Country Link
US (1) US12597433B2 (zh)
EP (1) EP4325487A4 (zh)
CN (1) CN113241089B (zh)
WO (1) WO2022218254A1 (zh)
