CN109727605A - Method and system for processing sound signals - Google Patents
- Publication number: CN109727605A (application CN201811645765.5A)
- Authority: CN (China)
- Prior art keywords: signal, voice signal, processed, spectral density, power spectral
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
- Landscapes: Cable Transmission Systems, Equalization Of Radio And Reduction Of Echo (AREA)
Abstract
The present invention discloses a method and system for processing a sound signal. A specific embodiment of the method includes: obtaining a sound signal to be processed, which contains a target sound signal and an interfering sound signal; determining the power spectral density of the interfering sound signal and weighting the sound signal to be processed according to this power spectral density to obtain a spectral estimate of the target sound signal; determining a masking threshold from the spectral estimate; and, when the spectral component of the interfering sound signal in the sound signal to be processed is determined to exceed the masking threshold, filtering the sound signal to be processed. The method reduces speech distortion so that the output sounds more natural, lowers the computational complexity of the algorithm, and speeds up convergence of the upstream echo canceller. It also improves robustness under strong background noise and during near-end speech.
Description
Technical field
The present invention relates to the field of signal processing, and in particular to a method and system for processing a sound signal.
Background art
In the prior art, filtering a speech signal can reduce "musical noise", but the filtered, denoised speech still sounds somewhat unnatural. When the human ear receives one sound, it may be interfered with and suppressed by another sound; this phenomenon is called the masking effect. The closer two sounds are in pitch or in time, the stronger the masking. As a result, the residual noise left after postfilter noise reduction generally loses its original character, which makes the output sound unnatural to some extent.
Summary of the invention
Embodiments of the present invention provide a method and system for processing a sound signal, intended to solve at least one of the above technical problems.
In a first aspect, an embodiment of the present invention provides a method for processing a sound signal, including: obtaining a sound signal to be processed, the sound signal to be processed containing a target sound signal and an interfering sound signal; determining the power spectral density of the interfering sound signal, and weighting the sound signal to be processed according to the power spectral density to obtain a spectral estimate of the target sound signal; determining a masking threshold from the spectral estimate; and, when the spectral component of the interfering sound signal in the sound signal to be processed is determined to be greater than the masking threshold, filtering the sound signal to be processed.
Optionally, the interfering sound signal includes a noise signal and an echo signal.
Optionally, the step of weighting the sound signal to be processed according to the power spectral density to obtain the spectral estimate of the target sound signal includes:
converting the sound signal to be processed into a frequency-domain signal $E(\Omega)$;
determining the a posteriori SNR $\mathrm{PostSNR}(\Omega)$ according to
$$\mathrm{PostSNR}(\Omega) = \frac{|E(\Omega)|^2}{R_{bb}(\Omega) + R_{nn}(\Omega)},$$
where $R_{bb}(\Omega)$ is the power spectral density of the echo signal and $R_{nn}(\Omega)$ is the power spectral density of the noise signal;
deriving the a priori SNR $\mathrm{PrioriSNR}(\Omega)$ according to
$$\mathrm{PrioriSNR}(\Omega_i) = (1-\alpha)\,P\big(\mathrm{PostSNR}(\Omega_i) - 1\big) + \alpha\,\frac{|S'(\Omega_{i-1})|^2}{R_{bb}(\Omega)},$$
where $\alpha$ is a smoothing factor, $P(x) = (|x| + x)/2$, and $S'(\Omega_{i-1})$ is the spectral estimate of the previous frame;
then computing the weighting coefficient $H_{LSA}(\Omega)$, which depends on $\theta = \mathrm{PostSNR}(\Omega)\,\mathrm{PrioriSNR}(\Omega)/(\mathrm{PrioriSNR}(\Omega) + 1)$, and obtaining the spectral estimate $S'(\Omega)$ of the target sound signal:
$$S'(\Omega) = E(\Omega)\,H_{LSA}(\Omega).$$
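The two SNR updates can be sketched per frequency bin in plain Python. This is a minimal illustration rather than the patented implementation: the function names and the default `alpha = 0.98` (a common choice for decision-directed estimators) are assumptions, and the previous-frame term is normalised by $R_{bb}(\Omega)$ only, exactly as the formula is printed.

```python
def half_wave(x: float) -> float:
    """P(x) = (|x| + x) / 2, which equals max(x, 0)."""
    return (abs(x) + x) / 2.0


def post_snr(e_mag2: float, r_bb: float, r_nn: float) -> float:
    """A posteriori SNR of one bin: |E(Omega)|^2 / (R_bb + R_nn)."""
    return e_mag2 / (r_bb + r_nn)


def priori_snr(post: float, prev_s_mag2: float, r_bb: float,
               alpha: float = 0.98) -> float:
    """Decision-directed a priori SNR of one bin in frame i.

    prev_s_mag2 is |S'(Omega_{i-1})|^2 from the previous frame.
    alpha = 0.98 is an assumed default, not a value given in the text.
    """
    return (1.0 - alpha) * half_wave(post - 1.0) + alpha * prev_s_mag2 / r_bb


# Example bin: |E|^2 = 4, R_bb = R_nn = 1, previous |S'|^2 = 0.5
gamma = post_snr(4.0, 1.0, 1.0)   # 4 / 2 = 2
xi = priori_snr(gamma, 0.5, 1.0)  # 0.02 * 1 + 0.98 * 0.5 = 0.51
```

The half-wave term keeps the instantaneous SNR estimate non-negative, while `alpha` close to 1 leans on the previous frame and suppresses frame-to-frame fluctuation.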
Optionally, the step of filtering the sound signal to be processed when the spectral component of the interfering sound signal in it is determined to be greater than the masking threshold includes:
determining the weighting coefficient $H(\Omega)$ of the filter from the power spectral densities of the echo signal and the noise signal:
$$H(\Omega) = \min\!\left(1,\ \sqrt{\frac{R_{TT}(\Omega)}{R_{bb}(\Omega)+R_{nn}(\Omega)}} + \frac{\zeta_b R_{bb}(\Omega) + \zeta_n R_{nn}(\Omega)}{R_{bb}(\Omega)+R_{nn}(\Omega)}\right),$$
where $R_{bb}(\Omega)$ is the power spectral density of the echo signal, $R_{nn}(\Omega)$ is the power spectral density of the noise signal, $R_{TT}(\Omega)$ is the masking threshold, $\zeta_b$ is the echo attenuation coefficient, and $\zeta_n$ is the noise reduction coefficient.
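The closed-form weight is cheap to evaluate per frequency bin. A minimal Python sketch, assuming strictly positive PSD values; the function name is hypothetical:

```python
import math


def psychoacoustic_gain(r_tt: float, r_bb: float, r_nn: float,
                        zeta_b: float, zeta_n: float) -> float:
    """Simplified psychoacoustic postfilter weight H(Omega) for one bin.

    r_tt: masking threshold R_TT; r_bb, r_nn: residual-echo and noise PSDs
    (assumed > 0); zeta_b, zeta_n: echo and noise attenuation coefficients.
    """
    total = r_bb + r_nn
    gain = math.sqrt(r_tt / total) + (zeta_b * r_bb + zeta_n * r_nn) / total
    return min(1.0, gain)  # the filter never amplifies


# Interference well above the threshold is attenuated ...
print(psychoacoustic_gain(1.0, 2.0, 2.0, 0.1, 0.1))   # about 0.6
# ... while interference below the threshold passes unchanged.
print(psychoacoustic_gain(10.0, 2.0, 2.0, 0.1, 0.1))  # 1.0
```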
Optionally, the step of determining the masking threshold from the spectral estimate includes:
determining, from the spectral estimate, the power spectral density $B(k)$ of each critical band of the sound signal to be processed and the spread critical-band spectrum $C(k)$:
$$C(k) = B(k) * SF(k),$$
where
$$SF(k) = 15.81 + 7.5\,(k + 0.474) - 17.5\sqrt{1 + (k + 0.474)^2},$$
and $b_h$ and $b_l$ are the upper and lower frequency bounds of each critical band;
determining the preliminary masking threshold $T(k)$ from $C(k)$ and the offset function $O(k)$:
$$T(k) = 10^{\lg C(k) - O(k)/10},$$
where $O(k) = \beta\,(14.5 + k) + (1 - \beta)\cdot 5.5$ and $\beta$ is the tonality coefficient;
determining the masking threshold $R_{TT}(\Omega)$ from $T(k)$ and the absolute threshold of hearing $T_{abs}(k)$:
$$R_{TT}(\Omega) = \max\big(T(k),\ T_{abs}(k)\big),$$
where $T_{abs}(k) = 3.64 f^{-0.8} - 6.5\,e^{-0.6 (f - 3.3)^2} + 10^{-3} f^{4}$, with $f$ the frequency in kHz.
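A sketch of the spreading step follows. It assumes, as in Johnston's perceptual model (which uses this same spreading function), that `*` in $C(k) = B(k) * SF(k)$ denotes convolution across Bark bands, that $SF$ is expressed in dB and converted to a power ratio, and that its argument is the distance in Bark from masker to maskee. The function names are hypothetical.

```python
import math


def spreading_db(dz: float) -> float:
    """SF in dB for a maskee lying dz Bark above the masker (assumed convention)."""
    x = dz + 0.474
    return 15.81 + 7.5 * x - 17.5 * math.sqrt(1.0 + x * x)


def spread_spectrum(b: list[float]) -> list[float]:
    """C(k) = sum_j B(j) * SF(k - j), with SF converted from dB to a power ratio."""
    n = len(b)
    return [
        sum(b[j] * 10.0 ** (spreading_db(k - j) / 10.0) for j in range(n))
        for k in range(n)
    ]


# A single masker in band 1: its energy leaks into the neighbouring bands,
# more strongly upward (c[2]) than downward (c[0]).
c = spread_spectrum([0.0, 1.0, 0.0])
```

The asymmetry of the function (about -4.3 dB one Bark above the masker versus about -7.9 dB one Bark below) reflects the upward spread of masking.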
Optionally, the step of obtaining the sound signal to be processed includes:
receiving an initial sound signal;
performing echo cancellation on the initial sound signal to obtain the sound signal to be processed.
Optionally, the sound signal to be processed is a speech signal.
In a second aspect, an embodiment of the present invention provides a system for processing a sound signal, including: a signal acquisition module for obtaining a sound signal to be processed, the sound signal to be processed containing a target sound signal and an interfering sound signal; a spectral estimation module for determining the power spectral density of the interfering sound signal and weighting the sound signal to be processed according to the power spectral density to obtain a spectral estimate of the target sound signal; a masking threshold determination module for determining a masking threshold from the spectral estimate; and a filtering module for filtering the sound signal to be processed when the spectral component of the interfering sound signal in it is determined to be greater than the masking threshold.
Optionally, the interfering sound signal includes a noise signal and an echo signal.
Optionally, the spectral estimation module is further configured to convert the sound signal to be processed into a frequency-domain signal $E(\Omega)$ and to determine the a posteriori SNR $\mathrm{PostSNR}(\Omega)$ according to
$$\mathrm{PostSNR}(\Omega) = \frac{|E(\Omega)|^2}{R_{bb}(\Omega) + R_{nn}(\Omega)},$$
where $R_{bb}(\Omega)$ is the power spectral density of the echo signal and $R_{nn}(\Omega)$ is the power spectral density of the noise signal;
to derive the a priori SNR $\mathrm{PrioriSNR}(\Omega)$ according to
$$\mathrm{PrioriSNR}(\Omega_i) = (1-\alpha)\,P\big(\mathrm{PostSNR}(\Omega_i) - 1\big) + \alpha\,\frac{|S'(\Omega_{i-1})|^2}{R_{bb}(\Omega)},$$
where $\alpha$ is a smoothing factor, $P(x) = (|x| + x)/2$, and $S'(\Omega_{i-1})$ is the spectral estimate of the previous frame;
and to compute the weighting coefficient $H_{LSA}(\Omega)$, which depends on $\theta = \mathrm{PostSNR}(\Omega)\,\mathrm{PrioriSNR}(\Omega)/(\mathrm{PrioriSNR}(\Omega) + 1)$, and obtain the spectral estimate $S'(\Omega)$ of the target sound signal:
$$S'(\Omega) = E(\Omega)\,H_{LSA}(\Omega).$$
Optionally, the masking threshold determination module is further configured to determine, from the spectral estimate, the power spectral density $B(k)$ of each critical band of the sound signal to be processed and the spread critical-band spectrum $C(k)$:
$$C(k) = B(k) * SF(k),$$
where
$$SF(k) = 15.81 + 7.5\,(k + 0.474) - 17.5\sqrt{1 + (k + 0.474)^2},$$
and $b_h$ and $b_l$ are the upper and lower frequency bounds of each critical band;
to determine the preliminary masking threshold $T(k)$ from $C(k)$ and the offset function $O(k)$:
$$T(k) = 10^{\lg C(k) - O(k)/10},$$
where $O(k) = \beta\,(14.5 + k) + (1 - \beta)\cdot 5.5$ and $\beta$ is the tonality coefficient;
and to determine the masking threshold $R_{TT}(\Omega)$ from $T(k)$ and the absolute threshold of hearing $T_{abs}(k)$:
$$R_{TT}(\Omega) = \max\big(T(k),\ T_{abs}(k)\big),$$
where $T_{abs}(k) = 3.64 f^{-0.8} - 6.5\,e^{-0.6 (f - 3.3)^2} + 10^{-3} f^{4}$, with $f$ the frequency in kHz.
Optionally, the filtering module is further configured to determine the weighting coefficient $H(\Omega)$ of the filter from the power spectral densities of the echo signal and the noise signal:
$$H(\Omega) = \min\!\left(1,\ \sqrt{\frac{R_{TT}(\Omega)}{R_{bb}(\Omega)+R_{nn}(\Omega)}} + \frac{\zeta_b R_{bb}(\Omega) + \zeta_n R_{nn}(\Omega)}{R_{bb}(\Omega)+R_{nn}(\Omega)}\right),$$
where $R_{bb}(\Omega)$ is the power spectral density of the echo signal, $R_{nn}(\Omega)$ is the power spectral density of the noise signal, $\zeta_b$ is the echo attenuation coefficient, and $\zeta_n$ is the noise reduction coefficient.
Optionally, the signal acquisition module is further configured to receive an initial sound signal and to perform echo cancellation on it to obtain the sound signal to be processed.
In a third aspect, an embodiment of the present invention provides a storage medium storing one or more programs containing executable instructions. The executable instructions can be read and executed by an electronic device (including but not limited to a computer, a server, or a network device) to perform any of the above methods for processing a sound signal of the present invention.
In a fourth aspect, an electronic device is provided, including at least one processor and a memory communicatively connected to the at least one processor. The memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that it can perform any of the above methods for processing a sound signal of the present invention.
In a fifth aspect, an embodiment of the present invention further provides a computer program product, including a computer program stored on a storage medium. The computer program includes program instructions that, when executed by a computer, cause the computer to perform any of the above methods for processing a sound signal.
The embodiments of the present invention have the following beneficial effects. Speech distortion is reduced and the output sounds more natural. The masking threshold is derived from the computed power spectral density (PSD) of the interfering sound signal, which reduces the computational complexity of the algorithm. The order required of the upstream echo-cancellation filter is also reduced, which speeds up convergence of the upstream echo canceller. Finally, robustness under strong background noise and during near-end speech is improved.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings required for describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of an embodiment of the method for processing a sound signal according to the invention;
Fig. 2 is a flowchart of another embodiment of the method for processing a sound signal according to the invention;
Fig. 3 is a schematic diagram of an embodiment of a system implementing the method for processing a sound signal according to the invention;
Fig. 4 is a schematic diagram of an embodiment of the method for processing a sound signal according to the invention;
Fig. 5 is a schematic diagram of an embodiment of the system for processing a sound signal according to the invention;
Fig. 6 is a structural schematic diagram of an embodiment of the electronic device according to the invention.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
It should be noted that, where they do not conflict, the embodiments of the present application and the features in them may be combined with one another.
The present invention can be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, elements, data structures and the like that perform specific tasks or implement specific abstract data types. The invention can also be practised in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
In the present invention, terms such as "module", "device" and "system" refer to computer-related entities, such as hardware, a combination of hardware and software, software, or software in execution. Specifically, an element may be, but is not limited to, a process running on a processor, a processor, an object, an executable element, a thread of execution, a program, and/or a computer. An application or script running on a server, or the server itself, may also be an element. One or more elements may reside within a process and/or thread of execution; an element may be localised on one computer and/or distributed between two or more computers, and may run from various computer-readable media. Elements may also communicate through local and/or remote processes according to a signal having one or more data packets, for example a signal carrying data from one element interacting with another element in a local or distributed system, and/or interacting with other systems across a network such as the Internet.
Finally, it should be noted that relational terms such as "first" and "second" are used herein only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include" and "comprise" cover not only the listed elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element introduced by "including a ..." does not exclude the presence of additional identical elements in the process, method, article or device that includes it.
As shown in Fig. 1, an embodiment of the present invention provides a method for processing a sound signal, including:
Step S11: obtaining a sound signal to be processed, which contains a target sound signal and an interfering sound signal.
Step S12: determining the power spectral density of the interfering sound signal, and weighting the sound signal to be processed according to it to obtain a spectral estimate of the target sound signal. Specifically, after the power spectral density of the interfering sound signal is determined, the a posteriori and a priori SNRs are determined, a weighting coefficient is computed from these SNRs, and the sound signal to be processed is weighted by it to obtain the spectral estimate of the target sound signal.
Step S13: determining a masking threshold from the spectral estimate.
Step S14: filtering the sound signal to be processed when the spectral component of the interfering sound signal in it is determined to be greater than the masking threshold.
In this embodiment, the masking threshold is computed as follows.
From the spectral estimate, determine the power spectral density $B(k)$ of each critical band of the sound signal to be processed and the spread critical-band spectrum $C(k)$:
$$C(k) = B(k) * SF(k),$$
where
$$SF(k) = 15.81 + 7.5\,(k + 0.474) - 17.5\sqrt{1 + (k + 0.474)^2},$$
and $b_h$ and $b_l$ are the upper and lower frequency bounds of each critical band;
determine the preliminary masking threshold $T(k)$ from $C(k)$ and the offset function $O(k)$:
$$T(k) = 10^{\lg C(k) - O(k)/10},$$
where $O(k) = \beta\,(14.5 + k) + (1 - \beta)\cdot 5.5$ and $\beta$ is the tonality coefficient;
determine the masking threshold $R_{TT}(\Omega)$ from $T(k)$ and the absolute threshold of hearing $T_{abs}(k)$:
$$R_{TT}(\Omega) = \max\big(T(k),\ T_{abs}(k)\big),$$
where $T_{abs}(k) = 3.64 f^{-0.8} - 6.5\,e^{-0.6 (f - 3.3)^2} + 10^{-3} f^{4}$, with $f$ the frequency in kHz.
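The band powers $B(k)$ used above can be formed by grouping FFT bins into Bark bands. The sketch below is one possible reading, not the patent's exact band layout: it assumes Zwicker's Hz-to-Bark approximation, integer Bark boundaries as the band edges $b_l$ and $b_h$, and $|S'(\Omega)|^2$ as the per-bin power. The function names are hypothetical.

```python
import math


def hz_to_bark(f_hz: float) -> float:
    """Zwicker-style Hz -> Bark approximation (an assumed choice of formula)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def critical_band_powers(power: list[float], sample_rate: float) -> list[float]:
    """B(k): sum of bin powers |S'(Omega)|^2 falling in Bark band k.

    `power` holds bins 0..N/2 of an N-point FFT, so N = 2 * (len(power) - 1).
    """
    nfft = 2 * (len(power) - 1)
    bands: dict[int, float] = {}
    for i, p in enumerate(power):
        k = int(hz_to_bark(i * sample_rate / nfft))  # band index = integer Bark
        bands[k] = bands.get(k, 0.0) + p
    return [bands.get(k, 0.0) for k in range(max(bands) + 1)]


# A 512-point FFT at 16 kHz yields 257 bins, grouped into about 22 bands.
b = critical_band_powers([1.0] * 257, 16000.0)
```

Because the number of Bark bands below Nyquist depends on the sampling rate, the range of $k$ changes with it, which matches the remark that $k$ is related to the sampling rate.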
By computing the power spectral density (PSD) of the interfering sound signal and deriving the masking threshold from it, this embodiment reduces the computational complexity of the algorithm. It also reduces the order required of the upstream echo-cancellation filter, which speeds up convergence of the upstream echo canceller, and improves robustness under strong background noise and during near-end speech.
As shown in Fig. 2, an embodiment of the present invention provides a method for processing a sound signal, including:
Step S21: receiving an initial sound signal, which may be picked up by a sound-pickup device such as a microphone.
Step S22: performing echo cancellation on the initial sound signal with an echo canceller to obtain the sound signal to be processed.
Step S23: determining the power spectral density of the interfering sound signal, and weighting the sound signal to be processed according to it to obtain a spectral estimate of the target sound signal.
Step S24: determining a masking threshold from the spectral estimate.
Step S25: filtering the sound signal to be processed when the spectral component of the interfering sound signal in it is determined to be greater than the masking threshold.
Performing a preliminary echo cancellation on the initial signal as soon as it is received improves the accuracy of the subsequent sound-signal processing in this embodiment.
When the sound signal to be processed contains a noise signal and an echo signal, weighting it according to the power spectral density to obtain the spectral estimate of the target sound signal proceeds as follows:
convert the sound signal to be processed into a frequency-domain signal $E(\Omega)$;
determine the a posteriori SNR $\mathrm{PostSNR}(\Omega)$ according to
$$\mathrm{PostSNR}(\Omega) = \frac{|E(\Omega)|^2}{R_{bb}(\Omega) + R_{nn}(\Omega)},$$
where $R_{bb}(\Omega)$ is the power spectral density of the echo signal and $R_{nn}(\Omega)$ is the power spectral density of the noise signal;
derive the a priori SNR $\mathrm{PrioriSNR}(\Omega)$ according to
$$\mathrm{PrioriSNR}(\Omega_i) = (1-\alpha)\,P\big(\mathrm{PostSNR}(\Omega_i) - 1\big) + \alpha\,\frac{|S'(\Omega_{i-1})|^2}{R_{bb}(\Omega)},$$
where $\alpha$ is a smoothing factor, $P(x) = (|x| + x)/2$, and $S'(\Omega_{i-1})$ is the spectral estimate of the previous frame;
then compute the weighting coefficient $H_{LSA}(\Omega)$, which depends on $\theta = \mathrm{PostSNR}(\Omega)\,\mathrm{PrioriSNR}(\Omega)/(\mathrm{PrioriSNR}(\Omega) + 1)$, and obtain the spectral estimate $S'(\Omega)$ of the target sound signal:
$$S'(\Omega) = E(\Omega)\,H_{LSA}(\Omega).$$
The step of filtering the sound signal to be processed when the spectral component of the interfering sound signal in it is determined to be greater than the masking threshold includes:
determining the weighting coefficient $H(\Omega)$ of the filter from the power spectral densities of the echo signal and the noise signal:
$$H(\Omega) = \min\!\left(1,\ \sqrt{\frac{R_{TT}(\Omega)}{R_{bb}(\Omega)+R_{nn}(\Omega)}} + \frac{\zeta_b R_{bb}(\Omega) + \zeta_n R_{nn}(\Omega)}{R_{bb}(\Omega)+R_{nn}(\Omega)}\right),$$
where $R_{bb}(\Omega)$ is the power spectral density of the echo signal, $R_{nn}(\Omega)$ is the power spectral density of the noise signal, $\zeta_b$ is the echo attenuation coefficient, and $\zeta_n$ is the noise reduction coefficient.
This embodiment preserves the original character of the background noise, makes the residual echo sound more like noise, and reduces speech distortion, so that the output sounds more natural. It also reduces the order required of the upstream echo-cancellation filter, which speeds up convergence of the upstream echo canceller while lowering the computational complexity of the echo canceller, and improves robustness under strong background noise and during near-end speech.
As shown in Fig. 3, in a system implementing the method for processing a sound signal of the invention, the speech signal transmitted from the far-end microphone is played back through the loudspeaker and forms the initial echo signal $d(k)$. The near-end microphone picks up the signal $y(k)$, which contains the clean speech signal $s(k)$ (the target sound signal), the noise signal $n(k)$, and the initial echo signal $d(k)$ fed back from the loudspeaker through the loudspeaker-room-microphone (LRM) path. First, the echo canceller C cancels the echo in the signal $y(k)$ picked up by the near-end microphone; the filter H then filters the result further.
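The text does not say which adaptive algorithm the echo canceller C uses. Purely as an illustration, the sketch below assumes a normalised LMS (NLMS) FIR canceller, a common choice but not necessarily the one in the patent; the tap count, step size and toy echo path are hypothetical.

```python
import random


def nlms_echo_canceller(far: list[float], mic: list[float], taps: int = 8,
                        mu: float = 0.5, eps: float = 1e-6) -> list[float]:
    """Return e(k) = y(k) - d_hat(k), adapting an FIR echo-path estimate by NLMS."""
    w = [0.0] * taps    # estimated echo path
    buf = [0.0] * taps  # most recent far-end samples, newest first
    out = []
    for x, y in zip(far, mic):
        buf = [x] + buf[:-1]
        d_hat = sum(wi * bi for wi, bi in zip(w, buf))  # echo estimate
        e = y - d_hat                                   # echo-cancelled sample
        step = mu * e / (eps + sum(bi * bi for bi in buf))
        w = [wi + step * bi for wi, bi in zip(w, buf)]
        out.append(e)
    return out


# Toy LRM path d(k) = 0.5 x(k) - 0.3 x(k-1), with no near-end speech or noise.
rng = random.Random(0)
far = [rng.gauss(0.0, 1.0) for _ in range(2000)]
mic = [0.5 * far[k] - 0.3 * (far[k - 1] if k else 0.0) for k in range(len(far))]
residual = nlms_echo_canceller(far, mic)
```

In this noise-free toy case the residual decays towards zero; with real noise, near-end speech and a long room response, some residual echo remains, which is what the postfilter H is meant to handle.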
As shown in Fig. 4, an embodiment of the present invention provides a method for processing a sound signal, including:
The near-end microphone picks up the signal $y(k)$, which contains the clean speech signal $s(k)$, the noise signal $n(k)$, and the initial echo signal $d(k)$ fed back from the loudspeaker through the LRM path. In this embodiment, the clean speech signal $s(k)$ is the target information.
The echo canceller cancels the echo in $y(k)$ to obtain the echo-cancelled signal $e(k)$. The interfering sound signals remaining in $e(k)$ are the noise signal and the residual echo signal.
The noise PSD $R_{nn}(\Omega)$ and the residual-echo PSD $R_{bb}(\Omega)$ are estimated by statistical or autocorrelation methods.
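The text only says the PSDs are estimated "by statistical or autocorrelation methods". One common statistical option, shown here as an assumption rather than the patent's procedure, is first-order recursive averaging of the per-bin power. How the frames feeding each estimate are selected (for example, noise-only frames for $R_{nn}(\Omega)$) is not specified and is left open; the function name and `beta = 0.9` are hypothetical.

```python
def smooth_psd(prev_psd: list[float], frame_power: list[float],
               beta: float = 0.9) -> list[float]:
    """One recursive-averaging update: R(Omega) <- beta R(Omega) + (1 - beta) |X(Omega)|^2."""
    return [beta * r + (1.0 - beta) * p for r, p in zip(prev_psd, frame_power)]


# Feeding a constant per-bin power of 2.0 drives the estimate towards 2.0.
psd = [0.0, 0.0]
for _ in range(200):
    psd = smooth_psd(psd, [2.0, 2.0])
```

A larger `beta` gives a smoother but slower-tracking estimate; the time constant is roughly `1 / (1 - beta)` frames.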
The postfilter weights the echo-cancelled near-end microphone signal to obtain a preliminary estimate $S'(\Omega)$ of the clean speech spectrum. The procedure is:
A) Compute the a posteriori SNR:
$$\mathrm{PostSNR}(\Omega) = \frac{|E(\Omega)|^2}{R_{bb}(\Omega)+R_{nn}(\Omega)}$$
B) Derive the a priori SNR with the decision-directed method:
$$\mathrm{PrioriSNR}(\Omega_i) = (1-\alpha)\,P\big(\mathrm{PostSNR}(\Omega_i) - 1\big) + \alpha\,\frac{|S'(\Omega_{i-1})|^2}{R_{bb}(\Omega)}$$
where $\alpha$ is the smoothing factor, $P(x) = (|x|+x)/2$, and $S'(\Omega_{i-1})$ is the preliminary estimate for the previous frame.
C) Define $\theta = \mathrm{PostSNR}(\Omega)\,\mathrm{PrioriSNR}(\Omega)/(\mathrm{PrioriSNR}(\Omega)+1)$, then compute the weighting coefficient $H_{LSA}(\Omega)$.
D) Apply the weighting to obtain the preliminary estimate of the speech signal: $S'(\Omega) = E(\Omega)\,H_{LSA}(\Omega)$.
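The expression for $H_{LSA}(\Omega)$ is not reproduced in this text. Assuming it is the standard MMSE log-spectral amplitude (MMSE-LSA) gain of Ephraim and Malah, $H_{LSA} = \frac{\mathrm{PrioriSNR}}{1+\mathrm{PrioriSNR}}\exp\big(\tfrac12 E_1(\theta)\big)$ with $E_1$ the exponential integral, a dependency-free sketch is below. The `exp1` routine (power series for $x \le 1$, continued fraction above) stands in for a library function such as `scipy.special.exp1`; both function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329


def exp1(x: float) -> float:
    """Exponential integral E1(x) = integral from x to inf of e^-t / t dt, x > 0."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    if x <= 1.0:
        # Series: E1(x) = -gamma - ln x - sum_{k>=1} (-x)^k / (k * k!)
        total, term, k = 0.0, 1.0, 1
        while True:
            term *= -x / k  # term = (-x)^k / k!
            total += term / k
            if abs(term / k) < 1e-17:
                break
            k += 1
        return -EULER_GAMMA - math.log(x) - total
    # Continued fraction evaluated with the modified Lentz method (x > 1)
    b = x + 1.0
    c = 1e30
    d = 1.0 / b
    h = d
    for i in range(1, 1000):
        a = -float(i * i)
        b += 2.0
        d = 1.0 / (a * d + b)
        c = b + a / c
        delta = c * d
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h * math.exp(-x)


def lsa_gain(priori: float, post: float) -> float:
    """Assumed MMSE-LSA gain: xi / (1 + xi) * exp(0.5 * E1(theta))."""
    if priori <= 0.0 or post <= 0.0:
        return 0.0  # gain tends to 0 as xi -> 0; an empty bin needs no gain
    theta = post * priori / (priori + 1.0)
    return priori / (priori + 1.0) * math.exp(0.5 * exp1(theta))
```

At high SNR $E_1(\theta)$ vanishes and the gain approaches $\mathrm{PrioriSNR}/(1+\mathrm{PrioriSNR})$, close to 1, so strong speech bins pass almost unchanged.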
Next, the masking threshold $R_{TT}(\Omega)$ is estimated from the preliminary speech spectrum estimate $S'(\Omega)$. The procedure is:
A) Perform critical-band analysis on the signal. According to critical-band theory, the human ear is treated as a bank of discrete band-pass filters, and the width of one critical band is called one Bark. The power spectral density of each critical band is
$$B(k) = \sum_{\omega = b_l}^{b_h} |S'(\omega)|^2,$$
where $b_h$ and $b_l$ are the upper and lower frequency bounds of critical band $k$, and the range of $k$ depends on the sampling rate.
B) Compute the spreading function $SF(k)$:
$$SF(k) = 15.81 + 7.5\,(k + 0.474) - 17.5\sqrt{1 + (k + 0.474)^2}$$
Because critical bands influence one another, the spread critical-band spectrum can be expressed as $C(k) = B(k) * SF(k)$.
C) Compute the masking threshold $R_{TT}(\Omega)$ for the noise and residual echo.
There are two kinds of masking threshold: the threshold at which a pure tone masks noise and residual echo is $C(k) - (14.5 + k)$ dB, and the threshold at which noise and residual echo mask a pure tone is $C(k) - 5.5$ dB.
It is therefore necessary to determine whether the signal resembles a pure tone or resembles noise and residual echo, for which the spectral flatness measure (SFM) is defined:
$$\mathrm{SFM} = 10\lg\frac{G}{A},$$
where $G$ and $A$ are the geometric mean and the arithmetic mean of the power spectral density.
The tonality coefficient is defined as $\beta = \min(\mathrm{SFM}/\mathrm{SFM}_{max},\ 1)$.
From $\beta$, the offset function $O(k)$ of the masking energy in each band is
$$O(k) = \beta\,(14.5 + k) + (1 - \beta)\cdot 5.5,$$
and the masking threshold is
$$T(k) = 10^{\lg C(k) - O(k)/10}.$$
The threshold obtained in the Bark domain from the spreading function is then compared with the absolute threshold of hearing of the human ear. If the computed masking threshold is lower than the absolute threshold of hearing, the absolute threshold is used instead, where the absolute threshold of hearing $T_{abs}(k)$ is defined as
$$T_{abs}(k) = 3.64 f^{-0.8} - 6.5\,e^{-0.6 (f - 3.3)^2} + 10^{-3} f^{4} \quad (f \text{ in kHz}).$$
The final masking threshold is therefore $R_{TT}(\Omega) = \max\big(T(k),\ T_{abs}(k)\big)$.
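Step C and the comparison with the threshold in quiet can be sketched in plain Python. Assumptions, none of which the text states: $\mathrm{SFM}_{max} = -60$ dB (the value used in Johnston's perceptual model), band numbers $k$ starting at 1, $C(k) > 0$, frequencies in kHz for $T_{abs}$, and a masking threshold already calibrated to dB SPL before it is compared with $T_{abs}$. The function names are hypothetical.

```python
import math


def tonality(power: list[float], sfm_db_max: float = -60.0) -> float:
    """Tonality coefficient beta = min(SFM / SFM_max, 1), SFM = 10 lg(G / A) in dB."""
    eps = 1e-12  # avoids log(0) on empty bins
    g = math.exp(sum(math.log(p + eps) for p in power) / len(power))
    a = sum(power) / len(power) + eps
    sfm_db = 10.0 * math.log10(g / a)  # <= 0 dB; about 0 dB for a flat spectrum
    return min(max(sfm_db / sfm_db_max, 0.0), 1.0)  # clamp at 0 against rounding


def masking_threshold(c: list[float], beta: float) -> list[float]:
    """T(k) = 10^(lg C(k) - O(k)/10), O(k) = beta(14.5 + k) + (1 - beta)5.5, k from 1."""
    out = []
    for k, ck in enumerate(c, start=1):
        offset_db = beta * (14.5 + k) + (1.0 - beta) * 5.5
        out.append(10.0 ** (math.log10(ck) - offset_db / 10.0))
    return out


def absolute_threshold_db(f_khz: float) -> float:
    """Threshold in quiet in dB SPL (Terhardt's approximation), f in kHz."""
    return (3.64 * f_khz ** -0.8
            - 6.5 * math.exp(-0.6 * (f_khz - 3.3) ** 2)
            + 1e-3 * f_khz ** 4)


def final_threshold_db(t_db: float, f_khz: float) -> float:
    """Raise a masking threshold (assumed already in dB SPL) to the threshold in quiet."""
    return max(t_db, absolute_threshold_db(f_khz))
```

A peaky (tone-like) spectrum gives $\beta = 1$ and the larger offset $14.5 + k$ dB, while a flat (noise-like) spectrum gives $\beta \approx 0$ and the 5.5 dB offset.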
Next, psychoacoustic weighted filtering is applied to the echo-cancelled frequency-domain microphone signal $E(\Omega)$. The FFT (fast Fourier transform) converts the time-domain digital signal into a frequency-domain signal. For each spectral component of $E(\Omega)$, the method checks whether the noise spectral component is below the masking threshold: if it is, the component is kept without processing; otherwise the corresponding noise spectral component is attenuated according to the conventional MMSE-LSA method.
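The keep-or-attenuate rule can be sketched per frequency bin as follows. Treating $R_{bb}(\Omega) + R_{nn}(\Omega)$ as the "noise spectral component" compared against $R_{TT}(\Omega)$, and passing the attenuation rule in as a callable, are assumptions made for illustration; the function name is hypothetical.

```python
from typing import Callable


def psychoacoustic_postfilter(e_bins: list[complex], r_bb: list[float],
                              r_nn: list[float], r_tt: list[float],
                              attenuation: Callable[[int], float]) -> list[complex]:
    """Keep bins whose interference lies below the masking threshold, attenuate the rest.

    attenuation(i) returns the gain for bin i (for example an MMSE-LSA gain); it is
    only evaluated for bins whose interference power exceeds the threshold.
    """
    out = []
    for i, e in enumerate(e_bins):
        if r_bb[i] + r_nn[i] <= r_tt[i]:
            out.append(e)                   # masked, hence inaudible: leave untouched
        else:
            out.append(e * attenuation(i))  # audible: suppress
    return out


y = psychoacoustic_postfilter([1 + 1j, 2 + 0j], [0.1, 1.0], [0.1, 1.0], [0.5, 0.5],
                              attenuation=lambda i: 0.25)
```

Leaving masked bins untouched is what preserves the character of the background noise, since only audible interference is modified.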
The psychoacoustic weighting filter coefficients are derived as follows.
The design objective of the psychoacoustic adaptive weighting filter is to minimise the distortion of the near-end speech signal subject to the sum of the residual-echo distortion and the noise distortion being equal to the masking threshold. The optimal psychoacoustic weighting filter coefficient $H(\Omega)$ therefore satisfies
$$[\zeta_b - H(\Omega)]^2 R_{bb}(\Omega) + [\zeta_n - H(\Omega)]^2 R_{nn}(\Omega) = R_{TT}(\Omega),$$
where $\zeta_b$ is the residual-echo attenuation coefficient, typically chosen so that $20\lg\zeta_b = -35$ dB, and $\zeta_n$ is the noise reduction coefficient, typically chosen so that $20\lg\zeta_n = -15$ dB.
Since $0 \le H(\Omega) \le 1$, solving this quadratic equation for $H(\Omega)$ and taking the positive root gives
$$H(\Omega) = \min\!\left(1,\ \frac{\zeta_b R_{bb}(\Omega) + \zeta_n R_{nn}(\Omega) + \sqrt{[R_{bb}(\Omega)+R_{nn}(\Omega)]\,R_{TT}(\Omega) - (\zeta_b-\zeta_n)^2 R_{bb}(\Omega)\,R_{nn}(\Omega)}}{R_{bb}(\Omega)+R_{nn}(\Omega)}\right).$$
Because $\zeta_b$ and $\zeta_n$ are much smaller than 1, and $R_{TT}(\Omega)$ is usually not small relative to $R_{bb}(\Omega)$ and $R_{nn}(\Omega)$, the expression simplifies to
$$H(\Omega) = \min\!\left(1,\ \sqrt{\frac{R_{TT}(\Omega)}{R_{bb}(\Omega)+R_{nn}(\Omega)}} + \frac{\zeta_b R_{bb}(\Omega) + \zeta_n R_{nn}(\Omega)}{R_{bb}(\Omega)+R_{nn}(\Omega)}\right).$$
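The exact root and its simplification can be compared numerically. The sketch below uses the typical values $20\lg\zeta_b = -35$ dB and $20\lg\zeta_n = -15$ dB quoted above; the PSD and threshold values are arbitrary test inputs and the function names are hypothetical. It also checks that the exact root satisfies the design equation, which holds because the cross term under the square root is $R_{bb}(\Omega)\,R_{nn}(\Omega)$.

```python
import math


def exact_gain(r_tt: float, r_bb: float, r_nn: float,
               zeta_b: float, zeta_n: float) -> float:
    """Positive root of (zeta_b - H)^2 R_bb + (zeta_n - H)^2 R_nn = R_TT, clipped to 1."""
    total = r_bb + r_nn
    disc = total * r_tt - (zeta_b - zeta_n) ** 2 * r_bb * r_nn
    h = (zeta_b * r_bb + zeta_n * r_nn + math.sqrt(max(disc, 0.0))) / total
    return min(1.0, h)


def approx_gain(r_tt: float, r_bb: float, r_nn: float,
                zeta_b: float, zeta_n: float) -> float:
    """Simplified form that drops the (zeta_b - zeta_n)^2 cross term."""
    total = r_bb + r_nn
    return min(1.0, math.sqrt(r_tt / total) + (zeta_b * r_bb + zeta_n * r_nn) / total)


zeta_b = 10.0 ** (-35.0 / 20.0)  # 20 lg(zeta_b) = -35 dB, about 0.0178
zeta_n = 10.0 ** (-15.0 / 20.0)  # 20 lg(zeta_n) = -15 dB, about 0.178

h = exact_gain(2.0, 2.0, 1.0, zeta_b, zeta_n)
residual = (zeta_b - h) ** 2 * 2.0 + (zeta_n - h) ** 2 * 1.0 - 2.0
```

For these inputs the simplified gain differs from the exact root by well under 0.01, consistent with the approximation argument above.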
Because the psychoacoustic postfilter also reduces the order required of the upstream adaptive echo-cancellation filter, this embodiment can speed up convergence of the echo canceller, reduce computational complexity, and improve robustness under strong background noise and during near-end speech.
In addition, residual-echo suppression is merged into the psychoacoustic weighting postfilter, and the residual echo is used to adaptively update the filter weighting coefficients, further suppressing the acoustic echo. Moreover, noise spectral components and residual-echo components below the masking threshold are not heard because of the masking effect of the human ear, so they need not be attenuated. Only the noise spectral components and residual-echo components that are not masked by the speech signal are attenuated with the conventional adaptive postfiltering method. This preserves the original character of the background noise well, makes the residual echo sound more like noise, and reduces speech distortion, so that the output sounds more natural.
It should be noted that, for simplicity of description, the foregoing method embodiments are expressed as a series of combined actions, but those skilled in the art should understand that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in other orders or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are preferred embodiments, and the actions and modules involved are not necessarily required by the present invention. In the above embodiments, each description has its own emphasis; for parts not detailed in one embodiment, refer to the related descriptions of other embodiments.
As shown in Fig. 5, an embodiment of the present invention further provides a system 500 for processing a sound signal, including:
a signal acquisition module 510 for obtaining a sound signal to be processed, which contains a target sound signal and an interfering sound signal;
a spectral estimation module 520 for determining the power spectral density of the interfering sound signal and weighting the sound signal to be processed according to it to obtain a spectral estimate of the target sound signal;
a masking threshold determination module 530 for determining a masking threshold from the spectral estimate;
a filtering module 540 for filtering the sound signal to be processed when the spectral component of the interfering sound signal in it is determined to be greater than the masking threshold.
Further, the interfering sound signal includes a noise signal and an echo signal.
The spectrum estimation determination module is further configured to convert the sound signal to be processed into a frequency-domain signal $E(\Omega)$ and to determine the a posteriori signal-to-noise ratio $\mathrm{PostSNR}(\Omega)$ according to the following formula:
$$\mathrm{PostSNR}(\Omega) = \frac{|E(\Omega)|^2}{R_{bb}(\Omega) + R_{nn}(\Omega)},$$
where $R_{bb}(\Omega)$ is the power spectral density of the echo signal and $R_{nn}(\Omega)$ is the power spectral density of the noise signal;
to derive the a priori signal-to-noise ratio $\mathrm{PrioriSNR}(\Omega)$ according to the following formula:
$$\mathrm{PrioriSNR}(\Omega_i) = (1-\alpha)\,P\big(\mathrm{PostSNR}(\Omega_i) - 1\big) + \alpha\,\frac{|S'(\Omega_{i-1})|^2}{R_{bb}(\Omega)},$$
where $\alpha$ is a smoothing factor, $P(x) = (|x| + x)/2$, and $S'(\Omega_{i-1})$ is the spectrum estimate of the previous frame of the sound signal;
and to further calculate the weighting coefficient $H_{LSA}(\Omega)$ and obtain the spectrum estimate $S'(\Omega)$ of the target sound signal:
$$S'(\Omega) = E(\Omega)\,H_{LSA}(\Omega),$$
where $\theta = \mathrm{PostSNR}(\Omega)\,\mathrm{PrioriSNR}(\Omega) / \big(\mathrm{PrioriSNR}(\Omega) + 1\big)$.
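The weighting step above can be sketched per frequency bin in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the formula for $H_{LSA}(\Omega)$ itself does not appear in this text, so the sketch assumes the standard Ephraim-Malah log-spectral-amplitude (LSA) gain $H = \frac{\xi}{1+\xi}\exp\big(\tfrac12 E_1(\theta)\big)$, with $\xi = \mathrm{PrioriSNR}$ and $E_1$ the exponential integral, which uses $\theta$ exactly as defined above. The function names, the floor `xi_min`, the gain clip at 1 and the default `alpha=0.98` are hypothetical choices. The a priori SNR divides by $R_{bb}(\Omega)$ only, following the text literally.

```python
import math

EULER_GAMMA = 0.5772156649015329

def expint_e1(x):
    """Exponential integral E1(x), x > 0: power series for x <= 1, continued fraction otherwise."""
    if x <= 0.0:
        raise ValueError("E1 is evaluated here only for x > 0")
    if x <= 1.0:
        total, term = -math.log(x) - EULER_GAMMA, 1.0
        for i in range(1, 100):
            term *= -x / i                    # term = (-x)^i / i!
            delta = -term / i                 # (-1)^(i+1) x^i / (i * i!)
            total += delta
            if abs(delta) < abs(total) * 1e-15:
                break
        return total
    # Modified Lentz evaluation of the continued fraction for E1
    b = x + 1.0
    c = 1.0 / 1e-300
    d = 1.0 / b
    h = d
    for i in range(1, 200):
        a = -float(i * i)
        b += 2.0
        d = 1.0 / (a * d + b)
        c = b + a / c
        delta = c * d
        h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return h * math.exp(-x)

def lsa_frame(E, Rbb, Rnn, S_prev, alpha=0.98, xi_min=1e-3, eps=1e-12):
    """One frame: PostSNR, decision-directed PrioriSNR, assumed LSA gain, S'(w) = E(w) * H(w)."""
    S_est = []
    for e, rbb, rnn, s_prev in zip(E, Rbb, Rnn, S_prev):
        post = abs(e) ** 2 / (rbb + rnn + eps)                   # PostSNR
        p = (abs(post - 1.0) + (post - 1.0)) / 2.0               # P(x) = (|x| + x) / 2
        prior = (1.0 - alpha) * p + alpha * abs(s_prev) ** 2 / (rbb + eps)
        prior = max(prior, xi_min)                               # floor (assumption)
        theta = max(post * prior / (prior + 1.0), 1e-10)         # theta as defined in the text
        gain = prior / (prior + 1.0) * math.exp(0.5 * expint_e1(theta))
        S_est.append(e * min(gain, 1.0))                         # clip at 1 (assumption)
    return S_est
```

Calling `lsa_frame` once per frame and passing its output back as `S_prev` for the next frame gives the recursive use of $S'(\Omega_{i-1})$ described above.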
The masking threshold determination module is further configured to determine, according to the spectrum estimate, the power spectral density $B(k)$ of each critical band of the sound signal to be processed and the spread critical-band spectrum $C(k)$:
$$C(k) = B(k) * SF(k),$$
where $SF(k) = 15.81 + 7.5\,(k + 0.474) - 17.5\sqrt{1 + (k + 0.474)^2}$, and $b_h$ and $b_l$ are the upper and lower frequency limits of each critical band;
to determine a preliminary masking threshold $T(k)$ according to the spread critical-band spectrum $C(k)$ and an offset function $O(k)$:
$$T(k) = 10^{\lg C(k) - O(k)/10},$$
where the offset function is $O(k) = \beta\,(14.5 + k) + (1 - \beta) \times 5.5$ and $\beta$ is the tonality coefficient;
and to determine the masking threshold $R_{TT}(\Omega)$ according to the preliminary masking threshold $T(k)$ and the absolute threshold of hearing $T_{abs}(k)$:
$$R_{TT}(\Omega) = \min\big(T(k),\, T_{abs}(k)\big),$$
where $T_{abs}(k) = 3.64 f^{-0.8} - 6.5\,e^{-0.6 (f - 3.3)^2} + 10^{-3} f^4$, with $f$ in kHz.
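The masking-threshold computation resembles the classic Johnston psychoacoustic model and can be sketched as follows. The sketch rests on assumptions the text leaves open: $B(k)$ is taken as the power $|S'(\omega)|^2$ summed over the bins between each band's limits $b_l$ and $b_h$; the `*` in $C(k) = B(k) * SF(k)$ is read as a convolution across bands, with the dB spreading function converted to linear power and one band treated as one Bark; $T_{abs}$ is evaluated at each band's centre frequency and converted from dB with an arbitrary calibration (0 dB maps to 1.0). The band layout and all function names are hypothetical. The code keeps $\min(T, T_{abs})$ as written in the text, whereas Johnston's original model combines the two with a maximum.

```python
import math

def spreading_db(dz):
    """Schroeder spreading function (dB) for a distance dz from masker band to maskee band."""
    return 15.81 + 7.5 * (dz + 0.474) - 17.5 * math.sqrt(1.0 + (dz + 0.474) ** 2)

def offset_db(k, beta):
    """Offset O(k) = beta*(14.5 + k) + (1 - beta)*5.5; beta is the tonality coefficient."""
    return beta * (14.5 + k) + (1.0 - beta) * 5.5

def t_abs_db(f_khz):
    """Absolute threshold of hearing in dB, f in kHz (Terhardt's approximation)."""
    return 3.64 * f_khz ** -0.8 - 6.5 * math.exp(-0.6 * (f_khz - 3.3) ** 2) + 1e-3 * f_khz ** 4

def masking_threshold(power, bands, centers_khz, beta=0.5):
    """Per-band masking threshold from a power spectrum |S'(w)|^2.

    bands       -- hypothetical (lo_bin, hi_bin) pairs, inclusive, i.e. (b_l, b_h)
    centers_khz -- centre frequency of each band in kHz
    """
    B = [sum(power[lo:hi + 1]) for lo, hi in bands]        # B(k): power in band k
    thresholds = []
    for k in range(len(B)):
        # C(k): every band's power spread onto band k (one band treated as one Bark)
        C = sum(B[j] * 10.0 ** (spreading_db(k - j) / 10.0) for j in range(len(B)))
        T = 10.0 ** (math.log10(C + 1e-12) - offset_db(k, beta) / 10.0)   # T(k)
        T_abs = 10.0 ** (t_abs_db(centers_khz[k]) / 10.0)  # assumed calibration: 0 dB -> 1.0
        thresholds.append(min(T, T_abs))                   # min(), as written in the text
    return thresholds
```

A tonality coefficient near 1 raises the offset $O(k)$ and therefore lowers $T(k)$, reflecting that tone-like maskers mask less effectively than noise-like ones.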
The filtering processing module is further configured to determine the weighting coefficient $H(\Omega)$ of the filtering according to the power spectral density of the echo signal and the power spectral density of the noise signal:
$$H(\Omega) = \min\left(1,\ \sqrt{\frac{R_{TT}(\Omega)}{R_{bb}(\Omega) + R_{nn}(\Omega)}} + \frac{\zeta_b R_{bb}(\Omega) + \zeta_n R_{nn}(\Omega)}{R_{bb}(\Omega) + R_{nn}(\Omega)}\right),$$
where $R_{bb}(\Omega)$ is the power spectral density of the echo signal, $R_{nn}(\Omega)$ is the power spectral density of the noise signal, $\zeta_b$ is the echo attenuation coefficient, and $\zeta_n$ is the noise reduction coefficient.
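The gain $H(\Omega)$ can be evaluated directly per bin. In this minimal sketch the default values of `zeta_b` and `zeta_n`, the small `eps` guard, the function names, and applying $H(\Omega)$ to $E(\Omega)$ are all assumptions. One property follows from the formula itself: when $R_{TT}(\Omega) \ge R_{bb}(\Omega) + R_{nn}(\Omega)$ the square-root term alone is at least 1, so $H(\Omega) = 1$ and the bin passes unchanged. Attenuation therefore happens only where the interference exceeds the masking threshold, which matches the condition under which the filtering module acts, and the $\zeta$-weighted term bounds how far the gain can fall.

```python
import math

def suppression_gain(r_tt, r_bb, r_nn, zeta_b=0.1, zeta_n=0.2, eps=1e-12):
    """H(Omega) for one bin: min(1, sqrt(R_TT / (R_bb + R_nn)) + zeta-weighted floor)."""
    interference = r_bb + r_nn + eps               # guard against division by zero
    masking_term = math.sqrt(r_tt / interference)   # >= 1 when the interference is masked
    floor_term = (zeta_b * r_bb + zeta_n * r_nn) / interference
    return min(1.0, masking_term + floor_term)

def apply_suppression(E, R_tt, R_bb, R_nn, **kwargs):
    """Scale each bin of E by its gain H(Omega)."""
    return [e * suppression_gain(t, b, n, **kwargs)
            for e, t, b, n in zip(E, R_tt, R_bb, R_nn)]
```

For example, with $R_{TT} = 0$ and $R_{bb} = R_{nn} = 1$ the gain reduces to $(\zeta_b + \zeta_n)/2$, i.e. 0.15 with the hypothetical defaults.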
The signal acquisition module is further configured to receive an initial sound signal and to perform echo cancellation on the initial sound signal to obtain the sound signal to be processed.
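The text does not specify how the preceding echo cancellation is performed. A common choice is a normalized least-mean-squares (NLMS) adaptive filter that models the echo path from the far-end (loudspeaker) reference to the microphone and subtracts its estimate; the sketch below assumes that choice, and the names `nlms_echo_cancel`, `ref`, `mic`, `taps` and `mu` as well as the 3-tap echo path in the usage are hypothetical. The residual it returns plays the role of the sound signal to be processed.

```python
import random

def nlms_echo_cancel(ref, mic, taps=8, mu=0.5, eps=1e-8):
    """Subtract an NLMS estimate of the echo of `ref` from `mic`; return (residual, weights)."""
    w = [0.0] * taps
    buf = [0.0] * taps                                 # buf[k] holds ref[n - k]
    residual = []
    for x_n, d_n in zip(ref, mic):
        buf = [x_n] + buf[:-1]                         # shift in the newest reference sample
        y = sum(wk * xk for wk, xk in zip(w, buf))     # echo estimate
        e = d_n - y                                    # residual: near-end signal + leftover echo
        step = mu * e / (eps + sum(xk * xk for xk in buf))
        w = [wk + step * xk for wk, xk in zip(w, buf)]
        residual.append(e)
    return residual, w

# Usage: identify a hypothetical 3-tap echo path driven by a white-noise reference.
random.seed(0)
ref = [random.gauss(0.0, 1.0) for _ in range(2000)]
h = [0.5, -0.3, 0.2]
mic = [sum(h[k] * ref[n - k] for k in range(len(h)) if n >= k) for n in range(len(ref))]
residual, w = nlms_echo_cancel(ref, mic, taps=4)
```

A shorter adaptive filter converges faster but leaves more residual echo for the later stages; that trade-off is presumably what the text refers to when it says the masking-based filtering relaxes the order requirement on the echo canceller.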
By determining the masking threshold from the calculated power spectral density (PSD) of the interference sound signal, the embodiment of the present invention reduces the computational complexity of the algorithm. It also lowers the filter-order requirement of the preceding echo canceller, which in turn speeds up the canceller's convergence, and it improves robustness in environments with strong background noise and near-end speech.
In some embodiments, an embodiment of the present invention provides a non-volatile computer-readable storage medium storing one or more programs comprising executable instructions. The executable instructions can be read and executed by an electronic device (including but not limited to a computer, a server, or a network device) to perform any of the above methods for processing a sound signal of the present invention.
In some embodiments, an embodiment of the present invention further provides a computer program product comprising a computer program stored on a non-volatile computer-readable storage medium. The computer program comprises program instructions that, when executed by a computer, cause the computer to perform any of the above methods for processing a sound signal.
In some embodiments, an embodiment of the present invention further provides an electronic device comprising at least one processor and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the method for processing a sound signal.
In some embodiments, an embodiment of the present invention further provides a storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method for processing a sound signal.
The system for processing a sound signal according to the above embodiments can be used to execute the method for processing a sound signal according to the embodiments of the present invention, and correspondingly achieves the technical effects of that method, which are not repeated here. In the embodiments of the present invention, the relevant functional modules may be implemented by a hardware processor.
Fig. 6 is a schematic diagram of the hardware structure of an electronic device for executing the method for processing a sound signal according to another embodiment of the present application. As shown in Fig. 6, the device includes:
one or more processors 610 and a memory 620; one processor 610 is taken as an example in Fig. 6.
The device for executing the method for processing a sound signal may further include an input device 630 and an output device 640.
The processor 610, memory 620, input device 630 and output device 640 may be connected by a bus or in other ways; connection by a bus is taken as an example in Fig. 6.
As a non-volatile computer-readable storage medium, the memory 620 can store non-volatile software programs, non-volatile computer-executable programs and modules, such as the program instructions/modules corresponding to the method for processing a sound signal in the embodiments of the present application. By running the non-volatile software programs, instructions and modules stored in the memory 620, the processor 610 executes the various functional applications and data processing of the server, that is, implements the method for processing a sound signal of the above method embodiments.
The memory 620 may include a program storage area and a data storage area. The program storage area can store an operating system and the application program required by at least one function; the data storage area can store data created through use of the device for processing a sound signal, and the like. In addition, the memory 620 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory 620 optionally includes memory located remotely from the processor 610, and such remote memory can be connected to the device for processing a sound signal via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 630 can receive input numeric or character information and generate signals related to user settings and function control of the device for processing a sound signal. The output device 640 may include a display device such as a display screen.
The one or more modules are stored in the memory 620 and, when executed by the one or more processors 610, perform the method for processing a sound signal in any of the above method embodiments.
The above product can execute the method provided by the embodiments of the present application and has the functional modules and beneficial effects corresponding to the executed method. For technical details not described in this embodiment, refer to the method provided by the embodiments of the present application.
The electronic device of the embodiments of the present application exists in various forms, including but not limited to:
(1) Mobile communication devices: such devices have mobile communication functions, with voice and data communication as their main goal. Such terminals include smartphones (such as the iPhone), multimedia phones, feature phones, and low-end phones.
(2) Ultra-mobile personal computer devices: such devices belong to the category of personal computers, have computing and processing functions, and generally also have mobile Internet access. Such terminals include PDA, MID and UMPC devices, such as the iPad.
(3) Portable entertainment devices: such devices can display and play multimedia content. They include audio and video players (such as the iPod), handheld devices, e-book readers, smart toys, and portable in-vehicle navigation devices.
(4) Servers: devices that provide computing services. A server comprises a processor, hard disk, memory, system bus, and so on; its architecture is similar to that of a general-purpose computer, but because it must provide highly reliable services, it has higher requirements for processing capability, stability, reliability, security, scalability, and manageability.
(5) Other electronic devices with data interaction functions.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
From the description of the above embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by software plus a general-purpose hardware platform, and of course also by hardware. Based on this understanding, the above technical solution, in essence or in the part that contributes over the related art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in the embodiments or in certain parts of the embodiments.
Finally, it should be noted that the above embodiments are only intended to illustrate, not to limit, the technical solutions of the present application. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of the technical features, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.
Claims (15)
1. A method for processing a sound signal, comprising:
obtaining a sound signal to be processed, the sound signal to be processed comprising a target sound signal and an interference sound signal;
determining the power spectral density of the interference sound signal, and weighting the sound signal to be processed according to the power spectral density to obtain a spectrum estimate of the target sound signal;
determining a masking threshold according to the spectrum estimate; and
filtering the sound signal to be processed when it is determined that a spectral component of the interference sound signal in the sound signal to be processed is greater than the masking threshold.
2. The method according to claim 1, wherein the interference sound signal comprises a noise signal and an echo signal.
3. The method according to claim 2, wherein the step of weighting the sound signal to be processed according to the power spectral density to obtain the spectrum estimate of the target sound signal comprises:
converting the sound signal to be processed into a frequency-domain signal $E(\Omega)$;
determining the a posteriori signal-to-noise ratio $\mathrm{PostSNR}(\Omega)$ according to the following formula:
$$\mathrm{PostSNR}(\Omega) = \frac{|E(\Omega)|^2}{R_{bb}(\Omega) + R_{nn}(\Omega)},$$
where $R_{bb}(\Omega)$ is the power spectral density of the echo signal and $R_{nn}(\Omega)$ is the power spectral density of the noise signal;
deriving the a priori signal-to-noise ratio $\mathrm{PrioriSNR}(\Omega)$ according to the following formula:
$$\mathrm{PrioriSNR}(\Omega_i) = (1-\alpha)\,P\big(\mathrm{PostSNR}(\Omega_i) - 1\big) + \alpha\,\frac{|S'(\Omega_{i-1})|^2}{R_{bb}(\Omega)},$$
where $\alpha$ is a smoothing factor, $P(x) = (|x| + x)/2$, and $S'(\Omega_{i-1})$ is the spectrum estimate of the previous frame of the sound signal;
further calculating the weighting coefficient $H_{LSA}(\Omega)$ and obtaining the spectrum estimate $S'(\Omega)$ of the target sound signal:
$$S'(\Omega) = E(\Omega)\,H_{LSA}(\Omega),$$
where $\theta = \mathrm{PostSNR}(\Omega)\,\mathrm{PrioriSNR}(\Omega) / \big(\mathrm{PrioriSNR}(\Omega) + 1\big)$.
4. The method according to claim 2, wherein the step of filtering the sound signal to be processed when it is determined that the spectral component of the interference sound signal in the sound signal to be processed is greater than the masking threshold comprises:
determining the weighting coefficient $H(\Omega)$ of the filtering according to the power spectral density of the echo signal and the power spectral density of the noise signal:
$$H(\Omega) = \min\left(1,\ \sqrt{\frac{R_{TT}(\Omega)}{R_{bb}(\Omega) + R_{nn}(\Omega)}} + \frac{\zeta_b R_{bb}(\Omega) + \zeta_n R_{nn}(\Omega)}{R_{bb}(\Omega) + R_{nn}(\Omega)}\right),$$
where $R_{bb}(\Omega)$ is the power spectral density of the echo signal, $R_{nn}(\Omega)$ is the power spectral density of the noise signal, $\zeta_b$ is the echo attenuation coefficient, and $\zeta_n$ is the noise reduction coefficient.
5. The method according to claim 1, wherein the step of determining the masking threshold according to the spectrum estimate comprises:
determining, according to the spectrum estimate, the power spectral density $B(k)$ of each critical band of the sound signal to be processed and the spread critical-band spectrum $C(k)$:
$$C(k) = B(k) * SF(k),$$
where $SF(k) = 15.81 + 7.5\,(k + 0.474) - 17.5\sqrt{1 + (k + 0.474)^2}$, and $b_h$ and $b_l$ are the upper and lower frequency limits of each critical band;
determining a preliminary masking threshold $T(k)$ according to the spread critical-band spectrum $C(k)$ and an offset function $O(k)$:
$$T(k) = 10^{\lg C(k) - O(k)/10},$$
where the offset function is $O(k) = \beta\,(14.5 + k) + (1 - \beta) \times 5.5$ and $\beta$ is the tonality coefficient;
determining the masking threshold $R_{TT}(\Omega)$ according to the preliminary masking threshold $T(k)$ and the absolute threshold of hearing $T_{abs}(k)$:
$$R_{TT}(\Omega) = \min\big(T(k),\, T_{abs}(k)\big),$$
where $T_{abs}(k) = 3.64 f^{-0.8} - 6.5\,e^{-0.6 (f - 3.3)^2} + 10^{-3} f^4$, with $f$ in kHz.
6. The method according to claim 1, wherein the step of obtaining the sound signal to be processed comprises:
receiving an initial sound signal; and
performing echo cancellation on the initial sound signal to obtain the sound signal to be processed.
7. The method according to claim 1, wherein the sound signal to be processed is a speech signal.
8. A system for processing a sound signal, comprising:
a signal acquisition module, configured to obtain a sound signal to be processed, the sound signal to be processed comprising a target sound signal and an interference sound signal;
a spectrum estimation determination module, configured to determine the power spectral density of the interference sound signal and to weight the sound signal to be processed according to the power spectral density, so as to obtain a spectrum estimate of the target sound signal;
a masking threshold determination module, configured to determine a masking threshold according to the spectrum estimate; and
a filtering processing module, configured to filter the sound signal to be processed when it is determined that a spectral component of the interference sound signal in the sound signal to be processed is greater than the masking threshold.
9. The system according to claim 8, wherein the interference sound signal comprises a noise signal and an echo signal.
10. The system according to claim 8, wherein the spectrum estimation determination module is further configured to convert the sound signal to be processed into a frequency-domain signal $E(\Omega)$, and to determine the a posteriori signal-to-noise ratio $\mathrm{PostSNR}(\Omega)$ according to the following formula:
$$\mathrm{PostSNR}(\Omega) = \frac{|E(\Omega)|^2}{R_{bb}(\Omega) + R_{nn}(\Omega)},$$
where $R_{bb}(\Omega)$ is the power spectral density of the echo signal and $R_{nn}(\Omega)$ is the power spectral density of the noise signal;
to derive the a priori signal-to-noise ratio $\mathrm{PrioriSNR}(\Omega)$ according to the following formula:
$$\mathrm{PrioriSNR}(\Omega_i) = (1-\alpha)\,P\big(\mathrm{PostSNR}(\Omega_i) - 1\big) + \alpha\,\frac{|S'(\Omega_{i-1})|^2}{R_{bb}(\Omega)},$$
where $\alpha$ is a smoothing factor, $P(x) = (|x| + x)/2$, and $S'(\Omega_{i-1})$ is the spectrum estimate of the previous frame of the sound signal;
and to further calculate the weighting coefficient $H_{LSA}(\Omega)$ and obtain the spectrum estimate $S'(\Omega)$ of the target sound signal:
$$S'(\Omega) = E(\Omega)\,H_{LSA}(\Omega),$$
where $\theta = \mathrm{PostSNR}(\Omega)\,\mathrm{PrioriSNR}(\Omega) / \big(\mathrm{PrioriSNR}(\Omega) + 1\big)$.
11. The system according to claim 8, wherein the masking threshold determination module is further configured to determine, according to the spectrum estimate, the power spectral density $B(k)$ of each critical band of the sound signal to be processed and the spread critical-band spectrum $C(k)$:
$$C(k) = B(k) * SF(k),$$
where $SF(k) = 15.81 + 7.5\,(k + 0.474) - 17.5\sqrt{1 + (k + 0.474)^2}$, and $b_h$ and $b_l$ are the upper and lower frequency limits of each critical band;
to determine a preliminary masking threshold $T(k)$ according to the spread critical-band spectrum $C(k)$ and an offset function $O(k)$:
$$T(k) = 10^{\lg C(k) - O(k)/10},$$
where the offset function is $O(k) = \beta\,(14.5 + k) + (1 - \beta) \times 5.5$ and $\beta$ is the tonality coefficient;
and to determine the masking threshold $R_{TT}(\Omega)$ according to the preliminary masking threshold $T(k)$ and the absolute threshold of hearing $T_{abs}(k)$:
$$R_{TT}(\Omega) = \min\big(T(k),\, T_{abs}(k)\big),$$
where $T_{abs}(k) = 3.64 f^{-0.8} - 6.5\,e^{-0.6 (f - 3.3)^2} + 10^{-3} f^4$, with $f$ in kHz.
12. The system according to claim 8, wherein the filtering processing module is further configured to determine the weighting coefficient $H(\Omega)$ of the filtering according to the power spectral density of the echo signal and the power spectral density of the noise signal:
$$H(\Omega) = \min\left(1,\ \sqrt{\frac{R_{TT}(\Omega)}{R_{bb}(\Omega) + R_{nn}(\Omega)}} + \frac{\zeta_b R_{bb}(\Omega) + \zeta_n R_{nn}(\Omega)}{R_{bb}(\Omega) + R_{nn}(\Omega)}\right),$$
where $R_{bb}(\Omega)$ is the power spectral density of the echo signal, $R_{nn}(\Omega)$ is the power spectral density of the noise signal, $\zeta_b$ is the echo attenuation coefficient, and $\zeta_n$ is the noise reduction coefficient.
13. The system according to claim 8, wherein the signal acquisition module is further configured to receive an initial sound signal and to perform echo cancellation on the initial sound signal to obtain the sound signal to be processed.
14. An electronic device, comprising: at least one processor; and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the steps of the method according to any one of claims 1 to 7.
15. A storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201811645765.5A CN109727605B (en) | 2018-12-29 | 2018-12-29 | Method and system for processing sound signal |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN109727605A true CN109727605A (en) | 2019-05-07 |
| CN109727605B CN109727605B (en) | 2020-06-12 |
Family ID: 66298550
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201811645765.5A Active CN109727605B (en) | 2018-12-29 | 2018-12-29 | Method and system for processing sound signal |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN109727605B (en) |
Citations (12)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN101777349A (en) * | 2009-12-08 | 2010-07-14 | 中国科学院自动化研究所 | Auditory perception property-based signal subspace microphone array voice enhancement method |
| EP2226794A1 (en) * | 2009-03-06 | 2010-09-08 | Harman Becker Automotive Systems GmbH | Background Noise Estimation |
| CN101894563A (en) * | 2010-07-15 | 2010-11-24 | 瑞声声学科技(深圳)有限公司 | Methods of Speech Enhancement |
| CN101989423A (en) * | 2009-07-30 | 2011-03-23 | Nxp股份有限公司 | Active noise reduction method using perceptual masking |
| CN103824564A (en) * | 2014-03-17 | 2014-05-28 | 上海申磬产业有限公司 | Voice enhancement method for use in voice identification process of electric wheelchair |
| CN105280195A (en) * | 2015-11-04 | 2016-01-27 | 腾讯科技(深圳)有限公司 | Method and device for processing speech signal |
| CN107393550A (en) * | 2017-07-14 | 2017-11-24 | 深圳永顺智信息科技有限公司 | Method of speech processing and device |
| CN107993670A (en) * | 2017-11-23 | 2018-05-04 | 华南理工大学 | Microphone array voice enhancement method based on statistical model |
| US10079026B1 (en) * | 2017-08-23 | 2018-09-18 | Cirrus Logic, Inc. | Spatially-controlled noise reduction for headsets with variable microphone array orientation |
| CN108564963A (en) * | 2018-04-23 | 2018-09-21 | 百度在线网络技术(北京)有限公司 | Method and apparatus for enhancing voice |
| CN108735225A (en) * | 2018-04-28 | 2018-11-02 | 南京邮电大学 | Improved spectral subtraction method based on the human auditory masking effect and Bayesian estimation |
| CN108735229A (en) * | 2018-06-12 | 2018-11-02 | 华南理工大学 | Anti-noise speech enhancement method and implementation device with joint amplitude and phase compensation based on noise-ratio weighting |
Non-Patent Citations (1)
| Title |
|---|
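| Lu Zhiqiang (卢志强): "Research on speech enhancement algorithms based on statistical models of spectral estimation", China Masters' Theses Full-text Database, Information Science and Technology series * |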
Cited By (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110931007A (en) * | 2019-12-04 | 2020-03-27 | 苏州思必驰信息科技有限公司 | Voice recognition method and system |
| CN110931007B (en) * | 2019-12-04 | 2022-07-12 | 思必驰科技股份有限公司 | Speech recognition method and system |
| CN111524498A (en) * | 2020-04-10 | 2020-08-11 | 维沃移动通信有限公司 | Filtering method, device and electronic device |
| CN114067822A (en) * | 2020-08-07 | 2022-02-18 | 腾讯科技(深圳)有限公司 | Call audio processing method and device, computer equipment and storage medium |
| CN116320123A (en) * | 2022-08-11 | 2023-06-23 | 荣耀终端有限公司 | A voice signal output method and electronic device |
| CN116320123B (en) * | 2022-08-11 | 2024-03-08 | 荣耀终端有限公司 | A voice signal output method and electronic device |
| CN117392994A (en) * | 2023-12-12 | 2024-01-12 | 腾讯科技(深圳)有限公司 | Audio signal processing method, device, equipment and storage medium |
| CN117392994B (en) * | 2023-12-12 | 2024-03-01 | 腾讯科技(深圳)有限公司 | Audio signal processing method, device, equipment and storage medium |
Also Published As
| Publication number | Publication date |
|---|---|
| CN109727605B (en) | 2020-06-12 |
Legal Events
| Code | Title | Description |
|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |
| CP01 | Change in the name or title of a patent holder | Address after: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu. Patentee after: Sipic Technology Co.,Ltd. Address before: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu. Patentee before: AI SPEECH Ltd. |