CN109727605A - Method and system for processing sound signals - Google Patents

Info

Publication number: CN109727605A
Application number: CN201811645765.5A
Authority: CN (China)
Other languages: Chinese (zh)
Other versions: CN109727605B (en)
Inventor: 袁斌
Current Assignee: Sipic Technology Co Ltd (the listed assignees may be inaccurate)
Original Assignee: AI Speech Ltd
Application filed by: AI Speech Ltd
Legal status: Granted, Active (the legal status is an assumption and is not a legal conclusion)
Priority: CN201811645765.5A; granted publication CN109727605B
Prior art keywords: signal, voice signal, processed, spectral density, power spectral
Landscapes: Cable Transmission Systems, Equalization Of Radio And Reduction Of Echo (AREA)

Abstract

The present invention discloses a method and system for processing sound signals. One specific embodiment of the method includes: obtaining a sound signal to be processed, which includes a target sound signal and an interfering sound signal; determining the power spectral density of the interfering sound signal, and weighting the sound signal to be processed according to that power spectral density to obtain a spectral estimate of the target sound signal; determining a masking threshold from the spectral estimate; and filtering the sound signal to be processed when the spectral component of the interfering sound signal in it is greater than the masking threshold. The method reduces distortion of the sound signal so that it sounds more natural, lowers the computational complexity of the algorithm, and speeds up the convergence of the preceding echo canceller. It can also improve robustness under strong background noise and near-end speech conditions.

Description

Method and system for processing sound signals
Technical field
The present invention relates to the field of signal processing technology, and in particular to a method and system for processing sound signals.
Background
In the prior art, filtering a speech signal can reduce "musical noise", but the speech signal after filter noise reduction sounds unnatural to some extent. When the human ear receives one sound, it is likely to be disturbed and suppressed by another sound; this phenomenon is known as the masking effect. The closer two sounds are in pitch or in time, the stronger the masking effect. As a result, the residual noise after postfilter noise reduction generally loses its original characteristics, which makes the result sound unnatural to some extent.
Summary of the invention
Embodiments of the present invention provide a method and system for processing sound signals, to solve at least one of the technical problems described above.
In a first aspect, an embodiment of the present invention provides a method for processing a sound signal, comprising: obtaining a sound signal to be processed, the sound signal to be processed including a target sound signal and an interfering sound signal; determining the power spectral density of the interfering sound signal, and weighting the sound signal to be processed according to the power spectral density to obtain a spectral estimate of the target sound signal; determining a masking threshold from the spectral estimate; and, when it is determined that the spectral component of the interfering sound signal in the sound signal to be processed is greater than the masking threshold, filtering the sound signal to be processed.
Optionally, the interfering sound signal includes a noise signal and an echo signal.
Optionally, the step of weighting the sound signal to be processed according to the power spectral density to obtain the spectral estimate of the target sound signal includes:
converting the sound signal to be processed into a frequency-domain signal $E(\Omega)$;
determining the a posteriori signal-to-noise ratio $\mathrm{PostSNR}(\Omega)$ according to the following formula:
$$\mathrm{PostSNR}(\Omega) = \frac{|E(\Omega)|^2}{R_{bb}(\Omega) + R_{nn}(\Omega)},$$
where $R_{bb}(\Omega)$ is the power spectral density of the echo signal and $R_{nn}(\Omega)$ is the power spectral density of the noise signal;
deriving the a priori signal-to-noise ratio $\mathrm{PrioriSNR}(\Omega)$ according to the following formula:
$$\mathrm{PrioriSNR}(\Omega_i) = (1 - \mathrm{alpha})\,P\big(\mathrm{PostSNR}(\Omega_i) - 1\big) + \mathrm{alpha}\,\frac{|S'(\Omega_{i-1})|^2}{R_{bb}(\Omega)};$$
where alpha is the smoothing factor, $P(x) = (|x| + x)/2$, and $S'(\Omega_{i-1})$ is the spectral estimate of the previous frame of the speech signal;
then calculating the weighting coefficient $H_{LSA}(\Omega)$ and obtaining the spectral estimate $S'(\Omega)$ of the target sound signal:
$$S'(\Omega) = E(\Omega)\,H_{LSA}(\Omega),$$
where $\mathrm{theta} = \mathrm{PostSNR}(\Omega)\,\mathrm{PrioriSNR}(\Omega) / (\mathrm{PrioriSNR}(\Omega) + 1)$.
Optionally, the step of filtering the sound signal to be processed when it is determined that the spectral component of the interfering sound signal in it is greater than the masking threshold includes:
determining the weighting coefficient $H(\Omega)$ of the filtering from the power spectral density of the echo signal and the power spectral density of the noise signal:
$$H(\Omega) = \min\!\left(1,\ \sqrt{\frac{R_{TT}(\Omega)}{R_{bb}(\Omega) + R_{nn}(\Omega)}} + \frac{\mathrm{zeta\_b}\,R_{bb}(\Omega) + \mathrm{zeta\_n}\,R_{nn}(\Omega)}{R_{bb}(\Omega) + R_{nn}(\Omega)}\right),$$
where $R_{bb}(\Omega)$ is the power spectral density of the echo signal, $R_{nn}(\Omega)$ is the power spectral density of the noise signal, zeta_b is the echo attenuation coefficient, and zeta_n is the noise attenuation coefficient.
Optionally, the step of determining the masking threshold from the spectral estimate includes:
determining, from the spectral estimate, the power spectral density $B(k)$ of each critical band of the sound signal to be processed and the spread critical-band spectrum $C(k)$:
$$C(k) = B(k) * SF(k),$$
where $SF(k) = 15.81 + 7.5\,k + 0.474 - 17.5\sqrt{1 + (k + 0.474)^2}$, and bh and bl are the upper and lower boundary frequencies of each critical band, respectively;
determining a preliminary masking threshold $T(k)$ from the spread critical-band spectrum $C(k)$ and the offset function $O(k)$:
$$T(k) = 10^{\lg(C(k)) - O(k)/10},$$
where the offset function is $O(k) = \mathrm{belta}\,(14.5 + k) + (1 - \mathrm{belta}) \cdot 5.5$ and belta is the tonality coefficient;
determining the masking threshold $R_{TT}(\Omega)$ from the preliminary masking threshold $T(k)$ and the absolute threshold of hearing $T_{abs}(k)$:
$$R_{TT}(\Omega) = \min\big(T(k),\ T_{abs}(k)\big),$$
where $T_{abs}(k) = 3.64\,f^{-0.8} - 6.5\,\exp\!\big((f - 3.3)^2\big) + 10^{-3} f^4$.
Optionally, the step of obtaining the sound signal to be processed includes:
receiving an initial sound signal;
performing echo cancellation on the initial sound signal to obtain the sound signal to be processed.
Optionally, the sound signal to be processed is a speech signal.
In a second aspect, an embodiment of the present invention provides a system for processing a sound signal, comprising: a signal acquisition module, configured to obtain a sound signal to be processed, the sound signal to be processed including a target sound signal and an interfering sound signal; a spectral estimate determining module, configured to determine the power spectral density of the interfering sound signal and to weight the sound signal to be processed according to the power spectral density to obtain a spectral estimate of the target sound signal; a masking threshold determining module, configured to determine a masking threshold from the spectral estimate; and a filtering module, configured to filter the sound signal to be processed when it is determined that the spectral component of the interfering sound signal in the sound signal to be processed is greater than the masking threshold.
Optionally, the interfering sound signal includes a noise signal and an echo signal.
Optionally, the spectral estimate determining module is further configured to convert the sound signal to be processed into a frequency-domain signal $E(\Omega)$, and to determine the a posteriori signal-to-noise ratio $\mathrm{PostSNR}(\Omega)$ according to the following formula:
$$\mathrm{PostSNR}(\Omega) = \frac{|E(\Omega)|^2}{R_{bb}(\Omega) + R_{nn}(\Omega)},$$
where $R_{bb}(\Omega)$ is the power spectral density of the echo signal and $R_{nn}(\Omega)$ is the power spectral density of the noise signal;
to derive the a priori signal-to-noise ratio $\mathrm{PrioriSNR}(\Omega)$ according to the following formula:
$$\mathrm{PrioriSNR}(\Omega_i) = (1 - \mathrm{alpha})\,P\big(\mathrm{PostSNR}(\Omega_i) - 1\big) + \mathrm{alpha}\,\frac{|S'(\Omega_{i-1})|^2}{R_{bb}(\Omega)};$$
where alpha is the smoothing factor, $P(x) = (|x| + x)/2$, and $S'(\Omega_{i-1})$ is the spectral estimate of the previous frame of the speech signal;
and then to calculate the weighting coefficient $H_{LSA}(\Omega)$ and obtain the spectral estimate $S'(\Omega)$ of the target sound signal:
$$S'(\Omega) = E(\Omega)\,H_{LSA}(\Omega),$$
where $\mathrm{theta} = \mathrm{PostSNR}(\Omega)\,\mathrm{PrioriSNR}(\Omega) / (\mathrm{PrioriSNR}(\Omega) + 1)$.
Optionally, the masking threshold determining module is further configured to determine, from the spectral estimate, the power spectral density $B(k)$ of each critical band of the sound signal to be processed and the spread critical-band spectrum $C(k)$:
$$C(k) = B(k) * SF(k),$$
where $SF(k) = 15.81 + 7.5\,k + 0.474 - 17.5\sqrt{1 + (k + 0.474)^2}$, and bh and bl are the upper and lower boundary frequencies of each critical band, respectively;
to determine a preliminary masking threshold $T(k)$ from the spread critical-band spectrum $C(k)$ and the offset function $O(k)$:
$$T(k) = 10^{\lg(C(k)) - O(k)/10},$$
where the offset function is $O(k) = \mathrm{belta}\,(14.5 + k) + (1 - \mathrm{belta}) \cdot 5.5$ and belta is the tonality coefficient;
and to determine the masking threshold $R_{TT}(\Omega)$ from the preliminary masking threshold $T(k)$ and the absolute threshold of hearing $T_{abs}(k)$:
$$R_{TT}(\Omega) = \min\big(T(k),\ T_{abs}(k)\big),$$
where $T_{abs}(k) = 3.64\,f^{-0.8} - 6.5\,\exp\!\big((f - 3.3)^2\big) + 10^{-3} f^4$.
Optionally, the filtering module is further configured to determine the weighting coefficient $H(\Omega)$ of the filtering from the power spectral density of the echo signal and the power spectral density of the noise signal:
$$H(\Omega) = \min\!\left(1,\ \sqrt{\frac{R_{TT}(\Omega)}{R_{bb}(\Omega) + R_{nn}(\Omega)}} + \frac{\mathrm{zeta\_b}\,R_{bb}(\Omega) + \mathrm{zeta\_n}\,R_{nn}(\Omega)}{R_{bb}(\Omega) + R_{nn}(\Omega)}\right),$$
where $R_{bb}(\Omega)$ is the power spectral density of the echo signal, $R_{nn}(\Omega)$ is the power spectral density of the noise signal, zeta_b is the echo attenuation coefficient, and zeta_n is the noise attenuation coefficient.
Optionally, the signal acquisition module is further configured to receive an initial sound signal and to perform echo cancellation on the initial sound signal to obtain the sound signal to be processed.
In a third aspect, an embodiment of the present invention provides a storage medium storing one or more programs that include executable instructions, the executable instructions being readable and executable by an electronic device (including but not limited to a computer, a server, or a network device) to perform any of the above methods of the present invention for processing a sound signal.
In a fourth aspect, an electronic device is provided, comprising: at least one processor, and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform any of the above methods of the present invention for processing a sound signal.
In a fifth aspect, an embodiment of the present invention further provides a computer program product comprising a computer program stored on a storage medium, the computer program including program instructions that, when executed by a computer, cause the computer to perform any of the above methods for processing a sound signal.
The beneficial effects of the embodiments of the present invention are as follows: distortion of the sound signal is reduced, so that it sounds more natural; the masking threshold is determined from the computed power spectral density (PSD) of the interfering sound signal, which reduces the computational complexity of the algorithm; the filter-order requirement on the preceding echo-cancellation filter is lowered, which speeds up the convergence of the preceding echo canceller; and robustness under strong background noise and near-end speech conditions can be improved.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below show some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of one embodiment of the method for processing a sound signal of the present invention;
Fig. 2 is a flowchart of another embodiment of the method for processing a sound signal of the present invention;
Fig. 3 is a schematic diagram of one embodiment of a system implementing the method for processing a sound signal of the present invention;
Fig. 4 is a schematic diagram of one embodiment of the method for processing a sound signal of the present invention;
Fig. 5 is a schematic diagram of one embodiment of the system for processing a sound signal of the present invention;
Fig. 6 is a schematic structural diagram of one embodiment of the electronic device of the present invention.
Detailed description
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
It should be noted that, where no conflict arises, the embodiments of the present application and the features in the embodiments may be combined with one another.
The present invention may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, elements, data structures, and the like that perform specific tasks or implement specific abstract data types. The present invention may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
In the present invention, "module", "device", "system", and the like refer to computer-related entities, such as hardware, a combination of hardware and software, software, or software in execution. In particular, an element may be, but is not limited to, a process running on a processor, a processor, an object, an executable element, a thread of execution, a program, and/or a computer. An application or script running on a server, or the server itself, may also be an element. One or more elements may reside within a process and/or thread of execution, and an element may be localized on one computer and/or distributed between two or more computers, and may be executed from various computer-readable media. Elements may also communicate through local and/or remote processes according to a signal having one or more data packets, for example a signal from data that interacts with another element in a local system or distributed system, and/or interacts with other systems across a network such as the Internet.
Finally, it should also be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include" and "comprise" cover not only the listed elements but also other elements that are not explicitly listed, or elements inherent to the process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes the element.
As shown in Fig. 1, an embodiment of the present invention provides a method for processing a sound signal, comprising:
Step S11: obtain a sound signal to be processed, which includes a target sound signal and an interfering sound signal.
Step S12: determine the power spectral density of the interfering sound signal, and weight the sound signal to be processed according to the power spectral density to obtain a spectral estimate of the target sound signal. Specifically, after the power spectral density of the interfering sound signal is determined, the a posteriori and a priori signal-to-noise ratios are determined, a weighting coefficient is calculated from these signal-to-noise ratios, and the sound signal to be processed is weighted to obtain the spectral estimate of the target sound signal.
Step S13: determine a masking threshold from the spectral estimate.
Step S14: when it is determined that the spectral component of the interfering sound signal in the sound signal to be processed is greater than the masking threshold, filter the sound signal to be processed.
In this embodiment of the present invention, the masking threshold is calculated as follows:
From the spectral estimate, determine the power spectral density $B(k)$ of each critical band of the sound signal to be processed and the spread critical-band spectrum $C(k)$:
$$C(k) = B(k) * SF(k),$$
where $SF(k) = 15.81 + 7.5\,k + 0.474 - 17.5\sqrt{1 + (k + 0.474)^2}$, and bh and bl are the upper and lower boundary frequencies of each critical band, respectively;
From the spread critical-band spectrum $C(k)$ and the offset function $O(k)$, determine a preliminary masking threshold $T(k)$:
$$T(k) = 10^{\lg(C(k)) - O(k)/10},$$
where the offset function is $O(k) = \mathrm{belta}\,(14.5 + k) + (1 - \mathrm{belta}) \cdot 5.5$ and belta is the tonality coefficient;
From the preliminary masking threshold $T(k)$ and the absolute threshold of hearing $T_{abs}(k)$, determine the masking threshold $R_{TT}(\Omega)$:
$$R_{TT}(\Omega) = \min\big(T(k),\ T_{abs}(k)\big),$$
where $T_{abs}(k) = 3.64\,f^{-0.8} - 6.5\,\exp\!\big((f - 3.3)^2\big) + 10^{-3} f^4$.
The embodiment of the present invention determines the masking threshold from the computed power spectral density (PSD) of the interfering sound signal, which reduces the computational complexity of the algorithm. It also lowers the filter-order requirement on the preceding echo-cancellation filter, which speeds up the convergence of the preceding echo canceller, and it can improve robustness under strong background noise and near-end speech conditions.
As shown in Fig. 2, an embodiment of the present invention provides a method for processing a sound signal, comprising:
Step S21: receive an initial sound signal. The initial sound signal may be picked up by a sound-pickup device such as a microphone.
Step S22: perform echo cancellation on the initial sound signal with an echo canceller to obtain the sound signal to be processed.
Step S23: determine the power spectral density of the interfering sound signal, and weight the sound signal to be processed according to the power spectral density to obtain a spectral estimate of the target sound signal.
Step S24: determine a masking threshold from the spectral estimate.
Step S25: when it is determined that the spectral component of the interfering sound signal in the sound signal to be processed is greater than the masking threshold, filter the sound signal to be processed.
In this embodiment, echo cancellation is first applied to the initial signal after it is received, which can improve the accuracy of sound signal processing.
If the sound signal to be processed includes a noise signal and an echo signal, the process of weighting the sound signal to be processed according to the power spectral density to obtain the spectral estimate of the target sound signal is as follows:
Convert the sound signal to be processed into a frequency-domain signal $E(\Omega)$;
Determine the a posteriori signal-to-noise ratio $\mathrm{PostSNR}(\Omega)$ according to the following formula:
$$\mathrm{PostSNR}(\Omega) = \frac{|E(\Omega)|^2}{R_{bb}(\Omega) + R_{nn}(\Omega)},$$
where $R_{bb}(\Omega)$ is the power spectral density of the echo signal and $R_{nn}(\Omega)$ is the power spectral density of the noise signal;
Derive the a priori signal-to-noise ratio $\mathrm{PrioriSNR}(\Omega)$ according to the following formula:
$$\mathrm{PrioriSNR}(\Omega_i) = (1 - \mathrm{alpha})\,P\big(\mathrm{PostSNR}(\Omega_i) - 1\big) + \mathrm{alpha}\,\frac{|S'(\Omega_{i-1})|^2}{R_{bb}(\Omega)};$$
where alpha is the smoothing factor, $P(x) = (|x| + x)/2$, and $S'(\Omega_{i-1})$ is the spectral estimate of the previous frame of the speech signal;
Then calculate the weighting coefficient $H_{LSA}(\Omega)$ and obtain the spectral estimate $S'(\Omega)$ of the target sound signal:
$$S'(\Omega) = E(\Omega)\,H_{LSA}(\Omega),$$
where $\mathrm{theta} = \mathrm{PostSNR}(\Omega)\,\mathrm{PrioriSNR}(\Omega) / (\mathrm{PrioriSNR}(\Omega) + 1)$.
The step of filtering the sound signal to be processed when it is determined that the spectral component of the interfering sound signal in it is greater than the masking threshold includes:
Determine the weighting coefficient $H(\Omega)$ of the filtering from the power spectral density of the echo signal and the power spectral density of the noise signal:
$$H(\Omega) = \min\!\left(1,\ \sqrt{\frac{R_{TT}(\Omega)}{R_{bb}(\Omega) + R_{nn}(\Omega)}} + \frac{\mathrm{zeta\_b}\,R_{bb}(\Omega) + \mathrm{zeta\_n}\,R_{nn}(\Omega)}{R_{bb}(\Omega) + R_{nn}(\Omega)}\right),$$
where $R_{bb}(\Omega)$ is the power spectral density of the echo signal, $R_{nn}(\Omega)$ is the power spectral density of the noise signal, zeta_b is the echo attenuation coefficient, and zeta_n is the noise attenuation coefficient.
The embodiment of the present invention preserves the characteristics of the original background noise, makes the residual echo sound more noise-like, and reduces speech distortion, so that the sound is more natural. It also lowers the filter-order requirement on the preceding echo-cancellation filter, which speeds up the convergence of the preceding echo canceller while reducing its computational complexity, and it can improve robustness under strong background noise and near-end speech conditions.
As shown in Fig. 3, in an embodiment of the present invention, in the system implementing the method for processing a sound signal, the far-end microphone transmits a sound signal that is played through the loudspeaker and forms the initial echo signal d(k). The near-end microphone picks up the signal y(k), which includes the clean speech signal s(k) (the target sound signal), the noise signal n(k), and the initial echo signal d(k) fed back from the loudspeaker through the LRM. First, the echo canceller C performs echo cancellation on the signal y(k) picked up by the near-end microphone; the filter H then performs further filtering.
As shown in Fig. 4, an embodiment of the present invention provides a method for processing a sound signal, comprising:
The near-end microphone picks up the signal y(k), which includes the clean speech signal s(k), the noise signal n(k), and the initial echo signal d(k) fed back from the loudspeaker through the LRM. In this embodiment, the clean speech signal s(k) is the target sound signal.
The echo canceller performs echo cancellation on the signal y(k) picked up by the near-end microphone to obtain the echo-cancelled signal e(k). The interfering sound signal contained in e(k) consists of the noise signal and the residual echo signal.
The noise PSD $R_{nn}(\Omega)$ and the residual echo PSD $R_{bb}(\Omega)$ are estimated by a statistical or autocorrelation method.
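The text does not say which statistical estimator is used. As an illustration only, a first-order recursive average of the periodogram over frames assumed to contain only noise is one common choice. The function name `update_psd`, the smoothing constant `lam`, and the frame data below are hypothetical and are not taken from the patent.

```python
def update_psd(prev_psd, frame_spectrum, lam=0.9):
    """Recursively smooth the periodogram per bin: R <- lam*R + (1-lam)*|X|^2.

    ASSUMPTION: the frame is known to contain only noise (or only residual
    echo), so its periodogram is a valid sample of the PSD being tracked.
    """
    return [lam * r + (1.0 - lam) * abs(x) ** 2
            for r, x in zip(prev_psd, frame_spectrum)]


# Hypothetical noise-only frames with two frequency bins each.
frames = [[1 + 0j, 2 + 0j], [1 + 0j, 2 + 0j], [1 + 0j, 2 + 0j]]
r_nn = [0.0, 0.0]
for spectrum in frames:
    r_nn = update_psd(r_nn, spectrum)
```

A larger `lam` gives a smoother but slower-tracking estimate; the same update could be applied to the residual echo PSD whenever a residual echo spectrum is available.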
The postfilter weights the echo-cancelled near-end microphone signal to obtain a preliminary spectral estimate $S'(\Omega)$ of the clean speech signal. The detailed process is:
A) Calculate the a posteriori SNR:
$$\mathrm{PostSNR}(\Omega) = \frac{|E(\Omega)|^2}{R_{bb}(\Omega) + R_{nn}(\Omega)}$$
B) Derive the a priori SNR by the decision-directed method:
$$\mathrm{PrioriSNR}(\Omega_i) = (1 - \mathrm{alpha})\,P\big(\mathrm{PostSNR}(\Omega_i) - 1\big) + \mathrm{alpha}\,\frac{|S'(\Omega_{i-1})|^2}{R_{bb}(\Omega)}$$
where alpha is the smoothing factor, $P(x) = (|x| + x)/2$, and $S'(\Omega_{i-1})$ is the preliminary estimate of the previous frame of the speech signal.
C) Define $\mathrm{theta} = \mathrm{PostSNR}(\Omega)\,\mathrm{PrioriSNR}(\Omega) / (\mathrm{PrioriSNR}(\Omega) + 1)$, then calculate the weighting coefficient $H_{LSA}(\Omega)$.
D) Weight the signal to obtain the preliminary estimate of the speech signal: $S'(\Omega) = E(\Omega)\,H_{LSA}(\Omega)$
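Steps A) to D) can be sketched for a single frequency bin as follows. The expression for the weighting coefficient is not reproduced in the text, so this sketch substitutes a Wiener-type gain, PrioriSNR / (PrioriSNR + 1), as a stand-in; it is not the patent's H_LSA. The function names, the default `alpha`, and the input values are hypothetical.

```python
def p(x):
    """P(x) = (|x| + x) / 2, which equals max(x, 0)."""
    return (abs(x) + x) / 2.0


def spectral_estimate(e, r_bb, r_nn, s_prev, alpha=0.98):
    """One bin, one frame. Returns (S', PostSNR, PrioriSNR, theta)."""
    # A) a posteriori SNR
    post_snr = abs(e) ** 2 / (r_bb + r_nn)
    # B) decision-directed a priori SNR, denominator as written in the text
    priori_snr = (1.0 - alpha) * p(post_snr - 1.0) + alpha * abs(s_prev) ** 2 / r_bb
    # C) theta as defined in the text
    theta = post_snr * priori_snr / (priori_snr + 1.0)
    # ASSUMPTION: Wiener-type gain used in place of the unspecified H_LSA.
    gain = priori_snr / (priori_snr + 1.0)
    # D) weighted spectrum
    return gain * e, post_snr, priori_snr, theta


s_hat, post, prio, theta = spectral_estimate(
    e=2.0 + 0j, r_bb=0.5, r_nn=0.5, s_prev=1.0 + 0j, alpha=0.5)
```

In a frame loop, the returned `s_hat` would be fed back as `s_prev` for the same bin in the next frame, which is what makes the a priori estimate decision-directed.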
Then, the masking threshold $R_{TT}(\Omega)$ is estimated from the preliminary spectral estimate $S'(\Omega)$ of the speech signal. The detailed process is:
A) Perform critical-band analysis on the signal. According to critical-band theory, the human ear can be regarded as a bank of discrete band-pass filters, and one critical band is called one Bark. The power spectral density $B(k)$ of each critical band is then obtained, where bh and bl are the upper and lower boundary frequencies of each critical band, respectively, and k is related to the sampling rate.
B) Calculate the spreading function $SF(k)$:
$$SF(k) = 15.81 + 7.5\,k + 0.474 - 17.5\sqrt{1 + (k + 0.474)^2}$$
Because the critical bands influence one another, the spread critical-band spectrum can be expressed as $C(k) = B(k) * SF(k)$.
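The critical-band analysis and spreading can be sketched as below. Several points are assumptions rather than statements from the patent: B(k) is taken as the sum of the power spectrum over the bins of each band, the spreading is applied as a convolution across bands in the linear power domain, and the spreading function uses the widely cited grouping 7.5 * (x + 0.474), whereas the formula as extracted above reads 7.5 * k + 0.474. The band edges and spectrum values are hypothetical.

```python
import math


def band_power(power_spectrum, band_edges):
    """B(k): sum of the power spectrum over bins bl..bh (inclusive) per band."""
    return [sum(power_spectrum[bl:bh + 1]) for bl, bh in band_edges]


def spreading_db(dz):
    """Spreading in dB for a band distance dz in Bark (assumed grouping)."""
    x = dz + 0.474
    return 15.81 + 7.5 * x - 17.5 * math.sqrt(1.0 + x * x)


def spread_spectrum(b):
    """C(k) = sum over j of B(j) * 10^(SF(k - j) / 10)."""
    n = len(b)
    return [sum(b[j] * 10.0 ** (spreading_db(k - j) / 10.0) for j in range(n))
            for k in range(n)]


power = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # hypothetical power spectrum
edges = [(0, 1), (2, 3), (4, 5)]         # hypothetical band edges (bin indices)
b = band_power(power, edges)
c = spread_spectrum(b)
```

With this grouping the function is close to 0 dB at zero band distance and falls off more steeply toward lower bands than toward higher ones, which matches the usual upward spread of masking.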
C) Calculate the masking threshold $R_{TT}(\Omega)$ for masking the noise and residual echo.
There are two kinds of masking thresholds: the threshold for a pure tone masking noise and residual echo, which is C(k) - (14.5 + k) dB, and the threshold for noise and residual echo masking a pure tone, which is C(k) - 5.5 dB.
Therefore, it must be determined whether the signal is closer to a pure tone or to noise and residual echo, which requires defining the spectral flatness measure SFM:
$$SFM = 10 \lg(G/A)$$
where G and A are the geometric mean and the arithmetic mean of the power spectral density, respectively.
The tonality coefficient is defined as $\mathrm{belta} = \min(SFM / SFM_{max},\ 1)$.
The offset function $O(k)$ of the masking energy in each band is calculated from belta:
$$O(k) = \mathrm{belta}\,(14.5 + k) + (1 - \mathrm{belta}) \cdot 5.5$$
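The tonality coefficient and offset function can be illustrated with a minimal sketch. The value SFM_max = -60 dB is an assumption (a commonly used reference for a fully tonal signal); the text does not give it. The function names and test spectra are hypothetical, and the power values must be strictly positive for the logarithm.

```python
import math


def tonality(power_spectrum, sfm_max=-60.0):
    """belta = min(SFM / SFM_max, 1), with SFM = 10 * lg(G / A) in dB."""
    n = len(power_spectrum)
    geo = math.exp(sum(math.log(v) for v in power_spectrum) / n)  # G
    arith = sum(power_spectrum) / n                               # A
    sfm = 10.0 * math.log10(geo / arith)
    return min(sfm / sfm_max, 1.0)


def offset(k, belta):
    """O(k) = belta * (14.5 + k) + (1 - belta) * 5.5, in dB."""
    return belta * (14.5 + k) + (1.0 - belta) * 5.5


flat = tonality([1.0, 1.0, 1.0, 1.0])        # flat spectrum, noise-like
peaky = tonality([1e-9, 1e-9, 1.0, 1e-9])    # one dominant bin, tone-like
```

A flat spectrum gives a tonality coefficient of zero and the 5.5 dB offset, while a strongly peaked spectrum saturates at one and gives the (14.5 + k) dB offset.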
The masking threshold is then:
$$T(k) = 10^{\lg(C(k)) - O(k)/10}$$
The calculated spread threshold is converted back to the Bark domain.
It is then compared with the absolute threshold of human hearing: if the calculated masking threshold is lower than the absolute threshold of hearing, the value of the absolute threshold of hearing is used, where the absolute threshold of hearing $T_{abs}(k)$ is defined as:
$$T_{abs}(k) = 3.64\,f^{-0.8} - 6.5\,\exp\!\big((f - 3.3)^2\big) + 10^{-3} f^4$$
The final masking threshold is therefore $R_{TT}(\Omega) = \min\big(T(k),\ T_{abs}(k)\big)$.
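A minimal sketch of the threshold computation follows, with several labeled assumptions. The absolute threshold uses the commonly cited approximation with f in kHz and a factor of -0.6 in the exponent, which the formula as printed above does not show. The conversion from dB to linear power assumes a hypothetical calibration in which 0 dB corresponds to a power of 1.0. The combination step follows the prose rule (use the absolute threshold when the computed threshold falls below it), which corresponds to taking the larger value, whereas the printed formula uses min.

```python
import math


def preliminary_threshold(c_k, o_k):
    """T(k) = 10^(lg(C(k)) - O(k)/10): C(k) lowered by O(k) dB."""
    return 10.0 ** (math.log10(c_k) - o_k / 10.0)


def abs_threshold_db(f_khz):
    """ASSUMPTION: commonly cited approximation, f in kHz, result in dB SPL."""
    return (3.64 * f_khz ** -0.8
            - 6.5 * math.exp(-0.6 * (f_khz - 3.3) ** 2)
            + 1e-3 * f_khz ** 4)


def db_to_power(db):
    """ASSUMPTION: hypothetical calibration, 0 dB maps to power 1.0."""
    return 10.0 ** (db / 10.0)


def final_threshold(t_k, t_abs_k):
    """Prose rule: never let the threshold fall below the absolute threshold."""
    return max(t_k, t_abs_k)


t = preliminary_threshold(100.0, 10.0)
r_tt = final_threshold(t, db_to_power(abs_threshold_db(1.0)))
```

Both inputs to `final_threshold` must be on the same scale; a real implementation would need a calibration between digital signal power and sound pressure level, which this sketch only assumes.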
Further, psychoacoustic weighted filtering is applied to the echo-cancelled frequency-domain microphone signal $E(\Omega)$. The time-domain digital signal can be converted into a frequency-domain signal with the FFT (fast Fourier transform). It is then judged whether each noise spectral component of $E(\Omega)$ is smaller than the masking threshold: if so, the component is retained without processing; otherwise, the corresponding noise spectral component is attenuated according to the conventional MMSE-LSA method.
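The per-bin decision described above can be sketched as follows. The sketch assumes that the noise spectral component of each bin is represented by the interference PSD (echo plus noise) and that the attenuation gains have already been computed elsewhere; all names and values are hypothetical.

```python
def psychoacoustic_gain(interference_psd, masking_threshold, attenuation_gain):
    """Per-bin gain: 1.0 where the interference is masked, else the given gain."""
    return [1.0 if r < t else g
            for r, t, g in zip(interference_psd, masking_threshold, attenuation_gain)]


def apply_gain(spectrum, gains):
    """Weight each frequency bin of the spectrum by its gain."""
    return [g * e for g, e in zip(gains, spectrum)]


r = [0.1, 2.0, 0.5]   # hypothetical interference PSD per bin
t = [0.5, 0.5, 0.5]   # hypothetical masking threshold per bin
g = [0.3, 0.3, 0.3]   # hypothetical attenuation gains per bin
gains = psychoacoustic_gain(r, t, g)
filtered = apply_gain([2 + 0j, 2 + 0j, 2 + 0j], gains)
```

Only the first bin is left untouched here, because it is the only one whose interference lies strictly below the threshold.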
The psychoacoustic weighting filter coefficient is derived as follows:
The design goal of the psychoacoustic adaptive weighting filter is to minimize the distortion of the near-end speech signal when the sum of the residual echo distortion and the noise distortion equals the masking threshold, so the optimal psychoacoustic weighting filter coefficient $H(\Omega)$ satisfies:
$$[\mathrm{zeta\_b} - H(\Omega)]^2\,R_{bb}(\Omega) + [\mathrm{zeta\_n} - H(\Omega)]^2\,R_{nn}(\Omega) = R_{TT}(\Omega)$$
where zeta_b is the residual echo attenuation coefficient, usually chosen so that 20 lg(zeta_b) = -35;
zeta_n is the noise attenuation coefficient, usually chosen so that 20 lg(zeta_n) = -15.
Since $0 \le H(\Omega) \le 1$, solving the above quadratic equation for $H(\Omega)$ and taking the positive root gives:
$$H(\Omega) = \min\!\left(1,\ \frac{\mathrm{zeta\_b}\,R_{bb}(\Omega) + \mathrm{zeta\_n}\,R_{nn}(\Omega) + \sqrt{[R_{bb}(\Omega) + R_{nn}(\Omega)]\,R_{TT}(\Omega) - [\mathrm{zeta\_b} - \mathrm{zeta\_n}]^2\,R_{bb}(\Omega)\,R_{nn}(\Omega)}}{R_{bb}(\Omega) + R_{nn}(\Omega)}\right)$$
Since zeta_b and zeta_n are much smaller than 1, and $R_{TT}(\Omega)$ is usually not too small relative to $R_{bb}(\Omega)$ and $R_{nn}(\Omega)$, the above expression simplifies to:
$$H(\Omega) = \min\!\left(1,\ \sqrt{\frac{R_{TT}(\Omega)}{R_{bb}(\Omega) + R_{nn}(\Omega)}} + \frac{\mathrm{zeta\_b}\,R_{bb}(\Omega) + \mathrm{zeta\_n}\,R_{nn}(\Omega)}{R_{bb}(\Omega) + R_{nn}(\Omega)}\right)$$
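The exact root and the simplified coefficient can be compared numerically. In this sketch the discriminant is written with the cross term R_bb * R_nn, which is what solving the quadratic constraint yields; clamping the discriminant at zero is a defensive assumption and is not part of the text. The PSD and threshold values are hypothetical.

```python
import math


def h_exact(r_bb, r_nn, r_tt, zeta_b, zeta_n):
    """Positive root of [zeta_b-H]^2*R_bb + [zeta_n-H]^2*R_nn = R_TT, capped at 1."""
    total = r_bb + r_nn
    disc = total * r_tt - (zeta_b - zeta_n) ** 2 * r_bb * r_nn
    root = math.sqrt(max(disc, 0.0))  # ASSUMPTION: clamp to avoid a domain error
    return min(1.0, (zeta_b * r_bb + zeta_n * r_nn + root) / total)


def h_simplified(r_bb, r_nn, r_tt, zeta_b, zeta_n):
    """Simplified form that neglects the (zeta_b - zeta_n)^2 term."""
    total = r_bb + r_nn
    return min(1.0, math.sqrt(r_tt / total) + (zeta_b * r_bb + zeta_n * r_nn) / total)


zeta_b = 10.0 ** (-35.0 / 20.0)   # 20 lg(zeta_b) = -35
zeta_n = 10.0 ** (-15.0 / 20.0)   # 20 lg(zeta_n) = -15
h1 = h_exact(1.0, 1.0, 0.2, zeta_b, zeta_n)
h2 = h_simplified(1.0, 1.0, 0.2, zeta_b, zeta_n)
# Residual of the design constraint at the exact root (R_bb = R_nn = 1, R_TT = 0.2).
residual = (zeta_b - h1) ** 2 * 1.0 + (zeta_n - h1) ** 2 * 1.0 - 0.2
```

For these values the two coefficients differ by roughly 0.01, which gives a feel for the size of the term dropped in the simplification.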
Because the psychoacoustic postfilter can also lower the order requirement on the preceding adaptive echo-cancellation filter, the embodiment of the present invention can speed up the convergence of the echo canceller, reduce the computational complexity of the algorithm, and improve robustness under strong background noise and near-end speech conditions.
In addition, residual echo cancellation is merged into the psychoacoustic weighting postfilter, and the residual echo is used to adaptively update the filter weighting coefficients, which further cancels the acoustic echo. Moreover, noise spectral components and residual echo components below the masking threshold cannot be heard because of the masking effect of the human ear, so they do not need to be attenuated. Only the noise spectral components and residual echo components that are not masked by the speech signal need to be attenuated with the conventional adaptive postfiltering method. This preserves the characteristics of the original background noise well, makes the residual echo sound more noise-like, reduces speech distortion, and makes the result sound more natural.
It should be noted that, for brevity, each of the foregoing method embodiments is described as a series of combined actions; however, those skilled in the art should understand that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in other orders or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention. The descriptions of the embodiments above each have their own emphasis; for a part not described in detail in one embodiment, refer to the related descriptions of the other embodiments.
As shown in figure 5, the embodiment of the present invention also provides a kind of system 500 for handling voice signal, comprising:
Signal acquisition module 510, for obtaining voice signal to be processed, voice signal to be processed includes target sound letter Number and interference sound signal.
Spectrum estimation determining module 520, for determining the power spectral density of interference sound signal, and according to power spectrum Density is weighted processing to voice signal to be processed, to obtain the spectrum estimation of target sound signal.
Masking threshold determining module 530, for determining masking threshold according to spectrum estimation.
Module 540 is filtered, is covered for determining that the spectrum component of interference sound signal in voice signal to be processed is greater than In the case where covering threshold value, voice signal to be processed is filtered.
Further, the interference sound signal includes a noise signal and an echo signal.
The spectrum estimation determining module is further configured to convert the sound signal to be processed into a frequency-domain signal $E(\Omega)$ and to determine the a posteriori signal-to-noise ratio $\mathrm{PostSNR}(\Omega)$ according to the following formula:
$$\mathrm{PostSNR}(\Omega) = \frac{|E(\Omega)|^{2}}{R_{bb}(\Omega) + R_{nn}(\Omega)},$$
where $R_{bb}(\Omega)$ is the power spectral density of the echo signal and $R_{nn}(\Omega)$ is the power spectral density of the noise signal;
the a priori signal-to-noise ratio $\mathrm{PrioriSNR}(\Omega)$ is derived according to the following formula:
$$\mathrm{PrioriSNR}(\Omega_{i}) = (1-\mathrm{alpha})\,P\big(\mathrm{PostSNR}(\Omega_{i})-1\big) + \mathrm{alpha}\,\frac{|S'(\Omega_{i-1})|^{2}}{R_{bb}(\Omega)};$$
where alpha is the smoothing factor, $P(x) = (|x|+x)/2$, and $S'(\Omega_{i-1})$ is the spectrum estimate of the previous frame of the signal;
the weighting coefficient $H_{LSA}(\Omega)$ is then calculated, and the spectrum estimate $S'(\Omega)$ of the target sound signal is obtained:
$$S'(\Omega) = E(\Omega)\,H_{LSA}(\Omega),$$
where
$$\mathrm{theta} = \frac{\mathrm{PostSNR}(\Omega)\,\mathrm{PrioriSNR}(\Omega)}{\mathrm{PrioriSNR}(\Omega)+1}.$$
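The per-bin computation of the spectrum estimation determining module can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the function names are hypothetical, all power spectral densities are assumed to be strictly positive, and the expression for the weighting coefficient H_LSA(Ω) is not reproduced in the text, so it is passed in as a caller-supplied function `gain_fn(prior_snr, theta)`.

```python
from typing import Callable, List, Sequence


def half_wave(x: float) -> float:
    """P(x) = (|x| + x) / 2, which equals max(x, 0)."""
    return (abs(x) + x) / 2.0


def estimate_target_spectrum(
    E: Sequence[complex],        # frequency-domain signal E(Omega), current frame
    R_bb: Sequence[float],       # echo power spectral density per bin (assumed > 0)
    R_nn: Sequence[float],       # noise power spectral density per bin
    S_prev: Sequence[complex],   # spectrum estimate S' of the previous frame
    alpha: float,                # smoothing factor
    gain_fn: Callable[[float, float], float],  # hypothetical stand-in for H_LSA
) -> List[complex]:
    """Return S'(Omega) = E(Omega) * H_LSA(Omega) for one frame."""
    S_est = []
    for e, rbb, rnn, s_prev in zip(E, R_bb, R_nn, S_prev):
        # A posteriori SNR.
        post_snr = abs(e) ** 2 / (rbb + rnn)
        # A priori SNR; the denominator R_bb follows the formula as printed.
        prior_snr = (1.0 - alpha) * half_wave(post_snr - 1.0) \
            + alpha * abs(s_prev) ** 2 / rbb
        theta = post_snr * prior_snr / (prior_snr + 1.0)
        S_est.append(e * gain_fn(prior_snr, theta))
    return S_est
```

A caller would supply whatever gain rule it uses for H_LSA(Ω); for a quick check, a placeholder such as `lambda xi, theta: xi / (1.0 + xi)` (a Wiener-style gain, which is not the rule described here) is enough to exercise the recursion.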
The masking threshold determining module is further configured to determine, according to the spectrum estimate, the power spectral density $B(k)$ of each critical band of the sound signal to be processed and the spread critical-band spectrum $C(k)$:
$$C(k) = B(k) * SF(k),$$
where $SF(k) = 15.81 + 7.5\,k + 0.474 - 17.5\sqrt{1+(k+0.474)^{2}}$, and bh and bl are the upper and lower boundary frequencies of each critical band, respectively;
a preliminary masking threshold $T(k)$ is determined according to the spread critical-band spectrum $C(k)$ and the offset function $O(k)$:
$$T(k) = 10^{\lg(C(k)) - (O(k)/10)},$$
where the offset function is $O(k) = \mathrm{belta}\,(14.5+k) + (1-\mathrm{belta}) \cdot 5.5$, and belta is the tonality coefficient;
the masking threshold $R_{TT}(\Omega)$ is determined according to the preliminary masking threshold $T(k)$ and the absolute threshold of hearing $T_{abs}(k)$:
$$R_{TT}(\Omega) = \min\big(T(k),\, T_{abs}(k)\big),$$
where $T_{abs}(k) = 3.64\,f^{-0.8} - 6.5\exp(f-3.3)^{2} + 10^{-3} f^{4}$.
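The last two steps of the masking threshold computation (offset function, preliminary threshold, and comparison with the absolute threshold of hearing) can be sketched as follows. This is a sketch under stated assumptions: the function names are hypothetical, the spread critical-band spectrum C(k) and the absolute threshold T_abs(k) are taken as precomputed inputs expressed in the same units as T(k), and the band index k is assumed to start at 1.

```python
import math
from typing import List, Sequence


def offset(k: int, belta: float) -> float:
    """Offset function O(k) = belta*(14.5 + k) + (1 - belta)*5.5."""
    return belta * (14.5 + k) + (1.0 - belta) * 5.5


def preliminary_threshold(C_k: float, k: int, belta: float) -> float:
    """Preliminary masking threshold T(k) = 10 ** (lg(C(k)) - O(k)/10)."""
    return 10.0 ** (math.log10(C_k) - offset(k, belta) / 10.0)


def masking_threshold(
    C: Sequence[float],       # spread critical-band spectrum, one value per band (> 0)
    T_abs: Sequence[float],   # absolute threshold of hearing per band (assumed same units)
    belta: float,             # tonality coefficient
) -> List[float]:
    """Per-band threshold min(T(k), T_abs(k)), following the formula as printed."""
    return [
        min(preliminary_threshold(c, k, belta), t_abs)
        # Assumption: critical bands are numbered from k = 1.
        for k, (c, t_abs) in enumerate(zip(C, T_abs), start=1)
    ]
```

For example, with `belta = 0` the offset is 5.5 for every band, so a band with `C(k) = 100` gets a preliminary threshold of `10 ** 1.45`, which is then compared against the supplied absolute threshold.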
The filtering processing module is further configured to determine the weighting coefficient $H(\Omega)$ of the filtering according to the power spectral density of the echo signal and the power spectral density of the noise signal:
$$H(\Omega) = \min\left(1,\ \sqrt{\frac{R_{TT}(\Omega)}{R_{bb}(\Omega)+R_{nn}(\Omega)}} + \frac{\mathrm{zeta\_b}\,R_{bb}(\Omega)+\mathrm{zeta\_n}\,R_{nn}(\Omega)}{R_{bb}(\Omega)+R_{nn}(\Omega)}\right),$$
where $R_{bb}(\Omega)$ is the power spectral density of the echo signal, $R_{nn}(\Omega)$ is the power spectral density of the noise signal, zeta_b is the echo attenuation coefficient, and zeta_n is the noise attenuation coefficient.
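The weighting coefficient H(Ω) can be evaluated per frequency bin as in the following Python sketch. The function name `perceptual_weight` is hypothetical, and the sketch assumes that the combined interference power R_bb(Ω) + R_nn(Ω) is strictly positive.

```python
import math


def perceptual_weight(
    R_TT: float,    # masking threshold for this bin
    R_bb: float,    # echo power spectral density
    R_nn: float,    # noise power spectral density
    zeta_b: float,  # echo attenuation coefficient
    zeta_n: float,  # noise attenuation coefficient
) -> float:
    """H = min(1, sqrt(R_TT / (R_bb + R_nn)) + (zeta_b*R_bb + zeta_n*R_nn) / (R_bb + R_nn))."""
    total = R_bb + R_nn  # assumed > 0
    masked_part = math.sqrt(R_TT / total)
    residual_part = (zeta_b * R_bb + zeta_n * R_nn) / total
    return min(1.0, masked_part + residual_part)
```

When R_TT(Ω) is at least R_bb(Ω) + R_nn(Ω) and both attenuation coefficients are non-negative, the square-root term alone is already 1 or more, so H(Ω) = 1 and the bin passes unattenuated. Attenuation therefore only takes effect when the interference exceeds the masking threshold, which is consistent with the condition stated for the filtering processing module.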
The signal acquisition module is further configured to receive an initial sound signal and to perform echo cancellation on the initial sound signal to obtain the sound signal to be processed.
The embodiment of the present invention determines the masking threshold from the calculated power spectral density (PSD) of the interference sound signal, which reduces the computational complexity of the algorithm. It also lowers the filter order required of the front-end echo canceller, which in turn speeds up the convergence of the front-end echo canceller, and it improves robustness under strong background noise and in the presence of near-end speech.
In some embodiments, an embodiment of the present invention provides a non-volatile computer-readable storage medium storing one or more programs that include executable instructions. The executable instructions can be read and executed by an electronic device (including but not limited to a computer, a server, or a network device) to perform any of the above methods for processing sound signals of the present invention.
In some embodiments, an embodiment of the present invention also provides a computer program product. The computer program product includes a computer program stored on a non-volatile computer-readable storage medium, and the computer program includes program instructions that, when executed by a computer, cause the computer to perform any of the above methods for processing sound signals.
In some embodiments, an embodiment of the present invention also provides an electronic device comprising at least one processor and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the method for processing sound signals.
In some embodiments, an embodiment of the present invention also provides a storage medium on which a computer program is stored, where the program, when executed by a processor, implements the method for processing sound signals.
The system for processing sound signals of the above embodiments can be used to perform the method for processing sound signals of the embodiments of the present invention, and accordingly achieves the technical effects of that method, which are not repeated here. In the embodiments of the present invention, the relevant functional modules may be implemented by a hardware processor.
Fig. 6 is a schematic diagram of the hardware structure of an electronic device for performing the method for processing sound signals, provided by another embodiment of the present application. As shown in Fig. 6, the device includes:
One or more processors 610 and a memory 620; one processor 610 is taken as an example in Fig. 6.
The device for performing the method for processing sound signals may further include an input device 630 and an output device 640.
The processor 610, the memory 620, the input device 630, and the output device 640 may be connected by a bus or in other ways; connection by a bus is taken as an example in Fig. 6.
The memory 620, as a non-volatile computer-readable storage medium, can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions/modules corresponding to the method for processing sound signals in the embodiments of the present application. By running the non-volatile software programs, instructions, and modules stored in the memory 620, the processor 610 executes the various functional applications and data processing of the server, that is, implements the method for processing sound signals of the above method embodiments.
The memory 620 may include a program storage area and a data storage area. The program storage area can store an operating system and the application programs required by at least one function; the data storage area can store data created through use of the apparatus for processing sound signals, and the like. In addition, the memory 620 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory 620 optionally includes memory located remotely from the processor 610, and such remote memory can be connected to the apparatus for processing sound signals through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 630 can receive input numeric or character information and generate signals related to user settings and function control of the apparatus for processing sound signals. The output device 640 may include a display device such as a display screen.
The one or more modules are stored in the memory 620 and, when executed by the one or more processors 610, perform the method for processing sound signals in any of the above method embodiments.
The above product can perform the method provided by the embodiments of the present application and has the functional modules and beneficial effects corresponding to performing that method. For technical details not described in detail in this embodiment, refer to the method provided by the embodiments of the present application.
The electronic device of the embodiments of the present application exists in a variety of forms, including but not limited to:
(1) Mobile communication devices: these devices are characterized by mobile communication capability, with voice and data communication as their main purpose. Such terminals include smartphones (e.g., iPhone), multimedia phones, feature phones, and low-end phones.
(2) Ultra-mobile personal computer devices: these devices belong to the category of personal computers, have computing and processing capabilities, and generally also support mobile Internet access. Such terminals include PDA, MID, and UMPC devices, e.g., iPad.
(3) Portable entertainment devices: these devices can display and play multimedia content. They include audio and video players (e.g., iPod), handheld devices, e-book readers, smart toys, and portable in-vehicle navigation devices.
(4) Servers: devices that provide computing services. A server comprises a processor, hard disk, memory, system bus, and so on. Its architecture is similar to that of a general-purpose computer, but because it must provide highly reliable services, it has higher requirements for processing capability, stability, reliability, security, scalability, and manageability.
(5) Other electronic devices with data interaction capability.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
From the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by software plus a general-purpose hardware platform, and of course also by hardware. Based on this understanding, the above technical solutions, in essence or in the part that contributes to the related art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments or in certain parts of the embodiments.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (15)

1. A method for processing sound signals, comprising:
obtaining a sound signal to be processed, the sound signal to be processed including a target sound signal and an interference sound signal;
determining the power spectral density of the interference sound signal, and weighting the sound signal to be processed according to the power spectral density to obtain a spectrum estimate of the target sound signal;
determining a masking threshold according to the spectrum estimate;
filtering the sound signal to be processed when it is determined that a spectral component of the interference sound signal in the sound signal to be processed is greater than the masking threshold.
2. The method according to claim 1, wherein the interference sound signal includes a noise signal and an echo signal.
3. The method according to claim 2, wherein the step of weighting the sound signal to be processed according to the power spectral density to obtain the spectrum estimate of the target sound signal comprises:
converting the sound signal to be processed into a frequency-domain signal $E(\Omega)$;
determining the a posteriori signal-to-noise ratio $\mathrm{PostSNR}(\Omega)$ according to the following formula:
$$\mathrm{PostSNR}(\Omega) = \frac{|E(\Omega)|^{2}}{R_{bb}(\Omega) + R_{nn}(\Omega)},$$
where $R_{bb}(\Omega)$ is the power spectral density of the echo signal and $R_{nn}(\Omega)$ is the power spectral density of the noise signal;
deriving the a priori signal-to-noise ratio $\mathrm{PrioriSNR}(\Omega)$ according to the following formula:
$$\mathrm{PrioriSNR}(\Omega_{i}) = (1-\mathrm{alpha})\,P\big(\mathrm{PostSNR}(\Omega_{i})-1\big) + \mathrm{alpha}\,\frac{|S'(\Omega_{i-1})|^{2}}{R_{bb}(\Omega)};$$
where alpha is the smoothing factor, $P(x) = (|x|+x)/2$, and $S'(\Omega_{i-1})$ is the spectrum estimate of the previous frame of the signal;
further calculating the weighting coefficient $H_{LSA}(\Omega)$, and obtaining the spectrum estimate $S'(\Omega)$ of the target sound signal:
$$S'(\Omega) = E(\Omega)\,H_{LSA}(\Omega),$$
where
$$\mathrm{theta} = \frac{\mathrm{PostSNR}(\Omega)\,\mathrm{PrioriSNR}(\Omega)}{\mathrm{PrioriSNR}(\Omega)+1}.$$
4. The method according to claim 2, wherein the step of filtering the sound signal to be processed when it is determined that a spectral component of the interference sound signal in the sound signal to be processed is greater than the masking threshold comprises:
determining the weighting coefficient $H(\Omega)$ of the filtering according to the power spectral density of the echo signal and the power spectral density of the noise signal:
$$H(\Omega) = \min\left(1,\ \sqrt{\frac{R_{TT}(\Omega)}{R_{bb}(\Omega)+R_{nn}(\Omega)}} + \frac{\mathrm{zeta\_b}\,R_{bb}(\Omega)+\mathrm{zeta\_n}\,R_{nn}(\Omega)}{R_{bb}(\Omega)+R_{nn}(\Omega)}\right),$$
where $R_{bb}(\Omega)$ is the power spectral density of the echo signal, $R_{nn}(\Omega)$ is the power spectral density of the noise signal, zeta_b is the echo attenuation coefficient, and zeta_n is the noise attenuation coefficient.
5. The method according to claim 1, wherein the step of determining the masking threshold according to the spectrum estimate comprises:
determining, according to the spectrum estimate, the power spectral density $B(k)$ of each critical band of the sound signal to be processed and the spread critical-band spectrum $C(k)$:
$$C(k) = B(k) * SF(k),$$
where $SF(k) = 15.81 + 7.5\,k + 0.474 - 17.5\sqrt{1+(k+0.474)^{2}}$, and bh and bl are the upper and lower boundary frequencies of each critical band, respectively;
determining a preliminary masking threshold $T(k)$ according to the spread critical-band spectrum $C(k)$ and the offset function $O(k)$:
$$T(k) = 10^{\lg(C(k)) - (O(k)/10)},$$
where the offset function is $O(k) = \mathrm{belta}\,(14.5+k) + (1-\mathrm{belta}) \cdot 5.5$, and belta is the tonality coefficient;
determining the masking threshold $R_{TT}(\Omega)$ according to the preliminary masking threshold $T(k)$ and the absolute threshold of hearing $T_{abs}(k)$:
$$R_{TT}(\Omega) = \min\big(T(k),\, T_{abs}(k)\big),$$
where $T_{abs}(k) = 3.64\,f^{-0.8} - 6.5\exp(f-3.3)^{2} + 10^{-3} f^{4}$.
6. The method according to claim 1, wherein the step of obtaining the sound signal to be processed comprises:
receiving an initial sound signal;
performing echo cancellation on the initial sound signal to obtain the sound signal to be processed.
7. The method according to claim 1, wherein the sound signal to be processed is a speech signal.
8. A system for processing sound signals, comprising:
a signal acquisition module, configured to obtain a sound signal to be processed, the sound signal to be processed including a target sound signal and an interference sound signal;
a spectrum estimation determining module, configured to determine the power spectral density of the interference sound signal and to weight the sound signal to be processed according to the power spectral density to obtain a spectrum estimate of the target sound signal;
a masking threshold determining module, configured to determine a masking threshold according to the spectrum estimate;
a filtering processing module, configured to filter the sound signal to be processed when it is determined that a spectral component of the interference sound signal in the sound signal to be processed is greater than the masking threshold.
9. The system according to claim 8, wherein the interference sound signal includes a noise signal and an echo signal.
10. The system according to claim 8, wherein the spectrum estimation determining module is further configured to convert the sound signal to be processed into a frequency-domain signal $E(\Omega)$ and to determine the a posteriori signal-to-noise ratio $\mathrm{PostSNR}(\Omega)$ according to the following formula:
$$\mathrm{PostSNR}(\Omega) = \frac{|E(\Omega)|^{2}}{R_{bb}(\Omega) + R_{nn}(\Omega)},$$
where $R_{bb}(\Omega)$ is the power spectral density of the echo signal and $R_{nn}(\Omega)$ is the power spectral density of the noise signal;
the a priori signal-to-noise ratio $\mathrm{PrioriSNR}(\Omega)$ is derived according to the following formula:
$$\mathrm{PrioriSNR}(\Omega_{i}) = (1-\mathrm{alpha})\,P\big(\mathrm{PostSNR}(\Omega_{i})-1\big) + \mathrm{alpha}\,\frac{|S'(\Omega_{i-1})|^{2}}{R_{bb}(\Omega)};$$
where alpha is the smoothing factor, $P(x) = (|x|+x)/2$, and $S'(\Omega_{i-1})$ is the spectrum estimate of the previous frame of the signal;
the weighting coefficient $H_{LSA}(\Omega)$ is further calculated, and the spectrum estimate $S'(\Omega)$ of the target sound signal is obtained:
$$S'(\Omega) = E(\Omega)\,H_{LSA}(\Omega),$$
where
$$\mathrm{theta} = \frac{\mathrm{PostSNR}(\Omega)\,\mathrm{PrioriSNR}(\Omega)}{\mathrm{PrioriSNR}(\Omega)+1}.$$
11. The system according to claim 8, wherein the masking threshold determining module is further configured to determine, according to the spectrum estimate, the power spectral density $B(k)$ of each critical band of the sound signal to be processed and the spread critical-band spectrum $C(k)$:
$$C(k) = B(k) * SF(k),$$
where $SF(k) = 15.81 + 7.5\,k + 0.474 - 17.5\sqrt{1+(k+0.474)^{2}}$, and bh and bl are the upper and lower boundary frequencies of each critical band, respectively;
a preliminary masking threshold $T(k)$ is determined according to the spread critical-band spectrum $C(k)$ and the offset function $O(k)$:
$$T(k) = 10^{\lg(C(k)) - (O(k)/10)},$$
where the offset function is $O(k) = \mathrm{belta}\,(14.5+k) + (1-\mathrm{belta}) \cdot 5.5$, and belta is the tonality coefficient;
the masking threshold $R_{TT}(\Omega)$ is determined according to the preliminary masking threshold $T(k)$ and the absolute threshold of hearing $T_{abs}(k)$:
$$R_{TT}(\Omega) = \min\big(T(k),\, T_{abs}(k)\big),$$
where $T_{abs}(k) = 3.64\,f^{-0.8} - 6.5\exp(f-3.3)^{2} + 10^{-3} f^{4}$.
12. The system according to claim 8, wherein the filtering processing module is further configured to determine the weighting coefficient $H(\Omega)$ of the filtering according to the power spectral density of the echo signal and the power spectral density of the noise signal:
$$H(\Omega) = \min\left(1,\ \sqrt{\frac{R_{TT}(\Omega)}{R_{bb}(\Omega)+R_{nn}(\Omega)}} + \frac{\mathrm{zeta\_b}\,R_{bb}(\Omega)+\mathrm{zeta\_n}\,R_{nn}(\Omega)}{R_{bb}(\Omega)+R_{nn}(\Omega)}\right),$$
where $R_{bb}(\Omega)$ is the power spectral density of the echo signal, $R_{nn}(\Omega)$ is the power spectral density of the noise signal, zeta_b is the echo attenuation coefficient, and zeta_n is the noise attenuation coefficient.
13. The system according to claim 8, wherein the signal acquisition module is further configured to receive an initial sound signal and to perform echo cancellation on the initial sound signal to obtain the sound signal to be processed.
14. An electronic device, comprising: at least one processor, and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the steps of the method according to any one of claims 1-7.
15. A storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of the method according to any one of claims 1-7.
CN201811645765.5A 2018-12-29 2018-12-29 Method and system for processing sound signal Active CN109727605B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811645765.5A CN109727605B (en) 2018-12-29 2018-12-29 Method and system for processing sound signal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811645765.5A CN109727605B (en) 2018-12-29 2018-12-29 Method and system for processing sound signal

Publications (2)

Publication Number Publication Date
CN109727605A true CN109727605A (en) 2019-05-07
CN109727605B CN109727605B (en) 2020-06-12

Family

ID=66298550

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811645765.5A Active CN109727605B (en) 2018-12-29 2018-12-29 Method and system for processing sound signal

Country Status (1)

Country Link
CN (1) CN109727605B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110931007A (en) * 2019-12-04 2020-03-27 苏州思必驰信息科技有限公司 Voice recognition method and system
CN111524498A (en) * 2020-04-10 2020-08-11 维沃移动通信有限公司 Filtering method, device and electronic device
CN114067822A (en) * 2020-08-07 2022-02-18 腾讯科技(深圳)有限公司 Call audio processing method and device, computer equipment and storage medium
CN116320123A (en) * 2022-08-11 2023-06-23 荣耀终端有限公司 A voice signal output method and electronic device
CN117392994A (en) * 2023-12-12 2024-01-12 腾讯科技(深圳)有限公司 Audio signal processing method, device, equipment and storage medium

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101777349A (en) * 2009-12-08 2010-07-14 中国科学院自动化研究所 Auditory perception property-based signal subspace microphone array voice enhancement method
EP2226794A1 (en) * 2009-03-06 2010-09-08 Harman Becker Automotive Systems GmbH Background Noise Estimation
CN101894563A (en) * 2010-07-15 2010-11-24 瑞声声学科技(深圳)有限公司 Methods of Speech Enhancement
CN101989423A (en) * 2009-07-30 2011-03-23 Nxp股份有限公司 Active noise reduction method using perceptual masking
CN103824564A (en) * 2014-03-17 2014-05-28 上海申磬产业有限公司 Voice enhancement method for use in voice identification process of electric wheelchair
CN105280195A (en) * 2015-11-04 2016-01-27 腾讯科技(深圳)有限公司 Method and device for processing speech signal
CN107393550A (en) * 2017-07-14 2017-11-24 深圳永顺智信息科技有限公司 Method of speech processing and device
CN107993670A (en) * 2017-11-23 2018-05-04 华南理工大学 Microphone array voice enhancement method based on statistical model
US10079026B1 (en) * 2017-08-23 2018-09-18 Cirrus Logic, Inc. Spatially-controlled noise reduction for headsets with variable microphone array orientation
CN108564963A (en) * 2018-04-23 2018-09-21 百度在线网络技术(北京)有限公司 Method and apparatus for enhancing voice
CN108735225A (en) * 2018-04-28 2018-11-02 南京邮电大学 Improved spectral subtraction method based on human ear masking effect and Bayesian estimation
CN108735229A (en) * 2018-06-12 2018-11-02 华南理工大学 Anti-noise speech enhancement method and implementation device based on SNR-weighted joint amplitude and phase compensation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
卢志强: "基于谱估计统计模型的语音增强算法研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110931007A (en) * 2019-12-04 2020-03-27 苏州思必驰信息科技有限公司 Voice recognition method and system
CN110931007B (en) * 2019-12-04 2022-07-12 思必驰科技股份有限公司 Speech recognition method and system
CN111524498A (en) * 2020-04-10 2020-08-11 维沃移动通信有限公司 Filtering method, device and electronic device
CN114067822A (en) * 2020-08-07 2022-02-18 腾讯科技(深圳)有限公司 Call audio processing method and device, computer equipment and storage medium
CN116320123A (en) * 2022-08-11 2023-06-23 荣耀终端有限公司 A voice signal output method and electronic device
CN116320123B (en) * 2022-08-11 2024-03-08 荣耀终端有限公司 A voice signal output method and electronic device
CN117392994A (en) * 2023-12-12 2024-01-12 腾讯科技(深圳)有限公司 Audio signal processing method, device, equipment and storage medium
CN117392994B (en) * 2023-12-12 2024-03-01 腾讯科技(深圳)有限公司 Audio signal processing method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN109727605B (en) 2020-06-12

Similar Documents

Publication Publication Date Title
CN109727604B (en) Frequency domain echo cancellation method and computer storage medium for speech recognition front-end
US7831035B2 (en) Integration of a microphone array with acoustic echo cancellation and center clipping
CN109727605A (en) Method and system for processing sound signals
CN111341336B (en) Echo cancellation method, device, terminal equipment and medium
CN111768796B (en) Acoustic echo cancellation and dereverberation method and device
CN109473118B (en) Dual-channel speech enhancement method and device
US7773743B2 (en) Integration of a microphone array with acoustic echo cancellation and residual echo suppression
CN103632675B (en) Noise Estimation During Noise Reduction and Echo Cancellation in Personal Communications
CN111951819A (en) Echo cancellation method, device and storage medium
EP3282678B1 (en) Signal processor with side-tone noise reduction for a headset
US20160066087A1 (en) Joint noise suppression and acoustic echo cancellation
US9232309B2 (en) Microphone array processing system
US8306821B2 (en) Sub-band periodic signal enhancement system
US20080031469A1 (en) Multi-channel echo compensation system
CN107123430A (en) Echo cancellation method, device, conference tablet and computer storage medium
CN106898359A (en) Acoustic signal processing method, system, audio interactive device and computer equipment
EP4071757A1 (en) Echo cancellation method and device
TW200948030A (en) Apparatus and method for computing filter coefficients for echo suppression
CN103534942B (en) Process audio signal
CN1367977A (en) Methods and apparatus for improved sub-band adaptive filtering in echo cancellation systems
US8543390B2 (en) Multi-channel periodic signal enhancement system
EP3796629A1 (en) Double talk detection method, double talk detection device and echo cancellation system
KR20220157475A (en) Echo Residual Suppression
US12272369B1 (en) Dereverberation and noise reduction
TW202331701A (en) Echo cancelling method for dual-microphone array, echo cancelling device for dual-microphone array, electronic equipment, and computer-readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu.

Patentee after: Sipic Technology Co.,Ltd.

Address before: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu.

Patentee before: AI SPEECH Ltd.