CN109727605B - Method and system for processing sound signal

Info

Publication number: CN109727605B
Application number: CN201811645765.5A
Authority: CN
Inventors: 袁斌
Original assignee: AI Speech Ltd
Current assignee: Sipic Technology Co Ltd
Priority date: 2018-12-29
Filing date: 2018-12-29
Publication date: 2020-06-12
Anticipated expiration: 2038-12-29
Also published as: CN109727605A

Abstract

The invention discloses a method and a system for processing a sound signal. One embodiment of the method comprises: acquiring a sound signal to be processed, wherein the sound signal to be processed comprises a target sound signal and an interfering sound signal; determining the power spectral density of the interfering sound signal, and weighting the sound signal to be processed according to the power spectral density to obtain a spectral estimate of the target sound signal; determining a masking threshold from the spectral estimate; and filtering the sound signal to be processed when the spectral component of the interfering sound signal in the sound signal to be processed is determined to be greater than the masking threshold. The method can reduce distortion of the sound signal so that it sounds more natural, reduce the computational complexity of the algorithm, and accelerate the convergence of the preceding echo canceller. It can also improve robustness under strong background noise and in near-end speech environments.

Description

Method and system for processing sound signals

Technical Field

The present invention relates to the technical field of signal processing, and in particular to a method and system for processing sound signals.

Background

In the prior art, filtering a sound signal can reduce "musical noise", but the speech signal obtained after noise reduction by the filter sounds somewhat unnatural. When the human ear perceives one sound, it is likely to be interfered with and suppressed by another sound; this phenomenon is called the masking effect. The closer two sounds are in pitch or in time, the stronger the masking effect. As a result, the residual noise left after noise reduction by a post-filter generally loses its original characteristics, which makes the result sound somewhat unnatural in listening tests.

Summary of the Invention

Embodiments of the present invention provide a method and system for processing a sound signal, which solve at least one of the above technical problems.

In a first aspect, an embodiment of the present invention provides a method for processing a sound signal, including: acquiring a sound signal to be processed, where the sound signal to be processed includes a target sound signal and an interfering sound signal; determining a power spectral density of the interfering sound signal, and weighting the sound signal to be processed according to the power spectral density to obtain a spectral estimate of the target sound signal; determining a masking threshold according to the spectral estimate; and filtering the sound signal to be processed when it is determined that a spectral component of the interfering sound signal in the sound signal to be processed is greater than the masking threshold.

Optionally, the interfering sound signal includes a noise signal and an echo signal.

Optionally, the step of weighting the sound signal to be processed according to the power spectral density to obtain the spectral estimate of the target sound signal includes:

converting the sound signal to be processed into a frequency-domain signal $E(\Omega)$;

determining the a posteriori signal-to-noise ratio $\mathrm{PostSNR}(\Omega)$ according to the following formula:

$$\mathrm{PostSNR}(\Omega) = \frac{|E(\Omega)|^{2}}{R_{bb}(\Omega) + R_{nn}(\Omega)},$$

where $R_{bb}(\Omega)$ is the power spectral density of the echo signal and $R_{nn}(\Omega)$ is the power spectral density of the noise signal;

deriving the a priori signal-to-noise ratio $\mathrm{PrioriSNR}(\Omega)$ according to the following formula:

$$\mathrm{PrioriSNR}(\Omega_i) = (1-\alpha)\,P\big(\mathrm{PostSNR}(\Omega_i) - 1\big) + \alpha\,\frac{|S'(\Omega_{i-1})|^{2}}{R_{bb}(\Omega)},$$

where $\alpha$ is the smoothing factor, $P(x) = (|x| + x)/2$, and $S'(\Omega_{i-1})$ is the spectral estimate of the previous frame of the sound signal;

further calculating the weighting coefficient $H_{LSA}(\Omega)$ and obtaining the spectral estimate $S'(\Omega)$ of the target sound signal:

$$S'(\Omega) = E(\Omega)\,H_{LSA}(\Omega),$$

where $\theta = \mathrm{PostSNR}(\Omega)\,\mathrm{PrioriSNR}(\Omega) / (\mathrm{PrioriSNR}(\Omega) + 1)$.

Optionally, the step of filtering the sound signal to be processed when it is determined that the spectral component of the interfering sound signal in the sound signal to be processed is greater than the masking threshold includes:

determining the weighting coefficient $H(\Omega)$ of the filtering from the power spectral density of the echo signal and the power spectral density of the noise signal:

$$H(\Omega) = \min\left(1,\ \sqrt{\frac{R_{TT}(\Omega)}{R_{bb}(\Omega) + R_{nn}(\Omega)}} + \frac{\zeta_b R_{bb}(\Omega) + \zeta_n R_{nn}(\Omega)}{R_{bb}(\Omega) + R_{nn}(\Omega)}\right),$$

where $R_{bb}(\Omega)$ is the power spectral density of the echo signal, $R_{nn}(\Omega)$ is the power spectral density of the noise signal, $R_{TT}(\Omega)$ is the masking threshold, $\zeta_b$ is the echo attenuation coefficient, and $\zeta_n$ is the noise attenuation coefficient.

Optionally, the step of determining a masking threshold according to the spectral estimate includes:

determining, according to the spectral estimate, the power spectral density $B(k)$ of each critical band of the sound signal to be processed and the spread critical-band spectrum $C(k)$:

$$C(k) = B(k) * SF(k),$$

where $SF(k) = 15.81 + 7.5\,(k + 0.474) - 17.5\sqrt{1 + (k + 0.474)^{2}}$, and bh and bl are the upper and lower limit frequencies of each critical band, respectively;

determining a preliminary masking threshold $T(k)$ from the spread critical-band spectrum $C(k)$ and the offset function $O(k)$:

$$T(k) = 10^{\,\lg C(k) - O(k)/10},$$

where the offset function is $O(k) = \beta\,(14.5 + k) + (1 - \beta)\cdot 5.5$ and $\beta$ is the tonality coefficient;

determining the masking threshold $R_{TT}(\Omega)$ from the preliminary masking threshold $T(k)$ and the absolute threshold of hearing $T_{abs}(k)$:

$$R_{TT}(\Omega) = \min\big(T(k),\ T_{abs}(k)\big),$$

where $T_{abs}(k) = 3.64 f^{-0.8} - 6.5\exp(f - 3.3)^{2} + 10^{-3} f^{4}$.

Optionally, the step of acquiring the sound signal to be processed includes:

receiving an initial sound signal;

performing echo cancellation on the initial sound signal to obtain the sound signal to be processed.

Optionally, the sound signal to be processed is a speech signal.

In a second aspect, an embodiment of the present invention provides a system for processing sound signals, including: a signal acquisition module configured to acquire a sound signal to be processed, where the sound signal to be processed includes a target sound signal and an interfering sound signal; a spectral estimate determination module configured to determine the power spectral density of the interfering sound signal and to weight the sound signal to be processed according to the power spectral density to obtain a spectral estimate of the target sound signal; a masking threshold determination module configured to determine a masking threshold according to the spectral estimate; and a filtering module configured to filter the sound signal to be processed when it is determined that the spectral component of the interfering sound signal in the sound signal to be processed is greater than the masking threshold.

Optionally, the spectral estimate determination module is further configured to convert the sound signal to be processed into a frequency-domain signal $E(\Omega)$, to determine the a posteriori signal-to-noise ratio $\mathrm{PostSNR}(\Omega)$, and to obtain the spectral estimate of the target sound signal:

$$S'(\Omega) = E(\Omega)\,H_{LSA}(\Omega).$$

Optionally, the masking threshold determination module is further configured to determine, according to the spectral estimate, the power spectral density $B(k)$ of each critical band of the sound signal to be processed and the spread critical-band spectrum $C(k)$:

$$C(k) = B(k) * SF(k),$$

$$T(k) = 10^{\,\lg C(k) - O(k)/10},$$

$$R_{TT}(\Omega) = \min\big(T(k),\ T_{abs}(k)\big).$$

Optionally, the filtering module is further configured to determine the weighting coefficient $H(\Omega)$ of the filtering from the power spectral density of the echo signal and the power spectral density of the noise signal.

Optionally, the signal acquisition module is further configured to receive an initial sound signal and to perform echo cancellation on the initial sound signal to obtain the sound signal to be processed.

In a third aspect, an embodiment of the present invention provides a storage medium storing one or more programs that include execution instructions, where the execution instructions can be read and executed by an electronic device (including but not limited to a computer, a server, or a network device) to perform any of the above methods for processing sound signals of the present invention.

In a fourth aspect, an electronic device is provided, including: at least one processor, and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform any of the above methods and systems for processing sound signals of the present invention.

In a fifth aspect, an embodiment of the present invention further provides a computer program product, which includes a computer program stored on a storage medium, the computer program including program instructions that, when executed by a computer, cause the computer to perform any of the above methods and systems for processing sound signals.

The beneficial effects of the embodiments of the present invention are as follows. Distortion of the sound signal is reduced, so that it sounds more natural. The masking threshold is determined from the calculated power spectral density (PSD) of the interfering sound signal, which reduces the computational complexity of the algorithm. The order required of the preceding echo cancellation filter is reduced, which in turn accelerates the convergence of the preceding echo canceller. Robustness under strong background noise and in near-end speech environments is also improved.

Description of Drawings

In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flowchart of an embodiment of the method for processing a sound signal of the present invention;

FIG. 2 is a flowchart of another embodiment of the method for processing a sound signal of the present invention;

FIG. 3 is a schematic diagram of an embodiment of a system implementing the method for processing a speech signal of the present invention;

FIG. 4 is a schematic diagram of an embodiment of the method for processing a speech signal of the present invention;

FIG. 5 is a schematic diagram of an embodiment of the system for processing sound signals of the present invention;

FIG. 6 is a schematic structural diagram of an embodiment of the electronic device of the present invention.

Detailed Description

In order to make the purposes, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.

It should be noted that the embodiments in the present application, and the features of those embodiments, may be combined with each other where they do not conflict.

The invention may be described in the general context of computer-executable instructions, such as program modules, executed by a computer. Generally, program modules include routines, programs, objects, components, data structures and the like that perform particular tasks or implement particular abstract data types. The invention can also be practiced in distributed computing environments, where tasks are performed by remote processing devices linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.

In the present invention, "module", "device", "system" and similar terms refer to computer-related entities, such as hardware, a combination of hardware and software, software, or software in execution. In detail, a component may be, but is not limited to, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer. An application or script running on a server, or the server itself, may also be a component. One or more components may reside within a process and/or thread of execution, and a component may be localized on one computer and/or distributed between two or more computers, and may be executed from various computer-readable media. Components may also communicate by way of local and/or remote processes in accordance with a signal having one or more data packets, for example a signal carrying data from one component that interacts with another component in a local system or a distributed system, and/or that interacts with other systems across a network such as the Internet.

Finally, it should also be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "comprising" and "including" cover not only the listed elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "comprising ..." does not exclude the presence of additional identical elements in the process, method, article or device that includes the element.

As shown in FIG. 1, an embodiment of the present invention provides a method for processing a sound signal, including:

Step S11: Acquire a sound signal to be processed, where the sound signal to be processed includes a target sound signal and an interfering sound signal.

Step S12: Determine the power spectral density of the interfering sound signal, and weight the sound signal to be processed according to the power spectral density to obtain a spectral estimate of the target sound signal. Specifically, after the power spectral density of the interfering sound signal is determined, the a posteriori and a priori signal-to-noise ratios are determined, the weighting coefficient is calculated from these signal-to-noise ratios, and the sound signal to be processed is weighted to obtain the spectral estimate of the target sound signal.

Step S13: Determine the masking threshold according to the spectral estimate.

Step S14: Filter the sound signal to be processed when it is determined that the spectral component of the interfering sound signal in the sound signal to be processed is greater than the masking threshold.
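The per-bin decision in steps S11 to S14 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `filter_frame`, the list-based spectrum and the `gain` callable are hypothetical, and the interference PSD and masking threshold are taken as already-computed inputs.

```python
from typing import Callable, List, Sequence


def filter_frame(
    spectrum: Sequence[complex],
    interference_psd: Sequence[float],
    masking_threshold: Sequence[float],
    gain: Callable[[int], float],
) -> List[complex]:
    """Attenuate only the bins whose interference exceeds the masking threshold.

    All names are hypothetical. `gain(i)` returns the weighting coefficient
    for bin i and is only consulted for bins that are not masked.
    """
    out = []
    for i, value in enumerate(spectrum):
        if interference_psd[i] > masking_threshold[i]:
            # Interference is audible in this bin: apply the filter weight.
            out.append(value * gain(i))
        else:
            # Interference is masked by the target sound: leave the bin as is.
            out.append(value)
    return out


frame = [1 + 0j, 2 + 0j, 4 + 0j]
filtered = filter_frame(frame, [0.5, 0.1, 2.0], [0.2, 0.3, 1.0], lambda i: 0.5)
```

In this toy call the first and third bins are attenuated because their interference PSD exceeds the threshold, while the second bin passes through unchanged.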

Further, in this embodiment of the present invention, the masking threshold is calculated as follows:

according to the spectral estimate, the power spectral density $B(k)$ of each critical band of the sound signal to be processed and the spread critical-band spectrum $C(k)$ are determined:

$$C(k) = B(k) * SF(k),$$

$$T(k) = 10^{\,\lg C(k) - O(k)/10},$$

$$R_{TT}(\Omega) = \min\big(T(k),\ T_{abs}(k)\big).$$

In this embodiment of the present invention, the masking threshold is determined from the calculated power spectral density (PSD) of the interfering sound signal, which reduces the computational complexity of the algorithm. The order required of the preceding echo cancellation filter is also reduced, which in turn accelerates the convergence of the preceding echo canceller. Robustness under strong background noise and in near-end speech environments is improved as well.

As shown in FIG. 2, an embodiment of the present invention provides a method for processing a sound signal, including:

Step S21: Receive an initial sound signal. The initial sound signal can be picked up by a sound pickup device such as a microphone.

Step S22: Perform echo cancellation on the initial sound signal with an echo canceller to obtain the sound signal to be processed.

Step S23: Determine the power spectral density of the interfering sound signal, and weight the sound signal to be processed according to the power spectral density to obtain a spectral estimate of the target sound signal.

Step S24: Determine the masking threshold according to the spectral estimate.

Step S25: Filter the sound signal to be processed when it is determined that the spectral component of the interfering sound signal in the sound signal to be processed is greater than the masking threshold.

In this embodiment of the present invention, echo cancellation is first applied to the initial signal after it is received, which improves the accuracy of the subsequent sound signal processing.
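The text does not say which adaptive algorithm the echo canceller uses. The sketch below assumes a normalized LMS (NLMS) filter, a common choice for acoustic echo cancellation, purely to illustrate how an echo-cancelled signal is obtained from the far-end reference and the microphone signal. The function name, filter length, step size and toy echo path are all illustrative assumptions.

```python
import random
from typing import List, Sequence


def nlms_echo_cancel(
    far_end: Sequence[float],
    mic: Sequence[float],
    taps: int = 4,
    mu: float = 0.5,
    eps: float = 1e-8,
) -> List[float]:
    """Return mic(k) minus the estimated echo, adapting an FIR filter with NLMS."""
    w = [0.0] * taps   # adaptive estimate of the echo path (assumed FIR)
    x = [0.0] * taps   # most recent far-end samples, newest first
    residual = []
    for xk, yk in zip(far_end, mic):
        x = [xk] + x[:-1]
        echo_hat = sum(wi * xi for wi, xi in zip(w, x))
        e = yk - echo_hat
        norm = sum(xi * xi for xi in x) + eps
        # NLMS update: step size normalized by the reference signal energy.
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, x)]
        residual.append(e)
    return residual


random.seed(0)
far = [random.uniform(-1.0, 1.0) for _ in range(2000)]
# Toy echo path (assumed): two-tap FIR, with no near-end speech or noise.
mic = [0.5 * far[k] + (-0.3 * far[k - 1] if k > 0 else 0.0) for k in range(2000)]
e = nlms_echo_cancel(far, mic)
```

In this noise-free toy case the residual decays towards zero as the filter converges. With near-end speech and noise present, a residual echo remains, which is what the subsequent post-filter addresses.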

If the sound signal to be processed includes a noise signal and an echo signal, the process of weighting the sound signal to be processed according to the power spectral density to obtain the spectral estimate of the target sound signal includes:

converting the sound signal to be processed into a frequency-domain signal $E(\Omega)$;

where $R_{bb}(\Omega)$ is the power spectral density of the echo signal and $R_{nn}(\Omega)$ is the power spectral density of the noise signal;

further calculating the weighting coefficient $H_{LSA}(\Omega)$ and obtaining the spectral estimate $S'(\Omega)$ of the target sound signal:

$$S'(\Omega) = E(\Omega)\,H_{LSA}(\Omega).$$

The step of filtering the sound signal to be processed when it is determined that the spectral component of the interfering sound signal in the sound signal to be processed is greater than the masking threshold includes:

where $R_{bb}(\Omega)$ is the power spectral density of the echo signal, $R_{nn}(\Omega)$ is the power spectral density of the noise signal, $\zeta_b$ is the echo attenuation coefficient, and $\zeta_n$ is the noise attenuation coefficient.

This embodiment of the present invention preserves the original background noise characteristics: the residual echo sounds more like noise in listening tests and speech distortion is reduced, so the sound is more natural. The order required of the preceding echo cancellation filter is also reduced, which accelerates the convergence of the preceding echo canceller and lowers the computational complexity of the echo cancellation algorithm. Robustness under strong background noise and in near-end speech environments is improved as well.

As shown in FIG. 3, in the system implementing the speech signal processing method of this embodiment, the speech signal transmitted from the far-end microphone is played by the loudspeaker and forms the initial echo signal d(k). The near-end microphone picks up the speech signal y(k), which includes the clean speech signal s(k) (the target sound signal), the noise signal n(k), and the initial echo signal d(k) fed back from the loudspeaker through the LRM. First, the echo canceller C performs echo cancellation on the speech signal y(k) picked up by the near-end microphone; the filter H then performs further filtering.

As shown in FIG. 4, an embodiment of the present invention provides a method for processing a speech signal, including:

The near-end microphone picks up the speech signal y(k), which includes the clean speech signal s(k), the noise signal n(k), and the initial echo signal d(k) fed back from the loudspeaker through the LRM. In this embodiment, the clean speech signal is the target signal.

The echo canceller performs echo cancellation on the speech signal y(k) picked up by the near-end microphone to obtain the echo-cancelled speech signal e(k). The interfering sound signals contained in e(k) are the noise signal and the residual echo signal.
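Read as an additive model, the signals described above can be written as follows. The additive form and the symbols $\hat{d}(k)$ (the echo estimate produced by the echo canceller) and $b(k)$ (the residual echo, named to match the subscript of $R_{bb}$) are assumptions introduced here for illustration; the text only states which components each signal includes.

```latex
\begin{aligned}
y(k) &= s(k) + n(k) + d(k), \\
e(k) &= y(k) - \hat{d}(k) = s(k) + n(k) + b(k), \qquad b(k) = d(k) - \hat{d}(k).
\end{aligned}
```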

The noise PSD $R_{nn}(\Omega)$ and the residual echo PSD $R_{bb}(\Omega)$ are estimated by statistical or autocorrelation methods.
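One common statistical estimator is a recursively smoothed periodogram, and the sketch below assumes that approach. The text does not specify the estimator, so the function name and the smoothing constant `lam` are illustrative assumptions only.

```python
from typing import List, Sequence


def update_psd(
    prev_psd: Sequence[float], frame: Sequence[complex], lam: float = 0.9
) -> List[float]:
    """First-order recursive average of |X|^2 per frequency bin (assumed estimator)."""
    return [lam * p + (1.0 - lam) * abs(x) ** 2 for p, x in zip(prev_psd, frame)]


psd = [0.0, 0.0]
for _ in range(200):
    # Constant toy spectrum: the estimate converges towards |X|^2 = [4, 1].
    psd = update_psd(psd, [2 + 0j, 0 + 1j])
```

A larger `lam` gives a smoother but slower-tracking estimate.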

The post-filter weights the echo-cancelled near-end microphone signal to obtain a preliminary estimate $S'(\Omega)$ of the clean speech spectrum. The specific process includes:

a) Calculate the a posteriori signal-to-noise ratio:

$$\mathrm{PostSNR}(\Omega) = \frac{|E(\Omega)|^{2}}{R_{bb}(\Omega) + R_{nn}(\Omega)}$$

b) Derive the a priori signal-to-noise ratio with the decision-directed method:

$$\mathrm{PrioriSNR}(\Omega_i) = (1-\alpha)\,P\big(\mathrm{PostSNR}(\Omega_i) - 1\big) + \alpha\,\frac{|S'(\Omega_{i-1})|^{2}}{R_{bb}(\Omega)}$$

where $\alpha$ is the smoothing factor, $P(x) = (|x| + x)/2$, and $S'(\Omega_{i-1})$ is the preliminary estimate of the speech signal in the previous frame.

c) Define $\theta = \mathrm{PostSNR}(\Omega)\,\mathrm{PrioriSNR}(\Omega) / (\mathrm{PrioriSNR}(\Omega) + 1)$, and then calculate the weighting coefficient $H_{LSA}(\Omega)$.

d) Weight the signal to obtain the preliminary estimate of the speech signal: $S'(\Omega) = E(\Omega)\,H_{LSA}(\Omega)$.
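Steps a) to d) can be sketched for a single frequency bin as follows. The formula for H_LSA is not reproduced in the text above, so the sketch assumes the standard MMSE log-spectral amplitude gain, H_LSA = ξ/(1+ξ) · exp(½ ∫_θ^∞ e^(−t)/t dt) with ξ the a priori SNR, which is consistent with the definition of theta in step c). The a priori SNR follows the formula in step b) as printed. The function names, the default `alpha = 0.98` and the small floor on theta are illustrative assumptions.

```python
import math

EULER_GAMMA = 0.5772156649015329


def exp_integral_e1(x: float) -> float:
    """E1(x) = integral from x to infinity of exp(-t)/t dt, for x > 0."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    if x <= 1.0:
        # Power series: E1(x) = -gamma - ln(x) - sum_{k>=1} (-x)^k / (k * k!)
        total, term = 0.0, 1.0
        for k in range(1, 60):
            term *= -x / k          # term now equals (-x)^k / k!
            total -= term / k
        return -EULER_GAMMA - math.log(x) + total
    # Continued fraction (modified Lentz), used for x > 1.
    b = x + 1.0
    c = 1e300
    d = 1.0 / b
    h = d
    for i in range(1, 200):
        a = -float(i * i)
        b += 2.0
        d = 1.0 / (a * d + b)
        c = b + a / c
        delta = c * d
        h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return h * math.exp(-x)


def lsa_gain(e_mag2, r_bb, r_nn, prev_s_mag2, alpha=0.98):
    """Return (PostSNR, PrioriSNR, theta, H_LSA) for one bin."""
    post_snr = e_mag2 / (r_bb + r_nn)
    # P(x) = (|x| + x) / 2 is the same as max(x, 0).
    prior_snr = (1.0 - alpha) * max(post_snr - 1.0, 0.0) + alpha * prev_s_mag2 / r_bb
    theta = post_snr * prior_snr / (prior_snr + 1.0)
    theta = max(theta, 1e-10)   # assumed floor: E1 is undefined at 0
    gain = prior_snr / (1.0 + prior_snr) * math.exp(0.5 * exp_integral_e1(theta))
    return post_snr, prior_snr, theta, gain


post, prior, theta, h_lsa = lsa_gain(e_mag2=4.0, r_bb=1.0, r_nn=1.0, prev_s_mag2=1.0)
s_est = (2.0 + 0.0j) * h_lsa   # S'(Ω) = E(Ω) * H_LSA(Ω) for a bin with E = 2
```

For large theta the exponential-integral term vanishes and the gain approaches ξ/(1+ξ), the Wiener gain.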

Then, the masking threshold $R_{TT}(\Omega)$ is estimated from the preliminary estimate $S'(\Omega)$ of the speech spectrum. The specific process includes:

a) Perform critical band analysis on the signal. According to place theory, the human ear can be regarded as a bank of discrete band-pass filters, and one critical band is called one Bark. The power spectral density of each critical band is denoted $B(k)$,

where bh and bl are the upper and lower limit frequencies of each critical band, respectively, and k is related to the sampling rate.

b) Calculate the spreading function $SF(k)$:

$$SF(k) = 15.81 + 7.5\,(k + 0.474) - 17.5\sqrt{1 + (k + 0.474)^{2}}$$

Because the critical bands interact with each other, the spread critical-band spectrum can be expressed as $C(k) = B(k) * SF(k)$.
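Steps a) and b) can be sketched as follows. Two assumptions are made that the text does not state explicitly: B(k) is taken as the sum of |S'(Ω)|² over the bins between bl and bh of band k (the formula for B(k) is not reproduced above), and the `*` in C(k) = B(k)*SF(k) is read as a convolution across Bark bands, with SF evaluated in dB on the band distance and converted to a linear power ratio. Band edges are passed as hypothetical bin-index pairs.

```python
import math
from typing import List, Sequence, Tuple


def band_psd(spec_mag2: Sequence[float], bands: Sequence[Tuple[int, int]]) -> List[float]:
    """B(k): energy of the spectral estimate summed over bins bl..bh (inclusive)."""
    return [sum(spec_mag2[bl:bh + 1]) for bl, bh in bands]


def spreading_db(dk: float) -> float:
    """Spreading function in dB for a distance of dk Bark."""
    return 15.81 + 7.5 * (dk + 0.474) - 17.5 * math.sqrt(1.0 + (dk + 0.474) ** 2)


def spread_spectrum(b: Sequence[float]) -> List[float]:
    """C(k): band k collects the energy of every band j, weighted by SF(k - j)."""
    return [
        sum(bj * 10.0 ** (spreading_db(k - j) / 10.0) for j, bj in enumerate(b))
        for k in range(len(b))
    ]


# Toy |S'|^2 values for six bins grouped into three hypothetical bands.
B = band_psd([1.0, 1.0, 0.0, 0.0, 4.0, 0.0], [(0, 1), (2, 3), (4, 5)])
C = spread_spectrum(B)
```

The spreading function is close to 0 dB at zero distance and falls off more slowly towards higher bands than towards lower ones, so an empty band between two active bands still receives a non-zero spread energy.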

c) Calculate the masking threshold $R_{TT}(\Omega)$ for masking the noise and the residual echo.

There are two kinds of masking threshold: the threshold for a pure tone masking noise and residual echo, which is $C(k) - (14.5 + k)$ dB, and the threshold for noise and residual echo masking a pure tone, which is $C(k) - 5.5$ dB.

It is therefore necessary to determine whether the signal resembles a pure tone or noise and residual echo, for which the spectral flatness measure SFM is defined:

$$SFM = 10\,\lg(G/A)$$

where G and A are the geometric mean and the arithmetic mean of the signal power spectral density, respectively.

The tonality coefficient is then defined as $\beta = \min(SFM / SFM_{max},\ 1)$.

The offset function $O(k)$ of the masking energy in each band is calculated from $\beta$:

$$O(k) = \beta\,(14.5 + k) + (1 - \beta)\cdot 5.5$$

The masking threshold is then: $T(k) = 10^{\,\lg C(k) - O(k)/10}$
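The tonality-dependent offset and the preliminary threshold can be sketched as follows. SFM_max is not given in the text, so the value of −60 dB used below is an assumption, as are the function names and the choice to count the band index k from 0. The PSD values must be strictly positive for the geometric mean to be defined.

```python
import math
from typing import List, Sequence


def tonality(psd: Sequence[float], sfm_max_db: float = -60.0) -> float:
    """Tonality coefficient min(SFM / SFM_max, 1), with SFM = 10*lg(G/A) in dB."""
    n = len(psd)
    log_geo_mean = sum(math.log10(p) for p in psd) / n   # lg(G)
    arith_mean = sum(psd) / n                             # A
    sfm_db = 10.0 * (log_geo_mean - math.log10(arith_mean))
    return min(sfm_db / sfm_max_db, 1.0)


def preliminary_threshold(c: Sequence[float], beta: float) -> List[float]:
    """T(k) = 10 ** (lg(C(k)) - O(k)/10), O(k) = beta*(14.5+k) + (1-beta)*5.5."""
    out = []
    for k, ck in enumerate(c):   # assumption: band index k starts at 0
        offset_db = beta * (14.5 + k) + (1.0 - beta) * 5.5
        out.append(10.0 ** (math.log10(ck) - offset_db / 10.0))
    return out


beta_flat = tonality([1.0] * 8)              # noise-like: flat spectrum
beta_tone = tonality([1.0] + [1e-9] * 7)     # tone-like: one dominant bin
T = preliminary_threshold([1.0, 1.0], beta_flat)
```

A flat spectrum gives a tonality coefficient of 0 and therefore the fixed 5.5 dB offset, while a spectrum dominated by one bin saturates the coefficient at 1 and gives the band-dependent offset of (14.5 + k) dB.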

The computed spread threshold is converted back to the Bark domain and compared with the absolute threshold of hearing of the human ear. If the computed masking threshold is lower than the absolute threshold of hearing, the value of the absolute threshold is taken instead, where the absolute threshold of hearing $T_{abs}(k)$ is defined as:

$$T_{abs}(k) = 3.64 f^{-0.8} - 6.5\exp(f - 3.3)^{2} + 10^{-3} f^{4}$$

The final masking threshold is therefore $R_{TT}(\Omega) = \min\big(T(k),\ T_{abs}(k)\big)$.

进一步，对回声消除后频域麦克风信号E(Ω)进行心理声学加权滤波。用FFT(快速傅立叶变换)能将时域的数字信号转换为频域信号，以及判断回声消除后频域麦克风信号E(Ω)中的噪声频谱成分是否小于掩蔽阈值，若是则保留不处理；若否则对相应噪声频谱成分根据传统MMSE-LSA进行衰减。Further, psychoacoustic weighting filtering is performed on the frequency domain microphone signal E(Ω) after echo cancellation. FFT (Fast Fourier Transform) can be used to convert the digital signal in the time domain into a frequency domain signal, and judge whether the noise spectral component in the frequency domain microphone signal E(Ω) after echo cancellation is less than the masking threshold, if so, keep it and not process it; if Otherwise, the corresponding noise spectral components are attenuated according to conventional MMSE-LSA.

The psychoacoustic weighting filter coefficients are derived as follows.

The design goal of psychoacoustic adaptive weighting filtering is to minimize the distortion of the near-end speech signal when the sum of the residual echo distortion and the noise distortion equals the masking threshold. The optimal psychoacoustic weighting filter coefficient H(Ω) therefore satisfies:

[zeta_b - H(Ω)]² R_bb(Ω) + [zeta_n - H(Ω)]² R_nn(Ω) = R_TT(Ω)

where zeta_b is the residual echo attenuation coefficient, usually chosen so that 20lg(zeta_b) = -35;

zeta_n is the noise attenuation coefficient, usually chosen so that 20lg(zeta_n) = -15.

Since 0 <= H(Ω) <= 1, solving the above quadratic equation for H(Ω) and taking the positive root gives:

H(Ω) = min(1, [zeta_b*R_bb(Ω) + zeta_n*R_nn(Ω) + sqrt([R_bb(Ω) + R_nn(Ω)]*R_TT(Ω) - [zeta_b - zeta_n]²*R_bb(Ω)*R_nn(Ω))] / (R_bb(Ω) + R_nn(Ω)))

Since zeta_b and zeta_n are both much smaller than 1, and R_TT(Ω) is usually not too small relative to R_bb(Ω) and R_nn(Ω), the above expression can be simplified to:

H(Ω) = min(1, sqrt(R_TT(Ω)/(R_bb(Ω) + R_nn(Ω))) + (zeta_b*R_bb(Ω) + zeta_n*R_nn(Ω))/(R_bb(Ω) + R_nn(Ω)))
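As a numerical check of the derivation, the following Python sketch evaluates both the positive root of the quadratic constraint and the simplified expression for a single frequency bin. The function names are hypothetical. The cross term under the square root is taken as R_bb*R_nn, which is what expanding the quadratic yields. Clamping a negative discriminant to zero and returning 1 when there is no interference are assumptions of this sketch, not steps stated in the text.

```python
import math

def db_to_linear(level_db):
    """Convert an amplitude level in dB (20*lg) to a linear factor."""
    return 10.0 ** (level_db / 20.0)

def weight_exact(r_bb, r_nn, r_tt, zeta_b, zeta_n):
    """Positive root of (zeta_b-H)^2*R_bb + (zeta_n-H)^2*R_nn = R_TT, capped at 1."""
    total = r_bb + r_nn
    if total <= 0.0:
        return 1.0  # assumption: nothing to suppress, pass the bin through
    disc = total * r_tt - (zeta_b - zeta_n) ** 2 * r_bb * r_nn
    disc = max(disc, 0.0)  # assumption: clamp when no real root exists
    return min(1.0, (zeta_b * r_bb + zeta_n * r_nn + math.sqrt(disc)) / total)

def weight_simplified(r_bb, r_nn, r_tt, zeta_b, zeta_n):
    """Simplified form, dropping the (zeta_b - zeta_n)^2 cross term."""
    total = r_bb + r_nn
    if total <= 0.0:
        return 1.0  # same assumption as above
    return min(1.0, math.sqrt(r_tt / total)
               + (zeta_b * r_bb + zeta_n * r_nn) / total)

# Typical attenuation targets quoted in the text: -35 dB and -15 dB.
zeta_b = db_to_linear(-35.0)
zeta_n = db_to_linear(-15.0)
h_exact = weight_exact(1.0, 1.0, 0.5, zeta_b, zeta_n)
h_simple = weight_simplified(1.0, 1.0, 0.5, zeta_b, zeta_n)
```

For these example values the two results differ by less than 0.01, which illustrates why the simplified form is an acceptable substitute when the masking threshold is not very small compared with the interference power.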

In the embodiment of the present invention, the psychoacoustic post-filter also lowers the order required of the preceding echo-cancellation adaptive filter. This accelerates the convergence of the echo canceller, reduces the computational complexity of the algorithm, and improves robustness in strong background noise and near-end speech environments.

In addition, residual echo cancellation is integrated into the psychoacoustic weighting post-filter: the residual echo is used to adaptively update the filter weighting coefficients, which further suppresses the acoustic echo. Noise spectral components and residual echo components below the masking threshold are inaudible because of the masking effect of the human ear, so they need not be attenuated. Only the noise spectral components and residual echo components that are not masked by the speech signal are attenuated, using a conventional adaptive post-filtering method. The original background noise characteristics are thus well preserved, the residual echo sounds more like noise in listening tests, speech distortion is reduced, and the result sounds more natural.

It should be noted that, for simplicity of description, the foregoing method embodiments are each expressed as a combination of a series of actions. Those skilled in the art should understand, however, that the present invention is not limited by the described order of actions, because according to the present invention certain steps may be performed in another order or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and that the actions and modules involved are not necessarily required by the present invention. The descriptions of the above embodiments each have their own emphasis; for parts not described in detail in one embodiment, reference may be made to the related descriptions of the other embodiments.

As shown in FIG. 5, an embodiment of the present invention further provides a system 500 for processing a sound signal, comprising:

The signal acquisition module 510 is configured to acquire a sound signal to be processed, the sound signal to be processed including a target sound signal and an interfering sound signal.

The spectrum estimation determination module 520 is configured to determine the power spectral density of the interfering sound signal, and to weight the sound signal to be processed according to the power spectral density so as to obtain a spectral estimate of the target sound signal.

The masking threshold determination module 530 is configured to determine a masking threshold from the spectral estimate.

The filtering processing module 540 is configured to filter the sound signal to be processed when it is determined that a spectral component of the interfering sound signal in the sound signal to be processed is greater than the masking threshold.

Further, the interfering sound signal includes a noise signal and an echo signal.

The spectrum estimation determination module is further configured to convert the sound signal to be processed into a frequency-domain signal E(Ω), and to determine the posterior signal-to-noise ratio PostSNR(Ω) according to the following formula:

S'(Ω) = E(Ω)*H_LSA(Ω),
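The signal-to-noise quantities used by the spectrum estimation module can be sketched per frequency bin in Python, using the expressions recited in claim 1 for PostSNR, PrioriSNR, P(x), and theta. The function and parameter names are hypothetical, and no value of the smoothing factor alpha is given in the text, so it is left as a required argument. The expression for H_LSA(Ω) itself is not reproduced in this text, so the sketch stops at theta and does not compute the gain.

```python
def snr_estimates(e, s_prev, r_bb, r_nn, alpha):
    """Return (PostSNR, PrioriSNR, theta) for one frequency bin.

    e      -- complex bin of E(Ω) for the current frame
    s_prev -- complex bin of the spectral estimate S' of the previous frame
    r_bb   -- echo power spectral density for this bin
    r_nn   -- noise power spectral density for this bin
    alpha  -- smoothing factor (value not specified in the text)
    """
    def p(x):
        # Half-wave rectifier P(x) = (|x| + x) / 2
        return (abs(x) + x) / 2.0

    post = abs(e) ** 2 / (r_bb + r_nn)
    # Decision-directed update, with the denominator as recited in claim 1
    prio = (1.0 - alpha) * p(post - 1.0) + alpha * abs(s_prev) ** 2 / r_bb
    theta = post * prio / (prio + 1.0)
    return post, prio, theta
```

The rectifier P(x) keeps the instantaneous term non-negative, and alpha trades responsiveness to the current frame against smoothness carried over from the previous frame.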

The masking threshold determination module is further configured to determine, from the spectral estimate, the power spectral density B(k) of the critical bands of the sound signal to be processed and the spread critical-band spectrum C(k):

C(k) = B(k)*SF(k),

T(k) = 10^{lg(C(k))-(O(k)/10)},

R_TT(Ω) = min(T(k), T_abs(k)),
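The chain from B(k) to the preliminary threshold T(k) can be sketched in Python, using the spreading function SF and offset function O(k) recited in claim 3. Several points are assumptions of this sketch rather than statements of the text: the `*` in C(k) = B(k)*SF(k) is read as a convolution over the Bark band index, SF is treated as a level in dB and converted to a power ratio before use, and the band index k is counted from 1. The function names are hypothetical; `belta` is the tonality coefficient as named in the claims.

```python
import math

def spreading_db(dk):
    """Spreading function SF in dB for a Bark-band distance dk."""
    return 15.81 + 7.5 * (dk + 0.474) - 17.5 * math.sqrt(1.0 + (dk + 0.474) ** 2)

def spread_spectrum(b):
    """C(k): critical-band powers B spread across bands (assumed convolution)."""
    n = len(b)
    return [sum(b[j] * 10.0 ** (spreading_db(k - j) / 10.0) for j in range(n))
            for k in range(n)]

def offset_db(k, belta):
    """Offset O(k) = belta*(14.5 + k) + (1 - belta)*5.5."""
    return belta * (14.5 + k) + (1.0 - belta) * 5.5

def preliminary_threshold(c, belta):
    """T(k) = 10^(lg(C(k)) - O(k)/10); every C(k) must be positive."""
    return [10.0 ** (math.log10(ck) - offset_db(k, belta) / 10.0)
            for k, ck in enumerate(c, start=1)]
```

A belta of 0 applies the fixed 5.5 dB offset and a belta of 1 applies the band-dependent (14.5 + k) dB offset, so the threshold drops further below the spread spectrum as belta increases.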

The filtering processing module is further configured to determine the weighting coefficient H(Ω) of the filtering processing according to the power spectral density of the echo signal and the power spectral density of the noise signal:

The signal acquisition module is further configured to receive an initial sound signal and to perform echo cancellation on the initial sound signal so as to obtain the sound signal to be processed.

In some embodiments, the present invention provides a non-volatile computer-readable storage medium storing one or more programs that include executable instructions. The executable instructions can be read and executed by an electronic device (including but not limited to a computer, a server, or a network device) to perform any of the above methods for processing a sound signal.

In some embodiments, the present invention further provides a computer program product comprising a computer program stored on a non-volatile computer-readable storage medium. The computer program includes program instructions which, when executed by a computer, cause the computer to perform any of the above methods for processing a sound signal.

In some embodiments, the present invention further provides an electronic device comprising at least one processor and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the method for processing a sound signal.

In some embodiments, the present invention further provides a storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method for processing a sound signal.

The system for processing a sound signal according to the above embodiment of the present invention can be used to perform the method for processing a sound signal according to the embodiment of the present invention, and correspondingly achieves the technical effects of that method, which are not repeated here. In the embodiments of the present invention, the relevant functional modules may be implemented by a hardware processor.

FIG. 6 is a schematic diagram of the hardware structure of an electronic device that performs the method for processing a sound signal according to another embodiment of the present application. As shown in FIG. 6, the device includes:

one or more processors 610 and a memory 620; one processor 610 is taken as an example in FIG. 6.

The device that performs the method for processing a sound signal may further include an input device 630 and an output device 640.

The processor 610, the memory 620, the input device 630, and the output device 640 may be connected by a bus or in other ways; connection by a bus is taken as an example in FIG. 6.

The memory 620, as a non-volatile computer-readable storage medium, can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions/modules corresponding to the method for processing a sound signal in the embodiments of the present application. By running the non-volatile software programs, instructions, and modules stored in the memory 620, the processor 610 executes the various functional applications and data processing of the server, that is, implements the method for processing a sound signal of the above method embodiments.

The memory 620 may include a program storage area and a data storage area. The program storage area may store an operating system and an application program required for at least one function; the data storage area may store data created through use of the apparatus for processing a sound signal, and the like. In addition, the memory 620 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory 620 optionally includes memory located remotely from the processor 610, and such remote memory may be connected to the apparatus for processing a sound signal through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The input device 630 may receive input numeric or character information and generate signals related to user settings and function control of the apparatus for processing a sound signal. The output device 640 may include a display device such as a display screen.

The one or more modules are stored in the memory 620 and, when executed by the one or more processors 610, perform the method for processing a sound signal in any of the above method embodiments.

The above product can perform the method provided by the embodiments of the present application and has the functional modules and beneficial effects corresponding to performing the method. For technical details not described in detail in this embodiment, reference may be made to the method provided by the embodiments of the present application.

The electronic device of the embodiments of the present application exists in various forms, including but not limited to:

(1) Mobile communication devices: these devices are characterized by mobile communication functions, and their main purpose is to provide voice and data communication. Such terminals include smart phones (e.g. iPhone), multimedia phones, feature phones, and low-end phones.

(2) Ultra-mobile personal computer devices: these devices belong to the category of personal computers, have computing and processing functions, and generally also support mobile Internet access. Such terminals include PDA, MID, and UMPC devices, e.g. iPad.

(3) Portable entertainment devices: these devices can display and play multimedia content. They include audio and video players (e.g. iPod), handheld game consoles, e-book readers, smart toys, and portable in-vehicle navigation devices.

(4) Servers: devices that provide computing services. A server comprises a processor, hard disk, memory, system bus, and so on. Its architecture is similar to that of a general-purpose computer, but because it must provide highly reliable services, the requirements on processing capacity, stability, reliability, security, scalability, and manageability are higher.

(5) Other electronic devices with data interaction functions.

The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

From the description of the above embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by software plus a general-purpose hardware platform, and of course also by hardware. Based on this understanding, the above technical solutions, in essence or in the part that contributes to the related art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as a ROM/RAM, magnetic disk, or optical disc, and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to perform the methods described in the embodiments or in certain parts of the embodiments.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims

1. A method of processing a sound signal, comprising:

acquiring a sound signal to be processed, wherein the sound signal to be processed comprises a target sound signal and an interference sound signal, and the interference sound signal comprises a noise signal and an echo signal;

determining a power spectral density of the interfering sound signal;

converting the sound signal to be processed into a frequency domain signal E(Ω);

determining the posterior signal-to-noise ratio PostSNR(Ω) according to the following formula:

PostSNR(Ω) = |E(Ω)|²/(R_bb(Ω)+R_nn(Ω)),

wherein R_bb(Ω) is the power spectral density of the echo signal, and R_nn(Ω) is the power spectral density of the noise signal;

deriving the a priori signal-to-noise ratio PrioriSNR(Ω) according to the following formula:

PrioriSNR(Ω_i) = (1-alpha)*P(PostSNR(Ω_i)-1) + alpha*|S'(Ω_i-1)|²/R_bb(Ω);

wherein alpha is a smoothing factor, P(x) = (|x|+x)/2, and S'(Ω_i-1) is the spectral estimate of the sound signal of the previous frame;

further calculating a weighting factor H_LSA(Ω), and obtaining a spectral estimate S'(Ω) of the target sound signal:

S'(Ω) = E(Ω)*H_LSA(Ω),

wherein theta = PostSNR(Ω)*PrioriSNR(Ω)/(PrioriSNR(Ω)+1);

determining a masking threshold from the spectral estimate;

and under the condition that the frequency spectrum component of the interference sound signal in the sound signal to be processed is determined to be larger than the masking threshold, carrying out filtering processing on the sound signal to be processed.

2. The method according to claim 1, wherein the step of performing filtering processing on the sound signal to be processed when it is determined that the spectral component of the interfering sound signal in the sound signal to be processed is greater than the masking threshold comprises:

determining a weighting coefficient H(Ω) of the filtering processing according to the power spectral density of the echo signal and the power spectral density of the noise signal:

H(Ω) = min(1, sqrt(R_TT(Ω)/(R_bb(Ω)+R_nn(Ω))) + (zeta_b*R_bb(Ω)+zeta_n*R_nn(Ω))/(R_bb(Ω)+R_nn(Ω))),

wherein R_bb(Ω) is the power spectral density of the echo signal, R_nn(Ω) is the power spectral density of the noise signal, zeta_b is the echo attenuation coefficient, and zeta_n is the noise attenuation coefficient.

3. The method of claim 1, wherein the step of determining a masking threshold based on the spectral estimate comprises:

determining, from the spectral estimate, a power spectral density B(k) of the critical bands of the sound signal to be processed and a spread critical-band spectrum C(k):

C(k) = B(k)*SF(k),

wherein SF(k) = 15.81 + 7.5*(k+0.474) - 17.5*sqrt(1+(k+0.474)²), and bh and bl are the upper and lower limit frequencies of each critical band, respectively;

determining a preliminary masking threshold T(k) according to the spread critical-band spectrum C(k) and the offset function O(k):

T(k) = 10^{lg(C(k))-(O(k)/10)},

wherein the offset function O(k) = belta*(14.5+k) + (1-belta)*5.5, and belta is the tonality coefficient;

determining a masking threshold R_TT(Ω) according to the preliminary masking threshold T(k) and an absolute hearing threshold T_abs(k):

R_TT(Ω) = min(T(k), T_abs(k)),

wherein T_abs(k) = 3.64f^-0.8 - 6.5exp(f-3.3)² + 10^-3f⁴.

4. The method of claim 1, wherein the step of obtaining the sound signal to be processed comprises:

receiving an initial sound signal;

and carrying out echo cancellation on the initial sound signal to obtain the sound signal to be processed.

5. The method according to claim 1, characterized in that the sound signal to be processed is a speech signal.

6. A system for processing a sound signal, comprising:

a signal acquisition module, configured to acquire a sound signal to be processed, wherein the sound signal to be processed comprises a target sound signal and an interference sound signal, and the interference sound signal comprises a noise signal and an echo signal;

a spectrum estimation determination module, configured to determine the power spectral density of the interference sound signal and to weight the sound signal to be processed according to the power spectral density to obtain a spectral estimate of the target sound signal;

a masking threshold determination module, configured to determine a masking threshold from the spectral estimate;

a filtering processing module, configured to filter the sound signal to be processed under the condition that the frequency spectrum component of the interference sound signal in the sound signal to be processed is determined to be larger than the masking threshold;

wherein the spectrum estimation determination module is further configured to convert the sound signal to be processed into a frequency domain signal E(Ω), and to determine the posterior signal-to-noise ratio PostSNR(Ω) according to the following formula:

PostSNR(Ω) = |E(Ω)|²/(R_bb(Ω)+R_nn(Ω)),

S'(Ω) = E(Ω)*H_LSA(Ω),

wherein theta = PostSNR(Ω)*PrioriSNR(Ω)/(PrioriSNR(Ω)+1).

7. The system according to claim 6, wherein the masking threshold determination module is further configured to determine, according to the spectral estimate, the power spectral density B(k) of the critical bands of the sound signal to be processed and the spread critical-band spectrum C(k):

C(k) = B(k)*SF(k),

wherein SF(k) = 15.81 + 7.5*(k+0.474) - 17.5*sqrt(1+(k+0.474)²), and bh and bl are the upper and lower limit frequencies of each critical band, respectively;

T(k) = 10^{lg(C(k))-(O(k)/10)},

R_TT(Ω) = min(T(k), T_abs(k)),

wherein T_abs(k) = 3.64f^-0.8 - 6.5exp(f-3.3)² + 10^-3f⁴.

8. The system of claim 6, wherein the filtering module is further configured to determine a weighting coefficient H (Ω) of the filtering process according to the power spectral density of the echo signal and the power spectral density of the noise signal:

9. The system of claim 6, wherein the signal acquisition module is further configured to receive an initial sound signal, and to perform echo cancellation on the initial sound signal to obtain the sound signal to be processed.

10. An electronic device, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the method of any one of claims 1-5.

11. A storage medium on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 5.