WO2010083641A1 - Double-talk detection method and apparatus - Google Patents

Double-talk detection method and apparatus

Info

Publication number
WO2010083641A1
Authority
WO
WIPO (PCT)
Prior art keywords
end signal
far
signal frame
difference
detecting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2009/070226
Other languages
English (en)
French (fr)
Inventor
程荣
张崇岩
韦春妍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to PCT/CN2009/070226 (WO2010083641A1, zh)
Priority to CN200980142133.XA (CN102160296B, zh)
Priority to EP09838608.9A (EP2348645B1, en)
Priority to US12/577,410 (US8160238B2, en)
Publication of WO2010083641A1 (zh)
Legal status: Ceased

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M9/00 Arrangements for interconnection not involving centralised switching
    • H04M9/08 Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic
    • H04M9/082 Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic using echo cancellers

Definitions

  • The embodiments of the present invention relate to the field of voice communications, and in particular to a double-talk detection method and apparatus.
  • a voice communication product (for example, a mobile phone)
  • the prior art uses an acoustic echo cancellation technique.
  • The principle of the technique is to use an adaptive filter to model the echo path, obtain an estimated echo signal, and subtract the estimated echo signal from the near-end signal captured by the microphone, thereby cancelling the echo.
  • This detection technique is double-talk detection. Specifically, it must determine whether the current call state is one in which the near end and the far end are speaking at the same time (double-talk state) or one in which the near-end signal contains only the echo signal (single-talk state), and thereby decide whether to update the adaptive filter coefficients.
  • the prior art provides an energy-based detection method, a signal correlation-based detection method, and a dual filter-based detection method.
  • the energy-based detection method detects the current call state by comparing the instantaneous power of the near-end signal with the instantaneous power of the far-end signal.
  • The method requires that the energy of the echo signal be smaller than the energy of the near-end speech and of the far-end signal, so it applies only to scenes where the echo signal energy is small; it therefore depends on the energy levels of the far-end signal and the echo signal, and its misjudgment rate is high.
  • the signal correlation detection method detects the current call state by calculating the correlation between the far-end signal and the near-end signal.
  • The method has high computational complexity, and its detection accuracy depends on the degree of signal distortion; when the echo signal is distorted, the detection accuracy decreases.
  • the detection method based on the double filter detects the current call state by calculating and comparing the two filtered output results.
  • The detection accuracy of the method also depends on the degree of signal distortion. When the echo signal is distorted, the adaptive filter tends to diverge and has difficulty reaching convergence, which lowers the detection accuracy.
  • The inventors found that existing double-talk detection techniques are suited to scenes with little nonlinear distortion and low echo signal energy. In a real environment, however, taking a mobile phone as an example, the loudspeaker has a band-pass characteristic and unavoidably introduces nonlinear distortion into the echo signal; furthermore, in hands-free mode the echo signal energy is large. In a real environment, therefore, existing double-talk detection techniques have low detection accuracy and poor detection performance. Once near-end speech is misjudged as an echo signal, adaptive filtering cancels it as if it were echo, which seriously degrades call quality.
  • the embodiment of the invention provides a double-talk detection method and device for improving detection accuracy.
  • the embodiment of the invention provides a double-talk detection method, including:
  • the call state is detected according to a spectral difference between the far-end signal frame and the near-end signal frame.
  • An embodiment of the present invention provides a double-talk detection apparatus, including:
  • An obtaining module configured to acquire a far-end signal frame and a near-end signal frame
  • A spectrum detection module configured to detect a call state according to a spectral difference between the far-end signal frame and the near-end signal frame.
  • The embodiment of the invention detects the call state according to the spectral difference between the far-end signal frame and the near-end signal frame. Even if the signal is nonlinearly distorted, the detection accuracy is not affected; on the contrary, the larger the distortion, the more accurate the detection result and the better the detection performance. In addition, the energy levels of the far-end signal and the echo signal do not affect detection performance, which makes the embodiment particularly suitable for hands-free use.
  • FIG. 1 is a schematic structural diagram of an AEC to which an embodiment of the present invention is applied;
  • FIG. 2 is a flowchart of a double-talk detection method according to Embodiment 1 of the present invention.
  • FIG. 3 is a flowchart of a double-talk detection method according to Embodiment 2 of the present invention.
  • FIG. 4 is a schematic diagram of the curves in the single-talk state in the double-talk detection method according to Embodiment 2 of the present invention.
  • FIG. 5 is a schematic diagram of the curves when near-end speech is present in the double-talk detection method according to Embodiment 2 of the present invention.
  • FIG. 6 is a schematic structural diagram of a double-talk detection apparatus according to an embodiment of the present invention.
  • FIG. 1 is a schematic structural diagram of an AEC to which an embodiment of the present invention applies. The AEC includes a double-talk detector (Double Talk Detection, hereinafter DTD) 11, an adaptive filter (AF) 12, and a nonlinear processor (Nonlinear Processor, hereinafter NLP) 13.
  • DTD 11 is a key factor that restricts the performance of the AEC.
  • The embodiments of the present invention provide a double-talk detection method and apparatus that can be applied in the DTD 11.
  • the technical solutions of the embodiments of the present invention are further described in detail below with reference to the accompanying drawings and embodiments.
  • FIG. 2 is a flowchart of the double-talk detection method according to Embodiment 1 of the present invention, which includes the following steps:
  • Step 101: Acquire a far-end signal frame and a near-end signal frame.
  • Step 102: Detect a call state according to a spectral difference between the far-end signal frame and the near-end signal frame.
  • Because the loudspeaker has a band-pass characteristic, the low-frequency part of the echo signal passing through it is attenuated. Once the hardware is fixed, the low-frequency attenuation of the echo signal caused by the loudspeaker's nonlinear distortion is constant, so the spectral difference between the echo signal and the far-end signal is constant; when near-end speech is present, the spectral difference between the near-end signal and the far-end signal changes.
  • The call state is detected according to the spectral difference between the far-end signal frame and the near-end signal frame. Even if the signal is nonlinearly distorted, the detection accuracy is not affected; on the contrary, the larger the distortion, the more accurate the detection result and the better the detection performance. The energy levels of the far-end signal and the echo signal do not affect detection performance, which makes this embodiment particularly suitable for hands-free use.
  • FIG. 3 is a flowchart of the double-talk detection method according to Embodiment 2 of the present invention, which includes the following steps:
  • Step 201: Acquire a far-end signal frame and a near-end signal frame.
  • The DTD 11 acquires the far-end signal input frame and the near-end signal input frame, writes the far-end signal input frame into a buffer, and delays the far-end signal input frame in the buffer by a specified time to obtain the far-end signal frame.
  • Tail_length denotes the number of samples corresponding to the specified time.
  • The specified time is the actual delay of the echo signal relative to the far-end signal; that is, the echo signal contained in the near-end signal input frame actually corresponds to the far-end signal frame delayed by Tail_length samples.
  • The echo delay of a typical mobile phone is about 16 ms, so for a signal sampled at 8000 Hz, Tail_length is 128.
  • The near-end signal input frame is used as the near-end signal frame and is processed in the next step together with the far-end signal frame.
  • Step 202: Calculate a first short-term average zero-crossing rate of the far-end signal frame and a second short-term average zero-crossing rate of the near-end signal frame.
  • The frequency characteristic of the signal is extracted using the short-term average zero-crossing rate, defined as the number of times the product of adjacent sample values in the signal frame is less than zero, that is, the number of times the signal frame's curve crosses zero; the larger the short-term average zero-crossing rate, the higher the frequency of the signal frame.
  • Step 203: Perform smoothing filtering on the first short-term average zero-crossing rate and the second short-term average zero-crossing rate.
  • zcr1 = zcr1*a + (1 - a)*zcr1_pre;
  • zcr2 = zcr2*a + (1 - a)*zcr2_pre;
  • a is the smoothing coefficient
  • a is between 0 and 1
  • zcr1_pre and zcr2_pre respectively represent the short-term average zero-crossing rate of the far-end signal of the previous frame and the short-term average zero-crossing rate of the near-end signal of the previous frame.
  • Smoothing filtering removes the high-frequency fluctuation of the short-term average zero-crossing rate.
  • Step 204: Calculate the difference between the first short-term average zero-crossing rate and the second short-term average zero-crossing rate.
  • The zcr1 and zcr2 used in this step may be the values obtained in step 202 or the values after processing in step 203.
  • Step 205 Determine whether the difference is less than the spectrum difference threshold. If yes, go to step 206; otherwise, go to step 207.
  • the spectral difference threshold is represented by T.
  • Step 206 Detect that the call state is a double talk state.
  • the determination result is input to the double talk processing module of the AEC, specifically, input to the AF12, and the AF12 does not update the adaptive filter coefficient according to the determination result. After the filtering result is output, it is processed by the NLP 13.
  • Step 207 Detect that the call state is a single talk state.
  • The call state is detected by using the spectral difference between the near-end signal and the far-end signal caused by the nonlinear distortion of the loudspeaker. Specifically, the short-term average zero-crossing rate is used to extract the frequency characteristic of the signal, and the call state is detected according to the difference between the short-term average zero-crossing rates of the far-end signal frame and the near-end signal frame. The detection accuracy is therefore not degraded by nonlinear distortion; on the contrary, the larger the distortion, the more accurate the detection result and the better the detection performance. Compared with the prior art, the embodiment does not depend on the degree of nonlinear distortion or on the energy levels of the far-end signal and the echo signal, and it improves call quality; the benefit is even more pronounced in hands-free use.
  • FIG. 4 is a schematic diagram of the curves in the single-talk state in the double-talk detection method according to Embodiment 2 of the present invention.
  • FIG. 5 is a schematic diagram of the curves when near-end speech is present in the double-talk detection method according to Embodiment 2 of the present invention.
  • In both figures, the solid line indicates the short-term average zero-crossing rate of the far-end signal frame, and the dashed line indicates the short-term average zero-crossing rate of the near-end signal frame.
  • In FIG. 4 the near-end signal contains the echo signal; in FIG. 5 the near-end signal contains the echo signal and near-end speech.
  • Embodiments of the present invention can be applied in the AEC of voice communication products that, like mobile phone loudspeakers, exhibit nonlinear distortion.
  • FIG. 6 is a schematic structural diagram of a double-talk detection apparatus according to an embodiment of the present invention.
  • This embodiment may be the DTD 11 in an AEC, and specifically includes an obtaining module 21 and a spectrum detection module 22. The obtaining module 21 is configured to acquire the far-end signal frame and the near-end signal frame; the spectrum detection module 22 is configured to detect a call state according to the spectral difference between the far-end signal frame and the near-end signal frame.
  • The obtaining module 21 includes an input module 23, a buffer module 24, and an output module 25.
  • The input module 23 is configured to obtain the far-end signal input frame and the near-end signal input frame, and to pass the far-end signal input frame to the buffer module 24.
  • The buffer module 24 is configured to delay the far-end signal input frame by the specified time to obtain the far-end signal frame.
  • The output module 25 is configured to use the near-end signal input frame as the near-end signal frame, so that both the far-end signal frame and the near-end signal frame are acquired.
  • The spectrum detection module 22 includes a first calculation module 26, a second calculation module 27, and a difference detection module 28.
  • The first calculation module 26 is configured to calculate a first short-term average zero-crossing rate of the far-end signal frame and a second short-term average zero-crossing rate of the near-end signal frame.
  • The second calculation module 27 is configured to calculate the difference between the first short-term average zero-crossing rate and the second short-term average zero-crossing rate; the difference detection module 28 is configured to detect the call state according to the difference.
  • The spectrum detection module 22 may further include a filtering module 29, configured to perform smoothing filtering on the first short-term average zero-crossing rate and the second short-term average zero-crossing rate calculated by the first calculation module 26, and then output them to the second calculation module 27.
  • The difference detection module 28 may further include a determining unit 281 and a detecting unit 282. The determining unit 281 is configured to determine whether the difference is smaller than a spectrum difference threshold. The detecting unit 282 is configured to detect that the call state is the double-talk state when the determining unit 281 determines that the difference is smaller than the spectrum difference threshold, and to detect that the call state is the single-talk state when the determining unit 281 determines that the difference is greater than or equal to the spectrum difference threshold.
  • The call state is detected by using the spectral difference between the near-end signal and the far-end signal caused by the nonlinear distortion of the loudspeaker. Specifically, the short-term average zero-crossing rate is used to extract the frequency characteristic of the signal, and the call state is detected according to the difference between the short-term average zero-crossing rates of the far-end signal frame and the near-end signal frame. The detection accuracy is therefore not degraded by nonlinear distortion; on the contrary, the larger the distortion, the more accurate the detection result and the better the detection performance.
  • Compared with the prior art, the embodiment does not depend on the degree of nonlinear distortion or on the energy levels of the far-end signal and the echo signal, and it improves call quality; the benefit is even more pronounced in hands-free use.
  • The foregoing storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc. The foregoing embodiments are not limiting. Although the embodiments of the present invention have been described in detail with reference to the foregoing embodiments, persons of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of their technical features, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Telephone Function (AREA)
  • Cable Transmission Systems, Equalization Of Radio And Reduction Of Echo (AREA)

Description

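Double-talk detection method and apparatus
Technical field
The embodiments of the present invention relate to the field of voice communications, and in particular to a double-talk detection method and apparatus.
Background
In the field of voice communications, after a voice communication product (for example, a mobile phone) receives a far-end signal from the network side and plays it through the loudspeaker, an echo signal is produced along the acoustic path. The echo signal and the near-end speech are captured by the microphone and transmitted to the other party of the call. To remove the echo signal, the prior art uses acoustic echo cancellation, whose principle is as follows: an adaptive filter models the echo path to obtain an estimated echo signal, and the estimated echo signal is subtracted from the near-end signal captured by the microphone, thereby cancelling the echo.
Acoustic echo cancellation needs to detect whether near-end speech is present in the near-end signal captured by the microphone. This is double-talk detection. Specifically, it must determine whether the current call state is one in which the near end and the far end are speaking at the same time (double-talk state) or one in which the near-end signal contains only the echo signal (single-talk state), and thereby decide whether to update the adaptive filter coefficients.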
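The interaction between the adaptive filter and the double-talk decision can be sketched as follows. This is a minimal illustration, not the filter used in the patent: the source does not name an adaptation algorithm, so normalized LMS (NLMS) is assumed here, and the function name `nlms_echo_canceller`, the step size `mu`, and the per-sample `update` flags are all hypothetical.

```python
import random

def nlms_echo_canceller(far, near, taps, mu=0.5, eps=1e-8, update=None):
    """Subtract an estimated echo from the near-end signal (NLMS assumed).

    update[n] is True when the detector allows adaptation (single talk)
    and False when adaptation must be frozen (double talk).
    """
    w = [0.0] * taps            # adaptive filter coefficients
    history = [0.0] * taps      # most recent far-end samples, newest first
    residual = []
    for n, (x, d) in enumerate(zip(far, near)):
        history = [x] + history[:-1]
        estimate = sum(wi * xi for wi, xi in zip(w, history))
        e = d - estimate        # near-end signal minus estimated echo
        residual.append(e)
        if update is None or update[n]:
            norm = sum(xi * xi for xi in history) + eps
            w = [wi + mu * e * xi / norm for wi, xi in zip(w, history)]
    return residual, w

# Toy echo path (an assumption): echo[n] = 0.5*far[n] - 0.3*far[n-1].
rng = random.Random(0)
far = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
near = [0.5 * far[n] - (0.3 * far[n - 1] if n else 0.0) for n in range(2000)]
residual, w = nlms_echo_canceller(far, near, taps=2)
```

With adaptation enabled and no near-end speech, the coefficients approach the toy echo path and the residual shrinks. Passing `update` flags that are False during double talk keeps near-end speech from corrupting the coefficients, which is the reason the detection result matters.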
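For double-talk detection, the prior art provides energy-based detection methods, signal-correlation-based detection methods, and dual-filter-based detection methods.
The energy-based detection method detects the current call state by comparing the instantaneous power of the near-end signal with that of the far-end signal. The method requires that the energy of the echo signal be smaller than the energy of the near-end speech and of the far-end signal, so it applies only to scenes where the echo signal energy is small; it therefore depends on the energy levels of the far-end signal and the echo signal, and its misjudgment rate is high.
The signal-correlation-based detection method detects the current call state by calculating the correlation between the far-end signal and the near-end signal. The method has high computational complexity, and its detection accuracy depends on the degree of signal distortion; when the echo signal is distorted, the detection accuracy decreases.
The dual-filter-based detection method detects the current call state by calculating and comparing the outputs of two filtering passes. Its detection accuracy also depends on the degree of signal distortion; when the echo signal is distorted, the adaptive filter tends to diverge and has difficulty reaching convergence, which lowers the detection accuracy.
In the course of implementing the present invention, the inventors found that existing double-talk detection techniques are suited to scenes with little nonlinear distortion and low echo signal energy. In a real environment, however, taking a mobile phone as an example, the loudspeaker has a band-pass characteristic and unavoidably introduces nonlinear distortion into the echo signal; furthermore, in hands-free mode the echo signal energy is large. In a real environment, therefore, existing double-talk detection techniques have low detection accuracy and poor detection performance. Once near-end speech is misjudged as an echo signal, adaptive filtering cancels it as if it were echo, which seriously degrades call quality.
Summary of the invention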
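The embodiments of the present invention provide a double-talk detection method and apparatus for improving detection accuracy.
An embodiment of the present invention provides a double-talk detection method, including:
acquiring a far-end signal frame and a near-end signal frame;
detecting a call state according to a spectral difference between the far-end signal frame and the near-end signal frame.
An embodiment of the present invention provides a double-talk detection apparatus, including:
an obtaining module, configured to acquire a far-end signal frame and a near-end signal frame;
a spectrum detection module, configured to detect a call state according to a spectral difference between the far-end signal frame and the near-end signal frame.
The embodiments of the present invention detect the call state according to the spectral difference between the far-end signal frame and the near-end signal frame. Even if the signal is nonlinearly distorted, the detection accuracy is not affected; on the contrary, the larger the distortion, the more accurate the detection result and the better the detection performance. In addition, the energy levels of the far-end signal and the echo signal do not affect detection performance, which makes the embodiments particularly suitable for hands-free use.
Brief description of the drawings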
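FIG. 1 is a schematic structural diagram of an AEC to which the embodiments of the present invention apply;
FIG. 2 is a flowchart of the double-talk detection method according to Embodiment 1 of the present invention;
FIG. 3 is a flowchart of the double-talk detection method according to Embodiment 2 of the present invention;
FIG. 4 is a schematic diagram of the curves in the single-talk state in the double-talk detection method according to Embodiment 2 of the present invention;
FIG. 5 is a schematic diagram of the curves when near-end speech is present in the double-talk detection method according to Embodiment 2 of the present invention;
FIG. 6 is a schematic structural diagram of the double-talk detection apparatus according to an embodiment of the present invention.
Detailed description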
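First, the acoustic echo canceller (Acoustic Echo Canceller, hereinafter AEC) to which the embodiments of the present invention apply is briefly introduced. FIG. 1 is a schematic structural diagram of this AEC. The AEC includes a double-talk detector (Double Talk Detection, hereinafter DTD) 11, an adaptive filter (AF) 12, and a nonlinear processor (Nonlinear Processor, hereinafter NLP) 13. The DTD 11 is a key factor limiting AEC performance. To solve the problem that existing DTD 11 detection techniques have difficulty detecting the double-talk state effectively on voice communication products, the embodiments of the present invention propose a double-talk detection method and apparatus that can be applied in the DTD 11. The technical solutions of the embodiments are described in further detail below with reference to the accompanying drawings and embodiments.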
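FIG. 2 is a flowchart of the double-talk detection method according to Embodiment 1 of the present invention, which includes the following steps:
Step 101: Acquire a far-end signal frame and a near-end signal frame;
Step 102: Detect a call state according to a spectral difference between the far-end signal frame and the near-end signal frame.
Because the loudspeaker has a band-pass characteristic, the low-frequency part of the echo signal passing through it is attenuated. Once the hardware is fixed, the low-frequency attenuation of the echo signal caused by the loudspeaker's nonlinear distortion is constant, so the spectral difference between the echo signal and the far-end signal is constant; when near-end speech is present, the spectral difference between the near-end signal and the far-end signal changes. This embodiment detects the call state according to the spectral difference between the far-end signal frame and the near-end signal frame. Even if the signal is nonlinearly distorted, the detection accuracy is not affected; on the contrary, the larger the distortion, the more accurate the detection result and the better the detection performance. In addition, the energy levels of the far-end signal and the echo signal do not affect detection performance, which makes this embodiment particularly suitable for hands-free use.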
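FIG. 3 is a flowchart of the double-talk detection method according to Embodiment 2 of the present invention, which includes the following steps:
Step 201: Acquire a far-end signal frame and a near-end signal frame.
Specifically, the DTD 11 obtains the incoming far-end signal input frame and near-end signal input frame, writes the far-end signal input frame into a buffer, and delays it in the buffer by a specified time (Tail_length denotes the number of samples corresponding to the specified time) to obtain the far-end signal frame. The specified time is the actual delay of the echo signal relative to the far-end signal; that is, the echo signal contained in the near-end signal input frame actually corresponds to the far-end signal frame delayed by Tail_length samples. The echo delay of a typical mobile phone is about 16 ms, so for a signal sampled at 8000 Hz, Tail_length is 128 (0.016 s × 8000 samples per second).
The near-end signal input frame is used as the near-end signal frame and is processed in the next step together with the far-end signal frame.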
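The delay arithmetic and the buffering in step 201 can be sketched as follows. The helper `tail_length` and the class `DelayLine` are illustrative names, not from the source, and the sketch assumes the delay is a whole number of samples and that the buffer starts out filled with silence.

```python
from collections import deque

def tail_length(delay_ms: float, sample_rate_hz: int) -> int:
    """Number of samples corresponding to an echo delay (hypothetical helper)."""
    return round(delay_ms * sample_rate_hz / 1000)

class DelayLine:
    """Delays the far-end signal by a fixed number of samples (illustrative)."""

    def __init__(self, delay_samples: int):
        # Pre-fill with zeros so the first delayed samples are silence.
        self._buf = deque([0.0] * delay_samples)

    def process(self, frame):
        """Push one input frame and return the equally long delayed frame."""
        out = []
        for x in frame:
            self._buf.append(x)
            out.append(self._buf.popleft())
        return out
```

For the figures quoted in the text, `tail_length(16, 8000)` gives 128, so a `DelayLine(128)` would align the far-end frame with the echo contained in the near-end frame.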
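Step 202: Calculate a first short-term average zero-crossing rate of the far-end signal frame and a second short-term average zero-crossing rate of the near-end signal frame.
This embodiment uses the short-term average zero-crossing rate to extract the frequency characteristic of the signal. The short-term average zero-crossing rate is the number of times the product of adjacent sample values in the signal frame is less than zero, that is, the number of times the signal frame's curve crosses zero; the larger the short-term average zero-crossing rate, the higher the frequency of the signal frame.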
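The definition above translates almost directly into code. The sketch below follows it literally (a count of adjacent-sample products below zero); the function name, the 160-sample frame length, and the test tones are assumptions made for illustration.

```python
import math

def zero_crossing_count(frame):
    """Count adjacent sample pairs whose product is negative."""
    return sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)

# Two test tones in a 20 ms frame at 8000 Hz (frame length is an assumption).
# A small phase offset keeps samples away from exact zeros.
FS, N = 8000, 160
low_tone = [math.sin(2 * math.pi * 100 * n / FS + 0.1) for n in range(N)]
high_tone = [math.sin(2 * math.pi * 1000 * n / FS + 0.1) for n in range(N)]
```

Note that a pair containing an exact zero sample is not counted under this definition, because its product is zero rather than negative. The higher-frequency tone yields the larger count, matching the statement that a larger zero-crossing rate indicates a higher frame frequency.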
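Step 203: Perform smoothing filtering on the first short-term average zero-crossing rate and the second short-term average zero-crossing rate.
Let zcr1 denote the first short-term average zero-crossing rate and zcr2 the second short-term average zero-crossing rate, as in the following two formulas:
zcr1 = zcr1*a + (1 - a)*zcr1_pre;
zcr2 = zcr2*a + (1 - a)*zcr2_pre;
Here a is the smoothing coefficient, with a value between 0 and 1, and zcr1_pre and zcr2_pre denote the short-term average zero-crossing rate of the previous far-end signal frame and of the previous near-end signal frame, respectively.
Smoothing filtering removes the high-frequency fluctuation of the short-term average zero-crossing rate. The smaller the value of a, the higher the smoothness: the short-term average zero-crossing rates of consecutive frames differ less, and the zero-crossing-rate curve is steadier. The larger the value of a, the lower the smoothness.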
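A minimal sketch of the smoothing formula follows. The source does not say whether zcr1_pre and zcr2_pre are the raw or the already smoothed values of the previous frame; the recursive form below assumes the smoothed value is carried forward, and the function names are illustrative.

```python
def smooth(current: float, previous: float, a: float) -> float:
    """One smoothing step: current*a + (1 - a)*previous, with 0 <= a <= 1."""
    if not 0.0 <= a <= 1.0:
        raise ValueError("smoothing coefficient a must lie in [0, 1]")
    return a * current + (1.0 - a) * previous

def smooth_track(values, a: float):
    """Smooth a sequence of per-frame zero-crossing rates (recursive form)."""
    out = []
    prev = values[0] if values else 0.0   # seed with the first frame's value
    for v in values:
        prev = smooth(v, prev, a)
        out.append(prev)
    return out
```

With a = 1 the output equals the current frame's value (no smoothing), and as a approaches 0 the output changes more and more slowly, which is the behaviour the text describes.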
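Step 204: Calculate the difference between the first short-term average zero-crossing rate and the second short-term average zero-crossing rate.
The zcr1 and zcr2 used in this step may be the values obtained in step 202 or the values after processing in step 203.
Step 205: Determine whether the difference is less than the spectrum difference threshold. If so, perform step 206; otherwise, perform step 207.
T denotes the spectrum difference threshold. Its value depends on the degree of nonlinear distortion of the loudspeaker in the actual system and can be obtained by experiment. In this embodiment, T = 6.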
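Step 206: Detect that the call state is the double-talk state.
Because zcr2 - zcr1 < T, the current call state is determined to be the double-talk state. The determination result is then input to the double-talk processing module of the AEC, specifically to the AF 12. According to the determination result, the AF 12 does not update the adaptive filter coefficients; after the filtering result is output, it is processed by the NLP 13.
Step 207: Detect that the call state is the single-talk state.
Because zcr2 - zcr1 ≥ T, the current call state is determined to be the single-talk state. The determination result is then input to the single-talk processing module of the AEC, specifically to the AF 12. According to the determination result, the AF 12 updates the adaptive filter coefficients; after the filtering result is output, it is processed by the NLP 13.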
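Steps 204 to 207 reduce to a single comparison, sketched below. The names `CallState`, `detect_call_state`, and `should_update_filter` are illustrative; the default threshold of 6 is the value given for this embodiment, and the text notes that it should be tuned per device.

```python
from enum import Enum

class CallState(Enum):
    DOUBLE_TALK = "double-talk"
    SINGLE_TALK = "single-talk"

def detect_call_state(zcr_far: float, zcr_near: float, threshold: float = 6.0) -> CallState:
    """Compare zcr2 - zcr1 against the spectrum difference threshold T."""
    diff = zcr_near - zcr_far          # zcr2 - zcr1 in the text's notation
    if diff < threshold:
        return CallState.DOUBLE_TALK   # step 206
    return CallState.SINGLE_TALK       # step 207

def should_update_filter(state: CallState) -> bool:
    """The adaptive filter adapts only in the single-talk state."""
    return state is CallState.SINGLE_TALK
```

A difference exactly equal to the threshold is treated as single talk here, consistent with the "otherwise" branch of step 205.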
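This embodiment takes the practical characteristics of voice communication products into account and detects the call state by using the spectral difference between the near-end signal and the far-end signal caused by the nonlinear distortion of the loudspeaker. Specifically, the short-term average zero-crossing rate is used to extract the frequency characteristic of the signal, and the call state is detected according to the difference between the short-term average zero-crossing rates of the far-end signal frame and the near-end signal frame. The detection accuracy is therefore not degraded by nonlinear distortion; on the contrary, the larger the distortion, the more accurate the detection result and the better the detection performance. Compared with the prior art, this embodiment does not depend on the degree of nonlinear distortion or on the energy levels of the far-end signal and the echo signal, and it improves call quality; the benefit is even more pronounced in hands-free use.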
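The solution of the embodiments of the present invention is further described below through a specific example.
FIG. 4 is a schematic diagram of the curves in the single-talk state in the double-talk detection method according to Embodiment 2 of the present invention, and FIG. 5 is a schematic diagram of the curves when near-end speech is present. In both figures, the solid line indicates the short-term average zero-crossing rate of the far-end signal frame and the dashed line indicates the short-term average zero-crossing rate of the near-end signal frame. In FIG. 4 the near-end signal contains the echo signal; in FIG. 5 the near-end signal contains the echo signal and near-end speech.
As FIG. 4 shows, the low-frequency attenuation caused by the nonlinear distortion of the echo signal concentrates the echo signal in the middle and high frequencies, so the zero-crossing rate of every distorted echo signal is greater than that of the far-end signal. If the echo signal were not distorted, its zero-crossing rate would equal that of the far-end signal, and the two curves in FIG. 4 would coincide.
As FIG. 5 shows, near-end speech is not subject to the nonlinear distortion introduced by the loudspeaker, so when near-end speech is present the zero-crossing rate of the near-end signal is close to that of the far-end signal, as in the region marked by the ellipse in FIG. 5; this region is in the double-talk state. When there is no near-end speech, the zero-crossing rate of the near-end signal stays a certain distance from that of the far-end signal, and the call is in the single-talk state. The embodiments of the present invention track the change in zero-crossing rate in real time to detect whether near-end speech is present and thereby determine the call state.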
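The behaviour described for FIG. 4 and FIG. 5 can be reproduced with synthetic signals. The sketch below is a toy under stated assumptions, not data from the patent: the far-end signal is a sum of two tones, the loudspeaker is stood in for by a first-difference filter (a linear filter that merely attenuates low frequencies), the near-end speech is a loud low-frequency tone, and the 160-sample frame length is assumed.

```python
import math

FS = 8000      # sampling rate in Hz (from the text)
FRAME = 160    # 20 ms frame; the frame length is an assumption
T = 6          # threshold used in the embodiment

def tone(freq, amp, phase, n=FRAME):
    """One frame of a sine tone."""
    return [amp * math.sin(2 * math.pi * freq * i / FS + phase) for i in range(n)]

def add(a, b):
    """Sample-wise sum of two frames."""
    return [x + y for x, y in zip(a, b)]

def first_difference(x):
    """Crude stand-in for a loudspeaker that attenuates low frequencies."""
    return [x[0]] + [x[i] - x[i - 1] for i in range(1, len(x))]

def zcr(frame):
    """Number of adjacent sample pairs whose product is negative."""
    return sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)

far = add(tone(200, 1.0, 0.3), tone(1800, 0.5, 0.7))   # far-end frame
echo = first_difference(far)                            # low frequencies attenuated
speech = tone(250, 10.0, 1.0)                           # loud near-end speech (toy)

single_diff = zcr(echo) - zcr(far)                # near end contains echo only
double_diff = zcr(add(echo, speech)) - zcr(far)   # echo plus near-end speech
```

In this toy setup the echo-only frame has a much higher zero-crossing count than the far-end frame, so `single_diff` is at least T, while adding the dominant near-end tone pulls the near-end count back down and `double_diff` falls below T. Real speech and real loudspeakers will behave less cleanly, which is why the text obtains T by experiment.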
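The embodiments of the present invention can be applied in the AEC of voice communication products that, like mobile phone loudspeakers, exhibit nonlinear distortion.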
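FIG. 6 is a schematic structural diagram of the double-talk detection apparatus according to an embodiment of the present invention. This embodiment may be the DTD 11 in an AEC, and specifically includes an obtaining module 21 and a spectrum detection module 22. The obtaining module 21 is configured to acquire a far-end signal frame and a near-end signal frame; the spectrum detection module 22 is configured to detect a call state according to the spectral difference between the far-end signal frame and the near-end signal frame.
Further, the obtaining module 21 includes an input module 23, a buffer module 24, and an output module 25. The input module 23 is configured to obtain the incoming far-end signal input frame and near-end signal input frame and to pass the far-end signal input frame to the buffer module 24. The buffer module 24 is configured to delay the far-end signal input frame by the specified time to obtain the far-end signal frame. The output module 25 is configured to use the near-end signal input frame as the near-end signal frame, so that both the far-end signal frame and the near-end signal frame are acquired.
The spectrum detection module 22 includes a first calculation module 26, a second calculation module 27, and a difference detection module 28. The first calculation module 26 is configured to calculate the first short-term average zero-crossing rate of the far-end signal frame and the second short-term average zero-crossing rate of the near-end signal frame. The second calculation module 27 is configured to calculate the difference between the first short-term average zero-crossing rate and the second short-term average zero-crossing rate. The difference detection module 28 is configured to detect the call state according to the difference.
The spectrum detection module 22 may further include a filtering module 29, configured to perform smoothing filtering on the first and second short-term average zero-crossing rates calculated by the first calculation module 26 and then output them to the second calculation module 27.
Still further, the difference detection module 28 may include a determining unit 281 and a detecting unit 282. The determining unit 281 is configured to determine whether the difference is smaller than the spectrum difference threshold. The detecting unit 282 is configured to detect that the call state is the double-talk state when the determining unit 281 determines that the difference is smaller than the spectrum difference threshold, and to detect that the call state is the single-talk state when the determining unit 281 determines that the difference is greater than or equal to the spectrum difference threshold.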
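One way to picture how the modules fit together is a single streaming class, sketched below. The class name, its parameters, and the mapping of methods to module numbers are illustrative assumptions; the patent describes functional modules, not this code.

```python
from collections import deque

class DoubleTalkDetector:
    """Illustrative composition of modules 21 to 29 (names are assumptions)."""

    def __init__(self, tail_length=128, threshold=6.0, a=1.0):
        self._delay = deque([0.0] * tail_length)   # buffer module 24
        self._threshold = threshold                # spectrum difference threshold T
        self._a = a                                # smoothing coefficient
        self._zcr_far_prev = None
        self._zcr_near_prev = None

    @staticmethod
    def _zcr(frame):
        # First calculation module 26: count sign changes between neighbours.
        return sum(1 for p, q in zip(frame, frame[1:]) if p * q < 0)

    def _smooth(self, cur, prev):
        # Filtering module 29: the first frame has no predecessor to blend with.
        return cur if prev is None else self._a * cur + (1 - self._a) * prev

    def process(self, far_input, near_input):
        """Return True for double talk, False for single talk."""
        # Obtaining module 21: delay the far-end frame, pass the near-end frame.
        far_frame = []
        for x in far_input:
            self._delay.append(x)
            far_frame.append(self._delay.popleft())
        near_frame = list(near_input)
        zcr_far = self._smooth(self._zcr(far_frame), self._zcr_far_prev)
        zcr_near = self._smooth(self._zcr(near_frame), self._zcr_near_prev)
        self._zcr_far_prev, self._zcr_near_prev = zcr_far, zcr_near
        diff = zcr_near - zcr_far          # second calculation module 27
        return diff < self._threshold      # determining/detecting units 281, 282
```

A caller would feed one far-end and one near-end frame per call to `process` and freeze filter adaptation whenever it returns True. Until the delay buffer has filled, the delayed far-end frame is silence, so the first decisions after start-up should be treated with caution.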
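This embodiment takes the practical characteristics of voice communication products into account and detects the call state by using the spectral difference between the near-end signal and the far-end signal caused by the nonlinear distortion of the loudspeaker. Specifically, the short-term average zero-crossing rate is used to extract the frequency characteristic of the signal, and the call state is detected according to the difference between the short-term average zero-crossing rates of the far-end signal frame and the near-end signal frame. The detection accuracy is therefore not degraded by nonlinear distortion; on the contrary, the larger the distortion, the more accurate the detection result and the better the detection performance. Compared with the prior art, this embodiment does not depend on the degree of nonlinear distortion or on the energy levels of the far-end signal and the echo signal, and it improves call quality; the benefit is even more pronounced in hands-free use.
Persons of ordinary skill in the art will understand that all or some of the steps of the foregoing method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the foregoing method embodiments. The storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc. The foregoing embodiments are not limiting. Although the embodiments of the present invention have been described in detail with reference to the foregoing embodiments, persons of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of their technical features, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.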

Claims

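1. A double-talk detection method, characterized by comprising:
acquiring a far-end signal frame and a near-end signal frame;
detecting a call state according to a spectral difference between the far-end signal frame and the near-end signal frame.
2. The double-talk detection method according to claim 1, characterized in that acquiring the far-end signal frame and the near-end signal frame comprises:
obtaining a far-end signal input frame and a near-end signal input frame;
delaying the far-end signal input frame by a specified time to obtain the far-end signal frame;
using the near-end signal input frame as the near-end signal frame.
3. The double-talk detection method according to claim 1 or 2, characterized in that detecting the call state according to the spectral difference between the far-end signal frame and the near-end signal frame comprises:
calculating a first short-term average zero-crossing rate of the far-end signal frame and a second short-term average zero-crossing rate of the near-end signal frame;
calculating a difference between the first short-term average zero-crossing rate and the second short-term average zero-crossing rate;
detecting the call state according to the difference.
4. The double-talk detection method according to claim 3, characterized in that, before the difference is calculated, the method further comprises: performing smoothing filtering on the first short-term average zero-crossing rate and the second short-term average zero-crossing rate.
5. The double-talk detection method according to claim 3, characterized in that detecting the call state according to the difference comprises:
determining whether the difference is smaller than a spectrum difference threshold; if so, detecting that the call state is a double-talk state; otherwise, detecting that the call state is a single-talk state.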
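6. A double-talk detection apparatus, characterized by comprising:
an obtaining module, configured to acquire a far-end signal frame and a near-end signal frame;
a spectrum detection module, configured to detect a call state according to a spectral difference between the far-end signal frame and the near-end signal frame.
7. The double-talk detection apparatus according to claim 6, characterized in that the obtaining module comprises:
an input module, configured to obtain a far-end signal input frame and a near-end signal input frame;
a buffer module, configured to delay the far-end signal input frame by a specified time to obtain the far-end signal frame;
an output module, configured to use the near-end signal input frame as the near-end signal frame.
8. The double-talk detection apparatus according to claim 6 or 7, characterized in that the spectrum detection module comprises:
a first calculation module, configured to calculate a first short-term average zero-crossing rate of the far-end signal frame and a second short-term average zero-crossing rate of the near-end signal frame;
a second calculation module, configured to calculate a difference between the first short-term average zero-crossing rate and the second short-term average zero-crossing rate;
a difference detection module, configured to detect the call state according to the difference.
9. The double-talk detection apparatus according to claim 8, characterized in that the spectrum detection module further comprises:
a filtering module, configured to perform smoothing filtering on the first short-term average zero-crossing rate and the second short-term average zero-crossing rate.
10. The double-talk detection apparatus according to claim 8, characterized in that the difference detection module comprises:
a determining unit, configured to determine whether the difference is smaller than a spectrum difference threshold;
a detecting unit, configured to detect that the call state is a double-talk state when the determining unit determines that the difference is smaller than the spectrum difference threshold, and to detect that the call state is a single-talk state when the determining unit determines that the difference is greater than or equal to the spectrum difference threshold.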
PCT/CN2009/070226 2009-01-20 2009-01-20 Double-talk detection method and apparatus Ceased WO2010083641A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
PCT/CN2009/070226 WO2010083641A1 (zh) 2009-01-20 2009-01-20 Double-talk detection method and apparatus
CN200980142133.XA CN102160296B (zh) 2009-01-20 2009-01-20 Double-talk detection method and apparatus
EP09838608.9A EP2348645B1 (en) 2009-01-20 2009-01-20 Method and apparatus for detecting double talk
US12/577,410 US8160238B2 (en) 2009-01-20 2009-10-12 Method and apparatus for double-talk detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2009/070226 WO2010083641A1 (zh) 2009-01-20 2009-01-20 Double-talk detection method and apparatus

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US12/577,410 Continuation US8160238B2 (en) 2009-01-20 2009-10-12 Method and apparatus for double-talk detection

Publications (1)

Publication Number Publication Date
WO2010083641A1 (zh) 2010-07-29

Family

ID=42336962

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2009/070226 Ceased WO2010083641A1 (zh) 2009-01-20 2009-01-20 Double-talk detection method and apparatus

Country Status (4)

Country Link
US (1) US8160238B2 (zh)
EP (1) EP2348645B1 (zh)
CN (1) CN102160296B (zh)
WO (1) WO2010083641A1 (zh)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103337242A (zh) * 2013-05-29 2013-10-02 华为技术有限公司 一种语音控制方法和控制设备
CN104010100A (zh) * 2014-05-08 2014-08-27 深圳市汇川技术股份有限公司 VoIP通信中的回声消除系统及方法
CN106601269A (zh) * 2016-12-28 2017-04-26 北京小米移动软件有限公司 终端状态确定方法及装置
CN106683683A (zh) * 2016-12-28 2017-05-17 北京小米移动软件有限公司 终端状态确定方法及装置
CN106791245A (zh) * 2016-12-28 2017-05-31 北京小米移动软件有限公司 确定滤波器系数的方法及装置
CN107635082A (zh) * 2016-07-18 2018-01-26 深圳市有信网络技术有限公司 一种双端发声端检测系统
CN107786755A (zh) * 2016-08-30 2018-03-09 合肥君正科技有限公司 一种双端通话检测方法和装置
CN108540680A (zh) * 2018-02-02 2018-09-14 广州视源电子科技股份有限公司 讲话状态的切换方法及装置、通话系统
CN110995951A (zh) * 2019-12-13 2020-04-10 展讯通信(上海)有限公司 基于双端发声检测的回声消除方法、装置及系统
CN111294474A (zh) * 2020-02-13 2020-06-16 杭州国芯科技股份有限公司 一种双端通话检测方法

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102137194B (zh) * 2010-01-21 2014-01-01 华为终端有限公司 一种通话检测方法及装置
US9083783B2 (en) * 2012-11-29 2015-07-14 Texas Instruments Incorporated Detecting double talk in acoustic echo cancellation using zero-crossing rate
CN106601227A (zh) * 2016-11-18 2017-04-26 北京金锐德路科技有限公司 音频采集方法和装置
KR20180082033A (ko) * 2017-01-09 2018-07-18 삼성전자주식회사 음성을 인식하는 전자 장치
US10388298B1 (en) * 2017-05-03 2019-08-20 Amazon Technologies, Inc. Methods for detecting double talk
CN107770683B (zh) * 2017-10-12 2019-10-11 北京小鱼在家科技有限公司 一种回声场景下音频采集状态的检测方法及装置
CN110310653A (zh) * 2019-07-09 2019-10-08 杭州国芯科技股份有限公司 一种回声消除方法
US12531047B2 (en) 2023-05-09 2026-01-20 Nokia Technologies Oy Acoustic echo cancellation

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5732134A (en) * 1994-02-28 1998-03-24 Qualcomm Incorporated Doubletalk detection by means of spectral content
CN1195932A (zh) * 1997-04-02 1998-10-14 美国电报电话公司 通信系统中的实时回声检测、跟踪、对消以及噪声填充
CN1494229A (zh) * 2002-10-30 2004-05-05 冲电气工业株式会社 带有回声路径变化探测器的回声消除器
US20080298601A1 (en) * 2007-05-31 2008-12-04 Zarlink Semiconductor Inc. Double Talk Detection Method Based On Spectral Acoustic Properties

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7423983B1 (en) * 1999-09-20 2008-09-09 Broadcom Corporation Voice and data exchange over a packet based network
US6738358B2 (en) * 2000-09-09 2004-05-18 Intel Corporation Network echo canceller for integrated telecommunications processing
US7266287B2 (en) * 2001-12-14 2007-09-04 Hewlett-Packard Development Company, L.P. Using background audio change detection for segmenting video
US8335311B2 (en) * 2005-07-28 2012-12-18 Kabushiki Kaisha Toshiba Communication apparatus capable of echo cancellation

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103337242B (zh) * 2013-05-29 2016-04-13 华为技术有限公司 一种语音控制方法和控制设备
CN103337242A (zh) * 2013-05-29 2013-10-02 华为技术有限公司 一种语音控制方法和控制设备
CN104010100A (zh) * 2014-05-08 2014-08-27 深圳市汇川技术股份有限公司 VoIP通信中的回声消除系统及方法
CN107635082A (zh) * 2016-07-18 2018-01-26 深圳市有信网络技术有限公司 一种双端发声端检测系统
CN107786755A (zh) * 2016-08-30 2018-03-09 合肥君正科技有限公司 一种双端通话检测方法和装置
CN106601269A (zh) * 2016-12-28 2017-04-26 北京小米移动软件有限公司 终端状态确定方法及装置
CN106791245A (zh) * 2016-12-28 2017-05-31 北京小米移动软件有限公司 确定滤波器系数的方法及装置
CN106683683A (zh) * 2016-12-28 2017-05-17 北京小米移动软件有限公司 终端状态确定方法及装置
CN106791245B (zh) * 2016-12-28 2021-07-06 北京小米移动软件有限公司 确定滤波器系数的方法及装置
CN108540680A (zh) * 2018-02-02 2018-09-14 广州视源电子科技股份有限公司 讲话状态的切换方法及装置、通话系统
CN110995951A (zh) * 2019-12-13 2020-04-10 展讯通信(上海)有限公司 基于双端发声检测的回声消除方法、装置及系统
CN110995951B (zh) * 2019-12-13 2021-09-03 展讯通信(上海)有限公司 基于双端发声检测的回声消除方法、装置及系统
CN111294474A (zh) * 2020-02-13 2020-06-16 杭州国芯科技股份有限公司 一种双端通话检测方法
CN111294474B (zh) * 2020-02-13 2021-04-16 杭州国芯科技股份有限公司 一种双端通话检测方法

Also Published As

Publication number Publication date
EP2348645A4 (en) 2013-01-02
EP2348645B1 (en) 2018-04-11
US8160238B2 (en) 2012-04-17
US20100183140A1 (en) 2010-07-22
EP2348645A1 (en) 2011-07-27
CN102160296B (zh) 2014-01-22
CN102160296A (zh) 2011-08-17

Similar Documents

Publication Publication Date Title
WO2010083641A1 (zh) Double-talk detection method and apparatus
US6792107B2 (en) Double-talk detector suitable for a telephone-enabled PC
CN105825864B (zh) 基于过零率指标的双端说话检测与回声消除方法
US8098813B2 (en) Communication system
TWI392322B (zh) 基於頻譜聲學特性之雙邊發話檢測方法
CN109348072B (zh) 一种应用于回声抵消系统的双端通话检测方法
CN110995951B (zh) 基于双端发声检测的回声消除方法、装置及系统
CN111742541B (zh) 声学回波抵消方法、装置、存储介质
CN103369162B (zh) 一种低复杂度的电话回声自适应消除方法
CN102137194B (zh) 一种通话检测方法及装置
CN104994249B (zh) 声回波消除方法和装置
US8824667B2 (en) Time-domain acoustic echo control
CN111916099A (zh) 一种变步长助听器自适应回声消除装置及回声消除方法
CN111355855B (zh) 回声处理方法、装置、设备及存储介质
CN111970610A (zh) 回声路径检测方法、音频信号处理方法及系统、存储介质、终端
TR201815047T4 (tr) Bir uzak uç konuşmacı sinyali ve bir birleşik sinyal arasındaki akustik bir bağlamanın belirlenmesi.
CN1917386B (zh) 一种回波抵消中双讲状态的检测方法
WO2021190274A1 (zh) 回声声场状态确定方法及装置、存储介质、终端
CN102300014A (zh) 一种适用于有噪声环境下的声回声抵消系统双端说话检测方法
WO2017012350A1 (zh) 一种判断滤波器状态发散的方法及装置
CN119363880B (zh) 一种语音通话状态检测方法、系统
CN111970410B (zh) 回声消除方法及装置、存储介质、终端
CN111294474B (zh) 一种双端通话检测方法
CN115643342B (zh) 一种回声消除方法
CN101286763B (zh) 一种有效的回波抑制器

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 200980142133.X

Country of ref document: CN

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 09838608

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2009838608

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: DE