WO2015135344A1 - Method and apparatus for detecting an audio signal - Google Patents
Method and apparatus for detecting an audio signal
- Publication number
- WO2015135344A1 (application PCT/CN2014/092694)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- audio signal
- determined
- ssnr
- snr
- signal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/10—Speech classification or search using distance or distortion measures between unknown speech and reference templates
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/20—Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/93—Discriminating between voiced and unvoiced parts of speech signals
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
- G10L2025/783—Detection of presence or absence of voice signals based on threshold decision
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
Definitions
- Embodiments of the present invention relate to the field of signal processing techniques and, more particularly, to methods and apparatus for detecting audio signals.
- VAD Voice Activity Detection
- SAD Sound Activity Detection
- Typical activity signals include voice, music, and the like.
- the principle of VAD is to extract one or more feature parameters from the input audio signal, determine one or more feature values according to the one or more feature parameters, and then compare the one or more feature values with one or more thresholds.
- a prior-art active signal detection method based on the segmental signal-to-noise ratio (SSNR) divides an input audio signal by frequency band into a plurality of sub-band signals and calculates the energy of the audio signal in each sub-band.
- the energy of the audio signal in each sub-band is compared with the energy of an estimated background noise signal in that sub-band to obtain the signal-to-noise ratio (SNR) of the audio signal on each sub-band.
- the SSNR is then determined from the sub-band SNRs and compared with a preset VAD decision threshold: if the SSNR exceeds the VAD decision threshold, the audio signal is an active signal; if the SSNR does not exceed the VAD decision threshold, the audio signal is an inactive signal.
- a typical way to calculate the SSNR is to add up all the sub-band SNRs of the audio signal; the result is the SSNR.
- the SSNR can be determined using Equation 1.1:

$$SSNR = \sum_{k=0}^{N-1} snr(k) \qquad (1.1)$$

- k denotes the kth sub-band
- snr(k) denotes the sub-band SNR of the kth sub-band
- N denotes the total number of sub-bands into which the audio signal is divided.
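The prior-art decision described above can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the function names `reference_ssnr` and `is_active`, the sample SNR values, and the threshold are all hypothetical.

```python
from typing import Sequence


def reference_ssnr(subband_snr: Sequence[float]) -> float:
    """Equation 1.1: unweighted sum of the sub-band SNRs."""
    return float(sum(subband_snr))


def is_active(ssnr: float, vad_threshold: float) -> bool:
    """Prior-art decision: active only if the SSNR exceeds the threshold."""
    return ssnr > vad_threshold


# Illustrative values only: four sub-bands and an arbitrary threshold.
snrs = [0.5, 1.5, 3.0, 2.0]
ssnr = reference_ssnr(snrs)
print(ssnr, is_active(ssnr, vad_threshold=6.0))
```

Because every sub-band carries the same weight, a frame whose energy sits in only a few sub-bands can fall below the threshold even when it is active.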
- as a result, missed detection of active speech may occur.
- Embodiments of the present invention provide a method and apparatus for detecting an audio signal capable of accurately distinguishing between active speech and inactive speech.
- an embodiment of the present invention provides a method for detecting an audio signal, the method comprising: determining that an input audio signal is a to-be-determined audio signal; determining an enhanced segmental signal-to-noise ratio (SSNR) of the audio signal, where the enhanced SSNR is greater than a reference SSNR; and comparing the enhanced SSNR with a voice activity detection (VAD) decision threshold to determine whether the audio signal is an active signal.
- SSNR Segmental Signal-to-Noise Ratio
- determining that the input audio signal is a to-be-determined audio signal includes: determining, according to the sub-band SNRs of the audio signal, that the audio signal is a to-be-determined audio signal.
- determining that the input audio signal is a to-be-determined audio signal includes: when the number of high-frequency-end sub-bands in the audio signal whose sub-band SNR is greater than the first preset threshold is greater than the first number, determining that the audio signal is a to-be-determined audio signal.
- determining that the input audio signal is a to-be-determined audio signal includes: when the number of high-frequency-end sub-bands in the audio signal whose sub-band SNR is greater than the first preset threshold is greater than the second number, and the number of low-frequency-end sub-bands in the audio signal whose sub-band SNR is less than the second preset threshold is greater than the third number, determining that the audio signal is a to-be-determined audio signal.
- determining that the input audio signal is a to-be-determined audio signal includes: when the number of sub-bands in the audio signal whose sub-band SNR is greater than the third preset threshold is greater than the fourth number, determining that the audio signal is a to-be-determined audio signal.
- determining that the input audio signal is a to-be-determined audio signal includes: when the audio signal is determined to be an unvoiced signal, determining that the audio signal is a to-be-determined audio signal.
- determining the enhanced SSNR of the audio signal includes: determining a weight for the sub-band SNR of each sub-band in the audio signal, wherein the weight of the sub-band SNR of a high-frequency-end sub-band whose sub-band SNR is greater than the first preset threshold is greater than the weights of the sub-band SNRs of the other sub-bands; and determining the enhanced SSNR according to the weight of the sub-band SNR of each sub-band in the audio signal and the sub-band SNR of each sub-band.
- determining the enhanced SSNR of the audio signal includes: determining a reference SSNR of the audio signal; and determining the enhanced SSNR according to the reference SSNR of the audio signal.
- before the enhanced SSNR is compared with the VAD decision threshold, the method further includes: reducing the VAD decision threshold by using a preset algorithm to obtain a reduced VAD decision threshold.
- comparing the enhanced SSNR with the VAD decision threshold to determine whether the audio signal is an active signal specifically includes: comparing the enhanced SSNR with the reduced VAD decision threshold to determine whether the audio signal is an active signal.
- an embodiment of the present invention provides a method for detecting an audio signal, the method comprising: determining that an input audio signal is a to-be-determined audio signal; determining a weight for the sub-band SNR of each sub-band in the audio signal, wherein the weight of the sub-band SNR of a high-frequency-end sub-band whose sub-band SNR is greater than the first preset threshold is greater than the weights of the sub-band SNRs of the other sub-bands; determining an enhanced SSNR according to the weight of the sub-band SNR of each sub-band in the audio signal and the sub-band SNR of each sub-band, wherein the enhanced SSNR is greater than a reference SSNR; and comparing the enhanced SSNR with a VAD decision threshold to determine whether the audio signal is an active signal.
- determining that the input audio signal is a to-be-determined audio signal includes: determining, according to the sub-band SNRs of the audio signal, that the audio signal is a to-be-determined audio signal.
- determining that the input audio signal is a to-be-determined audio signal includes: when the number of high-frequency-end sub-bands in the audio signal whose sub-band SNR is greater than the first preset threshold is greater than the first number, determining that the audio signal is a to-be-determined audio signal.
- determining that the input audio signal is a to-be-determined audio signal includes: when the number of high-frequency-end sub-bands in the audio signal whose sub-band SNR is greater than the first preset threshold is greater than the second number, and the number of low-frequency-end sub-bands in the audio signal whose sub-band SNR is less than the second preset threshold is greater than the third number, determining that the audio signal is a to-be-determined audio signal.
- an embodiment of the present invention provides a method for detecting an audio signal, the method comprising: determining that an input audio signal is a to-be-determined audio signal; acquiring a reference segmental signal-to-noise ratio (SSNR) of the audio signal; reducing a reference VAD decision threshold by using a preset algorithm to obtain a reduced VAD decision threshold; and comparing the reference SSNR with the reduced VAD decision threshold to determine whether the audio signal is an active signal.
- determining that the input audio signal is a to-be-determined audio signal includes: determining, according to the sub-band SNRs of the audio signal, that the audio signal is a to-be-determined audio signal.
- determining that the input audio signal is a to-be-determined audio signal includes: when the number of high-frequency-end sub-bands in the audio signal whose sub-band SNR is greater than the first preset threshold is greater than the first number, determining that the audio signal is a to-be-determined audio signal.
- determining that the input audio signal is a to-be-determined audio signal includes: when the number of high-frequency-end sub-bands in the audio signal whose sub-band SNR is greater than the first preset threshold is greater than the second number, and the number of low-frequency-end sub-bands in the audio signal whose sub-band SNR is less than the second preset threshold is greater than the third number, determining that the audio signal is a to-be-determined audio signal.
- determining that the input audio signal is a to-be-determined audio signal includes: when the number of sub-bands in the audio signal whose sub-band SNR is greater than the third preset threshold is greater than the fourth number, determining that the audio signal is a to-be-determined audio signal.
- determining that the input audio signal is a to-be-determined audio signal includes: when the audio signal is determined to be an unvoiced signal, determining that the audio signal is a to-be-determined audio signal.
- an embodiment of the present invention provides an apparatus, where the apparatus includes: a first determining unit, configured to determine that an input audio signal is a to-be-determined audio signal; a second determining unit, configured to determine an enhanced segmental signal-to-noise ratio (SSNR) of the audio signal, wherein the enhanced SSNR is greater than a reference SSNR; and a third determining unit, configured to compare the enhanced SSNR with a VAD decision threshold to determine whether the audio signal is an active signal.
- the first determining unit is specifically configured to determine, according to the sub-band SNRs of the audio signal, that the audio signal is a to-be-determined audio signal.
- the first determining unit is specifically configured to: when the number of high-frequency-end sub-bands in the audio signal whose sub-band SNR is greater than the first preset threshold is greater than the first number, determine that the audio signal is a to-be-determined audio signal.
- the first determining unit is specifically configured to: when the number of high-frequency-end sub-bands in the audio signal whose sub-band SNR is greater than the first preset threshold is greater than the second number, and the number of low-frequency-end sub-bands in the audio signal whose sub-band SNR is less than the second preset threshold is greater than the third number, determine that the audio signal is a to-be-determined audio signal.
- the first determining unit is specifically configured to: when the number of sub-bands in the audio signal whose sub-band SNR is greater than the third preset threshold is greater than the fourth number, determine that the audio signal is a to-be-determined audio signal.
- the first determining unit is specifically configured to: when the audio signal is determined to be an unvoiced signal, determine that the audio signal is a to-be-determined audio signal.
- the second determining unit is specifically configured to: determine a weight for the sub-band SNR of each sub-band in the audio signal, wherein the weight of the sub-band SNR of a high-frequency-end sub-band whose sub-band SNR is greater than the first preset threshold is greater than the weights of the sub-band SNRs of the other sub-bands; and determine the enhanced SSNR according to the weight of the sub-band SNR of each sub-band in the audio signal and the sub-band SNR of each sub-band.
- the second determining unit is specifically configured to determine a reference SSNR of the audio signal and determine the enhanced SSNR according to the reference SSNR of the audio signal.
- the apparatus further includes a fourth determining unit, configured to reduce the VAD decision threshold by using a preset algorithm to obtain a reduced VAD decision threshold.
- the third determining unit is specifically configured to compare the enhanced SSNR with the reduced VAD decision threshold to determine whether the audio signal is an active signal.
- an embodiment of the present invention provides an apparatus, where the apparatus includes: a first determining unit, configured to determine that an input audio signal is a to-be-determined audio signal; a second determining unit, configured to determine a weight for the sub-band SNR of each sub-band in the audio signal, wherein the weight of the sub-band SNR of a high-frequency-end sub-band whose sub-band SNR is greater than the first preset threshold is greater than the weights of the sub-band SNRs of the other sub-bands, and to determine an enhanced SSNR according to the weight of the sub-band SNR of each sub-band in the audio signal and the sub-band SNR of each sub-band, wherein the enhanced SSNR is greater than a reference SSNR; and a third determining unit, configured to compare the enhanced SSNR with a VAD decision threshold to determine whether the audio signal is an active signal.
- the first determining unit is specifically configured to determine, according to the sub-band SNRs of the audio signal, that the audio signal is a to-be-determined audio signal.
- the first determining unit is specifically configured to: when the number of high-frequency-end sub-bands in the audio signal whose sub-band SNR is greater than the first preset threshold is greater than the first number, determine that the audio signal is a to-be-determined audio signal.
- the first determining unit is specifically configured to: when the number of high-frequency-end sub-bands in the audio signal whose sub-band SNR is greater than the first preset threshold is greater than the second number, and the number of low-frequency-end sub-bands in the audio signal whose sub-band SNR is less than the second preset threshold is greater than the third number, determine that the audio signal is a to-be-determined audio signal.
- an embodiment of the present invention provides an apparatus, where the apparatus includes: a first determining unit, configured to determine that an input audio signal is a to-be-determined audio signal; a second determining unit, configured to acquire a reference segmental signal-to-noise ratio (SSNR) of the audio signal; a third determining unit, configured to reduce a reference VAD decision threshold by using a preset algorithm to obtain a reduced VAD decision threshold; and a fourth determining unit, configured to compare the reference SSNR with the reduced VAD decision threshold to determine whether the audio signal is an active signal.
- the first determining unit is specifically configured to determine, according to the sub-band SNRs of the audio signal, that the audio signal is a to-be-determined audio signal.
- the first determining unit is specifically configured to: when the number of high-frequency-end sub-bands in the audio signal whose sub-band SNR is greater than the first preset threshold is greater than the first number, determine that the audio signal is a to-be-determined audio signal.
- the first determining unit is specifically configured to: when the number of high-frequency-end sub-bands in the audio signal whose sub-band SNR is greater than the first preset threshold is greater than the second number, and the number of low-frequency-end sub-bands in the audio signal whose sub-band SNR is less than the second preset threshold is greater than the third number, determine that the audio signal is a to-be-determined audio signal.
- the first determining unit is specifically configured to: when the number of sub-bands in the audio signal whose sub-band SNR is greater than the third preset threshold is greater than the fourth number, determine that the audio signal is a to-be-determined audio signal.
- the first determining unit is specifically configured to: when the audio signal is determined to be an unvoiced signal, determine that the audio signal is a to-be-determined audio signal.
- the characteristics of the audio signal may be determined, the enhanced SSNR is determined in a manner corresponding to those characteristics, and the enhanced SSNR is compared with the VAD decision threshold, so that the rate of missed detection of active signals can be reduced.
- FIG. 1 is a schematic flowchart of a method for detecting an audio signal according to an embodiment of the present invention.
- FIG. 2 is a schematic flowchart of a method for detecting an audio signal according to an embodiment of the present invention.
- FIG. 3 is a schematic flowchart of a method for detecting an audio signal according to an embodiment of the present invention.
- FIG. 4 is a schematic flowchart of a method for detecting an audio signal according to an embodiment of the present invention.
- FIG. 5 is a structural block diagram of an apparatus according to an embodiment of the present invention.
- FIG. 6 is a structural block diagram of another apparatus according to an embodiment of the present invention.
- FIG. 7 is a structural block diagram of an apparatus according to an embodiment of the present invention.
- FIG. 8 is a structural block diagram of another apparatus according to an embodiment of the present invention.
- FIG. 9 is a structural block diagram of another apparatus according to an embodiment of the present invention.
- FIG. 10 is a structural block diagram of another apparatus according to an embodiment of the present invention.
- FIG. 1 is a schematic flowchart of a method for detecting an audio signal according to an embodiment of the present invention.
- when the enhanced SSNR is compared with the VAD decision threshold, either the reference VAD decision threshold may be used, or a reduced VAD decision threshold obtained by reducing the reference VAD decision threshold with a preset algorithm may be used.
- the reference VAD decision threshold may be a default VAD decision threshold.
- the reference VAD decision threshold may be pre-stored or calculated on the fly. The reference VAD decision threshold may be calculated using existing techniques.
- the preset algorithm may be to multiply the reference VAD decision threshold by a coefficient less than one; other algorithms may also be used.
- the embodiment of the present invention does not limit the specific algorithm used.
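One way the threshold reduction could look is sketched below. The function name `reduce_vad_threshold` and the coefficient 0.9 are hypothetical. The sketch also assumes a positive threshold, since multiplying a negative threshold by a coefficient below one would raise it rather than lower it.

```python
def reduce_vad_threshold(reference_threshold: float, coefficient: float = 0.9) -> float:
    """Scale a positive reference VAD threshold by a coefficient in (0, 1).

    The default coefficient is an illustrative assumption, not a value
    taken from the text.
    """
    if reference_threshold <= 0.0:
        raise ValueError("this sketch assumes a positive reference threshold")
    if not 0.0 < coefficient < 1.0:
        raise ValueError("coefficient must lie strictly between 0 and 1")
    return reference_threshold * coefficient


reduced = reduce_vad_threshold(10.0)
print(reduced < 10.0)
```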
- the SSNR of some audio signals may be lower than the preset VAD decision threshold,
- even though these audio signals are in fact active audio signals. This is due to the characteristics of these audio signals.
- the sub-band SNR of the high frequency portion is significantly reduced.
- the sub-band SNR of the high-frequency portion contributes less to the SSNR.
- the SSNR calculated by the conventional SSNR calculation method may be lower than the VAD decision threshold, which causes the missed detection of the active signal.
- the energy of the audio signal is relatively flat on the spectrum, but the overall energy of the audio signal is low.
- the SSNR calculated using the conventional SSNR calculation method may also be lower than the VAD decision threshold. The method shown in FIG. 1 can effectively reduce the rate of missed detection of active signals by appropriately increasing the SSNR so that it can exceed the VAD decision threshold.
- FIG. 2 is a schematic flowchart of a method for detecting an audio signal according to an embodiment of the present invention.
- the spectrum of the input audio signal is divided into N subbands, where N is a positive integer greater than one.
- the spectrum of the audio signal can be divided according to psychoacoustic theory.
- in that case, sub-bands closer to the low-frequency end are narrower, and sub-bands closer to the high-frequency end are wider.
- the spectrum of the audio signal may also be divided in other ways, for example equally into N sub-bands.
- a subband SNR is calculated for each subband of the input audio signal, wherein the subband SNR is the ratio of the energy of the subband to the energy of the background noise on the subband.
- the sub-band energy of the background noise is generally an estimated value obtained by a background noise estimator. How to estimate the background noise energy corresponding to each sub-band by using the background noise estimator is well known in the art and is not described in detail here.
- the sub-band SNR may be a direct energy ratio or another representation of the direct energy ratio, such as a logarithmic sub-band SNR.
- the sub-band SNR may also be obtained by applying linear or nonlinear processing to the direct sub-band SNR, or be another variant. The direct-energy-ratio form of the sub-band SNR is:

$$snr(k) = \frac{E(k)}{E_n(k)}$$

- snr(k) represents the sub-band SNR of the kth sub-band
- E(k) and E_n(k) represent the energy of the kth sub-band and the energy of the background noise on the kth sub-band, respectively.
- the sub-band energy used to calculate the sub-band SNR can be either the energy of the input audio signal on the sub-band, or that energy after the background noise energy on the sub-band has been removed.
- any way of calculating the SNR may be used as long as it does not depart from the meaning of SNR.
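A minimal sketch of the sub-band SNR calculation follows, covering the direct energy ratio and, as one possible alternative representation, a decibel form. The function name `subband_snr`, the `log_domain` flag, the decibel scaling, and the `eps` guard against division by zero are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def subband_snr(
    signal_energy: Sequence[float],
    noise_energy: Sequence[float],
    log_domain: bool = False,
    eps: float = 1e-12,
) -> List[float]:
    """Per-sub-band SNR as E(k) / En(k), optionally expressed in dB."""
    if len(signal_energy) != len(noise_energy):
        raise ValueError("energy vectors must have the same length")
    result = []
    for e, en in zip(signal_energy, noise_energy):
        ratio = e / max(en, eps)  # guard against a zero noise estimate
        if log_domain:
            result.append(10.0 * math.log10(max(ratio, eps)))
        else:
            result.append(ratio)
    return result


print(subband_snr([4.0, 9.0], [2.0, 3.0]))
```

The noise energies would come from a background noise estimator, which is outside the scope of this sketch.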
- determining that the input audio signal is a to-be-determined audio signal includes: determining, according to the sub-band SNRs of the audio signal determined in step 201, that the audio signal is a to-be-determined audio signal.
- determining that the input audio signal is a to-be-determined audio signal includes: when the number of high-frequency-end sub-bands in the audio signal whose sub-band SNR is greater than the first preset threshold is greater than the first number, determining that the audio signal is a to-be-determined audio signal.
- determining that the input audio signal is a to-be-determined audio signal includes: when the number of high-frequency-end sub-bands in the audio signal whose sub-band SNR is greater than the first preset threshold is greater than the second number, and the number of low-frequency-end sub-bands in the audio signal whose sub-band SNR is less than the second preset threshold is greater than the third number, determining that the audio signal is a to-be-determined audio signal.
- the high-frequency end and the low-frequency end of a frame of the audio signal are relative terms: the portion with relatively high frequencies is the high-frequency end, and the portion with relatively low frequencies is the low-frequency end.
- determining that the input audio signal is a to-be-determined audio signal includes: when the number of sub-bands in the audio signal whose sub-band SNR is greater than the third preset threshold is greater than the fourth number, determining that the audio signal is a to-be-determined audio signal.
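The three SNR-based criteria above can be sketched as separate predicates. This is an illustration under stated assumptions: the `split` index separating the low-frequency end from the high-frequency end, the function names, and all numeric values are hypothetical, and the text leaves the actual boundary and thresholds to statistics.

```python
from typing import Sequence


def count_above(snr: Sequence[float], threshold: float) -> int:
    return sum(1 for v in snr if v > threshold)


def count_below(snr: Sequence[float], threshold: float) -> int:
    return sum(1 for v in snr if v < threshold)


def criterion_high(snr: Sequence[float], split: int, t1: float, n1: int) -> bool:
    """More than n1 high-frequency-end sub-bands have SNR above t1."""
    return count_above(snr[split:], t1) > n1


def criterion_high_low(
    snr: Sequence[float], split: int, t1: float, t2: float, n2: int, n3: int
) -> bool:
    """More than n2 high-end sub-bands above t1 and more than n3 low-end sub-bands below t2."""
    return count_above(snr[split:], t1) > n2 and count_below(snr[:split], t2) > n3


def criterion_overall(snr: Sequence[float], t3: float, n4: int) -> bool:
    """More than n4 sub-bands, over the whole spectrum, have SNR above t3."""
    return count_above(snr, t3) > n4


# Illustrative frame: weak low-frequency end, strong high-frequency end.
frame = [0.2, 0.3, 0.1, 0.4, 3.0, 4.0, 5.0, 2.5]
print(criterion_high(frame, split=4, t1=2.0, n1=2))
print(criterion_high_low(frame, split=4, t1=2.0, t2=0.5, n2=2, n3=3))
print(criterion_overall(frame, t3=1.0, n4=5))
```

Keeping the criteria separate mirrors the text, which presents them as alternative embodiments rather than as one combined rule.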
- the first preset threshold and the second preset threshold may be obtained from statistics over a large number of speech samples. Specifically, the sub-band SNRs of the high-frequency-end sub-bands are collected over a large number of unvoiced speech samples containing background noise, and the first preset threshold is determined from them such that the sub-band SNRs of most of the high-frequency-end sub-bands in these unvoiced samples are greater than the threshold. Similarly, the sub-band SNRs of the low-frequency-end sub-bands are collected over these unvoiced speech samples, and the second preset threshold is determined from them such that the sub-band SNRs of most of the low-frequency-end sub-bands in these unvoiced samples are less than the threshold.
- the third preset threshold is also obtained from statistics. Specifically, the third preset threshold is determined from the sub-band SNRs of a large number of noise signals such that the sub-band SNRs of most of the sub-bands of these noise signals are less than the threshold.
- the first number, the second number, the third number, and the fourth number are also obtained from statistics.
- taking the first number as an example: over a large number of noise-containing unvoiced speech sample frames, the number of high-frequency-end sub-bands whose sub-band SNR is greater than the first preset threshold is counted, and the first number is determined from these counts such that, in most of these unvoiced sample frames, the number of high-frequency-end sub-bands whose sub-band SNR is greater than the first preset threshold is greater than the first number.
- the method of obtaining the second number is similar to the method of obtaining the first number.
- the second number may be the same as the first number or different from it.
- similarly, for the third number, the number of low-frequency-end sub-bands whose sub-band SNR is less than the second preset threshold is counted over the unvoiced speech sample frames, and the third number is determined from these counts such that, in most of these unvoiced sample frames, the number of low-frequency-end sub-bands whose sub-band SNR is less than the second preset threshold is greater than the third number.
- for the fourth number, the number of sub-bands whose sub-band SNR is less than the third preset threshold is counted over a large number of noise sample frames, and the fourth number is determined from these counts such that, in most of these noise sample frames, the number of sub-bands whose sub-band SNR is less than the third preset threshold is greater than the fourth number.
- whether the input audio signal is a to-be-determined audio signal may also be determined by determining whether the input audio signal is an unvoiced signal. In this case, the sub-band SNRs of the audio signal are not needed for this determination; in other words, step 201 does not need to be performed when determining whether the audio signal is a to-be-determined audio signal. Specifically, determining that the input audio signal is a to-be-determined audio signal includes: when the audio signal is determined to be an unvoiced signal, determining that the audio signal is a to-be-determined audio signal. Those skilled in the art will appreciate that there are a variety of methods for detecting whether an audio signal is an unvoiced signal.
- for example, whether the audio signal is an unvoiced signal may be determined by detecting the Zero-Crossing Rate (ZCR) of the audio signal. Specifically, when the ZCR of the audio signal is greater than a ZCR threshold, the audio signal is determined to be an unvoiced signal, where the ZCR threshold is determined through a large number of experiments.
- ZCR Zero-Crossing Rate
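A simple zero-crossing-rate check along these lines might look as follows. The function names and the threshold of 0.5 are hypothetical. The text only says that the ZCR threshold is found experimentally, and treating a sample of exactly zero as non-negative is a convention chosen for this sketch.

```python
from typing import Sequence


def zero_crossing_rate(samples: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(samples) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)


def is_unvoiced(samples: Sequence[float], zcr_threshold: float) -> bool:
    """Unvoiced if the ZCR exceeds the (experimentally chosen) threshold."""
    return zero_crossing_rate(samples) > zcr_threshold


noisy_frame = [1.0, -1.0, 1.0, -1.0, 1.0]
print(is_unvoiced(noisy_frame, zcr_threshold=0.5))
```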
- the reference SSNR can be the SSNR calculated using Equation 1.1. As can be seen from Equation 1.1, when calculating the reference SSNR, the subband SNR of any subband is not weighted, that is, the weight of the subband SNR of each subband is the same when calculating the reference SSNR.
- determining the enhanced SSNR includes: determining a weight for the sub-band SNR of each sub-band in the audio signal, where the weight of the sub-band SNR of a high-frequency-end sub-band whose sub-band SNR is greater than the first preset threshold is greater than the weights of the sub-band SNRs of the other sub-bands; and determining the enhanced SSNR according to the weight of the sub-band SNR of each sub-band in the audio signal and the sub-band SNR of each sub-band.
- the audio signal is divided into 20 sub-bands according to psychoacoustic theory, that is, sub-band 0 to sub-band 19.
- when the sub-band SNRs of sub-band 18 and sub-band 19 are both greater than the first preset threshold T1,
- four sub-bands, namely sub-band 20 to sub-band 23, may be added.
- specifically, sub-band 18, whose SNR is greater than T1, may be divided into sub-band 18a, sub-band 18b, and sub-band 18c, and sub-band 19 may be divided into sub-band 19a, sub-band 19b, and sub-band 19c.
- sub-band 18 can then be regarded as the parent sub-band of sub-band 18a, sub-band 18b, and sub-band 18c,
- and sub-band 19 can be regarded as the parent sub-band of sub-band 19a, sub-band 19b, and sub-band 19c.
- the SNR values of sub-band 18a, sub-band 18b, and sub-band 18c are the same as the SNR value of their parent sub-band, and the SNR values of sub-band 19a, sub-band 19b, and sub-band 19c are the same as the SNR value of their parent sub-band.
- in this way, the 20 sub-bands of the original division are re-divided into 24 sub-bands. Since the VAD is still designed for 20 sub-bands when performing active signal detection, the 24 sub-bands need to be mapped back to 20 sub-bands to determine the enhanced SSNR.
- when the enhanced SSNR is determined by increasing the number of high-frequency-end subbands whose subband SNR is greater than the first preset threshold, the following formula may be used for calculation:
- SSNR' represents the enhanced SSNR.
- snr(k) represents the subband SNR of the k-th subband.
- for the first type of audio signal, the value of the enhanced SSNR calculated using Equation 1.3 is greater than the value of the reference SSNR calculated using Equation 1.1.
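The sub-band splitting described above can be sketched in Python. This is a minimal illustration, not the patent's Equation 1.3: it assumes the reference SSNR is the plain sum of the subband SNRs, and that mapping 24 sub-bands back to 20 is done by scaling the sum by 20/24. The function names and the sample frame are hypothetical.

```python
def reference_ssnr(snr):
    """Unweighted SSNR: every subband SNR has the same weight (assumed form)."""
    return sum(snr)


def enhanced_ssnr_by_splitting(snr, split_bands=(18, 19), parts=3):
    """Split each band in split_bands into `parts` children that inherit the
    parent's SNR, then rescale the sum back to the original band count."""
    n_orig = len(snr)
    expanded = []
    for k, value in enumerate(snr):
        copies = parts if k in split_bands else 1
        expanded.extend([value] * copies)
    # Assumed mapping from the enlarged band count back to the original one.
    return sum(expanded) * n_orig / len(expanded)


# Hypothetical frame: weak low bands, strong sub-bands 18 and 19.
snr = [1.0] * 18 + [6.0, 6.0]
print(reference_ssnr(snr), enhanced_ssnr_by_splitting(snr))
```

Under this assumed scaling the enhanced value exceeds the reference only when sub-bands 18 and 19 carry enough of the total SNR, which matches the restriction of the claim to the first type of audio signal.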
- the enhanced SSNR may be determined by the following formula:
- SSNR' represents the enhanced SSNR
- snr(k) represents the subband SNR of the k-th subband
- a1 and a2 are weight-increasing parameters
- the values of a1 and a2 are such that a1 × snr(18) + a2 × snr(19) is greater than snr(18) + snr(19).
- the value of the enhanced SSNR calculated using Equation 1.4 is greater than the value of the reference SSNR calculated using Equation 1.1.
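The weighting approach can be illustrated with a short Python sketch. It assumes the weighted sum has the form: sum of snr(0) to snr(17), plus a1 × snr(18) + a2 × snr(19); this form is inferred from the parameter descriptions above and is not quoted from the patent. The default values of a1 and a2 are hypothetical.

```python
def enhanced_ssnr_weighted(snr, a1=2.0, a2=2.0):
    """Weighted SSNR for a 20-band frame: bands 0..17 keep weight 1, while
    bands 18 and 19 get the increased weights a1 and a2 (assumed form)."""
    if len(snr) != 20:
        raise ValueError("expected 20 subband SNR values")
    return sum(snr[:18]) + a1 * snr[18] + a2 * snr[19]


# Hypothetical frame: weak low bands, strong sub-bands 18 and 19.
snr = [1.0] * 18 + [6.0, 6.0]
print(enhanced_ssnr_weighted(snr))
```

For positive snr(18) and snr(19), any a1 and a2 greater than 1 satisfy the stated condition.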
- determining the enhanced SSNR of the audio signal includes determining a reference SSNR of the audio signal, and determining an enhanced SSNR according to a reference SSNR of the audio signal.
- the enhanced SSNR can be determined using the following formula:
- SSNR represents the reference SSNR of the audio signal
- SSNR' represents the enhanced SSNR
- x and y represent the enhancement parameters.
- the value of x can be 1.05
- the value of y can be 1.
- x and y may also take other suitable values such that the enhanced SSNR is appropriately greater than the reference SSNR.
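A minimal Python sketch of deriving the enhanced SSNR from the reference SSNR follows. The linear form x × SSNR + y is an assumption suggested by the example values x = 1.05 and y = 1; the function name is hypothetical.

```python
def enhance_linear(ssnr, x=1.05, y=1.0):
    """Enhanced SSNR as a linear function of the reference SSNR (assumed form)."""
    return x * ssnr + y


print(enhance_linear(20.0))
```

With these example values the result exceeds the reference whenever (x - 1) × SSNR + y is positive, that is, for any reference SSNR above -20.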
- the enhanced SSNR can be determined using the following formula:
- SSNR represents the original SSNR of the audio signal
- SSNR' represents the enhanced SSNR
- f(x) and h(y) represent enhancement functions.
- f(x) and h(y) may be functions of the long-term SNR (LSNR) of the audio signal, where the long-term SNR of the audio signal is the average SNR or weighted SNR over a relatively long period of time.
- for example, when lsnr is greater than 20, f(lsnr) may be equal to 1.1 and h(lsnr) may be equal to 2. When lsnr is less than 20 and greater than 15, f(lsnr) may be equal to 1.05 and h(lsnr) may be equal to 1. When lsnr is less than 15, f(lsnr) may be equal to 1 and h(lsnr) may be equal to 0.
- f(x) and h(y) may also take other suitable forms such that the enhanced SSNR is appropriately greater than the reference SSNR.
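The LSNR-dependent example can be sketched as a piecewise lookup in Python. The combination f × SSNR + h is an assumed form, and the handling of lsnr exactly equal to 20 or 15 is also an assumption, since the text leaves those boundaries open. Function names are hypothetical.

```python
def enhancement_params(lsnr):
    """Return (f, h) for a given long-term SNR, using the example values."""
    if lsnr > 20:
        return 1.1, 2.0
    if lsnr > 15:  # boundary values 20 and 15 fall into the lower branch (assumption)
        return 1.05, 1.0
    return 1.0, 0.0


def enhance_with_lsnr(ssnr, lsnr):
    """Enhanced SSNR as f(lsnr) * SSNR + h(lsnr) (assumed form)."""
    f, h = enhancement_params(lsnr)
    return f * ssnr + h


print(enhance_with_lsnr(10.0, 25.0))
```

Note that with these example values the lowest LSNR range leaves the SSNR unchanged.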
- the enhanced SSNR is compared with a VAD decision threshold, and if the enhanced SSNR is greater than the VAD decision threshold, the audio signal is determined to be an active signal. Otherwise it is determined that the audio signal is an inactive signal.
- the method may further include: reducing the VAD decision threshold by using a preset algorithm to obtain a reduced VAD decision threshold.
- comparing the enhanced SSNR with the VAD decision threshold specifically includes comparing the enhanced SSNR with the reduced VAD decision threshold to determine whether the audio signal is an active signal.
- the reference VAD decision threshold may be a default VAD decision threshold, which may be pre-stored or temporarily calculated, wherein the calculation of the reference VAD decision threshold may be performed by a prior art technique.
- the preset algorithm may be to multiply the reference VAD decision threshold by a coefficient less than one; other algorithms may also be used.
- the embodiment of the present invention does not limit the specific algorithm used.
- the preset algorithm may appropriately reduce the VAD decision threshold such that the enhanced SSNR is greater than the reduced VAD decision threshold, so that the proportion of active signals that are missed may be reduced.
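The threshold reduction and the comparison can be sketched in Python as follows. The coefficient 0.9 and the function names are hypothetical; the sketch also assumes a positive reference threshold, since multiplying a negative threshold by a coefficient below one would raise it.

```python
def reduce_threshold(reference_threshold, coefficient=0.9):
    """Multiply the reference VAD decision threshold by a coefficient below one."""
    if not 0 < coefficient < 1:
        raise ValueError("coefficient must be between 0 and 1")
    return reference_threshold * coefficient


def is_active(ssnr, threshold):
    """Active if the (enhanced) SSNR is greater than the decision threshold."""
    return ssnr > threshold


# A frame that would be missed at the reference threshold is detected
# once the threshold is reduced.
print(is_active(19.0, 20.0), is_active(19.0, reduce_threshold(20.0)))
```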
- the characteristics of the audio signal are determined, the enhanced SSNR is determined in a corresponding manner according to those characteristics, and the enhanced SSNR is compared with the VAD decision threshold, so that the proportion of active signals that are missed is reduced.
- FIG. 3 is a schematic flowchart of a method for detecting an audio signal according to an embodiment of the present invention.
- the reference SSNR can be the SSNR calculated using Equation 1.1. As can be seen from Equation 1.1, when calculating the reference SSNR, the subband SNR of any subband is not weighted, that is, the weight of the subband SNR of each subband is the same when calculating the reference SSNR.
- the audio signal is divided into 20 sub-bands according to psychoacoustic theory, that is, sub-band 0 to sub-band 19. If the signal-to-noise ratios of sub-band 18 and sub-band 19 are both greater than the first preset value T1, four more sub-bands may be added, that is, sub-band 20 to sub-band 23.
- specifically, sub-band 18, whose signal-to-noise ratio is greater than T1, may be divided into sub-band 18a, sub-band 18b, and sub-band 18c, and sub-band 19 may be divided into sub-band 19a, sub-band 19b, and sub-band 19c.
- sub-band 18 can be regarded as the parent sub-band of sub-band 18a, sub-band 18b, and sub-band 18c
- sub-band 19 can be regarded as the parent sub-band of sub-band 19a, sub-band 19b, and sub-band 19c.
- the signal-to-noise ratios of sub-band 18a, sub-band 18b, and sub-band 18c have the same value as that of their parent sub-band 18, and the signal-to-noise ratios of sub-band 19a, sub-band 19b, and sub-band 19c have the same value as that of their parent sub-band 19.
- in this way, the original 20 sub-bands are re-divided into 24 sub-bands. Since the VAD is still designed for 20 sub-bands when performing active signal detection, the 24 sub-bands need to be mapped back to 20 sub-bands to determine the enhanced SSNR.
- when the enhanced SSNR is determined by increasing the number of high-frequency-end subbands whose subband SNR is greater than the first preset threshold, the following formula may be used for calculation:
- SSNR' represents the enhanced SSNR.
- snr(k) represents the subband SNR of the k-th subband.
- for the first type of audio signal, the value of the enhanced SSNR calculated using Equation 1.3 is greater than the value of the reference SSNR calculated using Equation 1.1.
- the enhanced SSNR may be determined by the following formula:
- SSNR' represents the enhanced SSNR
- snr(k) represents the subband SNR of the k-th subband
- a1 and a2 are weight-increasing parameters
- the values of a1 and a2 are such that a1 × snr(18) + a2 × snr(19) is greater than snr(18) + snr(19).
- the value of the enhanced SSNR calculated using Equation 1.4 is greater than the value of the reference SSNR calculated using Equation 1.1.
- the enhanced SSNR is compared with a VAD decision threshold, and if the enhanced SSNR is greater than the VAD decision threshold, the audio signal is determined to be an active signal. Otherwise, the audio signal is determined to be an inactive signal.
- the method shown in FIG. 3 can determine the characteristics of the audio signal, determine the enhanced SSNR in a corresponding manner according to those characteristics, and compare the enhanced SSNR with the VAD decision threshold, so that the proportion of active signals that are missed can be reduced.
- determining that the input audio signal is an audio signal to be determined includes: determining, according to the subband SNR of the audio signal, that the audio signal is an audio signal to be determined.
- determining that the audio signal is an audio signal to be determined includes: in a case where the number of high-frequency-end subbands in the audio signal whose subband SNR is greater than the first preset threshold is greater than a first quantity, determining that the audio signal is an audio signal to be determined.
- determining that the audio signal is an audio signal to be determined includes: in a case where the number of high-frequency-end subbands in the audio signal whose subband SNR is greater than the first preset threshold is greater than a second quantity and the number of low-frequency-end subbands in the audio signal whose subband SNR is less than a second preset threshold is greater than a third quantity, determining that the audio signal is an audio signal to be determined.
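The two counting rules above can be sketched in Python. Which sub-bands count as high-frequency-end or low-frequency-end is not fixed in this passage, so the band ranges, thresholds, quantities, and function names below are all hypothetical.

```python
def count_bands(snr, bands, predicate):
    """Count the bands in `bands` whose subband SNR satisfies `predicate`."""
    return sum(1 for k in bands if predicate(snr[k]))


def is_candidate_high_only(snr, high_bands, t1, first_quantity):
    """Rule 1: enough high-frequency-end subbands have SNR above t1."""
    return count_bands(snr, high_bands, lambda v: v > t1) > first_quantity


def is_candidate_high_and_low(snr, high_bands, low_bands, t1, t2,
                              second_quantity, third_quantity):
    """Rule 2: enough high-end subbands above t1 AND enough low-end subbands below t2."""
    high = count_bands(snr, high_bands, lambda v: v > t1)
    low = count_bands(snr, low_bands, lambda v: v < t2)
    return high > second_quantity and low > third_quantity


HIGH = range(16, 20)  # assumed high-frequency-end sub-bands
LOW = range(0, 4)     # assumed low-frequency-end sub-bands
snr = [0.5] * 4 + [1.0] * 12 + [5.0] * 4
print(is_candidate_high_only(snr, HIGH, 2.0, 2))
print(is_candidate_high_and_low(snr, HIGH, LOW, 2.0, 1.0, 2, 2))
```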
- the first preset threshold and the second preset threshold may be obtained by statistics over a large number of speech samples. Specifically, the subband SNRs of the high-frequency-end subbands are collected over a large number of unvoiced speech samples containing background noise, and the first preset threshold is determined therefrom such that the subband SNRs of most of the high-frequency-end subbands in these unvoiced samples are greater than the threshold. Similarly, the subband SNRs of the low-frequency-end subbands are collected over these unvoiced speech samples, and the second preset threshold is determined therefrom such that the subband SNRs of most of the low-frequency-end subbands in these unvoiced samples are less than the threshold.
- the first quantity, the second quantity, and the third quantity are also obtained based on statistics.
- taking the first quantity as an example, in a large number of unvoiced speech sample frames containing noise, the number of high-frequency-end subbands whose subband SNR is greater than the first preset threshold is counted, and the first quantity is determined therefrom such that, in the vast majority of these unvoiced speech sample frames, the number of high-frequency-end subbands whose subband SNR is greater than the first preset threshold is greater than the first quantity.
- the method of obtaining the second quantity is similar to the method of obtaining the first quantity.
- the second quantity may be the same as or different from the first quantity.
- for the third quantity, the number of low-frequency-end subbands whose subband SNR is less than the second preset threshold is counted, and the third quantity is determined therefrom such that, in the vast majority of these unvoiced speech sample frames, the number of low-frequency-end subbands whose subband SNR is less than the second preset threshold is greater than the third quantity.
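One way to turn per-frame counts into such a quantity is sketched below in Python. The patent only says the quantity is chosen so that the vast majority of sample frames exceed it; the 90% coverage figure, the function name, and the sample counts are assumptions made for illustration.

```python
def derive_quantity(counts, coverage=0.9):
    """Largest integer N such that at least `coverage` of the frames have a
    per-frame count greater than N. Returns -1 if even N = 0 falls short."""
    if not counts:
        raise ValueError("counts must not be empty")
    best = -1
    for n in range(max(counts) + 1):
        covered = sum(1 for c in counts if c > n)
        if covered >= coverage * len(counts):
            best = n
        else:
            break  # coverage can only shrink as n grows
    return best


# Hypothetical statistics: number of qualifying subbands in each of 20 frames.
counts = [4] * 19 + [1]
print(derive_quantity(counts))
```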
- the embodiments of FIG. 1 to FIG. 3 determine whether the input audio signal is an active signal by using an enhanced SSNR.
- the method shown in FIG. 4 determines whether the input audio signal is an active signal by reducing the VAD decision threshold.
- FIG. 4 is a schematic flowchart of a method for detecting an audio signal according to an embodiment of the present invention.
- determining the input audio signal as the to-be-determined audio signal includes: determining, according to the sub-band SNR of the audio signal determined in step 201, the audio signal as the to-be-determined audio signal.
- determining that the input audio signal is an audio signal to be determined includes: in a case where the number of high-frequency-end subbands in the audio signal whose subband SNR is greater than the first preset threshold is greater than the first quantity, determining that the audio signal is an audio signal to be determined.
- determining that the input audio signal is an audio signal to be determined includes: in a case where the number of high-frequency-end subbands in the audio signal whose subband SNR is greater than the first preset threshold is greater than the second quantity and the number of low-frequency-end subbands in the audio signal whose subband SNR is less than the second preset threshold is greater than the third quantity, determining that the audio signal is an audio signal to be determined.
- determining that the input audio signal is an audio signal to be determined includes: in a case where the number of subbands in the audio signal whose subband SNR is greater than a third preset threshold is greater than a fourth quantity, determining that the audio signal is an audio signal to be determined.
- the first preset threshold and the second preset threshold may be obtained by statistics over a large number of speech samples. Specifically, the subband SNRs of the high-frequency-end subbands are collected over a large number of unvoiced speech samples containing background noise, and the first preset threshold is determined therefrom such that the subband SNRs of most of the high-frequency-end subbands in these unvoiced samples are greater than the threshold. Similarly, the subband SNRs of the low-frequency-end subbands are collected over these unvoiced speech samples, and the second preset threshold is determined therefrom such that the subband SNRs of most of the low-frequency-end subbands in these unvoiced samples are less than the threshold.
- the third preset threshold is also obtained based on statistics. Specifically, the third preset threshold is determined from the subband SNRs of a large number of noise signals such that the subband SNRs of most of the subbands of these noise signals are less than the threshold.
- the first quantity, the second quantity, the third quantity, and the fourth quantity are also obtained based on statistics. Taking the first quantity as an example, in a large number of unvoiced speech sample frames containing noise, the number of high-frequency-end subbands whose subband SNR is greater than the first preset threshold is counted, and the first quantity is determined therefrom such that, in the vast majority of these unvoiced speech sample frames, the number of high-frequency-end subbands whose subband SNR is greater than the first preset threshold is greater than the first quantity.
- the method of obtaining the second quantity is similar to the method of obtaining the first quantity.
- the second quantity may be the same as or different from the first quantity.
- for the third quantity, the number of low-frequency-end subbands whose subband SNR is less than the second preset threshold is counted, and the third quantity is determined therefrom such that, in the vast majority of these unvoiced speech sample frames, the number of low-frequency-end subbands whose subband SNR is less than the second preset threshold is greater than the third quantity.
- for the fourth quantity, the number of subbands whose subband SNR is less than the third preset threshold is counted in a large number of noise sample frames, and the fourth quantity is determined therefrom such that, in the vast majority of these noise sample frames, the number of subbands whose subband SNR is less than the third preset threshold is greater than the fourth quantity.
- whether the input audio signal is an audio signal to be determined may also be decided by determining whether the input audio signal is an unvoiced signal. In this case, the subband SNR of the audio signal is not needed for this decision. In other words, step 201 does not need to be performed when determining whether the audio signal is an audio signal to be determined. Specifically, determining that the input audio signal is an audio signal to be determined includes: in a case where the audio signal is determined to be an unvoiced signal, determining that the audio signal is an audio signal to be determined. Those skilled in the art will appreciate that there are a variety of methods for detecting whether an audio signal is an unvoiced signal.
- for example, whether the audio signal is an unvoiced signal can be determined by detecting the Zero-Crossing Rate (ZCR) of the audio signal.
- specifically, in a case where the ZCR of the audio signal is greater than a ZCR threshold, the audio signal is determined to be an unvoiced signal, where the ZCR threshold is determined by a large number of experiments.
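A zero-crossing-rate check can be sketched in a few lines of Python. The ZCR here is the fraction of adjacent sample pairs whose signs differ, which is one common definition; the threshold value 0.3 and the function names are hypothetical, since the text says the threshold comes from experiments.

```python
def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(samples) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)


def is_unvoiced(samples, zcr_threshold=0.3):
    """Treat the frame as unvoiced when its ZCR exceeds the threshold."""
    return zero_crossing_rate(samples) > zcr_threshold


print(zero_crossing_rate([1, -1, 1, -1, 1]), is_unvoiced([1, 2, 3, 2, 1]))
```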
- the reference SSNR may be the SSNR calculated using Equation 1.1.
- the reference VAD decision threshold may be a default VAD decision threshold, which may be pre-stored or temporarily calculated, wherein the calculation of the reference VAD decision threshold may be performed by using a prior art technique.
- the preset algorithm may be to multiply the reference VAD decision threshold by a coefficient less than one; other algorithms may also be used.
- the embodiment of the present invention does not limit the specific algorithm used.
- the preset algorithm may appropriately reduce the VAD decision threshold such that the enhanced SSNR is greater than the reduced VAD decision threshold, so that the proportion of active signals that are missed can be reduced.
- the SSNR of these audio signals may be lower than the preset VAD decision threshold.
- these audio signals are active audio signals. This is due to the characteristics of these audio signals.
- the sub-band SNR of the high frequency portion is significantly reduced.
- the sub-band SNR of the high-frequency portion contributes less to the SSNR.
- the SSNR calculated by the conventional SSNR calculation method may be lower than the VAD decision threshold, which causes the missed detection of the active signal.
- the energy of the audio signal is relatively flat on the spectrum, but the overall energy of the audio signal is low.
- the SSNR calculated using the conventional SSNR calculation method may also be lower than the VAD decision threshold. The method shown in FIG. 4 reduces the VAD decision threshold so that the SSNR calculated by the conventional SSNR calculation method is greater than the VAD decision threshold, which can effectively reduce the proportion of active signals that are missed.
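The FIG. 4 approach, which keeps the conventional SSNR and lowers the threshold instead, can be sketched in Python. The sketch assumes the conventional SSNR is the plain sum of subband SNRs and that frames not flagged as to-be-determined keep the reference threshold; the coefficient 0.9 and the function name are hypothetical.

```python
def detect_active_reduced_threshold(snr, reference_threshold,
                                    to_be_determined, coefficient=0.9):
    """Compare the conventional SSNR against a threshold that is reduced
    only for frames flagged as to-be-determined (assumed behaviour)."""
    ssnr = sum(snr)  # assumed unweighted SSNR
    if to_be_determined:
        threshold = reference_threshold * coefficient
    else:
        threshold = reference_threshold
    return ssnr > threshold


snr = [1.0] * 19  # conventional SSNR of 19 under the assumed sum
print(detect_active_reduced_threshold(snr, 20.0, False))
print(detect_active_reduced_threshold(snr, 20.0, True))
```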
- FIG. 5 is a structural block diagram of an apparatus according to an embodiment of the present invention.
- the apparatus shown in Figure 5 is capable of performing the various steps of Figure 1 or Figure 2.
- the apparatus 500 includes a first determining unit 501, a second determining unit 502, and a third determining unit 503.
- the first determining unit 501 is configured to determine that the input audio signal is an audio signal to be determined.
- the second determining unit 502 is configured to determine an enhanced segmental signal-to-noise ratio (SSNR) of the audio signal, where the enhanced SSNR is greater than a reference SSNR.
- the third determining unit 503 is configured to compare the enhanced SSNR with a voice activity detection (VAD) decision threshold to determine whether the audio signal is an active signal.
- the apparatus 500 shown in FIG. 5 can determine the characteristics of the input audio signal, determine the enhanced SSNR in a corresponding manner according to those characteristics, and compare the enhanced SSNR with the VAD decision threshold, so that the proportion of active signals that are missed can be reduced.
- the first determining unit 501 is specifically configured to determine, according to the subband SNR of the audio signal, the audio signal as an audio signal to be determined.
- when determining, according to the subband SNR of the audio signal, that the audio signal is an audio signal to be determined, the first determining unit 501 is specifically configured to determine, in a case where the number of high-frequency-end subbands in the audio signal whose subband SNR is greater than the first preset threshold is greater than the first quantity, that the audio signal is an audio signal to be determined.
- alternatively, when determining, according to the subband SNR of the audio signal, that the audio signal is an audio signal to be determined, the first determining unit 501 is specifically configured to determine, in a case where the number of high-frequency-end subbands in the audio signal whose subband SNR is greater than the first preset threshold is greater than the second quantity and the number of low-frequency-end subbands in the audio signal whose subband SNR is less than the second preset threshold is greater than the third quantity, that the audio signal is an audio signal to be determined.
- alternatively, when determining, according to the subband SNR of the audio signal, that the audio signal is an audio signal to be determined, the first determining unit 501 is specifically configured to determine, in a case where the number of subbands in the audio signal whose subband SNR is greater than the third preset threshold is greater than the fourth quantity, that the audio signal is an audio signal to be determined.
- the first determining unit 501 is specifically configured to determine, in the case that the audio signal is an unvoiced signal, the audio signal as an audio signal to be determined.
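- those skilled in the art will appreciate that there are a variety of methods for detecting whether the audio signal is an unvoiced signal.
- for example, whether the audio signal is an unvoiced signal can be determined by detecting the Zero-Crossing Rate (ZCR) of the audio signal.
- specifically, in a case where the ZCR of the audio signal is greater than a ZCR threshold, the audio signal is determined to be an unvoiced signal, where the ZCR threshold is determined by a large number of experiments.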
- the first preset threshold and the second preset threshold may be obtained by statistics over a large number of speech samples. Specifically, the subband SNRs of the high-frequency-end subbands are collected over a large number of unvoiced speech samples containing background noise, and the first preset threshold is determined therefrom such that the subband SNRs of most of the high-frequency-end subbands in these unvoiced samples are greater than the threshold. Similarly, the subband SNRs of the low-frequency-end subbands are collected over these unvoiced speech samples, and the second preset threshold is determined therefrom such that the subband SNRs of most of the low-frequency-end subbands in these unvoiced samples are less than the threshold.
- the third preset threshold is also obtained based on statistics. Specifically, the third preset threshold is determined from the subband SNRs of a large number of noise signals such that the subband SNRs of most of the subbands of these noise signals are less than the threshold.
- the first quantity, the second quantity, the third quantity, and the fourth quantity are also obtained based on statistics.
- taking the first quantity as an example, in a large number of speech samples containing noise, the number of high-frequency-end subbands whose subband SNR is greater than the first preset threshold is counted, and the first quantity is determined therefrom such that, in the vast majority of these speech samples, the number of high-frequency-end subbands whose subband SNR is greater than the first preset threshold is greater than the first quantity.
- the method of determining the second quantity is similar to the method of determining the first quantity.
- the second quantity may be the same as or different from the first quantity.
- for the third quantity, the number of low-frequency-end subbands whose subband SNR is less than the second preset threshold is counted, and the third quantity is determined therefrom such that, in the vast majority of these speech samples, the number of low-frequency-end subbands whose subband SNR is less than the second preset threshold is greater than the third quantity.
- for the fourth quantity, the number of subbands whose subband SNR is greater than the third preset threshold is counted, and the fourth quantity is determined therefrom such that, in the vast majority of these speech samples, the number of subbands whose subband SNR is greater than the third preset threshold is greater than the fourth quantity.
- the second determining unit 502 is specifically configured to determine a weight for the subband SNR of each subband in the audio signal, where the weight of the subband SNR of each high-frequency-end subband whose subband SNR is greater than the first preset threshold is greater than the weight of the subband SNR of the other subbands, and to determine the enhanced SSNR according to the weight of the subband SNR of each subband in the audio signal and the subband SNR of each subband.
- the second determining unit 502 is specifically configured to determine a reference SSNR of the audio signal, and determine an enhanced SSNR according to a reference SSNR of the audio signal.
- the reference SSNR can be the SSNR calculated using Equation 1.1.
- the subband SNRs of the respective subbands have the same weight in the reference SSNR.
- the second determining unit 502 is specifically configured to determine the enhanced SSNR by using the following formula:
- SSNR represents the reference SSNR
- SSNR' represents the enhanced SSNR
- x and y represent enhancement parameters.
- the value of x can be 1.05
- the value of y can be 1.
- x and y may also take other suitable values such that the enhanced SSNR is appropriately greater than the reference SSNR.
- the second determining unit 502 is specifically configured to determine the enhanced SSNR by using the following formula:
- SSNR represents the reference SSNR
- SSNR' represents the enhanced SSNR
- f(x) and h(y) represent enhancement functions.
- f(x) and h(y) may be functions of the long-term SNR (LSNR) of the audio signal, where the long-term SNR of the audio signal is the average SNR or weighted SNR over a relatively long period of time.
- for example, when lsnr is greater than 20, f(lsnr) may be equal to 1.1 and h(lsnr) may be equal to 2. When lsnr is less than 20 and greater than 15, f(lsnr) may be equal to 1.05 and h(lsnr) may be equal to 1. When lsnr is less than 15, f(lsnr) may be equal to 1 and h(lsnr) may be equal to 0.
- f(x) and h(y) may also take other suitable forms such that the enhanced SSNR is appropriately greater than the reference SSNR.
- the third determining unit 503 is specifically configured to compare the enhanced SSNR with a voice activity detection (VAD) decision threshold, and determine, according to the comparison result, whether the audio signal is an active signal. Specifically, if the enhanced SSNR is greater than the VAD decision threshold, it is determined that the audio signal is an active signal. If the enhanced SSNR is less than the VAD decision threshold, the audio signal is determined to be an inactive signal.
- alternatively, a reduced VAD decision threshold may be obtained by reducing the reference VAD decision threshold using a preset algorithm, and the reduced VAD decision threshold is then used to determine whether the audio signal is an active signal.
- the apparatus 500 may further include a fourth determining unit 504.
- the fourth determining unit 504 is configured to reduce the VAD decision threshold by using a preset algorithm to obtain a reduced VAD decision threshold.
- the third determining unit 503 is specifically configured to compare the enhanced SSNR with the reduced VAD decision threshold to determine whether the audio signal is an active signal.
- FIG. 6 is a structural block diagram of another apparatus according to an embodiment of the present invention.
- the apparatus shown in Figure 6 is capable of performing the various steps of Figure 3.
- the apparatus 600 includes a first determining unit 601, a second determining unit 602, and a third determining unit 603.
- the first determining unit 601 is configured to determine that the input audio signal is an audio signal to be determined.
- the second determining unit 602 is configured to determine a weight for the subband signal-to-noise ratio (SNR) of each subband in the audio signal, where the weight of the subband SNR of each high-frequency-end subband whose subband SNR is greater than a first preset threshold is greater than the weight of the subband SNR of the other subbands, and to determine an enhanced segmental signal-to-noise ratio (SSNR) according to the weight of the subband SNR of each subband in the audio signal and the subband SNR of each subband, where the enhanced SSNR is greater than the reference SSNR.
- the third determining unit 603 is configured to compare the enhanced SSNR with a voice activity detection (VAD) decision threshold to determine whether the audio signal is an active signal.
- the apparatus 600 shown in FIG. 6 can determine the characteristics of the input audio signal, determine the enhanced SSNR in a corresponding manner according to those characteristics, and compare the enhanced SSNR with the VAD decision threshold, so that the proportion of active signals that are missed can be reduced.
- the first determining unit 601 is specifically configured to determine, according to the sub-band signal to noise ratio SNR of the audio signal, the audio signal as the to-be-determined audio signal.
- the first determining unit 601 is specifically configured to determine, in a case where the number of high-frequency-end subbands in the audio signal whose subband SNR is greater than the first preset threshold is greater than the first quantity, that the audio signal is an audio signal to be determined.
- alternatively, the first determining unit 601 is specifically configured to determine, in a case where the number of high-frequency-end subbands in the audio signal whose subband SNR is greater than the first preset threshold is greater than the second quantity and the number of low-frequency-end subbands in the audio signal whose subband SNR is less than the second preset threshold is greater than the third quantity, that the audio signal is an audio signal to be determined.
- the first preset threshold and the second preset threshold may be obtained by statistics over a large number of speech samples. Specifically, the subband SNRs of the high-frequency-end subbands are collected over a large number of unvoiced speech samples containing background noise, and the first preset threshold is determined therefrom such that the subband SNRs of most of the high-frequency-end subbands in these unvoiced samples are greater than the threshold. Similarly, the subband SNRs of the low-frequency-end subbands are collected over these unvoiced speech samples, and the second preset threshold is determined therefrom such that the subband SNRs of most of the low-frequency-end subbands in these unvoiced samples are less than the threshold.
- the first quantity, the second quantity, and the third quantity are also obtained based on statistics.
- taking the first quantity as an example, in a large number of unvoiced speech sample frames containing noise, the number of high-frequency-end subbands whose subband SNR is greater than the first preset threshold is counted, and the first quantity is determined therefrom such that, in the vast majority of these unvoiced speech sample frames, the number of high-frequency-end subbands whose subband SNR is greater than the first preset threshold is greater than the first quantity.
- the method of obtaining the second quantity is similar to the method of obtaining the first quantity.
- the second quantity may be the same as or different from the first quantity.
- for the third quantity, the number of low-frequency-end subbands whose subband SNR is less than the second preset threshold is counted, and the third quantity is determined therefrom such that, in the vast majority of these unvoiced speech sample frames, the number of low-frequency-end subbands whose subband SNR is less than the second preset threshold is greater than the third quantity.
- FIG. 7 is a structural block diagram of an apparatus according to an embodiment of the present invention.
- the apparatus shown in Figure 7 is capable of performing the various steps of Figure 1 or Figure 2.
- device 700 includes a processor 701 and a memory 702.
- the processor 701 can be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
- the processor can implement or carry out the methods, steps, and logical block diagrams disclosed in the embodiments of the present invention.
- the general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
- the steps of the method disclosed in the embodiments of the present invention may be directly performed by a hardware decoding processor, or may be performed by a combination of hardware and software modules in a decoding processor.
- the software module can be located in a storage medium well known in the art, such as random access memory (RAM), flash memory, read-only memory (ROM), programmable read-only memory, electrically erasable programmable memory, or a register.
- the storage medium is located in memory 702, and processor 701 reads the instructions in memory 702 and, in conjunction with its hardware, performs the steps of the above method.
- The processor 701 is configured to determine that the input audio signal is an audio signal to be determined.
- The processor 701 is configured to determine an enhanced segmental signal-to-noise ratio (SSNR) of the audio signal, where the enhanced SSNR is greater than a reference SSNR.
- The processor 701 is configured to compare the enhanced SSNR with a voice activity detection (VAD) decision threshold to determine whether the audio signal is an active signal.
- The apparatus 700 shown in FIG. 7 can determine the characteristics of the input audio signal, determine the enhanced SSNR in a manner corresponding to those characteristics, and compare the enhanced SSNR with the VAD decision threshold, so that the proportion of active signals that are missed can be reduced.
- The processor 701 is specifically configured to determine, according to the subband SNR of the audio signal, that the audio signal is an audio signal to be determined.
- In one case where the processor 701 determines according to the subband SNR of the audio signal that the audio signal is an audio signal to be determined, the processor 701 is specifically configured to determine that the audio signal is an audio signal to be determined when the number of high-frequency-end subbands in the audio signal whose subband SNR is greater than the first preset threshold is greater than the first quantity.
- In another such case, the processor 701 is specifically configured to determine that the audio signal is an audio signal to be determined when the number of high-frequency-end subbands in the audio signal whose subband SNR is greater than the first preset threshold is greater than the second quantity and the number of low-frequency-end subbands in the audio signal whose subband SNR is less than the second preset threshold is greater than the third quantity.
- In a further such case, the processor 701 is specifically configured to determine that the audio signal is an audio signal to be determined when the number of subbands in the audio signal whose subband SNR is greater than the third preset threshold is greater than the fourth quantity.
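The three alternative subband-counting criteria described for the processor 701 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names, the `split` index separating low-frequency-end from high-frequency-end subbands, and all numeric values in the usage note are assumptions chosen for the example.

```python
from typing import Sequence


def count_above(snrs: Sequence[float], threshold: float) -> int:
    """Count subbands whose SNR is strictly greater than the threshold."""
    return sum(1 for s in snrs if s > threshold)


def count_below(snrs: Sequence[float], threshold: float) -> int:
    """Count subbands whose SNR is strictly less than the threshold."""
    return sum(1 for s in snrs if s < threshold)


def rule_high_band(subband_snr: Sequence[float], split: int,
                   thr1: float, n1: int) -> bool:
    """Criterion 1: enough high-frequency-end subbands exceed thr1."""
    # split is a hypothetical index: subbands [split:] are treated as high-frequency-end.
    return count_above(subband_snr[split:], thr1) > n1


def rule_high_and_low(subband_snr: Sequence[float], split: int,
                      thr1: float, thr2: float, n2: int, n3: int) -> bool:
    """Criterion 2: strong high-frequency end and weak low-frequency end."""
    high_ok = count_above(subband_snr[split:], thr1) > n2
    low_ok = count_below(subband_snr[:split], thr2) > n3
    return high_ok and low_ok


def rule_any_band(subband_snr: Sequence[float], thr3: float, n4: int) -> bool:
    """Criterion 3: enough subbands anywhere in the frame exceed thr3."""
    return count_above(subband_snr, thr3) > n4
```

For a hypothetical eight-subband frame `[0.5, 0.8, 1.0, 0.7, 3.0, 4.2, 5.1, 2.9]` with `split=4`, `thr1=2.5`, and `n1=3`, all four high-frequency-end subbands exceed the threshold, so `rule_high_band` marks the frame as an audio signal to be determined.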
- The processor 701 is specifically configured to determine, in the case that the audio signal is an unvoiced signal, that the audio signal is an audio signal to be determined.
- Whether the audio signal is an unvoiced signal may be determined, for example, by detecting the zero-crossing rate (ZCR) of the audio signal.
- The first preset threshold and the second preset threshold may be obtained from statistics over a large number of speech samples. Specifically, the subband SNRs of the high-frequency-end subbands are collected over a large number of unvoiced speech samples containing background noise, and the first preset threshold is determined from them such that the subband SNRs of most of the high-frequency-end subbands in these unvoiced samples are greater than the threshold. Similarly, the subband SNRs of the low-frequency-end subbands are collected over these unvoiced speech samples, and the second preset threshold is determined from them such that the subband SNRs of most of the low-frequency-end subbands in these unvoiced samples are less than the threshold.
- The third preset threshold is also obtained from statistics. Specifically, it is determined from the subband SNRs of a large number of noise signals such that the subband SNRs of most subbands of these noise signals are less than the threshold.
- The first quantity, the second quantity, the third quantity, and the fourth quantity are also obtained from statistics. Taking the first quantity as an example: in a large number of noisy speech samples, the number of high-frequency-end subbands whose subband SNR is greater than the first preset threshold is counted, and the first quantity is determined from these counts such that, in the vast majority of these speech samples, the number of high-frequency-end subbands whose subband SNR is greater than the first preset threshold is greater than the first quantity.
- The second quantity is determined in a similar way to the first quantity. The second quantity may be the same as or different from the first quantity.
- For the third quantity, the number of low-frequency-end subbands whose subband SNR is less than the second preset threshold is counted in a large number of noisy speech samples, and the third quantity is determined from these counts such that, in the vast majority of these speech samples, the number of low-frequency-end subbands whose subband SNR is less than the second preset threshold is greater than the third quantity.
- For the fourth quantity, the number of subbands whose subband SNR is greater than the third preset threshold is counted in a large number of noisy speech samples, and the fourth quantity is determined from these counts such that, in the vast majority of these speech samples, the number of subbands whose subband SNR is greater than the third preset threshold is greater than the fourth quantity.
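The statistical choice of a quantity described above can be illustrated with a small sketch. The text only requires that most sample frames have a count greater than the chosen quantity; the `coverage` parameter, the function name, and the rule of picking the largest qualifying integer are assumptions made here for illustration.

```python
from typing import Sequence


def choose_quantity(counts: Sequence[int], coverage: float = 0.9) -> int:
    """Return the largest n such that at least `coverage` of the frames
    have a per-frame count strictly greater than n.

    `counts` holds, for each training frame, the number of subbands that
    satisfied the relevant threshold condition. Returns -1 if even n = 0
    does not reach the requested coverage.
    """
    if not counts:
        raise ValueError("counts must not be empty")
    total = len(counts)
    best = -1
    for n in range(max(counts)):
        hits = sum(1 for c in counts if c > n)
        if hits / total >= coverage:
            best = n
        else:
            break  # the covered fraction can only shrink as n grows
    return best
```

With hypothetical per-frame counts `[5, 6, 4, 7, 5, 6, 5, 3, 6, 5]` and `coverage=0.85`, nine of the ten frames have a count greater than 3 but only eight have a count greater than 4, so the sketch selects 3.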
- The processor 701 is specifically configured to determine the weight of the subband SNR of each subband in the audio signal, where the weight of the subband SNR of a high-frequency-end subband whose subband SNR is greater than the first preset threshold is greater than the weights of the subband SNRs of the other subbands, and to determine the enhanced SSNR according to the weights of the subband SNRs of the subbands in the audio signal and the subband SNRs of the subbands.
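The weighting scheme just described can be sketched as follows. Several things here are assumptions and not taken from the source: the SSNR is modeled as a plain sum of subband SNRs, ordinary subbands get weight 1.0, boosted subbands get a hypothetical weight `boost`, and `split` is a hypothetical index marking where the high-frequency-end subbands start. The subband SNRs are assumed to be non-negative, otherwise a larger weight would not guarantee a larger SSNR.

```python
from typing import Sequence


def unweighted_ssnr(subband_snr: Sequence[float]) -> float:
    """Baseline for comparison: every subband SNR has the same weight (1.0)."""
    return float(sum(subband_snr))


def weighted_enhanced_ssnr(subband_snr: Sequence[float], split: int,
                           thr1: float, boost: float = 2.0) -> float:
    """Weighted sum in which high-frequency-end subbands whose SNR exceeds
    thr1 receive a larger weight than all other subbands."""
    if boost <= 1.0:
        raise ValueError("boost must be greater than 1 to enhance the SSNR")
    total = 0.0
    for k, snr in enumerate(subband_snr):
        # Only high-frequency-end subbands (index >= split) above thr1 are boosted.
        weight = boost if (k >= split and snr > thr1) else 1.0
        total += weight * snr
    return total
```

For the hypothetical frame `[0.5, 0.8, 1.0, 0.7, 3.0, 4.0, 5.0, 2.0]` with `split=4` and `thr1=2.5`, three high-frequency-end subbands are boosted, so the weighted value (about 29) exceeds the unweighted value (about 17).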
- The processor 701 is specifically configured to determine a reference SSNR of the audio signal, and to determine the enhanced SSNR according to the reference SSNR of the audio signal.
- The reference SSNR may be the SSNR calculated using Equation 1.1.
- When the reference SSNR is calculated, the subband SNRs of all subbands included in the SSNR have the same weight in the SSNR.
- The processor 701 is specifically configured to determine the enhanced SSNR by using the following formula: SSNR' = x * SSNR + y, where SSNR represents the reference SSNR, SSNR' represents the enhanced SSNR, and x and y represent enhancement parameters.
- For example, the value of x may be 1.07 and the value of y may be 1.
- The values of x and y may also be other suitable values such that the enhanced SSNR is appropriately greater than the reference SSNR.
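As a small worked sketch of the linear enhancement SSNR' = x * SSNR + y with the example values x = 1.07 and y = 1: the function names and the threshold value used in the usage note are illustrative assumptions, and treating an SSNR exactly equal to the threshold as inactive is also an assumption, since the text only covers the greater-than and less-than cases.

```python
def enhance_linear(reference_ssnr: float, x: float = 1.07, y: float = 1.0) -> float:
    """Enhanced SSNR computed as x * SSNR + y."""
    return x * reference_ssnr + y


def is_active(ssnr: float, vad_threshold: float) -> bool:
    """Active when the SSNR is strictly greater than the VAD decision threshold.
    Equality is treated as inactive here (an assumption)."""
    return ssnr > vad_threshold
```

With a hypothetical reference SSNR of 10 and a VAD decision threshold of 11, the reference SSNR alone would give an inactive decision, while the enhanced SSNR of about 11.7 gives an active one. This is the intended effect of the enhancement on frames that would otherwise be missed.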
- The processor 701 is specifically configured to determine the enhanced SSNR by using the following formula: SSNR' = f(x) * SSNR + h(y), where SSNR represents the reference SSNR, SSNR' represents the enhanced SSNR, and f(x) and h(y) represent enhancement functions.
- For example, f(x) and h(y) may be functions of the long-term SNR (LSNR) of the audio signal, where the long-term SNR is the average SNR or weighted SNR of the audio signal over a relatively long period of time. For example, when lsnr is greater than 20, f(lsnr) may be equal to 1.1 and h(lsnr) may be equal to 2; when lsnr is less than 20 and greater than 17, f(lsnr) may be equal to 1.07 and h(lsnr) may be equal to 1; when lsnr is less than 17, f(lsnr) may be equal to 1 and h(lsnr) may be equal to 0.
- f(x) and h(y) may also take other suitable forms such that the enhanced SSNR is appropriately greater than the reference SSNR.
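The example values given for the LSNR-dependent enhancement functions can be written as a piecewise lookup. The text does not say what happens when lsnr is exactly 20 or exactly 17; in this sketch those boundary values fall into the lower tier, which is an assumption, as are the function names.

```python
from typing import Tuple


def enhancement_params(lsnr: float) -> Tuple[float, float]:
    """Return (f(lsnr), h(lsnr)) using the example values from the text.
    Boundary handling at exactly 20 and 17 is an assumption."""
    if lsnr > 20:
        return 1.1, 2.0
    if lsnr > 17:
        return 1.07, 1.0
    return 1.0, 0.0


def enhance_with_lsnr(reference_ssnr: float, lsnr: float) -> float:
    """Enhanced SSNR computed as f(lsnr) * SSNR + h(lsnr)."""
    f, h = enhancement_params(lsnr)
    return f * reference_ssnr + h
```

Note that in the lowest tier (f = 1, h = 0) the result equals the reference SSNR, so with these example values no enhancement is applied when the long-term SNR is low.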
- The processor 701 is specifically configured to compare the enhanced SSNR with the voice activity detection (VAD) decision threshold and determine, according to the comparison result, whether the audio signal is an active signal. Specifically, if the enhanced SSNR is greater than the VAD decision threshold, the audio signal is determined to be an active signal. If the enhanced SSNR is less than the VAD decision threshold, the audio signal is determined to be an inactive signal.
- Optionally, a preset algorithm may also be used to reduce the reference VAD decision threshold, and the reduced VAD decision threshold obtained in this way is used to determine whether the audio signal is an active signal.
- In this case, the processor 701 may be further configured to reduce the VAD decision threshold by using the preset algorithm to obtain the reduced VAD decision threshold.
- The processor 701 is then specifically configured to compare the enhanced SSNR with the reduced VAD decision threshold to determine whether the audio signal is an active signal.
- FIG. 8 is a structural block diagram of another apparatus according to an embodiment of the present invention.
- The apparatus shown in FIG. 8 can perform the steps of FIG. 3.
- The apparatus 800 includes a processor 801 and a memory 802.
- The processor 801 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
- The processor may implement or perform the methods, steps, and logical block diagrams disclosed in the embodiments of the present invention.
- The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
- The steps of the methods disclosed in the embodiments of the present invention may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor.
- The software module may be located in a well-known storage medium such as a random access memory (RAM), a flash memory, a read-only memory (ROM), a programmable read-only memory, an electrically erasable programmable memory, or a register.
- The storage medium is located in the memory 802; the processor 801 reads the instructions in the memory 802 and performs the steps of the above method in combination with its hardware.
- The processor 801 is configured to determine that the input audio signal is an audio signal to be determined.
- The processor 801 is configured to determine the weight of the subband signal-to-noise ratio (SNR) of each subband in the audio signal, where the weight of the subband SNR of a high-frequency-end subband whose subband SNR is greater than the first preset threshold is greater than the weights of the subband SNRs of the other subbands, and to determine an enhanced SSNR according to the weights of the subband SNRs of the subbands in the audio signal and the subband SNRs of the subbands, where the enhanced SSNR is greater than the reference SSNR.
- The processor 801 is configured to compare the enhanced SSNR with a voice activity detection (VAD) decision threshold to determine whether the audio signal is an active signal.
- The apparatus 800 shown in FIG. 8 can determine the characteristics of the input audio signal, determine the enhanced SSNR in a manner corresponding to those characteristics, and compare the enhanced SSNR with the VAD decision threshold, so that the proportion of active signals that are missed can be reduced.
- The processor 801 is specifically configured to determine, according to the subband SNR of the audio signal, that the audio signal is an audio signal to be determined.
- In one case, the processor 801 is specifically configured to determine that the audio signal is an audio signal to be determined when the number of high-frequency-end subbands in the audio signal whose subband SNR is greater than the first preset threshold is greater than the first quantity.
- In another case, the processor 801 is specifically configured to determine that the audio signal is an audio signal to be determined when the number of high-frequency-end subbands in the audio signal whose subband SNR is greater than the first preset threshold is greater than the second quantity and the number of low-frequency-end subbands in the audio signal whose subband SNR is less than the second preset threshold is greater than the third quantity.
- The first preset threshold and the second preset threshold may be obtained from statistics over a large number of speech samples. Specifically, the subband SNRs of the high-frequency-end subbands are collected over a large number of unvoiced speech samples containing background noise, and the first preset threshold is determined from them such that the subband SNRs of most of the high-frequency-end subbands in these unvoiced samples are greater than the threshold. Similarly, the subband SNRs of the low-frequency-end subbands are collected over these unvoiced speech samples, and the second preset threshold is determined from them such that the subband SNRs of most of the low-frequency-end subbands in these unvoiced samples are less than the threshold.
- The first quantity, the second quantity, and the third quantity are also obtained from statistics.
- Taking the first quantity as an example: in a large number of noisy unvoiced speech sample frames, the number of high-frequency-end subbands whose subband SNR is greater than the first preset threshold is counted, and the first quantity is determined from these counts such that, in most of these unvoiced speech sample frames, the number of high-frequency-end subbands whose subband SNR is greater than the first preset threshold is greater than the first quantity.
- The second quantity is obtained in a similar way to the first quantity. The second quantity may be the same as or different from the first quantity.
- Similarly, for the third quantity, the number of low-frequency-end subbands whose subband SNR is less than the second preset threshold is counted in a large number of noisy unvoiced speech sample frames, and the third quantity is determined from these counts such that, in most of these unvoiced speech sample frames, the number of low-frequency-end subbands whose subband SNR is less than the second preset threshold is greater than the third quantity.
- FIG. 9 is a structural block diagram of another apparatus according to an embodiment of the present invention.
- the apparatus 900 shown in FIG. 9 can perform the various steps of FIG.
- The apparatus 900 includes a first determining unit 901, a second determining unit 902, a third determining unit 903, and a fourth determining unit 904.
- The first determining unit 901 is configured to determine that the input audio signal is an audio signal to be determined.
- The second determining unit 902 is configured to acquire a reference SSNR of the audio signal.
- The reference SSNR may be the SSNR calculated using Equation 1.1.
- The third determining unit 903 is configured to reduce the reference VAD decision threshold by using a preset algorithm to obtain a reduced VAD decision threshold.
- The reference VAD decision threshold may be a default VAD decision threshold, which may be stored in advance or calculated on the fly; the reference VAD decision threshold may be calculated by using existing techniques.
- The preset algorithm may multiply the reference VAD decision threshold by a coefficient less than 1, or another algorithm may be used. The embodiments of the present invention do not limit the specific algorithm used.
- The preset algorithm may appropriately reduce the VAD decision threshold such that the enhanced SSNR is greater than the reduced VAD decision threshold, so that the proportion of active signals that are missed can be reduced.
- The fourth determining unit 904 is configured to compare the reference SSNR with the reduced VAD decision threshold to determine whether the audio signal is an active signal.
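The threshold-reduction path of the apparatus 900 can be sketched as follows, using the example preset algorithm of multiplying the reference threshold by a coefficient less than 1. The coefficient value 0.9, the function names, the assumption of a positive reference threshold, and treating equality as inactive are all illustrative choices, not values from the source.

```python
def reduce_threshold(reference_threshold: float, coefficient: float = 0.9) -> float:
    """Reduce a (positive) reference VAD decision threshold by multiplying
    it by a coefficient strictly between 0 and 1."""
    if not 0.0 < coefficient < 1.0:
        raise ValueError("coefficient must be between 0 and 1 (exclusive)")
    return reference_threshold * coefficient


def is_active_with_reduced_threshold(reference_ssnr: float,
                                     reference_threshold: float,
                                     coefficient: float = 0.9) -> bool:
    """Compare the unmodified reference SSNR with the reduced threshold."""
    return reference_ssnr > reduce_threshold(reference_threshold, coefficient)
```

With a hypothetical reference threshold of 10, a frame whose reference SSNR is 9.5 is judged active against the reduced threshold of 9, although it would be judged inactive against the unreduced threshold.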
- The first determining unit 901 is specifically configured to determine, according to an SNR of the audio signal, that the audio signal is an audio signal to be determined.
- In one case where the first determining unit 901 determines according to the SNR of the audio signal that the audio signal is an audio signal to be determined, the first determining unit 901 is specifically configured to determine that the audio signal is an audio signal to be determined when the number of high-frequency-end subbands in the audio signal whose subband SNR is greater than the first preset threshold is greater than the first quantity.
- In another such case, the first determining unit 901 is specifically configured to determine that the audio signal is an audio signal to be determined when the number of high-frequency-end subbands in the audio signal whose subband SNR is greater than the first preset threshold is greater than the second quantity and the number of low-frequency-end subbands in the audio signal whose subband SNR is less than the second preset threshold is greater than the third quantity.
- In a further such case, the first determining unit 901 is specifically configured to determine that the audio signal is an audio signal to be determined when the number of subbands in the audio signal whose subband SNR is greater than the third preset threshold is greater than the fourth quantity.
- The first determining unit 901 is specifically configured to determine, in the case that the audio signal is an unvoiced signal, that the audio signal is an audio signal to be determined.
- Whether the audio signal is an unvoiced signal can be determined by detecting the zero-crossing rate (ZCR) of the audio signal.
- Specifically, when the ZCR is greater than a ZCR threshold, the audio signal is determined to be an unvoiced signal, where the ZCR threshold is determined through a large number of experiments.
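A minimal sketch of ZCR-based unvoiced detection follows. The ZCR definition used here (fraction of adjacent sample pairs whose signs differ, with zero counted as non-negative), the threshold value 0.3, and the function names are assumptions for illustration; the source only states that the ZCR threshold is found experimentally.

```python
from typing import Sequence


def zero_crossing_rate(samples: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ.
    Zero is treated as non-negative."""
    if len(samples) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)


def is_unvoiced(samples: Sequence[float], zcr_threshold: float = 0.3) -> bool:
    """Classify a frame as unvoiced when its ZCR exceeds the threshold.
    The default threshold is a placeholder, not an experimentally tuned value."""
    return zero_crossing_rate(samples) > zcr_threshold
```

An alternating frame such as `[1, -1, 1, -1, 1]` crosses zero between every pair of samples and is classified as unvoiced, whereas a frame that never changes sign, such as `[1, 2, 3, 2, 1]`, is not.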
- The first preset threshold and the second preset threshold may be obtained from statistics over a large number of speech samples. Specifically, the subband SNRs of the high-frequency-end subbands are collected over a large number of unvoiced speech samples containing background noise, and the first preset threshold is determined from them such that the subband SNRs of most of the high-frequency-end subbands in these unvoiced samples are greater than the threshold. Similarly, the subband SNRs of the low-frequency-end subbands are collected over these unvoiced speech samples, and the second preset threshold is determined from them such that the subband SNRs of most of the low-frequency-end subbands in these unvoiced samples are less than the threshold.
- The third preset threshold is also obtained from statistics. Specifically, it is determined from the subband SNRs of a large number of noise signals such that the subband SNRs of most subbands of these noise signals are less than the threshold.
- The first quantity, the second quantity, the third quantity, and the fourth quantity are also obtained from statistics. Taking the first quantity as an example: in a large number of noisy speech samples, the number of high-frequency-end subbands whose subband SNR is greater than the first preset threshold is counted, and the first quantity is determined from these counts such that, in the vast majority of these speech samples, the number of high-frequency-end subbands whose subband SNR is greater than the first preset threshold is greater than the first quantity.
- The second quantity is determined in a similar way to the first quantity. The second quantity may be the same as or different from the first quantity.
- For the third quantity, the number of low-frequency-end subbands whose subband SNR is less than the second preset threshold is counted in a large number of noisy speech samples, and the third quantity is determined from these counts such that, in the vast majority of these speech samples, the number of low-frequency-end subbands whose subband SNR is less than the second preset threshold is greater than the third quantity.
- For the fourth quantity, the number of subbands whose subband SNR is greater than the third preset threshold is counted in a large number of noisy speech samples, and the fourth quantity is determined from these counts such that, in the vast majority of these speech samples, the number of subbands whose subband SNR is greater than the third preset threshold is greater than the fourth quantity.
- The apparatus 900 shown in FIG. 9 can determine the characteristics of the input audio signal, reduce the reference VAD decision threshold according to those characteristics, and compare the SSNR with the reduced VAD decision threshold, so that the proportion of active signals that are missed can be reduced.
- FIG. 10 is a structural block diagram of another apparatus according to an embodiment of the present invention.
- the apparatus 1000 shown in FIG. 10 can perform the various steps of FIG.
- The apparatus 1000 includes a processor 1001 and a memory 1002.
- The processor 1001 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
- The processor may implement or perform the methods, steps, and logical block diagrams disclosed in the embodiments of the present invention.
- The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
- The steps of the methods disclosed in the embodiments of the present invention may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor.
- The software module may be located in a well-known storage medium such as a random access memory (RAM), a flash memory, a read-only memory (ROM), a programmable read-only memory, an electrically erasable programmable memory, or a register.
- The storage medium is located in the memory 1002; the processor 1001 reads the instructions in the memory 1002 and performs the steps of the above method in combination with its hardware.
- The processor 1001 is configured to determine that the input audio signal is an audio signal to be determined.
- The processor 1001 is configured to acquire a reference SSNR of the audio signal.
- The reference SSNR may be the SSNR calculated using Equation 1.1.
- The processor 1001 is configured to reduce the reference VAD decision threshold by using a preset algorithm to obtain a reduced VAD decision threshold.
- The reference VAD decision threshold may be a default VAD decision threshold, which may be stored in advance or calculated on the fly; the reference VAD decision threshold may be calculated by using existing techniques.
- The preset algorithm may multiply the reference VAD decision threshold by a coefficient less than 1, or another algorithm may be used. The embodiments of the present invention do not limit the specific algorithm used.
- The preset algorithm may appropriately reduce the VAD decision threshold such that the enhanced SSNR is greater than the reduced VAD decision threshold, so that the proportion of active signals that are missed can be reduced.
- The processor 1001 is configured to compare the reference SSNR with the reduced VAD decision threshold to determine whether the audio signal is an active signal.
- The processor 1001 is specifically configured to determine, according to an SNR of the audio signal, that the audio signal is an audio signal to be determined.
- In one case where the processor 1001 determines according to the SNR of the audio signal that the audio signal is an audio signal to be determined, the processor 1001 is specifically configured to determine that the audio signal is an audio signal to be determined when the number of high-frequency-end subbands in the audio signal whose subband SNR is greater than the first preset threshold is greater than the first quantity.
- In another such case, the processor 1001 is specifically configured to determine that the audio signal is an audio signal to be determined when the number of high-frequency-end subbands in the audio signal whose subband SNR is greater than the first preset threshold is greater than the second quantity and the number of low-frequency-end subbands in the audio signal whose subband SNR is less than the second preset threshold is greater than the third quantity.
- In a further such case, the processor 1001 is specifically configured to determine that the audio signal is an audio signal to be determined when the number of subbands in the audio signal whose subband SNR is greater than the third preset threshold is greater than the fourth quantity.
- The processor 1001 is specifically configured to determine, in the case that the audio signal is an unvoiced signal, that the audio signal is an audio signal to be determined.
- Those skilled in the art will understand that there are various methods for detecting whether an audio signal is an unvoiced signal.
- For example, whether the audio signal is an unvoiced signal can be determined by detecting the zero-crossing rate (ZCR) of the audio signal.
- Specifically, when the ZCR is greater than a ZCR threshold, the audio signal is determined to be an unvoiced signal, where the ZCR threshold is determined through a large number of experiments.
- The first preset threshold and the second preset threshold may be obtained from statistics over a large number of speech samples. Specifically, the subband SNRs of the high-frequency-end subbands are collected over a large number of unvoiced speech samples containing background noise, and the first preset threshold is determined from them such that the subband SNRs of most of the high-frequency-end subbands in these unvoiced samples are greater than the threshold. Similarly, the subband SNRs of the low-frequency-end subbands are collected over these unvoiced speech samples, and the second preset threshold is determined from them such that the subband SNRs of most of the low-frequency-end subbands in these unvoiced samples are less than the threshold.
- The third preset threshold is also obtained from statistics. Specifically, it is determined from the subband SNRs of a large number of noise signals such that the subband SNRs of most subbands of these noise signals are less than the threshold.
- The first quantity, the second quantity, the third quantity, and the fourth quantity are also obtained from statistics. Taking the first quantity as an example: in a large number of noisy speech samples, the number of high-frequency-end subbands whose subband SNR is greater than the first preset threshold is counted, and the first quantity is determined from these counts such that, in the vast majority of these speech samples, the number of high-frequency-end subbands whose subband SNR is greater than the first preset threshold is greater than the first quantity.
- The second quantity is determined in a similar way to the first quantity. The second quantity may be the same as or different from the first quantity.
- For the third quantity, the number of low-frequency-end subbands whose subband SNR is less than the second preset threshold is counted in a large number of noisy speech samples, and the third quantity is determined from these counts such that, in the vast majority of these speech samples, the number of low-frequency-end subbands whose subband SNR is less than the second preset threshold is greater than the third quantity.
- For the fourth quantity, the number of subbands whose subband SNR is greater than the third preset threshold is counted in a large number of noisy speech samples, and the fourth quantity is determined from these counts such that, in the vast majority of these speech samples, the number of subbands whose subband SNR is greater than the third preset threshold is greater than the fourth quantity.
- The apparatus 1000 shown in FIG. 10 can determine the characteristics of the input audio signal, reduce the reference VAD decision threshold according to those characteristics, and compare the SSNR with the reduced VAD decision threshold, so that the proportion of active signals that are missed can be reduced.
- The disclosed systems, apparatuses, and methods may be implemented in other manners.
- The apparatus embodiments described above are merely illustrative.
- For example, the division into units is merely a division by logical function. In an actual implementation there may be other division manners; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
- The mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, apparatus, or unit, and may be electrical, mechanical, or in another form.
- The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
- The functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
- If the functions are implemented in the form of a software functional unit and sold or used as a standalone product, they may be stored in a computer-readable storage medium.
- The technical solution of the present invention, in essence or in the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform all or part of the steps of the methods described in the embodiments of the present invention.
- The foregoing storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and the like.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Multimedia (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Computational Linguistics (AREA)
- Acoustics & Sound (AREA)
- Signal Processing (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Telephone Function (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Noise Elimination (AREA)
- Telephonic Communication Services (AREA)
- Signal Processing Not Specific To The Method Of Recording And Reproducing (AREA)
- Circuit For Audible Band Transducer (AREA)
- User Interface Of Digital Computer (AREA)
Claims (42)
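- A method for detecting an audio signal, characterized in that the method comprises: determining that an input audio signal is an audio signal to be determined; determining an enhanced segmental signal-to-noise ratio (SSNR) of the audio signal, wherein the enhanced SSNR is greater than a reference SSNR; and comparing the enhanced SSNR with a voice activity detection (VAD) decision threshold to determine whether the audio signal is an active signal.
- The method according to claim 1, characterized in that the determining that an input audio signal is an audio signal to be determined comprises: determining, according to a subband signal-to-noise ratio (SNR) of the audio signal, that the audio signal is an audio signal to be determined.
- The method according to claim 2, characterized in that the determining that an input audio signal is an audio signal to be determined comprises: determining that the audio signal is an audio signal to be determined in a case where the number of high-frequency-end subbands in the audio signal whose subband SNR is greater than a first preset threshold is greater than a first quantity.
- The method according to claim 2, characterized in that the determining that an input audio signal is an audio signal to be determined comprises: determining that the audio signal is an audio signal to be determined in a case where the number of high-frequency-end subbands in the audio signal whose subband SNR is greater than a first preset threshold is greater than a second quantity and the number of low-frequency-end subbands in the audio signal whose subband SNR is less than a second preset threshold is greater than a third quantity.
- The method according to claim 2, characterized in that the determining that an input audio signal is an audio signal to be determined comprises: determining that the audio signal is an audio signal to be determined in a case where the number of subbands in the audio signal whose subband SNR is greater than a third preset threshold is greater than a fourth quantity.
- The method according to claim 1, characterized in that the determining that an input audio signal is an audio signal to be determined comprises: determining that the audio signal is an audio signal to be determined in a case where the audio signal is determined to be an unvoiced signal.
- The method according to claim 3 or 4, characterized in that the determining an enhanced segmental signal-to-noise ratio (SSNR) of the audio signal comprises: determining the weight of the subband SNR of each subband in the audio signal, wherein the weight of the subband SNR of a high-frequency-end subband whose subband SNR is greater than the first preset threshold is greater than the weights of the subband SNRs of the other subbands; and determining the enhanced SSNR according to the weights of the subband SNRs of the subbands in the audio signal and the subband SNRs of the subbands.
- The method according to any one of claims 1 to 6, characterized in that the determining an enhanced segmental signal-to-noise ratio (SSNR) of the audio signal comprises: determining a reference SSNR of the audio signal; and determining the enhanced SSNR according to the reference SSNR of the audio signal.
- The method according to claim 8, characterized in that the determining the enhanced SSNR according to the reference SSNR of the audio signal comprises: determining the enhanced SSNR by using the following formula: SSNR′=x*SSNR+y, wherein SSNR represents the reference SSNR, SSNR′ represents the enhanced SSNR, and x and y represent enhancement parameters.
- The method according to claim 8, characterized in that the determining the enhanced SSNR according to the reference SSNR of the audio signal comprises: determining the enhanced SSNR by using the following formula: SSNR′=f(x)*SSNR+h(y), wherein SSNR represents the reference SSNR, SSNR′ represents the enhanced SSNR, and f(x) and h(y) represent enhancement functions.
- The method according to any one of claims 1 to 10, characterized in that, before the comparing the enhanced SSNR with a voice activity detection (VAD) decision threshold, the method further comprises: reducing the VAD decision threshold by using a preset algorithm to obtain a reduced VAD decision threshold; and the comparing the enhanced SSNR with a VAD decision threshold to determine whether the audio signal is an active signal specifically comprises: comparing the enhanced SSNR with the reduced VAD decision threshold to determine whether the audio signal is an active signal.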
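- A method for detecting an audio signal, wherein the method comprises: determining that an input audio signal is a to-be-determined audio signal; determining a weight of the sub-band signal-to-noise ratio (SNR) of each sub-band in the audio signal, wherein the weight of the sub-band SNR of a high-frequency end sub-band whose sub-band SNR is greater than a first preset threshold is greater than the weights of the sub-band SNRs of the other sub-bands; determining an enhanced segmental signal-to-noise ratio (SSNR) according to the weight of the sub-band SNR of each sub-band and the sub-band SNR of each sub-band in the audio signal, wherein the enhanced SSNR is greater than a reference SSNR; and comparing the enhanced SSNR with a voice activity detection (VAD) decision threshold to determine whether the audio signal is an active signal.
- The method according to claim 12, wherein the determining that an input audio signal is a to-be-determined audio signal comprises: determining, according to the sub-band SNRs of the audio signal, that the audio signal is a to-be-determined audio signal.
- The method according to claim 13, wherein the determining that an input audio signal is a to-be-determined audio signal comprises: when the quantity of high-frequency end sub-bands whose sub-band SNRs are greater than the first preset threshold in the audio signal is greater than a first quantity, determining that the audio signal is a to-be-determined audio signal.
- The method according to claim 13, wherein the determining that an input audio signal is a to-be-determined audio signal comprises: when the quantity of high-frequency end sub-bands whose sub-band SNRs are greater than the first preset threshold in the audio signal is greater than a second quantity, and the quantity of low-frequency end sub-bands whose sub-band SNRs are less than a second preset threshold in the audio signal is greater than a third quantity, determining that the audio signal is a to-be-determined audio signal.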
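- A method for detecting an audio signal, wherein the method comprises: determining that an input audio signal is a to-be-determined audio signal; obtaining a reference segmental signal-to-noise ratio (SSNR) of the audio signal; reducing a reference voice activity detection (VAD) decision threshold by using a preset algorithm to obtain a reduced VAD decision threshold; and comparing the reference SSNR with the reduced VAD decision threshold to determine whether the audio signal is an active signal.
- The method according to claim 16, wherein the determining that an input audio signal is a to-be-determined audio signal comprises: determining, according to sub-band signal-to-noise ratios (SNRs) of the audio signal, that the audio signal is a to-be-determined audio signal.
- The method according to claim 17, wherein the determining that an input audio signal is a to-be-determined audio signal comprises: when the quantity of high-frequency end sub-bands whose sub-band SNRs are greater than a first preset threshold in the audio signal is greater than a first quantity, determining that the audio signal is a to-be-determined audio signal.
- The method according to claim 17, wherein the determining that an input audio signal is a to-be-determined audio signal comprises: when the quantity of high-frequency end sub-bands whose sub-band SNRs are greater than a first preset threshold in the audio signal is greater than a second quantity, and the quantity of low-frequency end sub-bands whose sub-band SNRs are less than a second preset threshold in the audio signal is greater than a third quantity, determining that the audio signal is a to-be-determined audio signal.
- The method according to claim 17, wherein the determining that an input audio signal is a to-be-determined audio signal comprises: when the quantity of sub-bands whose sub-band SNR values are greater than a third preset threshold in the audio signal is greater than a fourth quantity, determining that the audio signal is a to-be-determined audio signal.
- The method according to claim 16, wherein the determining that an input audio signal is a to-be-determined audio signal comprises: when it is determined that the audio signal is an unvoiced signal, determining that the audio signal is a to-be-determined audio signal.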
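- An apparatus, wherein the apparatus comprises: a first determining unit, configured to determine that an input audio signal is a to-be-determined audio signal; a second determining unit, configured to determine an enhanced segmental signal-to-noise ratio (SSNR) of the audio signal, wherein the enhanced SSNR is greater than a reference SSNR; and a third determining unit, configured to compare the enhanced SSNR with a voice activity detection (VAD) decision threshold to determine whether the audio signal is an active signal.
- The apparatus according to claim 22, wherein the first determining unit is specifically configured to determine, according to sub-band signal-to-noise ratios (SNRs) of the audio signal, that the audio signal is a to-be-determined audio signal.
- The apparatus according to claim 23, wherein the first determining unit is specifically configured to: when the quantity of high-frequency end sub-bands whose sub-band SNRs are greater than a first preset threshold in the audio signal is greater than a first quantity, determine that the audio signal is a to-be-determined audio signal.
- The apparatus according to claim 23, wherein the first determining unit is specifically configured to: when the quantity of high-frequency end sub-bands whose sub-band SNRs are greater than a first preset threshold in the audio signal is greater than a second quantity, and the quantity of low-frequency end sub-bands whose sub-band SNRs are less than a second preset threshold in the audio signal is greater than a third quantity, determine that the audio signal is a to-be-determined audio signal.
- The apparatus according to claim 23, wherein the first determining unit is specifically configured to: when the quantity of sub-bands whose sub-band SNR values are greater than a third preset threshold in the audio signal is greater than a fourth quantity, determine that the audio signal is a to-be-determined audio signal.
- The apparatus according to claim 22, wherein the first determining unit is specifically configured to: when it is determined that the audio signal is an unvoiced signal, determine that the audio signal is a to-be-determined audio signal.
- The apparatus according to claim 24 or 25, wherein the second determining unit is specifically configured to: determine a weight of the sub-band SNR of each sub-band in the audio signal, wherein the weight of the sub-band SNR of a high-frequency end sub-band whose sub-band SNR is greater than the first preset threshold is greater than the weights of the sub-band SNRs of the other sub-bands; and determine the enhanced SSNR according to the weight of the sub-band SNR of each sub-band and the sub-band SNR of each sub-band in the audio signal.
- The apparatus according to any one of claims 22 to 27, wherein the second determining unit is specifically configured to determine a reference SSNR of the audio signal, and determine the enhanced SSNR according to the reference SSNR of the audio signal.
- The apparatus according to claim 29, wherein the second determining unit is specifically configured to determine the enhanced SSNR by using the following formula: SSNR′ = x*SSNR + y, where SSNR denotes the reference SSNR, SSNR′ denotes the enhanced SSNR, and x and y denote enhancement parameters.
- The apparatus according to claim 29, wherein the second determining unit is specifically configured to determine the enhanced SSNR by using the following formula: SSNR′ = f(x)*SSNR + h(y), where SSNR denotes the reference SSNR, SSNR′ denotes the enhanced SSNR, and f(x) and h(y) denote enhancement functions.
- The apparatus according to any one of claims 22 to 31, wherein the apparatus further comprises a fourth determining unit; the fourth determining unit is configured to reduce the VAD decision threshold by using a preset algorithm to obtain a reduced VAD decision threshold; and the third determining unit is specifically configured to compare the enhanced SSNR with the reduced VAD decision threshold to determine whether the audio signal is an active signal.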
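- An apparatus, wherein the apparatus comprises: a first determining unit, configured to determine that an input audio signal is a to-be-determined audio signal; a second determining unit, configured to determine a weight of the sub-band signal-to-noise ratio (SNR) of each sub-band in the audio signal, wherein the weight of the sub-band SNR of a high-frequency end sub-band whose sub-band SNR is greater than a first preset threshold is greater than the weights of the sub-band SNRs of the other sub-bands, and to determine an enhanced segmental signal-to-noise ratio (SSNR) according to the weight of the sub-band SNR of each sub-band and the sub-band SNR of each sub-band in the audio signal, wherein the enhanced SSNR is greater than a reference SSNR; and a third determining unit, configured to compare the enhanced SSNR with a voice activity detection (VAD) decision threshold to determine whether the audio signal is an active signal.
- The apparatus according to claim 33, wherein the first determining unit is specifically configured to determine, according to the sub-band SNRs of the audio signal, that the audio signal is a to-be-determined audio signal.
- The apparatus according to claim 34, wherein the first determining unit is specifically configured to: when the quantity of high-frequency end sub-bands whose sub-band SNRs are greater than the first preset threshold in the audio signal is greater than a first quantity, determine that the audio signal is a to-be-determined audio signal.
- The apparatus according to claim 34, wherein the first determining unit is specifically configured to: when the quantity of high-frequency end sub-bands whose sub-band SNRs are greater than the first preset threshold in the audio signal is greater than a second quantity, and the quantity of low-frequency end sub-bands whose sub-band SNRs are less than a second preset threshold in the audio signal is greater than a third quantity, determine that the audio signal is a to-be-determined audio signal.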
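- An apparatus, wherein the apparatus comprises: a first determining unit, configured to determine that an input audio signal is a to-be-determined audio signal; a second determining unit, configured to obtain a reference segmental signal-to-noise ratio (SSNR) of the audio signal; a third determining unit, configured to reduce a reference voice activity detection (VAD) decision threshold by using a preset algorithm to obtain a reduced VAD decision threshold; and a fourth determining unit, configured to compare the reference SSNR with the reduced VAD decision threshold to determine whether the audio signal is an active signal.
- The apparatus according to claim 37, wherein the first determining unit is specifically configured to determine, according to sub-band signal-to-noise ratios (SNRs) of the audio signal, that the audio signal is a to-be-determined audio signal.
- The apparatus according to claim 38, wherein the first determining unit is specifically configured to: when the quantity of high-frequency end sub-bands whose sub-band SNRs are greater than a first preset threshold in the audio signal is greater than a first quantity, determine that the audio signal is a to-be-determined audio signal.
- The apparatus according to claim 38, wherein the first determining unit is specifically configured to: when the quantity of high-frequency end sub-bands whose sub-band SNRs are greater than a first preset threshold in the audio signal is greater than a second quantity, and the quantity of low-frequency end sub-bands whose sub-band SNRs are less than a second preset threshold in the audio signal is greater than a third quantity, determine that the audio signal is a to-be-determined audio signal.
- The apparatus according to claim 38, wherein the first determining unit is specifically configured to: when the quantity of sub-bands whose sub-band SNR values are greater than a third preset threshold in the audio signal is greater than a fourth quantity, determine that the audio signal is a to-be-determined audio signal.
- The apparatus according to claim 37, wherein the first determining unit is specifically configured to: when it is determined that the audio signal is an unvoiced signal, determine that the audio signal is a to-be-determined audio signal.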
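The claimed flow (select a to-be-determined frame, raise its SSNR, compare against the VAD decision threshold) can be illustrated with a minimal Python sketch. This is not the applicant's implementation. The function names, the index `hf_start` marking the first high-frequency end sub-band, the `boost` weight, and all numeric values are hypothetical. The sketch also assumes that the reference SSNR is the plain sum of the sub-band SNRs and that a frame is active when its SSNR exceeds the threshold; the claims themselves fix neither choice.

```python
from typing import Sequence


def is_to_be_determined(sub_snrs: Sequence[float], hf_start: int,
                        snr_threshold: float, min_count: int) -> bool:
    """Count high-frequency end sub-bands whose SNR exceeds the threshold.

    The frame is a to-be-determined audio signal when that count is
    greater than min_count (the "first quantity" in the claims).
    """
    count = sum(1 for snr in sub_snrs[hf_start:] if snr > snr_threshold)
    return count > min_count


def reference_ssnr(sub_snrs: Sequence[float]) -> float:
    """Assumption: the reference SSNR is the sum of the sub-band SNRs."""
    return float(sum(sub_snrs))


def weighted_ssnr(sub_snrs: Sequence[float], hf_start: int,
                  snr_threshold: float, boost: float) -> float:
    """Enhanced SSNR via weights.

    High-frequency end sub-bands above the threshold get weight `boost`
    (hypothetical, > 1); all other sub-bands get weight 1.
    """
    total = 0.0
    for index, snr in enumerate(sub_snrs):
        is_strong_hf = index >= hf_start and snr > snr_threshold
        total += (boost if is_strong_hf else 1.0) * snr
    return total


def linear_ssnr(ssnr: float, x: float, y: float) -> float:
    """Enhanced SSNR via SSNR' = x * SSNR + y (x, y: enhancement parameters)."""
    return x * ssnr + y


def is_active(ssnr: float, vad_threshold: float) -> bool:
    """Assumption: the frame is active when the SSNR exceeds the threshold."""
    return ssnr > vad_threshold


if __name__ == "__main__":
    # Hypothetical frame: 8 sub-bands, the last 4 form the high-frequency end.
    frame = [1.0, 0.5, 0.5, 0.25, 3.0, 4.0, 5.0, 0.25]
    if is_to_be_determined(frame, hf_start=4, snr_threshold=2.0, min_count=2):
        base = reference_ssnr(frame)
        enhanced = weighted_ssnr(frame, hf_start=4, snr_threshold=2.0, boost=2.0)
        print("reference active:", is_active(base, 20.0))
        print("enhanced active:", is_active(enhanced, 20.0))
```

In this sketch the weighted variant is larger than the reference SSNR only because the boosted sub-band SNRs are positive and `boost` exceeds 1. An implementation would need to choose its parameters so that the enhanced SSNR is greater than the reference SSNR, as the claims require.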
Priority Applications (15)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| MYPI2016703030A MY193521A (en) | 2014-03-12 | 2014-12-01 | Method for detecting audio signal and apparatus |
| EP14885786.5A EP3118852B1 (en) | 2014-03-12 | 2014-12-01 | Method and device for detecting audio signal |
| MX2016011750A MX355828B (es) | 2014-03-12 | 2014-12-01 | Método y aparato para la detección de señales de audio. |
| KR1020187021506A KR102005009B1 (ko) | 2014-03-12 | 2014-12-01 | 오디오 신호를 검출하는 방법 및 장치 |
| EP19197660.4A EP3660845B1 (en) | 2014-03-12 | 2014-12-01 | Method for detecting audio signal and apparatus |
| AU2014386442A AU2014386442B9 (en) | 2014-03-12 | 2014-12-01 | Method for detecting audio signal and apparatus |
| SG11201607052SA SG11201607052SA (en) | 2014-03-12 | 2014-12-01 | Method for detecting audio signal and apparatus |
| CA2940487A CA2940487C (en) | 2014-03-12 | 2014-12-01 | Method for detecting audio signal and apparatus |
| RU2016139717A RU2666337C2 (ru) | 2014-03-12 | 2014-12-01 | Способ обнаружения звукового сигнала и устройство |
| JP2016556770A JP6493889B2 (ja) | 2014-03-12 | 2014-12-01 | 音声信号を検出するための方法および装置 |
| KR1020167025280A KR101884220B1 (ko) | 2014-03-12 | 2014-12-01 | 오디오 신호를 검출하는 방법 및 장치 |
| ES14885786T ES2787894T3 (es) | 2014-03-12 | 2014-12-01 | Método y dispositivo para detectar la señal de audio |
| US15/262,263 US10304478B2 (en) | 2014-03-12 | 2016-09-12 | Method for detecting audio signal and apparatus |
| US16/391,893 US10818313B2 (en) | 2014-03-12 | 2019-04-23 | Method for detecting audio signal and apparatus |
| US16/901,846 US11417353B2 (en) | 2014-03-12 | 2020-06-15 | Method for detecting audio signal and apparatus |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201410090386.XA CN104916292B (zh) | 2014-03-12 | 2014-03-12 | 检测音频信号的方法和装置 |
| CN201410090386.X | 2014-03-12 |
Related Child Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US15/262,263 Continuation US10304478B2 (en) | 2014-03-12 | 2016-09-12 | Method for detecting audio signal and apparatus |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2015135344A1 (zh) | 2015-09-17 |
Family
ID=54070889
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2014/092694 Ceased WO2015135344A1 (zh) | 2014-03-12 | 2014-12-01 | 检测音频信号的方法和装置 |
Country Status (14)
| Country | Link |
|---|---|
| US (3) | US10304478B2 (zh) |
| EP (2) | EP3118852B1 (zh) |
| JP (2) | JP6493889B2 (zh) |
| KR (2) | KR102005009B1 (zh) |
| CN (3) | CN107086043B (zh) |
| AU (1) | AU2014386442B9 (zh) |
| CA (1) | CA2940487C (zh) |
| ES (2) | ES2926360T3 (zh) |
| MX (1) | MX355828B (zh) |
| MY (1) | MY193521A (zh) |
| PT (2) | PT3660845T (zh) |
| RU (1) | RU2666337C2 (zh) |
| SG (1) | SG11201607052SA (zh) |
| WO (1) | WO2015135344A1 (zh) |
Families Citing this family (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN107086043B (zh) * | 2014-03-12 | 2020-09-08 | 华为技术有限公司 | 检测音频信号的方法和装置 |
| CN113038353B (zh) * | 2016-04-29 | 2022-08-09 | 荣耀终端有限公司 | 一种语音输入异常的确定方法、装置、终端以及存储介质 |
| CN107393558B (zh) * | 2017-07-14 | 2020-09-11 | 深圳永顺智信息科技有限公司 | 语音活动检测方法及装置 |
| CN107393553B (zh) * | 2017-07-14 | 2020-12-22 | 深圳永顺智信息科技有限公司 | 用于语音活动检测的听觉特征提取方法 |
| CN107393559B (zh) * | 2017-07-14 | 2021-05-18 | 深圳永顺智信息科技有限公司 | 检校语音检测结果的方法及装置 |
| CN107393550B (zh) * | 2017-07-14 | 2021-03-19 | 深圳永顺智信息科技有限公司 | 语音处理方法及装置 |
| US11783809B2 (en) * | 2020-10-08 | 2023-10-10 | Qualcomm Incorporated | User voice activity detection using dynamic classifier |
| EP4404196A4 (en) | 2021-11-09 | 2025-01-22 | Samsung Electronics Co., Ltd. | ELECTRONIC DEVICE FOR CONTROLLING BEAM FORMING AND OPERATING METHOD THEREFOR |
| US20240304203A1 (en) * | 2023-03-06 | 2024-09-12 | Nvidia Corporation | Noise reduction using voice activity detection in audio processing systems and applications |
Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6453291B1 (en) * | 1999-02-04 | 2002-09-17 | Motorola, Inc. | Apparatus and method for voice activity detection in a communication system |
| EP2113908A1 (en) * | 2008-04-30 | 2009-11-04 | QNX Software Systems (Wavemakers), Inc. | Robust downlink speech and noise detector |
| CN102044243A (zh) * | 2009-10-15 | 2011-05-04 | 华为技术有限公司 | 语音激活检测方法与装置、编码器 |
| CN102044242A (zh) * | 2009-10-15 | 2011-05-04 | 华为技术有限公司 | 语音激活检测方法、装置和电子设备 |
| CN102576528A (zh) * | 2009-10-19 | 2012-07-11 | 瑞典爱立信有限公司 | 用于语音活动检测的检测器和方法 |
| CN102959625A (zh) * | 2010-12-24 | 2013-03-06 | 华为技术有限公司 | 自适应地检测输入音频信号中的话音活动的方法和设备 |
Family Cites Families (48)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JPS59182498A (ja) * | 1983-04-01 | 1984-10-17 | 日本電気株式会社 | 音声検出回路 |
| JPS63259596A (ja) * | 1987-04-16 | 1988-10-26 | 株式会社日立製作所 | 音声区間検出方式 |
| CA2153170C (en) * | 1993-11-30 | 2000-12-19 | At&T Corp. | Transmitted noise reduction in communications systems |
| FI100840B (fi) * | 1995-12-12 | 1998-02-27 | Nokia Mobile Phones Ltd | Kohinanvaimennin ja menetelmä taustakohinan vaimentamiseksi kohinaises ta puheesta sekä matkaviestin |
| US5991718A (en) * | 1998-02-27 | 1999-11-23 | At&T Corp. | System and method for noise threshold adaptation for voice activity detection in nonstationary noise environments |
| US6466906B2 (en) * | 1999-01-06 | 2002-10-15 | Dspc Technologies Ltd. | Noise padding and normalization in dynamic time warping |
| US6324509B1 (en) * | 1999-02-08 | 2001-11-27 | Qualcomm Incorporated | Method and apparatus for accurate endpointing of speech in the presence of noise |
| JP2001236085A (ja) * | 2000-02-25 | 2001-08-31 | Matsushita Electric Ind Co Ltd | 音声区間検出装置、定常雑音区間検出装置、非定常雑音区間検出装置、及び雑音区間検出装置 |
| JP3588030B2 (ja) * | 2000-03-16 | 2004-11-10 | 三菱電機株式会社 | 音声区間判定装置及び音声区間判定方法 |
| US6898566B1 (en) * | 2000-08-16 | 2005-05-24 | Mindspeed Technologies, Inc. | Using signal to noise ratio of a speech signal to adjust thresholds for extracting speech parameters for coding the speech signal |
| CN1175398C (zh) * | 2000-11-18 | 2004-11-10 | 中兴通讯股份有限公司 | 一种从噪声环境中识别出语音和音乐的声音活动检测方法 |
| WO2002080148A1 (en) * | 2001-03-28 | 2002-10-10 | Mitsubishi Denki Kabushiki Kaisha | Noise suppressor |
| US7941313B2 (en) * | 2001-05-17 | 2011-05-10 | Qualcomm Incorporated | System and method for transmitting speech activity information ahead of speech features in a distributed voice recognition system |
| US7203643B2 (en) | 2001-06-14 | 2007-04-10 | Qualcomm Incorporated | Method and apparatus for transmitting speech activity in distributed voice recognition systems |
| US6937980B2 (en) * | 2001-10-02 | 2005-08-30 | Telefonaktiebolaget Lm Ericsson (Publ) | Speech recognition using microphone antenna array |
| JP4281349B2 (ja) * | 2001-12-25 | 2009-06-17 | パナソニック株式会社 | 電話装置 |
| US7024353B2 (en) * | 2002-08-09 | 2006-04-04 | Motorola, Inc. | Distributed speech recognition with back-end voice activity detection apparatus and method |
| US7146315B2 (en) * | 2002-08-30 | 2006-12-05 | Siemens Corporate Research, Inc. | Multichannel voice detection in adverse environments |
| US7162420B2 (en) * | 2002-12-10 | 2007-01-09 | Liberato Technologies, Llc | System and method for noise reduction having first and second adaptive filters |
| JP4490090B2 (ja) * | 2003-12-25 | 2010-06-23 | 株式会社エヌ・ティ・ティ・ドコモ | 有音無音判定装置および有音無音判定方法 |
| CA2454296A1 (en) | 2003-12-29 | 2005-06-29 | Nokia Corporation | Method and device for speech enhancement in the presence of background noise |
| US8340309B2 (en) * | 2004-08-06 | 2012-12-25 | Aliphcom, Inc. | Noise suppressing multi-microphone headset |
| CN100369113C (zh) * | 2004-12-31 | 2008-02-13 | 中国科学院自动化研究所 | 利用增益自适应提高语音识别率的方法 |
| US8175877B2 (en) * | 2005-02-02 | 2012-05-08 | At&T Intellectual Property Ii, L.P. | Method and apparatus for predicting word accuracy in automatic speech recognition systems |
| WO2007091956A2 (en) | 2006-02-10 | 2007-08-16 | Telefonaktiebolaget Lm Ericsson (Publ) | A voice detector and a method for suppressing sub-bands in a voice detector |
| US8032370B2 (en) * | 2006-05-09 | 2011-10-04 | Nokia Corporation | Method, apparatus, system and software product for adaptation of voice activity detection parameters based on the quality of the coding modes |
| US8311814B2 (en) * | 2006-09-19 | 2012-11-13 | Avaya Inc. | Efficient voice activity detector to detect fixed power signals |
| CN101197130B (zh) * | 2006-12-07 | 2011-05-18 | 华为技术有限公司 | 声音活动检测方法和声音活动检测器 |
| US7769585B2 (en) * | 2007-04-05 | 2010-08-03 | Avidyne Corporation | System and method of voice activity detection in noisy environments |
| CN101320559B (zh) * | 2007-06-07 | 2011-05-18 | 华为技术有限公司 | 一种声音激活检测装置及方法 |
| US8954324B2 (en) * | 2007-09-28 | 2015-02-10 | Qualcomm Incorporated | Multiple microphone voice activity detector |
| KR101335417B1 (ko) | 2008-03-31 | 2013-12-05 | (주)트란소노 | 노이지 음성 신호의 처리 방법과 이를 위한 장치 및 컴퓨터판독 가능한 기록매체 |
| US8768690B2 (en) * | 2008-06-20 | 2014-07-01 | Qualcomm Incorporated | Coding scheme selection for low-bit-rate applications |
| WO2010091339A1 (en) | 2009-02-06 | 2010-08-12 | University Of Ottawa | Method and system for noise reduction for speech enhancement in hearing aid |
| JP5337530B2 (ja) * | 2009-02-25 | 2013-11-06 | 京セラ株式会社 | 無線基地局および無線通信方法 |
| KR20110001130A (ko) * | 2009-06-29 | 2011-01-06 | 삼성전자주식회사 | 가중 선형 예측 변환을 이용한 오디오 신호 부호화 및 복호화 장치 및 그 방법 |
| WO2011049515A1 (en) * | 2009-10-19 | 2011-04-28 | Telefonaktiebolaget Lm Ericsson (Publ) | Method and voice activity detector for a speech encoder |
| US8898058B2 (en) * | 2010-10-25 | 2014-11-25 | Qualcomm Incorporated | Systems, methods, and apparatus for voice activity detection |
| EP3252771B1 (en) | 2010-12-24 | 2019-05-01 | Huawei Technologies Co., Ltd. | A method and an apparatus for performing a voice activity detection |
| WO2012083552A1 (en) * | 2010-12-24 | 2012-06-28 | Huawei Technologies Co., Ltd. | Method and apparatus for voice activity detection |
| US9099098B2 (en) * | 2012-01-20 | 2015-08-04 | Qualcomm Incorporated | Voice activity detection in presence of background noise |
| WO2013118192A1 (ja) * | 2012-02-10 | 2013-08-15 | 三菱電機株式会社 | 雑音抑圧装置 |
| JP5862349B2 (ja) * | 2012-02-16 | 2016-02-16 | 株式会社Jvcケンウッド | ノイズ低減装置、音声入力装置、無線通信装置、およびノイズ低減方法 |
| CN103325380B (zh) * | 2012-03-23 | 2017-09-12 | 杜比实验室特许公司 | 用于信号增强的增益后处理 |
| US20130282373A1 (en) | 2012-04-23 | 2013-10-24 | Qualcomm Incorporated | Systems and methods for audio signal processing |
| US9524735B2 (en) * | 2014-01-31 | 2016-12-20 | Apple Inc. | Threshold adaptation in two-channel noise estimation and voice activity detection |
| CN107086043B (zh) * | 2014-03-12 | 2020-09-08 | 华为技术有限公司 | 检测音频信号的方法和装置 |
| US9775113B2 (en) * | 2014-12-11 | 2017-09-26 | Mediatek Inc. | Voice wakeup detecting device with digital microphone and associated method |
2014
- 2014-03-12: CN application CN201710313043.9A, patent CN107086043B (zh), status: Active
- 2014-03-12: CN application CN201710312455.0A, patent CN107293287B (zh), status: Active
- 2014-03-12: CN application CN201410090386.XA, patent CN104916292B (zh), status: Active
- 2014-12-01: ES application ES19197660T, patent ES2926360T3 (es), status: Active
- 2014-12-01: KR application KR1020187021506A, patent KR102005009B1 (ko), status: Active
- 2014-12-01: PT application PT191976604T, patent PT3660845T (pt), status: Unknown
- 2014-12-01: WO application PCT/CN2014/092694, patent WO2015135344A1 (zh), status: Ceased
- 2014-12-01: MY application MYPI2016703030A, patent MY193521A (en), status: Unknown
- 2014-12-01: SG application SG11201607052SA, patent SG11201607052SA (en), status: Unknown
- 2014-12-01: EP application EP14885786.5A, patent EP3118852B1 (en), status: Active
- 2014-12-01: MX application MX2016011750A, patent MX355828B (es), status: IP Right Grant
- 2014-12-01: JP application JP2016556770A, patent JP6493889B2 (ja), status: Active
- 2014-12-01: KR application KR1020167025280A, patent KR101884220B1 (ko), status: Active
- 2014-12-01: EP application EP19197660.4A, patent EP3660845B1 (en), status: Active
- 2014-12-01: PT application PT148857865T, patent PT3118852T (pt), status: Unknown
- 2014-12-01: RU application RU2016139717A, patent RU2666337C2 (ru), status: Active
- 2014-12-01: ES application ES14885786T, patent ES2787894T3 (es), status: Active
- 2014-12-01: CA application CA2940487A, patent CA2940487C (en), status: Active
- 2014-12-01: AU application AU2014386442A, patent AU2014386442B9 (en), status: Active
2016
- 2016-09-12: US application US15/262,263, patent US10304478B2 (en), status: Active
2018
- 2018-11-30: JP application JP2018225323A, patent JP6793706B2 (ja), status: Active
2019
- 2019-04-23: US application US16/391,893, patent US10818313B2 (en), status: Active
2020
- 2020-06-15: US application US16/901,846, patent US11417353B2 (en), status: Active
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP3118852A4 * |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN107040359A (zh) * | 2017-05-08 | 2017-08-11 | 海能达通信股份有限公司 | 一种语音呼叫过程中携带随路信令的方法、装置及设备 |
| CN107040359B (zh) * | 2017-05-08 | 2021-01-19 | 海能达通信股份有限公司 | 一种语音呼叫过程中携带随路信令的方法、装置及设备 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| WO2015135344A1 (zh) | | 检测音频信号的方法和装置 |
| US10867620B2 (en) | | Sibilance detection and mitigation |
| CN104637489B (zh) | | 声音信号处理的方法和装置 |
| CN106486131A (zh) | | 一种语音去噪的方法及装置 |
| Tabibian et al. | | Speech enhancement using a wavelet thresholding method based on symmetric Kullback–Leibler divergence |
| US20140244247A1 (en) | | Keyboard typing detection and suppression |
| US11610601B2 (en) | | Method and apparatus for determining speech presence probability and electronic device |
| CN105489226A (zh) | | 一种用于拾音器的多窗谱估计的维纳滤波语音增强方法 |
| EP3261089B1 (en) | | Sibilance detection and mitigation |
| CN114827833B (zh) | | 啸叫抑制方法、装置、芯片及电子设备 |
| US20180108345A1 (en) | | Device and method for audio frame processing |
| CN116312592A (zh) | | 语音广播扩声系统的啸叫处理方法和装置 |
| Wang et al. | | Analysis and low-power hardware implementation of a noise reduction algorithm |
| HK1229058A1 (zh) | | 检测音频信号的方法和装置 |
| CN114974287A (zh) | | 降风噪方法、装置、终端设备及存储介质 |
| Peng et al. | | A Gain Bounded Speech-enhancement Algorithm for Improving Intelligibility |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 14885786; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 2940487; Country of ref document: CA |
| | ENP | Entry into the national phase | Ref document number: 2014386442; Country of ref document: AU; Date of ref document: 20141201; Kind code of ref document: A |
| | ENP | Entry into the national phase | Ref document number: 2016556770; Country of ref document: JP; Kind code of ref document: A |
| | WWE | WIPO information: entry into national phase | Ref document number: MX/A/2016/011750; Country of ref document: MX |
| | ENP | Entry into the national phase | Ref document number: 20167025280; Country of ref document: KR; Kind code of ref document: A |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | REEP | Request for entry into the European phase | Ref document number: 2014885786; Country of ref document: EP |
| | WWE | WIPO information: entry into national phase | Ref document number: 2014885786; Country of ref document: EP |
| | ENP | Entry into the national phase | Ref document number: 2016139717; Country of ref document: RU; Kind code of ref document: A |
| | WWE | WIPO information: entry into national phase | Ref document number: IDP00201606868; Country of ref document: ID |
| | REG | Reference to national code | Ref country code: BR; Ref legal event code: B01A; Ref document number: 112016019692; Country of ref document: BR |
| | ENP | Entry into the national phase | Ref document number: 112016019692; Country of ref document: BR; Kind code of ref document: A2; Effective date: 20160825 |





