WO2016206273A1 - Method for acquiring an activation sound correction frame number, activation sound detection method, and device - Google Patents
- Publication number
- WO2016206273A1 (application PCT/CN2015/093889, CN2015093889W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- frame
- parameter
- background noise
- signal
- activation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/60—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for measuring the quality of voice signals
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
- G10L25/81—Detection of presence or absence of voice signals for discriminating voice from music
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
- G10L25/84—Detection of presence or absence of voice signals for discriminating voice from noise
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/012—Comfort noise or silence coding
Definitions
- This application relates to, but is not limited to, the field of communications.
- VAD Voice Activity Detection
- AMR Adaptive Multi-Rate
- AMR-WB Adaptive Multi-Rate Wideband
- VAD of these encoders does not achieve good performance under all typical background noise. Especially for unsteady noise, these encoders have low VAD efficiency. For music signals, these VADs sometimes have error detection, resulting in a significant quality degradation of the corresponding processing algorithms.
- the embodiment of the invention provides a method for acquiring an activation sound correction frame number, an activation sound detection method and a device, so as to solve the problem that the accuracy of the activation sound detection (VAD) is low.
- An embodiment of the present invention provides a method for acquiring an activation tone correction frame number, where the method includes:
- the obtaining an activation tone detection decision result of the current frame includes:
- the frame energy parameter is a weighted superposition value or a direct superposition value of each sub-band signal energy
- the spectral center of gravity feature parameter is a ratio of a weighted accumulated value of all or part of the subband signal energy to an unweighted accumulated value, or a value obtained by smoothing the ratio;
- the time domain stability characteristic parameter is the ratio of the variance of the amplitude superposition values to the expectation of the square of the amplitude superposition values, or the ratio multiplied by a coefficient;
- the spectral flatness characteristic parameter is a ratio of a geometric mean of the predetermined plurality of spectral magnitudes to an arithmetic mean, or the ratio is multiplied by a coefficient;
- the tonal feature parameter is obtained by calculating the correlation value of the intra-frame spectral difference coefficients of two adjacent frames, or by further smoothing and filtering the correlation value.
- the calculating, according to the tonality flag, the signal to noise ratio parameter, the spectral center of gravity feature parameter, and the frame energy parameter, the activation sound detection decision result includes:
- the obtaining, according to the activation tone detection decision result of the current frame, the number of background noise update times, and the number of the activation tone holding frames, obtaining the number of activated sound correction frames includes:
- the activation sound correction frame number is the maximum of a constant and the activation sound retention frame number.
- the obtaining the activation tone holding frame number includes:
- the calculating the long-term signal-to-noise ratio and the average full-band signal-to-noise ratio according to the sub-band signal includes:
- calculating the long-term signal-to-noise ratio as the ratio of the average long-term active tone signal energy calculated in the previous frame of the current frame to the average long-term background noise energy; and averaging the full-band signal-to-noise ratios of the plurality of frames closest to the current frame to obtain the average full-band signal-to-noise ratio.
- the precondition for correcting the current active tone holding frame number is that the activation tone flag indicates that the current frame is an active tone frame.
- the correcting the number of currently activated tone keeping frames to obtain the number of the activated tone keeping frames includes:
- the activation tone hold frame number is equal to the minimum number of consecutive active tone frames minus the number of consecutive speech frames; if the average full-band signal-to-noise ratio is greater than a set threshold value and the number of consecutive speech frames is greater than a set second threshold value, the activation tone hold frame number is set according to the size of the long-term signal-to-noise ratio.
- the obtaining the number of background noise updates includes:
- the calculating the number of background noise update times according to the background noise update identifier includes:
- if the background noise update identifier indicates that the current frame is background noise and the number of background noise updates is less than a set threshold, the number of background noise updates is incremented by one.
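The counting rule above can be illustrated with a minimal Python sketch. The function name `update_noise_count` and the parameter `max_count` (standing in for the "set threshold") are hypothetical names chosen for illustration; the source gives no concrete threshold value.

```python
def update_noise_count(update_count, is_background_noise, max_count):
    """Return the background noise update count after one frame.

    The count grows by one only while the frame is flagged as background
    noise and the count is still below the cap (max_count is an assumed
    name for the "set threshold" in the text).
    """
    if is_background_noise and update_count < max_count:
        return update_count + 1
    return update_count


# Feed a short sequence of per-frame background noise flags.
count = 0
for flag in [True, True, False, True, True]:
    count = update_noise_count(count, flag, max_count=3)
```

Because the count saturates at the cap, it acts as a bounded measure of how many times the noise estimate has been updated rather than an unbounded frame counter.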
- the obtaining the background noise update identifier includes:
- the frame energy parameter is a weighted superposition value or a direct superposition value of each sub-band signal energy
- the spectral center of gravity feature parameter is a ratio of a weighted accumulated value of all or part of the subband signal energy to an unweighted accumulated value, or a value obtained by smoothing the ratio;
- the time domain stability characteristic parameter is the ratio of the variance of the frame energy amplitudes to the expectation of the square of the amplitude superposition values, or the ratio multiplied by a coefficient;
- the spectral flatness parameter is a ratio of a geometric mean of the predetermined plurality of spectral magnitudes to an arithmetic mean, or the ratio is multiplied by a coefficient.
- the background noise update identifier includes:
- the time domain stability characteristic parameter is greater than a set threshold value
- the smoothed filter value of the spectral gravity center feature parameter value is greater than a set threshold value, and the time domain stability feature parameter value is also greater than a set threshold value;
- the smoothed filtered value of the tonal characteristic parameter or the tonal characteristic parameter is greater than a set threshold value, and the time domain stability characteristic parameter value is greater than a set threshold value;
- the spectral flatness characteristic parameter of each sub-band, or its smoothed filtered value, is less than the respective corresponding set threshold value;
- the value of the frame energy parameter is greater than a set threshold.
- the embodiment of the invention provides an activation sound detection method, and the method includes:
- the number of sound-holding frames is calculated by the number of activated sound correction frames
- the activation sound detection decision result is calculated according to the activation sound correction frame number and the second activation sound detection determination result.
- the calculating the activation sound detection decision result according to the activation sound correction frame number and the second activation sound detection determination result includes:
- the activation sound detection determination result is set as an active sound frame, and the number of active tone correction frames is reduced by 1.
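The following Python sketch shows one way the correction step could combine the two quantities. The triggering condition is not stated in this text, so the sketch assumes the correction applies only when the second decision is inactive and correction frames remain; the function name `combine_vad` is hypothetical.

```python
def combine_vad(second_vad_active, correction_frames):
    """Return (final_decision, remaining_correction_frames) for one frame.

    Assumption: the correction is consumed only when the second decision
    says "inactive" and at least one correction frame is left.
    """
    if not second_vad_active and correction_frames > 0:
        # Force the frame to active and use up one correction frame.
        return True, correction_frames - 1
    # Otherwise keep the second decision and leave the counter unchanged.
    return second_vad_active, correction_frames
```

Under this assumption the correction frame number behaves like a hangover budget: each inactive frame that is overridden spends one unit of it.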
- the obtaining the first activation tone detection decision result includes:
- the frame energy parameter is a weighted superposition value or a direct superimposed value of each sub-band signal energy
- the spectral center of gravity feature parameter is a ratio of a weighted accumulated value of all or part of the subband signal energy to an unweighted accumulated value, or a value obtained by smoothing the ratio;
- the time domain stability characteristic parameter is the ratio of the variance of the amplitude superposition values to the expectation of the square of the amplitude superposition values, or the ratio multiplied by a coefficient;
- the spectral flatness characteristic parameter is a ratio of a geometric mean of the predetermined plurality of spectral magnitudes to an arithmetic mean, or the ratio is multiplied by a coefficient;
- the tonal feature parameter is obtained by calculating the correlation value of the intra-frame spectral difference coefficients of two adjacent frames, or by further smoothing and filtering the correlation value.
- the calculating, according to the tonality flag, the signal to noise ratio parameter, the spectral center of gravity feature parameter, and the frame energy parameter, the first activated sound detection decision result includes:
- Calculating a long-term signal-to-noise ratio by calculating a ratio of an average long-term active tone signal energy calculated by a previous frame of the current frame to an average long-term background noise energy;
- the obtaining the activation tone holding frame number includes:
- the calculating the long-term signal-to-noise ratio and the average full-band signal-to-noise ratio according to the sub-band signal includes:
- Calculating the long-term signal to noise ratio by using a ratio of the average long-term activated sound signal energy calculated by the previous frame of the current frame to the average long-term background noise energy; calculating a plurality of the closest to the current frame The average of the full band signal to noise ratio of the frame results in the average full band signal to noise ratio.
- the precondition for correcting the current active tone holding frame number is that the activation tone flag indicates that the current frame is an active tone frame.
- the correcting the current active tone keeping frame number includes: if the continuous voice frame number is less than a set first threshold value, and the long time signal to noise ratio is less than a set threshold value,
- the activation tone keeps the number of frames equal to the minimum number of consecutive active tone frames minus the number of consecutive speech frames; if the average full band signal to noise ratio is greater than a set second threshold value, and the continuous speech frame If the number is greater than a set threshold, the value of the number of active tone hold frames is set according to the size of the long-term signal to noise ratio.
- the obtaining the number of background noise updates includes:
- the calculating the number of background noise update times according to the background noise update identifier includes:
- if the background noise update identifier indicates that the current frame is background noise and the number of background noise updates is less than a set threshold, the number of background noise updates is incremented by one.
- the obtaining the background noise update identifier includes:
- the sign noise, the tonal feature parameter, and the frame energy parameter perform background noise detection to obtain the background noise update identifier.
- the frame energy parameter is a weighted superposition value or a direct superimposed value of each sub-band signal energy
- the spectral center of gravity feature parameter is a ratio of a weighted accumulated value of all or part of the subband signal energy to an unweighted accumulated value, or a value obtained by smoothing the ratio;
- the time domain stability characteristic parameter is the ratio of the variance of the frame energy amplitudes to the expectation of the square of the amplitude superposition values, or the ratio multiplied by a coefficient;
- the spectral flatness parameter is a ratio of a geometric mean of the predetermined plurality of spectral magnitudes to an arithmetic mean, or the ratio is multiplied by a coefficient.
- the background noise update identifier includes:
- the time domain stability characteristic parameter is greater than a set threshold value
- the smoothed filter value of the spectral gravity center feature parameter value is greater than a set threshold value, and the time domain stability feature parameter value is also greater than a set threshold value;
- the smoothed filtered value of the tonal characteristic parameter or the tonal characteristic parameter is greater than a set threshold, and the time domain stability characteristic parameter value is greater than a set threshold;
- the spectral flatness characteristic parameter of each sub-band, or its smoothed filtered value, is less than the respective corresponding set threshold value;
- the value of the frame energy parameter is greater than a set threshold.
- the calculating the number of the activated sound correction frames according to the first activation sound detection determination result, the background noise update times, and the activation sound retention frame number includes:
- the number of activated sound correction frames is the maximum of a constant and the number of active sound holding frames.
- An embodiment of the present invention provides an apparatus for acquiring an activation tone correction frame number, where the apparatus includes:
- a first acquiring unit configured to: obtain an activation sound detection decision result of the current frame
- a second obtaining unit configured to: obtain an activation tone holding frame number
- a third obtaining unit configured to: obtain a background noise update number
- a fourth acquiring unit configured to: obtain an activation sound correction frame number according to the activation sound detection determination result of the current frame, the background noise update number, and the activation sound retention frame number.
- An embodiment of the present invention provides an activation tone detecting apparatus, where the apparatus includes:
- a fifth obtaining unit configured to: obtain a first activated sound detection decision result
- a sixth obtaining unit configured to: obtain an activation tone holding frame number
- the seventh obtaining unit is configured to: obtain the number of background noise updates
- a first calculating unit configured to: calculate an activation sound correction frame number according to the first activation sound detection determination result, the background noise update number, and the activation sound retention frame number;
- An eighth obtaining unit configured to: obtain a second activated sound detection decision result
- the second calculating unit is configured to: calculate the activation sound detection decision result according to the activation sound correction frame number and the second activation sound detection determination result.
- a computer readable storage medium storing computer executable instructions for performing the method of any of the above.
- An embodiment of the present invention provides a method for acquiring an activation tone correction frame number, an activation tone detection method, and a device, which first obtain a first activation tone detection decision result, an activation tone hold frame number, and a background noise update number; then calculate the activation tone correction frame number from the first activation tone detection decision result, the background noise update number, and the activation tone hold frame number; obtain a second activation tone detection decision result; and finally calculate the activation tone detection decision result from the activation tone correction frame number and the second activation tone detection decision result, so that the accuracy of VAD detection can be improved.
- FIG. 1 is a schematic flowchart of a method for detecting an activated tone according to Embodiment 1 of the present invention
- FIG. 2 is a schematic diagram of a process of obtaining a VAD decision result according to Embodiment 1 of the present invention
- FIG. 3 is a schematic flowchart of a background noise detecting method according to Embodiment 2 of the present invention.
- FIG. 4 is a schematic flowchart of a method for correcting a current active tone holding frame number in a VAD decision according to Embodiment 3 of the present invention
- FIG. 5 is a schematic flowchart of a method for acquiring an activation tone correction frame number according to Embodiment 4 of the present invention.
- FIG. 6 is a schematic structural diagram of an apparatus for acquiring an activated sound correction frame number according to Embodiment 4 of the present invention.
- FIG. 7 is a schematic flowchart of a method for detecting an activated tone according to Embodiment 5 of the present invention.
- FIG. 8 is a schematic structural diagram of an activated sound detecting apparatus according to Embodiment 5 of the present invention.
- the embodiment of the invention provides an activation sound detection method, as shown in FIG. 1 , the method includes:
- Step 101 Obtain a subband signal and a spectrum amplitude of a current frame.
- an audio stream with a frame length of 20 ms and a sampling rate of 32 kHz is taken as an example.
- the method of this paper is equally applicable under other frame lengths and sample rate conditions.
- a 40-channel filter bank is used.
- the input audio signal is $s_{HP}(n)$
- $L_C$ is 40
- $w_c$ is a window function
- the window length is $10L_C$
- the sub-band signal is $X(l,k) = X_{CR}(l,k) + i \cdot X_{CI}(l,k)$
- $X_{CR}$ and $X_{CI}$ are the real and imaginary parts of the subband signal.
- the subband signals are calculated as follows:
- Time-frequency transform is performed on the filter group sub-band signal, and the spectrum amplitude is calculated.
- Embodiments of the present invention can be implemented by performing time-frequency transform on all filter bank sub-bands or partial filter bank sub-bands and calculating spectrum amplitudes.
- the time-frequency transform method in the embodiment of the present invention may be a Discrete Fourier Transform (DFT), a Fast Fourier Transform (FFT), a Discrete Cosine Transform (DCT), or a Discrete Sine Transform (DST).
- the time-frequency conversion equation is as follows:
- $X_{DFT\_POW}[k,j] = (\mathrm{Re}(X_{DFT}[k,j]))^2 + (\mathrm{Im}(X_{DFT}[k,j]))^2;\quad 0 \le k < 10,\ 0 \le j < 16$
- Re and Im respectively represent the real part and the imaginary part of the spectral coefficient $X_{DFT}[k,j]$.
- $A_{sp}$ is the spectrum amplitude after time-frequency transform.
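As an illustration of the power computation above, the following Python sketch applies a textbook 16-point DFT to the complex time samples of one sub-band and returns the squared magnitude of each coefficient. The DFT kernel used here is the standard definition and is an assumption, since the transform equation of the embodiment is not reproduced in this text; the function name `dft_power` is hypothetical.

```python
import cmath


def dft_power(subband_samples):
    """Return |X_DFT[j]|^2 for one sub-band's complex time samples.

    subband_samples: complex values of one filter bank sub-band over the
    time slots of a frame (16 slots in the example of the text).
    """
    n = len(subband_samples)
    power = []
    for j in range(n):
        acc = 0j
        for l, x in enumerate(subband_samples):
            # Standard DFT kernel (assumed, not taken from the source).
            acc += x * cmath.exp(-2j * cmath.pi * j * l / n)
        # Power is the sum of squared real and imaginary parts.
        power.append(acc.real ** 2 + acc.imag ** 2)
    return power


# A constant sub-band signal puts all of its power in bin 0.
example_power = dft_power([1 + 0j] * 16)
```

In practice an FFT routine would replace the double loop; the direct form is used here only to keep the sketch dependency-free.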
- Step 102 Calculate a frame energy parameter, a spectral center of gravity characteristic parameter, and a time domain stability characteristic parameter value of the current frame according to the subband signal, and calculate a value of the spectral flatness characteristic parameter and the tonal characteristic parameter according to the spectrum amplitude.
- the frame energy parameter is a weighted superposition value or a direct superposition value of each sub-band signal energy, wherein:
- $E_C(t,k) = (X_{CR}(t,k))^2 + (X_{CI}(t,k))^2,\quad 0 \le t \le 15,\ 0 \le k < L_C$.
- the human ear is relatively insensitive to very low frequency (such as below 100 Hz) and high frequency (such as above 20 kHz) sound.
- the filter is arranged according to the frequency from low to high.
- the sub-bands from the second sub-band to the penultimate sub-band are the main filter bank sub-bands to which hearing is sensitive; the energies of some or all of these hearing-sensitive filter bank sub-bands are accumulated to obtain frame energy parameter 1, which is calculated as follows:
- E_sb_start is the starting subband index, and its value ranges from [0, 6].
- E_sb_end is the end subband index, which takes a value greater than 6, less than the total number of subbands.
- frame energy parameter 2 is obtained by adding to frame energy parameter 1 the weighted energies of some or all of the filter bank sub-bands that are not used in calculating frame energy parameter 1, and is calculated as follows:
- Num_band is the total number of subbands.
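The two frame energy parameters can be sketched in Python as follows. The exact equations are not reproduced in this text, so several details are assumptions: `sub_energy[k]` is taken to be the energy of sub-band k already summed over the time slots of the frame, `sb_end` is treated as exclusive, and the weights `low_weight` and `high_weight` are hypothetical names for the weighting of the unused sub-bands.

```python
def frame_energy_1(sub_energy, sb_start, sb_end):
    """Sum the energies of the hearing-sensitive sub-bands.

    sb_start and sb_end play the role of E_sb_start and E_sb_end;
    sb_end is treated as exclusive here (an assumption).
    """
    return sum(sub_energy[sb_start:sb_end])


def frame_energy_2(sub_energy, sb_start, sb_end, low_weight, high_weight):
    """Add weighted energies of the sub-bands left out of parameter 1."""
    e1 = frame_energy_1(sub_energy, sb_start, sb_end)
    low = low_weight * sum(sub_energy[:sb_start])    # bands below the range
    high = high_weight * sum(sub_energy[sb_end:])    # bands above the range
    return e1 + low + high


energies = [1.0, 2.0, 3.0, 4.0, 5.0]
e1 = frame_energy_1(energies, 1, 4)
e2 = frame_energy_2(energies, 1, 4, low_weight=0.5, high_weight=0.25)
```

Keeping the two parameters separate lets later decisions use either the hearing-sensitive energy alone or the fuller estimate that also accounts for the edge bands.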
- the spectral center of gravity feature parameter is a ratio of a weighted accumulated value of all or a portion of the subband signal energy to an unweighted accumulated value, wherein:
- the spectral center-of-gravity characteristic parameter is calculated according to the energy of each filter bank sub-band.
- the spectral center-of-gravity characteristic parameter is the ratio of the weighted sum of the filter bank sub-band energies to the directly added sum of the sub-band energies, or is obtained by smoothing and filtering other spectral center-of-gravity characteristic parameter values.
- the spectral center of gravity feature parameters can be implemented using the following substeps:
- the two spectral center-of-gravity characteristic parameter values are calculated, which are the first interval spectral center-of-gravity characteristic parameter and the second interval spectral gravity center characteristic parameter.
- Delta1 and Delta2 are each a small offset value ranging from (0,1). Where k is the spectral center of gravity numbered index.
- $sp\_center[2] = sp\_center_{-1}[2] \cdot spc\_sm\_scale + sp\_center[0] \cdot (1 - spc\_sm\_scale)$
- spc_sm_scale is the smoothing filter scale factor of the spectral center-of-gravity parameter
- $sp\_center_{-1}[2]$ represents the smoothed spectral center-of-gravity feature parameter value of the previous frame, and its initial value is 1.6.
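A minimal Python sketch of the spectral center-of-gravity ratio and its smoothing follows. The smoothing function mirrors the recursion given above. The ratio function is an assumption: the text does not reproduce its equation, so the sketch uses sub-band weights of k + 1 and places Delta1 in the numerator and Delta2 in the denominator. The function names are hypothetical.

```python
def spectral_centroid(sub_energy, delta1, delta2):
    """Ratio of the weighted to the unweighted sub-band energy sum.

    Assumed form: weights are k + 1 for sub-band index k, with the small
    offsets delta1 and delta2 added to numerator and denominator.
    """
    weighted = sum((k + 1) * e for k, e in enumerate(sub_energy))
    return (weighted + delta1) / (sum(sub_energy) + delta2)


def smooth_centroid(prev_smoothed, current, spc_sm_scale):
    """One step of the recursion sp_center[2] = prev*scale + cur*(1-scale)."""
    return prev_smoothed * spc_sm_scale + current * (1.0 - spc_sm_scale)


centroid = spectral_centroid([1.0, 1.0, 1.0, 1.0], delta1=0.5, delta2=0.5)
# Start from the stated initial value 1.6 for the smoothed parameter.
smoothed = smooth_centroid(1.6, 2.0, spc_sm_scale=0.9)
```

A scale factor close to 1 makes the smoothed value follow the per-frame centroid slowly, which suppresses frame-to-frame jitter.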
- the time domain stability characteristic parameter is the ratio of the variance of the amplitude superposition values to the expectation of the square of the amplitude superposition values, or the ratio multiplied by a coefficient, wherein:
- the time domain stability characteristic parameter is calculated from the latest frame energy parameters of the plurality of frame signals.
- the time domain stability characteristic parameter is calculated by using the frame energy parameter of the latest 40 frame signal. The calculation steps are:
- e_offset is an offset value, which ranges from [0, 0.1].
- $Amp_{t2}(n) = Amp_{t1}(-2n) + Amp_{t1}(-2n-1);\quad 0 \le n < 20$;
- $Amp_{t1}(0)$ represents the energy amplitude of the current frame
- $Amp_{t1}(-n)$ represents the energy amplitude of the n-th frame before the current frame
- time domain stability feature parameter ltd_stable_rate0 is obtained by calculating the ratio of the variance of the 20 amplitude superposition values closest to the current frame to the average energy. The equation is calculated as follows:
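The equation itself is not reproduced in this text. As a minimal Python sketch under stated assumptions, the stability ratio can be computed by pairing the 40 most recent frame amplitudes into 20 superposition values and dividing their variance by their mean square. The small constant `eps` that guards the division and the function name `time_stability` are hypothetical.

```python
def time_stability(amp_t1, eps=1e-4):
    """Variance of 20 paired amplitudes divided by their mean square.

    amp_t1[i] is the energy amplitude i frames before the current frame
    (amp_t1[0] is the current frame); 40 values are required.
    eps is an assumed guard against division by zero.
    """
    # Amp_t2(n) = Amp_t1(-2n) + Amp_t1(-2n-1), n = 0..19
    amp_t2 = [amp_t1[2 * n] + amp_t1[2 * n + 1] for n in range(20)]
    mean = sum(amp_t2) / 20
    variance = sum((a - mean) ** 2 for a in amp_t2) / 20
    mean_square = sum(a * a for a in amp_t2) / 20
    return variance / (mean_square + eps)


steady = time_stability([1.0] * 40)            # no fluctuation
varying = time_stability([1.0, 1.0, 3.0, 3.0] * 10)  # alternating level
```

A steady signal gives a ratio of zero, while a signal whose level fluctuates between frames gives a larger value, which is what makes the parameter useful for separating stationary noise from speech.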
- the spectral flatness characteristic parameter is a ratio of a geometric mean of the predetermined plurality of spectral magnitudes to an arithmetic mean, or the ratio is multiplied by a coefficient.
- N A is the number of spectral amplitudes.
- the predetermined plurality of spectrums in the embodiment of the present invention may be a part of the spectrum selected according to the experience of the technician, or may be a part of the spectrum selected according to the actual situation.
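The geometric-to-arithmetic mean ratio described above can be sketched in Python as follows. The sketch assumes strictly positive spectral amplitudes and omits the optional coefficient; the function name `spectral_flatness` is hypothetical.

```python
import math


def spectral_flatness(amplitudes):
    """Geometric mean divided by arithmetic mean of positive amplitudes."""
    n = len(amplitudes)
    # Compute the geometric mean in the log domain to avoid overflow.
    log_mean = sum(math.log(a) for a in amplitudes) / n
    geometric = math.exp(log_mean)
    arithmetic = sum(amplitudes) / n
    return geometric / arithmetic


flat = spectral_flatness([2.0, 2.0, 2.0, 2.0])   # perfectly flat spectrum
peaky = spectral_flatness([1.0, 4.0])            # uneven spectrum
```

The ratio is at most 1 and reaches 1 only for a flat spectrum, so lower values indicate a more peaked, tonal spectrum.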
- the spectrum amplitude is divided into three frequency bands, and the spectral flatness characteristics of the three frequency bands are calculated.
- the division manner is as follows:
- the tonal feature parameter is obtained by calculating the correlation value of the intra-frame spectral difference coefficients of two adjacent frames, or by further smoothing and filtering the correlation value.
- the tonal characteristic parameters are calculated according to the spectral amplitude, wherein the tonal characteristic parameters can be calculated according to all spectral amplitudes or partial spectral amplitudes.
- a part (not less than 8 spectral coefficients) or all spectral amplitudes are compared with adjacent spectral amplitudes, and the value of the differential result less than 0 is set to 0, resulting in a set of non-negative spectral differential coefficients:
- the subscript 0 denotes the current frame, and the equation is calculated as follows:
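The correlation equation is not reproduced in this text. As a minimal Python sketch, the non-negative spectral differences follow the description above, while the normalized cross-correlation used to compare the current and previous frames is an assumption; the function names are hypothetical.

```python
import math


def spectral_diff(amplitudes):
    """Differences between adjacent amplitudes, with negatives set to 0."""
    return [max(amplitudes[i + 1] - amplitudes[i], 0.0)
            for i in range(len(amplitudes) - 1)]


def tonality_rate(current_amp, previous_amp):
    """Normalized correlation of the two frames' spectral differences.

    The normalization is an assumed choice; it returns 0.0 when either
    frame has no positive spectral difference.
    """
    d0 = spectral_diff(current_amp)     # subscript 0: current frame
    d1 = spectral_diff(previous_amp)    # previous frame
    num = sum(a * b for a, b in zip(d0, d1))
    den = math.sqrt(sum(a * a for a in d0) * sum(b * b for b in d1))
    return num / den if den > 0 else 0.0
```

With this normalization, two frames whose spectral peaks rise at the same bins give a value near 1, which is the behavior expected of a stable tonal signal such as music.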
- Step 103 Calculate a signal to noise ratio parameter of the current frame according to the background noise energy obtained in the previous frame of the current frame, the frame energy parameter of the current frame, and the signal to noise ratio subband energy.
- the background noise energy of the previous frame of the current frame can be obtained by an existing method.
- the signal-to-noise ratio sub-band background noise energy uses a default initial value. The background noise energy of the signal-to-noise ratio sub-bands of the previous frame of the current frame is estimated on the same principle as that of the current frame; for the estimation of the signal-to-noise ratio sub-band background noise energy of the current frame, see step 107 of this embodiment.
- the signal to noise ratio parameter of the current frame can be implemented by using an existing signal to noise ratio calculation method. Optionally, the following method is used:
- the filter bank subband is re-divided into multiple SNR subbands, and the index is as follows.
- the energy of each SNR sub-band of the current frame is calculated.
- the calculation equation is as follows:
- the sub-band average signal-to-noise ratio SNR1 is calculated from the energy of each SNR subband of the current frame and the background noise energy of each SNR subband of the previous frame.
- the calculation equation is as follows:
- $E_{sb2\_bg}$ is the estimated background noise energy of each SNR subband of the previous frame of the current frame, and num_band is the number of SNR subbands.
- the background noise energy of the signal-to-noise ratio sub-bands of the previous frame is obtained on the same principle as that of the current frame; for the process of obtaining it for the current frame, see step 107.
- the full-band signal-to-noise ratio SNR2 is calculated according to the estimated full frame background noise energy of the previous frame and the frame energy parameter of the current frame:
- $E_{t\_bg}$ is the estimated total background noise energy of the previous frame
- the principle of obtaining the full background noise energy of the previous frame is the same as the principle of obtaining the full background noise energy of the current frame, and the full background noise energy of the current frame is obtained.
- the signal to noise ratio parameters in this embodiment include a subband average signal to noise ratio SNR1 and a full band signal to noise ratio SNR2.
- the full background noise energy and the background noise energy of each subband are collectively referred to as background noise energy.
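The two signal-to-noise ratio parameters can be sketched in Python as follows. The equations are not reproduced in this text, so the base-2 logarithm of the energy ratio is an assumption, and no clipping or offset terms are applied; the function names `subband_average_snr` and `full_band_snr` are hypothetical.

```python
import math


def subband_average_snr(sub_energy, sub_noise):
    """SNR1: mean over SNR sub-bands of log2(signal / background noise).

    sub_energy: SNR sub-band energies of the current frame.
    sub_noise: background noise energies estimated in the previous frame.
    The log2 form is an assumption.
    """
    ratios = [math.log2(e / n) for e, n in zip(sub_energy, sub_noise)]
    return sum(ratios) / len(ratios)


def full_band_snr(frame_energy, total_noise):
    """SNR2: log2 of frame energy over total background noise energy."""
    return math.log2(frame_energy / total_noise)


snr1 = subband_average_snr([4.0, 8.0], [1.0, 1.0])
snr2 = full_band_snr(16.0, 2.0)
```

Both functions expect strictly positive energies; a practical implementation would typically add a small offset to each term before taking the logarithm.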
- Step 104 Calculate a tonality flag of the current frame according to a frame energy parameter of the current frame, a spectral center-of-gravity characteristic parameter, a time domain stability characteristic parameter, a spectral flatness characteristic parameter, and a tonal characteristic parameter, where:
- a value of 1 for the tonality_frame indicates that the current frame is a tonal frame, and 0 indicates that the current frame is a non-tonal frame;
- step 104b determining whether the tonal characteristic parameter or its smoothed filtered value is greater than the corresponding set threshold value tonality_decision_thr1 or tonality_decision_thr2, if one of the above conditions is true, then step 104c is performed, otherwise step 104d is performed;
- the value range of tonality_decision_thr1 is [0.5, 0.7]
- the range of tonality_rate1 is [0.7, 0.99].
- step 104c: if the time domain stability characteristic parameter lt_stable_rate0 is smaller than a set threshold lt_stable_decision_thr1, the spectral center-of-gravity characteristic parameter sp_center[1] is greater than a set threshold spc_decision_thr1, and the spectral flatness characteristic parameter of each subband is smaller than its corresponding preset threshold, the current frame is determined to be a tonal frame and the tonal frame flag tonality_frame is set to 1; otherwise the current frame is determined to be a non-tonal frame and tonality_frame is set to 0. Then proceed to step 104d.
- the value range of the threshold lt_stable_decision_thr1 is [0.01, 0.25], and that of spc_decision_thr1 is [1.0, 1.8].
- the tonality feature parameter tonality_degree is updated using the following equation:
- $tonality\_degree = tonality\_degree_{-1} \cdot td\_scale\_A + td\_scale\_B$;
- $tonality\_degree_{-1}$ is the tonality degree characteristic parameter of the previous frame; its initial value lies in [0, 1].
- td_scale_A is the attenuation coefficient, and its value range is [0, 1];
- td_scale_B is the accumulation coefficient, and its value range is [0, 1].
- the current frame is a tonal signal, otherwise, the current frame is determined to be a non-tonal signal.
- Step 105 Calculate the VAD decision result according to the tonality mark, the signal to noise ratio parameter, the spectral center of gravity feature parameter, and the frame energy parameter, as shown in FIG. 2, and the steps are as follows:
- Step 105a: Calculate the long-term signal-to-noise ratio lt_snr from the ratio of the average long-term active sound signal energy to the average long-term background noise energy, both computed at the previous frame of the current frame;
- the calculation and definition of the average long-term activated sound signal energy E fg and the average long-term background noise energy E bg are shown in step 105g.
- the long-term signal-to-noise ratio lt_snr is calculated as follows:
- Step 105b calculating an average value of the full-band signal-to-noise ratio SNR2 of the plurality of frames closest to the current frame, to obtain an average full-band signal-to-noise ratio SNR2_lt_ave;
- SNR2(n) represents the value of the full-band signal-to-noise ratio SNR2 of the n-th frame preceding the current frame
- F_num is the total number of frames for which the average value is calculated, which is in the range of [8, 64].
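The averaging in Step 105b can be sketched as a rolling mean. This is a minimal illustration, not the patent's reference code: the class name `Snr2Averager`, the default `f_num=16`, and the choice to divide by the number of frames seen so far during start-up are all assumptions.

```python
from collections import deque


class Snr2Averager:
    """Rolling mean of the full-band SNR (SNR2) over the most recent F_num frames."""

    def __init__(self, f_num: int = 16):
        # The text only constrains F_num to [8, 64]; 16 is an assumed example value.
        if not 8 <= f_num <= 64:
            raise ValueError("F_num is expected to lie in [8, 64]")
        self._history = deque(maxlen=f_num)

    def update(self, snr2: float) -> float:
        """Store SNR2 of the current frame and return SNR2_lt_ave."""
        self._history.append(snr2)
        # Assumption: before F_num frames are available, average over what exists.
        return sum(self._history) / len(self._history)
```

Calling `update` once per frame keeps the window bounded, because the deque drops the oldest value automatically.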
- Step 105c Obtain a decision signal-to-noise ratio threshold snr_thr of the VAD decision according to the spectral center-of-gravity characteristic parameter, the long-term signal-to-noise ratio lt_snr, the number of consecutive active sound frames continuous_speech_num, and the number of consecutive noise frames continuous_noise_num.
- the initial value of the decision signal to noise ratio threshold snr_thr is set, and the range is [0.1, 2], for example, 1.06.
- the value of the decision signal-to-noise ratio threshold snr_thr is first adjusted according to the spectral center-of-gravity characteristic parameter, as follows: if sp_center[2] is greater than a set threshold spc_vad_dec_thr1, an offset is added to snr_thr (0.05 in this example); otherwise, if sp_center[1] is greater than spc_vad_dec_thr2, an offset is added to snr_thr (0.10 in this example); otherwise, an offset is added to snr_thr (0.40 in this example). The thresholds spc_vad_dec_thr1 and spc_vad_dec_thr2 take values in [1.2, 2.5].
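The first adjustment of snr_thr is a three-way branch on the spectral center of gravity. The sketch below uses the example offsets from the text (0.05, 0.10, 0.40); the function name and the convention that `sp_center` is an indexable sequence with at least three entries are assumptions.

```python
from typing import Sequence


def adjust_snr_thr_by_centroid(
    snr_thr: float,
    sp_center: Sequence[float],
    spc_vad_dec_thr1: float,
    spc_vad_dec_thr2: float,
) -> float:
    """Return snr_thr after the spectral-centroid-based offset is added."""
    if sp_center[2] > spc_vad_dec_thr1:
        return snr_thr + 0.05
    if sp_center[1] > spc_vad_dec_thr2:
        return snr_thr + 0.10
    return snr_thr + 0.40
```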
- snr_thr is secondarily adjusted according to the number of consecutively activated audio frames continuous_speech_num, the number of consecutive noise frames continuous_noise_num, the average full-band signal-to-noise ratio SNR2_lt_ave, and the long-term signal-to-noise ratio lt_snr.
- the offset value is taken as 0.1; otherwise, if continuous_noise_num is greater than a set threshold cpn_vad_dec_thr3, an offset is added to snr_thr (0.2 in this example); otherwise, if continuous_noise_num is greater than a set threshold cpn_vad_dec_thr4, an offset is added to snr_thr (0.1 in this example).
- the thresholds cpn_vad_dec_thr1, cpn_vad_dec_thr2, cpn_vad_dec_thr3, cpn_vad_dec_thr4 have a value range of [2,500], and the coefficient lt_tsnr_scale has a value range of [0, 2].
- the decision signal-to-noise ratio threshold snr_thr is finally adjusted to obtain the decision signal-to-noise ratio threshold snr_thr of the current frame.
- $snr\_thr = snr\_thr + (lt\_tsnr - thr\_offset) \cdot thr\_scale$;
- thr_offset is an offset value whose value range is [0.5, 3]
- thr_scale is a gain coefficient whose value range is [0.1, 1].
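The final adjustment is a single linear correction of snr_thr by the long-term SNR. A minimal sketch follows; the default values `thr_offset=1.5` and `thr_scale=0.5` are assumed examples picked from inside the stated ranges, not values given in the text.

```python
def finalize_snr_thr(
    snr_thr: float,
    lt_tsnr: float,
    thr_offset: float = 1.5,  # assumed example within [0.5, 3]
    thr_scale: float = 0.5,   # assumed example within [0.1, 1]
) -> float:
    """Apply snr_thr = snr_thr + (lt_tsnr - thr_offset) * thr_scale."""
    return snr_thr + (lt_tsnr - thr_offset) * thr_scale
```

With these assumed parameters, a long-term SNR above thr_offset raises the threshold and one below it lowers the threshold.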
- Step 105d Calculate an initial VAD decision result according to the decision threshold snr_thr of the activated sound detection and the signal to noise ratio parameters SNR1 and SNR2 calculated by the current frame.
- the value of the VAD flag vad_flag is used to indicate whether the current frame is an active tone frame.
- a value of 1 indicates that the current frame is an active tone frame
- 0 indicates that the current frame is an inactive tone frame.
- the frame is an inactive tone frame. Otherwise, it is judged that the current frame is an inactive sound frame, and the value of the VAD flag vad_flag is set to zero.
- SNR2 is greater than a set threshold value snr2_thr, it is determined that the current frame is an active tone frame, and the value of the VAD flag vad_flag is set to 1.
- the range of snr2_thr is [1.2, 5.0].
- Step 105e Correct the initial VAD decision result according to the tonality flag, the average full-band signal-to-noise ratio SNR2_lt_ave, the spectral center of gravity, and the long-term signal-to-noise ratio lt_snr.
- if the tonality flag indicates that the current frame is a tonal signal, that is, tonality_flag is 1, the current frame is determined to be an active tone signal and the vad_flag flag is set to 1.
- SNR2_lt_ave_thr1 is [1, 4]
- range of lt_tsnr_tscale is [0.1, 0.6].
- the current frame is an active tone frame and the vad_flag flag is set.
- SNR2_lt_ave_t_thr2 is [1.0, 2.5]
- the range of sp_center_t_thr1 is [2.0, 4.0]
- the range of lt_tsnr_t_thr1 is [2.5, 5.0].
- SNR2_lt_ave is greater than a set threshold SNR2_lt_ave_t_thr3
- the spectral center-of-gravity characteristic parameter sp_center[2] is greater than a set threshold sp_center_t_thr2 and the long-term signal-to-noise ratio lt_snr is less than a set threshold lt_tsnr_t_thr2
- SNR2_lt_ave_t_thr3 is [0.8, 2.0]
- the range of sp_center_t_thr2 is [2.0, 4.0]
- the range of lt_tsnr_t_thr2 is [2.5, 5.0].
- SNR2_lt_ave is greater than a set threshold SNR2_lt_ave_t_thr4
- the spectral center-of-gravity characteristic parameter sp_center[2] is greater than a set threshold sp_center_t_thr3 and the long-term signal-to-noise ratio lt_snr is less than a set threshold lt_tsnr_t_thr3, it is determined that the current frame is an active sound frame,
- the vad_flag flag is set.
- SNR2_lt_ave_t_thr4 is [0.6, 2.0]
- the range of sp_center_t_thr3 is [3.0, 6.0]
- the range of lt_tsnr_t_thr3 is [2.5, 5.0].
- Step 105f: Correct the active tone hold frame number according to the decision results of multiple frames before the current frame, the long-term signal-to-noise ratio lt_snr, the average full-band signal-to-noise ratio SNR2_lt_ave, the signal-to-noise ratio parameter of the current frame, and the active tone detection decision result of the current frame.
- the precondition for the current activation tone to maintain the frame number correction is that the activation tone flag indicates that the current frame is the active tone frame. If the condition is not met, the value of the current activation tone retention frame number num_speech_hangover is not corrected, and the process proceeds directly to step 105g.
- the current active tone hold frame number num_speech_hangover is equal to the minimum continuous active tone frame number minus the continuous speech frame number continuous_speech_num.
- the number of active tone hold frames num_speech_hangover is set according to the size of the long-term signal to noise ratio lt_tsnr value. Otherwise, the value of the current active tone hold frame number num_speech_hangover is not corrected. In this embodiment, the minimum number of consecutive active tone frames is 8, which can take a value between [6, 20].
- the first threshold value continuous_speech_num_thr1 and the second threshold value continuous_speech_num_thr2 may be the same or different.
- the value of num_speech_hangover is 3; otherwise, if the long-term signal-to-noise ratio lt_snr is greater than 1.6, the value of num_speech_hangover is 4; otherwise, the value of num_speech_hangover is 5.
- Step 105g Add an activation tone hold according to the decision result of the current frame and the activation tone holding frame number num_speech_hangover, and obtain the VAD decision result of the current frame.
- the method is:
- if the activation tone flag is 0 and the activation tone holding frame number num_speech_hangover is greater than 0, the activation tone hold is added, that is, the activation tone flag is set to 1 and the value of num_speech_hangover is decremented by 1.
- the final VAD decision result of the current frame is obtained.
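Step 105g can be sketched as a small pure function. The function name and the tuple return convention are assumptions made for illustration.

```python
from typing import Tuple


def apply_hangover(vad_flag: int, num_speech_hangover: int) -> Tuple[int, int]:
    """Return (final vad_flag, remaining num_speech_hangover) for the current frame."""
    if vad_flag == 0 and num_speech_hangover > 0:
        # Hold the active decision for one more frame and consume one hangover frame.
        return 1, num_speech_hangover - 1
    return vad_flag, num_speech_hangover
```

Returning the updated counter instead of mutating shared state keeps the per-frame logic easy to test in isolation.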
- the method further includes: calculating, according to the initial VAD decision result, the average long-term activated sound signal energy E_fg, which is used for the VAD decision of the next frame; after step 105g, the method may further include: calculating, according to the VAD decision result of the current frame, the average long-term background noise energy E_bg, which is used for the VAD decision of the next frame.
- the average long-term activation tone signal energy E fg is calculated as follows:
- if the initial VAD decision result indicates that the current frame is an active tone frame, that is, the value of the VAD flag is 1, and E_t1 is greater than a multiple of E_bg (6 times in this embodiment), the accumulated value fg_energy of the average long-term active sound energy and its accumulated frame number fg_energy_count are updated.
- the update adds E_t1 to fg_energy to obtain the new fg_energy, and adds 1 to fg_energy_count to obtain the new fg_energy_count.
- fg_max_frame_num takes a set value of 512
- attenu_coef1 takes a value of 0.75.
- bg_energy_count is the background noise energy accumulation frame number, which records how many frames of energy the latest accumulated background noise energy value contains.
- bg_energy is the accumulated value of the most recent background noise energy.
- the background noise energy accumulated value bg_energy and the background noise energy accumulated frame number bg_energy_count are updated.
- the update method is the background noise energy accumulated value bg_energy plus E t1 to obtain a new background noise energy accumulated value bg_energy.
- the background noise energy accumulation frame number bg_energy_count is incremented by one to obtain a new background noise energy accumulation frame number bg_energy_count.
- if the background noise energy accumulation frame number bg_energy_count is equal to the maximum number of counted frames for the average long-term background noise energy calculation, the accumulated frame number and the accumulated value are both multiplied by the attenuation coefficient attenu_coef2.
- the maximum number of count frames calculated by the average long-term background noise energy in this embodiment is 512, and the attenuation coefficient attenu_coef2 is equal to 0.75.
- the average long-term background noise energy is obtained by dividing the background noise energy accumulated value bg_energy by the background noise energy accumulation frame number bg_energy_count: $E_{bg} = bg\_energy / bg\_energy\_count$
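The accumulate, attenuate, and divide procedure for the average long-term background noise energy can be sketched as follows. The class name is hypothetical, and the caller is assumed to invoke `update` only on frames for which the embodiment updates the background accumulators (that condition is not fully recoverable from the text above).

```python
class LongTermBackgroundEnergy:
    """Tracks bg_energy and bg_energy_count and returns E_bg."""

    def __init__(self, max_count: int = 512, attenu_coef2: float = 0.75):
        self.max_count = max_count
        self.attenu_coef2 = attenu_coef2
        self.bg_energy = 0.0
        self.bg_energy_count = 0.0

    def update(self, e_t1: float) -> float:
        """Accumulate the frame energy E_t1 and return the average E_bg."""
        self.bg_energy += e_t1
        self.bg_energy_count += 1
        if self.bg_energy_count == self.max_count:
            # Scale both accumulators so older frames gradually lose weight.
            self.bg_energy *= self.attenu_coef2
            self.bg_energy_count *= self.attenu_coef2
        return self.bg_energy / self.bg_energy_count
```

Because the value and the count are scaled by the same coefficient, the attenuation leaves the current average unchanged and only reduces the influence of old frames on later updates.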
- first embodiment may further include the following steps:
- Step 106 Calculate the background noise update identifier according to the VAD decision result, the tonal feature parameter, the SNR parameter, the tonality flag, and the time domain stability feature parameter of the current frame. For the calculation method, refer to the second embodiment described later.
- Step 107: Obtain the background noise energy of the current frame according to the background noise update identifier, the frame energy parameter of the current frame, and the full-band background noise energy of the previous frame of the current frame; the background noise energy of the current frame is used for the signal-to-noise ratio parameter calculation of the next frame.
- the background noise update identifier is used to determine whether to perform the background noise update. If the background noise update identifier is 1, the background noise update is performed according to the ratio of the full-band background noise energy to the energy of the current frame signal.
- the background noise energy estimate includes a subband background noise energy estimate and a full band background noise energy estimate.
- $E_{sb2\_bg}(k) = E_{sb2\_bg\_pre}(k) \cdot \alpha_{bg\_e} + E_{sb2\_bg}(k) \cdot (1 - \alpha_{bg\_e}), \quad 0 \le k < num\_sb$
- $E_{sb2\_bg\_pre}(k)$ represents the subband background noise energy of the k-th SNR subband of the previous frame.
- $\alpha_{bg\_e}$ is the background noise update factor, whose value is determined by the full-band background noise energy of the previous frame and the current frame energy parameter. The calculation process is as follows:
- the value is 0.96, otherwise the value is 0.95.
- the background noise update identifier of the current frame is 1
- the background noise energy accumulated value E_t_sum and the background noise energy accumulated frame number N_Et_counter are updated as follows:
- $E_{t\_sum} = E_{t\_sum\_-1} + E_{t1}$;
- $N_{Et\_counter} = N_{Et\_counter\_-1} + 1$;
- $E_{t\_sum\_-1}$ is the accumulated background noise energy of the previous frame
- $N_{Et\_counter\_-1}$ is the accumulated number of background noise energy frames calculated in the previous frame.
- the full-band background noise energy is obtained as the ratio of the background noise energy accumulated value E_t_sum to the accumulated frame number N_Et_counter: $E_{t\_bg} = E_{t\_sum} / N_{Et\_counter}$
- it is then checked whether N_Et_counter is equal to 64; if N_Et_counter is equal to 64, the background noise energy accumulated value E_t_sum and the accumulated frame number N_Et_counter are each multiplied by 0.75.
- if the tonality flag tonality_flag is equal to 1 and the value of the frame energy parameter E_t1 is less than the value of the background noise energy characteristic parameter E_t_bg multiplied by a gain coefficient gain, then:
- $E_{t\_sum} = E_{t\_sum} \cdot gain + delta$;
- $E_{sb2\_bg}(k) = E_{sb2\_bg}(k) \cdot gain + delta$;
- the value range of gain is [0.3, 1].
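The background noise update of Step 107 can be sketched in two parts: a full-band tracker and a subband smoothing helper. All names are hypothetical. Two assumptions are made: the update factor is passed in by the caller because the rule that selects 0.96 or 0.95 is not recoverable from the text, and the second operand of the subband formula is treated as the current-frame value being blended in.

```python
from typing import List, Optional, Sequence


class FullBandNoiseTracker:
    """Maintains E_t_sum and N_Et_counter and returns E_t_bg."""

    def __init__(self) -> None:
        self.e_t_sum = 0.0
        self.n_et_counter = 0.0

    def update(self, e_t1: float, background_flag: int) -> Optional[float]:
        """Accumulate E_t1 only when background_flag is 1; return E_t_bg."""
        if background_flag == 1:
            self.e_t_sum += e_t1
            self.n_et_counter += 1
        if self.n_et_counter == 0:
            return None  # no noise frame has been seen yet
        e_t_bg = self.e_t_sum / self.n_et_counter
        if self.n_et_counter == 64:
            # Attenuate both accumulators, as in the text.
            self.e_t_sum *= 0.75
            self.n_et_counter *= 0.75
        return e_t_bg


def smooth_subband_noise(
    e_sb2_bg_pre: Sequence[float],
    e_sb2_cur: Sequence[float],
    alpha_bg_e: float,
) -> List[float]:
    """First-order recursive smoothing applied to each SNR subband."""
    return [
        pre * alpha_bg_e + cur * (1.0 - alpha_bg_e)
        for pre, cur in zip(e_sb2_bg_pre, e_sb2_cur)
    ]
```

An update factor close to 1 makes the subband estimate follow the previous frame closely, so a single frame changes the noise floor only slightly.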
- An embodiment of the present invention further provides an embodiment of a background noise detecting method. As shown in FIG. 3, the method includes:
- Step 201 Obtain a subband signal and a spectrum amplitude of a current frame.
- Step 202: Calculate the frame energy parameter, the spectral center-of-gravity characteristic parameter, and the time domain stability characteristic parameter from the subband signal, and calculate the spectral flatness characteristic parameter and the tonal characteristic parameter from the spectral amplitude;
- the frame energy parameter is a weighted superposition value or a direct superimposed value for each sub-band signal energy.
- the spectral center of gravity feature parameter is a ratio of a weighted accumulated value of all or part of the subband signal energy to an unweighted accumulated value, or a value obtained by smoothing the ratio.
- the time domain stability characteristic parameter is the ratio of the variance of the frame energy amplitude to the expectation of the square of the amplitude superposition value, or that ratio multiplied by a coefficient.
- the spectral flatness parameter is a ratio of a geometric mean of the predetermined plurality of spectral magnitudes to an arithmetic mean, or the ratio is multiplied by a coefficient.
- Step 201 and step 202 can adopt the same method as above, and details are not described herein again.
- Step 203 Perform background noise detection according to the spectral center-of-gravity characteristic parameter, the time domain stability characteristic parameter, the spectral flatness characteristic parameter, the tonal characteristic parameter, and the current frame energy parameter, and determine whether the current frame is background noise.
- the background noise update identifier is first set to a first preset value; then, if any of the following conditions is true, the current frame is determined not to be a noise signal, and the background noise update identifier is set to a second preset value:
- the time domain stability characteristic parameter lt_stable_rate0 is greater than a set threshold
- the smoothed filter value of the spectral center of gravity characteristic parameter value is greater than a set threshold value, and the time domain stability characteristic parameter value is also greater than a set threshold value;
- the smoothed filtered value of the tonal characteristic parameter or the tonal characteristic parameter is greater than a set threshold value, and the time domain stability characteristic parameter lt_stable_rate0 value is greater than the threshold value set therein;
- the smoothed filtered values of the spectral flatness characteristic parameters of each sub-band or the spectral flatness characteristic parameters of each sub-band are smaller than respective corresponding set threshold values;
- the value of the frame energy parameter E t1 is greater than the set threshold E_thr1.
- a background noise update identifier background_flag indicates whether the current frame is background noise: if the current frame is determined to be background noise, background_flag is set to 1 (the first preset value); otherwise background_flag is set to 0 (the second preset value).
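The five conditions of Step 203 can be sketched as one boolean expression. This is an illustrative reading, not the patent's reference code: the dataclass, the function name, and the threshold fields `sp_center_thr` and `lt_stable_rate_thr2` are hypothetical names, and the spectral flatness condition is interpreted as "either all raw values or all smoothed values fall below their thresholds".

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class NoiseThresholds:
    lt_stable_rate_thr1: float
    sp_center_thr: float            # hypothetical name
    lt_stable_rate_thr2: float      # hypothetical name
    tonality_rate_thr1: float
    lt_stable_rate_thr3: float
    flatness_thrs: Sequence[float]  # one per subband, e.g. sSMR_thr4..sSMR_thr6
    e_thr1: float


def detect_background_noise(
    lt_stable_rate0: float,
    sp_center_smoothed: float,
    tonality_rate: float,
    tonality_rate_smoothed: float,
    flatness: Sequence[float],
    flatness_smoothed: Sequence[float],
    e_t1: float,
    thr: NoiseThresholds,
) -> int:
    """Return background_flag: 1 for background noise, 0 otherwise."""
    not_noise = (
        lt_stable_rate0 > thr.lt_stable_rate_thr1
        or (sp_center_smoothed > thr.sp_center_thr
            and lt_stable_rate0 > thr.lt_stable_rate_thr2)
        or (max(tonality_rate, tonality_rate_smoothed) > thr.tonality_rate_thr1
            and lt_stable_rate0 > thr.lt_stable_rate_thr3)
        or all(f < t for f, t in zip(flatness, thr.flatness_thrs))
        or all(f < t for f, t in zip(flatness_smoothed, thr.flatness_thrs))
        or e_t1 > thr.e_thr1
    )
    return 0 if not_noise else 1
```

The flatness sequences are expected to be non-empty; with empty input `all` returns True and the frame would be classified as not noise.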
- the feature parameter and the current frame energy parameter detect whether the current frame is a noise signal. If it is not a noise signal, the background noise update flag background_flag is set to zero.
- the threshold value lt_stable_rate_thr1 ranges from [0.8, 1.6];
- it is judged whether the tonal characteristic parameter value is greater than a set threshold tonality_rate_thr1 and whether the time domain stability characteristic parameter lt_stable_rate0 is greater than a set threshold lt_stable_rate_thr3; if both conditions are satisfied, the current frame is determined not to be background noise, and background_flag is assigned a value of 0.
- the threshold value of tonality_rate_thr1 ranges from [0.4, 0.66].
- the threshold value lt_stable_rate_thr3 ranges from [0.06, 0.3].
- the background_flag is assigned a value of 0.
- the value range of sSMR_thr4, sSMR_thr5, and sSMR_thr6 is [0.80, 0.92].
- E_thr1 takes the value according to the dynamic range of the frame energy parameter.
- the embodiment of the invention further provides a method for correcting the number of activated sound holding frames in the VAD decision. As shown in FIG. 4, the method includes:
- Step 301 Calculating a long-term signal to noise ratio lt_snr according to the subband signal
- the long-term signal-to-noise ratio lt_snr is calculated by calculating the ratio of the average long-time activated sound signal energy and the average long-term background noise energy calculated from the previous frame of the current frame; the long-term signal-to-noise ratio lt_snr can be represented by a logarithm.
- Step 302 Calculate an average full-band signal-to-noise ratio SNR2_lt_ave
- Step 303: Correct the current active tone hold frame number according to the decision results of multiple frames before the current frame, the long-term signal-to-noise ratio lt_snr, the average full-band signal-to-noise ratio SNR2_lt_ave, the signal-to-noise ratio parameter of the current frame, and the VAD decision result of the current frame.
- the precondition for the current activation tone to maintain the frame number correction is that the activation tone flag indicates that the current frame is an active tone frame.
- when the current active tone hold frame number is corrected: if the number of consecutive speech frames is less than a set first threshold and the long-term signal-to-noise ratio lt_snr is less than a set threshold, the current active tone hold frame number is equal to the minimum number of consecutive active tone frames minus the number of consecutive speech frames; otherwise, if the average full-band signal-to-noise ratio SNR2_lt_ave is greater than a set threshold and the number of consecutive speech frames is greater than a set second threshold, the active tone hold frame number is set according to the magnitude of the long-term signal-to-noise ratio; otherwise, the value of the current active tone hold frame number num_speech_hangover is not corrected.
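The hangover correction can be sketched as below. The hold values 3, 4, and 5, the comparison against 1.6, and the minimum of 8 consecutive active frames are taken from the example in the first embodiment; every threshold passed as a keyword argument is a hypothetical parameter, including `lt_snr_high_thr`, whose value is not recoverable from the text.

```python
def correct_hangover(
    num_speech_hangover: int,
    vad_flag: int,
    continuous_speech_num: int,
    lt_snr: float,
    snr2_lt_ave: float,
    *,
    continuous_speech_num_thr1: int,
    lt_snr_thr: float,
    snr2_lt_ave_thr: float,
    continuous_speech_num_thr2: int,
    lt_snr_high_thr: float,
    min_continuous_speech: int = 8,
) -> int:
    """Return the corrected num_speech_hangover for the current frame."""
    if vad_flag != 1:
        # Precondition: correction only happens on active frames.
        return num_speech_hangover
    if continuous_speech_num < continuous_speech_num_thr1 and lt_snr < lt_snr_thr:
        return min_continuous_speech - continuous_speech_num
    if (snr2_lt_ave > snr2_lt_ave_thr
            and continuous_speech_num > continuous_speech_num_thr2):
        if lt_snr > lt_snr_high_thr:
            return 3
        if lt_snr > 1.6:
            return 4
        return 5
    return num_speech_hangover
```

In this sketch a higher long-term SNR yields a shorter hold, which matches the ordering of the example values in the text.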
- An embodiment of the present invention provides a method for acquiring an activation tone correction frame number, as shown in FIG. as follows:
- the number of activated sound correction frames is set to the maximum of a constant (for example, 20) and the number of active tone holding frames.
- the method may further include: 405: correcting the VAD decision result according to the VAD decision result and the activation sound correction frame number, wherein:
- the current frame is set to be an active sound frame, and the number of activated sound correction frames is decreased by 1.
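The acquisition and use of the active sound correction frame number can be sketched with three small functions that follow the rules stated in this embodiment and in the claims. The function names and all threshold parameters are hypothetical, and keeping the previous correction frame number when the condition is not met is an assumption, since the text does not state what happens in that case.

```python
from typing import Tuple


def update_noise_update_count(count: int, background_flag: int, count_limit: int) -> int:
    """Increment the background noise update count on noise frames, up to a limit."""
    if background_flag == 1 and count < count_limit:
        return count + 1
    return count


def acquire_correction_frames(
    current: int,
    vad_flag: int,
    update_count: int,
    num_speech_hangover: int,
    *,
    update_count_thr: int,
    constant: int = 20,
) -> int:
    """Return the active sound correction frame number."""
    if vad_flag == 1 and update_count < update_count_thr:
        return max(constant, num_speech_hangover)
    return current  # assumption: otherwise the previous value is kept


def apply_correction(vad_flag: int, correction_frames: int) -> Tuple[int, int]:
    """Force an inactive frame to active while correction frames remain."""
    if vad_flag == 0 and correction_frames > 0:
        return 1, correction_frames - 1
    return vad_flag, correction_frames
```

While few noise updates have occurred the noise estimate is presumably still unreliable, so this sketch keeps a long hold after each active frame.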
- the embodiment of the present invention further provides an apparatus 60 for acquiring the number of activated sound correction frames.
- the obtaining apparatus 60 includes:
- the first obtaining unit 61 is configured to: obtain an activation sound detection decision result of the current frame;
- the second obtaining unit 62 is configured to: obtain an activation tone holding frame number
- the third obtaining unit 63 is configured to: obtain the number of background noise updates
- the fourth obtaining unit 64 is configured to: obtain an activated sound correction frame number according to the activation sound detection determination result of the current frame, the background noise update number, and the activation sound retention frame number.
- the embodiment of the invention provides an activation sound detection method, as shown in FIG. 7, the steps are as follows:
- the second activation tone detection decision result vadb_flag is obtained by any existing activation tone detection decision scheme, and the existing activation tone detection decision scheme is not elaborated herein.
- the number of activated sound correction frames is set to the maximum of 20 and the number of active sound holding frames.
- the current frame is set to be the active sound frame, and the number of the activated sound correction frames is decreased by 1.
- the embodiment of the present invention further provides an active sound detecting device.
- the detecting device 80 includes:
- the fifth obtaining unit 81 is configured to: obtain a first activated sound detection decision result
- the sixth obtaining unit 82 is configured to: obtain an activation tone holding frame number
- the seventh obtaining unit 83 is configured to: obtain the number of background noise updates
- the first calculating unit 84 is configured to: calculate the number of the activated sound correction frames according to the first activation sound detection determination result, the background noise update number, and the activation sound retention frame number;
- the eighth obtaining unit 85 is configured to: obtain a second activated sound detection decision result
- the second calculating unit 86 is configured to calculate the activation sound detection determination result according to the activation sound correction frame number and the second activation sound detection determination result.
- VAD Voice over IP
- the technical solution provided by the embodiment of the present invention overcomes the shortcomings of the existing VAD algorithm, and improves the detection efficiency of the unstable noise by the VAD, and also improves the accuracy of the music detection.
- the speech and audio signal processing algorithm using the technical solution provided by the embodiment of the present invention can achieve better performance.
- the background noise detecting method provided by the embodiment of the invention can make the estimation of the background noise more accurate and stable, and is beneficial to improving the accuracy of the VAD detection.
- the method for detecting a tonality signal provided by the embodiment of the invention improves the accuracy of the tonal music detection.
- the method for correcting the number of active tone keeping frames provided by the embodiment of the present invention can make the VAD algorithm have a better balance between performance and efficiency under different noise and signal to noise ratios.
- the method for adjusting the decision signal-to-noise ratio threshold in the VAD decision provided by the embodiment of the present invention can make the VAD decision algorithm achieve better accuracy under different signal-to-noise ratios, and further improve efficiency while ensuring quality.
- all or part of the steps of the above embodiments may also be implemented by using integrated circuits. These steps may be fabricated separately into individual integrated circuit modules, or multiple modules or steps may be fabricated into a single integrated circuit module.
- the devices/function modules/functional units in the above embodiments can be implemented by using general-purpose computing devices; they can be concentrated on a single computing device or distributed over a network of multiple computing devices.
- when the device/function module/functional unit in the above embodiment is implemented in the form of a software function module and sold or used as a stand-alone product, it can be stored in a computer readable storage medium.
- the above mentioned computer readable storage medium may be a read only memory, a magnetic disk or an optical disk or the like.
Claims (27)
- 一种激活音修正帧数的获取方法,所述方法包括:获得当前帧的激活音检测判决结果;获得激活音保持帧数;获得背景噪声更新次数;根据所述当前帧的激活音检测判决结果、所述背景噪声更新次数和所述激活音保持帧数获取激活音修正帧数。
- 根据权利要求1所述的方法,其中,所述获得当前帧的激活音检测判决结果包括:获得所述当前帧的子带信号及频谱幅值;根据所述子带信号计算得到所述当前帧的帧能量参数、谱重心特征参数和时域稳定度特征参数;根据所述频谱幅值计算得到谱平坦度特征参数和调性特征参数;根据利用所述当前帧的前一帧得到的背景噪声能量、所述帧能量参数及信噪比子带能量计算得到所述当前帧的信噪比参数;根据所述帧能量参数、所述谱重心特征参数、所述时域稳定度特征参数、所述谱平坦度特征参数、所述调性特征参数计算得到所述当前帧的调性标志;根据所述调性标志、所述信噪比参数、所述谱重心特征参数、所述帧能量参数计算得到所述激活音检测判决结果。
- 根据权利要求2所述的方法,其中,所述根据所述调性标志、所述信噪比参数、所述谱重心特征参数、所述帧能量参数计算得到所述激活音检测判决结果包括:通过所述当前帧的前一帧计算得到的平均长时激活音信号能量和平均长时背景噪声能量的比值,计算得到长时信噪比;计算距离所述当前帧最近的多个帧的全带信噪比的平均值,得到平均全带信噪比;根据所述谱重心特征参数、所述长时信噪比、连续激活音帧个数和连续噪声帧个数得到激活音检测判决的判决信噪比门限;根据所述激活音检测的判决门限和所述信噪比参数计算得到初始的激活音检测判决结果;根据所述调性标志、所述平均全带信噪比、所述谱重心特征参数和所述长时信噪比对所述初始的激活音检测判决结果进行修正,得到所述激活音检测判决结果。
- 根据权利要求1所述的方法,其中,所述根据所述当前帧的激活音检测判决结果、所述背景噪声更新次数和所述激活音保持帧数获取激活音修正帧数包括:当所述当前帧的激活音检测判决结果为激活音帧,且所述背景噪声更新次数小于预设门限值时,则所述激活音修正帧数为一个常数和所述激活音保持帧数中的最大值。
- 根据权利要求1所述的方法,其中,所述获得激活音保持帧数包括:获得所述当前帧的子带信号及频谱幅值;根据所述子带信号计算得到长时信噪比和平均全带信噪比,根据所述当前帧之前的多个帧的激活音检测的判决结果、长时信噪比、平均全带信噪比、所述当前帧的激活音检测判决结果,对当前激活音保持帧数进行修正获得所述激活音保持帧数。
- 根据权利要求5所述的方法,其中,所述根据所述子带信号计算得到长时信噪比和平均全带信噪比包括:通过利用所述当前帧的前一帧计算得到的平均长时激活音信号能量和平均长时背景噪声能量的比值,计算得到所述长时信噪比;计算距离所述当前帧最近的多个帧的全带信噪比的平均值,得到所述平均全带信噪比。
- 根据权利要求5所述的方法,其中,对所述当前激活音保持帧数进行修正的前提条件是激活音标志指示所述当前帧为激活音帧。
- 根据权利要求5所述的方法,其中,所述对当前激活音保持帧数进 行修正获得所述激活音保持帧数包括:如果所述连续语音帧数小于一个设定的第一门限值,并且所述长时信噪比小于一个设定的门限值,则所述激活音保持帧数等于最小连续激活音帧数减去所述连续语音帧数;如果所述平均全带信噪比大于一个设定的门限值,并且所述连续语音帧数大于一个设定的第二门限值,则根据所述长时信噪比的大小设置所述激活音保持帧数的值。
- 根据权利要求1所述的方法,其中,所述获得背景噪声更新次数包括:获得背景噪声更新标识;根据所述背景噪声更新标识计算所述背景噪声更新次数。
- 根据权利要求9所述的方法,其中,所述根据所述背景噪声更新标识计算所述背景噪声更新次数包括:当所述背景噪声更新标识指示所述当前帧为背景噪声,且所述背景噪声更新次数小于设定的门限值时,将所述背景噪声更新次数加1。
- 根据权利要求9所述的方法,其中,所述获得背景噪声更新标识包括:获得所述当前帧的子带信号及频谱幅值;根据所述子带信号计算得到帧能量参数、谱重心特征参数、时域稳定度特征参数;根据所述频谱幅值计算得到谱平坦度特征参数和调性特征参数;根据所述谱重心特征参数、所述时域稳定度特征参数、所述谱平坦度特征参数、所述调性特征参数、所述帧能量参数进行背景噪声检测,获得所述背景噪声更新标识。
- 根据权利要求11所述方法,其中,所述根据所述谱重心特征参数、所述时域稳定度特征参数、所述谱平坦度特征参数、所述调性特征参数、所述帧能量参数进行背景噪声检测,获得所述背景噪声更新标识,包括:设置所述背景噪声更新标识为第一预设值;如果以下任一条件成立,则判断所述当前帧不是噪声信号,并将所述背 景噪声更新标识设置为第二预设值:所述时域稳定度特征参数大于一个设定的门限值;所述谱重心特征参数值的平滑滤波值大于一个设定的门限值,且所述时域稳定度特征参数值也大于一个设定的门限值;所述调性特征参数或所述调性特征参数平滑滤波后的值大于一个设定的门限值,且时域稳定度特征参数值大于设定的门限值;每个子带的谱平坦度特征参数或所述每个子带的谱平坦度特征参数各自平滑滤波后的值均小于各自对应的设定的门限值;或,所述帧能量参数的值大于设定的门限值。
- 一种激活音检测方法,所述方法包括:获得第一激活音检测判决结果;获得激活音保持帧数;获得背景噪声更新次数;根据所述第一激活音检测判决结果、所述背景噪声更新次数和所述激活音保持帧数计算激活音修正帧数;获得第二激活音检测判决结果;根据所述激活音修正帧数和所述第二激活音检测判决结果计算所述激活音检测判决结果。
- 根据权利要求13所述的方法,其中,所述根据所述激活音修正帧数和所述第二激活音检测判决结果计算所述激活音检测判决结果包括:当所述第二激活音检测判决结果指示所述当前帧为非激活音帧,且所述激活音修正帧数大于0时,将所述激活音检测判决结果设置为激活音帧,且所述激活音修正帧数减1。
- 根据权利要求13所述的方法,其中,所述获得第一激活音检测判决结果包括:获得当前帧的子带信号及频谱幅值;根据所述子带信号计算得到所述当前帧的帧能量参数、谱重心特征参数 和时域稳定度特征参数;根据所述频谱幅值计算得到谱平坦度特征参数和调性特征参数;根据利用所述当前帧的前一帧得到的背景噪声能量、所述帧能量参数及信噪比子带能量计算得到所述当前帧的信噪比参数;根据所述帧能量参数、所述谱重心特征参数、所述时域稳定度特征参数、所述谱平坦度特征参数、所述调性特征参数计算得到所述当前帧的调性标志;根据所述调性标志、所述信噪比参数、所述谱重心特征参数、所述帧能量参数计算得到所述第一激活音检测判决结果。
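- The method according to claim 15, wherein calculating the first VAD decision result according to the tonality flag, the SNR parameter, the spectral centroid feature parameter, and the frame energy parameter comprises: calculating a long-time SNR from the ratio of the average long-time active sound signal energy to the average long-time background noise energy, both calculated using the frame preceding the current frame; calculating the average of the full-band SNRs of a plurality of frames closest to the current frame to obtain an average full-band SNR; obtaining a VAD decision threshold according to the spectral centroid feature parameter, the long-time SNR, the number of continuous active sound frames, and the number of continuous noise frames; calculating an initial VAD decision result according to the VAD decision threshold and the SNR parameter; and modifying the initial VAD decision result according to the tonality flag, the average full-band SNR, the spectral centroid feature parameter, and the long-time SNR to obtain the first VAD decision result.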
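- The method according to claim 13, wherein acquiring the number of hangover frames for active sound comprises: acquiring a sub-band signal and a spectrum amplitude of a current frame; calculating a long-time signal-to-noise ratio (SNR) and an average full-band SNR according to the sub-band signal; and modifying the current number of hangover frames for active sound according to the VAD decision results of a plurality of frames preceding the current frame, the long-time SNR, the average full-band SNR, and the first VAD decision result.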
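- The method according to claim 17, wherein calculating the long-time SNR and the average full-band SNR according to the sub-band signal comprises: calculating the long-time SNR from the ratio of the average long-time active sound signal energy to the average long-time background noise energy, both calculated using the frame preceding the current frame; and calculating the average of the full-band SNRs of a plurality of frames closest to the current frame to obtain the average full-band SNR.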
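- The method according to claim 17, wherein modifying the current number of hangover frames for active sound comprises: if the number of continuous speech frames is less than a set first threshold and the long-time SNR is less than a set threshold, the number of hangover frames for active sound is equal to the minimum number of continuous active sound frames minus the number of continuous speech frames; and if the average full-band SNR is greater than a set threshold and the number of continuous speech frames is greater than a set second threshold, the value of the number of hangover frames for active sound is set according to the magnitude of the long-time SNR.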
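- The method according to claim 13, wherein acquiring the number of background noise updates comprises: acquiring a background noise update flag; and calculating the number of background noise updates according to the background noise update flag.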
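- The method according to claim 20, wherein calculating the number of background noise updates according to the background noise update flag comprises: when the background noise update flag indicates that the current frame is background noise and the number of background noise updates is less than a set threshold, incrementing the number of background noise updates by 1.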
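- The method according to claim 20, wherein acquiring the background noise update flag comprises: acquiring a sub-band signal and a spectrum amplitude of a current frame; calculating values of a frame energy parameter, a spectral centroid feature parameter, and a time-domain stability feature parameter according to the sub-band signal, and calculating values of a spectral flatness feature parameter and a tonality feature parameter according to the spectrum amplitude; and performing background noise detection according to the spectral centroid feature parameter, the time-domain stability feature parameter, the spectral flatness feature parameter, the tonality feature parameter, and the frame energy parameter to obtain the background noise update flag.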
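- The method according to claim 22, wherein performing background noise detection according to the spectral centroid feature parameter, the time-domain stability feature parameter, the spectral flatness feature parameter, the tonality feature parameter, and the frame energy parameter to obtain the background noise update flag specifically comprises: setting the background noise update flag to a first preset value; and if any one of the following conditions holds, determining that the current frame is not a noise signal and setting the background noise update flag to a second preset value: the time-domain stability feature parameter is greater than a set threshold; the smoothed value of the spectral centroid feature parameter is greater than a set threshold, and the time-domain stability feature parameter is also greater than a set threshold; the tonality feature parameter, or its smoothed value, is greater than a set threshold, and the time-domain stability feature parameter is greater than a set threshold; the spectral flatness feature parameters of each sub-band, or their respective smoothed values, are all less than their respective set thresholds; or the value of the frame energy parameter is greater than a set threshold.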
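- The method according to claim 13, wherein calculating the number of modified frames for active sound according to the first VAD decision result, the number of background noise updates, and the number of hangover frames for active sound comprises: when the first VAD decision result is an active sound frame and the number of background noise updates is less than a preset threshold, the number of modified frames for active sound is the maximum of a constant and the number of hangover frames for active sound.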
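- An apparatus for acquiring a number of modified frames for active sound, comprising: a first acquisition unit, configured to acquire a voice activity detection (VAD) decision result of a current frame; a second acquisition unit, configured to acquire a number of hangover frames for active sound; a third acquisition unit, configured to acquire a number of background noise updates; and a fourth acquisition unit, configured to acquire the number of modified frames for active sound according to the VAD decision result of the current frame, the number of background noise updates, and the number of hangover frames for active sound.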
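- A voice activity detection apparatus, comprising: a fifth acquisition unit, configured to acquire a first voice activity detection (VAD) decision result; a sixth acquisition unit, configured to acquire a number of hangover frames for active sound; a seventh acquisition unit, configured to acquire a number of background noise updates; a first calculation unit, configured to calculate a number of modified frames for active sound according to the first VAD decision result, the number of background noise updates, and the number of hangover frames for active sound; an eighth acquisition unit, configured to acquire a second VAD decision result; and a second calculation unit, configured to calculate the VAD decision result according to the number of modified frames for active sound and the second VAD decision result.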
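- A computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being used to perform the method of any one of claims 1 to 24.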
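The counting and override logic recited in the claims (the background noise update count, the hangover frame modification, the maximum-of-constant rule for the modified frame count, and the final decision override) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: the claims speak only of "a constant" and "a set threshold", so every numeric value, every name (`VadState`, `update_noise_count`, `modify_hangover_frames`, `update_modified_frames`, `final_decision`), and the mapping from long-time SNR to a hangover value are hypothetical. Leaving a value unchanged when no claimed condition applies is also an assumption.

```python
from dataclasses import dataclass

# All numbers are illustrative assumptions; the claims give no values.
NOISE_UPDATE_CAP = 1000      # cap on the background noise update count
NOISE_UPDATE_THRESHOLD = 12  # preset threshold on the update count
MODIFIED_CONSTANT = 20       # the "constant" in the maximum rule
MIN_ACTIVE_RUN = 8           # minimum number of continuous active sound frames
SPEECH_RUN_LOW = 8           # first threshold on continuous speech frames
SPEECH_RUN_HIGH = 20         # second threshold on continuous speech frames
LT_SNR_LOW = 4.0             # threshold on the long-time SNR
LT_SNR_HIGH = 2.6            # split point of the made-up SNR-to-hangover mapping
FULLBAND_SNR_HIGH = 4.0      # threshold on the average full-band SNR


@dataclass
class VadState:
    noise_update_count: int = 0
    modified_frames: int = 0


def update_noise_count(state: VadState, is_background_noise: bool) -> None:
    """Add 1 to the update count for a noise frame while the count is below its cap."""
    if is_background_noise and state.noise_update_count < NOISE_UPDATE_CAP:
        state.noise_update_count += 1


def modify_hangover_frames(hangover: int, speech_run: int,
                           lt_snr: float, avg_fullband_snr: float) -> int:
    """Adjust the hangover count; the caller invokes this only for active frames."""
    if speech_run < SPEECH_RUN_LOW and lt_snr < LT_SNR_LOW:
        return MIN_ACTIVE_RUN - speech_run
    if avg_fullband_snr > FULLBAND_SNR_HIGH and speech_run > SPEECH_RUN_HIGH:
        # The claims say only that the value depends on the long-time SNR;
        # this two-level mapping is invented for illustration.
        return 3 if lt_snr > LT_SNR_HIGH else 5
    return hangover  # assumption: unchanged when neither condition holds


def update_modified_frames(state: VadState, first_decision_active: bool,
                           hangover: int) -> None:
    """Load max(constant, hangover) while few noise updates have occurred."""
    if first_decision_active and state.noise_update_count < NOISE_UPDATE_THRESHOLD:
        state.modified_frames = max(MODIFIED_CONSTANT, hangover)


def final_decision(state: VadState, second_decision_active: bool) -> bool:
    """Override an inactive second decision while modified frames remain."""
    if not second_decision_active and state.modified_frames > 0:
        state.modified_frames -= 1
        return True
    return second_decision_active


if __name__ == "__main__":
    state = VadState()
    update_modified_frames(state, first_decision_active=True, hangover=5)
    decisions = [final_decision(state, second_decision_active=False) for _ in range(22)]
    print(decisions.count(True), "of", len(decisions), "frames forced to active")
```

With these assumed constants, one active first decision early in the stream (fewer than 12 noise updates so far) loads the counter with the larger of 20 and the hangover count, and that many subsequent inactive second decisions are overridden to active before the counter runs out.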
Priority Applications (7)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| RU2017145122A RU2684194C1 (ru) | 2015-06-26 | 2015-11-05 | Voice activity modification frame acquiring method, and voice activity detection method and apparatus |
| EP25180489.4A EP4641568A3 (en) | 2015-06-26 | 2015-11-05 | Voice activity modification frame acquiring method, and voice activity detection method and apparatus |
| JP2017566850A JP6635440B2 (ja) | 2015-06-26 | 2015-11-05 | Voice activity modification frame acquiring method, and voice activity detection method and apparatus |
| CA2990328A CA2990328C (en) | 2015-06-26 | 2015-11-05 | Voice activity modification frame acquiring method, and voice activity detection method and apparatus |
| US15/577,343 US10522170B2 (en) | 2015-06-26 | 2015-11-05 | Voice activity modification frame acquiring method, and voice activity detection method and apparatus |
| EP15896160.7A EP3316256A4 (en) | 2015-06-26 | 2015-11-05 | Voice activity modification frame acquiring method, and voice activity detection method and apparatus |
| KR1020177036055A KR102042117B1 (ko) | 2015-06-26 | 2015-11-05 | Voice activity modification frame acquiring method, and voice activity detection method and apparatus |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201510364255.0A CN106328169B (zh) | 2015-06-26 | 2015-06-26 | Voice activity modification frame acquiring method, and voice activity detection method and apparatus |
| CN201510364255.0 | 2015-06-26 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2016206273A1 (zh) | 2016-12-29 |
Family
ID=57584376
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2015/093889 Ceased WO2016206273A1 (zh) | 2015-06-26 | 2015-11-05 | Voice activity modification frame acquiring method, and voice activity detection method and apparatus |
Country Status (8)
| Country | Link |
|---|---|
| US (1) | US10522170B2 (zh) |
| EP (2) | EP4641568A3 (zh) |
| JP (1) | JP6635440B2 (zh) |
| KR (1) | KR102042117B1 (zh) |
| CN (1) | CN106328169B (zh) |
| CA (1) | CA2990328C (zh) |
| RU (1) | RU2684194C1 (zh) |
| WO (1) | WO2016206273A1 (zh) |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10522170B2 (en) | 2015-06-26 | 2019-12-31 | Zte Corporation | Voice activity modification frame acquiring method, and voice activity detection method and apparatus |
| CN112420079A (zh) * | 2020-11-18 | 2021-02-26 | 青岛海尔科技有限公司 | Voice endpoint detection method and apparatus, storage medium, and electronic device |
Families Citing this family (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN105261375B (zh) * | 2014-07-18 | 2018-08-31 | 中兴通讯股份有限公司 | Voice activity detection method and apparatus |
| JP6759898B2 (ja) * | 2016-09-08 | 2020-09-23 | 富士通株式会社 | Speech interval detection device, speech interval detection method, and computer program for speech interval detection |
| CN107123419A (zh) * | 2017-05-18 | 2017-09-01 | 北京大生在线科技有限公司 | Optimization method for background noise reduction in Sphinx speech-rate recognition |
| CN108962284B (zh) * | 2018-07-04 | 2021-06-08 | 科大讯飞股份有限公司 | Voice recording method and apparatus |
| CN111599345B (zh) * | 2020-04-03 | 2023-02-10 | 厦门快商通科技股份有限公司 | Speech recognition algorithm evaluation method and system, mobile terminal, and storage medium |
| US11636872B2 (en) * | 2020-05-07 | 2023-04-25 | Netflix, Inc. | Techniques for computing perceived audio quality based on a trained multitask learning model |
| CN112908352B (zh) * | 2021-03-01 | 2024-04-16 | 百果园技术(新加坡)有限公司 | Audio denoising method and apparatus, electronic device, and storage medium |
| EP4354898A4 (en) * | 2021-06-08 | 2024-10-16 | Panasonic Intellectual Property Management Co., Ltd. | Ear-mounted device and reproduction method |
| US20230046530A1 (en) * | 2021-08-03 | 2023-02-16 | Bard College | Enhanced bird feeders and baths |
| CN114220446B (zh) * | 2021-12-08 | 2025-07-18 | 漳州立达信光电子科技有限公司 | Adaptive background noise detection method, system, and medium |
| US12525226B2 (en) * | 2023-02-10 | 2026-01-13 | Qualcomm Incorporated | Latency reduction for multi-stage speech recognition |
Citations (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
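| CN1473321A (zh) * | 2000-09-09 | 2004-02-04 | 英特尔公司 | Voice activity detector for integrated telecommunications processing |
| CN101197135A (zh) * | 2006-12-05 | 2008-06-11 | 华为技术有限公司 | Sound signal classification method and apparatus |
| CN101399039A (zh) * | 2007-09-30 | 2009-04-01 | 华为技术有限公司 | Method and apparatus for determining the category of a non-noise audio signal |
| CN101841587A (zh) * | 2009-03-20 | 2010-09-22 | 联芯科技有限公司 | Signal tone detection method and apparatus, and noise suppression method for a mobile terminal |
| CN102687196A (zh) * | 2009-10-08 | 2012-09-19 | 西班牙电信公司 | Method for detecting speech segments |
| WO2012146290A1 (en) * | 2011-04-28 | 2012-11-01 | Telefonaktiebolaget L M Ericsson (Publ) | Frame based audio signal classification |
| CN103903634A (zh) * | 2012-12-25 | 2014-07-02 | 中兴通讯股份有限公司 | Voice activity detection, and method and apparatus used for voice activity detection |
| CN104424956A (zh) * | 2013-08-30 | 2015-03-18 | 中兴通讯股份有限公司 | Voice activity detection method and apparatus |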
Family Cites Families (19)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JPH05130067A (ja) | 1991-10-31 | 1993-05-25 | Nec Corp | Variable-threshold speech detector |
| US6269331B1 (en) * | 1996-11-14 | 2001-07-31 | Nokia Mobile Phones Limited | Transmission of comfort noise parameters during discontinuous transmission |
| CN1703736A (zh) * | 2002-10-11 | 2005-11-30 | 诺基亚有限公司 | Method and apparatus for source-controlled variable bit-rate wideband speech coding |
| US7567900B2 (en) | 2003-06-11 | 2009-07-28 | Panasonic Corporation | Harmonic structure based acoustic speech interval detection method and device |
| JP4729927B2 (ja) * | 2005-01-11 | 2011-07-20 | ソニー株式会社 | Sound detection device, automatic imaging device, and sound detection method |
| EP2276023A3 (en) * | 2005-11-30 | 2011-10-05 | Telefonaktiebolaget LM Ericsson (publ) | Efficient speech stream conversion |
| EP1982324B1 (en) | 2006-02-10 | 2014-09-24 | Telefonaktiebolaget LM Ericsson (publ) | A voice detector and a method for suppressing sub-bands in a voice detector |
| CN101320559B (zh) * | 2007-06-07 | 2011-05-18 | 华为技术有限公司 | Voice activity detection apparatus and method |
| GB2450886B (en) * | 2007-07-10 | 2009-12-16 | Motorola Inc | Voice activity detector and a method of operation |
| US20120095760A1 (en) * | 2008-12-19 | 2012-04-19 | Ojala Pasi S | Apparatus, a method and a computer program for coding |
| CN102044244B (zh) * | 2009-10-15 | 2011-11-16 | 华为技术有限公司 | Signal classification method and apparatus |
| CN102693720A (zh) * | 2009-10-15 | 2012-09-26 | 华为技术有限公司 | Audio signal detection method and apparatus |
| EP2816560A1 (en) * | 2009-10-19 | 2014-12-24 | Telefonaktiebolaget L M Ericsson (PUBL) | Method and background estimator for voice activity detection |
| CN102741918B (zh) * | 2010-12-24 | 2014-11-19 | 华为技术有限公司 | Method and device for voice activity detection |
| JP5936377B2 (ja) | 2012-02-06 | 2016-06-22 | 三菱電機株式会社 | Speech interval detection device |
| RU2536343C2 (ru) * | 2013-04-15 | 2014-12-20 | Открытое акционерное общество "Концерн "Созвездие" | Method for extracting a speech signal in the presence of interference and device for implementing it |
| JP6406257B2 (ja) * | 2013-08-30 | 2018-10-17 | 日本電気株式会社 | Signal processing device, signal processing method, and signal processing program |
| FI125723B (en) * | 2014-07-11 | 2016-01-29 | Suunto Oy | Portable activity tracking device and associated method |
| CN106328169B (zh) | 2015-06-26 | 2018-12-11 | 中兴通讯股份有限公司 | Voice activity modification frame acquiring method, and voice activity detection method and apparatus |
2015
- 2015-06-26: CN, application CN201510364255.0A, publication CN106328169B (zh), status: Active
- 2015-11-05: JP, application JP2017566850A, publication JP6635440B2 (ja), status: Active
- 2015-11-05: EP, application EP25180489.4A, publication EP4641568A3 (en), status: Pending
- 2015-11-05: KR, application KR1020177036055A, publication KR102042117B1 (ko), status: Active
- 2015-11-05: RU, application RU2017145122A, publication RU2684194C1 (ru), status: Active
- 2015-11-05: EP, application EP15896160.7A, publication EP3316256A4 (en), status: Ceased
- 2015-11-05: WO, application PCT/CN2015/093889, publication WO2016206273A1 (zh), status: Ceased
- 2015-11-05: CA, application CA2990328A, publication CA2990328C (en), status: Active
- 2015-11-05: US, application US15/577,343, publication US10522170B2 (en), status: Active
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP3316256A4 * |
Also Published As
| Publication number | Publication date |
|---|---|
| KR20180008647A (ko) | 2018-01-24 |
| EP4641568A2 (en) | 2025-10-29 |
| CN106328169B (zh) | 2018-12-11 |
| CA2990328C (en) | 2021-09-21 |
| CN106328169A (zh) | 2017-01-11 |
| JP6635440B2 (ja) | 2020-01-22 |
| EP3316256A4 (en) | 2018-08-22 |
| KR102042117B1 (ko) | 2019-11-08 |
| RU2684194C1 (ru) | 2019-04-04 |
| JP2018523155A (ja) | 2018-08-16 |
| EP4641568A3 (en) | 2025-11-12 |
| US10522170B2 (en) | 2019-12-31 |
| EP3316256A1 (en) | 2018-05-02 |
| CA2990328A1 (en) | 2016-12-29 |
| US20180158470A1 (en) | 2018-06-07 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| WO2016206273A1 (zh) | | Voice activity modification frame acquiring method, and voice activity detection method and apparatus |
| CN112992188B (zh) | | Method and apparatus for adjusting the SNR threshold in voice activity detection (VAD) decisions |
| CN104424956B9 (zh) | | Voice activity detection method and apparatus |
| US9672841B2 (en) | | Voice activity detection method and method used for voice activity detection and apparatus thereof |
| US10339961B2 (en) | | Voice activity detection method and apparatus |
| WO2012158157A1 (en) | | Method for super-wideband noise supression |
| CN110390947B (zh) | | Method, system, device, and storage medium for determining sound source position |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 15896160; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | WIPO information: entry into national phase | Ref document number: 15577343; Country of ref document: US |
| | ENP | Entry into the national phase | Ref document number: 20177036055; Country of ref document: KR; Kind code of ref document: A |
| | ENP | Entry into the national phase | Ref document number: 2990328; Country of ref document: CA |
| | ENP | Entry into the national phase | Ref document number: 2017566850; Country of ref document: JP; Kind code of ref document: A |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | WWE | WIPO information: entry into national phase | Ref document number: 2017145122; Country of ref document: RU. Ref document number: 2015896160; Country of ref document: EP |
| | WWW | WIPO information: withdrawn in national office | Ref document number: 2015896160; Country of ref document: EP |














