EP0334023A2 - Method for detecting speech signals (Procédé de détection de signaux de parole) - Google Patents
Method for detecting speech signals
- Publication number
- EP0334023A2 (application number EP89102876A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- signal
- speech
- amplitude
- signals
- control amplifier
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
Definitions
- the invention relates to a method for recognizing speech signals in which the signals are first fed to a low-pass filter whose pass band lies in the range of the fundamental speech frequency.
- the recognition of voice signals is of great importance, since the presence of voice signals can be used as a criterion for increasing the gain.
- the amplification of the transmit and receive signal is controlled as a function of the presence of a speech signal. The same applies to conference facilities.
- the object of the invention is to provide a method for recognizing speech signals in which the presence of speech signals is recognized after a very short time, so that initial syllables are not suppressed.
- This object is achieved in that the signals appearing at the output of the low-pass filter are checked for the value and duration of a specific amplitude, and a speech signal is recognized when at least three successive amplitudes have occurred within a predetermined time frame.
- the signals are first checked for maximum amplitude values. As soon as a maximum amplitude value is found, the time until a further maximum amplitude value occurs is measured, and speech signals are recognized from this interval.
- the three amplitudes A1 to A3 shown in FIG. 1 are the amplitudes of a speech signal which are present at the output of a low-pass filter whose cut-off frequency is approximately 400 Hz.
- the signals supplied to the input of the low-pass filter are generated, for example, by a microphone and are composed of room noises and speech signals.
- the method according to the invention for recognizing speech signals now essentially uses the frequency range of the fundamental speech frequency (80 to 333 Hz) for analysis.
- the most important feature for the detection of speech signals is the period of the oscillations of the speech signal, which at the fundamental speech frequency lies in the range of 3 to 12.5 ms depending on the speaker (1/333 Hz is about 3 ms, 1/80 Hz is 12.5 ms). This first feature is used to distinguish between speech and noise.
- the detection of zero crossings in the speech signal is not expedient, since in the event of interference, for example due to noise, the number of zero crossings can increase so greatly that speech recognition is no longer possible in this way.
- the method according to the invention uses the maxima of the speech signal to recognize speech. If these lie within a predetermined amplitude and time window, a first criterion for the presence of speech signals is met. The choice of window parameters has a significant influence on the period detection.
- the window size is chosen to be smaller than half the smallest possible period of the fundamental speech frequency (that is, below 1.5 ms for a 3 ms period) so that both positive and negative maximum values of the speech signal can be recognized. This is necessary because the speech signal is not symmetrical with respect to the dynamic range.
- the window size is therefore approximately 0.9 ms.
- the amplitude tolerance of the maximum values is very small over a few periods in the case of an undisturbed speech signal, but can increase significantly at high interference levels due to additive superimposition of the interference signal.
- the amplitude window is approximately ±20% of the first maximum.
- once the amplitude A1 has been recognized as a maximum value, its duration t1 is stored as a period.
- the time window of the period PF, which is open between 3 and 12.5 ms, starts running at the temporal center of the amplitude A1 of the first maximum M1. If the next amplitude A2 falls within the time window of the period PF, with its time window ZF lying within the amplitude window AF, A2 is identified as the second maximum and its duration is stored as the value t2.
- the amplitude window AF is defined as a threshold as a function of the amplitude value of the first maximum M1.
- simply counting three successive amplitudes A1 to A3 that meet the conditions described above is already sufficient to conclude that a speech signal is present; in that case it is not necessary to store the period durations t1 to t3.
- two methods, described below, can be used for a more precise determination of speech signals.
- the degree of correlation between the individual periods is determined. A cross-correlation between successive signal sections of one period length yields high values of the normalized cross-correlation coefficient in the areas in which speech is present. However, if the detected periods are merely random maxima in the specified interval, the correlation analysis gives small values.
- the second period or, if several periods are detected, the third period is correlated with the first. If three periods are correlated, the smaller of the two values is used for the decision. This reduces the frequency of errors in the case of randomly detected periods, particularly in the event of interference by noise signals. If more periods are used, detection becomes slower but no further improvement can be achieved, since the values of the cross-correlation function $\mathrm{KKF}(k \cdot N_p)$ decrease significantly due to the amplitude and frequency modulation of the speech signal.
- a further improvement in the decision can be achieved if, instead of evaluating the cross-correlation function for the speech decision, the normalized mean square error between the recognized periods is used.
- the decisive advantage of this method for speech detection is the recognition time.
- the detection time is 37.5 ms.
- the analysis using the simplified method described at the beginning gives approximately the same results as the evaluation method with cross-correlation or after determining the mean square error.
- the detection rate is on average 5% below the detection rate of the previously described method, but can also assume higher values depending on the noise situation. Differences from the above-mentioned procedure become clear when the speech sequence is disturbed. With the selected parameters, the period detection can deliver an increased number of wrong decisions in some background-noise situations.
- reflections of the interference signal, if they meet the criteria for the presence of speech, are recognized as speech and lead to incorrect decisions.
- the detection of sinusoidal interference in the area of the fundamental speech frequency is only possible on the basis of the duration and frequency constancy of this interference signal.
- the selection of the method for speech detection to be used is essentially determined by the expected useful / interference power ratios and the interference noises.
- at useful / interference power ratios of more than 12 dB, the simplified detection method can already be used without arithmetic operations.
- all methods result in only a short signal delay in the range of the detection time (9 to 37 ms), so that initial syllables are not suppressed.
- the method presented can be implemented, for example, with the aid of a signal processor SP (see FIG. 2).
- the analog signal from the microphone M is sampled and digitized via the analog / digital converter W1.
- the sample values obtained in this way can be evaluated by the signal processor for speech detection using the method according to the invention. If speech is recognized, the signal processor SP can cause the control amplifier RV1 to amplify the microphone signal by a fixed amount.
- Such an arrangement is suitable, for example, for microphones which are located in a room with a large amount of noise.
- the amplification of the speech signals results in better intelligibility.
- in a hands-free device, when a speech signal is present in the signal of the microphone M, the signal processor SP causes the control amplifier RV2 to attenuate the signal for the loudspeaker LS accordingly, in order to prevent acoustic feedback between loudspeaker LS and microphone M.
- the control amplifier RV2 could be influenced at the instigation of the signal processor SP in such a way that it amplifies the input signal to achieve better intelligibility of the loudspeaker signal LS.
- the signal processor receives at its inputs SE and EE data words which represent the samples of the signals. Data words are also applied to the connected lines at the outputs SA and EA of the signal processor SP. To avoid the suppression of initial syllables, the signal processor SP can delay the input signals by a time in the range of the recognition time (5-37 ms). Likewise, the signal processor SP can apply a fall time to the control signals influencing the control amplifiers RV; this fall time is of the order of 200 to 900 ms and serves to bridge unvoiced sounds and short speech pauses between words and sentences.
- the low-pass filtering function with a cut-off frequency of 400 Hz can also be carried out by the signal processor SP.
- Another application of the method according to the invention is conceivable in an intercom system, in which the signal processor attenuates one direction when voice signals are present in the other direction.
- the signal processor itself is not discussed further in this description; such signal processors are sold, for example, by Texas Instruments under the designation TMS 320 or by Fujitsu under the designation MB 8764. Such a signal processor is to be programmed in such a way that the described method steps run automatically.
- the analog / digital converters W1 and W4 convert the analog signals into digital signals for signal processing in the signal processor SP, while the digital / analog converters W2 and W3 convert the digital signals occurring at the outputs SA and EA into analog signals.
- control amplifiers RV1 and RV2 can also be dispensed with if the function of amplifying the signals is taken over by the signal processor SP itself, which can also be designed as a suitable microprocessor.
- the implementation of the method according to the invention is also conceivable by means of a corresponding, discretely constructed analog circuit arrangement or a correspondingly designed custom circuit.
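
The simplified counting detector and the two refinements described above (normalized cross-correlation and normalized mean square error between periods) can be sketched roughly as follows. This is only an illustrative Python sketch under stated assumptions, not the implementation from the patent: the sampling rate `FS`, the noise floor `MIN_AMP`, all function names, and the exact normalization formulas are assumptions, since the text gives none of them. The input is assumed to have already passed the 400 Hz low-pass filter, and the roughly 0.9 ms search window for each maximum is not modeled.

```python
import math

# Values taken from the text above.
T_MIN = 3.0e-3    # shortest fundamental period in seconds (about 333 Hz)
T_MAX = 12.5e-3   # longest fundamental period in seconds (80 Hz)
AMP_TOL = 0.20    # amplitude window: +/-20 % of the first maximum

# Assumptions made only for this sketch (not stated in the text).
FS = 8000         # sampling rate in Hz
MIN_AMP = 0.05    # floor below which maxima are ignored


def local_maxima(x, min_amp=MIN_AMP):
    """Return (sample index, value) for every positive local maximum >= min_amp."""
    return [(i, x[i]) for i in range(1, len(x) - 1)
            if x[i] >= min_amp and x[i - 1] < x[i] >= x[i + 1]]


def detect_speech(x, fs=FS, needed=3):
    """Simplified counting detector on an already low-pass-filtered signal x.

    Returns True once `needed` successive maxima are each 3 to 12.5 ms after
    the previous accepted maximum and within +/-20 % of the first one.
    """
    peaks = local_maxima(x)
    lo, hi = T_MIN * fs, T_MAX * fs          # time window in samples
    for start, (first_idx, first_amp) in enumerate(peaks):
        count, last_idx = 1, first_idx
        for idx, amp in peaks[start + 1:]:
            gap = idx - last_idx
            if gap > hi:                     # time window closed without a match
                break
            if gap >= lo and abs(amp - first_amp) <= AMP_TOL * first_amp:
                count, last_idx = count + 1, idx
                if count >= needed:
                    return True
    return False


def normalized_xcorr(a, b):
    """Normalized cross-correlation coefficient of two equal-length sections."""
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den if den else 0.0


def normalized_mse(a, b):
    """Mean square error between two sections, normalized by the energy of a."""
    energy = sum(p * p for p in a)
    err = sum((p - q) ** 2 for p, q in zip(a, b))
    return err / energy if energy else float("inf")


def tone(freq_hz, dur_s, fs=FS, amp=1.0):
    """Test helper: a sine tone standing in for a voiced, filtered signal."""
    return [amp * math.sin(2 * math.pi * freq_hz * i / fs)
            for i in range(int(dur_s * fs))]


if __name__ == "__main__":
    voiced = tone(160, 0.05)                 # 160 Hz: period of 50 samples at FS
    print(detect_speech(voiced))             # periodic maxima inside both windows
    print(detect_speech([0.0] * 400))        # silence: no maxima at all
    p1, p2 = voiced[0:50], voiced[50:100]    # two successive periods
    print(normalized_xcorr(p1, p2), normalized_mse(p1, p2))
```

In a real system the value of `normalized_xcorr` (or `normalized_mse`) would be compared against a decision threshold chosen for the expected noise conditions; the text does not state such thresholds, so none is assumed here.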
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Noise Elimination (AREA)
- Interconnected Communication Systems, Intercoms, And Interphones (AREA)
- Circuit For Audible Band Transducer (AREA)
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| DE3810068 | 1988-03-25 | ||
| DE19883810068 DE3810068A1 (de) | 1988-03-25 | 1988-03-25 | Verfahren zur erkennung von sprachsignalen |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| EP0334023A2 (fr) | 1989-09-27 |
| EP0334023A3 (fr) | 1991-02-06 |
Family
ID=6350648
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP19890102876 Withdrawn EP0334023A3 (fr) | 1988-03-25 | 1989-02-20 | Procédé de détection de signaux de parole |
Country Status (2)
| Country | Link |
|---|---|
| EP (1) | EP0334023A3 (fr) |
| DE (1) | DE3810068A1 (fr) |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO1992013340A1 (fr) * | 1991-01-18 | 1992-08-06 | Theis Peter F | Systeme pour distinguer ou pour compter des expressions detaillees verbales |
| WO1997000515A1 (fr) * | 1995-06-19 | 1997-01-03 | Fjaellbrandt Tore | Procede et dispositif de determination d'une frequence de registre dans un signal acoustique |
| WO2000070602A1 (fr) * | 1999-05-18 | 2000-11-23 | Voxlab Oy | Procede d'evaluation de la rythmicite d'un signal numerique compose d'echantillons |
Families Citing this family (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5822726A (en) * | 1995-01-31 | 1998-10-13 | Motorola, Inc. | Speech presence detector based on sparse time-random signal samples |
| DE10321625B4 (de) * | 2003-05-13 | 2007-08-23 | Gehrke Kommunikationssyteme Gmbh | Signalübertragungsvorrichtung und Verfahren zum Regeln einer Signalübertragungsvorrichtung |
Family Cites Families (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US3513260A (en) * | 1967-10-13 | 1970-05-19 | Ibm | Speech presence detector |
| US3751602A (en) * | 1971-08-13 | 1973-08-07 | Bell Telephone Labor Inc | Loudspeaking telephone |
| FR2380612A1 (fr) * | 1977-02-09 | 1978-09-08 | Thomson Csf | Dispositif de discrimination des signaux de parole et systeme d'alternat comportant un tel dispositif |
| US4484344A (en) * | 1982-03-01 | 1984-11-20 | Rockwell International Corporation | Voice operated switch |
| GB2137458B (en) * | 1983-03-01 | 1986-11-19 | Standard Telephones Cables Ltd | Digital handsfree telephone |
- 1988-03-25 DE DE19883810068 patent/DE3810068A1/de active Granted
- 1989-02-20 EP EP19890102876 patent/EP0334023A3/fr not_active Withdrawn
Also Published As
| Publication number | Publication date |
|---|---|
| EP0334023A3 (fr) | 1991-02-06 |
| DE3810068A1 (de) | 1989-10-05 |
| DE3810068C2 (fr) | 1990-01-11 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| DE69331181T2 (de) | Tonverstärkervorrichtung mit automatischer Unterdrückung akustischer Rückkopplung | |
| DE2719973C2 (fr) | ||
| DE4126902C2 (de) | Sprachintervall - Feststelleinheit | |
| DE602004001241T2 (de) | Vorrichtung zur Unterdrückung von impulsartigen Windgeräuschen | |
| DE3802903C2 (fr) | ||
| DE3687684T2 (de) | Automatischer pegelregler in einer digitalen datenverarbeitungsanlage. | |
| DE1248225B (de) | Verfahren und Vorrichtung zum genauen Ermitteln der Herzschlagfrequenz | |
| DE3525472A1 (de) | Anordnung zum detektieren impulsartiger stoerungen und anordnung zum unterdruecken impulsartiger stoerungen mit einer anordnung zum detektieren impulsartiger stoerungen | |
| DE2020753A1 (de) | Einrichtung zum Erkennen vorgegebener Sprachlaute | |
| EP1101390B1 (fr) | Appareil auditif permettant une meilleure comprehension de la parole grace a un traitement de signal selectif en frequence, et procede permettant de faire fonctionner un tel appareil auditif | |
| CH691787A5 (de) | Klirrunterdruckung bei Hörgeräten mit AGC. | |
| EP0334023A2 (fr) | Procédé de détection de signaux de parole | |
| DE3733983A1 (de) | Verfahren zum daempfen von stoerschall in von hoergeraeten uebertragenen schallsignalen | |
| DE3102385A1 (de) | Schaltungsanordnung zur selbstaetigen aenderung der einstellung von tonwiedergabegeraeten, insbesondere rundfunkempfaengern | |
| EP3588498B1 (fr) | Procédé de suppression d'une réverbération acoustique dans un signal audio | |
| DE3101483A1 (de) | Datenerkennungsdetektor bei einer zeitabhaengigen sprechinterpoliereinrichtung | |
| EP1458216A2 (fr) | Appareil et procédé à l'adaption de microphones dans une prothèse auditive | |
| EP0777130B1 (fr) | Procédé digital de détection d'impulsions courtes et appareil pour la mise en oeuvre du procédé | |
| DE3734446C2 (fr) | ||
| DE69208602T2 (de) | Ein den Frequenzhub begrenzender Übertragungsschaltkreis | |
| DE3779708T2 (de) | Schaltungsanordnung zur isolationsgewaehrung zwischen den uebertragungswegen eines freisprechapparates. | |
| CH654962A5 (de) | Zentrale schaltungseinrichtung zur sprecherkennung fuer ein tasi-system. | |
| DE19854341A1 (de) | Verfahren und Schaltungsanordnung zur Sprachpegelmessung in einem Sprachsignalverarbeitungssystem | |
| DE69906640T2 (de) | Verfahren zur Wiederherstellung der Radarempfindlichkeit bei gepulster elektromagnetischer Störung | |
| DE102005043314B4 (de) | Verfahren zum Dämpfen von Störschall und entsprechende Hörvorrichtung |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PUAI | Public reference made under article 153(3) EPC to a published international application that has entered the European phase | Free format text: ORIGINAL CODE: 0009012 |
| | AK | Designated contracting states | Kind code of ref document: A2; Designated state(s): AT BE CH DE ES FR GB IT LI LU NL SE |
| | PUAL | Search report despatched | Free format text: ORIGINAL CODE: 0009013 |
| | RAP1 | Party data changed (applicant data changed or rights of an application transferred) | Owner name: TELENORMA GMBH |
| | AK | Designated contracting states | Kind code of ref document: A3; Designated state(s): AT BE CH DE ES FR GB IT LI LU NL SE |
| | 17P | Request for examination filed | Effective date: 19910306 |
| | 17Q | First examination report despatched | Effective date: 19921221 |
| | STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: THE APPLICATION HAS BEEN WITHDRAWN |
| | 18W | Application withdrawn | Withdrawal date: 19930408 |