EP0334023A2 - Procédé de détection de signaux de parole - Google Patents

Info

Publication number: EP0334023A2
Authority: EP; European Patent Office
Prior art keywords: signal; speech; amplitude; signals; control amplifier
Prior art date: 1988-03-25
Legal status: Withdrawn

Application number

EP89102876A

Other languages

German (de)

English (en)

Other versions

EP0334023A3 (fr)

Inventor

Hans Wilhelm Dipl.-Ing. Gierlich

Current Assignee

Telenorma GmbH

Original Assignee

Telenorma Telefonbau und Normalzeit GmbH

Telefonbau und Normalzeit GmbH

Telenorma GmbH

Priority date

1988-03-25

Filing date

1989-02-20

Publication date

1989-09-27

1989-02-20 Application filed by Telenorma Telefonbau und Normalzeit GmbH, Telefonbau und Normalzeit GmbH, Telenorma GmbH

1989-09-27 Publication of EP0334023A2

1991-02-06 Publication of EP0334023A3

Status: Withdrawn

Classifications

- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals

Definitions

The invention relates to a method for recognizing speech signals in which the signals are first fed to a low-pass filter whose pass band lies in the range of the fundamental speech frequency.
The recognition of speech signals is of great importance, since the presence of speech signals can be used as a criterion for increasing the gain.
The amplification of the transmit and receive signals is controlled as a function of the presence of a speech signal. The same applies to conference facilities.
The object of the invention is to provide a method for recognizing speech signals in which the presence of speech is recognized after a very short time, without initial syllables being suppressed.
This object is achieved in that the signals appearing at the output of the low-pass filter are checked for the amplitude and duration of a specific amplitude and in that a speech signal is recognized when at least three successive amplitudes have occurred within a predetermined time frame.
The signals are first checked for maximum amplitude values. As soon as a maximum amplitude value is found, the time until a further maximum amplitude value occurs is measured, so that speech signals can be recognized in this way.
The three amplitudes A1 to A3 shown in FIG. 1 are the amplitudes of a speech signal present at the output of a low-pass filter whose cut-off frequency is approximately 400 Hz.
The signals supplied to the input of the low-pass filter are generated, for example, by a microphone and are composed of room noise and speech signals.
The method according to the invention essentially uses the frequency range of the fundamental speech frequency (80 to 333 Hz) for analysis.
The most important feature for the detection of speech signals is the period of the oscillations of the speech signal, which at the fundamental speech frequency lies in the range of 3 to 12.5 ms, depending on the speaker. This first feature is used to distinguish between speech and noise.
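The period range follows directly from the frequency range, since the period is the reciprocal of the frequency. A minimal Python sketch of that conversion (the function and constant names are illustrative only and do not come from the patent text):

```python
def period_ms(frequency_hz: float) -> float:
    """Return the period, in milliseconds, of an oscillation at frequency_hz."""
    return 1000.0 / frequency_hz


F0_MIN_HZ = 80.0    # lower edge of the fundamental speech frequency range
F0_MAX_HZ = 333.0   # upper edge of the fundamental speech frequency range

longest_period_ms = period_ms(F0_MIN_HZ)    # 12.5 ms
shortest_period_ms = period_ms(F0_MAX_HZ)   # roughly 3.0 ms
```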
Detecting zero crossings in the speech signal is not expedient, since interference such as noise can increase the number of zero crossings so greatly that speech can no longer be recognized in this way.
The method according to the invention uses the maxima of the speech signal to recognize speech. If these lie within a predetermined amplitude-time window, a first criterion for the presence of speech signals is met. The choice of window parameters has a significant influence on the period detection.
The window size is chosen to be smaller than half the smallest possible period of the fundamental speech frequency, so that both positive and negative maximum values of the speech signal can be recognized. This is necessary because the speech signal is not symmetrical with respect to the dynamic range.
The window size is therefore approximately 0.9 ms.
The amplitude tolerance of the maximum values is very small over a few periods in the case of an undisturbed speech signal, but can increase significantly at high interference levels due to additive superposition of the interference signal.
The amplitude window is approximately ±20% of the first maximum.
The amplitude A1 has been recognized as the maximum value, whereupon its duration t1 is stored as a period.
The period time window PF starts running at the temporal center of the amplitude A1 of the first maximum M1 and is open between 3 and 12.5 ms. If the next amplitude A2 falls within the period time window PF and its time window ZF lies within the amplitude window AF, A2 is identified as the second maximum and its duration is stored as the value t2.
The amplitude window AF is defined as a threshold that depends on the amplitude value of the first maximum M1.
A simple counting process that detects the three successive amplitudes A1 to A3 meeting the conditions described above is already sufficient to conclude that a speech signal is present; in this case it is not necessary to store the period durations t1 to t3.
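The counting process can be sketched in Python as follows. This is only an illustration under stated assumptions, not the patented implementation: it assumes the maxima have already been extracted from the low-pass filtered signal as time-ordered (time, amplitude) pairs, considers positive maxima only, and measures each period window from the most recently accepted maximum. All names are hypothetical; the numeric windows are the ones given in the text.

```python
from typing import List, Tuple

PERIOD_MIN_MS = 3.0          # period window opens 3 ms after a maximum
PERIOD_MAX_MS = 12.5         # period window closes 12.5 ms after a maximum
AMPLITUDE_TOLERANCE = 0.20   # amplitude window: +/-20 % of the first maximum
REQUIRED_MAXIMA = 3          # amplitudes A1 to A3


def detect_speech(maxima: List[Tuple[float, float]]) -> bool:
    """Return True if three successive maxima satisfy both windows.

    maxima is a time-ordered list of (time_ms, amplitude) pairs
    (assumption: extracted beforehand, positive amplitudes only).
    """
    for start in range(len(maxima)):
        first_time, first_amp = maxima[start]
        if first_amp <= 0:
            continue
        low = first_amp * (1.0 - AMPLITUDE_TOLERANCE)
        high = first_amp * (1.0 + AMPLITUDE_TOLERANCE)
        count = 1
        last_time = first_time
        for time_ms, amp in maxima[start + 1:]:
            gap = time_ms - last_time
            if gap < PERIOD_MIN_MS:
                continue            # period window not yet open
            if gap > PERIOD_MAX_MS:
                break               # period window closed without a match
            if low <= amp <= high:  # inside the amplitude window
                count += 1
                last_time = time_ms
                if count == REQUIRED_MAXIMA:
                    return True
    return False
```

For example, `detect_speech([(0.0, 1.0), (8.0, 0.95), (16.0, 1.05)])` accepts a regular 8 ms period, while a sequence whose second maximum falls to half the first amplitude is rejected. No period durations are stored, in line with the simplified method.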
Two methods, described below, can be used for a more precise determination of speech signals.
The degree of correlation between the individual periods is determined. A cross-correlation between successive signal sections of one period length yields high values of the normalized cross-correlation coefficient in the regions in which speech is present. If, however, the detected period merely consists of random maxima in the specified interval, the correlation analysis yields small values.
The second period or, when several periods are detected, the third period is correlated with the first. If three periods are correlated, the smaller of the two values is used for the decision. This reduces the frequency of errors caused by randomly detected periods, particularly when noise signals interfere. If more periods are used, detection becomes slower but no further improvement is achieved, since the values of the cross-correlation function KKF(k · N_p) decrease significantly due to the amplitude and frequency modulation of the speech signal.
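A minimal Python sketch of this correlation check is given below. It assumes the period length in samples is already known from the period detection, uses the zero-lag normalized cross-correlation coefficient, and returns the smaller of the two values when three periods are compared. The function names are hypothetical, and the text does not state a decision threshold, so none is applied here.

```python
import math
from typing import List, Sequence


def normalized_cross_correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Zero-lag normalized cross-correlation of two equal-length segments."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("segments must be non-empty and of equal length")
    energy = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    if energy == 0.0:
        return 0.0  # a silent segment carries no periodicity evidence
    return sum(x * y for x, y in zip(a, b)) / energy


def period_correlation(samples: Sequence[float], start: int,
                       period_len: int, num_periods: int = 3) -> float:
    """Correlate the later periods with the first and return the smallest value."""
    first = samples[start:start + period_len]
    values: List[float] = []
    for k in range(1, num_periods):
        segment = samples[start + k * period_len:start + (k + 1) * period_len]
        values.append(normalized_cross_correlation(first, segment))
    return min(values)
```

A strictly periodic input gives a value of 1, whereas segments that merely happen to contain maxima at the right spacing give small or negative values.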
A further improvement in the decision can be achieved if, instead of the cross-correlation function, the normalized mean square error between the recognized periods is evaluated for the speech decision.
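The error measure could be computed along the following lines. The text does not say how the error is normalized; dividing by the energy of the reference period is an assumption made for this sketch, and the function name is hypothetical.

```python
from typing import Sequence


def normalized_mse(reference: Sequence[float], candidate: Sequence[float]) -> float:
    """Squared error between two periods, divided by the reference energy.

    Normalizing by the reference energy is an assumption of this sketch.
    """
    if len(reference) != len(candidate) or len(reference) == 0:
        raise ValueError("periods must be non-empty and of equal length")
    ref_energy = sum(x * x for x in reference)
    if ref_energy == 0.0:
        raise ValueError("reference period has zero energy")
    error = sum((x - y) ** 2 for x, y in zip(reference, candidate))
    return error / ref_energy
```

With this normalization, identical periods give 0 and a silent candidate gives 1, so small values indicate periodic (speech-like) signal sections.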
The decisive advantage of this method for speech detection is the recognition time.
The detection time is 37.5 ms.
The analysis using the simplified method described at the beginning gives approximately the same results as the evaluation with cross-correlation or with the mean square error.
The detection rate is on average 5% below the detection rate of the previously described method, but can also assume higher values depending on the noise situation. Differences from the above-mentioned procedure become clear when the speech sequence is disturbed. With the selected parameters, the period detection can deliver an increased number of wrong decisions in some background noise situations.
Reflections of the interference signal that meet the criteria for the presence of speech are recognized as speech and lead to incorrect decisions.
Sinusoidal interference in the range of the fundamental speech frequency can be detected only on the basis of the duration and frequency constancy of the interference signal.
The choice of speech detection method is essentially determined by the expected useful/interference power ratios and the interfering noises.
At useful/interference power ratios of more than 12 dB, the simplified detection method can already be used without arithmetic operations.
All methods introduce only a short signal delay in the range of the detection time (9 to 37 ms), so that initial syllables are not suppressed.
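The 9 to 37 ms range is consistent with waiting for three successive fundamental periods at the two edges of the 3 to 12.5 ms period window. The short Python check below makes that arithmetic explicit; reading the detection time as three periods is an inference from the figures in the text, not something the text states.

```python
PERIODS_REQUIRED = 3        # three successive amplitudes A1 to A3
SHORTEST_PERIOD_MS = 3.0    # lower edge of the period window
LONGEST_PERIOD_MS = 12.5    # upper edge of the period window

fastest_detection_ms = PERIODS_REQUIRED * SHORTEST_PERIOD_MS   # 9.0 ms
slowest_detection_ms = PERIODS_REQUIRED * LONGEST_PERIOD_MS    # 37.5 ms
```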
The method presented can be implemented, for example, with the aid of a signal processor SP (see FIG. 2).
The analog signal from the microphone M is sampled and digitized by the analog/digital converter W1.
The sample values obtained in this way are used by the signal processor for speech detection according to the method of the invention. If speech is recognized, the microphone signal can be amplified by a fixed amount by the control amplifier RV1 at the instigation of the signal processor SP.
Such an arrangement is suitable, for example, for microphones located in a room with a high noise level.
The amplification of the speech signals results in better intelligibility.
In a hands-free device, when a speech signal is present in the signal of the microphone M, the signal processor SP causes the control amplifier RV2 to attenuate the signal for the loudspeaker LS accordingly, in order to prevent acoustic feedback between loudspeaker LS and microphone M.
The control amplifier RV2 could also be influenced by the signal processor SP in such a way that it amplifies the input signal to make the loudspeaker signal LS more intelligible.
The signal processor receives at its inputs SE and EE data words that represent the samples of the signals. Data words are likewise output on the lines connected to the outputs SA and EA of the signal processor SP. To avoid suppressing initial syllables, the signal processor SP can delay the input signals by a time in the range of the recognition time (5-37 ms). Likewise, the signal processor SP can apply a fall time to the control signals influencing the control amplifiers RV; this fall time is of the order of 200 to 900 ms and serves to bridge unvoiced sounds and short speech pauses between words and sentences.
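The delay and the fall time can be illustrated with two small Python helpers. This is a sketch under assumptions: the speech decision is taken to be available once per sample, the delay line is zero-filled at the start, and the 200 ms default is simply the lower end of the range quoted above. The function names are hypothetical.

```python
from typing import Iterable, List, Sequence


def delay_signal(samples: Sequence[float], delay_samples: int) -> List[float]:
    """Delay a signal by delay_samples, padding the start with zeros."""
    padded = [0.0] * max(0, delay_samples) + list(samples)
    return padded[:len(samples)]


def apply_fall_time(decisions: Iterable[bool], sample_period_ms: float,
                    fall_time_ms: float = 200.0) -> List[bool]:
    """Keep the control signal active for fall_time_ms after speech ends."""
    hold_samples = int(round(fall_time_ms / sample_period_ms))
    remaining = 0
    held: List[bool] = []
    for is_speech in decisions:
        if is_speech:
            remaining = hold_samples   # restart the fall time on every detection
            held.append(True)
        elif remaining > 0:
            remaining -= 1             # bridge a pause or unvoiced sound
            held.append(True)
        else:
            held.append(False)
    return held
```

Delaying the audio by roughly the detection time lets the gain change take effect before the first syllable reaches the output, while the fall time prevents the control amplifiers from switching back during short gaps.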
The low-pass filtering with a cut-off frequency of 400 Hz can also be carried out by the signal processor SP.
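A software low-pass filter of this kind might look like the following Python sketch. The text gives only the cut-off frequency; the first-order recursive structure and the 8 kHz sample rate used in the example are assumptions chosen for brevity, and a real implementation would probably use a steeper filter.

```python
import math
from typing import List, Sequence


def low_pass(samples: Sequence[float], sample_rate_hz: float,
             cutoff_hz: float = 400.0) -> List[float]:
    """First-order recursive low-pass filter (assumed structure, not from the text)."""
    dt = 1.0 / sample_rate_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)             # smoothing factor between 0 and 1
    state = 0.0
    filtered: List[float] = []
    for x in samples:
        state += alpha * (x - state)   # move the output towards the input
        filtered.append(state)
    return filtered


# Example: a 100 Hz tone (inside the pass band) and a 3 kHz tone (outside it).
SAMPLE_RATE_HZ = 8000.0                # assumed sample rate
low_tone = [math.sin(2.0 * math.pi * 100.0 * n / SAMPLE_RATE_HZ) for n in range(800)]
high_tone = [math.sin(2.0 * math.pi * 3000.0 * n / SAMPLE_RATE_HZ) for n in range(800)]
low_out = low_pass(low_tone, SAMPLE_RATE_HZ)
high_out = low_pass(high_tone, SAMPLE_RATE_HZ)
```

The 100 Hz tone passes almost unchanged, while the 3 kHz tone is strongly attenuated, so the maxima search then operates mainly on the fundamental-frequency component.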
Another conceivable application of the method according to the invention is an intercom system, in which the signal processor attenuates one direction as a function of speech signals in the other direction.
The signal processor is not discussed further in this description; such signal processors are sold, for example, by Texas Instruments under the designation TMS 320 or by Fujitsu under the designation MB 8764. The signal processor is to be programmed in such a way that the described method steps run automatically.
The analog/digital converters W1 and W4 convert the analog signals into digital signals for processing in the signal processor SP, while the digital/analog converters W2 and W3 convert the digital signals occurring at the outputs SA and EA into analog signals.
The control amplifiers RV1 and RV2 can also be dispensed with if the signal processor SP itself, which can also be designed as a suitable microprocessor, takes over the amplification of the signals.
The method according to the invention can also conceivably be implemented by means of a corresponding, discretely constructed analog circuit arrangement or a correspondingly designed custom circuit.

Landscapes

Engineering & Computer Science (AREA)
Computational Linguistics (AREA)
Signal Processing (AREA)
Health & Medical Sciences (AREA)
Audiology, Speech & Language Pathology (AREA)
Human Computer Interaction (AREA)
Physics & Mathematics (AREA)
Acoustics & Sound (AREA)
Multimedia (AREA)
Noise Elimination (AREA)
Interconnected Communication Systems, Intercoms, And Interphones (AREA)
Circuit For Audible Band Transducer (AREA)

EP19890102876 1988-03-25 1989-02-20 Procédé de détection de signaux de parole Withdrawn EP0334023A3 (fr)

Applications Claiming Priority (2)

Application Number	Priority Date	Filing Date	Title
DE3810068		1988-03-25
DE19883810068 DE3810068A1 (de)	1988-03-25	1988-03-25	Verfahren zur erkennung von sprachsignalen

Publications (2)

Publication Number	Publication Date
EP0334023A2 true EP0334023A2 (fr)	1989-09-27
EP0334023A3 EP0334023A3 (fr)	1991-02-06

Family

ID=6350648

Family Applications (1)

Application Number	Title	Priority Date	Filing Date
EP19890102876 Withdrawn EP0334023A3 (fr)	1988-03-25	1989-02-20	Procédé de détection de signaux de parole

Country Status (2)

Country	Link
EP (1)	EP0334023A3 (fr)
DE (1)	DE3810068A1 (fr)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number	Priority date	Publication date	Assignee	Title
WO1992013340A1 (fr) *	1991-01-18	1992-08-06	Theis Peter F	Systeme pour distinguer ou pour compter des expressions detaillees verbales
WO1997000515A1 (fr) *	1995-06-19	1997-01-03	Fjaellbrandt Tore	Procede et dispositif de determination d'une frequence de registre dans un signal acoustique
WO2000070602A1 (fr) *	1999-05-18	2000-11-23	Voxlab Oy	Procede d'evaluation de la rythmicite d'un signal numerique compose d'echantillons

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number	Priority date	Publication date	Assignee	Title
US5822726A (en) *	1995-01-31	1998-10-13	Motorola, Inc.	Speech presence detector based on sparse time-random signal samples
DE10321625B4 (de) *	2003-05-13	2007-08-23	Gehrke Kommunikationssyteme Gmbh	Signalübertragungsvorrichtung und Verfahren zum Regeln einer Signalübertragungsvorrichtung

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number	Priority date	Publication date	Assignee	Title
US3513260A (en) *	1967-10-13	1970-05-19	Ibm	Speech presence detector
US3751602A (en) *	1971-08-13	1973-08-07	Bell Telephone Labor Inc	Loudspeaking telephone
FR2380612A1 (fr) *	1977-02-09	1978-09-08	Thomson Csf	Dispositif de discrimination des signaux de parole et systeme d'alternat comportant un tel dispositif
US4484344A (en) *	1982-03-01	1984-11-20	Rockwell International Corporation	Voice operated switch
GB2137458B (en) *	1983-03-01	1986-11-19	Standard Telephones Cables Ltd	Digital handsfree telephone

1988-03-25 DE application DE19883810068 (DE3810068A1): Granted
1989-02-20 EP application EP19890102876 (EP0334023A3): Withdrawn

Also Published As

Publication number	Publication date
EP0334023A3 (fr)	1991-02-06
DE3810068A1 (de)	1989-10-05
DE3810068C2 (fr)	1990-01-11

Legal Events

Date	Code	Title	Description
1989-08-12	PUAI	Public reference made under article 153(3) epc to a published international application that has entered the european phase	Free format text: ORIGINAL CODE: 0009012
1989-09-27	AK	Designated contracting states	Kind code of ref document: A2 Designated state(s): AT BE CH DE ES FR GB IT LI LU NL SE
1990-12-17	PUAL	Search report despatched	Free format text: ORIGINAL CODE: 0009013
1991-01-23	RAP1	Party data changed (applicant data changed or rights of an application transferred)	Owner name: TELENORMA GMBH
1991-02-06	AK	Designated contracting states	Kind code of ref document: A3 Designated state(s): AT BE CH DE ES FR GB IT LI LU NL SE
1991-05-02	17P	Request for examination filed	Effective date: 19910306
1993-02-03	17Q	First examination report despatched	Effective date: 19921221
1993-04-28	STAA	Information on the status of an ep patent application or granted ep patent	Free format text: STATUS: THE APPLICATION HAS BEEN WITHDRAWN
1993-06-16	18W	Application withdrawn	Withdrawal date: 19930408

Similar Documents

Publication	Publication Date	Title
DE69331181T2 (de)	2002-04-18	Tonverstärkervorrichtung mit automatischer Unterdrückung akustischer Rückkopplung
DE2719973C2 (fr)	1988-03-10
DE4126902C2 (de)	1996-06-27	Sprachintervall - Feststelleinheit
DE602004001241T2 (de)	2006-11-09	Vorrichtung zur Unterdrückung von impulsartigen Windgeräuschen
DE3802903C2 (fr)	1993-09-09
DE3687684T2 (de)	1993-08-19	Automatischer pegelregler in einer digitalen datenverarbeitungsanlage.
DE1248225B (de)	1967-08-24	Verfahren und Vorrichtung zum genauen Ermitteln der Herzschlagfrequenz
DE3525472A1 (de)	1986-01-30	Anordnung zum detektieren impulsartiger stoerungen und anordnung zum unterdruecken impulsartiger stoerungen mit einer anordnung zum detektieren impulsartiger stoerungen
DE2020753A1 (de)	1971-02-11	Einrichtung zum Erkennen vorgegebener Sprachlaute
EP1101390B1 (fr)	2004-04-14	Appareil auditif permettant une meilleure comprehension de la parole grace a un traitement de signal selectif en frequence, et procede permettant de faire fonctionner un tel appareil auditif
CH691787A5 (de)	2001-10-15	Klirrunterdruckung bei Hörgeräten mit AGC.
EP0334023A2 (fr)	1989-09-27	Procédé de détection de signaux de parole
DE3733983A1 (de)	1989-04-20	Verfahren zum daempfen von stoerschall in von hoergeraeten uebertragenen schallsignalen
DE3102385A1 (de)	1982-09-02	Schaltungsanordnung zur selbstaetigen aenderung der einstellung von tonwiedergabegeraeten, insbesondere rundfunkempfaengern
EP3588498B1 (fr)	2023-09-13	Procédé de suppression d'une réverbération acoustique dans un signal audio
DE3101483A1 (de)	1981-12-03	Datenerkennungsdetektor bei einer zeitabhaengigen sprechinterpoliereinrichtung
EP1458216A2 (fr)	2004-09-15	Appareil et procédé à l'adaption de microphones dans une prothèse auditive
EP0777130B1 (fr)	2002-11-13	Procédé digital de détection d'impulsions courtes et appareil pour la mise en oeuvre du procédé
DE3734446C2 (fr)	1992-04-30
DE69208602T2 (de)	1996-08-14	Ein den Frequenzhub begrenzender Übertragungsschaltkreis
DE3779708T2 (de)	1993-05-13	Schaltungsanordnung zur isolationsgewaehrung zwischen den uebertragungswegen eines freisprechapparates.
CH654962A5 (de)	1986-03-14	Zentrale schaltungseinrichtung zur sprecherkennung fuer ein tasi-system.
DE19854341A1 (de)	2000-06-08	Verfahren und Schaltungsanordnung zur Sprachpegelmessung in einem Sprachsignalverarbeitungssystem
DE69906640T2 (de)	2004-02-19	Verfahren zur Wiederherstellung der Radarempfindlichkeit bei gepulster elektromagnetischer Störung
DE102005043314B4 (de)	2009-08-06	Verfahren zum Dämpfen von Störschall und entsprechende Hörvorrichtung
