WO2005098820A1 - Speech recognition device and speech recognition method - Google Patents
- Publication number
- WO2005098820A1 (application PCT/JP2005/005052)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- noise
- parameter
- speech recognition
- model
- memory
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/20—Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G10L2015/0635—Training updating or merging of old and new templates; Mean values; Weighting
Definitions
- the present invention relates to, for example, a voice recognition device and a voice recognition method for recognizing an uttered voice.
- Clustering is performed so that a plurality of similar distributions are assigned to the same group, generating a predetermined number of clusters. Model synthesis is then performed on the centroid of each cluster. For this reason, model synthesis can be performed faster than with generally known model synthesis processing.
- The amount of computation required for model synthesis processing of one distribution is large. For example, when the technique is applied to so-called embedded devices such as car navigation systems, mounting it on the device may be difficult because of constraints on processing time and device mounting space.
- noise models for various noise environment categories are prepared in the memory in advance, and a noise model corresponding to the noise environment category of the voice input signal is selected.
- Noise adaptive processing can be performed.
- As the number of noise environment categories and noise models prepared in advance increases, the amount of memory needed to store them also increases dramatically, which made this approach difficult to use in embedded speech recognition devices built into mobile devices and vehicle-mounted devices.
- Patent Document 1: Japanese Patent Application Laid-Open No. H10-161692
- An example of the problem to be solved by the present invention is to provide a speech recognition device and a speech recognition method in which the function of noise adaptive processing in the speech recognition process is improved and the amount of memory used is reduced.
- The invention according to claim 1 is a speech recognition device that performs noise adaptation processing based on a noise model on an input speech signal in order to perform speech recognition on that signal. The device comprises:
- first storage means for calculating in advance a first parameter representative of the plurality of noise models included in each of a plurality of noise environment categories, and storing the first parameter for each noise environment category;
- second storage means for calculating and storing in advance a second parameter representing relative position information between each of the plurality of noise models and the first parameter;
- estimating means for estimating the noise environment category to which the environmental noise belongs, based on characteristics of the environmental noise superimposed on the input speech signal;
- selecting means for selecting and extracting, from the first storage means, the first parameter corresponding to the estimated noise environment category; and
- noise adaptation means for restoring a noise model adapted to the environmental noise using the first parameter extracted by the selecting means and the second parameter read from the second storage means, and for performing noise adaptation processing on the input speech signal using the restored noise model.
- The invention according to claim 7 is a speech recognition method that performs noise adaptation processing based on a noise model on an input speech signal in order to perform speech recognition on that signal. The method comprises: calculating in advance a first parameter representative of the plurality of noise models included in each of a plurality of noise environment categories, and storing it in a first memory for each noise environment category; calculating in advance a second parameter representing relative position information between each of the plurality of noise models and the first parameter, and storing it in a second memory; estimating the noise environment category to which the environmental noise belongs, based on characteristics of the environmental noise superimposed on the input speech signal; selecting and extracting the first parameter corresponding to the estimated noise environment category from the first memory; and restoring a noise model adapted to the environmental noise using the extracted first parameter and the second parameter read from the second memory, and performing noise adaptation processing on the input speech signal using the restored noise model.
- FIG. 1 is a block diagram showing an example of a speech recognition device according to the present invention.
- FIG. 2 is a flowchart showing the preparation stage process in the speech recognition device of FIG. 1.
- FIG. 3 is a schematic diagram showing the configuration of a cluster formed in the process of FIG. 2.
- FIG. 4 is a configuration diagram showing the contents of the centroid database storage unit 104 in the speech recognition device of FIG. 1.
- FIG. 5 is a flowchart showing the speech recognition process in the speech recognition device of FIG. 1.
- FIG. 6 is an explanatory diagram showing how noise categories are selected in the environment estimation processing of FIG. 5.
- FIG. 7 is a block diagram showing a second embodiment of the speech recognition apparatus according to the present invention.
- FIG. 8 is a flowchart illustrating an example of the non-stationary parameter exclusion process.
- FIG. 9 is a diagram of noise parameters showing an example of applying the non-stationary parameter exclusion process.
- FIG. 1 shows a speech recognition apparatus according to an embodiment of the present invention.
- The speech recognition device 10 shown in the figure may be used as a stand-alone device, for example, or may be built into another device such as a mobile phone or a car navigation device.
- The feature parameter extraction unit 101 takes the speech section of the input uttered speech signal and the non-speech sections before and after it, converts the acoustic signals in those sections into feature parameters that represent their acoustic features, and extracts them.
- the environment estimating unit 102 is a unit that determines an environment category of noise superimposed on the input uttered speech signal based on the feature parameters of the non-speech section.
- For automobile-related noise, for example, engine noise corresponds to one noise environment category, and noise from the car air conditioner corresponds to another.
- The model selection and extraction unit 103 selects and extracts, from the centroid database storage unit 104 (hereinafter simply referred to as "storage unit 104"), the various data relating to the noise models included in the category estimated by the environment estimating unit 102.
- The noise adaptation processing unit 105 executes noise adaptation processing, by a method such as the Jacobian adaptation method, using the various data selected and extracted.
- Based on the result of the noise adaptation processing described above, the model restoring unit 106 performs model restoration processing using the difference vectors stored in advance in the difference vector database storage unit 107 (hereinafter simply referred to as "storage unit 107").
- The details of the various data and the difference vectors stored in advance in the storage unit 104 and the storage unit 107 will be described later.
- Based on the acoustic model output from the model restoring unit 106, the keyword model generation unit 108 extracts recognition candidates from the vocabulary stored in the keyword dictionary storage unit 109 (hereinafter simply referred to as "storage unit 109") and generates keyword models as acoustic patterns.
- The matching unit 110 applies the feature parameters of the speech section supplied from the feature parameter extraction unit 101 to each of the keyword models generated by the keyword model generation unit 108, calculates the matching likelihood, and performs speech recognition processing on the input uttered speech signal.
- The arrows in the figure indicate the flow of the main signals between the components.
- Various accompanying signals, such as response signals and monitoring signals, may also be transmitted in the direction opposite to the arrows.
- the divisions and signal paths of each component shown in the figure are for convenience of explanation of the operation, and it is not necessary to realize the configuration as described in an actual device.
- In step S201 of FIG. 2, a clustering process is performed that groups similar distributions of the input acoustic model (hereinafter referred to as "distributions").
- The number of groups (hereinafter referred to as "clusters") formed by the clustering process is set in advance, and the clustering process continues until the number of generated clusters reaches that preset number.
- Cluster information, indicating for example which cluster each distribution belongs to, is generated as the clustering process proceeds.
- In step S205, the difference vector between each acoustic model distribution belonging to a cluster and the centroid (center of gravity) of that cluster is calculated, that is, $m(n) - g(i)$, where:
- $m(n)$ is an acoustic model distribution belonging to cluster $i$
- $g(i)$ is the centroid of cluster $i$
- the value of the difference vector calculated in step S205 is stored in the storage unit 107 of the speech recognition device 10.
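The preparation stage up to step S205 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: each distribution is reduced to its mean vector, plain k-means stands in for the clustering procedure (which the text does not specify), and the names `kmeans` and `difference_vectors` are hypothetical.

```python
import random


def kmeans(means, k, iters=20, seed=0):
    """Group mean vectors into k clusters (assumed stand-in for step S201)."""
    rng = random.Random(seed)
    centroids = [list(m) for m in rng.sample(means, k)]
    assign = [0] * len(means)
    for _ in range(iters):
        # Assign each distribution to its nearest centroid.
        for n, m in enumerate(means):
            assign[n] = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(m, centroids[i])),
            )
        # Recompute each centroid g(i); keep the old one if a cluster is empty.
        for i in range(k):
            members = [means[n] for n in range(len(means)) if assign[n] == i]
            if members:
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centroids


def difference_vectors(means, assign, centroids):
    """Step S205: m(n) - g(i) for every distribution n belonging to cluster i."""
    return [
        [a - b for a, b in zip(m, centroids[assign[n]])]
        for n, m in enumerate(means)
    ]
```

Only the centroids and the difference vectors need to be kept: adding a difference vector back to its centroid recovers the original distribution mean.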
- In step S207, a predetermined noise model is prepared, and model synthesis of this noise model with the centroid of each cluster obtained in step S203 is performed.
- The centroid of each cluster after model synthesis is stored in the storage unit 104 of the speech recognition device 10.
- A plurality of noise models are prepared for each environmental noise category, and model synthesis is performed for each noise model. Therefore, from one cluster centroid before model synthesis, as many post-synthesis clusters are generated as there are noise models subjected to the model synthesis processing.
- FIG. 4 shows how various data obtained by the model synthesis processing in step S207 are stored in the storage unit 104.
- For each environmental noise category, the storage unit 104 stores three types of data.
- predetermined data is stored in advance in the storage unit 104 and the storage unit 107 of the speech recognition apparatus 10 according to the present embodiment.
- When an uttered speech signal is input to the speech recognition device 10, the feature parameter conversion process of step S301 shown in FIG. 5 is executed.
- The feature parameter extraction unit 101 of the speech recognition device 10 converts the input uttered speech signal into utterance parameters such as the LPC cepstrum or MFCC (mel-frequency cepstral coefficients).
- The type of utterance parameter used is not limited to these examples. Any parameter that expresses the acoustic characteristics of the speech signal can be used in the same way, provided the acoustic model is expressed using parameters of the same format.
- The feature parameters of the speech section are supplied from the feature parameter extraction unit 101 to the matching unit 110, and the feature parameters of the non-speech section are supplied to the environment estimating unit 102.
- The environment estimation processing in the next step S303 estimates the environmental noise superimposed on the input uttered speech in order to select a predetermined category from among the plurality of environmental noise categories stored in the storage unit 104.
- That is, the environment estimating unit 102 estimates the environmental noise of the input utterance signal based on the feature parameters of the non-speech section, and obtains the noise category corresponding to it.
- The storage unit 104 stores noise models that are representative of the different environmental noise categories.
- The environment estimating unit 102 calculates the noise likelihood for each noise category based on these noise models and the feature parameters of the non-speech section.
- The noise models include the mean and covariance of feature parameters calculated using a database covering many kinds of environmental noise. Therefore, the noise likelihood for each environment category can be obtained by fitting the utterance parameters, which are the feature parameters of the non-speech section, to the normal distribution obtained from the mean and covariance of the noise model.
- FIG. 6 shows a case where the noise likelihood is obtained by fitting the utterance parameters, which are the feature parameters of the non-speech section, to the normal distributions of three noise models for noise categories 1 to 3. In the example shown in FIG. 6, when the utterance parameters indicating the environmental noise of the input speech signal are fitted to the noise models of noise categories 1 to 3, the noise likelihood of noise category 2 is higher than that of the other two. Therefore, in the figure, noise category 2 is selected as the estimation result for the environmental noise category.
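The environment estimation of step S303 can be illustrated with a short sketch. It assumes diagonal covariance (the text only says "mean and covariance") and sums log-likelihoods over the non-speech frames; the function names and the dictionary layout of `noise_models` are hypothetical.

```python
import math


def log_likelihood(frames, mean, var):
    """Sum of diagonal-Gaussian log densities over non-speech frames."""
    total = 0.0
    for x in frames:
        for xd, md, vd in zip(x, mean, var):
            total += -0.5 * (math.log(2 * math.pi * vd) + (xd - md) ** 2 / vd)
    return total


def estimate_category(frames, noise_models):
    """Return the category whose noise model gives the highest likelihood.

    noise_models maps a category label to a (mean, variance) pair.
    """
    return max(
        noise_models,
        key=lambda c: log_likelihood(frames, *noise_models[c]),
    )
```

With three categories whose means are well separated, frames lying near the mean of category 2 select category 2, which mirrors the situation drawn in FIG. 6.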
- the noise model is selected and extracted in the next step S305. That is, the model selection and extraction unit 103 selects various data on the noise category estimated by the environment estimation unit 102 from the database of the storage unit 104 and extracts them.
- The storage unit 104 stores, for each environmental noise category, the centroid data obtained by clustering the acoustic model distributions and synthesizing each cluster centroid with the noise model, the noise model itself, and the noise adaptation data corresponding to each centroid.
- these data belonging to the selected noise category are loaded from the storage unit 104 to the noise adaptation unit 105.
- noise adaptation processing is performed by the noise adaptation unit 105.
- Various techniques can be used for such noise adaptation processing. For example, when noise adaptation is performed by the Jacobian adaptation method, the Jacobian matrix corresponding to the centroid of each cluster is also stored in advance in the centroid database storage unit 104. Then, when the noise adaptation process is performed in step S307, the Jacobian matrix data of the corresponding noise category is read from the storage unit 104 into the noise adaptation unit 105, and the Jacobian adaptation method is carried out using that data.
- After the noise adaptation process, the model restoring unit 106 performs model restoration processing using the difference vector data prepared in advance in the storage unit 107. This processing yields the acoustic model after noise adaptation.
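The adaptation and restoration steps can be sketched together. The text names the Jacobian adaptation method but gives no formula, so the first-order update used here (adapted centroid = synthesized centroid + J × change in noise mean) is an assumption based on the commonly cited form of that method; the function names are hypothetical, and vectors are plain Python lists.

```python
def jacobian_adapt(centroid, jacobian, noise_ref, noise_obs):
    """First-order update g' = g + J (n_obs - n_ref) for one cluster centroid."""
    delta = [o - r for o, r in zip(noise_obs, noise_ref)]
    return [
        g + sum(j * d for j, d in zip(row, delta))
        for g, row in zip(centroid, jacobian)
    ]


def restore_models(adapted_centroids, assign, diffs):
    """Model restoration: add each stored difference vector to its adapted centroid."""
    return [
        [g + d for g, d in zip(adapted_centroids[assign[n]], dn)]
        for n, dn in enumerate(diffs)
    ]
```

The point of the design is that adaptation runs once per centroid rather than once per distribution; the individual distributions are then recovered by simple vector addition.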
- a keyword model generation process is performed in step S311.
- The keyword model generation unit 108 extracts the vocabulary that is a candidate for speech recognition from the storage unit 109 and executes a keyword model generation process that formalizes the vocabulary as acoustic patterns.
- Matching processing by the matching unit 110 is then executed. That is, the feature parameters of the speech section supplied from the feature parameter extraction unit 101 are collated with each of the keyword models generated by the keyword model generation unit 108, and a keyword likelihood indicating the degree of matching for each keyword is calculated. Then, among the keyword likelihoods obtained in this way, the keyword having the highest value is output from the speech recognition device 10 as the recognition result for the input uttered speech.
- the amount of used memory can be reduced by converting a set of a plurality of initial synthesis models into a plurality of initial synthesis clusters and storing the converted clusters.
- This makes it easy to mount a speech recognition device with high noise adaptation capability in embedded equipment.
- Because the difference vectors can be shared, the required configuration can be simplified and performance improved at the same time.
- If a speaker adaptation function is added to the present embodiment and speaker adaptation is performed using the difference vectors, the speaker characteristics remain reflected even when the contents of the centroid database are upgraded, so speech recognition is possible in the upgraded environment.
- FIG. 7 is a block diagram showing a second embodiment of the speech recognition apparatus according to the present invention.
- parts that are the same as the respective parts of the speech recognition apparatus 10 shown in FIG. 1 are given the same reference numerals, and description thereof will not be repeated.
- In addition to the components of the speech recognition device 10 shown in FIG. 1, the illustrated speech recognition device 20 is characterized by a non-stationary parameter exclusion processing unit 111 provided between the feature parameter extraction unit 101 and the environment estimating unit 102.
- The non-stationary parameter exclusion processing unit 111 excludes the feature parameters that correspond to non-stationary parameters from the set of feature parameters (called a noise set) supplied from the feature parameter extraction unit 101.
- FIG. 8 is a flowchart showing an example of the non-stationary parameter exclusion processing performed by the non-stationary parameter exclusion processing unit 111.
- In step S401, the non-stationary parameter exclusion processing unit 111 performs a clustering process that classifies one input noise set into a plurality of groups.
- In step S402, the similarity between the centroids of the clusters is obtained.
- In step S403, it is determined whether the lowest similarity between cluster centroids is equal to or less than a predetermined threshold.
- If the similarity is equal to or less than the threshold, the process proceeds to step S404, where the feature parameters belonging to the cluster with the smaller number of elements (the number of feature parameters belonging to the set) are excluded. Subsequently, the process proceeds to step S405 to generate an adaptive noise model (corresponding to the environment estimation process of step S303 in FIG. 5). If the similarity is larger than the predetermined threshold in step S403, the process proceeds to step S405 without executing step S404.
- Following execution of step S405, an environment category selection process (corresponding to the model selection and extraction process of step S305 in FIG. 5) is performed in step S406. Next, an application example of the non-stationary parameter exclusion process will be described.
- Fig. 9 shows a noise set in which noise parameters corresponding to environment A are mixed with noise parameters affected by sudden noise.
- The environment is determined by calculating the centroid of the entire noise set and computing its similarity to the noise models representing environment A and environment B. As a result, as shown in the figure, the centroid of the noise set is more similar to the noise model of environment B than to that of environment A, and the environment is erroneously determined to be environment B.
- When the non-stationary parameter exclusion process is applied, the noise parameters determined to be sudden noise are excluded, and the centroid is obtained from the remaining noise parameters.
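The exclusion flow of steps S401 to S404 can be sketched as follows. Several details are assumptions because the text leaves them open: the noise set is split into exactly two clusters, the split is a simple two-means seeded with the two mutually farthest points, and similarity is taken as 1 / (1 + Euclidean distance between centroids). All function names are hypothetical.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def two_means(points, iters=10):
    """Split a noise set into two clusters (assumed stand-in for step S401)."""
    # Seed with the two mutually farthest points so the split is deterministic.
    a, b = max(
        ((p, q) for p in points for q in points),
        key=lambda pq: math.dist(*pq),
    )
    cents = [list(a), list(b)]
    groups = [[], []]
    for _ in range(iters):
        groups = [[], []]
        for p in points:
            nearest = 0 if math.dist(p, cents[0]) <= math.dist(p, cents[1]) else 1
            groups[nearest].append(p)
        cents = [centroid(g) if g else c for g, c in zip(groups, cents)]
    return groups, cents


def exclude_nonstationary(points, threshold):
    """Steps S402 to S404: drop the smaller cluster if the centroids are dissimilar."""
    groups, cents = two_means(points)
    similarity = 1.0 / (1.0 + math.dist(cents[0], cents[1]))  # assumed measure
    if similarity <= threshold and groups[0] and groups[1]:
        return max(groups, key=len)
    return list(points)
```

The centroid of the returned parameters, rather than of the whole noise set, would then be compared against the noise models of each environment, so a few frames of sudden noise no longer pull the estimate toward the wrong environment.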
- the environmental noise is added to the noise model.
- data such as a center of gravity value may be calculated based on the accumulated data.
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- Navigation (AREA)
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP05721202A EP1732063A4 (en) | 2004-03-31 | 2005-03-15 | LANGUAGE RECOGNITION AND LANGUAGE RECOGNITION METHOD |
| US11/547,322 US7813921B2 (en) | 2004-03-31 | 2005-03-15 | Speech recognition device and speech recognition method |
| JP2006511980A JP4340686B2 (ja) | 2004-03-31 | 2005-03-15 | Speech recognition device and speech recognition method |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2004102904 | 2004-03-31 | ||
| JP2004-102904 | 2004-03-31 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2005098820A1 (ja) | 2005-10-20 |
Family
ID=35125309
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2005/005052 Ceased WO2005098820A1 (ja) | 2004-03-31 | 2005-03-15 | Speech recognition device and speech recognition method |
Country Status (4)
| Country | Link |
|---|---|
| US (1) | US7813921B2 (ja) |
| EP (1) | EP1732063A4 (ja) |
| JP (1) | JP4340686B2 (ja) |
| WO (1) | WO2005098820A1 (ja) |
Families Citing this family (12)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US8121837B2 (en) | 2008-04-24 | 2012-02-21 | Nuance Communications, Inc. | Adjusting a speech engine for a mobile computing device based on background noise |
| JP4640463B2 (ja) * | 2008-07-11 | 2011-03-02 | ソニー株式会社 | Playback device, display method, and display program |
| US8234111B2 (en) * | 2010-06-14 | 2012-07-31 | Google Inc. | Speech and noise models for speech recognition |
| US8635067B2 (en) | 2010-12-09 | 2014-01-21 | International Business Machines Corporation | Model restructuring for client and server based automatic speech recognition |
| US9406299B2 (en) * | 2012-05-08 | 2016-08-02 | Nuance Communications, Inc. | Differential acoustic model representation and linear transform-based adaptation for efficient user profile update techniques in automatic speech recognition |
| US9401140B1 (en) * | 2012-08-22 | 2016-07-26 | Amazon Technologies, Inc. | Unsupervised acoustic model training |
| US20140278415A1 (en) * | 2013-03-12 | 2014-09-18 | Motorola Mobility Llc | Voice Recognition Configuration Selector and Method of Operation Therefor |
| DE102013111784B4 (de) * | 2013-10-25 | 2019-11-14 | Intel IP Corporation | Audio processing devices and audio processing methods |
| KR20170044849A (ko) * | 2015-10-16 | 2017-04-26 | 삼성전자주식회사 | Electronic device and TTS conversion method utilizing a common acoustic data set of multiple languages and speakers |
| US11094316B2 (en) * | 2018-05-04 | 2021-08-17 | Qualcomm Incorporated | Audio analytics for natural language processing |
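| CN112201270B (zh) * | 2020-10-26 | 2023-05-23 | 平安科技(深圳)有限公司 | Speech noise processing method, apparatus, computer device, and storage medium |
| CN119541519B (zh) * | 2025-01-20 | 2025-04-29 | 浙江华消科技有限公司 | Speech recognition method and apparatus for a rescue robot, and rescue robot |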
Citations (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JPH09258765A (ja) * | 1996-03-25 | 1997-10-03 | Kokusai Denshin Denwa Co Ltd <Kdd> | Method and apparatus for correcting the start of a speech section for speech recognition, and speech recognition method |
| JPH10149191A (ja) * | 1996-09-20 | 1998-06-02 | Nippon Telegr & Teleph Corp <Ntt> | Model adaptation method, apparatus, and storage medium therefor |
| JPH10161692A (ja) * | 1996-12-03 | 1998-06-19 | Canon Inc | Speech recognition device and speech recognition method |
| JP2002014692A (ja) * | 2000-06-28 | 2002-01-18 | Matsushita Electric Ind Co Ltd | Acoustic model creation apparatus and method |
| JP2002091485A (ja) * | 2000-09-18 | 2002-03-27 | Pioneer Electronic Corp | Speech recognition system |
| JP2002330587A (ja) * | 2001-05-07 | 2002-11-15 | Sony Corp | Rectifier circuit for commercial power supply |
| JP2003330484A (ja) * | 2002-05-17 | 2003-11-19 | Pioneer Electronic Corp | Speech recognition device and speech recognition method |
| EP1400952A1 (en) * | 2002-09-18 | 2004-03-24 | Pioneer Corporation | Speech recognition adapted to environment and speaker |
Family Cites Families (17)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US4811404A (en) * | 1987-10-01 | 1989-03-07 | Motorola, Inc. | Noise suppression system |
| FR2765715B1 (fr) * | 1997-07-04 | 1999-09-17 | Sextant Avionique | Method for searching for a noise model in noisy sound signals |
| US6188982B1 (en) * | 1997-12-01 | 2001-02-13 | Industrial Technology Research Institute | On-line background noise adaptation of parallel model combination HMM with discriminative learning using weighted HMM for noisy speech recognition |
| JP3434730B2 (ja) | 1999-05-21 | 2003-08-11 | Necエレクトロニクス株式会社 | Speech recognition method and apparatus |
| US6615170B1 (en) * | 2000-03-07 | 2003-09-02 | International Business Machines Corporation | Model-based voice activity detection system and method using a log-likelihood ratio and pitch |
| US6766292B1 (en) * | 2000-03-28 | 2004-07-20 | Tellabs Operations, Inc. | Relative noise ratio weighting techniques for adaptive noise cancellation |
| DE10017646A1 (de) * | 2000-04-08 | 2001-10-11 | Alcatel Sa | Noise suppression in the time domain |
| US7089182B2 (en) * | 2000-04-18 | 2006-08-08 | Matsushita Electric Industrial Co., Ltd. | Method and apparatus for feature domain joint channel and additive noise compensation |
| US6741873B1 (en) * | 2000-07-05 | 2004-05-25 | Motorola, Inc. | Background noise adaptable speaker phone for use in a mobile communication device |
| DE10041456A1 (de) * | 2000-08-23 | 2002-03-07 | Philips Corp Intellectual Pty | Method for controlling devices by means of voice signals, in particular in motor vehicles |
| US7617099B2 (en) * | 2001-02-12 | 2009-11-10 | FortMedia Inc. | Noise suppression by two-channel tandem spectrum modification for speech signal in an automobile |
| US6859420B1 (en) * | 2001-06-26 | 2005-02-22 | Bbnt Solutions Llc | Systems and methods for adaptive wind noise rejection |
| US6959276B2 (en) * | 2001-09-27 | 2005-10-25 | Microsoft Corporation | Including the category of environmental noise when processing speech signals |
| US6937980B2 (en) * | 2001-10-02 | 2005-08-30 | Telefonaktiebolaget Lm Ericsson (Publ) | Speech recognition using microphone antenna array |
| JP2003308091A (ja) * | 2002-04-17 | 2003-10-31 | Pioneer Electronic Corp | Speech recognition device, speech recognition method, and speech recognition program |
| US7047047B2 (en) * | 2002-09-06 | 2006-05-16 | Microsoft Corporation | Non-linear observation model for removing noise from corrupted signals |
| JP2004325897A (ja) * | 2003-04-25 | 2004-11-18 | Pioneer Electronic Corp | Speech recognition device and speech recognition method |
-
2005
- 2005-03-15 EP EP05721202A patent/EP1732063A4/en not_active Withdrawn
- 2005-03-15 JP JP2006511980A patent/JP4340686B2/ja not_active Expired - Fee Related
- 2005-03-15 US US11/547,322 patent/US7813921B2/en not_active Expired - Fee Related
- 2005-03-15 WO PCT/JP2005/005052 patent/WO2005098820A1/ja not_active Ceased
Non-Patent Citations (4)
| Title |
|---|
| AKAE ET AL.: "Zatsuon Kankyo eno Jacobi Tekioho no Kakucho.", THE ACOUSTICAL SOCIETY OF JAPAN (ASJ) KOEN RONBUNSHU, vol. 1, 15 March 2000 (2000-03-15), pages 7 - 8, XP008082904 * |
| IDA ET AL.: "Zatsuon DB to Model Tekioa o Mochiita HMM Goseiho ni okeru Zatsuon Hendo Taisei no Hyoka.", THE ACOUSTICAL SOCIETY OF JAPAN (ASJ) KOEN RONBUNSHU, vol. 1, 2 October 2001 (2001-10-02), pages 33 - 34, XP008082930 * |
| NOGUCHI ET AL.: "1 Channel Nyuryoku Shingochu no Toppatsusei Zatsuon no Hanbetsu to Jokyo.", THE ACOUSTICAL SOCIETY OF JAPAN (ASJ) KOEN RONBUNSHU, vol. 1, 17 March 2004 (2004-03-17), pages 655 - 656, XP008082931 * |
| See also references of EP1732063A4 * |
Cited By (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
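| JP2008216618A (ja) * | 2007-03-05 | 2008-09-18 | Fujitsu Ten Ltd | Voice discrimination device |
| WO2008126347A1 (ja) * | 2007-03-16 | 2008-10-23 | Panasonic Corporation | Voice analysis device, voice analysis method, voice analysis program, and system integration circuit |
| JP5038403B2 (ja) * | 2007-03-16 | 2012-10-03 | パナソニック株式会社 | Voice analysis device, voice analysis method, voice analysis program, and system integration circuit |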
| US8478587B2 (en) | 2007-03-16 | 2013-07-02 | Panasonic Corporation | Voice analysis device, voice analysis method, voice analysis program, and system integration circuit |
| US8996373B2 (en) | 2010-12-27 | 2015-03-31 | Fujitsu Limited | State detection device and state detecting method |
| CN103826080A (zh) * | 2012-11-16 | 2014-05-28 | 杭州海康威视数字技术股份有限公司 | Method and system for batch upgrading of hard disk video recorders |
Also Published As
| Publication number | Publication date |
|---|---|
| EP1732063A4 (en) | 2007-07-04 |
| US20080270127A1 (en) | 2008-10-30 |
| US7813921B2 (en) | 2010-10-12 |
| EP1732063A1 (en) | 2006-12-13 |
| JPWO2005098820A1 (ja) | 2008-02-28 |
| JP4340686B2 (ja) | 2009-10-07 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US7660717B2 (en) | Speech recognition system and program thereof | |
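| JP4590692B2 (ja) | Acoustic model creation apparatus and method | |
| CN110310623B (zh) | Sample generation method, model training method, apparatus, medium, and electronic device | |
| CN1329883C (zh) | Noise adaptation system and method for speech models | |
| JP4340686B2 (ja) | Speech recognition device and speech recognition method | |
| JP3836815B2 (ja) | Speech recognition apparatus, speech recognition method, computer-executable program for causing a computer to execute the speech recognition method, and storage medium | |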
| US6182036B1 (en) | Method of extracting features in a voice recognition system | |
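| CN1726532A (zh) | Sensor-based speech recognizer selection, adaptation, and combination | |
| CN101548313A (zh) | Voice activity detection system and method | |
| CN1856820A (zh) | Speech recognition method and communication device | |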
| Akbacak et al. | Environmental sniffing: noise knowledge estimation for robust speech systems | |
| Das et al. | Bangladeshi dialect recognition using Mel frequency cepstral coefficient, delta, delta-delta and Gaussian mixture model | |
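| WO2018051945A1 (ja) | Speech processing device, speech processing method, and recording medium | |
| JP5740362B2 (ja) | Noise suppression device, method, and program | |
| JP5235187B2 (ja) | Speech recognition device, speech recognition method, and speech recognition program | |
| WO2020049687A1 (ja) | Speech processing device, speech processing method, and program recording medium | |
| JP4705414B2 (ja) | Speech recognition device, speech recognition method, speech recognition program, and recording medium | |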
| Loh et al. | Speech recognition interactive system for vehicle | |
| Charan et al. | Unveiling the challenges of speech recognition in noisy environments: A comprehensive review of issues and solutions | |
| Kannadaguli et al. | Phoneme modeling for speech recognition in Kannada using Hidden Markov Model | |
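| JP5104732B2 (ja) | Extended recognition dictionary learning device, speech recognition system using the same, method therefor, and program therefor | |
| JP2000259198A (ja) | Pattern recognition apparatus and method, and providing medium | |
| JP2007078943A (ja) | Acoustic score calculation program | |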
| Gomez et al. | Techniques in rapid unsupervised speaker adaptation based on HMM-sufficient statistics | |
| Stemmer et al. | A phone recognizer helps to recognize words better |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| AK | Designated states |
Kind code of ref document: A1 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW |
|
| AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
| WWE | Wipo information: entry into national phase |
Ref document number: 2005721202 Country of ref document: EP |
|
| WWE | Wipo information: entry into national phase |
Ref document number: 2006511980 Country of ref document: JP |
|
| NENP | Non-entry into the national phase |
Ref country code: DE |
|
| WWW | Wipo information: withdrawn in national office |
Country of ref document: DE |
|
| WWE | Wipo information: entry into national phase |
Ref document number: 11547322 Country of ref document: US |
|
| WWP | Wipo information: published in national office |
Ref document number: 2005721202 Country of ref document: EP |
