WO1987007460A1 - Telephone actionne par la voix - Google Patents

Info

Publication number
WO1987007460A1
Authority
WO
WIPO (PCT)
Prior art keywords
routine
name
distance
word
status
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US1987/001260
Other languages
English (en)
Inventor
Devices Innovative
Siddarth Mehta
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Innovative Devices LLC
Original Assignee
Innovative Devices LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Innovative Devices LLC

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/26 Devices for calling a subscriber
    • H04M1/27 Devices whereby a plurality of signals may be stored simultaneously
    • H04M1/271 Devices whereby a plurality of signals may be stored simultaneously controlled by voice recognition

Definitions

  • the present invention relates to a telephone which is responsive to voice commands to dial automatically a selected phone number. New and unique voice recognition techniques are utilized to enable the telephone to respond to the proper voice command.
  • the next step was to find out why confusible names are confusible to a machine, even though they're not confusible to a human.
  • a vowel is a sound such as "Aaaa".
  • a plosive is the brief "B" sound in "Baaa" that happens when opening the mouth.
  • a fricative is the "Sh" sound in "Sharp".
  • A plosive lasts only about 30 milliseconds, compared with about 200 milliseconds for a vowel. Because it is so short, its importance in any matching algorithm is only 30/200 that of a vowel.
  • a plosive is important because it can sometimes hold the key to a confusible word, as the words may differ only in the plosives section.
  • only the vowel section may be different - such as TIM and TOM.
  • first names, rather than all words, exhibit the property of being recognizable by plosive differences. For example: Tim, Kim, Jim, etc.; Ron, Don, John, etc.; Brady, Grady, etc.; Gary, Harry, Larry, Mary, etc. I can go on and on with such names. Note that I discovered that this applies especially to first names, rather than to words in general. This is not public domain, but was discovered by me.
  • each button has three letters associated with it. For example, "5" has JKL.
  • no digit has plosives that are confusible within that set of three letters. For example, no digit has the plosives P,T,K on it or B,D,G on it. Each confusible plosive is on a separate digit.
  • this feature is also useful in distinguishing names that are confusible not because of the beginning plosive as is normally the case, but for any other reason.
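The keypad property described above can be checked with a short sketch. The letter layout below is an assumption (the classic keypad without Q or Z); the source only states that "5" carries JKL. The function names are hypothetical.

```python
# Letter groups as printed on a classic telephone keypad (no Q or Z).
# This full layout is an assumption; the text only confirms "5" = JKL.
KEYPAD = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PRS", "8": "TUV", "9": "WXY",
}

# Plosive groups the text describes as confusible with one another.
CONFUSIBLE_PLOSIVES = [set("PTK"), set("BDG")]


def digit_for_initial(letter):
    """Return the keypad digit that carries the given first initial."""
    letter = letter.upper()
    for digit, letters in KEYPAD.items():
        if letter in letters:
            return digit
    raise ValueError(f"no key carries {letter!r}")


def plosives_are_separated():
    """True if no single key holds two plosives from the same group."""
    for group in CONFUSIBLE_PLOSIVES:
        digits = [digit_for_initial(p) for p in group]
        if len(set(digits)) != len(digits):
            return False
    return True
```

Under this layout, M and N share the "6" key, which is the Matt/Nat exception that the safety net feature is meant to catch.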
  • the safety net feature kicks in: when you teach the name NAT SMITH, before the phone stores it in memory, it first tries to do a recognition check. If the name NAT SMITH can be correctly recognized over all other names under the "MNO" initial set (under the number 6, strictly speaking), then it will accept the name. If it finds it to be too confusible (determined by a threshold) with any other name, such as MATT SMITH, it will refuse to store it in memory, thereby ensuring that the phone will not misdial.
  • the purpose of the safety feature is to deal with any unknown situation, not just the first initial exceptions with rare instances such as Matt and Nat.
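A minimal sketch of the safety net check, assuming a generic distance function and a tuning threshold. Both `can_store` and `abs_diff_distance` are hypothetical names; the source does not give the actual routine or threshold value.

```python
def can_store(new_template, stored_templates, distance, threshold):
    """Return True if new_template is far enough from every stored template.

    stored_templates: templates already filed under the same initial key.
    distance: any function returning a non-negative dissimilarity score.
    threshold: minimum distance required (hypothetical tuning value).
    """
    return all(distance(new_template, t) >= threshold for t in stored_templates)


def abs_diff_distance(a, b):
    """Toy distance: sum of absolute differences of equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))
```

A name that fails the check is simply not stored, so a later recognition attempt can never pick between two near-identical templates.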
  • the SP1000 chip by General Instrument is a recognition/synthesis chip. It will synthesize pre-processed speech, i.e. speech that has been parameterized into LPC (Linear Predictive Coding) coefficients by another computer. Such speech is generally stored in a product in ROM in the form of canned phrases.
  • LPC Linear Predictive Coding
  • the chip can also in a crude, approximate fashion, provide pseudo-LPC parameters itself.
  • the chip parameterizes the word into pseudo-LPC coefficients. These are generally stored in RAM as "templates" of the words to be recognized in the future.
  • it is possible to use this chip to synthesize data collected during recognition (called "resynthesis") by sending it back out through the synthesis section. This is done by scaling the coefficients so that they act on the full range (x2 in our system) and scaling the energy with a linear function specified by a multiplier and a constant.
  • the two missing coefficients, K9 & K10, are set to zero as they are not generated in recognition mode.
  • the SP1000 has the capability to be driven by an external clock. Unfortunately, because of design problems, the chip's sample rate mechanism will not function correctly in synthesis mode with an external clock. It is necessary to switch the chip into recognition mode and then back to synthesis mode to force the chip to initialize itself properly; the chip will function normally after that.
  • the software interface is time critical. In recognition mode, if all the parameters are not read within one sample rate period, the coefficients will be distorted. To prevent this, the time to read a single parameter from the chip must be less than one "stage time," or 28+SR (SP1000) clock cycles. It is possible to write such a routine on our 1.8 MHz 6502 system. However, the NMI RAM refresh routine still has priority over the SP1000 frame collection routines, which causes timing distortions when the events overlap. By choosing a frame period of 16 ms, which is exactly 4x the period of the 4 ms NMI, it is possible to synchronize the SP1000 IRQ.
  • This routine waits for the end of the next NMI, then waits for 250 usec and writes to T1's inner register. This immediately modifies the internal frame timer (T1) so that it will always occur at this same time relative to the 4 ms NMI. Since the frame period is chosen to be exactly 4x the period of the NMI, and both are governed by the same master clock, synchronization is guaranteed.
  • the database contains all the information associated with memorizing a name. Some of these are:
  • Each name has a unique 'slot#' which is simply the index of the name. Slot#s can be in the range from 1 to max# of names (110). Slot#s used by active names are not necessarily contiguous.
  • Section I currently consists of two different types of data:
  • the fixed length data are stored individually as one or two byte arrays with a 'MAP' suffix.
  • SLOTMAP, STATUSMAP, NWDMAP, STOPMAP, SEGMAP... These are simply indexed to by the slot# (or slot#*2) of the name.
  • VTSLOT address of beginning of variable length record
  • VTSTATUS status byte of the name (currently contains only page info)
  • VTNWD # of frames the template contains (a frame is currently 9 bytes)
  • VTSTOP # of stopgaps *2 (each stopgap requires 2 bytes in record)
  • VTSEG # of segments in word
  • variable length record can be found at VTSLOT's address.
  • the records are managed dynamically by several memory management routines.
  • the database structure was designed to be flexible and should accommodate changes fairly easily. Changes are likely to occur.
  • This routine clears & initializes the name database. This routine should be called once during phone initialization.
  • This routine calculates the entire length of a variable length record based on the lengths of the parts.
  • TEPLEN = TLENGTH[0]*RFRMSZ + TLENGTH[1] + ... + TLENGTH[NUMELS-1]
  • TLENGTH[0] must be a frame count
  • TLENGTH[1..] must be < 256
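The record length formula above can be written as a small sketch. The value RFRMSZ = 9 is an assumption taken from the statement elsewhere in the text that a frame is currently 9 bytes; `record_length` is a hypothetical name.

```python
RFRMSZ = 9  # bytes per frame (assumed from "a frame is currently 9 bytes")


def record_length(tlength, frame_size=RFRMSZ):
    """Total byte length of a variable length record.

    tlength[0] is a frame count; tlength[1:] are byte counts, each < 256.
    """
    if not tlength:
        raise ValueError("tlength must contain at least the frame count")
    if any(n < 0 or n >= 256 for n in tlength[1:]):
        raise ValueError("TLENGTH[1..] entries must be in 0..255")
    return tlength[0] * frame_size + sum(tlength[1:])
```

For example, 20 frames plus parts of 4 and 3 bytes give 20*9 + 4 + 3 = 187 bytes.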
  • In: VTI (slot# of name to access). Out: VTSLOT, VTSTATUS, VTNWD, VTSTOP, VTSEG
  • This routine accesses the name database by slot# and returns all the fixed length information in 'VT' variables.
  • In: VTI (slot# of name to access). Out: VTSLOT, VTSTATUS, VTNWD, VTSTOP, VTSEG, VTLENG, VTSPADOR, VTSGADOR, VTPAGE
  • This routine calls GETTEMP then does some additional calculations to find additional information. This routine also sets the bank as specified in VTPAGE.
  • VTSGADOR = VTSPADOR + VTSTOP
  • VTSLOT In: VTSLOT, VTSTATUS, VTNWD, VTSTOP, VTSEG Out: VTLENG, VTSPADOR, VTSGADOR, VTPAGE
  • In: VTI, VTSLOT, VTSTATUS, VTNWD, VTSTOP, VTSEG & VTLENG, VTPAGE Out: SLOTMAP[i]..SEGMAP[i] Prereq: SPACELOOK
  • This routine allocates memory off a heap as specified by VTPAGE & VTLENG, then stores the VT vars into the MAP arrays indexed by VTI.
  • This routine deletes the name indexed by VTI from the database, then crunches the heap that contained the old record and finally updates the memory pointers.
  • Name to be deleted must exist, or nasty things can occur.
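The delete-and-crunch behavior can be illustrated with a minimal sketch. The class and attribute names are hypothetical, and a Python bytearray stands in for one page of the phone's RAM; the actual routines and memory layout are not given in the source.

```python
class NameHeap:
    """Toy heap of variable length records addressed by slot#."""

    def __init__(self, size):
        self.heap = bytearray(size)
        self.free = 0        # address of the first unused byte
        self.slot_addr = {}  # slot# -> record start (plays the role of SLOTMAP)
        self.slot_len = {}   # slot# -> record length in bytes

    def store(self, slot, record):
        """Allocate space at the top of the heap and copy the record in."""
        if self.free + len(record) > len(self.heap):
            raise MemoryError("heap full")
        self.heap[self.free:self.free + len(record)] = record
        self.slot_addr[slot] = self.free
        self.slot_len[slot] = len(record)
        self.free += len(record)

    def delete(self, slot):
        """Remove a record, slide later records down, fix their addresses."""
        start = self.slot_addr.pop(slot)  # KeyError if the name does not exist
        length = self.slot_len.pop(slot)
        end = start + length
        # Crunch: move everything above the hole down by `length` bytes.
        self.heap[start:self.free - length] = self.heap[end:self.free]
        self.free -= length
        for other, addr in self.slot_addr.items():
            if addr > start:
                self.slot_addr[other] = addr - length

    def read(self, slot):
        addr = self.slot_addr[slot]
        return bytes(self.heap[addr:addr + self.slot_len[slot]])
```

Unlike the routine described above, this sketch raises an error when the slot does not exist instead of corrupting the heap.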
  • This routine moves a block of memory of length (GPPTR), from (GPPTR) to (TPPTR)
  • SLOT - This refers to the index number associated to a name/number trained. There is a one to one relationship between slot# and name/number. The slot#s used are not necessarily contiguous.
  • FRAME - This is a collection of LPC parameters for one time instant (currently 9 bytes of information).
  • UTTERANCE This is a collection of frames that have been properly endpointed.
  • DATABASE - This refers to the data structure which stores all the information regarding all the stored names/numbers.
  • TEMPLATE - This is an utterance which has been placed into the database.
  • PAGE - This refers to a segment of memory used by the memory management routines. The proper term should be “segment.” but segment refers to something else in the recognition system.
  • HEAP - This is a dynamically managed data structure which contains variable length records.
  • DTW or DP - This stands for "Dynamic Time Warp," which is synonymous with "Dynamic Programming." This is a technique used for measuring the distance between an utterance and a template.
  • ENDPOINTING This refers to the process of finding the beginning and end (points) of an utterance; where the person started and stopped speaking.
  • This first representation is called the full or "variable length” representation.
  • This representation can be, as the label implies, of a variable length depending on how long the spoken utterance was.
  • the other representation is the compressed or "constant length” template. This is simply a version of the variable length template that is linearly compressed to a constant length.
  • the constant length templates are stored in a simple array structure, while the variable length templates are stored dynamically in memory, (see “Database & Memory Management Routines")
  • Before any processing is done to the raw frames, the software must find the beginning and ending (time) boundaries of the spoken word. This is done by monitoring the energy parameter.
  • an energy noise floor buffer is initialized to the surrounding room noise. This buffer is used to compute NOISE which represents the average ambient room energy level. The noise buffer is then updated with "silence" energies that are at least 4 frames away from either the beginning or ending frames of the word.
  • Endpointing is achieved with a state machine.
  • the state machine defines a "pulse" segment in the word.
  • a pulse is one complete clockwise sequence through the state machine from “silence” to "silence.”
  • a word is made of one or more pulses.
  • the first pulse in a word is subject to additional scrutiny as it may be a false beginning and not actually part of the word. If the maximum amplitude in the first pulse (HIGHEST) is less than MINHIGH, or the number of frames (FRMNUM) is less than MINDURAT, all collected frames are discarded; else INWORD = true. Subsequent pulses are not scrutinized in this way, but simply appended to the main frame buffer.
  • the end of word condition is detected by SILNUM of consecutive "silence" frames.
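The endpointing rules above can be sketched in simplified form. This sketch is an assumption in several respects: it collapses the "rising," "plateau," and "falling" states into a single "above the noise floor" test, and the default values of `margin`, `minhigh`, `mindurat`, and `silnum` are illustrative, not the constants used in the phone.

```python
def endpoint(energies, noise, margin=4, minhigh=20, mindurat=3, silnum=5):
    """Return the indices of the frames kept as part of the word.

    A pulse is a run of frames whose energy exceeds noise + margin.
    The first pulse must reach minhigh and last mindurat frames,
    otherwise it is discarded as a false beginning. The word ends
    after silnum consecutive silence frames.
    """
    kept = []
    pulse = []
    inword = False
    silence_run = 0

    def close_pulse():
        nonlocal pulse, inword
        if pulse:
            highest = max(energies[i] for i in pulse)
            if inword or (highest >= minhigh and len(pulse) >= mindurat):
                kept.extend(pulse)
                inword = True
            pulse = []

    for idx, energy in enumerate(energies):
        if energy > noise + margin:
            pulse.append(idx)
            silence_run = 0
            continue
        close_pulse()  # a silence frame ends any open pulse
        if inword:
            silence_run += 1
            if silence_run >= silnum:
                break  # end of word
    close_pulse()  # input ran out while a pulse was still open
    return kept
```

In the example used by the test, a single loud frame at index 2 is rejected as too short, the pulse at frames 9 to 12 starts the word, and the weaker pulse at frames 15 to 16 is appended without further scrutiny.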
  • NWDFRMS # of frames in an utterance
  • NOISE must be less than NOISET or the phone will say "too noisy”. Large background noise causes problems in recognition.
  • the maximum amplitude reached in the word (MAXHIGH) must be greater than MINAMPL or the phone will say "speak louder or into the phone.” This is to encourage people not to mumble and to speak directly into the handset.
  • the routine used upsamples the number of input frames by a factor equal to the number of output frames, and then downsamples by a factor equal to the number of input frames using a rectangular window as a low pass FIR filter.
  • the end result is a 12 frame constant length template in addition to the variable length one.
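A minimal sketch of the linear compression, assuming that "upsampling" means repeating each input frame (zero-order hold) and that the rectangular window is a plain average over each block. The function name `compress` is hypothetical.

```python
def compress(frames, out_len=12):
    """Linearly compress a list of equal-width frames to out_len frames.

    Conceptually each input frame is repeated out_len times, giving a
    stream of len(frames) * out_len frames. That stream is then cut into
    out_len blocks of len(frames) frames, and each block is averaged.
    """
    n_in = len(frames)
    if n_in == 0:
        raise ValueError("no frames to compress")
    width = len(frames[0])
    out = []
    for o in range(out_len):
        acc = [0] * width
        # Positions o*n_in .. (o+1)*n_in - 1 of the upsampled stream.
        for p in range(o * n_in, (o + 1) * n_in):
            src = frames[p // out_len]  # frame repeated at position p
            for c in range(width):
                acc[c] += src[c]
        out.append([a / n_in for a in acc])
    return out
```

Because the upsampled stream is never materialized, the sketch needs only one accumulator per output frame, which would suit a memory-constrained system.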
  • Matching takes place when the user attempts to voice dial his phone by picking up the receiver and speaking a name.
  • the utterance is first endpointed, then a compressed template of length 12 is derived from it. From now on, these will be referred to as the "test" utterance which will be compared against the "reference" templates previously stored in memory.
  • the matching procedure used is a two step process. First a crude but fast matching procedure is used to get rid of the obviously distant matches. Then the top contenders are dynamically time warped, which is a very accurate, but slow, matching algorithm. It is this two pass technique that allows the system to be very accurate without taking a large amount of time (< 4 secs). If no crude first pass were used, the system could take in excess of 50 seconds per match.
  • the length of each template is compared against that of the test utterance. If the ratio of the lengths is greater than REJRATIO, then these templates are removed from consideration.
  • a fast first pass is performed on the constant length utterance against all the stored constant length templates.
  • a very fast Chebychev distance metric is used in the matching procedure. The algorithm is nothing more than taking the absolute value of the differences between each pair of corresponding parameters in the test and reference templates.
  • D is likewise calculated for each stored reference template.
  • the scores are then sorted in ascending order. The following criterion is used to determine which of these will be considered “candidates” for further consideration:
  • the template must be within the top TOPCUT scores
  • the template must have a score less than ABSCUT.
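The first pass can be sketched as follows. Several details are assumptions: the length ratio is taken as longer over shorter, the per-parameter absolute differences are summed into one score, and the function and argument names are hypothetical. REJRATIO, TOPCUT, and ABSCUT appear as plain arguments because their values are not given in the text.

```python
def first_pass(test, test_len, references, rejratio, topcut, abscut):
    """Return candidate (score, slot) pairs for the second, DTW pass.

    test: constant length template of the test utterance (list of frames).
    test_len: frame count of the full length test utterance (must be > 0).
    references: dict slot -> (constant length template, full frame count).
    """
    scored = []
    for slot, (ref, ref_len) in references.items():
        # Reject templates whose full length differs too much.
        ratio = max(test_len, ref_len) / min(test_len, ref_len)
        if ratio > rejratio:
            continue
        # Sum of absolute parameter differences over all frames.
        score = sum(
            abs(t - r)
            for test_frame, ref_frame in zip(test, ref)
            for t, r in zip(test_frame, ref_frame)
        )
        scored.append((score, slot))
    scored.sort()  # ascending: best match first
    return [(score, slot) for score, slot in scored[:topcut] if score < abscut]
```

Only the surviving candidates need the slow variable length comparison, which is where the time saving of the two pass scheme comes from.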
  • the remaining templates are then dynamically time warped utilizing the full variable length representation.
  • the scores are then sorted and an action is described by the user interface decision matrix (see "User Interface Decision Matrix").
  • the decision matrix defines 4 regions based on the two variables ZA & ZR (and also the first initials)
  • LEDSTATA LEDSTATB- States of LEDs & TEN.
  • LEDTOGA LEDTOGB- LEDs to flash in NMI if FLASHLED is true.
  • FLASHLED- If true, toggle selected LEDs in NMI every 125ms.
  • RINGING- IRQ ring detect counter. True when phone is in ring envelope.
  • INCOMMING- True if a call is incoming (4 sec proximity to ring)
  • DTDING- Dialtone timeout period (& DTD enable). Time left to detect dial tone.
  • DIALTONE- Dialtone boolean. True if dialtone has been detected.
  • MASKSEL- Key mask select byte (which keys are active?)
  • RAMKEYMSK..RAMKEYMSK+2- Programmable (RAM) key mask
  • This routine gets the next nibble off a nibble stream.
  • NIBPTR is decremented first, then the buffer is read.
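A minimal sketch of a pre-decrement nibble read. The packing order (even nibble index in the low half of the byte, odd index in the high half) is an assumption, as is the function name `get_nibble`; the source states only that NIBPTR is decremented before the buffer is read.

```python
def get_nibble(buffer, nibptr):
    """Return (nibble, new_nibptr) for a nibble stream read backwards.

    nibptr counts nibbles, so byte index = nibptr // 2. Assumed packing:
    even nibble index -> low 4 bits, odd nibble index -> high 4 bits.
    """
    nibptr -= 1  # decrement first, then read
    if nibptr < 0:
        raise IndexError("nibble stream exhausted")
    byte = buffer[nibptr // 2]
    nibble = (byte >> 4) if nibptr % 2 else (byte & 0x0F)
    return nibble, nibptr
```

With pre-decrement, a pointer equal to the number of stored nibbles addresses the last nibble first and the read stops cleanly at zero.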
  • This routine dials a phone number nibble string set up in (NUMPTR) & NIBPTR.
  • MASKSEL should be set to reflect if home/work are active. Possible STATUS states are:
  • Routine calls LISTEN and handles all diagnostics. Routine exits when a legal utterance is collected or other I/O occurs (onhook, keypress or timeout).
  • Routine primarily calls LISTEN
  • This routine scans the key buffer for "home," "work," and "long dist" relating to dialing status, and sets the global vars LDSWITCH & HWOVERIDE appropriately.
  • LDSWITCH is set to 1 if one or more "long dist" keys are pressed
  • HWOVERIDE is set to the last key (home/work) that was pressed
  • This routine checks to see if a number is a long distance number. If it is, then STRIPSTART is set to the start of the "real" AT&T long distance beginning.
  • This routine updates the home/work difference counter (HiDIFFER) for a name.
  • [A] keyval (must be policekey, doctorkey, or dialldkey)
  • SPECIALNB contains the pseudo index of the special number.
  • SWAPNUMBS In: VTI
  • This routine swaps the home# with the work# (and vice versa) for a given name.
  • This routine dials a pulse or DTMF digit over the phone line. This is used both for auto & manual dialing of numbers.
  • DELAYVCT should be a vector pointing to the appropriate delay routine.
  • DELAYAMS is used for speed dialing, as it responds to key presses
  • This routine checks for i) onhook, ii) unmasked keypress, & iii) timeout if enabled by TIMESWITCH.
  • Routine flushes the keyboard buffer.
  • KEYWAIT set to 0.
  • Routine checks quickly for any pressed keys
  • Routine is fast; does not decode the key pressed
  • Routine reads the rows after one column has been sent low. Row decode as follows:
  • Routine decodes a pressed button
  • Routine does fast check of the hook switches, and returns abort recommendation.
  • This routine starts to monitor the SP1000 for possible legal utterances along with other I/O (keyboard, hooksw). If a legal utterance was found, the proper utterance data structures for training will be set up. The routine will return for one of three reasons: 1). An utterance was found, 2). Some other I/O occurred (keyboard, hooksw), or 3). A timeout occurred.
  • This routine takes a number of utterances from LISTEN, and if everything is ok, combines them into a template and stores it permanently into the name database at the location found by SPACELOOK. This routine must be called (along with LISTEN) for each training pass. For the first pass, PASSES should be set to 0. Additional passes are necessary if STATUS is returned with 5. TRAINFIRST automatically increments PASSES. When STATUS finally returns with a 0, this indicates that the utterance has been stored in the name database and training is complete.
  • Name database is not modified if STATUS returns with 5
  • TOPTEN (3 byte wide) array of top ten scores
  • This routine fetches a recently collected utterance (via LISTEN), matches it against the stored templates in the name database, and returns the top ten contenders.
  • the format of the TOPTEN array is: BYTE0: template# of this score, BYTE1: MSB of score, BYTE2: LSB of score. Note: BYTE1 & BYTE2 are the opposite of conventional 2 byte address order.
  • the format of the output is subject to change.
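The 3 byte TOPTEN entry layout described above (template#, then score MSB, then score LSB) can be sketched as follows. The helper names are hypothetical.

```python
def pack_entry(template_no, score):
    """Pack one TOPTEN entry: template#, score MSB, score LSB."""
    if not 0 <= template_no <= 0xFF:
        raise ValueError("template# must fit in one byte")
    if not 0 <= score <= 0xFFFF:
        raise ValueError("score must fit in two bytes")
    return bytes([template_no, (score >> 8) & 0xFF, score & 0xFF])


def unpack_topten(raw):
    """Decode a TOPTEN byte string into (template#, score) pairs."""
    if len(raw) % 3:
        raise ValueError("TOPTEN entries are 3 bytes wide")
    return [
        (raw[i], (raw[i + 1] << 8) | raw[i + 2])
        for i in range(0, len(raw), 3)
    ]
```

Storing the MSB first is the reverse of the 6502's usual low-byte-first order, which is the point the note in the text makes.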
  • Routine takes from 100ms to 3sec to match
  • This routine calls SCOREFIRST and interprets the TOPTEN results according to the user interface decision matrix and returns with STATUS indicating the proper action to take.
  • the [X] register returns the top score's slot#. (see user interface decision matrix). If more than one contender is in the "confusable" region, then [Y] contains the number of candidates, and array "INITIALS" indexes each confusable and unique candidate by initial so that after the first initial is known, the correct choice can be made. Possible STATUS states:
  • test word pattern is compared to each relevant reference pattern and the distance computed via dynamic time warping.
  • the matching word is the one whose reference pattern gives the lowest distance.
  • a good method for rejection is to employ two distance thresholds, an absolute threshold and a relative threshold
  • the word is rejected. Rejection due to this criterion means that the test pattern was not sufficiently similar to any of the reference patterns.
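A minimal sketch of dynamic time warping with two rejection thresholds. It is not the algorithm of the source: it uses the textbook step pattern (horizontal, vertical, diagonal) rather than the slope-constrained parallelogram described in the text, and the meaning given to the relative threshold (a minimum gap between the best and second best distances) is an assumption. All names are hypothetical.

```python
def frame_distance(a, b):
    """Sum of absolute differences between two frames."""
    return sum(abs(x - y) for x, y in zip(a, b))


def dtw_distance(test, ref):
    """Textbook DTW distance between two frame sequences."""
    inf = float("inf")
    n, m = len(test), len(ref)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = frame_distance(test[i - 1], ref[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def recognize(test, references, abs_threshold, rel_threshold):
    """Return the best matching name, or None if the word is rejected."""
    scores = sorted(
        (dtw_distance(test, ref), name) for name, ref in references.items()
    )
    if not scores:
        return None
    best, name = scores[0]
    if best > abs_threshold:
        return None  # not similar enough to any reference
    if len(scores) > 1 and scores[1][0] - best < rel_threshold:
        return None  # too close to the runner-up to decide
    return name
```

In the test below, a stretched version of the "tim" pattern warps onto its reference at zero cost, while the same utterance is rejected once the required gap to "tom" is raised above the actual gap of 8.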
  • the algorithm described herein makes efficient use of storage for a distance array while also using a simple scheme for determining the order of distance calculations.
  • Figure 8.4.2.1-1 shows a grid of dynamic time warp distance points similar to the ones of figures 3.2.4-2 and 3.2.4-3. Notice, however, that the coordinate axes are not the usual vertical and horizontal ones.
  • the "i" axis has a slope of 1/2, while the "j" axis has a slope of 2.
  • a "k" axis is also shown, which has a slope of 1. This axis should be viewed as orthogonal to the other two, actually extending behind the plane of the paper.
  • Each of the distance points can be uniquely represented as a set of coordinates in the (i,j,k) space as shown in figure 8.4.2.1-2. This is a magnified view of the lower part of figure x which has the coordinates of each point labeled. Notice that a diagonal line connects groups of three points. Each point within a group, or "triplet", has the same i and j coordinates but different k coordinates, ranging from 0 to 2.
  • the i and j coordinates range from 0 to 3.
  • a total of 4*4*3 = 48 unique coordinates may be generated from i, j and k, and this is indeed the number of points which lie within the dynamic time warp parallelogram.
  • This scheme uniquely maps each point within the parallelogram into a 3-dimensional array, with no surplus entries in the array.
  • MAX_TRIPLET_INDEX = (FRAMES_PER_PATTERN/3) - 1;
  • INFINITY = 32767; { a very large distance }
  • var i: integer; { test triplet index } j: integer; { reference triplet index } k: integer; { triplet element index } distances: array [0..MAX_TRIPLET_INDEX, 0..MAX_TRIPLET_INDEX, 0..2] of integer;
  • var delta: integer; left_distance: integer; middle_distance: integer; right_distance: integer; smallest_of_3: integer; begin
  • index of reference frame in reference pattern = i + 2*j + k
  • index of test frame in test pattern = 2*i + j + k
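The claim that the (i, j, k) coordinates map one-to-one onto 48 grid points can be checked with a short sketch. It assumes the frame index formulas are i + 2*j + k for the reference pattern and 2*i + j + k for the test pattern, a reading of the two formulas above that is consistent with the stated axis slopes of 1/2 and 2. The function names are hypothetical.

```python
MAX_TRIPLET_INDEX = 3  # (12 frames per pattern / 3) - 1


def grid_point(i, j, k):
    """Map triplet coordinates to (test frame index, reference frame index)."""
    test_frame = 2 * i + j + k
    reference_frame = i + 2 * j + k
    return test_frame, reference_frame


def all_points(max_index=MAX_TRIPLET_INDEX):
    """Enumerate every grid point addressed by the 3-dimensional array."""
    return [
        grid_point(i, j, k)
        for i in range(max_index + 1)
        for j in range(max_index + 1)
        for k in range(3)
    ]
```

With 12 frames per pattern the enumeration yields 48 distinct points whose frame indices run from 0 to 11, so the distance array has no surplus entries.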
  • RISTHRS1, RISTHRS2, PLATEAU, FALLT, MAXDECLI, MINDURAT, & MINHIGH are constants (see "Parameter Setting for Recognition")
  • INWORD true if one or more frames have been collected.
  • Frames are appended to the frame buffer if they are in the "rising,” “plateau,” or “falling” states.
  • VOICE DIALER FUNCTIONAL DESCRIPTION. HARDWARE: The hardware consists of both digital and analog systems.
  • the digital hardware contains: 1). 65C02 microprocessor 2). 64K dynamic RAM (8 4164s) 3). 32K EPROM (27256) 4).
  • a VIA to handle system I/O (65C22) which include: a) Controlling rows and columns for software keyboard decoding b) Hook switch control c) Ring detector input d) Zero crosser source control e) Output amplification boost control for ringing f) IRQ interrupt control
  • a custom gate array that handles the following a) System clock, timing and address decoding b) DRAM control signals & bank selecting c) Control for LEDs d) Control for DTMF chip e) 4ms NMI generation f) 258ms Watchdog reset timer for system reliability
  • a speech processing chip (SP1000) a) Handles voice parameterization for speech recognition b) Does speech synthesis for i) Canned response synthesis ii) Name resynthesis iii) Ring through the handset
  • Zero cross detector that can look at: a) Input from Microphone (for frication detection) or, b) Input from phone line (for dialtone detection)
  • the software consists of 3 sections: I/O support, recognition and user interface.
  • DRAM refresh routines which must guarantee a max of 2ms refresh.
  • Other VIA & gate array support: LEDs, DTMF, hook switch..
  • Matching routines a) Linear constant length matching b) Variable length dynamic time warping (DTW) c) Two tiered matching, score sorting and threshold testing

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Signal Processing (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

Telephone apparatus that is responsive to voice commands and can automatically dial the selected telephone number. New and original voice recognition techniques are used to enable the telephone apparatus to respond correctly to the voice command.
PCT/US1987/001260 1986-05-23 1987-05-22 Telephone actionne par la voix Ceased WO1987007460A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US86746886A 1986-05-23 1986-05-23
US867,468 1986-05-23

Publications (1)

Publication Number Publication Date
WO1987007460A1 (fr) 1987-12-03

Family

ID=25349827

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1987/001260 Ceased WO1987007460A1 (fr) 1986-05-23 1987-05-22 Telephone actionne par la voix

Country Status (2)

Country Link
AU (1) AU7488287A (fr)
WO (1) WO1987007460A1 (fr)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5301227A (en) * 1989-04-17 1994-04-05 Sanyo Electic Co., Ltd. Automatic dial telephone
WO1998016048A1 (fr) * 1996-10-07 1998-04-16 Northern Telecom Limited Systeme de composition en vocal a l'aide d'un modele de comportement d'appel
US6208713B1 (en) 1996-12-05 2001-03-27 Nortel Networks Limited Method and apparatus for locating a desired record in a plurality of records in an input recognizing telephone directory
WO2001039176A3 (fr) * 1999-11-25 2002-09-26 Siemens Ag Procede et dispositif de reconnaissance vocale et systeme de telecommunication
US6629072B1 (en) 1999-08-30 2003-09-30 Koninklijke Philips Electronics N.V. Method of an arrangement for speech recognition with speech velocity adaptation
US6771982B1 (en) 1999-10-20 2004-08-03 Curo Interactive Incorporated Single action audio prompt interface utlizing binary state time domain multiple selection protocol
US6804539B2 (en) 1999-10-20 2004-10-12 Curo Interactive Incorporated Single action audio prompt interface utilizing binary state time domain multiple selection protocol
EP1044448B1 (fr) * 1998-09-11 2005-01-26 Koninklijke Philips Electronics N.V. Procede d'extraction d'erreur dans la reconnaissance d'une presentation utilisateur, par evaluation de la fiabilite d'un ensemble limite d'hypotheses
US7822613B2 (en) 2002-10-07 2010-10-26 Mitsubishi Denki Kabushiki Kaisha Vehicle-mounted control apparatus and program that causes computer to execute method of providing guidance on the operation of the vehicle-mounted control apparatus
US9232037B2 (en) 1999-10-20 2016-01-05 Curo Interactive Incorporated Single action sensory prompt interface utilising binary state time domain selection protocol

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4853953A (en) * 1987-10-08 1989-08-01 Nec Corporation Voice controlled dialer with separate memories for any users and authorized users

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3928724A (en) * 1974-10-10 1975-12-23 Andersen Byram Kouma Murphy Lo Voice-actuated telephone directory-assistance system
US4348550A (en) * 1980-06-09 1982-09-07 Bell Telephone Laboratories, Incorporated Spoken word controlled automatic dialer
US4453043A (en) * 1982-02-04 1984-06-05 Northern Telecom Limited Telephone for a physically handicapped person
JPS59225656A (ja) * 1983-06-07 1984-12-18 Fujitsu Ltd 音声ダイヤル電話端末
JPS6059846A (ja) * 1983-09-13 1985-04-06 Matsushita Electric Ind Co Ltd 音声認識自動ダイヤル装置
JPS6085655A (ja) * 1983-10-15 1985-05-15 Fujitsu Ten Ltd 音声ダイヤリング装置
JPS60216655A (ja) * 1984-04-12 1985-10-30 Nippon Telegr & Teleph Corp <Ntt> 自動ダイヤル装置
DE3422409A1 (de) * 1984-06-16 1985-12-19 Standard Elektrik Lorenz Ag, 7000 Stuttgart Einrichtung zur erkennung und umsetzung von wahlinformation sowie von steuerinformation fuer leistungsmerkmale einer fernsprechvermittlungsanlage
US4644107A (en) * 1984-10-26 1987-02-17 Ttc Voice-controlled telephone using visual display

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
BELL LABORATORIES RECORD, October 1973, Vol. 51, No. 9, KITSOPOULOS, "Experimental Telephone Lets Disabled Dial by Voice", pp. 272-276. *
ELECTRICAL COMMUNICATION, 06 May 1985, Vol. 59, No. 3, IMMENDORFER, "Voice Dialer", pp. 281-285. *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5301227A (en) * 1989-04-17 1994-04-05 Sanyo Electic Co., Ltd. Automatic dial telephone
WO1998016048A1 (fr) * 1996-10-07 1998-04-16 Northern Telecom Limited Systeme de composition en vocal a l'aide d'un modele de comportement d'appel
US6208713B1 (en) 1996-12-05 2001-03-27 Nortel Networks Limited Method and apparatus for locating a desired record in a plurality of records in an input recognizing telephone directory
EP1044448B1 (fr) * 1998-09-11 2005-01-26 Koninklijke Philips Electronics N.V. Procede d'extraction d'erreur dans la reconnaissance d'une presentation utilisateur, par evaluation de la fiabilite d'un ensemble limite d'hypotheses
US6629072B1 (en) 1999-08-30 2003-09-30 Koninklijke Philips Electronics N.V. Method of an arrangement for speech recognition with speech velocity adaptation
US7668567B2 (en) * 1999-10-20 2010-02-23 Toupin Paul M Single action audio prompt interface utilising binary state time domain multiple selection protocol
US6804539B2 (en) 1999-10-20 2004-10-12 Curo Interactive Incorporated Single action audio prompt interface utilizing binary state time domain multiple selection protocol
US6771982B1 (en) 1999-10-20 2004-08-03 Curo Interactive Incorporated Single action audio prompt interface utlizing binary state time domain multiple selection protocol
US8155708B2 (en) 1999-10-20 2012-04-10 Curo Interactive Incorporated Single action audio prompt interface utilising binary state time domain multiple selection protocol
US8611955B2 (en) 1999-10-20 2013-12-17 Curo Interactive Incorporated Single action audio interface utilising binary state time domain multiple selection protocol
US9232037B2 (en) 1999-10-20 2016-01-05 Curo Interactive Incorporated Single action sensory prompt interface utilising binary state time domain selection protocol
US7167544B1 (en) 1999-11-25 2007-01-23 Siemens Aktiengesellschaft Telecommunication system with error messages corresponding to speech recognition errors
WO2001039176A3 (fr) * 1999-11-25 2002-09-26 Siemens Ag Procede et dispositif de reconnaissance vocale et systeme de telecommunication
US7822613B2 (en) 2002-10-07 2010-10-26 Mitsubishi Denki Kabushiki Kaisha Vehicle-mounted control apparatus and program that causes computer to execute method of providing guidance on the operation of the vehicle-mounted control apparatus
EP1450349B1 (fr) * 2002-10-07 2011-06-22 Mitsubishi Denki Kabushiki Kaisha Dispositif de contrôle embarqué et programme amenant un ordinateur à exécuter un procédé visant à fournir un guidage d'opération du dispositif de contrôle embarqué

Also Published As

Publication number Publication date
AU7488287A (en) 1987-12-22

Similar Documents

Publication Publication Date Title
EP0789901B1 (fr) Systeme de reconnaissance de la voix
US6088428A (en) Voice controlled messaging system and processing method
JP4607334B2 (ja) 分散された音声認識システム
AU598999B2 (en) Voice controlled dialer with separate memories for any users and authorized users
US3742143A (en) Limited vocabulary speech recognition circuit for machine and telephone control
US5960393A (en) User selectable multiple threshold criteria for voice recognition
US5960395A (en) Pattern matching method, apparatus and computer readable memory medium for speech recognition using dynamic programming
TWI253056B (en) Combined engine system and method for voice recognition
US6098040A (en) Method and apparatus for providing an improved feature set in speech recognition by performing noise cancellation and background masking
US6438520B1 (en) Apparatus, method and system for cross-speaker speech recognition for telecommunication applications
JP3204632B2 (ja) 音声ダイヤルサーバー
JP4246703B2 (ja) 自動音声認識の方法
WO1987007460A1 (fr) Telephone actionne par la voix
CN100521708C (zh) 移动信息终端的语音识别与语音标签记录和调用方法
TW521263B (en) Automatic speech recognition to control integrated communication devices
US20010056345A1 (en) Method and system for speech recognition of the alphabet
US6845356B1 (en) Processing dual tone multi-frequency signals for use with a natural language understanding system
US7283964B1 (en) Method and apparatus for voice controlled devices with improved phrase storage, use, conversion, transfer, and recognition
US20030081738A1 (en) Method and apparatus for improving access to numerical information in voice messages
KR100827074B1 (ko) 이동 통신 단말기의 자동 다이얼링 장치 및 방법
De Vos et al. Algorithm and DSP-implementation for a speaker-independent single-word speech recognizer with additional speaker-dependent say-in facility
JPS59224900A (ja) 音声認識方法
JPH05508242A (ja) 話者認識方法
JPS63306748A (ja) 音声ダイヤル装置
JPS61107397A (ja) 音声認識応答装置

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AU DK FI JP KR NO

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH DE FR GB IT LU NL SE

WA Withdrawal of international application