EP1497751A1 - Method and system for detecting and extracting named entities from spontaneous communications
- Publication number
- EP1497751A1, EP03721540A
- Authority
- EP
- European Patent Office
- Prior art keywords
- named entity
- named
- named entities
- contextual
- word
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1815—Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L2015/088—Word spotting
Definitions
- the invention relates to automated systems for communication recognition and understanding.
- the invention concerns a method and system for detecting and extracting named entities from spontaneous communications.
- the method may include recognizing input communications from a user, detecting contextual named entities from the recognized input communication, and outputting the contextual named entities to a language understanding unit.
- FIG. 1 is a block diagram of an exemplary communication recognition and understanding system
- FIG. 2 is a detailed block diagram of an exemplary named entity training unit
- FIG. 3 is a detailed flowchart illustrating an exemplary named entity training process
- FIG. 4 is a detailed block diagram of an exemplary named entity detection and extraction unit
- FIG. 5 is a detailed block diagram of an exemplary named entity detector
- FIG. 6 is a flowchart illustrating an exemplary named entity detection and extraction process
- FIG. 7 is a flowchart of an exemplary task classification process
Detailed Description of the Preferred Embodiments
- a dialogue system can be viewed as an interface between a user and a database.
- the role of the system is to determine first, what kind of query the database is going to be asked, and second, with which parameters. For example, in the How May I Help You?℠ (HMIHY) customer care corpus, if a user wants his or her account balance, the query concerns accessing the account balance field of the database with the customer identification number as the parameter.
- Such database queries are denoted by task-type (or in this example, call-type) and their parameters are the information items that are contained in the user's request. They are often called "named entities".
- The most general definition of a named entity is a sequence of words, symbols, etc. that refers to a unique identifier. For example, a named entity may refer to:
- time identifiers like dates, time expressions or durations
- In the framework of a dialogue system, the definition of a named entity is often associated with its meaning for the targeted application. For example, in a customer care corpus, most of the relevant time or monetary expressions may be those related to an item on a customer's bill (the date of the bill, a date or an amount of an item or service, etc.).
- context-dependent named entities that are named entities whose definition is linked to the dialogue context
- context-independent named entities that are independent from the dialogue application (e.g., a date).
- One of the aspects of the invention described herein is the detection and extraction of such context-dependent and independent named entities from spontaneous communications in a mixed-initiative dialogue context.
- Dialogue managers can be classified according to the type of system interaction implemented, including system-initiative, user-initiative or mixed-initiative.
- a system-initiative dialogue manager handles very constrained dialogues where the user has to answer direct questions with either one key word or a simple short sentence.
- a user-initiative dialogue manager gives the user the possibility of directing the dialogue.
- the system waits to know what the user wants before asking a specific question. In this case, spontaneous communications have to be accepted.
- the first stage of attempting to "understand" a user's query involves classifying the user's intent according to a list of task-types specific to the application.
- a customer care corpus may contain dialogue belonging to a third category which is known as mixed-initiative dialogue manager systems.
- the top-level of the dialogue implements a user-initiative dialogue by simply asking the user "How may I help you?", for example.
- a short dialogue sometimes ensues for clarifying the request, and finally the user is sent to either an automated dialogue system or a human representative depending on the availability of such an automatic process for the request recognized.
- this invention addresses how to automatically process such spontaneous responses.
- this invention concerns a dialogue system that automatically detects and extracts, from a recognized output, the task-type request expressed by a user and its parameters, such as numerical expressions, time expressions or proper names.
- these parameters are called "named entities" and their definitions can be either independent from the context of the application or strongly linked to the application domain.
- a method and system that trains named entity language models, and a method and system that detects and extracts such named entities to improve understanding during an automated dialogue session with a user will be discussed in detail below.
- Fig. 1 is an exemplary block diagram of a possible communication recognition and understanding system 100 that utilizes named entity detection and extraction.
- the exemplary communication recognition and understanding system 100 includes two related subsystems, namely a named entity training subsystem 110 and input communication processing subsystem 120.
- the named entity training subsystem 110 includes a named entity training unit 130 and a named entity database 140.
- the named entity training unit 130 generates named entity language models and a text classifier training corpus from a training corpus of transcribed or untranscribed training communications.
- the generated named entity language models and text classifier training corpus are stored in the named entity database 140 for use by the named entity detection and extraction unit 160.
- the input communication processing subsystem 120 includes an input communication recognizer 150, a named entity detection and extraction unit 160, a natural language understanding unit 170 and a dialogue manager 180.
- the input communication recognizer 150 receives a user's task objective request or other communications in the form of verbal and/or non-verbal communications.
- Non-verbal communications may include tablet strokes, gestures, head movements, hand movements, body movements, etc.
- the input communication recognizer 150 may perform the function of recognizing or spotting the existence of one or more words, phones, sub-units, acoustic morphemes, non-acoustic morphemes, morpheme lattices, etc., in the user's input communications using any algorithm known to one of ordinary skill in the art.
- One such algorithm may involve the input communication recognizer 150 forming a lattice structure to represent a distribution of recognized phone sequences, such as a probability distribution.
- the input communication recognizer 150 may extract the n-best word strings from the lattice, either by themselves or along with their confidence scores.
- lattice representations are well known to those skilled in the art and are further described in detail below. While the invention is described below as being used in a system that forms and uses lattice structures, this is only one possible embodiment and the invention should not be limited as such.
- the named entity detection and extraction unit 160 detects the named entities present in the lattice that represents the user's input request or other communication.
- the named entity detection and extraction unit 160 tags and classifies detected named entities and extracts the named entity values using a process such as discussed in relation to Figs. 4-6 below. These extracted values are then provided as an input to the natural language understanding unit 170.
- the natural language understanding unit 170 may apply a confidence function, based on the probabilistic relationship between the recognized communication including the named entity values and selected task objectives. As a result, the natural language understanding unit 170 will pass the information to the dialogue manager 180. The dialogue manager 180 may then make a decision either to implement a particular task objective, or determine that no decision can be made based on the information provided, in which case the user may be defaulted to a human or other automated system for assistance. In any case, the dialogue manager 180 will inform the user of the status and/or solicit more information.
- the dialogue manager 180 may also store the various dialogue iterations during a particular user dialogue session. These previous dialogue iterations may be used by the dialogue manager 180 in conjunction with the current user input to provide an acceptable response, task completion, or task routing objective, for example.
- Fig. 2 is a more detailed diagram of the named entity training subsystem 110 shown in Fig. 1.
- the named entity training unit 130 may include a transcriber 210, a labeler 220, a named entity parser 230, a named entity training tagger 240, a training recognizer 250, and an aligner 260.
- the named entity language model and text classifier training corpus generated by the named entity training unit 130 are stored in the named entity database 140 for use by the named entity detection and extraction unit 160.
- the operation of the individual units of the named entity training subsystem 110 will be discussed in detail with respect to Fig. 3 below.
- Fig. 3 illustrates an exemplary named entity training process for using the named entity training subsystem 110 shown in Figs. 1 and 2. Note that while the steps of any of the exemplary flowcharts illustrated herein are shown arranged in a particular order, the steps in the figures may be rearranged to be performed in any order, or simultaneously, for example. The process in Fig. 3 begins at step 3100 and proceeds to step
- the recognition process may involve a phonotactic language model that was trained on the Switchboard corpus using a Variable-Length N-gram Stochastic Automaton, for example.
- This training corpus may be derived from a collection of sentences generated from the recordings of callers responding to a system prompt, for example. In experiments conducted on the system of the invention, 7642 and 1000 sentences in the training and test sets were used, respectively. Sentences are represented at the word level and provided with semantic labels drawn from 46 call-types. This training corpus may be unrelated to the system task. Moreover, off-the-shelf telephony acoustic models may be used.
- the transcriber 210 also receives raw training communications from the training corpus in conjunction with the training corpus being put through the training recognizer 250.
- the processes in steps 3200 and 3300 may be performed simultaneously.
- corpora of spontaneous communications dedicated to a specific application are small and obtained by a particular protocol or a directed experiment trial. Because of their small size, these training corpora may be integrally transcribed by humans.
- a selective sampling process may be used in order to select the dialogues that are going to be labeled.
- This process can be done randomly, but by being able to extract information from the recognizer output in an unsupervised way, the selection method can be made more efficient. For example, some rare task types or named entities can be very badly modeled because of the lack of data representing them. By automatically detecting named entity tags, dialogues that are likely to contain them can be specifically selected, which significantly accelerates the coverage of the training corpus.
- The dual process shown in Fig. 2 is important to improve the quality of named entity detection and extraction. For instance, consider that named entities are usually represented by either handwritten rules or statistical models. Even if statistical models seem to be much more robust toward recognizer errors, in both cases the models are derived from communications transcribed by the transcriber 210 and the errors from the training recognizer 250 are not taken into account explicitly by the models.
- This strategy certainly emphasizes the precision of the resulting detection, but a great loss in recall can occur by not modeling the training recognizer's 250 behavior. For example, a word can be considered as very salient information for detecting a particular named entity tag. But if this word is, for any reason, very often badly recognized by the training recognizer 250, its salience won't be useful on the output of the system. For this reason, the training recognizer's 250 behavior in the contextual named entity tagging process is explicitly modeled.
- in step 3400, the transcribed corpus is then labeled by the labeler
- the labeler 220 classifies each sentence according to the list of named entity tags contained in it. Then for each sentence, the named entity parser 230 marks all of the words or groups of words selected by the labeler 220 for characterizing the sentence according to the named entity tags. In this manner, the named entity parser 230 marks the part-of-speech of each word and performs a syntactic bracketing on the corpus.
- in step 3500, as a way to make the user's input communication useful to the dialogue manager 180, the named entity training tagger 240 inserts named entity tags on the labeled and parsed training corpus. This process may include using the list of named entity tags included in each sentence as well as using both statistical and syntactic criteria to determine which context is considered salient for identifying named entities.
- Date: any date expression with at least a day and a month specified
- Which_Bill: a temporal expression identifying the bill the customer is talking about.
- the first two tags can be considered as context-independent named entities. For example, "Date" can correspond to any date expression. In addition, nearly all of the 10- or 11-digit strings in the customer care corpus are effectively considered "Phone" numbers.
- Item_Amount refers to a numerical money expression that is explicitly written on the bill. According to this definition, the following sentence contains an Item_Amount tag: I don't recognize this 22 dollars call to... but this one doesn't: he told me we would get a 50 dollars gift.... Thus, each named entity tag can correspond to one or several kinds of values. The tags Phone, Item_Amount and Date each correspond to only one type of pattern for their values (respectively: 10-digit string, <num>$<num><cents> and <year>/<month>/<day>).
- the named entity tagger 240 may use a probabilistic tagging approach. This approach involves a Hidden Markov Model (HMM) where each word of a sentence is emitted by a state in the model.
- the first term, $P(w_t \mid w_{t-1}, s_t)$, is implemented as a state-dependent bigram model. For example, if $s_t$ is the state inside a PHONE, this first term corresponds to the bigram probability $P_{Phone}(w_t \mid w_{t-1})$ estimated on the corpus $C_{NE}$. Similarly, the bigram probability for the background text, $P_{BK}(w_t \mid w_{t-1})$, is estimated on the corpus $C_{BK}$.
- the second term is the state transition probability of going from the state t-1 to the state t. These probabilities are estimated on the training corpus, once the named entity context selection process has been done.
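Assuming the standard HMM factorization that the two terms above suggest, the tagging objective can be sketched as follows. The notation $W = w_1 \dots w_n$ for the word sequence, $S = s_1 \dots s_n$ for the state sequence, and $\hat{S}$ for the selected tagging is an assumption made for illustration, not notation taken from the text.

```latex
\hat{S} = \arg\max_{S} P(W, S)
        = \arg\max_{S} \prod_{t=1}^{n}
          \underbrace{P(w_t \mid w_{t-1}, s_t)}_{\text{state-dependent bigram}}
          \;
          \underbrace{P(s_t \mid s_{t-1})}_{\text{state transition}}
```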
- the context corresponds to the smallest portion of text, containing the named entity, which allows the labeler 220 to decide that a named entity is occurring.
- the named entity training tagger 240 must model not only the named entity itself (e.g., 20 dollars and 40 cents) but its whole context of occurrence (e.g., . . . this 20 dollars and 40 cents call%) in order to disambiguate relevant named entities from others.
- the relevant context of a named entity tag in a sentence is the concatenation of all the syntactic phrases containing a word marked by the named entity parser 230.
- each named entity may be represented by three items: tag, context and value.
- the aligner 260 aligns the training recognizer's 250 output corpus, at the word level, with the transcribed corpus that has been tagged by the named entity training tagger 240.
- a named entity language model is created.
- This named entity language model may be a series of regular grammars coded as Finite-State Machines (FSMs).
- the aligner 260 creates named entity language models after the transcribed training corpus is labeled, parsed and tagged by the named entity training tagger 240 in order to extract named entity contexts on the clean text. Only the named entity contexts correctly tagged according to the labels marked by the labeler 220 are kept. Then, all the digits, natural numbers and proper names are replaced by corresponding non-terminal symbols. Finally, all of the patterns representing a given tag are merged in order to obtain one FSM for each tag coding the regular grammar of the patterns found in the corpus.
- the corpora $C_{NE}$ and $C_{BK}$ are replaced by their corresponding sections in the recognizer output corpus and stored along with the named entity language models in the named entity database 140.
- the drawback of learning a model directly on a very noisy channel is offset by structuring the noisy data according to constraints obtained on the clean channel. This leads to an increase in performance.
- the training recognizer 250 can generate, as output, a word-lattice as well as the highest probability hypothesis called the 1-best hypothesis.
- the word-error-rate of the 1-best hypothesis is around 27%.
- when the aligner 260 performs an alignment between the transcribed data and the word lattices produced by the training recognizer 250, the word-error-rate of the aligned corpus drops to around 10%.
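The gap between the 1-best and the aligned word-error-rate can be illustrated with a small sketch. It assumes the lattice paths are available as a plain list of candidate strings (a real lattice would not be enumerated this way), and the function names `word_error_rate` and `oracle_wer` as well as the example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed part of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion of a reference word
                cur[j - 1] + 1,          # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = cur
    return prev[-1] / len(ref)


def oracle_wer(reference: str, candidates: list) -> float:
    """Lowest error rate over all candidate paths (the aligned, or oracle, rate)."""
    return min(word_error_rate(reference, c) for c in candidates)


# Hypothetical transcription and recognizer paths, best-scoring path first.
reference = "i don't recognize this twenty dollar call"
candidates = [
    "i don't recognize his plenty dollar call",
    "i don't recognize this twenty dollar hall",
]
one_best = word_error_rate(reference, candidates[0])
aligned = oracle_wer(reference, candidates)
```

Here the most probable path is not the closest one to the transcription, which is the situation the alignment step is meant to exploit.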
- in step 3800, the recognized training corpus is updated using the aligned corpus and also stored in the named entity database 140 for use in the text classification performed by the named entity detection and extraction unit 160.
- the aligner 260 processes the text classifier training corpus using the named entity training tagger 240 with no rejection. On one side, all of the named entities that are correctly tagged by the named entity training tagger 240 according to the labels given by the labeler 220 are kept. On the other side, all the false positive detections are labeled by the aligner 260 with the tag OTHER. Then, the text classifier 520 (introduced below) is trained in order to separate the named entity tags from the OTHER tags using the text classifier training corpus. The process goes to step 3900 and ends.
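The construction of the text classifier training corpus described above can be sketched as follows. The data layout (tuples of sentence id, tag and context, and a dictionary of reference labels) and the function name `build_classifier_corpus` are assumptions made for illustration only.

```python
def build_classifier_corpus(detections, reference_labels):
    """Label each tagger detection for text classifier training.

    detections: list of (sentence_id, tag, context) produced by the tagger
        run with no rejection.
    reference_labels: dict mapping sentence_id to the set of tags the
        labeler assigned to that sentence.
    A detection keeps its tag when the labeler agrees; every false
    positive is relabeled OTHER.
    """
    corpus = []
    for sentence_id, tag, context in detections:
        gold = reference_labels.get(sentence_id, set())
        label = tag if tag in gold else "OTHER"
        corpus.append((context, label))
    return corpus


# Hypothetical detections on two sentences.
detections = [
    (1, "Item_Amount", "this 22 dollars call"),
    (2, "Item_Amount", "a 50 dollars gift"),
]
reference_labels = {1: {"Item_Amount"}, 2: set()}
corpus = build_classifier_corpus(detections, reference_labels)
```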
- Figs. 4 and 5 illustrate more detailed exemplary diagrams of portions of input communication processing subsystem 120 and Figs. 6 and 7 illustrate an exemplary named entity detection and extraction process and an exemplary task classification process, respectively.
- Fig. 4 illustrates an exemplary named entity detection and extraction unit 160.
- the named entity detection and extraction unit 160 may include a named entity detector 410 and named entity extractor 420.
- Fig. 5 is a more detailed block diagram illustrating an exemplary named entity detector 410.
- the named entity detector 410 may include a named entity tagger 510 and a text classifier 520.
- the operations of individual units shown in Figs. 4 and 5 will be discussed in detail in relation to Figs. 6 and 7 below.
- Fig. 6 is an exemplary flowchart illustrating a possible named entity detection and extraction process using the exemplary system described in relation to the figures discussed above.
- the named entity detection and extraction process begins at step 6100 and proceeds to step 6200 where the input communication recognizer 150 recognizes the input communication from the user and produces a lattice.
- the weights or likelihoods of the paths of the lattices are interpreted as negative logarithms of the probabilities.
- the pruned network will be considered.
- the beam search is restricted in the lattice output, by considering only the paths with probabilities above a certain threshold relative to the best path.
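The weight convention and the relative threshold described above can be sketched as follows. The sketch assumes natural logarithms and an explicit list of paths in place of a real lattice; the names `path_probability` and `prune_paths` and the example weights are hypothetical.

```python
import math


def path_probability(neg_log_weight: float) -> float:
    """Convert a path weight (negative natural log) back to a probability."""
    return math.exp(-neg_log_weight)


def prune_paths(paths: dict, beam: float) -> dict:
    """Keep the paths whose weight is within `beam` of the best weight.

    `paths` maps a word string to its negative-log weight, so a smaller
    weight means a more probable path. A beam of log(k) keeps every path
    whose probability is at least 1/k of the best path's probability.
    """
    best = min(paths.values())
    return {words: w for words, w in paths.items() if w - best <= beam}


# Hypothetical weights for three competing paths.
paths = {
    "this twenty dollar call": -math.log(0.5),
    "this plenty dollar call": -math.log(0.3),
    "his twenty dollar hall": -math.log(0.01),
}
kept = prune_paths(paths, beam=math.log(10))
```

With a beam of log(10), the third path is dropped because its probability is less than one tenth of the best path's probability.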
- Most recognizers can generate, as output, a word-lattice as well as the highest probability hypothesis called 1-best hypothesis. This lattice generation can be made at the same time as the 1-best string estimation with no further computational cost. As discussed above, in the customer care corpus, the word error rate of the 1-best hypothesis is around 27%. However, by performing an alignment between the transcribed data and the word lattices produced by the recognizer, the word error rate of the aligned corpus (called the oracle word error rate) dropped to approximately 10%. This simply means that, although the system has nearly all the information for decoding the corpus, most of the time the correct transcription is not the most probable one according to the recognition probabilistic models.
- the training recognizer 250 generally produces word lattices during the recognition process as an intermediate step. Even if they represent a reduced search-space compared to the first one obtained after the acoustic parameterization, they still can contain a huge number of paths, which limits the complexity of the methods that can be applied efficiently to them.
- the named entity parser 230 statistically parses the input word lattice by first pruning the word lattice in order to keep the 1000 best paths.
- Word Confusion Networks (WCN)
- the main idea consists in changing the topology of the lattice in order to reflect the time-alignment of the words in the signal. This new topology may be a concatenation of word-sets.
- the word lattice is pruned and the score attached to each word is calculated to represent the posterior probability of the word (i.e., the sum of the probabilities of all the paths leading to the word) and some new paths can appear by grouping words into sets.
- An empty transition (called epsilon transition) is also added to each word set in order to complete the probability sum of all the words of the set to 1.
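A minimal sketch of the epsilon completion step, assuming each word set is held as a dictionary from word to posterior probability. The `<eps>` symbol, the function name `complete_word_set` and the example posteriors are hypothetical.

```python
EPSILON = "<eps>"


def complete_word_set(posteriors: dict) -> dict:
    """Return a copy of one word set with an epsilon entry added.

    `posteriors` maps each competing word of a time slot to its posterior
    probability after pruning. The epsilon transition receives whatever
    probability mass is missing so that the set sums to 1.
    """
    total = sum(posteriors.values())
    if total > 1.0 + 1e-9:
        raise ValueError("posteriors of a word set cannot exceed 1")
    completed = dict(posteriors)
    completed[EPSILON] = max(0.0, 1.0 - total)
    return completed


# A hypothetical confusion network: a concatenation of word sets.
network = [
    complete_word_set({"this": 0.7, "his": 0.2}),
    complete_word_set({"twenty": 0.6, "plenty": 0.3}),
    complete_word_set({"dollar": 0.95}),
]
```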
- the named entity tagger 510 of the named entity detector 410 detects the named entity values and context of occurrence using the named entity language models stored in the named entity database 140.
- the named entity tagger 510 inserts named entity tags on the detected named entities.
- the named entity tagger 510 maximizes the probability expressed by the equation (1) above by means of a search algorithm.
- the named entity tagger 510 may give to each named entity tag a confidence score for being contained in a particular sentence, without extracting a value.
- each named entity detected by the named entity tagger 510 is scored by the text classifier 520.
- the scores given by the text classifier 520 are used as confidence scores to accept or reject a named entity tag according to a given threshold.
- the text classifier 520 was trained to separate the named entity tags from the OTHER tags using the text classifier training corpus stored in the named entity database 140.
- once the named entities are detected, they are extracted using the named entity extractor 420. A 2-step approach is undertaken for extracting named entity values.
- the named entity extractor 420 extracts the named entity values from the word lattice using the named entity language models stored in the named entity database 140 (specific to each named entity tag), but only on the areas selected by the named entity tagger 510. Extracting named entity values is crucial in order to complete a user's request because these values represent the parameters of the database queries.
- each named entity value is obtained by a separate question and named entities embedded in a sentence are usually ignored.
- it is important for the named entity extractor 420 to extract the named entity values as soon as the caller expresses them, even if the dialogue manager 180 hasn't explicitly asked for them. This point is particularly crucial in order to make the dialogue feel natural to the user. While extracting named entity values embedded in a sentence is much more difficult than processing answers to a direct question, the named entity extractor 420 can use, for this purpose, all the information that has been collected during the previous iterations of the dialogue. For example, in a customer care application, as soon as the customer is identified, all of the information contained in his bill could be used.
- the phone number to detect is contained in the bill, among all the other calls (N) made during the same period. Extracting a 10-digit string among N (with N of order a few tens) is significantly easier than finding the right string among the $10^{10}$ possible digit strings.
- the named entity extractor 420 performs a composition operation between the FSM associated to a named entity tag and an area of the word-lattice where such a tag has been detected by the named entity tagger 510.
- the word-lattices produced by the input communication recognizer 150 may be first transformed into a chain-like structure (also called "sausages").
- once the named entity extractor 420 composes the FSM with the portion of the chain corresponding to the named entity area detected, in step 6700, the named entity extractor 420 searches for the best path according only to the confidence scores attached to each word by the text classifier 520 in the chain. The FSM is not weighted, as all the patterns extracted by the named entity extractor 420 from the named entity language model are considered valid.
- the named entity extractor 420 performs a simple filtering process of the best path in the FSM in order to output values to the natural language understanding unit 170, in step 6800.
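The composition and best-path search above can be sketched with a small dynamic program. This is a simplified stand-in for FSM composition, not the patent's implementation: the chain is assumed to be a list of dictionaries from word to confidence, the unweighted FSM is assumed to be a dictionary of transitions, and the names `best_match`, `EPS` and the example grammar are hypothetical.

```python
import math

EPS = "<eps>"


def best_match(chain, transitions, start, finals):
    """Best-scoring word sequence through `chain` accepted by the FSM.

    chain: list of word sets, each a dict word -> confidence in (0, 1].
    transitions: dict (state, word) -> next_state of an unweighted FSM.
    Returns (words, score), or None when no path is accepted.
    """
    # best[state] = (log score, words so far) at the current chain position.
    best = {start: (0.0, [])}
    for word_set in chain:
        nxt = {}
        for state, (score, words) in best.items():
            for word, conf in word_set.items():
                if word == EPS:
                    target, out = state, words  # skip this slot
                elif (state, word) in transitions:
                    target, out = transitions[(state, word)], words + [word]
                else:
                    continue  # word not allowed by the grammar here
                cand = (score + math.log(conf), out)
                if target not in nxt or cand[0] > nxt[target][0]:
                    nxt[target] = cand
        best = nxt
    accepted = [v for s, v in best.items() if s in finals]
    if not accepted:
        return None
    score, words = max(accepted, key=lambda v: v[0])
    return words, math.exp(score)


# Hypothetical grammar accepting "twenty dollars" or "thirty dollars".
transitions = {(0, "twenty"): 1, (0, "thirty"): 1, (1, "dollars"): 2}
chain = [
    {"twenty": 0.6, "plenty": 0.3, EPS: 0.1},
    {"dollars": 0.9, "collars": 0.1},
]
result = best_match(chain, transitions, start=0, finals={2})
```

Because the FSM carries no weights, the score of a path depends only on the word confidences in the chain, which mirrors the search criterion stated above.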
- the natural language understanding unit 170 component of a dialogue system is located between the recognizer 150 component and the dialogue manager 180.
- the recognizer 150 outputs the best string of words estimated from the speech input according to the acoustic and language models with a confidence score attached to each word.
- the goal of the natural language understanding unit 170 is then to extract semantic information from this noisy string of words in order to give it to the dialogue manager 180.
- This architecture relies on the assumption that ultimately the progress made by automated communication recognition technology will allow the recognizer 150 to transcribe almost perfectly the communication input. Accordingly, as discussed above, a natural language understanding unit 170 may be designed and trained on manual transcriptions of human-computer dialogues. This assumption is reasonable when the language accepted by the dialogue application is very constrained, like in system-initiative dialogue systems. However, it becomes unrealistic when dealing with conversational spontaneous communications. This is because of the Out-Of-Vocabulary (OOV) word phenomenon which is inevitable, even with a very large recognition lexicon (occurrences of proper names like customers' names, for example) and also because of the ambiguities intrinsic to spontaneous communication.
- transcribing and understanding are two processes that should be done simultaneously as most of the transcription ambiguities of spontaneous speech can only be resolved by some understanding of what is being said.
- the usual architecture of dialogue systems may be modified by integrating the transcription process into the natural language understanding unit 170. Instead of producing only one string of words, the recognizer 150 will output a word lattice where the different paths can correspond to different interpretations of the communication input.
- step 6900 ends or continues to step
- a simple insertion or substitution in a named entity expression will lead to a rejection of the expression by the grammar.
- a named entity detection system based only on CFG applied to the best string hypothesis generated by the recognizer 150 will have a very high false rejection rate because of the numerous insertions, substitutions and deletions occurring in the recognizer 150 hypothesis.
- Two possible ways to address this problem are as follows: 1) Replace the CFG by a stochastic model in order to estimate the probability of a given distortion of the canonical form of a named entity; or 2) Apply the CFG, not only to the best string hypothesis of the recognizer 150, but to the whole WCN as mentioned above. The first possibility will be discussed below.
- the CFGs are represented as Finite State Machines (FSM) where each path, from a starting state to a final state, corresponds to an expression accepted by the corresponding grammar.
- because the handwritten grammars are not stochastic, the corresponding FSMs aren't weighted and all paths are considered equal.
- applying a grammar to a WCN only consists of composing the two FSMs. Because an epsilon (empty) transition exists between each state of the WCN, all the words surrounding a named entity expression will be replaced by the epsilon symbol and finding all the possible matches between a grammar and a WCN corresponds to enumerating all the possible paths in the composed FSM.
- each dialogue of the labeled corpus is split into turns and to each turn is attached a set of fields containing various information like the prompt name, the dialogue context, the exact transcription, the recognizer output, the NLU tags, etc.
- One of these fields is made of a list of triplets, one for each named entity contained in the sentence, as presented above.
- the field corresponding to the named entity context is supposed to contain the smallest portion of the sentence, containing the named entity, which characterizes this portion as a named entity. For example, in the sentence: I don't recognize this 22 dollar phone call on my December bill, the context attached to the named entity Item_Amount is "this 22 dollar phone call" and the one attached to the named entity Which_Bill is "December bill."
- the key point of all the grammar induction methods is the strategy chosen for merging the different non-terminals: if no merging is performed, the grammar will only model the training examples; if too many non-terminals are merged, the grammar will accept incoherent strings.
- the merging strategy may be limited to a set of standard non-terminals: digits, natural numbers, and day and month names. The following substitutions are considered:
- the merged grammar is simply the union of the different FSMs corresponding to the different grammars.
- These handwritten grammars can be seen as a back-off strategy for detecting named entities.
- the named entity tag Which_Bill can have either a value corresponding to a date (the bill issued date, for example) or to a relative position (current or previous). But not all dates can be considered as a Which_Bill; a date corresponding to a given phone call, for example, cannot.
- all the dates corresponding to a tag Which_Bill are embedded in a string clarifying the nature of the date, like: bill issued on November 12th 2001.
- Adding handwritten rules is then an efficient way of increasing the recall of the named entity detection process. For example, in the previous example, if a handwritten grammar representing any kind of date is added to the data-induced grammar related to the tag Which_Bill, all the expressions identifying a bill by means of a date will be accepted. The first expression bill issued on November 12th 2001 will still be identified by the pattern found in the training corpus, because it's longer than the simple date and gives a better coverage of the sentence. The second expression bill dated November 12th 2001 will be reduced to the date itself and accepted by the back-off handwritten grammar representing the dates.
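The back-off behaviour in this example can be sketched as follows. Regular expressions stand in for the FSM grammars here, which is a substitution made purely for brevity; the pattern lists `INDUCED` and `BACKOFF` and the function `detect_which_bill` are hypothetical names. The longest match wins, so the data-induced pattern is preferred whenever it applies.

```python
import re

MONTH = (r"(?:January|February|March|April|May|June|July|August|"
         r"September|October|November|December)")
DATE = MONTH + r" \d{1,2}(?:st|nd|rd|th)? \d{4}"

# Assumed stand-ins: one pattern learned from the training corpus and
# one handwritten grammar that accepts any date.
INDUCED = [re.compile(r"bill issued on " + DATE)]
BACKOFF = [re.compile(DATE)]


def detect_which_bill(sentence):
    """Return (source, matched text) for the longest match, or None."""
    candidates = []
    for source, patterns in (("induced", INDUCED), ("backoff", BACKOFF)):
        for pattern in patterns:
            for m in pattern.finditer(sentence):
                candidates.append((len(m.group(0)), source, m.group(0)))
    if not candidates:
        return None
    # Longest match gives the better coverage of the sentence.
    _, source, text = max(candidates, key=lambda c: c[0])
    return source, text


first = detect_which_bill("the bill issued on November 12th 2001 is wrong")
second = detect_which_bill("my bill dated November 12th 2001 is wrong")
```

In this sketch `first` is resolved by the induced pattern, while `second` falls back to the bare date, mirroring the two expressions discussed above.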
- a named entity value has to be extracted by the named entity extractor 420.
- each named entity tag can be represented by one or several kinds of values.
- the evaluation of a named entity processing method applied to dialogue systems has to be done on the values extracted and not the word-string itself. From the dialogue manager's 180 point of view, the normalized values of the named entities will be the parameters of any database dip, and not the string of words, symbols, or gestures used to express them.
- November 12th 2001 is "2001/11/12". If the same value is extracted from the recognizer 150 output, this will be considered as a success, even if the named entity string estimated is bill of the November 12th 2001 or issue the November 12th of 2001.
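A minimal sketch of this value-level comparison, assuming a simple regular expression in place of the grammar transducers: the three strings quoted above should all normalize to the same value. The function name `extract_date_value` and the pattern are illustrative assumptions only.

```python
import re

MONTHS = {name: number for number, name in enumerate(
    ["January", "February", "March", "April", "May", "June", "July",
     "August", "September", "October", "November", "December"], start=1)}

# Month name, day with optional ordinal suffix, optional "of", 4-digit year.
DATE_PATTERN = re.compile(
    r"(?P<month>" + "|".join(MONTHS) + r")\s+"
    r"(?P<day>\d{1,2})(?:st|nd|rd|th)?(?:\s+of)?\s+(?P<year>\d{4})"
)


def extract_date_value(text):
    """Return the normalized YYYY/MM/DD value found in text, or None."""
    m = DATE_PATTERN.search(text)
    if m is None:
        return None
    return f"{int(m['year']):04d}/{MONTHS[m['month']]:02d}/{int(m['day']):02d}"


reference = extract_date_value("November 12th 2001")
hypothesis_a = extract_date_value("bill of the November 12th 2001")
hypothesis_b = extract_date_value("issue the November 12th of 2001")
```

Scoring then compares `hypothesis_a` and `hypothesis_b` with `reference` rather than comparing the word strings.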
- the extraction process is implemented as a transduction operation between these FSMs and the output of the recognizer 150.
- the FSMs are simply transformed into transducers by adding the output tokens attached to each input symbol for each arc of the FSMs.
- On the recognizer 150 output side, the following process is performed:
- if the recognizer 150 output is a 1-best word string, it is turned into a sequential FSM; otherwise the word-lattice, or word confusion network, is used directly as an FSM;
- each path in the composed FSM between the recognizer 150 output transducer and the grammar transducer corresponds to a different named entity value.
- By extracting the n-best paths according to the confidence score attached to each word in the recognizer 150 output, the n-best values are automatically obtained according to the different paths in the grammars and in the FSM produced by the recognizer 150.
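As a rough illustration of ranking paths by word confidence, the sketch below enumerates every path through a small confusion network and keeps the n best. Scoring a path by the product of its word confidences is an assumption made here; the text does not state how the scores are combined. The function `n_best_paths` and the toy network are hypothetical, and exhaustive enumeration is only practical for very small networks.

```python
import heapq
import itertools
import math


def n_best_paths(wcn, n):
    """Return the n highest-scoring (score, words) paths through the WCN.

    wcn: list of slots, each slot a list of (word, confidence) pairs.
    """
    scored = []
    for path in itertools.product(*wcn):
        words = tuple(word for word, _ in path)
        # Assumed scoring rule: product of the per-word confidences.
        score = math.prod(conf for _, conf in path)
        scored.append((score, words))
    return heapq.nlargest(n, scored)


wcn = [
    [("november", 0.9), ("december", 0.1)],
    [("12th", 0.6), ("12", 0.4)],
]
best_two = [words for _, words in n_best_paths(wcn, 2)]
```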
- Tagging methods have been widely used in order to associate to each word of a text a morphological tag called Part-Of-Speech (POS).
- the named entity detection process can be further described. It is assumed that the language contains a fixed vocabulary $w^1, w^2, \ldots, w^V$, which is the lexicon used by the recognizer 150. It is also assumed that there is a fixed set of named entity tags $t^1, t^2, \ldots, t^T$ and that a tag $t^0$ represents the background text. A particular sequence of n words is represented by the symbol $w_{1,n}$ and, for each $i \le n$, $w_i \in \{w^1, w^2, \ldots, w^V\}$.
- a sequence of n tags is represented by $t_{1,n}$ and, for each $i \le n$, $t_i \in \{t^0, t^1, \ldots, t^T\}$.
- the tagging problem can then be formally defined as finding the sequence of tags $t_{1,n}$ in the following way:
- Equation 1 can be turned as:
- This training corpus is built from the HMIHY corpus in the following way.
- the corpus may contain about 44K dialogues (130K sentences), for example.
- This corpus is divided into a training corpus containing 35K dialogues (102K sentences) and a test corpus of 9K dialogues (28K sentences). Only about 30% of the dialogues and 15% of the sentences contain named entities. This corpus represents only the top level of the whole dialogue system, corresponding to the task classification routing process. This is why the average number of turns is rather small (around 3 turns per dialogue) and the percentage of sentences containing a named entity is also small (the database queries which require a lot of named entity values are made in the legs of the dialogue and not at the top level). Nevertheless, this still yields a 16K sentence training corpus, manually transcribed, where each sentence contains at least one named entity.
- the last step in the training corpus process is a non-terminal substitution process applied to digit strings, ordinal numbers, and month and day names. Because the goal of the named entity training tagger 240 is not to predict a string of words but to tag an already existing string, the generalization power of the named entity training tagger 240 can be increased by replacing some words by general non-terminal symbols. This is especially important for digit strings, as the length of a string is a very strong indicator of its purpose.
- 10-digit strings are very likely to represent phone numbers, but if all the digits are represented as single tokens in the training corpus, the 3-gram language model used by the named entity training tagger 240 will not be able to model this phenomenon accurately, as the span of such a model is only 3 words.
- a 3-gram LM will be able to correctly model the context surrounding these phone numbers.
- all the N-digit strings are replaced by the symbol $digitN, the ordinal numbers are replaced by $ord, the month names by $month and the day names by $day.
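The substitution step can be sketched as below. The word lists and the function `substitute_non_terminals` are illustrative assumptions; in particular, treating every spelled-out digit or ordinal word as a non-terminal ignores ambiguous uses such as "this one" or "a second", which a real pre-processor would have to handle.

```python
import re

DIGIT_WORDS = {"zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"}
MONTHS = {"january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"}
DAYS = {"monday", "tuesday", "wednesday", "thursday",
        "friday", "saturday", "sunday"}
ORDINAL_WORDS = {"first", "second", "third", "fourth", "fifth"}
ORDINAL_RE = re.compile(r"\d+(?:st|nd|rd|th)")


def substitute_non_terminals(tokens):
    """Replace digit runs, ordinals, month and day names by non-terminals."""
    out = []
    run = 0  # number of digits in the digit string currently being read

    def flush():
        nonlocal run
        if run:
            out.append(f"$digit{run}")
            run = 0

    for token in tokens:
        low = token.lower()
        if low in DIGIT_WORDS:
            run += 1
        elif low.isdigit():
            run += len(low)
        else:
            flush()
            if low in MONTHS:
                out.append("$month")
            elif low in DAYS:
                out.append("$day")
            elif low in ORDINAL_WORDS or ORDINAL_RE.fullmatch(low):
                out.append("$ord")
            else:
                out.append(token)
    flush()
    return out


phone = substitute_non_terminals(
    "my number is 9 7 3 5 5 5 1 2 1 2 on the december bill".split())
date = substitute_non_terminals("call on monday november 12th".split())
```

With this encoding a ten-digit phone number occupies a single token, so a 3-gram model can see the words on both sides of it.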
- the parameters of the probabilistic model of the named entity training tagger 240 are then directly estimated from this corpus by means of a simple 3-gram approach with back off for unseen events.
- Tagging approaches based on language models with back off can be seen as stochastic grammars with no constraints as every path can be processed and receive a score. Therefore, the handling of the possible distortions of the named entity expressions found in the training corpus is automatic and this allows modeling of longer sequences without risking rejecting correct named entities expressed or recognized in a different way.
- a context-expansion method may be implemented based on a syntactic criterion as follows:
- (1) the training corpus is first selected and labeled according to the method presented above; (2) then, a part-of-speech tagging followed by a syntactic bracketing process is performed on each sentence in order to insert boundaries between each noun phrase, verbal phrase, etc.;
- Such a method balances the inconvenience of learning directly a model on a very noisy channel (recognizer output) by structuring the noisy data according to constraints obtained on the clean channel (manual transcriptions).
- the named entity tagging process consists in maximizing the probability expressed by equation 7 by means of a search algorithm.
- the input is the best-hypothesis word string output of the recognizer module, and is pre-processed in order to replace some tokens by non-terminal symbols as discussed above.
- Word-lattices are not processed in this step because the tagging model is not trained for finding the best sequence of words, but instead for finding the best sequence of tags for a given word string.
- each word is associated with a tag, $t^0$ if the word is not part of any named entity, and $t^n$ if the word is part of the named entity n.
- An SGML-like tag <n> is inserted for each transition between a word tagged $t^0$ and a word tagged $t^n$.
- the end of a named entity context is detected by the transition between a word tagged $t^n$ and a word tagged $t^0$ and is represented by the tag </n>.
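The bracketing rule can be sketched as a single pass over the tagged words. The function `insert_entity_tags` and the string "t0" used for the background tag are hypothetical; the sketch also closes and reopens a bracket when two different entities are adjacent, a case the text above does not spell out.

```python
def insert_entity_tags(words, tags, background="t0"):
    """Insert <n> ... </n> markers around each run of words tagged n."""
    out = []
    current = background
    for word, tag in zip(words, tags):
        if tag != current:
            if current != background:
                out.append(f"</{current}>")  # leaving a named entity
            if tag != background:
                out.append(f"<{tag}>")       # entering a named entity
            current = tag
        out.append(word)
    if current != background:
        out.append(f"</{current}>")          # entity runs to the end
    return " ".join(out)


words = "I don't recognize this 22 dollar phone call on my December bill".split()
tags = (["t0"] * 3 + ["Item_Amount"] * 5 + ["t0"] * 2 + ["Which_Bill"] * 2)
bracketed = insert_entity_tags(words, tags)
```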
- the text classifier 520 is trained in order to separate these samples according to their named entity tags as well as the OTHER tag.
- During the tagging process, the scores given by the text classifier 520 are used as confidence scores to accept or reject a named entity tag according to a given threshold.
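A minimal sketch of the accept/reject decision, assuming each detected entity carries a classifier score between 0 and 1. The dictionary layout, the function `filter_by_confidence`, and the threshold value are illustrative assumptions; the text does not specify a threshold.

```python
def filter_by_confidence(detections, threshold):
    """Split detections into accepted and rejected lists by classifier score."""
    accepted, rejected = [], []
    for detection in detections:
        if detection["score"] >= threshold:
            accepted.append(detection)
        else:
            rejected.append(detection)
    return accepted, rejected


detections = [
    {"tag": "Which_Bill", "context": "December bill", "score": 0.92},
    {"tag": "Item_Amount", "context": "this 22 dollar", "score": 0.35},
]
accepted, rejected = filter_by_confidence(detections, threshold=0.5)
```

Raising the threshold trades recall for precision, so it would normally be tuned on held-out data.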
- CFGs are often too strict when they are applied to the 1-best string produced by the recognizer 150, and on the other hand, applying them to the entire word lattice might generate too many false detections as there is no way of modeling their surrounding contexts of occurrence within the sentence.
- the tagging approach presented above provides efficient answers to these problems as the whole context of occurrence of a given named entity expression is modeled with a stochastic grammar that handles distortions due to recognizer errors or spontaneous speech effects. But this latter model can be applied only to the 1-best string produced by the recognizer module, which prevents using the whole word lattice for extracting the named entity values.
- a hybrid method may be implemented based on a 2-step process, which tries to take advantage of the two methods previously presented.
- the tagger is first used in order to have a general idea of its content. Then, the transcription is refined using the word-lattice with very constrained models (the CFGs) applied locally to the areas detected by the tagger. By doing so the understanding and the transcribing processes are linked and the final transcription output is a product of the natural language understanding unit 170 instead of the recognizer 150.
- the general architecture of the process may include the following:
- the dialogue manager receives the extracted named entity values, with two kinds of confidence scores: one attached to the tag itself and one given to the value (made from the confidence scores of the words composing the value).
- Fig. 7 is a flowchart of a possible task classification process using named entities.
- any method may be used as known to those of skill in the art, including classification methods disclosed in U.S. Patents Nos. 5,675,707, 5,860,063, 6,021,384, 6,044,337, 6,173,261, and 6,192,110.
- the input of the dialogue manager 180 is a list of salient phrases detected in the sentence. These phrases are automatically acquired on a training corpus.
- An important task of the dialogue manager 180 is to generate prompts according to the dialogue history in order to clarify the user's request and complete the task.
- the dialogue manager 180 may perform task classifications based on the detected named entities and/or background text.
- the dialogue manager 180 may apply a confidence function based on the probabilistic relation between the recognized named entities and selected task objectives, for example.
- the dialogue manager 180 determines whether a task can be classified based on the extracted named entities.
- step 7300 the dialogue manager
- step 7700 the task objective is completed by the communication recognition and understanding system 100 or by another system connected directly or indirectly to the communication recognition and understanding system 100.
- step 7800 the process then goes to step 7800 and ends.
- step 7400 the dialogue manager 180 conducts dialogue with the user/customer to obtain clarification of the task objective.
- step 7500 the dialogue manager unit 180 determines whether the task can now be classified based on the additional dialogue. If the task can be classified, the process proceeds to step 7300 and the user/customer is routed in accordance with the classified task objective and the process ends at step 7800. However, if the task still cannot be classified, in step 7600, the user/customer is routed to a human for assistance and then the process goes to step 7800 and ends.
- Although the flowchart in Fig. 7 only shows two iterations, multiple attempts to conduct dialogue with the user may be conducted in order to clarify one or more of the task objectives within the spirit and scope of the invention.
- Although the system and method of the invention are sometimes illustrated above using words, numbers or phrases, the invention may also use symbols, portions of words or sounds called morphemes (or sub-morphemes known as phone-phrases).
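The control flow of steps 7300 to 7800 can be summarized in a short sketch. The callables `classify` and `ask_user`, the return value "human_operator", and the `max_attempts` parameter are hypothetical names introduced for illustration; only the step numbers come from the description.

```python
def handle_call(entities, classify, ask_user, max_attempts=1):
    """Route a call by task, asking for clarification before giving up.

    classify: returns a task label, or None if the task cannot be classified.
    ask_user: returns extra named entities obtained through dialogue.
    """
    task = classify(entities)
    attempts = 0
    while task is None and attempts < max_attempts:
        entities = entities + ask_user()  # step 7400: clarification dialogue
        task = classify(entities)         # step 7500: try to classify again
        attempts += 1
    if task is None:
        return "human_operator"           # step 7600: route to a human
    return task                           # step 7300: route by task objective


def classify(entities):
    # Toy classifier: a bill reference is enough to route to billing.
    return "billing" if "Which_Bill" in entities else None


direct = handle_call(["Which_Bill"], classify, ask_user=lambda: [])
clarified = handle_call([], classify, ask_user=lambda: ["Which_Bill"])
fallback = handle_call([], classify, ask_user=lambda: [])
```

Setting `max_attempts` above 1 corresponds to the multiple clarification attempts mentioned above.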
- morphemes are essentially a cluster of semantically meaningful phone sequences for classifying of utterances. The representations of the utterances at the phone level are obtained as an output of a task-independent phone recognizer. Morphemes may also be formed by the input communication recognizer 150 into a lattice structure to increase coverage, as discussed in further detail above.
- the morphemes may be non-acoustic (i.e., made up of non-verbal sub-morphemes such as tablet strokes, gestures, body movements, etc.). Accordingly, the invention should not be limited to just acoustic morphemes and should encompass the utilization of any sub-units of any known or future method of communication for the purposes of recognition and understanding.
- speech may connote only spoken language
- phrase may include verbal and/or non-verbal sub-units (or sub-morphemes). Therefore, "speech", "phrase" and "utterance" may comprise non-verbal sub-units, verbal sub-units or a combination of verbal and non-verbal sub-units within the spirit and scope of this invention.
- the nature of the invention described herein is such that the method and system may be used with a variety of languages and dialects.
- the method and system may operate on well-known, standard languages, such as English, Spanish or French, but may also operate on rare, new and unknown languages and symbols in building the database.
- the invention may operate on a mix of languages, such as communications partly in one language and partly in another (e.g., several English words along with or intermixed with several Spanish words).
- method can also be implemented on a general-purpose or a special purpose computer, a programmed microprocessor or microcontroller, peripheral integrated circuit elements, an application-specific integrated circuit (ASIC) or other integrated circuits, hardware/electronic logic circuits, such as a discrete element circuit, a programmable logic device, such as a PLD, PLA, FPGA, or PAL, or the like.
- any device on which resides a finite state machine capable of implementing the flowcharts shown in Figs. 3, 6 and 7 can be used to implement the recognition and understanding system functions of this invention.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Artificial Intelligence (AREA)
- Theoretical Computer Science (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
The present invention relates to a method and system for detecting and extracting named entities from spontaneous communications (FIG. 1). The method may recognize input communications from a user (150), detect contextual named entities (160) from the recognized input communications (150), and output the contextual named entities to a language understanding unit (170).
Applications Claiming Priority (7)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US30762402P | 2002-04-05 | 2002-04-05 | |
| US307624P | 2002-04-05 | ||
| US44364203P | 2003-01-29 | 2003-01-29 | |
| US443642P | 2003-01-29 | ||
| US40294103A | 2003-04-01 | 2003-04-01 | |
| US402941 | 2003-04-01 | ||
| PCT/US2003/010482 WO2003088080A1 (fr) | 2002-04-05 | 2003-04-07 | Method and system for detecting and extracting named entities from spontaneous communications |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| EP1497751A1 (fr) | 2005-01-19 |
| EP1497751A4 (fr) | 2009-10-21 |
Family
ID=29255289
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| EP03721540A Withdrawn EP1497751A4 (fr) | 2002-04-05 | 2003-04-07 | Method and system for detecting and extracting named entities from spontaneous communications |
Country Status (4)
| Country | Link |
|---|---|
| EP (1) | EP1497751A4 (fr) |
| AU (1) | AU2003224846A1 (fr) |
| CA (1) | CA2481080C (fr) |
| WO (1) | WO2003088080A1 (fr) |
Families Citing this family (20)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US8185399B2 (en) | 2005-01-05 | 2012-05-22 | At&T Intellectual Property Ii, L.P. | System and method of providing an automated data-collection in spoken dialog systems |
| US8478589B2 (en) * | 2005-01-05 | 2013-07-02 | At&T Intellectual Property Ii, L.P. | Library of existing spoken dialog data for use in generating new natural language spoken dialog systems |
| US9892208B2 (en) | 2014-04-02 | 2018-02-13 | Microsoft Technology Licensing, Llc | Entity and attribute resolution in conversational applications |
| US9798708B1 (en) | 2014-07-11 | 2017-10-24 | Google Inc. | Annotating relevant content in a screen capture image |
| US9965559B2 (en) | 2014-08-21 | 2018-05-08 | Google Llc | Providing automatic actions for mobile onscreen content |
| US9703541B2 (en) | 2015-04-28 | 2017-07-11 | Google Inc. | Entity action suggestion on a mobile device |
| US10229674B2 (en) | 2015-05-15 | 2019-03-12 | Microsoft Technology Licensing, Llc | Cross-language speech recognition and translation |
| CN105070289B (zh) * | 2015-07-06 | 2017-11-17 | 百度在线网络技术(北京)有限公司 | English personal name recognition method and device |
| US10803391B2 (en) * | 2015-07-29 | 2020-10-13 | Google Llc | Modeling personal entities on a mobile device using embeddings |
| US10970646B2 (en) | 2015-10-01 | 2021-04-06 | Google Llc | Action suggestions for user-selected content |
| US10178527B2 (en) | 2015-10-22 | 2019-01-08 | Google Llc | Personalized entity repository |
| US10055390B2 (en) | 2015-11-18 | 2018-08-21 | Google Llc | Simulated hyperlinks on a mobile device based on user intent and a centered selection of text |
| US10535005B1 (en) | 2016-10-26 | 2020-01-14 | Google Llc | Providing contextual actions for mobile onscreen content |
| US11237696B2 (en) | 2016-12-19 | 2022-02-01 | Google Llc | Smart assist for repeated actions |
| US10311874B2 (en) | 2017-09-01 | 2019-06-04 | 4Q Catalyst, LLC | Methods and systems for voice-based programming of a voice-controlled device |
| US10860800B2 (en) * | 2017-10-30 | 2020-12-08 | Panasonic Intellectual Property Management Co., Ltd. | Information processing method, information processing apparatus, and program for solving a specific task using a model of a dialogue system |
| US10629205B2 (en) | 2018-06-12 | 2020-04-21 | International Business Machines Corporation | Identifying an accurate transcription from probabilistic inputs |
| CN109785840B (zh) * | 2019-03-05 | 2021-01-29 | 湖北亿咖通科技有限公司 | Natural language recognition method and device, in-vehicle multimedia host, and computer-readable storage medium |
| US11790172B2 (en) * | 2020-09-18 | 2023-10-17 | Microsoft Technology Licensing, Llc | Systems and methods for identifying entities and constraints in natural language input |
| CN118114675B (zh) * | 2024-04-29 | 2024-07-26 | 支付宝(杭州)信息技术有限公司 | Medical named entity recognition method and device based on a large language model |
Family Cites Families (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5212730A (en) * | 1991-07-01 | 1993-05-18 | Texas Instruments Incorporated | Voice recognition of proper names using text-derived recognition models |
| US6173261B1 (en) * | 1998-09-30 | 2001-01-09 | At&T Corp | Grammar fragment acquisition using syntactic and semantic clustering |
| US5832480A (en) * | 1996-07-12 | 1998-11-03 | International Business Machines Corporation | Using canonical forms to develop a dictionary of names in a text |
| US6044337A (en) * | 1997-10-29 | 2000-03-28 | At&T Corp | Selection of superwords based on criteria relevant to both speech recognition and understanding |
| CN1159661C (zh) * | 1999-04-08 | 2004-07-28 | 肯特里奇数字实验公司 | System for Chinese tokenization and named entity recognition |
-
2003
- 2003-04-07 EP EP03721540A patent/EP1497751A4/fr not_active Withdrawn
- 2003-04-07 WO PCT/US2003/010482 patent/WO2003088080A1/fr not_active Ceased
- 2003-04-07 AU AU2003224846A patent/AU2003224846A1/en not_active Abandoned
- 2003-04-07 CA CA2481080A patent/CA2481080C/fr not_active Expired - Fee Related
Also Published As
| Publication number | Publication date |
|---|---|
| AU2003224846A1 (en) | 2003-10-27 |
| CA2481080C (fr) | 2010-10-12 |
| EP1497751A4 (fr) | 2009-10-21 |
| WO2003088080A1 (fr) | 2003-10-23 |
| CA2481080A1 (fr) | 2003-10-23 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US20030191625A1 (en) | Method and system for creating a named entity language model | |
| CA2481080C (fr) | Method and system for detecting and extracting named entities from spontaneous communications | |
| Gorin et al. | How may I help you? | |
| US6937983B2 (en) | Method and system for semantic speech recognition | |
| US8612212B2 (en) | Method and system for automatically detecting morphemes in a task classification system using lattices | |
| US6681206B1 (en) | Method for generating morphemes | |
| US6374224B1 (en) | Method and apparatus for style control in natural language generation | |
| US20150058006A1 (en) | Phonetic alignment for user-agent dialogue recognition | |
| JP2005084681A (ja) | Method and system for semantic language modeling and confidence measurement | |
| Béchet et al. | Detecting and extracting named entities from spontaneous speech in a mixed-initiative spoken dialogue context: How May I Help You? sm, tm | |
| WO2013163494A1 (fr) | Improving speech recognition results based on negative examples (anti-words) | |
| JP2000200273A (ja) | Utterance intention recognition device | |
| CN115292461B (zh) | Human-computer interaction learning method and system based on speech recognition | |
| Higashinaka et al. | Incorporating discourse features into confidence scoring of intention recognition results in spoken dialogue systems | |
| CN117634471A (zh) | NLP quality inspection method and computer-readable storage medium | |
| Gallwitz et al. | The Erlangen spoken dialogue system EVAR: A state-of-the-art information retrieval system | |
| Rose et al. | Integration of utterance verification with statistical language modeling and spoken language understanding | |
| López-Cózar et al. | ASR post-correction for spoken dialogue systems based on semantic, syntactic, lexical and contextual information | |
| JP3364631B2 (ja) | Statistical language model generation device and speech recognition device | |
| Rahim et al. | Robust numeric recognition in spoken language dialogue | |
| Béchet et al. | Named entity extraction from spontaneous speech in How May I Help You? | |
| Béchet | Named entity recognition | |
| Levit | Spoken Language Understanding without Transcriptions in a Call Center Scenario | |
| Huang et al. | Extracting caller information from voicemail | |
| Nair et al. | Pair-wise language discrimination using phonotactic information |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
| | 17P | Request for examination filed | Effective date: 20041026 |
| | AK | Designated contracting states | Kind code of ref document: A1 Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LI LU MC NL PT RO SE SI SK TR |
| | AX | Request for extension of the european patent | Extension state: AL LT LV MK |
| | A4 | Supplementary search report drawn up and despatched | Effective date: 20090916 |
| | 17Q | First examination report despatched | Effective date: 20100111 |
| | STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
| | 18D | Application deemed to be withdrawn | Effective date: 20100522 |