EP0314503A2 - Dictionary structure for document processing apparatus

Dictionary structure for document processing apparatus

Info

Publication number
EP0314503A2
EP0314503A2 EP88310178A EP88310178A EP0314503A2 EP 0314503 A2 EP0314503 A2 EP 0314503A2 EP 88310178 A EP88310178 A EP 88310178A EP 88310178 A EP88310178 A EP 88310178A EP 0314503 A2 EP0314503 A2 EP 0314503A2
Authority
EP
European Patent Office
Prior art keywords
representations
dictionary
group
kana
language
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP88310178A
Other languages
English (en)
French (fr)
Other versions
EP0314503A3 (de)
Inventor
Shigeki Kuga
Masahiro Wada
Taro Morishita
Hiroyuki Kanza
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sharp Corp
Original Assignee
Sharp Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sharp Corp
Publication of EP0314503A2
Publication of EP0314503A3

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/247 Thesauruses; Synonyms
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/53 Processing of non-Latin text

Definitions

  • The present invention relates to a dictionary structure for a document processing apparatus. The dictionary structure is used in a variety of document processing apparatus that require analysis of a character string, such as document composing and correction supporting apparatus, summarizing apparatus, retrieving apparatus, machine translation apparatus, and word processors.
  • Word processors for processing various languages have been put into practice.
  • In word processors for processing the Japanese language, for example, the associated basic techniques have been established, including input/output of the Japanese language, editing, kana (Japanese syllabary)/kanji (Chinese characters used in Japanese writing) conversion algorithms, and dictionaries.
  • Kana/kanji conversion algorithms for the Japanese language generally utilize both linguistic information, such as a word dictionary and grammar, and probability information, such as the frequency of use of a word, during the process of converting kana to kanji. Accordingly, to be of practical use, the dictionary, rules and other components must be fairly large in scale, and hence the means for storing them requires a large storage capacity (see Japanese Published Unexamined Patent Application Nos. 57-185573 (1982) and 57-185575 (1982), for example).
  • The method of analyzing the language includes, for example, a morpheme analysis for dividing a sentence into individual words and analyzing their parts of speech, a syntax analysis for analyzing whether or not the linkage between one part of speech and another is grammatically correct, and a meaning analysis for analyzing whether or not the meaning can be properly understood, even if the linkage between parts of speech is grammatically correct.
  • Basic techniques relating to those analyses are already known in the art.
  • A document processing apparatus in a broad sense is usually designed to include an input means, such as means for kana/kanji conversion in the case of apparatus for processing a document written in the Japanese language, for example, and also to have the capability of executing specific processing operations such as translation and correction.
  • The document processing apparatus often requires maintenance of the dictionary, rules and other components depending on the demands of users or the changing trends of the times. In this case, two sets of dictionaries, rules and other components must be updated, one for kana/kanji conversion and one for analysis of a kanji/kana mixed sentence, which prolongs the time required for maintenance and increases the probability of errors.
  • The present invention provides a dictionary structure for a document processing apparatus comprising means for inputting and editing language, means for storing the input language, means for storing a dictionary used to analyze the language, means for storing the grammar of the language and analyzing the input language based on the stored grammar while referring to the dictionary, means for extracting a portion to be corrected out of the input language, means for displaying the language, and means for modifying a portion of the language which is to be corrected, if any exists,
  • a single dictionary is structured such that the lexical entries obtained by sorting a first group of representations in a predetermined order suitable for retrieval from the first group of representations and the lexical entries obtained by sorting a second group of representations in a predetermined order suitable for retrieval from the second group of representations are correlated with each other in a one-to-one correspondence via key data, thereby allowing the first group of representations and the second group of representations to be mutually retrievable, from the former to the latter or vice versa.
  • The term "first group of representations" used in the present invention means one type of representational morph in any one language, e.g., representations expressed by phonetic symbols such as hiragana (the cursive kana characters), katakana (the square form of kana) and romaji (Japanese in Roman characters) in the Japanese language, or the Hangul alphabet in the Korean language.
  • The term "second group of representations" used in the present invention means the other type of representational morph, different from that of the first group of representations of the language but expressible by the first group of representations, e.g., representations expressed by ideographic characters such as kanji in the Japanese or Korean language.
  • Fig. 1 is a block diagram showing the configuration of a document processing apparatus according to the present invention.
  • In Fig. 1, designated at reference numeral 1 is an input/edit means, such as a keyboard, for inputting and editing a character string of the Japanese language.
  • The input/edit means 1 includes a kana/kanji converting function for converting kana to kanji and a function of specifying any one character string, both of which are well-known functions in the art.
  • Designated at 2 is an input character storage means for storing the character string of the Japanese language which is input from the input/edit means 1.
  • The input/edit means 1 is usually a keyboard through which character strings are input sequentially, but it may be an externally configured storage means, such as a floppy disk or magnetic tape, on which a mass of character strings of the Japanese language has been stored in advance.
  • Designated at 3 is a dictionary storage means for storing a dictionary used to analyze the character/symbol string loaded in the input character storage means 2.
  • Designated at 4 is a grammar storage means for storing grammar, rules and other information necessary for analyzing a sentence.
  • Designated at 5 is a control means for extracting a portion of the character string loaded in the input character storage means 2, for storing the intermediate result in the course of processing, and for executing a command to display.
  • The control means 5 includes means for storing the result of the control.
  • Designated at 6 is a display for displaying the input character string, the intermediate result of collation, the character string to be corrected, KWIC (Key Word in Context), etc.
  • Designated at 7 is an editor means for reflecting the result of modifying the displayed portion for correction into the original sentence. Where the document processing apparatus is not mainly used for correction, the function of the editor means 7 can instead be executed by the input/edit means 1.
  • Fig. 7 is a view showing the conventional structure of a word dictionary for kana/kanji conversion.
  • In Fig. 7, designated at 8 is a column of the ID numbers for vocabulary entries of the dictionary.
  • The ID numbers may be omitted because they can be derived from the addresses of a memory in which the dictionary is loaded.
  • Designated at 9 is a column of kana representations of the vocabulary entries, which show their readings in kana.
  • Where the vocabulary entries have conjugations, only the stems of the words are usually registered, and this explanation follows that usual method.
  • Designated at 11 is a column of part of speech information for the vocabulary entries.
  • This column may further include morph information for morpheme analysis, syntax information for syntax analysis and meaning information for meaning analysis. Since the use of these types of information is not related to the essentials of the present invention, only the part of speech information will be dealt with in the following for simplicity of description.
  • "VERB-FIVE CONJUGATIONS · (r)-SERIES" in Fig. 7 indicates that the relevant vocabulary entry is a verb having five conjugations in the r-series.
  • The technique of kana/kanji conversion is no more than processing which compares the input kana character string with the kana representations 9 using the above-mentioned dictionary, and then outputs the relevant kanji representation 10 when proper analysis has been carried out subject to certain other conditions such as grammar.
  • the first one is to sort the kana representations 9 in the dictionary according to a prescribed reference.
  • Fig. 7 shows a case in which the kana representations 9 are sorted in ascending order of the Japanese kana syllabary, i.e.,
  • Fig. 9 is a view showing an example of a one-character index corresponding to the dictionary of Fig. 2.
  • Reference numeral 12 designates single characters serving as keys for retrieval, each of which indicates a type of first character in the Japanese reading in kana.
  • Reference numeral 13 designates values, each indicating the location where the group of vocabulary entries beginning with that key character is stored.
  • Fig. 9 shows an example in which the vocabulary entry with the ID number "00001" in the dictionary is stored in the
  • Fig. 8 is a view showing a dictionary structure for analysis of a kanji/kana mixed sentence.
  • Reference numeral 14 designates information indicative of the ID numbers of the respective vocabulary entries, 15 kanji representations for the vocabulary entries, 16 kana representations for the vocabulary entries, and 17 part of speech information for the vocabulary entries.
  • As with the dictionary for kana/kanji conversion, only the stems of words are registered for those vocabulary entries which have conjugations, and the part of speech information may include other information depending on the specific case.
  • The kanji representations 15 to be retrieved are stored based on a prescribed reference in this dictionary as well.
  • The kanji representations are stored in ascending order of the kanji code JIS C6226.
  • The ID numbers 14 and/or the kana representations 16 may be omitted.
  • Fig. 10 shows a structure of the index.
  • Reference numeral 18 designates key indices, each comprising one character.
  • Reference numeral 19 designates start addresses, from each of which the vocabulary entries having the corresponding key character at their head begin.
  • Fig. 3 is a view showing an example of a dictionary structure which is used in the present invention.
  • the same vocabulary entries as those in Figs. 7 and 8 will be treated in the following description.
  • Reference numeral 20 designates the ID numbers of the vocabulary entries, 21 the kana representations of the vocabulary entries, and 22 the kanji representations of the vocabulary entries.
  • The arrangement of the kana representations 21 and the kanji representations 22 combines the dictionary for kana/kanji conversion and the dictionary for analysis of a kanji/kana mixed sentence, which have hitherto been independent of each other.
  • The kana representations 21 are sorted so as to suit kana/kanji conversion, and the kanji representations 22 are sorted so as to suit analysis of a kanji/kana mixed sentence.
  • Reference numeral 23 designates correlating information necessary to correlate the kana representations 21 and the kanji representations 22.
  • This information indicates that the kanji representations 22 corresponding to the ID numbers 20 contained in this information are the specific kanji representations for the kana representations 21.
  • the kanji representation " " is obtained by retrieving one of the kanji representations associated with the ID number 00006 of the particular vocabulary entry.
  • Reference numeral 24 designates part of speech information for the vocabulary entries.
  • The part of speech information 24 in the illustrated case is set in conformity with the kana representations 21, but may be set in conformity with the kanji representations 22.
  • As with the dictionary structures of Figs. 7 and 8, the present invention is not affected by the structural arrangement of the ID numbers 20 through the part of speech information 24.
  • Fig. 4 is a view showing another embodiment of the dictionary structure according to the present invention.
  • The contents of the correlating information 23 in Fig. 3 are replaced by those of correlating information 25.
  • The correlating information 23 in Fig. 3 is used for retrieving the kanji representations 22 with the kana representations 21 as keys.
  • The correlating information 25 in Fig. 4 is used for retrieving the kana representations 21 with the kanji representations 22 as keys.
  • The same reference numerals as those employed in Fig. 3 are used in Fig. 4.
  • Fig. 5 is a view showing another embodiment of the dictionary structure according to the present invention.
  • The dictionary structure of Fig. 5 combines Figs. 3 and 4: the correlating information 25 in Fig. 4 is added to the dictionary structure of Fig. 3 as additional correlating information 26.
  • Fig. 2 is a conceptual view showing an entire configuration of the dictionary of the present invention.
  • Designated at 27 is a conceptual indication of a structure for storing information on the vocabulary entries read in kana, sorted by their kana readings so as to suit retrieval from the kana representations.
  • Designated at 28 is a conceptual indication of a structure for storing information on the vocabulary entries in kanji representations, sorted by the kanji representations so as to suit retrieval from a kanji/kana mixed character string.
  • Designated at 29 is a conceptual indication of a structure for storing information for interlinking the structures 27 and 28, information about parts of speech common to both structures, and information specific to each of the structures 27 and 28.
  • the part of speech information 24 can be retrieved more easily.
  • Figs. 6a and 6b are a flow chart showing an outline of language processing by the use of the dictionary having the dictionary structure of the present invention.
  • First, selection is made as to which of the kana/kanji conversion and the analysis of a kanji/kana mixed sentence is to be executed (step 30).
  • The input character string is set in the buffer memory in accordance with the selected processing mode:
  • kana characters (step 31)
  • a kanji/kana mixed sentence (step 32)
  • A character string at the head of the character strings set in the buffer memory is collated with the index (step 33).
  • The input character string is collated with the kana representations 21 in the dictionary from the start location of a retrieval range for the dictionary which has been determined by the index (step 34).
  • It is then determined whether the collation has passed or failed. If the collation has passed, the part of speech information 24 for the same vocabulary entry is retrieved to check the validity of the kana/kanji conversion (step 35).
  • The flow branches in two ways depending on whether validity is confirmed or not. If the validity has been confirmed, the relevant kanji representation 22 is retrieved using the correlating information 23 that correlates the kana representation 21 and the kanji representation 22 of the vocabulary entry, based on the above-mentioned method (step 36).
  • Otherwise, the input character string is processed as an unregistered word (step 37).
  • A next character string is set (step 38) and the flow then returns to step 31.
  • The cycle of the above processing steps is repeated until all of the character strings have been processed or the process has been forcibly ended.
  • In step 39, the head character string is collated with the index for analysis of a kanji/kana mixed sentence.
  • The input character string is collated with the kanji representations 22 in the dictionary (step 40).
  • The part of speech information 24 is retrieved using the correlating information 26 based on the above-mentioned method.
  • The linguistic composition requirements are checked to analyze the language based on the vocabulary entries and the part of speech information 24 (step 42).
  • The analyzed result is output (step 43).
  • The input character string is processed as an unregistered word in step 44, as in the process flow for kana/kanji conversion. Then, after a next character string is set (step 45), the flow returns to step 32.
  • Although the processed language has been assumed to be Japanese by way of example, the invention is not limited to the Japanese language, and the dictionary structure of this application is also applicable to other languages, such as Korean, in which the first group of representations is given by phonetic symbols and the second group of representations is given by ideographic characters.
  • English, French, German, Spanish, etc. are languages which are expressed by phonetic symbols alone.
  • The dictionary structure of this application can further be applied to automatic translating apparatus for translating one of those languages to another, for example.
  • A dictionary structure capable of mutually translating one language to the other or vice versa can be obtained by setting one language as the first group of representations, setting the other language as the second group of representations, and adding key data adapted to correlate lexical entries given by the first group of representations and lexical entries given by the second group of representations in a one-to-one correspondence.
  • Since a dictionary used for retrieval based on the first group of representations, such as a dictionary for kana/kanji conversion in Japanese, and a dictionary used for retrieval based on the second group of representations, such as a dictionary for analysis of a kanji/kana mixed sentence in Japanese, are correlated with each other in a one-to-one correspondence via key data, the storage capacity necessary for the dictionary can be reduced to a large extent as compared with the conventional method of storing dictionaries and other components.
  • Another advantageous effect is that, despite the reduced storage capacity necessary for the dictionary and other components, no loss of information occurs and information can be retrieved at high speed. Still another advantageous effect is that, since the lexical entries are stored, the compression rate of the dictionary and other components can be increased.
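
The combined dictionary of Figs. 3 to 5 described above can be illustrated with a minimal Python sketch. It assumes that each vocabulary entry is stored once, that the ID number serves as the key data, and that two sorted key lists (one by kana, one by kanji) point back to the entries. The class and method names, the sample words, and the use of plain code-point order in place of the kana syllabary and JIS C6226 orders are all assumptions made for illustration, not details taken from the patent.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    entry_id: int  # ID number (20), used here as the key data
    kana: str      # kana representation (21)
    kanji: str     # kanji representation (22)
    pos: str       # part of speech information (24)


class UnifiedDictionary:
    """One entry table plus two sorted key lists that share entry IDs."""

    def __init__(self, entries):
        self.by_id = {e.entry_id: e for e in entries}
        # Assumption: plain code-point order stands in for the kana
        # syllabary order and the JIS C6226 kanji-code order.
        self.kana_keys = sorted((e.kana, e.entry_id) for e in entries)
        self.kanji_keys = sorted((e.kanji, e.entry_id) for e in entries)

    def _find(self, keys, text):
        """Binary-search a sorted key list and return all matching entries."""
        i = bisect_left(keys, (text, -1))  # -1 sorts before every real ID
        found = []
        while i < len(keys) and keys[i][0] == text:
            found.append(self.by_id[keys[i][1]])
            i += 1
        return found

    def kana_to_kanji(self, kana):
        """Retrieval in the kana/kanji conversion direction."""
        return [e.kanji for e in self._find(self.kana_keys, kana)]

    def kanji_to_kana(self, kanji):
        """Retrieval in the kanji/kana analysis direction."""
        return [e.kana for e in self._find(self.kanji_keys, kanji)]


# Hypothetical sample vocabulary, not taken from the patent figures.
SAMPLE = [
    Entry(1, "あめ", "雨", "noun"),
    Entry(2, "あめ", "飴", "noun"),
    Entry(3, "かわ", "川", "noun"),
    Entry(4, "はし", "橋", "noun"),
    Entry(5, "はし", "箸", "noun"),
]

dictionary = UnifiedDictionary(SAMPLE)
print(dictionary.kana_to_kanji("はし"))
print(dictionary.kanji_to_kana("川"))
```

Because both key lists refer to the same entry table, an update to one entry is visible from either retrieval direction, which is the maintenance benefit the description claims over keeping two separate dictionaries.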

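The process flow of Figs. 6a and 6b can likewise be sketched in a few lines. The sketch below is a simplification under stated assumptions: the input is already split into words, the grammar checks of steps 35 and 42 are reduced to a stub, and the one-character index is a Python dict that maps a first character to the start position of its group. The function names and sample words are hypothetical.

```python
from bisect import bisect_left

# Hypothetical sample vocabulary: (ID, kana, kanji, part of speech).
ENTRIES = [
    (1, "あめ", "雨", "noun"),
    (2, "かわ", "川", "noun"),
    (3, "はし", "橋", "noun"),
    (4, "やま", "山", "noun"),
]
KANA, KANJI, POS = 1, 2, 3


def build_view(field):
    """Sort the entries on one representation and index them by first character."""
    rows = sorted(ENTRIES, key=lambda e: e[field])
    keys = [r[field] for r in rows]
    index = {}
    for position, key in enumerate(keys):
        index.setdefault(key[0], position)  # start of each first-character group
    return rows, keys, index


VIEWS = {"kana": build_view(KANA), "kanji": build_view(KANJI)}


def is_valid(pos):
    """Stand-in for the grammar checks of steps 35 and 42 (an assumption)."""
    return pos in {"noun", "verb"}


def process(words, mode):
    """Return (word, counterpart, part of speech) for each input word."""
    rows, keys, index = VIEWS[mode]                # step 30: select the mode
    results = []
    for word in words:                             # steps 31/32, then 38/45
        entry = None
        start = index.get(word[:1])                # steps 33/39: collate with the index
        if start is not None:
            i = bisect_left(keys, word, lo=start)  # steps 34/40: collate with the dictionary
            if i < len(keys) and keys[i] == word and is_valid(rows[i][POS]):
                entry = rows[i]
        if entry is None:
            results.append((word, None, None))     # steps 37/44: unregistered word
        elif mode == "kana":
            results.append((word, entry[KANJI], entry[POS]))  # step 36
        else:
            results.append((word, entry[KANA], entry[POS]))   # steps 42/43
    return results


print(process(["はし", "そら"], "kana"))
print(process(["川", "山"], "kanji"))
```

Both modes run through the same loop and differ only in which sorted view and index they consult, mirroring the two branches of the flow chart.
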
Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)
  • Document Processing Apparatus (AREA)
EP19880310178 1987-10-28 1988-10-28 Dictionary structure for document processing apparatus Withdrawn EP0314503A3 (de)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP62274158A JPH01114976A (ja) 1987-10-28 1987-10-28 Dictionary structure of document processing apparatus
JP274158/87 1987-10-28

Publications (2)

Publication Number Publication Date
EP0314503A2 (de) 1989-05-03
EP0314503A3 (de) 1990-12-19

Family

ID=17537840

Family Applications (1)

Application Number Title Priority Date Filing Date
EP19880310178 Withdrawn EP0314503A3 (de) 1987-10-28 1988-10-28 Dictionary structure for document processing apparatus

Country Status (2)

Country Link
EP (1) EP0314503A3 (de)
JP (1) JPH01114976A (de)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2229558A (en) * 1989-03-02 1990-09-26 Nec Corp Device for analyzing Japanese sentences into morphemes with attention directed to morpheme groups
US6292770B1 (en) 1997-01-22 2001-09-18 International Business Machines Corporation Japanese language user interface for messaging system

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5666307B2 (ja) * 2007-11-26 2015-02-12 Warren Daniel CHILD System and method for classification and retrieval of kanji-type characters and character components

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS59865B2 (ja) * 1979-09-13 1984-01-09 Sharp Corp Electronic translation apparatus
JPS608980A (ja) * 1983-06-28 1985-01-17 Brother Ind Ltd Electronic dictionary

Also Published As

Publication number Publication date
JPH01114976A (ja) 1989-05-08
EP0314503A3 (de) 1990-12-19

Similar Documents

Publication Publication Date Title
US4777600A (en) Phonetic data-to-kanji character converter with a syntax analyzer to alter priority order of displayed kanji homonyms
EP0423683B1 (de) Apparatus for automatically generating an index
US5890103A (en) Method and apparatus for improved tokenization of natural language text
US4393460A (en) Simultaneous electronic translation device
EP0180047A2 (de) Text editor for speech input
JPH0724055B2 (ja) Word segmentation processing method
US5396419A (en) Pre-edit support method and apparatus
US7328404B2 (en) Method for predicting the readings of japanese ideographs
US20040193399A1 (en) System and method for word analysis
US6968308B1 (en) Method for segmenting non-segmented text using syntactic parse
US5079701A (en) System for registering new words by using linguistically comparable reference words
JPS58192173A (ja) Machine translation apparatus
EP0314503A2 (de) Dictionary structure for document processing apparatus
JP2595934B2 (ja) Kana-kanji conversion processing apparatus
JPS646499B2 (de)
JPH08190561A (ja) Document correction apparatus
JP2939945B2 (ja) Romaji address recognition apparatus
JPH08305698A (ja) Natural language analysis method and apparatus
JPH0630100B2 (ja) Kana-kanji conversion system
KR100248386B1 (ko) Japanese morphological analysis apparatus and method using human-readable morpheme connection information and character-type information
JPS6172359A (ja) Character correction system for kana-kanji conversion of unsegmented text
JP2900628B2 (ja) Dictionary search apparatus
JPH05225183A (ja) Automatic detection apparatus for word errors in Japanese text
JPH0128977B2 (de)
JPH0262659A (ja) Apparatus for extracting correction candidate characters for Japanese text

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 19881104

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): DE GB

PUAL Search report despatched

Free format text: ORIGINAL CODE: 0009013

AK Designated contracting states

Kind code of ref document: A3

Designated state(s): DE GB

17Q First examination report despatched

Effective date: 19931227

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 19951109