JPH07200256A

JPH07200256A - Method for sorting word irrespective of language

Info

Publication number: JPH07200256A
Application number: JP5354974A
Authority: JP
Inventors: Yosuke Ochiai; 洋介落合
Original assignee: Individual
Current assignee: Individual
Priority date: 1993-12-28
Filing date: 1993-12-28
Publication date: 1995-08-04

Abstract

PURPOSE: To convert words into numeric codes so that they can be sorted smoothly, relying as little as possible on fallible human memory. CONSTITUTION: Words cannot be sorted as they stand. To sort them, kanji (Chinese characters), kana (the Japanese syllabary), katakana (the angular form of kana), and foreign scripts (the alphabet, Arabic script, and the like) are converted to numbers as the situation requires so that they can be compared. (Sorting is achieved by converting the text to be sorted into numbers according to a definition file, which makes it measurable.) Manual learning takes place only when no number corresponds to a word or when the word is pronounced differently from the usual case. Sorting under a designated condition can thus be carried out, regardless of language, with errors and labor kept to a minimum.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to sorting text entered as ordinary data in any script (kanji, katakana, hiragana, the alphabet, and so on).

[0002]

[Prior Art] Data that needs sorting comes from many sources, such as the business card readers that have recently become popular, image scanners, and online databases reached over PC communication services. Manually entered information aside, information entered by these other means carries the characters themselves but no reading or pronunciation. Sorting it into aiueo (Japanese syllabary) order has therefore been quite difficult.

[0003]

[Problem to Be Solved by the Invention] In such cases even a single sort into Japanese aiueo order takes considerable labor and is far from efficient. To sort Japanese text in aiueo order, any entry containing kanji or alphabetic characters is first converted to kana so that it can be compared, and when a computer is used, some further encoding is needed for the comparison. Japanese kanji and foreign words, however, can sometimes be written in kana in more than one way, so the conversion is by no means easy. Moreover, sorting under complex conditions, or sorting sets that carry additional conditions such as a set grouped by personal name, requires a human judgment at every step, which makes a computer hard to use for the job.

[0004] In addition, sorting based on Japanese is not limited to aiueo order. There is also iroha (i-ro-ha-ni-ho-he-to) order, among others, so the ordering criterion itself varies.

[0005] One option is to enter sorting information (a code) alongside the data at input time, but this consumes that much more of the storage medium and does not seem a good approach.

[0006]

[Means for Solving the Problem] The present invention makes it possible to sort languages that cannot easily be sorted, relying on manual work as little as possible. Many algorithms exist for sorting an encoded, comparable set, such as bubble sort and insertion sort. Each has strengths and weaknesses depending on the number of items, so the choice of algorithm depends on the situation. The real problem is how to bring something that cannot be sorted into a sortable state. This is achieved by converting the text to be sorted into numbers, making it measurable, with as little manual work as possible.

[0007] When words are sorted, kanji, hiragana, katakana, and foreign scripts (the alphabet, Arabic script, and so on) are converted to numbers as the situation requires so that they can be compared. Learning takes place only when a word has no corresponding number or when it is pronounced differently from the usual case.
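The numeric conversion described here can be sketched as follows. This is a minimal illustration, not the patent's definition file: the table `AIUEO_CODES`, its code values, and the function name `encode` are assumptions made for the example, and only ten kana are covered.

```python
# Hypothetical definition table: each kana maps to a number that encodes
# its position in the chosen ordering (a small aiueo subset only).
AIUEO_CODES = {
    "あ": 1, "い": 2, "う": 3, "え": 4, "お": 5,
    "か": 6, "き": 7, "く": 8, "け": 9, "こ": 10,
}


def encode(word, table):
    """Return the word as a list of numbers, or None if a character is undefined."""
    codes = []
    for ch in word:
        if ch not in table:
            return None  # no corresponding number: this word needs manual learning
        codes.append(table[ch])
    return codes


words = ["かき", "あい", "いか", "あお"]
encoded = {w: encode(w, AIUEO_CODES) for w in words}

# Once every word is a list of numbers, any ordinary sort can compare them.
ordered = sorted(words, key=lambda w: encoded[w])
print(ordered)
```

Swapping in a different table (for example one that follows iroha order) would change the result without touching the sorting step, which is the point of keeping the ordering in a definition file.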

[0008]

[Embodiment] To sort the set of words in Table 1 (personal names, as an example), the sorting must be done by human effort unless a hiragana transcription is recorded alongside each entry.

[Table 1]

[0009] However, the reading of a Japanese word containing kanji falls into one of the following three categories.

[0010]
(1) "大阪" is never read as "oohan" (おおはん) or "daisaka" (だいさか); it is read only as "oosaka" (おおさか). This is a case in which the reading is fixed (definite reading).
(2) "紀子" may be read as "kiko" (きこ), "noriko" (のりこ), or "masako" (まさこ). This is a case in which one of several alternative readings is commonly used (selective reading).
(3) "征則" is read as "masanori" (まさのり). This is a difficult case in which the reading cannot be predicted (unpredictable or obscure reading).

[0011] A definite reading can be converted from kanji to hiragana mechanically. Only selective readings and unpredictable readings are treated as unconvertible and converted by hand. Dictionaries that pair kanji with hiragana are defined for the respective categories, as in Table 2 and Table 3.

[Table 2]

[Table 3]
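The dictionary lookup can be sketched as below. The contents of Table 2 and Table 3 are not reproduced in the text, so the entries here are taken from the examples in paragraph [0010]; the names `DEFINITE`, `SELECTIVE`, and `lookup` are hypothetical.

```python
# Hypothetical stand-ins for the kanji-to-hiragana dictionaries.
DEFINITE = {"大阪": "おおさか"}                       # exactly one reading
SELECTIVE = {"紀子": ["きこ", "のりこ", "まさこ"]}      # several candidate readings


def lookup(word):
    """Return (reading, candidates).

    reading is a string when the conversion is mechanical, and None when
    a person has to decide. candidates lists any known alternatives that
    can be offered to that person.
    """
    if word in DEFINITE:
        return DEFINITE[word], []
    # Selective or unknown words cannot be resolved mechanically.
    return None, SELECTIVE.get(word, [])


for w in ["大阪", "紀子", "征則"]:
    print(w, lookup(w))
```

Only the second and third words would be passed to a person, and for the second one the candidate list narrows the choice.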

[0012] Next, the family names are converted from kanji to codes according to the definitions in Table 4, which embody the sort condition. (Decimal numbers are used for the codes for ease of understanding.) Two conversion methods (I and II) can be used selectively, depending on whether the dictionary words are used frequently.

[Table 4]

[0013]
I. When the words in the dictionary are used repeatedly and frequently: the dictionary is encoded in advance, and the elements of the set to be converted are encoded directly without first being turned into kana.
II. When the words in the dictionary are not used repeatedly and frequently: the dictionary is used to turn the elements of the set to be sorted into kana first, and the kana are then encoded.
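The two methods can be contrasted in a short sketch. The code values in `KANA_CODES` and the entries in `READINGS` are made up for illustration, and the function names are hypothetical; the actual Table 4 values are not given in the text.

```python
# Hypothetical subset of a kana-to-code definition (decimal codes).
KANA_CODES = {"お": 5, "か": 6, "さ": 11, "な": 21, "ら": 39}
# Hypothetical kanji-to-kana dictionary.
READINGS = {"大阪": "おおさか", "奈良": "なら"}


def encode_kana(kana):
    """Turn a kana string into a list of decimal codes."""
    return [KANA_CODES[ch] for ch in kana]


# Method I: encode the whole dictionary once, then look codes up directly.
PRE_ENCODED = {kanji: encode_kana(kana) for kanji, kana in READINGS.items()}


def method_one(word):
    return PRE_ENCODED[word]


# Method II: turn the word into kana at sort time, then encode the kana.
def method_two(word):
    return encode_kana(READINGS[word])


print(method_one("大阪"), method_two("大阪"))
```

Both paths give the same code. Method I trades up-front work and storage for faster repeated lookups, which is why it suits dictionary words that recur often.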

[0014] Whichever method is used, the result is as shown in Table 5.

[Table 5]

[0015] Then, to allow a comparison that includes spaces, the ends of the codes are aligned to the length of the longest code, giving Table 6.

[Table 6]
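One way to read this alignment step is as right-padding every code to a common length. The sketch below assumes each character has become a fixed-width two-digit decimal code and that "0" is the fill digit standing for a blank; both are assumptions, as is the name `pad_codes`.

```python
def pad_codes(codes, fill="0"):
    """Right-pad every code string to the length of the longest one."""
    if not codes:
        return []
    longest = max(len(c) for c in codes)
    return [c.ljust(longest, fill) for c in codes]


# Three hypothetical family-name codes of different lengths.
codes = ["0505", "050511", "21"]
padded = pad_codes(codes)
print(padded)
```

After padding, every code occupies the same number of digits, so the codes can be stored as fixed-length fields and compared position by position.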

[0016] When the given names are included, the result is as shown in Table 7.

[Table 7]

[0017] Again, to allow a comparison that includes spaces, the ends of the codes are aligned to the length of the longest code, giving Table 8.

[Table 8]

[0018] Sorting on the resulting codes gives Table 9.

[Table 9]
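The steps from encoding to the final sort can be put together in one sketch. It assumes two-digit decimal codes, "0" as the fill digit, and names that are already in kana; the table `KANA_CODES`, the sample names, and the function names are hypothetical.

```python
# Hypothetical two-digit codes for a small kana subset.
KANA_CODES = {
    "あ": "01", "い": "02", "う": "03", "え": "04", "お": "05",
    "か": "06", "き": "07", "く": "08", "け": "09", "こ": "10",
}


def encode(kana):
    """Concatenate the two-digit code of every character."""
    return "".join(KANA_CODES[ch] for ch in kana)


def sort_records(records):
    """Sort (family, given) kana pairs by padded family code, then given code."""
    if not records:
        return []
    fam_width = max(len(encode(f)) for f, _ in records)
    giv_width = max(len(encode(g)) for _, g in records)

    def key(rec):
        family, given = rec
        # Pad each part separately so a short family name cannot borrow
        # digits from the given name during comparison.
        return encode(family).ljust(fam_width, "0") + encode(given).ljust(giv_width, "0")

    return sorted(records, key=key)


records = [("あおき", "けい"), ("あお", "きこ"), ("あい", "かこ")]
print(sort_records(records))
```

Without the padding, the concatenated code for ("あお", "きこ") would be "01050710", which compares after "0105070902" for ("あおき", "けい"), so the shorter family name would be placed last. Padding the family part to "010500" restores the expected order.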

[0019] In this way, words containing kanji, at the least, can be sorted. Whenever a kana reading needs to be registered for a word, it is added to the dictionary at that time.
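Registering a reading on demand can be sketched as a lookup that falls back to a person only on a miss. The function `reading_for`, the callback `ask_operator`, and the sample data are assumptions for illustration.

```python
def reading_for(word, dictionary, ask):
    """Return the kana reading, calling `ask` (a person) only on a dictionary miss."""
    if word not in dictionary:
        dictionary[word] = ask(word)  # learned once, reused afterwards
    return dictionary[word]


readings = {"大阪": "おおさか"}
asked = []


def ask_operator(word):
    """Stand-in for a person typing the reading; records each request."""
    asked.append(word)
    return {"征則": "まさのり"}[word]


first = reading_for("征則", readings, ask_operator)
second = reading_for("征則", readings, ask_operator)
print(first, second, asked)
```

The second call is answered from the dictionary, so the manual step happens once per unknown word rather than once per occurrence.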

[0020] When an adjustment is needed, for example when a reading is judged to be wrong, either the user changes the dictionary definition as necessary, or the system that provides the encoding changes the dictionary according to the adjustment. In this example the definitions keep "kyo" (きょ) and "gyo" (ぎょ), and "kiyo" (きよ) and "giyo" (ぎよ), completely separate, but this can be changed freely by editing the file used for the numeric conversion (Table 4 in this example). (A configuration in which one code is paired with several characters is also possible.)

[Table 4]
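The effect of editing the definition file can be sketched with two alternative tables. The code values and the names `STRICT` and `LOOSE` are hypothetical; the actual Table 4 values are not given in the text.

```python
# Strict definition: every character has its own code, so きょ, ぎょ,
# きよ and ぎよ all stay distinct.
STRICT = {"き": 7, "ぎ": 57, "よ": 37, "ょ": 87}

# Loose definition: several characters share one code, so voiced and
# small-kana variants collate together.
LOOSE = {"き": 7, "ぎ": 7, "よ": 37, "ょ": 37}


def encode(word, table):
    """Turn a kana string into a list of codes under the given definition."""
    return [table[ch] for ch in word]


print(encode("きょ", STRICT), encode("ぎよ", STRICT))
print(encode("きょ", LOOSE), encode("ぎよ", LOOSE))
```

Only the table changes between the two runs. The encoding and sorting code is untouched, which matches the idea of adjusting the sort by editing the definition alone.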

[0021]

[Effect] Sorting according to specified conditions can be performed regardless of language, with errors and labor kept to a minimum.

Claims

1. A method in which a set of Japanese and foreign-language words, including kanji, katakana, and the like, is converted into the codes needed for sorting under a specified condition, and the set is sorted by the order of those codes.

2. The method of the preceding claim, wherein, during conversion into the codes needed for sorting, if no corresponding code exists or the conversion is judged to have produced a wrong code, the error is corrected and the definition is changed as necessary.