JPH1185802A

JPH1185802A - Computer-readable recording medium recording full-text search data and character string collation device

Info

Publication number: JPH1185802A
Application number: JP10004535A
Authority: JP
Inventors: Osamu Katayama; 修片山; Takamasa Koyama; 隆正小山; Chuichi Kikuchi; 忠一菊池; Tomoko Fujita; 智子藤田; Yasuyo Shirasaki; 安代白崎
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1997-07-11
Filing date: 1998-01-13
Publication date: 1999-03-30
Anticipated expiration: 2018-01-13
Also published as: JP3567711B2

Abstract

(57) [Abstract]

[Problem] In an information retrieval apparatus, special characters may be inserted into the character strings to be searched, for example to mark semantic boundaries. If such a character occurs at a rate of, say, one in every three characters of the input string, its occurrence count becomes enormous, so that the chain memory for that character alone grows abnormally and squeezes the chain memory as a whole. The object is to solve this problem.

[Solution] When a special character appears in a character string, character string conversion means 301 and 304 convert it, according to the adjacent character, into a character that is not a search target, and output the result to two-character chain detectors 302 and 305. Two-character chain detection means 302 extracts the two-character chains from the string in which the special characters have been converted into non-searchable characters, counts the occurrences for each chain, and stores them in two-character chain memory 3030. The comparator uses the two-character chain memory to check for the presence of the character chains of the search target string whose special characters have been converted according to the adjacent characters.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention is used for full-text search, which in the field of information retrieval searches for character strings written in documents. It relates to a computer-readable recording medium that records full-text search data for collating an input character string against all the character strings contained in registered documents, and to a character string collation device.

[0002]

[Prior Art] FIG. 36 shows the configuration of a conventional character string collation device, and FIG. 37 shows the conventional collation method and the format of the full-text search data. In FIG. 37(a), 801 is the character string "いろａはに" (i-ro-a-ha-ni) input at registration, 802 is the two-character chain "いろ" registered first, 803 is the next two-character chain after 802, "ろａ", 804 is the next two-character chain after 803, "ａは", and 805 is the next two-character chain after 804, "はに". Here "ａ" stands for a special character inserted into the character string, for example to mark a semantic boundary, such as the space that appears frequently in Korean (Hangul) text.

[0003] In FIG. 37(c), 811 is the search string "いろａはに" used at search time, 812 is the two-character chain "いろ" searched first, 813 is the next two-character chain after 812, "ろａ", 814 is the next two-character chain after 813, "ａは", and 815 is the next two-character chain after 814, "はに".

[0004] In FIG. 37(b), two-character chain 802 stores the occurrence counts n1 and n2 of "い" and "ろ", two-character chain 803 stores the occurrence counts n2 and n3 of "ろ" and "ａ", two-character chain 804 stores the occurrence counts n3 and n4 of "ａ" and "は", and two-character chain 805 stores the occurrence counts n4 and n5 of "は" and "に".

[0005] FIG. 37(d) shows the recording format used when the input character string contains more than one "いろ" chain. It shows that the "いろ" chain consists of the n1-th occurrence of "い" and the n2-th occurrence of "ろ", the na-th occurrence of "い" and the nb-th occurrence of "ろ", ..., and the nx-th occurrence of "い" and the ny-th occurrence of "ろ".

[0006] In the conventional collation method, the two-character chain 802 corresponding to "いろ" of two-character chain 812 is detected, and then the two-character chain 803 corresponding to "ろａ" of the next two-character chain 813 is detected. It is then judged whether the occurrence count n2 of "ろ" obtained from 802 matches the occurrence count n2 of "ろ" obtained from 803.

[0007] If they match, the two-character chain 804 corresponding to "ａは", the two-character chain following 813, is detected, and it is judged whether the occurrence count n3 of "ａ" detected in 803 matches the occurrence count of "ａ" in 804.

[0008] If they match, the two-character chain 805 corresponding to "はに", the two-character chain following 814, is detected, and it is judged whether the occurrence count n4 of "は" detected in 804 matches the occurrence count of "は" in 805. If they match, character string 811 is judged to match 801. The character strings are collated in this way.
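The collation procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of the patent: the names `build_index` and `contains` are hypothetical, the "occurrence count" is taken to be the running count of each character (the n-th time that character appears), and a single registered string stands in for the registered documents.

```python
from collections import defaultdict


def build_index(text):
    """Map each two-character chain to its (first, second) occurrence-count pairs."""
    seen = defaultdict(int)          # running occurrence count per character
    counts = []
    for ch in text:
        seen[ch] += 1
        counts.append(seen[ch])      # this is the n-th appearance of ch
    index = defaultdict(list)
    for i in range(len(text) - 1):
        index[text[i:i + 2]].append((counts[i], counts[i + 1]))
    return index


def contains(index, query):
    """Return True if query (two or more characters) occurs in the indexed text."""
    if len(query) < 2:
        raise ValueError("query must contain at least one two-character chain")
    # Occurrence counts of the trailing character of the chains matched so far.
    candidates = {second for _, second in index.get(query[:2], [])}
    for i in range(1, len(query) - 1):
        pairs = index.get(query[i:i + 2], [])
        # Keep a chain only if its leading character is the same occurrence
        # as the trailing character of the previously matched chain.
        candidates = {second for first, second in pairs if first in candidates}
        if not candidates:
            return False
    return bool(candidates)
```

In this sketch the special character is indexed like any other character, so every chain that touches it adds an entry to the index.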

[0009] FIG. 36 shows the configuration of a conventional character string collation device that uses the search data shown in FIG. 37.

[0010] In FIG. 36, 701 is a two-character chain detector that detects the two-character chains 802, 803, 804, and 805 to be registered from the character string 801 to be registered. 702 is a two-character chain memory that stores the two-character chains 802, 803, 804, and 805 together with the occurrence counts of their characters. 703 is a two-character chain detector that detects the two-character chains 812, 813, 814, and 815 to be searched from the search string 811. 704 is a comparator that looks up the two-character chains 812, 813, 814, and 815 detected by the two-character chain detector 703 in the two-character chain memory 702 and judges whether the occurrence count of the leading character of the detected two-character chain matches the occurrence count of the trailing character of the two-character chain detected immediately before it. 705 is a control unit that has the comparator 704 make this judgment for every two-character chain detected by the two-character chain detector 703 and thereby decides whether the character strings match.

[0011]

[Problem to Be Solved by the Invention] With the configuration described above, however, a problem arises when special characters (such as spaces in Korean text) are inserted into the character string input at registration, for example to mark semantic boundaries. If such a character occurs at a rate of, say, one in every three characters of the input string, its occurrence count becomes enormous, and the chain memory for that character alone grows abnormally and squeezes the chain memory. In addition, for the same character chain, the process of extracting the chain by matching occurrence counts is repeated many times, which takes time.

[0012] The present invention solves these problems of the prior art. It creates the chain as a three-character chain with a specific special character, such as one used as a semantic delimiter, in the middle; or it changes the special character into a character uniquely determined by a character adjoining it; or it changes the character before the special character into two characters, namely that character followed by a character uniquely determined by it, and changes the character after the special character into two characters, namely a character uniquely determined by that following character followed by the following character itself. This reduces the occurrence count of the special character and avoids growth of the chain memory for special characters. At the same time, the invention aims to make chain extraction by occurrence-count matching efficient.

[0013]

[Means for Solving the Problem] First, the present invention solves the above problem as follows. The full-text search data consists of first data and second data, recorded so that the two can be told apart. The first data is obtained by detecting, in the search target string, every two-character chain consisting of characters other than a pre-designated special character and recording, for each two-character chain, the occurrence counts in the search target string of its first and second characters as a pair. The second data is obtained by detecting every character chain consisting of two non-special characters between which the pre-designated special character is inserted and recording, for each such chain, the occurrence counts in the search target string of its first and second characters as a pair. From the search string, every two-character chain consisting of non-special characters and every chain of two non-special characters with the special character inserted between them are detected. Each chain is looked up in the first data and the second data, and whether the chains occur as the search string is determined by comparing the occurrence counts that correspond to the detected chains.
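The first means can be illustrated with a short Python sketch. It is only an interpretation under stated assumptions: the names `chains`, `build`, and `contains` and the labels "plain" (first data) and "bridged" (second data) are hypothetical, special characters are assumed to appear singly between two ordinary characters, and queries are assumed to begin and end with an ordinary character (other query shapes simply return False here).

```python
from collections import defaultdict


def chains(text, special):
    """Yield (kind, key, start, end) for each chain; indices refer to text."""
    for i in range(len(text) - 1):
        a, b = text[i], text[i + 1]
        if a == special:
            continue
        if b != special:
            yield "plain", a + b, i, i + 1
        elif i + 2 < len(text) and text[i + 2] != special:
            # Two ordinary characters with one special character between them.
            yield "bridged", a + text[i + 2], i, i + 2


def build(text, special):
    """Record occurrence-count pairs, keeping plain and bridged chains apart."""
    seen, counts = defaultdict(int), []
    for ch in text:
        seen[ch] += 1
        counts.append(seen[ch])
    data = {"plain": defaultdict(list), "bridged": defaultdict(list)}
    for kind, key, i, j in chains(text, special):
        data[kind][key].append((counts[i], counts[j]))
    return data


def contains(data, query, special):
    """Return True if query occurs in the indexed text."""
    candidates, end = None, 0
    for kind, key, i, j in chains(query, special):
        if i != end:
            return False             # query shape outside this sketch's assumptions
        pairs = data[kind].get(key, [])
        if candidates is None:
            candidates = {second for _, second in pairs}
        else:
            candidates = {second for first, second in pairs if first in candidates}
        if not candidates:
            return False
        end = j
    return bool(candidates) and end == len(query) - 1
```

Because the special character never appears as a key or as a counted member of a pair, its own occurrence count does not enter the stored data at all in this sketch.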

[0014] Second, the problem is solved as follows. To produce the full-text search data, the pre-designated special character in the search target string is converted, according to an adjacent character, into a character that is not a search target. Every two-character chain in the converted string, including chains that contain the non-searchable characters, is detected, and for each two-character chain the occurrence counts in the search target string of its first and second characters are recorded as a pair. The pre-designated special character in the search string is converted into a non-searchable character based on the adjacent character, following the same rule that was applied to the data recorded on the recording medium. Every two-character chain in the converted search string, including chains that contain non-searchable characters, is detected. The detected two-character chains are looked up on the recording medium, and whether the chains occur as the search string is determined by comparing the corresponding occurrence counts.
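The conversion approach of the second means might look like the following Python sketch. Several choices here are assumptions made for illustration and are not taken from the patent: the "adjacent character" is taken to be the preceding character, the substitute characters are drawn from Unicode's Supplementary Private Use Area-A, queries are assumed not to begin with the special character, and the names `convert`, `build_index`, `contains`, and `search` are hypothetical.

```python
from collections import defaultdict

# Assumption: code points in Supplementary Private Use Area-A never occur in
# real text, so they can serve as the "non-searchable" substitute characters.
PUA_BASE = 0xF0000


def convert(text, special):
    """Replace each special character by a substitute chosen from the preceding character."""
    out = []
    for i, ch in enumerate(text):
        if ch == special and i > 0 and ord(text[i - 1]) <= 0xFFFD:
            out.append(chr(PUA_BASE + ord(text[i - 1])))
        else:
            out.append(ch)           # ordinary character, or special with no usable neighbour
    return "".join(out)


def build_index(converted):
    """Ordinary two-character chain index over the converted string."""
    seen, counts = defaultdict(int), []
    for ch in converted:
        seen[ch] += 1
        counts.append(seen[ch])
    index = defaultdict(list)
    for i in range(len(converted) - 1):
        index[converted[i:i + 2]].append((counts[i], counts[i + 1]))
    return index


def contains(index, query):
    """Chain-by-chain occurrence-count comparison on an already converted query."""
    candidates = {second for _, second in index.get(query[:2], [])}
    for i in range(1, len(query) - 1):
        pairs = index.get(query[i:i + 2], [])
        candidates = {second for first, second in pairs if first in candidates}
    return bool(candidates)


def search(index, query, special):
    """Apply the same conversion rule to the query, then collate as usual."""
    return contains(index, convert(query, special))
```

The effect is that one very frequent character is spread over many distinct substitutes, each with a much smaller occurrence count, while the indexing and comparison logic stays unchanged.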

[0015] Third, the problem is solved as follows. To produce the full-text search data, the pre-designated special character in the search target string is converted, according to the adjacent characters, into two characters that are not search targets. Every two-character chain in the converted string, including chains that contain the non-searchable characters, is detected, and for each two-character chain the occurrence counts in the search target string of its first and second characters are recorded as a pair. The pre-designated special character in the search string is converted into two non-searchable characters based on the adjacent characters, following the same rule that was applied to the data recorded on the recording medium. Every two-character chain in the converted search string, including chains that contain the two non-searchable characters, is detected. The detected two-character chains are looked up on the recording medium, and whether the chains occur as the search string is determined by comparing the corresponding occurrence counts.

[0016] Fourth, the problem is solved by recording and accessing, as the full-text search data, third data and fourth data. The third data is obtained by detecting the two-character chains over all characters of the search target string and recording as a pair, for the first and second characters of each two-character chain, the occurrence count when the character is not the pre-designated special character and a fixed numerical value when it is the pre-designated special character. The fourth data is obtained by detecting, in the search target string, every three-character chain consisting of three characters with the pre-designated special character in the middle and recording, for each three-character chain, the occurrence counts of its first and third characters as a pair.

[0017] Fifth, the problem is solved by recording and accessing, as the full-text search data, fifth data and sixth data. The fifth data is obtained by detecting, in the search target string, every two-character chain consisting of characters other than the pre-designated special character and recording, for each two-character chain, the occurrence counts in the search target string of its first and second characters as a pair. The sixth data is obtained by detecting every three-character chain consisting of three characters with the pre-designated special character in the middle and recording, for each three-character chain, a pair consisting of the occurrence count of the first character and the value 0, and a pair consisting of the value 0 and the occurrence count of the third character. Sixth, the problem is solved by recording and accessing, as the full-text search data, seventh data and eighth data. The seventh data is obtained by detecting, in the search target string, every two-character chain consisting of characters other than the pre-designated special character and recording, for each two-character chain, the occurrence counts in the search target string of its first and second characters as a pair. The eighth data is obtained by detecting every three-character chain consisting of three characters with the pre-designated special character in the middle; for each three-character chain, converting the special character that is its second character into the same character as the third character and setting the occurrence count of the second character to the same value as that of the third character; then generating two two-character chains, one from the first and second characters and one from the second and third characters; and recording, for each of these two-character chains, the occurrence counts in the search target string of its first and second characters as a pair.

[0018] Seventh, the problem is solved by search data for full-text search organized as follows. Every two-character chain in the search target string is detected. When a two-character chain consists only of characters other than the pre-designated special character, ninth data records the pair of occurrence counts of its first and second characters. When a two-character chain contains the pre-designated special character, tenth data stores, as a pair, a value for the special character (the first or second character of the chain) together with the occurrence count of the character that is not a special character. The value for the special character is the remainder obtained by dividing its occurrence count by a pre-designated maximum occurrence count (or the maximum value when the remainder is 0), or the maximum value together with the remainder, or, when the occurrence count in the first round is at or below the maximum value, a value at or below the maximum in the second and later rounds chosen so that it is unique in combination with the first-round value and the order. When the first character is the special character, the pairs of the tenth data are sorted by the type of the second character. The ninth data and the tenth data are stored and accessed so that the two can be told apart.

[0019] Eighth, the problem is solved by storing and accessing search data for full-text search organized as follows. Every two-character chain in the search target string is detected, and for each two-character chain, character chain data is formed from a document number and, for each character type of the two-character chain, an occurrence count or an arbitrary value. When the character chain data does not contain the pre-designated special character, the fields that store the occurrence count of the first character and the occurrence count of the second character have the same size. When it does contain the pre-designated special character, the field that stores the occurrence count for the special character is larger than the field that stores the arbitrary value for the character that is not a special character. When the first character of the character chain data is the pre-designated special character, the second character stores a designated value, and the first character of the next consecutive character chain data is made equal to the value designated for the second character of the preceding character chain data.

[0020] Ninth, the problem is solved by recording and accessing, as the full-text search data, eleventh data and twelfth data. The eleventh data is obtained by detecting, in the search target string, every two-character chain that contains no special character and recording as a pair the occurrence counts of the first and second characters, which are characters other than the pre-designated special character. The twelfth data concerns two-character chains that straddle the pre-designated special character in the search target string. It records as a pair the occurrence count of the character type of the first character of the two-character chain before the special character and the occurrence count of the character type of the first character of the two-character chain after the special character. Alternatively, the twelfth data records as a pair the occurrence count of the character type of the first character of the two-character chain before the special character and the occurrence count of the character immediately after the special character.

[0021] Tenth, the problem is solved as follows. The full-text search data consists of thirteenth data and fourteenth data, recorded so that the two can be told apart. The thirteenth data is obtained by detecting, in the search target string, every two-character chain consisting of characters other than the pre-designated special character and recording, for each two-character chain, the appearance position in the search target string of its first or second character as the appearance position of the two-character chain. The fourteenth data is obtained by detecting every character chain into which the pre-designated special character is inserted and recording, for each such chain, the appearance position in the search target string of its first character as the appearance position of the chain. From the search string, every two-character chain consisting of characters other than the pre-designated special character and every character chain into which the pre-designated special character is inserted are detected. Each chain is looked up in the thirteenth data and the fourteenth data, and whether the chains occur as the search string is determined by comparing the appearance positions that correspond to the detected chains.
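A position-based variant along the lines of the tenth means can be sketched as follows. As before, this is an illustration under assumptions rather than the patented method: positions are zero-based indices into the raw string, the position of the first character is used as the position of each chain, the comparison is done by intersecting candidate start positions, and the names `chains`, `build_positions`, and `find` and the labels "plain" and "bridged" (standing in for the thirteenth and fourteenth data) are hypothetical.

```python
from collections import defaultdict


def chains(text, special):
    """Yield (kind, key, start, end) for each chain; indices refer to text."""
    for i in range(len(text) - 1):
        a, b = text[i], text[i + 1]
        if a == special:
            continue
        if b != special:
            yield "plain", a + b, i, i + 1
        elif i + 2 < len(text) and text[i + 2] != special:
            # Two ordinary characters with one special character between them.
            yield "bridged", a + text[i + 2], i, i + 2


def build_positions(text, special):
    """Record, per chain, the position of its first character."""
    data = {"plain": defaultdict(list), "bridged": defaultdict(list)}
    for kind, key, i, _ in chains(text, special):
        data[kind][key].append(i)
    return data


def find(data, query, special):
    """Return the sorted start positions at which query occurs in the indexed text."""
    starts, end = None, 0
    for kind, key, i, j in chains(query, special):
        if i != end:
            return []                # query shape outside this sketch's assumptions
        # A chain at query offset i and text position p implies a match start of p - i.
        hits = {p - i for p in data[kind].get(key, [])}
        starts = hits if starts is None else starts & hits
        if not starts:
            return []
        end = j
    if starts is None or end != len(query) - 1:
        return []
    return sorted(starts)
```

Compared with occurrence counts, a single position per chain is enough to check adjacency, since consecutive chains of a match must agree on the implied start position.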

[0022] Eleventh, the problem is solved as follows. To produce the full-text search data, the pre-designated special character in the search target string is converted, according to an adjacent character, into a character that is not a search target. Every two-character chain in the converted string, including chains that contain the non-searchable characters, is detected, and for each two-character chain the appearance position in the search target string of its first or second character is recorded as the appearance position of the two-character chain. The pre-designated special character in the search string is converted into a non-searchable character based on the adjacent character, following the same rule that was applied to the data recorded on the recording medium. Every two-character chain in the converted search string, including chains that contain non-searchable characters, is detected. The detected two-character chains are looked up on the recording medium, and whether the chains occur as the search string is determined by comparing the corresponding appearance positions.

[0023] Twelfth, the problem is solved as follows. To produce the full-text search data, the pre-designated special character in the search target string is converted, according to the adjacent characters before and after it, into two characters that are not search targets. Every two-character chain in the converted string, including chains that contain the non-searchable characters, is detected, and for each two-character chain the appearance position in the search target string of its first or second character is recorded as the appearance position of the two-character chain. The pre-designated special character in the search string is converted into two non-searchable characters based on the adjacent characters, following the same rule that was applied to the data recorded on the recording medium. Every two-character chain in the converted search string, including chains that contain the two non-searchable characters, is detected. The detected two-character chains are looked up on the recording medium, and whether the chains occur as the search string is determined by comparing the corresponding appearance positions. Thirteenth, the problem is solved by recording and accessing the following full-text search data. Every two-character chain in the search target string consisting of characters other than the pre-designated special character is detected, and for each such two-character chain the appearance position in the search target string of its first or second character is recorded as the appearance position of the two-character chain. In addition, every three-character chain consisting of three characters with the pre-designated special character in the middle is detected. For each three-character chain, the special character that is its second character is converted into the same character as the third character, the appearance position of the second character is set to the same value as that of the third character, and two two-character chains are generated, one from the first and second characters and one from the second and third characters. Fifteenth data consists of the data described above together with data recording the appearance position of the first or second character of the two-character chain formed from the first and second characters of the three-character chain. Sixteenth data records the appearance position of the first or second character of the two-character chain formed from the second and third characters of the three-character chain.

[0024] Fourteenth, the problem is solved by recording and accessing full-text search data on a computer-readable storage medium that records search data used for full-text search, organized as follows. Two-character chains and character positions are detected in the search target string, and for each two-character chain, two-character chain information is formed as a set of the document number of the document made up of the search target string, the two-character chain, and the character position. Character positions in the two-character chain information are numbered in ascending order from the head of the search target string, skipping the positions of the pre-designated special character, and the character chain information is stored per first character of the character chain. Seventeenth data records as a set, for a two-character chain that contains no special character, the pair of its first and second characters, the character position of the first character counted with special characters excluded, and the document number. Eighteenth data consists of a set of the two-character chain that combines the character immediately before the special character with the special character, an arbitrary fixed value defined by the character type of the special character, and the document number. Nineteenth data consists of the two-character chain that combines the special character with the character immediately after it, the character position of the second character counted with special characters excluded, and the document number. Twentieth data consists of the two-character chain that combines the characters immediately before and immediately after the special character, the character position of the first character counted with special characters excluded, and the document number. The seventeenth, eighteenth, nineteenth, and twentieth data are sorted by the first character of their two-character chains and stored as character chain information. For the seventeenth, eighteenth, and nineteenth data, when two character chains have the same first character and the second character is the special character, the eighteenth data is stored immediately after the seventeenth data.

【００２５】第１５に、全文検索に用いる検索データを
記録したコンピュータ読み取り可能な記憶媒体であっ
て、前記検索データは検索対象文字列に対し、２文字連
鎖と文字位置を検出し、２文字連鎖毎に検索対象文字列
から構成される文書番号と２文字連鎖と文字位置の組を
２文字連鎖情報として構成し、文字連鎖の第１文字毎に
文字連鎖情報を格納している記憶媒体し、２文字連鎖情
報の文字位置は、検索対象文字列の先頭を基準として予
め指定された特殊文字の位置は除外して昇順または降順
に番号付けをし、特殊文字を含まない文字種の場合には
第１文字と第２文字の組と、第１文字の文字位置と、文
書番号を組として記録した第２１のデータと、特殊文字
を含む文字連鎖情報は、特殊文字の直前の文字に対して
は、特殊文字の直前の文字と特殊文字の直後の文字を組
み合わせた文字連鎖、特殊文字の直前の文字位置および
文書番号の組から構成され、また該文字連鎖情報の文字
連鎖の第１文字と第２文字が、特殊文字を含まない場合
の文字連鎖情報の文字連鎖の第１文字または第２文字が
一致する場合には特殊文字を含まない文字連鎖情報の後
または前に別個に記録されるように構成される第２２の
データと、特殊文字を含む文字連鎖情報は、特殊文字の
直後の文字に対しては、特殊文字の直後の文字とその文
字に続く文字を組合わせた文字連鎖、特殊文字の直後の
文字位置および文書番号から構成され、また該文字連鎖
情報の文字連鎖の第１文字が、特殊文字を含まない場合
の２文字連鎖の第１文字と一致する場合には特殊文字を
含まない文字連鎖情報の後または前に別個に記録される
ように構成される第２３のデータと、特殊文字を含む文
字連鎖情報は、特殊文字の２個前の文字と特殊文字の直
後の文字とを組み合わせた文字連鎖、特殊文字の２個前
の文字位置および文書番号から構成される第２４のデー
タと、第２１データ、第２２データ、第２３データ、第
２４データを区別して記憶されていることを特徴とす
る、全文検索データを記録、アクセスすることにより、
上記課題を解決している。
Fifteenth, a computer-readable storage medium records search data used for full-text search. The search data is obtained by detecting two-character chains and character positions in a search target character string and forming, for each two-character chain, a set of the document number of the document made up of the search target character string, the two-character chain, and the character position as two-character chain information; the character chain information is stored for each first character of the chain. Character positions in the two-character chain information are numbered in ascending or descending order from the head of the search target character string, with the positions of pre-designated special characters excluded. The twenty-first data covers chains that contain no special character and records, as a set, the pair of first and second characters, the character position of the first character, and the document number. The twenty-second data is the character chain information for the character immediately before a special character: it consists of the character chain combining the character immediately before the special character with the character immediately after it, the character position immediately before the special character, and the document number, and when the first or second character of its chain matches the first or second character of a chain that contains no special character, it is recorded separately, after or before the chain information that contains no special character. The twenty-third data is the character chain information for the character immediately after a special character: it consists of the character chain combining the character immediately after the special character with the character that follows it, the character position immediately after the special character, and the document number, and when the first character of its chain matches the first character of a two-character chain that contains no special character, it is recorded separately, after or before the chain information that contains no special character. The twenty-fourth data consists of the character chain combining the character two positions before a special character with the character immediately after the special character, the character position two positions before the special character, and the document number. The twenty-first, twenty-second, twenty-third, and twenty-fourth data are stored so that they can be distinguished from one another. Recording and accessing the full-text search data in this way solves the above problem.

【００２６】また、本発明の文字列照合装置は、第１
に、上記課題解決するための第１の手段による全文検索
に用いる検索データを記録したコンピュータ読み取り可
能な記録媒体と、検索文字列から、予め指定された特殊
文字以外の文字からなる全ての２文字連鎖を検出する第
１の文字連鎖検出手段と、検索文字列から、予め指定さ
れた特殊文字が挿入された特殊文字以外の２文字からな
る全ての文字連鎖を検出する第２の文字連鎖検出手段
と、第１の文字連鎖検出手段により検出された２文字連
鎖を、前記記録媒体に記録された第１のデータから検索
し、第２の文字連鎖検出手段により検出された文字連鎖
を検出し、検出された文字連鎖に対応する出現回数の比
較により、検索文字列としての文字連鎖の連続の有無を
判定する比較手段とを備えたことを特徴とする。
The character string collation device of the present invention is, first, characterized by comprising: a computer-readable recording medium recording the search data used for full-text search according to the first means for solving the above problem; first character chain detection means for detecting, from a search character string, all two-character chains consisting of characters other than pre-designated special characters; second character chain detection means for detecting, from the search character string, all character chains consisting of two non-special characters between which a pre-designated special character has been inserted; and comparison means for searching the first data recorded on the recording medium for the two-character chains detected by the first character chain detection means, detecting the character chains detected by the second character chain detection means, and determining, by comparing the occurrence counts corresponding to the detected chains, whether the chains are contiguous as the search character string.

【００２７】第２に、第２の手段による全文検索に用い
る検索データを記録したコンピュータ読み取り可能な記
録媒体と、検索文字列の予め指定された特殊文字を前記
記録媒体に記録されたデータに対し適用された同一の規
則に従い、隣接する文字に基づき検索の対象とならない
文字に変換する文字列変換手段と、前記文字列変換手段
により変換された文字列に対し、検索の対象とならない
文字も含め全ての２文字連鎖を検出する２文字連鎖検出
手段と、前記２文字連鎖検出手段により検出された２文
字連鎖を、前記記録媒体から検出し、対応する出現回数
の比較により、検索文字列としての文字連鎖の連続の有
無を判定する比較手段とを備えたことを特徴とする。
Second, the device is characterized by comprising: a computer-readable recording medium recording the search data used for full-text search according to the second means; character string conversion means for converting each pre-designated special character in the search character string, based on its adjacent character, into a character that is not subject to search, following the same rule that was applied to the data recorded on the recording medium; two-character chain detection means for detecting, in the character string converted by the character string conversion means, all two-character chains, including those containing characters that are not subject to search; and comparison means for locating the two-character chains detected by the two-character chain detection means on the recording medium and determining, by comparing the corresponding occurrence counts, whether the chains are contiguous as the search character string.
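The conversion step described above can be illustrated with a minimal sketch. The text only states that a special character is replaced, according to its adjacent character, by a character that is not subject to search, and that registration and search must use the same rule; the concrete rule is not given here. The mapping below (a Unicode Private Use Area code point derived from the preceding ordinary character) and the names `convert_specials`, `two_char_chains`, and `SPECIAL` are therefore purely hypothetical.

```python
# Hypothetical conversion rule: map each special character to a Private Use
# Area code point chosen from the preceding ordinary character. This is an
# assumption made for illustration, not the rule used in the patent.
PUA_START = 0xE000   # first code point of the Unicode Private Use Area
PUA_SIZE = 0x1900    # U+E000..U+F8FF holds 6400 code points

SPECIAL = frozenset("ａ")  # assumed special-character set (full-width "a")


def convert_specials(text, special=SPECIAL):
    """Replace each special character by a non-searchable stand-in.

    The stand-in depends on the nearest preceding ordinary character, so
    chains involving the special character are spread over many keys
    instead of accumulating under a single character.
    """
    out = []
    prev = None
    for ch in text:
        if ch in special:
            key = ord(prev) if prev is not None else 0
            out.append(chr(PUA_START + key % PUA_SIZE))
        else:
            out.append(ch)
            prev = ch
    return "".join(out)


def two_char_chains(text):
    """Return every two-character chain of the (already converted) text."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


converted = convert_specials("いろａはに")
chains = two_char_chains(converted)  # four chains, two of them containing the stand-in
```

Because the same function would be applied both when the data is registered and when a query arrives, a query containing the special character produces the same stand-in and therefore the same chains as the registered text.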

【００２８】第３に、第３手段による全文検索に用いる
検索データを記録したコンピュータ読み取り可能な記録
媒体と、検索文字列の予め指定された特殊文字を前記記
録媒体に記録されたデータに対し適用された同一の規則
に従い、隣接する文字に基づき検索の対象とならない２
文字に変換する文字列変換手段と、前記文字列変換手段
により変換された文字列に対し、検索の対象とならない
２文字も含め全ての２文字連鎖を検出する２文字連鎖検
出手段と、前記２文字連鎖検出手段により検出された２
文字連鎖を、前記記録媒体から検出し、対応する出現回
数の比較により、検索文字列としての文字連鎖の連続の
有無を判定する比較手段とを備えたことを特徴とする文
字列照合装置。
Third, a character string collation device characterized by comprising: a computer-readable recording medium recording the search data used for full-text search according to the third means; character string conversion means for converting each pre-designated special character in the search character string, based on its adjacent characters, into two characters that are not subject to search, following the same rule that was applied to the data recorded on the recording medium; two-character chain detection means for detecting, in the character string converted by the character string conversion means, all two-character chains, including those containing the two characters that are not subject to search; and comparison means for locating the two-character chains detected by the two-character chain detection means on the recording medium and determining, by comparing the corresponding occurrence counts, whether the chains are contiguous as the search character string.

【００２９】第４に、第４の手段による全文検索に用い
る検索データを記録したコンピュータ読み取り可能な記
録媒体と、検索文字列から、予め指定された特殊文字以
外の文字からなる全ての２文字連鎖を検出する第１の文
字連鎖検出手段と、検索文字列から、予め指定された特
殊文字が挿入された３文字からなる全ての文字連鎖を検
出する第２の文字連鎖検出手段と、第１の文字連鎖検出
手段により検出された２文字連鎖を、前記記録媒体に記
録された第３のデータから検索し、第３の文字連鎖検出
手段により検出された文字連鎖を前記記録媒体に記録さ
れた第４のデータから検索し、検出された文字連鎖に対
応する出現回数の比較により、検索文字列としての文字
連鎖の連続の有無を判定する比較手段とを備えた構成と
なっている。
Fourth, the device comprises: a computer-readable recording medium recording the search data used for full-text search according to the fourth means; first character chain detection means for detecting, from the search character string, all two-character chains consisting of characters other than pre-designated special characters; second character chain detection means for detecting, from the search character string, all character chains consisting of three characters into which a pre-designated special character has been inserted; and comparison means for searching the third data recorded on the recording medium for the two-character chains detected by the first character chain detection means, searching the fourth data recorded on the recording medium for the character chains detected by the third character chain detection means, and determining, by comparing the occurrence counts corresponding to the detected chains, whether the chains are contiguous as the search character string.

【００３０】第５に、第５の手段による全文検索に用い
る検索データを記録したコンピュータ読み取り可能な記
録媒体と、検索文字列から、予め指定された特殊文字以
外の文字からなる全ての２文字連鎖を検出する第１の文
字連鎖検出手段と、検索文字列から、予め指定された特
殊文字が挿入された３文字からなる全ての文字連鎖を検
出する第３の文字連鎖検出手段と、第１の文字連鎖検出
手段により検出された２文字連鎖を、前記記録媒体に記
録された第５のデータから検索し、第３の文字連鎖検出
手段により検出された文字連鎖を前記記録媒体に記録さ
れた第６のデータから検索し、検出された文字連鎖に対
応する出現回数の比較により、検索文字列としての文字
連鎖の連続の有無を判定する比較手段とを備えた構成と
なっている。
Fifth, the device comprises: a computer-readable recording medium recording the search data used for full-text search according to the fifth means; first character chain detection means for detecting, from the search character string, all two-character chains consisting of characters other than pre-designated special characters; third character chain detection means for detecting, from the search character string, all character chains consisting of three characters into which a pre-designated special character has been inserted; and comparison means for searching the fifth data recorded on the recording medium for the two-character chains detected by the first character chain detection means, searching the sixth data recorded on the recording medium for the character chains detected by the third character chain detection means, and determining, by comparing the occurrence counts corresponding to the detected chains, whether the chains are contiguous as the search character string.

【００３１】第６に、第６の手段による全文検索に用い
る検索データを記録したコンピュータ読み取り可能な記
録媒体と、検索文字列から、予め指定された特殊文字以
外の文字からなる全ての２文字連鎖を検出する第１の文
字連鎖検出手段と、検索文字列から、予め指定された特
殊文字が挿入された３文字からなる全ての３文字連鎖を
検出する第４の文字連鎖検出手段と、第１の文字連鎖検
出手段により検出された２文字連鎖を、前記記録媒体に
記録された第７のデータから検索し、第４の文字連鎖検
出手段により検出された特殊文字を変換して生成した２
文字連鎖を、前記記録媒体に記録された第８のデータか
ら検索し、検出された文字連鎖に対応する出現回数の比
較により、検索文字列としての文字連鎖の連続の有無を
判定する比較手段とを備えた構成となっている。
Sixth, the device comprises: a computer-readable recording medium recording the search data used for full-text search according to the sixth means; first character chain detection means for detecting, from the search character string, all two-character chains consisting of characters other than pre-designated special characters; fourth character chain detection means for detecting, from the search character string, all three-character chains consisting of three characters into which a pre-designated special character has been inserted; and comparison means for searching the seventh data recorded on the recording medium for the two-character chains detected by the first character chain detection means, searching the eighth data recorded on the recording medium for the two-character chains generated by converting the special characters detected by the fourth character chain detection means, and determining, by comparing the occurrence counts corresponding to the detected chains, whether the chains are contiguous as the search character string.

【００３２】第７に、第７の手段による全文検索に用い
る検索データを記録したコンピュータ読み取り可能な記
憶媒体と、検索文字列から、特殊文字を含まない全ての
２文字連鎖を検出する第５の文字連鎖検出手段と、検索
文字列から、特殊文字を含む全ての文字連鎖を検出する
第６の文字連鎖検出手段と、検索文字列が、第１の文字
連鎖検出手段で検出された２文字連鎖で構成される場合
には、検出された文字連鎖に対応する出現回数の比較に
より、検索文字列としての文字連鎖の連続の有無を判定
する比較手段と、検索文字列が、第６の文字連鎖検出手
段で検索された２文字連鎖で構成される場合には、検出
した文字連鎖の出現回数および特殊文字の出現回数の重
複した回数の比較により、検索文字列としての文字連鎖
の連続の有無を判定する比較手段とを備えた構成となっ
ている。
Seventh, the device comprises: a computer-readable storage medium recording the search data used for full-text search according to the seventh means; fifth character chain detection means for detecting, from the search character string, all two-character chains that contain no special character; sixth character chain detection means for detecting, from the search character string, all character chains that contain a special character; comparison means for determining, when the search character string consists of two-character chains detected by the first character chain detection means, whether the chains are contiguous as the search character string by comparing the occurrence counts corresponding to the detected chains; and comparison means for determining, when the search character string consists of two-character chains found by the sixth character chain detection means, whether the chains are contiguous as the search character string by comparing the occurrence counts of the detected chains with the overlapping occurrence counts of the special character.

【００３３】第８に、第８の手段による全文検索に用い
る検索データを前記記憶媒体と、検索文字列から、特殊
文字を含まない全ての２文字連鎖を検出する第５の文字
連鎖検出手段と、２文字連鎖が特殊文字を含まない場合
は、第５の文字連鎖検出手段で検出された連続した文字
連鎖に該当する文字連鎖データに対して、検出された文
字連鎖データの第２文字の出現回数と、前記文字連鎖に
続く文字連鎖の文字連鎖データの第１文字の出現回数を
比較することにより、検索文字列としての文字連鎖の連
続の有無を判定する比較手段とを備えた構成となってい
る。
Eighth, the device comprises: the storage medium recording the search data used for full-text search according to the eighth means; fifth character chain detection means for detecting, from the search character string, all two-character chains that contain no special character; and comparison means that, when the two-character chains contain no special character, takes the character chain data corresponding to the consecutive chains detected by the fifth character chain detection means and determines whether the chains are contiguous as the search character string by comparing the occurrence count of the second character in the character chain data of one chain with the occurrence count of the first character in the character chain data of the chain that follows it.

【００３４】第９に、第９の手段による全文検索に用い
る検索データを前記記憶媒体と、検索文字列から、予め
指定された特殊文字以外の文字からなる全ての２文字連
鎖を検出する第１の文字連鎖検索手段と、特殊文字列か
ら、予め指定された特殊文字をまたぐ前後の２文字連鎖
に対して、特殊文字の前の２文字連鎖の第１文字と特殊
文字の後の２文字連鎖の第１文字とを組にした文字連鎖
を検出する第２の文字連鎖検出手段、または特殊文字の
前にある２文字連鎖の第１文字と特殊文字の直後の文字
の文字を組にした文字連鎖を検出する第２の文字連鎖検
出手段と、第１の文字連鎖検出手段により検出された２
文字連鎖を、前記記憶媒体に記録された第１１のデータ
から検索または第１２のデータから検索し、第１１のデ
ータから検索した場合は第７の文字連鎖検出手段により
検出された文字連鎖を検索し、また第２のデータから検
索した場合は第１の文字連鎖検出手段により検出された
文字連鎖を検索し、検出された文字連鎖に対応する出現
回数の比較により、検索文字列としても文字連鎖の連続
の有無を判定する比較手段とを備えた構成となってい
る。
Ninth, the device comprises: the storage medium recording the search data used for full-text search according to the ninth means; first character chain detection means for detecting, from the search character string, all two-character chains consisting of characters other than pre-designated special characters; second character chain detection means for detecting, from the special character string, for the two-character chains on either side of a pre-designated special character, a character chain pairing the first character of the two-character chain before the special character with the first character of the two-character chain after it, or second character chain detection means for detecting a character chain pairing the first character of the two-character chain before the special character with the character immediately after the special character; and comparison means for searching the eleventh data or the twelfth data recorded on the storage medium for the two-character chains detected by the first character chain detection means, searching for the character chains detected by the seventh character chain detection means when the eleventh data was searched and for the character chains detected by the first character chain detection means when the second data was searched, and determining, by comparing the occurrence counts corresponding to the detected chains, whether the chains are contiguous as the search character string.

【００３５】また、第１０に、第１０の手段による全文
検索に用いる検索データを記録したコンピュータ読み取
り可能な記録媒体と、検索文字列から、予め指定された
特殊文字以外の文字からなる全ての２文字連鎖を検出す
る第１の文字連鎖検出手段と、検索文字列から、予め指
定された特殊文字が挿入された３文字からなる全ての文
字連鎖を検出する第２の文字連鎖検出手段と、第１の文
字連鎖検出手段により検出された２文字連鎖を、前記記
録媒体に記録された第１３のデータから検索し、第８の
文字連鎖検出手段により検出された文字連鎖を前記記録
媒体に記録された第１４のデータから検索し、検出され
た文字連鎖に対応する出現位置の比較により、検索文字
列としての文字連鎖の連続の有無を判定する比較手段と
を備えた構成となっている。
Tenth, the device comprises: a computer-readable recording medium recording the search data used for full-text search according to the tenth means; first character chain detection means for detecting, from the search character string, all two-character chains consisting of characters other than pre-designated special characters; second character chain detection means for detecting, from the search character string, all character chains consisting of three characters into which a pre-designated special character has been inserted; and comparison means for searching the thirteenth data recorded on the recording medium for the two-character chains detected by the first character chain detection means, searching the fourteenth data recorded on the recording medium for the character chains detected by the eighth character chain detection means, and determining, by comparing the occurrence positions corresponding to the detected chains, whether the chains are contiguous as the search character string.

【００３６】第１１に、第１１の手段による全文検索に
用いる検索データを記録したコンピュータ読み取り可能
な記録媒体と、検索文字列の予め指定された特殊文字を
前記記録媒体に記録されたデータに対し適用された同一
の規則に従い、隣接する文字に基づき検索の対象となら
ない文字に変換する文字列変換手段と、検索文字列か
ら、検索の対象とならない文字も含め全ての２文字連鎖
を検出する２文字連鎖検出手段と、２文字連鎖検出手段
により検出された２文字連鎖を、前記記録媒体に記録さ
れたデータから検索し、検出された文字連鎖に対応する
出現位置の比較により、検索文字列としての文字連鎖の
連続の有無を判定する比較手段とを備えた構成となって
いる。
Eleventh, the device comprises: a computer-readable recording medium recording the search data used for full-text search according to the eleventh means; character string conversion means for converting each pre-designated special character in the search character string, based on its adjacent character, into a character that is not subject to search, following the same rule that was applied to the data recorded on the recording medium; two-character chain detection means for detecting, from the search character string, all two-character chains, including those containing characters that are not subject to search; and comparison means for searching the data recorded on the recording medium for the two-character chains detected by the two-character chain detection means and determining, by comparing the occurrence positions corresponding to the detected chains, whether the chains are contiguous as the search character string.

【００３７】第１２に、第１２の手段による全文検索に
用いる検索データを記録したコンピュータ読み取り可能
な記録媒体と、検索文字列の予め指定された特殊文字を
前記記録媒体に記録されたデータに対し適用された同一
の規則に従い、隣接する文字に基づき検索の対象となら
ない２文字に変換する文字列変換手段と、検索文字列か
ら、文字列に対し、検索の対象とならない２文字も含め
全ての２文字連鎖を検出する２文字連鎖検出手段と、２
文字連鎖検出手段により検出された２文字連鎖を、前記
記録媒体に記録されたデータから検索し、検出された文
字連鎖に対応する出現位置の比較により、検索文字列と
しての文字連鎖の連続の有無を判定する比較手段とを備
えた構成となっている。
Twelfth, the device comprises: a computer-readable recording medium recording the search data used for full-text search according to the twelfth means; character string conversion means for converting each pre-designated special character in the search character string, based on its adjacent characters, into two characters that are not subject to search, following the same rule that was applied to the data recorded on the recording medium; two-character chain detection means for detecting, from the search character string, all two-character chains, including those containing the two characters that are not subject to search; and comparison means for searching the data recorded on the recording medium for the two-character chains detected by the two-character chain detection means and determining, by comparing the occurrence positions corresponding to the detected chains, whether the chains are contiguous as the search character string.

【００３８】第１３に、第１３の手段による全文検索に
用いる検索データを記録したコンピュータ読み取り可能
な記録媒体と、検索文字列から、予め指定された特殊文
字以外の文字からなる全ての２文字連鎖および予め指定
された特殊文字が間に挿入された３文字からなる全ての
３文字連鎖を検出し、３文字連鎖毎に、３文字連鎖を構
成する第２文字の特殊文字を第３文字と同じ文字に変換
し、第１文字と第２文字からなる２文字連鎖を検出する
第１の文字連鎖検出手段と、前記３文字連鎖の第２文字
と第３文字からなる２文字連鎖を検出する第１０の文字
連鎖検出手段と、第１の文字連鎖検出手段により検出さ
れた２文字連鎖を、前記記録媒体に記録された第１５の
データから検索し、第２の文字連鎖検出手段により検出
された特殊文字を変換して生成した２文字連鎖を、前記
記録媒体に記録された第１６のデータから検索し、検出
された文字連鎖に対応する出現位置の比較により、検索
文字列としての文字連鎖の連続の有無を判定する比較手
段とを備えた構成となっている。
Thirteenth, the device comprises: a computer-readable recording medium recording the search data used for full-text search according to the thirteenth means; first character chain detection means for detecting, from the search character string, all two-character chains consisting of characters other than pre-designated special characters and all three-character chains consisting of three characters with a pre-designated special character inserted in the middle, converting, for each three-character chain, the special character that is its second character into the same character as its third character, and detecting the two-character chain consisting of the first and second characters; tenth character chain detection means for detecting the two-character chain consisting of the second and third characters of the three-character chain; and comparison means for searching the fifteenth data recorded on the recording medium for the two-character chains detected by the first character chain detection means, searching the sixteenth data recorded on the recording medium for the two-character chains generated by converting the special characters detected by the second character chain detection means, and determining, by comparing the occurrence positions corresponding to the detected chains, whether the chains are contiguous as the search character string.

【００３９】第１４に、第１４の手段による全文検索に
用いる検索データを記録したコンピュータ読み取り可能
な記録媒体と、検索文字列から、特殊文字とその前後の
文字を除く全ての２文字連鎖を検出する第１１の文字連
鎖検出手段と、検索文字列から、特殊文字の直前の文字
と直後の文字からなる２文字連鎖、特殊文字の直前の文
字と特殊文字からなる２文字連鎖、特殊文字と特殊文字
の直後の文字からなる２文字連鎖を検出する第１２の文
字連鎖検出手段と、第１１の文字連鎖検出手段で検出さ
れた２文字連鎖に対応する第１７のデータと第１２の文
字連鎖検出手段で検出された文字連鎖対応する第２０の
データ、または第２１のデータと第１７のデータから２
つのデータの文字位置の差と文書番号の比較により、検
索文字列としての連続の有無を判断する比較手段と、第
２０のデータの直後に第２のデータが続いていることに
より特殊文字を含む検索文字列としての連続の有無を判
断する比較手段とを備えた構成となっている。
Fourteenth, the device comprises: a computer-readable recording medium recording the search data used for full-text search according to the fourteenth means; eleventh character chain detection means for detecting, from the search character string, all two-character chains except those involving a special character or the characters before and after it; twelfth character chain detection means for detecting, from the search character string, the two-character chain consisting of the characters immediately before and immediately after a special character, the two-character chain consisting of the character immediately before the special character and the special character, and the two-character chain consisting of the special character and the character immediately after it; comparison means for determining whether the chains are contiguous as the search character string by comparing the character-position difference and the document numbers of two data items, taken either from the seventeenth data corresponding to a two-character chain detected by the eleventh character chain detection means and the twentieth data corresponding to a character chain detected by the twelfth character chain detection means, or from the twenty-first data and the seventeenth data; and comparison means for determining whether the chains are contiguous as a search character string containing a special character from the fact that the second data immediately follows the twentieth data.
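The position-based comparison in this paragraph (same document number, character positions differing by one, with special characters left out of the numbering) can be sketched as follows. This is a simplified illustration under stated assumptions: only entries of the kind held in the seventeenth and twentieth data are modeled, the eighteenth and nineteenth data that record the special character itself are omitted, so the sketch cannot tell whether a special character was actually present. The names `index_documents`, `find`, and `SPECIAL` are hypothetical.

```python
from collections import defaultdict

SPECIAL = frozenset("ａ")  # assumed special-character set (full-width "a")


def index_documents(docs, special=SPECIAL):
    """Map each two-character chain to a set of (document number, position).

    Positions are numbered from 0 with special characters skipped, so the
    chain spanning a special character gets consecutive positions as well.
    """
    index = defaultdict(set)
    for doc_no, text in docs.items():
        chars = [c for c in text if c not in special]
        for pos in range(len(chars) - 1):
            index[chars[pos] + chars[pos + 1]].add((doc_no, pos))
    return index


def find(query, index, special=SPECIAL):
    """Return the (document number, start position) pairs where the query's
    chains occur contiguously: same document, positions one apart."""
    chars = [c for c in query if c not in special]
    chains = [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]
    if not chains:
        return set()
    candidates = set(index.get(chains[0], ()))
    for offset, chain in enumerate(chains[1:], start=1):
        postings = index.get(chain, ())
        candidates = {(d, p) for (d, p) in candidates
                      if (d, p + offset) in postings}
    return candidates


docs = {1: "いろａはに", 2: "いろにはに"}
index = index_documents(docs)
hits = find("いろａはに", index)  # only document 1 contains the whole sequence
```

Keeping the special character out of the position numbering is what lets a plain difference of one between positions serve as the contiguity test on both sides of the special character.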

【００４０】第１５に、第１５の手段による全文検索に
用いる検索データを記録したコンピュータ読み取り可能
な記録媒体と、検索文字列から、特殊文字の前後の文字
を除く全ての２文字連鎖を検出する第１３の文字連鎖検
出手段と、検索文字列から、特殊文字を間に挟む検索文
字列の場合は特殊文字の直前の文字と直後の文字を文字
連鎖として検出し、かつ該文字連鎖の第２文字は特殊文
字の直後の文字としてマークし、検索文字列の先頭が特
殊文字の場合は特殊文字の直後の文字とその次の文字を
文字連鎖として検出し、かつ該文字連鎖の第１文字は特
殊文字の直後の文字としてマークし、検索文字列の先頭
から３番目以降に特殊文字が出現する場合には、特殊文
字の２文字前の文字と特殊文字の直後の文字を文字連鎖
として検出し、かつ該文字連鎖の第２文字は特殊文字の
直後の文字としてマークし、さらに特殊文字の直後の文
字とその次の文字を文字連鎖として検出し、かつ該文字
連鎖の第１文字は特殊文字の直後の文字としてマークす
る第１４の文字連鎖検出手段と、検索文字列が、第１３
の文字連鎖検出手段で検出された２文字連鎖で構成され
る場合には、検出された文字連鎖に対応する文字位置と
文書番号の比較により、検索文字列としての文字連鎖の
連続の有無を判定する比較手段と、検索文字列が、第１
４の文字連鎖検出手段で検索された２文字連鎖で構成さ
れる場合には、第２１データから第２４データの文字連
鎖情報に一致するかどうかを文字連鎖と文書番号から検
索文字列としての文字連鎖の連続の有無を判定する比較
手段とを備えた構成となっている。
Fifteenth, the device comprises: a computer-readable recording medium recording the search data used for full-text search according to the fifteenth means; thirteenth character chain detection means for detecting, from the search character string, all two-character chains except those involving the characters before and after a special character; and fourteenth character chain detection means that operates on the search character string as follows. When the search character string has a special character in its interior, it detects the character immediately before and the character immediately after the special character as a character chain and marks the second character of that chain as a character immediately following a special character. When the search character string begins with a special character, it detects the character immediately after the special character and the next character as a character chain and marks the first character of that chain as a character immediately following a special character. When a special character appears at the third or a later position from the head of the search character string, it detects the character two positions before the special character and the character immediately after the special character as a character chain and marks the second character of that chain as a character immediately following a special character, and it further detects the character immediately after the special character and the next character as a character chain and marks the first character of that chain as a character immediately following a special character. The device also comprises comparison means for determining, when the search character string consists of two-character chains detected by the thirteenth character chain detection means, whether the chains are contiguous as the search character string by comparing the character positions and document numbers corresponding to the detected chains, and comparison means for determining, when the search character string consists of two-character chains found by the fourteenth character chain detection means, whether the chains are contiguous as the search character string from the character chains and document numbers, according to whether they match the character chain information of the twenty-first through twenty-fourth data.

【００４１】[0041]

【発明の実施の形態】以下、本発明の実施例について図
面を参照しながら説明する。Embodiments of the present invention will be described below with reference to the drawings.

【００４２】（実施の形態１）図１（ａ）は、本発明に
よる記録媒体を用いて計算により文字列照合装置を構成
した場合の概略図、図１（ｂ）は、本発明による文字列
照合装置のブロック構成図、図２は本発明の第１の方法
の文字列照合の方法の概念、及び全文検索データを記憶
した記録媒体の記憶形式を示している。
(Embodiment 1) FIG. 1(a) is a schematic diagram of a character string collation device built as a computer system using the recording medium according to the present invention, and FIG. 1(b) is a block diagram of the character string collation device according to the present invention. FIG. 2 shows the concept of the character string collation method according to the first method of the present invention and the storage format of the recording medium that stores the full-text search data.

【００４３】図２(a)において、２０１は登録時に入力
される文字列「いろａはに」、２０２は最初に登録され
るの２文字連鎖「いろ」、２０３は２０２の次の３文字
連鎖「ろａは」、２０４は２０３の次の２文字連鎖「は
に」である。ここで「ａ」は、文字列に意味の区切りな
どのために挿入されている特殊文字を示す。
In FIG. 2(a), 201 is the character string "いろａはに" input at registration, 202 is the two-character chain "いろ" that is registered first, 203 is the three-character chain "ろａは" that follows 202, and 204 is the two-character chain "はに" that follows 203. Here "ａ" denotes a special character that has been inserted into the character string, for example to mark a boundary of meaning.
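The decomposition of "いろａはに" into chains 202, 203, and 204 can be sketched as below. This is a minimal illustration, assuming that the only special character is the full-width "ａ" of the example and that leading special characters and runs of special characters may be ignored; the names `split_chains` and `SPECIAL` are hypothetical.

```python
SPECIAL = frozenset("ａ")  # assumed special-character set (full-width "a", as in FIG. 2)


def split_chains(text, special=SPECIAL):
    """Return the two- and three-character chains of `text` in order.

    Two adjacent ordinary characters form a two-character chain. An
    ordinary character, one special character, and another ordinary
    character form a three-character chain. Other patterns (a leading
    special character, runs of special characters) are ignored here.
    """
    chains = []
    for i in range(len(text) - 1):
        if text[i] in special:
            continue
        if text[i + 1] not in special:
            chains.append(text[i:i + 2])
        elif i + 2 < len(text) and text[i + 2] not in special:
            chains.append(text[i:i + 3])
    return chains


print(split_chains("いろａはに"))  # ['いろ', 'ろａは', 'はに']
```

Each chain shares its boundary character with its neighbour ("ろ" between 202 and 203, "は" between 203 and 204), which is what allows the chains to be linked back together during collation.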

【００４４】図２(c)において、２１１は検索時の検索
文字列「いろａはに」、２１２は最初に検索される２文
字連鎖「いろ」、２１３は２１２の次の３文字連鎖「ろ
ａは」、２１４は２１３の次の２文字連鎖「はに」であ
る。
In FIG. 2(c), 211 is the search character string "いろａはに" given at search time, 212 is the two-character chain "いろ" that is searched for first, 213 is the three-character chain "ろａは" that follows 212, and 214 is the two-character chain "はに" that follows 213.

【００４５】図２(b)において、２文字連鎖２０２は
「い」および「ろ」の出現回数n1、n2を、３文字連鎖２
０３は「ろ」および「は」の出現回数n2、n3を、２文字
連鎖２０４は「は」および「に」の出現回数n3、n4を記
憶する。２文字連鎖２０２、２０４と３文字連鎖２０３
は異なる領域に記憶し、２文字連鎖か３文字連鎖かを識
別する。
In FIG. 2(b), the two-character chain 202 stores the occurrence counts n1 and n2 of "い" and "ろ", the three-character chain 203 stores the occurrence counts n2 and n3 of "ろ" and "は", and the two-character chain 204 stores the occurrence counts n3 and n4 of "は" and "に". The two-character chains 202 and 204 are stored in a different area from the three-character chain 203, which makes it possible to tell whether an entry is a two-character chain or a three-character chain.
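The storage layout of FIG. 2(b) can be sketched as two separate tables. One point is an assumption here: the "occurrence count" of a character is taken to be its running count in the registered text (1 for its first appearance, 2 for its second, and so on), which this passage does not spell out. The names `build_chain_memories` and `SPECIAL` are hypothetical, and the three-character chain is keyed by its two ordinary characters ("ろは"), following the way the chain is referred to in the collation description.

```python
from collections import defaultdict

SPECIAL = frozenset("ａ")  # assumed special-character set (full-width "a")


def build_chain_memories(text, special=SPECIAL):
    """Build the two-character and three-character chain memories.

    Assumption: a character's occurrence count is its running count in
    the text, so each stored pair (n_first, n_last) identifies which
    occurrences of the two ordinary characters form the chain.
    """
    seen = defaultdict(int)
    numbered = []  # (character, occurrence count), or (character, None) for specials
    for ch in text:
        if ch in special:
            numbered.append((ch, None))
        else:
            seen[ch] += 1
            numbered.append((ch, seen[ch]))

    two_mem = defaultdict(list)    # area for two-character chains
    three_mem = defaultdict(list)  # separate area for three-character chains
    for i in range(len(numbered) - 1):
        c1, n1 = numbered[i]
        c2, n2 = numbered[i + 1]
        if n1 is None:
            continue
        if n2 is not None:
            two_mem[c1 + c2].append((n1, n2))
        elif i + 2 < len(numbered) and numbered[i + 2][1] is not None:
            c3, n3 = numbered[i + 2]
            three_mem[c1 + c3].append((n1, n3))
    return dict(two_mem), dict(three_mem)


two_mem, three_mem = build_chain_memories("いろａはに")
# two_mem   -> {'いろ': [(1, 1)], 'はに': [(1, 1)]}
# three_mem -> {'ろは': [(1, 1)]}
```

Keeping the two tables apart means that a lookup for "ろは" as a two-character chain does not hit the entry that was registered with a special character in the middle.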

【００４６】検索文字列図２(c)の入力に対し、本発明
の第１の方法による照合方法では、２文字連鎖２１２の
「いろ」に該当する２文字連鎖２０２を２文字連鎖を格
納した領域から検出し、このときの「ろ」の出現回数n2
と、２１２の次の３文字連鎖２１３の「ろａは」に該当
する３文字連鎖２０３「ろは」を３文字連鎖が格納され
た領域から検出し、このときの「ろ」の出現回数n2が一
致するか否か判断する。一致したら、次に２０３で検出
した「は」の出現回数n3と、２１３の次の２文字連鎖の
「はに」に該当する２文字連鎖２０４を２文字連鎖を格
納する領域から検出し、このときの「は」の出現回数が
一致するか否か判断する。一致したら、文字列２１１は
２０１に一致したと判断する。以上により、文字列の照
合がなされる。
Given the search character string of FIG. 2(c) as input, the collation method according to the first method of the present invention proceeds as follows. The two-character chain 202 corresponding to "いろ" of the two-character chain 212 is located in the area that stores two-character chains, which yields the occurrence count n2 of "ろ". The three-character chain 203 "ろは", corresponding to "ろａは" of the three-character chain 213 that follows 212, is then located in the area that stores three-character chains, and it is determined whether its occurrence count n2 of "ろ" matches. If it matches, the occurrence count n3 of "は" obtained from 203 is compared next: the two-character chain 204 corresponding to "はに" of the two-character chain 214 that follows 213 is located in the area that stores two-character chains, and it is determined whether its occurrence count of "は" matches. If it matches, the character string 211 is judged to match 201. The character strings are collated in this way.
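The chain-by-chain comparison described above can be sketched as follows. The memory contents are written out by hand and assume, as an interpretation not stated in this passage, that each entry holds the running occurrence counts of the chain's first and last ordinary characters. The names `query_chains`, `collate`, and `SPECIAL` are hypothetical.

```python
SPECIAL = frozenset("ａ")  # assumed special-character set (full-width "a")


def query_chains(query, special=SPECIAL):
    """Yield (area, key) for each chain of the query, in order.

    `area` says which memory to consult; the key of a three-character
    chain omits the special character in the middle.
    """
    for i in range(len(query) - 1):
        if query[i] in special:
            continue
        if query[i + 1] not in special:
            yield "two", query[i] + query[i + 1]
        elif i + 2 < len(query) and query[i + 2] not in special:
            yield "three", query[i] + query[i + 2]


def collate(query, two_mem, three_mem, special=SPECIAL):
    """Return True if the query's chains are found and link up.

    Two successive chains link up when the occurrence count of the last
    character of the earlier chain equals the occurrence count of the
    first character of the later chain.
    """
    memories = {"two": two_mem, "three": three_mem}
    trailing = None  # occurrence counts of the previous chain's last character
    matched = False
    for area, key in query_chains(query, special):
        entries = memories[area].get(key, [])
        if trailing is not None:
            entries = [(n1, n2) for (n1, n2) in entries if n1 in trailing]
        if not entries:
            return False
        trailing = {n2 for (_, n2) in entries}
        matched = True
    return matched


# Hand-written memories for the registered string "いろａはに" (assumed counts).
two_mem = {"いろ": [(1, 1)], "はに": [(1, 1)]}
three_mem = {"ろは": [(1, 1)]}

print(collate("いろａはに", two_mem, three_mem))  # True
print(collate("いろはに", two_mem, three_mem))    # False
```

The second query fails because "ろは" without a special character is looked up in the two-character area, where it was never registered; this is the effect of keeping the two areas separate.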

【００４７】図１（ｂ）は本発明の第１の方法の一実施
の形態における文字列照合装置の構成を示したものであ
る。FIG. 1B shows the configuration of a character string collating apparatus according to an embodiment of the first method of the present invention.

【００４８】図１（ｂ）において、１０１は登録する文
字列２０１から登録する２文字連鎖２０２、２０４、を
検出する２文字連鎖検出器、１０２は登録する文字列２
０１から登録する３文字連鎖２０３を検出する３文字連
鎖検出器、１０３は２文字連鎖２０２、２０４およびそ
れらの文字の出現回数を格納する２文字連鎖メモリ、１
０４は３文字連鎖２０３およびその連鎖の最初と最後の
文字の出現回数を格納する３文字連鎖メモリ、１１１は
検索する文字列２１１から検索する２文字連鎖２１２、
２１４を検出する２文字連鎖検出器、１１２は検索する
文字列２１１から検索する３文字連鎖２１３を検出する
３文字連鎖検出器、１１３は２文字連鎖検出器１１１よ
り検出された２文字連鎖２１２、２１４を２文字連鎖メ
モリ１０３で検出するかまたは、３文字連鎖検出器１１
２より検出された３文字連鎖２１３を３文字連鎖メモリ
１０４で検出し、検出したそれぞれの文字連鎖の前の文
字の出現回数が直前に検出した文字連鎖の後の文字の出
現回数に一致するか否か判断する比較器、１１４は２文
字連鎖検出器１１１および３文字連鎖検出器１１２から
検出される全ての２文字または３文字の連鎖についての
一致を比較器１１３で判断し、文字列の一致を判断する
制御部である。
In FIG. 1(b), 101 is a two-character chain detector that detects the two-character chains 202 and 204 to be registered from the character string 201 being registered; 102 is a three-character chain detector that detects the three-character chain 203 to be registered from the character string 201 being registered; 103 is a two-character chain memory that stores the two-character chains 202 and 204 together with the occurrence counts of their characters; and 104 is a three-character chain memory that stores the three-character chain 203 together with the occurrence counts of the first and last characters of the chain. 111 is a two-character chain detector that detects the two-character chains 212 and 214 to be searched for from the search character string 211, and 112 is a three-character chain detector that detects the three-character chain 213 to be searched for from the search character string 211. 113 is a comparator that looks up the two-character chains 212 and 214 detected by the two-character chain detector 111 in the two-character chain memory 103, or the three-character chain 213 detected by the three-character chain detector 112 in the three-character chain memory 104, and determines whether the occurrence count of the leading character of each chain found matches the occurrence count of the trailing character of the chain found immediately before it. 114 is a control unit that has the comparator 113 judge the match for every two-character or three-character chain detected by the two-character chain detector 111 and the three-character chain detector 112, and thereby determines whether the character strings match.

【００４９】In terms of the schematic diagram of FIG. 1(a), the two-character chain memory 103 and the three-character chain memory 104 of this configuration correspond to a floppy disk or hard disk of the external recording device 40, and the other means correspond to the main unit 30.

【００５０】The operation of the character string collation device configured as described above is explained for the case where the two-character chains 202 and 204 of FIG. 2(b) are stored in the two-character chain memory 103, the three-character chain 203 of FIG. 2(b) is stored in the three-character chain memory 104, and "いろaはに" of FIG. 2(c) is input as the search character string.

【００５１】When the search character string "いろaはに" is input, the two-character chain detector detects the two-character chains "いろ" and "はに", which do not contain the "a" designated in advance as a special character, and outputs them to the comparator 113. The three-character chain detector detects the three-character chain "ろaは", which is centered on the "a" designated in advance as a special character, and outputs it to the comparator 113.

【００５２】At this time, the chains may be output to the comparator in chain order, that is, "いろ", "ろaは", "はに", or the chains "いろ", "ろaは", and "はに" may be output simultaneously together with the character chain information.

【００５３】The comparator 113 distinguishes whether an input comes from the two-character chain detector or from the three-character chain detector. It detects "いろ" and "はに" in the two-character chain memory 103, detects the chain "ろは" corresponding to "ろaは" in the three-character chain memory 104, and judges the chaining based on the occurrence counts.

【００５４】Because the comparator distinguishes two-character chains from three-character chains and looks them up in different chain memories, "いろaはに" and "いろはに" can be searched for as distinct search target strings.
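The behavior of Embodiment 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names `chains`, `register`, and `matches` are hypothetical, the special character is taken to be ASCII "a", occurrence counts are kept only for ordinary characters, and the input is assumed to contain no leading or consecutive special characters.

```python
from collections import defaultdict

SPECIAL = "a"  # assumption: a single, pre-designated special character


def chains(text):
    """Yield (chain, i, j): a 2-char chain, or a 3-char chain around SPECIAL.

    i and j are the positions of the first and last character of the chain.
    """
    i, n = 0, len(text)
    while i < n - 1:
        if text[i + 1] == SPECIAL:
            if i + 2 < n:
                yield text[i:i + 3], i, i + 2
            i += 2
        else:
            yield text[i:i + 2], i, i + 1
            i += 1


def register(text):
    """Build the two chain memories with per-character occurrence counts."""
    seen, occ = defaultdict(int), {}
    for pos, ch in enumerate(text):
        if ch != SPECIAL:  # the special character itself is never counted
            seen[ch] += 1
            occ[pos] = seen[ch]
    mem2, mem3 = defaultdict(list), defaultdict(list)
    for chain, i, j in chains(text):
        (mem3 if len(chain) == 3 else mem2)[chain].append((occ[i], occ[j]))
    return mem2, mem3


def matches(mem2, mem3, query):
    """Chain lookups succeed only if adjacent occurrence counts agree."""
    live = None  # occurrence counts of the last character matched so far
    for chain, _, _ in chains(query):
        entries = (mem3 if len(chain) == 3 else mem2).get(chain, [])
        live = {b for a, b in entries if live is None or a in live}
        if not live:
            return False
    return bool(live)


mem2, mem3 = register("いろaはに")
print(matches(mem2, mem3, "いろaはに"))
print(matches(mem2, mem3, "いろはに"))
```

With the text "いろaはに" registered, the query "いろaはに" is found while "いろはに" is not, because "ろは" was never stored in the two-character memory.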

【００５５】As described above, according to this embodiment, collation is not limited by the number of occurrences of the particular special character "a" designated in advance, growth of the chain memory caused by the special character is avoided, and at the same time chains can be extracted efficiently by matching occurrence counts.

【００５６】Although the special character is written as "a" in this embodiment, a run of special characters "a, a, ..., a" can be replaced with a single "a". This allows character string collation by character chains that distinguishes whether special characters are inserted, without being limited by the number of occurrences of the special characters.

【００５７】That is, "いろ (one or more special characters) はに" and "いろはに" can be searched for as different search strings.
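As a small illustration of this normalization step, the following Python sketch collapses any run of special characters into one marker before chains are extracted. The function name `collapse_special`, the marker "a", and the example set of special characters are assumptions made for illustration only.

```python
import re


def collapse_special(text, specials="a", marker="a"):
    """Replace each run of one or more special characters with one marker."""
    pattern = "[" + re.escape(specials) + "]+"
    return re.sub(pattern, marker, text)


print(collapse_special("いろaaaはに"))
print(collapse_special("いろ・・/はに", specials="・/"))
```

Both calls yield "いろaはに", so a text with one separator and a text with several separators produce the same three-character chain "ろaは".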

【００５８】In this embodiment, separate chain memories are provided to distinguish two-character chains from three-character chains (chains with a special character inserted). It is also possible to use a single memory with an identifier that indicates whether an entry is a two-character chain or a three-character chain, and to store a three-character chain, for example, as shown in FIG. 2(d).

(Embodiment 2) FIG. 3 is a conceptual diagram showing the configuration of a character string collation device according to the second embodiment of the present invention, and FIG. 4 shows the concept of the second character string collation method in this embodiment and the storage format of a recording medium storing full-text search data.

【００５９】In FIG. 4(a), 401 is the character string "いろaはに" input at registration; 402 is the character string "いろa1はに", obtained from the character string 401 by changing the particular special character "a" to "a1", which is uniquely determined by the following character "は"; 403 is the two-character chain "いろ" registered first; 404 is the two-character chain "ろa1" following 403; 405 is the two-character chain "a1は" following 404; and 406 is the two-character chain "はに" following 405.

【００６０】Here, "a" is a special character inserted into the character string, for example to separate units of meaning, and "a1" denotes a specific symbol or code that is not a search target.

【００６１】In FIG. 4(c), 411 is the search character string "いろaはに" given at search time; 412 is the character string "いろa1はに", obtained from the character string 411 by changing the particular special character "a" to "a1", which is uniquely determined by the following character "は"; 413 is the two-character chain "いろ" searched first; 414 is the two-character chain "ろa1" following 413; 415 is the two-character chain "a1は" following 414; and 416 is the two-character chain "はに" following 415.

【００６２】In FIG. 4(b), the two-character chain 403 stores n1 and n2, the occurrence counts so far of "い" and "ろ" in the search target; the two-character chain 404 stores the occurrence counts n2 and n3 of "ろ" and "a1"; the two-character chain 405 stores the occurrence counts n3 and n4 of "a1" and "は"; and the two-character chain 406 stores the occurrence counts n4 and n5 of "は" and "に".

【００６３】At this time, the collation method according to the second method of the present invention detects the two-character chain 403 corresponding to "いろ" of the two-character chain 413 and obtains the occurrence count n2 of "ろ" recorded there. It then detects the two-character chain 404 corresponding to "ろa1" of the two-character chain 414 that follows 413, and determines whether the occurrence count n2 of "ろ" recorded there matches. If it matches, the method takes the occurrence count n3 of "a1" detected in 404, detects the two-character chain 405 corresponding to "a1は" of the two-character chain 415 that follows 414, and determines whether the occurrence count of "a1" recorded there matches. If it matches, the method takes the occurrence count n4 of "は" detected in 405, detects the two-character chain 406 corresponding to "はに" of the two-character chain 416 that follows 415, and determines whether the occurrence count of "は" recorded there matches. If it matches, the character string 411 is judged to match the character string 401. The character strings are collated in this way.
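The registration and collation steps of the second method can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the function names are hypothetical, the special character is ASCII "a", and the substitute symbol is modeled as the token "a<は>" (standing in for "a1"), which cannot collide with any single searchable character.

```python
from collections import defaultdict

SPECIAL = "a"  # assumption: pre-designated special character


def convert(text):
    """Replace SPECIAL by a token determined by the character that follows."""
    tokens = []
    for i, ch in enumerate(text):
        if ch == SPECIAL and i + 1 < len(text):
            tokens.append("a<" + text[i + 1] + ">")  # hypothetical symbol form
        else:
            tokens.append(ch)
    return tokens


def register(text):
    """Store every 2-token chain with the occurrence counts of both tokens."""
    tokens = convert(text)
    seen, occ = defaultdict(int), []
    for t in tokens:
        seen[t] += 1
        occ.append(seen[t])
    memory = defaultdict(list)
    for i in range(len(tokens) - 1):
        memory[(tokens[i], tokens[i + 1])].append((occ[i], occ[i + 1]))
    return memory


def matches(memory, query):
    """Follow the query's chains, requiring adjacent counts to agree."""
    tokens = convert(query)
    live = None  # counts of the trailing token of the previous chain
    for i in range(len(tokens) - 1):
        entries = memory.get((tokens[i], tokens[i + 1]), [])
        live = {b for a, b in entries if live is None or a in live}
        if not live:
            return False
    return bool(live)


memory = register("いろaはにいろaへと")
print(matches(memory, "いろaはに"), matches(memory, "いろはに"))
```

Because "a" before "は" and "a" before "へ" become different tokens, each token's occurrence list stays short even when the raw special character is frequent.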

【００６４】FIG. 3 shows the configuration of a character string collation device according to one embodiment of the second method of the present invention.

【００６５】In FIG. 3, 301 is a character string converter that converts the character string 401 to be registered into the character string 402, in which the particular special character "a" is changed to "a1", uniquely determined by the following character "は"; 302 is a two-character chain detector that detects the two-character chains 403, 404, 405, and 406 to be registered from the character string 402; 303 is a two-character chain memory that stores the two-character chains 403, 404, 405, and 406 together with the occurrence counts of their characters; 304 is a character string converter that converts the character string 411 to be searched into the character string 412, in which the particular special character "a" is changed to "a1", uniquely determined by the following character "は"; 305 is a two-character chain detector that detects the two-character chains 413, 414, 415, and 416 to be searched in the character string 412; 306 is a comparator that looks up the two-character chains 413, 414, 415, and 416 detected by the two-character chain detector 305 in the two-character chain memory 303 and determines whether the occurrence count of the leading character of each detected two-character chain matches the occurrence count of the trailing character of the two-character chain detected immediately before; and 307 is a control unit that has the comparator 306 check every two-character chain detected by the two-character chain detector 305, and thereby determines whether the character strings match.

【００６６】The operation of the character string collation device configured as described above is now explained. When a character string to be registered is input, the character string converter 301 converts the special character "a" designated in advance into a symbol or code that is determined in advance by the following character and is not a search target, that is, a symbol or code that does not occur in search strings, and outputs the result.

【００６７】As shown in FIG. 4(d), the character string converter holds a correspondence that specifies, for each character following the special symbol, which symbol the special symbol is converted to. This correspondence may differ for each character, as in 421 and 422, or may be defined for a group of characters, as in 423.
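A correspondence of this kind might be held as a lookup table, as in the following Python sketch. The concrete entries of FIG. 4(d) are not reproduced here; the table contents, the symbol names "a1" to "a3", the fallback "a0", and the function name `substitute_for` are all hypothetical.

```python
# Hypothetical per-character entries (in the spirit of 421 and 422).
PER_CHAR = {"は": "a1", "に": "a2"}

# Hypothetical group entry (in the spirit of 423): one symbol for a whole group.
GROUPS = [(frozenset("あいうえお"), "a3")]

# Assumed fallback for following characters that appear in neither table.
DEFAULT = "a0"


def substitute_for(next_char):
    """Return the symbol that replaces the special character before next_char."""
    if next_char in PER_CHAR:
        return PER_CHAR[next_char]
    for members, symbol in GROUPS:
        if next_char in members:
            return symbol
    return DEFAULT


print(substitute_for("は"), substitute_for("う"), substitute_for("ん"))
```

Registration and search must use the same table, since a chain stored under one symbol cannot be found under another.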

【００６８】From the converted character string, the two-character chain detector detects the two-character chains and their occurrence counts in the same way as in Embodiment 1, and they are stored in the two-character chain memory.

【００６９】When a search character string is given, the character string converter 304 converts the special characters into symbols or codes that do not occur in search strings, following the same correspondence as that used by the character string converter 301, and outputs the result to the two-character chain detector. The two-character chain detector detects the two-character chains and outputs them to the comparator 306.

【００７０】The comparator 306 detects matching character chains from the contents of the two-character chain memory, following the same procedure as in Embodiment 1. In Embodiment 2, however, the comparator does not need to distinguish two-character chains from three-character chains as it does in Embodiment 1.

【００７１】As described above, according to this embodiment, growth of the chain memory for the frequently occurring special character "a" can be avoided. In addition, because the same special character is converted into several different characters depending on the following character, the candidates whose occurrence counts must be checked to extract a chain are spread over several entries, which shortens the processing time.

【００７２】In this embodiment the conversion target of the special character "a" is determined by the following character, but it is clear that the same effect is obtained when the conversion target is determined by the character preceding the special character.

【００７３】The schematic diagram for an implementation on a computer is the same as FIG. 1(a); in this embodiment, the two-character chain memory 303 corresponds to the external recording device 40.

【００７４】(Embodiment 3) FIG. 5 is a block diagram showing the configuration of a character string collation device according to the third embodiment of the present invention, and FIGS. 6 to 8 show the concept of the third character string collation method of the present invention and the storage format of a recording medium storing full-text search data.

【００７５】In FIG. 6, 601 is the character string "いろaはに" input at registration. 602 is the character string "いろろ'は'はに", obtained from the character string 601 by replacing the particular special character "a" as follows: the preceding character "ろ" becomes "ろろ'", consisting of "ろ" and the symbol "ろ'" uniquely determined by "ろ", and the following character "は" becomes "は'は", consisting of the symbol "は'" uniquely determined by "は" and "は" itself. 603 is the two-character chain "いろ" registered first; 604 is the two-character chain "ろろ'" following 603; 605 is the two-character chain "ろ'は'" following 604; 606 is the two-character chain "は'は" following 605; and 607 is the two-character chain "はに" following 606.

【００７６】Here, "a" is a special character inserted into the character string, for example to separate units of meaning, and "ろ'" and "は'" denote specific symbols or codes that are not search targets.

【００７７】In FIG. 8, 611 is the search character string "いろaはに" given at search time. 612 is the character string "いろろ'は'はに", obtained from the character string 611 by replacing the particular special character "a" as follows: the preceding character "ろ" becomes "ろろ'", consisting of "ろ" and the symbol "ろ'" uniquely determined by "ろ", and the following character "は" becomes "は'は", consisting of the symbol "は'" uniquely determined by "は" and "は" itself. 613 is the two-character chain "いろ" searched first; 614 is the two-character chain "ろろ'" following 613; 615 is the two-character chain "ろ'は'" following 614; 616 is the two-character chain "は'は" following 615; and 617 is the two-character chain "はに" following 616.

【００７８】In FIG. 7, the two-character chain 603 stores the occurrence counts n1 and n2 of "い" and "ろ"; the two-character chain 604 stores the occurrence counts n2 and n3 of "ろ" and "ろ'"; the two-character chain 605 stores the occurrence counts n3 and n4 of "ろ'" and "は'"; the two-character chain 606 stores the occurrence counts n4 and n5 of "は'" and "は"; and the two-character chain 607 stores the occurrence counts n5 and n6 of "は" and "に". At this time, the collation method according to the third method of the present invention detects the two-character chain 603 corresponding to "いろ" of the two-character chain 613 and obtains the occurrence count n2 of "ろ" recorded there. It then detects the two-character chain 604 corresponding to "ろろ'" of the two-character chain 614 that follows 613, and determines whether the occurrence count n2 of "ろ" recorded there matches. If it matches, the method takes the occurrence count n3 of "ろ'" detected in 604, detects the two-character chain 605 corresponding to "ろ'は'" of the two-character chain 615 that follows 614, and determines whether the occurrence count of "ろ'" recorded there matches. If it matches, the method takes the occurrence count n4 of "は'" detected in 605, detects the two-character chain 606 corresponding to "は'は" of the two-character chain 616 that follows 615, and determines whether the occurrence count of "は'" recorded there matches. If it matches, the method takes the occurrence count n5 of "は" detected in 606, detects the two-character chain 607 corresponding to "はに" of the two-character chain 617 that follows 616, and determines whether the occurrence count of "は" recorded there matches. If it matches, the character string 611 is judged to match the character string 601. The character strings are collated in this way.
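The conversion used by the third method, and the chains and counts that result from it, can be sketched in Python as follows. The function names are hypothetical, the special character is assumed to be ASCII "a" with an ordinary character on each side, and the substitute symbols "ろ'" and "は'" are modeled as two-character tokens.

```python
from collections import defaultdict

SPECIAL = "a"  # assumption: pre-designated special character


def convert(text):
    """Replace SPECIAL by two tokens derived from its two neighbours."""
    tokens = []
    for i, ch in enumerate(text):
        if ch == SPECIAL and 0 < i < len(text) - 1:
            tokens.append(text[i - 1] + "'")  # symbol determined by preceding char
            tokens.append(text[i + 1] + "'")  # symbol determined by following char
        else:
            tokens.append(ch)
    return tokens


def chains_with_counts(text):
    """List every 2-token chain with the occurrence counts of its tokens."""
    tokens = convert(text)
    seen, occ = defaultdict(int), []
    for t in tokens:
        seen[t] += 1
        occ.append(seen[t])
    return [((tokens[i], tokens[i + 1]), (occ[i], occ[i + 1]))
            for i in range(len(tokens) - 1)]


for chain, counts in chains_with_counts("いろaはに"):
    print(chain, counts)
```

For "いろaはに" this lists the five chains corresponding to 603 through 607. Collation then proceeds exactly as for ordinary two-character chains, since the converted string contains no special character.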

【００７９】FIG. 5 shows the configuration of a character string collation device according to one embodiment of the third method of the present invention.

【００８０】In FIG. 5, 501 is a character string converter that converts the character string 601 to be registered into the character string 602 by replacing the particular special character "a" as follows: the preceding character "ろ" becomes "ろろ'", consisting of "ろ" and the symbol "ろ'" uniquely determined by "ろ", and the following character "は" becomes "は'は", consisting of the symbol "は'" uniquely determined by "は" and "は" itself; 502 is a two-character chain detector that detects the two-character chains 603, 604, 605, 606, and 607 to be registered from the character string 602; 503 is a two-character chain memory that stores the two-character chains 603, 604, 605, 606, and 607 together with the occurrence counts of their characters; 504 is a character string converter that converts the character string 611 to be searched into the character string 612 by the same replacement of the special character "a"; 505 is a two-character chain detector that detects the two-character chains 613, 614, 615, 616, and 617 to be searched in the character string 612; 506 is a comparator that looks up the two-character chains 613, 614, 615, 616, and 617 detected by the two-character chain detector 505 in the two-character chain memory 503 and determines whether the occurrence count of the leading character of each detected two-character chain matches the occurrence count of the trailing character of the two-character chain detected immediately before; and 507 is a control unit that has the comparator 506 check every two-character chain detected by the two-character chain detector 505, and thereby determines whether the character strings match.

【００８１】As described above, according to this embodiment, character string collation by character chains can be performed without being limited by the number of occurrences of the special character "a".

【００８２】That is, according to this embodiment, the special character "a" is converted into different characters depending on the characters before and after it, and the occurrence counts of the converted characters are recorded. Compared with Embodiment 2, the two-character chain file is therefore distributed even more finely, which avoids growth of the chain memory for the frequently occurring special character "a" and at the same time makes the chain extraction process more efficient.

【００８３】The schematic diagram for an implementation on a computer is the same as FIG. 1(a); in this case, the two-character chain memory 503 corresponds to the external recording device 40.

(Embodiment 4) FIG. 10 shows the concept of the fourth character string collation method of the present invention. In FIG. 10(a), 1001 is the character string "いろaはに" input at registration; 1002 is the two-character chain "いろ" registered first; 1003 is the two-character chain "ろは", the first chain generated from the three-character string "ろaは" that follows 1002 and encloses the special character "a"; 1004 is the two-character chain "ろa", containing the special character, generated after 1003; 1005 is the two-character chain "aは", containing the special character, generated after 1004; and 1006 is the two-character chain "はに" following 1005. In FIG. 10(c), 1011 is the search character string "いろaはに" given at search time; 1012 is the two-character chain "いろ" searched first; 1013 is the two-character chain "ろは", the first chain generated from the three-character string "ろaは" that follows 1012 and encloses the special character "a"; 1014 is the two-character chain "ろa", containing the special character, generated after 1013; 1015 is the two-character chain "aは", containing the special character, generated after 1014; and 1016 is the two-character chain "はに" following 1015.

【００８４】In FIG. 10(b), the two-character chain 1002 stores the occurrence counts n1 and n2 of "い" and "ろ"; the two-character chain 1003 stores the occurrence counts n2 and n3 of "ろ" and "は"; the two-character chain 1004 stores the occurrence count n2 of "ろ" and a fixed value n for "a"; the two-character chain 1005 stores the fixed value n for "a" and the occurrence count n3 of "は"; and the two-character chain 1006 stores the occurrence counts n3 and n4 of "は" and "に".

【００８５】At this time, the collation method according to the fourth method of the present invention detects the two-character chain 1002 corresponding to "いろ" of the two-character chain 1012 and obtains the occurrence count n2 of "ろ" recorded there. Of the two-character chains generated from the three-character string "ろaは" that follows 1012 and encloses "a", it then detects the two-character chain 1003 corresponding to "ろは" of the two-character chain 1013 and obtains the occurrence counts n2 and n3 of "ろ" and "は" recorded there. It determines whether the occurrence counts of "ろ" in the chains 1002 and 1003 agree at n2. If they agree, it detects the chain 1004 corresponding to "ろa" of the chain 1014 and determines whether the occurrence count of "ろ" is n2. Next, it detects the chain 1005 corresponding to "aは" of the chain 1015 and determines whether the occurrence count of "は" is n3, that is, whether it matches the occurrence count n3 of "は" detected in 1003. If it matches, the method takes the occurrence count n3 of "は" detected in 1005, detects the two-character chain 1006 corresponding to "はに" of the two-character chain 1016 that follows 1015, and determines whether the occurrence count of "は" recorded there matches. If it matches, the character string 1011 is judged to match the character string 1001. The character strings are collated in this way.
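The fourth method can be sketched in Python as follows, as one possible reading of the description rather than the patented implementation. The function names, the ASCII special character "a", and the choice of 0 as the fixed value stored for it are assumptions; the query is assumed to contain no leading or consecutive special characters.

```python
from collections import defaultdict

SPECIAL = "a"  # assumption: pre-designated special character
FIXED = 0      # assumed constant stored in place of SPECIAL's occurrence count


def register(text):
    """Store adjacent chains plus the chain that bridges over SPECIAL."""
    seen, occ = defaultdict(int), []
    for ch in text:
        if ch == SPECIAL:
            occ.append(FIXED)
        else:
            seen[ch] += 1
            occ.append(seen[ch])
    memory = defaultdict(set)
    n = len(text)
    for i in range(n - 1):
        memory[text[i] + text[i + 1]].add((occ[i], occ[i + 1]))
        if text[i + 1] == SPECIAL and text[i] != SPECIAL and i + 2 < n:
            memory[text[i] + text[i + 2]].add((occ[i], occ[i + 2]))
    return memory


def matches(memory, query):
    live, i, n = None, 0, len(query)
    while i < n - 1:
        gap = query[i + 1] == SPECIAL and i + 2 < n
        j = i + 2 if gap else i + 1
        x, y = query[i], query[j]
        nxt = set()
        for p, q in memory.get(x + y, ()):
            if live is not None and p not in live:
                continue  # not adjacent to the chain matched before
            if gap and ((p, FIXED) not in memory.get(x + SPECIAL, ())
                        or (FIXED, q) not in memory.get(SPECIAL + y, ())):
                continue  # this x and y are not separated by SPECIAL
            nxt.add(q)
        if not nxt:
            return False
        live, i = nxt, j
    return live is not None


memory = register("いろaはに")
print(sorted(memory), matches(memory, "いろaはに"))
```

Since the special character always carries the same fixed value, its frequency never inflates the stored counts; the surrounding characters' counts alone decide whether "ろa" and "aは" belong to the same occurrence.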

【００８６】FIG. 9 shows the configuration of a character string collation device according to one embodiment of the fourth method of the present invention.

【００８７】In FIG. 9, 901 is a special character detector that detects the particular special character "a" in the character string 1001 to be registered; 902 is a two-character chain detector that detects the two-character chains 1002 and 1006 to be registered, which are generated from the parts of the character string 1001 that contain no special character; 903 is a special character chain detector that detects the two-character chains 1003, 1004, and 1005 generated from the three-character string "ろaは" enclosing the special character "a" in the character string 1001; 904 is a two-character chain memory that stores the two-character chains 1002, 1003, 1004, 1005, and 1006 detected by the two-character chain detector 902 and the special character chain detector 903, holding for each chain a fixed value for the special character and the occurrence count for every other character; 911 is a special character detector that detects the particular special character "a" in the character string 1011 to be searched; 912 is a two-character chain detector that detects the two-character chains 1012 and 1016 to be searched, which are generated from the parts of the character string 1011 that contain no special character; 913 is a special character chain detector that detects the two-character chains 1013, 1014, and 1015 generated from the three-character string "ろaは" enclosing the special character "a" in the character string 1011; 914 is a comparator that looks up the two-character chains 1012 and 1016 detected by the two-character chain detector 912 in the two-character chain memory 904 and determines whether the occurrence count of the leading character of each detected two-character chain matches the occurrence count of the trailing character of the two-character chain detected immediately before, and that also looks up the two-character chains 1013, 1014, and 1015 detected by the special character chain detector 913 in the two-character chain memory 904 and determines whether the occurrence counts of the non-special characters "ろ" and "は" agree across the detected chains; and 915 is a control unit that has the comparator 914 check every two-character chain detected by the two-character chain detector 912 and the special character chain detector 913, and thereby determines whether the character strings match.

【００８８】Thus, in this method the particular special character "a" can form chains with the characters before and after it regardless of how often it occurs, so character string collation by character chains can be performed without being limited by the number of occurrences of the special character "a".

【００８９】(Embodiment 5) FIG. 12 shows the concept of the fifth character string collation method of the present invention. In FIG. 12(a), 1201 is the character string "いろaはに" input at registration; 1202 is the two-character chain "いろ" registered first; 1203 is the three-character chain "ろaは" following 1202; and 1204 is the two-character chain "はに" following 1203. In FIG. 12(c), 1211 is the search character string "いろaはに" given at search time; 1212 is the two-character chain "いろ" searched first; 1213 is the three-character chain "ろaは" following 1212; and 1214 is the two-character chain "はに" following 1213.

【００９０】In FIG. 12(b), the two-character chain 1202 stores the appearance counts n1 and n2 of "い" and "ろ". The three-character chain 1203 stores the pair n2, 0 (the appearance count n2 of "ろ" combined with the fixed value 0 for "a") and the pair 0, n3 (the value 0 for "a" combined with the appearance count n3 of "は"). The two-character chain 1204 stores the appearance counts n3 and n4 of "は" and "に".

【００９１】In the collation method according to the fifth method of the present invention, the two-character chain 1202 corresponding to "いろ" of the two-character chain 1212 is detected first. Next, the three-character chain 1203 corresponding to "ろaは" of the three-character chain 1213 that follows 1212 is detected, and it is determined whether the appearance count n2 of "ろ" in 1202 matches the appearance count n2 of "ろ" in 1203. If they match, the value 0 corresponding to the "a" in the middle of the three-character chain is detected, and then the value 0 of the "a" preceding "は" is detected. The two-character chain 1204 corresponding to "はに", the two-character chain that follows 1213, is then detected, and it is determined whether the appearance count n3 of "は" detected in 1203 matches the appearance count of "は" in 1204. If they match, the character string 1211 is judged to match 1201. The character strings are collated in this way.
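
The registration of FIG. 12(b) and the collation steps above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the device of the patent: the names `split_chains`, `register`, and `matches` are hypothetical, the appearance count of a character is taken to be its running occurrence number within the registered string, the special character is assumed to occur only singly between two ordinary characters, and a single dictionary stands in for the separate two-character and three-character chain memories.

```python
from collections import defaultdict

SPECIAL = "a"  # the special character used in the example strings


def split_chains(text, special=SPECIAL):
    """Yield (start, chain): a 3-character chain when the middle character
    is the special character, otherwise a 2-character chain."""
    i = 0
    while i < len(text) - 1:
        width = 3 if i + 2 < len(text) and text[i + 1] == special else 2
        yield i, text[i:i + width]
        i += width - 1  # consecutive chains share one character


def register(text, special=SPECIAL):
    """Build the chain memory: chain -> set of per-character count tuples.
    The special character always contributes the fixed value 0."""
    running = defaultdict(int)
    counts = []
    for ch in text:
        if ch == special:
            counts.append(0)
        else:
            running[ch] += 1
            counts.append(running[ch])
    memory = defaultdict(set)
    for start, chain in split_chains(text, special):
        memory[chain].add(tuple(counts[start:start + len(chain)]))
    return memory


def matches(query, memory, special=SPECIAL):
    """Require every query chain to be registered, with the count of each
    chain's first character equal to that of the previous chain's last."""
    previous_last = None
    for _, chain in split_chains(query, special):
        entries = memory.get(chain, set())
        if previous_last is not None:
            entries = {e for e in entries if e[0] in previous_last}
        if not entries:
            return False
        previous_last = {e[-1] for e in entries}
    return True


memory = register("いろaはに")
print(memory["ろaは"])               # {(1, 0, 1)}
print(matches("いろaはに", memory))  # True
print(matches("いろはに", memory))   # False
```

Because the special character always stores 0, its frequency never enlarges the count fields; only the ordinary characters on either side of it carry counts that must line up.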

【００９２】FIG. 11 shows the configuration of a character string collation device in one example of the fifth method of the present invention.

【００９３】In FIG. 11, 1101 is a two-character chain detector that detects the two-character chains 1202 and 1204 to be registered from the character string 1201 to be registered; 1102 is a three-character chain detector that detects the three-character chain 1203 to be registered from the character string 1201; 1103 is a two-character chain memory that stores the two-character chains 1202 and 1204 and the appearance counts of their characters; 1104 is a three-character chain memory that stores the three-character chain 1203 and the appearance counts of the first and last characters of that chain; 1111 is a two-character chain detector that detects the two-character chains 1212 and 1214 to be searched from the character string 1211 to be searched; 1112 is a three-character chain detector that detects the three-character chain 1213 to be searched from the character string 1211; 1113 is a comparator that looks up the two-character chains 1212 and 1214 detected by the two-character chain detector 1111 in the two-character chain memory 1103, or looks up the three-character chain 1213 detected by the three-character chain detector 1112 in the three-character chain memory 1104, and determines whether the appearance count of the first character of each chain found matches the appearance count of the last character of the chain found immediately before it; and 1114 is a control unit that has the comparator 1113 check every two-character and three-character chain detected by the two-character chain detector 1111 and the three-character chain detector 1112, and thereby determines whether the character strings match.

【００９４】Thus, character string collation by character chains can be performed without any limit on the number of appearances of the specific special character "a".

【００９５】(Embodiment 6) FIG. 14 shows the concept of the character string collation method according to the sixth method of the present invention. In FIG. 14(a), 1401 is the character string "いろaはに" input at registration; 1402 is the two-character chain "いろ" registered first; 1403 is the three-character chain "ろはは", obtained from the three-character chain "ろaは" that follows 1402 and contains the inserted special character by converting its second character, the special character "a", into its third character "は"; 1404 is the two-character chain "ろは" formed by the first and second characters of the three-character chain 1403 after this conversion; 1405 is the two-character chain "はは" formed by the second and third characters of the three-character chain 1403; and 1406 is the two-character chain "はに" that follows 1405. In FIG. 14(c), 1411 is the search character string "いろaはに" used at search time; 1412 is the two-character chain "いろ" searched first; 1413 is the three-character chain "ろはは", obtained from the three-character chain "ろaは" that follows 1412 by converting its second character, the special character "a", into its third character "は"; 1414 is the two-character chain "ろは" formed by the first and second characters of the three-character chain 1413 after this conversion; 1415 is the two-character chain "はは" formed by the second and third characters of the three-character chain 1413; and 1416 is the two-character chain "はに" that follows 1415.

【００９６】In FIG. 14(b), the two-character chain 1402 stores the appearance counts n1 and n2 of "い" and "ろ"; the two-character chain 1404 stores n2, n3, the combination of the appearance count n2 of "ろ" and the appearance count n3 of "は", the third character of 1403; the two-character chain 1405 stores n3, n3, the appearance count n3 of the third character "は" of 1403 taken twice; and the two-character chain 1406 stores the appearance counts n3 and n4 of "は" and "に".
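
The conversion of FIG. 14 and the count pairs of FIG. 14(b) can be illustrated with a short Python sketch. The function name `substituted_chains` is hypothetical, the appearance count is assumed to be the running occurrence number of each character in the string, and the special character is assumed always to be followed by an ordinary character.

```python
def substituted_chains(text, special="a"):
    """Replace each special character with the character that follows it, then
    return the overlapping 2-character chains with their count pairs."""
    running = {}
    chars, counts = [], []
    for i, ch in enumerate(text):
        if ch == special:
            following = text[i + 1]  # assumption: an ordinary character follows
            chars.append(following)
            # the stand-in takes the count the following character is about to get
            counts.append(running.get(following, 0) + 1)
        else:
            running[ch] = running.get(ch, 0) + 1
            chars.append(ch)
            counts.append(running[ch])
    return [(chars[i] + chars[i + 1], (counts[i], counts[i + 1]))
            for i in range(len(chars) - 1)]


for chain, pair in substituted_chains("はいろaはに"):
    print(chain, pair)
```

For the input "はいろaはに" the chain "はは" receives the pair (2, 2), which corresponds to the n3, n3 entry described for chain 1405: both characters of the generated chain carry the same count.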

【００９７】In the collation method according to the sixth method of the present invention, the two-character chain 1402 corresponding to "いろ" of the two-character chain 1412 is detected first. Next, the two-character chain 1404 corresponding to "ろは", the first two-character chain 1414 of the three-character chain 1413 that follows 1412, is detected, and it is determined whether the appearance count n2 of "ろ" in 1402 matches the appearance count n2 of "ろ" in 1404. If they match, the two-character chain 1405 corresponding to "はは", the second two-character chain 1415 of the three-character chain, is detected, and it is confirmed that the appearance count n3 of "は" matches and that the appearance counts of the first character "は" and the second character "は" of chain 1405 are both n3. Next, the two-character chain 1406 corresponding to "はに" of the two-character chain 1416 is detected, and it is determined whether the appearance count n3 of "は" detected in 1405 matches the appearance count of "は" in the two-character chain 1406. If they match, the character string 1411 is judged to match 1401. The character strings are collated in this way.

【００９８】FIG. 13 shows the configuration of a character string collation device in one example of the sixth method of the present invention.

【００９９】In FIG. 13, 1301 is a two-character chain detector that detects, from the character string 1401 to be registered, the two-character chains 1402 and 1406 to be registered and the appearance count of each character; 1302 is a three-character chain detector that detects the three-character chain 1403 to be registered from the character string 1401; 1303 is a special two-character chain generator that replaces the special character inserted in the three-character chain 1403 with the character that follows it and detects the two two-character chains 1404 and 1405 and the appearance count of each character; 1304 is a two-character chain memory that stores the two-character chains 1402, 1404, 1405, and 1406 and the appearance counts of their characters; 1311 is a two-character chain detector that detects the two-character chains 1412 and 1416 to be searched from the character string 1411 to be searched; 1312 is a three-character chain detector that detects the three-character chain 1413 to be searched from the character string 1411; 1313 is a special two-character chain generator that replaces the special character inserted in the three-character chain 1413 with the character that follows it and detects the two two-character chains 1414 and 1415 and the appearance count of each character; 1314 is a comparator that looks up in the two-character chain memory 1304 either the two-character chains 1412 and 1416 detected by the two-character chain detector 1311 or the two-character chains 1414 and 1415 generated by the special two-character chain generator 1313, determines whether the appearance count of the first character of each chain found matches the appearance count of the last character of the chain found immediately before it, and, for the special two-character chain 1415, also determines that the appearance counts of its first and second characters match; and 1315 is a control unit that has the comparator 1314 check every two-character and three-character chain detected by the two-character chain detector 1311 and the three-character chain detector 1312, and thereby determines whether the character strings match.

【０１００】Thus, character string collation by character chains can be performed without any limit on the number of appearances of the specific special character "a".

【０１０１】(Embodiment 7) FIG. 15 shows the concept of the character string collation method according to the seventh method of the present invention. In FIG. 15(a), 1501 is the character string "いろaはにaいろaはとa" input at registration, 1502 is the two-character chain "いろ" registered first, 1503 is the two-character chain "ろa" that follows 1502 and contains the special character "a", and 1504 is the next two-character chain "aは", which overlaps 1503; the two-character chains 1505 to 1512 are generated in the same way.

【０１０２】In FIG. 15(b), the two-character chain 1502 stores the appearance counts n1 and n2 of "い" and "ろ"; the two-character chain 1505 stores the appearance counts n3 and n4 of "は" and "に"; the two-character chain 1508 stores the appearance counts n1+1 and n2+1 of "い" and "ろ"; and the two-character chain 1511 stores the appearance counts n3+1 and n5 of "は" and "と". For example, FIG. 15(e) shows the set of appearance counts stored for the two-character chain "はと".

【０１０３】Next, the maximum value of the appearance count of the special character "a" is set to 2 in advance, and the count is defined so that it takes the maximum value when the remainder of dividing the number of appearances of the special character by the maximum value is 0. The appearance count of the special character is then always either 1 or 2. In FIG. 15(b), the special character "a" in the two-character chain 1503 is its first appearance, so its appearance count is 1, and the appearance count of "a" in the character chain 1504 is likewise 1. The special character "a" in the two-character chain 1506 is its second appearance, so its appearance count is 2, and the appearance count of "a" in the character chain 1507 is likewise 2. The special character "a" in the two-character chain 1509 is the second occurrence of the appearance count 1, so its appearance count is 1, and the appearance count of "a" in the character chain 1510 is likewise 1. The special character "a" in the character chain 1512 is the second occurrence of the appearance count 2, so its appearance count is 2.
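
The capped count described here amounts to a remainder with 0 mapped to the maximum. A minimal Python sketch (the name `capped_count` is hypothetical):

```python
def capped_count(occurrence, maximum=2):
    """Map the k-th appearance of the special character to a value in
    1..maximum: the remainder of k / maximum, with 0 replaced by maximum."""
    remainder = occurrence % maximum
    return maximum if remainder == 0 else remainder


# The four appearances of "a" in "いろaはにaいろaはとa"
print([capped_count(k) for k in range(1, 5)])  # [1, 2, 1, 2]
```

The stored count therefore never exceeds the chosen maximum, however many times the special character appears in the registered text.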

【０１０４】Next, two-character chains that contain the special character are sorted by the character type of the second character and stored. In FIG. 15(c), the set for the two-character chain "ろa" consists of the pair n2, 1 of the two-character chain 1503 and the pair n2+1, 1 of the two-character chain 1509. In FIG. 15(d), the set for the two-character chain "a*", where * stands for the character types "い" and "は" that appear, is built from the two-character chains 1504, 1507, and 1510 sorted by character type. The sort by character type follows character code order; when the appearance counts are equal, the order of appearance in the registered character string is used. After sorting, the character chains are stored as shown in FIGS. 15(c) and 15(d).
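
The ordering of the "a*" chains can be illustrated with a stable sort keyed on the character code of the second character. The tuple layout and the concrete counts below are assumptions made for the example (n1 = n3 = 1, that is, the string is registered into an empty memory), and leaving ties in registration order is one reading of the tie-break rule stated above.

```python
# (chain, count of "a", count of the other character), in registration order:
# chains 1504, 1507, 1510 of "いろaはにaいろaはとa"
entries = [("aは", 1, 1), ("aい", 2, 2), ("aは", 1, 2)]

# sorted() is stable, so chains with the same second character keep the
# order in which they appeared in the registered string.
ordered = sorted(entries, key=lambda entry: ord(entry[0][1]))
print(ordered)
# [('aい', 2, 2), ('aは', 1, 1), ('aは', 1, 2)]
```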

【０１０５】The collation method according to the seventh method of the present invention is described below, using the search character string "ろaはと" shown in FIG. 15(f) as an example.

【０１０６】First, the continuity of the character chains "ろa" and "aは" is collated. When collation starts, the duplication counters for "ろa" and "aは" are reset to 0. For the two-character chain corresponding to "ろa" of the character chain 1513, the chain 1503 in FIG. 15(c) is detected first, and from the appearance count 1 of "a" at that point, the duplication count 0 for appearance count 1 is stored in the "ろa" duplication counter shown in FIG. 15(g). Next, for the two-character chain corresponding to "aは" of the character chain 1514, the appearance counts are detected in order starting from the first "aは" chain in FIG. 15(d), and its duplication count 0 is stored in the "aは" duplication counter. Collation checks whether the appearance count of the second character of chain 1503 matches the appearance count of the first character of chain 1504, and whether the duplication counts of "ろa" and "aは" match; if they do, the character chain "はと" is collated next. Here, the appearance count of the second character of the character chain 1504 differs from the appearance count of the first character of the character chain 1510, so collation moves on to the next character chain. The special-character appearance counts of the character chains 1503 and 1509 are checked for duplication, and if they are duplicated, the "ろa" duplication counter 1516 is incremented by one. Accordingly, the "aは" duplication counter 1517, which corresponds to the character chain 1510, is also incremented by one. Then, because the "aは" duplication counter 1517 is 1, the character chain 1510, whose first character is duplicated exactly once, is detected in FIG. 15(d). From the continuity collation of the character chain "はと", the character chains 1509, 1510, and 1511 are finally detected as a continuous character string. At this point, the duplication counter value 1 for "ろa" and "aは" (1516, 1517) in FIG. 15(g) is stored.

【０１０７】The character strings are collated as described above. The maximum value for the special character can be specified arbitrarily. In this example, the appearance count of the special character is the remainder of dividing its number of appearances by a maximum appearance count specified in advance, with a remainder of 0 replaced by the maximum value. However, the appearance count only needs to be no greater than the maximum value and unique apart from the repetition, so other choices are possible: the remainder of division by a value no greater than the maximum appearance count, the maximum value minus the remainder, ascending even numbers, ascending odd numbers, descending even numbers, descending odd numbers, and so on. For example, with a maximum value of 10, the repeating sequence 3, 5, 7, 8, 6, 4, 2 may be used as the appearance counts of the special character.

【０１０８】FIG. 16 shows the configuration of a character string collation device in one example of the seventh method of the present invention.

【０１０９】In FIG. 16, 1601 is a special character detector that detects the specific special character "a" in the character string 1501 to be registered; 1602 is a two-character chain detector that calculates, from the character string 1501, the character chains that contain no special character and their appearance counts, and stores 1502, 1505, 1508, and 1511 in the two-character chain memory 1606; 1603 is a special character chain detector that, for each two-character chain containing the special character, obtains from the two-character chain memory 1606 the appearance count of the first or second character that is not the special character, calculates the appearance count of the special character so that it does not exceed the maximum value, stores the duplication count of that appearance count in the appearance duplication memory 1604, calculates the appearance count of the next special character from the appearance duplication memory 1604, and thereby determines the character chains containing the special character and their appearance counts, namely 1503, 1504, 1506, 1507, 1509, 1510, and 1512; 1605 is a special character chain sorter that takes from the special character chain detector 1603 the chains whose first character is the special character, sorts them by the character type of the second character, and stores the sorted result (FIG. 15(d)) in the two-character chain memory 1606; 1607 is a special character detector that detects the special character "a" in the search character string (FIG. 15(f)); 1608 is a two-character chain detector that generates the two-character chains containing no special character from the search character string; 1609 is a special character chain detector that generates the two-character chains containing the special character from the search character string; 1610 is a comparator that retrieves from the two-character chain memory 1606 the character chains and appearance counts corresponding to the character chains 1513, 1514, and 1515 detected by the two-character chain detector 1608 and the special character chain detector 1609, sets the appearance duplication counter memory 1612 to 0 for the two-character chains 1513 and 1514, calculates a duplication count of 0 for the two-character chain 1503 and its second character and a duplication count of 0 for the two-character chain 1504 and its first character, then calculates a duplication count of 1 for the two-character chain 1509 and its second character and a duplication count of 1 for the two-character chain 1510 and its first character, and calculates 1511 for the two-character chain 1515; and 1611 is a controller that determines whether the character strings match from the results calculated by the two-character chain detector 1608 and the comparator 1610.

【０１１０】Thus, character string collation by character chains can be performed without any limit on the number of appearances of the specific special character "a".

【０１１１】(Embodiment 8) FIG. 17 shows the concept of the registration method and the character string collation method according to the eighth method of the present invention. The registration method is described first.

【０１１２】In FIG. 17(d), 1708 is the character string "あいaあいaあいaあいあい" input at registration, 1709 is the two-character chain "あい" registered first, 1710 is the two-character chain "いa" that follows 1709 and contains the special character "a", and 1711 is the next two-character chain "aあ"; the two-character chains 1712 to 1720 are generated in the same way. From these two-character chains, character chain data is generated, each item being a set that holds the document number and the appearance counts, or fixed values, of the first and second characters.

【０１１３】FIGS. 17(a) to 17(c) show the character chain data formats, which differ according to the character types that make up the two-character chain. FIG. 17(a) is character chain data containing no special character; the fields that store the appearance counts of the first and second characters are the same size. FIGS. 17(b) and 17(c) are character chain data containing a special character; the field that stores the appearance count of the special character is larger than the field for the character that is not a special character. The field for the character that is not a special character stores a designated value (0 in this example).

【０１１４】In FIG. 17(e), character chain data is created for the registered character string 1708. Let the appearance count of "あ" be n1 and the appearance count of "い" be n2. The two-character chain 1709 is the character chain data for "あい", in which the appearance count of "あ" is n1 and that of "い" is n2. However, the character chain data for the two-character chain "いa" 1710, which follows "あい", has the format of FIG. 17(c), so in the data for "いa" the value corresponding to "い" is 0 and the value corresponding to "a" is the special character appearance count 1. Consequently, to keep the character chain data continuous, the field for the second character in the data for "あい" becomes 0, as shown at 1721. The character chain data 1722 to 1732 can be constructed in the same way.
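
The effect of this continuity rule on the stored values can be sketched in Python. The sketch is illustrative only: `chain_data` is a hypothetical name, each item is modelled as (document number, first value, second value), and it is an assumption of the sketch that the running counter of an ordinary character keeps advancing even at positions where the fixed value 0 is stored.

```python
def chain_data(text, doc_no, special="a"):
    """Return [(chain, (doc_no, first_value, second_value)), ...].
    An ordinary character next to the special character stores the fixed
    value 0; every other character stores its running appearance count."""
    running = {}
    values = []
    for i, ch in enumerate(text):
        running[ch] = running.get(ch, 0) + 1
        beside_special = (
            (i > 0 and text[i - 1] == special)
            or (i + 1 < len(text) and text[i + 1] == special)
        )
        if ch != special and beside_special:
            values.append(0)
        else:
            values.append(running[ch])
    # One value per position guarantees that the second value of each chain
    # equals the first value of the chain that follows it.
    return [(text[i:i + 2], (doc_no, values[i], values[i + 1]))
            for i in range(len(text) - 1)]


for chain, data in chain_data("あいaあいあい", doc_no=1):
    print(chain, data)
```

In the printed result the first "あい" item carries 0 in its second field, matching the adjustment described for 1721, while the "あい" chains that are not adjacent to "a" keep ordinary counts.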

【０１１５】The character chain data generated in FIG. 17(e) is stored separately for each two-character chain combination that appears, as shown in FIGS. 17(f) to 17(i).

【０１１６】The method of generating character chain data described above is realized by the flow of FIG. 18 and the configuration of the character string collation device of FIG. 20. In FIG. 20, 2001 is a two-character chain detector that creates two-character chains and document numbers from the character string to be registered; 2002 is an appearance count calculator that calculates, for each two-character chain, the appearance count or value for each of its character types and, for a two-character chain that follows a two-character chain containing the special character, asks the special character chain detector 2003 whether the appearance count already calculated needs to be replaced, and recalculates the appearance count from the result; and 2004 is a two-character chain memory in which the appearance count calculator 2002 stores each set of document number and appearance counts as character chain data. In FIG. 18, the character chain detector 2001 reads the registered character string data (step 1801), sets the document number (step 1802), reads registered character strings and assigns document numbers up to the maximum number of documents (step 1803), and creates the two-character chain pairs (Ak, Ak+1), where Ak and Ak+1 are the k-th and (k+1)-th character types (step 1804). The appearance count calculator 2002 then checks whether the two-character chain contains the special character (step 1805). If it does, the calculator counts the appearance count N(Ak) or N(Ak+1) of the special character and creates the character chain data Sk (steps 1806 and 1808). If it does not, the calculator counts the appearance counts and creates the character chain data Sk (step 1810). Next, for the character chain data Sk+1 that follows the character chain data Sk, the special character chain detector 2003 corrects the values so that the appearance count or value for the second character of Sk equals the appearance count or value for the first character of Sk+1 (steps 1807, 1808, and 1811). This is repeated for all two-character chains and all registered character strings (steps 1812 to 1814), and the generated character chain data is stored in the two-character chain memory 2004.

【０１１７】Next, the character string collation method is described, using the search character string "いaあいa" shown at 1732 in FIG. 17(j) as an example. From the search character string, the two-character chain "いa" 1733, the next two-character chain "aあ" 1734, and likewise the chains 1735 and 1736 are created. The character chain data corresponding to these two-character chains is retrieved from the two-character chain memory 2004, and continuity is collated in order starting from 1733. The concept of continuity collation is shown in FIG. 17(k). For the two-character chain "いa" 1733, the character chain data for "いa" in FIG. 17(g) is searched from the beginning and the character chain data 1722 is retrieved. The character types of the character chain data 1722 are examined, and because its second character is the special character "a" designated in advance, the appearance count of "a" is stored in the special character appearance counter memory (2007 in FIG. 20). Next, for the two-character chain "aあ" that follows "いa", the character chain data for "aあ" in FIG. 17(h) is searched from the beginning to find an item whose first-character appearance count matches the second-character appearance count of the character chain data 1722, and the character chain data 1723 is obtained. The character chain data 1722 and 1723 are thereby judged to be continuous.

【０１１８】Next, for the two-character chain "あい" that follows "aあ", the character chain data of "あい" in FIG. 17(f) is searched from the beginning for an entry whose first-character appearance count matches the second-character appearance count of character chain data 1723, and character chain data 1724 is obtained. Character chain data 1723 and 1724 are thereby determined to be continuous.
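The continuity test used in these steps can be illustrated with a minimal Python sketch. It covers only the basic case of a single registered string with no special characters; the special character appearance counter and the document numbers are deliberately left out, and the function names `register` and `contains` are illustrative assumptions rather than names from the specification.

```python
from collections import defaultdict


def register(text):
    """Build chain data: two-character chain -> list of (count of 1st char, count of 2nd char)."""
    seen = defaultdict(int)  # running appearance count per character
    counts = []
    for ch in text:
        seen[ch] += 1
        counts.append(seen[ch])
    memory = defaultdict(list)
    for k in range(len(text) - 1):
        memory[text[k:k + 2]].append((counts[k], counts[k + 1]))
    return memory


def contains(memory, query):
    """Return True if the chains of `query` link up through matching appearance counts."""
    chains = [query[k:k + 2] for k in range(len(query) - 1)]
    if not chains:
        return False
    # Second-character counts of every entry that could end the match so far.
    live = {second for _, second in memory.get(chains[0], [])}
    for chain in chains[1:]:
        # An entry continues the match only if its first-character count
        # equals the second-character count of a surviving previous entry.
        live = {second for first, second in memory.get(chain, []) if first in live}
        if not live:
            return False
    return bool(live)
```

Two entries are treated as continuous exactly when they refer to the same occurrence of the shared character, which is why a string whose chains all exist individually but never line up is rejected.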

【０１１９】Next, for the two-character chain "いa" that follows "あい", the character chain data of "いa" in FIG. 17(g) is searched from the beginning to check whether the second-character appearance count of character chain data 1724 matches the first-character appearance count of an entry in FIG. 17(g). Because the special character "a" appears again in the two-character chain "いa", the value of the special character appearance counter 1738 is incremented by one. Searching FIG. 17(g) from the beginning for an entry whose first-character value matches the second-character appearance count of character chain data 1724 finds 1722; however, because the special character appearance counter 1738 shows that this is the second appearance of the special character, the search moves on to the next character chain data and obtains 1725. Character chain data 1724 and 1725 are thereby determined to be continuous, and it is determined that a registered character string containing the search character string exists.

【０１２０】The character string collation method described above is realized by the flow of FIG. 19 and the configuration of the character string collation device of FIG. 20. In FIG. 20, 2005 is a two-character chain detector that creates two-character chains from the character string to be searched. 2006 is a comparator that examines each character type making up a two-character chain. When the chain contains no special character, the comparator takes the character chain data corresponding to the consecutive character chains detected by the character chain detector 2005 and compares the second-character appearance count of one entry with the first-character appearance count of the entry for the following chain, thereby determining whether the chains are continuous as a search character string. When the chain contains a special character, the comparator compares appearance counts in the same way, stores the appearance count of the designated special character in the special character appearance counter memory 2007, and compares on the basis that appearance counts do not repeat except between consecutive character chains. 2008 is a controller that determines from the result of the comparator 2006 whether the character chain data is continuous. In FIG. 19, the two-character chain detector 2005 reads the search character string (step 1901), creates the two-character chains (Ak, Ak+1) (step 1902), and sets the two-character chains from the beginning (step 1903). The comparator 2006 takes two-character chains from the two-character chain detector 2005 until continuity can no longer be checked (step 1904). For two consecutive two-character chains (Ak, Ak+1) and (Ak+1, Ak+2), where Ak, Ak+1, and Ak+2 are the k-th, (k+1)-th, and (k+2)-th character types, it retrieves the corresponding character chain data Sl(N(Ak), N(Ak+1)) and Sm(M(Ak), M(Ak+1)) from the beginning (step 1905), where Sl and Sm are the l-th and m-th character chain data and N(Ak) and M(Ak+1) are the appearance counts or values of the character types Ak and Ak+1. It then checks whether the two-character chain contains a special character (step 1906). If it does, the appearance count of the special character, N(Ak) or N(Ak+1), is stored as T. The comparator then checks whether the second-character appearance count N(Ak+1) of Sl matches the first-character appearance count M(Ak+1) of Sm (step 1908); if they do not match, Sm is advanced to the next character chain data Sm+1 (step 1910) and processing returns to step 1905. If the counts match, it is determined whether the character chain data contains a special character and whether its appearance count matches T (step 1909). If the condition of step 1909 is not satisfied, the result that the character chain data is continuous is returned to the controller 2008 (step 1911), and continuity collation of the next two-character chain begins (step 1912).

【０１２１】Character string collation by character chains is thus possible even when the specific special character "a" appears far more often than other character types. In the eighth method of the present invention, when the search character string begins with a special character, as in "aあい", the first-character appearance count in the character chain data of "あい" is 0. Collation can therefore be shortened by first checking whether the first-character appearance count in the character chain data of "あい" is 0, without referring to the character chain data of "aあ".

【０１２２】(Embodiment 9) FIG. 22 shows the concept of the character string collation method according to the ninth method of the present invention. In FIG. 22(a), 2201 is the character string "いろaはに" input at registration, 2202 is the two-character chain "いろ" registered first, and 2203 is the special two-character chain "いは" that follows 2202, formed by pairing the first character of 2202 with the character "は" that follows the special character "a"; its second character is also the first character of the two-character chain "はに" (2204) that follows the special character "a". In FIG. 22(c), 2205 is the search character string "いろaはに" at search time, 2206 is the two-character chain "いろ", 2207 is the character chain "いは" formed by pairing the first character of 2206 with the character "は" immediately after the special character, and the remaining chain is the two-character chain "はに" after the special character "a".

【０１２３】In FIG. 22(b), the two-character chain 2202 stores the appearance counts n1 and n2 of "い" and "ろ", the two-character chain 2203 stores the appearance counts n1 and n3 of "い" and "は", and the two-character chain 2204 stores the appearance counts n3 and n4 of "は" and "に".
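The registration layout of FIG. 22(b) can be sketched in Python as follows. This is an illustration under stated assumptions, not the device's actual procedure: the function name `register_chains` is hypothetical, appearance counts are taken as running counts that skip the special character, and a special character with fewer than two characters before it simply produces no special chain.

```python
from collections import defaultdict

SPECIAL = "a"  # assumed special character, as in the example string


def register_chains(text, special=SPECIAL):
    """Return chain -> list of (count of 1st char, count of 2nd char), skipping specials."""
    seen = defaultdict(int)   # running appearance count per character
    memory = defaultdict(list)
    prev = []                 # (char, count) for each non-special character so far
    after_special = False     # True when a special character precedes the current one
    for ch in text:
        if ch == special:
            after_special = True
            continue
        seen[ch] += 1
        cur = (ch, seen[ch])
        if after_special and len(prev) >= 2:
            # Special chain: first character of the chain before the special
            # character, paired with the character right after it.
            first = prev[-2]
            memory[first[0] + ch].append((first[1], cur[1]))
        elif not after_special and prev:
            # Ordinary chain of two adjacent characters.
            memory[prev[-1][0] + ch].append((prev[-1][1], cur[1]))
        prev.append(cur)
        after_special = False
    return memory
```

For "いろaはに" this yields the three chains "いろ", "いは", and "はに", and no chain that contains the special character itself.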

【０１２４】In the collation method according to the ninth method of the present invention, FIG. 22(b) is searched giving priority to whichever of the two-character chain and the special two-character chain has fewer entries. For example, (1) when there are more entries for the two-character chain "いろ" than for the two-character chain "いは", "いは" is used as the first search character chain, and in the opposite case "いろ" is used; or (2) both the two-character chain before the special character and the two-character chain that combines the first character of that chain with the character immediately after the special character are used as the first search character chains. Thereafter, as in the fourth invention, the special two-character chain 2207 and the two-character chain 2206 are detected, and it is then determined whether the appearance count n3 of the second character "は" of the special character chain 2207 matches the appearance count of the first character "は" of the two-character chain 2208. The character strings are collated in this way.

【０１２５】FIG. 21 shows the configuration of a character string collation device in one embodiment of the ninth method of the present invention. In FIG. 21, 2101 is a special character detector that detects the special character "a" in the character string 2201 to be registered. 2102 is a two-character chain detector that, where no special character intervenes, generates two-character chains from the character string 2201, detecting the two-character chains 2202 and 2204, and registers them in the two-character chain memory 2104 together with the appearance counts of their characters. 2103 is a special two-character chain detector that generates from the character string 2201 the special character chain 2203 spanning the special character and registers it in the two-character chain memory 2104 together with the appearance counts of its characters. 2105 is a special character detector that detects the special character "a" in the character string 2205 to be searched. 2106 is a two-character chain detector that, where no special character intervenes, generates two-character chains from the character string 2201 and detects the two-character chains 2202 and 2204. 2107 is a special two-character chain detector that generates from the character string 2205 the special character chain 2203 spanning the special character. The comparator 2108 looks up in the two-character chain memory 2104 the two-character chain 2207 or 2206 supplied by the two-character chain detector 2106 and the special two-character chain detector 2107, and determines the continuity of the character chains from the appearance counts of their characters; the control unit 2109 determines whether the search character string matches.

【０１２６】With this method, chains can therefore be formed across the specific special character "a", linking the characters before and after it, regardless of how often "a" appears, so character string collation by character chains can be performed without being constrained by the special character "a". Needless to say, for a collation that includes a special character, for example "aは", the special character can be ignored and the collation performed on character chains whose first character is "は".

【０１２７】(Embodiment 10) FIG. 23 is a block diagram of a character string collation device according to the tenth embodiment of the present invention, and FIG. 24 shows the concept of the character string collation method according to the tenth method of the present invention and the storage format of a recording medium storing full-text search data.

【０１２８】In FIG. 24(a), 2401 is the character string "いろaはに" input at registration, 2402 is the two-character chain "いろ" registered first, 2403 is the three-character chain "ろaは" that follows 2402, and 2404 is the two-character chain "はに" that follows 2403. Here "a" denotes a special character inserted into the character string, for example to mark a semantic break.

【０１２９】In FIG. 24(c), 2411 is the search character string "いろaはに" at search time, 2412 is the two-character chain "いろ" searched first, 2413 is the three-character chain "ろaは" that follows 2412, and 2414 is the two-character chain "はに" that follows 2413.

【０１３０】In FIG. 24(b), the two-character chain 2402 stores the appearance position n of "い", the three-character chain 2403 stores the appearance position n+1 of "ろ", and the two-character chain 2404 stores the appearance position n+2 of "は". The two-character chains 2402 and 2404 are stored in a different area from the three-character chain 2403, which distinguishes two-character chains from three-character chains. For the search character string of FIG. 24(c), the collation method according to the tenth method of the present invention first detects, in the area storing two-character chains, the two-character chain 2402 corresponding to "いろ" of the two-character chain 2412 and obtains its appearance position n. It then detects, in the area storing three-character chains, the three-character chain 2403 "ろは" corresponding to "ろaは" of the three-character chain 2413 that follows 2412, and determines whether its appearance position n+1 equals the appearance position of 2402 plus 1. If so, it detects, in the area storing two-character chains, the two-character chain 2404 corresponding to the two-character chain 2414 "はに" that follows 2413, and determines whether its appearance position n+2 equals the appearance position of 2403 plus 1. If so, the character string 2411 is determined to match 2401. The character strings are collated in this way.
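The two-area scheme can be sketched in Python as follows. The sketch assumes that positions number the chains while skipping the special character (so the three chains of "いろaはに" sit at n, n+1, n+2), that a run of special characters counts as one, and that a document id is stored with each position; the names `chains`, `new_areas`, `register`, and `search` are illustrative and do not come from the specification.

```python
from collections import defaultdict

SPECIAL = "a"  # assumed special character


def chains(text, special=SPECIAL):
    """Split text into (kind, key) pairs; kind is 3 when a special character sits inside."""
    out = []
    prev = None
    gap = False  # True when a special character was skipped since `prev`
    for ch in text:
        if ch == special:
            gap = prev is not None
            continue
        if prev is not None:
            out.append((3 if gap else 2, prev + ch))
        prev, gap = ch, False
    return out


def new_areas():
    """One storage area per chain kind: key -> set of (doc_id, position)."""
    return {2: defaultdict(set), 3: defaultdict(set)}


def register(doc_id, text, areas):
    """Store each chain's position in the area for its kind."""
    for pos, (kind, key) in enumerate(chains(text)):
        areas[kind][key].add((doc_id, pos))


def search(query, areas):
    """Return ids of documents where the query chains occur at consecutive positions."""
    qs = chains(query)
    if not qs:
        return set()
    kind, key = qs[0]
    live = set(areas[kind].get(key, ()))
    for kind, key in qs[1:]:
        stored = areas[kind].get(key, ())
        # Keep a candidate only if the next chain is stored at position + 1.
        live = {(d, p + 1) for d, p in live if (d, p + 1) in stored}
    return {d for d, _ in live}
```

Because "ろaは" is looked up only in the three-character area and "ろは" only in the two-character area, a document containing "いろaはに" and one containing "いろはに" are returned for different queries.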

【０１３１】FIG. 23 shows the configuration of a character string collation device in one embodiment of the tenth method of the present invention.

【０１３２】In FIG. 23, 2301 is a two-character chain position detector that detects, from the character string 2401 to be registered, the two-character chains 2402 and 2404 and their appearance positions; 2302 is a three-character chain position detector that detects, from the character string 2401, the three-character chain 2403 and its appearance position; 2303 is a two-character chain position memory that stores the two-character chains 2402 and 2404 and their appearance positions; 2304 is a three-character chain position memory that stores the three-character chain 2403 and its appearance position; 2311 is a two-character chain detector that detects the two-character chains 2412 and 2414 in the character string 2411 to be searched; 2312 is a three-character chain detector that detects the three-character chain 2413 in the character string 2411 to be searched; 2313 is a comparator that looks up the two-character chains 2412 and 2414 detected by the two-character chain detector 2311 in the two-character chain position memory 2303, or the three-character chain 2413 detected by the three-character chain detector 2312 in the three-character chain position memory 2304, and determines whether the appearance position of each chain found equals the appearance position of the chain found immediately before it plus 1; and 2314 is a control unit that has the comparator 2313 check every two-character and three-character chain detected by the detectors 2311 and 2312 and determines whether the character strings match.

【０１３３】The operation of the character string collation device configured as described above will now be described for the case where the two-character chains 2402 and 2404 of FIG. 24(b) are stored in the two-character chain position memory 2303, the three-character chain 2403 of FIG. 24(b) is stored in the three-character chain position memory, and "いろaはに" of FIG. 24(c) is input as the search character string.

【０１３４】When the search character string "いろaはに" is input, the two-character chain detector detects the two-character chains "いろ" and "はに", which do not contain the "a" designated in advance as the special character, and outputs them to the comparator 2313. The three-character chain detector detects the three-character chain "ろaは", centered on the inserted "a" designated in advance as the special character, and outputs it to the comparator 2313.

【０１３５】The chains may be output to the comparator in chain order, "いろ", "ろaは", "はに", or "いろ", "ろaは", and "はに" may be output simultaneously together with information on how the characters chain.

【０１３６】The comparator 2313 distinguishes output of the two-character chain detector from output of the three-character chain detector, looks up "いろ" and "はに" in the two-character chain memory 2303 and the chain "ろは" corresponding to "ろaは" in the three-character chain memory 2304, and judges the chaining based on the appearance counts.

【０１３７】Because the comparator distinguishes two-character chains from three-character chains and looks each up in a different chain memory, searches for "いろaはに" and "いろはに" can be told apart.

【０１３８】As described above, according to this embodiment, collation is not constrained by the number of appearances of the specific special character "a" designated in advance, growth of the chain memory caused by special characters is avoided, and chains can at the same time be extracted efficiently by matching appearance counts.

【０１３９】Although the special character is written as "a" in this embodiment, a run of special characters "a, a, ..., a" may be replaced with a single "a". Character string collation by character chains that distinguishes whether special characters are inserted can then be performed without being constrained by the number of appearances of special characters.

【０１４０】That is, "いろ (one or more special characters) はに" and "いろはに" can be searched for as different search strings.

【０１４１】In this embodiment, separate chain memories are provided to distinguish two-character chains from three-character chains (chains with an inserted special character). Alternatively, a displacement that identifies whether a chain is a two-character or three-character chain can be stored, so that both kinds are held in the same memory, for example as shown in FIG. 24(d). In this case, the appearance positions of the character chains 2402, 2403, and 2404 are n, n+1, and n+3, and their displacements are 1, 2, and 1. The continuity of the chains is checked by comparing whether the appearance position of each chain equals the appearance position of the immediately preceding chain plus the displacement. The displacement thus identifies whether a chain is a two-character or three-character chain, the data can be stored in the same area, and character strings can be collated by the tenth method of the present invention.
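The single-memory variant with displacements can be sketched in Python as follows. The sketch assumes raw character positions (special characters included), a single registered string, and that the displacement used in the comparison is the one stored with the preceding chain, which reproduces the positions n, n+1, n+3 and displacements 1, 2, 1 given above. Runs of special characters are not collapsed here, and the function names are hypothetical.

```python
from collections import defaultdict

SPECIAL = "a"  # assumed special character


def chains(text, special=SPECIAL):
    """Return (key, position, displacement) per chain, using raw character positions."""
    found = [(i, ch) for i, ch in enumerate(text) if ch != special]
    return [(a + b, i, j - i) for (i, a), (j, b) in zip(found, found[1:])]


def register(text, memory):
    """Store every chain of `text` with its position and displacement in one memory."""
    for key, pos, disp in chains(text):
        memory[key].append((pos, disp))


def contains(query, memory):
    """True if the query chains occur with matching displacements at linked positions."""
    qs = chains(query)
    if not qs:
        return False
    key, _, disp = qs[0]
    # Each live value is the position at which the next chain must start.
    live = {p + d for p, d in memory.get(key, []) if d == disp}
    for key, _, disp in qs[1:]:
        live = {p + d for p, d in memory.get(key, []) if d == disp and p in live}
    return bool(live)
```

A displacement of 1 marks an ordinary two-character chain and a displacement of 2 marks a chain with one special character inside, so the stored value does the job of the two separate areas.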

【０１４２】(Embodiment 11) FIG. 26 is a conceptual diagram showing the configuration of a character string collation device according to the eleventh embodiment of the present invention, and FIG. 25 shows the concept of the character string collation method according to the eleventh method of the present invention and the storage format of a recording medium storing full-text search data.

【０１４３】In FIG. 26(a), 2601 is the character string "いろaはに" input at registration; 2602 is the character string "いろa1はに", obtained from the character string 2601 by changing the specific special character "a" to "a1", which is uniquely determined by the following character "は"; 2603 is the two-character chain "いろ" registered first; 2604 is the two-character chain "ろa1" that follows 2603; 2605 is the two-character chain "a1は" that follows 2604; and 2606 is the two-character chain "はに" that follows 2605.

【０１４４】Here "a" is a special character inserted into the character string, for example to mark a semantic break, and "a1" represents a specific symbol or code that is not a search target.

【０１４５】In FIG. 26(c), 2611 is the search character string "いろaはに" at search time; 2612 is the character string "いろa1はに", obtained from the character string 2611 by changing the specific special character "a" to "a1", which is uniquely determined by the following character "は"; 2613 is the two-character chain "いろ" searched first; 2614 is the two-character chain "ろa1" that follows 2613; 2615 is the two-character chain "a1は" that follows 2614; and 2616 is the two-character chain "はに" that follows 2615.

【０１４６】In FIG. 26(b), the two-character chain 2603 stores the appearance position n of "い" in the search target character string, the two-character chain 2604 stores the appearance position n+1 of "ろ", the two-character chain 2605 stores the appearance position n+2 of "a1", and the two-character chain 2606 stores the appearance position n+3 of "は".

【０１４７】In the collation method according to the eleventh method of the present invention, the two-character chain 2603 corresponding to "いろ" of the two-character chain 2613 is detected, and the two-character chain 2604 corresponding to "ろa1" of the two-character chain 2614 that follows 2613 is detected; it is then determined whether the appearance position n+1 of the two-character chain 2604 equals the appearance position n of the detected two-character chain 2603 plus 1. If so, it is next determined whether the appearance position n+1 found for 2604, plus 1, equals the appearance position n+2 of the two-character chain 2605 corresponding to "a1は", the two-character chain that follows 2614. If so, it is next determined whether the appearance position n+2 found for 2605, plus 1, equals the appearance position n+3 of the two-character chain 2606 corresponding to "はに", the two-character chain that follows 2615. If so, the character string 2611 is determined to match 2601. The character strings are collated in this way.
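The position-based collation of this embodiment can be sketched in Python as follows. The sketch assumes a single registered string and represents the substitute "a1" as a tuple `("special", following_character)`, which is purely an illustrative encoding; a special character at the very end of a string is left unconverted because no character follows it. The function names are hypothetical.

```python
from collections import defaultdict

SPECIAL = "a"  # assumed special character


def convert(text, special=SPECIAL):
    """Replace each special character with a token derived from the character after it."""
    tokens = []
    for i, ch in enumerate(text):
        if ch == special and i + 1 < len(text):
            tokens.append(("special", text[i + 1]))  # stands in for "a1"
        else:
            tokens.append(ch)
    return tokens


def register(text, memory):
    """Store the appearance position of every two-token chain of the converted string."""
    toks = convert(text)
    for pos in range(len(toks) - 1):
        memory[(toks[pos], toks[pos + 1])].add(pos)


def contains(query, memory):
    """True if every chain of the converted query is found at the previous position + 1."""
    toks = convert(query)
    if len(toks) < 2:
        return False
    live = set(memory.get((toks[0], toks[1]), ()))
    for k in range(1, len(toks) - 1):
        stored = memory.get((toks[k], toks[k + 1]), ())
        live = {p + 1 for p in live if p + 1 in stored}
    return bool(live)
```

Since the substitute depends on the following character, the chains that contain it are spread over many distinct keys instead of piling up under a single heavily used special character.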

【０１４８】FIG. 25 shows the configuration of a character string collation device in one embodiment of the eleventh method of the present invention.

【０１４９】In FIG. 25, 2501 is a character string converter that converts the character string 2601 to be registered into the character string 2602, in which the specific special character "a" is changed to "a1", uniquely determined by the following character "は"; 2502 is a two-character chain position detector that detects, from the character string 2602, the two-character chains 2603, 2604, 2605, and 2606 to be registered and their appearance positions; 2503 is a two-character chain position memory that stores the two-character chains 2603, 2604, 2605, and 2606 and their appearance positions; 2504 is a character string converter that converts the character string 2611 to be searched into the character string 2612, in which the specific special character "a" is changed to "a1", uniquely determined by the following character "は"; 2505 is a two-character chain detector that detects the two-character chains 2613, 2614, 2615, and 2616 to be searched in the character string 2612; 2506 is a comparator that looks up the two-character chains 2613, 2614, 2615, and 2616 detected by the two-character chain detector 2505 in the two-character chain position memory 2503 and determines whether the appearance position of each chain found equals the appearance position of the chain found immediately before it plus 1; and 2507 is a control unit that has the comparator 2506 check every two-character chain detected by the two-character chain detector 2505 and determines whether the character strings match.

【０１５０】The operation of the character string collation device configured as described above will now be described. When a character string to be registered is input, the character string conversion means 2501 converts the pre-specified special character “ａ” into a symbol or code that is predetermined according to the following character and that is not a search target, that is, a symbol or code that does not occur in search character strings, and outputs the result.

【０１５１】As shown in FIG. 26(d), the character string conversion means stores a correspondence that indicates, for each character following the special symbol, which symbol the special symbol is converted to. This correspondence may differ for each character, as in 2621 and 2622, or may be defined for a group of characters, as in 2623.
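
The correspondence described above can be pictured as a small lookup table. The following Python sketch is only an illustration under stated assumptions: the symbols “ａ2”, “ａ3”, and “ａ0”, the table contents, and the function names are hypothetical and are not taken from FIG. 26(d). Only the conversion of “ａ” followed by “は” into “ａ1” comes from the text.

```python
SPECIAL = "ａ"  # the special (separator) character used in the text

# Hypothetical correspondence: following character -> replacement symbol.
PER_CHAR = {"は": "ａ1", "に": "ａ2"}   # per-character entries
GROUPS = [(set("いろ"), "ａ3")]        # one symbol shared by a group of characters
FALLBACK = "ａ0"                       # assumed symbol when no entry applies


def replacement_for(next_char):
    """Return the symbol that replaces SPECIAL when followed by next_char."""
    if next_char in PER_CHAR:
        return PER_CHAR[next_char]
    for members, symbol in GROUPS:
        if next_char in members:
            return symbol
    return FALLBACK


def convert(text):
    """Replace every SPECIAL with a symbol chosen by the character after it.

    The result is a list of tokens, so a symbol such as "ａ1" still counts
    as a single character of the converted string.
    """
    tokens = []
    for i, ch in enumerate(text):
        if ch == SPECIAL:
            nxt = text[i + 1] if i + 1 < len(text) else None
            tokens.append(replacement_for(nxt))
        else:
            tokens.append(ch)
    return tokens
```

Because the same table is applied at registration and at search time, a registered string and a search string containing the same text are converted to the same token sequence.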

【０１５２】From the converted character string, the two-character chain detector detects the two-character chains and their appearance positions in the same way as in Embodiment 10, and these are stored in the two-character chain position memory.

【０１５３】When a search character string is given, on the other hand, the character string converter 2504 converts the special character into a symbol or code that does not occur in search character strings, following the same correspondence as that used by the character string converter 2501, and outputs the result to the two-character chain detector. The two-character chain detector detects the two-character chains and outputs them to the comparator 2506.

【０１５４】The comparator 2506 detects matching character chains from the contents of the two-character chain memory, following the same procedure as in Embodiment 10. Unlike Embodiment 10, however, the comparator in Embodiment 11 does not need to distinguish between two-character chains and three-character chains.
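
As a rough illustration of registration and comparison in this embodiment, the sketch below stores each two-character chain of an already converted string together with its appearance positions, then checks that the chains of a converted search string occur at consecutive positions. The function names and the dictionary-based memory are assumptions made for the example. Testing position start + k for the k-th chain is equivalent to requiring each position to equal the previous one plus 1.

```python
from collections import defaultdict


def chains(tokens):
    """Return the two-character chains (adjacent token pairs) in order."""
    return [(tokens[i], tokens[i + 1]) for i in range(len(tokens) - 1)]


def register(tokens, start=0):
    """Build a position memory: chain -> set of appearance positions."""
    memory = defaultdict(set)
    for pos, chain in enumerate(chains(tokens), start):
        memory[chain].add(pos)
    return memory


def collate(memory, query_tokens):
    """Return the start positions at which all query chains are consecutive."""
    query = chains(query_tokens)
    if not query:
        return []
    candidates = set(memory.get(query[0], ()))
    for offset, chain in enumerate(query[1:], 1):
        positions = memory.get(chain, ())
        # Keep only the starts whose k-th chain sits at start + k.
        candidates = {p for p in candidates if p + offset in positions}
    return sorted(candidates)
```

Since the special character has already been replaced by an ordinary-looking token, the comparison loop treats every chain alike, which matches the remark that no distinction between two-character and three-character chains is needed here.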

【０１５５】As described above, this embodiment avoids growth of the chain memory for the frequently occurring special character “ａ”. In addition, because the same special character is converted into several different characters depending on the character that follows it, the candidates that must be checked for matching appearance counts when extracting chains are spread over several entries, which shortens the processing time.

【０１５６】In this embodiment the conversion target of the special character “ａ” is determined by the character that follows it, but it is clear that the same effect is obtained when the conversion target is determined by the character that precedes the special character.

【０１５７】When the device is implemented as a computer, the schematic diagram is the same as FIG. 1(a); in this embodiment, the two-character chain position memory 2503 corresponds to the external recording device 40.

【０１５８】In this embodiment a two-character chain position memory such as that of FIG. 26(b) is provided, but a displacement may instead be added to the two-character chain position information in the same memory, so that the two-character chains are stored as shown, for example, in FIG. 26(e). In this case, the appearance positions of the character chains 2603, 2604, 2605, and 2606 are n, n+1, n+2, and n+3, and their displacements are 1, 1, 1, and 1. The continuity of the character chains is checked by comparing whether the appearance position of each character chain equals the appearance position of the immediately preceding character chain plus the displacement, and character strings can thereby be collated by the eleventh method of the present invention.

【０１５９】(Embodiment 12) FIG. 27 is a block diagram showing the configuration of a character string collation device according to a twelfth embodiment of the present invention, and FIG. 28 shows the concept of the twelfth method of character string collation of the present invention and the storage format of a recording medium storing full-text search data.

【０１６０】In FIG. 28(a), 2801 is the character string “いろａはに” input at registration. 2802 is the character string “いろろ’は’はに”, obtained from the character string 2801 by replacing the specific special character “ａ” as follows: the preceding character “ろ” becomes “ろろ’”, consisting of “ろ” and the symbol “ろ’” uniquely determined by “ろ”, and the following character “は” becomes “は’は”, consisting of the symbol “は’” uniquely determined by “は” and “は” itself. 2803 is the first two-character chain to be registered, “いろ”; 2804 is the two-character chain following 2803, “ろろ’”; 2805 is the two-character chain following 2804, “ろ’は’”; 2806 is the two-character chain following 2805, “は’は”; and 2807 is the two-character chain following 2806, “はに”.

【０１６１】Here, “ａ” is a special character inserted into a character string, for example to mark a break in meaning, and “ろ’” and “は’” denote specific symbols or codes that are not search targets.

【０１６２】In FIG. 28(c), 2811 is the search character string “いろａはに” used at search time. 2812 is the character string “いろろ’は’はに”, obtained from the character string 2811 by replacing the specific special character “ａ” as follows: the preceding character “ろ” becomes “ろろ’”, consisting of “ろ” and the symbol “ろ’” uniquely determined by “ろ”, and the following character “は” becomes “は’は”, consisting of the symbol “は’” uniquely determined by “は” and “は” itself. 2813 is the first two-character chain to be searched for, “いろ”; 2814 is the two-character chain following 2813, “ろろ’”; 2815 is the two-character chain following 2814, “ろ’は’”; 2816 is the two-character chain following 2815, “は’は”; and 2817 is the two-character chain following 2816, “はに”.

【０１６３】In FIG. 28(b), the two-character chain 2803 stores the appearance position n of “いろ”, the two-character chain 2804 stores the appearance position n+1 of “ろろ’”, the two-character chain 2805 stores the appearance position n+2 of “ろ’は’”, the two-character chain 2806 stores the appearance position n+3 of “は’は”, and the two-character chain 2807 stores the appearance position n+4 of “はに”.
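
The conversion and the resulting chain positions of this embodiment can be reproduced with a short Python sketch. It is an illustration only: appending “’” to a neighbouring character is a stand-in for “a symbol or code that is not a search target”, the function names are hypothetical, and special characters are assumed not to be adjacent to one another.

```python
SPECIAL = "ａ"


def convert_by_neighbours(text):
    """Replace each SPECIAL by two symbols derived from its neighbours.

    "XａY" becomes the tokens X, X’, Y’, Y, where X’ and Y’ stand for codes
    that never occur in ordinary text. A SPECIAL at the very start or end
    of the text has only one neighbour and yields only one symbol.
    """
    tokens = []
    for i, ch in enumerate(text):
        if ch != SPECIAL:
            tokens.append(ch)
            continue
        if i > 0:
            tokens.append(text[i - 1] + "’")
        if i + 1 < len(text):
            tokens.append(text[i + 1] + "’")
    return tokens


def chains_with_positions(tokens, n=0):
    """Return (two-character chain, appearance position) pairs from n."""
    return [(tokens[i] + tokens[i + 1], n + i) for i in range(len(tokens) - 1)]
```

With n = 0, the string “いろａはに” yields five chains at positions 0 to 4, in line with the positions n to n+4 listed for 2803 to 2807.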

【０１６４】In the collation method according to the twelfth method of the present invention, the two-character chain 2803 corresponding to “いろ” of the two-character chain 2813 is detected first. Next, the two-character chain 2804 corresponding to “ろろ’” of the two-character chain 2814, which follows 2813, is detected, and it is determined whether the value obtained by adding 1 to the appearance position n of 2803 matches the appearance position n+1 of 2804. If they match, the two-character chain 2805 corresponding to “ろ’は’”, the two-character chain following 2814, is detected, and it is determined whether the value obtained by adding 1 to the appearance position n+1 detected for 2804 matches the appearance position n+2 of 2805. If they match, the two-character chain 2806 corresponding to “は’は”, the two-character chain following 2815, is detected, and it is determined whether the value obtained by adding 1 to the appearance position n+2 detected for 2805 matches the appearance position n+3 of 2806. If they match, the two-character chain 2807 corresponding to “はに”, the two-character chain following 2816, is detected, and it is determined whether the value obtained by adding 1 to the appearance position n+3 detected for 2806 matches the appearance position n+4 of 2807. If they match, the character string 2811 is determined to match 2801. The character strings are collated in this way.

【０１６５】FIG. 27 shows the configuration of a character string collation device according to one embodiment of the twelfth method of the present invention.

【０１６６】In FIG. 27, 2701 is a character string converter that changes the character string 2801 to be registered into the character string 2802 by replacing the specific special character “ａ” so that the preceding character “ろ” becomes “ろろ’”, consisting of “ろ” and the symbol “ろ’” uniquely determined by “ろ”, and the following character “は” becomes “は’は”, consisting of the symbol “は’” uniquely determined by “は” and “は” itself; 2702 is a two-character chain position detector that detects, from the character string 2802, the two-character chains 2803, 2804, 2805, 2806, and 2807 to be registered together with their appearance positions; 2703 is a two-character chain position memory that stores the two-character chains 2803, 2804, 2805, 2806, and 2807 and their appearance positions; 2704 is a character string converter that changes the character string 2811 to be searched for into the character string 2812 by the same replacement, so that “ろ” becomes “ろろ’” and “は” becomes “は’は”; 2705 is a two-character chain detector that detects the two-character chains 2813, 2814, 2815, 2816, and 2817 to be searched for in the character string 2812; 2706 is a comparator that looks up the two-character chains 2813, 2814, 2815, 2816, and 2817 detected by the two-character chain detector 2705 in the two-character chain position memory 2703 and determines whether the appearance position of each detected two-character chain equals the appearance position of the previously detected two-character chain plus 1; and 2707 is a control unit that has the comparator 2706 make this determination for every two-character chain detected by the two-character chain detector 2705 and thereby determines whether the character strings match.

【０１６７】As the means in the present invention for converting a special character into a character uniquely determined by the characters adjacent to it before and after, a correspondence is stored, as shown in FIG. 28(d), that indicates which character the special character is converted to for each adjacent character. This correspondence may differ for each character, as in 2821 and 2822, or may be defined for a group of characters, as in 2823.

【０１６８】As described above, this embodiment makes it possible to collate character strings by character chains without being limited by the number of appearances of the special character “ａ”.

【０１６９】That is, according to this embodiment, the special character “ａ” is converted into different characters depending on the characters before and after it, and the number of appearances of each converted character is recorded. Compared with Embodiment 11, the two-character chain file is therefore distributed still more finely, which avoids growth of the chain memory for the frequently occurring special character “ａ” and at the same time makes the chain extraction processing more efficient.

【０１７０】When the device is implemented as a computer, the schematic diagram is the same as FIG. 1(a); in this case, the two-character chain memory 2703 corresponds to the external recording device 40.

【０１７１】In this embodiment a two-character chain position memory such as that of FIG. 28(b) is provided, but a displacement may instead be added to the two-character chain position information in the same memory, so that the two-character chains are stored as shown, for example, in FIG. 28(e). In this case, the appearance positions of the character chains 2803, 2804, 2805, 2806, and 2807 are n, n+1, n+2, n+3, and n+4, and their displacements are 1, 1, 1, 1, and 1. The continuity of the character chains is checked by comparing whether the appearance position of each character chain equals the appearance position of the immediately preceding character chain plus the displacement, and character strings can thereby be collated by the twelfth method of the present invention.

【０１７２】(Embodiment 13) FIG. 29 is a block diagram showing the configuration of a character string collation device according to a thirteenth embodiment of the present invention, and FIG. 30 shows the concept of the thirteenth method of character string collation of the present invention and the storage format of a recording medium storing full-text search data.

【０１７３】FIG. 30 shows the concept of the character string collation method according to the thirteenth method of the present invention. In FIG. 30(a), 3001 is the character string “いろａはに” input at registration; 3002 is the first two-character chain to be registered, “いろ”; 3003 is the three-character chain “ろはは”, obtained from the three-character chain “ろａは”, which follows 3002 and has a special character inserted as its second character, by converting that special character “ａ” into the third character “は”; 3004 is the two-character chain “ろは” formed by the first and second characters of the three-character chain 3003; 3005 is the special two-character chain “はは” formed by the second and third characters of the three-character chain 3003, whose first character corresponds to the special character “ａ”; and 3006 is the two-character chain following 3005, “はに”. In FIG. 30(c), 3011 is the search character string “いろａはに” used at search time; 3012 is the first two-character chain to be searched for, “いろ”; 3013 is the three-character chain “ろはは”, obtained from the three-character chain “ろａは”, which follows 3012 and has a special character inserted as its second character, by converting that special character “ａ” into the third character “は”; 3014 is the two-character chain “ろは” formed by the first and second characters of the three-character chain 3013; 3015 is the special two-character chain “はは” formed by the second and third characters of the three-character chain 3013, whose first character corresponds to the special character “ａ”; and 3016 is the two-character chain following 3015, “はに”.

【０１７４】In FIG. 30(b), the two-character chain 3002 stores the appearance position n of “いろ”, and the two-character chain 3004 stores the appearance position n+1 of “ろは”. The special two-character chain 3005 stores the appearance position n+2 of “はは” in a separate area, and the two-character chain 3006, which follows the special two-character chain, stores as the appearance position of “はに” the same value n+2 as the appearance position of the special two-character chain 3005.

【０１７５】In the collation method according to the thirteenth method of the present invention, the two-character chain 3002 corresponding to “いろ” of the two-character chain 3012 is detected first, and it is determined whether the value obtained by adding 1 to the appearance position n of 3002 matches the appearance position n+1 of the two-character chain 3004, which corresponds to “ろは” of the two-character chain 3014, the first two-character chain of the three-character chain 3013 that follows 3012. If they match, it is then detected that the value obtained by adding 1 to the appearance position n+1 detected for 3004 matches the appearance position n+2 of the special two-character chain 3005, which corresponds to the special two-character chain 3015 “はは” that follows 3014. Next, it is determined whether the appearance position n+2 of 3005 matches the appearance position of the two-character chain 3006, which corresponds to “はに” of the two-character chain 3016 that follows 3015. If they match, the character string 3011 is determined to match 3001. The character strings are collated in this way.
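
The registration and collation steps of this embodiment can be sketched in Python as follows. This is an illustration under assumptions, not the device itself: the two dictionaries stand in for the ordinary and special chain memories, the function names are hypothetical, special characters are assumed to be isolated, and the position rule follows the worked example of FIG. 30 (the chain after a special chain is expected at the same position, every other chain at the previous position plus 1).

```python
SPECIAL = "ａ"


def positioned(text, n=0):
    """Return (character, position, came_from_special) triples.

    A special character is replaced by the character after it and shares
    that character's position, so it does not advance the counter. A
    trailing special character is kept as an ordinary character.
    """
    out, pos = [], n
    for i, ch in enumerate(text):
        if ch == SPECIAL and i + 1 < len(text):
            out.append((text[i + 1], pos, True))
        else:
            out.append((ch, pos, False))
            pos += 1
    return out


def register(text, n=0):
    """Store chains in the ordinary memory or the special memory."""
    ordinary, special = {}, {}
    items = positioned(text, n)
    for (c1, p1, s1), (c2, _, _) in zip(items, items[1:]):
        target = special if s1 else ordinary
        target.setdefault(c1 + c2, set()).add(p1)
    return ordinary, special


def collate(ordinary, special, query):
    """Return the start positions at which the query matches."""
    items = positioned(query)
    steps = [(c1 + c2, s1) for (c1, _, s1), (c2, _, _) in zip(items, items[1:])]
    if not steps:
        return []
    first_chain, first_special = steps[0]
    starts = (special if first_special else ordinary).get(first_chain, set())
    results = []
    for start in sorted(starts):
        pos, prev_special, ok = start, first_special, True
        for chain, is_special in steps[1:]:
            # After a special chain the position stays the same; otherwise +1.
            pos += 0 if prev_special else 1
            memory = special if is_special else ordinary
            if pos not in memory.get(chain, ()):
                ok = False
                break
            prev_special = is_special
        if ok:
            results.append(start)
    return results
```

Registering “いろａはに” with n = 5 places “いろ”, “ろは”, and “はに” in the ordinary memory at positions 5, 6, and 7 and “はは” in the special memory at position 7, which corresponds to n, n+1, n+2, and n+2 in the text.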

【０１７６】FIG. 29 shows the configuration of a character string collation device according to one example of the thirteenth method of the present invention.

【０１７７】In FIG. 29, 2901 is a three-character chain detector that, for the character string 3001 to be registered, distinguishes the three-character chain 3003, whose second character is a special character, from the two-character chains 3002 and 3006, and, for the three-character chain 3003, converts the special second character into the same character as the third character and gives the second character the same character position as the third character; 2902 is a two-character chain position detector that detects the appearance positions of the two-character chains 3002 and 3006 input from 2901; 2903 is a special two-character chain generator that, from the three-character chain 3003 input from 2901, detects two two-character chains, namely the two-character chain 3004 consisting of the first and second characters and the special two-character chain 3005 consisting of the second and third characters, together with the appearance position of each; 2904 is a two-character chain position memory that stores the two-character chains 3002, 3004, and 3006 and their appearance positions; 2905 is a special two-character chain position memory that stores the special two-character chain 3005 and its appearance position; 2911 is a three-character chain detector that, for the character string 3011 to be searched for, distinguishes the three-character chain 3013, whose second character is a special character, from the two-character chains 3012 and 3016, and, for the three-character chain 3013, converts the special second character into the same character as the third character; 2912 is a two-character chain detector that detects the two-character chains 3012 and 3016 input from 2911; 2913 is a special two-character chain generator that, from the three-character chain 3013 input from 2911, detects two two-character chains, namely the two-character chain 3014 consisting of the first and second characters and the special two-character chain 3015 consisting of the second and third characters; 2914 is a comparator that looks up in the two-character chain memory 2904 either the two-character chains 3012, 3014, and 3016 detected by the two-character chain detector 2912 or the special two-character chain 3015 generated by the special two-character chain generator 2913, and determines, for each detected character chain, whether its appearance position equals the appearance position of the previously detected character chain if it is a special two-character chain, or equals that position plus 1 if it is not; and 2915 is a control unit that has the comparator 2914 determine the match for every two-character chain detected by the two-character chain detector 2912 and the special two-character chain generator 2913 and thereby determines whether the character strings match.

【０１７８】Accordingly, character strings can be collated by character chains without being limited by the appearance of the specific special character “ａ”.

【０１７９】In this embodiment, separate chain memories are provided, as in FIG. 30(b), to distinguish two-character chains from special two-character chains (those arising from an inserted special character). Alternatively, a displacement that identifies whether a chain is a two-character chain or a special character chain may be provided in a single memory, so that two-character chains and special two-character chains are stored together as shown, for example, in FIG. 30(d). In this case, the appearance positions of the character chains 3002, 3004, 3005, and 3006 are n, n+1, n+2, and n+2, and their displacements are 1, 1, 0, and 1. The continuity of the character chains is checked by comparing whether the appearance position of each character chain equals the appearance position of the immediately preceding character chain plus the displacement. Because the displacement distinguishes two-character chains from special two-character chains, these data can be stored in the same area, and character strings can be collated by the fifteenth method of the present invention.
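
The single-memory variant with displacements can be sketched as follows, using the positions and displacements listed above with n = 0. The sketch assumes that the displacement stored with a chain is added to that chain's position to obtain the expected position of the next chain; this is the reading under which the values 1, 1, 0, 1 are consistent with the positions n, n+1, n+2, n+2. Keying the memory by (chain, displacement) and the function names are choices made for the example only.

```python
# (chain, appearance position, displacement) for “いろａはに” with n = 0.
RECORDS = [("いろ", 0, 1), ("ろは", 1, 1), ("はは", 2, 0), ("はに", 2, 1)]


def build_memory(records):
    """Index records: (chain, displacement) -> set of appearance positions."""
    memory = {}
    for chain, pos, disp in records:
        memory.setdefault((chain, disp), set()).add(pos)
    return memory


def collate(memory, query):
    """Return start positions where the query chains are continuous.

    query is a list of (chain, displacement) pairs; displacement 0 marks a
    special two-character chain and 1 an ordinary one.
    """
    if not query:
        return []
    starts = []
    for start in sorted(memory.get(query[0], ())):
        pos, ok = start, True
        for (_, prev_disp), step in zip(query, query[1:]):
            pos += prev_disp  # expected position of the next chain
            if pos not in memory.get(step, ()):
                ok = False
                break
        if ok:
            starts.append(start)
    return starts
```

A query chain only matches a stored record with the same displacement, so a special “はは” (displacement 0) is not confused with an ordinary “はは” stored in the same area.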

【０１８０】(Embodiment 14) FIG. 32 shows the concept of the fourteenth character string registration method for the character string collation of the present invention.

【０１８１】In FIG. 32(a), 3201 is the registered character string “いろａはにｂいろａはに” input at registration, in which “ａ” and “ｂ” are special characters and the document number is N. First, the registered character string is numbered. 3216 shows the character positions and unique numbers of the registered character string 3201: with the beginning of the registered character string as n, the characters are numbered in ascending order, skipping the special characters “ａ” and “ｂ”, so that the first character “い” has character position n, the fourth character “は” has n+2, and the remaining characters are numbered in the same way with the special characters excluded. The special characters “ａ” and “ｂ” are given unique numbers: “ａ” is numbered m and “ｂ” is numbered l. Next, the two-character chains are created. Character chains that do not contain the special characters “ａ” and “ｂ” are found in the registered character string 3201, and the two-character chain “いろ” 3202, the two-character chain “はに” 3203, the two-character chain “いろ” 3204, and the two-character chain “はに” 3205 are created. Then the special two-character chain “ろａ” 3206 and the special two-character chain “ａは” 3207, which contain the special character, and the special two-character chain “ろは” 3208, which combines the character “ろ” immediately before the special character “ａ” with the character “は” immediately after it, are created. In the same way, the special two-character chains “にｂ” 3209, “ｂい” 3210, and “にい” 3211 are created for the special character “ｂ”, the sixth character of the registered character string, and the special two-character chains “ろａ” 3212, “ａは” 3213, and “ろは” 3214 are created for the special character “ａ”, the ninth character of the registered character string.
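
The numbering and chain creation described above can be reproduced with a short Python sketch. It is an illustration only: the function names are hypothetical, the unique numbers m and l are represented by the strings "m" and "l", and special characters are assumed not to be adjacent to one another.

```python
# Special character -> its unique number (kept symbolic, as in the text).
SPECIALS = {"ａ": "m", "ｂ": "l"}


def number_characters(text, n=0):
    """Number ordinary characters from n upward, skipping special ones.

    Special characters receive their fixed unique number instead.
    """
    numbered, pos = [], n
    for ch in text:
        if ch in SPECIALS:
            numbered.append((ch, SPECIALS[ch]))
        else:
            numbered.append((ch, pos))
            pos += 1
    return numbered


def make_chains(text):
    """Return (ordinary chains, special chains) in order of creation."""
    ordinary, special = [], []
    # Ordinary chains: adjacent pairs that contain no special character.
    for i in range(len(text) - 1):
        a, b = text[i], text[i + 1]
        if a not in SPECIALS and b not in SPECIALS:
            ordinary.append(a + b)
    # For each special character: before+special, special+after, before+after.
    for i, ch in enumerate(text):
        if ch in SPECIALS and 0 < i < len(text) - 1:
            before, after = text[i - 1], text[i + 1]
            special += [before + ch, ch + after, before + after]
    return ordinary, special
```

For “いろａはにｂいろａはに” this yields the four ordinary chains and the nine special chains in the order in which the text introduces them.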

【０１８２】FIGS. 32(b) and 32(c) show the character chain information created from the two-character chains and special two-character chains created in FIG. 32(a). Each item of character chain information consists of a two-character chain or special two-character chain, a character position, and a document number, and the items are arranged by the character type of the first character of the chain and, within that, in document number order. FIG. 32(b) is described first. For the two-character chain “いろ”, 3202 and 3204 from FIG. 32(a) are arranged in order, with character positions n and n+1. Similarly, for the two-character chain “はに”, 3203 and 3205 are arranged in order, with character positions n+2 and n+6. For special two-character chains that contain or straddle the special character “ａ”, the second character “ろ” of the registered character string is taken as the first character of the special two-character chain, and the special two-character chain “ろは” 3208 and the special two-character chain 3206 immediately following “ろ” are extracted and arranged consecutively. The character position assigned is n+1 for “ろは” and m for “ろａ”. Similarly, for the eighth character “ろ” of the registered character string, the special character chains 3214 and 3212 are set in that order. For special two-character chains whose first character is “ａ”, the character position of the second character is assigned; for “ａは”, the special two-character chains 3207 and 3213 are assigned in order of character position. The character chain information for the special character “ｂ” is then created in the same way as for FIG. 32(b): with “に” as the first character, the special two-character chains “にい” 3211 and “にｂ” 3209 are set in that order, and with “ｂ” as the first character, the special two-character chain “ｂい” 3210 is set.

[0183] The collation method according to the fourteenth method of the present invention will now be described with reference to FIG. 32(d). The search character string 3217 "いろａはに" is scanned from the head for the special character "ａ"; where it is not present, the two-character chain "いろ" 3218 is created. Next, "ろａ" and "ａは" are created and detected as the special two-character chains 3220 and 3221. Because the characters "ろａは" contain the special character "ａ" in between, the special two-character chain "ろは" 3219 is also detected. Finally, the two-character chain "はに" 3222 is detected.
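The chain detection described in this paragraph can be sketched as follows. This is only a minimal illustration under stated assumptions, not the patent's implementation: the function name `detect_chains`, the tuple output, and the "normal"/"special" labels are hypothetical, and the sketch assumes that special characters never occur next to each other.

```python
def detect_chains(query, specials):
    """Return (chain, kind) pairs in collation order for a search string.

    Hypothetical helper; assumes special characters are never adjacent.
    """
    chains = []
    for i in range(len(query) - 1):
        first, second = query[i], query[i + 1]
        if first in specials:
            # Chain that starts with the special character, e.g. "ａは".
            if second not in specials:
                chains.append((first + second, "special"))
        elif second in specials:
            # Chain that straddles the special character, e.g. "ろは" ...
            if i + 2 < len(query) and query[i + 2] not in specials:
                chains.append((first + query[i + 2], "special"))
            # ... followed by the chain that ends with it, e.g. "ろａ".
            chains.append((first + second, "special"))
        else:
            # Ordinary two-character chain, e.g. "いろ".
            chains.append((first + second, "normal"))
    return chains


result = detect_chains("いろａはに", {"ａ", "ｂ"})
```

For the search character string 3217 this yields the chains in the same order as 3218, 3219, 3220, 3221, 3222 in the text.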

[0184] Next, the character chains corresponding to the detected two-character chains and special two-character chains are retrieved from the character chain information of FIG. 32(b). The two-character chain "いろ" 3218 corresponds to chains 3202 and 3204 in FIG. 32(b); 3202 is retrieved first. The special two-character chain "ろは" 3219 corresponds to chains 3208 and 3214 in FIG. 32(b); 3208 is retrieved first. The document numbers of 3202 and 3208 are both N, and their character positions, n and n+1, are consecutive, so 3202 and 3208 are judged to be contiguous. The character chain information of FIG. 32(b) is then examined for the chain that follows the special two-character chain 3208, and the chain 3206 "ろａ" is retrieved. Because the first character of 3206 is "ろ", its second character is the special character "ａ", its character position is the unique value m, and its document number is N, it is judged to be the special two-character chain that follows the special two-character chain "ろは" 3208.

[0185] Next, the character chain information corresponding to the special two-character chain "ａは" 3221 is looked up in FIG. 32(b), and the special two-character chain "ａは" 3207 is retrieved. Because the document number of 3207 is N and its character position is n+2, it is judged to be the special two-character chain that follows character position n+1 of the special two-character chain "ろは" 3208.

[0186] Next, the character chain information corresponding to the two-character chain "はに" 3222 is looked up in FIG. 32(b), and the two-character chain "はに" 3203 is retrieved. Because the document number of 3203 is N and its character position is n+2, it is judged to be the two-character chain that follows character position n+1 of the special two-character chain "ろは" 3208. In this way the search character string 3217 is judged to be contained in the registered character string 3201.

[0187] In the collation method above, two character chains, 3202 and 3204, were obtained when the character chain information of FIG. 32(b) corresponding to the two-character chain 3218 was retrieved. The same procedure is applied to 3204: the two-character chain "いろ" 3204 (character position n+4, document number N), the special two-character chain "ろは" 3214 (character position n+5, document number N), the special two-character chain "ろａ" (character position m, document number N), the special two-character chain "ａは" (character position n+6, document number N), and the two-character chain "はに" (character position n+6, document number N) are detected, and comparing the document numbers and the continuity of the character positions shows that they match. The search character string 3217 is therefore judged to occur at two places in the registered character string 3201.

[0188] The same collation method is applied to the search character string 3223 "はにｂいろ", which contains the special character "ｂ". Character chain information is obtained from FIG. 32(b) and FIG. 32(c), and the document numbers are checked for agreement and the character positions for continuity. The following chains are retrieved: for the two-character chain "はに" 3224, the two-character chain "はに" 3203 (character position n+2, document number N); for the special two-character chain "にい" 3225, the special two-character chain "にい" 3211 (character position n+3, document number N); for the special two-character chain "にｂ" 3226, the special two-character chain "にｂ" (character position l, document number N); for the special two-character chain "ｂい" 3227, the special two-character chain "ｂい" (character position n+4, document number N); and for the two-character chain "いろ" 3228, the two-character chain "いろ" (character position n+4, document number N). The search character string 3223 is thus judged to be contained in the registered character string 3201.
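The registration and collation steps of the fourteenth method can be sketched end to end as below. This is an illustrative sketch under stated assumptions, not the patent's implementation: the names `chains_with_positions`, `build_index`, and `find` are hypothetical, n is taken as 0, the unique values m and l are represented by the strings "m" and "l", the registered string is the one implied by chains 3202 to 3214, special characters are assumed never to be adjacent, and ordinary and special chains are assumed to be stored as distinct kinds.

```python
from collections import defaultdict

# Unique position values assigned to special characters (m and l in the text).
SPECIAL_POS = {"ａ": "m", "ｂ": "l"}


def chains_with_positions(text, base=0):
    """List (chain, kind, position); special characters consume no position."""
    positions, next_pos = [], base
    for ch in text:
        if ch in SPECIAL_POS:
            positions.append(SPECIAL_POS[ch])
        else:
            positions.append(next_pos)
            next_pos += 1
    out = []
    for i in range(len(text) - 1):
        a, b = text[i], text[i + 1]
        if a in SPECIAL_POS:
            if b not in SPECIAL_POS:
                # Chain starting with a special character: position of 2nd char.
                out.append((a + b, "special", positions[i + 1]))
        elif b in SPECIAL_POS:
            if i + 2 < len(text) and text[i + 2] not in SPECIAL_POS:
                # Chain straddling the special character: position of 1st char.
                out.append((a + text[i + 2], "special", positions[i]))
            # Chain ending with the special character: its unique value.
            out.append((a + b, "special", positions[i + 1]))
        else:
            out.append((a + b, "normal", positions[i]))
    return out


def build_index(documents):
    """Map (chain, kind) to the set of (document number, position) entries."""
    index = defaultdict(set)
    for doc, text in documents.items():
        for chain, kind, pos in chains_with_positions(text):
            index[(chain, kind)].add((doc, pos))
    return index


def find(index, query):
    """Return sorted (document, start position) pairs where the query occurs."""
    wanted = chains_with_positions(query)
    if not wanted:
        return []
    chain0, kind0, pos0 = wanted[0]
    hits = []
    for doc, start in index.get((chain0, kind0), ()):
        def present(chain, kind, pos):
            # Unique values (strings) are compared as-is; numeric positions
            # are shifted so that the first chain lands on `start`.
            target = pos if isinstance(pos, str) else start + pos - pos0
            return (doc, target) in index.get((chain, kind), ())
        if all(present(*w) for w in wanted):
            hits.append((doc, start))
    return sorted(hits)


index = build_index({"N": "いろａはにｂいろａはに"})
```

With this index, `find(index, "いろａはに")` reports two occurrences and `find(index, "はにｂいろ")` reports one, in line with the two walkthroughs in the text. Because a chain ending in a special character carries only the unique value, continuity across the special character rests on the straddling chain and the chain that starts with the special character.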

[0189] FIG. 31 shows the configuration of a character string collation device in one embodiment of the fourteenth method of the present invention.

[0190] In FIG. 31, 3101 is a special character detector that detects the specific special character "ａ" in the registered character string 3201 and assigns the character positions 3216 of the registered character string. 3102 is a two-character chain encoder that creates, from the registered character string 3201, the two-character chains 3202, 3203, 3204, and 3205 together with their character positions and document numbers. 3103 is a special two-character chain encoder that creates, from the registered character string 3201, the special two-character chains 3206, 3207, 3208, 3209, 3210, 3211, 3212, 3213, and 3214 together with their character positions and document numbers. 3104 is a character chain combination determiner that builds the character chain information of FIGS. 32(b) and 32(c) from the two-character chains, special two-character chains, character positions, and document numbers created by the two-character chain encoder 3102 and the special two-character chain encoder 3103, and stores it in the two-character chain memory 3105.

[0191] 3106 is a special character detector that detects the special character "ａ" or "ｂ" in the search character strings 3217 and 3223. 3107 is a two-character chain detector that detects the two-character chains 3218, 3222, 3224, and 3228. 3108 is a special two-character chain detector that detects the special two-character chains 3219 to 3221 and 3225 to 3227. 3109 is a character chain combination determiner that decides the collation order of the two-character chains and special two-character chains: for the search character string 3217 it arranges the chains in the order 3218, 3219, 3220, 3221, 3222, and for the search character string 3223 in the order 3224, 3225, 3226, 3227, 3228. 3110 is a comparator that retrieves from the two-character chain memory 3105, two at a time and in order, the character chains corresponding to the chains sent from the character chain combination determiner 3109, extracts the document numbers and character positions of the two chains, and sends this data to the control unit 3111. 3111 is a control unit that checks the continuity of the character chains from the data sent by the comparator 3110; if the chains are contiguous it fetches the data for the next chain from the comparator 3110, and if they are not it terminates the collation.

[0192] With this method, the specific special character "ａ" can form chains with the characters before and after it regardless of how often it occurs, so character string collation by character chains can be performed without any limit on the number of occurrences of the special character "ａ". Needless to say, when the string being collated begins with a special character, for example "ａは", collation can be sped up by ignoring the special character and collating the character chains whose first character is "は".

[0193] (Embodiment 15) FIG. 34 illustrates the concept of the character string registration method in the fifteenth character string collation method of the present invention.

[0194] In FIG. 34(a), 3401 is the registered character string "いろａはにはに" that is input at registration time. In the registered character string 3401, "ａ" is the special character, and the document number of the registered character string is M. 3402 is the registered character string obtained by removing the special character "ａ" from 3401 and marking the character "は" that immediately followed it as "は＊". 3409 shows the registered character positions, numbered sequentially with the head of the registered character string 3401 at position n and the special character "ａ" skipped. First, the two-character chains that do not involve the marked character "は＊" are created from 3402: the two-character chain "いろ" 3403, the two-character chain "には" 3405, and the two-character chain "はに" 3406. Next, three chains involving "は＊" are created: the two-character chain "いは＊" 3407, formed from "は＊" and "い", the character two positions before it (that is, two characters before the special character "ａ"); the two-character chain "ろは＊" 3408, formed from "は＊" and "ろ", the character one position before it; and the special two-character chain "は＊に", formed from "は＊" and the character "に" that immediately follows it.
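The registration step of FIG. 34(a) can be sketched as follows. This is a minimal sketch under stated assumptions rather than the patent's implementation: `mark_text`, `registration_chains`, and the "normal"/"special" labels are hypothetical names, n is taken as 0, the mark is represented by appending "＊" to the character, and the text is assumed to contain no literal "＊" and no two marked characters close enough to interact.

```python
MARK = "＊"  # marks the character that immediately followed a special character


def mark_text(text, special):
    """Drop the special character and mark the character that followed it."""
    tokens, pending = [], False
    for ch in text:
        if ch == special:
            pending = True
        else:
            tokens.append(ch + MARK if pending else ch)
            pending = False
    return tokens


def registration_chains(text, special, base=0):
    """Return (chain, kind, position) triples for one registered string."""
    tokens = mark_text(text, special)
    marked = [t.endswith(MARK) for t in tokens]
    chains = []
    # Ordinary two-character chains that do not involve a marked character.
    for i in range(len(tokens) - 1):
        if not marked[i] and not marked[i + 1]:
            chains.append((tokens[i] + tokens[i + 1], "normal", base + i))
    # Chains built around each marked character.
    for j, is_marked in enumerate(marked):
        if not is_marked:
            continue
        if j >= 2:  # character two positions before + marked character
            chains.append((tokens[j - 2] + tokens[j], "normal", base + j - 2))
        if j >= 1:  # character one position before + marked character
            chains.append((tokens[j - 1] + tokens[j], "normal", base + j - 1))
        if j + 1 < len(tokens):  # special chain: marked character + next one
            chains.append((tokens[j] + tokens[j + 1], "special", base + j))
    return chains


chains = registration_chains("いろａはにはに", "ａ")
```

For the registered character string 3401 the sketch produces the chains in the order they are introduced in the text: "いろ", "には", "はに", then "いは＊", "ろは＊", and the special chain "は＊に".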

[0195] FIG. 34(b) shows the structure of the character chain information created from the two-character chains and special two-character chains of FIG. 34(a). The character chain information consists of a document number, two-character chains, special two-character chains, character positions, and a special two-character chain flag. For each first character, the two-character chains and then the special two-character chains are arranged consecutively, and the special two-character chain flag indicates where the special two-character chains begin. In FIG. 34(b), the character chain information 3411 covers chains whose first character is "は" or "は＊" (a "は" that immediately follows the special character "ａ"). It consists of entries made up of a document number 3412, a two-character chain 3413 containing "は", and the character position 3414 of "は"; entries made up of a document number 3412, a special two-character chain 3415 containing "は＊", and the character position (n) 3416 of "は＊"; and a special two-character chain flag 3417 indicating the position of the special two-character chains 3415. Here there are N two-character chains that have "は" as their first character, and the special two-character chains follow immediately after them, so the special two-character chain flag 3417 stores N+1, the starting position of the "は＊" chains. A chain whose second character is the character immediately following the special character "ａ" is treated as an ordinary two-character chain "はX" (X being the second character).

[0196] FIG. 34(c) shows an example of how the character chain information is stored for the case of FIG. 34(a). 3418 is the character chain information for chains whose first character is "い"; it stores the two-character chain "いろ" 3419 and the two-character chain "いは＊" 3420, which also has "い" as its first character. Here the value of the special two-character chain flag is "0", because no chain in this group has the character immediately following the special character "ａ" as its first character. Likewise, the character chain information 3422 for the first character "ろ" stores a two-character chain, and its special two-character chain flag 3423 holds "0"; the character chain information 3428 for the first character "に" stores a two-character chain, and its special two-character chain flag 3429 holds "0". In the character chain information 3424 for the first character "は", by contrast, the two-character chain "はに" 3425 and the character position of its "は", n+4, are stored first, followed by the special two-character chain 3426, whose first character is "は＊", and the character position of "は＊", n+2. Because the "は＊" chain is the second entry in the character chain information for the first character "は", the special two-character chain flag holds the value "2".
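The storage layout of FIG. 34(c) can be illustrated with a small sketch. It is not the patent's data format: the function `group_by_first_character`, the dictionary layout, and the "normal"/"special" labels are assumptions made for illustration, n is taken as 0, and the document number is omitted for brevity.

```python
def group_by_first_character(chains):
    """Group chain entries by first character and compute the flag.

    Each group lists ordinary chains first, then special chains; the flag is
    the 1-based index of the first special chain, or 0 if there is none.
    """
    groups = {}
    for chain, kind, position in chains:
        entry = groups.setdefault(chain[0], {"normal": [], "special": []})
        entry[kind].append((chain, position))
    table = {}
    for first, entry in groups.items():
        flag = len(entry["normal"]) + 1 if entry["special"] else 0
        table[first] = {
            "chains": entry["normal"] + entry["special"],
            "flag": flag,
        }
    return table


# Chains of FIG. 34(a), with n taken as 0 (hypothetical literal input).
chains = [
    ("いろ", "normal", 0), ("いは＊", "normal", 0), ("ろは＊", "normal", 1),
    ("はに", "normal", 4), ("は＊に", "special", 2), ("には", "normal", 3),
]
table = group_by_first_character(chains)
```

Only the group for the first character "は" receives a non-zero flag, matching the value "2" described for flag 3427; the other groups hold "0".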

[0197] The collation method according to the fifteenth method of the present invention will now be described with reference to FIG. 35. FIG. 35(a) shows the collation method for a search character string that contains the special character "ａ" in its interior, with the special character as the second character from the head. For the search character string 3501 "ろａはに", the character "は" that follows the special character "ａ" is first replaced by "は＊" to create the search character string 3502, and the two-character chain 3503 "ろは＊" and the special character chain 3504 "は＊に" are detected. Next, the character chain information of FIG. 34(c) is searched for two-character chains whose first character is "ろ", and the two-character chain "ろは＊" is found in the character chain information 3422. At this point the character position n+1 and the document number M of "ろは＊" are retrieved and stored. Then the special two-character chain 3504 "は＊に" is looked up in the character chain information of FIG. 34(c). Because its first character is "は＊", it is sought in the character chain information 3424 for the first character "は"; the special two-character chain flag 3427 gives "2" as the position of the "は＊" chains, so the special two-character chain 3426 is found. The document number of 3426 is M, which matches that of the previously detected two-character chain "ろは＊", and its character position n+2 immediately follows the character position n+1. The chains "ろは＊" and "は＊に" are therefore judged to be contiguous, and the search character string "ろａはに" is judged to be contained in the registered character string. This completes the collation of the character string.

[0198] FIG. 35(b) shows the collation method for a search character string that begins with the special character "ａ". For the search character string 3505 "ａはに", the character "は" that follows the special character "ａ" is first replaced by "は＊" to create the search character string 3506, and the special character chain 3507 "は＊に" is detected. Next, the character chain information of FIG. 34(c) is searched for special two-character chains whose first character is "は＊"; continuity of the character chain information is judged in the same way as for the special two-character chain "は＊に" in FIG. 35(a), and the special two-character chain 3426 is found in the character chain information 3424. This completes the collation. When the search character string is "ａは" 3508, the search character string 3509 is created and the special two-character chain 3510 is detected. In this case the special two-character chain has no second character, so the chain is judged to be found if the character chain information contains any entry whose first character is "は＊".

[0199] FIG. 35(c) shows the collation method for a search character string that contains the special character "ａ" in its interior, with the special character at the third position from the head or later. For the search character string 3511 "いろａはに", the character immediately following the special character "ａ" is first replaced by "は＊" to create the search character string 3512. Next, the two-character chain "いろ" 3513, which does not involve the special character "ａ", the two-character chain "いは＊" 3514, and the special two-character chain 3515 "は＊に" are detected. The character chain information of FIG. 34(c) is then searched for the two-character chain "いろ", whose first character is "い", and the two-character chain "いろ" 3419 is found in the character chain information 3418. At this point the character position n and the document number M of "いろ" are retrieved and stored. Next, the two-character chain 3514 "いは＊" is looked up in the character chain information of FIG. 34(c). Because its first character is "い", it is sought in the character chain information 3418, and the two-character chain 3420, whose second character is "は＊", is found. The document number of 3420 is M and its character position is n, both of which match, so it is judged to be contiguous with the previously detected two-character chain "いろ", and the string up to "いろａは" is judged to be contained in the registered character string of document number M. Continuity is then checked between the two-character chain "いは＊" 3514 and the special two-character chain "は＊に" 3515. This check is the same as in FIG. 35(a), except that the character positions of "い" and "は＊" differ by 2. The character position of the two-character chain "いは＊" 3420 is n and that of the special two-character chain "は＊に" 3426 is n+2, a difference of 2, so the chains "いは＊" and "は＊に" are judged to be contiguous, and the search character string "いろａはに" is judged to be contained in the registered character string. This completes the collation of the character string.
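The three collation cases of FIG. 35 can be sketched together as below. This is an illustrative sketch under stated assumptions, not the patent's implementation: the literal `INDEX` stands in for the character chain information of FIG. 34(c) with n taken as 0, the names `query_chains` and `matches` are hypothetical, and the single-character case "ａは" (which the text resolves by a first-character lookup) is not covered.

```python
MARK = "＊"  # marks the character that immediately followed a special character

# Stand-in for the character chain information of FIG. 34(c), n taken as 0.
# Key: (chain, kind); value: set of (document number, character position).
INDEX = {
    ("いろ", "normal"): {("M", 0)},
    ("いは＊", "normal"): {("M", 0)},
    ("ろは＊", "normal"): {("M", 1)},
    ("はに", "normal"): {("M", 4)},
    ("は＊に", "special"): {("M", 2)},
    ("には", "normal"): {("M", 3)},
}


def query_chains(query, special):
    """Return (chain, kind, offset) triples for a search string."""
    tokens, pending = [], False
    for ch in query:
        if ch == special:
            pending = True
        else:
            tokens.append(ch + MARK if pending else ch)
            pending = False
    marked = [t.endswith(MARK) for t in tokens]
    chains = []
    for i in range(len(tokens) - 1):
        if not marked[i] and not marked[i + 1]:
            chains.append((tokens[i] + tokens[i + 1], "normal", i))
        elif marked[i + 1]:
            # Reach the marked character from two positions back when the
            # query has such a character, otherwise from one position back.
            k = i - 1 if i >= 1 else i
            chains.append((tokens[k] + tokens[i + 1], "normal", k))
        else:
            # First character is marked: special two-character chain.
            chains.append((tokens[i] + tokens[i + 1], "special", i))
    return chains


def matches(query, special="ａ"):
    """Return sorted (document, start position) pairs where the query occurs."""
    wanted = query_chains(query, special)
    if not wanted:
        return []
    chain0, kind0, off0 = wanted[0]
    hits = []
    for doc, start in INDEX.get((chain0, kind0), ()):
        # Every chain must occur in the same document at the position implied
        # by its offset, e.g. "いは＊" at n and "は＊に" at n+2.
        if all((doc, start + off - off0) in INDEX.get((c, k), ())
               for c, k, off in wanted):
            hits.append((doc, start))
    return sorted(hits)
```

Expressing continuity through offsets handles the position difference of 2 between "いは＊" and "は＊に" in the same way as the difference of 1 between "ろは＊" and "は＊に".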

[0200] FIG. 33 shows the configuration of a character string collation device in one embodiment of the fifteenth method of the present invention.

[0201] In FIG. 33, 3301 is a special character detector that detects the specific special character "ａ" in the character string 3401 to be registered and assigns the character positions 3410 of the registered character positions 3409. 3302 is a character chain combination determiner that determines the combination of the two-character chains 3403, 3405, 3406, 3407, and 3408 and the special character chain 3404 from the registered character string 3402, in which the character "は" immediately following the special character "ａ" in the registered character string 3401 is treated as a distinct character. 3303 is a two-character chain encoder that creates pairs of a two-character chain and a document number and stores the character chain information in the two-character chain memory 3305. 3304 is a special two-character chain encoder that creates pairs of a special character chain and a document number and stores the character chain information and the special character chain flag in the two-character chain memory 3305. 3306 is a special character detector that detects the specific special character "ａ" in the search character strings 3501, 3505, 3508, and 3511 and creates the search character strings 3502, 3506, 3509, and 3512. 3307 is a character chain combination determiner that determines the combination of the two-character chains 3503, 3507, 3510, 3513, and 3514 and the special two-character chains 3504 and 3515. 3308 is a two-character chain detector that creates the two-character chains 3503, 3507, 3510, 3513, and 3514. 3309 is a special two-character chain detector that creates the special two-character chains 3504 and 3515. 3310 is a comparator that retrieves from the two-character chain memory 3305 the two-character chains and special two-character chains matching those supplied by 3308 and 3309, and checks character continuity. 3311 is a control unit that stops the collation if the chains checked by the comparator 3310 do not match, instructs the comparator 3310 to check the next chain if they do, and makes the final judgment on character continuity.

[0202] With this method, the specific special character "ａ" can form chains with the characters before and after it regardless of how often it occurs, so character string collation by character chains can be performed without any limit on the number of occurrences of the special character "ａ". Needless to say, when the string being collated begins with a special character, for example "ａは", collation can be sped up by ignoring the special character and collating the character chains whose first character is "は".

[0203]

[Effects of the Invention] As described above, when a character string to be collated contains a frequently occurring special character, the present invention makes it possible to disregard the frequency of occurrence of that special character in the character chains that contain it. The invention can therefore readily handle character string collation for languages that contain such characters, which is a significant benefit.

[Brief description of the drawings]

FIG. 1 is a conceptual diagram of a character string collation device according to a first embodiment of the present invention.

FIG. 2 is a conceptual diagram showing the character string collation method according to the first embodiment of the present invention.

FIG. 3 is a block diagram of a character string collation device according to a second embodiment of the present invention.

FIG. 4 is a conceptual diagram showing the character string collation method according to the second embodiment of the present invention.

FIG. 5 is a block diagram of a character string collation device according to a third embodiment of the present invention.

FIG. 6 is a conceptual diagram showing the character string collation method at registration time according to the third embodiment of the present invention.

FIG. 7 is a conceptual diagram showing the character string collation method based on the number of occurrences of two-character chains according to the third embodiment of the present invention.

FIG. 8 is a conceptual diagram showing the character string collation method at search time according to the third embodiment of the present invention.

FIG. 9 is a block diagram of a character string collation device in an embodiment of the fourth method of the present invention.

FIG. 10 is a conceptual diagram showing the fourth character string collation method of the present invention.

FIG. 11 is a block diagram of a character string collation device in an embodiment of the fifth method of the present invention.

FIG. 12 is a conceptual diagram showing the fifth character string collation method of the present invention.

FIG. 13 is a block diagram of a character string collation device in an embodiment of the sixth method of the present invention.

FIG. 14 is a conceptual diagram showing the sixth character string collation method of the present invention.

FIG. 15 is a conceptual diagram showing the seventh character string collation method of the present invention.

FIG. 16 is a block diagram of a character string collation device in an embodiment of the seventh method of the present invention.

FIG. 17 is a conceptual diagram showing the eighth method of creating two-character chain data and the character string collation method of the present invention.

FIG. 18 is a flowchart of two-character chain data creation in an embodiment of the eighth method of the present invention.

FIG. 19 is a flowchart of search character string collation in an embodiment of the eighth method of the present invention.

FIG. 20 is a block diagram of a character string collation device in an embodiment of the eighth method of the present invention.

FIG. 21 is a conceptual diagram showing the ninth method of creating two-character chain data and the character string collation method of the present invention.

FIG. 22 is a block diagram of a character string collation device in an embodiment of the ninth method of the present invention.

FIG. 23 is a conceptual diagram of a character string collation device in an embodiment of the tenth method of the present invention.

FIG. 24 is a conceptual diagram showing the character string collation method according to the tenth method of the present invention.

FIG. 25 is a block diagram of a character string collation device in an embodiment of the eleventh method of the present invention.

FIG. 26 is a conceptual diagram showing the character string collation method according to the eleventh method of the present invention.

FIG. 27 is a block diagram of a character string collation device in an embodiment of the twelfth method of the present invention.

FIG. 28 is a conceptual diagram showing the character string collation method according to the twelfth method of the present invention.

FIG. 29 is a block diagram of a character string collation device in an embodiment of the thirteenth method of the present invention.

FIG. 30 is a conceptual diagram showing the character string collation method according to the thirteenth method of the present invention.

FIG. 31 is a block diagram of a character string collation device in an embodiment of the fourteenth method of the present invention.

FIG. 32 is a conceptual diagram showing the character string registration method of the character string collation device in an embodiment of the fourteenth method of the present invention.

FIG. 33 is a block diagram of a character string collation device in an embodiment of the fifteenth method of the present invention.

FIG. 34 is a conceptual diagram showing the character string registration method of the character string collation device in an embodiment of the fifteenth method of the present invention.

FIG. 35 is a conceptual diagram showing the character string collation method of the character string collation device in an embodiment of the fifteenth method of the present invention.

FIG. 36 is a block diagram of a conventional character string collation device.

FIG. 37 is a conceptual diagram showing a conventional character string collation method.

[Explanation of symbols]

30 main body; 31 input means; 39 printer; 38 display; 40 external recording means
101 two-character chain detector; 102 three-character chain detector; 103 two-character chain memory; 104 three-character chain memory
111 two-character chain detector; 112 three-character chain detector; 113 comparator; 114 control unit
301 character string converter; 302 two-character chain detector; 303 two-character chain memory; 304 character string converter; 305 two-character chain detector; 306 comparator; 307 control unit
501 character string converter; 502 two-character chain detector; 503 two-character chain memory; 504 character string converter; 505 two-character chain detector; 506 comparator; 507 control unit
901 special character detector; 902 two-character chain detector; 903 special character chain detector; 904 two-character chain memory
911 special character detector; 912 two-character chain detector; 913 special character chain detector; 914 comparator; 915 control unit
1101 two-character chain detector; 1102 three-character chain detector; 1103 two-character chain memory; 1104 three-character chain memory
1111 two-character chain detector; 1112 three-character chain detector; 1113 comparator; 1114 control unit
1301 two-character chain detector; 1302 three-character chain detector; 1303 special two-character chain generator; 1304 two-character chain memory
1311 two-character chain detector; 1312 three-character chain detector; 1313 special two-character chain generator; 1314 comparator; 1315 control unit
1501 registered character string; 1502 to 1515 two-character chain; 1516, 1517 appearance overlap count
1601 special character detector; 1602 two-character chain detector; 1603 special character chain detector; 1604 appearance overlap memory; 1605 special character chain sorter; 1606 two-character chain memory
1607 special character detector; 1608 two-character chain detector; 1609 special character chain detector; 1610 comparator; 1611 controller; 1612 appearance overlap counter memory
1701 document number; 1702 number of appearances of first character; 1703 number of appearances of second character; 1704 number of appearances when the first character is a special character; 1705 specified value of second character; 1706 specified value of first character; 1707 number of appearances of the special character as second character
1708 registered character string; 1709 to 1720 two-character chain; 1721 to 1731 character chain data
1732 search character string; 1733 to 1736 two-character chain; 1737, 1738 special character appearance counter
1801 to 1814 step
1901 to 1912 step
2001 two-character chain detector; 2002 appearance count calculator; 2003 special character chain detector; 2004 two-character chain memory; 2005 two-character chain detector; 2006 comparator; 2007 special character appearance counter memory; 2008 controller
2101 special character detector; 2102 two-character chain detector; 2103 special two-character chain detector; 2104 two-character chain memory
2105 special character detector; 2106 two-character chain detector; 2107 special two-character chain detector; 2108 comparator; 2109 controller
2201 registered character string; 2202 two-character chain; 2203 special two-character chain; 2204 two-character chain
2205 search character string; 2206 two-character chain; 2207 special two-character chain; 2208 two-character chain
2301 two-character chain position detector; 2302 three-character chain position detector; 2303 two-character chain position memory; 2304 three-character chain position memory
2311 two-character chain detector; 2312 three-character chain detector; 2313 comparator; 2314 control unit
2501 character string converter; 2502 two-character chain position detector; 2503 two-character chain position memory; 2504 character string converter; 2505 two-character chain detector; 2506 comparator; 2507 control unit
2701 character string converter; 2702 two-character chain position detector; 2703 two-character chain position memory; 2704 character string converter; 2705 two-character chain detector; 2706 comparator; 2707 control unit
2901 three-character chain detector; 2902 two-character chain position detector; 2903 special two-character chain generator; 2904 two-character chain position memory; 2905 special two-character chain position memory
2911 three-character chain detector; 2912 two-character chain detector; 2913 special two-character chain generator; 2914 comparator; 2915 control unit
3101 special character detector; 3102 two-character chain encoder; 3103 special two-character chain encoder; 3104 character chain combination determiner; 3105 two-character chain memory
3106 special character detector; 3107 two-character chain detector; 3108 special two-character chain detector; 3109 character chain combination determiner; 3110 comparator; 3111 determination unit
3201 registered character string; 3202 to 3205 two-character chain; 3206 to 3214 special two-character chain; 3215 document number; 3216 registered character position
3217 search character string; 3218 two-character chain; 3219 to 3221 special two-character chain; 3222 two-character chain
3223 search character string; 3224 two-character chain; 3225 to 3227 special two-character chain; 3228 two-character chain
3301 special character detector; 3302 character chain combination determiner; 3303 two-character chain encoder; 3304, 3305 special two-character chain encoder; 3306 two-character chain memory
3307 character chain combination determiner; 3308 two-character chain detector; 3309 special two-character chain detector; 3310 comparator; 3311 determination unit
3401, 3402 registered character string; 3403 two-character chain; 3404 special two-character chain; 3405 to 3408 two-character chain; 3409 registered character position; 3410 character position
3411 character chain information; 3412 document number; 3413 two-character chain; 3414 character position; 3415 special two-character chain; 3416 character position; 3417 special two-character chain flag
3418 character chain information; 3419 two-character chain; 3420 special two-character chain; 3421 special two-character chain flag
3422 character chain information; 3423 special two-character chain flag
3424 character chain information; 3425 two-character chain; 3426 special two-character chain; 3427 special two-character chain flag
3428 character chain information; 3429 special two-character chain flag
3501, 3502 search character string; 3503 two-character chain; 3504 special two-character chain
3505, 3506 search character string; 3507 special two-character chain
3508, 3509 search character string; 3510 special two-character chain
3511, 3512 search character string; 3513, 3514 two-character chain; 3515 special two-character chain

Continued from the front page
(72) Inventor: Tomoko Fujita, c/o Matsushita Electric Industrial Co., Ltd., 1006 Oaza Kadoma, Kadoma City, Osaka Prefecture
(72) Inventor: Yasuyo Shirasaki, c/o Matsushita Electric Industrial Co., Ltd., 1006 Oaza Kadoma, Kadoma City, Osaka Prefecture

Claims

1. A computer-readable recording medium on which search data used for full-text search is recorded, wherein the search data comprises: first data obtained by detecting, in a search target character string, all two-character chains consisting of characters other than a special character specified in advance and recording, for each two-character chain, the numbers of appearances in the search target character string of the first character and the second character constituting the two-character chain as a set; and second data obtained by detecting all character chains consisting of two characters other than the special character between which the special character specified in advance is inserted and recording, for each such character chain, the numbers of appearances in the search target character string of the first character and the second character constituting the character chain as a set; the first data and the second data being recorded separately.
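The data layout of claim 1 can be illustrated with a minimal sketch. It assumes that the "number of appearances" of a character is its running occurrence ordinal within the search target character string (the k-th time that character appears) and that exactly one special character sits between the two characters of a second-data chain. The function name `build_search_data`, the dictionary layout, and the use of `/` as the special character are illustrative assumptions, not taken from the patent.

```python
from collections import defaultdict


def build_search_data(text, special=frozenset("/")):
    """Return (first_data, second_data) for a search target string.

    Assumption: the "number of appearances" is the running occurrence
    ordinal of each character in `text`.
    """
    seen = defaultdict(int)
    ordinals = []
    for ch in text:
        seen[ch] += 1
        ordinals.append(seen[ch])

    first = defaultdict(list)   # adjacent non-special characters
    second = defaultdict(list)  # non-special characters around one special character

    for i in range(len(text) - 1):
        a, b = text[i], text[i + 1]
        if a not in special and b not in special:
            first[a + b].append((ordinals[i], ordinals[i + 1]))

    for i in range(len(text) - 2):
        a, mid, c = text[i], text[i + 1], text[i + 2]
        if a not in special and mid in special and c not in special:
            second[a + c].append((ordinals[i], ordinals[i + 2]))

    # The two tables are returned separately, mirroring the separate recording.
    return dict(first), dict(second)
```

For the string `AB/CA` this yields the chains `AB` and `CA` in the first table and the chain `BC` in the second, so the special character never receives a chain entry of its own.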

2. A character string collation device comprising: the computer-readable recording medium according to claim 1 on which search data used for full-text search is recorded; first character chain detecting means for detecting, from a search character string, all two-character chains consisting of characters other than the special character specified in advance; second character chain detecting means for detecting, from the search character string, all character chains consisting of two characters other than the special character between which the special character specified in advance is inserted; and comparing means for searching the first data recorded on the recording medium for the two-character chains detected by the first character chain detecting means, searching for the character chains detected by the second character chain detecting means, and judging, by comparing the numbers of appearances corresponding to the character chains, whether the character chains continue as the search character string.
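A sketch of the comparison step in claim 2, under the same assumption that each recorded pair holds the occurrence ordinals of the two characters of a chain. Two successive chains are treated as continuous when the ordinal of the second character of one chain equals the ordinal of the first character of the next. The names `collate`, `FIRST` and `SECOND`, and the sample tables (hand-built for the string `AB/CA` with `/` as the special character) are hypothetical.

```python
# Sample tables for the registered string "AB/CA" (illustrative only).
FIRST = {"AB": [(1, 1)], "CA": [(1, 2)]}   # adjacent non-special characters
SECOND = {"BC": [(1, 1)]}                  # characters around one special character


def collate(query, first, second, special=frozenset("/")):
    """Return True if `query` occurs contiguously according to the tables."""
    # Split the query into links, each looked up in the first or second table.
    links = []
    i = 0
    while i < len(query) - 1:
        a, b = query[i], query[i + 1]
        if a in special:
            raise ValueError("query must not start a chain on a special character")
        if b not in special:
            links.append((first, a + b))
            i += 1
        elif i + 2 < len(query) and query[i + 2] not in special:
            links.append((second, a + query[i + 2]))
            i += 2
        else:
            raise ValueError("special character must sit between two ordinary characters")
    if not links:
        raise ValueError("query needs at least two ordinary characters")

    current = None  # ordinals of the character shared with the next chain
    for table, key in links:
        pairs = table.get(key, [])
        if current is not None:
            pairs = [p for p in pairs if p[0] in current]
        current = {n2 for _, n2 in pairs}
        if not current:
            return False
    return True
```

With these tables, `collate("B/CA", FIRST, SECOND)` succeeds, while `collate("BC", FIRST, SECOND)` fails because `BC` is recorded only in the table for chains with an inserted special character.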

3. A computer-readable recording medium on which search data used for full-text search is recorded, wherein the search data is obtained by converting a special character specified in advance in a search target character string into a character that is not a search target in accordance with a character adjacent to the special character, detecting all two-character chains in the converted character string, including those containing the character that is not a search target, and recording, for each two-character chain, the numbers of appearances in the search target character string of the first character and the second character constituting the two-character chain as a set.

4. A character string collation device comprising: the computer-readable recording medium according to claim 3 on which search data used for full-text search is recorded; character string conversion means for converting a special character specified in a search character string into a character that is not a search target based on an adjacent character, under the same rule as that applied to the data recorded on the recording medium; two-character chain detecting means for detecting, in the character string converted by the character string conversion means, two-character chains including those containing the character that is not a search target; and comparing means for searching the recording medium for the two-character chains detected by the two-character chain detecting means and judging, by comparing the corresponding numbers of appearances, whether the character chains continue as the search character string.
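The conversion shared by claims 3 and 4 can be sketched as follows. The claims do not state the conversion rule, so the mapping below is purely an assumption for illustration: each special character is replaced by a code point in the Unicode Private Use Area (U+E000 to U+F8FF) derived from the preceding character, so that one frequent special character is spread over many substitute characters. The name `convert_special` and the use of `/` as the special character are hypothetical.

```python
PUA_START = 0xE000  # first code point of the Unicode Private Use Area
PUA_SIZE = 0x1900   # number of code points in U+E000..U+F8FF


def convert_special(text, special=frozenset("/")):
    """Replace each special character by a substitute chosen from its neighbour.

    Hypothetical rule: the substitute depends on the preceding character
    (or on the special character itself when it is the first character).
    """
    out = []
    for i, ch in enumerate(text):
        if ch in special:
            neighbour = text[i - 1] if i > 0 else ch
            out.append(chr(PUA_START + ord(neighbour) % PUA_SIZE))
        else:
            out.append(ch)
    return "".join(out)
```

Applying the same function to both the registered text and the search character string keeps the two sides comparable, which is the point of the "same rule" wording in claim 4.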

5. A computer-readable recording medium on which search data used for full-text search is recorded, wherein the search data is obtained by converting a special character specified in advance in a search target character string into two characters that are not search targets in accordance with a character adjacent to the special character, detecting all two-character chains in the converted character string, including those containing the characters that are not search targets, and recording, for each two-character chain, the numbers of appearances in the search target character string of the first character and the second character constituting the two-character chain as a set.

6. A character string collation device comprising: the computer-readable recording medium according to claim 5 on which search data used for full-text search is recorded; character string conversion means for converting a special character specified in a search character string into two characters that are not search targets based on an adjacent character, under the same rule as that applied to the data recorded on the recording medium; two-character chain detecting means for detecting, in the character string converted by the character string conversion means, two-character chains including those containing the two characters that are not search targets; and comparing means for searching the recording medium for the two-character chains detected by the two-character chain detecting means and judging, by comparing the corresponding numbers of appearances, whether the character chains continue as the search character string.

7. A computer-readable recording medium on which search data used for full-text search is recorded, wherein the search data comprises: third data obtained by detecting two-character chains for all characters in a search target character string and recording, for each two-character chain, as a set for the first character and the second character constituting the two-character chain, the number of appearances for a character other than a special character specified in advance and a certain numerical value specified in advance for the special character; and fourth data obtained by detecting, in the search target character string, all three-character chains consisting of three characters with the special character specified in advance in the middle and recording, for each three-character chain, the numbers of appearances in the search target character string of the first character and the third character constituting the three-character chain as a set; the third data and the fourth data being recorded separately.

8. A character string collation device comprising: the computer-readable recording medium according to claim 7 on which search data used for full-text search is recorded; first character chain detecting means for detecting, from a search character string, all two-character chains consisting of characters other than the special character specified in advance; third character chain detecting means for detecting, from the search character string, all character chains consisting of three characters into which the special character specified in advance is inserted; and comparing means for searching the first data recorded on the recording medium for the two-character chains detected by the first character chain detecting means, searching for the character chains detected by the third character chain detecting means, and judging, by comparing the numbers of appearances corresponding to the detected character chains, whether the character chains continue as the search character string.

9. A computer-readable recording medium on which search data used for full-text search is recorded, wherein the search data comprises: fifth data obtained by detecting, in a search target character string, all two-character chains consisting of characters other than a special character specified in advance and recording, for each two-character chain, the numbers of appearances in the search target character string of the first character and the second character constituting the two-character chain as a set; and sixth data obtained by detecting, in the search target character string, all three-character chains consisting of three characters with the special character specified in advance in the middle and recording, for each three-character chain, two sets, namely a set of the number of appearances of the first character constituting the three-character chain and the value 0, and a set of the value 0 and the number of appearances of the third character; the fifth data and the sixth data being recorded separately.

10. A character string collation device comprising: the computer-readable recording medium according to claim 9 on which search data used for full-text search is recorded; first character chain detecting means for detecting, from a search character string, all two-character chains consisting of characters other than the special character specified in advance; third character chain detecting means for detecting, from the search character string, all character chains consisting of three characters into which the special character specified in advance is inserted; and comparing means for searching the first data recorded on the recording medium for the two-character chains detected by the first character chain detecting means, searching for the character chains detected by the third character chain detecting means, and judging, by comparing the numbers of appearances corresponding to the detected character chains, whether the character chains continue as the search character string.

11. A computer-readable recording medium on which search data used for full-text search is recorded, wherein the search data comprises: seventh data obtained by detecting, in a search target character string, all two-character chains consisting of characters other than a special character specified in advance and recording, for each two-character chain, the numbers of appearances in the search target character string of the first character and the second character constituting the two-character chain as a set; and eighth data obtained by detecting, in the search target character string, all three-character chains consisting of three characters with the special character specified in advance in the middle, converting, for each three-character chain, the special character that is its second character into the same character as the third character, setting the number of appearances of the second character to the same value as the number of appearances of the third character, generating a two-character chain consisting of the first and second characters and a two-character chain consisting of the second and third characters, and recording, for each generated two-character chain, the numbers of appearances in the search target character string of the first character and the second character constituting the two-character chain as a set; the seventh data and the eighth data being recorded separately.
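The generation of the eighth data in claim 11 can be sketched as below, again assuming that the "number of appearances" is each character's running occurrence ordinal. For a three-character chain whose middle character is special, the middle character takes on the identity and ordinal of the third character, and two ordinary two-character chains are emitted. The function name `build_eighth_data` and the special character `/` are illustrative assumptions.

```python
from collections import defaultdict


def build_eighth_data(text, special=frozenset("/")):
    """Return the table of two-character chains generated from
    three-character chains with a special character in the middle."""
    seen = defaultdict(int)
    ordinals = []
    for ch in text:
        seen[ch] += 1
        ordinals.append(seen[ch])

    eighth = defaultdict(list)
    for i in range(len(text) - 2):
        a, mid, c = text[i], text[i + 1], text[i + 2]
        if a not in special and mid in special and c not in special:
            n_a, n_c = ordinals[i], ordinals[i + 2]
            # The special character is converted to `c` with c's ordinal.
            eighth[a + c].append((n_a, n_c))  # first + converted second
            eighth[c + c].append((n_c, n_c))  # converted second + third
    return dict(eighth)
```

For `AB/CA` the three-character chain `B/C` produces the chains `BC` and `CC`, each with the ordinal pair `(1, 1)`; the seventh data would be built in the usual way from adjacent ordinary characters.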

12. A character string collation device comprising: the computer-readable recording medium according to claim 11 on which search data used for full-text search is recorded; first character chain detecting means for detecting, from a search character string, all two-character chains consisting of characters other than the special character specified in advance; fourth character chain detecting means for detecting, from the search character string, all three-character chains consisting of three characters into which the special character specified in advance is inserted; and comparing means for searching the first data recorded on the recording medium for the two-character chains detected by the first character chain detecting means, generating two-character chains by converting the special character in the chains detected by the fourth character chain detecting means, and judging, by comparing the numbers of appearances corresponding to each two-character chain, whether the character chains continue as the search character string.

13. A computer-readable storage medium storing search data used for full-text search, wherein the search data is obtained by detecting all two-character chains in a search target character string and comprises: ninth data in which, when a two-character chain consists of characters other than a special character specified in advance, the numbers of appearances of its first character and second character are recorded as a set; and tenth data in which, when a two-character chain includes the special character specified in advance, the number of appearances of the first or second character corresponding to the special character is represented by a value not exceeding a maximum number of appearances specified in advance, namely the remainder of division by the maximum value, the maximum value when the remainder is 0, the maximum value together with the remainder, or a value such that the first number of appearances does not exceed the maximum value and the second and subsequent values do not exceed the maximum value while remaining unique in combination with the first value and the order, and is stored as a set with the number of appearances of the character that is not the special character, the sets being sorted by the type of the second character when the first character is the special character; the ninth data and the tenth data being stored separately.

14. A character string collation device comprising: the computer-readable storage medium according to claim 13 storing search data used for full-text search; fifth character chain detecting means for detecting, from a search character string, all two-character chains that do not include the special character; sixth character chain detecting means for detecting, from the search character string, all character chains that include the special character; comparing means for judging, for a two-character chain detected by the fifth character chain detecting means, whether the character chains continue as the search character string by comparing the numbers of appearances corresponding to the detected character chain; and comparing means for judging, for a character chain detected by the sixth character chain detecting means, whether the character chains continue as the search character string by comparing the numbers of appearances of the detected character chain and the number of appearances of the special character.

15. A computer-readable storage medium on which search data used for full-text search is recorded, wherein the search data is obtained by detecting all two-character chains in a search target character string and comprises, for each two-character chain, character chain data composed of a document number and a set of the number of appearances, or an arbitrary value, for each character of the two-character chain, the character chain data comprising: first character chain data configured so that, when the character chain does not include a special character specified in advance, the sizes for storing the number of appearances of the first character and the number of appearances of the second character are equal, and, when the character chain includes the special character specified in advance, the size for storing the number of appearances corresponding to the special character is larger than the size for storing the arbitrary value corresponding to the character that is not the special character; and second character chain data configured so that, when the character chain includes the special character specified in advance as its first character, a specified value is stored for the second character, and the first character of the next consecutive character chain data is equal to the value specified for the second character of the preceding character chain data.

16. A character string collation device comprising: the computer-readable storage medium according to claim 15 storing search data used for full-text search; fifth character chain detecting means for detecting, from a search character string, all two-character chains that do not include the special character; comparing means for judging, when a two-character chain does not include the special character, whether the character chains continue as the search character string by comparing, in the character chain data corresponding to the consecutive character chains detected by the fifth character chain detecting means, the number of appearances of the second character of the detected character chain data with the number of appearances of the first character of the character chain data of the character chain that follows it; and means for, when a character chain includes the special character, comparing the numbers of appearances of the characters in the character chain data corresponding to the consecutive character chains found by the character chain detecting means in the same manner as the comparing means, storing the number of appearances of the specified special character at the time of comparison, comparing the character chains on the condition that the number of appearances does not overlap except between consecutive character chains, and judging whether the character chains continue as the search character string.

17. A computer-readable storage medium storing search data used for full-text search, wherein the search data comprises: eleventh data obtained by detecting, in a search target character string, all two-character chains that do not include a special character specified in advance and recording, for each two-character chain, the numbers of appearances of the first character and the second character constituting the two-character chain as a set; and twelfth data recorded, for the two-character chains before and after the special character, as a set of the number of appearances of the first character of the two-character chain preceding the special character and the number of appearances of the first character of the two-character chain following the special character, or as a set of the number of appearances of the first character of the two-character chain preceding the special character and the number of appearances of the character immediately after the special character; the eleventh data and the twelfth data being recorded separately.

18. A character string collating device comprising: a computer-readable storage medium on which the search data used for full-text search according to claim 17 is recorded; first character chain detecting means for detecting, from a search character string, two-character chains consisting of characters other than the special characters designated in advance; seventh character chain detecting means for detecting, for the two-character chains before and after a special character designated in advance, either a character chain that combines the first character of the two-character chain preceding the special character with the first character of the two-character chain following the special character, or a character chain that combines the first character of the two-character chain preceding the special character with the character immediately after the special character; and comparing means for searching the eleventh data and the twelfth data recorded on the storage medium for the character chains detected by the first character chain detecting means and the seventh character chain detecting means, and determining whether the character chains continue as the search character string by comparing the numbers of appearances corresponding to the character chains.

19. A computer-readable recording medium on which search data used for full-text search is recorded, wherein, for a search target character string that includes a special character specified in advance, the appearance position of each character is obtained by counting only the characters other than the special character specified in advance, and the search data comprises: thirteenth data in which all two-character chains consisting of characters other than the special character specified in advance are detected from the search target character string and, for each two-character chain, the appearance position of the first character forming the chain is recorded as the appearance position in the search target character string; and fourteenth data in which all character chains into which the special character specified in advance has been inserted are detected and the appearance position of the first character constituting each such chain is recorded as the appearance position in the search target character string; and wherein the thirteenth data and the fourteenth data are recorded separately.

20. A character string collating device comprising: a computer-readable recording medium on which the search data used for full-text search according to claim 19 is recorded; first character chain detecting means for detecting, from a search character string, all two-character chains consisting of characters other than the special characters specified in advance; eighth character chain detecting means for detecting, from the search character string, all character chains into which a special character specified in advance has been inserted; and comparing means for searching the thirteenth data recorded on the recording medium for the character chains detected by the first character chain detecting means, searching the fourteenth data recorded on the recording medium for the character chains detected by the eighth character chain detecting means, and determining whether the character chains continue as the search character string by comparing the appearance positions corresponding to the character chains.
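
The arrangement of claims 19 and 20 can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the implementation disclosed in the specification: the special character is assumed to be "/", a chain that spans a special character is keyed by its two flanking characters, and the names `scan_chains`, `build_index`, and `find` are hypothetical. Positions are counted over non-special characters only, and the two tables stand in for the thirteenth and fourteenth data.

```python
from collections import defaultdict

SPECIAL = {"/"}  # assumption: the special character(s) specified in advance


def scan_chains(text, special=SPECIAL):
    """Yield (chain, position_of_first_char, crossed_special).

    Positions count only non-special characters. `crossed_special` is True
    when a special character sat between the two characters of the chain.
    """
    pos = 0          # position of the current non-special character
    prev = None      # previous non-special character
    crossed = False  # was a special character seen since `prev`?
    for ch in text:
        if ch in special:
            crossed = True
            continue
        if prev is not None:
            yield prev + ch, pos - 1, crossed
        prev = ch
        pos += 1
        crossed = False


def build_index(text, special=SPECIAL):
    """Return (plain, bridged): stand-ins for the thirteenth and fourteenth data."""
    plain, bridged = defaultdict(set), defaultdict(set)
    for chain, p, crossed in scan_chains(text, special):
        (bridged if crossed else plain)[chain].add(p)
    return plain, bridged


def find(query, plain, bridged, special=SPECIAL):
    """Return start positions where every chain of the query lines up."""
    candidates = None
    for chain, offset, crossed in scan_chains(query, special):
        table = bridged if crossed else plain
        starts = {p - offset for p in table.get(chain, ())}
        candidates = starts if candidates is None else candidates & starts
        if not candidates:
            return []
    return sorted(candidates or [])


plain, bridged = build_index("ab/cdxabcd")
print(find("ab/cd", plain, bridged))  # query containing the special character
print(find("abcd", plain, bridged))   # query without it
```

Because the two tables are kept apart, a query containing the special character only matches places where the special character actually occurred, and the special character itself never receives a posting list of its own. Leading and trailing special characters in the query are ignored in this sketch.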

21. A computer-readable recording medium on which search data used for full-text search is recorded, wherein the search data is obtained by: converting, in a search target character string that includes a special character specified in advance, the specified special character into a character that is not a search target according to the adjacent characters; detecting, from the converted character string, all two-character chains including the characters that are not search targets; and recording the appearance position of the first character forming each character chain as the appearance position in the search target character string.

22. A character string collating device comprising: a computer-readable recording medium on which the search data used for full-text search according to claim 21 is recorded; character string conversion means for converting a specified special character in a search character string into a character that is not a search target based on the adjacent characters, under the same rules as were applied to the data recorded on the recording medium; two-character chain detecting means for detecting, in the character string converted by the character string conversion means, the two-character chains including the characters that are not search targets; and comparing means for searching the recording medium for the two-character chains detected by the two-character chain detecting means and determining whether the character chains continue as the search character string by comparing the corresponding appearance positions.
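
The conversion step in claims 21 and 22 can be sketched as follows. The claims do not state the conversion rule, so the rule here is purely an assumption for illustration: each special character is replaced by a Unicode Private Use Area code point derived from the character before it, so that special characters in different contexts map to different placeholders and no single chain accumulates every occurrence. The names `convert`, `build_index`, and `find` are hypothetical.

```python
from collections import defaultdict

SPECIAL = {"/"}                        # assumption: special character(s)
PUA_START, PUA_SIZE = 0xE000, 0x1900   # BMP Private Use Area U+E000..U+F8FF


def convert(text, special=SPECIAL):
    """Replace each special character with a placeholder chosen from the
    preceding character (an assumed rule, not the one in the specification)."""
    out = []
    prev = ""
    for ch in text:
        if ch in special:
            seed = ord(prev) if prev else 0
            out.append(chr(PUA_START + seed % PUA_SIZE))
        else:
            out.append(ch)
        prev = ch
    return "".join(out)


def build_index(text, special=SPECIAL):
    """Map every two-character chain of the converted text to its positions."""
    conv = convert(text, special)
    index = defaultdict(set)
    for i in range(len(conv) - 1):
        index[conv[i:i + 2]].add(i)
    return index


def find(query, index, special=SPECIAL):
    """Convert the query by the same rule, then intersect chain positions."""
    conv = convert(query, special)
    candidates = None
    for off in range(len(conv) - 1):
        starts = {p - off for p in index.get(conv[off:off + 2], ())}
        candidates = starts if candidates is None else candidates & starts
        if not candidates:
            return []
    return sorted(candidates or [])


index = build_index("ab/cd/ef")
print(find("b/c", index), find("d/e", index), find("b/e", index))
```

The query must be converted with exactly the same rule as the indexed text, which is the point of the "same rules" limitation in claim 22. A limitation of this particular assumed rule is that a query beginning with the special character has no preceding character to derive the placeholder from.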

23. A computer-readable recording medium on which search data used for full-text search is recorded, wherein the search data is obtained by: converting, in a search target character string that includes a special character specified in advance, the specified special character into two characters that are not search targets according to the adjacent characters; detecting, from the converted character string, all two-character chains including the characters that are not search targets; and recording, for each two-character chain, the appearance position of the first character or the second character constituting the chain as the appearance position in the search target character string.

24. A character string collating device comprising: a computer-readable recording medium on which the search data used for full-text search according to claim 23 is recorded; character string conversion means for converting a specified special character in a search character string into two characters that are not search targets based on the adjacent characters, under the same rules as were applied to the data recorded on the recording medium; two-character chain detecting means for detecting, in the character string converted by the character string conversion means, the two-character chains including the two characters that are not search targets; and comparing means for searching the recording medium for the two-character chains detected by the two-character chain detecting means and determining whether the character chains continue as the search character string by comparing the corresponding appearance positions.

25. A computer-readable recording medium on which search data used for full-text search is recorded, wherein the search data comprises: fifteenth data in which, for a search target character string that includes a special character specified in advance, all two-character chains consisting of characters other than the special character are detected and, for each two-character chain, the appearance position of the first character or the second character constituting the chain is recorded as the appearance position in the search target character string, and in which, further, all three-character chains consisting of three characters into which the special character specified in advance has been inserted are detected in the search target character string and, for each such chain, the special character that is the second character of the three-character chain is converted into the same character as the third character, the two-character chain consisting of the first character and the second character is detected, and the appearance position of the first character or the second character constituting that two-character chain is recorded as the appearance position in the search target character string; and sixteenth data in which the two-character chain consisting of the second character and the third character of the three-character chain is detected, the appearance position of the first character of that two-character chain is given the same value as the appearance position of the second character, and the appearance position of the first character or the second character is recorded as the appearance position of the character chain in the search target character string; and wherein the fifteenth data and the sixteenth data are recorded separately.
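
One possible reading of claim 25 can be sketched in Python. This is an interpretation, not a confirmed description of the patented data layout: the special character is assumed to be "/", positions are assumed to count non-special characters only, the first-character position is the one recorded, and the names `build_tables` and `special_between` are hypothetical. For a three-character chain such as "b/c", the special character is replaced by the third character to give "bcc"; the chain "bc" goes into the stand-in for the fifteenth data, and the chain "cc", with both characters sharing one position, goes into the stand-in for the sixteenth data.

```python
from collections import defaultdict

SPECIAL = {"/"}  # assumption: the special character(s) specified in advance


def build_tables(text, special=SPECIAL):
    """Return (fifteenth, sixteenth) as chain -> set of positions."""
    fifteenth, sixteenth = defaultdict(set), defaultdict(set)
    pos = 0          # position among non-special characters
    prev = None      # previous non-special character
    crossed = False  # special character seen since `prev`?
    for ch in text:
        if ch in special:
            crossed = True
            continue
        if prev is not None:
            # Ordinary chains and converted first+second chains share a table.
            fifteenth[prev + ch].add(pos - 1)
            if crossed:
                # Converted second+third chain: both characters carry the
                # position of the character after the special character.
                sixteenth[ch + ch].add(pos)
        prev = ch
        pos += 1
        crossed = False
    return fifteenth, sixteenth


def special_between(fifteenth, sixteenth, a, b):
    """Positions where `a` is followed by a special character and then `b`."""
    marks = sixteenth.get(b + b, set())
    return sorted(p for p in fifteenth.get(a + b, ()) if p + 1 in marks)


fifteenth, sixteenth = build_tables("ab/cdbc")
print(special_between(fifteenth, sixteenth, "b", "c"))
```

Under this reading, a query without the special character can be answered from the fifteenth data alone, while a query that contains it additionally checks the sixteenth data at the adjacent position.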

26. A character string collating device comprising: a computer-readable recording medium on which the search data used for full-text search according to claim 25 is recorded; first character chain detecting means for detecting, from a search character string, all two-character chains consisting of characters other than the special characters specified in advance; tenth character chain detecting means for detecting, from the search character string, all three-character chains consisting of three characters into which a special character specified in advance has been inserted; and comparing means for searching the fifteenth data recorded on the recording medium for the two-character chains detected by the first character chain detecting means, converting the special character in the character chains detected by the tenth character chain detecting means to generate two-character chains, and comparing the appearance positions corresponding to each two-character chain to determine whether the character chains continue as the search character string.

27. A computer-readable storage medium storing search data used for full-text search, wherein: the search data is obtained by detecting two-character chains and character positions in a search target character string and forming, as two-character chain information, a set of the document number of the document composed of the search target character string, the two-character chain, and the character position; the character positions of the two-character chain information are numbered in ascending order from the beginning of the search target character string, excluding the positions of special characters specified in advance; the storage medium stores the character chain information for each first character of the character chain; the search data comprises seventeenth data recorded, for a two-character chain that does not include a special character, as a set of the two-character chain, the character position of the first character among the character positions excluding the special characters, and the document number; eighteenth data consisting of a two-character chain combining the character immediately before a special character with the special character, an arbitrary fixed value defined by the character type of the special character, and the document number; nineteenth data consisting of a two-character chain combining the special character with the character immediately after the special character, the character position of the second character among the character positions excluding the special characters, and the document number; and twentieth data consisting of a two-character chain combining the characters immediately before and after the special character, the character position of the first character among the character positions excluding the special characters, and the document number; the seventeenth data, the eighteenth data, the nineteenth data, and the twentieth data are sorted and stored as character chain information; and, among the seventeenth data, the eighteenth data, and the nineteenth data, when the first characters of two character chains are the same and the second character is a special character, the eighteenth data is stored immediately after the seventeenth data.
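
The four record types of claim 27 can be sketched as tagged tuples. This is a rough illustration under several assumptions: the special character is "/", the "arbitrary fixed value" is the hypothetical constant -1, the tags "17th" to "20th" and the function name `build_records` are invented for the example, and the sort below is a plain sort by chain, document number, and value. The claim's finer ordering rule (eighteenth data immediately after seventeenth data) is not reproduced.

```python
# Hypothetical fixed value per special character type (assumption).
FIXED_VALUE = {"/": -1}


def build_records(doc_no, text):
    """Return sorted (kind, chain, value, doc_no) records for one document.

    Character positions are numbered from 0, skipping special characters.
    """
    records = []
    pos = 0         # position among non-special characters
    prev = None     # previous non-special character
    pending = None  # special character seen since `prev`, if any
    for ch in text:
        if ch in FIXED_VALUE:
            if prev is not None and pending is None:
                # character before the special character + special character
                records.append(("18th", prev + ch, FIXED_VALUE[ch], doc_no))
            pending = ch
            continue
        if pending is not None:
            # special character + following character, position of the latter
            records.append(("19th", pending + ch, pos, doc_no))
            if prev is not None:
                # characters on both sides of the special character
                records.append(("20th", prev + ch, pos - 1, doc_no))
        elif prev is not None:
            # ordinary chain, position of its first character
            records.append(("17th", prev + ch, pos - 1, doc_no))
        prev, pending = ch, None
        pos += 1
    return sorted(records, key=lambda r: (r[1], r[3], r[2]))


for record in build_records(1, "ab/c"):
    print(record)
```

Sorting by chain groups the records by first character, which is in the spirit of storing the character chain information for each first character of the character chain.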

28. A character string collating device comprising: a computer-readable storage medium storing the search data used for full-text search according to claim 27; eleventh character chain detecting means for detecting, from a search character string, all two-character chains excluding the special character and the characters before and after the special character; twelfth character chain detecting means for detecting, from the search character string, a two-character chain consisting of the characters immediately before and after a special character, a two-character chain consisting of the character immediately before the special character and the special character, and a two-character chain consisting of the special character and the character immediately after the special character; comparing means for determining whether the character chains continue as the search character string by comparing the difference in character position and the document numbers between the seventeenth data corresponding to the two-character chain detected by the first character chain detecting means and the twentieth data corresponding to the character chain detected by the twelfth character chain detecting means; and comparing means for determining whether the character chains continue as a search character string including a special character by the eighteenth data following immediately after the twentieth data.

29. A computer-readable storage medium storing search data used for full-text search, wherein: the search data is obtained by detecting two-character chains and character positions in a search target character string and forming, as two-character chain information, a set of the document number of the document composed of the search target character string, the two-character chain, and the character position; the storage medium stores the character chain information for each first character of the character chain; the character positions of the two-character chain information are numbered in ascending or descending order from the beginning of the search target character string, excluding the positions of special characters specified in advance; the search data comprises twenty-first data recorded as a set of the first character and the second character, the character position of the first character, and the document number; twenty-second data which is character chain information including a special character and which, for the character immediately before the special character, consists of a character chain combining the character immediately before the special character with the character immediately after the special character, the character position immediately before the special character, and the document number, and which, when the first character and the second character of its character chain match the first character or the second character of the character chain of character chain information that does not include a special character, is recorded separately after or before that character chain information; twenty-third data which is character chain information including a special character and which, for the character immediately after the special character, consists of a character chain combining the character immediately after the special character with the character that follows it, the character position immediately after the special character, and the document number, and which, when the first character of its character chain matches the first character of a two-character chain that does not include a special character, is recorded separately after or before the character chain information that does not include a special character; and twenty-fourth data which is character chain information including a special character and which consists of a character chain combining the character two positions before the special character with the character after the special character, the character position two positions before the special character, and the document number; and the twenty-first data, the twenty-second data, the twenty-third data, and the twenty-fourth data are stored separately.

30. A character string collating device comprising: a computer-readable storage medium storing the search data used for full-text search according to claim 29; thirteenth character chain detecting means for detecting, from a search character string, all character chains other than the characters before and after a special character; fourteenth character chain detecting means which, when the search character string has a special character sandwiched between characters, detects the characters immediately before and after the special character as a character chain and marks the second character of the character chain as being the character immediately after the special character, which, when the beginning of the search character string is a special character, detects the character immediately after the special character and the next character as a character chain and marks the character chain as being immediately after the special character, and which, when the special character appears at or after the third character from the beginning of the search character string, detects the character two positions before the special character and the character immediately after the special character as a character chain and marks the second character of the character chain as being the character immediately after the special character, and detects the character immediately after the special character and the next character as a character chain and marks the first character of the character chain as being the character immediately after the special character; comparing means for determining, when the search character string is composed of the two-character chains detected by the thirteenth character chain detecting means, whether the character chains continue as the search character string by comparing the character positions and the document numbers corresponding to the detected character chains; and comparing means for determining, when the search character string is composed of the two-character chains detected by the fourteenth character chain detecting means, whether the character chains continue as the search character string by determining, based on the character chain and the document number, whether they match the character chain information of the twenty-first data to the twenty-fourth data.