JPH03260868A - Information processor

Info

Publication number: JPH03260868A
Application number: JP2060180A
Authority: JP
Inventors: Chuichi Kikuchi; 菊池　忠一
Original assignee: TEREMATEIIKU KOKUSAI KENKYUSHO KK
Current assignee: TEREMATEIIKU KOKUSAI KENKYUSHO KK
Priority date: 1990-03-12
Filing date: 1990-03-12
Publication date: 1991-11-20
Anticipated expiration: 2010-08-16
Also published as: JPH0776973B2

Abstract

PURPOSE: To retrieve a large quantity of data at high speed for plural search inputs by allocating a record in a search file to each keyword and taking the registered information common to the search inputs out of the record corresponding to the keyword with the lowest registration frequency. CONSTITUTION: The apparatus comprises a CPU 1 that performs various arithmetic and decision processing, a memory 2, a keyboard 4, a display 5, an input/output part 3, an external storage control part 6, and a common bus 8. The registration frequency of each keyword to be registered is counted to produce a table of keywords and registration frequencies. A search file is then produced in which each keyword's record stores, in pairs, the registered information and the registered keywords whose registration frequency is equal to or higher than that of the record's keyword. At search time, the keyword with the lowest registration frequency among the input keywords is looked up in the keyword table, and the registered information common to the plural input keywords is taken out of its record. Thus a large quantity of information can be retrieved at high speed using plural search inputs.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Industrial Application Field]

The present invention relates to information retrieval using a plurality of keywords in an information processing apparatus, and is particularly suited to searching a large amount of information at high speed.

[Overview]

The present invention concerns an information processing apparatus that searches information having a plurality of keywords by matching one or more search keywords against it. A record in a search file is allocated to each keyword. For each keyword of a piece of registered information, the record stores every keyword of that information whose registration frequency is equal to or higher than that of the record's own keyword. The registered information common to the search inputs is then taken from the record of the search keyword with the lowest registration frequency, so that a large amount of data can be searched at high speed for multiple search inputs.

[Conventional technology]

Conventionally, information retrieval in an information processing apparatus has often used the fully inverted file method for one or more arbitrary search inputs. In this method an inverted file is prepared for every keyword attribute. The records corresponding to the search inputs are retrieved from the inverted files for the attributes of those inputs, and the search is completed by extracting the information common to the search inputs from the retrieved records.

[Problem to be solved by the invention]

However, a search using inverted files requires access time to the inverted files in proportion to the number of search inputs. In addition, when a large amount of information is searched, extracting the information common to the retrieved records takes time. The method is therefore not suited to high-speed searching of large amounts of information.

An example illustrates this.

Consider building a database system for managing the books in a library. Each stored book has several keywords, such as author, publisher, and subject, and a search uses any of these keywords. If the inverted file method is used with three search inputs (author name, publisher, and subject), the records corresponding to each of the three inputs must be retrieved. The search time is therefore proportional to the number of search inputs, and a search with multiple inputs takes a long time.

Furthermore, when a large amount of information is searched, each retrieved record holds a very large amount of data, so extracting the information common to the records takes time.

The present invention is based on two observations. First, a search for multiple search inputs is fast if it reads only one record. Second, when common information is extracted from multiple records, most of the time is spent on the first two records, so the search becomes faster if the common information between the first two records is extracted in advance, at registration time. The object of the invention is to provide an information processing apparatus and an information search method that can search a large amount of information at high speed for multiple search inputs.

[Means to solve the problem]

The present invention is an information processing apparatus that includes a storage device with an area for storing search keywords and a search processing device that executes a search according to a search input, in which search keywords extracted from the registered information to be searched are registered, and a search is performed by matching the registered keywords against the search input. The apparatus is characterized by comprising: first means for counting the registration frequency of each registered keyword; second means for creating a keyword table that associates each registered keyword with its registration frequency; and third means for creating, with reference to the first and second means, a search file made up of records, one per registered keyword, in which each record stores pairs of registered keyword and registered information, covering all keywords of the registered information to which the record's keyword belongs whose registration frequency is equal to or higher than that of the record's keyword.

The invention is further characterized in that the search processing device includes means for finding, in the keyword table, the keyword with the lowest registration frequency among the plurality of input search keywords, searching the corresponding record of the search file using that keyword as a directory, and extracting from that record, as the search result, the registered information common to the plurality of input search keywords.

[Effect]

The information processing apparatus of the present invention has two operations: keyword registration, and search, which matches the search input against the registered keywords.

In keyword registration, the keywords extracted from the registered information to be searched, and associated with that information, are first counted. A keyword table is created that associates each keyword with its identification code and its registration frequency.

Next, a search file is created. It consists of one record for each registered keyword, and each record stores pairs of a keyword identification code and registered information. A record holds the pairs for those keywords whose registration frequency is equal to or higher than that of the record's own keyword.

When keywords are given as a search input, the keyword with the lowest registration frequency among the input keywords is found, and the record of the search file for that keyword is read. From that record, the pairs of keyword identification code and registered information for all of the input keywords are extracted. The registered information common to all of the search inputs is then the desired search result.

In this way, the present invention achieves a fast search for multiple search inputs by reading only the record of the search file that corresponds to the keyword with the lowest registration frequency.
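The registration and search operations described in this section can be sketched in a few lines of Python. This is an illustrative sketch only, not the patent's implementation: it keeps everything in in-memory dictionaries instead of a search file, uses keyword strings in place of identification codes, and the names `build_search_file` and `search` are hypothetical.

```python
from collections import Counter, defaultdict

def build_search_file(books):
    """books: dict mapping an information number to its keywords."""
    # Registration frequency: number of books that carry each keyword.
    freq = Counter(kw for kws in books.values() for kw in set(kws))
    search_file = defaultdict(list)
    for info_no, kws in books.items():
        kws = set(kws)
        for record_kw in kws:
            # The record of record_kw stores (keyword, information number)
            # pairs for every keyword of this book whose frequency is
            # equal to or higher than that of record_kw.
            for other in kws:
                if freq[other] >= freq[record_kw]:
                    search_file[record_kw].append((other, info_no))
    return freq, search_file

def search(freq, search_file, query):
    """Return the information numbers that carry every keyword in query."""
    if not query or any(kw not in freq for kw in query):
        return set()
    # Read only the record of the least frequently registered keyword.
    rarest = min(query, key=lambda kw: freq[kw])
    hits = defaultdict(set)
    for kw, info_no in search_file[rarest]:
        hits[kw].add(info_no)
    # Keep the information numbers common to all search keywords.
    return set.intersection(*(hits[kw] for kw in query))

books = {
    1: ["Tanaka", "Iwanami", "history"],
    2: ["Tanaka", "Kodansha", "history"],
    3: ["Sato", "Iwanami", "history"],
}
freq, search_file = build_search_file(books)
result = search(freq, search_file, ["Tanaka", "Iwanami", "history"])
```

Because only the record of the rarest search keyword is read, the intersection runs over the smallest candidate set regardless of how many search keywords are given.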

[Example]

The present invention will now be described in detail with reference to the drawings.

FIG. 1 shows the configuration of an information processing apparatus according to one embodiment of the present invention.

The information processing apparatus of this embodiment includes: a CPU 1 that performs various arithmetic and decision processing; a memory 2 that stores programs for search processing, keyword registration processing, and the like, as well as keywords; an input/output unit 3 to which a keyboard 4 and a display 5 are connected; an external storage device control unit 6 to which an external storage device 7 holding the search file and other information is connected; and a common bus 8 that connects the CPU 1, the memory 2, the input/output unit 3, and the external storage device control unit 6.

Processing in this embodiment is divided into two parts: registration processing, which registers keywords and creates the search file used for searching, and search processing, which responds to a search input. The embodiment is described for the case of a search display for browsing an electronic library.

First, keyword registration and search file creation are described.

FIG. 2 shows the keyword table created during keyword registration.

The keyword table consists of as many keyword areas as there are distinct keywords to be registered. Each keyword area consists of a keyword field 9, a registration number field 10, and a registration frequency field 11. The keyword field 9 is a 4-byte field that stores a keyword of a book to be registered. The registration number field 10 is a 4-byte field that stores the registration number, which indicates the order in which the keyword was registered. The registration frequency field 11 is a 4-byte field that stores how often the keyword is used in book registration. A keyword is a character string entered at registration time; the registration number and the registration frequency are both positive integers.

FIG. 3 shows the hash value frequency table, which consists of hash value frequency fields 12, their number being chosen by the person configuring the system. For example, if there are 1000 hash values, the table has 1000 fields, from the No. 1 hash value frequency field to the No. 1000 hash value frequency field. Each hash value frequency field 12 stores a count of the keywords for which the hash function yields that hash value. This count also gives the number of consecutive keyword areas in the keyword table of FIG. 2 that hold keywords with the same hash value.

For example, if the count in the No. 1 hash value frequency field is 5 and the count in the No. 2 hash value frequency field is 3, then the five keywords from No. 1 to No. 5 in the keyword table correspond to the count of 5 in the No. 1 field, and the three keywords from No. 6 to No. 8 correspond to the count of 3 in the No. 2 field.
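The correspondence in this example is a running sum over the counts. A minimal Python sketch, assuming keyword positions are numbered from 1 as in the text (the function name `keyword_ranges` is hypothetical):

```python
from itertools import accumulate

def keyword_ranges(bucket_counts):
    """Map each hash value's count to (first, last) keyword positions.

    Positions are 1-based. A hash value with a count of 0 yields a
    range whose first position is greater than its last position.
    """
    ends = list(accumulate(bucket_counts))
    starts = [1] + [end + 1 for end in ends[:-1]]
    return list(zip(starts, ends))

ranges = keyword_ranges([5, 3])
```

With the counts 5 and 3 from the example, the first hash value covers keyword positions 1 to 5 and the second covers positions 6 to 8.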

FIG. 4 shows the hash value start address table. It consists of hash value start address fields 13, which are 4-byte fields equal in number to the hash value frequency fields 12 of the hash value frequency table in FIG. 3. Each hash value start address field 13 stores the start address of the keyword areas for the registered keywords stored in the keyword table of FIG. 2. The table thus corresponds entry by entry to the hash value frequency table of FIG. 3; for example, the start address in the No. 1 hash value start address field is the start address, in the keyword table, of the group of keywords counted in the No. 1 hash value frequency field.

FIG. 5 shows the structure of the search file.

The search file consists of as many record areas as there are keyword areas in the keyword table. Each record area consists of a header section 14 and a data section 15.

FIG. 6 shows the structure of a record in the search file. The header section 14 consists of a record number field 16 and a count field 17. The record number field 16 is a 4-byte field that stores the registration number of a keyword as the record number identifying the record. The count field 17 is a 4-byte field that stores the record capacity, that is, the number of keyword areas held in the data section 15.

The data section 15 consists of the number of keyword areas given by the record capacity in the count field 17. Each keyword area consists of a registration number field 18 and an information number field 19. The registration number field 18 is a 4-byte field that stores the registration number of a keyword. The information number field 19 is a 4-byte field that stores the information number of the book that is the search target and corresponds to the keyword in the registration number field 18.
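As an illustration of this record layout, the following Python sketch packs a record as a header of two 4-byte fields followed by pairs of 4-byte fields. The byte order (little-endian), the unsigned integer encoding, and the function names `pack_record` and `unpack_record` are assumptions; the text specifies only the field sizes and their order.

```python
import struct

PAIR = struct.Struct("<II")  # two 4-byte unsigned integers (assumed encoding)

def pack_record(record_no, pairs):
    """Header (record number, count) followed by the data section."""
    out = PAIR.pack(record_no, len(pairs))
    for registration_no, information_no in pairs:
        out += PAIR.pack(registration_no, information_no)
    return out

def unpack_record(buf):
    """Inverse of pack_record: returns (record number, list of pairs)."""
    record_no, count = PAIR.unpack_from(buf, 0)
    pairs = [PAIR.unpack_from(buf, PAIR.size * (i + 1)) for i in range(count)]
    return record_no, pairs

# Record of keyword No. 7 holding two (registration no., information no.) pairs.
raw = pack_record(7, [(7, 100), (9, 100)])
```

Under these assumptions a record with two pairs occupies 8 bytes of header plus 16 bytes of data.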

FIG. 7 shows the record capacity table. It consists of record capacity fields 20, equal in number to the keyword areas of the keyword table in FIG. 2, and the record capacity fields are numbered in ascending order from 1 to match the registration numbers of the keyword table, which also ascend from 1. Each record capacity field 20 is a 4-byte field that stores the capacity of the data section of the corresponding record area of the search file in FIG. 5.

FIG. 8 shows the record start address table, which gives the storage address of each record. It consists of n record start address fields 21, the same number as in the record capacity table, each corresponding to a record capacity field 20. A record start address field 21 stores the start address of a record in the search file; for example, the No. 1 record start address is the start address in the search file of the record corresponding to the No. 1 record capacity field.

Next, keyword registration is described in detail with reference to the flowchart in FIG. 9.

Keyword registration has two broad parts: classifying the keywords and counting their registration frequencies, and creating the search file by storing pairs of registration number and information number in the record for each keyword. It proceeds in four steps. The first step creates the hash value start address table from the hash values of the keywords, the second step creates the keyword table, the third step creates the start addresses for the search file, and the fourth step creates the search file.

The first step is preprocessing for creating the keyword table. The keywords are classified by hash value, and the number of keywords with each hash value and the start address in the keyword table of each group of keywords with the same hash value are calculated.

This operation proceeds as follows.

First, bibliographic information is read from the registration file (S101), and all keywords are extracted from it. The hash value of each keyword is calculated with a hash function, and 1 is added to the corresponding hash value frequency field 12 of the hash value frequency table in FIG. 3 (S102). The same processing is performed for all registered bibliographic information.
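The counting pass can be sketched as follows. The text does not say which hash function is used, so `zlib.crc32` reduced modulo the table size stands in for it here purely as an assumption, and the names `hash_value` and `count_hash_values` are hypothetical.

```python
import zlib

NUM_HASH_VALUES = 1000  # table size chosen by the person configuring the system

def hash_value(keyword, num_hash_values=NUM_HASH_VALUES):
    """Stand-in hash function (assumption): CRC-32 modulo the table size."""
    return zlib.crc32(keyword.encode("utf-8")) % num_hash_values

def count_hash_values(books, num_hash_values=NUM_HASH_VALUES):
    """Build the hash value frequency table from per-book keyword lists."""
    table = [0] * num_hash_values
    for keywords in books:
        for keyword in keywords:
            # Add 1 to the frequency field for this keyword's hash value.
            table[hash_value(keyword, num_hash_values)] += 1
    return table

frequency_table = count_hash_values(
    [["Tanaka", "history"], ["Sato", "history"]], num_hash_values=16
)
```

Each keyword occurrence adds 1 to exactly one field, so the fields sum to the total number of keyword occurrences processed.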

Next, the hash value start address table is created. First, 0 is stored in the No. 1 hash value start address field (S104). Processing then moves to the No. 2 hash value start address field (S105): the count in the No. 1 hash value frequency field is multiplied by 3, the start address 0 of the No. 1 hash value start address field is added, and the result is stored as the start address in the No. 2 hash value start address field (S106). A check is made as to whether start addresses have been stored in as many hash value start address fields as there are hash value frequency fields (S107). If not, the count in the No. 2 hash value frequency field is likewise multiplied by 3, the value in the No. 2 hash value start address field is added, and the result is stored as the start address in the No. 3 hash value start address field. In general, (No. (n-1) hash value frequency field) × 3 + (No. (n-1) hash value start address field) is stored in the No. n hash value start address field. This is repeated for every field of the hash value start address table.
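The start address calculation is a running sum in which each count is weighted by 3. A minimal Python sketch follows; the interpretation that the factor 3 reflects the three fields of a keyword area (keyword, registration number, registration frequency), with addresses counted in fields, is an assumption, and the function name is hypothetical.

```python
FIELDS_PER_KEYWORD_AREA = 3  # assumed meaning of the factor 3 in the text

def hash_start_addresses(frequency_table):
    """Start address of each hash value's group in the keyword table.

    The first address is 0; each later address is the previous count
    times 3 plus the previous address.
    """
    addresses = []
    next_address = 0
    for count in frequency_table:
        addresses.append(next_address)
        next_address += count * FIELDS_PER_KEYWORD_AREA
    return addresses

start_addresses = hash_start_addresses([5, 3, 2])
```

For counts of 5, 3, and 2, the start addresses are 0, 15, and 24.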

In the second step, the keyword table is created.

After the registration number is set to 1, bibliographic information is read from the registration file (S110, S111). A keyword is extracted from the bibliographic information and its hash value is calculated with the hash function (S112). The start address in the hash value start address field corresponding to this hash value is then read from the hash value start address table (S113).

In the first empty keyword area at or after the keyword area at this start address, the keyword extracted from the bibliographic information is stored in the keyword field, the current value of the registration number, which becomes the registration number of this keyword, is stored in the registration number field, and the count from the corresponding hash value frequency field of the hash value frequency table is stored in the registration frequency field (S114). Before the next keyword is processed, 1 is added to the registration number (S115). The same processing is performed for every keyword of the bibliographic information, and then for all bibliographic information.

In the third step, as preprocessing for creating the search file, the capacity of each record of the search file and its start address within the search file are calculated.

First, the record capacity table is created. Bibliographic information for one book is read from the registration file (S118), a keyword is extracted from it, and the hash value of the keyword is calculated with the hash function (S119). The start address in the hash value start address field corresponding to this hash value is read from the hash value start address table (S120), and the keyword areas at that start address are read, their number being given by the count in the corresponding hash value frequency field (S121, S122). From among these keyword areas, the registration number and registration frequency are taken from the keyword area whose keyword matches the keyword being registered (S123). The same processing is performed for the remaining keywords of the bibliographic information.

Among the resulting pairs of registration number and registration frequency, the pair with the lowest registration frequency is found, and the number of pairs is stored in the record capacity field of the record capacity table that corresponds to its registration number (S125).

The pair with the lowest registration frequency is then deleted from the pairs (S126). Among the remaining pairs, the pair with the lowest registration frequency is again found, the number of remaining pairs is stored in the record capacity field corresponding to its registration number, and that pair is deleted (S127). This is repeated until no pairs remain.
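The repeated find-and-delete loop for one book can be sketched as below. The text says only that the number of pairs is stored in the record capacity field; this sketch accumulates the counts across books, which is an assumption made so that a keyword shared by several books reserves room for all of them. The function name and the dictionary representation of the record capacity table are hypothetical.

```python
from collections import defaultdict

def add_book_to_capacity(capacity, pairs):
    """Update the record capacity table for one book.

    pairs: (registration number, registration frequency) for each
    keyword of the book.
    """
    # Ascending frequency: the head of the list is always the pair
    # with the lowest registration frequency among those remaining.
    remaining = sorted(pairs, key=lambda pair: pair[1])
    while remaining:
        registration_no, _ = remaining[0]
        # This keyword's record must hold every pair still remaining.
        capacity[registration_no] += len(remaining)
        remaining.pop(0)  # delete the lowest-frequency pair

capacity = defaultdict(int)
add_book_to_capacity(capacity, [(1, 5), (2, 2), (3, 9)])
add_book_to_capacity(capacity, [(2, 2), (4, 1)])
```

After the first book, keyword No. 2 (the rarest) needs room for three pairs, No. 1 for two, and No. 3 for one; the second book then adds two pairs for No. 4 and one more for No. 2.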

Next, the record start address table is created.

First, 0 is stored in the No. 1 record start address field (S130). Then 1 is added to the capacity in the No. 1 record capacity field of the record capacity table, to account for the header section of the record shown in FIG. 6; the sum is doubled, the 0 in the No. 1 record start address field is added, and the result is stored as the start address in the No. 2 record start address field. Likewise, 1 is added to the capacity in the No. 2 record capacity field, the sum is doubled, the value in the No. 2 record start address field is added, and the result is stored as the start address in the No. 3 record start address field.

In general, the value (No. (n-1) record capacity field + 1) × 2 + (No. (n-1) record start address field) is stored in the No. n record start address field (S132). The same processing is performed for every record capacity field of the record capacity table (S133 to S135).
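Written as a recurrence, with $C_k$ denoting the value in the No. k record capacity field and $A_k$ the value in the No. k record start address field (symbols introduced here for illustration only), the rule and its closed form are:

```latex
A_1 = 0, \qquad
A_n = 2\,(C_{n-1} + 1) + A_{n-1} \quad (n \ge 2)
\;\;\Longrightarrow\;\;
A_n = 2 \sum_{k=1}^{n-1} (C_k + 1)
```

For example, capacities $C_1 = 3$, $C_2 = 2$ give $A_1 = 0$, $A_2 = 2 \cdot 4 = 8$, and $A_3 = 8 + 2 \cdot 3 = 14$. The added 1 accounts for the header section, and the factor 2 matches the two fields in the header and in each keyword area of the data section.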

In the fourth step, the search file is created.

Bibliographic information for one book is read from the registration file (S140), a keyword is extracted from it, and the hash value of the keyword is calculated with the hash function (S141). The start address in the hash value start address field corresponding to this hash value is read from the hash value start address table (S142), and, starting from the keyword area of the keyword table at that start address, the keyword areas are read, their number being given by the count in the corresponding hash value frequency field of the hash value frequency table (S143, S144).

From among these keyword areas, the registration number and registration frequency are taken from the keyword area whose keyword matches the keyword of the bibliographic information (S145). The same processing is performed for the remaining keywords of the bibliographic information.

Among the resulting pairs of registration number and registration frequency, the pair with the lowest registration frequency is found. The start address in the record start address field corresponding to its registration number is read from the record start address table, and in the record of the search file at that address the registration number is stored in the record number field, the number of pairs being processed is added to the count field, and all of the registration numbers and registration frequencies being processed are stored in the free space of the data section (S147, S148).

The pair with the lowest registration frequency is then deleted from the pairs (S149). Among the remaining pairs, the pair with the lowest registration frequency is again found; in the record of the search file at the start address given for its registration number by the record start address table, the registration number is stored in the record number field, the number of remaining pairs is added to the count field, the remaining registration numbers and registration frequencies are stored in the free space of the data section, and the pair with the lowest registration frequency is deleted. This is repeated until no pairs remain (S150, S147 to S149). The same processing is performed for all bibliographic information in the registration file.

FIG. 10 is a flowchart showing the search operation.

First, a keyword is extracted from the search input, and the hash value of this keyword is calculated using a hash function (S201). The start address in the hash start address column corresponding to this hash value is retrieved from the hash start address table (S202), and the calculation frequency corresponding to the calculated hash value is retrieved from the hash value frequency table (S203).

Then, from the keyword areas of the keyword family located at that start address, the keyword areas indicated by the calculation frequency retrieved from the hash value frequency table are extracted (S204). Among the extracted keyword areas, the registration number and registration frequency are retrieved from the keyword area whose keyword column stores a keyword matching the search-input keyword currently being processed (S205). The same processing is performed for the remaining keywords of the search input, narrowing down the registration numbers and registration frequencies of the keywords corresponding to the search input.
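Steps S201 to S205 can be sketched as follows in Python. This is a simplified illustration under assumptions: the hash function toy_hash, the bucket count, and the flat list used for the keyword areas are all hypothetical, since the text does not specify them. The two lists start and count stand in for the hash start address table and the hash value frequency table.

```python
def toy_hash(keyword, buckets=8):
    # Hypothetical hash function; the source does not specify one.
    return sum(keyword.encode("utf-8")) % buckets

def build_keyword_table(entries, buckets=8):
    """entries: list of (keyword, registration number, registration frequency)."""
    families = [[] for _ in range(buckets)]
    for entry in entries:
        families[toy_hash(entry[0], buckets)].append(entry)
    areas, start, count = [], [0] * buckets, [0] * buckets
    for h, family in enumerate(families):
        start[h] = len(areas)   # hash start address table
        count[h] = len(family)  # hash value frequency table
        areas.extend(family)    # keyword areas of one family stored contiguously
    return areas, start, count

def lookup(keyword, areas, start, count, buckets=8):
    h = toy_hash(keyword, buckets)                  # S201
    first = start[h]                                # S202
    n = count[h]                                    # S203
    for kw, reg_no, reg_freq in areas[first:first + n]:  # S204
        if kw == keyword:                           # S205
            return reg_no, reg_freq
    return None  # keyword is not registered

table = build_keyword_table(
    [("ISDN", 50, 100), ("communication", 11, 350), ("OSI", 88, 50)])
```

Calling lookup("OSI", *table) returns the pair (88, 50) used in the example later in the text; an unregistered keyword returns None.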

Among these narrowed-down pairs of registration numbers and registration frequencies, the start address in the record start address column of the record start address table corresponding to the minimum registration frequency, and the capacity indicated by the record capacity corresponding to the registration number with the minimum registration frequency, are retrieved. The record of the search file designated by this start address and record capacity is then retrieved (S207 to S209).

Then, from the data section 16 of this record, the information numbers common to all of the narrowed-down registration numbers are extracted and taken as the search result (S210).

For example, in the record of FIG. 6, if the registration numbers corresponding to the keywords of the search input are "11", "50", and "88", the information number common to them is "100", and this is the information number of the book being searched for.
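The extraction of the common information number in S210 amounts to a set intersection, sketched below in Python. The layout of the data section as (registration number, information number) pairs is an assumption made for illustration, and every information number other than 100 is invented sample data. The function name common_information_numbers is hypothetical.

```python
def common_information_numbers(data_section, registration_numbers):
    """data_section: iterable of (registration number, information number) pairs."""
    by_reg = {}
    for reg_no, info_no in data_section:
        by_reg.setdefault(reg_no, set()).add(info_no)
    # One set of information numbers per requested registration number.
    sets = [by_reg.get(r, set()) for r in registration_numbers]
    return set.intersection(*sets) if sets else set()

# Hypothetical data section; only information number 100 comes from the text.
data_section = [(88, 100), (50, 100), (11, 100), (88, 200), (50, 200), (88, 300)]
result = common_information_numbers(data_section, [11, 50, 88])
```

Here result is {100}, matching the FIG. 6 example in which registration numbers 11, 50, and 88 share information number 100.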

A more specific example is given below.

For example, suppose that bibliographic number "100", registered in a library's registration file, has the keywords "ISDN", "communication", and "OSI". Looking at the registration number and registration frequency of these three keywords, suppose that "ISDN" has registration number "50" and registration frequency "100", "communication" has registration number "11" and registration frequency "350", and "OSI" has registration number "88" and registration frequency "50".

Since the registration frequencies satisfy "OSI" < "ISDN" < "communication", the search file is created as follows. The "OSI" record of the search file stores "OSI", "ISDN", and "communication" together with their registration numbers and the information number (bibliographic number "100"). The "ISDN" record stores "ISDN" and "communication" together with their registration numbers and the information number. The "communication" record stores "communication" together with its registration number and the information number.

When a search is performed and the keywords "OSI", "ISDN", and "communication" are entered as the search input, the registration frequencies of these three keywords are checked, and the record of "OSI", which has the lowest registration frequency, is retrieved. From the data section of this "OSI" record, the information number "100" common to the three registration numbers "88", "50", and "11" of "OSI", "ISDN", and "communication" is extracted, and the bibliographic information with that common information number "100", that is, the book, is retrieved as the search result.
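The worked example can be traced end to end with a short Python sketch. The registration numbers and frequencies are those given in the text; the dictionary layout, the function name search, and the extra information numbers 207 and 315 are assumptions added only so that the intersection has something to filter out.

```python
# Keyword table from the example: keyword -> (registration number, registration frequency).
KEYWORD_TABLE = {"OSI": (88, 50), "ISDN": (50, 100), "communication": (11, 350)}

# Search file: registration number of the record's keyword -> data section of
# (registration number, information number) pairs. Numbers 207 and 315 are hypothetical.
SEARCH_FILE = {
    88: [(88, 100), (50, 100), (11, 100)],
    50: [(50, 100), (11, 100), (50, 207), (11, 207)],
    11: [(11, 100), (11, 207), (11, 315)],
}

def search(keywords):
    if not keywords or any(k not in KEYWORD_TABLE for k in keywords):
        return []  # an unregistered keyword cannot match anything
    pairs = [KEYWORD_TABLE[k] for k in keywords]
    # Only the record of the lowest-frequency keyword is read.
    lowest_reg_no, _ = min(pairs, key=lambda p: p[1])
    record = SEARCH_FILE[lowest_reg_no]
    wanted = {reg_no for reg_no, _ in pairs}
    found = {}
    for reg_no, info_no in record:
        if reg_no in wanted:
            found.setdefault(info_no, set()).add(reg_no)
    # Keep information numbers that appear with every requested registration number.
    return sorted(i for i, regs in found.items() if regs == wanted)
```

In this sketch, search(["OSI", "ISDN", "communication"]) reads only the record for registration number 88 and returns [100], regardless of how many entries the much larger "communication" record holds.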

In this way, a search for multiple search inputs can be completed simply by reading the search record corresponding to the keyword with the lowest registration frequency, so the search can be speeded up.

〔Effect of the invention〕

As explained above, the present invention can complete a search for multiple search inputs using only one record, and it performs at registration time the extraction of common information between the first two records that the search process would otherwise require. It therefore has the excellent effect of speeding up the search process when a large amount of information is searched with multiple search inputs.

[Brief explanation of drawings]

FIG. 1 shows a configuration example of the information processing device of the embodiment. FIG. 2 shows the keyword table of the embodiment. FIG. 3 shows the hash value frequency table of the embodiment. FIG. 4 shows the hash start address table of the embodiment. FIG. 5 shows the search file of the embodiment. FIG. 6 shows the record structure of the search file. FIG. 7 shows the record capacity table of the embodiment. FIG. 8 shows the record start address table of the embodiment. FIG. 9 illustrates the keyword registration processing operation of the embodiment. FIG. 10 illustrates the search processing operation of the embodiment.

1...CPU, 2...memory, 3...input/output section, 4...keyboard, 5...display, 6...external storage device control section, 7...external storage device, 8...common bus.

Claims

1. An information processing device comprising a storage device having an area for storing search keywords and a search processing device that executes search processing according to a search input, in which keywords are registered and a search is performed by comparing the registered keywords with the search input, the information processing device comprising: first means for counting the registration frequency of the registered keywords; second means for creating a keyword table in which the registration frequency of the registered keywords is associated with the keywords; and third means for creating a search file consisting of records in which registered keywords having a registration frequency equal to or higher than that of a given registered keyword are stored in pairs with registered information.

2. An information processing device comprising a storage device having an area for storing search keywords and a search processing device that executes search processing according to a search input, in which search keywords extracted from registered information to be searched are registered and a search is performed by comparing the registered keywords with the search input, the information processing device comprising: first means for counting the registration frequency of the registered keywords; second means for creating a keyword table in which the registration frequency of the registered keywords is associated with the keywords; third means for creating, with reference to the first means and the second means, a search file consisting of records in which, for each registered keyword and for all keywords of the registered information to which that keyword belongs, registered keywords having a registration frequency equal to or higher than that of that keyword are stored in pairs with the registered information; and means for extracting, from the keyword table, the keyword with the lowest registration frequency among a plurality of input search keywords, searching the search file for the corresponding record using this keyword as a directory, and extracting from this record, as a search result, the registered information common to the plurality of input search keywords.