JPH04281565A

JPH04281565A - Document retrieving device

Info

Publication number: JPH04281565A
Application number: JP3069321A
Authority: JP
Inventors: Yasuo Tanosaki; 康雄田野崎; Yukio Nakamoto; 幸夫中本
Original assignee: Toshiba Corp; Toshiba Computer Engineering Corp
Current assignee: Toshiba Corp; Toshiba Computer Engineering Corp
Priority date: 1991-03-08
Filing date: 1991-03-08
Publication date: 1992-10-07
Anticipated expiration: 2014-08-23
Also published as: JP2937519B2

Abstract

PURPOSE: To output retrieval results ranked by inferring the significance of the position at which a character string containing a keyword appears in a document. CONSTITUTION: A document that has been read in is divided into a title part, a preface part, and a main text part. The device determines at which position in the document the keyword entered through the input means appears. The priority of each retrieved document is calculated from the position at which the keyword appears, and the documents are output as the retrieval result in order of the calculated priority.

Description

[Detailed description of the invention]

[Purpose of the invention]

[0001]

[Field of Industrial Application] The present invention relates to a document retrieval device that performs full-text search to extract documents containing an input keyword.

[0002]

[Description of the Related Art] In recent years, documents have been digitized and large volumes of document data are in circulation. When extracting what a user needs from this large volume of document data, the prevailing approach is to search by entering a keyword consisting of a character string. Keyword searches fall broadly into the following two methods.

[0003] (1) A method in which keywords are assigned to every document in advance, and the documents to which the keyword entered by the user has been assigned are extracted.

[0004] (2) A full-text search method that checks whether the text of each document contains the keyword entered by the user.

[0005] In method (1) above, limiting the number of keywords assigned to each document keeps the string matching against the user's keyword to a minimum, which makes fast searching possible. However, keywords must be assigned to documents in advance, and if the user specifies a keyword that has not been assigned, the document cannot be extracted. Moreover, assigning keywords is a burden on the document's author, and because the choice of keywords is sometimes left to each author, it is difficult to keep keywords consistent.

[0006] Method (2), on the other hand, extracts documents that contain the character string entered by the user, so few relevant documents are missed.

[0007] In both methods (1) and (2), when documents satisfying the condition are found, the documents containing the character string entered by the user are listed on the display as the search result. The user must then scroll through the display and judge, one document at a time, whether each of the many listed documents suits the purpose, in order to pick out those that are needed. At this point the user is not shown how the keyword appears in each document.

[0008]

[Problems to be Solved by the Invention] As described above, a conventional full-text search device lists the documents containing the user's keyword in the order in which the files are stored, so the user has to read every listed document.

[0009] The present invention has been made in view of the above circumstances, and its object is to provide a document retrieval device that outputs search results by inferring the importance of the position at which a character string containing the keyword appears in a document.

[0010] [Configuration of the invention]

[0011]

[Means for Solving the Problems] To achieve the above object, the present invention is a document retrieval device that extracts documents containing an input keyword in their text, characterized by comprising: input means for inputting a keyword; document dividing means for dividing a document into a title, a preface, a main text, and the like; character string matching means for determining whether the keyword is contained in the document; priority calculation means for calculating the priority of a retrieved document based on the position in the document at which the keyword determined by the character string matching means appears and on the document division information produced by the document dividing means; and search result output means for outputting documents in the order of the priorities obtained by the priority calculation means.

[0012]

[Operation] With the above configuration, the present invention calculates the priority of each retrieved document based on the position at which the keyword entered through the input means appears in the document (for example, the title, the preface, or the main text), and outputs the documents as the search result in order of the calculated priority, so that documents are retrieved efficiently.

[0013]

[Embodiments] An embodiment of the present invention is described below with reference to the drawings.

[0014] FIG. 1 is a block diagram showing the configuration of a document retrieval device according to one embodiment of the present invention.

[0015] In FIG. 1, reference numeral 1 denotes an external storage device, consisting of, for example, a floppy disk drive or a hard disk drive, that stores document data that has already been created. Document data read from the external storage device 1 is stored in a document data memory 2, which is, for example, a dynamic RAM. Each item of document data consists of a text data section containing only the text information of the document and a non-text data section containing image data, format information, and the like.

[0016] Reference numeral 3 denotes an input unit for entering search keywords, commands, and the like; it consists of, for example, a keyboard, a mouse, and a device that controls them. Search keywords (character strings) and commands entered through the input unit 3 are displayed, under the control of a control unit 4, on a display unit 5, which consists of, for example, a VRAM and a CRT display that shows the bit information stored in the VRAM as rows of dots. The display unit 5 also displays search results and the document data stored in the document data memory 2.

[0017] The control unit 4 consists of a RAM, which stores the system program and also serves as buffer memory, a CPU that executes control operations, and the like. It is connected by a bus to the devices described above and to those described below, and it controls each device and handles processing such as data transfer between devices. The control unit 4 also contains the buffers and counters needed for control and processing; for example, the number of documents stored in the external storage device 1 is held in a document count buffer (not shown).

[0018] Reference numeral 6 denotes a document dividing unit that divides the text data section of the document data into a title part 7, a preface part 8, and a main text part 9, as shown in FIG. 2. It reads the document data stored in the document data memory 2 one sentence at a time, treating the text up to a line feed code or a period as one sentence. The document dividing unit 6 judges the title part 7 to run from the first sentence of a text document up to the sentence preceding the first sentence that ends with a period. It judges the preface part 8, that is, the abstract, to run up to the sentence preceding the sentence that contains a phrase, such as "はじめに" or "初めに" ("Introduction"), indicating that the document is starting to describe its content in concrete terms. It judges the main text part 9 to run from the sentence following the preface part to the last sentence of the text document. This document division information is output to the control unit 4.
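The division rule in this paragraph can be sketched roughly as follows. This is only an illustrative sketch, not the patented implementation: the function names `split_sentences` and `divide_document`, the part labels, and the sample text are assumptions introduced here, and the period is taken to be the Japanese full stop "。".

```python
import re

# Phrases the text names as marking the start of the main text.
BODY_MARKERS = ("はじめに", "初めに")


def split_sentences(text):
    """Split text into sentence units that end at a line feed or a period (。)."""
    units = re.findall(r"[^。\n]+。?", text)
    return [u.strip() for u in units if u.strip()]


def divide_document(sentences):
    """Assign each sentence to the title, preface, or main text part."""
    parts = {"title": [], "preface": [], "main_text": []}
    current = "title"
    for s in sentences:
        if current == "title" and s.endswith("。"):
            # The first sentence ending with a period opens the preface.
            current = "preface"
        elif current == "preface" and any(m in s for m in BODY_MARKERS):
            # A sentence containing a marker phrase opens the main text.
            current = "main_text"
        parts[current].append(s)
    return parts


# Hypothetical sample text (the content of FIG. 2 is not reproduced in the source).
sample = "文書検索装置\n東芝\n本稿は文書検索装置について述べる。\nはじめに\n本文である。"
parts = divide_document(split_sentences(sample))
```

With this sample, the two unterminated lines fall into the title part, the first sentence ending in "。" starts the preface part, and the line containing "はじめに" starts the main text part.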

[0019] Reference numeral 10 denotes a matching unit, which reads one sentence at a time from the document data memory 2 holding the data of one text document and determines whether that sentence contains the keyword entered through the input unit 3. Reference numeral 11 denotes a part value table, which stores the values used to rank the documents the user is looking for. A value is set for each part of the document, that is, each part produced by document division, in which the entered keyword may be found. For example, as shown in FIG. 3, the value is "10" for the title part 7, "5" for the preface part 8, and "2" for the main text part. When a sentence of the text document contains the keyword, the matching unit 10 uses the document division information that the document dividing unit 6 has output to the control unit 4 to look up, in the part value table 11, the value for the position at which the keyword appears, and adds it to a calculated value buffer 12. FIG. 4 shows an example of the contents of the calculated value buffer 12 after this addition has been carried out, from the first sentence to the last, for a number of text documents. The calculated values held in the calculated value buffer 12 indicate the output priority of each search result.
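The part value table and the accumulation into the calculated value buffer might look like the following minimal sketch. The values 10, 5, and 2 come from the example in the text; the names `PART_VALUES` and `score_document`, the pre-tagged input format, the sample sentences, and the restriction to a single keyword are assumptions made for illustration.

```python
# Part value table 11, using the example values given for FIG. 3.
PART_VALUES = {"title": 10, "preface": 5, "main_text": 2}


def score_document(tagged_sentences, keyword, part_values=PART_VALUES):
    """Add the part value once for every sentence that contains the keyword."""
    total = 0
    for part, sentence in tagged_sentences:
        if keyword in sentence:
            total += part_values[part]
    return total


# Hypothetical document, already divided into parts.
document_1 = [
    ("title", "文書検索装置"),                         # match in the title: +10
    ("title", "東芝"),                                 # no match
    ("preface", "本稿は文書検索装置について述べる。"),  # match in the preface: +5
    ("main_text", "はじめに"),                         # no match
]
score = score_document(document_1, "文書検索装置")
```

Note that, as described, a value is added per matching sentence rather than per occurrence of the keyword within a sentence.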

[0020] Reference numeral 13 denotes a search result output unit. It refers to the contents of the calculated value buffer 12, which holds the calculated value for each document, and displays the documents on the display unit 5 in descending order of calculated value, that is, in order of priority, as shown for example in FIG. 5.
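Ordering the output by calculated value amounts to a descending sort. The sketch below assumes the buffer is a mapping from document name to calculated value; the function name `rank_documents` and the values are made up for illustration, since the contents of FIG. 4 are not reproduced in the source.

```python
def rank_documents(calculated_values):
    """Return document names in descending order of calculated value."""
    return sorted(calculated_values, key=calculated_values.get, reverse=True)


# Hypothetical contents of the calculated value buffer 12.
buffer_12 = {"Document 1": 15, "Document 2": 2, "Document 3": 22}
ranking = rank_documents(buffer_12)
```

Because Python's `sorted` is stable, documents with equal values keep their storage order in this sketch; the source does not say how ties are handled.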

[0021] Next, the specific processing performed by the document retrieval device configured as above is described with reference to the flowchart of FIG. 6.

[0022] First, the buffers and counters in the control unit 4 and the calculated value buffer 12 are initialized, and the user then enters a plurality of keywords, each a character string to search for, through the input unit 3 (steps S1 and S2).

[0023] When keyword entry is complete, the data of one text document is read from the external storage device 1, which stores a plurality of documents, into the document data memory 2. Once the text document has been read, the ICHI flag, which holds the document division information, is reset and a text counter N (not shown) in the control unit 4 is set to "1" (steps S3 and S4).

[0024] Next, the first sentence delimited by a line feed code or a period, for example the first sentence 14 shown in FIG. 2 (the text document of FIG. 2 is used as the concrete example below), is read from the document data memory 2 into the document dividing unit 6 and the matching unit 10 via the control unit 4 (step S5). Because the ICHI flag was reset in step S4, the process passes through steps S6 and S7, which check whether the ICHI flag indicates the main text part 9 or the preface part 8, and proceeds to step S8.

[0025] In step S8, since it is the first sentence 14 that has been read, the document dividing unit 6 judges it to be the title part 7. Based on this document division information, the matching unit 10 looks up the value "10" for the title part 7 in the part value table 11, and the ICHI flag is set to "10". Step S9 then checks whether the sentence just read ends with a period; the first sentence 14 does not, so the process proceeds to step S10.

[0026] In step S10, the matching unit 10 uses string matching to check whether the sentence contains the keyword entered through the input unit 3. Assuming the keyword is the character string "文書検索装置" ("document retrieval device"), the first sentence 14 contains it, so the process proceeds to step S11.

[0027] In step S11, the matching unit 10 adds to the calculated value buffer 12: the ICHI flag value "10" is added to file[1], the area for "Document 1". After the addition, step S12 checks whether there is another sentence to read. In the example of FIG. 2 there is, so the process returns to step S5.

[0028] In step S5 the second sentence 15 is read and processed in the same way. However, sentence 15 does not contain the keyword "検索" ("search"), so step S11 is skipped (no addition is made to the "Document 1" area), the process goes from step S10 to step S12, and then returns from step S12 to step S5.

[0029] Back at step S5, the third sentence 16 is read and processed in the same way. Because the third sentence 16 ends with a period, the process goes from step S9 to step S13.

[0030] In step S13, the third sentence 16 is a sentence that follows the title part 7 and ends with a period, so the document dividing unit 6 judges it to be the preface part 8. Based on this document division information, the value "5" for the preface part 8 is looked up in the part value table 11 via the matching unit 10, and the ICHI flag is set to "5". Once the flag has been set, the process proceeds to step S10.

[0031] In step S10, the third sentence 16 contains the character string "文書検索装置", so the process proceeds to step S11. In step S11 the addition to the calculated value buffer 12 is carried out; because the ICHI flag is set to "5", "5" is added to the "Document 1" area, which brings its value in the calculated value buffer 12 to "15". The calculated value buffer 12 holds one calculated value per document for all documents, and the output priority of the search results is determined from these values. When step S11 is finished, the process again returns from step S12 to step S5 to read the next sentence of the preface part 8.
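The running total in this walk-through can be written as a weighted sum. The notation below is an assumption introduced for illustration and does not appear in the source: v is the part value table of FIG. 3, p(s) is the part to which sentence s belongs, and [k ∈ s] is 1 if sentence s contains the keyword k and 0 otherwise.

```latex
\[
  P(d) \;=\; \sum_{s \in d} v\bigl(p(s)\bigr)\,[k \in s],
  \qquad v(\text{title}) = 10,\quad v(\text{preface}) = 5,\quad v(\text{main text}) = 2
\]
\[
  P(\text{Document 1 after sentence 16})
  \;=\; \underbrace{10}_{\text{sentence 14, title}}
  \;+\; \underbrace{0}_{\text{sentence 15, no match}}
  \;+\; \underbrace{5}_{\text{sentence 16, preface}}
  \;=\; 15
\]
```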

[0032] When a sentence of the preface part 8 after the third sentence 16 is read, the processing from step S7 onward differs in part from the above. In step S7, the ICHI flag is set to "5", the value for the preface part 8, so the process goes to step S14 instead of step S8.

[0033] Step S14 checks whether the sentence contains the character string "はじめに" ("Introduction"), which marks the main text part 9. If it does not, the process proceeds to step S10 and the same processing as above is repeated.

[0034] If the sentence read is the first sentence 17 of the main text part 9, the process proceeds to step S15. In step S15, because the sentence contains the phrase "はじめに", the document dividing unit 6 judges it to be the main text part 9. Based on this document division information, the value "2" for the main text part 9 is looked up in the part value table 11 via the matching unit 10, and the ICHI flag is set to "2". Once the flag has been set, the process proceeds to step S10.

[0035] Steps S10, S11, and S12 are then carried out as described above; if a sentence of the main text part 9 contains the keyword, the "2" held in the ICHI flag is added to the "Document 1" area of the calculated value buffer 12. In the example text of FIG. 2, the main text part 9 does not contain the keyword, so this "2" is not added.

[0036] When subsequent sentences of the main text part 9 are read, the ICHI flag has already been set in step S15 to "2", the value for the main text part 9, so the process skips the intermediate steps and goes from step S6 directly to step S10.

[0037] When these operations have been repeated up to the last sentence of the main text part 9, there are no more sentences to read and the search of "Document 1" ends. At this point the calculated value buffer 12 holds, in the "Document 1" area, a calculated value such as that shown in FIG. 4.

[0038] When there are no more sentences to read, the process returns from step S16 to step S3 to read the next document, for example "Document 2".

[0039] Back at step S3, the text document "Document 2" is read from the external storage device 1 into the document data memory 2 as before. In step S4, the ICHI flag holding the document division information is reset and the text counter N in the control unit 4 is set to "2". From step S5 onward, the same processing as above is repeated.

[0040] When the search of every text document stored in the external storage device 1 has been completed in this way, that is, when the value of the text counter N matches the document count buffer in the control unit 4, the process goes from step S16 to step S17.

[0041] In step S17, the search result output unit 13 is started and refers to the contents of the calculated value buffer 12. If the buffer holds, for example, the contents shown in FIG. 4, the search result output unit 13 displays the text documents on the display unit 5 in descending order of calculated value, in the order shown in FIG. 5, giving them output priority accordingly.
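The flow of steps S3 to S17 can be approximated by a small state machine in which, as in the description, the ICHI flag holds the value of the current part. This is a hedged sketch rather than the flowchart of FIG. 6 itself: the function names, the use of pre-split sentence lists, the single keyword, and the sample documents are all assumptions made for illustration.

```python
TITLE, PREFACE, MAIN_TEXT = 10, 5, 2       # example part values from FIG. 3
BODY_MARKERS = ("はじめに", "初めに")       # phrases that open the main text


def score_one_document(sentences, keyword):
    ichi = 0                                # S4: reset the ICHI flag
    total = 0                               # this document's area in buffer 12
    for s in sentences:                     # S5: read one sentence
        if ichi == MAIN_TEXT:               # S6: already in the main text
            pass
        elif ichi == PREFACE:               # S7 -> S14: look for a marker phrase
            if any(m in s for m in BODY_MARKERS):
                ichi = MAIN_TEXT            # S15
        else:
            ichi = TITLE                    # S8
            if s.endswith("。"):            # S9: does the sentence end with a period?
                ichi = PREFACE              # S13
        if keyword in s:                    # S10: string matching
            total += ichi                   # S11: add the flag value
    return total                            # S12: no more sentences


def search(documents, keyword):
    # S3..S16: score every stored document; S17: output in descending order.
    scores = {name: score_one_document(sents, keyword)
              for name, sents in documents.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical stored documents, already split into sentence units.
documents = {
    "Document 2": ["雑記", "今日の話題。", "はじめに", "文書検索装置は便利である。"],
    "Document 1": ["文書検索装置", "東芝", "文書検索装置の概要を述べる。", "はじめに", "本文である。"],
}
results = search(documents, "文書検索装置")
```

Here "Document 1" scores 10 + 5 = 15 (title and preface matches) and "Document 2" scores 2 (a single main text match), so "Document 1" is listed first even though it is stored second.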

[0042] As described above, documents with a high calculated value in the calculated value buffer 12 are treated as being close to what the user is looking for and are output first, which makes document retrieval efficient.

[0043] In the above embodiment, the matching unit 10 both determines whether a sentence contains the keyword and adds to the calculated value buffer 12 based on the document division information and the position at which the keyword appears. The same effect is obtained if the matching unit 10 is split into character string matching means, which determines whether the keyword is contained, and priority calculation means, which adds to the calculated value buffer 12 based on the document division information and the position at which the keyword appears.
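One way to picture the split described here is as two separate functions, one per means. The sketch below is illustrative only; the names `contains_keyword` and `add_priority`, the dictionary used as the buffer, and the sample sentences are assumptions, while the part values are the example values from the text.

```python
PART_VALUES = {"title": 10, "preface": 5, "main_text": 2}  # example values (FIG. 3)


def contains_keyword(sentence: str, keyword: str) -> bool:
    """Character string matching means: does the sentence contain the keyword?"""
    return keyword in sentence


def add_priority(buffer: dict, doc_id: str, part: str) -> None:
    """Priority calculation means: add the value for `part` to the document's total."""
    buffer[doc_id] = buffer.get(doc_id, 0) + PART_VALUES[part]


buffer = {}
# Hypothetical sentences, each paired with the part it was assigned to.
for part, sentence in [("title", "文書検索装置"), ("preface", "検索の概要。")]:
    if contains_keyword(sentence, "文書検索装置"):
        add_priority(buffer, "Document 1", part)
```

Keeping the two responsibilities apart means the matching rule or the scoring rule could be changed independently.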

[0044] In the above embodiment the document is divided into the title part 7, the preface part 8, and the main text part 9, but the invention is not limited to this. Parts may be added or removed freely: for example, an afterword part may be added, or the document may be divided into only the title part 7 and the main text part 9.

[0045] Likewise, although the above embodiment divides the document in the order title part 7, preface part 8, main text part 9, the order of division can of course be changed as appropriate for the technical field to which the document belongs.

[0046] The present invention is not limited to the above embodiment, and various modifications are of course possible without departing from the gist of the invention.

[0047]

[Effects of the Invention] As detailed above, the document retrieval device of the present invention estimates the importance of a keyword within a document from the position at which the keyword appears and assigns priorities to the output of search results accordingly. This makes document retrieval efficient and, as a result, greatly reduces the effort a user needs to find the desired item in a document database, so the practical benefit is considerable.

[Brief explanation of the drawing]

FIG. 1 is a block diagram showing the configuration of a document retrieval device according to one embodiment of the present invention.

FIG. 2 is a diagram showing an example of dividing one document into title, preface, and main text.

FIG. 3 is a diagram showing an example of the contents of the part value table, which stores the values used to prioritize search results.

FIG. 4 is a diagram showing an example of the contents of the calculated value buffer, which stores the calculated value for each document.

FIG. 5 is a diagram showing the output order of search results.

FIG. 6 is a flowchart outlining the flow of processing.

[Explanation of symbols]

3 ... input unit (input means); 6 ... document dividing unit (document dividing means); 10 ... matching unit (character string matching means, priority calculation means)

Claims

[Claims]

Claim 1: A document retrieval device for extracting documents that contain an input keyword in their text, comprising: input means for inputting a keyword; document dividing means for dividing a document into a title, a preface, a main text, and the like; character string matching means for determining whether the keyword is contained in the document; priority calculation means for calculating the priority of a retrieved document based on the position in the document at which the keyword determined by the character string matching means appears and on the document division information produced by the document dividing means; and search result output means for outputting documents in the order of the priorities obtained by the priority calculation means.