JPH02125363A

JPH02125363A - Document retrieving device

Info

Publication number: JPH02125363A
Application number: JP1186051A
Authority: JP
Inventors: Yasutsugu Ogawa; 泰嗣小川
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1988-07-29
Filing date: 1989-07-20
Publication date: 1990-05-14
Anticipated expiration: 2014-09-27
Also published as: JP2954949B2

Abstract

PURPOSE: To improve retrieval efficiency by performing fuzzy retrieval and changing the keyword connections based on the user's judgment or instruction. CONSTITUTION: A document accuracy calculating part 51 calculates, for each document in the device, a document accuracy indicating the degree to which the document satisfies the user's query (retrieval formula), based on the retrieval formula transferred from a request processing part 53. The documents are sorted in order of document accuracy and presented to the user. From a list of document titles and document accuracies in the retrieval results, the user selects a document to examine in more detail and views its contents. Referring to the list, the user judges whether the document currently being viewed satisfies the retrieval formula. The keyword connections are then changed by learning based on the user's judgment, so that the judgment is reflected in the results of the next retrieval.

Description

[Detailed description of the invention]

[Industrial application field]

The present invention relates to a document retrieval device, and in particular to a flexible, high-speed document retrieval device in which, at search time, the documents in the search results are ordered by an evaluation value that is larger the closer a document is to the user's search request, and in which that ordering can be changed according to the user's judgments and instructions.

[Conventional technology]

In conventional document retrieval devices, a registration operator selects and registers the keywords considered appropriate when a document is registered, and at search time the user of the device searches by specifying keywords considered appropriate from a thesaurus. This method allows high-speed search.

[Problem to be solved by the invention]

A conventional device of this kind requires storage capacity for keyword registration, and the validity of the keywords selected by the registration operator is open to question. In addition, classification and updating with a thesaurus are laborious, and the validity of the results is also a problem.

Furthermore, because conventional document retrieval devices check only whether the search conditions specified by the user are satisfied, they cannot perform a fuzzy search that takes into account the degree to which the conditions are satisfied. As a result, retrieving documents that almost, but not completely, satisfy the specified conditions requires running the search again. Another problem is that when no document satisfies the specified conditions at all, finding the documents closest to those conditions is extremely troublesome.

As a way to solve these problems, expressing the relationships between keywords numerically has been proposed. However, no concrete method of numerical expression has been shown, and because no learning function is provided, the approach has been insufficient for building a practical document retrieval system.

The present invention aims to eliminate these drawbacks of the prior art. Using the concept of dynamic keyword connections, it aims to provide a document retrieval device that requires no laborious thesaurus-based classification work when documents are registered, that introduces an evaluation value called document accuracy, which is larger the closer a document is to the user's search request, and that lets the user flexibly retrieve documents satisfying the request based on the magnitude of the document accuracy. In particular, for the case where the search query is a single keyword or a logical sum of multiple keywords, the invention aims to provide a document retrieval device equipped with a keyword connection learning method that changes the keyword connections based on the user's judgments and instructions, so that the user's judgment is reflected in the search results at the next search.

[Means to solve the problem]

To solve the above problems, the present invention is a document retrieval device having: keyword extraction means for extracting keywords from document information when a registered document and its document information are registered in a file; inverted file creation means for creating an inverted file indicating the relationship between the registered documents and the keywords; keyword connection table processing means for creating a keyword connection table that describes the degree of relatedness of the related information between keywords, changing the relatedness values of related information already recorded, and generating new related information; and document selection means for selecting documents that match a search formula from the keyword connection table and the inverted file according to the input keywords.

The document selection means has document accuracy calculation means that takes as the document accuracy a value indicating the strength of the relationship between a specific keyword group and the keyword group of a registered document in each file, and that calculates the document accuracy from the keyword connection table, the inverted file, and a predetermined method. When a search formula is input, the document selection means outputs the retrieved documents in descending order of document accuracy, and when the acceptability of the document accuracy of each document is input, it changes the keyword connections by a predetermined method.

[Effect]

For a user's search query (hereinafter called the search formula), the document retrieval device according to the present invention calculates a document accuracy representing the degree to which each document in the device satisfies the search formula, sorts the documents in descending order of document accuracy, and presents them to the user. From the list of document titles and document accuracies in the search results, the user can select a document to examine in more detail and view its contents. From the contents of the list, the user judges whether the document currently being viewed is suitable for the search formula. Learning based on these judgments changes the keyword connections so that the judgments are reflected in the search results at the next search.

[Example]

Next, an embodiment of the document retrieval device of the present invention is described with reference to the accompanying drawings.

The document retrieval device of the present invention uses keyword connections, which describe the relationships between keywords and the registered documents. It introduces an evaluation value called document accuracy, which is larger the closer a document is to the user's search request, and at search time the user flexibly selects documents that satisfy the request based on the magnitude of the document accuracy.

A keyword connection describes a relationship between keywords. The magnitude of the relationship ranges from 0 to 1: 0 means that the keywords are unrelated, a value greater than 0 means that they are related, and 1 means that the relationship is at its maximum.

Keyword connections can also be regarded as a two-dimensional array. In that case, the degree of relatedness between the i-th keyword and the j-th keyword is written $W_i(K_j)$, or more simply $W_{ij}$.
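The two-dimensional array view can be illustrated with a minimal Python sketch. The keyword list, the helper names `make_connection_table` and `set_weight`, and the choice of a nested list are assumptions made for illustration only; the 0 to 1 range and the fixed value of 1 for i = j follow the description of keyword connections in this document.

```python
# Hypothetical keywords; the patent does not list concrete keywords.
keywords = ["retrieval", "database", "thesaurus"]
index = {kw: i for i, kw in enumerate(keywords)}


def make_connection_table(n):
    """Return an n x n table with W[i][i] = 1 and every other weight 0."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def set_weight(table, i, j, value):
    """Store W_ij, enforcing the 0..1 range and the fixed diagonal."""
    if i == j:
        raise ValueError("W_ii is fixed at 1")
    if not 0.0 <= value <= 1.0:
        raise ValueError("weights must lie in the range 0 to 1")
    table[i][j] = value


W = make_connection_table(len(keywords))
set_weight(W, index["retrieval"], index["database"], 0.6)
```

The sketch does not assume that the table is symmetric, since the text does not say whether $W_{ij}$ and $W_{ji}$ are equal.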

FIG. 1 shows the configuration of the document retrieval device of the present invention.

When a registered document 11 is input, the keyword extraction unit 10 extracts its keywords and outputs the keywords and the registered document 11 to the document information management unit 20, the keyword connection table processing unit 30, and the inverted file creation unit 40.

The document information management unit 20, the keyword connection table processing unit 30, and the inverted file creation unit 40 have files 21, 31, and 41, respectively. The document information management unit 20 stores the keywords and bibliographic information in file 21 and organizes them into a database in a form that can be used at search time.

The keyword connection table processing unit 30 creates a keyword connection table describing the necessary keywords and keyword connections (hereinafter called related information) and stores it in file 31. In addition, when requested by the request processing unit 53, it changes the weights of the related information.

The inverted file creation unit 40 describes the relationships between keywords and documents and stores them in file 41.
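As an illustration of the keyword-to-document relationship held in file 41, the following Python sketch builds a simple inverted file. The function name `build_inverted_file`, the document ids, and the keywords are hypothetical, and the patent does not specify the storage layout, so a dictionary of sorted id lists is only one possible choice.

```python
from collections import defaultdict


def build_inverted_file(doc_keywords):
    """Map each keyword to the sorted list of document ids that carry it."""
    inverted = defaultdict(set)
    for doc_id, kws in doc_keywords.items():
        for kw in kws:
            inverted[kw].add(doc_id)
    return {kw: sorted(ids) for kw, ids in inverted.items()}


# Hypothetical registered documents and their extracted keywords.
doc_keywords = {
    1: {"retrieval", "database"},
    2: {"thesaurus"},
    3: {"retrieval", "thesaurus"},
}
inverted_file = build_inverted_file(doc_keywords)
```

Looking up `inverted_file["retrieval"]` then yields the documents carrying that keyword without scanning every document.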

The document selection unit 50 consists of units 51 to 55.

The document accuracy calculation unit 51 calculates the document accuracy of each document based on the search formula transferred from the request processing unit 53, using the necessary information in the keyword connection table stored in file 31. The calculation method is described in detail later.

The document sorting unit 52 arranges the documents stored in file 21 in descending order of the document accuracy calculated by the document accuracy calculation unit 51 and transfers them to the display management unit 55.

The display management unit 55 displays the search results according to the user's instructions passed on by the request processing unit 53.

The learning management unit 54 performs keyword connection learning according to the user's instructions passed on by the request processing unit 53. It calculates how much each keyword connection weight should be changed; the actual change of values is carried out by instructing the keyword connection table processing unit 30. The learning method is described in detail later.

The request processing unit 53 transfers the received search formula to the document accuracy calculation unit 51 so that documents matching the search request can be retrieved. At the same time, if necessary, it instructs the keyword connection table processing unit 30 to change the weights of the related information for the final keyword group.

The purpose of document retrieval processing is to retrieve and display appropriate documents according to the user's request. Document selection is performed by the document selection unit 50.

Search conditions are set by specifying keywords and by setting conditions on other bibliographic information. First, each document is checked against the conditions other than keywords. If those conditions are satisfied, the document accuracy of the document is calculated; if they are not, the calculation is skipped and the document accuracy is set to 0.
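The two-stage check can be sketched as follows. This is a minimal illustration, assuming that bibliographic conditions are simple equality tests on named fields; the field names, the function `accuracy_with_conditions`, and the stand-in accuracy function are all hypothetical.

```python
def accuracy_with_conditions(doc, conditions, compute_accuracy):
    """Return 0 when any non-keyword condition fails, else the computed accuracy."""
    for field, wanted in conditions.items():
        if doc.get(field) != wanted:
            return 0.0  # calculation skipped, accuracy set to 0
    return compute_accuracy(doc)


def stand_in_accuracy(doc):
    """Placeholder for the keyword-based document accuracy calculation."""
    return 0.6


# Hypothetical documents with bibliographic fields.
docs = [
    {"id": 1, "author": "A", "year": 1988},
    {"id": 2, "author": "B", "year": 1990},
]
conditions = {"year": 1988}
scores = {
    d["id"]: accuracy_with_conditions(d, conditions, stand_in_accuracy)
    for d in docs
}
```

Here document 2 fails the year condition, so its accuracy is 0 and the keyword-based calculation is never invoked for it.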

Keywords are specified with a keyword expression that expresses the information the user wants. A keyword expression is a single keyword, or several keywords joined by OR (logical sum). A combination of several keywords is used when the document management device has no single keyword that expresses the information the user is looking for.

Next, the method of calculating the document accuracy of each document for a keyword expression Query is described (the actual calculation is performed by the document accuracy calculation unit 51 in the document selection unit 50).

The document accuracy calculation of the present invention uses the algebraic sum $\oplus$:

$$x \oplus y = x + y - xy = 1 - (1 - x)(1 - y)$$

and, for $n$ values,

$$\bigoplus_{j} X_j = X_1 \oplus X_2 \oplus \cdots \oplus X_n = 1 - \prod_{j} (1 - X_j).$$
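The algebraic sum is easy to check numerically. The sketch below implements the binary form and the n-ary product form in Python; the function names are illustrative, and returning 0 for an empty list is an assumption (the empty product is 1), not something the patent states.

```python
from functools import reduce


def algebraic_sum(x, y):
    """Binary algebraic sum: x + y - xy."""
    return x + y - x * y


def algebraic_sum_all(values):
    """n-ary algebraic sum computed as 1 - prod(1 - X_j)."""
    product = 1.0
    for v in values:
        product *= 1.0 - v
    return 1.0 - product


values = [0.5, 0.5, 0.5]
folded = reduce(algebraic_sum, values)   # ((0.5 (+) 0.5) (+) 0.5)
direct = algebraic_sum_all(values)       # 1 - 0.5 * 0.5 * 0.5
```

Folding the binary operation over the list and evaluating the product form give the same value, which is the identity stated in the second formula.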

Document accuracy is a numerical value representing the strength of the relationship between each document and the search formula specified by the user. Because the keyword connection learning method of the present invention targets the case where the search formula is a single keyword or a logical sum of multiple keywords, the calculation for that case is described here. First, for each keyword, the algebraic sum of the keyword connection weights is taken over the set of keywords contained in the search formula. Next, the algebraic sum of those results is taken over the set $KL(m)$ of keywords contained in the m-th document, and the result is the document accuracy.
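The formula for this step is not preserved in the text. Read literally, the two-stage description corresponds to the following expression, offered as a hedged reconstruction rather than the patent's own equation; $FC_m$ is the symbol the document later uses for the document accuracy of the m-th document, and $QUERY$ is the set of keywords in the search formula:

$$FC_m = \bigoplus_{i \in KL(m)} \; \bigoplus_{j \in QUERY} W_{ij}$$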

Here, $i$ ranges over the elements of $KL(m)$, the set of keywords attached to the m-th document currently under consideration, and $j$ ranges over the elements of $QUERY$, the set of keywords contained in the search formula.

Furthermore, the relationship of equation (A2) allows this expression to be transformed into an equivalent form.
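A minimal Python sketch of the calculation follows. It assumes that the transformed expression is the product form $1 - \prod_{i}\prod_{j}(1 - W_{ij})$, which is what the algebraic sum identity yields for a nested algebraic sum; the function name, the nested-list table, and the sample weights are illustrative only.

```python
def document_accuracy(W, doc_keywords, query_keywords):
    """Document accuracy as 1 - prod over (i, j) of (1 - W[i][j]).

    i runs over the keyword indices attached to the document (KL(m)),
    j runs over the keyword indices in the search formula (QUERY).
    """
    product = 1.0
    for i in doc_keywords:
        for j in query_keywords:
            product *= 1.0 - W[i][j]
    return 1.0 - product


# Hypothetical 3-keyword table with W_ii fixed at 1.
W = [
    [1.0, 0.0, 0.0],
    [0.5, 1.0, 0.0],
    [0.2, 0.0, 1.0],
]
partial_match = document_accuracy(W, doc_keywords=[1, 2], query_keywords=[0])
exact_match = document_accuracy(W, doc_keywords=[0, 2], query_keywords=[0])
```

With these weights the first document scores 1 - 0.5 x 0.8 = 0.6, while a document that carries the query keyword itself scores 1 because $W_{ii} = 1$.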

Once the document accuracy calculation unit 51 of the document selection unit 50 has calculated the document accuracy, the document sorting unit 52 of the document selection unit 50 sorts the documents in descending order of document accuracy. The result is sent to the display management unit 55 and displayed to the user.
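The sorting step can be illustrated with a short Python sketch. The function `rank_documents`, the document ids, and the titles are hypothetical; how ties are ordered is not specified in the patent, so the sketch simply relies on the stable behavior of Python's `sorted`.

```python
def rank_documents(accuracies, titles):
    """Return (title, accuracy) pairs in descending order of accuracy."""
    ranked = sorted(accuracies.items(), key=lambda item: item[1], reverse=True)
    return [(titles[doc_id], accuracy) for doc_id, accuracy in ranked]


# Hypothetical accuracies produced by the accuracy calculation step.
accuracies = {1: 0.6, 2: 0.9, 3: 0.0}
titles = {1: "Document A", 2: "Document B", 3: "Document C"}
listing = rank_documents(accuracies, titles)
```

The resulting `listing` is the kind of title and accuracy list that the user would browse before opening a document.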

The keyword connection learning method of the present invention applies to keyword connections in which each weight takes a value in the range 0 to 1 and the weight is fixed at 1 when i = j.

The learning method is as follows. An evaluation function $E$ is set in advance that evaluates how close the search results are to the user's judgment, that is, that quantifies the difference between the document accuracy of a document and the user's judgment. The evaluation function for each learning method is given concretely later. In keyword connection learning, the keyword connection weights are changed so as to reduce the value of the evaluation function $E$ just defined:

$$W_{ij} \leftarrow W_{ij} + \alpha \, \Delta W_{ij}$$

Here, $\alpha$ is a positive constant, the learning coefficient that determines the speed of learning. The change amount $\Delta W_{ij}$ is determined by the steepest descent method, that is, in the direction in which $E$ decreases most steeply with respect to $W_{ij}$. The following describes in detail, for the two learning methods 1 and 2, how the evaluation function $E$ is given and how the keyword connection change amount $\Delta W_{ij}$ is derived from it.

(Learning method 1)

In learning method 1, the user judges whether one particular document is appropriate or inappropriate for the search formula. The evaluation function $E$ is given as the square of the difference between the document accuracy of that document and the user's judgment expressed as a number (1 for an appropriate document, 0 for an inappropriate one).

Here, $FC_m$ is the document accuracy of the m-th document.

Because the formula for the change amount $\Delta W_{ij}$ differs between (1) the case where the document is judged appropriate and (2) the case where it is judged inappropriate, the two cases are described separately below.

(1) When the document is judged appropriate

First, the partial derivative of the evaluation function $E$ with respect to the weight $W_{ij}$ is obtained from equation (A4). In it, $k$ ranges over the elements of $KL(m)$, the set of keywords attached to the m-th document currently under consideration, and $l$ ranges over the elements of $QUERY$, the set of keywords contained in the search formula, except that the combination $(k, l) = (i, j)$ is excluded.

In equation (6), when $W_{ij} \neq 1$, the expression can be rewritten so that the amount of computation for the partial derivative is reduced. When $W_{ij} = 1$, the change amount would be positive, but because keyword connection weights must lie in the range 0 to 1, the value cannot actually be changed. Consequently, $\Delta W_{ij}$ is computed only for $W_{ij} \neq 1$, and no change is made when $W_{ij} = 1$.

(2) When the document is judged inappropriate

First, the partial derivative of the evaluation function $E$ with respect to the weight $W_{ij}$ is taken as before, and the partial derivative of $FC_m$ with respect to $W_{ij}$ is given by equation (6) above. However, the case $W_{ij} = 1$, for which the calculation could be skipped in (1), must be calculated without omission this time. The change amount $\Delta W_{ij}$ follows from these.
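The equations of this derivation are not preserved in the text. As a hedged reconstruction, assume the evaluation function is exactly $E = (FC_m - T_m)^2$, with $T_m = 1$ for an appropriate document and $T_m = 0$ for an inappropriate one, and that $FC_m = 1 - \prod_{k}\prod_{l}(1 - W_{kl})$ with $k \in KL(m)$ and $l \in QUERY$. The symbol $T_m$ and the constant factors are assumptions and may differ from the patent's own notation. Under these assumptions, for $i \in KL(m)$ and $j \in QUERY$:

$$\frac{\partial FC_m}{\partial W_{ij}} = \prod_{(k,l) \neq (i,j)} (1 - W_{kl}), \qquad \text{which equals } \frac{1 - FC_m}{1 - W_{ij}} \text{ when } W_{ij} \neq 1$$

$$-\frac{\partial E}{\partial W_{ij}} = 2\,(T_m - FC_m)\,\frac{\partial FC_m}{\partial W_{ij}}$$

For an appropriate document this gives $2\,(1 - FC_m)^2 / (1 - W_{ij})$ when $W_{ij} \neq 1$, and for an inappropriate document it gives $-2\,FC_m \prod_{(k,l) \neq (i,j)} (1 - W_{kl})$, where the product has to be evaluated directly because the shortcut divides by $1 - W_{ij}$.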

As described above, $\Delta W_{ij}$ is obtained for each of cases (1) and (2). The actual weight change need only be made when $\Delta W_{ij}$ is not 0, and the weight is fixed at 1 when i = j, so the flowchart of keyword connection learning is as shown in FIGS. 2 and 3.
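The single-document update can be sketched in Python as follows. This is not the patent's flowchart: it assumes $E = (FC_m - T)^2$ with $T$ equal to 1 or 0, takes $\Delta W_{ij} = -\partial E / \partial W_{ij}$, and clips the result to the range 0 to 1. The function names, the learning coefficient value, and the clipping step are all assumptions made for illustration.

```python
def document_accuracy(W, doc_kws, query_kws):
    """FC_m = 1 - prod over (k, l) of (1 - W[k][l])."""
    product = 1.0
    for k in doc_kws:
        for l in query_kws:
            product *= 1.0 - W[k][l]
    return 1.0 - product


def learn_from_one_document(W, doc_kws, query_kws, appropriate, alpha=0.1):
    """Apply one steepest-descent step for a single judged document."""
    target = 1.0 if appropriate else 0.0
    fc = document_accuracy(W, doc_kws, query_kws)
    updates = {}
    for i in doc_kws:
        for j in query_kws:
            if i == j:
                continue  # W_ii is fixed at 1
            # dFC/dW_ij: product of (1 - W_kl) over all pairs except (i, j)
            d_fc = 1.0
            for k in doc_kws:
                for l in query_kws:
                    if (k, l) != (i, j):
                        d_fc *= 1.0 - W[k][l]
            delta = -2.0 * (fc - target) * d_fc
            if delta != 0.0:  # only change weights whose delta is non-zero
                updates[(i, j)] = min(1.0, max(0.0, W[i][j] + alpha * delta))
    # All deltas are computed before any weight is changed.
    for (i, j), value in updates.items():
        W[i][j] = value
    return fc


W = [
    [1.0, 0.0, 0.0],
    [0.5, 1.0, 0.0],
    [0.2, 0.0, 1.0],
]
before = learn_from_one_document(W, [1, 2], [0], appropriate=True)
after = document_accuracy(W, [1, 2], [0])
```

Judging the document appropriate raises its accuracy for the same query on the next search, and judging it inappropriate lowers it, which is the behavior the learning method is meant to produce.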

With this learning method, keyword connections can be changed based on the user's judgments and instructions, and those judgments and instructions are reflected in the search results at the next search. Moreover, because it operates at high speed, it is suitable for interactive processing. With this method, keyword connections can be improved little by little during everyday searching.

(Learning method 2)

In learning method 2, the user judges, document by document, whether each of the documents is appropriate or inappropriate for the search formula.

The evaluation function $E$ is given as the sum, over all documents, of the square of the difference between the document accuracy $FC_m$ of a document and the numerical value of the user's judgment of that document (1 for an appropriate document, 0 for an inappropriate one).

In practice, when supplying these judgment values, the user only needs to tell the retrieval device which documents are appropriate. Here, $m$ ranges over the elements of ALLDOC, the set of all documents.

First, the partial derivative of the evaluation function $E$ with respect to the weight $W_{ij}$ is taken; the partial derivative of $FC_m$ with respect to $W_{ij}$ is given by equation (6) above. According to equation (6), that partial derivative is 0 when the keywords attached to the m-th document do not include the i-th keyword, so in equation (2) $m$ need not range over the set of all documents and can instead range over $DOC(i)$, the set of documents containing the i-th keyword.

When $W_{ij} \neq 1$, equation (7) can be rewritten in the reduced form. When $W_{ij} = 1$, it must be calculated without omission. The change amount $\Delta W_{ij}$ is obtained from these two cases.

$\Delta W_{ij}$ is obtained as described above. The actual weight change need only be made when $\Delta W_{ij}$ is not 0, and the weight is fixed at 1 when i = j, so the flowchart of keyword connection learning is as shown in FIGS. 4 to 6.
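A batch version of the update can be sketched in Python as follows. It is an illustration, not the patent's flowchart: it assumes $E = \sum_m (FC_m - T_m)^2$, takes one steepest-descent step per call, and clips weights to the range 0 to 1. The function names, the learning coefficient, and the clipping are assumptions. Following the text, the user supplies only the ids of the appropriate documents, and a document contributes to $W_{ij}$ only if it carries keyword i.

```python
def document_accuracy(W, doc_kws, query_kws):
    """FC_m = 1 - prod over (k, l) of (1 - W[k][l])."""
    product = 1.0
    for k in doc_kws:
        for l in query_kws:
            product *= 1.0 - W[k][l]
    return 1.0 - product


def accuracy_gradient(W, doc_kws, query_kws, i, j):
    """dFC/dW_ij: product of (1 - W_kl) over all pairs except (i, j)."""
    product = 1.0
    for k in doc_kws:
        for l in query_kws:
            if (k, l) != (i, j):
                product *= 1.0 - W[k][l]
    return product


def batch_learn(W, docs, query_kws, appropriate_ids, alpha=0.1):
    """One steepest-descent step on E summed over all judged documents."""
    gradients = {}
    for doc_id, doc_kws in docs.items():
        target = 1.0 if doc_id in appropriate_ids else 0.0
        fc = document_accuracy(W, doc_kws, query_kws)
        for i in doc_kws:  # only documents carrying keyword i contribute
            for j in query_kws:
                if i == j:
                    continue  # W_ii is fixed at 1
                d_fc = accuracy_gradient(W, doc_kws, query_kws, i, j)
                gradients[(i, j)] = gradients.get((i, j), 0.0) + 2.0 * (fc - target) * d_fc
    for (i, j), g in gradients.items():
        if g != 0.0:
            W[i][j] = min(1.0, max(0.0, W[i][j] - alpha * g))


W = [
    [1.0, 0.0, 0.0],
    [0.5, 1.0, 0.0],
    [0.5, 0.0, 1.0],
]
docs = {1: [1], 2: [2]}  # hypothetical documents: id -> keyword indices
batch_learn(W, docs, query_kws=[0], appropriate_ids={1})
```

After one step the weight linking the appropriate document's keyword to the query keyword rises and the other falls, so the summed squared error for this toy example drops from 0.5 to 0.32.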

With this learning method as well, keyword connections can be changed based on the user's judgments and instructions, and those judgments and instructions are reflected in the search results at the next search. Because processing with this method is relatively slow, it is suited to batch processing and is used, for example, for the initial setup of keyword connections.

[Effect of the invention]

According to the present invention, fuzzy search is possible, and keyword connections can be changed based on the user's judgments and instructions so that those judgments and instructions are reflected in the search results at the next search. As a result, documents that the user has needed in the past come to be ranked higher, which makes searching more efficient.

Conversely, documents that the user has judged unnecessary in the past come to be ranked lower, which also makes searching more efficient. Furthermore, keywords that the user does not use can be removed as unnecessary, allowing efficient use of the storage device.

[Brief explanation of drawings]

FIG. 1 is a block diagram showing an embodiment of the document retrieval device according to the present invention. FIGS. 2 and 3 are flowcharts showing a learning method performed in the document retrieval device according to the present invention. FIGS. 4, 5, and 6 are flowcharts showing another learning method performed in the document retrieval device according to the present invention.

Explanation of symbols of main parts

10: keyword extraction unit
11: registered document
12: bibliographic information
20: document information management unit
21, 31, 41: files
30: keyword connection table processing unit
40: inverted file creation unit
50: document selection unit
51: document accuracy calculation unit
52: document sorting unit
53: request processing unit
54: learning management unit
55: display management unit
Keyboard and display (reference numerals not legible)

Claims

1. A document retrieval device comprising: keyword extraction means for extracting keywords from document information when a registered document and its document information are registered in a file; inverted file creation means for creating an inverted file indicating the relationship between the registered document and the keywords; keyword connection table processing means for creating a keyword connection table that describes the degree of relatedness of the related information between the keywords, changing the relatedness values of related information already recorded, and generating new related information; and document selection means for selecting documents that match a search formula from the keyword connection table and the inverted file according to input keywords, characterized in that the document selection means has document accuracy calculation means that takes as the document accuracy a value indicating the strength of the relationship between a specific keyword group and the keyword group of a registered document in each file and calculates the document accuracy from the keyword connection table, the inverted file, and a predetermined method, and in that the document selection means outputs the retrieved documents in descending order of document accuracy when the search formula is input, and changes the keyword connections by a predetermined method when the acceptability of the document accuracy of each document is input.

2. A document retrieval device comprising: keyword extraction means for extracting keywords from document information when a registered document and its document information are registered in a file; inverted file creation means for creating an inverted file indicating the relationship between the registered document and the keywords; keyword connection table processing means for creating a keyword connection table that describes the degree of relatedness of the related information between the keywords, changing the relatedness values of related information already recorded, and generating new related information; and document selection means for selecting documents that match a search condition from the keyword connection table and the inverted file according to input keywords, characterized in that the document selection means has document accuracy calculation means that takes as the document accuracy a value indicating the strength of the relationship between a specific keyword group and the keyword group of a registered document in each file and calculates the document accuracy from the keyword connection table, the inverted file, and a predetermined method, and in that the document selection means outputs the retrieved documents in descending order of document accuracy when the search formula and a document group are input, and further changes the keyword connections by a predetermined method.