JPH06309366A

JPH06309366A - Key word extraction system

Info

Publication number: JPH06309366A
Application number: JP5119160A
Authority: JP
Inventors: Reiko Bessho; 礼子別所
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1993-04-21
Filing date: 1993-04-21
Publication date: 1994-11-04

Abstract

(57) [Abstract]
[Purpose] To enable fine-grained handling of affixes by adding keyword features and processing for affixes, and, by using a keyword feature called forced deletion, to make it possible to exclude from the keywords any word that cannot be a keyword, even when it is not a word that comes at the end of a compound word.
[Configuration] A Japanese document is input through input means 1. Morphological analysis means 2 divides the input document into words and assigns a part of speech to each word. Keyword extraction means 3 extracts keywords using these parts of speech. As a result, every character string, not only special characters, can be treated as a keyword candidate.

Description

Detailed Description of the Invention

[0001]

[Technical Field] The present invention relates to a keyword extraction method, and more particularly to a keyword extraction method that automatically extracts keywords from a document.

[0002]

[Prior Art] The "keyword extraction method" previously proposed in Japanese Patent Application No. 4-100509 is characterized by the following points: it does not depend on character type, it requires no keyword dictionary, and it requires no dictionary of unnecessary words. It therefore has the following effects. First, because a Japanese document is morphologically analyzed and keywords are extracted using the resulting part-of-speech information, every character string, not only special characters, can be treated as a keyword candidate. Second, because keyword features are used in addition to the part-of-speech information, keywords can be extracted accurately, with few unnecessary words included and few necessary words dropped. However, this method had several problems. The keyword feature "prefix modification" was introduced, but it did not allow fine-grained handling of affixes. The keyword feature "compound word base" was also introduced, but with it a word that cannot be a keyword could be excluded from the keywords only if it was a word that comes at the end of a compound word.

[0003]

[Purpose] The present invention has been made in view of the circumstances described above. Its object is to provide a keyword extraction method that adds keyword features and processing for affixes so that affixes can be handled in a fine-grained way, and that uses a keyword feature called forced deletion so that any word that cannot be a keyword can be excluded from the keywords, even when it is not a word that comes at the end of a compound word.

[0004]

[Configuration] To achieve the above object, the present invention is characterized by: (1) morphologically analyzing a Japanese document and extracting keywords from the document using the part-of-speech information obtained from the analysis together with information based on keyword features; (2) using prefix special 1, one of the keyword features, so that a necessary prefix is extracted as part of a keyword; (3) using prefix special 2, one of the keyword features, so that the same prefix is made part of a keyword when it is needed and is not made a keyword when it is not needed; (4) using prefix special 3, one of the keyword features, so that the prefix and the following word are both deleted when they are not needed and, even when the following word is needed, only the prefix is deleted and the following word is extracted; (5) using suffix person name, one of the keyword features, so that suffixes often attached to personal names are excluded from keywords; (6) using suffix special 1, one of the keyword features, so that the suffix is excluded from the keyword when it is not needed; (7) using suffix special 2, one of the keyword features, so that the suffix and the following word are excluded from the keyword when they are not needed; and (8) using forced deletion, one of the keyword features, so that unnecessary words are forcibly excluded from the keywords. The invention is described below on the basis of embodiments.

[0005] FIG. 1 is a configuration diagram for explaining an embodiment of the keyword extraction method according to the present invention. In the figure, 1 is input means, 2 is morphological analysis means, and 3 is keyword extraction means. A Japanese document is input through input means 1. Morphological analysis means 2 divides the input document into word units and assigns a part of speech to each word. Keyword extraction means 3 extracts keywords using these parts of speech.
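The three-stage configuration of FIG. 1 can be sketched as follows. This is a minimal illustration and not the patent's implementation: the `Word` record, the pre-segmented `surface/pos` input notation read by `toy_analyze`, and the noun-run rule in `toy_extract` are all assumptions introduced here so that the pipeline shape can be run end to end.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List


@dataclass(frozen=True)
class Word:
    surface: str                             # the word as written in the document
    pos: str                                 # part of speech given by morphological analysis
    features: FrozenSet[str] = frozenset()   # keyword features attached to the word, if any


def extract_keywords(document: str,
                     analyze: Callable[[str], List[Word]],
                     extract: Callable[[List[Word]], List[str]]) -> List[str]:
    """Input means 1 supplies `document`; means 2 and 3 are passed in as functions."""
    words = analyze(document)   # morphological analysis means 2
    return extract(words)       # keyword extraction means 3


def toy_analyze(document: str) -> List[Word]:
    """Assumed stand-in for a real analyzer: reads pre-segmented 'surface/pos' tokens."""
    words = []
    for token in document.split():
        surface, pos = token.rsplit("/", 1)
        words.append(Word(surface, pos))
    return words


def toy_extract(words: List[Word]) -> List[str]:
    """Assumed rule: each maximal run of nouns becomes one keyword candidate."""
    keywords, run = [], []
    for word in words:
        if word.pos == "noun":
            run.append(word.surface)
        elif run:
            keywords.append("".join(run))
            run = []
    if run:
        keywords.append("".join(run))
    return keywords


print(extract_keywords("横浜/noun 工場/noun の/particle 事故/noun", toy_analyze, toy_extract))
```

Passing the analyzer and extractor in as functions keeps the sketch independent of any particular morphological analyzer, which the text does not name.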

[0006] FIG. 2 is a flowchart for explaining the operation of the keyword extraction method according to the present invention, specifically the handling of prefixes. The steps are described in order. First, one word is input (step 1), and it is determined whether the word is a prefix (step 2). If it is not a prefix, processing returns to step 1. If it is a prefix, it is checked whether any of the keyword features "prefix special 1", "prefix special 2", or "prefix special 3" is assigned to it. If prefix special 1 is assigned (step 3), the prefix together with the following word or words is taken as the keyword (step 4). If prefix special 2 is assigned (step 5), it is checked whether two or more words follow the prefix (step 6). If they do, the prefix together with the following words is taken as the keyword (step 4). If they do not, neither the prefix nor the following word is taken as a keyword (step 7).
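The prefix special 1 and prefix special 2 branches of FIG. 2 can be sketched as a small function. This is a hedged illustration only: the function name `prefix_keyword`, the string constants, and the convention of returning a list of words (an empty list meaning "no keyword") are assumptions made for this example.

```python
from typing import List, Optional

PREFIX_SPECIAL_1 = "prefix special 1"
PREFIX_SPECIAL_2 = "prefix special 2"


def prefix_keyword(prefix: str, feature: Optional[str], following: List[str]) -> List[str]:
    """Return the words that form the keyword, or [] when nothing is kept.

    Only the two branches described so far are covered; any other
    feature value is rejected rather than guessed at.
    """
    if feature == PREFIX_SPECIAL_1:        # step 3
        return [prefix] + following        # step 4: prefix plus following word(s)
    if feature == PREFIX_SPECIAL_2:        # step 5
        if len(following) >= 2:            # step 6: do two or more words follow?
            return [prefix] + following    # step 4
        return []                          # step 7: neither prefix nor following word
    raise ValueError("branch not covered by this sketch")
```

With placeholder words, `prefix_keyword("P", PREFIX_SPECIAL_2, ["A", "B"])` keeps all three words, while `prefix_keyword("P", PREFIX_SPECIAL_2, ["A"])` keeps none.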

[0007] Next, if prefix special 3 is assigned (step 8), it is first checked whether the keyword feature "compound word base" is assigned to the word that follows (step 9). Compound word base is a keyword feature assigned to words that tend to come at the end of a compound word. If compound word base is assigned, the prefix and the one word after it are removed, and all of the remaining word or words are taken as the keyword (step 10). If it is not assigned, only the prefix is removed, and all of the following word or words are taken as the keyword. Finally, if no keyword feature is assigned to the prefix (step 12), only the prefix is removed, and all of the following word or words are taken as the keyword (step 11).
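The prefix special 3 branch and the no-feature branch can be sketched in the same way. The names `strip_prefix` and `Following` and the tuple representation are assumptions for this illustration. The example words come from paragraph [0018], but the way they are combined and segmented below is hypothetical, since Table 3 itself is not reproduced in the text.

```python
from typing import FrozenSet, List, Optional, Tuple

PREFIX_SPECIAL_3 = "prefix special 3"
COMPOUND_WORD_BASE = "compound word base"

# Each word after the prefix: (surface, keyword features assigned to it).
Following = List[Tuple[str, FrozenSet[str]]]


def strip_prefix(feature: Optional[str], following: Following) -> List[str]:
    """Return the words kept as the keyword once the prefix has been removed."""
    surfaces = [surface for surface, _ in following]
    if feature == PREFIX_SPECIAL_3:                               # step 8
        if following and COMPOUND_WORD_BASE in following[0][1]:   # step 9
            return surfaces[1:]                                   # step 10: also drop the next word
        return surfaces                                           # same action as step 11
    if feature is None:                                           # step 12
        return surfaces                                           # step 11: drop only the prefix
    raise ValueError("branch not covered by this sketch")


base = frozenset({COMPOUND_WORD_BASE})
none = frozenset()
# Hypothetical segmentation: prefix 同, then 社 (compound word base), 横浜, 工場.
print(strip_prefix(PREFIX_SPECIAL_3, [("社", base), ("横浜", none), ("工場", none)]))
```

Returning an empty list when the only following word carries compound word base matches the case where nothing remains after the prefix and that word are removed.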

[0008] FIG. 3 is a flowchart of the handling of suffixes. The steps are described in order. First, one word is input (step 1), and it is determined whether the word is a suffix (step 2). If it is not a suffix, processing returns to step 1. If it is a suffix, it is checked whether any of the keyword features "suffix person name", "suffix special 1", or "suffix special 2" is assigned to it. If suffix person name is assigned (step 3), only the suffix is removed, and the preceding word or words are taken as the keyword (step 4).

[0009] Next, if suffix special 1 is assigned (step 5), it is determined whether two or more words preceded the suffix (step 6). If two or more words preceded it, it is further determined whether any word follows (step 7). If no word follows, only the suffix is removed, and the preceding word or words are taken as the keyword (step 4). If a word follows, the whole, including the preceding word or words and the suffix, is taken as the keyword (step 8). If step 6 finds that two or more words did not precede the suffix, it is likewise checked whether any word follows (step 9). If a word follows, the whole, including the following word, is taken as the keyword (step 8). If no word follows, neither the preceding word nor the suffix is taken as a keyword (step 10).
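The suffix person name and suffix special 1 branches of FIG. 3 reduce to a short decision on two conditions: how many words precede the suffix and whether any word follows. The sketch below is an illustration under assumptions: the function name, constants, and list-of-words return value are introduced here, and the segmentation of the example phrase (which the text cites for this feature) is hypothetical.

```python
from typing import List

SUFFIX_PERSON_NAME = "suffix person name"
SUFFIX_SPECIAL_1 = "suffix special 1"


def suffix_keyword(preceding: List[str], suffix: str, feature: str,
                   following: List[str]) -> List[str]:
    """Return the words that form the keyword, or [] when nothing is kept."""
    whole = preceding + [suffix] + following
    if feature == SUFFIX_PERSON_NAME:      # step 3
        return preceding                   # step 4: drop only the suffix
    if feature == SUFFIX_SPECIAL_1:        # step 5
        if following:                      # steps 7 and 9 lead to the same outcome
            return whole                   # step 8: keep everything
        if len(preceding) >= 2:            # step 6
            return preceding               # step 4: drop only the suffix
        return []                          # step 10: keep nothing
    raise ValueError("branch not covered by this sketch")


# Hypothetical segmentation of ライン上事故防止対策 around the suffix 上.
print(suffix_keyword(["ライン"], "上", SUFFIX_SPECIAL_1, ["事故", "防止", "対策"]))
```

Checking for a following word first is a design choice of this sketch: both step 7 and step 9 send that case to step 8, so the count of preceding words only matters when nothing follows.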

[0010] Next, if suffix special 2 is assigned (step 11), it is again determined whether two or more words preceded the suffix (step 12). If they did, the whole, including any following words, is taken as the keyword (step 8). If they did not, it is determined whether any word follows (step 13). If a word follows, the whole, including the suffix, is taken as the keyword (step 8). If no word follows, neither the preceding word nor the suffix is taken as a keyword (step 10). Finally, if no keyword feature is assigned to the suffix (step 14), the whole, including the suffix, is taken as the keyword (step 8).
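The suffix special 2 branch and the no-feature branch can be sketched as follows. As with the other examples, the function name, the constant, and the return convention are assumptions for illustration, and the segmentation of 全国的オンライン (the phrase the text cites for this feature) is hypothetical.

```python
from typing import List, Optional

SUFFIX_SPECIAL_2 = "suffix special 2"


def suffix_keyword_special_2(preceding: List[str], suffix: str,
                             feature: Optional[str],
                             following: List[str]) -> List[str]:
    """Return the words that form the keyword, or [] when nothing is kept."""
    whole = preceding + [suffix] + following
    if feature == SUFFIX_SPECIAL_2:                   # step 11
        if len(preceding) >= 2 or following:          # steps 12 and 13
            return whole                              # step 8: suffix stays in the keyword
        return []                                     # step 10: keep nothing
    if feature is None:                               # step 14
        return whole                                  # step 8
    raise ValueError("branch not covered by this sketch")


# Hypothetical segmentation of 全国的オンライン around the suffix 的.
print(suffix_keyword_special_2(["全国"], "的", SUFFIX_SPECIAL_2, ["オンライン"]))
```

Unlike suffix special 1, this feature never strips the suffix alone: the suffix is either kept with its neighbours or discarded along with the single preceding word.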

[0011] FIG. 4 is a flowchart for the case where the forced deletion keyword feature is assigned. The steps are described in order. First, one word is input (step 1), and it is determined whether the forced deletion keyword feature is assigned to that word (step 2). If it is not assigned, processing returns to step 1. If it is assigned, the word is forcibly deleted.
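The loop of FIG. 4 amounts to a filter over the word sequence. The sketch below assumes a hypothetical representation in which each word is a `(surface, features)` pair; the names `FORCED_DELETION` and `drop_forced` are introduced here for illustration.

```python
from typing import FrozenSet, List, Tuple

FORCED_DELETION = "forced deletion"


def drop_forced(words: List[Tuple[str, FrozenSet[str]]]) -> List[str]:
    """Keep every word except those assigned the forced deletion feature."""
    kept = []
    for surface, features in words:        # step 1: take one word at a time
        if FORCED_DELETION in features:    # step 2: is forced deletion assigned?
            continue                       # assigned: the word is deleted
        kept.append(surface)               # not assigned: move on to the next word
    return kept
```

For example, with placeholder words, `drop_forced([("A", frozenset()), ("B", frozenset({FORCED_DELETION}))])` keeps only `"A"`.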

[0012] Next, an embodiment of the present invention is described using examples.

[0013]

[Table 1]

[0014] In other words, whatever the following words are, the prefix is also made part of the keyword.

[0015]

[Table 2]

[0016] When two or more words follow the prefix, the whole is taken as the keyword; when only one word follows, it is not taken as a keyword.

[0017]

[Table 3]

[0018] Here, 社 ("company") in 同社 ("the same company") and 誌 ("magazine") in 各誌 ("each magazine") are assigned the compound word base keyword feature. When the word after the prefix is assigned compound word base, the prefix and the word immediately after it are deleted (see 横浜工場, "Yokohama factory", and 各誌). When compound word base is not assigned, only the prefix is deleted and what follows remains as the keyword (see １０１３機種, "1013 models", and 駆動装置, "drive device").

[0019]

[Table 4]

[0020] Suffixes of this kind, which come after personal names, are not made part of the keyword.

[0021]

[Table 5]

[0022] When two or more words precede the suffix, the keyword is formed with the suffix removed. When only one word precedes it, the whole is not taken as a keyword. However, when a word follows, the keyword includes the suffix (see ライン上事故防止対策, "measures to prevent accidents on the line").

[0023]

[Table 6]

[0024] When two or more words precede the suffix, the keyword includes the suffix. When only one word precedes it, the whole is not taken as a keyword. However, when a word follows, the keyword includes the suffix (see 全国的オンライン, "nationwide online").

[0025] When a word assigned this keyword feature (forced deletion) appears, it is forcibly deleted from the keywords.

[0026]

[Table 7]

[0027]

[Effects] As is clear from the above description, the present invention has the following effects.
(1) Effect corresponding to claim 1: Because a Japanese document is morphologically analyzed and keywords are extracted using the resulting part-of-speech information, every character string, not only special characters, can be treated as a keyword candidate.
(2) Effect corresponding to claim 2: By using prefix special 1, one of the keyword features, a necessary prefix can be extracted as part of a keyword.
(3) Effect corresponding to claim 3: By using prefix special 2, one of the keyword features, the same prefix can be made part of a keyword when it is needed and left out when it is not needed.
(4) Effect corresponding to claim 4: By using prefix special 3, one of the keyword features, the prefix and the following word can both be deleted when they are not needed, and, even when the following word is needed, only the prefix is deleted and the following word can be extracted.
(5) Effect corresponding to claim 5: By using suffix person name, one of the keyword features, suffixes often attached to personal names can be excluded from keywords.
(6) Effect corresponding to claim 6: By using suffix special 1, one of the keyword features, the suffix can be excluded from the keyword when it is not needed.
(7) Effect corresponding to claim 7: By using suffix special 2, one of the keyword features, the suffix and the following word can be excluded from the keyword when they are not needed.
(8) Effect corresponding to claim 8: By using forced deletion, one of the keyword features, unnecessary words can be forcibly excluded from the keywords.

[Brief description of drawings]

[FIG. 1] A configuration diagram for explaining an embodiment of the keyword extraction method according to the present invention.

[FIG. 2] A flowchart of the handling of prefixes in the present invention.

[FIG. 3] A flowchart of the handling of suffixes in the present invention.

[FIG. 4] A flowchart for the case where the forced deletion keyword feature is assigned in the present invention.

[Explanation of symbols]

1: input means, 2: morphological analysis means, 3: keyword extraction means.

Claims

1. A keyword extraction method characterized by morphologically analyzing a Japanese document and extracting keywords from the document using the part-of-speech information obtained from the morphological analysis together with information based on keyword features.

2. The keyword extraction method according to claim 1, wherein a required prefix can be extracted as a part of a keyword by using prefix special 1 which is one of the keyword features.

3. The keyword extraction method according to claim 1, wherein, by using prefix special 2, which is one of the keyword features, the same prefix is made part of a keyword when it is needed and is not made a keyword when it is not needed.

4. The keyword extraction method according to claim 1, wherein, by using prefix special 3, which is one of the keyword features, the prefix and the following word are both deleted when they are not needed, and, even when the following word is needed, only the prefix is deleted and the following word is extracted.

5. The keyword extraction method according to claim 1, wherein a suffix that is often attached to a personal name can be removed from the keyword by using suffix person name, which is one of the keyword features.

6. The keyword extraction method according to claim 1, wherein by using the suffix special 1 which is one of the keyword features, the suffix can be removed from the keyword when unnecessary.

7. The keyword extraction method according to claim 1, wherein, by using suffix special 2, which is one of the keyword features, the suffix and the following word can be removed from the keyword when they are not needed.

8. The keyword extraction method according to claim 1, wherein an unnecessary word can be forcibly removed from the keywords by using forced deletion, which is one of the keyword features.