JP2012164031A

JP2012164031A - Data processor, data storage device, data processing method, data storage method and program

Info

Publication number: JP2012164031A
Application number: JP2011022147A
Authority: JP
Inventors: Takumi Mori; Tadashi Matsuda; Takashi Ito; Mitsuhiro Hattori; Takahito Hirano
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2011-02-03
Filing date: 2011-02-03
Publication date: 2012-08-30

Abstract

[Problem] To safely speed up secret search that uses probabilistic encryption.
[Solution] A user terminal device 201 entropy-encodes a random value uniquely derived from the storage keyword specified for the document information to be stored. This produces a storage index value, which is attached to the tag associated with that document information when the tag is stored in a data center device 301. The user terminal device 201 then transmits the document information, the tag, and the storage index value to the data center device 301. The data center device 301 can use the storage index value to classify and index the tags, so secret search using probabilistic encryption can be accelerated safely.
[Selected drawing] Figure 1

Description

The present invention relates to a secret search technique.

Secret search is a technique for searching encrypted data while it remains encrypted.
In recent years it has attracted attention as a security technology for protecting data managed on the Internet, for example in cloud services, against threats such as eavesdropping.

There are two approaches to searching encrypted data: one based on deterministic encryption and one based on probabilistic encryption.
With deterministic encryption, encrypting the same keyword always yields the same ciphertext, so the same speed-up techniques used in unencrypted database search systems can be applied (for example, grouping tags that share the same keyword).
With probabilistic encryption, by contrast, encrypting the same keyword yields a different ciphertext each time.
Consequently, simple methods such as grouping identical ciphertexts (as binary data) cannot be used.
In particular, schemes based on probabilistic encryption often come with a security proof, which mathematically establishes that no information whatsoever can be obtained from the ciphertext.
For this reason, indexing that exploits (partial) information about keywords is difficult.
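The contrast can be illustrated with a toy in Python. This is a sketch under stated assumptions, not the encryption used in the invention: HMAC-SHA256 stands in for a deterministic keyword encryption, and HMAC over a fresh random nonce stands in for a probabilistic one. `KEY` and all helper names are hypothetical, and in this toy anyone holding `KEY` can test a match.

```python
import hashlib
import hmac
import secrets

KEY = b"hypothetical-shared-key"  # illustrative key, not from the source


def deterministic_tag(keyword: str) -> bytes:
    # HMAC is deterministic: the same keyword always yields the same bytes.
    return hmac.new(KEY, keyword.encode("utf-8"), hashlib.sha256).digest()


def probabilistic_tag(keyword: str) -> tuple[bytes, bytes]:
    # A fresh random nonce makes every tag different, even for the same keyword.
    nonce = secrets.token_bytes(16)
    mac = hmac.new(KEY, nonce + keyword.encode("utf-8"), hashlib.sha256).digest()
    return nonce, mac


def probabilistic_match(tag: tuple[bytes, bytes], keyword: str) -> bool:
    # Matching means recomputing with the tag's own nonce; comparing tag bytes is useless.
    nonce, mac = tag
    expected = hmac.new(KEY, nonce + keyword.encode("utf-8"), hashlib.sha256).digest()
    return hmac.compare_digest(mac, expected)


words = ["tax", "tax", "audit"]
det = [deterministic_tag(w) for w in words]
prob = [probabilistic_tag(w) for w in words]

print(len(set(det)))   # 2: the two "tax" tags are identical and can be grouped
print(len(set(prob)))  # 3 (with overwhelming probability): nothing to group on
print(probabilistic_match(prob[0], "tax"), probabilistic_match(prob[2], "tax"))  # True False
```

Grouping by raw bytes works only for the deterministic tags; for the probabilistic ones, each tag has to be tested individually.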

Next, a keyword search system using probabilistic encryption is described.

In secret search using probabilistic encryption, a search is typically carried out between a user (searcher or registrant) and a data center (data manager) as follows.
A data registrant registers document information together with tag data (hereinafter, a tag), obtained by encrypting a meaningless plaintext using a keyword for searching the document information as the key.
The data center stores each tag paired with its document information.
At search time, the searcher sends the data center a search query created from the keyword to be searched.
The data center uses the search query to perform a match check against the stored tags.
If a tag matches, the document information stored in the same pair is returned to the searcher.
In the above, document information includes the document itself encrypted with a cipher separate from the search system, the document name, location information of the database storing the document, and the like.
Also in the above, a search query is a decryption key for decrypting tags encrypted with the keyword corresponding to the search term. The search keyword cannot be read from the search query.
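The registration and search steps above can be sketched in Python as a toy. Assumptions, loudly: a symmetric HMAC construction stands in for the public-key (probabilistic) encryption described here, so unlike the real scheme this toy needs the user's secret to create tags; `USER_KEY`, `trapdoor`, `make_tag`, `matches`, and `DataCenter` are hypothetical names.

```python
import hashlib
import hmac
import secrets

USER_KEY = secrets.token_bytes(32)  # hypothetical user secret, standing in for the real key pair


def trapdoor(keyword: str) -> bytes:
    # Search query: a keyword-specific key; the keyword itself is never sent.
    return hmac.new(USER_KEY, keyword.encode("utf-8"), hashlib.sha256).digest()


def make_tag(keyword: str) -> tuple[bytes, bytes]:
    # Tag: a random "meaningless plaintext" (nonce) bound to the keyword-specific key.
    nonce = secrets.token_bytes(16)
    return nonce, hmac.new(trapdoor(keyword), nonce, hashlib.sha256).digest()


def matches(tag: tuple[bytes, bytes], query: bytes) -> bool:
    # Match check performed by the data center using only the tag and the query.
    nonce, mac = tag
    return hmac.compare_digest(mac, hmac.new(query, nonce, hashlib.sha256).digest())


class DataCenter:
    def __init__(self) -> None:
        self.pairs: list[tuple[tuple[bytes, bytes], str]] = []  # (tag, document information)

    def register(self, tag: tuple[bytes, bytes], doc_info: str) -> None:
        self.pairs.append((tag, doc_info))

    def search(self, query: bytes) -> list[str]:
        # Every stored tag must be checked: this linear scan is what makes search slow.
        return [doc for tag, doc in self.pairs if matches(tag, query)]


dc = DataCenter()
dc.register(make_tag("contract"), "doc-001")
dc.register(make_tag("invoice"), "doc-002")
dc.register(make_tag("contract"), "doc-003")
print(dc.search(trapdoor("contract")))  # ['doc-001', 'doc-003']
```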

A simple secret search scheme is described, for example, in Non-Patent Document 1.
It realizes secret search for a single keyword.

In secret search, the data center learns no information from the tags, the search queries, or the match checks, so a high level of security is provided.
However, every stored tag must be inspected during the match check performed at the data center, so searches are slow.
To address this, Patent Document 1 proposes a speed-up method that uses the search history, and Patent Document 2 proposes a method of constructing an index tree from Bloom filters.
A Bloom filter is a data structure for determining whether a given element belongs to the set registered in the filter; it can return false positives but never false negatives.
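For readers unfamiliar with the structure, here is a minimal Bloom filter in Python. The bit-array size `m`, the hash count `k`, and the salted-SHA-256 position derivation are illustrative choices, not parameters taken from Patent Document 2.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: m bits, k hash positions per element (illustrative parameters)."""

    def __init__(self, m: int = 256, k: int = 3) -> None:
        self.m, self.k = m, k
        self.bits = 0  # bit array packed into a Python int

    def _positions(self, item: str):
        # Derive k positions by salting SHA-256 with the hash index (one common approach).
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits |= 1 << p

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "present, or a false positive".
        return all((self.bits >> p) & 1 for p in self._positions(item))


bf = BloomFilter()
for kw in ["contract", "invoice"]:
    bf.add(kw)
print(bf.might_contain("contract"))  # True
print(bf.might_contain("memo"))      # usually False, but a false positive is possible
```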

Patent Document 1: JP 2005-134990 A
Patent Document 2: JP 2007-52698 A

Non-Patent Document 1: D. Boneh, G. Di Crescenzo, R. Ostrovsky, G. Persiano, "Public Key Encryption with Keyword Search", Proceedings of EUROCRYPT '04, LNCS vol. 3027, pp. 506-522 (2004)

The most basic secret search scheme is that of Non-Patent Document 1.
It is a secret search scheme that uses identity-based encryption (hereinafter, IBE: Identity-Based Encryption).
IBE encrypts data using an ID as the key.
An ID is any string that uniquely identifies a user, such as an e-mail address, a postal address, or a name.
Because the ID serves as the key, anyone can encrypt data.
Only the holder of the decryption key corresponding to the ID (generally, the owner of the ID used for encryption) can decrypt the encrypted data.
First, the data registrant uses the IBE encryption function to encrypt a meaningless plaintext with the keyword to be set for the document information as the key (ID), and stores the result at the data center, together with the document information, as the tag of that document information.
Because the tag is encrypted, no information about the keyword leaks.
Next, the searcher generates a search query from the keyword to be searched and his or her own secret key (user key), and sends it to the data center.
A third party cannot obtain keyword information from the search query either.
On receiving the search query, the data center uses it to run IBE decryption on each stored tag.
If the meaningless plaintext is decrypted correctly, the keyword used for encryption (the keyword set for the document information) corresponds to the ID used to generate the search query (the search keyword). The only thing the data center learns is the search result "match".

Patent Document 1 describes a mechanism that stores the result of a search for a given keyword in an index, so that when the same keyword is searched again later the result can be returned quickly.
The method of Patent Document 1 can return results quickly for a keyword that has been searched before, but for a keyword that has never been searched, all tags stored in the data center must still be searched.
Consequently, in a database storing a huge number of tags, obtaining results for a keyword searched for the first time takes a great deal of time.
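The behaviour attributed to Patent Document 1 can be pictured with a small Python cache. This is a sketch, not that document's design: it assumes the same keyword always produces the same query value so repeats can be recognized, and `HistoryCachedSearch` and `full_scan` are hypothetical names (the scan is a placeholder for the linear match check).

```python
from typing import Callable, Hashable


class HistoryCachedSearch:
    """Cache search results per query; only unseen queries trigger a full tag scan."""

    def __init__(self, full_scan: Callable[[Hashable], list]) -> None:
        self.full_scan = full_scan  # checks every stored tag against the query
        self.history: dict = {}     # query -> cached result
        self.full_scans = 0

    def search(self, query: Hashable) -> list:
        if query not in self.history:
            self.full_scans += 1    # first-time keyword: no shortcut is available
            self.history[query] = self.full_scan(query)
        return self.history[query]


def full_scan(query):
    # Placeholder for the expensive linear match check over all stored tags.
    return [f"result-for-{query}"]


s = HistoryCachedSearch(full_scan)
s.search("query-A")
s.search("query-A")  # served from the history
s.search("query-B")  # new query: another full scan
print(s.full_scans)  # 2
```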

The method of Patent Document 2 creates Bloom filters from keywords encrypted with deterministic encryption and constructs an index tree from them.
Because this method must compute the Hamming distances between the Bloom filters stored in the data center, the computation required to build the index tree grows rapidly as the number of stored Bloom filters increases.
In addition, the size of a Bloom filter limits how many keywords it can distinguish.
In particular, because of false positives, a sufficiently large filter must be prepared, taking into account the number of distinct keywords that may be registered in it.
Furthermore, when each Bloom filter corresponds to a single keyword (that is, only one keyword is inserted into the filter), the data center can analyze the number of data items associated with each Bloom filter and thereby learn information about the keyword distribution.
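The growth in computation can be seen in a short Python sketch. It only counts pairwise Hamming distances, represents each Bloom filter as an integer bit array, and assumes nothing about how Patent Document 2 then builds its tree; the filter values are made up.

```python
from itertools import combinations


def hamming(a: int, b: int) -> int:
    # Filters stored as int bit arrays; XOR marks the differing bits.
    return bin(a ^ b).count("1")


def pairwise_distances(filters: list[int]) -> dict[tuple[int, int], int]:
    # n filters give n*(n-1)/2 pairs, so the work grows quadratically.
    return {
        (i, j): hamming(filters[i], filters[j])
        for i, j in combinations(range(len(filters)), 2)
    }


filters = [0b10110000, 0b10100001, 0b00011110, 0b10110001]
d = pairwise_distances(filters)
print(len(d))     # 6 pairs for 4 filters
print(d[(0, 1)])  # 2
```

With 10,000 stored filters the same function would compute 49,995,000 distances.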

Most other similar indexing methods also manage a group of data items per single keyword.
As a result, the data center can learn the keyword distribution and perform "frequency analysis", inferring keywords by comparing that distribution with distributions known from experience.
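A toy Python illustration of this frequency-analysis risk, assuming each index value stands for exactly one keyword. The observed index values and the "known" keyword popularity ranking are invented for the example.

```python
from collections import Counter

# Index values the data center sees, one per stored tag (hypothetical data).
# With a one-keyword-per-index scheme, each opaque value stands for exactly one keyword.
observed = ["ix-7f", "ix-7f", "ix-7f", "ix-7f", "ix-2a", "ix-2a", "ix-c3"]

# Keyword popularity known from experience, most frequent first (also hypothetical).
known_ranking = ["meeting", "budget", "merger"]

counts = Counter(observed)
# Rank the opaque index values by frequency and align them with the known ranking.
guess = {ix: kw for (ix, _), kw in zip(counts.most_common(), known_ranking)}
print(guess)  # {'ix-7f': 'meeting', 'ix-2a': 'budget', 'ix-c3': 'merger'}
```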

The main object of the present invention is to solve the problems described above; specifically, it is to safely speed up secret search using probabilistic encryption.

A data processing device according to the present invention
is connected to a data storage device that stores a plurality of pieces of encrypted data and, associated with each piece of encrypted data, tag data that is checked when the encrypted data is searched, and comprises:
a keyword designation unit that designates, as a storage keyword, a keyword of storage target data to be stored in the data storage device;
an index generation unit that entropy-encodes a random value uniquely obtained from the storage keyword to generate an index value, which is attached to the tag data associated with the encrypted data of the storage target data when that tag data is stored in the data storage device; and
a communication unit that transmits, to the data storage device, a storage request containing the encrypted data of the storage target data, the tag data, and the index value.

According to the present invention, the data processing device entropy-encodes a random value uniquely obtained from the storage keyword to generate an index value that is attached to the tag data associated with the encrypted data of the storage target data when the tag data is stored in the data storage device. The data storage device can therefore classify and index the tag data using the index value, and secret search using probabilistic encryption can be safely accelerated.

FIG. 1 is a diagram showing a configuration example of the secret search system according to Embodiment 1.
FIG. 2 is a diagram showing a configuration example of the user terminal device according to Embodiment 1.
FIG. 3 is a diagram showing a configuration example of the data center device according to Embodiment 1.
FIG. 4 is a diagram showing a data configuration example of a search request according to Embodiment 1.
FIG. 5 is a diagram showing a data configuration example of registration data according to Embodiment 1.
FIG. 6 is a diagram showing a data configuration example of index information according to Embodiment 1.
FIG. 7 is a diagram showing an example of data storage in the data storage unit according to Embodiment 1.
FIG. 8 is a flowchart showing index value generation processing in the user terminal device according to Embodiment 1.
FIG. 9 is a flowchart showing data registration processing in the data center device according to Embodiment 1.
FIG. 10 is a flowchart showing search processing in the data center device according to Embodiment 1.
FIG. 11 is a diagram showing an example of hash calculation and an example of Huffman coding according to Embodiment 1.
FIG. 12 is a diagram showing a hardware configuration example of the user terminal device and the data center device according to Embodiment 1.

Embodiment 1.
This embodiment describes a configuration that safely speeds up secret search using probabilistic encryption by classifying and indexing keyword tags with an entropy code.
In this embodiment, a Huffman code is used as the example of an entropy code.
This embodiment also describes an example in which, between users (searchers and registrants) and a data center (data manager), the index value is the code value obtained by Huffman-coding a keyed hash value of the keyword.

The general flow in this embodiment is as follows.
First, the data registrant stores at the data center the "document information" about the document to be stored, "tag data" (hereinafter, a tag) generated from a keyword related to the document, and the value obtained by Huffman-coding the keyed hash value of the keyword (the index value).
Here, document information is, for example, the document itself encrypted with a cipher separate from the search system, the document name, or location information of the database storing the document.
The document itself cannot be viewed from the document information.
A tag is the value, used when searching for document information, obtained by encrypting a meaningless plaintext (a random number) with the keyword as the encryption key.
The data center stores each tag paired with its document information and builds an index from the index values.
At search time, the searcher sends the data center a search query created from the keyword to be searched, together with an index value generated from the search keyword by the same procedure as at registration.
The data center uses the received index value to narrow down the tags to be searched, and uses the search query to perform a match check against only those tags.
If a tag matches, the document information stored in the same pair is returned to the searcher.
In the following, a keyword specified for document information when it is stored at the data center is also called a storage keyword, and a keyword specified at search time is also called a search keyword.
Likewise, an index value stored at the data center together with document information and a tag is also called a storage index value, and an index value sent to the data center together with a search query at search time is also called a search index value.

Next, the configuration of the secret search system according to this embodiment is described.
FIG. 1 is a diagram showing a configuration example of the secret search system according to this embodiment.

In FIG. 1, the secret search system 100 comprises a user terminal device 201 and a data center device 301.
The user terminal device 201 is connected to an in-house LAN (Local Area Network) 102.
The in-house LAN 102 is connected to the data center device 301 via a network 101.

The user terminal device 201 is a PC (Personal Computer) used by a user within a company.
The user terminal device 201 stores document information and tags for searching it in the data center device 301, searches the tags accumulated in the data center device 301, and retrieves document information from the data center device 301.
The user terminal device 201 is an example of a data processing device.

The data center device 301 is a server device with a large-capacity storage device that stores document information and tags created within the company.
It also holds index values derived from entropy-encoded ciphertexts (or hash values) of keywords sent from the user terminal device 201, and has a function for efficiently searching the stored tags.
Because tags are stored in encrypted form, the data center device 301 cannot learn the keywords from the tags.
The data center device 301 is an example of a data storage device.

The network 101 is a communication path connecting the in-house LAN 102 and the data center device 301.
A typical example of the network 101 is the Internet.

The in-house LAN 102 is a communication path installed within the company, to which various server devices and PCs used in the company are connected.
In practice, the communication path is a complex one composed of dedicated lines, wireless links, routers, and the like.

FIG. 2 is a block diagram showing a configuration example of the user terminal device 201.
The user terminal device 201 comprises a document/key storage area 202, a document information management unit 203, a user I/F (Interface) unit 204, a search query generation unit 205, a tag generation unit 206, a communication unit 207, and an index generation unit 208.

The document/key storage area 202 stores the original documents from which the document information to be stored in the data center device 301 is generated, the user secret key used to generate search queries, and the public key used to generate tags.
When data obtained by encrypting a document is used as document information, the key used to encrypt the document is also stored.
The storage area also stores the key that the index generation unit 208 uses to encrypt (or hash) keywords obtained from the user I/F unit 204.

The document information management unit 203 generates document information based on a document stored in the document/key storage area 202.
The document cannot be viewed from the document information.
Examples of document information include the document itself encrypted with a cipher separate from the search system, the document name, and location information of the database storing the document.
In other words, the document information management unit 203 encrypts a document to be stored in the data center device 301 (storage target data) to generate document information (encrypted data of the storage target data).

The user I/F unit 204 is an interface for operating the secret search system 100.
It provides a function for the user to enter keywords for the search query generation unit 205 and the tag generation unit 206.
That is, the user I/F unit 204 designates storage keywords and search keywords according to instructions from the user.
The user I/F unit 204 is an example of a keyword designation unit.

The search query generation unit 205 generates a search query from the keyword the user wants to search for, entered via the user I/F unit 204, and the user key stored in the document/key storage area 202.
That is, the search query generation unit 205 generates, as the search query, a decryption key for decrypting tags corresponding to the search keyword.
The search query generation unit 205 is an example of a search keyword concealment unit.

The tag generation unit 206 generates tag data from the keyword that the user wants to set for a document, entered via the user I/F unit 204, and the public key stored in the document/key storage area 202.

The communication unit 207 connects to the in-house LAN 102 in order to send the data center device 301 search requests (a search query and a search index value) for executing searches, and registration data (document information, a tag, and a storage index value) for newly storing document information and tags in the data center device 301.
Registration data is also called a storage request.

The index generation unit 208 computes a hash value of the keyword received from the user I/F unit 204 using a common key (shared between the data registrant and the searcher) stored in the document/key storage area 202; this value is hereinafter called a keyed hash value.
It then outputs, as the index value, the value obtained by Huffman-coding the keyed hash value.
That is, the index generation unit 208 Huffman-codes a random value uniquely obtained from the storage keyword to generate the index value (storage index value) that is attached to the tag when the tag is stored in the data center device 301.
The index generation unit 208 also Huffman-codes a random value uniquely obtained from the search keyword to generate an index value (search index value). The data center device 301 compares this value with the stored storage index values to select the tags to be checked against the search query (the concealed search keyword).
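The text above does not detail how the keyed hash value is turned into a Huffman codeword. The following Python sketch shows one possible reading, under explicit assumptions: HMAC-SHA256 as the keyed hash, a Huffman code built from hypothetical symbol weights, and the index value taken as the codeword obtained by reading the hash bits through the prefix code until a complete codeword appears. The key, weights, and function names are all hypothetical.

```python
import hashlib
import heapq
import hmac
from itertools import count


def huffman_code(weights: dict[str, float]) -> dict[str, str]:
    """Build a Huffman code (symbol -> bit string) from symbol weights; needs >= 2 symbols."""
    tie = count()  # tie-breaker so the heap never compares dicts
    heap = [(w, next(tie), {sym: ""}) for sym, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing their codes with 0 and 1.
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, next(tie), merged))
    return heap[0][2]


def keyed_hash_bits(key: bytes, keyword: str) -> str:
    """Keyed hash of the keyword (HMAC-SHA256 assumed), as a 256-character bit string."""
    digest = hmac.new(key, keyword.encode("utf-8"), hashlib.sha256).digest()
    return "".join(f"{b:08b}" for b in digest)


def index_value(key: bytes, keyword: str, codewords: set[str]) -> str:
    """Read the hash bits through the prefix code until a full codeword appears."""
    bits = keyed_hash_bits(key, keyword)
    for end in range(1, len(bits) + 1):
        if bits[:end] in codewords:  # prefix-free, so the first hit is unambiguous
            return bits[:end]
    raise ValueError("hash shorter than the longest codeword")


KEY = b"key-shared-by-registrant-and-searcher"  # hypothetical common key
code = huffman_code({"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125})  # hypothetical weights
codewords = set(code.values())
print(code)  # {'A': '0', 'B': '10', 'C': '110', 'D': '111'}

storage_ix = index_value(KEY, "contract", codewords)  # computed at registration
search_ix = index_value(KEY, "contract", codewords)   # recomputed at search time
print(storage_ix == search_ix)  # True
```

Because HMAC output behaves like uniform random bits and a Huffman code is a complete prefix code, a codeword of length L is reached with probability 2^-L; with the weights above, about half of all keywords would receive index value "0". In this sketch many keywords therefore share one index value, while a given keyword always maps to the same value, which is what lets the data center narrow down the tags to check.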

FIG. 3 is a block diagram showing a configuration example of the data center device 301.
The data center device 301 comprises a search request receiving unit 302, a search processing unit 303, an index storage unit 304, a data storage unit 305, an index classification unit 306, a search request answering unit 307, and a registration data receiving unit 308.

When the registration data receiving unit 308 receives registration data (a storage request) from the user terminal device 201, it assigns a tag ID (Identification) to the tag contained in the received registration data. It then forwards the storage index value and the tag ID to the index classification unit 306, and the tag and the document information to the data storage unit 305.
A tag ID is an identifier that uniquely identifies a tag.
The registration data receiving unit 308 is an example of a storage request receiving unit.

The index classification unit 306 has a function of receiving the storage index value and tag ID of the registration data from the registration data receiving unit 308 and adding index information to the index storage unit 304.
That is, the index classification unit 306 generates index information that associates a tag ID with a storage index value.
The index classification unit 306 is an example of an index information generation unit.
The index classification unit 306 also has a function of retrieving, from the index storage unit 304, the group of tag IDs corresponding to a search index value received from the search processing unit 303 described later.

The index storage unit 304 has a function of storing index information that indicates the association between storage index values and tag IDs.
In the index information stored in the index storage unit 304, one storage index value corresponds to one or more tag IDs.

The data storage unit 305 stores the document information received from the registration data receiving unit 308 in association with its tag.

The search request receiving unit 302 receives a search request sent from the user terminal device 201 and forwards it to the search processing unit 303.

The search processing unit 303 receives a search request from the search request receiving unit 302 and sends the search index value contained in it to the index classification unit 306. If a storage index value matching the search index value exists in the index storage unit 304, the search processing unit 303 retrieves from the data storage unit 305 the search target tags obtained from that storage index value, and performs a match check against the search query.
The search processing unit 303 then sends the document information obtained from the match check to the user terminal device 201 via the search request answering unit 307.
That is, the search processing unit 303 refers to the index information stored in the index storage unit 304 and selects one or more tag IDs associated with the storage index value that matches the search index value contained in the search request.
Further, the search processing unit 303 extracts the tags corresponding to the selected tag IDs from the data storage unit 305 and checks each extracted tag against the search query (the concealed search keyword) contained in the search request. It thereby identifies the tags generated from a storage keyword that matches the search keyword, and extracts the document information associated with those tags from the data storage unit 305.
The search processing unit 303 is an example of a tag ID selection unit and a data extraction unit.
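The data center side of registration and search described above can be sketched in Python as follows. This is a toy under stated assumptions: a symmetric HMAC construction stands in for the probabilistic encryption of tags and queries, index values are passed in as hypothetical strings rather than computed, and every class, method, and key name is hypothetical. The point it shows is that only tags sharing the search index value are ever match-checked.

```python
import hashlib
import hmac
import itertools
import secrets
from collections import defaultdict

USER_KEY = secrets.token_bytes(32)  # hypothetical; stands in for the user's keys


def trapdoor(keyword: str) -> bytes:
    # Toy search query (concealed search keyword).
    return hmac.new(USER_KEY, keyword.encode("utf-8"), hashlib.sha256).digest()


def make_tag(keyword: str) -> tuple[bytes, bytes]:
    # Toy probabilistic tag: random nonce bound to the keyword-specific key.
    nonce = secrets.token_bytes(16)
    return nonce, hmac.new(trapdoor(keyword), nonce, hashlib.sha256).digest()


def matches(tag: tuple[bytes, bytes], query: bytes) -> bool:
    nonce, mac = tag
    return hmac.compare_digest(mac, hmac.new(query, nonce, hashlib.sha256).digest())


class DataCenter:
    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.index = defaultdict(list)  # index storage: storage index value -> tag IDs
        self.store = {}                 # data storage: tag ID -> (tag, document information)
        self.checks = 0                 # number of match checks performed

    def register(self, doc_info: str, tag: tuple[bytes, bytes], storage_ix: str) -> int:
        tag_id = next(self._ids)        # assign a tag ID on receipt of registration data
        self.index[storage_ix].append(tag_id)
        self.store[tag_id] = (tag, doc_info)
        return tag_id

    def search(self, query: bytes, search_ix: str) -> list[str]:
        results = []
        for tag_id in self.index.get(search_ix, []):  # only tags with the same index value
            tag, doc_info = self.store[tag_id]
            self.checks += 1
            if matches(tag, query):
                results.append(doc_info)
        return results


dc = DataCenter()
dc.register("doc-001", make_tag("contract"), "0")
dc.register("doc-002", make_tag("memo"), "0")      # different keyword, same index value
dc.register("doc-003", make_tag("invoice"), "10")
print(dc.search(trapdoor("contract"), "0"))  # ['doc-001']
print(dc.checks)  # 2 (the tag of doc-003 was never checked)
```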

The search request answering unit 307 sends the document information found by the search processing unit 303 to the user terminal device 201 that issued the search request.

FIG. 4 shows an example of the data structure of the search request 2001 according to this embodiment.

Here, this configuration example is referred to as search request A.
Search request A consists of a search query 2002 and an index value 2004.
The search query 2002 is generated from the search keyword 2003 using the user secret key stored in the document/key storage area 202.
The search keyword 2003 cannot be recovered from the generated search query 2002.
The data center device 301 uses the search query 2002 to perform a matching check against the tags, and extracts the tags generated from the same storage keyword as the search keyword 2003.
The index value 2004 is an entropy code value obtained from the search keyword 2003 by the procedure of FIG. 8.

FIG. 5 shows an example of the data structure of the registration data 2301 according to this embodiment.

Here, this configuration example is referred to as registration data A.
Registration data A consists of document information 2302, a tag 2303, and an index value 2305.
The document information 2302 is, for example, the document itself encrypted with a cipher separate from the search system, the document name, or the location of the database storing the document; the document itself cannot be viewed from the document information.
The tag 2303 is data obtained by encrypting a meaningless plaintext using, as keys, the storage keyword 2304 assigned by the user to the document information 2302 and the public key stored in the document/key storage area 202.
The index value 2305 is an entropy code value obtained from the storage keyword 2304 by the procedure of FIG. 8.
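As a minimal sketch, search request A (FIG. 4) and registration data A (FIG. 5) can be modeled as plain Python records. The class names and field types are assumptions for illustration: ciphertexts are treated as opaque bytes, and index values as strings of "0"/"1" characters. Neither record carries the plaintext keyword; only its concealed forms leave the user terminal device 201.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchRequest:
    """Search request A (FIG. 4); field types are illustrative assumptions."""
    search_query: bytes  # search query 2002: the concealed search keyword
    index_value: str     # index value 2004: entropy code value as a bit string


@dataclass(frozen=True)
class RegistrationData:
    """Registration data A (FIG. 5); field types are illustrative assumptions."""
    document_info: bytes  # document information 2302 (e.g. a separately encrypted document)
    tag: bytes            # tag 2303: ciphertext derived from the storage keyword 2304
    index_value: str      # index value 2305: entropy code value as a bit string
```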

The index value in this embodiment is a binary bit string.
The index information may have any structure that yields the tag IDs to be searched from an index value.
In this embodiment, the index information takes the simplest form, an inverted index, as an example, but an index tree using a binary tree or a B-tree may be constructed instead.

FIG. 6 shows an example of the data structure of the index information 3001 stored in the index storage unit 304.
The index information 3001 consists of index values 3002 and tag IDs 3003.
The index value 3002 is a value (storage index value) obtained by entropy-encoding the keyed hash value of a keyword set for the document information.
The tag ID 3003 is the ID of the tag generated from the storage keyword used to generate the index value 3002.
At search time, the tags obtained from the group of tag IDs associated with the storage index value equal to the search index value specified by the searcher become the targets of the matching check.
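The index information 3001 can be sketched as a Python dictionary mapping each storage index value (a bit string) to the list of tag IDs that share it. The class name `IndexInformation` and its methods are hypothetical, and the index values used below are made up rather than taken from FIG. 11(b).

```python
from collections import defaultdict


class IndexInformation:
    """Inverted index: storage index value -> tag IDs (cf. FIG. 6)."""

    def __init__(self) -> None:
        self._entries: dict[str, list[int]] = defaultdict(list)

    def add(self, index_value: str, tag_id: int) -> None:
        # Tags whose keywords produce the same index value share one entry.
        self._entries[index_value].append(tag_id)

    def lookup(self, search_index_value: str) -> list[int]:
        # Only the tags in the matching entry become matching-check targets.
        # .get() avoids creating an empty entry for an unknown index value.
        return list(self._entries.get(search_index_value, []))
```

For instance, after `add("0110", 1)` and `add("0110", 2)`, `lookup("0110")` returns `[1, 2]`, and looking up an absent value returns `[]`.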

As shown in FIG. 7, the data storage unit 305 stores data as a table whose columns are the tag ID 3051, the tag 3052, and the document information 3053.
The tag ID 3051 is assigned by the registration data receiving unit 308 and matches the tag ID stored in the index storage unit 304.

FIG. 8 is a flowchart explaining the index value calculation method according to this embodiment.

First, in step S3061, the index generation unit 208 calculates a hash value of the keyword input from the user I/F unit 204, using the key stored in the document/key storage area 202. (This embodiment uses a hash value, but a value encrypted with a common-key cipher may be used instead.)
The hash function used to calculate the keyword hash value must always produce the same value from the same keyword.
It is also desirable that only the data registrant and the searcher hold the key, so that the data center cannot calculate the hash values of keywords.
For example, a deterministic cipher or a keyed hash function unrelated to the encrypted database is used.
In this embodiment, to avoid confusing terminology and to simplify the explanation, the keyed hash value of the keyword is used.
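Step S3061 can be sketched with HMAC-SHA256 from Python's standard library as the keyed hash function. The choice of HMAC-SHA256 and the function name `keyed_keyword_hash` are assumptions; the embodiment only requires a deterministic keyed hash (or deterministic cipher) whose key is held by the registrant and the searcher but not by the data center.

```python
import hashlib
import hmac


def keyed_keyword_hash(key: bytes, keyword: str) -> str:
    """Step S3061 sketch: keyed hash of a keyword (HMAC-SHA256 assumed).

    HMAC is deterministic, so the same key and keyword always give the same
    64-hex-digit value, while a party without the key cannot compute it.
    """
    return hmac.new(key, keyword.encode("utf-8"), hashlib.sha256).hexdigest()
```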

Next, in step S3062, the index generation unit 208 performs entropy encoding on the value generated in step S3061.
In this embodiment, a Huffman code is used as the entropy code.
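The embodiment does not say how the Huffman conversion table is built. The sketch below assumes one plausible reading: the table is built from the frequencies of the hex digits in the hash value itself, ties are broken deterministically so that the registrant and the searcher derive the same bit string from the same keyword, and the table is then discarded. The names `huffman_table` and `huffman_index_value` are hypothetical.

```python
import heapq
from collections import Counter


def huffman_table(symbols: str) -> dict[str, str]:
    """Build a Huffman code table from the symbol frequencies of `symbols`.

    A running counter breaks ties between equal frequencies, so the same
    input always yields the same table.
    """
    freq = Counter(symbols)
    if len(freq) == 1:
        # Degenerate input with one distinct symbol: give it the code "0".
        return {next(iter(freq)): "0"}
    heap = []
    counter = 0
    for sym in sorted(freq):
        # Heap item: (frequency, tie-breaker, {symbol: partial code}).
        heap.append((freq[sym], counter, {sym: ""}))
        counter += 1
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        # Prefix "0" onto the first-popped subtree's codes, "1" onto the other's.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2]


def huffman_index_value(hex_digest: str) -> str:
    """Huffman-encode a hex digest and return only the resulting bit string."""
    table = huffman_table(hex_digest)
    # Only the bit string is returned; the conversion table is not kept anywhere.
    return "".join(table[ch] for ch in hex_digest)
```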

Next, in step S3063, the index generation unit 208 transmits the code value output in step S3062 to the communication unit 207 as the index value.
An example using a Huffman code as the entropy code follows.

Let D(“original”), D(“confirm”), and D(“share”) denote document information having the keywords “original”, “confirm”, and “share”, respectively.
Let tag(“original”), tag(“confirm”), and tag(“share”) be the tags attached to each piece of document information, and let the tag IDs assigned to these tags be 1, 2, and 3, respectively.
Expressed as (tag, tag ID) pairs, these are (tag(“original”), 1), (tag(“confirm”), 2), and (tag(“share”), 3).

Next, the hash values of the keywords “original”, “confirm”, and “share” are calculated.
For example, hashing each keyword with the SHA256 hash algorithm yields a 64-digit hexadecimal number (0-9, a-f), as shown in FIG. 11(a).
Because of the properties of SHA256, the keyword cannot be recovered from this hash value.
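The hashing step of this example uses plain (unkeyed) SHA256 and can be reproduced with Python's `hashlib`. The digests are not listed here, only their format; in the embodiment itself a keyed hash is preferred (step S3061).

```python
import hashlib

KEYWORDS = ["original", "confirm", "share"]

# Unkeyed SHA256, as in the worked example; each digest is 64 hex digits (0-9, a-f).
digests = {kw: hashlib.sha256(kw.encode("utf-8")).hexdigest() for kw in KEYWORDS}

for kw, digest in digests.items():
    print(f"{kw}: {digest}")
```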

Next, the obtained hash values are Huffman-encoded.
Huffman encoding yields binary values, as shown in FIG. 11(b).
The binary number obtained by Huffman-encoding the hash value of a keyword is used as its index value.

The hash value cannot be restored from the above Huffman code value without the conversion table used at encoding time.
In this embodiment, the Huffman conversion table is discarded, so the hash value cannot be restored from the code value.
Moreover, because “original” and “confirm” have the same index value, tag(“original”) and tag(“confirm”) are stored in the same entry of the index information in the data center device 301.
Consequently, unlike with a simple indexing technique, the distribution of tags cannot be read from the index values, which prevents keyword frequency analysis and makes the scheme more secure.

Next, the operation of the secret search system 100 will be described.
FIG. 9 is a flowchart illustrating an example of the data registration process according to this embodiment.
This process is performed by the data center device 301.

First, in step S401, the data center device 301 receives, at the registration data receiving unit 308, registration data A transmitted from the user terminal device 201 via the network 101.
Registration data A has the structure of the registration data 2301 shown in FIG. 5.
The registration data receiving unit 308 assigns a tag ID to the received tag 2303.
It then transmits the tag ID, the document information 2302, and the tag 2303 to the data storage unit 305, and transmits the tag ID and the index value 2305 to the index classification unit 306.

Next, in step S402, the index classification unit 306 refers to the index storage unit 304 to check whether the index value received from the registration data receiving unit 308 already exists in the index information.

Next, in step S403, the index classification unit 306 checks whether the index value received from the registration data receiving unit 308 is contained in an entry of the index information 3001 stored in the index storage unit 304.
If a matching entry already exists (match), the process proceeds to step S404.
If no matching entry exists (no match), the process proceeds to step S406.

Next, in step S404, the index classification unit 306 adds the tag ID received from the registration data receiving unit 308 to the tag IDs 3003 of the matching entry of the index information 3001.

Next, in step S405, the data storage unit 305 stores the tag ID, the document information 2302, and the tag 2303 received from the registration data receiving unit 308.

The following describes the case where, in step S403, the index value received from the registration data receiving unit 308 does not exist in the index information 3001 stored in the index storage unit 304.

In step S406, the index classification unit 306 creates a new entry from the index value received from the registration data receiving unit 308 and the corresponding tag ID, and adds the new entry to the index information 3001.
Step S405 is then executed.
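A minimal in-memory sketch of the registration process of FIG. 9 (steps S401 to S406) follows. The class `DataCenterStore`, its two dictionaries, and the sequential tag-ID counter are hypothetical stand-ins for the registration data receiving unit 308, the index classification unit 306, the index storage unit 304, and the data storage unit 305; the embodiment does not specify how tag IDs are assigned.

```python
from itertools import count


class DataCenterStore:
    """Hypothetical in-memory stand-in for units 304, 305, 306 and 308."""

    def __init__(self) -> None:
        # Index storage unit 304: index value -> tag IDs.
        self.index_info: dict[str, list[int]] = {}
        # Data storage unit 305: tag ID -> (tag, document information).
        self.data_storage: dict[int, tuple[bytes, bytes]] = {}
        self._next_tag_id = count(1)  # assumption: sequential tag IDs

    def register(self, document_info: bytes, tag: bytes, index_value: str) -> int:
        # S401: assign a tag ID to the received tag.
        tag_id = next(self._next_tag_id)
        # S402/S403: check whether an entry for this index value already exists.
        entry = self.index_info.get(index_value)
        if entry is not None:
            # S404: add the tag ID to the existing entry.
            entry.append(tag_id)
        else:
            # S406: create a new entry for this index value.
            self.index_info[index_value] = [tag_id]
        # S405: store the tag ID, tag and document information.
        self.data_storage[tag_id] = (tag, document_info)
        return tag_id
```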

This concludes the description of the data registration process.
Next, the data search process will be described.
FIG. 10 is a flowchart explaining an example of the data search process according to this embodiment.

First, in step S501, the data center device 301 receives, at the search request receiving unit 302, search request A transmitted from the user terminal device 201 via the network 101.

Next, in step S502, the search request receiving unit 302 transfers search request A to the search processing unit 303.
To refer to the index information, the search processing unit 303 transfers the index value 2004 included in search request A to the index classification unit 306.

Next, in step S503, the index classification unit 306 checks whether the index information 3001 stored in the index storage unit 304 contains the index value 2004 received from the search processing unit 303.

If the check in S503 finds that the index value 2004 of search request A does not exist in the index storage unit 304 (no match), then in step S504 the index classification unit 306 notifies the search processing unit 303 that the search keyword 2003 of search request A does not exist in the data storage unit 305.
Without searching the data storage unit 305, the search processing unit 303 replies, via the search request response unit 307, to the user terminal device 201 that issued the search request that no matching data exists.

If the check in S503 finds that the index value 2004 of the received search request A exists in the index storage unit 304 (match), then in step S505 the index classification unit 306 returns the group of tag IDs corresponding to the index value 2004 from the index information 3001.
Using this tag ID group, the search processing unit 303 refers to the corresponding tags in the data storage unit 305 and performs a keyword matching check using the search query 2002 included in search request A.

Next, if the search in S505 finds a tag that matches search request A, the search processing unit 303, in step S504, reads the document information corresponding to that tag from the data storage unit 305 and returns it, via the search request response unit 307, to the user terminal device 201 that issued the search request.
If no data matching the search request exists in the data storage unit 305, the search processing unit 303 replies, via the search request response unit 307, to the user terminal device 201 that issued the search request that no matching data exists.
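The search process of FIG. 10 (steps S503 to S505) can be sketched as a function over in-memory structures shaped like the index information and the data storage table. What it illustrates is that the matching check runs only on the tags of one index entry rather than on every stored tag. `tag_matches` is a hypothetical placeholder for the probabilistic-encryption matching check between a tag and the search query 2002, which this sketch does not implement.

```python
from typing import Callable


def search(
    index_info: dict[str, list[int]],
    data_storage: dict[int, tuple[bytes, bytes]],
    search_query: bytes,
    index_value: str,
    tag_matches: Callable[[bytes, bytes], bool],
) -> list[bytes]:
    """Return document information whose tags match; [] means 'no matching data'."""
    # S503: look up the search index value in the index information.
    tag_ids = index_info.get(index_value)
    if not tag_ids:
        # S504: no entry, so answer 'not found' without scanning the data storage.
        return []
    # S505: matching check only on the candidate tags from this one entry.
    results = []
    for tag_id in tag_ids:
        tag, document_info = data_storage[tag_id]
        if tag_matches(tag, search_query):
            results.append(document_info)
    return results
```

A toy matcher such as `lambda tag, q: tag == q` is enough to exercise the control flow but provides no security; a real deployment would substitute the scheme's actual tag/query test.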

With the above procedure, data in an encrypted database can be searched securely and quickly while it remains encrypted.

According to the above embodiment, using the entropy code of the hash value of the search keyword makes it possible to build an index without leaking information about the search keyword, which enables fast searches.

In addition, the index built in this embodiment can group the search tags corresponding to one or more keywords, which conceals from the index the distribution of data corresponding to each keyword.

Furthermore, the index does not need to be rebuilt as the number of distinct stored keywords grows, which reduces the computation needed to maintain the index.

This embodiment discloses an example in which the value obtained by Huffman-encoding the keyed hash value of a keyword is used as the index value; however, any value uniquely obtained from the keyword may be used, for example one produced by a hash function or by common-key (deterministic) encryption.

Also, although this embodiment uses a Huffman code as the entropy code, any entropy code may be used, for example an arithmetic code.

Also, although this embodiment uses the simplest inverted index format as an example, any structure suffices as long as the tag IDs to be searched can be obtained from the index value; an index tree using a binary tree or a B-tree may be constructed.
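As a rough illustration of an ordered, tree-like alternative to the inverted index, the sketch below keeps index values in sorted order and finds them by binary search with Python's `bisect` module. This is a stand-in chosen for brevity, not a B-tree: lookups take O(log n) comparisons, but each insertion into a Python list shifts elements in O(n), which a real B-tree avoids. `SortedIndex` is a hypothetical name.

```python
import bisect


class SortedIndex:
    """Index values kept sorted and searched by binary search (B-tree stand-in)."""

    def __init__(self) -> None:
        self._keys: list[str] = []           # sorted index values
        self._tag_ids: list[list[int]] = []  # tag IDs, parallel to _keys

    def _find(self, index_value: str) -> int:
        # Position where index_value is, or would be inserted.
        return bisect.bisect_left(self._keys, index_value)

    def add(self, index_value: str, tag_id: int) -> None:
        pos = self._find(index_value)
        if pos < len(self._keys) and self._keys[pos] == index_value:
            self._tag_ids[pos].append(tag_id)  # existing entry
        else:
            self._keys.insert(pos, index_value)  # new entry, order preserved
            self._tag_ids.insert(pos, [tag_id])

    def lookup(self, index_value: str) -> list[int]:
        pos = self._find(index_value)
        if pos < len(self._keys) and self._keys[pos] == index_value:
            return list(self._tag_ids[pos])
        return []
```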

An indexing method using entropy codes in secret search has been disclosed above. It is clearly applicable not only to keyword search of document information but to any key data (search key) used to search arbitrary data.
In other words, the term keyword as used in this embodiment means not only words and sentences but key data of any form.
Thus, according to this embodiment, a search using a plurality of search keys can be performed at high speed while the data remains encrypted.
It can therefore be applied to image search, video search, audio search, and the like.

As described above, this embodiment has described a secret search system comprising:
an index generation unit that generates index values from a search keyword and from the keywords set for document information;
a registration data receiving unit that receives the keyword tag and document information to be newly registered;
a data storage unit that stores all keyword tags and document information received from the registration data receiving unit;
an index storage unit that stores an index holding a plurality of entries, each consisting of a value (index value) obtained by entropy-encoding a random value (ciphertext or hash value) uniquely obtained from a keyword and the tag IDs indicating the keyword tags corresponding to that index value;
a search processing unit that, when the index value included in a search request exists in the index storage unit, retrieves the corresponding tags from the data storage unit using the tag IDs to be searched and checks whether each tag matches the keyword; and
an index classification unit that, at data search time, retrieves from the index storage unit the tag IDs corresponding to the index value received from the search processing unit, and, at data registration time, adds index information to the index storage unit using the index value and tag ID received from the registration data receiving unit.

This embodiment has also described that:
the index generation unit calculates an entropy code value from a keyword set for document information at data registration time, or from a search keyword at data search time, and outputs it as an index value; and
the index classification unit stores and refers to the index values output by the index generation unit together with the tag IDs indicating the tags corresponding to those index values.

This embodiment has also described that:
the index classification unit stores the index values output by the index generation unit, together with the tag IDs indicating the corresponding tags, in table form in the index storage unit, and refers to them there.

This embodiment has also described that:
the index classification unit stores the index values output by the index generation unit, together with the tag IDs indicating the corresponding tags, in tree form (binary tree, B-tree, etc.) in the index storage unit, and refers to them there.

Finally, an example hardware configuration of the user terminal device 201 and the data center device 301 shown in this embodiment will be described.
FIG. 12 is a diagram showing an example of the hardware resources of the user terminal device 201 and the data center device 301 shown in this embodiment.
The configuration of FIG. 12 is merely one example of the hardware configuration of the user terminal device 201 and the data center device 301; their hardware configuration is not limited to that shown in FIG. 12, and other configurations may be used.

In FIG. 12, the user terminal device 201 and the data center device 301 each include a CPU 911 (Central Processing Unit; also called a central processing device, processing device, arithmetic device, microprocessor, microcomputer, or processor) that executes programs.
The CPU 911 is connected via a bus 912 to, for example, a ROM (Read Only Memory) 913, a RAM (Random Access Memory) 914, a communication board 915, a display device 901, a keyboard 902, a mouse 903, and a magnetic disk device 920, and controls these hardware devices.
The CPU 911 may also be connected to an FDD 904 (Flexible Disk Drive) and a compact disk device 905 (CDD). Instead of the magnetic disk device 920, a storage device such as an SSD (Solid State Drive), an optical disk device, or a memory card (registered trademark) reader/writer may be used.
The RAM 914 is an example of volatile memory. The storage media of the ROM 913, the FDD 904, the CDD 905, and the magnetic disk device 920 are examples of nonvolatile memory. These are examples of storage devices.
The document/key storage area 202, the index storage unit 304, and the data storage unit 305 described in this embodiment are implemented by the RAM 914, the magnetic disk device 920, and the like.
The communication board 915, the keyboard 902, the mouse 903, the FDD 904, and the like are examples of input devices.
The communication board 915, the display device 901, and the like are examples of output devices.

As shown in FIG. 1, the communication board 915 is connected to a network.
For example, the communication board 915 may be connected to a LAN, the Internet, a WAN (Wide Area Network), a SAN (Storage Area Network), or the like.

The magnetic disk device 920 stores an operating system 921 (OS), a window system 922, a program group 923, and a file group 924.
The programs in the program group 923 are executed by the CPU 911 using the operating system 921 and the window system 922.

The RAM 914 temporarily stores at least part of the program of the operating system 921 and of the application programs executed by the CPU 911.
The RAM 914 also stores various data needed for processing by the CPU 911.

The ROM 913 stores a BIOS (Basic Input Output System) program, and the magnetic disk device 920 stores a boot program.
When the user terminal device 201 and the data center device 301 start up, the BIOS program in the ROM 913 and the boot program in the magnetic disk device 920 are executed, and these programs start the operating system 921.

The program group 923 stores programs that execute the functions described as a “~ unit” in this embodiment (excluding the “index storage unit 304” and the “data storage unit 305”; the same applies below). The programs are read and executed by the CPU 911.

The file group 924 stores, as items of a “~ file” or “~ database”, the information, data, signal values, variable values, and parameters indicating the results of the processes described in this embodiment as “determination of ~”, “calculation of ~”, “encryption of ~”, “encoding of ~”, “comparison of ~”, “collation of ~”, “reference to ~”, “search for ~”, “extraction of ~”, “checking of ~”, “generation of ~”, “setting of ~”, “registration of ~”, “selection of ~”, “input of ~”, “reception of ~”, and the like.
The “~ file” and “~ database” are stored on a recording medium such as a disk or memory.
Information, data, signal values, variable values, and parameters stored on a storage medium such as a disk or memory are read into main memory or cache memory by the CPU 911 via a read/write circuit.
The information, data, signal values, variable values, and parameters read out are then used in CPU operations such as extraction, search, reference, comparison, arithmetic, calculation, processing, editing, output, printing, and display.
During these CPU operations, the information, data, signal values, variable values, and parameters are temporarily stored in main memory, registers, cache memory, buffer memory, and the like.
The arrows in the flowcharts described in this embodiment mainly indicate input and output of data and signals.
Data and signal values are recorded on recording media such as the memory of the RAM 914, the flexible disk of the FDD 904, the compact disk of the CDD 905, the magnetic disk of the magnetic disk device 920, and other optical disks, mini disks, and DVDs.
Data and signals are transmitted online via the bus 912, signal lines, cables, or other transmission media.

Also, what is described as a “~ unit” in this embodiment may be a “~ circuit”, “~ device”, or “~ apparatus”, or a “~ step”, “~ procedure”, or “~ process”.
That is, the data processing method and the data storage method according to the present invention can be realized by the steps, procedures, and processes shown in the flowcharts described in this embodiment.
What is described as a “~ unit” may also be realized by firmware stored in the ROM 913.
Alternatively, it may be implemented by software alone; by hardware alone, such as elements, devices, boards, and wiring; by a combination of software and hardware; or further in combination with firmware.
Firmware and software are stored as programs on recording media such as magnetic disks, flexible disks, optical disks, compact disks, mini disks, and DVDs.
The programs are read and executed by the CPU 911.
In other words, the programs cause a computer to function as the “~ units” of this embodiment, or cause a computer to execute the procedures and methods of the “~ units” of this embodiment.

As described above, the user terminal device 201 and the data center device 301 shown in this embodiment are computers that include a CPU as a processing device; memory, a magnetic disk, and the like as storage devices; a keyboard, a mouse, a communication board, and the like as input devices; and a display device, a communication board, and the like as output devices.
They realize the functions indicated as “~ units” by using these processing, storage, input, and output devices.

DESCRIPTION OF SYMBOLS 100 secret search system, 101 network, 102 internal LAN, 201 user terminal device, 202 document/key storage area, 203 document information management unit, 204 user I/F unit, 205 search query generation unit, 206 tag generation unit, 207 communication unit, 208 index generation unit, 301 data center device, 302 search request receiving unit, 303 search processing unit, 304 index storage unit, 305 data storage unit, 306 index classification unit, 307 search request response unit, 308 registration data receiving unit.

Claims

1. A data processing apparatus connected to a data storage device that stores a plurality of pieces of encrypted data and tag data that is associated with each piece of encrypted data and that is collated when the encrypted data is searched, the data processing apparatus comprising:
a keyword designating unit that designates, as a storage keyword, a keyword of storage target data to be stored in the data storage device;
an index generation unit that entropy-encodes a random value uniquely obtained from the storage keyword to generate an index value that is attached to the tag data associated with the encrypted data of the storage target data when the tag data is stored in the data storage device; and
a communication unit that transmits a storage request including the encrypted data of the storage target data, the tag data, and the index value to the data storage device.

2. The data processing apparatus according to claim 1, wherein
the data processing apparatus is connected to the data storage device that, upon receiving a search request including a concealed search keyword and an index value of the tag data to be collated with the concealed search keyword, extracts the tag data associated with an index value that matches the index value included in the search request and searches the encrypted data by collating the extracted tag data with the concealed search keyword, and
the index generation unit generates an index value that is compared, in the data storage device, with the index value included in the search request.

3. The data processing apparatus according to claim 1, wherein
the data processing apparatus is connected to the data storage device that stores the encrypted data, the tag data, and the index values in association with one another,
the keyword designating unit designates a search keyword with which the data storage device is caused to search for encrypted data,
the data processing apparatus further comprises a search keyword concealment unit that conceals the search keyword,
the index generation unit entropy-encodes a random value uniquely obtained from the search keyword to generate an index value that is compared with the index values stored in the data storage device in order to select the tag data to be collated with the concealed search keyword in the data storage device, and
the communication unit transmits a search request including the concealed search keyword and the index value to the data storage device.

4. The data processing apparatus according to claim 3, wherein the index generation unit
calculates a hash value of the storage keyword and Huffman-encodes the calculated hash value to generate the index value that is attached to the tag data associated with the encrypted data of the storage target data when the tag data is stored in the data storage device, and
calculates a hash value of the search keyword and Huffman-encodes the calculated hash value to generate the index value that is compared with the index values stored in the data storage device in order to select the tag data to be collated with the concealed search keyword in the data storage device.
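Claims 1, 3, and 4 derive the index value by entropy-encoding a random value uniquely obtained from a keyword, concretely a hash value that is then Huffman-encoded. The sketch below is one possible reading of that step, not the construction used in the embodiment: it assumes SHA-256 as the hash, reduces the hash to one of four symbols, and Huffman-encodes that symbol with a code built from hypothetical weights (`ASSUMED_FREQS`). Because the storage side and the search side apply the same function, equal storage and search keywords yield the same index value, while many different keywords share each index value.

```python
import hashlib
import heapq
from itertools import count

# Hypothetical symbol weights (an assumption, not from the patent): the hash is
# reduced to one of four symbols, and the Huffman code is built from these weights.
ASSUMED_FREQS = {0: 8, 1: 4, 2: 2, 3: 2}


def build_huffman_code(freqs):
    """Return a prefix-free Huffman code {symbol: bit string} for the given weights."""
    tie = count()  # unique tie-breaker so heapq never has to compare dicts
    heap = [(f, next(tie), {sym: ""}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate case: a lone symbol still needs one bit
        return {sym: "0" for sym in heap[0][2]}
    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing their codes with 0 and 1.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


HUFFMAN_CODE = build_huffman_code(ASSUMED_FREQS)


def index_value(keyword: str) -> str:
    """Hash the keyword, reduce the hash to a symbol, and Huffman-encode it."""
    digest = hashlib.sha256(keyword.encode("utf-8")).digest()
    symbol = int.from_bytes(digest, "big") % len(ASSUMED_FREQS)
    return HUFFMAN_CODE[symbol]
```

With these weights the code is `{0: "0", 1: "10", 2: "110", 3: "111"}`, so `index_value("invoice")` is one of those four strings, and it is the same string whether computed at storage time or at search time.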

5. A data storage device that is connected to a data processing device and stores encrypted data transmitted from the data processing device, the data storage device comprising:
a storage request receiving unit that receives, from the data processing device, a storage request including encrypted data to be stored, tag data that is collated when the encrypted data is searched, and a storage index value obtained by entropy-encoding a random value uniquely obtained from a storage keyword specified for the encrypted data to be stored, and that sets an ID (identification) of the tag data included in the received storage request as a tag ID;
an index information generation unit that generates index information associating the tag ID set by the storage request receiving unit with the storage index value included in the storage request;
an index storage unit that stores the index information generated by the index information generation unit; and
a data storage unit that stores the encrypted data to be stored included in the storage request and the tag data included in the storage request in association with each other.

6. The data storage device according to claim 5, wherein, when existing index information describing the same storage index value as the storage index value included in the storage request is already stored in the index storage unit, the index information generation unit associates the tag ID of the tag data included in the storage request with that storage index value in the existing index information, together with the other tag IDs already associated with the same storage index value.
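Claims 5 and 6 describe the storage side: each received tag is given a tag ID, and the index information lists, for each storage index value, every tag ID stored under it. The following is a minimal in-memory sketch, assuming a dictionary as the table-format index information of claim 8 and sequential integer tag IDs; the class and field names are hypothetical, and the ciphertext and tag are treated as opaque bytes.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import count


@dataclass
class StorageRequest:
    # Hypothetical layout: opaque ciphertext, opaque tag, and the storage index value.
    encrypted_data: bytes
    tag: bytes
    storage_index: str


class DataCenter:
    """Toy model of the storage side: a data storage unit plus an index storage unit."""

    def __init__(self):
        self._next_id = count(1)
        self.data_store = {}            # tag ID -> (encrypted data, tag data)
        self.index = defaultdict(list)  # storage index value -> [tag IDs]

    def store(self, req: StorageRequest) -> int:
        tag_id = next(self._next_id)    # assign an ID to the received tag data
        self.data_store[tag_id] = (req.encrypted_data, req.tag)
        # If index information for this value already exists, the new tag ID is
        # appended next to the tag IDs already listed for it (claim 6).
        self.index[req.storage_index].append(tag_id)
        return tag_id
```

Storing three requests with index values `"10"`, `"0"`, and `"10"` leaves the index as `{"10": [1, 3], "0": [2]}`: the third tag ID joins the existing entry instead of creating a new one.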

7. The data storage device according to claim 5 or 6, wherein
the data storage device is connected to a plurality of data processing devices, and
the data storage device further comprises:
a search request receiving unit that receives, from a data processing device requesting a search of encrypted data among the plurality of data processing devices, a search request including a concealed search keyword and a search index value obtained by entropy-encoding a random value uniquely obtained from the search keyword;
a tag ID selection unit that refers to the index information stored in the index storage unit and selects one or more tag IDs associated with a storage index value that matches the search index value included in the search request; and
a data extraction unit that extracts, from the data storage unit, the tag data corresponding to the tag IDs selected by the tag ID selection unit, collates each piece of extracted tag data with the concealed search keyword included in the search request to identify the tag data generated from a storage keyword that matches the search keyword, and extracts the encrypted data associated with the identified tag data from the data storage unit.
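Claim 7 splits the search into two stages: the search index value selects a small set of candidate tag IDs, and only those tags are collated with the concealed search keyword. The sketch below makes the saving visible by counting collations. The tag scheme is not specified in this section, so the sketch substitutes a toy probabilistic tag `(r, HMAC(k_w, r))` with a fresh random `r`, where the trapdoor `k_w` plays the role of the concealed search keyword; `MASTER_KEY`, `trapdoor`, `make_tag`, and `matches` are hypothetical stand-ins, not a vetted scheme.

```python
import hashlib
import hmac
import os

# Hypothetical user key for the toy tag scheme (not from the patent).
MASTER_KEY = b"hypothetical-user-key"


def trapdoor(keyword: str) -> bytes:
    """Toy concealed search keyword: a keyed hash of the keyword."""
    return hmac.new(MASTER_KEY, keyword.encode("utf-8"), hashlib.sha256).digest()


def make_tag(keyword: str) -> tuple:
    """Toy probabilistic tag: a fresh nonce makes equal keywords give different tags."""
    r = os.urandom(16)
    return r, hmac.new(trapdoor(keyword), r, hashlib.sha256).digest()


def matches(tag: tuple, td: bytes) -> bool:
    """Collate one tag with a concealed search keyword."""
    r, t = tag
    return hmac.compare_digest(hmac.new(td, r, hashlib.sha256).digest(), t)


def search(index, data_store, td, search_index):
    """Select tag IDs by index value, then collate only those candidate tags."""
    results, comparisons = [], 0
    for tag_id in index.get(search_index, []):
        ciphertext, tag = data_store[tag_id]
        comparisons += 1
        if matches(tag, td):
            results.append(ciphertext)
    return results, comparisons
```

With the index `{"10": [1, 3], "0": [2]}` and tags for "invoice" (ID 1), "memo" (ID 2), and "payroll" (ID 3), searching for "invoice" under index value `"10"` returns `[b"c1"]` after two collations instead of three. The index values in this example are assigned by hand for illustration.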

8. The data storage device according to claim 5, wherein the index information generation unit generates index information that associates the tag ID with the storage index value in a table format.

9. The data storage device according to claim 5, wherein the index information generation unit generates index information that associates the tag ID with the storage index value in a tree format.
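Claim 9 allows the index information to be held in a tree format rather than a table. If the storage index values are bit strings such as Huffman codewords (an assumption carried over from claim 4), a binary trie is one natural tree: each bit selects a child, and the node reached by a complete index value holds its tag IDs. The class names below are hypothetical, and the embodiment may use a different tree.

```python
class TrieNode:
    """Binary trie node; the node reached by a complete index value holds tag IDs."""

    def __init__(self):
        self.children = {}  # "0" / "1" -> TrieNode
        self.tag_ids = []


class TreeIndex:
    """Hypothetical tree-format index information keyed by bit-string index values."""

    def __init__(self):
        self.root = TrieNode()

    def add(self, storage_index: str, tag_id: int) -> None:
        node = self.root
        for bit in storage_index:  # walk (creating as needed) one level per bit
            node = node.children.setdefault(bit, TrieNode())
        node.tag_ids.append(tag_id)

    def lookup(self, search_index: str) -> list:
        node = self.root
        for bit in search_index:
            node = node.children.get(bit)
            if node is None:  # no stored index value follows this path
                return []
        return list(node.tag_ids)
```

Because Huffman codewords are prefix-free, no stored index value ends at an ancestor of another's node, so an interior node such as the one for `"1"` simply holds no tag IDs.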

10. A data processing method performed by a computer connected to a data storage device that stores a plurality of pieces of encrypted data and tag data that is associated with each piece of encrypted data and that is collated when the encrypted data is searched, the data processing method comprising:
a keyword designating step in which the computer designates, as a storage keyword, a keyword of storage target data to be stored in the data storage device;
an index generation step in which the computer entropy-encodes a random value uniquely obtained from the storage keyword to generate an index value that is attached to the tag data associated with the encrypted data of the storage target data when the tag data is stored in the data storage device; and
a communication step in which the computer transmits a storage request including the encrypted data of the storage target data, the tag data, and the index value to the data storage device.

11. A data storage method performed by a computer that is connected to a data processing device and stores encrypted data transmitted from the data processing device, the data storage method comprising:
a storage request receiving step in which the computer receives, from the data processing device, a storage request including encrypted data to be stored, tag data that is collated when the encrypted data is searched, and a storage index value obtained by entropy-encoding a random value uniquely obtained from a storage keyword specified for the encrypted data to be stored, and sets an ID (identification) of the tag data included in the received storage request as a tag ID;
an index information generation step in which the computer generates index information associating the tag ID set in the storage request receiving step with the storage index value included in the storage request;
an index storage step in which the computer stores the index information generated in the index information generation step; and
a data storage step in which the computer stores the encrypted data to be stored included in the storage request in association with the tag data included in the storage request.

12. A program that causes a computer connected to a data storage device that stores a plurality of pieces of encrypted data and tag data that is associated with each piece of encrypted data and that is collated when the encrypted data is searched to execute:
a keyword designating step of designating, as a storage keyword, a keyword of storage target data to be stored in the data storage device;
an index generation step of entropy-encoding a random value uniquely obtained from the storage keyword to generate an index value that is attached to the tag data associated with the encrypted data of the storage target data when the tag data is stored in the data storage device; and
a communication step of transmitting a storage request including the encrypted data of the storage target data, the tag data, and the index value to the data storage device.

13. A program that causes a computer that is connected to a data processing device and stores encrypted data transmitted from the data processing device to execute:
a storage request receiving step of receiving, from the data processing device, a storage request including encrypted data to be stored, tag data that is collated when the encrypted data is searched, and a storage index value obtained by entropy-encoding a random value uniquely obtained from a storage keyword specified for the encrypted data to be stored, and setting an ID (identification) of the tag data included in the received storage request as a tag ID;
an index information generation step of generating index information associating the tag ID set in the storage request receiving step with the storage index value included in the storage request;
an index storage step of storing the index information generated in the index information generation step; and
a data storage step of storing the encrypted data to be stored included in the storage request and the tag data included in the storage request in association with each other.