EP3583518A1 - Method for searching for information in an encrypted corpus stored on a server
- Publication number
- EP3583518A1
- Authority
- EP
- European Patent Office
- Prior art keywords
- encrypted
- server
- document
- encryption
- client equipment
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6227—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database where protection concerns the structure of data, e.g. records, types, queries
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/316—Indexing structures
- G06F16/319—Inverted lists
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/008—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols involving homomorphic encryption
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2216/00—Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
- G06F2216/11—Patent retrieval
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2221/00—Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/21—Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/2107—File encryption
Definitions
- The invention relates to searching for information in a database in a way that preserves the confidentiality of both the data and the queries.
- The application relates in particular to systems that process personal data, especially health data.
- Databases are an integral part of many applications, such as financial applications and eHealth applications. Databases can be very sensitive, containing valuable data from a company or individuals. Theft of sensitive data is a growing concern for individuals, businesses and governments.
- Databases can be collections of raw files or can be managed using a database management system (DBMS), such as Oracle Database, MySQL, or Microsoft SQL Server.
- a database can be deployed on a server within a company, on a virtual server in a cloud, or on a DBMS service in a cloud. Data theft is a concern for every type of deployment.
- a database system can also be deployed by a company on a virtual server, which runs on a cloud like Amazon Elastic Compute Cloud (Amazon EC2).
- The virtual server that hosts the database is physically under the control of the cloud provider, and the company installs a DBMS on that virtual server to manage its databases.
- Data theft can also occur in this case if the cloud infrastructure is compromised by attackers or infected with malware or viruses; the company's database administrators could also violate the confidentiality and integrity of the databases.
- Moreover, not all cloud providers are trustworthy; a provider can steal database data from the virtual servers it supplies.
- Homomorphic encryption methods have notably been developed for search engine applications: the user sends an encrypted request to the search engine, which processes it without learning its content. The engine applies a conventional search operation to find matching documents and returns the response to the user in encrypted form. The search engine thus never sees the query in clear text.
- Another application relates to biometrics using a fingerprint database of persons authorized to perform an action, for example entering a protected building. These fingerprints are naturally encrypted because they are non-revocable personal data.
- the search tree is encrypted with a first private encryption key.
- the server receives a request from a client, the request comprising a set of keywords, in which each request term is encrypted with the first private encryption key.
- the search is performed using a query and evaluation at each node of the tree to determine if one or more matches exist. The answer is based on the match of keywords for each document and one or more nodes encrypted with the first private encryption key.
- European patent EP2865127 describes homomorphic encryption for database querying. Numeric values are encrypted using keys and random numbers to produce ciphertext.
- the ciphertext is homomorphic and consists of two or more ciphered subtexts. Queries using addition, averaging and multiplication operations can be performed without decrypting the numerical values applicable to the query. Each encrypted subtext is stored in a single record and in separate attributes.
- the invention relates to methods for encryption and decryption, creating an appropriate table, querying such a database and updating such a database.
- Prior-art solutions have a major drawback: the computing power the server needs to run homomorphic encryption processing each time a new document is indexed and each time a new request is made. For this reason, they are applicable only to very small corpora, for example a business directory or a small set of textual documents.
- Prior-art solutions are also limited to searching for documents on a binary criterion, the presence or absence in the document of a term of the request, and cannot effectively rank the matching documents by relevance.
- the method according to the invention proposes an effective solution to the search for information in a large encrypted corpus.
- According to a first aspect, the invention relates to a method for searching for information in an encrypted corpus stored on a server, from a digital query computed on a client equipment and containing a sequence of terms, comprising the following steps:
- said first table $TF_i$ comprising, for each indexed term $w_j$ of the document, the number of occurrences of the term $w_j$ in document i
- said second table $\Delta df_i$ constituted by the index of the words $w_j$ in the document
- An additional step executed on the client equipment aggregating said identifiers of the data contained in said encrypted response and in the df_A index recorded on the client equipment
- The method comprises a step of reconstructing, on the client equipment, the index df_A from the encrypted information $\{\Delta df_i\}$ recorded for each document i in the dedicated space of the server assigned to user A.
- The calculations performed on the server are implemented in a parallel and/or distributed manner.
- The server (2) consists of a cloud platform.
- The invention also relates to a method for preparing a searchable database containing a sequence of terms, characterized in that it comprises the following steps:
- a) calculation steps on the client equipment, when a new document i is introduced, producing for each document i belonging to the corpus a first table $TF_i$ and a second table $\Delta df_i$
- said first table $TF_i$ comprising, for each indexed term $w_j$ of the document, the number of occurrences of the term $w_j$ in document i
- said second table $\Delta df_i$ constituted by the presence or absence of each term $w_j$ in document i
- b) the encryption of document i and of said table $\Delta df_i$, and the encryption by a homomorphic encryption method of said table $TF_i$
- The invention also relates to a method of searching for information in an encrypted corpus stored on a server, from a digital query computed on a client equipment and containing a sequence of terms, characterized in that it comprises the following steps:
- FIG. 1 represents a schematic view of a computer system according to the invention
- FIG. 2 represents a schematic view of the data flows between the various computing resources.
- Hardware architecture
- Figure 1 shows a schematic view of the hardware architecture of the invention.
- It comprises a client computer equipment (1) connected to a server (2) by a computer network, for example the Internet.
- The server (2) is associated with a memory (3) for storing a database.
- The server (2) comprises a processor for performing digital processing.
- In a particular example, the server (2) and the memory (3) consist of a set of distributed resources, for example of the "cloud" type.
- Functional architecture
- Figure 2 shows an example of a functional architecture.
- the client equipment (1) performs the initial processing of a document i constituted by a digital file (9) stored in a working memory.
- Each term of the document is first preprocessed by known means such as stemming, a stop list (removal of common words), and any other usual linguistic processing.
- The first task is to encrypt document i with a known cryptographic method, for example AES symmetric encryption, and to record an encrypted version (10) of this document on the client equipment, and optionally on the server (2) or a third-party storage service.
- The corpus of encrypted documents thus defined forms the document base (32).
- A second task, executed in parallel or sequentially, consists in computing an index of the occurrences of the terms present in the file (9) and recording a table $TF_i$ (14) of these occurrences, in the form of a list of the terms $w_j$ present in document i, each term $w_j$ of this list being associated with the number of occurrences $tf_{i,j}$ of the term $w_j$ in document i.
- The $TF_i$ table (14) is therefore of type $[w_j ; tf_{i,j}]_j$ for a document i.
- A third task consists of computing a table $\Delta df_i$ (15) that records, for each term $w_j$, the presence or absence of the term in the document.
- This table $\Delta df_i$ (15) is therefore of type
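As a minimal sketch of the second and third tasks, the Python below builds the two per-document tables from raw text. The stop list, the `naive_stem` helper and the dictionary layout are illustrative assumptions, not taken from the patent, which only refers to known stemming and stop-list methods. Terms absent from the document are simply omitted from `delta_df_i`.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical stop list; a real system would use a fuller, language-specific one.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "on", "the", "to"}


def naive_stem(token: str) -> str:
    """Toy stemmer (assumption): strip a trailing 's' from longer tokens."""
    if len(token) > 3 and token.endswith("s"):
        return token[:-1]
    return token


def preprocess(text: str) -> List[str]:
    """Lowercase, tokenise, drop stop words, then stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


def build_tables(text: str) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Return (TF_i, delta_df_i) for one document."""
    tf_i = dict(Counter(preprocess(text)))      # term -> number of occurrences
    delta_df_i = {term: 1 for term in tf_i}     # term -> 1 if present
    return tf_i, delta_df_i


tf_1, delta_df_1 = build_tables("The patient records and the patient history")
print(tf_1)        # {'patient': 2, 'record': 1, 'history': 1}
print(delta_df_1)  # {'patient': 1, 'record': 1, 'history': 1}
```

Keeping the two tables separate mirrors the text: the counts go on to homomorphic encryption, while the presence table is encrypted with a conventional cipher.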
- The $TF_i$ table (14) is then encrypted by a homomorphic encryption method, for example the method described in the article Zhou, H., & Wornell, G. (2014, February). Efficient homomorphic encryption on integer vectors and its applications. In Information Theory and Applications Workshop (ITA), 2014 (pp. 1-9). IEEE.
- The result of encrypting the $TF_i$ table (14) is a set of encrypted data (11).
- Each set of encrypted data (11) is transmitted by the client equipment (1) to the server (2).
- The collection of encrypted data sets (11) constitutes an encrypted base (30) of all the tables $\{TF_i\}_i$.
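To illustrate what "homomorphic" buys here, the sketch below encrypts a vector of term counts and adds two of them without decrypting. It does not implement the Zhou and Wornell integer-vector scheme cited above; it swaps in a toy Paillier cryptosystem, which is only additively homomorphic, with tiny hard-coded primes that are insecure and chosen purely for readability. The vocabulary order behind `tf_vector` is also an assumption.

```python
import math
import random

# Toy parameters (assumption): tiny fixed primes, insecure, for illustration only.
P, Q = 1009, 1013
N = P * Q
N_SQ = N * N
LAMBDA = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P-1, Q-1)
MU = pow(LAMBDA, -1, N)  # modular inverse; valid because the generator is N + 1


def encrypt(m: int) -> int:
    """Paillier encryption of an integer 0 <= m < N."""
    while True:
        r = random.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return ((1 + m * N) * pow(r, N, N_SQ)) % N_SQ


def decrypt(c: int) -> int:
    """Paillier decryption."""
    return ((pow(c, LAMBDA, N_SQ) - 1) // N * MU) % N


# Counts for a hypothetical vocabulary order: ["patient", "record", "history"].
tf_vector = [2, 1, 0]
enc_tf = [encrypt(count) for count in tf_vector]

# Multiplying ciphertexts adds the underlying counts, with no decryption involved.
enc_sum = (enc_tf[0] * enc_tf[1]) % N_SQ
print(decrypt(enc_sum))  # 3
```

A party holding only `enc_tf` can combine the entries but cannot read them; only the holder of `LAMBDA` and `MU` can decrypt the result.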
- The $\Delta df_i$ table (15) is encrypted by a known method, for example AES, and transmitted to the server (2), where an encrypted version (12) is recorded.
- The set of encrypted files (12) stored on the server constitutes a base (31).
- Each encrypted file (12) recorded on the server (2) makes it possible to reconstruct a df_A (13) table by decryption, using the inverse of the algorithm used for the encryption above.
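A minimal sketch of this reconstruction, assuming that df_A is the element-wise sum of the per-document presence tables and that the tables have already been decrypted on the client. The decryption step itself is omitted, and the function name `rebuild_df` is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def rebuild_df(delta_tables: Iterable[Dict[str, int]]) -> Dict[str, int]:
    """Sum per-document presence tables into a document-frequency index."""
    df: Counter = Counter()
    for delta_df_i in delta_tables:
        df.update(delta_df_i)  # adds each term's 0/1 flag to its running total
    return dict(df)


# Already-decrypted presence tables for two documents (illustrative data).
decrypted_tables = [
    {"patient": 1, "record": 1},
    {"patient": 1, "history": 1},
]
df_a = rebuild_df(decrypted_tables)
print(df_a)  # {'patient': 2, 'record': 1, 'history': 1}
```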
- This table df_A (13) is calculated only on the client equipment (1), from:
- This data preparation step results in the server storing data that cannot be queried directly and that reveals no significant information about the content of the documents, notably in the event of an attack on the server or a malicious action by a privileged user.
- Querying
- A query is made by issuing, from the client equipment (1), a textual request (20) formed by a combination of words.
- This request (20) is preprocessed by known means such as stemming, a stop list (removal of common words), and any other usual linguistic processing.
- The request (20) is encrypted with the same homomorphic encryption method as that used for the $TF_i$ table (14), to obtain an encrypted request (21).
- The encrypted request (21) is transmitted to the server (2), which records it to form a request (40).
- the server (2) calculates an encrypted response (41).
- This processing consists in calculating, in the encrypted domain, the number of occurrences of each term $q_k$ of the request (40) in each known document i.
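One way to write the quantity described here, setting encryption aside: assume, as an illustration rather than a statement of the patent's exact encoding, that each request term $q_k$ is represented by an indicator vector $e^{(k)}$ over the vocabulary, with $e^{(k)}_j = 1$ when $w_j = q_k$ and $0$ otherwise, and that $tf_{i,j} = 0$ for terms absent from document i. The per-document count is then an inner product with the table $TF_i$:

```latex
r_{i,k} \;=\; \sum_{j} e^{(k)}_{j}\, tf_{i,j} \;=\; tf_{i,j_k},
\qquad \text{where } w_{j_k} = q_k .
```

Under this reading, the server evaluates the same sum on ciphertexts, so the homomorphic scheme has to support the additions and products that the sum involves.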
- the client (1) is then able to decrypt the response (50) to calculate a decrypted response (51).
- The client (1) can combine the response (51) and the df_A (13) table to calculate a TF-IDF (Term Frequency-Inverse Document Frequency) score (52) according to a known method.
- This score (52) serves as a ranking key for ordering the documents i by relevance to the request (20).
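The scoring step can be sketched as follows. The text only says "a known method", so the variant used here, the sum over request terms of tf multiplied by log(N / df), is an assumption, as are the function name `rank_documents` and the shape of the decrypted response (document identifier mapped to per-term counts).

```python
import math
from typing import Dict, List, Tuple


def rank_documents(
    response: Dict[str, Dict[str, int]],
    df_a: Dict[str, int],
    n_docs: int,
) -> List[Tuple[str, float]]:
    """Score each document with TF-IDF and sort by decreasing relevance."""
    scores = {}
    for doc_id, counts in response.items():
        scores[doc_id] = sum(
            tf * math.log(n_docs / df_a[term])
            for term, tf in counts.items()
            if tf > 0 and df_a.get(term, 0) > 0
        )
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Decrypted response for the request "patient record" (illustrative data).
response = {
    "doc1": {"patient": 2, "record": 1},
    "doc2": {"patient": 1, "record": 0},
    "doc3": {"patient": 0, "record": 0},
}
df_a = {"patient": 2, "record": 1, "history": 1}

ranking = rank_documents(response, df_a, n_docs=3)
print([doc_id for doc_id, _ in ranking])  # ['doc1', 'doc2', 'doc3']
```

Because both the decrypted counts and df_A exist only on the client, the ranking is computed without the server learning the scores.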
- the client equipment (1) presents results in the manner of a search engine and allows the user to find the corresponding record.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Computer Security & Cryptography (AREA)
- Software Systems (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Computational Linguistics (AREA)
- Computer Hardware Design (AREA)
- General Health & Medical Sciences (AREA)
- Bioethics (AREA)
- Health & Medical Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Storage Device Security (AREA)
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| FR1751241A FR3062936B1 (fr) | 2017-02-15 | 2017-02-15 | Method for searching for information in an encrypted corpus stored on a server |
| PCT/FR2018/050276 WO2018150119A1 (fr) | 2017-02-15 | 2018-02-05 | Method for searching for information in an encrypted corpus stored on a server |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| EP3583518A1 (fr) | 2019-12-25 |
Family
ID=59974493
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP18706792.1A Withdrawn EP3583518A1 (fr) | 2017-02-15 | 2018-02-05 | Method for searching for information in an encrypted corpus stored on a server |
Country Status (5)
| Country | Link |
|---|---|
| US (1) | US11308233B2 (fr) |
| EP (1) | EP3583518A1 (fr) |
| CA (1) | CA3050353A1 (fr) |
| FR (1) | FR3062936B1 (fr) |
| WO (1) | WO2018150119A1 (fr) |
Families Citing this family (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US12099997B1 (en) | 2020-01-31 | 2024-09-24 | Steven Mark Hoffberg | Tokenized fungible liabilities |
| US20240411918A1 (en) * | 2023-06-06 | 2024-12-12 | Inigo Labs, Inc. | Edge-based access control for data queries |
| WO2025217386A1 (fr) * | 2024-04-12 | 2025-10-16 | Apple Inc. | Privacy-preserving queries using an on-device model |
Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US9442980B1 (en) * | 2010-04-21 | 2016-09-13 | Stan Trepetin | Mathematical method for performing homomorphic operations |
Family Cites Families (15)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US8677499B2 (en) * | 2005-12-29 | 2014-03-18 | Nextlabs, Inc. | Enforcing access control policies on servers in an information management system |
| US8321437B2 (en) * | 2005-12-29 | 2012-11-27 | Nextlabs, Inc. | Detecting behavioral patterns and anomalies using activity profiles |
| US9081981B2 (en) * | 2005-12-29 | 2015-07-14 | Nextlabs, Inc. | Techniques and system to manage access of information using policies |
| US7716240B2 (en) * | 2005-12-29 | 2010-05-11 | Nextlabs, Inc. | Techniques and system to deploy policies intelligently |
| WO2010026561A2 (fr) * | 2008-09-08 | 2010-03-11 | Confidato Security Solutions Ltd. | Apparatus, system, method and corresponding software components for encrypting and processing data |
| US20100146299A1 (en) * | 2008-10-29 | 2010-06-10 | Ashwin Swaminathan | System and method for confidentiality-preserving rank-ordered search |
| US8904171B2 (en) | 2011-12-30 | 2014-12-02 | Ricoh Co., Ltd. | Secure search and retrieval |
| WO2013188929A1 (fr) | 2012-06-22 | 2013-12-27 | Commonwealth Scientific And Industrial Research Organisation | Homomorphic encryption for database querying |
| EP2709028A1 (fr) * | 2012-09-14 | 2014-03-19 | Ecole Polytechnique Fédérale de Lausanne (EPFL) | Privacy-enhancing technologies for medical tests using genomic data |
| US9536047B2 (en) * | 2012-09-14 | 2017-01-03 | Ecole Polytechnique Federale De Lausanne (Epfl) | Privacy-enhancing technologies for medical tests using genomic data |
| WO2015017787A2 (fr) * | 2013-08-01 | 2015-02-05 | Visa International Service Association | Systems, methods and apparatuses for homomorphic database operations |
| US9501661B2 (en) * | 2014-06-10 | 2016-11-22 | Salesforce.Com, Inc. | Systems and methods for implementing an encrypted search index |
| US10037433B2 (en) * | 2015-04-03 | 2018-07-31 | Ntt Docomo Inc. | Secure text retrieval |
| US20170293913A1 (en) * | 2016-04-12 | 2017-10-12 | The Governing Council Of The University Of Toronto | System and methods for validating and performing operations on homomorphically encrypted data |
| US10783270B2 (en) * | 2018-08-30 | 2020-09-22 | Netskope, Inc. | Methods and systems for securing and retrieving sensitive data using indexable databases |
- 2017-02-15: FR, application FR1751241A, publication FR3062936B1, status Active
- 2018-02-05: US, application US16/483,684, publication US11308233B2, status Active
- 2018-02-05: EP, application EP18706792.1A, publication EP3583518A1, status Withdrawn
- 2018-02-05: WO, application PCT/FR2018/050276, publication WO2018150119A1, status Ceased
- 2018-02-05: CA, application CA3050353A, publication CA3050353A1, status Pending
Patent Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US9442980B1 (en) * | 2010-04-21 | 2016-09-13 | Stan Trepetin | Mathematical method for performing homomorphic operations |
Non-Patent Citations (2)
| Title |
|---|
| See also references of WO2018150119A1 * |
| TANG JUN ET AL: "Ensuring Security and Privacy Preservation for Cloud Data Services", ACM COMPUTING SURVEYS, ACM, NEW YORK, NY, US, vol. 49, no. 1, 6 June 2016 (2016-06-06), pages 1 - 39, XP058666244, ISSN: 0360-0300, DOI: 10.1145/2906153 * |
Also Published As
| Publication number | Publication date |
|---|---|
| FR3062936B1 (fr) | 2021-01-01 |
| US20200019723A1 (en) | 2020-01-16 |
| WO2018150119A1 (fr) | 2018-08-23 |
| FR3062936A1 (fr) | 2018-08-17 |
| CA3050353A1 (fr) | 2018-08-23 |
| US11308233B2 (en) | 2022-04-19 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| US10498706B2 (en) | Searchable encryption enabling encrypted search based on document type | |
| US9825925B2 (en) | Method and apparatus for securing sensitive data in a cloud storage system | |
| Zhang et al. | Pop: Privacy-preserving outsourced photo sharing and searching for mobile devices | |
| US10404669B2 (en) | Wildcard search in encrypted text | |
| Boldyreva et al. | Masking fuzzy-searchable public databases | |
| Cheng et al. | Person re-identification over encrypted outsourced surveillance videos | |
| US20140108435A1 (en) | Secure private database querying system with content hiding bloom fiters | |
| US20230306131A1 (en) | Systems and methods for tracking propagation of sensitive data | |
| Emekci et al. | Dividing secrets to secure data outsourcing | |
| US10284535B2 (en) | Secure database | |
| US20210184840A1 (en) | Encrypted Search with a Public Key | |
| EP2494491B1 (fr) | Identification by checking a user's biometric data | |
| EP3583518A1 (fr) | Method for searching for information in an encrypted corpus stored on a server | |
| Cui et al. | Harnessing encrypted data in cloud for secure and efficient image sharing from mobile devices | |
| Heen et al. | On the privacy impacts of publicly leaked password databases | |
| EP3461055B1 (fr) | System and method for secure outsourced annotation of datasets | |
| CN106874379B (zh) | Multi-dimensional range retrieval method and system for ciphertext cloud storage | |
| Malhotra et al. | An efficacy analysis of data encryption architecture for cloud platform | |
| CN106789007B (zh) | Network information review method and system based on ciphertext retrieval | |
| US12463947B1 (en) | Privacy preserving protocol for serving user-specific supplemental content | |
| Praveen et al. | On the Design of a Searchable Encryption Protocol for Keyword Search using Proactive Secret Sharing | |
| Alamri et al. | Secure sharing of health data over cloud | |
| Agarwal et al. | Privacy Preserving content-based image retrieval using Cloud Computing | |
| Surrah | Multi Keyword Retrieval On Secured Cloud | |
| Poon et al. | Privacy-aware search and computation over encrypted data stores |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: UNKNOWN |
| | STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE |
| | PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
| | STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
| | 17P | Request for examination filed | Effective date: 20190809 |
| | AK | Designated contracting states | Kind code of ref document: A1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
| | AX | Request for extension of the european patent | Extension state: BA ME |
| | DAV | Request for validation of the european patent (deleted) | |
| | DAX | Request for extension of the european patent (deleted) | |
| | STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: EXAMINATION IS IN PROGRESS |
| | 17Q | First examination report despatched | Effective date: 20211125 |
| | STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
| | 18D | Application deemed to be withdrawn | Effective date: 20231130 |