WO2023050687A1 - Sample alignment method and apparatus in federated learning, device, and storage medium
- Publication number
- WO2023050687A1 (PCT/CN2022/076928)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- encrypted
- participant
- sample
- private key
- identifier
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/602—Providing cryptographic facilities or services
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/64—Protecting data integrity, e.g. using checksums, certificates or signatures
Definitions
- The embodiments of the present invention relate to the technical field of artificial intelligence, and in particular to a sample alignment method, apparatus, device and storage medium in federated learning.
- Sample alignment is one of the important steps in vertical federated learning: the intersection of the participants' sample IDs (identifiers) is obtained without revealing those sample IDs, which facilitates subsequent vertical federated learning modeling.
- The embodiments of the present application provide a sample alignment method, apparatus, device and storage medium in federated learning, which are used to address the security of sample data.
- the embodiment of the present application provides a sample alignment method in federated learning, the method includes:
- the first public key is generated based on the first private key and a preset value;
- the second encrypted identifier set is obtained by respectively encrypting at least one second sample identifier of the second participant based on the second private key and the first public key;
- the embodiment of the present application provides a sample alignment method in federated learning, the method includes:
- the second public key is generated based on the second private key and a preset value;
- the first encrypted identifier set is obtained by respectively encrypting at least one first sample identifier of the first participant based on the first private key and the second public key;
- the embodiment of the present application provides a sample alignment device in federated learning, which includes:
- the first obtaining module is configured to obtain the first private key and the first public key of the first participant, and send the first public key to the second device, the first public key being generated based on the first private key and a preset value;
- the first receiving module is configured to receive the second public key of the second participant sent by the second device, wherein the second public key is generated based on the second private key of the second participant and the preset value;
- a first encryption module configured to respectively encrypt at least one first sample identifier of the first participant based on the first private key and the second public key, and obtain a first encrypted identifier set;
- the first receiving module is further configured to receive a second encrypted identifier set of the second participant sent by the second device, the second encrypted identifier set being obtained by respectively encrypting at least one second sample identifier of the second participant based on the second private key and the first public key;
- a first sample alignment module configured to obtain a sample alignment result of the first participant and the second participant based on the intersection of the first encrypted identifier set and the second encrypted identifier set.
- the first encryption module is also used for:
- the set of first encrypted identifiers includes first encrypted identifiers corresponding to each of the at least one first sample identifier;
- the first encryption module is specifically used for:
- the first encryption module is specifically used for:
- a hash algorithm is used to encrypt the first public-private key encrypted identifier to obtain a first encrypted identifier corresponding to the one first sample identifier.
- the first sample alignment module is specifically used for:
- the first sample identifier corresponding to each first encrypted identifier in the target encrypted identifier set is used as a sample alignment result of the first participant and the second participant.
- the first private key is generated based on the SM2 elliptic curve, and the preset value is the base point of the SM2 elliptic curve;
- the first acquisition module is specifically used for:
- the first public key is generated by multiplying the first private key and the base point.
- the embodiment of the present application provides a sample alignment device in federated learning, which includes:
- the second acquiring module is configured to acquire the second private key and the second public key of the second participant, and send the second public key to the first device, the second public key being generated based on the second private key and a preset value;
- the second receiving module is configured to receive the first public key of the first participant sent by the first device, wherein the first public key is generated based on the first private key of the first participant and the preset value;
- a second encryption module configured to respectively encrypt at least one second sample identifier of the second participant based on the second private key and the first public key, and obtain a second encrypted identifier set;
- the second receiving module is further configured to receive the first encrypted identifier set of the first participant sent by the first device, the first encrypted identifier set being obtained by respectively encrypting at least one first sample identifier of the first participant based on the first private key and the second public key;
- a second sample alignment module configured to obtain a sample alignment result of the second participant and the first participant based on the intersection of the second encrypted identifier set and the first encrypted identifier set.
- the second encryption module is also used for:
- the set of second encrypted identifiers includes second encrypted identifiers corresponding to the at least one second sample identifier
- the second encryption module is specifically used for:
- the second encryption module is specifically used for:
- a hash algorithm is used to encrypt the second public-private key encrypted identifier to obtain a second encrypted identifier corresponding to the one second sample identifier.
- the second sample alignment module is specifically used for:
- the second sample identifier corresponding to each second encrypted identifier in the target encrypted identifier set is used as a sample alignment result of the second participant and the first participant.
- the second private key is generated based on the SM2 elliptic curve, and the preset value is the base point of the SM2 elliptic curve;
- the second acquiring module is specifically used for:
- the second public key is generated by multiplying the second private key and the base point.
- An embodiment of the present application provides a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor; when the processor executes the program, it implements the steps of the above-mentioned sample alignment method in federated learning.
- An embodiment of the present application provides a computer-readable storage medium, which stores a computer program executable by a computer device; when the program is run on the computer device, the computer device executes the steps of the above-mentioned sample alignment method in federated learning.
- An embodiment of the present application provides a computer program product. The computer program product includes a computer program stored on a computer-readable storage medium, and the computer program includes program instructions which, when executed by a computer device, cause the computer device to execute the steps of the above-mentioned sample alignment method in federated learning.
- The two parties participating in the federated learning each generate a public-private key pair and send the generated public key to the other party. Each party then encrypts the sample identifiers in its own sample identifier set based on its own private key and the other party's public key to obtain an encrypted identifier set.
- the first participant obtains a sample alignment result of the first participant and the second participant based on the intersection of the first encrypted identifier set and the second encrypted identifier set.
- The second participant obtains a sample alignment result of the second participant and the first participant based on the intersection of the second encrypted identifier set and the first encrypted identifier set. The two parties participating in federated learning therefore hold keys of equal standing.
- Neither party can recover the other party's sample IDs by exhaustive enumeration, which addresses the problem of sample ID leakage and improves data security.
- FIG. 1 is a schematic diagram of a system architecture provided by an embodiment of the present application.
- FIG. 2 is a schematic flowchart of a sample alignment method in federated learning provided by an embodiment of the present application.
- FIG. 3 is a schematic flowchart of a sample alignment method in federated learning provided by an embodiment of the present application.
- FIG. 4 is a schematic structural diagram of a sample alignment device in federated learning provided by an embodiment of the present application.
- FIG. 5 is a schematic structural diagram of a sample alignment device in federated learning provided by an embodiment of the present application.
- FIG. 6 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
- Federated learning: a machine learning framework that can effectively help multiple organizations use data and build machine learning models while meeting the requirements of user privacy protection, data security and government regulations.
- RSA: a cryptosystem that uses different encryption and decryption keys, in which deriving the decryption key from the encryption key is computationally infeasible.
- SM2: an elliptic curve public-key cryptographic algorithm.
- Referring to FIG. 1, which is a system architecture diagram applicable to the embodiments of the present application, the system architecture includes at least a first device 101 and a second device 102.
- the first device 101 is configured to execute a sample alignment method in federated learning on a first participant.
- the first device 101 may be a smart phone, a tablet computer, a notebook computer, a desktop computer, a server, etc., but is not limited thereto.
- the second device 102 is configured to execute a sample alignment method in federated learning on a second participant.
- the second device 102 may be a smart phone, a tablet computer, a notebook computer, a desktop computer, a server, etc., but is not limited thereto.
- The first device 101 and the second device 102 may be connected directly in a wired or wireless manner, or the connection may be established through an intermediate server.
- The intermediate server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery network (CDN) services, and big data and artificial intelligence platforms.
- the first device 101 obtains the first private key and the first public key of the first participant, and sends the first public key to the second device 102 .
- the second device 102 obtains the second private key and the second public key of the second participant, and sends the second public key to the first device 101 .
- the first device 101 respectively encrypts at least one first sample ID of the first participant based on the first private key and the second public key, obtains a first encrypted ID set, and sends the first encrypted ID set to the second device 102.
- the second device 102 respectively encrypts at least one second sample ID of the second participant based on the second private key and the first public key, obtains a second encrypted ID set, and sends the second encrypted ID set to the first device 101 .
- the first device 101 obtains the sample alignment results of the first participant and the second participant based on the intersection of the first encrypted identifier set and the second encrypted identifier set.
- the second device 102 obtains a sample alignment result of the second participant and the first participant based on the intersection of the second encrypted identifier set and the first encrypted identifier set.
- The embodiment of the present application provides a flow of a sample alignment method in federated learning. As shown in FIG. 2, the flow is executed interactively by the first device 101 and the second device 102 shown in FIG. 1, and includes the following steps:
- Step S201 the first device acquires a first private key and a first public key of a first participant.
- the first public key is generated based on the first private key and a preset value.
- a random number is randomly generated based on the SM2 elliptic curve as the first private key, and the base point of the SM2 elliptic curve is used as a preset value.
- the first public key is generated by multiplying the first private key and the base point of the SM2 elliptic curve.
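The key generation just described (public key = private key multiplied by the base point) can be sketched as follows. This is only an illustrative toy, not the SM2 algorithm: it assumes the small textbook curve y^2 = x^3 + 2x + 2 over F_17 with base point (5, 1) of order 19 in place of the real SM2 parameters, and the names `ec_add`, `ec_mul` and `generate_keypair` are hypothetical.

```python
import secrets

# Toy curve y^2 = x^3 + 2x + 2 over F_17 (assumption: a stand-in for SM2, not secure).
P_MOD, A, B = 17, 2, 2
G = (5, 1)   # base point, playing the role of the "preset value"
N = 19       # order of the base point G


def ec_add(p1, p2):
    """Add two curve points; None represents the point at infinity."""
    if p1 is None or p2 is None:
        return p2 if p1 is None else p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None  # the points are inverses of each other
    if p1 == p2:
        s = (3 * x1 * x1 + A) * pow(2 * y1 % P_MOD, -1, P_MOD) % P_MOD
    else:
        s = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (s * s - x1 - x2) % P_MOD
    return (x3, (s * (x1 - x3) - y1) % P_MOD)


def ec_mul(k, point):
    """Scalar multiplication [k]point using double-and-add."""
    result = None
    while k > 0:
        if k & 1:
            result = ec_add(result, point)
        point = ec_add(point, point)
        k >>= 1
    return result


def generate_keypair():
    """Return (private key d, public key [d]G) with d drawn from [1, N-1]."""
    d = secrets.randbelow(N - 1) + 1
    return d, ec_mul(d, G)


private_key, public_key = generate_keypair()
print(private_key, public_key)
```

With real SM2 parameters only the constants would change; the relationship between the private key, the base point and the public key stays the same.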
- Alternatively, a random number is generated as the first private key in a discrete-logarithm setting, and the generator g of the group is used as the preset value.
- The first public key is obtained from the first private key and the generator g of the group of order p through modular exponentiation.
- preset values in this application are not limited to the base point of the SM2 elliptic curve and the generator of the discrete logarithm, and may also be other parameters.
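For the discrete-logarithm variant, a minimal sketch of the modular exponentiation is given below. The parameters are toy assumptions chosen for readability (2 generates the subgroup of prime order 11 modulo 23); a real deployment would use a standardized large group. The function name `generate_keypair` is hypothetical.

```python
import secrets

# Toy group (assumption, not secure): 2 has prime order 11 modulo 23.
P_MOD = 23   # prime modulus
Q = 11       # order of the generator
GEN = 2      # generator g, playing the role of the "preset value"


def generate_keypair():
    """Return (private key d, public key g^d mod p) with d drawn from [1, Q-1]."""
    d = secrets.randbelow(Q - 1) + 1
    return d, pow(GEN, d, P_MOD)


d_a, pub_a = generate_keypair()
d_b, pub_b = generate_keypair()
# Both parties can combine their own private key with the other's public key
# and arrive at the same group element.
print(pow(pub_b, d_a, P_MOD) == pow(pub_a, d_b, P_MOD))
```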
- Step S202 the first device sends the first public key to the second device.
- the second device receives and stores the first public key.
- Step S203 the second device acquires a second private key and a second public key of the second participant.
- the second public key is generated based on the second private key and a preset value.
- a random number is randomly generated based on the SM2 elliptic curve as the second private key, and the base point of the SM2 elliptic curve is used as a preset value.
- the second public key is generated by multiplying the second private key and the base point of the SM2 elliptic curve.
- Alternatively, a random number is generated as the second private key in a discrete-logarithm setting, and the generator g of the group is used as the preset value.
- The second public key is obtained from the second private key and the generator g of the group of order p through modular exponentiation.
- preset values in this application are not limited to the base point of the SM2 elliptic curve and the generator of the discrete logarithm, and may also be other parameters.
- Step S204 the second device sends the second public key to the first device.
- the first device receives and stores the second public key.
- Steps S201 and S203 may be performed in any order.
- Step S205 the first device encrypts at least one first sample ID of the first participant based on the first private key and the second public key respectively, and obtains a first encrypted ID set.
- The first encrypted identifier set includes the first encrypted identifier corresponding to each of the at least one first sample identifier.
- Step S206 the first device sends the first encrypted identification set to the second device.
- the second device receives and stores the first encrypted identity set.
- Step S207 the second device encrypts at least one second sample ID of the second participant based on the second private key and the first public key, respectively, to obtain a second encrypted ID set.
- The second encrypted identifier set includes the second encrypted identifier corresponding to each of the at least one second sample identifier.
- Step S208 the second device sends the second encrypted identity set to the first device.
- the first device receives and stores the second encrypted identity set.
- Steps S205 and S207 may be performed in any order.
- Step S209 the first device obtains the sample alignment results of the first participant and the second participant based on the intersection of the first encrypted identifier set and the second encrypted identifier set.
- Step S210 the second device obtains a sample alignment result of the second participant and the first participant based on the intersection of the second encrypted identifier set and the first encrypted identifier set.
- Steps S209 and S210 may be performed in any order.
- The two parties participating in the federated learning each generate a public-private key pair and send the generated public key to the other party. Each party then encrypts the sample identifiers in its own sample identifier set based on its own private key and the other party's public key to obtain an encrypted identifier set.
- the first participant obtains a sample alignment result of the first participant and the second participant based on the intersection of the first encrypted identifier set and the second encrypted identifier set.
- The second participant obtains a sample alignment result of the second participant and the first participant based on the intersection of the second encrypted identifier set and the first encrypted identifier set. The two parties participating in federated learning therefore hold keys of equal standing.
- Neither party can recover the other party's sample IDs by exhaustive enumeration, which addresses the problem of sample ID leakage and improves data security.
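The flow of steps S201 to S210 can be walked through end to end with the sketch below. Several things here are assumptions made purely for illustration: a toy curve (y^2 = x^3 + 2x + 2 over F_17, base point of order 19) replaces SM2, sample identifiers are small integers used directly as scalars, the private keys are fixed so the run is reproducible, and all function names are hypothetical. Both devices run in one process, so "sending" is just passing a value.

```python
# Toy curve y^2 = x^3 + 2x + 2 over F_17 with base point G of order 19.
# Assumption: a readable stand-in for SM2; it offers no security.
P_MOD, A = 17, 2
G, N = (5, 1), 19


def ec_add(p1, p2):
    """Add two curve points; None represents the point at infinity."""
    if p1 is None or p2 is None:
        return p2 if p1 is None else p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None
    if p1 == p2:
        s = (3 * x1 * x1 + A) * pow(2 * y1 % P_MOD, -1, P_MOD) % P_MOD
    else:
        s = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (s * s - x1 - x2) % P_MOD
    return (x3, (s * (x1 - x3) - y1) % P_MOD)


def ec_mul(k, point):
    """Scalar multiplication [k]point using double-and-add."""
    result = None
    while k > 0:
        if k & 1:
            result = ec_add(result, point)
        point = ec_add(point, point)
        k >>= 1
    return result


def encrypt_ids(sample_ids, own_private, peer_public):
    """Map each encrypted identifier [u * d_own]P_peer back to its sample ID u."""
    return {ec_mul(u * own_private % N, peer_public): u for u in sample_ids}


# S201-S204: each party derives its public key and sends it to the other party.
d_a, d_b = 3, 5
pub_a, pub_b = ec_mul(d_a, G), ec_mul(d_b, G)

# S205-S208: each party encrypts its own IDs and exchanges the encrypted sets.
table_a = encrypt_ids([1, 2, 3, 4], d_a, pub_b)  # first participant's IDs
table_b = encrypt_ids([1, 2, 3, 5], d_b, pub_a)  # second participant's IDs

# S209-S210: each party intersects the two sets and looks up its own sample IDs.
target = table_a.keys() & table_b.keys()
aligned_a = sorted(table_a[c] for c in target)
aligned_b = sorted(table_b[c] for c in target)
print(aligned_a, aligned_b)
```

Each party only ever sees the other party's encrypted identifiers; the lookup from ciphertext back to sample ID uses the party's own table.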
- In step S205, the embodiments of the present application provide at least the following implementations for obtaining the first encrypted identifier set:
- Embodiment 1: for each of the at least one first sample identifier, perform the following steps:
- For example, the first sample identifier set of the first participant A is $U_A = \{u_1, u_2, u_3, u_4\}$.
- The first private key $d_A$ is used to encrypt each first sample identifier in the first sample identifier set $U_A$, yielding the first private key encrypted identifiers $u_1 d_A$, $u_2 d_A$, $u_3 d_A$, $u_4 d_A$.
- The second public key $P_B$ is used to encrypt each first private key encrypted identifier, yielding the first encrypted identifiers $[u_1 d_A]P_B$, $[u_2 d_A]P_B$, $[u_3 d_A]P_B$, $[u_4 d_A]P_B$.
- Each first encrypted identifier can also be expressed as $[u_1 d_A d_B]G$, $[u_2 d_A d_B]G$, $[u_3 d_A d_B]G$, $[u_4 d_A d_B]G$.
- The correspondence between the first sample identifiers and the first encrypted identifiers of the first participant A is established, as shown in Table 1.
- The first private key and the first public key of the first participant are determined based on the SM2 elliptic curve. Since SM2 is an algorithm in China's commercial cryptography system, this satisfies the requirement that the cryptographic algorithm be independently controllable.
- The first sample identifier is encrypted by using the first private key and the second public key, which reduces encryption complexity while ensuring data security.
- Embodiment 2: for each of the at least one first sample identifier, perform the following steps:
- A first sample identifier is encrypted by using the first private key to obtain the first private key encrypted identifier. Then, the second public key is used to encrypt the first private key encrypted identifier to obtain the first public-private key encrypted identifier. Then, a hash algorithm is applied to the first public-private key encrypted identifier to obtain the first encrypted identifier corresponding to the first sample identifier.
- Applying a hash algorithm to the first public-private key encrypted identifier produces a first encrypted identifier of fixed length.
- Common hash algorithms include the MD5 algorithm, the SHA algorithms, and the SM3 algorithm in China's commercial cryptography system.
- the MD5 algorithm includes 16-bit MD5 algorithm, 32-bit MD5 algorithm, and 64-bit MD5 algorithm.
- SHA algorithms include SHA-1 algorithm, SHA-2 algorithm, and SHA-3 algorithm.
- For example, the first sample identifier set of the first participant A is $U_A = \{u_1, u_2, u_3, u_4\}$.
- The first private key $d_A$ is used to encrypt each first sample identifier in the first sample identifier set $U_A$, yielding the first private key encrypted identifiers $u_1 d_A$, $u_2 d_A$, $u_3 d_A$, $u_4 d_A$.
- Each first encrypted identifier can also be expressed as $H([u_1 d_A d_B]G)$, $H([u_2 d_A d_B]G)$, $H([u_3 d_A d_B]G)$, $H([u_4 d_A d_B]G)$.
- The correspondence between the first sample identifiers and the first encrypted identifiers of the first participant A is established, as shown in Table 2.
- The first private key and the first public key of the first participant are determined based on the SM2 elliptic curve. Since SM2 is an algorithm in China's commercial cryptography system, this satisfies the requirement that the cryptographic algorithm be independently controllable. Encrypting the first sample identifier with the first private key, the second public key and a hash algorithm can further enhance data security.
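The final hashing step of Embodiment 2 can be illustrated with a short sketch. It assumes the public-private key encrypted identifier is a curve point `(x, y)` serialized as two fixed-width big-endian integers, and it uses SHA-256 from Python's standard library as a stand-in for SM3 (which may or may not be available through `hashlib`, depending on the local build). The function name `hash_point` and the serialization are hypothetical choices, not taken from the source.

```python
import hashlib


def hash_point(point, coord_bytes=32):
    """Hash a curve point (x, y) to a fixed-length hexadecimal digest.

    SHA-256 is used here only as a stand-in for SM3 (assumption).
    """
    x, y = point
    encoded = x.to_bytes(coord_bytes, "big") + y.to_bytes(coord_bytes, "big")
    return hashlib.sha256(encoded).hexdigest()


small = hash_point((6, 3))
large = hash_point((2**200 + 7, 2**199 + 11))
# Both digests have the same length regardless of the size of the coordinates.
print(len(small), len(large))
```

Because equal points always hash to equal digests, the intersection computed on hashed identifiers matches the intersection computed on the points themselves, barring hash collisions.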
- In step S207, the embodiments of the present application provide at least the following implementations for obtaining the second encrypted identifier set:
- Embodiment 3: for each of the at least one second sample identifier, perform the following steps:
- A second sample identifier is encrypted with the second private key to obtain a second private key encrypted identifier. The first public key is then used to encrypt the second private key encrypted identifier to obtain the second encrypted identifier corresponding to the second sample identifier.
- The second private key $d_B$ is used to encrypt each second sample identifier in the second sample identifier set $U_B$, yielding the second private key encrypted identifiers $u_1 d_B$, $u_2 d_B$, $u_3 d_B$, $u_5 d_B$.
- Each second encrypted identifier can also be expressed as $[u_1 d_B d_A]G$, $[u_2 d_B d_A]G$, $[u_3 d_B d_A]G$, $[u_5 d_B d_A]G$.
- The correspondence between the second sample identifiers and the second encrypted identifiers of the second participant B is established, as shown in Table 3.
- The second private key and the second public key of the second participant are determined based on the SM2 elliptic curve. Since SM2 is an algorithm in China's commercial cryptography system, this satisfies the requirement that the cryptographic algorithm be independently controllable.
- The second sample identifier is encrypted by using the second private key and the first public key, which reduces encryption complexity while ensuring data security.
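The expressions in Embodiments 1 and 3 coincide for an identifier held by both participants. A short derivation makes this explicit, assuming (as in the key-generation step) that $P_A = [d_A]G$ and $P_B = [d_B]G$, and assuming that products of scalars are taken modulo the order of the base point $G$, which the text does not state explicitly:

```latex
\begin{aligned}
[u_i d_A]P_B &= [u_i d_A]\,[d_B]G = [u_i d_A d_B]G \\
             &= [u_i d_B d_A]G = [u_i d_B]\,[d_A]G = [u_i d_B]P_A
\end{aligned}
```

A shared identifier $u_i$ therefore produces the same encrypted identifier on both sides, while an identifier held by only one participant (such as $u_4$ or $u_5$ in the examples) has no counterpart in the other set.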
- Embodiment 4: for each of the at least one second sample identifier, perform the following steps:
- A second sample identifier is encrypted with the second private key to obtain a second private key encrypted identifier.
- The first public key is used to encrypt the second private key encrypted identifier to obtain the second public-private key encrypted identifier.
- A hash algorithm is applied to the second public-private key encrypted identifier to obtain the second encrypted identifier corresponding to the second sample identifier.
- Applying a hash algorithm to the second public-private key encrypted identifier produces a second encrypted identifier of fixed length.
- The second private key $d_B$ is used to encrypt each second sample identifier in the second sample identifier set $U_B$, yielding the second private key encrypted identifiers $u_1 d_B$, $u_2 d_B$, $u_3 d_B$, $u_5 d_B$.
- Each second encrypted identifier can also be expressed as $H([u_1 d_B d_A]G)$, $H([u_2 d_B d_A]G)$, $H([u_3 d_B d_A]G)$, $H([u_5 d_B d_A]G)$.
- The correspondence between the second sample identifiers and the second encrypted identifiers of the second participant B is established, as shown in Table 4.
- Table 4

| Second sample identifier | Second encrypted identifier |
| --- | --- |
| $u_1$ | $H([u_1 d_B d_A]G)$ |
| $u_2$ | $H([u_2 d_B d_A]G)$ |
| $u_3$ | $H([u_3 d_B d_A]G)$ |
| $u_5$ | $H([u_5 d_B d_A]G)$ |

- The second private key and the second public key of the second participant are determined based on the SM2 elliptic curve. Since SM2 is an algorithm in China's commercial cryptography system, this satisfies the requirement that the cryptographic algorithm be independently controllable. Encrypting the second sample identifier with the second private key, the first public key and a hash algorithm can further enhance data security.
- The first device takes the intersection of the first encrypted identifier set and the second encrypted identifier set as the target encrypted identifier set, and then takes the first sample identifier corresponding to each first encrypted identifier in the target encrypted identifier set as the sample alignment result of the first participant and the second participant.
- The first encrypted identifier set is $\{[u_1 d_A d_B]G, [u_2 d_A d_B]G, [u_3 d_A d_B]G, [u_4 d_A d_B]G\}$.
- The first device receives the second encrypted identifier set sent by the second device.
- The second encrypted identifier set is $\{[u_1 d_B d_A]G, [u_2 d_B d_A]G, [u_3 d_B d_A]G, [u_5 d_B d_A]G\}$.
- The first device calculates the intersection of the first encrypted identifier set and the second encrypted identifier set, and obtains the target encrypted identifier set $\{[u_1 d_A d_B]G, [u_2 d_A d_B]G, [u_3 d_A d_B]G\}$.
- The first device determines that the sample alignment result of the first participant A and the second participant B is $\{u_1, u_2, u_3\}$.
- The first encrypted identifier set is $\{H([u_1 d_A d_B]G), H([u_2 d_A d_B]G), H([u_3 d_A d_B]G), H([u_4 d_A d_B]G)\}$.
- The first device receives the second encrypted identifier set sent by the second device, wherein, when the second device obtains the second encrypted identifier set of the second participant B by using Embodiment 4 above, the second encrypted identifier set is $\{H([u_1 d_B d_A]G), H([u_2 d_B d_A]G), H([u_3 d_B d_A]G), H([u_5 d_B d_A]G)\}$.
- The first device calculates the intersection of the first encrypted identifier set and the second encrypted identifier set, and the obtained target encrypted identifier set is $\{H([u_1 d_A d_B]G), H([u_2 d_A d_B]G), H([u_3 d_A d_B]G)\}$.
- The first device determines that the sample alignment result of the first participant A and the second participant B is $\{u_1, u_2, u_3\}$.
- The first device determines the sample alignment result of the first participant and the second participant based on the intersection of the first encrypted identifier set and the second encrypted identifier set, instead of directly comparing the sample identifiers of the first participant and the second participant, thereby avoiding the leakage of sample identifiers and improving the security of each participant's sample data.
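The intersection and reverse lookup performed by the first device can be sketched independently of the cryptography. In this sketch the encrypted identifiers are opaque placeholder strings, the dictionary plays the role of the correspondence table between encrypted identifiers and sample identifiers, and the name `align_samples` is hypothetical.

```python
def align_samples(own_table, peer_encrypted_ids):
    """Return the sample IDs whose encrypted identifiers appear on both sides.

    own_table maps each of this party's encrypted identifiers to its sample ID.
    """
    target = own_table.keys() & set(peer_encrypted_ids)  # target encrypted identifier set
    return {own_table[c] for c in target}


# Placeholder strings stand in for the encrypted identifiers (assumption).
table_a = {"c1": "u1", "c2": "u2", "c3": "u3", "c4": "u4"}
received_from_b = ["c1", "c2", "c3", "c5"]
print(sorted(align_samples(table_a, received_from_b)))  # ['u1', 'u2', 'u3']
```

The first device never needs to decrypt the identifiers it receives; it only tests them for membership against its own table.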
- the second device uses the intersection of the second encrypted identity set and the first encrypted identity set as the target encrypted identity set. Then, the second sample identifier corresponding to each second encrypted identifier in the target encrypted identifier set is used as the sample alignment result of the second participant and the first participant.
- The second device receives the first encrypted identifier set sent by the first device, wherein, when the first device obtains the first encrypted identifier set of the first participant A by using Embodiment 1 above, the first encrypted identifier set is $\{[u_1 d_A d_B]G, [u_2 d_A d_B]G, [u_3 d_A d_B]G, [u_4 d_A d_B]G\}$.
- The second encrypted identifier set is $\{[u_1 d_B d_A]G, [u_2 d_B d_A]G, [u_3 d_B d_A]G, [u_5 d_B d_A]G\}$.
- The second device calculates the intersection of the second encrypted identifier set and the first encrypted identifier set, and obtains the target encrypted identifier set $\{[u_1 d_B d_A]G, [u_2 d_B d_A]G, [u_3 d_B d_A]G\}$.
- The second device determines that the sample alignment result of the second participant B and the first participant A is $\{u_1, u_2, u_3\}$.
- The second device receives the first encrypted identifier set sent by the first device, wherein, when the first device obtains the first encrypted identifier set of the first participant A by using Embodiment 2 above, the first encrypted identifier set is $\{H([u_1 d_A d_B]G), H([u_2 d_A d_B]G), H([u_3 d_A d_B]G), H([u_4 d_A d_B]G)\}$.
- The second encrypted identifier set is $\{H([u_1 d_B d_A]G), H([u_2 d_B d_A]G), H([u_3 d_B d_A]G), H([u_5 d_B d_A]G)\}$.
- The second device calculates the intersection of the second encrypted identifier set and the first encrypted identifier set, and obtains the target encrypted identifier set $\{H([u_1 d_B d_A]G), H([u_2 d_B d_A]G), H([u_3 d_B d_A]G)\}$.
- The second device determines that the sample alignment result of the second participant B and the first participant A is $\{u_1, u_2, u_3\}$.
- The second device determines the sample alignment result of the second participant and the first participant based on the intersection of the second encrypted identifier set and the first encrypted identifier set, instead of directly comparing the sample identifiers of the second participant and the first participant, thereby avoiding the leakage of sample identifiers and improving the security of each participant's sample data.
- Step S301 the first device acquires the first private key of the first participant based on the SM2 elliptic curve.
- Step S302 the first device acquires the first public key of the first participant based on the SM2 elliptic curve.
- Step S303 the first device sends the first public key to the second device.
- Step S304 the second device acquires the second private key of the second participant based on the SM2 elliptic curve.
- Step S305 the second device acquires the second public key of the second participant based on the SM2 elliptic curve.
- Step S306 the second device sends the second public key to the first device.
- Step S307 for each first sample identifier, the first device encrypts it with the first private key, and obtains the first private key encrypted identifier.
- Step S308 the first device uses the second public key to encrypt the first private key encrypted identifier to obtain the first public and private key encrypted identifier.
- Step S309 the first device encrypts the first public-private key encrypted identifier by using a hash algorithm to obtain the first encrypted identifier.
- a first encrypted identifier set is formed based on each obtained first encrypted identifier.
- Step S310 the first device sends the first encrypted identification set to the second device.
- Step S311 for each second sample ID, the second device uses the second private key to encrypt, and obtains the second private key encrypted ID.
- Step S312 the second device uses the first public key to encrypt the second private key encrypted identifier to obtain the second public and private key encrypted identifier.
- Step S313 the second device encrypts the second public-private key encrypted identifier by using a hash algorithm to obtain the second encrypted identifier.
- a second encrypted identifier set is formed.
- Step S314 the second device sends the second encrypted identity set to the first device.
- Step S315 the first device takes the intersection of the first encrypted identity set and the second encrypted identity set as the target encrypted identity set.
- step S316 the first device takes the first sample identifier corresponding to each first encrypted identifier in the target encrypted identifier set as the sample alignment result of the first participant and the second participant.
- Step S317 the second device takes the intersection of the second encrypted identity set and the first encrypted identity set as the target encrypted identity set.
- step S318 the second device takes the second sample ID corresponding to each second encrypted ID in the target encrypted ID set as the sample alignment result of the second participant and the first participant.
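Steps S301 to S318 can be sketched end to end in a few lines of Python. This is a minimal illustration, not the claimed implementation, and it rests on several assumptions: a textbook toy curve ($y^2 = x^3 + 2x + 2$ over $F_{17}$, base point $(5, 1)$ of order 19) stands in for SM2 and offers no security; sample identifiers are small integers used directly as scalars; "encrypting with the private key" is read as the scalar product $u \cdot d$; and SHA-256 stands in for the unspecified hash algorithm. The function names are hypothetical.

```python
import hashlib
import secrets

# Toy curve y^2 = x^3 + 2x + 2 over F_17 with base point G of prime order 19.
# It stands in for SM2 only so the example stays small; it offers no security.
P, A = 17, 2
G, N = (5, 1), 19

def add(p, q):
    """Add two curve points; None represents the point at infinity."""
    if p is None:
        return q
    if q is None:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if p == q:
        slope = (3 * x1 * x1 + A) * pow(2 * y1 % P, -1, P) % P
    else:
        slope = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (slope * slope - x1 - x2) % P
    return (x3, (slope * (x1 - x3) - y1) % P)

def mul(k, point):
    """Compute [k]point by double-and-add."""
    result = None
    while k:
        if k & 1:
            result = add(result, point)
        point = add(point, point)
        k >>= 1
    return result

def keygen():
    """S301-S306: private key d in [1, N-1] and public key [d]G."""
    d = secrets.randbelow(N - 1) + 1
    return d, mul(d, G)

def encrypt_ids(sample_ids, own_private, peer_public):
    """S307-S309 / S311-S313: map each encrypted identifier to its sample identifier."""
    table = {}
    for u in sample_ids:
        private_enc = u * own_private % N      # private-key encrypted identifier
        point = mul(private_enc, peer_public)  # public-private-key encrypted identifier
        digest = hashlib.sha256(repr(point).encode()).hexdigest()  # hashed identifier
        table[digest] = u
    return table

def align(own_table, peer_encrypted_set):
    """S315-S318: intersect the two sets and map back to local sample identifiers."""
    return {own_table[e] for e in own_table.keys() & peer_encrypted_set}

d_a, pub_a = keygen()
d_b, pub_b = keygen()
table_a = encrypt_ids({1, 2, 3, 4}, d_a, pub_b)  # first participant A
table_b = encrypt_ids({1, 2, 3, 5}, d_b, pub_a)  # second participant B
result_a = align(table_a, set(table_b))          # S310/S314 exchange only the digests
result_b = align(table_b, set(table_a))
```

Both `result_a` and `result_b` come out as the shared identifiers regardless of which random keys are drawn, because the base point has prime order and the two scalar products commute. A real deployment would use the SM2 curve parameters and a collision-resistant mapping from identifiers to scalars.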
- the two parties participating in the federated learning each generate a corresponding public key and private key and send the generated public key to the other party. Each party then encrypts the sample identifiers in its sample identifier set based on the private key it generated and the public key of the other party, to obtain an encrypted identifier set.
- the first participant obtains a sample alignment result of the first participant and the second participant based on the intersection of the first encrypted identifier set and the second encrypted identifier set.
- the second participant obtains a sample alignment result of the second participant and the first participant based on the intersection of the second encrypted identifier set and the first encrypted identifier set. Therefore, the keys held by the two parties participating in federated learning are equivalent.
- the SM2 elliptic curve algorithm is an encryption algorithm in China's commercial cryptography system, so using it satisfies the requirement that the cryptographic algorithm be independently controllable.
- the first sample identifier is encrypted with the first private key, the second public key, and the hash algorithm, and the second sample identifier is encrypted with the second private key, the first public key, and the hash algorithm, which can further enhance data security.
- the embodiment of this application provides a sample alignment device in federated learning, as shown in Figure 4, the device 400 includes:
- the first obtaining module 401 is configured to obtain the first private key and the first public key of the first participant, and send the first public key to the second device, wherein the first public key is generated based on the first private key and a preset value;
- the first receiving module 402 is configured to receive the second public key of the second participant sent by the second device, wherein the second public key is generated based on the second private key of the second participant and the preset value;
- the first encryption module 403 is configured to respectively encrypt at least one first sample identifier of the first participant based on the first private key and the second public key, and obtain a first encrypted identifier set;
- the first receiving module 402 is further configured to receive a second encrypted identifier set of the second participant sent by the second device, wherein the second encrypted identifier set is obtained by encrypting at least one second sample identifier of the second participant based on the second private key and the first public key;
- the first sample alignment module 404 is configured to obtain a sample alignment result of the first participant and the second participant based on the intersection of the first encrypted identifier set and the second encrypted identifier set.
- the first encryption module 403 is also used for:
- the set of first encrypted identifiers includes first encrypted identifiers corresponding to each of the at least one first sample identifier;
- the first encryption module 403 is specifically used for:
- the first encryption module 403 is specifically configured to:
- a hash algorithm is used to encrypt the first public-private key encrypted identifier to obtain a first encrypted identifier corresponding to the one first sample identifier.
- the first sample alignment module 404 is specifically configured to:
- the first sample identifier corresponding to each first encrypted identifier in the target encrypted identifier set is used as a sample alignment result of the first participant and the second participant.
- the first private key is generated based on the SM2 elliptic curve, and the preset value is the base point of the SM2 elliptic curve;
- the first obtaining module 401 is specifically used for:
- the first public key is generated by multiplying the first private key and the base point.
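Key generation by multiplying the private key with the base point can be sketched as below. This is an illustration under stated assumptions: the toy curve $y^2 = x^3 + 2x + 2$ over $F_{17}$ with base point $(5, 1)$ of order 19 replaces the SM2 parameters, and the names `generate_key_pair` and `on_curve` are hypothetical.

```python
import secrets

# Toy stand-in for SM2: y^2 = x^3 + 2x + 2 over F_17, base point G of order 19.
P, A, B = 17, 2, 2
G, N = (5, 1), 19

def add(p, q):
    """Add two curve points; None represents the point at infinity."""
    if p is None:
        return q
    if q is None:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if p == q:
        slope = (3 * x1 * x1 + A) * pow(2 * y1 % P, -1, P) % P
    else:
        slope = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (slope * slope - x1 - x2) % P
    return (x3, (slope * (x1 - x3) - y1) % P)

def mul(k, point):
    """Compute [k]point by double-and-add."""
    result = None
    while k:
        if k & 1:
            result = add(result, point)
        point = add(point, point)
        k >>= 1
    return result

def on_curve(point):
    """Check that a finite point satisfies the curve equation."""
    x, y = point
    return (y * y - (x * x * x + A * x + B)) % P == 0

def generate_key_pair():
    """Draw a private key d in [1, N-1] and derive the public key [d]G."""
    d = secrets.randbelow(N - 1) + 1
    return d, mul(d, G)

private_key, public_key = generate_key_pair()
```

Restricting the private key to the range 1 to N-1 keeps the public key away from the point at infinity; checking `on_curve` on a received public key is a common extra safeguard that the text here does not describe.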
- the embodiment of this application provides a sample alignment device in federated learning, as shown in Figure 5, the device 500 includes:
- the second obtaining module 501 is configured to obtain the second private key and the second public key of the second participant, and send the second public key to the first device, wherein the second public key is generated based on the second private key and a preset value;
- the second receiving module 502 is configured to receive the first public key of the first participant sent by the first device, wherein the first public key is generated based on the first private key of the first participant and the preset value;
- the second encryption module 503 is configured to respectively encrypt at least one second sample identifier of the second participant based on the second private key and the first public key, and obtain a second encrypted identifier set;
- the second receiving module 502 is further configured to receive the first encrypted identifier set of the first participant sent by the first device, wherein the first encrypted identifier set is obtained by encrypting at least one first sample identifier of the first participant based on the first private key and the second public key;
- the second sample alignment module 504 is configured to obtain a sample alignment result of the second participant and the first participant based on the intersection of the second encrypted identifier set and the first encrypted identifier set.
- the second encryption module 503 is also used for:
- the set of second encrypted identifiers includes second encrypted identifiers corresponding to the at least one second sample identifier
- the second encryption module 503 is specifically used for:
- the second encryption module 503 is specifically configured to:
- a hash algorithm is used to encrypt the second public-private key encrypted identifier to obtain a second encrypted identifier corresponding to the one second sample identifier.
- the second sample alignment module 504 is specifically configured to:
- the second sample identifier corresponding to each second encrypted identifier in the target encrypted identifier set is used as a sample alignment result of the second participant and the first participant.
- the second private key is generated based on the SM2 elliptic curve, and the preset value is the base point of the SM2 elliptic curve;
- the second acquiring module 501 is specifically used for:
- the second public key is generated by multiplying the second private key and the base point.
- the embodiment of this application provides a computer device, which can be a terminal or a server, as shown in Figure 6, including at least one processor 601, and a memory 602 connected to at least one processor.
- the specific connection medium between the processor 601 and the memory 602 is not limited in the embodiment of the application; in FIG. 6, the processor 601 and the memory 602 are connected by a bus as an example.
- the bus can be divided into address bus, data bus, control bus and so on.
- the memory 602 stores instructions executable by the at least one processor 601, and the at least one processor 601 executes the instructions stored in the memory 602 to perform the steps included in the above-mentioned sample alignment method in federated learning.
- the processor 601 is the control center of the computer device. It can use various interfaces and lines to connect the various parts of the computer device, and it performs sample alignment in federated learning by running or executing the instructions stored in the memory 602 and calling the data stored in the memory 602.
- the processor 601 may include one or more processing units, and the processor 601 may integrate an application processor and a modem processor.
- the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 601.
- the processor 601 and the memory 602 can be implemented on the same chip, and in some embodiments, they can also be implemented on independent chips.
- the processor 601 can be a general-purpose processor, such as a central processing unit (CPU), a digital signal processor, an application-specific integrated circuit (ASIC), a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of the present application.
- a general purpose processor may be a microprocessor or any conventional processor or the like. The steps of the methods disclosed in connection with the embodiments of the present application may be directly implemented by a hardware processor, or implemented by a combination of hardware and software modules in the processor.
- the memory 602 can be used to store non-volatile software programs, non-volatile computer-executable programs and modules.
- the memory 602 may include at least one type of storage medium, for example flash memory, a hard disk, a multimedia card, card memory, random access memory (RAM), static random access memory (SRAM), programmable read-only memory (PROM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), magnetic memory, a magnetic disk, an optical disc, and so on.
- the memory 602 may be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
- the memory 602 in the embodiment of the present application may also be a circuit or any other device capable of implementing a storage function, and is used for storing program instructions and/or data.
- an embodiment of the present application provides a computer-readable storage medium, which stores a computer program that can be executed by a computer device; when the program is run on the computer device, the computer device performs the steps of the above-mentioned sample alignment method in federated learning.
- an embodiment of the present application provides a computer program product, the computer program product includes a computer program stored on a computer-readable storage medium, the computer program includes program instructions, and when the program instructions are executed by a computer, the computer is made to execute the steps of the above-mentioned sample alignment method in federated learning.
- the embodiments of the present application may be provided as methods, systems, or computer program products. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
- These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means, where the instruction means implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Software Systems (AREA)
- General Health & Medical Sciences (AREA)
- Computer Hardware Design (AREA)
- Bioethics (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Storage Device Security (AREA)
Abstract
The embodiments of the present application relate to the technical field of artificial intelligence. Provided are a sample alignment method and apparatus in federated learning, a device, and a storage medium. The method comprises the following steps: each party participating in federated learning generates a corresponding public key and private key, sends the generated public key to the peer party, and then encrypts the sample identifiers in its sample identifier set on the basis of the private key it generated and the public key of the peer party, so as to obtain an encrypted identifier set; a first participant obtains a sample alignment result of the first participant and a second participant on the basis of an intersection of a first encrypted identifier set and a second encrypted identifier set; and the second participant obtains a sample alignment result of the second participant and the first participant on the basis of an intersection of the second encrypted identifier set and the first encrypted identifier set. Therefore, the keys held by the two parties participating in federated learning are equivalent. Neither party can obtain the sample identifiers of the peer party by exhaustive search, which solves the problem of sample identifier leakage and improves data security.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202111140469.1 | 2021-09-28 | | |
| CN202111140469.1A CN113836559A (zh) | 2021-09-28 | 2021-09-28 | Sample alignment method, apparatus, device, and storage medium in federated learning |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2023050687A1 (fr) | 2023-04-06 |
Family
ID=78970743
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2022/076928 Ceased WO2023050687A1 (fr) | 2021-09-28 | 2022-02-18 | Sample alignment method and apparatus in federated learning, device, and storage medium |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN113836559A (fr) |
| WO (1) | WO2023050687A1 (fr) |
- 2021-09-28: CN application CN202111140469.1A, published as CN113836559A (zh), status: active, Pending
- 2022-02-18: WO application PCT/CN2022/076928, published as WO2023050687A1 (fr), status: not active, Ceased
Also Published As
| Publication number | Publication date |
|---|---|
| CN113836559A (zh) | 2021-12-24 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22874090; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 22874090; Country of ref document: EP; Kind code of ref document: A1 |