WO2019024772A1 - Data encryption and machine learning model training methods, apparatuses, and electronic devices - Google Patents

Data encryption and machine learning model training methods, apparatuses, and electronic devices

Info

Publication number
WO2019024772A1
Authority
WO
WIPO (PCT)
Prior art keywords
matrix
encryption
data
target
machine learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2018/097339
Other languages
English (en)
French (fr)
Inventor
杨新星
曹绍升
周俊
李小龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CA3058498A (CA3058498A1)
Priority to EP18840540.1A (EP3627759B1)
Priority to AU2018310377A (AU2018310377A1)
Priority to SG11201909193Q (SG11201909193QA)
Publication of WO2019024772A1 (zh)
Priority to US16/587,977 (US11257007B2)
Priority to AU2021218153A (AU2021218153A1)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/04 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L63/0428 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/06 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols the encryption apparatus using shift registers or memories for block-wise or stream coding, e.g. DES systems or RC4; Hash functions; Pseudorandom sequence generators
    • H04L9/0618 Block ciphers, i.e. encrypting groups of characters of a plain text message using fixed encryption transformation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/40 Network security protocols

Definitions

  • The present specification relates to the field of computer applications, and in particular to data encryption and machine learning model training methods, apparatuses, and electronic devices.
  • This specification proposes a data encryption method, the method comprising:
  • the PCA algorithm performs encryption calculation on the target matrix to obtain an encrypted N*K-dimensional encryption matrix, and further includes:
  • it also includes:
  • the projection matrix is stored locally as an encryption matrix.
  • the PCA algorithm performs encryption calculation on the target matrix to obtain an encrypted N*K-dimensional encryption matrix, including:
  • the target matrix is subjected to encryption calculation based on a PCA algorithm to obtain the N*K-dimensional encryption matrix.
  • it also includes:
  • the target matrix is re-encrypted based on the PCA algorithm, and the locally stored projection matrix is updated based on the recalculated projection matrix.
  • the present specification also proposes a data encryption device, the device comprising:
  • a generation module that generates an N*M-dimensional target matrix based on N data samples and the data features of M dimensions respectively corresponding to the N data samples;
  • a calculation module that performs an encryption calculation on the target matrix based on a PCA algorithm to obtain an N*K-dimensional encryption matrix, wherein the K value is smaller than the M value;
  • a transmission module that transmits the encryption matrix to a modeling server, wherein the encryption matrix is used to train a machine learning model.
  • the computing module is:
  • it also includes:
  • the storage module stores the projection matrix locally as an encryption matrix.
  • the calculating module further:
  • the target matrix is subjected to encryption calculation based on a PCA algorithm to obtain the N*K-dimensional encryption matrix.
  • it also includes:
  • the target matrix is re-encrypted based on the PCA algorithm, and the locally stored projection matrix is updated based on the recalculated projection matrix.
  • the present specification also proposes a machine learning model training method, the method comprising
  • the encryption matrix is an N*K-dimensional encryption matrix obtained by the data provider server by performing an encryption calculation on an N*M-dimensional target matrix based on the PCA algorithm;
  • the K value is less than the M value;
  • a machine learning model is trained using the encryption matrix as a training sample.
  • the training the machine learning model by using the encryption matrix as a training sample comprises:
  • the encryption matrix is used as a training sample, fused with local training samples, and the machine learning model is trained based on the fused training samples.
  • the present specification also proposes a machine learning model training device, the device comprising
  • a receiving module that receives the encryption matrix transmitted by the data provider server, wherein the encryption matrix is an N*K-dimensional encryption matrix obtained by the data provider server by performing an encryption calculation on an N*M-dimensional target matrix based on the PCA algorithm, and the K value is less than the M value;
  • a training module that trains the machine learning model using the encryption matrix as a training sample.
  • the training module :
  • the encryption matrix is used as a training sample, fused with local training samples, and the machine learning model is trained based on the fused training samples.
  • the present specification also proposes a machine learning model training system, the system comprising:
  • a data provider server that generates an N*M-dimensional target matrix based on N data samples and the data features of M dimensions respectively corresponding to the N data samples, performs an encryption calculation on the target matrix based on the PCA algorithm to obtain an N*K-dimensional encryption matrix, wherein the K value is smaller than the M value, and transmits the encryption matrix to a modeling server;
  • a modeling server that trains the machine learning model based on the encryption matrix.
  • modeling server further:
  • the encryption matrix is used as a training sample, fused with local training samples, and the machine learning model is trained based on the fused training samples.
  • the present specification also proposes an electronic device comprising:
  • a memory for storing machine executable instructions
  • a processor, wherein by reading and executing the machine-executable instructions stored in the memory that correspond to the data encryption control logic, the processor is caused to:
  • the present specification also proposes an electronic device comprising:
  • a memory for storing machine executable instructions
  • a processor, wherein by reading and executing the machine-executable instructions stored in the memory that correspond to the control logic of machine learning model training, the processor is caused to:
  • the encryption matrix is an N*K-dimensional encryption matrix obtained by the data provider server by performing an encryption calculation on an N*M-dimensional target matrix based on the PCA algorithm;
  • the K value is less than the M value;
  • a machine learning model is trained using the encryption matrix as a training sample.
  • In the present specification, an N*M-dimensional target matrix is generated based on N data samples and the data features of M dimensions respectively corresponding to the N data samples; an encryption calculation is performed on the target matrix based on the PCA algorithm to obtain an N*K-dimensional encryption matrix; and the encryption matrix is then transmitted to the modeling server, which uses it as a training sample to train the machine learning model.
  • The modeling server usually cannot restore the original target matrix from the encryption matrix, so the user's private data is protected as far as possible, and privacy leakage is avoided while data samples are submitted to the modeling server for model training.
  • The encryption matrix has fewer dimensions than the original target matrix, so the transmission overhead of sending the data samples to the modeling server is reduced.
  • Because the PCA algorithm is used to encrypt the target matrix, the information in the original data samples is retained as far as possible, so the accuracy of model training can still be maintained when the encryption matrix is transmitted to the modeling server for model training.
  • FIG. 1 is a flow chart showing a data encryption method according to an embodiment of the present specification
  • FIG. 2 is a schematic diagram of a target matrix of an N*M dimension according to an embodiment of the present specification
  • FIG. 3 is a flowchart of performing an encryption calculation on the target matrix based on a PCA algorithm according to an embodiment of the present specification
  • FIG. 4 is a schematic diagram of a joint modeling of a multi-party data sample shown in an embodiment of the present specification
  • FIG. 5 is a flowchart of a machine learning model training method according to an embodiment of the present specification
  • FIG. 6 is a hardware structural diagram of an electronic device carrying the data encryption device according to an embodiment of the present disclosure
  • FIG. 7 is a logic block diagram of the data encryption apparatus according to an embodiment of the present disclosure.
  • FIG. 8 is a hardware structural diagram of an electronic device carrying the machine learning model training device according to an embodiment of the present disclosure
  • FIG. 9 is a logic block diagram of the machine learning model training apparatus provided by an embodiment of the present specification.
  • In the present specification, data features of M dimensions may be extracted from the N data samples required for modeling, and an N*M-dimensional target matrix is generated based on the N data samples and the data features of M dimensions respectively corresponding to them.
  • After the N*M-dimensional target matrix is generated, an encryption calculation may be performed on it based on the PCA algorithm to obtain an N*K-dimensional encryption matrix, which is transmitted to the modeling server as a training sample, where the K value is less than the M value.
  • After receiving the encryption matrix, the modeling server may use it as a training sample to train the machine learning model; for example, the encryption matrix may be fused with the server's local training samples, and the machine learning model is then trained on the fused training samples.
  • The modeling server usually cannot restore the original target matrix from the encryption matrix, so the user's private data is protected as far as possible, and privacy leakage is avoided while data samples are submitted to the modeling server for model training.
  • The encryption matrix has fewer dimensions than the original target matrix, so the transmission overhead of sending the data samples to the modeling server is reduced.
  • Because the PCA algorithm is used to encrypt the target matrix, the information in the original data samples is retained as far as possible, so the accuracy of model training can still be maintained when the encryption matrix is transmitted to the modeling server for model training.
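  • The two claims above (information is largely retained, yet the original matrix is not exactly recoverable) can be illustrated numerically. The following is a minimal sketch on synthetic data, assuming a sample-per-row N x M layout; the data, the choice K = 2, and all variable names are illustrative assumptions and do not come from the specification. It shows only that an exact linear inverse does not exist when K is less than M, not a formal security guarantee.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic N x M data (N = 200, M = 5) whose columns have very different spreads.
X = rng.normal(size=(200, 5)) * np.array([5.0, 3.0, 1.0, 0.5, 0.1])

Xc = X - X.mean(axis=0)                  # zero-mean each feature column
C = Xc.T @ Xc / (Xc.shape[0] - 1)        # M x M covariance matrix
eigenvalues, eigenvectors = np.linalg.eigh(C)
order = np.argsort(eigenvalues)[::-1]    # eigenvalue indices, largest first

K = 2
W = eigenvectors[:, order[:K]]           # M x K projection matrix
E = Xc @ W                               # N x K encryption matrix

# Share of the total variance kept by the K retained directions.
retained = eigenvalues[order[:K]].sum() / eigenvalues.sum()

# Linear reconstruction by someone who even knows W, and its relative error.
approx = E @ W.T
relative_error = np.linalg.norm(Xc - approx) / np.linalg.norm(Xc)

print(f"variance retained: {retained:.3f}, relative error: {relative_error:.3f}")
```

  • In this sketch the squared relative error equals one minus the retained variance share, so choosing a smaller K trades retained information for a less recoverable matrix.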
  • FIG. 1 is a data encryption method according to an embodiment of the present disclosure.
  • the data encryption method is applied to a data provider server, and the following steps are performed:
  • Step 102: Generate an N*M-dimensional target matrix based on N data samples and the data features of M dimensions respectively corresponding to the N data samples;
  • Step 104: Perform a dimensionality reduction calculation on the target matrix based on a PCA algorithm to obtain an N*K-dimensional encryption matrix, where the K value is smaller than the M value;
  • Step 106: Transmit the encryption matrix to a modeling server, where the encryption matrix is used to train a machine learning model.
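  • The three steps can be sketched end to end as follows. This is an illustrative sketch rather than the implementation described in the specification: `pca_encrypt` and `transmit` are hypothetical names, `transmit` merely serializes the matrix instead of performing a network call, and `numpy.linalg.eigh` is used as a convenient eigen-solver.

```python
import numpy as np

def pca_encrypt(X, k):
    """Return the N x K encryption matrix and the M x K projection matrix."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                        # zero-mean each feature column
    C = Xc.T @ Xc / (X.shape[0] - 1)               # M x M covariance matrix
    eigenvalues, eigenvectors = np.linalg.eigh(C)
    top = np.argsort(eigenvalues)[::-1][:k]        # K largest eigenvalues
    W = eigenvectors[:, top]
    return Xc @ W, W

def transmit(matrix):
    """Hypothetical stand-in for the network call: serialize the matrix to bytes."""
    return matrix.tobytes()

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 8))      # Step 102: N = 100 samples, M = 8 features
E, W = pca_encrypt(X, k=3)         # Step 104: N x K encryption matrix, K = 3
payload = transmit(E)              # Step 106: send to the modeling server

print(len(payload), X.nbytes)      # 2400 6400
```

  • With 64-bit floats the payload shrinks in proportion to K/M, which is the transmission saving the specification refers to.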
  • the data provider server may be connected to the modeling server to provide data samples required for modeling to the modeling server.
  • The data provider and the modeling party may correspond to different operators, and the data provider may transmit the collected user data to the modeling party as data samples to complete data modeling.
  • For example, the modeling party may be Alipay's data operation platform, and the data provider may be a service platform that provides Internet services to users and interfaces with Alipay's data operation platform, such as a third-party bank or an express delivery company.
  • The data provider server may collect, in the background, the user data generated by users, select N pieces of that user data as data samples, and generate an initialized data sample set from them.
  • For example, N pieces of sensitive data involving user privacy may be selected from the collected user data, and the initialized data sample set is then generated from this sensitive data.
  • The number N of data samples collected is not specifically limited in the present specification, and those skilled in the art can set it based on actual needs.
  • The specific form of the user data depends on the business scenario and the modeling requirements, and is not particularly limited in the present specification; for example, in practical applications, if the goal is to build a scorecard model for assessing the risk of user-initiated payment transactions, then in this business scenario the user data may be transaction data generated by users through a payment client.
  • the data provider server may further preprocess the data samples in the data sample set.
  • the preprocessing of the data samples in the data sample set generally includes performing data cleaning, supplementing default values, normalization processing, or other forms of preprocessing on the data samples in the data sample set.
  • the collected data samples can be converted into standardized data samples suitable for model training.
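  • As a rough illustration of the preprocessing mentioned above, the sketch below fills missing values and normalizes each feature. The specification does not fix the concrete methods, so filling with the column mean and min-max scaling are assumptions, as is the function name `preprocess`.

```python
import numpy as np

def preprocess(raw):
    """Fill missing values with the column mean, then min-max normalize each column."""
    data = np.array(raw, dtype=float)
    column_mean = np.nanmean(data, axis=0)             # mean ignoring missing entries
    filled = np.where(np.isnan(data), column_mean, data)
    low = filled.min(axis=0)
    span = filled.max(axis=0) - low
    span[span == 0] = 1.0                              # avoid dividing by zero on constant columns
    return (filled - low) / span

raw = [[1.0, 10.0],
       [float("nan"), 30.0],
       [3.0, 20.0]]
clean = preprocess(raw)
print(clean)
```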
  • the data provider server may extract data features of the M dimensions from the data samples in the data sample set.
  • the number of data features of the above M dimensions extracted is not particularly limited in the present specification, and those skilled in the art can select based on actual modeling requirements.
  • the specific type of the extracted data feature is not particularly limited in the present specification, and those skilled in the art can manually select from the information actually included in the data sample based on actual modeling requirements;
  • For example, the modeling party may pre-select the data features of M dimensions based on actual modeling requirements and provide the selected data features to the data provider, and the data provider then extracts from the data samples the data feature values corresponding to the data features of each dimension.
  • After the data provider has extracted the data features of M dimensions from the data samples in the data sample set, a data feature vector may be generated for each data sample from the data feature values corresponding to the M extracted dimensions, and an N*M-dimensional target matrix is then constructed from the data feature vectors of the data samples.
  • the M-dimensional data features may correspond to the rows of the target matrix, or may correspond to the columns of the target matrix, and are not particularly limited in the present specification.
  • Taking the case where the M-dimensional data features correspond to the rows of the target matrix as an example, the target matrix may be expressed in the form shown in FIG. 2.
  • each column represents a data sample
  • each row represents a feature vector consisting of data features of M dimensions.
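  • Constructing the target matrix from per-sample feature vectors can be sketched as below. The feature names and sample records are hypothetical, and the sketch assumes a sample-per-row layout (N rows, M columns), which matches the N*M by M*K multiplication used later; for the layout of FIG. 2 the matrix would be transposed.

```python
import numpy as np

# Hypothetical feature names and sample records, for illustration only.
FEATURES = ["amount", "hour", "merchant_risk"]

def build_target_matrix(samples, features=FEATURES):
    """Stack one feature vector per data sample into an N x M matrix."""
    rows = [[float(sample[name]) for name in features] for sample in samples]
    return np.array(rows, dtype=float)

samples = [
    {"amount": 12.5, "hour": 9, "merchant_risk": 0.1},
    {"amount": 230.0, "hour": 23, "merchant_risk": 0.7},
    {"amount": 48.0, "hour": 14, "merchant_risk": 0.3},
    {"amount": 5.0, "hour": 11, "merchant_risk": 0.2},
]
X = build_target_matrix(samples)
print(X.shape)  # (4, 3): N = 4 samples, M = 3 feature dimensions
```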
  • After the N*M-dimensional target matrix is generated, an encryption calculation may be performed on the target matrix based on the PCA algorithm to obtain an N*K-dimensional encryption matrix.
  • Because the original target matrix usually cannot be restored from the matrix obtained by this calculation, the resulting matrix serves as an encrypted form of the data. In this way, the user's private data is protected as far as possible.
  • FIG. 3 is a flow chart of performing encryption calculation on the target matrix based on the PCA algorithm, and includes the following steps:
  • Step 302: Perform zero-mean processing on the values in the vectors corresponding to the data features of the M dimensions in the target matrix.
  • Zero-mean processing refers to subtracting the mean of a set of values from each value in that set.
  • Here, it means that for each vector corresponding to one of the M data feature dimensions, the average of all values in that vector is subtracted from each value in the vector.
  • In implementation, the vectors corresponding to the data features of the M dimensions may be selected in turn as the target vector; the average of the values in the target vector is calculated, and that average is then subtracted from each value in the target vector.
  • For example, for the target matrix shown in FIG. 2, the average of each row may be calculated, and the row's average is then subtracted from each value in that row.
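  • Step 302 can be sketched in a few lines. The sketch assumes a sample-per-row layout, so the vector for each feature dimension is a column and the mean is taken along axis 0; for the FIG. 2 layout, where each feature occupies a row, the axis would be swapped.

```python
import numpy as np

def zero_mean(X):
    """Subtract each feature column's mean from the values in that column."""
    X = np.asarray(X, dtype=float)
    return X - X.mean(axis=0)

X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 60.0]])
Xc = zero_mean(X)
print(Xc)
```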
  • Step 304: Calculate the covariance matrix corresponding to the target matrix after the zero-mean processing.
  • The covariance matrix is the matrix composed of the covariances between the vectors in the target matrix.
  • In implementation, for each vector corresponding to one of the M data feature dimensions, the covariance between that vector and the vectors corresponding to the other dimensions in the target matrix may be calculated, and a covariance matrix is then generated from the calculated covariances.
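  • Step 304 can be sketched as a single matrix product on the zero-mean data. The sketch assumes a sample-per-row layout and the unbiased N - 1 normalization; the specification does not state the normalization, and the choice does not change the eigenvectors.

```python
import numpy as np

def covariance_matrix(X):
    """M x M matrix of covariances between the feature columns of an N x M matrix."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # zero-mean processing (Step 302)
    return Xc.T @ Xc / (X.shape[0] - 1)

X = np.array([[1.0, 2.0, 0.0],
              [2.0, 4.0, 1.0],
              [3.0, 6.0, 0.0],
              [4.0, 8.0, 1.0]])
C = covariance_matrix(X)
print(C.shape)  # (3, 3)
```

  • The result agrees with `numpy.cov(X, rowvar=False)`, which can serve as a cross-check.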
  • Step 306: Calculate the eigenvalues of the covariance matrix and the eigenvectors corresponding to the eigenvalues.
  • After the covariance matrix is calculated, its eigenvalues and the eigenvector corresponding to each eigenvalue may be further calculated.
  • The number of eigenvalues of the covariance matrix depends on its order; for example, the covariance matrix of M feature vectors is an M*M matrix and has M eigenvalues, counted with multiplicity.
  • To calculate the eigenvalues, the characteristic polynomial of the covariance matrix may first be obtained and all of its roots found; each root is an eigenvalue.
  • Once the eigenvalues are found, each may be substituted into the system of linear equations corresponding to the covariance matrix to obtain the eigenvector corresponding to that eigenvalue.
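  • Step 306 can be illustrated on a small example. The sketch below first follows the route in the text (roots of the characteristic polynomial) and then uses `numpy.linalg.eigh`, a symmetric eigen-solver that is a substitute technique commonly preferred numerically; the 2 x 2 matrix is an arbitrary illustrative covariance matrix.

```python
import numpy as np

C = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Route described in the text: roots of the characteristic polynomial.
coefficients = np.poly(C)                       # coefficients of lambda^2 - 4*lambda + 3
roots = np.sort(np.roots(coefficients).real)

# Substitute route: symmetric eigen-solver (ascending eigenvalues, eigenvectors in columns).
eigenvalues, eigenvectors = np.linalg.eigh(C)

# Each pair satisfies C v = lambda v.
for value, vector in zip(eigenvalues, eigenvectors.T):
    assert np.allclose(C @ vector, value * vector)

print(roots, eigenvalues)
```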
  • Step 308: Sort the calculated eigenvectors by the size of their corresponding eigenvalues, and extract the K eigenvectors with the largest eigenvalues to generate an M*K-dimensional projection matrix.
  • After the eigenvalues and eigenvectors are calculated, the eigenvectors may be sorted by their corresponding eigenvalues, for example in descending order. The K eigenvectors with the largest eigenvalues are then extracted to generate the M*K-dimensional projection matrix.
  • The value of K is smaller than the value of M.
  • In practical applications, the value of K may be specified manually by a person skilled in the art according to actual requirements.
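  • Step 308 amounts to sorting eigenvalue indices and slicing columns. The following sketch uses a hypothetical helper `projection_matrix` and a diagonal covariance matrix chosen so that the selected eigenvectors are easy to verify by eye.

```python
import numpy as np

def projection_matrix(C, k):
    """Return the M x K matrix whose columns are the K eigenvectors with the largest eigenvalues."""
    eigenvalues, eigenvectors = np.linalg.eigh(C)
    order = np.argsort(eigenvalues)[::-1]       # descending by eigenvalue
    return eigenvectors[:, order[:k]]

# Illustrative covariance matrix with eigenvalues 1, 5, and 3 along the coordinate axes.
C = np.diag([1.0, 5.0, 3.0])
W = projection_matrix(C, k=2)
print(np.abs(W))   # columns pick out the second axis (eigenvalue 5), then the third (eigenvalue 3)
```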
  • Step 310: Multiply the target matrix by the projection matrix to obtain the N*K-dimensional encryption matrix.
  • The M*K-dimensional projection matrix is the matrix ultimately used to encrypt the original target matrix.
  • Encrypting the original N*M target matrix with the M*K-dimensional projection matrix maps the original high-dimensional target matrix into the low-dimensional space of the projection matrix.
  • In implementation, this mapping may be performed by multiplying the original N*M target matrix by the M*K-dimensional projection matrix.
  • The multiplication may be a right multiplication or a left multiplication.
  • For example, the original N*M target matrix may be right-multiplied by the M*K-dimensional projection matrix to map it into the projection matrix space. Alternatively, in implementation, the two matrices may be left-multiplied and the result transposed, which also maps the original N*M target matrix into the M*K-dimensional projection matrix space.
  • After the original N*M target matrix is mapped into the M*K-dimensional projection matrix space, an N*K-dimensional encryption matrix is obtained.
  • This encryption matrix is the data sample set encrypted by the M*K-dimensional projection matrix.
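  • The two multiplication orders can be checked numerically. The sketch assumes that "left multiplication followed by transposition" means computing the product of the transposed projection matrix and the transposed target matrix and then transposing the result; that reading is an interpretation, and the random matrices are stand-ins for a real target matrix and projection matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))      # stand-in N x M target matrix (N = 6, M = 4)
W = rng.normal(size=(4, 2))      # stand-in M x K projection matrix (K = 2)

E_right = X @ W                  # right multiplication: (N x M)(M x K) = N x K
E_left = (W.T @ X.T).T           # left multiplication, then transpose: ((K x M)(M x N))^T = N x K

print(E_right.shape, np.allclose(E_right, E_left))   # (6, 2) True
```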
  • After calculating the M*K-dimensional projection matrix through the process shown above, the data provider server may also store the projection matrix locally as an encryption matrix.
  • When the data provider server subsequently collects the latest N data samples and generates an N*M-dimensional matrix based on those samples and the data features of M dimensions respectively corresponding to them, it may determine whether the projection matrix is stored locally.
  • If the projection matrix is stored locally, the N*M matrix may be encrypted directly with the stored projection matrix; the specific encryption process is not described again.
  • If the projection matrix is not stored locally, the encryption calculation may be performed on the N*M matrix again using the PCA-based dimensionality reduction process described above, which generates the projection matrix.
  • In addition, when the data features required for modeling change, the data provider may recalculate the projection matrix using the PCA-based encryption calculation process described above, and use the recalculated projection matrix to update the locally stored one.
  • In this way, a locally stored encryption matrix that is no longer valid is updated promptly when the data features required for modeling are updated, which avoids the loss of data information, and the resulting impact on modeling accuracy, that would come from encrypting the original target matrix with an invalid encryption matrix.
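  • The store, reuse, and update logic above can be sketched with a small helper class. `ProjectionCache` and the `features_changed` flag are hypothetical; how a change in the data features is detected is left to the caller, and centering each batch on its own column means is an assumption of this sketch.

```python
import numpy as np

class ProjectionCache:
    """Hypothetical helper that keeps the M x K projection matrix between batches."""

    def __init__(self, k):
        self.k = k
        self.projection = None                     # locally stored projection matrix

    def _recompute(self, Xc):
        C = Xc.T @ Xc / (Xc.shape[0] - 1)
        eigenvalues, eigenvectors = np.linalg.eigh(C)
        top = np.argsort(eigenvalues)[::-1][:self.k]
        self.projection = eigenvectors[:, top]

    def encrypt(self, X, features_changed=False):
        X = np.asarray(X, dtype=float)
        Xc = X - X.mean(axis=0)
        # Recompute only when nothing is stored yet or the features were updated.
        if self.projection is None or features_changed:
            self._recompute(Xc)
        return Xc @ self.projection

rng = np.random.default_rng(7)
cache = ProjectionCache(k=2)
E1 = cache.encrypt(rng.normal(size=(20, 4)))       # first batch: projection is computed
stored = cache.projection.copy()
E2 = cache.encrypt(rng.normal(size=(20, 4)))       # second batch: stored projection is reused
reused = np.array_equal(stored, cache.projection)
E3 = cache.encrypt(rng.normal(size=(20, 5)), features_changed=True)   # features updated
```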
  • After the data provider server has performed the encryption calculation on the target matrix based on the PCA algorithm and obtained the N*K-dimensional encryption matrix, it may transmit the encryption matrix as a training sample to the modeling server that interfaces with the data provider.
  • After receiving the encryption matrix transmitted by the data provider server, the modeling server may use it as a training sample to train the machine learning model.
  • In implementation, the modeling server may fuse the encryption matrix with its locally stored training samples and then jointly train the machine learning model on the fused training samples.
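  • One possible reading of the fusion step is sketched below. The specification does not say how samples are fused, so this sketch assumes that the encrypted features and the local features describe the same N users and are concatenated column-wise; the linear least-squares fit is a stand-in for whatever supervised algorithm is used, and all data is synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)
N, K, L = 50, 2, 3

encrypted = rng.normal(size=(N, K))     # stand-in for the received N x K encryption matrix
local = rng.normal(size=(N, L))         # hypothetical local features for the same N users
fused = np.hstack([encrypted, local])   # N x (K + L) fused training samples

# Synthetic labels generated from known weights, so the fit can be checked.
true_weights = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
labels = fused @ true_weights

design = np.hstack([fused, np.ones((N, 1))])                  # add an intercept column
weights, *_ = np.linalg.lstsq(design, labels, rcond=None)     # least-squares regression fit
print(np.round(weights, 3))
```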
  • FIG. 4 is a schematic diagram of a joint modeling of a multi-party data sample shown in the present specification.
  • the above-mentioned modeling party may be Alipay's data operation platform
  • the above data provider may include a service platform for providing Internet services to users such as banks and third-party financial institutions that interface with Alipay's data operation platform.
  • If the data providers provide local user transaction data directly to Alipay's data operation platform for data modeling, user privacy may be leaked during data transmission.
  • In the present specification, each data provider may use the PCA algorithm and the projection matrix described above to perform an encryption calculation on the N*M-dimensional target matrix generated from its original transaction data samples, obtain an N*K-dimensional encryption matrix, and then transmit that matrix as a training sample to Alipay's data operation platform.
  • Alipay's data operation platform may fuse the training samples received from each data provider with its local data samples and then train the machine learning model on the fused training samples; for example, the user transaction data provided by banks and third-party financial institutions may be fused with the local user transaction data in Alipay's data operation platform to jointly train a scorecard model for assessing the risk of users' transactions.
  • The specific type of the machine learning model is not particularly limited in the present specification. For example, in practical applications, the machine learning model may be a supervised prediction model built on a supervised machine learning algorithm (such as a regression algorithm), for example a scorecard model trained on users' payment transaction data to assess their transaction risk; or it may be an unsupervised classification model built on an unsupervised machine learning algorithm (such as the k-means algorithm), for example a recommendation model trained on users' click and access data to deliver targeted advertisements or page content to users.
  • After the machine learning model is trained as described above, the data provider may continue to use the projection matrix to encrypt data matrices constructed from newly collected data samples and their data features, and transmit the encrypted matrices to the machine learning model for calculation to obtain the model's output. For example, where the machine learning model is a scorecard model trained on users' payment transaction data to assess their transaction risk, the data provider may use the projection matrix to encrypt a data matrix constructed from the collected transaction data of users and then transmit it as input data to the scorecard model to obtain a risk score for each transaction.
  • Step 502: Receive an encryption matrix transmitted by the data provider server, where the encryption matrix is an N*K-dimensional encryption matrix obtained by the data provider server by performing an encryption calculation on an N*M-dimensional target matrix based on the PCA algorithm, and the K value is less than the M value;
  • Step 504: Train the machine learning model using the encryption matrix as a training sample.
  • The N*M-dimensional target matrix is generated from N data samples and the data features of M dimensions respectively corresponding to the N data samples, and an encryption calculation is performed on the target matrix based on the PCA algorithm.
  • The modeling server usually cannot restore the original target matrix from the encryption matrix, so the user's private data is protected as far as possible, and privacy leakage is avoided while data samples are submitted to the modeling server for model training.
  • The encryption matrix has fewer dimensions than the original target matrix, so the transmission overhead of sending the data samples to the modeling server is reduced.
  • Because the PCA algorithm is used to encrypt the target matrix, the information in the original data samples is retained as far as possible, so the accuracy of model training can still be maintained when the encryption matrix is transmitted to the modeling server for model training.
  • the present specification also provides an embodiment of a data encryption apparatus.
  • Embodiments of the data encryption device of the present specification can be applied to an electronic device.
  • the device embodiment may be implemented by software, or may be implemented by hardware or a combination of hardware and software.
  • taking software implementation as an example, the device in a logical sense is formed by the processor of the electronic device in which it is located reading the corresponding computer program instructions from the non-volatile memory into the memory and running them.
  • at the hardware level, FIG. 6 is a hardware structure diagram of the electronic device in which the data encryption device of the present specification is located; in addition to the processor, the memory, the network interface, and the non-volatile memory shown in FIG. 6, the electronic device in which the device is located may also include other hardware according to its actual function, and details are not described herein.
  • FIG. 7 is a block diagram of a data encryption apparatus shown in an exemplary embodiment of the present specification.
  • the data encryption device 70 can be applied to the electronic device shown in FIG. 6, and includes: a generation module 701, a calculation module 702, and a transmission module 703.
  • the generating module 701 generates an N*M-dimensional target matrix based on the N data samples and the data features of the M dimensions respectively corresponding to the N data samples.
  • the calculation module 702 performs encryption calculation on the target matrix based on the PCA algorithm to obtain an N*K-dimensional encryption matrix; wherein the K value is smaller than the M value;
  • the transmission module 703 transmits the encryption matrix to the modeling server; wherein the encryption matrix is used to train a machine learning model.
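  • in this embodiment, the calculation module 702: performs zero-mean processing on the values in the vectors of the target matrix corresponding to the data features of the M dimensions; calculates the covariance matrix of the zero-meaned target matrix; calculates the eigenvalues of the covariance matrix and the corresponding eigenvectors; sorts the eigenvectors by the size of their eigenvalues and extracts the K eigenvectors with the largest eigenvalues to generate an M*K-dimensional projection matrix; and multiplies the target matrix by the projection matrix to obtain the N*K-dimensional encryption matrix.
  • the device further includes:
  • a storage module 704 (not shown in FIG. 7), which stores the projection matrix locally as an encryption matrix.
  • the calculation module 702 further: determines whether the projection matrix is stored locally; if it is, multiplies the N*M-dimensional target matrix by the projection matrix to obtain the N*K-dimensional encryption matrix;
  • if it is not, performs the encryption calculation on the target matrix based on the PCA algorithm to obtain the N*K-dimensional encryption matrix.
  • the device 70 further includes:
  • an update module 705 (not shown in FIG. 7), which, if the dimensions of the data features change or the meanings represented by the data features change, performs the encryption calculation on the target matrix again based on the PCA algorithm and updates the locally stored projection matrix with the recalculated projection matrix.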
  • the present specification also provides an embodiment of a machine learning model training device.
  • Embodiments of the machine learning model training device of the present specification can be applied to an electronic device.
  • the device embodiment may be implemented by software, or may be implemented by hardware or a combination of hardware and software.
  • taking software implementation as an example, the device in a logical sense is formed by the processor of the electronic device in which it is located reading the corresponding computer program instructions from the non-volatile memory into the memory and running them.
  • at the hardware level, FIG. 8 is a hardware structure diagram of the electronic device in which the machine learning model training device of the present specification is located; in addition to the processor, memory, network interface, and non-volatile memory shown in FIG. 8,
  • the electronic device in which the device is located in the embodiment may also include other hardware according to the actual function of the electronic device, and details are not described herein.
  • FIG. 9 is a block diagram of a machine learning model training apparatus shown in an exemplary embodiment of the present specification.
  • the machine learning model training device 90 can be applied to the electronic device shown in FIG. 8, and includes: a receiving module 901 and a training module 902.
  • the receiving module 901 receives an encryption matrix transmitted by the data provider server, where the encryption matrix is an N*K-dimensional encryption matrix obtained by the data provider server by performing an encryption calculation on an N*M-dimensional target matrix based on the PCA algorithm;
  • the K value is less than the M value;
  • the training module 902 trains the machine learning model with the encryption matrix as a training sample.
  • the training module 902 further:
  • the encryption matrix is used as a training sample, fused with local training samples, and the machine learning model is trained based on the fused training samples.
  • for the device embodiment, since it basically corresponds to the method embodiment, reference may be made to the relevant description of the method embodiment.
  • the device embodiments described above are merely illustrative: the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, i.e., they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the present specification. Those of ordinary skill in the art can understand and implement this without creative effort.
  • the system, device, module or unit illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product having a certain function.
  • a typical implementation device is a computer, and the specific form of the computer may be a personal computer, a laptop computer, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email transceiver, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
  • the present specification also provides an embodiment of a machine learning model training system.
  • the machine learning model training system may include a data provider server and a modeling server.
  • the data provider server may generate an N*M-dimensional target matrix based on the N data samples and the data features corresponding to the M dimensions of the N data samples, and the target matrix is based on the PCA algorithm. Performing an encryption calculation to obtain an N*K-dimensional encryption matrix; wherein, the K value is smaller than the M value; and transmitting the encryption matrix to a modeling server;
  • the above modeling server trains a machine learning model based on the encryption matrix.
  • the modeling server further:
  • the encryption matrix is used as a training sample, fused with local training samples, and the machine learning model is trained based on the fused training samples.
  • the present specification also provides an embodiment of an electronic device.
  • the electronic device includes a processor and a memory for storing machine executable instructions; wherein the processor and the memory are typically interconnected by an internal bus.
  • the device may also include an external interface to enable communication with other devices or components.
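  • by reading and executing the machine-executable instructions stored in the memory that correspond to the data encryption control logic shown in FIG. 1, the processor is caused to: generate an N*M-dimensional target matrix based on the N data samples and the data features of the M dimensions respectively corresponding to the N data samples; perform an encryption calculation on the target matrix based on the PCA algorithm to obtain an N*K-dimensional encryption matrix, where the K value is less than the M value; and transmit the encryption matrix to the modeling server, where the encryption matrix is used to train a machine learning model.
  • by reading and executing the same machine-executable instructions, the processor is also caused to: perform zero-mean processing on the values in the vectors of the target matrix corresponding to the data features of the M dimensions; calculate the covariance matrix of the zero-meaned target matrix; calculate the eigenvalues of the covariance matrix and the corresponding eigenvectors; sort the eigenvectors by the size of their eigenvalues and extract the K eigenvectors with the largest eigenvalues to generate an M*K-dimensional projection matrix; and multiply the target matrix by the projection matrix to obtain the N*K-dimensional encryption matrix.
  • by reading and executing the same machine-executable instructions, the processor is also caused to:
  • store the projection matrix locally as an encryption matrix.
  • by reading and executing the same machine-executable instructions, the processor is also caused to: determine whether the projection matrix is stored locally; if it is, multiply the N*M-dimensional target matrix by the projection matrix to obtain the N*K-dimensional encryption matrix;
  • if it is not, perform the encryption calculation on the target matrix based on the PCA algorithm to obtain the N*K-dimensional encryption matrix.
  • by reading and executing the machine-executable instructions stored in the memory that correspond to the data encryption control logic, the processor is further caused to:
  • if the dimensions of the data features change, or the meanings represented by the data features change, perform the encryption calculation on the target matrix again based on the PCA algorithm, and update the locally stored projection matrix with the recalculated projection matrix.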
  • the present specification also provides an embodiment of another electronic device.
  • the electronic device includes a processor and a memory for storing machine executable instructions; wherein the processor and the memory are typically interconnected by an internal bus.
  • the device may also include an external interface to enable communication with other devices or components.
  • by reading and executing the machine-executable instructions stored in the memory that correspond to the control logic of the machine learning model training illustrated in FIG. 5, the processor is caused to:
  • receive an encryption matrix transmitted by the data provider server, where the encryption matrix is an N*K-dimensional encryption matrix obtained by the data provider server by performing an encryption calculation on an N*M-dimensional target matrix based on the PCA algorithm;
  • the K value is less than the M value;
  • train a machine learning model with the encryption matrix as a training sample.
  • the processor by reading and executing the machine-executable instructions stored in the memory corresponding to the control logic of the machine learning model training shown in FIG. 5, the processor is further caused to:
  • the encryption matrix is used as a training sample, fused with local training samples, and the machine learning model is trained based on the fused training samples.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Security & Cryptography (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Computing Systems (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Operations Research (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Hardware Design (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Complex Calculations (AREA)

Abstract

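Disclosed is a data encryption method, comprising: generating an N*M-dimensional target matrix based on N data samples and data features of M dimensions respectively corresponding to the N data samples; performing an encryption calculation on the target matrix based on a PCA algorithm to obtain an encrypted N*K-dimensional encryption matrix; and transmitting the encryption matrix to a modeling server, which trains a machine learning model using the encryption matrix as training samples.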

Description

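Method and apparatus for encrypting data, method and apparatus for training machine learning model, and electronic device
Technical Field
This specification relates to the field of computer applications, and in particular to a data encryption method, a machine learning model training method, corresponding apparatuses, and an electronic device.
Background
With the rapid development of Internet technology, the networking and increasing transparency of users' personal data has become an unstoppable trend. Service platforms that provide Internet services to users can accumulate massive amounts of user data by collecting the service data that users generate in daily use. For the operator of a service platform, this user data is a very valuable "resource" from which a large amount of valuable information can be extracted through data mining and machine learning. For example, in practical applications, data features of several dimensions may be extracted from this massive user data for a specific business scenario and used as training samples; a machine learning model is built by training with a specific machine learning algorithm, and the trained model is then applied in that business scenario to guide business operations.
Summary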
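This specification proposes a data encryption method, the method comprising:
generating an N*M-dimensional target matrix based on N data samples and data features of M dimensions respectively corresponding to the N data samples;
performing an encryption calculation on the target matrix based on a PCA algorithm to obtain an N*K-dimensional encryption matrix, where the value of K is less than the value of M;
transmitting the encryption matrix to a modeling server, where the encryption matrix is used to train a machine learning model.
Optionally, performing the encryption calculation on the target matrix based on the PCA algorithm to obtain the encrypted N*K-dimensional encryption matrix comprises:
performing zero-mean processing on the values in the vectors of the target matrix that correspond to the data features of the M dimensions;
calculating the covariance matrix of the zero-meaned target matrix;
calculating the eigenvalues of the covariance matrix and the eigenvectors corresponding to the eigenvalues;
sorting the calculated eigenvectors by the size of their corresponding eigenvalues, and extracting the K eigenvectors with the largest eigenvalues to generate an M*K-dimensional projection matrix;
multiplying the target matrix by the projection matrix to obtain the N*K-dimensional encryption matrix.
Optionally, the method further comprises:
storing the projection matrix locally as an encryption matrix.
Optionally, performing the encryption calculation on the target matrix based on the PCA algorithm to obtain the encrypted N*K-dimensional encryption matrix comprises:
determining whether the projection matrix is stored locally;
if the projection matrix is stored locally, multiplying the N*M-dimensional target matrix by the projection matrix to obtain the N*K-dimensional encryption matrix;
if the projection matrix is not stored locally, performing the encryption calculation on the target matrix based on the PCA algorithm to obtain the N*K-dimensional encryption matrix.
Optionally, the method further comprises:
if the dimensions of the data features change, or the meanings represented by the data features change, performing the encryption calculation on the target matrix again based on the PCA algorithm, and updating the locally stored projection matrix with the recalculated projection matrix.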
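This specification also proposes a data encryption apparatus, the apparatus comprising:
a generation module, which generates an N*M-dimensional target matrix based on N data samples and data features of M dimensions respectively corresponding to the N data samples;
a calculation module, which performs an encryption calculation on the target matrix based on a PCA algorithm to obtain an N*K-dimensional encryption matrix, where the value of K is less than the value of M;
a transmission module, which transmits the encryption matrix to a modeling server, where the encryption matrix is used to train a machine learning model.
Optionally, the calculation module:
performs zero-mean processing on the values in the vectors of the target matrix that correspond to the data features of the M dimensions;
calculates the covariance matrix of the zero-meaned target matrix;
calculates the eigenvalues of the covariance matrix and the eigenvectors corresponding to the eigenvalues;
sorts the calculated eigenvectors by the size of their corresponding eigenvalues, and extracts the K eigenvectors with the largest eigenvalues to generate an M*K-dimensional projection matrix;
multiplies the target matrix by the projection matrix to obtain the N*K-dimensional encryption matrix.
Optionally, the apparatus further comprises:
a storage module, which stores the projection matrix locally as an encryption matrix.
Optionally, the calculation module further:
determines whether the projection matrix is stored locally;
if the projection matrix is stored locally, multiplies the N*M-dimensional target matrix by the projection matrix to obtain the N*K-dimensional encryption matrix;
if the projection matrix is not stored locally, performs the encryption calculation on the target matrix based on the PCA algorithm to obtain the N*K-dimensional encryption matrix.
Optionally, the apparatus further comprises:
an update module, which, if the dimensions of the data features change or the meanings represented by the data features change, performs the encryption calculation on the target matrix again based on the PCA algorithm and updates the locally stored projection matrix with the recalculated projection matrix.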
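This specification also proposes a machine learning model training method, the method comprising:
receiving an encryption matrix transmitted by a data provider server, where the encryption matrix is an N*K-dimensional encryption matrix obtained by the data provider server by performing an encryption calculation on an N*M-dimensional target matrix based on a PCA algorithm, and the value of K is less than the value of M;
training a machine learning model using the encryption matrix as training samples.
Optionally, training the machine learning model using the encryption matrix as training samples comprises:
using the encryption matrix as training samples, fusing them with local training samples, and training the machine learning model based on the fused training samples.
This specification also proposes a machine learning model training apparatus, the apparatus comprising:
a receiving module, which receives an encryption matrix transmitted by a data provider server, where the encryption matrix is an N*K-dimensional encryption matrix obtained by the data provider server by performing an encryption calculation on an N*M-dimensional target matrix based on a PCA algorithm, and the value of K is less than the value of M;
a training module, which trains a machine learning model using the encryption matrix as training samples.
Optionally, the training module:
uses the encryption matrix as training samples, fuses them with local training samples, and trains the machine learning model based on the fused training samples.
This specification also proposes a machine learning model training system, the system comprising:
a data provider server, which generates an N*M-dimensional target matrix based on N data samples and data features of M dimensions respectively corresponding to the N data samples, performs an encryption calculation on the target matrix based on a PCA algorithm to obtain an N*K-dimensional encryption matrix, where the value of K is less than the value of M, and transmits the encryption matrix to a modeling server;
a modeling server, which trains a machine learning model based on the encryption matrix.
Optionally, the modeling server further:
uses the encryption matrix as training samples, fuses them with local training samples, and trains the machine learning model based on the fused training samples.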
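This specification also proposes an electronic device, comprising:
a processor;
a memory for storing machine-executable instructions;
where, by reading and executing the machine-executable instructions stored in the memory that correspond to control logic for data encryption, the processor is caused to:
generate an N*M-dimensional target matrix based on N data samples and data features of M dimensions respectively corresponding to the N data samples;
perform an encryption calculation on the target matrix based on a PCA algorithm to obtain an N*K-dimensional encryption matrix, where the value of K is less than the value of M;
transmit the encryption matrix to a modeling server, where the encryption matrix is used to train a machine learning model.
This specification also proposes an electronic device, comprising:
a processor;
a memory for storing machine-executable instructions;
where, by reading and executing the machine-executable instructions stored in the memory that correspond to control logic for machine learning model training, the processor is caused to:
receive an encryption matrix transmitted by a data provider server, where the encryption matrix is an N*K-dimensional encryption matrix obtained by the data provider server by performing an encryption calculation on an N*M-dimensional target matrix based on a PCA algorithm, and the value of K is less than the value of M;
train a machine learning model using the encryption matrix as training samples.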
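In this specification, an N*M-dimensional target matrix is generated based on N data samples and data features of M dimensions respectively corresponding to the N data samples, an encryption calculation is performed on the target matrix based on a PCA algorithm to obtain an encrypted N*K-dimensional encryption matrix, and the encryption matrix is then transmitted to a modeling server, which trains a machine learning model using the encryption matrix as training samples;
On the one hand, after the encryption matrix produced by the PCA algorithm has been transmitted to the modeling server, the modeling server usually cannot restore the original target matrix from it, so the user's private data is protected to the greatest extent and leakage of user privacy is avoided while data samples are submitted to the modeling server for model training;
On the other hand, the encryption matrix obtained by performing the PCA-based encryption calculation on the target matrix has fewer dimensions than the original target matrix, so the transmission overhead when transmitting data samples to the modeling server can be reduced; moreover, because encrypting the target matrix with the PCA algorithm retains the amount of information in the original data samples to the greatest extent, the accuracy of model training can still be ensured when the encryption matrix is transmitted to the modeling server for model training.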
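Brief Description of the Drawings
FIG. 1 is a flowchart of a data encryption method according to an embodiment of this specification;
FIG. 2 is a schematic diagram of an N*M-dimensional target matrix according to an embodiment of this specification;
FIG. 3 is a flowchart of performing the encryption calculation on the target matrix based on the PCA algorithm according to an embodiment of this specification;
FIG. 4 is a schematic diagram of joint modeling by fusing data samples from multiple parties according to an embodiment of this specification;
FIG. 5 is a flowchart of a machine learning model training method according to an embodiment of this specification;
FIG. 6 is a hardware structure diagram of an electronic device carrying the data encryption apparatus according to an embodiment of this specification;
FIG. 7 is a logical block diagram of the data encryption apparatus according to an embodiment of this specification;
FIG. 8 is a hardware structure diagram of an electronic device carrying the machine learning model training apparatus according to an embodiment of this specification;
FIG. 9 is a logical block diagram of the machine learning model training apparatus according to an embodiment of this specification.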
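Detailed Description
In the era of big data, mining massive amounts of data can yield useful information in many forms, so the importance of data is self-evident. Different organizations each hold their own data, but the data mining results of any single organization are limited by the quantity and variety of the data it holds. A direct solution to this problem is for multiple organizations to cooperate and share data, thereby achieving better data mining results and a win-win outcome.
However, for a data owner, data is itself a highly valuable asset, and because of the need to protect privacy and prevent leakage, data owners are often unwilling to hand over their data directly. This makes "data sharing" hard to put into practice. How to share data while fully ensuring data security has therefore become a problem of wide concern in the industry.
This specification proposes a technical solution in which the original user data required for modeling is encrypted based on a PCA (principal component analysis) algorithm, so that the privacy of the original user data is protected while the amount of information in it is retained to the greatest extent, allowing user privacy to be protected without sacrificing modeling accuracy.
In implementation, data features of M dimensions may be extracted from each of the N data samples required for modeling, and an N*M-dimensional target matrix may be generated based on the N data samples and the data features of M dimensions respectively corresponding to the N data samples.
After the N*M-dimensional target matrix has been generated, an encryption calculation may be performed on it based on the PCA algorithm to obtain an encrypted N*K-dimensional encryption matrix, and the encryption matrix may be transmitted to the modeling server as training samples; the value of K is less than the value of M.
After receiving the encryption matrix, the modeling server may use it as training samples to train a machine learning model; for example, it may fuse the encryption matrix with its local training samples and then train the machine learning model based on the fused training samples.
On the one hand, after the encryption matrix produced by the PCA algorithm has been transmitted to the modeling server, the modeling server usually cannot restore the original target matrix from it, so the user's private data is protected to the greatest extent and leakage of user privacy is avoided while data samples are submitted to the modeling server for model training;
On the other hand, the encryption matrix obtained by performing the PCA-based encryption calculation on the target matrix has fewer dimensions than the original target matrix, so the transmission overhead when transmitting data samples to the modeling server can be reduced; moreover, because encrypting the target matrix with the PCA algorithm retains the amount of information in the original data samples to the greatest extent, the accuracy of model training can still be ensured when the encryption matrix is transmitted to the modeling server for model training.
A detailed description is given below through specific embodiments and in combination with specific application scenarios.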
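Referring to FIG. 1, FIG. 1 shows a data encryption method provided by an embodiment of this specification, which is applied to a data provider server and performs the following steps:
Step 102: generate an N*M-dimensional target matrix based on N data samples and data features of M dimensions respectively corresponding to the N data samples;
Step 104: perform a dimension-reduction calculation on the target matrix based on the PCA algorithm to obtain an encrypted N*K-dimensional encryption matrix, where the value of K is less than the value of M;
Step 106: transmit the encryption matrix to a modeling server, where the encryption matrix is used to train a machine learning model.
The data provider server may interface with the modeling server and provide it with the data samples required for modeling;
For example, in practical applications, the data provider and the modeling party may correspond to different operators, and the data provider may transmit collected user data as data samples to the modeling party to complete data modeling; for instance, the modeling party may be Alipay's data operation platform, and the data provider may be a service platform that provides Internet services to users and interfaces with Alipay's data operation platform, such as a third-party bank or a courier company.
In the initial state, the data provider server may collect, in the background, the user data that users generate in daily use, select N pieces of user data from it as data samples, and generate an initialized data sample set from these data samples.
For example, in one illustrated implementation, N pieces of sensitive data involving user privacy may be selected from the collected user data, and an initialized data sample set may then be generated from this sensitive data.
The specific number N of data samples collected is not particularly limited in this specification and may be set by those skilled in the art based on actual needs.
The specific form of the user data depends on the specific business scenario and modeling requirements and is likewise not particularly limited in this specification; for example, in practical applications, if the goal is to create a score card model for assessing the risk of payment transactions initiated by users, then in this business scenario the user data may be transaction data generated by users through a payment client.
After the data sample set has been generated from the N collected data samples, the data provider server may also preprocess the data samples in the set.
Preprocessing the data samples in the data sample set usually includes data cleaning, filling in default values for missing entries, normalization, or other forms of preprocessing. Preprocessing converts the collected data samples into standardized data samples suitable for model training.
After the data samples in the data sample set have been preprocessed, the data provider server may extract data features of M dimensions from each data sample in the set;
The number M of dimensions of the extracted data features is not particularly limited in this specification and may be chosen by those skilled in the art based on actual modeling requirements.
In addition, the specific types of the extracted data features are also not particularly limited in this specification and may be selected manually by those skilled in the art, based on actual modeling requirements, from the information actually contained in the data samples;
For example, in one implementation, the modeling party may preselect data features of M dimensions based on actual modeling requirements and provide the selected data features to the data provider, which then extracts from the data samples the feature values corresponding to the data features of each dimension.
After the data provider has extracted the data features of M dimensions from each data sample in the data sample set, it may generate a data feature vector for each data sample from the feature values corresponding to the extracted M dimensions, and then construct an N*M-dimensional target matrix from the data feature vectors of the data samples.
In implementation, the M dimensions of data features may correspond to the rows of the target matrix or to its columns, which is not particularly limited in this specification.
For example, referring to FIG. 2 and taking the case where the M dimensions of data features correspond to the rows of the target matrix, the target matrix may be represented in the form shown in FIG. 2. In the target matrix shown in FIG. 2, each column represents one data sample, and each row corresponds to one of the M dimensions of data features, that is, the vector of values that this feature takes across the N data samples.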
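The construction of the target matrix can be sketched as follows. This is a minimal illustration only: the feature names and sample records are hypothetical (the specification does not fix a schema), and the sketch stores samples as rows and features as columns, then derives the transposed layout of FIG. 2.

```python
# Hypothetical feature names; the specification leaves the choice to the modeler.
FEATURES = ["amount", "hour_of_day", "merchant_risk"]  # M = 3


def build_target_matrix(samples, features):
    """Return an N x M list of rows: one row per sample, one column per feature."""
    return [[float(sample[name]) for name in features] for sample in samples]


# Hypothetical preprocessed data samples (N = 4).
samples = [
    {"amount": 120.0, "hour_of_day": 14, "merchant_risk": 0.2},
    {"amount": 35.5, "hour_of_day": 23, "merchant_risk": 0.7},
    {"amount": 980.0, "hour_of_day": 3, "merchant_risk": 0.9},
    {"amount": 15.0, "hour_of_day": 10, "merchant_risk": 0.1},
]

target = build_target_matrix(samples, FEATURES)   # N rows, M columns
transposed = [list(col) for col in zip(*target)]  # features as rows, as in FIG. 2
```

Either layout carries the same information; what matters is that later steps (centering, covariance) treat the feature axis consistently.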
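After the data provider server has generated the N*M-dimensional target matrix based on the N data samples and the data features of M dimensions respectively corresponding to the N data samples, it may perform an encryption calculation on the target matrix based on the PCA algorithm to obtain an N*K-dimensional encryption matrix.
Because the matrix obtained after PCA dimension reduction usually cannot be restored to the original target matrix, the resulting matrix is effectively an encrypted matrix. In this way, the user's private data can be protected to the greatest extent.
Referring to FIG. 3, FIG. 3 shows a process of performing the encryption calculation on the target matrix based on the PCA algorithm according to this specification, including the following steps:
Step 302: perform zero-mean processing on the values in the vectors of the target matrix that correspond to the data features of the M dimensions;
Zero-mean processing refers to subtracting, from each value in a set of values, the mean of that set. In this specification, performing zero-mean processing on the values in the vectors of the target matrix that correspond to the data features of the M dimensions means subtracting, from each value in such a vector, the mean of all values in that vector.
In implementation, the vectors in the target matrix that correspond to the data features of the M dimensions may be selected in turn as the target vector; the mean of the values in the target vector is calculated, and this mean is then subtracted from each value in the target vector.
For example, taking the target matrix shown in FIG. 2, the mean of each row of the target matrix may be calculated, and the mean of the row is then subtracted from each value in that row.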
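Step 302 can be sketched as follows. The sketch assumes samples are stored as rows and features as columns, so the mean is taken per column; with the FIG. 2 layout (features as rows) the same operation is applied per row. The function name and toy values are illustrative.

```python
def zero_center(matrix):
    """Subtract each feature's mean from its values (rows = samples, columns = features)."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    centered = [[value - mean for value, mean in zip(row, means)] for row in matrix]
    return centered, means


X = [[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]]  # N = 3 samples, M = 2 features
centered, means = zero_center(X)
```

Returning the means alongside the centered matrix is a design choice of this sketch: a provider that reuses a stored projection matrix on new data would typically also want to reuse the same centering.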
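Step 304: calculate the covariance matrix of the zero-meaned target matrix;
The covariance matrix is the matrix formed by the covariances between the vectors of the target matrix that correspond to the individual data features.
When calculating the covariance matrix of the zero-meaned target matrix, the covariance between the vector corresponding to each of the M dimensions of data features and the vectors corresponding to the other dimensions of data features may be calculated, and a covariance matrix may then be generated from the calculated covariances.
Note that the specific calculation of the covariance matrix is not described in detail with examples in this specification; when putting the technical solution recorded in this specification into practice, those skilled in the art may refer to the related art. For example, mature tools such as MATLAB may be used to calculate the covariance matrix of the target matrix.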
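Since the specification defers the covariance calculation to existing tools, the following is only a minimal sketch of one standard way to do it. It assumes an already zero-centered N x M matrix with samples as rows, and it uses the unbiased 1/(N-1) normalization; dividing by N instead is equally common and does not change the eigenvectors.

```python
def covariance_matrix(centered):
    """Return the M x M sample covariance of an N x M zero-centered matrix."""
    n = len(centered)
    m = len(centered[0])
    cov = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i, m):
            total = sum(row[i] * row[j] for row in centered)
            cov[i][j] = cov[j][i] = total / (n - 1)  # symmetric by construction
    return cov


centered = [[-1.0, -20.0], [0.0, -10.0], [1.0, 30.0]]  # N = 3, M = 2
cov = covariance_matrix(centered)
```

The result is always a square, symmetric M x M matrix, regardless of the number of samples N.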
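Step 306: calculate the eigenvalues of the covariance matrix and the eigenvectors corresponding to the eigenvalues;
After the covariance matrix of the target matrix has been calculated, the eigenvalues of the covariance matrix and the eigenvector corresponding to each eigenvalue may be further calculated. The number of eigenvalues of the covariance matrix depends on its order; for example, the M*M covariance matrix of M data features has M eigenvalues (counted with multiplicity).
When calculating the eigenvalues and eigenvectors of the covariance matrix, the characteristic polynomial of the covariance matrix may first be obtained, and all roots of the characteristic polynomial may then be found; each root is an eigenvalue. Once all eigenvalues have been found, each eigenvalue may be substituted into the system of linear equations corresponding to the covariance matrix, which is then solved to obtain the eigenvector corresponding to that eigenvalue.
Note that the specific calculation of the eigenvalues of the covariance matrix and of the corresponding eigenvectors is not described in detail with examples in this specification; when putting the technical solution recorded in this specification into practice, those skilled in the art may refer to the related art. For example, mature tools such as MATLAB may be used to calculate the eigenvalues of the covariance matrix and the eigenvectors corresponding to them.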
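As a small worked illustration of the characteristic-polynomial approach described above, consider a hypothetical 2*2 covariance matrix C (the numbers are chosen for convenience and do not come from the specification). The eigenvalues are the roots of det(C - λI) = 0, and each eigenvector solves (C - λ_i I) v_i = 0:

```latex
\begin{aligned}
C &= \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, \qquad
\det(C - \lambda I) = (2 - \lambda)^2 - 1 = (\lambda - 3)(\lambda - 1) = 0
\;\Rightarrow\; \lambda_1 = 3,\ \lambda_2 = 1, \\
(C - 3I)\,v_1 &= \begin{pmatrix} -1 & 1 \\ 1 & -1 \end{pmatrix} v_1 = 0
\;\Rightarrow\; v_1 = \tfrac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \\
(C - I)\,v_2 &= \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} v_2 = 0
\;\Rightarrow\; v_2 = \tfrac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \end{pmatrix}.
\end{aligned}
```

For larger M, root-finding on the characteristic polynomial is numerically fragile, which is one reason to rely on the library routines the specification mentions.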
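Step 308: sort the calculated eigenvectors by the size of their corresponding eigenvalues, and extract the K eigenvectors with the largest eigenvalues to generate an M*K-dimensional projection matrix;
After all eigenvalues of the covariance matrix and the eigenvector corresponding to each eigenvalue have been calculated, the eigenvectors may be sorted by the size of their corresponding eigenvalues, for example in descending order. Once the eigenvectors have been sorted by eigenvalue, the K eigenvectors with the largest eigenvalues may be extracted to generate an M*K-dimensional projection matrix.
The value of K is a value smaller than M; in practical applications, it may be specified manually by those skilled in the art according to actual needs.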
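Step 308 can be sketched as follows, assuming the eigenvalues and eigenvectors are already available as parallel lists. The function name and the toy eigenpairs are illustrative; the unit vectors below merely make the selection easy to follow and are not the output of a real decomposition.

```python
def build_projection_matrix(eigenvalues, eigenvectors, k):
    """Stack the k eigenvectors with the largest eigenvalues as columns of an M x K matrix."""
    if not 0 < k < len(eigenvalues):
        raise ValueError("k must satisfy 0 < k < M")
    # Indices of the eigenvalues in descending order of size.
    order = sorted(range(len(eigenvalues)), key=lambda i: eigenvalues[i], reverse=True)
    top = [eigenvectors[i] for i in order[:k]]   # k vectors, each of length M
    return [list(row) for row in zip(*top)]      # transpose: M rows, K columns


eigenvalues = [0.5, 4.0, 1.5]
# eigenvectors[i] is paired with eigenvalues[i]; each has length M = 3.
eigenvectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
projection = build_projection_matrix(eigenvalues, eigenvectors, k=2)
```

The check `0 < k < M` mirrors the requirement in the text that K be smaller than M.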
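Step 310: multiply the target matrix by the projection matrix to obtain the encrypted N*K-dimensional encryption matrix.
The M*K-dimensional projection matrix is the projection matrix that is ultimately used to encrypt the original target matrix. When the original N*M target matrix is encrypted based on this M*K-dimensional projection matrix, the original high-dimensional target matrix is mapped into the low-dimensional space defined by the projection matrix.
In implementation, mapping the original N*M target matrix into the M*K-dimensional projection matrix space may be achieved by multiplying the original N*M target matrix by the M*K-dimensional projection matrix (that is, by linear projection); the multiplication may be performed as a right multiplication or as a left multiplication;
For example, when the M dimensions of data features are the columns of the target matrix, the original N*M target matrix may be right-multiplied by the M*K-dimensional projection matrix to map it into the projection matrix space; alternatively, the transposed matrices may be multiplied in the opposite order (a left multiplication) and the result transposed, which maps the original N*M target matrix into the same projection matrix space.
After the original N*M target matrix has been mapped into the M*K-dimensional projection matrix space, an N*K-dimensional encryption matrix is obtained. This encryption matrix is the data samples after encryption by the M*K-dimensional projection matrix.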
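The multiplication in step 310 can be sketched as follows. The helper names are illustrative, and the projection matrix P here is a made-up M x K matrix rather than a set of real eigenvectors; the point is only to show the shapes and that the two multiplication orders agree once the transposes are taken into account.

```python
def matmul(a, b):
    """Multiply an (n x m) matrix by an (m x k) matrix, both given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


X = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]    # N = 2 samples, M = 3 features (columns)
P = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # M = 3 rows, K = 2 columns (illustrative values)

encrypted = matmul(X, P)                                     # right multiplication: N x K
alternative = transpose(matmul(transpose(P), transpose(X)))  # (P^T X^T)^T, also N x K
```

Both expressions give the same N x K matrix because (P^T X^T)^T = X P.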
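In one illustrated implementation, after the data provider server has calculated the M*K-dimensional projection matrix through the calculation process shown above, it may also store the projection matrix locally as an encryption matrix.
Subsequently, when the data provider server has again collected the latest N data samples and generated an N*M-dimensional matrix based on these N data samples and the data features of M dimensions respectively corresponding to them, it may determine whether the projection matrix is stored locally;
If the projection matrix is stored locally, the stored projection matrix may be used directly to encrypt the N*M matrix; the specific encryption process is not repeated here.
Of course, if the projection matrix is not stored locally, the encryption calculation may be performed on the target matrix again according to the PCA-based dimension-reduction process described above to generate the projection matrix.
In addition, note that in practical applications, if the dimensions of the M data features change (for example, data features of new dimensions are added, or data features of some dimensions are removed), or the meanings represented by all or some of the M dimensions of data features change, the data provider may recalculate the projection matrix according to the PCA-based encryption calculation process described above and use the recalculated projection matrix to update the projection matrix already stored locally.
In this way, when the data features required for modeling are updated, the invalid encryption matrix stored locally (that is, the stored projection matrix) can be updated in time, which avoids encrypting the original target matrix with an invalid matrix and the resulting loss of data information that would affect modeling accuracy.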
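The store, reuse, and update logic described above can be sketched as follows. Everything here is hypothetical scaffolding: `ProjectionStore` stands in for whatever local storage the provider uses, the feature schema is modeled as a tuple of names, and `fake_fit` is a placeholder for the PCA steps 302 to 308.

```python
class ProjectionStore:
    """In-memory stand-in for the provider's local storage (hypothetical)."""

    def __init__(self):
        self._schema = None       # feature schema the stored matrix was fitted on
        self._projection = None   # stored M x K projection matrix

    def get(self, schema):
        """Return the stored matrix only if it was fitted on exactly this schema."""
        if self._projection is not None and self._schema == tuple(schema):
            return self._projection
        return None

    def put(self, schema, projection):
        self._schema = tuple(schema)
        self._projection = projection


def projection_for(store, schema, target, fit_projection):
    """Reuse the stored projection matrix, or run PCA again and update the store."""
    projection = store.get(schema)
    if projection is None:  # nothing stored, or the data features changed
        projection = fit_projection(target)
        store.put(schema, projection)
    return projection


calls = []


def fake_fit(target):
    """Placeholder for the PCA fit; records the feature count and returns an M x 1 matrix."""
    calls.append(len(target[0]))
    return [[1.0] for _ in target[0]]


store = ProjectionStore()
first = projection_for(store, ["amount", "hour"], [[1.0, 2.0]], fake_fit)
second = projection_for(store, ["amount", "hour"], [[3.0, 4.0]], fake_fit)              # reused
third = projection_for(store, ["amount", "hour", "risk"], [[1.0, 2.0, 3.0]], fake_fit)  # refitted
```

Comparing feature names only detects added or removed dimensions. A change in what a feature means would need an explicit signal, for example a version tag included in the schema tuple.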
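In this specification, after the dimension-reduction calculation has been performed on the target matrix according to the PCA-based encryption calculation process shown above and the encrypted N*K-dimensional encryption matrix has been obtained, the data provider server may transmit the encryption matrix as training samples to the modeling server that interfaces with the data provider.
After receiving the encryption matrix transmitted by the data provider server, the modeling server may use the encryption matrix as training samples to train a machine learning model;
In one illustrated implementation, the modeling server may fuse the encryption matrix with locally stored training samples and then jointly train the machine learning model based on the fused training samples.
Referring to FIG. 4, FIG. 4 is a schematic diagram of joint modeling by fusing data samples from multiple parties according to this specification.
In one scenario, the modeling party may be Alipay's data operation platform, and the data providers may include service platforms that provide Internet services to users and interface with Alipay's data operation platform, such as banks and third-party financial institutions. In practical applications, because Alipay's data operation platform is an untrusted third party from the data providers' point of view, providing local user transaction data directly to Alipay's data operation platform for data modeling may lead to leakage of user privacy during data transmission. In this case, each data provider may perform the PCA-based encryption calculation, using the projection matrix, on the N*M-dimensional target matrix generated from its original transaction data samples to obtain an N*K-dimensional encryption matrix, and then transmit it as training samples to Alipay's data operation platform. Alipay's data operation platform may fuse the training samples received from the data providers with its local data samples and then train a machine learning model based on the fused training samples; for example, user transaction data provided by banks and third-party financial institutions may be fused with the local user transaction data of Alipay's data operation platform to jointly train a score card model for assessing the risk of users' transactions.
Note that the specific type of the machine learning model is not particularly limited in this specification. For example, in practical applications, the machine learning model may be a supervised prediction model built on a supervised machine learning algorithm (such as a regression algorithm), for instance a score card model trained on users' payment transaction data for evaluating users' transaction risk; it may also be an unsupervised classification model built on an unsupervised machine learning algorithm (such as the k-means algorithm), for instance a recommendation model trained on users' click and access data for delivering targeted advertisements or page content to users.
After the machine learning model has been trained in the modeling manner shown above, the data provider may still use the projection matrix to encrypt a data matrix constructed from collected data samples and their data features, and then transmit it to the machine learning model for calculation to obtain the output of the model; for example, where the machine learning model is a score card model trained on users' payment transaction data for evaluating users' transaction risk, the data provider may use the projection matrix to encrypt the data matrix constructed from the collected transaction data of users and then transmit it as input data to the score card model to obtain a risk score for each transaction.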
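The above describes a data encryption method provided by an embodiment of this specification. Referring to FIG. 5, based on the same idea, an embodiment of this specification provides a machine learning model training method, which is applied to a modeling server and performs the following steps:
Step 502: receive an encryption matrix transmitted by a data provider server, where the encryption matrix is an N*K-dimensional encryption matrix obtained by the data provider server by performing an encryption calculation on an N*M-dimensional target matrix based on the PCA algorithm, and the value of K is less than the value of M;
Step 504: train a machine learning model using the encryption matrix as training samples.
The implementation of the technical features in the steps shown in FIG. 5 is not repeated in this embodiment; reference may be made to the description of the above embodiments.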
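Step 504 can be illustrated with a minimal sketch. The specification does not prescribe a training algorithm, so the following rests on assumptions: it uses logistic regression fitted by batch gradient descent (a common basis for score card models), it presumes that a label per sample is available to the modeling server, and the toy N x K matrix and labels are made up.

```python
import math


def train_logistic_regression(rows, labels, epochs=500, lr=0.1):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, k = len(rows), len(rows[0])
    weights, bias = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for row, label in zip(rows, labels):
            z = bias + sum(w * x for w, x in zip(weights, row))
            error = 1.0 / (1.0 + math.exp(-z)) - label  # predicted probability minus label
            grad_b += error
            for j, x in enumerate(row):
                grad_w[j] += error * x
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, row):
    """Return 1 if the linear score is non-negative, else 0."""
    z = bias + sum(w * x for w, x in zip(weights, row))
    return 1 if z >= 0.0 else 0


# Toy N x K encryption matrix as received from a provider (N = 6, K = 2).
encrypted = [[-2.0, 0.5], [-1.5, -0.3], [-1.0, 0.2], [1.0, -0.1], [1.5, 0.4], [2.0, -0.2]]
labels = [0, 0, 0, 1, 1, 1]  # assumed to be available to the modeling server

weights, bias = train_logistic_regression(encrypted, labels)
predictions = [predict(weights, bias, row) for row in encrypted]
```

The model only ever sees the K projected columns, so new inputs at prediction time must be encrypted with the same projection matrix before they are scored.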
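As the above embodiments show, an N*M-dimensional target matrix is generated based on N data samples and data features of M dimensions respectively corresponding to the N data samples, an encryption calculation is performed on the target matrix based on the PCA algorithm to obtain an N*K-dimensional encryption matrix, and the encryption matrix is then transmitted to the modeling server, which trains a machine learning model using the encryption matrix as training samples;
On the one hand, after the encryption matrix produced by the PCA algorithm has been transmitted to the modeling server, the modeling server usually cannot restore the original target matrix from it, so the user's private data is protected to the greatest extent and leakage of user privacy is avoided while data samples are submitted to the modeling server for model training;
On the other hand, the encryption matrix obtained by performing the PCA-based encryption calculation on the target matrix has fewer dimensions than the original target matrix, so the transmission overhead when transmitting data samples to the modeling server can be reduced;
Moreover, because encrypting the target matrix with the PCA algorithm retains the amount of information in the original data samples to the greatest extent, the accuracy of model training can still be ensured when the encryption matrix is transmitted to the modeling server for model training.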
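Corresponding to the above method embodiments, this specification also provides an embodiment of a data encryption apparatus.
Embodiments of the data encryption apparatus of this specification may be applied to an electronic device. The apparatus embodiments may be implemented in software, in hardware, or in a combination of the two. Taking software implementation as an example, the apparatus in a logical sense is formed by the processor of the electronic device in which it is located reading the corresponding computer program instructions from non-volatile memory into memory and running them. At the hardware level, FIG. 6 shows a hardware structure diagram of the electronic device in which the data encryption apparatus of this specification is located; in addition to the processor, memory, network interface, and non-volatile memory shown in FIG. 6, the electronic device in which the apparatus is located may also include other hardware according to the actual functions of the electronic device, which is not described further here.
FIG. 7 is a block diagram of a data encryption apparatus according to an exemplary embodiment of this specification.
Referring to FIG. 7, the data encryption apparatus 70 may be applied to the electronic device shown in FIG. 6 and includes a generation module 701, a calculation module 702, and a transmission module 703.
The generation module 701 generates an N*M-dimensional target matrix based on N data samples and data features of M dimensions respectively corresponding to the N data samples;
the calculation module 702 performs an encryption calculation on the target matrix based on the PCA algorithm to obtain an N*K-dimensional encryption matrix, where the value of K is less than the value of M;
the transmission module 703 transmits the encryption matrix to a modeling server, where the encryption matrix is used to train a machine learning model.
In this embodiment, the calculation module 702:
performs zero-mean processing on the values in the vectors of the target matrix that correspond to the data features of the M dimensions;
calculates the covariance matrix of the zero-meaned target matrix;
calculates the eigenvalues of the covariance matrix and the eigenvectors corresponding to the eigenvalues;
sorts the calculated eigenvectors by the size of their corresponding eigenvalues, and extracts the K eigenvectors with the largest eigenvalues to generate an M*K-dimensional projection matrix;
multiplies the target matrix by the projection matrix to obtain the N*K-dimensional encryption matrix.
In this embodiment, the apparatus further includes:
a storage module 704 (not shown in FIG. 7), which stores the projection matrix locally as an encryption matrix.
In this embodiment, the calculation module 702 further:
determines whether the projection matrix is stored locally;
if the projection matrix is stored locally, multiplies the N*M-dimensional target matrix by the projection matrix to obtain the N*K-dimensional encryption matrix;
if the projection matrix is not stored locally, performs the encryption calculation on the target matrix based on the PCA algorithm to obtain the N*K-dimensional encryption matrix.
In this embodiment, the apparatus 70 further includes:
an update module 705 (not shown in FIG. 7), which, if the dimensions of the data features change or the meanings represented by the data features change, performs the encryption calculation on the target matrix again based on the PCA algorithm and updates the locally stored projection matrix with the recalculated projection matrix.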
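Corresponding to the above method embodiments, this specification also provides an embodiment of a machine learning model training apparatus.
Embodiments of the machine learning model training apparatus of this specification may be applied to an electronic device. The apparatus embodiments may be implemented in software, in hardware, or in a combination of the two. Taking software implementation as an example, the apparatus in a logical sense is formed by the processor of the electronic device in which it is located reading the corresponding computer program instructions from non-volatile memory into memory and running them. At the hardware level, FIG. 8 shows a hardware structure diagram of the electronic device in which the machine learning model training apparatus of this specification is located; in addition to the processor, memory, network interface, and non-volatile memory shown in FIG. 8, the electronic device in which the apparatus is located may also include other hardware according to the actual functions of the electronic device, which is not described further here.
FIG. 9 is a block diagram of a machine learning model training apparatus according to an exemplary embodiment of this specification.
Referring to FIG. 9, the machine learning model training apparatus 90 may be applied to the electronic device shown in FIG. 8 and includes a receiving module 901 and a training module 902.
The receiving module 901 receives an encryption matrix transmitted by a data provider server, where the encryption matrix is an N*K-dimensional encryption matrix obtained by the data provider server by performing an encryption calculation on an N*M-dimensional target matrix based on the PCA algorithm, and the value of K is less than the value of M;
the training module 902 trains a machine learning model using the encryption matrix as training samples.
In this embodiment, the training module 902 further:
uses the encryption matrix as training samples, fuses them with local training samples, and trains the machine learning model based on the fused training samples.
For details of how the functions of the modules in the above apparatus are implemented, see the implementation of the corresponding steps in the above method; they are not repeated here.
Since the apparatus embodiments basically correspond to the method embodiments, reference may be made to the relevant parts of the description of the method embodiments. The apparatus embodiments described above are merely illustrative: the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this specification. Those of ordinary skill in the art can understand and implement this without creative effort.
The systems, apparatuses, modules, or units described in the above embodiments may be implemented by a computer chip or an entity, or by a product having a certain function. A typical implementation device is a computer, which may take the form of a personal computer, a laptop computer, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.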
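Corresponding to the above method embodiments, this specification also provides an embodiment of a machine learning model training system.
The machine learning model training system may include a data provider server and a modeling server.
The data provider server may generate an N*M-dimensional target matrix based on N data samples and data features of M dimensions respectively corresponding to the N data samples, perform an encryption calculation on the target matrix based on the PCA algorithm to obtain an N*K-dimensional encryption matrix, where the value of K is less than the value of M, and transmit the encryption matrix to the modeling server;
the modeling server trains a machine learning model based on the encryption matrix.
In this embodiment, the modeling server further:
uses the encryption matrix as training samples, fuses them with local training samples, and trains the machine learning model based on the fused training samples.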
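Corresponding to the above method embodiments, this specification also provides an embodiment of an electronic device. The electronic device includes a processor and a memory for storing machine-executable instructions; the processor and the memory are usually connected to each other by an internal bus. In other possible implementations, the device may also include an external interface to enable communication with other devices or components.
In this embodiment, by reading and executing the machine-executable instructions stored in the memory that correspond to the data encryption control logic shown in FIG. 1, the processor is caused to:
generate an N*M-dimensional target matrix based on N data samples and data features of M dimensions respectively corresponding to the N data samples;
perform an encryption calculation on the target matrix based on the PCA algorithm to obtain an N*K-dimensional encryption matrix, where the value of K is less than the value of M;
transmit the encryption matrix to a modeling server, where the encryption matrix is used to train a machine learning model.
In this example, by reading and executing the machine-executable instructions stored in the memory that correspond to the data encryption control logic shown in FIG. 1, the processor is further caused to:
perform zero-mean processing on the values in the vectors of the target matrix that correspond to the data features of the M dimensions;
calculate the covariance matrix of the zero-meaned target matrix;
calculate the eigenvalues of the covariance matrix and the eigenvectors corresponding to the eigenvalues;
sort the calculated eigenvectors by the size of their corresponding eigenvalues, and extract the K eigenvectors with the largest eigenvalues to generate an M*K-dimensional projection matrix;
multiply the target matrix by the projection matrix to obtain the encrypted N*K-dimensional encryption matrix.
In this example, by reading and executing the machine-executable instructions stored in the memory that correspond to the data encryption control logic shown in FIG. 1, the processor is further caused to:
store the projection matrix locally as an encryption matrix.
In this example, by reading and executing the machine-executable instructions stored in the memory that correspond to the data encryption control logic shown in FIG. 1, the processor is further caused to:
determine whether the projection matrix is stored locally;
if the projection matrix is stored locally, multiply the N*M-dimensional target matrix by the projection matrix to obtain the N*K-dimensional encryption matrix;
if the projection matrix is not stored locally, perform the encryption calculation on the target matrix based on the PCA algorithm to obtain the N*K-dimensional encryption matrix.
In this example, by reading and executing the machine-executable instructions stored in the memory that correspond to the data encryption control logic, the processor is further caused to:
if the dimensions of the data features change, or the meanings represented by the data features change, perform the encryption calculation on the target matrix again based on the PCA algorithm, and update the locally stored projection matrix with the recalculated projection matrix.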
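Corresponding to the above method embodiments, this specification also provides an embodiment of another electronic device. The electronic device includes a processor and a memory for storing machine-executable instructions; the processor and the memory are usually connected to each other by an internal bus. In other possible implementations, the device may also include an external interface to enable communication with other devices or components.
In this embodiment, by reading and executing the machine-executable instructions stored in the memory that correspond to the machine learning model training control logic shown in FIG. 5, the processor is caused to:
receive an encryption matrix transmitted by a data provider server, where the encryption matrix is an N*K-dimensional encryption matrix obtained by the data provider server by performing an encryption calculation on an N*M-dimensional target matrix based on the PCA algorithm, and the value of K is less than the value of M;
train a machine learning model using the encryption matrix as training samples.
In this embodiment, by reading and executing the machine-executable instructions stored in the memory that correspond to the machine learning model training control logic shown in FIG. 5, the processor is further caused to:
use the encryption matrix as training samples, fuse them with local training samples, and train the machine learning model based on the fused training samples.
Other embodiments of this specification will readily occur to those skilled in the art after considering the specification and practicing the invention disclosed here. This specification is intended to cover any variations, uses, or adaptations of this specification that follow its general principles and include common knowledge or customary technical means in the art not disclosed in this specification. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of this specification indicated by the following claims.
It should be understood that this specification is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of this specification is limited only by the appended claims.
Specific embodiments of this specification have been described above. Other embodiments fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In some implementations, multitasking and parallel processing are also possible or may be advantageous.
The above are merely preferred embodiments of this application and are not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of this application shall fall within the scope of protection of this application.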

Claims (18)

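  1. A data encryption method, the method comprising:
    generating an N*M-dimensional target matrix based on N data samples and data features of M dimensions respectively corresponding to the N data samples;
    performing an encryption calculation on the target matrix based on a PCA algorithm to obtain an N*K-dimensional encryption matrix, where the value of K is less than the value of M;
    transmitting the encryption matrix to a modeling server, where the encryption matrix is used to train a machine learning model.
  2. The method according to claim 1, wherein performing the encryption calculation on the target matrix based on the PCA algorithm to obtain the encrypted N*K-dimensional encryption matrix further comprises:
    performing zero-mean processing on the values in the vectors of the target matrix that correspond to the data features of the M dimensions;
    calculating the covariance matrix of the zero-meaned target matrix;
    calculating the eigenvalues of the covariance matrix and the eigenvectors corresponding to the eigenvalues;
    sorting the calculated eigenvectors by the size of their corresponding eigenvalues, and extracting the K eigenvectors with the largest eigenvalues to generate an M*K-dimensional projection matrix;
    multiplying the target matrix by the projection matrix to obtain the encrypted N*K-dimensional encryption matrix.
  3. The method according to claim 2, further comprising:
    storing the projection matrix locally as an encryption matrix.
  4. The method according to claim 3, wherein performing the encryption calculation on the target matrix based on the PCA algorithm to obtain the encrypted N*K-dimensional encryption matrix comprises:
    determining whether the projection matrix is stored locally;
    if the projection matrix is stored locally, multiplying the N*M-dimensional target matrix by the projection matrix to obtain the N*K-dimensional encryption matrix;
    if the projection matrix is not stored locally, performing the encryption calculation on the target matrix based on the PCA algorithm to obtain the N*K-dimensional encryption matrix.
  5. The method according to claim 3, further comprising:
    if the dimensions of the data features change, or the meanings represented by the data features change, performing the encryption calculation on the target matrix again based on the PCA algorithm, and updating the locally stored projection matrix with the recalculated projection matrix.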
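  6. A data encryption apparatus, the apparatus comprising:
    a generation module, which generates an N*M-dimensional target matrix based on N data samples and data features of M dimensions respectively corresponding to the N data samples;
    a calculation module, which performs an encryption calculation on the target matrix based on a PCA algorithm to obtain an N*K-dimensional encryption matrix, where the value of K is less than the value of M;
    a transmission module, which transmits the encryption matrix to a modeling server, where the encryption matrix is used to train a machine learning model.
  7. The apparatus according to claim 6, wherein the calculation module:
    performs zero-mean processing on the values in the vectors of the target matrix that correspond to the data features of the M dimensions;
    calculates the covariance matrix of the zero-meaned target matrix;
    calculates the eigenvalues of the covariance matrix and the eigenvectors corresponding to the eigenvalues;
    sorts the calculated eigenvectors by the size of their corresponding eigenvalues, and extracts the K eigenvectors with the largest eigenvalues to generate an M*K-dimensional projection matrix;
    multiplies the target matrix by the projection matrix to obtain the N*K-dimensional encryption matrix.
  8. The apparatus according to claim 7, further comprising:
    a storage module, which stores the projection matrix locally as an encryption matrix.
  9. The apparatus according to claim 8, wherein the calculation module further:
    determines whether the projection matrix is stored locally;
    if the projection matrix is stored locally, multiplies the N*M-dimensional target matrix by the projection matrix to obtain the N*K-dimensional encryption matrix;
    if the projection matrix is not stored locally, performs the encryption calculation on the target matrix based on the PCA algorithm to obtain the N*K-dimensional encryption matrix.
  10. The apparatus according to claim 8, further comprising:
    an update module, which, if the dimensions of the data features change or the meanings represented by the data features change, performs the encryption calculation on the target matrix again based on the PCA algorithm and updates the locally stored projection matrix with the recalculated projection matrix.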
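  11. A machine learning model training method, the method comprising:
    receiving an encryption matrix transmitted by a data provider server, where the encryption matrix is an N*K-dimensional encryption matrix obtained by the data provider server by performing an encryption calculation on an N*M-dimensional target matrix based on a PCA algorithm, and the value of K is less than the value of M;
    training a machine learning model using the encryption matrix as training samples.
  12. The method according to claim 11, wherein training the machine learning model using the encryption matrix as training samples comprises:
    using the encryption matrix as training samples, fusing them with local training samples, and training the machine learning model based on the fused training samples.
  13. A machine learning model training apparatus, the apparatus comprising:
    a receiving module, which receives an encryption matrix transmitted by a data provider server, where the encryption matrix is an N*K-dimensional encryption matrix obtained by the data provider server by performing an encryption calculation on an N*M-dimensional target matrix based on a PCA algorithm, and the value of K is less than the value of M;
    a training module, which trains a machine learning model using the encryption matrix as training samples.
  14. The apparatus according to claim 13, wherein the training module:
    uses the encryption matrix as training samples, fuses them with local training samples, and trains the machine learning model based on the fused training samples.
  15. A machine learning model training system, the system comprising:
    a data provider server, which generates an N*M-dimensional target matrix based on N data samples and data features of M dimensions respectively corresponding to the N data samples, performs an encryption calculation on the target matrix based on a PCA algorithm to obtain an N*K-dimensional encryption matrix, where the value of K is less than the value of M, and transmits the encryption matrix to a modeling server;
    a modeling server, which trains a machine learning model based on the encryption matrix.
  16. The system according to claim 15, wherein the modeling server further:
    uses the encryption matrix as training samples, fuses them with local training samples, and trains the machine learning model based on the fused training samples.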
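  17. An electronic device, comprising:
    a processor;
    a memory for storing machine-executable instructions;
    where, by reading and executing the machine-executable instructions stored in the memory that correspond to control logic for data encryption, the processor is caused to:
    generate an N*M-dimensional target matrix based on N data samples and data features of M dimensions respectively corresponding to the N data samples;
    perform an encryption calculation on the target matrix based on a PCA algorithm to obtain an N*K-dimensional encryption matrix, where the value of K is less than the value of M;
    transmit the encryption matrix to a modeling server, where the encryption matrix is used to train a machine learning model.
  18. An electronic device, comprising:
    a processor;
    a memory for storing machine-executable instructions;
    where, by reading and executing the machine-executable instructions stored in the memory that correspond to control logic for machine learning model training, the processor is caused to:
    receive an encryption matrix transmitted by a data provider server, where the encryption matrix is an N*K-dimensional encryption matrix obtained by the data provider server by performing an encryption calculation on an N*M-dimensional target matrix based on a PCA algorithm, and the value of K is less than the value of M;
    train a machine learning model using the encryption matrix as training samples.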
PCT/CN2018/097339 2017-08-01 2018-07-27 数据加密、机器学习模型训练方法、装置及电子设备 Ceased WO2019024772A1 (zh)

Priority Applications (6)

Application Number Priority Date Filing Date Title
CA3058498A CA3058498A1 (en) 2017-08-01 2018-07-27 Method and apparatus for encrypting data, method and apparatus for training machine learning model, and electronic device
EP18840540.1A EP3627759B1 (en) 2017-08-01 2018-07-27 Method and apparatus for encrypting data, method and apparatus for training machine learning model, and electronic device
AU2018310377A AU2018310377A1 (en) 2017-08-01 2018-07-27 Method and apparatus for encrypting data, method and apparatus for training machine learning model, and electronic device
SG11201909193Q SG11201909193QA (en) 2017-08-01 2018-07-27 Method and apparatus for encrypting data, method and apparatus for training machine learning model, and electronic device
US16/587,977 US11257007B2 (en) 2017-08-01 2019-09-30 Method and apparatus for encrypting data, method and apparatus for training machine learning model, and electronic device
AU2021218153A AU2021218153A1 (en) 2017-08-01 2021-08-19 Method and apparatus for encrypting data, method and apparatus for training machine learning model, and electronic device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710647102.6 2017-08-01
CN201710647102.6A CN109327421A (zh) 2017-08-01 2017-08-01 数据加密、机器学习模型训练方法、装置及电子设备

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/587,977 Continuation US11257007B2 (en) 2017-08-01 2019-09-30 Method and apparatus for encrypting data, method and apparatus for training machine learning model, and electronic device

Publications (1)

Publication Number Publication Date
WO2019024772A1 true WO2019024772A1 (zh) 2019-02-07

Family

ID=65233415

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/097339 Ceased WO2019024772A1 (zh) 2017-08-01 2018-07-27 数据加密、机器学习模型训练方法、装置及电子设备

Country Status (8)

Country Link
US (1) US11257007B2 (zh)
EP (1) EP3627759B1 (zh)
CN (1) CN109327421A (zh)
AU (2) AU2018310377A1 (zh)
CA (1) CA3058498A1 (zh)
SG (1) SG11201909193QA (zh)
TW (1) TWI689841B (zh)
WO (1) WO2019024772A1 (zh)

Families Citing this family (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
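CN109101801B (zh) * 2018-07-12 2021-04-27 北京百度网讯科技有限公司 Method, apparatus, device and computer-readable storage medium for identity authentication
CN109165249B (zh) * 2018-08-07 2020-08-04 阿里巴巴集团控股有限公司 Data processing model construction method and apparatus, server and client
CN112183565B (zh) * 2019-07-04 2023-07-14 创新先进技术有限公司 Model training method, apparatus and system
CN110471908A (zh) * 2019-08-21 2019-11-19 北京百度网讯科技有限公司 Joint modeling method and apparatus
CN110704850B (zh) * 2019-09-03 2022-05-10 华为技术有限公司 Method and apparatus for running an artificial intelligence (AI) model
US20210150042A1 (en) * 2019-11-15 2021-05-20 International Business Machines Corporation Protecting information embedded in a machine learning model
CN111062487B (zh) * 2019-11-28 2021-04-20 支付宝(杭州)信息技术有限公司 Machine learning model feature screening method and apparatus based on data privacy protection
CN110909216B (zh) * 2019-12-04 2023-06-20 支付宝(杭州)信息技术有限公司 Method and apparatus for detecting correlation between user attributes
US11444774B2 (en) * 2020-01-08 2022-09-13 Tata Consultancy Services Limited Method and system for biometric verification
CN111461191B (zh) * 2020-03-25 2024-01-23 杭州跨视科技有限公司 Method, apparatus and electronic device for determining an image sample set for model training
CN113469366B (zh) * 2020-03-31 2024-06-18 北京观成科技有限公司 Method, apparatus and device for identifying encrypted traffic
CN111401479B (zh) * 2020-04-17 2022-05-17 支付宝(杭州)信息技术有限公司 Method and apparatus for multi-party joint dimensionality reduction of private data
CN111983994B (zh) * 2020-08-13 2021-08-20 杭州电子科技大学 V-PCA fault diagnosis method based on complex industrial chemical process
JP7444265B2 (ja) * 2020-08-21 2024-03-06 富士通株式会社 Training data generation program, training data generation method, and training data generation device
CN112270415B (zh) * 2020-11-25 2024-03-22 矩阵元技术(深圳)有限公司 Training data preparation method, apparatus and device for encrypted machine learning
CN113298289A (zh) * 2021-04-14 2021-08-24 北京市燃气集团有限责任公司 Method and apparatus for predicting gas consumption of gas users
CN113268755B (zh) * 2021-05-26 2023-03-31 建投数据科技(山东)有限公司 Method, apparatus and medium for processing data of an extreme learning machine
CN113592097B (zh) * 2021-07-23 2024-02-06 京东科技控股股份有限公司 Federated model training method and apparatus, and electronic device
WO2023015142A1 (en) * 2021-08-04 2023-02-09 Google Llc Principal component analysis
TWI780881B (zh) * 2021-08-27 2022-10-11 緯創資通股份有限公司 Method for establishing defect detection model and electronic apparatus
CN113988308B (zh) * 2021-10-27 2024-07-05 东北大学 Asynchronous federated gradient averaging method based on a delay compensation mechanism
US12413595B2 (en) * 2021-12-14 2025-09-09 International Business Machines Corporation Authorization of service requests in a multi-cluster system
CN114819184B (zh) * 2022-04-22 2025-11-21 中和农信农业集团有限公司 Sparse matrix modeling method and apparatus, computer device and medium
CN114817961A (zh) * 2022-04-28 2022-07-29 陕西理工大学 Network security information encryption method
US12411968B2 (en) * 2023-03-30 2025-09-09 Rakuten Group, Inc. Calculation system, calculation method, and information storage medium
CN117035297B (zh) * 2023-08-02 2024-04-19 瀚能科技有限公司 Big data-based intelligent task allocation method and system for a park
CN117633536B (zh) * 2023-12-14 2025-04-11 中山大学 Model training optimization method, electronic device and computer-readable storage medium
CN118211254B (zh) * 2024-05-22 2024-09-03 苏州元脑智能科技有限公司 Encrypted storage method, decryption and extraction method, apparatus, device and medium
CN119397566B (zh) * 2024-10-22 2025-04-18 华电陕西能源有限公司 Data governance method, apparatus, device, storage medium and product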

Citations (5)

Publication number Priority date Publication date Assignee Title
US20110307424A1 (en) * 2010-06-10 2011-12-15 Wen Jin Determination of training set size for a machine learning system
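CN105303197A (zh) * 2015-11-11 2016-02-03 江苏省邮电规划设计院有限责任公司 Machine learning-based automatic safety assessment method for vehicle following
CN105488539A (zh) * 2015-12-16 2016-04-13 百度在线网络技术(北京)有限公司 Method and apparatus for generating classification model, and method and apparatus for estimating system capacity
CN105787557A (zh) * 2016-02-23 2016-07-20 北京工业大学 Deep neural network structure design method for computer intelligent recognition
CN105893331A (zh) * 2016-03-28 2016-08-24 浙江工业大学 Data compression method on road traffic time series based on principal component analysis algorithm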

Family Cites Families (9)

Publication number Priority date Publication date Assignee Title
US6954744B2 (en) * 2001-08-29 2005-10-11 Honeywell International, Inc. Combinatorial approach for supervised neural network learning
US7515740B2 (en) * 2006-08-02 2009-04-07 Fotonation Vision Limited Face recognition with combined PCA-based datasets
JP5451302B2 (ja) * 2009-10-19 2014-03-26 キヤノン株式会社 Image processing apparatus and method, program, and storage medium
US8984034B2 (en) * 2010-09-28 2015-03-17 Schneider Electric USA, Inc. Calculation engine and calculation providers
CN102982349B (zh) * 2012-11-09 2016-12-07 深圳市捷顺科技实业股份有限公司 Image recognition method and apparatus
US20160132787A1 (en) 2014-11-11 2016-05-12 Massachusetts Institute Of Technology Distributed, multi-model, self-learning platform for machine learning
CN104573720B (zh) * 2014-12-31 2018-01-12 北京工业大学 Distributed training method for kernel classifier in wireless sensor network
US10395180B2 (en) * 2015-03-24 2019-08-27 International Business Machines Corporation Privacy and modeling preserved data sharing
US11461690B2 (en) * 2016-07-18 2022-10-04 Nantomics, Llc Distributed machine learning systems, apparatus, and methods

Non-Patent Citations (1)

Title
See also references of EP3627759A4 *

Also Published As

Publication number Publication date
SG11201909193QA (en) 2019-11-28
AU2021218153A1 (en) 2021-09-09
CA3058498A1 (en) 2019-02-07
EP3627759A4 (en) 2020-06-17
TW201911108A (zh) 2019-03-16
EP3627759B1 (en) 2021-07-14
TWI689841B (zh) 2020-04-01
EP3627759A1 (en) 2020-03-25
US11257007B2 (en) 2022-02-22
CN109327421A (zh) 2019-02-12
US20200034740A1 (en) 2020-01-30
AU2018310377A1 (en) 2019-10-24

Similar Documents

Publication Publication Date Title
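TWI689841B (zh) Method and apparatus for encrypting data, method and apparatus for training machine learning model, and electronic device
US10891161B2 (en) Method and device for virtual resource allocation, modeling, and data prediction
CN109426861A (zh) Method and apparatus for encrypting data, method and apparatus for training machine learning model, and electronic device
US20240394338A1 (en) Data compression techniques for machine learning models
CN113095408B (zh) Risk determination method, apparatus and server
CN113407987B (zh) Privacy-preserving method and apparatus for determining effective values of service data features
CN111027870A (zh) User risk assessment method and apparatus, electronic device, and storage medium
CN108984733B (zh) Cross-domain data fusion method, system and storage medium
US11500992B2 (en) Trusted execution environment-based model training methods and apparatuses
CN107704930A (zh) Shared data-based modeling method, apparatus, system and electronic device
CN113268772B (zh) Differential privacy-based secure aggregation method and apparatus for joint learning
US20250061224A1 (en) Electronic protection of sensitive information via data embedding and noise addition
CN109426894A (zh) User information sharing and bidding method, apparatus, system and electronic device
Pires et al. Synthetic data generation with hybrid quantum-classical models for the financial sector
CN112948889B (zh) Method and system for performing machine learning under data privacy protection
US12476943B2 (en) Recommendation engine using fully homomorphic encryption
HK40004018A (zh) Method and apparatus for encrypting data, method and apparatus for training machine learning model, and electronic device
CN114742641A (zh) Federated learning-based logistic regression model modeling method and apparatus, and electronic device
US20250371543A1 (en) Multi-task convolutional neural network for behavior sequence embedding modeling
HK40004802A (zh) Method and apparatus for encrypting data, method and apparatus for training machine learning model, and electronic device
US20250307921A1 (en) System and Methods for Automated Data Validation and Risk Bias Prediction
US20230274310A1 (en) Jointly predicting multiple individual-level features from aggregate data
Monkiewicz et al. Digital finance: Basic terminology
Thakur Securing Generative AI: Homomorphic Encryption, Differential Privacy, and Federated Learning in Key Industries.
CN116368772A (zh) System, method and apparatus for privacy-preserving inference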

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 18840540

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 3058498

Country of ref document: CA

ENP Entry into the national phase

Ref document number: 2018310377

Country of ref document: AU

Date of ref document: 20180727

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE