WO2021228264A1 - Method, apparatus, electronic device, and storage medium for applying machine learning - Google Patents
- Publication number
- WO2021228264A1 (application PCT/CN2021/094202)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- model
- data
- online
- database
- deployed
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/1734—Details of monitoring file system events, e.g. by the use of hooks, filter drivers, logs
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/182—Distributed file systems
Definitions
- The embodiments of the present disclosure relate to the technical field of machine learning, and in particular to a method, apparatus, electronic device, and storage medium for applying machine learning.
- Applying machine learning may include, but is not limited to, the following processes: problem definition, machine learning model building (modeling for short), online model serving, feedback collection, and iterative model updating.
- In modeling, a model is explored on offline data and its effect is judged by an offline evaluation method. Once the model effect reaches the standard (that is, meets the preset requirements), IT staff deploy the model online to provide the online model service.
- However, a model whose offline effect meets the standard may not meet the requirements online.
- The inventors of the present disclosure found that, because the data used for modeling is inconsistent with the online data, it is difficult to keep the features computed during modeling consistent with those computed online. As a result, the model's online effect differs greatly from its offline effect and falls short of expectations, making it difficult to bring the model online.
- At least one embodiment of the present disclosure provides a method, apparatus, electronic device, and storage medium for applying machine learning.
- In a first aspect, an embodiment of the present disclosure proposes a method for applying machine learning.
- The method includes: acquiring, online, related data streams of a specified business scenario based on a data service interface; accumulating data in the related data streams into a first database; when a first preset condition is met, exploring a model solution based on the data in the first database, where the model solution includes the following solution sub-items: a feature engineering solution, a model algorithm, and model hyperparameters; and deploying the explored model solution online to provide an online model estimation service, where the online model estimation service is performed based on the related data streams of the specified business scenario acquired online through the data service interface.
- An embodiment of the present disclosure further proposes an apparatus for applying machine learning.
- The apparatus includes: a data management module configured to acquire, online, related data streams of a specified business scenario based on a data service interface and to accumulate data in the related data streams into a first database; a model solution exploration module configured to explore a model solution based on the data in the first database when a first preset condition is met, where the model solution includes the following solution sub-items: a feature engineering solution, a model algorithm, and model hyperparameters; and a model online estimation service module configured to deploy the model solution obtained by the model solution exploration module online to provide an online model estimation service, where the online model estimation service is performed based on the related data streams of the specified business scenario acquired online through the data service interface.
- An embodiment of the present disclosure further proposes an electronic device, including a processor and a memory; the processor is configured to execute the steps of the method for applying machine learning described in the first aspect by calling a program or instructions stored in the memory.
- An embodiment of the present disclosure further proposes a computer-readable storage medium configured to store a program or instructions that cause a computer to execute the steps of the method for applying machine learning described in the first aspect.
- The embodiments of the present disclosure also provide a computer program product including computer program instructions which, when run on a computer device, implement the steps of the method for applying machine learning described in the first aspect.
- The business scenario is connected directly, data related to the business scenario is accumulated, and the model solution is then explored to obtain the model solution and an offline model. This ensures that the data used in offline model solution exploration and the data used by the online model estimation service come from the same source, achieving same-origin offline and online data.
- To avoid the problem that an offline model deployed online directly estimates poorly, only the model solution may be deployed online, without deploying the offline model.
- After the model solution is deployed online, it can receive estimation requests (that is, data of the request data stream) to obtain sample data with features and feedback, which is then used for model self-learning. The self-learned model can be deployed online, which ensures that the data and feature engineering solution used in model self-learning are consistent with those used in the online model estimation service, so that the self-learning effect and the estimation effect of the model are consistent.
- FIG. 1 is an exemplary architecture diagram of an apparatus for applying machine learning provided by an embodiment of the present disclosure;
- FIG. 2 is an exemplary architecture diagram of another apparatus for applying machine learning provided by an embodiment of the present disclosure;
- FIG. 3 is an exemplary flow logic block diagram of the apparatus for applying machine learning shown in FIG. 2;
- FIG. 4 is an exemplary data flow diagram of the apparatus for applying machine learning shown in FIG. 2;
- FIG. 5 is an exemplary architecture diagram of an electronic device provided by an embodiment of the present disclosure;
- FIG. 6 is an exemplary flowchart of a method for applying machine learning provided by an embodiment of the present disclosure.
- FIG. 1 is an exemplary architecture diagram of an apparatus for applying machine learning provided by an embodiment of the present disclosure. The apparatus is suitable for supervised-learning artificial intelligence modeling of various kinds of data, including but not limited to two-dimensional structured data, images, NLP (Natural Language Processing) data, and voice.
- The apparatus for applying machine learning can be applied to a specified business scenario for which information about the related data streams has been predefined. The related data streams may include, but are not limited to, a request data stream, a display data stream, a feedback data stream, and a business data stream, where the data of the display data stream is the data that the specified business scenario displays in response to the request data stream.
- Taking a short-video application as an example: when the user swipes or taps on the user terminal to refresh short videos, the application backend selects a set of candidate videos, and this set forms the request data to be modeled.
- The display data is the set of short videos that the application actually shows to the user.
- The feedback data is, for example, whether the user clicks on or watches a short video displayed by the application.
- The business data is, for example, data related to business logic, such as comment data and like data generated while the user watches a short video.
- The predefined information about a related data stream can be understood as the fields included in the related data. For example, if the related data stream is the request data stream, the predefined information can be understood as the fields included in the request data of that stream, such as user ID, request content, request time, and candidate material ID.
- The online model estimation service can be provided by the apparatus for applying machine learning shown in FIG. 1.
- The apparatus may include, but is not limited to: a data management module 100, a model solution exploration module 200, a model online estimation service module 300, and other components required for applying machine learning, such as an offline database and an online database.
- The data management module 100 is configured to store and manage data derived from the specified business scenario and data generated by the model online estimation service module 300.
- The data derived from the specified business scenario is the related data streams that the data management module 100 acquires online by connecting directly to the specified business scenario through a data service interface.
- The data service interface is an application programming interface (API).
- The data service interface is created by the data management module 100 based on the predefined information about the related data streams of the specified business scenario.
- The data management module 100 may provide a user interface and receive, through it, the information about the related data streams of the specified business scenario entered by a user. The user may be an operation and maintenance engineer of the specified business scenario.
- The data management module 100 may then create the data service interface based on the information entered by the user.
- Data service interfaces and related data streams correspond one-to-one; for example, the request, display, feedback, and business data streams each correspond to a different data service interface.
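- The one-to-one mapping between streams and interfaces can be sketched as follows. This is a minimal illustration only: the class `DataServiceInterface`, the function `create_interfaces`, and all field names are hypothetical and do not come from the disclosure.

```python
from dataclasses import dataclass, field


@dataclass
class DataServiceInterface:
    """Hypothetical interface that accepts records for exactly one data stream."""
    stream: str
    fields: tuple
    received: list = field(default_factory=list)

    def push(self, record: dict) -> None:
        # Reject records that do not carry every predefined field.
        missing = [name for name in self.fields if name not in record]
        if missing:
            raise ValueError(f"{self.stream}: missing fields {missing}")
        self.received.append(record)


def create_interfaces(stream_definitions: dict) -> dict:
    """Create one interface per predefined stream (one-to-one mapping)."""
    return {
        stream: DataServiceInterface(stream, tuple(fields))
        for stream, fields in stream_definitions.items()
    }


# Illustrative field names, loosely based on the request fields named in the text.
definitions = {
    "request": ["user_id", "request_time", "candidate_ids"],
    "impression": ["user_id", "item_id"],
    "action": ["user_id", "item_id", "clicked"],
    "business": ["user_id", "item_id", "liked"],
}
interfaces = create_interfaces(definitions)
interfaces["request"].push(
    {"user_id": "u1", "request_time": 1700000000, "candidate_ids": ["v1", "v2"]}
)
```

- Validating fields at the interface is a design choice of this sketch; it keeps malformed records out of the accumulated data but is not something the text prescribes.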
- The data management module 100 may accumulate data in the related data streams of the specified business scenario into a first database. The first database is an offline database, for example the Hadoop Distributed File System (HDFS), a distributed file storage system, or another offline database.
- The data management module 100 may process the data of the request data stream to obtain sample data, where the processing includes, but is not limited to, filtering and flattening.
- The data management module 100 may accumulate the data of the request data stream, the sample data, the data of the feedback data stream, and the data of the business data stream into the first database.
- The data management module 100 can use a filter to filter the data of the request data stream against the data of the display data stream and obtain their intersection. For example, if the display data stream has 10 records, the request data stream has 12 records, and 5 records appear in both, the filter keeps those 5 records as the intersection data and discards the rest.
- The data management module 100 can obtain the sample data by flattening the intersection data (the 5 shared records).
- The data management module 100 can accumulate the data of the display data stream and the sample data obtained by filtering into the first database.
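- The filter and flatten steps can be sketched with the same 12/10/5 numbers. The sketch assumes that records are matched on a `request_id` key and that flattening expands a request's candidate list into one row per candidate; the text does not specify either detail.

```python
def filter_requests(requests, impressions, key="request_id"):
    """Keep only the requests whose key also appears in the impression stream."""
    shown = {imp[key] for imp in impressions}
    return [req for req in requests if req[key] in shown]


def flatten(request):
    """Expand one request with a candidate list into one flat row per candidate."""
    return [
        {"request_id": request["request_id"], "user_id": request["user_id"], "item_id": item}
        for item in request["candidates"]
    ]


# 12 requests (ids 0..11) and 10 impressions (ids 7..16) share 5 ids (7..11).
requests = [
    {"request_id": i, "user_id": f"u{i}", "candidates": [f"v{i}a", f"v{i}b"]}
    for i in range(12)
]
impressions = [{"request_id": i} for i in range(7, 17)]

intersection = filter_requests(requests, impressions)
samples = [row for req in intersection for row in flatten(req)]
print(len(intersection), len(samples))  # 5 10
```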
- The data management module 100 may receive data table attribute information entered by the user through the user interface. This information describes the number of columns in a data table and the data attribute of each column; for example, user ID is a discrete field, request time is a time field, and browsing time is a numeric field.
- The data management module 100 can receive a table-joining scheme between data tables entered by the user through the user interface. The scheme includes the join keys used to join different data tables, as well as the quantity relationship, timing relationship, and aggregation relationship of the same join key between the primary and secondary tables.
- The data management module 100 may maintain logical relationship information through the first database based on the data table attribute information and the table-joining scheme. The logical relationship information describes the relationships among different data tables and includes the data table attribute information and the table-joining scheme.
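- One possible in-memory shape for the logical relationship information is sketched below. All class and field names (`ColumnAttribute`, `JoinRule`, `LogicalRelationship`) are hypothetical; the disclosure only lists what the information contains, not how it is represented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnAttribute:
    name: str
    kind: str  # "discrete", "time", or "numeric", as in the examples in the text


@dataclass(frozen=True)
class JoinRule:
    primary_table: str
    secondary_table: str
    join_key: str
    cardinality: str    # quantity relationship, e.g. "1:1" or "1:N"
    time_ordered: bool  # timing relationship between primary and secondary rows
    aggregation: str    # aggregation relationship, e.g. "none", "count", "mean"


@dataclass(frozen=True)
class LogicalRelationship:
    table_attributes: dict  # table name -> list of ColumnAttribute
    join_rules: tuple

    def validate(self) -> bool:
        """Check that every join key exists in both tables it connects."""
        for rule in self.join_rules:
            for table in (rule.primary_table, rule.secondary_table):
                names = {col.name for col in self.table_attributes[table]}
                if rule.join_key not in names:
                    raise KeyError(f"{rule.join_key} not in table {table}")
        return True


info = LogicalRelationship(
    table_attributes={
        "request": [ColumnAttribute("user_id", "discrete"),
                    ColumnAttribute("request_time", "time")],
        "action": [ColumnAttribute("user_id", "discrete"),
                   ColumnAttribute("browse_seconds", "numeric")],
    },
    join_rules=(JoinRule("request", "action", "user_id", "1:N", True, "mean"),),
)
```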
- The model solution exploration module 200 is configured to explore a model solution, when a first preset condition is met, based on the data in the first database (for example, one or more of the logical relationship information, the data of the request data stream, the sample data, the data of the feedback data stream, the data of the business data stream, and the data of the display data stream).
- The first preset condition may include at least one of data amount, time, and manual trigger.
- For example, the first preset condition may be that the amount of data in the first database reaches a preset amount, or that the duration of data accumulation in the first database reaches a preset duration.
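- A preset condition of this kind can be checked with a small predicate. The function name and its threshold parameters are assumptions for illustration; the text names only the three trigger types.

```python
import time


def preset_condition_met(row_count, first_row_time, *, min_rows=None,
                         min_seconds=None, manual_trigger=False, now=None):
    """Return True when any configured trigger fires.

    min_rows and min_seconds are hypothetical thresholds; a threshold left
    as None is simply not checked.
    """
    now = time.time() if now is None else now
    if manual_trigger:
        return True
    if min_rows is not None and row_count >= min_rows:
        return True
    if min_seconds is not None and now - first_row_time >= min_seconds:
        return True
    return False


# Data-amount trigger: 100 accumulated rows against a threshold of 100.
print(preset_condition_met(100, first_row_time=0, min_rows=100, now=10))  # True
```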
- Setting the first preset condition enables the model solution exploration module 200 to update the model solution iteratively.
- The model solution includes the following solution sub-items: a feature engineering solution, a model algorithm, and model hyperparameters.
- The feature engineering solution is explored based on the logical relationship information and therefore has at least a table-joining function. Note that the table-joining scheme of the feature engineering solution may be the same as or different from the one entered by the user.
- The feature engineering solution may also have other functions, such as extracting features from data for use by the model algorithm or the model.
- The model algorithm may be a commonly used machine learning algorithm, such as a supervised learning algorithm, including but not limited to LR (Logistic Regression), GBDT (Gradient Boosting Decision Tree), and DeepNN (Deep Neural Network).
- Model hyperparameters are parameters set before machine learning begins to assist model training, such as the number of clusters in a clustering algorithm, the step size of gradient descent, the number of layers of a neural network, and the learning rate used to train a neural network.
- When exploring, the model solution exploration module 200 may generate at least two model solutions, for example based on the logical relationship information maintained through the first database, where any two model solutions differ in at least one solution sub-item. In some embodiments, the module trains a model with each of the at least two model solutions on the data in the first database to obtain the parameters of the model itself, such as the weights in a neural network, the support vectors in a support vector machine, or the coefficients in linear or logistic regression.
- The model solution exploration module 200 may evaluate the models trained with the at least two model solutions using a machine learning model evaluation metric, and then select among the at least two model solutions based on the evaluation results to obtain the explored model solution.
- The machine learning model evaluation metric is, for example, the AUC (Area Under Curve) value.
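- Selecting among candidate solutions by AUC can be sketched as below. The two "solutions" here are plain scoring functions standing in for trained models, and the toy rows and labels are made up; this illustrates the selection step only, not the exploration procedure of the disclosure.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def select_solution(candidates, rows, labels):
    """Score every candidate on the evaluation rows and return the best name and all AUCs."""
    results = {
        name: auc(labels, [predict(row) for row in rows])
        for name, predict in candidates.items()
    }
    best = max(results, key=results.get)
    return best, results


# Toy evaluation data (made up for illustration).
rows = [{"watch_ratio": 0.9, "hour": 23}, {"watch_ratio": 0.2, "hour": 9},
        {"watch_ratio": 0.7, "hour": 13}, {"watch_ratio": 0.1, "hour": 22}]
labels = [1, 0, 1, 0]
candidates = {
    "solution_a": lambda r: r["watch_ratio"],
    "solution_b": lambda r: r["hour"] / 24,
}
best, results = select_solution(candidates, rows, labels)
print(best, results)  # solution_a {'solution_a': 1.0, 'solution_b': 0.75}
```

- The pairwise AUC is quadratic in the number of samples, which is acceptable for a sketch; a production evaluator would normally use a rank-based computation.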
- The model online estimation service module 300 is configured to deploy the model solution explored by the model solution exploration module 200 online to provide the online model estimation service, where the service is performed based on the related data streams of the specified business scenario acquired online through the data service interface.
- In some embodiments, the model online estimation service module 300 deploys only the model solution online and does not deploy the offline model obtained during exploration by the model solution exploration module 200. This avoids the problem that, when an offline model is deployed online directly, the data from online feature computation is inconsistent with the data from offline feature computation, so that the offline model estimates poorly online.
- Because the model online estimation service module 300 deploys only the model solution and not the offline model, no estimation result is generated when the online model estimation service is provided. Instead, a default estimation result is sent to the specified business scenario, which ignores it on receipt. Accordingly, in FIG. 1 the model solution exploration module 200 points to the model online estimation service module 300 with a dashed arrow, indicating that the model solution does not itself provide estimates, although a default estimation result is still returned.
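- The default-result behaviour can be illustrated with a small service class. `EstimationService`, `DEFAULT_RESULT`, and `business_side_handle` are hypothetical names, and the shape of the default result is an assumption; the text states only that a default result is returned and then ignored.

```python
DEFAULT_RESULT = {"score": None, "is_default": True}  # placeholder shape, an assumption


class EstimationService:
    """Serves requests with a deployed solution and, optionally, a trained model."""

    def __init__(self, feature_fn, model=None):
        self.feature_fn = feature_fn  # feature engineering part of the deployed solution
        self.model = model            # None while only the solution is deployed

    def estimate(self, request):
        features = self.feature_fn(request)  # features are computed online either way
        if self.model is None:
            return features, dict(DEFAULT_RESULT)
        return features, {"score": self.model(features), "is_default": False}


def business_side_handle(result):
    """The business scenario ignores default results and uses real ones."""
    return None if result["is_default"] else result["score"]


service = EstimationService(feature_fn=lambda req: {"n_candidates": len(req["candidates"])})
features, result = service.estimate({"candidates": ["v1", "v2", "v3"]})
print(features, business_side_handle(result))  # {'n_candidates': 3} None
```

- Computing features even when no model is deployed matters because those features are what later become training samples.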
- In some embodiments, when the model online estimation service module 300 deploys the model solution online, it also deploys the offline model obtained during exploration by the model solution exploration module 200. The offline model is trained on the related data of the specified business scenario accumulated in the first database (that is, the offline database), and once deployed online it estimates on related data of the same business scenario. Therefore, although the data from online and offline feature computation may be inconsistent, the online and offline data still share the same source.
- The model online estimation service module 300 can store the related data streams of the specified business scenario acquired through the data service interface in a second database, which is an online database such as the real-time feature storage engine rtidb.
- rtidb is a distributed feature database for hard real-time AI scenarios, characterized by efficient computation, read-write separation, high concurrency, and high-performance queries. The second database may also be another online database.
- The model online estimation service module 300 uses the data in the second database and the received request data to perform online real-time feature computation based on the feature engineering solution in the deployed model solution, obtaining the feature data of the estimation sample.
- When the model online estimation service module 300 receives request data, it joins the data in the second database with the received request data and performs online real-time feature computation based on the feature engineering solution in the deployed model solution to obtain wide-table feature data; the feature data of the estimation sample is this wide-table feature data.
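- The online join-and-aggregate step can be sketched with an in-memory stand-in for the second database. `OnlineStore` is not the rtidb API, and the feature names and the one-hour window are illustrative assumptions.

```python
from collections import defaultdict


class OnlineStore:
    """In-memory stand-in for the second (online) database; not the rtidb API."""

    def __init__(self):
        self._actions = defaultdict(list)

    def append_action(self, user_id, action):
        self._actions[user_id].append(action)

    def recent_actions(self, user_id, before_ts, window_seconds):
        """Return the user's actions inside [before_ts - window_seconds, before_ts)."""
        return [a for a in self._actions[user_id]
                if before_ts - window_seconds <= a["ts"] < before_ts]


def wide_table_row(store, request, window_seconds=3600):
    """Join the request with the user's recent history and aggregate it into features."""
    history = store.recent_actions(request["user_id"], request["ts"], window_seconds)
    clicks = sum(a["clicked"] for a in history)
    return {
        "user_id": request["user_id"],
        "item_id": request["item_id"],
        "clicks_last_hour": clicks,
        "ctr_last_hour": clicks / len(history) if history else 0.0,
    }


store = OnlineStore()
store.append_action("u1", {"ts": 100, "clicked": 1})
store.append_action("u1", {"ts": 200, "clicked": 0})
store.append_action("u1", {"ts": 5000, "clicked": 1})  # later than the request, so excluded
row = wide_table_row(store, {"user_id": "u1", "item_id": "v9", "ts": 3000})
print(row["clicks_last_hour"], row["ctr_last_hour"])  # 1 0.5
```

- Excluding actions that happen at or after the request time keeps the features free of information that would not be available when the request is served.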
- The model online estimation service module 300 can obtain the feature data (or wide-table feature data) of the estimation sample based on the deployed model solution, and join the feature data with the feedback data to generate sample data with features and feedback.
- The sample data may also include other data, such as timestamp data; the feedback data comes from the feedback data stream.
- Before joining the feature data with the feedback data, the model online estimation service module 300 joins the feature data with the display data, which comes from the display data stream, to obtain feature data with display data. It then joins this result with the feedback data to generate sample data containing display data, feature data, and feedback data.
- The model online estimation service module 300 returns the sample data with features and feedback to the first database for model self-learning. The self-learned model can be deployed online, which ensures that the data and feature engineering solution used in model self-learning are consistent with those used in the online model estimation service, so that the self-learning effect and the estimation effect of the model are consistent.
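- The two-stage join (features with display data, then with feedback data) can be sketched as follows. The join key `(request_id, item_id)`, the field names, and the choice to label missing feedback as 0 are all assumptions of this sketch.

```python
def splice(feature_rows, impressions, actions, key=("request_id", "item_id")):
    """Attach display and feedback fields to the feature rows that were actually displayed."""
    def k(row):
        return tuple(row[name] for name in key)

    shown = {k(imp): imp for imp in impressions}
    feedback = {k(act): act for act in actions}
    samples = []
    for row in feature_rows:
        if k(row) not in shown:
            continue  # never displayed, so no feedback can exist for it
        sample = dict(row)
        sample["shown_ts"] = shown[k(row)]["ts"]
        # Missing feedback is treated as a negative label; this is an assumption.
        sample["label"] = feedback.get(k(row), {}).get("clicked", 0)
        samples.append(sample)
    return samples


feature_rows = [
    {"request_id": 1, "item_id": "a", "ctr": 0.5},
    {"request_id": 1, "item_id": "b", "ctr": 0.1},
    {"request_id": 1, "item_id": "c", "ctr": 0.3},  # candidate that was not displayed
]
impressions = [{"request_id": 1, "item_id": "a", "ts": 10},
               {"request_id": 1, "item_id": "b", "ts": 11}]
actions = [{"request_id": 1, "item_id": "a", "clicked": 1}]

samples = splice(feature_rows, impressions, actions)
print([(s["item_id"], s["label"]) for s in samples])  # [('a', 1), ('b', 0)]
```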
- The data management module 100, the model solution exploration module 200, and the model online estimation service module 300 form a closed machine learning loop. The data used in model solution exploration is the data in the first database, which is an offline database, so it can be understood as offline data, whereas the data used by the online model estimation service is online data. Both are acquired from the specified business scenario through the data service interface, which ensures that the data used in model solution exploration (offline data for short) and the data used by the online model estimation service (online data for short) come from the same source.
- FIG. 2 shows another apparatus for applying machine learning according to an embodiment of the present disclosure.
- In addition to the data management module 100, the model solution exploration module 200, and the model online estimation service module 300 shown in FIG. 1, this apparatus includes a model self-learning module 400 and other components required for applying machine learning, such as an offline database and an online database.
- The model self-learning module 400 is configured to perform model self-learning based on the sample data with features and feedback in the first database when a second preset condition is met.
- The second preset condition may include at least one of data amount, time, and manual trigger.
- For example, the second preset condition may be that the amount of data in the first database reaches a preset amount, or that the duration of data accumulation in the first database reaches a preset duration.
- Setting the second preset condition enables the model self-learning module 400 to update the model iteratively.
- The model self-learning module 400 trains on the sample data with features and feedback, using the model algorithm and model hyperparameters in the model solution, to obtain a machine learning model.
- In some embodiments, when the model online estimation service module 300 deploys the model solution online, an initial model is also deployed online; the initial model is generated while the model solution exploration module 200 explores the model solution. In that case, the model self-learning module 400 trains the initial model using the model algorithm and model hyperparameters in the model solution, updating the initial model's own parameter values to obtain a machine learning model.
- In other embodiments, the model self-learning module 400 trains a random model using the model algorithm and model hyperparameters in the model solution to obtain a machine learning model, where the random model is generated based on the model algorithm and its parameters take random values.
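- The difference between starting from an initial model and starting from a random model can be illustrated with a tiny logistic regression trained by stochastic gradient descent. This is a stand-in chosen for brevity, not the training procedure of the disclosure; the data, `learning_rate`, and `epochs` values are made up.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def random_model(n_features, seed=0):
    """A model built from the algorithm alone, with randomly initialised parameters."""
    rng = random.Random(seed)
    return [rng.uniform(-0.01, 0.01) for _ in range(n_features + 1)]  # weights, then bias


def predict(model, x):
    return sigmoid(sum(w * v for w, v in zip(model, x)) + model[-1])


def self_learn(model, samples, labels, learning_rate=0.5, epochs=200):
    """Update a copy of the model parameters with plain SGD on logistic loss.

    learning_rate and epochs stand in for hyperparameters fixed by the model solution.
    """
    weights = list(model)  # the starting model itself is left untouched
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = predict(weights, x) - y
            for i, v in enumerate(x):
                weights[i] -= learning_rate * error * v
            weights[-1] -= learning_rate * error
    return weights


samples = [[0.9], [0.8], [0.2], [0.1]]  # one feature per sample
labels = [1, 1, 0, 0]

initial_model = [0.5, -0.2]              # e.g. produced during solution exploration
from_initial = self_learn(initial_model, samples, labels)
from_random = self_learn(random_model(1), samples, labels)
```

- Both starting points use the same algorithm and hyperparameters; only the initial parameter values differ, which mirrors the two embodiments described above.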
- The model online estimation service module 300 can deploy the model obtained by the model self-learning module 400 online to provide the online model estimation service.
- After the model online estimation service module 300 deploys the model obtained by the model self-learning module 400 online, when request data is received it generates an estimation sample with features from the data in the second database and the received request data, and obtains the estimation result of the estimation sample from the deployed model.
- Unlike a deployed model solution, a deployed model can produce an estimation result for the estimation sample.
- The model online estimation service module 300 may send the estimation result to the specified business scenario for use or reference.
- The model online estimation service module 300 may replace the machine learning model already deployed online with the model obtained by the model self-learning module 400; or it may deploy the model obtained by the model self-learning module 400 online so that it provides the online model estimation service together with the machine learning model already deployed.
- Similarly, the model online estimation service module 300 may replace the model solution already deployed online with the model solution newly obtained by the model solution exploration module 200; or it may deploy the newly obtained model solution online without taking the already deployed model solution offline.
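- The two deployment options (replace, or serve side by side) can be sketched with a toy registry. `Deployment` and its methods are hypothetical names, and the models are constant functions used only to make the behaviour visible.

```python
class Deployment:
    """Tracks which models are serving; a toy registry, not a real serving system."""

    def __init__(self):
        self.serving = {}

    def deploy(self, name, model, replace=None):
        """Deploy a model; if `replace` names a serving model, take that one offline."""
        if replace is not None:
            self.serving.pop(replace)  # raises KeyError if that model is not serving
        self.serving[name] = model

    def estimate(self, features):
        """Return one score per serving model, keyed by model name."""
        return {name: model(features) for name, model in self.serving.items()}


deployment = Deployment()
deployment.deploy("model_v1", lambda f: 0.1)
deployment.deploy("model_v2", lambda f: 0.9)                      # serves alongside v1
side_by_side = deployment.estimate({})
deployment.deploy("model_v3", lambda f: 0.5, replace="model_v1")  # v1 goes offline
print(sorted(deployment.serving))  # ['model_v2', 'model_v3']
```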
- The data management module 100, the model self-learning module 400, and the model online estimation service module 300 form a closed machine learning loop. The sample data with features and feedback that the model self-learning module 400 uses for training is generated online, after the model solution is deployed, from the data in the second database (that is, the online database) and the received request data; and after the model online estimation service module 300 deploys the model trained by the model self-learning module 400, it also provides the estimation service based on the data in the second database. This ensures that the data and feature engineering solution used in model self-learning are consistent with those used in the online model estimation service, so that the self-learning effect and the estimation effect of the model are consistent.
- The division of modules in the apparatus for applying machine learning is only a division by logical function; other divisions are possible in an actual implementation. For example, at least two of the data management module 100, the model solution exploration module 200, the model online estimation service module 300, and the model self-learning module 400 can be implemented as one module, and any of these modules can also be divided into multiple sub-modules.
- Each module or sub-module can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether a function is executed by hardware or software depends on the specific application and the design constraints of the technical solution. Those skilled in the art can use different methods for each specific application to implement the described functions.
- FIG. 3 is an exemplary flow logic block diagram of the apparatus for applying machine learning shown in FIG. 2.
- The user can enter the data stream information of the specified business scenario through the user interface.
- The user can also enter the data table attribute information and the table-joining scheme through the user interface during model solution exploration 303.
- Data management 302, model self-learning 305, and model online estimation service 304 form a small closed loop;
- Data management 302, model solution exploration 303, and model online estimation service 304 form a large closed loop.
- The small closed loop ensures that the data and feature engineering solution used in model self-learning 305 are consistent with those used in the model online estimation service 304, so that the self-learning effect and the estimation effect of the model are consistent.
- The large closed loop ensures that the data used by model solution exploration 303 (offline data for short) and the data used by the model online estimation service 304 (online data for short) come from the same source.
- FIG. 4 is an exemplary data flow diagram of the apparatus for applying machine learning shown in FIG. 2.
- The English labels in FIG. 4 are explained as follows:
- GW is the gateway of the specified business scenario;
- retain-mixer implements the data management module 100's function of accumulating data in the related data streams of the specified business scenario into the first database;
- trial1-mixer and trial2-mixer can be understood as two parallel model online estimation service modules 300;
- HDFS is the first database;
- rtidb1 and rtidb2 are two second databases;
- self-learn1 and self-learn2 are two model self-learning modules 400;
- fedb1 and fedb2 can be understood as the feature engineering solutions in the model solutions.
- the retain-mixer obtains the request, impression, action, and BOes from the specified business scenario based on the data service interface, and adds eventTime or ingestionTime to the request, impression, and action respectively, so that the data management module 100 can maintain the data sequence relationship information in the logical relationship information.
- eventTime belongs to the data management function of the data management module 100.
- the retain-mixer accumulates the request in HDFS, which is convenient for subsequent operation and maintenance.
- the retain-mixer adds ingestionTime to impression, action, and BOes to obtain impression’, action’, and BOes’, and accumulates impression’, action’, and BOes’ into HDFS.
- the addition of ingestionTime belongs to the data management function of the data management module 100.
- the retain-mixer processes the request and the impression through the filter operation to obtain the intersection data. For example, if impression has 10 records, request has 12 records, and request and impression have 5 records in common, the filter operation keeps these 5 common records as the intersection data and filters out the rest. The retain-mixer then processes the intersection data (the 5 common records) through the flatten operation to obtain flatten_req (the sample data).
- the retain-mixer accumulates flatten_req into HDFS.
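The filter and flatten operations described above can be sketched as follows. This is a minimal illustration, not the retain-mixer's actual implementation: the record layout (`request_id`, `user_id`, a nested `items` list) and the function names are assumptions introduced here, since the source does not specify the schema of request or impression.

```python
from typing import Any


def filter_requests(requests: list[dict[str, Any]],
                    impressions: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep only the requested items that also appear in the impressions."""
    shown = {(imp["request_id"], imp["item_id"]) for imp in impressions}
    kept = []
    for req in requests:
        items = [item for item in req["items"]
                 if (req["request_id"], item) in shown]
        if items:  # drop requests with nothing displayed
            kept.append({**req, "items": items})
    return kept


def flatten(requests: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Turn each nested request into one flat row per remaining item."""
    return [
        {"request_id": req["request_id"], "user_id": req["user_id"], "item_id": item}
        for req in requests
        for item in req["items"]
    ]


# Hypothetical data: r1 asked for three items, two were displayed; r2 was never shown.
requests = [
    {"request_id": "r1", "user_id": "u1", "items": ["a", "b", "c"]},
    {"request_id": "r2", "user_id": "u2", "items": ["d"]},
]
impressions = [
    {"request_id": "r1", "item_id": "a"},
    {"request_id": "r1", "item_id": "c"},
]

flatten_req = flatten(filter_requests(requests, impressions))
```

Here `flatten_req` holds one row per displayed item, which is the shape a training sample would typically need before features and labels are attached.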
- AutoML can explore model schemes based on flatten_req, impression’, action’ and BOes’ in HDFS.
- impression', action', and BOes' are accumulated in rtidb1 and rtidb2, and user historical data, such as user behavior data, can also be synchronized to rtidb1 and rtidb2.
- each time request data is received, the accumulated data is read from rtidb1 and rtidb2 through fedb1 and fedb2 for feature engineering, yielding enrich1 and enrich2.
- in trial1-mixer and trial2-mixer, enrich1 and enrich2 are joined (spliced) and flattened with impression and action, respectively, to obtain viewlog1 and viewlog2.
- trial1-mixer and trial2-mixer accumulate viewlog1 and viewlog2 into HDFS.
- Self-learn1 and self-learn2 perform model self-learning based on viewlog1 and viewlog2, respectively, to obtain a machine learning model.
- trial1-mixer and trial2-mixer deploy the machine learning models obtained from self-learn1 and self-learn2 respectively, and provide online model estimation services.
- the device for applying machine learning disclosed in this embodiment may not rely on importing historical offline data from other databases, and may collect data from scratch.
- FIG. 5 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
- the electronic device includes: at least one processor 501, at least one memory 502, and at least one communication interface 503.
- the various components in the electronic device are coupled together through the bus system 504.
- the communication interface 503 is configured for information transmission with external devices. Understandably, the bus system 504 is configured to implement connection and communication between these components.
- the bus system 504 also includes a power bus, a control bus, and a status signal bus. However, for the sake of clear description, various buses are marked as the bus system 504 in FIG. 5.
- the memory 502 in this embodiment may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory.
- the memory 502 stores the following elements, executable units or data structures, or a subset of them, or an extended set of them: operating systems and applications.
- the operating system includes various system programs, such as a framework layer, a core library layer, a driver layer, etc., which are configured to implement various basic services and process hardware-based tasks.
- the application programs include various application programs, such as a media player (Media Player) and a browser (Browser), and are configured to implement various application services.
- a program that implements the method of applying machine learning provided by the embodiments of the present disclosure may be included in an application program.
- the processor 501 calls a program or instruction stored in the memory 502, specifically a program or instruction stored in an application program, and is thereby configured to execute the method of applying machine learning provided by the embodiments of the present disclosure.
- the steps of the various embodiments of the method are described in detail below.
- the method for applying machine learning may be configured in the processor 501 or implemented by the processor 501.
- the processor 501 may be an integrated circuit chip with signal processing capability. In the implementation process, the steps of the foregoing method can be completed by an integrated logic circuit of hardware in the processor 501 or instructions in the form of software.
- the aforementioned processor 501 may be a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
- the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
- the steps of the method for applying machine learning provided by the embodiments of the present disclosure may be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software units in the decoding processor.
- the software unit may be located in a storage medium that is well established in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers.
- the storage medium is located in the memory 502, and the processor 501 reads the information in the memory 502 and completes the steps of the method in combination with its hardware.
- Fig. 6 is an exemplary flowchart of a method for applying machine learning provided by an embodiment of the disclosure.
- the method is executed by an electronic device.
- an electronic device is used as an execution subject to illustrate the process of the method of applying machine learning.
- the electronic device may provide a user interface, and based on the user interface, receive user-input information about the related data streams of a specified business scenario, where the related data streams include, but are not limited to: a request data stream, a display data stream, a feedback data stream, and a business data stream.
- the information about the related data flow of the specified business scenario can be understood as the fields included in the related data.
- the electronic device creates a data service interface based on the information about the related data flow of the specified business scenario, for example, request data flow, display data flow, feedback data flow, and business data flow correspond to different data service interfaces.
- the electronic device may receive data table attribute information input by the user based on the user interface, where the data table attribute information describes the number of columns included in the data table and the data attributes of each column.
- the electronic device may also receive a splicing scheme between data tables input by a user through the user interface, where the splicing scheme includes the splicing keys used to splice different data tables, together with the quantitative relationship, the time-sequence relationship, and the aggregation relationship of the same splicing keys between the primary and secondary tables.
- the electronic device may maintain logical relationship information through the first database based on the data table attribute information and the splicing scheme, where the logical relationship information is information describing the relationships between different data tables.
- the logical relationship information includes: the data table attribute information and the splicing scheme.
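One way to picture the logical relationship information is as a small metadata structure holding table attributes and join rules. The sketch below is an assumption made for illustration: the class names, field names, and the example values such as `"1:N"` are hypothetical and do not come from the source.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TableAttributes:
    """Data table attribute information: the columns and each column's data attribute."""
    name: str
    columns: dict  # column name -> data attribute, e.g. "string"


@dataclass(frozen=True)
class JoinRule:
    """One entry of the splicing scheme between a primary and a secondary table."""
    primary: str
    secondary: str
    key: str          # splicing key present in both tables
    quantity: str     # quantitative relationship, e.g. "1:N" (assumed notation)
    time_order: str   # time-sequence relationship (assumed notation)
    aggregation: str  # aggregation relationship, e.g. "count"


@dataclass
class LogicalRelationship:
    tables: dict = field(default_factory=dict)
    joins: list = field(default_factory=list)

    def add_table(self, table: TableAttributes) -> None:
        self.tables[table.name] = table

    def add_join(self, rule: JoinRule) -> None:
        # A join rule is only accepted if both tables exist and share the key.
        for name in (rule.primary, rule.secondary):
            if name not in self.tables:
                raise KeyError(f"unknown table: {name}")
            if rule.key not in self.tables[name].columns:
                raise ValueError(f"table {name} has no column {rule.key}")
        self.joins.append(rule)


relations = LogicalRelationship()
relations.add_table(TableAttributes("request", {"user_id": "string", "ts": "timestamp"}))
relations.add_table(TableAttributes("action", {"user_id": "string", "ts": "timestamp"}))
relations.add_join(
    JoinRule("request", "action", "user_id", "1:N", "secondary_before_primary", "count")
)
```

Validating the splicing key when the rule is registered keeps an inconsistent scheme from reaching the later exploration step.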
- the electronic device online obtains the relevant data stream of the specified business scenario based on the data service interface.
- the electronic device may obtain the display data stream of the specified business scenario online based on the data service interface, where the data of the display data stream is the data displayed by the specified business scenario based on the request data stream.
- the electronic device accumulates the data in the related data stream into a first database.
- the first database is an offline database.
- the electronic device processes the data of the request data stream to obtain sample data, and then accumulates the data of the request data stream, the sample data, the data of the feedback data stream, and the data of the business data stream into the first database.
- the processing method includes, for example, but not limited to: processing using a filter and flattening processing.
- the electronic device uses a filter to filter the data of the request data stream based on the data of the display data stream to obtain the intersection data, and then flattens the intersection data to obtain the sample data.
- the electronic device accumulates the display data and the sample data obtained by the filtering process into the first database.
- when the first preset condition is met, the electronic device explores the model scheme based on the data in the first database (for example, the logical relationship information, the data of the request data stream, the sample data, and the data of the feedback data stream).
- the model scheme includes the following scheme sub-items: feature engineering scheme, model algorithm and model hyperparameters.
- the feature engineering scheme is explored based on the logical relationship information. Therefore, the feature engineering scheme has at least a table-joining function. It should be noted that the table-joining scheme of the feature engineering scheme can be the same as, or different from, the table-joining scheme input by the user.
- the feature engineering solution may also have other functions, such as extracting features from data for use by model algorithms or models.
- the first preset condition may include at least one of data amount, time, and manual trigger.
- the first preset condition may be that the amount of data in the first database reaches a preset data amount, or that the time length of data accumulation in the first database reaches a preset time length.
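A preset condition of this kind reduces to an any-of check over data amount, elapsed time, and a manual trigger. The sketch below assumes that reading; the class name, field names, and thresholds are illustrative and not taken from the source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PresetCondition:
    min_rows: Optional[int] = None       # preset data amount, if configured
    min_seconds: Optional[float] = None  # preset accumulation time, if configured

    def is_met(self, rows: int, started_at: float, now: float,
               manual_trigger: bool = False) -> bool:
        """Return True as soon as any configured criterion is satisfied."""
        if manual_trigger:
            return True
        if self.min_rows is not None and rows >= self.min_rows:
            return True
        if self.min_seconds is not None and now - started_at >= self.min_seconds:
            return True
        return False


# Hypothetical thresholds: 10,000 rows or one day of accumulation.
first_condition = PresetCondition(min_rows=10_000, min_seconds=24 * 3600)
```

Passing `now` explicitly rather than reading the clock inside the method keeps the check deterministic and easy to test.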
- the electronic device generates at least two model solutions when the first preset condition is met.
- at least two model schemes may be generated based on the logical relationship information maintained by the first database, where at least one scheme sub-item differs between different model schemes; the at least two model schemes are then each used for model training based on the data in the first database; the models trained with the at least two model schemes are then evaluated based on a machine learning model evaluation index; finally, a selection is made among the at least two model schemes based on the evaluation results to obtain the explored model scheme.
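The generate, train, evaluate, and select steps can be sketched with a toy example. Everything concrete here is an assumption made for illustration: the two feature functions, the single one-dimensional ridge fit standing in for the model algorithm, mean squared error as the evaluation index, and the synthetic data. The source does not say which algorithms or metrics the exploration uses.

```python
from dataclasses import dataclass

# Hypothetical feature engineering options (assumed names).
FEATURES = {
    "raw": lambda x: x,
    "squared": lambda x: x * x,
}


@dataclass(frozen=True)
class ModelScheme:
    feature_engineering: str  # key into FEATURES
    algorithm: str            # only "ridge_1d" is implemented in this sketch
    hyperparameters: tuple    # e.g. (("l2", 0.0),)


def fit_ridge_1d(xs, ys, l2):
    """Closed-form fit of y ~ w * x with an L2 penalty on w."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + l2)


def mean_squared_error(w, pairs):
    return sum((w * x - y) ** 2 for x, y in pairs) / len(pairs)


def explore(schemes, train, valid):
    """Train one model per scheme, score it, and return the best scheme."""
    if len(set(schemes)) < 2:
        raise ValueError("exploration needs at least two distinct model schemes")
    scores = {}
    for scheme in schemes:
        feature = FEATURES[scheme.feature_engineering]
        l2 = dict(scheme.hyperparameters)["l2"]
        w = fit_ridge_1d([feature(x) for x, _ in train], [y for _, y in train], l2)
        scores[scheme] = mean_squared_error(w, [(feature(x), y) for x, y in valid])
    best = min(scores, key=scores.get)  # lower error is better
    return best, scores


# Synthetic data following y = 2 * x^2.
train = [(x, 2.0 * x * x) for x in (1.0, 2.0, 3.0, 4.0)]
valid = [(x, 2.0 * x * x) for x in (5.0, 6.0)]
schemes = [
    ModelScheme("raw", "ridge_1d", (("l2", 0.0),)),
    ModelScheme("squared", "ridge_1d", (("l2", 0.0),)),
    ModelScheme("squared", "ridge_1d", (("l2", 10.0),)),
]
best_scheme, scores = explore(schemes, train, valid)
```

Because `ModelScheme` is frozen and hashable, the distinctness check directly enforces that any two candidates differ in at least one sub-item.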
- the electronic device deploys the explored model scheme online to provide the online model estimation service, where the online model estimation service is performed based on the relevant data stream of the specified business scenario obtained online through the data service interface.
- the electronic device deploys only the model scheme online, and does not deploy the offline model obtained during model scheme exploration; this avoids the problem that an offline model deployed directly online has a poor estimation effect because the data obtained by online feature calculation and by offline feature calculation are inconsistent.
- since only the model scheme is deployed online and no offline model is deployed online, no estimation result is generated while the online model estimation service is provided; when request data is received, a default estimation result is sent to the specified business scenario, and the specified business scenario ignores the default estimation result after receiving it.
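The default-result behaviour can be sketched as a service that always runs feature engineering but only scores when a model has been deployed. The class name, the `is_default` flag, and the dictionary layout of the result are assumptions introduced for this illustration.

```python
from typing import Any, Callable, Optional


class EstimationService:
    """Online service holding a deployed model scheme and, optionally, a model."""

    def __init__(self, feature_fn: Callable[[dict], Any]) -> None:
        self.feature_fn = feature_fn  # feature engineering from the model scheme
        self.model: Optional[Callable[[Any], float]] = None
        self.feature_log: list = []   # features kept for later sample generation

    def deploy_model(self, model: Callable[[Any], float]) -> None:
        self.model = model

    def estimate(self, request: dict) -> dict:
        features = self.feature_fn(request)  # runs whether or not a model exists
        self.feature_log.append(features)
        if self.model is None:
            # No model online yet: return a default result the caller can ignore.
            return {"score": None, "is_default": True}
        return {"score": self.model(features), "is_default": False}


# Hypothetical feature function and model.
service = EstimationService(feature_fn=lambda req: req["price"] * 2)
before = service.estimate({"price": 3})
service.deploy_model(lambda feature: feature + 1)
after = service.estimate({"price": 3})
```

Marking the default result explicitly lets the business scenario discard it without guessing from the score value.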
- alternatively, when the electronic device deploys the model scheme online, it also deploys the offline model obtained during model scheme exploration. The offline model is trained on the relevant data of the specified business scenario accumulated in the first database (i.e., the offline database), and after being deployed online it provides estimation services on the relevant data of the specified business scenario. Therefore, although the data obtained by online and offline feature calculation may be inconsistent, the online and offline data still share the same origin.
- the data of the related data stream is stored in a second database, where the second database is an online database.
- the electronic device uses the data in the second database and the received request data to perform online real-time feature calculation based on the feature engineering solution in the deployed model solution to obtain the feature data of the estimated sample.
- after the electronic device deploys the explored model scheme online, upon receiving request data it joins the data in the second database with the received request data, based on the feature engineering scheme in the deployed model scheme, and performs online real-time feature calculation to obtain wide-table feature data; the feature data of the estimated sample is this wide-table feature data.
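Joining a request with accumulated history and aggregating it into one wide row might look like the sketch below. The in-memory dictionary standing in for the second database, the field names, and the time-window aggregation are all assumptions; the source does not describe the query interface of the online database.

```python
def wide_row(request: dict, history: dict, window_seconds: int) -> dict:
    """Join one request with a user's recent history and aggregate it into a wide row."""
    records = history.get(request["user_id"], [])
    # Only records inside the window and strictly before the request are used,
    # so the online features never look at data from the future.
    recent = [
        rec for rec in records
        if request["ts"] - window_seconds <= rec["ts"] < request["ts"]
    ]
    clicks = sum(1 for rec in recent if rec["action"] == "click")
    return {**request, "hist_events": len(recent), "hist_clicks": clicks}


# Hypothetical online store keyed by user_id.
history = {
    "u1": [
        {"ts": 100, "action": "click"},
        {"ts": 150, "action": "view"},
        {"ts": 400, "action": "click"},
    ]
}
request = {"request_id": "r9", "user_id": "u1", "ts": 200}
row = wide_row(request, history, window_seconds=120)
```

Restricting the join to records earlier than the request timestamp is the property that keeps features computed at serving time reproducible when the same row is later used for training.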
- the electronic device obtains the feature data (or wide table feature data) of the estimated sample based on the model solution deployed online, and splices the feature data and the feedback data to generate sample data with features and feedback.
- the sample data may also include other data, such as timestamp data; the feedback data comes from the feedback data stream.
- before splicing the feature data and the feedback data, the electronic device may splice the feature data and the display data to obtain feature data with display data, where the display data comes from the display data stream; the feature data with display data is then spliced with the feedback data to generate sample data with display data, feature data, and feedback data.
- the electronic device returns (reflows) the sample data with features and feedback to the first database and, when a second preset condition is met, performs model self-learning based on the sample data with features and feedback in the first database.
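Splicing features with display and feedback data, then returning the result to the offline store, can be sketched as below. The key `(request_id, item_id)`, the field names, and the rule that a displayed item without feedback becomes a negative sample are assumptions for illustration only.

```python
def splice_samples(features: list, impressions: list, actions: list) -> list:
    """Attach display and feedback information to online feature rows."""
    shown = {(imp["request_id"], imp["item_id"]) for imp in impressions}
    acted = {(act["request_id"], act["item_id"]) for act in actions}
    samples = []
    for row in features:
        key = (row["request_id"], row["item_id"])
        if key not in shown:
            continue  # never displayed, so no feedback could have been produced
        # Assumption: displayed but no feedback is treated as a negative label.
        samples.append({**row, "displayed": True, "label": 1 if key in acted else 0})
    return samples


# Hypothetical feature rows produced online for one request.
features = [
    {"request_id": "r1", "item_id": "a", "hist_clicks": 1},
    {"request_id": "r1", "item_id": "b", "hist_clicks": 0},
    {"request_id": "r1", "item_id": "c", "hist_clicks": 3},
]
impressions = [{"request_id": "r1", "item_id": "a"}, {"request_id": "r1", "item_id": "b"}]
actions = [{"request_id": "r1", "item_id": "a"}]

offline_store = []  # stands in for the first (offline) database
offline_store.extend(splice_samples(features, impressions, actions))
```

The rows written back carry exactly the feature values that were computed at serving time, which is what makes later self-learning consistent with online estimation.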
- the second preset condition may include at least one of data amount, time, and manual trigger.
- the second preset condition may be that the amount of data in the first database reaches a preset data amount, or that the time length of data accumulation in the first database reaches a preset time length.
- when the second preset condition is met, the electronic device performs training through the model algorithm and the model hyperparameters in the model scheme, based on the sample data with features and feedback, to obtain a machine learning model.
- if the initial model is also deployed online when the electronic device deploys the model scheme online, where the initial model is an offline model generated in the process of exploring the model scheme, the electronic device trains the initial model through the model algorithm and the model hyperparameters in the model scheme, updating the parameter values of the initial model itself to obtain the machine learning model.
- if the initial model is not deployed online when the model scheme is deployed online, the electronic device trains a random model through the model algorithm and the model hyperparameters in the model scheme to obtain a machine learning model, where the random model is a model generated based on the model algorithm whose own parameters take random values.
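The two starting points for self-learning, an initial model from exploration or a randomly initialised one, can be illustrated with a small gradient-descent sketch. The linear model, the hyperparameter names, and the data are assumptions chosen for brevity; the source does not prescribe a model algorithm.

```python
import random


def self_learn(samples, hyperparameters, initial=None, seed=0):
    """Fit y ~ w * x + b by batch gradient descent on mean squared error.

    If `initial` is given, training continues from those parameters (warm start);
    otherwise the parameters start from random values.
    """
    rng = random.Random(seed)
    if initial is None:
        w, b = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
    else:
        w, b = initial
    rate = hyperparameters["learning_rate"]
    n = len(samples)
    for _ in range(hyperparameters["epochs"]):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in samples) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in samples) / n
        w -= rate * grad_w
        b -= rate * grad_b
    return w, b


# Hypothetical reflowed samples following y = 2x + 1.
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
hyperparameters = {"learning_rate": 0.05, "epochs": 2000}

from_initial = self_learn(samples, hyperparameters, initial=(1.5, 0.5))
from_random = self_learn(samples, hyperparameters)
```

In both cases the model algorithm and hyperparameters come from the deployed scheme; only the starting parameter values differ.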
- the electronic device deploys the machine learning model online to provide online model estimation services.
- after the electronic device deploys the machine learning model online, when request data is received, it generates an estimated sample with features based on the data in the second database and the received request data, and obtains the estimation result of the estimated sample through the model deployed online.
- unlike a deployed model scheme on its own, the model deployed online can produce the estimation result of the estimated sample.
- the electronic device may send the estimation result to the specified business scenario for use or reference in the business scenario.
- the electronic device replaces the machine learning model already deployed online with the model obtained by model self-learning; or it deploys the model obtained by model self-learning online so that it provides the online model estimation service together with the machine learning model already deployed online.
- the electronic device replaces the model scheme already deployed online with the explored model scheme; or it deploys the explored model scheme online without taking the already deployed model scheme offline.
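The two deployment modes, replacing what is online or running alongside it, can be sketched with a small registry. The class and method names are hypothetical, and returning one result per deployed model is just one possible way to let several models serve together.

```python
from typing import Callable, Optional


class OnlineDeployment:
    """Registry of models currently serving online estimation."""

    def __init__(self) -> None:
        self.models: dict = {}

    def deploy(self, name: str, model: Callable[[float], float],
               replace: Optional[str] = None) -> None:
        if replace is not None:
            if replace not in self.models:
                raise KeyError(f"nothing deployed under {replace}")
            del self.models[replace]  # take the old model offline
        self.models[name] = model     # without `replace`, both keep serving

    def estimate(self, features: float) -> dict:
        return {name: model(features) for name, model in self.models.items()}


deployment = OnlineDeployment()
deployment.deploy("v1", lambda x: x + 1)
deployment.deploy("v2", lambda x: x * 2)                 # v1 and v2 serve together
side_by_side = deployment.estimate(3)
deployment.deploy("v3", lambda x: x * 10, replace="v1")  # v3 takes over from v1
after_replace = deployment.estimate(3)
```

Running models side by side, as with the two trial mixers in Fig. 4, allows their results to be compared before one is retired.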
- the data used in the exploration of the model scheme is the data in the first database, and the first database is an offline database, so the data used in the exploration of the model scheme can be understood as offline data. The data used by the online model estimation service is online data, and both the offline data and the online data are obtained from the specified business scenario through the data service interface. Therefore, the data used for model scheme exploration (offline data for short) and the data used by the online model estimation service (online data for short) are guaranteed to be of the same origin, realizing the homology of offline and online data.
- the sample data with features and feedback used for model self-learning is generated online, after the model scheme is deployed online, based on the data in the second database (that is, the online database) and the received request data; and after the model obtained by model self-learning is deployed online, it likewise provides the estimation service based on the data in the second database. This ensures that the data and feature engineering scheme used in model self-learning are consistent with the data and feature engineering scheme used by the online model estimation service, respectively, so that the model self-learning effect is consistent with the model prediction effect.
- the embodiments of the present disclosure also provide a computer-readable storage medium that stores a program or instructions causing a computer to execute the steps of each embodiment of the method for applying machine learning; to avoid repetition, the details are not described again here.
- the embodiments of the present disclosure also provide a computer program product, which includes computer program instructions.
- when the computer program instructions are run on a computer device, they can implement the method steps of the various embodiments of the present disclosure; for example, when run by a processor, they cause the processor to execute the method steps of the various embodiments of the present disclosure.
- the computer program product may use any combination of one or more programming languages to write program codes for performing the operations of the embodiments of the present disclosure.
- the programming languages include object-oriented programming languages, such as Java and C++, and also conventional procedural programming languages, such as the "C" language or similar programming languages.
- the program code can be executed entirely on the user's computing device, partly on the user's device, as an independent software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server.
- the business scenario is connected directly and its related data is accumulated; the model scheme is then explored to obtain the model scheme and the offline model, which ensures that the data used for offline model scheme exploration and the data used by the online model estimation service are of the same origin, realizing the homology of offline and online data.
- the estimated effect of the offline model deployed online is poor.
- the offline model is deployed online.
- after the model scheme is deployed online, it can receive estimation requests (that is, the data of the request data stream) to obtain sample data with features and feedback; this sample data is then used for model self-learning, and the self-learned model can be deployed online. This ensures that the data and feature engineering schemes used in model self-learning are consistent with the data and feature engineering schemes used in the online model estimation service, so that the model self-learning effect is consistent with the model prediction effect.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Artificial Intelligence (AREA)
- Databases & Information Systems (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Claims (43)
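- A method for applying machine learning, the method comprising: obtaining, online and based on a data service interface, a relevant data stream of a specified business scenario; accumulating data in the relevant data stream into a first database; when a first preset condition is met, exploring a model scheme based on the data in the first database; and deploying the explored model scheme online to provide an online model estimation service, wherein the online model estimation service is performed based on the relevant data stream of the specified business scenario obtained online through the data service interface.
- The method according to claim 1, wherein the model scheme comprises the following scheme sub-items: a feature engineering scheme, a model algorithm, and hyperparameters of the model.
- The method according to claim 1, wherein, before the step of obtaining online the relevant data stream of the specified business scenario based on the data service interface, the method further comprises: providing a user interface, and receiving, based on the user interface, user-input information about the relevant data stream of the specified business scenario; and creating the data service interface based on the information about the relevant data stream of the specified business scenario.
- The method according to claim 1, wherein the relevant data stream comprises: a request data stream, a feedback data stream, and a business data stream.
- The method according to claim 4, wherein accumulating the data in the relevant data stream into the first database comprises: processing data of the request data stream to obtain sample data; and accumulating the data of the request data stream, the sample data, data of the feedback data stream, and data of the business data stream into the first database.
- The method according to claim 5, wherein the relevant data stream further comprises a display data stream, the data of the display data stream being data displayed by the specified business scenario based on the request data stream; correspondingly, processing the data of the request data stream to obtain the sample data comprises: filtering the data of the request data stream based on the data of the display data stream to obtain intersection data, and processing the intersection data to obtain the sample data; and correspondingly, the data of the display data stream and the sample data are accumulated into the first database.
- The method according to any one of claims 1 to 6, wherein exploring the model scheme based on the data in the first database comprises: generating at least two model schemes, wherein at least one scheme sub-item differs between different model schemes; performing model training with each of the at least two model schemes based on the data in the first database; evaluating, based on a machine learning model evaluation index, the models respectively trained with the at least two model schemes; and selecting from the at least two model schemes based on the evaluation results to obtain the explored model scheme.
- The method according to claim 7, wherein the method further comprises: receiving, based on a user interface, data table attribute information and a table-splicing scheme input by a user; and maintaining logical relationship information through the first database based on the data table attribute information and the table-splicing scheme, the logical relationship information being information describing the relationships between different data tables; correspondingly, generating the at least two model schemes comprises: generating the at least two model schemes based on the logical relationship information.
- The method according to claim 8, wherein the table-splicing scheme comprises splicing keys for splicing different data tables, a time-sequence relationship, and an aggregation relationship; and the logical relationship information comprises: the data table attribute information and the table-splicing scheme.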
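- The method according to any one of claims 1 to 6, wherein, after the explored model scheme is deployed online, the method further comprises: storing the data of the relevant data stream into a second database, the second database supporting online real-time feature calculation; and when request data is received, performing online real-time feature calculation using the data in the second database and the received request data, based on the feature engineering scheme in the model scheme deployed online, to obtain feature data of an estimated sample.
- The method according to claim 10, wherein performing online real-time feature calculation using the data in the second database and the received request data comprises: based on the feature engineering scheme in the model scheme deployed online, performing table splicing and online real-time feature calculation on the data in the second database and the received request data to obtain wide-table feature data; correspondingly, the feature data of the estimated sample is the wide-table feature data.
- The method according to claim 10, wherein the method further comprises: after the explored model scheme is deployed online, when request data is received and no model has been deployed online, sending a default estimation result to the specified business scenario.
- The method according to claim 10, wherein deploying the explored model scheme online to provide the online model estimation service comprises: obtaining the feature data of the estimated sample based on the model scheme deployed online; splicing the feature data and feedback data to generate sample data with features and feedback, the feedback data coming from the feedback data stream; returning the sample data with features and feedback to the first database; when a second preset condition is met, performing model self-learning based on the sample data with features and feedback in the first database; and deploying the model obtained by the model self-learning online to provide the online model estimation service.
- The method according to claim 13, wherein, before splicing the feature data and the feedback data, the method further comprises: splicing the feature data and display data to obtain feature data with display data, the display data coming from the display data stream; correspondingly, splicing the feature data with display data and the feedback data to generate sample data with display data, feature data, and feedback data.
- The method according to claim 13, wherein performing model self-learning based on the sample data with features and feedback in the first database comprises: training, based on the sample data with features and feedback, through the model algorithm and the hyperparameters of the model in the model scheme, to obtain a machine learning model.
- The method according to claim 15, wherein training through the model algorithm and the hyperparameters of the model in the model scheme to obtain the machine learning model comprises: training an initial model through the model algorithm and the hyperparameters of the model in the model scheme to obtain the machine learning model; wherein the initial model is a model generated in the process of exploring the model scheme, and the initial model is also deployed online when the explored model scheme is deployed online.
- The method according to claim 15, wherein training through the model algorithm and the hyperparameters of the model in the model scheme to obtain the machine learning model comprises: training a random model through the model algorithm and the hyperparameters of the model in the model scheme to obtain the machine learning model; wherein the random model is a model generated based on the model algorithm whose own parameters take random values; and the initial model is not deployed online when the explored model scheme is deployed online.
- The method according to claim 13, wherein deploying the model obtained by the model self-learning online to provide the online model estimation service comprises: after the model obtained by the model self-learning is deployed online, when request data is received, generating an estimated sample with features based on the data in the second database and the received request data, and obtaining an estimation result of the estimated sample through the model deployed online; and sending the estimation result to the specified business scenario.
- The method according to claim 13, wherein deploying the model obtained by the model self-learning online comprises: replacing the machine learning model already deployed online with the model obtained by the model self-learning; or deploying the model obtained by the model self-learning online so that it provides the online model estimation service together with the machine learning model already deployed online; and deploying the explored model scheme online comprises: replacing the model scheme already deployed online with the explored model scheme; or deploying the explored model scheme online without taking the already deployed model scheme offline.
- The method according to claim 13, wherein the first preset condition and the second preset condition comprise at least one of: data amount, time, and manual trigger.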
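- An apparatus for applying machine learning, the apparatus comprising: a data management module configured to obtain, online and based on a data service interface, a relevant data stream of a specified business scenario, and to accumulate data in the relevant data stream into a first database; a model scheme exploration module configured to explore a model scheme based on the data in the first database when a first preset condition is met; and an online model estimation service module configured to deploy the model scheme obtained by the model scheme exploration module online to provide an online model estimation service, wherein the online model estimation service is performed based on the relevant data stream of the specified business scenario obtained online through the data service interface.
- The apparatus according to claim 21, wherein the model scheme comprises the following scheme sub-items: a feature engineering scheme, a model algorithm, and hyperparameters of the model.
- The apparatus according to claim 21, wherein the data management module is further configured to: provide a user interface, and receive, based on the user interface, user-input information about the relevant data stream of the specified business scenario; and create the data service interface based on the information about the relevant data stream of the specified business scenario.
- The apparatus according to claim 21, wherein the relevant data stream comprises: a request data stream, a feedback data stream, and a business data stream.
- The apparatus according to claim 24, wherein the data management module is configured to: process data of the request data stream to obtain sample data; and accumulate the data of the request data stream, the sample data, data of the feedback data stream, and data of the business data stream into the first database.
- The apparatus according to claim 25, wherein the relevant data stream further comprises a display data stream, the data of the display data stream being data displayed by the specified business scenario based on the request data stream; the data management module is configured to: filter the data of the request data stream based on the data of the display data stream to obtain intersection data, and process the intersection data to obtain the sample data; and the data management module is further configured to accumulate the data of the display data stream and the sample data into the first database.
- The apparatus according to any one of claims 21 to 26, wherein the model scheme exploration module is configured to: when the first preset condition is met, generate at least two model schemes, wherein at least one scheme sub-item differs between different model schemes; perform model training with each of the at least two model schemes based on the data in the first database; evaluate, based on a machine learning model evaluation index, the models respectively trained with the at least two model schemes; and select from the at least two model schemes based on the evaluation results to obtain the explored model scheme.
- The apparatus according to claim 27, wherein the data management module is further configured to: receive, based on a user interface, data table attribute information and a table-splicing scheme input by a user; and maintain logical relationship information through the first database based on the data table attribute information and the table-splicing scheme, the logical relationship information being information describing the relationships between different data tables; correspondingly, the model scheme exploration module is configured to generate the at least two model schemes based on the logical relationship information.
- The apparatus according to claim 28, wherein the table-splicing scheme comprises splicing keys for splicing different data tables, a time-sequence relationship, and an aggregation relationship; and the logical relationship information comprises: the data table attribute information and the table-splicing scheme.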
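- The apparatus according to any one of claims 21 to 26, wherein the online model estimation service module is further configured to: after the model scheme obtained by the model scheme exploration module is deployed online, store the data of the relevant data stream into a second database, the second database supporting online real-time feature calculation; and when request data is received, perform online real-time feature calculation using the data in the second database and the received request data, based on the feature engineering scheme in the model scheme deployed online, to obtain feature data of an estimated sample.
- The apparatus according to claim 30, wherein the online model estimation service module performing online real-time feature calculation using the data in the second database and the received request data comprises: based on the feature engineering scheme in the model scheme deployed online, performing table splicing and online real-time feature calculation on the data in the second database and the received request data to obtain wide-table feature data; correspondingly, the feature data of the estimated sample is the wide-table feature data.
- The apparatus according to claim 30, wherein the online model estimation service module is further configured to: after the model scheme obtained by the model scheme exploration module is deployed online, when request data is received and no model has been deployed online, send a default estimation result to the specified business scenario.
- The apparatus according to claim 30, wherein the online model estimation service module is further configured to: obtain the feature data of the estimated sample based on the model scheme deployed online; splice the feature data and feedback data to generate sample data with features and feedback, the feedback data coming from the feedback data stream; and return the sample data with features and feedback to the first database; the apparatus further comprises a model self-learning module configured to perform model self-learning based on the sample data with features and feedback in the first database when a second preset condition is met; and the online model estimation service module is further configured to deploy the model obtained by the model self-learning module online to provide the online model estimation service.
- The apparatus according to claim 33, wherein the online model estimation service module is further configured to: before splicing the feature data and the feedback data, splice the feature data and display data to obtain feature data with display data, the display data coming from the display data stream; correspondingly, the online model estimation service module is configured to splice the feature data with display data and the feedback data to generate sample data with display data, feature data, and feedback data.
- The apparatus according to claim 33, wherein the model self-learning module is configured to: when the second preset condition is met, train, based on the sample data with features and feedback, through the model algorithm and the hyperparameters of the model in the model scheme, to obtain a machine learning model.
- The apparatus according to claim 35, wherein the model self-learning module is configured to: train an initial model through the model algorithm and the hyperparameters of the model in the model scheme to obtain the machine learning model; wherein the initial model is a model generated while the model scheme exploration module explores the model scheme, and the online model estimation service module also deploys the initial model online when deploying the model scheme obtained by the model scheme exploration module online.
- The apparatus according to claim 35, wherein the model self-learning module is configured to: train a random model through the model algorithm and the hyperparameters of the model in the model scheme to obtain the machine learning model; wherein the random model is a model generated based on the model algorithm whose own parameters take random values; and the online model estimation service module does not deploy the initial model online when deploying the model scheme obtained by the model scheme exploration module online.
- The apparatus according to claim 33, wherein the online model estimation service module is configured to: after the model obtained by the model self-learning module is deployed online, when request data is received, generate an estimated sample with features based on the data in the second database and the received request data, and obtain an estimation result of the estimated sample through the model deployed online; and send the estimation result to the specified business scenario.
- The apparatus according to claim 33, wherein the online model estimation service module is configured to: replace the machine learning model already deployed online with the model obtained by the model self-learning module; or deploy the model obtained by the model self-learning module online so that it provides the online model estimation service together with the machine learning model already deployed online; and the online model estimation service module is configured to: replace the model scheme already deployed online with the model scheme obtained by the model scheme exploration module; or deploy the model scheme obtained by the model scheme exploration module online without taking the already deployed model scheme offline.
- The apparatus according to claim 33, wherein the first preset condition and the second preset condition comprise at least one of: data amount, time, and manual trigger.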
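- An electronic device, comprising: a processor and a memory; the processor is configured to execute the steps of the method according to any one of claims 1 to 20 by calling a program or instructions stored in the memory.
- A computer-readable storage medium storing a program or instructions, the program or instructions causing a computer to execute the steps of the method according to any one of claims 1 to 20.
- A computer program product comprising computer program instructions which, when run on a computer device, implement the steps of the method according to any one of claims 1 to 20.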
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US17/925,576 US20230342663A1 (en) | 2020-05-15 | 2021-05-17 | Machine learning application method, device, electronic apparatus, and storage medium |
| EP21802933.8A EP4152224A4 (en) | 2020-05-15 | 2021-05-17 | MACHINE LEARNING APPLICATION METHOD, DEVICE, ELECTRONIC DEVICE AND STORAGE MEDIUM |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010415370.7 | 2020-05-15 | ||
| CN202010415370.7A CN113673707B (zh) | 2020-05-15 | 2020-05-15 | Method, device, electronic device, and storage medium for applying machine learning |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2021228264A1 true WO2021228264A1 (zh) | 2021-11-18 |
Family
ID=78525199
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2021/094202 Ceased WO2021228264A1 (zh) | 2020-05-15 | 2021-05-17 | Method, device, electronic device, and storage medium for applying machine learning |
Country Status (4)
| Country | Link |
|---|---|
| US (1) | US20230342663A1 (zh) |
| EP (1) | EP4152224A4 (zh) |
| CN (1) | CN113673707B (zh) |
| WO (1) | WO2021228264A1 (zh) |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
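| CN115242648A (zh) * | 2022-07-19 | 2022-10-25 | 北京百度网讯科技有限公司 | Training method for a capacity scaling discrimination model and operator capacity scaling method |
| CN115269730A (zh) * | 2022-08-04 | 2022-11-01 | 北京京东振世信息技术有限公司 | Wide table synchronization method and device |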
Families Citing this family (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
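| CN112036577B (zh) * | 2020-08-20 | 2024-02-20 | 第四范式(北京)技术有限公司 | Method, device, and electronic device for applying machine learning based on data form |
| CN112446597B (zh) * | 2020-11-14 | 2024-01-12 | 西安电子科技大学 | Tank quality assessment method, system, storage medium, computer device, and application |
| CN114238269B (zh) * | 2021-12-03 | 2024-01-23 | 中兴通讯股份有限公司 | Database parameter adjustment method, device, electronic device, and storage medium |
| CN114492638B (zh) * | 2022-01-26 | 2025-12-30 | 第四范式(北京)技术有限公司 | Feature extraction method and device, electronic device, and storage medium |
| CN116451056B (zh) * | 2023-06-13 | 2023-09-29 | 支付宝(杭州)信息技术有限公司 | Terminal feature insight method, apparatus, and device |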
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN106777088A (zh) * | 2016-12-13 | 2017-05-31 | 飞狐信息技术(天津)有限公司 | Rapidly iterating search engine ranking method and system |
| US20180012145A1 (en) * | 2016-07-07 | 2018-01-11 | Hcl Technologies Limited | Machine learning based analytics platform |
| CN109003091A (zh) * | 2018-07-10 | 2018-12-14 | 阿里巴巴集团控股有限公司 | Risk prevention and control processing method, apparatus, and device |
| CN110766164A (zh) * | 2018-07-10 | 2020-02-07 | 第四范式(北京)技术有限公司 | Method and system for executing a machine learning process |
| CN111107102A (zh) * | 2019-12-31 | 2020-05-05 | 上海海事大学 | Big-data-based real-time network traffic anomaly detection method |
Family Cites Families (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US8533222B2 (en) * | 2011-01-26 | 2013-09-10 | Google Inc. | Updateable predictive analytical modeling |
| US20160148115A1 (en) * | 2014-11-26 | 2016-05-26 | Microsoft Technology Licensing | Easy deployment of machine learning models |
| US11681943B2 (en) * | 2016-09-27 | 2023-06-20 | Clarifai, Inc. | Artificial intelligence development via user-selectable/connectable model representations |
| CN107862602A (zh) * | 2017-11-23 | 2018-03-30 | 安趣盈(上海)投资咨询有限公司 | Credit granting decision method and system based on multi-dimensional indicator calculation, self-learning, and clustering model application |
| CN110083334B (zh) * | 2018-01-25 | 2023-06-20 | 百融至信(北京)科技有限公司 | Method and device for bringing a model online |
| CN110766163B (zh) * | 2018-07-10 | 2023-08-29 | 第四范式(北京)技术有限公司 | System for implementing a machine learning process |
| CN110956272B (zh) * | 2019-11-01 | 2023-08-08 | 第四范式(北京)技术有限公司 | Method and system for implementing data processing |
| CN111008707A (zh) * | 2019-12-09 | 2020-04-14 | 第四范式(北京)技术有限公司 | Automated modeling method, device, and electronic device |
2020
- 2020-05-15 CN CN202010415370.7A patent/CN113673707B/zh active Active
2021
- 2021-05-17 WO PCT/CN2021/094202 patent/WO2021228264A1/zh not_active Ceased
- 2021-05-17 EP EP21802933.8A patent/EP4152224A4/en not_active Withdrawn
- 2021-05-17 US US17/925,576 patent/US20230342663A1/en active Pending
Patent Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20180012145A1 (en) * | 2016-07-07 | 2018-01-11 | Hcl Technologies Limited | Machine learning based analytics platform |
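| CN106777088A (zh) * | 2016-12-13 | 2017-05-31 | 飞狐信息技术(天津)有限公司 | Rapidly iterating search engine ranking method and system |
| CN109003091A (zh) * | 2018-07-10 | 2018-12-14 | 阿里巴巴集团控股有限公司 | Risk prevention and control processing method, apparatus, and device |
| CN110766164A (zh) * | 2018-07-10 | 2020-02-07 | 第四范式(北京)技术有限公司 | Method and system for executing a machine learning process |
| CN111107102A (zh) * | 2019-12-31 | 2020-05-05 | 上海海事大学 | Big-data-based real-time network traffic anomaly detection method |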
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP4152224A4 |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
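| CN115242648A (zh) * | 2022-07-19 | 2022-10-25 | 北京百度网讯科技有限公司 | Method for training a capacity expansion and reduction discrimination model, and method for operator capacity expansion and reduction |
| CN115242648B (zh) * | 2022-07-19 | 2024-05-28 | 北京百度网讯科技有限公司 | Method for training a capacity expansion and reduction discrimination model, and method for operator capacity expansion and reduction |
| CN115269730A (zh) * | 2022-08-04 | 2022-11-01 | 北京京东振世信息技术有限公司 | Wide table synchronization method and apparatus |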
Also Published As
| Publication number | Publication date |
|---|---|
| CN113673707A (zh) | 2021-11-19 |
| EP4152224A4 (en) | 2024-06-05 |
| CN113673707B (zh) | 2024-12-27 |
| US20230342663A1 (en) | 2023-10-26 |
| EP4152224A1 (en) | 2023-03-22 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
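| WO2021228264A1 (zh) | | Method, apparatus, electronic device, and storage medium for applying machine learning |
| CN118093801B (zh) | | Information interaction method and apparatus based on a large language model, and electronic device |
| KR102484617B1 (ko) | | Method, apparatus, electronic device, storage medium, and program for generating a model that represents heterogeneous graph nodes |
| US11868361B2 (en) | | Data distribution process configuration method and apparatus, electronic device and storage medium |
| WO2022048648A1 (zh) | | Method, apparatus, electronic device, and storage medium for implementing automatic model construction |
| US20150007084A1 (en) | | Chaining applications |
| CN112036577B (zh) | | Method, apparatus, and electronic device for applying machine learning based on data form |
| CN109033109A (zh) | | Data processing method and system |
| CN111158666B (zh) | | Entity normalization processing method, apparatus, device, and storage medium |
| CN110633959A (zh) | | Graph-structure-based approval task creation method, apparatus, device, and medium |
| JP2024175030A (ja) | | Artificial-intelligence-based information processing method, apparatus, electronic device, and agent |
| CN109376015A (zh) | | Method and system for resolving log blocking in a task scheduling system |
| WO2024139703A1 (zh) | | Method and apparatus for updating an object recognition model, electronic device, storage medium, and computer program product |
| WO2024016547A1 (zh) | | Data query method and apparatus based on multi-party collaboration |
| CN112686381B (zh) | | Neural network model, method, electronic device, and readable medium |
| CN110689137A (zh) | | Parameter determination method, system, medium, and electronic device |
| CN110442753A (zh) | | Method and apparatus for automatically building a graph database based on OPC UA |
| US12596923B2 (en) | | Machine learning of keywords |
| US12235862B2 (en) | | Time series prediction method for graph structure data |
| WO2018205390A1 (zh) | | Control layout display control method, system, apparatus, and computer-readable storage medium |
| CN114443831B (zh) | | Text classification method, apparatus, and electronic device applying machine learning |
| CN118974697B (zh) | | Semi-automatic deployment of intra-service communication infrastructure |
| TWI803875B (zh) | | Modeling apparatus and modeling method for a business logic representation model |
| US20250278668A1 (en) | | Parallel join application for machine learning features |
| WO2022037689A1 (zh) | | Data processing method based on data form and method for applying machine learning |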
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 21802933; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | WIPO information: entry into national phase | Ref document number: 2021802933; Country of ref document: EP |
| | ENP | Entry into the national phase | Ref document number: 2021802933; Country of ref document: EP; Effective date: 20221215 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | WWW | WIPO information: withdrawn in national office | Ref document number: 2021802933; Country of ref document: EP |