WO2014131262A1 - Defect prediction method and apparatus - Google Patents

Defect prediction method and apparatus

Info

Publication number
WO2014131262A1
Authority
WO
WIPO (PCT)
Prior art keywords
training
classifier
tree
unit
nth
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2013/080279
Other languages
English (en)
French (fr)
Inventor
陈焕华
潘璐伽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to EP13876166.3A priority Critical patent/EP2854053B1/en
Publication of WO2014131262A1 publication Critical patent/WO2014131262A1/zh
Priority to US14/587,724 priority patent/US10068176B2/en


Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00: Computing arrangements using knowledge-based models
    • G06N5/04: Inference or reasoning models
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/901: Indexing; Data structures therefor; Storage structures
    • G06F16/9027: Trees
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning

Definitions

  • the present invention relates to the field of data processing, and in particular, to a defect prediction method and apparatus. Background Art
  • the quality of products has become a major concern for users and enterprises. For enterprises especially, product quality is their foundation, and reducing the defect rate of a product is critical to the business.
  • the causes of product defects lie mainly in the production process of the product, including the design of the product, the quality of the materials used, the capability of the manufacturer, and so on. Therefore, if an enterprise wants to reduce the defect rate of a product, it needs to analyze and improve the production process to improve product quality.
  • each product has a record covering all aspects of its life cycle, such as raw material sources, production information, test information, shipping information, and usage information. When a certain type of defect or failure occurs during production or use, the factors causing the defect or failure have a certain correlation with the recorded information of the product.
  • the prior art provides a faulty-product defect prediction method, which uses recorded faulty-product information to generate a single decision tree through a decision-tree-based classification algorithm; when a product fails, the defects of the faulty product can then be predicted according to the generated decision tree. However, when classifying the recorded information of a faulty product, the single decision tree generated by the decision-tree-based classification algorithm is prone to over-fitting or under-fitting, which makes defect prediction fail. Therefore, when a product is defective or faulty, how to quickly locate the fault point and find the cause of the fault has become a focus of industry research.
  • embodiments of the present invention provide a defect prediction method and apparatus that achieve accurate and rapid localization of the defects of a faulty product.
  • a defect prediction method including:
  • the classifier set includes at least two tree classifiers
  • the classifier set is used as a predictive model to predict defects in the faulty product.
  • the training set includes M training units, each training unit including a target attribute and a training attribute set; and the generating a classifier set according to the training set includes:
  • the Nth training subset includes M' training units, where M' is less than or equal to M;
  • the N tree classifiers are combined to generate the classifier set.
  • the method further includes:
  • obtaining an error rate of the generated K tree classifiers includes:
  • the Mth class tree classifier is the set of the tree classifiers whose generation did not use the Mth training unit, where M is the number of training units contained in the training set;
  • M prediction labels including:
  • the target attribute obtained from the training attribute set included in the Mth training unit belongs to the classification label set.
  • E = (1/M) · Σ_{r=1..M} I(C_OOB(x_r) ≠ y_r), where E is the error rate of the generated K tree classifiers, M is the number of training units in the training set, C_OOB(x_r) is the rth prediction label of the rth training unit, and y_r is the target attribute of the rth training unit.
  • the method further includes:
  • an N'th training subset is selected from the training set; wherein the intersection of the N'th training subset and the Nth training subset is empty, and the N'th training subset includes at least one training unit;
  • the using the classifier set as a prediction model to predict a fault of the faulty product includes:
  • using the classifier set as a prediction model to predict a defect of the faulty product according to the attribute information, to obtain a classification label set;
  • the preset policy includes a decision tree algorithm.
  • a defect prediction apparatus including:
  • a processing unit configured to select a training attribute set from the pre-stored product fault records according to the target attribute, and combine the target attribute and the training attribute set into a training set; wherein the target attribute is a defect attribute of the historical fault product ;
  • a generating unit configured to generate a classifier set according to the training set obtained by the processing unit; wherein the classifier set includes at least two tree classifiers;
  • a prediction unit configured to use the classifier set generated by the generating unit as a prediction model to predict a defect of the faulty product.
  • the training set includes M training units, each training unit includes a target attribute and a training attribute set; and the generating unit includes:
  • a selection module configured to select a first training subset from the training set obtained by the processing unit
  • a generating module configured to generate, according to a preset policy, a first tree classifier corresponding to the first training subset selected by the selecting module;
  • the selecting module is further configured to select a second training subset from the training set obtained by the processing unit;
  • the generating module is further configured to generate, according to the preset policy, a second tree classifier corresponding to the second training subset selected by the selecting module;
  • the selection module is further configured to select an Nth training subset from the training set obtained by the processing unit, where the Nth training subset includes M' training units, and M' is less than or equal to M;
  • the generating module is further configured to generate, according to the preset policy, an Nth tree classifier corresponding to the Nth training subset selected by the selecting module, where the N is an integer greater than or equal to 2;
  • a combination module configured to combine the N tree classifiers generated by the generating module to generate the classifier set.
  • the generating unit further includes:
  • a first obtaining module configured to acquire an error rate of the generated K-1 tree classifiers when generating the (K-1)th tree classifier;
  • a second obtaining module configured to: when generating the Kth tree classifier, obtain an error rate of the generated K tree classifiers, so that when the difference between the error rate of the K tree classifiers and the error rate of the K-1 tree classifiers is less than a preset threshold, the K tree classifiers are combined to generate the classifier set; wherein K is an integer less than or equal to N.
  • the second acquiring module includes:
  • a selecting submodule configured to select, according to the first training unit, the first class tree classifier from the set of classifiers;
  • Generating a submodule configured to generate, according to the first class tree classifier selected by the selecting submodule, a first prediction tag of the first training unit;
  • the selecting submodule is further configured to select a second class tree classifier from the set of classifiers according to the second training unit;
  • the generating submodule is further configured to generate a second prediction label of the second training unit according to the second type tree classifier selected by the selecting submodule;
  • the selecting sub-module is further configured to select, according to the Mth training unit, the M-type tree classifier from the set of classifiers; wherein the M-th class tree classifier is a tree classifier that does not use the M-th training unit a set of classifiers, where M is the number of training units included in the training set;
  • the generating submodule is further configured to generate an Mth prediction label of the Mth training unit according to the Mth class tree classifier selected by the selecting submodule;
  • an obtaining submodule configured to obtain an error rate of the generated K tree classifiers according to the M prediction labels generated by the generating submodule.
  • the generating submodule is specifically configured to:
  • wherein the Mth prediction label of the Mth training unit is calculated as C_OOB(x_M) = argmax_{y ∈ Y} Σ_{j: C_j ∈ C_OOB^M} w_j · I(C_j(x_M) = y); C_j is the jth tree classifier, C_OOB^M is the Mth class tree classifier, w_j is the weight of the jth tree classifier, C_j(x_M) is the target attribute obtained from the jth tree classifier and the training attribute set included in the Mth training unit, and Y is the classification label set.
  • the obtaining submodule is specifically configured to calculate the error rate as E = (1/M) · Σ_{r=1..M} I(C_OOB(x_r) ≠ y_r); wherein E is the error rate of the generated K tree classifiers, M is the number of training units in the training set, C_OOB(x_r) is the rth prediction label of the rth training unit, and y_r is the target attribute of the rth training unit.
  • the apparatus further includes:
  • a selecting unit configured to select an N'th training subset from the training set after the generating module generates the Nth tree classifier corresponding to the Nth training subset according to the preset policy; wherein the intersection of the N'th training subset and the Nth training subset is empty, and the N'th training subset includes at least one training unit;
  • a first obtaining unit configured to acquire a misprediction rate of the Nth tree classifier according to the N'th training subset selected by the selecting unit;
  • a second obtaining unit configured to acquire a weight of the Nth tree classifier according to the misprediction rate of the Nth tree classifier acquired by the first obtaining unit.
  • the predicting unit includes:
  • a statistics module configured to collect attribute information of the faulty product
  • a prediction module configured to use the classifier set as a prediction model to predict a defect of the faulty product according to the attribute information collected by the statistics module, and obtain a classification label set;
  • a third acquiring module configured to obtain, according to the classifier set and the weight of each tree classifier in the classifier set, the trust value of each classification label in the classification label set.
  • the preset policy includes a decision tree algorithm.
  • a defect prediction method and apparatus select a training attribute set from pre-stored product fault records according to a target attribute, and combine the target attribute and the training attribute set into a training set, so as to generate a classifier set including at least two tree classifiers. When a product fails, the classifier set can be used as a prediction model to predict the defects of the faulty product. Using the classifier set as the prediction model solves the problem that a single decision tree easily leads to over-fitting or under-fitting and thus cannot predict the defects of the faulty product, enables rapid localization of the faults of faulty products, and improves the accuracy of faulty-product defect prediction.
  • FIG. 1 is a flowchart of a defect prediction method according to Embodiment 1 of the present invention
  • FIG. 2 is a flowchart of a defect prediction method according to Embodiment 2 of the present invention
  • FIG. 3 is a schematic diagram of a defect prediction apparatus according to Embodiment 3 of the present invention;
  • FIG. 4 is a schematic diagram of another defect prediction apparatus according to Embodiment 3 of the present invention
  • FIG. 5 is a schematic diagram of a defect prediction apparatus according to Embodiment 4 of the present invention.
  • the embodiment of the invention provides a defect prediction method. As shown in FIG. 1, the method may include:
  • when a product fails, the fault detection personnel generally want to be able to quickly locate the fault type of the faulty product or the device that caused the product to malfunction, so as to save the maintenance personnel's maintenance time.
  • the fault detector can collect information on products that have failed during production or use and record it in the product fault record, so that, when the prediction model is trained, the attributes necessary for establishing the prediction model can be selected from the pre-recorded product fault records as the training attribute set, according to the defect attribute of the historical faulty product; the defect attribute of the historical faulty product is defined as the target attribute, and after the training attribute set is selected according to the target attribute, the target attribute and the training attribute set are combined to generate a training set.
  • the training set may include multiple training units, where each training unit includes one target attribute and one training attribute set.
  • when the required training attribute set has been selected according to the target attribute, and the target attribute and the training attribute set have been combined into a training set, the classifier set may be generated according to the training set.
  • the classifier set includes at least two tree classifiers; each tree classifier is generated according to a preset policy, and all the generated tree classifiers together form the classifier set.
  • the preset policy may be a decision tree algorithm or the like.
  • the defects of the faulty product can be quickly and accurately located according to the generated classifier set including at least two tree classifiers.
  • a defect prediction method selects a training attribute set from a pre-stored product fault record according to a target attribute, and combines the target attribute and the training attribute set into a training set to generate a classifier set including at least two tree classifiers.
  • the classifier set can be used as a predictive model to predict the defects of the faulty product.
  • using the classifier set as the prediction model solves the problem that a single decision tree easily leads to over-fitting or under-fitting and thus fails to predict the defects of faulty products, enables rapid localization of the faults of faulty products, and improves the accuracy of faulty-product defect prediction.
  • the embodiment of the invention provides a defect prediction method. As shown in FIG. 2, the method may include:
  • the fault detection personnel hope to quickly locate the defect type of the faulty product or the faulty device; for any product, the occurrence of faults or defects is related to the objective information of the product, for example, the model of the product, the environment in which it is used, the source of the raw materials, and so on.
  • the attributes needed to establish the prediction model can be selected from the product fault records of products that have failed during production or use, and the selected attributes are composed into a training set, which is used to establish the prediction model.
  • Attribute information can be divided into the following categories: Attributes describing product characteristics, attributes describing the use environment, attributes describing production links, and defect attributes.
  • the attributes describing the characteristics of the product may be the product name, the product model, the component parts, etc.
  • the attributes describing the use environment may be the use period, the place of use, the climate of use, etc.
  • the attributes describing the production process may be the production date, the processing department, the inspection records, etc.; defect attributes can be defect types, defect phenomena, defect root causes, defective devices, and so on.
  • the embodiment of the present invention does not limit the classification of the recorded attribute information of the faulty product, the types of attribute information recorded under each category, or the form of the attribute information of the faulty product.
  • the training set is composed so that the training set can be used to build the predictive model.
  • the screening process may specifically be: filtering the recorded attribute information against the target attribute and selecting X attributes to form a training attribute set, where X may range from a single attribute up to all of the recorded attributes.
  • for example, if the recorded attribute information of faulty products includes: product name, product model, component parts, use period, use place, use climate, production date, processing department, inspection record, defect type, defect phenomenon, defect root cause, and defective device, then preset rules can be used to select, from the attribute information in the recorded historical fault records, the attributes required to establish the prediction model, to form the training attribute set.
  • the training set contains M training units, where each training unit containing one target attribute and one training attribute set corresponds to one historical faulty product.
  • to ensure that the prediction model for the target attribute established from the training attribute set has high accuracy, different training attribute sets can be selected repeatedly for the target attribute to compose training sets, the accuracy of the prediction models established from the different training sets can be verified, and the training set yielding the highest accuracy can be selected for building the prediction model; the verification can target historical faulty products whose defects are already known.
  • the attributes in the training attribute set must be available before the faulty product is tested. For example, among the attribute information of the faulty product recorded above, the defective device cannot be used as an attribute in the training attribute set, because before the test it is not known which device of the faulty product has failed.
  • the specific selection rule of the training attribute set may be a traversal method, or the X attributes most correlated with the target attribute may be selected as the training attribute set by calculating their correlation with the target attribute.
  • selecting attributes by calculating their correlation with the target attribute is a relatively common method.
  • One of the simplest methods of calculating the correlation is to calculate the frequency at which the attributes and the target attributes appear at the same time. The higher the frequency, the greater the correlation.
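The co-occurrence-frequency heuristic above can be sketched in a few lines. This is an illustrative reading of the paragraph, not code from the patent; the record fields (`model`, `climate`, `defect`) and the scoring rule (highest single pair frequency) are assumptions chosen for the example.

```python
from collections import Counter

def cooccurrence_scores(records, target_attr, candidate_attrs):
    """Score each candidate attribute by how often its values co-occur
    with the target attribute's values across the fault records."""
    scores = {}
    for attr in candidate_attrs:
        pair_counts = Counter((rec[attr], rec[target_attr]) for rec in records)
        # The most frequent (attribute value, defect) pair serves as the
        # attribute's score: the more often one pair repeats, the more
        # correlated the attribute is assumed to be with the target.
        scores[attr] = max(pair_counts.values()) / len(records)
    return scores

def select_training_attributes(records, target_attr, candidate_attrs, x):
    """Pick the X attributes with the highest co-occurrence scores."""
    scores = cooccurrence_scores(records, target_attr, candidate_attrs)
    return sorted(candidate_attrs, key=lambda a: scores[a], reverse=True)[:x]
```

Other correlation measures (information gain, chi-square statistics) would slot into `cooccurrence_scores` the same way.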
  • the embodiment of the present invention does not limit the method for selecting the training attribute set or the algorithms used by particular selection methods.
  • the classifier set can be generated according to the training set.
  • the selection method may be random sampling with replacement, which is not limited in the embodiment of the present invention.
  • the first training subset, the second training subset, the Nth training subset may be selected from the training set.
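The subset selection described above (random sampling with replacement, i.e. bootstrap sampling) can be sketched as follows. The function names and the per-subset seed are illustrative assumptions, not the patent's specification.

```python
import random

def draw_training_subset(training_set, m_prime, seed=None):
    """Draw one training subset of m_prime units by sampling the
    training set with replacement (bootstrap sampling)."""
    rng = random.Random(seed)
    return [rng.choice(training_set) for _ in range(m_prime)]

def draw_n_subsets(training_set, n, m_prime):
    """Select the first through Nth training subsets; each subset may
    contain repeated units, and M' (m_prime) is at most M."""
    return [draw_training_subset(training_set, m_prime, seed=i) for i in range(n)]
```

Each subset then serves as the root material for one tree classifier.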
  • after the Nth training subset is selected from the training set, an Nth tree classifier corresponding to the Nth training subset is generated according to the preset policy.
  • the preset policy may be a spanning-tree algorithm, which can be understood as follows: the Nth training subset selected from the training set is taken as a root node; a separation attribute and a separation predicate are selected according to a separation algorithm, and the root node is split into two branches according to the separation attribute and the separation predicate; for each branch, an attribute selection policy can be used to select attributes, and the branch continues to be split according to the separation algorithm; the above steps are repeated until the finally generated branches can determine the target attribute. The generated tree classifier is then pruned according to a tree pruning strategy.
  • for example, suppose the training set is T = {product name, production date, processing department, use period, defect type}, and a training subset selected from T is taken as the root node. Suppose the separation attribute selected according to the separation algorithm is the use period, with the separation predicates "use period greater than 50 days" and "use period less than or equal to 50 days"; the root node can then be divided into two branches according to the separation attribute and the separation predicate, and each branch can continue to be split by selecting further separation attributes and separation predicates, until the target attribute can be determined.
  • the separation algorithm used in the above tree-classifier generation process includes but is not limited to the information entropy test, the Gini index test, the chi-square test, and the gain rate test; the attribute selection policy may include random single-attribute selection and random multiple-attribute selection, and is not limited in the embodiment of the present invention; the tree pruning strategy includes but is not limited to pre-pruning and post-pruning strategies.
  • the number N of tree classifiers generated in the embodiment of the present invention may be determined by a preset threshold value; that is, when the number of generated tree classifiers reaches the preset threshold, the N generated tree classifiers are combined to generate the classifier set. For example, when the preset threshold N is 5, the classifier set is {C1, C2, C3, C4, C5}.
  • the timing of generating the classifier set may also be determined by calculating the difference between the error rate of the K generated tree classifiers and the error rate of the K-1 generated tree classifiers. Specifically, when the (K-1)th tree classifier is generated, the error rate of the generated K-1 tree classifiers is calculated, and when the Kth tree classifier is generated, the error rate of the generated K tree classifiers is calculated; when the difference between the two error rates is less than a preset threshold, the generated K tree classifiers are combined to generate the classifier set, where K is an integer less than or equal to N.
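The stopping rule above (grow the ensemble until the error-rate improvement falls below a threshold, or N trees exist) can be sketched as a loop. `train_one` and `error_rate` are placeholders for the preset-policy tree generation and the error-rate calculation described in this embodiment; their signatures are assumptions for illustration.

```python
def grow_classifier_set(train_one, error_rate, n_max, threshold):
    """Add tree classifiers one at a time; stop once the change in
    ensemble error rate between consecutive sizes K-1 and K falls
    below the preset threshold, or once n_max trees exist.

    train_one(k)       -> the kth tree classifier (placeholder).
    error_rate(trees)  -> error rate of the current classifier set.
    """
    trees = [train_one(1)]
    prev_error = error_rate(trees)
    for k in range(2, n_max + 1):
        trees.append(train_one(k))
        cur_error = error_rate(trees)
        if abs(prev_error - cur_error) < threshold:
            break  # improvement too small: combine what we have
        prev_error = cur_error
    return trees
```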
  • the error rate of the generated K tree classifiers is calculated as follows: for each training unit in the training set, a prediction label is calculated, and the error rate of the generated K tree classifiers is obtained from the prediction labels. Specifically, the first class tree classifier is selected from the classifier set according to the first training unit, and the first prediction label of the first training unit is generated according to the first class tree classifier; the second class tree classifier is selected from the classifier set according to the second training unit, and the second prediction label of the second training unit is generated according to the second class tree classifier; ...; the Mth class tree classifier is selected from the classifier set according to the Mth training unit, and the Mth prediction label of the Mth training unit is generated according to the Mth class tree classifier. The above steps are repeated until a prediction label has been calculated for every training unit in the training set; finally, the error rate of the generated K tree classifiers is obtained from the calculated M prediction labels. The Mth class tree classifier is the set of tree classifiers whose generation did not use the Mth training unit.
  • the specific calculation process of the prediction label is: for the rth training unit in the training set (where r is a positive integer greater than 0 and less than or equal to M), the tree classifiers in the classifier set can be divided into two classes: the tree classifiers whose generation used the rth training unit, and the tree classifiers whose generation did not use the rth training unit. The tree classifiers not generated using the rth training unit form a set, called the rth class tree classifier and denoted C_OOB^r. The rth prediction label of the rth training unit is then calculated as C_OOB(x_r) = argmax_{y ∈ Y} Σ_{j: C_j ∈ C_OOB^r} w_j · I(C_j(x_r) = y); wherein C_j is the jth tree classifier, C_OOB^r is the rth class tree classifier, w_j is the weight of the jth tree classifier, C_j(x_r) is the target attribute obtained from the jth tree classifier and the training attribute set included in the rth training unit, and y is a classification label from the classification label set Y.
  • the specific calculation formula of the error rate of the generated K tree classifiers is: E = (1/M) · Σ_{r=1..M} I(C_OOB(x_r) ≠ y_r); wherein E is the error rate of the generated K tree classifiers, M is the number of training units in the training set, C_OOB(x_r) is the rth prediction label of the rth training unit, and y_r is the target attribute of the rth training unit.
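The two formulas above (weighted out-of-bag vote, then mismatch rate) can be sketched together. The representation of trees as callables and of the "did not train on unit r" relation as a boolean mask are assumptions made for the example, not the patent's data structures.

```python
def oob_prediction(trees, weights, oob_mask, x_r, r, labels):
    """Prediction label for unit r using only the rth class tree
    classifier: the trees whose generation did not use unit r
    (oob_mask[j][r] is True when tree j did not see unit r).
    Implements C_OOB(x_r) = argmax_y sum_j w_j * I(C_j(x_r) = y)."""
    votes = {y: 0.0 for y in labels}
    for j, tree in enumerate(trees):
        if oob_mask[j][r]:
            votes[tree(x_r)] += weights[j]
    return max(votes, key=votes.get)

def oob_error_rate(trees, weights, oob_mask, units, targets, labels):
    """E = (1/M) * sum over r of I(C_OOB(x_r) != y_r)."""
    m = len(units)
    wrong = sum(
        1 for r in range(m)
        if oob_prediction(trees, weights, oob_mask, units[r], r, labels) != targets[r]
    )
    return wrong / m
```

Because each unit is judged only by trees that never saw it, the error rate is an unbiased estimate without a separate validation set.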
  • the specific calculation process of the weight of the jth tree classifier is: a j'th training subset is selected from the training set, the misprediction rate of the jth tree classifier is obtained on the j'th training subset, and the weight of the jth tree classifier is then obtained from that misprediction rate; wherein the intersection of the j'th training subset and the jth training subset is empty, and the j'th training subset includes at least one training unit.
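The embodiment says the weight is derived from the misprediction rate on a held-out (j'th) subset but does not spell out the mapping. The sketch below uses the AdaBoost-style ln((1 - e) / e) as one plausible choice; that formula is an assumption, not the patent's.

```python
import math

def misprediction_rate(tree, holdout_units, holdout_targets):
    """Fraction of the j'th training subset (disjoint from the jth
    subset used for training) that the jth tree classifier mispredicts."""
    wrong = sum(1 for x, y in zip(holdout_units, holdout_targets) if tree(x) != y)
    return wrong / len(holdout_units)

def tree_weight(error, eps=1e-9):
    """Map a misprediction rate to a vote weight. ln((1 - e) / e)
    gives accurate trees large weights and a coin-flip tree weight 0;
    clamping avoids division by zero at e = 0 or e = 1."""
    error = min(max(error, eps), 1 - eps)
    return math.log((1 - error) / error)
```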
  • when predicting, the attribute information of the faulty product may first be collected; the attribute information is data obtained during the production and use of the faulty product, and may include: product name, product model, components, use period, place of use, date of manufacture, processing department, and so on.
  • the classifier set is used as a prediction model to predict the faulty product based on the attribute information, to obtain a classification label set. After the attribute information of the faulty product is collected, the classifier set can be used as a prediction model to predict the fault of the faulty product according to that attribute information. Because the generated classifier set includes N tree classifiers, the defects of the faulty product predicted by the classifier set will have multiple prediction results, and the multiple predicted results form the classification label set. Using the defect prediction method provided by the embodiment of the present invention, not only can the defects of the faulty product be predicted, but multiple prediction results can also be obtained for reference by the maintenance personnel.
  • the classifier set and the weight of each tree classifier in the classifier set may also be used to calculate the trust value of each classification label in the classification label set.
  • the specific calculation method for the trust value of a classification label is: Trust(y_0) = Σ_j w_j · I(C_j(x) = y_0) / Σ_j w_j; wherein y_0 is a classification label in the classification label set Y, Trust(y_0) is the trust value of the classification label, w_j is the weight of the jth tree classifier, and C_j(x) is the target attribute of the faulty product predicted by the jth tree classifier. The set of possible defect classification labels of the faulty product can then be defined as the labels whose trust value exceeds a preset threshold, i.e., {y_0 ∈ Y | Trust(y_0) > θ}.
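The trust-value formula and the thresholded label set above can be sketched directly. Representing trees as callables and the faulty product as a dict are assumptions made for the example.

```python
def trust_values(trees, weights, x, labels):
    """Trust(y0) = sum_j w_j * I(C_j(x) = y0) / sum_j w_j,
    for each classification label y0 in the classification label set."""
    total = sum(weights)
    trust = {}
    for y0 in labels:
        agree = sum(w for tree, w in zip(trees, weights) if tree(x) == y0)
        trust[y0] = agree / total
    return trust

def likely_defects(trust, threshold):
    """Classification labels whose trust value exceeds the preset threshold:
    the candidate defects handed to maintenance personnel."""
    return {y for y, t in trust.items() if t > threshold}
```

Maintenance personnel can then check the candidates in descending order of trust value.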
  • An embodiment of the present invention provides a defect prediction method, which selects a training attribute set from a pre-stored product failure record according to a target attribute, and combines the target attribute and the training attribute set into a training set to generate a classifier set including at least two tree classifiers.
  • the classifier set can be used as a predictive model to predict the defects of the faulty product.
  • using the classifier set as the prediction model solves the problem that a single decision tree easily leads to over-fitting or under-fitting and thus fails at defect prediction for faulty products, achieves rapid localization of the faults of faulty products, and improves the accuracy of predicting the defects of faulty products.
  • the embodiment of the present invention provides a defect prediction apparatus, as shown in FIG. 3, including: a processing unit 31, a generating unit 32, and a prediction unit 33.
  • the processing unit 31 is configured to select a training attribute set from the pre-stored product fault records according to the target attribute, and combine the target attribute and the training attribute set into a training set; wherein the target attribute is a defect attribute of the historical faulty product.
  • the generating unit 32 is configured to generate a classifier set according to the training set obtained by the processing unit 31; wherein the classifier set includes at least two tree classifiers.
  • the prediction unit 33 is configured to use the classifier set generated by the generating unit 32 as a prediction model to predict a defect of the faulty product.
  • the training set includes M training units, and each training unit includes a target attribute and a training attribute set.
  • the generating unit 32 may include: a selecting module 321, a generating module 322, and a combining module 323.
  • the selecting module 321 is configured to select the first training subset from the training set obtained by the processing unit 31.
  • the generating module 322 is configured to generate, according to the preset policy, a first tree classifier corresponding to the first training subset selected by the selecting module 321 .
  • the selecting module 321 is further configured to select a second training subset from the training set obtained by the processing unit 31.
  • the generating module 322 is further configured to generate, according to the preset policy, a second tree classifier corresponding to the second training subset selected by the selecting module 321.
  • the selecting module 321 is further configured to select an Nth training subset from the training set obtained by the processing unit 31, where the Nth training subset includes M' training units, and the M' is less than or equal to the M.
  • the generating module 322 is further configured to generate, according to the preset policy, an Nth tree classifier corresponding to the Nth training subset selected by the selecting module 321; where, the N is an integer greater than or equal to 2.
  • the combining module 323 is configured to combine the N tree classifiers generated by the generating module 322 to generate the classifier set.
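  The flow described by the selecting, generating, and combining modules is essentially bagging: draw training subsets (e.g. by sampling with replacement, as the description later allows), grow one classifier per subset according to the preset policy, and collect them into the classifier set. A minimal Python sketch follows; a depth-1 decision stump stands in for the full tree-growing policy, and the attribute-dict encoding and all names are illustrative assumptions, not the patent's implementation.

  ```python
  import random
  from collections import Counter

  def train_stump(subset):
      """A depth-1 'tree': split on the attribute whose values best predict
      the target label (stands in for a real decision-tree policy).
      subset is a list of (attributes_dict, target_label) training units."""
      best = None
      for attr in subset[0][0]:
          by_value = {}
          for x, y in subset:
              by_value.setdefault(x[attr], []).append(y)
          # majority target label per observed attribute value
          rule = {v: Counter(ys).most_common(1)[0][0] for v, ys in by_value.items()}
          hits = sum(rule[x[attr]] == y for x, y in subset)
          if best is None or hits > best[0]:
              best = (hits, attr, rule)
      _, attr, rule = best
      default = Counter(y for _, y in subset).most_common(1)[0][0]
      return lambda x: rule.get(x.get(attr), default)

  def build_classifier_set(training_set, n_trees, subset_size, seed=0):
      """Draw n_trees subsets with replacement; grow one classifier each."""
      rng = random.Random(seed)
      classifiers = []
      for _ in range(n_trees):
          subset = [rng.choice(training_set) for _ in range(subset_size)]
          classifiers.append(train_stump(subset))
      return classifiers
  ```

  Each subset may contain up to M' ≤ M units; the returned list plays the role of the classifier set.
  
  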
  • the generating unit 32 may further include: a first obtaining module 324 and a second acquiring module 325.
  • the first obtaining module 324 is configured to obtain an error rate of the generated K-1 tree classifiers when the (K-1)th tree classifier is generated.
  • the second obtaining module 325 is configured to obtain an error rate of the generated K tree classifiers when the Kth tree classifier is generated, so that when the difference between the error rate of the K tree classifiers and the error rate of the K-1 tree classifiers is less than a preset threshold, the K tree classifiers are combined to generate the classifier set; wherein, K is an integer less than or equal to N.
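  The stopping rule these two modules describe — keep adding tree classifiers until the ensemble error rate changes by less than a preset threshold between consecutive sizes — can be sketched generically. This is a minimal illustration, not the patent's implementation: the function names are assumptions, and `error_rate` is just a callback (the document obtains it from the out-of-bag prediction labels).

  ```python
  def grow_until_stable(make_classifier, error_rate, threshold, max_trees):
      """Add classifiers one at a time; stop once |E_K - E_(K-1)| < threshold.

      make_classifier(k) returns the k-th classifier; error_rate(clfs)
      scores the current ensemble.  max_trees caps N as a safety bound.
      """
      classifiers = [make_classifier(0)]
      prev = error_rate(classifiers)
      while len(classifiers) < max_trees:
          classifiers.append(make_classifier(len(classifiers)))
          curr = error_rate(classifiers)
          if abs(curr - prev) < threshold:
              break  # error rate has stabilized: combine into the classifier set
          prev = curr
      return classifiers
  ```

  With a toy error rate of 1/K the loop stops as soon as consecutive error rates differ by less than the threshold.
  
  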
  • the second obtaining module 325 may include: a selecting submodule 3251, a generating submodule 3252, and an obtaining submodule 3253.
  • the selecting submodule 3251 is configured to select the first class tree classifier from the set of classifiers according to the first training unit.
  • the generating submodule 3252 is configured to generate, according to the first class tree classifier selected by the selecting submodule 3251, a first prediction tag of the first training unit.
  • the selecting sub-module 3251 is further configured to select a second class tree classifier from the set of classifiers according to the second training unit.
  • the generating sub-module 3252 is further configured to generate a second prediction label of the second training unit according to the second type tree classifier selected by the selecting sub-module 3251.
  • the selecting sub-module 3251 is further configured to select an Mth class tree classifier from the classifier set according to the Mth training unit; wherein the Mth class tree classifier is the set of those classifiers whose tree classifiers were generated without using the Mth training unit, and the M is the number of training units included in the training set.
  • the generating submodule 3252 is further configured to generate an Mth prediction label of the Mth training unit according to the Mth class tree classifier selected by the selecting submodule 3251.
  • the obtaining sub-module 3253 is configured to obtain an error rate of the generated K tree classifiers according to the M prediction labels generated by the generating sub-module 3252.
  • the generating submodule 3252 is specifically configured to:
  • C 00B (M, 3 ⁇ 4) of the M-th training unit M th prediction tags C is the j-th tree classifier, ( ⁇ £ M to the first type tree classifier, j is the weight of a heavy tree classifier, C (x M ) is a set of classification labels according to a target attribute obtained according to the training attribute set included in the j-th tree classifier and the M-th training unit.
  • the obtaining submodule 3253 is specifically configured to:
  • obtain the error rate of the generated K tree classifiers according to E(T) = (1/M) · Σ_{r=1..M} I(C_OOB(r, x_r) ≠ y_r); wherein E(T) is the error rate of the generated K tree classifiers, M is the number of training units in the training set, C_OOB(r, x_r) is the rth prediction label of the rth training unit, and y_r is the target attribute of the rth training unit.
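  The two quantities just defined — the out-of-bag prediction label C_OOB and the ensemble error rate E(T) — can be computed directly from those definitions. A sketch under stated assumptions: the data layout, with `trained_on[j]` recording which training-unit indices grew classifier j, is an assumed encoding, not the patent's.

  ```python
  from collections import defaultdict

  def oob_prediction(r, x_r, classifiers, weights, trained_on):
      """C_OOB(r, x_r): weighted vote among the trees that did NOT use
      training unit r ('out-of-bag' trees for r)."""
      votes = defaultdict(float)
      for j, clf in enumerate(classifiers):
          if r in trained_on[j]:
              continue  # this tree saw unit r, so it does not vote
          votes[clf(x_r)] += weights[j]
      return max(votes, key=votes.get) if votes else None

  def oob_error_rate(training_set, classifiers, weights, trained_on):
      """E(T) = (1/M) * sum_r I(C_OOB(r, x_r) != y_r)."""
      M = len(training_set)
      wrong = sum(
          oob_prediction(r, x, classifiers, weights, trained_on) != y
          for r, (x, y) in enumerate(training_set)
      )
      return wrong / M
  ```

  With two constant classifiers of weights 1 and 2, each trained on a different single unit, the OOB vote for each unit comes from the other tree, which makes the behavior easy to check by hand.
  
  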
  • the device may further include: a selecting unit 34, a first obtaining unit 35, and a second acquiring unit 36.
  • the selecting unit 34 is configured to: after the generating module 322 generates the Nth tree classifier corresponding to the Nth training subset according to the preset policy, select an N'th training subset from the training set; wherein the intersection of the N'th training subset and the Nth training subset is empty, and the N'th training subset includes at least one training unit.
  • the first obtaining unit 35 is configured to acquire, according to the N'th training subset selected by the selecting unit 34, a misprediction rate of the Nth tree classifier.
  • the second acquiring unit 36 is configured to acquire the weight of the Nth tree classifier according to the misprediction rate acquired by the first obtaining unit 35.
  • the prediction unit 33 may include: a statistics module 331, a prediction module 332, and a third acquisition module 333.
  • the statistics module 331 is configured to collect attribute information of the faulty product.
  • the prediction module 332 is configured to predict, according to the attribute information collected by the statistics module 331, a defect of the faulty product by using the classifier set as a prediction model, to obtain a classification label set.
  • the third obtaining module 333 is configured to obtain, according to the classifier set and the weight of each tree classifier in the classifier set, a trust value of each classification label in the classification label set.
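  The trust value this module computes is, per the formulas later in this document, a weight-normalized vote share: UT(y) = (1/Z) · Σ_j h(ε_j) · I(C_j(x) = y), with Z = Σ_j h(ε_j). A minimal sketch, assuming `weights` holds the precomputed h(ε_j) values; all names are illustrative:

  ```python
  def trust_values(x, classifiers, weights):
      """Return {label: trust} for one faulty product x; trusts sum to 1.
      Each classifier casts its weight for the label it predicts, and
      the normalization factor Z is the total weight."""
      Z = sum(weights)
      trust = {}
      for clf, w in zip(classifiers, weights):
          y = clf(x)
          trust[y] = trust.get(y, 0.0) + w / Z
      return trust
  ```

  A label that no classifier predicts simply does not appear (trust 0), matching the remark in the description that UT(y) = 0 means the attribute information was not classified to y.
  
  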
  • An embodiment of the present invention provides a defect prediction apparatus, which selects a training attribute set from a pre-stored product failure record according to a target attribute, and combines the target attribute and the training attribute set into a training set to generate a classifier set including at least two tree classifiers.
  • the classifier set can be used as a prediction model to predict the defects of the faulty product. Using the classifier set as the prediction model solves the problem that a single decision tree easily causes over-fitting or under-fitting, which makes defect prediction for the faulty product impossible, and improves the accuracy of defect prediction while the defect of the faulty product is quickly located.
  • An embodiment of the present invention provides a defect prediction apparatus.
  • the apparatus includes: at least one processor 41, a memory 42, a communication interface 43, and a bus 44.
  • the at least one processor 41, the memory 42, and the communication interface 43 are connected through the bus 44 and communicate with each other, where:
  • the bus 44 may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like.
  • the bus 44 can be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only one thick line is shown in Figure 5, but it does not mean that there is only one bus or one type of bus.
  • the memory 42 is for storing executable program code, the program code including computer operating instructions.
  • the memory 42 may include a high speed RAM memory and may also include a non-volatile memory such as at least one disk memory.
  • the processor 41 may be a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention.
  • the communication interface 43 is mainly used to implement communication between devices of the embodiment.
  • the processor 41 executes the program code, and is configured to select a training attribute set from a pre-stored product fault record according to a target attribute, and combine the target attribute and the training attribute set into a training set; wherein the target attribute is a defect attribute of the historical faulty product; generate the classifier set according to the training set, wherein the classifier set includes at least two tree classifiers; and use the generated classifier set as a prediction model to predict a defect of the faulty product.
  • the training set includes M training units, and each training unit includes a target attribute and a training attribute set.
  • the processor 41 is further configured to: select a first training subset from the training set, and generate a first tree classifier corresponding to the first training subset according to a preset policy; select a second training subset from the training set, and generate a second tree classifier corresponding to the second training subset according to the preset policy; select an Nth training subset from the training set, and generate an Nth tree classifier corresponding to the Nth training subset according to the preset policy; and finally combine the N generated tree classifiers to generate the classifier set.
  • the Nth training subset includes M' training units, the M' is less than or equal to the M, and the N is an integer greater than or equal to 2.
  • the processor 41 is further configured to: when the (K-1)th tree classifier is generated, obtain an error rate of the generated K-1 tree classifiers, and when the Kth tree classifier is generated, obtain an error rate of the generated K tree classifiers, so that when the difference between the error rate of the K tree classifiers and the error rate of the K-1 tree classifiers is less than a preset threshold, the K tree classifiers are combined to generate the classifier set; wherein, the K is an integer less than or equal to N.
  • the processor 41 is further configured to: select, according to the first training unit, a first class tree classifier from the classifier set, and generate a first prediction label of the first training unit according to the first class tree classifier; select, according to the second training unit, a second class tree classifier from the classifier set, and generate a second prediction label of the second training unit according to the second class tree classifier; select, according to the Mth training unit, an Mth class tree classifier from the classifier set, and generate an Mth prediction label of the Mth training unit according to the Mth class tree classifier; and finally obtain, according to the generated M prediction labels, the error rate of the generated K tree classifiers.
  • the Mth class tree classifier is the set of those classifiers whose tree classifiers were generated without using the Mth training unit, and M is the number of training units included in the training set.
  • the processor 41 is further configured to: generate the Mth prediction label according to C_OOB(M, x_M) = argmax_{y ∈ Y_M^OOB} Σ_{C_j ∈ C_M^OOB} h(ε_j) · I(C_j(x_M) = y); wherein C_OOB(M, x_M) is the Mth prediction label of the Mth training unit, C_j is the jth tree classifier, C_M^OOB is the Mth class tree classifier, h(ε_j) is the weight of the jth tree classifier, C_j(x_M) is the target attribute obtained according to the jth tree classifier and the training attribute set included in the Mth training unit, and Y_M^OOB is the classification label set; and obtain the error rate of the generated K tree classifiers according to E(T) = (1/M) · Σ_{r=1..M} I(C_OOB(r, x_r) ≠ y_r); wherein E(T) is the error rate of the generated K tree classifiers, M is the number of training units in the training set, C_OOB(r, x_r) is the rth prediction label of the rth training unit, and y_r is the target attribute of the rth training unit.
  • the processor 41 is further configured to: after generating the Nth tree classifier corresponding to the Nth training subset according to the preset policy, select an N'th training subset from the training set, acquire a misprediction rate of the Nth tree classifier according to the N'th training subset, and acquire the weight of the Nth tree classifier according to the misprediction rate of the Nth tree classifier.
  • the intersection of the N'th training subset and the Nth training subset is empty, and the N'th training subset includes at least one training unit.
  • the processor 41 is further configured to collect attribute information of the faulty product, predict, according to the attribute information, a defect of the faulty product by using the classifier set as a prediction model to obtain a classification label set, and obtain a trust value of each classification label in the classification label set according to the classifier set and the weights of the tree classifiers in the classifier set.
  • the preset policy includes a decision tree algorithm.
  • An embodiment of the present invention provides a defect prediction apparatus, which selects a training attribute set from a pre-stored product failure record according to a target attribute, and combines the target attribute and the training attribute set into a training set to generate a classifier set including at least two tree classifiers.
  • the classifier set can be used as a prediction model to predict the defects of the faulty product. Using the classifier set as the prediction model solves the problem that a single decision tree easily causes over-fitting or under-fitting, which makes defect prediction for the faulty product impossible, and improves the accuracy of defect prediction while the defect of the faulty product is quickly located.
  • when the classifier set is used as the prediction model to predict the defect of the faulty product, multiple prediction results can also be obtained, and a trust value of each prediction result can be calculated, which saves the time for maintenance personnel to locate the defect.
  • the present invention can be implemented by software plus the necessary general-purpose hardware, and certainly also by hardware, but in many cases the former is the better implementation.
  • the technical solution of the present invention, in essence or the part that contributes to the prior art, may be embodied in the form of a software product stored in a readable storage medium, such as a floppy disk, hard disk, or optical disk of a computer, which includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.


Abstract

A defect prediction method and apparatus, relating to the field of data processing. The method includes: selecting a training attribute set from pre-stored product fault records according to a target attribute, and combining the target attribute and the training attribute set into a training set (101), where the target attribute is a defect attribute of historical faulty products; generating a classifier set according to the training set, the classifier set including at least two tree classifiers (102); and predicting defects of a faulty product by using the classifier set as a prediction model (103). The method is used in the defect prediction of faulty products to achieve accurate and rapid localization of defects.

Description

Defect Prediction Method and Apparatus

This application claims priority to Chinese Patent Application No. 201310066324.0, filed with the Chinese Patent Office on February 28, 2013 and entitled "Defect Prediction Method and Apparatus", which is incorporated herein by reference in its entirety.
Technical Field

The present invention relates to the field of data processing, and in particular, to a defect prediction method and apparatus.

Background
With the development of the times, the variety and quantity of products that can meet people's needs have gradually increased, and product quality has become a major concern of both users and enterprises. For an enterprise in particular, product quality is its very foundation, so reducing the product defect rate is crucial. Product defects are mainly caused by the production process, including product design, the quality of the materials used, the manufacturer's capability, and so on. Therefore, if an enterprise wants to reduce its product defect rate, it needs to analyze and improve the production process so as to raise product quality.
Every product has records of information about its various aspects, such as raw material sources, production information, test information, transportation information, usage information, and so on. When a certain type of defect or fault occurs during use or production, the factors causing the defect or fault are correlated to some degree with the recorded information about the product.
The prior art provides a faulty-product defect prediction method: the recorded information of products that have failed is used to generate a single decision tree through a decision-tree-based classification algorithm, so that when a product fails, its defect can be predicted from the generated decision tree. However, when the recorded information of failed products carries many classification labels, the single decision tree produced by a decision-tree-based classification algorithm tends to over-fit or under-fit, making defect prediction impossible. Therefore, when a product develops a defect or fault, how to quickly locate the fault point and find the cause of the fault has become a research focus in the industry.
Summary

Embodiments of the present invention provide a defect prediction method and apparatus, which achieve accurate and rapid localization of the defects of faulty products.
According to a first aspect of the present invention, a defect prediction method is provided, including:

selecting a training attribute set from pre-stored product fault records according to a target attribute, and combining the target attribute and the training attribute set into a training set, where the target attribute is a defect attribute of historical faulty products;

generating a classifier set according to the training set, where the classifier set includes at least two tree classifiers; and

predicting a defect of a faulty product by using the classifier set as a prediction model.
With reference to the first aspect, in a possible implementation, the training set includes M training units, and each training unit includes one target attribute and one training attribute set; and the generating a classifier set according to the training set includes:

selecting a first training subset from the training set;

generating, according to a preset policy, a first tree classifier corresponding to the first training subset;

selecting a second training subset from the training set;

generating, according to the preset policy, a second tree classifier corresponding to the second training subset;

selecting an Nth training subset from the training set, where the Nth training subset includes M' training units, and M' is less than or equal to M;

generating, according to the preset policy, an Nth tree classifier corresponding to the Nth training subset, where N is an integer greater than or equal to 2; and

combining the N tree classifiers to generate the classifier set.
With reference to the first aspect and the foregoing possible implementation, in another possible implementation, the method further includes:

when the (K-1)th tree classifier is generated, obtaining an error rate of the generated K-1 tree classifiers; and

when the Kth tree classifier is generated, obtaining an error rate of the generated K tree classifiers, so that when the difference between the error rate of the K tree classifiers and the error rate of the K-1 tree classifiers is less than a preset threshold, the K tree classifiers are combined to generate the classifier set, where K is an integer less than or equal to N.
With reference to the first aspect and the foregoing possible implementations, in another possible implementation, the obtaining an error rate of the generated K tree classifiers when the Kth tree classifier is generated includes:

selecting a first class tree classifier from the classifier set according to the first training unit, and generating a first prediction label of the first training unit according to the first class tree classifier;

selecting a second class tree classifier from the classifier set according to the second training unit, and generating a second prediction label of the second training unit according to the second class tree classifier;

selecting an Mth class tree classifier from the classifier set according to the Mth training unit, where the Mth class tree classifier is the set of those classifiers whose tree classifiers were generated without using the Mth training unit, and M is the number of training units included in the training set;

generating an Mth prediction label of the Mth training unit according to the Mth class tree classifier; and

obtaining the error rate of the generated K tree classifiers according to the M prediction labels.

With reference to the first aspect and the foregoing possible implementations, in another possible implementation, the generating an Mth prediction label of the Mth training unit according to the Mth class tree classifier specifically includes:

generating the Mth prediction label according to C_OOB(M, x_M) = argmax_{y ∈ Y_M^OOB} Σ_{C_j ∈ C_M^OOB} h(ε_j) · I(C_j(x_M) = y), where C_OOB(M, x_M) is the Mth prediction label of the Mth training unit, C_j is the jth tree classifier, C_M^OOB is the Mth class tree classifier, h(ε_j) is the weight of the jth tree classifier, C_j(x_M) is the target attribute obtained according to the jth tree classifier and the training attribute set included in the Mth training unit, and Y_M^OOB is the classification label set.

With reference to the first aspect and the foregoing possible implementations, in another possible implementation, the obtaining the error rate of the generated K tree classifiers according to the M prediction labels specifically includes:

obtaining the error rate of the generated K tree classifiers according to E(T) = (1/M) · Σ_{r=1..M} I(C_OOB(r, x_r) ≠ y_r), where E(T) is the error rate of the generated K tree classifiers, M is the number of training units in the training set, C_OOB(r, x_r) is the rth prediction label of the rth training unit, and y_r is the target attribute of the rth training unit.
With reference to the first aspect and the foregoing possible implementations, in another possible implementation, after the generating, according to the preset policy, an Nth tree classifier corresponding to the Nth training subset, the method further includes:

selecting an N'th training subset from the training set, where the intersection of the N'th training subset and the Nth training subset is empty, and the N'th training subset includes at least one training unit;

obtaining a misprediction rate of the Nth tree classifier according to the N'th training subset; and

obtaining a weight of the Nth tree classifier according to the misprediction rate of the Nth tree classifier.
With reference to the first aspect and the foregoing possible implementations, in another possible implementation, the predicting a defect of a faulty product by using the classifier set as a prediction model includes:

collecting attribute information of the faulty product;

predicting, according to the attribute information, the defect of the faulty product by using the classifier set as the prediction model, to obtain a classification label set; and

obtaining a trust value of each classification label in the classification label set according to the classifier set and the weight of each tree classifier in the classifier set.
With reference to the first aspect and the foregoing possible implementations, in another possible implementation, the preset policy includes a decision tree algorithm.
According to a second aspect of the present invention, a defect prediction apparatus is provided, including:

a processing unit, configured to select a training attribute set from pre-stored product fault records according to a target attribute, and combine the target attribute and the training attribute set into a training set, where the target attribute is a defect attribute of historical faulty products;

a generating unit, configured to generate a classifier set according to the training set obtained by the processing unit, where the classifier set includes at least two tree classifiers; and

a prediction unit, configured to predict a defect of a faulty product by using the classifier set generated by the generating unit as a prediction model.
With reference to the second aspect, in a possible implementation, the training set includes M training units, and each training unit includes one target attribute and one training attribute set; and the generating unit includes:

a selecting module, configured to select a first training subset from the training set obtained by the processing unit;

a generating module, configured to generate, according to a preset policy, a first tree classifier corresponding to the first training subset selected by the selecting module;

the selecting module is further configured to select a second training subset from the training set obtained by the processing unit;

the generating module is further configured to generate, according to the preset policy, a second tree classifier corresponding to the second training subset selected by the selecting module;

the selecting module is further configured to select an Nth training subset from the training set obtained by the processing unit, where the Nth training subset includes M' training units, and M' is less than or equal to M;

the generating module is further configured to generate, according to the preset policy, an Nth tree classifier corresponding to the Nth training subset selected by the selecting module, where N is an integer greater than or equal to 2; and

a combining module, configured to combine the N tree classifiers generated by the generating module to generate the classifier set.
With reference to the second aspect and the foregoing possible implementation, in another possible implementation, the generating unit further includes:

a first obtaining module, configured to obtain an error rate of the generated K-1 tree classifiers when the (K-1)th tree classifier is generated; and

a second obtaining module, configured to obtain an error rate of the generated K tree classifiers when the Kth tree classifier is generated, so that when the difference between the error rate of the K tree classifiers and the error rate of the K-1 tree classifiers is less than a preset threshold, the K tree classifiers are combined to generate the classifier set, where K is an integer less than or equal to N.

With reference to the second aspect and the foregoing possible implementations, in another possible implementation, the second obtaining module includes:

a selecting submodule, configured to select a first class tree classifier from the classifier set according to the first training unit;

a generating submodule, configured to generate a first prediction label of the first training unit according to the first class tree classifier selected by the selecting submodule;

the selecting submodule is further configured to select a second class tree classifier from the classifier set according to the second training unit;

the generating submodule is further configured to generate a second prediction label of the second training unit according to the second class tree classifier selected by the selecting submodule;

the selecting submodule is further configured to select an Mth class tree classifier from the classifier set according to the Mth training unit, where the Mth class tree classifier is the set of those classifiers whose tree classifiers were generated without using the Mth training unit, and M is the number of training units included in the training set;

the generating submodule is further configured to generate an Mth prediction label of the Mth training unit according to the Mth class tree classifier selected by the selecting submodule; and

an obtaining submodule, configured to obtain the error rate of the generated K tree classifiers according to the M prediction labels generated by the generating submodule.
With reference to the second aspect and the foregoing possible implementations, in another possible implementation, the generating submodule is specifically configured to:

generate the Mth prediction label according to C_OOB(M, x_M) = argmax_{y ∈ Y_M^OOB} Σ_{C_j ∈ C_M^OOB} h(ε_j) · I(C_j(x_M) = y), where C_OOB(M, x_M) is the Mth prediction label of the Mth training unit, C_j is the jth tree classifier, C_M^OOB is the Mth class tree classifier, h(ε_j) is the weight of the jth tree classifier, C_j(x_M) is the target attribute obtained according to the jth tree classifier and the training attribute set included in the Mth training unit, and Y_M^OOB is the classification label set.

With reference to the second aspect and the foregoing possible implementations, in another possible implementation, the obtaining submodule is specifically configured to:

obtain the error rate of the K tree classifiers generated by the generating submodule according to E(T) = (1/M) · Σ_{r=1..M} I(C_OOB(r, x_r) ≠ y_r), where E(T) is the error rate of the generated K tree classifiers, M is the number of training units in the training set, C_OOB(r, x_r) is the rth prediction label of the rth training unit, and y_r is the target attribute of the rth training unit.
With reference to the second aspect and the foregoing possible implementations, in another possible implementation, the apparatus further includes:

a selecting unit, configured to select an N'th training subset from the training set after the generating module generates, according to the preset policy, the Nth tree classifier corresponding to the Nth training subset, where the intersection of the N'th training subset and the Nth training subset is empty, and the N'th training subset includes at least one training unit;

a first obtaining unit, configured to obtain a misprediction rate of the Nth tree classifier according to the N'th training subset selected by the selecting unit; and

a second obtaining unit, configured to obtain a weight of the Nth tree classifier according to the misprediction rate of the Nth tree classifier obtained by the first obtaining unit.
With reference to the second aspect and the foregoing possible implementations, in another possible implementation, the prediction unit includes:

a statistics module, configured to collect attribute information of the faulty product;

a prediction module, configured to predict, according to the attribute information collected by the statistics module, the defect of the faulty product by using the classifier set as a prediction model, to obtain a classification label set; and

a third obtaining module, configured to obtain a trust value of each classification label in the classification label set according to the classifier set and the weight of each tree classifier in the classifier set.

With reference to the second aspect and the foregoing possible implementations, in another possible implementation, the preset policy includes a decision tree algorithm.
The defect prediction method and apparatus provided by the embodiments of the present invention select a training attribute set from pre-stored product fault records according to a target attribute, combine the target attribute and the training attribute set into a training set, and generate from it a classifier set including at least two tree classifiers. When a product fails, the classifier set can be used as a prediction model to predict the defect of the faulty product. Using the classifier set as the prediction model solves the problem that a single decision tree easily causes over-fitting or under-fitting and thus makes defect prediction impossible, and improves the accuracy of defect prediction while achieving rapid localization of the defects of faulty products.
Brief Description of the Drawings

To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Apparently, the accompanying drawings in the following description show merely some embodiments of the present invention, and a person of ordinary skill in the art may still derive other drawings from them without creative effort.

FIG. 1 is a flowchart of a defect prediction method according to Embodiment 1 of the present invention; FIG. 2 is a flowchart of a defect prediction method according to Embodiment 2 of the present invention; FIG. 3 is a schematic composition diagram of a defect prediction apparatus according to Embodiment 3 of the present invention; FIG. 4 is a schematic composition diagram of another defect prediction apparatus according to Embodiment 3 of the present invention; FIG. 5 is a schematic composition diagram of a defect prediction apparatus according to Embodiment 4 of the present invention.

Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Apparently, the described embodiments are merely some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Embodiment 1

An embodiment of the present invention provides a defect prediction method. As shown in FIG. 1, the method may include:

101. Select a training attribute set from pre-stored product fault records according to a target attribute, and combine the target attribute and the training attribute set into a training set.

When a product fails, fault inspectors generally hope to quickly locate the defect type of the faulty product or the component that caused the fault, so as to save repair time for maintenance personnel. Rapid localization of the defect type or the faulty component can be achieved by training a prediction model in advance. First, the inspectors can collect the information of products that have failed during production or use and record it in product fault records. When the prediction model is trained, the attributes necessary for building the model can then be selected from these pre-recorded product fault records according to the defect attribute of historical faulty products, forming the training attribute set. The defect attribute of historical faulty products is defined as the target attribute. After the training attribute set is selected according to the target attribute, the target attribute and the training attribute set are combined into a training set. Specifically, the training set may include multiple training units, each of which includes one target attribute and one training attribute set.

102. Generate a classifier set according to the training set, where the classifier set includes at least two tree classifiers.

After the required training attribute set has been selected according to the target attribute and combined with the target attribute into a training set, a classifier set can be generated from the training set. Specifically, the classifier set includes at least two tree classifiers; each tree classifier is generated according to a preset policy, and all the generated tree classifiers together form the classifier set. The preset policy may be a decision tree algorithm or the like.

103. Predict a defect of a faulty product by using the classifier set as a prediction model.

If a product fails during production or use, the defect of the faulty product can be located quickly and accurately from the generated classifier set including at least one tree classifier.

The defect prediction method provided by this embodiment of the present invention selects a training attribute set from pre-stored product fault records according to a target attribute, and generates, from the training set combining the target attribute and the training attribute set, a classifier set including at least two tree classifiers. When a product fails, the classifier set can be used as a prediction model to predict its defect. Using the classifier set as the prediction model solves the problem that a single decision tree easily causes over-fitting or under-fitting and thus makes defect prediction impossible, and improves the accuracy of defect prediction while achieving rapid localization of the defects of faulty products.
Embodiment 2

An embodiment of the present invention provides a defect prediction method. As shown in FIG. 2, the method may include:

201. Select a training attribute set from pre-stored product fault records according to a target attribute, and combine the target attribute and the training attribute set into a training set.
Specifically, when a product fails during production or use, fault inspectors generally hope to quickly locate the defect type of the faulty product or the faulty component. For any product, the occurrence of a fault or defect is correlated to some degree with the objective information of that product, for example, its model, usage environment, raw material sources, and so on. To quickly locate the defect type or the faulty component when a product fails, the attributes needed to build a prediction model can be selected from the product fault records of products that have failed during production or use, and the selected attributes form a training set from which the prediction model is built.

The first task is to collect the attribute information of products that have failed during production or use and record the attribute information of each faulty product. The attribute information can be divided into the following categories: attributes describing product characteristics, attributes describing the usage environment, attributes describing the production process, and defect attributes. The attributes describing product characteristics may be the product name, product model, components, and so on; the attributes describing the usage environment may be the usage period, usage location, usage climate, and so on; the attributes describing the production process may be the production date, processing department, test records, and so on; and the defect attributes may be the defect type, defect phenomenon, defect root cause, defective component, and so on.

It should be noted that this embodiment of the present invention places no restriction on the categorization of the recorded attribute information of faulty products, on the kinds of attribute information recorded under each category, or on the form in which the attribute information is recorded.

Next, since a great deal of attribute information is recorded for a faulty product, and some attributes are not necessary for building the prediction model — that is, some attributes contribute little to judging the defect of a faulty product — the next task is to screen the attribute information of faulty products. It can be understood that the defect attribute in the recorded attribute information of historical faulty products is also very likely to be the attribute that needs to be predicted for products that fail in the future. Therefore, for ease of understanding by a person skilled in the art, the defect attribute of historical faulty products is called the target attribute, and the attributes selected for their strong correlation with it are called the training attribute set; the target attribute and the training attribute set can be combined into a training set, which can then be used to build the prediction model. The screening process may specifically be: for the target attribute, screen the recorded attribute information and select X attributes to form the training attribute set, where X may cover all of the recorded attributes or a single attribute. For example, if the defect attribute of historical faulty products is the defect type, the target attribute can be defined as Y = {defect type}. Suppose the recorded attribute information of faulty products includes: product name, product model, components, usage period, usage location, usage climate, production date, processing department, test records, defect type, defect phenomenon, defect root cause, and defective component; a preset rule can then be used to select, from the attribute information in the historical fault records, the attributes needed to build the prediction model. If the selected attributes are product name, production date, processing department, and usage period, the training attribute set can be defined as X = {product name, production date, processing department, usage period}, and the training set as T = {product name, production date, processing department, usage period, defect type}. After the target attribute and the training attribute set are selected, the attributes corresponding to multiple faulty products can be taken from the historical fault records according to the target attribute and the training attribute set to generate the training set, which includes M training units, each containing the target attribute and training attribute set of one historical faulty product. There are two requirements for selecting the attributes of the training attribute set. First, the prediction model built from the training attribute set to predict the target attribute must have high accuracy; this can be met by repeatedly selecting different training attribute sets for the target attribute to form training sets, verifying the accuracy of the prediction models built from the differently generated training sets, and choosing the most accurate one as the training set for building the prediction model. The target attribute of a faulty product with a known defect can be removed, and the attribute information of that product from production and manufacturing can be used as test data to check the accuracy of the generated tree classifiers. Second, the attributes in the training attribute set must be obtainable before the faulty product is inspected; for example, among the recorded attribute information above, the defective component cannot serve as an attribute of the training attribute set, because before fault inspection it is not known which component of the faulty product has failed.

It should be noted that the specific selection rule for the training attribute set may be a traversal method, or the correlation with the target attribute may be computed and the top X most correlated attributes selected as the training attribute set. Selection by computing the correlation with the target attribute is the more common method, and there are many algorithms for computing correlation; one of the simplest is to count the frequency with which each attribute and the target attribute occur together — the higher the co-occurrence frequency, the greater the correlation. This embodiment of the present invention places no restriction on the selection method for the training attribute set or on the algorithms used by particular methods.
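  The simplest correlation measure just mentioned — co-occurrence frequency between each candidate attribute and the target attribute, keeping the top X — can be sketched as follows. The record-as-dict encoding and the function name are illustrative assumptions, not the patent's implementation:

  ```python
  from collections import Counter

  def top_attributes_by_cooccurrence(records, target, x):
      """Score each candidate attribute by how often it is recorded in the
      same fault record as the target attribute; keep the top x.
      `records` is a list of {attribute: value} dicts; an attribute
      co-occurs with the target when both appear in one record."""
      counts = Counter()
      for rec in records:
          if target not in rec:
              continue  # a record without the target contributes nothing
          for attr in rec:
              if attr != target:
                  counts[attr] += 1
      return [attr for attr, _ in counts.most_common(x)]
  ```

  Attributes that never appear alongside the target (or appear only in records lacking it) are naturally ranked last or excluded.
  
  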
202. Generate a classifier set according to the training set, where the classifier set includes at least two tree classifiers.

After the training attribute set has been selected from the pre-stored product fault records according to the target attribute and combined into a training set, the classifier set can be generated from the training set. It can be understood that the training set composed of the target attribute and the training attribute set may include M training units, each containing one target attribute and one training attribute set, that is, the training set T = {(x_r, y_r), r = 1, 2, ..., M}, where (x_1, y_1) is the first training unit.

Generating a classifier set C = {C_j, j = 1, 2, ...} from the training set T = {(x_r, y_r), r = 1, 2, ..., M} may specifically be divided into the following steps 202a, 202b, and 202c:

202a. Select an Nth training subset from the training set, where N is an integer greater than or equal to 2.

An Nth training subset is selected from the training set T = {(x_r, y_r), r = 1, 2, ..., M}; the Nth training subset includes M' training units, where M' is less than or equal to M. The selection method may be random sampling with replacement, which is not limited in this embodiment of the present invention. For example, a first training subset, a second training subset, ..., and an Nth training subset may be selected from the training set.

202b. Generate, according to a preset policy, an Nth tree classifier corresponding to the Nth training subset.

After the Nth training subset is selected from the training set, the Nth tree classifier corresponding to it can be generated according to the preset policy. The preset policy may be a tree-growing algorithm, which can be understood specifically as follows: the Nth training subset selected from the training set is taken as the root node; a splitting attribute and a splitting predicate are chosen according to a splitting algorithm, and the root node is split accordingly into two branches; the attributes in each branch can be chosen using an attribute selection strategy, and each branch is then split further according to the splitting algorithm; these steps are repeated until the final branches can determine the target attribute; finally, the generated tree classifier is checked according to a tree pruning strategy. For example, for the training set T = {product name, production date, processing department, usage period, defect type} containing M training units, the Nth training subset is a set containing M' training units and is taken as the root node. Suppose the splitting algorithm chooses the usage period as the splitting attribute, with the splitting predicates "usage period greater than 50 days" and "usage period less than or equal to 50 days"; the root node can then be split into two branches according to the splitting attribute and predicates, and further splitting attributes and predicates can be chosen to continue splitting until the target attribute can be determined.

The splitting algorithms used in the above tree-classifier generation include but are not limited to the information entropy test, the Gini index test, the chi-square test, and the gain ratio test; attribute selection may include random single-attribute selection and random multi-attribute selection, and the attribute selection strategy is not limited in this embodiment of the present invention; tree pruning strategies include but are not limited to pre-pruning and post-pruning.

202c. Repeat steps 202a and 202b to generate N tree classifiers, and combine the N tree classifiers to generate the classifier set.

The number N of tree classifiers generated in this embodiment of the present invention may be a preset threshold; that is, when the number of generated tree classifiers reaches the preset threshold, the N generated tree classifiers are combined into the classifier set — for example, when the preset threshold N is 5, the classifier set is C = {C_1, C_2, C_3, C_4, C_5}. When to form the classifier set may also be decided by computing the difference between the error rate of the K generated tree classifiers and the error rate of the K-1 generated tree classifiers. Specifically, when the (K-1)th tree classifier is generated, the error rate of the K-1 generated tree classifiers can be computed, and when the Kth tree classifier is generated, the error rate of the K generated tree classifiers is computed; when the computed difference between the two error rates is less than a preset threshold, the K generated tree classifiers are combined into the classifier set, where K is an integer less than or equal to N.

When the Kth tree classifier is generated, the error rate of the K generated tree classifiers is computed as follows: for every training unit in the training set, compute its prediction label, and obtain the error rate of the K generated tree classifiers from the prediction labels. Specifically, a first class tree classifier is selected from the classifier set according to the first training unit, and the first prediction label of the first training unit is generated according to the first class tree classifier; a second class tree classifier is selected from the classifier set according to the second training unit, and the second prediction label of the second training unit is generated according to the second class tree classifier; ...; an Mth class tree classifier is selected from the classifier set according to the Mth training unit, and the Mth prediction label of the Mth training unit is generated according to the Mth class tree classifier. These steps are repeated until a prediction label has been computed for every training unit in the training set, and finally the error rate of the K generated tree classifiers is obtained from the M computed prediction labels. Here the Mth class tree classifier is the set of those classifiers whose tree classifiers were generated without using the Mth training unit.
The prediction label is computed as follows. For the rth training unit in the training set (where r is a positive integer greater than 0 and less than or equal to M), the tree classifiers in the classifier set can be divided into two classes: those generated using the rth training unit, and those generated without using it. The tree classifiers generated without using the rth training unit form a set called the rth class tree classifier, denoted C_r^OOB. The rth prediction label of the rth training unit is then computed as:

C_OOB(r, x_r) = argmax_{y ∈ Y_r^OOB} Σ_{C_j ∈ C_r^OOB} h(ε_j) · I(C_j(x_r) = y)

where C_OOB(r, x_r) is the rth prediction label of the rth training unit, C_j is the jth tree classifier, C_r^OOB is the rth class tree classifier, h(ε_j) is the weight of the jth tree classifier, C_j(x_r) is the target attribute obtained according to the jth tree classifier and the training attribute set included in the rth training unit, y is a classification label, Y_r^OOB is the classification label set obtained from the rth training unit and the classifier set, and I(x) is the indicator function: I(true) = 1, I(false) = 0.

The error rate of the K generated tree classifiers is computed as:

E(T) = (1/M) · Σ_{r=1..M} I(C_OOB(r, x_r) ≠ y_r)

where E(T) is the error rate of the K generated tree classifiers, M is the number of training units in the training set, C_OOB(r, x_r) is the rth prediction label of the rth training unit, y_r is the target attribute of the rth training unit, and I(x) is the indicator function: I(true) = 1, I(false) = 0.
The weight of the jth tree classifier is computed as follows: a j'th training subset is selected from the training set, the misprediction rate of the jth tree classifier is obtained from the j'th training subset, and the weight of the jth tree classifier is obtained from its misprediction rate. The intersection of the j'th training subset and the jth training subset is empty, and the j'th training subset includes at least one training unit. Specifically, the j'th training subset is recorded as T' = {(x'_r, y'_r), r = 1, 2, ..., N'}, where T' ∩ T_j = ∅ and T_j is the jth training subset used to generate the jth tree classifier. The misprediction rate of the jth tree classifier is computed as:

ε_j = (1/N') · Σ_{r=1..N'} I(C_j(x'_r) ≠ y'_r)

where ε_j is the misprediction rate of the jth tree classifier, N' is the number of training units in the j'th training subset, I(x) is the indicator function: I(true) = 1, I(false) = 0, C_j(x'_r) is the target attribute obtained according to the jth tree classifier and the training attribute set included in the rth training unit, and y'_r is the target attribute included in the rth training unit.

The weight of the jth tree classifier is given by h(ε_j), where h(x) = 1 − x or h(x) = log((1 − x)/x).
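  The weight computation above — misprediction rate ε_j on a held-out subset disjoint from the tree's own training subset, then h(ε_j) — can be sketched as follows. Note that the fraction inside the log is garbled in the extracted text; the log-odds form log((1 − x)/x) used here is an assumption, as are the function names:

  ```python
  import math

  def misprediction_rate(clf, holdout):
      """epsilon_j: fraction of held-out units the classifier gets wrong.
      `holdout` is a list of (attributes, target_label) pairs disjoint
      from the subset that grew the classifier."""
      return sum(clf(x) != y for x, y in holdout) / len(holdout)

  def weight(eps, style="linear"):
      """h(x) = 1 - x, or (assumed reading) h(x) = log((1 - x) / x)."""
      if style == "linear":
          return 1.0 - eps
      return math.log((1.0 - eps) / eps)
  ```

  Both variants give better-than-chance trees (ε < 0.5) a larger weight; the log form additionally drives the weight of a chance-level tree (ε = 0.5) to zero.
  
  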
203. Collect the attribute information of the faulty product.

When the defect of a faulty product needs to be predicted, the attribute information of the faulty product can first be collected. This attribute information is data obtained during the production and use of the faulty product and may include: product name, product model, components, usage period, usage location, production date, processing department, and so on.

204. Predict, according to the attribute information, the defect of the faulty product by using the classifier set as a prediction model, to obtain a classification label set.

After the attribute information of the faulty product has been collected, the classifier set trained in advance can be used as a prediction model, together with the collected attribute information, to predict the defect of the faulty product. Since the generated classifier set contains N tree classifiers, predicting the defect of the faulty product with this classifier set yields multiple prediction results, and these results form the classification label set. With the defect prediction method provided by this embodiment of the present invention, not only can the defect of a faulty product be predicted, but multiple prediction results are also available for maintenance personnel to consult: if, when checking the faulty product against the first prediction result, maintenance personnel find that it is not the actual defect, they can pick other prediction results from the classification label set to check the product, until the real defect is found, which saves maintenance personnel time.
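  The repair workflow described for step 204 — check the most likely defect first and fall back to the next candidate on a miss — amounts to ordering the classification label set by trust value (the quantity computed in step 205). A small sketch; the function name is illustrative:

  ```python
  def ranked_candidates(trust):
      """Order candidate defect labels so a technician can check the most
      trusted prediction first and fall back to the next on a miss.
      `trust` is a {label: trust_value} dict; zero-trust labels are
      dropped, since no classifier voted for them."""
      return sorted((y for y, t in trust.items() if t > 0),
                    key=lambda y: trust[y], reverse=True)
  ```

  The technician then walks the returned list in order until the real defect is confirmed.
  
  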
205. Obtain a trust value of each classification label in the classification label set according to the classifier set and the weights of the tree classifiers in the classifier set.

After the classification label set has been obtained from the collected attribute information of the faulty product, the trust value of each classification label in the classification label set can also be computed, according to the classifier set and the weights of the tree classifiers in it, so that maintenance personnel can locate the defect even faster. The trust value of a classification label is computed as:

UT(y) = (1/Z) · Σ_{C_j ∈ C} h(ε_j) · I(C_j(x) = y)

where y is a classification label in the classification label set, UT(y) is the trust value of the classification label y, Z is a normalization factor, Z = Σ_j h(ε_j), h(ε_j) is the weight of the jth tree classifier, I(x) is the indicator function: I(true) = 1, I(false) = 0, and C_j(x) is the target attribute of the faulty product predicted by the jth tree classifier.

If the formula yields UT(y) = 0, the attribute information was not classified to y; moreover, the possible defect classification labels for x are defined as {y ∈ Y | UT(y) > 0}.
The defect prediction method provided by this embodiment of the present invention selects a training attribute set from pre-stored product fault records according to a target attribute, and generates, from the training set combining the target attribute and the training attribute set, a classifier set including at least two tree classifiers. When a product fails, the classifier set can be used as a prediction model to predict its defect. Using the classifier set as the prediction model solves the problem that a single decision tree easily causes over-fitting or under-fitting and thus makes defect prediction impossible, and improves the accuracy of defect prediction while achieving rapid localization of the defects of faulty products.

Moreover, when the classifier set is used as the prediction model to predict the defect of a faulty product, multiple prediction results can be obtained and the trust value of each prediction result can be computed, which saves the time maintenance personnel spend locating the defect.

Embodiment 3
An embodiment of the present invention provides a defect prediction apparatus. As shown in FIG. 3, the apparatus includes: a processing unit 31, a generating unit 32, and a prediction unit 33.

The processing unit 31 is configured to select a training attribute set from pre-stored product fault records according to a target attribute, and combine the target attribute and the training attribute set into a training set, where the target attribute is a defect attribute of historical faulty products.

The generating unit 32 is configured to generate a classifier set according to the training set obtained by the processing unit 31, where the classifier set includes at least two tree classifiers.

The prediction unit 33 is configured to predict a defect of a faulty product by using the classifier set generated by the generating unit 32 as a prediction model.

Further, the training set includes M training units, and each training unit includes one target attribute and one training attribute set.
Further, as shown in FIG. 4, the generating unit 32 may include: a selecting module 321, a generating module 322, and a combining module 323.

The selecting module 321 is configured to select a first training subset from the training set obtained by the processing unit 31.

The generating module 322 is configured to generate, according to a preset policy, a first tree classifier corresponding to the first training subset selected by the selecting module 321.

The selecting module 321 is further configured to select a second training subset from the training set obtained by the processing unit 31.

The generating module 322 is further configured to generate, according to the preset policy, a second tree classifier corresponding to the second training subset selected by the selecting module 321.

The selecting module 321 is further configured to select an Nth training subset from the training set obtained by the processing unit 31, where the Nth training subset includes M' training units, and M' is less than or equal to M.

The generating module 322 is further configured to generate, according to the preset policy, an Nth tree classifier corresponding to the Nth training subset selected by the selecting module 321, where N is an integer greater than or equal to 2.

The combining module 323 is configured to combine the N tree classifiers generated by the generating module 322 to generate the classifier set.
Further, the generating unit 32 may further include: a first obtaining module 324 and a second obtaining module 325.

The first obtaining module 324 is configured to obtain an error rate of the generated K-1 tree classifiers when the (K-1)th tree classifier is generated.

The second obtaining module 325 is configured to obtain an error rate of the generated K tree classifiers when the Kth tree classifier is generated, so that when the difference between the error rate of the K tree classifiers and the error rate of the K-1 tree classifiers is less than a preset threshold, the K tree classifiers are combined to generate the classifier set, where K is an integer less than or equal to N.

Further, the second obtaining module 325 may include: a selecting submodule 3251, a generating submodule 3252, and an obtaining submodule 3253.

The selecting submodule 3251 is configured to select a first class tree classifier from the classifier set according to the first training unit.

The generating submodule 3252 is configured to generate a first prediction label of the first training unit according to the first class tree classifier selected by the selecting submodule 3251.

The selecting submodule 3251 is further configured to select a second class tree classifier from the classifier set according to the second training unit.

The generating submodule 3252 is further configured to generate a second prediction label of the second training unit according to the second class tree classifier selected by the selecting submodule 3251.

The selecting submodule 3251 is further configured to select an Mth class tree classifier from the classifier set according to the Mth training unit, where the Mth class tree classifier is the set of those classifiers whose tree classifiers were generated without using the Mth training unit, and M is the number of training units included in the training set.

The generating submodule 3252 is further configured to generate an Mth prediction label of the Mth training unit according to the Mth class tree classifier selected by the selecting submodule 3251.

The obtaining submodule 3253 is configured to obtain the error rate of the generated K tree classifiers according to the M prediction labels generated by the generating submodule 3252.
Further, the generating submodule 3252 is specifically configured to generate the Mth prediction label according to C_OOB(M, x_M) = argmax_{y ∈ Y_M^OOB} Σ_{C_j ∈ C_M^OOB} h(ε_j) · I(C_j(x_M) = y), where C_OOB(M, x_M) is the Mth prediction label of the Mth training unit, C_j is the jth tree classifier, C_M^OOB is the Mth class tree classifier, h(ε_j) is the weight of the jth tree classifier, C_j(x_M) is the target attribute obtained according to the jth tree classifier and the training attribute set included in the Mth training unit, and Y_M^OOB is the classification label set.

Further, the obtaining submodule 3253 is specifically configured to obtain the error rate of the K tree classifiers generated by the generating submodule 3252 according to E(T) = (1/M) · Σ_{r=1..M} I(C_OOB(r, x_r) ≠ y_r), where E(T) is the error rate of the generated K tree classifiers, M is the number of training units in the training set, C_OOB(r, x_r) is the rth prediction label of the rth training unit, and y_r is the target attribute of the rth training unit.
Further, the apparatus may further include: a selecting unit 34, a first obtaining unit 35, and a second obtaining unit 36.

The selecting unit 34 is configured to select an N'th training subset from the training set after the generating module 322 generates, according to the preset policy, the Nth tree classifier corresponding to the Nth training subset, where the intersection of the N'th training subset and the Nth training subset is empty, and the N'th training subset includes at least one training unit.

The first obtaining unit 35 is configured to obtain a misprediction rate of the Nth tree classifier according to the N'th training subset selected by the selecting unit 34.

The second obtaining unit 36 is configured to obtain a weight of the Nth tree classifier according to the misprediction rate of the Nth tree classifier obtained by the first obtaining unit 35.

Further, the prediction unit 33 may include: a statistics module 331, a prediction module 332, and a third obtaining module 333.

The statistics module 331 is configured to collect attribute information of the faulty product.

The prediction module 332 is configured to predict, according to the attribute information collected by the statistics module 331, the defect of the faulty product by using the classifier set as a prediction model, to obtain a classification label set.

The third obtaining module 333 is configured to obtain a trust value of each classification label in the classification label set according to the classifier set and the weight of each tree classifier in the classifier set.
The defect prediction apparatus provided by this embodiment of the present invention selects a training attribute set from pre-stored product fault records according to a target attribute, and generates, from the training set combining the target attribute and the training attribute set, a classifier set including at least two tree classifiers. When a product fails, the classifier set can be used as a prediction model to predict its defect. Using the classifier set as the prediction model solves the problem that a single decision tree easily causes over-fitting or under-fitting and thus makes defect prediction impossible, and improves the accuracy of defect prediction while achieving rapid localization of the defects of faulty products.

Moreover, when the classifier set is used as the prediction model to predict the defect of a faulty product, multiple prediction results can be obtained and the trust value of each prediction result can be computed, which saves the time maintenance personnel spend locating the defect.

Embodiment 4
An embodiment of the present invention provides a defect prediction apparatus. As shown in FIG. 5, the apparatus includes: at least one processor 41, a memory 42, a communication interface 43, and a bus 44. The at least one processor 41, the memory 42, and the communication interface 43 are connected through the bus 44 and communicate with each other, where:

The bus 44 may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus 44 can be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is shown in FIG. 5, but this does not mean that there is only one bus or one type of bus.

The memory 42 is configured to store executable program code, including computer operation instructions. The memory 42 may include a high-speed RAM memory, and may also include a non-volatile memory, for example, at least one disk memory.

The processor 41 may be a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention.

The communication interface 43 is mainly configured to implement communication between the devices of this embodiment.

The processor 41 executes the program code, and is configured to select a training attribute set from pre-stored product fault records according to a target attribute and combine the target attribute and the training attribute set into a training set, where the target attribute is a defect attribute of historical faulty products; generate a classifier set according to the training set, where the classifier set includes at least two tree classifiers; and predict a defect of a faulty product by using the generated classifier set as a prediction model.
Further, the training set includes M training units, and each training unit includes one target attribute and one training attribute set. The processor 41 is further configured to: select a first training subset from the training set, and generate, according to a preset policy, a first tree classifier corresponding to the first training subset; select a second training subset from the training set, and generate, according to the preset policy, a second tree classifier corresponding to the second training subset; select an Nth training subset from the training set, and generate, according to the preset policy, an Nth tree classifier corresponding to the Nth training subset; and finally combine the N generated tree classifiers to generate the classifier set, where the Nth training subset includes M' training units, M' is less than or equal to M, and N is an integer greater than or equal to 2.
Further, the processor 41 is further configured to: when the (K-1)th tree classifier is generated, obtain an error rate of the generated K-1 tree classifiers, and when the Kth tree classifier is generated, obtain an error rate of the generated K tree classifiers, so that when the difference between the error rate of the K tree classifiers and the error rate of the K-1 tree classifiers is less than a preset threshold, the K tree classifiers are combined to generate the classifier set, where K is an integer less than or equal to N.

Further, the processor 41 is further configured to: select a first class tree classifier from the classifier set according to the first training unit, and generate a first prediction label of the first training unit according to the first class tree classifier; select a second class tree classifier from the classifier set according to the second training unit, and generate a second prediction label of the second training unit according to the second class tree classifier; select an Mth class tree classifier from the classifier set according to the Mth training unit, and generate an Mth prediction label of the Mth training unit according to the Mth class tree classifier; and finally obtain the error rate of the generated K tree classifiers according to the M generated prediction labels, where the Mth class tree classifier is the set of those classifiers whose tree classifiers were generated without using the Mth training unit, and M is the number of training units included in the training set.
Further, the processor 41 is further configured to generate the M-th predicted label according to

C^OOB(M, x_M) = argmax_{y ∈ Y} Σ_{j: C_j ∈ Q_M^OOB} w_j · I(C_j(x_M) = y)

where C^OOB(M, x_M) is the M-th predicted label of the M-th training unit, C_j is the j-th tree classifier, Q_M^OOB is the M-th-class tree classifier set, w_j is the weight of the j-th tree classifier, C_j(x_M) is the target attribute obtained from the j-th tree classifier and the training attribute set contained in the M-th training unit, and Y is the classification label set; and to acquire the error rate of the K generated tree classifiers according to

E = (1/M) Σ_{r=1}^{M} I(C^OOB(r, x_r) ≠ y_r)

where E is the error rate of the K generated tree classifiers, M is the number of training units in the training set, C^OOB(r, x_r) is the r-th predicted label of the r-th training unit, and y_r is the target attribute of the r-th training unit.
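Read together, the two formulas say: for each training unit, vote only among the trees whose training subsets excluded that unit, weight each vote by w_j, and take the fraction of units whose out-of-bag vote disagrees with the true target attribute as the error rate. A sketch under the assumption that classifiers are callables and that `oob_sets[j]` records which unit indices tree j never saw:

```python
from collections import defaultdict

def oob_label(r, x, classifiers, weights, oob_sets):
    """C^OOB(r, x_r): weighted majority vote among out-of-bag trees."""
    votes = defaultdict(float)
    for j, (clf, w) in enumerate(zip(classifiers, weights)):
        if r in oob_sets[j]:        # unit r was not used to train tree j
            votes[clf(x)] += w
    return max(votes, key=votes.get) if votes else None

def oob_error(training_set, classifiers, weights, oob_sets):
    """E = (1/M) * sum_r I(C^OOB(r, x_r) != y_r)."""
    wrong = sum(oob_label(r, x, classifiers, weights, oob_sets) != y
                for r, (x, y) in enumerate(training_set))
    return wrong / len(training_set)
```

The `oob_sets` bookkeeping is an implementation detail assumed here; the text only requires knowing, per training unit, which classifiers did not use it.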
Further, the processor 41 is further configured to: after the N-th tree classifier corresponding to the N-th training subset is generated according to the preset policy, select an N′-th training subset from the training set, acquire the misprediction rate of the N-th tree classifier according to the N′-th training subset, and acquire the weight of the N-th tree classifier according to the misprediction rate of the N-th tree classifier. The intersection of the N′-th training subset and the N-th training subset is empty, and the N′-th training subset contains at least one training unit.
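A sketch of this step; note that the text does not say how the misprediction rate is turned into a weight, so the `1 - rate` mapping below is an assumption:

```python
def misprediction_rate(clf, holdout):
    """Evaluate the N-th tree on the disjoint N'-th subset (a holdout
    whose intersection with the tree's training subset is empty)."""
    return sum(clf(x) != y for x, y in holdout) / len(holdout)

def weight_from_rate(rate):
    # Assumed mapping: a tree that mispredicts more gets a smaller vote.
    return 1.0 - rate
```

Because the holdout and the training subset are disjoint, the rate is an unbiased estimate of the tree's accuracy on unseen units, which is what makes it usable as a voting weight.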
Further, the processor 41 is further configured to: collect attribute information of the faulty product; use the classifier set as a prediction model to predict, according to the attribute information, the defect of the faulty product, obtaining a classification label set; and acquire a trust value for each classification label in the classification label set according to the classifier set and the weights of the tree classifiers in the classifier set.
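One plausible reading of the trust value (an assumption; the text only says it is derived from the classifier set and the tree weights) is the weighted share of votes each classification label receives:

```python
def predict_with_trust(x, classifiers, weights):
    """Return (label, trust) pairs, highest trust first; trust is the
    weighted fraction of the ensemble's votes the label received."""
    votes = {}
    for clf, w in zip(classifiers, weights):
        label = clf(x)
        votes[label] = votes.get(label, 0.0) + w
    total = sum(votes.values())
    return sorted(((lbl, v / total) for lbl, v in votes.items()),
                  key=lambda t: -t[1])
```

This matches the behavior described in the summary: the model yields multiple prediction results, each accompanied by a trust value that maintenance personnel can use to prioritize which defect to check first.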
Further, the preset policy includes a decision tree algorithm.
An embodiment of the present invention provides a defect prediction apparatus. A training attribute set is selected from pre-stored product fault records according to a target attribute, the target attribute and the training attribute set are combined into a training set, and a classifier set containing at least two tree classifiers is generated from the training set. When a product becomes faulty, the classifier set can be used as a prediction model to predict the defect of the faulty product. Using the classifier set as the prediction model solves the problem that a single decision tree is prone to overfitting or underfitting, which makes defect prediction for faulty products impossible; it enables fast localization of the defect of a faulty product while also improving the accuracy of defect prediction.
Moreover, when the classifier set is used as a prediction model to predict the defect of a faulty product, multiple prediction results can be obtained and a trust value can be computed for each prediction result, which saves maintenance personnel time in localizing the defect.
From the foregoing description of the implementations, a person skilled in the art can clearly understand that the present invention may be implemented by software plus necessary general-purpose hardware, and certainly may also be implemented by hardware, but in many cases the former is the better implementation. Based on such an understanding, the technical solution of the present invention essentially, or the part contributing to the prior art, may be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a floppy disk, a hard disk, or an optical disc of a computer, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.
The foregoing descriptions are merely specific implementations of the present invention, but the protection scope of the present invention is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims
1. A defect prediction method, comprising:
selecting a training attribute set from pre-stored product fault records according to a target attribute, and combining the target attribute and the training attribute set into a training set, wherein the target attribute is a defect attribute of a historically faulty product;
generating a classifier set according to the training set, wherein the classifier set contains at least two tree classifiers; and
using the classifier set as a prediction model to predict a defect of a faulty product.
2. The defect prediction method according to claim 1, wherein the training set contains M training units, each training unit containing one target attribute and one training attribute set;
and wherein generating the classifier set according to the training set comprises:
selecting a first training subset from the training set;
generating, according to a preset policy, a first tree classifier corresponding to the first training subset;
selecting a second training subset from the training set;
generating, according to the preset policy, a second tree classifier corresponding to the second training subset;
selecting an N-th training subset from the training set, wherein the N-th training subset contains M′ training units and M′ is less than or equal to M;
generating, according to the preset policy, an N-th tree classifier corresponding to the N-th training subset, wherein N is an integer greater than or equal to 2; and
combining the N tree classifiers into the classifier set.
3. The defect prediction method according to claim 1, further comprising:
when a (K-1)-th tree classifier is generated, acquiring an error rate of the K-1 generated tree classifiers; and
when a K-th tree classifier is generated, acquiring an error rate of the K generated tree classifiers, so that when a difference between the error rate of the K tree classifiers and the error rate of the K-1 tree classifiers is less than a preset threshold, the K tree classifiers are combined into the classifier set, wherein K is an integer less than or equal to N.
4. The defect prediction method according to claim 3, wherein acquiring the error rate of the K generated tree classifiers when the K-th tree classifier is generated comprises:
selecting a first-class tree classifier from the classifier set according to a first training unit;
generating a first predicted label of the first training unit according to the first-class tree classifier;
selecting a second-class tree classifier from the classifier set according to a second training unit;
generating a second predicted label of the second training unit according to the second-class tree classifier;
selecting an M-th-class tree classifier from the classifier set according to an M-th training unit, wherein the M-th-class tree classifier is the set of classifiers generated without using the M-th training unit, and M is the number of training units contained in the training set;
generating an M-th predicted label of the M-th training unit according to the M-th-class tree classifier; and
acquiring the error rate of the K generated tree classifiers according to the M predicted labels.
5. The defect prediction method according to claim 4, wherein generating the M-th predicted label of the M-th training unit according to the M-th-class tree classifier specifically comprises:
generating the M-th predicted label according to

C^OOB(M, x_M) = argmax_{y ∈ Y} Σ_{j: C_j ∈ Q_M^OOB} w_j · I(C_j(x_M) = y)

wherein C^OOB(M, x_M) is the M-th predicted label of the M-th training unit, C_j is the j-th tree classifier, Q_M^OOB is the M-th-class tree classifier set, w_j is the weight of the j-th tree classifier, C_j(x_M) is the target attribute obtained from the j-th tree classifier and the training attribute set contained in the M-th training unit, and Y is the classification label set.
6. The defect prediction method according to claim 5, wherein acquiring the error rate of the K generated tree classifiers according to the M predicted labels specifically comprises:
acquiring the error rate of the K generated tree classifiers according to

E = (1/M) Σ_{r=1}^{M} I(C^OOB(r, x_r) ≠ y_r)

wherein E is the error rate of the K generated tree classifiers, M is the number of training units in the training set, C^OOB(r, x_r) is the r-th predicted label of the r-th training unit, and y_r is the target attribute of the r-th training unit.
7. The defect prediction method according to claim 2, further comprising, after generating the N-th tree classifier corresponding to the N-th training subset according to the preset policy:
selecting an N′-th training subset from the training set, wherein the intersection of the N′-th training subset and the N-th training subset is empty, and the N′-th training subset contains at least one training unit;
acquiring a misprediction rate of the N-th tree classifier according to the N′-th training subset; and
acquiring a weight of the N-th tree classifier according to the misprediction rate of the N-th tree classifier.
8. The defect prediction method according to claim 7, wherein using the classifier set as the prediction model to predict the defect of the faulty product comprises:
collecting attribute information of the faulty product;
predicting, according to the attribute information and using the classifier set as the prediction model, the defect of the faulty product to obtain a classification label set; and
acquiring a trust value of each classification label in the classification label set according to the classifier set and the weight of each tree classifier in the classifier set.
9. The defect prediction method according to any one of claims 2 to 8, wherein the preset policy comprises a decision tree algorithm.
10. A defect prediction apparatus, comprising:
a processing unit, configured to select a training attribute set from pre-stored product fault records according to a target attribute, and combine the target attribute and the training attribute set into a training set, wherein the target attribute is a defect attribute of a historically faulty product;
a generating unit, configured to generate a classifier set according to the training set obtained by the processing unit, wherein the classifier set contains at least two tree classifiers; and
a prediction unit, configured to use the classifier set generated by the generating unit as a prediction model to predict a defect of a faulty product.
11. The defect prediction apparatus according to claim 10, wherein the training set contains M training units, each training unit containing one target attribute and one training attribute set;
and wherein the generating unit comprises:
a selecting module, configured to select a first training subset from the training set obtained by the processing unit;
a generating module, configured to generate, according to a preset policy, a first tree classifier corresponding to the first training subset selected by the selecting module;
wherein the selecting module is further configured to select a second training subset from the training set obtained by the processing unit;
the generating module is further configured to generate, according to the preset policy, a second tree classifier corresponding to the second training subset selected by the selecting module;
the selecting module is further configured to select an N-th training subset from the training set obtained by the processing unit, wherein the N-th training subset contains M′ training units and M′ is less than or equal to M;
the generating module is further configured to generate, according to the preset policy, an N-th tree classifier corresponding to the N-th training subset selected by the selecting module, wherein N is an integer greater than or equal to 2; and
a combining module, configured to combine the N tree classifiers generated by the generating module into the classifier set.
12. The defect prediction apparatus according to claim 10, wherein the generating unit further comprises:
a first acquiring module, configured to acquire, when a (K-1)-th tree classifier is generated, an error rate of the K-1 generated tree classifiers; and
a second acquiring module, configured to acquire, when a K-th tree classifier is generated, an error rate of the K generated tree classifiers, so that when a difference between the error rate of the K tree classifiers and the error rate of the K-1 tree classifiers is less than a preset threshold, the K tree classifiers are combined into the classifier set, wherein K is an integer less than or equal to N.
13. The defect prediction apparatus according to claim 12, wherein the second acquiring module comprises:
a selecting submodule, configured to select a first-class tree classifier from the classifier set according to a first training unit;
a generating submodule, configured to generate a first predicted label of the first training unit according to the first-class tree classifier selected by the selecting submodule;
wherein the selecting submodule is further configured to select a second-class tree classifier from the classifier set according to a second training unit;
the generating submodule is further configured to generate a second predicted label of the second training unit according to the second-class tree classifier selected by the selecting submodule;
the selecting submodule is further configured to select an M-th-class tree classifier from the classifier set according to an M-th training unit, wherein the M-th-class tree classifier is the set of classifiers generated without using the M-th training unit, and M is the number of training units contained in the training set;
the generating submodule is further configured to generate an M-th predicted label of the M-th training unit according to the M-th-class tree classifier selected by the selecting submodule; and
an acquiring submodule, configured to acquire the error rate of the K generated tree classifiers according to the M predicted labels generated by the generating submodule.
14. The defect prediction apparatus according to claim 13, wherein the generating submodule is specifically configured to:
generate the M-th predicted label according to

C^OOB(M, x_M) = argmax_{y ∈ Y} Σ_{j: C_j ∈ Q_M^OOB} w_j · I(C_j(x_M) = y)

wherein C^OOB(M, x_M) is the M-th predicted label of the M-th training unit, C_j is the j-th tree classifier, Q_M^OOB is the M-th-class tree classifier set, w_j is the weight of the j-th tree classifier, C_j(x_M) is the target attribute obtained from the j-th tree classifier and the training attribute set contained in the M-th training unit, and Y is the classification label set.
15. The defect prediction apparatus according to claim 14, wherein the acquiring submodule is specifically configured to:
acquire the error rate of the K tree classifiers generated by the generating submodule according to

E = (1/M) Σ_{r=1}^{M} I(C^OOB(r, x_r) ≠ y_r)

wherein E is the error rate of the K generated tree classifiers, M is the number of training units in the training set, C^OOB(r, x_r) is the r-th predicted label of the r-th training unit, and y_r is the target attribute of the r-th training unit.
16. The defect prediction apparatus according to claim 11, further comprising:
a selecting unit, configured to select an N′-th training subset from the training set after the generating module generates, according to the preset policy, the N-th tree classifier corresponding to the N-th training subset, wherein the intersection of the N′-th training subset and the N-th training subset is empty, and the N′-th training subset contains at least one training unit;
a first acquiring unit, configured to acquire a misprediction rate of the N-th tree classifier according to the N′-th training subset selected by the selecting unit; and
a second acquiring unit, configured to acquire a weight of the N-th tree classifier according to the misprediction rate of the N-th tree classifier acquired by the first acquiring unit.
17. The defect prediction apparatus according to claim 16, wherein the prediction unit comprises:
a statistics module, configured to collect attribute information of the faulty product;
a prediction module, configured to predict, according to the attribute information collected by the statistics module and using the classifier set as a prediction model, the defect of the faulty product to obtain a classification label set; and
a third acquiring module, configured to acquire a trust value of each classification label in the classification label set according to the classifier set and the weight of each tree classifier in the classifier set.
18. The defect prediction apparatus according to any one of claims 11 to 17, wherein the preset policy comprises a decision tree algorithm.