WO2021164254A1 - Method and device for training a classifier - Google Patents

Method and device for training a classifier

Info

Publication number
WO2021164254A1
Authority
WO
WIPO (PCT)
Prior art keywords
training
sample
classifier
image
semantic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2020/117613
Other languages
English (en)
French (fr)
Inventor
王硕
岳俊
刘健庄
田奇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to EP20919863.9A (patent EP4109335A4)
Publication of WO2021164254A1
Priority to US17/892,908 (patent US12536471B2)

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/096 Transfer learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements

Definitions

  • This application relates to the field of artificial intelligence, and in particular to a method and device for training a classifier.
  • A neural network is a tool for realizing artificial intelligence. Before it can be applied, a neural network must be trained with a large number of samples to achieve a specific function. When a neural network is required to implement a new function, it usually has to be trained with a large number of new samples.
  • One way to reduce the workload of retraining a neural network is knowledge transfer. After a neural network has been trained on a large number of samples, it has learned knowledge; when new samples are used to train the neural network, the learned knowledge can be used to process them, so that the neural network can be retrained with fewer new samples and its performance improved. In contrast to the new samples, the above-mentioned large number of samples can be called base samples.
  • When knowledge transfer and new samples are used to train a neural network, a feature extractor is usually used to extract features from the new samples, and the new samples are classified based on those features. When the categories of the new samples change, the feature extractor needs to learn the features of the new categories again, which increases the training workload.
  • The present application provides a method and device for training a classifier, which can improve the training efficiency and performance of the classifier.
  • In a first aspect, a method for training a classifier is provided, including: obtaining a first training sample, the first training sample including a corresponding semantic label; obtaining a plurality of second training samples, each of the second training samples including a corresponding semantic label; determining a target sample from the plurality of second training samples according to the semantic similarity between the first training sample and the plurality of second training samples; and training a classifier according to the first training sample, the target sample, and the semantic similarity between the first training sample and the target sample.
  • Semantic similarity is used to measure the difficulty of knowledge transfer. For example, the semantic similarity between tabby cat and tiger cat is high, which means that the feature similarity between tabby cat images and tiger cat images is high; the classification knowledge that the classifier learns from tiger cat image features is therefore easier to transfer to the classification of tabby cat images, and more tiger cat images can be used to train the classifier. The semantic similarity between tabby cat and hound is low, which means that the feature similarity between tabby cat images and hound images is low; the classification knowledge that the classifier learns from hound image features is therefore difficult to transfer to the classification of tabby cat images, and fewer hound images can be used in training the classifier.
  • Training the classifier based on semantic similarity can therefore improve the training efficiency and performance of the classifier.
  • In addition, because the above method does not use semantic labels for learning during feature extraction, there is no need to change the network structure of the feature extractor, which can improve the training efficiency of the neural network (including the classifier).
  • Optionally, training the classifier according to the first training sample, the target sample, and the semantic similarity between the first training sample and the target sample includes: determining a prediction score of the first training sample by the classifier; determining a semantic transfer loss function L_semantic of the classifier according to the prediction score and the semantic similarity between the first training sample and the target sample, where the semantic similarity between the target sample and the first training sample is used to determine the degree of influence of the prediction score on L_semantic; and training the classifier according to L_semantic.
  • Training the classifier according to the semantic similarity can improve the performance of the classifier.
  • Optionally, when the semantic similarity between the target sample and the first training sample is greater than or equal to the semantic transfer intensity, the degree of influence of the prediction score on L_semantic is 100%; or, when the semantic similarity between the target sample and the first training sample is less than the semantic transfer intensity, the degree of influence of the prediction score on L_semantic is 0.
  • The semantic transfer intensity can be set based on experience, so that the classifier learns correct classification knowledge and is not misled by wrong classification knowledge.
  • Optionally, training the classifier according to the first training sample, the target sample, and the semantic similarity between the first training sample and the target sample includes: determining a prediction score S_n of the first training sample by the classifier; determining a prediction score S_b of the target sample by the classifier; determining a balance learning loss function L_IC of the classifier according to S_n, S_b, and a balance learning intensity, where the balance learning intensity is used to adjust the degree of influence of S_n and S_b on L_IC; and training the classifier according to L_IC.
  • The classifier tends to predict new samples into the categories of the base samples based on the knowledge learned from the base samples, which degrades the performance of the classifier. Adjusting the degree of influence of S_n and S_b on L_IC based on the balance learning intensity makes the classifier focus more on learning the classification knowledge of the new samples, and finally yields a classifier with better performance.
  • Optionally, the balance learning intensity being used to adjust the degree of influence of S_n and S_b on L_IC includes: the balance learning intensity is used to increase the degree of influence of S_n on L_IC and to reduce the degree of influence of S_b on L_IC.
  • The above scheme enables the classifier to focus on learning the classification knowledge of the new samples, increasing the value of S_n in order to reduce L_IC, and finally yields a classifier with better performance.
  • Optionally, training the classifier according to the first training sample, the target sample, and the semantic similarity between the first training sample and the target sample includes: acquiring multi-view features of the first training sample and the target sample; and training the classifier according to the multi-view features.
  • The feature extractor can extract image features from the original image and input them into the classifier for training. Because the number of target samples is large, after the feature extractor has learned from the target samples it tends, when extracting the image features of the first training sample, to extract features based on the learned knowledge while ignoring the new content in the first training sample.
  • Therefore, this application provides a method for extracting features, which converts each sample into multi-view images (for example, an original image, a foreground image, and a background image). The multi-view images contain richer detail, and extracting image features from them prevents the feature extractor from ignoring the new content in the first training sample and yields more accurate features, thereby obtaining a classifier with better performance.
  • Optionally, acquiring the multi-view features of the first training sample and the target sample includes: separating a plurality of images from each of the first training sample and the target sample, where the perspectives of the plurality of images of each sample differ from one another; acquiring a plurality of features of each sample according to the plurality of images of that sample; and splicing the plurality of features of each sample to obtain the multi-view features.
  • Optionally, training the classifier according to the first training sample, the target sample, and the semantic similarity between the first training sample and the target sample includes: determining a prediction score of the first training sample by the classifier; determining a classification loss function L_CE of the classifier according to the prediction score; and training the classifier according to L_CE.
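  • As an illustration of computing a classification loss from prediction scores, the following minimal sketch assumes that the classification loss is the standard softmax cross-entropy, which the text does not state explicitly. The function names and the example scores are hypothetical.

```python
import math


def softmax(scores):
    """Convert raw prediction scores into probabilities (numerically stable)."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_loss(scores, true_index):
    """Assumed form of the classification loss: negative log-probability
    that the classifier assigns to the labelled category."""
    probs = softmax(scores)
    return -math.log(probs[true_index])


# Hypothetical prediction scores of one training sample over three categories.
example_scores = [2.0, 0.5, -1.0]
loss_when_correct = cross_entropy_loss(example_scores, 0)
loss_when_wrong = cross_entropy_loss(example_scores, 2)
```

  • The loss is small when the highest score falls on the labelled category and grows as the score of the labelled category drops, which is what drives the classifier update.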
  • In a second aspect, an image classification method is provided, including: acquiring features of an image to be classified; inputting the features into a neural network for classification, where the neural network includes a classifier, the classifier is obtained by training with a first training sample, a target sample corresponding to the first training sample, and the semantic similarity between the first training sample and the target sample, and the target sample is determined from a plurality of second training samples based on the semantic similarity between the first training sample and the plurality of second training samples; and obtaining a classification result of the image to be classified.
  • Training the classifier based on semantic similarity can improve the performance of the classifier, so using this classifier yields a more accurate classification result for the image to be classified.
  • Optionally, the feature of the image to be classified is a multi-view feature.
  • The multi-view images contain richer detail. Extracting image features from the multi-view images prevents the feature extractor from ignoring the new content in the image to be classified, so that more accurate features are extracted and a more accurate classification result is obtained.
  • Optionally, acquiring the features of the image to be classified includes: acquiring a plurality of images of different perspectives from the image to be classified; acquiring the feature of each image in the plurality of images of different perspectives; and obtaining the multi-view feature of the image to be classified according to the feature of each image.
  • Optionally, the classifier being obtained by training with the first training sample, the target sample corresponding to the first training sample, and the semantic similarity between the first training sample and the target sample includes: determining a prediction score of the first training sample by the classifier; determining a semantic transfer loss function L_semantic of the classifier according to the prediction score and the semantic similarity between the first training sample and the target sample, where the semantic similarity between the target sample and the first training sample is used to determine the degree of influence of the prediction score on L_semantic; and training the classifier according to L_semantic.
  • Training the classifier according to the semantic similarity can improve the performance of the classifier.
  • Optionally, when the semantic similarity between the target sample and the first training sample is greater than or equal to the semantic transfer intensity, the degree of influence of the prediction score on L_semantic is 100%; or, when the semantic similarity between the target sample and the first training sample is less than the semantic transfer intensity, the degree of influence of the prediction score on L_semantic is 0.
  • The semantic transfer intensity can be set based on experience, so that the classifier learns correct classification knowledge and is not misled by wrong classification knowledge.
  • Optionally, the classifier being obtained by training with the first training sample, the target sample corresponding to the first training sample, and the semantic similarity between the first training sample and the target sample includes: determining a prediction score S_n of the first training sample by the classifier; determining a prediction score S_b of the target sample by the classifier; determining a balance learning loss function L_IC of the classifier according to S_n, S_b, and a balance learning intensity, where the balance learning intensity is used to adjust the degree of influence of S_n and S_b on L_IC; and training the classifier according to L_IC.
  • The classifier tends to predict new samples into the categories of the base samples based on the knowledge learned from the base samples, which degrades the performance of the classifier. Adjusting the degree of influence of S_n and S_b on L_IC based on the balance learning intensity makes the classifier focus more on learning the classification knowledge of the new samples, and finally yields a classifier with better performance.
  • Optionally, the balance learning intensity being used to adjust the degree of influence of S_n and S_b on L_IC includes: the balance learning intensity is used to increase the degree of influence of S_n on L_IC and to reduce the degree of influence of S_b on L_IC.
  • The above scheme enables the classifier to focus on learning the classification knowledge of the new samples, increasing the value of S_n in order to reduce L_IC, and finally yields a classifier with better performance.
  • Optionally, the classifier being obtained by training with the first training sample, the target sample corresponding to the first training sample, and the semantic similarity between the first training sample and the target sample includes: acquiring multi-view features of the first training sample and the target sample; and training the classifier according to the multi-view features.
  • The feature extractor can extract image features from the original image and input them into the classifier for training. Because the number of target samples is large, after the feature extractor has learned from the target samples it tends, when extracting the image features of the first training sample, to extract features based on the learned knowledge while ignoring the new content in the first training sample.
  • Therefore, this application provides a method for extracting features, which converts each sample into multi-view images (for example, an original image, a foreground image, and a background image). The multi-view images contain richer detail, and extracting image features from them prevents the feature extractor from ignoring the new content in the first training sample and yields more accurate features, thereby obtaining a classifier with better performance.
  • Optionally, acquiring the multi-view features of the first training sample and the target sample includes: separating a plurality of images from each of the first training sample and the target sample, where the perspectives of the plurality of images of each sample differ from one another; acquiring a plurality of features of each sample according to the plurality of images of that sample; and splicing the plurality of features of each sample to obtain the multi-view features.
  • Optionally, the classifier being obtained by training with the first training sample, the target sample corresponding to the first training sample, and the semantic similarity between the first training sample and the target sample includes: determining a prediction score of the first training sample by the classifier; determining a classification loss function L_CE of the classifier according to the prediction score; and training the classifier according to L_CE.
  • In a third aspect, another image classification method is provided, including: acquiring a plurality of images of different perspectives from an image to be classified; acquiring the features of each image in the plurality of images of different perspectives; inputting the features of each image into a neural network for classification, where the neural network includes a classifier; and obtaining a classification result of the image to be classified.
  • The multi-view images contain richer detail. Extracting image features from the multi-view images prevents the feature extractor from ignoring the new content in the image to be classified, so that more accurate features are extracted and a more accurate classification result is obtained.
  • Optionally, the plurality of images of different perspectives include the image to be classified, a foreground image in the image to be classified, and a background image in the image to be classified.
  • Optionally, inputting the features of each image into the neural network for classification includes: splicing the features of each image and then inputting the spliced features into the neural network for classification.
  • Optionally, the classifier is obtained by training according to any one of the methods in the first aspect.
  • An apparatus for training a classifier is provided, including units for executing any one of the methods in the first aspect.
  • An image classification device is provided, including units for executing any one of the methods in the second aspect.
  • Another image classification device is provided, including units for executing any one of the methods in the third aspect.
  • A device for training a classifier is provided, including a processor and a memory, where the memory is used to store a computer program, and the processor is used to call and run the computer program from the memory, so that the device executes any one of the methods in the first aspect.
  • An image classification device is provided, including a processor and a memory, where the memory is used to store a computer program, and the processor is used to call and run the computer program from the memory, so that the device executes any one of the methods in the second aspect.
  • Another image classification device is provided, including a processor and a memory, where the memory is used to store a computer program, and the processor is used to call and run the computer program from the memory, so that the device executes any one of the methods in the third aspect.
  • A computer program product is provided, including computer program code that, when run by an apparatus for training a classifier, causes the apparatus to execute any one of the methods in the first aspect.
  • A computer program product is provided, including computer program code that, when run by a classification device, causes the device to execute any one of the methods in the second aspect.
  • A computer program product is provided, including computer program code that, when run by a classification device, causes the device to execute any one of the methods in the third aspect.
  • A computer-readable medium is provided, which stores program code including instructions for executing any one of the methods in the first aspect.
  • A computer-readable medium is provided, which stores program code including instructions for executing any one of the methods in the second aspect.
  • A computer-readable medium is provided, which stores program code including instructions for executing any one of the methods in the third aspect.
  • Fig. 1 is a schematic diagram of a neural network provided by this application.
  • Fig. 2 is a schematic diagram of a method for training a classifier provided by the present application.
  • Fig. 3 is a schematic diagram of determining target samples based on semantic tags provided by the present application.
  • Fig. 4 is a schematic diagram of a device for training a classifier provided by the present application.
  • Fig. 5 is a schematic diagram of an image classification device provided by the present application.
  • Fig. 6 is a schematic diagram of another image classification device provided by the present application.
  • Fig. 7 is a schematic diagram of an electronic device provided by the present application.
  • ANN: artificial neural network
  • NN: neural network
  • CNN: convolutional neural network
  • MLP: multilayer perceptron
  • Fig. 1 is a schematic diagram of a neural network provided by this application.
  • The neural network 100 includes a multi-view image extractor 110, a feature extractor 120, and a classifier 130.
  • A training image carrying a semantic label is input to the multi-view image extractor 110, which converts the training image into multiple images of different views, such as a foreground image and a background image. For example, the multi-view image extractor 110 can convert the training image into the foreground image and the background image through a saliency detection network and a multi-view classifier. This application does not limit the specific working mode of the multi-view image extractor 110, nor the specific perspectives of the multi-view images.
  • The multi-view image extractor 110 can train the multi-view classifier to improve the performance of multi-view classification.
  • After the feature extractor 120 obtains the multiple images of different perspectives, it extracts image features from them. Because these images all derive from one image (for example, they may include the original image, the foreground image of that image, and the background image of that image), the feature extractor 120 can learn more knowledge.
  • The feature extractor 120 may splice these image features together and input them into the classifier 130, or may input them into the classifier 130 separately.
  • The classifier 130 is configured to determine the category to which the training image belongs according to the image features, then determine a loss function according to the classification result and the semantic label of the training image, and perform training according to the loss function.
  • After the neural network 100 is trained, it can be applied to image classification.
  • The image classification process is similar to the training process.
  • After the image to be classified is input to the multi-view image extractor 110, it is converted into multiple images of different perspectives; the feature extractor 120 extracts the image features of each of these images and then inputs them into the classifier 130. For example, the image features of each image can be spliced into a multi-view feature that is input to the classifier 130. The classifier 130 determines, according to the input image features, the category to which the image to be classified belongs; that is, it determines the semantic label of the image to be classified.
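  • The multi-view path described above (split one image into views, extract a feature per view, splice the features) can be sketched as follows. This is only a toy illustration under stated assumptions: the binary saliency mask stands in for the output of the saliency detection network, `toy_features` stands in for the feature extractor 120, and all function names are hypothetical.

```python
def split_views(image, mask):
    """Return [original, foreground, background] views of a 2D image.

    `mask` is a hypothetical binary saliency mask: 1 marks foreground pixels.
    """
    foreground = [[p if m else 0 for p, m in zip(row, mask_row)]
                  for row, mask_row in zip(image, mask)]
    background = [[0 if m else p for p, m in zip(row, mask_row)]
                  for row, mask_row in zip(image, mask)]
    return [image, foreground, background]


def toy_features(view):
    """Stand-in feature extractor: mean and maximum pixel value of a view."""
    pixels = [p for row in view for p in row]
    return [sum(pixels) / len(pixels), max(pixels)]


def multi_view_feature(image, mask):
    """Splice (concatenate) the per-view features into one multi-view feature."""
    feature = []
    for view in split_views(image, mask):
        feature.extend(toy_features(view))
    return feature


image = [[1, 2], [3, 4]]
mask = [[1, 0], [0, 1]]
spliced = multi_view_feature(image, mask)
```

  • The spliced vector has one segment per view, so a classifier that receives it sees foreground and background evidence side by side rather than only the features of the original image.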
  • The method 200 may be executed by a processor and includes:
  • S210: Obtain a first training sample, where the first training sample includes a corresponding semantic label.
  • S220: Obtain a plurality of second training samples, where each of the second training samples includes a corresponding semantic label.
  • The first training sample is, for example, a new sample, and the second training samples are, for example, base samples; both the first training sample and the plurality of second training samples are images carrying semantic labels.
  • The processor may then perform the following step.
  • S230: Determine a target sample from the plurality of second training samples according to the semantic similarity between the first training sample and the plurality of second training samples.
  • Semantic labels describe the similarity between training samples to a certain extent. Therefore, the semantic labels carried by the training samples can be used to determine the similarity between them.
  • Fig. 3 shows an example of determining the similarity between training samples based on semantic labels.
  • The semantic label of the first training sample is tabby cat, and the semantic labels of the eight second training samples are tiger cat, bear cat, Persian cat, Egyptian cat, Siamese cat, coonhound, Eskimo dog, and Maltese dog.
  • A pre-trained language model can be used to convert these semantic labels into feature vectors, and the cosine similarity between the feature vector of tabby cat and the feature vector of each of the other eight semantic labels can be calculated. The higher the cosine similarity, the higher the similarity between the semantic labels. The results are shown by the numbers in Fig. 3.
  • The cosine similarity between the feature vectors of Eskimo dog and Maltese dog and the feature vector of tabby cat is too low, so it is difficult for the classifier to transfer the classification knowledge learned from images of Eskimo dogs and Maltese dogs to the classification of tabby cat images. Therefore, these two training samples can be discarded, and the remaining six training samples are determined as target samples.
  • The above-mentioned classification knowledge is, for example, weights, connection relationships between neurons, and so on.
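  • The selection of target samples by label similarity can be sketched as follows. This is a minimal sketch, not the values of Fig. 3: the two-dimensional label vectors are made-up stand-ins for the output of a pre-trained language model, and the cut-off `min_similarity` is a hypothetical parameter, since the text only says that samples whose similarity is too low are discarded.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two label feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def select_target_labels(novel_vec, base_vecs, min_similarity):
    """Keep base labels whose similarity to the novel label reaches the cut-off."""
    sims = {label: cosine_similarity(novel_vec, vec)
            for label, vec in base_vecs.items()}
    targets = [label for label, sim in sims.items() if sim >= min_similarity]
    return targets, sims


# Made-up label embeddings; a real system would obtain them from a language model.
tabby_cat = [1.0, 0.1]
base_labels = {
    "tiger cat": [0.9, 0.2],
    "coonhound": [0.6, 0.6],
    "Maltese dog": [0.1, 1.0],
}
targets, sims = select_target_labels(tabby_cat, base_labels, min_similarity=0.5)
```

  • With these toy vectors, tiger cat and coonhound are kept as target labels and Maltese dog is discarded; the similarities retained in `sims` can later weight how strongly each target sample contributes to training.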
  • After determining the target sample, the processor can train the classifier according to the first training sample, the target sample, and the semantic similarity between the first training sample and the target sample.
  • Semantic similarity is used to measure the difficulty of knowledge transfer. For example, the semantic similarity between tabby cat and tiger cat is high, which means that the feature similarity between tabby cat images and tiger cat images is high; the classification knowledge that the classifier learns from tiger cat image features is therefore easier to transfer to the classification of tabby cat images, and more tiger cat images can be used to train the classifier. The semantic similarity between tabby cat and hound is low, which means that the feature similarity between tabby cat images and hound images is low; the classification knowledge that the classifier learns from hound image features is therefore difficult to transfer to the classification of tabby cat images, and fewer hound images can be used in training the classifier.
  • Training the classifier based on semantic similarity can therefore improve the training efficiency and performance of the classifier.
  • In addition, because the above method does not use semantic labels for learning during feature extraction, the training efficiency of the neural network (such as the neural network 100) can be improved.
  • The feature extractor can extract image features from the original image and input them into the classifier for training. Because the number of target samples is large, after the feature extractor has learned from the target samples it tends, when extracting the image features of the first training sample, to extract features based on the learned knowledge while ignoring the new content in the first training sample.
  • Therefore, this application provides a method for extracting features, which converts each sample into multi-view images (such as a foreground image and a background image). The multi-view images contain richer detail, and extracting image features from them prevents the feature extractor from ignoring the new content in the first training sample and yields more accurate features, thereby obtaining a classifier with better performance.
  • the processor may determine the prediction score of the first training sample through the classifier; then determine the semantic transfer loss function L_semantic of the classifier according to the prediction score and the semantic similarity between the first training sample and the target sample; and then train the classifier according to L_semantic.
  • the semantic similarity between the target sample and the first training sample is used to determine the degree of influence of the prediction score on L_semantic.
  • a semantic transfer strength can be set.
  • when the semantic similarity between the target sample and the first training sample is greater than or equal to the semantic transfer strength, the degree of influence of the prediction score on L_semantic is 100%; when the semantic similarity between the target sample and the first training sample is less than the semantic transfer strength, the degree of influence of the prediction score on L_semantic is 0.
  • L_semantic can be determined based on formula (1).
  • C_base represents the number of categories of the target samples
  • base represents one category of target samples
  • s_j is the score of classifying the first training sample using the knowledge of training sample j
  • R represents the real numbers
  • C_novel represents the number of categories of the novel samples
  • l_j is the semantic similarity between training sample j and the first training sample
  • α is the semantic transfer strength.
  • during training, because base samples usually outnumber novel samples, the classifier tends, based on the knowledge learned from the base samples, to predict novel samples into the categories of the base samples, which degrades the performance of the classifier.
  • a balanced learning loss function L_IC can be introduced into the training process of the classifier to solve this problem.
  • when the classifier's prediction score for the first training sample is S_n and its prediction score for the target training sample is S_b, the processor can determine L_IC according to S_n, S_b and the balanced learning strength, and then train the classifier according to L_IC.
  • L_IC can be determined according to formula (2).
  • s_b is a real number (score of a single sample) or a vector (scores of multiple samples);
  • s_n is a real number (score of a single sample) or a vector (scores of multiple samples);
  • <s_b, s_n> represents the product of s_b and s_n;
  • ||s_b|| represents the absolute value of the real number s_b or the modulus of the vector s_b;
  • ||s_n|| represents the absolute value of the real number s_n or the modulus of the vector s_n;
  • β represents the balanced learning strength.
  • the balanced learning strength is used to adjust the degree of influence of S_n and S_b on L_IC.
  • for example, the balanced learning strength is used to increase the influence of S_n on L_IC and to reduce the influence of S_b on L_IC.
  • in this way, the classifier needs to focus more on learning the classification knowledge of novel samples and to increase the value of S_n so as to reduce L_IC, ultimately yielding a better-performing classifier.
  • the processor can also determine the classification loss function L_CE of the classifier according to the prediction score of classifying the novel sample, and train the classifier according to L_CE. For example, the processor can complete the training of the classifier by minimizing L_semantic + L_IC + L_CE.
  • Table 1 compares the test results of a classifier trained using L_CE and L_IC with those of a classifier trained using only L_CE.
  • Table 2 shows the test results of the method 200 on a public large-scale few-shot dataset.
  • Table 3 shows the effect of combining the method 200 with existing large-scale few-shot recognition methods.
  • the method 200 can also be applied to traditional few-shot recognition tasks.
  • the specific experimental results are shown in Table 4.
  • the corresponding device includes a hardware structure and/or software module corresponding to each function.
  • the present application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a certain function is executed by hardware or computer software-driven hardware depends on the specific application and design constraint conditions of the technical solution. Professionals and technicians can use different methods for each specific application to implement the described functions, but such implementation should not be considered beyond the scope of this application.
  • This application can divide the device for training the classifier and the image classification device into functional units based on the above method examples.
  • for example, each function can be assigned to its own functional unit, or two or more functions can be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit. It should be noted that the division of units in this application is illustrative, and is only a logical function division, and there may be other division methods in actual implementation.
  • Fig. 4 is a schematic structural diagram of a device for training a classifier provided by the present application.
  • the device 400 includes a processor 410 and a memory 420.
  • the memory 420 is used to store a computer program, and the processor 410 is used to call and run the computer program from the memory 420 to execute: obtaining a first training sample, where the first training sample includes a corresponding semantic label; obtaining a plurality of second training samples, where each second training sample includes a corresponding semantic label; determining a target sample from the plurality of second training samples according to the semantic similarity between the first training sample and the plurality of second training samples; and training a classifier according to the first training sample, the target sample, and the semantic similarity between the first training sample and the target sample.
  • the processor 410 is specifically configured to: determine the prediction score of the first training sample by using the classifier; determine the semantic transfer loss function L_semantic of the classifier according to the prediction score and the semantic similarity between the first training sample and the target sample, where the semantic similarity between the target sample and the first training sample is used to determine the degree of influence of the prediction score on L_semantic; and train the classifier according to L_semantic.
  • when the semantic similarity between the target sample and the first training sample is greater than or equal to the semantic transfer strength, the degree of influence of the prediction score on L_semantic is 100%; or, when the semantic similarity between the target sample and the first training sample is less than the semantic transfer strength, the degree of influence of the prediction score on L_semantic is 0.
  • the processor 410 is specifically configured to: determine the prediction score S_n of the first training sample by using the classifier; determine the prediction score S_b of the target training sample by using the classifier; determine the balanced learning loss function L_IC of the classifier according to S_n, S_b and the balanced learning strength, where the balanced learning strength is used to adjust the degree of influence of S_n and S_b on L_IC; and train the classifier according to L_IC.
  • the balanced learning strength is used to increase the degree of influence of S_n on L_IC and to reduce the degree of influence of S_b on L_IC.
  • the processor 410 is specifically configured to: obtain the multi-view characteristics of the first training sample and the target sample; and train the classifier according to the multi-view characteristics.
  • the processor 410 is specifically configured to: separate a plurality of images from each sample of the first training sample and the target sample, where the views of the plurality of images differ from one another; obtain a plurality of features of each sample of the first training sample and the target sample according to the plurality of images; and concatenate the plurality of features to obtain the multi-view feature.
  • the processor 410 is specifically configured to: determine the prediction score of the first training sample by using the classifier; determine the classification loss function L_CE of the classifier according to the prediction score; and train the classifier according to L_CE.
  • Fig. 5 is a schematic structural diagram of an image classification device provided by the present application.
  • the device 500 includes a processor 510 and a memory 520.
  • the memory 520 is used to store a computer program.
  • the processor 510 is used to call and run the computer program from the memory 520 to execute: obtaining the features of the image to be classified; inputting the features into a neural network for classification, where the neural network includes a classifier and the classifier is obtained by training with the method 200; and obtaining the classification result of the image to be classified.
  • the features of the image to be classified are multi-view features.
  • the processor 510 is specifically configured to: obtain a plurality of images of different views from the image to be classified; obtain the features of each image in the plurality of images of different views; and concatenate the features of each image to obtain the multi-view features of the image to be classified.
  • the device 500 and the device 400 are the same device.
  • Fig. 6 is a schematic structural diagram of another image classification device provided by the present application.
  • the device 600 includes a processor 610 and a memory 620, the memory 620 is used to store a computer program, and the processor 610 is used to call and run the computer program from the memory 620 to execute: acquire multiple images with different perspectives from images to be classified; Obtain the feature of each image in the plurality of images from different perspectives; input the feature of each image into a neural network for classification, and the neural network includes a classifier; and obtain a classification result of the image to be classified.
  • the multiple images with different perspectives include at least two of the image to be classified, a foreground image in the image to be classified, and a background image in the image to be classified.
  • inputting the features of each image into a neural network for classification includes: concatenating the features of each image and then inputting the concatenated features into the neural network for classification.
  • the classifier is obtained by training with the method 200.
  • the device 600 and the device 400 are the same device.
  • Fig. 7 shows a schematic structural diagram of an electronic device provided by the present application.
  • the dotted line in Figure 7 indicates that the unit or the module is optional.
  • the device 700 may be used to implement the methods described in the foregoing method embodiments.
  • the device 700 may be a terminal device or a server or a chip.
  • the device 700 includes one or more processors 701, and the one or more processors 701 can support the device 700 to implement the method in the method embodiment.
  • the processor 701 may be a general-purpose processor or a special-purpose processor.
  • the processor 701 may be a central processing unit (CPU).
  • the CPU can be used to control the device 700, execute a software program, and process data of the software program.
  • the device 700 may further include a communication unit 705 to implement signal input (reception) and output (transmission).
  • the device 700 may be a chip, and the communication unit 705 may be an input and/or output circuit of the chip, or the communication unit 705 may be a communication interface of the chip, and the chip may be used as a component of a terminal device, a network device, or another electronic device.
  • the device 700 may be a terminal device or a server, and the communication unit 705 may be a transceiver of the terminal device or the server, or the communication unit 705 may be a transceiver circuit of the terminal device or the server.
  • the device 700 may include one or more memories 702 with a program 704 stored thereon.
  • the program 704 can be run by the processor 701 to generate an instruction 703 so that the processor 701 executes the method described in the foregoing method embodiment according to the instruction 703.
  • the memory 702 may also store data.
  • the processor 701 may also read data stored in the memory 702. The data may be stored at the same storage address as the program 704, or the data may be stored at a different storage address from the program 704.
  • the processor 701 and the memory 702 may be provided separately or integrated together, for example, integrated on a system-on-chip (SOC) of the terminal device.
  • the device 700 may also include an antenna 706.
  • the communication unit 705 is configured to implement the transceiving function of the device 700 through the antenna 706.
  • for the specific manner in which the processor 701 executes the method for training a classifier and the image classification method, refer to the related description in the method embodiments.
  • the processor 701 can be a CPU, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or another programmable logic device, for example, discrete gates, transistor logic devices, or discrete hardware components.
  • This application also provides a computer program product, which, when executed by the processor 701, implements the method described in any method embodiment in this application.
  • the computer program product may be stored in the memory 702, for example, a program 704, and the program 704 is finally converted into an executable object file that can be executed by the processor 701 through processing processes such as preprocessing, compilation, assembly, and linking.
  • the present application also provides a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a computer, the method described in any method embodiment in the present application is implemented.
  • the computer program can be a high-level language program or an executable target program.
  • the computer-readable storage medium is, for example, the memory 702.
  • the memory 702 may be a volatile memory or a non-volatile memory, or the memory 702 may include both a volatile memory and a non-volatile memory.
  • the non-volatile memory can be read-only memory (ROM), programmable read-only memory (programmable ROM, PROM), erasable programmable read-only memory (erasable PROM, EPROM), electrically erasable programmable read-only memory (electrically EPROM, EEPROM), or flash memory.
  • the volatile memory may be random access memory (RAM), which is used as an external cache.
  • by way of example and not limitation, many forms of RAM are available, such as static random access memory (static RAM, SRAM), dynamic random access memory (dynamic RAM, DRAM), synchronous dynamic random access memory (synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (double data rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (enhanced SDRAM, ESDRAM), synchlink dynamic random access memory (synchlink DRAM, SLDRAM), and direct rambus random access memory (direct rambus RAM, DR RAM).
  • the disclosed system, device, and method may be implemented in other ways. For example, some features of the method embodiments described above may be ignored or not implemented.
  • the device embodiments described above are merely illustrative.
  • the division of units is only a logical function division. In actual implementation, there may be other division methods. Multiple units or components can be combined or integrated into another system.
  • the coupling between the units or the coupling between the components may be direct coupling or indirect coupling, and the foregoing coupling includes electrical, mechanical, or other forms of connection.
  • the size of the sequence number of each process does not indicate the order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
  • the terms “system” and “network” are often used interchangeably in this document.
  • the term “and/or” in this document only describes an association relationship between associated objects and indicates that three relationships are possible. For example, A and/or B can mean three cases: A exists alone, A and B exist at the same time, or B exists alone.
  • the character “/" in this text generally indicates that the associated objects before and after are in an "or" relationship.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

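This application provides a method for training a classifier, including: obtaining a first training sample, where the first training sample includes a corresponding semantic label; obtaining a plurality of second training samples, where each second training sample includes a corresponding semantic label; determining a target sample from the plurality of second training samples according to the semantic similarity between the first training sample and the plurality of second training samples; and training a classifier according to the first training sample, the target sample, and the semantic similarity between the first training sample and the target sample. Training the classifier based on semantic similarity can improve the training efficiency and performance of the classifier. In addition, because the method does not use semantic labels for learning during feature extraction, the network structure of the feature extractor does not need to be changed, which can improve the training efficiency of the neural network.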

Description

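Method and Apparatus for Training a Classifier
This application claims priority to Chinese Patent Application No. 202010109899.6, filed with the China Patent Office on February 23, 2020 and entitled "Method and Apparatus for Training a Classifier", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of artificial intelligence, and in particular to a method and an apparatus for training a classifier.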
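Background
A neural network is a tool for implementing artificial intelligence. Before it is applied, a neural network must be trained on a large number of samples to implement a specific function. When the neural network is required to implement a new function, it usually has to be trained again with a large number of new samples.
One way to reduce the workload of retraining a neural network is knowledge transfer. After a neural network has been trained on a large number of samples, it has learned knowledge. When novel samples are used to train the neural network, the knowledge already learned can be used to process the novel samples, so that retraining can be completed with fewer novel samples and the performance of the neural network is improved. In contrast to the novel samples, the large number of samples mentioned above can be called base samples.
When a neural network is trained with knowledge transfer and novel samples, a feature extractor is usually used to extract features from the novel samples, and processing such as classification is performed on the novel samples based on those features. When the categories of the novel samples change, the feature extractor has to relearn the features of the novel samples of the new categories, which increases the training workload.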
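Summary
This application provides a method and an apparatus for training a classifier, which can improve the training efficiency and performance of the classifier.
According to a first aspect, a method for training a classifier is provided, including: obtaining a first training sample, where the first training sample includes a corresponding semantic label; obtaining a plurality of second training samples, where each second training sample includes a corresponding semantic label; determining a target sample from the plurality of second training samples according to the semantic similarity between the first training sample and the plurality of second training samples; and training a classifier according to the first training sample, the target sample, and the semantic similarity between the first training sample and the target sample.
Semantic similarity measures how easily knowledge can be transferred. For example, the semantic similarity between tabby cat and tiger cat is high, which indicates that the feature similarity between tabby cat images and tiger cat images is high; the classification knowledge that the classifier learns from tiger cat image features is therefore easier to transfer to the classification of tabby cat images, and more tiger cat images can be used to train the classifier. The semantic similarity between tabby cat and coonhound is low, which indicates that the feature similarity between tabby cat images and coonhound images is low; the classification knowledge that the classifier learns from coonhound image features is difficult to transfer to the classification of tabby cat images, so fewer coonhound images can be used in training the classifier. Training the classifier based on semantic similarity can therefore improve its training efficiency and performance. In addition, because the method does not use semantic labels for learning during feature extraction, the network structure of the feature extractor does not need to be changed, which can improve the training efficiency of the neural network (which includes the classifier).
Optionally, the training a classifier according to the first training sample, the target sample, and the semantic similarity between the first training sample and the target sample includes: determining a prediction score of the first training sample by using the classifier; determining a semantic transfer loss function L_semantic of the classifier according to the prediction score and the semantic similarity between the first training sample and the target sample, where the semantic similarity between the target sample and the first training sample is used to determine the degree of influence of the prediction score on L_semantic; and training the classifier according to L_semantic.
Training the classifier according to semantic similarity can improve the performance of the classifier.
Optionally, when the semantic similarity between the target sample and the first training sample is greater than or equal to a semantic transfer strength, the degree of influence of the prediction score on L_semantic is 100%; or, when the semantic similarity between the target sample and the first training sample is less than the semantic transfer strength, the degree of influence of the prediction score on L_semantic is 0.
The semantic transfer strength can be set empirically, so that the classifier learns correct classification knowledge and is not misled by incorrect classification knowledge.
Optionally, the training a classifier according to the first training sample, the target sample, and the semantic similarity between the first training sample and the target sample includes: determining a prediction score S_n of the first training sample by using the classifier; determining a prediction score S_b of the target training sample by using the classifier; determining a balanced learning loss function L_IC of the classifier according to S_n, S_b, and a balanced learning strength, where the balanced learning strength is used to adjust the degree of influence of S_n and S_b on L_IC; and training the classifier according to L_IC.
During training of the classifier, because base samples usually outnumber novel samples, the classifier tends, based on the knowledge learned from the base samples, to predict novel samples into the categories of the base samples, which degrades the performance of the classifier. Adjusting the degree of influence of S_n and S_b on L_IC based on the balanced learning strength enables the classifier to focus more on learning the classification knowledge of novel samples, ultimately yielding a better-performing classifier.
Optionally, that the balanced learning strength is used to adjust the degree of influence of S_n and S_b on L_IC includes: the balanced learning strength is used to increase the degree of influence of S_n on L_IC and to reduce the degree of influence of S_b on L_IC.
The foregoing solution enables the classifier to focus on learning the classification knowledge of novel samples and to increase the value of S_n so as to reduce L_IC, ultimately yielding a better-performing classifier.
Optionally, the training a classifier according to the first training sample, the target sample, and the semantic similarity between the first training sample and the target sample includes: obtaining multi-view features of the first training sample and the target sample; and training the classifier according to the multi-view features.
During training, the feature extractor can extract image features from the original image and input the image features into the classifier for training. Because there are many target samples, after the feature extractor has learned to extract features from the target samples, it tends, when extracting the image features of the first training sample, to extract features based on the knowledge it has already learned and to ignore the new content in the first training sample. For this reason, this application provides a feature extraction method that converts each sample into multi-view images (for example, the original image, a foreground image, and a background image). Multi-view images contain richer detail; by extracting image features from the multi-view images, the feature extractor can avoid ignoring the new content in the first training sample and can extract more accurate features, thereby yielding a better-performing classifier.
Optionally, the obtaining multi-view features of the first training sample and the target sample includes: separating a plurality of images from each sample of the first training sample and the target sample, where the views of the plurality of images of each sample differ from one another; obtaining a plurality of features of each sample according to the plurality of images of each sample; and concatenating the plurality of features of each sample to obtain the multi-view feature.
Optionally, the training a classifier according to the first training sample, the target sample, and the semantic similarity between the first training sample and the target sample includes: determining a prediction score of the first training sample by using the classifier; determining a classification loss function L_CE of the classifier according to the prediction score; and training the classifier according to L_CE.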
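According to a second aspect, an image classification method is provided, including: obtaining a feature of an image to be classified; inputting the feature into a neural network for classification, where the neural network includes a classifier, the classifier is obtained through training by using a first training sample, a target sample corresponding to the first training sample, and the semantic similarity between the first training sample and the target sample, and the target sample is determined from a plurality of second training samples according to the semantic similarity between the first training sample and the plurality of second training samples; and obtaining a classification result of the image to be classified.
Training the classifier based on semantic similarity can improve the performance of the classifier, so the classification result obtained for the image to be classified by using this classifier is more accurate.
Optionally, the feature of the image to be classified is a multi-view feature.
Multi-view images contain richer detail; by extracting image features from the multi-view images, the feature extractor can avoid ignoring the new content in the image to be classified and can extract more accurate features, thereby yielding a more accurate classification result.
Optionally, the obtaining a feature of an image to be classified includes: obtaining a plurality of images of different views from the image to be classified; obtaining a feature of each image in the plurality of images of different views; and concatenating the features of the images to obtain the multi-view feature of the image to be classified.
Optionally, that the classifier is obtained through training by using the first training sample, the target sample corresponding to the first training sample, and the semantic similarity between the first training sample and the target sample includes: determining a prediction score of the first training sample by using the classifier; determining a semantic transfer loss function L_semantic of the classifier according to the prediction score and the semantic similarity between the first training sample and the target sample, where the semantic similarity between the target sample and the first training sample is used to determine the degree of influence of the prediction score on L_semantic; and training the classifier according to L_semantic.
Training the classifier according to semantic similarity can improve the performance of the classifier.
Optionally, when the semantic similarity between the target sample and the first training sample is greater than or equal to a semantic transfer strength, the degree of influence of the prediction score on L_semantic is 100%; when the semantic similarity between the target sample and the first training sample is less than the semantic transfer strength, the degree of influence of the prediction score on L_semantic is 0.
The semantic transfer strength can be set empirically, so that the classifier learns correct classification knowledge and is not misled by incorrect classification knowledge.
Optionally, that the classifier is obtained through training by using the first training sample, the target sample corresponding to the first training sample, and the semantic similarity between the first training sample and the target sample includes: determining a prediction score S_n of the first training sample by using the classifier; determining a prediction score S_b of the target training sample by using the classifier; determining a balanced learning loss function L_IC of the classifier according to S_n, S_b, and a balanced learning strength, where the balanced learning strength is used to adjust the degree of influence of S_n and S_b on L_IC; and training the classifier according to L_IC.
During training of the classifier, because base samples usually outnumber novel samples, the classifier tends, based on the knowledge learned from the base samples, to predict novel samples into the categories of the base samples, which degrades the performance of the classifier. Adjusting the degree of influence of S_n and S_b on L_IC based on the balanced learning strength enables the classifier to focus more on learning the classification knowledge of novel samples, ultimately yielding a better-performing classifier.
Optionally, that the balanced learning strength is used to adjust the degree of influence of S_n and S_b on L_IC includes: the balanced learning strength is used to increase the degree of influence of S_n on L_IC and to reduce the degree of influence of S_b on L_IC.
The foregoing solution enables the classifier to focus on learning the classification knowledge of novel samples and to increase the value of S_n so as to reduce L_IC, ultimately yielding a better-performing classifier.
Optionally, that the classifier is obtained through training by using the first training sample, the target sample corresponding to the first training sample, and the semantic similarity between the first training sample and the target sample includes: obtaining multi-view features of the first training sample and the target sample; and training the classifier according to the multi-view features.
During training, the feature extractor can extract image features from the original image and input the image features into the classifier for training. Because there are many target samples, after the feature extractor has learned to extract features from the target samples, it tends, when extracting the image features of the first training sample, to extract features based on the knowledge it has already learned and to ignore the new content in the first training sample. For this reason, this application provides a feature extraction method that converts each sample into multi-view images (for example, the original image, a foreground image, and a background image). Multi-view images contain richer detail; by extracting image features from the multi-view images, the feature extractor can avoid ignoring the new content in the first training sample and can extract more accurate features, thereby yielding a better-performing classifier.
Optionally, the obtaining multi-view features of the first training sample and the target sample includes: separating a plurality of images from each sample of the first training sample and the target sample, where the views of the plurality of images of each sample differ from one another; obtaining a plurality of features of each sample according to the plurality of images of each sample; and concatenating the plurality of features of each sample to obtain the multi-view feature of the target sample.
Optionally, that the classifier is obtained through training by using the first training sample, the target sample corresponding to the first training sample, and the semantic similarity between the first training sample and the target sample includes: determining a prediction score of the first training sample by using the classifier; determining a classification loss function L_CE of the classifier according to the prediction score; and training the classifier according to L_CE.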
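According to a third aspect, another image classification method is provided, including: obtaining a plurality of images of different views from an image to be classified; obtaining a feature of each image in the plurality of images of different views; inputting the feature of each image into a neural network for classification, where the neural network includes a classifier; and obtaining a classification result of the image to be classified.
Multi-view images contain richer detail; by extracting image features from the multi-view images, the feature extractor can avoid ignoring the new content in the image to be classified and can extract more accurate features, thereby yielding a more accurate classification result.
Optionally, the plurality of images of different views include the image to be classified, a foreground image in the image to be classified, and a background image in the image to be classified.
Optionally, the inputting the feature of each image into a neural network for classification includes: concatenating the features of the images and then inputting the concatenated features into the neural network for classification.
Optionally, the classifier is obtained through training by using the method according to any one of the implementations of the first aspect.
According to a fourth aspect, an apparatus for training a classifier is provided, including units configured to perform any one of the methods of the first aspect.
According to a fifth aspect, an image classification apparatus is provided, including units configured to perform any one of the methods of the second aspect.
According to a sixth aspect, an image classification apparatus is provided, including units configured to perform any one of the methods of the third aspect.
According to a seventh aspect, a device for training a classifier is provided, including a processor and a memory, where the memory is configured to store a computer program, and the processor is configured to call and run the computer program from the memory, so that the device performs any one of the methods of the first aspect.
According to an eighth aspect, an image classification device is provided, including a processor and a memory, where the memory is configured to store a computer program, and the processor is configured to call and run the computer program from the memory, so that the device performs any one of the methods of the second aspect.
According to a ninth aspect, an image classification device is provided, including a processor and a memory, where the memory is configured to store a computer program, and the processor is configured to call and run the computer program from the memory, so that the device performs any one of the methods of the third aspect.
According to a tenth aspect, a computer program product is provided, where the computer program product includes computer program code, and when the computer program code is run by an apparatus for training a classifier, the apparatus performs any one of the methods of the first aspect.
According to an eleventh aspect, a computer program product is provided, where the computer program product includes computer program code, and when the computer program code is run by a classification apparatus, the apparatus performs any one of the methods of the second aspect.
According to a twelfth aspect, a computer program product is provided, where the computer program product includes computer program code, and when the computer program code is run by a classification apparatus, the apparatus performs any one of the methods of the third aspect.
According to a thirteenth aspect, a computer-readable medium is provided, where the computer-readable medium stores program code, and the program code includes instructions for performing any one of the methods of the first aspect.
According to a fourteenth aspect, a computer-readable medium is provided, where the computer-readable medium stores program code, and the program code includes instructions for performing any one of the methods of the second aspect.
According to a fifteenth aspect, a computer-readable medium is provided, where the computer-readable medium stores program code, and the program code includes instructions for performing any one of the methods of the third aspect.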
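Brief Description of Drawings
Figure 1 is a schematic diagram of a neural network provided by this application;
Figure 2 is a schematic diagram of a method for training a classifier provided by this application;
Figure 3 is a schematic diagram of determining target samples based on semantic labels provided by this application;
Figure 4 is a schematic diagram of a device for training a classifier provided by this application;
Figure 5 is a schematic diagram of an image classification device provided by this application;
Figure 6 is a schematic diagram of another image classification device provided by this application;
Figure 7 is a schematic diagram of an electronic device provided by this application.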
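Detailed Description
To facilitate understanding of the technical solutions of this application, the concepts involved in this application are first briefly introduced.
An artificial neural network (ANN), referred to as a neural network (NN) for short, is, in the fields of machine learning and cognitive science, a mathematical or computational model that imitates the structure and function of a biological neural network (the central nervous system of animals, especially the brain) and is used to estimate or approximate functions. Artificial neural networks include convolutional neural networks (CNN), deep neural networks (DNN), multilayer perceptrons (MLP), and other neural networks.
Figure 1 is a schematic diagram of a neural network provided by this application. The neural network 100 includes a multi-view image extractor 110, a feature extractor 120, and a classifier 130.
In the training phase, a training image carrying a semantic label is input into the multi-view image extractor 110, which converts the training image into multiple images of different views, such as a foreground image and a background image. The multi-view image extractor 110 may convert the training image into a foreground image and a background image by using a saliency detection network and a multi-view classifier. This application does not limit the specific manner in which the multi-view image extractor 110 works, nor the specific views of the multi-view images. During this processing, the multi-view image extractor 110 may train the multi-view classifier to improve multi-view classification performance.
After obtaining the multiple images of different views, the feature extractor 120 extracts image features from each of them. The multiple images of different views obtained by the feature extractor 120 all belong to one image; for example, they may include the original image, a foreground image of the image, and a background image of the image. The feature extractor 120 can therefore learn more knowledge. The feature extractor 120 may concatenate these image features and input them into the classifier 130, or may input them into the classifier 130 separately.
The classifier 130 is configured to determine, according to the image features, the category to which the training image belongs, then determine a loss function according to the classification result and the semantic label of the training image, and perform training according to the loss function.
After training is completed, the neural network 100 can be applied to image classification. The image classification process is similar to the training process. After an image to be classified is input into the multi-view image extractor 110, it is converted into multiple images of different views. The feature extractor 120 extracts the image features of each of these images and then inputs them into the classifier 130; for example, the image features of the images can be concatenated to form a multi-view feature that is input into the classifier 130. The classifier 130 determines, according to the input image features, the category to which the image to be classified belongs, that is, determines the semantic label of the image to be classified.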
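The multi-view pipeline described above (split one image into views, extract a feature per view, concatenate) can be sketched as follows. This is only an illustration under stated assumptions: the binary `mask` stands in for the output of the saliency detection network, `mean_feature` is a placeholder for the feature extractor 120, and all function names are hypothetical rather than taken from the application.

```python
from typing import Callable, Dict, List

Image = List[List[float]]   # toy grayscale image: rows of pixel values
Feature = List[float]


def split_views(image: Image, mask: List[List[int]]) -> Dict[str, Image]:
    """Return original, foreground and background views of one image.

    `mask` is a hypothetical binary saliency mask (1 = foreground pixel).
    """
    foreground = [[p * m for p, m in zip(row, mrow)]
                  for row, mrow in zip(image, mask)]
    background = [[p * (1 - m) for p, m in zip(row, mrow)]
                  for row, mrow in zip(image, mask)]
    return {"original": image, "foreground": foreground, "background": background}


def mean_feature(image: Image) -> Feature:
    """Placeholder feature extractor: mean pixel value of each row."""
    return [sum(row) / len(row) for row in image]


def multi_view_feature(image: Image, mask: List[List[int]],
                       extract: Callable[[Image], Feature] = mean_feature) -> Feature:
    """Extract one feature per view and concatenate them in a fixed order."""
    views = split_views(image, mask)
    feature: Feature = []
    for name in ("original", "foreground", "background"):
        feature.extend(extract(views[name]))
    return feature


image = [[1.0, 2.0], [3.0, 4.0]]
mask = [[1, 0], [0, 1]]
feature = multi_view_feature(image, mask)
print(feature)
```

Keeping the view order fixed matters here, because the classifier receives the concatenated vector and each position must always correspond to the same view.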
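The method for training a classifier provided by this application is described below.
As shown in Figure 2, the method 200 may be performed by a processor and includes:
S210: Obtain a first training sample, where the first training sample includes a corresponding semantic label.
S220: Obtain a plurality of second training samples, where each second training sample includes a corresponding semantic label.
The first training sample is, for example, a novel sample, and the second training samples are, for example, base samples. The first training sample and the plurality of second training samples are all images carrying semantic labels. After obtaining the first training sample and the plurality of second training samples, the processor may perform the following steps.
S230: Determine a target sample from the plurality of second training samples according to the semantic similarity between the first training sample and the plurality of second training samples.
Semantic labels describe, to a certain extent, the similarity between training samples. The semantic labels carried by the training samples can therefore be used to determine the similarity between training samples.
Figure 3 shows an example of determining the similarity between training samples according to semantic labels.
The semantic label of the first training sample is tabby cat, and the semantic labels of the eight second training samples are tiger cat, panda (bear cat), Persian cat, Egyptian cat, Siamese cat, coonhound, Eskimo dog, and Maltese dog. A pre-trained language model can be used to convert these semantic labels into feature vectors, and the cosine similarity between the feature vector of tabby cat and the feature vector of each of the other eight semantic labels can then be calculated. The higher the cosine similarity, the higher the similarity between the semantic labels. The results are shown by the numbers in Figure 3. The cosine similarity between the feature vectors of Eskimo dog and Maltese dog and the feature vector of tabby cat is too low, so it is difficult for the classifier to transfer the classification knowledge learned from Eskimo dog images and Maltese dog images to the classification of tabby cat images. These two training samples can therefore be discarded, and the remaining six training samples are determined to be the target training samples. The classification knowledge mentioned above is, for example, weights and the connection relationships between neurons.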
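The selection step can be illustrated with a small sketch. The 2-D label vectors and the threshold below are made-up values chosen for easy arithmetic; they are not the numbers in Figure 3, and in practice the vectors would come from a pre-trained language model. The function names are hypothetical.

```python
import math
from typing import Dict, List


def cosine_similarity(u: List[float], v: List[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def select_target_labels(novel_vec: List[float],
                         base_vecs: Dict[str, List[float]],
                         threshold: float) -> Dict[str, float]:
    """Keep base labels whose similarity to the novel label reaches the threshold."""
    selected = {}
    for label, vec in base_vecs.items():
        sim = cosine_similarity(novel_vec, vec)
        if sim >= threshold:
            selected[label] = sim
    return selected


# Toy label embeddings (assumed values, for illustration only).
tabby_cat = [1.0, 0.0]
base_labels = {
    "tiger cat": [4.0, 3.0],
    "coonhound": [3.0, 4.0],
    "Eskimo dog": [0.0, 1.0],
}
targets = select_target_labels(tabby_cat, base_labels, threshold=0.5)
print(sorted(targets))
```

With these toy vectors the Eskimo dog label falls below the threshold and is discarded, mirroring the way the two dog labels are dropped in the example above.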
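After determining the similarity and the target samples, the processor may perform the following step.
S240: Train a classifier according to the first training sample, the target sample, and the semantic similarity between the first training sample and the target sample.
Semantic similarity measures how easily knowledge can be transferred. For example, the semantic similarity between tabby cat and tiger cat is high, which indicates that the feature similarity between tabby cat images and tiger cat images is high; the classification knowledge that the classifier learns from tiger cat image features is therefore easier to transfer to the classification of tabby cat images, and more tiger cat images can be used to train the classifier. The semantic similarity between tabby cat and coonhound is low, which indicates that the feature similarity between tabby cat images and coonhound images is low; the classification knowledge that the classifier learns from coonhound image features is difficult to transfer to the classification of tabby cat images, so fewer coonhound images can be used in training the classifier. Training the classifier based on semantic similarity can therefore improve its training efficiency and performance. In addition, because the method does not use semantic labels for learning during feature extraction, the network structure of the feature extractor does not need to be changed, which can improve the training efficiency of the neural network (such as the neural network 100).
During training, the feature extractor can extract image features from the original image and input the image features into the classifier for training. Because there are many target samples, after the feature extractor has learned to extract features from the target samples, it tends, when extracting the image features of the first training sample, to extract features based on the knowledge it has already learned and to ignore the new content in the first training sample. For this reason, this application provides a feature extraction method that converts each sample into multi-view images (such as a foreground image and a background image). Multi-view images contain richer detail; by extracting image features from the multi-view images, the feature extractor can avoid ignoring the new content in the first training sample and can extract more accurate features, thereby yielding a better-performing classifier.
Optionally, during training of the classifier, the processor may determine the prediction score of the first training sample by using the classifier; then determine the semantic transfer loss function L_semantic of the classifier according to the prediction score and the semantic similarity between the first training sample and the target sample; and then train the classifier according to L_semantic.
The semantic similarity between the target sample and the first training sample is used to determine the degree of influence of the prediction score on L_semantic. The higher the semantic similarity, the greater the influence; the lower the semantic similarity, the smaller the influence. That is, the semantic similarity is positively correlated with the degree of influence.
A semantic transfer strength can be set. When the semantic similarity between the target sample and the first training sample is greater than or equal to the semantic transfer strength, the degree of influence of the prediction score on L_semantic is 100%; when the semantic similarity between the target sample and the first training sample is less than the semantic transfer strength, the degree of influence of the prediction score on L_semantic is 0.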
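L_semantic can be determined based on formula (1):
[Formula (1), image PCTCN2020117613-appb-000001]
where C_base denotes the number of categories of the target samples; base denotes one category of target samples; s_j is the score obtained by classifying the first training sample using the knowledge of training sample j,
[image PCTCN2020117613-appb-000002]
R denotes the real numbers, and C_novel denotes the number of categories of the novel samples; l_j is the semantic similarity between training sample j and the first training sample,
[image PCTCN2020117613-appb-000003]
and α is the semantic transfer strength.
When l_j ≥ α, the semantic similarity between training sample j and the first training sample is high, and s_j can be used as a factor that affects L_semantic, so that the classifier learns correct classification knowledge. When l_j < α, the semantic similarity between training sample j and the first training sample is low, and the influence of s_j on L_semantic need not be considered, which prevents the classifier from being misled by incorrect classification knowledge.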
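The all-or-nothing gating by α described above can be sketched as follows. Formula (1) itself is only available as an image, so this sketch does not reproduce it: the per-score term `-s` (a higher score gives a lower loss) is a placeholder assumption, and the function names are hypothetical. Only the thresholding of l_j against α comes from the text.

```python
from typing import Callable, List


def transfer_weights(similarities: List[float], alpha: float) -> List[float]:
    """Weight 1.0 (100% influence) when l_j >= alpha, otherwise 0.0."""
    return [1.0 if l_j >= alpha else 0.0 for l_j in similarities]


def gated_semantic_loss(scores: List[float],
                        similarities: List[float],
                        alpha: float,
                        term: Callable[[float], float] = lambda s: -s) -> float:
    """Sum a per-score term over the training samples j that pass the gate.

    `term` is a placeholder for the per-score contribution; the real
    contribution is defined by formula (1) and is not assumed here.
    """
    weights = transfer_weights(similarities, alpha)
    return sum(w * term(s_j) for w, s_j in zip(weights, scores))


scores = [2.0, 1.0, 0.5]         # s_j for three training samples j (toy values)
similarities = [0.9, 0.6, 0.2]   # l_j for the same samples (toy values)
weights = transfer_weights(similarities, alpha=0.5)
loss = gated_semantic_loss(scores, similarities, alpha=0.5)
print(weights, loss)
```

The third sample has l_j below α, so its score contributes nothing to the loss regardless of its value, which is the behaviour the text describes for dissimilar samples.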
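It follows that training the classifier according to semantic similarity can improve the performance of the classifier.
During training of the classifier, because base samples usually outnumber novel samples, the classifier tends, based on the knowledge learned from the base samples, to predict novel samples into the categories of the base samples, which degrades the performance of the classifier.
Optionally, a balanced learning loss function L_IC can be introduced into the training process of the classifier to solve this problem. When the classifier's prediction score for the first training sample is S_n and its prediction score for the target training sample is S_b, the processor can determine L_IC according to S_n, S_b, and the balanced learning strength, and then train the classifier according to L_IC.
L_IC can be determined according to formula (2):
[Formula (2), image PCTCN2020117613-appb-000004]
where
[image PCTCN2020117613-appb-000005]
s_b is a real number (the score of a single sample) or a vector (the scores of multiple samples);
[image PCTCN2020117613-appb-000006]
s_n is a real number (the score of a single sample) or a vector (the scores of multiple samples); <s_b, s_n> denotes the product of s_b and s_n; ||s_b|| denotes the absolute value of the real number s_b or the modulus of the vector s_b; ||s_n|| denotes the absolute value of the real number s_n or the modulus of the vector s_n; and β denotes the balanced learning strength.
The balanced learning strength is used to adjust the degree of influence of S_n and S_b on L_IC. For example, the balanced learning strength is used to increase the degree of influence of S_n on L_IC and to reduce the degree of influence of S_b on L_IC. In this way, the classifier needs to focus more on learning the classification knowledge of novel samples and to increase the value of S_n so as to reduce L_IC, ultimately yielding a better-performing classifier.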
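The quantities named in the definitions of formula (2) can be computed as in the sketch below. Formula (2) is only available as an image, so how these quantities are combined with β is not shown; the normalized ratio at the end is just one natural combination and is an assumption, not the formula from the application. The helper names are hypothetical.

```python
import math
from typing import List, Union

Score = Union[float, List[float]]


def as_vector(s: Score) -> List[float]:
    """Treat a single real score as a one-element vector."""
    return list(s) if isinstance(s, (list, tuple)) else [float(s)]


def product(s_b: Score, s_n: Score) -> float:
    """<s_b, s_n>: ordinary product for reals, inner product for vectors."""
    return sum(b * n for b, n in zip(as_vector(s_b), as_vector(s_n)))


def norm(s: Score) -> float:
    """||s||: absolute value of a real number or modulus of a vector."""
    return math.sqrt(sum(x * x for x in as_vector(s)))


def normalized_product(s_b: Score, s_n: Score) -> float:
    """Assumed combination: <s_b, s_n> / (||s_b|| * ||s_n||)."""
    return product(s_b, s_n) / (norm(s_b) * norm(s_n))


s_b = [3.0, 4.0]   # toy scores of base (target) samples
s_n = [4.0, 3.0]   # toy scores of novel samples
print(product(s_b, s_n), norm(s_b), norm(s_n), normalized_product(s_b, s_n))
```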
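In addition to L_semantic and L_IC, the processor can also determine the classification loss function L_CE of the classifier according to the prediction score obtained by classifying the novel sample, and train the classifier according to L_CE. For example, the processor can complete the training of the classifier by minimizing L_semantic + L_IC + L_CE.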
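A minimal sketch of the combined objective follows. The text does not define L_CE; reading it as a softmax cross-entropy over the prediction scores is an assumption suggested by the subscript, and the function names are hypothetical. The unweighted sum matches the expression L_semantic + L_IC + L_CE above.

```python
import math
from typing import List


def cross_entropy(scores: List[float], true_index: int) -> float:
    """Softmax cross-entropy of raw scores (assumed form of L_CE)."""
    m = max(scores)  # subtract the maximum for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - scores[true_index]


def total_loss(l_semantic: float, l_ic: float, l_ce: float) -> float:
    """Objective to minimize: L_semantic + L_IC + L_CE."""
    return l_semantic + l_ic + l_ce


l_ce = cross_entropy([0.0, 0.0], true_index=0)   # two equally scored classes
objective = total_loss(0.5, 0.25, l_ce)          # toy values for the other terms
print(l_ce, objective)
```

In an actual training loop the three terms would be computed from the same batch of prediction scores and the sum passed to the optimizer; the toy constants here only show how the terms add up.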
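Several examples of the beneficial effects of this application are given below.
Table 1 compares the test results of a classifier trained using L_CE and L_IC with those of a classifier trained using only L_CE.
Table 1
[Table 1, image PCTCN2020117613-appb-000007]
As Table 1 shows, the prediction scores of the classifier trained using L_CE and L_IC are generally higher than those of the classifier trained using only L_CE.
Table 2 shows the test results of the method 200 on a public large-scale few-shot dataset.
Table 2
[Table 2, image PCTCN2020117613-appb-000008]
[Table 2 (continued), image PCTCN2020117613-appb-000009]
As Table 2 shows, the gain of the method 200 is substantially higher than that of the other methods.
Table 3 shows the effect of combining the method 200 with existing large-scale few-shot recognition methods.
Table 3
[Table 3, image PCTCN2020117613-appb-000010]
As Table 3 shows, combining the method 200 with existing large-scale few-shot recognition methods improves accuracy.
In addition to large-scale few-shot recognition tasks, the method 200 can also be applied to traditional few-shot recognition tasks. The specific experimental results are shown in Table 4.
Table 4
MiniImageNet Feature extractor K=1 K=5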
RelationNet[CVPR 2018] Conv-4-64 50.40±0.80% 65.30±0.70%
MetaGAN[NeurIPS 2018] Conv-4-64 52.71±0.64% 68.63±0.67%
R2-D2[ICLR 2019] Conv-4-64 48.70±0.60% 65.50±0.60%
DN4[CVPR2019] Conv-4-64 51.24±0.74% 71.02±0.64%
MetaNet[ICML 2017] ResNets-12 57.10±0.70% 70.04±0.63%
TADAM[NeurIPS 2018] ResNets-12 58.05±0.30% 76.70±0.30%
MTL[CVPR2019] ResNets-12 61.20±1.20% 75.50±0.80%
PPA[CVPR2018] WRN-28-10 59.60±0.41% 73.74±0.19%
LEO[ICLR 2019] WRN-28-10 61.76±0.08% 77.59±0.12%
LwoF[CVPR2018] WRN-28-10 60.06±0.14% 76.39±0.11%
wDAE-GNN[CVPR 2019] WRN-28-10 62.96±0.15% 78.85±0.10%
Method 200 WRN-28-10 64.40±0.43% 83.05±0.28%
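Examples of the method for training a classifier and the image classification method provided by this application are described in detail above. It can be understood that, to implement the foregoing functions, the corresponding apparatus includes hardware structures and/or software modules corresponding to each function. A person skilled in the art should readily appreciate that, in combination with the units and algorithm steps of the examples described in the embodiments disclosed herein, this application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.
This application can divide the device for training a classifier and the image classification device into functional units according to the foregoing method examples. For example, each function can be assigned to its own functional unit, or two or more functions can be integrated into one unit. The integrated unit can be implemented in the form of hardware or in the form of a software functional unit. It should be noted that the division of units in this application is illustrative and is merely a logical function division; other division methods are possible in actual implementation.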
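Figure 4 is a schematic structural diagram of a device for training a classifier provided by this application. The device 400 includes a processor 410 and a memory 420. The memory 420 is configured to store a computer program, and the processor 410 is configured to call and run the computer program from the memory 420 to perform: obtaining a first training sample, where the first training sample includes a corresponding semantic label; obtaining a plurality of second training samples, where each second training sample includes a corresponding semantic label; determining a target sample from the plurality of second training samples according to the semantic similarity between the first training sample and the plurality of second training samples; and training a classifier according to the first training sample, the target sample, and the semantic similarity between the first training sample and the target sample.
Optionally, the processor 410 is specifically configured to: determine a prediction score of the first training sample by using the classifier; determine a semantic transfer loss function L_semantic of the classifier according to the prediction score and the semantic similarity between the first training sample and the target sample, where the semantic similarity between the target sample and the first training sample is used to determine the degree of influence of the prediction score on L_semantic; and train the classifier according to L_semantic.
Optionally, when the semantic similarity between the target sample and the first training sample is greater than or equal to a semantic transfer strength, the degree of influence of the prediction score on L_semantic is 100%; or, when the semantic similarity between the target sample and the first training sample is less than the semantic transfer strength, the degree of influence of the prediction score on L_semantic is 0.
Optionally, the processor 410 is specifically configured to: determine a prediction score S_n of the first training sample by using the classifier; determine a prediction score S_b of the target training sample by using the classifier; determine a balanced learning loss function L_IC of the classifier according to S_n, S_b, and a balanced learning strength, where the balanced learning strength is used to adjust the degree of influence of S_n and S_b on L_IC; and train the classifier according to L_IC.
Optionally, the balanced learning strength is used to increase the degree of influence of S_n on L_IC and to reduce the degree of influence of S_b on L_IC.
Optionally, the processor 410 is specifically configured to: obtain multi-view features of the first training sample and the target sample; and train the classifier according to the multi-view features.
Optionally, the processor 410 is specifically configured to: separate a plurality of images from each sample of the first training sample and the target sample, where the views of the plurality of images differ from one another; obtain a plurality of features of each sample of the first training sample and the target sample according to the plurality of images; and concatenate the plurality of features to obtain the multi-view feature.
Optionally, the processor 410 is specifically configured to: determine a prediction score of the first training sample by using the classifier; determine a classification loss function L_CE of the classifier according to the prediction score; and train the classifier according to L_CE.
For the specific manner in which the device 400 performs the method for training a classifier and the resulting beneficial effects, refer to the related descriptions in the method embodiments.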
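Figure 5 is a schematic structural diagram of an image classification device provided by this application. The device 500 includes a processor 510 and a memory 520. The memory 520 is configured to store a computer program, and the processor 510 is configured to call and run the computer program from the memory 520 to perform: obtaining a feature of an image to be classified; inputting the feature into a neural network for classification, where the neural network includes a classifier and the classifier is obtained by training with the method 200; and obtaining a classification result of the image to be classified.
Optionally, the feature of the image to be classified is a multi-view feature.
Optionally, the processor 510 is specifically configured to: obtain a plurality of images of different views from the image to be classified; obtain a feature of each image in the plurality of images of different views; and concatenate the features of the images to obtain the multi-view feature of the image to be classified.
For the specific manner in which the device 500 performs the image classification method and the resulting beneficial effects, refer to the related descriptions in the method embodiments.
Optionally, the device 500 and the device 400 are the same device.
Figure 6 is a schematic structural diagram of another image classification device provided by this application. The device 600 includes a processor 610 and a memory 620. The memory 620 is configured to store a computer program, and the processor 610 is configured to call and run the computer program from the memory 620 to perform: obtaining a plurality of images of different views from an image to be classified; obtaining a feature of each image in the plurality of images of different views; inputting the feature of each image into a neural network for classification, where the neural network includes a classifier; and obtaining a classification result of the image to be classified.
Optionally, the plurality of images of different views include at least two of the following: the image to be classified, a foreground image in the image to be classified, and a background image in the image to be classified.
Optionally, the inputting the feature of each image into a neural network for classification includes: concatenating the features of the images and then inputting the concatenated features into the neural network for classification.
Optionally, the classifier is obtained by training with the method 200.
For the specific manner in which the device 600 performs the image classification method and the resulting beneficial effects, refer to the related descriptions in the method embodiments.
Optionally, the device 600 and the device 400 are the same device.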
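FIG. 7 is a schematic structural diagram of an electronic device according to this application. A dashed line in FIG. 7 indicates that the corresponding unit or module is optional. The device 700 may be configured to implement the methods described in the foregoing method embodiments. The device 700 may be a terminal device, a server, or a chip.
The device 700 includes one or more processors 701, which can support the device 700 in implementing the methods in the method embodiments. The processor 701 may be a general-purpose processor or a dedicated processor. For example, the processor 701 may be a central processing unit (CPU). The CPU may be configured to control the device 700, execute a software program, and process data of the software program. The device 700 may further include a communication unit 705, configured to implement input (receiving) and output (sending) of signals.
For example, the device 700 may be a chip, and the communication unit 705 may be an input and/or output circuit of the chip, or the communication unit 705 may be a communication interface of the chip. The chip may serve as a component of a terminal device, a network device, or another electronic device.
For another example, the device 700 may be a terminal device or a server, and the communication unit 705 may be a transceiver of the terminal device or the server, or the communication unit 705 may be a transceiver circuit of the terminal device or the server.
The device 700 may include one or more memories 702 that store a program 704. The program 704 can be run by the processor 701 to generate instructions 703, so that the processor 701 performs, according to the instructions 703, the methods described in the foregoing method embodiments. Optionally, the memory 702 may further store data. Optionally, the processor 701 may further read the data stored in the memory 702. The data may be stored at the same storage address as the program 704, or at a different storage address from the program 704.
The processor 701 and the memory 702 may be disposed separately or integrated together, for example, integrated on a system on chip (SOC) of a terminal device.
The device 700 may further include an antenna 706. The communication unit 705 is configured to implement the transceiver function of the device 700 through the antenna 706.
For the specific manner in which the processor 701 performs the method for training a classifier and the image classification method, refer to the related descriptions in the method embodiments.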
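It should be understood that the steps of the foregoing method embodiments may be completed by a logic circuit in hardware form or by instructions in software form in the processor 701. The processor 701 may be a CPU, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA), or another programmable logic device, for example, a discrete gate, a transistor logic device, or a discrete hardware component.
This application further provides a computer program product. When the computer program product is executed by the processor 701, the method according to any one of the method embodiments of this application is implemented.
The computer program product may be stored in the memory 702, for example, as the program 704. The program 704 is finally converted, through processing such as preprocessing, compilation, assembly, and linking, into an executable object file that can be executed by the processor 701.
This application further provides a computer-readable storage medium that stores a computer program. When the computer program is executed by a computer, the method according to any one of the method embodiments of this application is implemented. The computer program may be a high-level language program or an executable object program.
The computer-readable storage medium is, for example, the memory 702. The memory 702 may be a volatile memory or a non-volatile memory, or the memory 702 may include both a volatile memory and a non-volatile memory. The non-volatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically EPROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which serves as an external cache. By way of example and not limitation, many forms of RAM are available, such as a static RAM (SRAM), a dynamic RAM (DRAM), a synchronous DRAM (SDRAM), a double data rate SDRAM (DDR SDRAM), an enhanced SDRAM (ESDRAM), a synchlink DRAM (SLDRAM), and a direct rambus RAM (DR RAM).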
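A person skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the apparatuses and devices described above and the technical effects produced, reference may be made to the corresponding processes and technical effects in the foregoing method embodiments. Details are not repeated here.
In the several embodiments provided in this application, the disclosed system, apparatus, and method may be implemented in other manners. For example, some features of the method embodiments described above may be ignored or not performed. The apparatus embodiments described above are merely illustrative. The division into units is merely a division by logical function, and other divisions are possible in actual implementation. A plurality of units or components may be combined or integrated into another system. In addition, the coupling between units or between components may be direct or indirect, and includes electrical, mechanical, or other forms of connection.
It should be understood that, in the various embodiments of this application, the sequence numbers of the processes do not imply an order of execution. The execution order of the processes should be determined by their functions and internal logic, and the sequence numbers should not constitute any limitation on the implementation of the embodiments of this application.
In addition, the terms "system" and "network" are often used interchangeably in this document. The term "and/or" in this document merely describes an association between associated objects and indicates that three relationships may exist. For example, A and/or B may represent three cases: A alone, both A and B, and B alone. In addition, the character "/" in this document generally indicates an "or" relationship between the associated objects before and after it.
In summary, the foregoing are merely preferred embodiments of the technical solutions of this application and are not intended to limit the protection scope of this application. Any modification, equivalent replacement, or improvement made within the spirit and principles of this application shall fall within the protection scope of this application.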

Claims (26)

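  1. A method for training a classifier, comprising:
    obtaining a first training sample, wherein the first training sample comprises a corresponding semantic label;
    obtaining a plurality of second training samples, wherein each of the second training samples comprises a corresponding semantic label;
    determining a target sample from the plurality of second training samples based on semantic similarity between the first training sample and the plurality of second training samples; and
    training a classifier based on the first training sample, the target sample, and semantic similarity between the first training sample and the target sample.
  2. The method according to claim 1, wherein the training a classifier based on the first training sample, the target sample, and semantic similarity between the first training sample and the target sample comprises:
    determining a prediction score of the first training sample by using the classifier;
    determining a semantic transfer loss function L_semantic of the classifier based on the prediction score and the semantic similarity between the first training sample and the target sample, wherein the semantic similarity between the target sample and the first training sample is used to determine a degree of influence of the prediction score on the L_semantic; and
    training the classifier based on the L_semantic.
  3. The method according to claim 2, wherein
    when the semantic similarity between the target sample and the first training sample is greater than or equal to a semantic transfer strength, the degree of influence of the prediction score on the L_semantic is 100%; and
    when the semantic similarity between the target sample and the first training sample is less than the semantic transfer strength, the degree of influence of the prediction score on the L_semantic is 0.
  4. The method according to any one of claims 1 to 3, wherein the training a classifier based on the first training sample, the target sample, and semantic similarity between the first training sample and the target sample comprises:
    determining a prediction score S_n of the first training sample by using the classifier;
    determining a prediction score S_b of the target sample by using the classifier;
    determining a balanced learning loss function L_IC of the classifier based on the S_n, the S_b, and a balanced learning strength, wherein the balanced learning strength is used to adjust degrees of influence of the S_n and the S_b on the L_IC; and
    training the classifier based on the L_IC.
  5. The method according to claim 4, wherein that the balanced learning strength is used to adjust degrees of influence of the S_n and the S_b on the L_IC comprises:
    the balanced learning strength is used to increase the degree of influence of the S_n on the L_IC and to decrease the degree of influence of the S_b on the L_IC.
  6. The method according to any one of claims 1 to 5, wherein the training a classifier based on the first training sample, the target sample, and semantic similarity between the first training sample and the target sample comprises:
    obtaining multi-view features of the first training sample and the target sample; and
    training the classifier based on the multi-view features.
  7. The method according to claim 6, wherein the obtaining multi-view features of the first training sample and the target sample comprises:
    separating a plurality of images from each of the first training sample and the target sample, wherein the plurality of images of each sample have mutually different views;
    obtaining a plurality of features of each sample based on the plurality of images of the sample; and
    concatenating the plurality of features of each sample to obtain the multi-view feature.
  8. The method according to any one of claims 1 to 7, wherein the training a classifier based on the first training sample, the target sample, and semantic similarity between the first training sample and the target sample comprises:
    determining a prediction score of the first training sample by using the classifier;
    determining a classification loss function L_CE of the classifier based on the prediction score; and
    training the classifier based on the L_CE.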
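  9. An image classification method, comprising:
    obtaining a feature of a to-be-classified image;
    inputting the feature into a neural network for classification, wherein the neural network comprises a classifier, the classifier is obtained through training based on a first training sample, a target sample corresponding to the first training sample, and semantic similarity between the first training sample and the target sample, and the target sample is determined from a plurality of second training samples based on semantic similarity between the first training sample and the plurality of second training samples; and
    obtaining a classification result of the to-be-classified image.
  10. The method according to claim 9, wherein the feature of the to-be-classified image is a multi-view feature.
  11. The method according to claim 10, wherein the obtaining a feature of a to-be-classified image comprises:
    obtaining a plurality of images of different views from the to-be-classified image;
    obtaining a feature of each of the plurality of images of different views; and
    concatenating the features of the images to obtain the multi-view feature of the to-be-classified image.
  12. The method according to any one of claims 9 to 11, wherein that the classifier is obtained through training based on a first training sample, a target sample corresponding to the first training sample, and semantic similarity between the first training sample and the target sample comprises:
    determining a prediction score of the first training sample by using the classifier;
    determining a semantic transfer loss function L_semantic of the classifier based on the prediction score and the semantic similarity between the first training sample and the target sample, wherein the semantic similarity between the target sample and the first training sample is used to determine a degree of influence of the prediction score on the L_semantic; and
    training the classifier based on the L_semantic.
  13. The method according to claim 12, wherein
    when the semantic similarity between the target sample and the first training sample is greater than or equal to a semantic transfer strength, the degree of influence of the prediction score on the L_semantic is 100%; and
    when the semantic similarity between the target sample and the first training sample is less than the semantic transfer strength, the degree of influence of the prediction score on the L_semantic is 0.
  14. The method according to any one of claims 9 to 13, wherein that the classifier is obtained through training based on a first training sample, a target sample corresponding to the first training sample, and semantic similarity between the first training sample and the target sample comprises:
    determining a prediction score S_n of the first training sample by using the classifier;
    determining a prediction score S_b of the target sample by using the classifier;
    determining a balanced learning loss function L_IC of the classifier based on the S_n, the S_b, and a balanced learning strength, wherein the balanced learning strength is used to adjust degrees of influence of the S_n and the S_b on the L_IC; and
    training the classifier based on the L_IC.
  15. The method according to claim 14, wherein that the balanced learning strength is used to adjust degrees of influence of the S_n and the S_b on the L_IC comprises:
    the balanced learning strength is used to increase the degree of influence of the S_n on the L_IC and to decrease the degree of influence of the S_b on the L_IC.
  16. The method according to any one of claims 9 to 15, wherein that the classifier is obtained through training based on a first training sample, a target sample corresponding to the first training sample, and semantic similarity between the first training sample and the target sample comprises:
    obtaining multi-view features of the first training sample and the target sample; and
    training the classifier based on the multi-view features.
  17. The method according to claim 16, wherein the obtaining multi-view features of the first training sample and the target sample comprises:
    separating a plurality of images from each of the first training sample and the target sample, wherein the plurality of images of each sample have mutually different views;
    obtaining a plurality of features of each sample based on the plurality of images of the sample; and
    concatenating the plurality of features of each sample to obtain the multi-view feature of the target sample.
  18. The method according to any one of claims 9 to 17, wherein that the classifier is obtained through training based on a first training sample, a target sample corresponding to the first training sample, and semantic similarity between the first training sample and the target sample comprises:
    determining a prediction score of the first training sample by using the classifier;
    determining a classification loss function L_CE of the classifier based on the prediction score; and
    training the classifier based on the L_CE.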
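  19. An image classification method, comprising:
    obtaining a plurality of images of different views from a to-be-classified image;
    obtaining a feature of each of the plurality of images of different views;
    inputting the feature of each image into a neural network for classification, wherein the neural network comprises a classifier; and
    obtaining a classification result of the to-be-classified image.
  20. The method according to claim 19, wherein the plurality of images of different views comprise at least two of the following: the to-be-classified image, a foreground image in the to-be-classified image, and a background image in the to-be-classified image.
  21. The method according to claim 19 or 20, wherein the inputting the feature of each image into a neural network for classification comprises:
    concatenating the features of the images and inputting the concatenated feature into the neural network for classification.
  22. The method according to any one of claims 19 to 21, wherein the classifier is obtained through training by using the method according to any one of claims 1 to 8.
  23. An apparatus for training a classifier, comprising a processor and a memory, wherein the memory is configured to store a computer program, and the processor is configured to invoke the computer program from the memory and run it to perform the method according to any one of claims 1 to 8.
  24. An image classification apparatus, comprising a processor and a memory, wherein the memory is configured to store a computer program, and the processor is configured to invoke the computer program from the memory and run it to perform the method according to any one of claims 9 to 22.
  25. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the processor is enabled to perform the method according to any one of claims 1 to 8.
  26. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the processor is enabled to perform the method according to any one of claims 9 to 22.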
PCT/CN2020/117613 2020-02-23 2020-09-25 训练分类器的方法和装置 Ceased WO2021164254A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP20919863.9A EP4109335A4 (en) 2020-02-23 2020-09-25 METHOD AND DEVICE FOR TRAINING A CLASSIFIER
US17/892,908 US12536471B2 (en) 2020-02-23 2022-08-22 Method and apparatus for training classifier

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010109899.6 2020-02-23
CN202010109899.6A CN111382782B (zh) 2020-02-23 2020-02-23 训练分类器的方法和装置

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/892,908 Continuation US12536471B2 (en) 2020-02-23 2022-08-22 Method and apparatus for training classifier

Publications (1)

Publication Number Publication Date
WO2021164254A1 true WO2021164254A1 (zh) 2021-08-26

Family

ID=71215189

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/117613 Ceased WO2021164254A1 (zh) 2020-02-23 2020-09-25 训练分类器的方法和装置

Country Status (4)

Country Link
US (1) US12536471B2 (zh)
EP (1) EP4109335A4 (zh)
CN (1) CN111382782B (zh)
WO (1) WO2021164254A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114842255A (zh) * 2022-05-05 2022-08-02 西安交通大学 自适应图约束多视图线性判别分析方法、系统及存储介质
CN115187840A (zh) * 2022-06-20 2022-10-14 支付宝(杭州)信息技术有限公司 训练关系识别模型、进行图分析的方法及装置

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111382782B (zh) 2020-02-23 2024-04-26 华为技术有限公司 训练分类器的方法和装置
CN112766387B (zh) * 2021-01-25 2024-01-23 卡奥斯数字科技(上海)有限公司 一种训练数据的纠错方法、装置、设备及存储介质
CN113111977B (zh) * 2021-05-20 2021-11-09 润联软件系统(深圳)有限公司 训练样本的贡献度评价方法、装置及相关设备
CN113255701B (zh) * 2021-06-24 2021-10-22 军事科学院系统工程研究院网络信息研究所 一种基于绝对-相对学习架构的小样本学习方法和系统
CN114281985B (zh) * 2021-09-30 2025-03-07 腾讯科技(深圳)有限公司 样本特征空间增强方法及装置
CN115510074B (zh) * 2022-11-09 2023-03-03 成都了了科技有限公司 基于一张表的分布式数据管理及应用系统
CN117115678B (zh) * 2023-09-08 2025-04-25 西北工业大学 一种迁移学习驱动的未爆弹弹坑定位与识别方法
CN117609842A (zh) * 2023-11-30 2024-02-27 安徽大学 一种基于元学习的细胞分类方法
CN120183089A (zh) * 2025-03-31 2025-06-20 南京天奥智能医疗科技有限公司 基于多模自适应物联网通信的智能药柜精准信息交互系统

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108053030A (zh) * 2017-12-15 2018-05-18 清华大学 一种开放领域的迁移学习方法及系统
CN108510004A (zh) * 2018-04-04 2018-09-07 深圳大学 一种基于深度残差网络的细胞分类方法及系统
CN110309875A (zh) * 2019-06-28 2019-10-08 哈尔滨工程大学 一种基于伪样本特征合成的零样本目标分类方法
CN110378408A (zh) * 2019-07-12 2019-10-25 台州宏创电力集团有限公司 基于迁移学习和神经网络的电力设备图像识别方法和装置
AU2020100052A4 (en) * 2020-01-10 2020-02-13 Gao, Yiang Mr Unattended video classifying system based on transfer learning
CN111382782A (zh) * 2020-02-23 2020-07-07 华为技术有限公司 训练分类器的方法和装置

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10706324B2 (en) * 2017-01-19 2020-07-07 Hrl Laboratories, Llc Multi-view embedding with soft-max based compatibility function for zero-shot learning
WO2018140014A1 (en) * 2017-01-25 2018-08-02 Athelas, Inc. Classifying biological samples using automated image analysis
CN108805803B (zh) * 2018-06-13 2020-03-13 衡阳师范学院 一种基于语义分割与深度卷积神经网络的肖像风格迁移方法
CN109658366A (zh) * 2018-10-23 2019-04-19 平顶山天安煤业股份有限公司 基于改进ransac和动态融合的实时视频拼接方法
CN110164550B (zh) * 2019-05-22 2021-07-09 杭州电子科技大学 一种基于多视角协同关系的先天性心脏病辅助诊断方法
CN110738309B (zh) * 2019-09-27 2022-07-12 华中科技大学 Ddnn的训练方法和基于ddnn的多视角目标识别方法和系统

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4109335A4

Also Published As

Publication number Publication date
US20230177390A1 (en) 2023-06-08
CN111382782B (zh) 2024-04-26
EP4109335A4 (en) 2023-08-16
EP4109335A1 (en) 2022-12-28
US12536471B2 (en) 2026-01-27
CN111382782A (zh) 2020-07-07

Similar Documents

Publication Publication Date Title
WO2021164254A1 (zh) 训练分类器的方法和装置
US12314343B2 (en) Image classification method, neural network training method, and apparatus
Weng et al. Cattle face recognition based on a Two-Branch convolutional neural network
US20210335002A1 (en) Method, apparatus, terminal, and storage medium for training model
WO2022077646A1 (zh) 一种用于图像处理的学生模型的训练方法及装置
CN114612961B (zh) 一种多源跨域表情识别方法、装置及存储介质
CN113392866A (zh) 一种基于人工智能的图像处理方法、装置及存储介质
CN113887447B (zh) 一种针对密集群体目标的密度估计、分类预测模型的训练、推理方法及装置
CN112204575A (zh) 使用文本和视觉嵌入的多模态图像分类器
CN113743426A (zh) 一种训练方法、装置、设备以及计算机可读存储介质
CN110033026A (zh) 一种连续小样本图像的目标检测方法、装置及设备
CN112950642B (zh) 点云实例分割模型的训练方法、装置、电子设备和介质
CN110598603A (zh) 人脸识别模型获取方法、装置、设备和介质
CN110175657B (zh) 一种图像多标签标记方法、装置、设备及可读存储介质
CN107292349A (zh) 基于百科知识语义增强的零样本分类方法、装置
CN114281985B (zh) 样本特征空间增强方法及装置
CN110162639A (zh) 识图知意的方法、装置、设备及存储介质
CN111768457A (zh) 图像数据压缩方法、装置、电子设备和存储介质
CN113704528B (zh) 聚类中心确定方法、装置和设备及计算机存储介质
CN115546831A (zh) 一种多粒度注意力机制跨模态行人搜索方法和系统
Hu et al. Sketch-a-segmenter: Sketch-based photo segmenter generation
CN117422942A (zh) 模型训练方法、图像分类方法、图像分类装置及存储介质
CN115359296A (zh) 图像识别方法、装置、电子设备及存储介质
CN117475369A (zh) 一种基于深度学习的文本描述行人重识别方法
CN115641480A (zh) 一种基于样本筛选与标签矫正的噪声数据集训练方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20919863

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2020919863

Country of ref document: EP

Effective date: 20220923