CN110175615A - Model training method, domain-adaptive visual position recognition method and device - Google Patents

Model training method, domain-adaptive visual position recognition method and device

Info

Publication number
CN110175615A
Authority
CN
China
Prior art keywords
image
feature
layer
feature extraction
model
Legal status
Granted
Application number
CN201910350741.5A
Other languages
Chinese (zh)
Other versions
CN110175615B (en)
Inventor
桑农
刘耀华
高常鑫
Current Assignee
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Application filed by Huazhong University of Science and Technology
Priority to CN201910350741.5A
Publication of CN110175615A
Application granted
Publication of CN110175615B
Expired - Fee Related (current legal status)
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133 Distances to prototypes
    • G06F18/24137 Distances to cluster centroïds
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a model training method, a domain-adaptive visual position recognition method and a device, belonging to the technical field of computer vision. The method includes: establishing an image feature extraction model based on a deep neural network; constructing a training set from a standard data set, where each training sample consists of a target image, its positive sample and s negative samples; and training the image feature extraction model with the training set. In the image feature extraction model, the feature extraction network comprises a plurality of cascaded first networks. Each first network consists of one or more second networks followed by a max pooling layer, which performs feature selection. Each second network consists of, connected in sequence, a convolution layer for feature extraction, a batch normalization layer for zero-mean normalization, and an activation function layer for activation. The local feature aggregation network aggregates local features to obtain the feature vector of the image. The invention improves the robustness of visual position recognition.

Description

Model training method, domain-adaptive visual position identification method and device
Technical Field
The invention belongs to the technical field of computer vision, and in particular relates to a model training method, a domain-adaptive visual position recognition method and a domain-adaptive visual position recognition device.
Background
Visual position recognition refers to extracting features from an image and then identifying the geographic location of the image from the extracted features. With the development of autonomous driving, the growing demand for autonomously navigating mobile robots, and the increasing popularity of virtual and augmented reality, research on visual position recognition has attracted considerable attention in computer vision, robotics and related fields.
In the early stage of computer vision research, image features such as Scale-Invariant Feature Transform (SIFT) keypoints were extracted mainly by carefully hand-designed methods. Designing such features depends heavily on experience, and some experts have taken decades to design a good feature. Hand-designed keypoint extraction performs very poorly under drastic illumination changes (for example, from day to night) and scene changes (for example, changes involving pedestrians and vehicles in the scene), and the performance of visual position recognition methods that rely on these features, such as the visual bag-of-words model (V-BOW), also degrades sharply. In recent years, with the rise of deep learning and its wide application to object recognition, object detection, object tracking, semantic segmentation and other fields, several deep-learning-based visual position recognition methods have been proposed. For example, convolutional neural network based place recognition extracts image features with a deep convolutional neural network; because such a network can be trained end to end for a specific task, the extracted features are more robust. As another example, NetVLAD (NetVLAD: CNN architecture for weakly supervised place recognition) builds on the conventional vector of locally aggregated descriptors (VLAD) method to aggregate the local features of an image into a compact image representation vector, making the features extracted by a deep neural network more robust.
Compared with traditional visual position recognition methods based on hand-designed image feature points, methods based on deep neural networks extract more robust image features and recognize positions more accurately. However, a deep neural network must be trained before use, and because of factors such as viewing angle and illumination, the feature distribution of the training images often differs greatly from that of the images actually to be recognized; in this case, the accuracy of visual position recognition cannot be guaranteed. In general, existing visual position recognition methods therefore lack robustness.
Disclosure of Invention
In view of the above defects of and improvement needs for the prior art, the invention provides a model training method, a domain-adaptive visual position recognition method and a domain-adaptive visual position recognition device, with the aim of improving the robustness of visual position recognition.
To achieve the above object, according to a first aspect of the present invention, there is provided an image feature extraction model training method, including:
(1) establishing an image feature extraction model based on a deep neural network, the model being used to obtain the feature vector of an image;
the image feature extraction model comprises a feature extraction network and a local feature aggregation network;
the feature extraction network comprises a plurality of cascaded first networks; each first network consists of one or more second networks followed by a max pooling layer, connected in sequence, and the max pooling layer performs feature selection on the output of the preceding second networks; each second network consists of a convolution layer, a batch normalization layer and an activation function layer connected in sequence, where the convolution layer extracts features from its input, the batch normalization layer applies zero-mean normalization to the output of the convolution layer, and the activation function layer applies the activation function to the output of the batch normalization layer;
the local feature aggregation network is used for aggregating all local features in the image output by the feature extraction network so as to obtain a feature vector of the image;
(2) obtaining, from a standard data set, a positive sample and s negative samples for each target image, so that each target image together with its positive and negative samples forms one training sample, all training samples forming the training set;
the positive sample of a target image is, among its neighboring images, the image closest to the target image in feature distance, where the position distance d between the target image and a neighboring image satisfies $T_{NL} \le d < T_{NH}$; the position distance d between the target image and each of its negative samples satisfies $d \ge T_F$;
(3) training the image feature extraction model with the training set to obtain the model parameters;
the position information of each image in the standard data set is known, and the target images are a plurality of images selected in advance from the standard data set; the feature distance between images is the distance between their feature vectors; $T_{NL}$, $T_{NH}$ and $T_F$ are preset thresholds satisfying $0 < T_{NL} < T_{NH} \le T_F$; and $s \ge 1$.
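The sample-construction rule of step (2) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: positions and feature vectors are plain coordinate tuples compared by Euclidean distance, and the s negatives are taken as the s far-away images closest to the target in feature space (hard negatives). The text only fixes the condition on the position distance of negatives, so this selection rule, the function name `build_training_sample` and the example data are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance, used here for both positions and feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_training_sample(
    target: int,
    positions: List[Vector],
    features: List[Vector],
    t_nl: float,
    t_nh: float,
    t_f: float,
    s: int,
) -> Tuple[int, int, List[int]]:
    """Return (target, positive, negatives) as indices into the data set."""
    assert 0 < t_nl < t_nh <= t_f and s >= 1
    neighbours, far = [], []
    for j in range(len(positions)):
        if j == target:
            continue
        d = euclidean(positions[target], positions[j])
        if t_nl <= d < t_nh:
            neighbours.append(j)  # candidate positives
        elif d >= t_f:
            far.append(j)  # candidate negatives
    if not neighbours or len(far) < s:
        raise ValueError("not enough candidates for this target image")

    def feature_distance(j: int) -> float:
        return euclidean(features[target], features[j])

    positive = min(neighbours, key=feature_distance)
    # Assumption: keep the s hardest negatives (closest in feature space).
    negatives = sorted(far, key=feature_distance)[:s]
    return target, positive, negatives


# Example with made-up positions (metres) and 2-D feature vectors
example_positions = [(0.0, 0.0), (2.0, 0.0), (5.0, 0.0), (30.0, 0.0), (40.0, 0.0), (50.0, 0.0)]
example_features = [(0.0, 0.0), (1.0, 0.0), (0.5, 0.0), (0.2, 0.0), (3.0, 0.0), (0.1, 0.0)]
example_sample = build_training_sample(0, example_positions, example_features, 1.0, 10.0, 25.0, 2)
```

Here image 2 becomes the positive (it lies within the neighbor band and is closer in feature space than image 1), and images 5 and 3 become the two negatives.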
In the image feature extraction model established by this training method, each convolution layer used for feature extraction is followed by a batch normalization layer (Batch Normalization) that applies zero-mean normalization to the output of the convolution layer. This accelerates model training and makes the distributions of the image features extracted by the model similar, which effectively avoids poor training caused by large differences in the feature distributions of the training images and addresses the low robustness of visual position recognition when image feature distributions differ greatly.
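The zero-mean normalization performed by each batch normalization layer can be sketched as follows. The sketch assumes feature maps stored as nested Python lists (`batch[b][c]` holds the flattened values of channel c for image b), statistics computed per channel over all images and spatial positions, and an optional learnable scale `gamma` and shift `beta`; these names and the example values are illustrative, not taken from the patent.

```python
import math
from typing import List, Optional, Tuple

# batch[b][c] holds the flattened spatial values of channel c for image b
Batch = List[List[List[float]]]


def channel_stats(batch: Batch) -> List[Tuple[float, float]]:
    """Per-channel mean and standard deviation over all images and positions."""
    stats = []
    for c in range(len(batch[0])):
        values = [v for image in batch for v in image[c]]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        stats.append((mean, math.sqrt(var)))
    return stats


def batch_norm(
    batch: Batch,
    stats: List[Tuple[float, float]],
    gamma: Optional[List[float]] = None,
    beta: Optional[List[float]] = None,
    eps: float = 1e-5,
) -> Batch:
    """Zero-mean, unit-variance standardization, then an optional affine map."""
    channels = len(stats)
    gamma = gamma if gamma is not None else [1.0] * channels
    beta = beta if beta is not None else [0.0] * channels
    return [
        [
            [gamma[c] * (v - stats[c][0]) / math.sqrt(stats[c][1] ** 2 + eps) + beta[c]
             for v in image[c]]
            for c in range(channels)
        ]
        for image in batch
    ]


# Example: two one-channel images with two positions each (made-up values)
example_batch = [[[1.0, 3.0]], [[5.0, 7.0]]]
example_stats = channel_stats(example_batch)
example_out = batch_norm(example_batch, example_stats)
```

During ordinary training the statistics come from the training images; the domain-adaptive method of the second aspect instead computes them from images of the target domain.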
Further, the local feature aggregation network comprises a dimensionality-reduction convolution layer, a soft-max layer, an aggregation layer, an intra-normalization layer and an overall normalization layer;
the dimensionality-reduction convolution layer is a convolution layer that reduces the number of channels of the image to be aggregated to the preset number of cluster centers, so that each channel of the image to be aggregated represents the weight of the difference between a local feature and the corresponding cluster center;
the soft-max layer normalizes the weights of the differences between each local feature and the cluster centers;
the aggregation layer aggregates the local features, the cluster centers and the normalized weights to obtain a VLAD (vector of locally aggregated descriptors) vector; the VLAD vector consists of N D-dimensional vectors, where N is the number of cluster centers and D is the dimension of each cluster center;
the intra-normalization layer normalizes each D-dimensional vector in the VLAD vector so that these vectors are on the same order of magnitude;
the overall normalization layer concatenates the intra-normalized D-dimensional vectors into one column vector and then normalizes that column vector, so that the local features of the image to be aggregated are on the same order of magnitude; this improves both the convergence speed and the accuracy of the network model;
wherein, the image to be aggregated is the image output by the feature extraction network.
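The layer sequence above can be summarized by the following formulas, given as a worked sketch. Here $z_{ij}$ (a symbol introduced only for this illustration) denotes the output of the j-th channel of the dimensionality-reduction convolution layer at the i-th local feature, $f_i$ is the i-th of the n local features, $c_j$ is the j-th of the N cluster centers, and both normalization layers are assumed to use the L2 norm:

$$a_{ij}=\frac{\exp(z_{ij})}{\sum_{j'=1}^{N}\exp(z_{ij'})},\qquad v_j=\sum_{i=1}^{n}a_{ij}\,(f_i-c_j),\qquad j=1,\dots,N$$

$$\hat v_j=\frac{v_j}{\lVert v_j\rVert_2},\qquad V=\frac{[\hat v_1^{\top},\dots,\hat v_N^{\top}]^{\top}}{\lVert[\hat v_1^{\top},\dots,\hat v_N^{\top}]^{\top}\rVert_2}\in\mathbb{R}^{ND}$$

When every $v_j$ is nonzero, each $\hat v_j$ has unit norm, so the concatenated vector has norm $\sqrt{N}$ and the final step simply divides by $\sqrt{N}$.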
Further, s > 1; selecting multiple negative samples improves the training precision of the model, so that the image feature vectors obtained by the image feature extraction model are more robust when used for visual position recognition.
Further, in step (3), the loss function used when training the image feature extraction model with the training set is:
$$L=\sum_{k=1}^{n}\max\Big(0,\; d(q_k,p_k)+m-\min_{1\le i\le s} d(q_k,n_{ki})\Big)$$
where n is the total number of training samples, k is the index of a training sample, i is the index of a negative sample, $q_k$, $p_k$ and $n_{ki}$ denote respectively the target image, the positive sample and the i-th negative sample of the k-th training sample, $d(q_k,p_k)$ is the feature distance between the target image $q_k$ and its positive sample $p_k$, $d(q_k,n_{ki})$ is the feature distance between the target image $q_k$ and its negative sample $n_{ki}$, m is a predefined hyper-parameter (the margin), max takes the maximum and min takes the minimum;
the loss function is based on the idea of the triplet loss: training minimizes the feature distance between a target image and its positive sample and maximizes the feature distance between the target image and its negative samples. Taking the minimum of the feature distances between the target image and its negative samples selects the negative sample with the largest loss, so that, following the idea of hard example mining, training focuses on negative samples that are hard to distinguish; this avoids interference from negative samples similar to the image to be recognized when the image feature extraction model is used for visual position recognition.
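A minimal Python sketch of this hard-negative triplet loss follows. It assumes the loss is the hinge $\max(0,\ d(q_k,p_k) + m - \min_i d(q_k,n_{ki}))$ summed over training samples and that the feature distance is Euclidean; whether the patent sums or averages over samples, and whether it uses squared distances, is not stated in this text, so both choices are assumptions, as is the function name.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def feature_distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between feature vectors (the metric is an assumption)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hard_negative_triplet_loss(
    queries: List[Vector],
    positives: List[Vector],
    negatives: List[List[Vector]],
    margin: float,
) -> float:
    """Sum over samples of max(0, d(q, p) + m - min_i d(q, n_i))."""
    total = 0.0
    for q, p, negs in zip(queries, positives, negatives):
        d_pos = feature_distance(q, p)
        d_neg = min(feature_distance(q, n) for n in negs)  # hardest negative only
        total += max(0.0, d_pos + margin - d_neg)
    return total
```

Only the hardest negative of each sample contributes; negatives that are already farther than $d(q_k,p_k)+m$ produce zero loss and therefore zero gradient.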
According to a second aspect of the present invention, there is also provided a domain-adaptive visual position recognition method based on the image feature extraction model training method provided in the first aspect of the present invention, including:
determining the target domain to which an image to be recognized belongs, acquiring a plurality of images at different positions in the target domain, and taking the acquired images together with the image to be recognized as the images to be retrieved;
taking the images to be retrieved as input, obtaining the feature vector of each image to be retrieved with the image feature extraction model; when obtaining these feature vectors, for each convolution layer, the mean and standard deviation of the feature maps obtained after all images to be retrieved pass through that convolution layer are computed and used as the parameters of the batch normalization layer following that convolution layer; the remaining model parameters of the image feature extraction model are the trained model parameters;
obtaining the feature vector of each image in the test data set with the image feature extraction model;
obtaining, according to the obtained feature vectors, the image in the test data set closest in feature distance to the image to be recognized, and taking its position information as the position information of the image to be recognized, thereby completing the visual position recognition of the image to be recognized;
the position information of each image in the test data set is known, and a domain is the set of factors affecting the feature distribution of images;
in practice, domains can be defined according to how factors such as illumination, viewing angle and season affect the feature distribution of images, and images in the same domain have similar feature distributions; for example, if only illumination has a large influence on the feature distribution, so that images taken in the daytime have similar feature distributions and images taken at night have similar feature distributions, two domains can be defined according to the lighting condition;
in this domain-adaptive visual position recognition method, when the image feature extraction model is used to obtain the feature vector of the image to be recognized, the parameters of each batch normalization layer do not depend on the training set; instead, they are computed from a plurality of images belonging to the same domain as the image to be recognized. Because images in the same domain have similar feature distributions, the method achieves domain adaptation and can still complete visual position recognition accurately when the feature distribution of the training images differs greatly from that of the image to be recognized; that is, it improves the robustness of visual position recognition.
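The re-estimation of batch normalization parameters from target-domain images can be sketched as follows. In this sketch the statistics are gathered layer by layer in forward order, since the input of each convolution layer depends on the already re-normalized output of the layers before it. The sketch assumes nested-list feature maps, omits max pooling and the trained scale and shift parameters, and uses a hypothetical `toy_conv` in place of trained convolution layers; the image values are made up.

```python
import math
from typing import Callable, List, Tuple

Maps = List[List[List[float]]]     # maps[b][c] = flattened values of channel c, image b
Stats = List[Tuple[float, float]]  # per-channel (mean, standard deviation)


def channel_stats(maps: Maps) -> Stats:
    """Mean and standard deviation of each channel over all images and positions."""
    stats = []
    for c in range(len(maps[0])):
        vals = [v for img in maps for v in img[c]]
        mean = sum(vals) / len(vals)
        std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
        stats.append((mean, std))
    return stats


def bn_relu(maps: Maps, stats: Stats, eps: float = 1e-5) -> Maps:
    """Batch normalization with the given statistics, then ReLU (scale/shift omitted)."""
    return [
        [[max(0.0, (v - mean) / math.sqrt(std * std + eps)) for v in img[c]]
         for c, (mean, std) in enumerate(stats)]
        for img in maps
    ]


def adapt_bn_stats(convs: List[Callable[[Maps], Maps]], domain_images: Maps) -> List[Stats]:
    """Re-estimate BN statistics layer by layer from images of the target domain."""
    x, per_layer = domain_images, []
    for conv in convs:
        x = conv(x)
        stats = channel_stats(x)  # target-domain statistics, not training-set ones
        per_layer.append(stats)
        x = bn_relu(x, stats)
    return per_layer


def toy_conv(scale: float) -> Callable[[Maps], Maps]:
    """Hypothetical stand-in for a trained convolution layer."""
    return lambda maps: [[[scale * v for v in ch] for ch in img] for img in maps]


# Two made-up "night" images, one channel, two spatial positions each
night_images = [[[10.0, 12.0]], [[14.0, 16.0]]]
night_stats = adapt_bn_stats([toy_conv(1.0), toy_conv(2.0)], night_images)
```

The convolution weights stay fixed; only the per-layer means and standard deviations are replaced, which matches the statement that the remaining model parameters are the trained ones.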
Further, when the image feature extraction model is used to obtain the feature vector of each image in the test data set, the model parameters are set in one of the following ways:
the model parameters are set to the trained model parameters;
or, for each convolution layer, the mean and standard deviation of the feature maps obtained after all images in the test data set pass through that convolution layer are computed and used as the parameters of the batch normalization layer following that convolution layer, and the remaining model parameters of the image feature extraction model are the trained model parameters.
According to a third aspect of the present invention, there is provided an image feature extraction model training apparatus, comprising a model establishing module, a training set construction module and a model training module;
the model establishing module is used for establishing an image feature extraction model based on a deep neural network, and the image feature extraction model is used for acquiring the feature vector of an image;
the training set construction module is used for obtaining, from a standard data set, a positive sample and s negative samples for each target image, so that each target image together with its positive and negative samples forms one training sample, all training samples forming the training set;
the model training module is used for training the image feature extraction model with the training set to obtain the model parameters;
the image feature extraction model comprises a feature extraction network and a local feature aggregation network;
the feature extraction network comprises a plurality of cascaded first networks; each first network consists of one or more second networks followed by a max pooling layer, connected in sequence, and the max pooling layer performs feature selection on the output of the preceding second networks; each second network consists of a convolution layer, a batch normalization layer and an activation function layer connected in sequence, where the convolution layer extracts features from its input, the batch normalization layer applies zero-mean normalization to the output of the convolution layer, and the activation function layer applies the activation function to the output of the batch normalization layer;
the local feature aggregation network is used for aggregating all local features in the image output by the feature extraction network so as to obtain a feature vector of the image;
the positive sample of a target image is, among its neighboring images, the image closest to the target image in feature distance, where the position distance d between the target image and a neighboring image satisfies $T_{NL} \le d < T_{NH}$; the position distance d between the target image and each of its negative samples satisfies $d \ge T_F$;
the position information of each image in the standard data set is known, and the target images are a plurality of images selected in advance from the standard data set; the feature distance between images is the distance between their feature vectors; $T_{NL}$, $T_{NH}$ and $T_F$ are preset thresholds satisfying $0 < T_{NL} < T_{NH} \le T_F$; and $s \ge 1$.
According to a fourth aspect of the present invention, there is further provided a domain-adaptive visual position recognition apparatus based on the image feature extraction model training method provided in the first aspect of the present invention, comprising a retrieval set acquisition module, a first feature extraction module, a second feature extraction module and an identification module;
the retrieval set acquisition module is used for determining a target domain to which the image to be identified belongs, acquiring a plurality of images at different positions in the target domain, and taking the acquired image and the image to be identified as the image to be retrieved;
the first feature extraction module is used for taking the images to be retrieved as input and obtaining their feature vectors with the image feature extraction model; when obtaining these feature vectors, for each convolution layer, the mean and standard deviation of the feature maps obtained after all images to be retrieved pass through that convolution layer are computed and used as the parameters of the batch normalization layer following that convolution layer; the remaining model parameters of the image feature extraction model are the trained model parameters;
the second feature extraction module is used for obtaining the feature vector of each image in the test data set with the image feature extraction model;
the identification module is used for acquiring an image which is closest to the characteristic distance of the image to be identified in the test data set according to the characteristic vectors extracted by the first characteristic extraction module and the second characteristic extraction module, and determining the position information of the image as the position information of the image to be identified, so that the visual position identification of the image to be identified is completed;
the position information of each image in the test data set is known, and the domain is a factor set influencing the image characteristic distribution.
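The final retrieval step, shared by the recognition method and the identification module, amounts to a nearest-neighbor search over the feature vectors of the test data set. The sketch below assumes Euclidean feature distance and a brute-force search; the function name `nearest_position` and the placeholder position labels are illustrative. Because the feature vectors are L2-normalized, ranking by Euclidean distance is equivalent to ranking by cosine similarity.

```python
import math
from typing import List, Sequence, Tuple, TypeVar

Position = TypeVar("Position")


def nearest_position(
    query: Sequence[float],
    db_features: List[Sequence[float]],
    db_positions: List[Position],
) -> Tuple[int, Position, float]:
    """Brute-force nearest neighbor: (index, known position, feature distance)."""
    if not db_features or len(db_features) != len(db_positions):
        raise ValueError("need one known position per database feature vector")
    distances = [math.dist(query, f) for f in db_features]
    best = min(range(len(distances)), key=distances.__getitem__)
    return best, db_positions[best], distances[best]
```

For large test data sets, an approximate nearest-neighbor index would typically replace the linear scan; the result is the same position assignment rule.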
In general, the above technical solutions conceived by the present invention achieve the following beneficial effects:
(1) In the image feature extraction model established by the training method provided by the invention, each convolution layer used for feature extraction is followed by a batch normalization layer that applies zero-mean normalization to its output. This accelerates model training and makes the distributions of the extracted image features similar, which effectively avoids poor training results caused by large differences in the feature distributions of the training images and further alleviates the low robustness of visual position recognition when image feature distributions differ greatly.
(2) In the preferred scheme of the image feature extraction model training method provided by the invention, the training sample is constructed by selecting a plurality of negative samples, so that the training precision of the model can be improved, and the image feature vector acquired by the image feature extraction model has higher robustness when being used for visual position identification.
(3) In the preferred scheme of the image feature extraction model training method provided by the invention, the loss function is built on the ideas of the triplet loss and hard example mining, so that training pays more attention to negative samples that are hard to distinguish, and interference from negative samples similar to the image to be recognized is avoided when the image feature extraction model is used for visual position recognition.
(4) In the domain-adaptive visual position recognition method provided by the invention, when the image feature extraction model is used to obtain the feature vector of the image to be recognized, the parameters of each batch normalization layer do not depend on the training set but are computed from a plurality of images belonging to the same domain as the image to be recognized; this achieves domain adaptation and improves the robustness of visual position recognition.
Drawings
FIG. 1 is a schematic diagram of an image feature extraction model according to an embodiment of the present invention;
FIG. 2 is a flow chart of a domain-adaptive visual location identification method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a domain-adaptive visual location identification method according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The invention provides an image feature extraction model training method, which comprises the following steps:
(1) establishing an image feature extraction model based on a deep neural network, the model being used to obtain the feature vector of an image;
as shown in fig. 1, the image feature extraction model includes a feature extraction network and a local feature aggregation network;
the feature extraction network comprises a plurality of cascaded first networks; each first network consists of one or more second networks followed by a max pooling layer (Pool), connected in sequence, and the max pooling layer performs feature selection on the output of the preceding second network; each second network consists of a convolution layer (Conv), a batch normalization layer (BN) and an activation function layer (Relu) connected in sequence, where the convolution layer extracts features from its input, the batch normalization layer applies zero-mean normalization to the output of the convolution layer, and the activation function layer applies the activation function to the output of the batch normalization layer; in this embodiment, the convolution kernel of the convolution layer in each second network is 3 × 3; different first networks may contain the same or different numbers of second networks;
the local feature aggregation network is used for aggregating all local features in the image output by the feature extraction network so as to obtain a feature vector of the image;
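To illustrate the cascade just described, the following is a minimal single-channel Python sketch of a second network (3 × 3 convolution, batch normalization, ReLU) and of first networks built from it. Zero padding of 1, 2 × 2 max pooling with stride 2, computing batch normalization statistics over a single map, and the identity kernels in the example are all assumptions made for illustration; a real implementation would use multi-channel tensors in a deep learning framework.

```python
import math
from typing import List

Image = List[List[float]]  # single-channel H x W map (toy simplification)


def conv3x3(x: Image, kernel: Image, bias: float = 0.0) -> Image:
    """3x3 convolution (cross-correlation, as in CNN layers), zero padding 1, stride 1."""
    h, w = len(x), len(x[0])
    out = [[bias] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w:
                        out[r][c] += kernel[dr + 1][dc + 1] * x[rr][cc]
    return out


def batch_norm(x: Image, eps: float = 1e-5) -> Image:
    """Zero-mean, unit-variance standardization over all positions of the map."""
    vals = [v for row in x for v in row]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    return [[(v - mean) / math.sqrt(var + eps) for v in row] for row in x]


def relu(x: Image) -> Image:
    return [[max(0.0, v) for v in row] for row in x]


def max_pool2x2(x: Image) -> Image:
    """2x2 max pooling with stride 2 (an odd trailing row or column is dropped)."""
    return [
        [max(x[r][c], x[r][c + 1], x[r + 1][c], x[r + 1][c + 1])
         for c in range(0, len(x[0]) - 1, 2)]
        for r in range(0, len(x) - 1, 2)
    ]


def second_network(x: Image, kernel: Image) -> Image:
    """Conv -> BN -> ReLU, connected in sequence."""
    return relu(batch_norm(conv3x3(x, kernel)))


def first_network(x: Image, kernels: List[Image]) -> Image:
    """One or more second networks followed by a max pooling layer."""
    for kernel in kernels:
        x = second_network(x, kernel)
    return max_pool2x2(x)


def feature_extraction_network(x: Image, blocks: List[List[Image]]) -> Image:
    """Cascade of first networks; blocks[k] lists the kernels of the k-th one."""
    for kernels in blocks:
        x = first_network(x, kernels)
    return x


# Example: an 8x8 ramp through two first networks (two and one second networks)
identity = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
ramp = [[float(8 * r + c) for c in range(8)] for r in range(8)]
example_features = feature_extraction_network(ramp, [[identity, identity], [identity]])
```

Under these assumptions each first network halves the spatial resolution, so the 8 × 8 input becomes a 2 × 2 map after two first networks.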
in an alternative embodiment, as shown in FIG. 1, the local feature aggregation network includes a dimensionality-reduction convolution layer (Conv), a soft-max layer (Soft-max), an aggregation layer (VLAD), an intra-normalization layer (Intra-normalization) and an overall normalization layer (L2-normalization);
the dimensionality-reduction convolution layer is a convolution layer with a 1 × 1 kernel, used to reduce the number of channels of the image to be aggregated to the preset number of cluster centers, so that each channel of the image to be aggregated represents the weight of the difference between a local feature and the corresponding cluster center; the image to be aggregated is the image output by the feature extraction network;
the soft-max layer is used for normalizing the weight of the difference between the local feature and each cluster center;
the aggregation layer is used for aggregating according to the local features, the clustering center and the weight after normalization to obtain a VLAD vector; the VLAD vector consists of vectors of N D dimensions, wherein N is the number of the clustering centers, and D is the dimension of the clustering centers;
let the N cluster centers be denoted $\mathrm{CluCenter}=[c_1,c_2,\dots,c_j,\dots,c_N]$, where each cluster center has dimension D and $c_j$ ($j\in\{1,2,\dots,N\}$) denotes the j-th cluster center;
each image output by the feature extraction network has n local features, denoted $\mathrm{Features}=[f_1,f_2,\dots,f_i,\dots,f_n]$, where $f_i$ ($i\in\{1,2,\dots,n\}$) denotes the i-th local feature;
with $a_{ij}$ denoting the weight of the difference between the i-th local feature and the j-th cluster center, the j-th D-dimensional vector $\mathrm{VLADvector}_j$ of the VLAD vector (i.e., the j-th element of the VLAD vector) is:
$$\mathrm{VLADvector}_j=\sum_{i=1}^{n}a_{ij}\,(f_i-c_j)$$
the intra-normalization layer normalizes each D-dimensional vector of the VLAD vector separately, so that all D-dimensional vectors are distributed on the same order of magnitude;
the overall normalization layer concatenates the D-dimensional vectors output by the intra-normalization layer into a single column vector and then normalizes that column vector, so that the local features of the image to be aggregated are distributed on the same order of magnitude; this improves both the convergence speed and the accuracy of the neural network model;
in this embodiment, both the intra-normalization layer and the overall normalization layer use L2-norm normalization;
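The aggregation network described above can be sketched in plain Python as follows. This is a minimal, framework-free illustration under stated assumptions, not the patent's implementation: local features and cluster centers are lists of floats, the 1 × 1 dimension-reducing convolution is modelled as a per-feature linear map (hypothetical arguments `conv_w`, `conv_b`), and the aggregation uses the NetVLAD-style sum $\sum_i a_{ij}(f_i - c_j)$. All function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable soft-max over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit L2 norm (an all-zero vector stays all zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / max(norm, eps) for x in vec]


def aggregate(features, centers, conv_w, conv_b):
    """NetVLAD-style aggregation of n local features (each D floats).

    centers: N cluster centers (each D floats); conv_w (N rows of D floats) and
    conv_b (N floats) play the role of the 1x1 dimension-reducing convolution.
    """
    n_centers, dim = len(centers), len(centers[0])
    vlad = [[0.0] * dim for _ in range(n_centers)]
    for f in features:
        # 1x1 conv: maps the D channels of one local feature to N scores
        scores = [sum(w * x for w, x in zip(conv_w[j], f)) + conv_b[j]
                  for j in range(n_centers)]
        a = softmax(scores)  # soft-max layer: weights a_ij sum to 1 over j
        for j in range(n_centers):
            for d in range(dim):
                # aggregation layer: VLADvector_j += a_ij * (f_i - c_j)
                vlad[j][d] += a[j] * (f[d] - centers[j][d])
    vlad = [l2_normalize(block) for block in vlad]  # intra-normalization
    flat = [x for block in vlad for x in block]     # concatenate the N blocks
    return l2_normalize(flat)                       # overall L2 normalization


feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
cents = [[0.0, 0.0], [1.0, 1.0]]
v = aggregate(feats, cents, conv_w=[[1.0, 0.0], [0.0, 1.0]], conv_b=[0.0, 0.0])
print(len(v))  # 4 = N * D
```

In a trained model the centers, `conv_w` and `conv_b` would be learned; they are fixed here only to make the example concrete. After intra-normalization every non-zero block has unit length, so each block contributes equally to the final vector.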
(2) in the standard data set, obtain a positive sample and s negative samples for each target image; one target image together with its positive and negative samples forms one training sample, and all training samples together form the training set;
the position of each image in the standard data set is known, and the target images are a number of images selected in advance from the standard data set;
the positive sample of a target image is, among its neighbouring images, the one closest to it in feature distance, where a neighbouring image is one whose positional distance d from the target image satisfies $T_{NL} \le d < T_{NH}$; the positional distance between a target image and each of its negative samples satisfies $d \ge T_F$; here $T_{NL}$, $T_{NH}$ and $T_F$ are preset thresholds with $0 < T_{NL} < T_{NH}$ and $T_{NH} \le T_F$, and $s \ge 1$; the feature distance between two images is the distance between their feature vectors;
in this embodiment, the standard data set used for model training is the Tokyo Time Machine Google Street View data set; it contains images taken at many different locations, each location photographed in 12 directions, about 47,000 images in total, each tagged with geographic coordinates; 10,000 images are randomly selected from it as target images, so the total number of training samples is n = 10000; in other applications, other data sets can be chosen as the standard data set according to actual requirements;
the thresholds $T_{NL}$, $T_{NH}$ and $T_F$ can be set according to the standard data set used and the actual application scenario; in general, $T_{NH} \le 25$ and $T_F \ge 25$; in this embodiment they are set to $T_{NL} = 1$, $T_{NH} = 10$ and $T_F = 25$; bounding the distance between a target image and its positive sample from below by $T_{NL}$ and from above by $T_{NH}$ ensures that the positive sample is similar to, but not identical with, the target image, which avoids overfitting and gives a better training result;
in this embodiment, each training sample contains s = 4 negative samples; using several negative samples improves training precision, so that the image feature vectors produced by the image feature extraction model are more robust when used for visual position recognition;
the training set TrainSet constructed in this embodiment can be expressed as:
$$\mathrm{TrainSet} = \{S_1, S_2, \ldots, S_n\}, \qquad S_k = (q_k, p_k, n_{k1}, n_{k2}, n_{k3}, n_{k4})$$
where, for the k-th training sample $S_k$, $q_k$, $p_k$ and $n_{ki}$ ($i \in \{1, 2, 3, 4\}$) denote the target image, its positive sample and its i-th negative sample, respectively;
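Step (2) can be illustrated with the following sketch of how one training sample might be assembled. It assumes planar positions expressed in the same unit as the thresholds (the text does not state the unit), uses the embodiment's values $T_{NL} = 1$, $T_{NH} = 10$, $T_F = 25$ and $s = 4$ as defaults, and picks the s negatives at random from all sufficiently distant images; random selection is an assumption, since the text only constrains the negatives' distance. `build_sample` and its parameters are hypothetical names.

```python
import math
import random


def dist(a, b):
    """Euclidean distance, used here for both positions and feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_sample(q, positions, features, t_nl=1.0, t_nh=10.0, t_f=25.0, s=4, rng=None):
    """Return (q, p, [n_1, ..., n_s]) as image indices, or None if image q has
    no neighbouring image or fewer than s sufficiently distant images."""
    rng = rng or random.Random(0)
    others = [i for i in range(len(positions)) if i != q]
    # neighbouring images: positional distance in [T_NL, T_NH)
    neighbours = [i for i in others if t_nl <= dist(positions[q], positions[i]) < t_nh]
    # negative candidates: positional distance >= T_F
    far = [i for i in others if dist(positions[q], positions[i]) >= t_f]
    if not neighbours or len(far) < s:
        return None
    # positive sample: the neighbour closest to q in feature space
    p = min(neighbours, key=lambda i: dist(features[q], features[i]))
    # ASSUMPTION: the s negatives are drawn at random among the distant images
    negatives = rng.sample(far, s)
    return q, p, negatives


positions = [(0, 0), (5, 0), (3, 0), (30, 0), (40, 0), (0, 50), (0, 60), (100, 100)]
features = [[0, 0], [1, 0], [5, 5], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0]]
sample = build_sample(0, positions, features)
```

Image 2 is geographically closer to image 0 than image 1 is, but image 1 becomes the positive because the selection rule compares feature distance, not positional distance, among the neighbours.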
(3) train the image feature extraction model with the training set to obtain the model parameters.
In the image feature extraction model built by this training method, every convolutional layer used for feature extraction is followed by a batch normalization layer (Batch Normalization) that applies zero-mean normalization to the convolution output. This speeds up training and gives the extracted image features similar distributions, which prevents the poor training results that arise when the feature distributions of the training images differ widely, and addresses the low robustness of visual position recognition when image feature distributions differ greatly.
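The structure of one first network (here a single second network followed by max pooling) can be sketched as below, using one input channel and plain Python lists so that the example runs without a deep-learning framework. The 3 × 3 kernel matches the embodiment; the zero padding, the 2 × 2 stride-2 pooling window and the fixed `gamma`/`beta` are assumptions, and all names are hypothetical. `batch_norm` shows the zero-mean normalization computed over the whole batch.

```python
import math


def conv3x3(img, kernel):
    """Single-channel 3x3 convolution with zero padding, so the output keeps the input size."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w:
                        out[r][c] += kernel[dr + 1][dc + 1] * img[rr][cc]
    return out


def batch_norm(maps, gamma=1.0, beta=0.0, eps=1e-5):
    """Zero-mean, unit-variance normalization of one channel over a whole batch of maps."""
    values = [v for m in maps for row in m for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    scale = gamma / math.sqrt(var + eps)
    return [[[(v - mean) * scale + beta for v in row] for row in m] for m in maps]


def relu(m):
    return [[max(0.0, v) for v in row] for row in m]


def max_pool2x2(m):
    """2x2 max pooling with stride 2: keeps the strongest response in each window."""
    return [[max(m[r][c], m[r][c + 1], m[r + 1][c], m[r + 1][c + 1])
             for c in range(0, len(m[0]) - 1, 2)]
            for r in range(0, len(m) - 1, 2)]


def first_network(batch, kernel):
    """One 'second network' (Conv -> BN -> ReLU) followed by a max-pooling layer."""
    conv = [conv3x3(img, kernel) for img in batch]
    normed = batch_norm(conv)
    return [max_pool2x2(relu(m)) for m in normed]


batch = [[[float(4 * r + c + 1) for c in range(4)] for r in range(4)],
         [[0.0] * 4 for _ in range(4)]]
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
out = first_network(batch, identity)  # two 2x2 pooled maps
```

A practical model would use multi-channel tensors in a framework such as PyTorch, where the same stack corresponds to `Conv2d`, `BatchNorm2d`, `ReLU` and `MaxPool2d`.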
To further improve the robustness of visual position recognition, in step (3) of the image feature extraction model training method the model is trained on the training set with the following loss function:
$$L = \sum_{k=1}^{n} \max\Big( d(q_k, p_k) + m - \min_{i} d(q_k, n_{ki}),\; 0 \Big)$$
where $d(q_k, p_k)$ denotes the feature distance between the target image $q_k$ and its positive sample $p_k$, $d(q_k, n_{ki})$ denotes the feature distance between $q_k$ and its negative sample $n_{ki}$, m is a predefined hyper-parameter (the margin), max takes the maximum and min takes the minimum;
this loss function follows the idea of the triplet loss: training minimizes the feature distance between a target image and its positive sample and maximizes the feature distance between the target image and its negative samples; in addition, for each training sample only the negative sample with the largest loss, i.e. the one closest to the target image in feature space, is used, so that, following the idea of hard example mining, the model pays more attention to hard-to-distinguish negative samples during training; this avoids interference from negative samples that resemble the image to be recognized when the image feature extraction model is used for visual position recognition.
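The hardest-negative loss described above can be sketched as follows. The sketch assumes the loss form $\sum_k \max\big(d(q_k, p_k) + m - \min_i d(q_k, n_{ki}),\ 0\big)$ and a Euclidean feature distance; both are assumptions, since the description only speaks of a "feature distance" (it could, for example, be squared). Inputs are feature vectors rather than images, and the function names are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean feature distance (ASSUMPTION: the text only says 'feature distance')."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hardest_negative_loss(samples, margin):
    """samples: list of (q, p, negatives) feature vectors, negatives being s vectors.

    Per sample: max(d(q, p) + m - min_i d(q, n_i), 0), so only the negative closest
    to q, i.e. the one with the largest loss, contributes (hard example mining).
    """
    total = 0.0
    for q, p, negatives in samples:
        d_pos = dist(q, p)
        d_neg = min(dist(q, n) for n in negatives)
        total += max(d_pos + margin - d_neg, 0.0)
    return total


samples = [
    ([0.0, 0.0], [1.0, 0.0], [[3.0, 0.0], [0.0, 1.5]]),  # hardest negative at distance 1.5
    ([0.0, 0.0], [0.0, 0.5], [[5.0, 0.0]]),              # already separated by the margin
]
loss = hardest_negative_loss(samples, margin=1.0)  # 0.5 + 0.0
```

Taking `min` over the negative distances before the hinge is equivalent to taking the maximum of the per-negative hinge losses, which is why only the hardest negative receives a gradient.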
The invention also provides a domain-adaptive visual position recognition method based on the above image feature extraction model training method which, as shown in Fig. 2, comprises the following steps:
determining the target domain to which the image to be recognized belongs, obtaining a number of images taken at different positions in the target domain, and using the obtained images together with the image to be recognized as the images to be retrieved;
taking the images to be retrieved as input, obtaining the feature vector of each image to be retrieved with the image feature extraction model; when computing these feature vectors, for each convolutional layer, the mean and standard deviation of the feature maps produced by that layer over all images to be retrieved are computed and used as the parameters of the batch normalization layer that follows it; all other parameters of the image feature extraction model are the trained model parameters;
obtaining the feature vector of each image in the test data set with the image feature extraction model; in this embodiment, the test data set for visual position recognition is the Tokyo 24/7 data set, in which every image carries geographic coordinates;
finding, according to the obtained feature vectors, the image in the test data set whose feature distance to the image to be recognized is smallest, and taking its position as the position of the image to be recognized, which completes the visual position recognition of the image to be recognized;
the position of each image in the test data set is known, and a domain is a set of factors that affect the distribution of image features;
in practice, domains can be defined by how factors such as illumination, viewing angle and season affect the image feature distribution, with images in the same domain having similar feature distributions; for example, if only illumination strongly affects the feature distribution, so that images taken in the daytime share one similar distribution and images taken at night share another, two domains can be defined according to lighting conditions; which factors define a domain, and how similar the feature distributions within a domain must be, can be decided according to actual requirements, as long as the final visual position recognition is sufficiently accurate;
In this domain-adaptive visual position recognition method, when the image feature extraction model computes the feature vector of the image to be recognized, the parameters of each batch normalization layer do not depend on the training set; instead, they are computed from a number of images belonging to the same domain as the image to be recognized. Because images in the same domain have similar feature distributions, the method achieves domain adaptation: it can still recognize the visual position accurately when the feature distribution of the training images differs greatly from that of the image to be recognized, which improves the robustness of visual position recognition.
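The domain-adaptive step, replacing each batch normalization layer's statistics with the mean and standard deviation of the feature maps computed over the target-domain images, can be sketched for a single layer as below. It is a framework-free illustration: `feature_maps` holds that layer's convolution outputs for all images to be retrieved, and `gamma`/`beta` keep their trained values, as the text specifies for all other parameters. The `eps` term and the function names are assumptions. Re-estimating normalization statistics on target data in this way is closely related to the technique often called AdaBN.

```python
import math


def channel_stats(feature_maps):
    """Per-channel (mean, std) of one conv layer's output over a set of images.

    feature_maps[img][ch] is a 2-D map (list of rows) for image img and channel ch.
    """
    stats = []
    for ch in range(len(feature_maps[0])):
        values = [v for img in feature_maps for row in img[ch] for v in row]
        mean = sum(values) / len(values)
        std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
        stats.append((mean, std))
    return stats


def batch_norm_with(img_maps, stats, gamma, beta, eps=1e-5):
    """Normalize one image's maps with externally supplied (domain) statistics;
    gamma and beta stay at their trained values."""
    return [[[gamma[ch] * (v - mean) / math.sqrt(std ** 2 + eps) + beta[ch] for v in row]
             for row in img_maps[ch]]
            for ch, (mean, std) in enumerate(stats)]


# two target-domain images, one channel, each a 1x2 map
domain_maps = [[[[10.0, 12.0]]], [[[14.0, 16.0]]]]
stats = channel_stats(domain_maps)  # mean 13, std sqrt(5)
normed = [batch_norm_with(m, stats, gamma=[1.0], beta=[0.0]) for m in domain_maps]
```

Because the statistics come from the target domain, the normalized activations of that domain are centred at zero even if the training domain's statistics were very different.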
In the above visual position recognition method, because the images in the test data set Tokyo 24/7 and those in the training data set Tokyo Time Machine have similar feature distributions, this embodiment directly uses the model parameters obtained by the above training method when computing the feature vector of each test image with the image feature extraction model;
in other application scenarios, to minimize dependence on the training set, the model parameters used when computing the feature vectors of the test images can instead be set as follows: for each convolutional layer, compute the mean and standard deviation of the feature maps produced by that layer over all images in the test data set and use them as the parameters of the batch normalization layer that follows it; all other parameters of the image feature extraction model are the trained model parameters.
Fig. 3 shows an example of visual position recognition, where the training set images come from the standard data set used for model training, the query image is the image to be retrieved, and the gallery images are the test-set database images.
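The final matching step (a query image against gallery images with known positions) reduces to a nearest-neighbour search over feature vectors. A minimal sketch, assuming Euclidean feature distance and a brute-force scan; `recognize_position` and the `site_*` labels are hypothetical:

```python
import math


def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize_position(query_vec, gallery):
    """gallery: list of (feature_vector, position) pairs for test-set images with known
    positions. Returns the position of the gallery image nearest to the query in feature space."""
    _, best_pos = min(gallery, key=lambda item: dist(query_vec, item[0]))
    return best_pos


gallery = [([1.0, 0.0], "site_A"), ([0.0, 1.0], "site_B"), ([0.6, 0.8], "site_C")]
print(recognize_position([0.1, 0.995], gallery))  # site_B
```

Because the overall normalization layer produces unit-length vectors, the Euclidean nearest neighbour is also the neighbour with the highest cosine similarity, since $\|a - b\|^2 = 2 - 2\,a \cdot b$ for unit vectors. For large galleries, an approximate nearest-neighbour index would typically replace the linear scan.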
The invention also provides an image feature extraction model training device for implementing the above image feature extraction model training method, comprising a model building module, a training set construction module and a model training module;
the model establishing module is used for establishing an image feature extraction model based on a deep neural network, and the image feature extraction model is used for acquiring a feature vector of an image;
the training set construction module is used for obtaining a positive sample and s negative samples of each target image in the standard data set so as to form a training sample by one target image and the positive sample and the negative sample thereof, thereby obtaining a training set formed by all the training samples;
the model training module is used for training the image feature extraction model by utilizing a training set so as to obtain each model parameter;
the image feature extraction model comprises a feature extraction network and a local feature aggregation network;
the feature extraction network comprises a plurality of cascaded first networks; each first network consists of one or more second networks followed by a max-pooling layer, which performs feature selection on the feature maps output by the preceding second network; each second network consists of a convolutional layer, a batch normalization layer and an activation function layer connected in sequence, where the convolutional layer extracts features from its input, the batch normalization layer applies zero-mean normalization to the feature maps output by the convolutional layer, and the activation function layer applies the activation function to the output of the batch normalization layer;
the local feature aggregation network is used for aggregating all local features in the image output by the feature extraction network so as to obtain a feature vector of the image;
the positive sample of a target image is, among its neighbouring images, the one closest to it in feature distance, where the positional distance d between the target image and a neighbouring image satisfies $T_{NL} \le d < T_{NH}$; the positional distance d between the target image and each of its negative samples satisfies $d \ge T_F$;
The position of each image in the standard data set is known, and the target images are a number of images selected in advance from the standard data set; the feature distance between images is the distance between their feature vectors; $T_{NL}$, $T_{NH}$ and $T_F$ are preset thresholds with $0 < T_{NL} < T_{NH}$ and $T_{NH} \le T_F$; $s \ge 1$;
In this embodiment, the detailed implementation of each module may refer to the description of the method embodiment described above, and will not be repeated here.
The invention also provides a domain-adaptive visual position recognition device for implementing the above domain-adaptive visual position recognition method, comprising a retrieval set acquisition module, a first feature extraction module, a second feature extraction module and an identification module;
the retrieval set acquisition module is used for determining a target domain to which the image to be identified belongs, acquiring a plurality of images at different positions in the target domain, and taking the acquired image and the image to be identified as the image to be retrieved;
the first feature extraction module is used for taking the images to be retrieved as input and obtaining their feature vectors with the image feature extraction model; when computing these feature vectors, for each convolutional layer, the mean and standard deviation of the feature maps produced by that layer over all images to be retrieved are computed and used as the parameters of the batch normalization layer that follows it; all other parameters of the image feature extraction model are the trained model parameters;
the second feature extraction module is used for obtaining the feature vector of each image in the test data set with the image feature extraction model;
the identification module is used for finding, according to the feature vectors extracted by the first and second feature extraction modules, the image in the test data set whose feature distance to the image to be identified is smallest, and taking its position as the position of the image to be identified, which completes the visual position recognition of the image to be identified;
the position of each image in the test data set is known, and a domain is a set of factors that affect the distribution of image features;
in this embodiment, the detailed implementation of each module may refer to the description of the method embodiment described above, and will not be repeated here.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (8)

1. An image feature extraction model training method, characterized by comprising:
(1) building an image feature extraction model based on a deep neural network, for obtaining the feature vector of an image;
the image feature extraction model comprises a feature extraction network and a local feature aggregation network;
the feature extraction network comprises a plurality of cascaded first networks; each first network consists of one or more second networks and one max-pooling layer connected in sequence, the max-pooling layer being used for feature selection on the images output by the preceding second network; each second network comprises a convolutional layer, a batch normalization layer and an activation function layer connected in sequence, the convolutional layer being used for feature extraction from images, the batch normalization layer for zero-mean normalization of the images output by the convolutional layer, and the activation function layer for activation of the images output by the batch normalization layer;
the local feature aggregation network is used for aggregating all local features in the image output by the feature extraction network, so as to obtain the feature vector of the image;
(2) obtaining, in a standard data set, a positive sample and s negative samples of each target image, one target image together with its positive and negative samples forming one training sample, so as to obtain a training set consisting of all training samples;
the positive sample of a target image is, among its neighbouring images, the image closest to it in feature distance, the positional distance d between the target image and a neighbouring image satisfying $T_{NL} \le d < T_{NH}$; the positional distance between the target image and its negative samples satisfies $d \ge T_F$;
(3) training the image feature extraction model with the training set, so as to obtain the model parameters;
wherein the position information of each image in the standard data set is known, and the target images are a plurality of images selected in advance from the standard data set; the feature distance between images is the distance between the feature vectors of the images; $T_{NL}$, $T_{NH}$ and $T_F$ are preset thresholds, $0 < T_{NL} < T_{NH}$, $T_{NH} \le T_F$; $s \ge 1$.

2. The image feature extraction model training method according to claim 1, characterized in that the local feature aggregation network comprises: a dimension-reducing convolutional layer, a soft-max layer, an aggregation layer, an intra-normalization layer and an overall normalization layer; the dimension-reducing convolutional layer is a single convolutional layer used to reduce the dimension of the image to be aggregated to the preset number of cluster centers, so that each channel of the image to be aggregated represents the weight of the difference between a local feature and each cluster center;
the soft-max layer is used to normalize the weights of the differences between the local features and each cluster center;
the aggregation layer is used to aggregate the local features, the cluster centers and the normalized weights into a VLAD vector;
the VLAD vector consists of N D-dimensional vectors, N being the number of cluster centers and D the dimension of the cluster centers;
the intra-normalization layer is used to normalize each D-dimensional vector in the VLAD vector, so that the D-dimensional vectors are distributed on the same order of magnitude;
the overall normalization layer is used to concatenate the D-dimensional vectors processed by the intra-normalization layer into one column vector and then normalize that column vector, so that the local features of the image to be aggregated are distributed on the same order of magnitude;
wherein the image to be aggregated is the image output by the feature extraction network.

3. The image feature extraction model training method according to claim 1, characterized in that s > 1.

4. The image feature extraction model training method according to claim 3, characterized in that, in step (3), when the image feature extraction model is trained with the training set, the loss function used is:
$$L = \sum_{k=1}^{n} \max\Big( d(q_k, p_k) + m - \min_{i} d(q_k, n_{ki}),\; 0 \Big)$$
where n is the total number of training samples, k is the training sample index, i is the negative sample index, $q_k$, $p_k$ and $n_{ki}$ respectively denote the target image, the positive sample and the i-th negative sample in the k-th training sample, $d(q_k, p_k)$ denotes the feature distance between the target image $q_k$ and its positive sample $p_k$, $d(q_k, n_{ki})$ is the feature distance between the target image $q_k$ and its negative sample $n_{ki}$, m is a predefined hyper-parameter, max denotes taking the maximum and min denotes taking the minimum.

5. A domain-adaptive visual position recognition method based on the image feature extraction model training method according to any one of claims 1 to 4, characterized by comprising:
determining the target domain to which an image to be recognized belongs, obtaining a plurality of images at different positions in the target domain, and using the obtained images together with the image to be recognized as images to be retrieved;
taking the images to be retrieved as input, obtaining the feature vector of each image to be retrieved with the image feature extraction model; when obtaining the image feature vectors, for each convolutional layer, the mean and standard deviation of the feature maps obtained after all images to be retrieved pass through that convolutional layer are computed as the parameters of the batch normalization layer following that convolutional layer; the remaining model parameters of the image feature extraction model are the model parameters obtained by training;
obtaining the feature vector of each image in a test data set with the image feature extraction model;
according to the obtained feature vectors, obtaining the image in the test data set whose feature distance to the image to be recognized is smallest, and determining the position information of that image as the position information of the image to be recognized, thereby completing the visual position recognition of the image to be recognized;
wherein the position information of each image in the test data set is known, and a domain is a set of factors affecting the distribution of image features.

6. The domain-adaptive visual position recognition method according to claim 1, characterized in that, when the feature vector of each image in the test data set is obtained with the image feature extraction model, the model parameters are set as follows:
all model parameters are set to the model parameters obtained by training;
or, for each convolutional layer, the mean and standard deviation of the feature maps obtained after all images in the test data set pass through that convolutional layer are computed as the parameters of the batch normalization layer following that convolutional layer; the remaining model parameters of the image feature extraction model are the model parameters obtained by training.

7. An image feature extraction model training device, characterized by comprising: a model building module, a training set construction module and a model training module;
the model building module is used to build an image feature extraction model based on a deep neural network, the image feature extraction model being used to obtain the feature vector of an image;
the training set construction module is used to obtain, in a standard data set, a positive sample and s negative samples of each target image, one target image together with its positive and negative samples forming one training sample, so as to obtain a training set consisting of all training samples;
the model training module is used to train the image feature extraction model with the training set, so as to obtain the model parameters;
wherein the image feature extraction model comprises a feature extraction network and a local feature aggregation network;
the feature extraction network comprises a plurality of cascaded first networks; each first network consists of one or more second networks and one max-pooling layer connected in sequence, the max-pooling layer being used for feature selection on the images output by the preceding second network; each second network comprises a convolutional layer, a batch normalization layer and an activation function layer connected in sequence, the convolutional layer being used for feature extraction from images, the batch normalization layer for zero-mean normalization of the images output by the convolutional layer, and the activation function layer for activation of the images output by the batch normalization layer;
the local feature aggregation network is used for aggregating all local features in the image output by the feature extraction network, so as to obtain the feature vector of the image;
the positive sample of a target image is, among its neighbouring images, the image closest to it in feature distance, the positional distance d between the target image and a neighbouring image satisfying $T_{NL} \le d < T_{NH}$; the positional distance d between the target image and its negative samples satisfies $d \ge T_F$;
the position information of each image in the standard data set is known, and the target images are a plurality of images selected in advance from the standard data set; the feature distance between images is the distance between the feature vectors of the images; $T_{NL}$, $T_{NH}$ and $T_F$ are preset thresholds, $0 < T_{NL} < T_{NH}$, $T_{NH} \le T_F$; $s \ge 1$.

8. A domain-adaptive visual position recognition device based on the image feature extraction model training method according to any one of claims 1 to 4, characterized by comprising: a retrieval set acquisition module, a first feature extraction module, a second feature extraction module and a recognition module;
the retrieval set acquisition module is used to determine the target domain to which an image to be recognized belongs, obtain a plurality of images at different positions in the target domain, and use the obtained images together with the image to be recognized as images to be retrieved;
the first feature extraction module is used to take the images to be retrieved as input and obtain the feature vector of each image to be retrieved with the image feature extraction model; when obtaining the image feature vectors, for each convolutional layer, the mean and standard deviation of the feature maps obtained after all images to be retrieved pass through that convolutional layer are computed as the parameters of the batch normalization layer following that convolutional layer; the remaining model parameters of the image feature extraction model are the model parameters obtained by training;
the second feature extraction module is used to obtain the feature vector of each image in a test data set with the image feature extraction model;
the recognition module is used to obtain, according to the feature vectors extracted by the first feature extraction module and the second feature extraction module, the image in the test data set whose feature distance to the image to be recognized is smallest, and to determine the position information of that image as the position information of the image to be recognized, thereby completing the visual position recognition of the image to be recognized;
wherein the position information of each image in the test data set is known, and a domain is a set of factors affecting the distribution of image features.
CN201910350741.5A 2019-04-28 2019-04-28 Model training method, domain-adaptive visual position identification method and device Expired - Fee Related CN110175615B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910350741.5A CN110175615B (en) 2019-04-28 2019-04-28 Model training method, domain-adaptive visual position identification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910350741.5A CN110175615B (en) 2019-04-28 2019-04-28 Model training method, domain-adaptive visual position identification method and device

Publications (2)

Publication Number Publication Date
CN110175615A (en) 2019-08-27
CN110175615B (en) 2021-01-01

Family

ID=67690216

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910350741.5A Expired - Fee Related CN110175615B (en) 2019-04-28 2019-04-28 Model training method, domain-adaptive visual position identification method and device

Country Status (1)

Country Link
CN (1) CN110175615B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107122712A (en) * 2017-03-27 2017-09-01 Dalian University Palmprint image recognition method based on convolutional neural networks and bidirectional local feature aggregation description vectors
CN107767378A (en) * 2017-11-13 2018-03-06 Zhejiang Chinese Medical University Multi-modal magnetic resonance image segmentation method for GBM based on deep neural networks
CN107967457A (en) * 2017-11-27 2018-04-27 Global Energy Interconnection Research Institute Co., Ltd. Place recognition and relative positioning method and system adaptive to visual feature changes
US20180181804A1 (en) * 2016-12-28 2018-06-28 Konica Minolta Laboratory U.S.A., Inc. Data normalization for handwriting recognition
WO2018184195A1 (en) * 2017-04-07 2018-10-11 Intel Corporation Joint training of neural networks using multi-scale hard example mining
CN108647577A (en) * 2018-04-10 2018-10-12 Huazhong University of Science and Technology Pedestrian re-identification model, method and system with adaptive hard example mining
CN109684977A (en) * 2018-12-18 2019-04-26 Chengdu 30Kaitian Communication Industry Co., Ltd. View landmark retrieval method based on end-to-end deep learning

Non-Patent Citations (4)

Title
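RELJA ARANDJELOVIC et al.: "NetVLAD: CNN architecture for weakly supervised place recognition", IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE *
YANGHAO LI et al.: "Adaptive Batch Normalization for practical domain adaptation", PATTERN RECOGNITION *
XIAOSONG QIU et al.: "Visual place recognition method based on convolutional neural networks", COMPUTER ENGINEERING AND DESIGN *
LIJUN WANG et al.: "Place recognition based on convolutional neural networks", ELECTRONIC SCIENCE AND TECHNOLOGY *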

Cited By (18)

Publication number Priority date Publication date Assignee Title
CN112541515A (en) * 2019-09-23 2021-03-23 Beijing Jingdong Qianshi Technology Co., Ltd. Model training method, driving data processing method, device, medium and equipment
CN112906724A (en) * 2019-11-19 2021-06-04 Huawei Technologies Co., Ltd. Image processing device, method, medium and system
CN110866140B (en) * 2019-11-26 2024-02-02 Tencent Technology (Shenzhen) Co., Ltd. Image feature extraction model training method, image search method and computer equipment
CN110866140A (en) * 2019-11-26 2020-03-06 Tencent Technology (Shenzhen) Co., Ltd. Image feature extraction model training method, image searching method and computer equipment
CN111627065A (en) * 2020-05-15 2020-09-04 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Visual positioning method and device and storage medium
CN111627065B (en) * 2020-05-15 2023-06-20 Guangdong Oppo Mobile Telecommunications Corp., Ltd. A visual positioning method, device, and storage medium
CN111914712A (en) * 2020-07-24 2020-11-10 Hefei University of Technology A method and system for target detection in railway ground track scene
CN111914712B (en) * 2020-07-24 2024-02-13 Hefei University of Technology A railway ground track scene target detection method and system
WO2021204014A1 (en) * 2020-11-12 2021-10-14 Ping An Technology (Shenzhen) Co., Ltd. Model training method and related apparatus
CN112328891A (en) * 2020-11-24 2021-02-05 Beijing Baidu Netcom Science and Technology Co., Ltd. Method for training search model, method for searching target object and device thereof
CN112733701A (en) * 2021-01-07 2021-04-30 Information Science Academy of China Electronics Technology Group Corporation Robust scene recognition method and system based on capsule network
CN115345930A (en) * 2021-05-12 2022-11-15 Alibaba Singapore Holding Private Limited Model training method, visual positioning method, device and equipment
CN113591771A (en) * 2021-08-10 2021-11-02 Wuhan Zhongdian Zhihui Technology Co., Ltd. Training method and device for multi-scene power distribution room object detection model
CN113591771B (en) * 2021-08-10 2024-03-08 Wuhan Zhongdian Zhihui Technology Co., Ltd. Training method and equipment for object detection model of multi-scene distribution room
CN115761246A (en) * 2022-11-21 2023-03-07 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Training method of feature extraction model, information recommendation method, device and equipment
CN116468784A (en) * 2023-04-10 2023-07-21 Harbin Institute of Technology A visual location recognition method, system and device based on attention compression coding features
CN116863164A (en) * 2023-07-03 2023-10-10 Zhejiang University Visual position identification method, electronic equipment and medium
CN119359802A (en) * 2024-09-26 2025-01-24 Zhejiang University A method and device for image position recognition based on basic visual model

Also Published As

Publication number Publication date
CN110175615B (en) 2021-01-01

Similar Documents

Publication Publication Date Title
CN110175615B (en) Model training method, domain-adaptive visual position identification method and device
Jin Kim et al. Learned contextual feature reweighting for image geo-localization
Li et al. SAR image change detection using PCANet guided by saliency detection
CN110188225B (en) Image retrieval method based on sequencing learning and multivariate loss
Chen et al. T-center: A novel feature extraction approach towards large-scale iris recognition
CN111310662B (en) A method and system for flame detection and recognition based on integrated deep network
CN111582178B (en) Method and system for vehicle re-identification based on multi-directional information and multi-branch neural network
CN107679078A (en) Deep-learning-based method and system for fast vehicle retrieval in traffic checkpoint images
Yue et al. Robust loop closure detection based on bag of superpoints and graph verification
CN112084895B (en) Pedestrian re-identification method based on deep learning
CN113743251B (en) A target search method and device based on weak supervision scenarios
CN110689043A (en) Vehicle fine granularity identification method and device based on multiple attention mechanism
CN105320764B (en) A 3D model retrieval method and retrieval device based on incremental slow feature
CN109344856B (en) An offline signature identification method based on multi-layer discriminative feature learning
CN109871892A (en) Robot visual cognition system based on few-shot metric learning
CN114927236A (en) A detection method and system for multiple target images
CN105989369A (en) Pedestrian Re-Identification Method Based on Metric Learning
CN114596546A (en) Vehicle re-identification method, device and computer, and readable storage medium
CN113316080B (en) Indoor positioning method based on Wi-Fi and image fusion fingerprint
CN103903017B (en) Face recognition method based on adaptive soft histogram local binary patterns
CN110458234B (en) Deep-learning-based method for searching vehicles by image
CN118097226A (en) A vehicle classification method and system based on improved ViT
CN117456480A (en) A lightweight vehicle re-identification method based on multi-source information fusion
CN115410223B (en) A domain-generalized person re-identification method based on invariant feature extraction
CN102609732B (en) Object recognition method based on generalization visual dictionary diagram

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee (granted publication date: 2021-01-01)