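WO2018036293A1 - Image segmentation method, apparatus, and fully convolutional network system - Google Patents
Image segmentation method, apparatus, and fully convolutional network system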
- Publication number
- WO2018036293A1 (application PCT/CN2017/092614)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- convolution
- porous
- image
- target
- porous convolution
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/09—Supervised learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/143—Segmentation; Edge detection involving probabilistic approaches, e.g. Markov random field [MRF] modelling
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10024—Color image
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Definitions
- The present application relates to the field of machine vision technology, and in particular to an image segmentation method, an apparatus, and a fully convolutional network system.
- Image segmentation, such as semantic segmentation and scene labeling, plays an important role in many application scenarios, for example image understanding and autonomous driving, which makes it important for machine understanding of images. Semantic segmentation means: given an image, classify the pixels in the image. Scene labeling means: label the regions in the image according to the image semantics.
- The fully convolutional network has brought significant performance improvements to semantic segmentation and scene labeling of images. Specifically, the good classification performance of the fully convolutional network is used to densely predict the pixels in the image, and the final result is then predicted comprehensively by means of a conditional random field method.
- A fully convolutional network for image segmentation is mainly composed of convolutional layers, pooling layers, and activation layers; unlike an ordinary convolutional network, it has no fully connected layer.
- A large receptive field means that more spatial context information is considered, which can increase prediction accuracy.
- The receptive field is the region of the input image that corresponds to the response of a given node of the output feature map.
- With pooling layers, the spatial size of the feature maps in the network becomes smaller as the network deepens, so the resolution becomes lower, and the fully convolutional network tends to predict the edges of targets in the image poorly. Adding more pooling layers also undoubtedly reduces the prediction accuracy for small targets in the image.
- Porous convolution (also known as dilated or atrous convolution) has been proposed to solve the above problems to some extent. By filling zero elements into the convolution kernel, porous convolution enlarges the kernel while avoiding a parameter explosion. It can also help remove some of the pooling layers in the network, so that the feature map size remains unchanged as the network deepens.
- Target objects in an image differ in scale; that is, there are very large targets and very small targets.
- A fully convolutional network often has its own scale range, namely the scale of target objects that it is best suited to process. For example, choosing different expansion coefficients in porous convolution often makes the fully convolutional network suited to different scale ranges.
- The scale of the features extracted by a convolution operation is proportional not only to the receptive field of the convolutional layer but also to its expansion coefficient, and if the scale of the extracted features is large, small-scale target objects are ignored. Therefore, how to effectively segment target objects of different scale ranges in an image while ensuring a large receptive field is a problem worthy of attention.
- The purpose of the embodiments of the present application is to provide an image segmentation method, an apparatus, and a fully convolutional network system that segment target objects of different scale ranges in an image more effectively while ensuring a large receptive field.
- the specific technical solutions are as follows:
- an embodiment of the present application provides an image segmentation method, including:
- The target network is a fully convolutional network including a hybrid context network structure, and the hybrid context network structure is configured to fuse a plurality of reference features that it extracts, each with a predetermined scale range, into a target feature.
- The target feature is a feature that matches the scale range of the target object in the segmented image; the target network is trained on sample images containing target objects of different scale ranges.
- The hybrid context network structure is specifically a convolution structure combining non-porous convolution and porous convolution.
- The hybrid context network structure includes at least one hybrid context component.
- Each hybrid context component includes a porous convolution branch, a non-porous convolution branch, a channel concatenation layer, and a non-porous convolution layer. The porous convolution branch and the non-porous convolution branch each convolve the input of the hybrid context component in which they are located; the channel concatenation layer concatenates the convolution result of the porous convolution branch with the convolution result of the non-porous convolution branch; and the non-porous convolution layer convolves the result of the channel concatenation layer and outputs the resulting convolution result as the output of the hybrid context component in which it is located.
- The porous convolution branch includes at least one porous convolution, and the non-porous convolution branch includes at least one non-porous convolution.
- The hybrid context component convolves its input using the following convolution formula:
- $F_{i+1} = \varphi\left(W_i\, c\left(\psi\left(W_k F_i + b_k\right)\right) + b_i\right)$
- $F_i$ represents the input feature map of the i-th layer
- $F_{i+1}$ represents the output feature map of the i-th layer
- $W_k$ represents the parameters of the porous convolution branch and the non-porous convolution branch
- $b_k$ represents the offset terms of the porous convolution branch and the non-porous convolution branch
- $\psi$ denotes the activation function of the porous convolution branch and the non-porous convolution branch
- $c()$ concatenates all input matrices along the channel axis
- $W_i$ denotes the parameters of the non-porous convolution layer
- $b_i$ represents the offset term of the non-porous convolution layer, and $\varphi$ represents the activation function of the non-porous convolution layer.
- the training process of the target network is:
- the training process ends, and the target network is obtained.
- an image segmentation apparatus including:
- a target image obtaining module configured to obtain a target image to be processed
- An image feature data obtaining module configured to obtain image feature data of the target image
- An image segmentation module configured to input the image feature data into a pre-trained target network for image segmentation, to obtain an output result;
- The target network is a fully convolutional network including a hybrid context network structure.
- The hybrid context network structure is configured to fuse a plurality of reference features that it extracts, each with a predetermined scale range, into a target feature, where the target feature is a feature that matches the scale range of the target object in the segmented image; the target network is trained on sample images containing target objects of different scale ranges;
- a result obtaining module configured to obtain an image segmentation result corresponding to the target image based on the output result.
- The hybrid context network structure is specifically a convolution structure combining non-porous convolution and porous convolution.
- The hybrid context network structure includes at least one hybrid context component.
- Each hybrid context component includes a porous convolution branch, a non-porous convolution branch, a channel concatenation layer, and a non-porous convolution layer. The porous convolution branch and the non-porous convolution branch each convolve the input of the hybrid context component in which they are located; the channel concatenation layer concatenates the convolution result of the porous convolution branch with the convolution result of the non-porous convolution branch; and the non-porous convolution layer convolves the result of the channel concatenation layer and outputs the resulting convolution result as the output of the hybrid context component in which it is located.
- The porous convolution branch includes at least one porous convolution, and the non-porous convolution branch includes at least one non-porous convolution.
- The hybrid context component convolves its input using the following convolution formula:
- $F_{i+1} = \varphi\left(W_i\, c\left(\psi\left(W_k F_i + b_k\right)\right) + b_i\right)$
- $F_i$ represents the input feature map of the i-th layer
- $F_{i+1}$ represents the output feature map of the i-th layer
- $W_k$ represents the parameters of the porous convolution branch and the non-porous convolution branch
- $b_k$ represents the offset terms of the porous convolution branch and the non-porous convolution branch
- $\psi$ denotes the activation function of the porous convolution branch and the non-porous convolution branch
- $c()$ concatenates all input matrices along the channel axis
- $W_i$ denotes the parameters of the non-porous convolution layer
- $b_i$ represents the offset term of the non-porous convolution layer, and $\varphi$ represents the activation function of the non-porous convolution layer.
- the target network is trained by a training module, where the training module includes:
- a building unit for constructing an initial full convolutional network comprising a hybrid context network structure
- a feature data obtaining unit configured to obtain image feature data of each sample image
- a training unit configured to input image feature data of each sample image into the initial full convolution network for training
- a judging unit configured to end the training process and obtain the target network when the loss value between the output corresponding to each training sample and the corresponding image segmentation ground truth is lower than a predetermined threshold.
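- The stopping rule applied by the judging unit can be illustrated with a deliberately tiny stand-in. The sketch below is not the network of this application: it assumes a one-parameter linear model, a mean squared loss, and plain gradient descent, none of which the text specifies. The function name `train_until_threshold` and its parameters are hypothetical. Only the rule "stop once the loss against the ground truth is below a predetermined threshold" comes from the text.

```python
def train_until_threshold(samples, targets, threshold=1e-4, lr=0.1, max_steps=10_000):
    """Fit y = w * x by gradient descent and stop once the loss is below threshold.

    The linear model, the mean squared loss and the learning rate are
    assumptions made for illustration; only the stopping rule is from the text.
    """
    w = 0.0  # the single trainable parameter of the stand-in model
    for step in range(max_steps):
        preds = [w * x for x in samples]
        # Mean squared loss between the outputs and the ground-truth values.
        loss = sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(samples)
        if loss < threshold:
            # Loss is below the predetermined threshold: training ends.
            return w, loss, step
        # Gradient of the mean squared loss with respect to w.
        grad = sum(2 * (p - t) * x for p, t, x in zip(preds, targets, samples)) / len(samples)
        w -= lr * grad
    raise RuntimeError("loss did not fall below the threshold")


w, loss, steps = train_until_threshold([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(w, loss, steps)
```

- In a real fully convolutional network the parameter update would run over all convolution kernels and offset terms, but the threshold check would sit in the same place in the loop.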
- The embodiments of the present application further provide a fully convolutional network system, including a hybrid context network structure.
- The hybrid context network structure includes at least one hybrid context component.
- Each hybrid context component includes a porous convolution branch, a non-porous convolution branch, a channel concatenation layer, and a non-porous convolution layer. The porous convolution branch and the non-porous convolution branch each convolve the input of the hybrid context component in which they are located; the channel concatenation layer concatenates the convolution result of the porous convolution branch with the convolution result of the non-porous convolution branch; and the non-porous convolution layer convolves the result of the channel concatenation layer and outputs the resulting convolution result as the output of the hybrid context component in which it is located.
- The porous convolution branch includes at least one porous convolution, and the non-porous convolution branch includes at least one non-porous convolution.
- The hybrid context component convolves its input using the following convolution formula:
- $F_{i+1} = \varphi\left(W_i\, c\left(\psi\left(W_k F_i + b_k\right)\right) + b_i\right)$
- $F_i$ represents the input feature map of the i-th layer
- $F_{i+1}$ represents the output feature map of the i-th layer
- $W_k$ represents the parameters of the porous convolution branch and the non-porous convolution branch
- $b_k$ represents the offset terms of the porous convolution branch and the non-porous convolution branch
- $\psi$ denotes the activation function of the porous convolution branch and the non-porous convolution branch
- $c()$ concatenates all input matrices along the channel axis
- $W_i$ denotes the parameters of the non-porous convolution layer
- $b_i$ represents the offset term of the non-porous convolution layer, and $\varphi$ represents the activation function of the non-porous convolution layer.
- the embodiment of the present application further provides an electronic device, including:
- the memory stores executable program code
- The processor reads the executable program code stored in the memory and runs the corresponding program, so as to perform, at runtime, the image segmentation method according to the first aspect of the present application.
- The present application further provides a storage medium configured to store executable program code that, at runtime, performs the image segmentation method according to the first aspect of the present application.
- In the embodiments of the present application, the target network for image segmentation is a fully convolutional network with a hybrid context network structure.
- The hybrid context network structure can fuse a plurality of reference features that it extracts, each with a predetermined scale range, into a feature that matches the scale range of the target object in the segmented image.
- As a result, target objects of every scale range in the image are not ignored, while the receptive field depends on the convolution with the largest expansion coefficient. The scheme can therefore segment target objects of different scale ranges in the image more effectively while ensuring a large receptive field.
- FIG. 1 is a flowchart of an image segmentation method according to an embodiment of the present application
- FIG. 2 is a schematic structural diagram of a hybrid context component provided by an embodiment of the present application.
- FIG. 3 is a schematic diagram of a target network for segmenting an image listed in the embodiment of the present application.
- FIG. 4 is a flowchart of a training process of a target network in an embodiment of the present application.
- FIG. 5 is a schematic structural diagram of an image segmentation apparatus according to an embodiment of the present application.
- FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
- The embodiments of the present application provide an image segmentation method, an apparatus, and a fully convolutional network system.
- The image segmentation in the embodiments of the present application may be semantic segmentation of an image, scene labeling of an image, or, optionally, another manner of dividing the regions in an image, which is not limited herein.
- The image segmentation method provided by the embodiments of the present application is introduced first.
- The image segmentation method may be executed by an image segmentation device, which may be a functional plug-in of image processing software in the related art or independent functional software. In addition, the image segmentation device can be applied to an electronic device, which may include a terminal device or a server device.
- an image segmentation method provided by an embodiment of the present application may include the following steps:
- Obtaining the target image to be processed may mean obtaining it locally, downloading it from the network, and so on.
- The target image may include target objects of different scale ranges. For example, the target image is a surveillance image captured by a roadside camera that includes cars and pedestrians in the close view and birds in the distant view. The cars and pedestrians in the close view are target objects of a large scale range, and the birds in the distant view are target objects of a small scale range.
- Large and small scale are relative concepts and do not refer to specific scale ranges. For a fully convolutional network in the prior art, however, a network that is good at processing large-scale target objects is usually not suitable for processing small-scale target objects, where large and small are relative.
- For example, if fully convolutional network A in the related art is suited to processing target objects of 100*100 pixels, then a target object of 10*10 pixels is a target object of a small scale range and is ignored by network A.
- Similarly, if fully convolutional network B is suited to processing target objects of 1000*1000 pixels, then a target object of 100*100 pixels can be regarded as a target object of a small scale range and is ignored by network B.
- That is, the scale range to which image segmentation methods using fully convolutional networks in the related art apply is limited, whereas the image segmentation method provided by the embodiments of the present application uses a target network including a hybrid context network structure in order to expand the applicable scale range.
- Image feature data of the target image is obtained, where the image feature data uniquely determines the target image.
- The image feature data may include, but is not limited to, color channel values. For an RGB image, the color channel values are the RGB channel values, where "R" stands for red, "G" for green, and "B" for blue.
- The image segmentation device can call an external color extractor to extract the color channel values, or extract them with built-in program code. A color extractor is functional software capable of extracting the color channel values of the pixels in an image.
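- As a rough illustration of extracting color channel values with built-in code, the sketch below splits an image into per-channel planes. It assumes the image is already decoded into rows of `(R, G, B)` tuples; the function name `split_rgb_channels` and this in-memory layout are hypothetical and are not specified by the text.

```python
def split_rgb_channels(pixels):
    """Split rows of (R, G, B) tuples into three h*w planes: [red, green, blue]."""
    red = [[p[0] for p in row] for row in pixels]
    green = [[p[1] for p in row] for row in pixels]
    blue = [[p[2] for p in row] for row in pixels]
    return [red, green, blue]


# A 2*2 image: each pixel is an (R, G, B) tuple with values in 0..255.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (10, 20, 30)],
]
channels = split_rgb_channels(image)
print(channels[0])  # red plane
print(channels[1])  # green plane
print(channels[2])  # blue plane
```

- The resulting 3*h*w layout is a common way to feed color channel values into a convolutional network, with one channel per color.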
- S103: Input the image feature data into a pre-trained target network for image segmentation to obtain an output result;
- The target network is a fully convolutional network including a hybrid context network structure, and the hybrid context network structure is used to fuse a plurality of reference features that it extracts, each with a predetermined scale range, into a target feature.
- The target feature is a feature that matches the scale range of the target object in the segmented image; the target network is trained on sample images containing target objects of different scale ranges;
- A target network for image segmentation is obtained in advance through training. The target network is a fully convolutional network including a hybrid context network structure; that is, the target network is a fully convolutional network, but in addition to convolution layers, pooling layers, and activation layers, it also includes a hybrid context network structure.
- The hybrid context network structure is configured to fuse a plurality of reference features that it extracts, each with a predetermined scale range, into a target feature, the target feature being a feature that matches the scale range of the target object in the segmented image.
- The convolution layer is used for convolution processing.
- The pooling layer is used for upsampling or downsampling.
- The activation layer is used to introduce nonlinearity. A fully convolutional network structure may have only one activation layer but multiple convolution layers and pooling layers, and the pooling layers and the activation layer are located after the convolution layers.
- The obtained image feature data of the target image is input to the target network, and an output result is obtained.
- The hybrid context network structure is specifically a convolution structure combining non-porous convolution and porous convolution.
- A porous convolution enlarges the convolution kernel by filling it with zero elements.
- A non-porous convolution is an ordinary convolution whose kernel is not filled with zero elements.
- The expansion coefficient (often called the dilation rate) is an attribute of porous convolution. An ordinary, non-porous convolution has an expansion coefficient of 1. If one zero is inserted between every two adjacent elements of an ordinary convolution kernel, the expansion coefficient is 2; in general, inserting N zeros gives an expansion coefficient of N+1.
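- The relation between inserted zeros and the expansion coefficient can be made concrete with a small sketch. The helper `dilate_kernel` below is a hypothetical name; it builds the enlarged kernel explicitly, which is only for illustration (practical implementations usually skip the zero positions instead of storing them).

```python
def dilate_kernel(kernel, rate):
    """Insert (rate - 1) zeros between neighbouring elements of a 2D kernel.

    rate is the expansion coefficient: rate 1 leaves the kernel unchanged,
    rate N + 1 inserts N zeros between adjacent elements.
    """
    if rate < 1:
        raise ValueError("rate must be >= 1")
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = k_h + (k_h - 1) * (rate - 1)
    out_w = k_w + (k_w - 1) * (rate - 1)
    out = [[0] * out_w for _ in range(out_h)]
    for r in range(k_h):
        for c in range(k_w):
            out[r * rate][c * rate] = kernel[r][c]
    return out


kernel = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]
for row in dilate_kernel(kernel, 2):
    print(row)
# [1, 0, 2, 0, 3]
# [0, 0, 0, 0, 0]
# [4, 0, 5, 0, 6]
# [0, 0, 0, 0, 0]
# [7, 0, 8, 0, 9]
```

- The 3*3 kernel covers a 5*5 window at expansion coefficient 2 while still holding only nine trainable values, which is why the kernel grows without a parameter explosion.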
- The hybrid context network structure includes at least one hybrid context component.
- Each hybrid context component includes a porous convolution branch, a non-porous convolution branch, a channel concatenation layer, and a non-porous convolution layer. The porous convolution branch and the non-porous convolution branch each convolve the input of the hybrid context component in which they are located; the channel concatenation layer concatenates the convolution result of the porous convolution branch with the convolution result of the non-porous convolution branch; and the non-porous convolution layer convolves the result of the channel concatenation layer and outputs the resulting convolution result as the output of the hybrid context component in which it is located.
- The input of a hybrid context component may be any feature map. The feature map is convolved by the porous convolution branch and the non-porous convolution branch in parallel, and these parallel convolutions extract features of different predetermined scale ranges, where each predetermined scale range is affected by the expansion coefficient. Then, through the channel concatenation layer and the non-porous convolution layer, features of a new scale range are selected from the features of the different predetermined scale ranges as the output.
- The features of the new scale range are features that match the scale range of the target object in the segmented image.
- In a fully convolutional network of the related art, a convolution layer computes $F_{i+1} = \varphi\left(W_i F_i + b_i\right)$, where:
- $W_i$ represents the convolution kernel parameters of the i-th layer
- $F_i$ represents the input feature map of the i-th layer
- $F_{i+1}$ represents the output feature map of the i-th layer
- $b_i$ represents the offset term
- $\varphi$ represents the activation function; all convolution kernels in $W_i$ have the same size and the same expansion coefficient.
- By contrast, the hybrid context component computes $F_{i+1} = \varphi\left(W_i\, c\left(\psi\left(W_k F_i + b_k\right)\right) + b_i\right)$, where:
- $F_i$ represents the input feature map of the i-th layer
- $F_{i+1}$ represents the output feature map of the i-th layer
- $W_k$ represents the parameters of the porous convolution branch and the non-porous convolution branch
- $b_k$ represents the offset terms of the porous convolution branch and the non-porous convolution branch
- $\psi$ denotes the activation function of the porous convolution branch and the non-porous convolution branch
- $c()$ concatenates all input matrices along the channel axis
- $W_i$ denotes the parameters of the non-porous convolution layer
- $b_i$ represents the offset term of the non-porous convolution layer, and $\varphi$ represents the activation function of the non-porous convolution layer.
- Specifically, the function c joins two four-dimensional matrices along the second dimension to form one matrix.
- For example, a matrix of n*c1*h*w and a matrix of n*c2*h*w are merged into a matrix of n*(c1+c2)*h*w.
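- The shape arithmetic of the channel concatenation can be checked with a short sketch. It represents a four-dimensional matrix as nested Python lists in n*c*h*w order; the helper names `shape`, `filled`, and `concat_channels` are hypothetical.

```python
def shape(t):
    """Return the (n, c, h, w) shape of a 4D nested list."""
    return (len(t), len(t[0]), len(t[0][0]), len(t[0][0][0]))


def filled(n, c, h, w, value):
    """Build an n*c*h*w nested list in which every element equals value."""
    return [[[[value] * w for _ in range(h)] for _ in range(c)] for _ in range(n)]


def concat_channels(a, b):
    """Concatenate two n*c*h*w matrices along the channel (second) dimension."""
    na, _, ha, wa = shape(a)
    nb, _, hb, wb = shape(b)
    if (na, ha, wa) != (nb, hb, wb):
        raise ValueError("n, h and w must match; only the channel counts may differ")
    # Each sample is a list of channels, so list addition appends b's channels.
    return [sample_a + sample_b for sample_a, sample_b in zip(a, b)]


a = filled(2, 3, 4, 5, 1.0)   # n=2, c1=3, h=4, w=5
b = filled(2, 7, 4, 5, 2.0)   # n=2, c2=7, h=4, w=5
merged = concat_channels(a, b)
print(shape(merged))  # (2, 10, 4, 5)
```

- Only the channel counts may differ between the two inputs, which is why both branches of a hybrid context component must produce feature maps with the same n, h, and w.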
- F represents a feature map, which is a matrix.
- The receptive field is the area of the original image to which an element of the feature map corresponds, and it can be regarded as an attribute of the feature map.
- The convolution kernel size of $W_i$ may be 1.
- $W_k$ can be either a porous convolution or a non-porous convolution. The scale range of the features extracted by a convolution is directly proportional to its expansion coefficient; that is, the features supplied to $W_i$ for selection include both large-scale features and small-scale features.
- The receptive field of $F_{i+1}$ depends on the convolution with the largest expansion coefficient. That is, $F_{i+1}$ can have a large receptive field while outputting large-scale features, small-scale features, or a mixture of them, according to the scale range of the input image.
- In other words, it outputs features that match the scale range of the target object in the segmented image, rather than only features of one fixed scale as a fully convolutional network in the related art does. This undoubtedly gives the target network more freedom, and the target network learns from the specified sample images which combination of scales is best.
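- A minimal sketch of one hybrid context component for a single image follows. It assumes ReLU for both activation functions, 3*3 kernels in the two branches, one convolution per branch, zero padding that keeps h and w unchanged, and a 1*1 fusing convolution. These choices, and the names `conv2d_same`, `relu`, and `hybrid_context_component`, are illustrative assumptions rather than details fixed by the text.

```python
def conv2d_same(fmap, weights, bias, rate=1):
    """Zero-padded 2D convolution (cross-correlation form) that keeps h and w.

    fmap:    [c_in][h][w] feature map
    weights: [c_out][c_in][k][k] kernels, k odd
    rate:    expansion coefficient (1 means non-porous)
    """
    c_in, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    k = len(weights[0][0])
    half = k // 2
    out = []
    for o, kernels in enumerate(weights):
        plane = [[bias[o]] * w for _ in range(h)]
        for i in range(c_in):
            for y in range(h):
                for x in range(w):
                    acc = 0.0
                    for dy in range(k):
                        for dx in range(k):
                            # Kernel taps are spaced `rate` pixels apart.
                            yy = y + (dy - half) * rate
                            xx = x + (dx - half) * rate
                            if 0 <= yy < h and 0 <= xx < w:
                                acc += kernels[i][dy][dx] * fmap[i][yy][xx]
                    plane[y][x] += acc
        out.append(plane)
    return out


def relu(fmap):
    """Element-wise ReLU, used here as a stand-in for both activations."""
    return [[[max(0.0, v) for v in row] for row in plane] for plane in fmap]


def hybrid_context_component(fmap, w_porous, b_porous, w_plain, b_plain,
                             w_fuse, b_fuse, rate=2):
    """Two parallel branches, channel concatenation, then a fusing convolution."""
    porous = relu(conv2d_same(fmap, w_porous, b_porous, rate=rate))
    plain = relu(conv2d_same(fmap, w_plain, b_plain, rate=1))
    stacked = porous + plain  # concatenation on the channel axis
    return relu(conv2d_same(stacked, w_fuse, b_fuse, rate=1))


# 5*5 single-channel input with a single impulse at the centre.
fmap = [[[0.0] * 5 for _ in range(5)]]
fmap[0][2][2] = 1.0
ones3 = [[[[1.0] * 3 for _ in range(3)]]]   # one 3*3 all-ones kernel
fuse = [[[[1.0]], [[1.0]]]]                 # 1*1 kernel over two channels
out = hybrid_context_component(fmap, ones3, [0.0], ones3, [0.0], fuse, [0.0])
for row in out[0]:
    print(row)
```

- In this toy run the corner output responds to the centre impulse only through the porous branch, which reflects the statement that the receptive field depends on the convolution with the largest expansion coefficient, while the fusing 1*1 convolution decides how much of each branch reaches the output.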
- The porous convolution branch includes at least one porous convolution, and the non-porous convolution branch includes at least one non-porous convolution.
- When the hybrid context network structure includes at least two hybrid context components, the hybrid context components are connected in series.
- When the hybrid context network structure includes a plurality of porous convolution branches, there are necessarily multiple porous convolutions in the hybrid context network structure. When any porous convolution branch in the hybrid context network structure includes a plurality of porous convolutions, multiple porous convolutions likewise exist in the structure.
- Further, when there are multiple porous convolutions in the hybrid context network structure, their expansion coefficients can be set according to the actual situation, which is not limited by the embodiments of the present application.
- The hybrid context network structure can be placed in the second half of the entire target network, although it is not limited thereto; because whole-network configurations are complex and varied, the specific location of the hybrid context network structure in the network is not limited by the embodiments of the present application.
- From a functional perspective, the target network containing the hybrid context network structure provided by the embodiments of the present application can be divided into three parts: the first part is a classification prediction module, the second part is a context comprehensive judgment module, and the third part is a correction module. The hybrid context network structure is the context comprehensive judgment module.
- Specifically, the classification prediction module performs a preliminary prediction of the category to which each pixel in the feature map belongs; the context comprehensive judgment module classifies, on the basis of the classification prediction module's result, using more context information; and the correction module corrects edges and small target objects using more detailed information, on the basis of the output of the context comprehensive judgment module.
- The hybrid context network structure includes five hybrid context components; the porous convolution branch in each hybrid context component includes one porous convolution, and the non-porous convolution branch includes one non-porous convolution.
- 224*224, 112*112, and so on are feature map sizes, which show how the spatial size of the feature map changes as the network runs.
- The left half of FIG. 3 is the classification prediction process corresponding to the above classification prediction module. The specific network structure corresponding to the classification prediction process is an FCN obtained by converting a classification network, and the classification network may be any existing mainstream classification network.
- FCN stands for Fully Convolutional Network. An FCN converts the fully connected layers of a traditional convolutional neural network into convolutional layers and attempts to recover the category of each pixel from abstract features; that is, it extends image-level classification to pixel-level classification.
- the image segmentation result corresponding to the target image can be obtained based on the output result. It can be understood that the output result of the target network is some feature data, and the image segmentation result corresponding to the target image can be generated according to the feature data.
- the target network for image segmentation is a fully convolutional network with a hybrid context network structure. The hybrid context network structure can fuse the multiple reference features of predetermined scale ranges that it extracts into features that match the scale range of the target objects in the segmented image, so that target objects of every scale range in the image are not ignored; at the same time, the receptive field depends on the convolution with the largest expansion coefficient. The scheme can therefore improve the segmentation effectiveness for target objects of different scale ranges in the image while guaranteeing a large receptive field.
- the training process of the target network may be:
- constructing the initial fully convolutional network that includes the hybrid context network structure means building a fully convolutional network containing a hybrid context network structure; the expansion coefficients of the porous and non-porous convolutions in this network are set when the target network is constructed.
- the initial fully convolutional network is the network structure to be trained, that is, the target network whose parameters have not yet been trained. It also includes convolutional layers, activation layers, and pooling layers. The position of the hybrid context network structure within the initial fully convolutional network, as well as the number and arrangement of the convolutional, activation, and pooling layers, can be set according to actual conditions.
- the arrangement of the convolutional, activation, and pooling layers in the constructed initial fully convolutional network may follow certain design principles; for example, the pooling layers and activation layers come after the convolutional layers.
- the image feature data of a sample image may include, but is not limited to, color channel values; for an RGB image, the color channel values are the RGB channel values. Optionally, the image feature data obtained for the sample images is of the same type as the image feature data obtained for the target image described above.
- the image feature data of each sample image may be input into the initial fully convolutional network for training, and the loss value between the output result for each training sample and the corresponding image segmentation ground truth may be checked in real time against a predetermined threshold. When the loss value between the output result for every training sample and the corresponding image segmentation ground truth is below the predetermined threshold, the training process ends and the target network is obtained. The image segmentation ground truth of each sample image is obtained by manual annotation and is the image data obtained after the sample image has been segmented.
- an optimization algorithm may be used to optimize the initial fully convolutional network.
- optimization here means adjusting the parameters of the initial fully convolutional network, for example the convolution kernels and step sizes of the convolutions.
- the optimization algorithm used for the initial fully convolutional network may be gradient descent.
- the basic idea of gradient descent is to seek a minimum along the direction of gradient descent (a maximum can likewise be sought along the direction of gradient ascent). Specific gradient descent methods include, but are not limited to, SGD (stochastic gradient descent) and mini-batch gradient descent.
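The training loop described above (update the parameters by gradient descent and stop once the loss falls below a predetermined threshold) can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the training procedure of the application: the one-parameter model `y = weight * x`, the mean squared loss, the learning rate, and the threshold are all hypothetical choices. A batch size of 1 corresponds to SGD, and larger batch sizes to mini-batch gradient descent.

```python
import random


def mean_squared_loss(weight, samples):
    """Average squared error of the toy model y = weight * x."""
    return sum((weight * x - y) ** 2 for x, y in samples) / len(samples)


def train(samples, batch_size, learning_rate=0.05, threshold=1e-6,
          max_epochs=1000, seed=0):
    """Gradient descent that stops once the loss drops below the threshold."""
    rng = random.Random(seed)
    data = list(samples)
    weight = 0.0  # untrained parameter
    for epoch in range(max_epochs):
        if mean_squared_loss(weight, data) < threshold:
            return weight, epoch  # training ends, "target" model obtained
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Gradient of the mean squared loss with respect to the weight.
            gradient = sum(2 * (weight * x - y) * x for x, y in batch) / len(batch)
            weight -= learning_rate * gradient  # step against the gradient
    return weight, max_epochs


# Hypothetical ground truth: y = 3 * x.
samples = [(float(x), 3.0 * x) for x in (1, 2, 3, 4)]
for batch_size in (1, 2, 4):
    weight, epochs = train(samples, batch_size)
    print(batch_size, round(weight, 3), epochs)
```

In a real network the single weight is replaced by all convolution parameters and the gradient is obtained by backpropagation, but the stopping rule has the same shape.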
- the embodiment of the present application further provides an image segmentation apparatus.
- an image segmentation apparatus provided by an embodiment of the present application may include:
- a target image obtaining module 310 configured to obtain a target image to be processed
- the image feature data obtaining module 320 is configured to obtain image feature data of the target image
- the image segmentation module 330 is configured to input the image feature data into a pre-trained target network for image segmentation to obtain an output result, where the target network is a fully convolutional network that includes a hybrid context network structure; the hybrid context network structure is used to fuse multiple reference features of predetermined scale ranges that it extracts into a target feature; the target feature is a feature that matches the scale range of the target objects in the segmented image; and the target network is trained on sample images of target objects with different scale ranges;
- the result obtaining module 340 is configured to obtain an image segmentation result corresponding to the target image based on the output result.
- the hybrid context network structure is specifically a convolution structure combining non-porous convolution and porous convolution.
- the hybrid context network structure can include: at least one hybrid context component
- each hybrid context component includes a porous convolution branch, a non-porous convolution branch, a channel concatenation layer, and a non-porous convolution layer. The porous convolution branch and the non-porous convolution branch each convolve the input of the hybrid context component; the channel concatenation layer concatenates the convolution results of the two branches; and the non-porous convolution layer convolves the output of the channel concatenation layer and outputs the result as the output of the hybrid context component.
- the porous convolution branch includes at least one porous convolution
- the non-porous convolution branch includes at least one non-porous convolution
- the hybrid context component convolves its input using the following convolution formula:

  $$F_{i+1} = \varphi\bigl(W_i \, c\bigl(\psi(W_k F_i + b_k)\bigr) + b_i\bigr)$$

- $F_i$ represents the input feature map of the i-th layer
- $F_{i+1}$ represents the output feature map of the i-th layer
- $W_k$ represents the parameters of the porous convolution branch and the non-porous convolution branch
- $b_k$ represents the offset terms of the porous convolution branch and the non-porous convolution branch
- $\psi$ denotes the activation function of the porous convolution branch and the non-porous convolution branch
- $c()$ concatenates all input matrices along the channel axis
- $W_i$ denotes the parameters of the non-porous convolution layer
- $b_i$ represents the offset term of the non-porous convolution layer, and $\varphi$ represents the activation function of the non-porous convolution layer.
- the target network is trained by a training module, and the training module includes:
- a building unit for constructing an initial full convolutional network comprising a hybrid context network structure
- a feature data obtaining unit configured to obtain image feature data of each sample image
- a training unit configured to input image feature data of each sample image into the initial full convolution network for training
- a judging unit configured to end the training process and obtain the target network when the loss value between the output result for every training sample and the corresponding image segmentation ground truth is below a predetermined threshold.
- the embodiment of the present application further provides a full convolution network system, which can be applied to image segmentation, and is of course not limited thereto.
- the full convolution network system includes: a hybrid context network structure;
- the hybrid context network structure includes at least one hybrid context component
- each hybrid context component includes a porous convolution branch, a non-porous convolution branch, a channel concatenation layer, and a non-porous convolution layer. The porous convolution branch and the non-porous convolution branch each convolve the input of the hybrid context component; the channel concatenation layer concatenates the convolution results of the two branches; and the non-porous convolution layer convolves the output of the channel concatenation layer and outputs the result as the output of the hybrid context component.
- the fully convolutional network is a network structure that includes convolutional layers, pooling layers, and activation layers, and additionally a hybrid context network structure.
- the structure of a hybrid context component is shown in FIG. 2. When the hybrid context network structure includes at least two hybrid context components, these components are connected in series.
- because the fully convolutional network has a hybrid context network structure, features from feature maps of different scale ranges can be fused into features that match the scale range of the target objects in the segmented image. The target network can thus automatically adjust the scale range it is suited to by learning from sample images; when training the fully convolutional network, sample images of target objects with different scale ranges can be used.
- the input of a hybrid context component may be any feature map. The feature map is convolved by the porous convolution branch and by the non-porous convolution branch; these parallel convolutions extract features of different predetermined scale ranges, where the predetermined scale range is determined by the expansion coefficient. The channel concatenation layer and the non-porous convolution layer then select, from the features of the different predetermined scale ranges, features of a new scale range as the output; the features of the new scale range are features that match the scale range of the target objects in the segmented image.
- the hybrid context network structure may be placed in the second half of the target network, but is not limited to this; because network architectures are complex and varied, the embodiments of the present application do not limit the position of the hybrid context network structure in the network. Functionally, a target network that includes the hybrid context network structure provided by the embodiments can be divided into three parts: a classification prediction module, a context comprehensive judgment module, and a correction module, where the hybrid context network structure is the context comprehensive judgment module. Specifically, the classification prediction module makes a preliminary prediction of the category to which each pixel in the feature map belongs; the context comprehensive judgment module builds on the classification prediction module and classifies using more context information; and the correction module builds on the output of the context comprehensive judgment module and corrects edges and small target objects using more detailed information.
- the hybrid context network structure includes five hybrid context components; the porous convolution branch in each hybrid context component includes one porous convolution, and the non-porous convolution branch includes one non-porous convolution.
- 222*224, 112*112, etc. are feature map sizes; these values show how the spatial size of the feature map changes as the network runs.
- the left half of FIG. 3 shows the classification prediction process corresponding to the classification prediction module described above. Because the network structure for this process is an FCN converted from a classification network, and the classification network may be any existing mainstream classification network, FIG. 3 shows only a schematic of the feature maps.
- FCN stands for Fully Convolutional Networks. An FCN converts the fully connected layers of a traditional convolutional neural network into convolutional layers and attempts to recover, from abstract features, the category to which each pixel belongs; that is, it extends image-level classification to pixel-level classification.
- the porous convolution branch includes at least one porous convolution
- the non-porous convolution branch includes at least one non-porous convolution.
- the hybrid context network structure may also contain multiple porous convolutions.
- when multiple porous convolutions are present, their expansion coefficients may be set according to actual conditions; the embodiments of the present application do not limit this.
- the concatenation operator c() joins two four-dimensional matrices along the second dimension into a single matrix.
- for example, a matrix of size n*c1*h*w and a matrix of size n*c2*h*w are merged into a matrix of size n*(c1+c2)*h*w.
- F represents a feature map, which is a matrix
- the receptive field is the size of the region of the original image to which one element of the feature map corresponds; it can be regarded as an attribute of the feature map.
- the convolution kernel size of $W_i$ may be 1.
- $W_k$ may be either a porous convolution or a non-porous convolution. The scale range of the features extracted by a convolution is proportional to the expansion coefficient of that convolution, so the features supplied to $W_i$ for selection include both large-scale and small-scale features.
- the receptive field of $F_{i+1}$ depends on the convolution with the largest expansion coefficient. $F_{i+1}$ can therefore have a large receptive field while outputting, according to the scale range of the input image, large-scale features, small-scale features, or a mixture of them. That is, it outputs features that match the scale range of the target objects in the segmented image, rather than features of a single scale only, as a fully convolutional network in the related art does. This gives the target network more freedom: it can learn from the given sample images which combination of scales works best.
- an electronic device which may include:
- the memory stores executable program code
- the processor by reading the executable program code stored in the memory, to run a program corresponding to the executable program code, for performing an image segmentation method according to an embodiment of the present application at runtime,
- the image segmentation method includes:
- the target network is a fully convolutional network that includes a hybrid context network structure; the hybrid context network structure is used to fuse multiple reference features of predetermined scale ranges that it extracts into a target feature; the target feature is a feature that matches the scale range of the target objects in the segmented image; and the target network is trained on sample images of target objects with different scale ranges;
- the embodiment of the present application further provides an electronic device, which may include:
- a processor 410, a memory 420, a communication interface 430, and a bus 440;
- the processor 410, the memory 420, and the communication interface 430 are connected by the bus 440 and complete communication with each other;
- the memory 420 stores executable program code
- the processor 410 reads the executable program code stored in the memory 420 and runs the program corresponding to it, so as to perform, at runtime, the image segmentation method described in the embodiments of the present application, wherein the image segmentation method comprises:
- the target network is a fully convolutional network that includes a hybrid context network structure; the hybrid context network structure is used to fuse multiple reference features of predetermined scale ranges that it extracts into a target feature; the target feature is a feature that matches the scale range of the target objects in the segmented image; and the target network is trained on sample images of target objects with different scale ranges;
- the embodiment of the present application further provides a storage medium for storing executable program code, where the executable program code is used to perform, at runtime, the image segmentation method described in the embodiments of the present application, wherein the image segmentation method comprises:
- the target network is a fully convolutional network that includes a hybrid context network structure; the hybrid context network structure is used to fuse multiple reference features of predetermined scale ranges that it extracts into a target feature; the target feature is a feature that matches the scale range of the target objects in the segmented image; and the target network is trained on sample images of target objects with different scale ranges;
- the embodiment of the present application further provides an application program that is used to perform, at runtime, the image segmentation method described in the embodiments of the present application, where the image segmentation method includes:
- the target network is a fully convolutional network that includes a hybrid context network structure; the hybrid context network structure is used to fuse multiple reference features of predetermined scale ranges that it extracts into a target feature; the target feature is a feature that matches the scale range of the target objects in the segmented image; and the target network is trained on sample images of target objects with different scale ranges;
- the description of these embodiments is relatively brief; for the relevant parts, refer to the description of the method embodiment.
Abstract
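The embodiments of the present application provide an image segmentation method, an image segmentation apparatus, and a fully convolutional network system. The method includes: obtaining a target image to be processed; obtaining image feature data of the target image; inputting the image feature data into a pre-trained target network for image segmentation to obtain an output result, where the target network is a fully convolutional network that includes a hybrid context network structure, the hybrid context network structure is used to fuse multiple reference features of predetermined scale ranges that it extracts into a target feature, the target feature is a feature that matches the scale range of the target objects in the segmented image, and the target network is trained on sample images of target objects with different scale ranges; and obtaining, based on the output result, the image segmentation result corresponding to the target image. With this scheme, the segmentation effectiveness for target objects of different scale ranges in an image can be improved while a large receptive field is guaranteed.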
Description
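This application claims priority to Chinese patent application No. 201610734168.4, filed with the Chinese Patent Office on August 26, 2016 and entitled "Image segmentation method, apparatus, and fully convolutional network system", which is incorporated herein by reference in its entirety.
This application relates to the field of machine vision, and in particular to an image segmentation method, an image segmentation apparatus, and a fully convolutional network system.
Image segmentation of types such as semantic segmentation and scene labeling plays an important role in many application scenarios, such as image understanding and autonomous driving, which makes image segmentation significant for machine understanding of images. Semantic segmentation means classifying the pixels of a given image; scene labeling means labeling the regions of an image according to image semantics. In recent years, fully convolutional networks have brought significant performance gains to semantic segmentation and scene labeling: dense class prediction is performed on the pixels of an image by relying on the good classification performance of the fully convolutional network, and the final result is then predicted comprehensively with a method such as a conditional random field. In the related art, a fully convolutional network for image segmentation is mainly composed of convolutional layers, pooling layers, and activation layers and, unlike a convolutional network, has no fully connected layer.
In a fully convolutional network, a large receptive field means that more spatial context information is considered, which can increase prediction accuracy. The receptive field is the region of the input image that corresponds to the response of a node of the output feature map. The related art enlarges the receptive field in two ways: first, by increasing the size of the convolution kernel; second, by adding pooling layers. The first way inflates the number of parameters, that is, there are too many parameters to be trained, so the network cannot be trained properly. As for the second way, because of the pooling layers, the spatial size of the feature maps shrinks as the network deepens, which lowers the resolution, so that the fully convolutional network often predicts the edges of targets in the image poorly; adding pooling layers would undoubtedly reduce the prediction accuracy for small targets in the image. Porous convolution was proposed to address these problems and solves them to some extent. By filling zero elements into the convolution kernel, porous convolution enlarges the kernel without causing a parameter explosion; moreover, porous convolution helps remove some of the pooling layers in the network, so that the feature map size stays unchanged as the network deepens. These advantages have led to the wide use of porous convolution in image segmentation applications.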
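The trade-off described above can be checked with simple arithmetic. The sketch below compares the parameter count of an enlarged ordinary kernel with that of a small kernel whose coverage is widened by inserting zeros; the 64 input and output channels and the helper names are assumptions made for illustration, not values from the application.

```python
def parameter_count(kernel_size, in_channels, out_channels):
    """Trainable weights plus one bias per output channel for a square 2-D kernel."""
    return kernel_size * kernel_size * in_channels * out_channels + out_channels


def covered_span(kernel_size, expansion):
    """Side length of the input window a kernel covers once zeros are inserted."""
    return kernel_size + (kernel_size - 1) * (expansion - 1)


# Hypothetical layer width, chosen only to make the comparison concrete.
CHANNELS = 64

enlarged = parameter_count(7, CHANNELS, CHANNELS)  # ordinary 7x7 kernel
porous = parameter_count(3, CHANNELS, CHANNELS)    # 3x3 kernel, expansion coefficient 3

print(covered_span(7, 1), enlarged)
print(covered_span(3, 3), porous)
```

Both kernels cover the same 7x7 window, but the inserted zeros are not trainable, so the porous kernel keeps the parameter count of a 3x3 kernel.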
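In addition, in concrete image segmentation applications, the scales of the target objects in an image differ: there are very large targets and very small ones. A fully convolutional network usually has a scale range to which it is suited, that is, the scale range of target objects that it handles best. For example, choosing different expansion coefficients for the porous convolutions tends to make a fully convolutional network suited to different scale ranges. Furthermore, in a fully convolutional network, the scale range of the features extracted by a convolution operation is proportional not only to the receptive field of the convolutional layer but also to the expansion coefficient of that layer, and if the scale of the extracted features is large, small-scale target objects are ignored. How to segment target objects of different scale ranges effectively while guaranteeing a large receptive field is therefore a problem worth attention.
The related art includes a fully convolutional network with an ASPP (atrous spatial pyramid pooling) structure: several branches are built, each consisting of porous convolutions with a different expansion coefficient, and the results of the branches are then combined for prediction. However, because the expansion coefficients are fixed, the scale range of target objects that the network handles best is also fixed, so the applicable target objects are fixed and the degree of freedom is insufficient.
How to improve the segmentation effectiveness for target objects of different scale ranges in an image while guaranteeing a large receptive field is thus a problem to be solved urgently.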
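Summary of the Invention
The purpose of the embodiments of the present application is to provide an image segmentation method, an image segmentation apparatus, and a fully convolutional network system that improve the segmentation effectiveness for target objects of different scale ranges in an image while guaranteeing a large receptive field. The specific technical solutions are as follows:
In a first aspect, an embodiment of the present application provides an image segmentation method, including:
obtaining a target image to be processed;
obtaining image feature data of the target image;
inputting the image feature data into a pre-trained target network for image segmentation to obtain an output result, where the target network is a fully convolutional network that includes a hybrid context network structure; the hybrid context network structure is used to fuse multiple reference features of predetermined scale ranges that it extracts into a target feature; the target feature is a feature that matches the scale range of the target objects in the segmented image; and the target network is trained on sample images of target objects with different scale ranges;
obtaining, based on the output result, the image segmentation result corresponding to the target image.
Optionally, the hybrid context network structure is specifically a convolution structure that combines non-porous convolution with porous convolution.
Optionally, the hybrid context network structure includes at least one hybrid context component;
each hybrid context component includes a porous convolution branch, a non-porous convolution branch, a channel concatenation layer, and a non-porous convolution layer, where the porous convolution branch and the non-porous convolution branch each convolve the input of the hybrid context component, the channel concatenation layer concatenates the convolution results of the two branches, and the non-porous convolution layer convolves the output of the channel concatenation layer and outputs the result as the output of the hybrid context component.
Optionally, the porous convolution branch includes at least one porous convolution, and the non-porous convolution branch includes at least one non-porous convolution.
Optionally, the hybrid context component convolves its input using the following convolution formula:
$$F_{i+1} = \varphi\bigl(W_i \, c\bigl(\psi(W_k F_i + b_k)\bigr) + b_i\bigr)$$
where $F_i$ is the input feature map of the i-th layer, $F_{i+1}$ is the output feature map of the i-th layer, $W_k$ denotes the parameters of the porous convolution branch and the non-porous convolution branch, $b_k$ denotes the offset terms of the porous convolution branch and the non-porous convolution branch, $\psi$ denotes the activation function of the porous convolution branch and the non-porous convolution branch, $c()$ concatenates all input matrices along the channel axis, $W_i$ denotes the parameters of the non-porous convolution layer, $b_i$ denotes the offset term of the non-porous convolution layer, and $\varphi$ denotes the activation function of the non-porous convolution layer.
Optionally, the training process of the target network is:
constructing an initial fully convolutional network that includes a hybrid context network structure;
obtaining image feature data of each sample image;
inputting the image feature data of each sample image into the initial fully convolutional network for training;
ending the training process and obtaining the target network when the loss value between the output result for every training sample and the corresponding image segmentation ground truth is below a predetermined threshold.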
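In a second aspect, an embodiment of the present application provides an image segmentation apparatus, including:
a target image obtaining module, configured to obtain a target image to be processed;
an image feature data obtaining module, configured to obtain image feature data of the target image;
an image segmentation module, configured to input the image feature data into a pre-trained target network for image segmentation to obtain an output result, where the target network is a fully convolutional network that includes a hybrid context network structure; the hybrid context network structure is used to fuse multiple reference features of predetermined scale ranges that it extracts into a target feature; the target feature is a feature that matches the scale range of the target objects in the segmented image; and the target network is trained on sample images of target objects with different scale ranges;
a result obtaining module, configured to obtain, based on the output result, the image segmentation result corresponding to the target image.
Optionally, the hybrid context network structure is specifically a convolution structure that combines non-porous convolution with porous convolution.
Optionally, the hybrid context network structure includes at least one hybrid context component;
each hybrid context component includes a porous convolution branch, a non-porous convolution branch, a channel concatenation layer, and a non-porous convolution layer, where the porous convolution branch and the non-porous convolution branch each convolve the input of the hybrid context component, the channel concatenation layer concatenates the convolution results of the two branches, and the non-porous convolution layer convolves the output of the channel concatenation layer and outputs the result as the output of the hybrid context component.
Optionally, the porous convolution branch includes at least one porous convolution, and the non-porous convolution branch includes at least one non-porous convolution.
Optionally, the hybrid context component convolves its input using the following convolution formula:
$$F_{i+1} = \varphi\bigl(W_i \, c\bigl(\psi(W_k F_i + b_k)\bigr) + b_i\bigr)$$
where $F_i$ is the input feature map of the i-th layer, $F_{i+1}$ is the output feature map of the i-th layer, $W_k$ denotes the parameters of the porous convolution branch and the non-porous convolution branch, $b_k$ denotes the offset terms of the porous convolution branch and the non-porous convolution branch, $\psi$ denotes the activation function of the porous convolution branch and the non-porous convolution branch, $c()$ concatenates all input matrices along the channel axis, $W_i$ denotes the parameters of the non-porous convolution layer, $b_i$ denotes the offset term of the non-porous convolution layer, and $\varphi$ denotes the activation function of the non-porous convolution layer.
Optionally, the target network is trained by a training module, and the training module includes:
a construction unit, configured to construct an initial fully convolutional network that includes a hybrid context network structure;
a feature data obtaining unit, configured to obtain image feature data of each sample image;
a training unit, configured to input the image feature data of each sample image into the initial fully convolutional network for training;
a judging unit, configured to end the training process and obtain the target network when the loss value between the output result for every training sample and the corresponding image segmentation ground truth is below a predetermined threshold.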
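In a third aspect, an embodiment of the present application further provides a fully convolutional network system, including a hybrid context network structure;
the hybrid context network structure includes at least one hybrid context component;
each hybrid context component includes a porous convolution branch, a non-porous convolution branch, a channel concatenation layer, and a non-porous convolution layer, where the porous convolution branch and the non-porous convolution branch each convolve the input of the hybrid context component, the channel concatenation layer concatenates the convolution results of the two branches, and the non-porous convolution layer convolves the output of the channel concatenation layer and outputs the result as the output of the hybrid context component.
Optionally, the porous convolution branch includes at least one porous convolution, and the non-porous convolution branch includes at least one non-porous convolution.
Optionally, the hybrid context component convolves its input using the following convolution formula:
$$F_{i+1} = \varphi\bigl(W_i \, c\bigl(\psi(W_k F_i + b_k)\bigr) + b_i\bigr)$$
where $F_i$ is the input feature map of the i-th layer, $F_{i+1}$ is the output feature map of the i-th layer, $W_k$ denotes the parameters of the porous convolution branch and the non-porous convolution branch, $b_k$ denotes the offset terms of the porous convolution branch and the non-porous convolution branch, $\psi$ denotes the activation function of the porous convolution branch and the non-porous convolution branch, $c()$ concatenates all input matrices along the channel axis, $W_i$ denotes the parameters of the non-porous convolution layer, $b_i$ denotes the offset term of the non-porous convolution layer, and $\varphi$ denotes the activation function of the non-porous convolution layer.
In a fourth aspect, an embodiment of the present application further provides an electronic device, including:
a processor and a memory;
the memory stores executable program code;
the processor reads the executable program code stored in the memory and runs the program corresponding to it, so as to perform, at runtime, the image segmentation method according to the first aspect of the present application.
In a fifth aspect, the present application provides a storage medium for storing executable program code, where the executable program code is used to perform, at runtime, the image segmentation method according to the first aspect of the present application.
In the embodiments of the present application, the target network for image segmentation is a fully convolutional network with a hybrid context network structure. The hybrid context network structure can fuse the multiple reference features of predetermined scale ranges that it extracts into features that match the scale range of the target objects in the segmented image, so that target objects of every scale range in the image are not ignored; at the same time, the receptive field depends on the convolution with the largest expansion coefficient. The scheme can therefore improve the segmentation effectiveness for target objects of different scale ranges in the image while guaranteeing a large receptive field.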
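To describe the technical solutions of the embodiments of the present application and of the prior art more clearly, the drawings needed for the embodiments and the prior art are briefly introduced below. Obviously, the drawings described below are merely some embodiments of the present application, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of an image segmentation method provided by an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a hybrid context component provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of a target network for segmenting images given as an example in an embodiment of the present application;
FIG. 4 is a flowchart of the training process of the target network in an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an image segmentation apparatus provided by an embodiment of the present application;
FIG. 6 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the scope of protection of the present application.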
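To improve the segmentation effectiveness for target objects of different scale ranges in an image while guaranteeing a large receptive field, the embodiments of the present application provide an image segmentation method, an image segmentation apparatus, and a fully convolutional network. The image segmentation referred to in the embodiments may mean semantic segmentation of an image or scene labeling of an image; optionally, it may also mean other ways of dividing an image into regions, which is not limited here.
An image segmentation method provided by an embodiment of the present application is introduced first.
It should be noted that the method may be executed by an image segmentation apparatus. The apparatus may be a plug-in of image processing software in the related art or stand-alone software; in addition, the apparatus may be applied in an electronic device, which may be a terminal device or a server device.
As shown in FIG. 1, an image segmentation method provided by an embodiment of the present application may include the following steps:
S101: obtain a target image to be processed.
Obtaining the target image to be processed may mean obtaining it locally, downloading it from a network, and so on. The target image may contain target objects of different scale ranges. For example, if the target image is a surveillance image captured by a roadside camera, it may include cars and pedestrians in the foreground and a bird in the background; the cars and pedestrians in the foreground are target objects of a large scale range, and the bird in the background is a target object of a small scale range.
It should be emphasized that large and small scale ranges are relative concepts and do not refer to any specific scale range. For a fully convolutional network in the prior art, however, a network that is good at handling large target objects is usually unsuitable for handling small ones. For example, if fully convolutional network A in the related art is suited to target objects of 100*100 pixels, then a 10*10 target object is a small-scale target object and will be ignored by network A; similarly, if fully convolutional network B is suited to target objects of 1000*1000 pixels, a 100*100 target object can be regarded as a small-scale target object and will be ignored by network B. In other words, an image segmentation method that uses a fully convolutional network of the related art is limited in the scale range to which it applies, whereas the image segmentation method provided by the embodiments of the present application uses a target network that contains a hybrid context network structure in order to widen the applicable scale range.
S102: obtain image feature data of the target image.
To segment the target image, its image feature data can be obtained; the image feature data uniquely determines the target image.
Specifically, the image feature data may include, but is not limited to, color channel values. For an RGB image, the color channel values are the RGB channel values, where "R" stands for red, "G" for green, and "B" for blue. The image segmentation apparatus may call an external color extractor to extract the color channel values or extract them itself with built-in program code; a color extractor is software capable of extracting the color channel values of the pixels in an image.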
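As an illustration of what extracting color channel values can look like, the following sketch splits an RGB image into three channel planes. The input layout (rows of `(R, G, B)` tuples) and the function name `split_rgb_channels` are assumptions for this example; the application does not prescribe a data format or a particular color extractor.

```python
def split_rgb_channels(image):
    """Turn rows of (R, G, B) pixels into three H x W planes: [red, green, blue]."""
    if not image or not image[0]:
        raise ValueError("image must contain at least one pixel")
    planes = [[], [], []]
    for row in image:
        for channel in range(3):
            # Collect one channel value per pixel, keeping the row layout.
            planes[channel].append([pixel[channel] for pixel in row])
    return planes


# A hypothetical 2 x 2 image.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (10, 20, 30)],
]
red, green, blue = split_rgb_channels(image)
print(red)
print(green)
print(blue)
```

The three planes together form a channels-first array of the kind that is commonly fed to a convolutional network.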
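S103: input the image feature data into a pre-trained target network for image segmentation to obtain an output result, where the target network is a fully convolutional network that includes a hybrid context network structure; the hybrid context network structure is used to fuse multiple reference features of predetermined scale ranges that it extracts into a target feature; the target feature is a feature that matches the scale range of the target objects in the segmented image; and the target network is trained on sample images of target objects with different scale ranges.
To improve the segmentation effectiveness for target objects of different scale ranges in an image while guaranteeing a large receptive field, a target network for image segmentation is trained in advance. The target network is a fully convolutional network that includes a hybrid context network structure; that is, it is a fully convolutional network, but in addition to convolutional layers, pooling layers, and activation layers it has a hybrid context network structure. Specifically, the hybrid context network structure is used to fuse multiple reference features of predetermined scale ranges that it extracts into a target feature, which is a feature that matches the scale range of the target objects in the segmented image. The convolutional layers perform convolution, the pooling layers perform upsampling or downsampling, and the activation layers introduce nonlinearity. A fully convolutional network structure may have only one activation layer together with multiple convolutional and pooling layers, with the pooling layers and the activation layer located after the convolutional layers.
Because a hybrid context network structure is added to the target network, features from feature maps of different scale ranges can be fused into features that match the scale range of the target objects in the segmented image. The target network can thus automatically adjust the scale range it is suited to by learning from sample images. It should be emphasized that sample images of target objects with different scale ranges can be used when training the target network. For clarity of layout, the training process of the target network is described later.
When segmenting the target image, after its image feature data has been obtained, the image feature data is input into the target network to obtain the output result.
Specifically, the hybrid context network structure is a convolution structure that combines non-porous convolution with porous convolution. A porous convolution is a convolution whose kernel is enlarged by filling it with zero elements; a non-porous convolution is an ordinary convolution whose kernel is not enlarged by zero filling. It should also be emphasized that the expansion coefficient is an attribute associated with porous convolution: an ordinary, that is, non-porous, convolution has an expansion coefficient of 1; if one zero is added between two elements of the kernel of an ordinary convolution, the expansion coefficient is 2; and so on, so that adding N zeros gives an expansion coefficient of (N+1).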
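The relationship between inserted zeros and the expansion coefficient can be made concrete with a one-dimensional kernel. This is a minimal sketch; the function names and the example kernel are hypothetical, and real networks apply the same idea along both spatial axes.

```python
def dilate_kernel_1d(kernel, expansion):
    """Insert (expansion - 1) zeros between neighbouring kernel elements."""
    if expansion < 1:
        raise ValueError("expansion coefficient must be >= 1")
    dilated = []
    for index, weight in enumerate(kernel):
        if index > 0:
            dilated.extend([0.0] * (expansion - 1))
        dilated.append(weight)
    return dilated


def effective_size(kernel_size, expansion):
    """Number of input positions spanned by the kernel after zero insertion."""
    return kernel_size + (kernel_size - 1) * (expansion - 1)


kernel = [1.0, 2.0, 3.0]
print(dilate_kernel_1d(kernel, 1))  # expansion coefficient 1: ordinary convolution
print(dilate_kernel_1d(kernel, 2))  # one zero between elements: expansion coefficient 2
print(effective_size(3, 2))
```

Only the original kernel elements are trainable; the zeros widen the span without adding parameters.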
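In one implementation, the hybrid context network structure includes at least one hybrid context component.
As shown in FIG. 2, each hybrid context component includes a porous convolution branch, a non-porous convolution branch, a channel concatenation layer, and a non-porous convolution layer. The porous convolution branch and the non-porous convolution branch each convolve the input of the hybrid context component; the channel concatenation layer concatenates the convolution results of the two branches; and the non-porous convolution layer convolves the output of the channel concatenation layer and outputs the result as the output of the hybrid context component.
It should be emphasized that the input of a hybrid context component may be any feature map. The feature map is convolved by the porous convolution branch and by the non-porous convolution branch; these parallel convolutions extract features of different predetermined scale ranges, where the predetermined scale range is determined by the expansion coefficient. The channel concatenation layer and the non-porous convolution layer then select, from the features of the different predetermined scale ranges, features of a new scale range as the output; the features of the new scale range are features that match the scale range of the target objects in the segmented image.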
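The data flow through one hybrid context component can be sketched in one dimension with plain Python lists. This is an illustrative sketch under stated assumptions rather than the network of the application: each branch holds a single convolution with a single channel, ReLU stands in for the unspecified activation functions, zero padding keeps the length unchanged, and the merging non-porous convolution uses a kernel of size 1. All names and weights are hypothetical.

```python
def relu(values):
    """Assumed activation; the application does not name a specific one."""
    return [max(0.0, v) for v in values]


def conv1d_same(signal, kernel, expansion=1, bias=0.0):
    """Zero-padded 1-D convolution (odd kernel length) with an expansion coefficient."""
    half = (len(kernel) - 1) * expansion // 2
    output = []
    for pos in range(len(signal)):
        acc = bias
        for tap, weight in enumerate(kernel):
            src = pos - half + tap * expansion
            if 0 <= src < len(signal):
                acc += weight * signal[src]
        output.append(acc)
    return output


def hybrid_context_component(signal, porous_kernel, plain_kernel, expansion,
                             mix_weights, mix_bias=0.0):
    # Two parallel branches convolve the same input.
    porous_out = relu(conv1d_same(signal, porous_kernel, expansion))
    plain_out = relu(conv1d_same(signal, plain_kernel, 1))
    # Channel concatenation: stack the branch outputs as separate channels.
    channels = [porous_out, plain_out]
    # Non-porous convolution with kernel size 1: a weighted sum over channels.
    mixed = [
        mix_bias + sum(w * channel[pos] for w, channel in zip(mix_weights, channels))
        for pos in range(len(signal))
    ]
    return relu(mixed)


impulse = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
ones = [1.0, 1.0, 1.0]
print(hybrid_context_component(impulse, ones, ones, 2, [1.0, 0.0]))  # porous branch only
print(hybrid_context_component(impulse, ones, ones, 2, [0.0, 1.0]))  # non-porous branch only
print(hybrid_context_component(impulse, ones, ones, 2, [1.0, 1.0]))  # mixture of both
```

The mixing weights are where the selection between scales happens: during training they would be learned, so the component can favor the wide branch, the narrow branch, or a blend.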
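A person skilled in the art will understand that, in a fully convolutional network of the related art, the expansion coefficient of the convolution kernel of a given convolutional layer is fixed, and as the receptive field grows, the scale range of the features extracted by that kernel grows with it. Formula (1) below represents one convolutional layer:
$$F_{i+1} = \varphi(W_i F_i + b_i) \tag{1}$$
For the hybrid context network structure, if one hybrid context component is regarded as one convolutional layer, the component performs convolution using formula (2):
$$F_{i+1} = \varphi\bigl(W_i \, c\bigl(\psi(W_k F_i + b_k)\bigr) + b_i\bigr) \tag{2}$$
where $F_i$ is the input feature map of the i-th layer, $F_{i+1}$ is the output feature map of the i-th layer, $W_k$ denotes the parameters of the porous convolution branch and the non-porous convolution branch, $b_k$ denotes the offset terms of the porous convolution branch and the non-porous convolution branch, $\psi$ denotes the activation function of the porous convolution branch and the non-porous convolution branch, $c()$ concatenates all input matrices along the channel axis, $W_i$ denotes the parameters of the non-porous convolution layer, $b_i$ denotes the offset term of the non-porous convolution layer, and $\varphi$ denotes the activation function of the non-porous convolution layer. Specifically, $c()$ joins two four-dimensional matrices along the second dimension into a single matrix; for example, a matrix of size n*c1*h*w and a matrix of size n*c2*h*w are merged into a matrix of size n*(c1+c2)*h*w. It should also be emphasized that F denotes a feature map, which is a matrix, and that the receptive field is the size of the region of the original image to which one element of the feature map corresponds; the receptive field can be regarded as an attribute of the feature map.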
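The shape rule for channel concatenation can be demonstrated with nested lists standing in for four-dimensional matrices. The helper names below are hypothetical, and the sizes are arbitrary examples.

```python
def zeros(n, c, h, w):
    """Build an n*c*h*w nested list filled with zeros."""
    return [[[[0.0] * w for _ in range(h)] for _ in range(c)] for _ in range(n)]


def shape(tensor):
    """Return the dimensions of a regular nested list as a tuple."""
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0]
    return tuple(dims)


def concat_channels(a, b):
    """Join two n*c*h*w tensors along the second (channel) dimension."""
    shape_a, shape_b = shape(a), shape(b)
    if shape_a[0] != shape_b[0] or shape_a[2:] != shape_b[2:]:
        raise ValueError("batch size, height and width must match")
    # For each sample, append the channels of b after the channels of a.
    return [sample_a + sample_b for sample_a, sample_b in zip(a, b)]


first = zeros(2, 3, 4, 5)   # n=2, c1=3, h=4, w=5
second = zeros(2, 5, 4, 5)  # n=2, c2=5, h=4, w=5
print(shape(concat_channels(first, second)))
```

Only the channel count changes; batch size, height, and width must already agree, which is why both branches of a component need to produce feature maps of the same spatial size.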
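Specifically, the kernel size of $W_i$ may be 1, and $W_k$ may be either a porous convolution or a non-porous convolution. The scale range of the features extracted by a convolution is proportional to its dilation coefficient, so the features supplied to $W_i$ for selection include both large-scale and small-scale features.
The receptive field of $F_{i+1}$ depends on the convolution with the largest dilation coefficient. In other words, $F_{i+1}$ can have a large receptive field while, depending on the scale range of the input image, outputting large-scale features, small-scale features, or a mixture of the two, that is, features that match the scale range of the target object in the image being segmented. A fully convolutional network of the related art, by contrast, can only output features of a single scale. This gives the target network more freedom: it can learn from the given sample images which combination of scales works best.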
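The statement that the receptive field is set by the convolution with the largest dilation coefficient can be checked with a small calculation. The sketch below assumes stride-1 convolutions, for which a chain of layers has receptive field 1 + Σ (k − 1)·d; the function names and the example branch configurations are illustrative assumptions.

```python
from typing import List, Tuple

Layer = Tuple[int, int]  # (kernel size, dilation coefficient), stride 1 assumed


def branch_receptive_field(layers: List[Layer]) -> int:
    """Receptive field, in input positions per side, of stride-1 convolutions in series."""
    return 1 + sum((k - 1) * d for k, d in layers)


def component_receptive_field(branches: List[List[Layer]], fusion_kernel: int = 1) -> int:
    """The widest branch dominates; a 1 x 1 fusion convolution adds nothing."""
    widest = max(branch_receptive_field(branch) for branch in branches)
    return widest + (fusion_kernel - 1)


porous = [(3, 2)]      # one 3 x 3 porous convolution, dilation coefficient 2
non_porous = [(3, 1)]  # one ordinary 3 x 3 convolution
rf = component_receptive_field([porous, non_porous])
```

Raising the dilation coefficient of the porous branch enlarges the component's receptive field, while the non-porous branch continues to supply small-scale features.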
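The porous convolution branch includes at least one porous convolution, and the non-porous convolution branch includes at least one non-porous convolution. When the hybrid context network structure includes at least two hybrid context components, the components are connected in series. Furthermore, when the hybrid context network structure includes multiple porous convolution branches, it necessarily contains multiple porous convolutions, and when any porous convolution branch in the structure includes multiple porous convolutions, the structure likewise contains multiple porous convolutions. When the hybrid context network structure contains multiple porous convolutions, their dilation coefficients may be set according to actual needs; the embodiments of the present application place no limitation on this.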
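In practice, the hybrid context network structure may be placed in the second half of the target network, although it is not limited to this position. Because network architectures are complex and varied, the embodiments of the present application do not limit the specific position of the hybrid context network structure within the network. From a functional point of view, a target network containing the hybrid context network structure provided by the embodiments of the present application can be divided into three parts: a classification prediction module, a context comprehensive judgment module, and a correction module, where the hybrid context network structure is the context comprehensive judgment module. The classification prediction module makes a preliminary prediction of the category of each pixel in the feature map; the context comprehensive judgment module builds on the classification prediction module and classifies using more context information; and the correction module builds on the output of the context comprehensive judgment module and refines edges and small target objects using finer detail.
For example, in the schematic structure of a target network shown in Figure 3, the hybrid context network structure includes 5 hybrid context components; in each component the porous convolution branch includes one porous convolution and the non-porous convolution branch includes one non-porous convolution. The values 224*224, 112*112, and so on are feature map sizes, and they show how the spatial size of the feature map changes as the network runs. The left half of Figure 3 is the classification prediction flow corresponding to the classification prediction module described above. Because the network structure for this flow is an FCN converted from a classification network, and that classification network may be any existing mainstream classification network, Figure 3 shows only the feature maps, so that the classification prediction process is described in terms of how the feature maps are processed. FCN stands for Fully Convolutional Network. An FCN converts the fully connected layers of a conventional convolutional neural network into convolution layers and attempts to recover the category of each pixel from abstract features, extending classification from the image level to the pixel level.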
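S104: obtain the image segmentation result for the target image based on the output result.
Once the output result has been obtained, the image segmentation result for the target image can be derived from it. The output result of the target network is a set of feature data, and the image segmentation result for the target image can be generated from this feature data.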
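One common way to turn network output into a segmentation result is a per-pixel argmax over class scores. The source does not specify how the feature data is converted, so the sketch below is an assumption: it treats the output as one score map per class, and the name `scores_to_label_map` is hypothetical.

```python
from typing import List


def scores_to_label_map(scores: List[List[List[float]]]) -> List[List[int]]:
    """scores[c][y][x] is the score of class c at pixel (y, x); return the argmax per pixel."""
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [
        [max(range(num_classes), key=lambda c: scores[c][y][x]) for x in range(width)]
        for y in range(height)
    ]


# Hypothetical output for a 2 x 2 image and two classes (0 = background, 1 = object).
scores = [
    [[0.9, 0.2], [0.1, 0.4]],  # class 0
    [[0.1, 0.8], [0.7, 0.6]],  # class 1
]
label_map = scores_to_label_map(scores)
```

Each entry of `label_map` is the index of the highest-scoring class at that pixel; ties resolve to the lowest class index.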
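In the embodiments of the present application, the target network used for image segmentation is a fully convolutional network with a hybrid context network structure. This structure fuses the multiple reference features with predetermined scale ranges that it extracts into features that match the scale range of the target object in the image being segmented, so that target objects of every scale range in the image are taken into account. At the same time, the receptive field depends on the convolution with the largest dilation coefficient. The solution can therefore improve segmentation of target objects of different scale ranges while preserving a large receptive field.
For clarity, the training process of the target network is described below.
Specifically, as shown in Figure 4, the training process of the target network may be as follows: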
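S201: construct an initial fully convolutional network that includes a hybrid context network structure.
Constructing the initial fully convolutional network means building a fully convolutional network that contains a hybrid context network structure. The dilation coefficients of the porous and non-porous convolutions in this network are set when the network is constructed.
The initial fully convolutional network is the network structure to be trained, that is, the target network before its parameters have been trained. It also includes convolution layers, activation layers, and pooling layers. The position of the hybrid context network structure within the initial fully convolutional network, as well as the number and arrangement of the convolution, activation, and pooling layers, may be set according to actual needs. Optionally, the arrangement of these layers may follow certain design principles, for example placing pooling layers and activation layers after convolution layers.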
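S202: obtain the image feature data of each sample image.
The image feature data of a sample image may include, but is not limited to, color channel values; for an RGB image, the color channel values are the RGB channel values. Optionally, the image feature data obtained for the sample images is of the same type as the image feature data obtained for the target image described above.
S203: input the image feature data of each sample image into the initial fully convolutional network for training.
S204: when, for every training sample, the loss between the output result and the corresponding segmentation ground truth is below a predetermined threshold, end the training process and obtain the target network.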
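After the image feature data of each sample image has been obtained, it is input into the initial fully convolutional network for training, and the loss between each training sample's output result and the corresponding segmentation ground truth can be checked in real time against the predetermined threshold. When the loss for every training sample is below the predetermined threshold, the training process ends and the target network is obtained. The segmentation ground truth for each sample image is obtained by manual annotation and refers to the image data of the sample image after it has been segmented.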
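In addition, as long as the loss between a training sample's output result and the corresponding segmentation ground truth is not yet below the predetermined threshold, an optimization algorithm can be used to optimize the initial fully convolutional network. Optimization here means adjusting the parameters of the initial fully convolutional network, for example the convolution kernels and strides of the convolutions. The optimization algorithm may be gradient descent, whose basic idea is to search for a minimum along the direction of decreasing gradient (or a maximum along the direction of increasing gradient). Specific gradient descent methods include, but are not limited to, SGD (stochastic gradient descent) and mini-batch gradient descent.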
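The stopping rule of S204 combined with gradient descent can be sketched on a toy problem. This is only an illustration of the control flow: a two-parameter linear model stands in for the fully convolutional network, squared error stands in for the segmentation loss, and the names, learning rate, and threshold are all assumptions.

```python
from typing import List, Tuple


def train_until_threshold(
    samples: List[Tuple[float, float]],
    threshold: float,
    learning_rate: float = 0.1,
    max_epochs: int = 10_000,
) -> Tuple[float, float, int]:
    """Run per-sample SGD until every sample's loss is below `threshold`."""
    w, b = 0.0, 0.0
    for epoch in range(max_epochs):
        losses = [(w * x + b - y) ** 2 for x, y in samples]
        if all(loss < threshold for loss in losses):
            return w, b, epoch              # training ends: "target network" obtained
        for x, y in samples:                # otherwise keep optimizing, one step per sample
            error = w * x + b - y
            w -= learning_rate * 2 * error * x
            b -= learning_rate * 2 * error
    raise RuntimeError("loss did not fall below the threshold")


# Hypothetical samples drawn from y = 2x + 1.
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
w, b, epochs_used = train_until_threshold(samples, threshold=1e-6)
```

A mini-batch variant would accumulate the gradients of several samples before each parameter update instead of updating after every sample.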
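Corresponding to the method embodiment above, the embodiments of the present application also provide an image segmentation device.
As shown in Figure 5, an image segmentation device provided by the embodiments of the present application may include:
a target image obtaining module 310, configured to obtain a target image to be processed;
an image feature data obtaining module 320, configured to obtain image feature data of the target image;
an image segmentation module 330, configured to input the image feature data into a pre-trained target network for image segmentation to obtain an output result, wherein the target network is a fully convolutional network including a hybrid context network structure, the hybrid context network structure fuses the multiple reference features with predetermined scale ranges that it extracts into a target feature, the target feature is a feature that matches the scale range of the target object in the image being segmented, and the target network is trained on sample images containing target objects of different scale ranges;
a result obtaining module 340, configured to obtain the image segmentation result for the target image based on the output result.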
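The division of the device into modules can be pictured as a small class whose methods mirror modules 320 to 340, with the image supplied by module 310. Everything in this sketch is hypothetical: the class and method names, the normalization to the range 0..1, the argmax post-processing, and above all `toy_network`, which is a stand-in for a trained target network and not a hybrid context network.

```python
from typing import Callable, List, Tuple

Image = List[List[Tuple[int, int, int]]]  # rows of (R, G, B) pixels
Features = List[List[List[float]]]        # channels x height x width
LabelMap = List[List[int]]


class ImageSegmentationDevice:
    def __init__(self, target_network: Callable[[Features], Features]) -> None:
        self.target_network = target_network

    def obtain_features(self, image: Image) -> Features:    # module 320
        return [[[px[ch] / 255.0 for px in row] for row in image] for ch in range(3)]

    def run_network(self, features: Features) -> Features:  # module 330
        return self.target_network(features)

    def obtain_result(self, output: Features) -> LabelMap:  # module 340
        classes = range(len(output))
        return [
            [max(classes, key=lambda c: output[c][y][x]) for x in range(len(output[0][0]))]
            for y in range(len(output[0]))
        ]

    def segment(self, image: Image) -> LabelMap:            # image supplied by module 310
        return self.obtain_result(self.run_network(self.obtain_features(image)))


def toy_network(features: Features) -> Features:
    """Stand-in network: class 1 score is the red channel, class 0 score is its complement."""
    red = features[0]
    return [[[1.0 - v for v in row] for row in red], red]


device = ImageSegmentationDevice(toy_network)
labels = device.segment([[(255, 0, 0), (0, 0, 0)]])
```

Passing the network in as a callable keeps the pipeline independent of how the target network is built or trained.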
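In the embodiments of the present application, the target network used for image segmentation is a fully convolutional network with a hybrid context network structure. This structure fuses the multiple reference features with predetermined scale ranges that it extracts into features that match the scale range of the target object in the image being segmented, so that target objects of every scale range in the image are taken into account. At the same time, the receptive field depends on the convolution with the largest dilation coefficient. The solution can therefore improve segmentation of target objects of different scale ranges while preserving a large receptive field.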
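Specifically, the hybrid context network structure is a convolution structure that combines non-porous convolution with porous convolution.
In one specific implementation, the hybrid context network structure may include at least one hybrid context component.
Each hybrid context component includes a porous convolution branch, a non-porous convolution branch, a channel concatenation layer, and a non-porous convolution layer. The porous convolution branch and the non-porous convolution branch each convolve the input of the hybrid context component they belong to; the channel concatenation layer concatenates the convolution result of the porous convolution branch with that of the non-porous convolution branch; and the non-porous convolution layer convolves the output of the channel concatenation layer and outputs the result as the output of the hybrid context component.
Specifically, the porous convolution branch includes at least one porous convolution, and the non-porous convolution branch includes at least one non-porous convolution.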
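Specifically, the hybrid context component convolves its input according to the following formula:
$$F_{i+1} = \varphi\left(W_i\, c\left(\psi\left(W_k F_i + b_k\right)\right) + b_i\right)$$
where $F_i$ is the input feature map of the i-th layer, $F_{i+1}$ is the output feature map of the i-th layer, $W_k$ denotes the parameters of the porous and non-porous convolution branches, $b_k$ denotes their bias terms, $\psi$ denotes their activation function, $c()$ concatenates all input matrices along the channel axis, $W_i$ denotes the parameters of the non-porous convolution layer, $b_i$ denotes its bias term, and $\varphi$ denotes its activation function.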
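Specifically, the target network is trained by a training module, which includes:
a construction unit, configured to construct an initial fully convolutional network that includes a hybrid context network structure;
a feature data obtaining unit, configured to obtain the image feature data of each sample image;
a training unit, configured to input the image feature data of each sample image into the initial fully convolutional network for training;
a judgment unit, configured to end the training process and obtain the target network when, for every training sample, the loss between the output result and the corresponding segmentation ground truth is below a predetermined threshold.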
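The embodiments of the present application also provide a fully convolutional network system, which can be applied to image segmentation but is not limited to it. Specifically, the fully convolutional network system includes a hybrid context network structure.
The hybrid context network structure includes at least one hybrid context component.
Each hybrid context component includes a porous convolution branch, a non-porous convolution branch, a channel concatenation layer, and a non-porous convolution layer. The porous convolution branch and the non-porous convolution branch each convolve the input of the hybrid context component they belong to; the channel concatenation layer concatenates the convolution result of the porous convolution branch with that of the non-porous convolution branch; and the non-porous convolution layer convolves the output of the channel concatenation layer and outputs the result as the output of the hybrid context component.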
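Note that the fully convolutional network provided by the embodiments of the present application is a network structure that includes a hybrid context network structure in addition to convolution layers, pooling layers, and activation layers.
The structure of a hybrid context component is shown in Figure 2. When the hybrid context network structure includes at least two hybrid context components, the components are connected in series.
Because the fully convolutional network includes the hybrid context network structure, it can fuse features from feature maps of different scale ranges to form features that match the scale range of the target object in the image being segmented. This lets the network automatically adjust the scale range it adapts to by learning from sample images; sample images containing target objects of different scale ranges may be used to train the fully convolutional network.
Note that the input of the hybrid context component may be any feature map. The feature map is convolved by the porous convolution branch and the non-porous convolution branch in parallel, and the two branches extract features of different predetermined scale ranges, where the predetermined scale range is affected by the dilation coefficient. The channel concatenation layer and the non-porous convolution layer then select, from the features of the different predetermined scale ranges, features of a new scale range as the output; these are the features that match the scale range of the target object in the image being segmented.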
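In practice, the hybrid context network structure may be placed in the second half of the target network, although it is not limited to this position. Because network architectures are complex and varied, the embodiments of the present application do not limit the specific position of the hybrid context network structure within the network. From a functional point of view, a target network containing the hybrid context network structure provided by the embodiments of the present application can be divided into three parts: a classification prediction module, a context comprehensive judgment module, and a correction module, where the hybrid context network structure is the context comprehensive judgment module. The classification prediction module makes a preliminary prediction of the category of each pixel in the feature map; the context comprehensive judgment module builds on the classification prediction module and classifies using more context information; and the correction module builds on the output of the context comprehensive judgment module and refines edges and small target objects using finer detail. For example, in the schematic structure of a target network shown in Figure 3, the hybrid context network structure includes 5 hybrid context components; in each component the porous convolution branch includes one porous convolution and the non-porous convolution branch includes one non-porous convolution. The values 224*224, 112*112, and so on are feature map sizes, and they show how the spatial size of the feature map changes as the network runs. The left half of Figure 3 is the classification prediction flow corresponding to the classification prediction module described above. Because the network structure for this flow is an FCN converted from a classification network, and that classification network may be any existing mainstream classification network, Figure 3 shows only the feature maps, so that the classification prediction process is described in terms of how the feature maps are processed. FCN stands for Fully Convolutional Network. An FCN converts the fully connected layers of a conventional convolutional neural network into convolution layers and attempts to recover the category of each pixel from abstract features, extending classification from the image level to the pixel level.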
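In the embodiments of the present application, the target network used for image segmentation is a fully convolutional network with a hybrid context network structure. This structure fuses the multiple reference features with predetermined scale ranges that it extracts into features that match the scale range of the target object in the image being segmented, so that target objects of every scale range in the image are taken into account. At the same time, the receptive field depends on the convolution with the largest dilation coefficient. The solution can therefore improve segmentation of target objects of different scale ranges while preserving a large receptive field.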
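Specifically, the porous convolution branch includes at least one porous convolution, and the non-porous convolution branch includes at least one non-porous convolution. Furthermore, when the hybrid context network structure includes multiple porous convolution branches, it necessarily contains multiple porous convolutions, and when any porous convolution branch in the structure includes multiple porous convolutions, the structure likewise contains multiple porous convolutions. When the hybrid context network structure contains multiple porous convolutions, their dilation coefficients may be set according to actual needs; the embodiments of the present application place no limitation on this.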
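Specifically, the hybrid context component convolves its input according to the following formula:
$$F_{i+1} = \varphi\left(W_i\, c\left(\psi\left(W_k F_i + b_k\right)\right) + b_i\right)$$
where $F_i$ is the input feature map of the i-th layer, $F_{i+1}$ is the output feature map of the i-th layer, $W_k$ denotes the parameters of the porous and non-porous convolution branches, $b_k$ denotes their bias terms, $\psi$ denotes their activation function, $c()$ concatenates all input matrices along the channel axis, $W_i$ denotes the parameters of the non-porous convolution layer, $b_i$ denotes its bias term, and $\varphi$ denotes its activation function. Concretely, $c()$ joins two four-dimensional matrices along the second dimension into a single matrix: for example, an n*c1*h*w matrix and an n*c2*h*w matrix are merged into an n*(c1+c2)*h*w matrix. Also note that F denotes a feature map, which is a matrix, whereas the receptive field is the size of the region of the original image that one element of the feature map corresponds to; the receptive field can be regarded as an attribute of the feature map.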
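Specifically, the kernel size of $W_i$ may be 1, and $W_k$ may be either a porous convolution or a non-porous convolution. The scale range of the features extracted by a convolution is proportional to its dilation coefficient, so the features supplied to $W_i$ for selection include both large-scale and small-scale features.
The receptive field of $F_{i+1}$ depends on the convolution with the largest dilation coefficient. In other words, $F_{i+1}$ can have a large receptive field while, depending on the scale range of the input image, outputting large-scale features, small-scale features, or a mixture of the two, that is, features that match the scale range of the target object in the image being segmented. A fully convolutional network of the related art, by contrast, can only output features of a single scale. This gives the target network more freedom: it can learn from the given sample images which combination of scales works best.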
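Correspondingly, the embodiments of the present application also provide an electronic device, which may include:
a processor and a memory;
the memory stores executable program code;
the processor runs a program corresponding to the executable program code by reading the executable program code stored in the memory, so as to perform, at run time, the image segmentation method described in the embodiments of the present application, wherein the image segmentation method includes:
obtaining a target image to be processed;
obtaining image feature data of the target image;
inputting the image feature data into a pre-trained target network for image segmentation to obtain an output result, wherein the target network is a fully convolutional network including a hybrid context network structure, the hybrid context network structure fuses the multiple reference features with predetermined scale ranges that it extracts into a target feature, the target feature is a feature that matches the scale range of the target object in the image being segmented, and the target network is trained on sample images containing target objects of different scale ranges;
obtaining the image segmentation result for the target image based on the output result.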
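Correspondingly, as shown in Figure 6, the embodiments of the present application also provide an electronic device, which may include:
a processor 410, a memory 420, a communication interface 430, and a bus 440;
the processor 410, the memory 420, and the communication interface 430 are connected through the bus 440 and communicate with one another over it;
the memory 420 stores executable program code;
the processor 410 runs a program corresponding to the executable program code by reading the executable program code stored in the memory 420, so as to perform, at run time, the image segmentation method described in the embodiments of the present application, wherein the image segmentation method includes:
obtaining a target image to be processed;
obtaining image feature data of the target image;
inputting the image feature data into a pre-trained target network for image segmentation to obtain an output result, wherein the target network is a fully convolutional network including a hybrid context network structure, the hybrid context network structure fuses the multiple reference features with predetermined scale ranges that it extracts into a target feature, the target feature is a feature that matches the scale range of the target object in the image being segmented, and the target network is trained on sample images containing target objects of different scale ranges;
obtaining the image segmentation result for the target image based on the output result.
In the embodiments of the present application, the target network used for image segmentation is a fully convolutional network with a hybrid context network structure. This structure fuses the multiple reference features with predetermined scale ranges that it extracts into features that match the scale range of the target object in the image being segmented, so that target objects of every scale range in the image are taken into account. At the same time, the receptive field depends on the convolution with the largest dilation coefficient. The solution can therefore improve segmentation of target objects of different scale ranges while preserving a large receptive field.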
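Correspondingly, the embodiments of the present application also provide a storage medium for storing executable program code, where the executable program code is used to perform, at run time, the image segmentation method described in the embodiments of the present application, wherein the image segmentation method includes:
obtaining a target image to be processed;
obtaining image feature data of the target image;
inputting the image feature data into a pre-trained target network for image segmentation to obtain an output result, wherein the target network is a fully convolutional network including a hybrid context network structure, the hybrid context network structure fuses the multiple reference features with predetermined scale ranges that it extracts into a target feature, the target feature is a feature that matches the scale range of the target object in the image being segmented, and the target network is trained on sample images containing target objects of different scale ranges;
obtaining the image segmentation result for the target image based on the output result.
In the embodiments of the present application, the target network used for image segmentation is a fully convolutional network with a hybrid context network structure. This structure fuses the multiple reference features with predetermined scale ranges that it extracts into features that match the scale range of the target object in the image being segmented, so that target objects of every scale range in the image are taken into account. At the same time, the receptive field depends on the convolution with the largest dilation coefficient. The solution can therefore improve segmentation of target objects of different scale ranges while preserving a large receptive field.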
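Correspondingly, the embodiments of the present application also provide an application program that is used to perform, at run time, the image segmentation method described in the embodiments of the present application, wherein the image segmentation method includes:
obtaining a target image to be processed;
obtaining image feature data of the target image;
inputting the image feature data into a pre-trained target network for image segmentation to obtain an output result, wherein the target network is a fully convolutional network including a hybrid context network structure, the hybrid context network structure fuses the multiple reference features with predetermined scale ranges that it extracts into a target feature, the target feature is a feature that matches the scale range of the target object in the image being segmented, and the target network is trained on sample images containing target objects of different scale ranges;
obtaining the image segmentation result for the target image based on the output result.
In the embodiments of the present application, the target network used for image segmentation is a fully convolutional network with a hybrid context network structure. This structure fuses the multiple reference features with predetermined scale ranges that it extracts into features that match the scale range of the target object in the image being segmented, so that target objects of every scale range in the image are taken into account. At the same time, the receptive field depends on the convolution with the largest dilation coefficient. The solution can therefore improve segmentation of target objects of different scale ranges while preserving a large receptive field.
Because the device, electronic device, storage medium, and application program embodiments are substantially similar to the method embodiment, they are described only briefly; for the relevant details, refer to the corresponding parts of the method embodiment.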
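Note that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element introduced by the phrase "includes a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes that element.
The embodiments in this specification are described in a related manner; for parts that are the same or similar across embodiments, the embodiments may be read with reference to one another, and each embodiment focuses on its differences from the others. In particular, because the device embodiment is substantially similar to the method embodiment, it is described only briefly; for the relevant details, refer to the corresponding parts of the method embodiment.
Those of ordinary skill in the art will understand that all or some of the steps of the method embodiments above can be carried out by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc.
The above are merely preferred embodiments of the present application and are not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application falls within its scope of protection.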
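Claims (17)
- 1. An image segmentation method, characterized by comprising: obtaining a target image to be processed; obtaining image feature data of the target image; inputting the image feature data into a pre-trained target network for image segmentation to obtain an output result, wherein the target network is a fully convolutional network including a hybrid context network structure, the hybrid context network structure is used to fuse the multiple reference features with predetermined scale ranges that it extracts into a target feature, the target feature is a feature that matches the scale range of the target object in the image being segmented, and the target network is trained on sample images containing target objects of different scale ranges; and obtaining the image segmentation result for the target image based on the output result.
- 2. The method according to claim 1, characterized in that the hybrid context network structure is a convolution structure that combines non-porous convolution with porous convolution.
- 3. The method according to claim 2, characterized in that the hybrid context network structure includes at least one hybrid context component; each hybrid context component includes a porous convolution branch, a non-porous convolution branch, a channel concatenation layer, and a non-porous convolution layer, wherein the porous convolution branch and the non-porous convolution branch each convolve the input of the hybrid context component they belong to, the channel concatenation layer concatenates the convolution result of the porous convolution branch with the convolution result of the non-porous convolution branch, and the non-porous convolution layer convolves the output of the channel concatenation layer and outputs the result as the output of the hybrid context component.
- 4. The method according to claim 3, characterized in that the porous convolution branch includes at least one porous convolution, and the non-porous convolution branch includes at least one non-porous convolution.
- 6. The method according to any one of claims 1-5, characterized in that the training process of the target network is: constructing an initial fully convolutional network that includes a hybrid context network structure; obtaining the image feature data of each sample image; inputting the image feature data of each sample image into the initial fully convolutional network for training; and, when for every training sample the loss between the output result and the corresponding segmentation ground truth is below a predetermined threshold, ending the training process and obtaining the target network.
- 7. An image segmentation device, characterized by comprising: a target image obtaining module, configured to obtain a target image to be processed; an image feature data obtaining module, configured to obtain image feature data of the target image; an image segmentation module, configured to input the image feature data into a pre-trained target network for image segmentation to obtain an output result, wherein the target network is a fully convolutional network including a hybrid context network structure, the hybrid context network structure is used to fuse the multiple reference features with predetermined scale ranges that it extracts into a target feature, the target feature is a feature that matches the scale range of the target object in the image being segmented, and the target network is trained on sample images containing target objects of different scale ranges; and a result obtaining module, configured to obtain the image segmentation result for the target image based on the output result.
- 8. The device according to claim 7, characterized in that the hybrid context network structure is a convolution structure that combines non-porous convolution with porous convolution.
- 9. The device according to claim 8, characterized in that the hybrid context network structure includes at least one hybrid context component; each hybrid context component includes a porous convolution branch, a non-porous convolution branch, a channel concatenation layer, and a non-porous convolution layer, wherein the porous convolution branch and the non-porous convolution branch each convolve the input of the hybrid context component they belong to, the channel concatenation layer concatenates the convolution result of the porous convolution branch with the convolution result of the non-porous convolution branch, and the non-porous convolution layer convolves the output of the channel concatenation layer and outputs the result as the output of the hybrid context component.
- 10. The device according to claim 9, characterized in that the porous convolution branch includes at least one porous convolution, and the non-porous convolution branch includes at least one non-porous convolution.
- 12. The device according to any one of claims 7-11, characterized in that the target network is trained by a training module, the training module comprising: a construction unit, configured to construct an initial fully convolutional network that includes a hybrid context network structure; a feature data obtaining unit, configured to obtain the image feature data of each sample image; a training unit, configured to input the image feature data of each sample image into the initial fully convolutional network for training; and a judgment unit, configured to end the training process and obtain the target network when, for every training sample, the loss between the output result and the corresponding segmentation ground truth is below a predetermined threshold.
- 13. A fully convolutional network system, characterized by comprising a hybrid context network structure; the hybrid context network structure includes at least one hybrid context component; each hybrid context component includes a porous convolution branch, a non-porous convolution branch, a channel concatenation layer, and a non-porous convolution layer, wherein the porous convolution branch and the non-porous convolution branch each convolve the input of the hybrid context component they belong to, the channel concatenation layer concatenates the convolution result of the porous convolution branch with the convolution result of the non-porous convolution branch, and the non-porous convolution layer convolves the output of the channel concatenation layer and outputs the result as the output of the hybrid context component.
- 14. The fully convolutional network system according to claim 13, characterized in that the porous convolution branch includes at least one porous convolution, and the non-porous convolution branch includes at least one non-porous convolution.
- 16. An electronic device, characterized by comprising a processor and a memory; the memory stores executable program code; the processor runs a program corresponding to the executable program code by reading the executable program code stored in the memory, so as to perform the image segmentation method according to any one of claims 1-6.
- 17. A storage medium, characterized in that the storage medium is used to store executable program code, and the executable program code is used to perform, at run time, the image segmentation method according to any one of claims 1-6.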
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP17842716.7A EP3506200B1 (en) | 2016-08-26 | 2017-07-12 | Image segmentation method, apparatus, and fully convolutional network system |
| US16/327,682 US11151723B2 (en) | 2016-08-26 | 2017-07-12 | Image segmentation method, apparatus, and fully convolutional network system |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201610734168.4 | 2016-08-26 | ||
| CN201610734168.4A CN107784654B (zh) | 2016-08-26 | 2016-08-26 | 图像分割方法、装置及全卷积网络系统 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2018036293A1 true WO2018036293A1 (zh) | 2018-03-01 |
Family
ID=61245431
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2017/092614 Ceased WO2018036293A1 (zh) | 2016-08-26 | 2017-07-12 | 图像分割方法、装置及全卷积网络系统 |
Country Status (4)
| Country | Link |
|---|---|
| US (1) | US11151723B2 (zh) |
| EP (1) | EP3506200B1 (zh) |
| CN (1) | CN107784654B (zh) |
| WO (1) | WO2018036293A1 (zh) |
Cited By (16)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN108564589A (zh) * | 2018-03-26 | 2018-09-21 | 江苏大学 | 一种基于改进全卷积神经网络的植物叶片分割方法 |
| CN108830912A (zh) * | 2018-05-04 | 2018-11-16 | 北京航空航天大学 | 一种深度特征对抗式学习的交互式灰度图像着色方法 |
| CN109801293A (zh) * | 2019-01-08 | 2019-05-24 | 平安科技(深圳)有限公司 | 遥感影像分割方法、装置及存储介质、服务器 |
| CN110363210A (zh) * | 2018-04-10 | 2019-10-22 | 腾讯科技(深圳)有限公司 | 一种图像语义分割模型的训练方法和服务器 |
| CN110781850A (zh) * | 2019-10-31 | 2020-02-11 | 深圳金信诺高新技术股份有限公司 | 道路识别的语义分割系统和方法、计算机存储介质 |
| CN111144560A (zh) * | 2018-11-05 | 2020-05-12 | 杭州海康威视数字技术股份有限公司 | 一种深度神经网络运算方法及装置 |
| CN111738036A (zh) * | 2019-03-25 | 2020-10-02 | 北京四维图新科技股份有限公司 | 图像处理方法、装置、设备及存储介质 |
| CN112329603A (zh) * | 2020-11-03 | 2021-02-05 | 西南科技大学 | 一种基于图像级联的坝面裂纹缺陷定位方法 |
| CN112492323A (zh) * | 2019-09-12 | 2021-03-12 | 上海哔哩哔哩科技有限公司 | 直播蒙版的生成方法、可读存储介质及计算机设备 |
| CN112733919A (zh) * | 2020-12-31 | 2021-04-30 | 山东师范大学 | 基于空洞卷积和多尺度多分支的图像语义分割方法及系统 |
| CN112836804A (zh) * | 2021-02-08 | 2021-05-25 | 北京迈格威科技有限公司 | 图像处理方法、装置、电子设备及存储介质 |
| CN113487483A (zh) * | 2021-07-05 | 2021-10-08 | 上海商汤智能科技有限公司 | 影像分割网络的训练方法和装置 |
| CN113496159A (zh) * | 2020-03-20 | 2021-10-12 | 昆明理工大学 | 一种多尺度卷积与动态权重代价函数的烟尘目标分割方法 |
| CN113793345A (zh) * | 2021-09-07 | 2021-12-14 | 复旦大学附属华山医院 | 一种基于改进注意力模块的医疗影像分割方法及装置 |
| CN114357986A (zh) * | 2021-12-07 | 2022-04-15 | 北京健康之家科技有限公司 | 一种交易数据提取方法、设备终端及存储介质 |
| CN116740121A (zh) * | 2023-06-15 | 2023-09-12 | 吉林大学 | 一种基于专用神经网络和图像预处理的秸秆图像分割方法 |
Families Citing this family (72)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11587304B2 (en) * | 2017-03-10 | 2023-02-21 | Tusimple, Inc. | System and method for occluding contour detection |
| US10902252B2 (en) | 2017-07-17 | 2021-01-26 | Open Text Corporation | Systems and methods for image based content capture and extraction utilizing deep learning neural network and bounding box detection training techniques |
| US10776903B2 (en) | 2017-07-17 | 2020-09-15 | Open Text Corporation | Systems and methods for image modification and image based content capture and extraction in neural networks |
| US10783381B2 (en) | 2017-08-31 | 2020-09-22 | Tusimple, Inc. | System and method for vehicle occlusion detection |
| CN108460764B (zh) * | 2018-03-31 | 2022-02-15 | 华南理工大学 | 基于自动上下文和数据增强的超声图像智能分割方法 |
| CN108765425B (zh) * | 2018-05-15 | 2022-04-22 | 深圳大学 | 图像分割方法、装置、计算机设备和存储介质 |
| WO2019218136A1 (zh) | 2018-05-15 | 2019-11-21 | 深圳大学 | 图像分割方法、计算机设备和存储介质 |
| CN108596330B (zh) * | 2018-05-16 | 2022-03-15 | 中国人民解放军陆军工程大学 | 一种并行特征全卷积神经网络装置及其构建方法 |
| CN108921196A (zh) * | 2018-06-01 | 2018-11-30 | 南京邮电大学 | 一种改进全卷积神经网络的语义分割方法 |
| CN108898092A (zh) * | 2018-06-26 | 2018-11-27 | 北京工业大学 | 基于全卷积神经网络的多光谱遥感影像路网提取方法 |
| CN109086705B (zh) * | 2018-07-23 | 2021-11-16 | 北京旷视科技有限公司 | 图像处理方法、装置、电子设备及储存介质 |
| CN109299716B (zh) * | 2018-08-07 | 2021-07-06 | 北京市商汤科技开发有限公司 | 神经网络的训练方法、图像分割方法、装置、设备及介质 |
| CN109101975B (zh) * | 2018-08-20 | 2022-01-25 | 电子科技大学 | 基于全卷积神经网络的图像语义分割方法 |
| US10922589B2 (en) * | 2018-10-10 | 2021-02-16 | Ordnance Survey Limited | Object-based convolutional neural network for land use classification |
| US10984532B2 (en) | 2018-08-24 | 2021-04-20 | Ordnance Survey Limited | Joint deep learning for land cover and land use classification |
| CN109145906B (zh) * | 2018-08-31 | 2020-04-24 | 北京字节跳动网络技术有限公司 | 目标对象的图像确定方法、装置、设备及存储介质 |
| CN109241658A (zh) * | 2018-09-27 | 2019-01-18 | 中国电子科技集团公司第五十四研究所 | 基于遥感影像的蝶形卫星天线形态解析方法 |
| CN109389078B (zh) | 2018-09-30 | 2022-06-21 | 京东方科技集团股份有限公司 | 图像分割方法、相应的装置及电子设备 |
| US11037030B1 (en) * | 2018-10-29 | 2021-06-15 | Hrl Laboratories, Llc | System and method for direct learning from raw tomographic data |
| CN111127510B (zh) * | 2018-11-01 | 2023-10-27 | 杭州海康威视数字技术股份有限公司 | 一种目标对象位置的预测方法及装置 |
| CN109492612B (zh) * | 2018-11-28 | 2024-07-02 | 平安科技(深圳)有限公司 | 基于骨骼点的跌倒检测方法及其跌倒检测装置 |
| CN109584142A (zh) * | 2018-12-05 | 2019-04-05 | 网易传媒科技(北京)有限公司 | 图像增强系统和方法、训练方法、介质以及电子设备 |
| US10929665B2 (en) * | 2018-12-21 | 2021-02-23 | Samsung Electronics Co., Ltd. | System and method for providing dominant scene classification by semantic segmentation |
| CN110148148B (zh) * | 2019-03-01 | 2024-11-05 | 纵目科技(上海)股份有限公司 | 一种基于目标检测的下边缘检测模型的训练方法、模型和存储介质 |
| CN110288082B (zh) * | 2019-06-05 | 2022-04-05 | 北京字节跳动网络技术有限公司 | 卷积神经网络模型训练方法、装置和计算机可读存储介质 |
| CN110363780B (zh) * | 2019-07-23 | 2025-03-04 | 腾讯科技(深圳)有限公司 | 图像分割方法、装置、计算机可读存储介质和计算机设备 |
| CN112288748B (zh) * | 2019-07-25 | 2024-03-01 | 银河水滴科技(北京)有限公司 | 一种语义分割网络训练、图像语义分割方法及装置 |
| CN110543895B (zh) * | 2019-08-08 | 2023-06-23 | 淮阴工学院 | 一种基于VGGNet和ResNet的图像分类方法 |
| CN110738609B (zh) * | 2019-09-11 | 2022-05-06 | 北京大学 | 一种去除图像摩尔纹的方法及装置 |
| CN110796162B (zh) * | 2019-09-18 | 2023-08-29 | 平安科技(深圳)有限公司 | 图像识别、训练识别模型的方法、相关设备及存储介质 |
| CN110717527B (zh) * | 2019-09-24 | 2023-06-27 | 东南大学 | 结合空洞空间金字塔结构的目标检测模型确定方法 |
| CN110751154B (zh) * | 2019-09-27 | 2022-04-08 | 西北工业大学 | 一种基于像素级分割的复杂环境多形状文本检测方法 |
| CN110852270B (zh) * | 2019-11-11 | 2024-03-15 | 中科视语(北京)科技有限公司 | 基于深度学习的混合语法人体解析方法及装置 |
| CN111079540B (zh) * | 2019-11-19 | 2024-03-19 | 北航航空航天产业研究院丹阳有限公司 | 一种基于目标特性的分层可重构车载视频目标检测方法 |
| CN111028237B (zh) * | 2019-11-26 | 2023-06-06 | 中国科学院深圳先进技术研究院 | 图像分割方法、装置及终端设备 |
| CN112862828B (zh) * | 2019-11-26 | 2022-11-18 | 华为技术有限公司 | 一种语义分割方法、模型训练方法及装置 |
| CN110956122B (zh) * | 2019-11-27 | 2022-08-02 | 深圳市商汤科技有限公司 | 图像处理方法及装置、处理器、电子设备、存储介质 |
| CN111028163B (zh) * | 2019-11-28 | 2024-02-27 | 湖北工业大学 | 一种基于卷积神经网络的联合图像去噪与弱光增强方法 |
| CN111179214A (zh) * | 2019-11-29 | 2020-05-19 | 苏州优纳医疗器械有限公司 | 一种基于图像语义分割的病理切片组织区域识别系统 |
| CN112950647B (zh) * | 2019-12-10 | 2023-08-18 | 杭州海康威视数字技术股份有限公司 | 图像分割方法、装置、设备及存储介质 |
| CN111127378A (zh) * | 2019-12-23 | 2020-05-08 | Oppo广东移动通信有限公司 | 图像处理方法、装置、计算机设备及存储介质 |
| CN111161269B (zh) * | 2019-12-23 | 2024-03-22 | 上海联影智能医疗科技有限公司 | 图像分割方法、计算机设备和可读存储介质 |
| CN113055666B (zh) * | 2019-12-26 | 2022-08-09 | 武汉Tcl集团工业研究院有限公司 | 一种视频质量评估方法及装置 |
| CN111179283A (zh) * | 2019-12-30 | 2020-05-19 | 深圳市商汤科技有限公司 | 图像语义分割方法及装置、存储介质 |
| CN111325204B (zh) * | 2020-01-21 | 2023-10-31 | 腾讯科技(深圳)有限公司 | 目标检测方法、装置、电子设备以及存储介质 |
| CN111311518B (zh) * | 2020-03-04 | 2023-05-26 | 清华大学深圳国际研究生院 | 基于多尺度混合注意力残差网络的图像去噪方法及装置 |
| CN111539458B (zh) * | 2020-04-02 | 2024-02-27 | 咪咕文化科技有限公司 | 特征图处理方法、装置、电子设备及存储介质 |
| CN111680781B (zh) * | 2020-04-20 | 2023-07-25 | 北京迈格威科技有限公司 | 神经网络处理方法、装置、电子设备及存储介质 |
| CN111714145B (zh) * | 2020-05-27 | 2022-07-01 | 浙江飞图影像科技有限公司 | 基于弱监督分割的股骨颈骨折检测方法及系统 |
| CN111709338B (zh) * | 2020-06-08 | 2024-02-27 | 苏州超云生命智能产业研究院有限公司 | 一种用于表格检测的方法、装置及检测模型的训练方法 |
| CN111815639B (zh) * | 2020-07-03 | 2024-08-30 | 浙江大华技术股份有限公司 | 目标分割方法及其相关装置 |
| CN112102321B (zh) * | 2020-08-07 | 2023-09-01 | 深圳大学 | 一种基于深度卷积神经网络的病灶图像分割方法及系统 |
| CN112419342B (zh) * | 2020-10-22 | 2025-03-07 | 原力金智(重庆)科技有限公司 | 图像处理方法、装置、电子设备和计算机可读介质 |
| CN112329861B (zh) * | 2020-11-06 | 2024-05-28 | 北京工业大学 | 一种面向移动机器人多目标检测的分层特征融合方法 |
| CN112348116B (zh) * | 2020-11-30 | 2024-02-02 | 长沙理工大学 | 利用空间上下文的目标检测方法、装置和计算机设备 |
| CN112598673B (zh) * | 2020-11-30 | 2024-12-27 | 北京迈格威科技有限公司 | 全景分割方法、装置、电子设备和计算机可读介质 |
| CN112598676B (zh) * | 2020-12-29 | 2022-11-22 | 北京市商汤科技开发有限公司 | 图像分割方法及装置、电子设备和存储介质 |
| CN112766176B (zh) * | 2021-01-21 | 2023-12-01 | 深圳市安软科技股份有限公司 | 轻量化卷积神经网络的训练方法及人脸属性识别方法 |
| CN112836076B (zh) * | 2021-01-27 | 2024-07-19 | 京东方科技集团股份有限公司 | 一种图像标签生成方法、装置及设备 |
| CN112991350B (zh) * | 2021-02-18 | 2023-06-27 | 西安电子科技大学 | 一种基于模态差异缩减的rgb-t图像语义分割方法 |
| CN113536898B (zh) * | 2021-05-31 | 2023-08-29 | 大连民族大学 | 全面特征捕捉型时间卷积网络、视频动作分割方法、计算机系统和介质 |
| CN113284155B (zh) * | 2021-06-08 | 2023-11-07 | 京东科技信息技术有限公司 | 视频目标分割方法、装置、存储介质及电子设备 |
| CN113516670B (zh) * | 2021-06-29 | 2024-06-25 | 清华大学 | 一种反馈注意力增强的非模式图像分割方法及装置 |
| CN113724286B (zh) * | 2021-08-09 | 2024-10-18 | 浙江大华技术股份有限公司 | 显著性目标的检测方法、检测设备及计算机可读存储介质 |
| CN114092434B (zh) * | 2021-11-16 | 2024-11-29 | 中国飞机强度研究所 | 一种基于自适应非局部特征融合的飞机内仓裂纹分割方法 |
| CN114417990B (zh) * | 2022-01-13 | 2024-08-23 | 广东双电科技有限公司 | 一种利用矩形框标注的瓷绝缘子红外图像分割方法 |
| CN114187293B (zh) * | 2022-02-15 | 2022-06-03 | 四川大学 | 基于注意力机制和集成配准的口腔腭部软硬组织分割方法 |
| CN115131290B (zh) * | 2022-05-24 | 2025-06-03 | 阿里巴巴(中国)有限公司 | 图像处理方法 |
| CN114882596B (zh) * | 2022-07-08 | 2022-11-15 | 深圳市信润富联数字科技有限公司 | 行为预警方法、装置、电子设备及存储介质 |
| CN115423810B (zh) * | 2022-11-04 | 2023-03-14 | 国网江西省电力有限公司电力科学研究院 | 一种风力发电机组叶片覆冰形态分析方法 |
| CN116452948B (zh) * | 2023-04-12 | 2026-02-17 | 以萨技术股份有限公司 | 一种基于改进yolov5的高精度目标检测方法及系统 |
| CN116758508B (zh) * | 2023-08-18 | 2024-01-12 | 四川蜀道新能源科技发展有限公司 | 基于像素差异扩大处理的路面标线检测方法、系统及终端 |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN104067314A (zh) * | 2014-05-23 | 2014-09-24 | 中国科学院自动化研究所 | 人形图像分割方法 |
| US20150117760A1 (en) * | 2013-10-30 | 2015-04-30 | Nec Laboratories America, Inc. | Regionlets with Shift Invariant Neural Patterns for Object Detection |
| CN104700100A (zh) * | 2015-04-01 | 2015-06-10 | 哈尔滨工业大学 | 面向高空间分辨率遥感大数据的特征提取方法 |
| CN105389584A (zh) * | 2015-10-13 | 2016-03-09 | 西北工业大学 | 基于卷积神经网络与语义转移联合模型的街景语义标注方法 |
| CN105528575A (zh) * | 2015-11-18 | 2016-04-27 | 首都师范大学 | 基于上下文推理的天空检测算法 |
Family Cites Families (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN104778448B (zh) * | 2015-03-24 | 2017-12-15 | 孙建德 | 一种基于结构自适应卷积神经网络的人脸识别方法 |
| WO2017106645A1 (en) * | 2015-12-18 | 2017-06-22 | The Regents Of The University Of California | Interpretation and quantification of emergency features on head computed tomography |
-
2016
- 2016-08-26 CN CN201610734168.4A patent/CN107784654B/zh active Active
-
2017
- 2017-07-12 WO PCT/CN2017/092614 patent/WO2018036293A1/zh not_active Ceased
- 2017-07-12 EP EP17842716.7A patent/EP3506200B1/en active Active
- 2017-07-12 US US16/327,682 patent/US11151723B2/en active Active
Patent Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20150117760A1 (en) * | 2013-10-30 | 2015-04-30 | Nec Laboratories America, Inc. | Regionlets with Shift Invariant Neural Patterns for Object Detection |
| CN104067314A (zh) * | 2014-05-23 | 2014-09-24 | 中国科学院自动化研究所 | 人形图像分割方法 |
| CN104700100A (zh) * | 2015-04-01 | 2015-06-10 | 哈尔滨工业大学 | 面向高空间分辨率遥感大数据的特征提取方法 |
| CN105389584A (zh) * | 2015-10-13 | 2016-03-09 | 西北工业大学 | 基于卷积神经网络与语义转移联合模型的街景语义标注方法 |
| CN105528575A (zh) * | 2015-11-18 | 2016-04-27 | 首都师范大学 | 基于上下文推理的天空检测算法 |
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP3506200A4 * |
Cited By (25)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN108564589A (zh) * | 2018-03-26 | 2018-09-21 | 江苏大学 | 一种基于改进全卷积神经网络的植物叶片分割方法 |
| CN110363210B (zh) * | 2018-04-10 | 2023-05-05 | 腾讯科技(深圳)有限公司 | 一种图像语义分割模型的训练方法和服务器 |
| CN110363210A (zh) * | 2018-04-10 | 2019-10-22 | 腾讯科技(深圳)有限公司 | 一种图像语义分割模型的训练方法和服务器 |
| CN108830912A (zh) * | 2018-05-04 | 2018-11-16 | 北京航空航天大学 | 一种深度特征对抗式学习的交互式灰度图像着色方法 |
| CN111144560B (zh) * | 2018-11-05 | 2024-02-02 | 杭州海康威视数字技术股份有限公司 | 一种深度神经网络运算方法及装置 |
| CN111144560A (zh) * | 2018-11-05 | 2020-05-12 | 杭州海康威视数字技术股份有限公司 | 一种深度神经网络运算方法及装置 |
| CN109801293A (zh) * | 2019-01-08 | 2019-05-24 | 平安科技(深圳)有限公司 | 遥感影像分割方法、装置及存储介质、服务器 |
| CN109801293B (zh) * | 2019-01-08 | 2023-07-14 | 平安科技(深圳)有限公司 | 遥感影像分割方法、装置及存储介质、服务器 |
| CN111738036A (zh) * | 2019-03-25 | 2020-10-02 | 北京四维图新科技股份有限公司 | 图像处理方法、装置、设备及存储介质 |
| CN111738036B (zh) * | 2019-03-25 | 2023-09-29 | 北京四维图新科技股份有限公司 | 图像处理方法、装置、设备及存储介质 |
| CN112492323A (zh) * | 2019-09-12 | 2021-03-12 | 上海哔哩哔哩科技有限公司 | 直播蒙版的生成方法、可读存储介质及计算机设备 |
| CN110781850A (zh) * | 2019-10-31 | 2020-02-11 | 深圳金信诺高新技术股份有限公司 | 道路识别的语义分割系统和方法、计算机存储介质 |
| CN113496159B (zh) * | 2020-03-20 | 2022-12-23 | 昆明理工大学 | 一种多尺度卷积与动态权重代价函数的烟尘目标分割方法 |
| CN113496159A (zh) * | 2020-03-20 | 2021-10-12 | 昆明理工大学 | 一种多尺度卷积与动态权重代价函数的烟尘目标分割方法 |
| CN112329603B (zh) * | 2020-11-03 | 2022-09-13 | 西南科技大学 | 一种基于图像级联的坝面裂纹缺陷定位方法 |
| CN112329603A (zh) * | 2020-11-03 | 2021-02-05 | 西南科技大学 | 一种基于图像级联的坝面裂纹缺陷定位方法 |
| CN112733919B (zh) * | 2020-12-31 | 2022-05-20 | 山东师范大学 | 基于空洞卷积和多尺度多分支的图像语义分割方法及系统 |
| CN112733919A (zh) * | 2020-12-31 | 2021-04-30 | 山东师范大学 | 基于空洞卷积和多尺度多分支的图像语义分割方法及系统 |
| CN112836804A (zh) * | 2021-02-08 | 2021-05-25 | 北京迈格威科技有限公司 | 图像处理方法、装置、电子设备及存储介质 |
| CN112836804B (zh) * | 2021-02-08 | 2024-05-10 | 北京迈格威科技有限公司 | 图像处理方法、装置、电子设备及存储介质 |
| CN113487483A (zh) * | 2021-07-05 | 2021-10-08 | 上海商汤智能科技有限公司 | 影像分割网络的训练方法和装置 |
| CN113793345A (zh) * | 2021-09-07 | 2021-12-14 | 复旦大学附属华山医院 | 一种基于改进注意力模块的医疗影像分割方法及装置 |
| CN113793345B (zh) * | 2021-09-07 | 2023-10-31 | 复旦大学附属华山医院 | 一种基于改进注意力模块的医疗影像分割方法及装置 |
| CN114357986A (zh) * | 2021-12-07 | 2022-04-15 | 北京健康之家科技有限公司 | 一种交易数据提取方法、设备终端及存储介质 |
| CN116740121A (zh) * | 2023-06-15 | 2023-09-12 | 吉林大学 | 一种基于专用神经网络和图像预处理的秸秆图像分割方法 |
Also Published As
| Publication number | Publication date |
|---|---|
| EP3506200A4 (en) | 2019-07-03 |
| EP3506200B1 (en) | 2021-06-02 |
| CN107784654A (zh) | 2018-03-09 |
| CN107784654B (zh) | 2020-09-25 |
| EP3506200A1 (en) | 2019-07-03 |
| US20190228529A1 (en) | 2019-07-25 |
| US11151723B2 (en) | 2021-10-19 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| WO2018036293A1 (zh) | 图像分割方法、装置及全卷积网络系统 | |
| CN109447990B (zh) | 图像语义分割方法、装置、电子设备和计算机可读介质 | |
| CN108009543B (zh) | 一种车牌识别方法及装置 | |
| US20200034648A1 (en) | Method and apparatus for segmenting sky area, and convolutional neural network | |
| CN111797712B (zh) | 基于多尺度特征融合网络的遥感影像云与云阴影检测方法 | |
| CN111353956B (zh) | 图像修复方法、装置、计算机设备及存储介质 | |
| CN109086811B (zh) | 多标签图像分类方法、装置及电子设备 | |
| CN107944450B (zh) | 一种车牌识别方法及装置 | |
| CN111209858B (zh) | 一种基于深度卷积神经网络的实时车牌检测方法 | |
| CN108664981A (zh) | 显著图像提取方法及装置 | |
| CN113744142A (zh) | 图像修复方法、电子设备及存储介质 | |
| CN113744280B (zh) | 图像处理方法、装置、设备及介质 | |
| CN111738272B (zh) | 一种目标特征提取方法、装置及电子设备 | |
| CN111062964A (zh) | 图像分割方法及相关装置 | |
| CN116524189A (zh) | 一种基于编解码索引化边缘表征的高分辨率遥感图像语义分割方法 | |
| CN114549913A (zh) | 一种语义分割方法、装置、计算机设备和存储介质 | |
| CN113486856B (zh) | 一种驾驶员不规范行为检测方法 | |
| CN109426773A (zh) | 一种道路识别方法和装置 | |
| CN111523439B (zh) | 一种基于深度学习的目标检测的方法、系统、设备及介质 | |
| CN111444923A (zh) | 自然场景下图像语义分割方法和装置 | |
| CN116612280A (zh) | 车辆分割方法、装置、计算机设备和计算机可读存储介质 | |
| CN111274936A (zh) | 多光谱图像地物分类方法、系统、介质及终端 | |
| US20190188512A1 (en) | Method and image processing entity for applying a convolutional neural network to an image | |
| CN114898113A (zh) | 一种对象检测方法、装置、电子设备及存储介质 | |
| CN120339884A (zh) | 基于分支加权融合的无人机航拍小目标检测方法 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 17842716; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | ENP | Entry into the national phase | Ref document number: 2017842716; Country of ref document: EP; Effective date: 20190326 |