CN108171663A - Image filling system based on a convolutional neural network with feature map nearest neighbor replacement - Google Patents

Image filling system based on a convolutional neural network with feature map nearest neighbor replacement

Info

Publication number
CN108171663A
CN108171663A (application number CN201711416650.4A)
Authority
CN
China
Prior art keywords
image
convolutional layer
input object
filled
deconvolution layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711416650.4A
Other languages
Chinese (zh)
Other versions
CN108171663B (en)
Inventor
左旺孟 (Wangmeng Zuo)
颜肇义 (Zhaoyi Yan)
李晓明 (Xiaoming Li)
山世光 (Shiguang Shan)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute of Technology Shenzhen
Original Assignee
Harbin Institute of Technology Shenzhen
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Institute of Technology Shenzhen filed Critical Harbin Institute of Technology Shenzhen
Priority to CN201711416650.4A priority Critical patent/CN108171663B/en
Publication of CN108171663A publication Critical patent/CN108171663A/en
Application granted granted Critical
Publication of CN108171663B publication Critical patent/CN108171663B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/77 Retouching; Inpainting; Scratch removal
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

An image filling system based on a convolutional neural network with feature map nearest neighbor replacement belongs to the technical field of image filling and solves the problem that existing image filling methods cannot quickly obtain a filled image with consistent overall semantics and good sharpness. In the system, a generation network first encodes and then decodes the image to be filled to obtain the filled image. The decoder of the generation network includes N deconvolutional layers. For any M deconvolutional layers among the first to the (N−1)-th deconvolutional layers, the generation network obtains an additional feature map by feature map nearest neighbor replacement, based on the output of each of these deconvolutional layers and the output of the convolutional layer corresponding to that deconvolutional layer, and uses the output of each deconvolutional layer, the output of its corresponding convolutional layer, and the additional feature map together as the input object of the next deconvolutional layer. A discrimination network is used to judge whether the filled image is the real image corresponding to the image to be filled.

Description

Image Filling System Based on a Convolutional Neural Network with Feature Map Nearest Neighbor Replacement

Technical Field

The invention relates to an image filling system and belongs to the technical field of image filling.

Background Art

Image filling is a fundamental problem in computer vision and image processing. It is mainly used to repair and reconstruct damaged images or to remove unwanted objects from an image.

Existing image filling methods mainly include diffusion-based methods, exemplar-based methods, and deep-learning-based methods.

The basic idea of diffusion-based image filling is to propagate, pixel by pixel, the image information at the boundary of the region to be filled into the interior of that region. When the region to be filled is small, structurally simple, and uniform in texture, this method completes the filling task well. However, when the region to be filled is large, the filled image obtained with this method lacks sharpness.

The basic idea of exemplar-based image filling is to fill, patch by patch, from the known region of the image toward the region to be filled. Each time a patch is filled, the patch in the known region that is most similar to the patch at the boundary of the region to be filled is used. Compared with diffusion-based methods, exemplar-based methods produce filled images with better texture and higher sharpness. However, because exemplar-based methods progressively replace unknown patches in the region to be filled with similar patches from the known region, they cannot produce a filled image whose overall semantics are consistent.

Deep-learning-based image filling refers mainly to applying deep neural networks to image filling. Researchers have proposed encoder-decoder networks for filling images whose central region is missing. However, this approach only applies to 128*128 RGB images, and although the resulting filled images satisfy overall semantic consistency, their sharpness is poor. To address this problem, researchers have tried multi-scale iterative updating to produce sharp fillings of large images. Although the filled images obtained in this way have overall semantic consistency and good sharpness, the method is extremely slow: on a Titan X GPU, filling a single 256*256 RGB image takes tens of seconds to several minutes.

Summary of the Invention

To solve the problem that existing image filling methods cannot quickly obtain a filled image with consistent overall semantics and good sharpness, the present invention proposes an image filling system based on a convolutional neural network with feature map nearest neighbor replacement.

The image filling system of the present invention comprises a generation network and a discrimination network.

The generation network comprises an encoder and a decoder; the encoder comprises N convolutional layers and the decoder comprises N deconvolutional layers, with N ≥ 2.

The generation network obtains the filled image by first encoding and then decoding the image to be filled.

For any M deconvolutional layers among the first to the (N−1)-th deconvolutional layers, the generation network obtains an additional feature map by feature map nearest neighbor replacement, based on the output of each of these deconvolutional layers and the output of the convolutional layer corresponding to that deconvolutional layer, and uses the output of each deconvolutional layer, the output of its corresponding convolutional layer, and the obtained additional feature map together as the input object of the next deconvolutional layer, where 1 ≤ M ≤ N−1.

The discrimination network is used to judge whether the filled image is the real image corresponding to the image to be filled, thereby constraining the weight learning of the generation network.

Preferably, the encoder comprises convolutional layers E1 to E8 and the decoder comprises deconvolutional layers D1 to D8.

The image to be filled is the input object of convolutional layer E1.

For convolutional layers E1 to E8, the output of the former, after batch normalization and Leaky ReLU activation in turn, serves as the input object of the latter.

The output of convolutional layer E8, after batch normalization and Leaky ReLU activation in turn, serves as the input object of deconvolutional layer D1.

The output of deconvolutional layer D1, after ReLU activation, serves as the first input object of deconvolutional layer D2.

For deconvolutional layers D2 to D8, the output of the former, after ReLU activation and batch normalization in turn, serves as the first input object of the latter.

The second input objects of deconvolutional layers D2 to D8 are, in order, the outputs of convolutional layers E7 to E1 after batch normalization and Leaky ReLU activation.

The output of deconvolutional layer D8 after Tanh activation is the filled image.

Convolutional layer E1 applies 64 convolutions of size 4*4 with stride 2 to its input object.

Convolutional layer E2 applies 128 convolutions of size 4*4 with stride 2.

Convolutional layer E3 applies 256 convolutions of size 4*4 with stride 2.

Convolutional layers E4 to E8 each apply 512 convolutions of size 4*4 with stride 2.

Deconvolutional layers D1 to D4 each apply 512 deconvolutions of size 4*4 with stride 2.

Deconvolutional layer D5 applies 256 deconvolutions of size 4*4 with stride 2.

Deconvolutional layer D6 applies 128 deconvolutions of size 4*4 with stride 2.

Deconvolutional layer D7 applies 64 deconvolutions of size 4*4 with stride 2.

Deconvolutional layer D8 applies 3 deconvolutions of size 4*4 with stride 2.

The generation network obtains an additional feature map by feature map nearest neighbor replacement, based on the output of deconvolutional layer D5 and the output of convolutional layer E3, and uses the additional feature map as the third input object of deconvolutional layer D6.
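For illustration only, the following is a minimal PyTorch-style sketch of a generator with this layout (encoder E1 to E8, decoder D1 to D8, and a shift connection feeding the additional feature map into D6). Padding values, the exact placement of normalization, and all names are assumptions; the shift_fn argument stands for the feature map nearest neighbor replacement sketched further below.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Generator(nn.Module):
    """Encoder E1-E8 / decoder D1-D8 with a shift connection at D6 (illustrative sketch)."""
    def __init__(self, shift_fn):
        super().__init__()
        e_ch = [3, 64, 128, 256, 512, 512, 512, 512, 512]            # E1..E8 output channels
        self.enc = nn.ModuleList([
            nn.Sequential(nn.Conv2d(e_ch[i], e_ch[i + 1], 4, 2, 1),  # 4*4 conv, stride 2
                          nn.BatchNorm2d(e_ch[i + 1]),
                          nn.LeakyReLU(0.2)) for i in range(8)])
        d_in = [512, 1024, 1024, 1024, 1024, 768, 256, 128]          # concatenated decoder inputs
        d_out = [512, 512, 512, 512, 256, 128, 64, 3]                # D1..D8 output channels
        self.dec = nn.ModuleList([
            nn.ConvTranspose2d(d_in[i], d_out[i], 4, 2, 1) for i in range(8)])
        self.dec_bn = nn.ModuleList([nn.BatchNorm2d(c) for c in d_out[1:7]])
        self.shift_fn = shift_fn  # shift_fn(dec_feat, enc_feat, mask) -> additional feature map

    def forward(self, x, mask):
        # x: image to be filled (B, 3, 256, 256); mask: assumed already at feature resolution
        skips = []
        for e in self.enc:                                            # E1..E8
            x = e(x)
            skips.append(x)
        x = F.relu(self.dec[0](skips[-1]))                            # D1 output, ReLU only
        for i in range(1, 8):                                         # D2..D8
            skip = skips[7 - i]                                       # E7..E1 (already BN + LeakyReLU)
            if i == 5:                                                # D6: D5 output, E3 output, shifted map
                extra = self.shift_fn(x, skip, mask)
                x = self.dec[i](torch.cat([x, skip, extra], 1))
            else:
                x = self.dec[i](torch.cat([x, skip], 1))
            x = torch.tanh(x) if i == 7 else self.dec_bn[i - 1](F.relu(x))
        return x                                                      # filled image
```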

Preferably, the specific process by which the generation network obtains the additional feature map by feature map nearest neighbor replacement, based on the output of deconvolutional layer D5 and the output of convolutional layer E3, is as follows:

Select a feature map to be assigned whose values are all 0; this feature map has the same number of channels and the same spatial size as the output feature map of deconvolutional layer D5 and the output feature map of convolutional layer E3.

Compute the mask region of the output feature map of deconvolutional layer D5 and the non-mask region of the output feature map of convolutional layer E3, and cut both the mask region and the non-mask region into multiple feature blocks.

The feature blocks are cuboids of size C*h*w, where C is the number of channels of the output feature map of deconvolutional layer D5, and h and w are the length and width of the cuboid.

For each feature block p1 in the mask region, select the feature block p2 that is closest to p1 among the feature blocks of the non-mask region.

Select the region to be assigned in the feature map to be assigned; this region coincides with the position of feature block p1 in the output feature map of deconvolutional layer D5.

Assign the values of feature block p2 to the region to be assigned.

Preferably, feature block p2 is the block closest to feature block p1 in cosine distance.
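A minimal sketch of this replacement step is shown below, assuming 1*1 feature blocks (h = w = 1), cosine similarity computed on normalized blocks, and a mask already downsampled to the feature map resolution; all names are illustrative. With larger blocks, overlapping regions would need explicit handling, which is omitted here.

```python
import torch
import torch.nn.functional as F

def shift_fn(dec_feat, enc_feat, mask_feat, h=1, w=1):
    """Feature map nearest neighbor replacement (sketch, batch dimension omitted).

    dec_feat: output of D5, shape (C, H, W); enc_feat: output of E3, shape (C, H, W);
    mask_feat: (H, W) map at feature resolution, 1 = mask point (to be filled).
    Returns the additional feature map: zero outside the mask region, and inside it
    the value of the nearest (cosine) non-mask block of enc_feat.
    """
    C, H, W = dec_feat.shape
    pad = (h // 2, w // 2)
    dec_blocks = F.unfold(dec_feat.unsqueeze(0), (h, w), padding=pad)[0].t()  # (H*W, C*h*w)
    enc_blocks = F.unfold(enc_feat.unsqueeze(0), (h, w), padding=pad)[0].t()
    masked = mask_feat.flatten() > 0.5
    q = F.normalize(dec_blocks[masked], dim=1)       # blocks p1 from the mask region
    k = F.normalize(enc_blocks[~masked], dim=1)      # blocks p2 from the non-mask region
    nearest = (q @ k.t()).argmax(dim=1)              # max cosine similarity = min cosine distance
    out = torch.zeros_like(enc_blocks)               # feature map to be assigned, all zeros
    out[masked] = enc_blocks[~masked][nearest]       # assign the value of each nearest block p2
    return F.fold(out.t().unsqueeze(0), (H, W), (h, w), padding=pad)[0]
```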

Preferably, the mask region and the non-mask region of an output feature map are computed as follows:

A mask image is given in place of the image to be filled; the mask image has the same size as the image to be filled, has one channel, and its values are 0 or 1.

A value of 0 indicates that the corresponding position in the image to be filled is not a point to be filled.

A value of 1 indicates that the corresponding position in the image to be filled is a point to be filled.

The mask region and the non-mask region of the feature map of the mask image are computed by a convolutional network comprising a first to a third convolutional layer.

The mask image is the input object of the first convolutional layer.

For the first to the third convolutional layers, the output of the former is the input object of the latter.

The first to the third convolutional layers each apply one 4*4 convolution with stride 2 to their input object.

The output of the third convolutional layer is the feature map of the mask image, with spatial size 32*32 and one channel.

For the feature map of the mask image, when a value is greater than a set threshold, the corresponding feature point is judged to be a mask point; otherwise, it is judged to be a non-mask point.

The mask region of the feature map of the mask image is the set of mask points, and the non-mask region is the set of non-mask points.

The mask region of an output feature map equals the mask region of the feature map of the mask image, and the non-mask region of an output feature map equals the non-mask region of the feature map of the mask image.
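A sketch of this computation follows; the fixed averaging kernels and the threshold value are assumptions, since the text above only fixes the kernel size, stride, and number of layers.

```python
import torch
import torch.nn.functional as F

def mask_regions(mask_img, threshold=0.0):
    """Downsample a (1, 1, 256, 256) mask image (1 = point to be filled) to 32*32
    and split it into mask / non-mask points (illustrative sketch)."""
    x = mask_img
    weight = torch.ones(1, 1, 4, 4) / 16.0            # averaging kernel (assumed)
    for _ in range(3):                                # first to third convolutional layers
        x = F.conv2d(x, weight, stride=2, padding=1)  # one 4*4 convolution, stride 2
    mask_feat = x[0, 0] > threshold                   # 32*32: True = mask point
    return mask_feat, ~mask_feat                      # mask region, non-mask region
```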

Preferably, the generation network is trained with a guidance loss constraint. The guidance loss constraint means that, during training of the generation network, a feature similarity constraint is imposed between the real image and the input image at an arbitrary convolutional or deconvolutional layer.

The input image is the real image after the mask operation.

Preferably, the generation network is trained as follows:

Input the target image Igt into the generation network, compute the mask region of the feature map of the l-th layer, and obtain the information (Φl(Igt))y.

Input the image to be filled I into the generation network, compute the mask region of the feature map of the (L−l)-th layer, and obtain the information (ΦL-l(I))y.

Then define the guidance loss constraint Lg:

Lg = Σy∈Ω ||(ΦL-l(I))y − (Φl(Igt))y||₂²

where Ω is the mask region, L is the total number of layers of the generation network, y is any coordinate point within the mask region, ΦL-l(I) is the feature map output by the generation network at layer L−l when the input object is the image to be filled, (ΦL-l(I))y is the information at y in the mask region of the output feature map of layer L−l, Φl(Igt) is the feature map output by the generation network at layer l when the input object is the target image, and (Φl(Igt))y is the information at y in the mask region of the output feature map of layer l.
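As a minimal sketch, this guidance loss can be written as below, assuming a squared L2 penalty over the mask region (the norm is an assumption consistent with the formula above); names are illustrative.

```python
def guidance_loss(feat_filled, feat_gt, mask_feat):
    """Lg: squared difference between (Phi_{L-l}(I))_y and (Phi_l(I_gt))_y over y in Omega.

    feat_filled, feat_gt: feature maps of shape (C, H, W); mask_feat: (H, W) boolean map."""
    diff = (feat_filled - feat_gt) * mask_feat   # keep only coordinates inside the mask region
    return (diff ** 2).sum()
```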

Preferably, the discrimination network comprises convolutional layers E9 to E13.

The input object of convolutional layer E9 is the filled image.

The output of convolutional layer E9, after Leaky ReLU activation, serves as the input object of convolutional layer E10.

For convolutional layers E10 to E13, the output of the former, after batch normalization and Leaky ReLU activation in turn, serves as the input object of the latter.

The output of convolutional layer E13, after batch normalization and Sigmoid activation in turn, is the output of the discrimination network.

Convolutional layer E9 applies 64 convolutions of size 4*4 with stride 2 to its input object.

Convolutional layer E10 applies 128 convolutions of size 4*4 with stride 2.

Convolutional layer E11 applies 256 convolutions of size 4*4 with stride 2.

Convolutional layer E12 applies 512 convolutions of size 4*4 with stride 1.

Convolutional layer E13 applies one convolution of size 4*4 with stride 1.

Preferably, the filled image is a 256*256 RGB image, and the output of convolutional layer E13 has spatial size 64*64 and one channel.
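For illustration, a PyTorch-style sketch of such a discrimination network is given below; the padding values are assumptions, and the output is the probability map described above.

```python
import torch.nn as nn

class Discriminator(nn.Module):
    """Convolutional layers E9-E13 with Leaky ReLU / batch norm / Sigmoid (illustrative sketch)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2),                           # E9
            nn.Conv2d(64, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.LeakyReLU(0.2),    # E10
            nn.Conv2d(128, 256, 4, 2, 1), nn.BatchNorm2d(256), nn.LeakyReLU(0.2),   # E11
            nn.Conv2d(256, 512, 4, 1, 1), nn.BatchNorm2d(512), nn.LeakyReLU(0.2),   # E12
            nn.Conv2d(512, 1, 4, 1, 1), nn.BatchNorm2d(1), nn.Sigmoid())            # E13

    def forward(self, img):
        # img: filled image or real image; output: predicted probability that the input is real
        return self.net(img)
```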

Preferably, the image filling system is trained end to end with the Adam optimization algorithm.

The image filling system of the present invention takes the image to be filled as its input object and performs feature map nearest neighbor replacement on an intermediate output of the decoding part of the generation network, so that a filled image with overall semantic consistency and good sharpness is obtained in a single forward pass. Compared with existing image filling methods, the system obtains the filled image much faster because it requires only one forward pass.

Brief Description of the Drawings

The image filling system based on a convolutional neural network with feature map nearest neighbor replacement according to the present invention is described in more detail below on the basis of embodiments and with reference to the accompanying drawings, in which:

Fig. 1 is a structural block diagram of the generation network referred to in the embodiment;

Fig. 2 is a structural block diagram of the discrimination network referred to in the embodiment;

Fig. 3 is an image to be filled with arbitrarily missing regions;

Fig. 4 is the filled image obtained after inputting the image with arbitrarily missing regions into the generation network;

Fig. 5 is an image to be filled with a missing central region;

Fig. 6 is the filled image obtained after inputting the image with a missing central region into the generation network.

Detailed Description of the Embodiments

The image filling system based on a convolutional neural network with feature map nearest neighbor replacement according to the present invention is further described below with reference to the accompanying drawings.

Embodiment: the present embodiment is described in detail below with reference to Figs. 1 to 6.

The image filling system of this embodiment comprises a generation network and a discrimination network.

The generation network comprises an encoder and a decoder; the encoder comprises N convolutional layers and the decoder comprises N deconvolutional layers, with N ≥ 2.

The generation network obtains the filled image by first encoding and then decoding the image to be filled.

For any M deconvolutional layers among the first to the (N−1)-th deconvolutional layers, the generation network obtains an additional feature map by feature map nearest neighbor replacement, based on the output of each of these deconvolutional layers and the output of the convolutional layer corresponding to that deconvolutional layer, and uses the output of each deconvolutional layer, the output of its corresponding convolutional layer, and the obtained additional feature map together as the input object of the next deconvolutional layer, where 1 ≤ M ≤ N−1.

The discrimination network is used to judge whether the filled image is the real image corresponding to the image to be filled, thereby constraining the weight learning of the generation network.

In this embodiment, the encoder comprises convolutional layers E1 to E8 and the decoder comprises deconvolutional layers D1 to D8.

The image to be filled is the input object of convolutional layer E1.

For convolutional layers E1 to E8, the output of the former, after batch normalization and Leaky ReLU activation in turn, serves as the input object of the latter.

The output of convolutional layer E8, after batch normalization and Leaky ReLU activation in turn, serves as the input object of deconvolutional layer D1.

The output of deconvolutional layer D1, after ReLU activation, serves as the first input object of deconvolutional layer D2.

For deconvolutional layers D2 to D8, the output of the former, after ReLU activation and batch normalization in turn, serves as the first input object of the latter.

The second input objects of deconvolutional layers D2 to D8 are, in order, the outputs of convolutional layers E7 to E1 after batch normalization and Leaky ReLU activation.

The output of deconvolutional layer D8 after Tanh activation is the filled image.

Convolutional layer E1 applies 64 convolutions of size 4*4 with stride 2 to its input object.

Convolutional layer E2 applies 128 convolutions of size 4*4 with stride 2.

Convolutional layer E3 applies 256 convolutions of size 4*4 with stride 2.

Convolutional layers E4 to E8 each apply 512 convolutions of size 4*4 with stride 2.

Deconvolutional layers D1 to D4 each apply 512 deconvolutions of size 4*4 with stride 2.

Deconvolutional layer D5 applies 256 deconvolutions of size 4*4 with stride 2.

Deconvolutional layer D6 applies 128 deconvolutions of size 4*4 with stride 2.

Deconvolutional layer D7 applies 64 deconvolutions of size 4*4 with stride 2.

Deconvolutional layer D8 applies 3 deconvolutions of size 4*4 with stride 2.

The generation network obtains an additional feature map by feature map nearest neighbor replacement, based on the output of deconvolutional layer D5 and the output of convolutional layer E3, and uses the additional feature map as the third input object of deconvolutional layer D6.

In this embodiment, the specific process by which the generation network obtains the additional feature map by feature map nearest neighbor replacement, based on the output of deconvolutional layer D5 and the output of convolutional layer E3, is as follows:

Select a feature map to be assigned whose values are all 0; this feature map has the same number of channels and the same spatial size as the output feature map of deconvolutional layer D5 and the output feature map of convolutional layer E3.

Compute the mask region of the output feature map of deconvolutional layer D5 and the non-mask region of the output feature map of convolutional layer E3, and cut both the mask region and the non-mask region into multiple feature blocks.

The feature blocks are cuboids of size C*h*w, where C is the number of channels of the output feature map of deconvolutional layer D5, and h and w are the length and width of the cuboid.

For each feature block p1 in the mask region, select the feature block p2 that is closest to p1 among the feature blocks of the non-mask region.

Select the region to be assigned in the feature map to be assigned; this region coincides with the position of feature block p1 in the output feature map of deconvolutional layer D5.

Assign the values of feature block p2 to the region to be assigned.

The mask region and the non-mask region of an output feature map are computed as follows:

A mask image is given in place of the image to be filled; the mask image has the same size as the image to be filled, has one channel, and its values are 0 or 1.

A value of 0 indicates that the corresponding position in the image to be filled is not a point to be filled.

A value of 1 indicates that the corresponding position in the image to be filled is a point to be filled.

The mask region and the non-mask region of the feature map of the mask image are computed by a convolutional network comprising a first to a third convolutional layer.

The mask image is the input object of the first convolutional layer.

For the first to the third convolutional layers, the output of the former is the input object of the latter.

The first to the third convolutional layers each apply one 4*4 convolution with stride 2 to their input object.

The output of the third convolutional layer is the feature map of the mask image, with spatial size 32*32 and one channel.

For the feature map of the mask image, when a value is greater than a set threshold, the corresponding feature point is judged to be a mask point; otherwise, it is judged to be a non-mask point.

The mask region of the feature map of the mask image is the set of mask points, and the non-mask region is the set of non-mask points.

The mask region of an output feature map equals the mask region of the feature map of the mask image, and the non-mask region of an output feature map equals the non-mask region of the feature map of the mask image.

The generation network of this embodiment is trained with a guidance loss constraint. The guidance loss constraint means that, during training of the generation network, a feature similarity constraint is imposed between the real image and the input image at an arbitrary convolutional or deconvolutional layer.

The input image is the real image after the mask operation.

The generation network of this embodiment is trained as follows:

Input the target image Igt into the generation network, compute the mask region of the feature map of the l-th layer, and obtain the information (Φl(Igt))y.

Input the image to be filled I into the generation network, compute the mask region of the feature map of the (L−l)-th layer, and obtain the information (ΦL-l(I))y.

Then define the guidance loss constraint Lg:

Lg = Σy∈Ω ||(ΦL-l(I))y − (Φl(Igt))y||₂²

where Ω is the mask region, L is the total number of layers of the generation network, y is any coordinate point within the mask region, ΦL-l(I) is the feature map output by the generation network at layer L−l when the input object is the image to be filled, (ΦL-l(I))y is the information at y in the mask region of the output feature map of layer L−l, Φl(Igt) is the feature map output by the generation network at layer l when the input object is the target image, and (Φl(Igt))y is the information at y in the mask region of the output feature map of layer l.

In addition, the image obtained by passing the image to be filled I through the generation network is denoted Φ(I;W), where W are the parameters of the generation network model, and a reconstruction loss is defined between Φ(I;W) and the target image Igt.

For each (ΦL-l(I))y, its distance to (Φl(I))x is computed, where x is any coordinate point in the non-mask region, (Φl(I))x is the information at x in the non-mask region of the output feature map of layer l, and Ω̄ denotes the non-mask region.

The distance metric is the cosine distance, so the nearest point is

x*(y) = argmin x∈Ω̄ d((ΦL-l(I))y, (Φl(I))x), with d(a, b) = 1 − ⟨a, b⟩ / (||a||·||b||).

After finding the nearest point x*(y), the value at the same planar position as y in the feature map to be assigned is replaced with (Φl(I))x*(y); the result is the additional feature map to be input into the next deconvolutional layer.

That is, the additional feature map takes the value (Φl(I))x*(y) at every position y in the mask region and 0 elsewhere.
The discrimination network of this embodiment comprises convolutional layers E9 to E13.

The input object of convolutional layer E9 is the filled image.

The output of convolutional layer E9, after Leaky ReLU activation, serves as the input object of convolutional layer E10.

For convolutional layers E10 to E13, the output of the former, after batch normalization and Leaky ReLU activation in turn, serves as the input object of the latter.

The output of convolutional layer E13, after batch normalization and Sigmoid activation in turn, is the output of the discrimination network.

Convolutional layer E9 applies 64 convolutions of size 4*4 with stride 2 to its input object.

Convolutional layer E10 applies 128 convolutions of size 4*4 with stride 2.

Convolutional layer E11 applies 256 convolutions of size 4*4 with stride 2.

Convolutional layer E12 applies 512 convolutions of size 4*4 with stride 1.

Convolutional layer E13 applies one convolution of size 4*4 with stride 1.

The filled image is a 256*256 RGB image, and the output of convolutional layer E13 has spatial size 64*64 and one channel.

The input to the discrimination network is either Φ(I;W), the output of the generation network, or Igt. The generation network and the discrimination network are trained adversarially, which gives rise to the adversarial loss Ladv:

Ladv = min W max D E Igt~pdata(Igt) [log D(Igt)] + E I~pmiss(I) [log(1 − D(Φ(I;W)))]

where pdata(Igt) is the distribution of real images, pmiss(I) is the distribution of input images, D(·) denotes the probability predicted by the discrimination network that an image fed into it comes from pdata(Igt), log is the logarithm function, Igt is the target image, and I is the image to be filled.

Therefore, when training the generation network, the total loss L is:

L = Lrec + λg·Lg + λadv·Ladv

where Lrec is the reconstruction loss, and λg and λadv are both hyperparameters.
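A hedged end-to-end training sketch combining the losses above with Adam is shown below. The l1 reconstruction norm, the hyperparameter values, and the gen.feat_Ll / gen.feat_l feature hooks are assumptions introduced only for illustration; guidance_loss refers to the sketch given earlier.

```python
import torch
import torch.nn.functional as F

def train_step(gen, disc, opt_g, opt_d, I, I_gt, mask, lambda_g=0.01, lambda_adv=0.002):
    # --- discriminator update: real images versus generator outputs ---
    filled = gen(I, mask)
    d_real, d_fake = disc(I_gt), disc(filled.detach())
    loss_d = -(torch.log(d_real + 1e-8).mean() + torch.log(1 - d_fake + 1e-8).mean())
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # --- generator update: reconstruction + guidance + adversarial terms ---
    filled = gen(I, mask)
    loss_rec = F.l1_loss(filled, I_gt)                    # reconstruction term (l1 norm assumed)
    loss_guide = guidance_loss(gen.feat_Ll(I), gen.feat_l(I_gt), mask)  # hypothetical feature hooks
    loss_adv = -torch.log(disc(filled) + 1e-8).mean()
    loss = loss_rec + lambda_g * loss_guide + lambda_adv * loss_adv
    opt_g.zero_grad()
    loss.backward()
    opt_g.step()
    return loss_d.item(), loss.item()

# End-to-end optimization with Adam (learning rate and betas are placeholder values):
# opt_g = torch.optim.Adam(gen.parameters(), lr=2e-4, betas=(0.5, 0.999))
# opt_d = torch.optim.Adam(disc.parameters(), lr=2e-4, betas=(0.5, 0.999))
```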

Fig. 3 shows an image to be filled with arbitrarily missing regions, and Fig. 4 shows the filled image obtained after inputting it into the generation network. Comparing Fig. 3 with Fig. 4 shows that the image filling system of this embodiment is suitable for filling images with arbitrarily missing regions and achieves a good filling effect.

Fig. 5 shows an image to be filled with a missing central region, and Fig. 6 shows the filled image obtained after inputting it into the generation network. Comparing Fig. 5 with Fig. 6 shows that the image filling system of this embodiment is suitable for filling images with a missing central region and achieves a good filling effect.

In simulation experiments, the image filling system of this embodiment takes about 80 ms to fill a 256*256 RGB image. Compared with existing image filling methods, which take tens of seconds to several minutes, the improvement in filling speed is substantial.

The image filling system of this embodiment is trained end to end with the Adam optimization algorithm.

Although the invention is described herein with reference to specific embodiments, it should be understood that these embodiments are merely illustrative of the principles and applications of the invention. It should therefore be understood that numerous modifications may be made to the exemplary embodiments and that other arrangements may be devised without departing from the spirit and scope of the invention as defined by the appended claims. It should be understood that the different dependent claims and the features described herein may be combined in ways different from those described in the original claims. It should also be understood that features described in connection with individual embodiments may be used in other described embodiments.

Claims (10)

1. An image filling system based on a convolutional neural network with feature map nearest neighbor replacement, characterized in that the image filling system comprises a generation network and a discrimination network;
the generation network comprises an encoder and a decoder, the encoder comprises N convolutional layers and the decoder comprises N deconvolutional layers, N ≥ 2;
the generation network obtains a filled image by first encoding and then decoding an image to be filled;
for any M deconvolutional layers among the first to the (N−1)-th deconvolutional layers, the generation network obtains an additional feature map by feature map nearest neighbor replacement based on the output of each of these deconvolutional layers and the output of the convolutional layer corresponding to that deconvolutional layer, and uses the output of each deconvolutional layer, the output of the convolutional layer corresponding to that deconvolutional layer, and the obtained additional feature map together as the input object of the next deconvolutional layer;
the discrimination network is used to judge whether the filled image is the real image corresponding to the image to be filled, thereby constraining the weight learning of the generation network.
2. The image filling system based on a convolutional neural network with feature map nearest neighbor replacement according to claim 1, characterized in that the encoder comprises convolutional layers E1 to E8 and the decoder comprises deconvolutional layers D1 to D8;
the image to be filled is the input object of convolutional layer E1;
for convolutional layers E1 to E8, the output of the former, after batch normalization and Leaky ReLU activation in turn, serves as the input object of the latter;
the output of convolutional layer E8, after batch normalization and Leaky ReLU activation in turn, serves as the input object of deconvolutional layer D1;
the output of deconvolutional layer D1, after ReLU activation, serves as the first input object of deconvolutional layer D2;
for deconvolutional layers D2 to D8, the output of the former, after ReLU activation and batch normalization in turn, serves as the first input object of the latter;
the second input objects of deconvolutional layers D2 to D8 are, in order, the outputs of convolutional layers E7 to E1 after batch normalization and Leaky ReLU activation;
the output of deconvolutional layer D8 after Tanh activation is the filled image;
convolutional layer E1 applies 64 convolutions of size 4*4 with stride 2 to its input object;
convolutional layer E2 applies 128 convolutions of size 4*4 with stride 2 to its input object;
convolutional layer E3 applies 256 convolutions of size 4*4 with stride 2 to its input object;
convolutional layers E4 to E8 each apply 512 convolutions of size 4*4 with stride 2 to their input object;
deconvolutional layers D1 to D4 each apply 512 deconvolutions of size 4*4 with stride 2 to their input object;
deconvolutional layer D5 applies 256 deconvolutions of size 4*4 with stride 2 to its input object;
deconvolutional layer D6 applies 128 deconvolutions of size 4*4 with stride 2 to its input object;
deconvolutional layer D7 applies 64 deconvolutions of size 4*4 with stride 2 to its input object;
deconvolutional layer D8 applies 3 deconvolutions of size 4*4 with stride 2 to its input object;
the generation network obtains an additional feature map by feature map nearest neighbor replacement based on the output of deconvolutional layer D5 and the output of convolutional layer E3, and uses the additional feature map as the third input object of deconvolutional layer D6.
3. The image filling system based on a convolutional neural network with feature map nearest neighbor replacement according to claim 2, characterized in that the specific process by which the generation network obtains the additional feature map by feature map nearest neighbor replacement, based on the output of deconvolutional layer D5 and the output of convolutional layer E3, is as follows:
selecting a feature map to be assigned whose values are all 0, the feature map to be assigned having the same number of channels and the same spatial size as the output feature map of deconvolutional layer D5 and the output feature map of convolutional layer E3;
computing the mask region of the output feature map of deconvolutional layer D5 and the non-mask region of the output feature map of convolutional layer E3, and cutting the mask region and the non-mask region into multiple feature blocks;
the feature blocks being cuboids of size C*h*w, where C, h and w are respectively the number of channels of the output feature map of deconvolutional layer D5, the length of the cuboid and the width of the cuboid;
for each feature block p1 in the mask region, selecting the feature block p2 that is closest to feature block p1 among the feature blocks of the non-mask region;
selecting a region to be assigned in the feature map to be assigned, the region to be assigned coinciding with the position of feature block p1 in the output feature map of deconvolutional layer D5;
assigning the values of feature block p2 to the region to be assigned.
4. The image filling system based on a convolutional neural network with feature map nearest neighbor replacement according to claim 3, characterized in that feature block p2 is closest to feature block p1 in cosine distance.
5. The image filling system based on a convolutional neural network with feature map nearest neighbor replacement according to claim 4, characterized in that the mask region and the non-mask region of an output feature map are computed as follows:
a mask image is given in place of the image to be filled, the mask image having the same size as the image to be filled, one channel, and values of 0 or 1;
0 indicates that the corresponding position in the image to be filled is not a point to be filled;
1 indicates that the corresponding position in the image to be filled is a point to be filled;
the mask region and the non-mask region of the feature map of the mask image are computed by a convolutional network comprising a first to a third convolutional layer;
the mask image is the input object of the first convolutional layer;
for the first to the third convolutional layers, the output of the former is the input object of the latter;
the first to the third convolutional layers each apply one 4*4 convolution with stride 2 to their input object;
the output of the third convolutional layer is the feature map of the mask image, with spatial size 32*32 and one channel;
for the feature map of the mask image, when a value is greater than a set threshold, the corresponding feature point is judged to be a mask point; otherwise, the feature point is judged to be a non-mask point;
the mask region of the feature map of the mask image is the set of mask points, and the non-mask region of the feature map of the mask image is the set of non-mask points;
the mask region of the output feature map equals the mask region of the feature map of the mask image, and the non-mask region of the output feature map equals the non-mask region of the feature map of the mask image.
6. The image filling system based on a convolutional neural network with feature map nearest neighbor replacement according to claim 5, characterized in that the generation network is trained with a guidance loss constraint, the guidance loss constraint meaning that, during training of the generation network, a feature similarity constraint is imposed between the real image and the input image at an arbitrary convolutional or deconvolutional layer;
the input image is the real image after the mask operation.
7. The image filling system based on a convolutional neural network with feature map nearest neighbor replacement according to claim 6, characterized in that the generation network is trained as follows:
inputting the target image Igt into the generation network, computing the mask region of the feature map of the l-th layer, and obtaining the information (Φl(Igt))y;
inputting the image to be filled I into the generation network, computing the mask region of the feature map of the (L−l)-th layer, and obtaining the information (ΦL-l(I))y;
then defining the guidance loss constraint Lg:
Lg = Σy∈Ω ||(ΦL-l(I))y − (Φl(Igt))y||₂²
where Ω is the mask region, L is the total number of layers of the generation network, y is any coordinate point within the mask region, ΦL-l(I) is the feature map output by the generation network at layer L−l when the input object is the image to be filled, (ΦL-l(I))y is the information at y in the mask region of the output feature map of layer L−l, Φl(Igt) is the feature map output by the generation network at layer l when the input object is the target image, and (Φl(Igt))y is the information at y in the mask region of the output feature map of layer l.
8. The image filling system based on a convolutional neural network with feature map nearest neighbor replacement according to claim 7, characterized in that the discrimination network comprises convolutional layers E9 to E13;
the input object of convolutional layer E9 is the filled image;
the output of convolutional layer E9, after Leaky ReLU activation, serves as the input object of convolutional layer E10;
for convolutional layers E10 to E13, the output of the former, after batch normalization and Leaky ReLU activation in turn, serves as the input object of the latter;
the output of convolutional layer E13, after batch normalization and Sigmoid activation in turn, is the output of the discrimination network;
convolutional layer E9 applies 64 convolutions of size 4*4 with stride 2 to its input object;
convolutional layer E10 applies 128 convolutions of size 4*4 with stride 2 to its input object;
convolutional layer E11 applies 256 convolutions of size 4*4 with stride 2 to its input object;
convolutional layer E12 applies 512 convolutions of size 4*4 with stride 1 to its input object;
convolutional layer E13 applies one convolution of size 4*4 with stride 1 to its input object.
9. The image filling system based on a convolutional neural network with feature map nearest neighbor replacement according to claim 8, characterized in that the filled image is a 256*256 RGB image, and the output of convolutional layer E13 has spatial size 64*64 and one channel.
10. The image filling system based on a convolutional neural network with feature map nearest neighbor replacement according to claim 9, characterized in that the image filling system is trained end to end with the Adam optimization algorithm.
CN201711416650.4A 2017-12-22 2017-12-22 Image Filling System Based on Feature Map Nearest Neighbor Replacement with Convolutional Neural Networks Active CN108171663B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711416650.4A CN108171663B (en) 2017-12-22 2017-12-22 Image Filling System Based on Feature Map Nearest Neighbor Replacement with Convolutional Neural Networks

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711416650.4A CN108171663B (en) 2017-12-22 2017-12-22 Image Filling System Based on Feature Map Nearest Neighbor Replacement with Convolutional Neural Networks

Publications (2)

Publication Number Publication Date
CN108171663A true CN108171663A (en) 2018-06-15
CN108171663B CN108171663B (en) 2021-05-25

Family

ID=62520202

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711416650.4A Active CN108171663B (en) 2017-12-22 2017-12-22 Image Filling System Based on Feature Map Nearest Neighbor Replacement with Convolutional Neural Networks

Country Status (1)

Country Link
CN (1) CN108171663B (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108898647A (en) * 2018-06-27 2018-11-27 Oppo(重庆)智能科技有限公司 Image processing method, device, mobile terminal and storage medium
CN109087375A (en) * 2018-06-22 2018-12-25 华东师范大学 Image cavity fill method based on deep learning
CN109300128A (en) * 2018-09-29 2019-02-01 聚时科技(上海)有限公司 The transfer learning image processing method of structure is implied based on convolutional Neural net
JP2020005202A (en) * 2018-06-29 2020-01-09 日本放送協会 Video processing device
CN111242874A (en) * 2020-02-11 2020-06-05 北京百度网讯科技有限公司 Image restoration method and device, electronic equipment and storage medium
CN111614974A (en) * 2020-04-07 2020-09-01 上海推乐信息技术服务有限公司 Video image restoration method and system
CN112184566A (en) * 2020-08-27 2021-01-05 北京大学 An image processing method and system for removing attached water mist and water droplets
WO2021003936A1 (en) * 2019-07-05 2021-01-14 平安科技(深圳)有限公司 Image segmentation method, electronic device, and computer-readable storage medium
CN112997479A (en) * 2018-11-15 2021-06-18 Oppo广东移动通信有限公司 Method, system and computer readable medium for processing images across a phase jump connection
CN113330480A (en) * 2019-02-11 2021-08-31 康蒂-特米克微电子有限公司 Modular image restoration method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104025588A (en) * 2011-10-28 2014-09-03 三星电子株式会社 Method and device for intra prediction of video
CN106952239A (en) * 2017-03-28 2017-07-14 厦门幻世网络科技有限公司 image generating method and device
CN107133934A (en) * 2017-05-18 2017-09-05 北京小米移动软件有限公司 Image completion method and device
US20170365038A1 (en) * 2016-06-16 2017-12-21 Facebook, Inc. Producing Higher-Quality Samples Of Natural Images

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104025588A (en) * 2011-10-28 2014-09-03 三星电子株式会社 Method and device for intra prediction of video
US20170365038A1 (en) * 2016-06-16 2017-12-21 Facebook, Inc. Producing Higher-Quality Samples Of Natural Images
CN106952239A (en) * 2017-03-28 2017-07-14 厦门幻世网络科技有限公司 image generating method and device
CN107133934A (en) * 2017-05-18 2017-09-05 北京小米移动软件有限公司 Image completion method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
XU YIFENG: "A Survey of Generative Adversarial Network Theoretical Models and Applications", Journal of Jinhua Polytechnic *
LI CE ET AL.: "Multi-layer Perception Image Dehazing Algorithm Based on Generative Adversarial Mapping Networks", Journal of Computer-Aided Design & Computer Graphics *

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109087375A (en) * 2018-06-22 2018-12-25 华东师范大学 Image cavity fill method based on deep learning
CN109087375B (en) * 2018-06-22 2023-06-23 华东师范大学 Image Hole Filling Method Based on Deep Learning
CN108898647A (en) * 2018-06-27 2018-11-27 Oppo(重庆)智能科技有限公司 Image processing method, device, mobile terminal and storage medium
JP2020005202A (en) * 2018-06-29 2020-01-09 日本放送協会 Video processing device
JP7202087B2 (en) 2018-06-29 2023-01-11 日本放送協会 Video processing device
CN109300128B (en) * 2018-09-29 2022-08-26 聚时科技(上海)有限公司 Transfer learning image processing method based on convolution neural network hidden structure
CN109300128A (en) * 2018-09-29 2019-02-01 聚时科技(上海)有限公司 The transfer learning image processing method of structure is implied based on convolutional Neural net
CN112997479A (en) * 2018-11-15 2021-06-18 Oppo广东移动通信有限公司 Method, system and computer readable medium for processing images across a phase jump connection
CN112997479B (en) * 2018-11-15 2022-11-11 Oppo广东移动通信有限公司 Method, system, and computer-readable medium for processing images across stage skip connections
JP2024041895A (en) * 2019-02-11 2024-03-27 コンティ テミック マイクロエレクトロニック ゲゼルシャフト ミット ベシュレンクテル ハフツング Modular image interpolation method
CN113330480A (en) * 2019-02-11 2021-08-31 康蒂-特米克微电子有限公司 Modular image restoration method
US11961215B2 (en) 2019-02-11 2024-04-16 Conti Temic Microelectronic Gmbh Modular inpainting method
JP2022517849A (en) * 2019-02-11 2022-03-10 コンティ テミック マイクロエレクトロニック ゲゼルシャフト ミット ベシュレンクテル ハフツング Modular image interpolation method
JP7808135B2 (en) 2019-02-11 2026-01-28 コンティ テミック マイクロエレクトロニック ゲゼルシャフト ミット ベシュレンクテル ハフツング Modular image interpolation method
WO2021003936A1 (en) * 2019-07-05 2021-01-14 平安科技(深圳)有限公司 Image segmentation method, electronic device, and computer-readable storage medium
CN111242874A (en) * 2020-02-11 2020-06-05 北京百度网讯科技有限公司 Image restoration method and device, electronic equipment and storage medium
CN111242874B (en) * 2020-02-11 2023-08-29 北京百度网讯科技有限公司 Image restoration method, device, electronic equipment and storage medium
CN111614974A (en) * 2020-04-07 2020-09-01 上海推乐信息技术服务有限公司 Video image restoration method and system
CN111614974B (en) * 2020-04-07 2021-11-30 上海推乐信息技术服务有限公司 Video image restoration method and system
CN112184566B (en) * 2020-08-27 2023-09-01 北京大学 An image processing method and system for removing attached water mist and water droplets
CN112184566A (en) * 2020-08-27 2021-01-05 北京大学 An image processing method and system for removing attached water mist and water droplets

Also Published As

Publication number Publication date
CN108171663B (en) 2021-05-25

Similar Documents

Publication Publication Date Title
CN108171663A (en) The image completion system for the convolutional neural networks that feature based figure arest neighbors is replaced
CN112419327B (en) Image segmentation method, system and device based on generation countermeasure network
CN108520503B (en) A method for repairing face defect images based on autoencoder and generative adversarial network
CN112784954B (en) Method and device for determining neural network
CN110689599B (en) 3D visual saliency prediction method based on non-local enhancement generation countermeasure network
CN111861945B (en) A text-guided image restoration method and system
CN111507150B (en) Face recognition method using multiple image block combination based on deep neural network
JP7263216B2 (en) Object Shape Regression Using Wasserstein Distance
CN109377452B (en) Face image restoration method based on VAE and generation type countermeasure network
CN110276264B (en) Crowd density estimation method based on foreground segmentation graph
CN112767226B (en) Image steganography method and system for automatically learning distortion based on GAN network structure
CN111027464B (en) Iris Recognition Method Jointly Optimized for Convolutional Neural Network and Sequential Feature Coding
CN109829959B (en) Facial analysis-based expression editing method and device
CN113298734B (en) A method and system for image inpainting based on hybrid hole convolution
CN109903236A (en) Face image restoration method and device based on VAE-GAN and similar block search
CN108681689B (en) Frame rate enhanced gait recognition method and device based on generation of confrontation network
CN114187638B (en) A method for facial expression recognition in real environment based on spatial distribution loss function
CN114820381B (en) A digital image restoration method based on structural information embedding and attention mechanism
CN114758293B (en) Deep learning crowd counting method based on auxiliary branch optimization and local density block enhancement
CN115908842A (en) Transformer Partial Discharge Data Enhancement and Recognition Method
CN116189281B (en) End-to-end human behavior classification method and system based on spatiotemporal adaptive fusion
CN112651360A (en) Skeleton action recognition method under small sample
Yang et al. Inversion based on a detached dual-channel domain method for StyleGAN2 embedding
CN114283265A (en) Unsupervised face correcting method based on 3D rotation modeling
CN116452904B (en) Image aesthetic quality determination method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant