WO2020034663A1 - Grid-based image cropping - Google Patents
Grid-based image cropping
- Publication number
- WO2020034663A1 (PCT/CN2019/084959)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- feature map
- crop
- grid
- individual
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20021—Dividing image into blocks, subimages or windows
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20112—Image segmentation details
- G06T2207/20132—Image cropping
Definitions
- SRCC: Spearman's rank-order correlation coefficient.
- the present invention generally relates to image processing. Particularly, the present invention relates to a method for automatically cropping a source image based on grids.
- Cropping plays a key role in recomposing photos. By removing unwanted regions of a photo or readjusting the arrangement of its content, cropping can make a photo much more visually pleasing.
- Manual cropping by users is currently the most popular and reliable approach in most camera devices and image processing applications.
- manually cropping photos is time-consuming, especially when the number of photos is large, and requires the users to know the rules of photographic composition.
- a reliable and efficient automatic cropping system can significantly improve the user experience and benefit billions of users.
- the first category is directed to attention-driven methods. Earlier methods are mostly attention driven, aiming to identify the major subject or the most informative region of an image. Most of them [1] , [18] , [23] , [24] resort to a saliency detection algorithm (e.g. [12] ) to get an attention map of an image, and search a cropping window with the highest attention value. Some methods also employ face detection [30] or gaze interaction [21] to find the important region of an image.
- the second category is directed to aesthetic-driven methods.
- the aesthetic-driven methods improve the attention-based methods by emphasizing the overall aesthetic quality of an image.
- These methods [4] , [7] , [15] , [20] , [27] - [29] , [30] usually design a set of handcrafted features to characterize the widely recognized aesthetic properties or composition rules. Some methods further design quality measures [15] , [30] to evaluate the quality of crop candidates, while some resort to training an aesthetic discriminator [4] , [20] .
- the release of two cropping databases [7] , [27] facilitates the training of discriminative cropping models.
- the handcrafted features are not strong enough to accurately predict image aesthetics [6] .
- the third category is directed to data-driven methods. Most recent methods are data-driven, which train an end-to-end CNN model for image cropping. However, limited by insufficient number of annotated training samples, many methods in this category [2] , [5] , [6] , [8] , [14] , [25] adopt a general aesthetic classifier trained from image aesthetic databases such as the ones described in [19] and [16] to help cropping. However, a general aesthetic classifier trained on full images may not be able to reliably evaluate the crops within one image [3] , [26] . An alternative strategy is to use pairwise learning to construct more training data [3] , [26] . However, annotation of ranking pairs is also very expensive because of the subjective nature of image cropping. Recently, Wei et al. [26] constructed a large scale comparative photo composition dataset using an efficient two-stage annotation protocol, which provides a good training set for pairwise learning. Unfortunately, pairwise learning cannot provide adequate evaluation metrics for image cropping.
- the present invention provides a method for automatically cropping a source image to yield a cropped image.
- the method comprises the step of generating plural crop candidates from the source image and the step of selecting the cropped image from the generated crop candidates.
- a first aspect of the present invention is to provide embodiments for generating the crop candidates from the source image.
- a grid anchor based approach is disclosed and is used to generate the crop candidates. Generating the crop candidates using the disclosed grid anchor based approach shrinks the number of crop candidates from millions to less than one hundred in certain situations, thus making an image cropping system based on this approach very efficient. Redundant and unnecessary crop candidates are also discarded, making cropping results more stable.
- a preselected portion of the source image is gridded to form an image grid having M × N grid units.
- the preselected portion may be the whole source image or a part thereof.
- An individual grid unit has a grid anchor for identifying the individual grid unit.
- the grid anchor is located on or inside the individual grid unit. In one option, the grid anchor is selected to be a center of the individual grid unit.
- a first corner region of size m × n grid units and a second corner region of size m′ × n′ grid units in the image grid are selected.
- the first and second corner regions are diagonally-opposite to each other.
- m, m′, n and n′ are selected such that M-m-m′ and N-n-n′ are positive.
- an individual crop candidate is created as a rectangular portion of the source image having a first anchor point and a second anchor point.
- the second anchor point is diagonally-opposite to the first anchor point.
- the crop candidates are generated via selecting a plurality of different anchor-point pairs each consisting of respective first and second anchor points.
- the first and second anchor points in each anchor-point pair are respectively selected from the first and second pluralities of grid anchors rather than from a first plurality of image pixels in the first corner region and a second plurality of image pixels in the second corner region.
- the first and second anchor points are selected under a first constraint that an area of the individual crop candidate is not less than a predetermined proportion of a whole area of the source image.
- the first constraint may be modified such that the area of the individual crop candidate is not less than the predetermined proportion of a whole area of the image grid.
- the first and second anchor points are selected under a second constraint that an aspect ratio of the individual crop candidate is between a predetermined lower limit and a predetermined upper limit inclusively.
- the aspect ratio is a ratio of a width of the individual crop candidate to a height thereof.
- the first and second constraints may be used together in the selection of anchor-point pair. It is also possible that only one of the constraints is used.
- a second aspect of the present invention is to provide embodiments for selecting the cropped image from the generated crop candidates.
- An approach of using feature information contained in both RoI and RoD in evaluating an individual crop candidate is adopted. This approach is shown to be more effective in evaluating the generated crop candidates and enables the image cropping system to handle more complicated scenes that may appear in the source image.
- a suitability score of using the individual crop candidate as the cropped image is determined. The determination of the suitability score is repeated until respective suitability scores for all the crop candidates are determined. A preferred crop candidate having a maximum suitability score among the crop candidates is selected to be the cropped image.
- the suitability score of using the individual crop candidate as the cropped image is determined as follows.
- a whole feature map of the source image is obtained, where the whole feature map contains feature information of the source image.
- the whole feature map is obtained by processing the source image with a plurality of convolutional layers of a CNN.
- the whole feature map is separated into a first feature map containing feature information of a RoI and a second feature map containing feature information of a RoD.
- the RoI is a region of the whole feature map containing feature information of the individual crop candidate.
- the RoD is a remaining region of the whole feature map without the RoI.
- the first feature map and the second feature map are resized to form a resized first feature map and a resized second feature map, respectively.
- bilinear interpolation is used to resize each of the first and second feature maps.
- the resized first and second feature maps are concatenated to form a combined feature map such that the combined feature map contains feature information of both the RoI and the RoD. If spatial resolutions of the resized first and second feature maps are different, before concatenation, each of the resized first and second feature maps is processed with a fully connected layer of the CNN for adjusting the resized first and second feature maps to have a same spatial resolution.
- the suitability score is determined according to the combined feature map.
- feature information of both the RoI and the RoD is utilized in determining the suitability score.
- FIG. 1 provides an example for illustrating local redundancy in image cropping, indicating that small local changes, such as shifting and scaling, on the cropping window of an acceptable crop (as shown in the bottom-right one in FIG. 1) are very likely to output acceptable crops too.
- FIG. 2 illustrates the grid anchor based formulation of image cropping in accordance with an exemplary embodiment of the present invention, where M and N are the numbers of grid units used for gridding the source image, and m and n define the adopted range of grid anchors for facilitating content preservation.
- FIG. 3 depicts a CNN architecture for image cropping model learning in accordance with an exemplary embodiment of the present invention.
- FIG. 4 depicts a flow diagram showing exemplary steps used for cropping a source image to yield a cropped image, where the steps include generating crop candidates from the source image and selecting the cropped image from the generated crop candidates.
- FIG. 5 depicts a flow diagram showing exemplary steps used for generating the crop candidates from the source image.
- FIG. 6 depicts a flow diagram showing exemplary steps used for selecting the cropped image from the generated crop candidates, where the steps include determining a suitability score of using a crop candidate as the cropped image.
- FIG. 7 depicts a flow diagram showing exemplary steps used for determining the suitability score of using an individual crop candidate as the cropped image.
- FIG. 8 provides examples for illustrating different choices of image grid coverage on the source image.
- FIG. 9 depicts, as examples, two choices of locations for setting up grid anchors of grid units, where the grid anchors are used as anchor points for defining crop candidates.
- FIG. 10 illustrates two arrangements of locating first and second corner regions in an image grid.
- a crop is synonymous to a cropped image, and means an image cropped from a source image. It is also used herein that “a crop candidate” means an image cropped from a source image, where this cropped image is intended to contend with other crop candidates for selection as a finalized cropped image for the source image according to certain selection criteria.
- (1) a novel grid-based automatic cropping system which is very efficient (shrinking the cropping candidates from millions to fewer than one hundred), flexible (applicable to images taken by any camera device with any resolution), and stable in various scenes, and (2) a method for evaluating the cropping candidates by evaluating both the RoI and RoD for each cropping candidate.
- Section 1 elaborates embodiments regarding grid anchor based image cropping.
- Section 2 describes embodiments regarding cropping model learning by a CNN.
- Section 3 provides experimental results. By condensing the details disclosed in Sections 1-3, the present invention is detailed in Section 4.
- Image cropping has a high degree of freedom and there is no unique optimal crop for a given image.
- a reliable cropping algorithm should be able to return acceptable results for different settings (e.g., aspect ratio and resolution) rather than one single output.
- the cropping system should be lightweight and efficient enough to run on devices without powerful computational resources.
- Given an image with resolution H × W, a crop candidate can be defined using its top-left corner (x1, y1) and bottom-right corner (x2, y2), where 1 ≤ x1 < x2 ≤ H and 1 ≤ y1 < y2 ≤ W. It is easy to calculate that the number of potential crop candidates is H(H-1)W(W-1)/4, which is a huge number even for an image of size 100 × 100. Fortunately, by exploiting the following three properties and requirements of image cropping, the search space can be significantly reduced, making automatic image cropping a tractable problem.
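The size of the pixel-level search space can be checked with a short calculation. The following is a minimal sketch, where the function name `count_pixel_crops` is hypothetical: choosing two distinct values of x and two distinct values of y fixes one crop, which gives the count H(H-1)W(W-1)/4 stated above.

```python
from math import comb


def count_pixel_crops(height: int, width: int) -> int:
    """Number of crops with 1 <= x1 < x2 <= H and 1 <= y1 < y2 <= W."""
    # One crop is fixed by an unordered pair of x positions and an
    # unordered pair of y positions: C(H, 2) * C(W, 2) = H(H-1)W(W-1)/4.
    return comb(height, 2) * comb(width, 2)


print(count_pixel_crops(100, 100))  # 24502500
```

Even a 100 × 100 image therefore admits more than 24 million pixel-level crop candidates.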
- Local redundancy. Image cropping is naturally a problem with local redundancy. As illustrated in FIG. 1, a set of similar and acceptable crops can be obtained in the neighborhood of a good crop candidate by shifting and/or scaling the cropping window. Intuitively, we can remove the redundant crop candidates by defining crops on image grid anchors rather than dense pixels.
- the grid anchor based formulation developed herein is illustrated in FIG. 2.
- the image grid 210 is uniform in the sense that all the grid units are of the same size.
- a grid unit 230 is a rectangular block of minimal size in the image grid 210.
- the grid unit 230 is selected such that the grid unit 230 is not as small as one image pixel of the source image so as to achieve that the resultant grid anchors are not as dense as the image pixels.
- the two corners 221, 222 serve as the anchors to generate a representative crop in the neighborhood.
- Such a formulation largely reduces the number of cropping candidates from H(H-1)W(W-1)/4 to M(M-1)N(N-1)/4, which can be several orders of magnitude smaller.
- a good crop should preserve the major content of the source image [7] . Cropping should not dramatically change the subject of the source image and should preserve a large proportion of the source image. Therefore, the cropping window should not be too small, in order to avoid discarding too much image content.
- the anchor points (x1, y1) 221 and (x2, y2) 222 selected to form a crop are located in two regions with m × n grid units on the top-left and bottom-right corners of the source image, respectively, as illustrated in FIG. 2. This further reduces the number of crops from M(M-1)N(N-1)/4 to m(m-1)n(n-1).
- the smallest possible crop 240 generated by the scheme developed herein covers about a ratio of (M-2m) (N-2n) / (MN) in area of the source image, which may still be too small to preserve enough image content.
- the area of potential crops is further constrained to be no smaller than a certain proportion of the whole area of the source image, i.e. S_crop ≥ λ · S_Image (EQN. 1).
- S_crop and S_Image represent the areas of the crop and the source image, respectively, and λ is the predetermined proportion.
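- the aspect ratio of acceptable crop candidates satisfies the following condition: α1 ≤ W_crop / H_crop ≤ α2 (EQN. 3).
- W_crop and H_crop are the width and height of a crop, and α1 and α2 are the predetermined lower and upper limits of the aspect ratio.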
- the final number of crop candidates is less than m (m-1) n (n-1) /2.
- the grid anchor based formulation as developed herein reduces the number of cropping candidates from H²W² to less than m(m-1)n(n-1)/2. It enables us to annotate all of the crop candidates for each image.
- λ is set to 0.5.
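The candidate generation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the disclosure: grid anchors are taken as grid-unit centers, M counts grid columns and N counts grid rows, the same m × n corner size is used at the top-left and bottom-right, and the default values of `M`, `N`, `m`, `n`, `lam` (area proportion), `alpha1` and `alpha2` (aspect-ratio limits) are illustrative choices. The function names are hypothetical.

```python
from itertools import product


def anchor_centers(length, units):
    """Centers of `units` equal grid units spanning [0, length]."""
    step = length / units
    return [(k + 0.5) * step for k in range(units)]


def generate_crop_candidates(width, height, M=12, N=12, m=4, n=4,
                             lam=0.5, alpha1=0.5, alpha2=2.0):
    """Return crops (x1, y1, x2, y2) defined on grid anchors only."""
    if M - 2 * m <= 0 or N - 2 * n <= 0:
        raise ValueError("corner regions must not touch or overlap")
    xs = anchor_centers(width, M)    # assumption: M grid columns
    ys = anchor_centers(height, N)   # assumption: N grid rows
    crops = []
    # First anchor point: any grid anchor in the top-left corner region.
    for x1, y1 in product(xs[:m], ys[:n]):
        # Second anchor point: any grid anchor in the bottom-right region.
        for x2, y2 in product(xs[M - m:], ys[N - n:]):
            w, h = x2 - x1, y2 - y1
            if w * h < lam * width * height:       # area constraint
                continue
            if not alpha1 <= w / h <= alpha2:      # aspect-ratio constraint
                continue
            crops.append((x1, y1, x2, y2))
    return crops
```

With these settings at most (m·n)² = 256 anchor-point pairs are examined per image, and the two constraints discard some of them, so the number of surviving candidates is small enough to score or annotate exhaustively.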
- the dense annotations of multiple crops for each image enable us to define more reasonable metrics to evaluate cropping performance compared to the IoU or BDE used in previous databases [2] , [7] , [27] .
- the SRCC has been widely used to evaluate the rank correlation between the MOS and model’s predictions in image quality and aesthetic assessment [13] , [17] . Denote the MOS and model’s predictions by g and p, respectively.
- the SRCC is defined as SRCC(g, p) = cov(r_g, r_p) / (σ(r_g) · σ(r_p)),
- where r_g and r_p are the rank indexes of g and p, respectively; and cov(·) and σ(·) denote covariance and standard deviation, respectively.
- the average SRCC is computed by SRCC_avg = (1/T) · Σ_{i=1..T} SRCC(g_i, p_i),
- where g_i and p_i denote the MOS and predictions of all crops in the i-th image, and
- T is the number of testing images.
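The rank correlation and its per-image average can be computed as in the following minimal sketch. The helper names `ranks`, `srcc` and `average_srcc` are hypothetical, and assigning tied values the mean of their ranks is an assumption (a common convention, not stated in the text). The 1/n factors of the covariance and the standard deviations cancel, so they are omitted.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def srcc(g, p):
    """Spearman's rank-order correlation between MOS g and predictions p."""
    rg, rp = ranks(g), ranks(p)
    mg, mp = sum(rg) / len(rg), sum(rp) / len(rp)
    cov = sum((a - mg) * (b - mp) for a, b in zip(rg, rp))
    var_g = sum((a - mg) ** 2 for a in rg)
    var_p = sum((b - mp) ** 2 for b in rp)
    # Undefined (division by zero) if either input is constant.
    return cov / (var_g * var_p) ** 0.5


def average_srcc(all_g, all_p):
    """Mean SRCC over T test images; each entry holds one image's crops."""
    return sum(srcc(g, p) for g, p in zip(all_g, all_p)) / len(all_g)
```

A value of 1 means the model ranks the crops of an image in exactly the same order as the MOS, and -1 means the order is fully reversed.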
- Acc_K/N denotes the "return K of top-N" accuracy.
- Truncating at shallower layers can preserve larger spatial resolution but the output feature map 322 may not have enough receptive field to describe large objects in images.
- the output feature map 322 is generated by processing a source image 305 by a plurality of convolutional layers 315. For reference, a cropped image 306 to be considered is also shown on the source image 305.
- the first significant difference between image cropping and object detection is that object detection only focuses on the RoI (viz., the region of a feature map containing feature information of the cropped image 306 where the feature map contains feature information of the source image 305) , while cropping also needs to consider the discarded information (i.e. the RoD, the remaining region of the feature map excluding the RoI) .
- removing distracting information can significantly improve the composition.
- cropping out an important region can dramatically change or even destroy an image.
- such important information is unavailable to a CNN model if only the RoI is considered, while additionally modeling the RoD can effectively solve this problem.
- Let F denote the whole feature map 322 outputted by the feature extraction module 310. It consists of a first feature map 326 in RoI and a second feature map 327 in RoD, denoted by F_RoI and F_RoD, respectively.
- F_RoD 327 is constructed by removing F_RoI 326 from F 322, namely, setting the values of F_RoI 326 to zeros in F 322.
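The separation of the whole feature map into its RoI and RoD parts can be illustrated on a single channel. This is a minimal sketch assuming the feature map is a 2-D list and the RoI is an axis-aligned box given as half-open index ranges; the function name `split_roi_rod` and its parameters are hypothetical.

```python
def split_roi_rod(feature_map, top, left, bottom, right):
    """Split one channel into (RoI part, RoD part).

    The RoI covers rows [top, bottom) and columns [left, right).
    The RoD part keeps the full spatial size with the RoI set to zero.
    """
    roi = [row[left:right] for row in feature_map[top:bottom]]
    rod = [
        [0.0 if (top <= r < bottom and left <= c < right) else value
         for c, value in enumerate(row)]
        for r, row in enumerate(feature_map)
    ]
    return roi, rod


fmap = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
roi, rod = split_roi_rod(fmap, 0, 0, 2, 2)
```

Keeping the RoD at the full spatial size means that the location of the discarded content relative to the crop is preserved for the subsequent alignment step.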
- RoDAlign, using the same bilinear interpolation as RoIAlign, is performed on F_RoD 327, leading to an aligned feature map 337, which has the same spatial resolution as the aligned feature map 336 of the RoI.
- the aligned feature maps 336 and 337 are concatenated along the channel dimension as one aligned feature map (referred to as a combined feature map 342) that contains the feature information in both RoI and RoD.
- the combined feature map 342 is fed into two fully connected layers 352, 353 for final prediction of a MOS 365.
- the Huber loss [11] is employed to regress the MOS of crop candidates:
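For reference, the Huber loss in its standard form is quadratic for small errors and linear for large ones, which makes the regression less sensitive to outlying scores. The following is a minimal sketch for a single crop candidate; the threshold parameter `delta` and its default value of 1.0 are assumptions, as is the function name.

```python
def huber_loss(mos, prediction, delta=1.0):
    """Standard Huber loss between a ground-truth MOS and a predicted score."""
    error = abs(mos - prediction)
    if error <= delta:
        # Quadratic region: behaves like a squared error.
        return 0.5 * error * error
    # Linear region: grows like an absolute error.
    return delta * error - 0.5 * delta * delta
```

For example, with `delta = 1.0` an error of 0.5 gives a loss of 0.125, while an error of 3.0 gives 2.5 rather than the 4.5 of a half squared error.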
- VGG16 models generally outperform the ResNet50 models. It may be because the ResNet50 models are easier to overfit on our database. We thus choose the VGG16 model (truncated at conv5_1 layer) as the feature extraction module in the following experiments.
- Parameter size. There are two key parameters in the image cropping module: the spatial resolution (s × s) of the aligned feature map and the channel dimension (cdim) after dimension reduction. Table 2 below reports the cropping performance of using different s × s and cdim. The number of filters (nfilter) was fixed at 512 for the subsequent fully connected layers. The VGG16 model (truncated at conv5_1) is employed as the feature extraction module for all cases. The parameter size (par (Mbit)) of the image cropping module (including two fully connected layers with s × s × (s*cdim) × 512 and 1 × 1 × 512 × 512 kernels) is reported for each case. We first found that a small s (e.g. 3 or 5) resulted in obviously worse performance.
- RoI and RoD. We conducted an ablation study on the role of RoI and RoD. The results of using only RoI, only RoD and both of them are reported in Table 3 below. As can be seen, modeling only the RoD results in very poor accuracy, modeling only the RoI performs much better, while simultaneously modeling the RoI and RoD achieves the best cropping accuracy in all cases. This corroborates our analysis that cropping needs to consider both the RoI and RoD.
- FIG. 4 depicts a flowchart showing exemplary steps of the disclosed method.
- In a step 410, plural crop candidates are generated from the source image.
- the cropped image is selected from the generated crop candidates in a step 420.
- FIG. 5 depicts a flowchart showing exemplary steps used for generating the crop candidates from the source image.
- the step 410 includes steps 510, 520 and 530.
- a preselected portion of the source image is gridded to form an image grid having M × N grid units.
- the grid units are of the same size and each grid unit is rectangular in shape.
- a square grid unit may be used.
- the size and shape of the individual grid unit are determined by those skilled in the art according to practical situations, such as the predicted size of object of interest in the source image, and the computation power available for performing the image-cropping task.
- An individual grid unit has a grid anchor for identifying the individual grid unit.
- the grid anchor is located on or inside the individual grid unit.
- the grid anchor is selected as a center of the individual grid unit, and this grid anchor is located inside the individual grid unit.
- the grid anchor is a top-left corner of the individual grid unit, so that this grid anchor is located on the individual grid unit, and equivalently, located on a boundary of the individual grid unit.
- FIG. 8 depicts that (a) top left corners 831 of grid units are used as grid anchors, and (b) grid unit centers 832 are used as grid anchors.
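The two choices of grid anchor location can be expressed with one offset parameter. This is a minimal sketch with hypothetical names: `x0` and `y0` give the top-left position of the image grid on the source image (non-zero when only a portion of the image is gridded), and `mode` selects between grid-unit centers and top-left corners.

```python
def anchor_positions(x0, y0, unit_w, unit_h, cols, rows, mode="center"):
    """Return a rows x cols table of (x, y) grid anchors.

    mode="center":   anchor at the center of each grid unit.
    mode="top_left": anchor at the top-left corner of each grid unit.
    """
    if mode not in ("center", "top_left"):
        raise ValueError("mode must be 'center' or 'top_left'")
    offset = 0.5 if mode == "center" else 0.0
    return [
        [(x0 + (c + offset) * unit_w, y0 + (r + offset) * unit_h)
         for c in range(cols)]
        for r in range(rows)
    ]


centers = anchor_positions(0, 0, 10, 20, cols=2, rows=1, mode="center")
corners = anchor_positions(0, 0, 10, 20, cols=2, rows=1, mode="top_left")
```

In both modes there is exactly one anchor per grid unit, so the anchor density is set by the grid-unit size rather than by the pixel resolution.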
- the size of the individual grid unit is not selected to be one image pixel of the source image in order that the resultant grid anchors are not as dense as the image pixels.
- Although Section 1.1 mentions that the whole source image is gridded so that the image grid and the source image have the same size, the present invention is not limited only to the case that the preselected portion of the source image is the whole source image. In some practical situations, it may be advantageous to grid only a certain portion of the source image. For example, if it is known beforehand that an object of interest never appears in a certain region of the source image, such as a region near the image boundary, then the preselected portion of the source image may be selected to be the whole source image excluding the aforementioned certain region. As examples, FIG. 9 shows three image grids 921, 922, 923 each obtained by gridding only a central region of a source image 910.
- a first corner region of size m × n grid units and a second corner region of size m′ × n′ grid units in the image grid are selected.
- the first and second corner regions are diagonally-opposite to each other.
- the first and second corner regions are located at a top-left corner and a bottom-right corner, respectively, of the image grid.
- the first and second corner regions are located at a bottom-left corner and a top-right corner, respectively, of the image grid.
- m, m′, n and n′ are selected such that M-m-m′ and N-n-n′ are positive.
- the present invention is not limited only to this special case. It is possible that m ≠ m′ or n ≠ n′.
- the crop candidates are generated.
- an individual crop candidate is created as a rectangular portion of the source image having a first anchor point and a second anchor point.
- the individual crop candidate is created in a manner as mentioned in Section 1.1. If (x1, y1) and (x2, y2) are respectively the first and second anchor points, viz., the coordinates of image pixels on the source image, then the individual crop candidate is formed as a rectangular portion of the source image having four corners at image-pixel positions (x1, y1), (x1, y2), (x2, y2) and (x2, y1).
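Forming the rectangle from an anchor-point pair can be sketched as below. The code is an illustration under stated assumptions: x indexes pixel columns and y indexes pixel rows, coordinates are integer pixel positions, and both anchor pixels are included in the crop. Taking the minimum and maximum of each coordinate lets the same function serve both diagonal arrangements of the corner regions. The function names are hypothetical.

```python
def crop_box(first_anchor, second_anchor):
    """Return (left, top, right, bottom) for two diagonally-opposite anchors."""
    (x1, y1), (x2, y2) = first_anchor, second_anchor
    left, right = min(x1, x2), max(x1, x2)
    top, bottom = min(y1, y2), max(y1, y2)
    if left == right or top == bottom:
        raise ValueError("anchor points must differ in both coordinates")
    return left, top, right, bottom


def crop_pixels(image, box):
    """Extract the rectangular portion of a 2-D list image (inclusive box)."""
    left, top, right, bottom = box
    return [row[left:right + 1] for row in image[top:bottom + 1]]


image = [[r * 10 + c for c in range(5)] for r in range(4)]
box = crop_box((1, 0), (3, 2))
patch = crop_pixels(image, box)
```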
- first and second anchor points form an anchor-point pair.
- the crop candidates are generated via selecting a plurality of different anchor-point pairs.
- the first and second anchor points in each anchor-point pair are respectively selected from the first and second pluralities of grid anchors, but not from a first plurality of image pixels in the first corner region and a second plurality of image pixels in the second corner region.
- a likelihood of obtaining similar crop candidates among the generated crop candidates is reduced as explained in Section 1.1 regarding local redundancy.
- the first and second anchor points are selected under a first constraint that an area of the individual crop candidate is not less than a predetermined proportion of a whole area of the source image.
- the first constraint is given by EQN. 1 above.
- the predetermined proportion is given by λ of EQN. 1.
- EQN. 2 may be generalized to
- the first constraint may be modified to that the first and second anchor points are selected under a constraint that an area of the individual crop candidate is not less than a predetermined proportion of a whole area of the image grid.
- EQN. 7 is still applicable whereas S_Image in EQN. 1 is replaced by the whole area of the image grid.
- the first and second anchor points are selected under a second constraint that an aspect ratio of the individual crop candidate is between a predetermined lower limit and a predetermined upper limit inclusively.
- the aspect ratio is a ratio of a width of the individual crop candidate to a height thereof.
- the second constraint is reflected in EQN. 3, in which the predetermined lower and upper limits are α1 and α2, respectively.
- the first and second constraints may be used together in the selection of anchor-point pair. It is also possible that only one of the constraints is used. Similarly, if the image grid does not occupy the whole source image, the modified first constraint and the second constraint may or may not be used together in the selection of anchor-point pair.
- the grid anchor based approach via executing the steps 510, 520 and 530 can shrink the number of crop candidates by several orders of magnitude.
- the cropped image is selected from the crop candidates generated in the step 410.
- Although the crop candidates are preferably and advantageously generated through executing the steps 510, 520, 530, in the present invention it is not intended that the crop candidates used in executing the step 420 are generated only from the steps 510, 520, 530.
- the sequence of the steps 510, 520, 530 only forms certain preferred embodiments of the step 410. It is possible that the crop candidates are generated by another embodiment of the step 410.
- FIG. 6 depicts a flow diagram showing exemplary steps used in the step 420 for selecting the cropped image from the generated crop candidates.
- a suitability score of using an individual crop candidate as the cropped image is determined, or estimated, in a step 610.
- the suitability score is to quantify whether the individual crop candidate is suitable to be used as the cropped image.
- the suitability score is a degree of suitability that the individual crop candidate is suitable to be used as the cropped image.
- One example of the suitability score is a MOS.
- the step 610 is repeated until all the generated crop candidates are processed such that each of the generated crop candidates has a respective suitability score determined.
- a preferred crop candidate having a maximum suitability score among the suitability scores determined for the crop candidates is identified in a step 630. The preferred crop candidate is selected as the cropped image.
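The selection loop reduces to scoring every candidate and taking the maximum. The following is a minimal sketch in which `score_fn` stands in for whatever model produces the suitability score (for instance a predicted MOS); the function name `select_cropped_image` and the toy area-based score used in the example are hypothetical.

```python
def select_cropped_image(candidates, score_fn):
    """Score every crop candidate and return (best candidate, best score)."""
    if not candidates:
        raise ValueError("at least one crop candidate is required")
    # Determine a suitability score for each candidate in turn.
    scores = [score_fn(candidate) for candidate in candidates]
    # Identify the candidate with the maximum suitability score.
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores[best]


def toy_area_score(crop):
    """Placeholder score: crop area. A real system would use a learned model."""
    x1, y1, x2, y2 = crop
    return (x2 - x1) * (y2 - y1)


chosen, score = select_cropped_image(
    [(0, 0, 2, 2), (0, 0, 3, 3), (1, 1, 2, 2)], toy_area_score)
```

Because the scores of all candidates are available, the same list can also be sorted to return several top-ranked crops instead of a single one.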
- FIG. 7 depicts a flow diagram showing exemplary steps used for determining the suitability score of using the individual crop candidate as the cropped image.
- the step 610 includes steps 710, 720, 730, 735, 740, 745, 750 and 760.
- the steps 710, 720, 730, 735, 740, 745, 750 and 760 are conveniently implemented by a CNN.
- a whole feature map of the source image is obtained.
- the whole feature map contains feature information of the source image. Details of the step 710 are elaborated in Section 2.1.
- the whole feature map 322 is obtained by processing the source image 305 with a plurality of convolutional layers 315.
- the whole feature map obtained in the step 710 contains feature information of a RoI and of a RoD.
- the RoI is a region of the whole feature map containing feature information of the individual crop candidate (i.e. the cropped portion of the source image) .
- the RoD is a remaining region of the whole feature map without the RoI.
- the whole feature map is separated into a first feature map and a second feature map.
- the first feature map contains feature information of the RoI.
- the second feature map contains feature information of the RoD.
- the whole feature map 322 consists of the first feature map 326 and the second feature map 327. Implementation details of separating the first and second feature maps 326, 327 from the whole feature map 322 are provided in Section 2.2.
- the first feature map and the second feature map are resized to form a resized first feature map in the step 730 and a resized second feature map in the step 735, respectively. Details of the steps 730 and 735 are elaborated in Section 2.2.
- each of the first and second feature maps is resized by bilinear interpolation.
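Resizing by bilinear interpolation can be illustrated on one channel. This is a simplified sketch and not the RoIAlign or RoDAlign operation itself: it assumes a 2-D list input and corner-aligned sampling (the first and last output samples coincide with the first and last input samples), and the function name is hypothetical.

```python
def bilinear_resize(fmap, out_h, out_w):
    """Resize a 2-D list to out_h x out_w using bilinear interpolation."""
    in_h, in_w = len(fmap), len(fmap[0])
    out = []
    for i in range(out_h):
        # Fractional source row for output row i (corner-aligned).
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        dy = y - y0
        row = []
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            dx = x - x0
            # Interpolate along x on the two neighboring rows, then along y.
            upper = fmap[y0][x0] * (1 - dx) + fmap[y0][x1] * dx
            lower = fmap[y1][x0] * (1 - dx) + fmap[y1][x1] * dx
            row.append(upper * (1 - dy) + lower * dy)
        out.append(row)
    return out


resized = bilinear_resize([[0, 2], [4, 6]], 3, 3)
```

Resizing both feature maps to the same output size is what allows them to be concatenated directly in the following step.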
- the resized first and second feature maps are directly concatenated to form a combined feature map in the step 750.
- the combined feature map contains feature information of both the RoI and the RoD.
- the first feature map 336 and the second feature map 337 both under the same spatial resolution are directly concatenated to form the combined feature map 342.
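Direct concatenation along the channel dimension can be sketched as follows, assuming each feature map is stored as a list of channels, each channel being a 2-D list. The function name `concat_channels` is hypothetical; the shape check reflects the requirement that both maps share the same spatial resolution.

```python
def concat_channels(first_map, second_map):
    """Concatenate two channel-first feature maps (C x H x W nested lists)."""
    def spatial_shape(feature_map):
        return len(feature_map[0]), len(feature_map[0][0])

    if spatial_shape(first_map) != spatial_shape(second_map):
        raise ValueError("feature maps must have the same spatial resolution")
    # Channels of the RoI map come first, followed by those of the RoD map.
    return first_map + second_map


roi_map = [[[1, 2], [3, 4]]]                       # 1 channel, 2 x 2
rod_map = [[[0, 0], [0, 0]], [[5, 5], [5, 5]]]     # 2 channels, 2 x 2
combined = concat_channels(roi_map, rod_map)
```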
- the resized first and second feature maps are adjusted to align in spatial resolution in the steps 740 and 745, respectively before concatenation proceeds in the step 750.
- each of the resized first and second feature maps is processed by a fully connected layer in a CNN (not shown in FIG. 3) for adjusting the resized first and second feature maps to have a same spatial resolution.
- the suitability score is determined according to the combined feature map.
- Feature information of both the RoI and the RoD contained in the combined feature map is advantageously used in determining the suitability score.
- one or more fully connected layers in a CNN are used for determining the suitability score from the combined feature map. As an example shown in the CNN implementation of FIG. 3, the two fully connected layers 352, 353 are used to process the combined feature map 342 to give a resultant MOS.
- the embodiments disclosed herein may be implemented using computing devices, such as computers, computing servers, general purpose processors, specialized computing processors, digital signal processors, processors specialized in computing convolution products for images, programmable logic devices and field programmable gate arrays (FPGAs) , where the computing devices are configured or programmed according to the teachings of the present disclosure.
- Computer instructions or software codes running in the computing devices can readily be prepared by practitioners skilled in the software or electronic art based on the teachings of the present disclosure.
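The resize, concatenate and score steps described in the list above can be sketched as follows. This is a minimal illustration in plain Python, not the patented CNN: the function names, the single-channel maps, the flattening before concatenation, the 3x3 target size and the toy weights are all assumptions made for the example.

```python
def bilinear_resize(fmap, out_h, out_w):
    """Resize a 2-D feature map (list of rows) by bilinear interpolation."""
    in_h, in_w = len(fmap), len(fmap[0])
    out = []
    for i in range(out_h):
        # Corner-aligned mapping from output row to fractional input row.
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        dy = y - y0
        row = []
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            dx = x - x0
            top = fmap[y0][x0] * (1 - dx) + fmap[y0][x1] * dx
            bottom = fmap[y1][x0] * (1 - dx) + fmap[y1][x1] * dx
            row.append(top * (1 - dy) + bottom * dy)
        out.append(row)
    return out


def concatenate(roi_map, rod_map):
    """Flatten both resized maps and join them into one combined vector."""
    flat_roi = [v for row in roi_map for v in row]
    flat_rod = [v for row in rod_map for v in row]
    return flat_roi + flat_rod


def fully_connected(vec, weights, bias, relu=True):
    """One dense layer: out[k] = sum_i weights[k][i] * vec[i] + bias[k]."""
    out = [sum(w * v for w, v in zip(w_row, vec)) + b
           for w_row, b in zip(weights, bias)]
    return [max(0.0, o) for o in out] if relu else out


def suitability_score(roi_map, rod_map, params, size=(3, 3)):
    """Resize RoI and RoD maps, combine them, and apply two dense layers."""
    roi = bilinear_resize(roi_map, *size)
    rod = bilinear_resize(rod_map, *size)
    combined = concatenate(roi, rod)
    hidden = fully_connected(combined, params["w1"], params["b1"])
    return fully_connected(hidden, params["w2"], params["b2"], relu=False)[0]


# Hypothetical toy weights: one hidden unit summing all 18 combined features.
toy_params = {"w1": [[1.0] * 18], "b1": [0.0], "w2": [[0.5]], "b2": [1.0]}
score = suitability_score([[0, 2], [4, 6]], [[1, 1], [1, 1]], toy_params)
```

Because both maps are brought to the same size before they are joined, the combined vector has a fixed length regardless of the crop's shape, which is what lets fixed-size fully connected layers score every candidate.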
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Analysis (AREA)
Abstract
A source image is automatically cropped by generating crop candidates from the source image and selecting a cropped image from the crop candidates. The crop candidates are generated by gridding the source image to form an image grid having grid units each having a grid anchor, by selecting two corner regions in the image grid, and by creating an individual crop candidate having two corner points respectively selected from two pluralities of grid anchors in the two corner regions. This method reduces the number of crop candidates generated while reducing the likelihood of obtaining similar crop candidates. A suitability score of using the individual crop candidate as the cropped image is determined by using feature information in both the region of interest (RoI) and the region of discard (RoD). The RoI is a feature map region containing feature information of the individual crop candidate, and the RoD is the remaining region without the RoI.
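The candidate generation summarized in the abstract can be sketched as follows. This is a rough illustration under stated assumptions, not the claimed procedure: anchors are placed at the centre of each grid unit, the two corner regions are taken as the top-left and bottom-right blocks of the grid, and all function and parameter names are hypothetical.

```python
from itertools import product


def grid_anchors(width, height, cols, rows):
    """Return anchors[r][c] = (x, y), the centre of each grid unit (assumed)."""
    unit_w, unit_h = width / cols, height / rows
    return [[((c + 0.5) * unit_w, (r + 0.5) * unit_h) for c in range(cols)]
            for r in range(rows)]


def crop_candidates(width, height, cols, rows, region_cols, region_rows):
    """Pair every anchor in the top-left corner region with every anchor in
    the bottom-right corner region; each pair defines one crop rectangle."""
    anchors = grid_anchors(width, height, cols, rows)
    top_left = [anchors[r][c]
                for r in range(region_rows) for c in range(region_cols)]
    bottom_right = [anchors[r][c]
                    for r in range(rows - region_rows, rows)
                    for c in range(cols - region_cols, cols)]
    return [(x1, y1, x2, y2)
            for (x1, y1), (x2, y2) in product(top_left, bottom_right)
            if x2 > x1 and y2 > y1]  # keep only non-degenerate rectangles


# A 120x80 image on a 12x8 grid with 2x2 corner regions gives 4 * 4 candidates.
candidates = crop_candidates(120, 80, 12, 8, 2, 2)
```

Restricting the corner points to anchors inside two corner regions is what keeps the candidate count small: the number of crops grows with the product of the two region sizes rather than with the number of pixel pairs in the image.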
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US201862717931P | 2018-08-13 | 2018-08-13 | |
| US62/717,931 | 2018-08-13 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2020034663A1 (fr) | 2020-02-20 |
Family
ID=69525051
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2019/084959 (Ceased), WO2020034663A1 (fr) | 2018-08-13 | 2019-04-29 | Grid-based image cropping |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2020034663A1 (fr) |
Cited By (30)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111553364A (zh) * | 2020-04-28 | 2020-08-18 | 支付宝(杭州)信息技术有限公司 | Picture processing method and apparatus |
| CN113159028A (zh) * | 2020-06-12 | 2021-07-23 | 杭州喔影网络科技有限公司 | Saliency-aware image cropping method and apparatus, computing device, and storage medium |
| US11403069B2 (en) | 2017-07-24 | 2022-08-02 | Tesla, Inc. | Accelerated mathematical engine |
| US11409692B2 (en) | 2017-07-24 | 2022-08-09 | Tesla, Inc. | Vector computational unit |
| US11487288B2 (en) | 2017-03-23 | 2022-11-01 | Tesla, Inc. | Data synthesis for autonomous control systems |
| WO2022256020A1 (fr) * | 2021-06-04 | 2022-12-08 | Hewlett-Packard Development Company, L.P. | Image re-composition |
| US11537811B2 (en) | 2018-12-04 | 2022-12-27 | Tesla, Inc. | Enhanced object detection for autonomous vehicles based on field view |
| CN115546092A (zh) * | 2021-06-30 | 2022-12-30 | 北京电子科技职业学院 | Block-structure-based rectangular piece cutting method, apparatus, device, and storage medium |
| US11561791B2 (en) | 2018-02-01 | 2023-01-24 | Tesla, Inc. | Vector computational unit receiving data elements in parallel from a last row of a computational array |
| US11562231B2 (en) | 2018-09-03 | 2023-01-24 | Tesla, Inc. | Neural networks for embedded devices |
| US11567514B2 (en) | 2019-02-11 | 2023-01-31 | Tesla, Inc. | Autonomous and user controlled vehicle summon to a target |
| US11610117B2 (en) | 2018-12-27 | 2023-03-21 | Tesla, Inc. | System and method for adapting a neural network model on a hardware platform |
| US11636333B2 (en) | 2018-07-26 | 2023-04-25 | Tesla, Inc. | Optimizing neural network structures for embedded systems |
| US11665108B2 (en) | 2018-10-25 | 2023-05-30 | Tesla, Inc. | QoS manager for system on a chip communications |
| US11681649B2 (en) | 2017-07-24 | 2023-06-20 | Tesla, Inc. | Computational array microprocessor system using non-consecutive data formatting |
| US11734562B2 (en) | 2018-06-20 | 2023-08-22 | Tesla, Inc. | Data pipeline and deep learning system for autonomous driving |
| US11748620B2 (en) | 2019-02-01 | 2023-09-05 | Tesla, Inc. | Generating ground truth for machine learning from time series elements |
| US11790664B2 (en) | 2019-02-19 | 2023-10-17 | Tesla, Inc. | Estimating object properties using visual image data |
| CN116993734A (zh) * | 2023-09-27 | 2023-11-03 | 深圳市博硕科技股份有限公司 | Battery thermal-insulation cotton cutting quality prediction system based on visual imaging analysis |
| US11816585B2 (en) | 2018-12-03 | 2023-11-14 | Tesla, Inc. | Machine learning models operating at different frequencies for autonomous vehicles |
| CN117067772A (zh) * | 2023-08-09 | 2023-11-17 | 深圳劲鑫科技有限公司 | Method, apparatus, device, and medium for generating a jet-printed image |
| US11841434B2 (en) | 2018-07-20 | 2023-12-12 | Tesla, Inc. | Annotation cross-labeling for autonomous control systems |
| US11893393B2 (en) | 2017-07-24 | 2024-02-06 | Tesla, Inc. | Computational array microprocessor system with hardware arbiter managing memory requests |
| US11893774B2 (en) | 2018-10-11 | 2024-02-06 | Tesla, Inc. | Systems and methods for training machine models with augmented data |
| US12014553B2 (en) | 2019-02-01 | 2024-06-18 | Tesla, Inc. | Predicting three-dimensional features for autonomous driving |
| US12307350B2 (en) | 2018-01-04 | 2025-05-20 | Tesla, Inc. | Systems and methods for hardware-based pooling |
| US12462575B2 (en) | 2021-08-19 | 2025-11-04 | Tesla, Inc. | Vision-based machine learning model for autonomous driving with adjustable virtual camera |
| US12522243B2 (en) | 2021-08-19 | 2026-01-13 | Tesla, Inc. | Vision-based system training with simulated content |
| US12591240B2 (en) | 2016-12-29 | 2026-03-31 | Tesla, Inc. | Multi-channel sensor simulation for autonomous control systems |
| US12618976B2 (en) | 2023-12-08 | 2026-05-05 | Tesla, Inc. | Annotation cross-labeling for autonomous control systems |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN101167363A (zh) * | 2005-03-31 | 2008-04-23 | 欧几里得发现有限责任公司 | Apparatus and method for processing video data |
| CN102736837A (zh) * | 2011-05-10 | 2012-10-17 | 新奥特(北京)视频技术有限公司 | Grid-based subtitle editing method |
| WO2016207875A1 (fr) * | 2015-06-22 | 2016-12-29 | Photomyne Ltd. | System and method for detecting objects in an image |
- 2019-04-29: WO application PCT/CN2019/084959, published as WO2020034663A1 (fr); status: not active, ceased
Cited By (50)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US12591240B2 (en) | 2016-12-29 | 2026-03-31 | Tesla, Inc. | Multi-channel sensor simulation for autonomous control systems |
| US11487288B2 (en) | 2017-03-23 | 2022-11-01 | Tesla, Inc. | Data synthesis for autonomous control systems |
| US12020476B2 (en) | 2017-03-23 | 2024-06-25 | Tesla, Inc. | Data synthesis for autonomous control systems |
| US12554467B2 (en) | 2017-07-24 | 2026-02-17 | Tesla, Inc. | Accelerated mathematical engine |
| US11403069B2 (en) | 2017-07-24 | 2022-08-02 | Tesla, Inc. | Accelerated mathematical engine |
| US11409692B2 (en) | 2017-07-24 | 2022-08-09 | Tesla, Inc. | Vector computational unit |
| US12536131B2 (en) | 2017-07-24 | 2026-01-27 | Tesla, Inc. | Vector computational unit |
| US11893393B2 (en) | 2017-07-24 | 2024-02-06 | Tesla, Inc. | Computational array microprocessor system with hardware arbiter managing memory requests |
| US12086097B2 (en) | 2017-07-24 | 2024-09-10 | Tesla, Inc. | Vector computational unit |
| US12216610B2 (en) | 2017-07-24 | 2025-02-04 | Tesla, Inc. | Computational array microprocessor system using non-consecutive data formatting |
| US11681649B2 (en) | 2017-07-24 | 2023-06-20 | Tesla, Inc. | Computational array microprocessor system using non-consecutive data formatting |
| US12307350B2 (en) | 2018-01-04 | 2025-05-20 | Tesla, Inc. | Systems and methods for hardware-based pooling |
| US12455739B2 (en) | 2018-02-01 | 2025-10-28 | Tesla, Inc. | Instruction set architecture for a vector computational unit |
| US11797304B2 (en) | 2018-02-01 | 2023-10-24 | Tesla, Inc. | Instruction set architecture for a vector computational unit |
| US11561791B2 (en) | 2018-02-01 | 2023-01-24 | Tesla, Inc. | Vector computational unit receiving data elements in parallel from a last row of a computational array |
| US11734562B2 (en) | 2018-06-20 | 2023-08-22 | Tesla, Inc. | Data pipeline and deep learning system for autonomous driving |
| US11841434B2 (en) | 2018-07-20 | 2023-12-12 | Tesla, Inc. | Annotation cross-labeling for autonomous control systems |
| US11636333B2 (en) | 2018-07-26 | 2023-04-25 | Tesla, Inc. | Optimizing neural network structures for embedded systems |
| US12079723B2 (en) | 2018-07-26 | 2024-09-03 | Tesla, Inc. | Optimizing neural network structures for embedded systems |
| US11562231B2 (en) | 2018-09-03 | 2023-01-24 | Tesla, Inc. | Neural networks for embedded devices |
| US12346816B2 (en) | 2018-09-03 | 2025-07-01 | Tesla, Inc. | Neural networks for embedded devices |
| US11983630B2 (en) | 2018-09-03 | 2024-05-14 | Tesla, Inc. | Neural networks for embedded devices |
| US11893774B2 (en) | 2018-10-11 | 2024-02-06 | Tesla, Inc. | Systems and methods for training machine models with augmented data |
| US11665108B2 (en) | 2018-10-25 | 2023-05-30 | Tesla, Inc. | QoS manager for system on a chip communications |
| US12367405B2 (en) | 2018-12-03 | 2025-07-22 | Tesla, Inc. | Machine learning models operating at different frequencies for autonomous vehicles |
| US11816585B2 (en) | 2018-12-03 | 2023-11-14 | Tesla, Inc. | Machine learning models operating at different frequencies for autonomous vehicles |
| US12198396B2 (en) | 2018-12-04 | 2025-01-14 | Tesla, Inc. | Enhanced object detection for autonomous vehicles based on field view |
| US11537811B2 (en) | 2018-12-04 | 2022-12-27 | Tesla, Inc. | Enhanced object detection for autonomous vehicles based on field view |
| US11908171B2 (en) | 2018-12-04 | 2024-02-20 | Tesla, Inc. | Enhanced object detection for autonomous vehicles based on field view |
| US11610117B2 (en) | 2018-12-27 | 2023-03-21 | Tesla, Inc. | System and method for adapting a neural network model on a hardware platform |
| US12136030B2 (en) | 2018-12-27 | 2024-11-05 | Tesla, Inc. | System and method for adapting a neural network model on a hardware platform |
| US12223428B2 (en) | 2019-02-01 | 2025-02-11 | Tesla, Inc. | Generating ground truth for machine learning from time series elements |
| US12014553B2 (en) | 2019-02-01 | 2024-06-18 | Tesla, Inc. | Predicting three-dimensional features for autonomous driving |
| US11748620B2 (en) | 2019-02-01 | 2023-09-05 | Tesla, Inc. | Generating ground truth for machine learning from time series elements |
| US11567514B2 (en) | 2019-02-11 | 2023-01-31 | Tesla, Inc. | Autonomous and user controlled vehicle summon to a target |
| US12164310B2 (en) | 2019-02-11 | 2024-12-10 | Tesla, Inc. | Autonomous and user controlled vehicle summon to a target |
| US12236689B2 (en) | 2019-02-19 | 2025-02-25 | Tesla, Inc. | Estimating object properties using visual image data |
| US11790664B2 (en) | 2019-02-19 | 2023-10-17 | Tesla, Inc. | Estimating object properties using visual image data |
| CN111553364A (zh) * | 2020-04-28 | 2020-08-18 | 支付宝(杭州)信息技术有限公司 | Picture processing method and apparatus |
| CN111553364B (zh) * | 2020-04-28 | 2022-10-11 | 支付宝(杭州)信息技术有限公司 | Picture processing method and apparatus |
| CN113159028B (zh) * | 2020-06-12 | 2022-04-05 | 杭州喔影网络科技有限公司 | Saliency-aware image cropping method and apparatus, computing device, and storage medium |
| CN113159028A (zh) * | 2020-06-12 | 2021-07-23 | 杭州喔影网络科技有限公司 | Saliency-aware image cropping method and apparatus, computing device, and storage medium |
| WO2022256020A1 (fr) * | 2021-06-04 | 2022-12-08 | Hewlett-Packard Development Company, L.P. | Image re-composition |
| CN115546092A (zh) * | 2021-06-30 | 2022-12-30 | 北京电子科技职业学院 | Block-structure-based rectangular piece cutting method, apparatus, device, and storage medium |
| US12462575B2 (en) | 2021-08-19 | 2025-11-04 | Tesla, Inc. | Vision-based machine learning model for autonomous driving with adjustable virtual camera |
| US12522243B2 (en) | 2021-08-19 | 2026-01-13 | Tesla, Inc. | Vision-based system training with simulated content |
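| CN117067772A (zh) * | 2023-08-09 | 2023-11-17 | 深圳劲鑫科技有限公司 | Method, apparatus, device, and medium for generating a jet-printed image |
| CN116993734A (zh) * | 2023-09-27 | 2023-11-03 | 深圳市博硕科技股份有限公司 | Battery thermal-insulation cotton cutting quality prediction system based on visual imaging analysis |
| CN116993734B (zh) * | 2023-09-27 | 2023-12-01 | 深圳市博硕科技股份有限公司 | Battery thermal-insulation cotton cutting quality prediction system based on visual imaging analysis |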
| US12618976B2 (en) | 2023-12-08 | 2026-05-05 | Tesla, Inc. | Annotation cross-labeling for autonomous control systems |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| WO2020034663A1 (fr) | Grid-based image cropping | |
| Zeng et al. | Reliable and efficient image cropping: A grid anchor based approach | |
| Zeng et al. | Grid anchor based image cropping: A new benchmark and an efficient model | |
| US8594385B2 (en) | Predicting the aesthetic value of an image | |
| RU2628192C2 (ru) | Device for semantic classification and search in archives of digitized film materials | |
| US8422832B2 (en) | Annotating images | |
| Zhang et al. | Probabilistic graphlet transfer for photo cropping | |
| US6738494B1 (en) | Method for varying an image processing path based on image emphasis and appeal | |
| US9892342B2 (en) | Automatic image product creation for user accounts comprising large number of images | |
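| CN111062871A (zh) | Image processing method and apparatus, computer device, and readable storage medium | |
| CN110347868B (zh) | Method and system for image search | |
| WO2022217876A1 (fr) | Instance segmentation method and apparatus, electronic device, and storage medium | |
| Wang et al. | Aspect-ratio-preserving multi-patch image aesthetics score prediction | |
| JP4545641B2 (ja) | Similar image retrieval method, similar image retrieval system, similar image retrieval program, and recording medium | |
| CN112101376B (zh) | Image processing method and apparatus, electronic device, and computer-readable medium | |
| CN110866938A (zh) | Fully automatic video moving-object segmentation method | |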
| Celona et al. | A grid anchor based cropping approach exploiting image aesthetics, geometric composition, and semantics | |
| Yang et al. | Focusing on your subject: Deep subject-aware image composition recommendation networks | |
| US8270731B2 (en) | Image classification using range information | |
| CN110210572B (zh) | Image classification method and apparatus, storage medium, and device | |
| Kuzovkin et al. | Context in photo albums: Understanding and modeling user behavior in clustering and selection | |
| Tang et al. | Image retargetability | |
| CN111353433A (zh) | Crowd counting method based on adversarial scale-consistency pursuit with feature self-learning | |
| US9519826B2 (en) | Automatic image product creation for user accounts comprising large number of images | |
| CN118411536A (zh) | Video similarity determination method and apparatus based on multimodal feature fusion |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 19849197; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: PCT application non-entry in European phase | Ref document number: 19849197; Country of ref document: EP; Kind code of ref document: A1 |