WO2022152116A1 - Image processing method, apparatus, device, storage medium, and computer program product - Google Patents
- Publication number: WO2022152116A1
- Application number: PCT/CN2022/071306
- Authority: WO, WIPO (PCT)
- Prior art keywords: image, area, ternary, target, region
- Legal status: Ceased (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- All classifications fall under G (PHYSICS) > G06 (COMPUTING OR CALCULATING; COUNTING) > G06T (IMAGE DATA PROCESSING OR GENERATION, IN GENERAL):
- G06T7/00 Image analysis > G06T7/10 Segmentation; Edge detection > G06T7/11 Region-based segmentation
- G06T11/00 Two-dimensional [2D] image generation > G06T11/20 Drawing from basic elements > G06T11/23 Drawing from basic elements using straight lines or curves
- G06T7/00 Image analysis > G06T7/10 Segmentation; Edge detection > G06T7/13 Edge detection
- G06T7/00 Image analysis > G06T7/10 Segmentation; Edge detection > G06T7/194 Segmentation; Edge detection involving foreground-background segmentation
- G06T7/00 Image analysis > G06T7/90 Determination of colour characteristics
- G06T2207/00 Indexing scheme for image analysis or image enhancement > G06T2207/20 Special algorithmic details > G06T2207/20084 Artificial neural networks [ANN]
- G06T2207/00 Indexing scheme for image analysis or image enhancement > G06T2207/30 Subject of image; Context of image processing > G06T2207/30196 Human being; Person
- G06T2207/00 Indexing scheme for image analysis or image enhancement > G06T2207/30 Subject of image; Context of image processing > G06T2207/30196 Human being; Person > G06T2207/30201 Face
Definitions
- the present application relates to the technical field of artificial intelligence, and in particular, to an image processing method, apparatus, device, storage medium and computer program product.
- image matting is a widely used image processing technology, which specifically refers to separating the foreground area in the image from the background area in the image.
- segmentation is usually used to realize matting: each pixel in the image is classified to obtain block segmentation results of different categories, thereby obtaining the foreground area in the image, such as a portrait area or a building area.
- Embodiments of the present application provide an image processing method, apparatus, device, storage medium, and computer program product.
- an image processing method executed by a computer device, the method comprising:
- the foreground area in the first image is the area where the target object is located in the original image
- the second image is a segmented image of the first target area of the target object.
- the third image is a segmented image of the second target area of the target object; the sub-areas of the foreground area include the first target area and the second target area;
- a target ternary image is generated, the target ternary image includes the foreground area and a line drawing area, and the line drawing area is obtained by drawing lines on the outline of the foreground area; different sub-areas of the foreground area correspond to different line widths;
- the target object in the original image is cut out to obtain a target image including the target object.
- an image processing device comprising:
- the image segmentation module is used to perform image semantic segmentation on the original image to obtain a first image, a second image and a third image.
- the foreground area in the first image is the area where the target object is located in the original image
- the second image is the segmented image of the first target area of the target object
- the third image is the segmented image of the second target area of the target object
- the sub-area of the foreground area includes the first target area and the second target area
- a ternary image generation module configured to generate a target ternary image based on the first image, the second image and the third image, where the target ternary image includes the foreground area and the line drawing area, and the line drawing area is obtained by drawing lines on the outline of the foreground area; different sub-areas of the foreground area correspond to different line widths;
- the matting module is configured to perform matting processing on the target object in the original image based on the target ternary image to obtain a target image including the target object.
- a computer device comprising one or more processors and a memory for storing at least one computer-readable instruction, the at least one computer-readable instruction being loaded and executed by the one or more processors to implement the operations performed in the image processing method in the embodiments of the present application.
- one or more computer-readable storage media having stored therein at least one computer-readable instruction that is loaded and executed by one or more processors to implement the operations performed in the image processing method in the embodiments of the present application.
- a computer program product comprising computer readable instructions stored in a computer readable storage medium.
- One or more processors of the computer device read the computer-readable instructions from the computer-readable storage medium, and the one or more processors execute the computer-readable instructions, causing the computer device to perform the image processing method provided in the above embodiments.
- FIG. 1 is a schematic structural diagram of a high-resolution network provided according to an embodiment of the present application.
- FIG. 2 is a schematic structural diagram of an object context feature representation provided according to an embodiment of the present application.
- FIG. 3 is a schematic diagram of an implementation environment of an image processing method provided according to an embodiment of the present application.
- FIG. 5 is a flowchart of another image processing method provided according to an embodiment of the present application.
- FIG. 6 is a schematic diagram of an image semantic segmentation result provided according to an embodiment of the present application.
- FIG. 7 is a schematic diagram of a first ternary image provided according to an embodiment of the present application.
- FIG. 8 is a schematic diagram of a second ternary image provided according to an embodiment of the present application.
- FIG. 9 is a schematic diagram of a third ternary image provided according to an embodiment of the present application.
- FIG. 10 is a schematic diagram of a target ternary image provided according to an embodiment of the present application.
- FIG. 11 is a schematic diagram of a matting model provided according to an embodiment of the present application.
- FIG. 12 is a schematic diagram of a target image provided according to an embodiment of the present application.
- FIG. 13 is a schematic diagram of an image processing method provided according to an embodiment of the present application.
- FIG. 14 is a schematic structural diagram of an image processing apparatus provided according to an embodiment of the present application.
- FIG. 15 is a schematic structural diagram of a terminal according to an embodiment of the present application.
- the terms “first” and “second” are used to distinguish the same or similar items that have substantially the same function. It should be understood that there is no logical or timing dependency among “first”, “second” and “nth”, and that the number and execution order are not limited. It will also be understood that, although the following description uses the terms first, second, etc. to describe various elements, these elements should not be limited by the terms.
- first image could be referred to as a second image
- second image could be referred to as a first image
- Both the first image and the second image may be images, and in some cases, may be separate and distinct images.
- “at least one” refers to one or more.
- for example, at least one image may be any integer number of images greater than or equal to one, such as one image, two images, or three images.
- “multiple” refers to two or more.
- for example, multiple images may be any integer number of images greater than or equal to two, such as two images or three images.
- the image processing solutions provided in the embodiments of the present application may use computer vision technology in artificial intelligence technology.
- the semantic segmentation processing in this application uses computer vision technology.
- a high-resolution network can be used to extract image feature information
- an Object-Contextual Representations (OCR) technology can be used to calculate the semantic category of each pixel in the image.
- High Resolution Network (HRNET): a computational model used to obtain image feature information, which maintains high-resolution representations throughout all operations.
- HRNET starts with a set of high-resolution convolutions, and then progressively adds low-resolution convolution branches and connects them in parallel.
- FIG. 1 is a schematic structural diagram of a high-resolution network provided by the present application. As shown in FIG. 1, the network connects feature maps of different resolutions in parallel, and each resolution occupies one channel. Throughout the whole process, information is continuously exchanged between the parallel channels through multi-resolution fusion.
- OCR is a computational model for characterizing the semantic categories of pixels in an image.
- FIG. 2 is a schematic structural diagram of an object context feature representation provided by the present application. As shown in FIG. 2, the computation proceeds in five steps:
- first, a rough semantic segmentation result, that is, the soft object region (Soft Object Region), is obtained through the middle layer of the backbone network;
- secondly, K groups of vectors (K>1), that is, the object region representations (Object Region Representations), are calculated from the pixel representation (Pixel Representation) output by the deep layer of the backbone network and the soft object region, where each vector is the feature representation corresponding to one semantic category;
- thirdly, the relationship matrix between the pixel features and the object region feature representations is calculated;
- fourthly, the object region features are weighted and summed according to the values in the relationship matrix that relate the pixel feature of each pixel to the object region feature representations, to obtain the object contextual feature representation, that is, OCR;
- finally, based on the OCR and the pixel features, an augmented representation (Augmented Representation) with enhanced contextual information is obtained, and the augmented representation can be used to predict the semantic category of each pixel.
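The five steps above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the OCR model itself: the learned transformation layers of a real network are omitted, the relationship is taken to be a softmax over dot products, and the function and variable names (`object_contextual_representation`, `pixel_feats`, `soft_regions`) are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def object_contextual_representation(pixel_feats, soft_regions):
    """pixel_feats: N pixels x C features (deep-layer pixel representation).
    soft_regions: K regions x N per-pixel scores (coarse segmentation, step 1).
    Returns N augmented vectors of length 2 * C."""
    n, c = len(pixel_feats), len(pixel_feats[0])

    # Step 2: one representation vector per object region, computed as the
    # average of pixel features weighted by the normalized region scores.
    region_reps = []
    for scores in soft_regions:
        weights = softmax(scores)
        region_reps.append(
            [sum(weights[i] * pixel_feats[i][d] for i in range(n)) for d in range(c)]
        )

    augmented = []
    for feat in pixel_feats:
        # Step 3: relationship between this pixel and every object region
        # (assumed here to be a softmax over dot products).
        relation = softmax(
            [sum(a * b for a, b in zip(feat, rep)) for rep in region_reps]
        )
        # Step 4: weighted sum of region representations gives the context.
        context = [
            sum(relation[k] * region_reps[k][d] for k in range(len(region_reps)))
            for d in range(c)
        ]
        # Step 5: augmented representation = pixel feature joined with context.
        augmented.append(list(feat) + context)
    return augmented


pixels = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
regions = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]]  # K = 2 soft object regions
augmented = object_contextual_representation(pixels, regions)
```

In a real model the augmented vectors would be passed to a classifier that predicts the semantic category of each pixel; that stage is left out here.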
- Semantic Segmentation: for an input image, the process of dividing pixels with the same semantics into the same part or region, based on a semantic understanding of each pixel, to obtain several different semantic regions.
- Foreground: the subject in the image, such as a portrait in a portrait shot.
- Image Matting: an image processing technique that separates the foreground of an image from the background.
- Trimap (ternary image): an image that contains three types of markers: foreground, background, and foreground-background mixed areas. It is usually used as the input of the matting model together with the original image. It should be noted that, in the following embodiments, the foreground-background mixed area is also referred to as a line drawing area.
- Identification value: a numerical value used to identify the color of a pixel in an image. For example, an identification value of 255 indicates that the RGB (Red-Green-Blue) color value of the pixel is (255, 255, 255), which is white; an identification value of 0 indicates that the RGB color value of the pixel is (0, 0, 0), which is black; and an identification value of 128 indicates that the RGB color value of the pixel is (128, 128, 128), which is gray.
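As a small illustration of the two definitions above, the sketch below stores a trimap as a grid of identification values and expands each value into its RGB color. The constant names and the tiny 2x4 grid are hypothetical; the values 255, 128 and 0 are the ones used in the examples of this description.

```python
# Identification values used in the examples of this description.
FOREGROUND = 255   # white
LINE_DRAWING = 128  # gray, the foreground-background mixed area
BACKGROUND = 0     # black


def identification_to_rgb(value):
    """Expand a single identification value into an (R, G, B) color."""
    return (value, value, value)


# A made-up 2x4 trimap: background on the left, foreground on the right,
# with a one-pixel line drawing area between them.
trimap = [
    [BACKGROUND, BACKGROUND, LINE_DRAWING, FOREGROUND],
    [BACKGROUND, LINE_DRAWING, FOREGROUND, FOREGROUND],
]

rgb_image = [[identification_to_rgb(v) for v in row] for row in trimap]
```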
- Open Source Computer Vision Library (OpenCV): a cross-platform computer vision and machine learning software library that runs on a variety of operating systems. OpenCV can be used to develop real-time image processing, computer vision and pattern recognition programs.
- findContours: a function in OpenCV for detecting contours in images.
- drawContours: a function in OpenCV for drawing contours in an image.
- Matting model: a computational model used to calculate, based on the original image and the ternary image, the probability that each pixel in the original image belongs to the foreground.
- the matting models include the IndexNet model, the GCAMatting model, and the ContextNet model.
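One common way to use the per-pixel foreground probabilities produced by a matting model is to treat them as an alpha channel. The sketch below assumes that usage; it does not run any matting model, the probability values are made up, and the name `cut_out` is hypothetical.

```python
def cut_out(original_rgb, foreground_prob):
    """Attach an alpha channel derived from foreground probabilities.

    original_rgb: rows of (R, G, B) tuples.
    foreground_prob: rows of floats in [0, 1], same shape as the image.
    Returns rows of (R, G, B, A) tuples with A in 0..255.
    """
    return [
        [(r, g, b, int(p * 255 + 0.5)) for (r, g, b), p in zip(rgb_row, prob_row)]
        for rgb_row, prob_row in zip(original_rgb, foreground_prob)
    ]


original = [[(10, 20, 30), (200, 180, 160)]]
probabilities = [[0.0, 1.0]]  # made-up model output: background, foreground
target = cut_out(original, probabilities)
```

Pixels with probability 0 become fully transparent and pixels with probability 1 stay fully opaque; intermediate probabilities, which are expected mainly inside the line drawing area, give partial transparency.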
- FIG. 3 is a schematic diagram of an implementation environment of an image processing method provided according to an embodiment of the present application.
- the implementation environment includes a terminal 301 and a server 302.
- the terminal 301 and the server 302 can be directly or indirectly connected through wired or wireless communication, which is not limited in this application.
- the terminal 301 is a smart phone, a tablet computer, a notebook computer, a desktop computer, etc., but is not limited thereto.
- the terminal 301 can have applications installed and running.
- the application is a social application, an image processing application, a photographing application, or the like.
- the terminal 301 is a terminal used by a user; a social application runs on the terminal 301, and the user can extract the portrait in a picture through the social application.
- the server 302 can be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, Content Delivery Network (CDN), and big data and artificial intelligence platforms.
- the server 302 is used to provide background services for the applications running on the terminal 301.
- the server 302 undertakes the main computing work, and the terminal 301 undertakes the secondary computing work; or, the server 302 undertakes the secondary computing work, and the terminal 301 undertakes the main computing work; or, the server 302 or the terminal 301 can undertake the computing work individually.
- the terminal 301 generally refers to one of multiple terminals, and this embodiment only takes the terminal 301 as an example for illustration.
- the number of the above-mentioned terminals 301 can be larger.
- for example, the number of the above-mentioned terminals 301 is tens, hundreds, or more; in this case, the implementation environment of the above-mentioned image processing method also includes other terminals.
- the embodiments of the present application do not limit the number of terminals and device types.
- the aforementioned wireless or wired networks use standard communication techniques and/or protocols.
- the network is usually the Internet, but can be any network, including but not limited to any combination of a Local Area Network (LAN), a Metropolitan Area Network (MAN), a Wide Area Network (WAN), a mobile, wired or wireless network, a private network, or a virtual private network.
- data exchanged over the network is represented using technologies and/or formats including HyperText Markup Language (HTML), Extensible Markup Language (XML), and the like.
- all or some of the links can also be encrypted using conventional encryption techniques such as Secure Socket Layer (SSL), Transport Layer Security (TLS), Virtual Private Network (VPN), and Internet Protocol Security (IPsec).
- custom and/or dedicated data communication techniques can also be used in place of or in addition to the data communication techniques described above.
- FIG. 4 is a flowchart of an image processing method provided according to an embodiment of the present application.
- the application to a computer device is taken as an example for description.
- the computer equipment can be a terminal or a server, and the method includes the following steps:
- 401. Perform image semantic segmentation on the original image to obtain a first image, a second image and a third image, where the foreground area in the first image is the area where the target object is located in the original image, the second image is a segmented image of the first target area of the target object, and the third image is a segmented image of the second target area of the target object.
- the original image refers to an image on which matting needs to be performed.
- the target object refers to the object in the original image that needs to be separated to generate the target image.
- the first image, the second image and the third image are all essentially segmented images. The first image is a segmented image obtained by segmenting the entire target object; therefore, the foreground area in the first image contains all elements of the target object in the original image.
- the second image is a segmented image obtained by segmenting a local part of the target object, namely the first target area. Therefore, the foreground area in the second image contains all elements in the first target area of the target object, and the areas other than the first target area belong to the background area in the second image.
- the third image is a segmented image obtained by segmenting a local part of the target object, namely the second target area. Therefore, the foreground area in the third image contains all elements in the second target area of the target object, and the areas other than the second target area belong to the background area in the third image.
- the target object may include at least one of a portrait of a person, an image of an animal, an image of a plant, and the like in the original image.
- the foreground area of the target object in the first image indicates the entire target object, while the first target area and the second target area are areas of local parts of the target object; both the first target area and the second target area are sub-areas of the foreground area of the target object in the first image.
- the first target area may be an area in the target object that needs to be refined.
- the second target area may be an area of the target object that is related to the first target area and has a different matting refinement requirement than the first target area.
- the matting refinement requirement of the second target area may be lower than the matting refinement requirement of the first target area.
- the second target area contains less detail information than the first target area, and therefore its matting refinement requirement is lower than that of the first target area.
- the first target area and the second target area may be a hair area and a face area, respectively.
- the first target area and the second target area may be a hair area and a head area, respectively.
- the first target area and the second target area may be a leaf area and a branch area, respectively.
- the target object may be a portrait of a real person, a portrait of a cartoon character, a portrait of an anime character, or the like.
- the target ternary image includes a foreground area and a line drawing area, and the line drawing area is obtained by drawing lines on the outline of the foreground area; different sub-areas of the foreground area correspond to different line widths.
- the sub-area refers to a partial area in the foreground area, including a first target area and a second target area.
- matting processing refers to a process of separating the target object in the original image from the background area to obtain the target image.
- a semantic segmentation method is used to obtain a plurality of segmented images containing different regions; then, according to these segmented images, lines of different widths are drawn on the outline of the foreground region to obtain a target ternary image; finally, a target image is generated based on the target ternary image.
- because lines of different widths are drawn on the outline of the foreground area in the target ternary image, targeted matting of different regions can be realized.
- the matting accuracy of each region can also be guaranteed, so that a fine and natural matting image is finally obtained; in addition, the above-mentioned matting process is fully automated, which greatly improves the matting efficiency.
- FIG. 4 is only the basic flow of the present application, and the solution provided by the present application will be further described below based on a specific implementation manner.
- FIG. 5 is a flowchart of another image processing method provided according to an embodiment of the present application.
- the first target area is a hair area; the second target area is a face area.
- the application to a terminal is taken as an example for description. The method includes the following steps:
- the terminal provides a cutout function
- the user can perform a cutout operation on the terminal
- the terminal obtains the original image in response to the cutout operation.
- the original image is a local image stored on the terminal, or, the original image is an online image, and this embodiment of the present application does not limit the source of the original image.
- an image processing interface for the original image is displayed on the terminal, and the image processing interface includes a cutout option, a cropping option, and so on; the user can select the cutout option, and the terminal acquires the original image in response to the selection operation.
- an image processing interface is displayed on the terminal, and the image processing interface includes a cutout option; the user can select the cutout option, and the terminal displays an image selection interface in response to the selection operation. The user clicks the image to be cut out to select the original image, and the terminal acquires the original image in response to the click operation.
- the image segmentation model is used to calculate the semantic category of each pixel in the original image according to the input original image, so as to output at least one image of the original image.
- the image segmentation model is an HRNET-OCR model, which is a computational model combining the HRNET model and the OCR model.
- the calculation process of the HRNET-OCR model is as follows: first, the features of the original image are extracted through the HRNET model to obtain the feature information of the original image; secondly, the obtained feature information is input into the backbone network of the OCR model; thirdly, the semantic category of each pixel in the original image is calculated based on the OCR model, where the semantic categories include, for example, hair, nose, eyes, torso, clothing, and buildings; finally, at least one image of the original image is output based on the semantic category of each pixel.
- the specific calculation process of the above HRNET-OCR model has been described in detail with reference to FIG. 1 and FIG. 2, so it will not be repeated here.
- At least one image of the original image may be output by adjusting some structures in the above HRNET-OCR model, and the embodiments of the present application do not limit the structure composition of the HRNET-OCR model.
- the above image segmentation model may also be implemented by other network models, and the embodiment of the present application does not limit the type of the image semantic segmentation model.
- the first image includes a foreground area where the target object is located in the original image
- the second image is a segmented image of the target object's hair area
- the third image is a segmented image of the target object's face area.
- the terminal can obtain three segmented images, that is, the first image, the second image and the third image in this step.
- each image includes two kinds of regions, and the two kinds of regions are respectively marked with different identification values.
- the first image includes a foreground region and a background region, wherein the identification value of each pixel in the foreground region is 255, and the identification value of each pixel in the background region is 0. It should be noted that, in practical applications, the developer can flexibly set the identification values according to requirements, which is not limited in this embodiment of the present application.
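To make the relationship between per-pixel semantic categories and the three segmented images concrete, the sketch below derives three binary masks from a small label map. It assumes the segmentation model has already produced one category label per pixel; the label map, the category strings and the helper name `binary_mask` are hypothetical, and the identification values 255 and 0 follow the example above.

```python
def binary_mask(label_map, categories, foreground=255, background=0):
    """Mark pixels whose label is in `categories` as foreground."""
    return [
        [foreground if label in categories else background for label in row]
        for row in label_map
    ]


# Made-up 3x4 label map for a portrait.
label_map = [
    ["background", "hair", "hair", "background"],
    ["background", "face", "face", "background"],
    ["torso", "torso", "torso", "torso"],
]

# First image: the whole target object (every non-background category).
first_image = binary_mask(label_map, {"hair", "face", "torso"})
# Second image: only the first target area (hair).
second_image = binary_mask(label_map, {"hair"})
# Third image: only the second target area (face).
third_image = binary_mask(label_map, {"face"})
```

Each resulting image contains exactly two kinds of regions, which matches the description of the first, second and third images.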
- FIG. 6 is a schematic diagram of an image semantic segmentation result provided by an embodiment of the present application.
- the original image is a portrait of a person
- the image shown in (a) in FIG. 6 is the first image
- the first image includes a foreground area 1 and a background area 2
- the foreground area 1 contains all elements of the portrait of the person
- the image shown in (b) in FIG. 6 is the second image
- the second image includes the hair area 3 and the background area 4
- the image shown in (c) in FIG. 6 is the third image
- the third image includes a face area 5 and a background area 6.
- generate a first ternary image based on the first image and the second image, where the first ternary image includes a foreground area, a first line drawing sub-area, and a second line drawing sub-area.
- the first line-drawing sub-region covers the outline of the hair region on the side of the hair region close to the background region in the first image
- the second line-drawing sub-region covers the outline of the non-hair region in the foreground region
- the non-hair region is the region other than the hair region in the foreground region.
- the first line width is greater than the second line width
- the first line width is used for drawing the first line drawing sub-region
- the second line width is used for drawing the second line drawing sub-region.
- the first ternary image further includes a background area, and the identification values of the first line drawing sub-area and the second line drawing sub-area are different from the identification value of the foreground area and the identification value of the background area.
- the identification value of each pixel in the foreground area is 255
- the identification value of each pixel in the background area is 0
- the identification value of each pixel in the first line drawing sub-region and the second line drawing sub-region is 128.
- the developer can flexibly set the identification value of the line drawing area according to requirements, which is not limited in this embodiment of the present application.
- FIG. 7 is a schematic diagram of a first ternary image provided by an embodiment of the present application.
- the first ternary image includes a foreground area 7, a background area 8, a first line drawing sub-area 9 and a second line drawing sub-area 10, wherein the first line drawing sub-area 9 is drawn according to the first line width, and the second line drawing sub-area 10 is drawn according to the second line width.
- the complete contour line of the foreground area refers to the boundary line between the foreground area and the background area.
- the terminal obtains the complete contour line of the foreground area in the first image through the contour detection algorithm.
- the above-mentioned contour detection algorithm may be implemented by the findContours function, which is not limited in this embodiment of the present application.
- the second ternary image includes a foreground area and a third line-drawing sub-area, and the third line-drawing sub-area covers the complete contour line of the foreground area.
- the second line width is calculated from the dimensions of the original image. In some embodiments, the second line width can be calculated by the following formula (1):
- S is the width of the second line; width and height are the width and height of the original image, respectively; min() refers to the minimum value function; min(width, height) indicates that the minimum value is selected from the width and height of the original image; N is the default line size, for example, N may be 17, which is not limited in this embodiment of the present application.
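Formula (1) itself is not reproduced in this text, so the sketch below is not the formula of the application. It only illustrates one plausible form that is consistent with the symbol definitions above: the line width grows with the shorter side of the original image and is scaled by the default line size N. The reference length of 1000 pixels, the rounding, and the lower bound of 1 are assumptions, and the function name is hypothetical.

```python
def second_line_width(width, height, n=17, reference=1000):
    """Hypothetical line width: scale N by the shorter image side.

    `reference` (1000 pixels) is an assumed normalization constant,
    not a value taken from the source.
    """
    scaled = min(width, height) / reference * n
    # Keep the width a positive integer so it can be used as a pixel thickness.
    return max(1, round(scaled))


s_small = second_line_width(1000, 2000)   # shorter side equals the reference
s_large = second_line_width(3000, 2000)   # shorter side is twice the reference
```

Under these assumptions an image whose shorter side equals the reference length gets a width of N pixels, and larger images get proportionally wider lines.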
- the terminal draws a line on the complete contour line according to the second line width through the contour drawing algorithm, and the identification value of the line is different from the identification value of the foreground area and the background area.
- the above-mentioned contour drawing algorithm can be implemented by the drawContours function. For example, taking the identification value of the foreground area as 255 and the identification value of the background area as 0, lines are drawn on the complete contour line of the foreground area by the following formula (2):
- cv::drawContours(segResult, contours, -1, Scalar(128, 128, 128), S) (2)
- segResult is the first image
- contours is the complete contour line of the foreground area detected by the findContours function
- -1 indicates that all contour lines are operated on
- Scalar is the identification value
- Scalar(128, 128, 128) indicates that the color values of the R, G, and B channels are all set to 128
- S is the second line width.
- the above method of drawing a line operates on the obtained complete contour line, that is, the line covers the complete contour line.
- for example, if the complete contour line obtained by the findContours function includes pixel points A1 to A10, the line is drawn on these pixel points. That is, the third line drawing sub-region obtained by drawing lines covers both the foreground region in the first image and the background region in the first image.
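The effect described above, a gray line that straddles the contour and covers part of the foreground as well as part of the background, can be reproduced with a pure-Python stand-in. This is not OpenCV: `contour_pixels` and `draw_line_on_contour` are hypothetical helpers that only approximate findContours and drawContours (boundary detection uses 4-neighbors, and the line is a square brush of roughly the requested thickness).

```python
def contour_pixels(mask, foreground=255):
    """Foreground pixels that touch a non-foreground pixel (4-neighborhood)."""
    h, w = len(mask), len(mask[0])
    points = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] != foreground:
                continue
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] != foreground:
                    points.append((y, x))
                    break
    return points


def draw_line_on_contour(mask, thickness, value=128):
    """Return a copy of `mask` with a line of `value` drawn over the contour."""
    h, w = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    radius = thickness // 2
    for y, x in contour_pixels(mask):
        # Stamp a square brush centered on each contour pixel.
        for ny in range(max(0, y - radius), min(h, y + radius + 1)):
            for nx in range(max(0, x - radius), min(w, x + radius + 1)):
                out[ny][nx] = value
    return out


# 9x9 image with a 5x5 foreground block in the middle.
mask = [[255 if 2 <= y <= 6 and 2 <= x <= 6 else 0 for x in range(9)] for y in range(9)]
trimap = draw_line_on_contour(mask, thickness=3)
```

In the result, the pixel at row 1, column 1 (originally background) and the pixel at row 3, column 3 (originally foreground) are both 128, while the center of the block stays 255 and the image corner stays 0.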
- FIG. 8 is a schematic diagram of a second ternary image provided by an embodiment.
- the left image in FIG. 8 shows the first image
- the right image in FIG. 8 shows the second ternary image.
- the second ternary image includes a foreground area 11, a background area 12 and a third line drawing sub-area 13.
- the third line drawing sub-region 13 is drawn according to the second line width.
- the complete outline of the hair region refers to the boundary between the hair region and the background region.
- the terminal acquires the complete contour line of the hair region in the second image through the contour detection algorithm.
- the above-mentioned contour detection algorithm may be implemented by the findContours function, which is not limited in this embodiment of the present application.
- the third ternary image includes a hair region and a fourth line-drawing sub-region, and the fourth line-drawing sub-region covers the complete contour line of the hair region.
- the first line width is M times the second line width, where M is greater than 1.
- for example, the first line width is three times the second line width; that is, when the second line width is S, the first line width is S*3, which is not limited in this embodiment of the present application.
- segResultHair is the second image
- contours is the complete contour line of the hair area detected by the findContours function
- -1 indicates that all contour lines are operated
- Scalar is the identification value
- Scalar(128, 128, 128) indicates that the color values of the R, G, and B channels are all set to 128
- S*3 is the first line width.
- FIG. 9 is a schematic diagram of a third ternary image provided by an embodiment.
- the left image in FIG. 9 shows the second image, and the right image in FIG. 9 shows the third ternary image.
- the third ternary image includes a foreground area 14, a background area 15, and a fourth line drawing sub-area 16.
- the fourth line drawing sub-region 16 is drawn according to the first line width.
- merging the second ternary image and the third ternary image means: taking the maximum identification value at the same position in the two ternary images as the identification value of the corresponding position in the first ternary image.
- Step A: obtain the first identification value of each pixel in the second ternary image, where the first identification value is used to identify the color of the pixel in the second ternary image.
- Step B: obtain the second identification value of each pixel in the third ternary image, where the second identification value is used to identify the color of the pixel in the third ternary image.
- Step C: generate the first ternary image based on the magnitude relationship between the first identification value and the second identification value.
- step C includes: comparing the first identification value of the pixel at any position in the second ternary image with the second identification value of the pixel at the same position in the third ternary image; the larger of the first identification value and the second identification value is used as the third identification value of the pixel at the same position in the first ternary image, and the third identification value is used to identify the color of the pixel in the first ternary image.
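- $\mathrm{Pixel}_{\mathrm{result}} = (\mathrm{Pixel}_{\mathrm{leftUp}} > \mathrm{Pixel}_{\mathrm{leftDown}})\ ?\ \mathrm{Pixel}_{\mathrm{leftUp}} : \mathrm{Pixel}_{\mathrm{leftDown}} \qquad (4)$
- $\mathrm{Pixel}_{\mathrm{result}}$ is the first ternary image, that is, the right image in FIG. 7;
- $\mathrm{Pixel}_{\mathrm{leftUp}}$ is the second ternary image, that is, the upper left image in FIG. 7;
- $\mathrm{Pixel}_{\mathrm{leftDown}}$ is the third ternary image, that is, the lower left image in FIG. 7.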
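The merge in formula (4) amounts to a per-pixel maximum. A minimal sketch, assuming both ternary images are stored as equally sized lists of rows of identification values (0, 128 or 255); the function name `merge_ternary_images` is hypothetical:

```python
def merge_ternary_images(second, third):
    """Per-pixel maximum of two ternary images of identical size (formula (4))."""
    if len(second) != len(third) or any(
            len(a) != len(b) for a, b in zip(second, third)):
        raise ValueError("both ternary images must have the same size")
    return [[max(p2, p3) for p2, p3 in zip(row2, row3)]
            for row2, row3 in zip(second, third)]


# One row of each image: a thin band around the whole foreground (second)
# and a wide band around the hair (third).
second = [[0, 128, 255, 255, 128, 0]]
third = [[128, 128, 128, 255, 0, 0]]
first = merge_ternary_images(second, third)
```

One consequence of taking the maximum: where the wide hair band (128) falls on pixels that the second ternary image marks as foreground (255), the foreground value wins, so the wide band survives only on the side facing the background.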
- the first line drawing sub-region 9 corresponding to the hair region is larger than the second line drawing sub-region 10 corresponding to other regions.
- the target ternary image includes a foreground area and a line drawing area, and the line drawing area is obtained by drawing lines on the outline of the foreground area; different sub-areas of the foreground area correspond to different line widths.
- the foreground region further includes a torso region of the target object, wherein, in the target ternary image, the line width corresponding to the hair region is larger than the line width corresponding to the torso region, and the line width corresponding to the torso region is larger than the line width corresponding to the face region.
- FIG. 10 is a schematic diagram of a target ternary image provided by an embodiment.
- the upper right picture in FIG. 10 shows the target ternary image, which includes a foreground area 17, a background area 18 and a line drawing area 19.
- the line width corresponding to the hair area is larger than the line width corresponding to the torso area
- the line width corresponding to the torso area is larger than the line width corresponding to the face area.
- the relationship between the line widths in the line drawing area 19 can continue to refer to the lower right figure in FIG. 10, which includes the line drawing areas 19a, 19b and 19c.
- 19a represents the line width corresponding to the hair area, 19b represents the line width corresponding to the face area, and 19c represents the line width corresponding to the torso area. As shown in the figure, the line width 19a corresponding to the hair area is larger than the line width 19c corresponding to the torso area, and the line width 19c corresponding to the torso area is larger than the line width 19b corresponding to the face area.
- the terminal determines the target overlapping area in the first ternary image based on the pixel position of the face region in the third image.
- the figure includes the target overlapping area 20, which is the overlapping area between the face area in the third image and the second line drawing sub-area in the first ternary image.
- the identification value of the target overlapping area is the identification value of the second line drawing sub-area
- in the first ternary image, the pixel points in the target overlapping area are assigned the target identification value, that is, the identification value of this area is changed to the target identification value, to generate the target ternary image. For example, taking the identification value of the face area as 255 and the identification value of the second line drawing sub-region as 128, the identification value of the target overlapping area in the first ternary image was originally 128. The pixel points in the target overlapping area are reassigned the identification value 255 to obtain the target ternary image.
- the above steps 5051 and 5052 can be implemented by the following formula (5):
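- $\mathrm{Pixel} = (\mathrm{Pixel} \in \{\mathrm{Face}\})\ ?\ 255 : \mathrm{Pixel}_{\mathrm{trimp}} \qquad (5)$
- $\{\mathrm{Face}\}$ represents the face area
- 255 represents the target identification value
- $\mathrm{Pixel}_{\mathrm{trimp}}$ is the target ternary image.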
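Formula (5) can be read as a masked overwrite: every pixel inside the face area receives the target identification value, and every other pixel keeps its value. A minimal sketch under that reading, assuming the third image is a binary mask in which face pixels are 255; the function name `apply_face_protection` and its parameters are hypothetical:

```python
def apply_face_protection(first_ternary, face_mask, target_value=255, face_value=255):
    """Assign `target_value` wherever the face mask is set; keep other pixels."""
    return [[target_value if f == face_value else t
             for t, f in zip(t_row, f_row)]
            for t_row, f_row in zip(first_ternary, face_mask)]


# One row: background, line-drawing band (128), band overlapping the face,
# foreground inside the face, foreground outside the face.
first_ternary = [[0, 128, 128, 255, 255]]
face_mask = [[0, 0, 255, 255, 0]]
target_ternary = apply_face_protection(first_ternary, face_mask)
```

Only the band pixel that overlaps the face changes (128 becomes 255); face pixels that were already foreground are unaffected, so the result agrees with reassigning just the target overlapping area.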
- through the above steps 501 to 505, after acquiring the original image, the terminal automatically generates a target ternary image in which the line widths corresponding to different sub-areas of the foreground area are different.
- the line drawing area in the target ternary image is the foreground and background mixed area.
- the terminal automatically draws lines of different widths for the hair area and for the areas other than the hair area, which ensures the matting range of complex areas such as the hair area and improves the matting accuracy of these areas; at the same time, the pixels belonging to the face area are assigned the same target identification value as the foreground area, which protects key areas of the portrait and avoids the loss of details during matting.
- the matting model is used to calculate the probability that each pixel in the original image belongs to the target image according to the input target ternary image and the original image, so as to output the transparency.
- transparency is calculated by the following formula (6):
- I represents the original image
- F represents the foreground, that is, the area that includes all the elements of the target object
- B represents the background
- α is the transparency, which is used to represent the proportion of the foreground color in the original image.
- Formula (6) shows that the original image is composed of the foreground and background superimposed according to a certain transparency.
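Formula (6) itself is not reproduced in this text. The symbol definitions above (I, F, B and the transparency α) match the standard compositing equation used in image matting, which, under that assumption, would read:

```latex
I = \alpha F + (1 - \alpha) B \qquad (6)
```

With α = 1 a pixel is pure foreground, with α = 0 it is pure background, and intermediate values describe mixed pixels such as hair tips.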
- the above-mentioned matting model may be an IndexNet matting model, a GCAMatting matting model, a ContextNet model, or the like.
- the specific type of the above matting model is not limited.
- FIG. 11 is a schematic diagram of a matting model provided by an embodiment of the present application.
- the target ternary image and the original image are used as inputs to obtain a rough Alpha (that is, a rough transparency), which is then refined into the fine result, that is, the Alpha value of each pixel.
- the matting process in step 508 refers to the process of separating the target object in the original image from the background based on the transparency of each pixel, to obtain the target image including the target object.
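One simple way to realize this separation is to attach the transparency to each pixel as an alpha channel. The sketch below assumes RGB tuples, transparency values in the range 0 to 1, and an RGBA output; the output format and the function name `cut_out` are assumptions, since the text does not specify them.

```python
def cut_out(original, alpha):
    """Attach per-pixel transparency to an RGB image, producing RGBA pixels.

    original: rows of (r, g, b) tuples with components in 0..255
    alpha:    rows of floats in [0, 1], same layout as `original`
    """
    return [[(r, g, b, int(a * 255 + 0.5))  # round to the nearest 8-bit value
             for (r, g, b), a in zip(img_row, a_row)]
            for img_row, a_row in zip(original, alpha)]


original = [[(10, 20, 30), (40, 50, 60), (70, 80, 90)]]
alpha = [[0.0, 0.5, 1.0]]
target = cut_out(original, alpha)
```

Pixels with transparency 0 become fully transparent background, pixels with transparency 1 are kept as foreground, and mixed pixels keep a partial alpha so that fine structures blend naturally.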
- FIG. 12 is a schematic diagram of a target image provided by an embodiment of the present application.
- the left picture in Fig. 12 is the original image
- the upper right picture in Fig. 12 is the target image obtained by this method.
- the lower right figure shows the target image obtained according to the image segmentation method in the related art.
- in that image, although the image segmentation is accurate, the hair tips of the portrait are very rough, and there is a loss of details on the face.
- a semantic segmentation method is used to obtain a plurality of segmented images containing different regions; lines of different widths are then drawn on the outline of the foreground region according to these segmented images to obtain a target ternary image, and finally a target image is generated based on the target ternary image.
- for the target ternary image, because lines of different widths are drawn on the outline of the foreground area, targeted matting of different regions can be realized.
- the matting accuracy of each region can thus be guaranteed, so that a fine and natural matting image is finally obtained; in addition, the above-mentioned matting process is fully automated, which greatly improves the matting efficiency.
- an original image is obtained, and the original image is an image including a portrait of a person.
- the first segmented image includes the foreground area, which contains all elements of the portrait; the second segmented image includes the hair area: the edge lines between the human torso and the background are relatively clear, whereas hair, because of its shape, often blends heavily into the background and therefore requires particular attention during matting; the third segmented image includes the face area, which can also be understood as a protected area: the face is an important part of the portrait, and mistakenly matting away part of it would greatly affect the look and feel, so this area needs to be protected from being removed by matting.
- the second ternary image and the third ternary image are merged to obtain the merged first ternary image.
- the target image is finally obtained, that is, the portrait of the person in the original image.
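The three segmented images used in this walkthrough can be derived from a per-pixel semantic label map. A minimal sketch, assuming the segmentation model yields one integer label per pixel; the label ids (`BACKGROUND`, `TORSO`, `FACE`, `HAIR`) and the function name `split_label_map` are hypothetical and not taken from the source.

```python
BACKGROUND, TORSO, FACE, HAIR = 0, 1, 2, 3  # hypothetical label ids


def split_label_map(labels):
    """Build the foreground, hair and face masks (255 = inside, 0 = outside)."""
    first = [[255 if v != BACKGROUND else 0 for v in row] for row in labels]
    second = [[255 if v == HAIR else 0 for v in row] for row in labels]
    third = [[255 if v == FACE else 0 for v in row] for row in labels]
    return first, second, third


labels = [[0, 3, 3, 0],
          [0, 2, 2, 0],
          [1, 1, 1, 1]]
first_image, second_image, third_image = split_label_map(labels)
```

The first image marks every non-background pixel, so the hair and face areas are sub-areas of its foreground area, as the description requires.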
- application scenarios of the image processing method provided by the embodiments of the present application include but are not limited to:
- the terminal provides an emoticon package making function for portraits through an application, and the user performs an operation on the terminal to input the original image of the portrait that the user wants to extract.
- the terminal adopts the image processing method provided in the embodiment of the present application to automatically extract the portrait of the person in the original image and display it on the terminal, so that the user can subsequently perform other image processing operations on the basis of the portrait to obtain the emoticon package that the user wants.
- the process of extracting a person's portrait by the terminal includes the following steps 1 to 8:
- the terminal obtains the original image.
- the terminal inputs the original image into the image segmentation model.
- the terminal acquires the first image, the second image and the third image output by the image segmentation model.
- the first image includes a foreground area where a portrait of a person is located in the original image
- the second image includes a hair area of the portrait
- the third image includes a face area of the portrait.
- the terminal generates a first ternary image based on the first image and the second image, where the first ternary image includes a foreground area, a first line drawing sub-area, and a second line drawing sub-area.
- the terminal generates a target ternary image based on the third image and the first ternary image.
- the terminal inputs the target ternary image and the original image into the matting model.
- the terminal obtains the transparency output by the matting model, and the transparency is used to represent the probability that the pixel belongs to the portrait of the person.
- the terminal performs matting processing on the original image based on the transparency to obtain a target image including the portrait of the person. Subsequently, the user makes an emoticon package based on the target image.
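The eight steps above can be summarized as a small orchestration function. This is a sketch of the data flow only: the function name `extract_portrait` and all five callables are hypothetical placeholders for the segmentation model, the two ternary image generation stages, the matting model and the final matting step.

```python
def extract_portrait(original, segment, make_first_ternary, make_target_ternary,
                     matting_model, cut_out):
    """Run steps 1 to 8 on an already acquired original image (step 1)."""
    # Steps 2-3: semantic segmentation yields foreground, hair and face images.
    first, second, third = segment(original)
    # Step 4: first ternary image from the foreground and hair images.
    first_ternary = make_first_ternary(first, second)
    # Step 5: target ternary image from the face image and the first ternary image.
    target_ternary = make_target_ternary(third, first_ternary)
    # Steps 6-7: the matting model outputs the per-pixel transparency.
    alpha = matting_model(target_ternary, original)
    # Step 8: matting on the original image based on the transparency.
    return cut_out(original, alpha)
```

Passing the stages in as callables keeps the sketch independent of any particular segmentation or matting model, in line with the statement that the model types are not limited.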
- the image processing method provided by the embodiment of the present application can realize the automatic extraction of the portrait of the person, and the effect of the extracted portrait of the person is fine and natural, which can meet the user's personalized needs for the production of expression packs.
- the host may wish to hide the real background environment he is in, and then only display the host's portrait in the live broadcast, or add other virtual backgrounds based on the host's portrait.
- the terminal provides a character portrait mode during the live broadcast. When the host enables this mode, the terminal acquires each frame of the original image captured by the camera in real time, and then uses the image processing method provided by the embodiment of the present application to extract the portrait of the host from each frame of the original image, so that the live broadcast screen is generated in real time for live broadcast.
- the specific process of extracting the portrait of the person by the terminal is similar to the above-mentioned scenario 1, so it is not repeated here.
- because the image processing method provided by the embodiment of the present application realizes automatic portrait extraction, it can be directly applied to scenes that require real-time extraction of a portrait of a person.
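For the virtual background case in this scenario, each frame can be blended over a new background using the per-pixel transparency. The sketch below uses the frame's own colors as the foreground term of the standard compositing equation, which is a common simplification and an assumption here; the function name `replace_background` is hypothetical.

```python
def replace_background(frame, alpha, background):
    """Blend `frame` over `background` per pixel: out = a * frame + (1 - a) * background."""
    def blend(fg, bg, a):
        # Round each channel to the nearest integer.
        return tuple(int(a * f + (1.0 - a) * b + 0.5) for f, b in zip(fg, bg))

    return [[blend(p, q, a) for p, q, a in zip(f_row, b_row, a_row)]
            for f_row, b_row, a_row in zip(frame, background, alpha)]


frame = [[(200, 100, 0), (200, 100, 0), (200, 100, 0)]]
virtual_bg = [[(0, 0, 100), (0, 0, 100), (0, 0, 100)]]
alpha = [[1.0, 0.5, 0.0]]
live_frame = replace_background(frame, alpha, virtual_bg)
```

In a live broadcast this function would be called once per captured frame, after the transparency for that frame has been obtained.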
- FIG. 14 is a schematic structural diagram of an image processing apparatus provided according to an embodiment of the present application.
- the apparatus is used to execute the steps of the above-mentioned image processing method.
- the apparatus includes: an image segmentation module 1401 , a ternary image generation module 1402 , and a matting module 1403 .
- the image segmentation module 1401 is used to perform image semantic segmentation on the original image to obtain a first image, a second image and a third image, where the first image includes the foreground area where the target object is located in the original image, the second image includes the hair region of the target object, and the third image includes the face region of the target object;
- the ternary image generation module 1402 is configured to generate a target ternary image based on the first image, the second image and the third image, where the target ternary image includes the foreground area and the line drawing area, and the line drawing area is obtained by drawing lines on the outline of the foreground area; different sub-areas of the foreground area correspond to different line widths;
- the matting module 1403 is configured to perform matting processing on the original image based on the target ternary image to obtain a target image including the target object.
- the foreground area further includes a torso area of the target object, wherein, in the target ternary image, the line width corresponding to the hair area is greater than the line width corresponding to the torso area, and the line width corresponding to the torso area is greater than the line width corresponding to the face area.
- the ternary image generation module 1402 includes:
- a first generating unit for generating a first ternary image based on the first image and the second image, the first ternary image including the foreground area, the first line drawing sub-region and the second line drawing sub-region;
- the first line drawing sub-area covers the contour line of the side of the hair area close to the background area
- the second line drawing sub-region covers the contour lines of other areas, where the other areas are the areas of the foreground area other than the hair area
- the first line width is greater than the second line width
- the first line width is used to draw the first line drawing sub-area
- the second line width is used to draw the second line drawing sub-area
- the second generating unit is configured to generate the target ternary image based on the third image and the first ternary image.
- the first generating unit is configured to: in the first image, obtain a complete contour line of the foreground area; draw a line on the complete contour line of the foreground area according to the second line width to obtain a second ternary image, wherein the second ternary image includes the foreground area and a third line drawing sub-area, and the third line drawing sub-area covers the complete contour line of the foreground area; in the second image, obtain the complete contour line of the hair area; draw a line on the complete contour line of the hair area according to the first line width to obtain a third ternary image, wherein the third ternary image includes the hair area and a fourth line drawing sub-area, and the fourth line drawing sub-area covers the complete contour line of the hair area; and merge the second ternary image and the third ternary image to obtain the first ternary image.
- the first line width is M times the second line width, and M is greater than 1.
- the first generating unit is further configured to: obtain a first identification value of each pixel in the second ternary image, where the first identification value is used to identify the color of the pixel in the second ternary image; obtain a second identification value of each pixel in the third ternary image, where the second identification value is used to identify the color of the pixel in the third ternary image; and generate the first ternary image based on the magnitude relationship between the first identification value and the second identification value.
- the first generating unit is further configured to: compare the first identification value of the pixel at any position in the second ternary image with the second identification value of the pixel at the same position in the third ternary image; and use the larger of the first identification value and the second identification value as the third identification value of the pixel at the same position in the first ternary image, where the third identification value is used to identify the color of the pixel in the first ternary image.
- the second generating unit is configured to: determine, based on the face region in the third image, a target overlapping region of the first ternary image, where the target overlapping region is the overlapping region of the face region and the second line drawing sub-region; and assign the target identification value to the pixels of the target overlapping region to generate the target ternary image, where the target identification value is used to identify the color of the pixels in the face region.
- the matting module 1403 is configured to: obtain, based on the target ternary image, the transparency of each pixel in the original image, where the transparency is used to represent the probability that the pixel belongs to the target object; and perform matting processing on the original image based on the transparency to obtain the target image.
- the image segmentation module 1401 is further used for: acquiring the original image; inputting the original image into an image segmentation model, wherein the image segmentation model is used for calculating, according to the input original image, the semantic category of each pixel in the original image, so as to output at least one image of the original image; and obtaining the first image, the second image and the third image output by the image segmentation model.
- the matting module 1403 is further configured to: input the target ternary image and the original image into a matting model, where the matting model is used to calculate, according to the input target ternary image and original image, the probability that each pixel in the original image belongs to the target image, so as to output the transparency; and obtain the transparency output by the matting model.
- a semantic segmentation method is used to obtain a plurality of segmented images containing different regions; lines of different widths are then drawn on the outline of the foreground region according to these segmented images to obtain a target ternary image, and finally a target image is generated based on the target ternary image.
- for the target ternary image, because lines of different widths are drawn on the outline of the foreground area, targeted matting of different regions can be realized.
- the matting accuracy of each region can thus be guaranteed, so that a fine and natural matting image is finally obtained; in addition, the above-mentioned matting process is fully automated, which greatly improves the matting efficiency.
- when the image processing apparatus provided in the above-mentioned embodiments performs image processing, the division into the above-mentioned functional modules is used only as an example for illustration; the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above.
- Each module in the above apparatus may be implemented in whole or in part by software, hardware and combinations thereof.
- the above modules can be embedded in or independent of the processor in the computer device in the form of hardware, or stored in the memory in the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.
- the image processing apparatus and the image processing method embodiments provided by the above embodiments belong to the same concept, and the specific implementation process thereof is detailed in the method embodiments, which will not be repeated here.
- FIG. 15 shows a schematic structural diagram of a terminal 1500 provided by an exemplary embodiment of the present application.
- the terminal 1500 can be: a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop or a desktop computer.
- Terminal 1500 may also be called by other names, such as user equipment, portable terminal, laptop terminal, or desktop terminal.
- the terminal 1500 includes: one or more processors 1501 and a memory 1502 .
- the processor 1501 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and the like.
- the processor 1501 can be implemented in at least one hardware form among DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array).
- the processor 1501 may also include a main processor and a coprocessor.
- the main processor is a processor used to process data in the wake-up state, also called a CPU (Central Processing Unit); the coprocessor is a low-power processor for processing data in a standby state.
- the processor 1501 may be integrated with a GPU (Graphics Processing Unit), and the GPU is used for rendering and drawing the content that needs to be displayed on the display screen.
- the processor 1501 may further include an AI (Artificial Intelligence, artificial intelligence) processor, where the AI processor is used to process computing operations related to machine learning.
- Memory 1502 may include one or more computer-readable storage media, which may be non-transitory. Memory 1502 may also include high-speed random access memory, as well as non-volatile memory, such as one or more disk storage devices or flash storage devices. In some embodiments, a non-transitory computer-readable storage medium in memory 1502 is used to store at least one computer-readable instruction for execution by one or more processors 1501 to implement the image processing methods provided by the method embodiments in this application.
- the terminal 1500 may further include: a peripheral device interface 1503 and at least one peripheral device.
- One or more of the processor 1501, the memory 1502 and the peripheral device interface 1503 may be connected through a bus or a signal line.
- Each peripheral device can be connected to the peripheral device interface 1503 through a bus, a signal line or a circuit board.
- the peripheral device includes: at least one of a radio frequency circuit 1504 , a display screen 1505 , a camera assembly 1506 , an audio circuit 1507 , a positioning assembly 1508 and a power supply 1509 .
- the peripheral device interface 1503 may be used to connect at least one peripheral device related to I/O (Input/Output) to the one or more processors 1501 and the memory 1502 .
- the radio frequency circuit 1504 is used for receiving and transmitting RF (Radio Frequency, radio frequency) signals, also called electromagnetic signals.
- the radio frequency circuit 1504 communicates with communication networks and other communication devices via electromagnetic signals.
- the display screen 1505 is used for displaying UI (User Interface, user interface).
- the UI can include graphics, text, icons, video, and any combination thereof.
- the display screen 1505 also has the ability to acquire touch signals on or above the surface of the display screen 1505 .
- the touch signal may be input to one or more processors 1501 as a control signal for processing.
- the display screen 1505 may also be used to provide virtual buttons and/or virtual keyboards, also referred to as soft buttons and/or soft keyboards.
- in some embodiments, there may be one display screen 1505, which is arranged on the front panel of the terminal 1500; in other embodiments, there may be at least two display screens 1505, which are respectively arranged on different surfaces of the terminal 1500 or in a folded design; in other embodiments, the display screen 1505 may be a flexible display screen, which is disposed on a curved surface or a folding surface of the terminal 1500. The display screen 1505 can even be set as a non-rectangular irregular figure, that is, a special-shaped screen.
- the display screen 1505 can be prepared by using materials such as LCD (Liquid Crystal Display, liquid crystal display), OLED (Organic Light-Emitting Diode, organic light emitting diode).
- the camera assembly 1506 is used to capture images or video.
- Audio circuitry 1507 may include a microphone and speakers.
- the microphone is used to collect the sound waves of the user and the environment, convert the sound waves into electrical signals and input them to one or more processors 1501 for processing, or input them to the radio frequency circuit 1504 to realize voice communication.
- Speakers are used to convert electrical signals from one or more processors 1501 or radio frequency circuits 1504 into sound waves.
- the positioning component 1508 is used to locate the current geographic location of the terminal 1500 to implement navigation or LBS (Location Based Service).
- the power supply 1509 is used to power various components in the terminal 1500 .
- the terminal 1500 also includes one or more sensors 1510 .
- the one or more sensors 1510 include, but are not limited to, an acceleration sensor 1511 , a gyro sensor 1512 , a pressure sensor 1513 , a fingerprint sensor 1514 , an optical sensor 1515 , and a proximity sensor 1516 .
- the acceleration sensor 1511 can detect the magnitude of acceleration on the three coordinate axes of the coordinate system established by the terminal 1500 .
- the gyroscope sensor 1512 can detect the body direction and rotation angle of the terminal 1500 , and the gyroscope sensor 1512 can cooperate with the acceleration sensor 1511 to collect 3D actions of the user on the terminal 1500 .
- the pressure sensor 1513 may be disposed on the side frame of the terminal 1500 and/or the lower layer of the display screen 1505 .
- the fingerprint sensor 1514 is used to collect the user's fingerprint, and the one or more processors 1501 identify the user's identity according to the fingerprints collected by the fingerprint sensor 1514, or the fingerprint sensor 1514 identifies the user's identity according to the collected fingerprints.
- Optical sensor 1515 is used to collect ambient light intensity.
- the one or more processors 1501 may control the display brightness of the display screen 1505 according to the ambient light intensity collected by the optical sensor 1515 .
- a proximity sensor 1516 also called a distance sensor, is usually provided on the front panel of the terminal 1500.
- the proximity sensor 1516 is used to collect the distance between the user and the front of the terminal 1500 .
- the structure shown in FIG. 15 does not constitute a limitation on the terminal 1500, which may include more or fewer components than shown, combine some components, or adopt different component arrangements.
- Embodiments of the present application further provide one or more computer-readable storage media applied to a computer device. At least one computer-readable instruction is stored in the computer-readable storage medium, and the at least one computer-readable instruction is loaded and executed by one or more processors to implement the operations performed by the computer device in the image processing method of the above-described embodiments.
- Embodiments of the present application further provide a computer-readable instruction product or computer-readable instruction, which includes computer-readable instruction code stored in a computer-readable storage medium.
- One or more processors of the computer device read the computer-readable instruction code from the computer-readable storage medium and execute it, so that the computer device executes the image processing methods provided in the various optional implementations described above.
Claims (17)
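- An image processing method, performed by a computer device, the method comprising: performing image semantic segmentation on an original image to obtain a first image, a second image and a third image, wherein a foreground area in the first image is the area where a target object is located in the original image, the second image is a segmented image of a first target area of the target object, and the third image is a segmented image of a second target area of the target object; sub-areas of the foreground area include the first target area and the second target area; generating a target ternary image based on the first image, the second image and the third image, wherein the target ternary image includes the foreground area and a line drawing area, and the line drawing area is obtained by drawing lines on the contour line of the foreground area; different sub-areas of the foreground area correspond to different line widths; and performing matting processing on the target object in the original image based on the target ternary image to obtain a target image including the target object.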
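- The method according to claim 1, wherein the first target area is a hair area; the second target area is a face area; and the foreground area further includes a torso area of the target object, wherein, in the target ternary image, the line width corresponding to the hair area is greater than the line width corresponding to the torso area, and the line width corresponding to the torso area is greater than the line width corresponding to the face area.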
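- The method according to claim 1, wherein the first target region is a hair region; the second target region is a face region; and generating the target trimap based on the first image, the second image, and the third image comprises: generating a first trimap based on the first image and the second image, the first trimap including the foreground region, a first line-drawing sub-region, and a second line-drawing sub-region; wherein the first line-drawing sub-region covers the contour line on the side of the hair region close to the background region in the first image, the second line-drawing sub-region covers the contour line of a non-hair region in the foreground region, and the non-hair region is the region of the foreground region other than the hair region; the first line-drawing sub-region is drawn using a first line width, and the second line-drawing sub-region is drawn using a second line width; the first line width is greater than the second line width; and generating the target trimap based on the third image and the first trimap.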
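- The method according to claim 3, wherein generating the first trimap based on the first image and the second image comprises: obtaining, in the first image, the complete contour line of the foreground region; drawing a line on the complete contour line of the foreground region according to the second line width, to obtain a second trimap, wherein the second trimap includes the foreground region and a third line-drawing sub-region, the third line-drawing sub-region covering the complete contour line of the foreground region; obtaining, in the second image, the complete contour line of the hair region; drawing a line on the complete contour line of the hair region according to the first line width, to obtain a third trimap, wherein the third trimap includes the hair region and a fourth line-drawing sub-region, the fourth line-drawing sub-region covering the complete contour line of the hair region; and merging the second trimap and the third trimap to obtain the first trimap.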
- The method according to claim 3 or 4, wherein the first line width is M times the second line width, M being greater than 1.
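- The method according to claim 4, wherein merging the second trimap and the third trimap to obtain the first trimap comprises: obtaining a first identifier value of each pixel in the second trimap, the first identifier value identifying the color of the pixel in the second trimap; obtaining a second identifier value of each pixel in the third trimap, the second identifier value identifying the color of the pixel in the third trimap; and generating the first trimap based on the magnitude relationship between the first identifier value and the second identifier value.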
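- The method according to claim 6, wherein generating the first trimap based on the magnitude relationship between the first identifier value and the second identifier value comprises: comparing the first identifier value of the pixel at any position in the second trimap with the second identifier value of the pixel at the same position in the third trimap; and using the larger of the first identifier value and the second identifier value as a third identifier value of the pixel at the same position in the first trimap, the third identifier value identifying the color of the pixel in the first trimap.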
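- The method according to claim 3, wherein generating the target trimap based on the third image and the first trimap comprises: determining a target overlapping region of the first trimap based on the face region in the third image, the target overlapping region being the region in which the face region overlaps the second line-drawing sub-region; and assigning a target identifier value to the pixels of the target overlapping region to generate the target trimap, the target identifier value identifying the color of the pixels in the face region.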
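- The method according to claim 1, wherein performing matting on the target object in the original image based on the target trimap, to obtain the target image including the target object, comprises: obtaining a transparency of each pixel in the original image based on the target trimap, the transparency representing the probability that the pixel belongs to the target object; and performing matting on the original image based on the transparency, to obtain the target image.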
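- The method according to claim 9, wherein obtaining the transparency of each pixel in the original image based on the target trimap comprises: inputting the target trimap and the original image into a matting model, the matting model computing, from the target trimap and the original image, the probability that each pixel in the original image belongs to the target image, so as to output the transparency; and obtaining the transparency output by the matting model.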
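- An image processing apparatus, comprising: an image segmentation module, configured to perform image semantic segmentation on an original image to obtain a first image, a second image, and a third image, wherein a foreground region in the first image is the region in which a target object in the original image is located, the second image is a segmented image of a first target region of the target object, and the third image is a segmented image of a second target region of the target object; sub-regions of the foreground region include the first target region and the second target region; a trimap generation module, configured to generate a target trimap based on the first image, the second image, and the third image, wherein the target trimap includes the foreground region and a line-drawing region, the line-drawing region being obtained by drawing lines on the contour line of the foreground region, and different sub-regions of the foreground region correspond to different line widths; and a matting module, configured to perform matting on the target object in the original image based on the target trimap, to obtain a target image including the target object.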
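- The apparatus according to claim 11, wherein the first target region is a hair region; the second target region is a face region; and the foreground region further includes a torso region of the target object, wherein, in the target trimap, the line width corresponding to the hair region is greater than the line width corresponding to the torso region, and the line width corresponding to the torso region is greater than the line width corresponding to the face region.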
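- The apparatus according to claim 12, wherein the first target region is a hair region; the second target region is a face region; and the trimap generation module comprises: a first generation unit, configured to generate a first trimap based on the first image and the second image, the first trimap including the foreground region, a first line-drawing sub-region, and a second line-drawing sub-region; wherein the first line-drawing sub-region covers the contour line on the side of the hair region close to the background region, the second line-drawing sub-region covers the contour line of other regions, and the other regions are the regions of the foreground region other than the hair region; a first line width is greater than a second line width, the first line width being used to draw the first line-drawing sub-region and the second line width being used to draw the second line-drawing sub-region; and a second generation unit, configured to generate the target trimap based on the third image and the first trimap.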
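- The apparatus according to claim 13, wherein the first generation unit is configured to: obtain, in the first image, the complete contour line of the foreground region; draw a line on the complete contour line of the foreground region according to the second line width, to obtain a second trimap, wherein the second trimap includes the foreground region and a third line-drawing sub-region, the third line-drawing sub-region covering the complete contour line of the foreground region; obtain, in the second image, the complete contour line of the hair region; draw a line on the complete contour line of the hair region according to the first line width, to obtain a third trimap, wherein the third trimap includes the hair region and a fourth line-drawing sub-region, the fourth line-drawing sub-region covering the complete contour line of the hair region; and merge the second trimap and the third trimap to obtain the first trimap.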
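- A computer device, comprising one or more processors and a memory, the memory being configured to store at least one computer-readable instruction, and the at least one computer-readable instruction being loaded by the one or more processors to perform the image processing method according to any one of claims 1 to 10.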
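- One or more computer-readable storage media, storing at least one computer-readable instruction, the at least one computer-readable instruction being loaded and executed by one or more processors to implement the image processing method according to any one of claims 1 to 10.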
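- A computer program product, comprising computer-readable instructions which, when executed by one or more processors, implement the image processing method according to any one of claims 1 to 10.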
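The trimap construction recited in the method claims (contour lines of different widths, merging by per-pixel maximum, and overwriting the face overlap) can be illustrated with a minimal NumPy sketch. Everything concrete here is an assumption made for illustration and is not taken from the claims: the identifier values 0/128/255, the function names, the default widths, and the use of a square dilation of the contour in place of an actual line-drawing routine. The face step is also simplified: it overwrites any line pixel inside the face mask, whereas the claims restrict it to the overlap with the second line-drawing sub-region. The matting model that produces the transparency is not sketched; `cut_out` only shows how a given transparency map could be attached to the image.

```python
import numpy as np

# Assumed identifier values; the claims only state that each value
# identifies a pixel colour.
BG, UNKNOWN, FG = 0, 128, 255


def dilate(mask: np.ndarray, radius: int) -> np.ndarray:
    """Binary dilation with a (2*radius+1) square window, NumPy only."""
    h, w = mask.shape
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    out = np.zeros_like(mask)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def contour(mask: np.ndarray) -> np.ndarray:
    """Mask pixels with at least one non-mask pixel in their 3x3 neighbourhood."""
    return mask & dilate(~mask, 1)


def draw_trimap(mask: np.ndarray, line_width: int) -> np.ndarray:
    """Region = FG, band around its contour = UNKNOWN, everything else = BG."""
    mask = mask.astype(bool)
    trimap = np.where(mask, FG, BG).astype(np.uint8)
    # Approximation: the band is 2*(line_width//2)+1 pixels thick.
    trimap[dilate(contour(mask), line_width // 2)] = UNKNOWN
    return trimap


def build_target_trimap(fg_mask, hair_mask, face_mask, thin=2, m=3):
    """Return (first_trimap, target_trimap); thin and m are illustrative."""
    second = draw_trimap(fg_mask, thin)        # full foreground, thin line
    third = draw_trimap(hair_mask, thin * m)   # hair only, line M times wider
    first = np.maximum(second, third)          # keep the larger identifier value
    target = first.copy()
    # Simplified face step: line pixels inside the face become foreground.
    target[face_mask.astype(bool) & (first == UNKNOWN)] = FG
    return first, target


def cut_out(image: np.ndarray, alpha: np.ndarray) -> np.ndarray:
    """Attach a transparency map in [0, 1] as the alpha channel of an RGB image."""
    a = np.clip(alpha, 0.0, 1.0)
    return np.dstack([image, np.round(a * 255).astype(np.uint8)])
```

With these assumed values, taking the maximum lets the foreground of the second trimap win over the inward part of the thick hair line, so the wide band survives mainly on the background side of the hair, while the rest of the outline keeps the thin band.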
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2023524819A JP7635372B2 (ja) | 2021-01-18 | 2022-01-11 | Image processing method, apparatus, device, and computer program |
| EP22738995.4A EP4276754B1 (en) | 2021-01-18 | 2022-01-11 | Image processing method and apparatus, device, storage medium, and computer program product |
| US17/989,109 US20230087489A1 (en) | 2021-01-18 | 2022-11-17 | Image processing method and apparatus, device, and storage medium |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110062567.1 | 2021-01-18 | | |
| CN202110062567.1A CN113570614B (zh) | 2021-01-18 | 2021-01-18 | Image processing method, apparatus, device, and storage medium |
Related Child Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US17/989,109 Continuation US20230087489A1 (en) | 2021-01-18 | 2022-11-17 | Image processing method and apparatus, device, and storage medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2022152116A1 (zh) | 2022-07-21 |
Family
ID=78160954
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2022/071306 Ceased WO2022152116A1 (zh) | Image processing method and apparatus, device, storage medium, and computer program product | 2021-01-18 | 2022-01-11 |
Country Status (5)
| Country | Link |
|---|---|
| US (1) | US20230087489A1 (zh) |
| EP (1) | EP4276754B1 (zh) |
| JP (1) | JP7635372B2 (zh) |
| CN (1) | CN113570614B (zh) |
| WO (1) | WO2022152116A1 (zh) |
Families Citing this family (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113570614B (zh) * | 2021-01-18 | 2025-12-12 | 腾讯科技(深圳)有限公司 | Image processing method, apparatus, device, and storage medium |
| CN117651972A (zh) * | 2022-07-04 | 2024-03-05 | 北京小米移动软件有限公司 | Image processing method and apparatus, terminal device, electronic device, and storage medium |
| US20240320838A1 (en) * | 2023-03-20 | 2024-09-26 | Adobe Inc. | Burst image matting |
| CN116503423B (zh) * | 2023-04-27 | 2026-01-13 | 深圳市即构科技有限公司 | Region matting method, electronic device, and storage medium |
| CN116843708B (zh) * | 2023-08-30 | 2023-12-12 | 荣耀终端有限公司 | Image processing method and apparatus |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20150213611A1 (en) * | 2014-01-29 | 2015-07-30 | Canon Kabushiki Kaisha | Image processing apparatus that identifies image area, and image processing method |
| CN110751655A (zh) * | 2019-09-16 | 2020-02-04 | 南京工程学院 | Automatic matting method based on semantic segmentation and saliency analysis |
| CN111383232A (zh) * | 2018-12-29 | 2020-07-07 | Tcl集团股份有限公司 | Matting method and apparatus, terminal device, and computer-readable storage medium |
| CN113570614A (zh) * | 2021-01-18 | 2021-10-29 | 腾讯科技(深圳)有限公司 | Image processing method, apparatus, device, and storage medium |
Family Cites Families (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US8386964B2 (en) * | 2010-07-21 | 2013-02-26 | Microsoft Corporation | Interactive image matting |
| CN103473780B (zh) * | 2013-09-22 | 2016-05-25 | 广州市幸福网络技术有限公司 | Portrait background matting method |
| US10275892B2 (en) * | 2016-06-09 | 2019-04-30 | Google Llc | Multi-view scene segmentation and propagation |
| CN108961303B (zh) * | 2018-07-23 | 2021-05-07 | 北京旷视科技有限公司 | Image processing method and apparatus, electronic device, and computer-readable medium |
| KR102135478B1 (ko) * | 2018-12-04 | 2020-07-17 | 엔에이치엔 주식회사 | Deep learning-based virtual hair dyeing method and system |
| CN111080656B (zh) * | 2019-12-10 | 2024-10-01 | 腾讯科技(深圳)有限公司 | Image processing method, image synthesis method, and related apparatus |
| CN111507994B (zh) * | 2020-04-24 | 2023-10-03 | Oppo广东移动通信有限公司 | Portrait extraction method, portrait extraction apparatus, and mobile terminal |
2021
- 2021-01-18 CN CN202110062567.1A patent/CN113570614B/zh Active
2022
- 2022-01-11 EP EP22738995.4A patent/EP4276754B1/en Active
- 2022-01-11 WO PCT/CN2022/071306 patent/WO2022152116A1/zh Ceased
- 2022-01-11 JP JP2023524819A patent/JP7635372B2/ja Active
- 2022-11-17 US US17/989,109 patent/US20230087489A1/en Pending
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP4276754A4 |
Also Published As
| Publication number | Publication date |
|---|---|
| US20230087489A1 (en) | 2023-03-23 |
| JP2023546607A (ja) | 2023-11-06 |
| EP4276754B1 (en) | 2026-04-08 |
| CN113570614B (zh) | 2025-12-12 |
| CN113570614A (zh) | 2021-10-29 |
| EP4276754A1 (en) | 2023-11-15 |
| EP4276754A4 (en) | 2024-06-26 |
| JP7635372B2 (ja) | 2025-02-25 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
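| CN113569614B (zh) | | Avatar generation method, apparatus, device, and storage medium |
| WO2022152116A1 (zh) | | Image processing method and apparatus, device, storage medium, and computer program product |
| CN108537859B (zh) | | Image mask using deep learning |
| CN112991494B (zh) | | Image generation method and apparatus, computer device, and computer-readable storage medium |
| CN112749613B (zh) | | Video data processing method and apparatus, computer device, and storage medium |
| US11308655B2 (en) | | Image synthesis method and apparatus |
| US12430782B2 (en) | | Item display method, apparatus, and device, and storage medium |
| Liu et al. | | Real-time robust vision-based hand gesture recognition using stereo images |
| CN110570460B (zh) | | Target tracking method and apparatus, computer device, and computer-readable storage medium |
| CN112257552B (zh) | | Image processing method, apparatus, device, and storage medium |
| CN112001872B (zh) | | Information display method, device, and storage medium |
| CN107749062B (zh) | | Image processing method and apparatus |
| CN111768356A (zh) | | Face image fusion method and apparatus, electronic device, and storage medium |
| CN112818979B (zh) | | Text recognition method, apparatus, device, and storage medium |
| CN111107264A (zh) | | Image processing method and apparatus, storage medium, and terminal |
| WO2020155984A1 (zh) | | Facial expression image processing method and apparatus, and electronic device |
| CN114676360A (zh) | | Image processing method and apparatus, electronic device, and storage medium |
| CN118230203B (zh) | | AR translation processing method and electronic device |
| Liu et al. | | Light direction estimation and hand touchable interaction for augmented reality |
| CN116012270A (zh) | | Image processing method and apparatus |
| Sheremet et al. | | Efficient face detection and replacement in the creation of simple fake videos |
| HK40054498A (zh) | | Image processing method, apparatus, device, and storage medium |
| CN120747126B (zh) | | Image processing method and electronic device |
| HK40071445B (zh) | | Image processing method and apparatus, electronic device, and storage medium |
| HK40044513B (zh) | | Video data processing method and apparatus, computer device, and storage medium |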
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22738995; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | Wipo information: entry into national phase | Ref document number: 2023524819; Country of ref document: JP |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | ENP | Entry into the national phase | Ref document number: 2022738995; Country of ref document: EP; Effective date: 20230811 |
| | WWG | Wipo information: grant in national office | Ref document number: 2022738995; Country of ref document: EP |