Detailed Description
To make the objects, technical solutions and advantages of the present application clearer, the embodiments of the present application are described in further detail below with reference to the accompanying drawings.
To make the solutions in the embodiments of the present application easier to understand, several terms used in the embodiments are first explained below.
First partial image: the first partial image may be an image of one closed region in the target image, or an image of two or more closed regions. In the extreme case, the first partial image may be the entire target image. The target image contains a specified type object, which may be at least one of a sky object, a sea object, a river object, a lake object, a hill object, a grassland object, or a road object.
Target image: an image that contains the first partial image. The terminal may obtain the target image by screening images to be processed. Optionally, the terminal may determine by image recognition whether an image to be processed contains the specified type object, and determine the partial image corresponding to the specified type object as the first partial image. Optionally, the embodiments of the present application may also identify whether an image to be processed is a target image by means of an image recognition model. The image recognition model is a machine learning model obtained in advance by training on second sample images, which are images labeled with their corresponding image types. The image types include a target type and a non-target type: an image of the target type contains the specified type object, and an image of the non-target type does not. It should be noted that the image recognition model may be a CNN (Convolutional Neural Network) model, for example at least one of a VGG (Visual Geometry Group) model, a YOLO (You Only Look Once) model, or an RCNN (Regions with CNN features) model.
Image generation model: a machine learning model obtained in advance by training on N first sample images. Each image generation model corresponds to a preset classification, which is one of at least two classifications of the specified type object. Optionally, when the specified type object is a sky object, the preset classification may be one of a sunny type, a cloudy type, or a rainy type. It should be noted that the terminal may integrate an image generation model for the sunny type, one for the cloudy type, and one for the rainy type at the same time. In another possible implementation, the terminal may provide further image generation models according to the time of day; for example, the single image generation model for the sunny type may be replaced with separate image generation models for a sunny sky at sunrise, at noon, at sunset, and at night. In practical applications, the image generation model may be a GAN (Generative Adversarial Nets) model. For example, a GAN may be trained on a set of sunny-sky images (the first sample images) to obtain the image generation model corresponding to the sunny type. Because all the first sample images of an image generation model belong to the same preset classification, the second partial image it generates resembles images of that preset classification.
Information output component: a component that outputs information the user can perceive as visual, auditory, or somatosensory information. It includes at least one of a screen, a loudspeaker, a vibration component, an indicator lamp, and similar components.
For example, the image processing method shown in the embodiments of the present application may be applied to a terminal provided with a display screen. The terminal may include a mobile phone, a tablet computer, a laptop computer, a desktop computer, an all-in-one computer, a server, a workstation, a television, a set-top box, smart glasses, a smart watch, a digital camera, an MP4 player terminal, an MP5 player terminal, a learning machine, a point-and-read machine, an e-book reader, an electronic dictionary, a vehicle-mounted terminal, a Virtual Reality (VR) player terminal, an Augmented Reality (AR) player terminal, or the like.
Please refer to fig. 1, which is a flowchart illustrating an image processing method according to an exemplary embodiment of the present application. The image processing method can be applied to the terminal shown above. In fig. 1, the image processing method includes:
Step 110: acquire a target image.
In this embodiment of the application, the terminal may first acquire an image to be processed; for how the image to be processed is acquired, refer to the explanation of the target image above, which is not repeated here. After acquiring the image to be processed, the terminal can determine by image recognition whether it contains the specified type object, and take an image containing the specified type object as the target image.
Step 120: extract a first partial image from the target image, the first partial image being an image containing the specified type object.
In the embodiment of the application, the terminal extracts from the target image the first partial image, which contains the specified type object. For example, when the specified type object is a sky object, the terminal extracts the sky region of the target image and uses that region as the first partial image.
Please refer to fig. 2, which is a schematic diagram of a first partial image according to the embodiment shown in fig. 1. In fig. 2, the specified type object is a sky object, and the terminal extracts a first partial image 210 from the target image 200, where the first partial image 210 contains the sky object.
Step 130: generate a second partial image through an image generation model corresponding to a preset classification, where the preset classification is one of at least two classifications of the specified type object, the image generation model is a machine learning model obtained in advance by training on N first sample images, the first sample images contain the specified type object of the preset classification, and the geometric size of the second partial image is the same as that of the first partial image.
In the embodiment of the application, the terminal can generate the second partial image through the image generation model corresponding to the preset classification. The preset classification is either determined by the terminal itself or designated by the user, and it is a classification for which an image generation model is stored in the terminal. For example, when the specified type object is a sky object, the preset classification may be one of a sunny type, a cloudy type, or a rainy type.
Optionally, the terminal may determine the preset classification from the at least two classifications of the specified type object, and then select the corresponding image generation model from at least two image generation models according to the preset classification.
In one possible implementation, the terminal may determine the preset classification autonomously. For example, the terminal may determine the preset classification according to the real-time weather at its current geographic location: suppose the terminal determines that it is in Dongcheng District, Beijing, and that the time is 15:00 on March 27, 2018. The terminal can then read the weather for Dongcheng District, Beijing at 15:00 on March 27, 2018 from the cloud or from locally cached data. If, for example, the weather obtained is clear, the terminal can determine the preset classification as the sunny type.
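The weather-based selection above can be sketched as two lookups: weather description to preset classification, then preset classification to image generation model. This is a minimal sketch under assumptions: the weather strings, the `WEATHER_TO_CLASSIFICATION` table, the fallback to the sunny type, and the flat-colour placeholder generators in `registry` are all hypothetical stand-ins; a real terminal would fetch the weather from the cloud or a local cache and hold trained models.

```python
from typing import Callable, Dict, Tuple

import numpy as np

# Hypothetical mapping from a weather description to a preset classification.
WEATHER_TO_CLASSIFICATION: Dict[str, str] = {
    "clear": "sunny",
    "sunny": "sunny",
    "cloudy": "cloudy",
    "overcast": "cloudy",
    "rain": "rainy",
    "light rain": "rainy",
}

# A generator takes (height, width) and returns an RGB image of that size.
Generator = Callable[[int, int], np.ndarray]


def select_model(weather: str, registry: Dict[str, Generator],
                 default: str = "sunny") -> Tuple[str, Generator]:
    """Map the current weather to a preset classification and its model."""
    classification = WEATHER_TO_CLASSIFICATION.get(weather.strip().lower(), default)
    return classification, registry[classification]


# Placeholder "models" that only fill the area with a flat colour;
# trained image generation models would be used in practice.
registry: Dict[str, Generator] = {
    "sunny": lambda h, w: np.full((h, w, 3), (110, 170, 240), dtype=np.uint8),
    "cloudy": lambda h, w: np.full((h, w, 3), (180, 185, 190), dtype=np.uint8),
    "rainy": lambda h, w: np.full((h, w, 3), (90, 95, 105), dtype=np.uint8),
}

classification, model = select_model("Clear", registry)
print(classification, model(2, 3).shape)  # sunny (2, 3, 3)
```

Keeping the weather table separate from the model registry means new classifications (for example the time-of-day variants mentioned earlier) only require new registry entries and table rows.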
In another possible implementation, the terminal may accept a user operation and determine the preset classification according to it. First, the terminal receives a first signal, which is generated when a first user operation is received, the first user operation being a selection operation performed on the at least two classifications; the terminal then determines, according to the first signal, the classification corresponding to the first user operation as the preset classification. Please refer to fig. 3, which is a schematic diagram of the interaction in which a user determines the preset classification, according to the embodiment shown in fig. 1. In fig. 3, the terminal displays the candidate preset classifications sunny type 310, cloudy type 320, and rainy type 330, and the user's finger 340 taps the icon corresponding to sunny type 310 on the terminal's touch display screen. The terminal then determines the sunny type as the preset classification and generates the second partial image using the image generation model corresponding to that preset classification.
Step 140: replace the first partial image in the target image with the second partial image.
In the embodiment of the present application, the terminal replaces the first partial image in the target image with the second partial image. In one possible implementation, the terminal may overlay the second partial image at the position of the first partial image, combining the target image and the second partial image into a new target image. In another possible implementation, the terminal may delete the first partial image from the target image and splice the remaining part with the second partial image to synthesize a new target image.
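The overlay variant of step 140 can be sketched with NumPy boolean indexing. This is a minimal sketch, assuming the first partial image is given as a boolean mask over the target image and reading "same geometric size" as "same size as the mask's bounding box"; `replace_partial_image` is a hypothetical helper name, not part of the described method.

```python
import numpy as np


def replace_partial_image(target: np.ndarray, mask: np.ndarray,
                          second: np.ndarray) -> np.ndarray:
    """Overlay `second` onto `target` wherever `mask` is True.

    `mask` is a boolean (H, W) array marking the first partial image;
    `second` must have the size of the mask's bounding box.
    """
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return target.copy()  # nothing to replace
    y0, y1 = ys.min(), ys.max() + 1
    x0, x1 = xs.min(), xs.max() + 1
    if second.shape[:2] != (y1 - y0, x1 - x0):
        raise ValueError("second partial image must match the first partial image's size")
    out = target.copy()
    box_mask = mask[y0:y1, x0:x1]
    # Basic slicing returns a view, so this assignment writes into `out`.
    out[y0:y1, x0:x1][box_mask] = second[box_mask]
    return out


target = np.zeros((4, 4, 3), dtype=np.uint8)
mask = np.zeros((4, 4), dtype=bool)
mask[:2, :] = True                          # top half is the "sky"
second = np.full((2, 4, 3), 255, dtype=np.uint8)
result = replace_partial_image(target, mask, second)
print(result[:, 0, 0].tolist())             # [255, 255, 0, 0]
```

Only masked pixels inside the bounding box are copied, so a non-rectangular first partial image (for example a sky region bounded by a skyline) leaves the rest of the target image untouched.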
In summary, in the image processing method provided in this embodiment, a target image is acquired, the first partial image containing the specified type object is extracted from the target image, a second partial image with the same geometric size as the first partial image is generated by the image generation model corresponding to the preset classification, and the first partial image in the target image is replaced with the second partial image. The terminal can thus automatically replace the partial region of the target image that contains the specified type object with an image generated by the image generation model corresponding to the preset classification.
In an embodiment provided by the present application, the terminal can further determine the preset classification according to the expression of a person in the target image; see the following embodiment.
Please refer to fig. 4, which is a flowchart of an image processing method according to another exemplary embodiment of the present application. The image processing method can be applied to the terminal shown above. In fig. 4, the image processing method includes:
Step 401: acquire an image to be processed.
In this embodiment, the terminal may acquire the image to be processed in the following ways.
Method 1: the terminal receives an image acquisition signal and controls the camera to capture the image to be processed, where the image acquisition signal is generated when the terminal receives an operation that triggers the camera to capture the image to be processed.
For example, when a user taps the shutter button of a camera application in the terminal or presses a volume button of the terminal, the terminal captures an image and uses it as the image to be processed.
Method 2: the terminal reads the local storage chip and uses an image stored in it as the image to be processed.
For example, the user selects an image in the terminal's album, and that image is used as the image to be processed. It should be noted that the local storage chip may be a volatile memory chip or a nonvolatile memory chip.
Method 3: the terminal receives an open framing signal and controls the camera to acquire the image to be processed, where the open framing signal is generated when the terminal receives an operation instructing the shooting application to enter the framing mode.
For example, the terminal may use the signal generated when the user taps the start icon of the shooting application as the open framing signal, or use another signal generated when the shooting application starts and enters the framing mode; the terminal then uses the framing image acquired by the camera as the image to be processed.
Optionally, the framing image may be the image displayed in real time on the terminal's screen while the camera is in the framing state, so that the user can observe what the camera captures in real time.
Method 4: the terminal may also use an image downloaded from the cloud as the image to be processed.
For example, a user downloads an image from a cloud server, and the terminal uses that image as the image to be processed.
Step 402: input the image to be processed into an image recognition model to obtain the image type of the image to be processed, where the image recognition model is a machine learning model obtained in advance by training on second sample images, and the second sample images include images labeled with their corresponding image types.
It should be noted that the image types include a target type and a non-target type: an image of the target type contains the specified type object, and an image of the non-target type does not. After acquiring the image to be processed, the terminal can input it into the image recognition model to obtain the target image. Please refer to fig. 5, which is a schematic diagram of the process of acquiring a target image according to the embodiment shown in fig. 4. In fig. 5, the image recognition model classifies images to be processed into target images and non-target images.
The second sample images may be obtained through step (a1), step (a2), step (a3), and step (a4).
Step (a1): the terminal collects images containing the specified type object.
It should be noted that which images are collected depends on the specified type object. When the specified type object is a sky object, the terminal may collect various images containing sky regions, including images in which the sky fills the whole frame. When the specified type object is a sea object, the terminal may collect images of a calm sea, images of rolling waves, images of a sparkling sea surface, and so on. By the same collection criteria, the specified type object may also be at least one of a river object, a lake object, a hill object, a grassland object, or a road object.
Step (a2): the terminal divides each image containing the specified type object into M image blocks, M being a positive integer.
The terminal may set M according to the resolution of the image: the higher the resolution, the larger M; the lower the resolution, the smaller M. That is, M is positively correlated with the image resolution. For example, when the terminal is a mobile phone or a tablet computer, M may be 100, implemented as a 10 × 10 grid that divides the image into 100 equal-sized image blocks. It should be noted that the value of M may be determined empirically, may vary around the example disclosed herein, or may be fixed at a specific value; this embodiment does not limit it.
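The 10 × 10 grid example can be sketched as plain array slicing. This is a minimal sketch, assuming the image is a NumPy array of shape (H, W) or (H, W, C); dropping the leftover pixels when H or W is not divisible by the grid size is a simplification of this sketch, not something the text specifies, and `split_into_blocks` is a hypothetical name.

```python
from typing import List

import numpy as np


def split_into_blocks(image: np.ndarray, rows: int = 10, cols: int = 10) -> List[np.ndarray]:
    """Divide `image` into rows x cols equal-sized blocks (M = rows * cols).

    Pixels left over when the height or width is not divisible by the
    grid size are dropped in this sketch.
    """
    h, w = image.shape[:2]
    bh, bw = h // rows, w // cols
    if bh == 0 or bw == 0:
        raise ValueError("grid is finer than the image")
    # Row-major order: block index k corresponds to grid cell (k // cols, k % cols).
    return [image[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw]
            for r in range(rows) for c in range(cols)]


blocks = split_into_blocks(np.zeros((480, 640, 3), dtype=np.uint8))
print(len(blocks), blocks[0].shape)  # 100 (48, 64, 3)
```

A larger grid (larger M) for higher-resolution images keeps each block at a comparable pixel size, which is one way to read the positive correlation described above.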
Step (a3): the terminal adds a label to each of the M image blocks, the label indicating the image type of the corresponding image block.
In this embodiment, the terminal may obtain labels for the M image blocks annotated by a technician, or may add labels to the M image blocks by calling another recognition model.
Step (a4): the terminal takes the M image blocks as the second sample images.
In this embodiment, the terminal uses the M image blocks as second sample images. Optionally, the terminal may train the image recognition model with the second sample images. The image recognition model may also be trained on a server, and the trained image recognition model may be installed directly on the terminal.
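Steps (a2) to (a4) can be sketched together as a function that cuts a collected image into blocks and pairs each block with a label. This is a sketch under an explicit assumption: the labels are derived from a hypothetical per-pixel annotation mask (for example a technician's sky mask), and a block counts as the target type when at least `min_fraction` of its pixels are marked. The threshold and the function name `make_second_samples` are illustrative only.

```python
from typing import List, Tuple

import numpy as np


def make_second_samples(image: np.ndarray, annotation: np.ndarray,
                        rows: int = 10, cols: int = 10,
                        min_fraction: float = 0.5) -> List[Tuple[np.ndarray, str]]:
    """Cut `image` into rows x cols blocks and label each block.

    `annotation` is a hypothetical boolean (H, W) mask of specified-type
    pixels; a block is labeled "target" when at least `min_fraction` of
    its pixels are marked, otherwise "non-target".
    """
    h, w = image.shape[:2]
    bh, bw = h // rows, w // cols
    samples = []
    for r in range(rows):
        for c in range(cols):
            window = (slice(r * bh, (r + 1) * bh), slice(c * bw, (c + 1) * bw))
            # The mean of a boolean array is the fraction of True pixels.
            label = "target" if annotation[window].mean() >= min_fraction else "non-target"
            samples.append((image[window], label))
    return samples


image = np.zeros((20, 20, 3), dtype=np.uint8)
annotation = np.zeros((20, 20), dtype=bool)
annotation[:10] = True  # upper half annotated as sky
samples = make_second_samples(image, annotation, rows=2, cols=2)
print([label for _, label in samples])  # ['target', 'target', 'non-target', 'non-target']
```

The resulting (block, label) pairs are the kind of labeled data the image recognition model would then be trained on, whether on the terminal or on a server.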
Step 403: determine the image to be processed whose image type is the target type as the target image.
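Steps 402 and 403 amount to filtering images by the predicted image type. In the sketch below, `select_target_images` accepts any recognizer callable; `toy_sky_recognizer` is a hypothetical brightness-and-blueness heuristic used only so the example runs, not a substitute for the trained CNN described above.

```python
from typing import Callable, Iterable, List

import numpy as np

TARGET, NON_TARGET = "target", "non-target"


def select_target_images(images: Iterable[np.ndarray],
                         recognize: Callable[[np.ndarray], str]) -> List[np.ndarray]:
    """Keep only the images whose predicted image type is the target type."""
    return [img for img in images if recognize(img) == TARGET]


def toy_sky_recognizer(img: np.ndarray) -> str:
    """Hypothetical stand-in for the trained model: an RGB image counts as a
    target image if its top third is, on average, bright and bluish."""
    top = img[: max(1, img.shape[0] // 3)].astype(float)
    red, blue = top[..., 0].mean(), top[..., 2].mean()
    return TARGET if blue > 120 and blue > red else NON_TARGET


sky = np.zeros((30, 30, 3), dtype=np.uint8)
sky[:10] = (100, 150, 230)                     # blue upper band
indoor = np.full((30, 30, 3), 60, dtype=np.uint8)
targets = select_target_images([sky, indoor], toy_sky_recognizer)
print(len(targets))  # 1
```

Passing the recognizer in as a parameter keeps the screening step independent of which model (VGG, YOLO, RCNN, or another) actually performs the classification.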
Step 404: divide the target image into Q image blocks, Q being a positive integer.
In this embodiment, the terminal may divide the target image into Q image blocks, where Q is a positive integer. The value of Q may be the same as M, i.e. Q = M. When Q equals M, less program code and fewer computing resources are needed to execute this embodiment, so the method can be implemented more efficiently.
Step 405: for each of the Q image blocks, determine a geometric reference point of the area of the image block that belongs to the first partial image, the geometric reference point being one of the center, the orthocenter, or the center of gravity.
Please refer to fig. 6, which is a schematic diagram of the positions of geometric reference points according to the embodiment shown in fig. 4. In fig. 6, the whole of image block 610 belongs to the first partial image; if the center is used as the geometric reference point, point 611 is the geometric reference point of image block 610. The upper-left triangular area of image block 620 belongs to the first partial image; if the center is used as the geometric reference point, point 621 is the geometric reference point of image block 620.
Step 406: extract the first partial image from the target image according to the coordinates of the geometric reference point of each of the Q image blocks.
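Step 405 can be sketched for the center-of-gravity option: with uniform pixel weights, the center of gravity of the part of a block that lies in the first partial image is simply the mean of that part's pixel coordinates. This is a minimal sketch, assuming the first partial image is available as a boolean mask and that blocks without any mask pixels yield no reference point; the center and orthocenter options are not implemented, and `block_reference_points` is a hypothetical name.

```python
from typing import List, Tuple

import numpy as np


def block_reference_points(mask: np.ndarray, rows: int, cols: int
                           ) -> List[Tuple[float, float]]:
    """Return (y, x) centers of gravity, in full-image coordinates, of the part
    of each grid block that belongs to the first partial image.

    `mask` is a boolean (H, W) array; blocks containing no mask pixels
    contribute no reference point.
    """
    h, w = mask.shape
    bh, bw = h // rows, w // cols
    points = []
    for r in range(rows):
        for c in range(cols):
            ys, xs = np.nonzero(mask[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw])
            if ys.size:
                # Uniform pixel weights: the center of gravity is the mean coordinate,
                # shifted by the block's offset to full-image coordinates.
                points.append((float(r * bh + ys.mean()), float(c * bw + xs.mean())))
    return points


mask = np.zeros((4, 4), dtype=bool)
mask[:2, :] = True  # top two rows belong to the first partial image
print(block_reference_points(mask, 2, 2))  # [(0.5, 0.5), (0.5, 2.5)]
```

For a block that lies entirely inside the mask (like image block 610) this point coincides with the block's center; for a partially covered block (like the triangle in image block 620) it shifts toward the covered part.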
In actual implementation, the terminal can use the geometric reference points to obtain a first partial image with a more accurate extent, through step (b1), step (b2), step (b3), step (b4), and step (b5).
Step (b1): the terminal divides the target image into K image blocks of equal area, K being a positive integer.
It should be noted that K should be as large as the computing capability of the terminal executing this solution allows, because the larger K is, the finer the boundary of the first partial image extracted by the terminal.
Step (b2): in the i-th execution, the terminal calculates, for the remaining (K + 1 - i) image blocks of the K image blocks, the similarity s between every two image blocks that share a common edge, obtaining a similarity set S.
In this embodiment, the terminal performs this merging operation on the K image blocks repeatedly, starting from the 1st execution, until the K image blocks have been merged into 2 image blocks.
Step (b3): merge the target image block and the second image block corresponding to the largest similarity s into one image block.
Step (b4): delete the similarity values involving the target image block and those involving the second image block from the similarity set S.
Step (b5): perform the (i + 1)-th execution, repeating until the similarity set S is empty; then, among the merged image blocks, take the image block containing the geometric reference points of the Q image blocks as the first partial image.
Please refer to fig. 7, which is a schematic diagram of the process of extracting a first partial image according to the embodiment shown in fig. 4. In fig. 7, K is 9. Each time, the terminal merges the two image blocks with the highest similarity, until 2 image blocks remain; the terminal then determines which of image block 7A and image block 7B contains the geometric reference points of the Q image blocks. In fig. 7, image block 7A contains these geometric reference points and is therefore used as the first partial image. For example, in the 1st execution, the image blocks labeled 1 and 2 have the highest similarity and are merged into new image block 10; after 7 merges, image block 7A makes up most of the first partial image.
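The merge loop of steps (b1) to (b5) can be sketched as greedy region merging over a grid, similar in spirit to the hierarchical grouping used by selective search. Assumptions not taken from the text: a grayscale image, a similarity s defined as the negative absolute difference of mean intensities, S rebuilt from scratch on every pass (which has the same effect as deleting the merged blocks' values in step (b4)), stopping when two regions remain as in fig. 7, and choosing the region that holds the most reference points. `merge_blocks` and `pick_first_partial_image` are hypothetical names.

```python
from typing import List, Sequence, Set, Tuple

import numpy as np

Cell = Tuple[int, int]


def merge_blocks(image: np.ndarray, rows: int, cols: int,
                 stop_at: int = 2) -> Tuple[List[Set[Cell]], int, int]:
    """Greedily merge the most similar edge-sharing regions of a rows x cols grid.

    Returns the remaining regions (sets of grid cells) and the block size.
    """
    h, w = image.shape[:2]
    bh, bw = h // rows, w // cols
    # (b1) K = rows * cols equal-area blocks, summarised by their mean intensity.
    means = {(r, c): float(image[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw].mean())
             for r in range(rows) for c in range(cols)}
    regions: List[Set[Cell]] = [{cell} for cell in means]

    def mean_of(region: Set[Cell]) -> float:
        # Blocks have equal area, so the mean of block means is the pixel mean.
        return sum(means[cell] for cell in region) / len(region)

    def share_edge(a: Set[Cell], b: Set[Cell]) -> bool:
        return any(abs(r1 - r2) + abs(c1 - c2) == 1 for r1, c1 in a for r2, c2 in b)

    while len(regions) > stop_at:
        # (b2) similarity set S: assumed s = -|mean difference| per adjacent pair.
        S = [(-abs(mean_of(a) - mean_of(b)), i, j)
             for i, a in enumerate(regions) for j, b in enumerate(regions)
             if i < j and share_edge(a, b)]
        if not S:
            break
        # (b3) merge the pair with the largest s.
        _, i, j = max(S)
        merged = regions[i] | regions[j]
        # (b4) dropping both old regions discards their similarity values;
        # S is rebuilt on the next pass.
        regions = [reg for k, reg in enumerate(regions) if k not in (i, j)] + [merged]
    return regions, bh, bw


def pick_first_partial_image(regions: List[Set[Cell]],
                             points: Sequence[Tuple[float, float]],
                             bh: int, bw: int) -> Set[Cell]:
    """(b5) Choose the merged region holding the most geometric reference points."""
    def cell_of(p: Tuple[float, float]) -> Cell:
        return int(p[0] // bh), int(p[1] // bw)
    return max(regions, key=lambda reg: sum(cell_of(p) in reg for p in points))


blocks = np.array([[200, 210, 205], [10, 12, 15], [11, 13, 14]], dtype=float)
image = np.kron(blocks, np.ones((2, 2)))  # 6 x 6 image: 3 x 3 blocks of 2 x 2 pixels
regions, bh, bw = merge_blocks(image, rows=3, cols=3)
sky = pick_first_partial_image(regions, [(0.5, 0.5), (0.5, 4.5)], bh, bw)
print(sorted(sky))  # [(0, 0), (0, 1), (0, 2)]
```

With K = 9 the loop performs 7 merges, matching fig. 7. Recomputing S each pass costs more than an incremental update, but keeps the sketch short and clearly equivalent to steps (b2) to (b4).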
Optionally, in a possible implementation, the terminal may obtain the first partial image using an image segmentation algorithm, such as the selective search algorithm.
Step 407: perform expression recognition on the face contained in the target image to obtain the expression type of the face.
When the target image contains at least two faces, the terminal may perform expression recognition on each of them to obtain their respective expression types, and then determine the expression type with the highest proportion among them as the expression type of the face.
For example, if the target image contains three faces and two of them have the happy expression type, the happy expression type is determined as the expression type of the face.
Optionally, when the target image contains at least two faces, the terminal may instead use the expression type of the face with the largest area as the expression type of the face.
Optionally, when the target image contains at least two faces, the terminal may instead use the expression type of the face closest to the lens as the expression type of the face. The distance from the lens can be determined from depth information, which a terminal with two or more cameras can obtain when capturing the target image.
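The three ways of reducing several faces to one expression type (highest proportion, largest area, closest to the lens) can be sketched as interchangeable strategies. This is a minimal sketch, assuming each face has already been recognized and measured; the `Face` record and its fields are hypothetical, and the tie-breaking rule for the majority vote is a choice of this sketch, not of the text.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Face:
    expression: str  # output of a (hypothetical) expression recognizer
    area: float      # face area in pixels
    depth: float     # distance from the lens, e.g. from a dual-camera depth map


def face_expression(faces: List[Face], strategy: str = "majority") -> Optional[str]:
    """Reduce several recognized faces to a single expression type."""
    if not faces:
        return None
    if strategy == "majority":
        # Ties go to the expression encountered first (Counter keeps insertion order).
        return Counter(f.expression for f in faces).most_common(1)[0][0]
    if strategy == "largest":
        return max(faces, key=lambda f: f.area).expression
    if strategy == "closest":
        return min(faces, key=lambda f: f.depth).expression
    raise ValueError(f"unknown strategy: {strategy}")


faces = [Face("happy", 900, 2.5), Face("happy", 400, 3.0),
         Face("calm", 2500, 1.2), Face("sad", 300, 0.8)]
print(face_expression(faces))             # happy
print(face_expression(faces, "largest"))  # calm
print(face_expression(faces, "closest"))  # sad
```

Which strategy fits best depends on the scene: majority voting suits group photos, while the largest-face and closest-face rules favour the main subject.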
Step 408: determine, among the at least two classifications, the classification corresponding to the expression type of the face as the preset classification.
In this embodiment, the correspondence between expression types and classifications may be preset. For example, a happy expression type corresponds to the sunny type, a calm expression type corresponds to the cloudy type, and a sad expression type corresponds to the rainy type.
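The preset correspondence of step 408 is essentially a small configuration table. The sketch below encodes the example pairs from the text; the string keys and the fallback to the sunny type for unmapped expressions are assumptions of this sketch, since the text does not say what happens in that case.

```python
from typing import Dict

# Preset correspondence between expression types and sky classifications,
# following the example pairs in the text; the string keys are illustrative.
EXPRESSION_TO_CLASSIFICATION: Dict[str, str] = {
    "happy": "sunny",
    "calm": "cloudy",
    "sad": "rainy",
}


def classification_for(expression: str, default: str = "sunny") -> str:
    """Return the preset classification for an expression type.

    Falling back to `default` for unmapped expressions is an assumption.
    """
    return EXPRESSION_TO_CLASSIFICATION.get(expression, default)


print(classification_for("sad"))        # rainy
print(classification_for("surprised"))  # sunny
```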
Step 409: generate a second partial image through an image generation model corresponding to a preset classification, where the preset classification is one of at least two classifications of the specified type object, the image generation model is a machine learning model obtained in advance by training on N first sample images, the first sample images contain the specified type object of the preset classification, and the geometric size of the second partial image is the same as that of the first partial image.
In the embodiment of the present application, step 409 is executed in the same way as step 130; see step 130 for details, which are not repeated here.
Step 410: replace the first partial image in the target image with the second partial image.
In the embodiment of the present application, step 410 is executed in the same way as step 140; see step 140 for details, which are not repeated here.
In a possible implementation of this embodiment, the terminal may be a mobile phone on which one image recognition model and multiple image generation models are installed in advance. The image generation models may include a sky image generation model for the sunny type, a sky image generation model for the cloudy type, and a sky image generation model for the rainy type.
In one possible application scenario, when the terminal has started the camera application and is in a sky beautification mode, the terminal directly replaces the sky area in the framing image with a beautified sky area. It should be noted that in this scenario the terminal may query the weather condition corresponding to its current geographic location and the system time, select an image generation model according to the weather condition, and generate the beautified sky area in real time with the selected model.
In another possible application scenario, when the terminal takes a photo (target image) containing the sky, the terminal pops up the available preset classifications, such as a sunny type, a cloudy type, and a rainy type. When the user taps to select a preset classification, such as the sunny type, the terminal generates a beautified sky area (second partial image) with the sky image generation model for the sunny type and uses it to replace the area containing the sky (first partial image) in the photo. Optionally, in another possible implementation, this solution may also be applied to video processing, replacing a partial image of a video frame by frame; for example, the sky in a video clip may be replaced with a sky of the sunny type.
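Frame-by-frame video processing can be sketched as a generator that applies the replacement to each frame in turn. This is a minimal sketch with stated simplifications: frames are assumed to be H x W x 3 arrays, the hypothetical `extract_mask` and `generate` callables stand in for the extraction step and the image generation model, and the generator produces a frame-sized image of which only the masked pixels are kept, rather than an image sized exactly to the first partial image.

```python
from typing import Callable, Iterable, Iterator

import numpy as np


def beautify_video(frames: Iterable[np.ndarray],
                   extract_mask: Callable[[np.ndarray], np.ndarray],
                   generate: Callable[[int, int], np.ndarray]) -> Iterator[np.ndarray]:
    """Replace the specified-type region frame by frame (frames are H x W x 3)."""
    for frame in frames:
        mask = extract_mask(frame)       # boolean H x W: first partial image
        if not mask.any():               # nothing to replace in this frame
            yield frame
            continue
        h, w = frame.shape[:2]
        generated = generate(h, w)       # simplification: frame-sized output
        # Keep generated pixels inside the mask and original pixels elsewhere.
        yield np.where(mask[..., None], generated, frame)


def top_half(frame: np.ndarray) -> np.ndarray:
    """Toy extractor: treat the upper half of the frame as sky."""
    mask = np.zeros(frame.shape[:2], dtype=bool)
    mask[: frame.shape[0] // 2] = True
    return mask


def sunny(h: int, w: int) -> np.ndarray:
    """Toy 'generator': a flat sky colour."""
    return np.full((h, w, 3), (110, 170, 240), dtype=np.uint8)


frames = [np.zeros((4, 6, 3), dtype=np.uint8) for _ in range(3)]
out = list(beautify_video(frames, top_half, sunny))
```

Because the function is a generator, frames can be streamed from a decoder and written back one at a time without holding the whole clip in memory.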
In summary, in this embodiment, the image to be processed is acquired and input into the image recognition model to obtain its image type; the image to be processed whose image type is the target type is determined as the target image; the target image is divided into Q image blocks, and for each of them the geometric reference point of the area belonging to the first partial image is determined; the first partial image is extracted from the target image according to the coordinates of these geometric reference points; expression recognition is performed on the face to obtain its expression type; the classification corresponding to that expression type among the at least two classifications is determined as the preset classification; a second partial image is generated by the image generation model corresponding to the preset classification; and the first partial image in the target image is replaced with the second partial image. In this way, a beautified second partial image matching the expression type of the face in the image is generated automatically, which simplifies the user's steps for replacing a designated area of an image and improves the image processing effect.
The following are embodiments of the apparatus of the present application that may be used to perform embodiments of the method of the present application. For details which are not disclosed in the embodiments of the apparatus of the present application, reference is made to the embodiments of the method of the present application.
Referring to fig. 8, a block diagram of an image processing apparatus according to an exemplary embodiment of the present application is shown. The image processing apparatus can be implemented by software, hardware, or a combination of both as all or part of the terminal. The apparatus includes:
an acquisition unit 810 for acquiring a target image;
an extraction unit 820 configured to extract a first partial image in the target image, the first partial image being an image containing a specified type object;
a generating unit 830, configured to generate a second partial image through an image generation model corresponding to a preset classification, where the preset classification is one of at least two classifications of the specified type object, the image generation model is a machine learning model obtained in advance through training of N first sample images, the first sample images include the specified type object of the preset classification, and a geometric size of the second partial image is the same as a geometric size of the first partial image;
a replacing unit 840 for replacing the first partial image in the target image with the second partial image.
In an optional embodiment, the apparatus further includes a classification determining unit and a model obtaining unit;
the classification determining unit is used for determining the preset classification from at least two classifications of the specified type object;
and the model obtaining unit is used for obtaining the image generation model according to the preset classification.
In an optional embodiment, the classification determining unit is configured to receive a first signal, where the first signal is generated when a first user operation is received, and the first user operation is a selection operation performed on the at least two classifications; and to determine, according to the first signal, the classification corresponding to the first user operation as the preset classification.
In an optional embodiment, the apparatus further comprises an expression recognition unit;
the expression recognition unit is used for performing expression recognition on a human face to acquire an expression type of the human face, wherein the human face is the human face contained in the target image;
the classification determining unit is configured to determine, as the preset classification, a classification corresponding to the expression type of the face from among the at least two classifications.
In an optional embodiment, when the target image contains at least two faces, the expression recognition unit is configured to perform expression recognition on each of the at least two faces to obtain their respective expression types, and to determine the expression type with the highest proportion among them as the expression type of the face.
In an optional embodiment, the acquisition unit 810 is configured to input an image to be processed into an image recognition model to obtain the image type of the image to be processed, where the image recognition model is a machine learning model obtained in advance by training on second sample images, and the second sample images include images labeled with their corresponding image types; and to determine the image to be processed whose image type is the target type as the target image. The image types include the target type and a non-target type: an image of the target type contains the specified type object, and an image of the non-target type does not.
In an optional embodiment, the apparatus further comprises a second sample acquiring unit for acquiring an image containing the object of the specified type; dividing the image containing the object of the specified type into M image blocks, wherein M is a positive integer; respectively adding labels to the M image blocks, wherein the labels are used for indicating the image types of the corresponding image blocks; and acquiring the M image blocks as the second sample images.
In an optional embodiment, the extracting unit 820 is configured to divide the target image into Q image blocks, where Q is a positive integer; for each of the Q image blocks, determine a geometric reference point of the region of the image block that belongs to the first partial image, the geometric reference point being one of a center, an orthocenter, or a center of gravity; and extract the first partial image from the target image according to the coordinates of the geometric reference points of the Q image blocks.
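Taking the center of gravity as the geometric reference point, the per-block computation can be sketched as the mean row and column of the pixels that belong to the first partial image. The sketch assumes that membership is available as a binary mask (1 = pixel of the first partial image), that the Q blocks form a rows × cols grid dividing the image evenly, and that a block with no such pixels has no reference point; none of these details is fixed by the text.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]
Point = Tuple[float, float]


def block_centroids(mask: Mask, rows: int, cols: int) -> List[Optional[Point]]:
    """Center of gravity (row, col) of mask pixels in each of the Q = rows * cols blocks.

    Coordinates are in the frame of the whole target image. Blocks without any
    mask pixel yield None. Assumes the mask size is divisible by the grid size.
    """
    h, w = len(mask), len(mask[0])
    bh, bw = h // rows, w // cols
    points: List[Optional[Point]] = []
    for r in range(rows):
        for c in range(cols):
            sum_y = sum_x = count = 0
            for y in range(r * bh, (r + 1) * bh):
                for x in range(c * bw, (c + 1) * bw):
                    if mask[y][x]:
                        sum_y += y
                        sum_x += x
                        count += 1
            points.append((sum_y / count, sum_x / count) if count else None)
    return points


# Usage: the top-left block is fully covered, the bottom-right block has one pixel.
m = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 1], [0, 0, 0, 0]]
print(block_centroids(m, 2, 2))
```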
In an optional embodiment, the extracting unit 820 is configured to divide the target image into K image blocks of equal area, where K is a positive integer; in the i-th execution, calculate, for the remaining (K + 1 - i) image blocks of the K image blocks, the similarity s between every two image blocks that share a common edge, to obtain a similarity set S; merge the first image block and the second image block corresponding to the similarity s with the largest value into one image block; delete the similarity values corresponding to the first image block and the similarity values corresponding to the second image block from the similarity set S; and perform the (i + 1)-th execution, until the similarity set S is empty; then take, among the merged image blocks, each image block that contains the geometric reference point of an image block among the Q image blocks as the first partial image.
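The merging loop can be sketched as a greedy pairwise merge. This follows one literal reading of the steps, which is our interpretation: the similarity set S is built once over edge-sharing pairs of the K grid blocks, and each iteration merges the pair with the largest similarity and then deletes every entry involving either block, so S only shrinks until it is empty. Recomputing S from the remaining blocks in every iteration is another possible reading. The block feature (a single mean intensity per block), the similarity s = -|f_a - f_b|, and the `ref_blocks` argument (indices of blocks holding the geometric reference points from the previous step) are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def grid_adjacent_pairs(rows: int, cols: int) -> List[Tuple[int, int]]:
    """Index pairs of row-major grid blocks that share a common edge."""
    pairs = []
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:
                pairs.append((i, i + 1))
            if r + 1 < rows:
                pairs.append((i, i + cols))
    return pairs


def greedy_merge(features: List[float], rows: int, cols: int) -> List[Set[int]]:
    """Merge blocks greedily until the similarity set S is empty."""
    # Hypothetical similarity: negative absolute difference of block features.
    S: Dict[Tuple[int, int], float] = {
        (a, b): -abs(features[a] - features[b]) for a, b in grid_adjacent_pairs(rows, cols)
    }
    groups: Dict[int, Set[int]] = {i: {i} for i in range(rows * cols)}
    while S:
        a, b = max(S, key=S.get)          # pair with the largest similarity s
        groups[a] |= groups.pop(b)        # merge the two blocks into one
        # Delete every similarity value involving either merged block.
        S = {p: s for p, s in S.items() if a not in p and b not in p}
    return list(groups.values())


def first_partial_image(groups: List[Set[int]], ref_blocks: Set[int]) -> List[Set[int]]:
    """Keep the merged blocks that contain at least one reference-point block."""
    return [g for g in groups if g & ref_blocks]


# Usage: blocks 0 and 1 are similar, as are 2 and 3 relative to their neighbours.
merged = greedy_merge([0.0, 0.1, 5.0, 9.0], 2, 2)
print(merged, first_partial_image(merged, {0}))
```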
In an optional embodiment, the apparatus further includes a to-be-processed image obtaining unit, configured to: receive an image acquisition signal, the image acquisition signal being a signal generated when a second user operation is received, and the second user operation being an operation for triggering a camera to capture the image to be processed, and control the camera to capture the image to be processed according to the image acquisition signal; or read a local storage chip and acquire a stored image as the image to be processed; or receive an open framing signal, the open framing signal being a signal generated when a third user operation is received, and the third user operation being an operation for instructing a photographing application to enter a framing mode, and, according to the open framing signal, control the photographing application to enter the framing mode and acquire a framing image as the image to be processed; or acquire an image downloaded from the cloud as the image to be processed.
In an optional embodiment, the specified type object involved in the apparatus includes at least one of a sky object, a sea object, a river object, a lake object, a hill object, a grassland object, or a road object.
In an optional embodiment, the specified type object involved in the apparatus is a sky object, and the preset classification is one of a sunny type, a cloudy type, or an overcast-rainy type.
Fig. 9 is a block diagram of a terminal according to an exemplary embodiment of the present application. As shown in Fig. 9, the terminal includes a processor 910 and a memory 920. The memory 920 stores at least one instruction, which is loaded and executed by the processor 910 to implement the image processing method described in the above embodiments.
An embodiment of the present application further provides a computer-readable medium storing at least one instruction, which is loaded and executed by a processor to implement the image processing method described in the above embodiments.
An embodiment of the present application further provides a computer program product storing at least one instruction, which is loaded and executed by a processor to implement the image processing method described in the above embodiments.
It should be noted that, when the image processing apparatus provided in the above embodiments executes the image processing method, the division into the functional modules described above is merely an example. In practical applications, the above functions may be allocated to different functional modules as required; that is, the internal structure of the apparatus may be divided into different functional modules to complete all or some of the functions described above. In addition, the image processing apparatus and the image processing method provided in the above embodiments belong to the same concept; for the specific implementation process, refer to the method embodiments, which is not repeated here.
The serial numbers of the above embodiments of the present application are for description only and do not indicate the relative merits of the embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The foregoing describes merely exemplary embodiments of the present application and is not intended to limit the present application. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present application shall fall within the protection scope of the present application.