WO2021244270A1 - Image processing method, apparatus, device, and computer-readable storage medium
- Publication number
- WO2021244270A1 (PCT/CN2021/094049)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- pixel
- processed
- model
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0475—Generative networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0495—Quantised networks; Sparse networks; Compressed networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/09—Supervised learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/094—Adversarial learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4046—Scaling of whole images or parts thereof, e.g. expanding or contracting using neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4053—Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/20—Image enhancement or restoration using local operators
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N9/00—Details of colour television systems
- H04N9/64—Circuits for processing colour signals
- H04N9/67—Circuits for processing colour signals for matrixing
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/01—Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10024—Color image
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Definitions
- The embodiments of the present application relate to the field of image processing technology, and relate to, but are not limited to, an image processing method, apparatus, device, and computer-readable storage medium.
- Image processing refers to methods and techniques that use a computer to denoise, enhance, restore, and increase the resolution of an image.
- Image processing is widely used in fields such as work, daily life, the military, and medicine.
- Image processing can be implemented through machine learning to achieve better processing results.
- The embodiments of the present application provide an image processing method, apparatus, device, and computer-readable storage medium that can both ensure the pixel coherence of the target image and perform image processing in real time, thereby improving image processing efficiency.
- An embodiment of the present application provides an image processing method, executed by an image processing device, including:
- when the image to be processed is a grayscale image, extracting the feature vector of each pixel in the image to be processed and determining the neighborhood image block corresponding to each pixel;
- An embodiment of the present application provides an image processing apparatus, including:
- a first acquisition module configured to acquire the image to be processed;
- a first extraction module configured to extract the feature vector of each pixel in the image to be processed when the image to be processed is a grayscale image, and to determine the neighborhood image block corresponding to each pixel;
- an output module configured to output the target image.
- An embodiment of the present application provides an image processing device, including:
- a memory configured to store executable instructions;
- a processor configured to execute the executable instructions stored in the memory to implement the foregoing method.
- An embodiment of the present application provides a computer-readable storage medium storing executable instructions that, when executed by a processor, implement the foregoing method.
- The lightweight model is obtained by performing lightweight processing on a trained neural network model. Because a neural network structure is used during training, a pixel-coherent target image can be output even when various special losses are used; and because the lightweight model obtained through model conversion (such as a subspace model or a decision tree) is used for image processing, it can run and output the target image in real time, improving image processing efficiency while preserving the processing effect.
- FIG. 1A is a schematic diagram of a network architecture of an image processing system provided by an embodiment of this application;
- FIG. 1B is a schematic diagram of another network architecture of the image processing system provided by an embodiment of this application.
- FIG. 2 is a schematic structural diagram of a first terminal 100 according to an embodiment of the application.
- FIG. 3 is a schematic diagram of an implementation process of the image processing method provided by an embodiment of the application.
- FIG. 4 is a schematic diagram of the implementation process of obtaining a lightweight model provided by an embodiment of the application.
- FIG. 5 is a schematic diagram of still another implementation process of the image processing method provided by an embodiment of the application.
- FIG. 6 is a schematic diagram of an implementation flow of an image processing method provided by an embodiment of the application.
- FIG. 7A is a schematic diagram of an implementation process of constructing a data set according to an embodiment of the application.
- FIG. 7B is a schematic diagram of an implementation process of extracting low-resolution image features according to an embodiment of the application.
- FIG. 8A is a schematic diagram of the implementation process of a deep learning model and its training according to an embodiment of the application;
- FIG. 8B is a schematic diagram of the implementation process of the super-resolution network structure and network usage method provided by an embodiment of this application;
- FIG. 8C is a schematic diagram of a network structure of a discriminator provided by an embodiment of the application.
- FIG. 8D is a schematic diagram of the implementation process of constructing a generation objective function provided by an embodiment of the application.
- FIG. 8E is a schematic diagram of the implementation process of constructing a discrimination objective function provided by an embodiment of the application.
- FIG. 8F is a schematic diagram of a model training implementation process provided by an embodiment of the application.
- FIG. 9 is a schematic diagram of the implementation process of model conversion according to an embodiment of the application.
- FIG. 10 is a schematic diagram of the implementation process of real-time inference in an embodiment of the application.
- FIG. 11 is a schematic diagram of an implementation process of performing super-resolution processing on a color image according to an embodiment of the application.
- FIG. 12 is a schematic diagram of the implementation process of super-resolution processing on a video provided by an embodiment of the application.
- FIG. 13 is a schematic diagram of the composition structure of an image processing device provided by an embodiment of the application.
- The terms "first/second/third" are used only to distinguish similar objects and do not represent a specific order of the objects. Where permitted, the specific order or sequence of "first/second/third" can be interchanged, so that the embodiments of the present application described herein can be implemented in a sequence other than that illustrated or described herein.
- Image processing: the processing of images, that is, processing that maps a pixel map to a pixel map, such as super-resolution, image denoising, and enhancement.
- Super Resolution (SR) algorithm: an algorithm that can improve image resolution, referred to as a super-resolution algorithm for short; it is a kind of image processing method.
- Super-resolution algorithms can be divided into two types: multi-frame super-resolution and single-frame super-resolution.
- Single-frame super-resolution processes one picture to obtain the super-resolution image corresponding to that picture; multi-frame super-resolution processes multiple pictures to obtain the super-resolution image corresponding to those pictures.
- The focus of this patent is the single-frame super-resolution algorithm.
- Methods based on deep learning give the best results (clearly better than traditional methods).
- Central Processing Unit (CPU): the computing and control core of a computer system and the final execution unit for information processing and program execution; it can be used in various computing scenarios.
- Graphics Processing Unit (GPU): also known as a display core, visual processor, or display chip; a microprocessor specialized in image and graphics operations on personal computers, workstations, game consoles, and some mobile devices (such as tablet computers and smartphones). A GPU has strong computing power, often far exceeding that of a CPU, so it is widely used for deep learning model inference. Because GPU resources are scarce, there is a delay in deployment.
- Deep Learning: machine learning that uses neural networks.
- Model conversion algorithm: an algorithm that converts the model type, for example converting a deep learning network into a decision tree model or a subspace model.
- A model conversion algorithm can convert a complex model into a simple one, greatly improving its calculation speed (the disadvantage is that accuracy may decrease).
- Convolution kernel: in image processing, given an input image, each pixel of the output image is a weighted average of the pixels in a small area of the input image, where the weights are defined by a function; this function is called the convolution kernel.
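The weighted-average definition above can be illustrated with a minimal sketch. It assumes NumPy, a square odd-sized kernel, and reflective padding at the borders; the function name `apply_kernel_at` is hypothetical and does not come from the application.

```python
import numpy as np

def apply_kernel_at(image, kernel, row, col):
    """Weighted sum of the K x K area of `image` centred on (row, col)."""
    k = kernel.shape[0]
    r = k // 2
    # Reflective padding is an assumption so that border pixels have neighbours.
    padded = np.pad(image, r, mode="reflect")
    patch = padded[row:row + k, col:col + k]
    return float(np.sum(patch * kernel))

image = np.arange(25, dtype=np.float64).reshape(5, 5)
box = np.full((3, 3), 1.0 / 9.0)   # uniform weights: a plain average
center_value = apply_kernel_at(image, box, 2, 2)
```

With uniform weights the output pixel is the mean of its 3 x 3 area; other kernels only change the weights.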
- Objective function: also known as the loss function or cost function; a function that maps the value of a random event or its related random variables to a non-negative real number to express the "risk" or "loss" of that random event.
- The objective function is usually associated with an optimization problem as the learning criterion, that is, the model is solved and evaluated by minimizing the objective function.
- For example, it is used for parameter estimation of models in statistics and machine learning, and is the optimization target of a machine learning model.
- Color gamut: also known as color space; represents the range of colors that a color image can display.
- Common color gamuts currently include the luminance-chrominance (YUV) color gamut, the red-green-blue (RGB) color gamut, and the cyan-magenta-yellow-black (CMYK) color gamut.
- Image processing methods for improving resolution include at least the following two:
- Step S001: first enlarge the image to the target size;
- Step S002: calculate the gradient features of each pixel on the enlarged image;
- Step S003: each pixel uses its gradient features to index the filter (convolution kernel) to be used;
- Step S004: each pixel is convolved with its indexed filter to obtain the super-resolved pixel.
- RAISR uses 3 features calculated from gradients and divides the feature space into many small cells by dividing each feature into several intervals.
- Within each cell, the target value can be fitted directly with the least squares method to obtain the convolution kernel parameters.
- That is, least squares is used to fit the image block to the target pixel (the high-resolution pixel), which accomplishes the training of the model.
- Compared with deep learning methods, the RAISR method gives slightly lower quality, but its calculation speed is much higher (the RAISR paper reports a speed more than 100 times that of deep learning).
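The least-squares fitting described above can be sketched for a single feature cell as follows. This is an illustration under assumptions, not RAISR's reference implementation: it uses NumPy, synthetic data, and a hypothetical helper `fit_kernel`, and it assumes the patches that fall into the cell have already been collected.

```python
import numpy as np

def fit_kernel(patches, targets):
    """Fit one K x K kernel for one feature cell by least squares.

    patches: (N, K, K) low-resolution image blocks assigned to this cell.
    targets: (N,) high-resolution target pixel values.
    """
    n, k, _ = patches.shape
    a = patches.reshape(n, k * k)            # one flattened block per row
    w, *_ = np.linalg.lstsq(a, targets, rcond=None)
    return w.reshape(k, k)

# Synthetic check: targets generated by a known kernel are recovered.
rng = np.random.default_rng(0)
true_kernel = rng.normal(size=(3, 3))
patches = rng.normal(size=(200, 3, 3))
targets = (patches * true_kernel).sum(axis=(1, 2))
fitted = fit_kernel(patches, targets)
```

Training in this scheme amounts to repeating the fit once per cell, which is why it is much cheaper than training a deep network.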
- Super-Resolution Generative Adversarial Network (SRGAN)
- SRGAN is a super-resolution technique based on generative adversarial networks. In general, it uses the properties of a generative adversarial network to train two networks at the same time: a generator network that constructs more realistic high-resolution images, and a discriminator network that judges whether an input high-resolution image was constructed by an algorithm. The two networks are trained with two objective functions. Through continuous alternating training, both networks become increasingly capable. Finally, the generator network is taken out and used for inference.
- The disadvantage of the SRGAN algorithm is that the network must be deep enough, so the network structure is very complex and it is difficult to run in real time as RAISR does.
- The embodiments of the present application propose a method that combines a deep learning solution for image processing with a matching model acceleration (model conversion) step.
- A neural network structure is used during training to ensure that the output pixels are coherent when various special losses are used and that no additional noise is introduced; through model conversion, the model is then simplified into a lightweight model (such as a subspace model or a decision tree) so that it can run in real time.
- The image processing device provided by the embodiments of the application can be implemented as any terminal with a screen display function, such as a notebook computer, a tablet computer, a desktop computer, a mobile device (for example, a mobile phone, a portable music player, a personal digital assistant, a dedicated messaging device, or a portable game device), a smart TV, or a smart robot; it can also be implemented as a server.
- The server can be an independent physical server, or a server cluster or distributed system composed of multiple physical servers; it can also be a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms.
- The first terminal 100 may request a video or a picture from the server 200 (in this embodiment, the picture 101 is taken as an example for description).
- The image processing method provided by the embodiments of this application can be integrated into the gallery App of the terminal as a functional plug-in. If the first terminal 100 starts the image processing function, it can use the image processing method provided by the embodiments of this application to process the picture 101 obtained from the server 200 in real time, obtain the processed picture 102, and present it on the display interface of the first terminal 100.
- In FIG. 1A, super-resolution processing of the image is taken as an example. Comparing 101 and 102 in FIG. 1A shows that the processed picture 102 has a higher resolution, so the user's picture quality experience is improved while the bit rate remains unchanged.
- FIG. 1B is a schematic diagram of another network architecture of the image processing system provided by an embodiment of the application.
- the image processing system includes a first terminal 400, a second terminal 700, a server 500, and a network 600.
- the first terminal 400 is connected to the server 500 through the network 600.
- the first terminal 400 may be a smart terminal.
- Various application programs may be installed on the smart terminal, such as a video viewing application.
- The network 600 may be a wide area network, a local area network, or a combination of the two, and uses wireless links to implement data transmission.
- The second terminal 700 can also be any terminal with a screen display function, such as a notebook computer, a tablet computer, a desktop computer, a mobile device (for example, a mobile phone, a portable music player, a personal digital assistant, a dedicated messaging device, or a portable game device), a smart TV, or a smart robot.
- The second terminal 700 may upload a picture or video file to the server 500.
- The server 500 may process the picture or video according to the image processing method provided in the embodiments of the present application to obtain the processed picture or video.
- The server 500 may return the processed picture or video to the first terminal 400, and the first terminal 400 displays the processed picture or video on its own display interface to improve the user's picture quality experience.
- In FIG. 1B, image denoising is taken as an example.
- The image 201 in FIG. 1B is the original image, and the image 202 in FIG. 1B is the processed image.
- A comparison of the image 201 and the image 202 shows that the processed image has almost no noise points, thereby improving the user's picture quality experience.
- FIG. 2 is a schematic structural diagram of a first terminal 100 according to an embodiment of the application.
- The first terminal 100 shown in FIG. 2 includes at least one processor 110, a memory 150, at least one network interface 120, and a user interface 130.
- the various components in the first terminal 100 are coupled together through the bus system 140.
- the bus system 140 is used to implement connection and communication between these components.
- the bus system 140 also includes a power bus, a control bus, and a status signal bus.
- various buses are marked as the bus system 140 in FIG. 2.
- The processor 110 may be an integrated circuit chip with signal processing capabilities, such as a general-purpose processor, a Digital Signal Processor (DSP), another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, where the general-purpose processor may be a microprocessor or any conventional processor.
- the user interface 130 includes one or more output devices 131 that enable the presentation of media content, including one or more speakers and/or one or more visual display screens.
- the user interface 130 also includes one or more input devices 132, including user interface components that facilitate user input, such as a keyboard, a mouse, a microphone, a touch screen display, a camera, and other input buttons and controls.
- the memory 150 may be removable, non-removable, or a combination thereof.
- Exemplary hardware devices include solid-state storage, hard disk drives, optical disk drives, and the like.
- the memory 150 may include one or more storage devices that are physically remote from the processor 110.
- the memory 150 includes volatile memory or non-volatile memory, and may also include both volatile and non-volatile memory.
- the non-volatile memory may be a read only memory (ROM, Read Only Memory), and the volatile memory may be a random access memory (RAM, Random Access Memory).
- the memory 150 described in the embodiment of the present application is intended to include any suitable type of memory.
- the operating system 151 includes system programs for processing various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and processing hardware-based tasks;
- the network communication module 152 is used to reach other computing devices via one or more (wired or wireless) network interfaces 120.
- Exemplary network interfaces 120 include Bluetooth, Wireless Fidelity (WiFi), and Universal Serial Bus (USB);
- the input processing module 153 is configured to detect one or more user inputs or interactions from one or more of the one or more input devices 132 and translate the detected inputs or interactions.
- FIG. 2 shows an image processing apparatus 154 stored in the memory 150.
- The image processing apparatus 154 may be an image processing apparatus in the first terminal 100. It can be software in the form of programs and plug-ins, and includes the following software modules: a first acquisition module 1541, a first extraction module 1542, a first processing module 1543, and an output module 1544. These modules are logical, so they can be combined or split arbitrarily according to the functions they realize. The function of each module will be explained below.
- The apparatus provided in the embodiments of the application may also be implemented in hardware.
- For example, the apparatus may be a processor in the form of a hardware decoding processor that is programmed to execute the image processing method provided by the embodiments of the application. Such a processor may adopt one or more Application Specific Integrated Circuits (ASIC), DSPs, Programmable Logic Devices (PLD), Complex Programmable Logic Devices (CPLD), Field-Programmable Gate Arrays (FPGA), or other electronic components.
- Artificial Intelligence (AI) uses digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
- In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence.
- Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making.
- Artificial intelligence technology is a comprehensive discipline, covering a wide range of fields, including both hardware-level technology and software-level technology.
- Basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, and mechatronics.
- Artificial intelligence software technology mainly includes computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning. Each direction will be explained separately below.
- Machine Learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other subjects. It specializes in studying how computers simulate or realize human learning behaviors in order to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve their own performance.
- Machine learning is the core of artificial intelligence, the fundamental way to make computers intelligent, and its applications cover all fields of artificial intelligence.
- Machine learning and deep learning usually include artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning and other technologies.
- AI as a Service (AIaaS)
- An AIaaS platform splits several common types of AI services and provides them in the cloud, either independently or as packages.
- This service model is similar to opening an AI-themed mall: all developers can access one or more artificial intelligence services provided by the platform through API interfaces, and some senior developers can also use the AI frameworks and AI infrastructure provided by the platform to deploy and operate their own dedicated cloud artificial intelligence services.
- FIG. 3 is a schematic diagram of an implementation flow of the image processing method provided by an embodiment of the application, and will be described with reference to the steps shown in FIG. 3.
- Step S101: obtain an image to be processed.
- the image to be processed can be a grayscale image or a multi-channel color image.
- the image to be processed may be a video frame image obtained by decoding a video file.
- the image to be processed may be obtained from the server.
- the image to be processed may also be an image collected by the first terminal.
- the image to be processed may be uploaded to the server by the second terminal.
- After the image to be processed is obtained in step S101, the following can also be executed: determine whether the image to be processed is a grayscale image. When the image to be processed is a grayscale image, proceed to step S102; when the image to be processed is a color image, its color gamut needs to be converted before the image processing is performed.
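The grayscale check and color gamut conversion can be sketched as below. The passage above does not say which target color gamut is used or which channels are then processed, so the choice of YUV with BT.601 coefficients, the decision to hand only the luminance plane to the next step, and the helper names are all assumptions made for illustration.

```python
import numpy as np

# BT.601 RGB-to-YUV coefficients. Using YUV here is an assumption; the
# text above only states that the color gamut is converted.
RGB_TO_YUV = np.array([
    [0.299, 0.587, 0.114],
    [-0.14713, -0.28886, 0.436],
    [0.615, -0.51499, -0.10001],
])

def is_grayscale(image):
    """Treat a 2-D array (H, W) as grayscale and (H, W, 3) as RGB color."""
    return image.ndim == 2

def rgb_to_yuv(image):
    """Convert an (H, W, 3) RGB array with values in [0, 1] to YUV."""
    return image @ RGB_TO_YUV.T

def prepare_for_processing(image):
    """Hypothetical helper: return the single-channel plane for step S102.

    Grayscale input is passed through; for color input the luminance (Y)
    plane is returned, which is an assumption of this sketch.
    """
    if is_grayscale(image):
        return image
    return rgb_to_yuv(image)[..., 0]

white = np.ones((2, 2, 3))
luma = prepare_for_processing(white)   # luminance of a white image
```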
- Step S102: when the image to be processed is a grayscale image, extract the feature vector of each pixel in the image to be processed, and determine the neighborhood image block corresponding to each pixel.
- When step S102 is implemented, the first-direction gradient value and the second-direction gradient value of each pixel can be determined from the pixel values in the image to be processed, and the feature vector of each pixel is then determined from its first-direction and second-direction gradient values.
- the neighborhood image block may be a K*K image block centered on each pixel, where K is an odd number, for example, K may be 5, 7, 9, 13, and so on.
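A K*K neighborhood image block for every pixel can be gathered in one step, as in the following sketch. It assumes NumPy 1.20 or later for `sliding_window_view`; the reflective border padding and the function name `neighborhood_blocks` are assumptions, since the text above does not state how blocks are formed at the image border.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def neighborhood_blocks(image, k):
    """Return an (H, W, K, K) view: one K x K block centred on each pixel."""
    if k % 2 == 0:
        raise ValueError("K must be odd so that the block has a center pixel")
    r = k // 2
    # Border handling is an assumption of this sketch.
    padded = np.pad(image, r, mode="reflect")
    return sliding_window_view(padded, (k, k))

image = np.arange(24, dtype=np.float64).reshape(4, 6)
blocks = neighborhood_blocks(image, 5)
```

The center element of each block, `blocks[i, j, 2, 2]` for K = 5, is the pixel `image[i, j]` itself.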
- Step S103: use the lightweight model to process the feature vector and the neighborhood image block of each pixel to obtain a processed target image.
- The lightweight model is obtained by performing lightweight processing on the trained neural network model.
- Subspace division can be performed, or a decision tree generated, based on the trained neural network model to obtain the lightweight model.
- The lightweight model is simpler than the neural network model. Therefore, when the lightweight model is used to process the feature vector and the neighborhood image block of each pixel, calculation efficiency is improved and processing time is shortened compared with the neural network model, so that real-time processing can be achieved.
- When step S103 is implemented, the subspace corresponding to each pixel, or the leaf node of the decision tree corresponding to each pixel, can be determined from the feature vector of that pixel; the convolution kernel corresponding to that subspace or leaf node is then determined, the convolution kernel is convolved with the neighborhood image block to obtain the processed pixel value of each pixel, and the target image is determined from the processed pixel values of all pixels.
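The lookup-then-convolve inference of step S103 can be sketched for the subspace case as follows. How the feature space is divided is not specified in this passage, so the uniform quantization in `subspace_index`, the assumption that features lie in [0, 1), and the one-output-pixel-per-input-pixel layout are illustrative only; all names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def subspace_index(features, bins_per_dim):
    """Hypothetical subspace lookup: quantize each feature dimension.

    features: (H, W, D) array, assumed to lie in [0, 1).
    Returns an (H, W) array of flat subspace indices.
    """
    q = np.clip((features * bins_per_dim).astype(int), 0, bins_per_dim - 1)
    dims = (bins_per_dim,) * features.shape[-1]
    return np.ravel_multi_index(tuple(np.moveaxis(q, -1, 0)), dims)

def run_lightweight_model(image, features, kernels, bins_per_dim):
    """Apply, at each pixel, the kernel stored for that pixel's subspace.

    kernels: (num_subspaces, K, K), one convolution kernel per subspace.
    """
    k = kernels.shape[-1]
    r = k // 2
    padded = np.pad(image, r, mode="reflect")        # border handling assumed
    blocks = sliding_window_view(padded, (k, k))     # (H, W, K, K)
    per_pixel = kernels[subspace_index(features, bins_per_dim)]
    return np.einsum("hwij,hwij->hw", blocks, per_pixel)

rng = np.random.default_rng(0)
image = rng.random((6, 8))
features = rng.random((6, 8, 2))
identity = np.zeros((3, 3))
identity[1, 1] = 1.0
kernels = np.tile(identity, (9, 1, 1))               # 3 bins x 2 dims = 9 subspaces
output = run_lightweight_model(image, features, kernels, bins_per_dim=3)
```

Each pixel costs one table lookup and one K*K multiply-add, which is what makes real-time operation plausible compared with a deep network.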
- Step S104: output the target image.
- When step S104 is implemented by the first terminal shown in FIG. 1A, the target image may be presented on the display device of the first terminal.
- When step S104 is implemented by the server shown in FIG. 1B, the target image may be sent to the first terminal.
- When step S104 is implemented by the server shown in FIG. 1B, the following can also be executed after step S104: the server stores the target image in its local storage space.
- In the image processing method provided by the embodiments of the present application, after the image to be processed is acquired, the neighborhood image block corresponding to each pixel in the image to be processed is determined; when the image to be processed is a grayscale image, the feature vector of each pixel is extracted; and the lightweight model is used to process the feature vector and the neighborhood image block of each pixel to obtain the processed target image, where the lightweight model is obtained by performing lightweight processing on a trained neural network model. Because a neural network structure is used during training, a pixel-coherent target image can be output even when various special losses are used; and because the lightweight model obtained through model conversion (such as a subspace model or a decision tree) is used for image processing, it can run and output the target image in real time, improving image processing efficiency while preserving the processing effect.
- step S102 "extracting the feature vector of each pixel in the image to be processed" can be implemented through the following steps:
- Step S1021 Determine the first direction gradient map and the second direction gradient map corresponding to the image to be processed.
- the first direction may be a horizontal direction
- the second direction may be a vertical direction.
- when step S1021 is implemented, for each pixel in the image to be processed, the pixel value of the left neighboring pixel is subtracted from the pixel value of the right neighboring pixel and the difference is divided by 2 to obtain the gradient value of the pixel in the first direction; the first direction gradient map corresponding to the image to be processed is determined based on the first-direction gradient values of all pixels. Likewise, the pixel value of the upper neighboring pixel is subtracted from the pixel value of the lower neighboring pixel and the difference is divided by 2 to obtain the gradient value of the pixel in the second direction, from which the second direction gradient map is determined.
- for edge pixels in the image to be processed, edge-symmetric flipping can be used to calculate their gradient values.
- in this way, the vertical gradient values of the pixels on the upper and lower edges of the image to be processed are all 0, and the horizontal gradient values of the pixels on the left and right edges are all 0.
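The central-difference gradients and the edge handling described for step S1021 can be sketched as follows. This is a minimal illustration, assuming NumPy and assuming that "edge-symmetric flipping" corresponds to `np.pad` with `mode="reflect"` (mirroring about the border pixel without repeating it), which is the variant that yields zero gradients across the borders; the function name `gradient_maps` is hypothetical.

```python
import numpy as np

def gradient_maps(image):
    """Return (dx, dy) central-difference gradient maps of a 2-D grayscale image."""
    img = np.asarray(image, dtype=np.float64)
    # Mirror the image about its border pixels (the border pixel is not repeated).
    padded = np.pad(img, 1, mode="reflect")
    # First direction (horizontal): (right neighbor - left neighbor) / 2.
    dx = (padded[1:-1, 2:] - padded[1:-1, :-2]) / 2.0
    # Second direction (vertical): (lower neighbor - upper neighbor) / 2.
    dy = (padded[2:, 1:-1] - padded[:-2, 1:-1]) / 2.0
    return dx, dy

image = np.array([[1, 2, 4],
                  [3, 5, 9],
                  [7, 8, 6]], dtype=float)
dx, dy = gradient_maps(image)
```

With this padding, the left and right columns of `dx` and the top and bottom rows of `dy` come out as zero, matching the edge behavior stated above.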
- Step S1022 Determine the first gradient neighborhood block in the first direction gradient map and the second gradient neighborhood block in the second direction gradient map of each pixel in the image to be processed.
- the first gradient neighborhood block and the second gradient neighborhood block have the same size, and both have the same size as the neighborhood image block of each pixel in the image to be processed.
- Step S1023 Determine the feature vector of each pixel based on the first gradient neighborhood block and the second gradient neighborhood block of each pixel.
- step S1023 can be implemented through the following steps:
- Step S231 Determine the covariance matrix of each pixel based on the first gradient neighborhood block and the second gradient neighborhood block of each pixel.
- the covariance matrix A of pixel i can be obtained by formula (1-1):
- Step S232 Determine each first eigenvalue and each second eigenvalue corresponding to each covariance matrix.
- the first eigenvalue $\lambda_1$ and the second eigenvalue $\lambda_2$ of the covariance matrix A can be calculated according to formula (1-2) and formula (1-3), where:
- $a = \sum x_i x_i$
- $b = \sum x_i y_i$
- $c = \sum y_i y_i$.
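Formulas (1-1) to (1-3) are not reproduced in this text. As a hedged reconstruction, assuming that A is the symmetric 2×2 matrix built from the sums a, b and c above, where $x_i$ and $y_i$ are the values in the first and second gradient neighborhood blocks, the standard closed form for its eigenvalues is:

```latex
A = \begin{pmatrix} a & b \\ b & c \end{pmatrix},
\qquad
\lambda_{1} = \frac{(a + c) + \sqrt{(a - c)^2 + 4b^2}}{2},
\qquad
\lambda_{2} = \frac{(a + c) - \sqrt{(a - c)^2 + 4b^2}}{2}
```

The exact form and normalization used in the original formulas may differ.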
- Step S233 Determine each variance value corresponding to the neighboring image block of each pixel.
- Step S234 Determine the feature vector of each pixel based on each first feature value, each second feature value, and each variance value.
- the feature vector of each pixel point may be 4-dimensional.
- the fourth-dimensional feature $f_4 = v$, where v is the variance value determined in step S233.
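The quantities used in steps S231 to S233 can be computed per pixel as in the sketch below. It assumes NumPy, assumes the covariance matrix is the symmetric 2×2 matrix formed from the sums a, b and c, and uses the standard closed-form eigenvalues of such a matrix; the function name `pixel_statistics` and the toy blocks are hypothetical. How the eigenvalues are combined into the first three feature dimensions is not recoverable from this text, so the sketch stops at the eigenvalues and the variance.

```python
import numpy as np

def pixel_statistics(x_block, y_block, image_block):
    """Return (lambda1, lambda2, variance) for one pixel's neighborhood blocks."""
    x = np.asarray(x_block, dtype=np.float64).ravel()
    y = np.asarray(y_block, dtype=np.float64).ravel()
    a = np.sum(x * x)
    b = np.sum(x * y)
    c = np.sum(y * y)
    # Closed-form eigenvalues of the symmetric matrix [[a, b], [b, c]].
    root = np.sqrt((a - c) ** 2 + 4.0 * b ** 2)
    lam1 = (a + c + root) / 2.0
    lam2 = (a + c - root) / 2.0
    # Variance of the pixel's neighborhood image block (step S233).
    v = float(np.var(np.asarray(image_block, dtype=np.float64)))
    return float(lam1), float(lam2), v

# Toy 2x2 blocks, purely illustrative.
x_block = np.array([[1.0, 0.0], [2.0, 1.0]])
y_block = np.array([[0.0, 1.0], [1.0, 1.0]])
image_block = np.array([[1.0, 2.0], [3.0, 4.0]])
lam1, lam2, v = pixel_statistics(x_block, y_block, image_block)
```

The closed form avoids a general eigen-decomposition per pixel, which matters when the computation is repeated for every pixel of a frame.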
- in some embodiments, the first direction gradient value and the second direction gradient value of each pixel can be used directly as the feature vector of that pixel.
- in some embodiments, other feature extraction algorithms can also be used to extract the feature vector of each pixel in the image to be processed.
- whichever method is used, the dimension of the obtained feature vector cannot be too large, so as to avoid an excessive number of lightweight models after the model conversion, which would in turn make the computational complexity too high.
- the preset neural network model needs to be trained through the following steps to obtain a trained neural network model:
- Step S001 Obtain training data and a preset neural network model.
- the training data includes at least a first training image and a second training image, where the second training image is obtained by down-sampling the first training image; that is to say, the resolution of the second training image is lower than the resolution of the first training image.
- the first training image and the second training image are both grayscale images.
- the training data may also include the feature vector of each pixel in the second training image.
- the preset neural network model may be a deep learning neural network model, and the neural network model may include a generative model and a discriminant model.
- Step S002 Use the neural network model to process the second training image to obtain a predicted image.
- when the training data includes the feature vector of each pixel in the second training image, step S002 may be implemented by inputting the feature vector of each pixel in the second training image to the neural network model to obtain the predicted image.
- when the training data includes only the first training image and the second training image, step S002 may be implemented by inputting the second training image to the neural network model to obtain the predicted image.
- Step S003 Perform back propagation training on the neural network model based on the predicted image, the first training image and the preset objective function to obtain a trained neural network model.
- the preset objective function includes a generative objective function and a discriminant objective function.
- this step S003 can be implemented through the following steps:
- Step S31 Fix the discriminant parameters of the discriminant model, and perform back propagation training on the generative model based on the predicted image, the first training image and the generative objective function to adjust the generative parameters of the generative model.
- Step S32 Fix the generative parameters of the generative model, and perform back propagation training on the discriminant model based on the predicted image, the first training image and the discriminant objective function to adjust the discriminant parameters of the discriminant model, until the preset training completion condition is reached and a trained neural network model is obtained.
- the preset training completion condition may be that the number of training iterations reaches a preset threshold, or that the difference between the predicted image and the first training image is lower than a preset difference threshold.
- the generative objective function can be constructed through the following steps:
- Step S41a Determine the pixel-level error value and the content error value between the predicted image and the first training image.
- when determining the pixel-level error value between the predicted image and the first training image, the error value between each pair of corresponding pixels in the predicted image and the first training image can be determined first, and the pixel-level error value between the two images is then determined from these per-pixel error values; the pixel-level error value can be the average error, the mean square error (MSE, Mean Square Error), the absolute error, etc. computed from the per-pixel error values.
- when determining the content error value, the predicted image and the first training image can each be input to a content feature module to obtain a predicted content feature vector and a training content feature vector, where the content feature module is a pre-trained module; for example, the first several layers of VGG19 may be used (the first 17 layers are recommended). The content error value is then calculated from the predicted content feature vector and the training content feature vector; it can be the average error between the two, or take the form of the mean square error or absolute error between them.
- Step S42a Determine the first pixel discrimination error value and the first global discrimination error value of the prediction image based on the prediction image and the discrimination model.
- the predicted image can first be input to the discriminant model to obtain a predicted pixel discriminant matrix and a predicted global discriminant value, where the size of the predicted pixel discriminant matrix is consistent with the size of the predicted image, and the predicted global discriminant value is a single value indicating the probability that the predicted image was constructed by the generator (a real number between 0 and 1);
- the first pixel discrimination error value is then determined based on the predicted pixel discriminant matrix and the "no" value (that is, 0), and the first global discrimination error value is determined based on the predicted global discriminant value and the "no" value.
- the first pixel discrimination error value can be obtained by calculating the average error, or the mean square error, between the predicted pixel discriminant matrix and the "no" value; similarly, the first global discrimination error value can be obtained by calculating the average error, or the mean square error, between the predicted global discriminant value and the "no" value.
- Step S43a Determine the generative objective function based on the preset generation weight values, the pixel-level error value, the content error value, the first pixel discrimination error value and the first global discrimination error value.
- the preset generation weight values include a first weight value corresponding to the pixel-level error value, a second weight value corresponding to the content error value, a third weight value corresponding to the first pixel discrimination error value, and a fourth weight value corresponding to the first global discrimination error value.
- when step S43a is implemented, the pixel-level error value, the content error value, the first pixel discrimination error value and the first global discrimination error value are weighted by their corresponding weight values and summed to obtain the generative objective function.
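The weighted sum in step S43a can be illustrated with a small numeric sketch. It assumes NumPy and picks the mean square error for all four terms, which is only one of the options the text allows; the content features and discriminator outputs are hard-coded stand-ins rather than outputs of real networks, and every number and weight below is illustrative.

```python
import numpy as np

def mean_square_error(a, b):
    """Mean squared difference between two arrays (or an array and a scalar)."""
    diff = np.asarray(a, dtype=np.float64) - np.asarray(b, dtype=np.float64)
    return float(np.mean(diff ** 2))

def generative_objective(errors, weights):
    """Weighted sum of the four error values, in the order listed in step S43a."""
    return float(np.dot(weights, errors))

NO_VALUE = 0.0  # the "no" label used for the generator's discrimination errors

predicted = np.array([[0.5, 0.5], [0.5, 0.5]])
first_training = np.array([[0.5, 0.5], [0.5, 1.5]])
predicted_content = np.array([1.0, 2.0])   # stand-in for content-module output
training_content = np.array([1.0, 4.0])    # stand-in for content-module output
pixel_disc_matrix = np.full((2, 2), 0.5)   # stand-in for discriminator output
global_disc_value = 0.5                    # stand-in for discriminator output

errors = [
    mean_square_error(predicted, first_training),            # pixel-level error
    mean_square_error(predicted_content, training_content),  # content error
    mean_square_error(pixel_disc_matrix, NO_VALUE),          # first pixel discrimination error
    mean_square_error(global_disc_value, NO_VALUE),          # first global discrimination error
]
weights = [1.0, 0.5, 0.25, 0.25]  # hypothetical first to fourth weight values
loss = generative_objective(errors, weights)
```

In a real training loop the same sum would be computed with a framework that supports back propagation, so that gradients flow to the generative parameters.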
- the discriminant objective function can be constructed through the following steps:
- Step S41b Determine the second pixel discrimination error value and the second global discrimination error value of the prediction image based on the prediction image and the discrimination model.
- when step S41b is implemented, the predicted image is first input to the discriminant model to obtain the predicted pixel discriminant matrix and the predicted global discriminant value; the second pixel discrimination error value is then determined based on the predicted pixel discriminant matrix and the "yes" value (that is, 1), and the second global discrimination error value is determined based on the predicted global discriminant value and the "yes" value.
- the second pixel discrimination error value can be obtained by calculating the average error, or the mean square error, between the predicted pixel discriminant matrix and the "yes" value; similarly, the second global discrimination error value can be obtained by calculating the average error, or the mean square error, between the predicted global discriminant value and the "yes" value.
- Step S42b Determine a third pixel discrimination error value and a third global discrimination error value of the first training image based on the first training image and the discrimination model.
- when step S42b is implemented, the first training image is first input to the discriminant model to obtain the training pixel discriminant matrix and the training global discriminant value; the third pixel discrimination error value is then determined based on the training pixel discriminant matrix and the "no" value (that is, 0), and the third global discrimination error value is determined based on the training global discriminant value and the "no" value.
- the third pixel discrimination error value can be obtained by calculating the average error, or the mean square error, between the training pixel discriminant matrix and the "no" value; similarly, the third global discrimination error value can be obtained by calculating the average error, or the mean square error, between the training global discriminant value and the "no" value.
- Step S43b Determine the discrimination objective function based on the preset discrimination weight value, the second pixel discrimination error value, the second global discrimination error value, the third pixel discrimination error value and the third global discrimination error value.
- the preset discrimination weight values include a fifth weight value corresponding to the second pixel discrimination error value, a sixth weight value corresponding to the second global discrimination error value, a seventh weight value corresponding to the third pixel discrimination error value, and an eighth weight value corresponding to the third global discrimination error value.
- when step S43b is implemented, the second pixel discrimination error value, the second global discrimination error value, the third pixel discrimination error value and the third global discrimination error value are weighted by their corresponding weight values and summed to obtain the discriminant objective function.
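Steps S41b to S43b can be sketched in the same spirit. The snippet assumes NumPy and the mean square error variant; discriminator outputs are passed in as plain arrays (stand-ins for a real discriminant model), and the weights are hypothetical. Outputs for the predicted image are compared with the "yes" value 1, and outputs for the first training image with the "no" value 0.

```python
import numpy as np

YES_VALUE, NO_VALUE = 1.0, 0.0

def mse_to_label(outputs, label):
    """Mean squared difference between discriminator outputs and a target label."""
    return float(np.mean((np.asarray(outputs, dtype=np.float64) - label) ** 2))

def discriminant_objective(pred_pixel, pred_global, train_pixel, train_global, weights):
    """Weighted sum of the second and third pixel/global discrimination errors."""
    errors = [
        mse_to_label(pred_pixel, YES_VALUE),    # second pixel discrimination error
        mse_to_label(pred_global, YES_VALUE),   # second global discrimination error
        mse_to_label(train_pixel, NO_VALUE),    # third pixel discrimination error
        mse_to_label(train_global, NO_VALUE),   # third global discrimination error
    ]
    return float(np.dot(weights, errors))

weights = [1.0, 1.0, 1.0, 1.0]  # hypothetical fifth to eighth weight values
uncertain = discriminant_objective(np.full((2, 2), 0.5), 0.5,
                                   np.full((2, 2), 0.5), 0.0, weights)
perfect = discriminant_objective(np.ones((2, 2)), 1.0,
                                 np.zeros((2, 2)), 0.0, weights)
```

A discriminator that separates the two kinds of images perfectly drives this objective to zero, while uncertain outputs leave a positive residual.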
- the lightweight model can be obtained through step S51a to step S54a as shown in FIG. 4:
- Step S51a Determine the feature space based on the feature vector corresponding to each pixel in the image to be processed.
- the feature space may be determined based on the maximum value and the minimum value of each dimension of the feature vectors of all pixels.
- Step S52a Divide the feature space into N feature subspaces according to a preset division rule, and determine the N center coordinates corresponding to the N feature subspaces.
- when step S52a is implemented, each dimension of the feature vector can be divided to obtain the feature subspaces, and the corresponding center coordinates are determined based on the maximum and minimum values of each dimension in each feature subspace.
- the median of the maximum value and the minimum value (that is, their midpoint) of each dimension in each feature subspace may be determined as the center coordinate corresponding to the feature subspace.
- Step S53a Input the N center coordinates to the trained neural network model to obtain N convolution kernels corresponding to the N feature subspaces.
- Step S54a Determine N feature subspaces and N convolution kernels as a lightweight model.
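Steps S51a to S54a can be sketched as below. This is an illustrative sketch, assuming NumPy and assuming that the "preset division rule" splits every feature dimension into the same number of equal-width intervals; the names `build_subspace_model` and `lookup_subspace` are hypothetical, and `kernel_fn` is a stand-in for evaluating the trained neural network model at a center coordinate.

```python
import numpy as np

def build_subspace_model(features, bins_per_dim, kernel_fn):
    """Divide the feature space into a regular grid and compute one kernel per cell."""
    features = np.asarray(features, dtype=np.float64)
    lo, hi = features.min(axis=0), features.max(axis=0)
    # Interval edges per dimension, spanning [min, max] of that dimension.
    edges = [np.linspace(lo[d], hi[d], bins_per_dim + 1)
             for d in range(features.shape[1])]
    # Center coordinate of a cell = midpoint of its interval in every dimension.
    mids = [(e[:-1] + e[1:]) / 2.0 for e in edges]
    grid = np.meshgrid(*mids, indexing="ij")
    centers = np.stack([g.ravel() for g in grid], axis=1)   # shape (N, dims)
    kernels = np.stack([kernel_fn(c) for c in centers])     # one kernel per subspace
    return edges, centers, kernels

def lookup_subspace(edges, bins_per_dim, feature):
    """Return the flat index of the feature subspace that `feature` falls into."""
    index = 0
    for d, e in enumerate(edges):
        k = int(np.searchsorted(e, feature[d], side="right")) - 1
        k = min(max(k, 0), bins_per_dim - 1)   # clamp values on or outside the border
        index = index * bins_per_dim + k
    return index

def toy_kernel(center):
    # Hypothetical stand-in for the trained network: a constant 3x3 kernel.
    return np.full((3, 3), center.sum())

features = np.array([[0.0, 0.0], [4.0, 2.0], [1.0, 1.0]])
edges, centers, kernels = build_subspace_model(features, 2, toy_kernel)
idx = lookup_subspace(edges, 2, np.array([3.5, 0.2]))
```

Because the number of cells grows as `bins_per_dim ** dims`, a low feature dimension keeps the table of kernels small, which is consistent with the remark that the feature vector should not be too large.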
- the lightweight model can also be obtained through the following steps:
- Step S51b Construct a decision tree based on the feature vector corresponding to each pixel in the image to be processed.
- when step S51b is implemented, all the feature vectors are first regarded as one node, and a feature vector is then selected from all the feature vectors to split them into several child nodes; each child node is then judged: if the condition to stop splitting is met, the node is set as a leaf node; otherwise, a feature vector is selected from the child node to split all the feature vectors in that child node, until the condition to stop splitting is reached and the decision tree is obtained.
- Step S52b Input each leaf node in the decision tree to the trained neural network model to obtain the convolution kernel corresponding to each leaf node.
- inputting a leaf node to the trained neural network model means inputting the feature vector that serves as the leaf node, from which the convolution kernel corresponding to that leaf node is obtained.
- Step S53b Determine each leaf node and the corresponding convolution kernel as the lightweight model.
- a decision tree is constructed based on the feature vector of each pixel, and the convolution kernel corresponding to each leaf node in the decision tree is determined, thus obtaining a lightweight model.
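The decision-tree variant can be sketched as follows. The source does not specify the splitting rule or the stopping condition, so this sketch makes explicit assumptions: it splits at the median of the dimension with the largest spread, stops when a node holds at most `min_leaf` vectors (or cannot be split further), and represents each leaf by the mean of its feature vectors. All function names are hypothetical, and `kernel_fn` stands in for the trained neural network model.

```python
import numpy as np

def build_tree(features, min_leaf=2, max_depth=8, depth=0):
    """Recursively split feature vectors; returns nested dicts describing the tree."""
    features = np.asarray(features, dtype=np.float64)
    spread = features.max(axis=0) - features.min(axis=0)
    dim = int(np.argmax(spread))                   # assumed rule: widest dimension
    threshold = float(np.median(features[:, dim]))
    left = features[features[:, dim] <= threshold]
    right = features[features[:, dim] > threshold]
    stop = (len(features) <= min_leaf or depth >= max_depth
            or len(left) == 0 or len(right) == 0)
    if stop:
        return {"leaf": True, "center": features.mean(axis=0)}
    return {"leaf": False, "dim": dim, "threshold": threshold,
            "left": build_tree(left, min_leaf, max_depth, depth + 1),
            "right": build_tree(right, min_leaf, max_depth, depth + 1)}

def attach_kernels(node, kernel_fn):
    """Store one convolution kernel on every leaf node."""
    if node["leaf"]:
        node["kernel"] = kernel_fn(node["center"])
    else:
        attach_kernels(node["left"], kernel_fn)
        attach_kernels(node["right"], kernel_fn)

def find_leaf(node, feature):
    """Compare a feature vector with the tree nodes until a leaf is reached."""
    while not node["leaf"]:
        side = "left" if feature[node["dim"]] <= node["threshold"] else "right"
        node = node[side]
    return node

features = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [11.0, 0.0]])
tree = build_tree(features)
attach_kernels(tree, lambda center: np.full((3, 3), center[0]))  # toy stand-in
leaf = find_leaf(tree, np.array([9.0, 0.0]))
```

At inference time only the comparisons along one root-to-leaf path are needed to fetch a kernel, so the lookup cost depends on the tree depth rather than on the size of the neural network.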
- the above step S103 "using the lightweight model to process the feature vector of each pixel and the neighborhood image block to obtain the processed target image" can be implemented through the following steps:
- Step S1031 Determine the convolution kernel corresponding to each pixel based on the feature vector of each pixel and the lightweight model.
- when the lightweight model is obtained by dividing the feature space into feature subspaces, step S1031 can be implemented by determining, based on the feature vector of a given pixel i, which feature subspace of the lightweight model the feature vector falls into, and then obtaining the convolution kernel corresponding to that feature subspace. In the embodiment of this application, the number of channels of the obtained convolution kernel differs for different kinds of image processing.
- for example, if super-resolution processing is performed with a super-resolution factor P, where P is an integer greater than 1 (for example, 2), and the size of the original image before processing is W*D (for example, 1280*720), then the size of the processed image is (W*P)*(D*P) (for example, 1280*2 by 720*2, which is 2560*1440), and the number of channels of the obtained convolution kernel is P*P (that is, 4); if denoising is performed, the original image and the processed image have the same size, so the number of channels of the obtained convolution kernel is 1.
- when the lightweight model is obtained by constructing a decision tree, step S1031 can be implemented by comparing the feature vector of each pixel with the nodes of the decision tree until the target leaf node corresponding to the pixel is reached, and then obtaining the convolution kernel corresponding to that target leaf node.
- Step S1032 Perform convolution calculation on the neighborhood image block of each pixel and each corresponding convolution kernel to obtain the processed pixel value.
- the number of processed pixel values obtained for one pixel after the convolution calculation depends on the number of channels of the convolution kernel. For example, if the number of channels of the convolution kernel is 1, one processed pixel value is obtained; if the number of channels of the convolution kernel is P*P, P*P processed pixel values are obtained.
- Step S1033 Determine a processed target image based on the processed pixel value.
- when the number of processed pixel values is 1, the processed target image is obtained directly from the processed pixel values; when the number of processed pixel values is P*P, the processed pixel values are spliced and reordered to obtain the processed target image.
- in steps S1031 to S1033, the lightweight model is used to determine the convolution kernel corresponding to each pixel. Compared with the convolution kernels of the neural network model before the lightweight processing, the dimensionality is reduced, so the amount of computation in the convolution calculation is reduced, which improves processing efficiency and enables real-time processing.
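Steps S1031 and S1032 can be sketched as a per-pixel kernel lookup followed by a local multiply-and-sum. The sketch assumes NumPy, assumes the kernel index of every pixel has already been determined (from a subspace or leaf-node lookup), treats "convolution" as an element-wise product summed over the neighborhood, and pads the border by reflection; none of these choices is confirmed by the source, and `apply_kernels` is a hypothetical name.

```python
import numpy as np

def apply_kernels(image, kernel_index, kernels):
    """Filter each pixel's neighborhood with that pixel's own kernel.

    image:        (H, W) grayscale image
    kernel_index: (H, W) integer index of the kernel chosen for each pixel
    kernels:      (N, C, k, k) kernel table; C = 1 for denoising, P*P for super-resolution
    returns:      (H, W, C) processed pixel values
    """
    image = np.asarray(image, dtype=np.float64)
    _, channels, k, _ = kernels.shape
    radius = k // 2
    padded = np.pad(image, radius, mode="reflect")
    height, width = image.shape
    out = np.empty((height, width, channels))
    for i in range(height):
        for j in range(width):
            block = padded[i:i + k, j:j + k]      # neighborhood image block of pixel (i, j)
            kernel = kernels[kernel_index[i, j]]  # (C, k, k) kernel for this pixel
            out[i, j] = np.sum(kernel * block, axis=(1, 2))
    return out

# Two toy single-channel kernels: identity and "multiply by 2".
kernels = np.zeros((2, 1, 3, 3))
kernels[0, 0, 1, 1] = 1.0
kernels[1, 0, 1, 1] = 2.0
image = np.arange(9, dtype=float).reshape(3, 3)
kernel_index = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
out = apply_kernels(image, kernel_index, kernels)
```

With one channel the last axis can simply be dropped to obtain the target image (the single-value case of step S1033); with P*P channels the values still have to be reordered into the enlarged image.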
- FIG. 5 is a schematic diagram of another implementation flow of the image processing method provided by the embodiment of this application, which is applied to the network architecture shown in FIG. 1A. As shown in FIG. 5, the method includes:
- Step S201 The first terminal receives an operation instruction for watching a video.
- the operation instruction may be triggered by a click or touch operation performed by the user on the video viewing entry of a video App.
- Step S202 The first terminal sends a request message for watching the video to the server based on the operation instruction.
- the request message carries the target video identifier.
- Step S203 The server obtains the target video file based on the request message.
- the server parses the request message, obtains the target video identifier, and obtains the target video file based on the target video identifier.
- Step S204 The server returns a video data stream to the first terminal based on the target video file.
- Step S205 The first terminal decodes the received video data stream to obtain an image to be processed.
- when step S205 is implemented, the first terminal decodes the received video data stream to obtain each video image frame, and determines each video image frame as an image to be processed.
- Step S206 The first terminal determines whether the image to be processed is a grayscale image.
- when the image to be processed is a grayscale image, go to step S207; when the image to be processed is a color image, go to step S209.
- when the image to be processed is a color image, it may be an RGB color image, an sRGB color image, a CMYK color image, and so on.
- Step S207 The first terminal extracts the feature vector of each pixel in the image to be processed, and determines the neighborhood image block corresponding to each pixel.
- step S208 the first terminal uses the lightweight model to process the feature vector of each pixel and the neighborhood image block to obtain a processed target image.
- the lightweight model is obtained by performing lightweight processing on the trained neural network model.
- in actual implementation, subspace division or decision tree generation can be performed based on the trained neural network model to obtain the lightweight model.
- the implementation process of step S207 and step S208 in the embodiment of this application is similar to that of step S102 and step S103 in other embodiments, to which reference can be made.
- Step S209 The first terminal converts the image to be processed into the luminance-chrominance (YUV) color gamut to obtain a luminance (Y) channel to-be-processed image and a chrominance (UV) channel to-be-processed image.
- step S209 may be implemented by converting the color image to be processed to the YUV color gamut according to a preset conversion function, so as to obtain the Y channel to-be-processed image and the UV channel to-be-processed image. Since the Y channel information in a YUV image is sufficient to display the gray scale of the image, the Y channel to-be-processed image is a single-channel grayscale image.
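The color gamut conversion can be sketched as a 3×3 matrix multiplication. The source only speaks of "a preset conversion function", so the BT.601-style coefficients below are an assumption chosen for illustration; the sketch assumes NumPy and RGB values in the range 0 to 1, and the function names are hypothetical. The inverse matrix gives the conversion back to the original gamut, as needed when the processed Y and UV channels are recombined.

```python
import numpy as np

# Assumed BT.601-style analog YUV coefficients (not specified by the source).
RGB_TO_YUV = np.array([
    [0.299, 0.587, 0.114],
    [-0.14713, -0.28886, 0.436],
    [0.615, -0.51499, -0.10001],
])
YUV_TO_RGB = np.linalg.inv(RGB_TO_YUV)

def rgb_to_yuv(rgb):
    """Convert an (..., 3) RGB array to YUV; channel 0 is the grayscale Y channel."""
    return np.asarray(rgb, dtype=np.float64) @ RGB_TO_YUV.T

def yuv_to_rgb(yuv):
    """Convert an (..., 3) YUV array back to RGB."""
    return np.asarray(yuv, dtype=np.float64) @ YUV_TO_RGB.T

rgb = np.array([[[0.5, 0.5, 0.5], [1.0, 0.0, 0.0]]])   # a gray pixel and a red pixel
yuv = rgb_to_yuv(rgb)
y_channel, uv_channels = yuv[..., 0], yuv[..., 1:]
restored = yuv_to_rgb(yuv)
```

For a gray pixel the chrominance components are (almost exactly) zero, which illustrates why the Y channel alone carries the grayscale content that the lightweight model processes.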
- Step S210 The first terminal extracts the feature vector of each Y-channel pixel in the Y-channel to-be-processed image, and determines the neighborhood image block corresponding to each Y-channel pixel.
- step S210 is similar to the implementation process of step S102 described above, and the implementation process of step S102 can be referred to in actual implementation.
- step S211 the first terminal uses the lightweight model to process the feature vector of each Y channel pixel and the neighborhood image block to obtain a processed Y channel target image.
- the lightweight model is used to perform image processing on only the Y channel to be processed image, so as to obtain the processed Y channel target image.
- the implementation process of step S211 is similar to the implementation process of step S103 described above. In actual implementation, the implementation process of step S103 can be referred to.
- step S212 the first terminal uses a preset image processing algorithm to process the UV channel to-be-processed image to obtain the UV channel target image.
- the preset image processing algorithm differs according to the purpose of the image processing.
- when the purpose of image processing is to increase the image resolution, the preset image processing algorithm may be an image interpolation algorithm, for example, a bicubic interpolation algorithm; when the purpose of image processing is to remove image noise, the preset image processing algorithm may be a filtering algorithm, for example, a spatial domain filtering algorithm, a transform domain filtering algorithm, etc.
- Step S213 The first terminal determines a target image based on the Y channel target image and the UV channel target image, where the target image has the same color gamut as the image to be processed.
- after the preset image processing algorithm has been used to process the UV channel to-be-processed image and obtain the UV channel target image, in step S213 color gamut conversion is performed on the Y channel target image obtained in step S211 and the UV channel target image to obtain a target image with the same color gamut as the image to be processed.
- Step S214 the first terminal outputs the target image.
- step S214 the target image may be presented on the display interface of the first terminal.
- In the image processing method provided by the embodiment of this application, the first terminal decodes the video data stream to obtain the image to be processed. When the image to be processed is a grayscale image, the lightweight model is used directly to process the image to be processed and obtain the target image. When the image to be processed is a color image, it is converted to the YUV color gamut, the lightweight model is used to process the Y channel to-be-processed image to obtain the Y channel target image, the preset image processing algorithm is used to process the UV channel to-be-processed image to obtain the UV channel target image, and the Y channel target image and the UV channel target image are then converted to the same color gamut as the image to be processed to obtain the target image, which is output. This increases the image processing speed and enables real-time operation (the speed-up ratio varies with the model being converted and can in theory reach 100 times or more).
- the image processing method provided in the embodiments of this application can be used in a variety of image processing applications (such as image super-resolution, denoising, and enhancement).
- in the following, the application to image and video super-resolution is taken as an example for explanation.
- FIG. 6 is a schematic diagram of an implementation flow of an image processing method provided by an embodiment of this application.
- the method is applied to an image processing device, where the image processing device may be the first terminal shown in FIG. 1A, or it may be the server shown in FIG. 1B.
- the method includes:
- step S601 the image processing device constructs a training data set.
- when step S601 is implemented, the high-resolution image is first down-sampled to construct a low-resolution image, a feature extraction algorithm is then used to extract the features of each pixel in the low-resolution image to obtain a feature map, and finally each group <high-resolution image, low-resolution image, feature map> is used to construct the training data set.
- step S602 the image processing device trains the deep learning model.
- step S602 the deep learning model is trained based on the training data set, the training algorithm, and the loss function.
- step S603 the image processing device performs model conversion.
- a model conversion algorithm is used to simplify the trained deep learning model into a lightweight model, such as a subspace model.
- step S604 the image processing device performs real-time inference.
- FIG. 7A is a schematic diagram of the implementation process of constructing a data set according to an embodiment of the application. As shown in Figure 7A, the implementation process includes:
- step S6011 a high-resolution image is obtained.
- the width and height of the high-resolution image must be integer multiples of the super-resolution factor N, and the image must be a grayscale image.
- step S6012 the artificial downsampling algorithm is used to reduce the resolution of the high-resolution image to obtain a low-resolution image.
- step S6013 the feature extraction algorithm is used to extract features of the low-resolution image to obtain a feature map.
- step S6014 the high-resolution image, the low-resolution image, and the feature map are combined into a training set.
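The downsampling in step S6012 can be sketched as follows. The source does not name the downsampling algorithm, so averaging each N×N block is an assumption made for illustration; the sketch assumes NumPy, and `downsample` is a hypothetical name. The check on the image size mirrors the requirement that width and height be integer multiples of N.

```python
import numpy as np

def downsample(high_res, factor):
    """Reduce resolution by averaging each factor x factor block (assumed method)."""
    hr = np.asarray(high_res, dtype=np.float64)
    height, width = hr.shape
    if height % factor or width % factor:
        raise ValueError("width and height must be integer multiples of the factor")
    blocks = hr.reshape(height // factor, factor, width // factor, factor)
    return blocks.mean(axis=(1, 3))

high_res = np.arange(16, dtype=float).reshape(4, 4)
low_res = downsample(high_res, 2)
training_pair = (high_res, low_res)   # the feature map would be added as a third element
```

Any other resampling filter could be substituted, as long as the same procedure is applied consistently to the whole training set.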
- step S6013 when step S6013 is implemented, gradient features and variance can be used as features of the low-resolution image to construct a feature map.
- the corresponding 4-dimensional feature may be calculated for each pixel; the features are then arranged in the order of the original pixels into a feature map with the same width and height as the low-resolution image and 4 channels.
- FIG. 7B is a schematic diagram of the implementation process of extracting low-resolution image features according to an embodiment of the application. As shown in FIG. 7B, the process includes:
- step S31 the image processing device calculates the first direction gradient map dx of the low-resolution image.
- Step S32 The image processing device calculates the second direction gradient map dy of the low-resolution image.
- for each pixel i, the value of the upper pixel is subtracted from the value of the lower pixel, and the difference is divided by 2, to obtain the gradient value of pixel i on dy.
- Step S33 For each pixel i on the low-resolution image, the image processing device performs the following processing to obtain its corresponding feature (the four-dimensional feature obtained in the embodiment of the present application):
- Step 331 The image processing device extracts the neighborhood blocks at the position of pixel i on dx and dy, which are denoted as x and y respectively.
- x and y correspond to the dx block and the dy block in FIG. 7B.
- Step 333 Calculate the eigenvalues $\lambda_1$ and $\lambda_2$ of the covariance matrix A.
- the eigenvalues $\lambda_1$ and $\lambda_2$ of the covariance matrix A are calculated according to formula (1-2) and formula (1-3), respectively, where:
- $a = \sum x_i x_i$
- $b = \sum x_i y_i$
- $c = \sum y_i y_i$.
- Step 334 On the low-resolution image, extract the neighborhood image block of pixel i, and calculate the variance v of the neighborhood image block.
- the 4th dimension feature $f_4 = v$.
- the feature of each pixel on the low-resolution image is calculated, thereby constructing a feature map.
- Fig. 8A is a schematic diagram of the implementation process of the deep learning model and its training according to an embodiment of the application. As shown in Fig. 8A, the process includes:
- Step S6021 Construct a generator (super-resolution model).
- Step S6022 Construct a discriminator (discriminant model).
- Step S6023 Construct the generative objective function.
- Step S6024 Construct the discriminant objective function.
- Step S6025 Use the two objective functions to train the super-resolution model and the discriminant model.
- the available super-resolution network structure and the way the network is used are shown in FIG. 8B (the network structure is not limited to this); the super-resolution network structure is shown as 811 in FIG. 8B.
- the deep superdivision network is a deep neural network, as shown in FIG. 2Z+1 8114, Reshape layer 2 8115.
- the residual module i 8113 as shown in FIG. 8B, further includes a fully connected layer i_1 1131, a fully connected layer i_2 1132, and an addition layer 1133.
- the feature map of the low-resolution image is input into the deep neural network, which outputs the convolution kernel used for super-resolving the current image block.
- the recommended value of Z is 10
- the "-" in the table indicates the batch dimension.
- Step S801: For pixel i, extract its corresponding image block R_i from the low-resolution image, together with its 4-dimensional feature F_i.
- Step S802: Input the feature F_i into the deep super-resolution network to obtain the convolution kernel i used to super-resolve the image block R_i.
- Step S803: Convolve the image block R_i with convolution kernel i to obtain the N² super-resolved pixels, denoted as the vector I_i.
- Step S804: After the super-resolved values I_i of all pixels have been calculated, they are spliced and reordered (that is, pixel reordering, PixelShuffle) to obtain the super-resolution image S.
- The image formed directly from the super-resolved pixels is a three-dimensional matrix whose dimensions are W, H, and N², with priority increasing in that order, where N is the super-resolution factor.
- For example, W is 640,
- H is 360,
- and N is 2.
- The three dimensions of the image S obtained after super-resolution are then 640, 360, and 4, respectively.
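The pixel reordering in Step S804 can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions: the array is stored as (H, W, N²) in NumPy's row-major convention, and channel k = dy·N + dx is assumed to hold the sub-pixel at vertical offset dy and horizontal offset dx. The source does not spell out the channel ordering, so that layout is an assumption, as is the function name.

```python
import numpy as np

def pixel_shuffle(s, n):
    """Rearrange an (H, W, n*n) array into an (H*n, W*n) image.

    Assumed layout: channel k = dy * n + dx of pixel (y, x) becomes
    output pixel (y * n + dy, x * n + dx).
    """
    h, w, c = s.shape
    if c != n * n:
        raise ValueError("last dimension must equal n * n")
    # (H, W, n, n) -> (H, n, W, n) -> (H*n, W*n)
    return s.reshape(h, w, n, n).transpose(0, 2, 1, 3).reshape(h * n, w * n)

# With W = 640, H = 360, N = 2 the (360, 640, 4) stack becomes a 720 x 1280 image.
stack = np.arange(360 * 640 * 4).reshape(360, 640, 4)
image = pixel_shuffle(stack, 2)
```

Because the operation is only a reshape and a transpose, it moves no arithmetic onto the critical path of inference.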
- The convolution kernel output by the super-resolution network is a convolution kernel with N² channels.
- The super-resolution network uses the above-mentioned input features so that the subsequent model conversion step can run effectively (the feature has few dimensions, only 4).
- Fig. 8C is a schematic diagram of a network structure of a discriminator provided by an embodiment of the application.
- The network model includes convolutional layer 1 821, convolutional layer 2 822, convolutional layer 3 823, fully connected layer 1 824, and convolutional layer 4 825.
- The network structure parameters of the discrimination network model shown in Figure 8C are shown in Table 2 below:
- The discrimination network has two outputs, global discrimination output 827 and pixel discrimination output 828, where:
- The global discrimination output 827 is used to determine whether the input image is an image constructed by the super-resolution network.
- Its output is a single value indicating the probability that the input image was constructed by the generator (between 0 and 1; 0 means no, 1 means yes).
- The pixel discrimination output 828 is used to determine, pixel by pixel, whether the input image was constructed by the super-resolution network.
- Its output is a matrix with the same width and height as the input image.
- Each element represents the probability that the input-image pixel at the corresponding position was constructed by the generator (between 0 and 1; 0 means no, 1 means yes).
- The generation objective function can be constructed as shown in FIG. 8D:
- Step S231: Calculate the pixel-level error.
- When step S231 is implemented, the average per-pixel error between the high-resolution image and the super-resolution image is calculated.
- The error can take various forms, such as mean squared error (MSE) or absolute error.
- Step S232: Calculate the content error.
- Step S232 can be implemented through the following steps:
- Step S2321: Input the high-resolution image into the content feature module to obtain the high-resolution content feature.
- The content feature module is a pre-trained module and generally uses the first several layers of VGG19 (the first 17 layers are recommended); other networks, or a different number of leading layers, can also be used.
- Step S2322: Input the super-resolution image into the content feature module to obtain the super-resolution content feature.
- Step S2323: Calculate the average error between the high-resolution content feature and the super-resolution content feature, that is, the content error.
- The error can take various forms, such as mean squared error (MSE) or absolute error.
- Step S233: Calculate the pixel discrimination error and the global discrimination error.
- Step S233 can be implemented through the following steps:
- Step S2331: Input the super-resolution image into the discrimination network to obtain the super-resolution pixel discrimination and the super-resolution global discrimination.
- Step S2332: Calculate the average error between the super-resolution pixel discrimination and the "no" value (0), that is, the pixel discrimination error (the hope is that the generator can fool the discrimination network into judging that the pixels of the input image are not super-resolved).
- The pixel discrimination error may take various forms, such as binary cross entropy.
- Step S2333: Calculate the average error between the super-resolution global discrimination and the "no" value (0), that is, the global discrimination error (the hope is that the generator can fool the discrimination network into judging that the input image as a whole is not super-resolved).
- The global discrimination error may take various forms, such as binary cross entropy.
- Step S234: Calculate the weighted sum of the four errors to obtain the generation objective function.
- The suggested weights are: pixel discrimination error 7e-4, global discrimination error 3e-4, content error 2e-6, and pixel-level error 1.0.
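The weighted sum in Step S234 can be sketched as follows. This is a minimal NumPy illustration, assuming MSE for the pixel-level and content errors and binary cross entropy for the two discrimination errors (the source allows other forms). The content features are passed in as precomputed arrays because the pre-trained content feature module is not reproduced here; all function and parameter names are hypothetical.

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two arrays of equal shape."""
    diff = np.asarray(a, dtype=np.float64) - np.asarray(b, dtype=np.float64)
    return float(np.mean(diff ** 2))

def bce(prob, target, eps=1e-7):
    """Mean binary cross entropy of probabilities against a constant 0/1 target."""
    p = np.clip(np.asarray(prob, dtype=np.float64), eps, 1.0 - eps)
    return float(np.mean(-(target * np.log(p) + (1.0 - target) * np.log(1.0 - p))))

def generation_objective(hr, sr, hr_feat, sr_feat, sr_pixel_prob, sr_global_prob,
                         w_pixel_disc=7e-4, w_global_disc=3e-4,
                         w_content=2e-6, w_pixel=1.0):
    """Weighted sum of the four generator errors, using the suggested weights."""
    pixel_err = mse(hr, sr)                     # Step S231
    content_err = mse(hr_feat, sr_feat)         # Step S232 (features precomputed)
    pixel_disc_err = bce(sr_pixel_prob, 0.0)    # Step S2332: target is "no" (0)
    global_disc_err = bce(sr_global_prob, 0.0)  # Step S2333: target is "no" (0)
    return (w_pixel * pixel_err + w_content * content_err
            + w_pixel_disc * pixel_disc_err + w_global_disc * global_disc_err)

# Example: identical images and features, discriminator undecided (0.5 everywhere).
zeros = np.zeros((2, 2))
loss = generation_objective(zeros, zeros, zeros, zeros, np.full((2, 2), 0.5), 0.5)
```

With the suggested weights the pixel-level error dominates, and the adversarial and content terms act as small corrections to it.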
- Step S241: Calculate the super-resolution global error and the super-resolution pixel error of the super-resolution image.
- Step S241 can be implemented through the following steps:
- Step S2411: Input the super-resolution image into the discrimination network to obtain the super-resolution global discrimination and the super-resolution pixel discrimination.
- Step S2412: Calculate the average error between the super-resolution pixel discrimination and the "yes" value (1), that is, the super-resolution pixel error (the hope is that the discrimination network can recognize that each pixel of the input super-resolution image was constructed by the generator's super-resolution module).
- The super-resolution pixel error may take various forms, such as binary cross entropy.
- Step S2413: Calculate the average error between the super-resolution global discrimination and the "yes" value (1), that is, the super-resolution global error (the hope is that the discrimination network can recognize that the input super-resolution image as a whole was constructed by the generator's super-resolution module).
- The super-resolution global error may take various forms, such as binary cross entropy.
- Step S242: Calculate the high-resolution global error and the high-resolution pixel error of the high-resolution image.
- Step S242 can be implemented through the following steps:
- Step S2421: Input the high-resolution image into the discrimination network to obtain the high-resolution global discrimination and the high-resolution pixel discrimination.
- Step S2422: Calculate the average error between the high-resolution pixel discrimination and the "no" value (0), that is, the high-resolution pixel error (the hope is that the discrimination network can recognize that each pixel of the input high-resolution image was not constructed by the generator's super-resolution module).
- The high-resolution pixel error may take various forms, such as binary cross entropy.
- Step S2423: Calculate the average error between the high-resolution global discrimination and the "no" value (0), that is, the high-resolution global error (the hope is that the discrimination network can recognize that the input high-resolution image as a whole was not constructed by the generator's super-resolution module).
- The high-resolution global error may take various forms, such as binary cross entropy.
- Step S243: Calculate the weighted sum of the four errors to obtain the discrimination loss function.
- The suggested weights are: super-resolution global error 0.25, super-resolution pixel error 0.25, high-resolution global error 0.25, and high-resolution pixel error 0.25.
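The discrimination loss of Step S243 can be sketched in the same style. This is a minimal NumPy illustration that assumes binary cross entropy for all four errors and takes the discriminator outputs as plain arrays of probabilities; the function names are hypothetical.

```python
import numpy as np

def bce(prob, target, eps=1e-7):
    """Mean binary cross entropy of probabilities against a constant 0/1 target."""
    p = np.clip(np.asarray(prob, dtype=np.float64), eps, 1.0 - eps)
    return float(np.mean(-(target * np.log(p) + (1.0 - target) * np.log(1.0 - p))))

def discrimination_objective(sr_global, sr_pixel, hr_global, hr_pixel,
                             weights=(0.25, 0.25, 0.25, 0.25)):
    """Weighted sum of the four discriminator errors (Steps S241 to S243)."""
    errors = (
        bce(sr_global, 1.0),  # super-resolution global error, target "yes" (1)
        bce(sr_pixel, 1.0),   # super-resolution pixel error, target "yes" (1)
        bce(hr_global, 0.0),  # high-resolution global error, target "no" (0)
        bce(hr_pixel, 0.0),   # high-resolution pixel error, target "no" (0)
    )
    return float(sum(w * e for w, e in zip(weights, errors)))

# An undecided discriminator (0.5 everywhere) scores ln 2 on every term.
undecided = discrimination_objective(0.5, np.full((2, 2), 0.5), 0.5, np.full((2, 2), 0.5))
```

Note that the targets are the opposite of those in the generation objective for super-resolved inputs, which is what makes the two objectives adversarial.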
- Fig. 8F is a schematic diagram of a model training implementation process provided by an embodiment of the application. As shown in Fig. 8F, the process includes:
- The iteration count is initialized to 1, and the parameters of the discrimination network and the generation network are initialized.
- Step S842: The image processing device determines whether the number of iterations is less than T.
- T is a preset threshold for the number of iterations; for example, it may be 10,000.
- When the number of iterations is less than T, step S843 is entered; when the number of iterations is greater than or equal to T, the process ends.
- Step S843: The image processing device fixes the parameters of the discriminator and, using an optimization algorithm, the data in the training set, and the generation loss function, trains the generator parameters for one iteration.
- Step S844: The image processing device fixes the parameters of the generator and, using an optimization algorithm, the data in the training set, and the discrimination loss function, trains the discriminator parameters for one iteration.
- Step S845: The number of iterations is incremented by 1, and step S842 is entered again.
- Through the above steps, the trained generator parameters and discriminator parameters are obtained, where the generator parameters are the parameters of the deep super-resolution network.
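The alternating schedule in Steps S842 to S845 can be sketched as a plain loop. This is only a control-flow illustration: `generator_step` and `discriminator_step` are hypothetical callables standing in for one optimizer update with the other network's parameters held fixed, and cycling through a list of batches is an assumption about how training data is drawn.

```python
def train(generator_step, discriminator_step, batches, T=10_000):
    """Alternate one generator update and one discriminator update per iteration."""
    iteration = 1                      # the iteration count starts at 1
    while iteration < T:               # Step S842: continue while iteration < T
        batch = batches[(iteration - 1) % len(batches)]
        generator_step(batch)          # Step S843: discriminator parameters fixed
        discriminator_step(batch)      # Step S844: generator parameters fixed
        iteration += 1                 # Step S845
    return iteration

# Example: record the order of updates for T = 4 (three full iterations).
log = []
train(lambda b: log.append(("G", b)), lambda b: log.append(("D", b)), ["b0", "b1"], T=4)
```

Because the count starts at 1 and the loop stops once it reaches T, the schedule as described performs T - 1 paired updates.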
- Step S603: model conversion.
- The core idea of model conversion is to approximate the deep learning model and transform it into a simple, lightweight model.
- The following is an example of converting the deep super-resolution network model into a subspace model. In one sentence: the input feature space is divided into subspaces, and every output of the deep learning model within a subspace is approximated by the model's output at that subspace's center point.
- Figure 9 is a schematic diagram of the implementation process of model conversion in an embodiment of this application. As shown in Figure 9, the process includes:
- Step S6031: The image processing device discretizes the feature space.
- Each dimension of the feature space (the aforementioned 4-dimensional feature space) is segmented, where: feature 1 is recommended to be divided evenly over [0, 2π] into N_1 segments (recommended value 16); feature 2 is recommended to be divided evenly between the minimum and maximum of the data into N_2 segments (recommended value 8); feature 3 is recommended to be divided evenly between the minimum and maximum of the data into N_3 segments (recommended value 8); feature 4 is recommended to be divided evenly from 0 to the maximum of the data into N_4 segments (recommended value 8). With this segmentation, the feature space is divided into N_1 × N_2 × N_3 × N_4 subspaces (8192 with the recommended values).
- Step S6032: For each subspace i, the image processing device calculates the center of the subspace, that is, the center coordinate i.
- When step S6032 is implemented, the midpoint of the upper and lower bounds of each dimension may be calculated to obtain the center coordinates of the subspace.
- Step S6033: The image processing device inputs the center coordinate i into the deep super-resolution network to obtain convolution kernel i.
- Step S6034: The image processing device combines each subspace and its corresponding convolution kernel into the converted subspace model.
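Steps S6031 to S6034 can be sketched as building a lookup table. This is a minimal NumPy illustration: `kernel_net` is a hypothetical callable standing in for the trained deep super-resolution network, and the bounds given for features 2 to 4 in the example are placeholder values, since the source takes them from the data.

```python
import itertools
import numpy as np

def build_subspace_model(bounds, segments, kernel_net):
    """Discretize the feature space and cache one kernel per subspace.

    bounds:     list of (low, high) per feature dimension
    segments:   number of equal segments per dimension, e.g. (16, 8, 8, 8)
    kernel_net: hypothetical stand-in for the trained network; maps a
                center coordinate to a convolution kernel
    """
    # Step S6031: equal-width segment edges per dimension.
    edges = [np.linspace(lo, hi, n + 1) for (lo, hi), n in zip(bounds, segments)]
    # Step S6032: segment centers are midpoints of the lower and upper bounds.
    centers = [0.5 * (e[:-1] + e[1:]) for e in edges]
    kernels = {}
    for idx in itertools.product(*(range(n) for n in segments)):
        center = np.array([centers[d][i] for d, i in enumerate(idx)])
        kernels[idx] = kernel_net(center)   # Step S6033
    return edges, kernels                   # Step S6034: the subspace model

# Example with placeholder bounds; the "network" just echoes the center.
bounds = [(0.0, 2 * np.pi), (0.0, 1.0), (0.0, 1.0), (0.0, 4.0)]
edges, kernels = build_subspace_model(bounds, (16, 8, 8, 8), lambda c: c.copy())
```

The network is evaluated once per subspace at conversion time, so inference afterwards needs only a table lookup rather than a forward pass.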
- In addition to being converted into a subspace model, in some embodiments the deep learning model may also be converted into other lightweight models, such as decision trees. For this type of model conversion, the deep learning model can be used to construct data for training the new target lightweight model.
- Step S604: real-time inference.
- In the real-time inference step, the lightweight model (for example, the subspace model) obtained in step S603 is used to perform real-time inference for image super-resolution.
- FIG. 10 is a schematic diagram of the implementation process of real-time inference in an embodiment of the application. As shown in FIG. 10, the process includes:
- Step S6041: The image processing device calculates a feature map of the image to be super-resolved.
- The calculation method is the same as that of S6013: a feature extraction algorithm is used to extract the feature map of the image to be super-resolved, where the image to be super-resolved is a single-channel image.
- Step S6042: For each pixel i on the image to be super-resolved, the image processing device obtains the low-resolution image block R_i of pixel i from that image.
- Step S6043: The image processing device obtains the feature F_i of pixel i from the feature map.
- Step S6044: The image processing device inputs the feature F_i into the subspace model, looks up the subspace to which it belongs, and obtains the convolution kernel i corresponding to that subspace.
- Step S6045: The image processing device convolves the low-resolution image block R_i with the convolution kernel i of the determined subspace to obtain the super-resolved result L_i of pixel i, that is, N² super-resolved pixels.
- Step S6046: The image processing device splices and reorders the super-resolved results L_i of all pixels (N² channels, where N is the super-resolution factor) to obtain the super-resolution image.
- For the implementation of step S6046, refer to step S804.
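Steps S6042 to S6046 can be sketched as a per-pixel table lookup followed by a small dot product. This is an illustrative NumPy sketch under several assumptions that the source does not fix: each cached kernel has shape (N², k, k), the "convolution" is applied without flipping the kernel, image borders are handled by edge replication, and the pixel reordering uses the channel layout k = dy·N + dx. All function names are hypothetical.

```python
import numpy as np

def lookup_index(feature, edges):
    """Step S6044: find the subspace index of a feature vector (clamped to range)."""
    idx = []
    for value, e in zip(feature, edges):
        i = int(np.searchsorted(e, value, side="right")) - 1
        idx.append(min(max(i, 0), len(e) - 2))
    return tuple(idx)

def super_resolve(image, feature_map, edges, kernels, n, k):
    """Apply the subspace model to a single-channel image (Steps S6042 to S6046)."""
    h, w = image.shape
    padded = np.pad(image, k // 2, mode="edge")   # assumed border handling
    out = np.empty((h, w, n * n))
    for y in range(h):
        for x in range(w):
            block = padded[y:y + k, x:x + k]                          # R_i
            kernel = kernels[lookup_index(feature_map[y, x], edges)]  # (n*n, k, k)
            # One k x k dot product per output channel gives the n*n pixels L_i.
            out[y, x] = np.tensordot(kernel, block, axes=([1, 2], [0, 1]))
    # Pixel reordering: (H, W, n*n) -> (H*n, W*n).
    return out.reshape(h, w, n, n).transpose(0, 2, 1, 3).reshape(h * n, w * n)

# Example: a single subspace whose kernel copies the center pixel to all 4 outputs.
center_copy = np.zeros((4, 3, 3))
center_copy[:, 1, 1] = 1.0
demo_edges = [np.array([0.0, 1.0])] * 4
demo = super_resolve(np.arange(6.0).reshape(2, 3), np.zeros((2, 3, 4)),
                     demo_edges, {(0, 0, 0, 0): center_copy}, n=2, k=3)
```

The explicit Python loops are for readability only; a real-time implementation would vectorize the lookup and the dot products.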
- Step S1101: The image processing device converts the color image from the original color gamut (for example, the RGB color gamut) to the YUV color gamut, obtaining a Y-channel image to be super-resolved and a UV-channel image to be super-resolved.
- Step S1102: The image processing device inputs the Y-channel image to be super-resolved into the real-time super-resolution module for real-time super-resolution, obtaining the Y-channel super-resolution image.
- Step S1103: The image processing device uses a traditional image interpolation method to super-resolve the UV-channel image to be super-resolved, obtaining the UV-channel super-resolution image.
- For example, bicubic interpolation can be used to super-resolve the UV-channel image.
- Other image interpolation methods can also be used.
- Step S1104: The image processing device converts the super-resolved YUV image back to the original color gamut; the converted image is the super-resolution image.
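The color pipeline of Steps S1101 to S1104 can be sketched as follows. This is a minimal NumPy illustration with two stated substitutions: the RGB to YUV matrix uses the common BT.601-style analog coefficients, which the source does not specify, and the UV channels are upscaled with nearest-neighbor repetition rather than the bicubic interpolation the source suggests, to keep the sketch dependency-free. `y_super_resolver` is a hypothetical callable standing in for the real-time super-resolution module.

```python
import numpy as np

# Assumed BT.601-style RGB -> YUV matrix; the inverse is computed numerically.
RGB_TO_YUV = np.array([[0.299, 0.587, 0.114],
                       [-0.14713, -0.28886, 0.436],
                       [0.615, -0.51499, -0.10001]])
YUV_TO_RGB = np.linalg.inv(RGB_TO_YUV)

def upscale_nearest(channel, n):
    """Nearest-neighbor upscaling (a simple stand-in for bicubic interpolation)."""
    return np.repeat(np.repeat(channel, n, axis=0), n, axis=1)

def super_resolve_color(rgb, n, y_super_resolver, uv_upscaler=upscale_nearest):
    """Super-resolve Y with the model, interpolate U and V, convert back to RGB."""
    yuv = rgb.astype(np.float64) @ RGB_TO_YUV.T   # Step S1101
    y_sr = y_super_resolver(yuv[..., 0], n)       # Step S1102
    u_sr = uv_upscaler(yuv[..., 1], n)            # Step S1103
    v_sr = uv_upscaler(yuv[..., 2], n)
    yuv_sr = np.stack([y_sr, u_sr, v_sr], axis=-1)
    return yuv_sr @ YUV_TO_RGB.T                  # Step S1104

# Example: using nearest-neighbor for Y as well reproduces a plain 2x upscale.
rgb = np.arange(36, dtype=np.float64).reshape(3, 4, 3)
result = super_resolve_color(rgb, 2, upscale_nearest)
```

Running the model only on the Y channel reflects the design choice in the text: the costlier processing is spent on luminance, while the chroma channels are handled by interpolation.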
- Step S1201: The image processing device obtains a video to be super-resolved.
- Step S1202: The image processing device decodes the video to obtain each video frame to be super-resolved.
- Step S1203: The image processing device inputs each video frame i to be super-resolved into the real-time super-resolution module for super-resolution processing, obtaining the super-resolution image of video frame i.
- Step S1203 can be implemented with reference to steps S1101 to S1104.
- Step S1204: The image processing device performs video encoding on the super-resolution images of the video frames to obtain the super-resolved video.
- Various deep learning objective functions can be used during training, which gives the trained model better picture quality; and the deep super-resolution model can be converted into a lightweight model, which greatly improves inference speed and enables real-time operation (the speedup differs depending on the model converted to; in theory it can reach 100 times or more).
- The method proposed in the embodiments of this application can also be used in other image processing applications, such as image denoising and enhancement, and therefore has a wide scope of application.
- FIG. 13 is a schematic diagram of the composition structure of the image processing device provided by the embodiment of the application. As shown in FIG. 13, the image processing device 154 includes:
- the first obtaining module 1541, configured to obtain the image to be processed;
- the first extraction module 1542, configured to extract the feature vector of each pixel in the image to be processed when the image to be processed is a grayscale image, and determine the neighborhood image block corresponding to each pixel in the image to be processed;
- the first processing module 1543, configured to use a lightweight model to process the feature vector and the neighborhood image block of each pixel to obtain a processed target image, wherein the lightweight model is obtained by performing lightweight processing on the trained neural network model;
- the output module 1544, configured to output the target image.
- The image processing device further includes:
- a color gamut conversion module, configured to convert the image to be processed to the YUV color gamut when the image to be processed is a color image, obtaining a Y-channel image to be processed and a UV-channel image to be processed;
- a second extraction module, configured to extract the feature vector of each Y-channel pixel in the Y-channel image to be processed, and determine the neighborhood image block corresponding to each Y-channel pixel;
- a second processing module, configured to use the lightweight model to process the feature vector and the neighborhood image block of each Y-channel pixel to obtain a processed Y-channel target image;
- a third processing module, configured to use a preset image processing algorithm to process the UV-channel image to be processed to obtain a UV-channel target image;
- a first determining module, configured to determine the target image based on the Y-channel target image and the UV-channel target image, wherein the target image has the same color gamut as the image to be processed.
- The first obtaining module is further configured to:
- determine each video frame image as the image to be processed.
- The first extraction module is further configured to:
- determine the feature vector of each pixel based on the first gradient neighborhood block and the second gradient neighborhood block of that pixel.
- The first extraction module is further configured to:
- determine the feature vector of each pixel based on each first eigenvalue, each second eigenvalue, and each variance value.
- The image processing device further includes:
- a second acquisition module, configured to acquire training data and a preset neural network model, where the training data includes a first training image and a second training image, the second training image is obtained by down-sampling the first training image, and the neural network model includes a generative model and a discriminant model;
- a fourth processing module, configured to use the neural network model to process the second training image to obtain a predicted image;
- a model training module, configured to perform back propagation training on the neural network model based on the predicted image, the first training image, and a preset objective function to obtain a trained neural network model.
- The preset objective function includes a generation objective function and a discrimination objective function.
- The model training module is further configured to:
- fix the generation parameters of the generative model, and perform back propagation training on the discriminant model based on the predicted image, the first training image, and the discrimination objective function to adjust the discrimination parameters of the discriminant model, until a preset training completion condition is reached, obtaining a trained neural network model.
- the image processing device further includes:
- the second determining module is configured to determine the pixel-level error value and the content error value between the predicted image and the first training image
- the third determining module is configured to determine the first pixel discrimination error value and the first global discrimination error value of the predicted image based on the predicted image and the discriminant model;
- the fourth determination module is configured to determine the generation objective function based on the preset generation weight value, the pixel-level error value, the content error value, the first pixel discrimination error value and the first global discrimination error value.
- the image processing device further includes:
- a fifth determining module configured to determine a second pixel discrimination error value and a second global discrimination error value of the predicted image based on the predicted image and the discriminant model;
- a sixth determining module configured to determine a third pixel discrimination error value and a third global discrimination error value of the first training image based on the first training image and the discrimination model;
- the seventh determining module is configured to determine the discrimination objective function based on the preset discrimination weight value, the second pixel discrimination error value, the second global discrimination error value, the third pixel discrimination error value, and the third global discrimination error value .
- the image processing device further includes:
- the eighth determining module is configured to determine the feature space based on the feature vector corresponding to each pixel in the image to be processed;
- the subspace division module is configured to divide the feature space into N feature subspaces according to preset division rules, and respectively determine the N center coordinates corresponding to the N feature subspaces;
- the first input module is configured to input the N center coordinates to the trained neural network model respectively, and obtain N convolution kernels of N feature subspaces correspondingly;
- the ninth determining module is configured to determine the N feature subspaces and the N convolution kernels as the lightweight model.
- the image processing device further includes:
- the decision tree building module is configured to build a decision tree based on the feature vector corresponding to each pixel in the image to be processed;
- the second input module is configured to input each leaf node in the decision tree to the trained neural network model, and correspondingly obtain a convolution kernel corresponding to each leaf node;
- the tenth determining module is configured to determine each leaf node and the corresponding convolution kernel as the lightweight model.
- The first processing module is further configured to:
- determine the processed target image.
- the embodiments of the present application provide a computer program product or computer program.
- the computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
- the processor of the computer device reads the computer instruction from the computer-readable storage medium, and the processor executes the computer instruction, so that the computer device executes the image processing method described in the embodiment of the present application.
- The embodiments of the present application provide a storage medium storing executable instructions; when the executable instructions are executed by a processor, they cause the processor to execute the method provided in the embodiments of the present application.
- The storage medium may be a computer-readable storage medium, for example, ferroelectric random access memory (FRAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), flash memory, magnetic surface memory, an optical disk, or a compact disc read-only memory (CD-ROM); it may also be any of various devices including one or any combination of the foregoing memories.
- Executable instructions may take the form of programs, software, software modules, scripts, or code, written in any form of programming language (including compiled or interpreted languages, and declarative or procedural languages), and they can be deployed in any form, including as an independent program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
- Executable instructions may, but do not necessarily, correspond to files in a file system. They may be stored as part of a file that holds other programs or data, for example in one or more scripts in a HyperText Markup Language (HTML) document; in a single file dedicated to the program in question; or in multiple coordinated files (for example, files storing one or more modules, subroutines, or code parts).
- Executable instructions can be deployed to be executed on one computing device, on multiple computing devices located at one site, or on multiple computing devices distributed across multiple sites and interconnected by a communication network.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Claims (15)
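- An image processing method, performed by an image processing device, comprising: obtaining an image to be processed; when the image to be processed is a grayscale image, extracting a feature vector of each pixel in the image to be processed, and determining a neighborhood image block corresponding to each pixel; processing the feature vector and the neighborhood image block of each pixel using a lightweight model to obtain a processed target image, wherein the lightweight model is obtained by performing lightweight processing on a trained neural network model; and outputting the target image.
- The method according to claim 1, further comprising: when the image to be processed is a color image, converting the image to be processed to the luminance-chrominance (YUV) color gamut to obtain a luminance (Y) channel image to be processed and a chrominance (UV) channel image to be processed; extracting a feature vector of each Y-channel pixel in the Y-channel image to be processed, and determining a neighborhood image block corresponding to each Y-channel pixel; processing the feature vector and the neighborhood image block of each Y-channel pixel using the lightweight model to obtain a processed Y-channel target image; processing the UV-channel image to be processed using a preset image processing algorithm to obtain a UV-channel target image; and determining the target image based on the Y-channel target image and the UV-channel target image, wherein the target image has the same color gamut as the image to be processed.
- The method according to claim 1, wherein obtaining the image to be processed comprises: obtaining a video file to be processed; decoding the video file to be processed to obtain each video frame image in the video file; and determining each video frame image as the image to be processed.
- The method according to claim 1, wherein extracting the feature vector of each pixel in the image to be processed comprises: determining a first-direction gradient map and a second-direction gradient map corresponding to the image to be processed; determining, for each pixel in the image to be processed, a first gradient neighborhood block in the first-direction gradient map and a second gradient neighborhood block in the second-direction gradient map; and determining the feature vector of each pixel based on the first gradient neighborhood block and the second gradient neighborhood block of that pixel.
- The method according to claim 4, wherein determining the feature vector of each pixel based on the first gradient neighborhood block and the second gradient neighborhood block of that pixel comprises: determining a covariance matrix of each pixel based on the first gradient neighborhood block and the second gradient neighborhood block of that pixel; determining each first eigenvalue and each second eigenvalue corresponding to each covariance matrix; determining each variance value corresponding to the neighborhood image block of each pixel; and determining the feature vector of each pixel based on each first eigenvalue, each second eigenvalue, and each variance value.
- The method according to any one of claims 1 to 5, further comprising: obtaining training data and a preset neural network model, wherein the training data includes a first training image and a second training image, the second training image is obtained by down-sampling the first training image, and the neural network model includes a generative model and a discriminant model; processing the second training image using the neural network model to obtain a predicted image; and performing back propagation training on the neural network model based on the predicted image, the first training image, and a preset objective function to obtain a trained neural network model.
- The method according to claim 6, wherein the preset objective function includes a generation objective function and a discrimination objective function, and performing back propagation training on the neural network model based on the predicted image, the first training image, and the preset objective function to obtain a trained neural network model comprises: fixing the discrimination parameters of the discriminant model, and performing back propagation training on the generative model based on the predicted image, the first training image, and the generation objective function to adjust the generation parameters of the generative model; and fixing the generation parameters of the generative model, and performing back propagation training on the discriminant model based on the predicted image, the first training image, and the discrimination objective function to adjust the discrimination parameters of the discriminant model, until a preset training completion condition is reached, obtaining a trained neural network model.
- The method according to claim 7, further comprising: determining a pixel-level error value and a content error value between the predicted image and the first training image; determining a first pixel discrimination error value and a first global discrimination error value of the predicted image based on the predicted image and the discriminant model; and determining the generation objective function based on preset generation weight values, the pixel-level error value, the content error value, the first pixel discrimination error value, and the first global discrimination error value.
- The method according to claim 7, further comprising: determining a second pixel discrimination error value and a second global discrimination error value of the predicted image based on the predicted image and the discriminant model; determining a third pixel discrimination error value and a third global discrimination error value of the first training image based on the first training image and the discriminant model; and determining the discrimination objective function based on preset discrimination weight values, the second pixel discrimination error value, the second global discrimination error value, the third pixel discrimination error value, and the third global discrimination error value.
- The method according to claim 1, further comprising: determining a feature space based on the feature vector corresponding to each pixel in the image to be processed; dividing the feature space into N feature subspaces according to a preset division rule, and determining N center coordinates corresponding to the N feature subspaces, where N is an integer greater than 2; inputting the N center coordinates into the trained neural network model to obtain N convolution kernels of the N feature subspaces; and determining the N feature subspaces and the N convolution kernels as the lightweight model.
- The method according to claim 1, further comprising: constructing a decision tree based on the feature vector corresponding to each pixel in the image to be processed; inputting each leaf node of the decision tree into the trained neural network model to obtain a convolution kernel corresponding to each leaf node; and determining the leaf nodes and the corresponding convolution kernels as the lightweight model.
- The method according to claim 10 or 11, wherein processing the feature vector and the neighborhood image block of each pixel using the lightweight model to obtain the processed target image comprises: determining the convolution kernel corresponding to each pixel based on the feature vector of that pixel and the lightweight model; performing a convolution calculation on the neighborhood image block of each pixel and the corresponding convolution kernel to obtain the processed pixel value of that pixel; and determining the processed target image based on the processed pixel values of the pixels.
- An image processing apparatus, comprising: a first obtaining module, configured to obtain an image to be processed; a first extraction module, configured to, when the image to be processed is a grayscale image, extract a feature vector of each pixel in the image to be processed and determine a neighborhood image block corresponding to each pixel; a first processing module, configured to process the feature vector and the neighborhood image block of each pixel using a lightweight model to obtain a processed target image, wherein the lightweight model is obtained by performing lightweight processing on a trained neural network model; and an output module, configured to output the target image.
- An image processing device, comprising: a memory, configured to store executable instructions; and a processor, configured to implement the method according to any one of claims 1 to 12 when executing the executable instructions stored in the memory.
- A computer-readable storage medium storing executable instructions which, when executed by a processor, implement the method according to any one of claims 1 to 12.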
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2022566432A JP7464752B2 (ja) | 2020-06-03 | 2021-05-17 | 画像処理方法、装置、機器及びコンピュータプログラム |
| EP21817967.9A EP4044106B1 (en) | 2020-06-03 | 2021-05-17 | Image processing method and apparatus, device, and computer readable storage medium |
| US17/735,942 US12198296B2 (en) | 2020-06-03 | 2022-05-03 | Image processing method, apparatus, device, and computer-readable storage medium |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010495781.1 | 2020-06-03 | ||
| CN202010495781.1A CN111402143B (zh) | 2020-06-03 | 2020-06-03 | Image processing method, apparatus, device, and computer-readable storage medium |
Related Child Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US17/735,942 Continuation US12198296B2 (en) | 2020-06-03 | 2022-05-03 | Image processing method, apparatus, device, and computer-readable storage medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2021244270A1 (zh) | 2021-12-09 |
Family
ID=71431873
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2021/094049 Ceased WO2021244270A1 (zh) | 2020-06-03 | 2021-05-17 | 图像处理方法、装置、设备及计算机可读存储介质 |
Country Status (5)
| Country | Link |
|---|---|
| US (1) | US12198296B2 (zh) |
| EP (1) | EP4044106B1 (zh) |
| JP (1) | JP7464752B2 (zh) |
| CN (1) | CN111402143B (zh) |
| WO (1) | WO2021244270A1 (zh) |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN114066780A (zh) * | 2022-01-17 | 2022-02-18 | 广东欧谱曼迪科技有限公司 | 4k内窥镜图像去雾方法、装置、电子设备及存储介质 |
| CN116385281A (zh) * | 2023-02-14 | 2023-07-04 | 大连工业大学 | 一种基于真实噪声模型与生成对抗网络的遥感图像去噪方法 |
| CN116982072A (zh) * | 2022-02-28 | 2023-10-31 | 京东方科技集团股份有限公司 | 机器学习模型的训练方法、装置和图像的处理方法、装置 |
| CN119150919A (zh) * | 2024-11-15 | 2024-12-17 | 中国海洋大学 | 一种基于归纳学习的时空数据的插值方法、系统和装置 |
Families Citing this family (31)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111402143B (zh) * | 2020-06-03 | 2020-09-04 | 腾讯科技(深圳)有限公司 | Image processing method, apparatus, device, and computer-readable storage medium |
| CN113936163B (zh) * | 2020-07-14 | 2024-10-15 | 武汉Tcl集团工业研究院有限公司 | Image processing method, terminal, and storage medium |
| CN111932463B (zh) * | 2020-08-26 | 2023-05-30 | 腾讯科技(深圳)有限公司 | Image processing method, apparatus, device, and storage medium |
| CN114173137A (zh) * | 2020-09-10 | 2022-03-11 | 北京金山云网络技术有限公司 | Video encoding method, apparatus, and electronic device |
| CN114266696B (zh) * | 2020-09-16 | 2024-06-11 | 广州虎牙科技有限公司 | Image processing method, apparatus, electronic device, and computer-readable storage medium |
| CN112333456B (zh) * | 2020-10-21 | 2022-05-10 | 鹏城实验室 | Live video transmission method based on a cloud-edge protocol |
| CN114612294B (zh) * | 2020-12-08 | 2025-01-03 | 武汉Tcl集团工业研究院有限公司 | Image super-resolution processing method and computer device |
| CN114612295B (zh) * | 2020-12-08 | 2025-08-05 | 武汉Tcl集团工业研究院有限公司 | Image super-resolution processing method and computer device |
| CN112801879B (zh) * | 2021-02-09 | 2023-12-08 | 咪咕视讯科技有限公司 | Image super-resolution reconstruction method, apparatus, electronic device, and storage medium |
| CN112837242B (zh) * | 2021-02-19 | 2023-07-14 | 成都国科微电子有限公司 | Image noise reduction processing method, apparatus, device, and medium |
| CN112991203B (zh) * | 2021-03-08 | 2024-05-07 | Oppo广东移动通信有限公司 | Image processing method, apparatus, electronic device, and storage medium |
| CN113128116B (zh) | 2021-04-20 | 2023-09-26 | 上海科技大学 | Pure-integer quantization method applicable to lightweight neural networks |
| CN113242440A (zh) * | 2021-04-30 | 2021-08-10 | 广州虎牙科技有限公司 | Live streaming method, client, system, computer device, and storage medium |
| CN113379629B (zh) * | 2021-06-08 | 2024-08-16 | 深圳思谋信息科技有限公司 | Satellite image denoising method, apparatus, computer device, and storage medium |
| CN113822803A (zh) * | 2021-07-22 | 2021-12-21 | 腾讯科技(深圳)有限公司 | Image super-resolution processing method, apparatus, device, and computer-readable storage medium |
| EP4369282A4 (en) | 2021-09-02 | 2024-11-20 | Samsung Electronics Co., Ltd. | IMAGE PROCESSING DEVICE AND OPERATING METHOD THEREFOR |
| CN113808020A (zh) * | 2021-09-18 | 2021-12-17 | 北京字节跳动网络技术有限公司 | Image processing method and device |
| KR102566798B1 (ko) * | 2021-10-25 | 2023-08-11 | 에스케이텔레콤 주식회사 | Code-level super-resolution method and super-resolution model training method therefor |
| CN116168076B (zh) * | 2021-11-24 | 2025-01-14 | 腾讯科技(深圳)有限公司 | Image processing method, apparatus, device, and storage medium |
| CN114298904B (zh) * | 2021-12-15 | 2025-03-11 | 深圳云天励飞技术股份有限公司 | Image enlargement method, apparatus, electronic device, and storage medium |
| US12033303B2 (en) * | 2022-02-08 | 2024-07-09 | Kyocera Document Solutions, Inc. | Mitigation of quantization-induced image artifacts |
| CN116739907A (zh) * | 2022-03-01 | 2023-09-12 | Oppo广东移动通信有限公司 | Image denoising method, apparatus, device, and computer-readable storage medium |
| JP2024031119A (ja) * | 2022-08-25 | 2024-03-07 | 富士フイルム株式会社 | Image processing device, image processing method, image processing program, and endoscope system |
| CN115550653B (zh) * | 2022-09-21 | 2025-02-11 | 南华大学 | Method and system for fast determination of dynamic 3D point cloud coding mode based on a lightweight neural network |
| CN116976299A (zh) * | 2022-10-11 | 2023-10-31 | 中移(杭州)信息技术有限公司 | Advertisement generation method, apparatus, device, and storage medium |
| CN116128730A (zh) * | 2023-02-16 | 2023-05-16 | 京东方科技集团股份有限公司 | Real-time video processing method, apparatus, device, medium, and product |
| US20240321320A1 (en) * | 2023-03-21 | 2024-09-26 | KINT Inc. | Harmonizing system for optimizing sound in content |
| US12556733B2 (en) * | 2023-04-20 | 2026-02-17 | Tencent America LLC | Efficient neural network decoder for image compression |
| CN117475367B (zh) * | 2023-06-12 | 2024-05-07 | 中国建筑第四工程局有限公司 | Sewage image processing method and system based on multi-rule coordination |
| CN116993590B (zh) * | 2023-08-09 | 2025-01-03 | 中国电信股份有限公司技术创新中心 | Image processing method and apparatus, storage medium, and electronic device |
| US12524637B2 (en) * | 2024-06-18 | 2026-01-13 | Datalogic Ip Tech S.R.L. | System and method for symbol detection |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20070086627A1 (en) * | 2005-10-18 | 2007-04-19 | Samsung Electronics Co., Ltd. | Face identification apparatus, medium, and method |
| CN104598908A (zh) * | 2014-09-26 | 2015-05-06 | 浙江理工大学 | Crop leaf disease identification method |
| CN109308679A (zh) * | 2018-08-13 | 2019-02-05 | 深圳市商汤科技有限公司 | Image style conversion method and apparatus, device, and storage medium |
| US20200082154A1 (en) * | 2018-09-10 | 2020-03-12 | Algomus, Inc. | Computer vision neural network system |
| CN111402143A (zh) * | 2020-06-03 | 2020-07-10 | 腾讯科技(深圳)有限公司 | Image processing method, apparatus, device, and computer-readable storage medium |
Family Cites Families (22)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2010093650A (ja) | 2008-10-09 | 2010-04-22 | Nec Corp | Terminal, image display method, and program |
| GB2539846B (en) * | 2015-02-19 | 2017-11-01 | Magic Pony Tech Ltd | Online training of hierarchical algorithms |
| US10235608B2 (en) * | 2015-12-22 | 2019-03-19 | The Nielsen Company (Us), Llc | Image quality assessment using adaptive non-overlapping mean estimation |
| CN108960514B (zh) * | 2016-04-27 | 2022-09-06 | 第四范式(北京)技术有限公司 | Method and apparatus for presenting a prediction model, and method and apparatus for adjusting a prediction model |
| US10861143B2 (en) * | 2017-09-27 | 2020-12-08 | Korea Advanced Institute Of Science And Technology | Method and apparatus for reconstructing hyperspectral image using artificial intelligence |
| CN108062744B (zh) * | 2017-12-13 | 2021-05-04 | 中国科学院大连化学物理研究所 | Deep-learning-based super-resolution reconstruction method for mass spectrometry images |
| KR101882704B1 (ko) * | 2017-12-18 | 2018-07-27 | 삼성전자주식회사 | Electronic device and control method thereof |
| US20190325293A1 (en) * | 2018-04-19 | 2019-10-24 | National University Of Singapore | Tree enhanced embedding model predictive analysis methods and systems |
| US11756160B2 (en) * | 2018-07-27 | 2023-09-12 | Washington University | ML-based methods for pseudo-CT and HR MR image estimation |
| CN109034102B (zh) * | 2018-08-14 | 2023-06-16 | 腾讯科技(深圳)有限公司 | Face liveness detection method, apparatus, device, and storage medium |
| CN109063666A (zh) * | 2018-08-14 | 2018-12-21 | 电子科技大学 | Lightweight face recognition method and system based on depthwise separable convolution |
| CN109598676A (zh) * | 2018-11-15 | 2019-04-09 | 华南理工大学 | Single-image super-resolution method based on the Hadamard transform |
| CN109409342A (zh) * | 2018-12-11 | 2019-03-01 | 北京万里红科技股份有限公司 | Iris liveness detection method based on a lightweight convolutional neural network |
| CN109902720B (zh) * | 2019-01-25 | 2020-11-27 | 同济大学 | Image classification and recognition method using deep feature estimation based on subspace decomposition |
| CN109949235A (zh) * | 2019-02-26 | 2019-06-28 | 浙江工业大学 | Chest X-ray denoising method based on a deep convolutional neural network |
| CN110084108A (zh) * | 2019-03-19 | 2019-08-02 | 华东计算技术研究所(中国电子科技集团公司第三十二研究所) | Pedestrian re-identification system and method based on a GAN neural network |
| CN110136063B (zh) * | 2019-05-13 | 2023-06-23 | 南京信息工程大学 | Single-image super-resolution reconstruction method based on a conditional generative adversarial network |
| CN110348350B (zh) * | 2019-07-01 | 2022-03-25 | 电子科技大学 | Driver state detection method based on facial expressions |
| CN110796101A (zh) * | 2019-10-31 | 2020-02-14 | 广东光速智能设备有限公司 | Face recognition method and system for an embedded platform |
| CN110907732A (zh) * | 2019-12-04 | 2020-03-24 | 江苏方天电力技术有限公司 | Fault diagnosis method for a synchronous condenser based on a PCA-RBF neural network |
| CN111105352B (zh) * | 2019-12-16 | 2023-04-25 | 佛山科学技术学院 | Super-resolution image reconstruction method, system, computer device, and storage medium |
| US11240465B2 (en) * | 2020-02-21 | 2022-02-01 | Alibaba Group Holding Limited | System and method to use decoder information in video super resolution |
2020
- 2020-06-03: CN application CN202010495781.1A, publication CN111402143B (zh), status: Active
2021
- 2021-05-17: WO application PCT/CN2021/094049, publication WO2021244270A1 (zh), status: Ceased
- 2021-05-17: EP application EP21817967.9A, publication EP4044106B1 (en), status: Active
- 2021-05-17: JP application JP2022566432A, publication JP7464752B2 (ja), status: Active
2022
- 2022-05-03: US application US17/735,942, publication US12198296B2 (en), status: Active
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP4044106A4 |
Also Published As
| Publication number | Publication date |
|---|---|
| JP7464752B2 (ja) | 2024-04-09 |
| CN111402143B (zh) | 2020-09-04 |
| JP2023523833A (ja) | 2023-06-07 |
| US20220270207A1 (en) | 2022-08-25 |
| CN111402143A (zh) | 2020-07-10 |
| EP4044106A4 (en) | 2023-02-01 |
| EP4044106A1 (en) | 2022-08-17 |
| EP4044106B1 (en) | 2026-04-08 |
| US12198296B2 (en) | 2025-01-14 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
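| CN111402143B (zh) | | Image processing method, apparatus, device, and computer-readable storage medium |
| US20240233074A9 (en) | | Image processing method and device, training method of neural network, image processing method based on combined neural network model, constructing method of combined neural network model, neural network processor, and storage medium |
| KR102663519B1 (ko) | | Cross-domain image translation technique |
| CN114155543B (zh) | | Neural network training method, document image understanding method, apparatus, and device |
| CN108921225B (zh) | | Image processing method and apparatus, computer device, and storage medium |
| CN118230081B (zh) | | Image processing method, apparatus, electronic device, computer-readable storage medium, and computer program product |
| CN110378838B (zh) | | Variable-viewpoint image generation method, apparatus, storage medium, and electronic device |
| CN109934792B (zh) | | Electronic device and control method thereof |
| CN111104962A (zh) | | Image semantic segmentation method, apparatus, electronic device, and readable storage medium |
| CN110020676A (zh) | | Text detection method, system, device, and medium based on multi-receptive-field deep features |
| CN114008663A (zh) | | Real-time video super-resolution |
| KR20200128378A (ko) | | Training of image generation network and image processing method, apparatus, electronic device, and medium |
| EP4425423B1 (en) | | Image processing method and apparatus, device, storage medium and program product |
| CN113066017A (zh) | | Image enhancement method, model training method, and device |
| CN111091010A (zh) | | Similarity determination, network training, and search methods and apparatuses, and storage medium |
| CN120322806A (zh) | | 3D generation for various categories and scenes |
| US11948090B2 (en) | | Method and apparatus for video coding |
| CN117151987B (zh) | | Image enhancement method, apparatus, and electronic device |
| CN116977548A (zh) | | Three-dimensional reconstruction method, apparatus, device, and computer-readable storage medium |
| CN116109892A (zh) | | Training method for a virtual fitting model and related apparatus |
| US20240362894A1 (en) | | Apparatus and method with image processing |
| CN115861605B (zh) | | Image data processing method, computer device, and readable storage medium |
| US20250336043A1 (en) | | Denoising neural networks with shared core sub-networks |
| CN117011416A (zh) | | Image processing method, apparatus, device, medium, and program product |
| HK40025785A (zh) | | Image processing method, apparatus, device, and computer-readable storage medium |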
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 21817967; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 2021817967; Country of ref document: EP; Effective date: 20220419 |
| | ENP | Entry into the national phase | Ref document number: 2022566432; Country of ref document: JP; Kind code of ref document: A |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | WWG | WIPO information: grant in national office | Ref document number: 2021817967; Country of ref document: EP |









