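WO2024217164A1 - Processing method and apparatus for video denoising model, computer device, and storage medium - Google Patents
Processing method and apparatus for video denoising model, computer device, and storage medium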
- Publication number
- WO2024217164A1 (PCT/CN2024/079883)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- video
- video frame
- downsampled
- features
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/222—Studio circuitry; Studio devices; Studio equipment
- H04N5/262—Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/60—Image enhancement or restoration using machine learning, e.g. neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/70—Denoising; Smoothing
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/30—Noise filtering
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Definitions
- the present application relates to the field of computer technology, and in particular to a processing method, device, computer equipment and storage medium for a video denoising model.
- video denoising technology has gradually become a research hotspot in the field of improving video quality.
- the video denoising model based on deep learning has obvious advantages in denoising effect and speed, and has broad application prospects.
- Existing single-frame video denoising models cannot fully account for the correlation and continuity of a video in the temporal dimension, and therefore cannot extract good features.
- Multi-frame video denoising models are likewise unable to extract good features when computing resources are limited, so existing video denoising models denoise video poorly.
- a processing method, apparatus, computer device, computer-readable storage medium, and computer program product for a video denoising model that can improve the video denoising effect are provided.
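- the present application provides a method for processing a video denoising model, which is executed by a computer device, and the method includes:
- acquiring a target video frame in a video frame sequence of a sample video, and acquiring a reference video corresponding to the sample video;
- extracting image detail features of the target video frame through a first branch of a video denoising model;
- downsampling the video frame sequence to obtain a downsampled video frame sequence, and extracting features from the downsampled video frame sequence through a second branch of the video denoising model to obtain an image fusion feature;
- generating a predicted video frame based on the image fusion feature and the image detail feature;
- adjusting the parameters in the video denoising model according to the loss value between the predicted video frame and the reference video frame to obtain a target video denoising model;
- the reference video frame is a video frame in the reference video corresponding to the target video frame;
- the target video denoising model is used to denoise the video to be denoised.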
- the present application also provides a processing device for a video denoising model.
- the device comprises:
- a video frame acquisition module used to acquire a target video frame in a video frame sequence of a sample video, and to acquire a reference video corresponding to the sample video;
- a detail feature extraction module used to extract image detail features of the target video frame through a first branch of a video denoising model;
- a fusion feature extraction module used for downsampling the video frame sequence to obtain a downsampled video frame sequence, and extracting features from the downsampled video frame sequence through the second branch of the video denoising model to obtain an image fusion feature;
- a prediction module used for generating a predicted video frame based on the image fusion feature and the image detail feature;
- a parameter adjustment module used to adjust the parameters in the video denoising model according to the loss value between the predicted video frame and the reference video frame to obtain a target video denoising model;
- the reference video frame is a video frame in the reference video corresponding to the target video frame;
- the target video denoising model is used to perform denoising on the video to be denoised.
- the present application further provides a computer device, wherein the computer device comprises a memory and a processor, wherein the memory stores computer-readable instructions, and when the processor executes the computer-readable instructions, the steps of the processing method of the video denoising model are implemented.
- the present application further provides a computer-readable storage medium having computer-readable instructions stored thereon, which implement the steps of the processing method of the video denoising model when executed by a processor.
- the present application further provides a computer program product, which includes computer-readable instructions, and when the computer-readable instructions are executed by a processor, the steps of the processing method of the video denoising model are implemented.
- FIG. 1 is a diagram of an application environment of a processing method of a video denoising model in one embodiment.
- FIG. 2a is a schematic flow chart of a method for processing a video denoising model in one embodiment.
- FIG. 2b is a schematic flow chart of a method for processing a video denoising model in another embodiment.
- FIG. 3 is a schematic diagram of denoising a noisy video frame in one embodiment.
- FIG. 4 is a schematic diagram of adding noise to a video frame in one embodiment.
- FIG. 5 is a schematic diagram of a real noise image in one embodiment.
- FIG. 6 is a schematic flow chart of an image fusion feature extraction step in one embodiment.
- FIG. 7 is a schematic flow chart of a video denoising step in one embodiment.
- FIG. 8 is a flow chart of a method for processing a video denoising model in another embodiment.
- FIG. 9 is a schematic diagram of sample data processing in one embodiment.
- FIG. 10 is a schematic diagram of a video denoising model structure in one embodiment.
- FIG. 11 is a schematic diagram of a noisy video frame in another embodiment.
- FIG. 12 is a schematic diagram of a denoised video frame in one embodiment.
- FIG. 13 is a structural block diagram of a processing device for a video denoising model in one embodiment.
- FIG. 14 is a structural block diagram of a processing device for a video denoising model in another embodiment.
- FIG. 15 is a diagram showing the internal structure of a computer device in one embodiment.
- FIG. 16 is a diagram showing the internal structure of a computer device in another embodiment.
- The terms "first", "second", and "third" are only used to distinguish similar objects and do not represent a specific order of the objects. It is understandable that the specific order or sequence of "first", "second", and "third" can be interchanged where permitted, so that the embodiments of the present application described herein can be implemented in an order other than that shown or described in the figures.
- The processing method of the video denoising model provided in the embodiment of the present application can be applied in the application environment shown in FIG. 1.
- the terminal 102 communicates with the server 104 through a network.
- the data storage system can store the data that the server 104 needs to process.
- the data storage system can be integrated on the server 104, or it can be placed on the cloud or other servers.
- the processing method of the video denoising model is executed by the terminal 102 or the server 104 alone, or by the terminal 102 and the server 104 in collaboration.
- For example, when the processing method of the video denoising model is executed by the terminal 102, the terminal 102 obtains a target video frame in a video frame sequence of a sample video, and obtains a reference video corresponding to the sample video; extracts image detail features of the target video frame through a first branch of the video denoising model; downsamples the video frame sequence to obtain a downsampled video frame sequence, and extracts features of the downsampled video frame sequence through a second branch of the video denoising model to obtain image fusion features; generates a predicted video frame based on the image fusion features and the image detail features; and adjusts parameters in the video denoising model according to a loss value between the predicted video frame and the reference video frame to obtain a target video denoising model. The reference video frame is a video frame in the reference video corresponding to the target video frame, and the target video denoising model is used to denoise the video to be denoised.
- the terminal 102 can be, but is not limited to, various desktop computers, laptops, smart phones, tablet computers, Internet of Things devices and portable wearable devices.
- the Internet of Things devices can be smart speakers, smart TVs, smart air conditioners, smart car-mounted devices, etc.
- Portable wearable devices can be smart watches, smart bracelets, head-mounted devices, etc.
- the server 104 can be an independent physical server, or a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms.
- the terminal 102 and the server 104 can be directly or indirectly connected via wired or wireless communications, and this application does not limit this.
- a method for processing a video denoising model is provided, which is described by taking the method applied to the computer device (terminal 102 or server 104) in FIG. 1 as an example, and includes the following steps:
- the sample video is the video data used to train the machine learning model.
- the sample video usually consists of multiple video frames, and each video frame contains information about the video content, such as color, shape, action, etc.
- the sample video can come from various sources, such as real-life recordings, simulated videos, videos on the Internet, etc.
- the sample video is a video with noise, and the reference video is a noise-free or extremely low-noise video corresponding to the sample video. It is usually used as the "real" or "ideal" state in the video denoising task.
- the reference video provides a target standard for evaluating the denoising effect and model performance.
- the sample videos in the embodiments of the present application include static videos with real noise and noisy dynamic videos.
- Static video refers to the video data generated when the camera is fixed and the subject is not moving. Since the camera is not moving, the real noise in a static video is usually caused by factors such as the noise of the camera itself, uneven lighting, and sensor noise. Therefore, static videos with real noise can better reflect the video noise encountered in actual applications. Dynamic video refers to the video data generated when the camera or the subject is moving.
- Noisy dynamic video refers to video obtained by adding noise to original video data in order to simulate the noise in actual application scenarios. The noisy dynamic video allows the robustness and performance of a video denoising algorithm or model to be better tested and evaluated.
- the reference video in the embodiments of the present application includes a clear static video obtained by smoothing the static video and a clear dynamic video without noise.
- the terminal extracts a video frame sequence from the sample video at a certain time interval, and obtains the current target video frame to be processed from the extracted video frame sequence.
- For example, if the video frame sequence extracted from the sample video by the terminal includes 10 video frames and the current target video frame to be processed is the second frame, the second video frame is obtained from the video frame sequence.
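The frame extraction described above can be sketched as follows. This is a minimal illustration only: the helper names `extract_frame_sequence` and `get_target_frame` are hypothetical, the sampling interval is expressed in frames rather than seconds (an assumption), and strings stand in for decoded image arrays.

```python
def extract_frame_sequence(frames, interval):
    """Keep every `interval`-th frame of a decoded video (hypothetical helper)."""
    if interval < 1:
        raise ValueError("interval must be >= 1")
    return frames[::interval]


def get_target_frame(sequence, index):
    """Return the frame at 1-based position `index` in the sequence."""
    return sequence[index - 1]


# Stand-in for 30 decoded frames; real frames would be image arrays.
video = [f"frame_{i}" for i in range(30)]
sequence = extract_frame_sequence(video, interval=3)  # 10 frames remain
target = get_target_frame(sequence, index=2)          # the second frame
```

With these stand-in values the sequence holds 10 frames and the target is the second of them, matching the example in the text.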
- the video denoising model refers to a computer vision model or algorithm used to remove noise from the video.
- Video noise is usually caused by factors such as imperfections in the acquisition equipment, interference in signal transmission, and compression algorithms. Therefore, in many video applications, such as video conferencing and video encoding, denoising is an important preprocessing step.
- the task of the video denoising model is to recover a video that is as clean as possible from the noisy input video, while retaining as much of the detail and quality of the input video as possible.
- the first branch of the video denoising model can specifically be a high-resolution branch, which is used to process the target video frame of the original resolution. It can be understood that the original resolution of the target video frame is high resolution. High resolution means that the resolution of the image reaches a specific resolution threshold. The resolution threshold can be set according to needs.
- the target video frame with high resolution usually carries more noise and richer detail information. By performing feature processing on the target video frame through the first branch of the video denoising model, richer image detail features can be obtained.
- Image detail features refer to the features of the detailed parts of an image, such as texture, edges, corners, etc. By extracting image detail features, noise and signals can be distinguished more accurately, and more detail information can be restored, thereby improving the quality and clarity of the image.
- the terminal inputs the target video frame into the first branch of the video denoising model, processes the target video frame through each network layer of the first branch, and obtains image detail features of the target video frame.
- the downsampled video frame sequence refers to the video frame sequence obtained by downsampling the video frame sequence of the sample video.
- downsampling refers to reducing the resolution of the image, thereby reducing the size of the image and the amount of detail information it contains. Downsampling is usually used to reduce the amount of calculation and memory usage, and to accelerate the training and inference of the model.
- the second branch of the video denoising model can specifically be a low-resolution branch, which is used to process the downsampled video frame sequence.
- the resolution of each downsampled video frame in the downsampled video frame sequence is low resolution.
- Low resolution means that the resolution of the image does not reach a specific resolution threshold.
- the resolution threshold can be set according to demand.
- the size of each downsampled video frame in the low-resolution downsampled video frame sequence is reduced or the detail information is reduced.
- Processing the downsampled video frame sequence through the second branch of the video denoising model can effectively reduce the amount of calculation and improve the operating efficiency of the model. At the same time, it can also enhance the generalization ability of the model, making it more suitable for processing videos of different resolutions.
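As an illustration of the downsampling described above, the following sketch reduces each frame by averaging non-overlapping blocks of pixels. The downsampling factor and the use of block averaging are assumptions made for this example; the text does not specify which downsampling operation is used.

```python
import numpy as np


def downsample_frame(frame, factor=2):
    """Reduce resolution by averaging non-overlapping factor x factor blocks."""
    h, w = frame.shape[:2]
    h2, w2 = h - h % factor, w - w % factor  # crop so the size divides evenly
    cropped = frame[:h2, :w2].astype(np.float64)
    # Split each spatial axis into (block index, offset inside block).
    blocks = cropped.reshape(h2 // factor, factor, w2 // factor, factor, -1)
    return blocks.mean(axis=(1, 3))          # output keeps a trailing channel axis


def downsample_sequence(frames, factor=2):
    """Apply the same downsampling to every frame of the sequence."""
    return [downsample_frame(f, factor) for f in frames]


rng = np.random.default_rng(0)
sequence = [rng.random((8, 12, 3)) for _ in range(5)]  # five small RGB frames
low_res = downsample_sequence(sequence, factor=2)
```

With a factor of 2, each downsampled frame holds a quarter of the original pixels, which is where the reduction in computation for the second branch comes from.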
- Image fusion features are feature representations obtained by fusing the features of at least two downsampled video frames in a downsampled video frame sequence. It is understandable that, for noisy video data, denoising with only one frame often gives poor results, because a single frame may contain too much noise and distortion and cannot provide sufficient information. In addition, the feature representation extracted from each individual downsampled video frame may suffer from information loss. Fusing the features of multiple downsampled video frames improves the expressiveness of the features and thereby the denoising effect of the model.
- After obtaining the video frame sequence, the terminal performs downsampling processing on each video frame in the video frame sequence to obtain a downsampled video frame sequence, and inputs the downsampled video frame sequence into the second branch of the video denoising model.
- Each sub-branch of the second branch processes a respective downsampled video frame in the downsampled video frame sequence to obtain image fusion features.
- the predicted video frame refers to a video frame generated by denoising the input video in the video denoising model.
- the terminal fuses the image fusion feature and the image detail feature to obtain the global image feature, and generates a predicted video frame based on the global image feature.
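The data flow of the two branches can be sketched as below. This is a toy illustration, not the network of the present application: the learned layers of both branches are replaced by fixed operations (a high-pass filter for the first branch and a temporal mean for the second), and all function names are hypothetical.

```python
import numpy as np


def box_blur(img):
    """3x3 mean filter with edge padding (fixed stand-in for learned layers)."""
    p = np.pad(img, 1, mode="edge")
    h, w = img.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0


def first_branch(target_frame):
    """High-resolution branch: keep the fine detail of the target frame only."""
    return target_frame - box_blur(target_frame)


def second_branch(low_res_frames):
    """Low-resolution branch: fuse all downsampled frames (here a temporal mean)."""
    return np.mean(np.stack(low_res_frames), axis=0)


def predict_frame(detail, fused, factor=2):
    """Upsample the fused features to full resolution and combine with detail."""
    upsampled = np.repeat(np.repeat(fused, factor, axis=0), factor, axis=1)
    return upsampled + detail


rng = np.random.default_rng(0)
frames = [rng.random((8, 8)) for _ in range(5)]  # grayscale frame sequence
low_res = [f.reshape(4, 2, 4, 2).mean(axis=(1, 3)) for f in frames]
predicted = predict_frame(first_branch(frames[1]), second_branch(low_res))
```

The point of the sketch is the shape of the computation: only the target frame is processed at full resolution, while the multi-frame fusion runs on the smaller downsampled frames.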
- the reference video frame is the video frame in the reference video corresponding to the target video frame.
- the loss value is used to evaluate the degree of difference between the predicted video frame obtained by the video denoising model after denoising the input video and the corresponding video frame in the reference video.
- the smaller the loss value, the smaller the difference between the result predicted by the model and the actual result, and the better the prediction accuracy and effect of the model.
- the target video denoising model is a trained machine learning model used to denoise the target video.
- After obtaining the predicted video frame, the terminal obtains the video frame corresponding to the target video frame from the reference video (this video frame can also be called the reference video frame), determines a loss value based on the predicted video frame and the corresponding reference video frame, and adjusts the parameters in the video denoising model based on the determined loss value, stopping the training when the convergence condition is met to obtain the target video denoising model.
- convergence means that the training process of the video denoising model has become stable, that is, the video denoising model has learned the characteristics of the data and no longer improves significantly.
- the convergence conditions include reaching a fixed number of training rounds, the loss function reaching a fixed threshold, etc. When the model meets such a condition, the training is stopped to avoid overfitting.
- the terminal adjusts the values of the weight parameters and bias parameters in the video denoising model based on the loss value to obtain the adjusted video denoising model, and re-executes step S202 until the training stops when the convergence condition is met to obtain the target video denoising model.
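The parameter adjustment and the two convergence conditions mentioned above can be sketched with a deliberately tiny model. Everything here is an assumption made for illustration: the "model" is a single weight and bias, the loss is mean squared error, and the update is plain gradient descent; the actual model, loss, and optimizer are not specified in this passage.

```python
import numpy as np


def train(noisy, reference, lr=0.1, max_rounds=2000, loss_threshold=1e-6):
    """Fit pred = w * noisy + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    loss = float("inf")
    for round_idx in range(1, max_rounds + 1):   # condition 1: fixed round budget
        pred = w * noisy + b
        err = pred - reference
        loss = float(np.mean(err ** 2))
        if loss < loss_threshold:                # condition 2: loss threshold
            break
        # Adjust the weight and bias parameters along the negative gradient.
        w -= lr * float(np.mean(2 * err * noisy))
        b -= lr * float(np.mean(2 * err))
    return w, b, loss, round_idx


rng = np.random.default_rng(0)
reference = rng.random(1000)
noisy = 2.0 * reference + 0.5   # synthetic, exactly invertible degradation
w, b, loss, rounds = train(noisy, reference)
```

Training stops at whichever condition is met first, which mirrors the description of stopping once the convergence condition is satisfied.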
- the terminal may determine the loss value using a formula whose symbols are defined as follows:
- L represents the loss value;
- I_LQ represents the video frame sequence in the sample video;
- T represents the number of video frames in the video frame sequence;
- F(I_LQ)_i represents the predicted video frame corresponding to the i-th video frame (the target video frame) in the video frame sequence.
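The symbols above describe a loss computed over the T frames of the sequence. As an illustration only, and assuming a mean absolute error between each predicted frame and its reference frame (this choice of distance is an assumption and is not taken from the present application), such a loss could be sketched as:

```python
import numpy as np


def sequence_loss(predicted_frames, reference_frames):
    """Average, over the T frames, of the per-frame mean absolute error."""
    if len(predicted_frames) != len(reference_frames):
        raise ValueError("sequences must have the same number of frames")
    T = len(predicted_frames)
    per_frame = [np.mean(np.abs(p - r))
                 for p, r in zip(predicted_frames, reference_frames)]
    return float(sum(per_frame) / T)


# Assumed toy data: three reference frames and three predictions.
reference = [np.zeros((4, 4)) for _ in range(3)]
predicted = [np.full((4, 4), 0.1), np.full((4, 4), 0.2), np.full((4, 4), 0.3)]
loss = sequence_loss(predicted, reference)
```

A smaller value indicates predicted frames that are closer to the reference frames, consistent with the interpretation of the loss value given earlier.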
- the terminal extracts the image detail features of the target video frame through the first branch of the video denoising model.
- features are extracted from the downsampled video frame sequence through the second branch of the video denoising model to obtain image fusion features, and a predicted video frame is generated based on the image fusion features and the image detail features.
- the parameters in the video denoising model can be adjusted according to the loss value between the predicted video frame and the video frame corresponding to the target video frame in the reference video, so as to obtain a target video denoising model with better denoising effect.
- the sample video includes a static video with real noise and a dynamic video with added noise;
- the reference video includes a clear static video obtained by smoothing a static video and a dynamic video without added noise.
- the static video also carries the added noise.
- the processing method of the above-mentioned video denoising model also includes the following steps: performing video capture on the static object to obtain the original static video carrying the real noise; performing noise addition processing on the original static video to obtain the static video, which carries both the added noise and the real noise; and performing smoothing processing on the original static video to obtain a clear static video.
- added noise is noise that is artificially added to the video.
- the types of noise include Gaussian noise, salt and pepper noise, pseudo-random noise, etc.
- static objects refer to objects that remain motionless.
- Smoothing is an image processing method, and its main purpose is to reduce the noise of the image. In video processing, smoothing can be applied to each frame of the video. By smoothing each frame, the video can be made smoother and more natural, and the noise can be reduced. Smoothing usually needs to be applied to each frame, so for video, smoothing can also be called time domain filtering.
- the terminal keeps the video acquisition device still and shoots a static object to obtain a static video, which is the original static video carrying real noise.
- a preset noise addition algorithm is used to perform noise addition processing on the original static video to obtain a static video, which carries both the added noise and the real noise.
- a preset smoothing algorithm is used to smooth the original static video to obtain a clear static video.
- the smoothing algorithms include Gaussian blur, median filtering, mean filtering, etc.
- Gaussian blur can reduce the noise of the image by taking the weighted average of the pixels around each pixel.
- Median filtering and mean filtering can reduce the noise of the image by calculating the median or average of the pixels around each pixel.
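The three smoothing filters named above can be compared on a single frame with the following sketch. The 3x3 window, the edge padding, and the 1-2-1 Gaussian weights are assumptions chosen to keep the example small.

```python
import numpy as np


def _neighborhoods(img, k=3):
    """Stack the k*k shifted copies of an edge-padded 2-D image."""
    r = k // 2
    p = np.pad(img, r, mode="edge")
    h, w = img.shape
    return np.stack([p[i:i + h, j:j + w] for i in range(k) for j in range(k)])


def mean_filter(img, k=3):
    """Replace each pixel by the average of its k x k neighborhood."""
    return _neighborhoods(img, k).mean(axis=0)


def median_filter(img, k=3):
    """Replace each pixel by the median of its k x k neighborhood."""
    return np.median(_neighborhoods(img, k), axis=0)


def gaussian_blur(img):
    """Weighted average of the 3x3 neighborhood with 1-2-1 binomial weights."""
    weights = np.outer([1, 2, 1], [1, 2, 1]).ravel() / 16.0
    return np.tensordot(weights, _neighborhoods(img, 3), axes=1)


frame = np.full((5, 5), 10.0)
frame[2, 2] = 100.0  # a single impulse ("salt") pixel
smoothed = {
    "mean": mean_filter(frame)[2, 2],
    "median": median_filter(frame)[2, 2],
    "gaussian": gaussian_blur(frame)[2, 2],
}
```

On this frame the median filter removes the impulse entirely, while the mean filter and the Gaussian blur only attenuate it, which is why median filtering is often preferred for salt and pepper noise.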
- the noise level of the clear static video is significantly lower than that of the original static video. Therefore, the clear static video can also be approximated as a video without noise, so that it can be used as a reference video without noise during model training.
- the process in which the terminal uses a preset smoothing algorithm to smooth the original static video and obtain a clear static video specifically includes the following steps: determining the frame difference between adjacent original static video frames in the original static video, determining the area where the frame difference reaches a frame difference threshold as the noise area in the corresponding original static video frame, and smoothing the noise area in each original static video frame to obtain a clear static video.
- the acquisition device may not be absolutely stable during video acquisition. There may be some very small jitters, and the flow of gas in the environment may cause slight movement of the static object, etc., resulting in the original static video being not absolutely static, but relatively static.
- in an absolutely static, noise-free video, the frame difference between adjacent video frames should be 0.
- FIG. 3 shows three adjacent noisy video frames.
- (a) in FIG. 3 is a schematic diagram of the frame difference between two adjacent noisy video frames; after smoothing the three noisy video frames, a clear video frame as shown in (c) in FIG. 3 is obtained.
- (d) in FIG. 3 is a schematic diagram of the frame difference between two adjacent clear video frames.
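The frame-difference procedure described above can be sketched as follows. The way the noise area is smoothed is an assumption made for this example: masked pixels are replaced by the temporal median over all frames, which is one possible form of time domain filtering. The function name and threshold value are hypothetical.

```python
import numpy as np


def smooth_static_video(frames, diff_threshold):
    """Replace pixels in high-frame-difference areas with the temporal median."""
    stack = np.stack(frames).astype(np.float64)      # shape (T, H, W), T >= 2
    temporal_median = np.median(stack, axis=0)
    cleaned = stack.copy()
    for t in range(len(stack)):
        neighbor = t - 1 if t > 0 else 1             # index of an adjacent frame
        frame_diff = np.abs(stack[t] - stack[neighbor])
        noise_area = frame_diff >= diff_threshold    # area treated as noise
        cleaned[t][noise_area] = temporal_median[noise_area]
    return cleaned


rng = np.random.default_rng(0)
scene = rng.random((6, 6))                           # the unchanging static scene
noisy = [scene + rng.normal(0.0, 0.05, scene.shape) for _ in range(9)]
noisy[4][3, 3] += 1.0                                # one strong noise spike
clean = smooth_static_video(noisy, diff_threshold=0.5)
```

Because the scene is static, any large difference between adjacent frames can be attributed to noise, so the spike in the fifth frame is detected and replaced while the rest of the frame is left untouched.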
- the terminal acquires the original static video with real noise by collecting the video of the static object.
- noise is added to the original static video to obtain a static video.
- the static video carries both the added noise and the real noise.
- the original static video is smoothed to obtain a clear static video.
- therefore, the static video containing the real noise can be used as the sample video and the clear static video can be used as the reference video to train the video denoising model, which better simulates the noise in real scenes and improves the denoising effect of the target video denoising model.
- the process in which the terminal performs noise addition processing on the original static video to obtain the static video specifically includes the following steps: obtaining partial pixels from each noisy video frame of the original static video; generating corresponding first pixel images according to the partial pixels of each noisy video frame; generating first initial noise images corresponding to each noisy video frame; fusing the first initial noise images with the first pixel images respectively to obtain first noise images corresponding to each noisy video frame; and fusing each first noise image into the corresponding noisy video frame to obtain a static video.
- partial pixels refer to a subset of the pixels in the noisy video frame, which can be randomly selected from the noisy video frame.
- the first pixel image is used to describe the distribution of the partial pixels. Specifically, the grayscale value at each position in the first pixel image that corresponds to one of the partial pixels is 1, which means that noise is added at that position. The grayscale value at every other position is 0, which means that no noise is added at that position.
- After obtaining the original static video, the terminal obtains each noisy video frame from it. For any noisy video frame, the terminal randomly selects partial pixels from the frame and generates a first pixel image with the same size as the frame, in which the grayscale value is 1 at the positions of the selected pixels and 0 at all other positions. The terminal then uses a preset noise generation algorithm to generate a first initial noise image, multiplies the first pixel image element-wise with the first initial noise image to obtain a first noise image, and fuses the first noise image into the noisy video frame to obtain the corresponding noisy static video frame. It can be understood that this noise addition processing is performed on each noisy video frame in the original static video to obtain the noisy static video.
- the preset noise generation algorithm may be a random distribution algorithm, such as a Gaussian distribution algorithm.
- the Gaussian distribution algorithm is used to process the corresponding noisy video frame to obtain a first initial noise image.
- the terminal fuses the first noise image into the corresponding noisy video frame, and specifically can use pixel-by-pixel weighted averaging to achieve image fusion, which specifically includes the following steps: obtaining a first weight corresponding to the first noise image and a second weight corresponding to the noisy video frame, determining a weighted pixel value corresponding to each target pixel based on the first weight and the pixel value of each pixel in the first noise image, and the second weight and the pixel value of each pixel in the noisy video frame, and generating a noisy static video frame based on the weighted pixel value of each target pixel.
- the target pixel refers to a pixel in the noisy static video frame.
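The noise addition steps above can be sketched as follows. The function name, the fraction of selected pixels, the noise standard deviation, and the weight values are all assumptions for this example. The fusion is written as an unnormalized weighted sum of pixel values, since the text does not state whether the two weights are normalized.

```python
import numpy as np


def add_masked_noise(frame, pixel_fraction, noise_std,
                     noise_weight, frame_weight, rng):
    """Add Gaussian noise only at randomly selected pixel positions."""
    # First pixel image: 1 where noise is added, 0 elsewhere.
    pixel_image = (rng.random(frame.shape) < pixel_fraction).astype(np.float64)
    # First initial noise image drawn from a Gaussian distribution.
    initial_noise = rng.normal(0.0, noise_std, frame.shape)
    # First noise image: element-wise product of the mask and the noise.
    noise_image = pixel_image * initial_noise
    # Pixel-by-pixel weighted fusion of the noise image and the frame.
    noisy = frame_weight * frame + noise_weight * noise_image
    return noisy, pixel_image


rng = np.random.default_rng(0)
frame = np.full((16, 16), 0.5)
noisy, mask = add_masked_noise(frame, pixel_fraction=0.3, noise_std=0.1,
                               noise_weight=1.0, frame_weight=1.0, rng=rng)
```

Pixels where the mask is 0 keep their original value, so the noise is unevenly distributed over the frame; changing the mask or the weights while keeping the same initial noise produces a different noisy frame.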
- the first row in FIG. 4 shows a traditional noise adding method, which first randomly generates a noise image and then directly fuses the noise image onto the image to be noised (clean image) to obtain a corresponding noisy image. It can be seen from the noisy image that the noise is evenly added to the clean image. However, as shown in FIG. 5, in a real image the noise (the dots in the figure represent the noise) is not evenly distributed over the pixel positions.
- the noise adding method used in the embodiment of the present application is shown in the second row or the third row in FIG. 4. It first randomly selects some pixels from the image to be noised (clean image) and forms a pixel image based on the selected pixels.
- the pixel image is then fused with the corresponding noise image to obtain the noise image used for noise addition, wherein the pixel image is a matrix composed only of 0s and 1s with the same length and width as the image to be noised; 0 indicates that no noise is added at this pixel position, and 1 indicates that noise is added at this pixel position.
- the images to be noised (clean images) in the second and third rows of FIG. 4 are the same, and the randomly generated noise images are also the same, but the generated pixel images are different and the noise adding coefficients used are also different, so the resulting noisy images are different. The noise adding coefficient can be determined from the weight corresponding to the noise image and the weight corresponding to the clean image.
- the terminal obtains partial pixels from each noisy video frame of the original static video; generates corresponding first pixel images according to the partial pixels of each noisy video frame; generates first initial noise images corresponding to each noisy video frame; fuses the first initial noise images with the first pixel images respectively to obtain first noise images corresponding to each noisy video frame; and fuses each first noise image into the corresponding noisy video frame to obtain a static video, so that the obtained static video can more accurately simulate the distribution of noise in the actual image, and also increase the diversity of noise.
- the static video is used to train the video denoising model, which can further improve the denoising effect of the video denoising model.
- the processing method of the video denoising model further includes the following steps: obtaining a non-noised dynamic video from a video database; and performing a noise processing on the non-noised dynamic video to obtain a noisy dynamic video.
- dynamic videos contain moving and changing content, such as people walking, vehicles driving, etc. Such videos can show the movement and changes of dynamic objects from multiple angles.
- the video database can be a public video data set, for example a clear video data set such as REDS or DAVIS.
- the video database can also be a clear video library obtained by denoising self-captured videos. It should be noted that clarity in the embodiment of the present application can be treated as approximately noise-free, that is, a clear video refers to a noise-free video.
- the terminal can directly obtain a clear dynamic video from the video database, that is, a dynamic video without noise, and use a preset noise adding algorithm to perform noise adding processing on the obtained dynamic video to obtain a noisy dynamic video.
- the terminal obtains unnoised dynamic video from the video database, performs noise processing on the unnoised dynamic video, and obtains noisy dynamic video, so that the noisy dynamic video can be used as a sample video, and the unnoised dynamic video can be used as a reference video to train the video denoising model, which can better simulate the noise situation in the real scene, thereby improving the denoising effect of the target video denoising model.
- the video frames in the unnoised dynamic video are clear video frames, and the process in which the terminal performs noise processing on the unnoised dynamic video to obtain a noisy dynamic video includes the following steps: selecting part of the pixels from each clear video frame; generating corresponding second pixel images according to the part of the pixels of each clear video frame; generating second initial noise images corresponding to each clear video frame; fusing each second initial noise image with the corresponding second pixel image to obtain a second noise image corresponding to each clear video frame; fusing each second noise image into the corresponding clear video frame to obtain a noisy dynamic video.
- partial pixels refer to partial pixels in a clear video frame, which can be randomly selected from a clear video frame.
- the second pixel image is used to describe the distribution of partial pixels.
- the grayscale value at the position corresponding to the partial pixels in the second pixel image is 1, and 1 means that noise is added at the position corresponding to this pixel.
- the grayscale value at other positions other than partial pixels is 0, and 0 means that no noise is added at the position corresponding to this pixel.
- after obtaining the unnoised dynamic video, the terminal obtains each clear video frame from the unnoised dynamic video, and for any clear video frame, randomly selects some pixels from the clear video frame and generates a second pixel image of the same size as the clear video frame based on the selected pixels, wherein the grayscale values at the positions of the selected pixels in the second pixel image can be 1 and the grayscale values at all other positions can be 0; a preset noise generation algorithm is used to generate a second initial noise image, the second pixel image is multiplied element by element with the second initial noise image to obtain a second noise image, and the second noise image is fused into the clear video frame to obtain a noisy dynamic video frame.
- the above noise addition processing is performed on each clear video frame in the unnoised dynamic video to obtain a noisy dynamic video.
- the preset noise generation algorithm may be a random distribution algorithm, such as a Gaussian distribution algorithm, etc.
- a Gaussian distribution algorithm is used to process the corresponding clear video frame to obtain a second initial noise image.
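The generation of the second pixel image and the second noise image described above can be sketched as follows. This is only a sketch under stated assumptions: the function names, the selection fraction `fraction`, the standard deviation `sigma`, and the choice of selecting each pixel independently at random are all hypothetical and are not specified in the source.

```python
import numpy as np

def make_pixel_image(shape, fraction, rng):
    """Binary pixel image: 1 where noise is added, 0 where it is not."""
    # Assumption: each pixel is selected independently with probability `fraction`.
    return (rng.random(shape) < fraction).astype(np.float64)

def make_masked_noise(frame, fraction=0.3, sigma=10.0, rng=None):
    """Return (pixel image, noise image) for one clear video frame."""
    rng = np.random.default_rng() if rng is None else rng
    frame = np.asarray(frame, dtype=np.float64)
    pixel_image = make_pixel_image(frame.shape, fraction, rng)
    # Initial noise image drawn from a Gaussian distribution.
    initial_noise = rng.normal(0.0, sigma, size=frame.shape)
    # Element-by-element product keeps noise only at the selected pixels.
    noise_image = pixel_image * initial_noise
    return pixel_image, noise_image

# Hypothetical clear video made of three constant 4x4 frames.
rng = np.random.default_rng(0)
clear_video = [np.full((4, 4), 128.0) for _ in range(3)]
results = [make_masked_noise(f, rng=rng) for f in clear_video]
```

Because a fresh pixel image is drawn for every frame, the noise positions differ from frame to frame even when the clear frames are identical.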
- the terminal fuses the second noise image into the corresponding clear video frame, and specifically can use pixel-by-pixel weighted averaging to achieve image fusion, which specifically includes the following steps: obtaining the third weight corresponding to the second noise image and the fourth weight corresponding to the clear video frame, determining the weighted pixel value corresponding to each target pixel based on the third weight and the pixel value of each pixel in the second noise image, and the fourth weight and the pixel value of each pixel in the clear video frame, and generating a noisy dynamic video frame based on the weighted pixel value of each target pixel.
- the target pixel refers to the pixel in the noisy dynamic video frame.
- the terminal selects part of pixels from each clear video frame; generates corresponding second pixel images according to part of pixels of each clear video frame; generates second initial noise images corresponding to each clear video frame; fuses each second initial noise image with the corresponding second pixel image to obtain a second noise image corresponding to each clear video frame; fuses each second noise image into the corresponding clear video frame to obtain a noisy dynamic video, so that the obtained noisy dynamic video can more accurately simulate the distribution of noise in the actual image, and also increase the diversity of noise.
- the noisy dynamic video is used to train the video denoising model, which can further improve the denoising effect of the video denoising model.
- the second branch includes an optical flow network, a target frame sub-branch and other frame sub-branches.
- the terminal extracts features from the downsampled video frame sequence through the second branch of the video denoising model, and the process of obtaining the image fusion features specifically includes the following steps:
- the optical flow network is a neural network model used to estimate optical flow information, which can be specifically the optical flow network SpyNet; optical flow information refers to the information about pixel position changes between adjacent video frames. It can be understood that in the video, there may be movement of objects or cameras between adjacent video frames, and these movements cause the pixel positions between adjacent frames to be different, and the optical flow information is the information used to describe the pixel position changes between adjacent frames.
- the optical flow information in the embodiment of the present application may include the optical flow information between the downsampled target video frame and the corresponding adjacent downsampled video frame in the downsampled video frame sequence, and may also include the optical flow information between any two adjacent downsampled video frames in the downsampled video frame sequence.
- the optical flow information may also be referred to as an optical flow vector, which may represent the pixel displacement between adjacent video frames and may be used for subsequent frame alignment and feature fusion.
- a downsampled video frame sequence refers to a video frame sequence obtained after downsampling each video frame in a video frame sequence, and may specifically include a downsampled target video frame and a downsampled continuous video frame, wherein the downsampled continuous video frame includes at least one of a downsampled preceding video frame and a downsampled succeeding video frame.
- for example, the downsampled video frame sequence includes 5 downsampled video frames.
- if the downsampled target video frame is the 3rd frame in the downsampled video frame sequence, the other downsampled video frames are downsampled continuous video frames, wherein the 1st frame and the 2nd frame are the downsampled preceding video frames, and the 4th frame and the 5th frame are the downsampled succeeding video frames.
- if the downsampled target video frame is the 1st frame in the downsampled video frame sequence, the 2nd frame to the 5th frame are the downsampled succeeding video frames of the downsampled target video frame.
- if the downsampled target video frame is the 5th frame in the downsampled video frame sequence, the 1st frame to the 4th frame are the downsampled preceding video frames of the downsampled target video frame.
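The three cases in this example can be reproduced with a small helper. The function name `split_sequence` and the use of a 0-based index are illustrative assumptions; the frames are represented here simply by their 1-based frame numbers.

```python
def split_sequence(frames, target_index):
    """Split a frame sequence around the target frame.

    target_index is 0-based, so the "3rd frame" corresponds to index 2.
    Returns (preceding frames, target frame, succeeding frames).
    """
    if not 0 <= target_index < len(frames):
        raise IndexError("target_index out of range")
    preceding = frames[:target_index]
    target = frames[target_index]
    succeeding = frames[target_index + 1:]
    return preceding, target, succeeding

# Five downsampled frames, labelled by frame number.
frames = [1, 2, 3, 4, 5]
middle = split_sequence(frames, 2)
first = split_sequence(frames, 0)
last = split_sequence(frames, 4)
```

When the target is the first or last frame, one of the two lists is empty, which matches the "at least one of" wording used for the downsampled continuous video frames.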
- after obtaining the downsampled video frame sequence, the terminal inputs each downsampled video frame in the downsampled video frame sequence into the optical flow network, and determines the optical flow information between any two adjacent downsampled video frames in the downsampled video frame sequence through the optical flow network, thereby obtaining the optical flow information between the downsampled target video frame and the corresponding adjacent downsampled video frames.
- the adjacent downsampled video frames include at least one of a downsampled preceding video frame and a downsampled succeeding video frame;
- the optical flow information includes at least one of first optical flow information and second optical flow information, and when the downsampled continuous video frames include the downsampled preceding video frame, the terminal determines the first optical flow information between the adjacent first downsampled video frames through the optical flow network; when the downsampled continuous video frames include the downsampled succeeding video frame, the terminal determines the second optical flow information between the adjacent second downsampled video frames through the optical flow network;
- the first optical flow information is the optical flow information between adjacent first downsampled video frames, and the second optical flow information is the optical flow information between adjacent second downsampled video frames.
- the first downsampled video frames are the downsampled video frames from the downsampled preceding video frames up to and including the downsampled target video frame.
- the second downsampled video frames are the downsampled video frames from the downsampled target video frame through the downsampled succeeding video frames.
- for example, the downsampled video frame sequence includes 5 downsampled video frames, and the downsampled target video frame is the 3rd frame in the downsampled video frame sequence.
- in this case, the 1st frame and the 2nd frame are downsampled preceding video frames, and the 4th frame and the 5th frame are downsampled succeeding video frames.
- the first downsampled video frames are the 1st frame, the 2nd frame and the 3rd frame in the downsampled video frame sequence, and the first optical flow information includes the optical flow information from the 1st frame to the 2nd frame and the optical flow information from the 2nd frame to the 3rd frame.
- the second downsampled video frames are the 3rd frame, the 4th frame and the 5th frame in the downsampled video frame sequence, and the second optical flow information includes the optical flow information from the 5th frame to the 4th frame, and the optical flow information from the 4th frame to the 3rd frame.
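The frame pairs for which optical flow is estimated can be enumerated as below. This sketch assumes that flow is computed for every adjacent pair on each side of the target, pointing toward the target frame, which is how the first and second optical flow information are read here; the function name `flow_pairs` is hypothetical.

```python
def flow_pairs(num_frames, target):
    """List (source, destination) frame pairs, using 1-based frame numbers.

    first_pairs cover the preceding frames up to the target;
    second_pairs cover the succeeding frames back to the target.
    """
    if not 1 <= target <= num_frames:
        raise ValueError("target must be a valid 1-based frame number")
    first_pairs = [(i, i + 1) for i in range(1, target)]
    second_pairs = [(i, i - 1) for i in range(num_frames, target, -1)]
    return first_pairs, second_pairs

first_pairs, second_pairs = flow_pairs(5, 3)
```

With 5 frames and the 3rd frame as the target, this yields the pairs 1 to 2 and 2 to 3 on the preceding side, and 5 to 4 and 4 to 3 on the succeeding side.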
- when the downsampled continuous video frames include downsampled preceding video frames, the terminal inputs each downsampled preceding video frame and the downsampled target video frame in the downsampled video frame sequence into the optical flow network, and determines, through the optical flow network, the optical flow information between any two adjacent downsampled video frames among the downsampled preceding video frames and the downsampled target video frame, that is, the optical flow information between adjacent first downsampled video frames, and determines this optical flow information as the first optical flow information between the downsampled target video frame and the corresponding adjacent downsampled video frame; when the downsampled continuous video frames include downsampled succeeding video frames, the terminal inputs each downsampled succeeding video frame and the downsampled target video frame in the downsampled video frame sequence into the optical flow network, and determines, through the optical flow network, the optical flow information between any two adjacent downsampled video frames among the downsampled succeeding video frames and the downsampled target video frame, that is, the optical flow information between adjacent second downsampled video frames.
- other frame sub-branches are used to perform feature extraction on downsampled video frames other than the downsampled target video frames in the downsampled video frame sequence to obtain continuous video frame features corresponding to the downsampled target video frames.
- the other frame sub-branches include at least one of the preceding frame sub-branch and the succeeding frame sub-branch.
- the preceding frame sub-branch is used to perform feature extraction on the downsampled preceding video frame to obtain the preceding video frame features
- the succeeding frame sub-branch is used to perform feature extraction on the downsampled succeeding video frame to obtain the succeeding video frame features.
- the terminal inputs the downsampled continuous video frames in the downsampled video frame sequence into the other frame sub-branches, performs feature extraction on the input downsampled continuous video frames through the other frame sub-branches, and obtains continuous video frame features corresponding to the downsampled target video frame.
- adjacent downsampled video frames include at least one of a downsampled preceding video frame and a downsampled succeeding video frame; continuous video frame features include at least one of a preceding video frame feature and a succeeding video frame feature; when the downsampled continuous video frames include the downsampled preceding video frames, the terminal extracts features of the downsampled preceding video frames through a forward network layer of a preceding frame sub-branch to obtain the preceding video frame features; when the downsampled continuous video frames include the downsampled succeeding video frames, the terminal extracts features of the downsampled succeeding video frames through a backward network layer of a succeeding frame sub-branch to obtain the succeeding video frame features.
- the forward network layer refers to the forward U-type network
- the backward network layer refers to the backward U-type network.
- the forward U-type network is a U-type network used to extract features from the downsampled preceding video frames
- the backward U-type network is a U-type network used to extract features from the downsampled subsequent video frames.
- the U-type network is a convolutional neural network structure used for image processing tasks, which consists of a downsampling module and an upsampling module, and usually there are some convolutional layers and pooling layers in the middle.
- when the downsampled continuous video frames include downsampled preceding video frames, the terminal inputs each downsampled preceding video frame in the downsampled video frame sequence into the preceding frame sub-branch, and extracts features of each downsampled preceding video frame through the forward network layer of the preceding frame sub-branch to obtain the preceding video frame features; when the downsampled continuous video frames include downsampled succeeding video frames, the terminal inputs each downsampled succeeding video frame in the downsampled video frame sequence into the succeeding frame sub-branch, and extracts features of each downsampled succeeding video frame through the backward network layer of the succeeding frame sub-branch to obtain the succeeding video frame features.
- in the video frame sequence, there is usually a correlation between the preceding and succeeding frames.
- for example, the downsampled video frame sequence includes 5 downsampled video frames, and the downsampled target video frame is the 3rd frame in the downsampled video frame sequence.
- in this case, the preceding frame sub-branch 1 is used to perform feature extraction on the 1st downsampled video frame, and the preceding frame sub-branch 2 is used to perform feature extraction on the 2nd downsampled video frame.
- the succeeding frame sub-branch 3 is used to perform feature extraction on the 4th downsampled video frame, and the succeeding frame sub-branch 4 is used to perform feature extraction on the 5th downsampled video frame.
- S606 Align the continuous video frame features with the downsampled target video frame based on the optical flow information to obtain aligned video frame features.
- alignment refers to matching the features of continuous video frames with the content of the downsampled target video frames. It can be understood that in a video frame sequence, there is a certain motion relationship between adjacent video frames. Through optical flow information, the downsampled target video frames can be aligned with the corresponding continuous video frame features. In this way, in subsequent processing, they can be regarded as video frames and video frame features at the same moment, thereby improving the accuracy of the model.
- adjacent downsampled video frames include at least one of a downsampled preceding video frame and a downsampled succeeding video frame;
- the optical flow information includes at least one of first optical flow information and second optical flow information;
- the continuous video frame features include at least one of the preceding video frame features and the succeeding video frame features;
- the aligned video frame features include at least one of the preceding aligned video frame features and the succeeding aligned video frame features;
- the terminal extracts a feature vector of a preset position from the preceding video frame features, determines the target position corresponding to the preset position in the downsampled target video frame based on the first optical flow information and the extracted feature vector, and aligns the preceding video frame features with the features of the downsampled target video frame based on the feature vector of the preset position and the corresponding target position in the downsampled target video frame using an interpolation method to obtain the preceding aligned video frame features;
- the terminal extracts a feature vector of a preset position from the succeeding video frame features, determines the target position corresponding to the preset position in the downsampled target video frame based on the second optical flow information and the extracted feature vector, and aligns the succeeding video frame features with the features of the downsampled target video frame based on the feature vector of the preset position and the corresponding target position in the downsampled target video frame using an interpolation method to obtain the succeeding aligned video frame features.
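Flow-based alignment with interpolation can be illustrated with a small warping routine. This is a sketch under assumptions rather than the method of the embodiment: it uses backward warping with bilinear interpolation and border clamping, which is one common way to align features with optical flow, and the function name `warp_features` and the flow layout (channel 0 horizontal, channel 1 vertical) are hypothetical.

```python
import numpy as np

def warp_features(features, flow):
    """Warp a (C, H, W) feature map toward the target frame.

    flow has shape (2, H, W): flow[0] is the horizontal and flow[1] the
    vertical offset (in pixels) from each target position to the position
    that should be sampled in `features`.
    """
    _, h, w = features.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Sampling coordinates, clamped to the feature map borders.
    sx = np.clip(xs + flow[0], 0, w - 1)
    sy = np.clip(ys + flow[1], 0, h - 1)
    x0 = np.floor(sx).astype(int)
    y0 = np.floor(sy).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = sx - x0
    wy = sy - y0
    # Bilinear interpolation between the four neighbouring feature vectors.
    top = features[:, y0, x0] * (1 - wx) + features[:, y0, x1] * wx
    bottom = features[:, y1, x0] * (1 - wx) + features[:, y1, x1] * wx
    return top * (1 - wy) + bottom * wy

# Hypothetical single-channel 3x4 feature map shifted by one pixel.
feat = np.arange(12, dtype=np.float64).reshape(1, 3, 4)
shift = np.zeros((2, 3, 4))
shift[0] = 1.0
aligned = warp_features(feat, shift)
```

A zero flow field leaves the features unchanged, and fractional offsets blend neighbouring feature vectors, which is where the interpolation step comes in.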
- the target sub-branch is used to perform feature processing on the downsampled target video frame in the downsampled video frame sequence to obtain image fusion features corresponding to the downsampled target video frame.
- the terminal inputs the aligned video frame features into the target sub-branch, performs feature processing on the aligned video frame features through the target sub-branch, and obtains image fusion features.
- when the downsampled continuous video frames include the downsampled preceding video frames, the terminal processes the preceding aligned video frame features through the forward network layer of the target sub-branch to obtain the preceding image fusion features; when the downsampled continuous video frames include the downsampled succeeding video frames, the terminal processes the succeeding aligned video frame features through the backward network layer of the target sub-branch to obtain the succeeding image fusion features; and the image fusion features are determined based on at least one of the preceding image fusion features and the succeeding image fusion features.
- the forward network layer refers to the forward U-type network
- the backward network layer refers to the backward U-type network
- the forward U-type network of the target sub-branch is a U-type network for feature processing of the preceding aligned video frame features
- the backward U-type network of the target sub-branch is a U-type network for feature processing of the succeeding aligned video frame features.
- the U-type network is a convolutional neural network structure for image processing tasks, which consists of a downsampling module and an upsampling module, and usually there are some convolutional layers and pooling layers in the middle.
- when the downsampled continuous video frames include the downsampled preceding video frames, the terminal inputs the preceding aligned video frame features into the forward network layer of the target sub-branch, and performs feature processing on the preceding aligned video frame features through the forward network layer of the target sub-branch to obtain the preceding image fusion features; when the downsampled continuous video frames include the downsampled succeeding video frames, the terminal inputs the succeeding aligned video frame features into the backward network layer of the target sub-branch, and performs feature processing on the succeeding aligned video frame features through the backward network layer of the target sub-branch to obtain the succeeding image fusion features; when the downsampled continuous video frames only include the downsampled preceding video frames, the preceding image fusion features are directly determined as the image fusion features; when the downsampled continuous video frames only include the downsampled succeeding video frames, the succeeding image fusion features are directly determined as the image fusion features; when the downsampled continuous video frames include both the downsampled preceding video frames and the downsampled succeeding video frames, the image fusion features are determined based on the preceding image fusion features and the succeeding image fusion features.
- the terminal determines the optical flow information between the downsampled target video frame and the corresponding adjacent downsampled video frame in the downsampled video frame sequence through the optical flow network of the second branch, and performs feature processing on the downsampled video frame sequence through other frame sub-branches of the second branch to obtain continuous video frame features corresponding to the downsampled target video frame, so that the continuous frame information and optical flow information in the video sequence can be used to better understand the movement and changes in the video, thereby obtaining an accurate video feature representation.
- the process of the terminal determining the image fusion feature based on the preceding image fusion feature and the succeeding image fusion feature specifically includes the following steps: splicing the preceding image fusion feature and the succeeding image fusion feature to obtain the spliced image feature, and performing convolution processing on the spliced image feature to obtain the image fusion feature.
- the terminal splices the preceding image fusion feature and the succeeding image fusion feature to obtain the spliced image feature, and inputs the spliced image feature into the convolution layer of the target sub-branch.
- the spliced image feature is convolved through the convolution layer to obtain higher-level feature information, which is the image fusion feature.
- by splicing the preceding image fusion feature and the succeeding image fusion feature, the terminal can effectively fuse the information of the preceding and succeeding video frames and make full use of the correlation between consecutive frames, so as to obtain an accurate video feature representation.
- convolution processing is performed on the spliced image features to further extract and enhance the features, so as to obtain more accurate image fusion features, and then based on the image fusion features, subsequent image reconstruction can be made more accurate, thereby improving the denoising effect of the target video denoising model.
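Splicing followed by convolution can be sketched as a channel-wise concatenation and a 1x1 convolution. The kernel size, the kernel values, and the function name `splice_and_convolve` are assumptions for illustration; the source only states that the spliced features pass through a convolution layer.

```python
import numpy as np

def splice_and_convolve(preceding_fused, succeeding_fused, kernel, bias):
    """Concatenate two (C, H, W) feature maps and apply a 1x1 convolution.

    kernel has shape (C_out, 2 * C) and bias has shape (C_out,).
    """
    # Splicing: stack the two feature maps along the channel axis.
    spliced = np.concatenate([preceding_fused, succeeding_fused], axis=0)
    # 1x1 convolution: a linear mix of channels at every spatial position.
    return np.einsum("oc,chw->ohw", kernel, spliced) + bias[:, None, None]

# Hypothetical features: 2 channels, 2x2 spatial size.
preceding_fused = np.ones((2, 2, 2))
succeeding_fused = 3.0 * np.ones((2, 2, 2))
kernel = np.full((2, 4), 0.25)   # averages the four spliced channels
bias = np.zeros(2)
image_fusion = splice_and_convolve(preceding_fused, succeeding_fused, kernel, bias)
```

In a trained model the kernel would be learned, so the convolution decides how much each preceding and succeeding channel contributes to the final image fusion feature.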
- the process of generating a predicted video frame based on the image fusion feature and the image detail feature by the terminal specifically includes the following steps: fusing the image fusion feature with the image detail feature to obtain a global image feature; reconstructing the image based on the global image feature to obtain a predicted video frame.
- after obtaining the image fusion feature and the image detail feature, the terminal obtains a first fusion coefficient corresponding to the image fusion feature and a second fusion coefficient corresponding to the image detail feature, fuses the image fusion feature and the image detail feature based on the first fusion coefficient and the second fusion coefficient to obtain a global image feature, and performs a deconvolution operation on the global image feature to obtain a predicted video frame of the same size as the target video frame.
- the deconvolution operation is used to gradually enlarge the global image features to the original size to obtain a predicted video frame of the same size as the target video frame.
- the terminal obtains the global image feature by fusing the image fusion feature with the image detail feature, and can comprehensively utilize the information of the image fusion feature and the image detail feature to more comprehensively describe the image content of the target video frame, thereby reconstructing the image based on the global image feature to obtain the predicted video frame, and can also have a better denoising effect, thereby improving the denoising effect of the target video denoising model.
- the terminal fuses the image fusion feature with the image detail feature to obtain the global image feature, and the process specifically includes the following steps: upsampling the image fusion feature to obtain the upsampled image fusion feature; fusing the upsampled image fusion feature with the image detail feature to obtain the global image feature.
- after obtaining the image fusion feature, the terminal performs a deconvolution operation on the image fusion feature to obtain an upsampled image fusion feature, obtains a first fusion coefficient corresponding to the upsampled image fusion feature and a second fusion coefficient corresponding to the image detail feature, and fuses the upsampled image fusion feature with the image detail feature based on the first fusion coefficient and the second fusion coefficient to obtain a global image feature.
- the upsampled image fusion feature and the image detail feature may be fused by weighting based on the first fusion coefficient and the second fusion coefficient.
- the terminal upsamples the image fusion features to obtain upsampled image fusion features with the same resolution as the target video frame, and fuses the upsampled image fusion features with the image detail features to obtain global image features.
- the respective advantages of the two features can be fully utilized to further improve the expression ability of the global image features, thereby improving the denoising effect of the target video denoising model.
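The upsample-then-fuse step can be sketched as below. Note the substitution: the source upsamples with a deconvolution operation, whereas this sketch uses nearest-neighbour repetition as a simple stand-in so that it stays self-contained. The function name `fuse_global` and the coefficient values are hypothetical.

```python
import numpy as np

def fuse_global(fusion_feat, detail_feat, alpha, beta):
    """Upsample a low-resolution (C, h, w) feature and fuse it with a
    high-resolution (C, H, W) detail feature.

    alpha and beta play the roles of the first and second fusion coefficients.
    """
    scale_h = detail_feat.shape[1] // fusion_feat.shape[1]
    scale_w = detail_feat.shape[2] // fusion_feat.shape[2]
    # Stand-in for deconvolution: repeat each value along height and width.
    upsampled = fusion_feat.repeat(scale_h, axis=1).repeat(scale_w, axis=2)
    if upsampled.shape != detail_feat.shape:
        raise ValueError("detail size must be an integer multiple of fusion size")
    # Weighted fusion into the global image feature.
    return alpha * upsampled + beta * detail_feat

fusion_feat = np.array([[[1.0, 2.0], [3.0, 4.0]]])   # shape (1, 2, 2)
detail_feat = np.full((1, 4, 4), 10.0)               # shape (1, 4, 4)
global_feat = fuse_global(fusion_feat, detail_feat, alpha=0.5, beta=0.5)
```

The two coefficients control the balance between the temporally fused low-resolution information and the high-resolution detail information.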
- the terminal may also use the target video denoising model to perform denoising processing on a video to be denoised.
- the process specifically includes the following steps:
- S702 Determine a current video frame to be denoised in a sequence of video frames to be denoised of the video to be denoised.
- the terminal obtains the video to be denoised, extracts a sequence of video frames to be denoised from the video to be denoised, and determines the current video frame to be denoised from the sequence of video frames to be denoised.
- for example, the sequence of video frames to be denoised extracted by the terminal from the video to be denoised contains 10 video frames; if the current video frame to be denoised is the second frame, the terminal obtains the second frame from the sequence of video frames to be denoised.
- the target video denoising model refers to a trained video denoising model obtained by training the video denoising model, and the first branch of the target video denoising model may specifically be a high-resolution branch, which is used to process the current video frame to be denoised at the original resolution.
- the terminal inputs the current video frame to be denoised into the first branch of the target video denoising model, processes the current video frame to be denoised through each network layer of the first branch, and obtains the image detail features to be denoised of the video frame to be denoised.
- the downsampled video frame sequence to be denoised refers to the video frame sequence obtained by downsampling the video frame sequence to be denoised.
- downsampling refers to reducing the resolution of the image, thereby reducing the size of the image and reducing the detail information in the image. It is usually used to reduce the amount of calculation and memory usage, while accelerating the prediction process of the model.
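Downsampling can be illustrated with simple block averaging. The source does not say which downsampling method is used, so the average pooling, the integer `factor`, and the function name `downsample` are assumptions.

```python
import numpy as np

def downsample(frame, factor):
    """Reduce the resolution of an (H, W) or (H, W, C) frame by averaging
    non-overlapping factor x factor blocks."""
    frame = np.asarray(frame, dtype=np.float64)
    h2, w2 = frame.shape[0] // factor, frame.shape[1] // factor
    # Drop rows and columns that do not fill a complete block.
    cropped = frame[:h2 * factor, :w2 * factor]
    blocks = cropped.reshape(h2, factor, w2, factor, *frame.shape[2:])
    return blocks.mean(axis=(1, 3))

# Hypothetical 4x4 frame reduced to 2x2.
frame = np.arange(16, dtype=np.float64).reshape(4, 4)
small = downsample(frame, 2)
```

Halving each side cuts the number of pixels to a quarter, which is where the savings in computation and memory come from.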
- the second branch of the target video denoising model can specifically be a low-resolution branch, which is used to process the downsampled video frame sequence to be denoised. It can be understood that the resolution of each downsampled video frame to be denoised in the downsampled video frame sequence to be denoised is low resolution, and the size of each downsampled video frame to be denoised in the low-resolution downsampled video frame sequence to be denoised is reduced or the detail information is reduced.
- the amount of calculation can be effectively reduced, the operating efficiency of the model can be improved, and the generalization ability of the model can be enhanced, making it more suitable for processing videos of different resolutions.
- the fused features of the image to be denoised refer to the feature representation obtained by fusing the features of at least two downsampled video frames to be denoised in the downsampled video frame sequence to be denoised. It can be understood that for video data with noise, it is often difficult to obtain a good denoising effect by using only one frame of image for denoising, because a single frame image may have too much noise and distortion and cannot provide sufficient information. By fusing the features of multiple downsampled video frames to be denoised, the expressiveness of the features can be improved, thereby improving the denoising effect of the target video denoising model.
- the feature representation obtained after feature extraction of each downsampled video frame to be denoised in the downsampled video frame sequence to be denoised may have information loss. Fusion of the features of multiple downsampled video frames to be denoised can improve the expressiveness of the features, thereby improving the denoising effect of the target video denoising model.
- the terminal downsamples each of the video frames to be denoised in the video frame sequence to obtain a downsampled video frame sequence to be denoised, and inputs the downsampled video frame sequence to be denoised into the second branch of the target video denoising model, and processes each of the downsampled video frames to be denoised in the downsampled video frame sequence to be denoised through each sub-branch of the second branch to obtain the image fusion features to be denoised.
- the second branch includes an optical flow network, a target frame sub-branch and other frame sub-branches
- S706 specifically includes the following steps: determining the optical flow information between the current downsampled video frame to be denoised and the corresponding adjacent downsampled video frame to be denoised in the downsampled video frame sequence to be denoised through the optical flow network; performing feature extraction on the downsampled video frame sequence to be denoised through other frame sub-branches to obtain features of the continuous video frames to be denoised corresponding to the current downsampled video frame to be denoised; aligning the features of the continuous video frames to be denoised with the current downsampled video frame to be denoised based on the optical flow information to obtain features of the aligned video frames to be denoised; processing the features of the aligned video frames to be denoised through the target sub-branch to obtain features of the image fusion to be denoised.
- the downsampled video frame sequence to be denoised includes the current downsampled video frame to be denoised and the downsampled continuous video frame to be denoised
- the downsampled continuous video frame to be denoised includes at least one of the downsampled preceding video frame to be denoised and the downsampled subsequent video frame to be denoised
- the other frame sub-branches include at least one of the preceding frame sub-branches and the subsequent frame sub-branches.
- the features of the continuous video frames to be denoised include at least one of the features of the preceding video frames to be denoised and the features of the subsequent video frames to be denoised
- the features of the aligned video frames to be denoised include at least one of the features of the preceding aligned video frames to be denoised and the features of the subsequent aligned video frames to be denoised
- the terminal determines, through the optical flow network, the optical flow information between the current downsampled video frame to be denoised and the corresponding adjacent downsampled video frames in the downsampled video frame sequence to be denoised, and the process specifically includes the following steps: determining, through the optical flow network, the third optical flow information between the current downsampled video frame to be denoised and the adjacent downsampled video frames in the downsampled preceding video frames to be denoised; determining, through the optical flow network, the fourth optical flow information between the current downsampled video frame to be denoised and the adjacent downsampled video frames in the downsamp
- the process in which the terminal performs feature extraction on the downsampled video frame sequence to be denoised through the other frame sub-branches to obtain the features of the continuous video frames to be denoised corresponding to the current downsampled video frame to be denoised includes the following steps: performing feature extraction on the downsampled preceding video frame to be denoised through the forward network layer of the preceding frame sub-branch to obtain the features of the preceding video frame to be denoised; performing feature extraction on the downsampled subsequent video frame to be denoised through the backward network layer of the subsequent frame sub-branch to obtain the features of the subsequent video frame to be denoised.
- the process in which the terminal aligns the features of the continuous video frames to be denoised with the current downsampled video frame to be denoised based on the optical flow information to obtain the features of the aligned video frames to be denoised includes the following steps: aligning the features of the preceding video frames to be denoised with the current downsampled video frame to be denoised based on the third optical flow information to obtain the features of the preceding aligned video frames to be denoised; aligning the features of the subsequent video frames to be denoised with the current downsampled video frame to be denoised based on the fourth optical flow information to obtain the features of the subsequent aligned video frames to be denoised.
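The flow-based alignment described above can be illustrated with a small backward-warping routine. This is a sketch under assumptions that the source does not state: the optical flow is taken to give, for every pixel of the current frame, the (dx, dy) offset of its matching point in the neighbouring frame; sampling is bilinear with border clamping; and features are a single 2D channel. The name `warp` is hypothetical.

```python
import math


def warp(feature, flow):
    """Backward-warp a 2D feature map toward the current frame.

    feature: list of rows (H x W) taken from a neighbouring frame.
    flow:    H x W list of (dx, dy) pairs; for each pixel of the current
             frame it gives the offset of the matching point in `feature`.
    """
    height, width = len(feature), len(feature[0])

    def sample(x, y):
        # Bilinear interpolation with coordinates clamped to the border.
        x = min(max(x, 0.0), width - 1.0)
        y = min(max(y, 0.0), height - 1.0)
        x0, y0 = int(math.floor(x)), int(math.floor(y))
        x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
        wx, wy = x - x0, y - y0
        top = feature[y0][x0] * (1 - wx) + feature[y0][x1] * wx
        bottom = feature[y1][x0] * (1 - wx) + feature[y1][x1] * wx
        return top * (1 - wy) + bottom * wy

    return [[sample(x + flow[y][x][0], y + flow[y][x][1])
             for x in range(width)] for y in range(height)]
```

A zero flow field leaves the features unchanged, while a constant flow of one pixel shifts the whole map, which is the behaviour expected when the neighbouring frame is a translated copy of the current one.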
- the process in which the terminal processes the features of the aligned video frames to be denoised through the target sub-branch to obtain the fusion features of the image to be denoised includes the following steps: processing the features of the preceding aligned video frames to be denoised through the forward network layer of the target sub-branch to obtain the fusion features of the preceding image to be denoised; processing the features of the subsequent aligned video frames to be denoised through the backward network layer of the target sub-branch to obtain the fusion features of the subsequent image to be denoised; and determining the fusion features of the image to be denoised based on at least one of the fusion features of the preceding image to be denoised and the fusion features of the subsequent image to be denoised.
- the process of determining the fusion features of the image to be denoised based on at least one of the fusion features of the preceding image to be denoised and the fusion features of the subsequent image to be denoised by the terminal specifically includes the following steps: when the downsampled continuous video frames to be denoised only include the downsampled preceding video frames to be denoised, directly determining the fusion features of the preceding image to be denoised as the fusion features of the image to be denoised; when the downsampled continuous video frames to be denoised only include the downsampled subsequent video frames to be denoised, directly determining the fusion features of the subsequent image to be denoised as the fusion features of the image to be denoised; when the downsampled continuous video frames to be denoised include the downsampled preceding video frames to be denoised and the downsampled subsequent video frames to be denoised, splicing the fusion features of the preceding image to be denoised and the fusion features of
- the terminal fuses the fused features of the image to be denoised and the detailed features of the image to be denoised to obtain the global image features to be denoised, and generates a predicted video frame based on the global image features to be denoised.
- the terminal determines the current video frame to be denoised in the video frame sequence to be denoised of the video to be denoised; extracts the image detail features to be denoised of the video frame to be denoised through the first branch of the target video denoising model; after obtaining the downsampled video frame sequence to be denoised corresponding to the video frame sequence to be denoised, extracts the features of the downsampled video frame sequence to be denoised through the second branch of the target video denoising model to obtain the image fusion features to be denoised; generates the denoised video frame corresponding to the video frame to be denoised based on the image detail features to be denoised and the image fusion features to be denoised, which fully considers the video
- the correlation and continuity in the time dimension can effectively reduce the amount of calculation and improve the operating efficiency of the model. Therefore, even when computing resources are limited, the features of the video frames to be denoised can be better extracted, thereby improving the denoising effect of the target video denoising
- a method for processing a video denoising model is provided, which is described by taking the method applied to the computer device in FIG. 1 as an example, and includes the following steps:
- S806 The static video with added noise and real noise and the dynamic video with added noise are determined as sample videos, and the clear static video and the dynamic video without added noise are determined as reference videos.
- the reference video frame is a video frame in the reference video corresponding to the target video frame; the target video denoising model is used to denoise the video to be denoised, and the reference video includes a clear static video obtained by smoothing the static video and a dynamic video without noise.
- the present application also provides an application scenario, which uses the processing method of the above-mentioned video denoising model, and the method includes the following steps:
- the training data comes from two parts: one part is manually collected static video with real noise, and the other part is a public set of clear videos.
- noise is artificially added to both the video with real noise and the clear video to obtain a low-quality noisy video (LQ).
- the video with real noise is time-domain smoothed, and the clear video is copied to obtain a high-quality clear video (GT).
- the low-quality noisy video (LQ) is used as a sample video and the corresponding high-quality clear video (GT) is used as a reference video to construct a paired data set, and the constructed paired data set is used to train the video denoising model.
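The construction of the paired data set for the static videos can be sketched as below. This is only an illustration under assumptions: the source says the video with real noise is "time-domain smoothed" without naming the filter, so a per-pixel temporal mean over all frames of a static scene is assumed; the additional artificial noise applied to the LQ side is omitted; and the names `temporal_mean` and `build_static_pairs` are hypothetical.

```python
def temporal_mean(frames):
    """Per-pixel average over all frames of a static clip (list of 2D lists)."""
    count = len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    return [[sum(frame[y][x] for frame in frames) / count
             for x in range(width)] for y in range(height)]


def build_static_pairs(noisy_frames):
    """Pair every noisy frame (LQ) with the same smoothed frame (GT)."""
    clean = temporal_mean(noisy_frames)
    return [(lq, clean) for lq in noisy_frames]
```

Because the scene does not move, averaging over time suppresses zero-mean noise while keeping the image content, which is why a static capture can supply its own reference frame.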
- the video denoising model includes a high-resolution branch and a low-resolution branch.
- the low-resolution branch includes an optical flow network and multiple sub-branches.
- Each sub-branch includes a forward U-shaped network and a backward U-shaped network.
- the terminal obtains a target video frame in the video frame sequence of the sample video, and extracts the image detail features of the target video frame through the high-resolution branch of the video denoising model. After downsampling the video frame sequence to obtain a downsampled video frame sequence, the downsampled video frame sequence is input into the low-resolution branch.
- the optical flow network of the second branch of the video denoising model is used to determine the optical flow information between adjacent downsampled video frames in the downsampled video frame sequence.
- the downsampled video frames corresponding to the target sub-branch in the low-resolution branch and the optical flow information are processed respectively.
- a video frame sequence includes 10 video frames, and the target video frame is the i-th frame.
- the 10 downsampled video frames are input into the low-resolution branch of the video denoising model.
- Each downsampled video frame corresponds to a sub-branch in the low-resolution branch.
- the pre-trained optical flow network SpyNet is first used to determine the first optical flow information from the i+1-th frame to the i-th frame and the second optical flow information from the i-1-th frame to the i-th frame, and features of the i+1-th frame are extracted through the backward U-shaped network layer of the sub-branch corresponding to the i+1-th frame to obtain the subsequent video frame features.
- the forward U-shaped network layer of the sub-branch corresponding to the i-1-th frame is used to extract features of the i-1-th frame to obtain the preceding video frame features. The preceding video frame features and the subsequent video frame features are each aligned with the i-th frame based on the corresponding optical flow information (the first optical flow information and the second optical flow information) to obtain the preceding aligned video frame features and the subsequent aligned video frame features. The forward U-shaped network layer of the sub-branch corresponding to the i-th frame processes the preceding aligned video frame features to obtain the preceding image fusion features, and the backward U-shaped network layer of that sub-branch processes the subsequent aligned video frame features to obtain the subsequent image fusion features. The preceding image fusion features and the subsequent image fusion features are spliced to obtain the spliced image features, and the spliced image features are convolved by the convolution layer of the sub-bra
- the preceding video frame features corresponding to the i-1th frame can be specifically determined based on the image of the i-1th frame and the video frame features of the i-2th frame
- the succeeding video frame features corresponding to the i+1th frame can be specifically determined based on the image of the i+1th frame and the video frame features of the i+2th frame.
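The recurrence described above, in which the features of frame i-1 depend on the image of frame i-1 and the features of frame i-2 (and symmetrically for frames i+1 and i+2), can be sketched as a forward and a backward pass over the sequence. This is a toy illustration: frames are scalars, and `toy_extract` is a hypothetical stand-in for a U-shaped network layer, not the network of the application.

```python
def propagate(frames, extract, reverse=False):
    """Run a recurrent feature extractor over a frame sequence.

    extract(frame, previous_feature) returns the feature of `frame`;
    previous_feature is None for the first frame that is visited.
    """
    order = range(len(frames) - 1, -1, -1) if reverse else range(len(frames))
    features = [None] * len(frames)
    previous = None
    for index in order:
        previous = extract(frames[index], previous)
        features[index] = previous
    return features


def toy_extract(frame, previous_feature):
    # Placeholder for a U-shaped network layer: blend frame and history.
    if previous_feature is None:
        return float(frame)
    return 0.5 * frame + 0.5 * previous_feature
```

Calling `propagate(frames, toy_extract)` yields features that carry information from earlier frames, and `propagate(frames, toy_extract, reverse=True)` yields features that carry information from later frames.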
- Figure 11 shows a video frame to be denoised from the video to be denoised; the frame contains a large amount of noise.
- Figure 12 shows the clear video frame obtained after denoising the video frame to be denoised with the target video denoising model trained using the solution of the present application.
- the embodiment of the present application also provides a processing device for a video denoising model for implementing the processing method of the video denoising model involved above.
- the implementation scheme for solving the problem provided by the device is similar to the implementation scheme recorded in the above method, so the specific limitations in the embodiments of the processing device for one or more video denoising models provided below can refer to the limitations of the processing method for the video denoising model above, and will not be repeated here.
- a processing device for a video denoising model comprising: a video frame acquisition module 1302, a detail feature extraction module 1304, a fusion feature extraction module 1306, a prediction module 1308 and a parameter adjustment module 1310, wherein:
- a video frame acquisition module 1302 is used to acquire a target video frame in a video frame sequence of a sample video
- a detail feature extraction module 1304 is used to extract image detail features of a target video frame through a first branch of a video denoising model
- a fusion feature extraction module 1306 is used to downsample the video frame sequence to obtain a downsampled video frame sequence, and extract features from the downsampled video frame sequence through the second branch of the video denoising model to obtain an image fusion feature;
- a prediction module 1308, configured to generate a predicted video frame based on the image fusion feature and the image detail feature
- the parameter adjustment module 1310 is used to adjust the parameters in the video denoising model according to the loss value between the predicted video frame and the reference video frame to obtain the target video denoising model;
- the reference video frame is the video frame in the reference video corresponding to the target video frame;
- the target video denoising model is used to denoise the video to be denoised.
- the image detail features of the target video frame are extracted through the first branch of the video denoising model.
- the downsampled video frame sequence is subjected to feature extraction through the second branch of the video denoising model to obtain the image fusion features.
- the predicted video frame is generated based on the image fusion features and the image detail features. This not only fully considers the correlation and continuity of the video in the time dimension, but also can effectively reduce the amount of calculation and improve the operation efficiency of the model.
- the parameters in the video denoising model can be adjusted according to the loss value between the predicted video frame and the video frame corresponding to the target video frame in the reference video to obtain a target video denoising model with better denoising effect.
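The parameter adjustment step can be illustrated with a deliberately tiny example. The source does not specify the loss function or the optimizer, so mean squared error and plain gradient descent are assumptions, and the one-parameter "model" (prediction = gain * sample) is a hypothetical stand-in for the two-branch network; `mse` and `train_gain` are illustrative names.

```python
def mse(predicted, reference):
    """Mean squared error between two equally sized flat pixel lists."""
    return sum((p - r) ** 2 for p, r in zip(predicted, reference)) / len(reference)


def train_gain(sample, reference, steps=200, learning_rate=0.1):
    """Fit a one-parameter model (prediction = gain * sample) by gradient descent."""
    gain = 0.0
    for _ in range(steps):
        predicted = [gain * s for s in sample]
        # d(MSE)/d(gain) = 2 * mean((gain * s - r) * s)
        gradient = 2 * sum((p - r) * s for p, r, s in
                           zip(predicted, reference, sample)) / len(sample)
        gain -= learning_rate * gradient
    return gain
```

The loop mirrors the described procedure: generate a predicted frame, measure the loss against the reference frame, and adjust the parameters in the direction that lowers the loss.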
- the sample video includes a static video carrying real noise and a noisy dynamic video.
- the sample video includes a static video with real noise and a dynamic video with added noise
- the reference video includes a clear static video obtained by smoothing the static video and a dynamic video without added noise.
- the apparatus further includes a sample video acquisition module 1312 and a reference video acquisition module 1314, wherein: the sample video acquisition module 1312 is used to perform video capture on a static object to obtain an original static video carrying real noise; the original static video is subjected to noise addition processing to obtain a static video; the static video carries added noise and real noise; the reference video acquisition module 1314 is used to perform smoothing processing on the original static video to obtain a clear static video.
- the sample video acquisition module 1312 is further used to acquire partial pixels from each noisy video frame of the original static video; generate corresponding first pixel images according to the partial pixels of each noisy video frame; generate first initial noise images corresponding to each noisy video frame; fuse the first initial noise images with the first pixel images to obtain first noise images corresponding to each noisy video frame; fuse each first noise image to the corresponding noisy video frame to obtain a static video.
- the reference video acquisition module 1314 is further used to acquire a non-noised dynamic video from a video database; the sample video acquisition module 1312 is further used to perform noise processing on the non-noised dynamic video to obtain a noisy dynamic video.
- the video frames in the unnoised dynamic video are clear video frames;
- the sample video acquisition module 1312 is also used to select part of the pixels from each clear video frame; generate corresponding second pixel images according to the part of the pixels of each clear video frame; generate second initial noise images corresponding to each clear video frame; fuse each second initial noise image with the corresponding second pixel image to obtain a second noise image corresponding to each clear video frame; fuse each second noise image into the corresponding clear video frame to obtain a noisy dynamic video.
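The four noise-addition steps listed above can be sketched as follows. The source does not define how pixels are selected, what distribution the initial noise follows, or what "fuse" means, so everything concrete here is an assumption made for illustration: random pixel selection with probability `keep_ratio`, zero-mean Gaussian initial noise, multiplication to fuse the noise with the pixel image, and addition to fuse the noise image into the frame. The function name `add_synthetic_noise` is hypothetical.

```python
import random


def add_synthetic_noise(frame, keep_ratio=0.5, sigma=0.1, seed=0):
    """Return (noisy_frame, noise_image) for a 2D frame given as a list of rows."""
    rng = random.Random(seed)
    height, width = len(frame), len(frame[0])

    # Step 1: select part of the pixels; unselected positions are set to 0.
    pixel_image = [[frame[y][x] if rng.random() < keep_ratio else 0.0
                    for x in range(width)] for y in range(height)]

    # Step 2: generate an initial noise image (zero-mean Gaussian here).
    initial_noise = [[rng.gauss(0.0, sigma) for _ in range(width)]
                     for _ in range(height)]

    # Step 3: fuse the initial noise with the pixel image (assumed: multiply),
    # which makes the noise strength depend on the selected pixel values.
    noise_image = [[initial_noise[y][x] * pixel_image[y][x]
                    for x in range(width)] for y in range(height)]

    # Step 4: fuse the noise image into the frame (assumed: add).
    noisy = [[frame[y][x] + noise_image[y][x] for x in range(width)]
             for y in range(height)]
    return noisy, noise_image
```

Under these assumptions the added noise is signal-dependent and spatially sparse; other fusion rules would fit the same four-step outline.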
- the second branch includes an optical flow network, a target frame sub-branch and other frame sub-branches; the fusion feature extraction module 1306 is also used to: determine the optical flow information between the downsampled target video frame in the downsampled video frame sequence and the corresponding adjacent downsampled video frame through the optical flow network; extract features of the downsampled video frame sequence through other frame sub-branches to obtain continuous video frame features corresponding to the downsampled target video frame; align the continuous video frame features with the downsampled target video frame based on the optical flow information to obtain aligned video frame features; process the aligned video frame features through the target sub-branch to obtain image fusion features.
- adjacent downsampled video frames include downsampled preceding video frames and downsampled succeeding video frames
- the optical flow information includes first optical flow information and second optical flow information
- the continuous video frame features include preceding video frame features and succeeding video frame features
- the aligned video frame features include preceding aligned video frame features and succeeding aligned video frame features; the fusion feature extraction module 1306 is also used to: determine the first optical flow information between adjacent first downsampled video frames through the optical flow network; determine the second optical flow information between adjacent second downsampled video frames through the optical flow network
- the first downsampled video frames are the downsampled video frames among the downsampled target video frame and the downsampled preceding video frames
- the second downsampled video frames are the downsampled video frames among the downsampled target video frame and the downsampled succeeding video frames
- the fusion feature extraction module 1306 is used to: splice the fusion features of the preceding image and the fusion features of the succeeding image to obtain spliced image features; and perform convolution processing on the spliced image features to obtain image fusion features.
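The splice-then-convolve step can be sketched as below. The source only says that the spliced image features undergo convolution processing; the use of channel-wise concatenation for splicing and of a 1x1 (pointwise) kernel is an assumption made to keep the example short, and `splice` and `conv1x1` are hypothetical names. Feature maps are nested lists in channel x height x width order.

```python
def splice(preceding, succeeding):
    """Concatenate two feature maps along the channel axis (C x H x W lists)."""
    return preceding + succeeding


def conv1x1(feature, weights, bias=None):
    """Pointwise convolution: each output channel is a weighted sum of input channels.

    weights has one row per output channel and one entry per input channel.
    """
    height, width = len(feature[0]), len(feature[0][0])
    bias = bias or [0.0] * len(weights)
    return [[[b + sum(w * channel[y][x] for w, channel in zip(kernel, feature))
              for x in range(width)] for y in range(height)]
            for kernel, b in zip(weights, bias)]
```

For example, `conv1x1(splice(pre, post), [[0.5, 0.5]])` averages a one-channel preceding map and a one-channel succeeding map into a single fused channel; in a trained model the weights would be learned rather than fixed.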
- the prediction module 1308 is further used to: fuse the image fusion feature with the image detail feature to obtain the global image feature; and reconstruct the image based on the global image feature to obtain a predicted video frame.
- the prediction module is further used to: upsample the image fusion feature to obtain the upsampled image fusion feature; and fuse the upsampled image fusion feature with the image detail feature to obtain the global image feature.
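The upsample-and-fuse step can be illustrated as follows. The source does not specify the upsampling operator or the fusion rule, so nearest-neighbour upsampling and element-wise addition are assumptions, features are single 2D channels, and the names `upsample_nearest` and `fuse` are hypothetical.

```python
def upsample_nearest(channel, factor=2):
    """Nearest-neighbour upsampling of a 2D channel by an integer factor."""
    out = []
    for row in channel:
        # Repeat every value horizontally, then repeat the widened row vertically.
        wide = [value for value in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def fuse(fusion_feature, detail_feature, factor=2):
    """Upsample the low-resolution feature and add it to the detail feature."""
    upsampled = upsample_nearest(fusion_feature, factor)
    if (len(upsampled) != len(detail_feature)
            or len(upsampled[0]) != len(detail_feature[0])):
        raise ValueError("feature sizes do not match after upsampling")
    return [[u + d for u, d in zip(up_row, detail_row)]
            for up_row, detail_row in zip(upsampled, detail_feature)]
```

Upsampling is needed because the image fusion feature comes from the low-resolution branch, while the image detail feature has the resolution of the original frame; the two can only be combined once their spatial sizes agree.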
- the video frame acquisition module 1302 is further used to determine the current video frame to be denoised in the sequence of video frames to be denoised of the video to be denoised; the detail feature extraction module 1304 is further used to extract the detail features of the image to be denoised from the video frame to be denoised through the first branch of the target video denoising model; the fusion feature extraction module 1306 is further used to, after obtaining the downsampled video frame sequence to be denoised corresponding to the video frame sequence to be denoised, perform feature extraction on the downsampled video frame sequence to be denoised through the second branch of the target video denoising model to obtain the fusion features of the image to be denoised; the prediction module 1308 is further used to generate a denoised video frame corresponding to the video frame to be denoised based on the detail features of the image to be denoised and the fusion features of the image to be denoised.
- Each module in the processing device of the above video denoising model can be implemented in whole or in part by software, hardware, or a combination thereof.
- Each module can be embedded in or independent of a processor in a computer device in the form of hardware, or can be stored in a memory in a computer device in the form of software, so that the processor can call and execute operations corresponding to each module above.
- a computer device which may be a server, and its internal structure diagram may be shown in FIG. 15.
- the computer device includes a processor, a memory, an input/output interface (Input/Output, referred to as I/O) and a communication interface.
- the processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface.
- the processor of the computer device is used to provide computing and control capabilities.
- the memory of the computer device includes a non-volatile storage medium and an internal memory.
- the non-volatile storage medium stores an operating system, computer-readable instructions and a database.
- the internal memory provides an environment for the operation of the operating system and the computer-readable instructions in the non-volatile storage medium.
- the database of the computer device is used to store video data.
- the input/output interface of the computer device is used to exchange information between the processor and an external device.
- the communication interface of the computer device is used to communicate with an external terminal through a network connection.
- a computer device which may be a terminal, and its internal structure diagram may be shown in FIG. 16.
- the computer device includes a processor, a memory, an input/output interface, a communication interface, a display unit, and an input device.
- the processor, the memory, and the input/output interface are connected via a system bus, and the communication interface, the display unit, and the input device are connected to the system bus via the input/output interface.
- the processor of the computer device is used to provide computing and control capabilities.
- the memory of the computer device includes a non-volatile storage medium and an internal memory.
- the non-volatile storage medium stores an operating system and computer-readable instructions.
- the internal memory provides an environment for the operation of the operating system and computer-readable instructions in the non-volatile storage medium.
- the input/output interface of the computer device is used to exchange information between the processor and an external device.
- the communication interface of the computer device is used to communicate with an external terminal in a wired or wireless manner, and the wireless manner can be implemented through WIFI, a mobile cellular network, NFC (near field communication) or other technologies.
- a processing method for a video denoising model is implemented.
- the display unit of the computer device is used to form a visually visible image, and can be a display screen, a projection device or a virtual reality imaging device.
- the display screen can be a liquid crystal display screen or an electronic ink display screen.
- the input device of the computer device can be a touch layer covered on the display screen, or a button, trackball or touchpad set on the computer device casing, or an external keyboard, touchpad or mouse, etc.
- FIG. 15 or FIG. 16 is merely a block diagram of a partial structure related to the scheme of the present application, and does not constitute a limitation on the computer device to which the scheme of the present application is applied.
- the specific computer device may include more or fewer components than shown in the figure, or combine certain components, or have a different arrangement of components.
- a computer device including a memory and a processor, wherein the memory stores computer-readable instructions, and the processor implements the steps in the above-mentioned method embodiments when executing the computer-readable instructions.
- a computer-readable storage medium on which computer-readable instructions are stored.
- the steps in the above-mentioned method embodiments are implemented.
- a computer program product comprising computer-readable instructions, which implement the steps in the above-mentioned method embodiments when executed by a processor.
- user information including but not limited to user device information, user personal information, etc.
- data including but not limited to data used for analysis, stored data, displayed data, etc.
- any reference to the memory, database or other medium used in the embodiments provided in the present application can include at least one of non-volatile and volatile memory.
- Non-volatile memory can include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, etc.
- Volatile memory can include random access memory (RAM) or external cache memory, etc.
- RAM can be in various forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM).
- the database involved in each embodiment provided in this application may include at least one of a relational database and a non-relational database.
- Non-relational databases may include distributed databases based on blockchains, etc., but are not limited to this.
- the processor involved in each embodiment provided in this application may be a general-purpose processor, a central processing unit, a graphics processor, a digital signal processor, a programmable logic unit, a data processing logic unit based on quantum computing, etc., but is not limited thereto.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Computing Systems (AREA)
- Databases & Information Systems (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Signal Processing (AREA)
- Image Processing (AREA)
Claims (16)
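- A method for processing a video denoising model, performed by a computer device, the method comprising: obtaining a target video frame from a video frame sequence of a sample video, and obtaining a reference video corresponding to the sample video; extracting image detail features of the target video frame through a first branch of a video denoising model; downsampling the video frame sequence to obtain a downsampled video frame sequence, and performing feature extraction on the downsampled video frame sequence through a second branch of the video denoising model to obtain image fusion features; generating a predicted video frame based on the image fusion features and the image detail features; and adjusting parameters in the video denoising model according to a loss value between the predicted video frame and a reference video frame to obtain a target video denoising model; wherein the reference video frame is the video frame in the reference video corresponding to the target video frame, and the target video denoising model is used to denoise a video to be denoised.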
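- The method according to claim 1, wherein the sample video comprises a static video carrying real noise and a dynamic video with added noise; and the reference video comprises a clear static video obtained by smoothing the static video and the dynamic video without added noise.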
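- The method according to claim 2, wherein the static video further carries added noise; and the method further comprises: performing video capture on a static object to obtain an original static video carrying real noise; performing noise addition on the original static video to obtain the static video, the static video carrying the added noise and the real noise; and smoothing the original static video to obtain the clear static video.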
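- The method according to claim 3, wherein performing noise addition on the original static video to obtain the static video comprises: obtaining partial pixels from each noisy video frame of the original static video; generating a corresponding first pixel image from the partial pixels of each noisy video frame; generating a first initial noise image corresponding to each noisy video frame; fusing the first initial noise images with the first pixel images respectively to obtain a first noise image corresponding to each noisy video frame; and fusing each first noise image into the corresponding noisy video frame to obtain the static video.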
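- The method according to claim 1, further comprising: obtaining a dynamic video without added noise from a video database; and performing noise addition on the dynamic video without added noise to obtain a dynamic video with added noise.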
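- The method according to claim 5, wherein the video frames in the dynamic video without added noise are clear video frames; and performing noise addition on the dynamic video without added noise to obtain the dynamic video with added noise comprises: selecting partial pixels from each clear video frame; generating a corresponding second pixel image from the partial pixels of each clear video frame; generating a second initial noise image corresponding to each clear video frame; fusing each second initial noise image with the corresponding second pixel image to obtain a second noise image corresponding to each clear video frame; and fusing each second noise image into the corresponding clear video frame to obtain the dynamic video with added noise.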
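- The method according to claim 1, wherein the second branch comprises an optical flow network, a target frame sub-branch and other frame sub-branches; and performing feature extraction on the downsampled video frame sequence through the second branch of the video denoising model to obtain the image fusion features comprises: determining, through the optical flow network, optical flow information between a downsampled target video frame in the downsampled video frame sequence and the corresponding adjacent downsampled video frames; performing feature extraction on the downsampled video frame sequence through the other frame sub-branches to obtain continuous video frame features corresponding to the downsampled target video frame; aligning the continuous video frame features with the downsampled target video frame based on the optical flow information to obtain aligned video frame features; and processing the aligned video frame features through the target sub-branch to obtain the image fusion features.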
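- The method according to claim 7, wherein the adjacent downsampled video frames comprise downsampled preceding video frames and downsampled succeeding video frames; the optical flow information comprises first optical flow information and second optical flow information; the continuous video frame features comprise preceding video frame features and succeeding video frame features; and the aligned video frame features comprise preceding aligned video frame features and succeeding aligned video frame features; determining, through the optical flow network, the optical flow information between the downsampled target video frame in the downsampled video frame sequence and the corresponding adjacent downsampled video frames comprises: determining, through the optical flow network, the first optical flow information between adjacent first downsampled video frames; and determining, through the optical flow network, the second optical flow information between adjacent second downsampled video frames; the first downsampled video frames being the downsampled video frames among the downsampled target video frame and the downsampled preceding video frames, and the second downsampled video frames being the downsampled video frames among the downsampled target video frame and the downsampled succeeding video frames; performing feature extraction on the downsampled video frame sequence through the other frame sub-branches to obtain the continuous video frame features corresponding to the downsampled target video frame comprises: performing feature extraction on the downsampled preceding video frames through a forward network layer of a preceding frame sub-branch to obtain the preceding video frame features; and performing feature extraction on the downsampled succeeding video frames through a backward network layer of a succeeding frame sub-branch to obtain the succeeding video frame features; the preceding frame sub-branch and the succeeding frame sub-branch belonging to the other frame sub-branches; aligning the continuous video frame features with the downsampled target video frame based on the optical flow information to obtain the aligned video frame features comprises: aligning the preceding video frame features with the downsampled target video frame based on the first optical flow information to obtain the preceding aligned video frame features; and aligning the succeeding video frame features with the downsampled target video frame based on the second optical flow information to obtain the succeeding aligned video frame features; and processing the aligned video frame features through the target sub-branch to obtain the image fusion features comprises: processing the preceding aligned video frame features through a forward network layer of the target sub-branch to obtain preceding image fusion features; processing the succeeding aligned video frame features through a backward network layer of the target sub-branch to obtain succeeding image fusion features; and determining the image fusion features based on the preceding image fusion features and the succeeding image fusion features.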
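- The method according to claim 8, wherein determining the image fusion features based on the preceding image fusion features and the succeeding image fusion features comprises: splicing the preceding image fusion features and the succeeding image fusion features to obtain spliced image features; and performing convolution processing on the spliced image features to obtain the image fusion features.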
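- The method according to claim 1, wherein generating the predicted video frame based on the image fusion features and the image detail features comprises: fusing the image fusion features with the image detail features to obtain global image features; and performing image reconstruction based on the global image features to obtain the predicted video frame.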
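- The method according to claim 10, wherein fusing the image fusion features with the image detail features to obtain the global image features comprises: upsampling the image fusion features to obtain upsampled image fusion features; and fusing the upsampled image fusion features with the image detail features to obtain the global image features.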
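- The method according to any one of claims 1 to 11, further comprising: determining a current video frame to be denoised in a sequence of video frames to be denoised of a video to be denoised; extracting detail features of the image to be denoised from the video frame to be denoised through the first branch of the target video denoising model; downsampling the sequence of video frames to be denoised to obtain a downsampled video frame sequence to be denoised, and performing feature extraction on the downsampled video frame sequence to be denoised through the second branch of the target video denoising model to obtain fusion features of the image to be denoised; and generating a denoised video frame corresponding to the video frame to be denoised based on the detail features of the image to be denoised and the fusion features of the image to be denoised.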
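- An apparatus for processing a video denoising model, the apparatus comprising: a video frame acquisition module, configured to obtain a target video frame from a video frame sequence of a sample video and to obtain a reference video corresponding to the sample video; a detail feature extraction module, configured to extract image detail features of the target video frame through a first branch of a video denoising model; a fusion feature extraction module, configured to downsample the video frame sequence to obtain a downsampled video frame sequence and to perform feature extraction on the downsampled video frame sequence through a second branch of the video denoising model to obtain image fusion features; a prediction module, configured to generate a predicted video frame based on the image fusion features and the image detail features; and a parameter adjustment module, configured to adjust parameters in the video denoising model according to a loss value between the predicted video frame and a reference video frame to obtain a target video denoising model; wherein the reference video frame is the video frame in the reference video corresponding to the target video frame, and the target video denoising model is used to denoise a video to be denoised.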
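- A computer device, comprising a memory and a processor, the memory storing computer-readable instructions, wherein the processor, when executing the computer-readable instructions, implements the steps of the method according to any one of claims 1 to 12.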
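- A computer-readable storage medium storing computer-readable instructions which, when executed by a processor, implement the steps of the method according to any one of claims 1 to 12.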
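- A computer program product, comprising computer-readable instructions, characterized in that the computer-readable instructions, when executed by a processor, implement the steps of the method according to any one of claims 1 to 12.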
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP24791752.9A EP4632666A4 (en) | 2023-04-18 | 2024-03-04 | METHOD AND APPARATUS FOR PROCESSING VIDEO DENOISING MODELS, COMPUTER DEVICE AND STORAGE MEDIA |
| US19/193,267 US20250272803A1 (en) | 2023-04-18 | 2025-04-29 | Method, computer device, and storage medium for processing video denoising model |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310457798.1A CN116977200A (zh) | 2023-04-18 | 2023-04-18 | Processing method and apparatus for video denoising model, computer device, and storage medium |
| CN202310457798.1 | 2023-04-18 |
Related Child Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US19/193,267 Continuation US20250272803A1 (en) | 2023-04-18 | 2025-04-29 | Method, computer device, and storage medium for processing video denoising model |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2024217164A1 (zh) | 2024-10-24 |
Family
ID=88482158
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2024/079883 Ceased WO2024217164A1 (zh) | 2023-04-18 | 2024-03-04 | Processing method and apparatus for video denoising model, computer device, and storage medium |
Country Status (4)
| Country | Link |
|---|---|
| US (1) | US20250272803A1 (zh) |
| EP (1) | EP4632666A4 (zh) |
| CN (1) | CN116977200A (zh) |
| WO (1) | WO2024217164A1 (zh) |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
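| CN119991416A (zh) * | 2025-04-17 | 2025-05-13 | Nanjing University of Information Science and Technology | Video style transfer method based on RAFT optical flow |
| CN120075372A (zh) * | 2025-04-27 | 2025-05-30 | Shenyang Institute of Automation, Chinese Academy of Sciences | Image acquisition and fusion method and apparatus |
| CN121147028A (zh) * | 2025-09-11 | 2025-12-16 | Qingdao University | Image sequence flicker removal method and system based on clustering and multi-scale histogram matching |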
Families Citing this family (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
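| CN116977200A (zh) * | 2023-04-18 | 2023-10-31 | Tencent Technology (Shenzhen) Co., Ltd. | Processing method and apparatus for video denoising model, computer device, and storage medium |
| CN117495853B (zh) * | 2023-12-28 | 2024-05-03 | Taobao (China) Software Co., Ltd. | Video data processing method, device, and storage medium |
| CN118714417B (zh) * | 2024-02-07 | 2026-01-27 | Zhejiang Tmall Technology Co., Ltd. | Video generation method, system, electronic device, and storage medium |
| CN118555461B (zh) * | 2024-07-29 | 2024-10-15 | Zhejiang Tmall Technology Co., Ltd. | Video generation method, apparatus, device, system, and computer program product |
| CN119991465A (zh) * | 2025-01-23 | 2025-05-13 | Intellindust Information Technology (Shenzhen) Co., Ltd. | Optical flow information prediction network training method, image enhancement method, apparatus, and device |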
Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
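| CN111738952A (zh) * | 2020-06-22 | 2020-10-02 | BOE Technology Group Co., Ltd. | Image restoration method, apparatus, and electronic device |
| CN112686828A (zh) * | 2021-03-16 | 2021-04-20 | Tencent Technology (Shenzhen) Co., Ltd. | Video denoising method, apparatus, device, and storage medium |
| CN113011562A (zh) * | 2021-03-18 | 2021-06-22 | Huawei Technologies Co., Ltd. | Model training method and apparatus |
| CN113034401A (zh) * | 2021-04-08 | 2021-06-25 | University of Science and Technology of China | Video denoising method and apparatus, storage medium, and electronic device |
| US11151695B1 (en) * | 2019-08-16 | 2021-10-19 | Perceive Corporation | Video denoising using neural networks with spatial and temporal features |
| CN116977200A (zh) * | 2023-04-18 | 2023-10-31 | Tencent Technology (Shenzhen) Co., Ltd. | Processing method and apparatus for video denoising model, computer device, and storage medium |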
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
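| CN114494011B (zh) * | 2021-12-31 | 2024-09-03 | Shenzhen United Imaging Research Institute of Innovative Medical Equipment | Image interpolation method and apparatus, processing device, and storage medium |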
- 2023-04-18 CN CN202310457798.1A patent/CN116977200A/zh active Pending
- 2024-03-04 EP EP24791752.9A patent/EP4632666A4/en active Pending
- 2024-03-04 WO PCT/CN2024/079883 patent/WO2024217164A1/zh not_active Ceased
- 2025-04-29 US US19/193,267 patent/US20250272803A1/en active Pending
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP4632666A4 * |
Also Published As
| Publication number | Publication date |
|---|---|
| EP4632666A4 (en) | 2026-04-15 |
| US20250272803A1 (en) | 2025-08-28 |
| CN116977200A (zh) | 2023-10-31 |
| EP4632666A1 (en) | 2025-10-15 |
Similar Documents
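| Publication | Publication Date | Title |
|---|---|---|
| WO2024217164A1 (zh) | | Processing method and apparatus for video denoising model, computer device, and storage medium |
| TWI728465B (zh) | | Image processing method and apparatus, electronic device, and storage medium |
| CN111539879A (zh) | | Deep-learning-based blind video denoising method and apparatus |
| WO2022110638A1 (zh) | | Portrait restoration method, apparatus, electronic device, storage medium, and program product |
| CN111784578A (zh) | | Image processing and model training methods and apparatuses, device, and storage medium |
| JP2018527687A (ja) | | Image processing system for downscaling images using a perceptual downscaling method |
| CN113628115B (zh) | | Image reconstruction processing method, apparatus, electronic device, and storage medium |
| CN113902647B (zh) | | Image deblurring method based on a double closed-loop network |
| CN106127689A (zh) | | Image and video super-resolution method and apparatus |
| CN116385283A (zh) | | Event-camera-based image deblurring method and system |
| Shrivastava et al. | | Video dynamics prior: An internal learning approach for robust video enhancements |
| CN115222606A (zh) | | Image processing method, apparatus, computer-readable medium, and electronic device |
| Fang et al. | | Self-enhanced convolutional network for facial video hallucination |
| CN120912433A (zh) | | Event-stream-based super-resolution reconstruction method, apparatus, device, and storage medium for blurred images |
| CN118608387A (zh) | | Method, apparatus, and device for super-resolution reconstruction of satellite video frames |
| WO2024131707A1 (zh) | | Hair enhancement method, neural network, electronic apparatus, and storage medium |
| Mahamud et al. | | Effective Super-Resolution Through Multi-Order Degradation Simulation and Efficient Training Strategies |
| CN106204445A (zh) | | Image and video super-resolution method based on structure tensor total variation |
| HK40097800A (zh) | | Processing method and apparatus for video denoising model, computer device, and storage medium |
| CN120912438B (zh) | | Image super-resolution method and apparatus based on bidirectional focus enhancement |
| CN121120896B (zh) | | Sparse-view indoor reconstruction method based on uncertainty-aware depth supervision |
| Xu et al. | | Image Restoration for Beautification |
| CN117557462A (zh) | | Image reconstruction model training and video playback methods, apparatus, and computer device |
| CN118735821A (zh) | | Image processing method, apparatus, computer device, and readable storage medium |
| CN114612293A (zh) | | Image super-resolution processing method, apparatus, device, and storage medium |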
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 24791752; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | WIPO information: entry into national phase | Ref document number: 2024791752; Country of ref document: EP |
| | ENP | Entry into the national phase | Ref document number: 2024791752; Country of ref document: EP; Effective date: 20250710 |
| | WWP | WIPO information: published in national office | Ref document number: 2024791752; Country of ref document: EP |
| | NENP | Non-entry into the national phase | Ref country code: DE |