WO2024164694A1 - Image compression method and apparatus, electronic device, computer program product, and storage medium - Google Patents
- Publication number
- WO2024164694A1 (application PCT/CN2023/138206)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- latent variable
- compressed
- network
- latent
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/124—Quantisation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/42—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0475—Generative networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/182—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a pixel
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/42—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
- H04N19/423—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation characterised by memory arrangements
- H04N19/426—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation characterised by memory arrangements using memory downsizing methods
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/70—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/85—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/90—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
- H04N19/91—Entropy coding, e.g. variable length coding [VLC] or arithmetic coding
Definitions
- the present application relates to computer technology, and in particular to an image compression method, device, electronic device, computer program product and computer storage medium.
- in the related art, basic convolutional networks are used to transform images.
- the compression rate is low; latent variables must be restored from the byte stream to reconstruct a high-quality image, and the limited capacity of the nonlinear image transformation network restricts the network's ability to reconstruct high-quality images; at the same time, the context model in the related art uses PixelCNN serial decoding, which makes image compression inefficient.
- the embodiments of the present application provide an image compression method, device, electronic device, computer program product and computer storage medium, which can improve the efficiency of image compression by utilizing an image processing model. At the same time, the volume of the compressed image is smaller, thereby reducing the storage cost of the image.
- the present application provides an image compression method, the method comprising:
- a compressed image corresponding to the image to be compressed is generated, and the data amount of the compressed image is smaller than the data amount of the image to be compressed.
- the present application also provides an image compression device, the device comprising:
- an encoding module configured to encode the image to be compressed to obtain a first latent variable corresponding to the image to be compressed;
- an information processing module configured to determine a hyper-prior probability estimate corresponding to the first latent variable;
- the information processing module is further configured to partially decode the first latent variable according to the hyper-prior probability estimate to obtain a partial decoding result of the first latent variable;
- the information processing module is further configured to generate a compressed image corresponding to the image to be compressed based on a partial decoding result of the first latent variable and the first latent variable corresponding to the image to be compressed, wherein the data amount of the compressed image is smaller than the data amount of the image to be compressed.
- the present application also provides an electronic device, wherein the electronic device comprises:
- a memory configured to store executable instructions;
- a processor configured to implement the aforementioned image compression method when running the executable instructions stored in the memory.
- the embodiment of the present application also provides a computer program product, and when the computer program or instruction is executed by a processor, the aforementioned image compression method is implemented.
- the embodiment of the present application also provides a computer-readable storage medium storing executable instructions, which implement the aforementioned image compression method when executed by a processor.
- the embodiment of the present application encodes the image to be compressed to obtain the first latent variable, and determines the hyper-prior probability estimate based on the first latent variable; if the first latent variable obtained by encoding obeys a certain inherent prior probability, the hyper-prior probability estimate can serve as a reference for the subsequent partial decoding, so that the decoding result is more accurate; at the same time, image compression performance improves, so that the compressed image obtained after decoding is smaller and the storage cost of the image is reduced.
- the first latent variable is partially decoded to obtain a partial decoding result, that is, only a part of the pixels is decoded; when the other pixels are subsequently decoded, they can be predicted (decoded) from the partial decoding result, which saves compression time and improves compression efficiency.
- FIG. 1 is a schematic diagram of a use environment of an image compression method provided by an embodiment of the present application.
- FIG. 2 is a schematic diagram of the structure of an electronic device provided in an embodiment of the present application.
- FIG. 3A is a first flow chart of the image compression method provided in an embodiment of the present application.
- FIG. 3B is a second flow chart of the image compression method provided in an embodiment of the present application.
- FIG. 4 is a schematic diagram of the data flow of an image processing model provided in an embodiment of the present application.
- FIG. 5 is a schematic diagram of the model structure of an image processing model provided in an embodiment of the present application.
- FIG. 6 is a schematic diagram of the working process of the spatial depth conversion layer and the depth-space conversion layer provided in an embodiment of the present application.
- FIG. 7 is a schematic diagram of the composition structure of the transfer window attention mechanism module provided in an embodiment of the present application.
- FIG. 8 is a schematic diagram of the calculation principle of the transfer window attention mechanism module provided in an embodiment of the present application.
- FIG. 9 is a schematic diagram of the autoregression of a context network provided in an embodiment of the present application.
- FIG. 10 is a flow chart of an image processing model training method provided in an embodiment of the present application.
- FIG. 11 is a schematic diagram of an effect test of an image processing model provided in an embodiment of the present application.
- Wasserstein distance: a distance metric function mainly used to measure the difference between two distributions.
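As a small illustration of this definition, the sketch below computes the Wasserstein-1 distance for the simplest case: two one-dimensional empirical samples of equal size, where the optimal matching pairs the sorted values. The function name `wasserstein_1d` and the sample values are illustrative assumptions; the source does not say how, or on which distributions, the distance is evaluated.

```python
def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two equal-size 1-D empirical samples."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    # For equal-size 1-D samples, the optimal transport plan pairs sorted values.
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(a - b) for a, b in pairs) / len(xs)


# Hypothetical sample values, chosen only for illustration.
print(wasserstein_1d([0.0, 1.0, 2.0], [2.0, 1.0, 0.0]))  # same distribution
print(wasserstein_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0]))  # shifted by one unit
```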
- Neural network (NN): in the fields of machine learning and cognitive science, a mathematical or computational model that imitates the structure and function of biological neural networks and is used to estimate or approximate functions.
- Model parameters: quantities that use universal variables to establish the relationship between functions and variables. In artificial neural networks, model parameters are usually real-valued matrices.
- Model training: multi-classification learning on image datasets. The model can be built with deep learning frameworks such as TensorFlow and torch, stacking multiple neural network layers such as CNN layers to form a multi-image classification model.
- the input of the model is a three-channel or original-channel matrix formed by reading the image with tools such as OpenCV.
- the model output is a multi-classification probability, and the image compression result is finally output through algorithms such as softmax.
- the model is driven toward the correct result by objective functions such as cross-entropy.
- VAE: variational autoencoder.
- Latent code: low-dimensional latent variables.
- the latent variable obeys a certain inherent prior probability, and the input image obeys a conditional probability conditioned on the latent variable; the low-dimensional latent variable can therefore describe the information contained in the input image, and the high-dimensional input image can be reconstructed through sampling.
- the variational autoencoder compresses low-dimensional latent variables to reduce information redundancy.
- Hyper prior: based on the latent variables obtained by the encoder from the input image, the hyper prior uses a lightweight network to model each point in the latent variable with a scalar entropy model, and obtains the occurrence probability of each feature point from the entropy model of its feature value for bit rate estimation and entropy coding.
- the hyper prior stores the probability model of the latent variable in a small number of bytes.
- during decoding, the byte stream stored by the hyper prior module is decoded first, and the probabilities decoded from that byte stream are then used to restore the latent variable and reconstruct the image.
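The bit rate estimation described for the hyper prior can be sketched as follows. This is a minimal illustration, assuming that each quantized latent element is modeled by a Gaussian whose mean and scale come from the hyper prior, and that the probability of an integer symbol is the Gaussian mass of its unit-width quantization bin. The source only speaks of a "scalar entropy model", so the Gaussian form, the function names and all numeric values are assumptions.

```python
import math


def gaussian_cdf(x, mu, sigma):
    """Cumulative distribution function of a Gaussian with mean mu and scale sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def estimated_bits(y_hat, mu, sigma, floor=1e-9):
    """Estimated code length in bits of integer symbol y_hat (assumed Gaussian model)."""
    # Probability mass of the quantization bin [y_hat - 0.5, y_hat + 0.5].
    p = gaussian_cdf(y_hat + 0.5, mu, sigma) - gaussian_cdf(y_hat - 0.5, mu, sigma)
    # The floor avoids log2(0) for symbols the model considers almost impossible.
    return -math.log2(max(p, floor))


# Hypothetical quantized latent values with per-element means and scales.
latent = [0, 1, -2, 0]
means = [0.1, 0.8, -1.5, 0.0]
scales = [0.5, 1.0, 2.0, 0.3]
total_bits = sum(estimated_bits(y, m, s) for y, m, s in zip(latent, means, scales))
print(f"estimated size: {total_bits:.2f} bits")
```

Symbols close to the predicted mean with a small scale cost few bits, which is why an accurate probability estimate reduces the size of the stored byte stream.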
- Context model: usually uses autoregression to predict undecoded pixel information from already decoded pixel information, reducing information redundancy.
- autoregressive models use sliding-window linear serial prediction, and their complexity increases exponentially with the dimension of the input data. Although an autoregressive context model can greatly improve model performance, it also significantly increases the computational complexity of the compression model.
- Entropy coding: a lossless coding method that, following the entropy principle, loses no information during encoding. It is also a key module in lossy coding, where it sits at the end of the encoder. Information entropy is the average amount of information of the source (a measure of uncertainty). Common entropy coding methods include Shannon coding, Huffman coding, Exp-Golomb coding, and arithmetic coding. Because the input to entropy coding is the symbols produced by the encoder after a series of operations such as quantization, transformation, motion, and prediction, a suitable entropy coding model is selected according to the distribution of those symbols. Entropy coding is therefore a relatively independent unit that can be applied not only to video encoding and decoding, but also to other encoders such as image coding and point cloud coding.
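To make the relationship between information entropy and an entropy code concrete, the sketch below computes the Shannon entropy of a symbol sequence and the code lengths that Huffman coding (one of the methods listed above) assigns to it. The helper names and the example string are illustrative assumptions and are not taken from the application.

```python
import heapq
import math
from collections import Counter


def shannon_entropy(symbols):
    """Average information per symbol, in bits."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def huffman_code_lengths(symbols):
    """Map each distinct symbol to the length of its Huffman code word."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): 1}
    # Heap entries are (weight, tie_breaker, {symbol: depth}); the unique
    # tie_breaker keeps the dictionaries from ever being compared.
    heap = [(c, i, {s: 0}) for i, (s, c) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every contained symbol one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


data = "aaaabbcd"
lengths = huffman_code_lengths(data)
average_length = sum(lengths[s] for s in data) / len(data)
print(shannon_entropy(data), average_length, lengths)
```

For this input the symbol probabilities are powers of two, so the average Huffman code length equals the entropy; in general the entropy is a lower bound on the average code length.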
- the image coding method needs image features to be set manually; for example, JPEG, BPG, and VVC-intra use orthogonal linear transformations, such as the discrete cosine transform (DCT) and the discrete wavelet transform (DWT), to decorrelate the image pixels before quantization and encoding.
- JPEG compression compresses Y, Cb, and Cr separately, based on the premise that the human eye is sensitive to color but more sensitive to brightness. For example, for a natural picture, JPEG performs DCT decomposition on each 8×8 patch to obtain 64 DCT parameters.
- variable-length coding and Huffman coding can be used to compress redundancy.
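The 8×8 DCT decomposition mentioned for JPEG can be sketched in a few lines. The code below is a direct, unoptimized orthonormal 2-D DCT-II written for readability; it is not the implementation of any particular codec, and the constant-valued patch is a hypothetical input chosen so that the result is easy to check.

```python
import math

N = 8  # JPEG operates on 8x8 patches


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block given as a list of rows."""
    def alpha(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out


# A flat patch: all of its energy should land in the single DC coefficient.
flat_patch = [[128.0] * N for _ in range(N)]
coeffs = dct_2d(flat_patch)
print(round(coeffs[0][0], 6), sum(len(row) for row in coeffs))
```

A smooth patch concentrates its energy in a few low-frequency coefficients, which is what makes the later quantization and entropy coding effective.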
- the compression rate is low; the latent variables must be recovered from the byte stream to reconstruct a high-quality image, and the capacity of the nonlinear image transformation network limits the network's ability to reconstruct a high-quality image; at the same time, the context model in the related art uses PixelCNN serial decoding, so decoding efficiency is low.
- the embodiment of the present application provides an image compression method that compresses the image using an image processing model including an image transformation network, a hyper-prior network and a context network, thereby improving the compression efficiency and the quality of the compressed image.
- FIG. 1 is a schematic diagram of a usage scenario of an image compression method provided in an embodiment of the present application.
- a client having an image processing function or a client having a video processing function is provided on a terminal (including a terminal 10-1 and a terminal 10-2).
- a user can input a corresponding image to be processed through the provided image processing client, and the image processing client can also receive a corresponding compressed image and display the received compressed image to the user; the video processing client can compress each frame of the video through the image processing model provided in an embodiment of the present application to reduce the server storage space occupied by the video.
- the terminal is connected to the server 200 via a network 300, and the network 300 can be a wide area network or a local area network, or a combination of the two, and a wireless link is used to realize data transmission.
- the server 200 is configured to deploy an image processing model and train the image processing model to determine the network parameters of the image transformation network, the hyper-prior network and the context network in the image processing model; and after the image processing model training is completed, the compressed image corresponding to the image to be processed generated by the image processing model is displayed through the terminal (terminal 10-1 and/or terminal 10-2).
- the image processing model needs to be trained to determine the network parameters of the image transformation network, the hyper-prior network and the context network.
- FIG. 2 is a schematic diagram of the composition structure of the electronic device provided by the embodiment of the present application. It can be understood that FIG. 2 only shows an exemplary structure of the electronic device rather than the entire structure. Part or all of the structure shown in FIG. 2 can be implemented as needed.
- the electronic device provided in the embodiment of the present application includes: at least one processor 201, a memory 202, a user interface 203 and at least one network interface 204.
- the various components in the electronic device 20 are coupled together through a bus system 205.
- the bus system 205 is configured to achieve connection and communication between these components.
- the bus system 205 also includes a power bus, a control bus and a status signal bus.
- the various buses are all marked as the bus system 205 in FIG. 2.
- the user interface 203 may include a display, a keyboard, a mouse, a trackball, a click wheel, keys, buttons, a touch pad or a touch screen.
- the memory 202 can be a volatile memory or a non-volatile memory, or can include both volatile and non-volatile memories.
- the memory 202 in the embodiment of the present application can store data to support the operation of the terminal (such as 10-1). Examples of such data include: any computer program used to operate on the terminal (such as 10-1), such as an operating system and an application program.
- the operating system includes various system programs, such as a framework layer, a core library layer, and a driver layer, which are used to implement various basic services and process hardware-based tasks.
- Applications can include various applications.
- the image compression device provided in the embodiment of the present application can be implemented in a combination of software and hardware.
- the image compression device provided in the embodiment of the present application can be a processor in the form of a hardware decoding processor, which is programmed to execute the image compression method provided in the embodiment of the present application.
- the processor in the form of a hardware decoding processor can adopt one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), programmable logic devices (PLDs), complex programmable logic devices (CPLDs), field programmable gate arrays (FPGAs), or other electronic components.
- the image compression device provided in an embodiment of the present application can be directly embodied as a combination of software modules executed by a processor 201.
- the software module can be located in a storage medium, and the storage medium is located in a memory 202.
- the processor 201 reads the executable instructions included in the software module in the memory 202, and completes the image compression method provided in an embodiment of the present application in combination with necessary hardware (for example, including the processor 201 and other components connected to the bus 205).
- the processor 201 can be an integrated circuit chip with signal processing capabilities, such as a general-purpose processor, a digital signal processor (DSP), or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, where the general-purpose processor can be a microprocessor or any conventional processor.
- the device provided in the embodiment of the present application can be directly executed by a processor 201 in the form of a hardware decoding processor.
- the image compression method provided in the embodiment of the present application can be implemented by one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), programmable logic devices (PLDs), complex programmable logic devices (CPLDs), field programmable gate arrays (FPGAs), or other electronic components.
- the memory 202 in the embodiment of the present application is configured to store various types of data to support the operation of the electronic device 20. Examples of such data include any executable instructions for operating on the electronic device 20; the program implementing the image compression method in the embodiment of the present application may be included in these executable instructions.
- the image compression device provided in the embodiments of the present application can be implemented in software.
- Figure 2 shows an image compression device stored in the memory 202, which can be software in the form of programs and plug-ins, and includes a series of modules.
- the image compression device includes the following software modules: an encoding module 2081 and an information processing module 2082.
- the encoding module 2081 is configured to encode the image to be compressed to obtain a first latent variable corresponding to the image to be compressed;
- the information processing module 2082 is configured to determine a hyper-prior probability estimate corresponding to the first latent variable;
- the information processing module 2082 is configured to partially decode the first latent variable according to the hyper-prior probability estimate to obtain a partial decoding result of the first latent variable;
- the information processing module 2082 is further configured to generate a compressed image corresponding to the image to be compressed based on the partial decoding result of the first latent variable and the first latent variable corresponding to the image to be compressed, wherein the data amount of the compressed image is smaller than the data amount of the image to be compressed.
- the information processing module 2082 is further configured to perform autoregression on the second latent variable according to the partial decoding result to obtain the mean and variance of the second latent variable;
- the information processing module 2082 is further configured to decode the second latent variable using the mean and variance to obtain a compressed image.
- the information processing module 2082 is configured to encode the first latent variable to obtain a third latent variable
- the information processing module 2082 is configured to perform entropy coding on the third latent variable to obtain an entropy coding of the third latent variable;
- the information processing module 2082 is configured to decode the entropy coding of the third latent variable to obtain a fourth latent variable
- the information processing module 2082 is configured to decode the fourth latent variable to obtain a hyper-prior probability estimate.
- the information processing module 2082 is configured to group the second latent variables to obtain at least two groups of sub-latent variables;
- the information processing module 2082 is configured to perform spatial autoregression on each group of sub-latent variables through a checkerboard;
- the information processing module 2082 is configured to predict the undecoded channel group through the partial decoding results after each group of sub-latent variables completes the spatial autoregression, until the second latent variable completely completes the autoregression, and obtain the mean and variance of the second latent variable.
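The grouping and checkerboard steps above can be pictured as a decoding schedule. The sketch below is only an illustration under stated assumptions: the channels are split into contiguous groups, positions where `(row + column)` is odd are treated as the first-pass (anchor) positions, and each group is decoded in two passes. The function names, the grouping rule and the choice of anchor parity are hypothetical and are not specified in the source.

```python
def checkerboard_mask(height, width):
    """1 marks positions decoded in the first pass, 0 the remaining positions."""
    return [[(r + c) % 2 for c in range(width)] for r in range(height)]


def split_channels(num_channels, num_groups):
    """Split channel indices into contiguous groups of near-equal size."""
    base, extra = divmod(num_channels, num_groups)
    groups, start = [], 0
    for g in range(num_groups):
        size = base + (1 if g < extra else 0)
        groups.append(list(range(start, start + size)))
        start += size
    return groups


def decoding_schedule(num_channels, num_groups, height, width):
    """List of (channels, positions) steps; each step is decodable in parallel."""
    mask = checkerboard_mask(height, width)
    first = [(r, c) for r in range(height) for c in range(width) if mask[r][c] == 1]
    second = [(r, c) for r in range(height) for c in range(width) if mask[r][c] == 0]
    schedule = []
    for channels in split_channels(num_channels, num_groups):
        schedule.append((channels, first))   # pass 1: checkerboard anchors
        schedule.append((channels, second))  # pass 2: predicted from decoded values
    return schedule


for step, (channels, positions) in enumerate(decoding_schedule(4, 2, 2, 2)):
    print(step, channels, positions)
```

Under these assumptions the number of sequential steps is twice the number of channel groups and does not grow with the image size, in contrast to serial pixel-by-pixel decoding.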
- the information processing module 2082 is further configured to decode the second latent variable using the mean and variance to obtain a decoding result of the second latent variable;
- the information processing module 2082 is configured to alternately segment and transfer the attention mechanism on the decoding result of the second latent variable until the decoding result of the second latent variable is completely segmented to obtain a compressed image.
- the information processing module 2082 is further configured to encode the image to be compressed through an image transformation network of the image processing model to obtain a first latent variable
- the second latent variable is autoregressed according to the partial decoding results to obtain the mean and variance of the second latent variable
- the second latent variable is decoded using the mean and variance through the image transformation network to obtain the compressed image.
- the information processing module 2082 is further configured to obtain a first training sample set corresponding to the image processing model, the first training sample set including at least one set of noise-free training samples;
- the information processing module 2082 is further configured to configure random noise for the first training sample set to obtain a second training sample set;
- the information processing module 2082 is further configured to obtain initial parameters of the image processing model
- the information processing module 2082 is also configured to train the image processing model based on the initial parameters of the image processing model and the loss function of the image processing model through a first training sample set and a second training sample set to determine the image transformation network parameters, hyper-prior network parameters and context network parameters of the image processing model.
- the information processing module 2082 is configured to determine a threshold value of the amount of dynamic noise that matches the use environment of the image processing model when the use environment of the image processing model is video image compression;
- a dynamic quantity of random noise is configured for the first training sample to obtain a second training sample set that matches the dynamic noise threshold.
- the information processing module 2082 is configured to determine a fixed noise amount threshold that matches the use environment of the image processing model when the use environment of the image processing model is medical image compression;
- a fixed quantity of random noise is configured for the first training sample to obtain a second training sample set that matches the fixed noise threshold.
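One possible reading of the fixed and dynamic noise configurations is sketched below. It assumes that the "noise amount threshold" bounds how many values of a training sample are perturbed, that the fixed mode always perturbs exactly that many values, and that the dynamic mode draws a random count up to the threshold for each sample. The Gaussian noise, the `sigma` value, the function names and the flat-list sample format are all assumptions made for illustration; the source does not define them.

```python
import random


def add_noise(sample, num_noisy, sigma=0.05, rng=random):
    """Return a copy of `sample` with Gaussian noise added to `num_noisy` values."""
    noisy = list(sample)
    for i in rng.sample(range(len(noisy)), num_noisy):
        noisy[i] += rng.gauss(0.0, sigma)
    return noisy


def build_second_set(first_set, threshold, dynamic, seed=0):
    """Derive a noisy training set from a noise-free one (hypothetical helper)."""
    rng = random.Random(seed)
    second_set = []
    for sample in first_set:
        limit = min(threshold, len(sample))
        # Dynamic: the noise amount varies per sample, up to the threshold.
        # Fixed: every sample receives exactly the threshold amount.
        count = rng.randint(0, limit) if dynamic else limit
        second_set.append(add_noise(sample, count, rng=rng))
    return second_set


first_set = [[0.5] * 10 for _ in range(3)]
fixed_set = build_second_set(first_set, threshold=4, dynamic=False)
dynamic_set = build_second_set(first_set, threshold=4, dynamic=True)
print([sum(a != b for a, b in zip(s, n)) for s, n in zip(first_set, dynamic_set)])
```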
- the information processing module 2082 is configured to obtain the pixel difference between the compressed image and the image to be compressed; obtain the number of bytes when storing the second latent variable and the fourth latent variable in the image processing model; and determine the fusion loss function of the image processing model based on the pixel difference and the number of bytes.
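A fusion loss built from the pixel difference and the stored byte count can be sketched as a rate-distortion trade-off. The version below assumes mean squared error as the pixel difference, bits per pixel as the rate term, and a weighting factor `lam`; the source only states that the loss is determined from the pixel difference and the number of bytes, so the exact form, the weighting and the function name are assumptions.

```python
def fusion_loss(original, reconstructed, bytes_second, bytes_fourth, lam=0.01):
    """Rate-distortion style loss: bits per pixel plus weighted mean squared error."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("images must be non-empty and of equal size")
    num_pixels = len(original)
    # Distortion: mean squared pixel difference between the two images.
    distortion = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / num_pixels
    # Rate: bytes used to store the second and fourth latent variables, in bits per pixel.
    rate_bpp = 8.0 * (bytes_second + bytes_fourth) / num_pixels
    return rate_bpp + lam * distortion


# Hypothetical flattened 2x2 images and byte counts.
loss = fusion_loss([10.0, 20.0, 30.0, 40.0], [12.0, 20.0, 30.0, 38.0], 3, 1, lam=0.5)
print(loss)
```

A larger `lam` favors reconstruction quality, while a smaller one favors a smaller byte stream.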
- the embodiment of the present application also provides a computer program product or a computer program, which includes computer executable instructions, and the computer executable instructions are stored in a computer-readable storage medium.
- the processor of the computer device or electronic device reads the computer executable instructions from the computer-readable storage medium, and the processor executes the computer executable instructions, so that the computer device executes the different embodiments and combinations of the embodiments provided by the above-mentioned image compression method.
- after the training of the image processing model is completed, the model can be deployed in a server or a cloud server network.
- the image compression device provided in the present application can also be deployed in the electronic device shown in Figure 2 to execute the image compression method provided in the embodiment of the present application.
- FIG. 3A is a flow chart of the image compression method provided by the embodiment of the present application, which includes the following steps:
- Step 3001 Encode the image to be compressed to obtain a first latent variable corresponding to the image to be compressed.
- the image to be compressed can be a natural image.
- the image to be compressed can be encoded through an image transformation network for image encoding, such as a variational autoencoder, to obtain a first latent variable corresponding to the image to be compressed.
- the first latent variable refers to a random variable that exists in the model but cannot be directly observed, and is used to represent the potential characteristics of the input data.
- the first latent variable can be the output of the hidden layer of the image transformation network (i.e., the middle layer between the input layer and the output layer of the image transformation network).
- the image transformation network can be a neural network model for encoding the image to be compressed, including an input layer, at least one hidden layer and an output layer.
- the image to be compressed is encoded through the image transformation network to obtain a first latent variable corresponding to the image to be compressed.
- the high-definition images in the electronic games are usually compressed 4 times in batches.
- the resolution of the original game image is 1024*1024, and after 4 times compression, a low-resolution game image with a resolution of 256*256 is formed.
- the image compression method of the present application can batch convert image resources into compressed images adapted to the graphics processing unit (GPU) of the terminal, thereby reducing the memory overhead on the terminal side and the network overhead during image transmission.
- the original game image with a resolution of 1024*1024 is compressed 8 times, so that the size of the compressed image obtained after decoding is smaller, reducing the storage cost of the image.
- Step 3002 Determine the hyper-prior probability estimate corresponding to the first latent variable.
- the hyper-prior probability estimate can be determined in the following manner: encoding the first latent variable to obtain an encoding result, quantizing the encoding result to obtain a quantization result, and then decoding the quantization result to obtain the hyper-prior probability estimate.
- the encoding of the first latent variable can be achieved by a hyper-prior encoder, and the decoding of the quantization result can be achieved by a hyper-prior decoder; the hyper-prior encoder and the hyper-prior decoder can be included in the Transformer model.
- the obtained hyper-prior probability estimate can be used as a reference for the subsequent partial decoding, so that the decoding result is more accurate.
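- as an illustration of the encode, quantize, decode sequence described above, the following is a minimal sketch. It assumes simple average pooling and nearest-neighbor upsampling as stand-ins for the hyper-prior encoder and decoder; the function names `hyper_encode`, `quantize` and `hyper_decode` are hypothetical and are not taken from this application.

```python
import numpy as np

def hyper_encode(y, factor=2):
    """Toy stand-in for the hyper-prior encoder: pool |y| over factor x factor blocks."""
    h, w = y.shape
    blocks = np.abs(y).reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

def quantize(z):
    """Quantize by rounding to the nearest integer."""
    return np.rint(z)

def hyper_decode(z_hat, factor=2):
    """Toy stand-in for the hyper-prior decoder: returns a per-element (mean, scale)."""
    scale = np.repeat(np.repeat(z_hat, factor, axis=0), factor, axis=1)
    scale = np.maximum(scale, 0.1)  # keep the scale strictly positive
    mean = np.zeros_like(scale)     # assumption: zero-mean prior in this toy example
    return mean, scale

rng = np.random.default_rng(0)
y = rng.normal(size=(4, 4)) * 3.0   # plays the role of the first latent variable
z_hat = quantize(hyper_encode(y))   # compact side information
mean, scale = hyper_decode(z_hat)   # hyper-prior estimate for every element of y
```

- the point of the sketch is only the data flow: the side information `z_hat` is smaller than `y`, yet it yields one pair of distribution parameters per element of `y`.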
- the hyper-prior probability estimation can be the process of estimating the parameters of the prior distribution, where the parameters depend on the form of the prior distribution.
- for example, when the prior distribution of the first latent variable is a normal distribution, the corresponding parameters of the prior distribution can be the mean and variance.
- the hyper-prior probability estimate is the value obtained by estimating the parameters of the prior distribution.
- Step 3003 Partially decode the first latent variable according to the hyper-prior probability estimate to obtain a partial decoding result of the first latent variable.
- the hyper-prior probability estimate is used as reference information for decoding to partially decode the first latent variable, that is, to decode a portion of the pixels, so that when other pixels are subsequently decoded, prediction (decoding) can be performed based on the partial decoding result.
- the first latent variable is grouped in the channel dimension to obtain multiple channel latent variable groups corresponding to each channel dimension, and then, an autoregression (such as a checkerboard autoregression) method can be used to decode part of the channel latent variable groups (such as one channel latent variable group) in the obtained multiple channel latent variable groups to obtain a partial decoding result of the first latent variable, and then the partial decoding result is used as prediction reference information to decode the next undecoded channel latent variable group.
- the selection of the decoded channel latent variable group can be random selection.
- channel refers to the component of color information that constitutes a color image, or the characteristic component used to represent an image.
- in RGB mode, a color image consists of three color channels: red (R), green (G), and blue (B);
- in HSV mode, a color image consists of three channels: hue, saturation, and brightness.
- the first latent variable is grouped based on these three color dimensions to obtain a channel latent variable group corresponding to the red (R) dimension, a channel latent variable group corresponding to the green (G) dimension, and a channel latent variable group corresponding to the blue (B) dimension, each latent variable group including multiple pixels.
- checkerboard autoregression is used instead of serial autoregression within each latent variable group, so that the autoregression processing is performed orthogonally in the spatial and channel dimensions, and the channel group that is decoded first is used to predict the undecoded channel group.
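- the channel grouping and the checkerboard split described above can be sketched as follows. This is an illustrative example under assumptions: the latent variable is stored as a (channels, height, width) array, and the helper names `split_channel_groups` and `checkerboard_masks` are hypothetical.

```python
import numpy as np

def split_channel_groups(latent, num_groups):
    """Split a (channels, height, width) latent into groups along the channel axis."""
    return np.array_split(latent, num_groups, axis=0)

def checkerboard_masks(height, width):
    """Return two complementary boolean masks forming a checkerboard pattern."""
    rows, cols = np.indices((height, width))
    anchor = (rows + cols) % 2 == 0   # positions decoded in the first pass
    return anchor, ~anchor            # remaining positions use the first pass as context

latent = np.zeros((6, 4, 4))
groups = split_channel_groups(latent, 3)
anchor, non_anchor = checkerboard_masks(4, 4)
```

- within one group, all anchor positions can be decoded in parallel, and then all non-anchor positions, which is why the checkerboard scheme needs only two passes per group instead of one pass per pixel.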
- Step 3004 Generate a compressed image corresponding to the image to be compressed based on the partial decoding result of the first latent variable and the first latent variable corresponding to the image to be compressed.
- the data volume of the compressed image is smaller than the data volume of the image to be compressed.
- a compressed image corresponding to the image to be compressed can be generated in the following manner: based on the partial decoding result of the first latent variable and the first latent variable corresponding to the image to be compressed, a new image is generated by performing autoregressive modeling on each pixel of the image.
- the process may include masked convolution and pixel-by-pixel conditional probability modeling. For example, for each convolution layer, a suitable mask is used to cover future pixels, ensuring that only known pixel values are used to predict the probability distribution of the current pixel value during training.
- Each pixel is conditionally modeled using a series of convolutional layers. Each convolutional layer is responsible for modeling a subset of the input image. By modeling the conditional probability distribution of each pixel (given the pixel values to its left and above), known pixels can be used to predict the possible value of the current pixel.
- conditional probability distribution refers to the probability distribution of the possible values of variable Y given variable X, that is, the distribution of Y given X. Modeling the conditional probability distribution of each pixel means determining, for each pixel (such as a target pixel), the distribution of the possible pixel values of the target pixel given the pixel values of its associated pixels (such as the pixels to the left of and above the target pixel).
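- a mask that hides future pixels, so that only the pixels above and to the left of the current pixel contribute, can be sketched as below. The function name `causal_mask` is hypothetical; the mask would be multiplied element-wise with a convolution kernel before the convolution is applied.

```python
import numpy as np

def causal_mask(kernel_size, include_center=False):
    """Build a kernel mask that keeps only pixels above and to the left of the center."""
    mask = np.zeros((kernel_size, kernel_size), dtype=np.float32)
    center = kernel_size // 2
    mask[:center, :] = 1.0         # every row above the current pixel
    mask[center, :center] = 1.0    # pixels to the left in the current row
    if include_center:
        mask[center, center] = 1.0 # later layers may also see the current position
    return mask

mask = causal_mask(3)
```

- for a 3*3 kernel the mask keeps the top row and the left neighbor and zeroes out the current pixel and everything after it in raster order.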
- a compressed image corresponding to the image to be compressed can be generated in the following manner: quantizing the first latent variable to obtain a second latent variable; performing autoregression on the second latent variable according to the partial decoding result to obtain the mean and variance of the second latent variable; and decoding the second latent variable using the mean and variance to obtain the compressed image.
- the process of quantizing the first latent variable can be regarded as the process of reducing the dimension of the first latent variable.
- the first latent variable is mapped to a preset low-dimensional space to obtain a second latent variable.
- the low-dimensional space here refers to a low-dimensional space relative to the dimension of the first latent variable.
- the first latent variable can be quantized by nonlinear dimensionality reduction or quantization matrix.
- the operation of quantizing the first latent variable can also be implemented by a vector quantizer.
- the vector quantizer is a system that maps a continuous or discrete vector sequence into a digital sequence suitable for communication or storage on a digital channel. By quantizing the first latent variable, data compression is achieved while maintaining the necessary data fidelity.
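- a vector quantizer of the kind mentioned above can be sketched as a nearest-codeword lookup. The codebook values below are made up for illustration, and `vector_quantize` is a hypothetical helper rather than the quantizer of this application.

```python
import numpy as np

def vector_quantize(vectors, codebook):
    """Map each vector to the index of its nearest codeword (squared Euclidean distance)."""
    diffs = vectors[:, None, :] - codebook[None, :, :]
    dists = (diffs ** 2).sum(axis=-1)   # shape: (num_vectors, num_codewords)
    indices = dists.argmin(axis=1)
    return indices, codebook[indices]

codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]])
vectors = np.array([[0.1, -0.2], [0.9, 1.2], [-0.8, 1.7]])
indices, reconstructed = vector_quantize(vectors, codebook)
```

- only the integer indices need to be stored or transmitted; the receiver recovers an approximation of each vector by looking the index up in the same codebook.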
- the process of performing autoregression on the second latent variable according to the partial decoding result to obtain the mean and variance of the second latent variable can be achieved as follows: constructing sequence data according to the partial decoding result and the second latent variable, fitting the sequence data through at least one of an autoregressive model or a conditional heteroskedasticity model to obtain at least one of a mean dynamic model and a variance dynamic model of the second latent variable, performing parameter estimation on the mean dynamic model and the variance dynamic model of the second latent variable respectively through maximum likelihood estimation or other parameter estimation methods to obtain the mean and variance of the second latent variable.
- the autoregressive model is a type of stationary time series model that can be used to predict and analyze data with autocorrelation.
- with an autoregressive model, it is possible to study whether there is a dependency relationship between data at different time points, as well as the strength of this dependency relationship.
- with the autoregressive model, the average trend change of the random variable value at future moments can be predicted, and the mean dynamic model of the second latent variable can be obtained.
- the conditional heteroskedasticity model is explained.
- the conditional heteroskedasticity model is a model used to describe the existence of heteroskedasticity in data (i.e., the variance is not constant). In practical applications, the variance of the data may change significantly over time or due to changes in other factors.
- the conditional heteroskedasticity model can better capture this heteroskedasticity. By fitting the conditional heteroskedasticity model to the sequence data, a variance dynamic model can be obtained.
- the mean dynamic model and variance dynamic model of the second latent variable are explained.
- the mean dynamic model is a model that describes the average trend change of time series data and indicates the dynamic characteristics in the sequence data.
- the variance dynamic model is a model that describes how the variance of time series data changes dynamically.
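- to make the idea of fitting sequence data concrete, the following sketch fits a first-order autoregressive model by least squares and reports the residual variance. It is a simplified illustration under assumptions: a first-order model is chosen for brevity, least squares is used in place of maximum likelihood estimation, and the function name `fit_ar1` is hypothetical.

```python
import numpy as np

def fit_ar1(series):
    """Fit x[t] = intercept + coef * x[t-1] by least squares; also return residual variance."""
    x_prev, x_next = series[:-1], series[1:]
    design = np.column_stack([np.ones_like(x_prev), x_prev])
    (intercept, coef), *_ = np.linalg.lstsq(design, x_next, rcond=None)
    residuals = x_next - (intercept + coef * x_prev)
    return intercept, coef, residuals.var()

# Noise-free sequence generated by x[t] = 1 + 0.5 * x[t-1], used as a sanity check.
values = [0.0]
for _ in range(9):
    values.append(1.0 + 0.5 * values[-1])
intercept, coef, variance = fit_ar1(np.array(values))
```

- the fitted intercept and coefficient play the role of a mean dynamic model, and the residual variance is the simplest constant-variance counterpart of a variance dynamic model.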
- the second latent variable can be decoded using the mean and variance to obtain a compressed image in the following manner: construct a multivariate Gaussian distribution based on the mean and variance of the latent variable.
- the decoder is usually a neural network structure corresponding to the encoder, which can map the latent variable back to the original image space and map the input to the generated image through the decoder.
- the second latent variable can be decoded using the mean and variance to obtain a compressed image in the following manner: the second latent variable is decoded using the mean and variance to obtain a decoding result of the second latent variable; the decoding result of the second latent variable is alternately segmented and the attention mechanism is transferred until the decoding result of the second latent variable is completely segmented to obtain a compressed image.
- the second latent variable is obtained by quantizing the first latent variable, and the second latent variable contains more abstract and compressed information for decoding or reconstructing the image than the first latent variable, thereby making it possible to improve the image compression efficiency by compressing the image to be compressed based on the second latent variable.
- segmenting the decoding result of the second latent variable means dividing it into different regions or blocks so that they can be processed in parallel to improve efficiency; using the attention mechanism to focus on a specific part of the segmented region makes it possible to concentrate resources on the relevant parts of the image and improve the accuracy of image reconstruction. In some embodiments, alternately segmenting and transferring the attention mechanism on the decoding result of the second latent variable, until the decoding result of the second latent variable is completely segmented to obtain a compressed image, includes:
- the obtained multiple image regions are combined to obtain a compressed image.
- autoregression is performed alternately in the spatial dimension and the channel dimension, which greatly improves compression efficiency.
- the input image is first transformed to generate a low-dimensional latent variable (latent code), then the latent variable is modeled for probability estimation, and finally the latent variable is compressed into a bit stream using entropy coding according to the calculated probability; during the decompression process, the latent variable is first decoded and restored according to the bit stream, and then the image is reconstructed according to the latent variable, achieving efficient image compression.
- the processing steps shown in Figure 3A can be implemented using an image processing model.
- the image processing model used in the image compression method provided in this application includes: an image transformation network, a hyper-prior network, and a context network. The following describes the working processes of the image processing model including: the image transformation network, the hyper-prior network, and the context network.
- FIG. 3B is a flow chart of an image compression method provided in an embodiment of the present application. It can be understood that the steps shown in FIG. 3B can be performed by various electronic devices running an image compression device, such as a server or server cluster with an image compression function, which is used to compress each image frame in a received image or received video through an image processing model to reduce the storage space occupied by image storage. The steps shown in FIG. 3B are described below.
- Step 301 The electronic device encodes the image to be compressed through the image transformation network of the image processing model to obtain a first latent variable.
- FIG. 4 is a schematic diagram of data flow of an image processing model in an embodiment of the present application.
- the image processing model in the present application includes: an image transformation network, a hyper-prior network, and a context network; the functions are as follows:
- the role of the image transformation network is to use high-resolution natural images to generate low-dimensional latent variables (latent codes). Assuming that the first latent variable obeys some inherent prior probability and the input image to be compressed obeys the conditional probability conditional on the latent variable, the image transformation network should make the probability estimates constructed by the encoder and decoder close enough so that the image reconstructed by the latent variable is close to the original image.
- the hyper-prior network uses the encoder structure and the decoder structure to model the entropy value of each point in the latent variable.
- in the process of obtaining the entropy model of the feature values, the bit rate of the compressed image is estimated and entropy coding is performed based on the occurrence probability of the feature points.
- the hyper-prior network can store the probability modeling of the latent variables in a smaller number of bytes, providing an auxiliary reference for the subsequent decoding of the context network.
- the context network uses an autoregressive approach to predict the undecoded pixel information using the decoded pixel information, and finally inputs the prediction result into the decoder network of the image transformation network for decoding to obtain the compressed image.
- the context network can reduce information redundancy and improve the efficiency of image compression.
- the following describes the model structure and working principle of the image transformation network, hyper-prior network and context network included in the image processing model.
- the image transformation network includes: an image encoder network and an image decoder network;
- the image encoder network includes: a transfer window attention mechanism module (Swin Transformer Block) and a block fusion module (Patch Merge Block), wherein the block fusion module includes in sequence: a space-to-depth conversion layer (Space-to-Depth), a normalization layer (LayerNorm) and a mapping layer (Linear);
- the image decoder network includes: a transfer window attention mechanism module (Swin Transformer Block) and a block segmentation module (Patch Split Block), wherein the block segmentation module includes in sequence: a mapping layer (Linear), a normalization layer (LayerNorm) and a depth-to-space conversion layer (Depth-to-Space).
- FIG. 6 is a schematic diagram of the working process of the space-depth conversion layer and the depth-space conversion layer in the embodiment of the present application. Since the image processing model needs to compress the image to be compressed, so that the volume of the compressed image is smaller than the image to be compressed, but the resolution is close to the image to be compressed, the space-depth conversion layer (Space-to-Depth) in the encoder network is configured to perform downsampling, and the depth-space conversion layer (Depth-to-Space) in the decoder network is configured to perform upsampling.
- Space-to-Depth divides every 2*2 group of adjacent pixels into a block (patch), splices the pixels at the same position in each block (shown with the same shading in the figure) and connects them along the channel direction to obtain four 2*2 blocks.
- Depth-to-Space is the reverse operation of Space-to-Depth, which converts four 2*2 blocks into a 4*4 image by upsampling.
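- the two conversions can be sketched with array reshaping. The example assumes a (height, width, channels) layout and a block size of 2; the function names `space_to_depth` and `depth_to_space` are illustrative and do not refer to a specific library.

```python
import numpy as np

def space_to_depth(x, block=2):
    """Move each block x block spatial patch into the channel dimension (downsampling)."""
    h, w, c = x.shape
    x = x.reshape(h // block, block, w // block, block, c)
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(h // block, w // block, block * block * c)

def depth_to_space(x, block=2):
    """Inverse of space_to_depth: move channel groups back into space (upsampling)."""
    h, w, c = x.shape
    out_c = c // (block * block)
    x = x.reshape(h, w, block, block, out_c)
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(h * block, w * block, out_c)

image = np.arange(16).reshape(4, 4, 1)   # 4*4 single-channel image
packed = space_to_depth(image)           # 2*2 spatial grid with 4 channels
restored = depth_to_space(packed)        # back to the 4*4 image
```

- no pixel values are lost in either direction: the 4*4 image and the four 2*2 channel planes contain the same 16 values, only arranged differently.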
- FIG. 7 is a schematic diagram of the composition structure of the transfer window attention mechanism module in the embodiment of the present application, wherein the transfer window attention mechanism module (Swin Transformer Block) mainly includes layer normalization, a multi-layer perceptron, a regular window multi-head attention mechanism and a transfer window multi-head attention mechanism.
- the use of the window attention mechanism can effectively reduce the computational complexity in the operation process compared to the traditional attention mechanism, greatly improve the efficiency of the calculation, so that the attention mechanism can be applied in the processing of large images.
- however, with window attention alone, the receptive field of the framework is severely limited. Therefore, by adding the transfer window attention mechanism, the receptive field of the attention mechanism is greatly improved without increasing the computational complexity.
- the transfer window attention mechanism module constructs a hierarchical feature map by merging deeper image blocks, and since the attention is only calculated in each local window, it has a linear computational complexity for the input image size. As shown in FIG. 7, in the present application, the transfer window attention mechanism module performs local self-attention in each non-overlapping window of the feature map and retains the feature size.
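- the partitioning of a feature map into non-overlapping windows, and the shift applied before the transfer window attention, can be sketched as follows. This is an illustration that assumes a (height, width, channels) feature map and a cyclic shift of half a window, as in the publicly described Swin Transformer design; the helper names are hypothetical, and the attention computation itself is omitted.

```python
import numpy as np

def window_partition(x, window):
    """Split a feature map into non-overlapping windows of window x window tokens."""
    h, w, c = x.shape
    x = x.reshape(h // window, window, w // window, window, c)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, window * window, c)

def shift_windows(x, window):
    """Cyclically shift the feature map by half a window before partitioning."""
    shift = window // 2
    return np.roll(x, shift=(-shift, -shift), axis=(0, 1))

features = np.arange(16 * 16 * 3, dtype=np.float32).reshape(16, 16, 3)
regular = window_partition(features, 8)                    # regular windows
shifted = window_partition(shift_windows(features, 8), 8)  # shifted windows
```

- attention is computed independently inside each window, so the cost grows linearly with the number of windows; alternating regular and shifted windows lets information flow across window borders.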
- Figure 7 shows the internal structure of two consecutive Swin Transformer Blocks, including Layer Norm, multi-head self-attention and fully connected layers, which are connected internally using shortcut connections.
- the window size used by the encoder network and decoder network of the image transformation network is 8, the numbers of channels are 128, 192, 256, and 320 respectively, and the numbers of stacked transfer window attention mechanism modules are 2, 2, 6, and 2 respectively.
- Step 302 Determine a hyper-prior probability estimate according to the first latent variable through a hyper-prior network.
- the encoder network of the hyper-prior network includes: a transfer window attention mechanism module and a block fusion module; the decoder network of the hyper-prior network includes: a transfer window attention mechanism module and a block segmentation module. The window size is 4, the numbers of channels are 192 and 192 respectively, and the numbers of stacked transfer window attention mechanism modules are 5 and 1 respectively.
- the hyper-prior network determines the hyper-prior probability estimate according to the first latent variable, which can be implemented in the following way:
- the first latent variable y is encoded by the hyper-prior encoder of the hyper-prior network to obtain the third latent variable z; the hyper-prior probability estimate corresponding to the first latent variable is determined by the quantization module (Q), arithmetic encoding module (AE) and arithmetic decoding module (AD) of the hyper-prior network. The third latent variable z is quantized by the quantization module (Q) of the hyper-prior network to obtain the fourth latent variable. When compressing, the fourth latent variable is encoded using the arithmetic encoding module.
- the fourth latent variable is obtained by quantizing the third latent variable z. During compression, the fourth latent variable is compressed to obtain a byte stream, and during decompression, the fourth latent variable is restored from the byte stream.
- the fourth latent variable is decoded by the decoder network of the hyper-prior network shown in FIG. 4 to obtain the hyper-prior probability estimate N(μ, σ).
- the encoder of the hyper-prior network first compresses the probability or cumulative probability distribution into z, which is quantized, entropy encoded and transmitted to the decoding end of the hyper-prior network; the modeling parameters of the latent representation y are then learned through decoding at the decoding end.
- the compressed code stream file is obtained by modeling the quantized second latent variable and entropy encoding it, and arithmetic decoding restores the second latent variable from the byte stream. Then the entropy decoding result is input into the decoding module to obtain the final compressed image.
- Step 303 Quantize the first latent variable to obtain a second latent variable, and input the second latent variable into the context network.
- Step 304 Autoregression is performed on the second latent variable through the context network to obtain the mean and variance of the second latent variable.
- the arithmetic encoder models it according to the probability distribution of the second latent variable to obtain a byte stream.
- the electronic device performs autoregression on the second latent variable according to the partial decoding result through the context network, performs probability modeling on the second latent variable, calculates the mean and variance of the second latent variable, and then the arithmetic encoder performs modeling according to the probability distribution of the second latent variable to obtain a byte stream.
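- the link between the predicted mean and variance and the size of the byte stream can be illustrated with a small calculation. The sketch assumes that each quantized symbol is an integer and that its probability is the Gaussian probability mass on an interval of width 1 around it, which is a common convention in learned image compression and is not stated in this application; the function names are hypothetical.

```python
import math

def gaussian_cdf(x, mean, std):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

def symbol_bits(symbol, mean, std):
    """Ideal code length, in bits, of an integer symbol under N(mean, std^2)."""
    upper = gaussian_cdf(symbol + 0.5, mean, std)
    lower = gaussian_cdf(symbol - 0.5, mean, std)
    probability = max(upper - lower, 1e-9)  # avoid log(0) for very unlikely symbols
    return -math.log2(probability)

likely = symbol_bits(0, mean=0.0, std=1.0)    # symbol close to the predicted mean
unlikely = symbol_bits(3, mean=0.0, std=1.0)  # symbol far from the predicted mean
```

- an arithmetic encoder approaches this ideal code length, so the more accurately the context network predicts the mean and variance, the fewer bytes are needed.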
- FIG. 9 is a schematic diagram of the autoregression of the context network in an embodiment of the present application.
- the context network performs autoregression on the second latent variable according to the partial decoding result to obtain the mean and variance of the second latent variable, which can be achieved in the following manner:
- the second latent variable is grouped to obtain at least two groups of sub-latent variables; each group of sub-latent variables is subjected to spatial autoregression through a checkerboard; after each group of sub-latent variables completes spatial autoregression, the undecoded channel group is predicted through partial decoding results until the second latent variable completely completes autoregression, thereby obtaining the probability distribution of the second latent variable.
- spatial autoregression usually assumes that the features of a spatial location are correlated with the features of its surrounding neighboring locations. This correlation can be represented by a weight matrix (usually called a spatial weight matrix), which describes the spatial relationship between points in space.
- the spatial weight matrix can be used to describe the association between latent variables and predicted results.
- autoregression in the spatial dimension can be implemented by associating the current decoded symbol with the decoded symbols, modeling the variables probabilistically, and calculating the probability of all observable neighboring symbols.
- channel dimension autoregression can be achieved by dividing the channels of the second latent variable into K groups for autoregression to reduce redundancy between channels, and using the channel groups decoded first to perform the autoregressive convolution g_ch in the channel direction to predict the context expression of the undecoded channel groups; for the process, refer to Formula 2:
- the setting of the number of channel groups is crucial to balancing compression performance and running speed.
- K = 5 is the number of groups used in this application, which is the preferred value for the image processing model.
- a checkerboard spatial context autoregression model and a channel context autoregression model are combined to realize an accelerated operation of orthogonally alternating autoregression in the spatial and channel dimensions.
- the latent variables are grouped in the channel dimension, and the checkerboard autoregression is used instead of the serial autoregression in each latent variable group.
- the channel autoregression is used to predict the undecoded channel group with the first decoded channel group.
- the context network performs autoregression prediction based on the hyper-prior probability modeling.
- the first part of the checkerboard in the first channel group is predicted, and then the remaining checkerboard part is predicted with the currently predicted checkerboard result.
- the prediction of the first channel group has been completed.
- the predicted results of the first group will be used as information reference for subsequent probability modeling for joint calculation.
- the entire operation process performs autoregression orthogonally and alternately in the spatial and channel dimensions, thereby effectively improving the compression rate of the image.
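- the alternating order of spatial and channel autoregression described above can be written out as a decoding schedule. This is a sketch of the ordering only, with K = 5 groups as stated above; the labels "anchor" and "non-anchor" for the two checkerboard halves and the function name `decoding_schedule` are hypothetical.

```python
def decoding_schedule(num_groups):
    """List (group, checkerboard half, available context) in decoding order."""
    steps = []
    for group in range(num_groups):
        earlier = [f"group {g}" for g in range(group)]
        # First half of the checkerboard: hyper-prior plus all earlier channel groups.
        steps.append((group, "anchor", ["hyper-prior"] + earlier))
        # Second half: additionally uses the first half of the same group.
        steps.append((group, "non-anchor",
                      ["hyper-prior"] + earlier + [f"group {group} anchor"]))
    return steps

schedule = decoding_schedule(5)
for group, half, context in schedule:
    print(group, half, context)
```

- with K groups there are 2K sequential steps in total, independent of the image size, whereas serial autoregression would need one step per latent element.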
- Step 305 decode the second latent variable using the mean and the variance through the image transformation network to obtain a compressed image.
- the second latent variable is decoded by the transfer window attention mechanism module of the decoder network of the image transformation network to obtain a decoding result of the second latent variable; the decoding result of the second latent variable alternately passes through the transfer window attention mechanism module and the block segmentation module to obtain a compressed image, wherein the volume of the compressed image is smaller than that of the image to be compressed.
- FIG. 10 is a flow chart of the image processing model training method provided by an embodiment of the present application. It is understandable that the steps shown in FIG. 10 can be performed by various electronic devices running an image processing model training device, such as a dedicated terminal with an image processing function, a server with an image processing model training function, or a server cluster. The steps shown in FIG. 10 are described below.
- Step 1001 The image processing model training device obtains a first training sample set, where the first training sample set includes at least one group of noise-free training samples.
- Step 1002 The image processing model training device configures random noise for the first training sample set to obtain a second training sample set.
- configuring random noise for the first training sample set to obtain the second training sample set can be achieved in the following manner:
- a dynamic noise quantity threshold that matches the use environment of the image processing model is determined; according to the dynamic noise quantity threshold, a dynamic quantity of random noise is configured for the first training sample to form a second training sample set that matches the dynamic noise threshold.
- the use environment of mini-program game images is diverse; for example, a mini-program game image can be a role-playing mini-program game image, an image of the user collected by the terminal, or an image captured from a video image frame.
- since the training samples come from different data sources, the data sources include data of various types of application scenarios as the data source of the corresponding training samples.
- a second training sample set matching the dynamic noise threshold can be used to perform targeted training on the image processing model.
- configuring random noise for the first training sample set to obtain the second training sample set can be achieved in the following manner:
- a fixed noise quantity threshold that matches the use environment of the image processing model is determined; according to the fixed noise quantity threshold, a fixed amount of random noise is configured for the first training sample to form a second training sample set that matches the fixed noise threshold. Since the training samples are derived from a fixed data source, the data source includes data of fixed scenes as the data source of the corresponding training samples (for example, any electronic device that generates medical images).
- the image processing model provided in this application can be packaged as a software module in a mobile detection electronic device, or packaged in different fixed medical examination equipment (including but not limited to: handheld diagnostic instruments, ward central monitoring systems, and bedside monitoring systems), and it can also be embedded in the hardware of an intelligent robot.
- a second training sample set that matches the fixed noise threshold can be used to conduct targeted training on the image processing model to improve the training speed of the image processing model.
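The two noise-configuration manners described for Step 1002 can be sketched as follows. This is a minimal illustration, not the applicant's implementation: it assumes that the "quantity" of random noise means the number of perturbed pixels, that an image is a flat list of 8-bit values, and that the noise is Gaussian. The names `add_random_noise`, `build_second_sample_set`, and the `sigma` parameter are hypothetical.

```python
import random


def add_random_noise(image, count, rng, sigma=10.0):
    """Return a copy of `image` (flat list of 0-255 ints) with `count` pixels perturbed.

    Assumption: noise "quantity" is the number of perturbed pixels.
    """
    noisy = list(image)
    for idx in rng.sample(range(len(noisy)), count):
        value = noisy[idx] + rng.gauss(0.0, sigma)
        noisy[idx] = min(255, max(0, round(value)))  # clamp to the 8-bit range
    return noisy


def build_second_sample_set(first_set, threshold, dynamic, seed=0):
    """Build the noisy (second) sample set from the noise-free (first) one.

    dynamic=True : each sample gets a random quantity in [0, threshold].
    dynamic=False: each sample gets exactly `threshold` perturbed pixels.
    """
    rng = random.Random(seed)
    second_set = []
    for image in first_set:
        limit = min(threshold, len(image))
        count = rng.randint(0, limit) if dynamic else limit
        second_set.append(add_random_noise(image, count, rng))
    return second_set


# Four noise-free toy samples of 64 pixels each.
first_set = [[128] * 64 for _ in range(4)]
fixed_set = build_second_sample_set(first_set, threshold=8, dynamic=False)
dynamic_set = build_second_sample_set(first_set, threshold=8, dynamic=True)
```

In this sketch the threshold acts as an upper bound in the dynamic case and as an exact count in the fixed case; the source does not specify how the threshold is derived from the use environment.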
- Step 1003: The image processing model training device calculates the loss function of the image processing model.
- the pixel difference between the compressed image and the image to be compressed is obtained; then the number of bytes when storing the second latent variable and the fourth latent variable in the image processing model is obtained; finally, the fusion loss function of the image processing model is calculated according to the pixel difference and the number of bytes.
- R represents the rate in the fusion loss function L = R + λD (Formula 3), i.e., the number of bytes required to store the second latent variable and the fourth latent variable.
- D represents the distortion, which is usually calculated as the difference between the compressed image and the image to be compressed under a distance measure d, where d is usually the mean squared error (MSE).
- λ is a parameter that controls the trade-off between rate and distortion. Generally, the larger λ is, the larger the bits per pixel (BPP) of the corresponding model, and the higher the quality of image reconstruction.
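The fusion loss L = R + λD described for Step 1003 can be illustrated with a small numeric sketch. It assumes images are flat lists of pixel values and that the byte counts of the second and fourth latent variables are already known; the function names `mse` and `fusion_loss` and the sample numbers are hypothetical. In practice, learned codecs often estimate the rate from an entropy model during training rather than counting stored bytes, so this only mirrors the wording of the description.

```python
def mse(original, reconstructed):
    """Mean squared error between two equally sized flat pixel lists."""
    if len(original) != len(reconstructed):
        raise ValueError("images must have the same number of pixels")
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


def fusion_loss(original, reconstructed, second_latent_bytes, fourth_latent_bytes, lam):
    """Rate-distortion loss L = R + lam * D.

    R: bytes needed to store the second and fourth latent variables.
    D: distortion between the image to be compressed and the compressed image (MSE here).
    """
    rate = second_latent_bytes + fourth_latent_bytes
    distortion = mse(original, reconstructed)
    return rate + lam * distortion


# Toy example: 4 pixels, 100 + 20 bytes of latents, lam = 0.02.
original = [10, 20, 30, 40]
reconstructed = [12, 18, 30, 44]
loss = fusion_loss(original, reconstructed, 100, 20, lam=0.02)
```

Here the squared differences are 4, 4, 0, and 16, so D = 6.0 and the loss is approximately 120.12; raising λ increases the weight of the distortion term relative to the rate term.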
- Step 1004: Based on the initial parameters of the image processing model and the loss function of the image processing model, the image processing model is trained using the first training sample set and the second training sample set.
- the image processing model is trained to determine the image transformation network parameters, hyper-prior network parameters and context network parameters of the image processing model.
- FIG. 11 is a schematic diagram of the effect test of the image processing model provided in the embodiment of the present application, wherein a performance test is performed on the standard Kodak dataset, with bpp (bits per pixel) as the abscissa and PSNR (Peak Signal-to-Noise Ratio) as the ordinate, to plot the rate-distortion performance of the model at different compression rates.
- the values of λ at the four test points in the image processing model of the present application are 0.002, 0.005, 0.02 and 0.04 respectively. It can be seen that the image processing model of the present application improves the efficiency of image compression, and the compressed image has a smaller size.
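The two axes of such a rate-distortion plot can be computed as in the following sketch. It assumes 8-bit images given as flat pixel lists and a known compressed size in bytes; the function names are hypothetical, and the byte count in the example is made up purely to show the arithmetic (Kodak images are 768×512 pixels).

```python
import math


def bits_per_pixel(num_bytes, width, height):
    """Rate axis: compressed size in bits divided by the number of pixels."""
    return 8 * num_bytes / (width * height)


def psnr(original, reconstructed, max_value=255):
    """Distortion axis: peak signal-to-noise ratio in dB for flat pixel lists."""
    if len(original) != len(reconstructed):
        raise ValueError("images must have the same number of pixels")
    err = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if err == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / err)


# A 768x512 image stored in 24576 bytes corresponds to 0.5 bpp.
rate = bits_per_pixel(24576, 768, 512)
quality = psnr([10, 20, 30, 40], [12, 18, 30, 44])
```

Each trained model (one per λ value) contributes one (bpp, PSNR) point, typically averaged over the test images; connecting the points gives the rate-distortion curve.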
- the image to be compressed is encoded by the image transformation network of the image processing model to obtain a first latent variable, and the hyper-prior network determines the hyper-prior probability estimate according to the first latent variable; thus, the image is processed by the image transformation network and the hyper-prior network, both built on the shifted window attention mechanism, which can improve the performance of image compression, make the compressed image obtained after decoding smaller in size, and reduce the storage cost of the image.
- the context network partially decodes the first latent variable according to the hyper-prior probability estimate to obtain a partial decoding result; the context network performs autoregression on the second latent variable according to the partial decoding result to obtain the mean and variance of the second latent variable; the second latent variable is decoded using the mean and the variance to obtain a compressed image, wherein the data size of the compressed image is smaller than that of the image to be compressed. Therefore, the context network uses the information of the channel groups decoded first as prior knowledge for the channel groups to be decoded subsequently, which reduces subsequent compression redundancy and saves image compression time. At the same time, the context network can perform autoregression alternately along the spatial dimension and the channel dimension to improve compression efficiency.
- the training sample set can be flexibly adjusted according to different usage requirements, so that the image processing model can be applicable to different image compression environments.
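The alternating spatial and channel autoregression attributed to the context network can be sketched as a decoding schedule. This is only an illustration of the ordering idea under stated assumptions: channels are split into contiguous groups, each group is decoded in two checkerboard passes (anchor positions first, then the remaining positions), and each group is conditioned on all previously decoded groups. The pass order and the names `checkerboard_masks`, `split_channels`, and `decoding_schedule` are hypothetical and not taken from the source.

```python
def checkerboard_masks(height, width):
    """Return (anchor, non_anchor) boolean masks forming a checkerboard."""
    anchor = [[(r + c) % 2 == 0 for c in range(width)] for r in range(height)]
    non_anchor = [[not v for v in row] for row in anchor]
    return anchor, non_anchor


def split_channels(num_channels, num_groups):
    """Split channel indices into `num_groups` contiguous groups."""
    base, extra = divmod(num_channels, num_groups)
    groups, start = [], 0
    for g in range(num_groups):
        size = base + (1 if g < extra else 0)
        groups.append(list(range(start, start + size)))
        start += size
    return groups


def decoding_schedule(num_channels, num_groups):
    """List (channels, spatial_pass, context_channels) steps in decoding order."""
    schedule, decoded = [], []
    for group in split_channels(num_channels, num_groups):
        # Spatial autoregression inside the group: two checkerboard passes.
        schedule.append((group, "anchor", list(decoded)))
        schedule.append((group, "non_anchor", list(decoded)))
        # Channel autoregression: later groups may use this group as context.
        decoded.extend(group)
    return schedule


anchor, non_anchor = checkerboard_masks(4, 4)
steps = decoding_schedule(num_channels=8, num_groups=2)
```

With 8 channels and 2 groups the schedule has four steps; the first group is decoded without channel context, and the second group sees channels 0 to 3 as context in both of its checkerboard passes.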
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Data Mining & Analysis (AREA)
- Computing Systems (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Probability & Statistics with Applications (AREA)
- Compression Of Band Width Or Redundancy In Fax (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
Description
L = R + λD (Formula 3)
Claims (15)
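- An image compression method, performed by an electronic device, the method comprising: encoding an image to be compressed to obtain a first latent variable corresponding to the image to be compressed; determining a hyper-prior probability estimate corresponding to the first latent variable; partially decoding the first latent variable according to the hyper-prior probability estimate to obtain a partial decoding result of the first latent variable; and generating, based on the partial decoding result of the first latent variable and the first latent variable corresponding to the image to be compressed, a compressed image corresponding to the image to be compressed, wherein the data volume of the compressed image is smaller than the data volume of the image to be compressed.
- The method according to claim 1, wherein generating, based on the partial decoding result of the first latent variable and the first latent variable corresponding to the image to be compressed, the compressed image corresponding to the image to be compressed comprises: quantizing the first latent variable to obtain a second latent variable; performing autoregression on the second latent variable according to the partial decoding result to obtain a mean and a variance of the second latent variable; and decoding the second latent variable using the mean and the variance to obtain the compressed image.
- The method according to claim 2, wherein performing autoregression on the second latent variable according to the partial decoding result to obtain the mean and the variance of the second latent variable comprises: grouping the second latent variable to obtain at least two groups of sub-latent variables; performing spatial autoregression on each group of sub-latent variables by means of a checkerboard; and after each group of sub-latent variables completes spatial autoregression, predicting the undecoded channel groups by means of the partial decoding result, until autoregression of the second latent variable is fully completed, to obtain the mean and the variance of the second latent variable.
- The method according to claim 2, wherein decoding the second latent variable using the mean and the variance to obtain the compressed image comprises: decoding the second latent variable using the mean and the variance to obtain a decoding result of the second latent variable; and alternately performing splitting and attention mechanism shifting on the decoding result of the second latent variable until the decoding result of the second latent variable is completely split, to obtain the compressed image.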
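- The method according to claim 2, wherein the method is implemented based on an image processing model, the image processing model comprising an image transformation network, a hyper-prior network, and a context network; encoding the image to be compressed to obtain the first latent variable corresponding to the image to be compressed comprises: encoding the image to be compressed through the image transformation network of the image processing model to obtain the first latent variable; determining the hyper-prior probability estimate corresponding to the first latent variable comprises: determining, through the hyper-prior network, the hyper-prior probability estimate according to the first latent variable; partially decoding the first latent variable according to the hyper-prior probability estimate to obtain the partial decoding result of the first latent variable comprises: partially decoding, through the context network, the first latent variable according to the hyper-prior probability estimate to obtain the partial decoding result; performing autoregression on the second latent variable according to the partial decoding result to obtain the mean and the variance of the second latent variable comprises: performing, through the context network, autoregression on the second latent variable according to the partial decoding result to obtain the mean and the variance of the second latent variable; and decoding the second latent variable using the mean and the variance to obtain the compressed image comprises: decoding, through the image transformation network, the second latent variable using the mean and the variance to obtain the compressed image.
- The method according to claim 5, wherein the image transformation network comprises an image encoder network and an image decoder network; the image encoder network comprises a shifted window attention mechanism module and a patch merging module, wherein the patch merging module comprises, in sequence, a space-to-depth conversion layer, a normalization layer, and a mapping layer; and the image decoder network comprises a shifted window attention mechanism module and a patch splitting module, wherein the patch splitting module comprises, in sequence, a mapping layer, a normalization layer, and a depth-to-space conversion layer.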
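- The method according to claim 5, further comprising: obtaining a first training sample set corresponding to the image processing model, the first training sample set including at least one group of noise-free training samples; configuring random noise for the first training sample set to obtain a second training sample set; obtaining initial parameters of the image processing model; and training the image processing model using the first training sample set and the second training sample set, based on the initial parameters of the image processing model and a loss function of the image processing model, to determine image transformation network parameters, hyper-prior network parameters, and context network parameters of the image processing model.
- The method according to claim 7, wherein configuring random noise for the first training sample set to obtain the second training sample set comprises: when the use environment of the image processing model is video image compression, determining a dynamic noise quantity threshold that matches the use environment of the image processing model; and configuring, according to the dynamic noise quantity threshold, a dynamic quantity of random noise for the first training sample, to obtain a second training sample set that matches the dynamic noise threshold.
- The method according to claim 7, wherein configuring random noise for the first training sample set to obtain the second training sample set comprises: when the use environment of the image processing model is medical image compression, determining a fixed noise quantity threshold that matches the use environment of the image processing model; and configuring, according to the fixed noise quantity threshold, a fixed quantity of random noise for the first training sample, to obtain a second training sample set that matches the fixed noise threshold.
- The method according to claim 7, further comprising: obtaining a pixel difference between the compressed image and the image to be compressed; obtaining the number of bytes used when storing the second latent variable and a fourth latent variable in the image processing model; and determining a fusion loss function of the image processing model according to the pixel difference and the number of bytes.
- The method according to any one of claims 1 to 10, wherein determining the hyper-prior probability estimate corresponding to the first latent variable comprises: encoding the first latent variable to obtain a third latent variable; entropy-encoding the third latent variable to obtain an entropy code of the third latent variable; decoding the entropy code of the third latent variable to obtain a fourth latent variable; and decoding the fourth latent variable to obtain the hyper-prior probability estimate.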
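- An image compression apparatus, comprising: an encoding module, configured to encode an image to be compressed to obtain a first latent variable corresponding to the image to be compressed; and an information processing module, configured to determine a hyper-prior probability estimate corresponding to the first latent variable; the information processing module being further configured to partially decode the first latent variable according to the hyper-prior probability estimate to obtain a partial decoding result of the first latent variable; and the information processing module being further configured to generate, based on the partial decoding result of the first latent variable and the first latent variable corresponding to the image to be compressed, a compressed image corresponding to the image to be compressed, wherein the data volume of the compressed image is smaller than the data volume of the image to be compressed.
- An electronic device, comprising: a memory, configured to store executable instructions; and a processor, configured to implement the image compression method according to any one of claims 1 to 11 when running the executable instructions stored in the memory.
- A computer program product, comprising a computer program or instructions which, when executed by a processor, implement the image compression method according to any one of claims 1 to 11.
- A computer-readable storage medium, storing executable instructions which, when executed by a processor, implement the image compression method according to any one of claims 1 to 11.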
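The space-to-depth and depth-to-space conversion layers recited for the patch merging and patch splitting modules of the image transformation network can be illustrated with a small sketch. It assumes a feature map stored as a nested list indexed [row][column][channel] and a block size of 2; the channel ordering inside each merged block and the function names are hypothetical choices, since the claims do not fix them, and the normalization and mapping layers are omitted.

```python
def space_to_depth(x, block=2):
    """Fold each block x block spatial patch into the channel axis.

    x: nested list [H][W][C] -> [H/block][W/block][C*block*block]
    """
    h, w = len(x), len(x[0])
    if h % block or w % block:
        raise ValueError("height and width must be divisible by block")
    out = []
    for r in range(0, h, block):
        row = []
        for c in range(0, w, block):
            cell = []
            for dr in range(block):
                for dc in range(block):
                    cell.extend(x[r + dr][c + dc])
            row.append(cell)
        out.append(row)
    return out


def depth_to_space(x, block=2):
    """Inverse of space_to_depth: unfold channels back into spatial patches."""
    h, w = len(x), len(x[0])
    channels = len(x[0][0]) // (block * block)
    out = [[None] * (w * block) for _ in range(h * block)]
    for r in range(h):
        for c in range(w):
            cell = x[r][c]
            for dr in range(block):
                for dc in range(block):
                    start = (dr * block + dc) * channels
                    out[r * block + dr][c * block + dc] = cell[start:start + channels]
    return out


# 4x4 single-channel map whose value encodes its position.
feature = [[[r * 4 + c] for c in range(4)] for r in range(4)]
merged = space_to_depth(feature)      # 2x2 spatial, 4 channels
restored = depth_to_space(merged)     # back to 4x4 spatial, 1 channel
```

The encoder side halves the spatial resolution while multiplying the channel count by four, and the decoder side reverses the rearrangement exactly, which is why the two layers appear in mirrored order in the encoder and decoder modules.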
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP23920883.8A EP4568246A4 (en) | 2023-02-09 | 2023-12-12 | IMAGE COMPRESSION METHOD AND APPARATUS, ELECTRONIC DEVICE, COMPUTER PROGRAM PRODUCT AND STORAGE MEDIA |
| US19/089,142 US20250227272A1 (en) | 2023-02-09 | 2025-03-25 | Image compression method and apparatus, electronic device, computer program product, and storage medium |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310136843.3 | 2023-02-09 | ||
| CN202310136843.3A CN116980611A (zh) | 2023-02-09 | 2023-02-09 | Image compression method, apparatus, device, computer program product, and medium |
Related Child Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US19/089,142 Continuation US20250227272A1 (en) | 2023-02-09 | 2025-03-25 | Image compression method and apparatus, electronic device, computer program product, and storage medium |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| WO2024164694A1 true WO2024164694A1 (zh) | 2024-08-15 |
| WO2024164694A9 WO2024164694A9 (zh) | 2024-09-12 |
Family
ID=88478440
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2023/138206 Ceased WO2024164694A1 (zh) | 2023-02-09 | 2023-12-12 | Image compression method and apparatus, electronic device, computer program product, and storage medium |
Country Status (4)
| Country | Link |
|---|---|
| US (1) | US20250227272A1 (zh) |
| EP (1) | EP4568246A4 (zh) |
| CN (1) | CN116980611A (zh) |
| WO (1) | WO2024164694A1 (zh) |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
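| CN119135910A (zh) * | 2024-09-10 | 2024-12-13 | University of Electronic Science and Technology of China | Deep-learning-based image coding method and device |
| CN120218382A (zh) * | 2025-05-19 | 2025-06-27 | Changsha Research Institute of Mining and Metallurgy Co., Ltd. | Power battery disassembly path optimization method based on an autoregressive generation strategy |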
Families Citing this family (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
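| CN116980611A (zh) * | 2023-02-09 | 2023-10-31 | Tencent Technology (Shenzhen) Co., Ltd. | Image compression method, apparatus, device, computer program product, and medium |
| CN117915114B (zh) * | 2024-03-15 | 2024-07-09 | Shenzhen University | Point cloud attribute compression method, apparatus, terminal, and medium |
| CN120807661A (zh) * | 2024-04-10 | 2025-10-17 | Huawei Technologies Co., Ltd. | Data compression method, apparatus, and computing device cluster |
| CN121151561A (zh) * | 2024-06-14 | 2025-12-16 | Douyin Vision Co., Ltd. | Method, apparatus, device, and storage medium for image encoding and decoding |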
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113574883A (zh) * | 2019-03-21 | 2021-10-29 | Qualcomm Incorporated | Video compression using deep generative models |
| CN114663536A (zh) * | 2022-02-08 | 2022-06-24 | Institute of Automation, Chinese Academy of Sciences | Image compression method and apparatus |
| WO2022268641A1 (en) * | 2021-06-21 | 2022-12-29 | Interdigital Vc Holdings France, Sas | Methods and apparatuses for encoding/decoding an image or a video |
| CN116980611A (zh) * | 2023-02-09 | 2023-10-31 | Tencent Technology (Shenzhen) Co., Ltd. | Image compression method, apparatus, device, computer program product, and medium |
Family Cites Families (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11375194B2 (en) * | 2019-11-16 | 2022-06-28 | Uatc, Llc | Conditional entropy coding for efficient video compression |
| EP4241450A1 (en) * | 2020-11-04 | 2023-09-13 | Vid Scale, Inc. | Learned video compression framework for multiple machine tasks |
2023
- 2023-02-09 CN CN202310136843.3A patent/CN116980611A/zh active Pending
- 2023-12-12 EP EP23920883.8A patent/EP4568246A4/en active Pending
- 2023-12-12 WO PCT/CN2023/138206 patent/WO2024164694A1/zh not_active Ceased
2025
- 2025-03-25 US US19/089,142 patent/US20250227272A1/en active Pending
Patent Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113574883A (zh) * | 2019-03-21 | 2021-10-29 | Qualcomm Incorporated | Video compression using deep generative models |
| WO2022268641A1 (en) * | 2021-06-21 | 2022-12-29 | Interdigital Vc Holdings France, Sas | Methods and apparatuses for encoding/decoding an image or a video |
| CN114663536A (zh) * | 2022-02-08 | 2022-06-24 | Institute of Automation, Chinese Academy of Sciences | Image compression method and apparatus |
| CN116980611A (zh) * | 2023-02-09 | 2023-10-31 | Tencent Technology (Shenzhen) Co., Ltd. | Image compression method, apparatus, device, computer program product, and medium |
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP4568246A4 * |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
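| CN119135910A (zh) * | 2024-09-10 | 2024-12-13 | University of Electronic Science and Technology of China | Deep-learning-based image coding method and device |
| CN119135910B (zh) * | 2024-09-10 | 2025-06-20 | University of Electronic Science and Technology of China | Deep-learning-based image coding method and device |
| CN120218382A (zh) * | 2025-05-19 | 2025-06-27 | Changsha Research Institute of Mining and Metallurgy Co., Ltd. | Power battery disassembly path optimization method based on an autoregressive generation strategy |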
Also Published As
| Publication number | Publication date |
|---|---|
| US20250227272A1 (en) | 2025-07-10 |
| EP4568246A4 (en) | 2025-11-19 |
| CN116980611A (zh) | 2023-10-31 |
| WO2024164694A9 (zh) | 2024-09-12 |
| EP4568246A1 (en) | 2025-06-11 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
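| WO2024164694A1 (zh) | Image compression method and apparatus, electronic device, computer program product, and storage medium | |
| CN111881920B (zh) | Network adaptation method for large-resolution images and neural network training apparatus | |
| CN120640000B (zh) | Multi-scale semantic-guided image compression method, system, and storage medium | |
| CN113079378B (zh) | Image processing method, apparatus, and electronic device | |
| CN115345785A (zh) | Low-light video enhancement method and system based on multi-scale spatio-temporal feature fusion | |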
| Hema et al. | Effective Image Reconstruction Using Various Compressed Sensing Techniques | |
| Cui et al. | Deep network for image compressed sensing coding using local structural sampling | |
| Liu et al. | Learning to generate realistic images for bit-depth enhancement via camera imaging processing | |
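| CN117793289A (zh) | Video transmission method, video reconstruction method, and related devices | |
| CN117376564B (zh) | Data encoding and decoding method and related devices | |
| CN115294429A (zh) | Feature-domain-based network training method and apparatus | |
| CN119603465A (zh) | Learned image compression method based on side-information autoregression | |
| WO2025081929A1 (zh) | Image decoding method, image encoding method, and apparatus | |
| CN114022575B (zh) | Depth map compression method, apparatus, device, and medium based on monocular depth estimation | |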
| Zhang et al. | A Low-Complexity Transformer-CNN Hybrid Model Combining Dynamic Attention for Remote Sensing Image Compression. | |
| EP4664887A1 (en) | Encoding and decoding method and apparatus, and device thereof | |
| US20260122241A1 (en) | Image decoding method and apparatus, image coding method and apparatus, device and storage medium | |
| US20260122263A1 (en) | Image decoding method and apparatus, image coding method and apparatus, and device and storage medium | |
| CN118972620B (zh) | 图像解码和编码方法、装置、设备及存储介质 | |
| Jannani et al. | An Image Compression Approach Based | |
| CA3285218A1 (en) | Encoding and decoding method and apparatus, and device thereof | |
| Pushpalatha et al. | Interpolative Model on Hueristic Projection Transform for Image Compression in Cloud Services. | |
| Sophia et al. | An efficient hybrid transform algorithm for image compression using a matrix rank-based optimization approach | |
| Bastos | Low-complexity transform-quantization pair for 360° image compression | |
| Singh et al. | A Review on Recent Developments in Image Compression Techniques |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 23920883; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | Wipo information: entry into national phase | Ref document number: 2023920883; Country of ref document: EP |
| | ENP | Entry into the national phase | Ref document number: 2023920883; Country of ref document: EP; Effective date: 20250305 |
| | WWP | Wipo information: published in national office | Ref document number: 2023920883; Country of ref document: EP |
| | NENP | Non-entry into the national phase | Ref country code: DE |