WO2024164694A1 - Image compression method, apparatus, electronic device, computer program product and storage medium - Google Patents


Info

Publication number
WO2024164694A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
latent variable
compressed
network
latent
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2023/138206
Other languages
English (en)
French (fr)
Other versions
WO2024164694A9 (zh)
Inventor
吕悦
项进喜
张军
韩骁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to EP23920883.8A (published as EP4568246A4)
Publication of WO2024164694A1
Publication of WO2024164694A9
Priority to US19/089,142 (published as US20250227272A1)
Anticipated expiration
Legal status: Ceased (current)


Classifications

    • H04N19/124 Quantisation
    • H04N19/42 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/045 Combinations of networks
    • G06N3/0455 Auto-encoder networks; Encoder-decoder networks
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06N3/047 Probabilistic or stochastic networks
    • G06N3/0475 Generative networks
    • G06N3/048 Activation functions
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G06N3/088 Non-supervised learning, e.g. competitive learning
    • H04N19/182 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding, the unit being a pixel
    • H04N19/426 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation characterised by memory arrangements using memory downsizing methods
    • H04N19/60 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/70 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
    • H04N19/85 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
    • H04N19/91 Entropy coding, e.g. variable length coding [VLC] or arithmetic coding

Definitions

  • the present application relates to computer technology, and in particular to an image compression method, device, electronic device, computer program product and computer storage medium.
  • the related art uses basic convolutional networks to transform images.
  • with such networks the compression rate is low: latent variables must be restored from the byte stream to reconstruct high-quality images, and the limited capability of the nonlinear image transformation network restricts the network's ability to reconstruct high-quality images; in addition, the context model in the related art uses PixelCNN serial decoding, which makes image compression inefficient.
  • the embodiments of the present application provide an image compression method, device, electronic device, computer program product and computer storage medium, which use an image processing model to improve the efficiency of image compression while making the compressed image smaller, thereby reducing the storage cost of images.
  • the present application provides an image compression method, the method comprising:
  • a compressed image corresponding to the image to be compressed is generated, and the data amount of the compressed image is smaller than the data amount of the image to be compressed.
  • the present application also provides an image compression device, the device comprising:
  • an encoding module configured to encode the image to be compressed to obtain a first latent variable corresponding to the image to be compressed;
  • an information processing module configured to determine a hyper-prior probability estimate corresponding to the first latent variable;
  • the information processing module is further configured to partially decode the first latent variable according to the hyper-prior probability estimate to obtain a partial decoding result of the first latent variable;
  • the information processing module is further configured to generate a compressed image corresponding to the image to be compressed based on a partial decoding result of the first latent variable and the first latent variable corresponding to the image to be compressed, wherein the data amount of the compressed image is smaller than the data amount of the image to be compressed.
  • the present application also provides an electronic device, the electronic device comprising:
  • a memory configured to store executable instructions; and
  • a processor configured to implement the aforementioned image compression method when running the executable instructions stored in the memory.
  • the embodiment of the present application also provides a computer program product including a computer program or instructions that, when executed by a processor, implement the aforementioned image compression method.
  • the embodiment of the present application also provides a computer-readable storage medium storing executable instructions, which implement the aforementioned image compression method when executed by a processor.
  • the embodiment of the present application encodes the image to be compressed to obtain the first latent variable and determines the hyper-prior probability estimate from the first latent variable. If the first latent variable obtained by encoding follows an inherent prior probability distribution, the resulting hyper-prior probability estimate can serve as a reference for the subsequent partial decoding, which makes the decoding result more accurate and improves compression performance: the compressed image obtained after decoding is smaller, reducing the storage cost of the image.
  • the first latent variable is partially decoded to obtain a partial decoding result; that is, only some of the pixels are decoded first, so that the remaining pixels can then be predicted (decoded) from the partial decoding result, which reduces the time spent compressing the image and improves compression efficiency.
  • FIG. 1 is a schematic diagram of a use environment of an image compression method provided by an embodiment of the present application.
  • FIG. 2 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.
  • FIG. 3A is a first flowchart of an image compression method provided in an embodiment of the present application.
  • FIG. 3B is a second flowchart of the image compression method provided in an embodiment of the present application.
  • FIG. 4 is a schematic diagram of the data flow of an image processing model provided in an embodiment of the present application.
  • FIG. 5 is a schematic diagram of the model structure of an image processing model provided in an embodiment of the present application.
  • FIG. 6 is a schematic diagram of the working process of the space-to-depth conversion layer and the depth-to-space conversion layer provided in an embodiment of the present application.
  • FIG. 7 is a schematic diagram of the composition of the transfer window attention mechanism module provided in an embodiment of the present application.
  • FIG. 8 is a schematic diagram of the calculation principle of the transfer window attention mechanism module provided in an embodiment of the present application.
  • FIG. 9 is a schematic diagram of the autoregression of a context network provided in an embodiment of the present application.
  • FIG. 10 is a flowchart of an image processing model training method provided in an embodiment of the present application.
  • FIG. 11 is a schematic diagram of an effect test of an image processing model provided in an embodiment of the present application.
  • Wasserstein distance: a distance metric mainly used to measure the difference between two probability distributions.
  • Neural network (NN): in machine learning and cognitive science, a mathematical or computational model that imitates the structure and function of biological neural networks and is used to estimate or approximate functions.
  • Model parameters: quantities, expressed as general variables, that establish the relationship between a function and its variables. In artificial neural networks, model parameters are usually real-valued matrices.
  • Model training: multi-class learning on an image dataset. The model can be built with deep learning frameworks such as TensorFlow and Torch, stacking multiple neural network layers, such as CNN layers, into a multi-class image classification model.
  • the input of the model is a three-channel or original-channel matrix obtained by reading the image with tools such as OpenCV.
  • the model output is a multi-class probability distribution, and the final result is output through functions such as softmax.
  • the model is driven toward the correct output by minimizing objective functions such as cross-entropy.
  • VAE: variational autoencoder.
  • Latent code: low-dimensional latent variables.
  • if the latent variable follows an inherent prior probability distribution and the input image follows a conditional probability distribution conditioned on the latent variable, then the low-dimensional latent variable can describe the information contained in the input image, and the high-dimensional input image can be reconstructed by sampling.
  • the variational autoencoder compresses low-dimensional latent variables to reduce information redundancy.
  • Hyper prior: given the latent variable that the encoder extracts from the input image, the hyper prior uses a lightweight network to model each element of the latent variable with a scalar entropy model; the entropy model yields the occurrence probability of each feature value, which is used for bit-rate estimation and entropy coding.
  • the hyper prior stores the probability model of the latent variable in a small number of bytes.
  • during decoding, the byte stream stored by the hyper-prior module is decoded first, and the latent variable is then restored using the probabilities decoded from that byte stream in order to reconstruct the image.
  • Context model: typically uses autoregression, predicting undecoded pixel information from already decoded pixel information to reduce information redundancy.
  • autoregressive models use sliding-window serial prediction, so their computational cost rises sharply with the dimension of the input data. Although an autoregressive context model can greatly improve model performance, it also significantly increases the computational complexity of the compression model.
  • Entropy coding: a lossless coding method that, following the entropy principle, loses no information during encoding. It is also a key module in lossy coding, located at the end of the encoder. Information entropy is the average amount of information of a source (a measure of its uncertainty). Common entropy coding methods include Shannon coding, Huffman coding, Exp-Golomb coding and arithmetic coding. Because the symbols to be entropy coded are produced by the encoder through a series of operations such as transformation, quantization, motion compensation and prediction, a suitable entropy coding model is selected according to the distribution of those symbols. Entropy coding is therefore a relatively independent unit that can be applied not only to video encoding and decoding but also to other encoders, such as image coding and point cloud coding.
  • traditional image coding methods rely on hand-designed image features; for example, JPEG, BPG and VVC-intra use orthogonal linear transforms, such as the discrete cosine transform (DCT) and the discrete wavelet transform (DWT), to decorrelate image pixels before quantization and encoding.
  • JPEG compresses the Y, Cb and Cr components separately, based on the premise that the human eye is more sensitive to brightness than to color. For example, for a natural image, JPEG applies a DCT to each 8×8 block to obtain 64 DCT coefficients.
  • variable-length coding and Huffman coding can then be used to remove the remaining redundancy.
  • in the related art, the compression rate is low: latent variables must be recovered from the byte stream to reconstruct a high-quality image, and the limited capability of the nonlinear image transformation network restricts the network's ability to reconstruct high-quality images; in addition, the context model in the related art uses PixelCNN serial decoding, so decoding efficiency is low.
  • the embodiment of the present application provides an image compression method that compresses the image using an image processing model including an image transformation network, a hyper-prior network and a context network, thereby improving both the compression efficiency and the quality of the compressed image.
  • FIG. 1 is a schematic diagram of a usage scenario of an image compression method provided in an embodiment of the present application.
  • a client having an image processing function or a client having a video processing function is provided on a terminal (including a terminal 10-1 and a terminal 10-2).
  • a user can input a corresponding image to be processed through the provided image processing client, and the image processing client can also receive a corresponding compressed image and display the received compressed image to the user; the video processing client can compress each frame of the video through the image processing model provided in an embodiment of the present application to reduce the server storage space occupied by the video.
  • the terminal is connected to the server 200 via a network 300, and the network 300 can be a wide area network or a local area network, or a combination of the two, and a wireless link is used to realize data transmission.
  • the server 200 is configured to deploy an image processing model and train the image processing model to determine the network parameters of the image transformation network, the hyper-prior network and the context network in the image processing model; and after the image processing model training is completed, the compressed image corresponding to the image to be processed generated by the image processing model is displayed through the terminal (terminal 10-1 and/or terminal 10-2).
  • the image processing model needs to be trained to determine the network parameters of the image transformation network, the hyper-prior network and the context network.
  • FIG. 2 is a schematic diagram of the composition structure of the electronic device provided by the embodiment of the present application. It can be understood that FIG. 2 only shows an exemplary structure of the electronic device rather than the entire structure. Part or all of the structure shown in FIG. 2 can be implemented as needed.
  • the electronic device provided in the embodiment of the present application includes: at least one processor 201, a memory 202, a user interface 203 and at least one network interface 204.
  • the various components in the electronic device 20 are coupled together through a bus system 205.
  • the bus system 205 is configured to achieve connection and communication between these components.
  • in addition to a data bus, the bus system 205 includes a power bus, a control bus and a status signal bus.
  • for clarity of illustration, the various buses are all labeled as bus system 205 in FIG. 2.
  • the user interface 203 may include a display, a keyboard, a mouse, a trackball, a click wheel, keys, buttons, a touch pad or a touch screen.
  • the memory 202 can be a volatile memory or a non-volatile memory, or can include both volatile and non-volatile memories.
  • the memory 202 in the embodiment of the present application can store data to support the operation of the terminal (such as 10-1). Examples of such data include: any computer program used to operate on the terminal (such as 10-1), such as an operating system and an application program.
  • the operating system includes various system programs, such as a framework layer, a core library layer and a driver layer, used to implement various basic services and process hardware-based tasks.
  • Applications can include various applications.
  • the image compression device provided in the embodiment of the present application can be implemented in a combination of software and hardware.
  • the image compression device provided in the embodiment of the present application can be a processor in the form of a hardware decoding processor, which is programmed to execute the image compression method provided in the embodiment of the present application.
  • the processor in the form of a hardware decoding processor can adopt one or more application specific integrated circuits (ASICs), DSPs, programmable logic devices (PLDs), complex programmable logic devices (CPLDs), field programmable gate arrays (FPGAs), or other electronic components.
  • ASICs: application-specific integrated circuits
  • DSPs: digital signal processors
  • PLDs: programmable logic devices
  • CPLDs: complex programmable logic devices
  • FPGAs: field-programmable gate arrays
  • the image compression device provided in an embodiment of the present application can be directly embodied as a combination of software modules executed by a processor 201.
  • the software module can be located in a storage medium, and the storage medium is located in a memory 202.
  • the processor 201 reads the executable instructions included in the software module in the memory 202, and completes the image compression method provided in an embodiment of the present application in combination with necessary hardware (for example, including the processor 201 and other components connected to the bus 205).
  • processor 201 can be an integrated circuit chip with signal processing capabilities, such as a general-purpose processor, a digital signal processor (DSP), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc., where the general-purpose processor can be a microprocessor or any conventional processor, etc.
  • the device provided in the embodiment of the present application can be directly executed by a processor 201 in the form of a hardware decoding processor.
  • the image compression method provided in the embodiment of the present application can be implemented by one or more application specific integrated circuits (ASICs), DSPs, programmable logic devices (PLDs), complex programmable logic devices (CPLDs), field programmable gate arrays (FPGAs), or other electronic components.
  • the memory 202 in the embodiment of the present application is configured to store various types of data to support the operation of the electronic device 20. Examples of such data include any executable instructions for operating on the electronic device 20; the program implementing the image compression method in the embodiment of the present application may be included in these executable instructions.
  • the image compression device provided in the embodiments of the present application can be implemented in software.
  • Figure 2 shows an image compression device stored in the memory 202, which can be software in the form of programs and plug-ins, and includes a series of modules.
  • the image compression device includes the following software modules: an encoding module 2081 and an information processing module 2082.
  • the encoding module 2081 is configured to encode the image to be compressed to obtain a first latent variable corresponding to the image to be compressed;
  • the information processing module 2082 is configured to determine a hyper-prior probability estimate corresponding to the first latent variable;
  • the information processing module 2082 is configured to partially decode the first latent variable according to the hyper-prior probability estimate to obtain a partial decoding result of the first latent variable;
  • the information processing module 2082 is further configured to generate a compressed image corresponding to the image to be compressed based on the partial decoding result of the first latent variable and the first latent variable corresponding to the image to be compressed, wherein the data amount of the compressed image is smaller than the data amount of the image to be compressed.
  • the information processing module 2082 is further configured to perform autoregression on the second latent variable according to the partial decoding result to obtain the mean and variance of the second latent variable;
  • the information processing module 2082 is further configured to decode the second latent variable using the mean and variance to obtain a compressed image.
  • the information processing module 2082 is configured to encode the first latent variable to obtain a third latent variable;
  • the information processing module 2082 is configured to perform entropy coding on the third latent variable to obtain an entropy code of the third latent variable;
  • the information processing module 2082 is configured to decode the entropy code of the third latent variable to obtain a fourth latent variable;
  • the information processing module 2082 is configured to decode the fourth latent variable to obtain the hyper-prior probability estimate.
  • the information processing module 2082 is configured to group the second latent variables to obtain at least two groups of sub-latent variables;
  • the information processing module 2082 is configured to perform checkerboard spatial autoregression on each group of sub-latent variables;
  • the information processing module 2082 is configured to, after each group of sub-latent variables completes spatial autoregression, predict the undecoded channel groups from the partial decoding results, repeating until autoregression of the entire second latent variable is complete, to obtain the mean and variance of the second latent variable.
  • the information processing module 2082 is further configured to decode the second latent variable using the mean and variance to obtain a decoding result of the second latent variable;
  • the information processing module 2082 is configured to alternately apply segmentation and the transfer window attention mechanism to the decoding result of the second latent variable until the entire decoding result has been segmented, to obtain the compressed image.
  • the information processing module 2082 is further configured to encode the image to be compressed through an image transformation network of the image processing model to obtain a first latent variable;
  • the second latent variable is autoregressed according to the partial decoding results to obtain the mean and variance of the second latent variable
  • the second latent variable is decoded using the mean and variance through the image transformation network to obtain the compressed image.
  • the information processing module 2082 is further configured to obtain a first training sample set corresponding to the image processing model, the first training sample set including at least one set of noise-free training samples;
  • the information processing module 2082 is further configured to configure random noise for the first training sample set to obtain a second training sample set;
  • the information processing module 2082 is further configured to obtain initial parameters of the image processing model;
  • the information processing module 2082 is also configured to train the image processing model based on the initial parameters of the image processing model and the loss function of the image processing model through a first training sample set and a second training sample set to determine the image transformation network parameters, hyper-prior network parameters and context network parameters of the image processing model.
  • the information processing module 2082 is configured to, when the use environment of the image processing model is video image compression, determine a dynamic noise quantity threshold that matches the use environment;
  • and to configure a dynamic quantity of random noise for the first training sample set according to the dynamic noise quantity threshold, to obtain a second training sample set that matches the dynamic noise threshold.
  • the information processing module 2082 is configured to, when the use environment of the image processing model is medical image compression, determine a fixed noise quantity threshold that matches the use environment;
  • and to configure a fixed quantity of random noise for the first training sample set according to the fixed noise quantity threshold, to obtain a second training sample set that matches the fixed noise threshold.
  • the information processing module 2082 is configured to obtain the pixel difference between the compressed image and the image to be compressed; obtain the number of bytes when storing the second latent variable and the fourth latent variable in the image processing model; and determine the fusion loss function of the image processing model based on the pixel difference and the number of bytes.
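  • As a minimal sketch, assuming the common rate-distortion form (mean squared error as the pixel difference, bits per pixel derived from the byte counts of the stored second and fourth latent variables, and a hypothetical trade-off weight `lam` that the source does not specify), the fusion loss could be computed as follows. The function name `fusion_loss` is illustrative only.

```python
import numpy as np


def fusion_loss(original, reconstructed, bytes_y, bytes_z, lam=0.01):
    """Sketch of a fusion (rate-distortion) loss.

    bytes_y / bytes_z: bytes needed to store the second and fourth latent variables.
    lam: hypothetical trade-off weight (not specified in the source).
    Returns (loss, bits_per_pixel, mse).
    """
    original = np.asarray(original, dtype=np.float64)
    reconstructed = np.asarray(reconstructed, dtype=np.float64)
    # Distortion: mean squared pixel difference between input and reconstruction.
    mse = float(np.mean((original - reconstructed) ** 2))
    # Rate: total stored bits normalized by the number of pixels.
    height, width = original.shape[:2]
    bpp = 8.0 * (bytes_y + bytes_z) / (height * width)
    return bpp + lam * mse, bpp, mse
```

A larger `lam` favors reconstruction fidelity over a smaller byte count; in practice it would be tuned per target bit rate.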
  • the embodiment of the present application also provides a computer program product or a computer program, which includes computer executable instructions, and the computer executable instructions are stored in a computer-readable storage medium.
  • the processor of the computer device or electronic device reads the computer executable instructions from the computer-readable storage medium, and the processor executes the computer executable instructions, so that the computer device executes the different embodiments and combinations of the embodiments provided by the above-mentioned image compression method.
  • after training is completed, the image processing model can be deployed in a server or a cloud server network.
  • the image compression device provided in the present application can also be deployed in the electronic device shown in Figure 2 to execute the image compression method provided in the embodiment of the present application.
  • FIG. 3A is a flow chart of the image compression method provided by the embodiment of the present application, which includes the following steps:
  • Step 3001 Encode the image to be compressed to obtain a first latent variable corresponding to the image to be compressed.
  • the image to be compressed can be a natural image.
  • the image to be compressed can be encoded through an image transformation network for image encoding, such as a variational autoencoder, to obtain a first latent variable corresponding to the image to be compressed.
  • the first latent variable refers to a random variable that exists in the model but cannot be directly observed, and is used to represent the potential characteristics of the input data.
  • the first latent variable can be the output of the hidden layer of the image transformation network (i.e., the middle layer between the input layer and the output layer of the image transformation network).
  • the image transformation network can be a neural network model for encoding the image to be compressed, including an input layer, at least one hidden layer and an output layer.
  • the image to be compressed is encoded through the image transformation network to obtain a first latent variable corresponding to the image to be compressed.
  • high-definition images in electronic games are usually compressed by a factor of 4 in batches.
  • for example, if the original game image has a resolution of 1024*1024, compressing it by a factor of 4 yields a low-resolution game image with a resolution of 256*256.
  • the image compression method of the present application can batch convert image resources into compressed images adapted to the graphics processing unit (GPU) of the terminal, thereby reducing the memory overhead on the terminal side and the network overhead during image transmission.
  • with the method of the present application, the original game image with a resolution of 1024*1024 can be compressed by a factor of 8, so that the compressed image obtained after decoding is smaller, reducing the storage cost of the image.
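  • Reading "compressed by a factor of 4" as shrinking each side by 4 (consistent with 1024*1024 becoming 256*256), the resulting sizes and pixel-count reductions can be checked with a short sketch; the helper `downscaled_size` is hypothetical.

```python
def downscaled_size(side, factor):
    """Side length after shrinking each side by `factor`, and the pixel-count ratio."""
    new_side = side // factor
    pixel_ratio = (side * side) // (new_side * new_side)
    return new_side, pixel_ratio


print(downscaled_size(1024, 4))  # (256, 16): 16x fewer pixels
print(downscaled_size(1024, 8))  # (128, 64): 64x fewer pixels
```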
  • Step 3002 Determine the hyper-prior probability estimate corresponding to the first latent variable.
  • the hyper-prior probability estimate can be determined as follows: encode the first latent variable to obtain an encoding result, quantize the encoding result to obtain a quantization result, and then decode the quantization result to obtain the hyper-prior probability estimate.
  • the first latent variable can be encoded by a hyper-prior encoder and the quantization result decoded by a hyper-prior decoder; both the hyper-prior encoder and the hyper-prior decoder can be included in the Transformer model.
  • the resulting hyper-prior probability estimate can be used as reference information for subsequent partial decoding, making the decoding result more accurate.
  • hyper-prior probability estimation can be the process of estimating the parameters of the prior distribution, where the parameters depend on the form of the prior distribution.
  • for example, when the prior distribution of the first latent variable is a normal distribution, the corresponding parameters of the prior distribution are the mean and variance.
  • the hyper-prior probability estimate is the value obtained by estimating the parameters of the prior distribution.
  • Step 3003 Partially decode the first latent variable according to the hyper-prior probability estimate to obtain a partial decoding result of the first latent variable.
  • the hyper-prior probability estimate is used as reference information to partially decode the first latent variable, that is, to decode a portion of the pixels, so that when the other pixels are decoded later, they can be predicted (decoded) based on the partial decoding result.
  • the first latent variable is grouped in the channel dimension to obtain multiple channel latent variable groups; then an autoregressive method (such as checkerboard autoregression) is used to decode some of these groups (for example, one channel latent variable group) to obtain a partial decoding result of the first latent variable, which then serves as prediction reference information for decoding the next undecoded channel latent variable group.
  • the selection of the decoded channel latent variable group can be random selection.
  • channel refers to the component of color information that constitutes a color image, or the characteristic component used to represent an image.
  • for example, in the RGB color model a color image consists of three color channels: red (R), green (G), and blue (B);
  • in the HSV color model, a color image consists of three channels: hue, saturation, and value (brightness).
  • the first latent variable is grouped based on these three color dimensions to obtain a channel latent variable group corresponding to the red (R) dimension, a channel latent variable group corresponding to the green (G) dimension, and a channel latent variable group corresponding to the blue (B) dimension, each latent variable group including multiple pixels.
  • checkerboard autoregression is used instead of serial autoregression within each latent variable group, so that the autoregression processing is performed orthogonally in the spatial and channel dimensions, and the channel group that is decoded first is used to predict the undecoded channel group.
  • Step 3004 Generate a compressed image corresponding to the image to be compressed based on the partial decoding result of the first latent variable and the first latent variable corresponding to the image to be compressed.
  • the data volume of the compressed image is smaller than the data volume of the image to be compressed.
  • a compressed image corresponding to the image to be compressed can be generated in the following manner: based on the partial decoding result of the first latent variable and the first latent variable corresponding to the image to be compressed, a new image is generated by performing autoregressive modeling on each pixel of the image.
  • the process may include masked convolution and pixel-by-pixel conditional probability modeling. For example, in each convolutional layer a suitable mask covers future pixels, ensuring that during training only known pixel values are used to predict the probability distribution of the current pixel value.
  • Each pixel is conditionally modeled using a series of convolutional layers. Each convolutional layer is responsible for modeling a subset of the input image. By modeling the conditional probability distribution of each pixel (given the pixel values to its left and above), known pixels can be used to predict the possible value of the current pixel.
  • a conditional probability distribution is the probability distribution of the possible values of a variable Y given a variable X, that is, the distribution of Y given X. Modeling the conditional probability distribution of each pixel means that, for each target pixel, the distribution of its possible values is modeled given the values of its associated pixels (such as the pixels to its left and above).
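  • A minimal NumPy sketch of the masking idea, assuming a PixelCNN-style mask in which the kernel sees only the pixels above the current pixel and those to its left in the same row (the current pixel itself is masked out). The helper names are hypothetical, and a real context model would use learned, multi-channel kernels.

```python
import numpy as np


def causal_mask(k):
    """1 for kernel taps above the centre row or left of the centre in that row."""
    mask = np.zeros((k, k))
    c = k // 2
    mask[:c, :] = 1.0  # every row above the centre row
    mask[c, :c] = 1.0  # pixels left of the centre in the centre row
    return mask


def masked_conv2d(image, kernel):
    """Zero-padded 2-D correlation with a masked kernel.

    out[i, j] depends only on pixels that precede (i, j) in raster order.
    """
    k = kernel.shape[0]
    weights = kernel * causal_mask(k)
    pad = k // 2
    padded = np.pad(image, pad)
    out = np.zeros_like(image, dtype=np.float64)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + k, j:j + k] * weights)
    return out
```

Changing a pixel leaves every output at or before that pixel (in raster order) unchanged, which is what lets decoding proceed using only already-known pixels.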
  • a compressed image corresponding to the image to be compressed can be generated in the following manner: quantizing the first latent variable to obtain a second latent variable; performing autoregression on the second latent variable according to the partial decoding result to obtain the mean and variance of the second latent variable; and decoding the second latent variable using the mean and variance to obtain the compressed image.
  • the process of quantizing the first latent variable can be regarded as the process of reducing the dimension of the first latent variable.
  • the first latent variable is mapped to a preset low-dimensional space to obtain a second latent variable.
  • the low-dimensional space here refers to a low-dimensional space relative to the dimension of the first latent variable.
  • the first latent variable can be quantized by nonlinear dimensionality reduction or quantization matrix.
  • the operation of quantizing the first latent variable can also be implemented by a vector quantizer.
  • the vector quantizer is a system that maps a continuous or discrete vector sequence into a digital sequence suitable for communication or storage on a digital channel. By quantizing the first latent variable, data compression is achieved while maintaining the necessary data fidelity.
  • the process of performing autoregression on the second latent variable according to the partial decoding result to obtain the mean and variance of the second latent variable can be achieved as follows: constructing sequence data according to the partial decoding result and the second latent variable, fitting the sequence data through at least one of an autoregressive model or a conditional heteroskedasticity model to obtain at least one of a mean dynamic model and a variance dynamic model of the second latent variable, performing parameter estimation on the mean dynamic model and the variance dynamic model of the second latent variable respectively through maximum likelihood estimation or other parameter estimation methods to obtain the mean and variance of the second latent variable.
  • the autoregressive model is a type of stationary time series model that can be used to predict and analyze data with autocorrelation.
  • using an autoregressive model, we can study whether data at different time points depend on each other and how strong this dependency is.
  • using the autoregressive model, we can predict the average trend of the random variable at future moments and obtain the mean dynamic model of the second latent variable.
  • the conditional heteroskedasticity model is explained below.
  • the conditional heteroskedasticity model is used to describe data exhibiting heteroskedasticity (that is, the variance is not constant). In practical applications, the variance of the data may change significantly over time or with other factors.
  • the conditional heteroskedasticity model captures this heteroskedasticity well. By fitting the conditional heteroskedasticity model to the sequence data, a variance dynamic model can be obtained.
  • the mean dynamic model and variance dynamic model of the second latent variable are explained below.
  • the mean dynamic model describes the average trend of time series data over time and indicates the dynamic characteristics of the sequence data.
  • the variance dynamic model describes how the variance of time series data changes over time.
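  • To make the mean and variance dynamic models concrete, the sketch below fits a first-order autoregressive mean model and a first-order ARCH-style conditional variance model to a one-dimensional series by ordinary least squares, one of the "other parameter estimation methods" rather than full maximum likelihood. It assumes first-order models and a generic series; it illustrates the classical statistics described above and is not the patent's context network.

```python
import numpy as np


def fit_ar1_arch1(x):
    """Fit x_t = c + phi * x_{t-1} + e_t (mean model) and
    e_t^2 ~ a0 + a1 * e_{t-1}^2 (variance model) by ordinary least squares.

    Returns the one-step-ahead mean and variance plus both parameter pairs.
    """
    x = np.asarray(x, dtype=np.float64)
    # Mean dynamic model: regress x_t on [1, x_{t-1}].
    X = np.column_stack([np.ones(len(x) - 1), x[:-1]])
    (c, phi), *_ = np.linalg.lstsq(X, x[1:], rcond=None)
    resid = x[1:] - X @ np.array([c, phi])
    # Variance dynamic model: regress squared residuals on their lagged values.
    sq = resid ** 2
    Z = np.column_stack([np.ones(len(sq) - 1), sq[:-1]])
    (a0, a1), *_ = np.linalg.lstsq(Z, sq[1:], rcond=None)
    # One-step-ahead forecasts of the mean and the conditional variance.
    mean_next = c + phi * x[-1]
    var_next = a0 + a1 * sq[-1]
    return mean_next, var_next, (c, phi), (a0, a1)
```

For a Gaussian AR(1) process, the least-squares estimate of `phi` coincides with the conditional maximum-likelihood estimate; the ARCH regression is only a rough approximation of a proper likelihood fit.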
  • the second latent variable can be decoded using the mean and variance to obtain a compressed image in the following manner: construct a multivariate Gaussian distribution based on the mean and variance of the latent variable.
  • the decoder is usually a neural network structure corresponding to the encoder, which can map the latent variable back to the original image space and map the input to the generated image through the decoder.
  • the second latent variable can also be decoded using the mean and variance to obtain a compressed image in the following manner: the second latent variable is decoded using the mean and variance to obtain a decoding result of the second latent variable; then segmentation and the transfer window attention mechanism are applied alternately to the decoding result until it has been completely segmented, to obtain a compressed image.
  • the second latent variable is obtained by quantizing the first latent variable, and the second latent variable contains more abstract and compressed information for decoding or reconstructing the image than the first latent variable, thereby making it possible to improve the image compression efficiency by compressing the image to be compressed based on the second latent variable.
  • segmenting the decoding result of the second latent variable means dividing it into different regions or blocks so that they can be processed in parallel to improve efficiency; using the attention mechanism to focus on specific parts of the segmented regions concentrates resources on the relevant parts of the image and improves the accuracy of image reconstruction. In some embodiments, alternately segmenting and applying the transfer window attention mechanism to the decoding result of the second latent variable until it has been completely segmented, to obtain a compressed image, includes:
  • the obtained multiple image regions are combined to obtain a compressed image.
  • autoregression is performed alternately in the spatial dimension and the channel dimension, which greatly improves compression efficiency.
  • the input image is first transformed to generate a low-dimensional latent variable (latent code), then the latent variable is modeled for probability estimation, and finally the latent variable is compressed into a bit stream using entropy coding according to the calculated probability; during the decompression process, the latent variable is first decoded and restored according to the bit stream, and then the image is reconstructed according to the latent variable, achieving efficient image compression.
  • the processing steps shown in Figure 3A can be implemented using an image processing model.
  • the image processing model used in the image compression method provided in this application includes: an image transformation network, a hyper-prior network, and a context network. The following describes the working processes of the image processing model including: the image transformation network, the hyper-prior network, and the context network.
  • FIG. 3B is a flow chart of an image compression method provided in an embodiment of the present application. It can be understood that the steps shown in FIG. 3B can be performed by various electronic devices running an image compression device, such as a server or server cluster with an image compression function, which is used to compress each image frame in a received image or received video through an image processing model to reduce the storage space occupied by image storage. The steps shown in FIG. 3B are described below.
  • Step 301 The electronic device encodes the image to be compressed through the image transformation network of the image processing model to obtain a first latent variable.
  • FIG. 4 is a schematic diagram of data flow of an image processing model in an embodiment of the present application.
  • the image processing model in the present application includes: an image transformation network, a hyper-prior network, and a context network; the functions are as follows:
  • the role of the image transformation network is to use high-resolution natural images to generate low-dimensional latent variables (latent codes). Assuming that the first latent variable obeys some inherent prior probability and the input image to be compressed obeys the conditional probability conditional on the latent variable, the image transformation network should make the probability estimates constructed by the encoder and decoder close enough so that the image reconstructed by the latent variable is close to the original image.
  • the hyper-prior network uses an encoder structure and a decoder structure to model the entropy of each point in the latent variable.
  • in the process of obtaining the entropy model of the feature values, the bit rate of the compressed image is estimated from the occurrence probability of each feature point, and entropy coding is performed accordingly.
  • the hyper-prior network can store the probability model of the latent variables in a small number of bytes, providing an auxiliary reference for the subsequent decoding performed by the context network.
  • the context network uses an autoregressive approach to predict the undecoded pixel information using the decoded pixel information, and finally inputs the prediction result into the decoder network of the image transformation network for decoding to obtain the compressed image.
  • the context network can reduce information redundancy and improve the efficiency of image compression.
  • the following describes the model structure and working principle of the image transformation network, the hyper-prior network, and the context network included in the image processing model.
  • the image transformation network includes: an image encoder network and an image decoder network;
  • the image encoder network includes: a transfer window attention mechanism module (Swin Transformer Block) and a block fusion module (Patch Merge Block), wherein the block fusion module includes in sequence: a space-to-depth conversion layer (Space-to-Depth), a normalization layer (LayerNorm) and a mapping layer (Linear);
  • the image decoder network includes: a transfer window attention mechanism module (Swin Transformer Block) and a block segmentation module (Patch Split Block), wherein the block segmentation module includes in sequence: a mapping layer (Linear), a normalization layer (LayerNorm) and a depth-to-space conversion layer (Depth-to-Space).
  • FIG. 6 is a schematic diagram of the working process of the space-to-depth conversion layer and the depth-to-space conversion layer in the embodiment of the present application. Because the image processing model must make the data volume of the compressed image smaller than that of the image to be compressed while keeping its resolution close to that of the image to be compressed, the space-to-depth conversion layer (Space-to-Depth) in the encoder network is configured to perform downsampling, and the depth-to-space conversion layer (Depth-to-Space) in the decoder network is configured to perform upsampling.
  • Space-to-Depth groups every 2*2 adjacent pixels into a block (patch) and concatenates the pixels at the same position (same shading) in each block along the channel direction; for a 4*4 input this yields four 2*2 blocks.
  • Depth-to-Space is the reverse operation of Space-to-Depth, which converts 4 2*2 blocks into a 4*4 image by upsampling.
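  • The two rearrangements can be sketched in NumPy as follows, assuming an (H, W, C) layout and a block size of 2. The channel ordering used here (row offset, column offset, original channel) is one common convention; a given framework may order the channels differently, and the function names are illustrative.

```python
import numpy as np


def space_to_depth(x, block=2):
    """(H, W, C) -> (H/block, W/block, C*block*block).

    Pixels at the same position inside every block x block patch are stacked
    along the channel axis, halving the spatial resolution for block=2.
    """
    h, w, c = x.shape
    x = x.reshape(h // block, block, w // block, block, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (H/b, W/b, row offset, col offset, C)
    return x.reshape(h // block, w // block, block * block * c)


def depth_to_space(x, block=2):
    """Exact inverse of space_to_depth: (H, W, C*b*b) -> (H*b, W*b, C)."""
    h, w, cbb = x.shape
    c = cbb // (block * block)
    x = x.reshape(h, w, block, block, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (H, row offset, W, col offset, C)
    return x.reshape(h * block, w * block, c)


image = np.arange(16).reshape(4, 4, 1)  # a 4*4 single-channel image
blocks = space_to_depth(image)          # shape (2, 2, 4): four 2*2 blocks
restored = depth_to_space(blocks)       # back to shape (4, 4, 1)
```

Both operations only move values, so no information is lost; the downsampling in the encoder trades spatial size for channels rather than discarding pixels.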
  • FIG. 7 is a schematic diagram of the composition of the transfer window attention mechanism module in the embodiment of the present application. The transfer window attention mechanism module (Swin Transformer block) mainly includes layer normalization, a multi-layer perceptron, a regular window multi-head attention mechanism, and a transfer (shifted) window multi-head attention mechanism.
  • the use of the window attention mechanism can effectively reduce the computational complexity in the operation process compared to the traditional attention mechanism, greatly improve the efficiency of the calculation, so that the attention mechanism can be applied in the processing of large images.
  • however, because attention is computed only within each window, the receptive field of this framework is severely limited. Adding the transfer window attention mechanism therefore greatly enlarges the receptive field of the attention mechanism without increasing the computational complexity.
  • the transfer window attention mechanism module builds a hierarchical feature map by merging image blocks in deeper layers, and since attention is computed only within each local window, its computational complexity is linear in the input image size. As shown in FIG. 7, in the present application, the transfer window attention mechanism module performs local self-attention within each non-overlapping window of the feature map and preserves the feature size.
  • FIG. 7 shows the internal structure of two consecutive Swin Transformer Blocks, including Layer Norm, multi-head self-attention, and fully connected layers, which are linked internally by shortcut (residual) connections.
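  • The window and shifted-window mechanics can be sketched in NumPy as below. Simplifying assumptions: a single head, identity query/key/value projections, and no attention mask for the regions that wrap around after the cyclic shift (Swin Transformer adds such a mask). Each M*M window costs on the order of M^4*C operations, and there are H*W/M^2 windows, so the total cost grows as H*W*M^2*C, linear in the image size. Function names are illustrative.

```python
import numpy as np


def window_partition(x, m):
    """Split an (H, W, C) map into non-overlapping m x m windows -> (num_windows, m*m, C)."""
    h, w, c = x.shape
    x = x.reshape(h // m, m, w // m, m, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, m * m, c)


def window_self_attention(x, m):
    """Single-head self-attention computed independently inside each window."""
    win = window_partition(x, m)                                  # (nW, m*m, C)
    scores = win @ win.transpose(0, 2, 1) / np.sqrt(x.shape[-1])  # (nW, m*m, m*m)
    scores -= scores.max(axis=-1, keepdims=True)                  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ win                                           # (nW, m*m, C)


def shifted_window_attention(x, m):
    """Cyclically shift by m//2 before partitioning, so the new windows straddle
    the previous window borders and information can flow between them."""
    shifted = np.roll(x, shift=(-(m // 2), -(m // 2)), axis=(0, 1))
    return window_self_attention(shifted, m)
```

With the window size of 8 used by the image transformation network, a 16*16 feature map splits into four windows; a pixel only influences its own window in the regular pass, while the shifted pass regroups pixels across the old window borders.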
  • the encoder network and decoder network of the image transformation network use a window size of 8, channel numbers of 128, 192, 256, and 320, and 2, 2, 6, and 2 stacked transfer window attention mechanism modules, respectively.
  • Step 302 Determine a hyper-prior probability estimate according to the first latent variable through a hyper-prior network.
  • the encoder network of the hyper-prior network includes a transfer window attention mechanism module and a block fusion module; the decoder network of the hyper-prior network includes a transfer window attention mechanism module and a block segmentation module. The window size is 4, the numbers of channels are 192 and 192, and the numbers of stacked transfer window attention mechanism modules are 5 and 1, respectively.
  • the hyper-prior network can determine the hyper-prior probability estimate from the first latent variable in the following way:
  • the first latent variable y is encoded by the hyper-prior encoder of the hyper-prior network to obtain the third latent variable z. The hyper-prior probability estimate corresponding to the first latent variable is determined through the quantization module (Q), the arithmetic encoding module (AE), and the arithmetic decoding module (AD) of the hyper-prior network: the third latent variable z is quantized by the quantization module (Q) to obtain the fourth latent variable, and during compression the fourth latent variable is encoded by the arithmetic encoding module.
  • the fourth latent variable is the quantized form of the third latent variable z. During compression, the fourth latent variable is compressed into a byte stream, and during decompression it is restored from the byte stream.
  • the fourth latent variable is decoded by the decoder network of the hyper-prior network shown in FIG. 4 to obtain the hyper-prior probability estimate N(μ, σ).
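  • A minimal sketch of how a Gaussian estimate N(μ, σ) can be turned into symbol probabilities for arithmetic coding and into an estimated bit count. It assumes integer (rounded) latent symbols and a Gaussian discretized into unit-width bins, a convention common in learned image compression; σ here is the standard deviation (the square root of the variance). Function names are illustrative.

```python
import math


def gaussian_cdf(x, mu, sigma):
    """Cumulative distribution function of N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def estimated_bits(symbols, mus, sigmas):
    """Estimated bits for integer symbols under per-symbol Gaussians.

    P(v) = CDF(v + 0.5) - CDF(v - 0.5); each symbol costs -log2 P(v) bits.
    """
    total = 0.0
    for v, mu, sigma in zip(symbols, mus, sigmas):
        p = gaussian_cdf(v + 0.5, mu, sigma) - gaussian_cdf(v - 0.5, mu, sigma)
        total += -math.log2(max(p, 1e-12))  # floor avoids log(0) for very unlikely symbols
    return total
```

The better the predicted mean and the tighter the predicted scale around the actual symbol, the fewer bits it costs, which is why more accurate hyper-prior and context estimates translate directly into a smaller byte stream.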
  • the hyper-prior encoder first compresses the probability (or cumulative probability) distribution information into z, quantizes and entropy-encodes z, and transmits it to the decoding end, where the modeling parameters of the latent representation y are learned through decoding.
  • the quantized second latent variable is modeled with these parameters and entropy-encoded to obtain the compressed code stream file, and arithmetic decoding recovers it from the byte stream. The entropy decoding result is then input into the decoding module to obtain the final compressed image.
  • Step 303 Quantize the first latent variable to obtain a second latent variable, and input the second latent variable into the context network.
  • Step 304 Autoregression is performed on the second latent variable through the context network to obtain the mean and variance of the second latent variable.
  • the arithmetic encoder encodes the second latent variable according to its probability distribution to obtain a byte stream.
  • the electronic device uses the context network to perform autoregression on the second latent variable according to the partial decoding result, models the probability of the second latent variable, and calculates its mean and variance; the arithmetic encoder then encodes the second latent variable according to this probability distribution to obtain a byte stream.
  • FIG. 9 is a schematic diagram of the autoregression of the context network in an embodiment of the present application.
  • the context network performs autoregression on the second latent variable according to the partial decoding result to obtain the mean and variance of the second latent variable, which can be achieved in the following manner:
  • the second latent variable is grouped to obtain at least two groups of sub-latent variables; each group of sub-latent variables is subjected to spatial autoregression through a checkerboard; after each group of sub-latent variables completes spatial autoregression, the undecoded channel group is predicted through partial decoding results until the second latent variable completely completes autoregression, thereby obtaining the probability distribution of the second latent variable.
  • spatial autoregression usually assumes that the features of a spatial location are correlated with the features of its surrounding neighboring locations. This correlation can be represented by a weight matrix (usually called a spatial weight matrix), which describes the spatial relationship between points in space.
  • the spatial weight matrix can be used to describe the association between latent variables and predicted results.
  • autoregression in the spatial dimension can be implemented by associating the current decoded symbol with the decoded symbols, modeling the variables probabilistically, and calculating the probability of all observable neighboring symbols.
  • channel-dimension autoregression can be achieved by dividing the channels of the second latent variable into K groups, which reduces redundancy between channels, and applying an autoregressive convolution g_ch in the channel direction to the channel groups decoded first, to predict the context representation of the undecoded channel groups; see Formula 2.
  • the setting of the number of channel groups is crucial to balancing compression performance and running speed.
  • K = 5 is the preferred number of channel groups for the image processing model in this application.
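  • Formula 2 referenced above is missing from this text. As a plausible form only, assuming the common channel-wise context formulation and writing the quantized second latent variable as $\hat{y}$ split along channels into $K$ groups (this notation is introduced here for illustration), the channel context could be written as:

$$
\hat{y} = \left(\hat{y}^{(1)}, \hat{y}^{(2)}, \ldots, \hat{y}^{(K)}\right), \qquad
\Phi_{ch}^{(k)} = g_{ch}\left(\hat{y}^{(1)}, \ldots, \hat{y}^{(k-1)}\right), \quad k = 2, \ldots, K
$$

  • where $g_{ch}$ is the channel-direction autoregressive convolution and $\Phi_{ch}^{(k)}$ is the predicted context representation of the $k$-th, still undecoded, channel group. The first group has no channel context and relies on the hyper-prior estimate and its own checkerboard context.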
  • a checkerboard spatial context autoregression model and a channel context autoregression model are combined to realize an accelerated operation of orthogonally alternating autoregression in the spatial and channel dimensions.
  • the latent variables are grouped in the channel dimension, and the checkerboard autoregression is used instead of the serial autoregression in each latent variable group.
  • the channel autoregression is used to predict the undecoded channel group with the first decoded channel group.
  • the context network performs autoregression prediction based on the hyper-prior probability modeling.
  • the first part of the checkerboard in the first channel group is predicted, and then the remaining checkerboard part is predicted with the currently predicted checkerboard result.
  • the prediction of the first channel group has been completed.
  • the predicted results of the first group will be used as information reference for subsequent probability modeling for joint calculation.
  • the entire operation process performs autoregression orthogonally and alternately in the spatial and channel dimensions, thereby effectively improving the compression rate of the image.
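  • The decoding order described above can be sketched as a list of sequential passes. Assumptions for illustration: a latent with 320 channels (the last channel number of the image transformation network) split evenly into K = 5 groups, and each group decoded in two checkerboard halves. All elements within one pass can be decoded in parallel, so the whole latent needs only 2K sequential passes instead of one pass per spatial position as in serial raster-order autoregression. Function names are hypothetical.

```python
import numpy as np


def checkerboard_masks(h, w):
    """Boolean masks for the two halves of a checkerboard over an h x w latent."""
    rows, cols = np.indices((h, w))
    first_half = (rows + cols) % 2 == 0
    return first_half, ~first_half


def decode_order(h, w, num_channels, num_groups):
    """Return the sequential decoding passes as (channel indices, spatial mask).

    Pass 2k decodes the first checkerboard half of channel group k using the
    hyper-prior and all earlier groups; pass 2k+1 decodes the remaining half,
    additionally using the half just decoded.
    """
    first_half, second_half = checkerboard_masks(h, w)
    passes = []
    for channels in np.array_split(np.arange(num_channels), num_groups):
        passes.append((channels, first_half))
        passes.append((channels, second_half))
    return passes


passes = decode_order(4, 4, num_channels=320, num_groups=5)
print(len(passes))  # 10 sequential passes for the whole latent
```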
  • Step 305 decode the second latent variable using the mean and the variance through the image transformation network to obtain a compressed image.
  • the second latent variable is decoded by the transfer window attention mechanism module of the decoder network of the image transformation network to obtain a decoding result of the second latent variable; this decoding result then passes alternately through the transfer window attention mechanism module and the block segmentation module to obtain a compressed image, whose data volume is smaller than that of the image to be compressed.
  • FIG. 10 is a flow chart of the image processing model training method provided by an embodiment of the present application. It is understandable that the steps shown in FIG. 10 can be performed by various electronic devices running an image processing model training device, such as a dedicated terminal with an image processing function, a server with an image processing model training function, or a server cluster. The steps shown in FIG. 10 are described below.
  • Step 1001 The image processing model training device obtains a first training sample set, where the first training sample set includes at least one group of noise-free training samples.
  • Step 1002 The image processing model training device configures random noise for the first training sample set to obtain a second training sample set.
  • configuring random noise for the first training sample set to obtain the second training sample set can be achieved in the following manner:
  • a dynamic noise quantity threshold that matches the use environment of the image processing model is determined; according to the dynamic noise quantity threshold, a dynamic quantity of random noise is configured for the first training sample to form a second training sample set that matches the dynamic noise threshold.
  • the use environment of mini-program game images is diverse, for example, it can be a role-playing mini-program game image, it can be an image of the user collected by the terminal as a mini-program game image, or it can be an image captured from a video image frame as a mini-program game image
  • the training samples come from different data sources, the data sources include data of various types of application scenarios as the data source of the corresponding training books.
  • a second training sample set matching the dynamic noise threshold can be used to perform targeted training on the image processing model.
  • configuring random noise for the first training sample set to obtain the second training sample set can be achieved in the following manner:
  • a fixed noise quantity threshold that matches the use environment of the image processing model is determined; according to the fixed noise quantity threshold, a fixed amount of random noise is configured for the first training sample to form a second training sample set that matches the fixed noise threshold. Since the training samples are derived from a fixed data source, the data source includes data of fixed scenes as the data source of the corresponding training book (for example, any electronic device that generates medical images).
  • the image processing model provided in this application can be packaged as a software module in a mobile detection electronic device, or it can be packaged in different fixed medical examination equipment (including but not limited to: handheld diagnostic instruments, ward central monitoring systems, bedside monitoring systems), and of course it can also be solidified in the hardware equipment of the intelligent robot.
  • a second training sample set that matches the fixed noise threshold can be used to conduct targeted training on the image processing model to improve the training speed of the image processing model.
  • Step 1003: The image processing model training apparatus calculates the loss function of the image processing model.
  • First, the pixel difference between the compressed image and the image to be compressed is obtained; then the number of bytes needed to store the second latent variable and the fourth latent variable of the image processing model is obtained; finally, the fused loss function of the image processing model is calculated from the pixel difference and the number of bytes.
  • R denotes the rate, i.e., the bytes required to store the second latent variable and the fourth latent variable.
  • D denotes the distortion, usually computed as d(x, x̂), the difference between the compressed image and the image to be compressed, where d is usually the mean squared error (MSE).
  • λ is the parameter that trades off rate against distortion. In general, the larger λ is, the higher the bits per pixel (BPP) of the corresponding model and the higher the image reconstruction quality.
  • Step 1004: Based on the initial parameters and the loss function of the image processing model, the model is trained with the first training sample set and the second training sample set.
  • Training determines the image transformation network parameters, hyperprior network parameters and context network parameters of the image processing model.
  • FIG. 11 is a schematic diagram of an effect test of the image processing model provided in an embodiment of the present application: a performance test was run on the standard Kodak dataset, and the rate-distortion performance of the model at different compression rates was plotted with bpp on the horizontal axis and PSNR (Peak Signal-to-Noise Ratio) on the vertical axis.
  • The values of λ at the four test points of the image processing model of the present application are 0.002, 0.005, 0.02 and 0.04, respectively. According to the application, the results show that its image processing model improves image compression efficiency and yields compressed images of smaller volume.
  • The image transformation network of the image processing model encodes the image to be compressed to obtain a first latent variable, and the hyperprior network determines the hyperprior probability estimate from the first latent variable. Processing the image with an image transformation network and a hyperprior network built from shifted window attention therefore improves compression performance, makes the decoded compressed image smaller and reduces image storage cost.
  • The context network partially decodes the first latent variable according to the hyperprior probability estimate to obtain a partial decoding result. It then performs autoregression on the second latent variable according to the partial decoding result to obtain the mean and variance of the second latent variable, and the second latent variable is decoded using that mean and variance to obtain a compressed image whose volume is smaller than that of the image to be compressed. The context network thus uses the information of the channel groups decoded first as prior knowledge for the channel groups still to be decoded. This reduces subsequent compression redundancy and saves compression time. In addition, the context network alternates autoregression between the spatial and channel dimensions, which improves compression efficiency.
  • The training sample sets can be adjusted flexibly to different usage requirements, so that the image processing model is applicable to different image compression environments.
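The bullets above do not define the noise quantity thresholds precisely. The following Python sketch shows one possible reading, which is purely an assumption. Here the threshold caps how many noisy copies are made of each clean sample. In the dynamic case the number of copies is drawn at random for each sample, and in the fixed case it is constant. The Gaussian noise model, the `sigma` value and the names `add_gaussian_noise` and `build_second_set` are hypothetical and are not taken from the application.

```python
import random
from typing import List, Sequence

Image = List[float]  # hypothetical: a flattened grayscale image, pixel values in [0, 1]


def add_gaussian_noise(img: Sequence[float], sigma: float, rng: random.Random) -> Image:
    """Return a noisy copy of img (i.i.d. Gaussian noise), clipped back to [0, 1]."""
    return [min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in img]


def build_second_set(clean: Sequence[Sequence[float]], threshold: int, dynamic: bool,
                     sigma: float = 0.05, seed: int = 0) -> List[Image]:
    """Derive the noisy (second) training set from the noise-free (first) set.

    Assumed reading of the 'noise quantity threshold': the maximum number of noisy
    copies made per clean sample.
      dynamic=True  -> a random count in [1, threshold] per sample (diverse sources)
      dynamic=False -> exactly `threshold` copies per sample (fixed source)
    """
    if threshold < 1:
        raise ValueError("threshold must be >= 1")
    rng = random.Random(seed)
    noisy: List[Image] = []
    for img in clean:
        copies = rng.randint(1, threshold) if dynamic else threshold
        for _ in range(copies):
            noisy.append(add_gaussian_noise(img, sigma, rng))
    return noisy


if __name__ == "__main__":
    first_set = [[0.5] * 16 for _ in range(3)]  # three tiny 4x4 images
    print(len(build_second_set(first_set, 2, dynamic=False)))  # 6
    print(len(build_second_set(first_set, 4, dynamic=True)))   # between 3 and 12
```

Following Step 1004, both the first (noise-free) set and the resulting second set would then be used for training.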

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Compression Of Band Width Or Redundancy In Fax (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

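This application provides an image compression method and apparatus, an electronic device, a computer program product and a storage medium. The method includes: encoding an image to be compressed to obtain a first latent variable corresponding to the image to be compressed; determining a hyperprior probability estimate corresponding to the first latent variable; partially decoding the first latent variable according to the hyperprior probability estimate to obtain a partial decoding result of the first latent variable; and determining, based on the partial decoding result of the first latent variable and the first latent variable corresponding to the image to be compressed, a compressed image corresponding to the image to be compressed, where the data volume of the compressed image is smaller than that of the image to be compressed.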

Description

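Image compression method and apparatus, electronic device, computer program product and storage medium
Cross-reference to related applications
This application is based on and claims priority to Chinese patent application No. 2023101368433, filed on February 9, 2023, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to computer technology, and in particular to an image compression method and apparatus, an electronic device, a computer program product and a computer storage medium.
Background
In the related art, deep neural networks achieve very good performance in many computer vision tasks. However, when images are compressed with neural network models, the related art uses basic convolutional networks for the image transform. When the compression rate is low, the latent variables must be recovered from the byte stream to reconstruct a high-quality image, and the limited capacity of the nonlinear image transform network restricts its ability to reconstruct high-quality images. In addition, the context model in the related art decodes serially with PixelCNN, which makes image compression inefficient.
Summary
In view of this, embodiments of the present application provide an image compression method and apparatus, an electronic device, a computer program product and a computer storage medium. These use an image processing model to improve image compression efficiency while making compressed images smaller, thereby reducing image storage cost.
The technical solutions of the embodiments of the present application are implemented as follows: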
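An embodiment of the present application provides an image compression method, the method including:
encoding an image to be compressed to obtain a first latent variable corresponding to the image to be compressed;
determining a hyperprior probability estimate corresponding to the first latent variable;
partially decoding the first latent variable according to the hyperprior probability estimate to obtain a partial decoding result of the first latent variable;
generating, based on the partial decoding result of the first latent variable and the first latent variable corresponding to the image to be compressed, a compressed image corresponding to the image to be compressed, where the data volume of the compressed image is smaller than that of the image to be compressed.
An embodiment of the present application further provides an image compression apparatus, the apparatus including:
an encoding module, configured to encode an image to be compressed to obtain a first latent variable corresponding to the image to be compressed;
an information processing module, configured to determine a hyperprior probability estimate corresponding to the first latent variable;
the information processing module is further configured to partially decode the first latent variable according to the hyperprior probability estimate to obtain a partial decoding result of the first latent variable;
the information processing module is further configured to generate, based on the partial decoding result of the first latent variable and the first latent variable corresponding to the image to be compressed, a compressed image corresponding to the image to be compressed, where the data volume of the compressed image is smaller than that of the image to be compressed.
An embodiment of the present application further provides an electronic device, the electronic device including:
a memory, configured to store executable instructions;
a processor, configured to implement the foregoing image compression method when executing the executable instructions stored in the memory.
An embodiment of the present application further provides a computer program product including a computer program or instructions which, when executed by a processor, implement the foregoing image compression method.
An embodiment of the present application further provides a computer-readable storage medium storing executable instructions which, when executed by a processor, implement the foregoing image compression method.
The embodiments of the present application have the following beneficial effects:
The embodiments encode the image to be compressed to obtain a first latent variable and determine a hyperprior probability estimate from it. If the encoded first latent variable follows some inherent prior probability, the hyperprior probability estimate can serve as a reference for the subsequent partial decoding. This makes the decoding result more accurate, improves compression performance, makes the decoded compressed image smaller and reduces image storage cost. The first latent variable is then partially decoded according to the hyperprior probability estimate to obtain a partial decoding result. Partial decoding means decoding only some of the pixels, so that when the remaining pixels are decoded later they can be predicted (decoded) from this partial result. This saves compression time and improves compression efficiency.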
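Brief Description of the Drawings
FIG. 1 is a schematic diagram of a use environment of an image compression method provided by an embodiment of the present application;
FIG. 2 is a schematic structural diagram of an electronic device provided by an embodiment of the present application;
FIG. 3A is a first schematic flow chart of the image compression method provided by an embodiment of the present application;
FIG. 3B is a second schematic flow chart of the image compression method provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of the data flow of the image processing model provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of the model structure of the image processing model provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of the working process of the space-to-depth layer and the depth-to-space layer provided by an embodiment of the present application;
FIG. 7 is a schematic structural diagram of the shifted window attention module provided by an embodiment of the present application;
FIG. 8 is a schematic diagram of the computation principle of the shifted window attention module provided by an embodiment of the present application;
FIG. 9 is a schematic diagram of the autoregression of the context network provided by an embodiment of the present application;
FIG. 10 is a schematic flow chart of the image processing model training method provided by an embodiment of the present application;
FIG. 11 is a schematic diagram of an effect test of the image processing model provided by an embodiment of the present application.
Detailed Description
To make the objectives, technical solutions and advantages of the present application clearer, the present application is described in detail below with reference to the accompanying drawings. The described embodiments should not be regarded as limiting the present application. All other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the present application.
In the following description, "some embodiments" describes a subset of all possible embodiments. It is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and they may be combined with each other where no conflict arises.
Before the embodiments of the present application are described in detail, the nouns and terms involved in them are explained. The following interpretations apply to these nouns and terms.
1) Wasserstein distance: a distance metric used mainly to measure the difference between two distributions.
2) Artificial neural network: Neural Network (NN) for short. In machine learning and cognitive science, it is a mathematical or computational model that imitates the structure and function of biological neural networks and is used to estimate or approximate functions.
3) Model parameters: quantities that use general variables to establish the relationship between functions and variables. In artificial neural networks, model parameters are usually real-valued matrices.
4) Model training: multi-class learning on an image dataset. The model can be built with deep learning frameworks such as TensorFlow or torch, combining multiple neural network layers such as CNN layers into a multi-class image model. The model's input is a three-channel or original-channel matrix formed by reading the image with a tool such as OpenCV. Its output is multi-class probabilities, from which an algorithm such as softmax finally outputs the image compression result. During training, the model is driven toward the correct result by an objective function such as cross-entropy.
5) Variational autoencoder (VAE): a common network architecture in image compression that transforms a high-dimensional input image into a low-dimensional latent variable (latent code). The latent variable follows some inherent prior probability, and the input image follows a conditional probability conditioned on the latent variable. The low-dimensional variable can therefore describe the information contained in the input image, and the high-dimensional input image can be reconstructed by sampling. In image compression, the VAE compresses the low-dimensional latent variable to reduce information redundancy.
6) Hyperprior: on top of the latent variable obtained by passing the input image through the encoder, the hyperprior uses a lightweight network to build an entropy model for each point of the latent variable. The entropy model of the feature values gives the occurrence statistics of the feature points, which are used for bit-rate estimation and entropy coding. The hyperprior stores the probability model of the latent variable with a small number of bytes. During decoding, the byte stream stored by the hyperprior module is decoded first, and the probabilities decoded from it are then used to recover the latent variable and reconstruct the image.
7) Context model: a context model usually works autoregressively, using the information of already-decoded pixels to predict undecoded pixels and thereby reduce information redundancy. Common autoregressive models predict serially and linearly with a sliding window, and their complexity grows multiplicatively with the dimension of the input data. Although an autoregressive context model can greatly improve model performance, it also greatly increases the computational complexity of the compression model.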
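8) Entropy coding: lossless coding that, following the entropy principle, loses no information during coding. It is also a key module of lossy coding and sits at the end of the encoder. Information entropy is the average information content of a source (a measure of uncertainty). Common entropy codes include Shannon coding, Huffman coding, exponential-Golomb coding (Exp-Golomb) and arithmetic coding. Entropy coding operates on the symbols that the encoder produces after a series of operations such as quantization, transform, motion estimation and prediction, and a suitable entropy coding model is chosen according to the distribution of those symbols. Entropy coding is therefore a relatively independent unit. It applies not only to video coding but also to other coders, such as image coding and point cloud coding.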
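The following sketch illustrates the entropy definition above. It computes the empirical entropy of a symbol sequence. This value is a lower bound on the average code length, in bits per symbol, that any lossless code can achieve for an i.i.d. source with those symbol frequencies. The function names are illustrative, and treating the observed frequencies as the source distribution is an assumption.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def empirical_entropy_bits(symbols: Sequence[Hashable]) -> float:
    """Average information per symbol, H = -sum(p * log2 p), with observed frequencies as p."""
    n = len(symbols)
    if n == 0:
        return 0.0
    counts = Counter(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def min_lossless_bits(symbols: Sequence[Hashable]) -> float:
    """Ideal total code length for the sequence under an i.i.d. model with these frequencies."""
    return len(symbols) * empirical_entropy_bits(symbols)
```

Huffman coding stays within one bit per symbol of this bound. Arithmetic coding, which the hyperprior network in this application uses, can approach the bound more closely on long sequences.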
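Before the image compression method provided by the embodiments of the present application is introduced, the shortcomings of image compression methods in the related art are explained. In the related art, image coding methods rely on hand-crafted image features. JPEG, BPG and VVC-intra use orthogonal linear transforms, such as the discrete cosine transform (DCT) and the discrete wavelet transform (DWT), to decorrelate image pixels before quantization and coding. JPEG compresses the Y, Cb and Cr components separately, based on the premise that the human eye is more sensitive to luminance than to color. For example, for a natural image, JPEG applies a DCT to each 8*8 patch to obtain 64 DCT coefficients. By the principle of energy compaction, the important coefficients are concentrated mostly in the low-frequency region, so the image can be restored to acceptable quality without all the coefficients. After the DCT coefficients are quantized, variable-length coding and Huffman coding can remove the redundancy. However, when the compression rate is low, the latent variables must be recovered from the byte stream to reconstruct a high-quality image, and the capacity of the nonlinear image transform network limits the network's ability to do so. In addition, the context model in the related art decodes serially with PixelCNN, which is inefficient.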
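The energy-compaction argument above can be checked numerically. The following sketch computes the orthonormal 2-D DCT-II of an 8x8 block directly from its definition. This is slow but needs no dependencies. The sketch then measures how much of the block's energy lands in the lowest-frequency coefficients. It omits JPEG's level shift, quantization tables, zig-zag scan and entropy coding. The names `dct2_8x8` and `low_freq_energy_ratio` are illustrative.

```python
import math
from typing import List

N = 8  # JPEG block size


def dct2_8x8(block: List[List[float]]) -> List[List[float]]:
    """Orthonormal 2-D DCT-II of an 8x8 block, computed directly (O(N^4), for illustration)."""
    def alpha(k: int) -> float:
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out


def low_freq_energy_ratio(coeffs: List[List[float]], k: int = 2) -> float:
    """Share of total energy held by the top-left k x k (lowest-frequency) coefficients."""
    total = sum(c * c for row in coeffs for c in row)
    low = sum(coeffs[u][v] ** 2 for u in range(k) for v in range(k))
    return low / total if total else 1.0


if __name__ == "__main__":
    ramp = [[float(x + y) for y in range(N)] for x in range(N)]  # smooth 8x8 gradient
    print(round(low_freq_energy_ratio(dct2_8x8(ramp)), 3))
```

For a smooth gradient block, nearly all of the energy sits in the top-left 2x2 coefficients. This is why discarding or coarsely quantizing the high-frequency coefficients costs little visible quality.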
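Based on this, an embodiment of the present application provides an image compression method that compresses images with an image processing model comprising an image transformation network, a hyperprior network and a context network. This improves compression efficiency while raising the quality of the compressed image.
FIG. 1 is a schematic diagram of a usage scenario of the image compression method provided by an embodiment of the present application. Referring to FIG. 1, terminals (including terminal 10-1 and terminal 10-2) are provided with a client having an image processing function or a client having a video processing function. Through the image processing client, the user can input an image to be processed, and the client can also receive the corresponding compressed image and display it to the user. The video processing client can compress each frame of a video with the image processing model provided by the embodiments to reduce the server storage space the video occupies. The terminals connect to the server 200 through the network 300, which may be a wide area network, a local area network or a combination of both, and which uses wireless links for data transmission.
As an example, the server 200 is configured to deploy and train the image processing model to determine the network parameters of the image transformation network, the hyperprior network and the context network in the image processing model. After training is complete, the compressed image that the model generates for the image to be processed is displayed through a terminal (terminal 10-1 and/or terminal 10-2).
Of course, before the image processing model compresses an image to be processed, the model must be trained to determine the network parameters of the image transformation network, the hyperprior network and the context network.
The structure of the electronic device that implements the image compression method provided by the embodiments is described in detail below. The electronic device can be implemented in various forms, such as a dedicated terminal with an image compression function or a server with an image compression function, for example the server 200 in FIG. 1. FIG. 2 is a schematic structural diagram of the electronic device provided by an embodiment of the present application. FIG. 2 shows only an exemplary structure of the electronic device rather than the complete structure, and part or all of the structure shown in FIG. 2 may be implemented as needed.
The electronic device provided by the embodiments includes at least one processor 201, a memory 202, a user interface 203 and at least one network interface 204. The components of the electronic device 20 are coupled through a bus system 205, which implements connection and communication among them. Besides a data bus, the bus system 205 includes a power bus, a control bus and a status signal bus. For clarity, however, all buses are labeled as bus system 205 in FIG. 2.
The user interface 203 may include a display, a keyboard, a mouse, a trackball, a click wheel, keys, buttons, a touch pad, a touch screen or the like.
The memory 202 may be a volatile memory or a non-volatile memory, or may include both. The memory 202 in the embodiments can store data to support the operation of a terminal (such as 10-1). Examples of such data include any computer program for operating on the terminal (such as 10-1), such as an operating system and application programs. The operating system contains various system programs, such as a framework layer, a core library layer and a driver layer, for implementing various basic services and handling hardware-based tasks. The application programs may include various applications.
In some embodiments, the image compression apparatus provided by the embodiments may be implemented by a combination of software and hardware. As an example, the apparatus may be a processor in the form of a hardware decoding processor programmed to perform the image compression method provided by the embodiments. For example, such a processor may use one or more application-specific integrated circuits (ASIC, Application Specific Integrated Circuit), DSPs, programmable logic devices (PLD, Programmable Logic Device), complex programmable logic devices (CPLD, Complex Programmable Logic Device), field-programmable gate arrays (FPGA, Field-Programmable Gate Array) or other electronic components.
As an example of a software-hardware implementation, the image compression apparatus provided by the embodiments may be embodied directly as a combination of software modules executed by the processor 201. The software modules may be located in a storage medium in the memory 202. The processor 201 reads the executable instructions included in the software modules from the memory 202 and, together with the necessary hardware (for example, the processor 201 and other components connected to the bus 205), completes the image compression method provided by the embodiments.
As an example, the processor 201 may be an integrated circuit chip with signal processing capability, such as a general-purpose processor, a digital signal processor (DSP, Digital Signal Processor), another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The general-purpose processor may be a microprocessor or any conventional processor.
As an example of a hardware implementation, the apparatus provided by the embodiments may be executed directly by the processor 201 in the form of a hardware decoding processor. For example, one or more ASICs, DSPs, PLDs, CPLDs, FPGAs or other electronic components may implement the image compression method provided by the embodiments.
The memory 202 in the embodiments is configured to store various types of data to support the operation of the electronic device 20. Examples of such data include any executable instructions for operating on the electronic device 20. A program implementing the image compression method of the embodiments may be contained in the executable instructions.
In other embodiments, the image compression apparatus provided by the embodiments may be implemented in software. FIG. 2 shows the image compression apparatus stored in the memory 202, which may be software in the form of programs, plug-ins and the like and includes a series of modules. As an example of a program stored in the memory 202, the image compression apparatus includes the following software modules: an encoding module 2081 and an information processing module 2082.
When the software modules of the image compression apparatus are read into random access memory (RAM, Random Access Memory) by the processor 201 and executed, the image compression method provided by the embodiments is implemented. The functions of the software modules of the image compression apparatus are described below:
the encoding module 2081 is configured to encode an image to be compressed to obtain a first latent variable corresponding to the image to be compressed;
the information processing module 2082 is configured to determine a hyperprior probability estimate corresponding to the first latent variable;
the information processing module 2082 is configured to partially decode the first latent variable according to the hyperprior probability estimate to obtain a partial decoding result of the first latent variable;
the information processing module 2082 is further configured to generate, based on the partial decoding result of the first latent variable and the first latent variable corresponding to the image to be compressed, a compressed image corresponding to the image to be compressed, where the data volume of the compressed image is smaller than that of the image to be compressed.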
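In some embodiments, the information processing module 2082 is further configured to perform autoregression on the second latent variable according to the partial decoding result to obtain the mean and variance of the second latent variable;
the information processing module 2082 is further configured to decode the second latent variable using the mean and variance to obtain the compressed image.
In some embodiments, the information processing module 2082 is configured to encode the first latent variable to obtain a third latent variable;
the information processing module 2082 is configured to entropy-code the third latent variable to obtain the entropy code of the third latent variable;
the information processing module 2082 is configured to decode the entropy code of the third latent variable to obtain a fourth latent variable;
the information processing module 2082 is configured to decode the fourth latent variable to obtain the hyperprior probability estimate.
In some embodiments, the information processing module 2082 is configured to group the second latent variable to obtain at least two groups of sub-latent variables;
the information processing module 2082 is configured to perform spatial autoregression on each group of sub-latent variables using a checkerboard pattern;
the information processing module 2082 is configured to predict the undecoded channel groups from the partial decoding result after each group of sub-latent variables completes spatial autoregression, until autoregression of the second latent variable is complete, to obtain the mean and variance of the second latent variable.
In some embodiments, the information processing module 2082 is further configured to decode the second latent variable using the mean and variance to obtain a decoding result of the second latent variable;
the information processing module 2082 is configured to apply splitting and shifted window attention alternately to the decoding result of the second latent variable until the decoding result is completely split, to obtain the compressed image.
In some embodiments, the information processing module 2082 is further configured to encode the image to be compressed through the image transformation network of the image processing model to obtain the first latent variable;
determine, through the hyperprior network, the hyperprior probability estimate from the first latent variable;
partially decode, through the context network, the first latent variable according to the hyperprior probability estimate to obtain the partial decoding result;
perform, through the context network, autoregression on the second latent variable according to the partial decoding result to obtain the mean and variance of the second latent variable;
decode, through the image transformation network, the second latent variable using the mean and variance to obtain the compressed image.
In some embodiments, the information processing module 2082 is further configured to obtain a first training sample set corresponding to the image processing model, the first training sample set including at least one group of noise-free training samples;
the information processing module 2082 is further configured to add random noise to the first training sample set to obtain a second training sample set;
the information processing module 2082 is further configured to obtain the initial parameters of the image processing model;
the information processing module 2082 is further configured to train the image processing model with the first training sample set and the second training sample set, based on the initial parameters and the loss function of the image processing model, to determine the image transformation network parameters, hyperprior network parameters and context network parameters of the image processing model.
In some embodiments, the information processing module 2082 is configured to determine, when the use environment of the image processing model is video image compression, a dynamic noise quantity threshold matching the use environment of the image processing model;
and to add a dynamic amount of random noise to the first training samples according to the dynamic noise quantity threshold, to obtain a second training sample set matching the dynamic noise threshold.
In some embodiments, the information processing module 2082 is configured to determine, when the use environment of the image processing model is medical image compression, a fixed noise quantity threshold matching the use environment of the image processing model;
and to add a fixed amount of random noise to the first training samples according to the fixed noise quantity threshold, to obtain a second training sample set matching the fixed noise threshold.
In some embodiments, the information processing module 2082 is configured to obtain the pixel difference between the compressed image and the image to be compressed; obtain the number of bytes for storing the second latent variable and the fourth latent variable of the image processing model; and determine the fused loss function of the image processing model from the pixel difference and the number of bytes.
An embodiment of the present application further provides a computer program product or computer program including computer-executable instructions stored in a computer-readable storage medium. A processor of a computer device or electronic device reads the computer-executable instructions from the computer-readable storage medium and executes them. This causes the computer device to perform the different embodiments, and combinations of embodiments, of the image compression method described above.
After training is complete, the image processing model can be deployed in a server or a cloud server network. The image compression apparatus provided by this application can also be deployed in the electronic device shown in FIG. 2 to perform the image compression method provided by the embodiments.
The image compression method provided by the embodiments is described with reference to the electronic device 20 shown in FIG. 2. Referring to FIG. 3A, a schematic flow chart of the image compression method provided by an embodiment of the present application, the method includes the following steps: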
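Step 3001: Encode the image to be compressed to obtain a first latent variable corresponding to the image to be compressed.
Here, the image to be compressed may be a natural image. In practice, it can be encoded by an image transformation network for image coding, such as a variational autoencoder, to obtain the first latent variable corresponding to the image to be compressed. The first latent variable is a random variable that exists in the model but cannot be observed directly, and it represents the latent features of the input data. In an actual implementation, the first latent variable may be the output of a hidden layer of the image transformation network, i.e., an intermediate layer between its input layer and output layer.
Here, the image transformation network may be a neural network model for encoding the image to be compressed, including an input layer, at least one hidden layer and an output layer. The image to be compressed is encoded by the image transformation network to obtain the first latent variable corresponding to the image to be compressed.
Take an electronic game scenario as an example. Because games contain many images, high-definition images in a game are usually compressed 4x in batches. For example, an original game image with a resolution of 1024*1024 becomes a low-resolution game image of 256*256 after 4x compression. With the image compression method of this application, however, image resources can be converted in batches into compressed images suited to run on the terminal's graphics processing unit (GPU, Graphics Processing Unit). This reduces memory overhead on the terminal and network overhead during image transmission. For example, an original 1024*1024 game image can be compressed 8x, so that the decoded compressed image is smaller and image storage cost is reduced.
Step 3002: Determine the hyperprior probability estimate corresponding to the first latent variable.
In some embodiments, based on the encoded first latent variable, the hyperprior probability estimate can be determined as follows. First, encode the first latent variable to obtain an encoding result. Next, quantize the encoding result to obtain a quantization result. Then decode the quantization result to obtain the hyperprior probability estimate.
The first latent variable can be encoded by a hyperprior encoder, and the quantization result can be decoded by a hyperprior decoder; the hyperprior encoder and decoder may be included in a Transformer model. In this way, if the encoded first latent variable follows some inherent prior probability, the resulting hyperprior probability estimate can serve as a reference for the subsequent partial decoding, making the decoding result more accurate.
Here, hyperprior probability estimation may be the process of estimating the parameters of the prior distribution. Which parameters these are depends on the form of the prior distribution. For example, when the prior distribution of the first latent variable is a normal distribution, its parameters can be the mean and variance. The hyperprior probability estimate is the value obtained by estimating these prior parameters.
Step 3003: Partially decode the first latent variable according to the hyperprior probability estimate to obtain a partial decoding result of the first latent variable.
Here, in practice, the hyperprior probability estimate serves as reference information for decoding, and the first latent variable is partially decoded. In other words, only some of the pixels are decoded, so that the remaining pixels can later be predicted (decoded) from this partial decoding result.
For example, the first latent variable is grouped along the channel dimension to obtain multiple channel latent-variable groups, one per channel dimension. Then autoregression (such as checkerboard autoregression) is used to decode some of these groups (such as one channel latent-variable group) to obtain a partial decoding result of the first latent variable. This partial decoding result then serves as prediction reference information for decoding the next undecoded channel latent-variable group. The channel latent-variable group to be decoded may be selected at random.
Here, a channel is a component of the color information that makes up a color image, or a component representing an image feature. In the RGB color model, a color image consists of three color channels: red (R), green (G) and blue (B). In the HSV color model, it consists of three channels: hue, saturation and value (brightness).
In some embodiments, the channel dimension may correspond to the color dimension, which comprises red (R), green (G) and blue (B). In that case, the first latent variable is grouped by these three color dimensions into a channel latent-variable group for R, one for G and one for B, each containing multiple pixels.
Thus, after the first latent variable is grouped along the channel dimension, checkerboard autoregression replaces serial autoregression within each latent-variable group. Autoregression therefore alternates orthogonally between the spatial and channel dimensions, and the channel groups decoded first are used to predict those not yet decoded.
Step 3004: Generate, based on the partial decoding result of the first latent variable and the first latent variable corresponding to the image to be compressed, a compressed image corresponding to the image to be compressed.
The data volume of the compressed image is smaller than that of the image to be compressed.
In some embodiments, the compressed image can be generated from the partial decoding result of the first latent variable and the first latent variable as follows: each pixel of the image is modeled autoregressively to generate a new image. This process may include masked convolution and pixel-wise conditional probability modeling. For example, each convolutional layer uses a suitable mask to hide future pixels, which ensures that during training only known pixel values are used to predict the probability distribution of the current pixel's value. A series of convolutional layers performs conditional modeling for each pixel, with each layer modeling a subset of the input image. By modeling the conditional probability distribution of each pixel given the pixel values to its left and above, the possible values of the current pixel can be predicted from known pixels.
Here, a conditional probability distribution is the probability distribution of the possible values of a variable Y given a variable X, i.e., how Y is distributed conditioned on X. Modeling the conditional probability distribution of each pixel means the following: for each target pixel, the distribution of its possible values is modeled given the values of its associated pixels (such as the pixels to its left and above).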
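The following sketch makes the masking rule concrete. It builds a PixelCNN-style causal mask for a k x k kernel in raster-scan order and applies it at one position. As a result, a "future" pixel cannot influence the prediction, and in the first layer neither can the current pixel itself. The "type A"/"type B" naming follows the PixelCNN literature. The helper names are illustrative assumptions. Like deep-learning convolution layers, the sketch computes a cross-correlation and does not flip the kernel.

```python
from typing import List

Grid = List[List[float]]


def causal_mask(k: int, include_center: bool) -> List[List[int]]:
    """PixelCNN-style k x k mask for raster-scan decoding.

    1 = the tap may read an already-decoded pixel: any pixel in the rows above
    (including upper-right) or to the left in the current row.
    include_center=False -> 'type A' (first layer); True -> 'type B' (later layers).
    """
    if k % 2 == 0:
        raise ValueError("kernel size must be odd")
    c = k // 2
    mask = [[0] * k for _ in range(k)]
    for r in range(k):
        for col in range(k):
            decoded_before = r < c or (r == c and col < c)
            if decoded_before or (r == c and col == c and include_center):
                mask[r][col] = 1
    return mask


def masked_response(img: Grid, r: int, c: int, weights: Grid, mask: List[List[int]]) -> float:
    """One output of a masked convolution at (r, c); taps outside the image count as zero."""
    k = len(mask)
    h = k // 2
    total = 0.0
    for dr in range(k):
        for dc in range(k):
            rr, cc = r + dr - h, c + dc - h
            if mask[dr][dc] and 0 <= rr < len(img) and 0 <= cc < len(img[0]):
                total += weights[dr][dc] * img[rr][cc]
    return total
```

Changing a pixel below or to the right of (r, c) leaves `masked_response` unchanged. This is the property that lets the decoder reproduce the encoder's predictions one pixel at a time.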
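In some embodiments, the compressed image can also be generated from the partial decoding result of the first latent variable and the first latent variable as follows. First, quantize the first latent variable to obtain a second latent variable. Next, perform autoregression on the second latent variable according to the partial decoding result to obtain the mean and variance of the second latent variable. Finally, decode the second latent variable using the mean and variance to obtain the compressed image.
In some embodiments, quantizing the first latent variable can be regarded as reducing its dimensionality, for example by mapping it into a preset low-dimensional space to obtain the second latent variable. Here, "low-dimensional" is relative to the dimension of the first latent variable. In practice, the first latent variable can be quantized by nonlinear dimensionality reduction or with a quantization matrix. It can also be quantized by a vector quantizer, i.e., a system that maps a sequence of continuous or discrete vectors into a digital sequence suitable for communication or storage over a digital channel. Quantizing the first latent variable achieves data compression while preserving the necessary fidelity of the data.
In some embodiments, performing autoregression on the second latent variable according to the partial decoding result to obtain its mean and variance can be implemented as follows. First, construct sequence data from the partial decoding result and the second latent variable. Next, fit the sequence data with at least one of an autoregressive model and a conditional heteroscedasticity model to obtain at least one of a mean dynamic model and a variance dynamic model of the second latent variable. Then estimate the parameters of the mean dynamic model and the variance dynamic model by maximum likelihood estimation or another parameter estimation method to obtain the mean and variance of the second latent variable.
The autoregressive model is explained first. An autoregressive model is a type of stationary time-series model used to predict and analyze data with autocorrelation. By building an autoregressive model, one can study whether data at different time points depend on each other and how strong that dependence is. Fitting and analyzing the sequence data with an autoregressive model predicts the trend of the mean of the random variable at future time points, which yields the mean dynamic model of the second latent variable.
The conditional heteroscedasticity model is explained next. It describes time-series or sequence data whose variance is not constant, a property called heteroscedasticity. In practice, the variance of data may change noticeably over time or with other factors. A conditional heteroscedasticity model captures this behavior better, and fitting the sequence data with it yields the variance dynamic model.
Finally, the mean dynamic model and the variance dynamic model of the second latent variable are explained. The mean dynamic model describes how the average level of time-series data changes and indicates the dynamic characteristics of the sequence. The variance dynamic model describes how the variance of time-series data changes over time.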
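The paragraphs above describe the mean and variance models in classical time-series terms. The following sketch illustrates only those terms; the context network of FIG. 9 is a learned neural predictor, not this estimator. The sketch fits an AR(1) mean model and an ARCH(1) conditional-variance model by ordinary least squares. This is a simple stand-in for the maximum likelihood estimation mentioned in the text. The sketch then returns a one-step forecast of the mean and variance. The model orders and function names are assumptions.

```python
import random
from typing import Sequence, Tuple


def ols_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit ys ~ a + b * xs; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("regressor has zero variance")
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def fit_ar1_arch1(series: Sequence[float]):
    """Mean model:     x_t   = c + phi * x_{t-1} + e_t   (AR(1))
    Variance model: e_t^2 = w + a * e_{t-1}^2         (ARCH(1))
    Both fitted by OLS; returns the parameters and a one-step-ahead (mean, variance)."""
    c, phi = ols_line(series[:-1], series[1:])
    resid = [x - (c + phi * prev) for prev, x in zip(series[:-1], series[1:])]
    sq = [e * e for e in resid]
    w, a = ols_line(sq[:-1], sq[1:])
    mean_next = c + phi * series[-1]
    var_next = max(w + a * sq[-1], 0.0)  # a variance forecast cannot be negative
    return (c, phi), (w, a), (mean_next, var_next)


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [0.0]
    for _ in range(5000):
        xs.append(1.0 + 0.5 * xs[-1] + rng.gauss(0.0, 1.0))  # true c=1, phi=0.5, unit noise
    (c, phi), (w, a), (m, v) = fit_ar1_arch1(xs)
    print(f"c={c:.2f} phi={phi:.2f} w={w:.2f} a={a:.2f}")
```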
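In some embodiments, the second latent variable can be decoded using the mean and variance to obtain the compressed image as follows. First, construct a multivariate Gaussian distribution from the mean and variance of the latent variable, making sure that the dimension of the distribution equals that of the latent variable. Next, draw a number of samples from this Gaussian distribution; they represent the values the latent variable may take given that mean and variance. Finally, feed the sampled values as the latent input to a decoder. The decoder is usually a neural network that mirrors the encoder; it maps the latent variable back to the original image space and thus generates the image.
In some embodiments, the second latent variable can also be decoded using the mean and variance as follows. First, decode the second latent variable using the mean and variance to obtain a decoding result of the second latent variable. Then apply splitting and shifted window attention alternately to the decoding result until it is completely split, which yields the compressed image. The second latent variable is obtained by quantizing the first latent variable. Compared with the first latent variable, it contains more abstract, compressed information for decoding or reconstructing the image, so compressing the image based on the second latent variable improves compression efficiency.
Splitting the decoding result of the second latent variable means dividing it into different regions or blocks, which can be processed in parallel for efficiency. An attention mechanism focuses on specific parts of the split regions and concentrates resources on the relevant parts of the image, which improves reconstruction accuracy. In some embodiments, applying splitting and shifted window attention alternately to the decoding result of the second latent variable until it is completely split, to obtain the compressed image, includes:
splitting the decoding result of the second latent variable into a target number of different regions, and applying the attention mechanism to each region;
repeating the above processing until the number of blocks into which the decoding result is split reaches a count threshold (i.e., the decoding result of the second latent variable is completely split), which yields multiple image regions;
combining the resulting image regions to obtain the compressed image.
Through the processing steps shown in FIG. 3A, autoregression alternates between the spatial and channel dimensions, which greatly improves compression efficiency. For an input image, compression first applies an image transform to generate a low-dimensional latent variable (latent code). It then builds a probability estimation model of the latent variable, and finally entropy-codes the latent variable into a bit stream using the computed probabilities. Decompression first decodes the bit stream to recover the latent variable and then reconstructs the image from it, which achieves efficient image compression.
In actual use, the processing steps shown in FIG. 3A can be implemented with an image processing model. Unlike image processing models in the related art, the image processing model used in the image compression method of this application includes an image transformation network, a hyperprior network and a context network. The working processes of these three networks are described below.
FIG. 3B is a schematic flow chart of the image compression method provided by an embodiment of the present application. The steps shown in FIG. 3B can be performed by various electronic devices running the image compression apparatus, such as a server or server cluster with an image compression function. Such a device uses the image processing model to compress received images, or each frame of a received video, to reduce the storage space the images occupy. The steps shown in FIG. 3B are described below.
Step 301: The electronic device encodes the image to be compressed through the image transformation network of the image processing model to obtain a first latent variable.
FIG. 4 is a schematic diagram of the data flow of the image processing model in an embodiment of the present application. The image processing model in this application includes an image transformation network, a hyperprior network and a context network, which function as follows:
1) The image transformation network generates a low-dimensional latent variable (latent code) from a high-resolution natural image. Assume that the first latent variable follows some inherent prior probability and that the input image to be compressed follows a conditional probability conditioned on the latent variable. The image transformation network should then make the probability estimates constructed by the encoder and the decoder sufficiently close, so that the image reconstructed from the latent variable is close to the original image.
2) Building on the latent variable, the hyperprior network uses an encoder structure and a decoder structure to model the entropy of each point of the latent variable. From the entropy model of the feature values it obtains the occurrence statistics of the feature points, which are used to estimate the bit rate of the compressed image and to perform entropy coding. The hyperprior network stores the probability model of the latent variable with a small number of bytes and provides auxiliary reference information for the subsequent decoding by the context network.
3) The context network works autoregressively, using the information of already-decoded pixels to predict undecoded pixels. The prediction result is finally fed into the decoder network of the image transformation network for decoding to obtain the compressed image. The context network reduces information redundancy and improves image compression efficiency.
The model structures and working principles of the image transformation network, the hyperprior network and the context network included in the image processing model are described below.
FIG. 5 is a schematic diagram of the model structure of the image processing model in an embodiment of the present application. The image transformation network includes an image encoder network and an image decoder network. The image encoder network includes shifted window attention modules (Swin Transformer Block) and patch merging modules (Patch Merge Block); each patch merging module consists, in order, of a space-to-depth layer (Space-to-Depth), a normalization layer (LayerNorm) and a projection layer (Linear). The image decoder network includes shifted window attention modules (Swin Transformer Block) and patch split modules (Patch Split Block); each patch split module consists, in order, of a projection layer (Linear), a normalization layer (LayerNorm) and a depth-to-space layer (Depth-to-Space).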
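FIG. 6 is a schematic diagram of the working process of the space-to-depth layer and the depth-to-space layer in an embodiment of the present application. The image processing model must make the compressed image smaller than the image to be compressed while keeping its resolution close to that of the original. Therefore, the space-to-depth layer (Space-to-Depth) in the encoder network performs downsampling, and the depth-to-space layer (Depth-to-Space) in the decoder network performs upsampling. As shown in FIG. 6, for a 4*4 image to be compressed, Space-to-Depth divides every 2*2 group of adjacent pixels into a block (patch). It then gathers the pixels at the same position in each block (same shading) and stacks them along the channel direction, producing four 2*2 blocks. Depth-to-Space is the inverse of Space-to-Depth: upsampling converts the four 2*2 blocks back into a 4*4 image.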
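The following plain-Python sketch implements the space-to-depth rearrangement of FIG. 6 and its inverse on an H x W x C nested list. Each output channel k gathers the pixels at in-block position k, which matches the "same shading" grouping in the figure. The channel ordering is an assumption. Frameworks such as PyTorch (`torch.nn.PixelUnshuffle` and `torch.nn.PixelShuffle`) perform the same rearrangement on NCHW tensors but may order channels differently. In the patch merging and patch split modules this step is combined with LayerNorm and a Linear projection, which are not shown.

```python
from typing import List

Tensor = List[List[List[float]]]  # H x W x C as nested lists


def space_to_depth(x: Tensor, b: int = 2) -> Tensor:
    """(H, W, C) -> (H/b, W/b, C*b*b): every b x b block is stacked along the channel axis."""
    h, w = len(x), len(x[0])
    if h % b or w % b:
        raise ValueError("H and W must be divisible by the block size")
    out: Tensor = []
    for i in range(h // b):
        row = []
        for j in range(w // b):
            pixel: List[float] = []
            for di in range(b):          # in-block position k = di * b + dj
                for dj in range(b):
                    pixel.extend(x[i * b + di][j * b + dj])
            row.append(pixel)
        out.append(row)
    return out


def depth_to_space(y: Tensor, b: int = 2) -> Tensor:
    """Inverse of space_to_depth: (H, W, C*b*b) -> (H*b, W*b, C)."""
    h, w, cb = len(y), len(y[0]), len(y[0][0])
    if cb % (b * b):
        raise ValueError("channel count must be divisible by b*b")
    c = cb // (b * b)
    out: Tensor = [[[] for _ in range(w * b)] for _ in range(h * b)]
    for i in range(h):
        for j in range(w):
            for k in range(b * b):
                di, dj = divmod(k, b)
                out[i * b + di][j * b + dj] = y[i][j][k * c:(k + 1) * c]
    return out


if __name__ == "__main__":
    img = [[[float(r * 4 + c)] for c in range(4)] for r in range(4)]  # 4x4x1, as in FIG. 6
    packed = space_to_depth(img)
    print(packed[0][0])  # [0.0, 1.0, 4.0, 5.0]
```

The rearrangement loses no information: `depth_to_space(space_to_depth(x)) == x`. All of the downsampling is therefore "lossless" until the subsequent Linear layer reduces the width.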
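FIG. 7 is a schematic structural diagram of the shifted window attention module in an embodiment of the present application. The shifted window attention module (Swin-Transformer block) mainly consists of layer normalization, a multilayer perceptron, a regular window multi-head attention and a shifted-window multi-head attention. Compared with conventional attention, window attention effectively reduces the computational complexity and greatly improves efficiency, so that attention can be applied to large images. However, using only regular window attention severely limits the receptive field of the framework. Shifted window attention is therefore added, which greatly enlarges the receptive field without increasing computational complexity. The shifted window attention module builds hierarchical feature maps by merging image patches in deeper layers. Because attention is computed only within each local window, its computational complexity is linear in the input image size. As shown in FIG. 7, in this application the shifted window attention module performs local self-attention within each non-overlapping window of the feature map and preserves the feature size. FIG. 7 shows the internal structure of two consecutive Swin Transformer Blocks. Each contains Layer Norm, multi-head self-attention and fully connected layers, connected internally by shortcuts. The encoder and decoder networks of the image transformation network use a window size of 8. Their channel counts are 128, 192, 256 and 320 in turn, and the numbers of stacked shifted window attention modules are 2, 2, 6 and 2 in turn.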
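The following sketch illustrates the window mechanics described above on a single-channel map. It splits a feature map into non-overlapping M x M windows and applies the cyclic shift used to form shifted windows. It also evaluates the complexity estimates given in the Swin Transformer paper for global versus window multi-head self-attention. The sketch leaves out the attention computation itself. It also leaves out the attention mask that real implementations apply after the shift, which keeps wrapped-around tokens from attending across the seam. The helper names are illustrative.

```python
from typing import List, Tuple

Grid = List[List[float]]  # single-channel H x W feature map, for illustration


def window_partition(x: Grid, m: int) -> List[Grid]:
    """Split an H x W map into non-overlapping m x m windows, in row-major window order."""
    h, w = len(x), len(x[0])
    if h % m or w % m:
        raise ValueError("H and W must be multiples of the window size")
    return [[row[j:j + m] for row in x[i:i + m]]
            for i in range(0, h, m) for j in range(0, w, m)]


def cyclic_shift(x: Grid, s: int) -> Grid:
    """Roll the map up and left by s (wrapping) so the next partition straddles old borders."""
    h, w = len(x), len(x[0])
    return [[x[(i + s) % h][(j + s) % w] for j in range(w)] for i in range(h)]


def attention_flops(h: int, w: int, c: int, m: int) -> Tuple[int, int]:
    """Complexity estimates reported for Swin Transformer:
    global MSA = 4hwC^2 + 2(hw)^2 C,  window MSA = 4hwC^2 + 2 M^2 hw C."""
    msa = 4 * h * w * c * c + 2 * (h * w) ** 2 * c
    wmsa = 4 * h * w * c * c + 2 * m * m * h * w * c
    return msa, wmsa


if __name__ == "__main__":
    fmap = [[float(r * 16 + c) for c in range(16)] for r in range(16)]
    print(len(window_partition(fmap, 8)))                 # 4 windows of 8x8
    print(len(window_partition(cyclic_shift(fmap, 4), 8)))  # still 4, now straddling the seams
    print(attention_flops(64, 64, 128, 8))
```

With the window size of 8 stated above, the shifted partition typically uses a shift of M/2 = 4. The quadratic (hw)^2 term of global attention is the term that window attention replaces with one linear in hw.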
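FIG. 8 is a schematic diagram of the computation principle of the shifted window attention module in an embodiment of the present application. The input image (Images) of size HxWx3 is divided into a set of non-overlapping patches, each of size 4x4. Each patch therefore has a feature dimension of 4x4x3=48, and there are H/4 x W/4 patches. As shown in FIG. 8, in stage 1 a linear embedding first changes the patch feature dimension to C, and the patches are then fed into the shifted window attention modules. Stages 2 to 4 operate identically. A patch merging step first merges each 2x2 group of adjacent patches, so the number of patches becomes H/8 x W/8 and the concatenated feature dimension becomes 4C, which a linear layer then reduces to 2C. As shown in FIG. 8, the outputs of the stages are as follows: stage 1: [H/4 x W/4, C]; stage 2: [H/8 x W/8, 2C]; stage 3: [H/16 x W/16, 4C]; stage 4: [H/32 x W/32, 8C]. As the network deepens, the number of patches gradually decreases and the receptive field of each patch expands. This design facilitates building the hierarchy of shifted window attention modules and adapts to the multiple scales of vision tasks.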
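The stage table above follows directly from the patch size and the 2x2 merging rule. The following sketch reproduces the table for a given input size and base width C. It makes explicit that patch merging first concatenates four neighbors (4C) and then projects the result to 2C. The value C = 96 in the example is only illustrative (it is the Swin-T setting); the compression network of FIG. 5 uses its own widths of 128, 192, 256 and 320. The function name is illustrative.

```python
from typing import List, Tuple


def swin_stage_shapes(h: int, w: int, c: int, patch: int = 4,
                      stages: int = 4) -> List[Tuple[int, int, int]]:
    """(token rows, token cols, channels) after each stage.

    Stage 1: patch embedding -> (H/patch, W/patch, C).
    Each later stage: patch merging concatenates a 2x2 neighbourhood (4C),
    then a linear layer projects 4C -> 2C.
    """
    factor = patch * 2 ** (stages - 1)
    if h % factor or w % factor:
        raise ValueError("input size must be divisible by the total downsampling factor")
    shapes = [(h // patch, w // patch, c)]
    for _ in range(stages - 1):
        gh, gw, ch = shapes[-1]
        concatenated = 4 * ch                                    # after 2x2 concatenation
        shapes.append((gh // 2, gw // 2, concatenated // 2))     # linear reduction to 2C
    return shapes


if __name__ == "__main__":
    print(4 * 4 * 3)  # raw feature dimension of one 4x4 RGB patch: 48
    for stage, shape in enumerate(swin_stage_shapes(224, 224, 96), start=1):
        print(stage, shape)
```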
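Step 302: The hyperprior network determines the hyperprior probability estimate from the first latent variable.
As shown in FIG. 4, the encoder network of the hyperprior network includes shifted window attention modules and patch merging modules, and the decoder network of the hyperprior network includes shifted window attention modules and patch split modules. The window size is 4, the channel counts are 192 and 192 in turn, and the numbers of stacked shifted window attention modules are 5 and 1 in turn.
In some embodiments, the hyperprior network can determine the hyperprior probability estimate from the first latent variable as follows:
The hyperprior encoder of the hyperprior network encodes the first latent variable y to obtain a third latent variable z. The quantization module (Q), arithmetic encoding module (AE) and arithmetic decoding module (AD) of the hyperprior network then determine the hyperprior probability estimate corresponding to the first latent variable. The quantization module (Q) quantizes the third latent variable z to obtain the fourth latent variable ẑ. During compression, the arithmetic encoding module entropy-codes the fourth latent variable into a byte stream (i.e., the entropy code of the fourth latent variable). During decompression, the arithmetic decoder decodes the fourth latent variable from the byte stream. As shown in FIG. 4, the fourth latent variable is obtained by quantizing the third latent variable z. The fourth latent variable is compressed into the byte stream during compression and recovered from the byte stream during decompression. The decoder network of the hyperprior network shown in FIG. 4 then decodes the fourth latent variable to obtain the hyperprior probability estimate N(μ,σ).
In some embodiments, both the arithmetic encoding and the arithmetic decoding of the compressed latent variable with Gaussian distribution parameters require the occurrence probability or the cumulative distribution function (CDF) of the symbols being coded. This probability or CDF must therefore be transmitted to the decoder side for correct entropy decoding. For this reason, the hyperprior encoder first compresses this probability information into z. z is quantized, entropy-coded and transmitted to the decoder side of the hyperprior network, where decoding learns the modeling parameters of the latent representation y. Once the hyperprior network provides the modeling distribution of the latent representation y, the quantized second latent variable ŷ is entropy-coded with this model to produce the compressed bitstream file. Arithmetic decoding recovers ŷ from the byte stream, and the entropy-decoded result is fed into the decoding module to obtain the final compressed image.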
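The following sketch illustrates how the mean and scale delivered through the hyperprior turn into symbol probabilities for arithmetic coding. It integrates a Gaussian N(μ, σ²) over each integer quantization bin, which is the conventional construction in hyperprior-based learned compression; whether this application uses exactly this construction is an assumption. The lower bound `SCALE_FLOOR` and the probability floor `PROB_FLOOR` are hypothetical stabilizing constants, and the function names are illustrative.

```python
import math
from typing import Sequence

SCALE_FLOOR = 0.11  # hypothetical lower bound on sigma, avoids degenerate bins
PROB_FLOOR = 1e-9   # hypothetical floor that keeps log2 finite for very unlikely symbols


def normal_cdf(t: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def symbol_likelihood(y_hat: float, mu: float, sigma: float) -> float:
    """P(y_hat) for an integer-quantized latent under N(mu, sigma^2): the Gaussian mass
    on the quantization bin [y_hat - 0.5, y_hat + 0.5]."""
    s = max(sigma, SCALE_FLOOR)
    upper = normal_cdf((y_hat + 0.5 - mu) / s)
    lower = normal_cdf((y_hat - 0.5 - mu) / s)
    return max(upper - lower, PROB_FLOOR)


def estimated_bits(y_hat: Sequence[float], mu: Sequence[float], sigma: Sequence[float]) -> float:
    """Ideal arithmetic-coding cost in bits: sum of -log2 P(y_hat_i | mu_i, sigma_i)."""
    if not (len(y_hat) == len(mu) == len(sigma)):
        raise ValueError("y_hat, mu and sigma must have the same length")
    return sum(-math.log2(symbol_likelihood(v, m, s)) for v, m, s in zip(y_hat, mu, sigma))
```

A symbol close to its predicted mean with a small σ costs only a fraction of a bit, while a symbol far from the mean costs several bits. Summing these costs over the second latent variable gives the rate that the arithmetic coder approaches.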
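Step 303: Quantize the first latent variable to obtain a second latent variable, and input the second latent variable into the context network.
Step 304: The context network performs autoregression on the second latent variable to obtain the mean and variance of the second latent variable.
After the electronic device computes the mean and variance of the second latent variable through the context network, the arithmetic encoder models the probability distribution of the second latent variable to produce the byte stream.
Specifically, the electronic device uses the context network to perform autoregression on the second latent variable according to the partial decoding result. In this way it models the probability of the second latent variable and computes its mean and variance. The arithmetic encoder then codes the second latent variable according to this probability distribution to produce the byte stream.
FIG. 9 is a schematic diagram of the autoregression of the context network in an embodiment of the present application. In some embodiments, the context network can perform autoregression on the second latent variable according to the partial decoding result to obtain its mean and variance as follows:
Group the second latent variable into at least two groups of sub-latent variables, and perform spatial autoregression on each group using a checkerboard pattern. After each group completes spatial autoregression, predict the undecoded channel groups from the partial decoding result until autoregression of the entire second latent variable is complete. This yields the probability distribution of the second latent variable.
Here, spatial autoregression usually assumes that the feature at a spatial position is correlated with the features at neighboring positions. This correlation can be represented by a weight matrix, usually called the spatial weight matrix, which describes the spatial relationships between points. The spatial weight matrix can thus describe the association between the sub-latent variables and the prediction result.
For example, spatial autoregression can be implemented as follows. The current symbol to be decoded is related to the already-decoded symbols for probability modeling. An autoregressive convolution g_sp in the spatial direction then predicts the context representation Φ_sp,i at the i-th position from all observable neighboring symbols. The context representation in the context network is computed according to Formula 1:
Φ_sp,i = g_sp(ŷ_<i)   Formula 1, where ŷ_<i denotes the observable, already-decoded neighboring symbols of position i.
Channel autoregression can be implemented as follows. The channels of the second latent variable are divided into K groups for autoregression to reduce inter-channel redundancy. An autoregressive convolution g_ch in the channel direction uses the channel groups decoded first to predict the context representation of the undecoded channel groups, according to Formula 2:
Φ_ch,k = g_ch(ŷ^(<k))   Formula 2, where ŷ^(<k) denotes the channel groups decoded before group k.
In channel autoregression, the number of channel groups is crucial for balancing compression performance against running speed. The larger the number of groups k, the finer the computational granularity and the better the rate-distortion performance, but the slower the parameter estimation. k=5 is the preferred number of groups for the image processing model of this application.
As shown in FIG. 9, combining the checkerboard spatial context autoregressive model with the channel context autoregressive model yields an accelerated computation in which autoregression alternates orthogonally between the spatial and channel dimensions. In practice, the latent variable is grouped along the channel dimension, and checkerboard autoregression replaces serial autoregression within each latent-variable group. After spatial autoregression within a group of the second latent variable is complete, channel autoregression uses the channel groups decoded first to predict the channel groups not yet decoded. At the start of autoregression, the context network makes its prediction based on the hyperprior probability model. It first predicts the first part of the checkerboard in the first channel group, and then predicts the remaining part of the checkerboard from the checkerboard result just obtained. After these two checkerboard passes, prediction of the first channel group is complete. When the second channel group is predicted, the results already predicted for the first group enter the computation as a reference for the subsequent probability modeling. The whole computation alternates autoregression orthogonally between the spatial and channel dimensions, which effectively increases image compression speed.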
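The decoding order described above can be written out as an explicit schedule. The following sketch lists the sequential passes for K channel groups, together with the context that each pass may use. Each group gets one checkerboard pass for the anchor half and one for the other half. The usage example assumes 320 latent channels (the last width listed for the encoder) split into equal-size groups; both are assumptions, and the names are illustrative.

```python
from typing import Dict, List, Tuple

Pos = Tuple[int, int]


def checkerboard_split(h: int, w: int) -> Tuple[List[Pos], List[Pos]]:
    """Anchor positions ((i + j) even) are decoded first; the rest use the anchors as context."""
    anchors = [(i, j) for i in range(h) for j in range(w) if (i + j) % 2 == 0]
    others = [(i, j) for i in range(h) for j in range(w) if (i + j) % 2 == 1]
    return anchors, others


def decoding_schedule(num_channels: int, k: int, h: int, w: int) -> List[Dict]:
    """Sequential passes of a spatial-channel context model: for each of k channel
    groups, one checkerboard pass over the anchors, then one over the other half."""
    if num_channels % k:
        raise ValueError("assumes num_channels divisible by k (equal-size groups)")
    size = num_channels // k
    anchors, others = checkerboard_split(h, w)
    steps: List[Dict] = []
    for g in range(k):
        earlier = [f"group{p}" for p in range(g)]  # fully decoded channel groups
        for half, positions in (("anchor", anchors), ("non-anchor", others)):
            context = ["hyperprior"] + earlier
            if half == "non-anchor":
                context.append(f"group{g}-anchors")  # spatial context inside this group
            steps.append({"group": g, "channels": (g * size, (g + 1) * size),
                          "half": half, "positions": len(positions), "context": context})
    return steps


if __name__ == "__main__":
    for step in decoding_schedule(num_channels=320, k=5, h=4, w=4)[:4]:
        print(step["group"], step["half"], step["context"])
```

With k=5, the context model needs 2k = 10 sequential passes regardless of the spatial size of the latent. A raster-scan serial context model, by contrast, needs one sequential step per spatial position, for example 256 steps for a 16x16 latent.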
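Step 305: The image transformation network decodes the second latent variable using the mean and the variance to obtain a compressed image.
With reference to FIG. 4, the shifted window attention module of the decoder network of the image transformation network decodes the second latent variable to obtain a decoding result of the second latent variable. The second latent variable then passes alternately through shifted window attention modules and patch split modules to obtain the compressed image, whose volume is smaller than that of the image to be compressed.
FIG. 10 is a schematic flow chart of the image processing model training method provided by an embodiment of the present application. The steps shown in FIG. 10 can be performed by various electronic devices running an image processing model training apparatus, such as a dedicated terminal with an image processing function, a server with an image processing model training function, or a server cluster. The steps shown in FIG. 10 are described below.
Step 1001: The image processing model training apparatus obtains a first training sample set, which includes at least one group of noise-free training samples.
Step 1002: The image processing model training apparatus adds random noise to the first training sample set to obtain a second training sample set.
In some embodiments, adding random noise to the first training sample set to obtain the second training sample set can be implemented as follows:
When the use environment of the image processing model is mini-program game image generation, a dynamic noise quantity threshold matching the use environment of the image processing model is determined. According to this threshold, a dynamic amount of random noise is added to the first training samples to form a second training sample set matching the dynamic noise threshold. The use environments of mini-program game images are diverse. For example, the image may be a role-playing mini-program game image, a user image captured by the terminal, or an image captured from a video frame. The training samples therefore come from different data sources, which include data from various types of application scenarios as the source of the corresponding training samples. For these different use scenarios of the image processing model, a second training sample set matching the dynamic noise threshold can be used to train the model in a targeted manner.
In some embodiments, adding random noise to the first training sample set to obtain the second training sample set can also be implemented as follows:
When the use environment of the image processing model is medical image generation, a fixed noise quantity threshold matching the use environment of the image processing model is determined. According to this threshold, a fixed amount of random noise is added to the first training samples to form a second training sample set matching the fixed noise threshold. This applies because the training samples come from a fixed data source, which includes data of fixed scenes (for example, any electronic device that produces medical images) as the source of the corresponding training samples. For example, the image processing model provided in this application can be packaged as a software module in a mobile detection device. It can also be packaged in various fixed medical examination devices (including but not limited to handheld diagnostic instruments, ward central monitoring systems and bedside monitoring systems), or embedded in the hardware of an intelligent robot. For these different use scenarios of the image processing model, a second training sample set matching the fixed noise threshold can be used to train the model in a targeted manner and speed up its training.
Step 1003: The image processing model training apparatus calculates the loss function of the image processing model.
In some embodiments of the present application, the pixel difference between the compressed image and the image to be compressed is obtained first. Next, the number of bytes needed to store the second latent variable and the fourth latent variable of the image processing model is obtained. Finally, the fused loss function of the image processing model is calculated from the pixel difference and the number of bytes. The loss function of the image processing model is given by Formula 3:
L=R+λD   Formula 3
Here, R denotes the rate, i.e., the bytes required to store the second latent variable and the fourth latent variable. D denotes the distortion, usually computed as d(x, x̂), the difference between the compressed image x̂ and the image to be compressed x, where d is usually the mean squared error (MSE). λ is the parameter that trades off rate against distortion. In general, the larger λ is, the higher the bits per pixel (BPP) of the corresponding model and the higher the image reconstruction quality.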
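Formula 3 can be evaluated as follows. The sketch measures R in bits per pixel from the likelihoods of the second latent variable ŷ and the fourth latent variable ẑ. The text counts bytes instead, but the two differ only by a factor of 8 and the normalization. D is measured as the MSE. The text does not state how D is scaled relative to λ, so `d_scale` is a hypothetical knob. All names are illustrative.

```python
import math
from typing import Sequence, Tuple


def rate_bits(likelihoods: Sequence[float]) -> float:
    """Bits to entropy-code one latent: -sum(log2 p) over its quantized symbols."""
    return -sum(math.log2(p) for p in likelihoods)


def mse(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Mean squared error between the original and the reconstruction."""
    if len(x) != len(x_hat) or not x:
        raise ValueError("images must be non-empty and the same size")
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def rd_loss(y_likelihoods: Sequence[float], z_likelihoods: Sequence[float],
            x: Sequence[float], x_hat: Sequence[float], num_pixels: int,
            lam: float, d_scale: float = 1.0) -> Tuple[float, float]:
    """Formula 3, L = R + lambda * D.

    R: bits per pixel for the second latent (y_hat) plus the fourth latent (z_hat).
    D: MSE between original and reconstruction, times the hypothetical d_scale.
    Returns (loss, bpp).
    """
    bpp = (rate_bits(y_likelihoods) + rate_bits(z_likelihoods)) / num_pixels
    return bpp + lam * d_scale * mse(x, x_hat), bpp
```

A larger `lam` makes distortion more expensive relative to rate. Training therefore settles at a higher bpp and better reconstruction quality, which is consistent with the behavior described for λ above.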
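Step 1004: Based on the initial parameters and the loss function of the image processing model, the image processing model is trained with the first training sample set and the second training sample set.
Here, training the image processing model determines its image transformation network parameters, hyperprior network parameters and context network parameters.
For the test phase after training, FIG. 11 is a schematic diagram of an effect test of the image processing model provided by an embodiment of the present application. A performance test was run on the standard Kodak dataset, and the rate-distortion performance of the model at different compression rates was plotted with bpp on the horizontal axis and PSNR (Peak Signal-to-Noise Ratio) on the vertical axis. The λ values at the four test points of the image processing model of this application are 0.002, 0.005, 0.02 and 0.04, respectively. According to the application, the results show that its image processing model improves image compression efficiency while making compressed images smaller. At the same PSNR=32, the bpp of this application is 0.4, which is greater than the bpp of 0.25 in the related art.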
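The two axes of FIG. 11 can be computed with the helpers below. `psnr` gives PSNR in dB for 8-bit images (peak value 255), and `bits_per_pixel` gives bits per pixel from the size of the coded stream. The function names are illustrative. The usage line assumes a Kodak image of 768 x 512 pixels, which is the size of the Kodak test images.

```python
import math
from typing import Sequence


def psnr(x: Sequence[float], x_hat: Sequence[float], max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    if len(x) != len(x_hat) or not x:
        raise ValueError("inputs must be non-empty and the same length")
    err = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    return math.inf if err == 0 else 10.0 * math.log10(max_val ** 2 / err)


def bits_per_pixel(num_bytes: int, height: int, width: int) -> float:
    """bpp = total coded bits / number of pixels."""
    return 8.0 * num_bytes / (height * width)


if __name__ == "__main__":
    print(round(bits_per_pixel(19_660, 512, 768), 3))  # 0.4
```

For example, a 768 x 512 Kodak image coded into about 19,660 bytes corresponds to roughly 0.4 bpp. For RGB images, PSNR is usually computed over all channel values together, which the flat input list here allows.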
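This application has the following beneficial technical effects:
1) The embodiments of the present application encode the image to be compressed through the image transformation network of the image processing model to obtain a first latent variable, and the hyperprior network determines the hyperprior probability estimate from the first latent variable. Processing the image with an image transformation network and a hyperprior network built from shifted window attention therefore improves image compression performance, makes the decoded compressed image smaller and reduces image storage cost.
2) The context network partially decodes the first latent variable according to the hyperprior probability estimate to obtain a partial decoding result. It then performs autoregression on the second latent variable according to the partial decoding result to obtain the mean and variance of the second latent variable. The second latent variable is decoded using the mean and the variance to obtain a compressed image whose volume is smaller than that of the image to be compressed. The context network thereby uses the information of the channel groups decoded first as prior knowledge for the channel groups still to be decoded. This reduces subsequent compression redundancy and saves compression time. In addition, the context network alternates autoregression between the spatial and channel dimensions, which improves compression efficiency.
3) During training of the image processing model, the training sample sets can be adjusted flexibly to different usage requirements, so that the image processing model is applicable to different image compression environments.
The above are merely embodiments of the present application and are not intended to limit its protection scope. Any modification, equivalent replacement, improvement or the like made within the spirit and principles of the present application shall fall within its protection scope.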

Claims (15)

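  1. An image compression method, performed by an electronic device, the method comprising:
    encoding an image to be compressed to obtain a first latent variable corresponding to the image to be compressed;
    determining a hyperprior probability estimate corresponding to the first latent variable;
    partially decoding the first latent variable according to the hyperprior probability estimate to obtain a partial decoding result of the first latent variable;
    generating, based on the partial decoding result of the first latent variable and the first latent variable corresponding to the image to be compressed, a compressed image corresponding to the image to be compressed, a data volume of the compressed image being smaller than a data volume of the image to be compressed.
  2. The method according to claim 1, wherein the generating, based on the partial decoding result of the first latent variable and the first latent variable corresponding to the image to be compressed, a compressed image corresponding to the image to be compressed comprises:
    quantizing the first latent variable to obtain a second latent variable;
    performing autoregression on the second latent variable according to the partial decoding result to obtain a mean and a variance of the second latent variable;
    decoding the second latent variable using the mean and the variance to obtain the compressed image.
  3. The method according to claim 2, wherein the performing autoregression on the second latent variable according to the partial decoding result to obtain a mean and a variance of the second latent variable comprises:
    grouping the second latent variable to obtain at least two groups of sub-latent variables;
    performing spatial autoregression on each group of sub-latent variables using a checkerboard pattern;
    after each group of sub-latent variables completes spatial autoregression, predicting undecoded channel groups from the partial decoding result until autoregression of the second latent variable is complete, to obtain the mean and the variance of the second latent variable.
  4. The method according to claim 2, wherein the decoding the second latent variable using the mean and the variance to obtain the compressed image comprises:
    decoding the second latent variable using the mean and the variance to obtain a decoding result of the second latent variable;
    alternately performing splitting and shifted window attention on the decoding result of the second latent variable until the decoding result of the second latent variable is completely split, to obtain the compressed image.
  5. The method according to claim 2, wherein the method is implemented based on an image processing model comprising an image transformation network, a hyperprior network and a context network, and the encoding an image to be compressed to obtain a first latent variable corresponding to the image to be compressed comprises:
    encoding the image to be compressed through the image transformation network of the image processing model to obtain the first latent variable;
    the determining a hyperprior probability estimate corresponding to the first latent variable comprises:
    determining, through the hyperprior network, the hyperprior probability estimate from the first latent variable;
    the partially decoding the first latent variable according to the hyperprior probability estimate to obtain a partial decoding result of the first latent variable comprises:
    partially decoding, through the context network, the first latent variable according to the hyperprior probability estimate to obtain the partial decoding result;
    the performing autoregression on the second latent variable according to the partial decoding result to obtain a mean and a variance of the second latent variable comprises:
    performing, through the context network, autoregression on the second latent variable according to the partial decoding result to obtain the mean and the variance of the second latent variable;
    the decoding the second latent variable using the mean and the variance to obtain the compressed image comprises:
    decoding, through the image transformation network, the second latent variable using the mean and the variance to obtain the compressed image.
  6. The method according to claim 5, wherein the image transformation network comprises an image encoder network and an image decoder network;
    the image encoder network comprises a shifted window attention module and a patch merging module, the patch merging module comprising, in order, a space-to-depth layer, a normalization layer and a projection layer;
    the image decoder network comprises a shifted window attention module and a patch split module, the patch split module comprising, in order, a projection layer, a normalization layer and a depth-to-space layer.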
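  7. The method according to claim 5, further comprising:
    obtaining a first training sample set corresponding to the image processing model, the first training sample set comprising at least one group of noise-free training samples;
    adding random noise to the first training sample set to obtain a second training sample set;
    obtaining initial parameters of the image processing model;
    training the image processing model with the first training sample set and the second training sample set based on the initial parameters of the image processing model and a loss function of the image processing model, to determine image transformation network parameters, hyperprior network parameters and context network parameters of the image processing model.
  8. The method according to claim 7, wherein the adding random noise to the first training sample set to obtain a second training sample set comprises:
    when a use environment of the image processing model is video image compression, determining a dynamic noise quantity threshold matching the use environment of the image processing model;
    adding a dynamic amount of random noise to the first training samples according to the dynamic noise quantity threshold, to obtain a second training sample set matching the dynamic noise threshold.
  9. The method according to claim 7, wherein the adding random noise to the first training sample set to obtain a second training sample set comprises:
    when a use environment of the image processing model is medical image compression, determining a fixed noise quantity threshold matching the use environment of the image processing model;
    adding a fixed amount of random noise to the first training samples according to the fixed noise quantity threshold, to obtain a second training sample set matching the fixed noise threshold.
  10. The method according to claim 7, further comprising:
    obtaining a pixel difference between the compressed image and the image to be compressed;
    obtaining a number of bytes for storing the second latent variable and a fourth latent variable in the image processing model;
    determining a fused loss function of the image processing model according to the pixel difference and the number of bytes.
  11. The method according to any one of claims 1 to 10, wherein the determining a hyperprior probability estimate corresponding to the first latent variable comprises:
    encoding the first latent variable to obtain a third latent variable;
    entropy-coding the third latent variable to obtain an entropy code of the third latent variable;
    decoding the entropy code of the third latent variable to obtain a fourth latent variable;
    decoding the fourth latent variable to obtain the hyperprior probability estimate.
  12. An image compression apparatus, comprising:
    an encoding module, configured to encode an image to be compressed to obtain a first latent variable corresponding to the image to be compressed;
    an information processing module, configured to determine a hyperprior probability estimate corresponding to the first latent variable;
    the information processing module being further configured to partially decode the first latent variable according to the hyperprior probability estimate to obtain a partial decoding result of the first latent variable;
    the information processing module being further configured to generate, based on the partial decoding result of the first latent variable and the first latent variable corresponding to the image to be compressed, a compressed image corresponding to the image to be compressed, a data volume of the compressed image being smaller than a data volume of the image to be compressed.
  13. An electronic device, comprising:
    a memory, configured to store executable instructions;
    a processor, configured to implement the image compression method according to any one of claims 1 to 11 when executing the executable instructions stored in the memory.
  14. A computer program product, comprising a computer program or instructions which, when executed by a processor, implement the image compression method according to any one of claims 1 to 11.
  15. A computer-readable storage medium, storing executable instructions which, when executed by a processor, implement the image compression method according to any one of claims 1 to 11.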
PCT/CN2023/138206 2023-02-09 2023-12-12 Image compression method and apparatus, electronic device, computer program product and storage medium Ceased WO2024164694A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP23920883.8A EP4568246A4 (en) 2023-02-09 2023-12-12 IMAGE COMPRESSION METHOD AND APPARATUS, ELECTRONIC DEVICE, COMPUTER PROGRAM PRODUCT AND STORAGE MEDIA
US19/089,142 US20250227272A1 (en) 2023-02-09 2025-03-25 Image compression method and apparatus, electronic device, computer program product, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202310136843.3 2023-02-09
CN202310136843.3A CN116980611A (zh) 2023-02-09 2023-02-09 Image compression method, apparatus, device, computer program product and medium

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US19/089,142 Continuation US20250227272A1 (en) 2023-02-09 2025-03-25 Image compression method and apparatus, electronic device, computer program product, and storage medium

Publications (2)

Publication Number Publication Date
WO2024164694A1 true WO2024164694A1 (zh) 2024-08-15
WO2024164694A9 WO2024164694A9 (zh) 2024-09-12

Family

ID=88478440

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/138206 Ceased WO2024164694A1 (zh) 2023-02-09 2023-12-12 Image compression method and apparatus, electronic device, computer program product and storage medium

Country Status (4)

Country Link
US (1) US20250227272A1 (zh)
EP (1) EP4568246A4 (zh)
CN (1) CN116980611A (zh)
WO (1) WO2024164694A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119135910A (zh) * 2024-09-10 2024-12-13 电子科技大学 一种基于深度学习的图像编码方法、设备
CN120218382A (zh) * 2025-05-19 2025-06-27 长沙矿冶研究院有限责任公司 一种基于自回归生成策略的动力电池拆解路径优化方法

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116980611A (zh) * 2023-02-09 2023-10-31 腾讯科技(深圳)有限公司 图像压缩方法、装置、设备、计算机程序产品及介质
CN117915114B (zh) * 2024-03-15 2024-07-09 深圳大学 一种点云属性压缩方法、装置、终端及介质
CN120807661A (zh) * 2024-04-10 2025-10-17 华为技术有限公司 一种数据压缩方法、装置及计算设备集群
CN121151561A (zh) * 2024-06-14 2025-12-16 抖音视界有限公司 用于图像编解码的方法、装置、设备和存储介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
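CN113574883A (zh) * 2019-03-21 2021-10-29 Qualcomm Incorporated Video compression using deep generative models
CN114663536A (zh) * 2022-02-08 2022-06-24 Institute of Automation, Chinese Academy of Sciences Image compression method and apparatus
WO2022268641A1 (en) * 2021-06-21 2022-12-29 Interdigital Vc Holdings France, Sas Methods and apparatuses for encoding/decoding an image or a video
CN116980611A (zh) * 2023-02-09 2023-10-31 Tencent Technology (Shenzhen) Co., Ltd. Image compression method, apparatus, device, computer program product, and medium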

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US11375194B2 (en) * 2019-11-16 2022-06-28 Uatc, Llc Conditional entropy coding for efficient video compression
EP4241450A1 (en) * 2020-11-04 2023-09-13 Vid Scale, Inc. Learned video compression framework for multiple machine tasks

Non-Patent Citations (1)

Title
See also references of EP4568246A4 *

Cited By (3)

Publication number Priority date Publication date Assignee Title
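CN119135910A (zh) * 2024-09-10 2024-12-13 University of Electronic Science and Technology of China Image coding method and device based on deep learning
CN119135910B (zh) * 2024-09-10 2025-06-20 University of Electronic Science and Technology of China Image coding method and device based on deep learning
CN120218382A (zh) * 2025-05-19 2025-06-27 Changsha Research Institute of Mining and Metallurgy Co., Ltd. Power battery disassembly path optimization method based on an autoregressive generation strategy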

Also Published As

Publication number Publication date
US20250227272A1 (en) 2025-07-10
EP4568246A4 (en) 2025-11-19
CN116980611A (zh) 2023-10-31
WO2024164694A9 (zh) 2024-09-12
EP4568246A1 (en) 2025-06-11

Similar Documents

Publication Publication Date Title
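WO2024164694A1 (zh) Image compression method and apparatus, electronic device, computer program product, and storage medium
CN111881920B (zh) Network adaptation method for high-resolution images and neural network training apparatus
CN120640000B (zh) Multi-scale semantics-guided image compression method, system, and storage medium
CN113079378B (zh) Image processing method and apparatus, and electronic device
CN115345785A (zh) Low-light video enhancement method and system based on multi-scale spatiotemporal feature fusion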
Hema et al. Effective Image Reconstruction Using Various Compressed Sensing Techniques
Cui et al. Deep network for image compressed sensing coding using local structural sampling
Liu et al. Learning to generate realistic images for bit-depth enhancement via camera imaging processing
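CN117793289A (zh) Video transmission method, video reconstruction method, and related device
CN117376564B (zh) Data encoding and decoding method and related device
CN115294429A (zh) Feature domain-based network training method and apparatus
CN119603465A (zh) Learned image compression method based on side-information autoregression
WO2025081929A1 (zh) Image decoding method, image encoding method, and apparatus
CN114022575B (zh) Depth map compression method, apparatus, device, and medium based on monocular depth estimation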
Zhang et al. A Low-Complexity Transformer-CNN Hybrid Model Combining Dynamic Attention for Remote Sensing Image Compression.
EP4664887A1 (en) Encoding and decoding method and apparatus, and device thereof
US20260122241A1 (en) Image decoding method and apparatus, image coding method and apparatus, device and storage medium
US20260122263A1 (en) Image decoding method and apparatus, image coding method and apparatus, and device and storage medium
CN118972620B (zh) Image decoding and encoding method, apparatus, device, and storage medium
Jannani et al. An Image Compression Approach Based
CA3285218A1 (en) Encoding and decoding method and apparatus, and device thereof
Pushpalatha et al. Interpolative Model on Hueristic Projection Transform for Image Compression in Cloud Services.
Sophia et al. An efficient hybrid transform algorithm for image compression using a matrix rank-based optimization approach
Bastos Low-complexity transform-quantization pair for 360° image compression
Singh et al. A Review on Recent Developments in Image Compression Techniques

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 23920883

Country of ref document: EP

Kind code of ref document: A1

WWE WIPO information: entry into national phase

Ref document number: 2023920883

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2023920883

Country of ref document: EP

Effective date: 20250305

WWP WIPO information: published in national office

Ref document number: 2023920883

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: DE