WO2023193253A1 - Decoding method, encoding method, decoder and encoder - Google Patents
Decoding method, encoding method, decoder and encoder
- Publication number
- WO2023193253A1 (application PCT/CN2022/085897)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- mip
- mode
- current block
- optimal
- block
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/593—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/103—Selection of coding mode or of prediction mode
- H04N19/105—Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/103—Selection of coding mode or of prediction mode
- H04N19/11—Selection of coding mode or of prediction mode among a plurality of spatial predictive coding modes
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/132—Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/146—Data rate or code amount at the encoder output
- H04N19/147—Data rate or code amount at the encoder output according to rate distortion criteria
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
- H04N19/176—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
- H04N19/61—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/70—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
Definitions
- The embodiments of the present application relate to the technical field of image and video encoding and decoding, and more specifically, to a decoding method, an encoding method, a decoder, and an encoder.
- Digital video compression technology mainly compresses huge amounts of digital image and video data to facilitate transmission and storage.
- Although existing digital video compression standards can implement video compression, there is still a need to pursue better digital video compression technology to improve compression efficiency.
- Embodiments of the present application provide a decoding method, an encoding method, a decoder and an encoder, which can improve compression efficiency.
- In a first aspect, this application provides a decoding method, including:
- the distortion costs of the multiple MIP modes include the distortion costs obtained by using the multiple MIP modes to predict samples in the template area adjacent to the current block;
- the first intra prediction mode includes at least one of the following: a suboptimal MIP mode for predicting the current block determined based on the distortion costs of the multiple MIP modes, an intra prediction mode derived by the decoder-side intra mode derivation (DIMD) mode, and an intra prediction mode derived by the template-based intra mode derivation (TIMD) mode;
- a reconstructed block of the current block is obtained.
- In a second aspect, this application provides an encoding method, including:
- the distortion costs of the multiple MIP modes include the distortion costs obtained by using the multiple MIP modes to predict samples in the template area adjacent to the current block;
- the first intra prediction mode includes at least one of the following: a suboptimal MIP mode for predicting the current block determined based on the distortion costs of the multiple MIP modes, an intra prediction mode derived by the decoder-side intra mode derivation (DIMD) mode, and an intra prediction mode derived by the template-based intra mode derivation (TIMD) mode;
- This application provides a decoder, including:
- a parsing unit configured to parse the code stream to obtain the residual block of the current block in the current sequence;
- a prediction unit configured to:
- the distortion costs of the multiple MIP modes include the distortion costs obtained by using the multiple MIP modes to predict samples in the template area adjacent to the current block;
- the first intra prediction mode includes at least one of the following: a suboptimal MIP mode for predicting the current block determined based on the distortion costs of the multiple MIP modes, an intra prediction mode derived by the DIMD mode, and an intra prediction mode derived by the TIMD mode;
- a reconstruction unit configured to obtain a reconstructed block of the current block based on the residual block of the current block and the prediction block of the current block.
- This application provides an encoder, including:
- a prediction unit configured to:
- the distortion costs of the multiple MIP modes include the distortion costs obtained by using the multiple MIP modes to predict samples in the template area adjacent to the current block;
- the first intra prediction mode includes at least one of the following: a suboptimal MIP mode for predicting the current block determined based on the distortion costs of the multiple MIP modes, an intra prediction mode derived by the DIMD mode, and an intra prediction mode derived by the TIMD mode;
- a residual unit configured to obtain a residual block of the current block based on the prediction block of the current block and the original block of the current block;
- an encoding unit configured to encode the residual block of the current block to obtain the code stream of the current sequence.
- This application provides a decoder, including:
- a processor adapted to implement computer instructions; and
- a computer-readable storage medium storing computer instructions, the computer instructions being suitable for the processor to load and execute the decoding method in the above-mentioned first aspect or its respective implementations.
- In some implementations, there are one or more processors and one or more memories.
- The computer-readable storage medium may be integrated with the processor, or may be provided separately from the processor.
- This application provides an encoder, including:
- a processor adapted to implement computer instructions; and
- a computer-readable storage medium storing computer instructions, the computer instructions being suitable for the processor to load and execute the encoding method in the above-mentioned second aspect or its respective implementations.
- In some implementations, there are one or more processors and one or more memories.
- The computer-readable storage medium may be integrated with the processor, or may be provided separately from the processor.
- The present application provides a computer-readable storage medium that stores computer instructions.
- When the computer instructions are read and executed by a processor of a computer device, the computer device performs the method of the above-mentioned first aspect.
- The present application provides a code stream, namely the code stream involved in the above-mentioned first aspect or the above-mentioned second aspect.
- The decoder predicts the current block based on the optimal MIP mode and the first intra prediction mode. The optimal MIP mode is designed as the optimal MIP mode for predicting the current block, determined based on the distortion costs of multiple MIP modes, and the first intra prediction mode is designed to include at least one of the following: a suboptimal MIP mode for predicting the current block determined based on the distortion costs of the multiple MIP modes, the intra prediction mode derived by DIMD, and the intra prediction mode derived by TIMD. This means the decoder does not need to obtain the MIP mode by parsing the code stream; compared with traditional MIP technology, this can effectively reduce coding-unit-level bit overhead and thereby improve compression efficiency.
- In addition, predicting the current block with both modes avoids having the optimal MIP mode completely replace the optimal prediction mode calculated based on the rate-distortion cost, so that both prediction accuracy and prediction diversity are taken into account, which in turn improves compression performance.
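The mode selection summarized above can be illustrated with a minimal sketch. It is not the method claimed in this application: it assumes the distortion cost is the sum of absolute differences (SAD) between each candidate mode's prediction of the template area and the reconstructed template samples, uses a hypothetical `predict_template` callable in place of real MIP prediction, and blends the two lowest-cost predictions with cost-based weights chosen purely for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical predictor type: maps a mode index to the predicted samples of
# a flattened region (e.g. the template area adjacent to the current block).
Predictor = Callable[[int], List[int]]


def sad(a: Sequence[int], b: Sequence[int]) -> int:
    """Sum of absolute differences, used here as the assumed distortion cost."""
    return sum(abs(x - y) for x, y in zip(a, b))


def rank_modes(modes: Sequence[int],
               predict_template: Predictor,
               template_recon: Sequence[int]) -> List[Tuple[int, int]]:
    """Return (mode, cost) pairs sorted by template cost, lowest first.

    The first entry plays the role of the optimal mode and the second
    the role of the suboptimal mode.
    """
    costs = [(m, sad(predict_template(m), template_recon)) for m in modes]
    return sorted(costs, key=lambda mc: mc[1])


def fuse(pred1: Sequence[int], cost1: int,
         pred2: Sequence[int], cost2: int) -> List[int]:
    """Blend two predictions; the lower-cost one receives the larger weight.

    The weighting rule w1 = cost2 / (cost1 + cost2) is an assumption made
    for illustration. Samples are assumed non-negative, so adding 0.5 and
    truncating rounds half up.
    """
    total = cost1 + cost2
    w1 = 0.5 if total == 0 else cost2 / total
    return [int(w1 * p + (1 - w1) * q + 0.5) for p, q in zip(pred1, pred2)]
```

For example, with a toy predictor in which mode `m` predicts a flat template of value `100 + 2 * m` against reconstructed samples `[100, 102, 104, 106]`, `rank_modes(range(4), ...)` places modes 1 and 2 first, each with a SAD of 8. Because the same ranking can be computed at the decoder from already reconstructed samples, no mode index has to be transmitted for it.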
- Figure 1 is a schematic block diagram of a coding framework provided by an embodiment of the present application.
- FIG. 2 is a schematic diagram of the MIP mode provided by the embodiment of the present application.
- Figure 3 is a schematic diagram of a prediction mode derived based on DIMD provided by an embodiment of the present application.
- Figure 4 is a schematic diagram of deriving prediction blocks based on DIMD provided by an embodiment of the present application.
- FIG. 5 is a schematic diagram of a template used by TIMD provided by an embodiment of the present application.
- Figure 6 is a schematic block diagram of a decoding framework provided by an embodiment of the present application.
- Figure 7 is a schematic flow chart of the decoding method provided by the embodiment of the present application.
- Figure 8 is a schematic flow chart of the encoding method provided by the embodiment of the present application.
- Figure 9 is a schematic block diagram of a decoder provided by an embodiment of the present application.
- Figure 10 is a schematic block diagram of an encoder provided by an embodiment of the present application.
- Figure 11 is a schematic block diagram of an electronic device provided by an embodiment of the present application.
- The solutions provided by the embodiments of this application can be applied to the field of digital video coding technology, including but not limited to image coding and decoding, video coding and decoding, hardware video coding and decoding, dedicated-circuit video coding and decoding, and real-time video coding and decoding.
- The solution provided by the embodiments of the present application can be combined with the Audio Video Coding Standard (AVS), the second-generation AVS standard (AVS2) or the third-generation AVS standard (AVS3).
- AVS: Audio Video Coding Standard
- VVC: Versatile Video Coding
- The solution provided by the embodiments of the present application can be used to perform lossy compression on images, or can also be used to perform lossless compression on images.
- The lossless compression can be visually lossless compression or mathematically lossless compression.
- Video coding standards generally adopt a block-based hybrid coding framework.
- Each frame in the video is divided into square largest coding units (LCUs) or coding tree units (CTUs) of the same size (such as 128x128 or 64x64).
- Each largest coding unit or coding tree unit can be further divided into rectangular coding units (CUs) according to partitioning rules.
- Coding units may also be divided into prediction units (PUs), transform units (TUs), etc.
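The first level of this partitioning can be sketched as tiling a frame into equal-size CTUs in raster order. This is an illustrative helper (the name `ctu_grid` is hypothetical): it reports only the part of each right-edge or bottom-edge CTU that lies inside the frame, and it does not model the subsequent rule-based split into CUs.

```python
from typing import List, Tuple


def ctu_grid(width: int, height: int,
             ctu_size: int = 128) -> List[Tuple[int, int, int, int]]:
    """Tile a frame into CTUs in raster order, returned as (x, y, w, h).

    CTUs on the right and bottom edges are clipped to the frame when the
    frame size is not a multiple of ctu_size.
    """
    tiles = []
    for y in range(0, height, ctu_size):        # CTU rows, top to bottom
        for x in range(0, width, ctu_size):     # CTUs within a row, left to right
            tiles.append((x, y,
                          min(ctu_size, width - x),
                          min(ctu_size, height - y)))
    return tiles
```

For a 1920x1080 frame with 128x128 CTUs, this yields 15 columns and 9 rows (135 CTUs); the bottom row is only 56 samples tall because 1080 is not a multiple of 128.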
- The hybrid coding framework includes prediction, transform, quantization, entropy coding, in-loop filtering and other modules.
- The prediction module includes intra prediction and inter prediction.
- Inter prediction includes motion estimation and motion compensation.
- Intra prediction refers only to information in the same frame and predicts the pixel information within the current block, eliminating spatial redundancy. Since adjacent frames in a video are strongly similar, inter prediction is used in video coding technology to eliminate the temporal redundancy between adjacent frames, thereby improving coding efficiency.
- Inter prediction can refer to image information in different frames and uses motion estimation to search for the motion vector information that best matches the current block. The transform converts the residual block obtained after prediction into the frequency domain and redistributes its energy; combined with quantization, it can remove information to which the human eye is insensitive, eliminating visual redundancy.
- Entropy coding can eliminate statistical redundancy based on the current context model and the probability information of the binary code stream.
- The encoder first reads a black-and-white image or a color image from the original video sequence and then encodes it.
- The black-and-white image may include pixels of the luma component.
- The color image may include pixels of the chroma components, and may also include pixels of the luma component.
- The color format of the original video sequence can be a luminance-chrominance format (YCbCr, YUV), a red-green-blue (RGB) format, etc.
- After the encoder reads a black-and-white or color image, it divides the image into blocks and applies intra prediction or inter prediction to the current block to generate a prediction block of the current block.
- The prediction block is subtracted from the original block of the current block to obtain a residual block; the residual block is transformed and quantized to obtain a quantized coefficient matrix, which is entropy encoded and output to the code stream.
- At the decoding end, the decoder applies intra prediction or inter prediction to the current block to generate a prediction block of the current block.
- The decoder parses the code stream to obtain the quantized coefficient matrix, performs inverse quantization and inverse transform on it to obtain the residual block, and adds the prediction block and the residual block to obtain the reconstructed block.
- Reconstructed blocks can be used to compose a reconstructed image, and the decoder performs loop filtering on the reconstructed image on a per-image or per-block basis to obtain a decoded image.
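The residual path just described can be traced with a toy round trip. To keep the sketch short it makes several simplifying assumptions: the transform is omitted (equivalent to an identity transform), quantization is a uniform scalar quantizer with a hypothetical step `qstep`, and entropy coding is skipped because it is lossless and does not change the reconstruction. What it illustrates is that the encoder and the decoder compute the same reconstruction from the same quantized levels.

```python
from typing import List

Block = List[List[int]]


def residual(orig: Block, pred: Block) -> Block:
    """Encoder: residual = original - prediction, sample by sample."""
    return [[o - p for o, p in zip(ro, rp)] for ro, rp in zip(orig, pred)]


def quantize(res: Block, qstep: int) -> Block:
    """Uniform scalar quantization, rounding magnitudes to the nearest level."""
    return [[(1 if r >= 0 else -1) * ((abs(r) + qstep // 2) // qstep) for r in row]
            for row in res]


def dequantize(levels: Block, qstep: int) -> Block:
    """Inverse quantization: scale each level back by the step size."""
    return [[lv * qstep for lv in row] for row in levels]


def reconstruct(pred: Block, res_hat: Block, bit_depth: int = 8) -> Block:
    """Prediction + decoded residual, clipped to the valid sample range."""
    max_val = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), max_val) for p, r in zip(rp, rr)]
            for rp, rr in zip(pred, res_hat)]


if __name__ == "__main__":
    orig = [[52, 60], [61, 70]]
    pred = [[50, 50], [60, 60]]
    levels = quantize(residual(orig, pred), qstep=4)  # what would be entropy coded
    # The encoder (to keep its reference in sync) and the decoder (after
    # parsing the levels) run the same inverse path.
    recon = reconstruct(pred, dequantize(levels, qstep=4))
    print(levels, recon)  # [[1, 3], [0, 3]] [[54, 62], [60, 72]]
```

The reconstruction differs from the original (54 vs 52, and so on) because quantization is lossy; this is why the encoder must predict from reconstructed rather than original samples.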
- The current block can be the current coding unit (CU), the current prediction unit (PU), etc.
- The encoding end also needs to perform operations similar to those of the decoding end to obtain the decoded image.
- The decoded image can be used as a reference frame for inter prediction of subsequent frames.
- The block partitioning information determined by the encoding end, as well as mode or parameter information for prediction, transform, quantization, entropy coding, loop filtering, etc., needs to be output to the code stream where necessary.
- By parsing the code stream and analyzing the existing information, the decoding end determines the same block partitioning information and the same mode or parameter information for prediction, transform, quantization, entropy coding, loop filtering, etc. as the encoding end, thereby ensuring that the decoded image obtained at the encoding end is the same as the decoded image obtained at the decoding end.
- The decoded image obtained at the encoding end is usually also called a reconstructed image.
- During prediction, the current block can be divided into prediction units; during transform, it can be divided into transform units.
- The partitioning into prediction units and into transform units can be the same or different.
- The above is only the basic process of a video codec under the block-based hybrid coding framework. With the development of technology, some modules of the framework or some steps of the process may be optimized; this application is applicable to such block-based hybrid coding frameworks.
- Figure 1 is a schematic block diagram of a coding framework 100 provided by an embodiment of the present application.
- The coding framework 100 may include an intra prediction unit 180, an inter prediction unit 170, a residual unit 110, a transform and quantization unit 120, an entropy coding unit 130, an inverse transform and inverse quantization unit 140, and a loop filter unit 150.
- The coding framework 100 may also include a decoded image buffer unit 160. The coding framework 100 may also be called a hybrid coding framework.
- The intra prediction unit 180 or the inter prediction unit 170 may predict the image block to be encoded to output a prediction block.
- The residual unit 110 may calculate a residual block, that is, the difference between the prediction block and the image block to be encoded, based on the prediction block and the image block to be encoded.
- The transform and quantization unit 120 is used to perform operations such as transform and quantization on the residual block to remove information to which the human eye is insensitive, thereby eliminating visual redundancy.
- The residual block before transform and quantization by the transform and quantization unit 120 may be called a time-domain residual block.
- The time-domain residual block after transform and quantization by the transform and quantization unit 120 may be called a frequency residual block or a frequency-domain residual block.
- The entropy coding unit 130 may output a code stream based on the quantized transform coefficients. For example, the entropy coding unit 130 may eliminate statistical redundancy according to the target context model and the probability information of the binary code stream, and may use context-based adaptive binary arithmetic coding (CABAC). The entropy coding unit 130 may also be called a header information encoding unit.
- CABAC: context-based adaptive binary arithmetic coding
- The image block to be encoded can also be called an original image block or a target image block; a prediction block can also be called a predicted image block or an image prediction block, or a prediction signal or prediction information.
- A reconstructed block may also be called a reconstructed image block or an image reconstruction block, or a reconstructed signal or reconstructed information.
- For the encoding end, the image block to be encoded may also be called an encoding block or an encoded image block; for the decoding end, the image block to be decoded may also be called a decoding block or a decoded image block.
- The image block to be encoded may be a CTU or a CU.
- The coding framework 100 calculates the residual between the prediction block and the image block to be encoded to obtain the residual block, and then transmits the residual block to the decoder through processes such as transform and quantization.
- After the decoder receives and parses the code stream, it obtains the residual block through steps such as inverse quantization and inverse transform.
- The prediction block predicted by the decoder is superimposed on the residual block to obtain the reconstructed block.
- The inverse transform and inverse quantization unit 140, the loop filter unit 150 and the decoded image buffer unit 160 in the coding framework 100 may be used to form a decoder.
- The intra prediction unit 180 or the inter prediction unit 170 can predict the image block to be encoded based on existing reconstructed blocks, thereby ensuring that the encoding end and the decoding end have a consistent understanding of the reference frame.
- In other words, the encoder can replicate the decoder's processing loop and thus produce the same predictions as the decoder.
- The quantized transform coefficients are inverse quantized and inverse transformed by the inverse transform and inverse quantization unit 140 to replicate the approximate residual block at the decoding end.
- After the approximate residual block is added to the prediction block, the result can pass through the loop filter unit 150 to smooth out blocking artifacts and other artifacts caused by block-based processing and quantization.
- The image blocks output by the loop filter unit 150 may be stored in the decoded image buffer unit 160 for use in predicting subsequent images.
- Figure 1 is only an example of the present application and should not be understood as a limitation of the present application.
- The loop filter unit 150 in the coding framework 100 may include a deblocking filter (DBF) and a sample adaptive offset (SAO) filter.
- DBF: deblocking filter
- SAO: sample adaptive offset
- The coding framework 100 may adopt a neural-network-based loop filtering algorithm to improve video compression efficiency.
- The coding framework 100 may be a hybrid video coding framework based on a deep-learning neural network.
- A model based on a convolutional neural network can be used to compute the filtered pixel values on top of deblocking filtering and sample adaptive offset filtering.
- The network structures of the loop filter unit 150 for the luma component and the chroma components may be the same or different. Considering that the luma component contains more visual information, the luma component can also be used to guide the filtering of the chroma components to improve their reconstruction quality.
- Intra prediction refers only to information in the same frame and predicts the pixel information within the image block to be encoded, so as to eliminate spatial redundancy.
- The frame used for intra prediction can be an I frame.
- The image block to be encoded can use the upper-left, upper and left image blocks as reference information for its prediction, and the image block to be encoded in turn serves as reference information for subsequent image blocks, so that the entire image can be predicted.
- If the input digital video is in a color format such as YUV 4:2:0, every 4 pixels of each image frame consist of 4 Y samples and 2 chroma samples (one U and one V).
- The coding framework can encode the Y component (i.e., luma blocks) and the UV components (i.e., chroma blocks) separately.
- The decoding end can also perform decoding according to the same format.
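The 4:2:0 sample ratio stated above can be checked with a small helper. It assumes planar storage with chroma subsampled by 2 in both directions and rounds odd dimensions up, which is a common convention but an assumption here; `yuv420_plane_sizes` is a hypothetical name.

```python
from typing import Tuple


def yuv420_plane_sizes(width: int, height: int) -> Tuple[int, int, int]:
    """Return (Y, U, V) sample counts for one YUV 4:2:0 frame.

    Chroma planes are half the luma width and height; odd dimensions
    are rounded up (assumed convention).
    """
    chroma_w = (width + 1) // 2
    chroma_h = (height + 1) // 2
    return width * height, chroma_w * chroma_h, chroma_w * chroma_h
```

A 2x2 pixel group yields 4 luma samples and 1 + 1 chroma samples, matching the 4 Y and 2 chroma samples per 4 pixels described above; over a whole 1920x1080 frame the total is 1.5 samples per pixel.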
- The process involved in the matrix-based intra prediction (MIP) mode can be divided into three main steps: a down-sampling process, a matrix multiplication process and an up-sampling process.
- Specifically, the spatially adjacent reconstructed samples are first down-sampled, and the resulting down-sampled sample sequence is used as the input vector of the matrix multiplication process; that is, the output vector of the down-sampling process is the input of the matrix multiplication process.
- This input vector is multiplied by a preset matrix and an offset vector is added, producing the calculated sample vector. Finally, the output vector of the matrix multiplication process is used as the input vector of the up-sampling process, and the final prediction block is obtained through up-sampling.
- FIG. 2 is a schematic diagram of the MIP mode provided by the embodiment of the present application.
- The MIP mode obtains the top down-sampled reconstructed sample vector $bdry_{top}$ by averaging the reconstructed samples adjacent to the top of the current coding unit, and obtains the left down-sampled reconstructed sample vector $bdry_{left}$ by averaging the reconstructed samples adjacent to the left.
- After $bdry_{top}$ and $bdry_{left}$ are obtained, they are concatenated into the input vector $bdry_{red}$ of the matrix multiplication process.
- The reduced prediction sample vector is then obtained from $bdry_{red}$ as $pred_{red} = A_k \cdot bdry_{red} + b_k$, where $A_k$ is a preset matrix, $b_k$ is a preset bias vector, and $k$ is the index of the MIP mode.
- To predict a block with width W and height H, MIP requires H reconstructed pixels in the left column of the current block and W reconstructed pixels in the upper row of the current block as input.
- MIP generates prediction blocks in the following three steps: reference pixel averaging (Averaging), matrix-vector multiplication (Matrix Vector Multiplication) and interpolation (Interpolation).
- The core of MIP is matrix multiplication, which can be thought of as a process of generating the prediction block from the input pixels (reference pixels) by matrix multiplication.
- MIP provides a variety of matrices, and different prediction methods are embodied in different matrices: applying different matrices to the same input pixels yields different results.
- Reference pixel averaging and interpolation are a design that trades off performance against complexity. For larger blocks, reference pixel averaging achieves an effect similar to down-sampling, so that the input fits a smaller matrix, while interpolation achieves an up-sampling effect. In this way, there is no need to provide MIP matrices for every block size; only matrices of one or a few specific sizes are needed. As the demand for compression performance increases and hardware capabilities improve, more complex MIP designs may appear in next-generation standards.
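The three MIP steps can be sketched end to end in plain Python. This is a simplified illustration, not the normative VVC process: the matrix `A` and offset `b` are made-up values (real MIP matrices are trained and stored as integer tables), averaging uses equal-size groups with rounding, and the reduced prediction is up-sampled by plain separable linear interpolation, whereas the standardized up-sampling also uses the original boundary samples. All function names are hypothetical.

```python
from typing import List, Sequence


def average_boundary(samples: Sequence[int], out_len: int) -> List[int]:
    """Step 1: down-sample a boundary by averaging equal-size groups (rounded)."""
    group = len(samples) // out_len
    return [(sum(samples[i * group:(i + 1) * group]) + group // 2) // group
            for i in range(out_len)]


def mat_vec_offset(A: Sequence[Sequence[float]], x: Sequence[int],
                   b: Sequence[float]) -> List[float]:
    """Step 2: pred_red = A_k . bdry_red + b_k."""
    return [sum(a * xi for a, xi in zip(row, x)) + bi for row, bi in zip(A, b)]


def interpolate_1d(vals: Sequence[float], out_len: int) -> List[float]:
    """Step 3 helper: linear interpolation onto out_len evenly spaced positions."""
    n = len(vals)
    if n == 1 or out_len == 1:
        return [float(vals[0])] * out_len
    out = []
    for i in range(out_len):
        pos = i * (n - 1) / (out_len - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        t = pos - lo
        out.append(vals[lo] * (1 - t) + vals[hi] * t)
    return out


def mip_predict(top: Sequence[int], left: Sequence[int],
                A: Sequence[Sequence[float]], b: Sequence[float],
                red_size: int, width: int, height: int) -> List[List[int]]:
    """Average -> matrix-vector multiply + offset -> interpolate to height x width."""
    n_per_side = len(A[0]) // 2  # input vector is half top, half left
    bdry_red = average_boundary(top, n_per_side) + average_boundary(left, n_per_side)
    pred_red = mat_vec_offset(A, bdry_red, b)  # red_size * red_size values
    rows = [pred_red[r * red_size:(r + 1) * red_size] for r in range(red_size)]
    rows = [interpolate_1d(r, width) for r in rows]  # horizontal pass
    cols = [interpolate_1d([rows[r][c] for r in range(red_size)], height)
            for c in range(width)]  # vertical pass
    return [[round(cols[c][y]) for c in range(width)] for y in range(height)]
```

With all boundary samples equal to 100 and each matrix row averaging the four reduced inputs, every predicted sample of an 8x8 block is 100, a quick sanity check that the averaging, multiplication and interpolation stages are wired together correctly.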
- The MIP mode can be regarded as a simplification of a neural network.
- The matrices used are obtained through training. Therefore, the MIP mode has strong generalization ability and can achieve prediction effects that traditional prediction modes cannot.
- The MIP mode can be a model obtained by repeatedly simplifying, in terms of hardware and software complexity, a neural-network-based intra prediction model. Trained on a large number of samples, its multiple prediction modes represent a variety of models and parameters, which can cover the textures of natural sequences relatively well.
- MIP is somewhat similar to the planar mode, but MIP is clearly more complex and more flexible than the planar mode.
- The number of MIP modes may differ with block size. For example, for a 4x4 coding unit, MIP has 16 prediction modes; for an 8x8 coding unit or a coding unit whose width or height equals 4, MIP has 8 prediction modes; for coding units of other sizes, MIP has 6 prediction modes.
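The size rule above can be written as a small lookup. This sketch assumes the rule applies exactly as stated (4x4 first, then 8x8 or any block with a side of 4, then everything else); `num_mip_modes` is a hypothetical name, and the counts do not include the transpose option, which is signaled separately.

```python
def num_mip_modes(width: int, height: int) -> int:
    """Number of MIP prediction modes for a coding unit of the given size."""
    if width == 4 and height == 4:
        return 16  # 4x4 coding units
    if width == 4 or height == 4 or (width == 8 and height == 8):
        return 8   # 8x8, or any coding unit with a side equal to 4
    return 6       # all other sizes
```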
- MIP has a transpose function: for a prediction mode that fits the current size, the encoder can try a transposed calculation. Therefore, MIP requires not only a flag indicating whether the current coding unit uses MIP but also, if it does, an additional transpose flag transmitted to the decoder.
- The MIP transpose flag is binarized with fixed-length (FL) coding of length 1.
- The MIP mode index is binarized with truncated binary (TB) coding.
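The two binarizations can be sketched as follows. Truncated binary coding of a symbol `x` from an alphabet of `n` values uses `k = floor(log2(n))` bits for the first `u = 2**(k + 1) - n` symbols and `k + 1` bits for the rest, so no codeword is wasted when `n` is not a power of two. The helper `mip_bins` is hypothetical: it only concatenates the 1-bit transpose flag and the mode-index bins, and ignores the MIP usage flag and the subsequent CABAC stage, so its length is not necessarily the actual coded bit cost.

```python
def truncated_binary(x: int, n: int) -> str:
    """Truncated binary (TB) codeword for symbol x in the range [0, n)."""
    if not 0 <= x < n:
        raise ValueError("symbol out of range")
    if n == 1:
        return ""  # a single symbol needs no bits
    k = n.bit_length() - 1       # floor(log2(n)), at least 1 here
    u = (1 << (k + 1)) - n       # number of symbols that get the short codeword
    if x < u:
        return format(x, f"0{k}b")
    return format(x + u, f"0{k + 1}b")


def mip_bins(transposed: bool, mode: int, num_modes: int) -> str:
    """Transpose flag (fixed length 1) followed by the TB-coded mode index."""
    return ("1" if transposed else "0") + truncated_binary(mode, num_modes)
```

For a block with 6 MIP modes, indices 0 and 1 take 2 bits (`00`, `01`) and indices 2 to 5 take 3 bits (`100` to `111`); with 16 modes every index takes exactly 4 bits.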
- the main core point of the DIMD prediction mode is that the intra prediction mode is derived in the decoder using the same method as the encoder. This avoids transmitting the intra prediction mode index of the current coding unit in the code stream to save bit overhead. the goal of.
- The DIMD prediction process can be divided into the following two main steps:
- Step 1: derive the prediction mode.
- The encoder uses the Sobel operator to accumulate a histogram of gradients indexed by prediction mode.
- The region used consists of the three rows of reconstructed samples adjacent to the top of the current block, the three columns of reconstructed samples adjacent to its left, and the corresponding adjacent reconstructed samples at the upper left.
- By computing the histogram of gradients over this L-shaped region of reconstructed samples, the prediction modes with the largest and second-largest amplitudes in the histogram can be obtained.
- Figure 3 is a schematic diagram of deriving a prediction mode based on DIMD provided by an embodiment of the present application.
- DIMD derives a prediction mode using the pixels of a template in the reconstructed area (the reconstructed pixels to the left of and above the current block).
- the template may include the three adjacent rows of reconstructed samples above the current block, the three adjacent columns of reconstructed samples on the left, and the corresponding adjacent reconstructed samples at the upper left.
- Using a window (for example, the window shown in Figure 3(a) or Figure 3(b)), multiple gradient values are determined within the template, and each gradient value can be mapped to an intra prediction mode (IPM) that matches its gradient direction.
- The encoder can use the prediction modes matching the largest and second-largest of these gradient values as the derived prediction modes. For example, as shown in Figure 3(b), for a 4×4 block, all pixels whose gradient values need to be determined are analyzed and the corresponding histogram of gradients is obtained; as shown in Figure 3(c), for blocks of other sizes, all pixels whose gradient values need to be determined are likewise analyzed and the corresponding histograms are obtained. Finally, the prediction modes corresponding to the largest and second-largest gradients in the histogram are taken as the derived prediction modes.
- The histogram of gradients in this application is only one example of how the derived prediction mode can be determined; it can be implemented in a variety of simple forms, which this application does not specifically limit.
- This application does not limit how the histogram of gradients is computed.
- the Sobel operator or other methods may be used to calculate the gradient histograms.
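- The following Python sketch illustrates Step 1 under simplifying assumptions: a 3x3 Sobel operator is applied at given positions of a reconstructed-sample array, each gradient's amplitude is taken as |gx| + |gy|, and the edge direction is mapped to an angular mode by a hypothetical uniform mapping that follows the VVC angular numbering (18 = horizontal, 34 = top-left diagonal, 50 = vertical). ECM uses a tan-based lookup, so the mode numbers produced here are only approximate. All function names are illustrative.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel(img, y, x):
    """Return the 3x3 Sobel gradients (gx, gy) at row y, column x (rows grow downward)."""
    gx = gy = 0
    for dy in range(3):
        for dx in range(3):
            p = img[y + dy - 1][x + dx - 1]
            gx += SOBEL_X[dy][dx] * p
            gy += SOBEL_Y[dy][dx] * p
    return gx, gy


def gradient_to_mode(gx, gy):
    """Hypothetical uniform mapping of edge direction to angular modes 2..66.

    18 = horizontal, 34 = top-left diagonal, 50 = vertical; ECM uses a tan-based table instead.
    """
    # The edge runs perpendicular to the gradient; negate gy to work in y-up coordinates.
    theta = (math.degrees(math.atan2(-gy, gx)) + 90.0) % 180.0
    return 2 + round(((45.0 - theta) % 180.0) * 16.0 / 45.0)


def dimd_histogram(img, positions):
    """Accumulate |gx| + |gy| per derived mode over the given template positions."""
    hist = {}
    for y, x in positions:
        gx, gy = sobel(img, y, x)
        if gx == 0 and gy == 0:
            continue  # flat area: no direction to vote for
        mode = gradient_to_mode(gx, gy)
        hist[mode] = hist.get(mode, 0) + abs(gx) + abs(gy)
    return hist


def top_two_modes(hist):
    """Return the (mode, amplitude) pairs with the largest and second-largest amplitude."""
    return sorted(hist.items(), key=lambda kv: kv[1], reverse=True)[:2]
```

A vertical edge in the template votes for mode 50 and a horizontal edge for mode 18; the two entries returned by `top_two_modes` play the roles of prediction mode 1 and prediction mode 2.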
- Step 2: derive the prediction block.
- Figure 4 is a schematic diagram of deriving prediction blocks based on DIMD provided by an embodiment of the present application.
- The encoder can compute a weighted combination of the prediction values of three intra prediction modes (planar mode and the two intra prediction modes derived by DIMD).
- the codec uses the same prediction block derivation method to obtain the prediction block of the current block. Assume that the prediction mode corresponding to the largest gradient value is prediction mode 1, and the prediction mode corresponding to the second largest gradient value is prediction mode 2.
- The encoder checks the following two conditions:
- the gradient of prediction mode 2 is not 0;
- Neither prediction mode 1 nor prediction mode 2 is planar mode or DC prediction mode.
- If either condition is not met, only prediction mode 1 is used to calculate the prediction samples of the current block, that is, the ordinary prediction process is applied with prediction mode 1; otherwise, that is, if both conditions hold, the prediction block of the current block is derived by weighted averaging.
- Specifically, planar mode receives 1/3 of the total weight, and the remaining 2/3 is shared by prediction mode 1 and prediction mode 2. The weight of prediction mode 1 is 2/3 multiplied by the gradient amplitude of prediction mode 1 divided by the sum of the gradient amplitudes of prediction modes 1 and 2.
- Likewise, the share of prediction mode 2 is its gradient amplitude divided by the sum of the gradient amplitudes of prediction modes 1 and 2, so its weight is the remainder of the 2/3.
- the decoder follows the same steps to obtain the prediction block.
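- $\mathrm{Weight}(\mathrm{PLANAR}) = 1/3$;
- $\mathrm{Weight}(\mathrm{mode1}) = \frac{2}{3} \cdot \frac{\mathit{amp1}}{\mathit{amp1} + \mathit{amp2}}$;
- $\mathrm{Weight}(\mathrm{mode2}) = 1 - \mathrm{Weight}(\mathrm{PLANAR}) - \mathrm{Weight}(\mathrm{mode1})$;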
- mode1 and mode2 represent prediction mode 1 and prediction mode 2 respectively
- amp1 and amp2 represent the gradient amplitude value of prediction mode 1 and the gradient amplitude value of prediction mode 2 respectively.
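- A sketch of Step 2 that follows the two conditions and the weights above. Weights are kept as exact fractions for clarity; a real codec would more likely use integer weights and shifts, which the text does not describe. The mode numbers PLANAR = 0 and DC = 1 follow VVC, prediction blocks are flat lists of integer samples, and the function name is hypothetical.

```python
from fractions import Fraction

PLANAR, DC = 0, 1  # VVC intra mode numbers


def dimd_blend(pred_planar, pred1, pred2, mode1, mode2, amp1, amp2):
    """Fuse planar, mode-1 and mode-2 predictions using the DIMD weighting rules."""
    blend = amp2 != 0 and all(m not in (PLANAR, DC) for m in (mode1, mode2))
    if not blend:
        return list(pred1)  # ordinary prediction with prediction mode 1 only
    w_planar = Fraction(1, 3)
    w1 = Fraction(2, 3) * Fraction(amp1, amp1 + amp2)
    w2 = 1 - w_planar - w1
    return [round(w_planar * p + w1 * a + w2 * b)
            for p, a, b in zip(pred_planar, pred1, pred2)]
```

With amp1 = 3 and amp2 = 1, the weights are 1/3 (planar), 1/2 (mode 1) and 1/6 (mode 2), so samples 30, 60 and 90 blend to 55.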
- The DIMD prediction mode requires a flag to be transmitted to the decoder, indicating whether the current coding unit uses the DIMD prediction mode.
- DIMD uses gradient analysis of reconstructed pixels to screen intra prediction modes, and the two intra prediction modes plus the planar mode can be weighted according to the analysis results.
- the advantage of DIMD is that if the DIMD mode is selected for the current block, there is no need to indicate which intra prediction mode is used in the code stream. Instead, the decoder itself derives it through the above process, which saves overhead to a certain extent.
- The codec likewise derives the prediction mode through computation, which saves the overhead of transmitting the mode index.
- The TIMD prediction mode can be understood as two main parts. First, the cost of each prediction mode is calculated on the template, and the prediction modes with the lowest and second-lowest costs are selected; the mode with the lowest cost is recorded as prediction mode 1 and the mode with the second-lowest cost as prediction mode 2. Second, if the ratio of the second-lowest cost (costMode2) to the lowest cost (costMode1) meets a preset condition, such as costMode2 < 2*costMode1, the prediction blocks of prediction mode 1 and prediction mode 2 are fused with their corresponding weights to obtain the final prediction block.
- the corresponding weights of prediction mode 1 and prediction mode 2 are determined in the following manner:
- $\mathit{weight1} = \frac{\mathit{costMode2}}{\mathit{costMode1} + \mathit{costMode2}}$
- $\mathit{weight2} = 1 - \mathit{weight1}$
- weight1 is the weight of the prediction block corresponding to prediction mode 1
- weight2 is the weight of the prediction block corresponding to prediction mode 2.
- Otherwise, weighted fusion between the prediction blocks is not performed, and the prediction block of prediction mode 1 is used as the TIMD prediction block.
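- A sketch of the TIMD fusion rule above, assuming integer template costs (for example SATD values) and prediction blocks given as flat lists of samples. Fractions keep the weights exact; the function name is hypothetical.

```python
from fractions import Fraction


def timd_blend(pred1, pred2, cost_mode1, cost_mode2):
    """Fuse the two TIMD candidate predictions.

    cost_mode1 and cost_mode2 are the lowest and second-lowest integer template costs.
    """
    if cost_mode2 < 2 * cost_mode1:
        weight1 = Fraction(cost_mode2, cost_mode1 + cost_mode2)
        weight2 = 1 - weight1
        return [round(weight1 * a + weight2 * b) for a, b in zip(pred1, pred2)]
    return list(pred1)  # no fusion: prediction mode 1 alone
```

With costs 10 and 15, weight1 = 15/25 = 3/5, so samples 100 and 50 fuse to 80; with costs 10 and 25 the condition fails and the mode-1 prediction is returned unchanged. Note that the mode with the lower cost receives the larger weight.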
- If the template has no available adjacent reconstructed samples, the TIMD prediction mode uses planar mode for intra prediction of the current block, that is, no weighted fusion is performed. As with the DIMD prediction mode, the TIMD prediction mode requires a flag to be transmitted to the decoder to indicate whether the current coding unit uses the TIMD prediction mode.
- the process of the encoder or decoder calculating the cost information of each prediction mode is mainly: performing intra mode prediction on the samples in the template area based on the reconstructed samples adjacent to the upper or left side of the template area.
- The prediction process is the same as that of the original intra prediction mode. For example, when DC mode is used to predict the samples in the template area, the mean value of the entire coding unit is calculated; when an angular prediction mode is used, the corresponding interpolation filter is selected according to the mode and the prediction samples are interpolated according to the rules.
- Then, from the predicted samples and the reconstructed samples in the template area, the distortion between them can be calculated; this distortion is the cost information of the current prediction mode.
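- A sketch of the cost computation and selection described above. Generating the predicted template samples for each candidate mode (reference samples, interpolation filters) is left abstract: the predictions are passed in as input. The sum of absolute differences (SAD) is used as a stand-in distortion measure; the text also names SATD as one possible cost, and `cost_fn` can be swapped accordingly. All names are hypothetical.

```python
def sad(pred, recon):
    """Sum of absolute differences between two equally sized sample lists."""
    return sum(abs(p - r) for p, r in zip(pred, recon))


def rank_template_costs(template_preds, template_recon, cost_fn=sad):
    """Return [(mode, cost), ...] in ascending order of template cost.

    template_preds maps each candidate intra mode to its predicted template samples.
    """
    costs = [(mode, cost_fn(pred, template_recon)) for mode, pred in template_preds.items()]
    costs.sort(key=lambda mc: mc[1])  # stable sort: ties keep the candidates' original order
    return costs
```

The first two entries of the returned list give prediction mode 1 and prediction mode 2 together with costMode1 and costMode2.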
- FIG. 5 is a schematic diagram of a template used by TIMD provided by an embodiment of the present application.
- Based on the current coding unit, the codec can select reconstructed samples in the reference of the template (Reference of template) of the current block, a region of width 2(M+L1)+1 and height 2(N+L2)+1 with M, N, L1 and L2 as shown in FIG. 5, to predict the samples in the template area of the current block.
- The available adjacent reconstructed samples may be the samples adjacent to the left and upper sides of the current CU in FIG. 5. If there are no available reconstructed samples in the diagonally hatched area, the TIMD prediction mode selects planar mode to perform intra prediction on the current block.
- In principle, reconstructed values are available on the left and upper sides of the current block, that is, the template of the current block contains available adjacent reconstructed samples.
- the decoder can use a certain intra prediction mode to predict on the template, and compare the prediction value and the reconstructed value to obtain the cost of the intra prediction mode on the template.
- The reconstructed samples in the template are correlated with the pixels in the current block. Therefore, the performance of a prediction mode on the template can be used to estimate its performance on the current block.
- TIMD predicts the template with a number of candidate intra prediction modes, obtains the template cost of each candidate, and uses the one or two intra prediction modes with the lowest cost to obtain the intra prediction value of the current block. If the template costs of the two intra prediction modes differ only slightly, weighting the prediction values of the two modes can improve compression performance.
- the weights of the predicted values of the two prediction modes are related to the above-mentioned costs. For example, the weights are inversely proportional to the costs.
- TIMD uses the prediction effect of the intra prediction mode on the template to screen the intra prediction mode, and can weight the two intra prediction modes according to the cost on the template.
- the advantage of TIMD is that if the TIMD mode is selected for the current block, there is no need to indicate which intra prediction mode is used in the code stream. Instead, the decoder itself derives it through the above process, which saves overhead to a certain extent.
- The DIMD and TIMD prediction modes are similar: in both, the decoder performs the same operation as the encoder to infer the prediction mode of the current coding unit.
- These prediction modes avoid transmitting the prediction mode index at an acceptable complexity, thereby saving overhead and improving compression efficiency.
- The DIMD and TIMD prediction modes work better in large areas with consistent texture characteristics. If the texture changes even slightly, or the template area cannot cover it, their prediction effect is poor.
- For both the DIMD and the TIMD prediction modes, the prediction blocks obtained with multiple traditional prediction modes are fused, that is, weighted.
- Fusing prediction blocks can produce effects that no single prediction mode can achieve.
- Because the DIMD prediction mode introduces planar mode as an additional weighted prediction mode, it increases the spatial correlation between the adjacent reconstructed samples and the predicted samples and can thereby improve intra prediction.
- However, since the prediction principle of planar mode is relatively simple, for prediction blocks whose upper-right and lower-left corners differ markedly, using planar mode as an additional weighted prediction mode may be counterproductive.
- Figure 6 is a schematic block diagram of the decoding framework 200 provided by the embodiment of the present application.
- The decoding framework 200 may include an entropy decoding unit 210, an inverse transform and inverse quantization unit 220, a residual unit 230, an intra prediction unit 240, an inter prediction unit 250, a loop filtering unit 260, and a decoded image buffer unit 270.
- After the entropy decoding unit 210 receives and parses the code stream, it obtains the prediction block and the frequency-domain residual block. For the frequency-domain residual block, the inverse transform and inverse quantization unit 220 performs inverse transformation, inverse quantization and related steps to obtain the time-domain residual block. The residual unit 230 adds the prediction block predicted by the intra prediction unit 240 or the inter prediction unit 250 to the time-domain residual block output by the inverse transform and inverse quantization unit 220 to obtain the reconstructed block. For example, the intra prediction unit 240 or the inter prediction unit 250 may obtain the prediction block by decoding the header information of the code stream.
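- A minimal sketch of the residual unit's superposition step. Clipping the sum to the valid sample range and the default 10-bit depth are assumptions; the text only states that the prediction block is added to the time-domain residual block. The function name is hypothetical.

```python
def reconstruct_block(pred, resid, bit_depth=10):
    """Add the time-domain residual to the prediction and clip to [0, 2**bit_depth - 1]."""
    max_val = (1 << bit_depth) - 1
    return [min(max(p + r, 0), max_val) for p, r in zip(pred, resid)]
```

For example, with 10-bit samples, a prediction of 1020 plus a residual of 10 is clipped to 1023.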
- Figure 7 is a schematic flow chart of the decoding method 300 provided by the embodiment of the present application. It should be understood that the decoding method 300 can be performed by a decoder. For example, it is applied to the decoding framework 200 shown in FIG. 6 . For the convenience of description, the following takes the decoder as an example.
- the decoding method 300 may include some or all of the following:
- the distortion cost of the multiple MIP modes includes the distortion cost obtained by using the multiple MIP modes to predict samples in the template area adjacent to the current block;
- The first intra prediction mode includes at least one of the following: a suboptimal MIP mode for predicting the current block determined based on the distortion costs of the multiple MIP modes, an intra prediction mode derived by the decoder-side intra mode derivation (DIMD) mode, and an intra prediction mode derived by the template-based intra mode derivation (TIMD) mode;
- S340: Predict the current block based on the optimal MIP mode and the first intra prediction mode to obtain a prediction block of the current block;
- S350: Obtain the reconstructed block of the current block based on the residual block of the current block and the prediction block of the current block.
- In this application, S320 to S340 (that is, the decoder determining the optimal MIP mode based on the distortion costs of the multiple MIP modes and predicting the current block using the optimal MIP mode and the first intra prediction mode) may also be referred to as template matching MIP (TMMIP) technology, a template-matching-based MIP prediction mode derivation method, or TMMIP fusion enhancement technology. That is, after obtaining the residual block of the current block, the decoder can enhance the prediction of the current block based on the derived optimal MIP mode and the first intra prediction mode.
- TMMIP technology can use the optimal MIP prediction mode together with at least one of the following to enhance the prediction of the current block: the suboptimal MIP prediction mode, the intra prediction mode derived by the TIMD mode, and the intra prediction mode derived by the DIMD mode.
- In this application, the decoder predicts the current block based on the optimal MIP mode and the first intra prediction mode. The optimal MIP mode is the MIP mode for predicting the current block determined from the distortion costs of multiple MIP modes, and the first intra prediction mode includes at least one of the following: a suboptimal MIP mode for predicting the current block determined from those distortion costs, the intra prediction mode derived by the DIMD mode, and the intra prediction mode derived by the TIMD mode. This spares the decoder from obtaining the MIP mode by parsing the code stream; compared with traditional MIP technology, it can effectively reduce coding-unit-level bit overhead and thereby improve decompression efficiency.
- MIP is a simplification of neural network technology and is quite different from traditional interpolation-filter prediction techniques. For some special textures, MIP predicts better than the traditional intra prediction modes, but its larger flag overhead is a weakness of MIP technology.
- In this application, the decoder independently determines the optimal MIP mode for predicting the current block and determines the intra prediction mode of the current block based on the optimal MIP mode, which can save up to 5 or 6 bits of overhead and effectively reduces coding-unit-level bit overhead, thereby improving decompression efficiency.
- The premise for saving up to 5 or 6 bits of overhead per block is that the template-matching-based prediction mode derivation is very accurate. If the MIP prediction mode computed by the fast template-matching algorithm differs from the MIP prediction mode that the encoder obtains from the original sequence through rate-distortion optimization, it is not necessarily the optimal choice. Therefore, the performance of template-matching-based MIP mode selection depends on the matching accuracy. However, the accuracy of both the template-based derivation algorithms for traditional intra prediction modes and the template-matching-based derivation algorithms for inter prediction is unsatisfactory.
- This application avoids this problem by fusing the optimal MIP mode and the first intra prediction mode, that is, by performing fusion prediction on the current block based on the optimal MIP mode and the first intra prediction mode.
- Compared with completely replacing the optimal prediction mode calculated from the rate-distortion cost with the optimal MIP mode, this takes both prediction accuracy and prediction diversity into account, thereby improving decompression performance.
- the distortion cost involved in the decoder in this application is different from the rate distortion cost (RDcost) involved in the encoder.
- The rate-distortion cost is used by the encoder to select one of multiple intra prediction techniques.
- The rate-distortion cost can be a cost value obtained by comparing the distorted image with the original image. Since the decoder cannot obtain the original image, the distortion cost used by the decoder can be a distortion cost between the reconstructed samples and the predicted samples, such as the Sum of Absolute Transformed Differences (SATD) cost between them, or another cost that measures the difference between the reconstructed samples and the predicted samples.
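- A sketch of an SATD cost between reconstructed and predicted samples, assuming both are 2-D lists whose sides are multiples of 4. Each 4x4 difference block is transformed with a 4x4 Hadamard matrix (H * D * H^T) and the absolute coefficients are summed. Real encoders usually apply an implementation-specific normalization to this sum, which is omitted here; the function names are hypothetical.

```python
H4 = [[1, 1, 1, 1],
      [1, -1, 1, -1],
      [1, 1, -1, -1],
      [1, -1, -1, 1]]


def satd_4x4(diff):
    """Sum of absolute values of the 4x4 Hadamard transform H * D * H^T of a difference block."""
    tmp = [[sum(H4[i][k] * diff[k][j] for k in range(4)) for j in range(4)] for i in range(4)]
    coef = [[sum(tmp[i][k] * H4[j][k] for k in range(4)) for j in range(4)] for i in range(4)]
    return sum(abs(c) for row in coef for c in row)


def satd(recon, pred):
    """Unnormalized SATD between two equally sized 2-D blocks whose sides are multiples of 4."""
    height, width = len(recon), len(recon[0])
    if height % 4 or width % 4:
        raise ValueError("block dimensions must be multiples of 4")
    total = 0
    for y in range(0, height, 4):
        for x in range(0, width, 4):
            diff = [[recon[y + i][x + j] - pred[y + i][x + j] for j in range(4)]
                    for i in range(4)]
            total += satd_4x4(diff)
    return total
```

Unlike SAD, SATD spreads an isolated one-sample error across all 16 transform coefficients, so it reflects the cost of coding the residual after a transform more closely.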
- Table 1 is a test result of the solution of the present application when the first intra prediction mode is designed as a suboptimal MIP mode for predicting the current block determined based on the distortion cost of the multiple MIP modes.
- Table 2 is the test result obtained by testing the solution of the present application when the first intra prediction mode is designed as the intra prediction mode derived from the TIMD mode.
- The improvement is most pronounced for Class A1.
- The two technologies have average performance improvements of 0.15 and 0.12, respectively.
- Class A1 consists of 4K test sequences, a class whose performance urgently needed improvement at the time. This result demonstrates the coding efficiency gains of the technology.
- A negative Bjøntegaard delta bit rate (BD-rate) indicates the performance improvement of the solution provided by this application over ECM2.0, on which the tests are based.
- The TIMD prediction mode integrated in ECM2.0 has higher complexity than ECM1.0 and brings only a 0.4% performance gain. This application can bring good performance gains without increasing decoder complexity, especially for 4K sequences. In addition, although the measured encoding and decoding times fluctuate slightly because of server load, in theory the decoding time will hardly increase.
- the S320 may include:
- If the first identifier indicates that the optimal MIP mode and the first intra prediction mode are allowed to be used to predict image blocks in the current sequence, the optimal MIP mode is determined based on the distortion costs of the multiple MIP modes.
- If the value of the first identifier is a first value, it indicates that the optimal MIP mode and the first intra prediction mode are allowed to be used to predict image blocks in the current sequence; if the value of the first identifier is a second value, it indicates that they are not allowed to be used to predict image blocks in the current sequence.
- In one implementation, the first value is 1 and the second value is 0. In another implementation, the first value is 0 and the second value is 1.
- The first value and the second value can also be other values, which are not limited in this application.
- If the first identifier is true, it indicates that the optimal MIP mode and the first intra prediction mode are allowed to be used to predict image blocks in the current sequence; if the first identifier is false, it indicates that they are not allowed to be used to predict image blocks in the current sequence.
- The decoder parses the block-level identifier. If the current block uses an intra prediction mode, the decoder parses or obtains the first identifier; if the first identifier is true, the decoder determines the optimal MIP mode based on the distortion costs of the multiple MIP modes.
- the first flag is recorded as sps_timd_enable_flag.
- The decoder parses or obtains sps_timd_enable_flag. If sps_timd_enable_flag is true, the decoder determines the optimal MIP mode based on the distortion costs of the multiple MIP modes.
- the first identifier is a sequence-level identifier.
- The statement that the first identifier indicates that the optimal MIP mode and the first intra prediction mode are allowed to be used to predict image blocks in the current sequence may also be replaced by a description with a similar or identical meaning.
- For example, it may alternatively be any one of the following: the first identifier indicates that the TMMIP technology is allowed to be used to determine the intra prediction mode of image blocks in the current sequence; the first identifier indicates that the TMMIP technology is allowed to be used to perform intra prediction on image blocks in the current sequence; the first identifier indicates that image blocks in the current sequence are allowed to use the TMMIP technology; or the first identifier indicates that the MIP mode determined based on the multiple MIP modes is allowed to be used to predict image blocks in the current sequence.
- the permission flag bits of other technologies can also be used to indirectly indicate whether the current sequence is allowed to use the TMMIP technology.
- For example, when the first identifier indicates that the current sequence is allowed to use TIMD technology, this means that the current sequence is also allowed to use TMMIP technology; in other words, a first identifier indicating that the current sequence is allowed to use TIMD technology means that the current sequence is allowed to use both TIMD technology and TMMIP technology, which further saves bit overhead.
- If the first identifier indicates that the optimal MIP mode and the first intra prediction mode are allowed to be used to predict image blocks in the current sequence, the decoder parses the code stream to obtain a second identifier; if the second identifier indicates that the optimal MIP mode and the first intra prediction mode are allowed to be used to predict the current block, the optimal MIP mode is determined based on the distortion costs of the multiple MIP modes.
- The decoder parses the block-level identifier. If the current block uses an intra prediction mode, the decoder parses or obtains the first identifier; if the first identifier is true, the decoder parses or obtains the second identifier; and if the second identifier is true, the decoder determines the optimal MIP mode based on the distortion costs of the multiple MIP modes.
- If the value of the second identifier is a third value, it indicates that the optimal MIP mode and the first intra prediction mode are allowed to be used to predict the current block; if the value of the second identifier is a fourth value, it indicates that they are not allowed to be used to predict the current block.
- In one implementation, the third value is 1 and the fourth value is 0.
- In another implementation, the third value is 0 and the fourth value is 1.
- the third numerical value and the fourth numerical value can also be other numerical values, which are not specifically limited in this application.
- If the second identifier is true, it indicates that the optimal MIP mode and the first intra prediction mode are allowed to be used to predict the current block; if the second identifier is false, it indicates that they are not allowed to be used to predict the current block.
- The decoder parses or obtains sps_timd_enable_flag. If sps_timd_enable_flag is true, the decoder can parse or obtain cu_timd_enable_flag; if cu_timd_enable_flag is true, the decoder determines the optimal MIP mode based on the distortion costs of the multiple MIP modes.
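- A sketch of the two-level gating described above. `sps_timd_enable_flag` and `cu_timd_enable_flag` come from the text; the callback that parses the CU-level flag stands in for a hypothetical bitstream reader, and the function name is illustrative. The point it shows is that the CU-level flag is only parsed, and so only costs bits, when the sequence-level flag is set and the block is intra coded.

```python
from typing import Callable


def tmmip_allowed(sps_timd_enable_flag: bool, is_intra: bool,
                  parse_cu_timd_enable_flag: Callable[[], int]) -> bool:
    """Decide whether the decoder runs the TMMIP derivation for the current block."""
    if not is_intra or not sps_timd_enable_flag:
        return False  # the CU-level flag is not parsed at all in this case
    return parse_cu_timd_enable_flag() == 1
```

When this returns True, the decoder goes on to compute the distortion costs of the MIP modes on the template.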
- the second identification is a block-level identification or a coding unit-level identification.
- the second identifier is used to identify that the optimal MIP mode and the first intra prediction mode are allowed to be used to predict the current block, and may also be replaced by a description with a similar or identical meaning.
- The statement that the second identifier indicates that the optimal MIP mode and the first intra prediction mode are allowed to be used to predict the current block may also be replaced by any one of the following: the second identifier indicates that the TMMIP technology is allowed to be used to determine the intra prediction mode of the current block; the second identifier indicates that the TMMIP technology is allowed to be used to perform intra prediction on the current block; the second identifier indicates that the current block is allowed to use the TMMIP technology; or the second identifier indicates that the current block is allowed to be predicted using the MIP mode determined based on the multiple MIP modes.
- whether the current block is allowed to use the TMMIP technology can also be indirectly indicated through the permission flag bits of other technologies.
- For example, when the second identifier indicates that the current block is allowed to use TIMD technology, this means that the current block is also allowed to use TMMIP technology; in other words, a second identifier indicating that the current block is allowed to use TIMD technology means that the current block is allowed to use both TIMD technology and TMMIP technology, which further saves bit overhead.
- When parsing the second identifier, the decoder may parse it either before or after parsing the residual block of the current block.
- This application does not specifically limit this.
- the method 300 may further include:
- the code stream of the current sequence is decoded based on the encoding method used in the optimal MIP mode to obtain the index of the optimal MIP mode.
- When the decoder determines the optimal MIP mode for predicting the current block based on the distortion costs of multiple MIP modes, it needs to calculate the distortion cost of each of the multiple MIP modes and rank the MIP modes by their distortion costs; the MIP mode with the smallest cost is the optimal prediction result.
- the index of the MIP mode is usually binarized using truncated binary.
- This binarization is close to equal-probability coding: it divides all prediction modes into two segments, one represented by N-bit codewords and the other by (N+1)-bit codewords.
- The decoder may first calculate the distortion cost of each of the multiple MIP modes and sort the multiple MIP modes by these costs.
- Based on this ordering, the decoder can choose a more flexible variable-length coding method together with an equal-probability coding method.
- by flexibly setting the encoding method of the MIP mode it is helpful to save the bit overhead of the index of the MIP mode.
- the codeword length of the encoding method used by the first n MIP modes in the arrangement order is smaller than the codeword length of the encoding method used by the MIP modes after the n-th MIP mode in the arrangement order; and/or, the first n MIP modes use variable-length coding and the MIP modes after the n-th MIP mode use truncated binary coding.
- n can be any value greater than or equal to 1.
- the arrangement order is the order obtained by the decoder sorting the multiple MIP modes by distortion cost from smallest to largest; the codeword length of the encoding method used by the first n MIP modes in the arrangement order is smaller than that used by the MIP modes after the n-th MIP mode; and/or, the first n MIP modes use variable-length coding and the MIP modes after the n-th MIP mode use truncated binary coding.
- designing the codeword length used by the first n MIP modes in the arrangement order to be smaller than that used by the MIP modes after the n-th MIP mode, and/or designing the first n MIP modes to use variable-length coding and the later MIP modes to use truncated binary coding, amounts to giving the MIP modes that the encoder is likely to use shorter codewords or variable-length codes; this saves bit overhead for the MIP mode index and improves decompression performance.
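The sketch below illustrates one possible hybrid scheme, assuming both sides rank the MIP modes by ascending template cost and the transmitted symbol is the rank of the chosen mode. The specific code (a unary code for the first n ranks, then an escape prefix followed by truncated binary for the rest) and the helper names are hypothetical; the text only requires shorter or variable-length codewords for the first n modes.

```python
from __future__ import annotations


def truncated_binary(value: int, n: int) -> str:
    """Truncated binary codeword for a symbol value in [0, n)."""
    if n == 1:
        return ""
    k = n.bit_length() - 1
    u = (1 << (k + 1)) - n
    if value < u:
        return format(value, "b").zfill(k)
    return format(value + u, "b").zfill(k + 1)


def rank_modes(costs: dict[int, int]) -> list[int]:
    """MIP mode indices sorted by ascending template cost (ties: smaller index first)."""
    return sorted(costs, key=lambda mode: (costs[mode], mode))


def hybrid_codeword(rank: int, num_modes: int, n: int) -> str:
    """Hypothetical prefix-free code: unary for ranks < n, escape + truncated binary after."""
    if not 1 <= n < num_modes:
        raise ValueError("need 1 <= n < num_modes")
    if rank < n:
        return "1" * rank + "0"
    return "1" * n + truncated_binary(rank - n, num_modes - n)
```

With six modes and n = 2, the best-ranked mode costs one bit, whereas a flat truncated binary code over six modes would spend two or three bits on every mode.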
- the method 300 may further include:
- the decoder determines, based on the distortion cost of the optimal MIP mode and the distortion cost of the suboptimal MIP mode, whether to use the suboptimal MIP mode to predict the current block. If it determines not to use the suboptimal MIP mode, the decoder can directly predict the current block based on the optimal MIP mode; if it determines to use the suboptimal MIP mode, the decoder can predict the current block based on both the optimal MIP mode and the suboptimal MIP mode to obtain a prediction block of the current block.
- the decoder can directly predict the current block based on the optimal MIP mode to obtain the prediction block of the current block.
- if the first intra prediction mode is the suboptimal MIP mode and the ratio between the distortion cost of the suboptimal MIP mode and the distortion cost of the optimal MIP mode is greater than or equal to a preset ratio, the decoder can directly predict the current block based on the optimal MIP mode to obtain a prediction block of the current block.
- for example, if the distortion cost of the suboptimal MIP mode is greater than or equal to a certain multiple (for example, twice) of the distortion cost of the optimal MIP mode, the suboptimal MIP mode can be regarded as having large distortion and being unsuitable for the current block; that is, fusion enhancement is not needed, and only the optimal MIP mode is used to predict the current block.
- the decoder determines whether to use the suboptimal MIP mode to predict the current block based on the distortion costs of the optimal and suboptimal MIP modes. This is equivalent to the decoder deciding, from those two costs, whether to use the suboptimal MIP mode to enhance the optimal MIP mode, so the code stream does not need to carry an identifier indicating whether such enhancement is used. This saves bit overhead and thereby improves decompression performance.
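A minimal sketch of the flag-free decision, assuming integer template costs and the example multiple of two. Because the encoder and decoder compute the same costs, they reach the same decision without any signaled bit. The function names are hypothetical.

```python
def best_two(costs: dict) -> tuple:
    """Return the (optimal, suboptimal) MIP mode keys ordered by ascending cost."""
    ranked = sorted(costs, key=lambda mode: (costs[mode], mode))
    return ranked[0], ranked[1]


def use_suboptimal_mip(cost_best: int, cost_second: int, multiple: int = 2) -> bool:
    """Fuse only if the suboptimal cost is below `multiple` times the optimal cost."""
    # Integer comparison avoids dividing by a possibly zero optimal cost.
    return cost_second < multiple * cost_best
```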
- the S340 may include:
- the decoder predicts the current block based on the optimal MIP mode to obtain a first prediction block, and predicts the current block based on the first intra prediction mode to obtain a second prediction block; then, the decoder performs weighting processing on the first prediction block and the second prediction block based on the weight of the optimal MIP mode and the weight of the first intra prediction mode to obtain a prediction block of the current block.
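One way to implement the weighting step in integer arithmetic is sketched below. The 6-bit weight precision (weights summing to 64) and the rounding offset are assumptions chosen for illustration, not values taken from the text.

```python
def weighted_blend(pred1, pred2, w1, shift=6):
    """Blend two equally sized prediction blocks with weights w1 and 2**shift - w1."""
    if not 0 <= w1 <= (1 << shift):
        raise ValueError("w1 out of range")
    w2 = (1 << shift) - w1
    offset = 1 << (shift - 1)  # rounding offset before the right shift
    return [
        [(w1 * a + w2 * b + offset) >> shift for a, b in zip(row1, row2)]
        for row1, row2 in zip(pred1, pred2)
    ]
```

For example, w1 = 48 gives the first prediction block three quarters of the weight.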
- the decoder may directly perform intra prediction on the current block based on the optimal MIP mode to obtain the first prediction block.
- the decoder can directly derive the optimal prediction mode and the suboptimal prediction mode through the TIMD mode and use them to predict the current block to obtain the second prediction block.
- if the distortion costs of the optimal prediction mode and the suboptimal prediction mode indicate that a prediction block fusion operation is required, the decoder can first perform intra prediction on the current block according to the optimal prediction mode to obtain the optimal prediction block; second, perform intra prediction on the current block according to the suboptimal prediction mode to obtain the suboptimal prediction block; and then use the ratio between the distortion cost of the optimal prediction mode and the distortion cost of the suboptimal prediction mode to calculate the weight of the optimal prediction block and the weight of the suboptimal prediction block.
- the optimal prediction block and the suboptimal prediction block are weighted and fused to obtain the second prediction block.
- if the optimal prediction mode or the suboptimal prediction mode is the planar mode or the DC mode, or if the distortion cost of the suboptimal prediction mode is greater than twice the distortion cost of the optimal prediction mode, no prediction block fusion operation is needed; that is, the optimal prediction block obtained from the optimal prediction mode alone may be used directly as the second prediction block.
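The TIMD fusion rule above can be sketched as follows. The inverse-cost weights w1 = cost2 / (cost1 + cost2) and w2 = cost1 / (cost1 + cost2) are an assumption based on our understanding of the TIMD tool in the ECM reference software; the text only states that the weights are derived from the ratio of the two costs. Planar and DC use the VVC mode numbers 0 and 1.

```python
PLANAR_IDX, DC_IDX = 0, 1  # VVC intra mode numbers for planar and DC


def timd_fusion_weights(mode1, cost1, mode2, cost2):
    """Return [(mode, weight), ...] describing how the second prediction block is formed."""
    no_fusion = (
        mode1 in (PLANAR_IDX, DC_IDX)
        or mode2 in (PLANAR_IDX, DC_IDX)
        or cost2 > 2 * cost1
    )
    if no_fusion:
        return [(mode1, 1.0)]  # use the optimal prediction block alone
    total = cost1 + cost2
    if total == 0:
        return [(mode1, 0.5), (mode2, 0.5)]
    # lower cost -> larger weight (assumed inverse-cost weighting)
    return [(mode1, cost2 / total), (mode2, cost1 / total)]
```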
- after obtaining the first prediction block and the second prediction block, the decoder performs weighting processing on them to obtain the prediction block of the current block.
- the S340 may include:
- if the first intra prediction mode includes the suboptimal MIP mode or an intra prediction mode derived from the TIMD mode, the decoder determines the weight of the optimal MIP mode and the weight of the first intra prediction mode based on the distortion cost of the optimal MIP mode and the distortion cost of the first intra prediction mode; if the first intra prediction mode includes an intra prediction mode derived from the DIMD mode, the decoder determines that the weight of the optimal MIP mode and the weight of the first intra prediction mode are both preset values.
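A sketch of the weight selection, assuming the same inverse-cost weighting as TIMD for the cost-based branch and an equal split as the preset for the DIMD branch. Both choices, and the string labels for the source of the first intra prediction mode, are hypothetical.

```python
def fusion_weights(cost_mip, cost_other, other_source, preset=(0.5, 0.5)):
    """Weights (w_mip, w_other) for the optimal MIP mode and the first intra mode."""
    if other_source == "DIMD":
        return preset  # DIMD has no template cost to compare, so use preset weights
    if other_source not in ("SUBOPTIMAL_MIP", "TIMD"):
        raise ValueError("unknown source of the first intra prediction mode")
    total = cost_mip + cost_other
    if total == 0:
        return (0.5, 0.5)
    # the mode with the smaller distortion cost receives the larger weight
    return (cost_other / total, cost_mip / total)
```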
- the S320 may include:
- the decoder predicts the samples in the template area based on the third identifier and the multiple MIP modes, and obtains the distortion costs of the multiple MIP modes in each state of the third identifier, where the third identifier is used to identify whether to transpose the input vector and the output vector of the MIP mode; the decoder then determines the optimal MIP mode based on the distortion costs of the multiple MIP modes in each state of the third identifier.
- MIP has more bit overhead than other intra prediction tools. It requires not only a flag to indicate whether MIP is used, but also a flag to indicate whether transposition is used; finally, the last and most expensive part, the MIP prediction mode, is represented with truncated binary coding. MIP is a simplified technique derived from neural networks and differs considerably from traditional interpolation-filter-based prediction. For some special textures MIP predicts better than the traditional intra prediction modes, but its larger signaling overhead is a drawback of the MIP technology.
- by traversing each state of the third identifier, this application takes the transposition function of the MIP mode into account while saving the cost of one MIP transposition identifier, thereby improving decompression efficiency.
- the decoder may loop over the states of the third identifier in the outer loop and over the multiple MIP modes in the inner loop, determine the distortion costs of the multiple MIP modes in each state of the third identifier, and determine the optimal MIP mode based on those distortion costs; or, the decoder may loop over the multiple MIP modes in the outer loop and over the states of the third identifier in the inner loop, determine the same distortion costs, and determine the optimal MIP mode based on them. That is to say, the decoding end may traverse the multiple MIP modes first, or may traverse the states of the third identifier first.
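The sketch below shows that the traversal order does not matter: it evaluates every (MIP mode, third-identifier state) pair in either loop order and keeps the minimum. The cost function is a stand-in for the template distortion cost, and the tie-break rule (prefer the untransposed state, then the smaller mode index) is our assumption, added so both orders return the same pair.

```python
from itertools import product


def search_optimal_mip(modes, cost_fn, modes_outer=False):
    """Return ((mode, transposed), costs) minimizing cost_fn over all pairs.

    `modes` must be re-iterable (for example a list or range).
    """
    states = (False, True)  # third identifier: not transposed / transposed
    if modes_outer:
        pairs = product(modes, states)
    else:
        pairs = ((mode, t) for t in states for mode in modes)
    costs = {(mode, t): cost_fn(mode, t) for mode, t in pairs}
    # deterministic tie-break so the loop order cannot change the result
    best = min(costs, key=lambda key: (costs[key], key[1], key[0]))
    return best, costs
```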
- if the value of the third identifier is a fifth numerical value, it identifies that the input vector and the output vector of the MIP mode are transposed; if the value of the third identifier is a sixth numerical value, it identifies that the input vector and the output vector of the MIP mode are not transposed.
- each state of the third identifier can also be replaced by each value of the third identifier.
- the fifth value is 1 and the sixth value is 0.
- the fifth value is 0 and the sixth value is 1.
- the fifth numerical value and the sixth numerical value can also be other numerical values, which are not limited in this application.
- if the third identifier is true, it identifies that the input vector and the output vector of the MIP mode are transposed; if the third identifier is false, it identifies that they are not transposed. In this case, true and false are the states of the third identifier.
- the third identifier is a sequence-level identifier, a block-level identifier, or a coding-unit-level identifier.
- the third identifier may also be called transposition information, a transposition identifier, or a MIP transposition flag bit.
- the statement that the third identifier is used to identify whether to transpose the input vector and the output vector of the MIP mode can also be replaced with a description of similar or identical meaning.
- for example, the third identifier may be described as identifying whether the input and output of the MIP mode need to be transposed, as identifying whether the input vector and the output vector of the MIP mode are transposed, or as indicating whether to transpose.
- the S320 may include:
- if the size of the current block is a preset size, the decoder determines the optimal MIP mode based on the distortion costs of the multiple MIP modes.
- the preset size may include a size whose width is the preset width and whose height is the preset height. That is to say, if the width of the current block is the preset width and the height is the preset height, the decoder determines the optimal MIP mode based on the distortion costs of the multiple MIP modes.
- the preset size can be realized by pre-saving corresponding codes, tables or other methods that can be used to indicate relevant information in the device (for example, including a decoder and an encoder).
- This application does not limit its specific implementation.
- the preset size may refer to a size defined in a protocol.
- the "protocol" may refer to a standard protocol in the field of coding and decoding technology, which may include, for example, the VVC or ECM protocols and other related protocols.
- the decoder may also use other methods to decide, based on the preset size, whether to determine the optimal MIP mode from the distortion costs of the multiple MIP modes; this is not limited in this application.
- for example, the decoder may decide whether to determine the optimal MIP mode from the distortion costs of the multiple MIP modes based solely on the width or the height of the current block. In one implementation, if the width of the current block is the preset width or its height is the preset height, the decoder determines the optimal MIP mode based on the distortion costs of the multiple MIP modes. As another example, the decoder may make this decision by comparing the size of the current block with the preset size. In one implementation, if the size of the current block is larger or smaller than the preset size, the decoder determines the optimal MIP mode based on the distortion costs of the multiple MIP modes.
- in one implementation, if the width of the current block is greater than or less than a preset width, the decoder determines the optimal MIP mode based on the distortion costs of the multiple MIP modes. In another implementation, if the height of the current block is greater than or less than a preset height, the decoder determines the optimal MIP mode based on the distortion costs of the multiple MIP modes.
- the S320 may include:
- if the image frame in which the current block is located is an I frame and the size of the current block is the preset size, the decoder determines the optimal MIP mode based on the distortion costs of the multiple MIP modes. That is to say, only when the image frame in which the current block is located is an I frame does the decoder decide, based on the size of the current block, whether to determine the optimal MIP mode from the distortion costs of the multiple MIP modes.
- the S320 may include:
- if the image frame in which the current block is located is a B frame, the decoder determines the optimal MIP mode based on the distortion costs of the multiple MIP modes.
- the decoder may directly determine the optimal MIP mode based on the distortion costs of the multiple MIP modes. That is to say, when the image frame in which the current block is located is a B frame, the decoder can directly determine the optimal MIP mode based on the distortion costs of the multiple MIP modes, regardless of the size of the current block.
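A hedged sketch of the gating logic described above. The preset sizes are hypothetical placeholders because the text does not give values, and applying the size rule to frame types other than I and B is our assumption.

```python
PRESET_SIZES = frozenset({(8, 8), (16, 16)})  # hypothetical (width, height) values


def derive_mip_by_template(frame_type: str, width: int, height: int) -> bool:
    """Whether the optimal MIP mode is derived from template distortion costs."""
    if frame_type == "B":
        return True  # B frames: regardless of the block size
    # I frames (and, as an assumption here, any other frame type): size check
    return (width, height) in PRESET_SIZES
```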
- the method 300 may further include:
- the decoder obtains the MIP modes used by the adjacent blocks of the current block and determines these MIP modes as the multiple MIP modes.
- the adjacent block may be an image block adjacent to at least one of the upper side, the left side, the lower left, the upper right, and the upper left of the current block.
- the decoder may determine image blocks acquired in the order of upper, left, lower left, upper right, and upper left of the current block as the adjacent blocks.
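A minimal sketch of building the candidate list from neighboring blocks in the scan order above. The position names and the removal of duplicate modes are assumptions for illustration.

```python
NEIGHBOR_ORDER = ("above", "left", "below_left", "above_right", "above_left")


def candidate_mip_modes(neighbor_modes: dict) -> list:
    """Collect MIP modes of adjacent blocks in a fixed scan order, skipping gaps and repeats."""
    modes = []
    for position in NEIGHBOR_ORDER:
        mode = neighbor_modes.get(position)  # None: block missing or not MIP-coded
        if mode is not None and mode not in modes:
            modes.append(mode)
    return modes
```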
- the multiple MIP modes can be used to construct the available MIP modes, or an available MIP mode list, that the decoder uses for predicting the current block, so that the decoder determines the optimal MIP mode by predicting the samples within the template area with the MIP modes in that list.
- the method 300 may further include:
- the decoder fills the adjacent reference area outside the template area with reconstructed samples to obtain the reference row and the reference column of the template area; the decoder takes the reference row and the reference column as input and uses each of the multiple MIP modes to predict the samples in the template area, obtaining multiple prediction blocks corresponding to the multiple MIP modes; the decoder then determines the distortion costs of the multiple MIP modes based on the multiple prediction blocks and the reconstruction block in the template area.
- the decoder fills in the reference reconstructed samples required for template prediction.
- for example, the width of the region of the reference area adjacent to the upper side of the template area is equal to the width of the template area, and the height of the region of the reference area adjacent to the left side of the template area is equal to the height of the template area. If the width of the region adjacent to the upper side of the template area is greater than the width of the template area, the decoder can downsample or reduce the dimension of that region to obtain the reference row; if the height of the region adjacent to the left side of the template area is greater than the height of the template area, the decoder can downsample or reduce the dimension of that region to obtain the reference column.
- the template area may be a template area used in the TIMD mode mentioned above, and the reference area may be a reference template (Reference of template) used in the TIMD mode.
- the decoder fills the reference area so formed with reconstructed samples, downsamples or reduces the dimension of the filled reference area to obtain the reference row and the reference column, and then constructs the input vector of the MIP mode from the reference row and the reference column.
- after obtaining the reference row and the reference column, the decoder uses them as input and predicts the samples in the template area with each of the multiple MIP modes, obtaining multiple prediction blocks corresponding to the multiple MIP modes; that is to say, based on the reconstructed samples in the reference template of the current block, the decoder traverses the multiple MIP modes to predict the samples within the template area of the current block.
- the decoder takes the reference row, the reference column, the index of the currently traversed MIP mode, and the third identifier mentioned above as inputs to obtain the prediction block corresponding to the currently traversed MIP mode.
- the reference row and the reference column are used to construct the input vector of the currently traversed MIP mode; the index of the currently traversed MIP mode is used to determine its matrix and/or offset vector;
- the third identifier is used to identify whether to transpose the input vector and the output vector of the MIP mode. For example, if the third identifier identifies that they are not transposed, the reference column is spliced after the reference row to form the input vector of the currently traversed MIP mode; if the third identifier identifies that they are transposed, the reference row is spliced after the reference column.
- in the transposed case, the decoder transposes the output of the currently traversed MIP mode to obtain the prediction block of the template area.
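A toy sketch of one traversal step, showing how the third identifier changes both the splicing order of the input vector and the final transposition. The matrix and offset are hypothetical inputs, and real MIP additionally normalizes the input and applies fixed-point shifts, which this sketch omits.

```python
def transpose(block):
    return [list(row) for row in zip(*block)]


def mip_predict(ref_row, ref_col, matrix, offset, out_h, out_w, transposed):
    """Toy MIP step: build the input, apply matrix + offset, reshape, undo transposition."""
    # not transposed: row then column; transposed: column then row
    x = list(ref_col) + list(ref_row) if transposed else list(ref_row) + list(ref_col)
    y = [sum(a * xi for a, xi in zip(coeffs, x)) + o for coeffs, o in zip(matrix, offset)]
    pred = [y[r * out_w:(r + 1) * out_w] for r in range(out_h)]  # row-major reshape
    return transpose(pred) if transposed else pred
```

With an identity matrix the output simply reproduces the spliced input, which makes the effect of the transposition easy to see.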
- based on the distortion costs between the multiple prediction blocks and the reconstructed samples in the template area, the decoder can select, according to the minimum-distortion-cost principle, the MIP mode with the smallest cost as the optimal MIP mode of the current block's template-matching-based MIP mode.
- when the decoder uses the multiple MIP modes to predict samples in the template area, it first downsamples the reference row and the reference column to obtain the input vector; then, taking the input vector as input, it predicts the samples in the template area by traversing the multiple MIP modes to obtain their output vectors; finally, it upsamples the output vectors of the multiple MIP modes to obtain the prediction blocks corresponding to the multiple MIP modes.
- the reference row and the reference column need to satisfy the input conditions of the multiple MIP modes. If they do not, the reference row and/or the reference column can first be processed into input samples that satisfy those input conditions, and the input vectors of the multiple MIP modes are then determined from these input samples. For example, if the input condition is a specified number of input samples and the reference row and the reference column do not match that number, the decoder can reduce the reference row and/or the reference column to the specified number of input samples by methods such as Haar downsampling, and determine the input vectors of the multiple MIP modes from the reduced samples.
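The dimension reduction can be sketched as rounded averaging over groups of neighboring samples, similar in spirit to the boundary averaging in VVC MIP. The function name is ours, and restricting group sizes to powers of two is an assumption that keeps the division a shift.

```python
def haar_downsample(samples, target):
    """Reduce len(samples) to `target` by rounded averaging of power-of-two groups."""
    n = len(samples)
    if target <= 0 or n % target:
        raise ValueError("len(samples) must be a multiple of target")
    factor = n // target
    if factor == 1:
        return list(samples)
    if factor & (factor - 1):
        raise ValueError("group size must be a power of two")
    shift = factor.bit_length() - 1  # log2(factor)
    rounding = 1 << (shift - 1)
    return [
        (sum(samples[i * factor:(i + 1) * factor]) + rounding) >> shift
        for i in range(target)
    ]
```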
- the S320 may include:
- the decoder determines the optimal MIP mode based on the sum of absolute transformed differences (SATD) of the multiple MIP modes on the template area.
- designing the distortion costs of the multiple MIP modes as the SATD of the multiple MIP modes on the template area, rather than directly calculating their rate-distortion costs, still allows the optimal MIP mode to be determined from the distortion costs on the template area while reducing the computational complexity of those distortion costs, thereby improving the decompression performance of the decoder.
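A minimal SATD sketch using the 4x4 Hadamard transform on each 4x4 sub-block of the difference between a template prediction and the template reconstruction. It is unnormalized (some encoders scale the result), and it assumes each rectangular part of the template has sides that are multiples of 4.

```python
# 4x4 Hadamard matrix; it is symmetric, so H @ D @ H equals H @ D @ H^T
H4 = (
    (1, 1, 1, 1),
    (1, -1, 1, -1),
    (1, 1, -1, -1),
    (1, -1, -1, 1),
)


def _matmul4(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def satd(pred, rec):
    """Unnormalized 4x4-Hadamard SATD between two blocks whose sides are multiples of 4."""
    height, width = len(pred), len(pred[0])
    if height % 4 or width % 4:
        raise ValueError("block sides must be multiples of 4")
    total = 0
    for y in range(0, height, 4):
        for x in range(0, width, 4):
            diff = [[pred[y + i][x + j] - rec[y + i][x + j] for j in range(4)] for i in range(4)]
            coeffs = _matmul4(_matmul4(H4, diff), H4)  # 2-D Hadamard transform
            total += sum(abs(c) for row in coeffs for c in row)
    return total
```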
- the solution provided by this application proposes the idea of fusion enhancement based on the optimal MIP mode, that is, the decoder not only needs to determine the optimal MIP mode for predicting the current block, but also needs to fuse another prediction block to achieve different prediction effects. This not only saves bit overhead, but also creates a new prediction technology.
- fusion is needed because the optimal MIP mode derived in this way cannot completely replace the optimal prediction mode that the encoding end selects based on rate-distortion cost.
- a fusion approach is adopted to balance prediction accuracy with prediction diversity.
- the main idea of the decoder's template-matching-based MIP mode derivation method can be divided into the following parts:
- the reconstructed samples in the reference area are filled, that is, the reference reconstructed samples required for predicting the samples in the template area (such as the template shown in Figure 5).
- the width and height of the reference area do not need to exceed the width and height of the template area. If the reference area is filled with samples that exceed the width and height of the template area, downsampling or other methods need to be used to reduce the dimension to meet the MIP input dimension requirements.
- the decoder uses the reference reconstructed samples in the reference area, the indexes of the multiple MIP modes, and the MIP transposition flag bit as inputs to predict the samples in the template area, obtaining the prediction blocks corresponding to the multiple MIP modes.
- the reference reconstructed samples in the reference area need to meet the MIP input conditions, for example by being reduced to a specified number of input samples through Haar downsampling or similar methods.
- the indexes of the multiple MIP modes are used to determine the matrix index of the MIP technology, and then obtain the MIP prediction matrix coefficients.
- the MIP transposition flag bit is used to identify whether input and output need to be transposed.
- the decoder uses the optimal MIP mode and the first intra prediction mode to predict the current block, obtaining the first prediction block and the second prediction block, and, using the weights of the optimal MIP mode and the first intra prediction mode, performs a weighted calculation on the first prediction block and the second prediction block to obtain the prediction block of the current block.
- the decoding method according to the embodiment of the present application is described in detail from the perspective of the decoder above.
- the encoding method according to the embodiment of the present application will be described from the perspective of the encoder with reference to FIG. 6 below.
- Figure 8 is a schematic flow chart of the encoding method 400 provided by the embodiment of the present application. It should be understood that the encoding method 400 can be performed by an encoder. For example, it is applied to the coding framework 100 shown in FIG. 1 . For ease of description, the following uses an encoder as an example.
- the encoding method 400 may include:
- the distortion cost of the multiple MIP modes includes the distortion cost obtained by using the multiple MIP modes to predict samples in the template area adjacent to the current block;
- the first intra prediction mode includes at least one of the following: a suboptimal MIP mode for predicting the current block determined based on the distortion costs of the multiple MIP modes, an intra prediction mode derived from the decoder-side intra mode derivation (DIMD) mode, and an intra prediction mode derived from the template-based intra mode derivation (TIMD) mode;
- S430 Predict the current block based on the optimal MIP mode and the first intra prediction mode to obtain a prediction block of the current block;
- S450 Encode the residual block of the current block to obtain the code stream of the current sequence.
- the S410 may include:
- if the first identifier identifies that the optimal MIP mode and the first intra prediction mode are allowed to be used to predict image blocks in the current sequence, the optimal MIP mode is determined based on the distortion costs of the multiple MIP modes;
- the S450 may include:
- the S430 may include:
- if the first identifier identifies that the optimal MIP mode and the first intra prediction mode are allowed to be used to predict image blocks in the current sequence, the current block is predicted based on the optimal MIP mode and the first intra prediction mode to obtain the first rate-distortion cost;
- the prediction block obtained by predicting the current block based on the optimal MIP mode and the first intra prediction mode is determined as the prediction block of the current block;
- the S450 may include:
- if the first rate-distortion cost is less than or equal to the minimum value of the at least one rate-distortion cost, the second identifier identifies that the optimal MIP mode and the first intra prediction mode are used to predict the current block; if the first rate-distortion cost is greater than the minimum value of the at least one rate-distortion cost, the second identifier identifies that the optimal MIP mode and the first intra prediction mode are not used to predict the current block.
- the S450 may include:
- the code stream is obtained by encoding the residual block of the current block and encoding the index of the optimal MIP mode based on the encoding method used in the optimal MIP mode.
- the codeword length of the encoding method used by the first n MIP modes in the arrangement order is smaller than the codeword length of the encoding method used by the MIP modes after the n-th MIP mode in the arrangement order; and/or, the first n MIP modes use variable-length coding and the MIP modes after the n-th MIP mode use truncated binary coding.
- the S430 may include:
- the first prediction block and the second prediction block are weighted to obtain a prediction block of the current block.
- the S430 may include:
- if the first intra prediction mode includes the suboptimal MIP mode or an intra prediction mode derived from the TIMD mode, the weight of the optimal MIP mode and the weight of the first intra prediction mode are determined based on the distortion cost of the optimal MIP mode and the distortion cost of the first intra prediction mode;
- if the first intra prediction mode includes an intra prediction mode derived from the DIMD mode, the weight of the optimal MIP mode and the weight of the first intra prediction mode are both determined to be preset values.
- the S410 may include:
- the optimal MIP mode is determined based on the distortion costs of the multiple MIP modes in each state of the third identifier.
- the S410 may include:
- the optimal MIP mode is determined based on the distortion costs of the multiple MIP modes.
- the S410 may include:
- the optimal MIP mode is determined based on the distortion costs of the multiple MIP modes.
- the S410 may include:
- the optimal MIP mode is determined based on the distortion costs of the multiple MIP modes.
- the method 400 may further include:
- the MIP modes used by the adjacent blocks are determined as the multiple MIP modes.
- the method 400 may further include:
- Distortion costs for the plurality of MIP modes are determined based on the plurality of prediction blocks and reconstruction blocks within the template region.
- when the multiple MIP modes are used to predict samples in the template area, the reference row and the reference column are first downsampled to obtain an input vector; then, taking the input vector as input, the samples in the template area are predicted by traversing the multiple MIP modes to obtain their output vectors; finally, the output vectors of the multiple MIP modes are upsampled to obtain the prediction blocks corresponding to the multiple MIP modes.
- the S410 may include:
- the optimal MIP mode is determined based on the sum of absolute transformed differences (SATD) of the multiple MIP modes on the template area.
- the encoding method can be understood as the reverse process of the decoding method. Therefore, for the specific solution of the encoding method 400, please refer to the relevant content of the decoding method 300. For the convenience of description, this application will not repeat it again.
- in the following example, the first intra prediction mode mentioned above is the suboptimal MIP mode; that is, the encoder or decoder performs intra prediction on the current block based on the optimal MIP mode and the suboptimal MIP mode to obtain the prediction block of the current block.
- the encoder traverses the prediction modes. If the current block uses an intra mode, the encoder obtains a sequence-level enable flag, for example sps_tmmip_enable_flag, which indicates whether the current sequence is allowed to use the template-matching-based MIP mode derivation (TMMIP) technology. If the TMMIP enable flag is true, the current encoder is allowed to use the TMMIP technology.
- the encoder process can be implemented as the following process:
- step 1: if sps_tmmip_enable_flag is true, the encoder tries the TMMIP technology, that is, it performs step 2; if sps_tmmip_enable_flag is false, the encoder does not try the TMMIP technology, that is, it skips step 2 and proceeds directly to step 3.
- step 2: the encoder fills the adjacent rows and columns outside the template area with reconstructed samples.
- the filling process is the same as the filling method in the original intra prediction process. For example, the encoder can traverse and fill from the lower-left corner to the upper-right corner. If all reconstructed samples are available, they are filled in sequence; if none of the reconstructed samples are available, all positions are filled with the mean value; if only some of the reconstructed samples are available, the available reconstructed samples are filled in first, and for the remaining unavailable positions the encoder traverses in the same order from the lower-left corner to the upper-right corner until the first available reconstructed sample appears, then fills the previously unavailable positions with that first available reconstructed sample.
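The filling rule can be sketched on a one-dimensional list ordered from the lower-left corner to the upper-right corner, with None marking unavailable samples. Two details are assumptions: the all-unavailable default is taken as 1 << (bit_depth - 1), the mid-range value VVC uses, where the text says a mean value; and unavailable positions after an available sample copy the preceding sample, as in VVC, which the text does not spell out.

```python
def fill_reference(samples, bit_depth=10):
    """Fill unavailable (None) samples, scanning from the lower-left to the upper-right corner."""
    if all(s is None for s in samples):
        return [1 << (bit_depth - 1)] * len(samples)  # assumed mid-range default
    # positions before the first available sample reuse that first available sample
    prev = next(s for s in samples if s is not None)
    filled = []
    for s in samples:
        if s is None:
            filled.append(prev)  # copy the nearest preceding value in scan order
        else:
            filled.append(s)
            prev = s
    return filled
```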
- the encoder takes the filled reconstructed samples outside the template area as input and uses the allowed MIP modes to predict the samples within the template area.
- MIP modes are allowed for the 4x4 block size.
- the number of allowed MIP modes is 8.
- the number of allowed MIP modes for blocks of other sizes is 6.
- blocks of any size can use the MIP transposition function, and the prediction process of the TMMIP mentioned above is the same as that of the MIP technology.
- the specific prediction calculation process is as follows. The encoder first performs Haar downsampling on the reconstructed samples, for example with a downsampling step size determined by the block size. The encoder then uses the transposition information to set the concatenation order of the downsampled top samples and the downsampled left samples. If no transposition is required, the downsampled left samples are appended after the downsampled top samples, and the resulting vector is used as the input.
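The boundary reduction and concatenation can be sketched as follows. Haar downsampling is modelled as rounded averaging of non-overlapping groups of samples, which is how VVC reduces the MIP boundary. Side lengths are assumed to be powers of two. For the transposed case, the sketch assumes the order is simply swapped (left first), as in VVC. The input offset subtraction of the real MIP process is omitted.

```python
from typing import List


def downsample(ref: List[int], boundary_size: int) -> List[int]:
    """Average non-overlapping groups of samples down to boundary_size values."""
    step = len(ref) // boundary_size  # downsampling step set by the block size
    if step <= 1:
        return list(ref)
    shift = step.bit_length() - 1     # log2(step) for power-of-two sizes
    rnd = 1 << (shift - 1)
    return [(sum(ref[i * step:(i + 1) * step]) + rnd) >> shift
            for i in range(boundary_size)]


def mip_input(top: List[int], left: List[int], boundary_size: int,
              transposed: bool) -> List[int]:
    """Concatenate the reduced boundaries; transposition swaps the order."""
    red_top = downsample(top, boundary_size)
    red_left = downsample(left, boundary_size)
    return red_left + red_top if transposed else red_top + red_left


print(mip_input([10, 12, 20, 22, 30, 32, 40, 42], [1, 3, 5, 7], 4, False))
# [11, 21, 31, 41, 1, 3, 5, 7]
```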
- the encoder uses the traversed prediction mode as an index to obtain the MIP matrix coefficients, then computes the output vector from these coefficients and the input vector. Finally, the encoder upsamples the output vector according to its number of samples and the current template size. If no upsampling is required, the vector is filled row by row in the horizontal direction and output as the template prediction block. If upsampling is required, the output is upsampled first horizontally and then vertically to the template size, and is then output as the prediction block of the template area.
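The matrix step and the two-pass upsampling can be sketched as follows. The `matrix` argument is a hypothetical placeholder; real MIP coefficients come from fixed tables indexed by size class and mode. The sketch also omits the offset handling and clipping of the real process. Upsampling interpolates linearly: it works along rows first, anchored on the left reference column, and then along columns, anchored on the top reference row. This follows the VVC approach. Output transposition is applied before upsampling.

```python
from typing import List


def upsample_1d(anchor: int, values: List[int], factor: int) -> List[int]:
    """Place each value every `factor` samples and interpolate from the previous one."""
    out: List[int] = []
    prev = anchor
    for v in values:
        for k in range(1, factor + 1):
            # Rounded linear mix of the previous sample and the current value.
            out.append((prev * (factor - k) + v * k + factor // 2) // factor)
        prev = v
    return out


def mip_predict(x: List[int], matrix: List[List[int]], pred_size: int, transposed: bool,
                top_ref: List[int], left_ref: List[int], width: int, height: int,
                shift: int = 6) -> List[List[int]]:
    # 1. Reduced prediction: matrix-vector product with rounding.
    rnd = 1 << (shift - 1)
    y = [(sum(a * b for a, b in zip(row, x)) + rnd) >> shift for row in matrix]
    red = [y[r * pred_size:(r + 1) * pred_size] for r in range(pred_size)]
    if transposed:  # output transposition
        red = [list(col) for col in zip(*red)]
    fx, fy = width // pred_size, height // pred_size
    # 2. Horizontal upsampling, anchored on the left reference sample of each row.
    rows = [upsample_1d(left_ref[(r + 1) * fy - 1], red[r], fx) for r in range(pred_size)]
    # 3. Vertical upsampling, anchored on the top reference sample of each column.
    cols = [upsample_1d(top_ref[c], [rows[r][c] for r in range(pred_size)], fy)
            for c in range(width)]
    return [[cols[c][r] for c in range(width)] for r in range(height)]
```

When `width` and `height` equal `pred_size`, both factors are 1. The reduced block is then returned unchanged in raster order, which corresponds to the "no upsampling" case described above.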
- for each traversed MIP mode, the encoder calculates the distortion cost between the template-area prediction block and the reconstructed samples in the template area. It records this cost for each combination of prediction mode and transposition information. After traversing all allowed prediction modes and transposition options, the encoder applies the minimum-cost principle to select the optimal MIP mode and the suboptimal MIP mode, each with its transposition information. The encoder then decides whether fusion enhancement is needed by comparing the cost of the optimal MIP mode with the cost of the suboptimal MIP mode.
- if the cost of the suboptimal MIP mode is less than twice the cost of the optimal MIP mode, the optimal MIP prediction block and the suboptimal MIP prediction block are fused for enhancement. If the cost of the suboptimal MIP mode is greater than or equal to twice the cost of the optimal MIP mode, no fusion enhancement is required.
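The selection and the fusion decision can be sketched as follows. The costs are keyed by (mode, transposed) pairs. The text does not specify the distortion measure, so the sum of absolute differences (SAD) in `template_sad` is only a stand-in; SATD is also common in practice. The twice-the-best-cost threshold is taken from the text.

```python
from typing import Dict, Hashable, List, Tuple


def template_sad(pred: List[List[int]], rec: List[List[int]]) -> int:
    """Sum of absolute differences over the template (stand-in distortion measure)."""
    return sum(abs(p - r) for rp, rr in zip(pred, rec) for p, r in zip(rp, rr))


def pick_two_best(costs: Dict[Hashable, int]) -> Tuple[Hashable, Hashable]:
    """Return the (optimal, suboptimal) candidates in ascending template-cost order."""
    ranked = sorted(costs, key=costs.__getitem__)
    return ranked[0], ranked[1]


def needs_fusion(best_cost: int, second_cost: int) -> bool:
    """Fuse only when the suboptimal cost is below twice the optimal cost."""
    return second_cost < 2 * best_cost


costs = {(0, False): 120, (0, True): 150, (3, False): 200, (5, True): 90}
best, second = pick_two_best(costs)              # (5, True), (0, False)
print(needs_fusion(costs[best], costs[second]))  # True, since 120 < 180
```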
- the encoder obtains the prediction blocks corresponding to the optimal and suboptimal MIP modes, using each mode together with its transposition information.
- specifically, for each of these MIP modes, the encoder first downsamples the reconstructed samples adjacent to the top and left of the current block as appropriate. It concatenates them according to the transposition information to form the input vector. It then uses the MIP mode as an index to read that mode's matrix coefficients and computes the output vector from the input vector and the matrix coefficients.
- the encoder can transpose the output according to the transposition information. It then upsamples the output vector according to the size of the current block and the number of samples in the output vector, producing an optimal MIP prediction block and a suboptimal MIP prediction block of the same size as the current block. Using the calculated weights of the optimal and suboptimal MIP modes, the encoder takes a weighted average of the two prediction blocks. The resulting new block is the final prediction block of the current block. If fusion enhancement is not required, the encoder computes only the optimal MIP prediction block from the optimal MIP mode and its transposition information, in the same way as above, and uses it as the prediction block of the current block. In addition, the encoder obtains the rate-distortion cost of the current block and records it as cost1.
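The weighted average can be sketched with integer weights. The text does not give the weight formula at this point. This sketch assumes TIMD-style inverse-cost weights: the optimal block receives `cost_second / (cost_best + cost_second)` and the suboptimal block receives the remainder. The weights are expressed with a hypothetical 6-bit precision so that they sum to 64.

```python
from typing import List, Tuple


def fusion_weights(cost_best: int, cost_second: int, precision: int = 6) -> Tuple[int, int]:
    """Integer weights summing to 1 << precision; the cheaper mode gets more weight."""
    total = 1 << precision
    denom = cost_best + cost_second
    if denom == 0:
        return total // 2, total - total // 2
    w_best = (cost_second * total + denom // 2) // denom  # rounded division
    return w_best, total - w_best


def fuse_blocks(p_best: List[List[int]], p_second: List[List[int]],
                w_best: int, w_second: int, precision: int = 6) -> List[List[int]]:
    """Per-sample weighted average with rounding."""
    rnd = 1 << (precision - 1)
    return [[(w_best * a + w_second * b + rnd) >> precision for a, b in zip(ra, rb)]
            for ra, rb in zip(p_best, p_second)]


w1, w2 = fusion_weights(90, 120)                   # (37, 27)
print(fuse_blocks([[100, 100]], [[50, 50]], w1, w2))  # [[79, 79]]
```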
- the encoder continues to traverse other intra prediction techniques and calculates the corresponding rate-distortion cost as cost2...costN.
- if cost1 is the minimum rate-distortion cost, the current block uses TMMIP, and the encoder sets the TMMIP usage flag of the current block to true and writes it into the bitstream. If cost1 is not the minimum rate-distortion cost, the current block uses another intra prediction technique, and the encoder sets the TMMIP usage flag of the current block to false and writes it into the bitstream. Flags, indexes and similar information for the other intra prediction techniques are transmitted as defined for those techniques and are not elaborated here;
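The final mode decision can be sketched as a minimum over the recorded rate-distortion costs. The tool labels (`"TMMIP"`, `"TIMD"`, `"DIMD"`) are hypothetical keys, not syntax elements. The sketch also assumes the block-level flag is written only when sps_tmmip_enable_flag allows TMMIP; that case is represented here as `None`, meaning "not signalled".

```python
from typing import Dict, Optional


def choose_intra_tool(rd_costs: Dict[str, float],
                      sps_tmmip_enable_flag: bool) -> Dict[str, object]:
    """Pick the lowest-cost tool and derive the block-level TMMIP usage flag."""
    best_tool = min(rd_costs, key=rd_costs.__getitem__)
    # The flag is only signalled when the sequence allows TMMIP.
    tmmip_flag: Optional[bool] = (best_tool == "TMMIP") if sps_tmmip_enable_flag else None
    return {"tool": best_tool, "tmmip_flag": tmmip_flag}


print(choose_intra_tool({"TMMIP": 1510.0, "TIMD": 1498.5, "DIMD": 1620.0}, True))
# {'tool': 'TIMD', 'tmmip_flag': False}
```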
- the encoder determines the residual block of the current block based on the prediction block of the current block and the original block of the current block, and performs operations such as transformation and quantization, entropy coding, and loop filtering on the residual block of the current block. It should be understood that the specific process can be found in the relevant content above, and to avoid repetition, it will not be described again here.
- the decoder parses the block-level type flag. If the block is in intra mode, the decoder parses or obtains the sequence-level allowed flag. This flag indicates whether the current sequence may use the template-matching-based MIP mode derivation technique, and it can take the form of sps_tmmip_enable_flag. If all TMMIP allowed flags are true, the current decoder is allowed to use TMMIP.
- the decoder process can be implemented as the following process:
- step 1
- if sps_tmmip_enable_flag is true, the decoder parses the TMMIP usage flag of the current block; otherwise, the current decoding process does not need to parse the block-level TMMIP usage flag.
- the block-level TMMIP usage flag defaults to false. If the TMMIP usage flag of the current block is true, step 2 is performed; otherwise, step 3 is performed.
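The decoder-side flag logic can be sketched as follows. `read_flag` is a hypothetical stand-in for the entropy decoder's flag read. It is called only when the block is intra-coded and the sequence allows TMMIP; otherwise the flag takes its default value of false.

```python
from typing import Callable


def decode_tmmip_flag(is_intra: bool, sps_tmmip_enable_flag: bool,
                      read_flag: Callable[[], bool]) -> bool:
    """Block-level TMMIP usage flag: parsed only when allowed, otherwise false."""
    if is_intra and sps_tmmip_enable_flag:
        return read_flag()
    return False  # default when the flag is not present in the bitstream


bits = iter([True])
use_tmmip = decode_tmmip_flag(True, True, lambda: next(bits))
print("step 2" if use_tmmip else "step 3")  # step 2
```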
- the decoder fills the adjacent rows and columns outside the template region with reconstructed samples.
- the filling process is the same as the filling method in the original intra prediction process. For example, the decoder can traverse and fill from the lower-left corner to the upper-right corner. If all reconstructed samples are available, they are filled in sequence. If none of the reconstructed samples are available, all positions are filled with the mean value. If only some reconstructed samples are available, the available ones are filled in first. For the remaining unavailable positions, the decoder traverses in the same order, from the lower-left corner to the upper-right corner, until the first available reconstructed sample appears. The previously unavailable positions are then filled with that first available reconstructed sample.
- the decoder takes the filled reconstructed samples outside the template area as input and uses each allowed MIP mode to predict the samples in the template area.
- MIP modes are allowed for the 4x4 block size.
- the number of allowed MIP modes is 8.
- blocks of other sizes are allowed 6 MIP modes.
- blocks of any size can use MIP transposition, and the TMMIP prediction process described above is the same as that of MIP.
- the specific prediction calculation process is as follows. The decoder first performs Haar downsampling on the reconstructed samples, for example with a downsampling step size determined by the block size. The decoder then uses the transposition information to set the concatenation order of the downsampled top samples and the downsampled left samples. If no transposition is required, the downsampled left samples are appended after the downsampled top samples, and the resulting vector is used as the input.
- the decoder uses the traversed prediction mode as an index to obtain the MIP matrix coefficients, then computes the output vector from these coefficients and the input vector. Finally, the decoder upsamples the output vector according to its number of samples and the current template size. If no upsampling is required, the vector is filled row by row in the horizontal direction and output as the template prediction block. If upsampling is required, the output is upsampled first horizontally and then vertically to the template size, and is then output as the prediction block of the template area.
- for each traversed MIP mode, the decoder calculates the distortion cost between the template-area prediction block and the reconstructed samples in the template area. It records this cost for each combination of prediction mode and transposition information. After traversing all allowed prediction modes and transposition options, the decoder applies the minimum-cost principle to select the optimal MIP mode and the suboptimal MIP mode, each with its transposition information. The decoder then decides whether fusion enhancement is needed by comparing the cost of the optimal MIP mode with the cost of the suboptimal MIP mode.
- if the cost of the suboptimal MIP mode is less than twice the cost of the optimal MIP mode, the optimal MIP prediction block and the suboptimal MIP prediction block are fused for enhancement. If the cost of the suboptimal MIP mode is greater than or equal to twice the cost of the optimal MIP mode, no fusion enhancement is required.
- the decoder obtains the prediction blocks corresponding to the optimal and suboptimal MIP modes, using each mode together with its transposition information.
- specifically, for each of these MIP modes, the decoder first downsamples the reconstructed samples adjacent to the top and left of the current block as appropriate. It concatenates them according to the transposition information to form the input vector. It then uses the MIP mode as an index to read that mode's matrix coefficients and computes the output vector from the input vector and the matrix coefficients.
- the decoder can transpose the output according to the transposition information. It then upsamples the output vector according to the size of the current block and the number of samples in the output vector, producing an optimal MIP prediction block and a suboptimal MIP prediction block of the same size as the current block. Using the calculated weights of the optimal and suboptimal MIP modes, the decoder takes a weighted average of the two prediction blocks. The resulting new block is the final prediction block of the current block. If fusion enhancement is not required, the decoder computes only the optimal MIP prediction block from the optimal MIP mode and its transposition information, in the same way as above, and uses it as the prediction block of the current block.
- the decoder continues to parse information such as usage flags or indexes of other intra prediction technologies, and obtains the final prediction block of the current block based on the parsed information.
- the decoder parses the bitstream to obtain the frequency-domain residual block of the current block (also called frequency-domain residual information). It performs inverse quantization and inverse transformation on this block to obtain the residual block of the current block (also called the spatial-domain residual block or spatial-domain residual information). The decoder then adds the prediction block of the current block to the residual block of the current block to obtain the reconstructed sample block.
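The final superposition step can be sketched as a per-sample addition followed by clipping to the valid sample range. The clipping to `[0, 2**bit_depth - 1]` is an assumption based on standard codec practice; the text only says the blocks are superimposed. Inverse quantization and inverse transformation are not shown.

```python
from typing import List


def reconstruct(pred: List[List[int]], resid: List[List[int]],
                bit_depth: int = 10) -> List[List[int]]:
    """Add the residual to the prediction and clip each sample to the bit-depth range."""
    max_val = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), max_val) for p, r in zip(rp, rr)]
            for rp, rr in zip(pred, resid)]


print(reconstruct([[1020, 5]], [[10, -9]]))  # [[1023, 0]]
```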
- the reconstructed image can be used as video output or as a reference for subsequent decoding.
- the size of the template area used by the encoder or decoder in the TMMIP technology can be predefined according to the size of the current block.
- the upper region of the template area, adjacent to the top of the current block, has the width of the current block and a height of two rows of samples. The left region of the template area, adjacent to the left side of the current block, has the height of the current block and a width of two columns of samples.
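The template geometry above can be sketched by listing sample coordinates relative to the block's top-left sample (0, 0). Negative y values are rows above the block and negative x values are columns to its left. Because each region's extent follows the block's own width or height, the top-left corner is not part of the template. The `thickness` parameter is an illustrative name for the two-sample depth.

```python
from typing import List, Tuple


def template_positions(width: int, height: int, thickness: int = 2) -> List[Tuple[int, int]]:
    """(x, y) coordinates of the upper and left template regions."""
    top = [(x, y) for y in range(-thickness, 0) for x in range(width)]
    left = [(x, y) for y in range(height) for x in range(-thickness, 0)]
    return top + left


print(len(template_positions(8, 4)))  # 24 = 2 * 8 + 2 * 4
```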
- it can also be implemented as template areas of other sizes, and this application does not specifically limit this.
- the first intra prediction mode mentioned above is the intra prediction mode derived by TIMD. That is, the encoder or decoder can intra-predict the current block based on the optimal MIP mode and the intra prediction mode derived by TIMD to obtain the prediction block of the current block.
- the template-matching-based MIP mode derivation and fusion enhancement technique can fuse the two derived MIP prediction blocks with each other. It can also fuse them with prediction blocks generated by other template-matching-based derivation techniques.
- this application combines TMMIP with TIMD to obtain a derivation-based method for fusing conventional prediction blocks with matrix-based prediction blocks.
- TIMD uses the idea of template matching on the encoding and decoding end to derive the optimal traditional intra prediction mode, and this technology can also offset and expand the prediction mode to obtain an updated intra prediction mode.
- TMMIP likewise uses template matching at both the encoder and the decoder to derive the optimal MIP mode. Fusing these two optimal prediction modes accounts for both the directionality of conventional prediction blocks and the distinctive texture characteristics of MIP prediction. The result is a new prediction block that improves coding efficiency.
- the encoder traverses the prediction modes. If the current block is in intra mode, the encoder obtains the sequence-level allowed flag. This flag indicates whether the current sequence may use the template-matching-based MIP mode derivation technique, and it can take the form of sps_tmmip_enable_flag. If all TMMIP allowed flags are true, the current encoder is allowed to use TMMIP.
- the encoder process can be implemented as the following process:
- step 1
- if sps_tmmip_enable_flag is true, the encoder tries TMMIP, that is, it performs step 2. If sps_tmmip_enable_flag is false, the encoder does not try TMMIP, that is, it skips step 2 and proceeds directly to step 3.
- the encoder fills the adjacent rows and columns outside the template region with reconstructed samples.
- the filling process is the same as the filling method in the original intra prediction process. For example, the encoder can traverse and fill from the lower-left corner to the upper-right corner. If all reconstructed samples are available, they are filled in sequence. If none of the reconstructed samples are available, all positions are filled with the mean value. If only some reconstructed samples are available, the available ones are filled in first. For the remaining unavailable positions, the encoder traverses in the same order, from the lower-left corner to the upper-right corner, until the first available reconstructed sample appears. The previously unavailable positions are then filled with that first available reconstructed sample.
- the encoder takes the filled reconstructed samples outside the template area as input and uses each allowed MIP mode to predict the samples within the template area.
- MIP modes are allowed for the 4x4 block size.
- the number of allowed MIP modes is 8.
- blocks of other sizes are allowed 6 MIP modes.
- blocks of any size can use MIP transposition, and the TMMIP prediction process described above is the same as that of MIP.
- the specific prediction calculation process is as follows. The encoder first performs Haar downsampling on the reconstructed samples, for example with a downsampling step size determined by the block size. The encoder then uses the transposition information to set the concatenation order of the downsampled top samples and the downsampled left samples. If no transposition is required, the downsampled left samples are appended after the downsampled top samples, and the resulting vector is used as the input.
- the encoder uses the traversed prediction mode as an index to obtain the MIP matrix coefficients, then computes the output vector from these coefficients and the input vector. Finally, the encoder upsamples the output vector according to its number of samples and the current template size. If no upsampling is required, the vector is filled row by row in the horizontal direction and output as the template prediction block. If upsampling is required, the output is upsampled first horizontally and then vertically to the template size, and is then output as the prediction block of the template area.
- the encoder also needs to try the template matching calculation process of TIMD, obtain different interpolation filters according to different prediction mode indexes, and interpolate the reference samples to obtain prediction samples within the template.
- for each traversed MIP mode, the encoder calculates the distortion cost between the template-area prediction samples and the reconstructed samples in the template area. It records this cost for each combination of prediction mode and transposition information. Based on these costs, it applies the minimum-cost principle to select the optimal MIP mode and its transposition information.
- the encoder also traverses all intra prediction modes allowed by TIMD. For each mode, it computes the prediction samples within the template and their distortion cost against the reconstructed samples within the template. Applying the minimum-cost principle, it records the optimal and suboptimal prediction modes derived by TIMD, together with the distortion cost of each.
- the encoder downsamples the reconstructed samples adjacent to the top and left of the current block as appropriate and concatenates them according to the transposition information to form the input vector. It then uses the MIP mode as an index to read that mode's matrix coefficients and computes the output vector from the input vector and the matrix coefficients.
- the encoder can perform output transposition according to the transposition information, and upsample the output vector according to the size of the current block and the number of samples of the output vector, and obtain an output with the same size as the current block as the optimal MIP prediction block of the current block.
- the encoder checks the optimal and suboptimal prediction modes derived by TIMD. It performs a prediction block fusion operation if neither mode is the DC (mean) mode or the PLANAR mode and the distortion cost of the suboptimal prediction mode is less than twice that of the optimal prediction mode. First, the encoder obtains interpolation filter coefficients according to the optimal prediction mode. It applies interpolation filtering to the top and left adjacent reconstructed samples to obtain prediction samples at all positions in the current block, recorded as the optimal prediction block. Second, the encoder obtains interpolation filter coefficients according to the suboptimal prediction mode and filters the same samples to obtain prediction samples at all positions in the current block, recorded as the suboptimal prediction block.
- the encoder uses the ratio between the cost of the optimal prediction mode and the cost of the suboptimal prediction mode to calculate the weight of the optimal prediction block and the weight of the suboptimal prediction block.
- the encoder performs a weighted fusion of the optimal prediction block and the suboptimal prediction block to obtain the prediction block of the current block as output.
- otherwise, the encoder does not perform the prediction block fusion operation. Only the optimal prediction block, obtained by interpolation filtering of the top and left adjacent reconstructed samples with the optimal prediction mode, is used as the optimal TIMD prediction block of the current block.
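The TIMD fusion decision and blending described above can be sketched as follows. `PLANAR = 0` and `DC = 1` are the VVC intra mode indices. `predict` is a hypothetical callable standing in for the interpolation-filter prediction of a given mode. The weight formula is an assumption that turns "the ratio between the two costs" into inverse-cost weights at 6-bit precision. Fusion requires the suboptimal cost to be below twice the optimal cost, which implies a nonzero optimal cost, so the division is safe.

```python
from typing import Callable, List

PLANAR, DC = 0, 1  # VVC intra mode indices

Block = List[List[int]]


def timd_prediction(best_mode: int, second_mode: int, best_cost: int, second_cost: int,
                    predict: Callable[[int], Block], precision: int = 6) -> Block:
    """Optimal TIMD block, fused with the suboptimal one when the conditions hold."""
    p_best = predict(best_mode)
    fuse = (best_mode not in (PLANAR, DC) and second_mode not in (PLANAR, DC)
            and second_cost < 2 * best_cost)
    if not fuse:
        return p_best
    p_second = predict(second_mode)
    total = 1 << precision
    denom = best_cost + second_cost
    w_best = (second_cost * total + denom // 2) // denom  # cheaper mode weighs more
    w_second = total - w_best
    rnd = 1 << (precision - 1)
    return [[(w_best * a + w_second * b + rnd) >> precision for a, b in zip(ra, rb)]
            for ra, rb in zip(p_best, p_second)]


fake_predict = lambda m: [[10 * m, 10 * m]]  # toy predictor for illustration
print(timd_prediction(18, 50, 90, 120, fake_predict))  # [[315, 315]]
```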
- the encoder takes a weighted average of the optimal MIP prediction block and the optimal TIMD prediction block to obtain a new prediction block, which is the prediction block of the current block.
- the encoder obtains the rate distortion cost of the current block and records it as cost1.
- the template areas of TIMD and TMMIP can be set to be the same, so that the distortion cost is computed over the same template area. The template-area costs of the two techniques are then equivalent and directly comparable. In that case, the cost information can also be used to decide whether to apply fusion enhancement, which this application does not specifically limit.
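The final combination of the two derived blocks can be sketched as follows. The text does not fix the weights for averaging the optimal MIP block and the optimal TIMD block, so this sketch assumes equal weights with rounding. The optional `skip_on_nondirectional` switch models a variant described in this application: fusion is skipped when TIMD's optimal mode is DC or PLANAR, and only the MIP block is kept.

```python
from typing import List

PLANAR, DC = 0, 1  # VVC intra mode indices


def combine_mip_timd(p_mip: List[List[int]], p_timd: List[List[int]], timd_best_mode: int,
                     skip_on_nondirectional: bool = False) -> List[List[int]]:
    """Equal-weight average of the MIP and TIMD blocks (weights are an assumption)."""
    if skip_on_nondirectional and timd_best_mode in (PLANAR, DC):
        return p_mip
    return [[(a + b + 1) >> 1 for a, b in zip(ra, rb)] for ra, rb in zip(p_mip, p_timd)]


print(combine_mip_timd([[100, 101]], [[50, 50]], 18))  # [[75, 76]]
```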
- the encoder continues to traverse other intra prediction techniques and calculates the corresponding rate-distortion cost as cost2...costN.
- if cost1 is the minimum rate-distortion cost, the current block uses TMMIP, and the encoder sets the TMMIP usage flag of the current block to true and writes it into the bitstream. If cost1 is not the minimum rate-distortion cost, the current block uses another intra prediction technique, and the encoder sets the TMMIP usage flag of the current block to false and writes it into the bitstream. Flags, indexes and similar information for the other intra prediction techniques are transmitted as defined for those techniques and are not elaborated here;
- the encoder determines the residual block of the current block based on the prediction block of the current block and the original block of the current block, and performs operations such as transformation and quantization, entropy coding, and loop filtering on the residual block of the current block. It should be understood that the specific process can be found in the relevant content above, and to avoid repetition, it will not be described again here.
- the decoder parses the block-level type flag. If the block is in intra mode, the decoder parses or obtains the sequence-level allowed flag. This flag indicates whether the current sequence may use the template-matching-based MIP mode derivation technique, and it can take the form of sps_tmmip_enable_flag. If all TMMIP allowed flags are true, the current decoder is allowed to use TMMIP.
- the decoder process can be implemented as the following process:
- step 1
- if sps_tmmip_enable_flag is true, the decoder parses the TMMIP usage flag of the current block; otherwise, the current decoding process does not need to parse the block-level TMMIP usage flag.
- the block-level TMMIP usage flag defaults to false. If the TMMIP usage flag of the current block is true, step 2 is performed; otherwise, step 3 is performed.
- the decoder fills the adjacent rows and columns outside the template region with reconstructed samples.
- the filling process is the same as the filling method in the original intra prediction process. For example, the decoder can traverse and fill from the lower-left corner to the upper-right corner. If all reconstructed samples are available, they are filled in sequence. If none of the reconstructed samples are available, all positions are filled with the mean value. If only some reconstructed samples are available, the available ones are filled in first. For the remaining unavailable positions, the decoder traverses in the same order, from the lower-left corner to the upper-right corner, until the first available reconstructed sample appears. The previously unavailable positions are then filled with that first available reconstructed sample.
- the decoder takes the filled reconstructed samples outside the template area as input and uses each allowed MIP mode to predict the samples in the template area.
- MIP modes are allowed for the 4x4 block size.
- the number of allowed MIP modes is 8.
- blocks of other sizes are allowed 6 MIP modes.
- blocks of any size can use MIP transposition, and the TMMIP prediction process described above is the same as that of MIP.
- the specific prediction calculation process is as follows. The decoder first performs Haar downsampling on the reconstructed samples, for example with a downsampling step size determined by the block size. The decoder then uses the transposition information to set the concatenation order of the downsampled top samples and the downsampled left samples. If no transposition is required, the downsampled left samples are appended after the downsampled top samples, and the resulting vector is used as the input.
- the decoder uses the traversed prediction mode as an index to obtain the MIP matrix coefficients, then computes the output vector from these coefficients and the input vector. Finally, the decoder upsamples the output vector according to its number of samples and the current template size. If no upsampling is required, the vector is filled row by row in the horizontal direction and output as the template prediction block. If upsampling is required, the output is upsampled first horizontally and then vertically to the template size, and is then output as the prediction block of the template area.
- the decoder also needs to try the template matching calculation process of TIMD, obtain different interpolation filters according to different prediction mode indexes, and interpolate the reference samples to obtain prediction samples within the template.
- for each traversed MIP mode, the decoder calculates the distortion cost between the template-area prediction samples and the reconstructed samples in the template area. It records this cost for each combination of prediction mode and transposition information. Based on these costs, it applies the minimum-cost principle to select the optimal MIP mode and its transposition information.
- the decoder also traverses all intra prediction modes allowed by TIMD. For each mode, it computes the prediction samples within the template and their distortion cost against the reconstructed samples within the template. Applying the minimum-cost principle, it records the optimal and suboptimal prediction modes derived by TIMD, together with the distortion cost of each.
- the decoder downsamples the reconstructed samples adjacent to the top and left of the current block as appropriate and concatenates them according to the transposition information to form the input vector. It then uses the MIP mode as an index to read that mode's matrix coefficients and computes the output vector from the input vector and the matrix coefficients.
- the decoder can perform output transposition according to the transposition information, and upsample the output vector according to the size of the current block and the number of samples of the output vector, and obtain an output with the same size as the current block as the optimal MIP prediction block of the current block.
- the decoder checks the optimal and suboptimal prediction modes derived by TIMD. It performs a prediction block fusion operation if neither mode is the DC (mean) mode or the PLANAR mode and the distortion cost of the suboptimal prediction mode is less than twice that of the optimal prediction mode. First, the decoder obtains interpolation filter coefficients according to the optimal prediction mode. It applies interpolation filtering to the top and left adjacent reconstructed samples to obtain prediction samples at all positions in the current block, recorded as the optimal prediction block. Second, the decoder obtains interpolation filter coefficients according to the suboptimal prediction mode and filters the same samples to obtain prediction samples at all positions in the current block, recorded as the suboptimal prediction block.
- the decoder uses the ratio between the cost of the optimal prediction mode and the cost of the suboptimal prediction mode to calculate the weight of the optimal prediction block and the weight of the suboptimal prediction block.
- the decoder performs a weighted fusion of the optimal prediction block and the suboptimal prediction block to obtain the prediction block of the current block as output.
- otherwise, the decoder does not perform the prediction block fusion operation. Only the optimal prediction block, obtained by interpolation filtering of the top and left adjacent reconstructed samples with the optimal prediction mode, is used as the optimal TIMD prediction block of the current block.
- the decoder takes a weighted average of the optimal MIP prediction block and the optimal TIMD prediction block to obtain a new prediction block, which is the prediction block of the current block.
- the decoder continues to parse information such as usage flags or indexes of other intra prediction technologies, and obtains the final prediction block of the current block based on the parsed information.
- the decoder parses the bitstream to obtain the frequency-domain residual block of the current block (also called frequency-domain residual information). It performs inverse quantization and inverse transformation on this block to obtain the residual block of the current block (also called the spatial-domain residual block or spatial-domain residual information). The decoder then adds the prediction block of the current block to the residual block of the current block to obtain the reconstructed sample block.
- the reconstructed image can be used as video output or as a reference for subsequent decoding.
- for the calculation of the weights used in the weighted fusion of TIMD prediction blocks, refer to the introduction to TIMD technology above; to avoid duplication, it is not described again here.
- the encoder or decoder may determine whether to perform fusion enhancement based on the optimal prediction mode derived by TIMD; for example, if the optimal prediction mode derived by TIMD is the DC mode or the planar mode, the encoder or decoder may skip fusion enhancement, that is, only the prediction block generated by the optimal MIP mode derived by TMMIP technology is used as the prediction block of the current block.
- the size of the template area used by the encoder or decoder in the TMMIP technology can be predefined according to the size of the current block.
- the definition of the template area in the TMMIP technology may be consistent with the definition of the template area in the TIMD technology, or may be different.
- if the width of the current block is less than or equal to 8, the upper area of the template adjacent to the current block is two sample rows high; otherwise, it is four sample rows high.
- the left area of the template adjacent to the left side of the current block is two sample columns wide; otherwise, it is four sample columns wide.
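The template-size rule can be sketched as a small helper. The top-template rule follows the text; the condition for the left template is not stated explicitly above, so this sketch assumes it mirrors the top rule using the block height.

```python
def template_size(block_w, block_h):
    """Return (top_rows, left_cols) of the TMMIP template for a block.

    top_rows follows the stated width rule. left_cols assumes (hypothetically)
    the mirrored rule based on the block height.
    """
    top_rows = 2 if block_w <= 8 else 4
    left_cols = 2 if block_h <= 8 else 4  # assumption: mirrors the width rule
    return top_rows, left_cols
```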
- the first intra prediction mode mentioned above is the intra prediction mode derived by the DIMD mode, that is, the encoder or decoder can perform intra prediction on the current block based on the optimal MIP mode and the intra prediction mode derived by the DIMD mode to obtain the prediction block of the current block.
- TMMIP technology can also be integrated and enhanced with DIMD technology.
- although the prediction modes derived by DIMD technology and TIMD technology are both conventional intra prediction modes, they are derived differently, so the prediction modes obtained by the two are not necessarily the same.
- therefore, the fusion enhancement of TMMIP technology with DIMD technology differs from the fusion enhancement of TMMIP technology with TIMD technology.
- the template areas of TMMIP technology and TIMD technology are generally the same size, and the cost information they compute is basically the sum of absolute transformed differences (SATD), also called the Hadamard-transform-based distortion cost. Therefore, TMMIP technology and TIMD technology can calculate the fusion weights directly from this cost information.
- the template area of DIMD technology is generally not the same size as the template areas of TMMIP technology (or TIMD technology), and the criterion for the prediction mode derived by DIMD is based on the gradient amplitude value.
- the gradient amplitude value and the SATD cost value are not directly equivalent, so the weights cannot simply reuse the calculation scheme used when TMMIP technology is fused with TIMD technology.
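To make the SATD cost concrete, the sketch below computes it for a single 4x4 block: the prediction error is transformed with an unnormalised 4x4 Hadamard matrix, and the absolute transform coefficients are summed. Real encoders tile larger templates into such sub-blocks and often apply an extra scaling factor; both are omitted here.

```python
def hadamard4(block):
    """Unnormalised 2-D 4x4 Hadamard transform: H * X * H^T (H is symmetric)."""
    h = [[1, 1, 1, 1],
         [1, -1, 1, -1],
         [1, 1, -1, -1],
         [1, -1, -1, 1]]
    tmp = [[sum(h[i][k] * block[k][j] for k in range(4)) for j in range(4)]
           for i in range(4)]
    return [[sum(tmp[i][k] * h[j][k] for k in range(4)) for j in range(4)]
            for i in range(4)]


def satd4x4(pred, ref):
    """Sum of absolute transformed differences between two 4x4 blocks."""
    diff = [[p - r for p, r in zip(row_p, row_r)] for row_p, row_r in zip(pred, ref)]
    return sum(abs(c) for row in hadamard4(diff) for c in row)
```

Unlike a gradient-histogram amplitude, this value measures prediction error on the template, which is why SATD costs of TMMIP and TIMD can be compared with each other directly.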
- the encoder traverses the prediction modes. If the current block is in intra mode, the encoder obtains the sequence-level enable flag, which indicates whether the current sequence is allowed to use the template-matching-based MIP mode derivation technology and can take the form of sps_tmmip_enable_flag. If sps_tmmip_enable_flag is true, the current encoder is allowed to use TMMIP technology.
- the encoder process can be implemented as the following process:
- step 1: if sps_tmmip_enable_flag is true, the encoder tries TMMIP technology, that is, performs step 2; if sps_tmmip_enable_flag is false, the encoder does not try TMMIP technology, that is, skips step 2 and proceeds directly to step 3.
- the encoder fills the adjacent rows and columns outside the template region with reconstructed samples.
- the filling process is the same as in the original intra prediction process. For example, the encoder can traverse and fill from the lower-left corner to the upper-right corner. If all reconstructed samples are available, they are filled in sequence; if none of the reconstructed samples are available, all positions are filled with the mean value; if only some of the reconstructed samples are available, the available reconstructed samples are filled in first. For the remaining unavailable positions, the encoder traverses in the same lower-left-to-upper-right order until the first available reconstructed sample appears, and then fills the previously unavailable positions with that first available reconstructed sample.
- the encoder takes the reconstructed samples outside the filled template area as input and uses each allowed MIP mode to predict the samples within the template area.
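The reference filling rule can be sketched as follows. Two details are assumptions not spelled out in the text: the "mean value" used when nothing is available is taken to be the mid-level value `1 << (bit_depth - 1)`, and unavailable positions after the first available sample copy the nearest preceding sample, as HEVC/VVC reference substitution does.

```python
def pad_reference(samples, available, bit_depth=10):
    """Fill unavailable reference samples, scanning from bottom-left to top-right.

    samples and available are parallel lists ordered bottom-left -> top-right.
    Assumptions: the 'mean' fill is the mid-level value, and later gaps copy
    the nearest preceding sample.
    """
    n = len(samples)
    if not any(available):
        return [1 << (bit_depth - 1)] * n
    out = list(samples)
    first = available.index(True)
    # Positions before the first available sample take that sample's value.
    for i in range(first):
        out[i] = samples[first]
    # Later gaps copy the previous (already filled) position.
    for i in range(first + 1, n):
        if not available[i]:
            out[i] = out[i - 1]
    return out
```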
- for a 4x4 block, 16 MIP modes are allowed.
- for 4xN and Nx4 blocks (N > 4) and 8x8 blocks, 8 MIP modes are allowed.
- for blocks of other sizes, 6 MIP modes are allowed.
- blocks of any size can use the MIP transposition function, and the TMMIP prediction process described above is the same as in MIP technology.
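The candidate set traversed on the template can be enumerated as below. This sketch assumes VVC's MIP size classes (16, 8 and 6 modes); since any block size can use transposition, each mode is tested both with and without it.

```python
def mip_size_id(w, h):
    """VVC-style MIP size class: 0 for 4x4, 1 for 4xN/Nx4/8x8, 2 otherwise."""
    if w == 4 and h == 4:
        return 0
    if w == 4 or h == 4 or (w == 8 and h == 8):
        return 1
    return 2


NUM_MIP_MODES = {0: 16, 1: 8, 2: 6}  # assumed to follow VVC


def mip_candidates(w, h):
    """All (mode, transposed) pairs that TMMIP would try on the template."""
    n = NUM_MIP_MODES[mip_size_id(w, h)]
    return [(mode, transposed) for mode in range(n) for transposed in (False, True)]
```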
- the specific prediction calculation process is as follows: the encoder first performs Haar downsampling on the reconstructed samples, for example with a downsampling step determined by the block size. Then, based on whether transposition is applied, the encoder adjusts the concatenation order of the downsampled upper samples and the downsampled left samples: if transposition is not required, the downsampled left samples are appended after the downsampled upper samples, and the resulting vector is used as the input.
- the encoder uses the traversed prediction mode as an index to obtain the MIP matrix coefficients and computes the output vector from the matrix coefficients and the input vector. Finally, the encoder upsamples the output vector according to the number of output samples and the current template size. If upsampling is not required, the output vector is filled in row by row (horizontal direction) and output as the template prediction block. If upsampling is required, the output is upsampled first in the horizontal direction and then in the vertical direction, up to the template size, and the result is output as the prediction block of the template area.
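The downsampling and concatenation step can be sketched like this. The averaging with rounding corresponds to Haar-style (pairwise-mean) downsampling for power-of-two steps; the reduced boundary length (2 or 4 per side in VVC) is passed in as a parameter rather than derived here.

```python
def downsample_boundary(samples, out_len):
    """Average consecutive groups of boundary samples down to out_len values.

    Assumes len(samples) is a multiple of out_len; the step is a power of two in MIP.
    """
    step = len(samples) // out_len
    return [(sum(samples[i * step:(i + 1) * step]) + step // 2) // step
            for i in range(out_len)]


def mip_input(top, left, red_size, transposed):
    """Build the MIP input vector: top-then-left, or left-then-top if transposed."""
    red_top = downsample_boundary(top, red_size)
    red_left = downsample_boundary(left, red_size)
    return red_left + red_top if transposed else red_top + red_left
```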
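The matrix product and the horizontal-then-vertical upsampling can be sketched in floating point. This is a simplification: standardized MIP uses integer matrices with offsets and shifts, and the matrix here is any hypothetical list of rows. Following VVC-style interpolation, each row starts from a left boundary sample and each column from a top boundary sample; output transposition is not shown.

```python
def upsample_1d(ref, samples, factor):
    """Linearly interpolate `factor` outputs per sample, starting from boundary `ref`."""
    out, prev = [], ref
    for cur in samples:
        for k in range(1, factor + 1):
            out.append((prev * (factor - k) + cur * k) / factor)
        prev = cur
    return out


def mip_predict(input_vec, matrix, red_w, red_h, width, height, top, left):
    """Matrix-vector product, then horizontal and vertical linear upsampling.

    top has `width` boundary samples, left has `height`; the matrix has
    red_w * red_h rows (hypothetical coefficients).
    """
    out = [sum(a * x for a, x in zip(row, input_vec)) for row in matrix]
    red = [out[r * red_w:(r + 1) * red_w] for r in range(red_h)]
    up_hor, up_ver = width // red_w, height // red_h
    # Horizontal first: reduced row r sits at y = (r + 1) * up_ver - 1.
    rows = [upsample_1d(left[(r + 1) * up_ver - 1], red[r], up_hor) for r in range(red_h)]
    # Then vertical, column by column, starting from the top boundary.
    cols = [upsample_1d(top[x], [rows[r][x] for r in range(red_h)], up_ver)
            for x in range(width)]
    return [[cols[x][y] for x in range(width)] for y in range(height)]
```

When the reduced size already equals the template size, both factors are 1 and the output is simply the matrix product arranged row by row.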
- the encoder uses DIMD technology to derive the optimal intra prediction mode, which is the optimal DIMD mode.
- DIMD technology calculates the gradient value of the reconstructed sample in the template area based on the Sobel operator, and converts the gradient value based on the angle values of different prediction modes to obtain the amplitude value in the corresponding prediction mode.
- the encoder traverses the template prediction blocks obtained from each MIP mode, calculates the distortion cost with the reconstructed samples in the template, and records the optimal MIP mode and transposition information according to the minimum cost principle.
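The minimum-cost selection over the traversed MIP candidates can be sketched as below. Returning the runner-up as well is convenient because the suboptimal MIP mode is one option for the first intra prediction mode; ties are broken by candidate order, which is an assumption.

```python
def select_best_two(costs):
    """costs: dict mapping (mode, transposed) -> template distortion cost.

    Returns ((candidate, cost) of the best, (candidate, cost) of the runner-up or None).
    sorted() is stable, so ties keep the dict's insertion order.
    """
    ranked = sorted(costs.items(), key=lambda item: item[1])
    best = ranked[0]
    second = ranked[1] if len(ranked) > 1 else None
    return best, second
```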
- the encoder traverses all allowed intra-frame prediction modes, calculates the amplitude value in each intra-frame prediction mode, and records the optimal DIMD prediction mode according to the principle of maximum amplitude.
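A gradient-histogram sketch of the DIMD derivation is shown below: Sobel gradients are computed on the template, each gradient's amplitude |Gx| + |Gy| is accumulated into a bin for an angular mode, and the mode with the largest amplitude wins. DIMD's actual orientation-to-mode table (and the fact that the edge direction is perpendicular to the gradient) is not reproduced; the linear mapping onto modes 2 to 66 is hypothetical.

```python
import math

NUM_ANGULAR = 65  # angular modes 2..66, as in VVC


def sobel(img, y, x):
    """3x3 Sobel gradients (gx, gy) at an interior position of a 2-D list."""
    gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
          - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
    gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
          - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
    return gx, gy


def dimd_best_mode(img):
    """Return the angular mode with the largest accumulated amplitude, or None if flat."""
    hist = {}
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            gx, gy = sobel(img, y, x)
            if gx == 0 and gy == 0:
                continue
            angle = math.atan2(gy, gx) % math.pi  # orientation in [0, pi)
            mode = 2 + min(int(angle / math.pi * NUM_ANGULAR), NUM_ANGULAR - 1)  # hypothetical map
            hist[mode] = hist.get(mode, 0) + abs(gx) + abs(gy)
    return max(hist, key=hist.get) if hist else None
```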
- the encoder downsamples, as needed, the reconstructed samples adjacent to the upper and left sides of the current block, concatenates them according to the transposition information to form the input vector, uses the optimal MIP mode as an index to read the matrix coefficients of that mode, and then computes the output vector from the input vector and the matrix coefficients.
- the encoder can perform output transposition according to the transposition information, and upsample the output vector according to the size of the current block and the number of samples of the output vector, and obtain an output with the same size as the current block as the optimal MIP prediction block of the current block.
- the encoder obtains the interpolation filter coefficients corresponding to the optimal DIMD mode and performs interpolation filtering on the upper and left adjacent reconstructed samples to obtain prediction samples at all positions in the current block, which are recorded as the optimal DIMD prediction block.
- the encoder performs a weighted average of the optimal MIP prediction block and the optimal DIMD prediction block for each prediction sample according to the preset weight, and the new prediction block obtained is the prediction block of the current block.
- the encoder obtains the rate distortion cost of the current block and records it as cost1.
- the encoder continues to traverse other intra prediction techniques and calculates the corresponding rate-distortion cost as cost2...costN.
- if cost1 is the minimum rate-distortion cost, the current block uses TMMIP technology, and the encoder sets the TMMIP usage flag of the current block to true and writes it into the code stream; if cost1 is not the minimum rate-distortion cost, the current block uses another intra prediction technology, and the encoder sets the TMMIP usage flag of the current block to false and writes it into the code stream. It should be understood that information such as flags or indexes of other intra prediction technologies is transmitted as defined and is not elaborated here;
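The encoder-side decision reduces to picking the minimum rate-distortion cost and deriving the block-level flag from it. In this sketch, index 0 holds cost1 (TMMIP) and the remaining entries hold cost2 to costN; on a tie, `min` returns the first index, so TMMIP wins ties, which is an assumption.

```python
def choose_intra_tool(costs):
    """costs[0] is cost1 (TMMIP); costs[1:] are cost2..costN of other intra tools.

    Returns (index of the chosen tool, TMMIP usage flag to write).
    """
    best = min(range(len(costs)), key=lambda i: costs[i])
    return best, best == 0
```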
- the encoder determines the residual block of the current block based on the prediction block of the current block and the original block of the current block, and performs operations such as transformation and quantization, entropy coding, and loop filtering on the residual block of the current block. It should be understood that the specific process can be found in the relevant content above, and to avoid repetition, it will not be described again here.
- the decoder parses the block-level type flag. If the block is in intra mode, the decoder parses or obtains the sequence-level enable flag, which indicates whether the current sequence is allowed to use the template-matching-based MIP mode derivation technology and can take the form of sps_tmmip_enable_flag. If sps_tmmip_enable_flag is true, the current decoder is allowed to use TMMIP technology.
- the decoder process can be implemented as the following process:
- step 1: if sps_tmmip_enable_flag is true, the decoder parses the TMMIP usage flag of the current block; otherwise, the current decoding process does not need to decode the block-level TMMIP usage flag, which defaults to false. If the TMMIP usage flag of the current block is true, step 2 is performed; otherwise, step 3 is performed.
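The decoder-side flag logic can be sketched as follows. `read_flag` is a hypothetical stand-in for the entropy decoder's flag read; when the flag is not present in the code stream, it is inferred as false.

```python
def parse_tmmip_flag(sps_tmmip_enable_flag, is_intra, read_flag):
    """Return the block-level TMMIP usage flag.

    read_flag: hypothetical callable returning the next decoded flag bit.
    The flag is only read for intra blocks in sequences that enable TMMIP.
    """
    if is_intra and sps_tmmip_enable_flag:
        return bool(read_flag())
    return False  # not present in the code stream: inferred as false
```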
- the decoder fills the adjacent rows and columns outside the template region with reconstructed samples.
- the filling process is the same as in the original intra prediction process. For example, the decoder can traverse and fill from the lower-left corner to the upper-right corner. If all reconstructed samples are available, they are filled in sequence; if none of the reconstructed samples are available, all positions are filled with the mean value; if only some of the reconstructed samples are available, the available reconstructed samples are filled in first. For the remaining unavailable positions, the decoder traverses in the same lower-left-to-upper-right order until the first available reconstructed sample appears, and then fills the previously unavailable positions with that first available reconstructed sample.
- the decoder takes the reconstructed samples outside the filled template area as input and uses each allowed MIP mode to predict the samples in the template area.
- for a 4x4 block, 16 MIP modes are allowed.
- for 4xN and Nx4 blocks (N > 4) and 8x8 blocks, 8 MIP modes are allowed.
- for blocks of other sizes, 6 MIP modes are allowed.
- blocks of any size can use the MIP transposition function, and the TMMIP prediction process described above is the same as in MIP technology.
- the specific prediction calculation process is as follows: the decoder first performs Haar downsampling on the reconstructed samples, for example with a downsampling step determined by the block size. Then, based on whether transposition is applied, the decoder adjusts the concatenation order of the downsampled upper samples and the downsampled left samples: if transposition is not required, the downsampled left samples are appended after the downsampled upper samples, and the resulting vector is used as the input.
- the decoder uses the traversed prediction mode as an index to obtain the MIP matrix coefficients and computes the output vector from the matrix coefficients and the input vector. Finally, the decoder upsamples the output vector according to the number of output samples and the current template size. If upsampling is not required, the output vector is filled in row by row (horizontal direction) and output as the template prediction block. If upsampling is required, the output is upsampled first in the horizontal direction and then in the vertical direction, up to the template size, and the result is output as the prediction block of the template area.
- the decoder uses DIMD technology to derive the optimal intra prediction mode, which is the optimal DIMD mode.
- DIMD technology calculates the gradient value of the reconstructed sample in the template area based on the Sobel operator, and converts the gradient value based on the angle values of different prediction modes to obtain the amplitude value in the corresponding prediction mode.
- the decoder traverses the template prediction blocks obtained from each MIP mode, calculates the distortion cost with the reconstructed samples in the template, and records the optimal MIP mode and transposition information according to the principle of minimum cost.
- the decoder traverses all allowed intra-frame prediction modes, calculates the amplitude value in each intra-frame prediction mode, and records the optimal DIMD prediction mode according to the principle of maximum amplitude.
- the decoder downsamples, as needed, the reconstructed samples adjacent to the upper and left sides of the current block, concatenates them according to the transposition information to form the input vector, uses the optimal MIP mode as an index to read the matrix coefficients of that mode, and then computes the output vector from the input vector and the matrix coefficients.
- the decoder can perform output transposition according to the transposition information, and upsample the output vector according to the size of the current block and the number of samples of the output vector, and obtain an output with the same size as the current block as the optimal MIP prediction block of the current block.
- the decoder obtains the interpolation filter coefficients corresponding to the optimal DIMD mode and performs interpolation filtering on the upper and left adjacent reconstructed samples to obtain prediction samples at all positions in the current block, which are recorded as the optimal DIMD prediction block.
- the decoder performs a weighted average of the optimal MIP prediction block and the optimal DIMD prediction block for each prediction sample according to the preset weight, and obtains a new prediction block, which is the prediction block of the current block.
- the decoder continues to parse information such as usage flags or indexes of other intra prediction technologies, and obtains the final prediction block of the current block based on the parsed information.
- the decoder parses the code stream and obtains the frequency domain residual block of the current block (also called frequency domain residual information), and performs inverse quantization and inverse transformation on the frequency domain residual block of the current block to obtain the residual block of the current block (also known as the temporal residual block or temporal residual information); the decoder then superimposes the prediction block of the current block and the residual block of the current block to obtain a reconstructed sample block.
- the reconstructed image can be used as video output or as a reference for subsequent decoding.
- for the calculation of the optimal DIMD prediction block, refer to the introduction to DIMD technology above; to avoid repetition, it is not described again here.
- the fusion weight of the optimal MIP prediction block and the optimal DIMD prediction block can be a preset value, for example, the optimal MIP prediction block accounts for 5/9, and the optimal DIMD prediction block accounts for 4/9.
- the fusion weight of the optimal MIP prediction block and the optimal DIMD prediction block can also be other values, which is not specifically limited in this application.
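The fixed-weight fusion with the example weights 5/9 and 4/9 can be computed in integer arithmetic. Adding half the denominator before the integer division rounds to the nearest integer (no exact ties occur because 9 is odd). The integer-rounding form is an illustrative choice, not mandated by the text.

```python
def blend_fixed(mip_block, dimd_block, w_mip=5, w_dimd=4):
    """Per-sample weighted average with preset weights w_mip/(w_mip+w_dimd) and w_dimd/(...)."""
    total = w_mip + w_dimd
    return [[(w_mip * a + w_dimd * b + total // 2) // total for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(mip_block, dimd_block)]
```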
- the magnitude of the sequence numbers of the above processes does not imply an order of execution; the execution order of each process should be determined by its functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of this application.
- Figure 9 is a schematic block diagram of the decoder 500 according to the embodiment of the present application.
- the decoder 500 may include:
- the parsing unit 510 is used to parse the code stream to obtain the residual block of the current block in the current sequence
- the prediction unit 520 is used to:
- the distortion cost of the multiple MIP modes includes the distortion cost obtained by using the multiple MIP modes to predict samples in the template area adjacent to the current block;
- the first intra prediction mode includes at least one of the following: a suboptimal MIP mode for predicting the current block, determined based on the distortion costs of the plurality of MIP modes; an intra prediction mode derived by the decoder-side intra mode derivation (DIMD) mode; and an intra prediction mode derived by the template-based intra mode derivation (TIMD) mode;
- the reconstruction unit 530 is configured to obtain a reconstruction block of the current block based on the residual block of the current block and the prediction block of the current block.
- the prediction unit 520 is specifically used to:
- if the first identifier indicates that the optimal MIP mode and the first intra prediction mode are allowed to be used to predict image blocks in the current sequence, determine the optimal MIP mode based on the distortion costs of the multiple MIP modes.
- the prediction unit 520 is specifically used to:
- if the first identifier indicates that the optimal MIP mode and the first intra prediction mode are allowed to be used to predict image blocks in the current sequence, parse the code stream to obtain the second identifier;
- the optimal MIP mode is determined based on the distortion costs of the multiple MIP modes.
- the parsing unit 510 is also used to:
- the code stream of the current sequence is decoded based on the encoding method used in the optimal MIP mode to obtain the index of the optimal MIP mode.
- the codewords of the coding scheme used for the first n MIP modes in the arrangement order are shorter than the codewords of the coding scheme used for the MIP modes after the nth MIP mode in the arrangement order; and/or, the first n MIP modes use variable-length coding and the MIP modes after the nth MIP mode use truncated binary coding.
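Truncated binary coding, mentioned above for the MIP modes after the nth one, can be sketched as follows; the combined scheme in `code_mip_rank` is purely hypothetical and only illustrates one prefix-free way to give the first `n_front` ranks (for example, modes ordered by template cost, which is also an assumption) shorter variable-length codewords.

```python
def truncated_binary(x, n):
    """Truncated binary codeword for symbol x in an alphabet of size n."""
    k = n.bit_length() - 1       # floor(log2(n))
    u = (1 << (k + 1)) - n       # number of symbols that get the shorter k-bit codes
    if x < u:
        return format(x, "b").zfill(k) if k > 0 else ""
    return format(x + u, "b").zfill(k + 1)


def truncated_unary(x, c_max):
    """x ones, terminated by a zero unless x equals c_max."""
    return "1" * x + ("0" if x < c_max else "")


def code_mip_rank(rank, n_front, n_total):
    """Hypothetical scheme: '1' + truncated unary for the first n_front ranks,
    '0' + truncated binary over the remaining n_total - n_front ranks."""
    if rank < n_front:
        return "1" + truncated_unary(rank, n_front - 1)
    return "0" + truncated_binary(rank - n_front, n_total - n_front)
```

For 16 candidates with `n_front = 2`, the first two ranks cost 2 bits each, while the rest cost 4 or 5 bits.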
- the prediction unit 520 is specifically used to:
- the first prediction block and the second prediction block are weighted to obtain a prediction block of the current block.
- before performing weighting on the first prediction block and the second prediction block based on the weight of the optimal MIP mode and the weight of the first intra prediction mode to obtain the prediction block of the current block, the prediction unit 520 is also used to:
- if the first intra prediction mode includes the suboptimal MIP mode or the intra prediction mode derived by the TIMD mode, determine the weight of the optimal MIP mode and the weight of the first intra prediction mode based on the distortion cost of the optimal MIP mode and the distortion cost of the first intra prediction mode;
- if the first intra prediction mode includes the intra prediction mode derived by the DIMD mode, determine that the weight of the optimal MIP mode and the weight of the first intra prediction mode are both preset values.
- the prediction unit 520 is specifically used to:
- the optimal MIP mode is determined based on the distortion costs of the multiple MIP modes in each state of the third identifier.
- the prediction unit 520 is specifically used to:
- the optimal MIP mode is determined based on the distortion costs of the multiple MIP modes.
- the prediction unit 520 is specifically used to:
- the optimal MIP mode is determined based on the distortion costs of the multiple MIP modes.
- the prediction unit 520 is specifically used to:
- the optimal MIP mode is determined based on the distortion costs of the multiple MIP modes.
- before determining, based on the distortion costs of multiple matrix-based intra prediction (MIP) modes, the optimal MIP mode for predicting the current block, the prediction unit 520 is also used to:
- the MIP modes used by the adjacent blocks are determined as the multiple MIP modes.
- before determining, based on the distortion costs of multiple matrix-based intra prediction (MIP) modes, the optimal MIP mode for predicting the current block, the prediction unit 520 is also used to:
- Distortion costs for the plurality of MIP modes are determined based on the plurality of prediction blocks and reconstruction blocks within the template region.
- the prediction unit 520 is specifically used to:
- the output vectors of the multiple MIP modes are upsampled to obtain prediction blocks corresponding to the multiple MIP modes.
- the prediction unit 520 is specifically used to:
- the optimal MIP mode is determined based on the sum of absolute transformed differences (SATD) of the plurality of MIP modes on the template region.
- Figure 10 is a schematic block diagram of the encoder 600 according to the embodiment of the present application.
- the encoder 600 may include:
- the prediction unit 610 is used to:
- the distortion cost of the multiple MIP modes includes the distortion cost obtained by using the multiple MIP modes to predict samples in the template area adjacent to the current block;
- the first intra prediction mode includes at least one of the following: a suboptimal MIP mode for predicting the current block, determined based on the distortion costs of the plurality of MIP modes; an intra prediction mode derived by the decoder-side intra mode derivation (DIMD) mode; and an intra prediction mode derived by the template-based intra mode derivation (TIMD) mode;
- Residual unit 620 configured to obtain a residual block of the current block based on the prediction block of the current block and the original block of the current block;
- the encoding unit 630 is used to encode the residual block of the current block to obtain the code stream of the current sequence.
- the prediction unit 610 is specifically used to:
- if the first identifier indicates that the optimal MIP mode and the first intra prediction mode are allowed to be used to predict image blocks in the current sequence, determine the optimal MIP mode based on the distortion costs of the multiple MIP modes;
- the encoding unit 630 is specifically used for:
- Predicting the current block based on the optimal MIP mode and the first intra prediction mode to obtain a predicted block of the current block includes:
- the prediction unit 610 is specifically used to:
- if the first identifier indicates that the optimal MIP mode and the first intra prediction mode are allowed to be used to predict image blocks in the current sequence, predict the current block based on the optimal MIP mode and the first intra prediction mode to obtain the first rate-distortion cost;
- the prediction block obtained by predicting the current block based on the optimal MIP mode and the first intra prediction mode is determined as the prediction block of the current block;
- the encoding unit 630 is specifically used for:
- the second identifier indicates that the optimal MIP mode and the first intra prediction mode are allowed to be used to predict the current block; if the first rate-distortion cost is greater than the minimum value of the at least one rate-distortion cost, the second identifier indicates that the optimal MIP mode and the first intra prediction mode are not allowed to be used to predict the current block.
- the encoding unit 630 is specifically used to:
- the code stream is obtained by encoding the residual block of the current block and encoding the index of the optimal MIP mode based on the encoding method used in the optimal MIP mode.
- the codewords of the coding scheme used for the first n MIP modes in the arrangement order are shorter than the codewords of the coding scheme used for the MIP modes after the nth MIP mode in the arrangement order; and/or, the first n MIP modes use variable-length coding and the MIP modes after the nth MIP mode use truncated binary coding.
- the prediction unit 610 is specifically used to:
- the first prediction block and the second prediction block are weighted to obtain a prediction block of the current block.
- before performing weighting on the first prediction block and the second prediction block based on the weight of the optimal MIP mode and the weight of the first intra prediction mode to obtain the prediction block of the current block, the prediction unit 610 is also used to:
- if the first intra prediction mode includes the suboptimal MIP mode or the intra prediction mode derived by the TIMD mode, determine the weight of the optimal MIP mode and the weight of the first intra prediction mode based on the distortion cost of the optimal MIP mode and the distortion cost of the first intra prediction mode;
- if the first intra prediction mode includes the intra prediction mode derived by the DIMD mode, determine that the weight of the optimal MIP mode and the weight of the first intra prediction mode are both preset values.
- the prediction unit 610 is specifically used to:
- the optimal MIP mode is determined based on the distortion costs of the multiple MIP modes in each state of the third identifier.
- the prediction unit 610 is specifically used to:
- the optimal MIP mode is determined based on the distortion costs of the multiple MIP modes.
- the prediction unit 610 is specifically used to:
- the optimal MIP mode is determined based on the distortion costs of the multiple MIP modes.
- the prediction unit 610 is specifically used to:
- the optimal MIP mode is determined based on the distortion costs of the multiple MIP modes.
- before determining, based on the distortion costs of multiple matrix-based intra prediction (MIP) modes, the optimal MIP mode for predicting the current block in the current sequence, the prediction unit 610 is also used to:
- the MIP modes used by the adjacent blocks are determined as the multiple MIP modes.
- before determining, based on the distortion costs of multiple matrix-based intra prediction (MIP) modes, the optimal MIP mode for predicting the current block in the current sequence, the prediction unit 610 is also used to:
- Distortion costs for the plurality of MIP modes are determined based on the plurality of prediction blocks and reconstruction blocks within the template region.
- the prediction unit 610 is specifically used to:
- the output vectors of the multiple MIP modes are upsampled to obtain prediction blocks corresponding to the multiple MIP modes.
- the prediction unit 610 is specifically used to:
- the optimal MIP mode is determined based on the sum of absolute transformed differences (SATD) of the plurality of MIP modes on the template region.
- the device embodiments and the method embodiments may correspond to each other, and similar descriptions may refer to the method embodiments. To avoid repetition, they will not be repeated here.
- the decoder 500 shown in Figure 9 may correspond to the entity that performs the method 300 of the embodiments of the present application, and the foregoing and other operations and/or functions of each unit in the decoder 500 are respectively intended to implement the corresponding processes in the method 300 and the other methods.
- the encoder 600 shown in Figure 10 may correspond to the entity that performs the method 400 of the embodiments of the present application, that is, the foregoing and other operations and/or functions of each unit in the encoder 600 are respectively intended to implement the corresponding processes in the method 400 and the other methods.
- each unit in the decoder 500 or encoder 600 involved in the embodiments of the present application may be separately or entirely combined into one or several other units, or one or more of the units may be further split into multiple functionally smaller units, which achieves the same operations without affecting the technical effects of the embodiments of the present application.
- the above units are divided based on logical functions. In practical applications, the function of one unit can also be realized by multiple units, or the functions of multiple units can be realized by one unit. In other embodiments of the present application, the decoder 500 or the encoder 600 may also include other units. In practical applications, these functions may also be implemented with the assistance of other units, and may be implemented by multiple units in cooperation.
- the decoder 500 or encoder 600 involved in the embodiments of the present application may be constructed, and the encoding method or decoding method of the embodiments implemented, by running a computer program capable of executing each step of the corresponding method on a general-purpose computing device, such as a general-purpose computer that includes processing elements and storage elements like a central processing unit (CPU), a random access storage medium (RAM), and a read-only storage medium (ROM).
- the computer program can be recorded on, for example, a computer-readable storage medium, loaded into an electronic device through the computer-readable storage medium, and run therein to implement the corresponding methods of the embodiments of the present application.
- the units mentioned above can be implemented in the form of hardware, can also be implemented in the form of instructions in the form of software, or can be implemented in the form of a combination of software and hardware.
- each step of the method embodiments in the embodiments of the present application can be completed by integrated logic circuits of hardware in the processor and/or instructions in the form of software.
- the steps of the methods disclosed in conjunction with the embodiments of the present application may be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software in the decoding processor.
- the software may reside in a storage medium well established in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers.
- the storage medium is located in the memory, and the processor reads the information in the memory and completes the steps in the above method embodiment in combination with its hardware.
- FIG. 11 is a schematic structural diagram of an electronic device 700 provided by an embodiment of the present application.
- the electronic device 700 at least includes a processor 710 and a computer-readable storage medium 720 .
- the processor 710 and the computer-readable storage medium 720 may be connected through a bus or other means.
- the computer-readable storage medium 720 is used to store a computer program 721
- the computer program 721 includes computer instructions
- the processor 710 is used to execute the computer instructions stored in the computer-readable storage medium 720.
- the processor 710 is the computing core and the control core of the electronic device 700. It is suitable for implementing one or more computer instructions. Specifically, it is suitable for loading and executing one or more computer instructions to implement the corresponding method flow or corresponding functions.
- the processor 710 may also be called a central processing unit (Central Processing Unit, CPU).
- the processor 710 may include, but is not limited to: a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
- the computer-readable storage medium 720 may be a high-speed RAM memory or a non-volatile memory, such as at least one disk memory; optionally, it may also be at least one computer-readable storage medium located remotely from the processor 710.
- the computer-readable storage medium 720 includes, but is not limited to: volatile memory and/or non-volatile memory.
- non-volatile memory may be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), or electrically erasable programmable read-only memory (EEPROM).
- Volatile memory may be random access memory (RAM), which is used as an external cache. Many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchronous link dynamic random access memory (SLDRAM), and Direct Rambus RAM.
- the electronic device 700 may be the encoder or encoding framework involved in the embodiments of the present application; the computer-readable storage medium 720 stores first computer instructions, and the processor 710 loads and executes the first computer instructions stored in the computer-readable storage medium 720 to implement the corresponding steps of the encoding method provided by the embodiments of the present application. In other words, the processor 710 loads the first computer instructions in the computer-readable storage medium 720 and executes the corresponding steps, which are not repeated here.
- the electronic device 700 may be the decoder or decoding framework involved in the embodiments of the present application; the computer-readable storage medium 720 stores second computer instructions, and the processor 710 loads and executes the second computer instructions stored in the computer-readable storage medium 720 to implement the corresponding steps of the decoding method provided by the embodiments of the present application. In other words, the processor 710 loads the second computer instructions in the computer-readable storage medium 720 and executes the corresponding steps, which are not repeated here.
- embodiments of the present application also provide a coding and decoding system, including the above-mentioned encoder and decoder.
- embodiments of the present application also provide a computer-readable storage medium (Memory).
- the computer-readable storage medium is a memory device in the electronic device 700 and is used to store programs and data.
- computer-readable storage medium 720 may include a built-in storage medium in the electronic device 700, and of course may also include an extended storage medium supported by the electronic device 700.
- the computer-readable storage medium provides storage space that stores the operating system of the electronic device 700 .
- one or more computer instructions suitable for being loaded and executed by the processor 710 are also stored in the storage space. These computer instructions may be one or more computer programs 721 (including program codes).
- a computer program product or computer program is provided, the computer program product or computer program including computer instructions stored in a computer-readable storage medium.
- the data processing device 700 can be a computer.
- the processor 710 reads the computer instructions from the computer-readable storage medium 720.
- the processor 710 executes the computer instructions, so that the computer performs the encoding method or decoding method provided in the various optional implementations above.
- the computer program product includes one or more computer instructions.
- the computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable device.
- the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center in a wired manner (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (such as infrared, radio, or microwave).
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
Description
Claims (36)
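- A decoding method, applicable to a decoder, the method comprising: parsing a bitstream to obtain a residual block of a current block in a current sequence; determining, based on distortion costs of a plurality of matrix-based intra prediction (MIP) modes, an optimal MIP mode for predicting the current block, wherein the distortion costs of the plurality of MIP modes comprise distortion costs obtained by using the plurality of MIP modes to predict samples in a template region adjacent to the current block; determining a first intra prediction mode, wherein the first intra prediction mode comprises at least one of the following: a second-best MIP mode for predicting the current block, determined based on the distortion costs of the plurality of MIP modes; an intra prediction mode derived by a decoder-side intra mode derivation (DIMD) mode; or an intra prediction mode derived by a template-based intra mode derivation (TIMD) mode; predicting the current block based on the optimal MIP mode and the first intra prediction mode to obtain a prediction block of the current block; and obtaining a reconstructed block of the current block based on the residual block of the current block and the prediction block of the current block.
- The method according to claim 1, wherein the determining, based on distortion costs of a plurality of MIP modes, an optimal MIP mode for predicting the current block comprises: parsing the bitstream of the current sequence to obtain a first flag; and if the first flag indicates that the optimal MIP mode and the first intra prediction mode are allowed to be used to predict image blocks in the current sequence, determining the optimal MIP mode based on the distortion costs of the plurality of MIP modes.
- The method according to claim 2, wherein the determining the optimal MIP mode based on the distortion costs of the plurality of MIP modes if the first flag indicates that the optimal MIP mode and the first intra prediction mode are allowed to be used to predict image blocks in the current sequence comprises: if the first flag indicates that the optimal MIP mode and the first intra prediction mode are allowed to be used to predict image blocks in the current sequence, parsing the bitstream to obtain a second flag; and if the second flag indicates that the optimal MIP mode and the first intra prediction mode are allowed to be used to predict the current block, determining the optimal MIP mode based on the distortion costs of the plurality of MIP modes.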
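- The method according to claim 1, further comprising: determining an order of the plurality of MIP modes based on the distortion costs of the MIP modes; determining, based on the order of the plurality of MIP modes, a coding scheme used for the optimal MIP mode; and decoding the bitstream of the current sequence based on the coding scheme used for the optimal MIP mode to obtain an index of the optimal MIP mode.
- The method according to claim 4, wherein the codewords of the coding scheme used for the first n MIP modes in the order are shorter than the codewords of the coding scheme used for the MIP modes after the n-th MIP mode in the order; and/or the first n MIP modes use a variable-length coding scheme and the MIP modes after the n-th MIP mode use a truncated binary coding scheme.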
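- The method according to any one of claims 1 to 5, wherein the predicting the current block based on the optimal MIP mode and the first intra prediction mode to obtain a prediction block of the current block comprises: predicting the current block based on the optimal MIP mode to obtain a first prediction block; predicting the current block based on the first intra prediction mode to obtain a second prediction block; and weighting the first prediction block and the second prediction block based on a weight of the optimal MIP mode and a weight of the first intra prediction mode to obtain the prediction block of the current block.
- The method according to claim 6, wherein before the weighting the first prediction block and the second prediction block based on the weight of the optimal MIP mode and the weight of the first intra prediction mode to obtain the prediction block of the current block, the method further comprises: if the first intra prediction mode comprises the second-best MIP mode or the intra prediction mode derived by the TIMD mode, determining the weight of the optimal MIP mode and the weight of the first intra prediction mode based on the distortion cost of the optimal MIP mode and the distortion cost of the first intra prediction mode; and if the first intra prediction mode comprises the intra prediction mode derived by the DIMD mode, determining that the weight of the optimal MIP mode and the weight of the first intra prediction mode are both preset values.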
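- The method according to any one of claims 1 to 7, wherein the determining, based on distortion costs of a plurality of MIP modes, an optimal MIP mode for predicting the current block comprises: predicting samples in the template region based on a third flag and the plurality of MIP modes to obtain the distortion costs of the plurality of MIP modes in each state of the third flag, wherein the third flag indicates whether the input vector and the output vector of a MIP mode are transposed; and determining the optimal MIP mode based on the distortion costs of the plurality of MIP modes in each state of the third flag.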
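- The method according to any one of claims 1 to 8, wherein the determining, based on distortion costs of a plurality of MIP modes, an optimal MIP mode for predicting the current block comprises: if the size of the current block is a preset size, determining the optimal MIP mode based on the distortion costs of the plurality of MIP modes.
- The method according to claim 9, wherein the determining the optimal MIP mode based on the distortion costs of the plurality of MIP modes if the size of the current block is a preset size comprises: if the picture in which the current block is located is an I frame and the size of the current block is the preset size, determining the optimal MIP mode based on the distortion costs of the plurality of MIP modes.
- The method according to any one of claims 1 to 8, wherein the determining, based on distortion costs of a plurality of MIP modes, an optimal MIP mode for predicting the current block comprises: if the picture in which the current block is located is a B frame, determining the optimal MIP mode based on the distortion costs of the plurality of MIP modes.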
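- The method according to any one of claims 1 to 11, wherein before the determining, based on distortion costs of a plurality of MIP modes, an optimal MIP mode for predicting the current block, the method further comprises: obtaining the MIP modes used by neighboring blocks adjacent to the current block; and determining the MIP modes used by the neighboring blocks as the plurality of MIP modes.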
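- The method according to any one of claims 1 to 12, wherein before the determining, based on distortion costs of a plurality of MIP modes, an optimal MIP mode for predicting the current block, the method further comprises: filling, with reconstructed samples, a reference region adjacent to the outside of the template region to obtain a reference row and a reference column of the template region; using the reference row and the reference column as input, predicting the samples in the template region with each of the plurality of MIP modes to obtain a plurality of prediction blocks corresponding to the plurality of MIP modes; and determining the distortion costs of the plurality of MIP modes based on the plurality of prediction blocks and the reconstructed block in the template region.
- The method according to claim 13, wherein the using the reference row and the reference column as input, predicting the samples in the template region with each of the plurality of MIP modes to obtain a plurality of prediction blocks corresponding to the plurality of MIP modes comprises: downsampling the reference row and the reference column to obtain an input vector; using the input vector as input, predicting the samples in the template region by traversing the plurality of MIP modes to obtain output vectors of the plurality of MIP modes; and upsampling the output vectors of the plurality of MIP modes to obtain the prediction blocks corresponding to the plurality of MIP modes.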
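- The method according to any one of claims 1 to 14, wherein the determining, based on distortion costs of a plurality of MIP modes, an optimal MIP mode for predicting the current block comprises: determining the optimal MIP mode based on the sum of absolute transformed differences (SATD) of each of the plurality of MIP modes on the template region.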
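- An encoding method, applicable to an encoder, the method comprising: determining, based on distortion costs of a plurality of matrix-based intra prediction (MIP) modes, an optimal MIP mode for predicting a current block in a current sequence, wherein the distortion costs of the plurality of MIP modes comprise distortion costs obtained by using the plurality of MIP modes to predict samples in a template region adjacent to the current block; determining a first intra prediction mode, wherein the first intra prediction mode comprises at least one of the following: a second-best MIP mode for predicting the current block, determined based on the distortion costs of the plurality of MIP modes; an intra prediction mode derived by a decoder-side intra mode derivation (DIMD) mode; or an intra prediction mode derived by a template-based intra mode derivation (TIMD) mode; predicting the current block based on the optimal MIP mode and the first intra prediction mode to obtain a prediction block of the current block; obtaining a residual block of the current block based on the prediction block of the current block and an original block of the current block; and encoding the residual block of the current block to obtain a bitstream of the current sequence.
- The method according to claim 16, wherein the determining, based on distortion costs of a plurality of MIP modes, an optimal MIP mode for predicting a current block in a current sequence comprises: obtaining a first flag; and if the first flag indicates that the optimal MIP mode and the first intra prediction mode are allowed to be used to predict image blocks in the current sequence, determining the optimal MIP mode based on the distortion costs of the plurality of MIP modes; and wherein the encoding the residual block of the current block to obtain a bitstream of the current sequence comprises: encoding the residual block of the current block and the first flag to obtain the bitstream.
- The method according to claim 17, wherein the predicting the current block based on the optimal MIP mode and the first intra prediction mode to obtain a prediction block of the current block comprises: if the first flag indicates that the optimal MIP mode and the first intra prediction mode are allowed to be used to predict image blocks in the current sequence, predicting the current block based on the optimal MIP mode and the first intra prediction mode to obtain a first rate-distortion cost; predicting the current block based on at least one intra prediction mode to obtain at least one rate-distortion cost; and if the first rate-distortion cost is less than or equal to the minimum of the at least one rate-distortion cost, determining the prediction block obtained by predicting the current block based on the optimal MIP mode and the first intra prediction mode as the prediction block of the current block; and wherein the encoding the residual block of the current block and the first flag to obtain the bitstream comprises: encoding the residual block of the current block, the first flag, and a second flag to obtain the bitstream, wherein if the first rate-distortion cost is less than or equal to the minimum of the at least one rate-distortion cost, the second flag indicates that the optimal MIP mode and the first intra prediction mode are allowed to be used to predict the current block, and if the first rate-distortion cost is greater than the minimum of the at least one rate-distortion cost, the second flag indicates that the optimal MIP mode and the first intra prediction mode are not allowed to be used to predict the current block.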
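- The method according to claim 16, wherein the encoding the residual block of the current block to obtain a bitstream of the current sequence comprises: determining an order of the plurality of MIP modes based on the distortion costs of the MIP modes; determining, based on the order of the plurality of MIP modes, a coding scheme used for the optimal MIP mode; and encoding the residual block of the current block, and encoding an index of the optimal MIP mode based on the coding scheme used for the optimal MIP mode, to obtain the bitstream.
- The method according to claim 19, wherein the codewords of the coding scheme used for the first n MIP modes in the order are shorter than the codewords of the coding scheme used for the MIP modes after the n-th MIP mode in the order; and/or the first n MIP modes use a variable-length coding scheme and the MIP modes after the n-th MIP mode use a truncated binary coding scheme.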
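- The method according to any one of claims 16 to 20, wherein the predicting the current block based on the optimal MIP mode and the first intra prediction mode to obtain a prediction block of the current block comprises: predicting the current block based on the optimal MIP mode to obtain a first prediction block; predicting the current block based on the first intra prediction mode to obtain a second prediction block; and weighting the first prediction block and the second prediction block based on a weight of the optimal MIP mode and a weight of the first intra prediction mode to obtain the prediction block of the current block.
- The method according to claim 21, wherein before the weighting the first prediction block and the second prediction block based on the weight of the optimal MIP mode and the weight of the first intra prediction mode to obtain the prediction block of the current block, the method further comprises: if the first intra prediction mode comprises the second-best MIP mode or the intra prediction mode derived by the TIMD mode, determining the weight of the optimal MIP mode and the weight of the first intra prediction mode based on the distortion cost of the optimal MIP mode and the distortion cost of the first intra prediction mode; and if the first intra prediction mode comprises the intra prediction mode derived by the DIMD mode, determining that the weight of the optimal MIP mode and the weight of the first intra prediction mode are both preset values.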
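- The method according to any one of claims 16 to 22, wherein the determining, based on distortion costs of a plurality of MIP modes, an optimal MIP mode for predicting a current block in a current sequence comprises: predicting samples in the template region based on a third flag and the plurality of MIP modes to obtain the distortion costs of the plurality of MIP modes in each state of the third flag, wherein the third flag indicates whether the input vector and the output vector of a MIP mode are transposed; and determining the optimal MIP mode based on the distortion costs of the plurality of MIP modes in each state of the third flag.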
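- The method according to any one of claims 16 to 23, wherein the determining, based on distortion costs of a plurality of MIP modes, an optimal MIP mode for predicting a current block in a current sequence comprises: if the size of the current block is a preset size, determining the optimal MIP mode based on the distortion costs of the plurality of MIP modes.
- The method according to claim 24, wherein the determining the optimal MIP mode based on the distortion costs of the plurality of MIP modes if the size of the current block is a preset size comprises: if the picture in which the current block is located is an I frame and the size of the current block is the preset size, determining the optimal MIP mode based on the distortion costs of the plurality of MIP modes.
- The method according to any one of claims 16 to 23, wherein the determining, based on distortion costs of a plurality of MIP modes, an optimal MIP mode for predicting a current block in a current sequence comprises: if the picture in which the current block is located is a B frame, determining the optimal MIP mode based on the distortion costs of the plurality of MIP modes.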
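- The method according to any one of claims 16 to 26, wherein before the determining, based on distortion costs of a plurality of MIP modes, an optimal MIP mode for predicting a current block in a current sequence, the method further comprises: obtaining the MIP modes used by neighboring blocks adjacent to the current block; and determining the MIP modes used by the neighboring blocks as the plurality of MIP modes.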
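- The method according to any one of claims 16 to 27, wherein before the determining, based on distortion costs of a plurality of MIP modes, an optimal MIP mode for predicting a current block in a current sequence, the method further comprises: filling, with reconstructed samples, a reference region adjacent to the outside of the template region to obtain a reference row and a reference column of the template region; using the reference row and the reference column as input, predicting the samples in the template region with each of the plurality of MIP modes to obtain a plurality of prediction blocks corresponding to the plurality of MIP modes; and determining the distortion costs of the plurality of MIP modes based on the plurality of prediction blocks and the reconstructed block in the template region.
- The method according to claim 28, wherein the using the reference row and the reference column as input, predicting the samples in the template region with each of the plurality of MIP modes to obtain a plurality of prediction blocks corresponding to the plurality of MIP modes comprises: downsampling the reference row and the reference column to obtain an input vector; using the input vector as input, predicting the samples in the template region by traversing the plurality of MIP modes to obtain output vectors of the plurality of MIP modes; and upsampling the output vectors of the plurality of MIP modes to obtain the prediction blocks corresponding to the plurality of MIP modes.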
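- The method according to any one of claims 16 to 29, wherein the determining, based on distortion costs of a plurality of MIP modes, an optimal MIP mode for predicting a current block in a current sequence comprises: determining the optimal MIP mode based on the sum of absolute transformed differences (SATD) of each of the plurality of MIP modes on the template region.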
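- A decoder, comprising: a parsing unit, configured to parse a bitstream to obtain a residual block of a current block in a current sequence; a prediction unit, configured to: determine, based on distortion costs of a plurality of matrix-based intra prediction (MIP) modes, an optimal MIP mode for predicting the current block, wherein the distortion costs of the plurality of MIP modes comprise distortion costs obtained by using the plurality of MIP modes to predict samples in a template region adjacent to the current block; determine a first intra prediction mode, wherein the first intra prediction mode comprises at least one of the following: a second-best MIP mode for predicting the current block, determined based on the distortion costs of the plurality of MIP modes; an intra prediction mode derived by a decoder-side intra mode derivation (DIMD) mode; or an intra prediction mode derived by a template-based intra mode derivation (TIMD) mode; and predict the current block based on the optimal MIP mode and the first intra prediction mode to obtain a prediction block of the current block; and a reconstruction unit, configured to obtain a reconstructed block of the current block based on the residual block of the current block and the prediction block of the current block.
- An encoder, comprising: a prediction unit, configured to: determine, based on distortion costs of a plurality of matrix-based intra prediction (MIP) modes, an optimal MIP mode for predicting a current block in a current sequence, wherein the distortion costs of the plurality of MIP modes comprise distortion costs obtained by using the plurality of MIP modes to predict samples in a template region adjacent to the current block; determine a first intra prediction mode, wherein the first intra prediction mode comprises at least one of the following: a second-best MIP mode for predicting the current block, determined based on the distortion costs of the plurality of MIP modes; an intra prediction mode derived by a decoder-side intra mode derivation (DIMD) mode; or an intra prediction mode derived by a template-based intra mode derivation (TIMD) mode; and predict the current block based on the optimal MIP mode and the first intra prediction mode to obtain a prediction block of the current block; a residual unit, configured to obtain a residual block of the current block based on the prediction block of the current block and an original block of the current block; and an encoding unit, configured to encode the residual block of the current block to obtain a bitstream of the current sequence.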
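- An electronic device, comprising: a processor, adapted to execute a computer program; and a computer-readable storage medium storing a computer program which, when executed by the processor, implements the method according to any one of claims 1 to 15 or the method according to any one of claims 16 to 30.
- A computer-readable storage medium, configured to store a computer program, wherein the computer program causes a computer to perform the method according to any one of claims 1 to 15 or the method according to any one of claims 16 to 30.
- A computer program product, comprising a computer program/instructions, wherein the computer program/instructions, when executed by a processor, implement the method according to any one of claims 1 to 15 or the method according to any one of claims 16 to 30.
- A bitstream, wherein the bitstream is the bitstream in the method according to any one of claims 1 to 15, or a bitstream generated by the method according to any one of claims 16 to 30.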
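The template prediction recited in claims 13-14 (and 28-29) follows the general MIP data flow: downsample the reference row and column into an input vector, multiply by a mode-specific matrix, and upsample the reduced output to the template size. The sketch below is a simplified, hypothetical illustration of that flow only. The per-side averaging to 2 samples, the toy matrix, the 6-bit rounding shift, and the nearest-neighbour upsampling are all assumptions (VVC's MIP uses trained matrices with additional offset handling and linear interpolation for upsampling), and the function names are illustrative, not taken from the source.

```python
from typing import List, Sequence

Matrix = List[List[int]]


def downsample(boundary: Sequence[int], out_len: int) -> List[int]:
    """Average equal-sized groups of boundary samples (with rounding).

    Assumes len(boundary) is a multiple of out_len.
    """
    step = len(boundary) // out_len
    return [
        (sum(boundary[i * step:(i + 1) * step]) + step // 2) // step
        for i in range(out_len)
    ]


def matvec(matrix: Matrix, vec: Sequence[int], shift: int = 6) -> List[int]:
    """Integer matrix-vector product with a rounding right shift (toy fixed point)."""
    rnd = 1 << (shift - 1)
    return [(sum(m * v for m, v in zip(row, vec)) + rnd) >> shift for row in matrix]


def upsample_nearest(reduced: Sequence[int], red_h: int, red_w: int,
                     out_h: int, out_w: int) -> Matrix:
    """Replicate each reduced sample over an (out_h/red_h) x (out_w/red_w) patch.

    Nearest-neighbour replication is a simplification, not MIP's interpolation.
    """
    sy, sx = out_h // red_h, out_w // red_w
    return [[reduced[(y // sy) * red_w + x // sx] for x in range(out_w)]
            for y in range(out_h)]


def mip_like_predict(top: Sequence[int], left: Sequence[int], matrix: Matrix,
                     red_h: int, red_w: int, out_h: int, out_w: int,
                     per_side: int = 2) -> Matrix:
    """Downsample -> matrix multiply -> upsample for one hypothetical mode matrix."""
    input_vec = downsample(top, per_side) + downsample(left, per_side)
    reduced = matvec(matrix, input_vec)  # one value per reduced-grid sample
    return upsample_nearest(reduced, red_h, red_w, out_h, out_w)
```

Running this once per candidate matrix gives one template prediction per MIP mode, which is what the cost comparison in the claims consumes.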
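Claims 15 and 30 select the optimal MIP mode by the sum of absolute transformed differences (SATD) between each mode's template prediction and the reconstructed template. The following is a minimal sketch of a Hadamard-based SATD, assuming 4x4 tiles and an unnormalized transform. Encoders differ in transform size and scaling, so the absolute values are only comparable within one implementation; `satd` and `satd_4x4` are illustrative names, not from the source.

```python
from typing import List

Block = List[List[int]]

# Unnormalized 4x4 Hadamard matrix (entries +1/-1).
H4 = [
    [1, 1, 1, 1],
    [1, -1, 1, -1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
]


def _matmul(a: Block, b: Block) -> Block:
    n, m, p = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)] for i in range(n)]


def _transpose(a: Block) -> Block:
    return [list(row) for row in zip(*a)]


def satd_4x4(diff: Block) -> int:
    """Sum of |H * D * H^T| for one 4x4 difference block (no normalization)."""
    coeffs = _matmul(_matmul(H4, diff), _transpose(H4))
    return sum(abs(c) for row in coeffs for c in row)


def satd(pred: Block, recon: Block) -> int:
    """SATD between a template prediction and the reconstructed template.

    Both blocks must have the same shape, with height and width multiples of 4.
    """
    h, w = len(recon), len(recon[0])
    if h % 4 or w % 4:
        raise ValueError("block dimensions must be multiples of 4")
    total = 0
    for y in range(0, h, 4):
        for x in range(0, w, 4):
            diff = [[recon[y + i][x + j] - pred[y + i][x + j] for j in range(4)]
                    for i in range(4)]
            total += satd_4x4(diff)
    return total
```

Compared with a plain SAD, the transform concentrates a uniform offset into a single DC coefficient, so SATD tends to track the residual's coding cost more closely.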
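Claims 1, 8, and 12 combine into a simple search: take the candidate MIP modes (for example, the modes used by neighbouring blocks), evaluate each one on the template under both states of the transpose flag, and sort by cost. The first entry gives the optimal MIP mode and the next entry a second-best candidate. In this sketch, `predict_template` is a hypothetical callback that stands in for the template prediction, and the cost function is passed in (SATD in the claims; the test uses a plain SAD for brevity). The claims do not state whether the second-best candidate may be the same mode with the other transpose state, so the sketch simply takes the next entry.

```python
from typing import Callable, List, Sequence, Tuple

Block = List[List[int]]
# (template cost, MIP mode index, transpose flag state)
Candidate = Tuple[int, int, bool]


def rank_mip_modes(
    modes: Sequence[int],
    predict_template: Callable[[int, bool], Block],
    template_recon: Block,
    cost: Callable[[Block, Block], int],
) -> List[Candidate]:
    """Evaluate every candidate mode under both transpose states; sort by cost.

    predict_template(mode, transposed) is a hypothetical hook returning the
    template prediction of `mode` with or without transposed input/output vectors.
    """
    candidates: List[Candidate] = []
    for mode in modes:
        for transposed in (False, True):
            pred = predict_template(mode, transposed)
            candidates.append((cost(pred, template_recon), mode, transposed))
    # Sort on cost only; Python's sort is stable, so ties keep traversal order.
    candidates.sort(key=lambda c: c[0])
    return candidates


def best_and_second(candidates: List[Candidate]) -> Tuple[Candidate, Candidate]:
    """Return the optimal and second-best entries of a cost-sorted list."""
    if len(candidates) < 2:
        raise ValueError("need at least two candidates")
    return candidates[0], candidates[1]


def sad(a: Block, b: Block) -> int:
    """Sum of absolute differences, used here only as a lightweight stand-in cost."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
```

Because only already-reconstructed template samples are involved, an encoder and a decoder running the same search on the same data obtain the same ranking.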
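Claims 6-7 (and 21-22) blend the two prediction blocks. When the second mode is the second-best MIP mode or a TIMD-derived mode, the weights depend on the template costs; when the second mode is DIMD-derived, they are preset. The claims give neither the cost formula nor the preset values. This sketch therefore assumes a common inverse-cost rule, w_mip = cost_other / (cost_mip + cost_other), quantized to 1/64 units, and a hypothetical equal preset of 32/64 each for the DIMD case.

```python
from typing import List, Optional, Tuple

Block = List[List[int]]
Weights = Tuple[int, int]

SHIFT = 6                 # weights in units of 1/64 (assumption)
TOTAL = 1 << SHIFT
PRESET_DIMD_WEIGHTS: Weights = (TOTAL // 2, TOTAL // 2)  # hypothetical preset


def blend_weights(cost_mip: Optional[int], cost_other: Optional[int],
                  other_is_dimd: bool) -> Weights:
    """Return (w_mip, w_other) with w_mip + w_other == TOTAL."""
    if other_is_dimd or cost_mip is None or cost_other is None:
        return PRESET_DIMD_WEIGHTS
    denom = cost_mip + cost_other
    if denom == 0:
        return (TOTAL // 2, TOTAL // 2)
    # Lower cost -> larger weight: each mode is weighted by the *other* mode's cost.
    w_mip = (cost_other * TOTAL + denom // 2) // denom
    return (w_mip, TOTAL - w_mip)


def blend(pred_mip: Block, pred_other: Block, weights: Weights) -> Block:
    """Per-sample weighted average with rounding."""
    w_mip, w_other = weights
    rnd = 1 << (SHIFT - 1)
    return [
        [(w_mip * a + w_other * b + rnd) >> SHIFT for a, b in zip(ra, rb)]
        for ra, rb in zip(pred_mip, pred_other)
    ]
```

For example, template costs of 10 (optimal MIP) and 30 (second mode) give weights of 48/64 and 16/64, so the better-matching mode dominates the fused prediction.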
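Claims 4-5 (and 19-20) code the optimal MIP mode's position in the cost-sorted list so that the first n positions get shorter, variable-length codewords and the remaining positions use truncated binary coding. Below is one hypothetical binarization with that property: a unary code for ranks below n, and for the rest an escape prefix of n ones followed by a truncated binary code over the remaining ranks. The prefix design, the choice of n, and the function names are assumptions. The claims fix only the length relationship and the coding families, and entropy coding of the bins is not modelled.

```python
def truncated_binary(value: int, num_symbols: int) -> str:
    """Truncated binary code of value in [0, num_symbols)."""
    if not 0 <= value < num_symbols:
        raise ValueError("value out of range")
    if num_symbols == 1:
        return ""
    k = num_symbols.bit_length() - 1        # floor(log2(num_symbols))
    u = (1 << (k + 1)) - num_symbols        # how many symbols get k bits
    if value < u:
        return format(value, f"0{k}b")
    return format(value + u, f"0{k + 1}b")


def encode_rank(rank: int, num_modes: int, n: int) -> str:
    """Codeword for the optimal mode's position `rank` in the cost-sorted list.

    Ranks < n: unary ('1' * rank + '0'), lengths 1..n.
    Ranks >= n: '1' * n escape prefix + truncated binary over num_modes - n ranks.
    The first n codewords are strictly shorter whenever num_modes - n >= 2.
    """
    if not 0 <= rank < num_modes:
        raise ValueError("rank out of range")
    if not 0 < n < num_modes:
        raise ValueError("n must satisfy 0 < n < num_modes")
    if rank < n:
        return "1" * rank + "0"
    return "1" * n + truncated_binary(rank - n, num_modes - n)


def decode_rank(bits: str, num_modes: int, n: int) -> int:
    """Inverse of encode_rank; `bits` must contain at least the whole codeword."""
    pos = 0
    while pos < n and bits[pos] == "1":
        pos += 1
    if pos < n:                 # a '0' terminated the unary part
        return pos
    m = num_modes - n
    if m == 1:
        return n
    k = m.bit_length() - 1
    u = (1 << (k + 1)) - m
    value = int(bits[pos:pos + k], 2)
    if value < u:
        return n + value
    return n + int(bits[pos:pos + k + 1], 2) - u
```

Because the decoder can rebuild the same cost-sorted list from the template, it can map the decoded rank back to a mode index; a good template ranking makes the short codewords the common case.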
Priority Applications (7)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP22936191.0A EP4507306A4 (en) | 2022-04-08 | 2022-04-08 | DECODING METHOD, CODING METHOD, DECODER AND ENCODER |
| JP2024558329A JP2025511314A (ja) | 2022-04-08 | 2022-04-08 | Decoding method, encoding method, decoder, and encoder |
| CN202280094571.9A CN118985131A (zh) | 2022-04-08 | 2022-04-08 | Decoding method, encoding method, decoder, and encoder |
| PCT/CN2022/085897 WO2023193253A1 (zh) | 2022-04-08 | 2022-04-08 | Decoding method, encoding method, decoder, and encoder |
| KR1020247036730A KR20250002334A (ko) | 2022-04-08 | 2022-04-08 | Decoding method, encoding method, decoder, and encoder |
| US18/898,361 US20250030845A1 (en) | 2022-04-08 | 2024-09-26 | Decoding method, encoding method, and storage medium |
| MX2024012273A MX2024012273A (es) | 2022-04-08 | 2024-10-03 | Decoding method, encoding method, decoder, and encoder |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2022/085897 WO2023193253A1 (zh) | 2022-04-08 | 2022-04-08 | Decoding method, encoding method, decoder, and encoder |
Related Child Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US18/898,361 Continuation US20250030845A1 (en) | 2022-04-08 | 2024-09-26 | Decoding method, encoding method, and storage medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2023193253A1 true WO2023193253A1 (zh) | 2023-10-12 |
Family
ID=88243836
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2022/085897 Ceased WO2023193253A1 (zh) | 2022-04-08 | 2022-04-08 | Decoding method, encoding method, decoder, and encoder |
Country Status (7)
| Country | Link |
|---|---|
| US (1) | US20250030845A1 (zh) |
| EP (1) | EP4507306A4 (zh) |
| JP (1) | JP2025511314A (zh) |
| KR (1) | KR20250002334A (zh) |
| CN (1) | CN118985131A (zh) |
| MX (1) | MX2024012273A (zh) |
| WO (1) | WO2023193253A1 (zh) |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20230336718A1 (en) * | 2022-04-18 | 2023-10-19 | Comcast Cable Communications, Llc | Video Compression Using Template-Based Determination of Intra Prediction Mode |
| WO2025148127A1 (en) * | 2024-01-08 | 2025-07-17 | Beijing Xiaomi Mobile Software Co., Ltd. | Encoding/decoding video picture data |
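| WO2025213371A1 (zh) * | 2024-04-09 | 2025-10-16 | Oppo广东移动通信有限公司 | Encoding and decoding method, bitstream, encoder, decoder, and storage medium |
| WO2025213370A1 (zh) * | 2024-04-09 | 2025-10-16 | Oppo广东移动通信有限公司 | Encoding and decoding method, bitstream, encoder, decoder, and storage medium |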
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2025511315A (ja) * | 2022-04-08 | 2025-04-15 | オッポ広東移動通信有限公司 | Decoding method, encoding method, decoder, and encoder |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20190166370A1 (en) * | 2016-05-06 | 2019-05-30 | Vid Scale, Inc. | Method and system for decoder-side intra mode derivation for block-based video coding |
| CN111050183A (zh) * | 2019-12-13 | 2020-04-21 | 浙江大华技术股份有限公司 | Intra prediction method, encoder, and storage medium |
| CN112532976A (zh) * | 2019-09-18 | 2021-03-19 | 夏普株式会社 | Moving image decoding device and moving image encoding device |
Family Cites Families (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2018070790A1 (ko) * | 2016-10-14 | 2018-04-19 | 세종대학교 산학협력단 | Image encoding/decoding method and apparatus |
| AU2019204437B2 (en) * | 2019-06-24 | 2022-02-03 | Canon Kabushiki Kaisha | Method, apparatus and system for encoding and decoding a block of video samples |
| CN119402667B (zh) * | 2019-07-07 | 2026-02-03 | Oppo广东移动通信有限公司 | Image prediction method, encoder, decoder, and storage medium |
| PH12022553231A1 (en) * | 2020-06-03 | 2024-02-12 | Nokia Technologies Oy | A method, an apparatus and a computer program product for video encoding and video decoding |
| CN117981300A (zh) * | 2021-09-27 | 2024-05-03 | Oppo广东移动通信有限公司 | Encoding and decoding method, bitstream, encoder, decoder, and storage medium |
| EP4454265A1 (en) * | 2021-12-21 | 2024-10-30 | InterDigital CE Patent Holdings, SAS | Most probable mode list generation with template-based intra mode derivation and decoder-side intra mode derivation |
| WO2023138543A1 (en) * | 2022-01-19 | 2023-07-27 | Beijing Bytedance Network Technology Co., Ltd. | Method, apparatus, and medium for video processing |
| JP2025511315A (ja) * | 2022-04-08 | 2025-04-15 | オッポ広東移動通信有限公司 | Decoding method, encoding method, decoder, and encoder |
| CN119547440A (zh) * | 2022-07-04 | 2025-02-28 | Oppo广东移动通信有限公司 | Decoding method, encoding method, decoder, and encoder |
2022
- 2022-04-08 CN CN202280094571.9A patent/CN118985131A/zh active Pending
- 2022-04-08 JP JP2024558329A patent/JP2025511314A/ja active Pending
- 2022-04-08 WO PCT/CN2022/085897 patent/WO2023193253A1/zh not_active Ceased
- 2022-04-08 EP EP22936191.0A patent/EP4507306A4/en active Pending
- 2022-04-08 KR KR1020247036730A patent/KR20250002334A/ko active Pending
2024
- 2024-09-26 US US18/898,361 patent/US20250030845A1/en active Pending
- 2024-10-03 MX MX2024012273A patent/MX2024012273A/es unknown
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP4507306A4 * |
Cited By (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20230336718A1 (en) * | 2022-04-18 | 2023-10-19 | Comcast Cable Communications, Llc | Video Compression Using Template-Based Determination of Intra Prediction Mode |
| US12457328B2 (en) * | 2022-04-18 | 2025-10-28 | Comcast Cable Communications, Llc | Video compression using template-based determination of intra prediction mode |
| WO2025148127A1 (en) * | 2024-01-08 | 2025-07-17 | Beijing Xiaomi Mobile Software Co., Ltd. | Encoding/decoding video picture data |
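| WO2025213371A1 (zh) * | 2024-04-09 | 2025-10-16 | Oppo广东移动通信有限公司 | Encoding and decoding method, bitstream, encoder, decoder, and storage medium |
| WO2025213370A1 (zh) * | 2024-04-09 | 2025-10-16 | Oppo广东移动通信有限公司 | Encoding and decoding method, bitstream, encoder, decoder, and storage medium |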
Also Published As
| Publication number | Publication date |
|---|---|
| KR20250002334A (ko) | 2025-01-07 |
| EP4507306A4 (en) | 2026-02-11 |
| CN118985131A (zh) | 2024-11-19 |
| US20250030845A1 (en) | 2025-01-23 |
| MX2024012273A (es) | 2024-11-08 |
| JP2025511314A (ja) | 2025-04-15 |
| EP4507306A1 (en) | 2025-02-12 |
Similar Documents
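| Publication | Publication Date | Title |
|---|---|---|
| WO2023193253A1 (zh) | | Decoding method, encoding method, decoder, and encoder |
| CN114868386B (zh) | | Encoding method, decoding method, encoder, decoder, and electronic device |
| WO2024007116A1 (zh) | | Decoding method, encoding method, decoder, and encoder |
| CN116601957A (zh) | | Intra prediction method and apparatus, decoder, and encoder |
| CN116686288A (zh) | | Encoding method, decoding method, encoder, decoder, and electronic device |
| US20250024027A1 (en) | | Decoding method, encoding method, and storage medium |
| US12395627B2 (en) | | Intra prediction method and decoder |
| CN118044184A (zh) | | Method and system for performing combined inter prediction and intra prediction |
| WO2023197181A1 (zh) | | Decoding method, encoding method, decoder, and encoder |
| WO2023197179A1 (zh) | | Decoding method, encoding method, decoder, and encoder |
| RU2852150C2 (ru) | | Decoding method, encoding method, decoder, and encoder |
| RU2859876C2 (ru) | | Decoding method, encoding method, and non-volatile machine-readable data storage medium |
| WO2025236218A1 (zh) | | Encoding and decoding method, bitstream, encoder, decoder, and storage medium |
| JP2026500444A (ja) | | Decoding method, encoding method, decoding device, and encoding device |
| WO2025091378A1 (zh) | | Encoding and decoding method, codec, and storage medium |
| WO2025073085A1 (zh) | | Encoding and decoding method, codec, and storage medium |
| WO2025213368A1 (zh) | | Encoding and decoding method, bitstream, encoder, decoder, and storage medium |
| WO2025123197A1 (zh) | | Encoding and decoding method, codec, bitstream, and storage medium |
| WO2025213371A1 (zh) | | Encoding and decoding method, bitstream, encoder, decoder, and storage medium |
| WO2024197744A9 (zh) | | Decoding method, encoding method, decoder, and encoder |
| WO2025213396A1 (zh) | | Encoding and decoding method, bitstream, encoder, decoder, and storage medium |
| WO2025098301A1 (zh) | | Filtering method and apparatus, electronic device, and storage medium |
| WO2023197180A1 (zh) | | Decoding method, encoding method, decoder, and encoder |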
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 22936191; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | WIPO information: entry into national phase | Ref document number: 2024558329; Country of ref document: JP |
| | WWE | WIPO information: entry into national phase | Ref document number: MX/A/2024/012273; Country of ref document: MX |
| | WWE | WIPO information: entry into national phase | Ref document number: 202280094571.9; Country of ref document: CN |
| | REG | Reference to national code | Ref country code: BR; Ref legal event code: B01A; Ref document number: 112024020451; Country of ref document: BR |
| | WWE | WIPO information: entry into national phase | Ref document number: 1020247036730; Country of ref document: KR |
| | WWE | WIPO information: entry into national phase | Ref document number: 202417085286; Country of ref document: IN |
| | WWE | WIPO information: entry into national phase | Ref document number: 2024133297; Country of ref document: RU; Ref document number: 2022936191; Country of ref document: EP |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | ENP | Entry into the national phase | Ref document number: 2022936191; Country of ref document: EP; Effective date: 20241108 |
| | WWP | WIPO information: published in national office | Ref document number: 2024133297; Country of ref document: RU |
| | ENP | Entry into the national phase | Ref document number: 112024020451; Country of ref document: BR; Kind code of ref document: A2; Effective date: 20241002 |
| | WWG | WIPO information: grant in national office | Ref document number: 2024133297; Country of ref document: RU |

