WO2023193254A1 - Decoding method, encoding method, decoder, and encoder - Google Patents
Decoding method, encoding method, decoder, and encoder
- Publication number
- WO2023193254A1 (PCT/CN2022/085898)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- mip
- mode
- current block
- optimal
- block
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/103—Selection of coding mode or of prediction mode
- H04N19/109—Selection of coding mode or of prediction mode among a plurality of temporal predictive coding modes
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/103—Selection of coding mode or of prediction mode
- H04N19/11—Selection of coding mode or of prediction mode among a plurality of spatial predictive coding modes
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/103—Selection of coding mode or of prediction mode
- H04N19/105—Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/146—Data rate or code amount at the encoder output
- H04N19/147—Data rate or code amount at the encoder output according to rate distortion criteria
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
- H04N19/176—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/189—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding
- H04N19/196—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding being specially adapted for the computation of encoding parameters, e.g. by averaging previously computed encoding parameters
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/593—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
- H04N19/61—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
Definitions
- the embodiments of the present application relate to the technical field of image and video encoding and decoding, and more specifically, to a decoding method, an encoding method, a decoder, and an encoder.
- Digital video compression technology mainly compresses huge digital image and video data to facilitate transmission and storage.
- Although digital video compression standards can implement video decompression, there is still a need to pursue better digital video decompression technology to improve compression efficiency.
- Embodiments of the present application provide a decoding method, encoding method, decoder and encoder, which can improve compression efficiency.
- this application provides a decoding method, including:
- the distortion cost of the multiple MIP modes includes the distortion cost obtained by using the multiple MIP modes to predict samples in the template area adjacent to the current block;
- a reconstructed block of the current block is obtained.
- this application provides an encoding method, including:
- the distortion cost of the multiple MIP modes includes the distortion cost obtained by using the multiple MIP modes to predict samples in the template area adjacent to the current block;
- this application provides a decoder, including:
- the parsing unit is used to parse the code stream to obtain the residual block of the current block in the current sequence
- Prediction unit used for:
- the distortion cost of the multiple MIP modes includes the distortion cost obtained by using the multiple MIP modes to predict samples in the template area adjacent to the current block;
- a reconstruction unit configured to obtain a reconstruction block of the current block based on the residual block of the current block and the prediction block of the current block.
- this application provides an encoder, including:
- Prediction unit used for:
- the distortion cost of the multiple MIP modes includes the distortion cost obtained by using the multiple MIP modes to predict samples in the template area adjacent to the current block;
- a residual unit configured to obtain a residual block of the current block based on the prediction block of the current block and the original block of the current block;
- a coding unit used to code the residual block of the current block to obtain the code stream of the current sequence.
- this application provides a decoder, including:
- a processor adapted to implement computer instructions
- Computer-readable storage medium stores computer instructions, and the computer instructions are suitable for the processor to load and execute the decoding method in the above-mentioned first aspect or its various implementations.
- There may be one or more processors and one or more memories.
- the computer-readable storage medium may be integrated with the processor, or the computer-readable storage medium may be provided separately from the processor.
- this application provides an encoder, including:
- a processor adapted to implement computer instructions
- the computer-readable storage medium stores computer instructions, and the computer instructions are suitable for the processor to load and execute the encoding method in the above-mentioned second aspect or its respective implementations.
- There may be one or more processors and one or more memories.
- the computer-readable storage medium may be integrated with the processor, or the computer-readable storage medium may be provided separately from the processor.
- the present application provides a computer-readable storage medium that stores computer instructions.
- When the computer instructions are read and executed by a processor of a computer device, the computer device performs the method of the above-mentioned first aspect.
- The present application provides a code stream, namely the code stream involved in the above-mentioned first aspect or the code stream involved in the above-mentioned second aspect.
- When the decoder determines the intra prediction mode of the current block, it determines the optimal MIP mode for predicting the current block based on the distortion costs of multiple MIP modes, and determines the intra prediction mode of the current block based on the optimal MIP mode.
- This spares the decoder from obtaining the MIP mode by parsing the code stream. Compared with traditional MIP technology, it can effectively reduce the bit overhead at the coding unit level, thereby improving decompression efficiency.
- Figure 1 is a schematic block diagram of a coding framework provided by an embodiment of the present application.
- FIG. 2 is a schematic diagram of the MIP mode provided by the embodiment of the present application.
- FIG. 3 is a schematic diagram of a template used by TIMD provided by an embodiment of the present application.
- Figure 4 is a schematic block diagram of a decoding framework provided by an embodiment of the present application.
- Figure 5 is a schematic flowchart of a decoding method provided by an embodiment of the present application.
- Figure 6 is a schematic flow chart of the encoding method provided by the embodiment of the present application.
- Figure 7 is a schematic block diagram of a decoder provided by an embodiment of the present application.
- Figure 8 is a schematic block diagram of an encoder provided by an embodiment of the present application.
- Figure 9 is a schematic block diagram of an electronic device provided by an embodiment of the present application.
- The solutions provided by the embodiments of this application can be applied to the field of digital video coding technology, including but not limited to image coding and decoding, video coding and decoding, hardware video coding and decoding, dedicated-circuit video coding and decoding, and real-time video coding and decoding.
- the solution provided by the embodiments of the present application can be combined with the Audio Video Coding Standard (AVS), the second generation AVS standard (AVS2) or the third generation AVS standard (AVS3).
- AVS: Audio Video Coding Standard
- VVC: Versatile Video Coding
- the solution provided by the embodiment of the present application can be used to perform lossy compression on images (lossy compression), or can also be used to perform lossless compression on images (lossless compression).
- the lossless compression can be visually lossless compression (visually lossless compression) or mathematically lossless compression (mathematically lossless compression).
- Video coding and decoding standards all adopt block-based hybrid coding framework.
- Each frame in the video is divided into square largest coding units (LCU) or coding tree units (CTU) of the same size (such as 128x128 or 64x64).
- Each largest coding unit or coding tree unit can be divided into rectangular coding units (CU) according to rules.
- Coding units may also be divided into prediction units (PU), transform units (TU), and so on.
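As a rough illustration of the first partitioning step, the sketch below counts how many CTUs of a given size are needed to cover a frame. The function name `ctu_grid` is hypothetical, and the assumption that partial CTUs at the right and bottom edges still count as full grid cells is not stated in the text.

```python
def ctu_grid(frame_width, frame_height, ctu_size=128):
    """Return (columns, rows, total) of CTUs needed to cover a frame.

    Assumption: a frame whose size is not a multiple of ctu_size still
    gets a (partially filled) CTU at its right and bottom edges.
    """
    if min(frame_width, frame_height, ctu_size) <= 0:
        raise ValueError("dimensions must be positive")
    cols = -(-frame_width // ctu_size)   # integer ceiling division
    rows = -(-frame_height // ctu_size)
    return cols, rows, cols * rows


if __name__ == "__main__":
    print(ctu_grid(1920, 1080, 128))
    print(ctu_grid(1920, 1080, 64))
```

Halving the CTU size roughly quadruples the number of units, which is one reason the CTU size is a trade-off between partitioning flexibility and signalling overhead.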
- The hybrid coding framework includes prediction, transform, quantization, entropy coding, in-loop filtering, and other modules.
- the prediction module includes intra prediction and inter prediction.
- Inter-frame prediction includes motion estimation (motion estimation) and motion compensation (motion compensation).
- Intra-frame prediction only refers to the information of the same frame image and predicts the pixel information within the current divided block. Since there is a strong similarity between adjacent frames in the video, the inter-frame prediction method is used in video coding and decoding technology to eliminate the temporal redundancy between adjacent frames, thereby improving coding efficiency.
- Inter-frame prediction can refer to image information of different frames and use motion estimation to search for the motion vector information that best matches the current divided block. Transformation converts the image block obtained after prediction into the frequency domain and redistributes its energy; combined with quantization, it removes information to which the human eye is insensitive and thereby eliminates visual redundancy.
- Entropy coding can eliminate character redundancy based on the current context model and the probability information of the binary code stream.
- the encoder can first read a black-and-white image or color image from the original video sequence, and then encode the black-and-white image or color image.
- the black-and-white image may include pixels of the brightness component
- the color image may include pixels of the chrominance component.
- the color image may also include pixels with a brightness component.
- the color format of the original video sequence can be luminance-chrominance (YCbCr, YUV) format or red-green-blue (Red-Green-Blue, RGB) format, etc.
- After the encoder reads a black-and-white image or a color image, it divides the image into blocks and uses intra-frame prediction or inter-frame prediction for the current block to generate a prediction block of the current block.
- The prediction block is subtracted from the original block of the current block to obtain a residual block; the residual block is transformed and quantized to obtain a quantized coefficient matrix, which is entropy encoded and output to the code stream.
- the decoder uses intra prediction or inter prediction for the current block to generate a prediction block of the current block.
- the decoder parses the code stream to obtain the quantization coefficient matrix, performs inverse quantization and inverse transformation on the quantization coefficient matrix to obtain the residual block, and adds the prediction block and the residual block to obtain the reconstruction block.
- Reconstruction blocks can be used to compose a reconstructed image, and the decoder performs loop filtering on the reconstructed image based on images or blocks to obtain a decoded image.
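The encode and reconstruct steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the application: the transform and entropy coding stages are omitted, quantization is a plain uniform scalar quantizer with step `qstep`, and the names `encode_block` and `decode_block` are hypothetical.

```python
def encode_block(original, prediction, qstep):
    """Residual = original - prediction, then uniform scalar quantization.

    Assumption: the transform stage is skipped, so the residual is
    quantized directly in the sample domain.
    """
    return [[round((o - p) / qstep) for o, p in zip(orow, prow)]
            for orow, prow in zip(original, prediction)]


def decode_block(levels, prediction, qstep):
    """Inverse quantization, then reconstruction = prediction + residual."""
    return [[p + level * qstep for level, p in zip(lrow, prow)]
            for lrow, prow in zip(levels, prediction)]


if __name__ == "__main__":
    original = [[53, 55], [61, 59]]
    prediction = [[50, 50], [60, 60]]
    levels = encode_block(original, prediction, qstep=4)
    print(levels)
    print(decode_block(levels, prediction, qstep=4))
```

Because the encoder runs the same `decode_block` step on its own quantized levels, both ends work from identical reconstructed blocks, which is the consistency property the text relies on.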
- the current block can be the current coding unit (CU) or the current prediction unit (PU), etc.
- the encoding end also requires similar operations as the decoding end to obtain the decoded image.
- the decoded image can be used as a reference frame for inter-frame prediction for subsequent frames.
- The block division information determined by the encoding end, as well as the mode information or parameter information for prediction, transformation, quantization, entropy coding, loop filtering, and so on, needs to be output to the code stream if necessary.
- By parsing the code stream and analyzing existing information, the decoding end determines the same block division information and the same mode information or parameter information for prediction, transformation, quantization, entropy coding, loop filtering, and so on as the encoding end, thereby ensuring that the decoded image obtained at the encoding end is the same as the decoded image obtained at the decoding end.
- the decoded image obtained at the encoding end is usually also called a reconstructed image.
- the current block can be divided into prediction units during prediction, and the current block can be divided into transformation units during transformation.
- the divisions between prediction units and transformation units can be the same or different.
- The above is only the basic process of a video codec under the block-based hybrid coding framework. With the development of technology, some modules of the framework or some steps of the process may be optimized. This application is applicable to this block-based hybrid coding framework.
- Figure 1 is a schematic block diagram of a coding framework 100 provided by an embodiment of the present application.
- The coding framework 100 may include an intra prediction unit 180, an inter prediction unit 170, a residual unit 110, a transformation and quantization unit 120, an entropy coding unit 130, an inverse transformation and inverse quantization unit 140, and a loop filter unit 150.
- the encoding framework 100 may also include a decoded image buffer unit 160. This coding framework 100 may also be called a hybrid framework coding mode.
- the intra prediction unit 180 or the inter prediction unit 170 may predict the image block to be encoded to output the prediction block.
- the residual unit 110 may calculate a residual block, that is, a difference value between the prediction block and the image block to be encoded, based on the prediction block and the image block to be encoded.
- the transformation and quantization unit 120 is used to perform operations such as transformation and quantization on the residual block to remove information that is insensitive to the human eye, thereby eliminating visual redundancy.
- The residual block before transformation and quantization by the transformation and quantization unit 120 may be called a time domain residual block.
- The time domain residual block after transformation and quantization by the transformation and quantization unit 120 may be called a frequency residual block or a frequency domain residual block.
- the entropy encoding unit 130 may output a code stream based on the transform quantization coefficient. For example, the entropy encoding unit 130 may eliminate character redundancy according to the target context model and probability information of the binary code stream. For example, the entropy coding unit 130 may be used for context-based adaptive binary arithmetic entropy coding (CABAC). The entropy encoding unit 130 may also be called a header information encoding unit.
- the image block to be encoded can also be called an original image block or a target image block, and a prediction block can also be called a predicted image block or image prediction block, and can also be called a prediction signal or prediction information.
- the reconstruction block may also be called a reconstructed image block or an image reconstruction block, and may also be called a reconstructed signal or reconstructed information.
- For the encoding end, the image block to be encoded may also be called an encoding block or a coded image block; for the decoding end, the corresponding image block may also be called a decoding block or a decoded image block.
- the image block to be encoded may be a CTU or a CU.
- the encoding framework 100 calculates the residual between the prediction block and the image block to be encoded to obtain the residual block, and then transmits the residual block to the decoder through processes such as transformation and quantization.
- After the decoder receives and parses the code stream, it obtains the residual block through steps such as inverse transformation and inverse quantization.
- the prediction block predicted by the decoder is superimposed on the residual block to obtain the reconstructed block.
- the inverse transform and inverse quantization unit 140, the loop filter unit 150 and the decoded image buffer unit 160 in the encoding framework 100 may be used to form a decoder.
- the intra prediction unit 180 or the inter prediction unit 170 can predict the image block to be encoded based on the existing reconstructed block, thereby ensuring that the encoding end and the decoding end have consistent understanding of the reference frame.
- the encoder can replicate the decoder's processing loop and thus produce the same predictions as the decoder.
- the quantized transform coefficients are inversely transformed and inversely quantized by the inverse transform and inverse quantization unit 140 to copy the approximate residual block at the decoding side.
- After the approximate residual block is added to the prediction block, the result can pass through the loop filtering unit 150 to smooth out blocking artifacts and other effects caused by block-based processing and quantization.
- the image blocks output by the loop filter unit 150 may be stored in the decoded image buffer unit 160 for use in prediction of subsequent images.
- Figure 1 is only an example of the present application and should not be understood as a limitation of the present application.
- The loop filtering unit 150 in the coding framework 100 may include a deblocking filter (DBF) and sample adaptive offset (SAO) filtering.
- the encoding framework 100 may adopt a neural network-based loop filtering algorithm to improve video compression efficiency.
- the coding framework 100 may be a video coding hybrid framework based on a deep learning neural network.
- A model based on a convolutional neural network can be used to process the result of filtering the pixels with the deblocking filter and sample adaptive offset filtering.
- the network structures of the loop filter unit 150 on the luminance component and the chrominance component may be the same or different. Considering that the brightness component contains more visual information, the brightness component can also be used to guide the filtering of the chroma component to improve the reconstruction quality of the chroma component.
- intra-frame prediction only refers to the information of the same frame image and predicts pixel information within the image block to be encoded to eliminate spatial redundancy;
- the frame used for intra-frame prediction can be an I frame.
- The image block to be encoded can use the upper-left image block, the upper image block, and the left image block as reference information for its prediction, and the image block to be encoded is in turn used as reference information for the next image block, so that the entire image can be predicted.
- If the input digital video is in a color format, such as the YUV 4:2:0 format, every 4 pixels of each image frame of the digital video consist of 4 Y components and 2 UV components.
- The encoding framework can encode the Y components (i.e. luma blocks) and the UV components (i.e. chroma blocks) separately.
- the decoding end can also perform corresponding decoding according to the format.
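The 4:2:0 ratio mentioned above can be checked with a small calculation. The sketch below assumes a planar layout in which each chroma plane is subsampled by two horizontally and vertically, and requires even frame dimensions; the function name `yuv420_plane_sizes` is hypothetical.

```python
def yuv420_plane_sizes(width, height):
    """Return the number of samples in the Y, U and V planes of a 4:2:0 frame.

    Assumption: width and height are even, and each chroma plane holds
    one sample per 2x2 group of luma samples.
    """
    if width % 2 or height % 2:
        raise ValueError("this sketch only handles even dimensions")
    luma = width * height
    chroma = (width // 2) * (height // 2)
    return {"Y": luma, "U": chroma, "V": chroma}


if __name__ == "__main__":
    # A 2x2 group of pixels: 4 Y samples plus one U and one V sample.
    print(yuv420_plane_sizes(2, 2))
    print(yuv420_plane_sizes(1920, 1080))
```

For a 2x2 group this gives 4 Y samples and 2 chroma samples in total, matching the "4 Y components and 2 UV components" description.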
- the process involved in the MIP mode can be divided into three main steps, which are the down-sampling process, the matrix multiplication process and the up-sampling process.
- The spatially adjacent reconstructed samples are first downsampled through the downsampling process, and the downsampled sample sequence is then used as the input vector of the matrix multiplication process; that is, the output vector of the downsampling process is the input vector of the matrix multiplication process.
- The input vector is multiplied by the preset matrix, the offset vector is added, and the calculated sample vector is output. Finally, the output vector of the matrix multiplication process is used as the input vector of the upsampling process, and the final prediction block is obtained through upsampling.
- FIG. 2 is a schematic diagram of the MIP mode provided by the embodiment of the present application.
- The MIP mode obtains the top-adjacent downsampled reconstructed sample vector $bdry_{top}$ by averaging the reconstructed samples adjacent to the top of the current coding unit, and obtains the left-adjacent downsampled reconstructed sample vector $bdry_{left}$ by averaging the reconstructed samples adjacent to the left.
- After $bdry_{top}$ and $bdry_{left}$ are obtained, they are used as the input vector $bdry_{red}$ of the matrix multiplication process.
- Based on $bdry_{red}$, formed from the top row vector $bdry_{top}^{red}$ and $bdry_{left}$, the sample vector can be obtained as $A_k \cdot bdry_{red} + b_k$, where $A_k$ is a preset matrix, $b_k$ is a preset bias vector, and $k$ is the index of the MIP mode.
- To predict a block with width W and height H, MIP requires H reconstructed pixels in the left column of the current block and W reconstructed pixels in the upper row of the current block as input.
- MIP generates prediction blocks in the following three steps: reference pixel averaging (Averaging), matrix multiplication (Matrix Vector Multiplication) and interpolation (Interpolation).
- the core of MIP is matrix multiplication, which can be thought of as a process of generating prediction blocks using input pixels (reference pixels) in a matrix multiplication manner.
- MIP provides a variety of matrices. Different prediction methods can be reflected in different matrices. Using different matrices for the same input pixel will yield different results.
- Reference pixel averaging and interpolation are a design that trades off performance against complexity. For larger blocks, an effect similar to downsampling can be achieved by reference pixel averaging, allowing the input to be adapted to a smaller matrix, while interpolation achieves an upsampling effect. In this way, there is no need to provide MIP matrices for blocks of every size, but only matrices of one or several specific sizes. As the demand for compression performance increases and hardware capabilities improve, more complex MIPs may appear in the next generation of standards.
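The three MIP steps (averaging, matrix-vector multiplication, and upsampling) can be sketched as below. This is an illustration under several assumptions that are not taken from the text: the block is square, the matrix and offset are placeholders rather than trained MIP matrices, sample replication stands in for the interpolation step, and all function names are hypothetical.

```python
def average_down(samples, out_len):
    """Average groups of neighbouring boundary samples down to out_len values."""
    group = len(samples) // out_len
    return [sum(samples[i * group:(i + 1) * group]) / group
            for i in range(out_len)]


def mat_vec(matrix, vector, offset):
    """Compute A * x + b for a matrix given as a list of rows."""
    return [sum(a * x for a, x in zip(row, vector)) + b
            for row, b in zip(matrix, offset)]


def upsample_replicate(reduced, red_size, factor):
    """Expand a red_size x red_size block (row-major list) by replication.

    Assumption: replication replaces the interpolation used in practice.
    """
    size = red_size * factor
    return [[reduced[(y // factor) * red_size + (x // factor)]
             for x in range(size)] for y in range(size)]


def mip_predict(top, left, matrix, offset, red_bdry=4, red_size=4):
    """Downsample the boundary, apply A_k * bdry_red + b_k, then upsample."""
    bdry_red = average_down(top, red_bdry) + average_down(left, red_bdry)
    reduced = mat_vec(matrix, bdry_red, offset)
    return upsample_replicate(reduced, red_size, len(top) // red_size)


if __name__ == "__main__":
    # Placeholder matrix: every output sample is the mean of the 8 inputs.
    matrix = [[1 / 8] * 8 for _ in range(16)]
    offset = [0] * 16
    block = mip_predict([100] * 8, [100] * 8, matrix, offset)
    print(len(block), len(block[0]), block[0][0])
```

Note how an 8x8 block only needs a 16x8 matrix here: averaging shrinks the input and upsampling expands the output, which is why a few matrix sizes can serve many block sizes.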
- The MIP mode can be obtained by simplifying a neural network.
- The matrices it uses can be obtained through training. Therefore, the MIP mode has strong generalization ability and achieves prediction effects that traditional prediction modes cannot.
- The MIP mode can be a model obtained by repeatedly simplifying, in terms of hardware and software complexity, an intra-frame prediction model based on a neural network. Based on a large number of training samples, its multiple prediction modes represent a variety of models and parameters, which can cover the textures of natural sequences relatively well.
- MIP is somewhat similar to planar mode, but obviously MIP is more complex and more flexible than planar mode.
- The number of MIP modes may differ with the size of the coding unit. For example, for a coding unit of 4x4 size, the MIP mode has 16 prediction modes; for a coding unit of 8x8 size, or with width equal to 4 or height equal to 4, the MIP mode has 8 prediction modes; for coding units of other sizes, the MIP mode has 6 prediction modes.
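The size-dependent mode counts can be written as a small lookup. The sketch reads the second case as "8x8 blocks, or blocks whose width or height equals 4", which is an interpretation of the text, and the function name `mip_mode_count` is hypothetical.

```python
def mip_mode_count(width, height):
    """Return the number of MIP prediction modes for a coding unit.

    Assumption: the middle case covers 8x8 blocks and any block with
    width or height equal to 4 (other than 4x4 itself).
    """
    if width == 4 and height == 4:
        return 16
    if width == 4 or height == 4 or (width == 8 and height == 8):
        return 8
    return 6


if __name__ == "__main__":
    for size in [(4, 4), (8, 8), (4, 16), (16, 16)]:
        print(size, mip_mode_count(*size))
```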
- MIP mode has a transpose function. For prediction modes that conform to the current size, MIP mode can try transposition calculations on the encoder side. Therefore, MIP mode not only requires a flag bit to indicate whether the current coding unit uses MIP mode, but also, if the current coding unit uses MIP mode, an additional transposed flag bit needs to be transmitted to the decoder.
- the transposed flag bit of MIP is binarized by fixed-length encoding (Fixed Length, FL), and the length is 1.
- the mode index of MIP is binarized by truncated binary encoding (Truncated Binary, TB).
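The two binarizations named above can be sketched as follows. Truncated binary coding of an alphabet of n symbols gives the first u = 2^(k+1) - n values a k-bit codeword and the remaining values a (k+1)-bit codeword, where k = floor(log2 n). The function names are hypothetical, and using the mode counts 6, 8, and 16 as alphabet sizes is an assumption made for illustration.

```python
def fixed_length(value, length):
    """Fixed-length (FL) binarization: value written with exactly `length` bits."""
    if not 0 <= value < (1 << length):
        raise ValueError("value does not fit in the given length")
    return format(value, "b").zfill(length)


def truncated_binary(value, num_symbols):
    """Truncated binary (TB) binarization of value in [0, num_symbols)."""
    if not 0 <= value < num_symbols:
        raise ValueError("value out of range")
    k = num_symbols.bit_length() - 1          # floor(log2(num_symbols))
    u = (1 << (k + 1)) - num_symbols          # number of short codewords
    if value < u:
        return format(value, "b").zfill(k) if k > 0 else ""
    return format(value + u, "b").zfill(k + 1)


if __name__ == "__main__":
    print(fixed_length(1, 1))
    print([truncated_binary(v, 6) for v in range(6)])
```

When the alphabet size is a power of two (for example 8 or 16), every codeword has the same length and truncated binary coding reduces to fixed-length coding.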
- The codec also derives the prediction mode itself, which saves the overhead of transmitting a mode index.
- The TIMD prediction mode can be understood as two main parts. First, the cost information of each prediction mode is calculated according to the template, and the prediction modes corresponding to the minimum cost and the second-lowest cost are selected. The prediction mode corresponding to the minimum cost is recorded as prediction mode 1, and the prediction mode corresponding to the second-lowest cost is recorded as prediction mode 2. If the ratio of the second-lowest cost value (costMode2) to the minimum cost value (costMode1) meets a preset condition, such as costMode2 < 2*costMode1, the prediction blocks corresponding to prediction mode 1 and prediction mode 2 can be weighted and fused according to their respective weights to obtain the final prediction block.
- The corresponding weights of prediction mode 1 and prediction mode 2 are determined in the following manner:
- weight1 = costMode2 / (costMode1 + costMode2)
- weight2 = 1 - weight1
- Here, weight1 is the weight of the prediction block corresponding to prediction mode 1, and weight2 is the weight of the prediction block corresponding to prediction mode 2.
- If the preset condition is not met, weighted fusion between prediction blocks is not performed, and the prediction block corresponding to prediction mode 1 is the prediction block of TIMD.
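The fusion rule can be sketched as below, assuming the preset condition is costMode2 < 2*costMode1 as in the example given in the text. Floating-point weights are used for readability, which is an assumption (a real codec would more likely use integer arithmetic), and the function name `timd_fuse` is hypothetical.

```python
def timd_fuse(pred1, pred2, cost_mode1, cost_mode2):
    """Fuse two prediction blocks according to their template costs.

    pred1 belongs to the lowest-cost mode, pred2 to the second-lowest.
    Assumption: fusion happens only when cost_mode2 < 2 * cost_mode1.
    """
    if cost_mode2 < 2 * cost_mode1:
        weight1 = cost_mode2 / (cost_mode1 + cost_mode2)
        weight2 = 1 - weight1
        return [[weight1 * a + weight2 * b for a, b in zip(row1, row2)]
                for row1, row2 in zip(pred1, pred2)]
    # Costs too far apart: keep the prediction of mode 1 unchanged.
    return [row[:] for row in pred1]


if __name__ == "__main__":
    p1 = [[100, 100]]
    p2 = [[50, 60]]
    print(timd_fuse(p1, p2, 10, 10))   # equal costs: plain average
    print(timd_fuse(p1, p2, 10, 30))   # no fusion
```

The lower-cost mode receives the larger weight, since weight1 grows with costMode2. When both costs are zero the condition is false, so the division is never reached.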
- If the template of the current block contains no available adjacent reconstructed samples, the TIMD prediction mode selects the planar mode to perform intra prediction on the current block; that is, no weighted fusion is performed. As with the DIMD prediction mode, the TIMD prediction mode needs to transmit a flag bit to the decoder to indicate whether the current coding unit uses the TIMD prediction mode.
- The process by which the encoder or decoder calculates the cost information of each prediction mode is mainly as follows: intra mode prediction is performed on the samples in the template area based on the reconstructed samples adjacent to the upper or left side of the template area.
- The prediction process is the same as for the original intra prediction modes. For example, when DC mode is used to perform intra mode prediction on samples in the template area, the mean value of the entire coding unit is calculated; when an angular prediction mode is used, the corresponding interpolation filter is selected according to the mode and the prediction samples are interpolated according to the rules.
- Then, based on the predicted samples and reconstructed samples in the template area, the distortion between them can be calculated, which is the cost information of the current prediction mode.
- FIG. 3 is a schematic diagram of a template used by TIMD provided by an embodiment of the present application.
- the codec can, based on a coding unit with a width equal to 2(M+L1)+1 and a height equal to 2(N+L2)+1, select the reconstructed samples in the reference template (Reference of template) of the current block to predict the samples in the template area of the current block.
- if the template of the current block contains no available adjacent reconstructed samples, the TIMD prediction mode selects the planar mode to perform intra prediction on the current block.
- the available adjacent reconstructed samples may be samples adjacent to the left and upper sides of the current CU in Figure 3; that is, there are no available reconstructed samples in the diagonally filled area. In other words, if there are no available reconstructed samples in the diagonally filled area, the TIMD prediction mode selects the planar mode to perform intra prediction on the current block.
- the left and upper sides of the current block can theoretically obtain reconstruction values, that is, the template of the current block contains available adjacent reconstruction samples.
- the decoder can use a certain intra prediction mode to predict on the template, and compare the prediction value and the reconstructed value to obtain the cost of the intra prediction mode on the template.
- the reconstructed samples in the template are correlated with the pixels in the current block. Therefore, the performance of a prediction mode on the template can be used to estimate the performance of this prediction mode on the current block.
- TIMD predicts some candidate intra prediction modes on the template, obtains the cost of each candidate intra prediction mode on the template, and selects the one or two intra prediction modes with the lowest cost to produce the intra prediction value of the current block. If the template cost difference between the two intra prediction modes is not large, the compression performance can be improved by weighting the prediction values of the two intra prediction modes.
- the weights of the predicted values of the two prediction modes are related to the above-mentioned costs. For example, the weights are inversely proportional to the costs.
- TIMD uses the prediction effect of the intra prediction mode on the template to screen the intra prediction mode, and can weight the two intra prediction modes according to the cost on the template.
- the advantage of TIMD is that if the TIMD mode is selected for the current block, there is no need to indicate which intra prediction mode is used in the code stream. Instead, the decoder itself derives it through the above process, which saves overhead to a certain extent.
- Figure 4 is a schematic block diagram of the decoding framework 200 provided by the embodiment of the present application.
- the decoding framework 200 may include an entropy decoding unit 210, an inverse transform and inverse quantization unit 220, a residual unit 230, an intra prediction unit 240, an inter prediction unit 250, a loop filtering unit 260, and a decoded image buffer unit 270.
- after the entropy decoding unit 210 receives and parses the code stream, it obtains the prediction block and the frequency domain residual block. For the frequency domain residual block, the inverse transform and inverse quantization unit 220 performs steps such as inverse transformation and inverse quantization to obtain the time domain residual block. The residual unit 230 superposes the prediction block predicted by the intra prediction unit 240 or the inter prediction unit 250 onto the time domain residual block obtained by the inverse transform and inverse quantization unit 220 to obtain the reconstructed block. For example, the intra prediction unit 240 or the inter prediction unit 250 may obtain the prediction block by decoding the header information of the code stream.
- Figure 5 is a schematic flow chart of the decoding method 300 provided by the embodiment of the present application. It should be understood that the decoding method 300 can be performed by a decoder. For example, it is applied to the decoding framework 200 shown in FIG. 4 . For the convenience of description, the following takes the decoder as an example.
- the decoding method 300 may include some or all of the following:
- S310 The decoder parses the code stream to obtain the residual block of the current block in the current sequence;
- S320 The decoder determines the optimal MIP mode for predicting the current block based on the distortion costs of multiple matrix-based intra prediction (MIP) modes, where the distortion costs of the multiple MIP modes include the distortion costs obtained by using the multiple MIP modes to predict samples in the template area adjacent to the current block;
- S330 The decoder determines the intra prediction mode of the current block based on the optimal MIP mode;
- S340 The decoder predicts the current block based on the intra prediction mode of the current block to obtain a prediction block of the current block;
- S350 The decoder obtains a reconstructed block of the current block based on the residual block of the current block and the prediction block of the current block.
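- the overall flow of the decoding method 300 can be sketched as follows. This is a simplified illustration under stated assumptions: `template_cost` and `predict` are hypothetical callbacks standing in for the template-based cost evaluation and the actual intra prediction, the optimal MIP mode is used directly as the intra prediction mode (one of the options for S330), and clipping of the reconstructed samples is omitted.

```python
def decode_block(residual, mip_modes, template_cost, predict):
    """Reconstruct one block from its parsed residual block.

    residual      -- residual block (list of rows) already parsed from the code stream
    mip_modes     -- iterable of candidate MIP mode indices
    template_cost -- hypothetical callback: mode -> distortion cost on the template
    predict       -- hypothetical callback: mode -> prediction block (list of rows)
    """
    # S320: choose the MIP mode with the smallest template distortion cost.
    optimal_mip = min(mip_modes, key=template_cost)
    # S330: here the optimal MIP mode is used as the intra prediction mode.
    intra_mode = optimal_mip
    # S340: predict the current block with that mode.
    prediction = predict(intra_mode)
    # Reconstruction: prediction block plus residual block, sample by sample.
    return [
        [p + r for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(prediction, residual)
    ]
```

- no MIP mode index is read from the code stream in this sketch, which is the point of deriving the mode at the decoder.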
- this application may also refer to the combination of S320 and S330 (that is, the process in which the decoder determines the optimal MIP mode based on the distortion costs of the multiple MIP modes and determines the intra prediction mode of the current block based on the optimal MIP mode) as the template matching MIP (Template Matching MIP, TMMIP) technology or the MIP prediction mode derivation method based on template matching. That is to say, after the decoder obtains the residual block of the current block, it can determine the intra prediction mode of the current block based on the TMMIP technology, and then predict the current block based on the intra prediction mode of the current block to obtain the prediction block of the current block; finally, the decoder can obtain the reconstructed block of the current block based on the residual block of the current block and the prediction block of the current block.
- when the decoder determines the intra prediction mode of the current block, it determines the optimal MIP mode for predicting the current block based on the distortion costs of multiple MIP modes, and determines the intra prediction mode of the current block based on the optimal MIP mode. This spares the decoder from obtaining the MIP mode by parsing the code stream. Compared with the traditional MIP technology, it can effectively reduce the bit overhead at the coding unit level, thereby improving decompression efficiency.
- MIP is a simplified technology based on neural network technology. It is quite different from traditional interpolation filter prediction technology. For some special textures, although MIP prediction is better than the traditional intra prediction modes, its larger flag overhead is a flaw of MIP technology.
- this application uses the decoder to independently determine the optimal MIP mode for predicting the current block and determines the intra prediction mode of the current block based on the optimal MIP mode, which can save up to 5 or 6 bits of overhead. It can effectively reduce the bit overhead at the coding unit level, thereby improving decompression efficiency.
- the solution provided by this application can solve the shortcomings of traditional MIP technology, that is, reduce the bit overhead at the image block level, thereby improving decompression efficiency.
- the distortion cost involved in the decoder in this application is different from the rate distortion cost (RDcost) involved in the encoder.
- the rate distortion cost is used by the encoding end to determine a certain intra prediction technology among multiple intra prediction technologies.
- the rate distortion cost can be the cost value obtained by comparing the distorted image and the original image. Since the decoder cannot obtain the original image, the distortion cost involved in the decoder can be the difference between the reconstructed samples and the predicted samples, such as the Sum of Absolute Transformed Difference (SATD) cost between reconstructed samples and predicted samples, or other costs that can be used to measure the difference between reconstructed samples and predicted samples.
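- as an illustration of a decoder-side distortion cost, the following sketch computes an SATD between reconstructed and predicted samples using a 4x4 Hadamard transform. The 4x4 block size, the function names, and the absence of any normalization factor are assumptions made for this example; practical codecs typically use several block sizes and scale the result.

```python
def hadamard4(block):
    """Apply the 4x4 Hadamard transform H * X * H to a 4x4 block (list of rows)."""
    h = [
        [1, 1, 1, 1],
        [1, -1, 1, -1],
        [1, 1, -1, -1],
        [1, -1, -1, 1],
    ]
    # tmp = H * X
    tmp = [[sum(h[i][k] * block[k][j] for k in range(4)) for j in range(4)]
           for i in range(4)]
    # result = tmp * H (H is symmetric, so no transpose is needed)
    return [[sum(tmp[i][k] * h[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def satd4x4(reconstructed, predicted):
    """Sum of absolute transformed differences between two 4x4 blocks."""
    diff = [[r - p for r, p in zip(rec_row, pred_row)]
            for rec_row, pred_row in zip(reconstructed, predicted)]
    return sum(abs(coeff) for row in hadamard4(diff) for coeff in row)
```

- only reconstructed and predicted samples are needed, so the same cost can be evaluated identically at the encoder and at the decoder.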
- the S320 may include:
- the decoder parses the code stream of the current sequence to obtain a first identifier; if the first identifier is used to identify that the optimal MIP mode is allowed to be used to predict image blocks in the current sequence, the decoder determines the optimal MIP mode based on the distortion costs of the multiple MIP modes.
- if the value of the first identifier is a first numerical value, it is used to identify that the optimal MIP mode is allowed to be used to predict the image blocks in the current sequence; if the value of the first identifier is a second numerical value, it is used to identify that the optimal MIP mode is not allowed to be used to predict the image blocks in the current sequence.
- the first value is 1 and the second value is 0. In another implementation, the first value is 0 and the second value is 1.
- the first numerical value and the second numerical value can also be other numerical values, which are not limited in this application.
- if the first identifier is true, it is used to identify that the optimal MIP mode is allowed to be used to predict the image blocks in the current sequence; if the first identifier is false, it is used to identify that the optimal MIP mode is not allowed to be used to predict the image blocks in the current sequence.
- the decoder parses the block-level identifier. If the current block adopts the intra prediction mode, it parses or obtains the first identifier. If the first identifier is true, the decoder determines the optimal MIP mode based on the distortion costs of the multiple MIP modes.
- the first flag is recorded as sps_timd_enable_flag.
- the decoder parses or obtains sps_timd_enable_flag. If sps_timd_enable_flag is true, the decoder determines the optimal MIP mode based on the distortion costs of the multiple MIP modes.
- the first identifier is a sequence-level identifier.
- the first identifier is used to identify that the optimal MIP mode is allowed to be used to predict the image blocks in the current sequence, and may also be replaced by a description with similar or identical meanings.
- the first identifier is used to identify that the optimal MIP mode is allowed to be used to predict image blocks in the current sequence, and may be replaced by any of the following:
- the first identifier is used to identify that the TMMIP technology is allowed to be used to determine the intra prediction mode of the image block in the current sequence.
- the first identifier is used to identify that the TMMIP technology is allowed to be used to perform intra prediction of the image block in the current sequence.
- the first identifier is used to identify that the image blocks in the current sequence are allowed to use the TMMIP technology.
- the first identifier is used to identify that the image blocks in the current sequence are allowed to be predicted using the MIP mode determined based on the multiple MIP modes.
- the permission flag bits of other technologies can also be used to indirectly indicate whether the current sequence is allowed to use the TMMIP technology.
- taking TIMD technology as an example, when the first identifier is used to indicate that the current sequence is allowed to use TIMD technology, it means that the current sequence is also allowed to use TMMIP technology; in other words, when the first identifier is used to indicate that the current sequence is allowed to use TIMD technology, it means that the current sequence is allowed to use TIMD technology and TMMIP technology at the same time, which further saves bit overhead.
- the decoder parses the code stream to obtain the second identifier; if the second identifier is used to identify that the optimal MIP mode is allowed to be used to predict the current block, the decoder determines the optimal MIP mode based on the distortion costs of the multiple MIP modes.
- the decoder parses the block-level identifier. If the current block adopts intra prediction mode, the decoder parses or obtains the first identifier. If the first identifier is true, the decoder parses or obtains the second identifier. If the second identifier is true, the decoder determines the optimal MIP mode based on the distortion costs of the multiple MIP modes.
- if the value of the second identifier is a third numerical value, it is used to identify that the optimal MIP mode is allowed to be used to predict the current block; if the value of the second identifier is a fourth numerical value, it is used to identify that the optimal MIP mode is not allowed to be used to predict the current block.
- the third value is 1 and the fourth value is 0.
- the third value is 0 and the fourth value is 1.
- the third numerical value and the fourth numerical value can also be other numerical values, which are not specifically limited in this application.
- for example, if the second identifier is true, it is used to identify that the optimal MIP mode is allowed to be used to predict the current block; if the second identifier is false, it is used to identify that the optimal MIP mode is not allowed to be used to predict the current block.
- the decoder parses or obtains sps_timd_enable_flag. If sps_timd_enable_flag is true, the decoder can parse or obtain cu_timd_enable_flag. If cu_timd_enable_flag is true, the decoder determines the optimal MIP mode based on the distortion costs of the multiple MIP modes.
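- the nested checking of the sequence-level and block-level identifiers can be sketched as follows. The callback `read_flag` is a hypothetical stand-in for parsing or obtaining a flag (a real decoder would parse the sequence-level flag once per sequence and cache it); only the flag names sps_timd_enable_flag and cu_timd_enable_flag come from the text.

```python
def should_derive_mip_mode(is_intra_block, read_flag):
    """Decide whether the decoder derives the optimal MIP mode for this block.

    read_flag -- hypothetical callback: flag name -> parsed or cached value
    """
    if not is_intra_block:
        return False
    # The sequence-level identifier is checked first.
    if not read_flag("sps_timd_enable_flag"):
        return False
    # The block-level identifier is read only when the sequence allows it.
    return bool(read_flag("cu_timd_enable_flag"))
```

- when the sequence-level identifier is false, the block-level identifier is never read, so it costs no bits in the code stream.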
- the second identifier is a block-level identifier or a coding unit-level identifier.
- the statement that the second identifier is used to identify that the optimal MIP mode is allowed to be used to predict the current block may also be replaced by a description with a similar or identical meaning.
- for example, it may be replaced by any of the following: the second identifier is used to identify that the TMMIP technology is allowed to be used to determine the intra prediction mode of the current block; the second identifier is used to identify that the TMMIP technology is allowed to be used to perform intra prediction on the current block; the second identifier is used to identify that the current block is allowed to use the TMMIP technology; the second identifier is used to identify that the current block is allowed to be predicted using the MIP mode determined based on the multiple MIP modes.
- whether the current block is allowed to use the TMMIP technology can also be indirectly indicated through the permission flag bits of other technologies.
- taking TIMD technology as an example, when the second identifier is used to indicate that the current block is allowed to use TIMD technology, it means that the current block is also allowed to use TMMIP technology; in other words, when the second identifier is used to indicate that the current block is allowed to use TIMD technology, it means that the current block is allowed to use TIMD technology and TMMIP technology at the same time, which further saves bit overhead.
- when the decoding end parses the second identifier, it can parse the second identifier before parsing the residual block of the current block, or it can parse the second identifier after parsing the residual block of the current block. This application does not specifically limit this.
- the method 300 may further include:
- the decoder determines the arrangement order of the multiple MIP modes based on the distortion costs of the multiple MIP modes; the decoder determines the encoding method used by the optimal MIP mode based on the arrangement order of the multiple MIP modes; the decoder decodes the code stream of the current sequence based on the encoding method used by the optimal MIP mode to obtain the index of the optimal MIP mode.
- when the decoder determines the optimal MIP mode for predicting the current block based on the distortion costs of multiple MIP modes, it needs to calculate the distortion cost of each of the multiple MIP modes and sort the multiple MIP modes according to the distortion cost of each MIP mode; the MIP mode with the smallest cost is the optimal prediction result.
- the index of the MIP mode is usually binarized using truncated binary coding. This encoding method is close to equal-probability encoding; that is, it divides all prediction modes into two segments, one represented by codewords of length N and the other represented by codewords of length N+1.
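- the two-segment structure of truncated binary coding can be seen in the following sketch. The function name `truncated_binary` and the bit-string output are illustrative choices; the construction itself is the standard truncated binary code, in which the first few symbols receive k-bit codewords and the remaining symbols receive (k+1)-bit codewords, with k = floor(log2(number of symbols)).

```python
def truncated_binary(value, num_symbols):
    """Encode value in [0, num_symbols) as a truncated binary bit string."""
    if not 0 <= value < num_symbols:
        raise ValueError("value out of range")
    k = num_symbols.bit_length() - 1          # floor(log2(num_symbols))
    short_count = (1 << (k + 1)) - num_symbols  # symbols that get k-bit codewords
    if value < short_count:
        # Short segment: plain k-bit binary (empty string when k == 0).
        return format(value, "b").zfill(k) if k > 0 else ""
    # Long segment: (k + 1)-bit binary of the shifted value.
    return format(value + short_count, "b").zfill(k + 1)
```

- for 5 symbols the codewords are 00, 01, 10, 110, and 111: three symbols use 2 bits and two symbols use 3 bits, which is why the code is nearly equal-probability.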
- the decoder may first calculate the distortion cost of each of the multiple MIP modes and sort the multiple MIP modes according to the distortion cost of each MIP mode.
- the decoder can choose to use a more flexible variable length encoding method and an equal probability encoding method based on the sorting of the multiple MIP modes.
- by flexibly setting the encoding method of the MIP mode it is helpful to save the bit overhead of the index of the MIP mode.
- the codeword length of the encoding method used by the first n MIP modes in the arrangement order is smaller than the codeword length of the encoding method used by the MIP modes after the n-th MIP mode in the arrangement order; and/or, the first n MIP modes use variable-length encoding and the MIP modes after the n-th MIP mode use truncated binary encoding.
- n can be any value greater than or equal to 1.
- the arrangement order is the order obtained by the decoder arranging the multiple MIP modes from the smallest to the largest distortion cost, and the codeword length of the encoding method used by the first n MIP modes in the arrangement order is less than the codeword length of the encoding method used by the MIP modes after the n-th MIP mode in the arrangement order; and/or, the first n MIP modes use variable-length encoding and the MIP modes after the n-th MIP mode use truncated binary encoding.
- the codeword length of the encoding method used by the first n MIP modes in the arrangement order is designed to be less than the codeword length of the encoding method used by the MIP modes after the n-th MIP mode in the arrangement order; and/or, the encoding method used by the first n MIP modes is designed as a variable-length encoding method and the encoding method used by the MIP modes after the n-th MIP mode is designed as a truncated binary encoding method. Equivalently, the MIP modes that the encoder uses with high probability are given a shorter codeword length or a variable-length encoding method, which can save the bit overhead of the MIP mode index and improve decompression performance.
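- one possible way to realize such a rank-dependent binarization is sketched below. The concrete scheme is an assumption made for illustration and is not specified in the text: the first n ranks use a unary-style variable-length code, and the remaining ranks use an escape prefix of n ones followed by a truncated binary code over the remaining modes. The function names are hypothetical.

```python
def truncated_binary(value, num_symbols):
    """Standard truncated binary code as a bit string."""
    k = num_symbols.bit_length() - 1
    short_count = (1 << (k + 1)) - num_symbols
    if value < short_count:
        return format(value, "b").zfill(k) if k > 0 else ""
    return format(value + short_count, "b").zfill(k + 1)


def rank_codeword(rank, num_modes, n):
    """Codeword for the MIP mode at position `rank` in the cost-sorted order.

    Assumed scheme: ranks below n get `rank` ones and a terminating zero;
    later ranks get n ones followed by truncated binary over the rest.
    """
    if not 0 <= rank < num_modes or not 1 <= n <= num_modes:
        raise ValueError("invalid rank or n")
    if rank < n:
        return "1" * rank + "0"
    return "1" * n + truncated_binary(rank - n, num_modes - n)
```

- with 16 modes and n = 2, the lowest-cost mode costs 1 bit and the second costs 2 bits, while the remaining modes cost 5 or 6 bits; the bit saving therefore depends on the template cost ranking predicting the chosen mode well.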
- the S330 may include:
- the decoder determines the optimal MIP mode as the intra prediction mode of the current block.
- after the decoder determines the optimal MIP mode, it directly performs intra prediction on the current block based on the optimal MIP mode to obtain a prediction block of the current block.
- the S330 may include:
- the decoder predicts the current block based on the optimal MIP mode to obtain a first prediction block; the decoder predicts the current block based on the template-based intra mode derivation (TIMD) mode to obtain a second prediction block; the decoder determines, based on the distortion cost of the first prediction block and the distortion cost of the second prediction block, the prediction mode with the smallest distortion cost among the optimal MIP mode and the TIMD mode as the intra prediction mode of the current block.
- the optimal intra prediction mode among the optimal MIP mode and the TIMD mode is determined as the intra prediction mode of the current block.
- if the optimal prediction mode among the optimal MIP mode and the TIMD mode is the optimal MIP mode, the decoder can directly perform intra prediction on the current block based on the optimal MIP mode to obtain the prediction block of the current block. If the optimal prediction mode among the optimal MIP mode and the TIMD mode is the TIMD mode, the decoder can obtain the prediction block of the current block based on the optimal prediction mode and the suboptimal prediction mode obtained in the TIMD mode.
- if neither the optimal prediction mode nor the suboptimal prediction mode is the planar mode or the DC mode, and the distortion cost of the suboptimal prediction mode is less than twice the distortion cost of the optimal prediction mode, a prediction block fusion operation is required. That is, the decoder can first perform intra prediction on the current block according to the optimal prediction mode to obtain the optimal prediction block; second, perform intra prediction on the current block according to the suboptimal prediction mode to obtain the suboptimal prediction block; then use the ratio between the distortion cost of the optimal prediction mode and the distortion cost of the suboptimal prediction mode to calculate the weight of the optimal prediction block and the weight of the suboptimal prediction block.
- finally, the optimal prediction block and the suboptimal prediction block are weighted and fused to obtain the prediction block of the current block.
- if the optimal prediction mode or the suboptimal prediction mode is the planar mode or the DC mode, or the distortion cost of the suboptimal prediction mode is greater than twice the distortion cost of the optimal prediction mode, there is no need to perform a prediction block fusion operation; that is, the optimal prediction block obtained based on the optimal prediction mode can be directly used as the prediction block of the current block.
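- the two decisions described above (choosing between the optimal MIP mode and the TIMD mode, and deciding whether the TIMD branch needs fusion) can be sketched as follows. The mode labels, function names, and the tie-breaking rule in favour of MIP are assumptions for illustration; the behaviour when the suboptimal cost equals exactly twice the optimal cost is not stated in the text and is treated here as no fusion.

```python
PLANAR = "planar"  # hypothetical label for the planar mode
DC = "dc"          # hypothetical label for the DC mode


def select_intra_mode(mip_cost, timd_cost):
    """Pick the branch whose prediction block has the smaller distortion cost."""
    # Tie-breaking in favour of MIP is an assumption of this sketch.
    return "mip" if mip_cost <= timd_cost else "timd"


def needs_fusion(best_mode, second_mode, best_cost, second_cost):
    """Return True when the TIMD branch should fuse its two prediction blocks."""
    # No fusion when either mode is planar or DC.
    if best_mode in (PLANAR, DC) or second_mode in (PLANAR, DC):
        return False
    # No fusion when the suboptimal cost is too far from the optimal cost.
    return second_cost < 2 * best_cost
```

- when `needs_fusion` returns False, the optimal prediction block is used on its own as the prediction block of the current block.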
- the S320 may include:
- the decoder predicts the samples in the template area based on the third identifier and the multiple MIP modes, and obtains the distortion costs of the multiple MIP modes in each state of the third identifier, where the third identifier is used to identify whether to transpose the input vector and the output vector of the MIP mode; the decoder determines the optimal MIP mode based on the distortion costs of the multiple MIP modes in each state of the third identifier.
- MIP has more bit overhead than other intra prediction tools. It requires not only a flag bit to indicate whether MIP technology is used, but also a flag bit to indicate whether MIP is transposed; the last and most expensive part is the truncated binary encoding used to represent the prediction mode of MIP. MIP is a simplified technology based on neural network technology. It is quite different from traditional interpolation filter prediction technology. For some special textures, although MIP prediction is better than the traditional intra prediction modes, its larger flag overhead is a flaw of MIP technology.
- this application takes the transposition function of the MIP mode into account by traversing each state of the third identifier, which can save the overhead of one MIP transposition identifier, thereby improving decompression efficiency.
- the decoder traverses each state of the third identifier and the multiple MIP modes, determines the distortion costs of the multiple MIP modes in each state of the third identifier, and determines the optimal MIP mode based on the distortion costs of the multiple MIP modes in each state of the third identifier. The traversal order is not limited; that is to say, the decoding end may first traverse the multiple MIP modes, or may first traverse the states of the third identifier.
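- the joint traversal of the transposition states and the MIP modes can be sketched as an exhaustive search. The callback `template_cost(mode, transposed)` is hypothetical and stands for predicting the template area with the given MIP mode and transposition state and measuring the distortion against the reconstructed template samples.

```python
from itertools import product


def best_mip_with_transpose(mip_modes, template_cost):
    """Return (mode, transposed) with the smallest template distortion cost.

    template_cost -- hypothetical callback: (mode, transposed) -> cost
    """
    # Enumerate every combination of transposition state and MIP mode.
    candidates = product((False, True), mip_modes)
    transposed, mode = min(candidates, key=lambda c: template_cost(c[1], c[0]))
    return mode, transposed
```

- because the decoder repeats the same search, neither the mode index nor the transposition flag needs to be signalled in this variant.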
- if the value of the third identifier is a fifth numerical value, it is used to identify that the input vector and the output vector of the MIP mode are transposed; if the value of the third identifier is a sixth numerical value, it is used to identify that the input vector and the output vector of the MIP mode are not transposed.
- each state of the third identifier can also be replaced by each value of the third identifier.
- the fifth value is 1 and the sixth value is 0.
- the fifth value is 0 and the sixth value is 1.
- the fifth numerical value and the sixth numerical value can also be other numerical values, which are not limited in this application.
- if the third identifier is true, it is used to identify that the input vector and the output vector of the MIP mode are transposed; if the third identifier is false, it is used to identify that the input vector and the output vector of the MIP mode are not transposed. In this case, whether the third identifier is true or false is a state of the third identifier.
- the third identification is a sequence level identification, a block level identification or a coding unit level identification.
- the third identification may also be called transposition information, transposition identification, or MIP transposition identification bit.
- the third identifier is used to identify whether to transpose the input vector and the output vector of the MIP mode, and can also be replaced with a description with similar or identical meaning.
- for example: the third identifier is used to identify whether the input and output of the MIP mode need to be transposed; the third identifier is used to identify whether the input vector and the output vector of the MIP mode are transposed; or the third identifier is used to indicate whether to transpose.
- the S320 may include:
- if the size of the current block is a preset size, the decoder determines the optimal MIP mode based on the distortion costs of the multiple MIP modes.
- the preset size may include a size whose width is the preset width and whose height is the preset height. That is to say, if the width of the current block is the preset width and the height is the preset height, the decoder determines the optimal MIP mode based on the distortion costs of the multiple MIP modes.
- the preset size can be realized by pre-saving corresponding codes, tables or other means that can be used to indicate relevant information in the device (for example, including a decoder and an encoder). This application does not limit its specific implementation.
- the preset size may refer to the size defined in the protocol.
- the "protocol" may refer to a standard protocol in the field of coding and decoding technology, which may include, for example, VVC or ECM protocols and other related protocols.
- the decoder may also use other methods to decide, based on the preset size, whether to determine the optimal MIP mode based on the distortion costs of the multiple MIP modes, which is not specifically limited in this application.
- the decoder may determine whether to determine the optimal MIP mode based on the distortion costs of the plurality of MIP modes based solely on the width or height of the current block. In one implementation, if the width of the current block is the preset width or the height is the preset height, the decoder determines the optimal MIP mode based on the distortion costs of the multiple MIP modes. For another example, the decoder may determine whether to determine the optimal MIP mode based on the distortion costs of the multiple MIP modes by comparing the size of the current block with the preset size. In one implementation, if the size of the current block is larger or smaller than a preset size, the decoder determines the optimal MIP mode based on the distortion costs of the multiple MIP modes.
- in another implementation, if the width of the current block is greater than or less than a preset width, the decoder determines the optimal MIP mode based on the distortion costs of the multiple MIP modes. In another implementation, if the height of the current block is greater than or less than a preset height, the decoder determines the optimal MIP mode based on the distortion costs of the multiple MIP modes.
- the S320 may include:
- if the image frame in which the current block is located is an I frame and the size of the current block is the preset size, the decoder determines the optimal MIP mode based on the distortion costs of the multiple MIP modes. That is to say, only when the image frame in which the current block is located is an I frame does the decoder decide, based on the size of the current block, whether to determine the optimal MIP mode based on the distortion costs of the multiple MIP modes.
- the S320 may include:
- if the image frame in which the current block is located is a B frame, the decoder determines the optimal MIP mode based on the distortion costs of the multiple MIP modes.
- in this case the decoder may directly determine the optimal MIP mode based on the distortion costs of the multiple MIP modes. That is to say, when the image frame in which the current block is located is a B frame, regardless of the size of the current block, the decoder can directly determine the optimal MIP mode based on the distortion costs of the multiple MIP modes.
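- the frame-type and block-size conditions above can be combined into a single check, sketched below. The function name, the string frame types, and the set `preset_sizes` are hypothetical; returning False for frame types other than I and B is an assumption of this sketch, since the text does not address them.

```python
def use_template_mip(frame_type, width, height, preset_sizes):
    """Decide whether the optimal MIP mode is derived from template costs.

    preset_sizes -- hypothetical set of allowed (width, height) pairs
    """
    if frame_type == "B":
        # For B frames the block size is not checked.
        return True
    if frame_type == "I":
        # For I frames the derivation depends on the block size.
        return (width, height) in preset_sizes
    # Other frame types are not covered by the text (assumption: disabled).
    return False
```

- a decoder and an encoder would need to apply exactly the same check, because the outcome changes which syntax elements are present in the code stream.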
- the method 300 may further include:
- the decoder obtains the MIP mode used by the adjacent block adjacent to the current block; the decoder determines the MIP mode used by the adjacent block as the multiple MIP modes.
- the adjacent block may be an image block adjacent to at least one of the upper side, the left side, the lower left, the upper right, and the upper left of the current block.
- the decoder may determine image blocks acquired in the order of upper, left, lower left, upper right, and upper left of the current block as the adjacent blocks.
- the multiple MIP modes can be used to construct the available MIP modes, or an available MIP mode list, that the decoder considers for predicting the current block, so that the decoder determines the optimal MIP mode by predicting the samples within the template area using the available MIP modes or the modes in the available MIP mode list.
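- building the available MIP mode list from adjacent blocks can be sketched as follows. The position names and the dictionary input are hypothetical, and skipping duplicate modes is an assumption of this sketch; only the scanning order (upper, left, lower left, upper right, upper left) comes from the text.

```python
NEIGHBOUR_ORDER = ("above", "left", "below_left", "above_right", "above_left")


def build_mip_candidates(neighbour_mip_modes):
    """Collect the MIP modes of adjacent blocks in the scanning order.

    neighbour_mip_modes -- dict: position name -> MIP mode index, or None
                           when the neighbour is unavailable or not MIP-coded
    """
    candidates = []
    for position in NEIGHBOUR_ORDER:
        mode = neighbour_mip_modes.get(position)
        # Skip missing neighbours and modes already in the list.
        if mode is not None and mode not in candidates:
            candidates.append(mode)
    return candidates
```

- a shorter candidate list reduces the number of template predictions the decoder has to evaluate, at the price of possibly excluding the truly best mode.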
- the method 300 may further include:
- the decoder performs reconstructed sample filling on the adjacent reference area outside the template area to obtain the reference rows and reference columns of the template area; the decoder uses the reference rows and the reference columns as input and uses the multiple MIP modes to predict the samples in the template area respectively, to obtain multiple prediction blocks corresponding to the multiple MIP modes; the decoder determines the distortion costs of the multiple MIP modes based on the multiple prediction blocks and the reconstructed block in the template area.
- the decoder fills in the reference reconstruction samples required for template prediction.
- the width of the area in the reference area adjacent to the upper side of the template area is equal to the width of the template area, and the height of the area in the reference area adjacent to the left side of the template area is equal to the height of the template area; if the width of the area adjacent to the upper side of the template area is greater than the width of the template area, the decoder can perform downsampling or dimensionality reduction on that area to obtain the reference row; if the height of the area adjacent to the left side of the template area is greater than the height of the template area, the decoder can perform downsampling or dimensionality reduction on that area to obtain the reference column.
- the template area may be a template area used in the TIMD mode mentioned above, and the reference area may be a reference template (Reference of template) used in the TIMD mode.
- the decoder fills the reference area with reconstructed samples, downsamples or dimensionally reduces the filled reference area to obtain the reference rows and reference columns, and then constructs the input vector of the MIP mode based on the reference rows and reference columns.
- after the decoder obtains the reference row and the reference column, it uses them as input and uses the multiple MIP modes to predict the samples in the template area respectively, obtaining multiple prediction blocks corresponding to the multiple MIP modes; that is to say, based on the reconstructed samples in the reference template of the current block, the decoder traverses the multiple MIP modes to predict the samples within the template area of the current block.
- the decoder uses the reference row, the reference column, the index of the current traversal MIP mode, and the third identifier mentioned above as inputs to obtain the prediction corresponding to the current traversal MIP mode.
- the reference row and the reference column are used to construct the input vector of the current traversal MIP mode; the index of the current traversal MIP mode is used to determine the matrix and/or offset vector of the current traversal MIP mode;
- the third identifier is used to identify whether to transpose the input vector and the output vector of the MIP mode; for example, if the third identifier indicates that the input vector and the output vector of the MIP mode are not transposed, the reference column is spliced after the reference row to form the input vector of the current traversal MIP mode; if the third identifier indicates that the input vector and the output vector of the MIP mode are transposed, the reference row is spliced after the reference column to form the input vector of the current traversal MIP mode, and the decoder transposes the output of the current traversal MIP mode to obtain the prediction block of the template area.
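- A minimal Python sketch of how the third identifier could control the splicing order and the output transposition is given below. The function names are hypothetical, and the ordering used in the transposed case (reference column first, then reference row) is an assumption that mirrors the non-transposed case described above.

```python
from typing import List, Sequence


def build_input_vector(ref_row: Sequence[int], ref_col: Sequence[int],
                       transposed: bool) -> List[int]:
    """Splice the reference row and column into one MIP input vector.

    Not transposed: row first, then column (as stated in the text).
    Transposed: column first, then row (assumed mirror ordering).
    """
    if transposed:
        return list(ref_col) + list(ref_row)
    return list(ref_row) + list(ref_col)


def finalize_output(block: Sequence[Sequence[int]], transposed: bool) -> List[List[int]]:
    """Transpose the predicted block when the transposition flag is set."""
    if transposed:
        return [list(column) for column in zip(*block)]
    return [list(row) for row in block]


print(build_input_vector([1, 2], [3, 4], transposed=False))  # [1, 2, 3, 4]
print(build_input_vector([1, 2], [3, 4], transposed=True))   # [3, 4, 1, 2]
print(finalize_output([[1, 2], [3, 4]], transposed=True))    # [[1, 3], [2, 4]]
```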
- the decoder can, based on the distortion costs between the multiple prediction blocks and the reconstructed samples in the template area, select the MIP mode with the smallest cost according to the minimum distortion cost principle, and determine it as the optimal MIP mode of the current block in the template matching-based MIP mode.
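- The minimum-cost selection over the template area can be illustrated with the following Python sketch. The predictor is passed in as a callable because the actual MIP prediction is not reproduced here; `select_optimal_mip_mode`, `sad`, and the toy predictor are hypothetical names, and the sum of absolute differences is used only as a simple stand-in for the distortion measure.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Block = List[List[int]]


def sad(pred: Block, recon: Block) -> int:
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - r) for p_row, r_row in zip(pred, recon)
               for p, r in zip(p_row, r_row))


def select_optimal_mip_mode(
    modes: Iterable[int],
    predict_template: Callable[[int], Block],
    template_recon: Block,
    cost: Callable[[Block, Block], int] = sad,
) -> Tuple[Optional[int], Optional[int]]:
    """Traverse the candidate modes and keep the one with the smallest cost."""
    best_mode, best_cost = None, None
    for mode in modes:
        mode_cost = cost(predict_template(mode), template_recon)
        if best_cost is None or mode_cost < best_cost:
            best_mode, best_cost = mode, mode_cost
    return best_mode, best_cost


# Toy usage: a stand-in predictor that outputs a flat block of value 10 * mode.
recon = [[20, 21], [19, 20]]
toy_predictor = lambda mode: [[10 * mode] * 2 for _ in range(2)]
best = select_optimal_mip_mode(range(4), toy_predictor, recon)
print(best)  # (2, 2)
```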
- when the decoder uses the multiple MIP modes to predict samples in the template area, it first downsamples the reference row and the reference column to obtain the input vector; then, with the input vector as input, it predicts the samples in the template area by traversing the multiple MIP modes to obtain the output vectors of the multiple MIP modes; finally, it upsamples the output vectors of the multiple MIP modes to obtain the prediction blocks corresponding to the multiple MIP modes.
- the reference row and the reference column satisfy the input conditions of the multiple MIP modes. If they do not, the reference row and/or the reference column can first be processed into input samples that meet the input conditions of the multiple MIP modes, and the input vectors of the multiple MIP modes are then determined based on those input samples. For example, taking the input condition as a specified number of input samples, if the reference row and the reference column do not match the number of input samples of the MIP mode, the decoder can reduce the dimension of the reference row and/or the reference column to the specified number of input samples by methods such as Haar downsampling, and determine the input vectors of the multiple MIP modes based on the specified number of input samples after dimensionality reduction.
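- One simple way to reduce a reference row or column to a specified number of input samples is to average groups of neighbouring samples, as sketched below in Python. This is a sketch under assumptions: the function name, the rounding rule, and the requirement that the length be an integer multiple of the target are choices made for the example and are not taken from the application.

```python
from typing import List, Sequence


def downsample_by_averaging(samples: Sequence[int], target_len: int) -> List[int]:
    """Reduce `samples` to `target_len` values by averaging equal-sized groups.

    Each output value is the rounded integer mean of `step` consecutive inputs,
    where step = len(samples) // target_len.
    """
    n = len(samples)
    if n == target_len:
        return list(samples)
    if target_len <= 0 or n % target_len != 0:
        raise ValueError("length must be a positive integer multiple of target_len")
    step = n // target_len
    return [
        (sum(samples[i:i + step]) + step // 2) // step
        for i in range(0, n, step)
    ]


reduced = downsample_by_averaging([10, 12, 20, 22, 30, 34, 40, 44], 4)
print(reduced)  # [11, 21, 32, 42]
```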
- the S320 may include:
- the decoder determines the optimal MIP mode based on the sum of absolute transformed differences (SATD) of the plurality of MIP modes on the template region.
- the distortion costs of the multiple MIP modes are designed to be the SATD of the multiple MIP modes on the template area. Compared with directly calculating the rate distortion costs of the multiple MIP modes, this not only allows the optimal MIP mode to be determined based on the distortion costs of the multiple MIP modes on the template area, but also reduces the computational complexity of those distortion costs, thereby improving the decompression performance of the decoder.
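- For readers unfamiliar with SATD, the following Python sketch computes it for a 4x4 block using a Hadamard transform of the difference between a prediction block and the reconstructed block. The 4x4 block size and the absence of any normalization factor are assumptions of this sketch; practical codecs typically apply a scaling factor and support other transform sizes.

```python
from typing import List

Block = List[List[int]]

# 4x4 Hadamard matrix (symmetric, so it equals its own transpose).
H4 = [
    [1, 1, 1, 1],
    [1, -1, 1, -1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
]


def matmul(a: Block, b: Block) -> Block:
    """Plain integer matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def satd_4x4(pred: Block, recon: Block) -> int:
    """Unnormalized SATD: sum of |H * (pred - recon) * H| over all coefficients."""
    diff = [[p - r for p, r in zip(p_row, r_row)]
            for p_row, r_row in zip(pred, recon)]
    transformed = matmul(matmul(H4, diff), H4)
    return sum(abs(v) for row in transformed for v in row)


flat = [[5] * 4 for _ in range(4)]
shifted = [[4] * 4 for _ in range(4)]
print(satd_4x4(flat, flat))     # 0
print(satd_4x4(flat, shifted))  # 16
```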
- the reconstructed samples in the reference area are filled, that is, the reference reconstructed samples required for predicting the samples in the template area (such as the template shown in FIG. 3).
- the width and height of the reference area do not need to exceed the width and height of the template area. If the reference area is filled with samples that exceed the width and height of the template area, downsampling or other methods need to be used to reduce the dimension to meet the MIP input dimension requirements.
- the decoder uses the reference reconstructed samples in the reference area, the indexes of the multiple MIP modes, and the MIP transposition flag bits as inputs to predict the samples in the template area to obtain the multiple MIP modes. corresponding prediction block.
- the reference reconstruction samples in the reference area need to meet the MIP input conditions; for example, their dimension is reduced to a specified number of input samples by Haar downsampling or a similar method.
- the indexes of the multiple MIP modes are used to determine the matrix index of the MIP technology, and then obtain the MIP prediction matrix coefficients.
- the MIP transposition flag bit is used to identify whether input and output need to be transposed.
- the decoding method according to the embodiment of the present application is described in detail from the perspective of the decoder above.
- the encoding method according to the embodiment of the present application is described in detail from the perspective of the encoder with reference to FIG. 8 .
- Figure 6 is a schematic flow chart of the encoding method 400 provided by the embodiment of the present application. It should be understood that the encoding method 400 may be performed by an encoder. For example, it is applied to the coding framework 100 shown in FIG. 1 . For ease of description, the following uses an encoder as an example.
- the encoding method 400 may include:
- the distortion cost of the multiple MIP modes includes the distortion cost obtained by using the multiple MIP modes to predict samples in the template area adjacent to the current block;
- S430 Predict the current block based on the intra prediction mode of the current block to obtain a prediction block of the current block;
- S450 Encode the residual block of the current block to obtain the code stream of the current sequence.
- the S410 may include:
- the optimal MIP mode is determined based on the distortion costs of the multiple MIP modes
- the S450 may include:
- the S420 may include:
- the optimal prediction mode based on the optimal MIP mode
- the optimal prediction mode as the intra prediction mode of the current block
- the S450 may include:
- the second identifier is used to identify that the optimal MIP mode is allowed to be used to predict the current block; If the first rate distortion cost is greater than the minimum value of the at least one rate distortion cost, the second identification is used to identify that the optimal MIP mode is not allowed to be used to predict the current block.
- the S420 may include:
- the optimal MIP mode is determined as the optimal prediction mode.
- the S420 may include:
- the template-based intra mode derivation TIMD mode predicts the current block to obtain the second prediction block
- the prediction mode with the smallest distortion cost among the optimal MIP mode and the TIMD mode is determined as the optimal prediction mode.
- the S450 may include:
- the code stream is obtained by encoding the residual block of the current block and encoding the index of the optimal MIP mode based on the encoding method used in the optimal MIP mode.
- the codeword length of the encoding method used by the first n MIP modes in the arrangement sequence is smaller than the codeword length of the encoding method used by the MIP modes after the nth MIP mode in the arrangement sequence; and/or, the first n MIP modes use variable length encoding and the MIP modes after the nth MIP mode use truncated binary encoding.
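- As background for the truncated binary encoding mentioned above, the following Python sketch shows the standard truncated binary code for an alphabet of a given size. It illustrates the general coding technique only; how the application maps MIP mode indexes to the alphabet, and the value of n, are not specified here, and the function name is hypothetical.

```python
def truncated_binary(value: int, alphabet_size: int) -> str:
    """Return the truncated binary codeword of `value` as a bit string.

    With k = floor(log2(alphabet_size)) and u = 2**(k + 1) - alphabet_size,
    the first u symbols use k bits and the remaining symbols use k + 1 bits.
    """
    if not 0 <= value < alphabet_size:
        raise ValueError("value must lie in [0, alphabet_size)")
    k = alphabet_size.bit_length() - 1
    u = (1 << (k + 1)) - alphabet_size
    if value < u:
        # k == 0 only happens for a single-symbol alphabet, which needs no bits.
        return format(value, f"0{k}b") if k > 0 else ""
    return format(value + u, f"0{k + 1}b")


codes = [truncated_binary(v, 5) for v in range(5)]
print(codes)  # ['00', '01', '10', '110', '111']
```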
- the S410 may include:
- the optimal MIP mode is determined based on the distortion costs of the multiple MIP modes in each state of the third identifier.
- the S410 may include:
- the optimal MIP mode is determined based on the distortion costs of the multiple MIP modes.
- the S410 may include:
- the optimal MIP mode is determined based on the distortion costs of the multiple MIP modes.
- the S410 may include:
- the optimal MIP mode is determined based on the distortion costs of the multiple MIP modes.
- the method 400 may further include:
- the MIP modes used by the adjacent blocks are determined as the multiple MIP modes.
- the method 400 may further include:
- Distortion costs for the plurality of MIP modes are determined based on the plurality of prediction blocks and reconstruction blocks within the template region.
- when using the multiple MIP modes to predict samples in the template area, first downsample the reference row and the reference column to obtain an input vector; then, with the input vector as input, predict the samples in the template area by traversing the multiple MIP modes to obtain the output vectors of the multiple MIP modes; finally, upsample the output vectors of the multiple MIP modes to obtain prediction blocks corresponding to the multiple MIP modes.
- the S410 may include:
- the optimal MIP mode is determined based on the sum of absolute transformed differences (SATD) of the plurality of MIP modes on the template region.
- the encoding method can be understood as the reverse process of the decoding method. Therefore, for the specific solution of the encoding method 400, please refer to the relevant content of the decoding method 300. For the convenience of description, this application will not repeat it again.
- the encoder or decoder can directly determine the optimal MIP mode used to predict the current block as the intra prediction mode of the current block.
- the encoder traverses the prediction mode. If the current block is in intra mode, the encoder obtains the sequence-level allowable flag bit, which is used to indicate whether the current sequence is allowed to use the MIP mode derivation technology based on template matching, which can be in the form of sps_tmmip_enable_flag. If the allowed flag bits of tmmip are all true, it means that the current encoder allows the use of TMMIP technology.
- the encoder process can be implemented as the following process:
- step 1
- if sps_tmmip_enable_flag is true, the encoder tries the TMMIP technology, that is, performs step 2; if sps_tmmip_enable_flag is false, the encoder does not try the TMMIP technology, that is, skips step 2 and proceeds to step 3 directly.
- step 2
- the encoder fills the adjacent rows and columns outside the template region with reconstructed samples.
- the filling process is the same as the filling method in the original intra prediction process. For example, the encoder can traverse and fill from the lower left corner to the upper right corner. If all reconstructed samples are available, they are filled in sequence; if none of the reconstructed samples is available, all positions are filled with the mean value; if only some of the reconstructed samples are available, the available reconstructed samples are filled in first, and for the remaining unavailable positions the encoder traverses in the above order, from the lower left corner to the upper right corner, until the first available reconstructed sample appears, and then fills the previously unavailable positions with that first available reconstructed sample.
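- The filling rule described above can be sketched in Python as follows. Unavailable samples are represented by `None`, which is a choice made for this example. Filling gaps that occur after the first available sample with the nearest preceding available sample, and passing the fill value for the fully unavailable case as a parameter, are assumptions of this sketch.

```python
from typing import List, Optional, Sequence


def pad_reference_samples(samples: Sequence[Optional[int]],
                          default_value: int) -> List[int]:
    """Fill unavailable reference samples, scanned from lower-left to upper-right.

    - No sample available: every position receives `default_value`.
    - Leading gaps: filled with the first available sample.
    - Later gaps: filled with the nearest preceding available sample (assumed).
    """
    first_available = next((s for s in samples if s is not None), None)
    if first_available is None:
        return [default_value] * len(samples)
    padded: List[int] = []
    last = first_available
    for sample in samples:
        if sample is not None:
            last = sample
        padded.append(last)
    return padded


print(pad_reference_samples([None, None, 5, None, 7, None], 128))  # [5, 5, 5, 5, 7, 7]
print(pad_reference_samples([None, None], 128))                    # [128, 128]
```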
- the encoder takes the reconstructed samples outside the filled template area as input, and uses the allowable MIP mode to predict the samples within the template area.
- MIP modes are allowed to be used for a 4x4 block size.
- the number of allowed MIP modes is 8.
- the number of allowed MIP modes for blocks of other sizes is 6.
- blocks of any size can use the MIP transpose function, and the prediction mode of TMMIP mentioned above is the same as the MIP technology.
- the specific prediction calculation process includes: the encoder first performs Haar downsampling on the reconstructed samples. For example, the encoder determines the downsampling step size based on the block size. Then, the encoder adjusts the splicing order of the downsampled reconstructed samples on the upper side and the downsampled reconstructed samples on the left side according to whether transposition is required; if transposition is not required, the downsampled reconstructed samples on the left side are spliced after the downsampled reconstructed samples on the upper side, and the resulting vector is used as the input.
- the encoder obtains the MIP matrix coefficients using the traversed prediction mode as an index, and calculates the output vector from the matrix coefficients and the input. Finally, the encoder upsamples the output vector according to the number of samples in the output vector and the current template size. If upsampling is not required, the output vector is filled in along the horizontal direction and output as the template prediction block. If upsampling is required, the horizontal direction is upsampled first and then the vertical direction, until the result has the same size as the template, and the result is output as the prediction block of the template area.
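- The matrix multiplication and the horizontal-then-vertical upsampling can be illustrated with the simplified Python sketch below. It is not the MIP process of any standard: the matrix is a toy example, offsets, shifts, and clipping are omitted, and the use of boundary samples (`left`, `top`) as interpolation anchors is an assumption of this sketch. All function names are hypothetical.

```python
from typing import List, Sequence

Block = List[List[int]]


def mip_reduced_prediction(matrix: Sequence[Sequence[int]],
                           input_vec: Sequence[int], width: int) -> Block:
    """Multiply the matrix by the input vector and reshape row by row."""
    flat = [sum(m * x for m, x in zip(row, input_vec)) for row in matrix]
    return [flat[i:i + width] for i in range(0, len(flat), width)]


def upsample_1d(values: Sequence[int], anchor: int, factor: int) -> List[int]:
    """Linear interpolation; each input value ends one group of `factor` outputs."""
    out: List[int] = []
    prev = anchor
    for v in values:
        for k in range(1, factor + 1):
            out.append((prev * (factor - k) + v * k + factor // 2) // factor)
        prev = v
    return out


def upsample_block(block: Block, left: Sequence[int], top: Sequence[int],
                   fx: int, fy: int) -> Block:
    """Upsample horizontally first, then vertically."""
    rows = [upsample_1d(row, left[i], fx) for i, row in enumerate(block)]
    cols = [upsample_1d([row[j] for row in rows], top[j], fy)
            for j in range(len(rows[0]))]
    return [[cols[j][i] for j in range(len(cols))] for i in range(len(cols[0]))]


reduced = mip_reduced_prediction([[1, 0], [0, 1], [1, 1], [2, 0]], [3, 4], width=2)
print(reduced)  # [[3, 4], [7, 6]]
print(upsample_block([[10, 20]], left=[0], top=[0, 0, 0, 0], fx=2, fy=2))
# [[3, 5, 8, 10], [5, 10, 15, 20]]
```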
- the encoder calculates the distortion cost based on the prediction block of the template area obtained by traversing each MIP mode and the reconstructed samples in the template area, and records the distortion cost value under the prediction mode and transposition information. After traversing all allowed prediction modes and transposition information, according to the principle of minimum cost, the optimal MIP mode and its corresponding transposition information are selected as the optimal prediction mode for the current block in TMMIP mode.
- the encoder downsamples, as appropriate, the reconstructed samples adjacent to the upper and left sides of the current block and splices them according to the transposition information to form the input vector, reads the matrix coefficients of the current mode using the MIP mode as the index, and then obtains the output vector by calculating with the input vector and the matrix coefficients.
- the encoder can perform output transposition according to the transposition information, and upsample the output vector according to the size of the current block and the number of samples of the output vector, and obtain an output with the same size as the current block as the prediction block of the current block. Based on this, the encoder obtains the rate distortion cost of the current block and records it as cost1.
- the encoder continues to traverse other intra prediction techniques and calculates the corresponding rate-distortion cost as cost2...costN.
- if cost1 is the minimum rate distortion cost, the current block uses the TMMIP technology, and the encoder sets the TMMIP usage flag of the current block to true and writes it into the code stream; if cost1 is not the minimum rate distortion cost, the current block uses another intra prediction technology, and the encoder sets the TMMIP usage flag of the current block to false and writes it into the code stream. It should be understood that information such as identification bits or indexes of other intra prediction technologies is transmitted according to definition and will not be elaborated here.
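- The comparison of cost1 with cost2...costN can be sketched in Python as follows. The dictionary representation, the key "TMMIP", and the function name are hypothetical; writing the flag into the code stream is not modelled.

```python
from typing import Dict, Tuple


def choose_intra_technique(rd_costs: Dict[str, float]) -> Tuple[str, bool]:
    """Pick the technique with the smallest rate distortion cost.

    `rd_costs` maps a technique name to its cost; the entry "TMMIP" plays the
    role of cost1. Returns the chosen technique and the TMMIP usage flag.
    """
    best = min(rd_costs, key=rd_costs.get)
    return best, best == "TMMIP"


print(choose_intra_technique({"TMMIP": 10.0, "angular": 12.5}))  # ('TMMIP', True)
print(choose_intra_technique({"TMMIP": 10.0, "planar": 8.0}))    # ('planar', False)
```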
- the encoder determines the residual block of the current block based on the prediction block of the current block and the original block of the current block, and performs operations such as transformation and quantization, entropy coding, and loop filtering on the residual block of the current block. It should be understood that the specific process can be found in the relevant content above, and to avoid repetition, it will not be described again here.
- the decoder parses the block-level type flag bit. If the current block is in intra mode, the decoder parses or obtains the sequence-level allowable flag bit, which indicates whether the current sequence is allowed to use the MIP mode derivation technology based on template matching and can take the form of sps_tmmip_enable_flag. If the allowed flag bits of tmmip are all true, it means that the current decoder allows the use of TMMIP technology.
- the decoder process can be implemented as the following process:
- step 1
- if sps_tmmip_enable_flag is true, the decoder parses the TMMIP usage flag of the current block. Otherwise, the current decoding process does not need to decode the block-level TMMIP usage flag, and the block-level TMMIP usage flag defaults to false. If the TMMIP usage flag of the current block is true, perform step 2; otherwise, perform step 3.
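- The flag handling in step 1 can be sketched in Python as below. The bit reader is modelled as a plain callable returning 0 or 1, which is an assumption; real entropy decoding is not modelled, and the function names are hypothetical.

```python
from typing import Callable


def parse_tmmip_usage_flag(sps_tmmip_enable_flag: bool,
                           read_bit: Callable[[], int]) -> bool:
    """Read the block-level flag only when the sequence-level flag allows TMMIP."""
    if not sps_tmmip_enable_flag:
        return False  # flag is not present in the stream and defaults to false
    return read_bit() == 1


def next_step(tmmip_usage_flag: bool) -> int:
    """Step 2 derives the mode by template matching; step 3 parses other tools."""
    return 2 if tmmip_usage_flag else 3


bits = iter([1])
flag = parse_tmmip_usage_flag(True, lambda: next(bits))
print(flag, next_step(flag))  # True 2
```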
- the decoder fills the adjacent rows and columns outside the template region with reconstructed samples.
- the filling process is the same as the filling method in the original intra prediction process. For example, the decoder can traverse and fill from the lower left corner to the upper right corner. If all reconstructed samples are available, they are filled in sequence; if none of the reconstructed samples is available, all positions are filled with the mean value; if only some of the reconstructed samples are available, the available reconstructed samples are filled in first, and for the remaining unavailable positions the decoder traverses in the above order, from the lower left corner to the upper right corner, until the first available reconstructed sample appears, and then fills the previously unavailable positions with that first available reconstructed sample.
- the decoder takes the reconstructed samples outside the filled template area as input, and uses the allowed MIP mode to predict the samples in the template area.
- MIP modes are allowed to be used for a 4x4 block size.
- the number of allowed MIP modes is 8.
- the number of allowed MIP modes for blocks of other sizes is 6.
- blocks of any size can use the MIP transpose function, and the prediction mode of TMMIP mentioned above is the same as the MIP technology.
- the specific prediction calculation process includes: the decoder first performs Haar downsampling on the reconstructed samples. For example, the decoder determines the downsampling step size based on the block size. Then, the decoder adjusts the splicing order of the downsampled reconstructed samples on the upper side and the downsampled reconstructed samples on the left side according to whether transposition is required; if transposition is not required, the downsampled reconstructed samples on the left side are spliced after the downsampled reconstructed samples on the upper side, and the resulting vector is used as the input.
- the decoder obtains the MIP matrix coefficients using the traversed prediction mode as an index, and calculates the output vector from the matrix coefficients and the input. Finally, the decoder upsamples the output vector according to the number of samples in the output vector and the current template size. If upsampling is not required, the output vector is filled in along the horizontal direction and output as the template prediction block. If upsampling is required, the horizontal direction is upsampled first and then the vertical direction, until the result has the same size as the template, and the result is output as the prediction block of the template area.
- the decoder calculates the distortion cost based on the prediction block of the template area obtained by traversing each MIP mode and the reconstructed samples in the template area, and records the distortion cost value under the prediction mode and transposition information. After traversing all allowed prediction modes and transposition information, according to the principle of minimum cost, the optimal MIP mode and its corresponding transposition information are selected as the optimal prediction mode for the current block in TMMIP mode.
- the decoder downsamples, as appropriate, the reconstructed samples adjacent to the upper and left sides of the current block and splices them according to the transposition information to form the input vector, reads the matrix coefficients of the current mode using the MIP mode as the index, and then obtains the output vector by calculating with the input vector and the matrix coefficients.
- the decoder can perform output transposition according to the transposition information, and upsample the output vector according to the size of the current block and the number of samples of the output vector, and obtain an output with the same size as the current block as the prediction block of the current block.
- the decoder continues to parse information such as usage flags or indexes of other intra prediction technologies, and obtains the final prediction block of the current block based on the parsed information.
- the decoder parses the code stream and obtains the frequency domain residual block of the current block (also called frequency domain residual information), and performs inverse quantization and inverse transformation on the frequency domain residual block of the current block to obtain the residual block of the current block (also known as the temporal residual block or temporal residual information); the decoder then superimposes the prediction block of the current block and the residual block of the current block to obtain a reconstructed sample block.
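- The final superposition of the prediction block and the residual block can be sketched in Python as follows. Clipping the result to the valid sample range for a given bit depth is an assumption of this sketch (it is common practice but is not stated in the text above), and the function name is hypothetical.

```python
from typing import List

Block = List[List[int]]


def reconstruct(pred: Block, resid: Block, bit_depth: int = 8) -> Block:
    """Add the residual to the prediction and clip to [0, 2**bit_depth - 1]."""
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(p_row, r_row)]
        for p_row, r_row in zip(pred, resid)
    ]


print(reconstruct([[250, 10], [100, 50]], [[10, -20], [5, -5]]))
# [[255, 0], [105, 45]]
```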
- the reconstructed image can be used as video output or as a reference for subsequent decoding.
- the size of the template area used by the encoder or decoder in the TMMIP technology can be predefined according to the size of the current block.
- the width of the upper area of the template area adjacent to the upper side of the current block is the width of the current block, and its height is the height of two rows of samples; the height of the left area of the template area adjacent to the left side of the current block is the height of the current block, and its width is the width of two columns of samples.
- it can also be implemented as template areas of other sizes, and this application does not specifically limit this.
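- The template geometry in this example (an upper area two rows high and a left area two columns wide) can be expressed with the short Python sketch below. The `Rect` type, the coordinate convention with the origin at the top-left of the picture, and the function name are assumptions made for illustration.

```python
from typing import NamedTuple, Tuple


class Rect(NamedTuple):
    x: int  # left edge, in samples
    y: int  # top edge, in samples
    w: int  # width
    h: int  # height


def template_regions(x0: int, y0: int, block_w: int, block_h: int,
                     thickness: int = 2) -> Tuple[Rect, Rect]:
    """Return the upper and left template areas of a block at (x0, y0)."""
    upper = Rect(x0, y0 - thickness, block_w, thickness)
    left = Rect(x0 - thickness, y0, thickness, block_h)
    return upper, left


upper, left = template_regions(16, 8, block_w=8, block_h=4)
print(upper)  # Rect(x=16, y=6, w=8, h=2)
print(left)   # Rect(x=14, y=8, w=2, h=4)
```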
- the encoder or decoder can determine the optimal prediction mode as the intra prediction mode of the current block. That is to say, the encoder or decoder can predict image blocks based on both the TMMIP technology and the TIMD technology. For example, under the condition that the TIMD technology is allowed to be used, the TMMIP technology and the TIMD technology can share a single usage identification bit at the sequence level and at the block level to identify whether the TIMD or TMMIP technology is used.
- the template areas of TIMD technology and TMMIP technology can be set the same, that is, the distortion cost area for calculating the template area is the same, and the two technologies can be merged together for comparison.
- TIMD derives traditional intra prediction modes while TMMIP technology derives MIP modes and their corresponding transposed information.
- the encoder traverses the prediction mode. If the current block is in intra mode, it obtains the allowed use flag of TIMD. This flag is a sequence-level flag, indicating that the current encoder allows the use of TIMD technology, which can be in the form of sps_TIMD_enable_flag.
- the encoder process can be implemented as the following process:
- step 1
- if the allowed use flag of TIMD is true, the encoder tries the TIMD prediction method, that is, performs step 2; if the allowed use flag of TIMD is false, the encoder does not try the TIMD prediction method, that is, skips step 2 and performs step 3 directly.
- the encoder fills the adjacent rows and columns outside the template region with reconstructed samples.
- the filling process is the same as the filling method in the original intra prediction process. For example, the encoder can traverse and fill from the lower left corner to the upper right corner. If all reconstructed samples are available, they are filled in sequence; if none of the reconstructed samples is available, all positions are filled with the mean value; if only some of the reconstructed samples are available, the available reconstructed samples are filled in first, and for the remaining unavailable positions the encoder traverses in the above order, from the lower left corner to the upper right corner, until the first available reconstructed sample appears, and then fills the previously unavailable positions with that first available reconstructed sample.
- the encoder takes the reconstructed samples outside the filled template area as input, and uses the allowable MIP mode to predict the samples within the template area.
- MIP modes are allowed to be used for a 4x4 block size.
- the number of allowed MIP modes is 8.
- the number of allowed MIP modes for blocks of other sizes is 6.
- blocks of any size can use the MIP transpose function, and the prediction mode of TMMIP mentioned above is the same as the MIP technology.
- the specific prediction calculation process includes: the encoder first performs Haar downsampling on the reconstructed samples. For example, the encoder determines the downsampling step size based on the block size. Then, the encoder adjusts the splicing order of the downsampled reconstructed samples on the upper side and the downsampled reconstructed samples on the left side according to whether transposition is required; if transposition is not required, the downsampled reconstructed samples on the left side are spliced after the downsampled reconstructed samples on the upper side, and the resulting vector is used as the input.
- the encoder obtains the MIP matrix coefficients using the traversed prediction mode as an index, and calculates the output vector from the matrix coefficients and the input. Finally, the encoder upsamples the output vector according to the number of samples in the output vector and the current template size. If upsampling is not required, the output vector is filled in along the horizontal direction and output as the template prediction block. If upsampling is required, the horizontal direction is upsampled first and then the vertical direction, until the result has the same size as the template, and the result is output as the prediction block of the template area.
- the encoder also needs to try the template matching calculation process of TIMD, that is, the encoder can obtain different interpolation filters based on different prediction mode indexes to interpolate the reference samples to obtain prediction samples within the template.
- the encoder calculates the distortion cost based on the prediction block of the template area obtained by traversing each MIP mode and the reconstructed samples in the template area, and records the distortion cost under the prediction mode and transposition information.
- the encoder also needs to traverse all intra prediction modes allowed by TIMD, calculate the prediction blocks within the template, calculate the distortion cost with the reconstructed samples within the template, and record the distortion cost in each prediction mode. After the encoder traverses all allowed MIP modes and transposed information and traverses the prediction modes allowed by TIMD, it can select the optimal prediction mode based on the principle of minimum distortion cost.
- if the distortion cost of the MIP mode is the smallest, the MIP mode and its corresponding transposition information are used as the optimal prediction mode of the current block; if the distortion cost of the TIMD mode is the smallest, the TIMD mode is used as the optimal prediction mode of the current block, and the encoder records the optimal intra prediction mode derived by the TIMD technology and its distortion cost, as well as the suboptimal prediction mode derived by the TIMD technology and its distortion cost.
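- The joint comparison between the template costs of the MIP candidates and those of the TIMD candidates can be sketched in Python as follows. The dictionary layouts, the returned record, and the rule that a tie goes to TIMD are assumptions of this sketch; the function name is hypothetical.

```python
from typing import Dict, Tuple


def select_joint(mip_costs: Dict[Tuple[int, bool], int],
                 timd_costs: Dict[int, int]) -> dict:
    """Choose between the best MIP candidate and the best TIMD candidate.

    `mip_costs` maps (MIP mode, transposed) to a template distortion cost;
    `timd_costs` maps a conventional intra mode to its template cost.
    """
    best_mip = min(mip_costs, key=mip_costs.get)
    ranked_timd = sorted(timd_costs, key=timd_costs.get)
    best_timd = ranked_timd[0]
    if mip_costs[best_mip] < timd_costs[best_timd]:
        return {"technique": "TMMIP", "mode": best_mip[0], "transposed": best_mip[1]}
    # TIMD also keeps the suboptimal mode, which a later fusion step may use.
    second = ranked_timd[1] if len(ranked_timd) > 1 else None
    return {"technique": "TIMD", "mode": best_timd, "second_mode": second}


print(select_joint({(0, False): 40, (3, True): 12}, {18: 20, 50: 25}))
# {'technique': 'TMMIP', 'mode': 3, 'transposed': True}
print(select_joint({(0, False): 40}, {18: 20, 50: 25, 2: 30}))
# {'technique': 'TIMD', 'mode': 18, 'second_mode': 50}
```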
- the encoder downsamples, as appropriate, the reconstructed samples adjacent to the upper and left sides of the current block according to the optimal MIP mode and transposition information, splices them according to the transposition information to form the input vector, reads the matrix coefficients of the current mode using the MIP mode as an index, and obtains the output vector by calculating with the input vector and the matrix coefficients.
- the encoder can perform output transposition according to the transposition information, and upsample the output vector according to the size of the current block and the number of samples of the output vector, and obtain an output with the same size as the current block as the prediction block of the current block. Based on this, the encoder obtains the rate distortion cost of the current block and records it as cost1.
- if the optimal prediction mode obtained by the encoder is the TIMD mode, and neither the optimal prediction mode nor the suboptimal prediction mode is the mean (DC) mode or the flat (PLANAR) mode, and the distortion cost of the suboptimal prediction mode is less than twice the distortion cost of the optimal prediction mode, the encoder needs to perform a prediction block fusion operation. First, the encoder obtains the interpolation filter coefficients according to the optimal prediction mode and performs interpolation filtering on the upper and left adjacent reconstructed samples to obtain prediction samples at all positions in the current block, which are recorded as the optimal prediction block; secondly, the encoder obtains the interpolation filter coefficients according to the suboptimal prediction mode and performs interpolation filtering on the upper and left adjacent reconstructed samples to obtain prediction samples at all positions in the current block, which are recorded as the suboptimal prediction block.
- the encoder uses the ratio between the optimal prediction mode cost value and the suboptimal prediction mode cost value to calculate the weight value belonging to the optimal prediction block and the weight value of the suboptimal prediction block.
- the encoder performs a weighted fusion of the optimal prediction block and the suboptimal prediction block to obtain the prediction block of the current block as output.
- otherwise, the encoder does not need to perform the prediction block fusion operation, and only the optimal prediction block, obtained by interpolation filtering the upper and left adjacent reconstructed samples using the optimal prediction mode, is used as the prediction block of the current block. Based on this, the encoder obtains the rate distortion cost of the current block, recorded as cost1.
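The fusion rule for the TIMD case can be sketched as follows. This is a minimal illustration, not the reference implementation: the function names, the mode labels `"DC"` and `"PLANAR"`, and the floating-point weight formula (each block weighted by the other mode's share of the total cost) are assumptions made for this example; the text only states that the weights are derived from the ratio between the two cost values. Costs are assumed to be non-negative.

```python
def timd_fusion_weights(best_mode, best_cost, second_mode, second_cost):
    """Return (w_best, w_second); w_second == 0.0 means no fusion is performed."""
    non_angular = {"DC", "PLANAR"}
    fuse = (
        best_mode not in non_angular
        and second_mode not in non_angular
        and second_cost < 2 * best_cost  # condition stated in the text
    )
    if not fuse:
        return 1.0, 0.0
    total = best_cost + second_cost  # > 0 whenever the fusion condition holds
    # Assumed weighting: the lower-cost mode receives the larger weight.
    return second_cost / total, best_cost / total


def fuse_blocks(best_block, second_block, w_best, w_second):
    """Weighted sample-wise combination of two equally sized prediction blocks."""
    return [
        [int(round(w_best * a + w_second * b)) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(best_block, second_block)
    ]


if __name__ == "__main__":
    w1, w2 = timd_fusion_weights(18, 100, 50, 150)
    print(w1, w2)
    print(fuse_blocks([[10, 20]], [[20, 40]], w1, w2))
```

A practical codec would typically use integer weights and shifts instead of floating point; floats are used here only to keep the sketch short.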
- the encoder continues to traverse other intra prediction techniques and calculates the corresponding rate distortion cost, which is recorded as cost2...costN;
- if cost1 is the minimum rate distortion cost, the encoder uses TIMD technology in the current block, sets the TIMD usage flag of the current block to true, and writes it into the code stream. It should be noted that in this embodiment, the use of TIMD technology in the current block can be understood as the use of either TIMD technology or TMMIP technology in the current block; which of the two is actually used can be determined based on the cost information. If cost1 is not the minimum rate distortion cost, the encoder uses other intra prediction techniques in the current block, sets the TIMD usage flag of the current block to false, and writes it into the code stream. It should be understood that information such as flag bits or indexes of other intra prediction technologies is transmitted according to its definition and will not be elaborated here.
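The encoder-side flag decision can be illustrated with a short sketch. The function name is hypothetical, and treating a tie as "TIMD wins" is an assumption; the text only distinguishes whether cost1 is the minimum rate distortion cost.

```python
def decide_timd_flag(cost1, other_costs):
    """Return True if the TIMD usage flag should be set to true.

    cost1: rate distortion cost of the TIMD/TMMIP candidate.
    other_costs: costs cost2..costN of the other intra prediction techniques.
    """
    best_other = min(other_costs) if other_costs else float("inf")
    # Assumption: on a tie, the TIMD/TMMIP candidate is preferred.
    return cost1 <= best_other


if __name__ == "__main__":
    print(decide_timd_flag(10.0, [12.5, 11.0]))
    print(decide_timd_flag(10.0, [9.0]))
```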
- the encoder determines the residual block of the current block based on the prediction block of the current block and the original block of the current block, and performs operations such as transformation and quantization, entropy coding, and loop filtering on the residual block of the current block. It should be understood that the specific process can be found in the relevant content above, and to avoid repetition, it will not be described again here.
- the decoder parses the block-level type flag bit. If the block uses intra-frame mode, the decoder parses or obtains the TIMD allowed-use flag bit. This flag bit is a sequence-level flag bit indicating whether the current decoder allows the use of TIMD technology.
- the decoder process can be implemented as the following process:
- step 1
- if the TIMD allowed-use flag is true, the decoder parses the TIMD usage flag of the current block. Otherwise, the current decoding process does not need to decode the block-level TIMD usage flag, and the block-level TIMD usage flag defaults to false. If the TIMD usage flag of the current block is true, perform step 2; otherwise, perform step 3.
- the decoder fills the adjacent rows and columns outside the template region with reconstructed samples.
- the filling process is the same as the filling method in the original intra prediction process. For example, the decoder can traverse and fill from the lower left corner to the upper right corner. If all reconstructed samples are available, they are filled in sequence; if no reconstructed samples are available, all positions are filled with the mean value; if only some of the reconstructed samples are available, the available reconstructed samples are filled in first. For the remaining unavailable positions, the decoder traverses in the above order from the lower left corner to the upper right corner until the first available reconstructed sample appears, and then fills the preceding unavailable positions with that first available reconstructed sample.
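The filling procedure can be sketched as below. The function name and the use of `None` to mark unavailable samples are illustrative assumptions. The text does not say how unavailable positions after the first available sample are handled; this sketch assumes they copy the nearest preceding sample, as is common in intra reference sample substitution. The fill value for the "nothing available" case is passed in by the caller.

```python
def pad_reference_samples(samples, fill_value):
    """Fill unavailable reference samples.

    samples: list ordered from the lower left corner to the upper right corner,
             with None marking an unavailable reconstructed sample.
    fill_value: value used when no sample at all is available.
    """
    if all(s is None for s in samples):
        return [fill_value] * len(samples)

    out = list(samples)
    first = next(i for i, s in enumerate(out) if s is not None)
    # Positions before the first available sample take its value.
    for i in range(first):
        out[i] = out[first]
    # Assumption: later gaps copy the nearest preceding (already filled) sample.
    for i in range(first + 1, len(out)):
        if out[i] is None:
            out[i] = out[i - 1]
    return out


if __name__ == "__main__":
    print(pad_reference_samples([None, None, 5, None, 7], fill_value=512))
```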
- the decoder takes the reconstructed samples outside the filled template area as input, and uses the allowed MIP mode to predict the samples in the template area.
- the set of allowed MIP modes depends on the block size: blocks of size 4x4 have their own set of allowed MIP modes.
- for some block sizes, the number of allowed MIP modes is 8.
- for blocks of other sizes, the number of allowed MIP modes is 6.
- blocks of any size can use the MIP transpose function, and the prediction mode of TMMIP mentioned above is the same as the MIP technology.
- the specific prediction calculation process includes: the decoder first performs Haar downsampling on the reconstructed samples; for example, the decoder determines the downsampling step size based on the block size. Then, the decoder adjusts the splicing order of the downsampled upper reconstructed samples and the downsampled left reconstructed samples according to whether transposition is applied; if transposition is not required, the downsampled left reconstructed samples are spliced after the downsampled upper reconstructed samples, and the resulting vector is used as the input.
- the decoder obtains the MIP matrix coefficients using the traversed prediction mode as an index, and calculates the output vector from the matrix coefficients and the input. Finally, the decoder upsamples the output vector according to the number of samples in the output vector and the current template size. If upsampling is not required, the output vector is filled in along the horizontal direction and output as the template prediction block. If upsampling is required, the horizontal direction is upsampled first and then the vertical direction, until the result has the same size as the template, and it is then output as the prediction block of the template region.
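The input construction and matrix step can be sketched as follows. This is a simplified illustration under stated assumptions: the function names are hypothetical, the downsampling is plain group averaging with rounding, and the offset, shift and clipping stages of a real MIP implementation are omitted. The matrix in the usage example is a toy matrix, not an actual MIP coefficient table.

```python
def haar_downsample(samples, out_len):
    """Average consecutive groups of samples (with rounding) down to out_len values."""
    assert len(samples) % out_len == 0, "length must be a multiple of out_len"
    step = len(samples) // out_len
    return [
        (sum(samples[i * step:(i + 1) * step]) + step // 2) // step
        for i in range(out_len)
    ]


def build_mip_input(top, left, out_len, transposed):
    """Splice the downsampled boundaries; the order depends on the transposition flag."""
    top_ds = haar_downsample(top, out_len)
    left_ds = haar_downsample(left, out_len)
    # Without transposition the left samples follow the upper samples.
    return left_ds + top_ds if transposed else top_ds + left_ds


def mip_matrix_predict(matrix, input_vec):
    """Matrix-vector product: one output sample per matrix row."""
    return [sum(w * x for w, x in zip(row, input_vec)) for row in matrix]


if __name__ == "__main__":
    vec = build_mip_input([1, 3, 5, 7], [10, 10, 20, 20], out_len=2, transposed=False)
    toy_matrix = [[1, 0, 0, 0], [0, 0, 0, 1]]  # toy coefficients for illustration
    print(vec, mip_matrix_predict(toy_matrix, vec))
```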
- the decoder also needs to try the template matching calculation process of TIMD, that is, the decoder obtains different interpolation filters based on different prediction mode indexes to interpolate the reference samples to obtain prediction samples within the template.
- the decoder calculates the distortion cost based on the prediction block of the template area obtained by traversing each MIP mode and the reconstructed samples in the template area, and records the distortion cost value under the prediction mode and transposition information.
- the decoder also needs to traverse all intra prediction modes allowed by TIMD, calculate the prediction blocks within the template, calculate the distortion cost with the reconstructed samples in the template, and record the distortion cost in each prediction mode. After the decoder traverses all allowed MIP modes and transposed information and traverses the prediction modes allowed by TIMD, it can select the optimal prediction mode based on the principle of minimum distortion cost.
- if the distortion cost of a MIP mode is the smallest, the MIP mode and its corresponding transposition information are used as the optimal prediction mode of the current block; if the distortion cost of the TIMD mode is the smallest, the TIMD mode is used as the optimal prediction mode of the current block, and the optimal intra prediction mode derived by the TIMD technology and its distortion cost are recorded, together with the suboptimal prediction mode derived by the TIMD technology and its distortion cost.
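The selection by minimum template distortion cost can be sketched as follows. The data layout (dictionaries keyed by mode) and the function name are assumptions made for illustration, as is the tie-break in favour of MIP.

```python
def select_optimal_mode(mip_costs, timd_costs):
    """Pick the prediction mode with the smallest template distortion cost.

    mip_costs: {(mip_mode, transposed): cost} for every traversed MIP mode/transposition.
    timd_costs: {intra_mode: cost} for every intra mode allowed by TIMD.
    """
    best_mip = min(mip_costs, key=mip_costs.get)
    ranked_timd = sorted(timd_costs, key=timd_costs.get)
    best_timd = ranked_timd[0]

    # Assumption: on a tie the MIP candidate is preferred.
    if mip_costs[best_mip] <= timd_costs[best_timd]:
        mip_mode, transposed = best_mip
        return ("MIP", mip_mode, transposed)

    # For TIMD, the suboptimal mode is also recorded for a possible fusion.
    second_timd = ranked_timd[1] if len(ranked_timd) > 1 else None
    return ("TIMD", best_timd, second_timd)


if __name__ == "__main__":
    mip = {(0, False): 120, (0, True): 90, (1, False): 150}
    timd = {18: 100, 50: 130}
    print(select_optimal_mode(mip, timd))
```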
- if the optimal prediction mode is a MIP mode, the decoder downsamples, as appropriate, the reconstructed samples adjacent to the upper and left sides of the current block according to the optimal MIP mode and the transposition information, and splices them in the order given by the transposition information to form the input vector.
- the matrix coefficients of the current mode are read using the MIP mode as an index.
- the output vector is obtained by calculating the input vector and matrix coefficients.
- the decoder can perform output transposition according to the transposition information, and upsample the output vector according to the size of the current block and the number of samples of the output vector, and obtain an output with the same size as the current block as the prediction block of the current block. Based on this, the decoder obtains the rate distortion cost of the current block and records it as cost1.
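The output stage (optional transposition followed by upsampling to the block size) can be sketched as follows. Note the substitution: this sketch uses nearest-neighbour replication as a stand-in for the interpolation-based upsampling, purely to keep the example short. The function names and the square reduced-block layout are assumptions.

```python
def reshape_output(output_vec, size, transposed):
    """Arrange a length size*size output vector as a size x size block, row by row.

    If the transposition flag is set, the block is transposed.
    """
    assert len(output_vec) == size * size
    block = [output_vec[r * size:(r + 1) * size] for r in range(size)]
    if transposed:
        block = [list(col) for col in zip(*block)]
    return block


def upsample_nearest(block, out_h, out_w):
    """Nearest-neighbour upsampling (a simplification, not linear interpolation)."""
    in_h, in_w = len(block), len(block[0])
    return [
        [block[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


if __name__ == "__main__":
    reduced = reshape_output([1, 2, 3, 4], size=2, transposed=True)
    print(reduced)
    print(upsample_nearest(reduced, 4, 4))
```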
- if the optimal prediction mode obtained by the decoder is the TIMD mode, the following applies. If neither the optimal prediction mode nor the suboptimal prediction mode is the mean (DC) mode or the flat (PLANAR) mode, and the distortion cost of the suboptimal prediction mode is less than twice the distortion cost of the optimal prediction mode, the decoder needs to perform a prediction block fusion operation. First, the decoder obtains the interpolation filter coefficients according to the optimal prediction mode and performs interpolation filtering on the upper and left adjacent reconstructed samples to obtain prediction samples at all positions in the current block, which are recorded as the optimal prediction block.
- secondly, the decoder obtains the interpolation filter coefficients according to the suboptimal prediction mode and performs interpolation filtering on the upper and left adjacent reconstructed samples to obtain prediction samples at all positions in the current block, which are recorded as the suboptimal prediction block.
- the decoder uses the ratio between the optimal prediction mode cost value and the suboptimal prediction mode cost value to calculate the weight value belonging to the optimal prediction block and the weight value of the suboptimal prediction block.
- the decoder performs a weighted fusion of the optimal prediction block and the suboptimal prediction block to obtain the prediction block of the current block as output.
- otherwise, the decoder does not need to perform the prediction block fusion operation, and only the optimal prediction block, obtained by interpolation filtering the upper and left adjacent reconstructed samples using the optimal prediction mode, is used as the prediction block of the current block. Based on this, the decoder obtains the rate distortion cost of the current block, recorded as cost1.
- the decoder continues to parse information such as usage flags or indexes of other intra prediction technologies, and obtains the final prediction block of the current block based on the parsed information.
- the decoder parses the code stream and obtains the frequency domain residual block of the current block (also called frequency domain residual information), and performs inverse quantization and inverse transformation on the frequency domain residual block of the current block to obtain the residual block of the current block (also known as the time domain residual block or time domain residual information); the decoder then superimposes the prediction block of the current block and the residual block of the current block to obtain a reconstructed sample block.
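The final superposition step can be illustrated with a small sketch. The function name, the clipping to the valid sample range and the default bit depth of 10 are assumptions for illustration; the text only states that the prediction block and the residual block are superimposed.

```python
def reconstruct_block(pred, resid, bit_depth=10):
    """Add the residual to the prediction sample by sample.

    Assumption: results are clipped to [0, 2**bit_depth - 1].
    """
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(pred_row, resid_row)]
        for pred_row, resid_row in zip(pred, resid)
    ]


if __name__ == "__main__":
    print(reconstruct_block([[1020, 5], [100, 200]], [[10, -9], [3, -4]]))
```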
- the reconstructed image can be used as video output or as a reference for subsequent decoding.
- the calculation process of the weight value of the weighted fusion of TIMD prediction blocks can be referred to the content described above in the introduction to TIMD technology. To avoid duplication, it will not be described again here.
- the size of the template area used by the encoder or decoder in the TMMIP technology can be predefined based on the size of the current block. In addition, if the MIP mode is the optimal prediction mode or the suboptimal prediction mode, a weighted fusion operation may also be attempted; furthermore, the definition of the template area in the TMMIP technology can be consistent with the definition of the template area in the TIMD technology, or it can be different.
- for example, if the width of the current block is less than or equal to 8, the height of the upper area of the template adjacent to the current block is two rows of samples, otherwise the height is four rows of samples; similarly, if the height of the current block is less than or equal to 8, the width of the left area of the template adjacent to the left side of the current block is two columns of samples, otherwise the width is four columns of samples.
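The template sizing rule in this example can be written as a small helper. The function name and the return convention (rows of the upper area, columns of the left area) are assumptions for illustration.

```python
def template_size(block_width, block_height):
    """Return (rows of the upper template area, columns of the left template area)."""
    top_rows = 2 if block_width <= 8 else 4
    left_cols = 2 if block_height <= 8 else 4
    return top_rows, left_cols


if __name__ == "__main__":
    for w, h in [(4, 4), (8, 16), (16, 8), (32, 32)]:
        print((w, h), template_size(w, h))
```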
- the size of the sequence numbers of the above-mentioned processes does not mean the order of execution.
- the execution order of each process should be determined by its functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of this application.
- Figure 7 is a schematic block diagram of the decoder 500 according to the embodiment of the present application.
- the decoder 500 may include:
- the parsing unit 510 is used to parse the code stream to obtain the residual block of the current block in the current sequence;
- the prediction unit 520 is used to:
- the distortion cost of the multiple MIP modes includes the distortion cost obtained by using the multiple MIP modes to predict samples in the template area adjacent to the current block;
- the reconstruction unit 530 is configured to obtain a reconstruction block of the current block based on the residual block of the current block and the prediction block of the current block.
- the prediction unit 520 is specifically used to:
- the optimal MIP mode is determined based on the distortion costs of the multiple MIP modes.
- the prediction unit 520 is specifically used to:
- the first identifier is used to identify that the optimal MIP mode is allowed to be used to predict the image blocks in the current sequence, then parse the code stream to obtain the second identifier;
- the optimal MIP mode is determined based on the distortion costs of the multiple MIP modes.
- the parsing unit 510 is also used to:
- the code stream of the current sequence is decoded based on the encoding method used in the optimal MIP mode to obtain the index of the optimal MIP mode.
- the codeword length of the coding scheme used by the first n MIP modes in the arrangement order is smaller than the codeword length of the coding scheme used by the MIP modes after the nth MIP mode in the arrangement order; and/or, the first n MIP modes use variable-length coding and the MIP modes after the nth MIP mode use truncated binary coding.
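For reference, a truncated binary code for an alphabet of `n` symbols can be sketched as below. This is the standard textbook construction; how the document maps the MIP modes after the nth mode onto such a code (alphabet size, index offset) is not specified here, so the function name and the parameters are illustrative assumptions.

```python
def truncated_binary(x, n):
    """Return the truncated binary codeword (as a bit string) of x in [0, n)."""
    assert 0 <= x < n
    k = n.bit_length() - 1      # floor(log2(n))
    u = (1 << (k + 1)) - n      # number of shorter, k-bit codewords
    if x < u:
        return format(x, "0{}b".format(k)) if k > 0 else ""
    return format(x + u, "0{}b".format(k + 1))


if __name__ == "__main__":
    # With 6 symbols, the first two get 2-bit codewords and the rest 3-bit codewords.
    print([truncated_binary(i, 6) for i in range(6)])
```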
- the prediction unit 520 is specifically used to:
- the optimal MIP mode is determined as the intra prediction mode of the current block.
- the prediction unit 520 is specifically used to:
- predict the current block based on the template-based intra mode derivation (TIMD) mode to obtain the second prediction block;
- the prediction mode with the smallest distortion cost among the optimal MIP mode and the TIMD mode is determined as the intra prediction mode of the current block.
- the prediction unit 520 is specifically used to:
- the optimal MIP mode is determined based on the distortion costs of the multiple MIP modes in each state of the third identifier.
- the prediction unit 520 is specifically used to:
- the optimal MIP mode is determined based on the distortion costs of the multiple MIP modes.
- the prediction unit 520 is specifically used to:
- the optimal MIP mode is determined based on the distortion costs of the multiple MIP modes.
- the prediction unit 520 is specifically used to:
- the optimal MIP mode is determined based on the distortion costs of the multiple MIP modes.
- before determining the optimal MIP mode for predicting the current block based on the distortion costs of multiple matrix-based intra prediction (MIP) modes, the prediction unit 520 is also used to:
- the MIP modes used by the adjacent blocks are determined as the multiple MIP modes.
- before determining the optimal MIP mode for predicting the current block based on the distortion costs of multiple matrix-based intra prediction (MIP) modes, the prediction unit 520 is also used to:
- Distortion costs for the plurality of MIP modes are determined based on the plurality of prediction blocks and reconstruction blocks within the template region.
- the prediction unit 520 is specifically used to:
- the output vectors of the multiple MIP modes are upsampled to obtain prediction blocks corresponding to the multiple MIP modes.
- the prediction unit 520 is specifically used to:
- the optimal MIP mode is determined based on the sum of absolute transformed differences (SATD) of the multiple MIP modes on the template region.
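As an illustration of the SATD measure, the sketch below computes it for a single 4x4 tile using a 4x4 Hadamard transform. The tile size, the absence of normalization and the function names are assumptions; a template region would typically be covered by several such tiles whose costs are summed.

```python
# 4x4 Hadamard matrix (symmetric, so it equals its own transpose).
H4 = [
    [1, 1, 1, 1],
    [1, -1, 1, -1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def satd4x4(pred, recon):
    """Sum of absolute Hadamard-transformed differences of two 4x4 tiles (unnormalized)."""
    diff = [[p - r for p, r in zip(pr, rr)] for pr, rr in zip(pred, recon)]
    transformed = matmul(matmul(H4, diff), H4)
    return sum(abs(v) for row in transformed for v in row)


if __name__ == "__main__":
    recon = [[10] * 4 for _ in range(4)]
    pred = [[11] * 4 for _ in range(4)]
    print(satd4x4(pred, recon))
```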
- Figure 8 is a schematic block diagram of the encoder 600 according to the embodiment of the present application.
- the encoder 600 may include:
- the prediction unit 610 is used to:
- the distortion cost of the multiple MIP modes includes the distortion cost obtained by using the multiple MIP modes to predict samples in the template area adjacent to the current block;
- the residual unit 620 is configured to obtain a residual block of the current block based on the prediction block of the current block and the original block of the current block;
- the encoding unit 630 is used to encode the residual block of the current block to obtain the code stream of the current sequence.
- the prediction unit 610 is specifically used to:
- the optimal MIP mode is determined based on the distortion costs of the multiple MIP modes
- the encoding unit 630 is specifically used for:
- the prediction unit 610 is specifically used to:
- determine the optimal prediction mode based on the optimal MIP mode;
- determine the optimal prediction mode as the intra prediction mode of the current block.
- the encoding unit 630 is specifically used for:
- the second identifier is used to identify that the optimal MIP mode is allowed to be used to predict the current block; if the first rate distortion cost is greater than the minimum value of the at least one rate distortion cost, the second identifier is used to identify that the optimal MIP mode is not allowed to be used to predict the current block.
- the prediction unit 610 is specifically used to:
- the optimal MIP mode is determined as the optimal prediction mode.
- the prediction unit 610 is specifically used to:
- predict the current block based on the template-based intra mode derivation (TIMD) mode to obtain the second prediction block;
- the prediction mode with the smallest distortion cost among the optimal MIP mode and the TIMD mode is determined as the optimal prediction mode.
- the encoding unit 630 is specifically used to:
- the code stream is obtained by encoding the residual block of the current block and encoding the index of the optimal MIP mode based on the encoding method used in the optimal MIP mode.
- the codeword length of the coding scheme used by the first n MIP modes in the arrangement order is smaller than the codeword length of the coding scheme used by the MIP modes after the nth MIP mode in the arrangement order; and/or, the first n MIP modes use variable-length coding and the MIP modes after the nth MIP mode use truncated binary coding.
- the prediction unit 610 is specifically used to:
- the optimal MIP mode is determined as the intra prediction mode of the current block.
- the prediction unit 610 is specifically used to:
- the optimal MIP mode is determined based on the distortion costs of the multiple MIP modes in each state of the third identifier.
- the prediction unit 610 is specifically used to:
- the optimal MIP mode is determined based on the distortion costs of the multiple MIP modes.
- the prediction unit 610 is specifically used to:
- the optimal MIP mode is determined based on the distortion costs of the multiple MIP modes.
- the prediction unit 610 is specifically used to:
- the optimal MIP mode is determined based on the distortion costs of the multiple MIP modes.
- before determining the optimal MIP mode for predicting the current block in the current sequence based on the distortion costs of multiple matrix-based intra prediction (MIP) modes, the prediction unit 610 is also used to:
- the MIP modes used by the adjacent blocks are determined as the multiple MIP modes.
- before determining the optimal MIP mode for predicting the current block in the current sequence based on the distortion costs of multiple matrix-based intra prediction (MIP) modes, the prediction unit 610 is also used to:
- Distortion costs for the plurality of MIP modes are determined based on the plurality of prediction blocks and reconstruction blocks within the template region.
- the prediction unit 610 is specifically used to:
- the output vectors of the multiple MIP modes are upsampled to obtain prediction blocks corresponding to the multiple MIP modes.
- the prediction unit 610 is specifically used to:
- the optimal MIP mode is determined based on the sum of absolute transformed differences (SATD) of the multiple MIP modes on the template region.
- the device embodiments and the method embodiments may correspond to each other, and similar descriptions may refer to the method embodiments. To avoid repetition, they will not be repeated here.
- the decoder 500 shown in Figure 7 may correspond to the corresponding subject performing the method 300 of the embodiment of the present application, and the foregoing and other operations and/or functions of each unit in the decoder 500 are respectively intended to implement the corresponding processes in the method 300.
- the encoder 600 shown in Figure 8 may correspond to the corresponding subject performing the method 400 of the embodiment of the present application; that is, the foregoing and other operations and/or functions of each unit in the encoder 600 are respectively intended to implement the corresponding processes in the method 400.
- each unit in the decoder 500 or encoder 600 involved in the embodiments of the present application can be separately or entirely combined into one or several other units, or one or more of the units can be further split into multiple functionally smaller units; this achieves the same operation without affecting the realization of the technical effects of the embodiments of the present application.
- the above units are divided based on logical functions. In practical applications, the function of one unit can also be realized by multiple units, or the functions of multiple units can be realized by one unit. In other embodiments of the present application, the decoder 500 or the encoder 600 may also include other units. In practical applications, these functions may also be implemented with the assistance of other units, and may be implemented by multiple units in cooperation.
- the decoder 500 or encoder 600 involved in the embodiments of the present application can be constructed, and the encoding method or decoding method of the embodiments of the present application can be implemented, by running a computer program capable of executing each step of the corresponding method on a general-purpose computing device, such as a general-purpose computer including processing elements and storage elements such as a central processing unit (CPU), a random access storage medium (RAM) and a read-only storage medium (ROM).
- the computer program can be recorded on, for example, a computer-readable storage medium, loaded into an electronic device through the computer-readable storage medium, and run therein to implement the corresponding methods of the embodiments of the present application.
- the units mentioned above can be implemented in the form of hardware, can also be implemented in the form of instructions in the form of software, or can be implemented in the form of a combination of software and hardware.
- each step of the method embodiments in the embodiments of the present application can be completed by integrated logic circuits of hardware in the processor and/or instructions in the form of software.
- the steps of the methods disclosed in conjunction with the embodiments of the present application can be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software in a decoding processor.
- the software can be located in a mature storage medium in this field such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, register, etc.
- the storage medium is located in the memory, and the processor reads the information in the memory and completes the steps in the above method embodiment in combination with its hardware.
- FIG. 9 is a schematic structural diagram of an electronic device 700 provided by an embodiment of the present application.
- the electronic device 700 at least includes a processor 710 and a computer-readable storage medium 720 .
- the processor 710 and the computer-readable storage medium 720 may be connected through a bus or other means.
- the computer-readable storage medium 720 is used to store a computer program 721
- the computer program 721 includes computer instructions
- the processor 710 is used to execute the computer instructions stored in the computer-readable storage medium 720.
- the processor 710 is the computing core and the control core of the electronic device 700. It is suitable for implementing one or more computer instructions. Specifically, it is suitable for loading and executing one or more computer instructions to implement the corresponding method flow or corresponding functions.
- the processor 710 may also be called a central processing unit (Central Processing Unit, CPU).
- the processor 710 may include, but is not limited to: a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.
- the computer-readable storage medium 720 can be a high-speed RAM memory, or a non-volatile memory (Non-Volatile Memory), such as at least one disk memory; optionally, it can also be at least one computer-readable storage medium located remotely from the aforementioned processor 710.
- the computer-readable storage medium 720 includes, but is not limited to: volatile memory and/or non-volatile memory.
- non-volatile memory can be read-only memory (Read-Only Memory, ROM), programmable read-only memory (Programmable ROM, PROM), erasable programmable read-only memory (Erasable PROM, EPROM), or electrically erasable programmable read-only memory.
- Volatile memory may be Random Access Memory (RAM), which is used as an external cache.
- many forms of RAM are available, for example: static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (Enhanced SDRAM, ESDRAM), synchronous link dynamic random access memory (SLDRAM), and Direct Rambus RAM.
- the electronic device 700 may be the encoder or coding framework involved in the embodiment of the present application; the computer-readable storage medium 720 stores first computer instructions; the processor 710 loads and executes the first computer instructions stored in the computer-readable storage medium 720 to implement the corresponding steps in the encoding method provided by the embodiment of the present application; in other words, the first computer instructions in the computer-readable storage medium 720 are loaded by the processor 710 to execute the corresponding steps. To avoid repetition, they will not be repeated here.
- the electronic device 700 may be the decoder or decoding framework involved in the embodiment of the present application; the computer-readable storage medium 720 stores second computer instructions; the processor 710 loads and executes the second computer instructions stored in the computer-readable storage medium 720 to implement the corresponding steps in the decoding method provided by the embodiment of the present application; in other words, the second computer instructions in the computer-readable storage medium 720 are loaded by the processor 710 to execute the corresponding steps. To avoid repetition, they will not be repeated here.
- embodiments of the present application also provide a coding and decoding system, including the above-mentioned encoder and decoder.
- embodiments of the present application also provide a computer-readable storage medium (Memory).
- the computer-readable storage medium is a memory device in the electronic device 700 and is used to store programs and data.
- the computer-readable storage medium 720 may include a built-in storage medium in the electronic device 700, and of course may also include an extended storage medium supported by the electronic device 700.
- the computer-readable storage medium provides storage space that stores the operating system of the electronic device 700 .
- one or more computer instructions suitable for being loaded and executed by the processor 710 are also stored in the storage space. These computer instructions may be one or more computer programs 721 (including program codes).
- a computer program product or computer program is provided, the computer program product or computer program including computer instructions stored in a computer-readable storage medium.
- the electronic device 700 can be a computer.
- the processor 710 reads the computer instructions from the computer-readable storage medium 720.
- the processor 710 executes the computer instructions, so that the computer executes the encoding method or decoding method provided in the above various optional ways.
- the computer program product includes one or more computer instructions.
- the computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable device.
- the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; e.g., the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center via wired (such as coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (such as infrared, radio, microwave, etc.) methods.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Computing Systems (AREA)
- Theoretical Computer Science (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Claims (36)
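- A decoding method, characterized in that the method is applicable to a decoder and comprises: parsing a code stream to obtain a residual block of a current block in a current sequence; determining, based on distortion costs of multiple matrix-based intra prediction (MIP) modes, an optimal MIP mode for predicting the current block, wherein the distortion costs of the multiple MIP modes include distortion costs obtained by using the multiple MIP modes to predict samples in a template region adjacent to the current block; determining an intra prediction mode of the current block based on the optimal MIP mode; predicting the current block based on the intra prediction mode of the current block to obtain a prediction block of the current block; and obtaining a reconstructed block of the current block based on the residual block of the current block and the prediction block of the current block.
- The method according to claim 1, characterized in that determining, based on the distortion costs of the multiple matrix-based intra prediction MIP modes, the optimal MIP mode for predicting the current block comprises: parsing the code stream of the current sequence to obtain a first identifier; and if the first identifier is used to identify that the optimal MIP mode is allowed to be used to predict image blocks in the current sequence, determining the optimal MIP mode based on the distortion costs of the multiple MIP modes.
- The method according to claim 2, characterized in that, if the first identifier is used to identify that the optimal MIP mode is allowed to be used to predict image blocks in the current sequence, determining the optimal MIP mode based on the distortion costs of the multiple MIP modes comprises: if the first identifier is used to identify that the optimal MIP mode is allowed to be used to predict image blocks in the current sequence, parsing the code stream to obtain a second identifier; and if the second identifier is used to identify that the optimal MIP mode is allowed to be used to predict the current block, determining the optimal MIP mode based on the distortion costs of the multiple MIP modes.
- The method according to claim 1, characterized in that the method further comprises: determining an arrangement order of the multiple MIP modes based on the distortion costs of the MIP modes; determining, based on the arrangement order of the multiple MIP modes, the coding scheme used by the optimal MIP mode; and decoding the code stream of the current sequence based on the coding scheme used by the optimal MIP mode to obtain an index of the optimal MIP mode.
- The method according to claim 4, characterized in that the codeword length of the coding scheme used by the first n MIP modes in the arrangement order is smaller than the codeword length of the coding scheme used by the MIP modes after the nth MIP mode in the arrangement order; and/or, the first n MIP modes use variable-length coding and the MIP modes after the nth MIP mode use truncated binary coding.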
- 根据权利要求1至5中任一项所述的方法,其特征在于,所述基于所述最优MIP模式,确定所述当前块的帧内预测模式,包括:将所述最优MIP模式确定为所述当前块的帧内预测模式。
- 根据权利要求1至5中任一项所述的方法,其特征在于,所述基于所述最优MIP模式,确定所述当前块的帧内预测模式,包括:基于所述最优MIP模式对所述当前块进行预测,得到第一预测块;基于模板的帧内模式导出TIMD模式对所述当前块进行预测,得到第二预测块;基于所述第一预测块的失真代价和所述第二预测块的失真代价,将所述最优MIP模式和所述TIMD模式中失真代价最小的预测模式,确定为所述当前块的帧内预测模式。
- 根据权利要求1至7中任一项所述的方法,其特征在于,所述基于多个基于矩阵的帧内预测MIP模式的失真代价,确定用于预测所述当前块的最优MIP模式,包括:基于第三标识和所述多个MIP模式对所述模板区域内的样本进行预测,得到所述第三标识的每一个状态下所述多个MIP模式的失真代价;所述第三标识的用于标识是否转置MIP模式的输入向量和输出向量;基于所述第三标识的每一个状态下所述多个MIP模式的失真代价,确定所述最优MIP模式。
- 根据权利要求1至8中任一项所述的方法,其特征在于,所述基于多个基于矩阵的帧内预测MIP模式的失真代价,确定用于预测所述当前块的最优MIP模式,包括:若所述当前块的尺寸为预设尺寸,则基于所述多个MIP模式的失真代价,确定所述最优MIP模式。
- 根据权利要求9所述的方法,其特征在于,所述若所述当前块的尺寸为预设尺寸,则基于所述多个MIP模式的失真代价,确定所述最优MIP模式,包括:若所述当前块所在的图像帧为I帧、且所述当前块的尺寸为所述预设尺寸,则基于所述多个MIP模式的失真代价,确定所述最优MIP模式。
- 根据权利要求1至8中任一项所述的方法,其特征在于,所述基于多个基于矩阵的帧内预测 MIP模式的失真代价,确定用于预测所述当前块的最优MIP模式,包括:若所述当前块所在的图像帧为B帧,则基于所述多个MIP模式的失真代价,确定所述最优MIP模式。
- 根据权利要求1至11中任一项所述的方法,其特征在于,所述基于多个基于矩阵的帧内预测MIP模式的失真代价,确定用于预测所述当前块的最优MIP模式之前,所述方法还包括:获取与所述当前块相邻的相邻块使用的MIP模式;将所述相邻块使用的MIP模式,确定为所述多个MIP模式。
- 根据权利要求1至12中任一项所述的方法,其特征在于,所述基于多个基于矩阵的帧内预测MIP模式的失真代价,确定用于预测所述当前块的最优MIP模式之前,所述方法还包括:对所述模板区域外部相邻的参考区域进行重建样本填充,得到所述模板区域的参考行和参考列;以所述参考行和所述参考列为输入,利用所述多个MIP模式分别对所述模板区域内的样本进行预测,得到所述多个MIP模式对应的多个预测块;基于所述多个预测块和所述模板区域内的重建块,确定所述多个MIP模式的失真代价。
- 根据权利要求13所述的方法,其特征在于,所述以所述参考行和所述参考列为输入,利用所述多个MIP模式分别对所述模板区域内的样本进行预测,得到所述多个MIP模式对应的多个预测块,包括:对所述参考行和所述参考列进行下采样,得到输入向量;以所述输入向量为输入,通过遍历所述多个MIP模式的方式对所述模板区域内的样本进行预测,得到所述多个MIP模式的输出向量;对所述多个MIP模式的输出向量进行上采样,得到所述多个MIP模式对应的预测块。
- 根据权利要求1至14中任一项所述的方法,其特征在于,所述基于多个矩阵的帧内预测MIP模式在模板区域上的失真代价,确定所述多个MIP模式中的最优MIP模式,包括:基于所述多个MIP模式在所述模板区域上的绝对变换差的和SATD,确定所述最优MIP模式。
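- An encoding method, wherein the method is applicable to an encoder and comprises: determining, based on distortion costs of a plurality of matrix-based intra prediction (MIP) modes, an optimal MIP mode for predicting a current block in a current sequence, wherein the distortion costs of the plurality of MIP modes comprise distortion costs obtained by predicting samples in a template region adjacent to the current block using the plurality of MIP modes; determining an intra prediction mode of the current block based on the optimal MIP mode; predicting the current block based on the intra prediction mode of the current block to obtain a prediction block of the current block; obtaining a residual block of the current block based on the prediction block of the current block and an original block of the current block; and encoding the residual block of the current block to obtain a bitstream of the current sequence.
- The method according to claim 16, wherein determining, based on the distortion costs of the plurality of matrix-based intra prediction (MIP) modes, the optimal MIP mode for predicting the current block in the current sequence comprises: obtaining a first flag; and if the first flag indicates that the optimal MIP mode is allowed to be used to predict picture blocks in the current sequence, determining the optimal MIP mode based on the distortion costs of the plurality of MIP modes; wherein encoding the residual block of the current block to obtain the bitstream of the current sequence comprises: encoding the residual block of the current block and the first flag to obtain the bitstream.
- The method according to claim 17, wherein determining the intra prediction mode of the current block based on the optimal MIP mode comprises: if the first flag indicates that the optimal MIP mode is allowed to be used to predict picture blocks in the current sequence, determining an optimal prediction mode based on the optimal MIP mode; predicting the current block based on the optimal prediction mode to obtain a first rate-distortion cost; predicting the current block based on at least one intra prediction mode to obtain at least one rate-distortion cost; and if the first rate-distortion cost is less than or equal to the minimum of the at least one rate-distortion cost, determining the optimal prediction mode as the intra prediction mode of the current block; wherein encoding the residual block of the current block and the first flag to obtain the bitstream comprises: encoding the residual block of the current block, the first flag and a second flag to obtain the bitstream; wherein, if the first rate-distortion cost is less than or equal to the minimum of the at least one rate-distortion cost, the second flag indicates that the optimal MIP mode is allowed to be used to predict the current block; and if the first rate-distortion cost is greater than the minimum of the at least one rate-distortion cost, the second flag indicates that the optimal MIP mode is not allowed to be used to predict the current block.
- The method according to claim 18, wherein determining the optimal prediction mode based on the optimal MIP mode comprises: determining the optimal MIP mode as the optimal prediction mode.
- The method according to claim 18, wherein determining the optimal prediction mode based on the optimal MIP mode comprises: predicting the current block based on the optimal MIP mode to obtain a first prediction block; predicting the current block based on a template-based intra mode derivation (TIMD) mode to obtain a second prediction block; and determining, based on a distortion cost of the first prediction block and a distortion cost of the second prediction block, whichever of the optimal MIP mode and the TIMD mode has the smaller distortion cost as the optimal prediction mode.
- The method according to claim 16, wherein encoding the residual block of the current block to obtain the bitstream of the current sequence comprises: determining an arrangement order of the plurality of MIP modes based on the distortion costs of the MIP modes; determining, based on the arrangement order of the plurality of MIP modes, a coding scheme used by the optimal MIP mode; and encoding the residual block of the current block and encoding an index of the optimal MIP mode based on the coding scheme used by the optimal MIP mode, to obtain the bitstream.
- The method according to claim 21, wherein a codeword length of the coding scheme used by the first n MIP modes in the arrangement order is smaller than a codeword length of the coding scheme used by the MIP modes after the n-th MIP mode in the arrangement order; and/or the first n MIP modes use a variable-length coding scheme and the MIP modes after the n-th MIP mode use a truncated binary coding scheme.
- The method according to any one of claims 16 to 22, wherein determining, based on the distortion costs of the plurality of matrix-based intra prediction (MIP) modes, the optimal MIP mode for predicting the current block in the current sequence comprises: predicting the samples in the template region based on a third flag and the plurality of MIP modes to obtain the distortion costs of the plurality of MIP modes in each state of the third flag, the third flag indicating whether to transpose an input vector and an output vector of an MIP mode; and determining the optimal MIP mode based on the distortion costs of the plurality of MIP modes in each state of the third flag.
- The method according to any one of claims 16 to 23, wherein determining, based on the distortion costs of the plurality of matrix-based intra prediction (MIP) modes, the optimal MIP mode for predicting the current block in the current sequence comprises: if a size of the current block is a preset size, determining the optimal MIP mode based on the distortion costs of the plurality of MIP modes.
- The method according to claim 24, wherein determining the optimal MIP mode based on the distortion costs of the plurality of MIP modes if the size of the current block is the preset size comprises: if the picture frame in which the current block is located is an I-frame and the size of the current block is the preset size, determining the optimal MIP mode based on the distortion costs of the plurality of MIP modes.
- The method according to any one of claims 16 to 23, wherein determining, based on the distortion costs of the plurality of matrix-based intra prediction (MIP) modes, the optimal MIP mode for predicting the current block in the current sequence comprises: if the picture frame in which the current block is located is a B-frame, determining the optimal MIP mode based on the distortion costs of the plurality of MIP modes.
- The method according to any one of claims 16 to 26, wherein before determining, based on the distortion costs of the plurality of matrix-based intra prediction (MIP) modes, the optimal MIP mode for predicting the current block in the current sequence, the method further comprises: obtaining MIP modes used by neighboring blocks adjacent to the current block; and determining the MIP modes used by the neighboring blocks as the plurality of MIP modes.
- The method according to any one of claims 16 to 27, wherein before determining, based on the distortion costs of the plurality of matrix-based intra prediction (MIP) modes, the optimal MIP mode for predicting the current block in the current sequence, the method further comprises: performing reconstructed-sample padding on a reference region adjacent to the outside of the template region to obtain a reference row and a reference column of the template region; predicting, with the reference row and the reference column as input, the samples in the template region using each of the plurality of MIP modes to obtain a plurality of prediction blocks corresponding to the plurality of MIP modes; and determining the distortion costs of the plurality of MIP modes based on the plurality of prediction blocks and the reconstructed block in the template region.
- The method according to claim 28, wherein predicting, with the reference row and the reference column as input, the samples in the template region using each of the plurality of MIP modes to obtain the plurality of prediction blocks corresponding to the plurality of MIP modes comprises: downsampling the reference row and the reference column to obtain an input vector; predicting, with the input vector as input, the samples in the template region by traversing the plurality of MIP modes to obtain output vectors of the plurality of MIP modes; and upsampling the output vectors of the plurality of MIP modes to obtain the prediction blocks corresponding to the plurality of MIP modes.
- The method according to any one of claims 16 to 29, wherein determining the optimal MIP mode among the plurality of MIP modes based on the distortion costs of the plurality of matrix-based intra prediction (MIP) modes on the template region comprises: determining the optimal MIP mode based on the sum of absolute transformed differences (SATD) of the plurality of MIP modes on the template region.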
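- A decoder, comprising: a parsing unit configured to parse a bitstream to obtain a residual block of a current block in a current sequence; a prediction unit configured to: determine, based on distortion costs of a plurality of matrix-based intra prediction (MIP) modes, an optimal MIP mode for predicting the current block, wherein the distortion costs of the plurality of MIP modes comprise distortion costs obtained by predicting samples in a template region adjacent to the current block using the plurality of MIP modes; determine an intra prediction mode of the current block based on the optimal MIP mode; and predict the current block based on the intra prediction mode of the current block to obtain a prediction block of the current block; and a reconstruction unit configured to obtain a reconstructed block of the current block based on the residual block of the current block and the prediction block of the current block.
- An encoder, comprising: a prediction unit configured to: determine, based on distortion costs of a plurality of matrix-based intra prediction (MIP) modes, an optimal MIP mode for predicting a current block in a current sequence, wherein the distortion costs of the plurality of MIP modes comprise distortion costs obtained by predicting samples in a template region adjacent to the current block using the plurality of MIP modes; determine an intra prediction mode of the current block based on the optimal MIP mode; and predict the current block based on the intra prediction mode of the current block to obtain a prediction block of the current block; a residual unit configured to obtain a residual block of the current block based on the prediction block of the current block and an original block of the current block; and an encoding unit configured to encode the residual block of the current block to obtain a bitstream of the current sequence.
- An electronic device, comprising: a processor adapted to execute a computer program; and a computer-readable storage medium storing a computer program which, when executed by the processor, implements the method according to any one of claims 1 to 15 or the method according to any one of claims 16 to 30.
- A computer-readable storage medium for storing a computer program, wherein the computer program causes a computer to perform the method according to any one of claims 1 to 15 or the method according to any one of claims 16 to 30.
- A computer program product, comprising a computer program/instructions, wherein the computer program/instructions, when executed by a processor, implement the method according to any one of claims 1 to 15 or the method according to any one of claims 16 to 30.
- A bitstream, wherein the bitstream is the bitstream in the method according to any one of claims 1 to 15 or a bitstream generated by the method according to any one of claims 16 to 30.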
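The template-based selection recited in the claims (predict the template region with every candidate MIP mode, score each prediction against the reconstructed template samples by SATD, keep the cheapest mode) can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the patent: the template is a single 4x4 block, the per-mode predictions are passed in directly rather than computed from MIP matrices, ties are broken by the lower mode index, and the names `satd4x4` and `select_best_mode` are hypothetical.

```python
from typing import Dict, List

Block = List[List[int]]

# Unnormalised 4x4 Hadamard matrix; it is symmetric, so H4 equals its transpose.
H4 = [
    [1, 1, 1, 1],
    [1, -1, 1, -1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
]


def _matmul4(a: Block, b: Block) -> Block:
    """Plain 4x4 integer matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def satd4x4(recon: Block, pred: Block) -> int:
    """Sum of absolute transformed differences of one 4x4 block."""
    diff = [[recon[i][j] - pred[i][j] for j in range(4)] for i in range(4)]
    transformed = _matmul4(_matmul4(H4, diff), H4)
    return sum(abs(v) for row in transformed for v in row)


def select_best_mode(template_recon: Block,
                     template_preds: Dict[int, Block]) -> int:
    """Return the mode whose template prediction has the lowest SATD.

    template_preds maps a (hypothetical) MIP mode index to the 4x4
    prediction that mode produced for the template region.
    Ties are resolved in favour of the lower mode index (an assumption).
    """
    costs = {m: satd4x4(template_recon, p) for m, p in template_preds.items()}
    return min(costs, key=lambda m: (costs[m], m))


if __name__ == "__main__":
    recon = [[10] * 4 for _ in range(4)]
    preds = {
        0: [[12] * 4 for _ in range(4)],         # uniformly off by 2
        1: [[11, 10, 10, 10]] + [[10] * 4] * 3,  # one sample off by 1
        2: [[10] * 4 for _ in range(4)],         # exact match
    }
    print(select_best_mode(recon, preds))
```

Because the template consists of already reconstructed samples, the same ranking can in principle be reproduced on the decoder side without reading the mode from the bitstream, which is the point of scoring the template rather than the current block.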
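The claims on index coding state that MIP modes are ordered by their distortion costs, that the first n modes in that order receive shorter codewords, and that the remaining modes may use truncated binary coding. The sketch below shows one way such a scheme could look. The concrete binarization is an assumption made for illustration only: the first `n_short` ranks get a unary-style code (`rank` ones followed by a zero), and later ranks get an escape prefix of `n_short` ones followed by a standard truncated binary code. The function names are hypothetical and not taken from the patent.

```python
from typing import Dict, List


def truncated_binary(x: int, n: int) -> str:
    """Standard truncated binary code of x for an alphabet of size n."""
    if not 0 <= x < n:
        raise ValueError("x must satisfy 0 <= x < n")
    if n == 1:
        return ""                    # a single symbol needs no bits
    k = n.bit_length() - 1           # floor(log2(n))
    u = (1 << (k + 1)) - n           # number of symbols that get k bits
    if x < u:
        return format(x, f"0{k}b")
    return format(x + u, f"0{k + 1}b")


def rank_modes(costs: Dict[int, int]) -> List[int]:
    """Order mode indices by ascending cost; ties go to the lower index."""
    return sorted(costs, key=lambda m: (costs[m], m))


def mode_codeword(rank: int, num_modes: int, n_short: int) -> str:
    """Codeword for the mode at position `rank` in the cost order.

    Assumed scheme: unary-style codes for the first n_short ranks,
    escape prefix plus truncated binary for all later ranks.
    """
    if rank < n_short:
        return "1" * rank + "0"
    return "1" * n_short + truncated_binary(rank - n_short,
                                            num_modes - n_short)


def assign_codewords(costs: Dict[int, int], n_short: int) -> Dict[int, str]:
    """Map every mode index to its codeword under the assumed scheme."""
    order = rank_modes(costs)
    return {mode: mode_codeword(rank, len(order), n_short)
            for rank, mode in enumerate(order)}


if __name__ == "__main__":
    template_costs = {0: 40, 1: 5, 2: 17, 3: 9, 4: 30, 5: 22}
    for mode, code in assign_codewords(template_costs, n_short=2).items():
        print(mode, code)
```

Since encoder and decoder can both derive the same cost order from the template, they agree on the codeword table without extra signalling; the modes most likely to be chosen then cost the fewest bits.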
Priority Applications (6)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2022/085898 WO2023193254A1 (zh) | 2022-04-08 | 2022-04-08 | Decoding method, encoding method, decoder and encoder |
| JP2024558335A JP2025511315A (ja) | 2022-04-08 | 2022-04-08 | Decoding method, encoding method, decoder and encoder |
| EP22936192.8A EP4507295A4 (en) | 2022-04-08 | 2022-04-08 | Decoding method, coding method, decoder and encoder |
| CN202280094572.3A CN119032562A (zh) | 2022-04-08 | 2022-04-08 | Decoding method, encoding method, decoder and encoder |
| US18/898,188 US20250024027A1 (en) | 2022-04-08 | 2024-09-26 | Decoding method, encoding method, and storage medium |
| MX2024012272A MX2024012272A (es) | 2022-04-08 | 2024-10-03 | Decoding method, encoding method, decoder and encoder |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2022/085898 WO2023193254A1 (zh) | 2022-04-08 | 2022-04-08 | Decoding method, encoding method, decoder and encoder |
Related Child Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US18/898,188 Continuation US20250024027A1 (en) | 2022-04-08 | 2024-09-26 | Decoding method, encoding method, and storage medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2023193254A1 (zh) | 2023-10-12 |
Family
ID=88243934
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2022/085898 Ceased WO2023193254A1 (zh) | Decoding method, encoding method, decoder and encoder | 2022-04-08 | 2022-04-08 |
Country Status (6)
| Country | Link |
|---|---|
| US (1) | US20250024027A1 (zh) |
| EP (1) | EP4507295A4 (zh) |
| JP (1) | JP2025511315A (zh) |
| CN (1) | CN119032562A (zh) |
| MX (1) | MX2024012272A (zh) |
| WO (1) | WO2023193254A1 (zh) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2025213371A1 (zh) * | 2024-04-09 | 2025-10-16 | Oppo广东移动通信有限公司 | Encoding and decoding method, bitstream, encoder, decoder and storage medium |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN118985131A (zh) * | 2022-04-08 | 2024-11-19 | Oppo广东移动通信有限公司 | Decoding method, encoding method, decoder and encoder |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
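| WO2021004153A1 (zh) * | 2019-07-07 | 2021-01-14 | Oppo广东移动通信有限公司 | Picture prediction method, encoder, decoder and storage medium |
| CN113940065A (zh) * | 2019-06-24 | 2022-01-14 | 佳能株式会社 | Method, apparatus and system for encoding and decoding a block of video samples |
| CN113950832A (zh) * | 2019-06-03 | 2022-01-18 | Lg电子株式会社 | Matrix-based intra prediction apparatus and method |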
Family Cites Families (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2018070790A1 (ko) * | 2016-10-14 | 2018-04-19 | 세종대학교 산학협력단 | Image encoding/decoding method and apparatus |
| WO2020127811A2 (en) * | 2018-12-20 | 2020-06-25 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Intra predictions using linear or affine transforms with neighbouring sample reduction |
| CN113647106B (zh) * | 2019-03-05 | 2024-08-13 | 弗劳恩霍夫应用研究促进协会 | Use-case driven context model selection for hybrid video coding tools |
| KR102845344B1 (ko) * | 2019-04-16 | 2025-08-11 | 엘지전자 주식회사 | Transform for matrix-based intra prediction in image coding |
| KR20250078609A (ko) * | 2019-05-08 | 2025-06-02 | 엘지전자 주식회사 | Image encoding/decoding method and apparatus performing MIP and LFNST, and method for transmitting a bitstream |
| CN114073081B (zh) * | 2019-06-25 | 2025-07-18 | 弗劳恩霍夫应用研究促进协会 | Coding using matrix-based intra prediction and secondary transforms |
| WO2021025478A1 (ko) * | 2019-08-06 | 2021-02-11 | 현대자동차주식회사 | Method and apparatus for intra prediction coding of video data |
| PH12022553231A1 (en) * | 2020-06-03 | 2024-02-12 | Nokia Technologies Oy | A method, an apparatus and a computer program product for video encoding and video decoding |
| US11582460B2 (en) * | 2021-01-13 | 2023-02-14 | Lemon Inc. | Techniques for decoding or coding images based on multiple intra-prediction modes |
| EP4454265A1 (en) * | 2021-12-21 | 2024-10-30 | InterDigital CE Patent Holdings, SAS | Most probable mode list generation with template-based intra mode derivation and decoder-side intra mode derivation |
| CN118985131A (zh) * | 2022-04-08 | 2024-11-19 | Oppo广东移动通信有限公司 | Decoding method, encoding method, decoder and encoder |
2022
- 2022-04-08 JP JP2024558335A patent/JP2025511315A/ja active Pending
- 2022-04-08 CN CN202280094572.3A patent/CN119032562A/zh active Pending
- 2022-04-08 EP EP22936192.8A patent/EP4507295A4/en active Pending
- 2022-04-08 WO PCT/CN2022/085898 patent/WO2023193254A1/zh not_active Ceased
2024
- 2024-09-26 US US18/898,188 patent/US20250024027A1/en active Pending
- 2024-10-03 MX MX2024012272A patent/MX2024012272A/es unknown
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP4507295A4 * |
Also Published As
| Publication number | Publication date |
|---|---|
| CN119032562A (zh) | 2024-11-26 |
| EP4507295A1 (en) | 2025-02-12 |
| JP2025511315A (ja) | 2025-04-15 |
| US20250024027A1 (en) | 2025-01-16 |
| MX2024012272A (es) | 2024-11-08 |
| EP4507295A4 (en) | 2026-02-11 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
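| CN114868386B (zh) | | Encoding method, decoding method, encoder, decoder and electronic device |
| WO2023193253A1 (zh) | | Decoding method, encoding method, decoder and encoder |
| WO2021185008A1 (zh) | | Encoding method, decoding method, encoder, decoder and electronic device |
| TW202404370A (zh) | | Decoding method, encoding method, decoder, encoder, electronic device, computer-readable storage medium, computer program product and bitstream |
| CN116686288A (zh) | | Encoding method, decoding method, encoder, decoder and electronic device |
| US20250024027A1 (en) | | Decoding method, encoding method, and storage medium |
| US12395627B2 (en) | | Intra prediction method and decoder |
| WO2023123398A1 (zh) | | Filtering method, filtering apparatus and electronic device |
| WO2024016156A1 (zh) | | Filtering method, encoder, decoder, bitstream and storage medium |
| WO2023197181A1 (zh) | | Decoding method, encoding method, decoder and encoder |
| WO2023197179A1 (zh) | | Decoding method, encoding method, decoder and encoder |
| RU2852150C2 (ru) | | Decoding method, encoding method, decoder and encoder |
| RU2859876C2 (ru) | | Decoding method, encoding method and non-volatile machine-readable storage medium |
| WO2025236218A1 (zh) | | Encoding and decoding method, bitstream, encoder, decoder and storage medium |
| WO2025213371A1 (zh) | | Encoding and decoding method, bitstream, encoder, decoder and storage medium |
| JP2026500444A (ja) | | Decoding method, encoding method, decoding device and encoding device |
| WO2025073085A1 (zh) | | Encoding and decoding method, codec and storage medium |
| WO2025213396A1 (zh) | | Encoding and decoding method, bitstream, encoder, decoder and storage medium |
| WO2025091378A1 (zh) | | Encoding and decoding method, codec and storage medium |
| WO2025213368A1 (zh) | | Encoding and decoding method, bitstream, encoder, decoder and storage medium |
| WO2024197744A9 (zh) | | Decoding method, encoding method, decoder and encoder |
| WO2025213370A1 (zh) | | Encoding and decoding method, bitstream, encoder, decoder and storage medium |
| WO2025213398A1 (zh) | | Encoding and decoding method, bitstream, encoder, decoder and storage medium |
| WO2024212086A1 (zh) | | Decoding method, encoding method, decoder and encoder |
| WO2025123197A1 (zh) | | Encoding and decoding method, codec, bitstream and storage medium |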
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 22936192; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | WIPO information: entry into national phase | Ref document number: 2024558335; Country of ref document: JP |
| | WWE | WIPO information: entry into national phase | Ref document number: MX/A/2024/012272; Country of ref document: MX |
| | WWE | WIPO information: entry into national phase | Ref document number: 202280094572.3; Country of ref document: CN |
| | WWE | WIPO information: entry into national phase | Ref document number: 202417084570; Country of ref document: IN |
| | WWE | WIPO information: entry into national phase | Ref document number: 2022936192; Country of ref document: EP |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | ENP | Entry into the national phase | Ref document number: 2022936192; Country of ref document: EP; Effective date: 20241108 |
