WO2024149384A1 - Regression-based coding modes - Google Patents

Regression-based coding modes

Info

Publication number
WO2024149384A1
Authority
WO
WIPO (PCT)
Prior art keywords
regression
weights
samples
prediction
bcw
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2024/072071
Other languages
English (en)
Inventor
Chih-Hsuan Lo
Cheng-Yen Chuang
Chen-Yen LAI
Tzu-Der Chuang
Ching-Yeh Chen
Chih-Wei Hsu
Yi-Wen Chen
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
MediaTek Inc
Original Assignee
MediaTek Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by MediaTek Inc
Publication of WO2024149384A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/577 Motion compensation with bidirectional frame interpolation, i.e. using B-pictures
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103 Selection of coding mode or of prediction mode
    • H04N19/105 Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/154 Measured or subjectively estimated visual quality after decoding, e.g. measurement of distortion
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/176 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock

Definitions

  • the present disclosure relates generally to video coding systems, and more particularly, to techniques of regression-based coding.
  • VVC Versatile video coding
  • JVET Joint Video Experts Team
  • MPEG ISO/IEC Moving Picture Experts Group
  • ISO/IEC 23090-3:2021
  • Information technology - Coded representation of immersive media - Part 3: Versatile video coding, published Feb. 2021.
  • VVC is developed based on its predecessor HEVC (High Efficiency Video Coding) by adding more coding tools to improve coding efficiency and also to handle various types of video sources including 3-dimensional (3D) video signals.
  • HEVC High Efficiency Video Coding
  • the apparatus may be a coder.
  • the coder employs a regression-based prediction model to determine blending weights for List 0 (L0) and List 1 (L1) reference frames.
  • the coder includes at least an L0 term associated with the L0 reference frame, an L1 term associated with the L1 reference frame, and a bias term in the regression-based prediction model.
  • the coder utilizes the determined blending weights to generate bi-predictive values for pixels in a coding unit (CU) based on values of corresponding pixels in L0 and L1 reference frames.
  • the coder derives the blending weights based on a Mean Squared Error (MSE) between the bi-predictive values and actual pixel values within the CU.
  • MSE Mean Squared Error
  • the apparatus may be a coder.
  • the coder receives video data that consists of multiple coding units (CUs).
  • CUs coding units
  • the coder derives a set of regression parameters, which includes weights for cross-prediction sample blending, by performing regression on template samples within a specific region.
  • the coder then applies this set of regression parameters to reference predictors in order to derive prediction samples for the CU.
  • the coder adjusts the set of regression parameters based on a prediction error metric.
  • the apparatus may be a UE.
  • the UE determines a resource allocation of a second physical downlink shared channel (PDSCH).
  • the second PDSCH is on a second time-frequency resource transmitted from a wireless device.
  • the second PDSCH carries data transmitted from a base station.
  • the UE receives the second PDSCH from the wireless device.
  • the second PDSCH is received on the second time-frequency resource according to the resource allocation.
  • the UE decodes the data carried in the second PDSCH.
  • PDSCH physical downlink shared channel
  • the one or more aspects comprise the features hereinafter fully described and particularly pointed out in the claims.
  • the following description and the annexed drawings set forth in detail certain illustrative features of the one or more aspects. These features are indicative, however, of but a few of the various ways in which the principles of various aspects may be employed, and this description is intended to include all such aspects and their equivalents.
  • FIGs. 1A and 1B illustrate an exemplary adaptive Inter/Intra video coding system incorporating loop processing.
  • FIG. 2 is a diagram illustrating a convolutional cross-component model (CCCM).
  • CCCM convolutional cross-component model
  • FIG. 3 is a diagram illustrating a reference area of a PU.
  • FIG. 4 is a diagram illustrating a technique of Bi-prediction with coding unit level weight (BCW).
  • FIG. 5 is a diagram illustrating one embodiment of a first approach of enhancement.
  • FIG. 6 is a diagram illustrating another embodiment of the first approach.
  • FIG. 7 is a diagram illustrating a first configuration of signaling for use with regression-based BCW mode.
  • FIG. 8 is a diagram illustrating a second configuration of signaling for use with regression-based BCW mode.
  • FIG. 9 is a diagram illustrating a third configuration of signaling for use with regression-based BCW mode.
  • FIG. 10 is a flow chart 1000 of a method (process) for coding.
  • FIG. 11 is a flow chart 1100 of another method (process) for coding.
  • FIG. 12 shows Top and left neighboring blocks used in CIIP weight derivation.
  • FIG. 13 shows a MHP coding flow.
  • FIG. 14 shows the detailed flow of the low complexity cost search steps.
  • FIG. 15 shows spatial parts of the convolutional filter.
  • FIG. 16 shows reference area (with its paddings) used to derive the filter coefficients.
  • FIG. 17 shows four Sobel based gradient patterns for GLM.
  • FIG. 18 shows non-downsampled luma samples.
  • FIG. 19 shows spatial samples used for GL-CCCM.
  • FIG. 20 shows various downsampling filters used in cross-component models.
  • FIG. 21 shows filter on samples of MM-CCLM/MM-CCCM.
  • FIG. 22 shows 67 intra prediction modes.
  • FIG. 23 shows reference samples for wide-angular intra prediction.
  • FIG. 24 shows problem of discontinuity in case of directions beyond 45°.
  • FIG. 25 shows DIMD chroma mode.
  • FIG. 26 shows intra template matching search area.
  • FIG. 27 shows use of IntraTMP block vector for IBC block.
  • FIG. 28 shows intra TMP fusion.
  • FIG. 29 shows the division method for angular modes.
  • FIG. 30 shows a regression model for the inter modes with multiple predictors.
  • FIG. 31 shows another regression model for the inter modes with multiple predictors.
  • FIG. 32 shows a regression model for the intra modes with multiple predictors.
  • FIG. 33 shows another regression model for the intra modes with multiple predictors.
  • processors include microprocessors, microcontrollers, graphics processing units (GPUs), central processing units (CPUs), application processors, digital signal processors (DSPs), reduced instruction set computing (RISC) processors, systems on a chip (SoC), baseband processors, field programmable gate arrays (FPGAs), programmable logic devices (PLDs), state machines, gated logic, discrete hardware circuits, and other suitable hardware configured to perform the various functionality described throughout this disclosure.
  • processors in the processing system may execute software.
  • Software shall be construed broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software components, applications, software applications, software packages, routines, subroutines, objects, executables, threads of execution, procedures, functions, etc., whether referred to as software, firmware, middleware, microcode, hardware description language, or otherwise.
  • the functions described may be implemented in hardware, software, or any combination thereof. If implemented in software, the functions may be stored on or encoded as one or more instructions or code on a computer-readable medium.
  • Computer-readable media includes computer storage media. Storage media may be any available media that can be accessed by a computer.
  • such computer-readable media can comprise a random-access memory (RAM), a read-only memory (ROM), an electrically erasable programmable ROM (EEPROM), optical disk storage, magnetic disk storage, other magnetic storage devices, combinations of the aforementioned types of computer-readable media, or any other medium that can be used to store computer executable code in the form of instructions or data structures that can be accessed by a computer.
  • RAM random-access memory
  • ROM read-only memory
  • EEPROM electrically erasable programmable ROM
  • FIG. 1A illustrates an exemplary adaptive Inter/Intra video coding system incorporating loop processing.
  • for Intra Prediction, the prediction data is derived based on previously coded video data in the current picture.
  • Motion Estimation (ME) is performed at the encoder side and Motion Compensation (MC) is performed based on the result of ME to provide prediction data derived from other picture(s) and motion data.
  • Switch 114 selects Intra Prediction 110 or Inter-Prediction 112 and the selected prediction data is supplied to Adder 116 to form prediction errors, also called residues.
  • the prediction error is then processed by Transform (T) 118 followed by Quantization (Q) 120.
  • T Transform
  • Q Quantization
  • the transformed and quantized residues are then coded by Entropy Encoder 122 to be included in a video bitstream corresponding to the compressed video data.
  • the bitstream associated with the transform coefficients is then packed with side information such as motion and coding modes associated with Intra prediction and Inter prediction, and other information such as parameters associated with loop filters applied to underlying image area.
  • the side information associated with Intra Prediction 110, Inter prediction 112 and in-loop filter 130 is provided to Entropy Encoder 122 as shown in FIG. 1A. When an Inter-prediction mode is used, a reference picture or pictures have to be reconstructed at the encoder end as well.
  • the transformed and quantized residues are processed by Inverse Quantization (IQ) 124 and Inverse Transformation (IT) 126 to recover the residues.
  • the residues are then added back to prediction data 136 at Reconstruction (REC) 128 to reconstruct video data.
  • the reconstructed video data may be stored in Reference Picture Buffer 134 and used for prediction of other frames.
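  • The encoder-side reconstruction path described above (residues formed at Adder 116, quantized, inverse-quantized, and added back at REC 128) can be sketched as follows. This is only an illustration under stated assumptions: the transform stage is omitted, a uniform scalar quantizer with step `qstep` stands in for T/Q and IQ/IT, and the function names `encode_block` and `reconstruct_block` are hypothetical.

```python
def encode_block(original, prediction, qstep):
    """Form residues (original minus prediction) and quantize them.

    Assumption: a uniform scalar quantizer replaces Transform + Quantization.
    """
    residues = [o - p for o, p in zip(original, prediction)]
    return [int(round(r / qstep)) for r in residues]


def reconstruct_block(levels, prediction, qstep):
    """Inverse-quantize the levels and add them back to the prediction (REC)."""
    return [p + lvl * qstep for p, lvl in zip(prediction, levels)]


original = [100, 104, 97, 91]
prediction = [96, 100, 100, 90]
levels = encode_block(original, prediction, qstep=4)
reconstructed = reconstruct_block(levels, prediction, qstep=4)
```

  • Because the decoder only sees `levels` and `prediction`, the encoder has to run the same reconstruction to keep its reference pictures identical to the decoder's, which is why the reconstructed (not the original) samples go into the Reference Picture Buffer.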
  • incoming video data undergoes a series of processing in the encoding system.
  • the reconstructed video data from REC 128 may be subject to various impairments due to a series of processing.
  • in-loop filter 130 is often applied to the reconstructed video data before the reconstructed video data are stored in the Reference Picture Buffer 134 in order to improve video quality.
  • deblocking filter (DF) may be used.
  • SAO Sample Adaptive Offset
  • ALF Adaptive Loop Filter
  • the loop filter information may need to be incorporated in the bitstream so that a decoder can properly recover the required information. Therefore, loop filter information is also provided to Entropy Encoder 122 for incorporation into the bitstream.
  • Loop filter 130 is applied to the reconstructed video before the reconstructed samples are stored in the reference picture buffer 134.
  • the system in FIG. 1A is intended to illustrate an exemplary structure of a typical video encoder. It may correspond to the High Efficiency Video Coding (HEVC) system, VP8, VP9, H.264 or VVC.
  • the decoder can use similar functional blocks, or a portion of the same functional blocks, as the encoder except for Transform 118 and Quantization 120, since the decoder only needs Inverse Quantization 124 and Inverse Transform 126.
  • the decoder uses an Entropy Decoder 140 to decode the video bitstream into quantized transform coefficients and needed coding information (e.g., ILPF information, Intra prediction information and Inter prediction information).
  • the Intra prediction 150 at the decoder side does not need to perform the mode search. Instead, the decoder only needs to generate Intra prediction according to Intra prediction information received from the Entropy Decoder 140.
  • the decoder only needs to perform motion compensation (MC 152) according to Inter prediction information received from the Entropy Decoder 140 without the need for motion estimation.
  • an input picture is partitioned into non-overlapped square block regions referred to as CTUs (Coding Tree Units), similar to HEVC.
  • CTUs Coding Tree Units
  • Each CTU can be partitioned into one or multiple smaller size coding units (CUs).
  • the resulting CU partitions can be in square or rectangular shapes.
  • VVC divides a CTU into prediction units (PUs) as a unit to apply prediction process, such as Inter prediction, Intra prediction, etc.
  • FIG. 2 is a diagram 200 illustrating a convolutional cross-component model (CCCM) utilized by both an encoder and a decoder in video processing to predict chroma samples from reconstructed luma samples.
  • CCLM Cross-Component Linear Model
  • CCCM offers the choice between a single model or a multi-model variant.
  • the multi-model variant employs two models: one derived for samples above the average luma reference value and another for the remainder of the samples, maintaining the essence of the CCLM design.
  • the multi-model CCCM mode is selectable for Prediction Units (PUs) with at least 128 reference samples available.
  • PUs Prediction Units
  • the CCCM’s convolutional filter is a 7-tap filter that includes a 5-tap spatial component shaped like a plus sign, a non-linear term, and a bias term.
  • the spatial component’s input includes a center luma sample (C), collocated with the chroma sample being predicted, and its surrounding north (N), south (S), west (W), and east (E) neighbors.
  • a center pixel 210 is surrounded by a north pixel 222 above, an east pixel 224 to the right, a south pixel 226 below, and a west pixel 228 to the left.
  • C, N, S, E, W represent the spatial luma sample values around the chroma sample that is to be predicted. More specifically, C is the luma sample value of the center pixel 210 whose chroma sample value is being predicted. N is the luma sample value of the north pixel 222. S is the luma sample value of the south pixel 226. E is the luma sample value of the east pixel 224. W is the luma sample value of the west pixel 228.
  • B, which is a bias term, is a scalar offset between input and output (similar to the offset term in CCLM) and is set to the median chroma value (512 for 10-bit content).
  • the filter coefficients ($c_i$) are determined by minimizing the Mean Squared Error (MSE) between the predicted and actual reconstructed chroma samples within the reference area.
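  • A minimal sketch of how the 7-tap filter could be evaluated for one chroma sample is given below. The five spatial taps and the bias value (512 for 10-bit content) follow the description above. The exact form of the non-linear term is not given in this text, so the sketch assumes the squared center sample scaled back to the sample range, `(C*C + mid) >> bit_depth`; that formula, the tap ordering, and the name `cccm_predict` are assumptions made for illustration.

```python
def cccm_predict(c, n, s, e, w, coeffs, bit_depth=10):
    """Evaluate a 7-tap CCCM-style filter for one chroma sample.

    coeffs holds (c0..c6) for the taps (C, N, S, E, W, P, B).
    Assumption: P = (C*C + mid) >> bit_depth and B = mid.
    """
    mid = 1 << (bit_depth - 1)           # 512 for 10-bit content
    p = (c * c + mid) >> bit_depth       # assumed non-linear term
    b = mid                              # bias term
    taps = (c, n, s, e, w, p, b)
    return sum(ci * ti for ci, ti in zip(coeffs, taps))


# A coefficient set that simply copies the center luma sample.
identity = (1, 0, 0, 0, 0, 0, 0)
copied = cccm_predict(600, 590, 610, 605, 595, identity)
```

  • In practice the coefficients would come from the MSE minimization over the reference area rather than being hand-picked as in this usage example.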
  • FIG. 3 is a diagram 300 illustrating a reference area of a PU.
  • a reference area 310 contains 2 or 6 lines of chroma samples above and to the left of a PU 320.
  • the choice between using 2 or 6 lines to derive the CCCM model parameters within the single model CCCM depends on a template cost.
  • candidate models utilize either 6 lines of adjacent luma samples or luma samples aligned with the current chroma block to determine the mean values that divide samples into two groups.
  • the template cost is ascertained by applying the candidate Cross-Component Prediction (CCP) model (either 2 or 6 lines) on the template and calculating the Sum of Absolute Differences (SAD) between CCP-predicted samples and the reconstructed samples in the template.
  • CCP Cross-Component Prediction
  • SAD Sum of Absolute Differences
  • the reference area extends one PU width to the right and one PU height below the PU boundaries, adjusted to encompass only available samples.
  • extensions 330 in the shaded area are necessary to support the "side samples" of the plus-shaped spatial filter and are padded in unavailable regions.
  • the MSE is calculated by taking each chroma sample in the reference area, comparing it to the predicted chroma value produced by the CCCM using the current filter coefficients, squaring the difference, and then summing all those squared differences across the entire reference area.
  • the goal in training the model is to adjust the filter coefficients (e.g., $c_0, c_1, \ldots, c_6$) such that this MSE value is as small as possible.
  • the CCCM’s predictions are more closely aligned with the actual chroma values, leading to a more accurate and effective chroma prediction model for the video encoding process.
  • MSE minimization employs the calculation of an autocorrelation matrix for the luma input and a cross-correlation vector between the luma input and chroma output.
  • the autocorrelation matrix undergoes LDL decomposition and the final filter coefficients are computed through back-substitution. This procedure aligns roughly with the calculation of the Adaptive Loop Filter (ALF) coefficients in ECM, but opts for LDL decomposition over the Cholesky decomposition to forego square root operations.
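  • The procedure above (autocorrelation matrix, cross-correlation vector, LDL decomposition, back-substitution) can be sketched in floating point as follows. This is not the fixed-point implementation the text alludes to; the function names `ldl_solve` and `fit_filter` and the tiny data set are illustrative assumptions.

```python
def ldl_solve(a, b):
    """Solve a @ x = b for a symmetric positive-definite matrix via LDL^T.

    No square roots are needed, unlike a Cholesky decomposition.
    """
    n = len(b)
    low = [[0.0] * n for _ in range(n)]
    d = [0.0] * n
    for j in range(n):
        d[j] = a[j][j] - sum(low[j][k] ** 2 * d[k] for k in range(j))
        low[j][j] = 1.0
        for i in range(j + 1, n):
            acc = sum(low[i][k] * low[j][k] * d[k] for k in range(j))
            low[i][j] = (a[i][j] - acc) / d[j]
    # Forward substitution: L z = b
    z = [0.0] * n
    for i in range(n):
        z[i] = b[i] - sum(low[i][k] * z[k] for k in range(i))
    # Diagonal scaling: D y = z
    y = [z[i] / d[i] for i in range(n)]
    # Back-substitution: L^T x = y
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))
    return x


def fit_filter(inputs, targets):
    """Least-squares filter coefficients from input rows and target samples."""
    m = len(inputs[0])
    autocorr = [[sum(r[i] * r[j] for r in inputs) for j in range(m)]
                for i in range(m)]
    crosscorr = [sum(r[i] * t for r, t in zip(inputs, targets))
                 for i in range(m)]
    return ldl_solve(autocorr, crosscorr)


# Targets generated by 2*x0 + 3*x1, so the fit should recover (2, 3).
coeffs = fit_filter([[1, 0], [0, 1], [1, 1], [2, 1]], [2, 3, 5, 7])
```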
  • the autocorrelation matrix employs the reconstructed luma and chroma sample values, which for full range samples (0 to 1023 for 10-bit content), results in relatively large values within the matrix, necessitating high bit depth operations for model parameter calculation.
  • the proposed solution is to eliminate fixed offsets from luma and chroma samples for each model within each PU, thereby lowering the magnitudes of the values involved in model creation and lessening the need for high-precision fixed-point arithmetic. Consequently, the use of 16-bit decimal precision is advocated instead of the original CCCM implementation’s 22-bit precision.
  • the luma offset is removed during luma reference sample interpolation.
  • the interpolation’s rounding term can be replaced by a corrected offset comprising both the original rounding term and offsetLuma.
  • the chroma offset can be directly subtracted from the reference chroma samples; alternatively, its influence can be excluded from the cross-component vector, resulting in an analogous outcome.
  • Division operations are sometimes regarded as challenging for implementation. Therefore, these operations within CCCM model parameter calculations are supplanted by multiplications (with a scaling factor) and shift operations. Such replacements are computed by determining the scaling factor and shift count based on the denominator, following a methodology akin to that adopted for CCLM parameters calculation.
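  • Replacing a division by a multiplication and a shift can be sketched generically as below. This is not the specific CCLM/CCCM derivation, which is not spelled out in this text; the 16-bit precision default and the names `reciprocal` and `divide_by_multiply` are assumptions for illustration only.

```python
def reciprocal(denominator, precision_bits=16):
    """Return (scale, shift) such that x / denominator ~= (x * scale) >> shift."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    # The shift count grows with the magnitude of the denominator so the
    # scale factor keeps roughly `precision_bits` significant bits.
    shift = precision_bits + denominator.bit_length() - 1
    scale = ((1 << shift) + denominator // 2) // denominator
    return scale, shift


def divide_by_multiply(numerator, denominator, precision_bits=16):
    """Approximate numerator / denominator with one multiply and one shift."""
    scale, shift = reciprocal(denominator, precision_bits)
    return (numerator * scale) >> shift


quotient = divide_by_multiply(1000, 8)
```

  • The result is exact for power-of-two denominators and otherwise only an approximation whose error depends on `precision_bits`.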
  • FIG. 4 is a diagram 400 illustrating a technique of Bi-prediction with coding unit level weight (BCW).
  • BCW Bi-prediction with coding unit level weight
  • the value ($P_{\text{bi-pred}}$) of the pixel 422 in the current frame 420 is predicted based on the value ($P_0$) of a corresponding pixel 412 in the reference frame 410 and the value ($P_1$) of a corresponding pixel 432 in the reference frame 430 as follows: $P_{\text{bi-pred}} = ((8 - w_b) \cdot P_0 + w_b \cdot P_1 + 4) \gg 3$
  • $w_b$ is the weighting factor applied to the prediction signals. It determines the influence each reference frame has on the bi-predictive signal.
  • the constants 8 and 4 are used for normalization and rounding in the weighted averaging process.
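  • The weighted averaging can be sketched as below, assuming the standard VVC form in which the two weights sum to 8, the constant 4 is the rounding offset, and the normalization is a right shift by 3. The candidate set {-2, 3, 4, 5, 10} is the one quoted later in this description; the name `bcw_blend` is hypothetical.

```python
BCW_WEIGHTS = (-2, 3, 4, 5, 10)  # candidate values of w_b


def bcw_blend(p0, p1, w):
    """Blend two predictor samples: ((8 - w) * P0 + w * P1 + 4) >> 3."""
    return ((8 - w) * p0 + w * p1 + 4) >> 3


equal = bcw_blend(100, 200, 4)        # equal weight: plain average
blends = [bcw_blend(100, 200, w) for w in BCW_WEIGHTS]
```

  • With w = 4 both references contribute equally; values above 4 pull the result toward the second predictor, and the out-of-range weights -2 and 10 extrapolate beyond the two predictors.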
  • the weight $w_b$ is determined in one of two ways: 1) for a non-merge CU, the weight index is signalled after the motion vector difference; 2) for a merge CU, the weight index is inferred from neighboring blocks based on the merge candidate index. BCW is only applied to CUs with 256 or more luma samples (i.e., CU width times CU height is greater than or equal to 256). For low-delay pictures, all 5 weights are used. For non-low-delay pictures, only 3 weights ($w_b \in \{3, 4, 5\}$) are used.
  • the BCW weight index is coded using one context-coded bin followed by bypass-coded bins.
  • the first context-coded bin indicates if equal weight is used; and if unequal weight is used, additional bins are signalled using bypass coding to indicate which unequal weight is used.
  • Weighted Prediction (WP) is a coding tool supported by the H.264/Advanced Video Coding (AVC) and HEVC standards to efficiently code video content with fading. Support for WP was also added into the VVC standard. WP allows weighting parameters (weight and offset) to be signalled for each reference picture in each of the reference picture lists L0 and L1. Then, during motion compensation, the weight(s) and offset(s) of the corresponding reference picture(s) are applied. WP and BCW are designed for different types of video content.
  • the BCW weight index is not signalled, and w is inferred to be 4 (i.e., equal weight is applied) .
  • the weight index is inferred from neighboring blocks based on the merge candidate index. This can be applied to both normal merge mode and inherited affine merge mode.
  • in constructed affine merge mode, the affine motion information is constructed based on the motion information of up to 3 blocks.
  • the BCW index for a CU using the constructed affine merge mode is simply set equal to the BCW index of the first control point MV.
  • CIIP Combined Inter and Intra Prediction
  • BCW index of the current CU is set to 2, i.e., equal weight.
  • the blending weights of multi-prediction inter modes are mostly chosen from a pre-defined candidate set (e.g., five BCW weights for BCW mode) and are determined either by Template Matching (TM) costs or Rate-Distortion (RD) costs.
  • TM Template Matching
  • RD Rate-Distortion
  • TM costs are a measure used in video encoding to evaluate the effectiveness of template matching predictions. They quantify the similarity between the block being encoded and potential reference blocks within the same frame or previous frames. This similarity is typically calculated using the sum of absolute differences (SAD) between the pixel values of the surrounding areas of the blocks. A lower TM cost indicates a closer match, suggesting that the reference block will provide a good prediction for the current block, thus reducing the amount of data required to encode the residuals. TM costs assist encoders in making decisions that optimize the efficiency of video compression by selecting the most suitable reference blocks for prediction.
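  • A TM cost based on SAD can be sketched as follows. The flat sample lists and the names `sad` and `best_candidate` are illustrative assumptions; a real codec would operate on two-dimensional template regions around the block.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized sample lists."""
    return sum(abs(x - y) for x, y in zip(a, b))


def best_candidate(template, candidates):
    """Return (index, cost) of the candidate template with the lowest SAD."""
    costs = [sad(template, cand) for cand in candidates]
    best = min(range(len(candidates)), key=costs.__getitem__)
    return best, costs[best]


current_template = [10, 20, 30]
reference_templates = [[12, 18, 33], [10, 21, 30], [0, 0, 0]]
choice = best_candidate(current_template, reference_templates)
```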
  • RD costs are a metric in video compression that represent the trade-off between bitrate and video quality.
  • the rate component refers to the size of the data or bitrate, while distortion measures the loss in quality compared to the original content.
  • RD costs are used to evaluate different encoding options, aiming to minimize the bitrate while preserving as much quality as possible.
  • the weighting of rate versus distortion in the RD cost formula is adjusted by a parameter (usually denoted by $\lambda$), influencing the encoder’s preference for either video quality or file size.
  • RD optimization is the process of using RD costs to choose encoding parameters that result in efficient compression with balanced quality and file size.
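  • RD optimization is commonly formulated as minimizing a Lagrangian cost J = D + λ·R over the available encoding options. The text does not give the formula explicitly, so the sketch below assumes that common form; the option tuples and the names `rd_cost` and `choose_mode` are hypothetical.

```python
def rd_cost(distortion, rate_bits, lam):
    """Lagrangian rate-distortion cost J = D + lambda * R (assumed form)."""
    return distortion + lam * rate_bits


def choose_mode(options, lam):
    """Pick the option name with the lowest RD cost.

    options: iterable of (name, distortion, rate_bits) tuples.
    """
    return min(options, key=lambda o: rd_cost(o[1], o[2], lam))[0]


modes = [("A", 100, 10), ("B", 60, 40), ("C", 80, 20)]
quality_first = choose_mode(modes, lam=0.5)   # small lambda favors low distortion
bitrate_first = choose_mode(modes, lam=4.0)   # large lambda favors low rate
```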
  • the blending weights are derived directly from a regression process, which increases flexibility and also removes the need for signaling the index for blending weight (e.g., at most three bits for BCW signaling) .
  • the encoder 472/decoder 476 collects reference predictors from multiple sources. These predictors could be generated from different reference frames or constructed using different prediction methods, such as motion-compensated predictors from List 0 and List 1 references.
  • the encoder 472/decoder 476 further sets up a regression model (e.g., a model represented by the equation above).
  • the goal of the regression model is to find the best combination of weights that would produce a predicted signal closest to the actual signal, according to some error metric.
  • the encoder 472/decoder 476 then calculates an error metric.
  • the error metric is the MSE.
  • the MSE is computed between the blended prediction signal of the CU and the actual signal.
  • the MSE is a function of the blending weights and is defined as the sum of the squared differences between the predicted and actual signal values of all pixels in the CU.
  • the decoder 476 uses a regression process to optimize the blending weights to minimize the MSE. This optimization process can be done analytically by solving a system of linear equations derived from the MSE minimization conditions, or it can be done numerically using iterative optimization algorithms.
  • the MSE between the actual signal $P_{\text{actual}}$ and the bi-predicted signal $P_{\text{bi-pred}}$ of all pixels in the coding unit 450 can be calculated as follows: $\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( P_{\text{actual},i} - P_{\text{bi-pred},i} \right)^2$
  • $N$ is the total number of pixels in the coding unit 450.
  • $P_{\text{actual},i}$ is the actual value of the i-th pixel in the CU.
  • $P_{\text{bi-pred},i}$ is the bi-predicted value of the i-th pixel, calculated using the weighted average of the two predictors $P_0$ and $P_1$ for that pixel, and the bi-prediction weight $w_b$.
  • the specific values of the blending weights that minimize the MSE are calculated. These weights are continuous values which provide more precision and flexibility compared to selecting weights from a predefined set.
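  • One analytic way to obtain continuous blending weights is an ordinary least-squares fit of a model with an L0 term, an L1 term, and a bias term, as sketched below. The model form `w0*P0 + w1*P1 + b`, the floating-point solver, and the names `solve_linear` and `fit_blend_weights` are assumptions for illustration; the description does not fix an implementation.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_blend_weights(p0, p1, actual):
    """Least-squares (w0, w1, bias) minimizing the MSE of w0*P0 + w1*P1 + bias."""
    rows = [(a, b, 1.0) for a, b in zip(p0, p1)]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * y for r, y in zip(rows, actual)) for i in range(3)]
    return solve_linear(ata, aty)


l0 = [100, 120, 90, 140, 110]
l1 = [110, 100, 130, 150, 95]
target = [0.25 * a + 0.75 * b + 2 for a, b in zip(l0, l1)]
w0, w1, bias = fit_blend_weights(l0, l1, target)
```

  • Because the fitted weights are real-valued, they are not restricted to a predefined candidate list, which is the flexibility the text refers to.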
  • the encoder can directly use the calculated weight without the need for indexing or signaling predefined weight choices.
  • the encoder 472/decoder 476 may derive a set of regression parameters that includes the weights for cross-prediction sample blending, and/or the weight of the center sample, and/or the weights of spatial samples, and/or the weights of non-linear terms, and/or the weight of the bias term for the inter modes with multiple predictors (e.g., BCW, MHP, CIIP, AMVP-Merge, etc.).
  • This is done by regression using the template samples in a specific region.
  • Template samples refer to a set of pixel values that are used as a reference for predicting other pixel values within a coding unit or block. Template samples are typically the neighboring pixel values that are immediate to the coding block and have already been reconstructed (decoded) and are therefore available for use in the prediction process.
  • the set of regression parameters can be derived.
  • the parameter set is then applied to the samples in the reference predictors to derive the samples in the current Coding Unit (CU).
  • the refined sample consists of K (e.g., five) spatial samples, a non-linear term, and a bias term. That is,
  • the number of spatial samples K can be any number; the spatial samples can be obtained from arbitrary positions inside the template region (e.g., one, two, four, etc., template lines), and the non-linear terms can be multiple and calculated as the higher order of C, N, S, E, or W.
  • the final prediction sample is the weighted sum of all the refined samples according to the weight of each prediction.
  • the number of prediction sources can be any number. That is, the index i indicates that a refined sample is associated with the i-th predictor or the i-th set of filter coefficients used in the algorithm.
  • FIG. 5 is a diagram 500 illustrating one embodiment of the first approach.
  • the encoder 472/decoder 476 uses 3 filters to predict the sample value of a pixel 510. Using the equation described supra, the encoder 472/decoder 476 generates 3 refined samples. Further, in this example, the same 3 filters are applied to a pixel x to generate the corresponding 3 refined samples. The encoder 472/decoder 476 generates a predicted value P(x) of the pixel x as the weighted sum of these 3 refined samples.
  • the encoder 472/decoder 476 builds a model that will use the predictors’ outputs as independent variables to estimate the target pixel values (dependent variable).
  • the model is a linear combination of the predictors’ outputs.
  • the encoder 472/decoder 476 derives the weights ($\omega_0$, $\omega_1$, $\omega_2$) attributed to the 3 predictors through a regression-based method to minimize the mean square error (MSE).
  • the MSE for a set of n pixels in a CU is calculated as follows: $\mathrm{MSE} = \frac{1}{n} \sum_{x} \left( Y(x) - P(x) \right)^2$, where
  • Y (x) is the actual (ground truth) value of pixel x
  • P (x) is the predicted value of the same pixel using the model.
  • the encoder 472/decoder 476 finds the values of $\omega_0$, $\omega_1$, $\omega_2$ that minimize the MSE.
  • the model aims to produce predicted pixel values that closely approximate the actual (or desired) pixel values, thus achieving a more accurate prediction.
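  • One way to carry out this minimization is to solve the least-squares normal equations. The sketch below uses only the Python standard library; `solve` and `fit_weights` are hypothetical helper names, and the small Gauss-Jordan solver stands in for whatever solver an actual codec would use. It assumes the predictor outputs are linearly independent, since a singular system would raise a division error.

```python
def solve(a, b):
    """Solve a * x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_weights(refined, target):
    """Least-squares weights for the predictors.

    refined[i][x] is the i-th refined sample at position x and target[x]
    is the value the blend should match at that position.
    """
    k = len(refined)
    # Auto-correlation matrix of the predictor outputs.
    a = [[sum(p * q for p, q in zip(refined[i], refined[j])) for j in range(k)]
         for i in range(k)]
    # Cross-correlation of each predictor output with the target.
    b = [sum(p * y for p, y in zip(refined[i], target)) for i in range(k)]
    return solve(a, b)
```

  • Minimizing the sum of squared errors gives the same weights as minimizing the MSE, because the two differ only by the constant factor 1/n.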
  • the encoder 472/decoder 476 may use the calculated weights to guide the selection of the final weights. More specifically, the encoder 472/decoder 476 initially uses the regression model to compute a set of weights that would ideally minimize the prediction error. Rather than applying these calculated weights directly, the values are used to guide which weights should ultimately be used from a predefined candidate set (e.g., $w_b \in \{-2, 3, 4, 5, 10\}$ as described referring to FIG. 4). The encoder 472/decoder 476 preserves and considers only those weights from the original candidate set that are close to the values derived from the regression model.
  • the encoder 472 /decoder 476 selectively chooses the weight for the final blend. This choice may be guided by additional criteria such as Rate-Distortion (RD) costs. As the encoder 472 /decoder 476 narrows down the candidate set to weights that are close to the regression model values, it effectively reduces the size of the candidate set.
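  • The candidate-pruning idea can be illustrated with a short sketch. The names `prune_candidates` and `pick_weight`, the `rd_cost` callable, and the fallback to the full set when no candidate survives are assumptions for illustration; the regressed weight is assumed to be expressed on the same scale as the candidates.

```python
def prune_candidates(candidates, regressed, threshold):
    """Keep only candidate weights within `threshold` of the regressed weight."""
    kept = [w for w in candidates if abs(w - regressed) < threshold]
    # Assumed fallback: if nothing is close enough, keep the original set.
    return kept if kept else list(candidates)


def pick_weight(candidates, regressed, threshold, rd_cost):
    """Choose the surviving candidate with the lowest (hypothetical) RD cost."""
    return min(prune_candidates(candidates, regressed, threshold), key=rd_cost)
```

  • For example, with the candidate set {-2, 3, 4, 5, 10}, a regressed weight of 4.3, and a threshold of 1, only 4 and 5 remain, so one bit is enough to signal the choice instead of an index into five candidates.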
  • FIG. 6 is a diagram 600 illustrating another embodiment of the first approach.
  • the refined sample consists of K 0 (e.g., two) spatial gradients, K 1 (e.g., two) location terms, a non-linear term, and a bias term. That is,
  • $G_x = (2W + NW + SW) - (2E + NE + SE)$,
  • X and Y are the horizontal and vertical relative locations
  • the number of spatial sample weights, K 0 and K 1 can be any number; the spatial gradients and location terms can be derived from arbitrary positions within the template region (e.g., one, two, four, etc., template lines) , and there can be multiple non-linear terms calculated as higher-orders of C, N, S, E, and/or W.
  • the final prediction sample is the weighted sum of all the refined samples determined by the weight $\omega_i$ for each source predictor.
  • the number of source predictors can be any number.
  • a parameter set $\theta_b$ that includes the blending weights for L0 and L1, the weight of the center sample, the weights of spatial samples, the weights of non-linear terms, and the weight of the bias term for BCW mode is derived by a regression model. This model minimizes the SAD, SSD, or other sums of differences between the current and reference refined samples in the template region.
  • the parameter set $\theta_b$ is then applied to the samples from the L0 and L1 reference predictors to derive the samples in the current CU.
  • Another embodiment describes a parameter set $\theta_m$ that includes the MHP blending weights, the weight of the center sample, the weights of spatial samples, the weights of non-linear terms, and the weight of the bias term for MHP mode.
  • This parameter set is derived by a regression model that minimizes the SAD, SSD, or other sums of differences between the current and reference refined samples in the template region.
  • the parameter set $\theta_m$ is then applied to the samples from the reference predictors to derive the samples in the current CU.
  • a parameter set $\theta_c$ that includes the CIIP blending weights, the weight of the center sample, the weights of spatial samples, the weights for non-linear terms, and the weight of the bias term for CIIP mode is derived by a regression model. This model also aims to minimize the SAD, SSD, or other sums of differences between the current and reference refined samples in the template region.
  • the parameter set $\theta_c$ is then applied to the samples from both the inter and intra reference predictors to construct the final predictor samples in the current CU.
  • a parameter set $\theta_a$ that encompasses the AMVP-Merge blending weights, the weight of the center sample, the weights of spatial samples, the weights of non-linear terms, and the weight of the bias term for the AMVP-Merge mode is derived by a regression model. This model minimizes the SAD, SSD, or other sums of differences between the current and reference refined samples within the template region.
  • the derived parameter set $\theta_a$ is applied to the samples from the inter AMVP and inter merge reference predictors to generate the predictor samples for the current CU.
  • the final weights (i.e., the parameter set that includes the weights for cross-prediction sample blending, the weight of the center sample, the weights of spatial samples, the weights of non-linear terms, and the weight of the bias term) of modes with multiple prediction sources can be decided by RD cost.
  • among the weights derived by the regression model and the original weights of each inter mode, those with the minimum RD cost are chosen as the final cross-prediction blending weights.
  • a flag is then explicitly signaled to indicate whether the decoder uses the weights derived by the regression model.
  • the weights derived by the regression model can steer the determination of the final weights of each inter mode with multiple prediction sources rather than being directly applied. That is, only the weights within the original weight candidate set for each mode that are close to the weights derived by the regression model (i.e., the difference in weights is smaller than a pre-defined threshold) are preserved. This selection process can reduce the size of the weight candidate set and further reduce the signaling bits needed for the final weights.
  • one or more control syntaxes/flags can be implemented to enable or disable the aforementioned function as needed.
  • These control syntaxes/flags could be inserted at the CU level, Coding Tree Unit (CTU) level, CTU-row level, CTU group level, slice level, within the slice header, at the picture level, within the picture header, PPS, at the sequence level, within the SPS, APS, or VPS. If the function is disabled, the default weights are applied instead.
  • a disabling mechanism can also be deployed based on inferring criteria.
  • a condition check evaluates the validity of the derived parameters: if the derived parameters or predictors deviate from those of the original method (e.g., using default weights) by more than a pre-defined or adaptive range, the proposed method is disabled and the coder reverts to the original method.
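  • The inferred disabling mechanism can be sketched as a simple range check. This sketch assumes the check is applied to the weights themselves with a single fixed bound `max_dev`; the function name and the per-weight absolute-difference test are illustrative choices, and an adaptive range or a check on the derived predictors would follow the same pattern.

```python
def choose_weights(derived, default, max_dev):
    """Fall back to the default weights if any derived weight strays too far."""
    if any(abs(d - o) > max_dev for d, o in zip(derived, default)):
        return list(default)   # proposed method inferred to be disabled
    return list(derived)       # derived weights accepted
```

  • Because both encoder and decoder can evaluate the same check on the same template data, this fallback needs no additional signaling.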
  • the blending weights of multi-prediction intra modes are mostly determined by the ratio of TM costs (e.g., TIMD, intraTMP) and the amplitude of the histogram (e.g., DIMD), which lacks flexibility.
  • the encoder/decoder may derive a set of regression parameters $\theta$ including the weights for cross-prediction sample blending, and/or center sample, and/or spatial samples, and/or non-linear terms, and/or bias terms of the intra modes with multiple predictors (e.g., DIMD, TIMD, fusion of Chroma Intra prediction, IntraTMP, Intra prediction fusion, combination of CIIP with TIMD and TM merge, etc.) by regression using the template samples in a specific region.
  • the set of regression parameters $\theta$ can be derived.
  • the parameter set $\theta$ is then applied to the samples of the reference predictors to derive the samples in the current Coding Unit (CU).
  • the refined sample consists of K (e.g., 5) spatial samples, a nonlinear term, and a bias term. That is,
  • the number of spatial samples K can be any number, the spatial samples can be obtained from arbitrary positions inside the template region (e.g., 1, 2, 4, etc. template lines) and the non-linear terms can be multiple and calculated as the higher order of C, N, S, E and/or W.
  • the final prediction sample is the weighted sum of all the refined samples according to the weight $\omega_i$ of each source predictor.
  • the number of source predictors can be any number.
  • the refined sample consists of K 0 (e.g., 2) spatial gradients, K 1 (e.g., 2) location terms, a nonlinear term, and a bias term. That is,
  • $G_x = (2W + NW + SW) - (2E + NE + SE)$
  • $G_y = (2N + NW + NE) - (2S + SW + SE)$
  • $P = (C^2 + (1 \ll (\mathrm{bitDepth} - 1))) \gg \mathrm{bitDepth}$
  • X and Y are the horizontal and vertical relative locations
  • the number of spatial samples K 0 and K 1 can be any number, the spatial gradients and location terms can be obtained from arbitrary positions inside the template region (e.g., 1, 2, 5, etc. template lines) and the non-linear terms can be multiple and calculated as the higher order of C, N, S, E and/or W.
  • the final prediction sample is the weighted sum of all the refined samples according to the weight $\omega_i$ of each source predictor.
  • the number of source predictors can be any number.
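  • The gradient and non-linear terms of this embodiment can be computed directly from a 3x3 neighbourhood, as in the following sketch. The dictionary layout keyed by compass position and the function name are assumptions for illustration; the arithmetic follows the three expressions for G x, G y, and P given above.

```python
def gradient_terms(nb, bit_depth=10):
    """Return (Gx, Gy, P) for a 3x3 neighbourhood.

    nb maps 'C', 'N', 'S', 'E', 'W', 'NW', 'NE', 'SW', 'SE' to integer
    sample values.
    """
    gx = (2 * nb["W"] + nb["NW"] + nb["SW"]) - (2 * nb["E"] + nb["NE"] + nb["SE"])
    gy = (2 * nb["N"] + nb["NW"] + nb["NE"]) - (2 * nb["S"] + nb["SW"] + nb["SE"])
    p = (nb["C"] ** 2 + (1 << (bit_depth - 1))) >> bit_depth
    return gx, gy, p
```

  • A flat neighbourhood yields zero gradients, while a left-to-right ramp yields a non-zero horizontal gradient and a zero vertical gradient.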
  • a parameter set $\theta_d$ for DIMD mode, including the blending weights of three predictors (i.e., two intra angular mode predictors derived from the reconstructed neighboring samples and the planar mode predictor), and/or the weight of the center sample, and/or the weights of spatial samples, and/or the weights of non-linear terms, and/or the weight of the bias term, is derived by a regression model that minimizes the SAD, SSD, or another sum of differences between the current and reference refined samples in the template region. Here, the current samples are the neighboring reconstructed samples, and the reference samples are the neighboring predicted samples derived according to the intra mode.
  • the parameter set $\theta_d$ is then applied to the samples of the three source predictors to derive the samples in the current CU.
  • a parameter set $\theta_t$ for TIMD mode, including the blending weights of two predictors (i.e., the first two intra prediction modes in MPMs with the minimum SATD), and/or the weight of the center sample, and/or the weights of spatial samples, and/or the weights of non-linear terms, and/or the weight of the bias term, is derived by a regression model that minimizes the SAD, SSD, or another sum of differences between the current and reference refined samples in the template region. Here, the current samples are the neighboring reconstructed samples, and the reference samples are the neighboring predicted samples derived according to the intra mode.
  • the parameter set $\theta_t$ is then applied to the samples of the two source predictors to derive the samples in the current CU.
  • a parameter set $\theta_f$ for the fusion of Chroma Intra prediction mode, including the blending weights of two chroma predictors, and/or the weight of the center sample, and/or the weights of spatial samples, and/or the weights of non-linear terms, and/or the weight of the bias term, is derived by a regression model that minimizes the SAD, SSD, or another sum of differences between the current and reference refined samples in the template region. Here, the current samples are the neighboring reconstructed samples, and the reference samples are the neighboring predicted samples derived according to the intra mode.
  • the parameter set $\theta_f$ is then applied to the samples of the two chroma source predictors to derive the samples in the current CU.
  • a parameter set $\theta_i$, including the blending weights of multiple predictors, and/or the weight of the center sample, and/or the weights of spatial samples, and/or the weights of non-linear terms, and/or the weight of the bias term, is derived for the Intra Template-Matching Prediction Fusion mode proposed in JVET-AC0069, the Fusion of Intra Template Matching mode proposed in JVET-AC0107, or another Intra Template Matching fusion method that combines multiple source predictors to derive the final predictor. The parameter set is derived by a regression model that minimizes the SAD, SSD, or another sum of differences between the current and reference refined samples in the template region. Here, the current samples are the neighboring reconstructed samples, and the reference samples are the L-shaped reconstructed samples of the source predictors.
  • the parameter set $\theta_i$ is then applied to the samples of the source predictors of the Intra Template-Matching Prediction Fusion mode, the Fusion of Intra Template Matching mode, or the other Intra Template Matching fusion method to derive the samples in the current CU.
  • a parameter set $\theta_m$ for the Intra prediction fusion mode, including the blending weights of predictors derived from different reference lines, and/or the weight of the center sample, and/or the weights of spatial samples, and/or the weights of non-linear terms, and/or the weight of the bias term, is derived by a regression model that minimizes the SAD, SSD, or another sum of differences between the current and reference refined samples in the template region. Here, the current samples are the neighboring reconstructed samples, and the reference samples are the neighboring predicted samples derived from different reference lines.
  • the parameter set $\theta_m$ is then applied to the samples of the source predictors derived from different reference lines to derive the samples in the current CU.
  • a parameter set $\theta_l$ for the Intra prediction fusion mode, including the blending weights of predictors derived from different reference lines, and/or the weight of the center sample, and/or the weights of spatial samples, and/or the weights of non-linear terms, and/or the weight of the bias term, is derived by a regression model that minimizes the SAD, SSD, or another sum of differences between the current and reference refined samples in the template region. Here, the current samples are the neighboring reconstructed samples, and the reference samples are the neighboring predicted samples derived from different reference lines.
  • the parameter set $\theta_l$ is then applied to the samples of the source predictors derived from different reference lines to derive the samples in the current CU.
  • a parameter set $\theta_c$ for the combination of CIIP with TIMD and TM merge mode, including the blending weights of inter and intra predictors, and/or the weight of the center sample, and/or the weights of spatial samples, and/or the weights of non-linear terms, and/or the weight of the bias term, is derived by a regression model that minimizes the SAD, SSD, or another sum of differences between the current and reference refined samples in the template region. Here, the current samples are the neighboring reconstructed samples. For the inter part, the reference samples are the reconstructed samples neighboring the reference inter predictor; for the intra part, they are the neighboring predicted samples derived according to the intra mode.
  • the parameter set $\theta_c$ is then applied to the samples of the inter and intra predictors to derive the samples in the current CU.
  • among the weights derived by the regression model and the original weights of each intra mode, those with the minimum RD cost become the final weights (i.e., the parameter set including the weights for cross-prediction sample blending, and/or center sample, and/or spatial samples, and/or non-linear terms, and/or bias terms), and one flag is explicitly signaled to indicate whether the decoder uses the weights derived by the regression model.
  • the weights (i.e., the parameter set including the weights for cross-prediction sample blending, and/or center sample, and/or spatial samples, and/or non-linear terms, and/or bias terms) derived by the regression model can be used to guide the final weights of each intra mode with multiple prediction sources instead of being used directly. That is, only the weights in the original weight candidate set of each mode that are close to the weights derived by the regression model (i.e., the difference of weights is smaller than a threshold) are preserved and have a chance to be chosen at the encoder side, which can reduce the size of the candidate set and further reduce the signaling bits of the final weights.
  • one or more control syntaxes/flags can be added to enable the above-mentioned function or not.
  • the one or more control syntaxes/flags can be added on the CU-level, CTU-level, CTU-row-level, CTU group-level, slice-level, slice header, picture-level, picture header, PPS, sequence-level, SPS, APS, or VPS. If the function is disabled, the default weight is applied.
  • the inferring disable mechanism can also be applied.
  • One condition check is applied to evaluate the legality of the derived parameter.
  • the derived parameters can be applied to obtain the derived predictors. However, if the derived predictors are far from (out of a predefined range or an adaptive range) the predictors of the original method (e.g., using default weights), or if the derived parameters are far from (out of a predefined range or an adaptive range) the parameters of the original method (e.g., default weights), the proposed method is disabled and the coder falls back to the original method.
  • any of the foregoing proposed techniques can be implemented in encoders and/or decoders.
  • any of the proposed methods can be implemented in inter coding of an encoder, and/or a decoder.
  • any of the proposed methods can be implemented as a circuit coupled to the inter coding of the encoder and/or the decoder, so as to provide the information needed by the inter coding.
  • the encoder/decoder may employ a technique based on regression for BCW to derive the weights for L0 and L1 in a single iteration while also considering template matching between reference and current templates to replace bi-predictive Local Illumination Compensation (LIC).
  • regression-based BCW conducts a single regression process.
  • the regression-based BCW model includes three terms: T L0 , T L1 , and B.
  • Pred RegressionBCW is the predicted pixel value.
  • T L0 signifies the sample value within the L0 template region
  • T L1 represents the sample value within the L1 template region
  • B denotes the bias term, which may either be a predefined factor or a factor proportional to the bit depth.
  • the regression weights c 0 , c 1 , and c 2 are obtained by minimizing the Mean Squared Error (MSE) as follows:
  • T Org refers to the original template sample and is the actual pixel value.
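  • Based on the term definitions above, the three-term model and its fitting criterion can be written in the following form. This is a reconstruction consistent with those definitions rather than a reproduction of the original equations; the template sample index k and the template size n are notation introduced here.

```latex
\mathrm{Pred}_{\mathrm{RegressionBCW}} = c_0 \, T_{L0} + c_1 \, T_{L1} + c_2 \, B

(c_0, c_1, c_2) = \arg\min_{c_0, c_1, c_2} \; \frac{1}{n} \sum_{k=1}^{n}
  \Bigl( T_{\mathrm{Org}}[k] - \bigl( c_0 \, T_{L0}[k] + c_1 \, T_{L1}[k] + c_2 \, B \bigr) \Bigr)^2
```

  • Because all three weights are fitted jointly, a single regression pass yields both the L0/L1 blending weights and an offset, which is the role bi-predictive LIC would otherwise play.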
  • the regression-based BCW model can encompass some or all parts of spatial terms, gradient terms, position terms, non-linear terms, the term of sample differences between L0 and L1, and/or other terms analogous to those detailed in the CCCM model described supra.
  • An example of an extended regression-based BCW model is as follows:
  • PosX and PosY correspond to the horizontal and vertical positions of the sample; G x and G y are the horizontal and vertical gradient terms respectively, and N, S, E, and W are the spatial terms similar to those in CCCM.
  • the regression weights c 0 , c 1 , ..., c 13 are determined by minimizing the MSE as shown below:
  • regularization terms are crucial to ensure the stability of the regression process.
  • several regularization terms are introduced to enhance coding efficiency based on empirical evidence.
  • various regularization terms are incorporated into the MSE formulation to control the values of certain regression weights.
  • the objective of these regularization terms is to stabilize the regression process and avoid deriving regression weights that are unreasonably large.
  • One embodiment involves adding two regularization terms designed to ensure that regression weights associated with L0 and L1 are close to either inherited BCW weights in merge modes or signaled BCW weights in AMVP modes.
  • w 0 and w 1 represent the inherited or signaled BCW weights of L0 and L1, respectively.
  • the parameters ⁇ 0 and ⁇ 1 serve as scaling factors for these regularization terms and may be fixed values or proportionate to the average magnitude of T L0 , T L1 , T L0 ⁇ T L1 or other parameters. By minimizing this MSE with the included regularization terms, the resulting regression weights for L0 and L1 will more closely align with the values of w 0 and w 1 .
  • w 0 and w 1 may be set to fixed, equal BCW weights (e.g., 1/2 and 1/2) independent of inherited or signaled BCW weights.
  • the matrix A contains information derived from the autocorrelation of the template samples, potentially adjusted by regularization terms, and the vector b results from the cross-correlation of the template samples with the original samples, also potentially adjusted by regularization terms.
  • x contains the regression coefficients c 0 , c 1 , ... that are applied to the reference predictors to best match the actual data.
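  • The effect of the two regularization terms on the system A x = b can be sketched as follows for the three-term model. This is an assumption-laden illustration: the function names, the parameter names `lam0` and `lam1` for the two scaling factors, and the use of the sum of squared errors (which differs from the MSE only by a constant factor that can be folded into the scaling factors) are choices made for the example.

```python
def solve(a, b):
    """Solve a * x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def regression_bcw(t_l0, t_l1, t_org, bias, w0, w1, lam0=0.0, lam1=0.0):
    """Minimise sum(err^2) + lam0*(c0 - w0)^2 + lam1*(c1 - w1)^2 over (c0, c1, c2)."""
    cols = [t_l0, t_l1, [bias] * len(t_org)]
    # Auto-correlation matrix A and cross-correlation vector b.
    a = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    b = [sum(p * y for p, y in zip(ci, t_org)) for ci in cols]
    # Each regularization term adds to one diagonal entry of A and one entry of b.
    a[0][0] += lam0
    b[0] += lam0 * w0
    a[1][1] += lam1
    b[1] += lam1 * w1
    return solve(a, b)
```

  • With both scaling factors at zero this is plain least squares; as they grow, the L0 and L1 weights are pulled toward w0 and w1 and the system stays well conditioned even when the two templates are nearly identical.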
  • C is a predefined constant reflecting the total sum of the regression weights under constraint. For example, setting C to 1 implies that the sum of derived regression weights should approximate 1.
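  • A sum constraint of this kind can be expressed as a single penalty term. The form below is a sketch under stated assumptions: it uses the sum of squared template errors (the MSE differs only by a constant factor that can be folded into the scaling factor), it constrains only the L0 and L1 weights, and the scaling factor $\lambda$ and index k are notation introduced here.

```latex
J(c) = \sum_{k} \bigl( T_{\mathrm{Org}}[k] - \mathrm{Pred}[k] \bigr)^2
       + \lambda \, ( c_0 + c_1 - C )^2

% Setting the gradient of J to zero modifies the normal equations A x = b as:
A_{ij} \leftarrow A_{ij} + \lambda \quad (i, j \in \{0, 1\}), \qquad
b_i \leftarrow b_i + \lambda \, C \quad (i \in \{0, 1\})
```

  • Unlike the per-weight terms, this penalty also touches the off-diagonal entries of A, because it couples the two weights instead of anchoring each one separately.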
  • Another adaptation further extends the approach to constrain the sum of a larger number of regression weights.
  • c 0 and c 1 are two predefined factors to constrain the ratio of regression weights to be inversely proportional to the ratio of c 0 and c 1 , without prescribing the exact values of c 0 and c 1 .
  • the terms can be integrated into the auto-correlation matrix, A, and the cross-correlation vector, b:
  • the signaling of LIC and the signaling of BCW are independent for a bi-predictive CU. The coding gains of these two tools partly overlap. Moreover, when the regression-based BCW is supported, the original BCW signaling might be redundant.
  • the signaling process for the Bi-prediction with Coding Unit Level weight (BCW) during Advanced Motion Vector Prediction (AMVP) mode can be omitted when the regression-based BCW mode is selected.
  • a flag is transmitted to the decoder to indicate whether the regression-based BCW is active. If this is the case and the flag is set to true, traditional BCW signaling becomes unnecessary and is subsequently bypassed. Conversely, if the flag is false, conventional BCW signaling remains and continues to determine the BCW weight for the CU encoded in AMVP mode.
  • the regression-based BCW mode is treated as a sub-mode of BCW, simplifying the process. Specifically, a candidate from the BCW weights set represents the regression-based BCW mode without necessitating an additional flag for differentiation.
  • an additional flag may be introduced to indicate the usage of regression-based BCW for further refinement of the selected BCW weights. If this flag is true, the selected BCW weights function as regularization terms within the regression-based BCW mode, and the calculated regression weights are utilized to generate the final predictor. If the flag is false, the selected BCW weights are applied directly without further modification.
  • One embodiment aims to decrease signaling overhead by entirely substituting the classic BCW process with the regression-based BCW approach.
  • the Rate-Distortion (RD) costs of the original BCW candidates are not computed during encoding; only the RD cost of the regression-based BCW is considered. If this RD cost is less than that of other modes, a flag is set to true and signaled to the decoder. If not, the flag is to be set to false, indicating that the entire BCW blending process should be omitted.
  • bi-predictive LIC with the original BCW blending and regression-based BCW could be mutually exclusive, a choice made according to the state of a binary flag. If the flag is true, regression-based BCW is in operation. If false, bi-predictive LIC is performed with the original BCW blending. The simultaneous disabling of both LIC and BCW is unsupported in this configuration.
  • regression-based BCW is employed to replace both bi-predictive LIC and traditional BCW for a bi-predictive CU. If the associated flag is true, regression-based BCW is executed. Should the flag be false, neither LIC nor BCW is performed.
  • the bi-predictive LIC with the conventional BCW blending and the regression-based BCW with signaled BCW index are mutually exclusive.
  • the choice between these two techniques is determined by a flag. Specifically, if the flag is true, the regression-based BCW is executed with the presence of signaled BCW weights (i.e., the regularization terms take into account the value of signaled BCW weights) . If the flag is false, then the bi-predictive LIC is conducted with the original BCW blending.
  • the regression-based BCW with a signaled BCW index is employed to replace both the bi-predictive LIC and the original BCW for a bi-predictive CU. If the corresponding flag is set to true, regression-based BCW is performed, utilizing the signaled BCW weights. If the flag is set to false, however, the bi-predictive LIC is not executed.
  • two distinct flags are employed to indicate the application of bi-predictive LIC and regression-based BCW for a bi-predictive CU. Specifically, when the first flag is set to false, neither bi-predictive LIC nor BCW are executed. Should the first flag be true while the second flag is false, regression-based BCW is carried out without the signaling of traditional BCW. If both the first and second flags are true, regression-based BCW is performed with the presence of the original BCW signaling.
  • two flags are signaled to indicate the use of bi-predictive LIC, regression-based BCW, and conventional BCW for a bi-predictive CU. If the first flag is false, neither bi-predictive LIC nor BCW is performed. If the first flag is true and the second is false, only the original BCW is performed. If both flags are true, regression-based BCW is performed with the presence of the original BCW signaling.
  • another configuration utilizes two flags to determine the application of bi-predictive LIC and BCW for a bi-predictive CU. If the first flag is false, both bi-predictive LIC and BCW are not performed. If the first flag is true and the second flag is false, bi-predictive LIC is conducted without the BCW. If both flags are true, regression-based BCW is performed with the presence of the original BCW signaling.
  • two flags are signaled to indicate the application of bi-predictive LIC, regression-based BCW, and original BCW for a bi-predictive CU. If the first flag is false, neither bi-predictive LIC nor BCW is performed. If the first flag is true and the second flag is false, both the bi-predictive LIC and the original BCW are performed. Should both flags be set to true, regression-based BCW is performed with the presence of the original BCW signaling.
  • FIG. 7 is a diagram 700 illustrating a first configuration of signaling for use with regression-based BCW mode. This configuration leverages a conditional approach to signaling.
  • when a LIC flag is false (0), the original BCW signaling proceeds as usual.
  • when the LIC flag is true (1), an additional regression BCW flag is utilized. If the regression BCW flag is false (0), the original BCW signaling proceeds as usual with LIC. If the regression BCW flag is true (1), regression-based BCW is applied and the BCW signaling is bypassed.
  • FIG. 8 is a diagram 800 illustrating a second configuration of signaling for use with regression-based BCW mode.
  • This configuration leverages a conditional approach to signaling.
  • when a LIC flag is false (0), the original BCW signaling proceeds as usual.
  • when the LIC flag is true (1), an additional regression BCW flag is utilized. If the regression BCW flag is false (0), the original BCW signaling proceeds as usual with LIC. If the regression BCW flag is true (1), regression-based BCW is applied and the BCW signaling is not skipped.
  • the traditional BCW signaling can serve as a reference that helps to guide and stabilize the regression process that is being employed to determine the BCW weights. This pre-existing knowledge of BCW weights can serve as a form of regularization or anchoring in the regression calculations, potentially leading to more reliable and stable outcomes in the optimization of the BCW weights.
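  • The first and second signaling configurations differ only in whether the BCW index is still parsed when the regression BCW flag is true. The sketch below captures that difference; the function name, the `read_flag` and `read_bcw_index` callables standing in for a bitstream reader, and the returned dictionary are hypothetical and do not correspond to any actual syntax table.

```python
def parse_lic_bcw(read_flag, read_bcw_index, skip_bcw_when_regression):
    """Parse the LIC flag, the conditional regression BCW flag and the BCW index.

    skip_bcw_when_regression=True models the first configuration (FIG. 7);
    False models the second configuration (FIG. 8).
    """
    lic_flag = bool(read_flag())
    # The regression BCW flag is only present when the LIC flag is true.
    regression_bcw = bool(read_flag()) if lic_flag else False
    if regression_bcw and skip_bcw_when_regression:
        bcw_index = None               # BCW signaling bypassed
    else:
        bcw_index = read_bcw_index()   # original BCW signaling
    return {"lic_flag": lic_flag, "regression_bcw": regression_bcw,
            "bcw_index": bcw_index}
```

  • In the second configuration the parsed index is not discarded: it supplies the weights that anchor the regularization terms of the regression.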
  • FIG. 9 is a diagram 900 illustrating a third configuration of signaling for use with regression-based BCW mode.
  • in this configuration, LIC signaling, BCW signaling, and regression-based BCW signaling are combined.
  • a regression-based BCW flag dictates the course of action. If the flag indicates that regression-based BCW is off, BCW signaling is allowed alongside LIC (irrespective of LIC being on or off) . Conversely, if the flag indicates that regression-based BCW is on, both LIC and BCW signaling are disabled.
  • when regression-based BCW is off, LIC can be either on or off, and BCW signaling will proceed as usual. This allows for the continued use of LIC and BCW as determined independently. However, if regression-based BCW is turned on, there may be no need to perform LIC or signal for BCW, because the regression-based BCW presumably encompasses the functionalities or benefits of both tools and can replace them.
  • This configuration aims to reduce redundant processes by using regression-based BCW to replace the conventional LIC and BCW signaling whenever appropriate.
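  • The third configuration can be sketched in the same style. Here the regression-based BCW flag is read first and gates everything else; the function name, the reader callables, and the order in which the LIC flag and BCW index are read when the flag is off are assumptions for illustration.

```python
def parse_regression_first(read_flag, read_bcw_index):
    """Parse the regression-based BCW flag, then LIC and BCW only if it is off."""
    if read_flag():
        # Regression-based BCW on: neither LIC nor BCW is signaled.
        return {"regression_bcw": True, "lic_flag": None, "bcw_index": None}
    # Regression-based BCW off: LIC and BCW are signaled independently.
    lic_flag = bool(read_flag())
    return {"regression_bcw": False, "lic_flag": lic_flag,
            "bcw_index": read_bcw_index()}
```

  • A CU that selects regression-based BCW therefore costs a single flag, whereas the other configurations spend at least two syntax elements on the same decision.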
  • FIG. 10 is a flow chart 1000 of a method (process) for coding.
  • the method may be performed by a coder (e.g., the encoder 372 /decoder 376) .
  • the coder employs a regression-based prediction model to determine blending weights for List 0 (L0) and List 1 (L1) reference frames. This model includes at least a term associated with the L0 reference frame, a term associated with the L1 reference frame, and a bias term.
  • the coder utilizes the determined blending weights to generate bi-predictive values for pixels in a coding unit (CU) based on values of corresponding pixels in L0 and L1 reference frames.
  • the coder derives the blending weights based on a Mean Squared Error (MSE) between the bi-predictive values and actual pixel values within the CU.
  • the regression-based prediction model may further include one or more of spatial gradient terms, location terms, nonlinear terms, and differences between the L0 and L1 terms.
  • the coder incorporates one or more regularization terms into the MSE to constrain one or more of the blending weights.
  • the one or more regularization terms constrain the L0 term and the L1 term to a set of preconfigured blending weights.
  • the one or more regularization terms constrain a sum of the L0 term and the L1 term.
  • the one or more regularization terms constrain a ratio between the L0 term and the L1 term.
  • the coder transmits an indicator indicating whether the regression-based prediction model is enabled.
  • the coder bypasses both signaling of a set of preconfigured blending weights and LIC when the indicator shows that the regression-based prediction model is enabled.
  • the coder transmits a first indicator to signal whether LIC is enabled.
  • the coder also transmits a second indicator to signal whether the regression-based prediction model is enabled.
  • if the second indicator indicates that the regression-based prediction model is enabled, the coder disables signaling of a set of preconfigured blending weights and employs the regression-based prediction model in response to the second indicator.
  • the coder proceeds with signaling of a set of preconfigured blending weights, and the regression-based prediction model is employed in response to the second indicator.
  • FIG. 11 is a flow chart 1100 of another method (process) for coding.
  • the method may be performed by a coder (e.g., the encoder 372/decoder 376).
  • the coder receives video data that includes a plurality of coding units (CUs).
  • the coder derives a set of regression parameters that include weights for cross-prediction sample blending. This derivation is accomplished by employing regression on template samples located within a specific region.
  • the coder applies the set of regression parameters to reference predictors to derive prediction samples for the CU.
  • the coder adjusts the set of regression parameters based on a prediction error metric.
  • the coder calculates a Mean Squared Error (MSE) as the prediction error metric. This MSE is determined by comparing the prediction samples of the CU with the actual values of the CU.
  • the regression parameters include weights for spatial samples, spatial gradients, location terms, non-linear terms, and a bias term.
  • the regression parameters are applied to prediction samples derived from a range of prediction modes.
  • These modes include Bi-prediction with Coding Unit Level weight (BCW), Multiple Hypothesis Prediction (MHP), Combined Inter and Intra Prediction (CIIP), Advanced Motion Vector Prediction Merge (AMVP-Merge), and Geometric Partitioning Mode (GPM).
  • BCW Bi-prediction with Coding Unit Level weight
  • MHP Multiple Hypothesis Prediction
  • CIIP Combined Inter and Intra Prediction
  • AMVP-Merge Advanced Motion Vector Prediction Merge
  • GPM Geometric Partitioning Mode
  • the coder further applies the regression parameters to prediction samples derived from Decoder-side Intra Mode Derivation (DIMD), Template Matching Intra Prediction (TIMD), fusion of Chroma Intra prediction, Intra Block Copy (IBC), Intra prediction fusion, and a combination of CIIP with TIMD and TM merge.
  • DIMD Decoder-side Intra Mode Derivation
  • TIMD Template Matching Intra Prediction
  • IBC Intra Block Copy
  • the coder uses the weights derived by regression to guide the selection of final weights from a predefined candidate set. This approach reduces the size of the candidate set and the signaling overhead in the video bitstream. The coder selects the final weights that are closest to the values derived from the regression model, so that the most suitable weights are used for the prediction process.
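  • The candidate-set reduction described above can be sketched as a nearest-neighbour shortlist. The function name, the `keep` parameter, and the use of the VVC BCW weight values (in units of 1/8) as the predefined set are assumptions made for illustration; the source does not specify how the reduced set is formed.

```python
def shortlist_weights(regressed_w, candidates, keep=2):
    """Keep the `keep` predefined candidates closest to the weight
    obtained by regression. A smaller shortlist needs fewer bits to
    index than the full candidate set."""
    ranked = sorted(candidates, key=lambda w: abs(w - regressed_w))
    return ranked[:keep]


# Hypothetical example: regression suggests an L1 weight of 0.55,
# i.e. 4.4 when expressed in eighths.
full_set = [-2, 3, 4, 5, 10]
reduced = shortlist_weights(0.55 * 8, full_set, keep=2)
```

  • With two surviving candidates, a single bin is enough to signal the final choice instead of an index into all five.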
  • for each inter-predicted CU, motion parameters consisting of motion vectors, reference picture indices, a reference picture list usage index, and additional information needed for the new coding features of VVC are used for inter-predicted sample generation.
  • the motion parameter can be signalled in an explicit or implicit manner.
  • when a CU is coded with skip mode, the CU is associated with one PU and has no significant residual coefficients, no coded motion vector delta, and no reference picture index.
  • a merge mode is specified whereby the motion parameters for the current CU are obtained from neighbouring CUs, including spatial and temporal candidates, and additional schedules introduced in VVC.
  • the merge mode can be applied to any inter-predicted CU, not only for skip mode.
  • the alternative to merge mode is the explicit transmission of motion parameters, where motion vector, corresponding reference picture index for each reference picture list and reference picture list usage flag and other needed information are signalled explicitly per each CU.
  • VVC includes a number of new and refined inter prediction coding tools listed as follows:
  • MMVD Merge mode with MVD
  • SMVD Symmetric MVD
  • AMVR Adaptive motion vector resolution
  • the CIIP prediction combines an inter prediction signal with an intra prediction signal.
  • the inter prediction signal in the CIIP mode, P_inter, is derived using the same inter prediction process applied to regular merge mode; and the intra prediction signal, P_intra, is derived following the regular intra prediction process with the planar mode. Then, the intra and inter prediction signals are combined using weighted averaging, where the weight value is calculated depending on the coding modes of the top and left neighbouring blocks (depicted in FIG. 12) as follows:
  • the bi-directional predictor is composed of an AMVP predictor in one direction and a merge predictor in the other direction.
  • the mode can be enabled for a coding block. When the selected merge predictor and the AMVP predictor satisfy the DMVR condition (there is at least one reference picture from the past and one reference picture from the future relative to the current picture, and the distances from the two reference pictures to the current picture are the same), bilateral matching MV refinement is applied with the merge MV candidate and the AMVP MVP as a starting point. Otherwise, if template matching functionality is enabled, template matching MV refinement is applied to the merge predictor or the AMVP predictor, whichever has the higher template matching cost.
  • AMVP part of the mode is signaled as a regular uni-directional AMVP, i.e. reference index and MVD are signaled, and it has a derived MVP index if template matching is used or MVP index is signaled when template matching is disabled.
  • AMVP direction LX, where X can be 0 or 1
  • the merge part in the other direction (1 –LX) is implicitly derived by minimizing the bilateral matching cost between the AMVP predictor and a merge predictor, i.e., for a pair of the AMVP and a merge motion vectors.
  • the bilateral matching cost is calculated using the merge candidate MV and the AMVP MV.
  • the merge candidate with the smallest cost is selected.
  • the bilateral matching refinement is applied to the coding block with the selected merge candidate MV and the AMVP MV as a starting point.
  • the third pass of multi-pass DMVR, which is the 8x8 sub-PU BDOF refinement, is enabled for AMVP-merge mode coded blocks.
  • the mode is indicated by a flag; if the mode is enabled, the AMVP direction LX is further indicated by a flag.
  • MVD is not signalled.
  • An additional pair of AMVP-merge MVPs is introduced.
  • the merge candidate list is sorted by BM cost in increasing order.
  • An index (0 or 1) is signaled to indicate which merge candidate in the sorted merge candidate list to use.
  • the pair of AMVP MVP and merge MVP without bilateral matching MV refinement is padded.
  • AMVR Adaptive motion vector resolution
  • MVDs motion vector differences
  • a CU-level adaptive motion vector resolution (AMVR) scheme is introduced. AMVR allows MVD of the CU to be coded in different precision.
  • the MVDs of the current CU can be adaptively selected as follows:
  • Normal AMVP mode: quarter-luma-sample, half-luma-sample, integer-luma-sample or four-luma-sample.
  • Affine AMVP mode: quarter-luma-sample, integer-luma-sample or 1/16 luma-sample.
  • the CU-level MVD resolution indication is conditionally signalled if the current CU has at least one non-zero MVD component. If all MVD components (that is, both horizontal and vertical MVDs for reference list L0 and reference list L1) are zero, quarter-luma-sample MVD resolution is inferred.
  • a first flag is signalled to indicate whether quarter-luma-sample MVD precision is used for the CU. If the first flag is 0, no further signaling is needed and quarter-luma-sample MVD precision is used for the current CU. Otherwise, a second flag is signalled to indicate whether half-luma-sample or another MVD precision (integer or four-luma-sample) is used for a normal AMVP CU. In the case of half-luma-sample, a 6-tap interpolation filter is used for the half-luma-sample position instead of the default 8-tap interpolation filter.
  • a third flag is signalled to indicate whether integer-luma-sample or four-luma-sample MVD precision is used for normal AMVP CU.
  • the second flag is used to indicate whether integer-luma-sample or 1/16 luma-sample MVD precision is used.
  • the motion vector predictors for the CU will be rounded to the same precision as that of the MVD before being added together with the MVD.
  • the motion vector predictors are rounded toward zero (that is, a negative motion vector predictor is rounded toward positive infinity and a positive motion vector predictor is rounded toward negative infinity) .
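  • A literal reading of the rounding rule above can be sketched as follows. The function name, the assumption that motion vector components are stored as integers in 1/16-luma-sample units, and the `shift` values in the comments are illustrative and not taken from the source.

```python
def round_mv_toward_zero(mv, shift):
    """Drop the `shift` least significant bits of one MV component,
    rounding toward zero: negative values move toward +infinity and
    positive values toward -infinity.

    Assuming 1/16-sample storage: shift=2 gives quarter-sample,
    shift=3 half-sample, shift=4 integer-sample and shift=6
    four-sample precision."""
    magnitude = (abs(mv) >> shift) << shift
    return magnitude if mv >= 0 else -magnitude


def add_mvd(mvp, mvd, shift):
    """Round the predictor to the MVD precision, then add the MVD
    (the MVD is assumed to be expressed in the same storage units)."""
    return round_mv_toward_zero(mvp, shift) + mvd
```

  • Working on the magnitude avoids relying on how a plain right shift treats negative numbers, which would otherwise round toward negative infinity.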
  • the encoder determines the motion vector resolution for the current CU using RD check.
  • the RD check of MVD precisions other than quarter-luma-sample is only invoked conditionally.
  • the RD cost of quarter-luma-sample MVD precision and integer-luma sample MV precision is computed first. Then, the RD cost of integer-luma-sample MVD precision is compared to that of quarter-luma-sample MVD precision to decide whether it is necessary to further check the RD cost of four-luma-sample MVD precision.
  • the RD check of four-luma-sample MVD precision is skipped. Then, the check of half-luma-sample MVD precision is skipped if the RD cost of integer-luma-sample MVD precision is significantly larger than the best RD cost of previously tested MVD precisions.
  • For affine AMVP mode, if affine inter mode is not selected after checking rate-distortion costs of affine merge/skip mode, merge/skip mode, quarter-luma-sample MVD precision normal AMVP mode and quarter-luma-sample MVD precision affine AMVP mode, then 1/16 luma-sample MV precision and 1-pel MV precision affine inter modes are not checked. Furthermore, affine parameters obtained in quarter-luma-sample MV precision affine inter mode are used as the starting search point in 1/16 luma-sample and quarter-luma-sample MV precision affine inter modes.
  • the bi-prediction signal is generated by averaging two prediction signals obtained from two different reference pictures and/or using two different motion vectors.
  • the bi-prediction mode is extended beyond simple averaging to allow weighted averaging of the two prediction signals.
  • $P_{\text{bi-pred}} = ((8 - w) \cdot P_0 + w \cdot P_1 + 4) \gg 3$ (3-2)
  • the weight w is determined in one of two ways: 1) for a non-merge CU, the weight index is signalled after the motion vector difference; 2) for a merge CU, the weight index is inferred from neighbouring blocks based on the merge candidate index. BCW is only applied to CUs with 256 or more luma samples (i.e., CU width times CU height is greater than or equal to 256). For low-delay pictures, all 5 weights are used. For non-low-delay pictures, only 3 weights (w ∈ {3, 4, 5}) are used.
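  • The weighted average of equation (3-2) and the size restriction can be sketched in a few lines. The function names are hypothetical, and the five-entry weight table is the set commonly quoted for VVC, included here as an assumption because the passage only lists the three-weight subset.

```python
# Assumed full BCW weight set, in units of 1/8; w = 4 is equal weighting.
BCW_WEIGHTS_LOW_DELAY = (-2, 3, 4, 5, 10)
BCW_WEIGHTS_NON_LOW_DELAY = (3, 4, 5)


def bcw_blend(p0, p1, w):
    """Apply ((8 - w) * P0 + w * P1 + 4) >> 3 to each sample pair."""
    return [((8 - w) * a + w * b + 4) >> 3 for a, b in zip(p0, p1)]


def bcw_allowed(cu_width, cu_height):
    """BCW is only applied to CUs with at least 256 luma samples."""
    return cu_width * cu_height >= 256
```

  • For example, `bcw_blend([100], [200], 5)` weights the L1 sample by 5/8 and the L0 sample by 3/8; the `+ 4` term rounds the result before the shift by 3.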
  • When combined with affine, affine ME will be performed for unequal weights if and only if the affine mode is selected as the current best mode.
  • the BCW weight index is coded using one context coded bin followed by bypass coded bins.
  • the first context coded bin indicates if equal weight is used; and if unequal weight is used, additional bins are signalled using bypass coding to indicate which unequal weight is used.
  • Weighted prediction (WP) is a coding tool supported by the H.264/AVC and HEVC standards to efficiently code video content with fading. Support for WP was also added into the VVC standard. WP allows weighting parameters (weight and offset) to be signalled for each reference picture in each of the reference picture lists L0 and L1. Then, during motion compensation, the weight(s) and offset(s) of the corresponding reference picture(s) are applied. WP and BCW are designed for different types of video content. In order to avoid interactions between WP and BCW, which would complicate VVC decoder design, if a CU uses WP, then the BCW weight index is not signalled, and w is inferred to be 4 (i.e., equal weight is applied).
  • the weight index is inferred from neighbouring blocks based on the merge candidate index. This can be applied to both normal merge mode and inherited affine merge mode.
  • the affine motion information is constructed based on the motion information of up to 3 blocks.
  • the BCW index for a CU using the constructed affine merge mode is simply set equal to the BCW index of the first control point MV.
  • CIIP and BCW cannot be jointly applied for a CU.
  • the BCW index of the current CU is set to 2, i.e., equal weight.
  • the first additional prediction/hypothesis is h_3.
  • the blending weight α can be one of two values listed in the following table.
  • the number of the additional prediction/hypothesis can be more than one.
  • ECM is a video codec software repository for developing the latest video coding algorithm
  • the MHP algorithm has also evolved and is now a superset of the algorithm in JVET-M0425.
  • the MHP encoding flow of ECM 4.0 is depicted in FIG. 13. First a single reference list is constructed from reference list L0 and L1. With this step, only one reference index is required to be transmitted in the bitstream for decoding. Then the best results of the motion estimation from inter and affine search are put into a vector where possible candidates of adding additional hypothesis are stored.
  • the inherited motion search results from different merge modes are also appended to the candidate vector.
  • the motion information in these merge modes is inherited from the CU where motion searching is performed.
  • the low complexity cost search can be invoked to seek most promising prediction units for coding.
  • the candidates with the lowest cost are then sent to the high complexity search.
  • the cost here means the SATD (sum of absolute Hadamard transformed differences) plus the syntax bits multiplied by lambda (lambda is the factor that converts bits to the distortion domain, widely used in rate-distortion optimization in video codecs). If MHP is the best coding mode for this CU after competing with other coding modes, MHP will be the final coding mode selected for the CU.
  • the CU can be coded with the syntax shown in Table 2 if MHP wins the competition after the high complexity search. There is a while loop within the mh_pred_data function. “additional_hypothesis_flag” is used to indicate if there is another hypothesis. If “additional_hypothesis_flag” is true, “is_merge_hyp” is signaled to denote whether the additional hypothesis is of merge type or AMVP type. For merge type, “merge_idx_gt_zero” and “merge_idx_minus_one” are used to signal the candidate index.
  • For AMVP type, “mvp_add_hyp_flag” is used to signal the candidate index to select the final MVP (motion vector predictor) from the two possible choices. Besides, “ref_idx_add_hyp” is signaled to identify the reference frame for AMVP mode within the single reference list for MHP. For both the merge mode and AMVP mode, “add_hyp_weight_idx” is signaled for determining the blending weight α as listed in Table 1. With all the syntax elements, a decoder can retrieve the information necessary to reconstruct the additional hypothesis set and produce the correct prediction for the CU.
  • LIC is an inter prediction technique to model local illumination variation between current block and its prediction block as a function of that between current block template and reference block template.
  • the parameters of the function can be denoted by a scale α and an offset β, which form a linear equation, that is, α*p[x] + β, to compensate illumination changes, where p[x] is a reference sample pointed to by the MV at a location x on the reference picture.
  • the MV shall be clipped with wrap around offset taken into consideration. Since α and β can be derived based on the current block template and the reference block template, no signaling overhead is required for them, except that an LIC flag is signaled for AMVP mode to indicate the use of LIC.
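  • Because the scale and offset come from template samples that both encoder and decoder already have, the derivation can be sketched as an ordinary least-squares line fit. This is an illustrative floating-point sketch; the function names, the fallback when the template is flat, and the 10-bit clipping range are assumptions and do not reflect any particular fixed-point implementation.

```python
def derive_lic_params(ref_template, cur_template):
    """Fit cur ~= alpha * ref + beta over the template samples."""
    n = len(ref_template)
    sx = sum(ref_template)
    sy = sum(cur_template)
    sxx = sum(x * x for x in ref_template)
    sxy = sum(x * y for x, y in zip(ref_template, cur_template))
    denom = n * sxx - sx * sx
    if denom == 0:
        # Flat reference template: assume unit scale, offset only.
        return 1.0, (sy - sx) / n
    alpha = (n * sxy - sx * sy) / denom
    beta = (sy - alpha * sx) / n
    return alpha, beta


def apply_lic(pred, alpha, beta, bit_depth=10):
    """Compensate each motion-compensated sample as alpha * p + beta,
    clipped to the valid sample range (bit depth is an assumption)."""
    hi = (1 << bit_depth) - 1
    return [min(hi, max(0, round(alpha * p + beta))) for p in pred]
```

  • The same two functions run identically at the encoder and decoder, which is why only a flag, and not the parameters themselves, needs to be transmitted.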
  • The local illumination compensation proposed in JVET-O0066 is used for uni-prediction inter CUs with the following modifications.
  • Intra neighbor samples can be used in LIC parameter derivation;
  • LIC is disabled for blocks with less than 32 luma samples;
  • LIC parameter derivation is performed based on the template block samples corresponding to the current CU, instead of partial template block samples corresponding to first top-left 16x16 unit;
  • Samples of the reference block template are generated by using MC with the block MV without rounding it to integer-pel precision.
  • α_0 and β_0, and α_1 and β_1 indicate the scales and the offsets in L0 and L1, respectively; ω indicates the weight (as indicated by the CU-level BCW index) for the weighted combination of L0 and L1 predictions.
  • the same derivation scheme of the LIC mode is reused and applied in an iterative manner to derive the L0 and L1 LIC parameters. Specifically, the method first derives the L0 parameters by minimizing the difference between the L0 template prediction T_0 and the template T, and the samples in T are updated by subtracting the corresponding samples in T_0. Then, the L1 parameters are calculated to minimize the difference between the L1 template prediction T_1 and the updated template. Finally, the L0 parameters are refined again in the same way.
  • one flag is signaled for AMVP bi-predicted CUs for the indication of the LIC mode while the flag is inherited for merge related inter CUs. Additionally, the LIC is disabled when decoder-side motion vector refinement (DMVR) (including multi-pass DMVR, adaptive DMVR and affine DMVR) and bi-directional optical flow (BDOF) is applied.
  • DMVR decoder-side motion vector refinement
  • BDOF bi-directional optical flow
  • the OBMC is enabled for the inter blocks that are coded with the LIC mode. And, to reduce the complexity, the OBMC is only applied to the top and left CU boundaries while being always disabled for the boundaries of the internal sub-blocks of one LIC CU. Additionally, when one neighboring block is coded with the LIC, its LIC parameters are applied to generate the corresponding prediction samples for the OBMC of one current block.
  • convolutional cross-component model (CCCM) is applied to predict chroma samples from reconstructed luma samples in a similar spirit as done by the current CCLM modes.
  • CCCM convolutional cross-component model
  • the reconstructed luma samples are down-sampled to match the lower resolution chroma grid when chroma sub-sampling is used.
  • left or top and left reference samples are used as templates for model derivation.
  • Multi-model CCCM mode can be selected for PUs which have at least 128 reference samples available.
  • the convolutional 7-tap filter consists of a 5-tap plus-sign-shaped spatial component, a nonlinear term and a bias term.
  • the input to the spatial 5-tap component of the filter consists of a center (C) luma sample which is collocated with the chroma sample to be predicted and its above/north (N), below/south (S), left/west (W) and right/east (E) neighbors as illustrated below.
  • FIG. 15 shows spatial parts of the convolutional filter.
  • the nonlinear term P is represented as power of two of the center luma sample C and scaled to the sample value range of the content:
  • the bias term B represents a scalar offset between the input and output (similarly to the offset term in CCLM) and is set to middle chroma value (512 for 10-bit content) .
  • Output of the filter is calculated as a convolution between the filter coefficients c i and the input values and clipped to the range of valid chroma samples:
  • $\text{predChromaVal} = c_0 C + c_1 N + c_2 S + c_3 E + c_4 W + c_5 P + c_6 B$
  • the filter coefficients c i are calculated by minimising MSE between predicted and reconstructed chroma samples in the reference area.
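  • To make the 7-tap filter concrete, the sketch below gathers the seven inputs for one chroma position and evaluates the weighted sum with clipping. The function names and floating-point coefficients are illustrative. The scaling used for the nonlinear term, `(C*C + midVal) >> bitDepth`, is an assumption, since the formula itself is not reproduced in this passage; the luma array is assumed to be already down-sampled to the chroma grid.

```python
def cccm_inputs(luma, x, y, bit_depth=10):
    """Return [C, N, S, E, W, P, B] for the chroma sample at (x, y)."""
    mid = 1 << (bit_depth - 1)          # 512 for 10-bit content
    c = luma[y][x]
    n = luma[y - 1][x]
    s = luma[y + 1][x]
    e = luma[y][x + 1]
    w = luma[y][x - 1]
    p = (c * c + mid) >> bit_depth      # assumed nonlinear-term scaling
    b = mid                             # bias input: middle chroma value
    return [c, n, s, e, w, p, b]


def cccm_predict(coeffs, inputs, bit_depth=10):
    """Weighted sum of the seven inputs, clipped to the chroma range."""
    value = sum(ci * vi for ci, vi in zip(coeffs, inputs))
    return min((1 << bit_depth) - 1, max(0, int(round(value))))
```

  • The coefficient order follows the predChromaVal expression above (C, N, S, E, W, P, B), so a coefficient vector derived by regression can be passed in directly.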
  • FIG. 16 illustrates the reference area which consists of 2 or 6 lines of chroma samples above and left of the PU. Whether to use 6 lines or 2 lines of neighbouring samples to derive the CCCM model parameters in the single model CCCM is determined by a template cost. Similarly, for the multi-model CCCM mode, the two candidates use 6 lines neighbouring luma samples or luma samples collocated to the current chroma block to derive mean values which separate samples into two groups. The cost is derived by applying the candidate CCP (either 2 or 6 lines) on a template, calculating the sum of absolute difference (SAD) between CCP predicted samples and reconstructed samples in the template.
  • SAD sum of absolute difference
  • Reference area extends one PU width to the right and one PU height below the PU boundaries. Area is adjusted to include only available samples. The extensions to the area shown in blue are needed to support the “side samples” of the plus shaped spatial filter and are padded when in unavailable areas.
  • the MSE minimization is performed by calculating autocorrelation matrix for the luma input and a cross-correlation vector between the luma input and chroma output.
  • Autocorrelation matrix is LDL decomposed and the final filter coefficients are calculated using back-substitution. The process follows roughly the calculation of the ALF filter coefficients in ECM, however LDL decomposition was chosen instead of Cholesky decomposition to avoid using square root operations.
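  • The solve step described above can be sketched as follows: build the autocorrelation matrix and cross-correlation vector from the reference-area samples, factor the matrix as L·D·Lᵀ, then recover the coefficients by forward substitution, diagonal scaling, and back-substitution. This is a floating-point illustration with hypothetical function names; it has no handling for zero pivots and does not model the fixed-point arithmetic used in a codec.

```python
def ldl_solve(a, b):
    """Solve a @ x = b for a symmetric positive-definite matrix a
    using an LDL^T factorization (no square roots required)."""
    n = len(b)
    l = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n):
        d[j] = a[j][j] - sum(l[j][k] ** 2 * d[k] for k in range(j))
        for i in range(j + 1, n):
            acc = sum(l[i][k] * l[j][k] * d[k] for k in range(j))
            l[i][j] = (a[i][j] - acc) / d[j]
    # Forward substitution: L z = b
    z = [0.0] * n
    for i in range(n):
        z[i] = b[i] - sum(l[i][k] * z[k] for k in range(i))
    # Diagonal scaling: D y = z
    y = [z[i] / d[i] for i in range(n)]
    # Back-substitution: L^T x = y
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = y[i] - sum(l[k][i] * x[k] for k in range(i + 1, n))
    return x


def fit_filter(inputs, targets):
    """Least-squares coefficients from rows of filter inputs and the
    corresponding reconstructed chroma targets."""
    n = len(inputs[0])
    auto = [[sum(r[i] * r[j] for r in inputs) for j in range(n)]
            for i in range(n)]
    cross = [sum(r[i] * t for r, t in zip(inputs, targets))
             for i in range(n)]
    return ldl_solve(auto, cross)
```

  • Compared with a Cholesky factorization, the diagonal factor D absorbs the terms that would otherwise need a square root, at the cost of one extra division per unknown.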
  • the autocorrelation matrix is calculated using the reconstructed values of luma and chroma samples. These samples are full range (e.g. between 0 and 1023 for 10-bit content) resulting in relatively large values in the autocorrelation matrix. This requires high bit depth operation during the model parameters calculation. It is proposed to remove fixed offsets from luma and chroma samples in each PU for each model. This is driving down the magnitudes of the values used in the model creation and allows reducing the precision needed for the fixed-point arithmetic. As a result, 16-bit decimal precision is proposed to be used instead of the 22-bit precision of the original CCCM implementation.
  • Reference sample values just outside of the top-left corner of the PU are used as the offsets (offsetLuma, offsetCb and offsetCr) for simplicity.
  • the sample values used in both model creation and final prediction (i.e., luma and chroma in the reference area, and luma in the current PU) are reduced by these fixed values, as follows:
  • $N' = N - \text{offsetLuma}$
  • offsetChroma is equal to offsetCr and offsetCb for Cr and Cb components, respectively:
  • the luma offset is removed during the luma reference sample interpolation. This can be done, for example, by substituting the rounding term used in the luma reference sample interpolation with an updated offset including both the rounding term and the offsetLuma.
  • the chroma offset can be removed by deducting the chroma offset directly from the reference chroma samples. As an alternative way, impact of the chroma offset can be removed from the cross-component vector giving identical result. In order to add the chroma offset back to the output of the convolutional prediction operation the chroma offset is added to the bias term of the convolutional model.
  • CCCM model parameter calculation requires division operations. Division operations are not always considered implementation friendly. The division operations are replaced with multiplication (with a scale factor) and shift operations, where the scale factor and the number of shifts are calculated based on the denominator, similar to the method used in the calculation of CCLM parameters.
  • Usage of the CCCM mode is signalled with a CABAC coded PU level flag.
  • CABAC context was included to support this.
  • CCCM is considered a sub-mode of CCLM. That is, the CCCM flag is only signalled if intra prediction mode is LM_CHROMA.
  • a gradient linear model (GLM) method can be used to predict the chroma samples from luma sample gradients.
  • Two modes are supported: a two-parameter GLM mode and a three-parameter GLM mode.
  • the two-parameter GLM utilizes luma sample gradients to derive the linear model. Specifically, when the two-parameter GLM is applied, the input to the CCLM process, i.e., the down-sampled luma samples L, are replaced by luma sample gradients G. The other parts of the CCLM (e.g., parameter derivation, prediction sample linear transform) are kept unchanged.
  • $C = \alpha \cdot G + \beta$
  • a chroma sample can be predicted based on both the luma sample gradients and down-sampled luma values with different parameters.
  • the model parameters of the three-parameter GLM are derived from 6 rows and columns adjacent samples by the LDL decomposition based MSE minimization method as used in the CCCM.
  • $C = \alpha_0 \cdot G + \alpha_1 \cdot L + \alpha_2 \cdot \beta$
  • one flag is signaled to indicate whether GLM is enabled for both Cb and Cr components; if the GLM is enabled, another flag is signaled to indicate which of the two GLM modes is selected and one syntax element is further signaled to select one of 4 gradient filters for the gradient calculation.
  • Four gradient filters are enabled for the GLM, as illustrated in FIG. 17.
  • CCCM mode with 3x2 filter using non-downsampled luma samples which consists of 6-tap spatial terms, four nonlinear terms and a bias term.
  • the 6-tap spatial terms correspond to 6 neighboring luma samples (i.e., L_0, L_1, ..., L_5) around the chroma sample (i.e., C) to be predicted; the four nonlinear terms are derived from the samples L_0, L_1, L_2, and L_3 as shown in FIG. 18.
  • α_i is the coefficient
  • is the offset.
  • up to 6 lines/columns of chroma samples above and left to the current CU are applied to derive the filter coefficients.
  • the filter coefficients are derived based on the same LDL decomposition method used in CCCM.
  • the proposed method is signaled as an additional CCCM model besides the existing one, when the CCCM is selected, one single flag is signaled and used for both two chroma components to indicate whether the default CCCM model or the proposed CCCM model is applied. Additionally, SPS signaling is introduced to indicate whether the CCCM using non-downsampled luma samples is enabled.
  • This method maps luma values into chroma values using a filter with inputs consisting of one spatial luma sample, two gradient values, two location information, a nonlinear term, and a bias term.
  • the GL-CCCM method uses gradient and location information instead of the 4 spatial neighbor samples used in the CCCM filter.
  • G_y and G_x are the vertical and horizontal gradients, respectively, and are calculated as shown in FIG. 19:
  • $G_y = (2N + NW + NE) - (2S + SW + SE)$
  • $G_x = (2W + NW + SW) - (2E + NE + SE)$
  • the Y and X are the spatial coordinates of the center luma sample.
  • the rest of the parameters are the same as CCCM tool.
  • the reference area for the parameter calculation is the same as CCCM method.
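  • The two gradient inputs of GL-CCCM can be computed directly from the 3x3 luma neighbourhood around the center sample. The sketch below follows the vertical and horizontal gradient expressions given above; the function name and the list-of-rows layout (`luma[y][x]`, with y increasing downward) are assumptions.

```python
def glcccm_gradients(luma, x, y):
    """Return (G_y, G_x) for the luma sample at column x, row y."""
    nw, n, ne = luma[y - 1][x - 1], luma[y - 1][x], luma[y - 1][x + 1]
    w, e = luma[y][x - 1], luma[y][x + 1]
    sw, s, se = luma[y + 1][x - 1], luma[y + 1][x], luma[y + 1][x + 1]
    g_y = (2 * n + nw + ne) - (2 * s + sw + se)   # vertical gradient
    g_x = (2 * w + nw + sw) - (2 * e + ne + se)   # horizontal gradient
    return g_y, g_x
```

  • A patch that only varies from left to right yields a zero vertical gradient, and a patch that only varies from top to bottom yields a zero horizontal gradient, which is a quick sanity check on the sign conventions.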
  • GL-CCCM is considered a sub-mode of CCCM. That is, the GL-CCCM flag is only signalled if original CCCM flag is true.
  • GL-CCCM tool has 6 modes for calculating the parameters:
  • the encoder performs SATD search for the 6 GL-CCCM modes along with the existing CCCM modes to find the best candidates for full RD tests.
  • H(·), G1(·), G2(·), G3(·) are various downsampling filters as indicated in FIG. 20, C denotes the current chroma sample position, and N, S, W, E, NE, SW are the positions around C,
  • c i are filter coefficients
  • P and B are nonlinear term and bias term
  • X and Y are the horizontal and vertical locations of the center luma sample with respect to the top-left coordinates of the block.
  • Prediction samples of MM-CCLM/MM-CCCM can be filtered with neighbouring samples.
  • a 3×3 low-pass filter is applied to filter prediction samples generated by MM-CCLM/MM-CCCM.
  • the filtering window may involve neighbouring reconstructed samples.
  • the filtering window only involves prediction samples, which may be padded.
  • a flag is signaled to indicate whether filtering is applied or not for a block coded with MM-CCLM/MM-CCCM.
  • CCP Cross-Component Prediction
  • a flag is signalled to indicate whether CCP mode (including the CCLM, CCCM, GLM and their variants) or non-CCP mode (conventional chroma intra prediction mode, fusion of chroma intra prediction mode) is used. If the CCP mode is selected, one more flag is signalled to indicate how to derive the CCP type and parameters, i.e., either from a CCP merge list or signalled/derived on-the-fly.
  • a CCP merge candidate list is constructed from the spatial adjacent, spatial non-adjacent, or history-based candidates. After including these candidates, default models are further included to fill the remaining empty positions in the merge list. In order to remove redundant CCP models in the list, pruning operation is applied. After constructing the list, the CCP models in the list are reordered depending on the SAD costs, which are obtained using the neighbouring template of the current block. More details are described below.
  • the positions and inclusion order of the spatial adjacent and non-adjacent candidates are the same as those defined in ECM for regular inter merge prediction candidates.
  • a history-based table is maintained to include the recently used CCP models, and the table is reset at the beginning of each CTU row. If the current list is not full after including spatial adjacent and non-adjacent candidates, the CCP models in the history-based table are added into the list.
  • CCLM candidates with default scaling parameters are considered only when the list is not full after including the spatial adjacent, spatial non-adjacent, or history-based candidates. If the current list has no candidates with the single model CCLM mode, the default scaling parameters are {0, 1/8, -1/8, 2/8, -2/8, 3/8, -3/8, 4/8, -4/8, 5/8, -5/8, 6/8}. Otherwise, the default scaling parameters are {0, the scaling parameter of the first CCLM candidate + {1/8, -1/8, 2/8, -2/8, 3/8, -3/8, 4/8, -4/8, 5/8, -5/8, 6/8}}.
  • the offset parameter is derived according to the default scaling parameter, average neighbouring reconstructed luma sample value, and average neighbouring reconstructed Cb/Cr sample value.
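  • The two default scaling sets can be generated programmatically, as sketched below with exact fractions. The function names are hypothetical. The offset formula, `avg_chroma - scale * avg_luma`, is one plausible reading of "derived according to the default scaling parameter and the average neighbouring sample values" and is an assumption rather than something stated in the source.

```python
from fractions import Fraction


def default_scales(first_cclm_scale=None):
    """Default CCLM scaling parameters used to fill the merge list.

    With no single-model CCLM candidate in the list, the steps are
    used as-is; otherwise each step is added to the scaling parameter
    of the first CCLM candidate. Zero is always included first."""
    steps = []
    for k in range(1, 6):
        steps += [Fraction(k, 8), Fraction(-k, 8)]
    steps.append(Fraction(6, 8))        # 6/8 has no negative partner
    if first_cclm_scale is None:
        return [Fraction(0)] + steps
    return [Fraction(0)] + [first_cclm_scale + s for s in steps]


def default_offset(scale, avg_luma, avg_chroma):
    """Assumed offset: make the model pass through the neighbour means."""
    return avg_chroma - scale * avg_luma
```

  • Both branches produce twelve entries, so the number of default models available to pad the list does not depend on which branch is taken.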
  • a flag is signaled to indicate whether the CCP merge mode is applied or not. If CCP merge mode is applied, an index is signaled to indicate which candidate model is used by the current block. In addition, CCP merge mode is not allowed for the current chroma coding block when the current CU is coded by intra subpartitions (ISP) with single tree, or the current chroma coding block size is less than or equal to 16.
  • ISP intra subpartitions
  • the number of directional intra modes in VVC is extended from 33, as used in HEVC, to 65.
  • the new directional modes not in HEVC are depicted as red dotted arrows in FIG. 22, and the planar and DC modes remain the same.
  • These denser directional intra prediction modes apply for all block sizes and for both luma and chroma intra predictions.
  • in HEVC, every intra-coded block has a square shape and the length of each of its sides is a power of 2. Thus, no division operations are required to generate an intra-predictor using DC mode.
  • in VVC, blocks can have a rectangular shape, which in the general case necessitates a division operation per block. To avoid division operations for DC prediction, only the longer side is used to compute the average for non-square blocks.
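  • The division-free DC rule can be sketched as follows, assuming block dimensions that are powers of two so that the sample count is always a power of two and the average reduces to a shift. The function name and the rounding offset are illustrative assumptions.

```python
def dc_value(top, left):
    """DC predictor from the top and left reference samples.

    Square blocks average both sides; non-square blocks average only
    the longer side, so the divisor is always a power of two."""
    w, h = len(top), len(left)
    if w == h:
        total, count = sum(top) + sum(left), w + h
    elif w > h:
        total, count = sum(top), w
    else:
        total, count = sum(left), h
    shift = count.bit_length() - 1      # log2(count) for powers of two
    return (total + (count >> 1)) >> shift
```

  • For an 8x4 block, averaging all 12 reference samples would require a division by 12, whereas using only the 8 top samples needs a shift by 3.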
  • MPM most probable mode
  • a unified 6-MPM list is used for intra blocks irrespective of whether MRL and ISP coding tools are applied or not.
  • the MPM list is constructed based on intra modes of the left and above neighboring block. Suppose the mode of the left is denoted as Left and the mode of the above block is denoted as Above, the unified MPM list is constructed as follows:
  • Max –Min is equal to 1 :
  • Max –Min is greater than or equal to 62 :
  • Max –Min is equal to 2 :
  • the first bin of the mpm index codeword is CABAC context coded. In total three contexts are used, corresponding to whether the current intra block is MRL enabled, ISP enabled, or a normal intra block.
  • TBC Truncated Binary Code
  • Conventional angular intra prediction directions are defined from 45 degrees to -135 degrees in clockwise direction.
  • In VVC, several conventional angular intra prediction modes are adaptively replaced with wide-angle intra prediction modes for non-square blocks.
  • the replaced modes are signalled using the original mode indexes, which are remapped to the indexes of wide angular modes after parsing.
  • the total number of intra prediction modes is unchanged, i.e., 67, and the intra mode coding method is unchanged.
  • top reference with length 2W+1, and the left reference with length 2H+1 are defined as shown in FIG. 23.
  • the number of replaced modes in wide-angular direction mode depends on the aspect ratio of a block.
  • the replaced intra prediction modes are illustrated in Table 1.5-1
  • two vertically-adjacent predicted samples may use two non-adjacent reference samples in the case of wide-angle intra prediction.
  • low-pass reference samples filter and side smoothing are applied to the wide-angle prediction to reduce the negative effect of the increased gap Δp_α.
  • a wide-angle mode represents a non-fractional offset.
  • Eight of the wide-angle modes satisfy this condition: [-14, -12, -10, -6, 72, 76, 78, 80].
  • the samples in the reference buffer are directly copied without applying any interpolation.
  • With this modification, the number of samples that need smoothing is reduced. In addition, it aligns the design of non-fractional modes in the conventional prediction modes and the wide-angle modes.
  • Chroma derived mode (DM) derivation table for 4:2:2 chroma format was initially ported from HEVC, extending the number of entries from 35 to 67 to align with the extension of intra prediction modes. Since the HEVC specification does not support prediction angles below -135 degrees or above 45 degrees, luma intra prediction modes ranging from 2 to 5 are mapped to 2. Therefore, the chroma DM derivation table for 4:2:2 chroma format is updated by replacing some values of the entries of the mapping table to convert the prediction angle more precisely for chroma blocks.
  • Derived intra modes are included into the primary list of intra most probable modes (MPM) , so the DIMD process is performed before the MPM list is constructed.
  • the primary derived intra mode of a DIMD block is stored with a block and is used for MPM list construction of the neighboring blocks.
  • the DIMD chroma mode uses the DIMD derivation method to derive the chroma intra prediction mode of the current block based on the neighboring reconstructed Y, Cb and Cr samples in the second neighboring row and column as shown in FIG. 25. Specifically, a horizontal gradient and a vertical gradient are calculated for each collocated reconstructed luma sample of the current chroma block, as well as the reconstructed Cb and Cr samples, to build a HoG. Then the intra prediction mode with the largest histogram amplitude values is used for performing chroma intra prediction of the current chroma block.
  • when the intra prediction mode derived from the DIMD chroma mode is the same as the intra prediction mode derived from the DM mode, the intra prediction mode with the second largest histogram amplitude value is used as the DIMD chroma mode.
  • a CU level flag is signaled to indicate whether the proposed DIMD chroma mode is applied.
  • pred0 is the predictor obtained by applying the non-LM mode
  • pred1 is the predictor obtained by applying the MMLM_LT mode
  • pred is the final predictor of the current chroma block.
  • Intra template matching prediction is a special intra prediction mode that copies the best prediction block from the reconstructed part of the current frame, whose L-shaped template matches the current template. For a predefined search range, the encoder searches for the most similar template to the current template in a reconstructed part of the current frame and uses the corresponding block as a prediction block. The encoder then signals the usage of this mode, and the same prediction operation is performed at the decoder side.
  • the prediction signal is generated by matching the L-shaped causal neighbor of the current block with another block in a predefined search area in FIG. 26 consisting of:
  • Sum of absolute differences (SAD) is used as a cost function.
  • the decoder searches for the template that has least SAD with respect to the current one and uses its corresponding block as a prediction block.
  • the dimensions of all regions are set proportional to the block dimension (BlkW, BlkH) to have a fixed number of SAD comparisons per pixel. That is:
  • ‘a’ is a constant that controls the gain/complexity trade-off. In practice, ‘a’ is equal to 5.
  • the search range of all search regions is subsampled by a factor of 2. This leads to a reduction of template matching search by 4.
  • a refinement process is performed. The refinement is done via a second template matching search around the best match with a reduced range.
  • the reduced range is defined as min(BlkW, BlkH) /2.
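The two-stage search described above (SAD cost over an L-shaped template, a coarse pass subsampled by 2, then a refinement of range min(BlkW, BlkH)/2 around the best match) can be sketched as follows. This is a simplified illustration under several assumptions: the picture is a plain list of rows, the template thickness `t` defaults to 1, only candidates lying entirely above the current block are searched, and the search window is taken as `a` times the block dimensions with `a` = 5. The function names are hypothetical.

```python
def template_sad(pic, ax, ay, bx, by, bw, bh, t):
    """SAD between the L-shaped templates of the blocks at (ax, ay) and (bx, by)."""
    sad = 0
    for dy in range(-t, bh):
        # Rows above the block cover the corner and the top strip;
        # rows beside the block cover only the left strip.
        x_end = bw if dy < 0 else 0
        for dx in range(-t, x_end):
            sad += abs(pic[ay + dy][ax + dx] - pic[by + dy][bx + dx])
    return sad


def intra_tmp_search(pic, x, y, bw, bh, t=1, a=5):
    """Return (sad, cx, cy) of the best match, or None if no candidate is valid."""
    width = len(pic[0])

    def valid(cx, cy):
        # Assumption: only blocks lying entirely above the current block are searched.
        return cx - t >= 0 and cy - t >= 0 and cx + bw <= width and cy + bh <= y

    def best_of(candidates, best):
        for cx, cy in candidates:
            if valid(cx, cy):
                cost = template_sad(pic, x, y, cx, cy, bw, bh, t)
                if best is None or cost < best[0]:
                    best = (cost, cx, cy)
        return best

    # Coarse pass: candidate positions subsampled by 2 in both directions.
    coarse = [(cx, cy)
              for cy in range(y - a * bh, y - bh + 1, 2)
              for cx in range(x - a * bw, x + a * bw + 1, 2)]
    best = best_of(coarse, None)
    if best is None:
        return None
    # Refinement pass: full-resolution search around the coarse winner.
    r = min(bw, bh) // 2
    _, bx, by = best
    fine = [(bx + dx, by + dy) for dy in range(-r, r + 1) for dx in range(-r, r + 1)]
    return best_of(fine, best)
```

Subsampling both directions by 2 visits roughly a quarter of the positions, which matches the stated reduction of the template matching search by 4.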
  • the Intra template matching tool is enabled for CUs with size less than or equal to 64 in width and height. This maximum CU size for Intra template matching is configurable.
  • the Intra template matching prediction mode is signaled at CU level through a dedicated flag when DIMD is not used for current CU.
  • block vector (BV) derived from the intra template matching prediction (IntraTMP) is used for intra block copy (IBC) .
  • IntraTMP BV of the neighbouring blocks along with IBC BV are used as spatial BV candidates in IBC candidate list construction.
  • the IntraTMP block vector is stored in the IBC block vector buffer, and the current IBC block can use both the IBC BV and the IntraTMP BV of neighbouring blocks as BV candidates for the IBC BV candidate list, as shown in FIG. 27.
  • IntraTMP block vectors are added to IBC block vector candidate list as spatial candidates.
  • the proposed IntraTMP fusion algorithm includes the following aspects:
  • a candidate list is initially generated with 30 matched blocks having the smallest template SAD. For each matched block, a full pixel refinement search is performed within a small region. The sampling factor of subsampled search process is 3 and the refinement region is 3x3 around each of the 30 matched blocks. Then the best 3 candidate matched blocks by template SAD across all refinement regions are selected.
  • a threshold is used to judge whether it should be used for fusion.
  • Threshold SAD 1 ⁇ 1
  • SAD 1 is the smallest template SAD of the three candidate matched blocks.
  • fusion weights are calculated by their SAD.
  • the weights are calculated as follows:
  • the division operations are replaced by an integer look-up table (LUT) .
  • LUT integer look-up table
  • weights are used to further reduce the complexity.
  • the weights are set as or
  • p i the i th matched block
  • n the number of blocks selected for fusion.
  • p TMP is the single matched block and p intra is the intra predictor derived by planar mode.
  • a CU level flag is added to signal whether an IntraTMP CU is predicted by the proposed fusion method or original method.
  • the proposed method with enlarged search range is also tested.
  • the modified search range can be described as follows:
  • minSearchRange is set as 128 in the following test.
  • Fused template cost is computed as follows where fusion operation is same as used for P1 and P2
  • Tf fusion of T1 and T2
  • costFusion = costLeft + costTop
  • TIMD modes: for each intra prediction mode in MPMs, the SATD between the prediction and reconstruction samples of the template is calculated. The first two intra prediction modes with the minimum SATD are selected as the TIMD modes. These two TIMD modes are fused with the weights after applying the PDPC process, and such weighted intra prediction is used to code the current CU. Position dependent intra prediction combination (PDPC) is included in the derivation of the TIMD modes.
  • PDPC Position dependent intra prediction combination
  • Weights of the modes are computed from their SATD costs as follows:
  • weight1 = costMode2 / (costMode1 + costMode2)
  • weight2 = 1 - weight1
  • the division operations are conducted using the same lookup table (LUT) based integerization scheme used by the CCLM.
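A minimal sketch of the TIMD weight derivation above: each mode is weighted by the other mode's SATD cost, so the cheaper mode contributes more. For readability the sketch uses floating-point division rather than the LUT-based integer scheme mentioned in the text; the function names and the equal-weight fallback for zero total cost are assumptions of this example.

```python
def timd_weights(cost_mode1, cost_mode2):
    """Return (weight1, weight2) for the two TIMD modes from their SATD costs."""
    total = cost_mode1 + cost_mode2
    if total == 0:
        # Assumption: both templates match perfectly, so blend equally.
        return 0.5, 0.5
    weight1 = cost_mode2 / total
    return weight1, 1.0 - weight1


def timd_fuse(pred1, pred2, cost_mode1, cost_mode2):
    """Blend two predictor sample lists with the cost-derived weights."""
    w1, w2 = timd_weights(cost_mode1, cost_mode2)
    return [int(w1 * a + w2 * b + 0.5) for a, b in zip(pred1, pred2)]


print(timd_weights(100, 300))
print(timd_fuse([100, 200], [60, 40], 100, 300))
```

In the usage lines, mode 1 has the lower cost (100 versus 300) and therefore receives the larger weight.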
  • LUT lookup table
  • This intra prediction method derives predicted samples as a weighted combination of multiple predictors generated from different reference lines. In this process multiple intra predictors are generated and then fused by weighted averaging. The process of deriving the predictors to be used in the fusion process is described as follows:
  • the number of predictors selected for a weighted average is increased from 3 to 6.
  • Intra prediction fusion method is applied to luma blocks when the angular intra mode has a non-integer slope (requiring reference sample interpolation) and the block size is greater than 16. It is used with MRL and is not applied to ISP coded blocks.
  • PDPC is applied for the intra prediction mode that uses the reference line closest to the current block
  • the prediction samples are generated by weighting an inter prediction signal predicted using CIIP-TM merge candidate and an intra prediction signal predicted using TIMD derived intra prediction mode.
  • the method is only applied to coding blocks with an area less than or equal to 1024.
  • the TIMD derivation method is used to derive the intra prediction mode in CIIP. Specifically, the intra prediction mode with the smallest SATD values in the TIMD mode list is selected and mapped to one of the 67 regular intra prediction modes.
  • CIIP-TM a CIIP-TM merge candidate list is built for the CIIP-TM mode.
  • the merge candidates are refined by template matching.
  • the CIIP-TM merge candidates are also reordered by the ARMC method as regular merge candidates.
  • the maximum number of CIIP-TM merge candidates is equal to two.
  • the blending weights of BCW, MHP, CIIP, AMVP-Merge or other modes that have multiple predictors are all fixed or signaled by bits, which either lacks flexibility or burdens coding efficiency.
  • some of the inter modes utilize the template matching search, boundary matching search or other matching search to iteratively find the refined MVs or predictors with minimum matching cost in a pre-defined search region; this depends only on the pixel-based matching cost and neglects the spatial information from neighboring pixels.
  • T L0 represents the sample value of a sample in L0 template region
  • T L1 represents the sample value of a sample in L1 template region
  • B is the bias term which could be a predefined factor or a factor proportional to the bit depth.
  • the model of regression-based BCW could further include all or part of spatial terms, gradient terms, position terms, non-linear terms, the term of sample difference between L0 and L1, and/or any other terms similar to the CCCM model described in Section 2.
  • the following formula is an example of the model of regression-based BCW.
  • PosX and PosY represent the horizontal and vertical positions of the sample
  • Gx and Gy represent the horizontal and vertical gradients of the sample
  • N, S, E, and W represent the spatial terms same as CCCM.
  • c 0 , c 1 , ..., and c 13 are the regression weights derived by minimizing the MSE formula as follows.
  • some regularization terms are added to the MSE formula and used to constrain the value of some regression weights.
  • the regularization term aims at stabilizing the regression process and preventing large derived regression weights which are usually unreasonable.
  • two regularization terms are added as follows to constrain the value of regression weights close to the inherited BCW weights in merge modes or the signaled BCW weights in AMVP modes.
  • MSE
  • w 0 and w 1 indicate the inherited or signaled BCW weights (e.g., 3/8 and 5/8) of L0 and L1 respectively
  • ⁇ 0 and ⁇ 1 are the scaling factors of these two regularization terms respectively and can be fixed values or proportional to the magnitudes of the average of T L0 , T L1 , T L0 *T L1 , or any other values.
  • the derived regression weights of L0 and L1 will be closer to w 0 and w 1 .
  • w 0 and w 1 can be fixed equal BCW weights (e.g., 1/2 and 1/2) , independent of the inherited or signaled BCW weights.
  • the regularization terms mentioned above could be combined into the auto-correlation matrix and the cross-correlation vector as follows.
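To make the idea of folding the regularization into the auto-correlation matrix and cross-correlation vector concrete, here is a small sketch. It assumes a three-term model (T_L0, T_L1 and a bias B) and a regularized cost of the form sum((T_cur - c0*T_L0 - c1*T_L1 - c2*B)^2) + s0*(c0 - w0)^2 + s1*(c1 - w1)^2, where s0 and s1 stand for the two scaling factors. The exact formula is not recoverable from the text, so this form, the function names, and the plain Gauss-Jordan solver are all assumptions of the example. Under that assumption, the penalty adds s0 and s1 to the matrix diagonal and s0*w0 and s1*w1 to the vector.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def regression_bcw(t_l0, t_l1, t_cur, w0, w1, s0, s1, bias=512):
    """Return [c0, c1, c2] for T_cur ~ c0*T_L0 + c1*T_L1 + c2*bias."""
    rows = [(p0, p1, bias) for p0, p1 in zip(t_l0, t_l1)]
    n = 3
    # Auto-correlation matrix and cross-correlation vector of the template samples.
    acorr = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    xcorr = [sum(r[i] * y for r, y in zip(rows, t_cur)) for i in range(n)]
    # Fold in the penalties that pull c0 toward w0 and c1 toward w1.
    acorr[0][0] += s0
    xcorr[0] += s0 * w0
    acorr[1][1] += s1
    xcorr[1] += s1 * w1
    return solve(acorr, xcorr)
```

With large scaling factors the derived weights stay close to w0 and w1; with small ones the template samples dominate.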
  • MSE
  • C is a predefined factor which is set to the desired sum of the regression weights to be constrained. For example, if C is set to 1, the sum of the derived regression weights will be close to 1.
  • the regularization terms mentioned above could be combined into the auto-correlation matrix and the cross-correlation vector as follows.
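The sum constraint can be sketched in the same way. The example below assumes a two-term model and a penalty of the form s*(c0 + c1 - C)^2, which is one plausible reading of the description and not a formula taken from the text. Setting the gradient to zero shows that the penalty adds s to every entry of the 2x2 auto-correlation matrix and s*C to both entries of the cross-correlation vector. The function and parameter names are hypothetical.

```python
def regression_sum_constrained(t_l0, t_l1, t_cur, s, target_sum=1.0):
    """Return (c0, c1) minimizing the squared error plus s*(c0 + c1 - target_sum)^2."""
    # Auto-correlation entries with the penalty folded in.
    a00 = sum(p * p for p in t_l0) + s
    a11 = sum(q * q for q in t_l1) + s
    a01 = sum(p * q for p, q in zip(t_l0, t_l1)) + s
    # Cross-correlation entries with the penalty folded in.
    b0 = sum(p * y for p, y in zip(t_l0, t_cur)) + s * target_sum
    b1 = sum(q * y for q, y in zip(t_l1, t_cur)) + s * target_sum
    # Solve the 2x2 system with Cramer's rule.
    det = a00 * a11 - a01 * a01
    c0 = (b0 * a11 - a01 * b1) / det
    c1 = (a00 * b1 - a01 * b0) / det
    return c0, c1
```

With s = 0 the result is the unconstrained least-squares fit; as s grows, c0 + c1 approaches the target sum while the individual weights remain free.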
  • the regularization term can be defined as ⁇ 0
  • MSE
  • w 0 indicates the inherited or signaled BCW weights of L0
  • w 1 indicates the inherited or signaled BCW weights of L1.
  • the regularization term only constrains the ratio of c 0 and c 1 but not the exact values of c 0 and c 1 which is more flexible.
  • the regularization terms mentioned above could be combined into the auto-correlation matrix and the cross-correlation vector as follows.
  • the BCW signaling of AMVP mode can be skipped when the regression-based BCW mode is selected.
  • a flag is signaled to decoder to indicate whether the regression-based BCW is selected or not. If the regression-based BCW is selected, the flag is true and the original BCW signaling is skipped. Otherwise, if the flag is false, the original BCW signaling is kept and used to determine the BCW weight of the AMVP-mode coded CU at decoder.
  • the regression-based BCW mode is treated as one sub-mode of BCW. That is, one candidate of BCW weights set is used to represent the regression-based BCW mode without additional flag.
  • one additional flag is used to indicate the regression-based BCW mode is used to further refine the selected BCW weights, as introduced in the above method. That is, if this flag is true, the selected BCW weights are used as regularization terms in the regression- based BCW mode and the regression weights are applied to generate the final predictor. Otherwise (the flag is false) , the selected BCW weights are directly applied.
  • the entire original BCW process can be replaced by regression-based BCW. That is, the RD costs of original BCW candidates don’t need to be calculated at encoder but only the RD cost of regression-based BCW is needed. If the RD cost of regression-based BCW is lower than other modes, a flag is set to be true and signaled to decoder. Otherwise, the flag is set to be false and signaled to decoder to indicate the entire BCW blending process is skipped.
  • a LIC flag is signaled to indicate the on/off condition of LIC and a BCW index is signaled to indicate the BCW weights. Since both tools try to derive linear models to finetune the bi-predictive sample value for a bi-predictive CU, these two tools could be mutually exclusive or combined under conditions.
  • the bi-predictive LIC with original BCW blending and regression-based BCW are mutually exclusive, and the selection between these two tools is determined by a flag. That is, if the flag is true, regression-based BCW is performed. Otherwise, the bi-predictive LIC is performed with original BCW blending. Disabling both LIC and BCW is not supported in this example.
  • regression-based BCW is used to replace the bi-predictive LIC and original BCW for a bi-predictive CU. That is, if the flag is true, regression-based BCW is performed. Otherwise, both LIC and BCW are NOT performed.
  • the bi-predictive LIC with original BCW blending and regression-based BCW with signaled BCW index are mutually exclusive, and the selection between these two tools is determined by a flag. That is, if the flag is true, regression-based BCW is performed using the signaled BCW weights (i.e., the regularization terms consider the value of the signaled BCW weights). Otherwise, the bi-predictive LIC is performed with original BCW blending.
  • regression-based BCW with signaled BCW index is used to replace the bi-predictive LIC and original BCW for a bi-predictive CU. That is, if the flag is true, regression-based BCW is performed using the signaled BCW weights. Otherwise, the bi-predictive LIC is NOT performed.
  • Two flags are signaled to indicate the usage of bi-predictive LIC and regression-based BCW for a bi-predictive CU. That is, if the first flag is false, both bi-predictive LIC and BCW are NOT performed. If the first flag is true and the second flag is false, the regression-based BCW is performed without original BCW signaling. If the first flag is true and the second flag is true, the regression-based BCW is performed with the original BCW signaling present.
  • Two flags are signaled to indicate the usage of bi-predictive LIC, regression-based BCW and original BCW for a bi-predictive CU. That is, if the first flag is false, both bi-predictive LIC and BCW are NOT performed. If the first flag is true and the second flag is false, the original BCW is performed. If the first flag is true and the second flag is true, the regression-based BCW is performed with the original BCW signaling present.
  • Two flags are signaled to indicate the usage of bi-predictive LIC and BCW for a bi-predictive CU. That is, if the first flag is false, both bi-predictive LIC and BCW are NOT performed. If the first flag is true and the second flag is false, the bi-predictive LIC is performed but BCW is NOT performed. If the first flag is true and the second flag is true, the regression-based BCW is performed with the original BCW signaling present.
  • Two flags are signaled to indicate the usage of bi-predictive LIC, regression-based BCW and original BCW for a bi-predictive CU. That is, if the first flag is false, both bi-predictive LIC and BCW are NOT performed. If the first flag is true and the second flag is false, the bi-predictive LIC is performed and the original BCW is performed. If the first flag is true and the second flag is true, the regression-based BCW is performed with the original BCW signaling present.
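The four two-flag examples above differ only in what happens when the first flag is true and the second is false. The sketch below tabulates them; the variant labels "A" to "D" (in the order the examples appear) and the returned strings are hypothetical names chosen for this illustration, not syntax elements from the text.

```python
# Outcome when the first flag is true and the second flag is false,
# for each of the four two-flag examples (labels are hypothetical).
SECOND_FLAG_FALSE = {
    "A": "regression_bcw_without_signaled_weights",
    "B": "original_bcw",
    "C": "lic_only",
    "D": "lic_with_original_bcw",
}


def select_tools(variant, first_flag, second_flag):
    """Map the two signaled flags to the tool combination used for a bi-predictive CU."""
    if not first_flag:
        # Common to all four examples: neither bi-predictive LIC nor BCW is performed.
        return "no_lic_no_bcw"
    if second_flag:
        # Common to all four examples: regression-based BCW that also uses
        # the signaled BCW weights.
        return "regression_bcw_with_signaled_weights"
    return SECOND_FLAG_FALSE[variant]


for variant in "ABCD":
    print(variant, select_tools(variant, True, False))
```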
  • FIG. 30 shows a regression model for the inter modes with multiple predictors.
  • a set of regression parameters P, including the weights for cross-prediction sample blending, and/or center sample, and/or spatial samples, and/or non-linear terms, and/or bias terms of the inter modes with multiple predictors (e.g., BCW, MHP, CIIP, AMVP-Merge, etc.), is derived by regression using the template samples in a specific region.
  • the set of regression parameter P can be derived.
  • the parameter set P is then applied to the samples in the reference predictors to derive the samples in current CU.
  • the number of spatial samples K can be any number, the spatial samples can be obtained from arbitrary positions inside the template region (e.g., 1, 2, 4, etc. template lines) and the non-linear terms can be multiple and calculated as the higher order of C, N, S, E or W.
  • the final prediction sample is the weighted sum of all the refined samples according to the weight ⁇ i of each prediction.
  • the number of the prediction source can be any number.
  • FIG. 31 shows another regression model for the inter modes with multiple predictors.
  • the number of spatial samples K 0 and K 1 can be any number, the spatial gradients and location terms can be obtained from arbitrary positions inside the template region (e.g., 1, 2, 4, etc. template lines) and the non-linear terms can be multiple and calculated as the higher order of C, N, S, E and/or W.
  • the final prediction sample is the weighted sum of all the refined samples according to the weight ⁇ i of each source predictor.
  • the number of the source predictor can be any number.
  • a parameter set P b including the L0 and L1 blending weights, and/or the weight of center sample, and/or the weights of spatial samples, and/or the weights of non-linear terms, and/or the weight of bias term of BCW mode is derived by a regression model minimizing the SAD, SSD, or other sum of difference between the current and reference refined samples in template region.
  • the parameter set P b is then applied to the samples in the L0 and L1 reference predictors to derive the samples in current CU.
  • a parameter set P m including the MHP blending weights, and/or the weight of center sample, and/or the weights of spatial samples, and/or the weights of non-linear terms, and/or the weight of bias term of MHP mode is derived by a regression model minimizing the SAD, SSD, or other sum of difference between the current and reference refined samples in template region.
  • the parameter set P m is then applied to the samples in the reference predictors to derive the samples in current CU.
  • a parameter set P c including the CIIP blending weights, and/or the weight of center sample, and/or the weights of spatial samples, and/or the weights of non-linear terms, and/or the weight of bias term of CIIP mode is derived by a regression model minimizing the SAD, SSD, or other sum of difference between the current and reference refined samples in template region.
  • the parameter set P c is then applied to the samples in the inter and intra reference predictors to derive the samples in current CU.
  • a parameter set P a including the AMVP-Merge blending weights, and/or the weight of center sample, and/or the weights of spatial samples, and/or the weights of non-linear terms, and/or the weight of bias term of AMVP-Merge mode is derived by a regression model minimizing the SAD, SSD, or other sum of difference between the current and reference refined samples in template region.
  • the parameter set P a is then applied to the samples in the inter AMVP and inter merge reference predictors to derive the samples in current CU.
  • the final weights i.e., the parameter set including the weights for cross-prediction sample blending, and/or center sample, and/or spatial samples, and/or non-linear terms, and/or bias terms
  • the weights including the weights derived by regression model and the original weights of each inter mode with minimum RD cost will be the final cross-prediction blending weights and one flag is signaled to determine whether to use the weights derived by regression model at decoder side explicitly.
  • the weights i.e., the parameter set including the weights for cross-prediction sample blending, and/or center sample, and/or spatial samples, and/or non-linear terms, and/or bias terms
  • the weights derived by regression model can be used to guide the final weights of each inter mode with multiple prediction sources instead of directly being used. That is, only the weights in the original weight candidate set of each mode that are close to the weights derived by the regression model (i.e., the difference of weights is smaller than a threshold) are preserved and have a chance to be chosen at the encoder side, which can reduce the size of the candidate set and further reduce the signaling bits of the final weights.
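The candidate-pruning idea above can be sketched as a simple filter. The example uses the five BCW weights of VVC ({-2, 3, 4, 5, 10}/8) as the original candidate set. The threshold value, the fallback to the full set when nothing survives, and the fixed-length index cost are assumptions made for illustration; an actual codec would typically use a different binarization.

```python
BCW_WEIGHTS = [-2 / 8, 3 / 8, 4 / 8, 5 / 8, 10 / 8]  # VVC BCW candidate weights


def prune_candidates(candidates, regressed_weight, threshold):
    """Keep only candidates within `threshold` of the regression-derived weight."""
    kept = [w for w in candidates if abs(w - regressed_weight) < threshold]
    # Assumption: if nothing survives, fall back to the original candidate set.
    return kept if kept else list(candidates)


def index_bits(num_candidates):
    """Bits for a fixed-length index over the candidates (illustrative cost only)."""
    return (num_candidates - 1).bit_length() if num_candidates > 1 else 0


kept = prune_candidates(BCW_WEIGHTS, 0.55, 0.2)
print(kept, index_bits(len(BCW_WEIGHTS)), index_bits(len(kept)))
```

Since the decoder can repeat the same regression on the template, it can rebuild the same reduced set and interpret the shorter index consistently.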
  • one or more control syntaxes/flag can be added to enable the above-mentioned function or not.
  • the one or more control syntaxes/flag can be added on CU-level, CTU-level, CTU-row-level, CTU group-level, slice-level, slice header, picture-level, picture header, PPS, sequence-level, SPS, APS, or VPS. If the function is disabled, the default weight is applied.
  • the inferring disable mechanism can also be applied.
  • One condition check is applied to evaluate the legality of the derived parameter.
  • the derived parameter can be applied to get the derived predictors. However, if the derived predictors are far from (out of a predefined range or an adaptive range) the predictors of the original method (e.g., using default weight) , or if the derived parameters are far from (out of a predefined range or an adaptive range) the parameters of the original method (e.g., default weight) , the proposed method is disabled and falls back to the original method.
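The legality check on the derived parameters can be sketched as a range test with a fallback. The function name, the per-parameter absolute-difference test, and the `max_deviation` bound are assumptions of this example; the text only states that a predefined or adaptive range is used.

```python
def checked_weights(derived, default, max_deviation):
    """Return the derived weights if all lie within range of the defaults, else the defaults."""
    for d, o in zip(derived, default):
        if abs(d - o) > max_deviation:
            # Out of range: the proposed method is disabled for this block.
            return list(default)
    return list(derived)


print(checked_weights([0.4, 0.6], [0.5, 0.5], 0.25))
print(checked_weights([1.9, -0.9], [0.5, 0.5], 0.25))
```

Because encoder and decoder derive the parameters from the same template samples, both can run the same check without extra signaling.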
  • a set of regression parameters P, including the weights for cross-prediction sample blending, and/or center sample, and/or spatial samples, and/or non-linear terms, and/or bias terms of the intra modes with multiple predictors (e.g., DIMD, TIMD, fusion of Chroma Intra prediction, IntraTMP, Intra prediction fusion, combination of CIIP with TIMD and TM merge, etc.), is derived by regression using the template samples in a specific region.
  • multiple predictors e.g., DIMD, TIMD, fusion of Chroma Intra prediction, IntraTMP, Intra prediction fusion, combination of CIIP with TIMD and TM merge, etc.
  • the set of regression parameter P can be derived.
  • the parameter set P is then applied to the samples of the reference predictors to derive the samples in current CU.
  • FIG. 32 shows a regression model for the intra modes with multiple predictors.
  • the refined sample consists of K (e.g., 5) spatial samples, a nonlinear term, and a bias term. That is,
  • the number of spatial samples K can be any number, the spatial samples can be obtained from arbitrary positions inside the template region (e.g., 1, 2, 4, etc. template lines) and the non-linear terms can be multiple and calculated as the higher order of C, N, S, E and/or W.
  • the final prediction sample is the weighted sum of all the refined samples according to the weight ⁇ i of each source predictor.
  • the number of the source predictor can be any number.
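As an illustration of how a derived parameter set might be applied, the sketch below refines each source predictor with K = 5 spatial samples (center C and its N, S, E, W neighbours), one non-linear term and one bias term, then blends the refined samples with per-predictor weights. The non-linear term (C*C + midVal) >> bitDepth and the bias midVal follow the CCCM convention and are assumptions here, as are the edge clamping, the coefficient ordering, and the function names; the exact formula is not recoverable from the text.

```python
def refined_sample(pred, x, y, coeffs, bit_depth=10):
    """Apply coeffs (C, N, S, E, W, non-linear, bias) to predictor `pred` at (x, y)."""
    h, w = len(pred), len(pred[0])

    def at(xx, yy):
        # Assumption: positions outside the block are clamped to the nearest sample.
        return pred[min(max(yy, 0), h - 1)][min(max(xx, 0), w - 1)]

    c = at(x, y)
    mid = 1 << (bit_depth - 1)
    terms = [c, at(x, y - 1), at(x, y + 1), at(x + 1, y), at(x - 1, y),
             (c * c + mid) >> bit_depth,  # non-linear term (CCCM-style assumption)
             mid]                         # bias term
    return sum(k * t for k, t in zip(coeffs, terms))


def blend(preds, coeff_sets, weights, x, y, bit_depth=10):
    """Weighted sum of the refined samples of all source predictors, clipped to range."""
    value = sum(wt * refined_sample(p, x, y, cs, bit_depth)
                for p, cs, wt in zip(preds, coeff_sets, weights))
    return min(max(int(value + 0.5), 0), (1 << bit_depth) - 1)
```

The same routine covers any number of source predictors, since each contributes its own coefficient set and blending weight.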
  • the number of spatial samples K 0 and K 1 can be any number, the spatial gradients and location terms can be obtained from arbitrary positions inside the template region (e.g., 1, 2, 4, etc. template lines) and the non-linear terms can be multiple and calculated as the higher order of C, N, S, E and/or W.
  • the final prediction sample is the weighted sum of all the refined samples according to the weight ⁇ i of each source predictor.
  • the number of the source predictor can be any number.
  • a parameter set P d including the blending weights of three predictors (i.e., two intra angular mode predictors derived from the reconstructed neighbor samples and the planar mode predictor) , and/or the weight of center sample, and/or the weights of spatial samples, and/or the weights of non-linear terms, and/or the weight of bias term of DIMD mode is derived by a regression model minimizing the SAD, SSD, or other sum of difference between the current and reference refined samples in the template region, wherein the current samples are the neighboring reconstructed samples and the reference samples are the neighboring predicted samples derived according to the intra mode.
  • the parameter set P d is then applied to the samples of the three source predictors to derive the samples in current CU.
  • a parameter set P t including the blending weights of two predictors (i.e., the first two intra prediction modes in MPMs with the minimum SATD) , and/or the weight of center sample, and/or the weights of spatial samples, and/or the weights of non-linear terms, and/or the weight of bias term of TIMD mode is derived by a regression model.
  • the parameter set P t is then applied to the samples of the two source predictors to derive the samples in current CU.
  • a parameter set P f including the blending weights of two chroma predictors, and/or the weight of center sample, and/or the weights of spatial samples, and/or the weights of non-linear terms, and/or the weight of bias term of fusion of Chroma Intra prediction mode is derived by a regression model minimizing the SAD, SSD, or other sum of difference between the current and reference refined samples in the template region, wherein the current samples are the neighboring reconstructed samples and the reference samples are the neighboring predicted samples derived according to the intra mode.
  • the parameter set P f is then applied to the samples of the two chroma source predictors to derive the samples in current CU.
  • a parameter set P i including the blending weights of multiple predictors, and/or the weight of center sample, and/or the weights of spatial samples, and/or the weights of non-linear terms, and/or the weight of bias term of Intra Template-Matching Prediction Fusion mode proposed in JVET-AC0069 or Fusion of Intra Template Matching mode proposed in JVET-AC0107 or other Intra Template Matching fusion method which combines multiple source predictors to derive the final predictor is derived by a regression model minimizing the SAD, SSD, or other sum of difference between the current and reference refined samples in the template region, wherein the current samples are the neighboring reconstructed samples and the reference samples are the L-shaped reconstructed samples of the source predictors.
  • the parameter set P i is then applied to the samples of the source predictors of Intra Template-Matching Prediction Fusion mode or Fusion of Intra Template Matching mode or other Intra Template Matching fusion method which combines multiple source predictors to derive the final predictor to derive the samples in current CU.
  • a parameter set P m including the blending weights of predictors derived from different reference lines, and/or the weight of center sample, and/or the weights of spatial samples, and/or the weights of non-linear terms, and/or the weight of bias term of Intra prediction fusion mode is derived by a regression model minimizing the SAD, SSD, or other sum of difference between the current and reference refined samples in the template region, wherein the current samples are the neighboring reconstructed samples and the reference samples are the neighboring predicted samples derived from different reference lines.
  • the parameter set P m is then applied to the samples of the source predictors derived from different reference lines to derive the samples in current CU.
  • a parameter set P l including the blending weights of predictors derived from different reference lines, and/or the weight of center sample, and/or the weights of spatial samples, and/or the weights of non-linear terms, and/or the weight of bias term of Intra prediction fusion mode is derived by a regression model minimizing the SAD, SSD, or other sum of difference between the current and reference refined samples in the template region, wherein the current samples are the neighboring reconstructed samples and the reference samples are the neighboring predicted samples derived from different reference lines.
  • the parameter set P l is then applied to the samples of the source predictors derived from different reference lines to derive the samples in current CU.
  • a parameter set P c including the blending weights of inter and intra predictors, and/or the weight of center sample, and/or the weights of spatial samples, and/or the weights of non-linear terms, and/or the weight of bias term of combination of CIIP with TIMD and TM merge mode is derived by a regression model minimizing the SAD, SSD, or other sum of difference between the current and reference refined samples in the template region, wherein the current samples are the neighboring reconstructed samples and the reference samples are the reconstructed samples neighboring the reference inter predictor for the inter part and the neighboring predicted samples derived according to the intra mode for the intra part.
  • the parameter set P c is then applied to the samples of the inter and intra predictors to derive the samples in current CU.
  • the final weights, i.e., the parameter set including the weights for cross-prediction sample blending, and/or center sample, and/or spatial samples, and/or non-linear terms, and/or bias terms
  • among the weights derived by the regression model and the original weights of each intra mode, the weights with the minimum RD cost are used as the final cross-prediction blending weights, and one flag is explicitly signaled to indicate whether the weights derived by the regression model are used at the decoder side.
  • the weights, i.e., the parameter set including the weights for cross-prediction sample blending, and/or center sample, and/or spatial samples, and/or non-linear terms, and/or bias terms
  • the weights derived by the regression model can be used to guide the final weights of each intra mode with multiple prediction sources instead of being used directly. That is, only the weights in the original weight candidate set of each mode that are close to the weights derived by the regression model (i.e., the difference of weights is smaller than a threshold) are preserved and have a chance to be chosen at the encoder side, which can reduce the size of the candidate set and further reduce the signaling bits of the final weights.
  • one or more control syntaxes/flags can be added to enable or disable the above-mentioned function.
  • the one or more control syntaxes/flags can be added at CU-level, CTU-level, CTU-row-level, CTU group-level, slice-level, slice header, picture-level, picture header, PPS, sequence-level, SPS, APS, or VPS. If the function is disabled, the default weight is applied.
  • the inferring disable mechanism can also be applied.
  • one condition check is applied to evaluate the legality of the derived parameters.
  • the derived parameters can be applied to get the derived predictors. However, if the derived predictors are far from (out of a predefined range or an adaptive range) the predictors of the original method (e.g., using the default weight), or if the derived parameters are far from (out of a predefined range or an adaptive range) the parameters of the original method (e.g., the default weight), the proposed method is disabled and coding falls back to the original method.
  • any of the foregoing proposed methods can be implemented in encoders and/or decoders.
  • any of the proposed methods can be implemented in inter coding of an encoder, and/or a decoder.
  • any of the proposed methods can be implemented as a circuit coupled to the inter coding of the encoder and/or the decoder, so as to provide the information needed by the inter coding.
  • Combinations such as “at least one of A, B, or C,” “one or more of A, B, or C,” “at least one of A, B, and C,” “one or more of A, B, and C,” and “A, B, C, or any combination thereof” include any combination of A, B, and/or C, and may include multiples of A, multiples of B, or multiples of C.
  • combinations such as “at least one of A, B, or C,” “one or more of A, B, or C,” “at least one of A, B, and C,” “one or more of A, B, and C,” and “A, B, C, or any combination thereof” may be A only, B only, C only, A and B, A and C, B and C, or A and B and C, where any such combinations may contain one or more member or members of A, B, or C.
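
Two of the mechanisms described earlier in this list take regression-derived weights as input. One prunes the original weight candidate set to the candidates close to the derived weights. The other checks the derived parameters and falls back to the default weights when they are out of range. The following Python sketch illustrates both under stated assumptions: the text does not specify how the "difference of weights" is measured, so the maximum absolute difference is assumed here, and the function names, threshold, and range values are hypothetical.

```python
import numpy as np

def prune_candidates(candidates, derived, threshold):
    """Keep only the candidate weight sets close to the regression-derived weights.

    Assumption: "close" means the maximum absolute per-weight difference
    is smaller than `threshold` (the source does not fix the metric).
    """
    derived = np.asarray(derived, dtype=np.float64)
    kept = []
    for cand in candidates:
        diff = np.max(np.abs(np.asarray(cand, dtype=np.float64) - derived))
        if diff < threshold:
            kept.append(cand)
    return kept

def weights_or_default(derived, default, max_dev):
    """Condition check on derived weights with fallback to the default weights.

    Assumption: the "predefined range" is modeled as a maximum absolute
    deviation `max_dev` from the default weights; non-finite values are
    also rejected.
    """
    d = np.asarray(derived, dtype=np.float64)
    f = np.asarray(default, dtype=np.float64)
    if np.all(np.isfinite(d)) and np.max(np.abs(d - f)) <= max_dev:
        return d
    return f  # fall back to the original method's weights
```

A smaller surviving candidate set needs fewer bits to index, which is the signaling saving the text refers to. An encoder and decoder would have to apply the same pruning rule for the index to stay in sync.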

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

A coder uses a regression-based prediction model to determine blending weights for list 0 (L0) and list 1 (L1) reference frames. The coder includes at least an L0 term associated with the L0 reference frame, an L1 term associated with the L1 reference frame, and a bias term in the regression-based prediction model. The coder uses the determined blending weights to generate bi-predictive values for pixels in a coding unit (CU) based on the values of corresponding pixels in the L0 and L1 reference frames. The coder derives the blending weights based on a mean squared error (MSE) between the bi-predictive values and the actual pixel values within the CU.
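
The model in the abstract, with an L0 term, an L1 term, and a bias term fitted by minimizing the MSE, amounts to an ordinary least-squares problem. The Python sketch below shows one way it could look. It assumes floating-point arithmetic and NumPy's general solver, whereas a real codec would most likely use a fixed-point integer derivation. The function names and the `bit_depth` parameter are hypothetical.

```python
import numpy as np

def derive_bipred_weights(l0, l1, target):
    """Fit target ~ w0 * l0 + w1 * l1 + b in the least-squares (MSE) sense.

    l0, l1 : predictor samples from the L0 and L1 reference frames
    target : the samples the blend should match
    Returns the parameter vector (w0, w1, b).
    """
    l0 = np.asarray(l0, dtype=np.float64).ravel()
    l1 = np.asarray(l1, dtype=np.float64).ravel()
    y = np.asarray(target, dtype=np.float64).ravel()
    # Design matrix: one column per model term (L0 term, L1 term, bias term).
    a = np.stack([l0, l1, np.ones_like(l0)], axis=1)
    params, *_ = np.linalg.lstsq(a, y, rcond=None)
    return params

def blend(l0, l1, params, bit_depth=8):
    """Apply the derived weights to produce bi-predictive sample values."""
    w0, w1, b = params
    pred = w0 * np.asarray(l0, dtype=np.float64) + w1 * np.asarray(l1, dtype=np.float64) + b
    # Round and clip to the valid sample range for the assumed bit depth.
    return np.clip(np.rint(pred), 0, (1 << bit_depth) - 1).astype(np.int64)
```

Minimizing the MSE and minimizing the SSD give the same weights, since the two costs differ only by a constant factor.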
PCT/CN2024/072071 2023-01-13 2024-01-12 Regression-based coding modes Ceased WO2024149384A1 (fr)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US202363479752P 2023-01-13 2023-01-13
US202363479750P 2023-01-13 2023-01-13
US63/479752 2023-01-13
US63/479750 2023-01-13
US202363579052P 2023-08-28 2023-08-28
US63/579052 2023-08-28

Publications (1)

Publication Number Publication Date
WO2024149384A1 true WO2024149384A1 (fr) 2024-07-18

Family

ID=91897885

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2024/072071 Ceased WO2024149384A1 (fr) 2023-01-13 2024-01-12 Regression-based coding modes

Country Status (1)

Country Link
WO (1) WO2024149384A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4676039A1 (fr) * 2024-07-02 2026-01-07 InterDigital CE Patent Holdings, SAS Regularization for deriving convolutional models for block prediction
EP4730788A1 (fr) * 2024-10-16 2026-04-22 InterDigital CE Patent Holdings, SAS Regression-based GPM partitioning fallback methods

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005217746A (ja) * 2004-01-29 2005-08-11 Kddi Corp Motion prediction information detection device
US20200021845A1 (en) * 2018-07-14 2020-01-16 Mediatek Inc. Method and Apparatus of Constrained Overlapped Block Motion Compensation in Video Coding
US20200107015A1 (en) * 2016-10-05 2020-04-02 Lg Electronics Inc. Method and apparatus for decoding image in image coding system
CN113170122A (zh) * 2018-12-01 2021-07-23 北京字节跳动网络技术有限公司 Parameter derivation for intra prediction
US20220232233A1 (en) * 2021-01-15 2022-07-21 Tencent America LLC Method and apparatus for video coding

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005217746A (ja) * 2004-01-29 2005-08-11 Kddi Corp Motion prediction information detection device
US20200107015A1 (en) * 2016-10-05 2020-04-02 Lg Electronics Inc. Method and apparatus for decoding image in image coding system
US20200021845A1 (en) * 2018-07-14 2020-01-16 Mediatek Inc. Method and Apparatus of Constrained Overlapped Block Motion Compensation in Video Coding
CN113170122A (zh) * 2018-12-01 2021-07-23 北京字节跳动网络技术有限公司 Parameter derivation for intra prediction
US20220232233A1 (en) * 2021-01-15 2022-07-21 Tencent America LLC Method and apparatus for video coding

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4676039A1 (fr) * 2024-07-02 2026-01-07 InterDigital CE Patent Holdings, SAS Regularization for deriving convolutional models for block prediction
EP4730788A1 (fr) * 2024-10-16 2026-04-22 InterDigital CE Patent Holdings, SAS Regression-based GPM partitioning fallback methods
WO2026082864A1 (fr) * 2024-10-16 2026-04-23 Interdigital Ce Patent Holdings, Sas Regression-based GPM partitioning fallback methods

Similar Documents

Publication Publication Date Title
US11212523B2 (en) Video processing methods and apparatuses of merge number signaling in video coding systems
US11166037B2 (en) Mutual excluding settings for multiple tools
WO2020035054A1 (fr) Methods and apparatuses of video processing with bi-directional prediction in video coding systems
WO2023198142A1 (fr) Method and apparatus for implicit cross-component prediction in video coding system
US20250063155A1 (en) Method and Apparatus for Cross Component Linear Model with Multiple Hypotheses Intra Modes in Video Coding System
US20250080756A1 (en) Method and Apparatus for Cross Component Linear Model for Inter Prediction in Video Coding System
WO2024149384A1 (fr) Regression-based coding modes
WO2024153085A1 (fr) Video coding method and apparatus of chroma prediction
WO2024109618A1 (fr) Method and apparatus for inheriting cross-component models with cross-component information propagation in video coding system
WO2024074125A1 (fr) Method and apparatus of implicit linear model derivation using multiple reference lines for cross-component prediction
WO2025082308A1 (fr) Methods and apparatus of signaling for local illumination compensation
WO2025026397A1 (fr) Methods and apparatus of video coding using multi-hypothesis cross-component prediction for chroma coding
WO2025007977A1 (fr) Method and apparatus for constructing a candidate list for inheriting neighboring cross-component models for chroma inter coding
WO2025209049A1 (fr) Methods and apparatus for controlling model-based coding tools in video coding
WO2025156991A1 (fr) Methods and apparatus of local illumination compensation model derivation and inheritance with chained motion vector for video coding
WO2025007952A1 (fr) Methods and apparatus for video coding improvement by model derivation
WO2024149247A1 (fr) Methods and apparatus of region-wise cross-component model merge mode for video coding
WO2024153079A1 (fr) Video coding method and apparatus of chroma prediction
WO2025157169A1 (fr) Extrapolation intra prediction model for chroma inter coding
WO2024193386A1 (fr) Method and apparatus for model intra luma mode fusion in video coding system
WO2025148640A1 (fr) Method and apparatus of regression-based blending for improving intra prediction fusion in video coding system
WO2024222798A1 (fr) Methods and apparatus for inheriting block vector shifted cross-component models for video coding
WO2024193431A1 (fr) Method and apparatus of combined prediction in video coding system
WO2026012384A1 (fr) Method and apparatus of shared inter region for decoder-side derived inter prediction mode and interccp merge mode in video coding
WO2024222760A1 (fr) Method and apparatus of video coding for improving chroma prediction by fusion

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 24741378; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 24741378; Country of ref document: EP; Kind code of ref document: A1)