WO2020264221A1 - Apparatuses and methods for bit-width control of bi-directional optical flow - Google Patents

Apparatuses and methods for bit-width control of bi-directional optical flow

Info

Publication number
WO2020264221A1
WO2020264221A1 PCT/US2020/039702 US2020039702W WO2020264221A1 WO 2020264221 A1 WO2020264221 A1 WO 2020264221A1 US 2020039702 W US2020039702 W US 2020039702W WO 2020264221 A1 WO2020264221 A1 WO 2020264221A1
Authority
WO
WIPO (PCT)
Prior art keywords
decoder
value
prediction
prediction samples
obtaining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US2020/039702
Other languages
English (en)
Inventor
Xiaoyu XIU
Yi-Wen Chen
Xianglin Wang
Tsung-Chuan MA
Hong-Jheng Jhu
Shuiming Ye
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202080045432.8A (patent CN114175659B)
Publication of WO2020264221A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103: Selection of coding mode or of prediction mode
    • H04N19/105: Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51: Motion estimation or motion compensation
    • H04N19/573: Motion compensation with multiple frame prediction using two or more reference frames in a given prediction direction
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding, the unit being an image region, e.g. an object
    • H04N19/176: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding, the unit being an image region, e.g. an object, the region being a block, e.g. a macroblock
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/44: Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51: Motion estimation or motion compensation
    • H04N19/513: Processing of motion vectors

Definitions

  • This disclosure is related to video coding and compression. More specifically, this disclosure relates to methods and apparatus for bi-directional optical flow (BDOF) in video coding.
  • BDOF bi-directional optical flow
  • Video coding is performed according to one or more video coding standards.
  • Video coding standards include versatile video coding (VVC), joint exploration test model (JEM), high-efficiency video coding (H.265/HEVC), advanced video coding (H.264/AVC), moving picture experts group (MPEG) coding, or the like.
  • Video coding generally utilizes prediction methods (e.g., inter-prediction, intra-prediction, or the like) that take advantage of redundancy present in video images or sequences.
  • An important goal of video coding techniques is to compress video data into a form that uses a lower bit rate while avoiding or minimizing degradations to video quality.
  • Examples of the present disclosure provide methods and apparatus for motion vector prediction in video coding.
  • A method of decoding a video signal may include obtaining, at a decoder, a first reference picture $I^{(0)}$ and a second reference picture $I^{(1)}$ associated with a video block.
  • The first reference picture $I^{(0)}$ is before a current picture and the second reference picture $I^{(1)}$ is after the current picture in display order.
  • The method may also include obtaining, at the decoder, first prediction samples $I^{(0)}(i,j)$ of the video block from a reference block in the first reference picture $I^{(0)}$.
  • Here, $i$ and $j$ may represent the coordinates of one sample within the current picture.
  • The method may include obtaining, at the decoder, second prediction samples $I^{(1)}(i,j)$ of the video block from a reference block in the second reference picture $I^{(1)}$.
  • The method may further include controlling, at the decoder, internal bit-depths of the BDOF by applying right-shifting to internal BDOF parameters.
  • The BDOF is independent of an input video bit-depth.
  • The internal BDOF parameters may include horizontal gradient values and vertical gradient values derived based on the first prediction samples $I^{(0)}(i,j)$ and the second prediction samples $I^{(1)}(i,j)$.
  • The method may include obtaining, at the decoder, final bi-prediction samples of the video block based on the BDOF being applied to the video block based on the first prediction samples $I^{(0)}(i,j)$ and the second prediction samples $I^{(1)}(i,j)$.
  • A method of decoding a video signal may include obtaining, at a decoder, a first reference picture $I^{(0)}$ and a second reference picture $I^{(1)}$ associated with a video block.
  • The first reference picture $I^{(0)}$ is before a current picture and the second reference picture $I^{(1)}$ is after the current picture in display order.
  • The method may also include obtaining, at the decoder, first prediction samples $I^{(0)}(i,j)$ of the video block from a reference block in the first reference picture $I^{(0)}$.
  • Here, $i$ and $j$ represent the coordinates of one sample within the current picture.
  • The method may include obtaining, at the decoder, second prediction samples $I^{(1)}(i,j)$ of the video block from a reference block in the second reference picture $I^{(1)}$.
  • The method may further include controlling, at the decoder and when the internal bit-depth is more than 12 bits, the internal bit-depths of the BDOF by applying right-shifting to internal BDOF parameters to align the precision of an output prediction signal to a constant number.
  • The internal BDOF parameters include horizontal gradient values and vertical gradient values derived based on the first prediction samples $I^{(0)}(i,j)$ and the second prediction samples $I^{(1)}(i,j)$, and sample differences between the first prediction samples $I^{(0)}(i,j)$ and the second prediction samples $I^{(1)}(i,j)$.
  • The method may include obtaining, at the decoder, final bi-prediction samples of the video block based on the BDOF being applied to the video block based on the first prediction samples $I^{(0)}(i,j)$ and the second prediction samples $I^{(1)}(i,j)$.
  • The method may also include obtaining, at the decoder, the output prediction signal based on the final bi-prediction samples.
  • A computing device for decoding a video signal is provided.
  • The computing device may include one or more processors and a non-transitory computer-readable memory storing instructions executable by the one or more processors.
  • The one or more processors may be configured to obtain, at a decoder, a first reference picture $I^{(0)}$ and a second reference picture $I^{(1)}$ associated with a video block.
  • The first reference picture $I^{(0)}$ is before a current picture and the second reference picture $I^{(1)}$ is after the current picture in display order.
  • The one or more processors may further be configured to obtain, at the decoder, first prediction samples $I^{(0)}(i,j)$ of the video block from a reference block in the first reference picture $I^{(0)}$.
  • Here, $i$ and $j$ represent the coordinates of one sample within the current picture.
  • The one or more processors may be configured to obtain, at the decoder, second prediction samples $I^{(1)}(i,j)$ of the video block from a reference block in the second reference picture $I^{(1)}$.
  • The one or more processors may also be configured to control, at the decoder, internal bit-depths of bi-directional optical flow (BDOF) by applying right-shifting to internal BDOF parameters.
  • The BDOF is independent of an input video bit-depth.
  • The internal BDOF parameters include horizontal gradient values and vertical gradient values derived based on the first prediction samples $I^{(0)}(i,j)$ and the second prediction samples $I^{(1)}(i,j)$, and sample differences between the first prediction samples $I^{(0)}(i,j)$ and the second prediction samples $I^{(1)}(i,j)$.
  • The one or more processors may be configured to obtain, at the decoder, final bi-prediction samples of the video block based on the BDOF being applied to the video block based on the first prediction samples $I^{(0)}(i,j)$ and the second prediction samples $I^{(1)}(i,j)$.
  • A non-transitory computer-readable storage medium having instructions stored therein is provided.
  • The instructions may cause the apparatus to perform obtaining, at a decoder, a first reference picture $I^{(0)}$ and a second reference picture $I^{(1)}$ associated with a video block.
  • The first reference picture $I^{(0)}$ is before a current picture and the second reference picture $I^{(1)}$ is after the current picture in display order.
  • The instructions may further cause the apparatus to perform obtaining, at the decoder, first prediction samples $I^{(0)}(i,j)$ of the video block from a reference block in the first reference picture $I^{(0)}$.
  • Here, $i$ and $j$ represent the coordinates of one sample within the current picture.
  • The instructions may further cause the apparatus to perform obtaining, at the decoder, second prediction samples $I^{(1)}(i,j)$ of the video block from a reference block in the second reference picture $I^{(1)}$.
  • The instructions may further cause the apparatus to perform controlling, at the decoder and when the internal bit-depth is more than 12 bits, the internal bit-depths of bi-directional optical flow (BDOF) by applying right-shifting to internal BDOF parameters to align the precision of an output prediction signal to a constant number.
  • The internal BDOF parameters include horizontal gradient values and vertical gradient values derived based on the first prediction samples $I^{(0)}(i,j)$ and the second prediction samples $I^{(1)}(i,j)$, and sample differences between the first prediction samples $I^{(0)}(i,j)$ and the second prediction samples $I^{(1)}(i,j)$.
  • The instructions may further cause the apparatus to perform obtaining, at the decoder, final bi-prediction samples of the video block based on the BDOF being applied to the video block based on the first prediction samples $I^{(0)}(i,j)$ and the second prediction samples $I^{(1)}(i,j)$.
  • The instructions may further cause the apparatus to perform obtaining, at the decoder, the output prediction signal based on the final bi-prediction samples.
  • FIG. 1 is a block diagram of an encoder, according to an example of the present disclosure.
  • FIG. 2 is a block diagram of a decoder, according to an example of the present disclosure.
  • FIG. 3A is a diagram illustrating block partitions in a multi-type tree structure, according to an example of the present disclosure.
  • FIG. 3B is a diagram illustrating block partitions in a multi-type tree structure, according to an example of the present disclosure.
  • FIG. 3C is a diagram illustrating block partitions in a multi-type tree structure, according to an example of the present disclosure.
  • FIG. 3D is a diagram illustrating block partitions in a multi-type tree structure, according to an example of the present disclosure.
  • FIG. 3E is a diagram illustrating block partitions in a multi-type tree structure, according to an example of the present disclosure.
  • FIG. 4 is a diagram illustrating a bi-directional optical flow (BDOF) model, according to an example of the present disclosure.
  • FIG. 5 is a bit-depth control method of BDOF, according to an example of the present disclosure.
  • FIG. 6 is a bit-depth control method of BDOF, according to an example of the present disclosure.
  • FIG. 7 is a diagram illustrating a computing environment coupled with a user interface, according to an example of the present disclosure.
  • Although the terms “first,” “second,” “third,” etc. may be used herein to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one category of information from another. For example, without departing from the scope of the present disclosure, first information may be termed as second information; and similarly, second information may also be termed as first information. As used herein, the term “if” may be understood to mean “when” or “upon” or “in response to a judgment” depending on the context.
  • The first version of the HEVC standard was finalized in October 2013, and offers approximately 50% bit-rate saving at equivalent perceptual quality compared to the prior generation video coding standard H.264/MPEG AVC.
  • JVET Joint Video Exploration Team
  • VVC Versatile Video Coding
  • VVC is built upon the block-based hybrid video coding framework.
  • FIG. 1 shows a general diagram of a block-based video encoder for the VVC.
  • FIG. 1 shows a typical encoder 100.
  • the encoder 100 has video input 110, motion compensation 112, motion estimation 114, intra/inter mode decision 116, block predictor 140, adder 128, transform 130, quantization 132, prediction related info 142, intra prediction 118, picture buffer 120, inverse quantization 134, inverse transform 136, adder 126, memory 124, in-loop filter 122, entropy coding 138, and bitstream 144.
  • a video frame is partitioned into a plurality of video blocks for processing. For each given video block, a prediction is formed based on either an inter prediction approach or an intra prediction approach.
  • A prediction residual, representing the difference between a current video block (part of video input 110) and its predictor (part of block predictor 140), is sent from adder 128 to a Transform 130. Transform coefficients are then sent from the Transform 130 to a Quantization 132.
  • Quantized coefficients are then fed to an Entropy Coding 138 to generate a compressed video bitstream.
  • prediction related information 142 from an intra/inter mode decision 116 such as video block partition info, motion vectors (MVs), reference picture index, and intra prediction mode, are also fed through the Entropy Coding 138 and saved into a compressed bitstream 144.
  • Compressed bitstream 144 includes a video bitstream.
  • decoder-related circuitries are also needed in order to reconstruct pixels for the purpose of prediction.
  • a prediction residual is reconstructed through an Inverse Quantization 134 and an Inverse Transform 136.
  • This reconstructed prediction residual is combined with a Block Predictor 140 to generate un-filtered reconstructed pixels for a current video block.
  • Spatial prediction uses pixels from samples of already coded neighboring blocks (which are called reference samples) in the same video frame as the current video block to predict the current video block.
  • Temporal prediction uses reconstructed pixels from already-coded video pictures to predict the current video block. Temporal prediction reduces temporal redundancy inherent in the video signal.
  • The temporal prediction signal for a given coding unit (CU) or coding block is usually signaled by one or more MVs, which indicate the amount and the direction of motion between the current CU and its temporal reference. Further, if multiple reference pictures are supported, one reference picture index is additionally sent, which identifies the reference picture in the reference picture storage from which the temporal prediction signal comes.
  • Motion estimation 114 takes in video input 110 and a signal from picture buffer 120, and outputs a motion estimation signal to motion compensation 112.
  • Motion compensation 112 takes in video input 110, a signal from picture buffer 120, and the motion estimation signal from motion estimation 114, and outputs a motion compensation signal to intra/inter mode decision 116.
  • An intra/inter mode decision 116 in the encoder 100 chooses the best prediction mode, for example, based on the rate-distortion optimization method.
  • the block predictor 140 is then subtracted from the current video block, and the resulting prediction residual is de-correlated using the transform 130 and the quantization 132.
  • the resulting quantized residual coefficients are inverse quantized by the inverse quantization 134 and inverse transformed by the inverse transform 136 to form the reconstructed residual, which is then added back to the prediction block to form the reconstructed signal of the CU.
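  • The residual, quantization, and reconstruction steps described above can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions: the transform and entropy coding are omitted, a uniform scalar quantizer stands in for the Transform 130 and Quantization 132, and the names `encode_block`, `reconstruct_block`, and `qstep` are hypothetical and do not come from the disclosure.

```python
def encode_block(current, predictor, qstep):
    """Return quantized residual levels for one block (transform omitted)."""
    # Prediction residual: current samples minus the block predictor.
    residual = [c - p for c, p in zip(current, predictor)]
    # Uniform scalar quantization; an assumption standing in for
    # the transform and quantization stages.
    return [round(r / qstep) for r in residual]


def reconstruct_block(levels, predictor, qstep):
    """Inverse-quantize the levels and add them back to the predictor."""
    recon_residual = [lvl * qstep for lvl in levels]
    return [p + r for p, r in zip(predictor, recon_residual)]


current = [100, 104, 98, 101]
predictor = [99, 100, 100, 100]
levels = encode_block(current, predictor, qstep=2)
recon = reconstruct_block(levels, predictor, qstep=2)
```

  • With `qstep=1` the sketch is lossless; with larger steps the reconstruction differs from the input by at most half a step, which is why the encoder must reconstruct blocks the same way the decoder does before using them as references.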
  • in-loop filtering 122 such as a deblocking filter, a sample adaptive offset (SAO), and/or an adaptive in-loop filter (ALF) may be applied on the reconstructed CU before it is put in the reference picture storage of the picture buffer 120 and used to code future video blocks.
  • The coding mode (inter or intra), prediction mode information, motion information, and quantized residual coefficients are all sent to the entropy coding unit 138 to be further compressed and packed to form the bitstream.
  • FIG. 1 gives the block diagram of a generic block-based hybrid video encoding system.
  • the input video signal is processed block by block (called coding units (CUs)).
  • CUs coding units
  • In VTM-1.0, a CU can be up to 128x128 pixels.
  • HEVC High Efficiency Video Coding
  • One coding tree unit (CTU) is split into CUs to adapt to varying local characteristics based on a quad/binary/ternary tree.
  • Each CU is always used as the basic unit for both prediction and transform without further partitions.
  • In the multi-type tree structure, one CTU is first partitioned by a quad-tree structure. Then, each quad-tree leaf node can be further partitioned by a binary and ternary tree structure.
  • As shown in FIGS. 3A, 3B, 3C, 3D, and 3E, there are five splitting types: quaternary partitioning, horizontal binary partitioning, vertical binary partitioning, horizontal ternary partitioning, and vertical ternary partitioning.
  • FIG. 3A shows a diagram illustrating block quaternary partition in a multi-type tree structure, in accordance with the present disclosure.
  • FIG. 3B shows a diagram illustrating block vertical binary partition in a multi-type tree structure, in accordance with the present disclosure.
  • FIG. 3C shows a diagram illustrating block horizontal binary partition in a multi-type tree structure, in accordance with the present disclosure.
  • FIG. 3D shows a diagram illustrating block vertical ternary partition in a multi-type tree structure, in accordance with the present disclosure.
  • FIG. 3E shows a diagram illustrating block horizontal ternary partition in a multi-type tree structure, in accordance with the present disclosure.
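  • The five splitting types can be sketched in Python as follows. This is an illustration only: the function name `split_block` and the mode strings are hypothetical, "horizontal" is assumed to mean a split along a horizontal line, and the 1:2:1 size ratio for ternary splits is an assumption that is not stated in the text above.

```python
def split_block(x, y, w, h, mode):
    """Return sub-blocks as (x, y, width, height) tuples for one split type."""
    if mode == "quaternary":
        hw, hh = w // 2, h // 2
        return [(x, y, hw, hh), (x + hw, y, hw, hh),
                (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]
    if mode == "horizontal_binary":   # assumed: split along a horizontal line
        return [(x, y, w, h // 2), (x, y + h // 2, w, h // 2)]
    if mode == "vertical_binary":     # assumed: split along a vertical line
        return [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]
    if mode == "horizontal_ternary":  # assumed 1:2:1 ratio
        q = h // 4
        return [(x, y, w, q), (x, y + q, w, 2 * q), (x, y + 3 * q, w, q)]
    if mode == "vertical_ternary":    # assumed 1:2:1 ratio
        q = w // 4
        return [(x, y, q, h), (x + q, y, 2 * q, h), (x + 3 * q, y, q, h)]
    raise ValueError(f"unknown split mode: {mode}")


MODES = ["quaternary", "horizontal_binary", "vertical_binary",
         "horizontal_ternary", "vertical_ternary"]
partitions = {mode: split_block(0, 0, 32, 32, mode) for mode in MODES}
```

  • Each split covers the parent block exactly, so the sub-block areas always sum to the parent area; applying `split_block` recursively to the results gives a multi-type tree.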
  • spatial prediction and/or temporal prediction may be performed.
  • Spatial prediction (or “intra prediction”) uses pixels from the samples of already coded neighboring blocks (which are called reference samples) in the same video picture/slice to predict the current video block. Spatial prediction reduces spatial redundancy inherent in the video signal.
  • Temporal prediction (also referred to as “inter prediction” or “motion compensated prediction”) uses reconstructed pixels from the already coded video pictures to predict the current video block. Temporal prediction reduces temporal redundancy inherent in the video signal.
  • the temporal prediction signal for a given CU is usually signaled by one or more motion vectors (MVs), which indicate the amount and the direction of motion between the current CU and its temporal reference.
  • MVs motion vectors
  • If multiple reference pictures are supported, one reference picture index is additionally sent, which identifies the reference picture in the reference picture storage from which the temporal prediction signal comes.
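  • As a rough illustration of how a motion vector and a reference picture index select the temporal prediction signal, the following Python sketch fetches a prediction block from a chosen reference picture. It assumes integer-precision motion vectors and simple coordinate clamping at picture boundaries; the function `motion_compensate` and the list `ref_pictures` are hypothetical names, and fractional-sample interpolation is omitted.

```python
def motion_compensate(ref, x0, y0, w, h, mv):
    """Fetch a w x h prediction block from `ref`, displaced by an integer MV."""
    mvx, mvy = mv
    rows, cols = len(ref), len(ref[0])
    block = []
    for j in range(h):
        # Clamp to the picture area (simple boundary padding, an assumption).
        ry = min(max(y0 + j + mvy, 0), rows - 1)
        row = []
        for i in range(w):
            rx = min(max(x0 + i + mvx, 0), cols - 1)
            row.append(ref[ry][rx])
        block.append(row)
    return block


# Two toy 8x8 reference pictures; the list index plays the role of the
# reference picture index.
ref_pictures = [
    [[10 * r + c for c in range(8)] for r in range(8)],
    [[10 * r + c + 100 for c in range(8)] for r in range(8)],
]
pred = motion_compensate(ref_pictures[1], x0=2, y0=2, w=2, h=2, mv=(1, -1))
```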
  • the mode decision block in the encoder chooses the best prediction mode, for example, based on the rate-distortion optimization method.
  • the prediction block is then subtracted from the current video block, and the prediction residual is de-correlated using transform and quantized.
  • the quantized residual coefficients are inverse quantized and inverse transformed to form the reconstructed residual, which is then added back to the prediction block to form the reconstructed signal of the CU.
  • in-loop filtering such as deblocking filter, sample adaptive offset (SAO), and adaptive in-loop filter (ALF) may be applied on the reconstructed CU before it is put in the reference picture store and used to code future video blocks.
  • The coding mode (inter or intra), prediction mode information, motion information, and quantized residual coefficients are all sent to the entropy coding unit to be further compressed and packed to form the bitstream.
  • FIG. 2 shows a general block diagram of a video decoder for the VVC. Specifically, FIG. 2 shows a typical decoder 200 block diagram. Decoder 200 has bitstream 210, entropy decoding 212, inverse quantization 214, inverse transform 216, adder 218, intra/inter mode selection 220, intra prediction 222, memory 230, in-loop filter 228, motion compensation 224, picture buffer 226, prediction related info 234, and video output 232.
  • Decoder 200 is similar to the reconstruction-related section residing in the encoder 100 of FIG. 1.
  • an incoming video bitstream 210 is first decoded through an Entropy Decoding 212 to derive quantized coefficient levels and prediction-related information.
  • the quantized coefficient levels are then processed through an Inverse Quantization 214 and an Inverse Transform 216 to obtain a reconstructed prediction residual.
  • A block predictor mechanism, implemented in an Intra/Inter Mode Selector 220, is configured to perform either an Intra Prediction 222 or a Motion Compensation 224, based on decoded prediction information.
  • a set of unfiltered reconstructed pixels is obtained by summing up the reconstructed prediction residual from the Inverse Transform 216 and a predictive output generated by the block predictor mechanism, using a summer 218.
  • the reconstructed block may further go through an In-Loop Filter 228 before it is stored in a Picture Buffer 226, which functions as a reference picture store.
  • the reconstructed video in the Picture Buffer 226 may be sent to drive a display device, as well as used to predict future video blocks.
  • a filtering operation is performed on these reconstructed pixels to derive a final reconstructed Video Output 232.
  • FIG. 2 gives a general block diagram of a block-based video decoder.
  • The video bitstream is first entropy decoded at the entropy decoding unit.
  • The coding mode and prediction information are sent to either the spatial prediction unit (if intra coded) or the temporal prediction unit (if inter coded) to form the prediction block.
  • The residual transform coefficients are sent to the inverse quantization unit and the inverse transform unit to reconstruct the residual block.
  • The prediction block and the residual block are then added together.
  • The reconstructed block may further go through in-loop filtering before it is stored in the reference picture storage.
  • The reconstructed video in the reference picture storage is then sent out to drive a display device, as well as used to predict future video blocks.
  • BDOF bi-directional optical flow
  • FIG. 4 shows an illustration of a BDOF model, in accordance with the present disclosure.
  • The BDOF is a sample-wise motion refinement that is performed on top of the block-based motion-compensated predictions when bi-prediction is used.
  • The motion refinement $(v_x, v_y)$ of each 4x4 sub-block is calculated by minimizing the difference between L0 and L1 prediction samples after the BDOF is applied inside one 6x6 window $\Omega$ around the sub-block.
  • The value of $(v_x, v_y)$ is derived as in (1), where $\lfloor\cdot\rfloor$ is the floor function; $\mathrm{clip3}(min, max, x)$ is a function that clips a given value $x$ inside the range of $[min, max]$; the symbol $\gg$ represents the bitwise right shift operation; the symbol $\ll$ represents the bitwise left shift operation; and $th_{BDOF}$ is the motion refinement threshold to prevent propagated errors due to irregular local motion, which is equal to $2^{13-BD}$, where $BD$ is the bit-depth of the input video.
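  • The helper operations named above (clip3, the floor of a base-2 logarithm, and the threshold $2^{13-BD}$) can be sketched in Python as follows. The function names `clip3`, `floor_log2`, and `th_bdof` are illustrative, and the sketch shows only these building blocks, not the full derivation of $(v_x, v_y)$.

```python
def clip3(lo, hi, x):
    """Clip x into the inclusive range [lo, hi]."""
    return max(lo, min(hi, x))


def floor_log2(n):
    """Floor of log2(n) for a positive integer n."""
    if n <= 0:
        raise ValueError("floor_log2 requires a positive integer")
    return n.bit_length() - 1


def th_bdof(bd):
    """Motion refinement threshold 2^(13 - BD); assumes BD <= 13."""
    return 1 << (13 - bd)


threshold = th_bdof(10)                    # 10-bit input video
clipped = clip3(-threshold, threshold, 20)
```

  • Note that Python's `>>` on negative integers is an arithmetic shift that rounds toward negative infinity (for example, `-7 >> 1` is `-4`), which may differ from the behavior of a given hardware or C implementation.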
  • $S_1 = \sum_{(i,j)\in\Omega} \psi_x(i,j)\cdot\psi_x(i,j)$,
  • the final bi-prediction samples of the CU are calculated by interpolating the L0/L1 prediction samples along the motion trajectory based on the optical flow model, as indicated by
  • $shift$ and $o_{offset}$ are the right shift value and the offset value that are applied to combine the L0 and L1 prediction signals for bi-prediction, which are equal to $15 - BD$ and $1 \ll (14 - BD) + 2\cdot(1 \ll 13)$, respectively.
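  • A small Python sketch makes the two constants concrete. The grouping of the offset expression as $(1 \ll (14 - BD)) + 2\cdot(1 \ll 13)$ is an assumption about the intended operator precedence, and the function name `bi_pred_shift_and_offset` is hypothetical.

```python
def bi_pred_shift_and_offset(bd):
    """Return (shift, offset) for combining L0 and L1 predictions.

    Assumes the input bit-depth BD is at most 14 so both shifts are valid.
    """
    shift = 15 - bd
    # Assumed grouping: (1 << (14 - BD)) + 2 * (1 << 13).
    offset = (1 << (14 - bd)) + 2 * (1 << 13)
    return shift, offset


shift_10, offset_10 = bi_pred_shift_and_offset(10)
shift_8, offset_8 = bi_pred_shift_and_offset(8)
```

  • For 10-bit video this gives a shift of 5 and an offset of 16400; for 8-bit video, a shift of 7 and an offset of 16448.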
  • Table 1 illustrates the specific bit-widths of intermediate parameters that are involved in the BDOF process. As shown in the table, the internal bit-width of the whole BDOF process does not exceed 32-bit. Additionally, the multiplication with the worst possible input happens at the product $v_x S_{2,m}$ in (1), with inputs of 15-bit and 4-bit. Therefore, a 15-bit multiplier is enough for the BDOF.
  • Although the BDOF can enhance the efficiency of bi-predictive prediction, its design can still be further improved. Specifically, the following inefficiencies in the existing BDOF design in VVC for controlling the bit-widths of intermediate parameters are identified in this disclosure.
  • L1 prediction samples and the parameters $\psi_x(i,j)$ and $\psi_y(i,j)$ (i.e., the sums of the horizontal/vertical L0 and L1 gradient values) are represented in the same bit-width of 11-bit.
  • The gradient values are calculated as the difference between neighboring prediction samples. Due to the high-pass nature of such a process, the derived gradients are less reliable in the presence of noise, e.g., the noise captured in the original video and the coding noise that is generated during the coding process. This means that it may not always be beneficial to represent the gradient values in a high bit-width.
  • The maximum bit-width of the current design is equal to 31-bit.
  • A coding process with a maximal internal bit-width of more than 16-bit is usually implemented by a 32-bit implementation. Therefore, the existing design does not fully utilize the valid dynamic range of the 32-bit implementation. This may lead to unnecessary precision loss of the motion refinements derived by the BDOF.
  • Bit-width control methods are proposed to address the two bit-width control issues of the existing BDOF design, as pointed out in the “Current BDOF and PROF Design” section.
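  • An additional right shift $n_{grad}$ is introduced in the proposed method when calculating the gradient values $\frac{\partial I^{(k)}}{\partial x}$ and $\frac{\partial I^{(k)}}{\partial y}$, in order to lower the internal bit-width of gradient values. Specifically, the horizontal and vertical gradients at each sample position are calculated as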
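  • The effect of the additional right shift can be illustrated with a hedged Python sketch. The exact gradient formula is not reproduced here, so the sketch assumes a central difference between the two neighboring prediction samples and a hypothetical baseline shift `base_shift`; only the extra shift $n_{grad}$ comes from the text, and the function name `gradients` is illustrative.

```python
def gradients(pred, i, j, base_shift, n_grad):
    """Return (horizontal, vertical) gradients at column i, row j.

    Assumption: gradients are central differences of neighboring
    prediction samples, right-shifted by base_shift + n_grad.
    """
    shift = base_shift + n_grad
    gx = (pred[j][i + 1] - pred[j][i - 1]) >> shift
    gy = (pred[j + 1][i] - pred[j - 1][i]) >> shift
    return gx, gy


# Toy 3x3 block of prediction samples, indexed as pred[row][column].
pred = [
    [0, 16, 32],
    [64, 80, 96],
    [128, 144, 160],
]
without_extra_shift = gradients(pred, 1, 1, base_shift=0, n_grad=0)
with_extra_shift = gradients(pred, 1, 1, base_shift=0, n_grad=2)
```

  • Each unit of $n_{grad}$ removes one bit from the gradient magnitude, trading gradient precision for a narrower intermediate bit-width.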
  • An additional bit-shift $n_{adj}$ is introduced to the calculation of the variables $\psi_x(i,j)$, $\psi_y(i,j)$ and $\theta(i,j)$ in order to control the entire BDOF process so that it is operated at appropriate internal bit-widths, as depicted as:
  • The values of the two parameters are calculated as in (8), where $B_2$ and $B_6$ are the parameters that control the output dynamic ranges of $S_2$ and $S_6$, respectively. Note that, unlike the gradient calculation, the clipping operations in (8) are applied only once to calculate the motion refinement of each 4x4 sub-block inside one BDOF CU, i.e., they are invoked on a 4x4-unit basis. Therefore, the complexity increase due to the clipping operations introduced in the proposed method is negligible.
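  • A minimal Python sketch of range control by clipping follows. The exact clipping bounds are not reproduced in this text, so the sketch assumes a signed range of $[-2^{B}, 2^{B}-1]$; the function name `clip_to_bits` and the example value $B = 12$ are hypothetical.

```python
def clip_to_bits(value, b):
    """Clip value to the assumed signed range [-2^b, 2^b - 1]."""
    lo, hi = -(1 << b), (1 << b) - 1
    return max(lo, min(hi, value))


# One clip per accumulated sum per 4x4 sub-block, not one per sample.
s2_clipped = clip_to_bits(100000, 12)
s6_clipped = clip_to_bits(-100000, 12)
s_in_range = clip_to_bits(5, 12)
```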
  • Different values of $n_{grad}$, $n_{adj}$, $B_2$ and $B_6$ may be applied to achieve different trade-offs between the intermediate bit-width and the precision of internal BDOF derivations.
  • Table 2 illustrates the corresponding bit-width of each intermediate parameter when the proposed bit-width control method is applied to the BDOF.
  • gray colors highlight the changes that are applied in the proposed bit-width control method compared to the existing BDOF design in VVC.
  • the internal bit-width of the whole BDOF process does not exceed 32-bit.
  • the maximal bit-width is just 32-bit, which can fully utilize the available dynamic range of 32-bit hardware implementation.
  • The multiplication with the worst possible input happens at the product $v_x S_{2,m}$, where the input $S_{2,m}$ is 14-bit and the input $v_x$ is 6-bit. Therefore, like the existing BDOF design, one 16-bit multiplier is also large enough when the proposed method is applied.
  • the final bi-prediction samples of the CU are calculated by interpolating the L0/L1 prediction samples along the motion trajectory based on the optical flow model, as indicated by
  • bit-depth is the internal bit-depth
  • the proposed method is described by the following steps: Firstly, the
  • $th_{BDOF}$ is the motion refinement threshold, which is calculated based on the internal bit-depth as $1 \ll \max(5, \text{bitDepth} - 7)$.
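As a small illustration of the threshold formula, the sketch below evaluates $1 \ll \max(5, \text{bitDepth} - 7)$ for several internal bit-depths. The clamping helper `clip_refinement` is an assumption about how such a threshold is typically used (limiting a motion refinement value to $[-th_{BDOF}, th_{BDOF}]$); the function names are hypothetical.

```python
def bdof_threshold(bit_depth: int) -> int:
    """Motion refinement threshold: 1 << max(5, bit_depth - 7)."""
    return 1 << max(5, bit_depth - 7)


def clip_refinement(value: int, bit_depth: int) -> int:
    """Clamp a motion refinement value to [-th, th] (assumed usage)."""
    th = bdof_threshold(bit_depth)
    return max(-th, min(th, value))


for depth in (8, 10, 12, 14, 16):
    print(depth, bdof_threshold(depth))
```

For internal bit-depths up to 12 the formula yields 32, which matches the constant threshold used in the other variants described in this section; the threshold only grows for higher bit-depths.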
  • FIG. 5 shows a method 500 of decoding a video signal in accordance with the present disclosure.
  • the method may be, for example, applied to a decoder.
  • the decoder may obtain a first reference picture $I^{(0)}$ and a second reference picture $I^{(1)}$ associated with a video block.
  • the first reference picture $I^{(0)}$ may be before a current picture and the second reference picture $I^{(1)}$ may be after the current picture in display order.
  • the decoder may obtain first prediction samples $I^{(0)}(i,j)$ of the video block from a reference block in the first reference picture $I^{(0)}$.
  • the numbers i and j may represent a coordinate of one sample within the current picture.
  • the decoder may obtain second prediction samples $I^{(1)}(i,j)$ of the video block from a reference block in the second reference picture $I^{(1)}$.
  • the decoder may control internal bit-depths of the BDOF by applying right-shifting to internal BDOF parameters, where the BDOF is independent of an input video bit-depth, and where the internal BDOF parameters include horizontal gradient values and vertical gradient values derived based on the first prediction samples $I^{(0)}(i,j)$, the second prediction samples $I^{(1)}(i,j)$, and sample differences between the first prediction samples $I^{(0)}(i,j)$ and the second prediction samples $I^{(1)}(i,j)$.
  • the decoder may obtain final bi-prediction samples of the video block based on the BDOF being applied to the video block based on the first prediction samples $I^{(0)}(i,j)$ and the second prediction samples $I^{(1)}(i,j)$.
  • $th_{BDOF}$ is the motion refinement threshold, which is one constant number equal to 32.
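The right-shifting of internal BDOF parameters in method 500 can be illustrated with a rough sketch. The exact gradient and sample-difference formulas are not reproduced in this section, so the code below assumes a common form: gradients taken as the difference of the two neighboring samples, right-shifted by a shift value, and the sample difference taken between the right-shifted L1 and L0 samples. The function names, the `pred[j][i]` indexing, and the shift values are all hypothetical.

```python
def gradient_pair(pred, i, j, shift):
    """Horizontal and vertical gradients at column i, row j (assumed form).

    Python's >> on negative integers is an arithmetic shift (it floors),
    which is the behaviour usually assumed in codec pseudo-code.
    """
    gx = (pred[j][i + 1] - pred[j][i - 1]) >> shift
    gy = (pred[j + 1][i] - pred[j - 1][i]) >> shift
    return gx, gy


def sample_difference(pred0, pred1, i, j, shift):
    """Difference between right-shifted L1 and L0 samples (assumed form)."""
    return (pred1[j][i] >> shift) - (pred0[j][i] >> shift)


# Toy 3x3 prediction blocks; the values are illustrative only.
pred0 = [[100, 104, 112], [96, 108, 120], [90, 110, 130]]
pred1 = [[102, 106, 110], [98, 116, 118], [94, 112, 126]]

print(gradient_pair(pred0, 1, 1, 2))
print(sample_difference(pred0, pred1, 1, 1, 2))
```

A larger shift reduces the dynamic range of every downstream parameter derived from these values, at the cost of precision; this is the trade-off the bit-width control parameters are described as tuning.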
  • FIG. 6 shows a method 600 of decoding a video signal in accordance with the present disclosure.
  • the method may be, for example, applied to a decoder.
  • the decoder may obtain a horizontal gradient difference value.
  • the horizontal gradient difference value may be the difference between a first horizontal gradient value and a second horizontal gradient value.
  • the decoder may obtain a vertical gradient difference value.
  • the vertical gradient difference value may be the difference between a first vertical gradient value and a second vertical gradient value.
  • the decoder may left shift the horizontal gradient difference value by a third shift value.
  • the decoder may left shift the vertical gradient difference value by the third shift value.
  • the decoder may calculate a sample refinement value based on a sum of a product of the horizontal motion refinement value and the horizontal gradient difference value and a product of the vertical motion refinement value and the vertical gradient difference value.
  • the sample refinement value may be calculated based on equation (27) below.
  • the sample refinement value may be $b$ in equation (27), which is used to calculate $pred_{BDOF}(x, y)$ for the final bi-prediction samples.
  • the decoder may obtain the final bi-prediction samples of the video block based on a sum of the first prediction samples $I^{(0)}(i,j)$, the second prediction samples $I^{(1)}(i,j)$, the sample refinement value, and an offset value.
  • the decoder may right shift the final bi-prediction samples by a fourth shift value.
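The steps of method 600 can be sketched per sample as follows. This is an illustration under stated assumptions, not the disclosed implementation: equation (27) is not reproduced in this section, so the sign convention of the gradient differences (first minus second), the parameter names, and the example values (including a rounding offset of half the final divisor) are hypothetical.

```python
def final_bi_prediction(i0, i1, gx0, gx1, gy0, gy1, vx, vy,
                        shift3, shift4, offset):
    """Combine two prediction samples with a BDOF-style sample refinement.

    i0, i1     : first and second prediction samples at one position
    gx0, gx1   : first and second horizontal gradient values
    gy0, gy1   : first and second vertical gradient values
    vx, vy     : horizontal and vertical motion refinement values
    shift3     : left shift applied to the gradient differences
    shift4     : right shift applied to the final sum
    """
    gx_diff = (gx0 - gx1) << shift3   # assumed sign: first minus second
    gy_diff = (gy0 - gy1) << shift3
    b = vx * gx_diff + vy * gy_diff   # sample refinement value
    return (i0 + i1 + b + offset) >> shift4


# Illustrative values only.
print(final_bi_prediction(i0=400, i1=420, gx0=6, gx1=4, gy0=1, gy1=3,
                          vx=2, vy=-1, shift3=1, shift4=5, offset=16))
```

With these inputs the gradient differences are 4 and -4, the refinement value is 12, and the shifted sum evaluates to 26.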
  • one new bit-shift method as below is proposed for the motion compensated prediction for high internal bit-depths (i.e., > 12-bit) to align the precision of the output prediction signal to one constant number (e.g., 20-bit).
  • the proposed method may include at least the following steps:
  • $S_1 = \sum_{(i,j) \in \Omega} \psi_x(i,j) \cdot \psi_x(i,j)$,
  • $th_{BDOF}$ is the motion refinement threshold, which is one constant number equal to 32.
  • the above methods may be implemented using an apparatus that includes one or more circuitries, which include application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components.
  • FIG. 7 shows a computing environment 710 coupled with a user interface 760.
  • the computing environment 710 can be part of a data processing server.
  • the computing environment 710 includes processor 720, memory 740, and I/O interface 750.
  • the processor 720 typically controls overall operations of the computing environment 710, such as the operations associated with the display, data acquisition, data communications, and image processing.
  • the processor 720 may include one or more processors to execute instructions to perform all or some of the steps in the above-described methods.
  • the processor 720 may include one or more modules that facilitate the interaction between the processor 720 and other components.
  • the processor may be a Central Processing Unit (CPU), a microprocessor, a single chip machine, a GPU, or the like.
  • the memory 740 is configured to store various types of data to support the operation of the computing environment 710.
  • Memory 740 may include predetermined software 742. Examples of such data comprise instructions for any applications or methods operated on the computing environment 710, video datasets, image data, etc.
  • the memory 740 may be implemented by using any type of volatile or non-volatile memory devices, or a combination thereof, such as a static random access memory (SRAM), an electrically erasable programmable read-only memory (EEPROM), an erasable programmable read-only memory (EPROM), a programmable read-only memory (PROM), a read-only memory (ROM), a magnetic memory, a flash memory, a magnetic or optical disk.
  • the I/O interface 750 provides an interface between the processor 720 and peripheral interface modules, such as a keyboard, a click wheel, buttons, and the like.
  • the buttons may include but are not limited to, a home button, a start scan button, and a stop scan button.
  • the I/O interface 750 can be coupled with an encoder and decoder.
  • a non-transitory computer-readable storage medium comprising a plurality of programs, such as those comprised in the memory 740, executable by the processor 720 in the computing environment 710, for performing the above-described methods.
  • the non-transitory computer-readable storage medium may be a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disc, an optical data storage device, or the like.
  • the non-transitory computer-readable storage medium has stored therein a plurality of programs for execution by a computing device having one or more processors, where the plurality of programs when executed by the one or more processors, cause the computing device to perform the above-described method for motion prediction.
  • the computing environment 710 may be implemented with one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), graphical processing units (GPUs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above methods.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Methods, apparatuses, and non-transitory computer-readable storage media are provided for decoding a video signal. The method includes: obtaining a first reference picture I (0) and a second reference picture I (1) associated with a video block; obtaining first prediction samples I (0) (i, j) of the video block from a reference block in the first reference picture I (0); obtaining second prediction samples I (1) (i, j) of the video block from a reference block in the second reference picture I (1); controlling internal bit-widths of the BDOF by applying right-shifting to internal BDOF parameters; and obtaining final bi-prediction samples of the video block based on the BDOF being applied to the video block based on the first prediction samples I (0) (i, j) and the second prediction samples I (1) (i, j).
PCT/US2020/039702 2019-06-25 2020-06-25 Apparatuses and methods for bit-width control of bi-directional optical flow Ceased WO2020264221A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202080045432.8A CN114175659B (zh) 2019-06-25 2020-06-25 Apparatus and method for bit-depth control of bi-directional optical flow

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201962866607P 2019-06-25 2019-06-25
US62/866,607 2019-06-25
US201962867185P 2019-06-26 2019-06-26
US62/867,185 2019-06-26

Publications (1)

Publication Number Publication Date
WO2020264221A1 true WO2020264221A1 (fr) 2020-12-30

Family

ID=74061322

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2020/039702 Ceased WO2020264221A1 (fr) 2019-06-25 2020-06-25 Apparatuses and methods for bit-width control of bi-directional optical flow

Country Status (2)

Country Link
CN (1) CN114175659B (fr)
WO (1) WO2020264221A1 (fr)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016054765A1 (fr) * 2014-10-08 2016-04-14 Microsoft Technology Licensing, Llc Adjustments to encoding and decoding when switching color spaces
EP3413563A1 (fr) * 2016-02-03 2018-12-12 Sharp Kabushiki Kaisha Moving image decoding device, moving image encoding device, and prediction image generation device

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100890512B1 (ko) * 2008-09-11 2009-03-26 엘지전자 주식회사 Method for determining motion vector
EP2661892B1 (fr) * 2011-01-07 2022-05-18 Nokia Technologies Oy Motion prediction in video coding
WO2017036399A1 (fr) * 2015-09-02 2017-03-09 Mediatek Inc. Method and apparatus of motion compensation for video coding based on bi-prediction optical flow techniques
WO2018230493A1 (fr) * 2017-06-14 2018-12-20 シャープ株式会社 Video decoding device, video encoding device, prediction image generation device, and motion vector derivation device
US10904565B2 (en) * 2017-06-23 2021-01-26 Qualcomm Incorporated Memory-bandwidth-efficient design for bi-directional optical flow (BIO)
KR102580910B1 (ko) * 2017-08-29 2023-09-20 에스케이텔레콤 주식회사 Method and apparatus for motion compensation using bi-directional optical flow

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
BENJAMIN BROSS et al., Versatile Video Coding (Draft 5), Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, [Document: JVET-N1001-v7 (Version 7)], 14th Meeting: Geneva, CH, PP. 1-371, 29 May 2019 [Retrieved on 10-Sep-2020], from <http://phenix.int-evry.fr/jvet/> pages 222-223 *
JIANCONG (DANIEL) LUO et al., CE2-related: Prediction refinement with optical flow for affine mode, Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, [Document: JVET-N0236-r5 (Version 7)], 14th Meeting: Geneva, CH, PP. 1-7, 26 March 2019 [Retrieved on 10-Sep-2020], from <http://phenix.int-evry.fr/jvet/> pages 1-4 *
XIAOYU XIU et al., CE9-related: Improvements on bi-directional optical flow (BDOF), Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, [Document: JVET-N0325 (Version 3)], 14th Meeting: Geneva, CH, 26 March 2019 [Retrieved on 10-Sep-2020], from <http://phenix.int-evry.fr/jvet/> pages 1-2, 4 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3891990A4 (fr) * 2019-01-06 2022-06-15 Beijing Dajia Internet Information Technology Co., Ltd. Bit-width control for bi-directional optical flow
US11388436B2 (en) 2019-01-06 2022-07-12 Beijing Dajia Internet Information Technology Co., Ltd. Bit-width control for bi-directional optical flow
US11743493B2 (en) 2019-01-06 2023-08-29 Beijing Dajia Internet Information Technology Co., Ltd. Bit-width control for bi-directional optical flow
US12137244B2 (en) 2019-01-06 2024-11-05 Beijing Dajia Internet Information Technology Co., Ltd. Bit-width control for bi-directional optical flow
US12238331B2 (en) 2019-01-06 2025-02-25 Beijing Dajia Internet Information Technology Co., Ltd. Bit-width control for bi-directional optical flow

Also Published As

Publication number Publication date
CN114175659B (zh) 2026-02-13
CN114175659A (zh) 2022-03-11

Similar Documents

Publication Publication Date Title
US12341973B2 (en) Methods and devices for bit-width control for bi-directional optical flow
JP7463460B2 (ja) Video decoding method and video decoder
KR20210099008A (ko) Method and apparatus for deblocking an image
EP3891990A1 (fr) Bit-width control for bi-directional optical flow
WO2021041332A1 (fr) Methods and apparatus for prediction refinement with optical flow
WO2021072326A1 (fr) Methods and apparatuses for prediction refinement with optical flow, bi-directional optical flow, and decoder-side motion vector refinement
WO2020220048A1 (fr) Methods and apparatuses for prediction refinement with optical flow
EP4032298A1 (fr) Methods and apparatuses for prediction refinement with optical flow
EP3991431A1 (fr) Methods and apparatus for prediction refinement with optical flow
EP3909241A1 (fr) System and method for improving combined inter and intra prediction
WO2020257629A1 (fr) Methods and apparatuses for prediction refinement with optical flow
WO2020223552A1 (fr) Methods and apparatus for prediction refinement with optical flow
WO2020264221A1 (fr) Apparatuses and methods for bit-width control of bi-directional optical flow
WO2021188707A1 (fr) Methods and apparatuses for simplification of bi-directional optical flow and decoder-side motion vector refinement
CN113615197B (zh) Method and device for bit-depth control of bi-directional optical flow
WO2020159990A1 (fr) Methods and apparatus for intra prediction for screen content coding

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20831137

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20831137

Country of ref document: EP

Kind code of ref document: A1