WO2020007093A1 - Image prediction method and apparatus - Google Patents
Image prediction method and apparatus
- Publication number
- WO2020007093A1 (PCT/CN2019/082942)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- block
- image block
- current image
- motion information
- sub
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/103—Selection of coding mode or of prediction mode
- H04N19/105—Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
- H04N19/176—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/593—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques
Definitions
- Embodiments of the present invention relate to the technical field of video encoding and decoding, and in particular, to an image prediction method and device.
- digital video devices can implement video compression technology.
- video compression technology can be used at the encoding end to encode video data, and the video data can be decoded at the decoding end.
- video data can be efficiently encoded, decoded, and/or stored according to the encoding and decoding methods defined in the relevant video compression standards (such as MPEG-2, MPEG-4, ITU-T H.263, and ITU-T H.264).
- in these video compression technologies, intra-frame prediction and inter-frame prediction can be performed on video frames, thereby reducing redundant information in the video data.
- Common video coding technologies include block-based video coding. Specifically, a frame of image (i.e., a video frame) can be divided into several image blocks.
- Such an image block can also be called a coding tree unit (CTU), a coding unit (CU), and/or a coding node. The High Efficiency Video Coding (HEVC) standard and other video coding standards have proposed predictive coding modes based on image blocks, in which already-encoded video data blocks are used to predict the current image block to be encoded.
- the encoding end can predict the current image block based on encoded neighboring blocks in the same frame and then encode the current image block; or, the encoding end can predict the current image block based on encoded reference blocks in other video frames of the video sequence (which may be referred to as reference frames) and then encode the current image block.
- the existing encoding and decoding methods in the above-mentioned HEVC standard still have some shortcomings.
- the accuracy of prediction results of image blocks is relatively low.
- the present application provides an image prediction method and device, which can improve the performance of encoding and decoding and reduce the complexity of encoding and decoding.
- the present application provides an image prediction method, which may include: determining motion information of control points of a current image block to be predicted according to motion information of control points of neighboring image blocks of the current image block; using an affine transformation model to determine motion information of sub-blocks of the current image block according to the motion information of the control points of the current image block; and obtaining prediction blocks of the sub-blocks of the current image block according to the motion information of the sub-blocks.
- Adjacent image blocks of the current image block satisfy at least one of the following conditions:
- the adjacent image block of the current image block is an image block located to the left or lower left of the current image block, and the adjacent image blocks do not include image blocks located above, to the upper left of, or to the upper right of the current image block;
- the adjacent image block of the current image block is an image block located above or to the upper right of the current image block, and the adjacent image blocks do not include image blocks located to the upper left, left, or lower left of the current image block.
- the adjacent image block of the current image block may be an image block adjacent to an edge of the current image block or an image block adjacent to a point of the current image block, and both the current image block and its adjacent image blocks are CUs. It can be understood that the sub-blocks of the current image block are sub-blocks of a CU.
- in the image prediction method, during encoding and decoding of the current image block, only some of the adjacent image blocks of the current image block are selected, by determining whether a boundary of the current image block coincides with a boundary of the CTU where the current image block is located, and the selected blocks are used to determine the motion information of the control points of the current image block. The motion information of the control points of the other adjacent image blocks is no longer acquired across the CTU boundary. In this way, the resources consumed by encoding and decoding can be saved.
- the motion information includes a motion vector, and determining the motion information of the control points of the current image block according to the motion information of the control points of the adjacent image blocks of the current image block to be predicted includes: calculating the motion vector of a control point of the current image block using the following formula:
- where $(vx_4, vy_4)$ is the motion vector of the control point $(x_4, y_4)$ located at the upper-left vertex of the adjacent image block, $(vx_5, vy_5)$ is the motion vector of the control point $(x_5, y_5)$ located at the upper-right vertex of the adjacent image block, and $(vx, vy)$ is the motion vector of the control point $(x, y)$ of the current image block.
- in this way, the motion information of the control points of the current image block can be determined with the 4-parameter motion model of the adjacent image block, based on the motion information of two control points of that adjacent image block.
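The formula referred to above is not reproduced in this text. As an assumption consistent with the symbol definitions and with the 4-parameter motion model mentioned, the commonly used form would be the following, where the two control points of the adjacent image block are assumed to lie on the same row ($y_5 = y_4$) so that $x_5 - x_4$ is the block width:

```latex
\begin{aligned}
vx &= vx_4 + \frac{vx_5 - vx_4}{x_5 - x_4}\,(x - x_4) - \frac{vy_5 - vy_4}{x_5 - x_4}\,(y - y_4) \\
vy &= vy_4 + \frac{vy_5 - vy_4}{x_5 - x_4}\,(x - x_4) + \frac{vx_5 - vx_4}{x_5 - x_4}\,(y - y_4)
\end{aligned}
```

Under this assumed form, substituting $(x, y) = (x_4, y_4)$ returns $(vx_4, vy_4)$, and substituting $(x_5, y_4)$ returns $(vx_5, vy_5)$, so the model reproduces the adjacent block's own control-point motion vectors.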
- the motion information includes a motion vector, and using an affine transformation model to determine the motion information of a sub-block of the current image block according to the motion information of the control points of the current image block includes: calculating the motion vector of the sub-block of the current image block using the following formula:
- where $(vx_0, vy_0)$ is the motion vector of the control point $(x_0, y_0)$ located at the upper-left vertex of the current image block, $(vx_1, vy_1)$ is the motion vector of the control point $(x_1, y_1)$ located at the upper-right vertex of the current image block, and $(vx, vy)$ is the motion vector of the sub-block.
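The formula itself is missing from this text. The sketch below assumes the standard 4-parameter affine form built from the two control points defined above; the function name `affine_mv` and its argument layout are hypothetical and chosen only for illustration.

```python
def affine_mv(cp0, mv0, cp1, mv1, x, y):
    """Motion vector at (x, y) under an assumed 4-parameter affine model.

    cp0, mv0: position and motion vector of the upper-left control point.
    cp1, mv1: position and motion vector of the upper-right control point.
    Assumed form (not taken verbatim from the source):
        vx = a*(x - x0) - b*(y - y0) + vx0
        vy = b*(x - x0) + a*(y - y0) + vy0
    with a = (vx1 - vx0)/(x1 - x0) and b = (vy1 - vy0)/(x1 - x0).
    """
    (x0, y0), (x1, _y1) = cp0, cp1
    (vx0, vy0), (vx1, vy1) = mv0, mv1
    width = x1 - x0
    if width == 0:
        raise ValueError("control points must have different x coordinates")
    a = (vx1 - vx0) / width
    b = (vy1 - vy0) / width
    vx = a * (x - x0) - b * (y - y0) + vx0
    vy = b * (x - x0) + a * (y - y0) + vy0
    return vx, vy
```

When both control points carry the same motion vector the model degenerates to pure translation, which is a quick sanity check on any implementation.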
- obtaining a prediction block of a sub-block of the current image block according to the motion information of the sub-block includes: according to the motion vector in the motion information of the sub-block and the position information of the sub-block, determining, in the reference frame of the current image block, the reference block pointed to by the motion vector, and using that reference block as the prediction block of the sub-block.
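A minimal sketch of this step follows. It assumes integer-pixel motion vectors and clamps coordinates at the frame border; both choices are simplifications made here (real codecs use fractional-pel interpolation), and `fetch_prediction_block` is a hypothetical name.

```python
def fetch_prediction_block(ref_frame, x, y, w, h, mv):
    """Return the w x h reference block that mv points to.

    ref_frame: 2D list of samples (rows of equal length).
    (x, y):    top-left position of the sub-block in the current frame.
    mv:        (dx, dy) integer motion vector (an assumption of this sketch).
    Coordinates outside the frame are clamped to the nearest valid sample.
    """
    height, width = len(ref_frame), len(ref_frame[0])
    dx, dy = mv
    block = []
    for j in range(h):
        ry = min(max(y + dy + j, 0), height - 1)
        row = []
        for i in range(w):
            rx = min(max(x + dx + i, 0), width - 1)
            row.append(ref_frame[ry][rx])
        block.append(row)
    return block
```

For example, with an 8x8 frame whose sample at row r, column c is `10*r + c`, a 2x2 sub-block at (2, 2) with motion vector (1, -1) reads rows 1 to 2 and columns 3 to 4 of the reference frame.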
- the neighboring image blocks of the current image block are spatial neighboring blocks of the current image block.
- the present application provides an image prediction method, which may include: determining whether the upper boundary of a current image block to be predicted coincides with the upper boundary of the CTU where the current image block is located, and whether the left boundary of the current image block coincides with the left boundary of that CTU; when the left boundary of the current image block coincides with the left boundary of the CTU and the upper boundary of the current image block coincides with the upper boundary of the CTU, determining the motion information of the control points of the current image block according to the motion information of the neighboring sub-blocks of the control points of the current image block, the neighboring sub-blocks being sub-blocks of a CU; using an affine transformation model to determine the motion information of the sub-blocks of the current image block according to the motion information of the control points of the current image block; and obtaining prediction blocks of the sub-blocks of the current image block according to the motion information of the sub-blocks.
- in this method, the motion information of the sub-blocks adjacent to the control points of the current image block is used to determine the motion information of the control points, so that the current image block can be predicted without acquiring the motion information of the control points of adjacent image blocks across the CTU boundary, which saves resources.
- determining the motion information of the control points of the current image block based on the motion information of the neighboring sub-blocks of the control points includes: determining the motion information of a neighboring sub-block of a control point of the current image block as the motion information of that control point; or, determining, in a preset order, whether the neighboring sub-blocks of the control point of the current image block are available, and determining the motion information of the first available neighboring sub-block as the motion information of the control point.
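The second option, scanning neighbors in a preset order and taking the first available one, can be sketched as below. The candidate labels and the use of `None` to mark an unavailable neighbor are assumptions of this sketch; the source does not specify the order or the availability test.

```python
def first_available_mv(candidates):
    """Return (label, mv) of the first available neighbor, or None.

    candidates: list of (label, mv) pairs already arranged in the preset
    order; mv is None when that neighboring sub-block is unavailable
    (for example, not yet coded or not inter-coded, by assumption).
    """
    for label, mv in candidates:
        if mv is not None:
            return label, mv
    return None
```

A call such as `first_available_mv([("A", None), ("B", (1, 2)), ("C", (3, 4))])` skips the unavailable neighbor "A" and returns `("B", (1, 2))`.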
- the motion information includes a motion vector, and using the affine transformation model to determine the motion information of the sub-blocks of the current image block according to the motion information of the control points of the current image block includes: calculating the motion vector of the target pixel in the sub-block of the current image block using the following formula:
- where $(vx_0, vy_0)$ is the motion vector of the control point $(x_0, y_0)$ located at the upper-left vertex of the current image block, $(vx_1, vy_1)$ is the motion vector of the control point $(x_1, y_1)$ located at the upper-right vertex of the current image block, and $(vx, vy)$ is the motion vector of the target pixel $(x, y)$.
- obtaining the prediction block of a sub-block of the current image block according to the motion information of the sub-block includes: according to the motion vector in the motion information of the sub-block and the position information of the sub-block, determining, in the reference frame of the current image block, the reference block pointed to by the motion vector, and using that reference block as the prediction block of the sub-block.
- the image prediction method provided in this application further includes: when the left boundary of the current image block coincides with the left boundary of the CTU where the current image block is located, or the upper boundary of the current image block coincides with the upper boundary of that CTU, determining the motion information of the control points of the current image block according to the motion information of the control points of an adjacent image block of the current image block; using an affine transformation model to determine the motion information of the sub-blocks of the current image block according to the motion information of the control points of the current image block; and obtaining prediction blocks of the sub-blocks of the current image block according to the motion information of the sub-blocks.
- Adjacent image blocks of the current image block satisfy at least one of the following conditions:
- the adjacent image block of the current image block is an image block located to the left or lower left of the current image block, and the adjacent image blocks do not include image blocks located above, to the upper left of, or to the upper right of the current image block;
- the adjacent image block of the current image block is an image block located above or to the upper right of the current image block, and the adjacent image blocks do not include image blocks located to the upper left, left, or lower left of the current image block.
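The boundary test and the resulting choice of candidates can be sketched as follows. Several points are assumptions of this sketch rather than statements of the source: the CTU grid is square and aligned to the picture origin, the pairing of each boundary case with a candidate set is inferred from the goal of not reading motion information across the CTU boundary, and the case where neither boundary coincides is left unhandled because this passage does not describe it. All names are hypothetical.

```python
def ctu_boundary_flags(x, y, ctu_size):
    """Return (top_coincides, left_coincides) for a block at (x, y).

    Assumes CTUs form a square grid of side ctu_size aligned to (0, 0).
    """
    return (y % ctu_size == 0, x % ctu_size == 0)


def select_candidates(x, y, ctu_size):
    """Pick the derivation mode and the allowed neighbor positions."""
    top, left = ctu_boundary_flags(x, y, ctu_size)
    if top and left:
        # Both boundaries coincide: use sub-blocks next to the control points.
        return ("control_point_neighbor_subblocks", [])
    if top:
        # Blocks above lie in another CTU: keep left-side neighbors only
        # (inferred pairing, see lead-in).
        return ("neighbor_block_control_points", ["left", "lower_left"])
    if left:
        # Blocks to the left lie in another CTU: keep upper neighbors only
        # (inferred pairing, see lead-in).
        return ("neighbor_block_control_points", ["above", "upper_right"])
    return None  # case not covered by the passage above
```

With 64x64 CTUs, a block at (8, 64) touches only the upper CTU boundary, so under these assumptions it would draw on the left and lower-left neighbors.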
- the present application provides a method for predicting motion information. The method includes: acquiring motion information of a target control point of a current image block to be predicted; using an affine transformation model to determine, according to the motion information of the target control point, the motion information of a target pixel in a sub-block of the current image block, and using the motion information of the target pixel as the motion information of the sub-block, the target pixel being a pixel different from the target control point; and using the motion information of the sub-block to predict the motion information of a neighboring image block of the sub-block, so as to determine the motion information of that neighboring image block, where the neighboring image block of the sub-block is adjacent to the current image block.
- the target control point of the current image block may include at least two of the control points of the upper left vertex, the upper right vertex, the lower left vertex, and the lower right vertex of the current image block.
- predicting the motion information of the adjacent image blocks of the sub-block may mean determining the motion information of those adjacent image blocks according to the motion information of the sub-blocks of the current image block.
- the method for predicting motion information may use an affine transformation model to determine the motion information of a target pixel in a sub-block of the current image block according to the motion information of the target control point of the current image block to be predicted, and use the motion information of the target pixel as the motion information of the sub-block, so that the motion information of the sub-block is used to predict, and thereby determine, the motion information of the adjacent image block of the sub-block.
- in a first optional implementation of the third aspect, the method for predicting motion information provided in the present application further includes: determining the prediction block of the neighboring image block of the sub-block according to the motion information of the neighboring image block of the sub-block of the current image block.
- the method for predicting motion information provided in this application further includes: performing deblocking filtering on the sub-block according to the motion information of the sub-block of the current image block; or, performing overlapped block motion compensation on the sub-block according to the motion information of the sub-block of the current image block.
- the method for predicting motion information provided in this application further includes: using the motion information of the target control point to predict the motion information of a control point of an adjacent image block of the current image block, so as to determine the motion information of the control point of the adjacent image block.
- a sub-block of the current image block includes a control point of the current image block.
- the method for predicting motion information provided in this application further includes: setting the motion information of each pixel in a sub-block of the current image block as the motion information of the sub-block.
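As a small illustration of this step, the sketch below writes one sub-block motion vector into every pixel position of that sub-block in a per-pixel motion field. The 2D-list layout of the field and the name `fill_subblock_mv` are assumptions made for the example.

```python
def fill_subblock_mv(mv_field, x0, y0, w, h, mv):
    """Assign mv to every pixel of the w x h sub-block at (x0, y0).

    mv_field is a 2D list indexed as mv_field[y][x]; it is modified in
    place and also returned for convenience.
    """
    for y in range(y0, y0 + h):
        for x in range(x0, x0 + w):
            mv_field[y][x] = mv
    return mv_field
```

Pixels outside the sub-block are left untouched, so repeated calls for each sub-block build up the full field.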
- the method for predicting motion information provided in this application further includes: saving motion information of pixels in a sub-block of the current image block.
- the method for predicting motion information provided in this application further includes: saving at least one of the motion information of the sub-block of the current image block and the motion information of the target control point of the current image block.
- the saved motion information of the sub-blocks of the current image block may be used to determine the motion information of the control points of an adjacent image block, or the motion information of the control points of the adjacent image block may be determined based on the saved motion information of the target control point of the current image block. The motion information of the sub-blocks of the adjacent image block is then determined, the prediction blocks of those sub-blocks are obtained, and the prediction block of the adjacent image block is thereby obtained.
- the motion information of the sub-blocks of the current image block and the motion information of the control points of the current image block are stored in different storage locations, for example, different locations in the memory of the codec device or different locations in a storage device outside the codec device; this is not limited in this application.
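One way to picture the separate storage is two independent containers, one keyed by sub-block and one keyed by block. This is only a sketch: the class `MotionStore`, its keys, and the in-memory dictionaries are hypothetical, and the source does not prescribe any particular layout.

```python
from dataclasses import dataclass, field


@dataclass
class MotionStore:
    # (block_id, (sub_x, sub_y)) -> sub-block motion vector
    subblock_mvs: dict = field(default_factory=dict)
    # block_id -> list of (control point position, motion vector)
    control_point_mvs: dict = field(default_factory=dict)

    def save_subblock_mv(self, block_id, sub_xy, mv):
        self.subblock_mvs[(block_id, sub_xy)] = mv

    def save_control_point_mvs(self, block_id, cp_mvs):
        self.control_point_mvs[block_id] = list(cp_mvs)
```

Keeping the two kinds of motion information apart means that writing sub-block motion vectors (for example, for motion compensation) never overwrites the control-point motion vectors that a later block might use to derive its own control points.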
- the target pixel point may be a central pixel point of a sub-block of the current image block.
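The sketch below evaluates an assumed 4-parameter affine model at the central pixel of every sub-block. The sub-block size of 4, the choice of `sub // 2` as the offset of the "central" pixel, the standard form of the affine model, and the name `subblock_center_mvs` are all assumptions of this example.

```python
def subblock_center_mvs(cp0, mv0, cp1, mv1, block_w, block_h, sub=4):
    """Map each sub-block offset (sx, sy) to the MV at its central pixel.

    cp0/mv0 and cp1/mv1 are the upper-left and upper-right control points
    of the block and their motion vectors.
    """
    (x0, y0), (x1, _y1) = cp0, cp1
    (vx0, vy0), (vx1, vy1) = mv0, mv1
    a = (vx1 - vx0) / (x1 - x0)
    b = (vy1 - vy0) / (x1 - x0)
    mvs = {}
    for sy in range(0, block_h, sub):
        for sx in range(0, block_w, sub):
            # Offset of the central pixel relative to the upper-left control point.
            dx = sx + sub // 2
            dy = sy + sub // 2
            mvs[(sx, sy)] = (a * dx - b * dy + vx0, b * dx + a * dy + vy0)
    return mvs
```

For an 8x8 block with control-point motion vectors (0, 0) at (0, 0) and (8, 0) at (8, 0), the model is a pure zoom, and the four 4x4 sub-blocks receive the motion vectors of their centers, for example (2.0, 2.0) for the first and (6.0, 6.0) for the last.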
- the present application provides an image prediction device, including a first determination module, a second determination module, and a third determination module.
- the first determination module is configured to determine the motion information of the control points of the current image block based on the motion information of the control points of the adjacent image blocks of the current image block to be predicted;
- the second determination module is configured to use the affine transformation model to determine the motion information of the sub-blocks of the current image block according to the motion information of the control points of the current image block;
- the third determination module is configured to obtain the prediction blocks of the sub-blocks of the current image block based on the motion information of the sub-blocks of the current image block.
- Adjacent image blocks of the current image block satisfy at least one of the following conditions:
- the adjacent image block of the current image block is an image block located to the left or lower left of the current image block, and the adjacent image blocks do not include image blocks located above, to the upper left of, or to the upper right of the current image block;
- the adjacent image block of the current image block is an image block located above or to the upper right of the current image block, and the adjacent image blocks do not include image blocks located to the upper left, left, or lower left of the current image block.
- the motion information includes a motion vector, and the first determining module is specifically configured to calculate the motion vector of a control point of the current image block by using the following formula:
- where $(vx_4, vy_4)$ is the motion vector of the control point $(x_4, y_4)$ located at the upper-left vertex of the adjacent image block, $(vx_5, vy_5)$ is the motion vector of the control point $(x_5, y_5)$ located at the upper-right vertex of the adjacent image block, and $(vx, vy)$ is the motion vector of the control point $(x, y)$ of the current image block.
- the motion information includes a motion vector, and the second determining module is specifically configured to calculate the motion vector of a sub-block of the current image block by using the following formula:
- where $(vx_0, vy_0)$ is the motion vector of the control point $(x_0, y_0)$ located at the upper-left vertex of the current image block, $(vx_1, vy_1)$ is the motion vector of the control point $(x_1, y_1)$ located at the upper-right vertex of the current image block, and $(vx, vy)$ is the motion vector of the sub-block.
- the third determining module is specifically configured to determine, in the reference frame of the current image block, the reference block pointed to by the motion vector in the motion information of a sub-block of the current image block, based on that motion vector and the position information of the sub-block, and to use the reference block as the prediction block of the sub-block.
- the adjacent image blocks of the current image block are spatially adjacent blocks of the current image block.
- the present application provides an image prediction apparatus including a first determination module, a second determination module, a third determination module, and a fourth determination module.
- the first determining module is configured to determine whether the upper boundary of the current image block to be predicted coincides with the upper boundary of the CTU where the current image block is located, and whether the left boundary of the current image block coincides with the left boundary of that CTU;
- the second determining module is configured to: when the left boundary of the current image block coincides with the left boundary of the CTU where the current image block is located, and the upper boundary of the current image block coincides with the upper boundary of that CTU, determine the motion information of the control points of the current image block according to the motion information of the adjacent sub-blocks of the control points of the current image block, the adjacent sub-blocks being sub-blocks of a coding unit (CU);
- the third determination module is configured to use an affine transformation model to determine the motion information of the sub-blocks of the current image block according to the motion information of the control points of the current image block;
- the fourth determination module is configured to obtain the prediction blocks of the sub-blocks of the current image block based on the motion information of the sub-blocks of the current image block.
- the second determining module is specifically configured to determine motion information of a neighboring subblock of a control point of the current image block as motion information of a control point of the current image block.
- the second determining module is specifically configured to determine, in a preset order, whether the adjacent sub-blocks of a control point of the current image block are available, and to determine the motion information of the first available adjacent sub-block as the motion information of that control point.
- the motion information includes a motion vector, and the third determining module is specifically configured to calculate the motion vector of a target pixel in a sub-block of the current image block by using the following formula:
- where $(vx_0, vy_0)$ is the motion vector of the control point $(x_0, y_0)$ located at the upper-left vertex of the current image block, $(vx_1, vy_1)$ is the motion vector of the control point $(x_1, y_1)$ located at the upper-right vertex of the current image block, and $(vx, vy)$ is the motion vector of the target pixel $(x, y)$.
- the fourth determining module is specifically configured to determine, in the reference frame of the current image block, the reference block pointed to by the motion vector in the motion information of a sub-block of the current image block, according to that motion vector and the position information of the sub-block, and to use the reference block as the prediction block of the sub-block.
- the second determining module is further configured to: when the left boundary of the current image block coincides with the left boundary of the CTU where the current image block is located, or the upper boundary of the current image block coincides with the upper boundary of that CTU, determine the motion information of the control points of the current image block according to the motion information of the control points of the adjacent image blocks of the current image block; the third determination module is further configured to use an affine transformation model to determine the motion information of the sub-blocks of the current image block according to the motion information of the control points of the current image block; and the fourth determination module is further configured to obtain a prediction block of a sub-block of the current image block according to the motion information of the sub-block.
- Adjacent image blocks of the current image block satisfy at least one of the following conditions:
- the adjacent image block of the current image block includes the image block to the left or lower left of the current image block, and the adjacent image blocks do not include image blocks located above, to the upper left of, or to the upper right of the current image block;
- the adjacent image block of the current image block is an image block located above or to the upper right of the current image block, and the adjacent image blocks do not include image blocks located to the upper left, left, or lower left of the current image block.
- the present application provides a device for predicting motion information, including an acquisition module and a determination module.
- the obtaining module is used to obtain the motion information of the target control point of the current image block to be predicted;
- the determination module is configured to: use the affine transformation model to determine, according to the motion information of the target control point, the motion information of a target pixel in a sub-block of the current image block, and use the motion information of the target pixel as the motion information of the sub-block, the target pixel being a pixel different from the target control point; and use the motion information of the sub-block to predict the motion information of a neighboring image block of the sub-block, so as to determine the motion information of that neighboring image block, where the neighboring image block of the sub-block is adjacent to the current image block.
- the foregoing determining module is further configured to determine a prediction block of an adjacent image block of the current image block according to the motion information of an adjacent image block of the current image block.
- the apparatus for predicting motion information provided in this application further includes a processing module; the processing module is configured to perform deblocking filtering on the sub-block according to the motion information of the sub-block of the current image block, or to perform overlapped block motion compensation on the sub-block according to the motion information of the sub-block of the current image block.
- the determining module is further configured to use the motion information of the target control point of the current image block to predict the motion information of the control points of an adjacent image block of the current image block, so as to determine the motion information of the control points of the adjacent image block.
- a sub-block of the current image block includes a control point of the current image block.
- the foregoing determining module is further configured to set motion information of each pixel in the sub-block as motion information of the sub-block.
- the apparatus for predicting motion information provided in this application further includes a storage module; the storage module is configured to store motion information of pixels in a sub-block of the current image block.
- the apparatus for predicting motion information provided in this application further includes a storage module; the storage module is configured to store at least one of the motion information of the sub-block of the current image block and the motion information of the target control point.
- the target pixel point is a central pixel point of a sub-block of the current image block.
- the present application provides an image prediction device, including a processor and a memory coupled to the processor; the memory is configured to store computer instructions, and when the image prediction device runs, the processor executes the computer instructions stored in the memory, so that the image prediction device executes the image prediction method according to any one of the first aspect and its optional implementations or any one of the second aspect and its optional implementations.
- the present application provides a computer-readable storage medium including computer instructions that, when run on an image prediction device, cause the image prediction device to execute the image prediction method according to any one of the first aspect and its optional implementations or any one of the second aspect and its optional implementations.
- the present application provides a computer program product containing instructions that, when the computer program product runs on an image prediction device, cause the image prediction device to execute the image prediction method according to any one of the first aspect and its optional implementations or any one of the second aspect and its optional implementations.
- the present application provides a motion information prediction device, including a processor and a memory coupled to the processor; the memory is configured to store computer instructions, and when the motion information prediction device runs, the processor executes the computer instructions stored in the memory, so that the motion information prediction device executes the method for predicting motion information according to any one of the third aspect and its optional implementations.
- the present application provides a computer-readable storage medium including computer instructions that, when run on a motion information prediction device, cause the motion information prediction device to execute the method for predicting motion information according to any one of the third aspect and its optional implementations.
- the present application provides a computer program product containing instructions that, when the computer program product runs on a motion information prediction device, cause the motion information prediction device to execute the method for predicting motion information according to any one of the third aspect and its optional implementations.
- the present application provides an encoding and decoding method and device based on affine transformation, and a corresponding encoder and decoder, which improve, to a certain extent, the result or efficiency of the inter prediction mode.
- an encoding and decoding method based on affine transformation includes: obtaining motion information of a control point of a current affine coding block; determining, according to the motion information of the control point of the current affine coding block and using an affine transformation model, motion information of a motion compensation unit in the current affine coding block; and performing motion compensation prediction on the motion compensation unit according to the motion information of the motion compensation unit, thereby obtaining the prediction block of the affine coding block.
- the obtaining motion information of a control point of the current affine coding block includes: determining the motion information of the control points of the current affine coding block according to the motion information of the control points of an affine coding block adjacent to the current affine coding block.
- the adjacent affine coding block is a spatially adjacent block.
- when the upper boundary of the current affine coding block coincides with the upper boundary of the coding tree unit (CTU) where the current affine coding block is located, the adjacent affine coding block is located to the left or lower left of the current affine coding block, and is not located above, to the upper left of, or to the upper right of the current affine coding block.
- CTU coding tree unit
- when the left boundary of the current affine coding block coincides with the left boundary of the coding tree unit (CTU) where the current affine coding block is located, the adjacent affine coding block is not located to the upper left, left, or lower left of the current affine coding block; the adjacent affine coding block is located above or to the upper right of the current affine coding block.
- in a fifth possible implementation manner of the thirteenth aspect, when the left boundary of the current affine coding block coincides with the left boundary of the coding tree unit (CTU) where the current affine coding block is located, the adjacent affine coding block is not located to the left or lower left of the current affine coding block; the adjacent affine coding block is located to the upper left of, above, or to the upper right of the current affine coding block.
- when the left and upper boundaries of the current affine coding block coincide respectively with the left and upper boundaries of the coding tree unit (CTU) where the current affine coding block is located, obtaining the motion information of the control points of the current affine coding block includes: obtaining the motion information of the control points of the current affine coding block based on the motion information of the encoded blocks neighboring the control points of the current affine coding block.
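- the CTU boundary constraints above can be illustrated as a filter over candidate neighbor positions. The following is a minimal sketch only: the position labels and the function name `allowed_neighbor_positions` are hypothetical, the CTU grid is assumed to be aligned to the picture origin, and the sketch follows the variant in which a block on the left CTU boundary excludes the upper-left neighbor as well.

```python
# Hypothetical position labels; the source names these positions only in prose.
ALL_POSITIONS = ("left", "lower_left", "upper", "upper_left", "upper_right")


def allowed_neighbor_positions(block_x, block_y, ctu_size):
    """Return the neighbor positions from which affine control-point
    motion information may be inherited (illustrative assumption)."""
    on_ctu_top = block_y % ctu_size == 0    # upper boundaries coincide
    on_ctu_left = block_x % ctu_size == 0   # left boundaries coincide
    allowed = set(ALL_POSITIONS)
    if on_ctu_top:
        allowed -= {"upper", "upper_left", "upper_right"}
    if on_ctu_left:
        allowed -= {"left", "lower_left", "upper_left"}
    return sorted(allowed)
```

When both boundaries coincide the returned list is empty, which corresponds to the case in which the control-point motion information is instead derived from the encoded blocks neighboring the control points.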
- the method further includes: performing, according to the motion information of the image block where a control point of the current affine coding block is located, at least one of the following operations: deblocking filtering, overlapped block motion compensation, prediction of motion information of a non-affine coded block, prediction of motion information based on a combination of control points of an affine coded block, and prediction of temporal motion information; wherein the motion information of the image block where the control point of the current affine coding block is located is different from the motion information of the control point of the current affine coding block.
- the motion information of the image block where the control point of the current affine coding block is located is the motion information of the motion compensation unit of that image block, or the motion information of the central pixel point of that image block.
- an affine transformation-based codec device includes modules for executing the method in the thirteenth aspect or any implementation of the thirteenth aspect.
- a codec includes a non-volatile memory and a processor coupled to each other, and the processor calls program code stored in the memory to execute some or all of the steps of the method in the thirteenth aspect or any implementation of the thirteenth aspect.
- a computer-readable storage medium stores program code, where the program code includes instructions for executing some or all of the steps of the method in any implementation of the thirteenth aspect.
- a seventeenth aspect provides a computer program product that, when run on a computer, causes the computer to execute instructions for some or all of the steps of the method in any implementation of the thirteenth aspect.
- FIG. 1 is a schematic diagram of neighboring image blocks of a current image block according to an embodiment of the present invention
- FIG. 2 is a schematic diagram of a control point of a current image block according to an embodiment of the present invention.
- FIG. 3 is a schematic structural diagram of a video encoding and decoding system according to an embodiment of the present invention.
- FIG. 4 is a flowchart of an inter prediction method in a video encoding process according to an embodiment of the present invention.
- FIG. 5 is a flowchart of an inter prediction method in a video decoding process according to an embodiment of the present invention.
- FIG. 6A is a hardware schematic diagram of a video encoder according to an embodiment of the present invention.
- FIG. 6B is a hardware schematic diagram of a video decoder according to an embodiment of the present invention.
- FIG. 7 is a schematic diagram of a method for predicting motion information according to an embodiment of the present invention.
- FIG. 8 is a first schematic diagram of neighboring sub-blocks of a current image block according to an embodiment of the present invention.
- FIG. 9 is a second schematic diagram of a neighboring sub-block of a current image block according to an embodiment of the present invention.
- FIG. 10 is a third schematic diagram of neighboring sub-blocks of a current image block according to an embodiment of the present invention.
- FIG. 11 is a first schematic diagram of an image prediction method according to an embodiment of the present invention.
- FIG. 12 is a second schematic diagram of an image prediction method according to an embodiment of the present invention.
- FIG. 13 is a first schematic structural diagram of a device for predicting motion information according to an embodiment of the present invention.
- FIG. 14 is a second schematic structural diagram of a device for predicting motion information according to an embodiment of the present invention.
- FIG. 15 is a third schematic structural diagram of a device for predicting motion information according to an embodiment of the present invention.
- FIG. 16 is a schematic structural diagram of an image prediction apparatus according to an embodiment of the present invention.
- FIG. 17 is a schematic structural diagram of another image prediction apparatus according to an embodiment of the present invention.
- FIG. 18 is a schematic structural diagram of an encoding device or a decoding device according to an embodiment of the present invention.
- FIG. 19 is a schematic block diagram of a video encoding and decoding system according to an embodiment of the present invention.
- FIG. 20A is a schematic block diagram of a video encoder according to an embodiment of the present invention.
- FIG. 20B is a schematic block diagram of a video decoder according to an embodiment of the present invention.
- FIG. 21 is a schematic diagram of inter prediction for encoding a video image according to an embodiment of the present invention.
- FIG. 22 is a schematic diagram of a motion information candidate position for decoding a video image according to an embodiment of the present invention.
- FIG. 23 is a schematic diagram of a motion vector prediction method based on a motion model according to an embodiment of the present invention.
- FIG. 24 is a schematic diagram of a motion vector prediction method based on a combination of control points according to an embodiment of the present invention.
- FIG. 25 is a schematic diagram of a motion vector prediction method based on a combination of control points according to an embodiment of the present invention.
- FIG. 26 is a schematic block diagram of an encoding device or a decoding device according to an embodiment of the present invention.
- the terms "first" and "second" in the description and claims of the embodiments of the present invention are used to distinguish different objects, rather than to describe a specific order of the objects.
- for example, the first control point and the second control point are used to distinguish different control points, rather than to describe a specific order of the control points.
- words such as "exemplary" or "for example" are used to present examples, illustrations, or explanations. Any embodiment or design described as "exemplary" or "for example" in the embodiments of the present invention should not be construed as more preferred or more advantageous than other embodiments or designs. Rather, the use of the words "exemplary" or "for example" is intended to present the relevant concept in a concrete manner.
- a plurality means two or more.
- multiple processing units refer to two or more processing units; multiple systems refer to two or more systems.
- An image block can be a rectangular image area in a frame of image and contains A×B sampling points (that is, pixels). It can therefore also be referred to as an A×B sampling point array containing A rows and B columns of sampling points.
- the sampling points may include luminance sampling points and / or chrominance sampling points.
- the sampling points in an image area may be some or all of the pixel points in the image area. The values of A and B may be equal or unequal.
- the values of A and B are usually integer powers of 2, such as 256, 128, 64, 32, 16, 8, 4, and so on.
- a coding tree unit (CTU) is a basic unit in the process of video encoding or video decoding.
- a CTU corresponds to a square image block in a video frame (that is, a frame image) in video data; that is, an image may include one or more CTUs.
- the size of the CTU may be 64 * 64, that is, the CTU of 64 * 64 contains a rectangular pixel lattice consisting of 64 rows and 64 columns of pixels.
- the size of the CTU may also be 128 * 128 or 256 * 256 and so on.
- a coding unit is a leaf node generated after the CTU is divided.
- a CU corresponds to a rectangular image block.
- the width and height of a CU can also be expressed using the number of pixels. For example, the width of a CU can be 256, 128, 64, 32, 8, or 4 pixels, etc., and the height of the CU can also be 256, 128, 64, 32, 8, or 4 pixels, etc.; the height and width of the CU can be equal or unequal.
- taking a video frame (also referred to as a frame image) of video data as an example: in the process of encoding video data, the video encoding device uses the CU as the coding unit, completes the encoding of all CUs included in a CTU according to certain encoding rules, and then completes the encoding of the multiple CTUs of the frame to obtain the code stream; in the process of decoding video data, the decoding device completes the reconstruction of the multiple CUs included in a CTU according to the decoding rules corresponding to the encoding process, and then completes the reconstruction of the multiple CTUs of the frame to obtain a reconstructed image.
- the image block refers to a CU.
- a smaller image block obtained by further dividing an image block to be encoded or decoded (that is, a CU) is called a sub-block, which can also be called a sub-motion compensation unit.
- Encoding a video stream, or a part of it, such as a video frame or an image block can use the temporal and spatial similarity in the video stream to improve the encoding performance, which is the action performed by the encoding end.
- the predicted image blocks can be referred to as prediction blocks, and the difference between a prediction block and the original image block is referred to as the residual.
- residuals are transformed, quantized, entropy-encoded, and in-loop filtered, and all image blocks are encoded based on previously encoded blocks to complete the coding of the image.
- Decoding a video stream is an opposite process to encoding a video stream. It is an action performed by the decoder.
- for an image block (in the embodiments of the present invention, the image block currently being processed during encoding or decoding of the video stream is called the current image block or current coding block), the decoder performs entropy decoding, inverse quantization, and inverse transformation on the coded residual of the current image block to obtain the residual information of the current image block; the decoder determines the prediction block of the current image block using a method similar to that of the encoding end; then, according to the residual information and the prediction block, a reconstruction block of the current image block is determined, completing the decoding of the current image block.
- Methods for determining a prediction block generally include intra prediction and inter prediction.
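- the relationship between prediction block, residual, and reconstruction block can be sketched as follows. This is a simplified illustration that omits transform, quantization, and entropy coding (so the round trip is lossless); the function names and the 8-bit clipping range are assumptions made for the example.

```python
def residual(original, prediction):
    """Encoder side: residual = original block - prediction block."""
    return [[o - p for o, p in zip(orow, prow)]
            for orow, prow in zip(original, prediction)]


def reconstruct(prediction, resid, max_value=255):
    """Decoder side: reconstruction = prediction + residual, clipped
    to the valid sample range (8-bit samples assumed here)."""
    return [[min(max(p + r, 0), max_value) for p, r in zip(prow, rrow)]
            for prow, rrow in zip(prediction, resid)]
```

With `original = [[10, 20], [30, 40]]` and `prediction = [[12, 18], [30, 45]]`, the residual is `[[-2, 2], [0, -5]]` and reconstruction returns the original block.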
- Intra prediction refers to using the pixel values of the reconstructed area to predict the pixel values of the current image block within a video frame.
- Inter prediction refers to finding a reference block of a current image block in a reconstructed video frame in a video stream, and using the pixel value of the reference block as prediction information or prediction value of the pixel value of the current image block.
- the prediction block of the current image block may be determined according to the motion information of the current image block.
- the motion information of the current image block includes prediction direction indication information, one or more motion vectors pointing to the reference block, and indication information of the video frame where the reference block is located (this video frame may be referred to as a reference frame). The prediction direction indication information indicates the prediction direction of inter prediction, which is generally forward prediction, backward prediction, or bidirectional prediction; a motion vector indicates the displacement of a reference block relative to the current image block.
- the indication information of the video frame where the reference block is located is used to indicate the position of the reference block in the video stream, that is, which video frame the reference block is located in.
- the indication information of the video frame where the reference block is located may be an index of the reference frame.
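- the three components of motion information described above can be grouped into a simple record. The sketch below is illustrative only: the type and field names are hypothetical, and it assumes one motion vector and one reference index per prediction direction.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class PredDirection(Enum):
    FORWARD = 0
    BACKWARD = 1
    BIDIRECTIONAL = 2


@dataclass
class MotionInfo:
    direction: PredDirection               # prediction direction indication
    motion_vectors: List[Tuple[int, int]]  # (dx, dy) displacement per reference block
    ref_indices: List[int]                 # reference frame index per motion vector

    def is_consistent(self) -> bool:
        """Bidirectional prediction needs two vectors; otherwise one
        (an assumption of this sketch)."""
        expected = 2 if self.direction is PredDirection.BIDIRECTIONAL else 1
        return len(self.motion_vectors) == expected == len(self.ref_indices)
```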
- the aforementioned forward prediction refers to selecting a reference frame from a forward reference frame set to obtain a reference block of the current image block.
- backward prediction refers to selecting a reference frame from a backward reference frame set to obtain a reference block of the current image block.
- bidirectional prediction refers to selecting one reference frame from each of the forward reference frame set and the backward reference frame set to obtain two reference blocks of the current image block, and then determining the pixel values of the current image block according to the pixel values of the two reference blocks.
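- one common way of combining the two reference blocks in bidirectional prediction is an equal-weight average with rounding. The source states only that the pixel values are determined from the two reference blocks, so the averaging rule and the function name `bi_predict` below are assumptions for illustration.

```python
def bi_predict(block0, block1):
    """Combine two equally sized reference blocks by rounded averaging
    (equal weights are an assumption of this sketch)."""
    return [[(a + b + 1) >> 1 for a, b in zip(row0, row1)]
            for row0, row1 in zip(block0, block1)]
```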
- the above motion vector is an important parameter in the inter prediction process, which represents the spatial displacement of a previously coded block relative to the current coded block.
- Motion vectors can be obtained using motion estimation methods, such as motion search.
- the bits representing the motion vector were included in the encoded bit stream to allow the decoder to reproduce the predicted block and then obtain the reconstructed block.
- it was later proposed to use a reference motion vector to differentially encode the motion vector; that is, instead of encoding the entire motion vector, only the difference between the motion vector and the reference motion vector was encoded.
- the reference motion vector may be selected from previously used motion vectors in the video stream. Selecting a previously used motion vector to encode the current motion vector can further reduce the number of bits included in the encoded video bitstream.
- in the existing standard HEVC (that is, H.265), there are two inter prediction modes for prediction units: the merge mode and the advanced motion vector prediction (AMVP) mode.
- in the AMVP mode, the motion information of coded image blocks spatially or temporally adjacent to the current image block is used to construct a candidate motion vector list (including multiple candidate motion vectors), and then the optimal motion vector is determined from the candidate motion vector list and used as the motion vector predictor (MVP) of the current image block.
- the encoding end passes the index value of the selected motion vector prediction value in the candidate motion vector list and the index value of the reference frame to the decoding end. Further, a motion search is performed in a neighborhood centered on the MVP to obtain the actual motion vector of the current image block, and the encoding end passes the difference between the MVP and the actual motion vector (motion vector difference) to the decoding end.
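- the signalling described above (a predictor index plus a motion vector difference) can be sketched as follows. The function names are hypothetical, and the sign convention MVD = actual motion vector minus MVP is an assumption; the source states only that the difference between the two is transmitted.

```python
def encode_mv(actual_mv, candidates, mvp_index):
    """Encoder side: return what is signalled, i.e. the MVP index in
    the candidate list and the motion vector difference (MVD)."""
    mvp = candidates[mvp_index]
    mvd = (actual_mv[0] - mvp[0], actual_mv[1] - mvp[1])
    return mvp_index, mvd


def decode_mv(candidates, mvp_index, mvd):
    """Decoder side: rebuild the actual motion vector as MVP + MVD,
    using the same candidate list as the encoder."""
    mvp = candidates[mvp_index]
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])
```

Both ends must construct the candidate list identically for the index to select the same predictor.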
- a rate-distortion optimization technique may be used to determine the optimal motion vector. Specifically, for each candidate motion vector in the candidate motion vector list, the following formula can be used to calculate the corresponding rate-distortion cost, and the motion vector with the lowest rate-distortion cost is then selected as the motion vector predictor of the current image block:
- $J = \mathrm{SAD} + \lambda R$
- J represents the rate-distortion cost
- SAD represents the sum of absolute differences between the original pixel values of the current image block and the predicted pixel values obtained using a motion vector in the candidate motion vector list
- λ represents the Lagrange multiplier (λ can be a preset constant)
- R represents the bit rate of the video stream.
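- selecting the candidate with the lowest rate-distortion cost can be sketched as below, assuming the usual Lagrangian form in which the cost is the SAD plus the Lagrange multiplier times the rate. The function names are hypothetical, and the rate R is approximated here by an estimated bit count per candidate, which is an assumption of the example.

```python
def sad(original, predicted):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(o - p)
               for orow, prow in zip(original, predicted)
               for o, p in zip(orow, prow))


def select_mvp(original, candidates, lam):
    """candidates: list of (predicted_block, estimated_bits) pairs.
    Returns (index, cost) of the candidate minimizing SAD + lam * bits."""
    costs = [sad(original, pred) + lam * bits for pred, bits in candidates]
    best = min(range(len(costs)), key=costs.__getitem__)
    return best, costs[best]
```

A candidate with a slightly worse prediction can still win when it is cheaper to signal, which is the trade-off the Lagrange multiplier controls.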
- in the merge mode, the motion information of coded image blocks spatially or temporally adjacent to the current image block is used to construct a candidate motion vector list; the optimal motion vector is then determined from the candidate motion vector list as the motion vector of the current image block, and the index value (denoted as merge index) of the position of the optimal motion vector in the candidate motion vector list is passed to the decoding end.
- the spatial candidate motion vectors of the current image block are determined from the motion vectors of the five image blocks (A0, B0, C0, D0, and E0) that are spatially adjacent to the current image block. If an image block is unavailable, its motion vector is not added to the candidate motion information list.
- the time-domain candidate motion information of the current image block is obtained by scaling the motion vector of the image block at the corresponding position in the reference frame. Embodiments of the present invention will introduce the method of scaling the motion vector in detail below.
- it is first determined whether the image block at position T0 in a reference frame of the current image block is available. If it is not available, the image block at position C0 is selected as the image block adjacent to the current image block in the time domain.
- an image block (or sub-block) is available when the following conditions are met: the image block (or sub-block) has been encoded or decoded, and its prediction mode is the inter prediction mode; otherwise, the image block (or sub-block) is unavailable.
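- the availability rule and the T0/C0 fallback can be combined into a small candidate list builder. The sketch below is illustrative only: the dictionary keys (`coded`, `mode`, `mv`) and function names are hypothetical, and the scaling of the temporal motion vector and any duplicate removal are omitted.

```python
def is_available(block):
    """A block is available if it has been coded and uses inter prediction."""
    return bool(block is not None
                and block.get("coded", False)
                and block.get("mode") == "inter")


def build_merge_candidates(spatial_neighbors, temporal_t0, temporal_c0):
    """Collect motion vectors of available spatial neighbors in order,
    then append the temporal candidate (T0 if available, else C0)."""
    candidates = [b["mv"] for b in spatial_neighbors if is_available(b)]
    temporal = temporal_t0 if is_available(temporal_t0) else temporal_c0
    if is_available(temporal):
        candidates.append(temporal["mv"])  # scaling omitted in this sketch
    return candidates
```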
- the control points of the current image block refer to the pixels used to generate the motion vector of the current image block. They are usually the pixels located at the vertices of the current image block. As shown in FIG. 2, the control points of the current image block can be the top left vertex P1, the top right vertex P2, the bottom left vertex P3, and the bottom right vertex P4 of the current image block.
- the motion vectors of all pixels in the same CU may be different. Therefore, the CU is divided into multiple sub-blocks, the motion information of each sub-block is determined, and each sub-block is predicted, so as to achieve prediction of the CU.
- a non-translational motion model can be used to determine the motion information of all sub-blocks of the current image block based on the motion information of the control points of the current image block.
- Commonly used non-translational motion models include a 4-parameter affine transformation model, a 6-parameter affine transformation model, and an 8-parameter affine transformation model.
- for a 4-parameter affine transformation model, the motion information of two control points of the current image block can be used to determine the motion information of all sub-blocks; for a 6-parameter affine transformation model, the motion information of three control points of the current image block (such as the upper left corner control point, the upper right corner control point, and the lower left corner control point) can be used to determine the motion information of all sub-blocks; for an 8-parameter affine transformation model, the motion information of four control points of the current image block (such as the upper left corner control point, the upper right corner control point, the lower left corner control point, and the lower right corner control point) can be used to determine the motion information of all sub-blocks.
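- deriving sub-block motion vectors from control-point motion vectors can be sketched as follows, using the widely used 4-parameter and 6-parameter affine formulations. This section of the source does not give the formulas, so the equations, the function names, the 4×4 sub-block size, and the choice of the upper left and upper right corners as the two control points are assumptions for illustration; each sub-block is evaluated at its central pixel.

```python
def affine_mv(x, y, width, height, cp_mvs):
    """Motion vector at (x, y), measured from the block's top-left corner.

    cp_mvs: [top_left, top_right] for the 4-parameter model, or
            [top_left, top_right, bottom_left] for the 6-parameter model.
    """
    (v0x, v0y), (v1x, v1y) = cp_mvs[0], cp_mvs[1]
    a = (v1x - v0x) / width   # horizontal gradient of the x component
    b = (v1y - v0y) / width   # horizontal gradient of the y component
    if len(cp_mvs) == 2:      # 4-parameter: rotation + zoom + translation
        c, d = -b, a
    elif len(cp_mvs) == 3:    # 6-parameter: independent vertical gradients
        v2x, v2y = cp_mvs[2]
        c = (v2x - v0x) / height
        d = (v2y - v0y) / height
    else:
        raise ValueError("expected 2 or 3 control-point motion vectors")
    return (a * x + c * y + v0x, b * x + d * y + v0y)


def subblock_mvs(width, height, cp_mvs, sub=4):
    """Map each sub-block's top-left position to the motion vector
    evaluated at the sub-block centre."""
    return {(sx, sy): affine_mv(sx + sub / 2, sy + sub / 2, width, height, cp_mvs)
            for sy in range(0, height, sub)
            for sx in range(0, width, sub)}
```

When all control-point motion vectors are equal, every sub-block receives that same vector, so the affine model reduces to the translational case.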
- the aforementioned motion information prediction method based on the non-translational motion model can be applied to the inter prediction process of the merge mode and the AMVP mode, respectively, and the encoding end can pass the motion information of the control points to the decoding end, so that the decoding end uses the corresponding non-translational motion model to determine the motion information of each sub-block.
- the motion information of the CU can also be predicted in units of CUs. This process of determining motion information can be referred to as determining motion information based on a translational motion model.
- embodiments of the present invention provide an image prediction method and device.
- FIG. 3 is a block diagram of a video decoding system 1 according to an example described in the embodiment of the present invention.
- the term "video coder" generally refers to both video encoders and video decoders.
- the term "video coding" or "coding" may generally refer to video encoding or video decoding.
- the video encoder 100 and the video decoder 200 of the video decoding system 1 are used to predict the motion information of a current coded image block or its sub-blocks according to the various method examples described in any of the multiple new inter prediction modes proposed by the present invention, so that the predicted motion vector is as close as possible to the motion vector obtained by using a motion estimation method and the motion vector difference does not need to be transmitted during encoding, thereby further improving the encoding and decoding performance.
- the video decoding system 1 includes a source device 10 and a destination device 20.
- the source device 10 generates encoded video data. Therefore, the source device 10 may be referred to as a video encoding device.
- the destination device 20 may decode the encoded video data generated by the source device 10. Therefore, the destination device 20 may be referred to as a video decoding device.
- Various implementations of the source device 10, the destination device 20, or both may include one or more processors and a memory coupled to the one or more processors.
- the memory may include, but is not limited to, random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory, or any other medium that can be used to store the desired program code in the form of instructions or data structures accessible by a computer, as described herein.
- the source device 10 and the destination device 20 may include various devices, including desktop computers, mobile computing devices, notebook (e.g., laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called "smart" phones, televisions, cameras, display devices, digital media players, video game consoles, on-board computers, or the like.
- the destination device 20 may receive the encoded video data from the source device 10 via the link 30.
- the link 30 may include one or more media or devices capable of moving the encoded video data from the source device 10 to the destination device 20.
- the link 30 may include one or more communication media enabling the source device 10 to directly transmit the encoded video data to the destination device 20 in real time.
- the source device 10 may modulate the encoded video data according to a communication standard, such as a wireless communication protocol, and may transmit the modulated video data to the destination device 20.
- the one or more communication media may include wireless and / or wired communication media, such as a radio frequency (RF) spectrum or one or more physical transmission lines.
- the one or more communication media may form part of a packet-based network, such as a local area network, a wide area network, or a global network (e.g., the Internet).
- the one or more communication media may include a router, a switch, a base station, or other devices that facilitate communication from the source device 10 to the destination device 20.
- the encoded data may be output from the output interface 140 to the storage device 40.
- the encoded data can be accessed from the storage device 40 through the input interface 240.
- the storage device 40 may include any of a variety of distributed or locally accessed data storage media, such as a hard disk drive, a Blu-ray disc, a digital versatile disc (DVD), a compact disc read-only memory (CD-ROM), flash memory, volatile or non-volatile memory, or any other suitable digital storage medium for storing encoded video data.
- the storage device 40 may correspond to a file server or another intermediate storage device that may hold the encoded video produced by the source device 10.
- the destination device 20 may access the stored video data from the storage device 40 via streaming or download.
- the file server may be any type of server capable of storing encoded video data and transmitting the encoded video data to the destination device 20.
- example file servers include a network server (for example, for a website), a file transfer protocol (FTP) server, a network attached storage (NAS) device, or a local disk drive.
- the destination device 20 can access the encoded video data through any standard data connection, including an Internet connection.
- This may include wireless channels (e.g., a Wi-Fi connection), wired connections (e.g., digital subscriber line (DSL), cable modem, etc.), or a combination of both suitable for accessing encoded video data stored on a file server.
- the transmission of the encoded video data from the storage device 40 may be a streaming transmission, a download transmission, or a combination of the two.
- the image prediction method provided by the embodiment of the present invention can be applied to video encoding and decoding to support a variety of multimedia applications, such as over-the-air television broadcasting, cable television transmission, satellite television transmission, streaming video transmission (for example, via the Internet), encoding of video data for storage on a data storage medium, decoding of video data stored on a data storage medium, or other applications.
- the video coding system 1 may be used to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and / or video telephony.
- the video decoding system 1 illustrated in FIG. 3 is only an example, and the technology of the present application can be applied to a video coding setting (for example, video encoding or video decoding) that does not necessarily include any data communication between the encoding device and the decoding device.
- data is retrieved from local storage, streamed over a network, and so on.
- the video encoding device may encode the data and store the data to a memory, and / or the video decoding device may retrieve the data from the memory and decode the data.
- encoding and decoding may be performed by devices that do not communicate with each other, but that simply encode data to memory and/or retrieve data from memory and decode it.
- the source device 10 includes a video source 120, a video encoder 100, and an output interface 140.
- the output interface 140 may include a modulator/demodulator (modem) and/or a transmitter.
- Video source 120 may include a video capture device (e.g., a video camera), a video archive containing previously captured video data, a video feed interface to receive video data from a video content provider, and/or a computer graphics system for generating video data, or a combination of these sources of video data.
- the video encoder 100 may encode video data from the video source 120.
- the source device 10 transmits the encoded video data directly to the destination device 20 via the output interface 140.
- the encoded video data may also be stored on the storage device 40 for later access by the destination device 20 for decoding and / or playback.
- the destination device 20 includes an input interface 240, a video decoder 200, and a display device 220.
- the input interface 240 includes a receiver and / or a modem.
- the input interface 240 may receive encoded video data via the link 30 and / or from the storage device 40.
- the display device 220 may be integrated with the destination device 20 or may be external to the destination device 20. Generally, the display device 220 displays decoded video data.
- the display device 220 may include various display devices, for example, a liquid crystal display (LCD), a plasma display, an organic light-emitting diode (OLED) display, or other types of display devices.
- The video encoder 100 and the video decoder 200 may each be integrated with an audio encoder and decoder, and may include an appropriate multiplexer-demultiplexer (MUX-DEMUX) unit or other hardware and software to handle encoding of both audio and video in a common data stream or in separate data streams.
- the MUX-DEMUX unit may conform to the ITU H.223 multiplexer protocol, or other protocols such as the user datagram protocol (UDP).
- Each of the video encoder 100 and the video decoder 200 may be implemented as any of a variety of circuits, such as one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, hardware, or any combination thereof. If the present application is implemented partially in software, the device may store instructions for the software in a suitable non-volatile computer-readable storage medium and may use one or more processors to execute the instructions in hardware to implement the technology of the present application. Any of the foregoing (including hardware, software, a combination of hardware and software, etc.) may be considered as one or more processors. Each of the video encoder 100 and the video decoder 200 may be included in one or more encoders or decoders, any of which may be integrated as part of a combined encoder/decoder (codec) in a corresponding device.
- Embodiments of the invention may generally refer to video encoder 100 as “signaling” or “transmitting” certain information to another device, such as video decoder 200.
- The terms “signaling” or “transmitting” may generally refer to the transmission of syntax elements and/or other data used to decode compressed video data. This transmission can occur in real time or almost real time. Alternatively, this communication may occur over a period of time, such as when a syntax element is stored in a coded bitstream on a computer-readable storage medium at the time of encoding; the decoding device may then retrieve the syntax element at any time after it has been stored on this medium.
- the video encoder 100 and the video decoder 200 may operate according to a video compression standard such as HEVC or an extension thereof, and may conform to the HEVC test model (HM).
- the video encoder 100 and video decoder 200 may also operate according to other industry standards, such as the ITU-T H.264, H.265 standards, or extensions of such standards.
- the technology of the embodiments of the present invention is not limited to any particular codec standard.
- The video encoder 100 is configured to encode a syntax element related to a current image block to be encoded into a digital video output bit stream (referred to as a bit stream or a code stream) (S101).
- The syntax elements related to the current image block may include, for example and without limitation, syntax elements used for inter prediction of the current image block.
- the syntax elements used for inter prediction of the current image block are referred to as inter prediction data.
- The inter prediction data may include an identifier indicating (specifically, instructing the video decoder 200) whether to divide the current image block into sub-blocks and perform inter prediction based on the motion information of the sub-blocks (in other words, an identifier indicating whether the video decoder 200 predicts the current image block by using the image prediction method provided by the embodiment of the present invention). Because the video encoder 100 and the video decoder 200 process video data in the same (or a corresponding) manner, if the identifier instructs the video decoder 200 to divide the current image block into sub-blocks and decode according to the motion information of the sub-blocks, the video encoder 100 likewise encodes according to the motion information of the sub-blocks.
- On the one hand, the video encoder 100 can send the code stream to the video decoder 200; on the other hand, it can predict the motion information of one or more sub-blocks in the current image block (specifically, the motion information of each sub-block or of all sub-blocks), and use the motion information of the one or more sub-blocks to perform inter prediction on the current image block (S102).
- the video decoder 200 is configured to decode a code stream to obtain syntax elements related to the current image block to be decoded (S201).
- The syntax elements related to the current image block may include, for example and without limitation, syntax elements used for inter prediction of the current image block.
- The syntax elements used for inter prediction of the current image block are simply referred to as inter prediction data, where the inter prediction data may include an identifier indicating (specifically, instructing the video decoder 200) whether to divide the current image block into sub-blocks and perform inter prediction according to the motion information of the sub-blocks. If the identifier indicates that the current image block is divided into sub-blocks and that inter prediction is performed according to the motion information of the sub-blocks, the video decoder 200 may predict the motion information of one or more sub-blocks in the current image block, and perform inter prediction on the current image block by using the motion information of the one or more sub-blocks (S202).
- In the following, the case in which the syntax element (specifically, the inter prediction data) sent by the video encoder 100 to the video decoder 200 includes an identifier indicating whether to divide the current image block into sub-blocks and encode/decode according to the motion information of the sub-blocks is described as an example.
- the video encoder 100 and the video decoder 200 may also agree in advance (for example, through a protocol or a standard agreement) to divide the current image block into sub-blocks, and perform prediction based on the motion information of the sub-blocks.
- the syntax element (specifically, the inter prediction data) sent by the video encoder 100 to the video decoder 200 may not include the foregoing identifier.
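The two options above (an explicit identifier in the inter prediction data, or a prior agreement with no identifier) can be sketched as follows. This is a minimal illustration only: `InterPredictionData`, `subblock_flag`, and `use_subblock_prediction` are hypothetical names and do not correspond to any real bitstream syntax.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InterPredictionData:
    # Hypothetical container for the inter prediction syntax elements.
    # subblock_flag is None when the identifier is not carried in the code stream.
    subblock_flag: Optional[bool] = None


def use_subblock_prediction(data: InterPredictionData, agreed_in_advance: bool) -> bool:
    """Decide whether the decoder predicts the current block per sub-block."""
    if data.subblock_flag is not None:
        # Explicit signaling: follow the identifier sent by the encoder.
        return data.subblock_flag
    # No identifier present: fall back to what encoder and decoder agreed on.
    return agreed_in_advance
```

With an explicit identifier the prior agreement is ignored; without one, the decoder behaves as agreed in advance.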
- FIG. 6A is a block diagram of a video encoder 100 according to an example described in the embodiment of the present invention.
- the video encoder 100 is configured to output a video to the post-processing entity 41.
- the post-processing entity 41 represents an example of a video entity that can process encoded video data from the video encoder 100, such as a media-aware network element (MANE) or a stitching / editing device.
- the post-processing entity 41 may be an instance of a network entity.
- In some cases, the post-processing entity 41 and the video encoder 100 may be parts of separate devices, while in other cases, the functionality described with respect to the post-processing entity 41 may be performed by the same device that includes the video encoder 100.
- the post-processing entity 41 is an example of the storage device 40 of FIG. 3.
- The video encoder 100 includes a prediction processing unit 108, a loop filter 106, a decoded picture buffer (DPB) 107, a summer 112, a transformer 101, a quantizer 102, and an entropy encoder 103.
- the prediction processing unit 108 (labeled as a frame predictor 108 in FIG. 6A) includes an inter predictor 110 and an intra predictor 109.
- the video encoder 100 further includes an inverse quantizer 104, an inverse transformer 105, and a summer 111.
- The loop filter 106 is intended to represent one or more loop filters, such as a deblocking filter, an adaptive loop filter (ALF), and a sample adaptive offset (SAO) filter.
- Although the loop filter 106 is shown as an in-loop filter in FIG. 6A, in other implementations the loop filter 106 may be implemented as a post-loop filter.
- the video encoder 100 may further include a video data memory and a segmentation unit (not shown in the figure).
- the video data memory may store video data to be encoded by the components of the video encoder 100.
- the video data stored in the video data storage may be obtained from the video source 120.
- The DPB 107 may be a reference image memory that stores reference video data used by the video encoder 100 to encode video data in an intra-frame or inter-frame coding mode.
- The video data memory and the DPB 107 can be formed by any of a variety of memory devices, such as dynamic random access memory (DRAM) including synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices.
- the video data memory and the DPB 107 may be provided by the same memory device or separate memory devices.
- the video data memory may be on-chip with other components of video encoder 100, or off-chip relative to those components.
- the video encoder 100 receives video data and stores the video data in a video data memory.
- the segmentation unit divides the video data into several image blocks, and these image blocks can be further divided into smaller blocks, such as image block segmentation based on a quad-tree structure or a binary tree structure. This segmentation may also include segmentation into slices, tiles, or other larger units.
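The quad-tree segmentation mentioned above can be illustrated with a short recursive sketch. The function name `quadtree_split`, the `should_split` callback, and the `min_size` limit are assumptions made for illustration; a real encoder would typically base the split decision on rate-distortion cost.

```python
from typing import Callable, List, Tuple

Block = Tuple[int, int, int, int]  # (x, y, width, height)


def quadtree_split(block: Block,
                   should_split: Callable[[Block], bool],
                   min_size: int) -> List[Block]:
    """Recursively split a block into four quadrants; return the leaf blocks."""
    x, y, w, h = block
    # Stop when the block is already at the minimum size or no split is wanted.
    if w <= min_size or h <= min_size or not should_split(block):
        return [block]
    hw, hh = w // 2, h // 2
    leaves: List[Block] = []
    for bx, by in ((x, y), (x + hw, y), (x, y + hh), (x + hw, y + hh)):
        leaves.extend(quadtree_split((bx, by, hw, hh), should_split, min_size))
    return leaves
```

For instance, splitting a 64x64 block while the width exceeds 32 yields four 32x32 leaves that together cover the original area.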
- The video encoder 100 generally illustrates the components that encode image blocks within a video slice to be encoded. The slice can be divided into multiple image blocks (and possibly into sets of image blocks called tiles).
- The prediction processing unit 108 may select one of a plurality of possible coding modes for the current image block, such as one of a plurality of intra-coding modes or one of a plurality of inter-coding modes.
- The prediction processing unit 108 may provide the resulting intra-coded or inter-coded block to the summer 112 to generate a residual block, and to the summer 111 to reconstruct the encoded block for use as a reference image.
- the intra predictor 109 within the prediction processing unit 108 may perform intra predictive encoding of the current image block with respect to one or more neighboring blocks in the same frame or slice as the current block to be encoded to remove spatial redundancy.
- the inter predictor 110 within the prediction processing unit 108 may perform inter predictive encoding of the current image block with respect to one or more prediction blocks in the one or more reference images to remove temporal redundancy.
- the inter predictor 110 is configured to predict motion information (for example, a motion vector) of one or more subblocks in the current image block, and obtain or generate a prediction block of the current image block by using the motion information of one or more subblocks in the current image block.
- the inter predictor 110 may locate a prediction block pointed to by the motion vector in one of the reference image lists.
- the inter predictor 110 may also generate syntax elements associated with the image blocks and video slices for use by the video decoder 200 when decoding the image blocks of the video slices.
- The inter predictor 110 uses the motion information of each sub-block to perform a motion compensation process to generate a prediction block of each sub-block, thereby obtaining a prediction block of the current image block. It should be understood that the inter predictor 110 performs motion estimation and motion compensation processes.
- the video encoder 100 forms a residual image block by subtracting the prediction block from the current image block to be encoded.
- the summer 112 represents one or more components that perform this subtraction operation.
- the residual video data in the residual block may be included in one or more TUs and applied to the transformer 101.
- the transformer 101 transforms the residual video data into residual transform coefficients using a transform such as a discrete cosine transform (DCT) or a conceptually similar transform.
- the transformer 101 may transform the residual video data from a pixel value domain to a transform domain, such as a frequency domain.
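As an illustration of moving residual samples from the pixel domain to a frequency domain, the sketch below implements a floating-point, orthonormal 1-D DCT-II and its inverse. It is a simplification: practical codecs apply two-dimensional integer approximations of such transforms, and the function names `dct_1d` and `idct_1d` are illustrative only.

```python
import math
from typing import List


def _scale(k: int, n: int) -> float:
    # Orthonormal scaling factor of the DCT-II basis function k.
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct_1d(samples: List[float]) -> List[float]:
    """Forward orthonormal DCT-II of a row of residual samples."""
    n = len(samples)
    return [
        _scale(k, n) * sum(s * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                           for i, s in enumerate(samples))
        for k in range(n)
    ]


def idct_1d(coeffs: List[float]) -> List[float]:
    """Inverse transform: recovers the samples from the coefficients."""
    n = len(coeffs)
    return [
        sum(_scale(k, n) * c * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for k, c in enumerate(coeffs))
        for i in range(n)
    ]
```

A flat residual row such as `[4.0, 4.0, 4.0, 4.0]` concentrates all of its energy in the first (DC) coefficient, which is why the transform helps compression of smooth residuals.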
- the transformer 101 may send the obtained transform coefficients to a quantizer 102.
- a quantizer 102 quantizes the transform coefficients to further reduce the bit rate.
- the quantizer 102 may then perform a scan of a matrix containing the quantized transform coefficients.
- Alternatively, the entropy encoder 103 may perform the scan.
- After quantization, the entropy encoder 103 entropy encodes the quantized transform coefficients. For example, the entropy encoder 103 can perform context adaptive variable length coding (CAVLC), context adaptive binary arithmetic coding (CABAC), syntax-based context adaptive binary arithmetic coding (SBAC), probability interval partition entropy (PIPE) coding, or another entropy coding method or technique.
- The encoded bitstream may be transmitted to the video decoder 200, or archived for later transmission or retrieval by the video decoder 200.
- the entropy encoder 103 may also perform entropy coding on the syntax elements of the current image block to be coded.
- The inverse quantizer 104 and the inverse transformer 105 respectively apply inverse quantization and inverse transform to reconstruct the residual block in the pixel domain, for example, for later use as a reference block of a reference image.
- the summer 111 adds the reconstructed residual block to a prediction block generated by the inter predictor 110 or the intra predictor 109 to generate a reconstructed image block.
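The encoder-side loop described above (subtraction at the summer 112, quantization, inverse quantization, and addition at the summer 111) can be sketched on a single row of samples. This is a simplified illustration: the transform and inverse transform are omitted for brevity, a plain uniform quantizer with a hypothetical `step` parameter stands in for the quantizer 102, and the function name is an assumption.

```python
from typing import List, Tuple


def encode_and_reconstruct(current: List[int],
                           prediction: List[int],
                           step: int) -> Tuple[List[int], List[int]]:
    """Return (quantized levels, reconstructed samples) for one row."""
    # Summer 112: residual = current block minus prediction block.
    residual = [c - p for c, p in zip(current, prediction)]
    # Quantizer: map each residual value to an integer level.
    levels = [round(r / step) for r in residual]
    # Inverse quantizer: approximate residual recovered from the levels.
    recon_residual = [level * step for level in levels]
    # Summer 111: reconstructed block = prediction plus reconstructed residual.
    reconstructed = [p + r for p, r in zip(prediction, recon_residual)]
    return levels, reconstructed
```

Because the encoder reconstructs from the quantized levels rather than from the original residual, its reference blocks match what a decoder can rebuild from the same levels.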
- Processing (for example, interpolation) of a reference block of an image block can yield a prediction block of the image block.
- the loop filter 106 may be applied to reconstructed image blocks to reduce distortion, such as block artifacts. This reconstructed image block is then stored as a reference block in the decoded image buffer 107 and can be used by the inter predictor 110 as a reference block to inter-predict the subsequent video frames or blocks in the image.
- The video encoder 100 can directly quantize the residual signal without processing by the transformer 101, and correspondingly without processing by the inverse transformer 105; or, for some image blocks or image frames, the video encoder 100 does not generate residual data, and accordingly no processing by the transformer 101, the quantizer 102, the inverse quantizer 104, and the inverse transformer 105 is needed; or, the video encoder 100 may store the reconstructed image blocks directly as reference blocks without processing by the loop filter 106; alternatively, the quantizer 102 and the inverse quantizer 104 in the video encoder 100 may be merged together.
- FIG. 6B is a block diagram of a video decoder 200 according to an example described in the embodiment of the present invention.
- the video decoder 200 includes an entropy decoder 203, a prediction processing unit 208, an inverse quantizer 204, an inverse transformer 205, a summer 211, a loop filter 206, and a decoded image buffer 207.
- the prediction processing unit 208 may include an inter predictor 210 and an intra predictor 209.
- video decoder 200 may perform a decoding process that is substantially reciprocal to the encoding process described with respect to video encoder 100 from FIG. 6A.
- video decoder 200 receives from video encoder 100 an encoded video bitstream representing image blocks of the encoded video slice and associated syntax elements.
- The video decoder 200 may receive video data from the network entity 42. Optionally, the video data may also be stored in a video data memory (not shown in the figure).
- the video data memory may store video data, such as an encoded video bitstream, to be decoded by components of the video decoder 200.
- The video data stored in the video data memory can be obtained, for example, from the storage device 40, from a local video source such as a camera, via wired or wireless network communication of video data, or by accessing a physical data storage medium.
- The video data memory can be used as a coded picture buffer (CPB) for storing encoded video data from the encoded video bitstream. Therefore, although the video data memory is not shown in FIG. 6B, the video data memory and the DPB 207 may be the same memory, or may be separately provided memories.
- The video data memory and the DPB 207 can be formed by any of a variety of memory devices, such as dynamic random access memory (DRAM) including synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices.
- the video data memory may be integrated on a chip with other components of the video decoder 200 or provided off-chip relative to those components.
- the network entity 42 may be, for example, a server, a MANE, a video editor / splicer, or other such device for implementing one or more of the techniques described above.
- the network entity 42 may or may not include a video encoder, such as video encoder 100.
- the network entity 42 may implement some of the techniques described in this application.
- the network entity 42 and the video decoder 200 may be part of separate devices, while in other cases, the functionality described with respect to the network entity 42 may be performed by the same device including the video decoder 200.
- the network entity 42 may be an example of the storage device 40 of FIG. 3.
- the entropy decoder 203 of the video decoder 200 entropy decodes the bit stream to generate quantized coefficients and some syntax elements.
- the entropy decoder 203 forwards the syntax elements to the prediction processing unit 208.
- Video decoder 200 may receive syntax elements at a video slice level and / or an image block level.
- The syntax elements here may include inter prediction data related to the current image block, and the inter prediction data may include an indication of whether the current image block is divided into sub-blocks and inter-predicted according to the motion information of the sub-blocks.
- the video encoder 100 may signal a specific syntax element indicating whether to use the image prediction method proposed in the present application. The process of performing inter prediction based on the motion information of the subblocks of the current image block will be explained in detail below.
- the inverse quantizer 204 inverse quantizes, that is, dequantizes, the quantized transform coefficients provided in the bitstream and decoded by the entropy decoder 203.
- the inverse quantization process may include using a quantization parameter calculated by the video encoder 100 for each image block in the video slice to determine the degree of quantization that should be applied and similarly to determine the degree of inverse quantization that should be applied.
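How a quantization parameter can determine the degree of inverse quantization is sketched below. The sketch assumes the H.264/HEVC-style relationship in which the step size doubles for every increase of 6 in the quantization parameter, approximately Qstep = 2^((QP - 4) / 6); real decoders use integer scaling tables rather than floating point, and the function names are illustrative.

```python
from typing import List


def qstep_from_qp(qp: int) -> float:
    """Approximate quantization step size for a given quantization parameter."""
    # Assumed H.264/HEVC-style mapping: the step doubles every 6 QP values.
    return 2.0 ** ((qp - 4) / 6.0)


def dequantize(levels: List[int], qp: int) -> List[float]:
    """Scale decoded coefficient levels back to transform-coefficient values."""
    step = qstep_from_qp(qp)
    return [level * step for level in levels]
```

Under this assumed mapping, a quantization parameter of 22 corresponds to a step size of 8, so a decoded level of 1 is restored to a coefficient value of 8.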
- the inverse transformer 205 applies an inverse transform to transform coefficients, such as an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process in order to generate a residual block in the pixel domain.
- The video decoder 200 obtains the reconstructed block, that is, the decoded image block, by summing the residual block from the inverse transformer 205 with the corresponding prediction block generated by the inter predictor 210.
- the summer 211 represents a component that performs this summing operation.
- The loop filter 206, which may be located in the decoding loop or after the decoding loop, may represent one or more loop filters, such as a deblocking filter, an adaptive loop filter (ALF), and a sample adaptive offset (SAO) filter.
- Although the loop filter 206 is shown as an in-loop filter in FIG. 6B, in other implementations the loop filter 206 may be implemented as a post-loop filter.
- The loop filter 206 is applied to the reconstructed block to reduce blocking distortion, and the result is output as a decoded video stream.
- a decoded image block in a given frame or image may also be stored in a decoded image buffer 207, and the decoded image buffer 207 stores a reference image for subsequent motion compensation.
- the decoded image buffer 207 may be part of a memory, which may also store the decoded video for later presentation on a display device, such as the display device 220 of FIG. 3, or may be separate from such memory.
- The video decoder 200 may generate an output video stream without processing by the loop filter 206; or, for certain image blocks or image frames, the entropy decoder 203 of the video decoder 200 does not decode quantized coefficients, and correspondingly no processing by the inverse quantizer 204 and the inverse transformer 205 is needed.
- the motion information prediction method and image prediction method provided by the embodiments of the present invention are mainly applied in the process of inter prediction, and a non-translational motion model can be used to predict the current image block.
- For example, an affine transformation model is used to derive the motion information of each sub-block of the current image block; the prediction block of each sub-block is then determined according to the motion information of that sub-block, thereby obtaining the prediction block of the current image block. The motion information of the sub-blocks of the current image block can in turn be used for prediction of other image blocks, and so on.
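Deriving per-sub-block motion vectors from an affine model can be sketched as follows. The sketch assumes the widely used 4-parameter affine formulation driven by the motion vectors of the upper-left and upper-right control points; the 4x4 sub-block size, the evaluation at sub-block centres, and the floating-point arithmetic without motion-vector rounding are illustrative assumptions rather than details taken from this text.

```python
from typing import Dict, Tuple

MV = Tuple[float, float]


def affine_subblock_mvs(v0: MV, v1: MV, width: int, height: int,
                        sub: int = 4) -> Dict[Tuple[int, int], MV]:
    """Map each sub-block origin (x, y) to a motion vector.

    v0 and v1 are the motion vectors of the upper-left and upper-right
    control points of a block of the given width and height.
    """
    a = (v1[0] - v0[0]) / width  # change of vx per sample along x
    b = (v1[1] - v0[1]) / width  # change of vy per sample along x
    mvs: Dict[Tuple[int, int], MV] = {}
    for y in range(0, height, sub):
        for x in range(0, width, sub):
            cx, cy = x + sub / 2.0, y + sub / 2.0  # sub-block centre
            mvs[(x, y)] = (v0[0] + a * cx - b * cy,
                           v0[1] + b * cx + a * cy)
    return mvs
```

When both control points carry the same motion vector, the model degenerates to pure translation and every sub-block receives that vector.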
- the method for predicting motion information and the method for predicting images according to the embodiments of the present invention are described in detail below.
- the method for predicting motion information provided by an embodiment of the present invention may include S301-S303:
- the current image block is a CU
- The target control points of the current image block may include at least two of the control points at the upper-left vertex, the upper-right vertex, the lower-left vertex, and the lower-right vertex of the current image block.
- The target control points of the current image block may be control points P1 and P2; or control points P1, P2, and P3; or control points P1, P2, P3, and P4; and so on. The combinations are not listed here one by one.
- the method for acquiring the motion information of the target control point may include the following S3011 or S3012:
- S3011 Determine motion information of a target control point of a current image block according to motion information of control points of neighboring image blocks of the current image block.
- the adjacent image block may be: an image block adjacent to a certain edge of the current image block, or an image block adjacent to a certain point of the current image block, which is not specifically limited in this embodiment of the present invention.
- the neighboring image blocks of the current image block may include reconstructed neighboring image blocks and unreconstructed neighboring image blocks.
- The reconstructed neighboring image blocks of the current image block are referred to as the first neighboring image blocks of the current image block, and the unreconstructed neighboring image blocks of the current image block are referred to as the second neighboring image blocks of the current image block.
- The adjacent image block of the current image block in the above S3011 is also a CU, and it is a reconstructed (that is, encoded or decoded) image block adjacent to the current image block, that is, a first neighboring image block; the motion information of the control points of the first neighboring image block has been obtained and stored.
- The current image block may have multiple first adjacent image blocks, so determining the motion information of the target control point of the current image block according to the motion information of the control points of the first adjacent image blocks may also yield multiple results.
- the motion information includes a motion vector
- Determining the motion information of a control point of the current image block, as described above, means determining the motion vector of that control point.
- The first adjacent image block of the current image block is the image block in which an adjacent sub-block of the current image block is located (the adjacent sub-block is a sub-block of a CU), as shown in FIG. 8.
- The five adjacent sub-blocks are A1, B1, C1, D1, and E1. They can be traversed in a certain order (such as A1 → B1 → C1 → D1 → E1) to obtain the first neighboring image blocks of the current image block and the motion information of the control points of each first neighboring image block; the motion information of the target control point of the current image block is then determined according to the motion model of the first neighboring image block (a 4-parameter, 6-parameter, or 8-parameter motion model).
- the above-mentioned adjacent sub-blocks A1, B1, C1, D1, and E1 are all spatially-adjacent sub-blocks of the current image block.
- an adjacent image block of the current image block (that is, an image block where the adjacent sub-block A1 of the current image block is located) is taken as an example.
- The control points of the current image block are denoted as M0, M1, M2, and M3, where the coordinates of M0 are (x0, y0), the coordinates of M1 are (x1, y1), the coordinates of M2 are (x2, y2), and the coordinates of M3 are (x3, y3). The adjacent image block where the sub-block A1 is located is recorded as image block 1, and the control points of image block 1 are respectively recorded as N0, N1, N2, and N3, where the coordinates of N0 are (x4, y4), the motion vector of N0 is (vx4, vy4), the coordinates of N1 are (x5, y5), and the motion vector of N1 is (vx5, vy5).
- A 4-parameter motion model can be used to determine the motion information of the target control point. Specifically, formula (1) can be used to calculate the motion vector of the control point M1, and formula (2) can be used to calculate the motion vector of the control point M2.
- The motion vector of the control point M1 is (vx0, vy0).
- The motion vector of the control point M2 is (vx1, vy1).
- The motion vectors of the target control points of the current image block are (vx0, vy0) and (vx1, vy1).
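The derivation of a target control point's motion vector from the control points N0 and N1 of image block 1 can be sketched as below. This is a minimal sketch under stated assumptions: it uses the common 4-parameter (rotation plus zoom) affine formulation and assumes that N0 and N1 lie on the same row (the upper-left and upper-right corners of image block 1); it is not necessarily identical to formulas (1) and (2). The function name is illustrative.

```python
from typing import Tuple

Point = Tuple[float, float]
MV = Tuple[float, float]


def inherit_mv_4param(point: Point, n0_pos: Point, n0_mv: MV,
                      n1_pos: Point, n1_mv: MV) -> MV:
    """Evaluate the neighbour's 4-parameter affine model at `point`."""
    x, y = point
    x4, y4 = n0_pos
    x5, _ = n1_pos              # assumed: N1 is on the same row as N0
    vx4, vy4 = n0_mv
    vx5, vy5 = n1_mv
    span = x5 - x4              # horizontal distance between N0 and N1
    a = (vx5 - vx4) / span
    b = (vy5 - vy4) / span
    return (vx4 + a * (x - x4) - b * (y - y4),
            vy4 + b * (x - x4) + a * (y - y4))
```

Evaluating the model at the coordinates of a control point of the current image block, for example (x0, y0), gives the inherited motion vector for that control point; evaluating it at N0 or N1 returns the neighbour's own motion vectors.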
- a 6-parameter motion model can be used to determine the motion information of the target control point.
- The motion vector of the control point M1 is (vx0, vy0).
- The motion vector of the control point M2 is (vx1, vy1).
- The motion vector of the control point M3 is (vx2, vy2).
- The motion vectors of the target control points of the current image block are (vx0, vy0), (vx1, vy1), and (vx2, vy2).
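A 6-parameter derivation can be sketched in the same spirit. The sketch assumes three control points of the neighbouring image block: N0 and N1 on the same row, and a third control point N2 in the same column as N0. The coordinates and motion vector of N2 are supplied by the caller and are assumptions here, as is the function name; the formulation is the common 6-parameter affine model rather than a formula taken from this text.

```python
from typing import Tuple

Point = Tuple[float, float]
MV = Tuple[float, float]
ControlPoint = Tuple[Point, MV]  # ((x, y), (vx, vy))


def inherit_mv_6param(point: Point, n0: ControlPoint,
                      n1: ControlPoint, n2: ControlPoint) -> MV:
    """Evaluate the neighbour's 6-parameter affine model at `point`."""
    (x4, y4), (vx4, vy4) = n0
    (x5, _), (vx5, vy5) = n1     # assumed: same row as N0
    (_, y6), (vx6, vy6) = n2     # assumed: same column as N0
    dx, dy = point[0] - x4, point[1] - y4
    w, h = x5 - x4, y6 - y4      # horizontal and vertical control-point spans
    vx = vx4 + (vx5 - vx4) / w * dx + (vx6 - vx4) / h * dy
    vy = vy4 + (vy5 - vy4) / w * dx + (vy6 - vy4) / h * dy
    return (vx, vy)
```

Unlike the 4-parameter model, the horizontal and vertical gradients are independent here, so the model can also represent shear and non-uniform scaling.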
- Five sets of motion information of the target control point of the current image block can be determined according to the motion information of the control points of the five adjacent image blocks where the sub-blocks A1, B1, C1, D1, and E1 are respectively located. If one of the sub-blocks A1, B1, C1, D1, and E1 is unavailable, that sub-block is skipped, and the motion information of the target control point of the current image block is determined based on the motion information of the control points of the first neighboring image block where the next sub-block is located.
- The positions of the above-mentioned sub-blocks A1, B1, C1, D1, and E1, the traversal order of the sub-blocks, and the motion model of the first neighboring image block are not limited. In practical applications, sub-blocks at other positions, other traversal orders, and other motion models may be used.
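The traversal with skipping of unavailable sub-blocks can be sketched as follows. The names are hypothetical: `neighbour_info` maps a sub-block label to the control-point motion information of the image block containing it (or `None` when unavailable), and `derive` stands for whichever motion-model derivation is applied.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence


def collect_candidates(neighbour_info: Dict[str, Optional[Any]],
                       derive: Callable[[Any], Any],
                       order: Sequence[str] = ("A1", "B1", "C1", "D1", "E1")
                       ) -> List[Any]:
    """Traverse neighbours in order and derive one candidate per available one."""
    candidates: List[Any] = []
    for name in order:
        info = neighbour_info.get(name)
        if info is None:
            continue  # unavailable sub-block: skip and move to the next one
        candidates.append(derive(info))
    return candidates
```

The result keeps the traversal order, so up to five candidates are produced when all five sub-blocks are available.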
- S3012. Determine the motion information of the target control point of the current image block according to the motion information of the adjacent sub-blocks of the target control point of the current image block.
- the adjacent sub-blocks of the target control point are sub-blocks of a CU, that is, a smaller block divided by a CU.
- the adjacent sub-block is a sub-block of the first adjacent image block of the current image block.
- The current image block may have one or more first adjacent image blocks; accordingly, a control point of the current image block may have one or more adjacent sub-blocks, and these adjacent sub-blocks may belong to different first adjacent image blocks.
- The motion information of the target control point of the current image block may be determined according to the motion information of the reconstructed sub-blocks adjacent to the current image block, specifically according to the motion information of the adjacent sub-blocks of the target control point.
- the motion information of the adjacent sub-blocks of the target control point is determined as the motion information of the target control point of the current image block
- the target control point of the current image block is the upper left vertex control point and the upper right vertex.
- the control point of the upper left vertex of the current image block is recorded as M0
- the adjacent subblocks of the control point M0 are A2, B2, and C2, respectively
- the control point of the upper right vertex is recorded as M1.
- Adjacent sub-blocks of the control point M1 are D2 and E2, respectively
- the motion information of the target control point includes the motion information of the control point M0 and the motion information of the control point M1.
- the adjacent sub-blocks A2, B2, and C2 are spatially-adjacent sub-blocks of the control point M0, and the adjacent sub-blocks D2 and E2 are spatially-adjacent sub-blocks of the control point M1.
- The motion vector of the control point M0 is recorded as v0 (specifically (vx0, vy0)), and the motion vector of the control point M1 is recorded as v1 (specifically (vx1, vy1)).
- The candidate motion vectors of the control point M0 are combined with the candidate motion vectors of the control point M1 to obtain a queue of two-tuples of candidate motion vectors of the target control points of the current image block.
- the indexes of the respective tuple queues of the candidate motion vectors are 0, 1, 2, 3, 4, 5 in order.
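Building the queue of two-tuples can be sketched as a Cartesian product of the candidates of M0 (from A2, B2, C2) and the candidates of M1 (from D2, E2), which yields six entries indexed 0 to 5. The ordering shown (M0 candidate varying slowest) is an assumption, and the function name is illustrative.

```python
from itertools import product
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def build_tuple_queue(m0_candidates: Sequence[T],
                      m1_candidates: Sequence[T]) -> List[Tuple[T, T]]:
    """Pair every candidate of M0 with every candidate of M1."""
    # The list position of each pair serves as its index in the queue.
    return list(product(m0_candidates, m1_candidates))
```

With three candidates for M0 and two for M1, the queue holds 3 x 2 = 6 two-tuples, matching the indexes 0 through 5.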
- the motion information of the control points is obtained.
- the motion information of the target control point can be determined in a similar manner of the control point combination.
- the motion information of the first available adjacent sub-block among the adjacent sub-blocks is obtained and used to determine the motion information of the target control point of the current image block. Specifically, the motion information of all control points of the current image block is determined first, and then the control points are combined to obtain all combinations of the motion information of the target control points of the current image block.
- the neighboring sub-blocks of the control point M0 of the current image block are C3, F3, and G3; these three sub-blocks are used to determine the motion information of the control point M0.
- the neighboring sub-blocks of the control point M1 are D3 and E3; these two sub-blocks are used to determine the motion information of the control point M1.
- the neighboring sub-blocks of the control point M2 are A3 and B3; these two sub-blocks are used to determine the motion information of the control point M2.
- the neighboring sub-block of the control point M3 is T1, which is used to determine the motion information of the control point M3.
- A3, B3, C3, D3, E3, F3, and G3 are all spatially neighboring sub-blocks, and T1 is a temporally neighboring sub-block.
- the motion information of each sub-block can be sequentially obtained in the order of F3 → C3 → G3, and the motion information of the first available sub-block detected is used as the motion information of the control point M0.
- the process of determining the motion information of the control point M0 is as follows:
- the motion information of each sub-block can be sequentially obtained in the order of D3 → E3, and the motion information of the first available sub-block detected is used as the motion information of the control point M1.
- the motion information of each sub-block can be sequentially obtained in the order of A3 → B3, and the motion information of the first available sub-block detected is used as the motion information of the control point M2.
- the above-mentioned process of determining the motion information of the control point M1 and the control point M2 is similar to the process of determining the motion information of the control point M0. For details, refer to the above description of determining the motion information of the control point M0, which is not repeated here.
- the motion information of the sub-block T1 is used as the motion information of the control point M3.
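The scan described above can be sketched as follows. This is only an illustration under stated assumptions: the function name, the dictionary layout, and the sample motion vectors are hypothetical, and an unavailable sub-block is represented by `None`.

```python
from typing import Dict, Optional, Sequence, Tuple

MotionVector = Tuple[int, int]

# Scan order per control point, as listed in the text.
SCAN_ORDER: Dict[str, Sequence[str]] = {
    "M0": ("F3", "C3", "G3"),
    "M1": ("D3", "E3"),
    "M2": ("A3", "B3"),
    "M3": ("T1",),  # temporally neighboring sub-block
}


def first_available_mv(
    order: Sequence[str],
    neighbor_mvs: Dict[str, Optional[MotionVector]],
) -> Optional[MotionVector]:
    """Return the motion vector of the first available sub-block in `order`."""
    for name in order:
        mv = neighbor_mvs.get(name)
        if mv is not None:  # None marks an unavailable sub-block (assumption)
            return mv
    return None


# Hypothetical availability and motion vectors, for illustration only.
neighbor_mvs = {
    "F3": None, "C3": (4, -2), "G3": (1, 1),
    "D3": (3, 0), "E3": None,
    "A3": None, "B3": None,
    "T1": (0, 5),
}

control_point_mvs = {
    cp: first_available_mv(order, neighbor_mvs)
    for cp, order in SCAN_ORDER.items()
}
```

In this sample, M0 takes the motion vector of C3 because F3 is unavailable, and M2 stays undetermined because neither A3 nor B3 is available.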
- control points of the current image block are combined to obtain multiple combinations of the motion information of the target control point.
- any two of the above control points M0, M1, M2, and M3 are combined, and the obtained two-tuples of control points include: {M0, M1}, {M0, M2}, {M0, M3}, {M1, M2}, {M1, M3}, {M2, M3}.
- the target control point includes three control points
- any three of the above control points M0, M1, M2, and M3 are combined, and the resulting three-tuples of control points include: {M0, M1, M2}, {M0, M1, M3}, {M1, M2, M3}, {M0, M2, M3}.
- the control points M0, M1, M2, and M3 are combined to obtain a four-tuple of the control points: {M0, M1, M2, M3}.
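The two-tuples, three-tuples, and four-tuple listed above are simply the combinations of the four control points. A minimal sketch using Python's standard `itertools.combinations` (the helper name `control_point_tuples` is hypothetical):

```python
from itertools import combinations

CONTROL_POINTS = ("M0", "M1", "M2", "M3")


def control_point_tuples(size):
    """Enumerate all unordered combinations of `size` control points."""
    return list(combinations(CONTROL_POINTS, size))


pairs = control_point_tuples(2)    # six two-tuples
triples = control_point_tuples(3)  # four three-tuples
quads = control_point_tuples(4)    # one four-tuple
```

Each tuple would then be paired with the motion information of its control points to form one candidate; a tuple containing a control point with no available motion information would presumably be skipped, although the text does not state this.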
- a candidate motion vector list (including the two-tuple in S3011, the two-tuple, three-tuple, and the four-tuple in S3012) of the target control point can be constructed. It is used to determine the motion information of the sub-block of the current image block.
- an affine transformation model is used to determine the motion information of the target pixel point in the sub-block of the current image block, and the motion information of the target pixel point is used as the motion information of the sub-block.
- the target pixel point is a pixel point different from the control point of the current image block.
- the motion information of each sub-block in the current image block may be determined according to the motion information of the target control point of the current image block, so that the prediction block of each sub-block is obtained according to the motion information of that sub-block, and the prediction block of the current image block is thereby obtained.
- the following uses a sub-block in the current image block (referred to as the first sub-block) as an example to describe the process of determining the motion information of the sub-block.
- the first sub-block may be a sub-block containing the control points of the current image block.
- the first sub-block may also be a sub-block that does not include a control point of the current image block.
- the target control point of the current image block whose motion information is obtained in the above S301 may include two control points, three control points, or four control points.
- when the target control point includes two control points, the motion information of the target pixel point may be determined according to a 4-parameter affine transformation model; when the target control point includes three control points, the motion information of the target pixel point may be determined according to a 6-parameter affine transformation model; when the target control point includes four control points, the motion information of the target pixel point may be determined according to an 8-parameter affine transformation model.
- the 4-parameter affine transformation model is:
- (vx, vy) composed of vx and vy is the motion vector of the target pixel point in the first sub-block
- (x, y) is the coordinate of the target pixel point (specifically, the coordinate relative to the upper-left vertex pixel of the current image block)
- a 1 , a 2 , a 3 , and a 4 are parameters of the affine transformation model, and the parameters are related to the motion information of the target control point. If the two control points included in the target control point are the control points M0 and M1, then, according to the motion information of the control point M0 and the control point M1, the motion information of the target pixel point in the first sub-block is:
- (vx 0 , vy 0 ) is the motion vector of the control point M0
- (vx 1 , vy 1 ) is the motion vector of the control point M1
- w is the width of the current image block.
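The equation itself is not reproduced in this text. For orientation only, the form widely used for a 4-parameter affine model whose two control points sit at the upper-left vertex (0, 0) and the upper-right vertex (w, 0) of the block is given below; it is offered as an assumption consistent with the symbols defined above, not as the exact expression of this embodiment.

```latex
\begin{aligned}
vx &= \frac{vx_1 - vx_0}{w}\,x \;-\; \frac{vy_1 - vy_0}{w}\,y \;+\; vx_0 \\
vy &= \frac{vy_1 - vy_0}{w}\,x \;+\; \frac{vx_1 - vx_0}{w}\,y \;+\; vy_0
\end{aligned}
```

Substituting (x, y) = (0, 0) returns (vx_0, vy_0), and (x, y) = (w, 0) returns (vx_1, vy_1), so the model reproduces the motion vectors of both control points.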
- the 6-parameter affine transformation model is:
- a 1 , a 2 , a 3 , and a 4 are parameters of the affine transformation model, and the parameters are related to the motion information of the target control point. If the three control points included in the target control point are the control points M0, M1, and M2, then, according to the motion information of the control point M0, the control point M1, and the control point M2, the motion information of the target pixel point in the first sub-block is:
- (vx 0 , vy 0 ) is the motion vector of the control point M0
- (vx 1 , vy 1 ) is the motion vector of the control point M1
- (vx 2 , vy 2 ) is the motion vector of the control point M2.
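As the 6-parameter equation is likewise missing here, the following sketch shows the commonly used form, assuming the three control points lie at the upper-left (0, 0), upper-right (w, 0), and lower-left (0, h) vertices of the block. The function name and the block height `h` are assumptions introduced for illustration; the text defines only the width w.

```python
def affine_6param_mv(x, y, v0, v1, v2, w, h):
    """Motion vector at (x, y) from three control-point motion vectors.

    v0, v1, v2 are (vx, vy) pairs at (0, 0), (w, 0) and (0, h) respectively
    (assumed layout); coordinates are relative to the block's upper-left pixel.
    """
    vx = (v1[0] - v0[0]) / w * x + (v2[0] - v0[0]) / h * y + v0[0]
    vy = (v1[1] - v0[1]) / w * x + (v2[1] - v0[1]) / h * y + v0[1]
    return vx, vy


# Sanity check with made-up values: the model reproduces each control point.
v0, v1, v2 = (1, 2), (5, 2), (1, 6)
at_upper_left = affine_6param_mv(0, 0, v0, v1, v2, w=16, h=8)
at_upper_right = affine_6param_mv(16, 0, v0, v1, v2, w=16, h=8)
at_lower_left = affine_6param_mv(0, 8, v0, v1, v2, w=16, h=8)
```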
- the target pixel point may be a central pixel point of the first sub-block, and the coordinates of the central pixel point may be determined according to the following formula (10):
- i = 0, 1, 2, ...
- j = 0, 1, 2, ...
- the target pixel point may also be any pixel point in the first sub-block, or a pixel point determined according to a certain rule, which is not specifically limited in the embodiment of the present invention.
- after the motion information of the target pixel point in the sub-block of the current image block is obtained according to the affine transformation model, because a prediction method based on a non-translational motion model treats the motion information of all pixel points in the same sub-block as identical, the motion information of the target pixel point can be used as the motion information of the sub-block.
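Putting the pieces together, one motion vector per sub-block can be derived by evaluating the affine model at each sub-block's center. The sketch below makes several assumptions: formula (10) is not reproduced in this text, so the center of sub-block (i, j) is taken as (i·M + M/2, j·N + N/2) for sub-blocks of M×N pixels; the 4-parameter model is the commonly used control-point form; and all function names are hypothetical.

```python
def affine_4param_mv(x, y, v0, v1, w):
    """Commonly used 4-parameter model with control points at (0, 0) and (w, 0)."""
    a = (v1[0] - v0[0]) / w
    b = (v1[1] - v0[1]) / w
    return (a * x - b * y + v0[0], b * x + a * y + v0[1])


def sub_block_center(i, j, sub_w, sub_h):
    """Center of sub-block (column i, row j); assumed stand-in for formula (10)."""
    return (i * sub_w + sub_w / 2, j * sub_h + sub_h / 2)


def sub_block_mv_field(block_w, block_h, sub_w, sub_h, v0, v1):
    """Map each sub-block index (i, j) to the motion vector of its center pixel."""
    field = {}
    for j in range(block_h // sub_h):
        for i in range(block_w // sub_w):
            x, y = sub_block_center(i, j, sub_w, sub_h)
            field[(i, j)] = affine_4param_mv(x, y, v0, v1, block_w)
    return field


# Identical control-point vectors describe pure translation, so every
# sub-block of this 16x16 block receives the same motion vector.
field = sub_block_mv_field(16, 16, 4, 4, v0=(2.0, 1.0), v1=(2.0, 1.0))
```

Storing one vector per sub-block rather than per pixel is what keeps the affine model cheap: the model is evaluated once per sub-block and the result is shared by all of its pixels.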
- because the motion information indicates that the sub-blocks of the current image block can be obtained by offsetting sub-blocks of a reconstructed image block, the prediction information (i.e., the prediction block) of each sub-block can be quickly determined according to the motion information of each sub-block in the current image block. Specifically, according to the motion vector in the motion information of the sub-block of the current image block and the position information of the sub-block of the current image block, the reference block pointed to by that motion vector is determined in a reference frame of the current image block, and the reference block is used as a prediction block of the sub-block of the current image block.
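A minimal sketch of locating the reference block follows. It is deliberately simplified and the simplifications are assumptions, not part of the text: motion vectors are whole pixels (a real codec interpolates fractional positions), positions falling outside the reference frame are clamped to its border, and the frame is a plain list of rows.

```python
def fetch_prediction_block(ref_frame, x0, y0, sub_w, sub_h, mv):
    """Copy the sub_w x sub_h block that `mv` points to in `ref_frame`.

    (x0, y0) is the upper-left position of the sub-block in the current frame;
    mv = (dx, dy) is an integer-pel displacement (assumption).
    """
    height, width = len(ref_frame), len(ref_frame[0])
    dx, dy = mv
    block = []
    for row in range(sub_h):
        ry = min(max(y0 + dy + row, 0), height - 1)  # clamp to frame rows
        line = []
        for col in range(sub_w):
            rx = min(max(x0 + dx + col, 0), width - 1)  # clamp to frame columns
            line.append(ref_frame[ry][rx])
        block.append(line)
    return block


# Toy 8x8 reference frame whose sample value encodes its position (10*row + col).
ref_frame = [[10 * r + c for c in range(8)] for r in range(8)]
pred = fetch_prediction_block(ref_frame, x0=2, y0=2, sub_w=2, sub_h=2, mv=(1, -1))
```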
- because the motion information of the target control point of the current image block determined in the above S3011 or S3012 may include multiple candidates (i.e., candidate motion vectors), multiple prediction results may be obtained for the current image block.
- the candidate motion vector two-tuples of the target control point include the aforementioned six sets of motion vectors { v 0A2 , v 1D2 }, { v 0A2 , v 1E2 }, { v 0B2 , v 1D2 }, { v 0B2 , v 1E2 }, { v 0C2 , v 1D2 }, { v 0C2 , v 1E2 }. Accordingly, six prediction results of the current image block can be determined.
- one of the above 6 types of prediction results can be selected as the final prediction block of the current image block (that is, the optimal prediction block) for encoding the current image block.
- the average value of the differences between the value of each pixel in the original block of the current image block and the value of the corresponding pixel in each prediction block can be calculated, and the prediction block with the smallest average value is selected as the final prediction block of the current image block.
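The selection rule above can be sketched as follows. The text speaks only of "the average value of the difference"; taking the absolute difference per pixel is an assumption made here so that positive and negative errors do not cancel, and the function names and sample blocks are hypothetical.

```python
def mean_abs_diff(original, predicted):
    """Average absolute per-pixel difference between two equally sized blocks."""
    total = 0
    count = 0
    for orig_row, pred_row in zip(original, predicted):
        for o, p in zip(orig_row, pred_row):
            total += abs(o - p)
            count += 1
    return total / count


def select_best_candidate(original, candidates):
    """Return (index, cost) of the prediction block with the smallest cost."""
    costs = [mean_abs_diff(original, pred) for pred in candidates]
    best_index = min(range(len(costs)), key=costs.__getitem__)
    return best_index, costs[best_index]


original = [[10, 12], [14, 16]]
candidates = [
    [[9, 12], [14, 18]],   # cost 0.75
    [[10, 12], [14, 15]],  # cost 0.25
    [[0, 0], [0, 0]],      # cost 13.0
]
best_index, best_cost = select_best_candidate(original, candidates)
```

The returned index is the value that would be written into the code stream to identify the chosen candidate tuple.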
- the prediction mode of the current image block is the AMVP mode
- the value of the motion vector of the target control point corresponding to the final prediction result of the current image block is used as the predicted value of the motion vector of the target control point
- the index of that motion vector in the candidate motion vector tuples is coded into the code stream and passed to the decoding end, and the encoding end also passes to the decoding end the difference between the predicted value of the motion vector of the target control point and the actual motion vector of the target control point.
- the prediction mode of the current image block is the Merge mode
- the motion vector value of the target control point corresponding to the final prediction result of the current image block is used as the predicted value of the motion vector of the target control point, and the index of that motion vector in the candidate motion vector two-tuples is coded into the code stream and passed to the decoder.
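The difference between the two modes lies in what is signalled: AMVP sends a candidate index plus a motion vector difference, while Merge sends the index alone. The sketch below models the "code stream" as a plain dictionary; the function names, payload layout, and sample vectors are hypothetical, and no entropy coding is shown.

```python
def encode_amvp(candidates, index, actual_mvs):
    """AMVP: signal the candidate index and the per-control-point difference."""
    predictor = candidates[index]
    mvd = tuple((a[0] - p[0], a[1] - p[1]) for a, p in zip(actual_mvs, predictor))
    return {"index": index, "mvd": mvd}


def decode_amvp(candidates, payload):
    """Rebuild the actual motion vectors from the predictor and the difference."""
    predictor = candidates[payload["index"]]
    return tuple((p[0] + d[0], p[1] + d[1]) for p, d in zip(predictor, payload["mvd"]))


def encode_merge(index):
    """Merge: only the candidate index is signalled."""
    return {"index": index}


def decode_merge(candidates, payload):
    """The selected candidate is used directly as the motion vectors."""
    return candidates[payload["index"]]


# Two hypothetical candidate two-tuples (motion vectors of M0 and M1).
candidates = [((4, -2), (3, 0)), ((1, 1), (3, 0))]
actual = ((5, -2), (3, 1))

amvp_payload = encode_amvp(candidates, 0, actual)
amvp_decoded = decode_amvp(candidates, amvp_payload)
merge_decoded = decode_merge(candidates, encode_merge(1))
```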
- deblocking filtering may be performed on the sub-blocks of the current image block according to the motion information of the sub-blocks of the current image block, or overlapped block motion compensation may be performed on the sub-blocks of the current image block according to that motion information.
- the adjacent image block of the above-mentioned sub-block is adjacent to the current image block, and the adjacent image block is obtained by prediction using a translational motion model.
- the adjacent image block of the sub-block of the current image block is a CU.
- the adjacent image blocks of the sub-block are unreconstructed image blocks. According to the description of the adjacent image blocks in the foregoing embodiment, the adjacent image blocks of the sub-blocks of the current image block are the second adjacent image blocks of the current image block.
- the motion information of the second adjacent image block can be predicted according to the motion information of the sub-blocks of the current image block, so that the prediction block of the second adjacent image block is determined based on the motion information of the second adjacent image block.
- the sub-blocks of the second adjacent image block are determined according to the method in S3012 above.
- the motion information of the control points of the second adjacent image block may be determined according to the motion information of the sub-block of the current image block; then, based on the motion information of the control points of the second adjacent image block, the motion information of each sub-block of the second adjacent image block is obtained, thereby obtaining the prediction block of each sub-block and, in turn, the prediction block of the second adjacent image block.
- the motion information of the target control point of the current image block may be used to predict the motion information of the control points of the second adjacent image block of the current image block, so as to determine the motion information of the control points of the second adjacent image block.
- specifically, the motion information of the control points of the second adjacent image block may be determined based on the motion model of the first adjacent image block in S3011 (that is, the above formulas (1)-(2), or formulas (3)-(4)).
- the method for predicting motion information provided by the embodiment of the present invention may further include: setting the motion information of each pixel in the sub-block of the current image block to the motion information of the sub-block, that is, using the motion information of the sub-block as the motion information of each pixel in the sub-block of the current image block.
- the method for predicting motion information provided by the embodiment of the present invention may further include: after determining the motion information of the pixels in the sub-block of the current image block, storing the motion information of the pixels in the first sub-block.
- the current image block includes two kinds of motion information, one is motion information of a control point in the current image block, and one is motion information of each sub-block of the current image block.
- the method for predicting motion information provided by the embodiment of the present invention may further include: saving at least one of motion information of a sub-block of the current image block and motion information of a target control point.
- the motion information of the sub-blocks of the current image block and the motion information of the control points of the current image block are stored in different storage locations, for example, different storage locations in the memory of a codec device, or different storage locations of a storage device external to the codec device; this is not limited in the embodiment of the present invention.
- the position coordinates of the target control point may be saved.
- the method for predicting motion information may use an affine transformation model to determine the motion information of a target pixel in a sub-block of the current image block according to the motion information of the target control point of the current image block to be predicted, and use the motion information of the target pixel as the motion information of the sub-block of the current image block, so that the motion information of the sub-block is used to predict the motion information of the adjacent image block of the sub-block and thereby determine the motion information of the adjacent image block. This can improve the performance of the codec and reduce the complexity of the codec.
- the image prediction method provided by the embodiment of the present invention may include S401-S403:
- S401. Determine motion information of control points of the current image block according to motion information of control points of neighboring image blocks of the current image block to be predicted.
- the adjacent image block of the current image block is a CU, and the adjacent image block of the current image block satisfies at least one of the following A1-A3 conditions:
- the adjacent image block of the current image block is the image block located to the left or lower left of the current image block, and the adjacent image blocks do not include image blocks located above, to the upper left of, or to the upper right of the current image block.
- the upper boundary of the current image block coincides with the upper boundary of the CTU where the current image block is located. If the upper boundary of the current image block coincides with the upper boundary of that CTU, and only the upper boundary coincides, then when the adjacent image block is selected, the image blocks located above, to the upper left of, and to the upper right of the current image block may be left unselected.
- the image block where B1 is located and the one where C1 is located are not selected.
- the image block where A1 is located and the image block where D1 is located can be selected.
- the image block where B1 is located, the image block where C1 is located, and the image block where E1 is located are image blocks in other CTUs.
- when images are predicted during encoding and decoding, these adjacent image blocks are no longer selected, so it is no longer necessary to obtain the motion information of control points of adjacent image blocks across the CTU, which can save the resources consumed by encoding and decoding.
- the adjacent image block of the current image block is an image block located above or to the upper right of the current image block, and the adjacent image blocks do not include image blocks located at the upper left, left, or lower left of the current image block.
- the left boundary of the current image block coincides with the left boundary of the CTU where the current image block is located.
- when the left edge of the current image block coincides with the left edge of the CTU where the current image block is located and the adjacent image block of the current image block is selected, the image block where A1 is located, the image block where D1 is located, and the image block where E1 is located are not selected; the image block where B1 is located and the image block where C1 is located can be selected.
- the image block where A1 is located, the image block where D1 is located, and the image block where E1 is located are image blocks in other CTUs.
- these adjacent image blocks are no longer selected, so it is no longer necessary to obtain the motion information of the control points of the adjacent image blocks across the CTU, which can save the resources consumed by the encoding and decoding.
- the adjacent image block of the current image block is an image block located at the upper left, upper, or upper right of the current image block. Adjacent image blocks do not include image blocks located to the left or lower left of the current image block.
- when the left edge of the current image block coincides with the left edge of the CTU where the current image block is located and the adjacent image block of the current image block is selected, the image block where A1 is located and the image block where D1 is located are not selected; the image block where B1 is located, the image block where C1 is located, and the image block where E1 is located can be selected.
- the adjacent image blocks of the current image block are spatially adjacent blocks of the current image block.
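The selection rule can be sketched as a simple filter over the neighbor positions A1, B1, C1, D1, and E1. The grouping below follows the examples given above for the upper-boundary case (B1, C1, E1 lie in other CTUs) and for the left-boundary case in which A1, D1, and E1 are left unselected; the function and constant names are hypothetical, and the variant in which E1 remains selectable is not modeled.

```python
ALL_NEIGHBORS = ("A1", "B1", "C1", "D1", "E1")
ABOVE_SIDE = {"B1", "C1", "E1"}  # excluded when the upper boundary lies on the CTU boundary
LEFT_SIDE = {"A1", "D1", "E1"}   # excluded when the left boundary lies on the CTU boundary


def selectable_neighbors(top_on_ctu_boundary, left_on_ctu_boundary):
    """Return the neighbor positions that stay inside the current CTU."""
    excluded = set()
    if top_on_ctu_boundary:
        excluded |= ABOVE_SIDE
    if left_on_ctu_boundary:
        excluded |= LEFT_SIDE
    return [name for name in ALL_NEIGHBORS if name not in excluded]


top_only = selectable_neighbors(True, False)
left_only = selectable_neighbors(False, True)
both = selectable_neighbors(True, True)
```

When both boundaries coincide, no neighboring image block remains selectable, which is the situation where control-point motion information has to come from adjacent sub-blocks instead.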
- the method for determining the motion information of the control point of the current image block in S401 is the method for determining the motion information of the target control point of the current image block described in S3011.
- the control point of the current image block is the target control point of the current image block in S301 above
- the motion information of the control point of the current image block is the motion information of the target control point in S301 above
- the control points of the adjacent image blocks of the current image block are the control points of the first adjacent image block of the current image block
- the motion information of the control points of the adjacent image blocks of the current image block is the motion information of the target control point of the first adjacent image block.
- S402. Determine the motion information of the sub-blocks of the current image block by using an affine transformation model according to the motion information of the control points of the current image block.
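- S403. Obtain the prediction block of the sub-block of the current image block according to the motion information of the sub-block of the current image block: in a reference frame of the current image block, the reference block pointed to by the motion vector in the motion information of the sub-block is determined, and the reference block is used as a prediction block of the sub-block of the current image block.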
- some of the adjacent image blocks of the current image block are selected to determine the motion information of the control points of the current image block, and the motion information of the control points of the other adjacent image blocks is no longer obtained across the CTU. In this way, the resources consumed by encoding and decoding can be saved.
- the image prediction method provided by the embodiment of the present invention may include S501-S504:
- S501. Determine whether the upper boundary of the current image block coincides with the upper boundary of the CTU where the current image block is located, and whether the left boundary of the current image block coincides with the left boundary of the CTU where the current image block is located.
- when the upper and left boundaries of the current image block coincide with the upper and left boundaries of the CTU where the current image block is located, the motion information of the control points of the current image block is no longer determined by the method in S401 (or S3011).
- instead, the motion information of the control points of the current image block is determined according to the motion information of the adjacent sub-blocks of the control points of the current image block (i.e., the S3012 method in the above embodiment).
- determining the motion information of the control points from their adjacent sub-blocks realizes the prediction of the current image block without having to obtain, across the CTU, the motion information of the control points of the adjacent image blocks of the current image block, which saves the resources consumed by encoding and decoding.
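The boundary check and the resulting choice of method can be sketched as below. The assumptions are that CTUs are square, tile the picture on a grid anchored at the picture origin, and that block positions are given in pixels; the function names and the returned labels are hypothetical.

```python
def on_ctu_boundaries(block_x, block_y, ctu_size):
    """Return (top_coincides, left_coincides) for a block at (block_x, block_y)."""
    return block_y % ctu_size == 0, block_x % ctu_size == 0


def choose_control_point_source(block_x, block_y, ctu_size):
    """Pick how control-point motion information is derived for the block."""
    top, left = on_ctu_boundaries(block_x, block_y, ctu_size)
    if top and left:
        # Both boundaries lie on the CTU boundary: use adjacent sub-blocks
        # (the S3012-style method) and avoid reading control points across CTUs.
        return "adjacent_sub_blocks"
    # Otherwise derive from control points of adjacent image blocks (S401 / S3011).
    return "adjacent_image_blocks"


corner_block = choose_control_point_source(128, 64, ctu_size=64)
inner_block = choose_control_point_source(132, 64, ctu_size=64)
```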
- the above mainly introduces the solution provided by the embodiment of the present invention from a method perspective.
- it includes a hardware structure and / or a software module corresponding to each function.
- the present application can be implemented in hardware or a combination of hardware and computer software. Whether a certain function is performed by hardware or computer software-driven hardware depends on the specific application and design constraints of the technical solution. A professional technician can use different methods to implement the described functions for each specific application, but such implementation should not be considered to be beyond the scope of this application.
- the functional modules of the motion information prediction device and the image prediction device may be divided according to the foregoing method example.
- each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The above integrated modules can be implemented in the form of hardware or software functional modules. It should be noted that the division of the modules in the embodiment of the present invention is schematic and is only a logical function division. In actual implementation, there may be another division manner.
- FIG. 13 is a schematic block diagram of a motion information prediction apparatus 1100 in an embodiment of the present invention.
- the apparatus 1100 for predicting motion information may include an acquisition module 1101 and a determination module 1102, wherein: the acquisition module 1101 is configured to acquire motion information of a target control point of a current image block to be predicted; the determination module 1102 is configured to determine, by using an affine transformation model and based on the motion information of the target control point, the motion information of a target pixel point in a sub-block of the current image block, and to use the motion information of the target pixel point as the motion information of the sub-block of the current image block.
- the target pixel point is a pixel point different from the target control point.
- the target pixel point may be the center pixel of the sub-block.
- the motion information of the sub-block is used to predict the motion information of the neighboring image block of the sub-block, so as to determine the motion information of that neighboring image block; the neighboring image block of the sub-block is adjacent to the current image block.
- the obtaining module 1101 may be specifically used to execute S301, and the determining module 1102 may be used to execute S302 and S303.
- the determining module 1102 is further configured to determine a prediction block of a neighboring image block of the sub-block according to the motion information of the neighboring image block of the sub-block.
- the apparatus for predicting motion information provided by the embodiment of the present invention may further include a processing module 1103, which is configured to perform deblocking filtering on a sub-block of the current image block based on the motion information of the sub-block, or to perform overlapped block motion compensation on the sub-block based on the motion information of the sub-block.
- the above-mentioned determining module 1102 may be further configured to use the motion information of the target control point to predict the motion information of the control points of the adjacent image blocks of the current image block, so as to determine the motion information of the control points of the adjacent image blocks.
- the determining module 1102 is further configured to set the motion information of each pixel in a sub-block of the current image block as the motion information of the sub-block.
- the apparatus for predicting motion information provided by an embodiment of the present invention may further include a storage module 1104; the storage module 1104 is configured to store motion information of pixels in a sub-block of the current image block.
- the storage module 1104 may be further configured to store at least one of motion information of a sub-block of the current image block and motion information of a target control point of the current image block.
- FIG. 16 is a schematic block diagram of an image prediction apparatus 1200 in an embodiment of the present invention.
- the image prediction apparatus 1200 may include a first determination module 1201, a second determination module 1202, and a third determination module 1203.
- the first determining module 1201 may be used to determine the motion information of the control points of the current image block according to the motion information of the control points of the neighboring image blocks of the current image block to be predicted;
- the second determining module 1202 may be used to determine the motion information of the sub-blocks of the current image block by using an affine transformation model according to the motion information of the control points of the current image block.
- the third determination module 1203 may be used to obtain the prediction blocks of the sub-blocks of the current image block according to the motion information of the sub-blocks of the current image block.
- the first determination module 1201 may be specifically used to execute S401
- the second determination module 1202 may be specifically used to execute S402
- the third determination module 1203 may be specifically used to execute S403.
- Adjacent image blocks of the current image block satisfy at least one of the following conditions:
- the adjacent image block of the current image block is the image block located to the left or lower left of the current image block, and the adjacent image block does not include image blocks located above, to the upper left of, or to the upper right of the current image block;
- the adjacent image block of the current image block is an image block located above or to the upper right of the current image block, and the adjacent image block does not include the upper-left, left, or lower-left image blocks of the current image block.
- the adjacent image blocks of the current image block are spatially adjacent blocks of the current image block.
- the first determining module 1201 is specifically configured to calculate the motion information of the control points of the current image block according to the motion information of the control points of the adjacent image block of the current image block to be predicted:
- (vx 4 , vy 4 ) is the motion vector of the control point (x 4 , y 4 ) located at the upper left vertex of the adjacent image block
- (vx 5 , vy 5 ) is the motion vector of the control point (x 5 , y 5 ) located at the upper right vertex of the adjacent image block
- (vx, vy) is the motion vector of the control point (x, y) of the current image block.
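The formula referred to above is not reproduced in this text. The sketch below uses the commonly applied 4-parameter extrapolation from the two upper control points of the adjacent image block, assuming both lie on the same row (y 5 = y 4); the function name is hypothetical and the expression is an assumption consistent with the symbols defined above, not a quotation of the embodiment.

```python
def inherit_control_point_mv(x, y, p4, v4, p5, v5):
    """Extrapolate the motion vector at (x, y) from a neighbor's control points.

    p4 = (x4, y4), v4 = (vx4, vy4): upper-left control point of the neighbor.
    p5 = (x5, y5), v5 = (vx5, vy5): upper-right control point of the neighbor.
    Assumes y5 == y4 and x5 != x4.
    """
    x4, y4 = p4
    x5, _ = p5
    a = (v5[0] - v4[0]) / (x5 - x4)  # horizontal gradient of vx
    b = (v5[1] - v4[1]) / (x5 - x4)  # horizontal gradient of vy
    vx = v4[0] + a * (x - x4) - b * (y - y4)
    vy = v4[1] + b * (x - x4) + a * (y - y4)
    return vx, vy


# Neighbor spanning x in [0, 8] on row 0, with made-up control-point vectors.
mv_at_neighbor_corner = inherit_control_point_mv(8, 0, (0, 0), (2, 1), (8, 0), (4, 1))
mv_at_current_cp = inherit_control_point_mv(16, 8, (0, 0), (2, 1), (8, 0), (4, 1))
```

Evaluating the function at the neighbor's own control points returns their motion vectors unchanged, which is a quick consistency check on the extrapolation.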
- the second determining module 1202 is specifically configured to calculate the motion information of the target pixel point in the sub-block of the current image block by using the following formula according to the motion information of the control point of the current image block:
- (vx 0 , vy 0 ) is the motion vector of the control point (x 0 , y 0 ) located at the upper-left vertex of the current image block
- (vx 1 , vy 1 ) is the motion vector of the control point (x 1 , y 1 ) located at the upper-right vertex of the current image block
- (vx, vy) is the motion vector of the target pixel (x, y); and the motion vector of the target pixel is used as the motion vector of the sub-block.
- the third determining module 1203 is specifically configured to determine, in a reference frame of the current image block, according to the motion vector in the motion information of the sub-block of the current image block and the position information of the sub-block of the current image block, the reference block pointed to by the motion vector in the motion information of the sub-block, and to use the reference block as a prediction block of the sub-block of the current image block.
- FIG. 17 is a schematic block diagram of an image prediction apparatus 1300 according to an embodiment of the present invention.
- the image prediction apparatus may include a first determination module 1301, a second determination module 1302, a third determination module 1303, and a fourth determination module 1304.
- the first determining module 1301 is configured to determine whether the upper boundary of the current image block to be predicted coincides with the upper boundary of the CTU where the current image block is located, and whether the left boundary of the current image block coincides with the left boundary of the CTU where the current image block is located.
- the second determining module 1302 is configured to: when the left boundary of the current image block coincides with the left boundary of the CTU where the current image block is located, and the upper boundary of the current image block coincides with the upper boundary of the CTU where the current image block is located, determine the motion information of the control points of the current image block according to the motion information of the adjacent sub-blocks of the control points of the current image block, where the adjacent sub-blocks are sub-blocks of a CU;
- the third determining module 1303 is configured to determine the motion information of the sub-blocks of the current image block according to the motion information of the control points of the current image block, by using an affine transformation model.
- the fourth determination module 1304 is configured to obtain the prediction blocks of the sub-blocks of the current image block according to the motion information of the sub-blocks of the current image block.
- the first determination module 1301 may be specifically used to execute S501
- the second determination module 1302 may be specifically used to execute S502
- the third determination module 1303 may be specifically used to execute S503
- the fourth determination module 1304 may be specifically used to execute S504.
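The boundary test performed by the first determining module can be sketched as follows. This is a minimal illustration that assumes CTUs are laid out on a regular grid starting at the picture origin, so that a block edge coincides with a CTU edge exactly when its coordinate is a multiple of the CTU size; the function names and the returned labels are hypothetical.

```python
def ctu_boundary_flags(block_x, block_y, ctu_size):
    """Return (on_left, on_top) for a block whose top-left sample is (block_x, block_y)."""
    if ctu_size <= 0:
        raise ValueError("ctu_size must be positive")
    on_left = block_x % ctu_size == 0   # left boundary coincides with the CTU's left boundary
    on_top = block_y % ctu_size == 0    # upper boundary coincides with the CTU's upper boundary
    return on_left, on_top


def control_point_source(block_x, block_y, ctu_size):
    """Pick where control-point motion information is derived from (labels are illustrative)."""
    on_left, on_top = ctu_boundary_flags(block_x, block_y, ctu_size)
    if on_left and on_top:
        # Both boundaries coincide: use the adjacent sub-blocks of the control points.
        return "adjacent_sub_blocks"
    if on_left or on_top:
        # Only one boundary coincides: use control points of an adjacent image block.
        return "adjacent_block_control_points"
    # Neither boundary coincides: not covered by this part of the description.
    return "not_covered_here"
```

For example, with a 64x64 CTU, a block at (128, 64) touches both CTU boundaries, while a block at (128, 72) touches only the left one.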
- the second determining module 1302 is specifically configured to determine the motion information of the adjacent sub-blocks of the control points of the current image block as the motion information of the control points of the current image block; or to determine, in a preset order, whether the adjacent sub-blocks of the control points of the current image block are available, and determine the motion information of the first available adjacent sub-block among the adjacent sub-blocks as the motion information of the control points of the current image block.
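The second alternative, scanning the adjacent sub-blocks in a preset order and taking the first available one, can be sketched as below. The representation is an assumption: each candidate is either `None` (unavailable) or a motion-information tuple, and the list is already arranged in the preset order; the function name is hypothetical.

```python
def first_available_motion(candidates_in_preset_order):
    """Return the motion information of the first available adjacent sub-block, or None."""
    for motion_info in candidates_in_preset_order:
        if motion_info is not None:      # None marks an unavailable neighbour
            return motion_info
    return None                          # no adjacent sub-block is available


# Three neighbours of one control point, checked in a (hypothetical) preset order.
neighbours = [None, (3, -1), (5, 2)]
control_point_motion = first_available_motion(neighbours)
```

What the codec does when no neighbour is available is not stated in this passage, so the sketch simply returns `None` in that case.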
- the third determining module 1303 is specifically configured to calculate the motion information of the target pixel point in the sub-block of the current image block according to the motion information of the control point of the current image block by using the following formula:
- where (vx0, vy0) is the motion vector of the control point (x0, y0) located at the upper-left vertex of the current image block;
- (vx1, vy1) is the motion vector of the control point (x1, y1) located at the upper-right vertex of the current image block;
- (vx, vy) is the motion vector of the target pixel (x, y).
- the fourth determining module 1304 is specifically configured to determine, in the reference frame of the current image block and according to the motion vector in the motion information of the sub-block of the current image block and the position information of the sub-block, the reference block pointed to by that motion vector, and to use the reference block as the prediction block of the sub-block of the current image block.
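Locating the prediction block from a sub-block's position and motion vector can be illustrated with the following sketch. It makes simplifying assumptions that are not in the text: the motion vector has integer-sample precision (no interpolation), the reference frame is a list of rows, and coordinates outside the frame are clamped to the nearest edge sample. The function name is hypothetical.

```python
def fetch_prediction_block(ref_frame, x, y, width, height, mv):
    """Copy the width x height block at (x + mvx, y + mvy) from ref_frame."""
    rows = len(ref_frame)
    cols = len(ref_frame[0])
    mvx, mvy = mv
    block = []
    for dy in range(height):
        ry = min(max(y + mvy + dy, 0), rows - 1)       # clamp row index to the frame
        row = []
        for dx in range(width):
            rx = min(max(x + mvx + dx, 0), cols - 1)   # clamp column index to the frame
            row.append(ref_frame[ry][rx])
        block.append(row)
    return block


# 8x8 reference frame whose sample value encodes its position (row * 10 + column).
reference = [[r * 10 + c for c in range(8)] for r in range(8)]
prediction = fetch_prediction_block(reference, 2, 2, 2, 2, (1, -1))
```

A practical codec would normally use fractional-sample motion vectors with interpolation filters; the integer-only copy here is just the simplest instance of "the block pointed to by the motion vector".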
- the above-mentioned second determining module 1302 may also be used to: when the left boundary of the current image block coincides with the left boundary of the CTU where the current image block is located, or the upper boundary of the current image block coincides with the upper boundary of the CTU where the current image block is located, determine the motion information of the control points of the current image block according to the motion information of the control points of an adjacent image block of the current image block.
- the third determination module 1303 may also be used to determine the motion information of the sub-blocks of the current image block according to the motion information of the control points of the current image block, by using an affine transformation model.
- the above-mentioned fourth determination module 1304 may also be used to obtain the prediction blocks of the sub-blocks of the current image block based on the motion information of the sub-blocks of the current image block.
- the adjacent image block of the current image block satisfies at least one of the following conditions:
- when the left boundary of the current image block coincides with the left boundary of the CTU, the adjacent image block of the current image block is an image block located to the left or lower left of the current image block, and does not include image blocks located above, to the upper left of, or to the upper right of the current image block;
- when the upper boundary of the current image block coincides with the upper boundary of the CTU, the adjacent image block of the current image block is an image block located above or to the upper right of the current image block, and does not include image blocks located to the upper left of, to the left of, or to the lower left of the current image block.
- each module in the image prediction apparatus and the motion information prediction apparatus according to the embodiments of the present invention is a functional entity that implements the execution steps included in the image prediction method and the motion information prediction method according to the embodiments of the present invention, as well as extensions and variations of those steps. For brevity, they are not described again here.
- FIG. 18 is a schematic block diagram of an implementation manner of an encoding device or a decoding device (referred to as a decoding device 1400) used in an embodiment of the present invention.
- the decoding device 1400 may include a processor 1410, a memory 1430, and a bus system 1450.
- the processor 1410 and the memory 1430 are connected through a bus system 1450.
- the memory 1430 is used to store instructions.
- the processor 1410 is used to execute the instructions stored in the memory 1430 to perform the various video encoding and decoding methods described in the embodiments of the present invention, especially the above-mentioned motion information prediction method and image prediction method. To avoid repetition, they are not described in detail here.
- the processor 1410 may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
- a general-purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
- the memory 1430 may include a ROM device or a RAM device. Any other suitable type of storage device may also be used as the memory 1430.
- the memory 1430 may include code and data 1431 accessed by the processor 1410 using the bus 1450.
- the memory 1430 may further include an operating system 1433 and an application program 1435.
- the application program 1435 includes at least one program that allows the processor 1410 to execute the video encoding and decoding method described in the embodiment of the present invention.
- the application program 1435 may include applications 1 to N, which further includes a video encoding or decoding application (referred to as a video decoding application) that executes the video encoding and decoding method described in the embodiment of the present invention.
- the bus system 1450 may include a data bus, a power bus, a control bus, and a status signal bus. However, for the sake of clarity, various buses are marked as the bus system 1450 in the figure.
- the decoding device 1400 may further include one or more output devices, such as a display 1470.
- the display 1470 may be a touch-sensitive display that combines a display with a touch-sensing unit operable to sense touch input.
- the display 1470 may be connected to the processor 1410 via a bus 1450.
- Computer-readable media may include computer-readable storage media, which correspond to tangible media such as data storage media, or communication media, which include any medium that facilitates transfer of a computer program from one place to another (e.g., according to a communication protocol).
- computer-readable media may generally correspond to (1) non-transitory, tangible computer-readable storage media, or (2) communication media such as signals or carrier waves.
- a data storage medium may be any available medium that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and / or data structures used to implement the techniques described in this application.
- the computer program product may include a computer-readable medium.
- such computer-readable storage media may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store the required program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium.
- the computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other temporary media, but are instead directed to non-transitory tangible storage media.
- magnetic disks and optical discs, as used herein, include compact discs (CDs), laser discs, optical discs, DVDs, and Blu-ray discs, where magnetic disks typically reproduce data magnetically, while optical discs use lasers to reproduce data optically. Combinations of the above should also be included within the scope of computer-readable media.
- the term "processor" as used herein may refer to any of the aforementioned structures or any other structure suitable for implementing the techniques described herein.
- the functions described by the various illustrative logical blocks, modules, and steps described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated into a combined codec.
- the techniques can be fully implemented in one or more circuits or logic elements.
- various illustrative logical boxes, units, and modules in the video encoder 100 and the video decoder 200 can be understood as corresponding circuit devices or logic elements.
- the techniques of this application may be implemented in a wide variety of apparatuses or devices, including a wireless handset, an integrated circuit (IC), or a group of ICs (e.g., a chipset).
- Various components, modules, or units are described in this application to emphasize functional aspects of the apparatus for performing the disclosed techniques, but do not necessarily need to be implemented by different hardware units.
- the various units may be combined in a codec hardware unit in conjunction with suitable software and/or firmware, or provided by interoperable hardware units (including one or more processors as described above).
- FIG. 19 is a block diagram of a video decoding system 1 according to an example described in the embodiment of the present invention.
- the term "video coder” generally refers to both video encoders and video decoders.
- the terms “video coding” or “coding” may generally refer to video encoding or video decoding.
- the video encoder 300 and the video decoder 400 of the video decoding system 1 are configured to predict, according to the method examples described for any of the new inter prediction modes proposed in this application, the motion information (for example, the motion vector) of the current coded image block or its sub-blocks, so that the predicted motion vector is as close as possible to the motion vector obtained by motion estimation. The motion vector difference then does not need to be transmitted during encoding, which further improves encoding and decoding performance.
- the video decoding system 1 includes a source device 30 and a destination device 40.
- the source device 30 generates encoded video data. Therefore, the source device 30 may be referred to as a video encoding device.
- the destination device 40 may decode the encoded video data generated by the source device 30. Therefore, the destination device 40 may be referred to as a video decoding device.
- Various implementations of the source device 30, the destination device 40, or both may include one or more processors and a memory coupled to the one or more processors.
- the memory may include, but is not limited to, RAM, ROM, EEPROM, flash memory, or any other medium that can be used to store the desired program code in the form of instructions or data structures accessible by a computer, as described herein.
- the source device 30 and the destination device 40 may include various devices, including desktop computers, mobile computing devices, notebook (e.g., laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called "smart" phones, televisions, cameras, display devices, digital media players, video game consoles, on-board computers, or the like.
- the destination device 40 may receive the encoded video data from the source device 30 via the link 50.
- the link 50 may include one or more media or devices capable of moving the encoded video data from the source device 30 to the destination device 40.
- the link 50 may include one or more communication media enabling the source device 30 to directly transmit the encoded video data to the destination device 40 in real time.
- the source device 30 may modulate the encoded video data according to a communication standard, such as a wireless communication protocol, and may transmit the modulated video data to the destination device 40.
- the one or more communication media may include wireless and / or wired communication media, such as a radio frequency (RF) spectrum or one or more physical transmission lines.
- the one or more communication media may form part of a packet-based network, such as a local area network, a wide area network, or a global network (eg, the Internet).
- the one or more communication media may include a router, a switch, a base station, or other devices that facilitate communication from the source device 30 to the destination device 40.
- the encoded data may be output from the output interface 340 to the storage device 60.
- the encoded data can be accessed from the storage device 60 through the input interface 440.
- the storage device 60 may include any of a variety of distributed or locally accessed data storage media, such as a hard drive, Blu-ray disc, DVD, CD-ROM, flash memory, volatile or non-volatile memory, or any other suitable digital storage medium for storing encoded video data.
- the storage device 60 may correspond to a file server or another intermediate storage device that may hold the encoded video produced by the source device 30.
- the destination device 40 may access the stored video data from the storage device 60 via streaming or download.
- the file server may be any type of server capable of storing encoded video data and transmitting the encoded video data to the destination device 40.
- Example file servers include a web server (eg, for a website), an FTP server, a network attached storage (NAS) device, or a local disk drive.
- the destination device 40 can access the encoded video data through any standard data connection, including an Internet connection.
- This may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., DSL, cable modem, etc.), or a combination of both suitable for accessing encoded video data stored on a file server.
- the transmission of the encoded video data from the storage device 60 may be a streaming transmission, a download transmission, or a combination of the two.
- the motion vector prediction technology of the present application can be applied to video coding to support a variety of multimedia applications, such as over-the-air television broadcasting, cable television transmission, satellite television transmission, streaming video transmission (e.g., via the Internet), encoding of video data for storage on data storage media, decoding of video data stored on data storage media, or other applications.
- the video decoding system 1 may be used to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.
- the video decoding system 1 illustrated in FIG. 19 is only an example, and the techniques of the present application can be applied to video coding settings (for example, video encoding or video decoding) that do not necessarily include any data communication between the encoding device and the decoding device.
- in other examples, data is retrieved from local storage, streamed over a network, and so on.
- the video encoding device may encode the data and store the data to a memory, and/or the video decoding device may retrieve the data from the memory and decode the data.
- in many examples, encoding and decoding are performed by devices that do not communicate with each other, but only encode data to memory and/or retrieve data from memory and decode it.
- the source device 30 includes a video source 320, a video encoder 300, and an output interface 340.
- the output interface 340 may include a modulator/demodulator (modem) and/or a transmitter.
- Video source 320 may include a video capture device (e.g., a video camera), a video archive containing previously captured video data, a video feed interface to receive video data from a video content provider, and/or a computer graphics system for generating video data, or a combination of these sources of video data.
- Video encoder 300 may encode video data from video source 320.
- the source device 30 transmits the encoded video data directly to the destination device 40 via the output interface 340.
- the encoded video data may also be stored on the storage device 60 for later access by the destination device 40 for decoding and / or playback.
- the destination device 40 includes an input interface 440, a video decoder 400, and a display device 420.
- the input interface 440 includes a receiver and / or a modem.
- the input interface 440 may receive encoded video data via the link 50 and / or from the storage device 60.
- the display device 420 may be integrated with the destination device 40 or may be external to the destination device 40. Generally, the display device 420 displays decoded video data.
- the display device 420 may include various display devices, such as a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or other types of display devices.
- video encoder 300 and video decoder 400 may each be integrated with an audio encoder and decoder, and may include an appropriate multiplexer-demultiplexer (MUX-DEMUX) unit or other hardware and software to handle encoding of both audio and video in a common data stream or in separate data streams.
- the MUX-DEMUX unit may conform to the ITU H.223 multiplexer protocol, or other protocols such as the User Datagram Protocol (UDP), if applicable.
- Video encoder 300 and video decoder 400 may each be implemented as any of a variety of circuits, such as one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, hardware, or any combination thereof. If the present application is implemented partially in software, the device may store instructions for the software in a suitable non-volatile computer-readable storage medium and may use one or more processors to execute the instructions in hardware, thereby implementing the techniques of the present application. Any of the foregoing (including hardware, software, a combination of hardware and software, etc.) may be considered as one or more processors. Each of video encoder 300 and video decoder 400 may be included in one or more encoders or decoders, any of which may be integrated as part of a combined encoder/decoder (codec) in a corresponding device.
- This application may generally refer to video encoder 300 as “signaling” or “transmitting” certain information to another device, such as video decoder 400.
- the terms “signaling” or “transmitting” may generally refer to the transmission of syntax elements and/or other data used to decode compressed video data. This transmission can occur in real time or near real time. Alternatively, this communication may occur over a period of time, for example when a syntax element is stored in an encoded bitstream on a computer-readable storage medium at the time of encoding; the decoding device may then retrieve the syntax element at any time after it has been stored on this medium.
- the video encoder 300 and the video decoder 400 may operate according to a video compression standard such as High Efficiency Video Coding (HEVC) or an extension thereof, and may conform to the HEVC test model (HM).
- video encoder 300 and video decoder 400 may also operate according to other industry standards, such as the ITU-T H.264, H.265 standards, or extensions of such standards.
- the techniques of this application are not limited to any particular codec standard.
- the video encoder 300 is configured to encode syntax elements related to the image block to be encoded into a digital video output bit stream (referred to as a bit stream or code stream for short).
- here, the syntax elements used for inter prediction of the current image block are referred to as inter prediction data for short.
- the inter prediction data may include a first identifier indicating whether to use the above-mentioned candidate inter prediction mode set for inter prediction of the current image block (in other words, a first identifier indicating whether to perform inter prediction on the current image block by using a new inter prediction mode proposed in this application); or, the inter prediction data may include a first identifier indicating whether to use the candidate inter prediction mode set for the current image block to be encoded, and a second identifier indicating the inter prediction mode of the current image block.
- the video encoder 300 is further configured to: determine or select the inter prediction mode in the candidate inter prediction mode set that is used for inter prediction of the current image block (for example, select, from among multiple new inter prediction modes, the inter prediction mode with a compromise or minimum rate-distortion cost for encoding the current image block); and encode the current image block based on the determined inter prediction mode. The encoding process here may include: based on the determined inter prediction mode, predicting motion information of one or more sub-blocks in the current image block (specifically, motion information of each sub-block or all sub-blocks), and performing inter prediction on the current image block by using the motion information of the one or more sub-blocks in the current image block.
- if the difference (that is, the residual) between the prediction block and the current image block (that is, the original block) is 0, the video encoder 300 only needs to encode the syntax elements related to the image block to be encoded into the bit stream (also known as a code stream); otherwise, in addition to the syntax elements, the corresponding residual also needs to be encoded into the bit stream.
- the video decoder 400 is configured to decode syntax elements related to the image block to be decoded from the bit stream.
- the syntax elements used for inter prediction of the current image block are referred to as inter prediction data for short.
- the inter prediction data includes a first identifier indicating whether to use the candidate inter prediction mode set for inter prediction of the currently decoded image block (that is, a first identifier indicating whether to use a new inter prediction mode proposed in this application for inter prediction).
- when the inter prediction data indicates that the candidate inter prediction mode set (that is, a new inter prediction mode) is used to predict the current image block, the video decoder 400 determines the inter prediction mode in the candidate inter prediction mode set that is used for inter prediction of the current image block, and decodes the current image block based on the determined inter prediction mode. The decoding process here may include: based on the determined inter prediction mode, predicting motion information of one or more sub-blocks in the current image block, and performing inter prediction on the current image block by using the motion information of the one or more sub-blocks in the current image block.
- if the inter prediction data further includes a second identifier indicating which inter prediction mode is used by the current image block, the video decoder 400 is configured to determine that the inter prediction mode indicated by the second identifier is the inter prediction mode used for inter prediction of the current image block; or, if the inter prediction data does not include a second identifier indicating which inter prediction mode is used by the current image block, the video decoder 400 is configured to determine that a first inter prediction mode used for a non-directional motion field is the inter prediction mode used for inter prediction of the current image block.
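The decoder-side mode decision described above can be sketched as a small function. This is an illustration under stated assumptions: the parsed inter prediction data is modelled as a dictionary, the keys `first_identifier` and `second_identifier` are hypothetical names, and the numeric value chosen for the first inter prediction mode for a non-directional motion field is a placeholder, since the text does not give it here.

```python
# Placeholder index for "a first inter prediction mode used for a
# non-directional motion field"; the real value is not given in this passage.
DEFAULT_NON_DIRECTIONAL_MODE = 0


def select_inter_mode(inter_prediction_data):
    """Return the inter prediction mode to use, or None if the new modes are not used."""
    if not inter_prediction_data.get("first_identifier", False):
        return None                                   # candidate mode set not used
    if "second_identifier" in inter_prediction_data:
        return inter_prediction_data["second_identifier"]   # explicitly signalled mode
    return DEFAULT_NON_DIRECTIONAL_MODE               # no second identifier: fall back


mode_signalled = select_inter_mode({"first_identifier": True, "second_identifier": 4})
mode_default = select_inter_mode({"first_identifier": True})
```

Returning `None` when the first identifier is unset leaves the choice of a conventional prediction mode to the rest of the decoder, which this passage does not describe.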
- FIG. 20A is a block diagram of a video encoder 300 according to an example described in the embodiment of the present invention.
- the video encoder 300 is configured to output a video to the post-processing entity 61.
- the post-processing entity 61 represents an example of a video entity that can process the encoded video data from the video encoder 300, such as a media-aware network element (MANE) or a stitching / editing device.
- the post-processing entity 61 may be an instance of a network entity.
- in some cases, the post-processing entity 61 and the video encoder 300 may be parts of separate devices, while in other cases, the functionality described with respect to the post-processing entity 61 may be performed by the same device that includes the video encoder 300.
- the post-processing entity 61 is an example of the storage device 60 of FIG. 19.
- the video encoder 300 may encode a video image block, for example perform inter prediction on the video image block, according to any new inter prediction mode in the candidate inter prediction mode set, which includes modes 0, 1, 2, ..., 10 proposed in the present application.
- the video encoder 300 includes a prediction processing unit 308, a loop filter 306, a decoded image buffer (DPB) 307, a summer 312, a transformer 301, a quantizer 302, and an entropy encoder 303.
- the prediction processing unit 308 includes an inter predictor 310 and an intra predictor 309.
- the video encoder 300 further includes an inverse quantizer 304, an inverse transformer 305, and a summer 311.
- the loop filter 306 is intended to represent one or more loop filters, such as a deblocking filter, an adaptive loop filter (ALF), and a sample adaptive offset (SAO) filter.
- although the loop filter 306 is shown as an in-loop filter in FIG. 20A, in other implementations the loop filter 306 may be implemented as a post-loop filter.
- the video encoder 300 may further include a video data memory and a segmentation unit (not shown in the figure).
- the video data memory may store video data to be encoded by the components of the video encoder 300.
- the video data stored in the video data memory may be obtained from the video source 320.
- the DPB 307 may be a reference image memory that stores reference video data used by the video encoder 300 to encode video data in an intra-frame or inter-frame coding mode.
- the video data memory and the DPB 307 can be formed by any of a variety of memory devices, such as dynamic random access memory (DRAM), including synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices.
- the video data memory and the DPB 307 can be provided by the same memory device or by separate memory devices.
- the video data memory may be on-chip with other components of the video encoder 300, or off-chip relative to those components.
- the video encoder 300 receives video data and stores the video data in a video data memory.
- the segmentation unit divides the video data into several image blocks, and these image blocks can be further divided into smaller blocks, such as image block segmentation based on a quad-tree structure or a binary tree structure. This segmentation may also include segmentation into slices, tiles, or other larger units.
- Video encoder 300 generally illustrates components that encode image blocks within a video slice to be encoded. The slice can be divided into multiple image blocks (and possibly into sets of image blocks called tiles).
- the prediction processing unit 308 may select one of a plurality of possible coding modes for the current image block, such as one of a plurality of intra coding modes or one of a plurality of inter coding modes, where the multiple inter coding modes include, but are not limited to, one or more of the modes 0, 1, 2, 3 ... 10 proposed in the present application.
- the prediction processing unit 308 may provide the resulting intra-coded and inter-coded blocks to the summer 312 to generate a residual block, and to the summer 311 to reconstruct the encoded block for use as a reference image.
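The roles of the two summers can be illustrated with the sketch below: one subtraction forms the residual block, and one addition rebuilds the block from the prediction and the residual. It deliberately skips the transform, quantization, and their inverses, so the reconstruction is exact here; that simplification and the function names are assumptions for illustration.

```python
def residual_block(original, prediction):
    """Sample-wise difference original - prediction (role of the summer that forms the residual)."""
    return [[o - p for o, p in zip(orig_row, pred_row)]
            for orig_row, pred_row in zip(original, prediction)]


def reconstruct_block(prediction, residual):
    """Sample-wise sum prediction + residual (role of the summer that reconstructs the block)."""
    return [[p + r for p, r in zip(pred_row, res_row)]
            for pred_row, res_row in zip(prediction, residual)]


def is_zero_residual(residual):
    """True when every residual sample is 0, i.e. prediction equals the original block."""
    return all(value == 0 for row in residual for value in row)


original = [[10, 12], [8, 9]]
prediction = [[9, 12], [8, 11]]
residual = residual_block(original, prediction)
reconstructed = reconstruct_block(prediction, residual)
```

In a real encoder the residual passes through the transformer and quantizer before entropy coding, so the reconstructed block generally differs slightly from the original.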
- the intra predictor 309 within the prediction processing unit 308 may perform intra predictive encoding of the current image block with respect to one or more neighboring blocks in the same frame or slice as the current block to be encoded to remove spatial redundancy.
- the inter predictor 310 within the prediction processing unit 308 may perform inter predictive coding of the current image block with respect to one or more prediction blocks in the one or more reference images to remove temporal redundancy.
- the inter predictor 310 may be configured to determine an inter prediction mode for encoding a current image block. For example, the inter predictor 310 may use rate-distortion analysis to calculate rate-distortion values for the various inter prediction modes in a set of candidate inter prediction modes, and select from among them the inter prediction mode having the best rate-distortion characteristics. Rate-distortion analysis generally determines the amount of distortion (or error) between the coded block and the original uncoded block that was coded to produce it, as well as the bit rate (that is, the number of bits) used to generate the coded block.
- the inter predictor 310 may determine that the inter prediction mode with the lowest rate-distortion cost for encoding the current image block in the candidate inter prediction mode set is the inter prediction mode used for inter prediction of the current image block.
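Selecting the mode with the lowest rate-distortion cost can be sketched as follows. The cost function J = D + λ·R is the usual Lagrangian formulation and is an assumption here, since the text does not state how distortion and rate are combined; the function name, the dictionary layout, and the sample numbers are hypothetical.

```python
def select_mode_by_rd_cost(candidates, lagrange_multiplier):
    """Return (best_mode, best_cost) over candidates mapping mode -> (distortion, bits)."""
    best_mode = None
    best_cost = None
    for mode, (distortion, bits) in candidates.items():
        cost = distortion + lagrange_multiplier * bits   # assumed cost J = D + lambda * R
        if best_cost is None or cost < best_cost:        # keep the first mode on ties
            best_mode, best_cost = mode, cost
    return best_mode, best_cost


# Illustrative (made-up) measurements for three candidate inter prediction modes.
measurements = {0: (100.0, 10), 1: (80.0, 30), 2: (90.0, 12)}
best_mode, best_cost = select_mode_by_rd_cost(measurements, 1.0)
```

With a larger multiplier the choice shifts toward modes that spend fewer bits, and with a multiplier of 0 it reduces to picking the lowest distortion.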
- the following describes the inter predictive coding process in detail, especially the process of predicting, in the various inter prediction modes for non-directional or directional motion fields in this application, the motion information of one or more sub-blocks (specifically, each sub-block or all sub-blocks) in the current image block.
- the inter predictor 310 is configured to predict motion information (such as a motion vector) of one or more sub-blocks in the current image block based on the determined inter prediction mode, and to use the motion information (such as the motion vector) of the one or more sub-blocks in the current image block to obtain or generate a prediction block of the current image block.
- the inter predictor 310 may locate a prediction block pointed to by the motion vector in one of the reference image lists.
- the inter predictor 310 may also generate syntax elements associated with the image blocks and video slices for use by the video decoder 400 when decoding the image blocks of the video slices.
- The inter predictor 310 uses the motion information of each sub-block to perform a motion compensation process to generate a prediction block of each sub-block, thereby obtaining a prediction block of the current image block. It should be understood that the inter predictor 310 here performs both motion estimation and motion compensation.
- The inter predictor 310 may provide information indicating the selected inter prediction mode of the current image block to the entropy encoder 303, so that the entropy encoder 303 encodes the information indicating the selected inter prediction mode.
- The video encoder 300 may include inter prediction data related to the current image block in the transmitted bit stream. The inter prediction data may include a first identifier block_based_enable_flag, which indicates whether a new inter prediction mode proposed by the present application is used for inter prediction of the current image block; optionally, it may also include a second identifier block_based_index, which indicates which new inter prediction mode is used by the current image block.
- the process of using the motion vectors of multiple reference blocks to predict the motion vectors of the current image block or its sub-blocks in different modes 0, 1, 2 ... 10 will be described in detail below.
- the intra predictor 309 may perform intra prediction on the current image block.
- the intra predictor 309 may determine an intra prediction mode used to encode the current block.
- The intra predictor 309 may use rate-distortion analysis to calculate rate-distortion values for the various intra prediction modes to be tested, and select the intra prediction mode with the best rate-distortion characteristics from among the tested modes.
- The intra predictor 309 may provide information indicating the selected intra prediction mode of the current image block to the entropy encoder 303, so that the entropy encoder 303 encodes the information indicating the selected intra prediction mode.
- the video encoder 300 forms a residual image block by subtracting the prediction block from the current image block to be encoded.
- the summer 312 represents one or more components that perform this subtraction operation.
- the residual video data in the residual block may be included in one or more TUs and applied to the transformer 301.
- the transformer 301 transforms the residual video data into residual transform coefficients using a transform such as a discrete cosine transform (DCT) or a conceptually similar transform.
- the transformer 301 may transform the residual video data from a pixel value domain to a transform domain, such as the frequency domain.
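- As an illustration of moving residual data from the pixel domain to the frequency domain, the following sketch applies an orthonormal floating-point 2-D DCT-II to a small block. It is not the integer transform of any particular codec; the function names and the 4×4 block are assumptions made for the example.

```python
import math


def dct_1d(values):
    """Orthonormal 1-D DCT-II of a list of numbers."""
    n = len(values)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(
            values[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)
        )
        out.append(scale * acc)
    return out


def dct_2d(block):
    """Separable 2-D DCT: transform every row, then every column."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    # cols[k] holds the transformed k-th column; transpose back to row-major.
    return [list(row) for row in zip(*cols)]


# A flat residual block has energy only in the DC coefficient.
flat_block = [[10.0] * 4 for _ in range(4)]
coeffs = dct_2d(flat_block)
```

For a flat block all energy ends up in `coeffs[0][0]`, which is why smooth residuals compress well after the transform.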
- the transformer 301 may send the obtained transform coefficients to a quantizer 302.
- a quantizer 302 quantizes the transform coefficients to further reduce the bit rate.
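- A minimal sketch of quantization and the matching inverse quantization, assuming a uniform scalar quantizer with a single step size `qstep`. The rounding rule and the sample coefficients are assumptions for illustration; real codecs derive the step from a quantization parameter and may use scaling matrices.

```python
def quantize(coeffs, qstep):
    """Uniform scalar quantization with rounding to the nearest level."""
    levels = []
    for row in coeffs:
        out_row = []
        for c in row:
            sign = 1 if c >= 0 else -1
            out_row.append(sign * int(abs(c) / qstep + 0.5))
        levels.append(out_row)
    return levels


def dequantize(levels, qstep):
    """Inverse quantization: scale each level back by the step size."""
    return [[level * qstep for level in row] for row in levels]


# Illustrative transform coefficients.
coeffs = [[103.0, -21.0], [7.0, -3.0]]
levels = quantize(coeffs, qstep=8)
recon = dequantize(levels, qstep=8)
```

The small coefficient -3.0 is quantized to zero, which reduces the bit rate at the cost of a reconstruction error; this is the lossy step of the pipeline.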
- the quantizer 302 may then perform a scan of a matrix containing the quantized transform coefficients.
- Alternatively, the entropy encoder 303 may perform the scan.
- After quantization, the entropy encoder 303 entropy encodes the quantized transform coefficients. For example, the entropy encoder 303 can perform context-adaptive variable-length coding (CAVLC), context-adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy coding method or technique.
- the encoded bitstream may be transmitted to the video decoder 400, or archived for later transmission or retrieved by the video decoder 400.
- The entropy encoder 303 may also perform entropy encoding on syntax elements of the current image block to be encoded.
- An inverse quantizer 304 and an inverse transformer 305 respectively apply inverse quantization and inverse transform to reconstruct the residual block in the pixel domain, for example, for later use as a reference block of a reference image.
- The summer 311 adds the reconstructed residual block to a prediction block generated by the inter predictor 310 or the intra predictor 309 to generate a reconstructed image block.
- the loop filter 306 may be applied to reconstructed image blocks to reduce distortion, such as block artifacts.
- This reconstructed image block is then stored as a reference block in the decoded image buffer 307 and can be used as a reference block by the inter predictor 310 to perform inter prediction on subsequent video frames or blocks in the image.
- The video encoder 300 may directly quantize the residual signal without processing by the transformer 301, and correspondingly without processing by the inverse transformer 305; or, for some image blocks or image frames, the video encoder 300 does not generate residual data, and accordingly no processing by the transformer 301, quantizer 302, inverse quantizer 304, and inverse transformer 305 is needed; or, the video encoder 300 may store the reconstructed image blocks directly as reference blocks without processing by the loop filter 306; or, the quantizer 302 and the inverse quantizer 304 in the video encoder 300 may be merged together.
- FIG. 20B is a block diagram of a video decoder 400 according to an example described in the embodiment of the present invention.
- the video decoder 400 includes an entropy decoder 403, a prediction processing unit 408, an inverse quantizer 404, an inverse transformer 405, a summer 411, a loop filter 406, and a decoded image buffer 407.
- the prediction processing unit 408 may include an inter predictor 410 and an intra predictor 409.
- video decoder 400 may perform a decoding process that is substantially inverse to the encoding process described with respect to video encoder 300 from FIG. 20A.
- video decoder 400 receives an encoded video bitstream from video encoder 300 that represents an image block of an encoded video slice and an associated syntax element.
- the video decoder 400 may receive video data from the network entity 62.
- the video decoder 400 may also store the video data in a video data storage (not shown in the figure).
- the video data memory may store video data, such as an encoded video bitstream, to be decoded by components of the video decoder 400.
- The video data stored in the video data memory can be obtained, for example, from the storage device 60, from a local video source such as a camera, via wired or wireless network communication of video data, or by accessing a physical data storage medium.
- The video data memory may serve as a coded picture buffer (CPB) that stores encoded video data from the encoded video bitstream. Although the video data memory is not shown in FIG. 20B, the video data memory and the DPB 407 may be the same memory or may be separately provided memories. The video data memory and the DPB 407 may be formed by any of a variety of memory devices, for example dynamic random access memory (DRAM), including synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices. In various examples, the video data memory may be integrated on-chip with other components of the video decoder 400, or disposed off-chip relative to those components.
- the network entity 62 may be, for example, a server, a MANE, a video editor / splicer, or other such device for implementing one or more of the techniques described above.
- the network entity 62 may or may not include a video encoder, such as video encoder 300.
- the network entity 62 may implement some of the techniques described in this application.
- In some cases, the network entity 62 and the video decoder 400 may be parts of separate devices, while in other cases, the functionality described with respect to the network entity 62 may be performed by the same device that includes the video decoder 400.
- The network entity 62 may be an example of the storage device 60 of FIG. 19.
- the entropy decoder 403 of the video decoder 400 entropy decodes the bitstream to generate quantized coefficients and some syntax elements.
- the entropy decoder 403 forwards the syntax elements to the prediction processing unit 408.
- Video decoder 400 may receive syntax elements at a video slice level and / or an image block level.
- The syntax elements here may include inter prediction data related to the current image block. The inter prediction data may include a first identifier block_based_enable_flag, which indicates whether the above candidate inter prediction mode set is used for inter prediction of the current image block (in other words, whether the current image block is inter predicted using a new inter prediction mode proposed in this application); optionally, it may also include a second identifier block_based_index, which indicates which new inter prediction mode is used by the current image block.
- The intra predictor 409 of the prediction processing unit 408 may generate prediction blocks for image blocks of the current video slice based on the signaled intra prediction mode and data of previously decoded blocks from the current frame or image.
- The inter predictor 410 of the prediction processing unit 408 may determine, based on the syntax elements received from the entropy decoder 403, the inter prediction mode used to decode the current image block of the current video slice, and decode the current image block (for example, perform inter prediction) based on the determined inter prediction mode.
- The inter predictor 410 may determine whether a new inter prediction mode is used to predict the current image block of the current video slice. If the syntax elements indicate that a new inter prediction mode is used, the inter predictor 410 predicts the motion information of the current image block, or of a sub-block of the current image block, based on the new inter prediction mode (for example, a new inter prediction mode specified by a syntax element, or a default new inter prediction mode), and then uses the predicted motion information in a motion compensation process to obtain or generate a prediction block of the current image block or of the sub-block.
- the motion information here may include reference image information and motion vectors, where the reference image information may include but is not limited to unidirectional / bidirectional prediction information, a reference image list number, and a reference image index corresponding to the reference image list.
- a prediction block may be generated from one of reference pictures within one of the reference picture lists.
- the video decoder 400 may construct a reference picture list, that is, a list 0 and a list 1, based on the reference pictures stored in the DPB 407.
- the reference frame index of the current image may be included in one or more of the reference frame list 0 and list 1.
- The video encoder 300 may signal a specific syntax element indicating whether a new inter prediction mode is used to decode a specific block, or may signal specific syntax elements indicating both whether a new inter prediction mode is used and which new inter prediction mode is used to decode a specific block.
- the inter predictor 410 here performs a motion compensation process. In the following, the inter prediction process for using the motion information of the reference block to predict the motion information of the current image block or a sub-block of the current image block under various new inter prediction modes will be explained in detail.
- the inverse quantizer 404 inverse quantizes, that is, dequantizes, the quantized transform coefficients provided in the bit stream and decoded by the entropy decoder 403.
- the inverse quantization process may include using a quantization parameter calculated by the video encoder 300 for each image block in the video slice to determine the degree of quantization that should be applied and similarly to determine the degree of inverse quantization that should be applied.
- the inverse transformer 405 applies an inverse transform to transform coefficients, such as an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process in order to generate a residual block in the pixel domain.
- The video decoder 400 sums the residual block from the inverse transformer 405 with the corresponding prediction block generated by the inter predictor 410 to obtain the reconstructed block, that is, the decoded image block.
- the summer 411 represents a component that performs this summing operation.
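- The summing operation can be sketched as follows. Clipping the result to the valid sample range is an assumption added here (the text above only describes the addition), and the 8-bit depth and sample values are illustrative.

```python
def reconstruct_block(pred, resid, bit_depth=8):
    """Add residual to prediction sample by sample, then clip to the sample range.

    The clipping step is an assumption; the description only states the sum.
    """
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(pred_row, resid_row)]
        for pred_row, resid_row in zip(pred, resid)
    ]


# Illustrative 2x2 prediction block and decoded residual.
pred = [[250, 10], [128, 0]]
resid = [[10, -20], [5, 3]]
recon = reconstruct_block(pred, resid)
```

The encoder performs the same addition in its reconstruction loop, so that encoder and decoder hold identical reference blocks.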
- A loop filter (either in the decoding loop or after the decoding loop) may also be applied.
- The loop filter 406 may represent one or more loop filters, such as a deblocking filter, an adaptive loop filter (ALF), and a sample adaptive offset (SAO) filter.
- Although the loop filter 406 is shown as an in-loop filter in FIG. 20B, in other implementations the loop filter 406 may be implemented as a post-loop filter.
- The loop filter 406 is applied to the reconstructed block to reduce blocking distortion, and the result is output as a decoded video stream.
- a decoded image block in a given frame or image may also be stored in a decoded image buffer 407, and the decoded image buffer 407 stores a reference image for subsequent motion compensation.
- the decoded image buffer 407 may be part of a memory, which may also store the decoded video for later presentation on a display device, such as the display device 420 of FIG. 19, or may be separate from such memory.
- The video decoder 400 may generate an output video stream without processing by the loop filter 406; or, for some image blocks or image frames, the entropy decoder 403 of the video decoder 400 does not decode quantized coefficients, and correspondingly no processing by the inverse quantizer 404 and the inverse transformer 405 is needed.
- Inter prediction finds, in a reconstructed image, a matching reference block for the current coding block of the current image, and uses the pixel values of the pixels in the reference block as the prediction information (or prediction values; the two terms are not distinguished below) for the pixel values of the pixels in the current coding block. This process is called motion estimation (ME) (as shown in Figure 21), and the motion information of the current coding block is transmitted.
- The motion information of the current coding block includes indication information of the prediction direction (usually forward prediction, backward prediction, or bidirectional prediction), one or two motion vectors (MV) pointing to the reference block, and indication information of the picture in which the reference block is located (usually referred to as the reference frame index).
- Forward prediction means that the current coding block selects a reference image from the forward reference image set to obtain a reference block.
- Backward prediction means that the current coding block selects a reference image from the backward reference image set to obtain a reference block.
- Bidirectional prediction refers to selecting a reference image from the forward and backward reference image sets to obtain a reference block.
- In bidirectional prediction, the current coding block has two reference blocks, each of which is indicated by a motion vector and a reference frame index; the predicted pixel values of the current block are then determined from the pixel values of the pixels in the two reference blocks.
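- One simple way to combine two reference blocks is a rounded average, sketched below. The text does not specify the combination rule, so equal weighting with rounding is an assumption; weighted prediction would use different weights.

```python
def bi_predict(ref_block0, ref_block1):
    """Rounded average of two reference blocks (assumed equal weights)."""
    return [
        [(a + b + 1) >> 1 for a, b in zip(row0, row1)]
        for row0, row1 in zip(ref_block0, ref_block1)
    ]


# Illustrative forward and backward reference samples.
forward = [[100, 50]]
backward = [[101, 53]]
pred = bi_predict(forward, backward)
```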
- the motion estimation process needs to try multiple reference blocks in the reference image for the current coding block. Which one or several reference blocks are ultimately used for prediction is determined using rate-distortion optimization (RDO) or other methods.
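- The search over candidate reference blocks can be sketched as an exhaustive block-matching search. This example assumes SAD as the matching cost and an integer-pixel full search; the function names, frame contents, and search range are illustrative, and a real encoder would add the rate term and sub-pixel refinement.

```python
def sad(cur, ref, x, y, mvx, mvy, size):
    """Sum of absolute differences between the current block and a displaced reference block."""
    total = 0
    for j in range(size):
        for i in range(size):
            total += abs(cur[y + j][x + i] - ref[y + mvy + j][x + mvx + i])
    return total


def full_search(cur, ref, x, y, size, search_range):
    """Try every integer displacement in the window; return (best_sad, (mvx, mvy))."""
    height, width = len(ref), len(ref[0])
    best = None
    for mvy in range(-search_range, search_range + 1):
        for mvx in range(-search_range, search_range + 1):
            inside = (0 <= x + mvx and x + mvx + size <= width
                      and 0 <= y + mvy and y + mvy + size <= height)
            if not inside:
                continue  # skip candidates that fall outside the reference frame
            cost = sad(cur, ref, x, y, mvx, mvy, size)
            if best is None or cost < best[0]:
                best = (cost, (mvx, mvy))
    return best


# Reference frame with unique sample values; the current frame is the reference
# shifted by 2 samples horizontally and 1 sample vertically.
ref = [[r * 8 + c for c in range(8)] for r in range(8)]
cur = [[ref[min(r + 1, 7)][min(c + 2, 7)] for c in range(8)] for r in range(8)]
result = full_search(cur, ref, x=2, y=2, size=2, search_range=2)
```

Here the search recovers the displacement (2, 1) with zero SAD, because the current block is an exact copy of the displaced reference block.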
- The corresponding prediction information is subtracted from the pixel information of the pixels in the current coding block to obtain the residual information. The residual information is then transformed using a discrete cosine transform (DCT) or a similar method, and the code stream is obtained through quantization and entropy coding. After the prediction signal is added to the reconstructed residual signal, a further filtering operation is required to obtain the reconstructed signal, which is used as a reference signal for subsequent encoding.
- Decoding is equivalent to the reverse process of encoding.
- During decoding, entropy decoding, inverse quantization, and inverse transformation are first used to obtain the residual information, and the code stream is decoded to determine whether the current coding block uses intra prediction or inter prediction. If intra prediction is used, the pixel values of pixels in the surrounding reconstructed area are used to construct the prediction information according to the intra prediction method used. If inter prediction is used, the motion information is parsed, the parsed motion information is used to determine the reference block in the reconstructed image, and the pixel values of the pixels in that block are used as the prediction information. This process is called motion compensation (MC). The reconstruction information is obtained by adding the prediction information to the residual information and then applying a filtering operation.
- Advanced Motion Vector Prediction (AMVP) mode: a candidate motion vector list is constructed from the motion information of coded blocks that are spatially or temporally adjacent to the current coding block, and the optimal motion vector is then determined from the candidate motion vector list as the motion vector predictor (MVP) of the current coding block.
- The rate-distortion cost is calculated by formula (11), J = SAD + λR, where J is the rate-distortion cost (RD cost), SAD is the sum of absolute differences between the original pixel values and the predicted pixel values obtained by motion estimation using the candidate motion vector predictor, R is the code rate, and λ is the Lagrange multiplier.
- the encoder passes the index value of the selected motion vector prediction value in the candidate motion vector list and the reference frame index value to the decoder. Further, the motion search is performed in the neighborhood centered on the MVP to obtain the actual motion vector of the current coding block, and the encoder passes the difference between the MVP and the actual motion vector (Motion vector difference) to the decoder.
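- The signalling round trip of AMVP can be sketched as below. For brevity the encoder here picks the MVP that minimizes the magnitude of the motion vector difference, used as a crude stand-in for the rate-distortion cost described above; that shortcut, the function names, and the vectors are all assumptions for illustration.

```python
def encode_amvp(mv, candidates):
    """Choose an MVP index and compute the motion vector difference (MVD).

    Selection by smallest |MVD| is a simplifying assumption, not the
    rate-distortion selection described in the text.
    """
    def mvd_size(i):
        return abs(mv[0] - candidates[i][0]) + abs(mv[1] - candidates[i][1])

    idx = min(range(len(candidates)), key=mvd_size)
    mvp = candidates[idx]
    return idx, (mv[0] - mvp[0], mv[1] - mvp[1])


def decode_amvp(idx, mvd, candidates):
    """Decoder side: rebuild the same candidate list, then add the MVD to the MVP."""
    mvp = candidates[idx]
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])


# Illustrative candidate list and actual motion vector found by motion search.
candidate_list = [(4, -2), (10, 3)]
actual_mv = (9, 4)
mvp_index, mvd = encode_amvp(actual_mv, candidate_list)
decoded_mv = decode_amvp(mvp_index, mvd, candidate_list)
```

Only the index and the small difference are transmitted; the decoder recovers the full motion vector because it constructs an identical candidate list.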
- Merge mode: a candidate motion information list is constructed from the motion information of coded blocks that are spatially or temporally adjacent to the current coding block, the optimal motion information is then determined from the candidate motion information list by rate-distortion cost and used as the motion information of the current coding block, and the index value of the position of the optimal motion information in the candidate motion information list (referred to as the merge index, the same below) is passed to the decoder.
- The spatial and temporal candidate motion information of the current coding block is shown in Figure 22. The spatial candidate motion information comes from the 5 spatially adjacent blocks (A0, A1, B0, B1, and B2). If an adjacent block is not available or is intra coded, it is not added to the candidate motion information list.
- the time-domain candidate motion information of the current coding block is obtained by scaling the MV of the corresponding position block in the reference frame according to the reference frame and the picture order count (POC) of the current frame. First, it is judged whether the block with the position T in the reference frame is available. If it is not available, the block with the position C is selected.
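- The candidate list construction can be sketched as follows. The traversal order of the spatial neighbours, the dictionary layout, and the motion tuples are assumptions for illustration; pruning of duplicates and the POC-based scaling of the temporal candidate are omitted.

```python
# Assumed traversal order; the text only lists the five neighbouring positions.
SPATIAL_ORDER = ["A0", "A1", "B0", "B1", "B2"]


def build_merge_list(spatial, temporal):
    """Collect motion information from available, inter-coded neighbours.

    spatial:  position -> None (unavailable) or {"intra": bool, "motion": ...}
    temporal: position "T"/"C" -> already-scaled motion info, or None
    """
    candidates = []
    for name in SPATIAL_ORDER:
        block = spatial.get(name)
        if block is None or block["intra"]:
            continue  # unavailable or intra-coded blocks are not added
        candidates.append(block["motion"])
    for position in ("T", "C"):  # use T if available, otherwise fall back to C
        motion = temporal.get(position)
        if motion is not None:
            candidates.append(motion)
            break
    return candidates


# Illustrative neighbourhood: motion is ((mvx, mvy), reference_index).
spatial = {
    "A0": None,
    "A1": {"intra": False, "motion": ((1, 0), 0)},
    "B0": {"intra": True, "motion": None},
    "B1": {"intra": False, "motion": ((2, -1), 1)},
}
temporal = {"T": None, "C": ((0, 3), 0)}
merge_list = build_merge_list(spatial, temporal)
```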
- Non-translational motion model prediction means that the encoder and decoder use the same motion model to derive the motion information of each sub motion compensation unit in the current coding block, and perform motion compensation based on the motion information of the sub motion compensation units to obtain the prediction block, thereby improving prediction efficiency.
- Commonly used motion models are 6-parameter affine models or 4-parameter affine transformation models.
- the 4-parameter affine transformation model can be represented by the motion vector of two pixels and their coordinates relative to the top left vertex pixel of the current coding block, and the pixels used to represent the parameters of the motion model are recorded as control points. If the upper left vertex (0,0) and upper right vertex (W, 0) pixels are used as control points, then the motion vectors (vx0, vy0) and (vx1, vy1) of the upper left vertex and the upper right vertex control point of the current coding block are determined first.
- the 6-parameter affine transformation model can be represented by the motion vector of three pixels and its coordinates with respect to the top left vertex pixel of the current coding block. If the upper-left vertex (0,0), upper-right vertex (W, 0), and lower-left vertex (0, H) pixels are used as control points, the motion vectors of the upper-left vertex, upper-right vertex, and lower-left vertex control point of the current coding block are determined first.
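- The two models can be sketched as functions that map a pixel position (x, y), measured from the top-left vertex of the block, to a motion vector. The formulas used here are the commonly used control-point forms of the 4-parameter and 6-parameter affine models and are assumed to correspond to the formulas referenced in this description; the function names and sample values are illustrative.

```python
def affine_4param_mv(x, y, v0, v1, width):
    """4-parameter model from the top-left (v0) and top-right (v1) control points."""
    a = (v1[0] - v0[0]) / width
    b = (v1[1] - v0[1]) / width
    return (a * x - b * y + v0[0], b * x + a * y + v0[1])


def affine_6param_mv(x, y, v0, v1, v2, width, height):
    """6-parameter model from the top-left, top-right, and bottom-left control points."""
    vx = (v1[0] - v0[0]) / width * x + (v2[0] - v0[0]) / height * y + v0[0]
    vy = (v1[1] - v0[1]) / width * x + (v2[1] - v0[1]) / height * y + v0[1]
    return (vx, vy)


# Illustrative control point motion vectors for a 16x8 block.
v0, v1, v2 = (1.0, 2.0), (3.0, 4.0), (5.0, 0.0)
mv_top_right = affine_4param_mv(16, 0, v0, v1, width=16)
mv_bottom_left = affine_6param_mv(0, 8, v0, v1, v2, width=16, height=8)
```

Evaluating either model at a control point position returns that control point's own motion vector, which is a quick sanity check on the formulas.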
- the coding block predicted by the affine transformation motion model is called an affine coding block.
- an affine transformation Advanced Motion Vector Prediction (AMVP) mode or an affine transformation fusion (Merge) mode can be used to obtain the motion information of the control points of the affine coding block.
- the motion information of the control points can be obtained by the method based on the motion model or based on the combination of control points:
- The motion-model-based method derives the motion vectors of the control points of the current block (for merge mode) or the predicted motion vectors of the control points (for AMVP mode) from the motion model of a neighboring affine coding block.
- The motion vector (vx4, vy4) of the upper-left vertex (x4, y4) and the motion vector (vx5, vy5) of the upper-right vertex (x5, y5) of the affine coding block are obtained.
- Use formula (18) to obtain the motion vector (vx0, vy0) of the upper left vertex (x0, y0) of the current encoding block and use formula (19) to obtain the motion vector (vx1, vy1) of the upper right vertex (x1, y1) of the current encoding block.
- the motion information of the upper left vertex and the upper right vertex of the current encoding block is determined by using the motion information of the encoded blocks around the current encoding block.
- The motion vectors of the encoded blocks A, B, and C adjacent to the upper-left vertex are used as candidate motion vectors for the upper-left vertex of the current coding block; the motion vectors of the encoded blocks D and E adjacent to the upper-right vertex are used as candidate motion vectors for the upper-right vertex of the current coding block.
- The candidate motion vectors of the upper-left vertex and the upper-right vertex are combined to form a queue of candidate motion vector 2-tuples for the two control points: {(v0A, v1D), (v0A, v1E), (v0B, v1D), (v0B, v1E), (v0C, v1D), (v0C, v1E)}, where v0 represents the upper-left vertex candidate motion vector and v1 represents the upper-right vertex candidate motion vector. Each 2-tuple is indexed by its position in the queue, and the index values are 0, 1, 2, 3, 4, 5 in this order.
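- Forming the queue amounts to taking every pairing of an upper-left candidate with an upper-right candidate. The sketch below assumes the pairing order (A, D), (A, E), (B, D), and so on; the motion vector values are placeholders.

```python
from itertools import product

# Placeholder motion vectors of the neighbouring encoded blocks.
upper_left_candidates = {"A": (1, 0), "B": (2, 1), "C": (0, -1)}
upper_right_candidates = {"D": (3, 0), "E": (2, 2)}

# Every (v0, v1) pairing, in an assumed order: A-D, A-E, B-D, B-E, C-D, C-E.
queue = [
    (upper_left_candidates[a], upper_right_candidates[d])
    for a, d in product("ABC", "DE")
]

# The index of a 2-tuple is simply its position in the queue.
indexed_queue = dict(enumerate(queue))
```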
- The coordinates of CP1, CP2, CP3, and CP4 are (0, 0), (W, 0), (0, H), and (W, H), respectively, where W and H are the width and height of the current block.
- For CP1, the check order is B2 -> A2 -> B3. If B2 is available, the motion information of B2 is used; otherwise, A2 and then B3 are checked. If motion information is not available at any of the three positions, the motion information of CP1 cannot be obtained.
- For CP2, the check order is B0 -> B1;
- For CP3, the check order is A0 -> A1;
- Available here means that the block including the X position has been encoded and is in inter-coding mode; otherwise, the X position is not available.
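- The check orders above can be sketched as a first-available lookup. The dictionary layout (`coded`, `inter`, `motion`) is a hypothetical representation of a neighbouring block, and only the three control points whose check orders are given above are covered.

```python
# Check orders for the control points, as listed above.
CHECK_ORDER = {
    "CP1": ["B2", "A2", "B3"],
    "CP2": ["B0", "B1"],
    "CP3": ["A0", "A1"],
}


def is_available(block):
    """A position is available if its block has been encoded and is inter coded."""
    return block is not None and block["coded"] and block["inter"]


def control_point_motion(cp, neighbours):
    """Return the motion info of the first available position, or None if there is none."""
    for position in CHECK_ORDER[cp]:
        block = neighbours.get(position)
        if is_available(block):
            return block["motion"]
    return None


# Illustrative neighbourhood: B2 is intra coded, A2 is usable, B0/B1 are missing.
neighbours = {
    "B2": {"coded": True, "inter": False, "motion": None},
    "A2": {"coded": True, "inter": True, "motion": (3, -1)},
    "A0": {"coded": False, "inter": False, "motion": None},
    "A1": {"coded": True, "inter": True, "motion": (0, 2)},
}
cp1_motion = control_point_motion("CP1", neighbours)
cp2_motion = control_point_motion("CP2", neighbours)
cp3_motion = control_point_motion("CP3", neighbours)
```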
- Other methods of obtaining control point motion information can also be applied to the present invention, and details are not described here.
- the motion information of two control points is combined to construct a 4-parameter affine transformation model.
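- The combinations of two control points are {CP1, CP4}, {CP2, CP3}, {CP1, CP2}, {CP2, CP4}, {CP1, CP3}, and {CP3, CP4}.
- For example, a 4-parameter affine transformation model constructed from the control points CP1 and CP2 is denoted as Affine (CP1, CP2).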
- the motion information of the three control points is combined to construct a 6-parameter affine transformation model.
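- The combinations of three control points are {CP1, CP2, CP4}, {CP1, CP2, CP3}, {CP2, CP3, CP4}, and {CP1, CP3, CP4}.
- For example, a 6-parameter affine transformation model constructed from the control points CP1, CP2, and CP3 is denoted as Affine (CP1, CP2, CP3).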
- the motion information of the four control points is combined to construct an 8-parameter bilinear model.
- An 8-parameter bilinear model constructed using CP1, CP2, CP3, and CP4 control points is denoted as Bilinear (CP1, CP2, CP3, CP4).
- If the motion information of a control point corresponding to a combined model is not available, the model is considered unavailable; otherwise, the reference frame index of the model is determined and the motion vectors of the control points are scaled. If the motion information of all control points is identical after scaling, the model is invalid; otherwise, it is added to the candidate motion information list.
- CurPoc represents the POC number of the current frame
- DesPoc represents the POC number of the reference frame of the current block
- SrcPoc represents the POC number of the reference frame of the control point
- MVs represents the MV obtained by scaling.
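- With these symbols, scaling by POC distance is commonly written as MVs = (CurPoc - DesPoc) / (CurPoc - SrcPoc) × MV. The scaling formula itself is not reproduced in the text above, so this form is an assumption, as are the function name and the sample POC values.

```python
def scale_mv(mv, cur_poc, des_poc, src_poc):
    """Scale a control point MV by the ratio of POC distances (assumed formula).

    cur_poc: POC of the current frame
    des_poc: POC of the reference frame of the current block
    src_poc: POC of the reference frame of the control point
    """
    if cur_poc == src_poc:
        raise ValueError("control point reference frame must differ from the current frame")
    factor = (cur_poc - des_poc) / (cur_poc - src_poc)
    return (mv[0] * factor, mv[1] * factor)


# The control point MV spans 2 pictures; the target reference is 4 pictures away.
scaled = scale_mv((3, -5), cur_poc=8, des_poc=4, src_poc=6)
```

When the control point already refers to the target reference frame, the factor is 1 and the motion vector is unchanged.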
- A combination of different control points can also be converted into control points at the same positions.
- For example, a 4-parameter affine transformation model obtained from the combination {CP1, CP4}, {CP2, CP3}, {CP2, CP4}, {CP1, CP3}, or {CP3, CP4} is converted so that it is represented by the control points {CP1, CP2} or {CP1, CP2, CP3}.
- The conversion method is to substitute the motion vectors of the control points and their coordinate information into formula (13) to obtain the model parameters, and then substitute the coordinate information of {CP1, CP2} to obtain their motion vectors.
- For example, a 6-parameter affine transformation model obtained from the combination {CP1, CP2, CP4}, {CP2, CP3, CP4}, or {CP1, CP3, CP4} is converted so that it is represented by the control points {CP1, CP2, CP3}.
- The conversion method is to substitute the motion vectors of the control points and their coordinate information into formula (15) to obtain the model parameters, and then substitute the coordinate information of {CP1, CP2, CP3} to obtain their motion vectors.
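- The conversion for the 4-parameter case can be sketched as follows, assuming the model has the form vx = a·x - b·y + c, vy = b·x + a·y + d. This parameterization stands in for formula (13), which is not reproduced here, and the function names and sample vectors are illustrative.

```python
def solve_4param(p1, mv1, p2, mv2):
    """Solve (a, b, c, d) of vx = a*x - b*y + c, vy = b*x + a*y + d from two control points."""
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    dvx, dvy = mv2[0] - mv1[0], mv2[1] - mv1[1]
    norm = dx * dx + dy * dy
    if norm == 0:
        raise ValueError("control points must be at different positions")
    a = (dvx * dx + dvy * dy) / norm
    b = (dvy * dx - dvx * dy) / norm
    c = mv1[0] - a * p1[0] + b * p1[1]
    d = mv1[1] - b * p1[0] - a * p1[1]
    return a, b, c, d


def evaluate_4param(params, point):
    """Evaluate the model at a pixel position."""
    a, b, c, d = params
    x, y = point
    return (a * x - b * y + c, b * x + a * y + d)


# Convert a model given by {CP1, CP4} into the {CP1, CP2} representation.
W, H = 16, 8
params = solve_4param((0, 0), (1.0, 2.0), (W, H), (2.0, 5.0))
cp1_mv = evaluate_4param(params, (0, 0))
cp2_mv = evaluate_4param(params, (W, 0))
```

The 6-parameter conversion follows the same pattern with three control points and six unknowns.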
- Affine transform advanced motion vector prediction (AMVP) mode:
- The candidate motion vector 2-tuple/3-tuple list is pruned and sorted according to a specific rule, and may be truncated or padded to a specific number of entries.
- Using each candidate motion vector 2-tuple/3-tuple, the motion vector of each sub motion compensation unit in the current coding block (a pixel, or an N1 × N2 pixel block partitioned by a specific method) is obtained by formula (14)/(16); the pixel values at the positions in the reference frame pointed to by the motion vector of each sub motion compensation unit are then obtained and used as prediction values to perform affine transformation motion compensation.
- The average difference between the original value and the predicted value of each pixel in the current coding block is calculated, and the motion vectors in the candidate motion vector 2-tuple/3-tuple with the smallest average difference are selected as the motion vector predictors of the two/three control points of the current coding block.
- An index number indicating the position of the selected 2-tuple/3-tuple in the candidate motion vector 2-tuple/3-tuple list is encoded into the code stream and sent to the decoder.
- At the decoder, the index number is parsed, and the control point motion vector predictors (CPMVP) of the two/three control points are determined from the candidate motion vector 2-tuple/3-tuple list according to the index number.
- the motion vector of the two / three control points is obtained by performing a motion search within a certain search range using the motion vector prediction value of the two / three control points as a search starting point.
- the difference between the motion vector of the two / three control points and the motion vector prediction value is passed to the decoder.
- the motion vector difference between two / three control points is parsed and added to the motion vector prediction value to obtain the motion vector of the control points.
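- The decoder-side reconstruction of the control point motion vectors can be sketched as an element-wise sum of predictors and parsed differences. The function name and the sample vectors are illustrative.

```python
def reconstruct_cpmvs(cpmvps, mvds):
    """Add each parsed motion vector difference to its control point predictor."""
    if len(cpmvps) != len(mvds):
        raise ValueError("one motion vector difference is needed per control point")
    return [(p[0] + d[0], p[1] + d[1]) for p, d in zip(cpmvps, mvds)]


# Two control points (4-parameter model); three would be used for 6 parameters.
predictors = [(4, -2), (6, 0)]
differences = [(1, 1), (-2, 3)]
cpmvs = reconstruct_cpmvs(predictors, differences)
```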
- A motion vector prediction method based on a motion model and/or a motion vector prediction method based on a combination of control points is used to construct a candidate motion vector 2-tuple/3-tuple list.
- Using each candidate motion vector 2-tuple/3-tuple, the motion vector of each sub motion compensation unit in the current coding block (a pixel, or an N1 × N2 pixel block partitioned by a specific method) is obtained by formula (14)/(16); the pixel values at the positions in the reference frame pointed to by the motion vector of each sub motion compensation unit are then obtained and used as prediction values to perform affine transformation motion compensation.
- The average difference between the original value and the predicted value of each pixel in the current coding block is calculated, and the motion vectors in the candidate motion vector 2-tuple/3-tuple with the smallest average difference are selected as the motion vectors of the two/three control points of the current coding block.
- An index number indicating the position of the selected 2-tuple/3-tuple in the candidate motion vector 2-tuple/3-tuple list is encoded into the code stream and sent to the decoder.
- At the decoder, the index number is parsed, and the control point motion vectors (CPMV) of the two/three control points are determined from the candidate motion vector 2-tuple/3-tuple list according to the index number.
- the acquisition module contains a camera or camera group and pre-processing to convert the light signal into a digitized video sequence.
- the video sequence is then encoded by an encoder and converted into a code stream.
- the code stream is then sent from the sending module to the receiving module via the network. After being converted into a code stream by the receiving module, it is decoded and reconstructed by the decoder into a video sequence. Finally, the reconstructed video sequence is sent to a display device for display after post-processing such as rendering.
- the method of the present invention is mainly applied in encoding and / or decoding.
- Step 1 Obtain the motion information of the control points of the current affine coding block
- Step 2 Determine the size of the motion compensation unit
- The size MxN of the motion compensation unit of the current affine coding block is determined by the encoder and the decoder using the same rule. It may be fixed, for example to 4x4 or 8x8, or it may be determined from the control point motion vector difference, the motion vector accuracy, and the distance between the control points.
- the size of the motion compensation unit of the affine coding block can be determined by other methods, which will not be described in detail in the present invention.
- Step 3 Use the affine transformation model to determine the motion information of each motion compensation unit in the current affine coding block according to the motion information of the control points.
- the motion information of pixels in a preset position in the motion compensation unit may be used to represent the motion information of all pixels in the motion compensation unit.
- The preset position pixel can be the center point of the motion compensation unit (M / 2, N / 2), the upper left vertex (0,0), the upper right vertex (M-1,0), or another pixel.
- the following uses the center point of the motion compensation unit as an example.
- The coordinates of the center point of each motion compensation unit relative to the top-left vertex pixel of the current affine coding block are calculated using formula (22), where i is the index of the motion compensation unit in the horizontal direction (from left to right), j is its index in the vertical direction (from top to bottom), and (x_(i,j), y_(i,j)) denotes the coordinates of the center point of the (i, j)-th motion compensation unit relative to the top-left vertex pixel of the current affine coding block.
- The accuracy of the motion vector calculated directly by formula (23) is higher than the accuracy of the control point motion vector, so it can be further quantized to the same accuracy as the control point motion vector.
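Step 3 can be illustrated with a short sketch. Formulas (22) and (23) are not reproduced in this section, so the code assumes their commonly used forms: the center of the (i, j)-th MxN unit is (M·i + M/2, N·j + N/2), and the motion vector comes from a 6-parameter affine model driven by the upper left, upper right, and lower left control points. The function names are hypothetical.

```python
def unit_center(i, j, m, n):
    """Center of the (i, j)-th MxN unit relative to the block's top-left pixel
    (assumed form of formula (22))."""
    return m * i + m / 2, n * j + n / 2


def affine_mv(x, y, cpmv, width, height):
    """Motion vector at (x, y) from a 6-parameter affine model
    (assumed form of formula (23)).

    cpmv holds the motion vectors of the upper left, upper right and
    lower left control points of a width x height block.
    """
    (vx0, vy0), (vx1, vy1), (vx2, vy2) = cpmv
    vx = (vx1 - vx0) / width * x + (vx2 - vx0) / height * y + vx0
    vy = (vy1 - vy0) / width * x + (vy2 - vy0) / height * y + vy0
    return vx, vy


def unit_motion_vectors(cpmv, width, height, m=4, n=4):
    """Motion vector of every MxN motion compensation unit, keyed by (i, j)."""
    return {
        (i, j): affine_mv(*unit_center(i, j, m, n), cpmv, width, height)
        for j in range(height // n)
        for i in range(width // m)
    }
```

When all three control points carry the same motion vector, every unit receives that vector, which is the purely translational case.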
- Step 4 Perform motion compensation prediction according to the motion information of the motion compensation unit to obtain a prediction block of the affine coding block
- the motion information obtained in step 3 is used to perform motion compensation prediction to obtain the prediction value of each motion compensation unit.
- Step 5 Determine the size of the motion information storage unit, and use the affine transformation model to determine the motion information of each motion information storage unit in the current affine coding block and store it according to the motion information of the control points.
- the size of the motion information storage unit is usually 4x4.
- the calculation method of the motion information of the storage unit is similar to that of the motion compensation unit. In particular, it is necessary to store the motion information of the control point to its corresponding storage unit. For the storage unit where the control point pixel is located, the motion information of the storage unit is directly set as the motion information of the control point. If the control point is not located in the current affine coding block, the motion information of the storage unit closest to the control point is set as the motion information of the control point.
- With this method, a storage unit may store either the motion information of a control point or the motion information of the storage unit itself (for example, that of the center point of the storage unit or of the motion compensation unit in which the storage unit is located). The stored motion information is therefore not uniform, which leads to inconsistent motion information in subsequent motion compensation.
- an embodiment of the present invention provides another encoding and decoding method based on affine transformation.
- Step 1 Obtain the motion information of the control points of the current affine coding block (Control point motion vector, CPMV)
- Step 2 Determine the size of the motion compensation unit (can be an optional step)
- The size MxN of the motion compensation unit of the current affine coding block is determined by the encoder and the decoder using the same rule. It may be fixed, for example to 4x4 or 8x8, or it may be determined from the control point motion vector difference, the motion vector accuracy, and the distance between the control points.
- the size of the motion compensation unit of the affine coding block can be determined by other methods, which will not be described in detail in the present invention.
- Step 3 Use the affine transformation model to determine the motion information of each motion compensation unit in the current affine coding block according to the motion information of the control points.
- the motion information of pixels in a preset position in the motion compensation unit may be used to represent the motion information of all pixels in the motion compensation unit.
- The preset position pixel can be the center point of the motion compensation unit (M / 2, N / 2), the upper left vertex (0,0), the upper right vertex (M-1,0), or another pixel.
- the following uses the center point of the motion compensation unit as an example.
- The coordinates of the center point of each motion compensation unit relative to the top-left vertex pixel of the current affine coding block are calculated using formula (24), where i is the index of the motion compensation unit in the horizontal direction (from left to right), j is its index in the vertical direction (from top to bottom), and (x_(i,j), y_(i,j)) denotes the coordinates of the center point of the (i, j)-th motion compensation unit relative to the top-left vertex pixel of the current affine coding block.
- The accuracy of the motion vector calculated directly by formula (14) is higher than the accuracy of the control point motion vector, so it can be further quantized to the same accuracy as the control point motion vector.
- Step 4 Perform motion compensation prediction according to the motion information of the motion compensation unit to obtain a prediction block of the affine coding block
- the motion information obtained in step 3 is used to perform motion compensation prediction to obtain the prediction value of each motion compensation unit.
- Step 5 Store the motion information of the motion compensation unit and the control point.
- the motion information of the motion compensation unit obtained in step 3 is stored, and the motion information of the control point obtained in step 1 is also stored.
- The motion information of the motion compensation unit is usually the motion information of the center point of the motion compensation unit.
- The motion information of the control point is usually the motion information of a vertex of the affine coding block.
- The vertex of the affine coding block may be the upper left vertex, upper right vertex, lower left vertex, or lower right vertex.
- The size of the motion information storage unit is usually 4x4. For each 4x4 storage unit inside a motion compensation unit, its motion information is set to the motion information of that motion compensation unit. For example, for an 8x8 motion compensation unit, the four storage units it covers all store the motion information of that motion compensation unit.
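The copy from motion compensation units to storage units can be sketched as below. The dictionary layout and the function name are assumptions made for illustration; the sketch also assumes that M and N are multiples of the storage unit size.

```python
def fill_storage_units(unit_mvs, m, n, storage=4):
    """Copy each MxN motion compensation unit's motion vector into every
    storage unit it covers.

    unit_mvs maps (i, j) unit indices to motion vectors; the result maps
    storage-unit indices (column, row) to motion vectors.
    """
    per_row = m // storage  # storage units per motion compensation unit, horizontally
    per_col = n // storage  # storage units per motion compensation unit, vertically
    stored = {}
    for (i, j), mv in unit_mvs.items():
        for dy in range(per_col):
            for dx in range(per_row):
                stored[(i * per_row + dx, j * per_col + dy)] = mv
    return stored
```

With 8x8 motion compensation units and 4x4 storage units, each unit fills a 2x2 group of storage entries with one and the same motion vector. The control point motion vectors from step 1 would be kept in a separate structure rather than written into this grid.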
- In the subsequent encoding process, deblocking filtering, overlapped block motion compensation, motion information prediction for subsequent non-affine coding blocks, control-point-combination-based motion information prediction for affine coding blocks, and time-domain motion information prediction use the stored motion information of each motion compensation unit; model-based motion information prediction for subsequent affine coding blocks uses the stored motion information of the control points.
- the motion vector of the motion compensation unit needs to be quantized to the motion vector accuracy of the storage unit for storage.
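A minimal sketch of this quantization, assuming motion vector components are integers in fractional-sample units and that the storage accuracy is coarser by `shift` bits (for example, 1/16-sample values stored at 1/4-sample accuracy with `shift = 2`). The rounding rule, round to nearest with halves away from zero, is an assumption; the source does not specify one.

```python
def quantize_mv_component(value, shift):
    """Round an integer MV component to an accuracy that is `shift` bits coarser.

    Rounds to nearest, with halves rounded away from zero (assumed rule).
    """
    if shift == 0:
        return value
    offset = 1 << (shift - 1)
    if value >= 0:
        return (value + offset) >> shift
    return -((-value + offset) >> shift)


def quantize_mv(mv, shift):
    """Quantize both components of a motion vector."""
    return quantize_mv_component(mv[0], shift), quantize_mv_component(mv[1], shift)
```

Handling negative values through their magnitude keeps the rounding symmetric around zero, which a plain arithmetic right shift would not.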
- steps 4 and 5 can be reversed, that is, after the calculated motion compensation unit motion information is stored, then motion compensation is performed according to the stored motion information.
- The above method enables a storage unit that contains a control point to store the motion information of the storage unit itself (for example, the motion information of the center point of the storage unit, or of the motion compensation unit in which the storage unit is located) instead of only the motion information of the control point. This provides better motion information for subsequent operations such as deblocking filtering, overlapped block motion compensation, motion information prediction for subsequent non-affine coding blocks, control-point-combination-based motion information prediction for affine coding blocks, and time-domain motion information prediction.
- this method directly uses the motion information of the motion compensation unit obtained in step 3 for storage, and avoids separate calculation of the motion information of the motion compensation unit and the storage unit.
- The embodiment of the present invention also provides another encoding and decoding method based on affine transformation. This method differs from the previous method in step 1; the other steps are the same.
- Step 1 Obtain the motion information of the control points of the current affine coding block (Control point motion vector, CPMV)
- When the upper boundary of the current block coincides with the upper boundary of the CTU in which it is located, the motion information of the neighboring blocks at the upper positions is not used for prediction; for example, B, C, and E in FIG. 23 are not used.
- The neighboring blocks surrounding the current block are traversed to find a non-translational prediction block, the motion vectors of the control points of that block are obtained, and its motion model is then used to derive the motion vectors of the control points of the current block (for merge mode) or the motion vector predictors of the control points (for AMVP mode).
- The motion vector (vx4, vy4) of the upper left vertex (x4, y4) and the motion vector (vx5, vy5) of the upper right vertex (x5, y5) of the affine coding block are obtained.
- This method restricts the current CU from reading the motion information of control points across the upper boundary of the CTU, and reduces the line buffer that stores motion information.
- The embodiment of the present invention also provides another encoding and decoding method based on affine transformation. This method differs from the previous method in step 1; the other steps are the same.
- Step 1 Obtain the motion information of the control points of the current affine coding block (Control point motion vector, CPMV)
- When the left boundary of the current block coincides with the left boundary of the CTU in which it is located, the motion information of the neighboring blocks at the left positions is not used for prediction; for example, A and D in FIG. 23 are not used.
- The neighboring blocks surrounding the current block are traversed to find a non-translational prediction block, the motion vectors of the control points of that block are obtained, and its motion model is then used to derive the motion vectors of the control points of the current block (for merge mode) or the motion vector predictors of the control points (for AMVP mode).
- This method also restricts the current CU from reading the motion information of the control point across the left boundary of the CTU, and reduces the line buffer that stores motion information.
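The two CTU boundary restrictions described above can be sketched as a filter over the neighbor positions of FIG. 23. The sketch assumes CTUs are aligned on a grid of `ctu_size` samples starting at the picture origin and that the traversal order is A, B, C, D, E; both points, and all names, are assumptions. Each restriction belongs to its own embodiment, so the two flags let them be enabled separately.

```python
UPPER_NEIGHBORS = ("B", "C", "E")  # positions above the current block (FIG. 23)
LEFT_NEIGHBORS = ("A", "D")        # positions to the left of the current block


def allowed_neighbors(block_x, block_y, ctu_size,
                      restrict_upper=True, restrict_left=True,
                      order=("A", "B", "C", "D", "E")):
    """Return the neighbor positions that may be used for model-based prediction.

    block_x, block_y are the top-left sample coordinates of the current block.
    """
    excluded = set()
    if restrict_upper and block_y % ctu_size == 0:
        # Upper boundary of the block lies on the upper boundary of its CTU.
        excluded.update(UPPER_NEIGHBORS)
    if restrict_left and block_x % ctu_size == 0:
        # Left boundary of the block lies on the left boundary of its CTU.
        excluded.update(LEFT_NEIGHBORS)
    return [name for name in order if name not in excluded]
```

For a block at (64, 128) in a picture with 128x128 CTUs, only A and D remain, so no control point motion information has to be read from the CTU row above.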
- the embodiment of the present invention also provides another encoding and decoding method based on affine transformation. This method is different from the previous method in step one, and other steps are the same.
- Step 1 Obtain the motion information of the control points of the current affine coding block (Control point motion vector, CPMV)
- a motion vector prediction method based on a motion model is not used, and a motion vector prediction method based on a combination of control points may be used to obtain control point motion information.
- FIG. 26 is a schematic block diagram of an implementation manner of an encoding device or a decoding device (referred to as a decoding device 800) according to an embodiment of the present application.
- the decoding device 800 may include a processor 810, a memory 830, and a bus system 850.
- the processor and the memory are connected through a bus system, the memory is used to store instructions, and the processor is used to execute the instructions stored in the memory.
- The memory of the encoding device stores program code, and the processor can call the program code stored in the memory to perform the various video encoding or decoding methods described in this application, especially the video encoding or decoding methods in the various new inter prediction modes and the methods for predicting motion information in the various new inter prediction modes. To avoid repetition, they are not described in detail here.
- The processor 810 may be a central processing unit (CPU), or it may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on.
- a general-purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
- the memory 830 may include a read-only memory (ROM) device or a random access memory (RAM) device. Any other suitable type of storage device may also be used as the memory 830.
- the memory 830 may include code and data 831 accessed by the processor 810 using the bus 850.
- the memory 830 may further include an operating system 833 and an application program 835.
- The application program 835 includes at least one program that allows the processor 810 to perform the video encoding or decoding method described in this application (in particular, the inter prediction method or the motion information prediction method).
- The application program 835 may include applications 1 to N, which further include a video encoding or decoding application (referred to as a video decoding application) that executes the video encoding or decoding method described in this application.
- The bus system 850 may include a data bus, a power bus, a control bus, a status signal bus, and the like. For clarity, the various buses are all marked as the bus system 850 in the figure.
- the decoding device 800 may further include one or more output devices, such as a display 870.
- The display 870 may be a touch-sensitive display that combines a display with a touch-sensitive unit operable to sense touch input.
- the display 870 may be connected to the processor 810 via a bus 850.
- Computer-readable media may include computer-readable storage media, which correspond to tangible media such as data storage media, or communication media, which include any medium that facilitates transfer of a computer program from one place to another (e.g., according to a communication protocol).
- computer-readable media may generally correspond to (1) tangible computer-readable storage media that is non-transitory, or (2) a communication medium such as a signal or carrier wave.
- a data storage medium may be any available medium that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and / or data structures used to implement the techniques described in this application.
- the computer program product may include a computer-readable medium.
- Such computer-readable storage media may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage devices, magnetic disk storage devices or other magnetic storage devices, flash memory, or any other medium that can be used to store the required program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium.
- the computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other temporary media, but are instead directed to non-transitory tangible storage media.
- Magnetic disks and optical discs include compact discs (CDs), laser discs, optical discs, digital versatile discs (DVDs), and Blu-ray discs, where magnetic disks typically reproduce data magnetically, and optical discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
- DSPs digital signal processors
- ASICs application specific integrated circuits
- FPGAs field programmable gate arrays
- processor may refer to any of the aforementioned structures or any other structure suitable for implementing the techniques described herein.
- The functions described by the various illustrative logical blocks, modules, and steps described herein may be provided within dedicated hardware and / or software modules configured for encoding and decoding, or incorporated into a combined codec.
- the techniques can be fully implemented in one or more circuits or logic elements.
- The techniques of this application may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a group of ICs (e.g., a chipset).
- Various components, modules, or units are described in this application to emphasize functional aspects of the apparatus for performing the disclosed techniques, but do not necessarily need to be implemented by different hardware units.
- The various units may be combined in a codec hardware unit in combination with suitable software and / or firmware, or provided by interoperable hardware units (including one or more processors as described above).
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
The invention relates to an image prediction method and apparatus, pertaining to the technical field of video encoding and decoding, capable of improving encoding and decoding performance and reducing encoding and decoding complexity. The method comprises the following steps: determining motion information of a control point of a current image block to be predicted according to motion information of a control point of an adjacent image block of the current image block; determining motion information of a sub-block of the current image block by using an affine transformation model according to the motion information of the control point of the current image block; and obtaining a prediction block of the sub-block according to the motion information of the sub-block, wherein the adjacent image block of the current image block satisfies at least one of the following conditions: when an upper boundary of the current image block coincides with an upper boundary of the CTU in which the current image block is located, the adjacent image block is an image block located on the left side or lower left side of the current image block; and when a left boundary of the current image block coincides with a left boundary of the CTU in which the current image block is located, the adjacent image block is an image block located on the upper side or upper right side of the current image block.
Applications Claiming Priority (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201810709850.7 | 2018-07-02 | ||
| CN201810709850 | 2018-07-02 | ||
| CN201811090471.0A CN110677645B (zh) | 2018-07-02 | 2018-09-18 | Image prediction method and apparatus |
| CN201811090471.0 | 2018-09-18 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2020007093A1 (fr) | 2020-01-09 |
Family
ID=69060763
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2019/082942 Ceased WO2020007093A1 (fr) | 2018-07-02 | 2019-04-16 | Image prediction method and apparatus |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2020007093A1 (fr) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2022068326A1 (fr) * | 2020-09-30 | 2022-04-07 | 华为技术有限公司 | Image frame prediction method and electronic device |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN103024378A (zh) * | 2012-12-06 | 2013-04-03 | 浙江大学 | Method and apparatus for deriving motion information in video encoding and decoding |
| CN104539966A (zh) * | 2014-09-30 | 2015-04-22 | 华为技术有限公司 | Image prediction method and related apparatus |
| US20170339405A1 (en) * | 2016-05-20 | 2017-11-23 | Arris Enterprises Llc | System and method for intra coding |
| WO2017205701A1 (fr) * | 2016-05-25 | 2017-11-30 | Arris Enterprises Llc | Weighted angular prediction for intra coding |
| CN108141582A (zh) * | 2015-08-07 | 2018-06-08 | Lg 电子株式会社 | Inter prediction method and apparatus in a video coding system |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2022068326A1 (fr) * | 2020-09-30 | 2022-04-07 | 华为技术有限公司 | Image frame prediction method and electronic device |
| CN115398907A (zh) * | 2020-09-30 | 2022-11-25 | 华为技术有限公司 | Image frame prediction method and electronic device |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US11252436B2 (en) | Video picture inter prediction method and apparatus, and codec | |
| JP7148612B2 (ja) | ビデオデータインター予測の方法、装置、ビデオエンコーダ、ビデオデコーダ及びプログラム | |
| WO2020052534A1 (fr) | Procédé de décodage vidéo et décodeur vidéo | |
| WO2020006969A1 (fr) | Procédé de prédiction de vecteur de mouvement et dispositif associé | |
| WO2020042604A1 (fr) | Codeur vidéo, décodeur vidéo, et procédé correspondant | |
| CN117730535A (zh) | 视频编解码中用于仿射运动补偿预测的几何分割 | |
| WO2019154424A1 (fr) | Procédé de décodage vidéo, décodeur vidéo et dispositif électronique | |
| CN121713476A (zh) | 基于外推滤波器的预测模式的方法和设备 | |
| CN111355958B (zh) | 视频解码方法及装置 | |
| WO2020043004A1 (fr) | Procédé de construction pour liste d'informations de mouvement candidates, procédé de prédiction entre trames et appareil | |
| CN110677645B (zh) | 一种图像预测方法及装置 | |
| JP7485809B2 (ja) | インター予測方法及び装置、ビデオエンコーダ、並びにビデオデコーダ | |
| WO2020007093A1 (fr) | Procédé et appareil de prédiction d'image | |
| CN110971899B (zh) | 一种确定运动信息的方法、帧间预测方法及装置 | |
| WO2019237287A1 (fr) | Procédé de prédiction inter-trames pour image vidéo, dispositif, et codec | |
| CN120982085A (zh) | 用于帧内模板匹配预测的搜索区域修改 | |
| CN121942200A (zh) | 基于外推滤波器的预测模式的方法和设备 | |
| WO2020007187A1 (fr) | Procédé et dispositif de décodage de bloc d'image | |
| WO2019227297A1 (fr) | Procédé et dispositif de prédiction inter-trame et codec pour image vidéo |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 19831436; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 19831436; Country of ref document: EP; Kind code of ref document: A1 |