WO2025120331A1 - Enhancement coding improvements and integration - Google Patents

Info

Publication number
WO2025120331A1
Authority
WO
WIPO (PCT)
Prior art keywords
enhancement
RoI
base
input video
SoC
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/GB2024/053051
Other languages
English (en)
Inventor
Harry Morgan
Lorenzo CICCARELLI
Kevin MOCKFORD
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
V Nova International Ltd
Original Assignee
V Nova International Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by V Nova International Ltd
Publication of WO2025120331A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/167 Position within a video image, e.g. region of interest [ROI]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/124 Quantisation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/174 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a slice, e.g. a line of blocks or a group of blocks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/30 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
    • H04N19/33 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability in the spatial domain

Definitions

  • Recent improvements in video coding technology have included the concept of hierarchical video coding. Examples include VC-6, standardised at SMPTE as ST 2117, and LCEVC, standardised at MPEG as MPEG-5 Part 2. Typically, these hierarchical encoding schemes use multiple resolution levels and an encoder (or encoding module) associated with each resolution level.
  • The LCEVC (Low Complexity Enhancement Video Coding) standard was published in November 2021, and many possible implementation details of LCEVC are described in patent publications WO 2020/188273 and WO 2020/188229. Each of these earlier documents is incorporated here by reference.
  • LCEVC enhances the reproduction fidelity of a decoded video after encoding and decoding using an existing codec. This is achieved by combining a base layer with an enhancement layer, where the base layer contains the video encoded using the existing codec, and the enhancement layer indicates a residual difference between the original video and an expected decoded video produced by decoding the base layer using the existing codec.
  • the enhancement layer can be combined with the decoded base layer to more accurately reproduce the original video.
  • the technology uses a down-sampled source signal encoded using a base codec to form a base stream.
  • An enhancement stream is formed using an encoded set of residuals which correct or enhance the base stream, for example by increasing resolution or by increasing frame rate.
  • There is a need to integrate enhancement encoders into existing ecosystems, such as security cameras and other similar infrastructure.
  • In many video scenarios, for example those that are not media-oriented such as security cameras, most of the video feed is unimportant and only a minor part is important. However, in that important region, details are all-important, perhaps even essential. Examples include detecting faces for identification, detecting vehicle number plates for vehicle identification and tracking, or detail in evidence-based scenarios where the activity needs to be recorded or monitored but the background is less relevant.
  • a system on a chip comprising a plurality of enhancement encoders and a base codec configured to serve the plurality of enhancement encoders
  • the SoC is configured to receive a plurality of input video streams at a first resolution
  • the base codec is configured to encode the plurality of input video streams to provide a plurality of base coded streams at a second resolution lower than the first resolution
  • each of the plurality of enhancement encoders is configured to encode a respective one of the plurality of base coded streams to provide a plurality of enhancement coded streams corresponding to the plurality of input video streams, the enhancement coded streams being at the first resolution.
  • the enhancement encoders may be configured to add enhancement to the base encoded video to create the encoded output video streams at the same resolution as the input video streams, while the base codec can sustain a higher bitrate (i.e., throughput) at a relatively lower resolution without sacrificing the resulting quality of the output stream.
  • this provides improvements to the coding/processing capability per unit area of the chip over SoCs in the art.
  • the encoding/decoding hardware may handle more input streams, and higher resolution/frame rate streams, for a given chip area.
  • this also provides the opportunity for greater coding pipeline re-use, since encoding/decoding resources (i.e. parts of pipeline) may be reused for different tasks in order to optimise efficiency in a data compression workflow pipeline.
  • the SoC comprises a buffer configured to store one or more to-be-encoded input video streams during at least the encoding of one of the plurality of input video streams by the base codec.
  • the base codec may be controlled to switch between encoding different streams such that the base codec encodes at a higher rate than the rate at which data is being stored in the buffer.
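The buffering arrangement described above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the disclosed implementation: the function name `simulate_shared_base_codec`, the tick-based timing, and the oldest-first service order are all hypothetical. It shows that when the single shared base codec encodes at least as many frames per tick as arrive from the input streams, the buffer occupancy stays bounded, and that it grows otherwise.

```python
from collections import deque


def simulate_shared_base_codec(num_streams, frames_per_stream, codec_frames_per_tick):
    """Toy model (hypothetical): on every tick each input stream stores one
    frame in the buffer, then the single base codec encodes up to
    `codec_frames_per_tick` buffered frames, oldest first.

    Returns (encoding_order, peak_buffer_occupancy).
    """
    buffer = deque()
    encoded = []
    peak = 0
    for tick in range(frames_per_stream):
        # Each stream delivers one to-be-encoded frame into the buffer.
        for stream_id in range(num_streams):
            buffer.append((stream_id, tick))
        peak = max(peak, len(buffer))
        # The base codec switches between streams by serving the buffer in order.
        for _ in range(min(codec_frames_per_tick, len(buffer))):
            encoded.append(buffer.popleft())
    # Drain whatever is left once the inputs have stopped.
    while buffer:
        encoded.append(buffer.popleft())
    return encoded, peak


# Codec keeps up with four streams: occupancy never exceeds one frame per stream.
order_fast, peak_fast = simulate_shared_base_codec(4, 3, 4)
# Codec too slow for four streams: the backlog grows tick by tick.
order_slow, peak_slow = simulate_shared_base_codec(4, 3, 2)
```

In this sketch the codec rate is expressed in frames per tick only for simplicity; a real SoC would more likely reason in pixels or macroblocks per second.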
  • the base codec is an AV1 codec or an AVC codec or an HEVC codec
  • the plurality of enhancement encoders are LCEVC encoders.
  • the base and enhancement layers may be arranged in a hierarchical structure.
  • the SoC may be further configured to provide the base coded stream, for example in addition to the enhancement coded streams.
  • enhancement encoding may be employed to provide enhanced compression to reduce bitrate, which is advantageous in high motion or otherwise challenging scenarios (e.g., in wearable cameras).
  • the plurality of enhancement encoders comprises three enhancement encoders or four enhancement encoders, and the plurality of input video streams comprises four input video streams.
  • the plurality of enhancement encoders comprises three enhancement encoders, and the plurality of input video streams comprises three input video streams.
  • the number of enhancement encoders is at least equal to the number of input streams. In this way, the enhancement encoding implemented in the SoC may provide a high performance gain density compared to using a base encoder or several such base encoders alone, particularly when there are multiple input streams, as the base may be operated at a lower resolution.
  • the SoC is configured for use in one or more of a camera, a security camera, an automotive camera, and a body worn camera.
  • the SoC is configured to receive the plurality of input streams from a respective plurality of such cameras.
  • the SoC may be configured for use in an edge device, wherein the edge device is configured to handle or otherwise receive each input stream of a plurality of input streams from each respective camera in a plurality of local cameras.
  • the SoC may be configured to receive the plurality of input streams from a respective plurality of views captured by a multi-view camera.
  • different views may have different lens angles (e.g. a first view may be provided by a wide-angle lens, a second view may be provided by an ultra-wide-angle lens, a third view may be provided by a fisheye lens, a fourth view may be provided by a narrow-angle lens, and so on).
  • an enhancement encoder within a (video encoder) SoC may be integrated into challenging practical applications involving cameras, such as in a security scenario.
  • the base codec is configured to provide a set of coordinates corresponding to a region of interest, RoI, to at least one of the plurality of enhancement encoders; and, the RoI is usable to modify the encoding of the respective one of the plurality of the base coded streams by the at least one of the plurality of enhancement encoders to prioritise enhancement of the base coded stream at the RoI.
  • the set of coordinates may have been received or otherwise provided to the base codec by an API or an AI object analysis model.
  • the base codec, the API, or the AI object analysis model may insert the set of coordinates into the base coded stream such that the set may be extracted by one or more of the enhancement encoders.
  • the set of coordinates may be extractable by the enhancement encoding operation.
  • the base coded stream may be adapted so as to permit RoI enhancement encoding by one or more enhancement encoders in the SoC, which in turn may provide residuals for particular areas of interest without needing to transmit residuals for the entire scene.
  • the SoC comprises a region of interest, RoI, identification module configured to provide a set of coordinates corresponding to the RoI to the base codec and/or at least one of the plurality of enhancement encoders; and, the RoI is usable to modify the encoding of the respective one of the plurality of the base coded streams by the at least one of the plurality of enhancement encoders to prioritise enhancement of the base coded stream at the RoI.
  • the RoI identification module may be a separate module to the enhancement encoders and/or to the base codec.
  • the RoI identification module may be the base codec, an API, an AI object analysis model, or another processing device or CPU module.
  • the RoI identification module may insert the set of coordinates into the base coded stream such that the set may be extracted by one or more of the enhancement encoders. Accordingly, the set of coordinates may be extractable by the enhancement encoding operation.
  • the one or more enhancement encoders may be provided with an indication of the area corresponding to the RoI by the RoI identification module, in order that the one or more enhancement encoders may provide encoded residuals in a specified area and therefore a higher resolution at the RoI.
  • At least one of the plurality of enhancement encoders is configured to determine a set of coordinates corresponding to a region of interest, RoI, based on one or more settings of a residual quantisation process of the base codec; and, the RoI is usable to modify the encoding of the respective one of the plurality of the base coded streams by the at least one of the plurality of enhancement encoders to prioritise enhancement of the base coded stream at the RoI.
  • the settings of the residual quantisation process of the base codec may comprise one or more quantisation step widths, wherein areas/portions of the base layer having a relatively lower quantisation step width may correspond to portions of the base coded stream corresponding to the RoI, and/or areas/portions of the base layer having a relatively higher quantisation step width may correspond to portions of the base coded stream not corresponding to the RoI.
  • a step width applied during the encoding process may be modified (e.g. by a residual selection block or by applying a priority map during a residual quantisation process) to selectively encode (a) residual value(s) corresponding to the RoI.
  • one or more enhancement encoders may be configured to provide (more) residuals in the area provided by the coordinates, and therefore a higher resolution for a particularly desired portion of the input video.
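The idea of inferring a region of interest from the base codec's quantisation settings can be sketched as follows. This is an illustrative sketch only: the per-block step-width grid, the fixed threshold, the bounding-box output format, and the name `roi_from_step_widths` are assumptions made for the example and are not taken from the disclosure or from any real codec API. Blocks with a relatively low step width are treated as belonging to the region of interest.

```python
def roi_from_step_widths(step_widths, block_size, threshold):
    """Derive a bounding box from a per-block map of base-codec step widths.

    step_widths: 2D list with one quantisation step width per block (assumed layout).
    block_size:  block edge length in pixels (assumed square blocks).
    threshold:   blocks with a step width below this value count as region of interest.

    Returns (x0, y0, x1, y1) in pixels with x1/y1 exclusive, or None if no
    block falls below the threshold.
    """
    rows = []
    cols = []
    for r, row in enumerate(step_widths):
        for c, step in enumerate(row):
            if step < threshold:
                rows.append(r)
                cols.append(c)
    if not rows:
        return None
    return (
        min(cols) * block_size,
        min(rows) * block_size,
        (max(cols) + 1) * block_size,
        (max(rows) + 1) * block_size,
    )


# A 3x4 grid of 16-pixel blocks; the finely quantised blocks (step width 8)
# mark the area the base encoder treated as important.
grid = [
    [32, 32, 32, 32],
    [32, 8, 8, 32],
    [32, 8, 32, 32],
]
coords = roi_from_step_widths(grid, block_size=16, threshold=16)
```

A single bounding box is the simplest possible output; an encoder could equally keep the per-block mask to handle several disjoint regions.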
  • the RoI may be usable to modify an enhancement coding operation to prioritise enhancement of a base coded stream at the RoI by overriding a default enhancement coding operation according to the RoI to prioritise enhancement of a base video signal at the RoI.
  • the default enhancement coding operation may include an enhancement coding operation configured to encode residuals for an input video without reference to an RoI and/or without reference to a priority map.
  • the default enhancement coding operation may include an enhancement coding operation applied in a scenario where no object of interest is detected in the input video stream.
  • overriding the default enhancement coding operation may include varying, altering or updating the default encoding parameters of said operation to prioritise residual encoding of identified RoIs.
  • overriding the default enhancement coding operation may include overriding a default enhancement coding operation according to the determined RoI to prioritise enhancement of a base coded stream at the RoI over the base coded stream not at the RoI.
  • the modification of an enhancement coding operation to prioritise enhancement of a base video signal at the RoI may be implemented by an enhancement codec configured to perform the enhancement coding operation.
  • the means for modifying the enhancement process according to the present disclosure may be included as part of the (e.g. standalone) enhancement codec.
  • enhancement modification may be independent of any given base codec and may be implemented without reference to, or without any requirement for, a base codec.
  • a camera system comprising the SoC of the first aspect, wherein the camera system comprises a plurality of cameras, and each camera of the plurality of cameras is configured to: capture one of the plurality of input video streams, and send the captured one of the plurality of input streams to the SoC.
  • the camera system may comprise one or more downsampler modules configured to downsample (respective) one or more of the plurality of input video streams prior to the plurality of input video streams being passed to the base codec.
  • the camera system may comprise one or more upsampler modules configured to upsample an output of the base codec to generate an upsampled rendition of the output of the base codec to be input to one or more of the enhancement encoders.
  • the output of the base codec is a decoded rendition of a base encoded signal, wherein typically the output of the base codec is at a lower resolution than the input video signal(s).
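The downsample, base-code, upsample and residual chain described above can be sketched on a single row of samples. This is a minimal sketch under stated assumptions: the pair-averaging downsampler, the nearest-neighbour upsampler, and the coarse quantiser standing in for a real base codec (AVC, HEVC, AV1) are all hypothetical choices for illustration, and the residuals are kept lossless so that the reconstruction is exact.

```python
def downsample(samples):
    """Halve the resolution by averaging neighbouring pairs (even length assumed)."""
    return [(samples[i] + samples[i + 1]) // 2 for i in range(0, len(samples), 2)]


def toy_base_codec(samples, step=8):
    """Stand-in for base encode followed by base decode: coarse quantisation only."""
    return [(s // step) * step for s in samples]


def upsample(samples):
    """Nearest-neighbour upsampling back to the input resolution."""
    return [s for s in samples for _ in range(2)]


def enhancement_residuals(original, upsampled):
    """Residuals are the difference between the input and the upsampled base rendition."""
    return [o - u for o, u in zip(original, upsampled)]


def reconstruct(upsampled, residuals):
    """Decoder side: add the residuals to the upsampled decoded base."""
    return [u + r for u, r in zip(upsampled, residuals)]


original = [10, 12, 50, 54, 200, 202, 90, 94]            # first (full) resolution
decoded_base = toy_base_codec(downsample(original))      # second (lower) resolution
prediction = upsample(decoded_base)
residuals = enhancement_residuals(original, prediction)  # [2, 4, 2, 6, 0, 2, 2, 6]
restored = reconstruct(prediction, residuals)            # equals `original`
```

In practice the residuals would themselves be transformed, quantised and entropy coded, so the reconstruction would be close to, rather than identical to, the input.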
  • a multiresolution camera comprising the SoC of the first aspect and any examples thereof, wherein the camera is configured in use to provide the plurality of input video streams to the SoC.
  • the camera may be configured to output the plurality of enhancement coded streams corresponding to the plurality of input video streams, the enhancement coded streams being at the first resolution.
  • the camera may be configured to output two video streams decodable to derive three video streams at respective three different resolutions, wherein the two video streams may include a first stream and a second stream, the first stream may include an enhancement coded stream at high resolution and a base coded stream at medium resolution, and the second stream may include a stream at low resolution.
  • a camera system may be provided, the camera system being configured to output two video streams.
  • Each output stream may be at a different level of quality and each output may be encoded.
  • the first quality may be HD (high definition) and the second may be UHD (ultra-high definition).
  • a base codec may encode HD, then an LCEVC encoder may be configured to encode UHD.
  • the UHD output may be configured to include two streams: an HD base and the HD+LCEVC enhancement.
  • the base may encode standard definition (SD) with LCEVC encoding HD and UHD.
  • the same base stream may be used, with two output streams being provided by a respective LCEVC encoder configured to output HD and UHD respectively.
  • the base may be output at SD to be combined with HD from one encoding step and LCEVC from another encoding step.
  • the same LCEVC encoder can be used to encode each of the HD and UHD enhancements separately from the same base coder.
  • the UHD output may be provided from an HD base (i.e. HD base + UHD enhancement) and the HD may be provided from an SD base (i.e. SD base + HD enhancement).
  • a method for operating a system on a chip, SoC, the SoC comprising a plurality of enhancement encoders and a base codec configured to serve the plurality of enhancement encoders, the method comprising: receiving, by the SoC, a plurality of input video streams at a first resolution; encoding, by the base codec, the plurality of input video streams to provide a plurality of base coded streams at a second resolution lower than the first resolution; and, encoding, by each of the plurality of enhancement encoders, a respective one of the plurality of base coded streams to provide a plurality of enhancement coded streams corresponding to the plurality of input video streams, the enhancement coded streams being at the first resolution.
  • the enhancement encoders may be operated so as to add enhancement to the base encoded video to create the encoded output video streams at the same resolution as the input video streams, while the base codec may be operated faster and at a relatively lower resolution without sacrificing output resolution.
  • the SoC comprises a buffer
  • the method further comprises: storing, by the buffer, one or more to-be-encoded input video streams during at least the encoding of one of the plurality of input video streams by the base codec.
  • the base codec may be controlled (e.g. by a CPU or module thereof, by a separate control module, or the like) to switch between encoding different streams such that a rate of encoding by the base codec is higher than the rate at which data is being stored into the buffer.
  • the base codec is an AV1 codec or an AVC codec or an HEVC codec
  • the plurality of enhancement encoders are LCEVC encoders.
  • the base and enhancement layers may be arranged in a hierarchical structure.
  • the method may further comprise providing the base coded stream, optionally in addition to the enhancement coded streams.
  • LCEVC encoding (which may not use inter-frame prediction) may be employed to provide enhanced compression by reducing bitrates in high motion cases (e.g., video footage from wearable cameras).
  • the plurality of enhancement encoders comprises three enhancement encoders or four enhancement encoders, and the plurality of input video streams comprises four input video streams.
  • the plurality of enhancement encoders comprises three enhancement encoders, and the plurality of input video streams comprises three input video streams.
  • the number of enhancement encoders is at least equal to the number of input streams. In this way, the enhancement coding operation may provide a high performance gain density over using a base encoder alone.
  • the plurality of input video streams are provided in use by one or more of a camera, a security camera, an automotive camera, and a body worn camera.
  • the SoC is configured to receive the plurality of input streams from a respective plurality of such cameras.
  • the SoC may be configured for use in an edge device, wherein the edge device is configured to handle or otherwise receive each input stream of a plurality of input streams from each respective camera in a plurality of local cameras.
  • the SoC may be configured to receive the plurality of input streams from a respective plurality of views captured by a multi-view camera.
  • different views may have different lens angles (e.g. a first view may be provided by a wide-angle lens, a second view may be provided by an ultra-wide-angle lens, a third view may be provided by a fisheye lens, a fourth view may be provided by a narrow-angle lens, and so on).
  • enhancement encoders within a video encoder SoC may be integrated into challenging practical applications involving cameras.
  • a method for providing an encoded video signal comprising: determining a region of interest, RoI, in an input video signal; modifying an enhancement coding operation to prioritise enhancement of a base video signal at the RoI; and, encoding the input video signal using the modified enhancement coding operation to generate an encoded video signal.
  • RoI enhancement encoding may focus residuals (and thereby prioritise enhancement) on areas of interest for high quality reconstruction at viable bitrates.
  • the determining a region of interest, RoI, in an input video signal comprises: analysing the input video signal to identify the RoI.
  • the RoI may correspond to an area or areas of a frame of an image or video to be allocated residuals and/or to an area or areas of a frame of an image or video where an object of a particular class has been localised. In this way, residuals corresponding to objects of different classes may be modified differently or not modified at all.
  • the analysing the input video signal to identify the RoI comprises: applying a machine learning (ML) model and/or a computer vision (CV) model to the input video signal.
  • ML and/or CV models may be a part of an artificial intelligence, AI, recognition model. In this way, different AI detection/recognition searches may be implemented at an appropriate quality level, resulting in higher efficiency and accuracy.
  • object detection may be implemented at the base video signal resolution, and/or object recognition may be implemented at full input video resolution, and/or decoding (i.e. partial decoding) may be implemented only on the identified RoI.
  • a first object analysis task may be performed on a first version of an image or a video at a first level of quality
  • a second object analysis task may be performed on a second version of the image or the video at a second level of quality, the second level of quality being higher than the first level of quality
  • the first object analysis task may be an object detection task and the second object analysis task may be an object recognition task.
  • results of an object analysis task may comprise one or more confidence levels that a given object is or has been detected and/or recognised.
  • the determining a region of interest, RoI, in an input video comprises: receiving a set of coordinates corresponding to the RoI.
  • the set of coordinates may be received by or otherwise provided to the enhancement coding operation from an API or from an AI object analysis model.
  • the set of coordinates may be inserted into the base video signal, optionally by the API, the AI object analysis model, or by a base codec, such that the set may be extracted by the enhancement coding operation.
  • the method may further comprise: providing AI-detected RoI coordinates to the enhancement coding operation to adjust a focus of the video according to the RoI coordinates, optionally on a per-frame basis. In this way, encoding the RoI enhancement based on the received RoI coordinates permits prioritisation of residuals for known high priority areas, and permits optimisation of video quality, focus, and improvements in compression efficiency.
  • the determining a region of interest, RoI, in an input video comprises: analysing residual quantisation settings of a base codec configured to provide the base video signal.
  • the settings of the residual quantisation process of the base codec may comprise one or more quantisation step widths, wherein areas/portions of the base video signal having a relatively lower quantisation step width may correspond to parts of the base video signal corresponding to the RoI, and/or areas/portions of the base video signal having a relatively higher quantisation step width may correspond to parts of the base video signal not corresponding to the RoI.
  • the modifying an enhancement coding operation to prioritise enhancement of a base video signal at the RoI comprises: adjusting a priority map, the priority map comprising one or more weights corresponding to one or more residuals, wherein the one or more weights indicate to prioritise encoding residuals corresponding to the RoI.
  • the priority map may guide decisions on which area or areas of a frame, scene or video should be allocated residuals to provide enhancement and thereby higher resolution.
  • the method further comprises: filtering the one or more residuals according to the priority map.
  • filtering a residual may comprise selecting the residual for encoding where the residual corresponds at least in part to the RoI, or quantising the residual to zero where the residual corresponds to an area of the scene, frame or footage which is not in the RoI.
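The priority-map filtering described above can be sketched in a few lines. This is a hedged illustration, not the disclosed implementation: the binary weights (1 inside the region of interest, 0 outside), the rectangular region given as `(x0, y0, x1, y1)` with exclusive upper bounds, and the helper names `build_priority_map` and `filter_residuals` are assumptions introduced for the example.

```python
def build_priority_map(width, height, roi):
    """Return a per-residual weight map: 1 inside the region of interest, 0 outside.

    roi is (x0, y0, x1, y1) with x1 and y1 exclusive (assumed convention).
    """
    x0, y0, x1, y1 = roi
    return [
        [1 if (x0 <= x < x1 and y0 <= y < y1) else 0 for x in range(width)]
        for y in range(height)
    ]


def filter_residuals(residuals, priority_map):
    """Keep residuals whose weight is positive; set all other residuals to zero."""
    return [
        [r if w > 0 else 0 for r, w in zip(res_row, weight_row)]
        for res_row, weight_row in zip(residuals, priority_map)
    ]


residuals = [
    [5, -3, 2, 1],
    [4, 7, -6, 2],
    [1, 1, 9, -8],
]
priority_map = build_priority_map(width=4, height=3, roi=(1, 1, 3, 3))
filtered = filter_residuals(residuals, priority_map)
# Only the 2x2 area inside the region of interest keeps its residuals.
```

Zeroed residuals cost very little after entropy coding, which is how concentrating residuals on the region of interest can reduce bitrate while the rest of the frame stays at base layer quality.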
  • the modifying an enhancement coding operation to prioritise enhancement of a base video signal at the RoI comprises: encoding the input video signal using a base codec to provide the base video signal; and modifying a residual quantisation process of the base codec to selectively encode residuals corresponding to the RoI.
  • non-RoI portions of the image/video may remain at base layer resolution.
  • the modifying a residual quantisation process of the enhancement coding operation to selectively encode residuals corresponding to the RoI comprises one or more of: adjusting a step width of the residual quantisation operation for the residuals corresponding to the RoI and/or for residuals not corresponding to the RoI; quantising residuals not corresponding to the RoI to zero.
  • a step width applied during the encoding process may be modified to selectively encode (a) residual value(s) corresponding to the RoI.
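Step-width adjustment per region can be illustrated with a simple uniform quantiser. This is a sketch under stated assumptions and is not the normative LCEVC quantiser: the truncating quantiser, the flat residual list with a parallel 0/1 mask, and the names `quantise_with_roi` and `dequantise_with_roi` are hypothetical. A smaller step width is applied inside the region of interest, and residuals outside it are either quantised more coarsely or set to zero.

```python
def quantise_with_roi(residuals, roi_mask, roi_step, background_step=None):
    """Quantise residuals with a region-dependent step width.

    roi_mask:        1 where the residual lies in the region of interest, else 0.
    roi_step:        (smaller) step width used inside the region of interest.
    background_step: step width used elsewhere; None means "quantise to zero".
    Quantisation truncates toward zero, so small residuals fall into a dead zone.
    """
    quantised = []
    for r, in_roi in zip(residuals, roi_mask):
        if in_roi:
            quantised.append(int(r / roi_step))
        elif background_step is None:
            quantised.append(0)
        else:
            quantised.append(int(r / background_step))
    return quantised


def dequantise_with_roi(quantised, roi_mask, roi_step, background_step=None):
    """Inverse operation: scale each level by the step width used for its region."""
    return [
        q * (roi_step if in_roi else (background_step or 0))
        for q, in_roi in zip(quantised, roi_mask)
    ]


residuals = [9, -7, 20, 3]
mask = [1, 1, 0, 0]
coarse_background = quantise_with_roi(residuals, mask, roi_step=2, background_step=8)
zero_background = quantise_with_roi(residuals, mask, roi_step=2)
restored = dequantise_with_roi(coarse_background, mask, roi_step=2, background_step=8)
```

The two calls correspond to the two options listed above: adjusting the step width for residuals outside the region of interest, or quantising those residuals to zero.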
  • the modifying an enhancement coding operation to prioritise enhancement of a base video signal at the RoI comprises: modifying the enhancement coding operation based on the RoI and a received indication of value of the RoI.
  • the modifying an enhancement coding operation to prioritise enhancement of a base video signal at the RoI may be implemented by an enhancement codec configured to perform the enhancement coding operation.
  • the means for modifying the enhancement process according to the present disclosure may be included as part of the (e.g. standalone) enhancement codec.
  • enhancement modification may be independent of any given base codec and may be implemented without reference to, or without any requirement for, a base codec.
  • the modifying an enhancement coding operation to prioritise enhancement of a base video signal at the RoI comprises: overriding a default enhancement coding operation to prioritise enhancement of a base video signal at the RoI.
  • the default enhancement coding operation may include an enhancement coding operation configured to encode residuals for an input video without reference to an RoI and/or without reference to a priority map.
  • the default enhancement coding operation may include an enhancement coding operation applied in a scenario where no object of interest is detected in the input video stream.
  • overriding the default enhancement coding operation may include varying, altering or updating the default encoding parameters of said operation to prioritise residual encoding of identified RoIs.
  • overriding the default enhancement coding operation may include overriding a default enhancement coding operation according to the determined RoI to prioritise enhancement of a base video signal at the RoI over the base video signal not at the RoI.
  • the method further comprises: receiving the set of coordinates corresponding to the RoI and/or the indication of value of the RoI respectively from an application programming interface, API, or extracting the set of coordinates corresponding to the RoI and/or the indication of value of the RoI respectively from the base video signal by the enhancement coding operation.
  • before determining a region of interest, RoI, in an input video signal, the method comprises: receiving an input video signal from a camera, a security camera, an automotive camera, or a body worn camera.
  • the RoI corresponds to an area of the input video signal where an object has been detected or recognised, or the RoI corresponds to an area of the base video signal where an object has been detected or recognised.
  • the method may further comprise: determining the RoI as an area of one or more frames of the input video signal corresponding to an area in which an object has been detected or recognised, wherein the object is optionally of a particular class.
  • the method further comprises: transmitting or otherwise outputting the encoded video signal.
  • a system on a chip, SoC configured in use to perform the method of the fifth aspect and any examples thereof.
  • a method for providing a region of interest, RoI, in a to-be-encoded input video signal comprising: analysing an input video signal to identify a region of interest, RoI, wherein the RoI is usable to modify an enhancement coding operation to prioritise enhancement of the input video signal at the RoI; and, transmitting or otherwise outputting a set of coordinates corresponding to the RoI.
  • video quality may be optimised by encoding the RoI enhancement based on RoI coordinates and prioritising residuals in known high priority areas.
  • a processing device configured in use to perform the method of the seventh aspect and any examples thereof.
  • the processing device is an RoI identification module and, optionally, the processing device is an RoI identification module configured for use in the system on a chip, SoC, of any of the first, fourth, and sixth aspects and any examples thereof.
  • the RoI identification module may be a separate module to the enhancement coding operation (or the encoders/decoders/other modules thereof).
  • the RoI identification module may insert the set of coordinates into the base coded stream such that the set may be extracted by the enhancement coding operation. In this way, the enhancement coding operation may be permitted to provide more residuals in the area indicated by the RoI identification module, and therefore a higher resolution for the particularly desired area of the scene.
  • a camera system configured to be worn on a body, the camera system comprising a camera, a processor, and the SoC of the first, fourth, and sixth aspects and any examples thereof.
  • a camera system comprising the multi-resolution camera of the second aspect and any examples thereof.
  • a method for providing an encoded video signal comprising: analysing a received input video signal to identify a region of interest; modifying an enhancement coding operation to prioritise enhancement of a base video signal at the region of interest; and, encoding the received input signal using the enhancement coding operation to generate the encoded video signal.
  • the method further comprises: receiving an indication of value of the region of interest, and modifying the enhancement coding operation based on the region of interest and the indication of value.
  • the indication of value may be received by an API or inserted into the base stream for extraction from the base stream by the enhancement coding operation.
  • modifying may comprise adjusting a priority map, the priority map being a set of residual weights (or a residual mask) corresponding to a residual or group of residuals.
  • the weights may indicate to the encoder to prioritise the encoding of residuals according to the weights.
  • the residuals may be filtered according to the weights.
  • the residuals may be quantised differently according to the weights of the residuals.
  • a step width applied during the encoding process may be modified to selectively encode a residual value corresponding to the region of interest.
  • analysing a received input video signal to identify a region of interest comprises: applying a machine learning and/or computer vision model to the input video signal.
  • the analysing and/or the modifying is provided by an enhancement decoder or may be received from a remote or external device or system.
  • the method further comprises: outputting and/or transmitting the encoded video signal.
  • a method for providing an encoded video signal comprising: analysing a received input video signal to identify a region of interest; and, outputting a set of coordinates corresponding to a region of interest, the region of interest being for encoding enhancement data in that region.
  • analysing a received input video signal to identify a region of interest comprises: applying a machine learning and/or computer vision model to the input video signal.
  • the analysing may be provided by an enhancement decoder or may be received from a remote or external device or system.
  • a method for providing an encoded video signal comprising: receiving a set of coordinates corresponding to a region of interest; modifying an enhancement coding operation to prioritise enhancement of a base video signal at the region of interest; and, encoding the received input signal using the enhancement coding operation to generate the encoded video signal.
  • the method further comprises: receiving an indication of value of the region of interest, and modifying the enhancement coding operation based on the region of interest (e.g. the received coordinates thereof) and the indication of value.
  • the set of coordinates and/or indication of value may be received by an API or inserted into the base stream for extraction from the base stream by the enhancement coding operation.
  • modifying comprises adjusting a priority map, the priority map being a set of residual weights (or a residual mask) corresponding to a residual or group of residuals.
  • the weights may indicate to the encoder to prioritise the encoding of residuals according to the weights.
  • the residuals may be filtered according to the weights.
  • the residuals may be quantised differently according to the weights of the residuals.
  • a step width applied during the encoding process may be modified to selectively encode a residual value corresponding to the region of interest.
  • the modifying may be provided by an enhancement decoder or may be received from a remote or external device or system.
  • the method further comprises: outputting and/or transmitting the encoded video signal.
  • a multi-resolution camera configured to output two video streams decodable to derive three video streams at three different resolutions, wherein the multi-resolution camera is configured to output a first encoded enhancement stream comprising an enhancement stream at a high resolution and a base stream at a medium resolution, and a second stream at a low resolution.
  • an SoC for a camera, the SoC being configured to receive a plurality of input video streams and output a plurality of encoded output video streams, preferably at the same resolution.
  • the SoC may comprise a base encoder for use in encoding a plurality of the input streams and a plurality of enhancement encoders configured to provide the plurality of encoded output video streams.
  • the SoC may comprise a (e.g. single) base encoder configured to encode a plurality (e.g. two, three, four, and so forth) of (e.g. different) video streams at a first resolution (e.g. resolution X), optionally wherein the plurality equals N.
  • the SoC may comprise a plurality of (e.g. separate) enhancement encoders. Each enhancement encoder may be configured to encode a (e.g. single) video stream.
  • the SoC may comprise N enhancement encoders. In other words, the SoC may comprise a number of enhancement encoders, such that the number of enhancement encoders equals the number of video streams that the base encoder is configured to process (e.g. encode).
  • the video stream that each enhancement encoder processes may correspond to a rendition of one of the plurality of video streams that the base encoder processes (e.g. is input into the base encoder or is output by the base encoder).
  • the SoC may comprise a (e.g. single) base encoder configured to process multiple video streams, and further configured to serve (e.g. distribute data to, be connected to, and so forth) multiple enhancement encoders.
  • according to a fifteenth aspect of the invention, there may be provided computer programs comprising respective instructions which, when executed by a respective computer, cause the computer to perform the method of the third, fifth, seventh, tenth, eleventh, and twelfth aspects respectively, and any examples thereof.
  • the computer programs of the fifteenth aspect may be stored as respective instructions on respective non-transitory computer-readable media or in respective signal transmissions.
  • a base codec may be understood to include a base encoder and a base decoder, which may or may not be independent of each other.
  • a base codec may alternatively be referred to as a base coding block.
  • the base codec may be provided as part of a general low complexity encoder, which may be configured to control an independent base encoder and decoder (e.g. as packaged as a base codec). In this way, the base encoder and decoder may be supplied as part of the low complexity encoder.
  • the low complexity encoder may be seen as a form of wrapper for the base codec, where the functionality of the base codec may be hidden from an entity implementing the low complexity encoder.
  • a base codec may be any other (e.g. non-enhancement) codec, such as AV1, AVC, HEVC, which typically operates at a lower resolution than the input signal, and the base codec may be provided together with one or more enhancement codecs as herein described.
  • an input signal may be received from or otherwise originate from such a device as a camera, security camera, automotive camera, body worn camera, wearable camera, multi-resolution camera, camera SoC, camcorder, sensor.
  • any method may further comprise a step of outputting the signal resulting from, provided by, or otherwise obtained according to that method.
  • any method may be performed at least at a camera, by an SoC, and/or by a camera SoC; and any method may be applied at least to operate a camera, an SoC, and/or a camera SoC.
  • Figure 1 illustrates an image including RoIs identified according to the present disclosure
  • Figure 2 illustrates an image including an RoI identified according to the present disclosure, and a base encoded version thereof
  • Figure 3 illustrates an image including an RoI at high resolution compared to the non-RoI portions of the image at low resolution
  • Figure 4 illustrates an image including an RoI identified according to the present disclosure, and a base encoded version thereof wherein only the RoI is enhanced according to the present disclosure
  • Figure 5 illustrates an image including an RoI identified according to the present disclosure, and a base encoded version thereof
  • Figure 6 illustrates an image including an RoI identified according to the present disclosure, and a base encoded version thereof wherein only the RoI is enhanced according to the present disclosure
  • Figure 7 illustrates an image including an RoI identified according to the present disclosure, and a base encoded version thereof wherein only the RoI is enhanced according to the present disclosure
  • Figure 8 illustrates an image including RoIs identified according to the present disclosure, and a base encoded version thereof wherein one RoI is enhanced according to the present disclosure
  • Figure 9 illustrates examples of per frame bitrate as a percentage of total bitrate in an example of the present disclosure
  • Figure 10 illustrates a camera SoC in the art and a camera SoC having an enhancement coding operation according to the present disclosure
  • Figure 11 illustrates an exemplary encoding process according to the present disclosure
  • Figure 12 illustrates an exemplary encoding process according to the present disclosure
  • Figure 13 illustrates an exemplary encoding process according to the present disclosure
  • Figure 14 illustrates an exemplary process of providing an indication of an RoI according to the present disclosure
  • Figure 15 illustrates an exemplary process of providing an indication of an RoI according to the present disclosure.
  • LCEVC encoders may provide particular savings in storage and streaming bitrate.
  • a camera is configured to output multiple streams, each of different resolutions. This may be referred to as a typical multi-resolution scenario and may for example output 3 streams from each camera.
  • the streams may be low quality, medium quality (HD) and high quality (UHD) depending on the camera and processing capabilities.
  • a single LCEVC stream can replace the top two qualities, by using the base layer for the medium quality (HD) and the enhanced stream for high quality (UHD), unlocking significant storage savings by removing a standalone UHD stream.
  • LCEVC may also provide enhanced compression, reducing bitrates especially in challenging/high motion cases (e.g., wearables).
  • by wearable we mean a body worn camera.
  • the motion of such cameras means that typical coding techniques of modern compression algorithms may be particularly ineffective since they achieve efficiency gains by predicting motion from one frame to the next.
  • LCEVC does not use inter-frame prediction and so does not suffer such issues.
  • it is proposed to integrate an LCEVC encoder into a body worn camera system comprising a camera, a processor and a video encoder SoC.
  • an LCEVC encoder could equally be integrated into any kind of security camera. Rate control in LCEVC allows optimal base vs. enhancement bitrate allocation, reducing bitrate spikes in peak-motion scenes by falling back to a lower resolution.
  • a rate control method may be adapted or calibrated to smooth bitrate spikes by allowing a base resolution to be used where a high-motion scene requires a large number of bits to encode and transmit.
  • LCEVC encoders may provide enhanced quality and more details for forensic analysis.
  • Region of Interest (RoI) enhancement encoding allows LCEVC residuals to be focused on areas of interest (e.g., faces, plates) for pristine quality reconstruction at viable bitrates.
  • a set of coordinates, and preferably a value parameter, are provided to an LCEVC encoder.
  • the LCEVC encoder may be configured to provide more residuals in the area provided by the coordinates.
  • LCEVC compression efficiency allows more details in high motion scenarios. This is a benefit of using LCEVC in security and automotive camera devices and systems.
  • the indication of an area may be provided by a base encoder to the enhancement encoder (e.g. LCEVC encoder).
  • the indication of an area may be provided by an RoI identification module to one or more of the enhancement encoder (e.g. LCEVC encoder) and the base encoder.
  • the RoI identification module may be a separate module to one or more of the enhancement encoder and the base encoder.
  • the enhancement encoder determines an indicated area by analysing the quantisation settings of the base. For example, if a particular area/portion of the base layer has a small quantisation step width, then this may be determined by the enhancement encoder to be an RoI because the base layer is of a higher quality at this region.
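  • by way of illustration only, the following Python sketch shows one way an area might be inferred from base quantisation settings. The per-block step width grid, the threshold, the block size and the function names are all assumptions made for this example and are not taken from the disclosure.

```python
def roi_blocks_from_base_steps(step_widths, threshold):
    """Return (row, col) indices of blocks whose base step width is below threshold."""
    return [
        (r, c)
        for r, row in enumerate(step_widths)
        for c, sw in enumerate(row)
        if sw < threshold
    ]


def bounding_box(blocks, block_size):
    """Convert block indices to a pixel box (x0, y0, x1, y1) with exclusive ends."""
    if not blocks:
        return None
    rows = [r for r, _ in blocks]
    cols = [c for _, c in blocks]
    return (
        min(cols) * block_size,
        min(rows) * block_size,
        (max(cols) + 1) * block_size,
        (max(rows) + 1) * block_size,
    )


# Hypothetical step widths for a 4 x 4 grid of base layer blocks.
base_step_widths = [
    [32, 32, 32, 32],
    [32, 8, 8, 32],
    [32, 8, 8, 32],
    [32, 32, 32, 32],
]

blocks = roi_blocks_from_base_steps(base_step_widths, threshold=16)
box = bounding_box(blocks, block_size=16)
print(blocks, box)
```

  • in this sketch the finely quantised centre blocks are treated as the RoI; a practical encoder would likely need a more robust criterion than a single fixed threshold.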
  • LCEVC encoding tools can uniquely leverage Analytics at camera level to optimise quality, for instance by encoding the RoI enhancement based on detected RoI coordinates and prioritising residuals on known high priority areas.
  • LCEVC encoding tools may be integrated with, and are amenable to, AI image processing. Higher quality details, such as in RoIs, improve AI recognition accuracy.
  • the native dual-layer format allows different AI searches at the most appropriate quality level, resulting in higher efficiency and accuracy (e.g., object detection on base layer, object recognition at full resolution, partial decoding only on RoI).
  • a first object analysis task may be performed on a first version of an image at a first level of quality. Then, a second object analysis task may be performed on a second version of the image at a second level of quality, the second level of quality being higher than the first level of quality.
  • the object analysis tasks can be categorised into two broad categories.
  • a first type of object analysis may be object detection, which detects the presence of an object in an image. The position of the object in the image may also be localised.
  • object detection relates to finding one or more instances of one or more objects of one or more particular classes and localizing the one or more objects within the representation.
  • Object detection may therefore relate to detecting all objects belonging to certain classes for which the object analysis element has been trained, as well as localizing them within the image.
  • An object analysis element, for example, may have been trained to detect human faces, vehicles, signs, and animals. If such an object analysis element 370 detects one or more such objects, the specific location of each such detected object is returned, for example via a bounding box.
  • a bounding box may be provided in relation to each human face and each vehicle detected in an image.
  • a result of such object detection may be a level of confidence that one or more objects have been detected and localized.
  • a second type of object analysis task is object recognition.
  • Object recognition relates to identifying an object that has been detected.
  • object recognition may relate to determining a class label, such as “car”, “lorry”, “motorbike” etc, to which a detected vehicle belongs.
  • object recognition may relate to recognising the identity of the particular person whose face has been detected.
  • a result of such object recognition may, for example, be a level of confidence that a given detected vehicle is a car.
  • a result of such object recognition may correspond to an 80% level of confidence that a car has been recognised and a 20% level of confidence that a car has not been recognised.
  • LCEVC provides a >2x performance density gain over AV1 alone (and similarly over AVC/HEVC chipsets), thanks to the base encoder only needing to support 1/4 resolution video.
  • the full resolution LCEVC enhancement stream could be coded only when needed, e.g. in response to a detection (at the low resolution layer) of a particular object.
  • LCEVC has a “Priority Map” feature that highlights priority levels within each frame, guiding decisions on where to allocate the enhancement first. It is proposed to calibrate the LCEVC Priority Map according to the needs of the use case, selectively prioritising high-quality reconstruction in important areas (e.g., faces).
  • Figure 1 illustrates an example of an image, highlighting areas (i.e., regions) of interest of a priority map.
  • a priority map is a map configured to provide weights (or a residual mask) to a set of residuals, guiding decisions on which area or areas of a frame should be allocated residuals to provide enhancement.
  • the residuals are modified (i.e. selected for encoding or quantized to 0) based on the indications in the map corresponding to the residuals.
  • the area or areas of the frame to be allocated residuals could correspond to areas where an object of a particular class has been localised.
  • residual data corresponding to objects belonging to different classes can be modified in different ways according to their class. For example, residuals relating to a detected sign could be modified differently to residuals relating to a detected vehicle. In other examples, residuals relating to objects in a particular class may not be modified at all, for example where greater fidelity is needed.
  • Figure 2 illustrates an image with a region of interest highlighted, showing the image encoded using x264 and LCEVC encoding of an x264 base, without residuals being encoded.
  • Figure 3 illustrates a region of interest at 2160p resolution compared to a base layer image at 1080p resolution.
  • the LCEVC format allows flexibility on where to place the enhancement and the extent to which part of an image is to be enhanced. Residuals can be placed exclusively in areas (i.e., regions) of interest (RoIs) for forensic purposes (e.g., faces, plates, text, etc) which are targeted for high quality reconstruction, while non-RoI portions of the frame remain with base layer quality.
  • the enhancement may be placed by modifying the residuals of the enhancement layers or selectively encoding them, for example, killing or not encoding the non-selected residuals. This may be thought of as ranking, filtering, or weighting the residuals.
  • the modification may be performed by modifying (or selectively modifying) a step width of a quantisation process, for example, to increase a dead zone which quantises the residuals to 0.
  • examples of such modifying may be found in WO2023/187308, GB2313070.1 and WO2020/188229, the contents of which are incorporated by reference.
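  • as a minimal sketch of the dead zone approach described above, the following Python example quantises residuals with a small dead zone inside the RoI and a large dead zone elsewhere, so that non-selected residuals are quantised to 0. The function names, the binary priority map and all numeric values are assumptions chosen for illustration; they do not reproduce the LCEVC quantiser.

```python
def quantise(residual, step_width, dead_zone):
    """Quantise one residual; magnitudes inside the dead zone become 0."""
    magnitude = abs(residual)
    if magnitude < dead_zone:
        return 0
    level = magnitude // step_width
    return level if residual >= 0 else -level


def quantise_with_priority(residuals, priority, step_width, roi_dead_zone, non_roi_dead_zone):
    """Apply a per-residual dead zone selected by the priority map weight."""
    out = []
    for res_row, weight_row in zip(residuals, priority):
        out.append([
            quantise(res, step_width, roi_dead_zone if weight > 0 else non_roi_dead_zone)
            for res, weight in zip(res_row, weight_row)
        ])
    return out


# Hypothetical residual surface and priority map (1 = inside the RoI).
residuals = [[5, -7, 3], [12, -2, 9]]
priority = [[0, 1, 1], [0, 0, 1]]

quantised = quantise_with_priority(
    residuals, priority, step_width=4, roi_dead_zone=2, non_roi_dead_zone=16
)
print(quantised)
```

  • here every residual outside the RoI falls inside the enlarged dead zone and is therefore not encoded, while residuals inside the RoI are quantised with the unmodified step width.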
  • AI-detected RoI coordinates can be fed to an LCEVC encoder to adjust areas of focus on a per-frame basis.
  • RoIs may be determined in accordance with a result of an object analysis, such as an object detection or object recognition task.
  • a region of interest may correspond to an area of a frame where an object has been detected or recognised.
  • the coordinates may be fed by an API to the enhancement coding operation or via the base stream where they are passed to the enhancement coding operation or extracted from it.
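  • the following Python sketch illustrates, under stated assumptions, how a set of coordinates and an indication of value might be turned into a per-pixel priority map. The rectangle format (x0, y0, x1, y1) with exclusive ends, the use of the highest value where regions overlap, and the function name are hypothetical choices made for this example.

```python
def build_priority_map(width, height, rois):
    """Build a weight map from RoIs given as ((x0, y0, x1, y1), value) pairs.

    Rectangles are clipped to the frame; overlapping RoIs keep the highest value.
    """
    pmap = [[0.0] * width for _ in range(height)]
    for (x0, y0, x1, y1), value in rois:
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                pmap[y][x] = max(pmap[y][x], value)
    return pmap


# Hypothetical per-frame RoIs, e.g. as received via an API.
rois = [((1, 1, 3, 3), 1.0), ((2, 2, 8, 4), 0.5)]
pmap = build_priority_map(width=6, height=4, rois=rois)

covered = sum(1 for row in pmap for weight in row if weight > 0)
print(covered, "of", 6 * 4, "positions carry a non-zero weight")
```

  • because the map is rebuilt from the supplied coordinates, it can be regenerated for every frame as the detected regions move.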
  • the bitrate cost is affordable, as RoIs often represent a small portion of the frame (1/100th as in the example), e.g., within 1-3 Mbps.
  • Figures 4-8 illustrate examples of region of interest encoding using different encoders and parameters.
  • Region of interest coding examples are provided in PCT/GB2023/052755, WO2018/015764 and WO2020/165575, which are incorporated by reference.
  • LCEVC rate control allows automatic per-frame base vs. enhancement bitrate adaptation. In peak complexity scenes, it would allocate the majority of the bitrate to the base layer, turning to a more pleasant lower resolution image and massively reducing bitrate spikes. In normal/low motion scenarios the base would require less bitrate for a good quality, hence more bitrate can be allocated to the enhancement.
  • Figure 9 illustrates examples of per frame bitrate as a percentage of total bitrate in a gaming example.
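  • purely as a toy illustration of per-frame base vs. enhancement adaptation, the Python sketch below gives the base layer the bitrate it demands up to a cap and assigns the remainder to the enhancement. The heuristic, the 10% enhancement floor, the function name and the figures are assumptions for this example and do not describe the actual LCEVC rate control.

```python
def split_bitrate(total_kbps, base_demand_kbps, min_enh_percent=10):
    """Split a frame budget between base and enhancement layers.

    The base receives what it demands, capped so that the enhancement
    always keeps at least min_enh_percent of the total budget.
    """
    max_base = total_kbps * (100 - min_enh_percent) // 100
    base = min(base_demand_kbps, max_base)
    return base, total_kbps - base


# Hypothetical budgets: a low motion frame and a peak complexity frame.
low_motion = split_bitrate(total_kbps=4000, base_demand_kbps=1500)
peak_motion = split_bitrate(total_kbps=4000, base_demand_kbps=6000)
print(low_motion, peak_motion)
```

  • in the low motion case most of the budget goes to the enhancement, whereas in the peak complexity case the majority goes to the base layer while the total stays constant.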
  • an SoC for a Camera configured to receive four input video streams and output 4 encoded output video streams, preferably at the same resolution.
  • the SoC may comprise one base encoder, e.g. AV1 and a plurality of LCEVC encoders, e.g. 3.
  • the encoders may be configured such that the base encoder encodes a reduced resolution of the input video (e.g. ) and the LCEVC encoders are configured to add enhancement to that base encoded video to create the encoded output video streams at the same resolution as the input video streams.
  • since the base encoder is operating at only a quarter of the resolution, it may be possible to output multiple streams (e.g. 4) at the resolution using only one base encoder. For example, if the base encoder has a throughput of X bits/s, then without utilising an enhancement stream, the base encoder could output a single stream (of X bits/s). However, if the base encoder is operating at only a quarter of the resolution, then the base encoder will, generally speaking, take a quarter of the time to process a stream. This means that the base encoder can process (e.g. encode) a first stream for a first portion of time, a second stream for a second portion of time, a third stream for a third portion of time, and a fourth stream for a fourth portion of time, where each portion of time is a quarter of the time the base encoder would have spent encoding the full resolution base.
  • the base encoder may utilise a buffer to achieve this. For example, the ‘raw’ data corresponding to the second stream may be stored in a buffer during the first portion of time; then, when the first stream has been encoded, the base encoder can retrieve the buffered second stream and begin processing it. In this way, a base encoder can switch between encoding different streams, provided that the base encoder can encode at a quicker rate than information is put into the buffer.
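  • the time-sharing argument above can be sketched as follows, assuming (as the text does, generally speaking) that base encoding time scales with pixel count. The function names, the 30 frames per second period and the factor-of-2 down-sampling per axis are assumptions made for this example.

```python
from fractions import Fraction


def streams_supported(scale_per_axis):
    """Streams one base encoder can serve if each costs 1/scale^2 of a full frame."""
    return scale_per_axis * scale_per_axis


def round_robin_schedule(num_streams, frame_period, scale_per_axis):
    """Return (stream index, start time, end time) slots within one frame period."""
    slot = Fraction(frame_period) / (scale_per_axis ** 2)
    return [(s, s * slot, (s + 1) * slot) for s in range(num_streams)]


# Hypothetical case: base at half width and half height, 30 frames per second.
capacity = streams_supported(scale_per_axis=2)
schedule = round_robin_schedule(capacity, Fraction(1, 30), scale_per_axis=2)

for stream, start, end in schedule:
    print(f"stream {stream}: {start} s to {end} s")
```

  • the four slots exactly fill one frame period, which is why frames of the waiting streams need to be buffered until their slot begins.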
  • LCEVC provides a ~2.3x performance density gain over AV1 alone. This is based on the base encoder only needing to support 1/4 resolution video. In a worked example:
  • a method for providing an encoded video signal comprising: analysing a received input video signal to identify a region of interest; modifying an enhancement coding operation to prioritise enhancement of a base video signal at the region of interest; and, encoding the received input signal using the enhancement coding operation to generate the encoded video signal.
  • a method for providing an encoded video signal comprising: analysing a received input video signal to identify a region of interest; outputting a set of coordinates corresponding to a region of interest, the region of interest being for encoding enhancement data in that region.
  • a method for providing an encoded video signal comprising: receiving a set of coordinates corresponding to a region of interest; modifying an enhancement coding operation to prioritise enhancement of a base video signal at the region of interest; and, encoding the received input signal using the enhancement coding operation to generate the encoded video signal.
  • Modifying may comprise adjusting a priority map, the priority map being a set of residual weights (or a residual mask) corresponding to a residual or group of residuals.
  • the weights indicate to the encoder to prioritise the encoding of residuals according to the weights.
  • the residuals may be filtered according to the weights. Alternatively, the residuals may be quantised differently according to the weights of the residuals.
  • a step width applied during the encoding process may be modified to selectively encode a residual value corresponding to the region of interest.
  • the input signal may be received from a camera, security camera, automotive camera or body worn camera.
  • the method may comprise outputting and/or transmitting the encoded video signal.
  • the method may comprise receiving an indication of value of the region of interest and modifying the enhancement coding operation based on the region of interest and the indication of value.
  • the set of coordinates and/or indication of value may be received by an API or inserted into the base stream for extraction from the base stream by the enhancement encoding operation.
  • the analysing a received input video signal to identify a region of interest may comprise applying a machine learning and/or computer vision model to the input video signal.
  • the analysing and modification may be provided by an enhancement decoder or may be received from a remote or external device or system.
  • a multi-resolution camera configured to output two video streams decodable to derive three video streams at three different resolutions.
  • the multi-resolution camera is configured to output a first encoded enhancement stream comprising an enhancement stream at a high resolution and a base stream at a medium resolution, and a second stream at a low resolution.
  • an SoC for a Camera configured to receive a plurality of input video streams and output a plurality of encoded output video streams, preferably at the same resolution.
  • the SoC may comprise a base encoder for use in encoding a plurality of the input streams and a plurality of enhancement encoders configured to provide the plurality of encoded output video streams.
  • the system on a chip may comprise a (e.g. single) base encoder configured to encode a plurality (e.g. two, three, four, and so forth) of (e.g. different) video streams at a first resolution (e.g. resolution X), optionally wherein the plurality equals N.
  • the system on a chip may comprise a plurality of (e.g. separate) enhancement encoders. Each enhancement encoder may be configured to encode a (e.g. single) video stream.
  • the SoC may comprise N enhancement encoders.
  • the SoC may comprise a number of enhancement encoders, such that the number of enhancement encoders equals the number of video streams that the base encoder is configured to process (e.g. encode).
  • the video stream that each enhancement encoder processes may correspond to a rendition of one of the plurality of video streams that the base encoder processes (e.g. is input into the base encoder or is output by the base encoder).
  • the SoC may comprise a (e.g. single) base encoder configured to process multiple video streams, and further configured to serve (e.g. distribute data to, be connected to, and so forth) multiple enhancement encoders.
  • Each of these methods may be performed at a camera or on a camera SoC.
  • Figure 11 shows an example of applying residual processing to select, de-select or otherwise modify residuals based on a region of interest.
  • An input full resolution video 100 is processed to generate various encoded streams 101, 102, 103.
  • A first encoded stream (encoded base stream) is produced by feeding a base codec (e.g., AVC, HEVC, or any other codec) with a down-sampled version of the input video.
  • the encoded base stream may be referred to as the base layer or base level.
  • a second encoded stream (encoded level 1 stream) is produced by processing the residuals obtained by taking the difference between a reconstructed base codec video and the down-sampled version of the input video.
  • a third encoded stream (encoded level 2 stream) is produced by processing the residuals obtained by taking the difference between an up-sampled version of a corrected version of the reconstructed base coded video and the input video.
  • the components of may provide a general low complexity encoder.
  • the enhancement streams may be generated by encoding processes that form part of the low complexity encoder and the low complexity encoder may be configured to control an independent base encoder and decoder (e.g. as packaged as a base codec).
  • the base encoder and decoder may be supplied as part of the low complexity encoder.
  • the low complexity encoder may be seen as a form of wrapper for the base codec, where the functionality of the base codec may be hidden from an entity implementing the low complexity encoder.
  • a down-sampling operation illustrated by downsampling component 105 may be applied to the input video to produce a down-sampled video to be encoded by a base encoder 113 of a base codec.
  • the down-sampling can be done either in both vertical and horizontal directions, or alternatively only in the horizontal direction.
  • the base encoder 113 and a base decoder 114 may be implemented by a base codec (e.g. as different functions of a common codec).
  • the base codec, and/or one or more of the base encoder 113 and the base decoder 114 may comprise suitably configured electronic circuitry (e.g. a hardware encoder/decoder) and/or computer program code that is executed by a processor.
  • Each enhancement stream encoding process may not necessarily include an up-sampling step.
  • the first enhancement stream is conceptually a correction stream while the second enhancement stream is up-sampled to provide a level of enhancement.
  • the encoded base stream is decoded by the base decoder 114 (i.e. a decoding operation is applied to the encoded base stream to generate a decoded base stream).
  • Decoding may be performed by a decoding function or mode of a base codec.
  • the difference between the decoded base stream and the down-sampled input video is then created at a level 1 comparator 110 (i.e. a subtraction operation is applied to the down-sampled input video and the decoded base stream to generate a first set of residuals).
  • the output of the comparator 110 may be referred to as a first set of residuals, e.g. a surface or frame of residual data, where a residual value is determined for each picture element at the resolution of the base encoder 113, the base decoder 114 and the output of the downsampling block 105.
  • the difference is then encoded by a first encoder 115 (i.e. a level 1 encoder) to generate the encoded Level 1 stream 102 (i.e. an encoding operation is applied to the first set of residuals to generate a first enhancement stream).
  • the enhancement stream may comprise a first level of enhancement 102 and a second level of enhancement 103.
  • the first level of enhancement 102 may be considered to be a corrected stream, e.g. a stream that provides a level of correction to the base encoded/decoded video signal at a lower resolution than the input video 100.
  • the second level of enhancement 103 may be considered to be a further level of enhancement that converts the corrected stream to the original input video 100, e.g. that applies a level of enhancement or correction to a signal that is reconstructed from the corrected stream.
  • the second level of enhancement 103 is created by encoding a further set of residuals.
  • the further set of residuals are generated by a level 2 comparator 119.
  • the level 2 comparator 119 determines a difference between an up-sampled version of a decoded level 1 stream, e.g. the output of an upsampling component 117, and the input video 100.
  • the input to the upsampling component 117 is generated by applying a first decoder (i.e. a level 1 decoder) to the output of the first encoder 115. This generates a decoded set of level 1 residuals. These are then combined with the output of the base decoder 114 at summation component 120.
  • the output of summation component 120 may be seen as a simulated signal that represents an output of applying level 1 processing to the encoded base stream 101 and the encoded level 1 stream 102 at a decoder.
  • an up-sampled stream is compared to the input video which creates a further set of residuals (i.e. a difference operation is applied to the up-sampled recreated stream to generate a further set of residuals).
  • the further set of residuals are then encoded by a second encoder 121 (i.e. a level 2 encoder) as the encoded Level 2 enhancement stream (i.e. an encoding operation is then applied to the further set of residuals to generate an encoded further enhancement stream).
  • the output of the encoding process is a base stream 101 and one or more enhancement streams 102, 103 which preferably comprise a first level of enhancement and a further level of enhancement.
  • the three streams 101, 102 and 103 may be combined, with or without additional information such as control headers, to generate a combined stream for the video encoding framework that represents the input video 100.
  • the components shown in Figure 1 may operate on blocks or coding units of data, e.g. corresponding to 2x2 or 4x4 portions of a frame at a particular level of resolution.
  • the components operate without any inter-block dependencies, hence they may be applied in parallel to multiple blocks or coding units within a frame. This differs from comparative video encoding schemes wherein there are dependencies between blocks (e.g. either spatial dependencies or temporal dependencies).
  • the dependencies of comparative video encoding schemes limit the level of parallelism and require a much higher complexity.
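The encoding flow described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the disclosed implementation: the base codec is replaced by a hypothetical coarse quantiser (`base_round_trip`), down-sampling is a 2x2 average, up-sampling is nearest-neighbour, and the level 1 and level 2 encoders are treated as lossless so that the residual arithmetic is easy to follow. All function names are illustrative.

```python
from typing import List, Tuple

Frame = List[List[int]]


def downsample(frame: Frame) -> Frame:
    """Halve resolution in both directions by averaging each 2x2 block."""
    return [
        [
            (frame[y][x] + frame[y][x + 1] + frame[y + 1][x] + frame[y + 1][x + 1]) // 4
            for x in range(0, len(frame[0]), 2)
        ]
        for y in range(0, len(frame), 2)
    ]


def upsample(frame: Frame) -> Frame:
    """Double resolution in both directions by nearest-neighbour repetition."""
    out: Frame = []
    for row in frame:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def base_round_trip(frame: Frame, step: int = 16) -> Frame:
    """Hypothetical stand-in for base encoder 113 followed by base decoder 114."""
    return [[(value // step) * step for value in row] for row in frame]


def subtract(a: Frame, b: Frame) -> Frame:
    """Element-wise a - b."""
    return [[x - y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def add(a: Frame, b: Frame) -> Frame:
    """Element-wise a + b."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def encode(input_frame: Frame) -> Tuple[Frame, Frame, Frame]:
    """Return (decoded base, level 1 residuals, level 2 residuals)."""
    down = downsample(input_frame)             # downsampling component 105
    decoded_base = base_round_trip(down)       # base encoder 113 + base decoder 114
    level1 = subtract(down, decoded_base)      # level 1 comparator 110
    corrected = add(decoded_base, level1)      # summation component 120
    up = upsample(corrected)                   # upsampling component 117
    level2 = subtract(input_frame, up)         # level 2 comparator 119
    return decoded_base, level1, level2


def reconstruct(decoded_base: Frame, level1: Frame, level2: Frame) -> Frame:
    """Decoder-side recombination of the three streams."""
    return add(upsample(add(decoded_base, level1)), level2)
```

Because the residual encoders are assumed lossless here, `reconstruct(*encode(frame))` returns the input frame exactly; in a real system the transform, quantisation and entropy coding stages would make each enhancement level lossy to a controllable degree.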
  • Figure 11 illustrates a residual selection block 140.
  • Residuals are processed (i.e. modified and/or ranked and selected) in order to determine which residuals should be transformed and encoded, i.e. which residuals are to be processed by the first and/or second encoders 115 and 121. Preferably this processing is performed prior to entropy encoding.
  • Residual selection 140 is an optional step that may configure or activate processing or modification of residuals i.e. residual processing is performed according to a selected mode.
  • the “residual mode” may correspond to a residual pre-processing mode, wherein residuals for enhancement layers are pre-processed prior to encoding. This mode may be turned on and off depending on requirements.
  • the residual mode may be configured via one or more control headers or fields.
  • the residuals may always be modified (i.e. pre-processed) and so selection of a mode is not required.
  • residual pre-processing may be hard-coded. Examples of residuals processing will be described in detail below.
  • the residual mode if selected, may act to filter residuals within one or more of the level 1 and level 2 encoding operations, preferably at a stage prior to the encoding sub-components. As shown and described above, the residual processing may be indicated or embedded in the base stream or passed by an API to the enhancement encoding process.
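As an illustration of residual selection, the sketch below ranks residuals by magnitude and keeps only the largest ones when the residual mode is switched on. The ranking criterion, the `keep` parameter and the function name are assumptions made for this example; the disclosure only states that residuals may be modified and/or ranked and selected prior to encoding.

```python
from typing import List


def select_residuals(residuals: List[int], residual_mode: bool, keep: int) -> List[int]:
    """Zero all but the `keep` largest-magnitude residuals when the mode is on."""
    if not residual_mode:
        # Mode off: residuals pass through to the encoder unchanged.
        return list(residuals)
    # Rank positions by absolute residual value, largest first.
    ranked = sorted(range(len(residuals)), key=lambda i: abs(residuals[i]), reverse=True)
    kept = set(ranked[:keep])
    return [value if index in kept else 0 for index, value in enumerate(residuals)]
```

With `residual_mode=False` the function is a pass-through, which mirrors the description of the mode being turned on and off depending on requirements.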
  • Figure 12 is a flow diagram illustrating an exemplary encoding scheme according to the present disclosure.
  • the scheme comprises receiving input data at the input 201, such as image or video data from a camera or other sensor data from another sensor.
  • the input data may be in the form of ‘raw’ or unencoded data.
  • the input data may be in the form of base coded data.
  • the scheme further comprises running the input data through an object analysis module 202 to identify one or more objects of interest in the input data.
  • the object analysis module 202 may comprise or otherwise employ an AI object analysis model to perform this identification.
  • Analysing the input data to identify one or more objects of interest in the input data may include analysing the input data to detect the presence of an object and/or to localise an object within the input data, wherein in some examples the object may be an object belonging to a particular object class. Analysing the input data as such may further include recognising the detected and/or localised object as being an object belonging to that particular class, which in some examples may include determining a confidence level that an object detected in the input data is an object of said class. Once one or more objects are identified (i.e., at least localised or otherwise detected), the scheme further comprises determining one or more RoIs corresponding to the one or more objects.
  • the step of determining the RoI may be performed by an RoI identification module 203 as described above.
  • the RoI may correspond to the area in the input data (or the areas in the frames, scenes, streams etc. of the input data) corresponding to where the object is identified.
  • the scheme involves utilising the RoI(s) to influence the enhancement encoding process and, in particular, to influence the encoding decisions (such as residual encoding decisions) made by, for example, one or more enhancement encoders, a quantisation process, or a residual selection module.
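The step from object analysis to RoI determination might look like the following sketch. It assumes that the object analysis model emits detections with a class label, a confidence level and a bounding box; the `Detection` and `RoI` types, the confidence threshold and the optional `margin` are all hypothetical and are introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Detection:
    label: str         # object class reported by the analysis model
    confidence: float  # confidence that the object belongs to that class
    x: int
    y: int
    width: int
    height: int


@dataclass(frozen=True)
class RoI:
    x: int
    y: int
    width: int
    height: int


def determine_rois(
    detections: List[Detection],
    frame_width: int,
    frame_height: int,
    wanted_label: str,
    min_confidence: float = 0.5,
    margin: int = 0,
) -> List[RoI]:
    """Keep confident detections of the wanted class and turn them into RoIs."""
    rois: List[RoI] = []
    for det in detections:
        if det.label != wanted_label or det.confidence < min_confidence:
            continue
        # Grow the box by the margin, then clip it to the frame boundaries.
        left = max(0, det.x - margin)
        top = max(0, det.y - margin)
        right = min(frame_width, det.x + det.width + margin)
        bottom = min(frame_height, det.y + det.height + margin)
        rois.append(RoI(left, top, right - left, bottom - top))
    return rois
```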
  • the encoding scheme may further comprise a pre-analysis step performed by a pre-analysis module 204.
  • Pre-analysis may involve conducting a preliminary analysis on the input data (or the base coded data) prior to (e.g. enhancement) encoding in order to generate header data that can be integrated with the encoded streams.
  • the header data may be usable to modify or control enhancement by the enhancement encoder.
  • the header data may be usable to assist a decoder which subsequently decodes the output data.
  • the header data may comprise encoder configuration parameters including one or more of: resolution settings, quantisation parameters, temporal and/or spatial prediction parameters, bitrate control settings, an indicator indicating a type of codec.
  • the pre-analysis module may operate locally with the encoder or remotely (e.g. via a network).
  • Such parameters may be applicable at different levels of granularity, i.e. per block (e.g. coding unit), per frame, or per group of frames. Further examples of pre-analysis are provided in patent publication WO 2023/187308, the contents of which are incorporated here by reference.
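One way to picture parameters that apply at different levels of granularity is a layered lookup, sketched below. The rule that the most specific scope wins (block over frame over group of frames), the `EncoderConfig` type and the parameter name `quant_step` are assumptions for this example and are not taken from the disclosure.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class EncoderConfig:
    # Parameters that apply to the whole group of frames.
    per_group: Dict[str, int] = field(default_factory=dict)
    # Parameters keyed by frame index.
    per_frame: Dict[int, Dict[str, int]] = field(default_factory=dict)
    # Parameters keyed by (frame index, block index).
    per_block: Dict[Tuple[int, int], Dict[str, int]] = field(default_factory=dict)

    def resolve(self, name: str, frame: int, block: int) -> Optional[int]:
        """Return the most specific value for `name`, or None if it is unset."""
        scopes = (
            self.per_block.get((frame, block), {}),
            self.per_frame.get(frame, {}),
            self.per_group,
        )
        for scope in scopes:
            if name in scope:
                return scope[name]
        return None
```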
  • the scheme comprises determining, by a priority mapping module 205, one or more priority maps according to the RoI(s) detected in the input data. Determining a priority map may optionally comprise determining the priority map based on the detected RoI(s) and one or more parameters provided by the pre-analysis module 204.
  • the priority map(s), or the metadata signalling the priority map(s) may be comprised in the encoder configuration parameters, optionally together with the encoder configuration parameters resulting from the optional pre-analysis step.
  • Each priority map may comprise a residual mask, which in turn may comprise weightings to be applied to residuals at varying granularities (e.g.
  • High priority data may be indicated in the priority map by being assigned a priority (and an associated weighting) above a certain threshold.
  • High priority data may correspond to data in an RoI.
  • Low priority data may be indicated in the map by being assigned a priority (and an associated weighting) below a certain threshold.
  • Low priority data may correspond to data in regions that are not determined to be an RoI.
  • High priority data may be prioritised for enhancement encoding over low priority data.
  • the RoI data may be enhancement encoded, with residuals corresponding to the RoI being generated and thereafter used in temporal prediction, transformed, quantised, entropy coded, and transmitted.
  • the non-RoI data may similarly be enhancement encoded, with the corresponding residuals being generated but thereafter discarded or quantised to zero prior to transmission.
  • the non-RoI data may not be enhancement encoded and, optionally, may be base encoded only. In this way, the priority map indicates which residuals (i.e. the residuals corresponding to the RoI) are more important to encode, and thereby permits adjustment of the enhancement encoding for visually significant areas.
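The priority map and its effect on non-RoI residuals can be sketched as follows. The example assumes a block-granular map with one weighting per block, a weighting of 1.0 inside an RoI and 0.0 elsewhere, and a threshold of 0.5 separating high from low priority; these values, the `RoI` type and the function names are illustrative only.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RoI:
    x: int
    y: int
    width: int
    height: int


def build_priority_map(
    frame_width: int,
    frame_height: int,
    rois: List[RoI],
    block: int = 2,
    roi_weight: float = 1.0,
    background_weight: float = 0.0,
) -> List[List[float]]:
    """Return one weighting per block; blocks touched by an RoI get roi_weight."""
    cols = (frame_width + block - 1) // block
    rows = (frame_height + block - 1) // block
    priority = [[background_weight] * cols for _ in range(rows)]
    for roi in rois:
        first_col = roi.x // block
        last_col = min(cols, (roi.x + roi.width + block - 1) // block)
        first_row = roi.y // block
        last_row = min(rows, (roi.y + roi.height + block - 1) // block)
        for by in range(first_row, last_row):
            for bx in range(first_col, last_col):
                priority[by][bx] = roi_weight
    return priority


def apply_priority_map(
    residuals: List[List[int]],
    priority: List[List[float]],
    block: int = 2,
    threshold: float = 0.5,
) -> List[List[int]]:
    """Keep residuals in high priority blocks; set the rest to zero."""
    return [
        [
            value if priority[y // block][x // block] >= threshold else 0
            for x, value in enumerate(row)
        ]
        for y, row in enumerate(residuals)
    ]
```

Zeroing low priority residuals corresponds to the option in which non-RoI residuals are generated but discarded or quantised to zero prior to transmission.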
  • Figure 13 is a flow diagram illustrating an exemplary encoding scheme according to the present disclosure.
  • the scheme comprises inputting data supplied by a camera or by a sensor at an input 301.
  • the input data may comprise image or video data or other sensor data, and it may be in the form of ‘raw’, unencoded data or in the form of base coded data.
  • the scheme further comprises a prioritisation step performed by a prioritisation module 302.
  • the prioritisation module 302 may be separate to the modules of the encoding operation, for example prioritisation module 302 may be implemented externally and/or by an API.
  • the purpose of the prioritisation step is to determine portions of the input data which are to be prioritised for enhancement encoding over other portions of the input data, to thereby influence the priority map created by the priority mapping module 303.
  • the prioritisation step may include a step of performing object analysis as described above and/or a step of performing RoI detection as described above. Where object analysis is performed on the input data, the resulting object(s) localised in the input data by an object analysis module of the module 302 may be used directly to inform the creation of the priority map.
  • the prioritisation module 302 may configure in-stream metadata (such as by assigning one or more value parameters) according to a particular portion of a frame of the input data (such as a block, coding unit, or other subdivision of the frame) where one or more objects are detected and/or recognised.
  • the object analysis module may incorporate an AI object analysis model and, optionally, an ML and/or CV model thereof. Where object analysis is performed on the input data, the resulting indication of an object localised in the input data may be input to the RoI detection module of the prioritisation module 302 to determine an RoI according to the methods described above. Likewise, RoI detection may be performed on the input data by an RoI detection module of the prioritisation module 302 without necessarily performing object analysis beforehand.
  • the RoI detection module may be supplied with one or more sets of coordinates in the input data in which there is expected to be one or more objects of interest, or in which it is already known that there are objects of interest, without the scheme necessarily requiring additional real-time object analysis.
  • the RoI detection module may determine one or more RoIs which it can thereafter provide to the priority mapping module 303.
  • the resulting prioritisation information or prioritisation metadata provided by the object analysis and/or Rol detection is usable by the priority mapping module 303 to determine the relative priorities of different portions of the input data.
  • the priority mapping module 303 may proceed to determine a priority map, such as by configuring a residual mask to be applied during a subsequent residual quantisation process.
  • the weights of the residual mask may be comprised in the encoder configuration data, optionally together with other prioritisation metadata provided by the prioritisation module 302 and which may be embedded in the input stream(s) to the enhancement coding operation.
  • the priority map may indicate that enhancement encoding is to be prioritised in order to encode more (e.g. LCEVC) residuals in the prioritised portions of the input video and thereby encode more detail for said portions.
  • the priority map may indicate that enhancement encoding is to be de-prioritised for input data having a lower priority in order to encode fewer (e.g. LCEVC) residuals and therefore less detail for such portions.
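A residual mask applied during quantisation could behave as in the sketch below. It assumes a simple scalar quantiser in which each residual is scaled by its mask weight before division by the step width, with truncation toward zero acting as a dead zone; the formula, the default step and the function name are assumptions for illustration and do not describe the LCEVC quantiser.

```python
from typing import List


def quantise_with_mask(residuals: List[int], weights: List[float], step: int = 8) -> List[int]:
    """Quantise residuals after scaling each one by its priority weight."""
    quantised: List[int] = []
    for value, weight in zip(residuals, weights):
        # int() truncates toward zero, so small weighted residuals vanish.
        quantised.append(int(value * weight / step))
    return quantised
```

A larger weight lets a small residual survive quantisation (more detail in prioritised portions), while a smaller weight pushes residuals of the same magnitude to zero (fewer residuals in de-prioritised portions).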
  • Figure 14 provides an example process of providing an indication of an RoI to an enhancement encoder, such as an LCEVC encoder.
  • the process comprises identifying, by an analysis module, an RoI, as described above. Identifying the RoI may further comprise determining an indication of the RoI usable (such as by an encoder or a CPU module) to modify an enhancement encoding operation.
  • An indication of an RoI may include a set of coordinates (e.g. coordinates in or otherwise according to the frame of the input video) corresponding to the RoI, or an indication of an RoI may include an indication of value (or ‘value parameter’), as described above.
  • the analysis module may be configured to provide the set of coordinates for the RoI directly to the LCEVC encoder.
  • the enhancement coding operation may be modified based on the RoI coordinates in accordance with the present disclosure.
  • the modification of the enhancement coding operation to prioritise enhancement at the RoI may be implemented by an enhancement encoder configured to perform the enhancement coding operation.
  • enhancement modification may be independent of any given base codec and may be implemented without reference to, or without any requirement for, a base codec.
  • Figure 15 provides an example process of providing an indication of an RoI to an enhancement encoder, such as an LCEVC encoder.
  • the process comprises identifying, by an analysis module, an RoI, as described above. Identifying the RoI may further comprise determining an indication of the RoI usable (such as by an encoder or a CPU module) to modify an enhancement coding operation.
  • An indication of an RoI may include a set of coordinates (e.g. coordinates in or otherwise according to the frame of the input video) corresponding to the RoI, or an indication of an RoI may include an indication of value (or ‘value parameter’), as described above.
  • the analysis module may be configured to provide the set of coordinates for the RoI to the base codec.
  • the analysis module typically is also configured to provide the input data signal to the base codec in addition to the set of coordinates.
  • the analysis module may be configured to embed the coordinates as metadata (such as header data) in the input data stream such that said coordinates may be extractable by an encoder, a decoder, and/or another module (e.g. a residual selection module, a quantisation module etc.) further along in the encoding process.
  • the input data signal may alternatively be provided to the base codec separately, for example the input data signal may be provided directly to the base codec from the input source, in which case the analysis module may be configured to provide the coordinates to the base codec as a part of a separate data stream configured between the analysis module and the base codec.
  • the base codec may encode the input signal according to the particular encoding scheme of that base codec (e.g., AVC, HEVC, AV1 etc.). Thereafter, the base codec is configured to provide in effect two output streams, as further described with reference to Figure 11.
  • a first output stream is a first base encoded stream output for transmission or storage, wherein the first base encoded stream is, once decoded, configured for recombination with one or more enhancement encoded streams to reconstruct the input signal.
  • a second output stream is a second base encoded stream output for RoI-modified enhancement encoding at the one or more enhancement encoders according to the present disclosure.
  • only one such enhancement encoder is shown; however, the disclosure is not limited in this respect, and several such enhancement encoders may be provided as part of an enhancement coding operation described above.
  • the coordinates corresponding to the RoI may be provided to the enhancement encoder by the base codec.
  • Providing the coordinates as such may include embedding the coordinates in the base coded video stream or may include providing the coordinates in a separate stream between the base codec and the enhancement encoder.
  • the enhancement encoding of the base coded signal may be modified based on the RoI coordinates in accordance with the present disclosure.
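Embedding RoI coordinates as header metadata so that a later stage can extract them could be sketched as follows. The byte layout (a 4-byte tag, a 16-bit count, then four 16-bit values per RoI, all big-endian) and the names `MAGIC`, `embed_rois` and `extract_rois` are hypothetical; the disclosure does not specify a format, and real base codecs carry such data through their own metadata mechanisms.

```python
import struct
from typing import List, Tuple

Coordinates = Tuple[int, int, int, int]  # x, y, width, height in frame pixels

MAGIC = b"ROI0"  # hypothetical tag marking the presence of RoI metadata


def embed_rois(payload: bytes, rois: List[Coordinates]) -> bytes:
    """Prepend an RoI header to a coded payload."""
    header = MAGIC + struct.pack(">H", len(rois))
    for x, y, width, height in rois:
        header += struct.pack(">4H", x, y, width, height)
    return header + payload


def extract_rois(stream: bytes) -> Tuple[List[Coordinates], bytes]:
    """Return (RoI coordinates, remaining payload); no header means no RoIs."""
    if stream[:4] != MAGIC:
        return [], stream
    (count,) = struct.unpack_from(">H", stream, 4)
    offset = 6
    rois: List[Coordinates] = []
    for _ in range(count):
        rois.append(struct.unpack_from(">4H", stream, offset))
        offset += 8
    return rois, stream[offset:]
```

The same pair of functions covers both routes described above: the header can travel inside the base coded stream, or the header bytes alone can be sent as a separate stream between the base codec and the enhancement encoder.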

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present disclosure relates to video coding and in particular to enhancement coding in a system-on-chip, SoC, such as an SoC applied to a camera. Systems-on-chip, SoCs, are described which comprise a plurality of enhancement encoders and a base codec configured to serve the plurality of enhancement encoders. Methods of operating such SoCs are described; as well as methods of providing an encoded video signal by determining a region of interest, RoI, in an input video signal; and methods of providing an RoI in an input video signal to be encoded. Other SoCs, methods, processing devices, computer programs and the like are also described.
PCT/GB2024/053051 2023-12-08 2024-12-06 Enhancement coding improvements and integration Pending WO2025120331A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB2318815.4 2023-12-08
GBGB2318815.4A GB202318815D0 (en) 2023-12-08 2023-12-08 Enhancement coding improvements and integration

Publications (1)

Publication Number Publication Date
WO2025120331A1 true WO2025120331A1 (fr) 2025-06-12

Family

ID=89575680

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2024/053051 Pending WO2025120331A1 (fr) 2023-12-08 2024-12-06 Enhancement coding improvements and integration

Country Status (2)

Country Link
GB (1) GB202318815D0 (fr)
WO (1) WO2025120331A1 (fr)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2308476A (en) 1994-10-13 1997-06-25 Stephen Lee Thaler Device for the autonomous generation of useful information
GB2313070A (en) 1996-05-16 1997-11-19 Toyo Engineering Corp Improved steam reforming catalysts for lower hydrocarbons
WO2018015764A1 (fr) 2016-07-20 2018-01-25 V-Nova Ltd Decoding devices, methods and computer programs
WO2020165575A1 (fr) 2019-02-13 2020-08-20 V-Nova International Ltd Object analysis
WO2020188230A1 (fr) 2019-03-20 2020-09-24 V-Nova International Ltd Rate control for a video encoder
WO2020188273A1 (fr) 2019-03-20 2020-09-24 V-Nova International Limited Low complexity enhancement video coding
WO2020212701A1 (fr) 2019-04-16 2020-10-22 V-Nova International Ltd Exchanging information in hierarchical video coding
US11172208B2 (en) * 2017-02-28 2021-11-09 Nokia Technologies Oy Method and apparatus for improving the visual quality of viewport-based omnidirectional video streaming
US20230067541A1 (en) * 2020-04-16 2023-03-02 Intel Corporation Patch based video coding for machines
WO2023187308A1 (fr) 2022-03-31 2023-10-05 V-Nova International Ltd Pre-analysis for video encoding

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2308476A (en) 1994-10-13 1997-06-25 Stephen Lee Thaler Device for the autonomous generation of useful information
GB2313070A (en) 1996-05-16 1997-11-19 Toyo Engineering Corp Improved steam reforming catalysts for lower hydrocarbons
WO2018015764A1 (fr) 2016-07-20 2018-01-25 V-Nova Ltd Decoding devices, methods and computer programs
US11172208B2 (en) * 2017-02-28 2021-11-09 Nokia Technologies Oy Method and apparatus for improving the visual quality of viewport-based omnidirectional video streaming
WO2020165575A1 (fr) 2019-02-13 2020-08-20 V-Nova International Ltd Object analysis
WO2020188230A1 (fr) 2019-03-20 2020-09-24 V-Nova International Ltd Rate control for a video encoder
WO2020188273A1 (fr) 2019-03-20 2020-09-24 V-Nova International Limited Low complexity enhancement video coding
WO2020188229A1 (fr) 2019-03-20 2020-09-24 V-Nova International Ltd Processing of residuals in video coding
WO2020212701A1 (fr) 2019-04-16 2020-10-22 V-Nova International Ltd Exchanging information in hierarchical video coding
US20230067541A1 (en) * 2020-04-16 2023-03-02 Intel Corporation Patch based video coding for machines
WO2023187308A1 (fr) 2022-03-31 2023-10-05 V-Nova International Ltd Pre-analysis for video encoding

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
FERRARA S ET AL: "Deployment Status of the LCEVC standard", no. m65265, 9 October 2023 (2023-10-09), XP030312795, Retrieved from the Internet <URL:https://dms.mpeg.expert/doc_end_user/documents/144_Hannover/wg11/m65265-v1-m65265-DeploymentstatusoftheLCEVCstandard.zip m65265 - Deployment status of the LCEVC standard.docx> [retrieved on 20231009] *
TIMMERER CHRISTIAN ET AL: "Special issue on Open Media Compression: Overview, Design Criteria, and Outlook on Emerging Standards", PROCEEDINGS OF THE IEEE, IEEE. NEW YORK, US, vol. 109, no. 9, 19 August 2021 (2021-08-19), pages 1423 - 1434, XP011873259, ISSN: 0018-9219, [retrieved on 20210819], DOI: 10.1109/JPROC.2021.3098048 *

Also Published As

Publication number Publication date
GB202318815D0 (en) 2024-01-24

Similar Documents

Publication Publication Date Title
US20230370624A1 (en) Distributed analysis of a multi-layer signal encoding
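CN114450940B (zh) Method for encoding and decoding immersive video, and encoder and decoder
CN113660486B (zh) Image encoding, decoding, reconstruction and analysis methods, system and electronic device
JP5389172B2 (ja) Method for reconstructing a depth image and decoder for reconstructing a depth image
WO2022067656A1 (fr) Image processing method and apparatus
CN121125996A (zh) Method for decoding a plurality of encoded streams into a reconstructed output video
EP4120684A1 (fr) Method and system for optimizing image and video compression for machine vision
JP2024511587A (ja) Independent positioning of auxiliary information in neural network based picture processing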
US20260107007A1 (en) Exchanging information in hierarchical video coding
CN118872266A (zh) Video coding method based on multi-modal processing
US20240054686A1 (en) Method and apparatus for coding feature map based on deep learning in multitasking system for machine vision
US12407854B2 (en) Artificial intelligence (AI) encoding apparatus and method and AI decoding apparatus and method for region of object of interest in image
Li et al. End-to-end optimized 360° image compression
US20160360231A1 (en) Efficient still image coding with video compression techniques
US20200322612A1 (en) Encoding a plurality of signals
CN117441333A (zh) Configurable positions for auxiliary information input into a picture data processing neural network
EP4120683A1 (fr) Method and system for optimizing image and video compression for machine vision
CN112383778A (zh) Video encoding method and apparatus, and decoding method and apparatus
Lee et al. Machine-Attention-based Video Coding for Machines
WO2024084248A1 (fr) Distributed analysis of a multi-layer signal encoding
Yuan et al. Split computing with scalable feature compression for visual analytics on the edge
US20200296358A1 (en) Method and device for encoding image according to low-quality coding mode, and method and device for decoding mage
WO2025120331A1 (fr) Enhancement coding improvements and integration
HK40064484B (zh) Image encoding, decoding, reconstruction and analysis methods, system and electronic device
TWI913231B (zh) Coding scheme for immersive video

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 24827110

Country of ref document: EP

Kind code of ref document: A1