WO2022148446A1 - Image processing method, apparatus, device, and storage medium - Google Patents
- Publication number: WO2022148446A1 (PCT/CN2022/070815)
- Authority: WO, WIPO (PCT)
- Prior art keywords: raw image, raw, frame, current photographing, photographing scene
- Legal status: Ceased (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- H04N19/91: Entropy coding, e.g. variable length coding [VLC] or arithmetic coding
- H04M1/72403: User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
- G06T5/50: Image enhancement or restoration using two or more images, e.g. averaging or subtraction
- G06V10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- H04L67/04: Protocols specially adapted for terminals or networks with limited capabilities; specially adapted for terminal portability
- H04L67/10: Protocols in which an application is distributed across nodes in the network
- H04N19/124: Quantisation
- H04N19/184: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being bits, e.g. of the compressed video stream
- H04N19/70: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
- H04N23/631: Graphical user interfaces [GUI] specially adapted for controlling image capture or setting capture parameters
- H04N23/632: Graphical user interfaces [GUI] specially adapted for controlling image capture or setting capture parameters for displaying or modifying preview images prior to image capturing, e.g. variety of image resolutions or capturing parameters
- H04N23/741: Circuitry for compensating brightness variation in the scene by increasing the dynamic range of the image compared to the dynamic range of the electronic image sensors
- G06T2207/20221: Image fusion; Image merging
- H04M2250/52: Details of telephonic subscriber devices including functional features of a camera
Definitions
- the embodiments of the present application relate to the field of image processing, and in particular, to an image processing method, apparatus, device, and storage medium.
- the camera module of a mobile phone can capture the raw image and output it to the intermediate processing module.
- Raw images are also called RAW images or digital negatives.
- the intermediate processing module of the mobile phone can perform a series of processing on the received RAW image, and finally obtain an image that can be used for display, such as a JPEG image.
- the JPEG image can be transmitted to the display screen of the mobile phone for display, and/or transmitted to the memory of the mobile phone for storage.
- the process by which the intermediate processing module processes the RAW image to generate the JPEG image may include: performing image signal processing (ISP) on the RAW image to convert it from the RAW domain to the YUV domain (an image in the YUV domain is called a YUV image); then processing the YUV image with a YUV-domain post-processing algorithm; and finally encoding the post-processed YUV image with JPEG encoding to obtain a JPEG image.
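As a rough illustration of the RAW-to-YUV step, the sketch below collapses each 2x2 Bayer cell into one RGB pixel (a crude binning stand-in for demosaicing) and then applies the full-range BT.601 (JFIF) RGB-to-YCbCr matrix. The RGGB layout, the binning, the 8-bit value range, and the function names are assumptions for illustration; a real ISP also performs black-level correction, white balance, proper demosaicing, tone mapping, and more.

```python
from typing import List, Tuple

Plane = List[List[int]]
Pixel = Tuple[float, float, float]


def bin_rggb(raw: Plane) -> List[List[Pixel]]:
    """Collapse each 2x2 RGGB cell into one (R, G, B) pixel.

    Assumes an RGGB Bayer layout and even image dimensions; this is a
    crude stand-in for real demosaicing.
    """
    rgb = []
    for y in range(0, len(raw), 2):
        row = []
        for x in range(0, len(raw[0]), 2):
            r = float(raw[y][x])
            g = (raw[y][x + 1] + raw[y + 1][x]) / 2  # average the two green sites
            b = float(raw[y + 1][x + 1])
            row.append((r, g, b))
        rgb.append(row)
    return rgb


def rgb_to_ycbcr(r: float, g: float, b: float) -> Pixel:
    """Full-range BT.601 (JFIF) conversion; assumes values scaled to 8 bits."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128
    cr = 0.5 * r - 0.418688 * g - 0.081312 * b + 128
    return y, cb, cr


def raw_to_yuv(raw: Plane) -> List[List[Pixel]]:
    """Toy ISP: Bayer binning followed by RGB-to-YCbCr conversion."""
    return [[rgb_to_ycbcr(*px) for px in row] for row in bin_rggb(raw)]
```

A uniform mid-gray RAW patch maps to Y = Cb = Cr = 128, since each chroma row of the matrix sums to zero.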
- ISP: image signal processing.
- some image processing algorithms in the YUV domain can be migrated to the RAW domain.
- image processing algorithms such as multi-frame registration, fusion, and noise reduction for HDR can be migrated from the YUV domain to the RAW domain.
- the benefits of image processing in the RAW domain can include: RAW images carry higher bit-depth information than YUV images, and because RAW images have not been processed by the ISP, information such as color and detail has not been degraded.
- image processing in the RAW domain, however, involves a larger amount of data and places higher demands on algorithm performance and memory.
- the computing and memory resources of mobile phones are limited. Migrating some YUV-domain image processing algorithms to the RAW domain on a mobile phone is therefore subject to certain constraints, which can easily limit the processing effect. For example, some image processing algorithms may need to be cut down and adapted to the computing power of the mobile phone, so that their processing results are unsatisfactory.
- the embodiments of the present application provide an image processing method, apparatus, device, and storage medium, which can solve the problem that the processing effect is limited due to restrictions on the mobile phone when some image processing algorithms in the YUV domain are migrated to the RAW domain.
- an embodiment of the present application provides an image processing method.
- the method includes: in response to a user's photographing operation, a terminal device collects a RAW image corresponding to a current photographing scene.
- the terminal device encodes the RAW image corresponding to the current photographing scene, obtains the encoded code stream of the RAW image corresponding to the current photographing scene, and sends the encoded code stream of the RAW image corresponding to the current photographing scene to the cloud.
- the terminal device receives the image in the first format from the cloud, where the image in the first format is generated by the cloud according to the encoded code stream of the RAW image corresponding to the current photographing scene.
- This image processing method avoids the constraints imposed by the terminal device when some YUV-domain image processing algorithms are migrated to the RAW domain, which would otherwise limit the processing effect; instead, the big data resources and computing resources of the cloud can be fully used to perform RAW-domain image processing, ISP processing, and YUV-domain processing on RAW images, achieving better image processing results.
- the RAW image corresponding to the current photographing scene includes one or more frames; the terminal device encoding the RAW image corresponding to the current photographing scene, obtaining its encoded code stream, and sending that encoded code stream to the cloud includes: when the RAW image corresponding to the current photographing scene includes multiple frames, the terminal device encodes the multi-frame RAW image to obtain the encoded code stream of the multi-frame RAW image, and sends the encoded code stream of the multi-frame RAW image to the cloud.
- that is, the encoded code stream corresponding to the multi-frame RAW images is uploaded to the cloud for processing.
- when only one frame of RAW image is collected, it is processed directly locally (i.e., on the terminal device side).
- alternatively, the terminal device encodes the collected RAW image to obtain the corresponding encoded code stream, and uploads that encoded code stream to the cloud for processing.
- the method further includes: in response to the user's first selection operation, the terminal device determines that the RAW image collected during photographing needs to be uploaded to the cloud for processing.
- the terminal device may have a function for the user to choose whether to upload the RAW image captured by the camera module to the cloud for processing.
- the first selection operation may be an operation in which the user uses this function on the terminal device. For example, when the mobile phone starts and runs the camera application, it can provide a function control on the camera interface for choosing whether to upload the RAW images captured by the camera module to the cloud for processing. The user can operate the function control to actively make this choice. The user's operation of choosing, on the mobile phone, to upload the RAW images captured by the camera module to the cloud for processing is the first selection operation.
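The upload decision can be sketched as a small routing function that combines the user's first selection operation with the frame-count rule (multi-frame captures go to the cloud, a single frame is processed locally). Combining the rules this way, and the names `route_capture`, `upload_enabled`, and `single_frame_to_cloud`, are hypothetical; the embodiments describe these rules as alternatives rather than one fixed policy.

```python
from enum import Enum


class Route(Enum):
    LOCAL = "local"  # process on the terminal device
    CLOUD = "cloud"  # encode and upload to the cloud


def route_capture(num_frames: int, upload_enabled: bool,
                  single_frame_to_cloud: bool = False) -> Route:
    """Hypothetical routing rule for one capture.

    upload_enabled mirrors the user's first selection operation;
    single_frame_to_cloud selects the embodiment in which even a
    single RAW frame is encoded and uploaded.
    """
    if num_frames < 1:
        raise ValueError("a capture must contain at least one RAW frame")
    if not upload_enabled:
        return Route.LOCAL
    if num_frames > 1 or single_frame_to_cloud:
        return Route.CLOUD
    return Route.LOCAL
```

For example, `route_capture(4, True)` selects the cloud path, while `route_capture(1, True)` keeps a single frame on the device.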
- the terminal device encoding the RAW image corresponding to the current photographing scene to obtain its encoded code stream includes: the terminal device compresses the RAW image corresponding to the current photographing scene to obtain the compression features of that RAW image; quantizes those compression features; and entropy-encodes the quantized compression features to obtain the encoded code stream of the RAW image corresponding to the current photographing scene.
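The quantization and entropy-encoding steps of this encoder can be sketched as follows. The text does not specify the compression transform or the entropy coder, so this sketch assumes the compression features are already available as a list of floats, uses uniform scalar quantization with a hypothetical step size `step`, and substitutes signed order-0 Exp-Golomb codes (the variable-length code H.264 uses for `se(v)` syntax elements) for the entropy coder.

```python
from typing import List


def quantize(features: List[float], step: float) -> List[int]:
    """Uniform scalar quantization to the nearest multiple of step.

    Python's round() rounds exact halves to even; any deterministic
    rule works as long as the decoder uses the same step.
    """
    return [round(f / step) for f in features]


def signed_to_code_num(v: int) -> int:
    """Map 0, 1, -1, 2, -2, ... to 0, 1, 2, 3, 4, ... (H.264 se(v) mapping)."""
    return 2 * v - 1 if v > 0 else -2 * v


def exp_golomb(code_num: int) -> str:
    """Order-0 Exp-Golomb code: leading zeros, then code_num + 1 in binary."""
    bits = bin(code_num + 1)[2:]
    return "0" * (len(bits) - 1) + bits


def encode_features(features: List[float], step: float) -> str:
    """Quantize, then entropy-encode into a bit string (stand-in for the code stream)."""
    return "".join(exp_golomb(signed_to_code_num(q)) for q in quantize(features, step))
```

Small quantized values get short codes (0 costs one bit), which is why a good compression transform that concentrates energy near zero pays off here.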
- when the RAW image corresponding to the current photographing scene includes multiple frames, the terminal device compressing that RAW image to obtain its compression features includes:
- the terminal device determines the inter-frame correlation between the multi-frame RAW images according to the type of the current photographing scene;
- the terminal device selects one frame of the multi-frame RAW images as a reference frame and, according to the reference frame and the inter-frame correlation between the multi-frame RAW images, predicts the frames other than the reference frame, obtaining residual maps corresponding to those other frames;
- the terminal device compresses the reference frame and the residual maps corresponding to the other frames to obtain the compression features of the multi-frame RAW image.
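The reference-frame step can be sketched with the simplest possible correlation model, in which every other frame is predicted to equal the reference frame, so each residual map is a pixel-wise difference. Frames are flattened to lists of integers, and choosing the middle frame as the reference is an assumption (the text only says that one frame is selected).

```python
from typing import List, Tuple

Frame = List[int]  # one RAW frame flattened to a list of pixel values


def split_reference(frames: List[Frame]) -> Tuple[int, Frame, List[Frame]]:
    """Pick the middle frame as reference; return residual maps for the others.

    Assumed prediction model: every other frame is predicted as the
    reference frame itself, so residual = frame - reference.
    """
    if not frames:
        raise ValueError("need at least one frame")
    ref_idx = len(frames) // 2
    ref = frames[ref_idx]
    residuals = [
        [p - r for p, r in zip(frame, ref)]
        for i, frame in enumerate(frames)
        if i != ref_idx
    ]
    return ref_idx, ref, residuals
```

When the frames are strongly correlated, the residual values cluster near zero, so the reference frame plus residuals is cheaper to compress than the raw frames.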
- the step in which the terminal device selects a reference frame from the multi-frame RAW images and, according to the reference frame and the inter-frame correlation between the multi-frame RAW images, predicts the other frames to obtain their residual maps amounts to preprocessing the multi-frame RAW images according to their inter-frame correlation.
- Preprocessing the multi-frame RAW images according to their inter-frame correlation can further improve the compression ratio when they are compressed, and thereby improve the transmission speed of the encoded code stream of the RAW images.
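Why this preprocessing raises the compression ratio can be checked with a zeroth-order (empirical) entropy estimate, i.e. the average bits per pixel that a memoryless entropy coder driven by the observed symbol frequencies would need. The frames here are made-up toy values chosen only to illustrate the effect, not measurements.

```python
import math
from collections import Counter
from typing import List


def empirical_entropy(values: List[int]) -> float:
    """Zeroth-order entropy of a symbol sequence, in bits per symbol."""
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


# Toy data: two nearly identical frames.
frame_a = [0, 64, 128, 192]
frame_b = [1, 64, 129, 192]
residual = [b - a for a, b in zip(frame_a, frame_b)]  # [1, 0, 1, 0]

bits_raw = empirical_entropy(frame_b)        # 2.0 bits per pixel
bits_residual = empirical_entropy(residual)  # 1.0 bit per pixel
```

Coding `frame_b` as a residual against `frame_a` halves the estimated cost in this toy case; real gains depend on how well the inter-frame correlation model predicts the frames.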
- the method further includes: the terminal device determines the type of the current photographing scene according to the metadata information of the multi-frame RAW images.
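One way to make "the inter-frame correlation depends on the scene type" concrete is sketched below. It assumes, hypothetically, that a burst whose frames share one exposure time is a multi-frame noise-reduction scene (each frame is predicted as the reference), while a burst with differing exposure times is an HDR bracket (each frame is predicted as the reference scaled by the exposure ratio). The metadata key `exposure_ms`, the scene labels, and the linear exposure model are all assumptions, not details from the patent.

```python
from typing import Dict, List

Frame = List[float]
Meta = Dict[str, float]


def scene_type(metadata: List[Meta]) -> str:
    """Classify a burst from per-frame metadata (hypothetical rule)."""
    exposures = {m["exposure_ms"] for m in metadata}
    return "hdr" if len(exposures) > 1 else "denoise"


def predict(ref: Frame, ref_meta: Meta, target_meta: Meta, scene: str) -> Frame:
    """Predict a target frame from the reference frame.

    'denoise': frames assumed identical up to noise.
    'hdr': linear sensor response assumed, so pixel values scale with
    exposure time (black level and saturation are ignored here).
    """
    if scene == "hdr":
        gain = target_meta["exposure_ms"] / ref_meta["exposure_ms"]
        return [p * gain for p in ref]
    return list(ref)
```

Under this model, the residual for an HDR frame is `frame - predict(...)` rather than `frame - ref`, which keeps residuals small even though the raw brightness differs between frames.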
- the terminal device encoding the RAW image corresponding to the current photographing scene to obtain its encoded code stream includes: the terminal device performs channel coding on the RAW image corresponding to the current photographing scene by means of distributed source coding, obtaining the encoded code stream of the RAW image corresponding to the current photographing scene. When the RAW image corresponding to the current photographing scene includes multiple frames, the encoded code stream includes multiple sets of code stream packets in one-to-one correspondence with the multiple frames of RAW images; when it includes one frame, the encoded code stream includes one set of code stream packets corresponding to that frame of RAW image.
- Each group of code stream packets includes a plurality of code stream packets, and each code stream packet includes at least an error correction code and the metadata information of the frame of RAW image to which the packet corresponds.
- the terminal device sending the encoded code stream of the RAW image corresponding to the current photographing scene to the cloud includes: the terminal device uploads the code stream packets corresponding to each frame of RAW image to the cloud sequentially, frame by frame.
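The packet layout can be sketched as a small data structure. Each packet carries at least the error-correction bits and the metadata of its frame, as described; the extra fields `frame_index` and `seq`, the fixed payload size, and the helper names are hypothetical additions so that the receiver can order packets. Packets are grouped per frame and yielded frame by frame, matching the frame-unit upload order.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class StreamPacket:
    frame_index: int  # hypothetical: which RAW frame this packet belongs to
    seq: int          # hypothetical: packet order within its frame
    ecc_bits: str     # error-correction (e.g. syndrome) bits for this chunk
    metadata: Dict[str, float] = field(default_factory=dict)  # that frame's metadata


def packetize(frame_index: int, ecc_bits: str, metadata: Dict[str, float],
              payload_bits: int = 8) -> List[StreamPacket]:
    """Split one frame's error-correction bits into a group of packets."""
    return [
        StreamPacket(frame_index, seq, ecc_bits[i:i + payload_bits], dict(metadata))
        for seq, i in enumerate(range(0, len(ecc_bits), payload_bits))
    ]


def upload_order(groups: List[List[StreamPacket]]) -> Iterator[StreamPacket]:
    """Yield packets frame by frame: all of frame 0, then all of frame 1, ..."""
    non_empty = (g for g in groups if g)  # skip frames that produced no packets
    for group in sorted(non_empty, key=lambda g: g[0].frame_index):
        yield from group
```

Copying the frame metadata into every packet costs a few bytes per packet but lets the receiver start decoding a frame without waiting for a separate header.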
- when the terminal device encodes the RAW image by distributed source coding, the more accurate the cloud's predicted value, the fewer error correction bits need to be transmitted and the higher the compression ratio. The data correlation available in the cloud can therefore be fully exploited to achieve a higher compression ratio and effectively save upload traffic.
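The trade-off between prediction accuracy and error-correction bits can be illustrated with the textbook syndrome-coding example of distributed source coding; it is not necessarily the scheme used in this patent. The terminal sends only the 3-bit Hamming(7,4) syndrome of each 7-bit block, and the cloud, holding a prediction (side information) that differs from the true block in at most one bit, recovers the block exactly. The 7-to-3 saving holds only under that at-most-one-bit-error assumption; a less accurate prediction would need a stronger code and more transmitted bits.

```python
from typing import List

Block = List[int]  # seven bits, each 0 or 1


def syndrome(block: Block) -> int:
    """3-bit Hamming(7,4) syndrome: XOR of the 1-based positions of set bits.

    Equivalent to H @ block (mod 2) for the parity-check matrix whose
    j-th column is the binary representation of j.
    """
    s = 0
    for pos, bit in enumerate(block, start=1):
        if bit:
            s ^= pos
    return s


def dsc_encode(block: Block) -> int:
    """Terminal side: transmit 3 syndrome bits instead of 7 data bits."""
    return syndrome(block)


def dsc_decode(sent_syndrome: int, side_info: Block) -> Block:
    """Cloud side: correct the prediction using the received syndrome.

    Exact whenever side_info differs from the true block in at most one bit.
    """
    diff = sent_syndrome ^ syndrome(side_info)  # syndrome of the error pattern
    decoded = list(side_info)
    if diff:
        decoded[diff - 1] ^= 1  # flip the single bit the syndrome points to
    return decoded
```

For example, if the true block is `[1, 0, 1, 1, 0, 0, 1]` and the cloud's prediction has bit 5 flipped, the 3 transmitted bits are enough to restore the block.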
- an embodiment of the present application provides an image processing apparatus, which can be used to implement the method described in the first aspect above.
- the functions of the apparatus may be implemented by hardware, or by executing corresponding software by hardware.
- the hardware or software includes one or more modules or units corresponding to the above functions, for example, a camera module, an encoding module, a sending module, a receiving module, and the like.
- the camera module is used to collect the RAW image corresponding to the current photographing scene in response to the user's photographing operation; the encoding module is used to encode the RAW image corresponding to the current photographing scene to obtain its encoded code stream; the sending module is used to send the encoded code stream of the RAW image corresponding to the current photographing scene to the cloud; the receiving module is used to receive the image in the first format from the cloud, where the image in the first format is generated by the cloud from the encoded code stream of the RAW image corresponding to the current photographing scene.
- the RAW image corresponding to the current photographing scene includes one or more frames; the encoding module is specifically configured to, when the RAW image corresponding to the current photographing scene includes multiple frames, encode the multi-frame RAW image to obtain the encoded code stream of the multi-frame RAW image; the sending module is specifically configured to send the encoded code stream of the multi-frame RAW image to the cloud.
- the camera module is further configured to, in response to the user's first selection operation, determine that it is necessary to upload the RAW images collected when taking pictures to the cloud for processing.
- the encoding module is specifically configured to compress the RAW image corresponding to the current photographing scene to obtain its compression features; quantize those compression features; and entropy-encode the quantized compression features to obtain the encoded code stream of the RAW image corresponding to the current photographing scene.
- the encoding module is specifically configured to, when the RAW image corresponding to the current photographing scene includes multiple frames: determine the inter-frame correlation between the multi-frame RAW images according to the type of the current photographing scene; select one frame of the multi-frame RAW images as the reference frame and, according to the reference frame and the inter-frame correlation, predict the frames other than the reference frame to obtain the residual maps corresponding to those frames; and compress the reference frame and the residual maps of the other frames to obtain the compression features of the multi-frame RAW image.
- the encoding module is further configured to determine the type of the current photographing scene according to the metadata information of the multi-frame RAW images.
- the encoding module is specifically configured to perform channel coding on the RAW image corresponding to the current photographing scene by means of distributed source coding, to obtain the encoded code stream of the RAW image corresponding to the current photographing scene. When the RAW image corresponding to the current photographing scene includes multiple frames, the encoded code stream includes multiple sets of code stream packets in one-to-one correspondence with the multiple frames of RAW images; when it includes one frame, the encoded code stream includes one set of code stream packets corresponding to that frame of RAW image.
- Each group of code stream packets includes a plurality of code stream packets, and each code stream packet includes at least an error correction code and the metadata information of the frame of RAW image to which the packet corresponds.
- the sending module is specifically configured to upload the code stream packets corresponding to each frame of RAW image to the cloud frame by frame.
- an embodiment of the present application provides an electronic device, including a processor and a memory for storing instructions executable by the processor; when the processor executes the instructions, the electronic device implements the image processing method described in the first aspect.
- the electronic device may be a mobile terminal such as a mobile phone, a tablet computer, a wearable device, a vehicle-mounted device, an AR/VR device, a notebook computer, an ultra-mobile personal computer, a netbook, or a personal digital assistant, or professional shooting equipment such as a digital camera, a single-lens reflex or mirrorless camera, an action camera, a PTZ camera, or a drone.
- embodiments of the present application provide a computer-readable storage medium on which computer program instructions are stored; when the computer program instructions are executed by an electronic device, the electronic device can implement the image processing method described in the first aspect.
- the embodiments of the present application further provide a computer program product, including computer-readable code which, when executed in an electronic device, enables the electronic device to implement the image processing method described in the first aspect.
- an embodiment of the present application further provides an image processing method, the method comprising: the cloud receives, from a terminal device, an encoded code stream of a RAW image corresponding to a current photographing scene.
- the cloud decodes the encoded code stream of the RAW image corresponding to the current photographing scene, and obtains the reconstructed RAW image corresponding to the current photographing scene.
- the cloud processes the reconstructed RAW image corresponding to the current photographing scene, generates an image in the first format corresponding to the current photographing scene, and sends the image in the first format to the terminal device.
- the cloud decoding the encoded code stream of the RAW image corresponding to the current photographing scene to obtain the reconstructed RAW image corresponding to the current photographing scene includes: the cloud performs entropy decoding on the encoded code stream to obtain the quantized compression features of the RAW image corresponding to the current photographing scene; the cloud performs inverse quantization on the quantized compression features to obtain the compression features of that RAW image; and the cloud decompresses the compression features to obtain the reconstructed RAW image corresponding to the current photographing scene.
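The decoding chain mirrors the encoder. The sketch below assumes, for illustration only, that the code stream is a bit string of signed order-0 Exp-Golomb codes and that quantization was uniform with a hypothetical step size `step`. It performs entropy decoding and inverse quantization; the decompression step is left out because the text does not specify the compression transform.

```python
from typing import List, Tuple


def read_exp_golomb(bits: str, pos: int) -> Tuple[int, int]:
    """Parse one order-0 Exp-Golomb code at pos; return (code_num, next_pos)."""
    zeros = 0
    while bits[pos + zeros] == "0":  # count the leading-zero prefix
        zeros += 1
    end = pos + 2 * zeros + 1
    code_num = int(bits[pos + zeros:end], 2) - 1
    return code_num, end


def code_num_to_signed(k: int) -> int:
    """Inverse of the H.264 se(v) mapping: 0, 1, 2, 3, 4 -> 0, 1, -1, 2, -2."""
    return (k + 1) // 2 if k % 2 else -(k // 2)


def entropy_decode(bits: str) -> List[int]:
    """Recover the quantized levels from a concatenated Exp-Golomb bit string."""
    values, pos = [], 0
    while pos < len(bits):
        k, pos = read_exp_golomb(bits, pos)
        values.append(code_num_to_signed(k))
    return values


def dequantize(levels: List[int], step: float) -> List[float]:
    """Inverse uniform quantization; the rounding error is not recoverable."""
    return [q * step for q in levels]
```

A malformed or truncated bit string raises `IndexError` or `ValueError` here; a production decoder would validate the stream length first.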
- the RAW image corresponding to the current photographing scene includes multiple frames; the cloud decompressing the compression features of the RAW image to obtain the reconstructed RAW image corresponding to the current photographing scene includes: the cloud decompresses the compression features of the multi-frame RAW image to obtain the reconstructed RAW image corresponding to the reference frame in the multi-frame RAW images and the residual maps corresponding to the other frames; the cloud determines the inter-frame correlation between the multi-frame RAW images according to the type of the current photographing scene; and the cloud reconstructs the multi-frame RAW images according to the reconstructed RAW image of the reference frame, the residual maps of the other frames, and the inter-frame correlation, obtaining multi-frame reconstructed RAW images in one-to-one correspondence with the multi-frame RAW images.
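Cloud-side reconstruction inverts the reference-frame prediction. This sketch assumes a hypothetical identity prediction model (each non-reference frame is predicted to equal the reference frame), that the reference frame's original position is `ref_idx`, and that the residual maps arrive in frame order with the reference frame skipped.

```python
from typing import List

Frame = List[int]


def reconstruct_frames(ref_idx: int, ref: Frame, residuals: List[Frame]) -> List[Frame]:
    """Rebuild every frame as reference + residual, then re-insert the reference.

    Assumes the identity prediction model (predicted frame == reference).
    """
    frames = [[r + d for r, d in zip(ref, res)] for res in residuals]
    frames.insert(ref_idx, list(ref))
    return frames
```

With a scene-dependent model (for example, exposure-scaled prediction for HDR bursts), the `r + d` term would become `predicted_pixel + d`.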
- the encoded code stream of the multi-frame RAW image also includes the metadata information of the multi-frame RAW image; before the cloud determines the inter-frame correlation between the multi-frame RAW images according to the type of the current photographing scene, the method further includes: the cloud determines the type of the current photographing scene according to the metadata information of the multi-frame RAW images.
- the encoded code stream of the RAW image corresponding to the current photographing scene is obtained by the terminal device performing channel coding on that RAW image by means of distributed source coding. When the RAW image corresponding to the current photographing scene includes multiple frames, the encoded code stream includes multiple sets of code stream packets in one-to-one correspondence with the multiple frames of RAW images; when it includes one frame, the encoded code stream includes one set of code stream packets corresponding to that frame of RAW image. Each group of code stream packets includes a plurality of code stream packets, and each code stream packet includes at least an error correction code and the metadata information of the frame of RAW image to which the packet corresponds.
- the cloud decoding the encoded code stream of the RAW image corresponding to the current photographing scene to obtain the reconstructed RAW image includes: when the RAW image corresponding to the current photographing scene includes one frame, the cloud decodes the received code stream packets corresponding to that frame of RAW image by intra-frame prediction according to an initial predicted value, obtaining the reconstructed RAW image corresponding to that frame.
- when the RAW image corresponding to the current photographing scene includes multiple frames, the cloud decodes the received code stream packets corresponding to the first frame of RAW image by intra-frame prediction according to the initial predicted value, obtaining the reconstructed RAW image corresponding to the first frame; then, according to at least one already-decoded reconstructed RAW frame and the inter-frame correlation between the multi-frame RAW images, the cloud decodes the code stream packets corresponding to each frame of RAW image received after the first frame, obtaining the reconstructed RAW image corresponding to each of those frames.
- the cloud processing the reconstructed RAW image corresponding to the current photographing scene to generate the image in the first format includes: the cloud fuses the multi-frame reconstructed RAW images into one frame of reconstructed RAW image in the RAW domain; the cloud converts the fused reconstructed RAW frame from the RAW domain to the YUV domain to obtain the corresponding YUV image; and the cloud encodes that YUV image into the first format, obtaining the image in the first format corresponding to the current photographing scene.
- the cloud processing the reconstructed RAW image corresponding to the current photographing scene to generate the image in the first format includes: the cloud converts the multi-frame reconstructed RAW images from the RAW domain to the YUV domain, obtaining multi-frame YUV images corresponding to the multi-frame reconstructed RAW images; the cloud fuses these multi-frame YUV images into one frame of YUV image in the YUV domain; and the cloud encodes the fused YUV image into the first format, obtaining the image in the first format corresponding to the current photographing scene.
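The two processing orders described for the cloud (fuse in the RAW domain and then convert, versus convert each frame and then fuse in the YUV domain) need not produce the same image, because RAW-to-YUV conversion includes non-linear steps. The sketch below uses pixel-wise averaging as the fusion method and a single gamma curve (exponent 1/2.2) as a stand-in for the ISP; both choices are assumptions, and real multi-frame fusion usually also involves registration and per-pixel weighting.

```python
from typing import Callable, List

Frame = List[float]  # linear pixel values normalized to [0, 1]


def mean_fuse(frames: List[Frame]) -> Frame:
    """Pixel-wise average of frames (registration assumed already done)."""
    n = len(frames)
    return [sum(px) / n for px in zip(*frames)]


def toy_isp(frame: Frame, gamma: float = 1 / 2.2) -> Frame:
    """Stand-in for RAW-to-YUV conversion: one gamma curve applied per pixel."""
    return [v ** gamma for v in frame]


def fuse_then_convert(frames: List[Frame],
                      isp: Callable[[Frame], Frame] = toy_isp) -> Frame:
    """RAW-domain fusion followed by conversion."""
    return isp(mean_fuse(frames))


def convert_then_fuse(frames: List[Frame],
                      isp: Callable[[Frame], Frame] = toy_isp) -> Frame:
    """Per-frame conversion followed by YUV-domain fusion."""
    return mean_fuse([isp(f) for f in frames])
```

With a strictly concave curve such as this gamma, fusing first gives a brighter result whenever the frames differ (Jensen's inequality); with a purely linear conversion, both orders agree.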
- the image processing method described in the sixth aspect corresponds to the image processing method described in the first aspect and therefore has the same beneficial effects as the first aspect, which are not repeated here.
- an embodiment of the present application provides an image processing apparatus, which can be used to implement the method described in the sixth aspect.
- the functions of the apparatus may be implemented by hardware, or by executing corresponding software by hardware.
- the hardware or software includes one or more modules or units corresponding to the above functions, for example, a receiving module, a decoding module, a processing module, a sending module, and the like.
- the receiving module is used to receive the encoded code stream of the RAW image corresponding to the current photographing scene from the terminal device; the decoding module is used to decode that encoded code stream to obtain the reconstructed RAW image corresponding to the current photographing scene; the processing module is used to process the reconstructed RAW image to generate the image in the first format corresponding to the current photographing scene; the sending module is used to send the image in the first format to the terminal device.
- the processing module may include a RAW domain post-processing module, an ISP module, a YUV domain post-processing module, a first format encoder, and the like.
- the decoding module is specifically configured to perform entropy decoding on the encoded code stream of the RAW image corresponding to the current photographing scene, obtaining the quantized compressed features of that RAW image;
- inverse-quantize the quantized compressed features, obtaining the compressed features of the RAW image corresponding to the current photographing scene;
- decompress the compressed features, obtaining the reconstructed RAW image corresponding to the current photographing scene.
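The three decoding steps can be sketched as a pipeline. This sketch rests on stated assumptions: a uniform scalar quantizer with a single step size, and caller-supplied `entropy_decode` and `decompress` callables standing in for the entropy decoder and the decompression stage, whose internals the text does not describe. All names are hypothetical.

```python
from typing import Callable, List


def quantize(features: List[float], step: float) -> List[int]:
    """Encoder side (for reference): map each feature to the nearest index."""
    return [round(f / step) for f in features]


def inverse_quantize(indices: List[int], step: float) -> List[float]:
    """Uniform scalar inverse quantization: feature ~= index * step."""
    return [i * step for i in indices]


def decode_raw(code_stream: bytes,
               entropy_decode: Callable[[bytes], List[int]],
               step: float,
               decompress: Callable[[List[float]], List[float]]) -> List[float]:
    """Entropy decoding -> inverse quantization -> decompression."""
    quantized = entropy_decode(code_stream)      # 1. quantized compressed features
    features = inverse_quantize(quantized, step)  # 2. compressed features
    return decompress(features)                   # 3. reconstructed RAW samples
```

With a step of 0.5, `inverse_quantize([1, 2, 3], 0.5)` yields `[0.5, 1.0, 1.5]`; after a quantize/inverse-quantize round trip each feature is off by at most half a step.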
- when the RAW image corresponding to the current photographing scene includes multiple frames, the decoding module is specifically configured to: decompress the compressed features of the multiple frames of RAW images, obtaining the reconstructed RAW image corresponding to the reference frame among them and the residual maps corresponding to the other frames; determine the inter-frame correlation between the multiple frames of RAW images according to the type of the current photographing scene; and reconstruct, from the reconstructed RAW image of the reference frame, the residual maps of the other frames, and the inter-frame correlation, multiple frames of reconstructed RAW images in one-to-one correspondence with the multiple frames of RAW images.
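One way to picture reconstruction from a reference frame, residual maps, and inter-frame correlation is sketched below. It assumes, purely for illustration, that in an HDR scene the correlation is each frame's exposure ratio to the reference frame, so a frame is predicted by scaling the reference and then corrected by its residual. Frames are flat lists of 12-bit samples; `reconstruct_frames`, `exposure_ratios`, frame index 0 as the reference, and the 4095 white level are hypothetical.

```python
from typing import Dict, List


def reconstruct_frames(reference: List[int],
                       residuals: Dict[int, List[int]],
                       exposure_ratios: Dict[int, float],
                       white_level: int = 4095) -> Dict[int, List[int]]:
    """Rebuild every frame from the reference frame plus per-frame residuals."""
    frames: Dict[int, List[int]] = {0: list(reference)}  # index 0 = reference frame
    for idx, residual in residuals.items():
        ratio = exposure_ratios[idx]  # assumed inter-frame correlation
        recon = []
        for ref_sample, res in zip(reference, residual):
            # Predict from the reference, saturating at the sensor white level.
            predicted = min(round(ref_sample * ratio), white_level)
            recon.append(max(0, min(white_level, predicted + res)))
        frames[idx] = recon
    return frames
```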
- the encoded code stream of the multiple frames of RAW images further includes metadata information of those frames; the decoding module is further configured to determine the type of the current photographing scene according to that metadata information.
- the encoded code stream of the RAW image corresponding to the current photographing scene is obtained by the terminal device channel-coding that RAW image in a distributed source coding manner. When the RAW image includes multiple frames, the encoded code stream includes multiple groups of code stream packets in one-to-one correspondence with the frames; when it includes one frame, the encoded code stream includes one group of code stream packets corresponding to that frame. Each group includes a plurality of code stream packets, and each code stream packet includes at least an error correction code and the metadata information of the frame of RAW image to which the packet corresponds.
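The packet layout described above can be modeled as a small data structure. The field names below (`frame_index`, `exposure_time_s`, `iso`, `scene_type`, `ecc`) are hypothetical; the text only states that each packet carries at least an error correction code and the metadata of its frame. Grouping packets by frame index yields one group per RAW frame, matching the one-to-one correspondence described.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FrameMetadata:
    frame_index: int        # position of the frame in the burst
    exposure_time_s: float  # hypothetical metadata fields
    iso: int
    scene_type: str


@dataclass(frozen=True)
class CodeStreamPacket:
    metadata: FrameMetadata  # metadata of the RAW frame this packet belongs to
    ecc: bytes               # error correction code bits for part of the frame


def group_packets(packets: List[CodeStreamPacket]) -> Dict[int, List[CodeStreamPacket]]:
    """Collect packets into one group per RAW frame, keyed by frame index."""
    groups: Dict[int, List[CodeStreamPacket]] = {}
    for packet in packets:
        groups.setdefault(packet.metadata.frame_index, []).append(packet)
    return groups
```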
- the decoding module is specifically configured to decode the received code stream packets of a frame of RAW image by intra-frame prediction based on an initial predicted value, obtaining the reconstructed RAW image corresponding to that frame.
- the decoding module is specifically configured to decode the received code stream packets of the first frame of RAW image by intra-frame prediction based on the initial predicted value, obtaining the reconstructed RAW image corresponding to the first frame; and, for each frame of RAW image received after the first frame, decode its code stream packets according to at least one already-decoded reconstructed RAW image and the inter-frame correlation between the multiple frames of RAW images, obtaining the reconstructed RAW image corresponding to each frame after the first.
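The decoding order above (intra-frame prediction for the first frame, then prediction from already decoded frames) can be sketched as follows. This is a simplification, not distributed source decoding itself: the error-correction decoding of each packet against side information is replaced by adding a plain per-sample correction, and the first frame uses simple previous-sample (DPCM) prediction seeded by the initial predicted value. The scalar `correlations` and all function names are assumptions.

```python
from typing import List, Sequence


def decode_first_frame(corrections: Sequence[int], initial_prediction: int) -> List[int]:
    """Intra-frame: predict each sample from the previously decoded sample."""
    recon: List[int] = []
    prediction = initial_prediction
    for correction in corrections:
        sample = prediction + correction
        recon.append(sample)
        prediction = sample  # the next sample is predicted from this one
    return recon


def decode_later_frame(corrections: Sequence[int], previous: Sequence[int],
                       correlation: float) -> List[int]:
    """Inter-frame: predict from a decoded frame scaled by the assumed correlation."""
    return [round(p * correlation) + c for p, c in zip(previous, corrections)]


def decode_sequence(packets: Sequence[Sequence[int]], initial_prediction: int,
                    correlations: Sequence[float]) -> List[List[int]]:
    """Decode frames in arrival order; correlations[k] links frame k to frame k-1."""
    frames = [decode_first_frame(packets[0], initial_prediction)]
    for k in range(1, len(packets)):
        frames.append(decode_later_frame(packets[k], frames[-1], correlations[k]))
    return frames
```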
- the processing module is specifically configured to fuse the multiple frames of reconstructed RAW images into one frame of reconstructed RAW image in the RAW domain; convert the fused reconstructed RAW image from the RAW domain to the YUV domain, obtaining the corresponding YUV image; and encode that YUV image into the first format, obtaining the image in the first format corresponding to the current photographing scene.
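The RAW-domain fusion step for an HDR burst can be illustrated with an exposure-normalized merge. This is one common approach, assumed here rather than taken from the text: each linear RAW sample has the black level subtracted and is divided by its relative exposure, clipped samples are skipped, and the remaining estimates are averaged. Frames are flat, aligned lists of samples; the black level of 64 and white level of 4095 are illustrative defaults.

```python
from typing import List, Sequence


def merge_hdr_raw(frames: Sequence[Sequence[int]],
                  exposures: Sequence[float],
                  black_level: int = 64,
                  white_level: int = 4095) -> List[float]:
    """Merge aligned, bracketed RAW frames into one linear estimate per sample."""
    merged: List[float] = []
    for samples in zip(*frames):  # the same sample position in every frame
        total, count = 0.0, 0
        for value, exposure in zip(samples, exposures):
            if value >= white_level:  # clipped: no usable brightness information
                continue
            total += (value - black_level) / exposure  # normalize to unit exposure
            count += 1
        if count == 0:
            # Clipped in every frame: fall back to the shortest exposure.
            i = min(range(len(exposures)), key=lambda k: exposures[k])
            total, count = (samples[i] - black_level) / exposures[i], 1
        merged.append(total / count)
    return merged
```

The merged values are linear and may exceed the single-frame white level, which is why a tone-mapping step such as DRC is still needed before the YUV conversion.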
- the processing module is specifically configured to convert the multiple frames of reconstructed RAW images from the RAW domain to the YUV domain, obtaining multiple frames of YUV images in one-to-one correspondence with them; fuse those YUV images into one frame of YUV image in the YUV domain; and encode the fused YUV image into the first format, obtaining the image in the first format corresponding to the current photographing scene.
- an embodiment of the present application provides an electronic device, including a processor and a memory for storing instructions executable by the processor; when the processor executes the instructions, the electronic device implements the image processing method described in the sixth aspect.
- the electronic device may be a cloud server, a server cluster, a cloud platform, and the like.
- an embodiment of the present application provides a computer-readable storage medium on which computer program instructions are stored; when the computer program instructions are executed by an electronic device, the electronic device can implement the image processing method described in the sixth aspect.
- the embodiments of the present application further provide a computer program product, including computer-readable code; when the computer-readable code is executed in an electronic device, the electronic device implements the image processing method described in the sixth aspect.
- FIG. 1 shows a schematic diagram of a photographing principle.
- FIG. 2 shows a schematic structural diagram of a device-cloud collaboration system provided by an embodiment of the present application.
- FIG. 3 shows a schematic structural diagram of a terminal device provided by an embodiment of the present application.
- FIG. 4 shows a schematic diagram of interaction between a mobile phone and a cloud provided by an embodiment of the present application.
- FIG. 5 shows a schematic diagram of a photographing interface provided by an embodiment of the present application.
- FIG. 6 shows another schematic diagram of a photographing interface provided by an embodiment of the present application.
- FIG. 7 shows another schematic diagram of a photographing interface provided by an embodiment of the present application.
- FIG. 8 shows another schematic diagram of a photographing interface provided by an embodiment of the present application.
- FIG. 9 shows a schematic diagram of an encoding module provided by an embodiment of the present application.
- FIG. 10 shows a schematic diagram of a decoding module provided by an embodiment of the present application.
- FIG. 11 shows a schematic diagram of the RGGB arrangement of a RAW image.
- FIG. 12 shows another schematic diagram of an encoding module provided by an embodiment of the present application.
- FIG. 13 shows another schematic diagram of a decoding module provided by an embodiment of the present application.
- FIG. 14 shows a schematic diagram of a processing flow of a decoding module provided by an embodiment of the present application.
- FIG. 15 shows a schematic structural diagram of an image processing apparatus provided by an embodiment of the present application.
- FIG. 16 shows a schematic structural diagram of another image processing apparatus provided by an embodiment of the present application.
- the embodiments of the present application may be applicable to a scene in which a terminal device with a photographing function performs photographing.
- the terminal device may be a mobile terminal such as a mobile phone, a tablet computer, a wearable device, a vehicle-mounted device, an augmented reality (AR)/virtual reality (VR) device, a laptop computer, an ultra-mobile personal computer (UMPC), a netbook, or a personal digital assistant (PDA), or may be a shooting device such as a professional digital camera, an SLR camera/mirrorless camera, an action camera, a PTZ camera, or a drone. The embodiments of the present application do not limit the specific type of the terminal device.
- FIG. 1 shows a schematic diagram of a photographing principle.
- a camera module 110 (also called a camera assembly) of a mobile phone includes a lens 111 and a sensor 112.
- the lens 111 of the camera module 110 can acquire the light signal corresponding to the photographed object in the photographing scene.
- the sensor 112 of the camera module 110 can convert the optical signal passing through the lens 111 into an electrical signal, and then perform analog-to-digital (A/D) conversion on the electrical signal, and output the corresponding digital signal to the intermediate processing module 120.
- the digital signal output by the sensor 112 to the intermediate processing module 120 is the original image captured by the camera module 110, which can be called a RAW image or a digital negative.
- the intermediate processing module 120 may perform a series of processing on the received RAW image, and finally obtain an image that can be used for display, such as a JPEG image.
- the JPEG image may be transmitted to the display screen 130 of the mobile phone for display, and/or transmitted to the memory 140 of the mobile phone for storage.
- the process in which the intermediate processing module 120 processes the RAW image to generate the JPEG image may include: performing image signal processing (ISP) on the RAW image to convert it from the RAW domain to the YUV domain (an image in the YUV domain can be called a YUV image); then processing the YUV image with YUV domain post-processing algorithms; and finally encoding the processed YUV image with the JPEG encoding method to obtain the JPEG image.
- the ISP processing may include: bad pixel correction (DPC), RAW domain noise reduction, black level correction (BLC), lens shading correction (LSC), automatic white balance (AWB), demosaicing (color interpolation), color correction (color correction matrix, CCM), dynamic range compression (DRC), gamma correction, 3D lookup table (LUT), YUV domain noise reduction, sharpening, detail enhancement, etc.
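A few of the ISP steps listed above can be chained in toy form to show how RAW samples become a YUV pixel. This sketch assumes a single 2x2 RGGB Bayer quad (R, Gr, Gb, B); it performs black level correction, applies white balance gains, replaces real demosaicing with a crude step that turns the quad into one RGB pixel by averaging the two greens, applies a simple power-law gamma (not the exact sRGB curve), and converts to YUV with full-range BT.601 coefficients. The black level, white level, gains, and gamma value are illustrative assumptions.

```python
from typing import Tuple


def isp_rggb_quad(quad: Tuple[int, int, int, int],
                  black_level: int = 64,
                  white_level: int = 1023,
                  wb_gains: Tuple[float, float, float] = (2.0, 1.0, 1.6),
                  gamma: float = 2.2) -> Tuple[float, float, float]:
    """Toy ISP for one RGGB quad: BLC -> AWB -> crude demosaic -> gamma -> YUV."""
    scale = white_level - black_level
    # Black level correction (BLC), normalized to [0, 1].
    r, gr, gb, b = [max(0, v - black_level) / scale for v in quad]
    # AWB gains; crude demosaic: one RGB pixel per quad, greens averaged.
    rgb = (r * wb_gains[0], (gr + gb) / 2 * wb_gains[1], b * wb_gains[2])
    # Clip, apply power-law gamma, and scale to the 8-bit range.
    red, green, blue = [255 * min(1.0, c) ** (1 / gamma) for c in rgb]
    # RGB -> YUV (full-range BT.601, as in JPEG/JFIF).
    y = 0.299 * red + 0.587 * green + 0.114 * blue
    u = -0.168736 * red - 0.331264 * green + 0.5 * blue + 128
    v = 0.5 * red - 0.418688 * green - 0.081312 * blue + 128
    return (y, u, v)
```

A real pipeline runs these steps on full frames, interpolates the missing colors at every photosite, and inserts LSC, CCM, DRC and noise reduction in between.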
- YUV domain post-processing algorithms can include: multi-frame registration, fusion, and noise reduction for high-dynamic-range (HDR) images, super resolution (SR) algorithms for improving clarity, skin beautification algorithms, distortion correction algorithms, blur algorithms, etc.
- when the intermediate processing module 120 processes the RAW image to generate the JPEG image, migrating some image processing algorithms from the YUV domain to the RAW domain can achieve a better image processing effect.
- image processing algorithms such as multi-frame registration, fusion, and noise reduction of HDR can be migrated from the YUV domain to the RAW domain.
- the benefits of image processing in the RAW domain can include: RAW images have a higher bit depth than YUV images, and because RAW images have not been processed by the ISP, information such as color and detail has not been damaged.
- however, image processing in the RAW domain involves a larger amount of data and places higher demands on algorithm performance and memory.
- the computing and memory resources of the terminal device are limited. Therefore, when some image processing algorithms are migrated from the YUV domain to the RAW domain, they are constrained by the terminal device, which easily limits the processing effect. For example, some image processing algorithms may need to be trimmed and adapted to the computing power of the terminal device, resulting in unsatisfactory processing results.
- an embodiment of the present application provides an image processing method, in which the terminal device can upload the collected RAW image that needs to be processed to the cloud.
- the cloud can make full use of big data resources and computing resources to perform RAW domain image processing, ISP processing, and YUV domain processing on RAW images to obtain the final first-format image and send it back to the terminal device.
- the first format may include JPEG format, high efficiency image file format (high efficiency image file format, HEIF), etc., and this application does not limit the first format.
- this image processing method avoids the problem that image processing algorithms migrated from the YUV domain to the RAW domain are constrained by the terminal device, which limits the processing effect; it makes full use of the big data resources and computing resources of the cloud to perform RAW domain image processing, ISP processing, and YUV domain processing on RAW images, achieving better image processing results.
- FIG. 2 shows a schematic structural diagram of a terminal-cloud collaboration system provided by an embodiment of the present application.
- the terminal-cloud collaboration system may include a terminal device 210 and a cloud 220, and the terminal device 210 can connect to the cloud 220 through a wireless network.
- the cloud 220 may be a computer server or a server cluster composed of multiple servers; this application does not limit the implementation architecture of the cloud 220.
- for the specific form of the terminal device 210, reference may be made to the description in the foregoing embodiments, and details are not repeated here.
- a terminal device 210 is exemplarily shown in FIG. 2 .
- the terminal device 210 in the terminal-cloud collaboration system may include one or more, and the multiple terminal devices 210 may be the same or different, which is not limited herein.
- the image processing method provided by the embodiments of the present application is a process in which each terminal device 210 interacts with the cloud 220 to implement image processing.
- FIG. 3 shows a schematic structural diagram of a terminal device provided by an embodiment of the present application.
- the mobile phone may include a processor 310, an external memory interface 320, an internal memory 321, a universal serial bus (USB) interface 330, a charging management module 340, a power management module 341, a battery 342, an antenna 1, an antenna 2, a mobile communication module 350, a wireless communication module 360, an audio module 370, a speaker 370A, a receiver 370B, a microphone 370C, an earphone interface 370D, a sensor module 380, a key 390, a motor 391, an indicator 392, a camera 393, a display screen 394, a subscriber identification module (SIM) card interface 395, and the like.
- the processor 310 may include one or more processing units; for example, the processor 310 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc. Different processing units may be independent devices or may be integrated in one or more processors.
- the controller can be the nerve center and command center of the mobile phone.
- the controller can generate an operation control signal according to the instruction operation code and timing signal, and complete the control of fetching and executing instructions.
- a memory may also be provided in the processor 310 for storing instructions and data.
- the memory in the processor 310 is a cache memory. This memory may hold instructions or data that the processor 310 has just used or uses cyclically. If the processor 310 needs to use the instructions or data again, it can fetch them directly from this memory, which avoids repeated accesses and reduces the waiting time of the processor 310, thereby improving system efficiency.
- processor 310 may include one or more interfaces.
- the interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a SIM interface, and/or a USB interface, etc.
- the external memory interface 320 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the mobile phone.
- the external memory card communicates with the processor 310 through the external memory interface 320 to implement the data storage function, for example, saving files such as music and videos in the external memory card.
- Internal memory 321 may be used to store computer executable program code, which includes instructions.
- the processor 310 executes various functional applications and data processing of the mobile phone by executing the instructions stored in the internal memory 321 .
- the internal memory 321 may include a storage program area and a storage data area.
- the storage program area can store an operating system, an application program required for at least one function (such as a sound playback function, an image playback function, etc.), and the like.
- the storage data area can store data (such as image data, phone book, etc.) created during the use of the mobile phone.
- the internal memory 321 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, universal flash storage (UFS), and the like.
- the charging management module 340 is used to receive charging input from the charger. While the charging management module 340 charges the battery 342 , it can also supply power to the mobile phone through the power management module 341 .
- the power management module 341 is used to connect the battery 342 , the charging management module 340 , and the processor 310 .
- the power management module 341 can also receive the input of the battery 342 to supply power to the mobile phone.
- the wireless communication function of the mobile phone can be realized by the antenna 1, the antenna 2, the mobile communication module 350, the wireless communication module 360, the modulation and demodulation processor, the baseband processor, and the like.
- Antenna 1 and Antenna 2 are used to transmit and receive electromagnetic wave signals.
- Each antenna in a cell phone can be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve antenna utilization.
- the antenna 1 can be multiplexed as a diversity antenna of the wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
- the mobile phone can implement audio functions, such as music playback and recording, through the audio module 370, the speaker 370A, the receiver 370B, the microphone 370C, the earphone interface 370D, the application processor, and the like.
- the sensor module 380 may include a pressure sensor 380A, a gyro sensor 380B, an air pressure sensor 380C, a magnetic sensor 380D, an acceleration sensor 380E, a distance sensor 380F, a proximity light sensor 380G, a fingerprint sensor 380H, a temperature sensor 380J, a touch sensor 380K, and an ambient light sensor 380L, bone conduction sensor 380M, etc.
- the camera 393 may include various types.
- the camera 393 may include a telephoto camera with different focal lengths, a wide-angle camera or an ultra-wide-angle camera, and the like.
- the telephoto camera has a small field of view and is suitable for shooting scenes in a small range in the distance;
- the wide-angle camera has a larger field of view;
- the ultra-wide-angle camera has a larger field of view than the wide-angle camera and can be used to shoot panoramas and other wide scenes.
- the telephoto camera with a smaller field of view can be rotated to capture scenes in different ranges.
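The difference in field of view between these cameras follows from focal length. For a rectilinear lens, the diagonal field of view is 2·atan(d / 2f), where d is the sensor diagonal and f the focal length. The sketch below evaluates this with 35 mm-equivalent focal lengths against the 36 mm × 24 mm full-frame diagonal; the 13 mm, 24 mm, and 77 mm values are illustrative assumptions, not taken from the text.

```python
import math

# Diagonal of a 36 mm x 24 mm full-frame sensor, about 43.27 mm.
FULL_FRAME_DIAGONAL_MM = math.hypot(36.0, 24.0)


def diagonal_fov_deg(focal_length_mm: float,
                     diagonal_mm: float = FULL_FRAME_DIAGONAL_MM) -> float:
    """Diagonal field of view in degrees for a rectilinear lens."""
    return math.degrees(2 * math.atan(diagonal_mm / (2 * focal_length_mm)))


for name, f in [("ultra-wide", 13.0), ("wide", 24.0), ("telephoto", 77.0)]:
    print(f"{name}: {diagonal_fov_deg(f):.1f} degrees")
```

For these example values the results are roughly 118°, 84°, and 31°, which is why the telephoto camera covers only a small, distant part of the scene.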
- the phone can capture RAW images through the camera 393.
- the specific structure of the camera 393 can refer to the camera module described in FIG. 1 , which at least includes a lens and a sensor.
- the sensor can convert the optical signal passing through the lens into an electrical signal, and then perform A/D conversion on the electrical signal to output the corresponding digital signal.
- the digital signal is the RAW image.
- Subsequent RAW domain processing, ISP processing, and YUV domain processing are performed on the RAW image to convert the RAW image into an image visible to the naked eye.
- the photosensitive element of the sensor may be a charge coupled device (CCD), and the sensor also includes an A/D converter.
- the photosensitive element of the sensor may be a complementary metal-oxide-semiconductor (CMOS).
- Display screen 394 is used to display images, videos, and the like.
- Display screen 394 includes a display panel.
- the display panel can be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), and so on.
- the cell phone may include 1 or N display screens 394, where N is a positive integer greater than 1.
- the display screen 394 may be used to display a photo-taking interface, a photo-playing interface, and the like.
- the mobile phone realizes the display function through the GPU, the display screen 394, and the application processor.
- the GPU is a microprocessor for image processing, and is connected to the display screen 394 and the application processor.
- the GPU is used to perform mathematical and geometric calculations for graphics rendering.
- Processor 310 may include one or more GPUs that execute program instructions to generate or alter display information.
- the structure shown in FIG. 3 does not constitute a specific limitation on the mobile phone.
- the mobile phone may also include more or less components than those shown in FIG. 3, or some components may be combined, or some components may be separated, or different component arrangements, and the like.
- some of the components shown in FIG. 3 may be implemented in hardware, software, or a combination of software and hardware.
- when the terminal device is another mobile terminal such as a tablet computer, a wearable device, a vehicle-mounted device, an AR/VR device, a notebook computer, a UMPC, a netbook, or a PDA, or is a shooting device such as a digital camera, an SLR camera/mirrorless camera, an action camera, a PTZ camera, or a drone, the specific structure of that terminal device can also refer to FIG. 3.
- other terminal devices may have components added or reduced on the basis of the structure given in FIG. 3 , which will not be repeated here.
- for a terminal device such as a mobile phone, the photographing application may include the system-level application "Camera".
- the photographing application may further include other application programs installed in the terminal device that can be used for photographing.
- the following takes a mobile phone as an example of the terminal device and, with reference to the terminal-cloud collaboration system shown in FIG. 2, gives an exemplary description of how the mobile phone collects RAW images when taking pictures and uploads them to the cloud for image processing. It should be understood that the image processing process given in the following embodiments is also applicable to scenarios in which other terminal devices with a photographing function interact with the cloud.
- FIG. 4 shows a schematic diagram of interaction between a mobile phone and a cloud provided by an embodiment of the present application.
- the terminal side represents the mobile phone side
- the cloud side represents the cloud side.
- the mobile phone may at least include a camera module and an encoding module.
- the mobile phone can collect RAW images through the camera module.
- the mobile phone can encode the RAW image collected by the camera module through the encoding module to obtain the encoded code stream corresponding to the RAW image, and upload the encoded code stream corresponding to the RAW image to the cloud.
- the cloud may at least include a decoding module, a RAW domain post-processing module, an ISP module, a YUV domain post-processing module, and a first format encoder.
- the cloud can decode the encoded code stream corresponding to the RAW image from the mobile phone through the decoding module to obtain the reconstructed RAW image.
- the RAW domain post-processing module, ISP module, and YUV domain post-processing module can sequentially perform RAW domain image processing, ISP processing, and YUV domain image processing on the reconstructed RAW image, and the YUV domain post-processing module will output a frame of YUV image.
- the YUV image output by the YUV domain post-processing module can be encoded in the first format, and finally an image in the first format (eg, a JPEG image) can be obtained.
- the cloud can then transmit the image in the first format back to the phone.
- the mobile phone can save the image in the first format in the gallery or present it to the user.
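The end-to-end flow of FIG. 4 can be mimicked with a toy round trip. This is a minimal sketch under loudly stated assumptions: `zlib` (lossless) stands in for the terminal's encoding module and the cloud's decoding module, which the text describes as working on quantized compressed features; a 12-bit to 8-bit shift stands in for the RAW domain post-processing, ISP, and YUV domain post-processing modules; and the `FMT1` header is a hypothetical placeholder for a real first-format (JPEG or HEIF) encoder. Function names are hypothetical.

```python
import zlib
from array import array
from typing import List


def phone_encode(raw_samples: List[int]) -> bytes:
    """Terminal side: pack 16-bit RAW samples and compress them for upload."""
    return zlib.compress(array("H", raw_samples).tobytes())


def cloud_process(code_stream: bytes) -> bytes:
    """Cloud side: decode, 'process', and 'encode' into a placeholder first format."""
    samples = array("H")
    samples.frombytes(zlib.decompress(code_stream))  # reconstructed RAW samples
    # Placeholder for RAW post-processing, ISP and YUV post-processing:
    # reduce 12-bit samples to 8 bits.
    pixels = bytes(min(255, s >> 4) for s in samples)
    # Placeholder for the first-format encoder (JPEG/HEIF in a real system).
    return b"FMT1" + pixels
```

The bytes returned by `cloud_process` correspond to the image in the first format that the phone saves to the gallery or shows to the user.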
- the user may first start the camera application of the mobile phone. For example, the user can click or touch the icon of the camera on the mobile phone, and the mobile phone can start the camera in response to the user's click or touch operation on the icon of the camera (or, the user can also activate the camera through a voice assistant, without limitation).
- after the mobile phone starts and runs the photographing application, it presents a photographing interface to the user; at the same time, the mobile phone obtains the preview picture corresponding to the current photographing scene and displays it in the photographing interface.
- FIG. 5 shows a schematic diagram of a photographing interface provided by an embodiment of the present application. As shown in FIG. 5 , when the photographing application of the mobile phone is started, the photographing interface presented by the mobile phone to the user may at least include: a preview screen and a photographing button.
- the process of acquiring the preview image is similar to the photographing principle shown in the aforementioned FIG. 1 , for example, the mobile phone can collect the RAW image corresponding to the current photographing scene through the camera module. Then, the mobile phone's ISP module, YUV domain post-processing module, etc. (the structure of the mobile phone's ISP module, YUV domain post-processing module, etc. is not shown in Figure 4) can process the RAW image to obtain a preview screen that can be displayed on the camera interface .
- the processing of the RAW image is relatively simple when the mobile phone obtains the preview image.
- for example, a YUV image can be obtained by performing some simple ISP processing on the RAW image; the YUV image is then directly converted into an RGB preview picture and displayed in the photographing interface, without JPEG encoding.
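The YUV-to-RGB conversion used for the preview can be sketched as below, assuming full-range BT.601 (JFIF) coefficients and 8-bit samples. An actual camera pipeline may use BT.709 or limited-range video levels instead, so the matrix here is an assumption.

```python
from typing import Tuple


def _clamp8(c: float) -> int:
    """Round and clamp a component to the 8-bit range [0, 255]."""
    return max(0, min(255, round(c)))


def yuv_to_rgb(y: float, u: float, v: float) -> Tuple[int, int, int]:
    """Full-range BT.601 (JFIF) YCbCr -> 8-bit RGB."""
    r = y + 1.402 * (v - 128)
    g = y - 0.344136 * (u - 128) - 0.714136 * (v - 128)
    b = y + 1.772 * (u - 128)
    return (_clamp8(r), _clamp8(g), _clamp8(b))
```

Neutral chroma (U = V = 128) maps to gray, so `yuv_to_rgb(128, 128, 128)` gives `(128, 128, 128)`; strongly saturated inputs are clipped at 0 or 255.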
- the essence of the photographing button may be a functional control displayed in the photographing interface.
- the user can click or touch the functional control, and the mobile phone can collect the RAW image through the camera module in response to the user's click or touch operation on the functional control of the photo-taking button.
- the function of the camera button can also be implemented by other physical buttons on the mobile phone, which is not limited.
- the mobile phone also has a scene detection function. In this case, collecting the RAW image through the camera module may proceed as follows: the mobile phone first detects the current photographing scene by using the scene detection function and determines, according to the detection result, the image output requirements of the sensor in the camera module; the mobile phone then collects the RAW image through the camera module according to those requirements.
- for example, when the user opens the camera application of the mobile phone to take a photo and the mobile phone detects that the current photographing scene is a high-dynamic-range scene (that is, an HDR scene), it can determine that the sensor needs to output multiple frames of RAW images with different exposure values (EV) for multi-frame fusion to generate a high-dynamic-range image. The mobile phone can then collect multiple frames of RAW images with different EV values through the camera module according to the sensor output requirements determined by scene detection. The mobile phone can configure the camera module with different exposures and different sensitivities (sensitivity can be represented by an ISO value) to meet the EV value required for each frame. That is, in an HDR scene, the sensor needs to output multiple frames of RAW images with different exposures and different ISOs.
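Assuming a fixed aperture, as on many phone cameras, image brightness is proportional to exposure time × ISO, so the EV offset of a frame relative to a base frame is log2((t·ISO) / (t0·ISO0)). The sketch below uses this relation to pick an exposure time and ISO for a requested EV offset, lengthening the exposure first and raising the ISO once a maximum exposure time (for example, a hand-held blur limit) is reached. The strategy and the function names are illustrative assumptions.

```python
import math
from typing import Tuple


def ev_offset(time_s: float, iso: float, base_time_s: float, base_iso: float) -> float:
    """Exposure difference in stops (EV) relative to the base frame."""
    return math.log2((time_s * iso) / (base_time_s * base_iso))


def exposure_for_ev(base_time_s: float, base_iso: float, ev: float,
                    max_time_s: float) -> Tuple[float, float]:
    """Return (exposure_time_s, iso) giving `ev` stops relative to the base frame."""
    target = base_time_s * base_iso * 2.0 ** ev          # required time x ISO product
    time_s = min(base_time_s * 2.0 ** ev, max_time_s)     # adjust exposure time first
    iso = target / time_s                                 # make up the rest with ISO
    return time_s, iso
```

For a base frame of 1/100 s at ISO 100 and a 1/50 s limit, `exposure_for_ev(0.01, 100, 2, 0.02)` returns 1/50 s at ISO 200 (up to floating-point rounding), while an EV-1 frame simply uses 1/200 s at ISO 100.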
- when the user opens the camera application of the mobile phone to take a photo and the mobile phone detects that the current photographing scene is a low-brightness scene, it can determine that the sensor needs to output multiple frames of images with different exposures and different ISOs for multi-frame fusion noise reduction. The mobile phone can then collect multiple frames of RAW images with different exposures and different ISOs through the camera module according to the sensor output requirements determined by scene detection.
- For another example, when the user opens the camera application to take a photo and the mobile phone detects that the current photographing scene is a scene with insufficient depth of field, it can determine that the sensor needs to output multiple frames of images at different focusing distances for multi-frame fusion to extend the depth of field (extended depth of field, EDOF). Then, the mobile phone can collect multiple frames of RAW images at different focusing distances through the camera module according to the sensor's image output requirements determined by scene detection.
- the scene detection function of the mobile phone can be implemented by deploying a scene detection module in the mobile phone.
- the scene detection module may be a program module (or algorithm unit) in the mobile phone.
- the scene detection module can detect the photo-taking scene selected by the user in the photo-taking application as the current photo-taking scene.
- FIG. 6 shows another schematic diagram of a photographing interface provided by an embodiment of the present application.
- When the user opens the camera application of the mobile phone, the mobile phone can provide the photographing interface shown in FIG. 6, which displays the preview image corresponding to the current photographing scene.
- The photographing interface further includes a function control corresponding to at least one scene, such as the function control HDR shown in FIG. 6.
- The scene detection module of the mobile phone can detect the user's click or touch operation on the function control HDR and, in response, determine that the current photographing scene is an HDR scene and that the sensor needs to output multiple frames of images with different exposures and different sensitivities for multi-frame fusion to generate a high-dynamic-range image.
- Then, in response to the user's click or touch operation on the photographing button, the mobile phone can capture, through the camera module, multiple frames of RAW images with different exposures and different ISOs, according to the sensor's requirement to output multiple frames of images with different exposures and different sensitivities.
- The scene detection module may also determine the current photographing scene according to sensor data of the mobile phone and/or the preview image collected by the camera module. For example, after the mobile phone starts and runs the camera application, the scene detection module can determine, from the data collected by the ambient light sensor and/or the preview image collected by the camera module, that the current photographing scene is a low-brightness scene, and determine that the sensor needs to output multiple frames of images with different exposures and different ISOs for multi-frame fusion denoising.
- Then, in response to the user's click or touch operation on the photographing button, the mobile phone can capture, through the camera module, multiple frames of RAW images with different exposures and different ISOs, according to the sensor's requirement to output multiple frames of images with different exposures and different sensitivities.
- the scene detection module can determine whether the current photographing scene is an HDR scene according to the proportion of the overexposed area and/or the underexposed area in the preview image collected by the camera module. For example, if the proportion of the overexposed area is greater than a certain threshold, it is determined that the current photographing scene is an HDR scene.
- the threshold may be 60%, 70%, etc., which is not limited here.
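- The proportion test described above can be sketched as follows. This is a minimal illustration, not the application's implementation: it assumes the preview is available as a 2-D list of 8-bit luma values, treats a pixel as overexposed when its luma is at least a hypothetical level of 250, and uses the 60% example threshold. The function names are hypothetical.

```python
def overexposed_ratio(luma, overexposed_level=250):
    """Return the fraction of preview pixels whose luma is >= overexposed_level."""
    pixels = [v for row in luma for v in row]
    if not pixels:
        raise ValueError("empty preview image")
    return sum(1 for v in pixels if v >= overexposed_level) / len(pixels)


def is_hdr_scene(luma, threshold=0.6, overexposed_level=250):
    """Hypothetical rule: HDR scene if the overexposed proportion exceeds threshold."""
    return overexposed_ratio(luma, overexposed_level) > threshold


# Usage: a 2x2 preview in which three of the four pixels are clipped.
preview = [[255, 255], [255, 10]]
print(overexposed_ratio(preview), is_hdr_scene(preview))  # 0.75 True
```

An underexposed-area proportion could be tested the same way with a low luma level; how the two proportions are combined is left open by the text.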
- Then, in response to the user's click or touch operation on the photographing button, the mobile phone can capture, through the camera module, multiple frames of RAW images with different exposures and different ISOs, according to the sensor's requirement to output multiple frames of images with different exposures and different sensitivities.
- the present application does not limit the specific implementation of the scene detection module here.
- If the scene detection module detects that the current photographing scene is an ordinary scene (that is, not one of the special scenes above such as HDR, low brightness, or insufficient depth of field), it can determine that the sensor needs to output one frame of image. Then, when the user takes a photo with the mobile phone, the mobile phone can collect one frame of RAW image through the camera module according to that image output requirement of the sensor.
- the acquisition of a RAW image by a mobile phone through a camera module may include two scenarios: acquisition of a single frame of RAW image and acquisition of multiple frames of RAW image.
- The mobile phone encodes the RAW image collected by the camera module through the encoding module to obtain the corresponding encoded code stream and uploads the code stream to the cloud. In one implementation, regardless of whether the camera module collects one frame or multiple frames of RAW images, the mobile phone encodes the collected RAW images to obtain the corresponding encoded code stream and uploads it to the cloud.
- the mobile phone will upload the RAW images collected by the camera module to the cloud for processing.
- If the camera module collects one frame of RAW image, the mobile phone encodes it through the encoding module to obtain the corresponding encoded code stream and uploads the code stream to the cloud.
- If the camera module collects multiple frames of RAW images, the mobile phone encodes them through the encoding module to obtain the encoded code stream corresponding to the multi-frame RAW images and uploads that code stream to the cloud.
- In another implementation, if the camera module collects one frame of RAW image, the mobile phone directly performs ISP processing and YUV domain image processing on it through the local (that is, mobile-phone-side) ISP module and YUV domain post-processing module, and the YUV domain post-processing module outputs one frame of YUV image.
- the mobile phone can encode the YUV image output by the YUV domain post-processing module in the first format through the local first format encoder, and finally obtain an image in the first format (eg, a JPEG image).
- the mobile phone can save the image in the first format in the gallery or present it to the user.
- For the processing of one frame of RAW image, reference may be made to the process shown in FIG. 1, which is not described in detail here.
- In this implementation, the mobile phone can automatically determine, according to the photographing scene corresponding to the RAW image collected by the camera module, whether that RAW image needs to be uploaded to the cloud for processing: only when the camera module captures multiple frames of RAW images does the mobile phone upload them to the cloud for processing.
- the mobile phone may also have a function for the user to choose whether to upload the RAW image captured by the camera module to the cloud for processing.
- For example, when the mobile phone starts and runs the camera application, it can provide on the photographing interface a function control that lets the user choose whether to upload the RAW image captured by the camera module to the cloud for processing.
- the mobile phone can determine whether to upload the RAW image captured by the camera module to the cloud for processing according to the user's choice.
- FIG. 7 shows another schematic diagram of a photographing interface provided by an embodiment of the present application.
- For example, a prompt message as shown in FIG. 7 may pop up in the photographing interface provided by the mobile phone: "Do you want to upload to the cloud for processing?" At the same time, two function controls, "Yes" and "No", are displayed in the area below the prompt message.
- the mobile phone can determine that the RAW image collected by the camera module needs to be uploaded to the cloud for processing in response to the user's click or touch operation on the "Yes” function control.
- Then, after the mobile phone collects the RAW image through the camera module in response to the user's click or touch operation on the photographing button, the mobile phone encodes the collected RAW image through the encoding module to obtain the corresponding encoded code stream and uploads the code stream to the cloud.
- the mobile phone may, in response to the user's click or touch operation on the "No" function control, determine that the RAW image collected by the camera module does not need to be uploaded to the cloud for processing.
- Then, in the subsequent photographing process, after the mobile phone collects the RAW image through the camera module in response to the user's click or touch operation on the photographing button, the mobile phone directly processes the collected RAW image locally (that is, on the mobile phone side); for the specific processing, reference may also be made to the process shown in FIG. 1.
- the prompt information shown in FIG. 7 and the function controls "Yes” and “No” will only be displayed each time the mobile phone starts and runs the photographing application, for the user to select.
- After the user makes a selection, the prompt information and the function controls "Yes" and "No" disappear from the photographing interface, and the user can continue to take pictures with the mobile phone.
- If the user makes no selection, the mobile phone can select "Yes" or "No" by default and stop displaying the prompt information and the function controls "Yes" and "No".
- the prompt information and the function controls "Yes” and “No” shown in FIG. 7 are only exemplary descriptions.
- the prompt information can also be "Whether to upload to the cloud for processing to obtain better image quality?", "Whether to take pictures with the cloud?", etc.; the function control "Yes” can also be replaced with “OK” , the function control "No” can also be replaced with “Cancel”, etc., which are not limited in this application.
- the present application also does not limit the display area of prompt information and function controls "Yes” and "No” in the shooting interface.
- FIG. 8 shows another schematic diagram of a photographing interface provided by an embodiment of the present application.
- For example, two function controls, "mobile phone processing mode" and "cloud processing mode", can be displayed at the bottom of the preview image in the photographing interface provided by the mobile phone.
- the user can click or touch the function control: “Mobile Phone Processing Mode” or “Cloud Processing Mode” to select.
- the mobile phone can determine that the RAW image collected by the camera module needs to be uploaded to the cloud for processing in response to the user's click or touch operation on the function control "cloud processing mode”.
- Then, after the mobile phone collects the RAW image through the camera module in response to the user's click or touch operation on the photographing button, the mobile phone encodes the collected RAW image through the encoding module to obtain the corresponding encoded code stream and uploads the code stream to the cloud.
- the mobile phone can determine that the RAW image collected by the camera module does not need to be uploaded to the cloud for processing in response to the user's click or touch operation on the function control "Mobile Phone Processing Mode".
- Then, in the subsequent photographing process, after the mobile phone collects the RAW image through the camera module in response to the user's click or touch operation on the photographing button, the mobile phone directly processes the collected RAW image locally (that is, on the mobile phone side); for the specific processing, reference may also be made to the process shown in FIG. 1.
- The function controls "mobile phone processing mode" and "cloud processing mode" shown in FIG. 8 may be displayed for the user to select only each time the mobile phone starts and runs the camera application. If the user does not make any selection within a certain period of time (for example, 5 seconds or 8 seconds), the mobile phone can default to "mobile phone processing mode" or "cloud processing mode" and stop displaying the two function controls.
- the function controls "mobile phone processing mode” and “cloud processing mode” shown in FIG. 8 can also be displayed on the camera interface all the time for the user to select, which is not limited here.
- If the function controls "mobile phone processing mode" and "cloud processing mode" are always displayed and the user does not make any selection within a certain period of time, the mobile phone can also select "mobile phone processing mode" or "cloud processing mode" by default.
- If the user has selected "mobile phone processing mode", the user can re-select "cloud processing mode" to switch. Similarly, the user can also switch from "cloud processing mode" to "mobile phone processing mode".
- the function controls "mobile phone processing mode” and "cloud processing mode” shown in FIG. 8 are also only illustrative.
- “mobile phone processing mode” can also be replaced with “local mode”
- “cloud processing mode” can also be replaced with “cloud mode”, etc., which are not limited in this application.
- the present application does not limit the display area of the function controls "mobile phone processing mode” and “cloud processing mode” in the shooting interface.
- the judgment conditions for whether the mobile phone uploads the RAW images collected by the camera module to the cloud for processing described in the foregoing embodiments may also be partially combined.
- For example, the mobile phone can determine, in response to the user's active selection operation, whether the RAW images collected by the camera module need to be uploaded to the cloud for processing. After the mobile phone determines that they do, in the subsequent photographing process the mobile phone can further decide according to the photographing scene: if the camera module collects multiple frames of RAW images, the collected RAW images are uploaded to the cloud for processing; if the camera module collects one frame of RAW image, it is processed locally on the mobile phone.
- If the mobile phone determines, in response to the user's active selection operation, that the RAW images collected by the camera module do not need to be uploaded to the cloud for processing, then in the subsequent photographing process, regardless of whether the camera module collects multiple frames or one frame of RAW images, the images are processed locally on the mobile phone.
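- The combined decision rule above (user choice first, then the number of frames required by the photographing scene) can be written as a small function. This is a sketch under stated assumptions: the names `ProcessingMode` and `choose_processing` are hypothetical, the user's choice is modeled as a boolean, and the frame count is the number of RAW frames the sensor must output as determined by scene detection.

```python
from enum import Enum


class ProcessingMode(Enum):
    LOCAL = "local"  # process on the mobile phone (as in FIG. 1)
    CLOUD = "cloud"  # encode, upload, and process in the cloud


def choose_processing(user_selected_cloud: bool, frame_count: int) -> ProcessingMode:
    """Hypothetical combination of the two judgment conditions.

    - If the user declined cloud processing, always process locally.
    - Otherwise, upload only when the sensor outputs multiple RAW frames.
    """
    if frame_count < 1:
        raise ValueError("frame_count must be at least 1")
    if not user_selected_cloud:
        return ProcessingMode.LOCAL
    return ProcessingMode.CLOUD if frame_count > 1 else ProcessingMode.LOCAL


# Usage: an 8-frame HDR capture after the user chose cloud processing.
print(choose_processing(True, 8))  # ProcessingMode.CLOUD
```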
- the aforementioned operation of the user selecting to upload the RAW image to the cloud for processing may be referred to as a first selection operation.
- Examples of the first selection operation include the user selecting the function control "Yes" shown in FIG. 7 and the user selecting "cloud processing mode" shown in FIG. 8.
- The following takes the case in which the sensor of the camera module outputs multiple frames of RAW images as an example to describe the process in which the encoding module of the mobile phone encodes the multi-frame RAW images collected by the camera module, and the process in which the decoding module in the cloud decodes the corresponding encoded code stream. It can be understood that when the sensor outputs one frame of RAW image, the encoding of that frame by the encoding module and the decoding of its encoded code stream by the decoding module in the cloud follow the processing of each frame in the multi-frame case, and are not described again.
- FIG. 9 shows a schematic diagram of an encoding module provided by an embodiment of the present application.
- the coding module of the mobile phone includes an artificial intelligence (AI) coding network, a quantization module and an entropy coding module.
- After the camera module collects the multi-frame RAW images, they can first be input into the AI encoding network, which performs AI encoding on them and outputs the corresponding compression features to the quantization module.
- the quantization module can quantize the compressed features corresponding to the multi-frame RAW images, for example, it can convert the floating-point numbers in the compressed features corresponding to the multi-frame RAW images into binary numbers or integers.
- the entropy coding module can perform entropy coding on the compression features corresponding to the multi-frame RAW images quantized by the quantization module, and finally obtain an encoded code stream corresponding to the multi-frame RAW images. That is, the output of the encoding module is the code stream corresponding to the multi-frame RAW image.
- The coding manner of the entropy coding module may include Shannon coding, Huffman coding, arithmetic coding, and the like, which is not limited herein.
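- As an illustration of one of the entropy coding options listed above, the sketch below builds a Huffman code for a sequence of quantized feature values and round-trips it. It is a generic textbook Huffman coder, assuming the quantized compression features are flattened into a list of integers; the application does not specify which entropy coder or symbol model is used, and the function names are hypothetical.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a prefix-free Huffman code table {symbol: bitstring}."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: partial code}).
    # The unique tie_breaker prevents Python from comparing the dicts.
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)  # two least frequent subtrees
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


def huffman_encode(symbols, table):
    return "".join(table[s] for s in symbols)


def huffman_decode(bits, table):
    inverse = {code: s for s, code in table.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:  # prefix-free, so the first match is correct
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code")
    return out


# Usage: frequent values (here 0) get the shortest codes.
symbols = [0, 0, 0, 0, 1, 1, -1, 2]
table = huffman_code(symbols)
bits = huffman_encode(symbols, table)
print(len(bits))  # 14
```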
- FIG. 10 shows a schematic diagram of a decoding module provided by an embodiment of the present application.
- the decoding module in the cloud includes an entropy decoding module, an inverse quantization module, and an AI decoding network.
- After the cloud receives the encoded code stream corresponding to the multi-frame RAW images from the mobile phone, it can first input the code stream into the entropy decoding module. The entropy decoding module performs entropy decoding on the code stream to obtain the quantized compression features corresponding to the multi-frame RAW images and outputs them to the inverse quantization module.
- the inverse quantization module can perform inverse quantization on the compression features corresponding to the quantized multi-frame RAW images output by the entropy decoding module in the opposite way to the quantization module of the mobile phone, and obtain the compression features corresponding to the multi-frame RAW images and output them to the AI decoding network.
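- The quantization and inverse quantization steps can be illustrated with uniform scalar quantization, one simple way to turn floating-point features into integers. The step size of 0.1 and the use of rounding to the nearest integer are assumptions for illustration; the application only states that floating-point values are converted into integers (or binary numbers) and that inverse quantization mirrors the mobile phone's quantization. The function names are hypothetical.

```python
def quantize(features, step=0.1):
    """Uniform scalar quantization: float feature -> nearest integer index."""
    if step <= 0:
        raise ValueError("step must be positive")
    return [round(x / step) for x in features]


def dequantize(indices, step=0.1):
    """Inverse quantization: integer index -> reconstructed float feature."""
    return [q * step for q in indices]


# Usage: each reconstruction error is at most step / 2.
feats = [0.0, 0.26, -1.234, 3.14159]
print(quantize(feats, step=0.1))  # [0, 3, -12, 31]
```

The step size trades bit rate against reconstruction error: a larger step produces fewer distinct integers (cheaper to entropy-code) but a larger worst-case error.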
- the AI decoding network can perform AI decoding on the compressed features corresponding to the multi-frame RAW images, and output the reconstructed RAW images corresponding to the multi-frame RAW images.
- AI encoding network and AI decoding network may be a convolutional neural network (CNN), a recurrent neural network (RNN), etc., which are not limited herein.
- the sensor of the camera module can output RAW images in various formats, such as: Bayer pattern, Foveon X3, Fuji X-E3, etc.
- the following takes the bayer pattern format of RGGB as an example to illustrate the process of AI encoding of multi-frame RAW images by the AI encoding network in this design.
- FIG. 11 shows a schematic diagram of the RGGB format arrangement of a RAW image.
- R represents a red (red) component
- G represents a green (green) component
- B represents a blue (blue) component.
- the information of the R component, the G component, or the B component included in each pixel is represented by different filling patterns.
- The AI encoding network can extract the R, G, G, and B components from their respective positions in each frame of RAW image to form four new sub-images: one containing all R components, one containing all G components in the rows shared with R, one containing all G components in the rows shared with B, and one containing all B components. In this way, the network can learn the intra-frame (spatial) correlation between the components within the image, compress the RAW image based on that correlation, and output the compression feature corresponding to each frame of RAW image.
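- The RGGB split described above can be sketched as below, assuming the RAW frame is a 2-D list (rows × columns) with R at the top-left of each 2×2 cell, as in the RGGB arrangement, and even width and height. The helper names are hypothetical; in the application this rearrangement is part of the AI encoding network, and the inverse reordering is applied after AI decoding.

```python
def pack_rggb(raw):
    """Split an RGGB Bayer frame (h rows x w cols, both even) into four
    h/2 x w/2 planes, in the order R, G (R rows), G (B rows), B."""
    h, w = len(raw), len(raw[0])
    if h % 2 or w % 2:
        raise ValueError("frame dimensions must be even")
    r = [[raw[y][x] for x in range(0, w, 2)] for y in range(0, h, 2)]
    g_r = [[raw[y][x + 1] for x in range(0, w, 2)] for y in range(0, h, 2)]
    g_b = [[raw[y + 1][x] for x in range(0, w, 2)] for y in range(0, h, 2)]
    b = [[raw[y + 1][x + 1] for x in range(0, w, 2)] for y in range(0, h, 2)]
    return [r, g_r, g_b, b]


def unpack_rggb(planes):
    """Inverse of pack_rggb: re-interleave four planes into one Bayer frame."""
    r, g_r, g_b, b = planes
    h2, w2 = len(r), len(r[0])
    raw = [[0] * (2 * w2) for _ in range(2 * h2)]
    for y in range(h2):
        for x in range(w2):
            raw[2 * y][2 * x] = r[y][x]
            raw[2 * y][2 * x + 1] = g_r[y][x]
            raw[2 * y + 1][2 * x] = g_b[y][x]
            raw[2 * y + 1][2 * x + 1] = b[y][x]
    return raw


# Usage: a 2x4 frame whose top row is R G R G and bottom row is G B G B.
frame = [[1, 2, 3, 4], [5, 6, 7, 8]]
print(pack_rggb(frame))  # [[[1, 3]], [[2, 4]], [[5, 7]], [[6, 8]]]
```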
- For example, suppose the camera module collects 8 frames of RAW images with EV values EV0, EV0, EV0, EV0, EV0, EV-2, EV-4, and EV2, respectively.
- the camera module can then feed the aforementioned 8-frame RAW image into the AI encoding network.
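- The AI encoding network can convert each frame of RAW image into a four-channel R/G/G/B data stream, each channel of size w/2*h/2, where w and h are the width and height of the RAW image.
- For the 8 frames this gives a w/2*h/2*32-channel data stream (8 frames × 4 channels), which is the compression feature corresponding to the 8 frames of RAW images.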
- the AI decoding network performs AI decoding on the compressed features corresponding to the multi-frame RAW images in the opposite way, that is, the AI decoding network is the reverse network of the AI encoding network.
- the cloud receives the encoded code stream corresponding to the 8-frame RAW image from the mobile phone, and performs entropy decoding and inverse quantization on the encoded code stream corresponding to the 8-frame RAW image, and the data stream of w/2*h/2*32 channels can be obtained.
- the AI decoding network can perform AI decoding on the data stream of w/2*h/2*32 channels, and obtain a reconstructed image of w/2*h/2*32. After the reconstructed images of w/2*h/2*32 are reordered according to the arrangement structure of RGGB channels, 8 frames of reconstructed RAW images of w*h can be obtained.
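- The channel count in this example follows from 8 frames × 4 components = 32 channels, each of size w/2*h/2. The sketch below shows one possible frame-major channel layout and its inverse; the ordering (channel k holding component k mod 4 of frame k div 4) is an assumption, since the application does not specify how the 32 channels are ordered. Planes are represented by placeholder values, and the function names are hypothetical.

```python
def stack_frames(frames_planes):
    """Concatenate per-frame [R, G, G, B] plane lists into one channel list.

    With N frames this yields 4*N channels; channel k holds component
    k % 4 of frame k // 4 (assumed frame-major layout)."""
    return [plane for planes in frames_planes for plane in planes]


def split_channels(channels):
    """Inverse of stack_frames: regroup 4*N channels into N frames of 4 planes."""
    if len(channels) % 4:
        raise ValueError("channel count must be a multiple of 4")
    return [channels[i:i + 4] for i in range(0, len(channels), 4)]


# Usage: 8 frames, each with placeholder planes tagged (frame, component).
frames = [[(n, c) for c in range(4)] for n in range(8)]
channels = stack_frames(frames)
print(len(channels), channels[13])  # 32 (3, 1)
```

After splitting, each frame's four planes would be re-interleaved by the inverse of the RGGB split to give a w*h reconstructed RAW frame.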
- Then, the RAW domain post-processing module, the ISP module, and the YUV domain post-processing module in the cloud can perform subsequent image processing on the 8 frames of reconstructed RAW images to obtain one frame of YUV image and send it to the first format encoder.
- the first format encoder can encode the YUV image in the first format, and finally obtain an image in the first format (eg, a JPEG image).
- the cloud can then transmit the image in the first format back to the phone.
- the mobile phone can save the image in the first format in the gallery or present it to the user.
- the process of performing subsequent image processing through the RAW domain post-processing module, the ISP module, and the YUV domain post-processing module to obtain a frame of YUV images may be: First, the 8-frame reconstructed RAW image is fused into one frame of RAW image through the RAW domain post-processing module, and then the fused one-frame RAW image is input to the ISP module; the ISP module performs a series of ISP processing on this frame of RAW image to obtain A frame of YUV image is input to the YUV domain post-processing module; the YUV domain post-processing module performs SR, skin beautification, distortion correction, blurring and other processing on this frame of YUV image, and finally obtains the processed YUV image.
- The above multi-frame fusion can also be completed in the YUV domain. That is, the RAW domain post-processing module outputs 8 frames of processed RAW images, which are input to the ISP module.
- The ISP module can perform a series of ISP processing on the 8 frames of processed RAW images to obtain the corresponding 8 frames of YUV images and input them to the YUV domain post-processing module.
- The YUV domain post-processing module can first fuse the 8 frames of YUV images into one frame of YUV image, and then perform SR, skin beautification, distortion correction, blurring, and other processing on the fused YUV image to obtain the processed YUV image.
- both the above-mentioned AI encoding network and AI decoding network are obtained by training a neural network (such as the aforementioned CNN, RNN, etc.) according to sample training data.
- the sample training data may include a sample RAW image and a sample reconstructed RAW image corresponding to the sample RAW image.
- The sample RAW image can be a RAW image output by the sensor of the camera module in different scenarios. The sample reconstructed RAW image is obtained by first performing ISP processing, YUV domain processing, and first format encoding on the sample RAW image to obtain an image in the first format, and then applying reverse degradation and inversion of the above processing to that image in the first format.
- the sample RAW image can be used as the input of the AI encoding network, and the sample reconstructed RAW image can be used as the output of the AI decoding network (the output of the AI encoding network is the input of the AI decoding network) for training.
- the loss between the input of the AI encoding network and the output of the AI decoding network can also be calculated according to the loss function, and the parameters of the AI encoding network and the AI decoding network (such as the weight of neurons) can be optimized.
- The optimization goal is to make the loss between the input of the AI encoding network and the output of the AI decoding network as small as possible.
- the input of the AI encoding network is a RAW image
- the output of the AI decoding network is the reconstructed RAW image.
- The RAW image input to the AI encoding network can be processed sequentially by the RAW domain post-processing module, the ISP module, and the YUV domain post-processing module in the cloud to obtain the corresponding YUV image (hereinafter referred to as the input YUV image), and the reconstructed RAW image output by the AI decoding network can be processed sequentially by the same modules to obtain the corresponding reconstructed YUV image.
- Then, the loss between the input YUV image and the reconstructed YUV image is calculated as the loss between the input of the AI encoding network and the output of the AI decoding network. Accordingly, the parameters of the AI encoding network and the AI decoding network can be optimized so that the loss between the input YUV image and the reconstructed YUV image is as small as possible.
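- The YUV-domain loss described above can be sketched as a wrapper that runs both the encoder input and the decoder output through the same RAW-to-YUV processing before comparing them. The `toy_pipeline` here is a hypothetical stand-in (black-level subtraction, normalization, a 1/2.2 power curve, and 8-bit scaling) for the cloud's RAW domain post-processing, ISP, and YUV domain post-processing modules; the black and white levels are assumed values, and mean squared error is used only as an example loss.

```python
from typing import Callable, List

Image = List[List[float]]


def mse(a: Image, b: Image) -> float:
    """Mean squared error between two equally sized 2-D images."""
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def yuv_domain_loss(raw: Image, rec_raw: Image,
                    pipeline: Callable[[Image], Image]) -> float:
    """Loss between encoder input and decoder output, measured after both
    are passed through the same RAW -> YUV pipeline."""
    return mse(pipeline(raw), pipeline(rec_raw))


def toy_pipeline(raw: Image, black_level: int = 64, white_level: int = 1023,
                 gamma: float = 1 / 2.2) -> Image:
    """Hypothetical stand-in for RAW post-processing + ISP + YUV processing."""
    span = white_level - black_level
    return [[255 * min(max(v - black_level, 0) / span, 1.0) ** gamma for v in row]
            for row in raw]


# Usage: a one-code RAW error in a dark pixel, measured after the toy pipeline.
raw = [[100, 500], [800, 1000]]
rec = [[101, 500], [800, 1000]]
print(yuv_domain_loss(raw, rec, toy_pipeline) > 0)  # True
```

In training, this quantity (or a PSNR/SSIM-based variant) would replace a loss computed directly on the RAW tensors; the pipeline would need to be differentiable for gradient-based optimizers such as SGD.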
- The RAW image input to the AI encoding network may be the sample RAW image in the above sample training data (which may be referred to as the first sample RAW image), or another sample RAW image (which may be referred to as the second sample RAW image).
- The loss function may be based on peak signal-to-noise ratio (PSNR), structural similarity (SSIM), the least absolute error loss (L1 loss), and the like.
- the optimizer algorithm for optimizing the parameters of the AI encoding network and the AI decoding network may include stochastic gradient descent (SGD), batch gradient descent (BGD), and the like.
- Suppose the input YUV image is ori, with size m*n, that is, ori includes m*n pixels.
- The reconstructed RAW image finally output by the AI decoding network is processed by the RAW domain post-processing module, the ISP module, and the YUV domain post-processing module to obtain the reconstructed YUV image rec, whose size is also m*n, that is, rec includes m*n pixels. Both m and n are integers greater than 0.
- the PSNR between ori and rec can be calculated through the following steps 1) and 2).
- MSE represents the mean square error between ori and rec
- m represents the width of ori and rec
- n represents the height of ori and rec
- (i, j) represents the pixel coordinates in ori or rec
- ori(i, j) represents the pixel value of pixel (i, j) in ori, and rec(i, j) represents the pixel value of pixel (i, j) in rec.
- MAX_I is the maximum possible pixel value of ori and rec. For example, if each pixel is represented with 8 significant bits, MAX_I is 255.
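- The formulas for steps 1) and 2) are not reproduced in this text. Consistent with the symbol definitions above, the standard PSNR computation, which these steps presumably follow, is:

```latex
\text{Step 1):}\quad \mathrm{MSE} = \frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n}\bigl(\mathrm{ori}(i,j) - \mathrm{rec}(i,j)\bigr)^{2}

\text{Step 2):}\quad \mathrm{PSNR} = 10\,\log_{10}\!\left(\frac{\mathrm{MAX}_I^{2}}{\mathrm{MSE}}\right)\ \text{dB}
```

For 8-bit images (MAX_I = 255), an MSE of 1 gives a PSNR of about 48.13 dB.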
- the SSIM between ori and rec can be calculated by the following formula (3).
- μ_ori represents the mean value of ori
- μ_rec represents the mean value of rec
- σ_ori,rec represents the covariance between ori and rec
- c1 and c2 represent constants.
- c1 and c2 can be set as follows.
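- Formula (3) and the values of c1 and c2 did not survive in this text. For reference, the standard SSIM definition in the same notation is shown below. It additionally uses the variances σ_ori² and σ_rec², and the constants shown are the commonly used defaults (k1 = 0.01, k2 = 0.03, with L the dynamic range of the pixel values, for example 255 for 8-bit images), not necessarily the values chosen in the application.

```latex
\mathrm{SSIM}(\mathrm{ori},\mathrm{rec}) =
\frac{(2\mu_{ori}\mu_{rec} + c_1)\,(2\sigma_{ori\,rec} + c_2)}
     {(\mu_{ori}^{2} + \mu_{rec}^{2} + c_1)\,(\sigma_{ori}^{2} + \sigma_{rec}^{2} + c_2)},
\qquad c_1 = (k_1 L)^{2},\quad c_2 = (k_2 L)^{2}
```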
- the principle of optimizing the parameters of the AI encoding network and the AI decoding network may be that the values of PSNR and SSIM between ori and rec are as large as possible.
- the optimization can be limited to meet the conditions that the PSNR between ori and rec is greater than the first threshold, and the SSIM between ori and rec is greater than the second threshold.
- The first threshold may be 38 dB, 39 dB, 40 dB, or a larger value, and may be set according to image quality requirements.
- the second threshold may be 0.8, 0.85, 0.9 and other numbers in the range of 0 to 1.
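- The two threshold conditions can be checked with a short sketch. It assumes 8-bit images given as 2-D lists and takes 40 dB and 0.9 from the example threshold values. For brevity it computes SSIM over the whole image as a single window, whereas common SSIM implementations average the statistic over local (for example, Gaussian-weighted) windows. All function names are hypothetical.

```python
import math


def psnr(ori, rec, max_i=255):
    """PSNR in dB between two equally sized 2-D images."""
    diffs = [(a - b) ** 2 for ra, rb in zip(ori, rec) for a, b in zip(ra, rb)]
    mse = sum(diffs) / len(diffs)
    return math.inf if mse == 0 else 10 * math.log10(max_i ** 2 / mse)


def global_ssim(ori, rec, max_i=255, k1=0.01, k2=0.03):
    """Single-window (whole-image) SSIM with the usual default constants."""
    x = [v for row in ori for v in row]
    y = [v for row in rec for v in row]
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((v - mu_x) ** 2 for v in x) / n
    var_y = sum((v - mu_y) ** 2 for v in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1, c2 = (k1 * max_i) ** 2, (k2 * max_i) ** 2
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))


def meets_quality(ori, rec, psnr_threshold=40.0, ssim_threshold=0.9):
    """PSNR above the first threshold and SSIM above the second threshold."""
    return psnr(ori, rec) > psnr_threshold and global_ssim(ori, rec) > ssim_threshold


# Usage: a one-level error in a single pixel passes both checks.
ori = [[10, 20], [30, 40]]
rec = [[11, 20], [30, 40]]
print(meets_quality(ori, rec))  # True
```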
- The embodiments of the present invention involve the compression of RAW images, and RAW images are not directly viewable by users.
- The images that users observe directly (such as JPEG images) are obtained only after the RAW images are further processed.
- During ISP processing and YUV domain processing of RAW images and the conversion of RAW images into YUV or RGB space, certain losses also occur, so RAW domain losses cannot be directly matched with user experience losses.
- In particular, mappings in ISP processing, such as dynamic range compression (DRC) and gamma correction, can cause RAW domain loss that cannot be directly matched with user experience loss.
- When mapped to the YUV domain or RGB domain seen by the end user, RAW data in different value ranges is compressed or stretched into other value ranges, resulting in losses.
- The valid bit depth of typical RAW domain data is 10 to 12 bits, whereas that of the final YUV domain or RGB domain data is 8 bits, so DRC is used to map and compress the bit depth.
- This compression is not linear: it allocates more bit width to the intermediate pixel range in which values occur most frequently.
- Gamma correction is also a stretching adjustment method for curve mapping of brightness, which also results in a nonlinear adjustment of the final image brightness and RAW domain information.
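- The effect of such a nonlinear mapping can be illustrated numerically. The sketch assumes 12-bit RAW values mapped to the 8-bit range with a simple 1/2.2 power curve, a stand-in for real gamma correction and DRC curves, which are tuned per device, and measures how much the same RAW-domain error of 8 codes changes the output in a dark region versus a bright region. The function names are hypothetical.

```python
def to_display_8bit(v, bits=12, gamma=1 / 2.2):
    """Map a linear RAW value to the 8-bit display range with a power curve
    (kept as a float, that is, before rounding to an integer code)."""
    max_raw = (1 << bits) - 1
    return 255 * (min(max(v, 0), max_raw) / max_raw) ** gamma


def display_error(raw_value, raw_error, bits=12):
    """Absolute display-domain error caused by a RAW-domain error."""
    return abs(to_display_8bit(raw_value + raw_error, bits)
               - to_display_8bit(raw_value, bits))


# Usage: the same 8-code RAW error in a dark pixel versus a bright pixel.
print(display_error(40, 8) > display_error(3000, 8))  # True
```

Under these assumptions, identical RAW errors produce a much larger output error in dark regions, which is one way a RAW-domain PSNR can look high while the displayed image shows visible defects.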
- If the loss between the RAW image input to the AI encoding network and the reconstructed RAW image output by the AI decoding network is calculated directly, the loss measured in the RAW domain (for example, by PSNR and/or SSIM) may be small, yet the loss becomes very large after a series of ISP processing, YUV domain processing, and conversion to YUV and RGB space.
- For example, the PSNR between the RAW image and the reconstructed RAW image can reach 48.13 decibels (dB).
- This PSNR between the RAW image and the reconstructed RAW image is already very high: for comparison, a YUV domain or RGB domain PSNR above 40 dB almost reaches a quality at which the naked eye sees no obvious problems.
- Nevertheless, after ISP processing and color space conversion, this loss can lead to obvious color casts, which are very noticeable to the end user.
- Therefore, the RAW image input into the AI encoding network and the reconstructed RAW image output by the AI decoding network are first converted into the corresponding input YUV image and reconstructed YUV image, and the loss between these two YUV images is then calculated as the loss between the input of the AI encoding network and the output of the AI decoding network.
- This estimates the loss between the input of the AI encoding network and the output of the AI decoding network in the YUV domain (or RGB domain), so that the reconstructed RAW image output by the trained AI decoding network still has a small loss after conversion to the YUV domain. This avoids color deviation and distortion in the final image presented to the user and reduces the impact of the mismatch between RAW domain loss and user experience loss.
- In addition to the encoding module compressing each frame of RAW image through the AI encoding network based on that frame's intra-frame correlation, the multi-frame RAW images may also be preprocessed according to the inter-frame correlation between them.
- FIG. 12 shows another schematic diagram of an encoding module provided by an embodiment of the present application.
- the encoding module further includes a correlation processing module.
- The correlation processing module can first select one frame from the multi-frame RAW images as a reference frame. It then predicts the other frames according to the inter-frame correlation between the reference frame and those frames, and outputs the reference-frame RAW image and the correlation-processed RAW images of the other frames to the AI encoding network for subsequent processing.
- Here, the correlation-processed RAW image of another frame is the residual image obtained by predicting that frame from the reference-frame RAW image. The reference-frame RAW image is sent directly to the AI encoding network. The other RAW frames are first predicted from the reference frame based on the inter-frame correlation, and only then sent to the AI encoding network. This reduces the amount of data occupied by the other frames and thereby improves the compression rate that the encoding module achieves on the multi-frame RAW images.
- BLC represents the offset of the black level.
- For example, the correlation processing module can first select one frame from the multi-frame RAW images as the reference frame, either at random or by choosing the first frame. It then predicts the other frames from the reference frame based on the linear relationship shown in formula (4) above, so the RAW image of each other frame becomes a residual image whose values are almost all 0. Most of the data that does not satisfy formula (4) is caused by dead pixels, noise, and the like. As a result, the compression rate for such data can be improved, especially for data with large absolute values.
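The reference-frame residual step, together with the matching inverse step performed on the decoding side, can be sketched as follows. Formula (4) is not reproduced in this excerpt. The sketch therefore **assumes** a simple exposure model in which the black-level-corrected pixel value (pixel − BLC) scales by 2^ΔEV relative to the reference frame. The function names are hypothetical. The round trip is exact only when the residual is coded losslessly. With the lossy AI codec, the decoder adds the prediction to a *reconstructed* residual instead.

```python
import numpy as np

def predict_from_reference(ref, ev_diff, blc):
    """ASSUMED linear exposure model: (pixel - BLC) scales by 2**ev_diff."""
    return blc + (ref.astype(np.float64) - blc) * (2.0 ** ev_diff)

def to_residual(frame, ref, ev_diff, blc):
    """Correlation processing: keep only what the reference-based prediction misses."""
    pred = np.rint(predict_from_reference(ref, ev_diff, blc)).astype(np.int32)
    return frame.astype(np.int32) - pred

def from_residual(residual, ref, ev_diff, blc):
    """Correlation inverse processing: add the same prediction back."""
    pred = np.rint(predict_from_reference(ref, ev_diff, blc)).astype(np.int32)
    return residual + pred

ref_ev0 = np.array([[64, 164], [264, 564]])            # reference frame (EV0), BLC = 64
frame_ev_minus1 = np.array([[64, 114], [164, 315]])    # EV-1 frame, one pixel off by noise
res = to_residual(frame_ev_minus1, ref_ev0, ev_diff=-1, blc=64)
print(res)  # almost all zeros
```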
- The correlation processing module can also preprocess the RAW images based on other inter-frame correlations between the multi-frame RAW images.
- For example, the correlation processing module can preprocess the RAW images according to differences in regional sharpness (definition) between the multiple frames of RAW images.
- The correlation processing module may use different algorithms for processing based on the inter-frame correlation between the multi-frame RAW images. This is not limited in this application.
- The metadata of each frame of RAW image is recorded. Metadata may also be referred to as the description data or parameter information of the RAW image.
- For example, the metadata of a RAW image may include the photographing scene of that frame (e.g., HDR), the width and height of the RAW image, the ISO value, and the like.
- According to the metadata of the multi-frame RAW images, the correlation processing module can select the algorithm it uses to process those images based on their inter-frame correlation.
- In some cases, the correlation processing module may skip inter-frame correlation processing.
- the correlation processing module may be a program module (or an algorithm unit) in the mobile phone, which can process the correlation of multiple frames of RAW images for different photographing scenarios.
- the correlation processing module can be skipped, and the frame of RAW image can be directly sent to the AI encoding network for subsequent processing.
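The metadata-driven choice of algorithm, including skipping the module for a single frame, can be sketched as a simple lookup. This is an illustrative assumption. The field names, scene strings, registry, and function names are hypothetical, and the text does not say how the phone stores or indexes these routines.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence

@dataclass
class RawMetadata:
    """Per-frame metadata fields mentioned in the text (hypothetical field names)."""
    scene: str   # photographing scene, e.g. "HDR" or "EDOF"
    width: int
    height: int
    iso: int

# A correlation routine maps (frames, metadata) to processed frames.
CorrelationFn = Callable[[Sequence[object], Sequence[RawMetadata]], List[object]]

def select_correlation_processing(
    frames: Sequence[object],
    metas: Sequence[RawMetadata],
    registry: Dict[str, CorrelationFn],
) -> Optional[CorrelationFn]:
    """Return the routine for this burst, or None to skip correlation processing."""
    if len(frames) < 2:
        return None  # single frame: send it straight to the AI encoding network
    return registry.get(metas[0].scene)  # scene without a routine: also skip

def hdr_residuals(frames, metas):
    return list(frames)  # placeholder for, e.g., the reference-frame residual step

registry = {"HDR": hdr_residuals}
```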
- Suppose the encoding module compresses the RAW images through the AI encoding network based on the intra-frame correlation of each frame and also performs correlation processing on the multi-frame RAW images according to their inter-frame correlation. In that case, when the decoding module performs AI decoding on the compressed features corresponding to the multi-frame RAW images through the AI decoding network, it obtains two kinds of results: the reconstructed RAW image corresponding to the reference-frame RAW image, and the reconstructed residual images corresponding to the other frames (i.e., reconstructions of the correlation-processed RAW images).
- The decoding module then also needs to perform correlation inverse processing on these results. It does so on the reconstructed RAW image corresponding to the reference-frame RAW image and on the reconstructed residual images corresponding to the other frames. This yields reconstructed RAW images in one-to-one correspondence with the multi-frame RAW images.
- FIG. 13 shows another schematic diagram of a decoding module provided by an embodiment of the present application.
- the decoding module further includes a correlation inverse processing module.
- the decoding module performs AI decoding on the compressed features corresponding to the multi-frame RAW images through the AI decoding network, and obtains the reconstructed RAW images corresponding to the RAW images of the reference frame and the reconstructed residual images corresponding to the RAW images of other frames.
- the reconstructed RAW images corresponding to the RAW images of the reference frame and the reconstructed residual images corresponding to the RAW images of other frames are subjected to correlation inverse processing to obtain the reconstructed RAW images corresponding to the RAW images of the multiple frames one-to-one.
- The processing procedure of the correlation inverse processing module may be the exact reverse of the procedure of the correlation processing module on the mobile phone side, and is not repeated here.
- The encoded code stream corresponding to the multi-frame RAW images uploaded by the mobile phone to the cloud also includes the metadata of each frame of RAW image.
- From the metadata of each frame of RAW image, the correlation inverse processing module can determine which algorithm the correlation processing module on the mobile phone used to process the multi-frame RAW images. It can therefore apply the reverse of that mobile-phone-side process to the reconstructed RAW image corresponding to the reference-frame RAW image and to the reconstructed residual images corresponding to the other frames.
- Data from scenes in which the sensor outputs multi-frame RAW images may be added to the sample training data, so that the AI encoding network and the AI decoding network learn the inter-frame correlation between multi-frame RAW images by themselves.
- In that case, the function of the correlation processing module described in the foregoing embodiment can be implemented by the AI encoding network. Correspondingly, the function of the correlation inverse processing module can be implemented by the AI decoding network.
- For example, for the HDR scene, the multi-frame RAW images output by the sensor in the HDR scene can be added to the sample RAW images, and the corresponding multi-frame reconstructed RAW images can be added to the sample reconstructed RAW images.
- For the EDOF scene, the multi-frame RAW images output by the sensor in the EDOF scene can be added to the sample RAW images, the one-to-one corresponding multi-frame reconstructed RAW images can be added to the sample reconstructed RAW images, and so on. For the training data related to the EDOF scene, the multi-frame sample RAW images at the same focus distance position can be placed on the corresponding channel, so that the AI encoding network and the AI decoding network learn the correlation in the EDOF scene.
- When the trained AI encoding network performs AI encoding on multi-frame RAW images, it can compress each frame based on that frame's intra-frame correlation. It can also select one frame from the multi-frame RAW images as a reference frame and predict the other frames according to the inter-frame correlation between the reference frame and the other frames.
- The AI decoding that the trained AI decoding network performs on the compressed features corresponding to the multi-frame RAW images is the reverse of the AI encoding performed by the AI encoding network, and is not described again.
- A single AI encoding network and a single corresponding AI decoding network may be obtained by training, both of which are applicable to various scenarios in which the sensor outputs multi-frame RAW images.
- For example, the AI encoding network and the AI decoding network can be trained with sample training data from the HDR scene, so that they learn the inter-frame correlation of RAW images with different EV values in the HDR scene.
- the AI encoding network and AI decoding network can be trained using the sample training data in the EDOF scene, so that the AI encoding network and the AI decoding network can learn the inter-frame correlation of RAW images under different focus distances in the EDOF scene.
- The AI encoding network and the AI decoding network can also be trained with sample training data from further scenarios in which the sensor outputs multi-frame RAW images. The AI encoding network and the corresponding AI decoding network can then be applied to the various scenarios in which the sensor outputs multi-frame RAW images.
- Because the AI encoding network and the AI decoding network apply to these various scenarios, the encoding module of the mobile phone contains only one AI encoding network, and the decoding module in the cloud contains only one AI decoding network.
- an AI encoding network and an AI decoding network suitable for the scene can be obtained by training. That is, for a variety of different scenarios where the sensor outputs multi-frame RAW images, it is possible to train and obtain multiple AI encoding networks and AI decoding networks that correspond to the scenarios one-to-one.
- For example, a first AI encoding network and a first AI decoding network can be trained with sample training data from the HDR scene, so that they learn the inter-frame correlation of RAW images with different EV values in the HDR scene.
- A second AI encoding network and a second AI decoding network can also be trained with sample training data from the EDOF scene, so that they learn the inter-frame correlation of RAW images at different focus distances in the EDOF scene.
- A third AI encoding network and a third AI decoding network may also be trained with sample training data from further scenarios in which the sensor outputs multi-frame RAW images. Each group of AI encoding network and AI decoding network is then applicable to one scenario in which the sensor outputs multi-frame RAW images. For example, the first AI encoding network and the first AI decoding network form one group.
- Each group of AI encoding network and AI decoding network is applicable to one scenario in which the sensor outputs multi-frame RAW images.
- the decoding module in the cloud correspondingly includes multiple (M) AI decoding networks.
- Based on the metadata of the multi-frame RAW images, the encoding module can select the AI encoding network that matches the scene of those images and use it to perform AI encoding on them.
- The encoded code stream corresponding to the multi-frame RAW images uploaded by the mobile phone to the cloud also includes the metadata of each frame of RAW image.
- Based on the metadata of the multi-frame RAW images, the decoding module can select the AI decoding network that matches the scene of those images and use it to perform AI decoding on them.
- the same set of AI encoding network and AI decoding network may be obtained by training for some relatively close or similar scenes; for other different scenes, the AI encoding network and AI decoding network that are only applicable to the scene may be obtained by training.
- For example, in both low-brightness scenes and HDR scenes, the multi-frame RAW images output by the sensor have different EV values, so one shared set of AI encoding network and AI decoding network can be trained for low-brightness and HDR scenes. In the EDOF scene, by contrast, the multi-frame RAW images output by the sensor have different focus distances.
- Therefore, an AI encoding network and an AI decoding network applicable only to the EDOF scene can be obtained by training.
- Based on the metadata of the multi-frame RAW images, the encoding module and the decoding module can select the AI encoding network and AI decoding network that match the corresponding scene for processing. Details are not repeated here. It should be noted that this application does not limit the correspondence between the AI encoding and decoding networks and the photographing scenes.
- The foregoing exemplifies implementations of the encoding module and the decoding module based on AI networks.
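One way to keep the phone-side and cloud-side choices consistent is for both to consult the same scene-to-network table keyed by the scene recorded in the metadata. The following sketch follows the grouping in the example above, where low-brightness and HDR scenes share one network pair and EDOF gets its own. The scene keys, group names, and fallback are all hypothetical.

```python
# HYPOTHETICAL table shared by the encoding module (phone) and decoding module (cloud).
SCENE_TO_CODEC_GROUP = {
    "low_light": "ev_bracket_codec",   # frames differ in EV value
    "HDR": "ev_bracket_codec",         # same network pair as low-light
    "EDOF": "focus_stack_codec",       # frames differ in focus distance
}

def select_codec_group(scene: str, default: str = "generic_codec") -> str:
    """Pick the encoder/decoder pair for the scene read from the frame metadata."""
    return SCENE_TO_CODEC_GROUP.get(scene, default)

# Both sides read the same metadata, so they land on the same network pair.
encoder_choice = select_codec_group("HDR")
decoder_choice = select_codec_group("HDR")
```

Because the table is the single source of truth, the encoder and decoder cannot disagree as long as the metadata travels with the code stream, as the text describes.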
- the encoding module of the mobile phone may also be a distributed encoding module, and correspondingly, the decoding module in the cloud may adopt a strategy corresponding to the distributed encoding for decoding.
- the distributed coding module can use the distributed source coding (distributed source coding, DSC) method to process the multi-frame RAW image.
- Channel coding is performed on each frame of RAW image to obtain multiple sets of code stream packets, one set per frame. Each set includes multiple code stream packets, and each code stream packet includes at least an error correction code and the metadata of the RAW image corresponding to that packet.
- For example, the distributed encoding module may encode each frame of RAW image with a channel coding method such as low-density parity-check (LDPC) coding or turbo coding.
- FIG. 14 shows a schematic diagram of a processing flow of a decoding module provided by an embodiment of the present application.
- The mobile phone can upload the code stream packets corresponding to a frame of RAW image. When the decoding module in the cloud receives them, it can obtain a predicted value of that RAW image.
- The decoding module then performs error correction on the predicted value according to the error correction code in the received code stream packets to decode the RAW image of the frame (i.e., the decoding process in the decoding module shown in FIG. 14). Next, the decoding module judges whether the RAW image has been decoded correctly. If so, it outputs the reconstructed RAW image to a subsequent processing module, such as the RAW domain post-processing module. If the cloud fails to decode the received code stream packets correctly, the error correction code they contain is insufficient, and the cloud can request the mobile phone side to continue transmitting error correction code.
- For example, the mobile phone can first upload the first code stream packet corresponding to the RAW image of the frame. On receiving it, the decoding module in the cloud can obtain the predicted value of that RAW image. It then performs error correction on the predicted value according to the error correction code in the first code stream packet to decode the frame. If decoding with the first code stream packet fails, the error correction code contained in that packet is insufficient for error correction.
- the cloud can request the mobile phone side to continue to transmit the error correction code, that is, to continue to transmit the second code stream packet corresponding to the frame image.
- For example, the cloud can send a notification message to the mobile phone asking it to upload the second code stream packet corresponding to the frame. The cloud can then continue decoding with the second code stream packet corresponding to the RAW image of the frame.
- By analogy, the cloud can decode with more and more error correction code until decoding is correct and the binary data matches. At that point it obtains the reconstructed RAW image corresponding to the RAW image of the frame.
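The request-more-packets loop on the cloud side can be sketched independently of the actual channel code. This is a minimal control-flow illustration. `request_next_packet` and `try_decode` are hypothetical callables that stand in for the network request to the phone and for the error-correcting decoder. The text does not define these interfaces.

```python
from typing import Callable, List, Optional, Sequence

def decode_with_feedback(
    pred: Sequence[int],
    request_next_packet: Callable[[], Optional[bytes]],
    try_decode: Callable[[Sequence[int], List[bytes]], Optional[List[int]]],
) -> Optional[List[int]]:
    """Ask for code stream packets one at a time until decoding succeeds.

    request_next_packet: hypothetical; asks the phone for the next packet of this
        frame and returns None when the phone has nothing more to send.
    try_decode: hypothetical; corrects `pred` with all packets received so far and
        returns the reconstruction, or None if decoding is not yet correct.
    """
    received: List[bytes] = []
    while True:
        packet = request_next_packet()
        if packet is None:
            return None  # error-correction data exhausted; decoding failed
        received.append(packet)
        result = try_decode(pred, received)
        if result is not None:
            return result

# Stubs: decoding "succeeds" once three packets have arrived.
packets = iter([b"p1", b"p2", b"p3", b"p4"])
requests: List[Optional[bytes]] = []

def next_packet() -> Optional[bytes]:
    p = next(packets, None)
    requests.append(p)
    return p

def stub_decode(pred, received):
    return list(pred) if len(received) >= 3 else None

out = decode_with_feedback([1, 2, 3], next_packet, stub_decode)
```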
- the decoding module can output the reconstructed RAW image to the RAW domain post-processing module for subsequent processing.
- Here, correct decoding means iterating until all parity checks equal 0.
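The "decode a prediction with error-correction data, then check the parity checks" idea can be shown with the smallest syndrome-based example. The phone sends only the syndrome of a block, and the cloud corrects its own prediction (side information) with it. This sketch deliberately swaps in a Hamming(7,4) code for readability. The text names LDPC or turbo codes, which follow the same principle at much larger block lengths with iterative decoding. The 7-bit block and all names are illustrative.

```python
import numpy as np

# Parity-check matrix of the Hamming(7,4) code: column j is the binary form of j + 1.
H = np.array([[((j + 1) >> k) & 1 for j in range(7)] for k in range(3)], dtype=np.uint8)

def syndrome(bits: np.ndarray) -> np.ndarray:
    """Result of the 3 parity checks on a 7-bit block (all 0 for a codeword)."""
    return H.dot(bits) % 2

def encode(x: np.ndarray) -> np.ndarray:
    """Phone side: send only the 3-bit syndrome of the 7-bit block x."""
    return syndrome(x)

def decode(pred: np.ndarray, s_x: np.ndarray):
    """Cloud side: correct the prediction with the received syndrome.

    Recovers x exactly when pred differs from x in at most one bit.
    """
    s_err = ((syndrome(pred) + s_x) % 2).astype(int)  # syndrome of (pred XOR x)
    x_hat = pred.copy()
    pos = s_err[0] + 2 * s_err[1] + 4 * s_err[2]      # 1-based error position, 0 = none
    if pos:
        x_hat[pos - 1] ^= 1
    # Stopping test: every parity check of x_hat matches the received syndrome.
    checks_pass = bool(np.array_equal(syndrome(x_hat), s_x))
    return x_hat, checks_pass

x = np.array([1, 0, 1, 1, 0, 0, 1], dtype=np.uint8)   # true bits on the phone
pred = x.copy()
pred[4] ^= 1                                          # cloud prediction, one bit wrong
x_hat, ok = decode(pred, encode(x))
```

With a code this small, the parity test always passes after a correction, so it cannot flag a prediction that is off by two or more bits. In iterative LDPC decoding, the same all-checks-zero test acts as the stopping rule, and a failure there is what would prompt a request for more error-correction data.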
- When the decoding module in the cloud predicts the first frame of RAW image, it can use intra-frame prediction based on an initial predicted value (pred).
- The initial predicted value may be a default (preset) value, for example the midpoint of the range covered by the significant bits. When the maximum value is 255 (8 significant bits), the initial predicted value may be 128.
- The intra-frame prediction here may follow existing intra-frame prediction methods in image or video coding, which are not limited herein.
- Consider frames after the first RAW frame, such as the second and third frames. Here "second" and "third" refer to the order in which the mobile phone uploads the code stream packets of the RAW images. To predict these frames, the decoding module in the cloud can establish an inter-frame correlation prediction model from the reconstructed RAW images already decoded.
- It then uses this model to obtain the predicted values of the RAW images of these other frames.
- The metadata includes at least the photographing scene of the RAW image corresponding to the code stream packet (here, an HDR scene) and the EV value of that RAW image.
- the EV value of the first RAW image uploaded by the mobile phone to the cloud is EV0
- the EV value of the second RAW image is EV-1
- the EV value of the third RAW image is EV-2.
- The decoding module in the cloud can use intra-frame prediction based on the initial predicted value (pred) to obtain the predicted value of the EV0 RAW image. It then performs error correction with the code stream packets corresponding to the EV0 RAW image, as described above, and obtains the reconstructed RAW image corresponding to the EV0 RAW image.
- Next, using the metadata included in the code stream packet corresponding to the EV-1 RAW image, the decoding module in the cloud can determine that the EV0 RAW image and the EV-1 RAW image satisfy the linear relationship described in formula (4). It can therefore use the reconstructed RAW image corresponding to the EV0 RAW image as a reference frame. It then establishes the following piecewise correlation prediction model for the EV-1 RAW image to obtain its predicted value.
- pred₋₁ represents the predicted value of the EV-1 RAW image
- rec₀ represents the actual value of the reconstructed RAW image corresponding to the EV0 RAW image
- The initial values of parameters a₁, a₂, b₁ and b₂ are 2, 2, 0 and 0, respectively
- min and max are set to 1/16 and 15/16, respectively, of the maximum value representable by the significant bits. For example, with 8 significant bits (a range of 256 values), min is set to 16 and max is set to 240.
- The number of significant bits depends on the sensor and is not limited here.
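The numeric settings above can be checked in code: the segment bounds min and max, the mid-range initial prediction, and the initial parameter values 2, 2, 0, 0. The piecewise formula itself is not reproduced in this excerpt. The predictor below is therefore a **hypothetical** form, not the patent's model. Reference pixels inside [min, max] use (a₁, b₁), and dark or near-saturated pixels use (a₂, b₂). All function names are illustrative.

```python
import numpy as np

def segment_bounds(bits: int):
    """min/max = 1/16 and 15/16 of the full range 2**bits (8 bits -> 16 and 240)."""
    full = 1 << bits
    return full // 16, full * 15 // 16

def initial_prediction(bits: int) -> int:
    """Mid-range default prediction for the first frame (8 bits -> 128)."""
    return (1 << bits) // 2

def predict_ev_minus1(rec0, bits, a1=2.0, a2=2.0, b1=0.0, b2=0.0):
    """HYPOTHETICAL piecewise model, not the patent's formula: pixels of the EV0
    reconstruction inside [min, max] use (a1, b1); dark or near-saturated pixels
    use (a2, b2). The defaults are the initial values given in the text."""
    lo, hi = segment_bounds(bits)
    x = np.asarray(rec0, dtype=np.float64)
    in_linear_range = (x >= lo) & (x <= hi)
    pred = np.where(in_linear_range, x / a1 + b1, x / a2 + b2)
    return np.clip(np.rint(pred), 0, (1 << bits) - 1).astype(np.int64)

pred_minus1 = predict_ev_minus1(np.array([10, 100, 250]), bits=8)  # halves every pixel
```

With the initial values, both segments coincide (divide by 2). They only diverge once the parameters are updated from decoded data.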
- The cloud can thus obtain the predicted value of the EV-1 RAW image and treat it as data the decoding module has already received by default.
- Following the method described above, the decoding module uses the error correction code in the code stream packets actually transmitted from the mobile phone side to correct the predicted value until the reconstructed RAW image corresponding to the EV-1 RAW image is obtained.
- Similarly, using the metadata included in the code stream packet corresponding to the EV-2 RAW image, the decoding module in the cloud can determine that the EV-1 RAW image and the EV-2 RAW image satisfy the linear relationship described in formula (4). It can therefore use the reconstructed RAW image corresponding to the EV-1 RAW image as a reference frame and establish a piecewise correlation prediction model for the EV-2 RAW image to obtain its predicted value. For details, refer to the correlation prediction model for the EV-1 RAW image above, which is not repeated here.
- Alternatively, the decoding module in the cloud can use both the reconstructed RAW image corresponding to the EV0 RAW image and the reconstructed RAW image corresponding to the EV-1 RAW image as reference frames.
- It then establishes the following piecewise correlation prediction model for the EV-2 RAW image to obtain its predicted value.
- pred₋₂ represents the predicted value of the EV-2 RAW image
- rec₀ represents the actual value of the reconstructed RAW image corresponding to the EV0 RAW image
- rec₋₁ represents the actual value of the reconstructed RAW image corresponding to the EV-1 RAW image
- The initial values of parameters a₁, a₂, b₁, b₂, c₁ and c₂ can be set to 2, 2, 1, 1, 0 and 0, respectively
- the initial values of … are 2, 2, 0 and 0, respectively
- min and max are the same as in the previous embodiment, and are not repeated here.
- The cloud can likewise obtain the predicted value of the EV-2 RAW image and treat it as data the decoding module has already received by default.
- Following the steps described above, the decoding module corrects the predicted value with the error correction code in the code stream packets actually transmitted from the mobile phone side until the reconstructed RAW image corresponding to the EV-2 RAW image is obtained.
- The methods for predicting the EV-1 and EV-2 RAW images show that, in the embodiments of the present application, the reference for a RAW image of a given EV value can be chosen flexibly. Either the reconstructed RAW image of the previous frame can serve as the reference frame, or several already reconstructed frames can serve together as reference frames. In other words, content already reconstructed in the cloud can serve as a reference for subsequent frames. For example, if n frames have been decoded successfully, at most n reference frames can be set for the next frame (n is an integer greater than 0).
- the cloud may determine the number of reconstructed frames for reference according to actual requirements, which is not limited herein.
- While decoding, the decoding module in the cloud may also update parameters of the correlation prediction model, such as a₁, a₂, b₁ and b₂, according to the reconstructed RAW images it obtains. For example, after obtaining the reconstructed RAW images corresponding to the EV0 and EV-1 RAW images, the cloud can use these two reconstructed RAW images to update a₁, a₂, b₁ and b₂.
- The updated parameters such as a₁, a₂, b₁ and b₂ can then be used in subsequent prediction.
- Parameters such as a₁, a₂, b₁ and b₂ can be updated continuously in this way, which is not repeated here. In the embodiments of the present application, these parameters can thus be refined on the cloud side as more data becomes available.
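The text does not say how the parameters are updated. An ordinary least-squares fit over the pixels where linearity is assumed is one plausible choice, and it is used here as an assumption. The sketch re-estimates (a, b) in the hypothetical form target ≈ ref / a + b from two reconstructed frames. Pixels outside [min, max] are excluded because the sensor output is not linear there.

```python
import numpy as np

def refit_linear_segment(rec_ref, rec_target, lo, hi):
    """Least-squares re-estimate of (a, b) in  target ~ ref / a + b,
    using only reference pixels inside [lo, hi] (the assumed-linear range)."""
    ref = np.asarray(rec_ref, dtype=np.float64).ravel()
    tgt = np.asarray(rec_target, dtype=np.float64).ravel()
    mask = (ref >= lo) & (ref <= hi)
    if np.count_nonzero(mask) < 2:
        raise ValueError("need at least two in-range pixels to fit a line")
    slope, intercept = np.polyfit(ref[mask], tgt[mask], 1)  # tgt ~ slope * ref + intercept
    return 1.0 / slope, intercept  # express the slope in the divisor form a

# Synthetic reconstructions that follow a = 2, b = 3; the pixel at 250 lies
# above max = 240 and is excluded from the fit.
rec_ev0 = np.array([20, 40, 100, 200, 250])
rec_ev_minus1 = rec_ev0 / 2 + 3
a1, b1 = refit_linear_segment(rec_ev0, rec_ev_minus1, lo=16, hi=240)
```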
- The sensor output does not satisfy the linear relationship in dark regions close to 0 or in overexposed regions close to the maximum representable value. Therefore, in the embodiments of the present application, the model assumes that the linear relationship holds, with these regions handled as separate segments. The model is then used to predict the pixel values of the next EV.
- the correlation prediction model in the above-mentioned HDR scene is only an exemplary illustration, and other mathematical models can also be established correspondingly according to the photographing scene of the multi-frame RAW image output by the sensor, and the relevant unknown parameters can be continuously refreshed. There is no restriction here.
- the decoding module in the cloud may further predict and obtain the predicted value of the RAW image in combination with other existing data stored in the cloud.
- Other existing data may be historical images uploaded and stored by the user (which may be the current photographing user or other users), some correlation prediction models established in the process of historical image processing, and the like.
- For example, the metadata of a RAW image with a certain EV value may also include the location of the mobile phone when it captured that frame (for example, its latitude and longitude coordinates).
- When the cloud needs to obtain the predicted value of that frame's RAW image, it can use the RAW images of other existing images with matching location information as reference frames, and so on.
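The text says only that location information (e.g., latitude and longitude) can be used to pick existing images as references. Matching by great-circle distance is one reasonable interpretation. The haversine formula, the 50 m radius, the result limit, and all names below are assumptions for illustration. Access to other users' data would also depend on their authorization.

```python
import math
from dataclasses import dataclass
from typing import List

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius

@dataclass
class StoredImage:
    image_id: str
    lat: float
    lon: float

def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two latitude/longitude points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))

def nearby_references(lat: float, lon: float, library: List[StoredImage],
                      radius_m: float = 50.0, limit: int = 3) -> List[StoredImage]:
    """HYPOTHETICAL selection: the closest stored images within radius_m."""
    scored = sorted(((haversine_m(lat, lon, im.lat, im.lon), im) for im in library),
                    key=lambda t: t[0])
    return [im for d, im in scored if d <= radius_m][:limit]

lib = [StoredImage("a", 31.2304, 121.4737), StoredImage("b", 31.2404, 121.4737)]
refs = nearby_references(31.2304, 121.4737, lib)  # "b" is about 1.1 km away
```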
- the RAW domain processing algorithm can also be implemented on the mobile phone side.
- Multi-frame fusion processing can first be performed on the multi-frame RAW images on the mobile phone side to obtain a single-frame RAW image.
- the mobile phone can upload the obtained single-frame RAW image to the cloud for ISP processing, YUV domain processing, first format encoding, etc.
- the cloud can return the first format image to the mobile phone side.
- When the cloud performs image processing in the YUV domain, it can also optimize the YUV image using high-quality reference images stored in a database, which improves the quality of the final first-format image.
- the cloud can learn from the high-quality reference images stored in the database to obtain network models suitable for various scenarios. Then, in the above-mentioned photographing process, the cloud can optimize the YUV image by using the network model that matches the photographing scene.
- the network architecture of the network model is not specifically limited herein.
- As another example, users generally take selfies themselves. If the current selfie is blurred due to hand shake or other reasons and the user has authorized access to their data, the cloud can use that existing data. Specifically, it can perform AI learning on the current blurred image using the face information in the user's clear photos, and thereby obtain a more realistic and clearer photo of the user.
- the embodiments of the present application can make full use of the resources of the cloud, so that the image processing can achieve a better effect, which will not be described one by one here.
- FIG. 15 shows a schematic structural diagram of an image processing apparatus provided by an embodiment of the present application.
- the image processing apparatus may include: a camera module 1501 , an encoding module 1502 , a sending module 1503 , and a receiving module 1504 .
- The camera module 1501 is used to collect, in response to the user's photographing operation, the RAW image corresponding to the current photographing scene. The encoding module 1502 is used to encode that RAW image to obtain the encoded code stream of the RAW image corresponding to the current photographing scene.
- the camera module 1501 is further configured to, in response to the user's first selection operation, determine that it is necessary to upload the RAW image collected when taking pictures to the cloud for processing.
- FIG. 16 shows a schematic structural diagram of another image processing apparatus provided by an embodiment of the present application.
- the image processing apparatus may include: a receiving module 1601 , a decoding module 1602 , a processing module 1603 , and a sending module 1604 .
- the receiving module 1601 is used to receive the coded code stream of the RAW image corresponding to the current photographing scene from the terminal device;
- The decoding module 1602 is used to decode the encoded code stream of the RAW image corresponding to the current photographing scene to obtain the reconstructed RAW image corresponding to the current photographing scene;
- the processing module 1603 is used to process the reconstructed RAW image corresponding to the current photographing scene to generate an image of the first format corresponding to the current photographing scene;
- The sending module 1604 is used to send the first-format image to the terminal device.
- the processing module 1603 may include a RAW domain post-processing module, an ISP module, a YUV domain post-processing module, a first format encoder, and the like.
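The cloud-side apparatus is essentially a fixed pipeline: receive, decode, process, send. Its processing module is in turn a chain of stages. The sketch below expresses that wiring with plain callables. The stage functions are stubs that record their names, and the function names are hypothetical. Real stages would be the RAW-domain post-processing, ISP, YUV-domain post-processing and first-format encoder listed above.

```python
from typing import Callable, List, Sequence

Stage = Callable[[object], object]

def make_processing_module(stages: Sequence[Stage]) -> Stage:
    """Processing module 1603: run the stages in order, e.g. RAW-domain
    post-processing -> ISP -> YUV-domain post-processing -> first-format encoder."""
    def run(data):
        for stage in stages:
            data = stage(data)
        return data
    return run

def cloud_apparatus(receive, decode, process, send):
    """Modules 1601-1604 wired together: receive -> decode -> process -> send."""
    bitstream = receive()                  # receiving module 1601
    reconstructed_raw = decode(bitstream)  # decoding module 1602
    image = process(reconstructed_raw)     # processing module 1603
    send(image)                            # sending module 1604
    return image

# Stub stages that append their names, to show the call order.
def stub(name: str) -> Stage:
    return lambda data: data + [name]

process = make_processing_module([stub("raw_post"), stub("isp"), stub("yuv_post"), stub("encode")])
sent: List[object] = []
result = cloud_apparatus(lambda: [], lambda b: b + ["decoded"], process, sent.append)
```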
- The division of units in the above apparatus is merely a division of logical functions. In actual implementation, the units may be fully or partially integrated into one physical entity, or they may be physically separated.
- all the units in the device can be implemented in the form of software calling through the processing element; also all can be implemented in the form of hardware; some units can also be implemented in the form of software calling through the processing element, and some units can be implemented in the form of hardware.
- Each unit may be a separately established processing element, or it may be integrated into a chip of the device. A unit may also be stored in memory as a program, which a processing element of the device calls to execute that unit's function. In addition, all or some of these units may be integrated together or implemented independently.
- the processing element described here may also be called a processor, which may be an integrated circuit with signal processing capability.
- each step of the above method or each of the above units may be implemented by an integrated logic circuit of hardware in the processing element, or in the form of software invoked by the processing element.
- the units in the above apparatus may be one or more integrated circuits configured to implement the above methods, for example, one or more application specific integrated circuits (ASICs), one or more digital signal processors (DSPs), one or more field programmable gate arrays (FPGAs), or a combination of at least two of these integrated circuit forms.
- the processing element can be a general-purpose processor, such as a CPU or other processors that can invoke programs.
- these units can be integrated together and implemented in the form of a system-on-a-chip (SOC).
- the units of the above apparatus that implement the corresponding steps of the above method may be implemented in the form of a processing element scheduling a program.
- the apparatus may include a processing element and a storage element, and the processing element invokes a program stored in the storage element to execute the method described in the above method embodiments.
- the storage element may be a storage element on the same chip as the processing element, ie, an on-chip storage element.
- the program for performing the above method may be in a storage element on a different chip from the processing element, ie, an off-chip storage element.
- the processing element calls or loads the program from the off-chip storage element to the on-chip storage element, so as to call and execute the methods described in the above method embodiments.
- an embodiment of the present application may further provide an apparatus, such as an electronic device, which may include a processor and a memory for storing instructions executable by the processor.
- the electronic device can implement the steps performed by the terminal device or the steps performed by the cloud in the image processing method described in the foregoing embodiments.
- the memory may be located within the electronic device or external to the electronic device.
- there may be one or more processors.
- the units of the apparatus implementing each step of the above method may be configured as one or more processing elements, which may be integrated circuits, such as one or more ASICs, one or more DSPs, one or more FPGAs, or a combination of these types of integrated circuits. These integrated circuits may be integrated together to form a chip.
- an embodiment of the present application further provides a chip, which can be applied to the above-mentioned electronic device.
- the chip includes one or more interface circuits and one or more processors; the interface circuits and the processors are interconnected by lines; the processor receives and executes computer instructions from the memory of the electronic device through the interface circuit, so as to implement the steps performed by the terminal device or the steps performed by the cloud in the above embodiments.
- Embodiments of the present application further provide a computer program product, including computer-readable code; when the computer-readable code runs in an electronic device, the electronic device is caused to implement the steps performed by the terminal device or the steps performed by the cloud in the image processing method described in the foregoing embodiments.
- the disclosed apparatus and method may be implemented in other manners.
- the device embodiments described above are only illustrative.
- the division of the modules or units is only a logical function division. In actual implementation, there may be other division methods.
- for example, multiple units or components may be combined or integrated into another device, or some features may be omitted or not implemented.
- in addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
- the units described as separate components may or may not be physically separate, and components shown as units may be one physical unit or multiple physical units; that is, they may be located in one place or distributed to multiple different places. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiments.
- each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
- the above-mentioned integrated units may be implemented in the form of hardware, or may be implemented in the form of software functional units.
- if the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a readable storage medium.
- the software product is stored in a program product, such as a computer-readable storage medium, and includes several instructions for causing a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or some of the steps of the methods described in the embodiments of the present application.
- the aforementioned storage medium includes: a U disk, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk and other mediums that can store program codes.
- the embodiments of the present application may further provide a computer-readable storage medium on which computer program instructions are stored.
- when the computer program instructions are executed by an electronic device, the electronic device is caused to implement the steps performed by the terminal device or the steps performed by the cloud in the image processing method described in the foregoing embodiments.
Landscapes
- Engineering & Computer Science (AREA)
- Signal Processing (AREA)
- Multimedia (AREA)
- Computer Networks & Wireless Communication (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
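This application provides an image processing method, apparatus, device, and storage medium, relating to the field of image processing. In this application, a terminal device responds to a user's photographing operation by capturing the RAW image corresponding to the current photographing scene. It then encodes that RAW image to obtain an encoded bitstream and sends the bitstream to the cloud. The cloud decodes the encoded bitstream to obtain a reconstructed RAW image corresponding to the current photographing scene. It then processes the reconstructed RAW image to generate an image in a first format corresponding to the current photographing scene. This application makes full use of the cloud's big-data and computing resources for image processing to achieve better results. It also avoids a problem that arises when some YUV-domain image processing algorithms are migrated to the RAW domain: the constraints of the terminal device limit the processing effect.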
Description
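This application claims priority to Chinese Patent Application No. 202110026530.3, filed with the China National Intellectual Property Administration on January 8, 2021 and entitled "Image processing method, apparatus, device, and storage medium", which is incorporated herein by reference in its entirety.
Embodiments of this application relate to the field of image processing, and in particular to an image processing method, apparatus, device, and storage medium.
When a mobile phone takes a photo, its camera module captures an original image and outputs it to an intermediate processing module. The original image may be called a RAW image or digital negative. The intermediate processing module performs a series of processing on the received RAW image and finally obtains an image suitable for display, such as a JPEG image. The JPEG image can be transmitted to the phone's display for display and/or to the phone's memory for storage. The intermediate processing module generates the JPEG image from the RAW image in three steps. First, it performs image signal processing (ISP) on the RAW image, converting it from the RAW domain to the YUV domain; an image in the YUV domain may be called a YUV image. Next, it processes the YUV image with YUV-domain post-processing algorithms. Finally, it encodes the processed YUV image with JPEG encoding to obtain the JPEG image.
Currently, to achieve better image processing results in the intermediate processing module, some YUV-domain image processing algorithms can be migrated to the RAW domain. For example, HDR multi-frame registration, fusion, and noise reduction can be moved from the YUV domain to the RAW domain. Processing in the RAW domain has two advantages. A RAW image carries higher bit-depth information than a YUV image. A RAW image has also not yet undergone ISP processing, so its color, detail, and other information have not been degraded.
However, RAW-domain processing involves more data than YUV-domain processing and places higher demands on algorithm performance, memory, and so on. The computing and memory resources of a mobile phone are limited. Migrating some YUV-domain algorithms to the RAW domain is therefore constrained on the phone, which tends to limit the processing effect. For example, some image processing algorithms may need to be pruned and adapted to the phone's computing power, which leads to unsatisfactory results.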
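Summary
Embodiments of this application provide an image processing method, apparatus, device, and storage medium. They solve the problem that, when some YUV-domain image processing algorithms are migrated to the RAW domain, the constraints of the mobile phone limit the processing effect.
According to a first aspect, an embodiment of this application provides an image processing method, the method including: a terminal device captures, in response to a photographing operation of a user, a RAW image corresponding to the current photographing scene; the terminal device encodes the RAW image corresponding to the current photographing scene to obtain an encoded bitstream of that RAW image, and sends the encoded bitstream to the cloud; and the terminal device receives an image in a first format from the cloud, where the cloud generates the image in the first format from the encoded bitstream of the RAW image corresponding to the current photographing scene.
This image processing method avoids the problem that, when some YUV-domain image processing algorithms are migrated to the RAW domain, the constraints of the terminal device limit the processing effect. It makes full use of the cloud's big-data and computing resources to perform RAW-domain image processing, ISP processing, and YUV-domain processing on the RAW image, achieving better image processing results.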
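Optionally, the RAW image corresponding to the current photographing scene includes one or more frames. The terminal device encodes that RAW image to obtain its encoded bitstream and sends the encoded bitstream to the cloud. This includes the following: when the RAW image corresponding to the current photographing scene includes multiple frames, the terminal device encodes the multiple RAW frames to obtain an encoded bitstream of the multiple RAW frames and sends it to the cloud.
In this design, if the camera module of the terminal device captures multiple RAW frames, the encoded bitstream of the multiple RAW frames is uploaded to the cloud for processing. If a single RAW frame is captured, it is processed locally (that is, on the terminal device side).
In some other implementations, the terminal device encodes the captured RAW image(s) whether one RAW frame or multiple RAW frames are captured. It obtains the corresponding encoded bitstream and uploads it to the cloud for processing.
Optionally, the method includes a further step before the terminal device captures the RAW image corresponding to the current photographing scene in response to the user's photographing operation. In this step, the terminal device determines, in response to a first selection operation of the user, that RAW images captured during photographing need to be uploaded to the cloud for processing.
In this design, the terminal device may provide a function that lets the user choose whether to upload the RAW images captured by the camera module to the cloud for processing. The first selection operation may be the user's operation of using this function on the terminal device. For example, after the phone starts the photographing application, the photographing interface may provide a function control for this choice. By operating this control, the user can actively decide whether the RAW images captured by the camera module are uploaded to the cloud for processing. The user's operation of choosing to upload these RAW images to the cloud for processing is the first selection operation.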
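Optionally, the terminal device obtains the encoded bitstream of the RAW image corresponding to the current photographing scene in three steps. First, it compresses the RAW image to obtain compressed features of the RAW image. Next, it quantizes the compressed features. Finally, it entropy-encodes the quantized compressed features to obtain the encoded bitstream of the RAW image corresponding to the current photographing scene.
Optionally, when the RAW image corresponding to the current photographing scene includes multiple frames, the terminal device obtains the compressed features as follows. First, it determines the inter-frame correlation among the multiple RAW frames according to the type of the current photographing scene. Next, it selects one of the RAW frames as a reference frame. It then uses the reference frame and the inter-frame correlation to predict the other frames, obtaining a residual image for each of them. Finally, it compresses the reference frame together with the residual images of the other frames to obtain the compressed features of the multiple RAW frames.
In this design, the terminal device selects a reference frame and predicts the remaining frames from it using the inter-frame correlation, obtaining their residual images. This amounts to preprocessing the multiple RAW frames according to their inter-frame correlation. Such preprocessing further increases the compression ratio of the multiple RAW frames and improves the transmission speed of the RAW bitstream.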
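The reference-frame and residual preprocessing can be sketched as follows. The patent does not specify how the inter-frame correlation is modeled. This sketch assumes, purely for illustration, an HDR burst in which each frame is predicted by scaling the reference frame by its exposure ratio 2^(EV_k − EV_ref), ignoring sensor clipping and noise. The function names `predict`, `to_residuals`, and `from_residuals` are hypothetical.

```python
from typing import Dict, List, Tuple

Frame = List[int]  # one RAW frame flattened to a list of sensor values


def predict(reference: Frame, ev_ref: float, ev_target: float) -> Frame:
    """Hypothetical inter-frame model: scale the reference by the exposure ratio.

    Clipping, black level and noise are ignored in this sketch.
    """
    gain = 2.0 ** (ev_target - ev_ref)
    return [round(v * gain) for v in reference]


def to_residuals(frames: List[Frame], evs: List[float],
                 ref_index: int = 0) -> Tuple[Frame, Dict[int, Frame]]:
    """Encoder side: keep the reference frame, replace every other frame by its residual."""
    ref = frames[ref_index]
    residuals = {}
    for k, (frame, ev) in enumerate(zip(frames, evs)):
        if k != ref_index:
            pred = predict(ref, evs[ref_index], ev)
            residuals[k] = [x - p for x, p in zip(frame, pred)]
    return ref, residuals


def from_residuals(ref: Frame, residuals: Dict[int, Frame], evs: List[float],
                   ref_index: int = 0) -> List[Frame]:
    """Decoder side: rebuild each frame as prediction + residual (lossless round trip)."""
    frames = []
    for k, ev in enumerate(evs):
        if k == ref_index:
            frames.append(list(ref))
        else:
            pred = predict(ref, evs[ref_index], ev)
            frames.append([p + r for p, r in zip(pred, residuals[k])])
    return frames


if __name__ == "__main__":
    burst = [[100, 200, 300, 400], [50, 101, 150, 199], [200, 400, 601, 800]]
    evs = [0, -1, 1]
    ref, res = to_residuals(burst, evs)
    print(res)  # {1: [0, 1, 0, -1], 2: [0, 0, 1, 0]}
```

The residuals are close to zero whenever the assumed correlation model fits, which is what makes them cheaper to compress than the original frames. The same `predict` call must be used on both sides so that the round trip is exact.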
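Optionally, the method further includes: the terminal device determines the type of the current photographing scene according to metadata of the multiple RAW frames.
Optionally, the terminal device obtains the encoded bitstream by performing channel coding on the RAW image corresponding to the current photographing scene in a distributed source coding manner. The structure of the encoded bitstream depends on the number of frames:
- When the RAW image includes multiple frames, the encoded bitstream includes multiple groups of bitstream packets in one-to-one correspondence with the multiple RAW frames.
- When the RAW image includes one frame, the encoded bitstream includes one group of bitstream packets corresponding to that frame.
- Each group includes multiple bitstream packets. Each bitstream packet includes at least an error-correcting code and metadata of the RAW frame to which the packet corresponds.
To send the encoded bitstream to the cloud, the terminal device uploads the bitstream packets of each RAW frame in sequence, frame by frame.
In this design, the terminal device encodes RAW images using distributed coding. The more accurate the prediction at the cloud, the fewer error-correcting bits need to be transmitted and the higher the compression ratio. The data correlation available at the cloud can thus be fully exploited to achieve a higher compression ratio and save upload traffic.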
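The patent does not name the channel code used for distributed source coding. As a textbook illustration of the principle, not the patent's scheme, the sketch below uses syndrome coding with the (7,4) Hamming code. The terminal sends only the 3-bit syndrome of each 7-bit block. The cloud recovers the block from its own prediction (side information), provided the prediction differs from the true block in at most one bit. The block size, bit layout, and function names are assumptions.

```python
from typing import List

Bits = List[int]

# Parity-check matrix of the (7,4) Hamming code: column j (1-based) is the
# 3-bit binary representation of j, so a single-bit mismatch at position j
# produces syndrome j.
H = [[(j >> b) & 1 for j in range(1, 8)] for b in range(3)]


def syndrome(block: Bits) -> Bits:
    """Compute H * block over GF(2)."""
    return [sum(h * x for h, x in zip(row, block)) % 2 for row in H]


def dsc_encode(block: Bits) -> Bits:
    """Terminal side: transmit 3 syndrome bits instead of the 7 data bits."""
    return syndrome(block)


def dsc_decode(received_syndrome: Bits, side_info: Bits) -> Bits:
    """Cloud side: correct the prediction using the received syndrome.

    Works whenever side_info differs from the true block in at most one bit.
    """
    s_pred = syndrome(side_info)
    diff = [a ^ b for a, b in zip(received_syndrome, s_pred)]  # syndrome of the mismatch
    pos = diff[0] + 2 * diff[1] + 4 * diff[2]  # 1..7, or 0 if the prediction is exact
    block = list(side_info)
    if pos:
        block[pos - 1] ^= 1
    return block


if __name__ == "__main__":
    truth = [1, 0, 1, 1, 0, 0, 1]
    prediction = [1, 0, 1, 1, 1, 0, 1]  # cloud-side prediction, one bit wrong
    print(dsc_decode(dsc_encode(truth), prediction) == truth)  # True
```

With a fixed (7,4) code this always sends 3 bits per 7. A practical scheme would match the code strength to the expected prediction error, so a better cloud-side prediction translates into fewer transmitted bits.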
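According to a second aspect, an embodiment of this application provides an image processing apparatus that can be used to implement the method of the first aspect. The functions of the apparatus may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules or units corresponding to the above functions, for example, a camera module, an encoding module, a sending module, and a receiving module.
The modules are configured as follows:
- The camera module captures, in response to a photographing operation of a user, the RAW image corresponding to the current photographing scene.
- The encoding module encodes that RAW image to obtain its encoded bitstream.
- The sending module sends the encoded bitstream to the cloud.
- The receiving module receives an image in a first format from the cloud, where the cloud generates the image in the first format from the encoded bitstream of the RAW image corresponding to the current photographing scene.
Optionally, the RAW image corresponding to the current photographing scene includes one or more frames. When the RAW image includes multiple frames, the encoding module is specifically configured to encode the multiple RAW frames to obtain their encoded bitstream. The sending module is specifically configured to send the encoded bitstream of the multiple RAW frames to the cloud.
Optionally, the camera module is further configured to determine, in response to a first selection operation of the user, that RAW images captured during photographing need to be uploaded to the cloud for processing.
Optionally, the encoding module is specifically configured to perform three steps. It compresses the RAW image corresponding to the current photographing scene to obtain its compressed features. It quantizes the compressed features. It then entropy-encodes the quantized compressed features to obtain the encoded bitstream of the RAW image corresponding to the current photographing scene.
Optionally, when the RAW image includes multiple frames, the encoding module is specifically configured to:
- determine the inter-frame correlation among the multiple RAW frames according to the type of the current photographing scene;
- select one frame as the reference frame, and use the reference frame and the inter-frame correlation to predict the other frames, obtaining their residual images;
- compress the reference frame together with the residual images of the other frames to obtain the compressed features of the multiple RAW frames.
Optionally, the encoding module is further configured to determine the type of the current photographing scene according to metadata of the multiple RAW frames.
Optionally, the encoding module is specifically configured to perform channel coding on the RAW image in a distributed source coding manner to obtain the encoded bitstream. The structure of the encoded bitstream depends on the number of frames:
- When the RAW image includes multiple frames, the encoded bitstream includes multiple groups of bitstream packets in one-to-one correspondence with the multiple RAW frames.
- When the RAW image includes one frame, the encoded bitstream includes one group of bitstream packets corresponding to that frame.
- Each group includes multiple bitstream packets. Each packet includes at least an error-correcting code and metadata of its corresponding RAW frame.
The sending module is specifically configured to upload the bitstream packets of each RAW frame to the cloud in sequence, frame by frame.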
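According to a third aspect, an embodiment of this application provides an electronic device, including a processor and a memory for storing instructions executable by the processor. When the processor executes the instructions, the electronic device is caused to implement the image processing method of the first aspect.
The electronic device may be a mobile terminal, such as a mobile phone, tablet computer, wearable device, vehicle-mounted device, AR/VR device, notebook computer, ultra-mobile personal computer, netbook, or personal digital assistant. It may also be a professional shooting device, such as a digital camera, DSLR/mirrorless camera, action camera, gimbal camera, or drone.
According to a fourth aspect, an embodiment of this application provides a computer-readable storage medium storing computer program instructions. When the computer program instructions are executed by an electronic device, the electronic device is caused to implement the image processing method of the first aspect.
According to a fifth aspect, an embodiment of this application further provides a computer program product, including computer-readable code. When the computer-readable code runs in an electronic device, the electronic device is caused to implement the image processing method of the first aspect.
For the beneficial effects of the second to fifth aspects, refer to the description of the first aspect; details are not repeated here.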
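According to a sixth aspect, an embodiment of this application further provides an image processing method, the method including: the cloud receives, from a terminal device, the encoded bitstream of the RAW image corresponding to the current photographing scene; the cloud decodes the encoded bitstream to obtain a reconstructed RAW image corresponding to the current photographing scene; and the cloud processes the reconstructed RAW image to generate an image in a first format corresponding to the current photographing scene, and sends the image in the first format to the terminal device.
Optionally, the cloud obtains the reconstructed RAW image from the encoded bitstream in three steps. First, it entropy-decodes the encoded bitstream to obtain the quantized compressed features of the RAW image. Next, it dequantizes the quantized compressed features to obtain the compressed features of the RAW image. Finally, it decompresses the compressed features to obtain the reconstructed RAW image corresponding to the current photographing scene.
Optionally, the RAW image corresponding to the current photographing scene includes multiple frames. In this case, the cloud obtains the reconstructed RAW images from the compressed features as follows:
- it decompresses the compressed features of the multiple RAW frames to obtain a reconstructed RAW image for the reference frame and residual images for the other frames;
- it determines the inter-frame correlation among the multiple RAW frames according to the type of the current photographing scene;
- it reconstructs the multiple RAW frames from the reconstructed reference frame, the residual images of the other frames, and the inter-frame correlation. This yields multiple reconstructed RAW frames in one-to-one correspondence with the multiple RAW frames.
Optionally, the encoded bitstream of the multiple RAW frames further includes metadata of the multiple RAW frames. Before the cloud determines the inter-frame correlation according to the type of the current photographing scene, the method further includes: the cloud determines the type of the current photographing scene according to the metadata of the multiple RAW frames.
Optionally, the terminal device obtains the encoded bitstream of the RAW image by performing channel coding on it in a distributed source coding manner. The structure of the encoded bitstream depends on the number of frames:
- When the RAW image includes multiple frames, the encoded bitstream includes multiple groups of bitstream packets in one-to-one correspondence with the multiple RAW frames.
- When the RAW image includes one frame, the encoded bitstream includes one group of bitstream packets corresponding to that frame.
- Each group includes multiple bitstream packets. Each packet includes at least an error-correcting code and metadata of its corresponding RAW frame.
The cloud decodes the encoded bitstream into the reconstructed RAW image as follows:
- When the RAW image includes one frame, the cloud decodes the received bitstream packets of that frame by intra-frame prediction starting from an initial prediction value. This yields the reconstructed RAW image of that frame.
- When the RAW image includes multiple frames, the cloud first decodes the received bitstream packets of the first RAW frame by intra-frame prediction starting from an initial prediction value. This yields the reconstructed RAW image of the first frame. For each subsequent RAW frame, the cloud decodes its received bitstream packets based on at least one already-decoded reconstructed RAW frame and the inter-frame correlation among the multiple RAW frames. This yields the reconstructed RAW image of each frame after the first.
Optionally, when the reconstructed RAW image corresponding to the current photographing scene includes multiple frames, the cloud generates the image in the first format in three steps. First, it fuses the multiple reconstructed RAW frames into one reconstructed RAW frame in the RAW domain. Next, it converts the fused frame from the RAW domain to the YUV domain to obtain the corresponding YUV image. Finally, it encodes this YUV image into the first format to obtain the image in the first format corresponding to the current photographing scene.
Optionally, when the reconstructed RAW image includes multiple frames, the cloud may instead generate the image in the first format as follows. First, it converts the multiple reconstructed RAW frames from the RAW domain to the YUV domain, obtaining multiple YUV images in one-to-one correspondence with them. Next, it fuses these YUV images into one YUV image in the YUV domain. Finally, it encodes the fused YUV image into the first format to obtain the image in the first format corresponding to the current photographing scene.
The image processing method of the sixth aspect corresponds to that of the first aspect and therefore has the same beneficial effects as the first aspect; details are not repeated here.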
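According to a seventh aspect, an embodiment of this application provides an image processing apparatus that can be used to implement the method of the sixth aspect. The functions of the apparatus may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules or units corresponding to the above functions, for example, a receiving module, a decoding module, a processing module, and a sending module.
The modules are configured as follows:
- The receiving module receives, from a terminal device, the encoded bitstream of the RAW image corresponding to the current photographing scene.
- The decoding module decodes the encoded bitstream to obtain the reconstructed RAW image corresponding to the current photographing scene.
- The processing module processes the reconstructed RAW image to generate an image in a first format corresponding to the current photographing scene.
- The sending module sends the image in the first format to the terminal device.
For example, the processing module may include a RAW-domain post-processing module, an ISP module, a YUV-domain post-processing module, a first-format encoder, and the like.
Optionally, the decoding module is specifically configured to perform three steps. It entropy-decodes the encoded bitstream of the RAW image to obtain its quantized compressed features. It dequantizes the quantized compressed features to obtain the compressed features. It then decompresses the compressed features to obtain the reconstructed RAW image corresponding to the current photographing scene.
Optionally, the RAW image includes multiple frames, and the decoding module is specifically configured to:
- decompress the compressed features of the multiple RAW frames to obtain the reconstructed RAW image of the reference frame and the residual images of the other frames;
- determine the inter-frame correlation among the multiple RAW frames according to the type of the current photographing scene;
- reconstruct the multiple RAW frames from the reconstructed reference frame, the residual images, and the inter-frame correlation, obtaining multiple reconstructed RAW frames in one-to-one correspondence with the multiple RAW frames.
Optionally, the encoded bitstream of the multiple RAW frames further includes metadata of the multiple RAW frames. The decoding module is further configured to determine the type of the current photographing scene according to that metadata.
Optionally, the terminal device obtains the encoded bitstream of the RAW image by performing channel coding on it in a distributed source coding manner. The structure of the encoded bitstream depends on the number of frames:
- When the RAW image includes multiple frames, the encoded bitstream includes multiple groups of bitstream packets in one-to-one correspondence with the multiple RAW frames.
- When the RAW image includes one frame, the encoded bitstream includes one group of bitstream packets corresponding to that frame.
- Each group includes multiple bitstream packets. Each packet includes at least an error-correcting code and metadata of its corresponding RAW frame.
The decoding module is specifically configured as follows:
- When the RAW image includes one frame, it decodes the received bitstream packets of that frame by intra-frame prediction starting from an initial prediction value. This yields the reconstructed RAW image of that frame.
- When the RAW image includes multiple frames, it first decodes the received bitstream packets of the first RAW frame by intra-frame prediction starting from an initial prediction value. This yields the reconstructed RAW image of the first frame. For each subsequent RAW frame, it decodes the received bitstream packets based on at least one already-decoded reconstructed RAW frame and the inter-frame correlation among the multiple RAW frames. This yields the reconstructed RAW image of each frame after the first.
Optionally, when the reconstructed RAW image includes multiple frames, the processing module is specifically configured to perform three steps. It fuses the multiple reconstructed RAW frames into one reconstructed RAW frame in the RAW domain. It converts the fused frame from the RAW domain to the YUV domain to obtain the corresponding YUV image. It then encodes this YUV image into the first format to obtain the image in the first format corresponding to the current photographing scene.
Optionally, when the reconstructed RAW image includes multiple frames, the processing module may instead be configured as follows. It converts the multiple reconstructed RAW frames from the RAW domain to the YUV domain, obtaining multiple YUV images in one-to-one correspondence with them. It fuses these YUV images into one YUV image in the YUV domain. It then encodes the fused YUV image into the first format to obtain the image in the first format corresponding to the current photographing scene.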
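According to an eighth aspect, an embodiment of this application provides an electronic device, including a processor and a memory for storing instructions executable by the processor. When the processor executes the instructions, the electronic device is caused to implement the image processing method of the sixth aspect.
The electronic device may be a cloud server, a server cluster, a cloud platform, or the like.
According to a ninth aspect, an embodiment of this application provides a computer-readable storage medium storing computer program instructions. When the computer program instructions are executed by an electronic device, the electronic device is caused to implement the image processing method of the sixth aspect.
According to a tenth aspect, an embodiment of this application further provides a computer program product, including computer-readable code. When the computer-readable code runs in an electronic device, the electronic device is caused to implement the image processing method of the sixth aspect.
For the beneficial effects of the seventh to tenth aspects, refer to the description of the sixth aspect; details are not repeated here.
It should be understood that the descriptions of technical features, technical solutions, beneficial effects, or similar language in this application do not imply that all of the features and advantages can be realized in any single embodiment. Rather, describing a feature or beneficial effect means that at least one embodiment includes that specific technical feature, technical solution, or beneficial effect. Descriptions of technical features, technical solutions, or beneficial effects in this specification therefore do not necessarily refer to the same embodiment. Furthermore, the technical features, technical solutions, and beneficial effects described in the embodiments may be combined in any suitable manner. Those skilled in the art will understand that an embodiment can be implemented without one or more of the specific technical features, technical solutions, or beneficial effects of a particular embodiment. In other embodiments, additional technical features and beneficial effects may be identified in certain embodiments that do not embody all embodiments.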
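FIG. 1 is a schematic diagram of a photographing principle;
FIG. 2 is a schematic structural diagram of a device-cloud collaborative system provided by an embodiment of this application;
FIG. 3 is a schematic structural diagram of a terminal device provided by an embodiment of this application;
FIG. 4 is a schematic diagram of the interaction between a mobile phone and the cloud provided by an embodiment of this application;
FIG. 5 is a schematic diagram of a photographing interface provided by an embodiment of this application;
FIG. 6 is another schematic diagram of the photographing interface provided by an embodiment of this application;
FIG. 7 is yet another schematic diagram of the photographing interface provided by an embodiment of this application;
FIG. 8 is yet another schematic diagram of the photographing interface provided by an embodiment of this application;
FIG. 9 is a schematic diagram of an encoding module provided by an embodiment of this application;
FIG. 10 is a schematic diagram of a decoding module provided by an embodiment of this application;
FIG. 11 is a schematic diagram of the RGGB arrangement of a RAW image;
FIG. 12 is another schematic diagram of the encoding module provided by an embodiment of this application;
FIG. 13 is another schematic diagram of the decoding module provided by an embodiment of this application;
FIG. 14 is a schematic diagram of the processing flow of the decoding module provided by an embodiment of this application;
FIG. 15 is a schematic structural diagram of an image processing apparatus provided by an embodiment of this application;
FIG. 16 is a schematic structural diagram of another image processing apparatus provided by an embodiment of this application.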
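Embodiments of this application apply to scenarios in which a terminal device with a photographing function takes photos.
Optionally, the terminal device may be a mobile terminal, such as a mobile phone, tablet computer, wearable device, vehicle-mounted device, augmented reality (AR)/virtual reality (VR) device, notebook computer, ultra-mobile personal computer (UMPC), netbook, or personal digital assistant (PDA). It may also be a professional shooting device, such as a digital camera, DSLR/mirrorless camera, action camera, gimbal camera, or drone. Embodiments of this application do not limit the specific type of the terminal device.
Taking a mobile phone as an example of the terminal device, FIG. 1 is a schematic diagram of a photographing principle. As shown in FIG. 1, the camera module 110 of a mobile phone generally includes a lens 111 and a sensor 112. When the phone takes a photo, the lens 111 collects the optical signal corresponding to the subject in the shooting scene. The sensor 112 converts the optical signal passing through the lens 111 into an electrical signal and performs analogue-to-digital (A/D) conversion on it. It then outputs the corresponding digital signal to the intermediate processing module 120. This digital signal is the original image captured by the camera module 110, which may be called a RAW image or digital negative. The intermediate processing module 120 performs a series of processing on the received RAW image and finally obtains an image suitable for display, such as a JPEG image. The JPEG image can be transmitted to the phone's display 130 for display and/or to the phone's memory 140 for storage.
Still referring to FIG. 1, the intermediate processing module 120 generates a JPEG image from the RAW image in three steps. First, it performs image signal processing (ISP) on the RAW image, converting it from the RAW domain to the YUV domain; an image in the YUV domain may be called a YUV image. Next, it processes the YUV image with YUV-domain post-processing algorithms. Finally, it encodes the processed YUV image with JPEG encoding to obtain the JPEG image.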
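For example, ISP processing may include: bad pixel correction (DPC), RAW-domain noise reduction, black level correction (BLC), lens shading correction (LSC), auto white balance (AWB), demosaic color interpolation, color correction (color correction matrix, CCM), dynamic range compression (DRC), gamma, 3D look-up table (LUT), YUV-domain noise reduction, sharpening (sharpen), detail enhancement (detail enhance), and so on.
YUV-domain post-processing algorithms may include: high-dynamic-range (HDR) multi-frame registration, fusion, and noise reduction, as well as super-resolution (SR) algorithms that improve sharpness, skin beautification algorithms, distortion correction algorithms, bokeh (background blur) algorithms, and so on.
In this process, better image processing results can be achieved by migrating some YUV-domain image processing algorithms to the RAW domain. For example, HDR multi-frame registration, fusion, and noise reduction can be moved from the YUV domain to the RAW domain. Processing in the RAW domain has two advantages. A RAW image carries higher bit-depth information than a YUV image. It has also not yet undergone ISP processing, so its color, detail, and other information have not been degraded.
However, RAW-domain processing involves more data than YUV-domain processing and places higher demands on algorithm performance, memory, and so on. The computing and memory resources of a terminal device are limited. Migrating some YUV-domain algorithms to the RAW domain is therefore constrained on the terminal device, which tends to limit the processing effect. For example, some image processing algorithms may need to be pruned and adapted to the terminal device's computing power, which leads to unsatisfactory results.
Based on this, an embodiment of this application provides an image processing method in which the terminal device uploads captured RAW images that need processing to the cloud. The cloud makes full use of big-data and computing resources to perform RAW-domain image processing, ISP processing, and YUV-domain processing on the RAW images. It obtains a final image in a first format and returns it to the terminal device.
The first format may include the JPEG format, the high efficiency image file format (HEIF), and the like; this application does not limit the first format.
This image processing method avoids the problem that, when some YUV-domain image processing algorithms are migrated to the RAW domain, the constraints of the terminal device limit the processing effect. It makes full use of the cloud's big-data and computing resources to perform RAW-domain image processing, ISP processing, and YUV-domain processing on the RAW image, achieving better image processing results.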
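Embodiments of this application are described in detail below with reference to the accompanying drawings.
Note the following conventions in the description of this application:
- "At least one" means one or more, and "multiple" means two or more. Similarly, "multiple frames" means two or more frames.
- Words such as "first" and "second" are used only to distinguish descriptions and do not specifically limit any feature.
- "And/or" describes an association between associated objects and indicates that three relationships may exist. For example, "A and/or B" may mean that only A exists, both A and B exist, or only B exists.
- The character "/" generally indicates an "or" relationship between the associated objects before and after it.
The image processing method provided in embodiments of this application can be applied to a device-cloud collaborative system composed of a terminal device and the cloud. In device-cloud collaboration, the "device" is the terminal device and the "cloud" is the cloud side, which may also be called a cloud server or cloud platform. For example, FIG. 2 is a schematic structural diagram of the device-cloud collaborative system provided by an embodiment of this application. As shown in FIG. 2, the system may include a terminal device 210 and a cloud 220, and the terminal device 210 may be connected to the cloud 220 through a wireless network.
In one embodiment, the cloud 220 may be a computer server or a server cluster composed of multiple servers; this application does not limit the implementation architecture of the cloud 220. For the specific form of the terminal device 210, refer to the foregoing embodiments; details are not repeated.
Optionally, FIG. 2 shows one terminal device 210 as an example. It should be understood that the system may include one or more terminal devices 210, and multiple terminal devices 210 may be the same or different; this is not limited here. The image processing method provided in embodiments of this application describes how each terminal device 210 interacts with the cloud 220 to implement image processing.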
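For example, taking a mobile phone as the terminal device, FIG. 3 is a schematic structural diagram of a terminal device provided by an embodiment of this application. As shown in FIG. 3, the phone may include a processor 310, an external memory interface 320, an internal memory 321, a universal serial bus (USB) interface 330, a charging management module 340, a power management module 341, a battery 342, antenna 1, antenna 2, a mobile communication module 350, a wireless communication module 360, an audio module 370, a speaker 370A, a receiver 370B, a microphone 370C, a headset jack 370D, a sensor module 380, buttons 390, a motor 391, an indicator 392, a camera 393, a display 394, a subscriber identification module (SIM) card interface 395, and the like.
The processor 310 may include one or more processing units. For example, the processor 310 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU). Different processing units may be independent components or may be integrated into one or more processors.
The controller may be the nerve center and command center of the phone. It generates operation control signals according to instruction operation codes and timing signals, controlling instruction fetching and execution.
A memory may also be provided in the processor 310 to store instructions and data. In some embodiments, the memory in the processor 310 is a cache. It holds instructions or data that the processor 310 has just used or uses cyclically. If the processor 310 needs those instructions or data again, it can fetch them directly from this memory. This avoids repeated accesses and reduces the waiting time of the processor 310, which improves system efficiency.
In some embodiments, the processor 310 may include one or more interfaces. These may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a SIM interface, and/or a USB interface.
The external memory interface 320 can be used to connect an external memory card, such as a Micro SD card, to expand the phone's storage capacity. The external memory card communicates with the processor 310 through the external memory interface 320 to implement data storage, for example, saving files such as music and videos on the card.
The internal memory 321 can be used to store computer-executable program code, which includes instructions. The processor 310 runs the instructions stored in the internal memory 321 to execute the phone's various functional applications and data processing. The internal memory 321 may include a program storage area and a data storage area. The program storage area can store the operating system and applications required by at least one function (such as sound playback or image playback). The data storage area can store data created during use of the phone (such as image data and a phone book). In addition, the internal memory 321 may include high-speed random access memory. It may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or universal flash storage (UFS).
The charging management module 340 receives charging input from a charger. While charging the battery 342, it can also supply power to the phone through the power management module 341. The power management module 341 connects the battery 342, the charging management module 340, and the processor 310. It can also receive input from the battery 342 to power the phone.
The wireless communication function of the phone can be implemented by antenna 1, antenna 2, the mobile communication module 350, the wireless communication module 360, the modem processor, the baseband processor, and so on. Antenna 1 and antenna 2 are used to transmit and receive electromagnetic wave signals. Each antenna in the phone can cover one or more communication frequency bands. Different antennas can also be multiplexed to improve antenna utilization; for example, antenna 1 can be multiplexed as a diversity antenna for a wireless local area network. In some other embodiments, an antenna may be used in combination with a tuning switch.
The phone implements audio functions, such as music playback and recording, through the audio module 370, the speaker 370A, the receiver 370B, the microphone 370C, the headset jack 370D, the application processor, and so on.
The sensor module 380 may include a pressure sensor 380A, a gyroscope sensor 380B, a barometric pressure sensor 380C, a magnetic sensor 380D, an acceleration sensor 380E, a distance sensor 380F, a proximity light sensor 380G, a fingerprint sensor 380H, a temperature sensor 380J, a touch sensor 380K, an ambient light sensor 380L, a bone conduction sensor 380M, and the like.
The camera 393 may be of multiple types. For example, it may include telephoto, wide-angle, or ultra-wide-angle cameras with different focal ranges. A telephoto camera has a small field of view and suits distant scenes within a small area. A wide-angle camera has a larger field of view. An ultra-wide-angle camera has a field of view larger than a wide-angle camera and can shoot wide scenes such as panoramas. In some embodiments, a telephoto camera with a smaller field of view can rotate to capture scenes in different areas.
The phone captures RAW images through the camera 393. For example, the camera 393 may have the structure of the camera module described in FIG. 1, including at least a lens and a sensor. When a photo or video is taken, the shutter opens and light passes through the lens of the camera 393 onto the sensor. The sensor converts the optical signal passing through the lens into an electrical signal and performs A/D conversion on it. It then outputs the corresponding digital signal, which is the RAW image. Subsequent RAW-domain processing, ISP processing, and YUV-domain processing convert the RAW image into an image viewable by the human eye.
In one possible design, the photosensitive element of the sensor may be a charge coupled device (CCD), and the sensor further includes an A/D converter. In another possible design, the photosensitive element of the sensor may be a complementary metal-oxide-semiconductor (CMOS) sensor.
The display 394 is used to display images, videos, and so on, and includes a display panel. The panel may use a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), Mini LED, Micro LED, Micro OLED, quantum dot light-emitting diodes (QLED), and so on. In some embodiments, the phone may include 1 or N displays 394, where N is a positive integer greater than 1. For example, the display 394 can display the photographing interface, the photo playback interface, and so on.
The phone implements the display function through the GPU, the display 394, the application processor, and so on. The GPU is a microprocessor for image processing that connects the display 394 and the application processor. It performs mathematical and geometric calculations for graphics rendering. The processor 310 may include one or more GPUs that execute program instructions to generate or change display information.
It can be understood that the structure shown in FIG. 3 does not constitute a specific limitation on the phone. In some embodiments, the phone may include more or fewer components than shown in FIG. 3, combine or split some components, or use a different component arrangement. Alternatively, some components shown in FIG. 3 may be implemented in hardware, software, or a combination of software and hardware.
In addition, the terminal device may be another mobile terminal, such as a tablet computer, wearable device, vehicle-mounted device, AR/VR device, notebook computer, UMPC, netbook, or PDA. It may also be a professional shooting device, such as a digital camera, DSLR/mirrorless camera, action camera, gimbal camera, or drone. In these cases, its specific structure may also refer to FIG. 3. For example, such terminal devices may add or remove components relative to the structure in FIG. 3; details are not listed one by one here.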
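It should also be understood that one or more photographing applications may run on the terminal device (such as a mobile phone) to implement the shooting function. For example, the photographing application may include the system-level "Camera" application. It may also include other applications installed on the terminal device that can be used for shooting.
The following example takes a mobile phone as the terminal device and uses the device-cloud collaborative system shown in FIG. 2. It describes how the phone captures RAW images when taking a photo and uploads the captured RAW images that need processing to the cloud for image processing. The image processing procedure in the following embodiments applies equally to other terminal devices with a photographing function that interact with the cloud.
FIG. 4 is a schematic diagram of the interaction between the phone and the cloud provided by an embodiment of this application; the device side is the phone and the cloud side is the cloud. As shown in FIG. 4, in this embodiment the phone may include at least a camera module and an encoding module. When the user takes a photo with the phone, the phone captures RAW images through the camera module. The encoding module then encodes the captured RAW images to obtain the corresponding encoded bitstream, and the phone uploads it to the cloud.
The cloud may include at least a decoding module, a RAW-domain post-processing module, an ISP module, a YUV-domain post-processing module, and a first-format encoder. The decoding module decodes the encoded bitstream from the phone to obtain reconstructed RAW images. The RAW-domain post-processing module, the ISP module, and the YUV-domain post-processing module then apply RAW-domain image processing, ISP processing, and YUV-domain image processing to the reconstructed RAW images in sequence. The YUV-domain post-processing module outputs one YUV image. The first-format encoder encodes this YUV image into the first format, finally producing an image in the first format (for example, a JPEG image). The cloud then returns this image to the phone, which can save it in the gallery or present it to the user.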
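For example, before taking a photo, the user can start the phone's photographing application. The user can tap or touch the camera icon on the phone, and in response the phone starts the camera. (The user may also start the camera through a voice assistant; this is not limited.) After starting the photographing application, the phone presents a photographing interface to the user. It also obtains a preview picture of the current photographing scene and displays it in the interface. For example, FIG. 5 is a schematic diagram of a photographing interface provided by an embodiment of this application. As shown in FIG. 5, when the photographing application starts, the interface presented by the phone may include at least a preview picture and a shutter button.
The preview picture is obtained in a way similar to the photographing principle shown in FIG. 1. The phone captures the RAW image of the current photographing scene through the camera module. The phone's ISP module, YUV-domain post-processing module, and so on (not shown in FIG. 4) then process the RAW image into a preview picture that can be displayed in the photographing interface. Optionally, the phone processes the RAW image more simply for the preview picture than for a photo. For example, it may apply only simple ISP processing to the RAW image to obtain a YUV image. It then converts the YUV image directly into an RGB preview picture for display in the photographing interface, without JPEG-encoding it.
It can be understood that the shutter button in the photographing interface shown in FIG. 5 may essentially be a function control displayed in the interface. When taking a photo, the user taps or touches this control, and in response the phone captures RAW images through the camera module. In some other implementations, the function of the shutter button may instead be provided by another physical button on the phone; this is not limited.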
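Optionally, in this embodiment the phone also has a scene detection function, and it captures RAW images in two steps. First, it uses scene detection to detect the current photographing scene. Based on the result, it determines the image output requirement of the sensor in the camera module. Then, it captures RAW images through the camera module according to that requirement.
For example, the user opens the photographing application, and the phone detects that the current photographing scene is a high-dynamic-range scene (that is, an HDR scene). The phone can then determine that the sensor needs to output multiple RAW frames with different exposure values (EV), which are fused into a high-dynamic-range image. According to this output requirement, the phone captures multiple RAW frames with different EV values through the camera module. To meet the EV requirement of each RAW frame, the phone configures the camera module with different exposures and different sensitivities (sensitivity can be expressed as an ISO value). In other words, in an HDR scene the sensor needs to output multiple RAW frames with different exposures and different ISO values.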
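A worked example makes the EV bookkeeping concrete. Each +1 EV step doubles the total exposure (exposure time multiplied by ISO gain) relative to the EV0 frame. EV-2 therefore receives a quarter of the EV0 exposure, and EV2 four times as much. The sketch below splits that exposure between time and ISO using a hypothetical policy: it changes the exposure time first, and raises ISO only when the time reaches an assumed cap. Real camera pipelines use their own auto-exposure logic, and the function names and the 1/15 s cap are assumptions.

```python
from typing import List, Tuple


def exposure_for_ev(ev_offset: float, base_time_s: float, base_iso: int,
                    max_time_s: float = 1 / 15) -> Tuple[float, int]:
    """Return (exposure_time_s, iso) for a frame at `ev_offset` relative to EV0.

    The product time * ISO scales by 2 ** ev_offset. Hypothetical policy:
    change the exposure time first and make up the rest with ISO if the
    time would exceed max_time_s (e.g. to limit hand-shake blur).
    """
    target = base_time_s * base_iso * 2.0 ** ev_offset
    time_s = min(base_time_s * 2.0 ** ev_offset, max_time_s)
    iso = round(target / time_s)
    return time_s, iso


def bracket(evs: List[float], base_time_s: float, base_iso: int) -> List[Tuple[float, int]]:
    """Exposure settings for an HDR burst such as EV0, EV-2, EV-4, EV2."""
    return [exposure_for_ev(ev, base_time_s, base_iso) for ev in evs]
```

For example, with EV0 at 1/100 s and ISO 100, the EV3 frame would need 8/100 s. The sketch caps the time at 1/15 s and raises ISO to 120 to reach the same total exposure.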
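As another example, the phone may detect that the current photographing scene is a low-light scene. It can then determine that the sensor needs to output multiple frames with different exposures and ISO values for multi-frame fusion denoising. According to this output requirement, the phone captures multiple RAW frames with different exposures and ISO values through the camera module.
As another example, the phone may detect that the current photographing scene has insufficient depth of field. It can then determine that the sensor needs to output multiple frames at different focus distances, which are fused to extend the depth of field (EDOF). According to this output requirement, the phone captures multiple RAW frames at different focus distances through the camera module.
It can be understood that these descriptions of how the phone detects the current photographing scene and determines the sensor output requirement are only examples; embodiments of this application do not limit this.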
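For example, still referring to FIG. 4, the phone's scene detection function can be implemented by a scene detection module deployed in the phone. Optionally, the scene detection module may be a program module (or algorithm unit) in the phone.
In some embodiments, the scene detection module can take the photographing scene selected by the user in the photographing application as the current photographing scene. For example, FIG. 6 is another schematic diagram of the photographing interface provided by an embodiment of this application. As shown in FIG. 6, after the user opens the photographing application, the phone can provide the photographing interface shown in FIG. 6. The interface displays a preview picture of the current photographing scene, captured in real time by the camera module (see the foregoing embodiments for the capture process). The interface also includes function controls for at least one scene, such as the HDR control shown in FIG. 6. When the user taps or touches the HDR control, the scene detection module detects this operation. In response, it determines that the current photographing scene is an HDR scene. The sensor therefore needs to output multiple frames with different exposures and sensitivities, which are fused into a high-dynamic-range image. When the user then taps or touches the shutter button, the phone captures multiple RAW frames with different exposures and ISO values through the camera module according to this output requirement.
In some other embodiments, the scene detection module can determine the current photographing scene from the phone's sensor data and/or the preview picture captured by the camera module. For example, after the phone starts the photographing application, the scene detection module may use ambient light sensor data and/or the preview picture to determine that the current photographing scene is a low-light scene. It then determines that the sensor needs to output multiple frames with different exposures and ISO values for multi-frame fusion denoising. When the user taps or touches the shutter button (see FIG. 5/FIG. 6), the phone captures multiple RAW frames with different exposures and ISO values through the camera module according to this output requirement.
As another example, after the phone starts the photographing application, the scene detection module can judge whether the current photographing scene is an HDR scene from the proportion of overexposed and/or underexposed regions in the preview picture. For instance, if the proportion of overexposed regions exceeds a threshold, the current photographing scene is determined to be an HDR scene. The threshold may be, for example, 60% or 70%, which is not limited here. When the scene detection module determines that the scene is an HDR scene, it determines that the sensor needs to output multiple frames with different exposures and sensitivities, which are fused into a high-dynamic-range image. When the user then taps or touches the shutter button, the phone captures multiple RAW frames with different exposures and ISO values through the camera module according to this output requirement. This application does not limit the specific implementation of the scene detection module.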
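The threshold test can be sketched as below. The patent leaves open how overexposed regions are measured. This sketch assumes, for illustration, that a preview pixel counts as overexposed when its 8-bit luma is at least 250, and it applies the 60% example threshold from the text. The luma cutoff, the helper names, and the scene-to-frame-count table (8 frames for HDR, echoing the 8-frame example later in this description) are assumptions.

```python
from typing import Sequence

# Hypothetical mapping from detected scene to the sensor's output requirement.
FRAMES_PER_SCENE = {"HDR": 8, "normal": 1}


def overexposed_ratio(luma: Sequence[int], over_level: int = 250) -> float:
    """Fraction of preview pixels whose 8-bit luma is at or above over_level."""
    return sum(1 for v in luma if v >= over_level) / len(luma)


def detect_scene(luma: Sequence[int], ratio_threshold: float = 0.6) -> str:
    """Classify the preview as an HDR scene when the overexposed share exceeds the threshold."""
    return "HDR" if overexposed_ratio(luma) > ratio_threshold else "normal"


def frames_to_capture(luma: Sequence[int]) -> int:
    """Number of RAW frames the sensor should output for this preview."""
    return FRAMES_PER_SCENE[detect_scene(luma)]
```

An underexposed-ratio test could be added in the same way to cover the "and/or underexposed" variant.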
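Optionally, the scene detection module may detect that the current photographing scene is an ordinary scene, that is, not one of the special scenes above (HDR, low light, or insufficient depth of field). It then determines that the sensor needs to output one frame. When the user takes a photo, the phone captures one RAW frame through the camera module according to this output requirement.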
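Overall, in embodiments of this application, the phone captures RAW images through the camera module in one of two scenarios: capturing one RAW frame or capturing multiple RAW frames.
In some embodiments, the phone encodes the captured RAW images through the encoding module and uploads the resulting bitstream to the cloud in every case. Whether the camera module captures one RAW frame or multiple RAW frames, the phone encodes them to obtain the corresponding encoded bitstream and uploads it to the cloud.
That is, in this embodiment, the phone uploads the captured RAW images to the cloud for processing whether the camera module captures one or multiple RAW frames.
In some other embodiments, the upload depends on the number of frames. If the camera module captures multiple RAW frames, the encoding module encodes them to obtain the corresponding encoded bitstream, and the phone uploads it to the cloud. If the camera module captures one RAW frame, the phone processes that frame locally (that is, on the phone side). The local ISP module and YUV-domain post-processing module apply ISP processing and YUV-domain image processing in sequence, and the YUV-domain post-processing module outputs one YUV image. The phone's local first-format encoder then encodes that YUV image into the first format, producing an image in the first format (for example, a JPEG image). The phone can save this image in the gallery or present it to the user. For details of processing a single RAW frame, refer to the process shown in FIG. 1; they are not repeated here.
That is, in this embodiment, the phone automatically decides, based on the photographing scene of the captured RAW images, whether to upload them to the cloud for processing. The phone uploads the captured RAW images to the cloud only when the camera module captures multiple RAW frames.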
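In still other embodiments, the phone may let the user choose whether to upload the RAW images captured by the camera module to the cloud for processing. For example, after the phone starts the photographing application, the photographing interface may provide a function control for this choice. By operating the control, the user actively chooses whether the captured RAW images are uploaded to the cloud for processing. The phone then follows the user's selection.
For example, FIG. 7 is yet another schematic diagram of the photographing interface provided by an embodiment of this application. As shown in FIG. 7, in one implementation, when the phone starts the photographing application, the photographing interface may pop up the prompt "Upload to the cloud for processing?". Two function controls, "Yes" and "No", are displayed below the prompt. When the user taps or touches "Yes", the phone determines that the RAW images captured by the camera module need to be uploaded to the cloud for processing. In subsequent photographing, the phone captures RAW images through the camera module when the user taps or touches the shutter button. The encoding module then encodes them into the corresponding bitstream, which the phone uploads to the cloud. When the user taps or touches "No", the phone determines that the captured RAW images do not need to be uploaded. In subsequent photographing, the phone processes the RAW images captured after the user taps or touches the shutter button directly on the device (that is, on the phone side); the specific process may refer to FIG. 1.
Optionally, the prompt and the "Yes" and "No" controls shown in FIG. 7 are displayed only each time the phone starts the photographing application, for the user to make a selection. After the user selects "Yes" or "No", the prompt and the controls disappear from the photographing interface and the user can continue taking photos. Alternatively, if the user makes no selection within a certain period (such as 3, 5, or 8 seconds), the phone may default to "Yes" or "No" and stop displaying the prompt and the controls.
Optionally, the prompt and the "Yes" and "No" controls shown in FIG. 7 are merely examples. In other implementations, the prompt may read "Upload to the cloud for processing to get better image quality?", "Use the cloud to take photos?", or the like. The "Yes" control may be replaced with "OK" and the "No" control with "Cancel". None of these are limited in this application. This application also does not limit where the prompt and the "Yes" and "No" controls are displayed in the photographing interface.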
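For example, FIG. 8 is yet another schematic diagram of the photographing interface provided by an embodiment of this application. As shown in FIG. 8, in another implementation, when the phone starts the photographing application, the photographing interface may display two function controls at the bottom of the preview picture: "Phone processing mode" and "Cloud processing mode". Before tapping or touching the shutter button, the user can select one of them. When the user taps or touches "Cloud processing mode", the phone determines that the RAW images captured by the camera module need to be uploaded to the cloud for processing. In subsequent photographing, the phone captures RAW images through the camera module when the user taps or touches the shutter button. The encoding module then encodes them into the corresponding bitstream, which the phone uploads to the cloud. When the user taps or touches "Phone processing mode", the phone determines that the captured RAW images do not need to be uploaded. In subsequent photographing, the phone processes the RAW images captured after the user taps or touches the shutter button directly on the device (that is, on the phone side); the specific process may refer to FIG. 1.
Optionally, the "Phone processing mode" and "Cloud processing mode" controls shown in FIG. 8 may be displayed only each time the phone starts the photographing application, for the user to select. If the user makes no selection within a certain period (such as 3, 5, or 8 seconds), the phone may default to "Phone processing mode" or "Cloud processing mode" and stop displaying the two controls.
Alternatively, the "Phone processing mode" and "Cloud processing mode" controls shown in FIG. 8 may remain displayed in the photographing interface so that the user can select at any time; this is not limited here. Likewise, when the two controls are always displayed and the user makes no selection within a certain period, the phone may default to "Phone processing mode" or "Cloud processing mode". In this case, a user who has selected "Phone processing mode" can later reselect "Cloud processing mode" to switch, and can switch back from "Cloud processing mode" to "Phone processing mode" in the same way.
In addition, the "Phone processing mode" and "Cloud processing mode" controls shown in FIG. 8 are merely examples. In other implementations, "Phone processing mode" may be replaced with "Local mode" and "Cloud processing mode" with "Cloud mode", and so on; none of these are limited in this application. Likewise, this application does not limit where these controls are displayed in the photographing interface.
In still other embodiments, the conditions described above for deciding whether the phone uploads captured RAW images to the cloud may be partially combined. For example, the phone may first determine from the user's active selection, as shown in FIG. 7/FIG. 8, whether the captured RAW images need to be uploaded to the cloud for processing. If they do, the phone further decides based on the capture scenario in subsequent photographing. It uploads the captured RAW images to the cloud only when the camera module captures multiple RAW frames, and processes locally when the camera module captures one RAW frame. If the user's active selection means the captured RAW images do not need to be uploaded, all RAW images in subsequent photographing are processed locally on the phone, whether the camera module captures multiple frames or one.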
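The combined decision rule reduces to a small predicate. The function and parameter names are hypothetical. `multi_frame_only` selects between two embodiments: the one that uploads every capture once cloud processing is chosen, and the one that uploads only multi-frame captures.

```python
def should_upload_to_cloud(user_chose_cloud: bool, num_frames: int,
                           multi_frame_only: bool = True) -> bool:
    """Decide whether captured RAW frames are encoded and uploaded, or processed locally.

    user_chose_cloud: result of the first selection operation (e.g. "Yes" or
        "Cloud processing mode").
    num_frames: number of RAW frames the sensor output for this shot.
    multi_frame_only: if True, single-frame captures stay on the phone even
        when cloud processing is enabled.
    """
    if not user_chose_cloud:
        return False  # local ISP + YUV post-processing + first-format encoding
    if multi_frame_only:
        return num_frames > 1
    return True
```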
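In embodiments of this application, any of the above user operations that select uploading RAW images to the cloud for processing may be called a first selection operation. Examples are selecting the "Yes" control in FIG. 7 and selecting "Cloud processing mode" in FIG. 8.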
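The following example takes the case in which the camera module's sensor outputs multiple RAW frames. It describes how the phone's encoding module encodes the captured RAW frames and how the cloud's decoding module decodes the corresponding encoded bitstream. It can be understood that when the sensor outputs one RAW frame, encoding and decoding follow the per-frame processing of the multi-frame case; details are not repeated.
FIG. 9 is a schematic diagram of the encoding module provided by an embodiment of this application. As shown in FIG. 9, in one possible design, the phone's encoding module includes an artificial intelligence (AI) encoding network, a quantization module, and an entropy encoding module. After the camera module captures multiple RAW frames, they are first input into the AI encoding network. The network performs AI encoding on them and outputs the corresponding compressed features to the quantization module. The quantization module quantizes the compressed features, for example, by converting their floating-point values into binary numbers or integers. The entropy encoding module entropy-encodes the quantized compressed features to produce the encoded bitstream of the multiple RAW frames. That is, the output of the encoding module is the bitstream corresponding to the multiple RAW frames.
For example, the entropy encoding module may use Shannon coding, Huffman coding, arithmetic coding, or the like; this is not limited here.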
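The quantization and entropy-coding stages can be sketched as follows, leaving out the AI encoding network; its compressed features are represented here by a plain list of floats. This assumes uniform scalar quantization with a hypothetical step size of 0.5. It uses Huffman coding, one of the options listed above. The bitstream is kept as a '0'/'1' string for readability rather than packed into bytes, and all function names are hypothetical.

```python
import heapq
from collections import Counter
from typing import Dict, List


def quantize(features: List[float], step: float = 0.5) -> List[int]:
    """Uniform scalar quantization: map each float feature to an integer index."""
    return [round(f / step) for f in features]


def dequantize(indices: List[int], step: float = 0.5) -> List[float]:
    """Inverse of quantize (the cloud-side dequantization step); error <= step / 2."""
    return [i * step for i in indices]


def huffman_code(symbols: List[int]) -> Dict[int, str]:
    """Build a prefix-free Huffman code table from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:  # degenerate case: a single distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: partial code}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, low = heapq.heappop(heap)
        w2, _, high = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in low.items()}
        merged.update({s: "1" + c for s, c in high.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def entropy_encode(indices: List[int], table: Dict[int, str]) -> str:
    """Concatenate the code words of all quantized indices."""
    return "".join(table[s] for s in indices)


def entropy_decode(bits: str, table: Dict[int, str]) -> List[int]:
    """Walk the bitstring; the code is prefix-free, so the first match is the symbol."""
    inverse = {code: s for s, code in table.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out
```

In this sketch the decoder needs the same code table as the encoder. A real codec would transmit the table or use a model agreed in advance. Frequent quantized values get shorter code words, which is where the entropy coder saves bits.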
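FIG. 10 is a schematic diagram of the decoding module provided by an embodiment of this application. As shown in FIG. 10, corresponding to the phone's encoding module, the cloud's decoding module in this design includes an entropy decoding module, a dequantization module, and an AI decoding network. After receiving the encoded bitstream of the multiple RAW frames from the phone, the cloud first inputs it into the entropy decoding module. This module entropy-decodes the bitstream, reversing the phone's entropy encoding, and outputs the quantized compressed features to the dequantization module. The dequantization module dequantizes these features, reversing the phone's quantization, and outputs the compressed features of the multiple RAW frames to the AI decoding network. The AI decoding network performs AI decoding on the compressed features and outputs reconstructed RAW images in one-to-one correspondence with the multiple RAW frames.
For example, the above AI encoding network and AI decoding network may be a convolutional neural network (CNN), a recurrent neural network (RNN), or the like; this is not limited here.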
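Optionally, the sensor of the camera module may output RAW images in several formats, such as the Bayer pattern, Foveon X3, and Fujifilm X-E3. Taking the RGGB Bayer pattern as an example, the following describes how the AI encoding network in this design performs AI encoding on multiple RAW frames.
FIG. 11 is a schematic diagram of the RGGB arrangement of a RAW image; each RAW frame output by the camera module's sensor may use the RGGB arrangement shown in FIG. 11. R denotes the red component, G the green component, and B the blue component. In FIG. 11, different fill patterns indicate whether each pixel carries R, G, or B component information.
For a RAW image in the RGGB arrangement shown in FIG. 11, the AI encoding network extracts the positions of the four components R, G, G, and B in each RAW frame. It forms four new sub-images consisting entirely of R components, entirely of upper-left G components, entirely of B components, and entirely of lower-right G components. The network can thus learn the intra-frame correlation (also called spatial correlation) among the components within the image. It compresses the RAW image based on this correlation and outputs the compressed features of each RAW frame.
For example, suppose the phone detects that the current photographing scene is an HDR scene and the camera module captures 8 RAW frames with different EV values: EV0, EV0, EV0, EV0, EV0, EV-2, EV-4, and EV2. The camera module inputs these 8 RAW frames into the AI encoding network, which converts each frame into a four-channel R/G/G/B data stream. Each RAW frame corresponds to 4 channels, so the 8 RAW frames correspond to 4 × 8 = 32 channels. The AI encoding network therefore outputs a data stream of size w/2 × h/2 × 32, where w is the width of a RAW image, h is its height, and 32 is the number of channels. This w/2 × h/2 × 32 data stream is the compressed feature of the 8 RAW frames.
The AI decoding network reverses this process when decoding the compressed features of the multiple RAW frames; that is, the AI decoding network is the inverse of the AI encoding network. For example, the cloud receives the encoded bitstream of the 8 RAW frames from the phone. Entropy decoding and dequantization yield the w/2 × h/2 × 32 data stream. The AI decoding network decodes this data stream into a w/2 × h/2 × 32 reconstructed image. Reordering that image according to the RGGB channel arrangement yields 8 reconstructed RAW frames of size w × h.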
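The channel rearrangement described above, taken before any learned compression, can be sketched as follows. Each w × h RGGB mosaic becomes four w/2 × h/2 planes, 8 frames become 32 planes, and the decoder-side reordering interleaves them back. The sketch assumes the standard RGGB 2 × 2 tile (R top-left, G top-right, G bottom-left, B bottom-right), which may label the two G sites differently from FIG. 11. It uses a channel-first list of planes instead of the channel-last w/2 × h/2 × 32 layout in the text (same data, different ordering). The function names are hypothetical.

```python
from typing import List

Image = List[List[int]]  # h x w RAW mosaic, or an h/2 x w/2 plane


def pack_rggb(raw: Image) -> List[Image]:
    """Split one RGGB mosaic into four h/2 x w/2 planes: R, G (R row), G (B row), B."""
    return [
        [row[0::2] for row in raw[0::2]],  # R: even row, even column
        [row[1::2] for row in raw[0::2]],  # G: even row, odd column
        [row[0::2] for row in raw[1::2]],  # G: odd row, even column
        [row[1::2] for row in raw[1::2]],  # B: odd row, odd column
    ]


def unpack_rggb(planes: List[Image]) -> Image:
    """Inverse of pack_rggb: interleave the four planes back into an h x w mosaic."""
    h2, w2 = len(planes[0]), len(planes[0][0])
    raw = [[0] * (2 * w2) for _ in range(2 * h2)]
    for c, (dy, dx) in enumerate([(0, 0), (0, 1), (1, 0), (1, 1)]):
        for i in range(h2):
            for j in range(w2):
                raw[2 * i + dy][2 * j + dx] = planes[c][i][j]
    return raw


def stack_frames(frames: List[Image]) -> List[Image]:
    """N frames -> 4N channel planes of size h/2 x w/2 (e.g. 8 frames -> 32 channels)."""
    return [plane for frame in frames for plane in pack_rggb(frame)]


def unstack_channels(channels: List[Image]) -> List[Image]:
    """4N channel planes -> N mosaics of size h x w (the decoder-side reordering)."""
    return [unpack_rggb(channels[k:k + 4]) for k in range(0, len(channels), 4)]
```

Grouping same-color samples into their own planes puts spatially correlated values next to each other, which is what the text relies on when it says the network can learn intra-frame correlation per component.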
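After the decoding module produces the 8 reconstructed RAW frames, the cloud processes them through the RAW-domain post-processing module, the ISP module, and the YUV-domain post-processing module in sequence to obtain one YUV image. This YUV image is sent to the first-format encoder, which encodes it into the first format, producing an image in the first format (for example, a JPEG image). The cloud then returns this image to the phone, which can save it in the gallery or present it to the user.
In one implementation, the cloud obtains one YUV image from the 8 reconstructed RAW frames as follows. The RAW-domain post-processing module first fuses the 8 reconstructed RAW frames into one RAW frame and inputs it into the ISP module. The ISP module applies a series of ISP processing to this RAW frame and inputs the resulting YUV image into the YUV-domain post-processing module. The YUV-domain post-processing module applies SR, skin beautification, distortion correction, bokeh, and other processing to the YUV image, producing the processed YUV image.
In another implementation, the multi-frame fusion is performed in the YUV domain instead. The RAW-domain post-processing module outputs 8 processed RAW frames, which are input into the ISP module. The ISP module applies a series of ISP processing to them and inputs the 8 resulting YUV images into the YUV-domain post-processing module. The YUV-domain post-processing module first fuses the 8 YUV images into one YUV image. It then applies SR, skin beautification, distortion correction, bokeh, and other processing, producing the processed YUV image.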
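The two orderings differ only in where fusion sits relative to the ISP. In the toy sketch below, fusion is a per-pixel mean and the entire ISP is replaced by a single gamma curve; both are hypothetical stand-ins, as are the function names. Because the curve is non-linear, the two orders generally produce different pixels. RAW-domain fusion also runs the ISP once instead of once per frame.

```python
from typing import Callable, List

Frame = List[float]  # one frame flattened to normalized values in [0, 1]


def mean_fuse(frames: List[Frame]) -> Frame:
    """Hypothetical fusion: per-pixel average (real HDR/denoise fusion is far more involved)."""
    return [sum(px) / len(frames) for px in zip(*frames)]


def toy_isp(frame: Frame, gamma: float = 2.2) -> Frame:
    """Hypothetical stand-in for the whole ISP: a single gamma curve."""
    return [v ** (1 / gamma) for v in frame]


def fuse_in_raw(frames: List[Frame], isp: Callable[[Frame], Frame] = toy_isp,
                fuse: Callable[[List[Frame]], Frame] = mean_fuse) -> Frame:
    """First implementation: fuse the RAW frames, then run the ISP once."""
    return isp(fuse(frames))


def fuse_in_yuv(frames: List[Frame], isp: Callable[[Frame], Frame] = toy_isp,
                fuse: Callable[[List[Frame]], Frame] = mean_fuse) -> Frame:
    """Second implementation: run the ISP on every frame, then fuse in the YUV domain."""
    return fuse([isp(f) for f in frames])
```

With frames `[[0.0, 0.25], [1.0, 0.25]]`, both orders agree on the second pixel, where the frames are identical. On the first pixel, RAW-domain fusion gives about 0.73, while YUV-domain fusion gives 0.5.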
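Optionally, both the AI encoding network and the AI decoding network are obtained by training neural networks (such as the CNN or RNN mentioned above) on sample training data. The sample training data may include sample RAW images and the corresponding sample reconstructed RAW images. A sample RAW image may be a RAW image output by the camera module's sensor in various scenes. A sample reconstructed RAW image is obtained in two stages. The sample RAW image first undergoes ISP processing, YUV-domain processing, and first-format encoding to produce an image in the first format. That image is then degraded back toward the RAW domain by reversing the above processing. During training, the sample RAW image is the input of the AI encoding network and the sample reconstructed RAW image is the output of the AI decoding network. The output of the AI encoding network is the input of the AI decoding network.
In addition, a loss function can compute the loss between the input of the AI encoding network and the output of the AI decoding network. This loss is used to optimize the parameters of both networks (such as neuron weights), with the objective of making the loss as small as possible.
In embodiments of this application, the input of the AI encoding network is a RAW image and the output of the AI decoding network is a reconstructed RAW image. The loss between them is computed in three steps. First, the RAW image input into the AI encoding network passes through the cloud's RAW-domain post-processing module, ISP module, and YUV-domain post-processing module in sequence, producing the corresponding YUV image (hereinafter the input YUV image). Second, the reconstructed RAW image output by the AI decoding network passes through the same modules in sequence, producing the corresponding reconstructed YUV image. Third, the loss between the input YUV image and the reconstructed YUV image is computed and used as the loss between the input of the AI encoding network and the output of the AI decoding network. The parameters of the two networks are therefore optimized to make the loss between the input YUV image and the reconstructed YUV image as small as possible.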
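A minimal sketch of this loss follows, assuming a stand-in pipeline. The cloud's RAW-domain post-processing, ISP, and YUV post-processing are replaced by one hypothetical gamma curve that maps normalized RAW values to an 8-bit-range luma, and MSE serves as the distance. In gradient-based training, the pipeline would also have to be differentiable (or approximated by a differentiable proxy) for gradients to reach the networks; that is outside the scope of this sketch. The function names are hypothetical.

```python
from typing import Callable, List

Raw = List[float]  # normalized RAW samples in [0, 1]


def toy_pipeline(raw: Raw, gamma: float = 2.2) -> List[float]:
    """Hypothetical stand-in for RAW post-processing + ISP + YUV post-processing:
    clip to [0, 1], apply a gamma curve, and scale to the 8-bit range."""
    return [255.0 * min(max(v, 0.0), 1.0) ** (1.0 / gamma) for v in raw]


def mse(a: List[float], b: List[float]) -> float:
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def yuv_domain_loss(raw_in: Raw, raw_rec: Raw,
                    pipeline: Callable[[Raw], List[float]] = toy_pipeline) -> float:
    """Loss between encoder input and decoder output, measured after the pipeline
    (input YUV image vs reconstructed YUV image) instead of on the RAW values."""
    return mse(pipeline(raw_in), pipeline(raw_rec))
```

Under this toy curve, the same RAW-domain error of 0.01 produces a YUV-domain loss more than an order of magnitude larger in a dark region (0.01 vs 0.02) than in a bright one (0.90 vs 0.91). This is the mismatch that motivates measuring the loss after the pipeline.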
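Optionally, when computing this loss, the RAW image input into the AI encoding network may be a sample RAW image from the above sample training data (which may be called a first sample RAW image). It may also be another sample RAW image (which may be called a second sample RAW image).
For example, the loss function may include peak signal-to-noise ratio (PSNR), structural similarity (SSIM), least absolute error loss (L1-loss), and the like. The optimizer algorithm for the parameters of the AI encoding and decoding networks may include stochastic gradient descent (SGD), batch gradient descent (BGD), and the like.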
下面以PSNR和SSIM为例,对本申请实施例中计算AI编码网络的输入与AI解码网络的输出之间的损失的过程进行说明。
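Assume that the input YUV image, obtained by passing the RAW image input to the AI encoding network through the RAW-domain post-processing module, the ISP module, and the YUV-domain post-processing module, is ori, of size m × n (i.e., ori contains m × n pixels); and that the reconstructed YUV image, obtained by passing the reconstructed RAW image that the AI decoding network outputs for that same input through the same three modules, is rec, also of size m × n, where m and n are integers greater than 0. The PSNR between ori and rec can then be computed in the following steps 1) and 2).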
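1) First compute the mean-square error (MSE) between ori and rec by formula (1):

$$\mathrm{MSE} = \frac{1}{mn}\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\left[\mathrm{ori}(i,j) - \mathrm{rec}(i,j)\right]^2 \qquad (1)$$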
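where MSE is the mean-square error between ori and rec, m is the width of ori and rec, n is their height, (i, j) is a pixel coordinate in ori or rec, and ori(i, j) is the value of pixel (i, j) in ori (rec(i, j) likewise in rec).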
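2) Then, from this mean-square error, compute the PSNR between ori and rec by formula (2):

$$\mathrm{PSNR} = 10 \cdot \log_{10}\left(\frac{\mathrm{MAX}_I^2}{\mathrm{MSE}}\right) \qquad (2)$$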
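where $\mathrm{MAX}_I$ is the maximum possible pixel value of ori and rec. For example, if each pixel is represented with 8 valid bits, $\mathrm{MAX}_I$ is 255.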
A larger PSNR between ori and rec means less distortion of rec relative to ori, that is, a smaller loss between ori and rec.
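Formulas (1) and (2) translate directly into code. This pure-Python sketch treats images as nested lists of pixel values and assumes 8-bit data by default (`max_i = 255`); with every pixel off by 1 it reproduces the 48.13 dB figure discussed below.

```python
import math
from typing import Sequence

Image = Sequence[Sequence[float]]  # m x n pixel values, e.g. one Y plane


def mse(ori: Image, rec: Image) -> float:
    """Formula (1): mean of the squared per-pixel differences."""
    total, count = 0.0, 0
    for row_o, row_r in zip(ori, rec):
        for a, b in zip(row_o, row_r):
            total += (a - b) ** 2
            count += 1
    return total / count


def psnr(ori: Image, rec: Image, max_i: float = 255.0) -> float:
    """Formula (2): PSNR in dB; infinite when the two images are identical."""
    err = mse(ori, rec)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_i ** 2 / err)
```

Because MSE is 1 when every pixel differs by exactly 1, the PSNR is 10·log10(255²) ≈ 48.13 dB regardless of image size.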
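The SSIM between ori and rec can be computed by formula (3):

$$\mathrm{SSIM}(\mathrm{ori},\mathrm{rec}) = \frac{(2\mu_{\mathrm{ori}}\mu_{\mathrm{rec}} + c_1)(2\sigma_{\mathrm{ori},\mathrm{rec}} + c_2)}{(\mu_{\mathrm{ori}}^2 + \mu_{\mathrm{rec}}^2 + c_1)(\sigma_{\mathrm{ori}}^2 + \sigma_{\mathrm{rec}}^2 + c_2)} \qquad (3)$$

where $\mu_{\mathrm{ori}}$ and $\mu_{\mathrm{rec}}$ are the mean pixel values of ori and rec, $\sigma_{\mathrm{ori}}^2$ and $\sigma_{\mathrm{rec}}^2$ their variances, $\sigma_{\mathrm{ori},\mathrm{rec}}$ their covariance, and $c_1$ and $c_2$ are constants that keep the division stable.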
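In general, $c_1$ and $c_2$ take the following values:

$$c_1 = (K_1 L)^2, \qquad c_2 = (K_2 L)^2$$

where $K_1$ can be 0.01, $K_2$ can be 0.03, and $L$ is the dynamic range of the pixel values, typically 255.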
A larger SSIM between ori and rec means a smaller loss between ori and rec.
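In this example, the parameters of the AI encoding network and the AI decoding network can be optimized so that the PSNR and SSIM between ori and rec are as large as possible. For example, training can be required to satisfy the condition that the PSNR between ori and rec exceeds a first threshold and the SSIM between ori and rec exceeds a second threshold. The first threshold can be 38, 39, or 40 dB, or a larger value, set according to image-quality requirements; the second threshold can be a value between 0 and 1, such as 0.8, 0.85, or 0.9.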
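Formula (3), the constants $c_1$ and $c_2$, and the threshold check above can be tried out with a small pure-Python sketch. It evaluates SSIM once over the whole image (the common practice of averaging SSIM over local sliding windows is omitted), and the helper `meets_targets` with default thresholds of 40 dB and 0.9 is a hypothetical illustration of the stopping condition; both defaults are simply values picked from the ranges given in the text.

```python
from typing import Sequence

Image = Sequence[Sequence[float]]


def ssim_global(ori: Image, rec: Image, dynamic_range: float = 255.0,
                k1: float = 0.01, k2: float = 0.03) -> float:
    """Formula (3) evaluated once over the whole image (no sliding window)."""
    xs = [v for row in ori for v in row]
    ys = [v for row in rec for v in row]
    n = len(xs)
    mu_x = sum(xs) / n
    mu_y = sum(ys) / n
    var_x = sum((x - mu_x) ** 2 for x in xs) / n
    var_y = sum((y - mu_y) ** 2 for y in ys) / n
    cov_xy = sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / n
    c1 = (k1 * dynamic_range) ** 2   # c1 = (K1 * L)^2
    c2 = (k2 * dynamic_range) ** 2   # c2 = (K2 * L)^2
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))


def meets_targets(psnr_db: float, ssim: float,
                  psnr_threshold: float = 40.0, ssim_threshold: float = 0.9) -> bool:
    """Hypothetical stopping check: both metrics must exceed their thresholds."""
    return psnr_db > psnr_threshold and ssim > ssim_threshold
```

Identical images give an SSIM of 1; any difference in means or structure pushes it below 1.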
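Usually, when an AI encoding network and an AI decoding network are trained, the loss between the input of the encoding network and the output of the decoding network is computed directly. The embodiments of the present invention, however, compress RAW images, and a RAW image is not what the user actually sees: the image the user views (e.g., a JPEG image) is obtained only after a series of ISP processing and YUV-domain processing. Converting a RAW image to the YUV or RGB space through ISP and YUV-domain processing also introduces loss of its own, so the RAW-domain loss does not directly match the loss the user perceives.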
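For example, the mappings in ISP processing prevent the RAW-domain loss from matching the user-perceived loss. When the ISP module performs DRC, gamma correction, and similar processing, RAW data in different value ranges are compressed or expanded into other ranges when mapped to the YUV or RGB domain that the user finally sees, which causes loss. For instance, RAW-domain data typically has 10 to 12 valid bits, whereas the final YUV- or RGB-domain data has 8 valid bits, so DRC is used to map the data down to fewer bits. To preserve quality, this compression is usually nonlinear: the pixel ranges that occur most frequently in the middle of the range are given more bit width. Gamma correction is likewise a curve mapping that stretches brightness, which also makes the final image brightness a nonlinear function of the RAW-domain information.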
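Because the RAW-domain loss does not directly match the user-perceived loss, computing the loss directly between the RAW image input to the AI encoding network and the reconstructed RAW image output by the AI decoding network can be misleading: the RAW-domain loss can look small (high PSNR and/or SSIM between the two RAW images) while the loss after conversion to the YUV or RGB space through ISP and YUV-domain processing is very large.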
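For example, suppose every pixel of the reconstructed RAW image differs from the RAW image by 1. Calculated for 8 valid bits, formula (2) gives a PSNR of 48.13 decibels (dB) between the RAW image and the reconstructed RAW image, which is already very high. In the YUV or RGB domain, a PSNR above 40 dB usually means that no obvious problems are visible to the naked eye. In the RAW domain, however, if such a loss occurs in only one of the four components R, G, G, and B, it produces a noticeable color cast, which is a very obvious problem for the end user.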
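In this embodiment, by contrast, the RAW image input to the AI encoding network and the reconstructed RAW image output by the AI decoding network are first converted into the corresponding input YUV image and reconstructed YUV image, and the loss between these two YUV images is used as the loss between the input of the encoding network and the output of the decoding network. The loss is thus measured in the YUV (or RGB) domain, so that the reconstructed RAW image output by the trained AI decoding network still has a small loss after conversion to the YUV domain. This avoids color differences, distortion, and similar problems in the image finally presented to the user and reduces the impact of the mismatch between RAW-domain loss and user-perceived loss.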
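To make the mismatch concrete, the sketch below pushes the same RAW-domain error through a toy nonlinear curve. The curve (a 10-bit linear RAW value mapped through a 1/2.2 gamma to an 8-bit output) is a hypothetical stand-in for the DRC and gamma steps of a real ISP, not the ISP module described here.

```python
"""Toy illustration: equal RAW-domain errors become unequal after a nonlinear curve."""


def toy_isp(raw_value: int, raw_bits: int = 10, out_bits: int = 8,
            gamma: float = 2.2) -> float:
    """Map one linear RAW value to a gamma-encoded output value (not rounded)."""
    raw_max = (1 << raw_bits) - 1
    out_max = (1 << out_bits) - 1
    return out_max * (raw_value / raw_max) ** (1.0 / gamma)


def output_error(raw_value: int, raw_error: int) -> float:
    """Output-domain error caused by a RAW-domain error of raw_error codes."""
    return abs(toy_isp(raw_value + raw_error) - toy_isp(raw_value))


# The same RAW error of 4 codes, once in a dark region and once in a bright one.
dark = output_error(raw_value=8, raw_error=4)
bright = output_error(raw_value=900, raw_error=4)
print(f"dark: {dark:.2f}, bright: {bright:.2f}")
```

With these settings the dark-region error in the output is more than five times the bright-region error, even though the RAW-domain error is identical. A loss computed after the (real) ISP and YUV processing captures this difference; a RAW-domain loss does not.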
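Optionally, in this embodiment, for scenes in which the sensor of the camera module outputs multiple RAW frames, the multiple RAW frames can also be preprocessed based on the inter-frame correlation among them before the encoding module compresses each RAW frame with the AI encoding network based on its intra-frame correlation.
For example, FIG. 12 is another schematic diagram of the encoding module according to an embodiment of this application. As shown in FIG. 12, the encoding module further includes a correlation processing module. Before the AI encoding network compresses the RAW frames based on intra-frame correlation, the correlation processing module first selects one of the RAW frames as a reference frame, then predicts the other frames from the reference frame using the inter-frame correlation among the frames, and outputs the reference RAW frame and the correlation-processed other RAW frames to the AI encoding network for further processing. A correlation-processed RAW frame is the residual image obtained by predicting that frame from the reference RAW frame. In other words, the reference RAW frame is sent directly to the AI encoding network, whereas each other RAW frame is first predicted using the inter-frame correlation and only then sent to the AI encoding network. This reduces the amount of data taken up by the other RAW frames and thereby improves the compression ratio of the encoding module for multiple RAW frames.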
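Take as an example the case where the phone detects that the current photographing scene is an HDR scene and the camera module captures multiple RAW frames with different EV values. Generally, the RAW frames output by the sensor in an HDR scene have different EV values (some may be equal), and RAW frames with different EV values are linearly related. For example, consider two RAW frames with adjacent EV values, EV0 and EV-1 (i.e., one frame has an EV value of 0 and the other an EV value of −1). Suppose the pixel value at some position (pixel) in the EV0 frame is $P_0$ and the pixel value at the same position in the EV-1 frame is $P_{-1}$. The sensor has a black-level offset; after this offset is subtracted, $P_0$ and $P_{-1}$ differ by a factor of 2, i.e., they satisfy formula (4):

$$P_0 - \mathrm{BLC} = 2 \times (P_{-1} - \mathrm{BLC}) \qquad (4)$$

where BLC is the black-level offset.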
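Of course, because the RAW images output by the sensor are affected by noise, defective pixels, the valid bit range of the pixels, and other factors, $P_0$ and $P_{-1}$ may not satisfy the linear relationship of formula (4) exactly, but overall they follow it.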
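Therefore, for the multiple RAW frames with different EV values output by the sensor in an HDR scene, the correlation processing module can first select one frame as the reference frame (at random or, for example, the first frame) and then predict the other frames from it based on the linear relationship of formula (4); each other RAW frame is then represented by an almost all-zero RAW image, which is its residual image. Since much of the data that does not satisfy formula (4) is caused by defective pixels, noise, and the like, the compression ratio for such data can be improved, especially for data with large absolute values.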
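A minimal sketch of the HDR correlation processing and its inverse follows. It assumes flat lists of integer pixel values, a known black level BLC, and that formula (4) generalizes to a factor of 2^(EV_target − EV_ref) for frames more than one EV step apart (the text states only the adjacent-EV case). The encoder side keeps the reference frame and produces near-zero residuals; the decoder side adds the residuals back to the same prediction, which is the inverse correlation processing described below.

```python
from typing import List


def predict_from_reference(ref: List[int], ev_ref: float, ev_target: float,
                           blc: int) -> List[int]:
    """Predict a frame at ev_target from a reference frame at ev_ref.

    Assumed generalization of formula (4):
        P_target - BLC = (P_ref - BLC) * 2 ** (ev_target - ev_ref)
    """
    gain = 2.0 ** (ev_target - ev_ref)
    return [round((p - blc) * gain) + blc for p in ref]


def to_residual(frame: List[int], ref: List[int], ev_ref: float,
                ev_frame: float, blc: int) -> List[int]:
    """Encoder side: residual = actual - prediction (mostly zeros)."""
    pred = predict_from_reference(ref, ev_ref, ev_frame, blc)
    return [a - p for a, p in zip(frame, pred)]


def from_residual(residual: List[int], ref: List[int], ev_ref: float,
                  ev_frame: float, blc: int) -> List[int]:
    """Decoder side (inverse correlation processing): prediction + residual."""
    pred = predict_from_reference(ref, ev_ref, ev_frame, blc)
    return [p + r for p, r in zip(pred, residual)]
```

The round trip is exact here because both sides compute the same prediction from the same reference; in a lossy pipeline the decoder has only the reconstructed reference, so exact recovery is not guaranteed.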
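It should be understood that the linear relationship of formula (4) is only an example for HDR scenes. Similarly, in other scenes in which the sensor outputs multiple RAW frames, the correlation processing module can also preprocess the RAW frames based on their inter-frame correlation. For example, in an EDOF scene, the correlation processing module can preprocess the RAW frames based on differences in regional sharpness among them.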
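In this embodiment, the correlation processing module can use different algorithms to exploit the inter-frame correlation for different scenes in which the sensor outputs multiple RAW frames; this application imposes no limitation on this. For example, when the phone obtains multiple RAW frames through the camera module, it records metadata for each RAW frame, also called description data or parameter information of the RAW frame. The metadata of a RAW frame can include, for example, its photographing scene (e.g., HDR), its width and height, and its ISO value. The correlation processing module can select the appropriate algorithm based on the metadata of the RAW frames; for an HDR scene, for instance, it selects the algorithm based on formula (4). Optionally, if in some scenes the RAW frames output by the sensor have no linear relationship that can be captured by a mathematical model (i.e., the inter-frame correlation cannot be extracted algorithmically), the correlation processing module can skip inter-frame correlation processing.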
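For example, the correlation processing module can be a program module (or algorithm unit) in the phone that handles the correlation of multiple RAW frames for different photographing scenes. Optionally, for scenes in which the sensor of the camera module outputs a single RAW frame, the correlation processing module can be skipped and the RAW frame sent directly to the AI encoding network for further processing.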
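The metadata-driven choice of algorithm, together with the single-frame bypass, can be sketched as a simple lookup. The `RawMetadata` fields, the registry, and the algorithm names are hypothetical placeholders; the sketch only shows the control flow of picking an algorithm or skipping correlation processing.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class RawMetadata:
    """Per-frame metadata of the kind listed in the text; field names are hypothetical."""
    scene: str       # photographing scene, e.g. "HDR" or "EDOF"
    width: int
    height: int
    iso: int
    ev: float = 0.0


# Hypothetical registry: scene type -> inter-frame correlation algorithm name.
CORRELATION_ALGORITHMS: Dict[str, str] = {
    "HDR": "ev_linear_formula_4",
    "EDOF": "regional_sharpness",
}


def select_correlation_algorithm(frames: List[RawMetadata]) -> Optional[str]:
    """Return the algorithm to apply, or None to skip correlation processing."""
    if len(frames) < 2:
        return None  # single frame: send it straight to the AI encoding network
    scenes = {m.scene for m in frames}
    if len(scenes) != 1:
        return None  # assumption of this sketch: mixed scenes are not handled
    # A scene with no known inter-frame model is skipped as well (get -> None).
    return CORRELATION_ALGORITHMS.get(scenes.pop())
```

Because the same metadata is uploaded with the bitstream, the cloud can run the same lookup to decide which inverse processing to apply.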
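For scenes in which the sensor of the camera module outputs multiple RAW frames, if the encoding module performs correlation processing based on inter-frame correlation before compressing the frames with the AI encoding network based on intra-frame correlation, then AI decoding of the compressed features by the decoding module yields the reconstructed RAW image of the reference frame and the reconstructed residual images of the other frames (reconstructions of the correlation-processed RAW frames). Therefore, mirroring the encoding module, the decoding module must also perform inverse correlation processing on these to obtain reconstructed RAW images in one-to-one correspondence with the original RAW frames.
For example, FIG. 13 is another schematic diagram of the decoding module according to an embodiment of this application. As shown in FIG. 13, the decoding module further includes an inverse correlation processing module. After the AI decoding network decodes the compressed features into the reconstructed RAW image of the reference frame and the reconstructed residual images of the other frames, the inverse correlation processing module performs inverse correlation processing on them to obtain reconstructed RAW images in one-to-one correspondence with the RAW frames. Specifically, the processing of the inverse correlation processing module can be the exact reverse of that of the phone-side correlation processing module; details are not repeated here.
For example, the encoded bitstream of the RAW frames uploaded by the phone to the cloud also includes the metadata of each RAW frame. From this metadata, the inverse correlation processing module can determine which algorithm the phone-side correlation processing module used, and can therefore apply the reverse of that processing to the reconstructed RAW image of the reference frame and the reconstructed residual images of the other frames.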
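Optionally, in some other embodiments of this application, data from scenes in which the sensor outputs multiple RAW frames can be added to the sample training data when training the AI encoding network and the AI decoding network, so that the networks learn the inter-frame correlation among multiple RAW frames by themselves. For such scenes, the function of the correlation processing module described in the foregoing embodiments can then be implemented by the AI encoding network and, correspondingly, the function of the inverse correlation processing module by the AI decoding network.
For example, for an HDR scene, multiple RAW frames output by the sensor in the HDR scene can be added to the sample RAW images, and the corresponding reconstructed RAW frames (one per RAW frame) added to the sample reconstructed RAW images. When the networks are trained on this HDR-related data, the sample RAW frames can be placed in fixed channels according to their EV differences, so that the networks learn the inter-frame correlation among RAW frames with different EV values in HDR scenes.
Alternatively, for an EDOF scene, multiple RAW frames output by the sensor in the EDOF scene can be added to the sample RAW images, and the corresponding reconstructed RAW frames added to the sample reconstructed RAW images. For this EDOF-related data, sample RAW frames with the same focus distance can always be placed in the same channels, so that the networks learn the inter-frame correlation among RAW frames at different focus distances in EDOF scenes.
Thus, when the trained AI encoding network AI-encodes multiple RAW frames, it can both compress each RAW frame based on intra-frame correlation and select one frame as the reference frame to predict the other frames from it based on inter-frame correlation. Correspondingly, AI decoding of the compressed features by the trained AI decoding network is the reverse of the AI encoding process and is not described again.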
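In some embodiments of this application, only one AI encoding network and one corresponding AI decoding network are trained, and this pair applies to a variety of scenes in which the sensor outputs multiple RAW frames. For example, the two networks can first be trained on sample data from HDR scenes so that they learn the inter-frame correlation among RAW frames with different EV values, and then further trained on sample data from EDOF scenes so that they learn the inter-frame correlation among RAW frames at different focus distances. Likewise, they can be trained on sample data from further multi-frame scenes. The resulting AI encoding network and AI decoding network then apply to a variety of scenes in which the sensor outputs multiple RAW frames.
It can be understood that in this case the encoding module of the phone contains only one AI encoding network and the decoding module in the cloud contains only one AI decoding network.
In some other embodiments, a dedicated AI encoding network and AI decoding network can be trained for each scene in which the sensor outputs multiple RAW frames; that is, multiple AI encoding and decoding networks in one-to-one correspondence with the scenes are trained. For example, a first AI encoding network and a first AI decoding network can be trained on HDR sample data so that they learn the inter-frame correlation among RAW frames with different EV values in HDR scenes; a second AI encoding network and a second AI decoding network can be trained on EDOF sample data so that they learn the inter-frame correlation among RAW frames at different focus distances in EDOF scenes; and, similarly, a third AI encoding network and a third AI decoding network can be trained on sample data from yet another multi-frame scene, and so on. Each pair (e.g., the first AI encoding network and the first AI decoding network) then applies to one multi-frame scene.
It can be understood that when each pair applies to one multi-frame scene and the encoding module of the phone contains multiple (e.g., M, where M is an integer greater than 1) AI encoding networks, the decoding module in the cloud correspondingly contains M AI decoding networks. When the sensor outputs multiple RAW frames, the encoding module selects, based on the metadata of the frames, the AI encoding network that matches their scene to AI-encode them. The encoded bitstream uploaded by the phone to the cloud also includes the metadata of each RAW frame, so the decoding module can likewise select, based on this metadata, the AI decoding network that matches the scene to AI-decode them.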
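In still other embodiments, one shared pair of AI encoding and decoding networks can be trained for scenes that are similar to each other, and a dedicated pair for each remaining scene. For example, in both low-light and HDR scenes the sensor outputs multiple RAW frames with different EV values, so one shared pair can be trained for low-light and HDR scenes; in an EDOF scene the sensor outputs multiple RAW frames with different focus distances, so a pair dedicated to EDOF scenes can be trained. Again, when the sensor outputs multiple RAW frames, the encoding module and the decoding module select, based on the metadata of the frames, the AI encoding network and AI decoding network that match the scene; details are not repeated. It should be noted that this application does not limit the correspondence between AI encoding/decoding networks and photographing scenes.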
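The grouping in the last example can be sketched as a table that both sides consult. The scene keys and the network names are hypothetical placeholders for trained models; the point is only that the phone and the cloud, reading the same scene from the frame metadata, pick matching encoder and decoder networks.

```python
from typing import Dict, Tuple

# Hypothetical grouping: low-light and HDR bursts both vary EV, so they share
# network pair 0; EDOF bursts vary the focus distance, so they use pair 1.
SCENE_TO_PAIR: Dict[str, int] = {"LOW_LIGHT": 0, "HDR": 0, "EDOF": 1}

# Placeholder names standing in for trained models (M = 2 pairs here).
AI_ENCODERS = ("encoder_multi_ev", "encoder_multi_focus")   # on the phone
AI_DECODERS = ("decoder_multi_ev", "decoder_multi_focus")   # in the cloud


def select_networks(scene: str) -> Tuple[str, str]:
    """Return the (encoder, decoder) pair for a scene read from frame metadata.

    Raises KeyError for a scene that has no trained pair.
    """
    index = SCENE_TO_PAIR[scene]
    return AI_ENCODERS[index], AI_DECODERS[index]
```

Adding a scene to an existing group is a one-line table change; a new dedicated pair requires appending a model on both the phone and the cloud.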
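The foregoing embodiments give example implementations of the encoding and decoding modules based on AI networks. Optionally, in some other embodiments, the encoding module of the phone can instead be a distributed encoding module, and the decoding module in the cloud correspondingly decodes using a strategy matched to distributed encoding.
Taking the case where the sensor of the camera module outputs multiple RAW frames as an example, after the camera module captures the RAW frames, the distributed encoding module can channel-encode them using distributed source coding (DSC) to obtain multiple groups of bitstream packets. Each RAW frame corresponds to one group of packets; each group contains multiple packets, and each packet includes at least error-correction code and the metadata of the RAW frame to which it belongs.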
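For example, the distributed encoding module can encode each RAW frame with a channel coding scheme such as low-density parity-check (LDPC) codes or turbo codes. For instance, assuming a RAW frame is 2 megabytes (M) in size, when the distributed encoding module encodes it with an LDPC encoding algorithm, every 1024 bits can be grouped into one bitstream packet, yielding 16 bitstream packets for that RAW frame.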
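Packetizing one frame's coded output could look like the sketch below. It assumes the LDPC or turbo parity bits have already been produced by some channel encoder (not shown) and simply cuts them into fixed-size packets, each carrying a copy of the frame's metadata; the `BitstreamPacket` fields and the default packet size are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class BitstreamPacket:
    """One packet of a frame's packet group; field names are hypothetical."""
    frame_index: int
    packet_index: int
    payload: bytes            # error-correction (parity) bits for this packet
    metadata: Dict[str, str]  # metadata of the RAW frame this packet belongs to


def packetize(frame_index: int, parity: bytes, metadata: Dict[str, str],
              packet_bits: int = 1024) -> List[BitstreamPacket]:
    """Split one frame's parity data into fixed-size packets (last may be shorter)."""
    if packet_bits <= 0 or packet_bits % 8:
        raise ValueError("packet_bits must be a positive multiple of 8")
    step = packet_bits // 8
    return [
        BitstreamPacket(frame_index, i, parity[off:off + step], dict(metadata))
        for i, off in enumerate(range(0, len(parity), step))
    ]
```

Each packet carries its own copy of the metadata, so the cloud can interpret any packet it receives without depending on earlier ones.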
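The phone then uploads the bitstream packets of each RAW frame to the cloud frame by frame, in sequence, and the cloud decodes them to obtain reconstructed RAW images. Specifically, FIG. 14 is a schematic flowchart of the processing of the decoding module according to an embodiment of this application. Referring to FIG. 14, for each RAW frame: the phone uploads the frame's bitstream packets; on receiving them, the decoding module in the cloud obtains a predicted value of the frame and uses the error-correction code in the received packets to correct the predicted value, thereby decoding the frame (the decoding step of the decoding module shown in FIG. 14). The decoding module then checks whether the RAW frame has been decoded correctly. If it has, the reconstructed RAW image is output to the subsequent processing module, such as the RAW-domain post-processing module. If decoding from the received packets fails, the error-correction code received so far is insufficient, so the cloud asks the phone to transmit more error-correction code, i.e., notifies the phone to continue transmitting packets, and the process repeats until decoding succeeds. For example, the phone can first upload the 1st packet of the frame; on receiving it, the decoding module first obtains the predicted value of the frame and then uses the error-correction code in the 1st packet to correct the predicted value and decode the frame. If decoding with the 1st packet fails, its error-correction code is insufficient, and the cloud asks the phone to transmit more, i.e., the 2nd packet of the frame, for example by sending the phone a notification message. The cloud then continues decoding with the 2nd packet as well. In the same way, the cloud uses more and more error-correction code until decoding is correct and the binary data matches, at which point it obtains the reconstructed RAW image of the frame and outputs it to the RAW-domain post-processing module for further processing. Here, correct decoding means that decoding has iterated until all parity checks are zero.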
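The decode step (correcting the cloud's prediction with parity information from the phone) rests on the classic syndrome approach to distributed source coding. The toy below swaps LDPC for a Hamming(7,4) code and assumes the prediction differs from the true 7-bit block in at most one bit: the phone sends only the 3-bit syndrome instead of 7 bits, and the cloud flips the one bit that makes its prediction's syndrome match. The packet-request loop is not modeled.

```python
from typing import List


def syndrome(bits: List[int]) -> int:
    """Hamming(7,4) syndrome: XOR of the 1-based positions of all set bits.

    Column i of the parity-check matrix is the binary form of i + 1, so the
    result is a 3-bit value (0..7).
    """
    s = 0
    for i, b in enumerate(bits):
        if b:
            s ^= i + 1
    return s


def encode_block(source: List[int]) -> int:
    """Phone side: transmit only the 3-bit syndrome of a 7-bit source block."""
    if len(source) != 7:
        raise ValueError("Hamming(7,4) blocks have 7 bits")
    return syndrome(source)


def decode_block(prediction: List[int], received_syndrome: int) -> List[int]:
    """Cloud side: correct the prediction so that its syndrome matches.

    If prediction and source differ in at most one bit, the XOR of the two
    syndromes is that bit's 1-based position (0 means no difference).
    """
    diff = syndrome(prediction) ^ received_syndrome
    corrected = list(prediction)
    if diff:
        corrected[diff - 1] ^= 1
    return corrected


def parity_checks_satisfied(candidate: List[int], received_syndrome: int) -> bool:
    """All parity checks agree when the syndromes are equal."""
    return syndrome(candidate) == received_syndrome
```

With two or more wrong bits this toy miscorrects without noticing, so it only illustrates the principle: the better the prediction, the less parity data needs to be sent.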
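Optionally, in this embodiment, for scenes in which the camera module captures multiple RAW frames, the decoding module in the cloud can predict the first RAW frame by intra-frame prediction starting from an initial predicted value (pred). The initial predicted value can be a default (preset) value, such as the middle of the valid bit range; for example, for 8-bit data whose maximum value is 255, the initial predicted value can be 128. The intra-frame prediction here can follow the intra-frame prediction methods of existing image or video coding; no limitation is imposed here.
When predicting the RAW frames after the first (the second, the third, and so on, in the order in which the phone uploads their packets), the decoding module in the cloud can build an inter-frame correlation prediction model from the reconstructed RAW images already decoded and use it to predict those frames.
For example, suppose that in an HDR scene the sensor of the camera module outputs 3 RAW frames with the EV values EV0, EV-1, and EV-2. The metadata of the RAW frame carried in the packets uploaded by the phone then includes at least the photographing scene of the frame (HDR) and the EV value of the frame. The first RAW frame uploaded by the phone has EV0, the second EV-1, and the third EV-2.
For the EV0 RAW frame, the decoding module in the cloud obtains its predicted value by intra-frame prediction from the initial predicted value (pred) and then corrects it with the packets of the EV0 frame as described above, obtaining the reconstructed RAW image of the EV0 frame.
Next, for the EV-1 RAW frame, the decoding module determines from the metadata in the packets of the EV-1 frame that the EV0 and EV-1 frames have the linear relationship of formula (4). It can therefore use the reconstructed RAW image of the EV0 frame as the reference frame and build the following piecewise correlation prediction model for the EV-1 frame to predict its value.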
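where $\mathrm{pred}_{-1}$ is the predicted value of the EV-1 RAW frame and $\mathrm{rec}_0$ is the actual value of the reconstructed RAW image of the EV0 frame; the parameters $a_1$, $a_2$, $b_1$, and $b_2$ have the initial values 2, 2, 0, and 0, respectively; min and max are set to 1/16 and 15/16 of the full-scale value of the valid bits, respectively. For example, with 8 valid bits (full scale 256), min is 16 and max is 240. The number of valid bits depends on the sensor and is not limited here.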
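Using this correlation prediction model, the cloud obtains the predicted value of the EV-1 frame, treats it as data the decoding module has already received, and corrects it as described above with the error-correction code in the packets actually transmitted by the phone until the reconstructed RAW image of the EV-1 frame is obtained.
Further, in some implementations, for the EV-2 RAW frame, the decoding module determines from the metadata in the packets of the EV-2 frame that the EV-1 and EV-2 frames have the linear relationship of formula (4). It can therefore use the reconstructed RAW image of the EV-1 frame as the reference frame and build a piecewise correlation prediction model for the EV-2 frame to predict its value, in the same way as the model for the EV-1 frame above; details are not repeated.
In other implementations, for the EV-2 RAW frame, the decoding module can instead use both the reconstructed RAW image of the EV0 frame and that of the EV-1 frame as reference frames and build the following piecewise correlation prediction model for the EV-2 frame to predict its value.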
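where $\mathrm{pred}_{-2}$ is the predicted value of the EV-2 RAW frame, $\mathrm{rec}_0$ is the actual value of the reconstructed RAW image of the EV0 frame, and $\mathrm{rec}_{-1}$ is the actual value of the reconstructed RAW image of the EV-1 frame; the initial values of the parameters $a_1$, $a_2$, $b_1$, $b_2$, $c_1$, and $c_2$ can be set to 2, 2, 1, 1, 0, and 0, respectively; min and max are the same as in the foregoing embodiment and are not described again.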
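Using this correlation prediction model, the cloud likewise obtains the predicted value of the EV-2 frame, treats it as data the decoding module has already received, and corrects it as described above with the error-correction code in the packets actually transmitted by the phone until the reconstructed RAW image of the EV-2 frame is obtained.
As the ways of predicting the EV-1 and EV-2 frames show, in this embodiment the prediction for a RAW frame of a given EV value can use either the previously reconstructed frame alone or several already-reconstructed frames together as reference frames. That is, anything the cloud has already reconstructed can serve as a reference for subsequent frames: if n frames have been decoded successfully, the next frame can use at most n reference frames (n is an integer greater than 0). Optionally, when the sensor of the camera module outputs more RAW frames, the cloud can decide how many reconstructed frames to use as references according to actual needs; no limitation is imposed here.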
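Optionally, during this decoding, the decoding module in the cloud can also update parameters of the correlation prediction model, such as $a_1$, $a_2$, $b_1$, and $b_2$, from the reconstructed RAW images. For example, after obtaining the reconstructed RAW images of the EV0 and EV-1 frames, the cloud can substitute them into the correlation prediction model of the EV-1 frame (replacing $\mathrm{pred}_{-1}$ with the actual value of the reconstructed RAW image of the EV-1 frame), recompute $a_1$, $a_2$, $b_1$, and $b_2$, and replace the old values with the recomputed ones. The updated parameters can then be used when building the correlation prediction model of the EV-2 frame.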
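Similarly, after the reconstructed image of the EV-2 frame is obtained, parameters such as $a_1$, $a_2$, $b_1$, and $b_2$ can be updated again; details are not repeated. That is, in this embodiment, the cloud can keep refining these parameters as more data becomes available.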
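It should be noted that sensor output usually does not satisfy the linear relationship in dark regions close to 0 or in overexposed regions close to the maximum value of the valid bits. Therefore, in this embodiment, the mathematical model is built piecewise, with a linear relationship assumed within each segment, and is used to predict the pixel values of the next EV frame.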
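In addition, it can be understood that the correlation prediction model for HDR scenes above is only an example; other mathematical models can be built according to the photographing scene of the RAW frames output by the sensor, with their unknown parameters continually refreshed; no limitation is imposed here.
As the above shows, when the phone-side encoding module encodes RAW images by distributed coding, the more accurate the prediction in the cloud, the less error-correction code needs to be transmitted and the higher the compression ratio. This embodiment can therefore make full use of data correlation in the cloud to achieve a higher compression ratio and effectively save upload traffic.
Optionally, in this embodiment, the decoding module in the cloud can also use other data already stored in the cloud to predict a RAW image. This can include historical images uploaded and stored by users (the current user or others), correlation prediction models built while processing historical images, and so on. For example, the metadata of a RAW frame with a given EV value can also include location information recorded when the phone captured the frame (e.g., the phone's latitude and longitude). When the cloud needs to predict this RAW frame, it can first use the frame's location information to retrieve from its database other existing images taken at the same place (e.g., a scenic spot), and then use the RAW images corresponding to those images as reference frames to obtain the predicted value of the RAW frame.
Optionally, in some embodiments, RAW-domain processing algorithms can instead run on the phone. For example, for scenes in which the camera module captures multiple RAW frames, the phone can first fuse them into a single RAW frame and then upload that frame to the cloud for ISP processing, YUV-domain processing, first-format encoding, and so on; the cloud finally returns the first-format image to the phone. For uploading a single frame, see the foregoing embodiments; details are not repeated here.
Optionally, referring to FIG. 4, in this embodiment the cloud can also optimize the YUV image during YUV-domain processing using high-quality reference images stored in a database, so that the final first-format image has better quality. For example, the cloud can learn from the high-quality reference images in the database to obtain network models suited to various scenes, and then, during the photographing process described above, optimize the YUV image with the network model that matches the photographing scene. The network architecture of these models is not specifically limited here.
As another example, selfies are generally taken by the users themselves. If the current photo is blurred because of hand shake or similar causes, then, provided the user authorizes access to their own data, the cloud can apply AI learning to existing data, using facial information from the user's existing sharp photos to refine the current blurred image into a more realistic sharp photo of the user.
Similarly, this embodiment can make full use of cloud resources to achieve better image processing results; further examples are not listed here.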
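Corresponding to the image processing method described in the foregoing embodiments, an embodiment of this application further provides an image processing apparatus applicable to a terminal device. The functions of the apparatus can be implemented by hardware, or by hardware executing corresponding software; the hardware or software includes one or more modules or units corresponding to these functions. For example, FIG. 15 is a schematic structural diagram of an image processing apparatus according to an embodiment of this application. As shown in FIG. 15, the apparatus can include a camera module 1501, an encoding module 1502, a sending module 1503, and a receiving module 1504.
The camera module 1501 is configured to capture, in response to a photographing operation of a user, the RAW image corresponding to the current photographing scene; the encoding module 1502 is configured to encode this RAW image to obtain its encoded bitstream; the sending module 1503 is configured to send the encoded bitstream to the cloud; and the receiving module 1504 is configured to receive from the cloud an image in a first format, which the cloud generates from the encoded bitstream of the RAW image corresponding to the current photographing scene.
Optionally, for the specific encoding process and structure of the encoding module 1502, see the foregoing method embodiments; details are not repeated here.
Optionally, the camera module 1501 is further configured to determine, in response to a first selection operation of the user, that RAW images captured during photographing are to be uploaded to the cloud for processing.
Similarly, an embodiment of this application further provides an image processing apparatus applicable to the cloud. The functions of the apparatus can be implemented by hardware, or by hardware executing corresponding software; the hardware or software includes one or more modules or units corresponding to these functions. For example, FIG. 16 is a schematic structural diagram of another image processing apparatus according to an embodiment of this application. As shown in FIG. 16, the apparatus can include a receiving module 1601, a decoding module 1602, a processing module 1603, and a sending module 1604.
The receiving module 1601 is configured to receive from a terminal device the encoded bitstream of the RAW image corresponding to the current photographing scene; the decoding module 1602 is configured to decode this bitstream to obtain the reconstructed RAW image corresponding to the current photographing scene; the processing module 1603 is configured to process the reconstructed RAW image to generate an image in a first format corresponding to the current photographing scene; and the sending module 1604 is configured to send the first-format image to the terminal device.
For example, the processing module 1603 can include the RAW-domain post-processing module, the ISP module, the YUV-domain post-processing module, the first-format encoder, and so on. For its specific processing and structure, see the foregoing method embodiments.
Optionally, for the specific decoding process and structure of the decoding module 1602, see also the foregoing method embodiments; details are not repeated here.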
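The module split of FIG. 15 and FIG. 16 can be mirrored by a small skeleton in which each module is a pluggable callable. Everything here is hypothetical: zlib stands in for the AI or distributed codec, a direct function call replaces the upload and download, and the processing module merely tags the data instead of running RAW post-processing, ISP, YUV post-processing, and first-format encoding.

```python
import zlib
from dataclasses import dataclass
from typing import Callable


@dataclass
class TerminalApparatus:
    """FIG. 15 modules; every callable is a hypothetical stand-in."""
    capture: Callable[[], bytes]        # camera module 1501
    encode: Callable[[bytes], bytes]    # encoding module 1502
    upload: Callable[[bytes], bytes]    # sending 1503 + receiving 1504 (round trip)

    def take_photo(self) -> bytes:
        raw = self.capture()
        return self.upload(self.encode(raw))


@dataclass
class CloudApparatus:
    """FIG. 16 modules; receiving 1601 and sending 1604 are the call and return."""
    decode: Callable[[bytes], bytes]    # decoding module 1602
    process: Callable[[bytes], bytes]   # processing module 1603

    def handle(self, bitstream: bytes) -> bytes:
        return self.process(self.decode(bitstream))


# Demo wiring: zlib stands in for the codec; "processing" just tags the RAW size.
cloud = CloudApparatus(
    decode=zlib.decompress,
    process=lambda raw: b"FIRST_FORMAT:" + str(len(raw)).encode(),
)
phone = TerminalApparatus(
    capture=lambda: bytes(range(16)) * 4,  # 64 bytes of fake RAW data
    encode=zlib.compress,
    upload=cloud.handle,                   # a direct call replaces the network hop
)
result = phone.take_photo()
```

Because each module is injected, a real encoder, transport, or ISP chain could replace a stand-in without changing the surrounding flow.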
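It should be understood that dividing the above apparatuses into units or modules (hereinafter collectively called units) is merely a logical functional division; in an actual implementation, they can be wholly or partially integrated into one physical entity or be physically separate. The units in an apparatus can all be implemented as software invoked by a processing element, all be implemented in hardware, or some be implemented as software invoked by a processing element and others in hardware.
For example, each unit can be a separately provided processing element or be integrated into a chip of the apparatus; alternatively, it can be stored in a memory as a program that a processing element of the apparatus invokes to perform the unit's function. All or some of these units can be integrated together or implemented independently. The processing element described here may also be called a processor and can be an integrated circuit capable of processing signals. In implementation, the steps of the above method or the above units can be implemented by integrated logic circuits of hardware in the processor element or as software invoked by the processing element.
In one example, the units in the above apparatuses can be one or more integrated circuits configured to implement the above method, for example one or more application-specific integrated circuits (ASICs), one or more digital signal processors (DSPs), one or more field-programmable gate arrays (FPGAs), or a combination of at least two of these forms of integrated circuit.
As another example, when the units in an apparatus are implemented as programs scheduled by a processing element, the processing element can be a general-purpose processor, such as a CPU or another processor capable of invoking programs. As yet another example, these units can be integrated together and implemented as a system-on-a-chip (SoC).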
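In one implementation, the units of the above apparatus that implement the corresponding steps of the above method can be implemented as programs scheduled by a processing element. For example, the apparatus can include a processing element and a storage element, and the processing element invokes a program stored in the storage element to perform the method described in the foregoing method embodiments. The storage element can be on the same chip as the processing element, i.e., an on-chip storage element.
In another implementation, the program for performing the above method can be in a storage element on a different chip from the processing element, i.e., an off-chip storage element. In this case, the processing element loads the program from the off-chip storage element into an on-chip storage element to invoke and perform the method described in the foregoing method embodiments.
For example, an embodiment of this application can further provide an apparatus, such as an electronic device, which can include a processor and a memory configured to store instructions executable by the processor. When the processor executes the instructions, the electronic device implements the steps performed by the terminal device or by the cloud in the image processing method described in the foregoing embodiments. The memory can be inside or outside the electronic device, and there may be one or more processors.
In yet another implementation, the units of the apparatus that implement the steps of the above method can be configured as one or more processing elements, which can be integrated circuits, for example one or more ASICs, one or more DSPs, one or more FPGAs, or a combination of such integrated circuits. These integrated circuits can be integrated together to form a chip.
For example, an embodiment of this application further provides a chip applicable to the above electronic device. The chip includes one or more interface circuits and one or more processors, interconnected by lines; the processor receives computer instructions from the memory of the electronic device through the interface circuit and executes them to implement the steps performed by the terminal device or by the cloud in the image processing method described in the foregoing embodiments.
An embodiment of this application further provides a computer program product including computer-readable code; when the code runs in an electronic device, the electronic device implements the steps performed by the terminal device or by the cloud in the image processing method described in the foregoing embodiments.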
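From the description of the above implementations, a person skilled in the art can clearly understand that, for convenience and brevity of description, only the above division into functional modules is used as an example. In practical applications, the above functions can be allocated to different functional modules as needed; that is, the internal structure of the apparatus can be divided into different functional modules to complete all or some of the functions described above.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into modules or units is merely a logical functional division, and other divisions are possible in an actual implementation; for example, multiple units or components can be combined or integrated into another apparatus, or some features can be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed can be indirect couplings or communication connections through some interfaces, apparatuses, or units, and can be electrical, mechanical, or in other forms.
Units described as separate components may or may not be physically separate, and components shown as units can be one or more physical units, located in one place or distributed across multiple places. Some or all of the units can be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application can be integrated into one processing unit, each unit can exist physically alone, or two or more units can be integrated into one unit. The integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a readable storage medium. Based on such an understanding, the technical solutions of the embodiments of this application essentially, or the part contributing to the prior art, or all or part of the technical solutions, can be embodied in the form of a software product, such as a program. The software product is stored in a program product, such as a computer-readable storage medium, and includes several instructions that cause a device (which can be a single-chip microcomputer, a chip, or the like) or a processor to perform all or some of the steps of the methods described in the embodiments of this application. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc.
For example, an embodiment of this application can further provide a computer-readable storage medium storing computer program instructions. When the computer program instructions are executed by an electronic device, the electronic device implements the steps performed by the terminal device or by the cloud in the image processing method described in the foregoing embodiments.
The foregoing descriptions are merely specific implementations of this application, but the protection scope of this application is not limited thereto. Any variation or replacement within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.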
Claims (16)
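- An image processing method, wherein the method comprises: capturing, by a terminal device in response to a photographing operation of a user, a RAW image corresponding to a current photographing scene; encoding, by the terminal device, the RAW image corresponding to the current photographing scene to obtain an encoded bitstream of the RAW image, and sending the encoded bitstream of the RAW image corresponding to the current photographing scene to a cloud; and receiving, by the terminal device, an image in a first format from the cloud, wherein the image in the first format is generated by the cloud based on the encoded bitstream of the RAW image corresponding to the current photographing scene.
- The method according to claim 1, wherein the RAW image corresponding to the current photographing scene comprises one or more frames; and the encoding, by the terminal device, the RAW image to obtain the encoded bitstream and sending the encoded bitstream to the cloud comprises: when the RAW image corresponding to the current photographing scene comprises multiple frames, encoding, by the terminal device, the multiple frames of the RAW image to obtain an encoded bitstream of the multiple frames, and sending the encoded bitstream of the multiple frames to the cloud.
- The method according to claim 1 or 2, wherein before the capturing, by the terminal device in response to the photographing operation of the user, the RAW image corresponding to the current photographing scene, the method further comprises: determining, by the terminal device in response to a first selection operation of the user, that RAW images captured during photographing are to be uploaded to the cloud for processing.
- The method according to any one of claims 1 to 3, wherein the encoding, by the terminal device, the RAW image corresponding to the current photographing scene to obtain the encoded bitstream comprises: compressing, by the terminal device, the RAW image corresponding to the current photographing scene to obtain a compressed feature of the RAW image; quantizing, by the terminal device, the compressed feature; and performing, by the terminal device, entropy encoding on the quantized compressed feature to obtain the encoded bitstream of the RAW image corresponding to the current photographing scene.
- The method according to claim 4, wherein when the RAW image corresponding to the current photographing scene comprises multiple frames, the compressing, by the terminal device, the RAW image to obtain the compressed feature comprises: determining, by the terminal device based on a type of the current photographing scene, an inter-frame correlation among the multiple frames of the RAW image; selecting, by the terminal device, one of the multiple frames as a reference frame, and predicting, based on the reference frame and the inter-frame correlation, the frames other than the reference frame to obtain residual images corresponding to the other frames; and compressing, by the terminal device, the residual images corresponding to the other frames and the reference frame to obtain a compressed feature of the multiple frames.
- The method according to claim 5, wherein the method further comprises: determining, by the terminal device, the type of the current photographing scene based on metadata information of the multiple frames of the RAW image.
- The method according to any one of claims 1 to 3, wherein the encoding, by the terminal device, the RAW image corresponding to the current photographing scene to obtain the encoded bitstream comprises: performing, by the terminal device, channel encoding on the RAW image by means of distributed source coding to obtain the encoded bitstream of the RAW image corresponding to the current photographing scene; wherein when the RAW image comprises multiple frames, the encoded bitstream comprises multiple groups of bitstream packets in one-to-one correspondence with the multiple frames; when the RAW image comprises one frame, the encoded bitstream comprises one group of bitstream packets corresponding to the one frame; each group of bitstream packets comprises multiple bitstream packets, and each bitstream packet comprises at least an error-correction code and metadata information of the frame of the RAW image to which the bitstream packet corresponds; and the sending, by the terminal device, the encoded bitstream to the cloud comprises: uploading, by the terminal device, the bitstream packets corresponding to each frame of the RAW image to the cloud frame by frame in sequence.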
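- An image processing method, wherein the method comprises: receiving, by a cloud, an encoded bitstream of a RAW image corresponding to a current photographing scene from a terminal device; decoding, by the cloud, the encoded bitstream to obtain a reconstructed RAW image corresponding to the current photographing scene; and processing, by the cloud, the reconstructed RAW image to generate an image in a first format corresponding to the current photographing scene, and sending the image in the first format to the terminal device.
- The method according to claim 8, wherein the decoding, by the cloud, the encoded bitstream to obtain the reconstructed RAW image comprises: performing, by the cloud, entropy decoding on the encoded bitstream of the RAW image corresponding to the current photographing scene to obtain a quantized compressed feature of the RAW image; performing, by the cloud, inverse quantization on the quantized compressed feature to obtain a compressed feature of the RAW image; and decompressing, by the cloud, the compressed feature to obtain the reconstructed RAW image corresponding to the current photographing scene.
- The method according to claim 9, wherein the RAW image corresponding to the current photographing scene comprises multiple frames; and the decompressing, by the cloud, the compressed feature to obtain the reconstructed RAW image comprises: decompressing, by the cloud, the compressed feature of the multiple frames to obtain a reconstructed RAW image corresponding to a reference frame among the multiple frames and residual images corresponding to the other frames; determining, by the cloud based on a type of the current photographing scene, an inter-frame correlation among the multiple frames; and reconstructing, by the cloud, the multiple frames based on the reconstructed RAW image corresponding to the reference frame, the residual images corresponding to the other frames, and the inter-frame correlation, to obtain multiple frames of reconstructed RAW images in one-to-one correspondence with the multiple frames of the RAW image.
- The method according to claim 10, wherein the encoded bitstream of the multiple frames further comprises metadata information of the multiple frames; and before the determining, by the cloud based on the type of the current photographing scene, the inter-frame correlation among the multiple frames, the method further comprises: determining, by the cloud, the type of the current photographing scene based on the metadata information of the multiple frames.
- The method according to claim 8, wherein the encoded bitstream of the RAW image corresponding to the current photographing scene is obtained by the terminal device performing channel encoding on the RAW image by means of distributed source coding; when the RAW image comprises multiple frames, the encoded bitstream comprises multiple groups of bitstream packets in one-to-one correspondence with the multiple frames; when the RAW image comprises one frame, the encoded bitstream comprises one group of bitstream packets corresponding to the one frame; each group of bitstream packets comprises multiple bitstream packets, and each bitstream packet comprises at least an error-correction code and metadata information of the frame of the RAW image to which the bitstream packet corresponds; and the decoding, by the cloud, the encoded bitstream to obtain the reconstructed RAW image comprises: when the RAW image comprises one frame, decoding, by the cloud by means of intra-frame prediction based on an initial predicted value, the received bitstream packets corresponding to the one frame to obtain the reconstructed RAW image corresponding to the one frame; when the RAW image comprises multiple frames, decoding, by the cloud by means of intra-frame prediction based on an initial predicted value, the received bitstream packets corresponding to the first frame to obtain a reconstructed RAW image corresponding to the first frame; and decoding, by the cloud based on at least one of the reconstructed RAW images already obtained by decoding and the inter-frame correlation among the multiple frames, the received bitstream packets corresponding to each frame after the first frame, to obtain a reconstructed RAW image corresponding to each frame after the first frame.
- The method according to any one of claims 8 to 12, wherein when the reconstructed RAW image corresponding to the current photographing scene comprises multiple frames, the processing, by the cloud, the reconstructed RAW image to generate the image in the first format comprises: fusing, by the cloud, the multiple frames of reconstructed RAW images into one frame of reconstructed RAW image in the RAW domain; converting, by the cloud, the fused frame of reconstructed RAW image from the RAW domain to the YUV domain to obtain a YUV image corresponding to the fused frame; and encoding, by the cloud, the YUV image into the first format to obtain the image in the first format corresponding to the current photographing scene.
- The method according to any one of claims 8 to 12, wherein when the reconstructed RAW image corresponding to the current photographing scene comprises multiple frames, the processing, by the cloud, the reconstructed RAW image to generate the image in the first format comprises: converting, by the cloud, the multiple frames of reconstructed RAW images from the RAW domain to the YUV domain to obtain multiple frames of YUV images in one-to-one correspondence with the multiple frames of reconstructed RAW images; fusing, by the cloud, the multiple frames of YUV images into one frame of YUV image in the YUV domain; and encoding, by the cloud, the fused frame of YUV image into the first format to obtain the image in the first format corresponding to the current photographing scene.
- An electronic device, comprising: a processor; and a memory configured to store instructions executable by the processor; wherein when the processor executes the instructions, the electronic device is caused to implement the method according to any one of claims 1 to 7, or the method according to any one of claims 8 to 14.
- A computer-readable storage medium storing computer program instructions, wherein when the computer program instructions are executed by an electronic device, the electronic device is caused to implement the method according to any one of claims 1 to 7, or the method according to any one of claims 8 to 14.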
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US18/260,611 US20240305782A1 (en) | 2021-01-08 | 2022-01-07 | Image processing method and apparatus, device, and storage medium |
| EP22736607.7A EP4254964A4 (en) | 2021-01-08 | 2022-01-07 | IMAGE PROCESSING METHOD AND APPARATUS, DEVICE, AND STORAGE MEDIUM |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110026530.3A CN114760480A (zh) | 2021-01-08 | 2021-01-08 | Image processing method and apparatus, device, and storage medium |
| CN202110026530.3 | 2021-01-08 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2022148446A1 (zh) | 2022-07-14 |
Family ID: 82324993
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2022/070815 Ceased WO2022148446A1 (zh) | 2021-01-08 | 2022-01-07 | Image processing method and apparatus, device, and storage medium |
Country Status (4)
| Country | Link |
|---|---|
| US (1) | US20240305782A1 (zh) |
| EP (1) | EP4254964A4 (zh) |
| CN (1) | CN114760480A (zh) |
| WO (1) | WO2022148446A1 (zh) |
- 2021-01-08: CN CN202110026530.3A, patent CN114760480A (zh), active, Pending
- 2022-01-07: WO PCT/CN2022/070815, patent WO2022148446A1 (zh), not active, Ceased
- 2022-01-07: US US18/260,611, patent US20240305782A1 (en), active, Pending
- 2022-01-07: EP EP22736607.7A, patent EP4254964A4 (en), active, Pending
Also Published As
| Publication number | Publication date |
|---|---|
| CN114760480A (zh) | 2022-07-15 |
| EP4254964A4 (en) | 2024-06-12 |
| EP4254964A1 (en) | 2023-10-04 |
| US20240305782A1 (en) | 2024-09-12 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22736607; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | Wipo information: entry into national phase | Ref document number: 18260611; Country of ref document: US |
| | ENP | Entry into the national phase | Ref document number: 2022736607; Country of ref document: EP; Effective date: 20230629 |
| | NENP | Non-entry into the national phase | Ref country code: DE |

