WO2021190221A1 - Immersive media providing method, obtaining method, apparatus, device, and storage medium - Google Patents

Immersive media providing method, obtaining method, apparatus, device, and storage medium

Info

Publication number
WO2021190221A1
WO2021190221A1 PCT/CN2021/077360 CN2021077360W WO2021190221A1 WO 2021190221 A1 WO2021190221 A1 WO 2021190221A1 CN 2021077360 W CN2021077360 W CN 2021077360W WO 2021190221 A1 WO2021190221 A1 WO 2021190221A1
Authority
WO
WIPO (PCT)
Prior art keywords
resolution
zoom
information
immersive media
selection strategy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2021/077360
Other languages
English (en)
French (fr)
Inventor
胡颖
许晓中
刘杉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to EP21777160.9A (EP4009644A4)
Publication of WO2021190221A1
Priority to US17/679,877 (US12425700B2)
Anticipated expiration
Ceased

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04N21/816 Monomedia components thereof involving special video data, e.g. 3D video
    • H04L65/612 Network streaming of media packets for supporting one-way streaming services, e.g. Internet radio, for unicast
    • H04L65/756 Media network packet handling adapting media to device capabilities
    • H04L65/80 Responding to QoS
    • H04L67/04 Protocols specially adapted for terminals or networks with limited capabilities; specially adapted for terminal portability
    • H04L67/06 Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
    • H04N21/234327 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by decomposing into layers, e.g. base layer and one or more enhancement layers
    • H04N21/234363 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by altering the spatial resolution, e.g. for clients with a lower screen resolution
    • H04N21/2353 Processing of additional data, e.g. scrambling of additional data or processing content descriptors specifically adapted to content descriptors, e.g. coding, compressing or processing of metadata
    • H04N21/2387 Stream processing in response to a playback request from an end-user, e.g. for trick-play
    • H04N21/25825 Management of client data involving client display capabilities, e.g. screen resolution of a mobile phone
    • H04N21/25833 Management of client data involving client hardware characteristics, e.g. manufacturer, processing or storage capabilities
    • H04N21/25891 Management of end-user data being end-user preferences
    • H04N21/84 Generation or processing of descriptive data, e.g. content descriptors
    • H04N21/85406 Content authoring involving a specific file format, e.g. MP4 format
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • H04L67/10 Protocols in which an application is distributed across nodes in the network

Definitions

  • The embodiments of the present application relate to the field of audio and video technologies, and in particular to an immersive media providing method, an immersive media acquisition method, an apparatus, a device, and a storage medium.
  • Immersive media aims to give users an immersive audio-visual experience through audio and video technology.
  • The industry already supports preparing multiple file tracks of different resolutions on the server side, but has not specified rules for selecting among those file tracks.
  • One approach is to randomly select a file track of some resolution and send it to the client; the other is to send the file tracks of all resolutions to the client.
  • The embodiments of the present application provide an immersive media providing method, an immersive media acquisition method, an apparatus, a device, and a storage medium, which can adaptively select the resolution of the immersive media content according to the client's capabilities, thereby improving the utilization of bandwidth resources while ensuring the user experience.
  • the technical solution is as follows:
  • an embodiment of the present application provides a method for providing immersive media, which is executed by a server, and the method includes:
  • an embodiment of the present application provides an immersive media acquisition method, which is executed by a terminal, and the method includes:
  • receiving an immersive media file of a target resolution from the server, where the file format information of the immersive media content of the immersive media file includes resolution description information and resolution selection strategy information; the resolution description information is used to define the candidate resolutions of the immersive media content, and the resolution selection strategy information is used to define the resolution selection strategy of the immersive media content;
  • the immersive media file is presented according to the file format information.
  • an embodiment of the present application provides an immersive media providing device, and the device includes:
  • an adding module, configured to add resolution description information and resolution selection strategy information to the file format information of immersive media content;
  • a resolution selection module, configured to determine the target resolution provided to the client according to the resolution description information and the resolution selection strategy information of the immersive media content;
  • a file sending module, configured to send the immersive media file of the target resolution to the client.
  • an embodiment of the present application provides an immersive media acquisition device, and the device includes:
  • a file receiving module, configured to receive an immersive media file of the target resolution from the server, where the file format information of the immersive media content of the immersive media file includes resolution description information and resolution selection strategy information; the resolution description information is used to define the candidate resolutions of the immersive media content, and the resolution selection strategy information is used to define the resolution selection strategy of the immersive media content;
  • a presentation module, configured to present the immersive media file according to the file format information.
  • An embodiment of the present application provides a computer device that includes a processor and a memory, the memory storing instructions executable by the processor. When the instructions are executed by one or more processors, the foregoing immersive media providing method or immersive media acquisition method is implemented.
  • the computer equipment is a server or a terminal.
  • An embodiment of the present application provides a computer-readable storage medium storing processor-executable instructions that, when executed by one or more processors, implement the above immersive media providing method.
  • An embodiment of the present application provides a computer-readable storage medium storing processor-executable instructions that, when executed by one or more processors, implement the above immersive media acquisition method.
  • an embodiment of the present application provides a computer program product, which is used to implement the above-mentioned immersive media providing method when the computer program product is executed by a processor.
  • an embodiment of the present application provides a computer program product, which is used to implement the above-mentioned immersive media acquisition method when the computer program product is executed by a processor.
  • FIG. 1 is a schematic diagram of an implementation environment provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of an end-to-end processing flow of an immersive media playback system provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a system processing architecture of an immersive media playback system provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a client reference model based on an immersive media application provided by an embodiment of the present application.
  • FIG. 5A is a flowchart of an immersive media providing method provided by an embodiment of the present application.
  • FIG. 5B is a flowchart of an immersive media providing method provided by an embodiment of the present application.
  • FIG. 5C is a flowchart of an immersive media providing method provided by an embodiment of the present application.
  • FIG. 6A is a block diagram of an immersive media providing apparatus according to an embodiment of the present application.
  • FIG. 6B is a block diagram of an immersive media providing apparatus according to an embodiment of the present application.
  • FIG. 7A is a block diagram of an immersive media acquisition device provided by an embodiment of the present application.
  • FIG. 7B is a block diagram of an immersive media acquisition device provided by an embodiment of the present application.
  • FIG. 8 is a structural block diagram of a server provided by an embodiment of the present application.
  • FIG. 9 is a structural block diagram of a terminal provided by an embodiment of the present application.
  • the technical solutions provided in the embodiments of the present application can be applied to any immersive media playback scene, such as an immersive media on-demand or live broadcast scene.
  • FIG. 1 shows a schematic diagram of an implementation environment provided by an embodiment of the present application.
  • the implementation environment can be realized as an immersive media playback system.
  • the implementation environment may include: a terminal 10 and a server 20.
  • the terminal 10 may be an electronic device such as a mobile phone, a tablet computer, a multimedia playback device, a television, a projector, a display, a wearable device, and a personal computer (PC).
  • the terminal 10 can install and run a client with an immersive media playback function.
  • the client may interact with the server 20, request to obtain immersive media content from the server 20, and play the obtained immersive media content.
  • the server 20 is used to provide immersive media content.
  • the server 20 may be one server, a server cluster composed of multiple servers, or a cloud computing service center.
  • the terminal 10 and the server 20 can communicate with each other through the network 30.
  • the network 30 may be a wired network or a wireless network.
  • One or more intermediate nodes, such as a Content Delivery Network (CDN) or other relay or routing devices, may also exist between the terminal 10 and the server 20, which is not limited in the embodiments of the present application.
  • FIG. 2 shows a schematic diagram of the end-to-end processing flow of the immersive media playback system.
  • the processing flow may include: content acquisition and production 21, immersive media encoding/file encapsulation 22, immersive media transmission 23, immersive media decoding/file decapsulation 24, immersive media rendering 25 and other major technical links.
  • Technical links such as content acquisition and production 21, immersive media encoding/file encapsulation 22, and immersive media transmission 23 can be performed by the server, and technical links such as immersive media decoding/file decapsulation 24 and immersive media rendering 25 can be performed by the terminal (such as the client).
  • FIG. 3 shows a schematic diagram of the system processing architecture of the immersive media playback system, including the processing and presentation of immersive media content from the server 31 to the terminal 32 (client), file format, and transmission signaling.
  • Real-world audio-visual scenes are captured by audio sensors, camera equipment (such as ordinary cameras, stereo cameras, and light field cameras), and sensing equipment (such as lidar), converted into a series of data signals, and then produced into virtual reality content that is presented to the user.
  • Camera equipment is deployed in a specific location to obtain video/image content in a certain space.
  • Audio can be obtained through different microphone configurations, and video/image and audio are kept synchronized in time and space.
  • Video/image content production can be divided into 3DoF (Degree of Freedom) and 3DoF+ video production, and 6DoF video production.
  • DoF refers to the degrees of freedom of movement and content interaction available to the user when watching immersive media.
  • 3DoF video production is recorded by a group of cameras or a camera device with multiple cameras and sensors.
  • the camera can usually capture content in all directions around the center of the device.
  • 3DoF+ video is produced by combining 3DoF video with depth information.
  • 6DoF video is mainly produced from content in the form of point clouds and light fields captured by camera arrays.
  • 6DoF media needs to be processed before encoding.
  • point cloud media needs to be cut and mapped before encoding.
  • the collected audio/video is encoded into the corresponding audio and video stream.
  • When point cloud data or light field information is used to represent the collected video, a corresponding encoding method (such as point cloud encoding) needs to be adopted.
  • The encoded media is encapsulated in a file container according to a certain format (such as ISO Base Media File Format (ISOBMFF) or other international standard systems) and combined with the description information of the media, that is, the metadata describing the attributes of the media content.
  • These data and the window metadata form a media file, or an initialization segment and media segments, according to a specific media file format.
  • media presentation description/signaling information and media file resources are stored.
  • the media presentation description/signaling information provides enough notification information to the client, so that the corresponding media content is delivered to the player and consumed under a transmission mechanism.
  • the client can dynamically request media file resources through quality/viewpoint adaptation according to the terminal status, such as head/eye/position tracking, network throughput, etc.
  • the media files are transmitted to the user terminal 32 through a transmission mechanism, for example, Dynamic Adaptive Streaming over HTTP (DASH) and Smart Media Transport (SMT).
  • After receiving the media file, the user terminal 32 performs a series of processing steps on it, such as decapsulation, decoding, stitching/compositing, and rendering, to display virtual reality content.
  • FIG. 4 shows a schematic diagram of a client reference model based on an immersive media application, which defines various functional components of the client.
  • The user terminal selects a media file based on a recommendation from the remote server or on the user's own needs, downloads it from the remote server or receives it as pushed by the remote server, and processes it through a series of components such as the parser 41, the decoder 42, the converter 43, and the renderer 44 to display virtual reality media content. The user terminal can also use remote rendering according to user needs.
  • Parser 41: the parser 41 processes media files or segments, extracts elementary streams, and parses metadata; the parsed metadata is used for rendering.
  • the parser 41 can perform dynamic information processing (such as tracking information of the user's head movement and position) according to the user's actions, such as dynamically selecting the downloaded media segments.
  • Decoder 42: the decoder 42 decodes the media stream provided by the parser 41 and outputs the decoded stream to the converter 43.
  • The converter 43 converts the decoded media into spherical/3D (three-dimensional) video according to the metadata provided by the parser 41. For example, in the 3DoF case a planar image is mapped onto a sphere, and in the 6DoF case, where processing is based on mapping and projection, a 2D (two-dimensional) information stream is reconstructed into 3D data. If necessary, the conversion metadata parsed by the parser 41 can be used.
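The 3DoF plane-to-sphere step can be illustrated with a small sketch. The text does not say which projection the converter uses; the equirectangular layout, the axis convention, and the function name `equirect_pixel_to_sphere` below are all assumptions made for illustration only.

```python
import math

def equirect_pixel_to_sphere(u, v, width, height):
    """Map the centre of pixel (u, v) in an equirectangular image
    to a unit direction vector (x, y, z) on the viewing sphere.

    Assumed convention: the image centre looks along +x, and z points up.
    """
    lon = ((u + 0.5) / width - 0.5) * 2.0 * math.pi  # longitude, -pi..pi
    lat = (0.5 - (v + 0.5) / height) * math.pi       # latitude, -pi/2..pi/2
    x = math.cos(lat) * math.cos(lon)
    y = math.cos(lat) * math.sin(lon)
    z = math.sin(lat)
    return (x, y, z)
```

Every returned vector has unit length, so a renderer could use it directly as a lookup direction on the sphere.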
  • Renderer 44 uses decoded signaling, rendering metadata, and window information (or considering other possible information) to render the video/audio.
  • 3DoF and 3DoF+ mainly render spherical media content based on the current viewpoint, disparity, and depth information, and 6DoF renders the 3D media content in the window from the current viewpoint.
  • Sensing device 45 obtains the direction of the current window and the position information of the user according to the movement of the user, and feeds it back to the user terminal parser 41.
  • the user terminal can choose to download appropriate media based on the window, the direction of the window, and the user's location information, or the parser 41 can select the appropriate media file based on the window and the user's location information.
  • Remote rendering platform 46: the remote rendering platform 46 is deployed on a remote server and renders according to the window, the direction of the window, and the user's location information fed back by the user terminal, or according to the rendering metadata in the media file. The user terminal directly displays the media rendered by the remote rendering platform.
  • The resolution of immersive media is treated as equivalent to its subjective quality, objective quality, and clarity.
  • The resolution of immersive media can be named 8K, 4K, 2K, 1080p, 720p, and so on. Typical resolution values (that is, the number of pixels in the horizontal × vertical directions) for these resolution names are shown in Table-1 below:
  • FIG. 5A shows a flowchart of an immersive media providing method provided by an embodiment of the present application. This method can be applied to the implementation environment shown in FIG. 1. The method can include the following steps (501-504):
  • Step 501: The client obtains its own capability information, which is used to indicate the immersive media playback capability of the device where the client is located.
  • the capability information may include at least one of the following: device capability information, user authority information, and user bandwidth information.
  • the device capability information is used to reflect the processing capabilities of the device where the client is located, such as the rendering capability of immersive media content.
  • The device capability information can be used to indicate the maximum resolution supported by the device where the client is located, so as to inform the server of that maximum resolution.
  • the user authority information is used to reflect the user authority corresponding to the client, such as the level and/or authority information of the user account logged in the client.
  • the user authority information can be used to indicate the maximum resolution supported by the corresponding user authority of the client. This tells the server the maximum resolution that the user of the client has permission to watch.
  • the user bandwidth information is used to reflect the bandwidth capability of the client. For example, the user bandwidth information is used to indicate the upper limit of the user bandwidth corresponding to the client.
  • the capability information introduced above is only exemplary and explanatory.
  • the capability information may also include other information, which is not limited in the embodiment of the present application.
  • The capability information may also include user network information, so as to inform the server of the network type used by the client, such as a cellular network or a Wireless Fidelity (WiFi) network.
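As a minimal sketch, the capability information described above could be held in a small record type. The class name, field names, and string values below are hypothetical and are not defined by the application; only the four kinds of information come from the text.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CapabilityInfo:
    # Device capability information: maximum resolution the device supports.
    max_device_resolution: Optional[str] = None
    # User authority information: maximum resolution the user may watch.
    max_permitted_resolution: Optional[str] = None
    # User bandwidth information: upper limit of the user's bandwidth.
    bandwidth_limit_mbps: Optional[float] = None
    # Optional user network information, e.g. "cellular" or "wifi".
    network_type: Optional[str] = None

    def has_required_field(self) -> bool:
        """True if at least one of device, authority, or bandwidth info is set."""
        return any(
            value is not None
            for value in (
                self.max_device_resolution,
                self.max_permitted_resolution,
                self.bandwidth_limit_mbps,
            )
        )
```

Making every field optional mirrors the "at least one of" wording: a client may report only what it knows.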
  • Step 502: The client sends capability information to the server.
  • the client sends capability information to the server through the network connection with the server.
  • the server receives capability information from the client.
  • the capability information may be carried in a request message (such as an immersive media playback request, used to request to play immersive media content) and sent, or it may be sent separately, which is not limited in the embodiment of the present application.
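One way to carry the capability information inside a playback request is sketched below. The application does not specify a wire format; the JSON encoding, the message type string, and every field name here are assumptions for illustration.

```python
import json

def build_playback_request(content_id, capability):
    """Client side: embed the capability information in the playback request."""
    message = {
        "type": "immersive_media_playback_request",  # hypothetical message type
        "content_id": content_id,
        "capability": capability,
    }
    return json.dumps(message, sort_keys=True)

def parse_playback_request(raw):
    """Server side: recover the content id and the capability information."""
    message = json.loads(raw)
    # A request sent without capability information yields an empty dict.
    return message["content_id"], message.get("capability", {})
```

If the capability information is sent separately instead, the same dictionary could simply be serialized as its own message.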
  • Step 503: The server determines the target resolution provided to the client from the candidate resolutions of the immersive media content according to the resolution selection strategy and capability information of the immersive media content.
  • the server side stores the immersive media content and the resolution selection strategy of the immersive media content.
  • the immersive media content includes at least one candidate resolution.
  • the immersive media content may include multiple candidate resolutions.
  • The server combines the resolution selection strategy of the immersive media content with the capability information sent by the client, and determines, from the multiple candidate resolutions, the target resolution provided to the client. The target resolution may be one of the multiple candidate resolutions.
  • the resolution selection strategy of the immersive media content can be preset and stored on the server side.
  • the resolution selection strategy may be to first select, from the multiple candidate resolutions of the immersive media content, the candidate resolutions that meet the requirements of the above-mentioned capability information, and then select the maximum resolution among them as the target resolution.
  • the candidate resolutions of immersive media content include 8K, 4K, 2K, 1080p, and 720p from large to small.
  • the capability information of the client includes: the maximum resolution supported by the device where the client is located is 4K, the user authority corresponding to the client is a common authority, and the maximum resolution supported by the common authority is 2K; then, the server selects 2K as the target resolution.
  • candidate resolutions for immersive media content include 8K, 4K, 2K, 1080p, and 720p, from large to small.
  • the capability information of the client includes: the maximum resolution supported by the device where the client is located is 4K, the user authority corresponding to the client is an advanced authority, the maximum resolution supported by the advanced authority is 8K, the upper limit of the user bandwidth corresponding to the client is 10mbps, and the maximum resolution supported by that upper bandwidth limit is 4K; then, the server selects 4K as the target resolution.
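  • The two examples above can be sketched in Python as follows. This is an illustrative reading of the "largest candidate that satisfies every limit" strategy, not the application's implementation; the function name, the parameter names, and the assumption that each limit is expressed as one of the candidate resolutions are all hypothetical.

```python
from typing import Optional, Sequence

# Candidate resolutions from the example, ordered from largest to smallest.
CANDIDATES = ["8K", "4K", "2K", "1080p", "720p"]


def select_target_resolution(
    candidates: Sequence[str],
    device_max: str,
    authority_max: Optional[str] = None,
    bandwidth_max: Optional[str] = None,
) -> str:
    """Return the largest candidate that satisfies every reported limit.

    `candidates` must be ordered from largest to smallest. Each limit is the
    maximum resolution allowed by one item of capability information and is
    assumed to be one of the candidates; None means "not reported".
    """
    rank = {name: i for i, name in enumerate(candidates)}  # 0 = largest
    limits = [m for m in (device_max, authority_max, bandwidth_max) if m is not None]
    # The tightest limit is the one with the highest rank (smallest resolution).
    ceiling = max(rank[m] for m in limits)
    return candidates[ceiling]
```

  • With these definitions, `select_target_resolution(CANDIDATES, "4K", "2K")` returns `"2K"` (first example) and `select_target_resolution(CANDIDATES, "4K", "8K", "4K")` returns `"4K"` (second example).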
  • Step 504 The server sends the immersive media file of the target resolution to the client.
  • the immersive media content may include the above-mentioned file tracks of multiple candidate resolutions.
  • the file track of the target resolution is packaged as an immersive media file and delivered to the client.
  • the server sends the immersive media file of the target resolution of the immersive media content to the client through the network connection with the client.
  • the client receives the immersive media file of the target resolution from the server.
  • the embodiment of the present application further includes the following step 505:
  • Step 505 The client plays the immersive media file of the target resolution.
  • after receiving the immersive media file of the target resolution, the client can play the immersive media file.
  • FIG. 5B shows a flowchart of an immersive media providing method provided by an embodiment of the present application.
  • This method can be applied to the implementation environment shown in FIG. 1, in particular, it can be applied to the server 20 in FIG. 1.
  • the method can include the following steps (511-513):
  • Step 511 Add resolution description information and resolution selection strategy information to the file format information of the immersive media content, where the resolution description information and the resolution selection strategy information are the same as those in the other embodiments of the present application and will not be repeated here.
  • Step 512 Determine a target resolution provided to the client according to the resolution description information of the immersive media content and the resolution selection strategy information;
  • Step 513 Send the immersive media file of the target resolution to the client.
  • FIG. 5C shows a flowchart of an immersive media providing method provided by an embodiment of the present application.
  • This method can be applied to the implementation environment shown in FIG. 1, in particular, it can be applied to the terminal 10 in FIG. 1.
  • the method includes the following steps (521-522):
  • Step 521 Receive an immersive media file of the target resolution from the server, where the file format information of the immersive media content of the immersive media file includes resolution description information and resolution selection strategy information, the resolution description information is used to define the candidate resolutions of the immersive media content, and the resolution selection strategy information is used to define the resolution selection strategy of the immersive media content;
  • Step 522 Present the immersive media file according to the file format information.
  • the technical solution provided by the embodiments of the present application selects the target resolution from the candidate resolutions of the immersive media content based on the capability information of the client and the resolution selection strategy of the immersive media content, and sends the immersive media file of the target resolution to the client.
  • this provides a technical solution for adaptively selecting the resolution of the immersive media content according to the client's capabilities: the maximum resolution among the candidate resolutions that meet the requirements of the client's capability information can be selected and provided to the client, improving the utilization of bandwidth resources while ensuring user experience.
  • the file format information of the immersive media content includes: resolution selection strategy information and resolution description information.
  • the resolution selection strategy information is used to define the resolution selection strategy of the immersive media content
  • the resolution description information is used to define the candidate resolution of the immersive media content.
  • different resolution selection strategies and/or different candidate resolutions can be defined in the file format information, thereby enhancing the flexibility of adaptive resolution selection for different immersive media content.
  • the resolution selection strategy information includes: a zoom strategy type field, which is used to indicate the type (or referred to as identification) of the resolution selection strategy adopted by the immersive media content.
  • when the zoom strategy type field takes different values, it means that different resolution selection strategies are adopted.
  • the values corresponding to various resolution selection strategies can be pre-defined or pre-configured, which is not limited in the embodiment of the present application.
  • the value of the zoom strategy type field is the first value, which indicates that the resolution selection strategy is to prioritize viewing quality where the device capability allows; the value of the zoom strategy type field is the second value, which indicates that the resolution selection strategy is to prioritize viewing quality within the user's bandwidth limit where the device capability allows.
  • the first value is 0 and the second value is 1.
  • the resolution selection strategies introduced above are only exemplary and explanatory. In the embodiments of the present application, the number, content, and corresponding values of the resolution selection strategies are not limited, and can be flexibly set based on actual conditions.
  • the resolution selection strategy information may further include: a zoom strategy description field, which is used to provide a text description of the resolution selection strategy.
  • the resolution selection strategy indicated by the zoom strategy type field may require some description information, such as user bandwidth limitation, etc., which can be described in the zoom strategy description field.
  • the resolution selection strategy information may further include: a zoom strategy description length field, which is used to indicate the length of the text description in the zoom strategy description field.
  • the resolution description information includes: a quantity indication field and a zoom ratio indication field.
  • the quantity indication field is used to indicate the number of zoom areas included in the immersive media content
  • the zoom ratio indication field is used to indicate the zoom ratio of the zoom area.
  • different zoom areas correspond to different candidate resolutions. There may be one or more zoom areas in the spherical area of the same omnidirectional immersive media content (such as omnidirectional video) or the 2D area on the projected image. Among them, the video data of different zoom areas have different resolutions or quality.
  • the aforementioned zoom ratio refers to the zoom ratio of the zoom area relative to the original area (that is, the spherical area or the 2D area described above).
  • when the zoom ratio indication field takes different values, it means different zoom ratios.
  • the values corresponding to various zoom ratios can be pre-defined or pre-configured, which is not limited in the embodiment of the present application.
  • the value of the zoom ratio indication field is 0, which means that the zoom area is not scaled relative to the original area; the value of the zoom ratio indication field is 1, which means that the width and height of the zoom area are each 1/2 of those of the original area; the value of the zoom ratio indication field is 2, which means that the width and height of the zoom area are each 1/4 of those of the original area; the value of the zoom ratio indication field is 3, which means that the width and height of the zoom area are each 1/6 of those of the original area;
  • the value of the zoom ratio indication field is 4, which means that the width and height of the zoom area are each 1/8 of those of the original area.
  • the zoom ratio introduced above is only exemplary and explanatory.
  • the number, value, and corresponding value of the zoom ratio are not limited, which can be flexibly set in accordance with the actual situation.
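  • The exemplary mapping from zoom ratio indication values to scale factors can be sketched in Python as follows. The helper name `zoomed_size` and the 7680x4320 pixel size used for 8K in the usage note are assumptions made for illustration only.

```python
from fractions import Fraction
from typing import Tuple

# Exemplary mapping from the zoom ratio indication value to the factor applied
# to both the width and the height of the original area.
ZOOM_RATIO_SCALE = {
    0: Fraction(1, 1),  # not scaled
    1: Fraction(1, 2),
    2: Fraction(1, 4),
    3: Fraction(1, 6),
    4: Fraction(1, 8),
}


def zoomed_size(width: int, height: int, zoom_ratio: int) -> Tuple[int, int]:
    """Return the (width, height) of a zoom area for the given indication value."""
    scale = ZOOM_RATIO_SCALE[zoom_ratio]  # KeyError for values outside the example
    return int(width * scale), int(height * scale)
```

  • For an original area of 7680x4320, `zoomed_size(7680, 4320, 1)` returns `(3840, 2160)` and `zoomed_size(7680, 4320, 2)` returns `(1920, 1080)`.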
  • the resolution description information further includes at least one of the following fields: a zoom algorithm type field, a zoom symbol type field, a zoom area type field, and a zoom area description field.
  • the zoom algorithm type field is used to indicate the zoom algorithm type of the zoom area
  • the zoom symbol type field is used to indicate the boundary symbol type of the zoom area
  • the zoom area type field is used to indicate the type of zoom area
  • the zoom area description field is used to provide information about the zoom area.
  • the file format information of the immersive media content may include the following zoom area structure:
  • num_regions the number indication field described above, used to indicate the number of zoom regions included in the immersive media content.
  • this field may indicate the number of zoom areas corresponding to the spherical area of the same omnidirectional video or the 2D area on the projected image.
  • zoom_strategy_type The zoom strategy type field described above, used to indicate the resolution selection strategy adopted by the immersive media content. For example, this field can indicate the type of strategy for selecting zoom areas of different resolutions or qualities. Examples can be shown in Table-2 below:
  • Table-2 (value: description)
  • 0: the resolution selection strategy is to give priority to viewing quality if the device capability allows
  • 1: the resolution selection strategy is to give priority to viewing quality within the user's bandwidth limit when the device capability allows
  • 2-255: undefined
  • zoom_strategy_description_length the zoom strategy description length field introduced above, used to indicate the length of the text description in the zoom strategy description field.
  • this field may indicate the length, in bytes, of the description part of the zoom strategy.
  • zoom_strategy_description The zoom strategy description field described above, used to provide a text description of the resolution selection strategy.
  • the field may be a UTF-8 string ending with a null character, providing a text description of the zoom strategy (i.e., the resolution selection strategy).
  • zoom_reg_width[i], zoom_reg_height[i], zoom_reg_top[i], zoom_reg_left[i] respectively define the width, height, vertical offset and horizontal offset of the i-th zoom area, i is a positive integer.
  • zoom_ratio the zoom ratio indication field described above, used to indicate the zoom ratio of the zoom area.
  • the allowed value in this field indicates the different zoom ratios supported by the system.
  • the corresponding relationship between the value of this field and the zoom ratio may be as shown in the following Table-3:
  • the zoom_ratio corresponding to the original video track A is 0.
  • zoom_algorithm_type the zoom algorithm type field described above, used to indicate the zoom algorithm type of the zoom area.
  • zoom_symbolization_type the zoom symbol type field described above, used to indicate the boundary symbol type of the zoom area.
  • zoom_area_type The zoom area type field described above, used to indicate the type of zoom area. Exemplarily, the corresponding relationship between the value of this field and the zoom area type may be as shown in the following Table-4:
  • zoom_description The zoom area description field described above, a UTF-8 string ending with a null character, used to provide a text description of the zoom area.
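  • The fields listed above can be pictured with the following in-memory Python sketch. It only mirrors the field names; the class names, the bit widths and ordering of the fields, and whether `zoom_strategy_description_length` counts the terminating null character are not stated in this passage and are assumptions here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ZoomRegion:
    """One zoom area; attribute names follow the fields described above."""
    zoom_reg_width: int
    zoom_reg_height: int
    zoom_reg_top: int    # vertical offset
    zoom_reg_left: int   # horizontal offset
    zoom_ratio: int
    zoom_algorithm_type: int = 0
    zoom_symbolization_type: int = 0
    zoom_area_type: int = 0
    zoom_description: str = ""


@dataclass
class ZoomInfo:
    """Container for the strategy fields and the list of zoom areas."""
    zoom_strategy_type: int
    zoom_strategy_description: str = ""
    regions: List[ZoomRegion] = field(default_factory=list)

    @property
    def num_regions(self) -> int:
        return len(self.regions)

    def encoded_description(self) -> bytes:
        # UTF-8 string ending with a null character, as described above.
        return self.zoom_strategy_description.encode("utf-8") + b"\x00"

    @property
    def zoom_strategy_description_length(self) -> int:
        # Assumption: the length in bytes includes the terminating null.
        return len(self.encoded_description())
```

  • For example, `ZoomInfo(1, "Limit bandwidth: 10mbps", [ZoomRegion(3840, 2160, 0, 0, 1)])` describes one zoom area with a `zoom_ratio` of 1 under strategy type 1.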
  • the resolution selection strategy and candidate resolutions of the immersive media content are defined in the file format information, so that the server can select a suitable resolution for the client according to the file format information.
  • the server side stores the video file. Assume that the unscaled video resolution is 8K and that the video file contains video file tracks of multiple resolutions (that is, multiple definitions), with zoom_ratio values of 0 (corresponding to 8K resolution), 1 (corresponding to 4K resolution), and 2 (corresponding to 1080p resolution).
  • the server sets the resolution selection strategy to 1, that is, the viewing quality within a certain bandwidth limit is prioritized when the device capability allows it, and the bandwidth limit is 10mbps, which is described as "Limit bandwidth: 10mbps" in zoom_strategy_description.
  • the client side (also called the player side) sends capability information to the server side. Assume that user A can consume 8K video and is an ordinary user; user B can consume 4K video and is an advanced user; and user C can consume 8K video and is an advanced user. For example, advanced users have a higher priority than ordinary users.
  • the server decides:
  • User A is an ordinary user and needs to be subject to a bandwidth limit of 10mbps.
  • the video sent to user A should be a video with a resolution of 8K or less and a bandwidth of less than 10mbps (4K video is assumed in this embodiment).
  • This video corresponds to a file track with a zoom_ratio of 1. Therefore, the server repackages the file track whose zoom_ratio is 1 into a video file and sends it to user A.
  • User B is an advanced user and is not limited by 10mbps bandwidth.
  • the video sent to user B should be the highest resolution video that can be consumed, that is, 4K video. Therefore, the server repackages the file track whose zoom_ratio is 1 into a video file and sends it to user B.
  • User C is an advanced user and is not limited by 10mbps bandwidth.
  • the video sent to user C should be the highest resolution video that can be consumed, that is, 8K video. Therefore, the server repackages the file track with a zoom_ratio of 0 as a video file and sends it to user C.
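  • The decisions for users A, B, and C can be reproduced with the following Python sketch. The function name and the constant `BANDWIDTH_CAP_RESOLUTION` are hypothetical; the constant encodes the embodiment's assumption that 4K is the highest resolution fitting within the 10mbps limit, which applies only to ordinary users in this example.

```python
from typing import Dict

# Tracks stored on the server in this example: zoom_ratio -> resolution.
TRACKS: Dict[int, str] = {0: "8K", 1: "4K", 2: "1080p"}
RANK = {"8K": 0, "4K": 1, "1080p": 2}  # lower rank = higher resolution

# Assumption taken from the embodiment: 4K fits within the 10mbps limit.
BANDWIDTH_CAP_RESOLUTION = "4K"


def pick_zoom_ratio(device_max: str, advanced: bool) -> int:
    """Pick the track for a user under resolution selection strategy 1."""
    ceiling = RANK[device_max]
    if not advanced:
        # Ordinary users are additionally subject to the bandwidth limit.
        ceiling = max(ceiling, RANK[BANDWIDTH_CAP_RESOLUTION])
    eligible = [z for z, res in TRACKS.items() if RANK[res] >= ceiling]
    # Highest-resolution track among the eligible ones.
    return min(eligible, key=lambda z: RANK[TRACKS[z]])
```

  • Here `pick_zoom_ratio("8K", advanced=False)` returns 1 for user A, `pick_zoom_ratio("4K", advanced=True)` returns 1 for user B, and `pick_zoom_ratio("8K", advanced=True)` returns 0 for user C, matching the tracks chosen above.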
  • the scaling value and the corresponding video resolution are not limited to the examples given.
  • the server can select the appropriate video files to send to the corresponding users according to the stored video files of different resolutions.
  • the server may not necessarily store video files with resolutions corresponding to all possible scaling ratios. In this case, according to the existing video files of different resolutions and the resolution indicated by the zoom ratio, the video file that meets the conditions and is closest to the target video resolution can be selected and sent to the corresponding user.
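  • One possible reading of this fallback, sketched in Python: among the stored files, keep those that do not exceed the target resolution and choose the largest of them. Interpreting "meets the conditions" as "does not exceed the target", and comparing resolutions by frame height in pixels, are assumptions of this sketch.

```python
from typing import Iterable, Optional


def closest_available(available_heights: Iterable[int], target_height: int) -> Optional[int]:
    """Return the stored resolution closest to the target without exceeding it.

    Resolutions are compared by frame height in pixels. Returns None when no
    stored file satisfies the limit.
    """
    eligible = [h for h in available_heights if h <= target_height]
    return max(eligible) if eligible else None
```

  • For instance, if only 4320-, 1080-, and 720-line files are stored and the target is 2160 lines, `closest_available([4320, 1080, 720], 2160)` returns 1080.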
  • the technical solution of the present application is introduced and explained only from the perspective of the interaction between the server and the client.
  • the above-mentioned steps related to the server execution can be separately realized as a server-side immersive media providing method; the aforementioned steps related to the client-side execution can be separately realized as a client-side immersive media acquisition method.
  • FIG. 6A shows a block diagram of an immersive media providing apparatus according to an embodiment of the present application.
  • the device has the function of realizing the above example of the immersive media providing method, and the function can be realized by hardware, or by hardware executing corresponding software.
  • the device can be the server described above, or it can be set on the server.
  • the device 600 may include: an information receiving module 610, a resolution selection module 620, and a file sending module 630.
  • the information receiving module 610 is configured to receive capability information from the client, where the capability information is used to indicate the immersive media playback capability of the device where the client is located.
  • the resolution selection module 620 is configured to determine the target resolution provided to the client from the candidate resolutions of the immersive media content according to the resolution selection strategy of the immersive media content and the capability information.
  • the file sending module 630 is configured to send the immersive media file of the target resolution to the client.
  • the file format information of the immersive media content includes: resolution description information and resolution selection strategy information; wherein the resolution description information is used to define the candidate resolutions of the immersive media content, and the resolution selection strategy information is used to define the resolution selection strategy of the immersive media content.
  • the resolution selection strategy information includes: a zoom strategy type field, which is used to indicate the type of resolution selection strategy adopted by the immersive media content.
  • the value of the zoom strategy type field is a first value, which indicates that the resolution selection strategy is to prioritize viewing quality where the device capability allows; the value of the zoom strategy type field is a second value, which indicates that the resolution selection strategy is to prioritize viewing quality within the user's bandwidth limit where the device capability allows.
  • the resolution selection strategy information further includes: a zoom strategy description field, used to provide a text description of the resolution selection strategy; and a zoom strategy description length field, used to indicate the length of the text description in the zoom strategy description field.
  • the resolution description information includes: a quantity indication field, which is used to indicate the number of zoom areas included in the immersive media content; and a zoom ratio indication field, which is used to indicate the zoom ratio of the zoom area; where different zoom areas correspond to different candidate resolutions.
  • the value of the zoom ratio indication field is 0, which means that the zoom area is not scaled relative to the original area; the value of the zoom ratio indication field is 1, which means that the width and height of the zoom area are each 1/2 of those of the original area; the value of the zoom ratio indication field is 2, which means that the width and height of the zoom area are each 1/4 of those of the original area; the value of the zoom ratio indication field is 3, which means that the width and height of the zoom area are each 1/6 of those of the original area; the value of the zoom ratio indication field is 4, which means that the width and height of the zoom area are each 1/8 of those of the original area.
  • the resolution description information further includes: a zoom algorithm type field, which is used to indicate the zoom algorithm type of the zoom area; a zoom symbol type field, which is used to indicate the boundary symbol type of the zoom area; a zoom area type field, which is used to indicate the type of the zoom area; and a zoom area description field, which is used to provide a text description of the zoom area.
  • the capability information includes at least one of the following: device capability information, used to indicate the maximum resolution supported by the device where the client is located; user authority information, used to indicate the user corresponding to the client The maximum resolution supported by the authority; user bandwidth information, used to indicate the upper limit of the user bandwidth corresponding to the client.
  • the technical solution provided by the embodiments of the present application selects the target resolution from the candidate resolutions of the immersive media content based on the capability information of the client and the resolution selection strategy of the immersive media content, and sends the immersive media file of the target resolution to the client.
  • this provides a technical solution for adaptively selecting the resolution of the immersive media content according to the client's capabilities: the maximum resolution among the candidate resolutions that meet the requirements of the client's capability information can be selected and provided to the client, improving the utilization of bandwidth resources while ensuring user experience.
  • FIG. 6B shows a block diagram of an immersive media providing apparatus according to an embodiment of the present application.
  • the device has the function of realizing the above example of the immersive media providing method, and the function can be realized by hardware, or by hardware executing corresponding software.
  • the device can be the server described above, or it can be set on the server.
  • the device 600 may include: an adding module 640, a resolution selecting module 650, and a file sending module 660.
  • the adding module 640 is configured to add resolution description information and resolution selection strategy information to the file format information of the immersive media content.
  • the resolution selection module 650 is configured to determine the target resolution provided to the client according to the resolution description information of the immersive media content and the resolution selection strategy information;
  • the file sending module 660 is configured to send the immersive media file of the target resolution to the client.
  • FIG. 7A shows a block diagram of an immersive media acquisition apparatus according to an embodiment of the present application.
  • the device has the function of realizing the above example of the immersive media acquisition method, and the function can be realized by hardware, or by hardware executing corresponding software.
  • the device can be the terminal described above, or it can be set on the terminal.
  • the apparatus 700 may include: an information acquiring module 710, an information sending module 720, and a file receiving module 730.
  • the information acquiring module 710 is configured to acquire capability information of the client, where the capability information is used to indicate the immersive media playback capability of the device where the client is located.
  • the information sending module 720 is configured to send the capability information to the server.
  • the file receiving module 730 is configured to receive an immersive media file of a target resolution from the server, where the target resolution is determined from the candidate resolutions of the immersive media content according to the resolution selection strategy of the immersive media content and the capability information.
  • the capability information includes at least one of the following: device capability information, used to indicate the maximum resolution supported by the device where the client is located; user authority information, used to indicate the user corresponding to the client The maximum resolution supported by the authority; user bandwidth information, used to indicate the upper limit of the user bandwidth corresponding to the client.
  • FIG. 7B shows a block diagram of an immersive media acquisition apparatus according to an embodiment of the present application.
  • the device has the function of realizing the above example of the immersive media acquisition method, and the function can be realized by hardware, or by hardware executing corresponding software.
  • the device can be the terminal described above, or it can be set on the terminal.
  • the apparatus 700 may include: a file receiving module 740 and a presentation module 750.
  • the file receiving module 740 is configured to receive an immersive media file of the target resolution from the server, where the file format information of the immersive media content of the immersive media file includes resolution description information and resolution selection strategy information, the resolution description information is used to define the candidate resolutions of the immersive media content, and the resolution selection strategy information is used to define the resolution selection strategy of the immersive media content;
  • the presentation module 750 is configured to present the immersive media file according to the file format information.
  • the technical solution provided by the embodiments of the present application selects the target resolution from the candidate resolutions of the immersive media content based on the capability information of the client and the resolution selection strategy of the immersive media content, and sends the immersive media file of the target resolution to the client.
  • this provides a technical solution for adaptively selecting the resolution of the immersive media content according to the client's capabilities: the maximum resolution among the candidate resolutions that meet the requirements of the client's capability information can be selected and provided to the client, improving the utilization of bandwidth resources while ensuring user experience.
  • FIG. 8 shows a structural block diagram of a server provided by an embodiment of the present application.
  • the server can be used to execute the immersive media providing method provided in the foregoing embodiment. Specifically:
  • the server 800 includes a central processing unit (CPU) 801, a system memory 804 including a random access memory (RAM) 802 and a read only memory (ROM) 803, and a system bus 805 connecting the system memory 804 and the central processing unit 801.
  • the server 800 also includes a basic input/output (I/O) system 806 that helps to transfer information between various devices in the computer, and a mass storage device 807 for storing the operating system 813, application programs 814, and other program modules 812.
  • the basic input/output system 806 includes a display 808 for displaying information and an input device 809 such as a mouse and a keyboard for the user to input information.
  • the display 808 and the input device 809 are both connected to the central processing unit 801 through the input and output controller 810 connected to the system bus 805.
  • the basic input/output system 806 may also include an input and output controller 810 for receiving and processing input from multiple other devices such as a keyboard, a mouse, or an electronic stylus.
  • the input and output controller 810 also provides output to a display screen, a printer, or other types of output devices.
  • the mass storage device 807 is connected to the central processing unit 801 through a mass storage controller (not shown) connected to the system bus 805.
  • the mass storage device 807 and its associated computer-readable medium provide non-volatile storage for the server 800. That is, the mass storage device 807 may include a computer-readable medium (not shown) such as a hard disk or a CD-ROM (Compact Disc Read-Only Memory) drive.
  • Computer-readable media may include computer storage media and communication media.
  • Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented by any method or technology for storing information such as computer readable instructions, data structures, program modules, or other data.
  • Computer storage media include RAM, ROM, Erasable Programmable Read Only Memory (EPROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other solid-state memory technology, CD-ROM, Digital Video Disc (DVD) or other optical storage, tape cartridges, magnetic tape, magnetic disk storage or other magnetic storage devices.
  • the server 800 may also run by connecting, through a network such as the Internet, to a remote computer on the network. That is, the server 800 can be connected to the network 812 through the network interface unit 811 connected to the system bus 805; in other words, the network interface unit 811 can also be used to connect to other types of networks or remote computer systems (not shown).
  • the memory also includes a computer program, which is stored in the memory and configured to be executed by one or more processors to implement the above-mentioned immersive media providing method.
  • FIG. 9 shows a structural block diagram of a terminal 900 according to an embodiment of the present application.
  • the terminal 900 may be an electronic device such as a mobile phone, a tablet computer, a multimedia playback device, a TV, a projector, a display, a wearable device, a PC, and the like.
  • the terminal can be used to implement the immersive media acquisition method provided in the foregoing embodiment. Specifically:
  • the terminal 900 includes a processor 901 and a memory 902.
  • the processor 901 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so on.
  • the processor 901 can be implemented in at least one hardware form of Digital Signal Processing (DSP), Field Programmable Gate Array (FPGA), and Programmable Logic Array (PLA).
  • the processor 901 may also include a main processor and a co-processor.
  • the main processor is a processor used to process data in the awake state, also called a central processing unit (CPU); the co-processor is a low-power processor used to process data in the standby state.
  • the processor 901 may be integrated with a graphics processing unit (GPU), and the GPU is used for rendering and drawing content that needs to be displayed on the display screen.
  • the processor 901 may further include an artificial intelligence (AI) processor, and the AI processor is used to process computing operations related to machine learning.
  • the memory 902 may include one or more computer-readable storage media, which may be non-volatile.
  • the memory 902 may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices and flash memory storage devices.
  • the non-volatile computer-readable storage medium in the memory 902 is used to store at least one instruction, at least one program, a code set, or an instruction set, which is configured to be executed by one or more processors to implement the above-mentioned immersive media acquisition method.
  • the terminal 900 may further include: a peripheral device interface 903 and at least one peripheral device.
  • the processor 901, the memory 902, and the peripheral device interface 903 may be connected by a bus or a signal line.
  • Each peripheral device can be connected to the peripheral device interface 903 through a bus, a signal line, or a circuit board.
  • the peripheral device includes: at least one of a radio frequency circuit 904, a touch display screen 905, a camera 906, an audio circuit 907, a positioning component 908, and a power supply 909.
  • the structure shown in FIG. 9 does not constitute a limitation on the terminal 900, which may include more or fewer components than shown in the figure, combine certain components, or adopt a different component arrangement.
  • a computer device is also provided.
  • the computer device includes a processor and a memory, and processor-executable instructions are stored in the memory. When the instructions are executed by one or more processors, the foregoing immersive media providing method or immersive media acquisition method is implemented.
  • the computer equipment may include the server shown in FIG. 8 and the terminal shown in FIG. 9.
  • a computer-readable storage medium is also provided, and processor-executable instructions are stored in the storage medium. When the instructions are executed by one or more processors, the above-mentioned immersive media providing method is implemented.
  • the one or more processors may be located in the server.
  • a computer-readable storage medium stores processor-executable instructions, and when the instructions are executed by one or more processors, the above-mentioned immersive media acquisition method is implemented.
  • the one or more processors may be located in the terminal.
  • the computer-readable storage medium may also include: Read Only Memory (ROM), Random Access Memory (RAM), Solid State Drives (SSD), or optical disks.
  • the random access memory may include Resistance Random Access Memory (ReRAM) and Dynamic Random Access Memory (DRAM).
  • a computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
  • the processor of the server reads the computer instruction from the computer-readable storage medium, and the processor executes the computer instruction, so that the server executes the above-mentioned immersive media providing method.
  • the processor of the terminal reads the computer instruction from the computer-readable storage medium, and the processor executes the computer instruction, so that the terminal executes the above-mentioned immersive media acquisition method.
  • the "plurality" mentioned herein refers to two or more.
  • "And/or" describes the association relationship of the associated objects, indicating that there can be three types of relationships. For example, A and/or B can mean: A exists alone, A and B exist at the same time, or B exists alone.
  • the character "/" generally indicates that the associated objects before and after it are in an "or" relationship.
  • the numbering of the steps described herein only shows, by way of example, one possible order of execution among the steps. In some other embodiments, the steps may also be executed out of numerical order: for example, two steps with different numbers may be executed at the same time, or in the reverse of the order shown in the figure. This is not limited in the embodiments of the present application.
  • the above are only exemplary embodiments of this application and are not intended to limit this application. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of this application shall fall within the protection scope of this application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Databases & Information Systems (AREA)
  • Computer Graphics (AREA)
  • Library & Information Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Information Transfer Between Computers (AREA)
  • Signal Processing For Digital Recording And Reproducing (AREA)
  • Container Filling Or Packaging Operations (AREA)
  • Transition And Organic Metals Composition Catalysts For Addition Polymerization (AREA)

Abstract

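Embodiments of this application provide an immersive media providing method, an immersive media acquisition method, an apparatus, a device, and a storage medium, and relate to the field of audio and video technologies. The method is performed by a server and includes: adding resolution description information and resolution selection strategy information to file format information of immersive media content; determining, according to the resolution description information and the resolution selection strategy information of the immersive media content, a target resolution to be provided to a client; and sending an immersive media file of the target resolution to the client.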

Description

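Immersive Media Providing Method, Acquisition Method, Apparatus, Device, and Storage Medium
This application claims priority to Chinese Patent Application No. 202010211178.6, entitled "Immersive media providing method, acquisition method, apparatus, device, and storage medium" and filed with the China Patent Office on March 24, 2020, which is incorporated herein by reference in its entirety.
Technical Field
The embodiments of this application relate to the field of audio and video technologies, and in particular to an immersive media providing method, an acquisition method, an apparatus, a device, and a storage medium.
Background
Immersive media aims to give users an audiovisual experience of being present in the scene by means of audio and video technologies.
In immersive media transmission solutions, the industry already supports preparing file tracks of multiple different resolutions on the server side, but gives no rule for selecting among them. One approach is to randomly select a file track of some resolution and deliver it to the client; another is to deliver the file tracks of all resolutions to the client.
However, neither approach can balance user experience against the utilization of bandwidth resources.
Summary
The embodiments of this application provide an immersive media providing method, an acquisition method, an apparatus, a device, and a storage medium that can adaptively select the resolution of immersive media content according to the capability of the client, thereby improving the utilization of bandwidth resources while ensuring user experience. The technical solutions are as follows: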
一方面,本申请实施例提供了一种沉浸式媒体提供方法,由服务器执行,所述方法包括:
在沉浸式媒体内容的文件格式信息中添加分辨率描述信息和分辨率选择策略信息;
根据所述沉浸式媒体内容的所述分辨率描述信息以及所述分辨率选择策略信 息,确定提供给所述客户端的目标分辨率;
向所述客户端发送所述目标分辨率的沉浸式媒体文件。
另一方面,本申请实施例提供了一种沉浸式媒体获取方法,由终端执行,所述方法包括:
接收来自服务器的目标分辨率的沉浸式媒体文件,所述沉浸式媒体文件的沉浸式媒体内容的文件格式信息包括分辨率描述信息和分辨率选择策略信息,其中,所述分辨率描述信息用于定义所述沉浸式媒体内容的候选分辨率,所述分辨率选择策略信息用于定义所述沉浸式媒体内容的分辨率选择策略;
根据所述文件格式信息呈现所述沉浸式媒体文件。
另一方面,本申请实施例提供了一种沉浸式媒体提供装置,所述装置包括:
添加模块,用于在沉浸式媒体内容的文件格式信息中添加分辨率描述信息和分辨率选择策略信息;
分辨率选择模块,用于根据所述沉浸式媒体内容的所述分辨率描述信息以及所述分辨率选择策略信息,确定提供给客户端的目标分辨率;
文件发送模块,用于向所述客户端发送所述目标分辨率的沉浸式媒体文件。
另一方面,本申请实施例提供了一种沉浸式媒体获取装置,所述装置包括:
文件接收模块,用于接收来自服务器的目标分辨率的沉浸式媒体文件,所述沉浸式媒体文件的沉浸式媒体内容的文件格式信息包括分辨率描述信息和分辨率选择策略信息,其中,所述分辨率描述信息用于定义所述沉浸式媒体内容的候选分辨率,所述分辨率选择策略信息用于定义所述沉浸式媒体内容的分辨率选择策略;
呈现模块,用于根据所述文件格式信息呈现所述沉浸式媒体文件。
再一方面,本申请实施例提供了一种计算机设备,所述计算机设备包括处理器和存储器,所述存储器中存储有处理器可执行指令,所述指令由一个或一个以上处理器执行时,实现上述沉浸式媒体提供方法或沉浸式媒体获取方法。
所述计算机设备为服务器或终端。
还一方面,本申请实施例提供了一种计算机可读存储介质,所述计算机可读存储介质中存储有处理器可执行指令,所述指令由一个或一个以上处理器执行时,实现上述沉浸式媒体提供方法。
还一方面,本申请实施例提供了一种计算机可读存储介质,所述计算机可读存 储介质中存储有处理器可执行指令,所述指令由一个或一个以上处理器执行时,实现上述沉浸式媒体获取方法。
又一方面,本申请实施例提供了一种计算机程序产品,所述计算机程序产品被处理器执行时,用于实现上述沉浸式媒体提供方法。
又一方面,本申请实施例提供了一种计算机程序产品,所述计算机程序产品被处理器执行时,用于实现上述沉浸式媒体获取方法。
应当理解的是,以上的一般描述和后文的细节描述仅是示例性和解释性的,并不能限制本申请。
附图简要说明
为了更清楚地说明本申请实施例中的技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1是本申请一个实施例提供的实施环境的示意图;
图2是本申请一个实施例提供的沉浸式媒体播放系统的端到端处理流程的示意图;
图3是本申请一个实施例提供的沉浸式媒体播放系统的系统处理架构的示意图;
图4是本申请一个实施例提供的基于沉浸式媒体应用的客户端参考模型的示意图;
图5A是本申请一个实施例提供的沉浸式媒体提供方法的流程图;
图5B是本申请一个实施例提供的沉浸式媒体提供方法的流程图;
图5C是本申请一个实施例提供的沉浸式媒体提供方法的流程图;
图6A是本申请一个实施例提供的沉浸式媒体提供装置的框图;
图6B是本申请一个实施例提供的沉浸式媒体提供装置的框图;
图7A是本申请一个实施例提供的沉浸式媒体获取装置的框图;
图7B是本申请一个实施例提供的沉浸式媒体获取装置的框图;
图8是本申请一个实施例提供的服务器的结构框图;
图9是本申请一个实施例提供的终端的结构框图。
实施方式
这里将详细地对示例性实施例进行说明,其示例表示在附图中。下面的描述涉及附图时,除非另有表示,不同附图中的相同数字表示相同或相似的要素。以下示例性实施例中所描述的实施方式并不代表与本申请相一致的所有实施方式。相反,它们仅是与如所附权利要求书中所详述的、本申请的一些方面相一致的方法的例子。
本申请实施例提供的技术方案,可应用于任何沉浸式媒体播放场景中,如沉浸式媒体点播或直播场景。
请参考图1,其示出了本申请一个实施例提供的实施环境的示意图。该实施环境可以实现成为沉浸式媒体播放系统。如图1所示,该实施环境可以包括:终端10和服务器20。
终端10可以是诸如手机、平板电脑、多媒体播放设备、电视机、放映机、显示器、可穿戴设备、个人计算机(Personal Computer,PC)等电子设备。终端10中可以安装运行具有沉浸式媒体播放功能的客户端。例如,该客户端可以与服务器20进行交互,从服务器20请求获取沉浸式媒体内容,并对该获取到的沉浸式媒体内容进行播放。
服务器20用于提供沉浸式媒体内容。服务器20可以是一台服务器,也可以是由多台服务器组成的服务器集群,或者是一个云计算服务中心。
终端10和服务器20之间可通过网络30进行互相通信。该网络30可以是有线网络,也可以是无线网络。
终端10和服务器20之间还可以包括一个或多个中间节点,如内容分发网络(Content Delivery Network,CDN)或其它中继设备或路由设备,本申请实施例对此不作限定。
如图2所示,其示出了沉浸式媒体播放系统的端到端处理流程的示意图。该处理流程可以包括:内容获取与制作21、沉浸媒体编码/文件封装22、沉浸媒体传输23、沉浸媒体解码/文件解封装24、沉浸媒体渲染25等主要技术环节。其中,内容获取与制作21、沉浸媒体编码/文件封装22、沉浸媒体传输23等技术环节可以由服务器执行,沉浸媒体解码/文件解封装24、沉浸媒体渲染25等技术环节可以由终端 (如客户端)执行。
如图3所示,其示出了沉浸式媒体播放系统的系统处理架构的示意图,包括从服务器31到终端32(客户端)的沉浸式媒体内容的处理及表述、文件格式和传输信令。
现实世界的声音-视觉场景通过音频传感器、摄像设备(如普通摄像头、立体摄像头、光场摄像头)以及传感设备(如包括激光雷达)采集,转化为一系列的数据信号后制作成虚拟现实内容呈现给用户观看。摄像设备部署在特定的位置获取一定空间内视频/图像内容,音频可以通过不同的麦克风配置来获取,视频/图像和音频在时间和空间内保持同步。对于视频/图像内容制作,可分为3DoF(Degree of Freedom,自由度)及3DoF+视频制作和6DoF视频制作。其中,DoF是指用户在观看沉浸式媒体时支持的运动并产生内容交互的自由度。
3DoF视频制作,由一组摄像机或一个带有多个摄像头和传感器的摄像设备录制而成。摄像头通常可以获取在设备中心周围所有方向的内容。
3DoF+视频制作,结合3DoF视频与深度信息制作而成。
6DoF视频制作,主要由相机阵列拍摄得到的点云、光场等形式的内容制作而成。6DoF媒体需要在编码前进行特定处理,例如点云媒体在编码前需要切割、映射等过程。
采集的音频/视频被编码成相应音视频码流,当使用点云数据或光场信息表示采集视频,需要采用其对应的编码方式(如点云编码)。然后,按一定格式(如ISO基媒体文件格式(ISO Base Media File Format,ISOBMFF)或者其它国际标准体系)将编码的媒体封装在文件容器中并结合媒体的描述信息/结合描述媒体内容属性的元数据和视窗元数据,根据一个特定的媒体文件格式组成一个媒体文件或者组成一个初始化片段和媒体片段。
在服务器31中,存储了媒体呈现描述/信令信息和媒体文件资源。媒体呈现描述/信令信息给客户端提供了足够的通知信息,使得对应的媒体内容在一种传输机制下被交付到播放器并进行消费。客户端可以根据终端状态,例如头部/眼部/位置追踪、网络吞吐量等,通过质量/视点自适应动态请求媒体文件资源。
媒体文件通过传输机制,比如,动态自适应流媒体传输(Dynamic Adaptive Streaming over HTTP,DASH)、智能媒体传输(Smart Media Transport,SMT)传输 给用户终端32。用户终端32接收到媒体文件后,对文件进行解封装、解码、拼接/合成、渲染等一系列处理后可显示虚拟现实内容。
如图4所示,其示出了基于沉浸式媒体应用的客户端参考模型的示意图,其定义了客户端的各功能组件。
用户终端通过远端服务器推荐或用户自己需求的方式对媒体文件选择,从远端服务器下载或接收远端服务器推送的媒体文件,经过并由解析器41、解码器42、转换器43、渲染器44等一系列组件进行处理后,实现虚拟现实媒体内容的显示。同时,用户终端可以依据用户需求进行远程渲染。
解析器41:解析器41提供对媒体文件或分片的处理,提取基本流以及解析元数据,解析出的元数据用于渲染。解析器41可依据用户动作进行动态的信息处理(如用户头动、位置的跟踪信息),如动态选择下载的媒体分片。
解码器42:解码器42用于解码解析器41提供的媒体流,并将解码流输出到转换器43。
转换器43:转换器43根据解析器41提供的元数据,将解码后的媒体转换为球形/3D(3 Dimensional,三维)视频。例如3DoF时将平面图像映射为球形,在基于映射、投影的6DoF处理时将2D(2 Dimensional,二维)信息流重建成3D数据。如果有必要,可使用解析器41解析转换的元数据。
渲染器44:渲染器44使用解码的信令、渲染元数据、以及视窗的信息(或者考虑其他的可能的信息)对视频/音频进行渲染。3DoF和3DoF+主要基于当前视点、视差、深度信息等对球形媒体内容进行渲染,6DoF对当前视点对视窗内的3D媒体内容进行渲染。
传感装置45:传感装置45依据用户的移动获取当前视窗的方向以及用户的位置信息,并反馈给用户终端解析器41。用户终端可依据视窗、视窗的方向以及用户的位置信息选择下载适当的媒体,或者解析器41依据视窗、用户位置信息选择适当的媒体文件。
远程渲染平台46:远程渲染平台46部署在远端服务器,依据用户终端反馈的视窗、视窗的方向以及用户的位置信息或者媒体文件中的渲染元数据进行渲染,用户终端依据远程渲染平台的渲染媒体直接显示。
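In addition, in the embodiments of this application, the resolution of immersive media is treated as equivalent to its subjective quality, its objective quality, and its definition.
The resolution of immersive media may be referred to by names such as 8K, 4K, 2K, 1080p, and 720p. Typical resolution values represented by these names (that is, the number of pixels in the horizontal x vertical directions) are shown by way of example in Table-1 below:
Table-1: Resolution introduction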
Figure PCTCN2021077360-appb-000001
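The technical solutions of this application are described in detail below through several embodiments.
Refer to FIG. 5A, which shows a flowchart of an immersive media providing method according to an embodiment of this application. The method is applicable to the implementation environment shown in FIG. 1 and may include the following steps (501 to 504):
Step 501: The client obtains its own capability information, which indicates the immersive media playback capability of the device on which the client runs.
The capability information may include at least one of the following: device capability information, user permission information, and user bandwidth information. The device capability information reflects the processing capability of the device on which the client runs, such as its capability to render immersive media content; it may indicate the maximum resolution supported by the device, thereby informing the server of the maximum resolution that the device can render and play. The user permission information reflects the user permission corresponding to the client, such as the level and/or permission information of the user account logged in to the client; it may indicate the maximum resolution supported by that user permission, thereby informing the server of the maximum resolution that the user of the client is entitled to watch. The user bandwidth information reflects the bandwidth capability of the client, for example by indicating the upper limit of the user bandwidth corresponding to the client.
The capability information described above is only exemplary and explanatory. In some other embodiments, the capability information may include other information, which is not limited in the embodiments of this application. For example, the capability information may further include user network information that informs the server of the type of network used by the client, such as a cellular network or a Wireless Fidelity (WiFi) network.
Step 502: The client sends the capability information to the server.
The client sends the capability information to the server over the network connection between them. Correspondingly, the server receives the capability information from the client.
In addition, the capability information may be carried in a request message (such as an immersive media playback request, which requests playback of immersive media content) or may be sent separately, which is not limited in the embodiments of this application.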
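Steps 501 and 502 can be illustrated with a minimal Python sketch of a playback request that carries capability information. The source only lists the kinds of information involved; the field names, the JSON encoding, and the `build_play_request` helper are all assumptions made for illustration, not a format defined by this application.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class CapabilityInfo:
    # Hypothetical field names; every item is optional because the
    # capability information includes "at least one" of these kinds.
    device_max_resolution: Optional[str] = None      # e.g. "4K"
    permission_max_resolution: Optional[str] = None  # e.g. "2K"
    bandwidth_limit_mbps: Optional[float] = None     # user bandwidth upper limit
    network_type: Optional[str] = None               # e.g. "cellular" or "wifi"


def build_play_request(content_id: str, caps: CapabilityInfo) -> str:
    """Serialize a playback request that carries only the items that are set."""
    capability = {k: v for k, v in asdict(caps).items() if v is not None}
    return json.dumps({"content_id": content_id, "capability": capability},
                      sort_keys=True)
```

Carrying the capability information inside the request saves a round trip; sending it separately, which the text also allows, would only change where this payload travels.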
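Step 503: The server determines, from the candidate resolutions of the immersive media content and according to the resolution selection strategy of the immersive media content and the capability information, the target resolution to be provided to the client.
The server stores the immersive media content and the resolution selection strategy of that content. The immersive media content includes at least one candidate resolution. When the content includes multiple candidate resolutions, the server combines the resolution selection strategy with the capability information sent by the client to determine the target resolution from among them. The target resolution may be one of the multiple candidate resolutions.
The resolution selection strategy of the immersive media content may be preset and stored on the server. The strategy may be to filter, from the multiple candidate resolutions, those that satisfy the requirements of the capability information, and then to select the largest of the remaining candidates as the target resolution.
For example, the candidate resolutions of the immersive media content, from largest to smallest, are 8K, 4K, 2K, 1080p, and 720p. Assume that the capability information of the client states that the maximum resolution the device can render is 4K and that the user permission corresponding to the client is an ordinary permission whose maximum supported resolution is 2K. The server then selects 2K as the target resolution.
As another example, the candidate resolutions of the immersive media content, from largest to smallest, are 8K, 4K, 2K, 1080p, and 720p. Assume that the capability information of the client states that the maximum resolution the device can render is 4K, that the user permission corresponding to the client is a premium permission whose maximum supported resolution is 8K, and that the user bandwidth upper limit of the client is 10 Mbps, which supports a maximum resolution of 4K. The server then selects 4K as the target resolution.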
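The filter-then-maximize strategy of step 503 can be sketched as follows. This is a minimal Python illustration, assuming resolutions are compared by name in the order given in the examples above; the `select_target_resolution` function and its parameters are hypothetical names, not part of the source.

```python
from typing import Optional, Sequence

# Resolution names ordered from smallest to largest, as in the examples.
ORDER = ["720p", "1080p", "2K", "4K", "8K"]


def select_target_resolution(
    candidates: Sequence[str],
    device_max: Optional[str] = None,
    permission_max: Optional[str] = None,
    bandwidth_max: Optional[str] = None,
) -> Optional[str]:
    """Keep candidates within every stated limit, then return the largest."""
    rank = ORDER.index
    limits = [rank(x) for x in (device_max, permission_max, bandwidth_max)
              if x is not None]
    ceiling = min(limits) if limits else len(ORDER) - 1
    allowed = [c for c in candidates if rank(c) <= ceiling]
    return max(allowed, key=rank) if allowed else None
```

With the candidates 8K, 4K, 2K, 1080p, and 720p, a device limit of 4K and a permission limit of 2K yield 2K; a device limit of 4K, a permission limit of 8K, and a bandwidth-derived limit of 4K yield 4K, matching the two examples.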
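Step 504: The server sends the immersive media file of the target resolution to the client.
The immersive media content may include file tracks of the multiple candidate resolutions. After determining the target resolution, the server encapsulates the file track of the target resolution into an immersive media file and delivers it to the client over the network connection between them. Correspondingly, the client receives the immersive media file of the target resolution from the server.
As shown in FIG. 5A, this embodiment of the application further includes the following step 505:
Step 505: The client plays the immersive media file of the target resolution.
After receiving the immersive media file of the target resolution, the client can play the immersive media file.
Refer to FIG. 5B, which shows a flowchart of an immersive media providing method according to an embodiment of this application. The method is applicable to the implementation environment shown in FIG. 1, and in particular to the server 20 in FIG. 1. The method may include the following steps (511 to 513):
Step 511: Add resolution description information and resolution selection strategy information to the file format information of the immersive media content. The resolution description information and the resolution selection strategy information are the same as in the other embodiments of this application and are not described again here.
Step 512: Determine, according to the resolution description information and the resolution selection strategy information of the immersive media content, the target resolution to be provided to the client.
Step 513: Send the immersive media file of the target resolution to the client.
Refer to FIG. 5C, which shows a flowchart of an immersive media providing method according to an embodiment of this application. The method is applicable to the implementation environment shown in FIG. 1, and in particular to the terminal 10 in FIG. 1. The method includes the following steps (521 to 522):
Step 521: Receive an immersive media file of a target resolution from the server, where the file format information of the immersive media content of the immersive media file includes resolution description information and resolution selection strategy information, the resolution description information defines the candidate resolutions of the immersive media content, and the resolution selection strategy information defines the resolution selection strategy of the immersive media content.
Step 522: Present the immersive media file according to the file format information.
In summary, in the technical solutions provided by the embodiments of this application, an immersive media file of a target resolution is selected from the candidate resolutions of the immersive media content according to the capability information of the client and the resolution selection strategy of the content, and is sent to the client. This provides a way to adaptively select the resolution of immersive media content according to client capability: the largest resolution among the candidates that satisfy the client's capability information is provided to the client, which improves the utilization of bandwidth resources while ensuring user experience.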
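To implement the functions described in the embodiments of FIG. 5A to FIG. 5C, the resolution selection strategy and the candidate resolutions of the immersive media content need to be defined. In an exemplary embodiment, the file format information of the immersive media content includes resolution selection strategy information and resolution description information. The resolution selection strategy information defines the resolution selection strategy of the immersive media content, and the resolution description information defines its candidate resolutions. Different resolution selection strategies and/or different candidate resolutions can be defined in the file format information of different immersive media content, which makes adaptive resolution selection more flexible across different content.
In an exemplary embodiment, the resolution selection strategy information includes a zoom strategy type field, which indicates the type (or identifier) of the resolution selection strategy adopted by the immersive media content. Different values of the zoom strategy type field indicate different resolution selection strategies. The values corresponding to the various strategies may be predefined or preconfigured, which is not limited in the embodiments of this application.
For example, when the zoom strategy type field has a first value, the resolution selection strategy is viewing quality first, as far as the device capability allows; when the field has a second value, the resolution selection strategy is viewing quality first within the user bandwidth limit, as far as the device capability allows. Exemplarily, the first value is 0 and the second value is 1. These strategies are only exemplary and explanatory; the embodiments of this application do not limit the number, content, or corresponding values of the resolution selection strategies, which can be set flexibly according to actual conditions.
The resolution selection strategy information may further include a zoom strategy description field, which provides a text description of the resolution selection strategy. The strategy indicated by the zoom strategy type field may require some descriptive information, such as the user bandwidth limit, and this information can be stated in the zoom strategy description field. The resolution selection strategy information may further include a zoom strategy description length field, which indicates the length of the text description in the zoom strategy description field.
In an exemplary embodiment, the resolution description information includes a quantity indication field and a zoom ratio indication field. The quantity indication field indicates the number of zoom regions included in the immersive media content, and the zoom ratio indication field indicates the zoom ratio of a zoom region. Different zoom regions correspond to different candidate resolutions. A spherical region of the same omnidirectional immersive media content (such as an omnidirectional video), or a 2D region on its projected image, may have one or more zoom regions, and the video data of different zoom regions has different resolutions or qualities. The zoom ratio is the ratio of the zoom region relative to the original region (that is, the spherical region or 2D region mentioned above).
Different values of the zoom ratio indication field indicate different zoom ratios. The values corresponding to the various zoom ratios may be predefined or preconfigured, which is not limited in the embodiments of this application. For example, a value of 0 indicates that the zoom region is not scaled relative to the original region; a value of 1 indicates that the width and height of the zoom region are each 1/2 of those of the original region; a value of 2 indicates 1/4; a value of 3 indicates 1/6; and a value of 4 indicates 1/8. These zoom ratios are only exemplary and explanatory; the embodiments of this application do not limit the number of zoom ratios, their values, or the corresponding field values, which can be set flexibly according to actual conditions.
The resolution description information further includes at least one of the following fields: a zoom algorithm type field, a zoom symbolization type field, a zoom area type field, and a zoom region description field. The zoom algorithm type field indicates the type of zoom algorithm of the zoom region, the zoom symbolization type field indicates the boundary symbol type of the zoom region, the zoom area type field indicates the type of the zoom region, and the zoom region description field provides a text description of the zoom region.
The following uses an extension of an ISOBMFF box as an example to describe how the resolution selection strategy and the candidate resolutions of immersive media content may be defined. The file format information of the immersive media content may include the following zooming region structure: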
Figure PCTCN2021077360-appb-000002
Figure PCTCN2021077360-appb-000003
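The semantics of the fields in the zooming region structure RegionWiseZoomingStruct are as follows:
num_regions: the quantity indication field described above, which indicates the number of zoom regions included in the immersive media content. For example, this field may indicate the number of zoom regions corresponding to a spherical region of the same omnidirectional video or to a 2D region on its projected image. There may be one or more zoom regions, and the video data of different zoom regions has different resolutions or qualities.
zoom_strategy_type: the zoom strategy type field described above, which indicates the resolution selection strategy adopted by the immersive media content. For example, this field may indicate the type of strategy for selecting among zoom regions of different resolutions or qualities; an example is shown in Table-2 below:
Table-2: Zoom strategy type field
Value    Description
0        The resolution selection strategy is viewing quality first, as far as the device capability allows
1        The resolution selection strategy is viewing quality first within the user bandwidth limit, as far as the device capability allows
2~255    Undefined
zoom_strategy_description_length: the zoom strategy description length field described above, which indicates the length of the text description in the zoom strategy description field. For example, this field may indicate the length of the zoom strategy description in bytes.
zoom_strategy_description: the zoom strategy description field described above, which provides a text description of the resolution selection strategy. For example, this field may be a null-terminated UTF-8 string that provides a text description of the zoom strategy (that is, the resolution selection strategy).
zoom_reg_width[i], zoom_reg_height[i], zoom_reg_top[i], zoom_reg_left[i]: respectively define the width, height, vertical offset, and horizontal offset of the i-th zoom region, where i is a positive integer.
zoom_ratio: the zoom ratio indication field described above, which indicates the zoom ratio of a zoom region. The values this field may take indicate the different zoom ratios supported by the system. Exemplarily, the correspondence between the value of this field and the zoom ratio may be as shown in Table-3 below:
Table-3: Zoom ratio indication field
Value    Description
0        The zoom region is not scaled relative to the original region
1        The width and height of the zoom region are each 1/2 of those of the original region
2        The width and height of the zoom region are each 1/4 of those of the original region
3        The width and height of the zoom region are each 1/6 of those of the original region
4        The width and height of the zoom region are each 1/8 of those of the original region
5~255    Undefined
To make the values of the zoom ratio indication field in practical applications easier to understand, examples are given as follows:
Assume that the original video track A has 8K resolution, that is, 7680 x 4320. The zoom_ratio of the original video track A is then 0.
Assume that video tracks B and C with 4K (3840 x 2160) and 1080p (1920 x 1080) resolution are obtained by video downsampling. From the relationship among 7680 x 4320, 3840 x 2160, and 1920 x 1080, the width and height of the zoom region of video track B are each 1/2 of those of the original region, and the width and height of the zoom region of video track C are each 1/4 of those of the original region. Therefore, the zoom_ratio of video track B is 1 and the zoom_ratio of video track C is 2.
Assume instead that video tracks B and C with cinema 4K (assumed to be 4096 x 2160) and 2K (assumed to be 2048 x 1080) resolution are obtained by video downsampling. In this case, the width and height of the zoom regions of B and C are not strictly equal to 1/2 and 1/4 of those of the original region. To avoid enumerating every possible ratio, however, video tracks B and C are regarded as approximately 1/2 and 1/4 of the original video in width and height. The zoom_ratio of video track B is therefore still 1, and the zoom_ratio of video track C is still 2.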
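The worked examples above can be reproduced with a short Python sketch that maps a track's dimensions to the closest zoom_ratio code in Table-3. The source only says that cinema 4K and 2K are "approximately" 1/2 and 1/4; the nearest-match rule on the averaged width and height scale, and the `nearest_zoom_ratio` name, are assumptions made here for illustration.

```python
# zoom_ratio code -> per-dimension scale relative to the original region (Table-3).
ZOOM_RATIO_SCALE = {0: 1.0, 1: 1 / 2, 2: 1 / 4, 3: 1 / 6, 4: 1 / 8}


def nearest_zoom_ratio(orig_w: int, orig_h: int, w: int, h: int) -> int:
    """Return the zoom_ratio code whose scale is closest to the track's scale.

    The track's scale is taken as the mean of its width and height ratios,
    which is an assumed rule for non-exact cases such as 4096 x 2160.
    """
    scale = (w / orig_w + h / orig_h) / 2
    return min(ZOOM_RATIO_SCALE, key=lambda code: abs(ZOOM_RATIO_SCALE[code] - scale))
```

For a 7680 x 4320 original, this assigns 3840 x 2160 and 4096 x 2160 the code 1, and 1920 x 1080 and 2048 x 1080 the code 2, consistent with the examples.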
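zoom_algorithm_type: the zoom algorithm type field described above, which indicates the type of zoom algorithm of the zoom region.
zoom_symbolization_type: the zoom symbolization type field described above, which indicates the boundary symbol type of the zoom region.
zoom_area_type: the zoom area type field described above, which indicates the type of the zoom region. Exemplarily, the correspondence between the value of this field and the zoom region type may be as shown in Table-4 below:
Table-4: Zoom area type field
Value      Description
0          Director's-cut zoom region, that is, the video is zoomed according to the creative intent of the content provider
1          Zoom region selected according to measurements of viewing statistics
2~239      Reserved
240~255    Undefined
zoom_description: the zoom region description field described above, a null-terminated UTF-8 string that provides a text description of the zoom region.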
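As a reading aid, the fields described above can be gathered into an in-memory Python model. This is only a sketch: the box syntax itself appears in a figure that is not reproduced here, so the bit widths, the field order, and the split between per-region fields and fields shared by the whole structure are assumptions. The class names `ZoomRegion` and `RegionWiseZooming` are hypothetical; the attribute names follow the field names in the text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ZoomRegion:
    # Assumed to be per-region fields (the zoom_reg_* fields are indexed by i).
    zoom_reg_width: int
    zoom_reg_height: int
    zoom_reg_top: int
    zoom_reg_left: int
    zoom_ratio: int                  # Table-3: 0..4 defined
    zoom_algorithm_type: int = 0
    zoom_symbolization_type: int = 0
    zoom_area_type: int = 0          # Table-4: 0 or 1 defined
    zoom_description: str = ""


@dataclass
class RegionWiseZooming:
    # Assumed to be shared by all regions of the structure.
    zoom_strategy_type: int          # Table-2: 0 or 1 defined
    zoom_strategy_description: str = ""
    regions: List[ZoomRegion] = field(default_factory=list)

    @property
    def num_regions(self) -> int:
        return len(self.regions)

    def validate(self) -> None:
        """Reject values that Table-2 and Table-3 list as undefined."""
        if self.zoom_strategy_type not in (0, 1):
            raise ValueError(f"undefined zoom_strategy_type: {self.zoom_strategy_type}")
        for region in self.regions:
            if not 0 <= region.zoom_ratio <= 4:
                raise ValueError(f"undefined zoom_ratio: {region.zoom_ratio}")
```

Deriving `num_regions` from the list keeps the count and the region entries from drifting apart; a real serializer would write it as an explicit field.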
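In the embodiments of this application, fields are added to the file format information of the immersive media content to define its resolution selection strategy and candidate resolutions. This allows the server to provide the client with an immersive media file of a suitable resolution according to the file format information, and provides technical support for the adaptive, client-capability-based resolution selection described above.
The technical solutions of this application are described below with reference to an example.
The server stores a video file. Assume that the unscaled video resolution is 8K and that the video file contains video file tracks of multiple resolutions (that is, multiple definitions), with zoom_ratio values of 0 (8K resolution), 1 (4K resolution), and 2 (1080p resolution).
The server sets the definition selection strategy to 1, that is, viewing quality first within a certain bandwidth limit, as far as the device capability allows. The bandwidth limit is 10 Mbps, described in zoom_strategy_description as "Limit bandwidth:10mbps".
The client (also called the player) sends capability information to the server. Assume that the device of user A can consume 8K video and user A is an ordinary user; the device of user B can consume 4K video and user B is a premium user; and the device of user C can consume 8K video and user C is a premium user. For example, premium users have a higher priority than ordinary users.
Based on the capability information and the resolution selection strategy, the server decides as follows:
1. User A is an ordinary user and is subject to the 10 Mbps bandwidth limit. The video sent to user A should have a resolution below 8K and require less than 10 Mbps of bandwidth (assumed to be the 4K video in this embodiment). This video corresponds to the file track whose zoom_ratio is 1, so the server re-encapsulates that file track into a video file and sends it to user A.
2. User B is a premium user and is not subject to the 10 Mbps bandwidth limit. The video sent to user B should be the highest-resolution video that user B's device can consume, that is, the 4K video. The server therefore re-encapsulates the file track whose zoom_ratio is 1 into a video file and sends it to user B.
3. User C is a premium user and is not subject to the 10 Mbps bandwidth limit. The video sent to user C should be the highest-resolution video that user C's device can consume, that is, the 8K video. The server therefore re-encapsulates the file track whose zoom_ratio is 0 into a video file and sends it to user C.
Users A, B, and C each consume the video file they receive.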
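The decisions for users A, B, and C can be traced with a small Python sketch. The per-track bitrates are hypothetical numbers chosen so that only the 8K track exceeds 10 Mbps, as the example assumes; parsing the limit out of the description string with a regular expression is likewise an assumption, since the source does not define a format for zoom_strategy_description.

```python
import re
from typing import Optional

RESOLUTION_RANK = {"1080p": 0, "4K": 1, "8K": 2}
TRACKS = {0: "8K", 1: "4K", 2: "1080p"}             # zoom_ratio -> resolution
TRACK_BITRATE_MBPS = {0: 40.0, 1: 8.0, 2: 4.0}      # hypothetical bitrates


def parse_bandwidth_limit(description: str) -> Optional[float]:
    """Extract a limit such as 10 from "Limit bandwidth:10mbps" (assumed format)."""
    match = re.search(r"(\d+(?:\.\d+)?)\s*mbps", description, re.IGNORECASE)
    return float(match.group(1)) if match else None


def pick_track(device_max: str, is_premium: bool,
               strategy_type: int, description: str) -> Optional[int]:
    """Return the zoom_ratio of the best track the user may receive."""
    limit = None
    if strategy_type == 1 and not is_premium:
        # In the example, only ordinary users are held to the bandwidth limit.
        limit = parse_bandwidth_limit(description)
    best = None
    for ratio, resolution in TRACKS.items():
        if RESOLUTION_RANK[resolution] > RESOLUTION_RANK[device_max]:
            continue  # the device cannot consume this resolution
        if limit is not None and TRACK_BITRATE_MBPS[ratio] >= limit:
            continue  # the track needs more bandwidth than allowed
        if best is None or RESOLUTION_RANK[resolution] > RESOLUTION_RANK[TRACKS[best]]:
            best = ratio
    return best
```

Under these assumptions, user A (8K device, ordinary) and user B (4K device, premium) both receive the track with zoom_ratio 1, and user C (8K device, premium) receives the track with zoom_ratio 0.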
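It should be noted that, in the foregoing embodiment, the zoom ratio values and the corresponding video resolutions are not limited to the examples given. The server can select, from the stored video files of different resolutions, a suitable one to send to each user. In addition, the server may not store video files for every resolution that a zoom ratio can indicate. In that case, based on the existing video files of different resolutions and the resolution indicated by the zoom ratio, the server can select the qualifying video file whose resolution is closest to the target video resolution and send it to the corresponding user.
It should also be noted that the names and descriptions of the fields in the foregoing embodiments are only exemplary and explanatory. Provided that the functions defined by the fields are implemented, the names and descriptions of the fields can be set according to actual conditions, and all such variations shall fall within the protection scope of this application.
It should also be noted that the foregoing embodiments describe the technical solutions of this application only from the perspective of interaction between the server and the client. The steps performed by the server can be implemented separately as an immersive media providing method on the server side, and the steps performed by the client can be implemented separately as an immersive media acquisition method on the client side.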
下述为本申请装置实施例,可以用于执行本申请方法实施例。对于本申请装置实施例中未披露的细节,请参照本申请方法实施例。
请参考图6A,其示出了本申请一个实施例提供的沉浸式媒体提供装置的框图。该装置具有实现上述沉浸式媒体提供方法示例的功能,所述功能可以由硬件实现,也可以由硬件执行相应的软件实现。该装置可以是上文介绍的服务器,也可以设置在服务器上。该装置600可以包括:信息接收模块610、分辨率选择模块620和文件发送模块630。
信息接收模块610,用于接收来自客户端的能力信息,所述能力信息用于指示所述客户端所在设备的沉浸式媒体播放能力。
分辨率选择模块620,用于根据沉浸式媒体内容的分辨率选择策略和所述能力 信息,从所述沉浸式媒体内容的候选分辨率中,确定提供给所述客户端的目标分辨率。
文件发送模块630,用于向所述客户端发送所述目标分辨率的沉浸式媒体文件。
在示例性实施例中,所述沉浸式媒体内容的文件格式信息包括:分辨率描述信息和分辨率选择策略信息;其中:所述分辨率描述信息,用于定义所述沉浸式媒体内容的候选分辨率;所述分辨率选择策略信息,用于定义所述沉浸式媒体内容的分辨率选择策略。
在示例性实施例中,所述分辨率选择策略信息包括:缩放策略类型字段,用于指示所述沉浸式媒体内容所采用的分辨率选择策略的类型。
在示例性实施例中,所述缩放策略类型字段的值为第一数值,表示所述分辨率选择策略为设备能力允许条件下,观看质量优先;所述缩放策略类型字段的值为第二数值,表示所述分辨率选择策略为设备能力允许条件下,用户带宽限制内观看质量优先。
在示例性实施例中,所述分辨率选择策略信息还包括:缩放策略描述字段,用于提供所述分辨率选择策略的文本描述;缩放策略描述长度字段,用于指示所述缩放策略描述字段中的所述文本描述的长度。
在示例性实施例中,所述分辨率描述信息,包括:数量指示字段,用于指示所述沉浸式媒体内容包括的缩放区域的数量;缩放比例指示字段,用于指示所述缩放区域的缩放比例;其中,不同的缩放区域对应于不同的候选分辨率。
在示例性实施例中,所述缩放比例指示字段的值为0,表示所述缩放区域相对于原始区域未进行缩放;所述缩放比例指示字段的值为1,表示所述缩放区域在宽、高上分别为原始区域的1/2;所述缩放比例指示字段的值为2,表示所述缩放区域在宽、高上分别为原始区域的1/4;所述缩放比例指示字段的值为3,表示所述缩放区域在宽、高上分别为原始区域的1/6;所述缩放比例指示字段的值为4,表示所述缩放区域在宽、高上分别为原始区域的1/8。
在示例性实施例中,所述分辨率描述信息,还包括:缩放算法类型字段,用于指示所述缩放区域的缩放算法类型;缩放符号类型字段,用于指示所述缩放区域的边界符号类型;缩放区域类型字段,用于指示所述缩放区域的类型;缩放区域描述字段,用于提供所述缩放区域的文本描述。
在示例性实施例中,所述能力信息包括以下至少一项:设备能力信息,用于指示所述客户端所在设备支持的最大分辨率;用户权限信息,用于指示所述客户端对应的用户权限所支持的最大分辨率;用户带宽信息,用于指示所述客户端对应的用户带宽上限。
综上所述,本申请实施例提供的技术方案,通过根据客户端的能力信息和沉浸式媒体内容的分辨率选择策略,从沉浸式媒体内容的候选分辨率中,选择目标分辨率的沉浸式媒体文件发送给客户端;提供了一种根据客户端能力自适应地选择沉浸式媒体内容的分辨率的技术方案,能够实现从满足客户端能力信息要求的候选分辨率中,选择最大分辨率提供给客户端,从而在保证用户体验的前提下,提升带宽资源的利用率。
请参考图6B,其示出了本申请一个实施例提供的沉浸式媒体提供装置的框图。该装置具有实现上述沉浸式媒体提供方法示例的功能,所述功能可以由硬件实现,也可以由硬件执行相应的软件实现。该装置可以是上文介绍的服务器,也可以设置在服务器上。该装置600可以包括:添加模块640、分辨率选择模块650和文件发送模块660。
添加模块640,用于在沉浸式媒体内容的文件格式信息中添加分辨率描述信息和分辨率选择策略信息。
分辨率选择模块650,用于根据所述沉浸式媒体内容的所述分辨率描述信息以及所述分辨率选择策略信息,确定提供给客户端的目标分辨率;
文件发送模块660,用于向所述客户端发送所述目标分辨率的沉浸式媒体文件。
请参考图7A,其示出了本申请一个实施例提供的沉浸式媒体获取装置的框图。该装置具有实现上述沉浸式媒体获取方法示例的功能,所述功能可以由硬件实现,也可以由硬件执行相应的软件实现。该装置可以是上文介绍的终端,也可以设置在终端上。该装置700可以包括:信息获取模块710、信息发送模块720和文件接收模块730。
信息获取模块710,用于获取客户端的能力信息,所述能力信息用于指示所述客户端所在设备的沉浸式媒体播放能力。
信息发送模块720,用于向服务器发送所述能力信息。
文件接收模块730,用于接收来自所述服务器的目标分辨率的沉浸式媒体文件, 所述目标分辨率是基于沉浸式媒体内容的分辨率选择策略和所述能力信息,从所述沉浸式媒体内容的候选分辨率中确定的。
在示例性实施例中,所述能力信息包括以下至少一项:设备能力信息,用于指示所述客户端所在设备支持的最大分辨率;用户权限信息,用于指示所述客户端对应的用户权限所支持的最大分辨率;用户带宽信息,用于指示所述客户端对应的用户带宽上限。
请参考图7B,其示出了本申请一个实施例提供的沉浸式媒体获取装置的框图。该装置具有实现上述沉浸式媒体获取方法示例的功能,所述功能可以由硬件实现,也可以由硬件执行相应的软件实现。该装置可以是上文介绍的终端,也可以设置在终端上。该装置700可以包括:文件接收模块740和呈现模块750。
文件接收模块740,用于接收来自服务器的目标分辨率的沉浸式媒体文件,所述沉浸式媒体文件的沉浸式媒体内容的文件格式信息包括分辨率描述信息和分辨率选择策略信息,其中,所述分辨率描述信息用于定义所述沉浸式媒体内容的候选分辨率,所述分辨率选择策略信息用于定义所述沉浸式媒体内容的分辨率选择策略;
呈现模块750,用于根据所述文件格式信息呈现所述沉浸式媒体文件。
综上所述,本申请实施例提供的技术方案,通过根据客户端的能力信息和沉浸式媒体内容的分辨率选择策略,从沉浸式媒体内容的候选分辨率中,选择目标分辨率的沉浸式媒体文件发送给客户端;提供了一种根据客户端能力自适应地选择沉浸式媒体内容的分辨率的技术方案,能够实现从满足客户端能力信息要求的候选分辨率中,选择最大分辨率提供给客户端,从而在保证用户体验的前提下,提升带宽资源的利用率。
需要说明的是,上述实施例提供的装置,在实现其功能时,仅以上述各功能模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能模块完成,即将设备的内部结构划分成不同的功能模块,以完成以上描述的全部或者部分功能。另外,上述实施例提供的装置与方法实施例属于同一构思,其具体实现过程详见方法实施例,这里不再赘述。
请参考图8,其示出了本申请一个实施例提供的服务器的结构框图。该服务器可用于执行上述实施例中提供的沉浸式媒体提供方法。具体来讲:
服务器800包括中央处理单元(Central Processing Unit,CPU)801、包括随机 存取存储器(Random Access Memory,RAM)802和只读存储器(Read Only Memory,ROM)803的系统存储器804,以及连接系统存储器804和中央处理单元801的系统总线805。服务器800还包括帮助计算机内的各个器件之间传输信息的基本输入/输出系统(I/O(Input/Output)系统)806,和用于存储操作系统813、应用程序814和其他程序模块812的大容量存储设备807。
基本输入/输出系统806包括有用于显示信息的显示器808和用于用户输入信息的诸如鼠标、键盘之类的输入设备809。其中显示器808和输入设备809都通过连接到系统总线805的输入输出控制器810连接到中央处理单元801。基本输入/输出系统806还可以包括输入输出控制器810,以用于接收和处理来自键盘、鼠标、或电子触控笔等多个其他设备的输入。类似地,输入输出控制器810还提供输出到显示屏、打印机或其他类型的输出设备。
大容量存储设备807通过连接到系统总线805的大容量存储控制器(未示出)连接到中央处理单元801。大容量存储设备807及其相关联的计算机可读介质为服务器800提供非易失性存储。也就是说,大容量存储设备807可以包括诸如硬盘或者只读光盘(Compact Disc Read-Only Memory,CD-ROM)驱动器之类的计算机可读介质(未示出)。
不失一般性,计算机可读介质可以包括计算机存储介质和通信介质。计算机存储介质包括用于存储诸如计算机可读指令、数据结构、程序模块或其他数据等信息的任何方法或技术实现的易失性和非易失性、可移动和不可移动介质。计算机存储介质包括RAM、ROM、可擦除可编程只读存储器(Erasable Programmable Read Only Memory,EPROM)、电可擦可编程只读存储器(Electrically Erasable Programmable Read Only Memory,EEPROM)、闪存或其他固态存储器技术,CD-ROM、高密度数字视频光盘(Digital Video Disc,DVD)或其他光学存储、磁带盒、磁带、磁盘存储或其他磁性存储设备。当然,本领域技术人员可知计算机存储介质不局限于上述几种。上述的系统存储器804和大容量存储设备807可以统称为存储器。
根据本申请的各种实施例,服务器800还可以通过诸如因特网等网络连接到网络上的远程计算机运行。也即服务器800可以通过连接在系统总线805上的网络接口单元811连接到网络812,或者说,也可以使用网络接口单元811来连接到其他类型的网络或远程计算机系统(未示出)。
所述存储器还包括计算机程序,该计算机程序存储于存储器中,且经配置以由一个或者一个以上处理器执行,以实现上述沉浸式媒体提供方法。
请参考图9,其示出了本申请一个实施例提供的终端900的结构框图。该终端900可以是诸如手机、平板电脑、多媒体播放设备、电视机、放映机、显示器、可穿戴设备、PC等电子设备。该终端可用于实施上述实施例中提供的沉浸式媒体获取方法。具体来讲:
通常,终端900包括有:处理器901和存储器902。
处理器901可以包括一个或多个处理核心,比如4核心处理器、8核心处理器等。处理器901可以采用数字信号处理(Digital Signal Processing,DSP)、现场可编程门阵列(Field Programmable Gate Array,FPGA)、可编程逻辑阵列(Programmable Logic Array,PLA)中的至少一种硬件形式来实现。处理器901也可以包括主处理器和协处理器,主处理器是用于对在唤醒状态下的数据进行处理的处理器,也称中央处理器(Central Processing Unit,CPU);协处理器是用于对在待机状态下的数据进行处理的低功耗处理器。在一些实施例中,处理器901可以集成有图像处理器(Graphics Processing Unit,GPU),GPU用于负责显示屏所需要显示的内容的渲染和绘制。一些实施例中,处理器901还可以包括人工智能(Artificial Intelligence,AI)处理器,该AI处理器用于处理有关机器学习的计算操作。
存储器902可以包括一个或多个计算机可读存储介质,该计算机可读存储介质可以是非易失性的。存储器902还可包括高速随机存取存储器,以及非易失性存储器,比如一个或多个磁盘存储设备、闪存存储设备。在一些实施例中,存储器902中的非易失性的计算机可读存储介质用于存储至少一个指令,至少一段程序、代码集或指令集,所述至少一条指令、至少一段程序、代码集或指令集,且经配置以由一个或者一个以上处理器执行,以实现上述沉浸式媒体获取方法。
在一些实施例中,终端900还可包括有:外围设备接口903和至少一个外围设备。处理器901、存储器902和外围设备接口903之间可以通过总线或信号线相连。各个外围设备可以通过总线、信号线或电路板与外围设备接口903相连。具体地,外围设备包括:射频电路904、触摸显示屏905、摄像头906、音频电路907、定位组件908和电源909中的至少一种。
本领域技术人员可以理解,图9中示出的结构并不构成对终端900的限定,可 以包括比图示更多或更少的组件,或者组合某些组件,或者采用不同的组件布置。
在示例性实施例中,还提供了一种计算机设备,所述计算机设备包括处理器和存储器,所述存储器中存储有处理器可执行指令,所述指令由一个或一个以上处理器执行时,实现上述沉浸式媒体提供方法或沉浸式媒体获取方法。其中,所述计算机设备可包括图8所示的服务器和图9所示的终端。
在示例性实施例中,还提供了一种计算机可读存储介质,所述存储介质中存储有处理器可执行指令,所述指令由一个或一个以上处理器执行时,实现上述沉浸式媒体提供方法。其中,所述一个或一个以上处理器可以位于服务器中。
在示例性实施例中,还提供了一种计算机可读存储介质,所述存储介质中存储有处理器可执行指令,所述指令由一个或一个以上处理器执行时,实现上述沉浸式媒体获取方法。其中,所述一个或一个以上处理器可以位于终端中。
该计算机可读存储介质还可以包括:只读存储器(Read Only Memory,ROM)、随机存取记忆体(Random Access Memory,RAM)、固态硬盘(Solid State Drives,SSD)或光盘等。其中,随机存取记忆体可以包括电阻式随机存取记忆体(Resistance Random Access Memory,ReRAM)和动态随机存取存储器(Dynamic Random Access Memory,DRAM)。
根据本申请的一个方面,提供了一种计算机程序产品或计算机程序,该计算机程序产品或计算机程序包括计算机指令,该计算机指令存储在计算机可读存储介质中。服务器的处理器从计算机可读存储介质读取该计算机指令,处理器执行该计算机指令,使得该服务器执行上述沉浸式媒体提供方法。终端的处理器从计算机可读存储介质读取该计算机指令,处理器执行该计算机指令,使得该终端执行上述沉浸式媒体获取方法。
应当理解的是,在本文中提及的“多个”是指两个或两个以上。“和/或”,描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。字符“/”一般表示前后关联对象是一种“或”的关系。另外,本文中描述的步骤编号,仅示例性示出了步骤间的一种可能的执行先后顺序,在一些其它实施例中,上述步骤也可以不按照编号顺序来执行,如两个不同编号的步骤同时执行,或者两个不同编号的步骤按照与图示相反的顺序执行,本申请实施例对此不作限定。以上所述仅为本申请的示例性实施例,并不用 以限制本申请,凡在本申请的精神和原则之内,所作的任何修改、等同替换、改进等,均应包含在本申请的保护范围之内。

Claims (18)

  1. 一种沉浸式媒体提供方法,由服务器执行,其特征在于,所述方法包括:
    在沉浸式媒体内容的文件格式信息中添加分辨率描述信息和分辨率选择策略信息;
    根据所述沉浸式媒体内容的所述分辨率描述信息以及所述分辨率选择策略信息,确定提供给客户端的目标分辨率;
    向所述客户端发送所述目标分辨率的沉浸式媒体文件。
  2. 根据权利要求1所述的方法,其特征在于,所述分辨率描述信息,用于定义所述沉浸式媒体内容的候选分辨率;
    所述分辨率选择策略信息,用于定义所述沉浸式媒体内容的分辨率选择策略。
  3. 根据权利要求2所述的方法,其特征在于,所述分辨率选择策略信息包括:
    缩放策略类型字段,用于指示所述沉浸式媒体内容所采用的分辨率选择策略的类型。
  4. 根据权利要求3所述的方法,其特征在于,
    所述缩放策略类型字段的值为第一数值,表示所述分辨率选择策略为设备能力允许条件下,观看质量优先;
    所述缩放策略类型字段的值为第二数值,表示所述分辨率选择策略为设备能力允许条件下,用户带宽限制内观看质量优先。
  5. 根据权利要求3所述的方法,其特征在于,所述分辨率选择策略信息还包括:
    缩放策略描述字段,用于提供所述分辨率选择策略的文本描述;
    缩放策略描述长度字段,用于指示所述缩放策略描述字段中的所述文本描述的长度。
  6. 根据权利要求2所述的方法,其特征在于,所述分辨率描述信息,包括:
    缩放比例指示字段,用于指示所述沉浸式媒体内容包括的缩放区域的缩放比例;其中,不同的缩放区域对应于不同的候选分辨率;
    所述缩放比例指示字段的值为0,表示所述缩放区域相对于原始区域未进行缩放;
    所述缩放比例指示字段的值为1,表示所述缩放区域在宽、高上分别为原始区 域的1/2;
    所述缩放比例指示字段的值为2,表示所述缩放区域在宽、高上分别为原始区域的1/4;
    所述缩放比例指示字段的值为3,表示所述缩放区域在宽、高上分别为原始区域的1/6;
    所述缩放比例指示字段的值为4,表示所述缩放区域在宽、高上分别为原始区域的1/8。
  7. 根据权利要求1至6任一项所述的方法,其特征在于,所述能力信息包括以下至少一项:
    设备能力信息,用于指示所述客户端所在设备支持的最大分辨率;
    用户权限信息,用于指示所述客户端对应的用户权限所支持的最大分辨率;
    用户带宽信息,用于指示所述客户端对应的用户带宽上限。
  8. 一种沉浸式媒体获取方法,由终端执行,其特征在于,所述方法包括:
    接收来自服务器的目标分辨率的沉浸式媒体文件,所述沉浸式媒体文件的沉浸式媒体内容的文件格式信息包括分辨率描述信息和分辨率选择策略信息,其中,所述分辨率描述信息用于定义所述沉浸式媒体内容的候选分辨率,所述分辨率选择策略信息用于定义所述沉浸式媒体内容的分辨率选择策略;
    根据所述文件格式信息呈现所述沉浸式媒体文件。
  9. 根据权利要求8所述的方法,其特征在于,所述能力信息包括以下至少一项:
    设备能力信息,用于指示所述客户端所在设备支持的最大分辨率;
    用户权限信息,用于指示所述客户端对应的用户权限所支持的最大分辨率;
    用户带宽信息,用于指示所述客户端对应的用户带宽上限。
  10. 根据权利要求8所述的方法,其特征在于,所述分辨率选择策略信息包括:
    缩放策略类型字段,用于指示所述沉浸式媒体内容所采用的分辨率选择策略的类型。
  11. 根据权利要求10所述的方法,其特征在于,
    所述缩放策略类型字段的值为第一数值,表示所述分辨率选择策略为设备能力允许条件下,观看质量优先;
    所述缩放策略类型字段的值为第二数值,表示所述分辨率选择策略为设备能力 允许条件下,用户带宽限制内观看质量优先。
  12. 根据权利要求8所述的方法,其特征在于,所述分辨率选择策略信息还包括:
    缩放策略描述字段,用于提供所述分辨率选择策略的文本描述;
    缩放策略描述长度字段,用于指示所述缩放策略描述字段中的所述文本描述的长度。
  13. 根据权利要求8所述的方法,其特征在于,所述分辨率描述信息,包括:
    数量指示字段,用于指示所述沉浸式媒体内容包括的缩放区域的数量;
    缩放比例指示字段,用于指示所述缩放区域的缩放比例;其中,不同的缩放区域对应于不同的候选分辨率。
  14. 根据权利要求13所述的方法,其特征在于,
    所述缩放比例指示字段的值为0,表示所述缩放区域相对于原始区域未进行缩放;
    所述缩放比例指示字段的值为1,表示所述缩放区域在宽、高上分别为原始区域的1/2;
    所述缩放比例指示字段的值为2,表示所述缩放区域在宽、高上分别为原始区域的1/4;
    所述缩放比例指示字段的值为3,表示所述缩放区域在宽、高上分别为原始区域的1/6;
    所述缩放比例指示字段的值为4,表示所述缩放区域在宽、高上分别为原始区域的1/8。
  15. 一种沉浸式媒体提供装置,其特征在于,所述装置包括:
    添加模块,用于在沉浸式媒体内容的文件格式信息中添加分辨率描述信息和分辨率选择策略信息;
    分辨率选择模块,用于根据所述沉浸式媒体内容的所述分辨率描述信息以及所述分辨率选择策略信息,确定提供给客户端的目标分辨率;
    文件发送模块,用于向所述客户端发送所述目标分辨率的沉浸式媒体文件。
  16. 一种沉浸式媒体获取装置,其特征在于,所述装置包括:
    文件接收模块,用于接收来自服务器的目标分辨率的沉浸式媒体文件,所述沉 浸式媒体文件的沉浸式媒体内容的文件格式信息包括分辨率描述信息和分辨率选择策略信息,其中,所述分辨率描述信息用于定义所述沉浸式媒体内容的候选分辨率,所述分辨率选择策略信息用于定义所述沉浸式媒体内容的分辨率选择策略;
    呈现模块,用于根据所述文件格式信息呈现所述沉浸式媒体文件。
  17. 一种计算机设备,其特征在于,所述计算机设备包括处理器和存储器,所述存储器中存储有处理器可执行指令,所述指令由一个或一个以上处理器执行时,实现如权利要求1-14中任一项所述的方法。
  18. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质中存储有处理器可执行指令,所述指令由一个或一个以上处理器执行时,实现如权利要求1-14中任一项所述的方法。
PCT/CN2021/077360 2020-03-24 2021-02-23 沉浸式媒体提供方法、获取方法、装置、设备及存储介质 Ceased WO2021190221A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP21777160.9A EP4009644A4 (en) 2020-03-24 2021-02-23 METHOD OF PROVIDING AND METHOD OF ACQUIRING IMMERSIVE MULTIMEDIA CONTENT, APPARATUS, DEVICE AND STORAGE MEDIA
US17/679,877 US12425700B2 (en) 2020-03-24 2022-02-24 Method for providing and method for acquiring immersive media, apparatus, device, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010211178.6A CN113453046B (zh) 2020-03-24 2020-03-24 沉浸式媒体提供方法、获取方法、装置、设备及存储介质
CN202010211178.6 2020-03-24

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/679,877 Continuation US12425700B2 (en) 2020-03-24 2022-02-24 Method for providing and method for acquiring immersive media, apparatus, device, and storage medium

Publications (1)

Publication Number Publication Date
WO2021190221A1 true WO2021190221A1 (zh) 2021-09-30

Family

ID=77806310

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/077360 Ceased WO2021190221A1 (zh) 2020-03-24 2021-02-23 沉浸式媒体提供方法、获取方法、装置、设备及存储介质

Country Status (5)

Country Link
US (1) US12425700B2 (zh)
EP (1) EP4009644A4 (zh)
CN (2) CN113453046B (zh)
TW (1) TWI786572B (zh)
WO (1) WO2021190221A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114257838A (zh) * 2021-11-29 2022-03-29 新奥特(北京)视频技术有限公司 一种视频数据处理方法、装置、电子设备和存储介质

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116347183A (zh) * 2020-06-04 2023-06-27 腾讯科技(深圳)有限公司 一种沉浸媒体的数据处理方法及相关装置
CN113891117B (zh) * 2021-09-29 2023-02-14 腾讯科技(深圳)有限公司 沉浸媒体的数据处理方法、装置、设备及可读存储介质
US11983214B2 (en) * 2021-11-05 2024-05-14 Tencent America LLC Reuse of redundant assets with client query
CN115314723B (zh) * 2022-06-17 2023-12-12 百果园技术(新加坡)有限公司 一种初始档位视频流传输方法、装置、设备及存储介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090184962A1 (en) * 2008-01-22 2009-07-23 Dell Product L.P. Systems and Methods for Managing Video Resolution in a Multiple-Output Information Handling System
CN103493500A (zh) * 2012-09-04 2014-01-01 华为终端有限公司 媒体播放方法、控制点和终端
CN103825912A (zh) * 2014-03-24 2014-05-28 联想(北京)有限公司 一种数据传输方法、电子设备及服务器
CN106713895A (zh) * 2014-11-26 2017-05-24 索尼公司 处理内容的方法和设备
CN108462899A (zh) * 2018-03-19 2018-08-28 青岛海信电器股份有限公司 基于设备能力的流媒体码流自适应传输方法、设备及系统
US20190045248A1 (en) * 2018-05-31 2019-02-07 Intel Corporation Super resolution identifier mechanism

Family Cites Families (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6492985B1 (en) * 1999-07-06 2002-12-10 Internet Pictures Corporation Presenting manipulating and serving immersive images
US8458753B2 (en) * 2006-02-27 2013-06-04 Time Warner Cable Enterprises Llc Methods and apparatus for device capabilities discovery and utilization within a content-based network
US9124650B2 (en) * 2006-12-13 2015-09-01 Quickplay Media Inc. Digital rights management in a mobile environment
CN101163245B (zh) * 2007-11-27 2010-09-29 北京中星微电子有限公司 一种图像处理方法及装置
CN101420604A (zh) * 2008-11-20 2009-04-29 华为技术有限公司 一种媒体提供、下载方法及系统
US8947492B2 (en) * 2010-06-18 2015-02-03 Microsoft Corporation Combining multiple bit rate and scalable video coding
JP5684033B2 (ja) * 2011-04-11 2015-03-11 オリンパス株式会社 撮像装置及び内視鏡装置の作動方法
US20140082661A1 (en) * 2012-03-06 2014-03-20 Google Inc. Low latency video storyboard delivery with selectable resolution levels
CN103369355A (zh) * 2012-04-10 2013-10-23 华为技术有限公司 一种在线媒体数据转换的方法、播放视频方法及相应装置
CN105306986B (zh) * 2013-05-14 2016-09-07 广东云海云计算科技有限公司 集成基本数据、正常数据解扰的dvb条件接收装置
GB2524531B (en) * 2014-03-25 2018-02-07 Canon Kk Methods, devices, and computer programs for improving streaming of partitioned timed media data
EP2961182A1 (en) * 2014-06-27 2015-12-30 Alcatel Lucent Method, system and device for navigating in ultra high resolution video content by a client device
US9781350B2 (en) * 2015-09-28 2017-10-03 Qualcomm Incorporated Systems and methods for performing automatic zoom
US20180270515A1 (en) * 2015-10-01 2018-09-20 Vid Scale, Inc. Methods and systems for client interpretation and presentation of zoom-coded content
CN105933726A (zh) * 2016-05-13 2016-09-07 乐视控股(北京)有限公司 虚拟现实终端及其视频分辨率的适应方法及装置
CN109511284B (zh) * 2016-05-26 2023-09-01 Vid拓展公司 视窗自适应360度视频传送的方法和设备
CN107566854B (zh) * 2016-06-30 2020-08-07 华为技术有限公司 一种媒体内容的获取和发送方法及装置
KR102545195B1 (ko) * 2016-09-12 2023-06-19 삼성전자주식회사 가상 현실 시스템에서 컨텐트 전송 및 재생 방법 및 장치
WO2018049321A1 (en) * 2016-09-12 2018-03-15 Vid Scale, Inc. Method and systems for displaying a portion of a video stream with partial zoom ratios
ES2895927T3 (es) * 2017-01-05 2022-02-23 Nokia Technologies Oy Un aparato, un método y un programa de ordenador para la codificación y decodificación de vídeo
JP6872631B2 (ja) * 2017-03-23 2021-05-19 ヴィド スケール インコーポレイテッド 360度適応ストリーミングのエクスペリエンスを改善するためのメトリックおよびメッセージ
CN107087212B (zh) * 2017-05-09 2019-10-29 杭州码全信息科技有限公司 基于空间可伸缩编码的交互式全景视频转码与播放方法及系统
US10887379B2 (en) * 2017-09-20 2021-01-05 Verizon Patent And Licensing Inc. Dynamically determining a content delivery network from which to receive content
US20190104326A1 (en) * 2017-10-03 2019-04-04 Qualcomm Incorporated Content source description for immersive media data
CN111201796A (zh) * 2017-10-04 2020-05-26 Vid拓展公司 定制的360度媒体观看
JP7401453B2 (ja) * 2018-04-05 2023-12-19 ヴィド スケール インコーポレイテッド 全方位ビデオに対する視点メタデータ
GB2573543B (en) * 2018-05-09 2021-10-27 Advanced Risc Mach Ltd Graphics Processing
CN109218763A (zh) * 2018-11-12 2019-01-15 青岛海信传媒网络技术有限公司 Method for switching streaming media video, and smart TV
CN110572656B (zh) * 2019-09-19 2021-11-19 江苏视博云信息技术有限公司 Encoding method, image processing method, apparatus, system, storage medium, and device

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090184962A1 (en) * 2008-01-22 2009-07-23 Dell Product L.P. Systems and Methods for Managing Video Resolution in a Multiple-Output Information Handling System
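CN103493500A (zh) * 2012-09-04 2014-01-01 华为终端有限公司 Media playback method, control point, and terminal
CN103825912A (zh) * 2014-03-24 2014-05-28 联想(北京)有限公司 Data transmission method, electronic device, and server
CN106713895A (zh) * 2014-11-26 2017-05-24 索尼公司 Method and device for processing content
CN108462899A (zh) * 2018-03-19 2018-08-28 青岛海信电器股份有限公司 Method, device, and system for adaptive transmission of streaming media bitstreams based on device capability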
US20190045248A1 (en) * 2018-05-31 2019-02-07 Intel Corporation Super resolution identifier mechanism

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4009644A4 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
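CN114257838A (zh) * 2021-11-29 2022-03-29 新奥特(北京)视频技术有限公司 Video data processing method and apparatus, electronic device, and storage medium
CN114257838B (zh) * 2021-11-29 2024-04-16 新奥特(北京)视频技术有限公司 Video data processing method and apparatus, electronic device, and storage medium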

Also Published As

Publication number Publication date
TWI786572B (zh) 2022-12-11
US20220182687A1 (en) 2022-06-09
EP4009644A1 (en) 2022-06-08
CN115225937A (zh) 2022-10-21
US12425700B2 (en) 2025-09-23
EP4009644A4 (en) 2023-03-15
CN115225937B (zh) 2023-12-01
TW202137770A (zh) 2021-10-01
CN113453046B (zh) 2022-07-12
CN113453046A (zh) 2021-09-28

Similar Documents

Publication Publication Date Title
KR102246002B1 (ko) Method, device, and computer program for improving streaming of virtual reality media content
TWI786572B (zh) Immersive media providing method, acquiring method, apparatus, device, and storage medium
RU2711591C1 (ru) Method, device, and computer program for adaptive streaming of virtual reality multimedia content
KR102261559B1 (ko) Information processing method and apparatus
US20200092600A1 (en) Method and apparatus for presenting video information
WO2019202207A1 (en) Processing video patches for three-dimensional content
CN108282449B (zh) Streaming media transmission method and client applied to virtual reality technology
US11348307B2 (en) Method and device for processing content
US20200145736A1 (en) Media data processing method and apparatus
CN107438203B (zh) Method for establishing and receiving a manifest, network device, and terminal
US12518797B2 (en) Data processing method and storage medium
US12148106B2 (en) Data processing method and apparatus for immersive media, and computer-readable storage medium
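HK40076011B (zh) Immersive media providing method, acquiring method, apparatus, device, and storage medium
HK40076011A (zh) Immersive media providing method, acquiring method, apparatus, device, and storage medium
HK40051796A (zh) Immersive media providing method, acquiring method, apparatus, device, and storage medium
HK40051796B (zh) Immersive media providing method, acquiring method, apparatus, device, and storage medium
HK40079468B (zh) Data processing method, apparatus, device, and storage medium for volumetric media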

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 21777160; Country of ref document: EP; Kind code of ref document: A1

ENP Entry into the national phase
Ref document number: 2021777160; Country of ref document: EP; Effective date: 20220302

NENP Non-entry into the national phase
Ref country code: DE