201119346 六、發明說明: 【發明所屬之技術領域】 本發明係關於經編碼之視訊資料的傳送。 本申印案主張以下美國臨時申請案之權利:2〇〇9年9月 b曰申凊之美國臨時申請案第61/243,〇3〇號、2〇〇9年9月u 曰申清之美國臨時申請案第61/244,827號、2〇1〇年i月u曰 申清之美國臨時申請案第61/293,961號及2〇1〇年^ 15曰申 请之美國臨時申請案第61/295,261號,該等臨時申請案中 之每一者的全部内容以引用之方式併入本文中。 【先前技術】 可將數位視訊能力併入至廣泛範圍之器件中,該等器件 包括數位電視、數位直播系統、無線廣播系統、個人數位 助理(PDA) '膝上型或桌上型電腦、數位相機、數位記錄 器件、數位媒體播放器、視訊遊戲器件、視訊遊戲控制 台、蜂巢式或衛星無線電電話、視訊電話會議器件及其類 似者。數位視訊器件實施視訊壓縮技術,諸如在由mpeg_ 2、MPEG-4、ITU-T H.263 或 ITU-T H.264/MPEG-4 第 10部 分進階視訊寫碼(AVC)定義之標準及此等標準之擴展中描 述的視訊壓縮技術,以更有效地傳輸及接收數位視訊資 訊。 視訊壓縮技術執行空間預測及/或時間預測以減小或移 除視訊序列中固有之冗餘。對於基於區塊之視訊寫碼而 言,視訊訊框或切片可分割為巨型區塊。每一巨型區塊可 經進一步分割。框内寫碼⑴訊框或切片中之巨型區塊係使 151028.doc 201119346 神對於鄰近巨型區塊之空間預測師編碼。框間寫碼(p ^訊框或切片中之巨型區塊可使用相對於同—訊框或切 區塊的空㈣㈣相對於其他參考訊框之 時間預測。 在已編碼視訊資料之後,可由多卫器封包化視訊資料以 供傳輸或儲存。刪G_2包減義許多視訊編碼標準之傳 送等級的「系統」部分。刪G_2傳送等級系統可由 Μ隱2視訊編碼器或遵照不同視訊_標㈣ 編碼器使用。舉例而言,MPEG_4規定不同於MPEG]之編 石馬及解碼方法的編碼及解碼方法,但實施MpEG_4標準之 技術的視訊編碼器仍可利用MPEG_2傳送等級方法。一般 而言’對「MPEG-2系統」之提及指代由MpEG_2規定之視 訊資料的傳送等級。MPEG_2規定之傳送等級在本發明中 亦稱為「MPEG-2傳送流」,或簡稱為「傳送流」。同 樣,MPEG-2系統之傳送等級亦包括程式流。傳送流及程 式流通常包括用於遞送類似資料之不同格式,#中傳送流 包含包括音訊資料及視訊資料兩者之一或多個「程式」, 而程式流包括一個包括音訊資料及視訊資料兩者的程式。 已努力開發基於H.264/AVC之新視訊寫碼標準。一種此 才示準為係對H.264/AVC之可擴充擴展的可擴充視訊寫碼 (SVC)‘準。另-標準為多視圖視訊寫碼(mvc),其為對 H.264/AVC之多視圖擴展。ΜρΕ(}·2系統規範描述經壓縮 之多媒體(視訊及音訊)資料流可如何與其他資料一起進行 多工以形成適於數位傳輸或儲存的單一資料流。在·時 J51028.doc 201119346 5月之「Information Technology-Generic Coding of Moving Pictures and Associated Audio: Systems, Recommendation H.222.0; International Organisation for Standardisation, ISO/IEC JTC1/SC29/WG11; Coding of Moving Pictures and Associated Audio」中指定MPEG-2系統之最新規範。 MPEG最近設計優於MPEG-2系統之MVC傳送標準,且此規 範之最新版本為「Study of ISO/IEC 13818-1:2007/FPDAM4 Transport of MVC」,MPEG doc. N10572, MPEG of ISO/IEC JTC1/SC29/WG11(美國夏威夷毛伊島,2009年4月)。 MVC之最新聯合草案描述於JVT-AB204,「Joint Draft 8.0 on Multiview Video Coding」(德國漢错威第 28次 JVT會 議,2008年7月)中,其可在http://wftp3.itu.int/av-arch/jvt-site/2008_07_Hannover/JVT-AB204.zip 處獲得。整合於 AVC標準中之較新版本描述於JVT-AD007,「Editors」 draft revision to ITU-T Rec. H.264 | ISO/IEC 14496-10 Advanced Video Coding - in preparation for ITU-T SG 16 AAP Consent (in integrated form)」(瑞士曰内瓦第 30 次 JVT 會議,2009 年 2 月)中,其可在 http://wftp3.itu.int/av-arch/jvt-site/2009_01_Geneva/JVT-AD007.zip處獲得。 【發明内容】 一般而言,本發明描述用於在多軌跡視訊資料格式中使 用媒體提取器以形成一媒體提取器軌跡的技術。本發明修 改國際標準組織(ISO)基礎媒體格式以利用一提取器,該 提取器能夠參考一或多個潛在不連續網路存取層(NAL)單 151028.doc 201119346 元。此提取器可存在於_ISO基礎媒體格式樓案之任何軌 跡中。本發明亦描述對第三代合作夥伴計劃(3Gp㈣案格 式之修改以包括-訊框率值作為—軌跡選擇箱 selecti〇n b〇x)之一屬性。本發日純一步關於對該IS0基礎 媒體格式之多視圖視訊寫碼(MVC)擴展來描述該提取器之 使用以支援MVC操作點的有效提取。 在貫例中 #用於編碼視訊資料之方法包括:藉由 一源視訊器件基於經編碼之視訊資料來建構一第一軌跡, 該第一軌跡包括一包含複數個網路存取層(Nal)單元之視 訊樣本’叾中該視訊樣本包括於—存取單元中;藉由該源 視訊器件建構一包括一提取器之第二軌跡,該提取器識別 該第一軌跡之該視訊樣本中之該複數個NAL單元中的至少 一者,該複數個NAL單元中之該至少一者包含一第一經識 別之NAL單元,且其中該提取器識別該存取單元之一第二 NAL單元,其中該第一經識別之NAL單元與該第二經識別 之NAL單元為不連續的;將該第一軌跡及該第二軌跡包括 於一至少部分遵照國際標準組織(IS〇)基礎媒體檔案格式 的視訊樓案中;及輸出該視訊樓案。 在另一實例中,一種用於編碼視訊資料之裝置包括:一 編碼器,其經組態以編碼視訊資料;一多工器,其經組態 以:基於該經編碼之視訊資料來建構一第一軌跡,該第一 軌跡包括一包含複數個網路存取層(NAL)單元之視訊樣 本,其中該視訊樣本包括於一存取單元中;建構一包括一 提取器之第二軌跡,該提取器識別該第一轨跡之該視訊樣 151028.doc 201119346 本中之該複數個說單元中的至少—者,該複數個NA4 疋中之該至少-者包含-第一經識別之nal單元,且其中 該提取器識別該存取翠元之一第二NAL單元,其中該第一 經識別之單元與該第二經識別之隐單元為不連續 的;將該第-軌跡及該第二執跡包括於一至少部分遵昭國 際標準組織(IS〇)基礎媒體槽案格式的視訊槽案中;及一 輸出介面,其經組態以輸出該視訊檔案。 在另實例中’-種用於編碼視訊資料之裝置包括:用 於基於經編碼之視訊資料來建構—第—軌跡之構件該第 -軌跡包括-包含複數個網路存取層(nal)單元之視訊樣 本,其中該視訊樣本包括於一存取單元中;用於建構一包 括提取益之第二軌跡之構件,該提取器識別該第一執跡 之該視訊樣本中之該複數個NAL單元中的至少一者, 數個NAL單元中之該至少一者包含一第_經識別之魏單 凡’且其中該提取器識別該存取單元之—第二狐單元, -中。亥第別之NAL單元與該第:nal單元為不連續 的;用於將該第一軌跡及該第二軌跡包括於一至少部分遵 照國際標準組織(IS〇)基礎媒體槽案格式之視訊檔案中的 構件;及用於輸出該視訊檔案之構件。 在另一實例中’-種電腦可讀儲存媒體包含指令,該等 指令在執行時使-源器件之-處理器進行以下操作:基^ 經編碼之視訊資料來建構一第一軌跡,該第一軌跡包括— 包含複數個網路存取層(NAL)單元之視訊樣本,其中該視 訊樣本包括於一存取單元中;建構一包括一提取器之第二 151028.doc 201119346 轨跡,該提取器識別該第一執跡之該視訊樣本中之該複數 個NAL單元中的至少一者,該複數個Nal單元中之該至少 一者包含一第一經識別之NAL·單元,且其中該提取器識別 該存取單元之一第二NAL單元,其中該第一經識別之NAL 單元與s玄第一經識別之NAL單元為不連續的;將該第一軌 跡及該第二軌跡包括於一至少部分遵照國際標準組織 (ISO)基礎媒體檔案格式的視訊檔案中;及輸出該視訊檔 案。 在另一貫例中,一種用於解碼視訊資料之方法包括:藉 由一目的地器件之一解多工器接收一至少部分遵照國際標 準組織(ISO)基礎媒體檔案格式之視訊檔案該視訊檔案 包含一第一軌跡及一第二軌跡,該第一軌跡包括一包含對 應於經編碼視訊資料之複數個網路存取層(NAL)單元的視 Λ樣本,其中該視訊樣本包括於一存取單元中,且該第二 軌跡包括一識別該第一軌跡之該複數個NAL單元中之至少 一者的提取器,該複數個NAL單元中之該至少一者包含— 第一經識別之NAL單元,且其中該提取器識別該存取單元 之第一 NAL單元,其中該第一經識別單元與該第 一 k識別之NAL單元為不連續的;選擇該第二軌跡以進行 解碼,及將由該第二軌跡之該提取器識別之該第一 NAL單 元及》玄第一 NAL單元的經編碼視訊資料發送至該目的地器 件之一視訊解碼器。 在另一實例中,一種用於解碼視訊資料之裝置包括:一 視Λ解碼器,其經組態以解碼視訊資料;及一解多工器, 151028.doc -9 - 201119346 其經組態以··接收一至少部分遵照國際標準組織讲⑺基 礎媒體棺案格式之視訊檔案,該視訊樓案包含—第一軌跡 及一第二軌跡,該第-軌跡包括一包含對應於經編碼視訊 資料之複數個網路存取層(NAL)單元的視訊樣本,其中該 視訊樣本包括於-存取單元中,且該第二轨跡包括二㈣ 該第一軌跡之該複數個NAL單元中之至少—者的提取器, 該複數個NAL單元中之該至少—者包含—第—經識別之 NAL単兀,且其中該提取器識別該存取單元之一第二 單元j其中該第-經識狀N A L單元與該第二經識別之 NAL單元為不連續的;選擇該第二執跡以進行 由該第二軌跡之該提取器識別之該第一 NAL單元及該第二 NAL單元的經編碼視訊資料發送至該視訊解碼器。 在另一實例中,一種用於解碼視訊資料之裝置包括:用 於藉由-目的地器件之—解多工器接收一至少部分遵照國 際標準組織(ISO)基礎媒體檔案格式之視訊檔案之構件, 該視訊檔案包含一第一轨跡及一第二軌跡,該第一軌跡包 =包含對應於經編碼視訊資料之複數個網路存取層(NAL) 單元的視訊樣本,其令該視訊樣本包括於一存取單元中, 且該第二軌跡包括—識別該第—執跡之該複數個狐單元 中之至少一者的提取器,該複數個NAL單元中之該至少一 者包^第-經識別之NAL單元,且其中該提取器識別該 存取單元之HAL單元,其巾該第_經識別之說單 元與該第―經識別之NAL單元為不連續的,·用於選擇該第 二軌跡以進行解碼之構件;及用於將由該第二軌跡之該提 151028.doc -10· 201119346 取器識別之該第-NAL單元及該第二NAL單元的經編碼視 訊貧料發送至該目的地器件之—視訊解碼器的構件。 在另一實例中,一種電腦可讀儲存媒體編碼有指令,該 等指令在執行時使一目的地器件之一處理器進行以下操 作.在接收到一至少部分遵照國際標準組織(is〇)基礎媒 體檔案格式之視訊檔案之後’該視訊擋案包含一第一執跡 及-第二軌跡’該第一軌跡包括一包含對應於經編碼視訊 資料之複數個網路存取層(NAL)單元的視訊樣本其中該 視訊樣本包括於一存取單元令,且該第二執跡包括—識別 該第一軌跡之該複數個NAL單元中之至少一者的提取器, 該複數個NAL單元中之該至少一者包含一第一經識別之 NAL單元,且其中該提取器識別該存取單元之一第二NAl 單元,其中該第一經識別之NAL單元與該第二經識別之 NAL單元為不連續的,選擇該第二軌跡以進行解碼;及將 由§亥第二軌跡之該提取器識別之該第一 NAL單元及該第二 NAL單元的經編碼視訊資料發送至一視訊解碼器。 一或多個實例之細節在隨附圖式及以下描述中進行闡 述。其他特徵、目標及優點將自描述及圖式且自申請專利 範圍而顯而易見。 【實施方式】 本發明之技術大體上係針對增強國際標準組織(IS〇)基 礎媒體檔案格式及ISO基礎媒體檔案格式之擴展。ISO基礎 媒體檔案格式之擴展包括(例如)進階視訊寫碼(AVC)檔案 格式、可擴充視訊寫碼(SVC)檔案格式、多視圖視訊寫碼 151028.doc -11 - 201119346 (MVC)檔案格式及第三代合作夥伴計劃(3Gpp)檔案格式。 一般而言’本發明之技術可用以產生呈IS〇基礎媒體檔案 格式及/或ISO基礎媒體檔案格式之擴展的媒體提取器轨 跡。如下文更詳細描述,在一些實例中,此等媒體提取器 軌跡可用以支援超文字傳送協定(Ηττρ)視訊串流中之調 適。在一些實例中,媒體提取器形成ISO基礎媒體檔案格 式及/或ISO基礎媒體檔案格式之擴展(例如,AVC、SVC、 MVC及3GPP)之部分以提取另一軌跡之整個樣本從而形成 新媒體提取器軌跡。 此等技術可由MPEG-2(動畫專家群)系統(亦即,在傳送 等級細節方面遵照MPEG-2之系統)使用。MPEG_4(例如)提 供用於視訊編碼之標準,但通常假設遵照MpEG_4標準之 視訊編碼器將利用MPEG_2傳送等級系統。因此,本發明 之技術適用於遵照以下各者之視訊編碼器:、 MPEG-4、ITU-T H.263、ITU-T H.264/MPEG-4,或利用 MPEG-2傳送流及/或程式流之任何其他視訊編碼標準。 ISO基礎媒體檔案格式提供包括一或多個軌跡之檔案。 IS 0基礎媒體檔案格式標準將執跡定義為相關樣本之按時 間順序的序列。ISO基礎媒體檔案格式標準將樣本定義為 與單-時戳相關聯之資料,並提供樣本之實㈣為視訊之 個別訊框、按解碼次序之—系列視訊訊框,或音訊之按解 碼次序的經I缩區段。稱為*意軌跡(hint track)之特殊執 跡並不含有媒體資料’而含有用於將一或多個軌跡封裝於 串流頻道中的指令。ISO基礎媒體檔案格式標準指出,、在 I5l028.doc •12· 201119346 丁思:跡中’樣本定義一或多個串流封包之形成。 月之技術提供媒體提取器軌跡之建立。媒體提取器 軌跡通常可包括—或多個提取器。媒體提取器軌跡中之提 取^用以識別並提取另—軌跡之樣本。以此方式,可將媒 β器軌跡中之媒體提取器視為指標,該等指標在解參 考夺自另一執跡擷取樣本。不同於svc之提取器,例如, 本發月之提取器可參考另—執跡之—或多個潛在不連續的 ,用路存取層(NAL)單元。根據本發明之技術,媒體提取器 軌跡s有一或多個媒體提取器之軌跡及不包括媒體提取 器之其他軌跡可分組在一起以形成交替群組。 本發明關於NAL單元使用術語「連續的」以描述在同一 軌跡中連續出現之兩個或兩個以上nal單元。亦即,當兩 個NAL單元為連續的時,該等單元中之—者中的資料 之最末位兀組緊接於同一轨跡中之另一 nal單元的資料的 第一位元組之前。在同一存取單元中之兩個NAL單元在同 一軌跡内分離開某一資料量的情況下,或在一nal單元在 軌跡中出現而另一 NAL單元在不同軌跡中出現的情況 下’通常認為該兩個NAL單以系「不連續的」。本發明之 技術提供一可識別一存取單元之兩個或兩個以上不連續 NAL單元的提取器。 此外,本發明之提取器並不限於SVC,而是通常可包括 於1 s 0基礎媒體檔案格式或IS 0基礎媒體檔案格式之任何其 他擴展(諸如’ AVC、SVC或]VIVC)中。本發明之提取器亦 可包括於第三代合作夥伴計劃(3GPP)檔案格式中。本發明 151028.doc •13· 201119346 另外提供修改3GPP檔案格式 跡選擇箱之一屬性。 以明確地傳訊訊框率作為軌 媒體提取器軌跡可用於Mvc槽案格式中(例如)以支援操 作點之提取。祠服器器件可在MpEG_2傳送層位元流中提 2各種操作點,該等操作點中之每—者對應於多視圖視訊 $碼視訊資料之料視圖的—各別子集。亦即,操作點通 ㊉=於位疋流之視圖之—子集。在__些實例中,操作點 之母-視圖包括處於相同訊框率之視訊資料。根據本發明 ,技術,操作點可制_媒體提取器軌跡來表示,該媒體 ,取器執跡包括參考其他執跡之視訊資料之—或多個提取 益,及並未包括於其他軌跡中的潛在額外樣本。 以此=式U作點可僅包括解碼操作點所需要之必 要NAL單元,以便以共同訊框率輸出視圖之—子集。提取 器軌跡'與MVC視訊之整體表示的組合可形成MV:表示之 播放’月早。本發明之媒體提取器軌跡之使用可支援操作點 選擇及(例如)具有由時間可調能力引起之各種位it率之操 作點的切換。 本發明之媒體提取器執跡亦可心形成交替群組或切換 群組。亦即’在ISO基礎媒體檔案格式t,執跡可分組在 一起以形成交替群組。在助基礎媒體標案格式之實例 中,交替群組之軌跡形成對彼此之可行替代,使得在任一 時間通常播放或串流交替群組之執跡中的交替群 且之執跡應可(例如)經由諸如位元帛、編解碼器、語言、 封包大小之屬性或其他特性而與交替群組之其他軌跡區分 151028.doc -14- 201119346 開。本發明之技術提供對媒體提取器軌跡、含有媒 益之執跡及/或其他正常視訊軌跡進行分組,以形 群組。在遵照MVC之實例中,每—軌跡可對應於―各^ 作點。亦即,MVC中之每一操作點可由軌跡中之一特定: 跡(例如,媒體提取器執跡或不包括媒體提取器之軌跡疋)來 表不。同-交替群財之—執跡通常經選擇以用於漸進式 下載’以適應於可用頻寬。 工 類似地,媒體提取器軌跡及其他軌跡可分組在一起以形 成贿檔案格式之切換群組,且可用於軌跡選擇以適應 HTTP串流應用中的頻寬及解碼器能力。3Gpp檔案格式提 供軌跡之㈣群組之定義。切換群財之㈣屬於同二交 替群組。料,根據3Gpp㈣格式,同—切換群組中之 軌跡可用於在會話期間進行切換,而不同切換群組中之執 跡不可用於切換。 圖1為說明音訊/視訊(A / V )源器件2 〇將音訊資料及視訊 資料傳送至A/V目的地器件40之實例系統1〇的方塊圖。 A/V源器件20亦可稱為「源視訊器件」。^之系統ι〇可對 應於視訊電話會議系、统、伺服器/用戶端系统、廣播裝置/ 接收盗系統,或其中將視訊資料自源器件(諸如,a/v源器 件20)發送至目的地器件(諸如,Α/ν目的地器件4〇)的任何 其他系統。A/V目的地器件40亦可稱為「目的地視訊器 件」或「用戶端器件」。在一些實例中,Α/ν源器件2〇及 A/V目的地器件4〇可執行雙向資訊交換。亦即,Α/ν源器 件20及A/V目的地器件4〇可能能夠編碼並解碼(且傳輸並接 151028.doc 15 201119346 收)音訊資料及視訊資料。在一些實例中,音訊編碼器% 可包含亦%為聲碼器(vocoder)之語音編碼器。 圖1之實例中之A/V源器件20包含音訊源22及視訊源24。 音訊源22可包含(例如)麥克風,其產生表示待由音訊編碼 器26編碼之所捕獲音訊資料的電信號。或者,音訊源η可 包含:儲存先前記錄之音訊f料之儲存媒體、諸如電腦化 合成器之音訊資料產生器’或任何其他音訊資料源。視訊 源24可包含產生待由視訊編碼器28編碼之視訊資料的視訊 相機:編碼有先前記錄之視訊資料的儲存媒體、視訊資料 產生單元,或任何其他視訊資料源。 原始音訊資料及視訊資料可包含類比或數位資料。類比 資料在由音訊編碼器26及/或視訊編媽器28編碼之前可經 數位化。音訊源22可在談話參與者正在談話之同時自談話 參與者獲得音訊㈣,且視㈣2何㈣獲得談話參與者 之視Λ資料。在其他實例中,音訊源22可包含一包含所儲 存之a Λ資料的電腦可存媒體,且視訊源2何包含一 包含所儲存之視訊資料的電腦可讀儲存媒體。以此方式, 本發明中所描述之技術可應用至直播之串流即時音訊資料 及視訊資料’或應用至經封存之預先記錄之音訊資料及視 訊資料。 對應於視讯訊框之音訊訊框通常為含有音訊資料之音訊 訊框,該音訊資料由音訊源22與由視訊源24捕獲之含於視 訊訊框内之視訊資料同時捕獲。舉例而言,在談話參與者 通常藉由谈活產生音訊資料時,音訊源22捕獲音訊資料, 151028.doc • 16 * 201119346 且視訊源24同時(亦即,在音訊源22正捕獲音訊資料時)捕 獲谈活參與者之視訊資料。因此,音訊訊框可在時間上對 應於一或多個特定視訊訊框。因此,對應於視訊訊框之音 訊訊框通常對應於同時捕獲音訊資料及視訊資料之情形, 且對於s玄情形而έ,音訊訊框及視訊訊框分別包含同時捕 獲之音訊資料及視訊資料。 貫例中,音訊編碼器26可編碼每一經編碼音訊訊 框中之時戳,該時戳表示記錄經編碼音訊訊框之音訊資料 的時間,且類似地,視訊編碼器28可編石馬每一經編碼視訊 絲t之時戳’該時戮表示記錄經編碼視訊訊框之視訊資 料的時間。在此等實例中,對應於視訊訊框之音訊訊框可 包含-包含時戳之音訊訊框及一包含同一時戳的視訊訊 框。A/V源器件20可包括可供音訊編碼器糾/或視訊編碼 器28產生時戮或可供音訊源22及視訊源24用以使音訊資料 及視訊負料刀別與時戮相關聯的内部時鐘。 在一些實例中,音訊源22可將對應於記錄音訊資料之時 間的資料發送至音 <孔總^ ㈣… 視訊源24可將對應於記 錄視❹料之時間的資料發送至視訊編碼㈣。在一 例中,音訊編碼器26可在經編 一201119346 VI. Description of the Invention: TECHNICAL FIELD OF THE INVENTION The present invention relates to the transmission of encoded video material. This application for the following US temporary application claims: September 9th, September, b, US application for temporary application No. 61/243, 〇3〇, September 29, September 曰 Shen Qing U.S. Provisional Application No. 61/244, 827, 〇 〇 i 曰 曰 曰 之 之 之 之 之 美国 美国 美国 美国 美国 美国 美国 美国 美国 美国 美国 61 61 61 61 美国 美国 美国 美国 美国 美国 美国 美国 美国 美国 美国 美国 美国 美国 美国 美国 美国 美国295,261, the entire contents of each of which are incorporated herein by reference. [Prior Art] Digital video capabilities can be incorporated into a wide range of devices, including digital TVs, digital live systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, digital Cameras, digital recording devices, digital media players, video game devices, video game consoles, cellular or satellite radio phones, video teleconferencing devices and the like. Digital video devices implement video compression techniques, such as those defined by mpeg_2, MPEG-4, ITU-T H.263 or ITU-T H.264/MPEG-4 Part 10 Advanced Video Recording (AVC) and The video compression technology described in the extension of these standards to transmit and receive digital video information more efficiently. Video compression techniques perform spatial prediction and/or temporal prediction to reduce or remove redundancy inherent in video sequences. For block-based video writing, the video frame or slice can be split into giant blocks. Each giant block can be further divided. In-frame code (1) The giant block in the frame or slice makes the code for the space predictor of the adjacent giant block. Inter-frame coding (a megablock in a p-frame or slice can be predicted relative to other reference frames relative to the same frame or block (4) (4). After the video material has been encoded, it can be The Guardian encapsulates the video data for transmission or storage. The G_2 packet is used to reduce the "system" part of the transmission level of many video coding standards. The G_2 transmission level system can be encoded by the 2 2 video encoder or according to different video _ (4) codes. For example, MPEG_4 specifies a coding and decoding method different from MPEG], but the video encoder that implements the MpEG_4 standard technology can still use the MPEG_2 transmission level method. Generally speaking, 'right' The reference to the MPEG-2 system refers to the transmission level of the video material specified by MpEG_2. The transmission level specified by MPEG_2 is also referred to as "MPEG-2 Transport Stream" or simply "Transport Stream" in the present invention. The transport level of the MPEG-2 system also includes the program stream. The transport stream and the program stream usually include different formats for delivering similar data, and the # transport stream includes audio data and video assets. One or more "programs", and the program stream includes a program that includes both audio and video data. Efforts have been made to develop a new video writing standard based on H.264/AVC. Expandable video write code (SVC) for H.264/AVC expandable extension. Another standard is multi-view video write code (mvc), which is a multi-view extension for H.264/AVC. The }2 system specification describes how compressed multimedia (video and audio) data streams can be multiplexed with other data to form a single data stream suitable for digital transmission or storage. J51028.doc 201119346 May Information Technology-Generic Coding of Moving Pictures and Associated Audio: Systems, Recommendation H.222.0; International Organisation for Standardisation, ISO/IEC JTC1/SC29/WG11; Coding of Moving Pictures and Associated Audio" MPEG has recently been designed to be superior to the MVC transmission standard for MPEG-2 systems, and the latest version of this specification is "Study of ISO/IEC 13818-1:2007/FPDAM4 Transport of MVC", MPEG do c. N10572, MPEG of ISO/IEC JTC1/SC29/WG11 (Maui, Hawaii, USA, April 2009). The latest joint draft of MVC is described in JVT-AB204, "Joint Draft 8.0 on Multiview Video Coding" (28th JVT Conference, July 2008), available at http://wftp3.itu.int Obtained at /av-arch/jvt-site/2008_07_Hannover/JVT-AB204.zip. A newer version integrated into the AVC standard is described in JVT-AD007, "Editors" draft revision to ITU-T Rec. H.264 | ISO/IEC 14496-10 Advanced Video Coding - in preparation for ITU-T SG 16 AAP Consent (in integrated form) (30th JVT meeting in Geneva, Switzerland, February 2009), available at http://wftp3.itu.int/av-arch/jvt-site/2009_01_Geneva/JVT-AD007 Obtained at .zip. SUMMARY OF THE INVENTION In general, the present invention describes techniques for using a media extractor to form a media extractor trajectory in a multi-track video data format. The present invention modifies the International Standards Organization (ISO) base media format to utilize an extractor capable of reference to one or more potentially discontinuous network access layer (NAL) singles 151028.doc 201119346. This extractor can exist in any track of the _ISO base media format building. The present invention also describes one of the attributes of the third generation partnership program (the modification of the 3Gp (four) case format to include the frame rate value as the - track selection box selecti〇n b〇x). This is a pure day to describe the use of the extractor to support efficient extraction of MVC operating points with respect to the multiview video code (MVC) extension to the IS0 base media format. In the example, the method for encoding video data includes: constructing a first track by using a source video device based on the encoded video data, the first track comprising a plurality of network access layers (Nal) The video sample of the unit is included in the access unit; the source video device constructs a second track including an extractor, and the extractor identifies the video sample in the first track At least one of the plurality of NAL units, the at least one of the plurality of NAL units including a first identified NAL unit, and wherein the extractor identifies a second NAL unit of the access unit, wherein the The first identified NAL unit and the second identified NAL unit are discontinuous; the first track and the second track are included in a video that is at least partially in accordance with an International Standards Organization (IS) basic media file format In the case of the building; and output the video building case. In another example, an apparatus for encoding video data includes: an encoder configured to encode video material; a multiplexer configured to: construct a video based on the encoded video material a first track, the first track includes a video sample including a plurality of network access layer (NAL) units, wherein the video sample is included in an access unit; and a second track including an extractor is constructed The extractor identifies at least one of the plurality of said units of the first trajectory of the first trajectory, wherein the at least one of the plurality of NA4 包含 includes - the first identified nal unit And wherein the extractor identifies one of the second NAL units of the accessary, wherein the first identified unit and the second identified hidden unit are discontinuous; the first track and the second The manifestation is included in a video slot at least partially in accordance with the International Standards Organization (IS) basic media slot format; and an output interface configured to output the video file. In another example, an apparatus for encoding video data includes: means for constructing based on encoded video data - a component of a first track comprising - a plurality of network access layer (nal) units a video sample, wherein the video sample is included in an access unit; configured to construct a component including a second track of the benefit, the extractor identifying the plurality of NAL units in the video sample of the first trace In at least one of the plurality of NAL units, the at least one of the plurality of NAL units includes a first identified weidan Fan and wherein the extractor identifies the access unit - the second fox unit. The NAL unit and the nal unit are discontinuous; the first trajectory and the second trajectory are included in a video file at least partially conforming to the International Standards Organization (IS) basic media slot format The component in the device; and the component for outputting the video file. In another example, a computer readable storage medium includes instructions that, when executed, cause a processor of the source device to: construct a first trajectory based on the encoded video material, the A track includes - a video sample comprising a plurality of network access layer (NAL) units, wherein the video sample is included in an access unit; constructing a second 151028.doc 201119346 track including an extractor, the extracting Identifying at least one of the plurality of NAL units in the video sample of the first representation, the at least one of the plurality of Nal units including a first identified NAL unit, and wherein the extracting Identifying a second NAL unit of the access unit, wherein the first identified NAL unit and the first identified NAL unit are discontinuous; the first track and the second track are included in one At least partially in accordance with the International Standards Organization (ISO) basic media file format of the video file; and output the video file. In another example, a method for decoding video data includes receiving, by a multiplexer of a destination device, a video file at least partially in accordance with an International Standards Organization (ISO) basic media file format. a first track and a second track, the first track comprising a view sample comprising a plurality of network access layer (NAL) units corresponding to the encoded video data, wherein the video sample is included in an access unit And the second track includes an extractor identifying at least one of the plurality of NAL units of the first track, the at least one of the plurality of NAL units including - the first identified NAL unit, And wherein the extractor identifies the first NAL unit of the access unit, wherein the first identified unit and the first k identified NAL unit are discontinuous; selecting the second track for decoding, and The encoded video data of the first NAL unit and the first NAL unit identified by the extractor of the two tracks is sent to one of the video decoders of the destination device. In another example, an apparatus for decoding video data includes: a video decoder configured to decode video data; and a demultiplexer, 151028.doc -9 - 201119346 configured Receiving a video file at least partially in accordance with the International Standards Organization (7) basic media file format, the video file comprising - a first track and a second track, the first track comprising an image corresponding to the encoded video data a plurality of video samples of a network access layer (NAL) unit, wherein the video sample is included in an access unit, and the second track includes two (four) at least one of the plurality of NAL units of the first track— Extractor, the at least one of the plurality of NAL units includes - the first identified NAL, and wherein the extractor identifies one of the access units, the second unit j, wherein the first NAL unit and the second identified NAL unit are discontinuous; selecting the second track to perform encoded video of the first NAL unit and the second NAL unit identified by the extractor of the second track Data sent to the view decoder. In another example, an apparatus for decoding video data includes: means for receiving, by a demultiplexer of a destination device, a video file at least partially in accordance with an International Standards Organization (ISO) basic media file format The video file includes a first track and a second track. The first track package includes a video sample including a plurality of network access layer (NAL) units corresponding to the encoded video data, and the video sample is obtained. Included in an access unit, and the second track includes an extractor that identifies at least one of the plurality of fox units of the first-permission, the at least one of the plurality of NAL units An identified NAL unit, and wherein the extractor identifies the HAL unit of the access unit, the towel identifying the unit and the identified NAL unit being discontinuous, for selecting the a second trajectory for decoding; and for transmitting the encoded _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ The destination device Component video decoder. In another example, a computer readable storage medium is encoded with instructions that, when executed, cause a processor of a destination device to perform the following operations: upon receiving an at least partial compliance with an international standard organization (is) After the video file of the media file format, the video file includes a first track and a second track. The first track includes a plurality of network access layer (NAL) units corresponding to the encoded video data. a video sample, wherein the video sample is included in an access unit command, and the second track includes an extractor identifying at least one of the plurality of NAL units of the first track, the plurality of NAL units At least one of the first identified NAL units, and wherein the extractor identifies a second NAl unit of the access unit, wherein the first identified NAL unit and the second identified NAL unit are Continuously selecting the second track for decoding; and transmitting the encoded video data of the first NAL unit and the second NAL unit identified by the extractor of the second track of the second path to a video decoder. Details of one or more examples are set forth in the accompanying drawings and the description below. Other features, objectives, and advantages will be apparent from the description and drawings and claims. [Embodiment] The technology of the present invention is generally directed to enhancing the extension of the International Standards Organization (IS) basic media file format and the ISO base media file format. Extensions to the ISO base media file format include, for example, Advanced Video Recording (AVC) file format, expandable video code (SVC) file format, multi-view video code 151028.doc -11 - 201119346 (MVC) file format And the 3rd Generation Partnership Project (3Gpp) file format. In general, the techniques of the present invention can be used to generate extended media extractor tracks in the IS〇 base media file format and/or the ISO base media file format. As described in more detail below, in some examples, such media extractor trajectories may be used to support adaptation in hypertext transfer protocol (Ηττρ) video streams. In some examples, the media extractor forms part of an extension of the ISO base media file format and/or the ISO base media file format (eg, AVC, SVC, MVC, and 3GPP) to extract the entire sample of another track to form a new media extraction. Trajectory. Such techniques may be used by the MPEG-2 (Animation Experts Group) system (i.e., systems that comply with MPEG-2 in terms of transmission level details). MPEG_4 (for example) provides a standard for video coding, but it is generally assumed that a video encoder that complies with the MpEG_4 standard will utilize the MPEG_2 transmission level system. Thus, the techniques of the present invention are applicable to video encoders that conform to: MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4, or utilize MPEG-2 transport streams and/or Any other video coding standard that the program streams. The ISO base media file format provides a file that includes one or more tracks. The IS 0 Basic Media Archive Format Standard defines the execution as a sequence of time-series of related samples. The ISO Basic Media File Format Standard defines a sample as the data associated with a single-time stamp and provides a sample of the actual frame (4) for individual frames of video, in decoding order, for series of video frames, or for decoding in audio decoding order. Through the I contraction section. A special implementation called a "hint track" does not contain media material' and contains instructions for encapsulating one or more tracks in a streaming channel. The ISO Basic Media Archives Format Standard states that, in I5l028.doc •12· 201119346 Ding: Traces, the sample defines the formation of one or more stream packets. Month's technology provides the creation of media extractor trajectories. The media extractor track can typically include - or multiple extractors. The extract in the media extractor trajectory is used to identify and extract samples of the other trajectory. In this way, the media extractor in the trajectory of the mediator can be considered as an indicator, and the metrics are taken from another scrambled sample in the solution. Unlike the extractor of svc, for example, the extractor of this month can refer to another---------------------------------------------------------------------------------------------------------- In accordance with the teachings of the present invention, the track of the media extractor track s one or more media extractors and other tracks that do not include the media extractor may be grouped together to form an alternating group. The present invention uses the term "continuous" with respect to NAL units to describe two or more nal units that occur consecutively in the same trajectory. That is, when two NAL units are consecutive, the last bit group of the data in the units is immediately before the first byte of the data of another nal unit in the same track. . In the case where two NAL units in the same access unit are separated by a certain amount of data within the same trajectory, or in the case where one nal unit appears in the trajectory and another NAL unit appears in a different trajectory, it is generally considered The two NALs are "discontinuous". The technique of the present invention provides an extractor that can identify two or more discrete NAL units of an access unit. Moreover, the extractor of the present invention is not limited to SVC, but may generally be included in any other extension of the 1 s 0 base media file format or the IS 0 base media file format (such as 'AVC, SVC or 'VIVC). The extractor of the present invention may also be included in the 3rd Generation Partnership Project (3GPP) file format. The present invention 151028.doc • 13· 201119346 additionally provides an option to modify one of the 3GPP file format trace selection boxes. Using the explicit frame rate as a track media extractor track can be used in the Mvc slot format (for example) to support the extraction of operating points. The server device can provide various operating points in the MpEG_2 transport layer bitstream, each of which corresponds to a respective subset of the view of the multiview video $code video material. That is, the operating point passes through a subset of the view of the bit turbulence. In some instances, the parent-view of the operating point includes video data at the same frame rate. According to the present invention, the operating point can be represented by a media extractor trajectory, which includes reference to other executive video data, or multiple extraction benefits, and is not included in other trajectories. Potential extra samples. The use of this = U can simply include the necessary NAL units needed to decode the operating point to output a subset of the views at a common frame rate. The combination of the extractor trajectory' and the overall representation of the MVC video can form an MV: the representation is played 'early month'. The use of the media extractor trajectory of the present invention can support operating point selection and, for example, switching of operating points having various bit rate rates caused by time adjustable capabilities. The media extractor of the present invention can also form an alternating group or a switching group. That is, in the ISO base media file format t, the traces can be grouped together to form an alternating group. In the example of the assisted base media markup format, the trajectories of the alternating groups form a viable alternative to each other such that at any one time the alternate group of the alternate group of play or stream alternate groups is usually obligatory (eg ) Distinguish from other tracks of alternate groups via attributes such as bit 帛, codec, language, packet size, or other characteristics. 151028.doc -14- 201119346 On. The techniques of the present invention provide grouping of media extractor trajectories, media-containing trajectories, and/or other normal video trajectories to form groups. In the example of obeying MVC, each trajectory may correspond to "each" point. That is, each of the operating points in the MVC can be represented by one of the tracks: a trace (e.g., a media extractor or a tracker that does not include a media extractor). The same-alternative grouping—the falsification is usually selected for progressive downloading' to accommodate the available bandwidth. Similarly, media extractor trajectories and other trajectories can be grouped together to form a switching group of bribe file formats, and can be used for trajectory selection to accommodate bandwidth and decoder capabilities in HTTP streaming applications. The 3Gpp file format provides the definition of the (4) group of tracks. The switch group (4) belongs to the same two-group. According to the 3Gpp (4) format, the tracks in the same-switching group can be used to switch during the session, and the tracks in the different switching groups are not available for switching. 1 is a block diagram showing an example system 1 of an audio/video (A / V) source device 2 for transmitting audio data and video data to an A/V destination device 40. The A/V source device 20 may also be referred to as a "source video device." The system ι can correspond to a video teleconferencing system, a server/client system, a broadcast device/receiving system, or a video device from a source device (such as a/v source device 20) to a destination. Any other system of ground devices (such as Α/ν destination device 4〇). The A/V destination device 40 may also be referred to as a "destination video device" or a "customer device." In some examples, the Α/ν source device 2〇 and the A/V destination device 4〇 can perform two-way information exchange. That is, the Α/ν source device 20 and the A/V destination device 4〇 may be capable of encoding and decoding (and transmitting and receiving) audio and video data. In some examples, the audio encoder % may include a speech coder that is also a vocoder. The A/V source device 20 in the example of FIG. 1 includes an audio source 22 and a video source 24. The audio source 22 can include, for example, a microphone that produces an electrical signal representative of the captured audio material to be encoded by the audio encoder 26. Alternatively, the audio source η may comprise: a storage medium storing previously recorded audio material, an audio data generator such as a computerized synthesizer' or any other source of audio data. The video source 24 can include a video camera that produces video data to be encoded by the video encoder 28: a storage medium encoding the previously recorded video material, a video data generating unit, or any other source of video data. The original audio and video materials may contain analog or digital data. The analog data can be digitized prior to being encoded by the audio encoder 26 and/or the video encoder 28. The audio source 22 can obtain audio (4) from the conversation participant while the conversation participant is talking, and obtain the visual data of the conversation participant depending on (4) 2 (4). In other examples, audio source 22 can include a computer storable medium containing stored a data, and video source 2 includes a computer readable storage medium containing stored video data. In this manner, the techniques described in this disclosure can be applied to streaming live audio and video data or to pre-recorded audio and video data that are archived. The audio frame corresponding to the video frame is usually an audio frame containing audio data, which is captured by the audio source 22 and the video data captured by the video source 24 and contained in the video frame. For example, when a conversation participant typically generates audio data by talking, the audio source 22 captures the audio material, and the video source 24 is simultaneously (ie, when the audio source 22 is capturing the audio material). ) Capture video information about live participants. Therefore, the audio frame can correspond to one or more specific video frames in time. Therefore, the audio frame corresponding to the video frame generally corresponds to the simultaneous capture of the audio data and the video data, and for the sinusoidal situation, the audio frame and the video frame respectively contain the simultaneously captured audio data and video data. In an example, the audio encoder 26 can encode a timestamp in each encoded audio frame, the timestamp indicating the time at which the audio material of the encoded audio frame is recorded, and similarly, the video encoder 28 can be programmed to each other. The timestamp of the encoded videowire t is the time at which the video material of the encoded video frame is recorded. In these examples, the audio frame corresponding to the video frame may include an audio frame containing a time stamp and a video frame containing the same time stamp. The A/V source device 20 can be included in the audio encoder or/the video encoder 28, or can be used by the audio source 22 and the video source 24 to associate the audio data and the video device with the time. Internal clock. In some examples, the audio source 22 can transmit data corresponding to the time at which the audio material was recorded to the tone. The video source 24 can transmit the data corresponding to the time at which the video was recorded to the video code (4). In one example, the audio encoder 26 can be programmed
Kj| - , ,t . 1貝科申編碼一序列識 別符4讀編碼音訊f料中之㈣時 示記錄音訊資料之絕對時間,且類似地,= 序。類似地,在一此實射/貝科的相對時間排 其他方式與時戳相關。 映射,或以 l5J028.doc 201119346 本發明之技術通常係針對經編碼多媒體(例如,音訊及 視訊)資料之傳送,及經傳送多媒體資料之接收及後續解 譯以及解碼。本發明之技術可應用至各種標準及擴展之視 訊資料(諸如,可擴充視訊寫碼(svc)、進階視訊寫碼 (AVC)、OSI基礎層或多視圖視訊寫碼(MVC)資料)或包含 複數個視圖之其他視訊資料的傳送。如圖丨之實例中所 示,視訊源24可向視訊編碼器28提供一場景之複數個視 圖。視訊資料之多個視圖可用於產生待由三維顯示器(諸 如,戴眼鏡式立體或眼式立體三維顯示器)使用之三維視 訊資料。 A/V源器件20可向A/V目的地器件4〇提供「服務」。服 務通常對應於MVC資料之可用視圖的子集。舉例而言,多 視圖視訊資料可用於以零至七排序之八個視圖。一服務可 對應於具有兩個視圖之立體視訊,而另一服務可對應於四 個視圓,且又一服務可對應於所有八個視圖。一般而言, 一服務對應於可用視圖之任一組合(亦即,任一子集)。服 務亦可對應於可用視圖以及音訊資料之組合。 A/V源器件20根據本發明之技術能夠提供對應於視圖之 一子集的服務。一般而言,一視圖藉由亦稱為「Wew_id」 之視圖識別符來表示。視圖識別符通常包含可用以識別視 圖之語法要素《在編碼視圖時,MVC編碼器提供視圖之 view_id。view 一 id可由MVC解碼器使用以用於視圖間預 測’或由其他單元使用以用於其他用途(例如,用於顯 現)。 15I028.doc •18· 201119346 視圖間預測為用於參看共同時faH立 之 夕 編碼-訊框之MVC視訊資料作為 ”個訊框 作為不同視圖之經編碼訊框的 何。下文更詳細論述之圖7提供用於視圖間預測之實例 寫碼方案。-般而言,Mvc視訊資料之經編竭訊框可μ 間、時間預測性編碼,及/或參看共同時間位置處之其他 視圖的訊框而經預測性編碼。因此’供預測其他視圖之夫 考視圖通常在參考視圖充當參考之視圖之前進行解碼1 得此等經解碼視圖在解碼有參考内容之視圖時可用於參 考。解碼次序不必對應於view」d之次序。因&,使用視 圖次序索引來描述視圖之解碼次序。視圖次序索引為指示 存取單元中之相應視圖組件之解碼次序的索引。 ' 每:個別資料(音訊或視訊)流稱為基本流。基本流為程 式之單一經數位寫碼(可能經壓縮)組件。舉例而言,程式 之經寫碼視訊或音訊部分可為基本流。基本流在多工為裎 式流或傳送流之前可轉換為經封包化之基本流(pEsp在 同耘式内,流ID用以區分屬於一基本流之pES封包與屬 於其他基本流的PES封包。基本流之基本資料單元為經封 包化之基本流(PES)封包。因此,MVC視訊資料之每一視 圖對應於各別基本流。類似地,音訊資料對應於一或多個 各別基本流。 經MVC寫碼之視訊序列可分離成若干子位元流,該等子 位兀流中之每一者為一基本流。可使用MVc view_id子集 來識別每一子位元流。基於每一 Mvc view_id子集之概 念’定義MVC視訊子位元流^ MVC視訊子位元流含有在 151028.doc •19· 201119346 MVC vieW」d子集中列出之視圖的職單元。程式流通常 含有僅係來自基本流之NAL單元的NAL單元。亦設計任兩 個基本流不可含有相同視圖。 在圖1之實例中’多工器3〇接收來自視訊編碼器之包 含視訊資料之基本流及來自音訊編碼器26之包含音訊資料 的基本流。在-些實例中’視訊編碼器28及音訊編碼器% 可各自包括用於自經編碼資料形成pES封包的封包化器。 在其他實例中,視訊編碼器28及音訊編碼器%可各自與用 於^經編碼資料形成PES封包的各別封包化器介接。:其 他實例中’多JL㈣可包括用於自經編碼音訊資料及視訊 資料形成PES封包之封包化器。 如本發明中所使用之「程式」可包含音訊資料及視訊資 料之組合’例如’藉由A/v源器件2()之服務所遞送之音訊 基本流及可用視圖之一子集。每一m封包包括一識別 :ES封包所屬於之基本流的stream」d。多工器3。可將基本 流組合為構成性程式流或傳送流。程式流及傳送流係目標 為不同應用之兩個替代性多工。 般而呂,程式流包括一程式之資料’而傳送流可包括 一或多個程式之資料。多玉器3G可基於以下各者來編碼程 式流或傳送流中之任_者或兩者:正提供之服務、流將傳 遞至^媒體、待發送之程式之數目’或其他考慮事項。舉 例而s,當將在儲存媒體中編碼視訊資料時,多工器儿可 能更有可能形成程式流,而當將經由網路串流、廣播或發 、’'為視efl電,舌之部分的視訊資料時,多工器3 〇可能更有 151028.doc 201119346 可能使用傳送流。 可使多工器30更傾向於使用程式流來用於來自數位儲存 服務之單—程式的儲存及顯示。因為程式流對於錯誤為相 當敏感的,所以程式流意欲用於無錯誤環境或較不容易遭 遇錯誤的環境中。程式流僅包含屬於其之基本流,且通常 含有可變長度封包。在程式流中,自相關基本流得出之 :ES封包經組織為「套包(pack)」。套包包含套包標頭、 I選系統標頭’及自相關基本流中之任一者獲取的採用任 一次序之任何數目個PES封包。系統標頭含有程式流之特 性之概述,諸如其最大資料速率、相關視訊及音訊基本流 / '、他時序資讯’或其他資訊。解碼器可使用含於 系:標,中之資訊以判定解碼器是否能夠解碼程式流。、 夕工态30可使用一傳送流用於經由潛在易於出錯之頻 ^時遞^複數個程式。傳送流為針對多程式應用(諸如, :播=汁之多工’使得單一傳送流可容納許多獨立裎 式。傳送流包含_ i車虫屑、笔& a 連串傳送封包,該等傳送封包中之每— I 188位①組長。短固定長度封&之使 =程,對錯誤較不敏感。另外,可藉由經由標 188位’李德-所羅門編碼)處理封包來向每- 傳送流之改良=7提供料錯㈣護。舉例而言, t ^ ^ ^ ^ D思明,傳送流更可能使在廣播環境 中所見之易於出錯之頻道存續下來。 兄 傳3大之抗誤性及載運許多同時程式之能力的 傳达流為兩個多工中 刀的 幸佳者。然而,傳送流相較於程式 i51028.doc 201119346 w為更複雜之多工’ j_因此更難以產生且難以解多工。傳 ^封包之第-位元組為具有值0x47(十六進制47、二進制 「010001U」、十進制71)的同步位元組。單—傳送流可 ^料多不同程式,每—程式包含許多封包化之基本流。 夕工益3G可使用13位元封包識別符(piD)欄位來區分含有 -基本流之資料的傳送封包與載運其他基本流之資料的封 包。多工器負責確保向每一基本流授予一唯一⑽值。傳 送封包之最末位元組為連續性計數攔位。多工器3〇使屬於 同一基本流之連續傳送封包之間的連續性計數搁位的值遞 增。,情形使得目的地器件(諸如,A/v目的地器件4〇)之 解碼斋或其他單元能夠债測到傳送封包之丢失或增益並 有希望隱匿原本可能由此類事件導致的錯誤。 夕器30接收來自音sfl編碼器26及視訊編碼器μ之程式 之基本流的PES封包’且自PES封包形成相應網路抽象層 (NAL)單兀。在H.264/AVC(進階視訊寫碼)之實例中,經寫 碼之視訊片段組織為NAL單元’該等NAL單元提供解決諸 如視訊電話、儲存、廣播或串流之應用的「網路親和性」 視訊表示。NAL單元可分類成視訊寫碼層(vcl)nal單元 及非VCL NAL單元。VCL單元含有核心壓縮引擎,且可包 含區塊、巨型區塊及/或切片等級。其他NAL單元為非vcl NAL單元。 多工器30可形成NAL單元,該等NAL單元包含一識別 NAL所屬於之程式的標頭,以及有效負載,例如音訊資 料、視訊資料或描述NAL單元對應於之傳送流或程式流的 I5I028.doc •22· 201119346 貧料。舉例而言,在H.264/Avc中,NAL單元可包括一位 元組標頭及可變大小之有效負載。在一實例中,Nal單元 標頭包含priorityjd要素、temp〇ralJd要素、 anCh〇r_Pic—flag要素、view_id要素、n〇n_idr—fiag要素, 及inter—view-flag要素。在習知MVC中,保留藉由Η〗”定 義之NAL單兀(除包括4位元組Mvc NAL單元標頭及nal單 兀有效負載之首碼NAL單元及經厘¥(:寫碼之切片Nal單元 外)。 NAL標頭之pri〇rity_id要素可用於簡單之單路徑位元流 調適過程。temporal」d要素可用於指定相應NAL單元之時 間等級,其t不同時間等級對應於不同訊框率。 anCh〇r_pic_flag要素可指示圖片為錨定圖片或是非錨定 圖片。錯定圖片及其後之採用輸出次序(亦即,顯示次序) =所有圖片可在不以解碼次序(亦即,位元流次序)解碼先 前圖片的情況下經正確解碼,且因此可用作隨機存取點。 錨定圖片及非錨定圖片可具有不同相依性,其兩者皆在序 列參數集合中予以傳訊。將在此章節之隨後段落中論述並 使用其他旗標。此錯定圖片亦可稱為開放G〇p(圖像群組) 存取點,而在non一idr_flag要素等於零時亦支援封閉G〇p存 取點。n〇n」dr_flag要素指示圖片為即時解碼器再新(idr) 圖片或是視圖IDR(V.職)圖片…般而言,職圖片及其 ,之採用輸出次序或位元流次序的所有圖片可在不以解碼 •人序或顯示次序解碼先前圖片H兄下經正確解碼。 viewed要素包含可用以識別視圖之語法資訊,其可用 151028.doc •23- 201119346Kj| - , , t . 1 Beikeshen coded a sequence of identifiers 4 (4) when reading the encoded audio material, the absolute time of recording the audio data, and similarly, = order. Similarly, the relative timing of the actual shot/becaine is related to the time stamp. The mapping, or the technique of the present invention, is generally directed to the transmission of encoded multimedia (e.g., audio and video) data, and the receipt and subsequent interpretation and decoding of transmitted multimedia material. The techniques of the present invention are applicable to a variety of standard and extended video materials (such as scalable video write code (svc), advanced video write code (AVC), OSI base layer or multi-view video code (MVC) data) or The transmission of other video material containing a plurality of views. As shown in the example of Figure 视, video source 24 can provide video encoder 28 with a plurality of views of a scene. Multiple views of the video material can be used to generate 3D video material to be used by a 3D display, such as a glasses-type stereo or an ophthalmic stereoscopic 3D display. The A/V source device 20 can provide a "service" to the A/V destination device. The service typically corresponds to a subset of the available views of the MVC material. For example, multi-view video data can be used for eight views sorted from zero to seven. One service may correspond to stereoscopic video with two views, while another service may correspond to four viewing circles, and yet another service may correspond to all eight views. In general, a service corresponds to any combination of available views (ie, any subset). Services can also correspond to a combination of available views and audio data. A/V source device 20 is capable of providing services corresponding to a subset of views in accordance with the teachings of the present invention. In general, a view is represented by a view identifier also known as "Wew_id". The view identifier usually contains the syntax elements that can be used to identify the view. When encoding the view, the MVC encoder provides the view_id of the view. The view id can be used by the MVC decoder for inter-view predictions' or used by other units for other purposes (e. g., for display). 15I028.doc •18· 201119346 Inter-view prediction is used to refer to the MVC video data of the common faH code-frame as the “frame” as the coded frame of different views. 7 provides an example write code scheme for inter-view prediction. In general, the Mvc video data warp frame can be inter-predicted, and/or reference frames of other views at common time positions. Predictive coding. Therefore, the view for predicting other views is usually decoded before the reference view serves as the reference view. 1 These decoded views can be used for reference when decoding the view with reference content. The decoding order does not have to correspond. In the order of view "d". Because &, the view order index is used to describe the decoding order of the view. The view order index is an index indicating the decoding order of the corresponding view component in the access unit. 'Every: The individual data (information or video) stream is called the elementary stream. The elementary stream is a single digit-coded (possibly compressed) component of the program. For example, the coded video or audio portion of the program can be an elementary stream. The elementary stream can be converted into a packetized elementary stream before the multiplex is a stream or a transport stream (pEsp is in the same type, and the stream ID is used to distinguish the pES packet belonging to one elementary stream from the PES packet belonging to other elementary stream). The basic data unit of the elementary stream is a packetized elementary stream (PES) packet. Therefore, each view of the MVC video data corresponds to a respective elementary stream. Similarly, the audio data corresponds to one or more respective elementary streams. The video sequence encoded by the MVC can be separated into a number of sub-bitstreams, each of which is an elementary stream. Each sub-bitstream can be identified using the MVc view_id subset. The concept of a Mvc view_id subset defines the MVC video sub-bit stream ^ MVC video sub-bitstream contains the unit of view listed in the 151028.doc •19· 201119346 MVC vieW”d subset. The program stream usually contains only The NAL unit is derived from the NAL unit of the elementary stream. It is also designed that no two elementary streams can contain the same view. In the example of Figure 1, the multiplexer 3 receives the elementary stream containing the video data from the video encoder and the audio stream. coding 26 includes an elementary stream of audio material. In some examples, 'video encoder 28 and audio encoder % may each include a packetizer for forming a pES packet from encoded data. In other examples, video encoder 28 And the audio encoder % can each be interfacing with a separate packetizer for forming a PES packet for the encoded data. In other examples, 'multiple JL(4) may include a packet for forming a PES packet from the encoded audio material and the video material. A "program" as used in the present invention may include a combination of audio data and video data 'eg, a subset of audio elementary streams and available views delivered by the service of A/v source device 2(). Each m packet includes a stream identifying the elementary stream to which the ES packet belongs. multiplexer 3. The elementary stream can be combined into a constitutive program stream or a transport stream. The program stream and the transport stream system target are different applications. The two alternative multiplexes. The program stream includes a program of data' and the transport stream may include data of one or more programs. The multi-jade 3G can encode the program stream or the transport stream based on the following: Any or both: the service being provided, the stream will be passed to ^media, the number of programs to be sent' or other considerations. For example, when the video material will be encoded in the storage medium, the multiplexer It may be more likely to form a program stream, and when the video data will be streamed, broadcasted or sent via the network, the multiplexer 3 may be more than 1510.doc 201119346 may be used. The transport stream can make the multiplexer 30 more inclined to use the program stream for the storage and display of the single-program from the digital storage service. Because the program stream is quite sensitive to errors, the program stream is intended for use in an error-free environment. Or less likely to encounter an error in the environment. A program stream contains only the elementary streams that belong to it, and usually contains variable-length packets. In the program flow, the autocorrelation elementary stream is derived: the ES packet is organized into a "pack". The package includes any number of PES packets in any order obtained from any of the package header, the I-select system header, and the autocorrelation elementary stream. The system header contains an overview of the characteristics of the program stream, such as its maximum data rate, associated video and audio elementary stream / ', his timing information' or other information. The decoder can use the information contained in the system to determine if the decoder can decode the program stream. The evening state 30 can use a transport stream for transferring a plurality of programs via a potentially error-prone frequency. The transport stream is for multi-program applications (such as: broadcast = juice multiplex) so that a single transport stream can accommodate many independent rafts. The transport stream contains _i car bugs, pen & a serial transport packets, such transport Each of the packets - I 188 bits and 1 group length. Short fixed length seals & amps = less sensitive to errors. In addition, the packets can be processed by the 188-bit 'Lead-Solomon code'. Improvement of the flow = 7 provides the wrong material (four) protection. For example, t ^ ^ ^ ^ D, the transport stream is more likely to survive the error-prone channel seen in the broadcast environment. The transmission of the three major abilities of the brothers and the ability to carry many simultaneous programs is the lucky one of the two knives. However, the transport stream is more complicated than the program i51028.doc 201119346 w, so it is more difficult to generate and difficult to solve. The first byte of the packet is a sync byte having a value of 0x47 (hex 47, binary "010001U", decimal 71). The single-transport stream can be programmed with many different programs, each of which contains many packetized elementary streams. Xigongyi 3G can use the 13-bit packet identifier (piD) field to distinguish between a transport packet containing data of the elementary stream and a packet carrying data of other elementary streams. The multiplexer is responsible for ensuring that a unique (10) value is granted to each elementary stream. The last byte of the transmitted packet is the continuity count block. The multiplexer 3 increments the value of the continuity count stall between consecutive transport packets belonging to the same elementary stream. The situation enables the decoding of the destination device (such as the A/v destination device 4) or other unit to be able to detect the loss or gain of the transmitted packet and hopefully conceal the error that might otherwise be caused by such an event. The october 30 receives the PES packet ' from the elementary stream of the program of the sfl encoder 26 and the video encoder μ and forms a corresponding network abstraction layer (NAL) unit from the PES packet. In the case of H.264/AVC (Advanced Video Write Code), the coded video segments are organized into NAL units that provide "networks" for applications such as video telephony, storage, broadcast or streaming. Affinity" video said. NAL units can be classified into video codec layer (vcl) nal units and non-VCL NAL units. The VCL unit contains a core compression engine and can contain blocks, megablocks, and/or slice levels. Other NAL units are non-vcl NAL units. The multiplexer 30 may form NAL units, which include a header identifying the program to which the NAL belongs, and a payload, such as audio data, video data, or I5I028 describing the transport stream or program stream to which the NAL unit corresponds. Doc •22· 201119346 Poor material. For example, in H.264/Avc, a NAL unit can include a one-bit header and a variable-sized payload. In one example, the Nal unit header includes a priorityjd element, a temp〇ralJd element, an anCh〇r_Pic_flag element, a view_id element, an n〇n_idr-fiag element, and an inter-view-flag element. In the conventional MVC, the NAL unit defined by Η ” is reserved (except for the first NAL unit including the 4-byte Mvc NAL unit header and the nal unit payload) and the PCT (: slice of the code) Outside the Nal unit) The pri〇rity_id element of the NAL header can be used for a simple single-path bitstream adaptation process. The temporal "d" element can be used to specify the time level of the corresponding NAL unit, and the different time levels of t correspond to different frame rates. The anCh〇r_pic_flag element may indicate whether the picture is an anchor picture or a non-anchor picture. The wrong picture and its subsequent output order (ie, display order) = all pictures may not be in decoding order (ie, bit) The stream order) is decoded correctly in the case of decoding the previous picture, and thus can be used as a random access point. The anchor picture and the non-anchor picture can have different dependencies, both of which are signaled in the sequence parameter set. Other flags are discussed and used in subsequent paragraphs of this section. This erroneous picture may also be referred to as an open G〇p (image group) access point, and also supports closed G〇p when the non-idr_flag element is equal to zero. Access point The n_n"dr_flag element indicates that the picture is an instant decoder re-new (idr) picture or a view IDR (V. job) picture... in general, the job picture and its use in the output order or bit stream order Pictures can be decoded correctly without decoding the previous picture H brother in decoding, person order or display order. The view element contains grammar information that can be used to identify the view, which is available 151028.doc •23- 201119346
於MVC解碼器内部之資料石红。I Ή互動(例如’用於視圖間預測)及 解碼器外部之資料互動(例如, . v J ^ 用於顯現)。inter_view_flag 要素可指定相應NAL單元县 ~ 干及*疋否由其他視圖用於視圖間預 測。為了傳達可能符合Avr + β丄、 C之基本視圖的4位元組nal單 元標頭資訊’在MVC中定蠡昔m \T A τ 0„ _ 丄 丁夂我言碼NAL·早兀。在]^^(:之情形 下’基礎視圖存取單元包括相阁1 _ c枯祝圖之當前時刻的VCL NAL單 το以及其首碼NAL單元,該首碼ΝΑ單元僅含有說單元 標頭。H.264/AVC解碼器可忽略首碼祖單元。 在有效負載中包括視訊資料之缝單元可包含各種粒度 等級之視訊資料1例而言,缝單元可包含視訊資料區 塊:巨型區塊、複數個巨型區塊、視訊資料之切片,或視 §fl貢料的整個訊框。 一般而言’存取單元可包含用於表示視訊資料之訊框的 或夕個NAL單元,以及對應於訊框之音訊資料(在此音 訊資d料可料)°存取單元通常包括-輸出時刻之所有 NAL單το ’例如’—時刻之所有音訊資料及視訊資料。在 對應於H.264/AVC之實例中,存取單元可包含在—時刻之 經寫碼圖片’該圖片可呈現為初始經寫碼圖片。因此,存 取單元可包3共同時刻之所有視訊訊框,例如,對應於時 間X之所有視圖組件。 本發明亦將特定視圖之經編碼圖片㈣「視圖組件」。 亦即’視圖組件包含特定視圖在特^時間之經編碼圖片 (或訊框)。因A,存取單元在一些實例中可包含共同時刻 之所有視圖組件。存取單元之解竭次序不需要必須與輸出 151028.doc .24· 201119346 次序或顯示次序相ι連續存取單元之集合可形成經寫碼 之視訊序列,其可對應於NAL單元位元流或子位元流之圖 片群組(GOP)或其他可獨立解碼單元。 如同多數視tfl寫碼標準,H.264/AVC定義無錯誤位元流 ,語法、語義及解碼過程,纟中之任一者遵照某一規範或 等級。H.264/AVC並不指定編碼器,但向編碼器分派保證 所產生之位元流對於解碼器而言符合標準的任務。在視訊 寫碼標準之情形下,「規範」對應於演算法、特徵或工具 及施加至演算法、特徵或工具之約束的子集。舉例而言, 如藉由H.264標準所定義,「規範」為藉由凡264標準指定 之1個位元吾去的子集。「等級」對應於對解瑪器資源 消耗(諸如,解碼器記憶體及計算)的限制,該等限制係關 於圖片之解析度、位元率及巨型區塊(MB)處理速率。 H.264標準(例如)確認,在藉由給定規範之語法強加之界 限内,視藉由位元流中之語法要素獲取之值(諸如,經解 碼圖片之指定大小)而定需要編碼器及解碼器之效能方面 的大變化仍係可能的。H.264標準進一步確認,在許多應 用中,實施一能夠處置特定規範内之語法的所有假設使用 之解碼器為既不實際亦不經濟的。因此,H 264標準將 「等級」定義為強加於位元流中之語法要素之值上的指定 約束集合。此等約束可為對值之簡單限制。或者,此等約 束可採用對值(例如,圖片寬度X圖片高度\每秒解碼之圖 片之數目)之算術組合之約束的形式。H 264標準進一步提 供’個別實施可支援每一所支援規範之不同等級。 151028.doc -25- 201119346 遵照規範之解碼n -般支援在規範中定義之所有特徵。 舉例而言,作為寫碼特徵,B圖片寫碼在H 264/avc之基 線規範中並不被支援,且在H.264/AVC之其他規範中被1 援。遵照一等級之解碼器應能夠解碼並不需要超出在該等 級中疋義之限制之資源的任何位元流。規範及等級之定義 可有助於解譯能力。舉例而言,在視訊傳輸期間,一對規 範及等級定義可對整體傳輸會話進行協商並達成—致。更 具體而言,在H.264/AVC中,等級可定義(例如)對以下各 者之限制:需要進行處理之巨型區塊之數目、經解碼之圖 片緩衝器(DPB)大小、經寫碼之圓片緩衝器(CPB)大小、垂 直動作向量範圍、每兩個連續MB之動作向量的最大數 目,及B區塊是否可具有小於8x8像素之子巨型區塊分區。 以此方式,解碼器可判定解石馬器是否能夠適當地解碼位元 流。 一參數集合通常含有序列參數集合(sps)中之序列層標頭 貝錢圖片參數集合(PPS)中的偶爾改變之圖片層標頭資 訊。藉由參數集合,對於每一序列或圖片而言不需要重複 此偶爾改變之資訊;因此’寫碼效率可得以改良。此外, 參數集合之使用可致能標頭資訊之頻帶外傳輸,從而避免 對冗餘傳輸之需要以達成抗誤性。在頻帶外傳輸中,在不 同於其他說單元之頻道上傳輸參數集合NAL單元。 本發明之技術涉及將提取器包括於媒體提取器軌跡中。 本發明之提取器可參考共同檔案中之另一 個以上說單元。亦即,權案可包括-具有複數個NAL: 151028.doc -26- 201119346 元之第-執跡及-包括-提取器的第二執跡,該提取器識 別第一執跡之複數個NAL單元中的兩個或兩個以上nal單 元。一般而言,提取器可充當指標,使得當解多工㈣遭 遇提取器3夺,解多工器38可自第_軌跡操取由該提取器識 別之NAL單元’並將彼等NAL單元發送至視訊解碼㈣。 包括提取器之軌跡可稱為媒體提取器軌跡。本發明之提取 器可包括於遵照(例如)以下各者之各種檔案格式的檔案 中:iso基礎媒體檔案格式、可擴充視訊寫碼(svc)檔案格 式、進階視訊寫碼(AVC)檔案格式、第三代合作夥伴計劃 (3GPP)檔案格式,及/或多視圖視訊寫碼(Mvc)檔案格式。 一般而言,視訊檔案之各種軌跡可用作切換軌跡。亦 即,多工器30可包括各種軌跡以支援各種訊框率、顯示能 力及/或解碼能力。舉例而言,當視訊檔案遵照檔案 格式時,每一軌跡可表示不同MVC操作點。因此,解多工 器38可經組態以選擇軌跡中之—者,從而自所選擇軌跡操 取不同於所選擇軌跡之由提取器識別之NAL單元的nal單 7L並丟棄其他軌跡之資料。亦即,當所選擇軌跡包括一參 考另一軌跡之NAL單元的提取器時,解多工器38可提取經 參考之NAL單元,同時丟棄其他軌跡之未經參考的單 元。解多XH38可將所提取之NAL單元發送至視訊解碼器 48 ° 藉由在媒體提取器軌跡中使用提取器,本發明之技術可 用以達成視訊檔案之各種軌跡之間的時間可調能力。在 MPEG-1及MPEG-2中,例如,經㈣碼之圖片提供固有之 151028.doc -27· 201119346 時間可調能力。遵照MPEG-l或MPEG-2之視訊檔案之第一 執跡可包括經I編碼之圖片、經p編碼之圖片及經8編碼之 圖片的整個集合。視訊檔案之第二軌跡可包括僅參考第— 軌跡之經I編碼之圖片及經p編碼之圖片的一或多個提取 器,省略對經B編碼之圖片之參考。藉由捨棄經B編碼之 圖片,視訊檔案可達成確定之一半解析度的視訊表示。 MPEG-1及MPEG_2亦提供基礎層及增強層概念以寫碼兩個 時間層,其中增強層圖片對於每一預測方向可選定來自基 礎層或增強層之圖片作為參考。 作為另一實例,H.264/AVC使用階層式經B編碼之圖片 以支援時間可調能力。採用H 264/AVC之視訊序列之第— 圖月可稱為即時解碼器再新(IDR)圖片(亦稱為關鍵圖片)。 關鍵圖片通常在規則或不規則時間間隔内進行寫碼該等 關鍵圖片使用先前關鍵圖片作為參考經框内寫碼或框間寫 碼以用於經動作補償之預測。圖片群組(G〇p)通常包括— 關鍵圖片及在時間上定位於關鍵圖片與先前關鍵圖片之間 的所有圖片。可將G0P分成兩個部分,一部分為關鍵圖 片’且另一部分包括非關鍵圖片。非關鍵圖片藉由2個參 考圖片進行階層式預測,該2個參考圖片為具有較低時間 等級之距過去及未來最近的圖片。可向每一圖片指派時間 識別符值以指示圖片之階層位置。因此,具有達1^之時間 識別符值之圖片可形成一視訊片段,該視訊片段具有為具 有達N-1之時間識別符值的圖片所形成之視訊片段之訊框 率的兩倍之訊框率。因此,本發明之技術亦可用以藉由以 151028.doc -28- 201119346 執跡包括一或多個提取器,該一或多個提取器參考第 跡之具有達N-1之時間識別符值的nal單元。 下操作來達成H.264/AVC中之時間可調能力:使第—軌跡 包括具有達N之時間識別符值之所有NAL單元,且使第二 執 如上文所指出,本發明之技術可應用至遵照以下各者中 之至少一者的視訊擋案:IS〇基礎媒體檔案格式可擴充 視訊寫碼(SVC)檔案格式、進階視訊寫碼(AVC)檔案格 式、第三代合作夥伴計劃(3GPP)檔案格式,及多視圖視訊 寫碼(MVC)檔案格式。ISO基礎媒體檔案格式經設計以含 有按時間順序的媒體資訊以用於以靈活的可擴展格式進行 呈現,該格式促進媒體之互換、管理、編輯及呈現。1§〇 基礎媒體檔案格式(ISO/IEC 14496-12:2004)指定於mpeg_4 第12部分中,該MPEG-4第12部分定義基於時間之媒體檔 案的通用結構。其用作該族中之其他檔案格式(諸如,經 定義以支援H.264/MPEG-4 AVC視訊壓縮之Avc檔案格式 (ISO/IEC 14496-15)、3GPP檔案格式、svc檔案格式及 MVC擋案格式)的基礎。3GPP檔案格式及Mvc檔案格式為 AVC檔案格式之擴展。IS〇基礎媒體檔案格式含有媒體資 料之按時間順序的序列之時序、結構及媒體資訊,諸如視 聽呈現。檔案結構為物件導向式。檔案可極簡單地分解為 基本物件,且該等物件之結構由其類型來暗示。 遵照IS 0基礎媒體標案格式之檔案形成為稱作「箱」之 一系列物件。ISO基礎媒體檔案格式之資料含於該等箱 中’且在棺案内不存在其他資料。此箱包括特定檔案格式 151028.doc •29- 201119346 所需之任何初始簽章。「箱」為由唯_類型識別符及長度 定義之物件導向式構建區塊。通常,呈現項含於一播案 中’且媒體呈現為自含式的。電影容器(電影箱)含有媒體 及視訊之中繼資料,且音訊訊框含於媒體資料容器中,且 可係在其他檔案中。 呈現項(動作序列)可含於若干檔案中1有時序及成框 (置及大小)資訊通常係在ISO基礎媒體檔案中,且輔助 =:質上可「使用任何格式。此呈現項對於含有該呈現項 糸、、先可為「本端」的,或可經 制。 j岭次再他流遞送機 可具有邏輯結構、時間結構及實體結構,且此 L構不需要進行麵合。槽案之邏輯結構可 時間平行之軌跡之集人的雷办冰也 x s有 跡A右扯主 、η ^。檔案之時間結構可為,軌 跡3有按時間順序的樣本之序列,且彼等 輯清單映射至整體電% # 可選編 金體電影的時刻表中。檔案之 媒體資料樣本自身分離出'羅鯓士 構可自 資料…構資時間及結構分解所需要的 箱在時間上進行擴展= 可能藉由電影片段 時序_ 、電衫相可用文件證明樣本之邏輯及 標可係指向同-I宰有 =關係位於何處的指標。彼等指 中。 ㈣或(例如)藉由狐參考的另—槽案 :-媒體流可含於專用於彼媒體 軌跡中’且可藉由樣本項來進— ,)之 準確之媒體類型之「名避⑽化#本項可含有 」(對流進行解碼所需要之解碼 I5W28.doc -30. 201119346 器的類型)及所需要之該解碼器的任何來數化 採用四字元碼(例如,「m。。…戈「_」)之二稱亦可 不僅用於聰G_4媒體而且用於由❹ 他組織使用之媒體類型的所定義之樣本項格式。气矢之其 對中繼資料之支援通常採用兩種形式。首 序的中繼資料可儲存於恰當執跡中,視需要鱼間順 體資料進行同步。第二,可存在對附接至電影或二: =不按時間順序的中繼資料的通用支援。結構支援為通用 安’且如在t繼資料中-般允許將中繼資料資源儲存 次另&案中。此外,此等資源可進行命 名,且可進行保護。 在助基礎媒體槽案格式中,樣本分組為將軌跡中之樣 本中的每_者指派為—樣本群組的—成員。不需要樣本群 組中之樣本為連續的。舉例而言’當呈現AVC樓案格式之 H.264/AVC時’可將—時間等級中之視訊樣本取樣為二樣 本群組。樣本群組可藉由以下兩個資料結構來表示: SamPleToGroup 箱(sbdp)及 SampleGr〇upDescr箱。The information inside the MVC decoder is stone red. I Ή interaction (eg 'for inter-view predictions') and data interaction outside the decoder (eg, .v J ^ for visualization). The inter_view_flag element specifies whether the corresponding NAL unit county ~ dry and *疋 is used by other views for inter-view prediction. In order to convey the 4-byte nal unit header information that may conform to the basic view of Avr + β丄, C's in the MVC, m \TA τ 0 „ _ 丄 夂 夂 言 言 NAL NAL NAL 兀 兀. ^^(: In the case of the 'base view access unit', the VCL NAL single τ ο ο ο ο ο ο ο ο ο ο ο ο ο ο ο ο ο ο ο ο ο The .264/AVC decoder can ignore the first code ancestor unit. The seam unit including the video data in the payload can include video data of various granularity levels. For example, the seam unit can include the video data block: the giant block, the plural a block of video blocks, a slice of video data, or an entire frame of §fl tribute. Generally, an access unit may include a frame for indicating video data or a NAL unit, and corresponding to the frame. The audio data (in this audio data can be expected) ° access unit usually includes - all NAL single τ ο ' at the moment of the moment - 'time' all the audio data and video data. In the case corresponding to H.264 / AVC In the access unit, the access unit can be included in the time code The picture 'This picture can be rendered as an initial coded picture. Therefore, the access unit can pack all of the video frames at the same time, for example, all view components corresponding to time X. The present invention also encodes pictures of a particular view. (4) "View component". That is, the 'view component contains a coded picture (or frame) of a specific view at a specific time. Because A, the access unit may include all view components of a common time in some instances. The exhaustion order does not necessarily have to be combined with the output 151028.doc.24·201119346 order or display order. The set of consecutive access units can form a coded video sequence that can correspond to a NAL unit bit stream or sub-bit. Streaming Picture Group (GOP) or other independently decodable unit. Like most viewing tfl code standards, H.264/AVC defines error-free bitstreams, syntax, semantics, and decoding processes, either of which follows a certain A specification or level. H.264/AVC does not specify an encoder, but assigns to the encoder to ensure that the resulting bitstream is compliant with the standard for the decoder. In the case of video coding standards, "Fan" corresponds to a subset of algorithms, features or tools and constraints imposed on algorithms, features or tools. For example, as defined by the H.264 standard, "norms" are specified by the 264 standard. A subset of 1 bit. The "level" corresponds to the limit on the resource consumption of the damper (such as decoder memory and calculation). The restrictions are related to the resolution of the image, the bit rate and the mega zone. Block (MB) processing rate. The H.264 standard (for example) confirms that, within the bounds imposed by the syntax of a given specification, the value obtained by the syntax element in the bitstream (such as the designation of the decoded picture) Large variations in the performance of the encoder and decoder are still possible depending on the size. The H.264 standard further confirms that in many applications it is neither practical nor economical to implement a decoder that is capable of handling all hypotheses used in a grammar within a particular specification. Therefore, the H264 standard defines "hierarchy" as a specified set of constraints imposed on the values of the syntax elements in the bitstream. These constraints can be a simple limit on the value. Alternatively, such constraints may take the form of constraints on the arithmetic combination of values (e.g., picture width x picture height\number of pictures decoded per second). The H 264 standard further provides that 'individual implementations can support different levels of each supported specification. 151028.doc -25- 201119346 Compliance with the specification n-like support for all features defined in the specification. For example, as a write code feature, the B picture write code is not supported in the H 264/avc base line specification and is supported in other specifications of H.264/AVC. A decoder that complies with a level should be able to decode any bit stream that does not require resources beyond the limits of the limits in that class. The definition of specifications and levels can help with interpretation capabilities. For example, during video transmission, a pair of specifications and level definitions can negotiate and achieve an overall transmission session. More specifically, in H.264/AVC, the level may define, for example, restrictions on the number of megablocks that need to be processed, the decoded picture buffer (DPB) size, and the coded code. The slice buffer (CPB) size, the vertical motion vector range, the maximum number of motion vectors per two consecutive MBs, and whether the B block can have sub-megablock partitions smaller than 8x8 pixels. In this way, the decoder can determine if the solution is capable of properly decoding the bit stream. A set of parameters typically contains a sequence layer header in the sequence parameter set (sps) that occasionally changes the picture layer header information in the Pixel Picture Parameter Set (PPS). With the set of parameters, this occasional change of information does not need to be repeated for each sequence or picture; therefore, the write efficiency can be improved. In addition, the use of a set of parameters enables out-of-band transmission of header information, thereby avoiding the need for redundant transmissions to achieve error tolerance. In out-of-band transmission, a parameter set NAL unit is transmitted on a different channel than the other speaking unit. The technique of the present invention involves including an extractor in a media extractor trajectory. The extractor of the present invention can refer to another unit in the common file. That is, the rights case may include - a plurality of NALs having a plurality of NALs: 151028.doc -26-201119346 - the second track of the -include-extractor, the extractor identifying the plurality of NALs of the first trace Two or more nal units in a cell. In general, the extractor can act as an indicator such that when the multiplexed (4) encounters the extractor 3, the demultiplexer 38 can fetch the NAL units identified by the extractor from the _trajectory and send the NAL units To video decoding (four). The trajectory including the extractor can be referred to as a media extractor trajectory. The extractor of the present invention may be included in files in accordance with, for example, various file formats of the following: iso basic media file format, expandable video write code (svc) file format, advanced video write code (AVC) file format. Third Generation Partnership Project (3GPP) file format, and/or multi-view video code (Mvc) file format. In general, various tracks of a video file can be used as a switching track. That is, multiplexer 30 can include various trajectories to support various frame rate, display capabilities, and/or decoding capabilities. For example, when a video file conforms to the file format, each track can represent a different MVC operating point. Thus, the demultiplexer 38 can be configured to select one of the trajectories to manipulate the nal unit 7L of the NAL unit identified by the extractor from the selected trajectory from the selected trajectory and discard the data of the other trajectories. That is, when the selected trajectory includes an extractor that references a NAL unit of another trajectory, the demultiplexer 38 can extract the referenced NAL unit while discarding the unreferenced cells of the other trajectories. The multi-XH38 can send the extracted NAL unit to the video decoder. 48 ° By using the extractor in the media extractor track, the techniques of the present invention can be used to achieve time-adjustable capabilities between the various tracks of the video file. In MPEG-1 and MPEG-2, for example, the intrinsic 151028.doc -27· 201119346 time adjustable capability is provided by the (4) code picture. The first track of a video file conforming to MPEG-I or MPEG-2 may include the entire set of I-coded pictures, p-coded pictures, and 8-encoded pictures. The second track of the video file may include one or more extractors that only reference the I-coded picture of the first track and the p-coded picture, omitting references to the B-coded picture. By discarding the B-encoded picture, the video file can achieve a one-and-a-half resolution video representation. MPEG-1 and MPEG_2 also provide a base layer and enhancement layer concept to write two time layers, where the enhancement layer picture can be selected for reference from the base layer or enhancement layer for each prediction direction. As another example, H.264/AVC uses hierarchical B-coded pictures to support time-adjustable capabilities. The first month of the video sequence using H 264/AVC can be called an instant decoder renew (IDR) picture (also known as a key picture). Key pictures are typically coded in regular or irregular time intervals. These key pictures use the previous key picture as a reference to in-frame or inter-frame code for motion compensated prediction. A group of pictures (G〇p) usually includes - a key picture and all pictures that are temporally located between the key picture and the previous key picture. The GOP can be divided into two parts, one for the key picture' and the other for the non-key picture. The non-key picture is hierarchically predicted by two reference pictures, which are pictures of the past and the future with a lower time level. Each picture can be assigned a time identifier value to indicate the hierarchical location of the picture. Therefore, a picture having a time identifier value of up to 1^ can form a video segment having twice the frame rate of the video segment formed by the picture having the time identifier value of N-1. Box rate. Therefore, the technique of the present invention can also be used to include one or more extractors by reference to the 151028.doc -28-201119346, the one or more extractor reference traces having a time identifier value of N-1 Nal unit. The following operations are performed to achieve the time adjustable capability in H.264/AVC: the first track includes all NAL units having a time identifier value of up to N, and the second implementation is as indicated above, the technique of the present invention is applicable To videoconferencing in accordance with at least one of the following: IS〇 Basic Media File Format Expandable Video Recording (SVC) File Format, Advanced Video Recording (AVC) File Format, Third Generation Partnership Program ( 3GPP) file format, and multi-view video code (MVC) file format. The ISO base media file format is designed to contain chronological media information for presentation in a flexible and extensible format that facilitates the interchange, management, editing and presentation of media. 1§〇 The base media file format (ISO/IEC 14496-12:2004) is specified in mpeg_4, Part 12, which defines the general structure of time-based media files. It is used as other file formats in the family (such as the Avc file format (ISO/IEC 14496-15), 3GPP file format, svc file format and MVC block defined to support H.264/MPEG-4 AVC video compression. The basis of the case format). The 3GPP file format and the Mvc file format are extensions of the AVC file format. The IS® base media file format contains the timing, structure, and media information of the chronological sequence of media materials, such as audiovisual presentations. The file structure is object oriented. Files can be broken down very simply into basic objects, and the structure of these objects is implied by their type. Files in accordance with the IS 0 base media standard format are formed into a series of objects called "boxes". The information in the ISO base media file format is contained in these boxes' and no other information exists in the file. This box includes the specific file format 151028.doc •29- 201119346 Any initial signature required. A "box" is an object-oriented building block defined by a _type identifier and a length. Usually, the presentation item is included in a broadcast case and the media appears to be self-contained. The movie container (cinema box) contains media and video relay data, and the audio frame is contained in the media data container and can be attached to other files. The presentation item (action sequence) can be included in several files. 1 The timing and frame (size and size) information is usually in the ISO base media file, and the auxiliary =: qualitatively can use "any format. This presentation item contains The presentation item may be "local" or may be processed. The j-ridge re-flowing delivery machine can have a logical structure, a time structure, and a solid structure, and the L-structure does not need to be face-to-face. The logical structure of the trough case can be a time-parallel trajectory of the set of people's thunder ice also x s trace A right pull main, η ^. The time structure of the file can be that trajectory 3 has a sequence of chronological samples, and the list of matches is mapped to the time table of the overall electricity % # optional gold movie. The sample of the media data of the file itself is separated from the 'Royce's self-documentation... The box required for the time and structure decomposition of the structure is expanded in time. _ The timing of the sample may be confirmed by the film fragment timing _ The indicator can point to the same -I slaughter = where the relationship is located. They are in the middle. (d) or (for example) another slot case referenced by Fox: - the media stream may be contained in an accurate media type dedicated to the media track 'and can be entered by the sample item'. #本项可以包含" (Decoding I5W28.doc -30. 201119346 type required for decoding the stream) and any digitization of the decoder required by the four-character code (for example, "m.... The second "_") can also be used not only for Sung G_4 media but also for the defined sample item format of the media type used by his organization. The support for relay data is usually in two forms. The relay data of the first order can be stored in the proper obstruction, and the fish inter-information data can be synchronized as needed. Second, there may be general support for attaching to a movie or two: = non-chronological relay data. The structure support is general security and, as in the case of the data, the relay data resources are allowed to be stored in the other & In addition, these resources can be named and protected. In the help-based media slot format, the samples are grouped to assign each of the samples in the trace as a member of the -sample group. Samples in the sample group are not required to be contiguous. For example, when the H.264/AVC in the AVC architectural format is presented, the video samples in the time level can be sampled into two groups. The sample group can be represented by the following two data structures: SamPleToGroup box (sbdp) and SampleGr〇upDescr box.
SampleToGroup箱表示樣本至樣本群組之指派。對於每— 樣本群組項可存在SampleGroupDescription箱之一例項以 描述相應群組之性質。 可選中繼資料執跡可用以由每一軌跡具有之「所關心特 性」對每一軌跡加標籤,對於該「所關心特性」而言,其 值了不同於群組之其他成員(例如,其位元率、瑩幕大小 或語言)。轨跡内之一些樣本可具有特殊特性,或可經個 151028.doc •31- 201119346 別地識別。該特性之一實例為同步點(通常為視訊之^江 框)。此等點可藉由每一軌跡中之特殊表來識別。更一般 而口’軌跡樣本之間的相依性之本質亦可使用中繼資料來 用文件證明。中繼資料可如視訊軌跡般結構化為樓案格式 樣本之-序列。此執跡可稱為中繼資料軌跡。每一中繼資 料樣本可結構化為一中繼資料陳述式。存在對應於可能就 相應私案格式樣本或其構成性樣本進行詢問的各種問題之 各種種類陳述式。 备經由串流協定遞送媒體時,可能需要自在檔案中呈現 媒體之方式變換媒體。此情形之一實例為在經 Γρ)傳輪媒體時。在槽案中,例如,視訊之每一訊桓: 、·’貝地健存為檔案格式樣本。在RTP中必須遵守專門針對 :斤使用之編解碼器之封包化規則以將此等訊植置於咖封 匕中_ 4服a可經組態以在運轉時間計算此封包化。 然而,存在對輔助串流伺 — 殊軌跡可置於槽案中。…极稱作-意軌跡之特 示意轨跡含有用於串流伺服器之關於如何針對特定協定 ;媒體軌跡形成封包流的通用指令。因為此等指令之 :獨立於媒體的,所以當引入新編解碼器時,可能不需要 修正伺服器。此外,編竭 ::“要 哭一 „ ^ 1、 1汉、兩科秋體可不知曉争流伺服 —π成對檔案之編輯,稱作 段可用以在將檔案m ( nter)之軟體片 一置於串流伺服器上之前 至檔案。作為一會μ各 〜軌跡添加 RTPO 例’在ΜΡ4稽案格式規範中存在針對 L之所定義的示意轨跡格式。 15I028.doc •32· 201119346 3 0?(3 0??檔案格式)為由第三代合作夥伴計劃(30??)針 對3 G UMTS多媒體服務定義之多媒體容器格式。其通常用 於3G行動電話及具有3G能力之其他器件上,但亦可在某 些2G及4G電話及器件上播放。3GPP檔案格式係基於ISO基 礎媒體樓案格式。在3GPP TS26.244「Transparent end-to- end packet switched streaming service (PSS); 3GPP Hie format (3GP)」中指定最新3GP。3GPP檔案格式將視訊流 儲存為MPEG-4 第2部分或H.263或MPEG-4第10部分 (AVC/H.264)。因為3GPP指定樣本項及模板欄位在ISO基 礎媒體檔案格式中的使用以及定義編解碼器參考之新箱, 所以3GPP允許在ISO基礎媒體檔案.格式(MPEG-4第12部分) 中使用AMR及Η·263編解碼器。對於MPEG-4媒體特定資訊 在3GP檔案中之儲存,3GP規範參考MP4及AVC檔案格 式,MP4及AVC檔案格式亦係基於ISO基礎媒體檔案格 式。MP4及AVC檔案格式規範描述MPEG-4内容在ISO基礎 媒體檔案格式中的使用。 為AVC檔案格式之擴展的SVC檔案格式具有提取器及層 之新結構。提取器為提供關於樣本中之與在另一轨跡中具 有相等解碼時間的視訊寫碼資料之位置及大小之資訊的指 標。此情形允許在寫碼域中直接構建軌跡階層。SVC中之 提取器轨跡鏈接至一或多個基礎軌跡,提取器軌跡在運轉 時間自一或多個基本軌跡提取資料。提取器為具有NAL單 元標頭之可藉由SVC擴展解參考的指標。若用於提取之轨 跡含有不同訊框率下之視訊寫碼資料,則提取器亦含有解 151028.doc -33· 201119346 碼時間偏移以確保軌跡之間的同步。在運轉時間,在流傳 遞至視訊解碼器之前,提取器必須由其指向之資料替換。 因為SVC中之提取器軌跡類似於視訊寫碼執跡而結構 化,所以SVC中之提取器軌跡可表示其以不同方式需要的 子集。SVC提取器軌跡僅含有關於如何自另一軌跡提取資 料之指令。在SVC檔案格式中,亦存在彙總工具,其可將 樣本内之NAL單元彙總在一起作為一 NAL單元,包括將一 層中之NAL單元彙總至一彙總工具中。SVC中之提取器經 設計以自以下各者提取某一範圍之位元組:樣本或彙總工 具,或僅一整個NAL單元而非多個NAL單元,尤其是樣本 中之不連續的NAL單元。在SVC檔案格式中,可存在許多 視訊操作點。層經設計以對操作點之一或多個軌跡中的樣 本進行分組。 MVC檔案格式亦支援提取器軌跡,該提取器軌跡自不同 視圖提取NAL單元以形成一操作點,該操作點為採用某一 訊框率之視圖的子集。MVC提取器執跡之設計類似於SVC 檔案格式中之提取器。然而,並不支援使用MVC提取器執 跡來形成交替群組。為了支援轨跡選擇,向以下MPEG提 案建議 MPEG:P. Frojdh、A. Norkin 及 C. Priddle「File format sub-track selection and switching」 {JS0/1EC JTC1/SC29/WG11 MPEG M16665,英國,倫敦)。此提案試 圖致能子軌跡等級中之交替/切換群組概念。 映射樣本群組為樣本群組之擴展。在映射樣本群組中, 每一群組項(樣本之群組項)具有其「groupID」之描述,其 151028.doc -34- 201119346 實際上為在可能將一視圖中之NAL單元彙總為一NAL單元 之後的至vieW_id之映射。換言之,每一樣本群組項具有 其含有之在ScalableNALUMapEntry值中列出的視圖。此樣 本群組項之gr〇uping_type為rscnm」。 漸進式下載為用以描述數位媒體槽案通常使用Ηττρ協 疋自伺服器至用戶端之轉移的術語。當自電腦起始時,消 費者在下載完成之前可開始播放媒體。串流媒體與漸進式 下載之間的關鍵差異在於正存取數位媒體之終端使用者器 件如何接收並儲存數位媒體資料。具有漸進式下載播放能 力之媒體播放器依賴位於當自web伺服器下載數位媒體檔 案時完整檔案之標頭中的中繼資料及數位媒體檔案之本端 緩衝益。在指定量之資料可用於本端播放器件之時刻,將 開始播放媒體。此指定量之緩衝藉由編碼器設定中之内容 的生產者嵌入於檔案中,並由藉由媒體播放器強加之額外 緩衝器設定進行加強。 在3GPP中,對於3GP檔案而言,支援HTTp/Tcp/Ip傳送 以用於下載及漸進式下載。此外,將HTTp用於視訊串流 具有一些優點,且基於Ηττρ之視訊串流服務正變得愈加 風仃。HTTP串流之—些優點包括:可使用現有網際網路 組件及協定’使得不需要新努力來開發經由網路傳送視訊 =料的新技術。(例如)RTP有效負載格式之其他傳送協定 需:中間網路器件(例如’中間箱)以知曉媒體格式及傳訊 背尽。又’ HTTP串流可經用戶端驅動,#避免許多控制 問題。舉例而言’ $ 了利用所有特徵以獲得最佳效能,伺 15I028.doc •35· 201119346 服器可追蹤尚未確認之封包的大小及内容《伺服器亦可分 析檔案結構並重建用戶端緩衝器之狀態以做出RD最佳切 換/精簡決策。此外’可滿足對位元流變化之約束以便符 合經協商之規範。HTTP在實施HTTP 1 · 1之Web伺服器處不 必需要新硬體或軟體實施。HTTP串流亦提供TCP親和性及 防火牆遍歷。本發明之技術可(例如)藉由提供位元率適應 來改良視訊資料之HTTP串流以克服關於頻寬之問題。 諸如 ITU-T H.261、H.262、H.263、MPEG-1、MPEG-2 及H.264/MPEG-4第10部分之視訊壓縮標準利用經動作補 償之時間預測來減小時間冗餘。編碼器使用來自一些先前 經編碼之圖片(本文中亦稱為訊框)之經動作補償之預測, 以根據動作向量來預測當前經寫碼之圖片。在典型視訊寫 碼中存在二種主要圖片類型。其為經框内寫碼之圖片(「【 圖片」或「I訊框」)、所預測之圖片(「p圖片」或「p訊 框」),及經雙向預測之圖片(「B圖片」或「B訊框」)/p 圖片之區塊可經框内寫㉙,或參看—其他圖片進行預測。 在B圖片中’區塊可自一或兩個參考圖片進行預測,或可 經框内寫碼。此等參考圖片可按時間次序定位於當前圖片 之前或之後。 很撅H.264寫 ..,如準,作為一實例,B圖片使用先前^ 碼之參考圖片的兩個清單(清單Q及清單…此等兩個、; 可各自含有採料間次序之過去及/或未來之經寫; 片。可以以下若干方式中之一者來預測Β圖片中之區3 自“°參考圖片之經動作補償之預測,自清Β參考丨 151028.doc -36 - 201119346 之經動作補償的預測,或自清 片兩者之組合的經動作補償丨圖片及清單1參考圖 圖片及清單!參考圖片兩者之組Hi為了獲得清單〇參考 、主°。1 A '' 刀別自清單0來考円y =早1參考圖片獲得兩個經動作補償之參考區域L片 合將用以預測當前區塊。 域其組 ,小視訊區塊可提供較佳解析度,且可用於定位包^ 等級細節之視訊訊框。—妒而‘ _ ^ 為:區塊之各種分區視為視訊區塊。此外,可將切片視為 複數個視机區塊,諸如巨型區塊及/或子區塊。每 可為視訊簡之-可獨立解碼之單元4者訊框自身可 為可解碼之| ’或訊框之其他部分可定義為可解碼之單 元。術語「經寫碼之單元」《「寫碼單元」可指代視訊訊 框之任何可獨立解碼之單元,諸如整個訊框、訊框之切 片、亦稱為序列之圖片群組(G〇P),或根據適用寫碼技術 定義之另一可獨立解碼之單元。 術語巨型區塊指代用於根據包含16χ 16像素之二維像素 陣列編碼圖片及/或視訊資料的資料結構ε每一像素包含 一色度分量及一照度分量。因此,巨型區塊可定義各自包 含一為8x8像素之二維陣列的四個照度區塊、各自包含一 為丨6><16像素之二維陣列的兩個色度區塊,及一包含諸如 以下各者之語法資訊的標頭:經寫碼之區塊型樣(CBP)、 編碼模式(例如,框内⑴編碼模式或框間(Ρ或Β)編碼模 式)、經框内編碼之區塊之分區的分區大小(例如, 16x16、16x8、8x16、8x8、8Μ' 4x8或 4x4),或經框間編 151028.doc -37· 201119346 碼之巨型區塊的一或多個動作向量。 視訊編碼器28、視訊解❹48、音訊編碼㈣、音訊解 碼器私、多工器30及解多工器38在適用時各自可實施為多 種合適編碼器或解碼器電路中的任一者,諸如一或多個微 處理器、數位信號處理器()、特殊應用積體電路 (ASK:)、場可程式化閘陣列(FpGA)、離散邏輯電路、軟 體:硬體、勤體,或其任—組合。視訊編碼器似視訊解 馬态48中之母一者可包括於一或多個編碼器或解碼器中, - 者可整口為絰組合之視訊編碼器/解碼器(codec) 之部分。同樣,音訊編碼器26及音訊解碼器^中之每一者 可包括於一或多個編碼器或解碼器中,其任一者可整人為 經組合之音訊編碼器/解碼器(c〇DEC)的部分。包括Μ 編碼器28、視訊解碼㈣、音訊編碼㈣、音訊解碼器 46、多工器30及/或解多工器以之裝置可包含一積體電 路、一微處理器’及/或-諸如蜂巢式電話之無線通信器 件0 根據本發明之技術,多工器3〇可將魏單元組合為遵照 ISO基礎媒體槽案格式或其衍生格式⑼士。,svc ' ^或3咖)之視訊檔案的執跡,且包括—媒體提取器軌 亦’该媒體提取器軌跡識別另一軌跡之一或多個潛在不連 續NAL單元並將視訊權案傳遞至輸出介面32。輸出介面η 可包含(例如)傳輸器、收發器、用於將資料寫入至電腦可 讀媒體之器件(諸如’光碟機、磁性媒體驅動器(例如,軟 碟機)、通用串行匯流排(USB)、網路介面),或其他輸出 151028.doc -38- 201119346 "面輪4介®32將NAL單it或存取單元輸 媒體34(例如,锉4你认 铷卬主%驷了凟 磁性 #輪信號或載波之暫態媒體),或諸如 體。…先學媒體、記憶體或隨身碟之電腦可讀儲存媒 勺H介面36自電腦可讀媒體34操取資料。輸人介面36可 如)光碟機、磁性媒體驅動器、職埠、接收器、 單元;;或其他電腦可讀媒體介面。輸入介面36可將NAL 早兀或存取單元提供至解 流或程式流解夕工^ 多工器38可將傳送 夕二冓成性PES流,解封包化該等PES >·* 以操取經编碼之資料,且視經_之_(例二ES: 所指示係音訊流或是視訊淹的二: 多工器π/㈣送至音訊解碼器46或視訊解碼器48。解 $初可選擇包括於所接收視訊檔案中之執跡中的 ^接著制所選㈣跡之^枝其他㈣的由 擇執跡之提取H參考之㈣傳遞至視鱗碼器48,丢棄1 他執跡之並未由所選擇軌跡之提取器參 碼器⑽碼經編碼之音訊資料,並將經解碼之= :至::輸出42’而視訊解…解碼經編碼之IS = 流之複數個視圖—視_ 發送視續出44。視訊輸出44可包含—使用 數個視圖的顯示器,例如’同時呈現一場景之= 戴眼鏡式立體顯示器或眼式立體顯示器。 見⑽ 圖2為說明多工薄1λ 在圖2件之實例配置的方塊圖。 益匕括流管理單元6〇、視訊輸入 15I028.doc •39· 201119346 介面80、音訊輸入介面82、多工流輸出介面料,及程式特 定資訊表88 ^流管理單元60包括NAL單元建構器62、流識 別符(流ID)查找單元66、軌跡產生單元64及提取器產生單 元68。 在圖2之實例中,視訊輸入介面8〇及音訊輸入介面82包 括用於自經編碼之視訊資料及經編碼之音訊資料形成 單元的各別封包化器。在其他實例中,視訊及/或音訊封 包化器可呈現為在多工器3〇外部。關於圖2之實例,視訊 輸入介面80可自接收自視訊編碼器28之經編碼之視訊資料 开y成PES封包,且音訊輸入介面82可自接收自音訊編碼器 26之經編碼之音訊資料形成pES封包。 在NAL單元建構器62建構NAL單元之後,NAL單元建構 器62將NAL單元發送至軌跡產生單元64。轨跡產生單元6斗 接收NAL單元,並將包括NAL單元之視訊樓案組合為視訊 檔案的一或多個軌跡。執跡產生單元64可進一步執行提取 益產生單凡68以產生藉由軌跡產生單元64建構之一或多個 媒體提取器軌跡的提取器。當判定—或多個說單元屬於 多個軌跡而非在執跡之間重複NAL單元時,提取器產生單 兀68可建構參考NAL單元之軌跡的提取器。以此方式,多 工器30可避免資料在軌跡之間的重複,此可減小傳輸視訊 檔案時的頻寬消耗。 下文論述提取器之資料結構及組件之各種實你j。-般而 5,提取器可包括:_軌跡識別符值,其參考其中包括經 參考之NAL單元的轨跡;及—或多個肌單元識別符,丈 151028.doc 201119346 識別由提取器參考之NAL單元。在—此_ 二貫例中,nali开 識別符可參考由對應於經識別之nat A ^ [早凡的軌跡識別符值 參考之軌跡中的位元或位元組範圍。 » - ^ . 在—些實例中,NAL· 早7L識別符可個別地參考由提取器 > W之母—NAL單元, (例如)以便識別不連續NAL單元。a _ * 6 _ 上貧例中,NAL單 兀識別符可基於自媒體提取器軌跡中 甲之k取器之時間或空 間位置的偏移來參考NAL單元。 ^產生單元M在—些實例中可包括媒體提取器軌跡中 ^卜隐單元。亦即,媒體提取器軌跡可包括祖單元 及提取器兩者。因此,在一此實例 诸槐、 _霄例中,軌跡產生單元64可 建構一視訊檔案’該視訊檔案且有 /、頁僅包括NAL·單元之第 :軌跡及一包括一或多個提取器之第二軌跡,該一或多個 ^取益參考第-軌跡之祖單元的全部或子集。此外,在 些實例中,執跡產生單元64可包 _ 弟—執跡中之並未包 祜於第一軌跡中的額外NAL單元。 』樣本發明之技術可 擴展至複數個轨跡。舉例而言,軌跡產生單元64可建構一 可參考第-軌跡之NAL單元及/或第二軌跡之nal單元的第 2跡,且可另外包括並未包括於第—軌跡或第二 的NAL單元。 圖3為說明一實例稽案1〇〇之方塊圖,該實例播案⑽包 括-具有視訊樣本之—集合的第—軌跡及—具有參考第— 軌跡之視讯樣本子集的提取器之第二執跡。在圖3之實例 中’擋案100包括MOOV|gl()2及媒體資料(鹽7)箱11〇。 ▽相102對應於電影箱,其由助基礎媒體檔案格式定 I51028.doc •41 · 201119346 義為容器箱’該容器箱之子箱定義用於呈現之中繼資料。 ΜΙ^ΑΤ箱1()4對應於媒體資料箱,其由助基礎媒體樓案格 式疋義為可保持用於呈現之實際資料的箱。 在圖3之實例中,M00V箱1〇2包括完整子集軌跡1〇4及 媒體提取器軌跡1〇6。IS〇基礎媒體檔案格式將「軌跡」定 義為ISO基礎媒體權案中之相關樣本之按時間順序的序 列。ISO基礎媒體檔案格式進—步指丨,對於媒體資料而 言’軌跡對應於影像或經取樣音訊之序列。 在圖3之實例中,MDAT箱u〇包括經j編碼之樣本η〕、 經P編碼之樣本114、經B編碼之樣本116及經B編碼的樣本 118。將經B編碼之樣本116及經B編碼的樣本118視為處於 不同階層編碼等級。在圖3之實例中,經B編碼之樣本ιι6 可用作對經B編碼之樣本118的參考,且因此經3編碼之樣 本11 8可係處於低於經B編碼的樣本丨16之階層編碼等級的 階層編碼等級。樣本之顯示次序可不同於階層次序(亦稱 為解碼次序)及樣本包括於MDAT箱丨丨〇中的次序。舉例而 5,經1編碼之樣本112可具有為〇之顯示次序值及為0的解 馬-人序值’經p編碼之樣本i i 4可具有為2之顯示次序值及 為1的解碼次序值,經B編碼之樣本116可具有為1之顯示次 序值及為2的解碼次序值,且經B編碼的樣本118可具有為4 之顯示次序值及為3的解碼次序值。軌跡1可包括額外樣本, 例如’具有為3之顯示次序值及為4的解碼次序值之樣本。 經1編踢之樣本U2、經P編碼之樣本U4、經B編碼之樣 本116及經B編碼的樣本丨18中之每一者可對應於各種nal 151028.doc -42· 201119346 單元或存取單元。;TSO基礎媒體檔案格式將「樣本」定義 為與單一時戳相關聯之所有資料,例如,視訊之個別訊 框、按解碼次序之一系列視訊訊框,或音訊之按解碼次序 的經壓縮區段。在圖3之實例中,完整子集軌跡1〇4包括參 考經I編碼之樣本π 2、經P編碼之樣本1 14、經B編碼之樣 本116及經B編碼的樣本11 8之中繼資料。 MDAT箱11〇進一步包括提取器12〇、提取器122及提取 器124。因此,提取器12〇至124包括於一電影資料箱中, 其通*將包括資料樣本。在圖3之實例中,提取器12〇參考 經I編碼之樣本112,提取器122參考經P編碼之樣本114, 且提取器124參考經B編碼的樣本丨18。可能存在對應於經工 編碼之樣本1 12、經P編碼之樣本丨14及/或經B編碼之樣本 118的兩個或兩個以上NAL單元,且該單元可能為 不連續的。根據本發明之技術,儘管在相應樣本中可能存 在兩個或兩個以上不連續NAL單元,但提取器12〇至124仍 可識別相應樣本之NAL單元中的每一者。在圖3之實例 中,媒體提取器軌跡106包括參考提取器12〇、提取器122 及提取器124的中繼資料。 提取器120至124中之每一者亦可包括顯示次序值及解碼 次序值。舉例而言,提取器120可具有為〇之顯示次序值及 為〇之解碼次序值,提取器122可具有為丨之顯示次序值及 為1的解碼次序值,且提取器124可具有為2之顯示次序值 及為2之解碼次序值。在_些實例中’顯示值及/或解碼值 可跳過某些值’(例如)以與所識別之樣本的值匹配。 151028.doc -43· 201119346 完整子集軌跡104及媒體提取器執跡i〇6可形成交替群 組’使%解多工器38(圖1)可選擇完整子集轨跡或媒體 提取器軌跡106以由視訊解碼器48進行解碼。關於MVC之 貫例’元整子集軌跡10 4可對應於第一操作點,且媒體提 取器軌跡106可對應於第二操作點。關於3(}pp之實例,完 整子集軌跡104及媒體提取器執跡106可形成切換群組。以 此方式,完整子集軌跡104及媒體提取器軌跡1〇6(例如)在 HTTP串流應用中可用以調適頻寬可用性及解碼器能力。 當選擇完整子集軌跡104時,解多工器38可將對應於完 整子集轨跡104之樣本(例如,經〗編碼之樣本丨12、經p編 碼之樣本114、經B編碼之樣本116及經B編碼的樣本118)發 送至視訊解碼器48。當選擇媒體提取器軌跡1〇6時,解多 工器38可將對應於媒體提取器軌跡1〇6之樣本(包括由對應 於媒體提取器軌跡106之媒體提取器識別的樣本)發送至視 訊解碼器48。因此’當選擇媒體提取器軌跡106時,解多 工态38可將經丨編碼之樣本112、經p編碼之樣本η*及經b 編碼的樣本118發送至視訊解碼器48,解多工器38可藉由 1提取H12G、提取器122及提取器124進行解參考而自完 集軌跡104擷取經I編碼之樣本丨〗2、經p編碼之樣本 114及經B編碼的樣本118。 圖4為說明包括兩個相異提取器轨跡i46、148之另—實 例私案140的方塊圖。雖然在圖4之實例中說明兩個提取器 軌跡’但-般而言一檔案可包括任何數目個提取器執跡。 圖4之實例中’檔案14〇包括^〇〇乂箱142及^〇八丁箱 J51028.doc • 44 * 201119346 150。MOOV箱142包括完整子集軌跡144及媒體提取器軌 跡146、148。MDAT箱150包括各種軌跡之資料樣本及提 取器,例如’經】編碼之樣本152、經Ρ編碼之樣本154、經 Β編碼的樣本I%、經β編碼的樣本158及提取器160至168。 在圖4之實例中’提取器160至164對應於媒體提取器軌 跡146 ’而提取器166至168對應於媒體提取器轨跡148。在 此實例中’媒體提取器軌跡146之提取器1 6〇識別經I編碼 之樣本152 ’提取器ι62識別經ρ編碼之樣本154,且提取器 164識別經Β編碼的樣本156。在此實例中,提取器166識別 經I編碼之樣本152,而提取器162識別經ρ編碼之樣本 154。圖4之實例示範一其中各種媒體提取器軌跡之兩個或 兩個以上提取器參考完整子集執跡之同一樣本的實例。 媒體提取器軌跡可用α表示為可解碼之視訊流之時間子 集及含有原始全時間解析度位元流之軌跡的交替/切換軌 跡(例如,完整子集執跡144)。完整子集軌跡144可(例如) 表示30個訊框/秒(Fps)之視訊流。在一些實例中,藉由在 子位元流中不包括某—階層等級之β圖片,該子位^流之 純率可減半或減小某—其他分率。舉例而言,媒體提取 器軌亦1 46藉由不包括經B編碼之樣本1 5 8而可具有相對於 完整子集軌跡144經減半的訊框率。舉例而言,媒體提取 器軌跡Μ可具有為15FPS之訊框率。同#,媒體提取巧 軌跡148藉由省略經B編碼之樣本156及經匕編碼之樣本158 而了具有-相對於媒體提取器軌跡146經減半的訊框率, 且因此具有為7·5 FPS之訊框率。 15l028.doc •45- 201119346 圖5為說明包括一子集軌跡188及兩個媒體提取器轨跡 184、186之另一實例檔案180的方塊圖。檔案ι8〇之M〇〇v 箱182包括子集軌跡188、媒體提取器轨跡184、186,而 MDAT箱190包括經I編碼之樣本192、經Ρ編碼之樣本i 94、 經B編碼之樣本202、經B編碼的樣本208,及提取器198、 20t)、204、206及 210。 如上文所論述’媒體提取器軌跡可包括參考另一軌跡之 樣本的提取器。此外,媒體提取器轨跡可進一步包括並未 包括於另一軌跡中之額外視訊樣本。在圖5之實例中,子 集軌跡1 88包括經I編碼之樣本192及經P編碼之樣本丨94。 媒體提取器軌跡186包括提取器198、200,且另外包括經B 編碼之樣本202。類似地,媒體提取器轨跡} 84包括提取器 204、206、210 ’且另外包括經匕編碼的樣本2〇8。 在圖5之貫例中,媒體提取器執跡1 8 ό包括視訊資料之經 編碼之樣本(經Β編碼之樣本202),且媒體提取器軌跡184 包括提取器210,其參考包括經編碼樣本的媒體提取器軌 跡丨86之樣本β亦即,在圖5之實例中’提取器2ι〇參考經Β 編碼之樣本202。因此,媒體提取器執跡184可表示全時間 解析度位元流,而媒體提取器軌跡186及子集執跡188可表 示全時間解析度位元流的子集。亦即,媒體提取器軌跡 186及子集軌跡188可具有低於由媒體提取器軌跡“A表示 之全時間解析度的時間解析度(例如,較低訊框率)。 根據本發明之技術,H.264/AVC檔案格式可經修改以包 括提取器軌跡’其可經提取而作為軌跡之含有原始全時間 15I028.doc -46· 201119346 解析度位元流的任何相容時間子集。對於支援階層B(或p) 圖片寫碼之H.264/AVC而言,假設存在N個時間等級,包 括自時間等級0至k(k<N)之樣本的每一子位元流可藉由定 義相應提取器軌跡來提取。因此,對於同一視訊而言,可 存在形成父替/切換群組之N個轨跡(包括N-1個提取器執 跡)。提取器可與對應於由提取器識別之樣本之時間階層 4級的時間階層等級相關聯。舉例而言,亦可在提取器中 傳訊指定樣本之時間等級的時間識別符值。 圖6A至圖6C為說明一檔案之mdAT箱220之實例的方塊 圖,該MDAT箱220包括各種媒體提取器軌跡之媒體提取 器的貫例。圖6A至圖6C中之每一者描繪:錯定樣本222, 其包括視圖0樣本224A、視圖2樣本226A、視圖1樣本 228A、視圖4樣本230A及視圖3樣本232A ;及非錫定樣本 223,其包括視圖〇樣本224B、視圖2樣本226B、視圖1樣 本228B、視圖4樣本230B及視圖3樣本232B。非錨定樣本 223旁邊之省略號指不,額外樣本可包括於MDAT箱22〇 中。錨定樣本及非錨定樣本中之每一者可共同形成檔案之 第一軌跡。在一實例中,根據本發明之技術,描繪於圖6a 至圖6C中之檔案之提取器之每一集合的媒體提取器軌跡可 對應於遵照MVC檔案格式之視訊檔案的獨立操作點。以此 方式,本發明之技術可用以產生對應於遵照MVC檔案格式 之視訊檔案之操作點的一或多個媒體提取器轨跡。 圖6A至圖6C描繪各種媒體提取器軌跡之提取器24〇、 244、250,其中提取器24〇、244、25〇將各自包括於 151028.doc -47- 201119346 MDAT箱220中,但出於清晰性目的而未以獨立圖進行說 明。亦即,在進行充分組合時,MDAT箱22〇可包括提取 器240、244、250之每一集合。 圖6A至圖6C提供一檔案之實例,該檔案包括含有媒體 提取器以及真實視訊樣本的軌跡。各種樣本根據不同時間 等級可獨立地含於不同執跡中。對於每一時間等級而言, 特定軌跡可含有所有視訊樣本以及對具有較低時間等級之 軌跡的提取器。可將視訊樣本(NAL單元)分離成不同轨 跡’而具有較高訊框率之軌跡可具有指向其他軌跡的提取 器。以此方式,有可能具有含有具僅一時間等級之樣本之 電影片段’且一電影片段可能含有指向其他片段的提取 器。在此狀況下,不同軌跡但同一時間週期之電影片段可 以時間等級之遞增次序交錯。 圖6A提供包括對應於媒體提取器軌跡之提取器242a至 242N之提取器240的實例。在此實例中,提取器242a參考 錨定樣本222之兩個視圖〇樣本224A。提取器242n參考非 錨定樣本223之視圖〇樣本224B。一般而言,在圖6A之實 例中,提取器集合240之提取器參考相應視圖。樣本。提取 器242A至242N中之每—者對應於共同媒體提取器轨跡,該 等軌跡可屬於切換群組及/或交替群組。媒體提取器軌跡 可進-步對應於個別操作點,例如,包括視圖。之操作點。 在一些實财,對於使用MVC寫碼之立體視訊而言,可 存在三個操作點,包括支援輸出兩個視圖之一操作點及一 支援輸出僅一視圖(例如,僅視圖〇或視圖1}的第二操作 151028.doc -48- 201119346 2。第三操作點可為輸出視圖1的操作點。視預測關係而 定第—“作點可包括僅視圖1中之VCL· NAL單元及相關 聯之非VCL NAL單元、視圖〇及視圖}之所有單元,或 視圖1之nal單兀以及錫定NAL單元(亦# m見圖組件 之NAL單元)。在此立體狀況下,所揭示技術之實例可提 供,其他兩個操作點可由兩個提取器執跡來表示。此等兩 個提取器軌跡可形成切換群組,且與原始視訊軌跡一起, 此等三個執跡可形成交替群組。 本毛月提供用於修改MVC檔案格式以包括MVC媒體提 取器軌跡的技術。-般而言,具有相同數目個輸出視圖的 包括MVC媒體提取器軌跡之Mvc視訊軌跡可特徵化為切 換群組。由檔案之軌跡表示之所有操作點可屬於mvc視訊 呈現之一交替群組。錨定樣本222及非錨定樣本223中之每 一者的視圖可形成完整子集軌跡,例如,包括所有可用視 圖的操作點。 (例如)如關於圖6B中之提取器246A至 246N所展示,提 取器可參考樣本之連續部分。在圖6β之實例+,提取器 246A參考視圖〇樣本224A並參考視圖2樣本226a。表示提 取器246A之資料結構可指定所識別視圖之位元組範圍、起 始視圖及結束視圖、起始視圖及後續視圖之數目,或由提 取器識別之-系列連續視圖的其他表*。提取器集合⑽ 可對應於另-媒體提取器軌跡,該另—媒體提取器軌跡又 可對應於獨立MVC操作點。 舉例而言 如關於圖6C中之提取器254A、256A所展 151028.doc -49- 201119346 示,兩個提取器亦可參考一樣本的兩個部分(例如,兩個 不連續視圖)^舉例而言,提取器樣本252A包括參考視圖〇 樣本224A及視圖2樣本226A之提取器254A,以及參考視圖 4樣本230A的提取器254B ^因此,由提取器樣本252A表示 之樣本可對應於參考不連續視圖樣本的提取器樣本。類似 地,在圖6C之實例中,提取器樣本252N包括參考視圖〇樣 本224B及視圖2樣本226B之提取器256A,以及參考視圖4 樣本230B的提取器256B。 亦可關於錨定或非錨定樣本來定義提取器,其中關於錨 定樣本定義之提取器與關於非錨定樣本定義之提取器可參 考不同視圖。 ISO基礎媒體檔案格式或Mvc檔案格式之上文所提之 MVC媒體提取器軌跡可為中繼資料軌跡之例項,該中繼資 料軌跡可經實施而具有類似提取功能性,且可用以表示正 常視訊軌跡的交替及/或切換軌跡。 在使用MVC檔案格式之實例中,全位元流可含於—軌跡 中,且所有其他可能操作點可由提取器軌跡來表示,該等 提取器執跡中之每-者可為信號,例如供輸出的視圖之數 目、供輸出的視圖之視圖識別符值、傳輸所需要之頻寬, 及訊框率。 圖7為說明實例MVC預測型式之概念圖。在圖7之實例 中,說明八個視圖(具有視圖ID rs〇」至「S7」),且說明 每-視圖之12個時間位置(「τ〇」至「Tu」)。亦即,。圖7 中之每-列對應於-視圖’而每一行指示時間位置。 151028.doc •50- 201119346 雖以M VC具有彳由H.264/AVC解碼器解碼之所謂基礎視 圖,且立體視圖對亦可由MVC支援,但MVC之優點為, 其可支援使用兩個以上視圖作為3D視訊輸入並解碼由多個 視圖表示之此3D視訊的實例。具有Mvc解碼器之用戶端 之呈現器(renderer)可期待具有多個視圖之3〇視訊内容。 視圖中之錫定視圖組件及非錫定視圖組件可具有不同視圖 相依性。舉例而言,視圖S2中之銘定視圖組件視視_中 之視圖組件而疋。然而,視圖S2中之非銷定視圖組件並非 視其他視圖中之視圖組件而定。 /在圖7中使用包括字母之陰影方塊來指示每一列及每一 仃的圖7中之訊框’從而指定相應訊框係經框内寫碎(亦 即,I訊框),或是在—方向上經框間寫碼(亦即,為p訊 框)’或是在多個方向上經框間寫碼(亦即,為㈣框)。一 般而言,由箭頭來指示預測,其中箭頭指向之訊框使用箭 頭出發之物件用於預測參考。舉例而言自視圖$◦之時間 位置T0處的!訊桓預測視圖S2之時間位置τ〇處的p訊框。 如同早-視圖視訊編碼,多視圖視訊寫碼視訊序列之訊 框可相對於不同時間位置處之訊框經預測性編碼。舉例而 視圖SO之時間位置T1處的b訊框具有一自視圖训之時 間位置το處的m框指向其之箭頭’從而指示自㈣框預測 b Sfl框°然而’另外’在多視圖視訊編碼之情形下,可視 圖間地預測訊框。亦即,視圖組件可使用其他視圖中之視 圖組件用於參考。在Mvc中,例如,如同另一視圖中之視 圖、,·件為預測間參考_般實現視圖間預測。潛在視圖間參 151028.doc -51 - 201119346 考在序列參數集合(SPS)MVC擴展中傳訊,且可藉由參考 圖片清單建構過程來修改,其致能預測間或視圖間預測參 考的靈活排序。以下表1提供MVC擴展序列參數集合的實 例定以。 表1 seq_parameter_set_mvc_extension() { c 描述符 num—views 一minus 1 0 ue(v) for( i = 0; i <= num_views_minusl; i++) view—id[ i ] 0 ue(v) for( i = 1; i <= num一views—minusl; i++ ) { num_anchor_refs_10[ i ] 0 ue(v) for( j = 0; j < num_anchor一refs—10[ i ]; j++) anchor_ref_10[ i ][ j ] 0 ue(v) num一anchor一refs一11 [ i ] 0 ue(v) for( j = 0; j < num—anchor—refs—11 [ i ]; j++) anchor_ref_ll[ i ][ j ] 0 ue(v) } for( i = 1; i <= num_views_minusl; i++ ) { num_non_anchor_refs_10[ i ] 0 ue(v) for( j = 0; j < num一non_anchor—refs_10[ i ]; j++ ) non一anchor一ref_10[ i ][ j ] 0 ue(v) num_non_anchor_refs_ll[ i ] 0 ue(v) for( j = 0; j < num一non_anchor一refs_ll [ i ]; j++ ) non一anchor_ref_l 1 [ i ] [ j ] 0 ue(v) } num—Ievel_values一 signalled一minus 1 0 ue(v) for(i = 0; i<= num_level_values_signalled_minusl; i++) { level_idc[ i ] 0 u(8) num—applicable—ops_minusl[ i ] 0 ue(v) for( j = 0; j <= num_applicable_ops_minusl[ i ]; j++ ) { applicable_op—temporal—id[ i ][j ] 0 u(3) applicable_op一num—target一views_minusl[ i ][j ] 0 ue(v) for( k = 0; k <= appIicable_op_num_target_views_minusl[i][j];k++) applicable_op_target_view_id[ i ][ j ][ k ] 0 ue(v) applicable一op一num_views_minusl [ i ][j ] 0 ue(v) } } } -52- 151028.doc 201119346 圖7提供視圖間預測之各種實例。在圖7之實例中,視圖 S1之訊框說明為係自視圖“之不同時間位置處的訊框來預 測,以及自視圖S0及S2在同一時間位置處的訊框中之訊框 經視圖間預測。舉例而言,視圖s 1在時間位置T丨處之b訊 框係自視圖S 1在時間位置T0及T2處之B訊框以及視圖8〇及 S2在時間位置T1處之b訊框中的每一者進行預測。 在圖7之實例中,大寫字母r B」及小寫字母「b」意欲 指示訊框之間的不同階層關係而非不同編碼方法。一般而 言’大寫字母「B」訊框相較於小寫字母rb」訊框預測階 層相對較高。亦即,在圖7之實例中,參看「B」訊框來編 碼「b」訊框。可添加額外階層等級,從而具有可參考圖7 之「b」訊框的額外經雙向編碼之訊框。圖7亦使用不同等 級之陰影來說明預測階層之變化,其中較大量之陰影(亦 即,相對較深)訊框預測階層高於具有較少陰影(亦即,相 對較庚)的彼等訊框。舉例而言,由全陰影來說明圖7中之 所有I Λ框而Ρ汛框具有稍淺之陰影,且Β訊框(及小寫字 母b訊框)相對於彼此具有各種等級之陰影,但始終淺於ρ 訊框及I訊框之陰影。 般而5,預測階層與視圖次序索引有關,纟關係在於 預測1¾層4目對較高之訊框應在解碼階層相對較低之訊框之 則進行解碼,使得階層相對較高之彼等訊框在解碼階層相 對車乂低之期間可用作參考訊框。視圖次序索引為指示 存取早兀中之視圖組件之解碼次序的索引。如在 H 4/AVC之附錄H(MVC修正)中所指定,視圖次序索引 151028.doc -53· 201119346 隱含於SPS MVC擴展中。在SPS中,對於每一索引z•而士, 傳訊相應view一id。視圖組件之解碼應遵循視圖次序索引 之升序。若呈現所有視圖,則視圖次序索引係處於自〇至 num_views_minus_l的連續次序。 以此方式,用作參考訊框之訊框可在解碼參考參考訊框 編碼之訊框之前進行解碼。視圖次序索引為指示存取單元 中之視圖組件之解碼次序的索引。對於每一視圖次序索引 z而言,傳訊相應view一id。視圖組件之解碼遵循視圖次序 索引之升序。若呈現所有視圖,則視圖次序索引之集合包 含一自零至比視圖之全數目小1的連續排序之集合。 對於處於相等階層等級之某些訊框而言,解碼次序相對 於彼此可能並不重要。舉例而言,視圖S0在時間位置τ〇處 之I訊框用作視圖S2在時間位置T0處之p訊框的參考訊框, 視圖S2在時間位置T0處之p訊框又用作視圖S4在時間位置 το處之p訊框的參考訊框。因此,視圖s〇在時間位置τ〇處 之I訊框應在視圖S2在時間位置T0處的p訊框之前經解碼, 視圖S2在時間位置Τ0處之ρ訊框應在視圖S4在時間位置丁〇 處的P訊框之前經解碼。然而,在視圖s丨與S3之間,解碼 次序並不重要,此係因為視圖81及83並不依賴於彼此以進 行預測,而僅係自預測階層較高之視圖進行預測。此外, 視圖S1可在視圖S4之前經解碼,只要視圖S1在視圖8〇及幻 之後解碼即可。 以此方式’階層排序可用以描述視圖8〇至S7。使記法 SA>SB意謂視圖SA應在視圖沾之前解碼。在圖7之實例 】5】028.doc -54· 201119346 S0>S2>S4>S6>S7。又,關於圖7之實 ’ S2>S3 , S4>S3 , S4>S5 ,且S6>S5 。 中,使用此記法, 例,S0>S1 , S2>S1 視圖之並不違反此等要求之任—解碼次序為可能的。因 此’在僅具有某些限制之情況下,許多不同解碼次序為可 能的。下文呈現兩個實例解碼次序,但應理解,許多其他 解碼次序為可能的。在說明於下表2中之一實例中,儘可 能快地解碼視圖。 表2 視圖ID so S1 S2 S3 S4 S5 S6 視圖次序索引 0 2 1 4 3 6 5 〇/ 7 表2之實例確認,在已解碼視圖S0及S2之後,可立即解 碼視圖S1 ;在已解碼視圖82及84之後,可立即解碼視圖 S3 ;且在已解碼視圖S4&S6之後,可立即解碼視圖s5。 下表3提供另一實例解碼次序,其中該解碼次序使得用 作另一視圖之參考的任一視圖在並未用作任何其他視圖之 參考的視圖之前經解碼。 表3 視圖ID so S1 S2 S3 S4 S5 S6 C7 視圖次序索引 0 5 1 6 2 7 3 〇 / 4 表3之實例確認’在圖7之實例中,視圖si、s3、85及37 之訊框並不充當任何其他視圖之訊框的參考訊框,且因此 視圖S1、S3、S5及S7在用作參考訊框之彼等視圖(亦即, 視圖SO、S2、S4及S6)的訊框之後解碼。視圖S1、S3、s5 及S7相對於彼此可以任何次序經解碼。因此,在表3之實 151028.doc -55· 201119346 例中’在視圖SI、S3及S5中之每一者之前解碼視圖S7。 為了清楚起見,在每一視圖之訊框以及每一視圖之訊框 的時間位置之間可能存在階層關係。關於圖7之實例,自 其他視圖在時間位置T0處之訊框來視圖内預測或視圖間預 測時間位置T0處的訊框。類似地’自其他視圖在時間位置 T8處之訊框來視圖内預測或視圖間預測時間位置Τ8處的訊 框。因此,關於時間階層,時間位置丁〇及Τ8係處於時間階 層之頂點處。 在圖7之實例中,因為參考時間位置別及以之訊框來β 編碼時間位置Τ4之訊框,所以時間位置Τ4處之訊框在時間 階層上低於時間位置丁〇及Τ8的訊框。時間位置仞及以處 之訊框在時間階層上低於時間位置Τ4處的訊框。最後,時 間位置ΤΙ、Τ3、Τ5及Τ7處之訊框在時間階層±低於時間 位置Τ2及Τ6處的訊框。 在MVC中,可提取整體位元流之子集以形成仍遵 MVC之子位元流。基於(例如)以下各者而存在特定應用 能需要之許多可能子位元流:由伺服器提供之服務、— 多個用戶端之解碼器的容量、支援及能力,及/或一或 個用戶端的優選項。舉例而言1戶端可能需要僅三個 圖’且可能存在兩個情境。在―實例中,—用戶端可_ 要流暢之觀看體驗,且可能首選具有= S2之視圖,而另__用戶端可能需要視圓可調能力並首選 有vieW_id值S0、52及以之視圖。若最初關於表9之實例 ——id進行排序,則視®次序索引值在此等兩個實例 15I028.doc -56- 201119346 分別為(〇, i,2}及A I 4}。注意,此等子位元流中之兩 者可解碼為獨立MVC位元流,且可得以同時支援。 〜可存在可由MVC解碼器解碼之許多Mvc子位元流。理 淪上,滿足以下兩個性質之視圖的任何組合可由符合某— 規範或等級之MVC解碼器來解碼:⑴以視圖次序:弓^ 遞增次序來對每-存取單元中之視圖組件排序,及⑺對於 組合中之每一視圖而言,其相依視圖亦包括於組合中。 關於本發明之技術,可使用媒體提取器軌跡及/或純視 訊樣本轨跡來表示各種MVC子位元流。此等軌跡中之每一 者可對應於一 MVC操作點。 、:圖8至圖21為說明根據本發明之技術的媒體提取器之資 料結構及可使用之其他支援資料結構的各種實例之方塊 圖。如下文所詳細論述,圖8至圖22之各種媒體提取器包 括各種特徵。一般而言,圖8至圖21之媒體提取器中的任 一者可包括於一檔案之媒體提取器軌跡中以識別檔案之經 寫碼之樣本,該檔案遵照iso基礎媒體檔案格式或對18〇基 礎媒體檔案格式之擴展。一般而言,媒體提取器可用以自 參考軌跡提取一或多個整體樣本。圖8至圖12為能夠識別 另一軌跡之一視訊樣本箱的媒體提取器之實例。如圖13中 所示,實施提取器之另一方式為致能來自另一軌跡之樣本 的樣本分組。為了提供對時間可調能力之更特定支援,如 圖Η中所示’可傳訊一時間識別符。圖16至圖22為Mvc之 媒體提取器之貫例,其能夠自每一視訊樣本箱(存取單元) 提取一或多個潛在不連續NAL單元。提取器之各種實例係 151028.doc •57- 201119346 基於檔案或存取單元中之偏移及位元組長度,而其他實例 可純粹係基於整體NAL單元之索引,因此傳訊位元組範圍 並不必要。由整體NAL單元之索引傳訊提取器之機制亦可 擴展至SVC檔案格式。 圖8至圖21之實例亦可直接應用至3GPP檔案格式而作為 對3GPP檔案格式的擴展。圖8至圖21中之-或多者的元件 及概念亦可與圖8至圖22中之其他者的元件組合,以形成 其他提取器。雖然關於特定檔案格式來描述圖8至圖二中 之某些圖,但一般而言,圖8至圖21之實例可關於具有類 似特性之任何檔案格式(例如,IS〇基礎媒體檔案格式或 iso基礎媒體檔案格式之擴展)來使用。如在圖21之實例中 所不,為了促進所提議提取器在3Gpp中之使用,3Gpp轨 跡選擇相可經擴展以包括(所提取)交替軌跡中之每一者的 更多特性,諸如時間識別符、待顯示之視圖的數目,及待 解碼之視圖的數目。 圖8為說明實例媒體提取器3〇〇之方塊圖,該實例媒體提 器300說月媒體七取器之格式。在圖8之實例中,媒體提 取器300包括軌跡參考索引3〇2及樣本偏移值3〇4。根據本 發明之技術,媒體提取器3〇〇可對應於可例示於媒體提取 器軌跡内之資料結構的定義。多工器30可經組態以將遵照 媒體提取益300之實例的提取器包括於視訊檔案之媒體提 取器執跡中,以識別視訊檔案之不同軌跡的NAL單元。解 夕工器38可經組態以使用遵照媒體提取器3〇〇之提取器來 拮貝取經識別之NAL單元。 151028.doc •58- 201119346 跡 > 考索引302可對應於其中存在經識別之NAl單元 的轨跡之識別符。可向視訊檔案之每-軌跡指派一唯一索 、便區別視訊稽案之軌跡。執跡參考索引302可指定 厂亦 > 考之索引以用以找尋供提取資料的軌跡。供提取資 ;: 軌跡中的樣本可經準確地時間對準(在媒體解碼時 刻表中’使用時間·樣本表’藉由由樣本偏移值3〇4指定的 偏移來凋整),其中該樣本含有提取器。在一些實例中, ,訊槽案之第一軌跡具有為Γ1」之索引值,:因此多工 心⑼㈣參考索引值3()2指派為「匕的值以參考視 訊檔案之第-軌跡。可保留軌跡參考索引值之為「〇」的 值以供未來使用。 ,本偏移值304疋義自媒體提取器軌跡中之媒體提取器 _之時間位置至由軌跡參考索引3()2指代之軌跡的經識別 之NAL單元之偏移值。亦即’樣本偏移值3〇4給出用作資 訊源之經鏈接軌跡中的樣本之相對索弓卜樣本偏移值3〇4 之為零的值指代具有與含有提取器之樣本相同之解碼時間 或緊接於含有提取器之樣本之前的樣本。樣本以下一樣 本’樣本-1為前一樣本,以此類推。舉例而言,當在 H.263或MPEG_4第2部分中使用遵照媒體提取器则之二體 提取器時’媒體提取料用以提取由執跡參考纟引則曰匕 代的視訊執跡之時間子集。 以下偽碼提供類似於媒體提取器3⑽之媒體提取号類別 的實例定義。 class aligned(8) MediaExtractor () { 151028.doc •59- 201119346 unsigned int(8) track_ref_index; signed int(8) sample—offset; } 多工器30及解多工器38可使用在以上實例偽碼中定義之 媒體提取器來例示媒體提取器資料物件。因此,解多工器 3 8(例如)在自所選擇軌跡擷取資料時可參考所例示之媒體 提取器,以便自由所例示之媒體提取器參考之另一軌跡擷 取經識別的資料。 在貫例偽碼中’ class MediaExtractor()經位元組對準。 亦即,當由MediaExtractor() class來例示提取器時,將在 八位元組邊界上對準提取器。變數rtrack一ref一index」對 應於軌跡參考索引值302,且在此實例偽碼中對應於無正 負號之八位元組整數值。變數r sample-〇ffset」對應於樣 本偏移值304 ’且在此實例中對應於帶正負號之八位元組 整數值。 圖9為說明媒體提取器3 1 〇之另一實例的方塊圖。媒體提 取器310包括軌跡參考索引314及樣本偏移值316,且另外 包括樣本標頭312。軌跡參考索引314及樣本偏移值316通 常可包括類似於軌跡參考索引3〇2及樣本偏移值3〇4(圖8)之 資料。 在對應於H.264/AVC之實例中,樣本標頭312可根據由 媒體提取器310參考之視訊樣本之naL單元標頭來建構。 樣本標頭3 12可含有具有三個語法要素之一位元組資料: forbidden一zero_bit、nal_ref_idc(其可包含 3 個位元)、 I51028.doc •60· 201119346 nal—unit—type(其可包含5個位元 % Λ' ^ ; nal-unit_type」之值 了為29(或任何其他保留數字),且其他兩個語 =別視訊樣本中之彼等語法要素相同。對於遵照Μ祕 4第2部分「視覺」之實例而言,樣本標頭⑴可包含四位 二,组碼,該碼可包括為「Gx1」之起始碼首碼及為The SampleToGroup box represents the assignment of samples to sample groups. For each sample group item, there may be one instance of the SampleGroupDescription box to describe the nature of the corresponding group. An optional relay data trace can be used to tag each track with a "characteristic of interest" for each track, for which the value of the "characteristic of interest" is different from other members of the group (eg, Its bit rate, screen size or language). Some samples within the trajectory may have special characteristics or may be identified by another 151028.doc •31- 201119346. An example of this feature is the sync point (usually the video box). These points can be identified by a special table in each track. More general, the nature of the dependence between the trajectory samples can also be documented using relay data. The relay data can be structured into a sample format-sequence as a video track. This trace can be referred to as a relay data track. Each relay data sample can be structured into a relay data statement. There are various kinds of statements corresponding to various questions that may be asked about the corresponding private case format sample or its constitutive sample. When delivering media via a streaming protocol, it may be necessary to change the media in a manner that presents the media in the archive. An example of this is when the media is transmitted through Γρ). In the case of the slot, for example, every message of the video: , · 'Bei Dianjian as a file format sample. In the RTP, the packetization rules specific to the codec used by the codec must be adhered to in the coffee package. The service can be configured to calculate the packetization at runtime. However, there is a trajectory that can be placed in the slot for the auxiliary stream. ...expressed as a special trajectory The schematic trajectory contains general instructions for the streaming server on how to form a packet stream for a particular protocol; Because of these instructions: independent of the media, when a new codec is introduced, the server may not need to be modified. In addition, edited: "When you want to cry a ^ ^ 1, 1 Han, two subjects Qiu body can not know the competition servo - π pair file editing, called the segment can be used in the file m (nter) software film one Place on the streaming server before going to the file. Adding an RTPO example as a sigma ~ trajectory 'There is a schematic trajectory format defined for L in the 稽4 Instance Format Specification. 15I028.doc •32· 201119346 3 0? (3 0?? File Format) is a multimedia container format defined by the 3rd Generation Partnership Project (30??) for 3G UMTS multimedia services. It is typically used on 3G mobile phones and other devices with 3G capabilities, but can also be played on some 2G and 4G phones and devices. The 3GPP file format is based on the ISO base media building format. The latest 3GP is specified in 3GPP TS 26.244 "Transparent end-to-end packet switched streaming service (PSS); 3GPP Hie format (3GP)". The 3GPP file format stores the video stream as MPEG-4 Part 2 or H.263 or MPEG-4 Part 10 (AVC/H.264). Because 3GPP specifies the use of sample and template fields in the ISO base media file format and a new box that defines the codec reference, 3GPP allows the use of AMR in the ISO Base Media File Format (MPEG-4 Part 12). Η·263 codec. For MPEG-4 media-specific information stored in 3GP files, the 3GP specification refers to the MP4 and AVC file formats, and the MP4 and AVC file formats are also based on the ISO base media file format. The MP4 and AVC file format specifications describe the use of MPEG-4 content in the ISO base media file format. The extended SVC file format for the AVC file format has a new structure of extractors and layers. The extractor is an indicator that provides information about the location and size of the video code data in the sample that has equal decoding time in another track. This situation allows the trajectory hierarchy to be built directly in the code domain. The extractor trajectory in the SVC is linked to one or more base trajectories, and the extractor trajectory extracts data from one or more basic trajectories at runtime. The extractor is an indicator with a NAL unit header that can be dereferenced by SVC extension. If the trace used for extraction contains video code data at different frame rates, the extractor also contains a solution time offset of 151028.doc -33·201119346 to ensure synchronization between the tracks. At runtime, the extractor must be replaced by the data it points to before being streamed to the video decoder. Since the extractor trajectory in the SVC is structured similarly to the video write code trajectory, the extractor trajectory in the SVC can represent a subset that it needs in different ways. The SVC extractor trajectory contains only instructions on how to extract data from another trajectory. In the SVC file format, there is also a summary tool that aggregates the NAL units within the sample together as a NAL unit, including summarizing the NAL units in a layer into a summary tool. The extractor in the SVC is designed to extract a range of bytes from: a sample or summary tool, or only one entire NAL unit rather than multiple NAL units, especially discontinuous NAL units in the sample. There are many video operating points in the SVC file format. The layers are designed to group samples in one or more of the operating points. The MVC file format also supports extractor trajectories that extract NAL units from different views to form an operating point that is a subset of views that use a certain frame rate. The design of the MVC extractor is similar to the extractor in the SVC file format. However, the use of MVC extractor trajectories to form alternate groups is not supported. To support trajectory selection, MPEG is proposed to the following MPEG proposal: P. Frojdh, A. Norkin, and C. Priddle "File format sub-track selection and switching" {JS0/1EC JTC1/SC29/WG11 MPEG M16665, London, UK) . This proposal attempts to alternate the concept of switching/switching groups in the sub-track level. The mapping sample group is an extension of the sample group. In the mapping sample group, each group item (group item of the sample) has a description of its "groupID", and its 151028.doc -34-201119346 is actually a summary of the NAL units in a view. The mapping to vieW_id after the NAL unit. In other words, each sample group item has a view that it contains in the ScalableNALUMapEntry value. The gr〇uping_type of this sample group item is rscnm". Progressive download is a term used to describe a digital media slot that typically uses a transfer from the server to the client. When starting from the computer, the consumer can start playing the media before the download is complete. The key difference between streaming media and progressive download is how the end user device that is accessing the digital media receives and stores the digital media material. A media player with progressive download capabilities relies on the local buffering of the relay data and digital media files in the header of the complete file when the digital server is downloaded from the web server. The media will start playing when the specified amount of data is available for the local playback device. This specified amount of buffer is embedded in the archive by the producer of the content in the encoder settings and is enhanced by the additional buffer settings imposed by the media player. In 3GPP, HTTp/Tcp/Ip transfers are supported for 3GP files for download and progressive download. In addition, the use of HTTp for video streaming has several advantages, and video streaming services based on Ηττρ are becoming more popular. Some of the advantages of HTTP streaming include the ability to use existing Internet components and protocols' so that no new effort is required to develop new technologies for transmitting video over the network. Other transport protocols, such as the RTP payload format, require: intermediate network devices (eg, 'middle boxes') to be aware of media formats and messaging. Also, 'HTTP streaming can be driven by the client, # avoid many control problems. For example, '$ utilizes all features for optimal performance, Servo 15I028.doc •35· 201119346 The server can track the size and content of unconfirmed packets. The server can also analyze the file structure and reconstruct the client buffer. Status to make RD optimal switching/reduction decisions. In addition, the constraints on the change of the bit stream can be satisfied in order to comply with the negotiated specifications. HTTP does not require new hardware or software implementations on web servers that implement HTTP 1.1. HTTP streaming also provides TCP affinity and firewall traversal. The techniques of the present invention can overcome the problem of bandwidth by, for example, improving the HTTP stream of video data by providing bit rate adaptation. Video compression standards such as ITU-T H.261, H.262, H.263, MPEG-1, MPEG-2, and H.264/MPEG-4 Part 10 utilize motion compensated temporal prediction to reduce time redundancy I. The encoder uses motion compensated predictions from some previously encoded pictures (also referred to herein as frames) to predict the current coded picture based on the motion vector. There are two main picture types in a typical video code. It is a picture written in the frame ("[picture" or "I frame"), a predicted picture ("p picture" or "p frame"), and a bi-predicted picture ("B picture" Or the block of the "B frame" /p picture can be written 29 in the box, or see - other pictures for prediction. In the B picture, the 'block' can be predicted from one or two reference pictures, or can be coded in-frame. These reference pictures can be positioned in chronological order before or after the current picture. Very H.264 write.., as expected, as an example, the B picture uses two lists of reference pictures of the previous code (list Q and list... these two;; each can contain the order of the order between the picks And/or future writing; film. It can be predicted in one of the following ways. Area 3 in the picture is predicted by the action compensation of the "° reference picture, since the clear reference 丨151028.doc -36 - 201119346 The motion compensated prediction, or the motion compensation of the combination of the two modes, and the list 1 reference picture and list! The reference picture of the two sets of Hi in order to obtain the list 主 reference, the main °. 1 A '' Knife from the list 0 to test y = early 1 reference picture to obtain two motion compensated reference area L film will be used to predict the current block. Domain group, small video block can provide better resolution, and It can be used to locate the video frame of the level of the package. - 妒 and ' _ ^ is: the various partitions of the block are treated as video blocks. In addition, the slice can be regarded as a plurality of video blocks, such as giant blocks and / or sub-blocks. Each can be video-simplified - unit 4 that can be decoded independently The frame itself can be decodable | ' or other parts of the frame can be defined as decodable units. The term "coded unit" ""code unit" can refer to any independently decodable video frame. Units, such as the entire frame, the slice of the frame, also known as the sequence of picture groups (G〇P), or another independently decodable unit defined according to the applicable coding techniques. The term megablock refers to According to a data structure ε including a 16 χ 16 pixel two-dimensional pixel array for encoding pictures and/or video data, each pixel includes a chrominance component and an illuminance component. Therefore, the megablocks can each define a two-dimensional array of 8×8 pixels. The four illumination blocks, each containing one is 丨6><Two chrominance blocks of a two-dimensional array of 16 pixels, and a header containing syntax information such as: coded block type (CBP), coding mode (for example, in-frame (1) Encoding mode or inter-frame (Ρ or Β encoding mode), the partition size of the partition of the block coded (for example, 16x16, 16x8, 8x16, 8x8, 8Μ' 4x8 or 4x4), or 151028 .doc -37· 201119346 One or more action vectors for the giant block of code. Video encoder 28, video decoding 48, audio encoding (4), audio decoder private, multiplexer 30, and demultiplexer 38 may each be implemented as any of a variety of suitable encoder or decoder circuits, such as One or more microprocessors, digital signal processors (), special application integrated circuits (ASK:), field programmable gate arrays (FpGA), discrete logic circuits, software: hardware, hard work, or any -combination. Video encoder-like video decoding The mother of the horse state 48 can be included in one or more encoders or decoders, and can be part of the video encoder/decoder (codec). Similarly, each of the audio encoder 26 and the audio decoder can be included in one or more encoders or decoders, either of which can be a combined audio encoder/decoder (c〇DEC) )part. Including Μ encoder 28, video decoding (4), audio coding (4), audio decoder 46, multiplexer 30, and/or demultiplexer, the device may include an integrated circuit, a microprocessor 'and/or - such as Wireless communication device for cellular telephones 0 In accordance with the teachings of the present invention, the multiplexer 3 can combine the Wei units to conform to the ISO base media slot format or its derived format (9). , svc '^ or 3 coffee) the archive of the video archive, and includes - the media extractor track also 'the media extractor track identifies one or more potential discontinuous NAL units and passes the video rights to Output interface 32. The output interface η can include, for example, a transmitter, a transceiver, a device for writing data to a computer readable medium (such as a 'disc player, a magnetic media drive (eg, a floppy disk drive), a universal serial bus ( USB), network interface), or other output 151028.doc -38- 201119346 " face wheel 4 media® 32 will NAL single it or access unit to send media 34 (for example, 锉 4 you think the main % 驷凟Magnetic #轮信号 or carrier transient media), or such as body. ...the computer-readable storage medium of the media, memory or flash drive, the H interface 36, is read from the computer-readable medium 34. The input interface 36 can be, for example, a CD player, a magnetic media drive, a job, a receiver, a unit; or other computer readable media interface. The input interface 36 can provide the NAL early or access unit to the de-streaming or program stream. The multiplexer 38 can transmit the singular PES stream and decapsulate the PES > Take the encoded data, and view the _ _ (Example 2 ES: the indicated audio stream or video flooded two: multiplexer π / (four) sent to the audio decoder 46 or video decoder 48. Optionally, the selected (four) traces of the selected (four) traces of the other (four) of the selected traces of the H-reference (4) are passed to the visual scale 48, discarding 1 The encoded audio data is not encoded by the extractor coder (10) of the selected trajectory, and the decoded =: to:: output 42' and the video solution... decodes the encoded IS = multiple views of the stream - Viewing Continuation 44. Video output 44 may include - a display using several views, such as 'presenting a scene at the same time = wearing a stereoscopic stereoscopic display or an ophthalmic stereoscopic display. See (10) Figure 2 is a multiplexed thin 1λ Block diagram of the example configuration in Figure 2. Benefits Management Unit 6〇, Video Input 15I028.doc •39· 201119346 interface 80, audio input interface 82, multiplex flow output interface fabric, and program specific information table 88 ^ stream management unit 60 includes NAL unit constructor 62, stream identifier (stream ID) search unit 66, track generation unit 64 and Extractor generation unit 68. In the example of Figure 2, video input interface 8 and audio input interface 82 include separate packetizers for the encoded video material and the encoded audio data forming unit. The video and/or audio packetizer can be presented external to the multiplexer 3. For the example of FIG. 2, the video input interface 80 can be opened from the encoded video data received from the video encoder 28 into a PES packet. And the audio input interface 82 can form a pES packet from the encoded audio material received from the audio encoder 26. After the NAL unit constructor 62 constructs the NAL unit, the NAL unit constructor 62 sends the NAL unit to the trajectory generating unit 64. The trajectory generating unit 6 receives the NAL unit and combines the video building including the NAL unit into one or more trajectories of the video file. The trajectory generating unit 64 can further perform the extraction. The generator 68 generates an extractor that constructs one or more media extractor trajectories by the trajectory generating unit 64. When it is determined that - or a plurality of said units belong to a plurality of trajectories rather than repeating the NAL unit between the trajectories The extractor generates a single 兀 68 that can construct an extractor that references the trajectory of the NAL unit. In this manner, the multiplexer 30 can avoid duplication of data between tracks, which can reduce bandwidth consumption when transmitting video files. Discussing the various data structures and components of the extractor. In general, the extractor may include: a trajectory identifier value, which refers to the trajectory of the referenced NAL unit; and - or multiple muscle units Identifier, zhang 151028.doc 201119346 Identify the NAL unit referenced by the extractor. In the second example, the nali open identifier can refer to the range of bits or bytes in the trajectory corresponding to the identified nat A ^ [earlier track identifier value reference. » - ^ . In some examples, the NAL·early 7L identifier may be individually referenced by the parent of the extractor > W - NAL unit, for example, to identify discontinuous NAL units. a _ * 6 _ In the upper case, the NAL 兀 identifier can refer to the NAL unit based on the offset of the time or space position of the k-staker from the media extractor trajectory. The generating unit M may include, in some instances, a media extractor track. That is, the media extractor trajectory can include both the ancestor unit and the extractor. Therefore, in this example, the trajectory generating unit 64 can construct a video file 'the video file and have /, the page only includes the NAL unit: the track and one includes one or more extractors The second track, the one or more benefits refer to all or a subset of the ancestor elements of the first track. Moreover, in some examples, the falsification generating unit 64 may include additional NAL units in the trajectory that are not included in the first trajectory. The technique of the sample invention can be extended to a plurality of tracks. For example, the trajectory generating unit 64 may construct a second track that may refer to the NAL unit of the first track and/or the nal unit of the second track, and may further include a NAL unit that is not included in the first track or the second. . 3 is a block diagram illustrating an example audit file (10) including - a first track having a set of video samples and an extractor having a subset of video samples having a reference first track Two obedience. In the example of Fig. 3, the file 100 includes MOOV|gl() 2 and a media material (salt 7) box 11〇. The phase 102 corresponds to a movie box, which is defined by the helper base media file format. I51028.doc • 41 · 201119346 is a container box. The container box of the container defines the relay data for presentation. ΜΙ^ΑΤ箱1()4 corresponds to the media data box, which is defined by the help basic media building format as a box that can hold the actual data for presentation. In the example of Figure 3, the M00V box 1〇2 includes the complete subset track 1〇4 and the media extractor track 1〇6. The IS® Basic Media Archive format defines the “track” as a chronological sequence of related samples in the ISO base media rights case. The ISO base media file format is step-by-step, and for media data, the track corresponds to the sequence of images or sampled audio. In the example of FIG. 3, the MDAT box includes a j-coded sample η], a P-coded sample 114, a B-coded sample 116, and a B-coded sample 118. The B-coded samples 116 and the B-coded samples 118 are considered to be at different levels of coding. In the example of FIG. 3, the B-coded sample ι6 can be used as a reference to the B-coded sample 118, and thus the 3-coded sample 117 can be at a lower level coding level than the B-coded sample 丨16. Class coding level. The order in which the samples are displayed may differ from the hierarchical order (also referred to as the decoding order) and the order in which the samples are included in the MDAT box. For example, 5, the coded sample 112 may have a display order value of 〇 and a solution-human sequence value of 0. The p-coded sample ii 4 may have a display order value of 2 and a decoding order of 1. The value, B-coded sample 116 may have a display order value of 1 and a decoding order value of 2, and the B-coded sample 118 may have a display order value of 4 and a decoding order value of 3. Track 1 may include additional samples, such as 'samples having a display order value of 3 and a decoding order value of 4. Each of the sample U2, the P-coded sample U4, the B-coded sample 116, and the B-coded sample 丨18 may correspond to various nal 151028.doc -42·201119346 units or accesses. unit. The TSO Basic Media File Format defines a "sample" as all data associated with a single timestamp, such as individual frames for video, a series of video frames in decoding order, or compressed regions in the decoding order of audio. segment. In the example of FIG. 3, the complete subset track 1〇4 includes a reference to the I-coded sample π 2, the P-coded sample 144, the B-coded sample 116, and the B-coded sample 117. . The MDAT box 11 further includes an extractor 12, an extractor 122, and an extractor 124. Thus, the extractors 12A through 124 are included in a movie data box, which will include data samples. In the example of FIG. 3, the extractor 12 refers to the I-coded sample 112, the extractor 122 refers to the P-coded sample 114, and the extractor 124 refers to the B-coded sample 丨18. There may be two or more NAL units corresponding to the coded sample 1 12, the P-coded sample 丨 14 and/or the B-coded sample 118, and the unit may be discontinuous. In accordance with the teachings of the present invention, although there may be two or more discontinuous NAL units in the respective samples, extractors 12A through 124 may still identify each of the NAL units of the respective samples. In the example of FIG. 3, the media extractor trajectory 106 includes relay data for the reference extractor 12, the extractor 122, and the extractor 124. Each of the extractors 120 to 124 may also include a display order value and a decoding order value. For example, the extractor 120 can have a display order value of 〇 and a decoding order value of ,, the extractor 122 can have a display order value of 丨 and a decoding order value of 1, and the extractor 124 can have 2 The display order value is a decoding order value of 2. In some instances, 'display values and/or decoded values may skip certain values' (for example) to match the values of the identified samples. 151028.doc -43· 201119346 The complete subset trajectory 104 and the media extractor trajectory i 〇 6 can form an alternating group 'to make the % multiplexer 38 (Fig. 1) select the complete subset trajectory or media extractor trajectory 106 is decoded by video decoder 48. Regarding the MVC example, the meta-subset trajectory 10 4 may correspond to a first operating point, and the media extractor trajectory 106 may correspond to a second operating point. Regarding the example of 3 (} pp, the complete subset trajectory 104 and the media extractor trajectory 106 may form a switching group. In this way, the complete subset trajectory 104 and the media extractor trajectory 1 〇 6 (for example) in HTTP streaming The application can be used to adapt bandwidth availability and decoder capabilities. When the complete subset trajectory 104 is selected, the demultiplexer 38 can sample the corresponding subset trajectory 104 (eg, the encoded sample 丨12, The p-coded sample 114, the B-coded sample 116, and the B-coded sample 118) are sent to the video decoder 48. When the media extractor track 1〇6 is selected, the demultiplexer 38 can correspond to the media extraction. The sample of track 1 ( 6 (including samples identified by the media extractor corresponding to media extractor trajectory 106) is sent to video decoder 48. Thus, when media extractor trajectory 106 is selected, solution multiplex 38 can The encoded coded sample 112, the p-coded sample η*, and the b-coded sample 118 are sent to the video decoder 48, and the demultiplexer 38 can extract the H12G, the extractor 122, and the extractor 124 for dereference. The self-finishing track 104 is captured by the I code. Sample 丨 2, p-coded sample 114, and B-coded sample 118. Figure 4 is a block diagram illustrating another example of a private case 140 including two distinct extractor trajectories i46, 148. The two extractor trajectories are illustrated in the example 'but in general, a file may include any number of extractor trajectories. In the example of Figure 4, the 'file 14' includes the 〇〇乂 box 142 and the 〇 〇 box J51028. Doc • 44 * 201119346 150. The MOOV box 142 includes a complete subset trajectory 144 and media extractor trajectories 146, 148. The MDAT box 150 includes data samples and extractors for various trajectories, such as 'via' encoded samples 152, Ρ encoded Sample 154, Β encoded sample I%, beta encoded sample 158, and extractors 160 through 168. In the example of Figure 4, 'extractors 160 through 164 correspond to media extractor trajectory 146' and extractor 166 to 168 corresponds to media extractor trajectory 148. In this example, extractor 16 of media extractor trajectory 146 identifies I-coded sample 152 'extractor ι 62 to identify p-coded sample 154, and extractor 164 identifies The encoded sample 156. In this example, The fetcher 166 identifies the I-coded samples 152, and the extractor 162 identifies the p-coded samples 154. The example of Figure 4 demonstrates that two or more extractors of the various media extractor traces refer to the complete subset of the traces An example of the same sample. The media extractor trajectory may represent a time subset of the decodable video stream and an alternate/switching trajectory containing the trajectory of the original full-time resolution bit stream (eg, complete subset trajectory 144). The complete subset track 144 can, for example, represent a video stream of 30 frames per second (Fps). In some instances, by not including a β-picture of a certain level in the sub-bitstream, the sub-rate's pure rate can be halved or reduced to some other fraction. For example, the media extractor track 146 may have a frame rate that is halved relative to the full subset trajectory 144 by not including the B-coded sample 158. For example, the media extractor track Μ can have a frame rate of 15 FPS. With #, the media extraction trajectory 148 has a frame rate that is halved with respect to the media extractor trajectory 146 by omitting the B-coded sample 156 and the 匕 encoded sample 158, and thus has a 7. 5 Frame rate of FPS. 15l028.doc • 45- 201119346 FIG. 5 is a block diagram illustrating another example archive 180 including a subset trace 188 and two media extractor traces 184, 186. The file 182 of the file 包括8 includes a subset track 188, media extractor tracks 184, 186, and the MDAT box 190 includes an I-coded sample 192, a Ρ-coded sample i 94, a B-coded sample. 202, B-coded sample 208, and extractors 198, 20t), 204, 206, and 210. As discussed above, the media extractor trajectory can include an extractor that references a sample of another trajectory. Additionally, the media extractor trajectory can further include additional video samples that are not included in another trajectory. In the example of FIG. 5, subset track 188 includes an I-coded sample 192 and a P-coded sample 丨94. Media extractor trajectory 186 includes extractors 198, 200 and additionally includes B-coded samples 202. Similarly, media extractor track} 84 includes extractors 204, 206, 210' and additionally includes truncated coded samples 2〇8. In the example of FIG. 5, the media extractor exemplifies a coded sample of the video material (the encoded sample 202), and the media extractor trajectory 184 includes an extractor 210, the reference including the encoded sample The sample of the media extractor track 丨 86, i.e., in the example of Fig. 5, the 'extractor 2' refers to the sample 202 encoded by the 。. Thus, media extractor trace 184 can represent a full time resolution bitstream, while media extractor trace 186 and subset archive 188 can represent a subset of the full time resolution bitstream. That is, the media extractor trajectory 186 and the subset trajectory 188 may have a temporal resolution (eg, a lower frame rate) that is lower than the full time resolution represented by the media extractor trajectory "A." According to the techniques of the present invention, The H.264/AVC file format can be modified to include an extractor trajectory that can be extracted as a trajectory containing any compatible time subset of the original full time 15I028.doc -46·201119346 resolution bitstream. Hierarchy B (or p) H.264/AVC for picture writing, assuming there are N time levels, including from time level 0 to k (k) Each sub-bit stream of the sample of <N) can be extracted by defining a corresponding extractor trajectory. Thus, for the same video, there may be N trajectories (including N-1 extractor trajectories) that form a parent/switching group. The extractor can be associated with a time hierarchy level corresponding to the time level 4 of the sample identified by the extractor. For example, the time identifier value of the time level of the specified sample can also be communicated in the extractor. Figures 6A through 6C are block diagrams showing an example of a mdAT box 220 of a file that includes a plurality of media extractor traces for various media extractor tracks. Each of FIGS. 6A-6C depicts a sample 222 that includes a view 0 sample 224A, a view 2 sample 226A, a view 1 sample 228A, a view 4 sample 230A, and a view 3 sample 232A; and a non-tin sample 223 It includes a view 〇 sample 224B, a view 2 sample 226B, a view 1 sample 228B, a view 4 sample 230B, and a view 3 sample 232B. The ellipsis next to the non-anchor sample 223 means no, and additional samples may be included in the MDAT box 22〇. Each of the anchored sample and the non-anchored sample can collectively form a first trajectory of the archive. In one example, in accordance with the teachings of the present invention, the media extractor trajectory for each set of extractors of the archives depicted in Figures 6a through 6C may correspond to independent operating points of video archives that conform to the MVC file format. In this manner, the techniques of the present invention can be used to generate one or more media extractor trajectories corresponding to the operating points of video archives that conform to the MVC file format. 6A-6C depict various media extractor trajectory extractors 24, 244, 250, wherein the extractors 24, 244, 25 〇 will each be included in the 151028.doc -47 - 201119346 MDAT box 220, but out of For clarity purposes, it is not illustrated in a separate diagram. That is, the MDAT box 22A can include each of the extractors 240, 244, 250 when fully combined. Figures 6A through 6C provide an example of a file that includes a track containing a media extractor and real video samples. The various samples can be independently included in different tracks depending on the time level. For each time level, a particular trajectory can contain all video samples as well as extractors for trajectories with lower time levels. The video samples (NAL units) can be separated into different tracks' and the tracks with higher frame rate can have extractors pointing to other tracks. In this way, it is possible to have a movie clip containing samples of only one time level and a movie clip may contain an extracter pointing to other clips. In this case, movie segments of different trajectories but of the same time period can be interleaved in increasing order of time. Figure 6A provides an example of an extractor 240 that includes extractors 242a through 242N corresponding to media extractor trajectories. In this example, extractor 242a references two views 〇 samples 224A of anchor sample 222. Extractor 242n references view 〇 sample 224B of non-anchor sample 223. In general, in the example of Figure 6A, the extractor of extractor set 240 references the corresponding view. sample. Each of the extractors 242A through 242N corresponds to a common media extractor trajectory, which may belong to a switching group and/or an alternating group. The media extractor trajectory can correspond to individual operating points, for example, including views. The operating point. In some real money, for stereoscopic video using MVC code, there may be three operating points, including support for outputting one of two views, and one output for only one view (for example, view only or view 1) The second operation 151028.doc -48- 201119346 2. The third operating point can be the operating point of the output view 1. Depending on the prediction relationship - "the point can include only the VCL · NAL unit in view 1 and the associated All of the elements of the non-VCL NAL unit, view, and view}, or the nal unit of view 1 and the NAL unit of the tinned NAL unit (also referred to as the NAL unit of the component). In this stereoscopic situation, an example of the disclosed technique It is provided that the other two operating points can be represented by two extractor tracks. These two extractor tracks can form a switching group, and together with the original video track, these three tracks can form an alternating group. The present month provides a technique for modifying the MVC file format to include the MVC media extractor trajectory. In general, Mvc video trajectories including MVC media extractor trajectories having the same number of output views can be characterized as a switching group. by All of the operating points represented by the track of the archive may belong to an alternating group of mvc video presentations. The view of each of the anchor sample 222 and the non-anchor sample 223 may form a complete subset trajectory, for example, including all available views. Operating points. For example, as shown with respect to extractors 246A through 246N in Figure 6B, the extractor may reference successive portions of the sample. In Example + of Figure 6β, extractor 246A refers to view 〇 sample 224A and refers to view 2 sample 226a The data structure representing the extractor 246A may specify the byte range, the start view and the end view of the identified view, the number of start and subsequent views, or other tables* of the series-continuous view identified by the extractor. The extractor set (10) may correspond to another media extractor trajectory, which in turn may correspond to a separate MVC operating point. For example, as with respect to extractor 254A, 256A in Figure 6C, 151028.doc - 49- 201119346 shows that the two extractors can also refer to two parts of the same book (for example, two discontinuous views). For example, the extractor sample 252A includes a reference view. Extractor 254A of 224A and View 2 samples 226A, and extractor 254B of reference view 4 sample 230A. Thus, the samples represented by extractor samples 252A may correspond to extractor samples that reference discontinuous view samples. Similarly, in In the example of Figure 6C, the extractor sample 252N includes a reference view 〇 sample 224B and an extractor 256A of the view 2 sample 226B, and an extractor 256B of the reference view 4 sample 230B. The extraction may also be defined with respect to anchored or non-anchored samples. The extractor with respect to the anchor sample definition and the extractor for the non-anchor sample definition may refer to different views. The above-mentioned MVC media extractor trajectory of the ISO basic media file format or the Mvc file format may be an example of a relay data track, which may be implemented to have similar extraction functionality and may be used to indicate normal Alternation of video tracks and/or switching of tracks. In an example using the MVC file format, a full bit stream may be included in the trajectory, and all other possible operating points may be represented by an extractor trajectory, each of which may be a signal, such as The number of views output, the view identifier value of the view for output, the bandwidth required for transmission, and the frame rate. Figure 7 is a conceptual diagram illustrating an example MVC prediction pattern. In the example of Fig. 7, eight views (with view ID rs〇 to "S7") are illustrated, and 12 time positions ("τ〇" to "Tu") of each view are illustrated. that is,. Each column in Figure 7 corresponds to a -view' and each row indicates a temporal position. 151028.doc •50- 201119346 Although M VC has a so-called base view decoded by H.264/AVC decoder, and stereo view pairs can also be supported by MVC, MVC has the advantage of supporting more than two views. As an example of 3D video input and decoding of this 3D video represented by multiple views. A renderer with a client side of the Mvc decoder can expect 3 video content with multiple views. The tin fix component and the non-tin fix component in the view can have different view dependencies. For example, in view S2, the view component of the view component view _ is in the view component. However, the non-marketing view component in view S2 is not dependent on the view components in other views. / In Figure 7, a shaded box comprising letters is used to indicate each column and each frame in Figure 7 to specify that the corresponding frame is broken in the frame (i.e., I frame), or - The code is inter-frame coded (ie, is a p-frame) or written between frames in multiple directions (ie, as a (four) box). In general, the prediction is indicated by an arrow, where the arrow points to the frame using the arrow to start the object for predictive reference. For example, from the time of view $◦ at position T0! The p-frame at the time position τ〇 of the prediction view S2 is predicted. Like early-view video coding, the frame of the multi-view video coded video sequence can be predictively coded with respect to frames at different time positions. For example, the b frame at the time position T1 of the view SO has a self-view time position τ o at the m frame pointing to its arrow ', thereby indicating the prediction from the (four) frame b Sfl box, but 'other' in the multi-view video coding In this case, the frame can be predicted between views. That is, the view component can use the view component in other views for reference. In Mvc, for example, as in the view in another view, the pieces are inter-view predictions. The potential view inter-parameter 151028.doc -51 - 201119346 is used to communicate in the Sequence Parameter Set (SPS) MVC extension and can be modified by reference to the picture list construction process, which enables flexible ordering of predictive or inter-view prediction references. Table 1 below provides an example of a set of MVC extended sequence parameters. Table 1 seq_parameter_set_mvc_extension() { c descriptor num_views a minus 1 0 ue(v) for( i = 0; i <= num_views_minusl; i++) view—id[ i ] 0 ue(v) for( i = 1; i <= num a views-minusl; i++ ) { num_anchor_refs_10[ i ] 0 ue(v) for( j = 0; j < num_anchor-refs—10[ i ]; j++) anchor_ref_10[ i ][ j ] 0 ue(v) num-anchor-refs-11 [ i ] 0 ue(v) for( j = 0; j < num—anchor—refs—11 [ i ]; j++) anchor_ref_ll[ i ][ j ] 0 ue(v) } for( i = 1; i <= num_views_minusl; i++ ) { num_non_anchor_refs_10[ i ] 0 ue(v) for( j = 0; j < num a non_anchor_refs_10[ i ]; j++ ) non-anchor-ref_10[ i ][ j ] 0 ue(v) num_non_anchor_refs_ll[ i ] 0 ue(v) for( j = 0; j < num a non_anchor-refs_ll [ i ]; j++ ) non-anchor_ref_l 1 [ i ] [ j ] 0 ue(v) } num—Ievel_values a signalled a minus 1 0 ue(v) for(i = 0; i <= num_level_values_signalled_minusl; i++) { level_idc[ i ] 0 u(8) num—applicable—ops_minusl[ i ] 0 ue(v) for( j = 0; j <= num_applicable_ops_minusl[ i ]; j++ ) { applicable_op—temporal—id[ i ][j ] 0 u(3) applicable_op a num—target a views_minusl[ i ][j ] 0 ue(v) for( k = 0 ; k <= appIicable_op_num_target_views_minusl[i][j];k++) applicable_op_target_view_id[ i ][ j ][ k ] 0 ue(v) applicable-op-num_views_minusl [ i ][j ] 0 ue(v) } } } -52- 151028.doc 201119346 Figure 7 provides various examples of inter-view prediction. In the example of FIG. 7, the frame of the view S1 is illustrated as being predicted from the frame at different time positions of the view, and between the frames in the frame at the same time position from the views S0 and S2. For example, the b frame of the view s 1 at the time position T 系 is from the B frame at the time positions T0 and T2 of the view S 1 and the b frame at the time position T1 of the view 8 〇 and S 2 . Each of the predictions is made. In the example of Figure 7, the capital letter r B" and the lowercase letter "b" are intended to indicate different hierarchical relationships between frames rather than different encoding methods. In general, the uppercase "B" frame is relatively higher than the lowercase rb" frame prediction layer. That is, in the example of Fig. 7, the "B" frame is encoded by referring to the "B" frame. Additional hierarchy levels can be added to have additional bi-directionally encoded frames that can be referenced to the "b" frame of Figure 7. Figure 7 also uses different levels of shading to account for changes in the predicted hierarchy, where a larger amount of shadow (i.e., relatively deeper) frame prediction hierarchy is higher than those with less shadows (i.e., relatively giga). frame. For example, all the I frames in Figure 7 are illustrated by full shading and the frames have a slightly shallower shadow, and the frames (and lowercase b frames) have various levels of shadow relative to each other, but always Shallower than the shadow of the ρ frame and the I frame. In general, the prediction hierarchy is related to the view order index. The relationship between the predictions is that the higher frame of the frame should be decoded in the frame with a relatively lower decoding level, so that the class is relatively higher. The frame can be used as a reference frame during the period when the decoding level is relatively low. The view order index is an index indicating the decoding order of the view components in the access. As specified in H 4/AVC Appendix H (MVC Amendment), the view order index 151028.doc -53· 201119346 is implicit in the SPS MVC extension. In the SPS, for each index z, the corresponding view is id. The decoding of the view component should follow the ascending order of the view order index. If all views are rendered, the view order index is in a sequential order from 〇 to num_views_minus_l. In this way, the frame used as the reference frame can be decoded before decoding the frame referenced by the reference frame. The view order index is an index indicating the decoding order of the view components in the access unit. For each view order index z, the corresponding view is id. The decoding of the view component follows the ascending order of the view order index. If all views are rendered, the set of view order indexes contains a set of consecutive sorts from zero to less than the full number of views. For certain frames at equal level levels, the order of decoding may not be important relative to each other. For example, the I frame of the view S0 at the time position τ 用作 is used as the reference frame of the p frame of the view S2 at the time position T0, and the p frame of the view S2 at the time position T0 is used as the view S4 again. The reference frame of the p-frame at the time position το. Therefore, the I frame of the view s at the time position τ〇 should be decoded before the p frame at the time position T0 of the view S2, and the frame of the view S2 at the time position Τ0 should be at the time position of the view S4. The P frame at Ding was decoded before. However, between views s and S3, the order of decoding is not important, since views 81 and 83 do not rely on each other for prediction, but only for predictions from higher prediction levels. Furthermore, view S1 can be decoded prior to view S4 as long as view S1 is decoded after view 8 and illusion. In this way, hierarchical ordering can be used to describe views 8〇 to S7. Making the notation SA> SB means that the view SA should be decoded before the view is dimmed. In the example of Fig. 7] 5] 028.doc -54·201119346 S0>S2>S4>S6>S7. Further, regarding Fig. 7, the actual 'S2> S3, S4 > S3, S4 > S5, and S6 > S5. In this case, using the notation, for example, the S0>S1, S2>S1 view does not violate these requirements - the decoding order is possible. Therefore, many different decoding orders are possible with only certain limitations. Two example decoding orders are presented below, but it should be understood that many other decoding orders are possible. In one of the examples illustrated in Table 2 below, the view is decoded as quickly as possible. Table 2 View ID so S1 S2 S3 S4 S5 S6 View Order Index 0 2 1 4 3 6 5 〇/ 7 The example in Table 2 confirms that view S1 can be decoded immediately after decoded views S0 and S2; in decoded view 82 After 84, view S3 can be decoded immediately; and after decoded view S4 & S6, view s5 can be decoded immediately. Table 3 below provides another example decoding order, wherein the decoding order enables any view that is used as a reference for another view to be decoded before being used as a reference for any other view. Table 3 View ID so S1 S2 S3 S4 S5 S6 C7 View Order Index 0 5 1 6 2 7 3 〇/ 4 Example of Table 3 Confirmation 'In the example of Figure 7, the frames si, s3, 85 and 37 are Does not act as a reference frame for any other view frame, and thus views S1, S3, S5, and S7 are used as frames for their views (ie, views SO, S2, S4, and S6) decoding. Views S1, S3, s5, and S7 can be decoded in any order relative to each other. Therefore, in the example of Table 3, 151028.doc - 55 · 201119346, the view S7 is decoded before each of the views SI, S3, and S5. For the sake of clarity, there may be a hierarchical relationship between the frame of each view and the temporal position of the frame of each view. With respect to the example of Fig. 7, the frames at the time position T0 from other views are used to predict the intra-view or inter-view prediction time position T0. Similarly, the frame at the time position T8 from other views is used to predict the intra-view or inter-view prediction time position Τ8. Therefore, with respect to the time hierarchy, the time positions Ding and Τ8 are at the apex of the time horizon. In the example of FIG. 7, because the reference time position is different from the frame of the time frame β coded by the frame, the frame at the time position Τ4 is lower than the time position of the frame at the time level. . The time position and the frame of the frame are lower than the time frame Τ4 at the time level. Finally, the frames at the time positions ΤΙ, Τ3, Τ5, and Τ7 are at the time level ± below the time position Τ 2 and Τ 6 frames. In MVC, a subset of the overall bitstream can be extracted to form a sub-bitstream that is still MVC compliant. There are many possible sub-bitstreams that a particular application can need based on, for example, the services provided by the server, the capacity of the decoders of multiple clients, support and capabilities, and/or one or one user. The preference of the end. For example, 1 client may need only three graphs' and there may be two scenarios. In the instance, the client can _ smooth the viewing experience, and may prefer the view with = S2, while the other __ user may need to adjust the circle and prefer the vieW_id value S0, 52 and the view . If the first instance of Table 9 - id is sorted, then the view of the order index value in these two instances 15I028.doc -56 - 201119346 are (〇, i, 2} and AI 4} respectively. Note that this Both of the sub-bitstreams can be decoded into independent MVC bitstreams and can be supported simultaneously. There can be many Mvc sub-bitstreams that can be decoded by the MVC decoder. In theory, the following two properties are satisfied. Any combination of these may be decoded by an MVC decoder conforming to a certain specification or level: (1) sorting the view components in each access unit in a view order: bowing, and (7) for each view in the combination The dependent views are also included in the combination. With respect to the techniques of the present invention, media extractor trajectories and/or pure video sample trajectories may be used to represent various MVC sub-bitstreams. Each of these trajectories may correspond to An MVC operating point., Figures 8 through 21 are block diagrams illustrating various examples of data structures of media extractors and other supporting data structures that may be used in accordance with the teachings of the present invention. As discussed in detail below, Figure 8 Figure 22 of the various media The volume extractor includes various features. In general, any of the media extractors of Figures 8 through 21 can be included in a media extractor track of a file to identify a sample of the file's coded code, the file conforming to iso The base media file format or an extension to the 18” base media file format. In general, the media extractor can extract one or more global samples from the reference track. Figures 8 through 12 are ones that can identify one of the other tracks. An example of a media extractor for a box. Another way to implement an extractor is to enable grouping of samples from samples of another trajectory, as shown in Figure 13. To provide more specific support for time-adjustable capabilities, as shown in Figure Η The 'transportable time identifier is shown. Figure 16 through Figure 22 are examples of Mvc's media extractor capable of extracting one or more potentially discontinuous NAL units from each video sample box (access unit). Various examples of extractors are 151028.doc •57- 201119346 based on offsets and byte lengths in archives or access units, while other instances can be based purely on the index of the overall NAL unit, so the subpoena The tuple range is not necessary. The mechanism of the index communication extractor of the overall NAL unit can also be extended to the SVC file format. The examples of Figures 8 through 21 can also be directly applied to the 3GPP file format as an extension to the 3GPP file format. The elements and concepts of - or more of Figures 8 through 21 may also be combined with elements of the other of Figures 8 through 22 to form other extractors, although Figures 8 through 2 are described with respect to a particular file format. Some of the figures, but in general, the examples of Figures 8 through 21 can be used with respect to any file format having similar characteristics (e.g., an extension of the IS) base media file format or the iso base media file format. In the example of 21, in order to facilitate the use of the proposed extractor in 3Gpp, the 3Gpp trajectory selection phase may be extended to include more features of each of the (extracted) alternate trajectories, such as a time identifier, The number of views to be displayed, and the number of views to be decoded. Figure 8 is a block diagram illustrating an example media extractor 3, which illustrates the format of a monthly media extractor. In the example of Figure 8, media extractor 300 includes a track reference index 3〇2 and a sample offset value of 3〇4. In accordance with the teachings of the present invention, the media extractor 3 can correspond to a definition of a data structure that can be instantiated within the media extractor trajectory. The multiplexer 30 can be configured to include an extractor that follows an instance of the media extraction benefit 300 in a media extractor trace of the video archive to identify NAL units of different trajectories of the video archive. The decimator 38 can be configured to use the extractor compliant with the media extractor 3 to fetch the identified NAL unit. 151028.doc • 58- 201119346 Trace > The test index 302 may correspond to an identifier of the track in which the identified NAl unit is present. A unique cable can be assigned to each track of the video file to distinguish the track of the video file. The Tracking Reference Index 302 can specify the factory> index to find the trajectory for extracting data. For the extraction; the samples in the trajectory can be accurately time aligned (in the media decoding timetable, the 'use time·sample table' is deflated by the offset specified by the sample offset value 3〇4), where This sample contains an extractor. In some examples, the first track of the channel has an index value of Γ1": therefore, the multi-working center (9) (four) reference index value 3 () 2 is assigned as "the value of 匕 to refer to the first track of the video file. The value of the track reference index value is "〇" for future use. The offset value 304 is the offset value from the time slot of the media extractor in the media extractor track to the identified NAL unit of the track indicated by the track reference index 3()2. That is, the sample offset value 3〇4 gives a value of the relative value of the sample offset value 3〇4 of the sample in the linked track used as the information source, and has the same value as the sample containing the extractor. The decoding time is either immediately before the sample containing the sample of the extractor. The sample is the same as the following sample-1 is the same as before, and so on. For example, when using a two-body extractor that complies with the media extractor in H.263 or MPEG_4 Part 2, the media extract is used to extract the time of the video censored by the profiling reference. Subset. The following pseudo code provides an instance definition similar to the media extraction number category of media extractor 3 (10). Class aligned(8) MediaExtractor () { 151028.doc •59- 201119346 unsigned int(8) track_ref_index; signed int(8) sample—offset; } multiplexer 30 and demultiplexer 38 can be used in the above example pseudocode The media extractor defined in the example is used to instantiate the media extractor data object. Thus, the demultiplexer 38, for example, may refer to the exemplified media extractor when extracting data from the selected trajectory to freely retrieve the identified material from another trajectory referenced by the instantiated media extractor. In the example pseudocode, 'class MediaExtractor() is aligned by the byte. That is, when the extractor is instantiated by the MediaExtractor() class, the extractor is aligned on the octet boundary. The variable rtrack_ref_index" corresponds to the track reference index value 302, and in this example pseudo code corresponds to an octet integer value without a sign. The variable r sample-〇ffset" corresponds to the sample offset value 304' and in this example corresponds to an octet integer value with a sign. Figure 9 is a block diagram showing another example of the media extractor 31. Media extractor 310 includes track reference index 314 and sample offset value 316, and additionally includes sample header 312. Track reference index 314 and sample offset value 316 may typically include data similar to track reference index 3〇2 and sample offset value 3〇4 (Fig. 8). In an example corresponding to H.264/AVC, the sample header 312 can be constructed from the naL unit header of the video sample referenced by the media extractor 310. The sample header 3 12 may contain one byte data with three syntax elements: forbidden one zero_bit, nal_ref_idc (which may contain 3 bits), I51028.doc • 60· 201119346 nal—unit—type (which may include The value of 5 bits % Λ ' ^ ; nal-unit_type " is 29 (or any other reserved number), and the other two words = the same grammatical elements in the video sample. For compliance 4 For some examples of "vision", the sample header (1) may contain a four-digit two-group code, and the code may include the start code of the "Gx1" code and
Ox C5」(或任何其他保留數字)之起始瑪其中「如」指 不〇χ」之後的值為十六進制值。對於H 263而言,樣本 標頭312亦可包括-不同於正常視訊樣本之起始碼的經位 元組對準之起始碼。樣本標頭312可由解多工器38用於同 步化之目的,使得可將提取器視為正常視訊樣本。 以下偽碼提供類似於媒體提取器3 1〇之媒體提取器類別 的實例定義: class aligned(8) MediaExtractor () {The starting value of Ox C5" (or any other reserved number) where "if" means not 〇χ is the hexadecimal value. For H 263, the sample header 312 may also include a starting code of the bit group alignment that is different from the start code of the normal video sample. The sample header 312 can be used by the demultiplexer 38 for synchronization purposes so that the extractor can be considered a normal video sample. The following pseudocode provides an example definition of a media extractor class similar to media extractor: class aligned(8) MediaExtractor () {
SampleHeader (); unsigned int(8) track_ref_index; signed int(8) sample offset; } 多工器30及解多工器38可使用在以上實例偽碼中定義之 媒體提取器來例示媒體提取器資料物件。因此,解多卫哭 38(例如)在自所選擇軌跡擷取資料時可參考所例示之媒體 提取器,以便自由所例示之媒體提取器參考之另一執跡掏 取經識別的資料。 圖10為說明實例媒體提取器320之方塊圖,該實例媒體 提取器320藉由在提取器内傳訊經識別之NAL單元的位元 • 61 - 151028.doc 201119346 組範圍來識別NAL單元。媒體提取器320包括可類似於樣 本標頭312之樣本標頭322,及可類似於轨跡參考索引302 的軌跡參考索引324。然而,媒體提取器320之實例包括資 料偏移值326及資料長度值328而非樣本偏移值。 資料偏移值326可描述由媒體提取器320識別之資料的起 始點。亦即,資料偏移值326可包含一表示至要複製的由 軌跡索引值324識別之軌跡内之第一位元組的偏移之值。 資料長度值328可描述要複製之位元組的數目,且因此可 等效於經參考之樣本(或在參考多個NAL單元時,多個樣 本)的長度。 以下偽碼提供類似於媒體提取器320之媒體提取器類別 的實例定義: class aligned(8) MediaExtractor () {SampleHeader (); unsigned int (8) track_ref_index; signed int (8) sample offset; } multiplexer 30 and demultiplexer 38 may use the media extractor defined in the above example pseudocode to instantiate the media extractor data object . Therefore, the solution to the crying 38 (for example) may refer to the exemplified media extractor when extracting data from the selected trajectory, so as to freely obtain the identified data from the other exemplified media extractor reference. 10 is a block diagram illustrating an example media extractor 320 that identifies NAL units by communicating in the extractor a range of bits of the identified NAL unit • 61 - 151028.doc 201119346. Media extractor 320 includes a sample header 322 that can be similar to sample header 312, and a track reference index 324 that can be similar to track reference index 302. However, examples of media extractor 320 include data offset value 326 and data length value 328 rather than sample offset values. The data offset value 326 can describe the starting point of the material identified by the media extractor 320. That is, the data offset value 326 can include a value indicative of the offset of the first byte within the track identified by the track index value 324 to be copied. The data length value 328 can describe the number of bytes to be copied, and thus can be equivalent to the length of the referenced sample (or multiple samples when referring to multiple NAL units). The following pseudocode provides an example definition similar to the media extractor class of media extractor 320: class aligned(8) MediaExtractor () {
SampleHeader (); unsigned int(8) track_ref_index; unsigned int(32) data_offset; signed int(32) data_length; } _ 多工器30及解多工器38可使用在以上實例偽碼中定義之 媒體提取器來例示媒體提取器資料物件。因此,解多工器 38(例如)在自所選擇軌跡擷取資料時可參考所例示之媒體 提取器,以便自由所例示之媒體提取器參考之另一軌跡擷 取經識別的資料。 圖11為說明實例媒體提取器340之方塊圖,該實例媒體 151028.doc -62- 201119346 提取器340含有用於未來可擴展性之保留位元。媒體提取 器340包括可分別類似於媒體提取器3〇2及樣本偏移值 的軌跡參考索引342及樣本偏移值346。此外,媒體提取器 3 40包括保留位元344,其可包含用於對媒體提取器之未來 擴展的保留位元。以下偽碼提供類似於媒體提取器34〇之 媒體提取器類別的實例類別定義: class aligned(8) MediaExtractor () { unsigned int(8) track_ref—index; unsigned int(8) reserved_bits; signed int(8) sample_offset; } 多工器30及解多工器38可使用在以上實例偽碼中定義之 媒體提取器來例示媒體提取器資料物件。因此,解多工器 3 8(例如)在自所選擇軌跡擷取資料時可參考所例示之媒體 提取器,以便自由所例示之媒體提取器參考之另一軌跡擷 取經識別的資料。 圖12為說明實例媒體提取器35〇之方塊圖,該實例媒體 提取器350使用軌跡識別符值而非軌跡參考索引值。使用 軌跡識別符值來識別轨跡可指代按IS〇基礎媒體檔案格式 呈現軌跡參考箱。媒體提取器35G之實例包括執跡識別符 352、保留位元354及樣本偏移值356。如藉由圍繞保留位 元354之虛線所指示,保留位元354為可選的。亦即,一些 實例可包括保留位元354,而其他實例可省略保留位元 354。樣本偏移值356可類似於樣本偏移值3〇4。 J51028.doc -63- 201119346 軌跡識別符352指定供提取資料之軌跡的轨跡1〇。供提 取資料之軌跡中的樣本可在時間上對準(在媒體解碼時刻 表中,使用時間-樣本表,藉由由樣本偏移356指定之偏移 來調整),其中該樣本含有媒體提取器35〇。可向第一執跡 參考指派為1之識別符值。可保留為〇之值以供未來使用及 擴展。 以下偽碼提供類似於媒體提取器35〇之媒體提取器類別 的實例定義: class aligned(8) MediaExtractor () { unsigned int(8) trackjd; unsigned int(8) reserved_bits; signed int(8) sample_offset; } 多工器30及解多工器38可使用在以上實例偽碼中定義之 媒體提取器來例示媒體提取器資料物件。因此,解多工器 38(例如)在自所選擇軌跡擷取資料時可參考所例示之媒體 提取器’以便自由所例示之媒體提取器參考之另一軌跡擷 取經識別的資料。 圖13為說明一實例媒體提取器樣本群組36〇之方塊圖。 多工器30可將媒體提取器樣本群組36〇包括於訊息類型箱 (具有類型識別符「MESG」)、樣本表箱容器中。多工器 3〇可經組態以將零或一個媒體提取器樣本群組36〇物件包 括於訊息箱中。在圖13之實例中,媒體提取器樣本群組 360包括轨跡參考索引362、群組類型364、群組數目計數 151028.doc •64· 201119346 366、保留位元368及群組描述索引370。 軌跡參考索引362指定用以在某一準則下找尋來自樣本 群組之供提取資料之轨跡的軌跡參考之索引。亦即,軌跡 參考索引362以類似於軌跡參考索引302之方式識別供提取 由媒體提取器識別之資料的軌跡。 群組類型值364識別媒體提取器樣本群組36〇對應於之樣 本群組的類型。群組類型值364通常識別用以形成取樣群 組之樣本群組的準則,並將準則鏈接至由軌跡參考索引 3 62識別之軌跡中的具有相同群組類型值之樣本群組描述 表。群組類型值364可包含整數值。以此方式媒體提取 器樣本群組360之群組類型值可與執跡參考索引362所指代 之軌跡的群組類型相同。或者,對於視訊時間子集而言, 群組類型值364可定義為「vtst」,可僅針對該群組類型定 義媒體提取器樣本群組,且語法表將不需要語法要素 「grouping_type」。 群組數目汁數值366可描述包括媒體提取器樣本群組㈣ 之媒體提取ΙΜΛ跡中的樣本群組之數目。群崎目計數值 366之為零的值可表示在準則下由群組類型值⑽參考之所 ^樣本群組用以形成媒體提取器軌跡。群組描述索引⑽ 定義用以形成樣本群組描述表中之媒體提取器軌跡的樣本 群組項之索引。 根據本發明之技術,纟且人讲π m 7 口過程可用以將所有樣本置於樣 财,使得樣本按時間排序,使得樣本Α在媒體提 取益軌跡中之樣本B之後指千样士 交寻曰不樣本A在由軌跡參考索引362 15I028.doc -65· 201119346 指代之執跡中的樣本B之後。 以下偽碼提供類似於媒體提取器樣本群組360之媒體提 取器樣本群組類別的實例定義: class aligned(8) MedEtrSampleGroup () { unsigned int(8) track_ref_index; unsigned int(32) grouping_type; unsigned int(32) group一number_count; for ( i =0; i< group number count; i++ ) unsigned int (32) group一description_index; } 多工器30及解多工器38可使用在以上實例偽碼中定義之 媒體提取器來例示媒體提取器資料物件。因此,解多工器 38(例如)在自所選擇軌跡擷取資料時可參考所例示之媒體 提取器,以便自由所例示之媒體提取器參考之另一軌跡掏 取經識別的資料。 圖14為說明實例媒體提取器380之方塊圖,該媒體提取 器380可用於遵照AVC檔案格式之視訊檔案的情形中。媒 體提取器380之實例包括執跡參考索引382、時間識別符值 3 84、保留位元386及樣本偏移值388。軌跡參考索引382及 樣本偏移值388可以分別類似於軌跡參考索引302及樣本偏 移值304之方式來使用。保留位元386可經保留以供未來使 用,且此時並不向其指派語義值。 時間識別符值384指定待由媒體提取器380提取之樣本的 時間等級。在一實例中,時間等級係在〇至7(包括0及7)之 -66· 151028.doc 201119346 範圍内。如上文所論述,經編碼之圖片可對應於時間等 級,其中時間等級通常描述訊框之間的編碼階層。舉例而 舌,可向關鍵訊框(亦稱為錨定訊框)指派最高時間等級, 而可向並不用作參考訊框之訊框指派相對較低之時間等 級。以此方式,媒體提取器380可藉由參考樣本之時間等 級而非明確地識別樣本自身來識別來自由軌跡參考索引 382指代之軌跡的所提取樣本。具有達一高於由時間識別 符值384定義之值的值之媒體提取器的媒體提取器軌跡可 對應於具有車父兩訊框率之操作點。 以下偽碼提供類似於媒體提取器38〇之媒體提取器類別 的實例定義: class aligned(8) MediaExtractor () { unsigned int(8) track_ref_index; unsigned int(3) temporal—id; unsigned int(5) reserved_bits; signed int(8) sample—offset; } 多工器30及解多工器38可使用在以上實例偽碼中定義之 媒體提取器來例示媒體提取器資料物件。因此,解多工器 38(例如)在自所選擇軌跡擷取資料時可參考所例示之媒體 提取器,以便自由所例示之媒體提取器參考之另—軌跡棟 取經識別的資料。 圖15為說明實例MVC媒體提取器42〇之方塊圖,該實例 MVC媒體提取器420可用以修改MVC以使其包括媒體提取 151028.doc •67- 201119346 器軌跡。媒體提取器420之實例包括一可選NAL單元標頭 422、轨跡參考索引424、樣本偏移426、連續位元組集合 計數428 ’及一包括資料偏移值430及資料長度值432的值 迴圈。MVC媒體提取器420可用以自特定執跡提取視圖組 件之一子集的多個NAL單元。MVC媒體提取器420之實例 在自經參考之軌跡之樣本提取資料時可跳過軌跡中的視圖 組件。 在存在時,NAL單元標頭422可鏡射由MVC媒體提取器 420識別之NAL單元的NAL單元標頭。亦即,NAL單元標 頭422之語法要素可根據提取器中之NAL單元標頭語法或 在MVC檔案格式中定義之彙總工具產生過程而產生。在一 些實例中,(例如)在一系列提取器將經產生以包括相關 NAL單元標頭時,提取器可能不需要NAL單元標頭422。 轨跡參考索引值424指定軌跡參考之索引以用以找尋供 提取資料的轨跡。供提取資料之執跡中的樣本可在媒體解 碼時刻表中在時間上對準,藉由由樣本偏移值426指定之 偏移來調整,其中該樣本含有Mvc媒體提取器42〇。第一 執跡參考可經指定以接收為】之索引值,且可保留軌跡參 考索引值之為零的值。 π樣本偏移值426定義待提取之樣本相對於MVC媒體提取 之時間位置的偏移,該樣本位於由軌跡參考索引值 424指代之軌跡巾。樣本偏移值似之為零的值指示待提取 =樣本係處於同一時間位置,_丨指示先前樣本,+ι指示下 一樣本,以此類推。 151028.doc • 68 · 201119346 連續位元組集合計數42 8描述供提取資料之執跡之樣本 的連續位元組集合之數目。若連續位元組集合計數428具 有為零之值,則將提取轨跡中的整個經參考之樣本。連續 位元組集合亦可稱為樣本之獨立部分。 資料偏移值430及資料長度值432出現於一迴圈中。一般 而言,迴圈之反覆的數目(亦即,資料偏移值430及資料長 度值432之數目)與待擷取之樣本之部分的數目(例如,連續 位元組集合之數目)有關。因此,可使用MVC媒體提取器 420來提取樣本之兩個或兩個以上部分。對於待提取之樣 本之每一部分而言,資料偏移值430之相應者指示該部分 之起始(例如,該部分之第一位元組,相對於樣本之第一 位元組),且資料長度值432之相應者指示要複製之長度(例 如,位元組之數目)。在一些實例中,資料長度值432中之 一者的為零之值可指示將複製樣本中之所有剩餘位元組, 亦即,該部分對應於由資料偏移值430之相應者指示的位 元組及直至樣本之結束的所有其他連續位元組。 以下偽碼提供類似於MVC媒體提取器420之媒體提取器 類別的實例定義: class aligned(8) MediaExtractorMVC () { NALUnitHeader(); //在一些實例中可被省略 unsigned int(8) track_ref_index; signed int(8) sample—offset; unsigned int(8) continuous_byte_set_count; for (i = 0 ; i < continuous—byte—set—count; i++ ) { 151028.doc -69- 201119346 unsigned int((lengthSizeMinusOne+1 )*8) data一offset; unsigned int((lengthSizeMinusOne+1 )* 8) data_length; 多工器30及解多工器38可使用在以上實例偽喝令定義之 媒體提取器例示媒體提取器資料物件。因此,解多工器 38(例如)在自所選擇軌跡擷取資料時可參考 1 不之媒體 提取器,以便自由所例示之媒體提取器參者 1 <另一軌跡擷 取經識別的資料。 圖16為說明另一實例MVC媒體提取器44〇 ^ 万塊圖,該 實例MVC媒體提取器440可用以修改MVC以蚀计 1之其包括媒體 提取器軌跡。與如關於圖15之實例所描述之掸 〈樣本的特定位 元組形成對比’ MVC媒體提取器440之實伽缉…SampleHeader (); unsigned int (8) track_ref_index; unsigned int (32) data_offset; signed int (32) data_length; } _ multiplexer 30 and demultiplexer 38 may use the media extractor defined in the above example pseudo code To instantiate the media extractor data object. Thus, the demultiplexer 38, for example, may refer to the exemplified media extractor when extracting data from the selected trajectory to freely retrieve the identified material from another trajectory referenced by the instantiated media extractor. 11 is a block diagram illustrating an example media extractor 340 that contains reserved bits for future scalability. Media extractor 340 includes track reference index 342 and sample offset value 346 that may be similar to media extractor 3〇2 and sample offset values, respectively. In addition, media extractor 3 40 includes reserved bits 344, which may contain reserved bits for future expansion of the media extractor. The following pseudocode provides an instance class definition similar to the media extractor class of media extractor 34: class aligned(8) MediaExtractor () { unsigned int(8) track_ref_index; unsigned int(8) reserved_bits; signed int(8 Sample_offset; } The multiplexer 30 and the demultiplexer 38 may instantiate the media extractor data object using the media extractor defined in the above example pseudo code. Thus, the demultiplexer 38, for example, may refer to the exemplified media extractor when extracting data from the selected trajectory to freely retrieve the identified material from another trajectory referenced by the instantiated media extractor. Figure 12 is a block diagram illustrating an example media extractor 35 that uses track identifier values instead of track reference index values. Using the track identifier value to identify the track can refer to presenting the track reference box in the IS〇 base media file format. Examples of media extractor 35G include a trace identifier 352, a reserved bit 354, and a sample offset value 356. The reserved bit 354 is optional as indicated by the dashed line surrounding the reserved bit 354. That is, some instances may include reserved bits 354, while other instances may omit reserved bits 354. The sample offset value 356 can be similar to the sample offset value of 3〇4. J51028.doc -63- 201119346 The track identifier 352 specifies the track 1〇 for the track of the extracted data. The samples in the trajectory for extracting data may be aligned in time (in the media decoding time table, using the time-sample table, adjusted by the offset specified by the sample offset 356), where the sample contains the media extractor 35〇. An identifier value of 1 can be assigned to the first trace reference. It can be retained as a value for future use and expansion. The following pseudocode provides an example definition of a media extractor class similar to media extractor 35: class aligned(8) MediaExtractor () { unsigned int(8) trackjd; unsigned int(8) reserved_bits; signed int(8) sample_offset; The multiplexer 30 and the demultiplexer 38 can instantiate the media extractor data object using the media extractor defined in the above example pseudo code. Thus, the demultiplexer 38, for example, may refer to the exemplified media extractor' when extracting data from the selected trajectory to freely retrieve the identified material from another trajectory referenced by the instantiated media extractor. 13 is a block diagram illustrating an example media extractor sample group 36. The multiplexer 30 can include the media extractor sample group 36 in a message type box (having a type identifier "MESG"), a sample list box container. The multiplexer 3〇 can be configured to include zero or one media extractor sample group 36 〇 objects in the message box. In the example of FIG. 13, media extractor sample group 360 includes track reference index 362, group type 364, group number count 151028.doc • 64·201119346 366, reserved bit 368, and group description index 370. The trajectory reference index 362 specifies an index of trajectory references used to find trajectories from the sample group for extracting data under a certain criterion. That is, the trajectory reference index 362 identifies the trajectory for extracting the material identified by the media extractor in a manner similar to the trajectory reference index 302. The group type value 364 identifies the type of sample group that the media extractor sample group 36 〇 corresponds to. The group type value 364 typically identifies criteria for forming a sample group of sample groups and links the criteria to a sample group description table having the same group type value among the tracks identified by the track reference index 3 62. Group type value 364 can include an integer value. In this manner, the group type value of the media extractor sample group 360 can be the same as the group type of the track indicated by the tracking reference index 362. Alternatively, for the video time subset, the group type value 364 can be defined as "vtst", the media extractor sample group can be defined only for the group type, and the syntax element will not require the syntax element "grouping_type". The group number juice value 366 may describe the number of sample groups in the media extraction track including the media extractor sample group (4). A value of zero for the group Hamas count value 366 can be expressed by the group of samples referenced by the group type value (10) under the criterion to form a media extractor trajectory. The group description index (10) defines an index of the sample group items used to form the media extractor trajectory in the sample group description table. According to the technique of the present invention, the π m 7 port process can be used to place all samples in the sample, so that the samples are sorted by time, so that the sample 指 refers to the sample search after the sample B in the media extraction benefit track.曰 No sample A is after sample B in the trace of the track reference index 362 15I028.doc -65· 201119346. The following pseudocode provides an example definition of a media extractor sample group class similar to media extractor sample group 360: class aligned(8) MedEtrSampleGroup () { unsigned int(8) track_ref_index; unsigned int(32) grouping_type; unsigned int (32) group_number_count; for (i =0; i< group number count; i++ ) unsigned int (32) group-description_index; } multiplexer 30 and demultiplexer 38 can be defined in the above example pseudo code The media extractor is used to instantiate the media extractor data object. Thus, the demultiplexer 38, for example, may refer to the exemplified media extractor when extracting data from the selected trajectory to freely retrieve the identified material from another trajectory referenced by the instantiated media extractor. Figure 14 is a block diagram illustrating an example media extractor 380 that can be used in the context of a video archive in accordance with the AVC file format. Examples of media extractor 380 include a trace reference index 382, a time identifier value 3 84, a reserved bit 386, and a sample offset value 388. The trajectory reference index 382 and the sample offset value 388 can be used in a manner similar to the trajectory reference index 302 and the sample offset value 304, respectively. The reserved bit 386 can be reserved for future use and no semantic values are assigned to it at this time. The time identifier value 384 specifies the time level of the sample to be extracted by the media extractor 380. In one example, the time scale is in the range of -66· 151028.doc 201119346 of 〇 to 7 (including 0 and 7). As discussed above, the encoded picture may correspond to a time level, where the time level generally describes the coding level between frames. For example, the key frame (also known as the anchor frame) can be assigned the highest time level, and the frame that is not used as the reference frame can be assigned a relatively low time level. In this manner, media extractor 380 can identify the extracted samples from the trajectory referred to by trajectory reference index 382 by reference to the temporal level of the samples rather than explicitly identifying the samples themselves. The media extractor trajectory of the media extractor having a value above a value defined by the time identifier value 384 may correspond to an operating point having a two frame rate of the parent. The following pseudocode provides an example definition of a media extractor class similar to media extractor 38: class aligned(8) MediaExtractor () { unsigned int(8) track_ref_index; unsigned int(3) temporal_id; unsigned int(5) Reserved_bits; signed int(8) sample_offset; } The multiplexer 30 and the demultiplexer 38 can instantiate the media extractor data object using the media extractor defined in the above example pseudo code. Thus, the demultiplexer 38, for example, may refer to the exemplified media extractor when extracting data from the selected trajectory so that the exemplified media extractor can refer to the other trajectory to retrieve the identified material. 15 is a block diagram illustrating an example MVC media extractor 42 that can be used to modify the MVC to include media extraction 151028.doc • 67-201119346 trajectories. Examples of media extractor 420 include an optional NAL unit header 422, track reference index 424, sample offset 426, consecutive byte set count 428', and a value including data offset value 430 and data length value 432. Loop. The MVC media extractor 420 can be used to extract a plurality of NAL units from a subset of the view components from a particular profile. Example of MVC Media Extractor 420 The View component in the trajectory can be skipped when the data is extracted from the sample of the referenced trajectory. When present, the NAL unit header 422 can mirror the NAL unit header of the NAL unit identified by the MVC media extractor 420. That is, the syntax elements of NAL unit header 422 may be generated based on the NAL unit header syntax in the extractor or the summary tool generation process defined in the MVC file format. In some examples, the extractor may not require the NAL unit header 422, for example, when a series of extractors are to be generated to include the associated NAL unit header. The trajectory reference index value 424 specifies an index of the trajectory reference for finding a trajectory for extracting data. The samples in the trace for the extracted data may be temporally aligned in the media decode schedule, adjusted by the offset specified by the sample offset value 426, where the sample contains the Mvc media extractor 42. The first trace reference can be assigned to receive an index value of 】, and the value of the track reference index value of zero can be retained. The π sample offset value 426 defines the offset of the sample to be extracted relative to the time position of the MVC media extraction, which is located in the track towel referred to by the track reference index value 424. A value with a sample offset value of zero indicates that the sample is to be fetched = the sample system is at the same time position, _丨 indicates the previous sample, +ι indicates the next sample, and so on. 151028.doc • 68 · 201119346 Continuous byte set count 42 8 describes the number of consecutive byte sets for the sample of the trace of the extracted data. If the consecutive byte set count 428 has a value of zero, then the entire referenced sample in the trajectory will be extracted. A collection of consecutive bytes can also be referred to as a separate part of the sample. The data offset value 430 and the data length value 432 appear in a loop. In general, the number of repetitions of the loop (i.e., the number of data offset values 430 and data length values 432) is related to the number of portions of the sample to be retrieved (e.g., the number of consecutive sets of bytes). Thus, MVC media extractor 420 can be used to extract two or more portions of a sample. For each portion of the sample to be extracted, the corresponding one of the data offset values 430 indicates the beginning of the portion (eg, the first byte of the portion, relative to the first byte of the sample), and the data The corresponding one of the length values 432 indicates the length to be copied (eg, the number of bytes). In some examples, a value of zero of one of the data length values 432 may indicate that all remaining bytes in the sample will be copied, ie, the portion corresponds to the bit indicated by the corresponding one of the data offset values 430. The tuple and all other consecutive bytes up to the end of the sample. The following pseudocode provides an example definition similar to the media extractor class of the MVC media extractor 420: class aligned(8) MediaExtractorMVC () { NALUnitHeader(); // may be omitted in some instances unsigned int(8) track_ref_index; Int(8) sample-offset; unsigned int(8) continuous_byte_set_count; for (i = 0 ; i <continuous_byte_set_count; i++ ) { 151028.doc -69- 201119346 unsigned int((lengthSizeMinusOne+1 ) *8) data-offset; unsigned int((lengthSizeMinusOne+1)* 8) data_length; The multiplexer 30 and the demultiplexer 38 may instantiate the media extractor data object using the media extractor defined in the above example pseudo-drink order. Thus, the demultiplexer 38, for example, may refer to a media extractor when extracting data from the selected trajectory so as to freely exemplify the media extractor actor 1 < another trajectory to retrieve the identified material. Figure 16 is a diagram illustrating another example MVC media extractor 44, which may be used to modify the MVC to include the media extractor trajectory. In contrast to the specific bit group of the sample as described with respect to the example of Fig. 15, the real gamma of the MVC media extractor 440...
列識別特定NAL 單元以供提取。在圖16之實例中,MVC媒妒担〜 又卞遐徒取器440包 括一可選NAL單元標頭442、軌跡參考索引4料、樣本偏移 楊、連續NALU(NAL單元)集合計數448,及NALU偏移值 450及連續NAL單元之數目452的迴圈。通常分別以與NAL 單元標頭422、執跡參考索引424及樣本偏移值426相同之 方式來疋義NAL單元標頭442、軌跡參考索引444及樣本偏 移值446。 連續N A L U集合計數4 4 8描述供提取資料之軌跡之樣本的 連續NAL單元之數目》在一些實例中,若將此值設定為 151028.doc -70- 201119346 零’則提取軌跡中的整個經參考之樣本。 NALU偏移值450及連續NALU之數目452出現於一迴圈 中。一般而言,如藉由連續NALU集合計數448所定義,存 在與連續NALU集合一樣多的NALU偏移值之例項及連續 NALU之數目。每一 NALU偏移值描述供提取資料之執跡 之樣本處的相應NAL單元之偏移。可使用此提取器來提取 自NAL單元之此偏移起始的NAL單元。連續NALU之數目 的每一值描述相應NAL單元集合之要複製的整個單一經參 考之NAL單元的數目。 以下偽碼提供類似於MVC媒體提取器440之媒體提取器 類別的實例定義: class aligned(8) MediaExtractorMVC () { NALUnitHeaderQ; //在一些實例中可省略 unsigned int(8) track_ref_index; signed int(8) sample—offset; unsigned int(8) continuous_NALU_set—count; for (i = 0 ; i < continuous_NALU_Set_Count; i++ ) { unsigned int((lengthSizeMinusOne+1 )* 8) NALU_offset; • unsigned int((lengthSizeMinusOne+1 )* 8) num_continuous_NALUs } } 多工器30及解多工器38可使用在以上實例偽碼中定義之 151028.doc • 71· 201119346 媒體提取器來例示媒體提取器資料物件。因此,解多工器 38(例如)在自所選擇轨跡擷取資料時可參考所例示之媒體 提取器,以便自由所例示之媒體提取器參考之另一軌跡擷 取經識別的資料。 圖17為說明另一實例MVC媒體提取器460之方塊圖,該 MVC媒體提取器460在存在一視圖組件之一個以上1^人[單 兀時彙總同一視圖組件中的NAL單元。MVc媒體提取器 460可接著用以提取經識別之視圖組件。在圖丨7之實例 中,MVC媒體提取器460包括一可選^^八:單元標頭462、轨 跡參考索引464、樣本偏移466、連續視圖集合計數468, 及視圖組件偏移值470及視圖組件計數472的迴圈。通常分 別以與NAL單元標頭422、軌跡參考索引424及樣本偏移值 426相同之方式來定義NAL單元標頭462、轨跡參考索引 4 6 4及樣本偏移值4 6 6。 連續視圖集合計數468定義由執跡參考索引464識別之供 提取資料之軌跡中的經識別之樣本之連續視圖組件的數 目。多工器30可將連續視圖集合計數468之值設定為零以 指示將提取軌跡中之整個經參考的樣本。 視圖組件偏移值470及視圖組件計數472出現於—迴圈 中。一般而言,存在與連續視圖集合計數468之值—樣多 的迴圈反覆,且每一迴圈對應於連續視圖集合中的—者。 視圖組件偏移值470中之每一者指示相應連續視圖集合之 供提取資料之軌跡之樣本處的第一視圖組件之偏移。可接 著使用MVC媒體提取器460提取自視圖組件之此偏移起妒 151028.doc •72· 201119346 之視圖組件。視圖組件計數472中之每一者描述相應連續視 圖集合之樣本中的要複製之整個經參考之視圖組件的數目。 以下偽碼提供類似於MVC媒體提取器460之媒體提取器 類別的實例定義: class aligned(8) MediaExtractorMVC () { NALUnitHeader(); //在一些實例中可省略 unsigned int(8)trak_ref_index; signed int(8) sample_offset; unsigned int(8) continuous_view_set_count; for ( i = 0 ; i < continuous_view_set一count; i++) { unsigned int((lengthSizeMinusOne+1 )* 8) view_component_offset; unsigned int((lengthSizeMinusOne+1 )* 8) view_component_count } 多工器30及解多工器38可使用在以上實例偽碼中定義之 媒體提取器來例示媒體提取器資料物件。因此,解多工器 3 8 (例如)在自所選擇軌跡搁取資料時可參考所例示之媒體 提取器’以便自由所例示之媒體提取器參考之另一軌跡擷 ' 取經識別的資料。 圖U為說明MVC媒體提取器480之另一實例的方塊圖, δ亥M VC媒體提取器480可用以參考各種軌跡。在圖a之實 例中,MVC媒體提取器480包括一可選NaL單元標頭482、 連續視圖集合計數484,及樣本偏移值486、執跡參考索引 15l028.doc -73- 201119346 值488、視圖組件偏移值490及視圖組件計數492的迴圈。 NAL單元標頭482可類似於NAL單元標頭422而進行定義, 且在一些實例中可省略。 連續視圖集合計數484給出供提取資料之具有軌跡參考 索引track—ref Jndex的媒體提取器執跡之樣本的連續視圖 組件之數目。track_ref_index值可指定軌跡參考之索引以 用以找尋供提取資料的軌跡。供提取資料之軌跡中的視圖 組件可在時間上對準(在媒體解碼時刻表中,使用時間·樣 本表,藉由由樣本偏移值486之相應者指定的偏移來碉 整)’其中樣本含有MediaExtractorMVC。第一軌跡泉考 具有索引值1 ;可保留值〇以供未來使用。 MVC媒體提取器480之實例將樣本偏移值牦6、軌跡參考 索引值488、視圖組件偏移值490及視圖組件計數中 每一者包括於一迴圈中。每一迴圈反覆對應於一供提取對 應於MVC媒體提取器480之樣本的資料之特定轨跡。 樣本偏移值486定義由軌跡參考索引值488之相應者指, 之軌跡中的樣本之相對索引,該樣本可用作資訊:。^ 〇(零)為由軌跡參考索引值488中的相應者識 ’ 』^^具有血合 有MVC媒體提取器48〇之樣本相同之解碼時間或^人 有MVC媒體提取器480之樣本之前的樣本,樣本1為於含 本,樣本-1為前一樣本,以此類推。 下樣 執跡參考索引值488中之每一者指定執跡參考之 、 用以找尋供提取相應迴圈&覆之資料的執跡。“引= 個軌跡參考索引值,MVC媒體提取 9使用多 T自多個不同轨 15I028.doc •74, 201119346 跡提取資料。 視圖組件偏移值490中之每一者描述供提取資料之執跡 之樣本處的第一視圖組件之偏移,該軌跡具有對應於此迴 圈反覆中之執跡參考索引值488之相應者的軌跡參考索 引。可使用MVC媒體提取器480提取自視圖組件之此偏移 起始之視圖組件。在一些實例中,可建構一種具有巢式迴 圈結構之類似於圖15至圖17之彼等媒體提取器的媒體提取 器,在該巢式迴圈結構中,外部迴圈經由供提取樣本之轨 跡反覆,且内部迴圈經由待自相應軌跡提取之樣本反覆。 視圖組件計數492中之每一者描述軌跡之樣本中的經參考 之視圖組件的數目,該轨跡具有對應於此迴圈反覆中之執 跡參考索引值488中之當前者的軌跡參考索引。 以下偽碼提供類似於MVC媒體提取器480之媒體提取器 類別的實例定義: class aligned(8) MediaExtractorMVC () { NALUnitHeader(); //在一些實例中可省略 unsigned int(8) continuous_view_set_count; for (i = 0 ; i < continuous_view_set_count; i++) { signed int(8) sample—offset; unsigned int(8) track一ref_index; unsigned int((lengthSizeMinusOne+1 )* 8) view_component_offset; unsigned int((lengthSizeMinusOne+1 )* 8) view—component—count 151028.doc -75- 201119346 1/ 多工器3G及解多工器38可使用在以上實例偽碼中定義之 媒體提取器來例示媒體提取器資料物件。因此,解多工器 38(例如)在自所選擇軌跡擷取資料時可參考所例示之媒體 提取器,以便自由所例示之媒體提取器參考之另—軌跡擷 取經識別的資料。 圖19為說明另一實例MVC媒體提取器5〇〇之方塊圖,該 實例MVC媒體提取器500傳訊提取器之持續時間。當媒體 提取器軌跡中之不同樣本共用提取器之相同語法要素時, MVC媒體提取器500可提供一或多個優點。在圖19之實例 中,MVC媒體提取器500包括樣本計數5〇2、連續視圖集合 計數504、樣本偏移值506、軌跡參考索引5〇8、視圖組件 偏移5 1 0,及視圖組件計數5 12。 連續視圖集合計數504、樣本偏移值506、軌跡參考索引 508、視圖組件偏移510及視圖組件計數512通常可根據連 續視圖集合計數484、樣本偏移值486、軌跡參考索引 488、視圖組件偏移490及視圖組件計數492中之相應者來 定義。樣本計數502可定義含有MVC媒體提取器5〇〇之媒體 提取器軌跡中的使用同一媒體提取器之連續樣本的數目。 以下偽碼提供類似於MVC媒體提取器500之媒體提取器 類別的實例定義: class aligned(8) MediaExtractorMVC () { unsigned int(8) sample count; 151028.doc •76· 201119346 unsigned int(8) continuous_view_set_count; for (i = 0 ; i < continuous—view_set_count; i++) { signed int(8) sample_offset; unsigned int(8) track_ref—index; unsigned int((lengthSizeMinusOne+1 )*8) view_component_offset; unsigned int((lengthSizeMinusOne+1) * 8) view_component—count } } 多工器30及解多工器38可使用在以上實例偽碼中定義之 媒體提取器來例示媒體提取器資料物件。因此,解多工器 38(例如)在自所選擇軌跡擷取資料時可參考所例示之媒體 提取器,以便自由所例示之媒體提取器參考之另一軌跡擷 取經識別的資料。 圖20為說明另一實例MVC媒體提取器520之方塊圖,該 實例MVC媒體提取器520定義不同提取器之集合。對於媒 體提取器軌跡中之每一樣本而言,樣本可使用提取器之集 合中的一或多個提取器,或對該等提取器之一參考。亦 即,可定義類似於MVC媒體提取器520之媒體提取器之一 集合,且每一樣本可使用提取器之集合中的一或多個提取 器或對該等提取器之一參考來識別另一執跡之樣本。 MVC媒體提取器520之實例包括提取器識別符值522、樣 本偏移值524、軌跡參考索引值526、連續視圖集合計數 151028.doc -77- 201119346 528,及包括視圖組件偏移530及視圖組件計數532的迴 圈。樣本偏移值524、連續視圖集合計數528、視圖組件偏 移5 30及視圖組件計數532可根據連續視圖集合計數484、 樣本偏移值486、視圖組件偏移490及視圖組件計數492中 之相應者來定義。軌跡參考索引值526可根據(例如)軌跡參 考索引464來定義。 提取器識別符值522定義提取器(亦即,MVC媒體提取器 520)之識別符。向同一媒體提取器軌跡中之提取器指派不 同提取器識別符,使得媒體提取器軌跡中之樣本可參考提 取器識別符值來使用媒體提取器。參考提取器箱亦可經定 義以包括提取器之數目及參考提取器識別符。提取器數目 之值可提供用以複製提取器軌跡中之樣本之資料的提取器 之數目。當提取器數目之值等於零時,可使用具有預定提 取器識別符(例如,等於零之提取器識別符)的提取器。參 考提取器識別符可提供用以複製提取器軌跡中之樣本之資 料的提取器之提取器識別符。此箱可包括於媒體提取器軌 跡之樣本中。 以下偽碼提供類似於MVC媒體提取器520之媒體提取器 類別的實例定義: class aligned(8) MediaExtractorMVC () { unsigned int((lengthSizeMinusOne+l)*8) extractor_id; signed int(8) sample_offset; unsigned int(8) track_ref_index; for ( i = 0 ; i < continuous_view_set_count; i++) { 151028.doc -78 - 201119346 unsigned int((lengthSizeMinusOne+1 )* 8) view component offset; unsigned int((lengthSizeMinusOne+1 )* 8) viewcomponentcount; } } 多工器3〇及解多工器38可使用在以上實例偽碼中定義之 媒體提取器來例示媒體提取器資料物件。因此,解多工器 3 8(例如)在自所選擇軌跡擷取資料時可參考所例示之媒體 提取器’以便自由所例示之媒體提取器參考之另一執跡掏 取經識別的資料。 以下偽碼提供上文所描述之參考提取器箱之參考提取器 箱類別的實例定義: class aligned(8) RefExtractorMVC () { unsigned int((lengthSizeMinusOne+1 )* 8) num—extractor; for (i = 0 ; i < num_extractor; i++) ref_extractor_id ; } } 圖21為說明實例MVC媒體提取器550之方塊圖,該實% MVC媒體提取器550可使用映射樣本群組來形成。Mvc媒 體提取器550之實例指定來自一系列樣本項之NAL單元的 群組,該等樣本項中之每一者貢獻於映射樣本群組中的$ 151028.doc •79- 201119346 續NAL單元。在圖22之實例中,MVC媒體提取器550包括 NALU群組計數552,及包括軌跡索引554、群組描述索引 556、NALU起始映射樣本558及NALU視圖計數560的迴 圈。 NALU群組計數552指定來自參考軌跡中之映射樣本群組 項之NAL單元群組的數目。軌跡參考索引554各自指定轨 跡參考之索引以用以找尋供提取相應迴圈反覆之資料的軌 跡。群組描述索引556各自指定用以形成相應迴圈反覆之 NAL單元群組的映射樣本群組項之索引。NALU起始映射 樣本558各自指定相應迴圈反覆中之映射樣本群組中之 NAL單元的偏移,該映射樣本群組具有群組描述索引556 之相應者的映射樣本項索引。NALU視圖計數560指定相應 迴圈反覆中之待提取至映射樣本群組中之媒體提取器中的 連續NAL單元之數目,該映射樣本群組具有群組描述索引 556之相應者的映射樣本項索引。 以下偽碼提供類似於MVC媒體提取器550之媒體提取器 類別的實例定義: class aligned(8) MedEtrMapSampleGroup () { unsigned int(32) NALU group count; for ( i =0; i< NALU_group_count; i++ ) { unsigned int(8) track_ref_index; unsigned int(32) group_description_index; unsigned int(8) NALU_start_map_sample; unsigned int(8) NALU_view_count; I51028.doc • 80 · 201119346 多工器30及解多工器38可使用在以上實例偽碼中定義之 媒體提取器來例示媒體提取器資料物件。因此,解多工器 38(例如)在自所選擇軌跡擷取資料時可參考所例示之媒體 提取器,以便自由所例示之媒體提取器參考之另一執跡擷 取經識別的資料。. 本發明之技術可包括用於將樣本之視圖組件配置於樣本 群組中的組合過程。樣本群組項之樣本中的視圖組件以時 間方式進行排序,使得:若樣本A在原始軌跡(具有轨跡參 考索引之索引)中在樣本B之後,則樣本八中之視圖組件在The column identifies a particular NAL unit for extraction. In the example of FIG. 16, the MVC mediator 440 includes an optional NAL unit header 442, a track reference index 4, a sample offset Yang, and a consecutive NALU (NAL unit) set count 448. And a loop of the NALU offset value 450 and the number 452 of consecutive NAL units. The NAL unit header 442, the track reference index 444, and the sample offset value 446 are typically deprecated in the same manner as the NAL unit header 422, the tracking reference index 424, and the sample offset value 426, respectively. The continuous NALU set count 4 4 8 describes the number of consecutive NAL units of samples for the trajectory of the extracted data. In some examples, if this value is set to 151028.doc -70-201119346 zero' then the entire reference in the extracted trajectory Sample. The NALU offset value 450 and the number 452 of consecutive NALUs appear in a loop. In general, as defined by the consecutive NALU set count 448, there are as many instances of the NALU offset value and the number of consecutive NALUs as there are consecutive NALU sets. Each NALU offset value describes the offset of the corresponding NAL unit at the sample for the trace of the extracted data. This extractor can be used to extract NAL units starting from this offset of the NAL unit. Each value of the number of consecutive NALUs describes the number of entire single referenced NAL units of the corresponding NAL unit set to be copied. The following pseudocode provides an example definition of the media extractor class similar to MVC media extractor 440: class aligned(8) MediaExtractorMVC () { NALUnitHeaderQ; //In some instances, unsigned int(8) track_ref_index; signed int(8) Sample_offset; unsigned int(8) continuous_NALU_set_count; for (i = 0 ; i <continuous_NALU_Set_Count; i++ ) { unsigned int((lengthSizeMinusOne+1 )* 8) NALU_offset; • unsigned int((lengthSizeMinusOne+1 ) * 8) num_continuous_NALUs } } The multiplexer 30 and the demultiplexer 38 can use the 151028.doc • 71· 201119346 media extractor defined in the above example pseudocode to instantiate the media extractor data object. Thus, the demultiplexer 38, for example, may refer to the exemplified media extractor when extracting data from the selected trajectory to freely retrieve the identified material from another trajectory referenced by the instantiated media extractor. Figure 17 is a block diagram illustrating another example MVC media extractor 460 that summarizes NAL units in the same view component when there is more than one person of a view component. The MVc media extractor 460 can then be used to extract the identified view components. In the example of FIG. 7, MVC media extractor 460 includes an optional unit header 462, track reference index 464, sample offset 466, continuous view set count 468, and view component offset value 470. And the view component counts 472 the loop. The NAL unit header 462, the trajectory reference index 466, and the sample offset value 466 are typically defined in the same manner as the NAL unit header 422, the trajectory reference index 424, and the sample offset value 426, respectively. The continuous view set count 468 defines the number of consecutive view components of the identified samples in the trace for extracting data identified by the trace reference index 464. The multiplexer 30 can set the value of the continuous view set count 468 to zero to indicate that the entire referenced sample in the trajectory will be extracted. The view component offset value 470 and the view component count 472 appear in the -loop. In general, there are as many loop repeats as the value of the continuous view set count 468, and each loop corresponds to the one in the continuous view set. Each of the view component offset values 470 indicates an offset of the first view component at the sample of the track of the corresponding continuous view set for extracting the data. The view component of the self-view component can be extracted using the MVC media extractor 460 as follows: 151028.doc • 72· 201119346. Each of the view component counts 472 describes the number of entire referenced view components to be copied in the samples of the respective successive view sets. The following pseudocode provides an example definition similar to the media extractor class of the MVC media extractor 460: class aligned(8) MediaExtractorMVC () { NALUnitHeader(); // In some instances, unsigned int(8)trak_ref_index; signed int may be omitted (8) sample_offset; unsigned int(8) continuous_view_set_count; for ( i = 0 ; i <continuous_view_set_count; i++) { unsigned int((lengthSizeMinusOne+1 )* 8) view_component_offset; unsigned int((lengthSizeMinusOne+1 )* 8) view_component_count } The multiplexer 30 and the demultiplexer 38 may instantiate the media extractor data object using the media extractor defined in the above example pseudo code. Thus, the demultiplexer 38 can, for example, refer to the exemplified media extractor' when freeing data from the selected trajectory to freely retrieve the identified material from another trajectory referenced by the media extractor as exemplified. Figure U is a block diagram illustrating another example of an MVC media extractor 480 that can be used to reference various trajectories. In the example of FIG. a, MVC media extractor 480 includes an optional NaL unit header 482, a continuous view set count 484, and a sample offset value 486, a trace reference index 15l028.doc -73 - 201119346 value 488, view The component offset value 490 and the loop of the view component count 492. NAL unit header 482 may be defined similar to NAL unit header 422 and may be omitted in some examples. The continuous view set count 484 gives the number of consecutive view components of the sample with the track reference index track_ref Jndex for the extracted data. The track_ref_index value specifies the index of the track reference to find the track for extracting data. The view components in the trajectory for extracting data may be aligned in time (in the media decoding time table, using the time sample table, adjusted by the offset specified by the corresponding one of the sample offset values 486) The sample contains MediaExtractorMVC. The first track spring has an index value of 1; the value can be retained for future use. An example of MVC media extractor 480 includes each of sample offset value 牦6, trajectory reference index value 488, view component offset value 490, and view component count in a loop. Each loop repeatedly corresponds to a particular trajectory for extracting data corresponding to samples of the MVC media extractor 480. The sample offset value 486 defines the relative index of the samples in the trajectory, as indicated by the corresponding one of the trajectory reference index values 488, which can be used as information:. ^ 零 (zero) is the sample of the track reference index value 488, which is the same as the sample with the MVC media extractor 48〇, or the sample before the sample of the MVC media extractor 480. , sample 1 is inclusive, sample-1 is the same as before, and so on. Proof Each of the refurbishment reference index values 488 specifies a refusal reference for finding the information for extracting the corresponding lap & "引引 = track reference index value, MVC media extraction 9 uses multiple T from multiple different tracks 15I028.doc • 74, 201119346 trace extraction data. View component offset value 490 describes each of the traces for extracting data An offset of the first view component at the sample, the trajectory having a trajectory reference index corresponding to the corresponding one of the trajectory reference index values 488 in the loop. The MVC media extractor 480 can be used to extract the self-view component Offset starting view component. In some examples, a media extractor having a nested loop structure similar to the media extractors of Figures 15-17 can be constructed, in the nested loop structure, The outer loop is overlaid via the trajectory for extracting the sample, and the inner loop is overlaid by the sample to be extracted from the corresponding trajectory. Each of the view component counts 492 describes the number of referenced view components in the sample of the trajectory, The trajectory has a trajectory reference index corresponding to the current one of the trajectory reference index values 488 in this loop. The following pseudocode provides a media extractor class similar to the MVC media extractor 480. Example definition: class aligned(8) MediaExtractorMVC () { NALUnitHeader(); //In some instances you can omit unsigned int(8) continuous_view_set_count; for (i = 0 ; i <continuous_view_set_count; i++) { signed int(8 ) sample-offset; unsigned int(8) track-ref_index; unsigned int((lengthSizeMinusOne+1)* 8) view_component_offset; unsigned int((lengthSizeMinusOne+1 )* 8) view—component—count 151028.doc -75- 201119346 The 1/multiplexer 3G and demultiplexer 38 can instantiate the media extractor data object using the media extractor defined in the pseudo code of the above example. Thus, the demultiplexer 38 (for example) captures from the selected trajectory The data may be referred to the exemplified media extractor so as to freely retrieve the identified data from the other exemplified media extractor reference. Figure 19 is a block diagram illustrating another example MVC media extractor 5 The duration of the MVC media extractor 500 communication extractor. When different samples in the media extractor trajectory share the same syntax elements of the extractor, the MVC media extractor 500 can provide Or a plurality of advantages. In the example of FIG. 19, the MVC media extractor 500 includes a sample count 5〇2, a continuous view set count 504, a sample offset value 506, a track reference index 5〇8, a view component offset 5 1 0 , and the view component counts 5 12. The continuous view set count 504, the sample offset value 506, the track reference index 508, the view component offset 510, and the view component count 512 can generally be based on a continuous view set count 484, a sample offset value 486, a track reference index 488, a view component bias. The corresponding one of the shift 490 and the view component count 492 is defined. The sample count 502 can define the number of consecutive samples using the same media extractor in the media extractor trajectory containing the MVC media extractor. The following pseudocode provides an example definition similar to the media extractor class of the MVC media extractor 500: class aligned(8) MediaExtractorMVC () { unsigned int(8) sample count; 151028.doc •76· 201119346 unsigned int(8) continuous_view_set_count ; ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( lengthSizeMinusOne+1) * 8) view_component_count } } The multiplexer 30 and the demultiplexer 38 can instantiate the media extractor data object using the media extractor defined in the above example pseudo code. Thus, the demultiplexer 38, for example, may refer to the exemplified media extractor when extracting data from the selected trajectory to freely retrieve the identified material from another trajectory referenced by the instantiated media extractor. 20 is a block diagram illustrating another example MVC media extractor 520 that defines a set of different extractors. For each sample in the media extractor trajectory, the sample may use one or more of the extractors in the set of extractors, or reference one of the extractors. That is, one set of media extractors similar to MVC media extractor 520 can be defined, and each sample can use one or more extractors in the set of extractors or one of the extractors to identify another A sample of the falsification. Examples of MVC media extractor 520 include extractor identifier value 522, sample offset value 524, track reference index value 526, continuous view set count 151028.doc -77 - 201119346 528, and include view component offset 530 and view component Count the loop of 532. Sample offset value 524, continuous view set count 528, view component offset 5 30, and view component count 532 may be based on successive view set count 484, sample offset value 486, view component offset 490, and view component count 492. To define. The trajectory reference index value 526 can be defined in accordance with, for example, the trajectory reference index 464. Extractor identifier value 522 defines the identifier of the extractor (i.e., MVC media extractor 520). The extractor in the same media extractor trajectory is assigned a different extractor identifier such that samples in the media extractor trajectory can refer to the extractor identifier value to use the media extractor. The reference extractor bin can also be defined to include the number of extractors and the reference extractor identifier. The value of the number of extractors provides the number of extractors used to copy the data of the samples in the extractor trajectory. When the value of the number of extractors is equal to zero, an extractor having a predetermined extractor identifier (e.g., an extractor identifier equal to zero) can be used. The reference extractor identifier provides an extractor identifier for the extractor that copies the data for the samples in the extractor trajectory. This box can be included in the sample of the media extractor track. The following pseudocode provides an example definition similar to the media extractor class of the MVC media extractor 520: class aligned(8) MediaExtractorMVC () { unsigned int((lengthSizeMinusOne+l)*8) extractor_id; signed int(8) sample_offset; unsigned Int(8) track_ref_index; for ( i = 0 ; i <continuous_view_set_count; i++) { 151028.doc -78 - 201119346 unsigned int((lengthSizeMinusOne+1 )* 8) view component offset; unsigned int((lengthSizeMinusOne+1 ) * 8) viewcomponentcount; } } The multiplexer 3 〇 and multiplexer 38 can instantiate the media extractor data object using the media extractor defined in the above example pseudo code. Thus, the demultiplexer 38, for example, may refer to the exemplified media extractor' when extracting data from the selected trajectory to freely retrieve the identified material from another exemplified media extractor reference. The following pseudocode provides an example definition of the reference extractor box class of the reference extractor box described above: class aligned(8) RefExtractorMVC () { unsigned int((lengthSizeMinusOne+1 )* 8) num—extractor; for (i = 0 ; i <num_extractor; i++) ref_extractor_id ; } } Figure 21 is a block diagram illustrating an example MVC media extractor 550, which may be formed using a group of mapped samples. An instance of Mvc Media Extractor 550 specifies a group of NAL units from a series of sample items, each of which contributes to the $151028.doc •79-201119346 NAL unit in the mapped sample group. In the example of FIG. 22, MVC media extractor 550 includes a NALU group count 552, and a loop including track index 554, group description index 556, NALU start map sample 558, and NALU view count 560. The NALU group count 552 specifies the number of NAL unit groups from the mapped sample group entries in the reference trajectory. The trajectory reference index 554 each specifies an index of the trajectory reference for finding a trajectory for extracting the data of the corresponding loop. The group description index 556 each specifies an index of the mapped sample group item used to form the NAL unit group of the corresponding loop. The NALU start mapping samples 558 each specify an offset of a NAL unit in a mapped sample group in the corresponding loop, the mapped sample group having a mapped sample item index of the corresponding one of the group description index 556. The NALU view count 560 specifies the number of consecutive NAL units in the corresponding mediator to be extracted into the media extractor in the mapped sample group, the mapped sample group having the mapped sample item index of the corresponding one of the group description index 556 . The following pseudocode provides an example definition similar to the media extractor class of the MVC media extractor 550: class aligned(8) MedEtrMapSampleGroup () { unsigned int(32) NALU group count; for ( i =0; i<NALU_group_count; i++ ) { unsigned int(8) track_ref_index; unsigned int(32) group_description_index; unsigned int(8) NALU_start_map_sample; unsigned int(8) NALU_view_count; I51028.doc • 80 · 201119346 Multiplexer 30 and multiplexer 38 can be used above The media extractor defined in the example pseudocode is used to instantiate the media extractor data object. Thus, the demultiplexer 38, for example, may refer to the exemplified media extractor when extracting data from the selected trajectory to freely retrieve the identified material from another exemplified by the instantiated media extractor. The techniques of this disclosure may include a combination process for configuring a view component of a sample into a sample group. The view components in the sample of the sample group item are sorted in a time manner such that if sample A is after sample B in the original track (with the index of the track reference index), then the view component in sample eight is
媒體提取器軌跡中在樣本B中的視圖組件之後;若樣本A 具有早於樣本B之解碼時間的解碼時間,則樣本a中之視 圖組件在媒體提取器軌跡中在樣本B中之視圖組件之後; 軌跡之同一樣本中的兩個視圖組件遵循媒體提取器映射樣 本群組之語法表中之呈現次序;若軌跡之同一樣本中的兩 個視圖組件屬於NAL單元之同一群組,亦即,其由媒體提 取器映射樣本群,组中之同一迴圈之語法要素進行提取,則 該兩個視圖組件遵循原始次序;且若自在不同執跡中但且 有同一時戳之樣本提取兩個視圖組件,則該兩個視圖組= 遵循如在MVC槽案格式之視圖識別符箱中指^的視圖次序 索引之次序。 圖2 2為說明傳訊軌跡選擇箱之額外屬性的實例經修改 3GPP軌跡選擇箱39〇之方塊圖。根據此文獻㈣㈣之最 15I028.doc 201119346 新30??標準指定一入》1^1^61^31,其包括描述以下各者之 屬性:語言、頻寬、編解碼器、螢幕大小、最大封包大 小,及媒體類型。3GPP轨跡選擇箱390之屬性清單392包 括語言值394、頻寬值396、編解碼器值398及螢幕大小值 400,其根據現有3GPP標準傳訊此等屬性。此外,本發明 之技術可修改現有3GPP軌跡選擇箱以使其包括訊框率值 406、時間識別符值408,及(在一些狀況下)顯示視圖數目 值410及輸出視圖清單值412。 如在現有3GPP標準之條款5.3.3.4中所定義,語言值394 定義會話等級SDP中之「交替群組」屬性的群組類型 LANG之值。頻寬值396定義媒體等級SDP中之「b=AS」屬 性的值。編解碼器值398定義媒體軌跡之樣本描述箱中的 SampleEntry值。榮幕大小值400定義媒體執跡中之 MP4VisualSampleEntry 值及 H263SampleEntry 值的寬度及 高度欄位。最大封包大小值402定義RTPHintSampleEntry 中(例如,在RTP示意軌跡中)之MaxPacketSize欄位的值。 媒體類型值404描述媒體軌跡之處置器箱中的HandlerType。 一般而言,此等值對應於現有3GPP標準。 訊框率值406描述對應於3GPP執跡選擇箱390之視訊軌 跡或媒體提取器軌跡的訊框率。時間識別符值408對應於 對應於3GPP軌跡選擇箱390之視訊軌跡的時間識別符,且 可視具有較低時間識別符值之執跡而定。在一些實例中, 多工器30可指示,時間識別符值408之值並非藉由將值設 定為經預先組態之「非指定」值(例如,8)來指定。一般而 151028.doc -82- 201119346 。夕工益30可指示,並不指定非視訊軌跡之時間識別符 值408的值。在—些實例中,多工器別亦可指示,當相應 視訊轨跡並不含有媒體提取器及/或並未由其他轨跡料 時間子集而加以參考時,不指定時間制符值_的值。 在於3咖中考慮Mvc之實例中,多工器3〇可包括額外 屬性:顯示視圖數目值41〇及輸出視圖清單值412。在此等 實例中,多工器30可省略時間識別符值4〇8。顯示視圖數 目值4HM田述相應軌跡之將輸出之視圖的數目。舉例而 °在參考並未顯示之視圖編碼待顯示之視圖時,待輸出 2圖之數目與待解碼之視圖的數目不必相同。輸出視圖 π單值412可定義識別待輸出之Ν個視圖之Ν個視圖識別符 的清單。 圖2 3為根據本發明之技術的用於使用媒體提取器之實例 方法的流程圖。最初’諸如Α/ν源器件2〇(圖υ之源器件根 據本發明之技術建構遵照一檔案格式之檔案的視訊軌跡。 亦即,多工器30將經編碼之視訊資料組合於該軌跡中使 得視訊軌跡包括經寫碼之視訊樣本,該等視訊樣本包括一 或多個NAL單元(600)。多工器3〇亦建構一參考視訊軌跡之 一或多個NAL單元中之一些或全部的提取器(6〇2),並建構 一包括該提取器的提取器軌跡(6〇4)。此外,多工器3〇可將 經編碼之視訊樣本包括於媒體提取器軌跡及包括經編碼之 視訊樣本及/或媒體提取器的額外軌跡中。 多工器30可接著輸出檔案(6〇6)。檔案可經由傳輸器、 收發器、網路介面、數據機或其他信號輸出構件輸出至— 151028.doc •83- 201119346 信號’或檔案可經由諸如USB介面、磁性媒體記錄器、光 學記錄器之硬體介面或其他硬體介面輸出至儲存媒體。 A/V目的地器件4〇可(例如)藉由接收信號或讀取儲存媒 體來最終接收到檔案(608)。解多工器38可選擇兩個(或兩 個以上)軌跡中的一者以進行解碼(61〇)。解多工器W可美 於視訊解碼器48之解碼能力、視訊輸出44之顯現能力或其 T準則來選擇軌跡中的一者◎當選擇一提取器執跡時,解 多工器38可自該執跡擷取由提取器轨跡中之提取器泉考之 NAL單元,在該軌跡中儲存有由提取器識別之經編碼的視 訊樣本。 解多工器38可丟棄並不處於所選擇軌跡中且並未由所選 擇軌跡中之至少一提取器識別的經編碼之視訊樣本(或2 他NAL單元)。亦即’解多工器38可避免將此等經編敎 錢樣本發送至視訊解碼㈣,使得無需向視訊解碼器Μ 分派解碼未使用之視訊資料的任務。 在-或多個實例中,所描述之功能可以硬體、軟體、勃 體或其任-組合來實施。若以軟體來實施,則該等功能可 作為-或多則旨令或程式碼儲存於電腦可讀㈣上或經由 電腦可讀媒體來傳輸。電腦可讀媒體可包括諸如資料儲存 媒體或通信媒體之電腦可讀儲存媒體,該通信媒體包括促 進將電腦程式自一處轉移至另一處的任何媒體。資料儲存 媒體可為任何可用媒體,其可由—或多個電腦或-或多個 處理器存取以擷取用於實施本發明中所描述之技術的指 令、程式碼及/或資料結構。以實例說明之且並非限制, 151028.doc -84 - 201119346 此電腦可讀儲存媒體可包含:ram、r〇m、eepr〇m、 ROM或其他光碟儲存器、磁碟儲存器或其他磁性儲存 器件、快閃記憶體,或可用以儲存呈指令或資料結構形式 之所要程式碼且可由電腦存取的任何其他媒體。又,可將 任何連接適當地稱為電腦可讀媒體。舉例而言,若使用同 軸線境、光纖線欖、雙絞線、數位用戶線(dsl),或諸如 ..工外線、無線電及微波之無線技術而自網站、飼服器或其 他遠端源傳輸指令’則同軸線纜 '光纖線纜、雙絞線、 SL或諸如紅外線、無線電及微波之無線技術包括於媒 體之定義中。然而,應理解,電腦可讀儲存媒體及資料儲 存媒體並不包括連接、載波、信號或其他暫態媒體。於本 文中使用時,磁碟及光碟包括緊密光碟(CD)'雷射光碟、 光碟、數位影音光碟(_)、軟性磁碟及藍光光碟,1中 磁碟通常以磁性方式再現資料,而光碟藉由雷射以光學方 、見資料上述各者之組合亦應包括在電腦可讀媒體之 範疇内。 編碼於電腦可讀媒體中之指令可由諸如以下各者之一或 夕個處理器來執行:一或多個數位信號處理器(DSP)、通 =微處理器、特殊應用積體電路(ASIC)、場可程式化邏輯 歹UFPGA) ’或其他等效整合式或離散邏輯電路。因此, 中所使用之術語「處理器」可指代前述結構或適於 γ文中所描述之技術之任何其他結構中的任—者。此 在4樣中,本文中所描述之功能性可提供於專用 及/或軟體模組内,其經組態以用於編碼及解碼或併 151028.doc -85- 201119346 入於經組合之編解碼器中。又,該等技術可完全實施於一 或多個電路或邏輯元件中。 本發明之技術可以多種器件或裝置來實施,該等器件或 裝置包括無線手機、積體電路(1C)或1C之集合(例如,晶片 組)°在本發明中描述各種組件、模組或單元以強調經組 態以執行所揭示技術之器件的功能態樣,但不必要求由不 同硬體單元來實現。實情為,如上文所描述,各種單元可 組合於一編解碼器硬體單元中,或由包括如上文所描述之 一或多個處理器的交互操作式硬體單元之集合結合合適軟 體及/或韌體東提供。 已榀述了各種實施例。此等及其他實例係在以下申請專 利範圍之範嗜内。 【圖式簡單說明】 圖1為說明音訊/視訊(A/V)源器件將音訊資料及視訊資 料傳送至A/ν目的地器件之實例系統的方塊圖。 圖2為說明多工器之組件之實例配置的方塊圖。 圖:為說明-實例擋案之方塊圖,該實例檔案包括一具 、fl樣本之-集合的第一執跡及一具有提取器之第二軌 跡4等提取器參考第一執跡之視訊樣本之一子集。 圖4為說明包括兩個相異提取器軌跡之另—實例播案的 圖5為說明包括一子 一實例檔案的方塊圖。 圖6A至圓6C為說明 集軌跡及兩個媒體提取器軌跡之另 一檔案之媒體資料箱之實例的方塊 151028.doc -86- 201119346 圖,该媒體資料箱包括各種媒體提取器軌跡之媒體提取器 的實例。 圖7為說明實例MVC預測型式之概念圖。 圖8至圖2 1為說明根據本發明之技術的媒體提取器之資 料結構及可使用之其他支援資料結構的各種實例之方塊 圖。 圖22為說明用以傳訊執跡選擇箱之額外屬性的實例經修 改第三代合作夥伴計劃(3Gpp)轨跡選擇箱之方塊圖。 圖23為根據本發明之技術的用於使用媒體提取器之實例 方法的流程圖。 【主要元件符號說明】 10 系統 20 音訊/視訊(A/V)源器件 22 音訊源 24 視訊源 26 音訊編碼器 28 視訊編碼器 30 多工器 32 輸出介面 34 電腦可讀媒體 36 輸入介面 38 解多工器 40 音訊/視訊(A/V)目的地 42 音訊輸出 15I028.doc -87· 201119346 44 視訊輸出 46 音訊解碼器 48 視訊解碼器 60 流管理單元 62 NAL單元建構器 66 流識別符(流ID)查找單元 64 軌跡產生單元 68 提取器產生單元 80 視訊輸入介面 82 音訊輸入介面 84 多工流輸出介面 88 程式特定資訊表 100 檔案 102 MOOV 箱 104 完整子集軌跡 106 媒體提取器軌跡 110 媒體資料(MD AT)箱 112 經I編碼之樣本 114 經P編碼之樣本 116 經B編碼之樣本 118 經B編碼的樣本 120 提取器 122 提取器 124 提取器 151028.doc •88. 201119346 140 檔案 142 MOOV 箱 144 完整子集執跡 146 提取器轨跡 148 提取器軌跡 150 媒體資料(MD AT)箱 152 經I編瑪之樣本 154 經P編碼之樣本 156 經B編碼的樣本 158 經B編碼的樣本 160 提取器 162 提取器 164 提取器 166 提取器 168 提取器 180 檔案 182 MOOV 箱 184 媒體提取器轨跡 186 媒體提取器軌跡 188 子集軌跡 190 媒體資料(MD AT)箱 192 级I編碼之樣本 194 經P編碼之樣本 198 提取器 151028.doc -89- 201119346 200 提取器 202 經B編碼之樣本 204 提取器 206 提取器 208 經B編碼的樣本 210 提取器 220 媒體資料(MD AT)箱 222 錯定樣本 223 非猫定樣本 224A 視圖0樣本 224B 視圖0樣本 226A 視圖2樣本 226B 視圖2樣本 228A 視圖1樣本 228B 視圖1樣本 230A 視圖4樣本 230B 視圖4樣本 232A 視圖3樣本 232B 視圖3樣本 240 提取器集合 242A 提取器 242N 提取器 244 提取器集合 246A 提取器 151028.doc -90- 201119346 246N 提取器 250 提取器 252A 提取器樣本 252N 提取器樣本 254A 提取器 254B 提取器 256A 提取器 256B 提取器 300 媒體提取器 302 轨跡參考索引 304 樣本偏移值 310 媒體提取器 312 樣本標頭 314 軌跡參考索引 316 樣本偏移值 320 媒體提取器 322 樣本標頭 324 軌跡參考索引 326 資料偏移值 328 資料長度值 340 媒體提取器 342 軌跡參考索引 344 保留位元 346 樣本偏移值 151028.doc -91 201119346 350 媒體提取器 352 軌跡識別符 354 保留位元 356 樣本偏移值 360 媒體提取器樣本群組 362 執跡參考索引 364 群組類型 366 群組數目計數 368 保留位元 370 群組描述索引 380 媒體提取器 382 軌跡參考索引 384 時間識別符值 386 保留位元 388 樣本偏移值 390 3GPP軌跡選擇箱 392 屬性清單 394 語言值 396 頻寬值 398 編解碼器值 400 螢幕大小值 402 最大封包大小值 404 媒體類型值 406 訊框率值 151028.doc -92- 201119346 408 時間識別符值 410 顯示視圖數目值 412 輸出視圖清單值 420 媒體提取器 422 NAL單元標頭 424 軌跡參考索引 426 樣本偏移 428 連續位元組集合計數 430 資料偏移值 432 資料長度值 440 MVC媒體提取器 442 NAL單元標頭 444 執跡參考索引 446 樣本偏移值 448 連續NALU(NAL單元)集合計數 450 NALU偏移值 452 連續NAL單元之數目 460 MVC媒體提取器 462 NAL單元標頭 464 轨跡參考索引 466 樣本偏移 468 連續視圖集合計數 470 視圖組件偏移值 472 視圖組件計數 151028.doc -93- 201119346 480 MVC媒體提取器 482 NAL單元標頭 484 連續視圖集合計數 486 樣本偏移值 488 軌跡參考索引值 490 視圖組件偏移值 492 視_組件計數 500 MVC媒體提取器 502 樣本計數 504 連續視圖集合計數 506 樣本偏移值 508 執跡參考索引 510 視圖組件偏移 512 視圖組件計數 520 MVC媒體提取器 522 提取器識別符值 524 樣本偏移值 526 轨跡參考索引值 528 連續視圖集合計數 530 視圖組件偏移 532 視圖組件計數 550 MVC媒體提取器 552 NALU群組計數 554 軌跡索引 151028.doc -94- 201119346 556 群組描述索引 558 NALU起始映射樣本 560 NALU視圖計數 SO 視圖 SI 視圖 S2 視圖 S3 視圖 S4 視圖 S5 視圖 S6 視圖 S7 視圖 151028.doc -95-After the view component in sample B in the media extractor trajectory; if sample A has a decoding time earlier than the decoding time of sample B, then the view component in sample a is behind the view component in sample B in the media extractor trajectory The two view components in the same sample of the track follow the presentation order in the syntax table of the media extractor mapping sample group; if two view components in the same sample of the track belong to the same group of NAL units, ie, The sample group is mapped by the media extractor, and the syntax elements of the same loop in the group are extracted, then the two view components follow the original order; and if the samples of the same time stamp are extracted from different traces, the two view components are extracted. , then the two view groups = follow the order of the view order index as indicated in the view identifier box of the MVC slot format. Figure 2 2 is a block diagram illustrating an example of a modified 3GPP trajectory selection box 39 for additional attributes of the trajectory selection box. According to this document (4) (4), the most 15I028.doc 201119346 new 30?? standard specifies an entry "1^1^61^31, which includes attributes describing the following: language, bandwidth, codec, screen size, maximum packet Size, and media type. The attribute list 392 of the 3GPP trace selection box 390 includes a language value 394, a bandwidth value 396, a codec value 398, and a screen size value 400, which are communicated according to existing 3GPP standards. Moreover, the techniques of this disclosure may modify an existing 3GPP track selection box to include a frame rate value 406, a time identifier value 408, and (in some cases) a view view number value 410 and an output view list value 412. The language value 394 defines the value of the group type LANG of the "alternating group" attribute in the session level SDP as defined in clause 5.3.3.4 of the existing 3GPP standard. The bandwidth value 396 defines the value of the "b=AS" attribute in the media level SDP. The codec value 398 defines the SampleEntry value in the sample description box of the media track. The honor size value 400 defines the width and height fields of the MP4VisualSampleEntry value and the H263SampleEntry value in the media trace. The maximum packet size value 402 defines the value of the MaxPacketSize field in the RTPHintSampleEntry (eg, in the RTP schematic trace). The media type value 404 describes the HandlerType in the handler box of the media track. In general, these values correspond to existing 3GPP standards. The frame rate value 406 describes the frame rate corresponding to the video track or media extractor track of the 3GPP Strike Selection Box 390. The time identifier value 408 corresponds to the time identifier corresponding to the video track of the 3GPP track selection box 390 and may be subject to the presence of a lower time identifier value. In some examples, multiplexer 30 may indicate that the value of time identifier value 408 is not specified by setting the value to a pre-configured "unspecified" value (e.g., 8). Generally and 151028.doc -82- 201119346. Xigongyi 30 may indicate that the value of the time identifier value 408 of the non-video track is not specified. In some instances, the multiplexer may also indicate that the time slot value is not specified when the corresponding video track does not contain a media extractor and/or is not referenced by other time material subsets. Value. In the example of considering Mvc in 3, the multiplexer 3 may include additional attributes: a display view number value of 41 〇 and an output view list value 412. In such an example, multiplexer 30 may omit the time identifier value 4〇8. The number of views is displayed. The number of views of the corresponding trajectory of the 4HM field is output. For example, when the view to be displayed is encoded in a view not shown, the number of pictures to be outputted 2 does not have to be the same as the number of views to be decoded. Output View The π-single value 412 defines a list of view identifiers that identify the views to be output. Figure 23 is a flow diagram of an example method for using a media extractor in accordance with the teachings of the present invention. Initially, such as a Α/ν source device 2 (the source device of the figure is constructed according to the technique of the present invention to construct a video track conforming to a file of a file format. That is, the multiplexer 30 combines the encoded video data into the track. The video track includes a coded video sample, the video sample includes one or more NAL units (600), and the multiplexer 3A also constructs one or all of the reference video tracks or some or all of the NAL units. An extractor (6〇2) and constructing an extractor track (6〇4) including the extractor. In addition, the multiplexer 3 can include the encoded video sample in the media extractor track and include the encoded In the additional trace of the video sample and/or media extractor, the multiplexer 30 can then output the file (6〇6). The file can be output to the transmitter, transceiver, network interface, modem or other signal output component to - 151028.doc •83- 201119346 The signal 'or file can be output to the storage medium via a hardware interface such as a USB interface, magnetic media recorder, optical recorder or other hard interface. A/V destination device 4 〇 (eg The file is finally received (608) by receiving a signal or reading the storage medium. The demultiplexer 38 can select one of two (or more) tracks for decoding (61 〇). W can select one of the trajectories of the decoding capability of the video decoder 48, the rendering capability of the video output 44, or its T criterion. ◎ When an extractor is selected, the demultiplexer 38 can be executed from there. Taking the NAL unit of the extractor in the extractor trajectory, the encoded video sample identified by the extractor is stored in the trajectory. The multiplexer 38 can be discarded and not in the selected trajectory and is not An encoded video sample (or 2 other NAL unit) identified by at least one of the selected trajectories. That is, the 'demultiplexer 38 can prevent such encoded samples from being sent to the video decoding (4), such that There is no need to assign a task to the video decoder to decode unused video material. In the case of - or multiple instances, the functions described may be implemented in hardware, software, or any combination of them. These functions can be used as - or more Or the program code is stored on a computer readable medium (4) or transmitted via a computer readable medium. The computer readable medium can include a computer readable storage medium such as a data storage medium or a communication medium, the communication medium including facilitating the computer program from one place. Any medium that is transferred to another location. The data storage medium can be any available medium that can be accessed by - or multiple computers or - or multiple processors to retrieve instructions for implementing the techniques described in this disclosure, The code and / or data structure. By way of example and without limitation, 151028.doc -84 - 201119346 This computer readable storage medium can include: ram, r〇m, eepr〇m, ROM or other optical disk storage, magnetic A disk storage or other magnetic storage device, flash memory, or any other medium that can be used to store a desired code in the form of an instruction or data structure and accessible by a computer. Also, any connection is properly termed a computer-readable medium. For example, if you use coaxial wire, fiber optic cable, twisted pair cable, digital subscriber line (dsl), or wireless technology such as external lines, radio and microwave, from websites, feeders, or other remote sources. Transmission instructions 'coaxial cable' fiber optic cable, twisted pair, SL or wireless technologies such as infrared, radio and microwave are included in the definition of the media. However, it should be understood that computer readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media. As used herein, magnetic disks and optical discs include compact discs (CDs), laser discs, optical discs, digital audio and video discs (_), flexible magnetic discs and Blu-ray discs. One of the magnetic discs usually reproduces data magnetically, while the optical discs are used. The combination of each of the above by laser, and the combination of the above should also be included in the scope of computer readable media. The instructions encoded in the computer readable medium can be executed by one of, for example, one or more processors: one or more digital signal processors (DSPs), pass-through microprocessors, special application integrated circuits (ASICs) Field programmable logic (UFPGA) 'or other equivalent integrated or discrete logic circuits. Accordingly, the term "processor" as used in this specification may refer to any of the foregoing structures or any other structure suitable for the techniques described in the gamma. In this case, the functionality described herein can be provided in a dedicated and/or software module that is configured for encoding and decoding or as a combination of 151028.doc -85-201119346 In the decoder. Moreover, such techniques can be fully implemented in one or more circuits or logic elements. The techniques of this disclosure may be implemented in a variety of devices or devices, including wireless handsets, integrated circuits (1C) or a collection of 1C (eg, a chipset). Various components, modules or units are described in the present invention. To emphasize the functional aspects of devices configured to perform the disclosed techniques, but not necessarily required to be implemented by different hardware units. Rather, as described above, the various units may be combined in a codec hardware unit, or combined with a suitable software and/or set of interoperable hardware units including one or more processors as described above. Or firmware east provides. Various embodiments have been described. These and other examples are within the scope of the following patent application. BRIEF DESCRIPTION OF THE DRAWINGS FIG. 1 is a block diagram showing an example system in which an audio/video (A/V) source device transmits audio data and video data to an A/ν destination device. 2 is a block diagram showing an example configuration of components of a multiplexer. Figure: is a block diagram of an example-example file, the example file includes a first sample of a set of fl samples, a first track of a set, and a second track 4 with an extractor. A subset of it. Figure 4 is a block diagram illustrating an embodiment of a file including two different extractor traces. 6A through 6C are blocks 151028.doc-86-201119346 of an example of a media data box illustrating a set of tracks and another file of two media extractor tracks, the media data box including media extraction of various media extractor tracks An instance of the device. Figure 7 is a conceptual diagram illustrating an example MVC prediction pattern. Figures 8 through 21 are block diagrams showing various examples of the material structure of the media extractor and other supporting data structures that may be used in accordance with the teachings of the present invention. Figure 22 is a block diagram showing an example modified third generation partnership plan (3Gpp) track selection box for communicating additional attributes of the track selection box. 23 is a flow diagram of an example method for using a media extractor in accordance with the teachings of the present invention. [Main component symbol description] 10 System 20 Audio/Video (A/V) source device 22 Audio source 24 Video source 26 Audio encoder 28 Video encoder 30 Multiplexer 32 Output interface 34 Computer readable media 36 Input interface 38 Solution Multiplexer 40 Audio/Video (A/V) Destination 42 Audio Output 15I028.doc -87· 201119346 44 Video Output 46 Audio Decoder 48 Video Decoder 60 Stream Management Unit 62 NAL Unit Builder 66 Stream Identifier (Stream ID) Search unit 64 Track generation unit 68 Extractor generation unit 80 Video input interface 82 Audio input interface 84 Multiple stream output interface 88 Program specific information table 100 File 102 MOOV box 104 Complete subset track 106 Media extractor track 110 Media data (MD AT) box 112 I-coded sample 114 P-coded sample 116 B-coded sample 118 B-coded sample 120 Extractor 122 Extractor 124 Extractor 151028.doc • 88. 201119346 140 File 142 MOOV box 144 Complete Subsection Execution 146 Extractor Trajectory 148 Extractor Trajectory 150 Media Data (MD AT) Box 152 I-coded sample 154 P-coded sample 156 B-coded sample 158 B-coded sample 160 Extractor 162 Extractor 164 Extractor 166 Extractor 168 Extractor 180 File 182 MOOV Box 184 Media Extractor Track 186 Media Extractor Trajectory 188 Subset Trace 190 Media Data (MD AT) Box 192 Level I Coded Sample 194 P Coded Sample 198 Extractor 151028.doc -89- 201119346 200 Extractor 202 B Coded Sample 204 Extractor 206 Extractor 208 B-coded sample 210 Extractor 220 Media Data (MD AT) box 222 Missed sample 223 Non-Cat Sample 224A View 0 Sample 224B View 0 Sample 226A View 2 Sample 226B View 2 Sample 228A View 1 Sample 228B View 1 Sample 230A View 4 Sample 230B View 4 Sample 232A View 3 Sample 232B View 3 Sample 240 Extractor Set 242A Extractor 242N Extractor 244 Extractor Set 246A Extractor 151028.doc -90- 201119346 246N Extractor 250 Extractor 252A Extractor Sample 252N Extractor Sample 254A Extractor 254B Extractor 256A 256B extractor 300 media extractor 302 trajectory reference index 304 sample offset value 310 media extractor 312 sample header 314 trajectory reference index 316 sample offset value 320 media extractor 322 sample header 324 trajectory reference index 326 data bias Shift value 328 data length value 340 media extractor 342 track reference index 344 reserved bit 346 sample offset value 151028.doc -91 201119346 350 media extractor 352 track identifier 354 reserved bit 356 sample offset value 360 media extractor Sample Group 362 Tracking Reference Index 364 Group Type 366 Group Number Count 368 Reserved Bit 370 Group Description Index 380 Media Extractor 382 Track Reference Index 384 Time Identifier Value 386 Reserved Bit 388 Sample Offset Value 390 3GPP Track selection box 392 Attribute list 394 Language value 396 Band value 398 Codec value 400 Screen size value 402 Maximum packet size value 404 Media type value 406 Frame rate value 151028.doc -92- 201119346 408 Time identifier value 410 display View number value 412 Output view list value 4 20 Media Extractor 422 NAL Unit Header 424 Track Reference Index 426 Sample Offset 428 Continuous Byte Set Count 430 Data Offset Value 432 Data Length Value 440 MVC Media Extractor 442 NAL Unit Header 444 Tracking Reference Index 446 Sample Offset value 448 Continuous NALU (NAL unit) set count 450 NALU offset value 452 Number of consecutive NAL units 460 MVC media extractor 462 NAL unit header 464 Trajectory reference index 466 Sample offset 468 Continuous view set count 470 View component Offset value 472 View component count 151028.doc -93- 201119346 480 MVC media extractor 482 NAL unit header 484 continuous view set count 486 sample offset value 488 track reference index value 490 view component offset value 492 view_component count 500 MVC Media Extractor 502 Sample Count 504 Continuous View Set Count 506 Sample Offset Value 508 Excluded Reference Index 510 View Component Offset 512 View Component Count 520 MVC Media Extractor 522 Extractor Identifier Value 524 Sample Offset Value 526 Track Trace Reference Index Value 528 Continuous View Set Count 53 0 View Component Offset 532 View Component Count 550 MVC Media Extractor 552 NALU Group Count 554 Track Index 151028.doc -94- 201119346 556 Group Description Index 558 NALU Start Map Sample 560 NALU View Count SO View SI View S2 View S3 view S4 view S5 view S6 view S7 view 151028.doc -95-