EP2198425A1 - Method, module and computer program with quantization based on Gerzon vectors
- Publication number
- EP2198425A1
- Authority
- EP
- European Patent Office
- Prior art keywords
- components
- function
- quantization
- module
- vectors
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
Definitions
- The present invention relates to audio signal coding devices comprising quantization modules, intended in particular for use in applications for the transmission or storage of digitized and compressed audio signals.
- A 3D sound scene, also called spatialized sound, comprises a plurality of audio channels, each corresponding to a monophonic signal.
- each monophonic signal is encoded independently of other signals based on perceptual criteria for reducing the bit rate by minimizing the perceptual distortion of the monophonic coded signal relative to the original monophonic signal.
- State-of-the-art audio encoders of the MPEG-2/4 AAC type provide bit-rate reduction techniques that minimize the perceptual distortion of the signal.
- The coding of the multichannel signals of a sound scene includes, in certain cases, the introduction of a transformation (KLT, ambisonic, DCT, etc.) that makes it possible to better take into account the interactions that may exist between the different signals of the sound scene to be coded.
- The present invention improves this situation by proposing, in a first aspect, a method of encoding components of an audio scene comprising N signals, with N > 1, the method comprising a step of quantizing at least some of the components.
- The method is characterized in that the quantization is defined as a function of at least one energy vector and/or one velocity vector associated with the Gerzon criteria, these vectors being functions of the components.
- A method according to the invention thus proposes a quantization that takes into account the interactions between the signals of a sound scene, which makes it possible to reduce the spatial distortion of the sound scene and thus to preserve its original character.
- the allocation of bits to the spatial components is performed by considering the spatial accuracy and spatial stability of the restored sound scene.
- The audio quality of the decoded sound scene as a whole is improved for a given coding rate.
- The quantization is defined as a function of variations of at least one of said energy and velocity vectors during variations of the components.
- The allocation of bits to the various components is thus performed as a function of the impact of their respective variations on the spatial accuracy and/or the spatial stability of the decoded sound scene.
- Component variations corresponding to the minimization, or limitation, of the variations of at least one of the energy and velocity vectors are determined and, based on said component variations, quantization error values are derived in order to define the quantization of the components. This arrangement makes it possible to determine the quantization function that gives rise to a minimal or limited disturbance of the restored sound scene.
- A method according to the invention further comprises a step of detecting a transition frequency in order to determine which of the energy vector and the velocity vector is to be taken into account to define the quantization of the components.
- The components are obtained by a spatial transformation, for example of the ambisonic type.
- The transformation is a time/frequency transformation, for example a DCT, or a combination of transformations.
- The energy vector and/or the velocity vector is calculated based on an inverse spatial transformation applied to said spatial components.
- The invention proposes a module for processing components originating from an audio scene comprising N signals, with N > 1, comprising means for determining elements for defining a quantization step of at least some of the components, based at least on the energy vector and/or the velocity vector associated with the Gerzon criteria, these vectors being functions of the components.
- The invention provides an audio coder adapted to encode components of an audio scene comprising N signals, with N > 1, comprising: a component processing module according to the second aspect of the invention; and a quantization module adapted to define quantization indices associated with the components as a function of at least the elements determined by the processing module.
- The invention proposes a computer program to be installed in a processing module, said program comprising instructions for implementing the steps of a method according to the first aspect of the invention when the program is executed by processing means of said module.
- FIG. 1 represents an encoder in one embodiment of the invention.
- FIG. 2 illustrates the propagation of a plane wave in space.
- FIG. 3 represents a device for rendering a sound scene, comprising loudspeakers.
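- Gerzon's criteria are generally used to characterize the localization of synthesized virtual sound sources when the signals of a 3D sound scene are rendered over the loudspeakers of a given sound rendering system.
- The velocity vector $\vec{V}$, of polar coordinates $(r_V, \xi_V)$, is then defined as follows, in Gerzon's standard formulation for loudspeakers emitting pressures $T_i$ from angles $\xi_i$: $r_V\cos\xi_V = \frac{\sum_i T_i\cos\xi_i}{\sum_i T_i}$ and $r_V\sin\xi_V = \frac{\sum_i T_i\sin\xi_i}{\sum_i T_i}$.
- The energy vector $\vec{E}$, of polar coordinates $(r_E, \xi_E)$, is defined as follows: $r_E\cos\xi_E = \frac{\sum_i T_i^2\cos\xi_i}{\sum_i T_i^2}$ and $r_E\sin\xi_E = \frac{\sum_i T_i^2\sin\xi_i}{\sum_i T_i^2}$.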
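As a minimal sketch of the two Gerzon vectors in their standard textbook form (the gain-weighted and energy-weighted averages of the loudspeaker directions), the Python below computes their polar coordinates for a horizontal loudspeaker layout. The function name `gerzon_vectors` and its arguments are illustrative assumptions and are not taken from the patent text.

```python
import math

def gerzon_vectors(gains, angles):
    """Return ((r_V, xi_V), (r_E, xi_E)) for a horizontal loudspeaker layout.

    gains  -- signal amplitude fed to each loudspeaker (assumed not all zero)
    angles -- loudspeaker azimuths in radians, same length as gains
    """
    total = sum(gains)                    # normalises the velocity vector
    energy = sum(g * g for g in gains)    # normalises the energy vector
    vx = sum(g * math.cos(a) for g, a in zip(gains, angles)) / total
    vy = sum(g * math.sin(a) for g, a in zip(gains, angles)) / total
    ex = sum(g * g * math.cos(a) for g, a in zip(gains, angles)) / energy
    ey = sum(g * g * math.sin(a) for g, a in zip(gains, angles)) / energy
    velocity = (math.hypot(vx, vy), math.atan2(vy, vx))
    energy_vec = (math.hypot(ex, ey), math.atan2(ey, ex))
    return velocity, energy_vec
```

For two loudspeakers at plus and minus 30 degrees fed with equal gains, both vectors point straight ahead with a modulus of cos(30°), which matches the intuition of a phantom source midway between the loudspeakers.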
- The encoder described below in one embodiment of the invention uses the velocity and energy vectors associated with the Gerzon criteria in an application other than the search for the best angles characterizing the positions of the loudspeakers of a given sound rendering system.
- Figure 1 shows an audio coder 1 in one embodiment of the invention.
- the encoder 1 comprises a time / frequency transformation module 3, a spatial transformation module 4, a quantization module 6 and a module 7 for constituting a binary sequence.
- A 3D sound scene to be coded comprises N channels (with N > 1), on each of which a respective signal $S_1, \ldots, S_N$ is delivered.
- The time/frequency transformation module 3 of the encoder 1 receives as input the N signals $S_1, \ldots, S_N$ of the 3D sound scene to be encoded.
- An MDCT coefficient $Y_{i,k}$ thus represents the element of the spectrum of the signal $S_i$ for the frequency $F_k$.
- The spatial transformation module 4 is adapted to perform a spatial transformation of the input signals provided, that is, to determine the spatial components of these signals resulting from their projection onto a spatial reference frame that depends on the order of the transformation.
- the order of a spatial transformation is related to the angular frequency according to which it "scans" the sound field.
- The spatial transformation considered is the ambisonic transformation.
- The sound scene is then represented by a set of signals called ambisonic components, which store the sound information relating to the acoustic field. This representation facilitates the manipulation of the acoustic field (rotation of the sound scene, perspective distortion, i.e. the possibility of tightening the frontal scene and dilating the rear scene) and the extraction of parameters relevant for reproduction on a given device.
- Another advantage of the ambisonic transformation is that, when the number N of signals of the sound scene is large, the scene can be represented by a number L of ambisonic components much lower than N, with very little degradation of its spatial quality. The volume of data to be transmitted is thus reduced without significant degradation of the audio quality of the sound scene.
- The spatial transformation module 4 performs an ambisonic transformation, which gives a compact spatial representation of a 3D sound scene by projecting the sound field onto the associated spherical or cylindrical harmonic functions.
- For more information on ambisonic transformations, one can refer to the following documents: "Representation of acoustic fields, application to the transmission and the reproduction of complex sound scenes in a multimedia context", doctoral thesis, University Paris 6, Jérôme Daniel, July 31, 2001; and "A highly scalable spherical microphone array based on an orthonormal decomposition of the soundfield", Jens Meyer and Gary Elko, Vol. II, pp. 1781-1784, Proc. ICASSP 2002.
- The ambisonic transform of a signal $S_i$ expressed in the time domain then comprises the following $2p + 1$ components: $(P_i,\ P_i\cos\theta_i,\ P_i\sin\theta_i,\ P_i\cos 2\theta_i,\ P_i\sin 2\theta_i,\ P_i\cos 3\theta_i,\ P_i\sin 3\theta_i,\ \ldots,\ P_i\cos p\theta_i,\ P_i\sin p\theta_i)$.
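- Let $A = (A_{l,j})_{1 \le l \le L,\ 1 \le j \le N}$ be the ambisonic transformation matrix of order $p$ for the 3D scene: $$A = \begin{pmatrix} 1 & \cdots & 1 \\ \sqrt{2}\cos\theta_1 & \cdots & \sqrt{2}\cos\theta_N \\ \sqrt{2}\sin\theta_1 & \cdots & \sqrt{2}\sin\theta_N \\ \vdots & & \vdots \\ \sqrt{2}\cos p\theta_1 & \cdots & \sqrt{2}\cos p\theta_N \\ \sqrt{2}\sin p\theta_1 & \cdots & \sqrt{2}\sin p\theta_N \end{pmatrix}$$
- Let $X = (X_{l,k})_{1 \le l \le L}$ be the matrix of the ambisonic components, indexed by component $l$ and frequency $k$.
- The matrix $X$ of the ambisonic components is determined using the following equation: $X = A\,Y$, where $Y = (Y_{i,k})$ is the matrix of the MDCT coefficients.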
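The following sketch shows one way a matrix of ambisonic components could be obtained from frequency coefficients. It assumes a 2D (horizontal) transformation of order p whose rows are 1, √2·cos(mθ_j) and √2·sin(mθ_j), followed by a plain matrix product; the helper names `ambisonic_matrix` and `matmul`, and the √2 normalisation, are assumptions made for illustration rather than details confirmed by the text.

```python
import math

def ambisonic_matrix(order, angles):
    """Build a (2*order + 1) x N matrix for N sources at the given azimuths (radians)."""
    rows = [[1.0 for _ in angles]]                      # order-0 row
    for m in range(1, order + 1):
        rows.append([math.sqrt(2.0) * math.cos(m * t) for t in angles])
        rows.append([math.sqrt(2.0) * math.sin(m * t) for t in angles])
    return rows

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    inner = len(b)
    cols = len(b[0])
    return [[sum(row[n] * b[n][k] for n in range(inner)) for k in range(cols)]
            for row in a]

# Two signals (azimuths 0 and 90 degrees), two frequency bins each.
A = ambisonic_matrix(1, [0.0, math.pi / 2])
Y = [[1.0, 0.5],
     [2.0, 0.0]]
X = matmul(A, Y)   # 3 ambisonic components per frequency bin
```

With order 1 the two input signals are represented by three components; the reduction in data volume mentioned earlier only appears when N is larger than 2p + 1.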
- The method exploits the relationships between variations of the velocity and energy vectors used in the Gerzon criteria and variations of the ambisonic components.
- The quantization function thus defined is then applied to the ambisonic components received by the quantization module 6.
- Let D be the order-p ambisonic decoding matrix for a regular loudspeaker rendering system (i.e., one whose loudspeakers are arranged regularly around a point).
- The Gerzon vectors are expressed in terms of the ambisonic components of order p, with L = 2p + 1, and of the vector of the powers of the respective signals delivered to the Q' loudspeakers after ambisonic decoding.
- A variation of the values taken by the ambisonic components therefore implies a corresponding variation, or displacement, of the Gerzon vectors around their original position.
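To make this dependence concrete, the sketch below decodes a vector of 2D ambisonic components to a regular ring of loudspeakers, computes the Gerzon vectors, and measures how far they move when an error h is added to the components. The basic (projection) decoder, the √2 normalisation and all function names are assumptions chosen for illustration; the patent's own decoding matrix D is not reproduced here.

```python
import math

def decode_to_ring(x, num_speakers):
    """Basic decoding of components (1, cos, sin, cos2, sin2, ...) to a regular ring."""
    order = (len(x) - 1) // 2
    gains = []
    for q in range(num_speakers):
        phi = 2.0 * math.pi * q / num_speakers          # azimuth of loudspeaker q
        g = x[0]
        for m in range(1, order + 1):
            g += math.sqrt(2.0) * (x[2 * m - 1] * math.cos(m * phi)
                                   + x[2 * m] * math.sin(m * phi))
        gains.append(g / num_speakers)
    return gains

def gerzon_xy(gains):
    """Cartesian velocity and energy vectors for gains on a regular ring."""
    n = len(gains)
    dirs = [(math.cos(2.0 * math.pi * q / n), math.sin(2.0 * math.pi * q / n))
            for q in range(n)]
    total = sum(gains)
    energy = sum(g * g for g in gains)
    v = (sum(g * d[0] for g, d in zip(gains, dirs)) / total,
         sum(g * d[1] for g, d in zip(gains, dirs)) / total)
    e = (sum(g * g * d[0] for g, d in zip(gains, dirs)) / energy,
         sum(g * g * d[1] for g, d in zip(gains, dirs)) / energy)
    return v, e

def gerzon_displacement(x, h, num_speakers):
    """Distances moved by the velocity and energy vectors when x becomes x + h."""
    v0, e0 = gerzon_xy(decode_to_ring(x, num_speakers))
    perturbed = [a + b for a, b in zip(x, h)]
    v1, e1 = gerzon_xy(decode_to_ring(perturbed, num_speakers))
    return (math.hypot(v1[0] - v0[0], v1[1] - v0[1]),
            math.hypot(e1[0] - e0[0], e1[1] - e0[1]))
```

For a first-order source at azimuth θ decoded to four loudspeakers, the velocity vector of this decoder points exactly at θ with unit modulus, so any non-zero displacement returned by `gerzon_displacement` is attributable to the error h alone.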
- The processing module 5 seeks to determine the quantization error h of the ambisonic components, at the overall bit rate Deb, which optimizes the displacement of the Gerzon vectors.
- The optimization sought is the minimization, or the limitation within a given threshold, of the displacement of the Gerzon vectors around their position corresponding to a zero error. This amounts to looking for the value of the error vector h that allows the Gerzon vectors to keep an orientation and a modulus fairly close to those of the Gerzon vectors calculated without quantization.
- Gerzon's vectors make it possible to control the degree of spatial fidelity (stability and accuracy of the restored sound image) during the rendering of a sound scene on a given device.
- This vector (10) represents the variations of the Gerzon vectors for a displacement h of the values of the ambisonic components $(X_n)_{1 \le n \le L}$.
- the quantization module 6 is a high resolution quantizer
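A high-resolution quantizer is commonly modeled with the standard approximation that a uniform quantizer of step Δ produces an error roughly uniform over one step, with mean squared error close to Δ²/12. The sketch below illustrates that textbook approximation only; it is not the quantizer of the patent, and the names `quantize` and `mean_squared_error` are illustrative assumptions.

```python
def quantize(value, step):
    """Uniform scalar quantizer: return (index, reconstructed value)."""
    index = round(value / step)
    return index, index * step

def mean_squared_error(values, step):
    """Average squared quantization error over a list of values."""
    return sum((v - quantize(v, step)[1]) ** 2 for v in values) / len(values)

# Evenly spread test values in [0, 1) and a fine step relative to that range.
values = [(n + 0.5) / 10000.0 for n in range(10000)]
step = 0.01
mse = mean_squared_error(values, step)
approx = step * step / 12.0   # high-resolution approximation
```

Under this model, halving the step divides the error power by four, which is what links a target quantization error per component to a number of bits.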
- The optimization problem to be solved can be written as follows: (equation illegible in the source).
- The threshold element of this problem is a vector indicating a given threshold of spatial perception. This threshold vector can be determined statistically by calculating, for different rendering systems and for different ambisonic transformation orders, the threshold at which the change in the values taken by the ambisonic components becomes perceptible.
- this optimization problem is solved by the processing module using the Lagrangian method and gradient descent methods, for example using a computer program implementing the steps of the algorithm described below.
- Lagrangian and gradient descent methods are known.
- step b / it is determined, with respect to the frequency Fk,
- This determination is made by searching the coordinates of
- In step d/, the bit rate $D_j^{(l)}$ allocated for coding the j-th ambisonic component at the frequency $F_k$ is determined, equal to [...]
- The value $D^{(l)}$ is then compared with the value Deb of the desired overall bit rate.
- In step d/ of an iteration $(l_f)$, the value of the bit rate $D^{(l_f)}$ obtained [...]
- The coordinates of the vector $h^{(l_f)}$ calculated during the iteration $(l_f)$ for a frequency $F_k$ are those of the error minimizing the displacement of the Gerzon vectors at the frequency $F_k$.
- The quantization function is thus defined for each ambisonic component at each frequency $F_k$: the coordinate $h_j^{(l_f)}(k)$ calculated for the frequency $F_k$ represents the quantization error of the j-th ambisonic component at the frequency $F_k$.
- The module 6 determines the corresponding quantization indices for each ambisonic spectral component and supplies these data to the module 7 for constituting a binary sequence.
- Additional processing, for example entropy coding, may be applied to the received data.
- the invention thus proposes a novel quantization technique applicable to multichannel signals, which takes into account spatial characteristics of the scene to be encoded.
- The quantization, defined by the allocation of bits, by the quantization steps or by an index characterizing a quantizer among a set, is determined so as to cause a limited deviation of the Gerzon vectors, and thus to guarantee, when the quantized signals are rendered, an acoustic scene faithful to the original acoustic scene.
- the velocity and energy vectors are two mathematical tools introduced by Gerzon whose objective is to translate the effect of the localization, in the low and high frequency domains respectively, of a synthesized sound source. For a listener placed in the center of a reproduction device, the velocity vector V and the energy vector E are respectively associated with the location effects at low and high frequencies.
- a transition frequency is determined which determines the preponderance domains of the V and E criteria.
- For frequencies above this transition frequency, localization is predicted using the energy vector E; for frequencies below it, localization is based on the velocity vector V.
- The transition frequency corresponds to the frequency beyond which the wavelength is smaller than the size of the head. In the case of first-order ambisonic systems, this transition frequency is of the order of 700 Hz.
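The choice between the two criteria can be sketched per frequency bin as follows. The 700 Hz default comes from the text; the bin-centre formula (k + 0.5)·fs/(2M) is the usual convention for an M-coefficient MDCT and is an assumption here, as are the function names and the decision to treat the transition frequency itself as belonging to the energy domain.

```python
def bin_frequency(k, num_bins, sample_rate):
    """Centre frequency in Hz of MDCT bin k, for num_bins coefficients per frame."""
    return (k + 0.5) * sample_rate / (2.0 * num_bins)

def gerzon_criterion(frequency_hz, transition_hz=700.0):
    """Select the Gerzon vector assumed to govern localization at this frequency."""
    return "velocity" if frequency_hz < transition_hz else "energy"

def criteria_per_bin(num_bins, sample_rate, transition_hz=700.0):
    """Criterion name for every bin of one frame."""
    return [gerzon_criterion(bin_frequency(k, num_bins, sample_rate), transition_hz)
            for k in range(num_bins)]
```

With 1024 bins at 48 kHz, only the first 30 bins or so fall below 700 Hz, so under these assumptions the energy vector would drive the quantization for most of the spectrum.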
- the first problem corresponds to seeking to optimize the position of the source reconstructed after quantization in the low frequency domain
- the second problem corresponds to seeking to optimize it in the high frequency domain.
- the invention is implemented using an inverse spatial transformation of a spatial transformation used during coding.
- The Gerzon vectors are computed and used independently of any transform that may be used during coding, i.e. the invention may be implemented whether or not the signals have undergone a spatial or other transformation.
- Gerzon vectors are physical parameters that make it possible to characterize the wavefront reconstructed by the superposition of the waves emitted by the different loudspeakers (see "Representation of acoustic fields, application to the transmission and reproduction of complex sound scenes in a multimedia context", doctoral thesis, University Paris 6, July 31, 2001, Jérôme Daniel).
- Gerzon vectors can be computed without the prior use of ambisonic encoding.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| FR0757972 | 2007-10-01 | ||
| PCT/FR2008/051764 WO2009050409A1 (fr) | 2007-10-01 | 2008-09-30 | Method, module and computer program with quantization as a function of Gerzon vectors |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| EP2198425A1 (de) | 2010-06-23 |
Family
ID=39295969
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| EP08840014A Withdrawn EP2198425A1 (de) | Method, module and computer program with quantization based on Gerzon vectors | 2007-10-01 | 2008-09-30 |
Country Status (3)
| Country | Link |
|---|---|
| US (1) | US20100241439A1 (de) |
| EP (1) | EP2198425A1 (de) |
| WO (1) | WO2009050409A1 (de) |
Families Citing this family (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US9338552B2 (en) | 2014-05-09 | 2016-05-10 | Trifield Ip, Llc | Coinciding low and high frequency localization panning |
| CN115715470B (zh) | 2019-12-30 | 2025-11-18 | 卡姆希尔公司 | Method for providing a spatialized sound field |
| CN115497485B (zh) * | 2021-06-18 | 2024-10-18 | 华为技术有限公司 | Three-dimensional audio signal encoding method, apparatus, encoder and system |
Family Cites Families (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US8712061B2 (en) * | 2006-05-17 | 2014-04-29 | Creative Technology Ltd | Phase-amplitude 3-D stereo encoder and decoder |
| US8345899B2 (en) * | 2006-05-17 | 2013-01-01 | Creative Technology Ltd | Phase-amplitude matrixed surround decoder |
| US8379868B2 (en) * | 2006-05-17 | 2013-02-19 | Creative Technology Ltd | Spatial audio coding based on universal spatial cues |
2008
- 2008-09-30 EP EP08840014A patent/EP2198425A1/de not_active Withdrawn
- 2008-09-30 WO PCT/FR2008/051764 patent/WO2009050409A1/fr not_active Ceased
- 2008-09-30 US US12/681,104 patent/US20100241439A1/en not_active Abandoned
Non-Patent Citations (1)
| Title |
|---|
| See references of WO2009050409A1 * |
Also Published As
| Publication number | Publication date |
|---|---|
| WO2009050409A1 (fr) | 2009-04-23 |
| US20100241439A1 (en) | 2010-09-23 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| EP2143102B1 (de) | Audio encoding and decoding method, audio encoder, audio decoder and associated computer programs | |
| EP2374123B1 (de) | Improved encoding of multichannel digital audio signals | |
| EP2168121B1 (de) | Quantization after linear transformation combining the audio signals of a sound scene, and associated encoder | |
| EP2002424B1 (de) | Device and method for scalable encoding of a multichannel audio signal based on principal component analysis | |
| JP7789811B2 (ja) | Spatialized audio coding with interpolation and quantization of rotations | |
| WO2010076460A1 (fr) | Improved encoding of multichannel digital audio signals | |
| EP2145167B1 (de) | Audio encoding method and corresponding audio encoding device, encoded signal and computer programs | |
| JP2009524108A (ja) | Complex-transform channel coding with extended-band frequency coding | |
| EP2005420A1 (de) | Device and method for encoding a multichannel audio signal by principal component analysis | |
| EP2198425A1 (de) | Method, module and computer program with quantization based on Gerzon vectors | |
| EP4042418B1 (de) | Determination of corrections to be applied to a multichannel audio signal, associated encoding and decoding | |
| EP4172986B1 (de) | Optimized encoding of information representative of a spatial image of a multichannel audio signal | |
| EP4268374B1 (de) | Optimized encoding of rotation matrices for encoding a multichannel audio signal | |
| US20120035939A1 (en) | Method of processing signal, encoding apparatus thereof, decoding apparatus thereof, and signal processing system | |
| EP4533449A1 (de) | Spatialized audio coding with configuration of a decorrelation processing operation | |
| EP4371108A1 (de) | Optimized spherical vector quantization | |
| CN120418863A (zh) | Method and decoder for stereo decoding using a neural network model |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PUAI | Public reference made under Article 153(3) EPC to a published international application that has entered the European phase | Free format text: ORIGINAL CODE: 0009012 |
| | 17P | Request for examination filed | Effective date: 20100329 |
| | AK | Designated contracting states | Kind code of ref document: A1; Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MT NL NO PL PT RO SE SI SK TR |
| | AX | Request for extension of the European patent | Extension state: AL BA MK RS |
| | DAX | Request for extension of the European patent (deleted) | |
| | 17Q | First examination report despatched | Effective date: 20110110 |
| | GRAP | Despatch of communication of intention to grant a patent | Free format text: ORIGINAL CODE: EPIDOSNIGR1 |
| | STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
| | 18D | Application deemed to be withdrawn | Effective date: 20120327 |