EP3706119A1 - Spatial audio coding with interpolation and quantization of rotations - Google Patents



Publication number: EP3706119A1
Authority: EP; European Patent Office
Prior art keywords: matrix; channels; rotation; frame; current frame
Prior art date: 2019-03-05
Legal status: Withdrawn

Application number

EP19305254.5A

Other languages

English (en)

French (fr)

Inventor

Stéphane RAGOT

Pierre Mahe

Current Assignee

Orange SA

Original Assignee

Orange SA

Priority date

2019-03-05

Filing date

2019-03-05

Publication date

2020-09-09

2019-03-05 Application filed by Orange SA

2019-03-05 Priority to EP19305254.5A (EP3706119A1)

2020-02-10 Priority to PCT/EP2020/053264 (WO2020177981A1)

2020-02-10 Priority to CN202080031569.8A (CN113728382B)

2020-02-10 Priority to EP24203647.3A (EP4498367A1)

2020-02-10 Priority to JP2021552656A (JP7419388B2)

2020-02-10 Priority to US17/436,390 (US11922959B2)

2020-02-10 Priority to KR1020217031995A (KR20210137114A)

2020-02-10 Priority to EP20703048.7A (EP3935629B1)

2020-02-10 Priority to CN202410956721.3A (CN118692474A)

2020-02-10 Priority to PL20703048.7T (PL3935629T3)

2020-02-10 Priority to BR112021017511A (BR112021017511A2)

2020-02-10 Priority to ES20703048T (ES3012112T3)

2020-09-09 Publication of EP3706119A1

2021-09-03 Priority to ZA2021/06465A (ZA202106465B)

2024-01-09 Priority to JP2024001364A (JP7789811B2)

Status: Withdrawn (current legal status)


Classifications

- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
- G10L19/002—Dynamic bit allocation
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
- G10L19/032—Quantisation or dequantisation of spectral components
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/06—Determination or coding of the spectral characteristics, e.g. of the short-term prediction coefficients

Definitions

The present invention relates to the encoding/decoding of spatialized sound data, in particular in a surround-sound context (hereinafter also referred to as "ambisonic").
The coders/decoders (codecs) currently used in mobile telephony are mono (a single signal channel for reproduction on a single loudspeaker).
The 3GPP EVS ("Enhanced Voice Services") codec offers "Super-HD" quality (also called "High Definition+" or HD+ voice) with a super-wideband (SWB) audio band for signals sampled at 32 or 48 kHz, or a fullband (FB) audio band for signals sampled at 48 kHz; the audio bandwidth is 14.4 to 16 kHz in SWB mode (9.6 to 128 kbit/s) and 20 kHz in FB mode (16.4 to 128 kbit/s).
The next step in quality for conversational services offered by operators is expected to be immersive services, using terminals such as smartphones equipped with several microphones, spatialized audio-conferencing or telepresence-type videoconferencing equipment, or even "live" content-sharing tools, with spatialized 3D sound rendering that is far more immersive than simple 2D stereo rendering.
Ambisonics is both a recording method ("encoding" in the acoustic sense) for spatialized sound and a reproduction system ("decoding" in the acoustic sense).
A first-order ambisonic microphone comprises at least four capsules (typically of the cardioid or sub-cardioid type) arranged on a spherical grid, for example at the vertices of a regular tetrahedron.
The audio channels associated with these capsules are called "A-format". This format is converted into a "B-format", in which the sound field is decomposed into four components (spherical harmonics) denoted W, X, Y, Z, which correspond to four coincident virtual microphones.
The W component corresponds to an omnidirectional capture of the sound field, while the X, Y and Z components, which are more directional, are comparable to pressure-gradient microphones oriented along the three dimensions of space.
An ambisonic system is flexible in the sense that recording and playback are separate and decoupled. It allows decoding (in the acoustic sense) to any rendering configuration (for example binaural, 5.1 surround, or 7.1.4 immersive with elevation).
The ambisonic approach can be generalized to more than four B-format channels, and this generalized representation is commonly called "HOA" ("Higher-Order Ambisonics"). Decomposing the sound into more spherical harmonics improves the spatial accuracy of reproduction when rendering on loudspeakers.
FOA: First-Order Ambisonics
A planar variant of ambisonics decomposes the sound field defined in a plane, generally the horizontal plane. In this case, the number of components is 2N + 1 channels for order N.
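As a small illustration of the channel counts, the sketch below compares full-sphere and planar ambisonics. The full-sphere count (N + 1)² is the standard number of spherical harmonics up to order N and is not stated in the text above; the function name `ambisonic_channel_count` is hypothetical.

```python
def ambisonic_channel_count(order: int, planar: bool = False) -> int:
    """Number of ambisonic components for ambisonic order N.

    Full-sphere ambisonics uses (N + 1)**2 spherical harmonics;
    planar (horizontal-only) ambisonics uses 2N + 1 circular harmonics.
    """
    if order < 0:
        raise ValueError("order must be non-negative")
    return 2 * order + 1 if planar else (order + 1) ** 2


for n in range(1, 4):
    print(f"order {n}: 3D = {ambisonic_channel_count(n)}, "
          f"planar = {ambisonic_channel_count(n, planar=True)}")
```

For order 1 this gives the four channels W, X, Y, Z and the three planar channels W, X, Y.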
First-order ambisonics (4 channels: W, X, Y, Z) and first-order planar ambisonics (3 channels: W, X, Y) are hereinafter both referred to as "ambisonics" to ease reading, since the processing presented applies whether or not the representation is planar.
The terms "first-order ambisonics" and "first-order planar ambisonics" are used. Note that a stereo signal (2 channels) can be derived from the first-order B-format, corresponding to coincident stereo pickups of the Blumlein crossed-pair type (X + Y and X − Y) or of the Mid-Side type (combining W and X for Mid and taking Y as Side).
Ambisonic sound: a B-format signal of a predetermined order.
The ambisonic sound can also be defined in another format, such as A-format or channels pre-combined by fixed matrixing (keeping the number of channels or reducing it to 3 or 2 channels), as will be seen further on.
The signals to be processed by the encoder/decoder are presented as successions of blocks of sound samples, called "frames" or "sub-frames" below.
The simplest approach to encoding a stereo or ambisonic signal is to apply a mono encoder in parallel to all channels, possibly with a different bit allocation per channel. This approach is called "multi-mono" here (even though in practice it can be generalized to multi-stereo or to the use of several parallel instances of the same core codec).
The input signal is split into (mono) channels by block 100. These channels are encoded individually by blocks 120 to 122 according to a predetermined allocation. Their bitstreams are multiplexed (block 130) and, after transmission and/or storage, demultiplexed (block 140) so that each channel can be decoded (blocks 150 to 152); the decoded channels are then recombined (block 160).
The MPEG-H codec for ambisonic sound uses an overlap-add operation, which adds delay and complexity, as well as linear interpolation of direction vectors, which is suboptimal and introduces defects.
A basic problem with this codec is that it relies on a decomposition into predominant components and ambience, on the assumption that the predominant components are perceptually distinct from the ambience, but this decomposition is not completely specified.
The MPEG-H encoder also suffers from a lack of correspondence between the directions of the principal components from one frame to the next: the order of the components (signals) can be swapped, as can the associated directions. This is why the MPEG-H codec uses a "matching" and overlap-add technique to solve this problem.
The present invention makes it possible to improve the decorrelation between the N channels that are subsequently encoded separately.
This separate encoding is hereinafter also referred to as “multi-mono encoding”.
The ambisonic representation is of order 1, the number N of channels is four, and the rotation matrix of the current frame is represented by two quaternions.
Each interpolation for a current sub-frame is a spherical linear interpolation ("SLERP") carried out between the quaternions of the previous frame and the quaternions of the current frame.
SLERP: spherical linear interpolation
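A minimal SLERP sketch for unit quaternions stored as `(a, b, c, d)` tuples, assuming the usual formula with a shortest-path sign flip (q and -q represent the same 3D rotation). The function name and the small-angle fallback threshold are illustrative choices, not taken from the patent.

```python
import math

def slerp(q0, q1, w):
    """Spherical linear interpolation between unit quaternions (a, b, c, d).

    w = 0 gives q0 and w = 1 gives q1, possibly negated: since q and -q
    represent the same rotation, the shorter of the two arcs is followed."""
    dot = sum(x * y for x, y in zip(q0, q1))
    if dot < 0.0:
        q1 = tuple(-x for x in q1)
        dot = -dot
    theta = math.acos(min(1.0, dot))       # angle between the 4D unit vectors
    if theta < 1e-8:                        # nearly equal: normalized lerp
        out = [(1 - w) * x + w * y for x, y in zip(q0, q1)]
    else:
        s0 = math.sin((1 - w) * theta) / math.sin(theta)
        s1 = math.sin(w * theta) / math.sin(theta)
        out = [s0 * x + s1 * y for x, y in zip(q0, q1)]
    norm = math.sqrt(sum(x * x for x in out))
    return tuple(x / norm for x in out)
```

With K sub-frames, a natural (assumed) choice is w = k/K for k = 1, ..., K, so that the last sub-frame reaches the rotation of the current frame.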
The search for the eigenvectors is performed by principal component analysis ("PCA") or by the Karhunen-Loève transform ("KLT"), in the time domain.
PCA: principal component analysis
KLT: Karhunen-Loève transform
Other embodiments can be considered (singular value decomposition, or others).
The present invention also covers a coding device comprising a processing circuit for implementing the coding method presented above.
the signals are represented by successive blocks of sound samples, these blocks being called “sub-frames” below.
The invention uses a representation of rotations in dimension n with parameters suited to per-frame quantization and, above all, to efficient per-sub-frame interpolation.
The representations of rotations used in dimensions 2, 3 and 4 are defined below.
In 2D, the interpolation between two rotations of respective angles $\theta_1$ and $\theta_2$ can be done by linear interpolation between $\theta_1$ and $\theta_2$, taking into account the shortest-path constraint on the unit circle between these two angles.
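A sketch of shortest-path angle interpolation for the 2D case, assuming angles in radians; wrapping the difference to [-π, π) is one way to enforce the shortest path on the unit circle. `interp_angle` and `rotation_2d` are hypothetical helper names.

```python
import math

def interp_angle(theta1, theta2, w):
    """Interpolate from theta1 (w = 0) to theta2 (w = 1) along the shorter arc.

    The difference is wrapped to [-pi, pi) first, so going from 170 degrees
    to -170 degrees passes through 180 degrees rather than through 0."""
    delta = (theta2 - theta1 + math.pi) % (2 * math.pi) - math.pi
    out = theta1 + w * delta
    return (out + math.pi) % (2 * math.pi) - math.pi   # back to [-pi, pi)

def rotation_2d(theta):
    """2x2 rotation matrix of angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]
```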
In 3D, a 3×3 rotation matrix can be decomposed into a product of 3 elementary rotations of angle $\theta$ about the x, y or z axes.
These angles are called Euler or Cardan angles.
For a quaternion $q = a + bi + cj + dk$, the real part $a$ is called the scalar part and the three imaginary parts $(b, c, d)$ form a 3D vector.
Unit quaternions (of norm 1) represent 3D rotations; however, this representation is not unique: if $q$ represents a rotation, $-q$ represents the same rotation.
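To illustrate the sign ambiguity, the sketch below builds the 3×3 rotation matrix of a unit quaternion with the standard (Hamilton) formula and checks that q and -q give the same matrix, since every entry is quadratic in the components. The helper name `quat_to_rotation` is hypothetical.

```python
import math

def quat_to_rotation(q):
    """3x3 rotation matrix of the unit quaternion q = a + b i + c j + d k."""
    a, b, c, d = q
    return [
        [a*a + b*b - c*c - d*d, 2*(b*c - a*d),         2*(b*d + a*c)],
        [2*(b*c + a*d),         a*a - b*b + c*c - d*d, 2*(c*d - a*b)],
        [2*(b*d - a*c),         2*(c*d + a*b),         a*a - b*b - c*c + d*d],
    ]

h = math.sqrt(0.5)
q = (h, 0.0, 0.0, h)                          # 90 degrees about the z axis
R_pos = quat_to_rotation(q)
R_neg = quat_to_rotation(tuple(-x for x in q))  # same rotation, opposite sign
```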
For a matrix $A$ with singular value decomposition $A = U \Sigma V^T$, the coefficients $\sigma_i$ on the diagonal of $\Sigma$ are the singular values of $A$. By convention, they are generally listed in decreasing order, in which case the diagonal matrix $\Sigma$ associated with $A$ is unique.
The rank $r$ of $A$ is given by the number of non-zero coefficients $\sigma_i$.
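The singular value decomposition can therefore be rewritten as:
$$A = \begin{bmatrix} U_r & \tilde{U}_r \end{bmatrix} \begin{bmatrix} \Sigma_r & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} V_r^T \\ \tilde{V}_r^T \end{bmatrix}$$
where $U_r = [u_1, u_2, \ldots, u_r]$ are the left singular vectors (or output vectors) of $A$,
$\Sigma_r = \mathrm{diag}(\sigma_1, \ldots, \sigma_r)$,
and $V_r = [v_1, v_2, \ldots, v_r]$ are the right singular vectors (or input vectors) of $A$.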
The non-zero eigenvalues of $\Sigma^T \Sigma$ and $\Sigma \Sigma^T$ are $\sigma_1^2, \ldots, \sigma_r^2$.
The columns of $U$ are the eigenvectors of $AA^T$.
The columns of $V$ are the eigenvectors of $A^TA$.
The SVD can be interpreted geometrically: the image of the unit sphere in dimension $n$ under $A$ is a hyper-ellipse in dimension $m$ whose principal semi-axes point in the directions $u_1, u_2, \ldots, u_m$ and have lengths $\sigma_1, \ldots, \sigma_m$.
The KLT decorrelates the components of $x$: the variances of the transformed vector $y$ are the eigenvalues of the covariance matrix $R_{xx}$ of $x$.
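For two channels the KLT reduces to a single 2D rotation, which connects it to the one-angle representation used for the stereo case. The sketch below, assuming zero-mean input lists and the 1/(L - 1) normalization, computes the rotation angle in closed form from the 2×2 covariance (tan 2θ = 2C₁₂ / (C₁₁ - C₂₂)); `klt_2ch` is a hypothetical name.

```python
import math

def klt_2ch(x1, x2):
    """Decorrelate two zero-mean channels with one 2D rotation (2-point KLT)."""
    L = len(x1)
    c11 = sum(a * a for a in x1) / (L - 1)
    c22 = sum(b * b for b in x2) / (L - 1)
    c12 = sum(a * b for a, b in zip(x1, x2)) / (L - 1)
    # Angle of the principal eigenvector (direction of largest variance).
    theta = 0.5 * math.atan2(2 * c12, c11 - c22)
    c, s = math.cos(theta), math.sin(theta)
    y1 = [c * a + s * b for a, b in zip(x1, x2)]     # projection on eigenvector 1
    y2 = [-s * a + c * b for a, b in zip(x1, x2)]    # projection on eigenvector 2
    return y1, y2, theta
```

The output y1 carries the larger eigenvalue and the cross term between y1 and y2 vanishes up to rounding.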
PCA is generally seen as a dimensionality-reduction technique, used to "compress" a high-dimensional dataset into a set of a few principal components.
Here, PCA is used advantageously to decorrelate the multidimensional input signal, but eliminating channels (and thus reducing the number of channels) is avoided so as not to introduce artifacts.
A minimum encoding rate is thus enforced to avoid "truncating" the spatial image, except in specific variants where the eigenvalues are so low that a zero rate can be allowed (for example, to better encode artificially created ambisonic sounds with a single synthetically spatialized source).
Figure 7 illustrates an example of a structural embodiment of a codec (coder or decoder) within the meaning of the invention.
The encoder's strategy is to decorrelate the channels of the ambisonic signal as much as possible and to encode them with a core codec. This strategy limits artifacts in the decoded ambisonic signal. More particularly, an optimized decorrelation of the input channels is applied here before multi-mono coding.
An interpolation whose computational cost is limited for both the encoder and the decoder, because it is carried out in a specific domain, avoids interpolating the covariance matrices computed for the PCA/KLT analysis and repeating an eigenvalue and eigenvector decomposition several times per frame.
The core codec can typically be an extension of the standardized 3GPP EVS ("Enhanced Voice Services") encoder.
The EVS coding rates can then be used without modifying the structure of the EVS bitstream.
The multi-mono coding (block 340 of figure 3, described later) operates here with a possible allocation to each transformed channel restricted to the following bit rates for super-wideband audio coding: 9.6, 13.2, 16.4, 24.4, 32, 48, 64, 96 and 128 kbit/s.
Additional bit rates can be added to obtain a finer allocation granularity.
Block 300 receives an input signal Y in the current frame of index t.
The index t is omitted here so as not to weigh down the notation.
Y is a matrix of size n × L.
Here n = 4 channels W, Y, Z, X (ordered according to the ACN convention), which can be normalized according to the SN3D convention.
Alternatively, the channel order can be, for example, W, X, Y, Z (following the FuMa convention), and the normalization can be different (N3D or FuMa).
The signal in each channel is sampled at 48 kHz, without loss of generality.
Block 300 of the encoder applies an (optional) preprocessing to obtain the preprocessed input signal, denoted X.
This may be a high-pass filtering (with a cutoff frequency typically at 20 Hz) of each new 20 ms frame of the input signal channels. This operation removes the DC component, which would otherwise bias the estimation of the covariance matrix, so that the signal at the output of block 300 can be considered to have zero mean.
Filtering may also be implemented within the multi-mono encoding of block 340; however, when block 300 is implemented, the high-pass filtering in the preprocessing of the mono encoder used in block 340 is preferably disabled, to avoid repeating the same preprocessing and thus reduce the overall complexity.
$$M_{B \to A} = \begin{bmatrix} 1/2 & 1/\sqrt{6} & 0 & 1/\sqrt{12} \\ 1/2 & -1/\sqrt{6} & 0 & 1/\sqrt{12} \\ 1/2 & 0 & 1/\sqrt{6} & -1/\sqrt{12} \\ 1/2 & 0 & -1/\sqrt{6} & -1/\sqrt{12} \end{bmatrix}$$
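The rows of the matrix $M_{B \to A}$ have the form ½(1, d_i), where the d_i are unit vectors pointing to the vertices of a regular tetrahedron, which is consistent with a conversion between B-format and the A-format of a tetrahedral microphone. The sketch below checks this numerically and applies the matrix to a block of channels. The extracted text does not say which B-format channel each column corresponds to, so the code simply follows the printed column order; the names are hypothetical.

```python
import math

S6, S12 = 1 / math.sqrt(6), 1 / math.sqrt(12)
# Rows of M_{B->A}; column order follows the printed matrix.
M_B_TO_A = [
    [0.5,  S6, 0.0,  S12],
    [0.5, -S6, 0.0,  S12],
    [0.5, 0.0,  S6, -S12],
    [0.5, 0.0, -S6, -S12],
]

def matrix_channels(m, channels):
    """Apply a fixed matrix m to a list of channels (each a list of samples)."""
    L = len(channels[0])
    return [[sum(m[i][j] * channels[j][k] for j in range(len(channels)))
             for k in range(L)] for i in range(len(m))]

# Each row is 1/2 * (1, d_i); recover the unit direction vectors d_i.
dirs = [[2 * v for v in row[1:]] for row in M_B_TO_A]
```

For a regular tetrahedron, every pair of distinct directions has dot product -1/3, which the test below checks.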
The following block 310 estimates, at each frame t, a transformation matrix obtained by determining the eigenvectors by PCA/KLT and checking that the transformation matrix formed by these eigenvectors does characterize a rotation. Details of the operation of block 310 are given below with reference to figure 4.
This transformation matrix performs a matrixing of the channels to decorrelate them, which makes it possible to apply independent multi-mono coding in block 340.
Block 310 transmits to the multiplexer the quantization indices representing the transformation matrix and, optionally, information encoding the number of interpolations of the transformation matrix per sub-frame of the current frame t, as detailed further below.
Block 320 determines the optimal bit-rate allocation for each channel (after the PCA/KLT transformation) as a function of a given budget of B bits. This block seeks a distribution of the bit rate between channels by computing a score for each possible combination of bit rates; the optimal allocation is the combination that maximizes this score. Several criteria can be used to define the score of each combination. For example, the possible rates for mono encoding of a channel may be limited to the nine discrete super-wideband rates of the EVS codec: 9.6, 13.2, 16.4, 24.4, 32, 48, 64, 96 and 128 kbit/s.
Since the codec according to the invention operates at a given bit rate, associated with a budget of B bits in the current frame of index t, generally only a subset of these listed bit rates can be used.
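The allocation maximizes the score $S(b_{t,1}, \ldots, b_{t,n})$ over the combinations of per-channel rates satisfying $\sum_{i=1}^{n} b_{t,i} \le B$.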
The factor $E_i$ can be set to the eigenvalue associated with channel $i$, resulting from the eigenvalue decomposition of the signal at the input of block 310, after any signed permutation.
MOS score values for each listed bit rate can be derived from other tests (subjective or objective) predicting the quality of the codec. The MOS scores used in the current frame can also be adapted according to a classification of the signal type (for example speech without background noise, speech with ambient noise, music, or mixed content), reusing classification methods implemented in the EVS codec and applying them to the W channel of the ambisonic input signal before performing the bit allocation.
The MOS score can also correspond to an average score resulting from different methodologies and rating scales: MOS (absolute) from 1 to 5, DMOS (from 1 to 5), MUSHRA (from 0 to 100).
If another core codec is used, the list of bit rates $b_i$ and the scores $Q(b_i)$ can be replaced accordingly. Additional coding rates can also be added to the EVS encoder, completing the list of rates and MOS scores, or the EVS encoder can even be modified, potentially along with the associated MOS scores.
The allocation between channels is refined by weighting the energy by a power $\alpha$, where $\alpha$ takes a value between 0 and 1.
A second weighting can be added to the score function to penalize inter-frame rate changes: a penalty is applied to the score if the rate combination in frame t differs from that in frame t − 1.
This additional weighting limits overly frequent fluctuations of the bit rate between channels. With this weighting, only significant changes in energy cause a change of rate. The value of the penalty constant can also be varied to adjust the stability of the allocation.
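An exhaustive allocation search in the spirit of block 320, as a sketch under stated assumptions: the score is assumed to take the form S = Σ E_i^α Q(b_i), minus a constant penalty when the rate combination differs from the previous frame, and `placeholder_quality` is a made-up increasing stand-in for the MOS table Q(b), not measured EVS scores. The rate list is the one given in the text; the budget is expressed in kbit/s rather than bits per frame for readability.

```python
import itertools
import math

# Per-channel EVS super-wideband rates listed in the text (kbit/s).
EVS_SWB_RATES = (9.6, 13.2, 16.4, 24.4, 32.0, 48.0, 64.0, 96.0, 128.0)

def placeholder_quality(rate):
    # Hypothetical stand-in for Q(b): increasing and concave in the rate.
    # NOT measured EVS MOS values.
    return math.log(rate)

def allocate(energies, budget, alpha=0.5, prev=None, penalty=0.1,
             rates=EVS_SWB_RATES, quality=placeholder_quality):
    """Return the rate combination maximizing the assumed score
    S = sum_i E_i**alpha * Q(b_i) (minus `penalty` if it differs from `prev`)
    under the constraint sum_i b_i <= budget (kbit/s)."""
    best, best_score = None, -math.inf
    for combo in itertools.product(rates, repeat=len(energies)):
        if sum(combo) > budget + 1e-9:       # tolerance for float sums
            continue
        score = sum(e ** alpha * quality(b) for e, b in zip(energies, combo))
        if prev is not None and combo != tuple(prev):
            score -= penalty                 # discourage inter-frame changes
        if score > best_score:
            best, best_score = combo, score
    return best

print(allocate([4.0, 1.0, 0.5, 0.1], budget=64.0))
```

With n = 4 channels and 9 rates there are only 9⁴ = 6561 combinations, so brute force is cheap per frame.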
The selected bit-rate combination is coded by block 330, for example by exhaustively indexing all bit-rate combinations.
The index can then be represented by a coding of the "permutation code" + "combination offset" type. For example, when the 16 bit-rate combinations, comprising the 4 permutations of (13.2, 13.2, 13.2, 9.6) and the 12 permutations of (16.4, 13.2, 9.6, 9.6), are coded on a 4-bit index, indices 0 to 3 can code the first 4 permutations (offset 0, code 0 to 3) and indices 4 to 15 the other 12 permutations (offset 4, code 0 to 11).
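The permutation-code-plus-offset indexing of this example can be sketched as follows. The lexicographic ordering of the permutations inside each group is an assumption; any ordering shared by encoder and decoder would work. The names are hypothetical.

```python
from itertools import permutations

# The two rate patterns of the example (kbit/s).
GROUPS = [(13.2, 13.2, 13.2, 9.6), (16.4, 13.2, 9.6, 9.6)]

def build_codebook(groups):
    """List of (offset, distinct permutations) per group, plus the total size."""
    book, offset = [], 0
    for g in groups:
        perms = sorted(set(permutations(g)))   # distinct orderings, lexicographic
        book.append((offset, perms))
        offset += len(perms)
    return book, offset

def encode(combo, book):
    """Index = group offset + rank of the permutation inside its group."""
    for offset, perms in book:
        if combo in perms:
            return offset + perms.index(combo)
    raise ValueError("combination not in codebook")

def decode(index, book):
    for offset, perms in book:
        if offset <= index < offset + len(perms):
            return perms[index - offset]
    raise ValueError("index out of range")

BOOK, TOTAL = build_codebook(GROUPS)
print(TOTAL, (TOTAL - 1).bit_length())   # 16 combinations fit in 4 bits
```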
The covariance matrix C can be replaced by the correlation matrix, in which the channels are pre-normalized by their respective standard deviations, or, more generally, weights reflecting a relative importance can be applied to each channel; moreover, the normalization term 1/(L − 1) can be omitted or replaced by another value (for example 1/L).
The values $C_{ij}$ correspond to the covariance between $x_i$ and $x_j$.
The encoder then performs, in block 410, an eigenvalue decomposition (EVD), computing the eigenvalues and eigenvectors of the matrix C.
The eigenvectors are denoted here $V_t$ to indicate the frame index t, because the eigenvectors $V_{t-1}$ obtained in the previous frame, of index t − 1, are preferably stored and used subsequently.
The eigenvalues are denoted $\lambda_1, \lambda_2, \ldots, \lambda_n$.
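A self-contained sketch of the covariance estimate and the eigenvalue decomposition of block 410, using the classical cyclic Jacobi method. This is one possible EVD algorithm; the text does not specify which one is used. Channels are assumed zero-mean, as ensured by the preprocessing of block 300, and the function names are hypothetical.

```python
import math

def covariance(X):
    """C = X X^T / (L - 1) for n zero-mean channels of L samples each."""
    n, L = len(X), len(X[0])
    return [[sum(X[i][k] * X[j][k] for k in range(L)) / (L - 1)
             for j in range(n)] for i in range(n)]

def jacobi_eig(C, tol=1e-24, max_sweeps=50):
    """Eigen-decomposition of a symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, V): eigenvalues in decreasing order and the matching
    eigenvectors as the columns of V."""
    n = len(C)
    A = [row[:] for row in C]
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    scale = sum(v * v for row in A for v in row) or 1.0
    for _ in range(max_sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= tol * scale:               # relative convergence test
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                # Angle that zeroes A[p][q] in J^T A J.
                theta = 0.5 * math.atan2(2 * A[p][q], A[q][q] - A[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):          # A <- A J (columns p and q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):          # A <- J^T A (rows p and q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):          # accumulate V <- V J
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    order = sorted(range(n), key=lambda i: -A[i][i])
    return [A[i][i] for i in order], [[V[r][i] for i in order] for r in range(n)]
```

Because the Jacobi method builds V as a product of plane (Givens) rotations, V has determinant +1 before the columns are sorted; sorting by eigenvalue may introduce a reflection, which is what the signed permutation and determinant check of the following blocks deal with.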
Alternatively, a singular value decomposition (SVD) of the preprocessed channels X can be used.
The encoder then applies, in block 420, a first signed permutation of the columns of the transformation matrix for frame t (whose columns are the eigenvectors), to avoid too large a disparity with the transformation matrix of the previous frame t − 1, which would cause clicks at the boundary with the previous frame.
The eigenvectors of frame t are permuted so that the associated basis is as close as possible to the basis of frame t − 1. This improves the continuity of the transformed signal frames (once the transformation matrix is applied to the channels).
The transformation matrix must correspond to a rotation. This constraint guarantees that the encoder can convert the transformation matrix into generalized Euler angles (block 430) in order to quantize them (block 440) with a predetermined bit budget, as seen previously. For this purpose, the determinant of this matrix must be positive (typically equal to +1).
The transformation matrix resulting from blocks 410 and 420 is an orthogonal (unitary) matrix whose determinant can be −1 or +1, that is, a reflection or a rotation matrix.
If the transformation matrix is a reflection matrix (determinant equal to −1), it can be turned into a rotation matrix by inverting one eigenvector (for example, the eigenvector associated with the lowest eigenvalue) or by swapping two columns (eigenvectors).
Certain eigenvalue decomposition methods, for example those based on Givens rotations, yield transformation matrices that are intrinsically rotation matrices (determinant +1); in this case, the step of verifying that the determinant is +1 is optional.
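A sketch of the signed permutation and the determinant correction. The matching criterion (maximize Σ_j |⟨v_j^(t-1), v_π(j)^(t)⟩| over permutations π, then set each sign from the corresponding dot product) is an assumption; the text only requires the new basis to be as close as possible to the previous one. A reflection is corrected, as suggested above, by negating the eigenvector with the lowest eigenvalue. All names are hypothetical.

```python
import itertools

def det(M):
    """Determinant by Gaussian elimination with partial pivoting."""
    A = [list(map(float, row)) for row in M]
    n, d = len(A), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        if A[p][c] == 0.0:
            return 0.0
        if p != c:
            A[c], A[p] = A[p], A[c]
            d = -d                          # a row swap flips the sign
        d *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= f * A[c][k]
    return d

def column(M, j):
    return [row[j] for row in M]

def match_to_previous(V_prev, V_cur, eigvals):
    """Signed column permutation of V_cur closest to V_prev, then a sign flip
    of the weakest eigenvector if the result is a reflection (det = -1)."""
    n = len(V_cur)
    dots = [[sum(a * b for a, b in zip(column(V_prev, i), column(V_cur, j)))
             for j in range(n)] for i in range(n)]
    # Assumed criterion: maximize sum_i |<prev_i, cur_perm(i)>|.
    best = max(itertools.permutations(range(n)),
               key=lambda perm: sum(abs(dots[i][perm[i]]) for i in range(n)))
    signs = [1.0 if dots[i][best[i]] >= 0 else -1.0 for i in range(n)]
    V = [[signs[j] * V_cur[r][best[j]] for j in range(n)] for r in range(n)]
    vals = [eigvals[best[j]] for j in range(n)]
    if det(V) < 0:
        j = min(range(n), key=lambda k: vals[k])
        for r in range(n):
            V[r][j] = -V[r][j]
    return V, vals
```

Brute force over permutations is acceptable here since n ≤ 4 gives at most 24 candidates.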
Block 430 converts the rotation matrix into parameters.
An angular representation is used for quantization: 6 generalized Euler angles in the 4D case, 3 Euler angles in the 3D case, and a single angle in the 2D case.
For the ambisonic case (four channels), six generalized Euler angles are obtained according to the method described in the article "Generalization of Euler Angles to N-Dimensional Orthogonal Matrices" by David K. Hoffman, Richard C. Raffenetti and Klaus Ruedenberg, published in Journal of Mathematical Physics 13, 528 (1972); for planar ambisonics (three channels), three Euler angles are obtained, and for the stereo case a rotation angle is obtained according to methods well known in the state of the art.
The values of the angles are quantized in block 440 with a predetermined bit budget.
Scalar quantization is used, and the quantization step is, for example, identical for each angle.
For example, the 6 generalized Euler angles are coded with 3 × (8 + 9) = 51 bits: 3 angles defined on the interval $[-\pi/2, \pi/2]$ are coded on 8 bits with a step of $\pi/256$, and the 3 other angles, defined on $[-\pi, \pi]$, are coded on 9 bits, also with a step of $\pi/256$.
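A sketch of this 51-bit scalar quantization of the six generalized Euler angles. Which three angles lie in [-π/2, π/2] depends on the angle convention and is not stated here, so the code assumes they are the first three. The [-π, π] angles are treated as periodic, so -π and π share an index; that wrap and the function names are illustrative assumptions.

```python
import math

STEP = math.pi / 256          # common quantization step for all six angles
TOTAL_BITS = 3 * (8 + 9)      # 51 bits per frame

def q_clipped(theta, lo, bits):
    """Index of the nearest level lo + i*STEP, clipped to 0..2**bits - 1."""
    idx = round((theta - lo) / STEP)
    return max(0, min(2 ** bits - 1, idx))

def q_periodic(theta, bits=9):
    """[-pi, pi) on 2**9 = 512 levels of pi/256; -pi and +pi share index 0."""
    return round((theta + math.pi) / STEP) % (2 ** bits)

def quantize_euler6(angles):
    # Assumption: angles[0:3] lie in [-pi/2, pi/2], angles[3:6] in [-pi, pi].
    return ([q_clipped(a, -math.pi / 2, 8) for a in angles[:3]]
            + [q_periodic(a) for a in angles[3:]])

def dequantize_euler6(idx):
    return ([-math.pi / 2 + i * STEP for i in idx[:3]]
            + [-math.pi + i * STEP for i in idx[3:]])
```

Inside the ranges, the reconstruction error is at most half a step, π/512.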
the quantization indices of the transformation matrix are sent to the multiplexer (block 350).
Block 440 can also convert the quantized parameters into a quantized rotation matrix $\hat{V}_t$ if the parameters used for quantization differ from those used for interpolation.
The number of interpolations can be fixed (equal to a predetermined value) or adaptive. Each frame is then divided into sub-frames according to the number of interpolations determined in block 450.
Block 450 can code, on a chosen number of bits, the number of interpolations to be performed, and therefore the number of sub-frames, when this number is determined adaptively; with a fixed number of interpolations, no information needs to be coded.
Block 460 converts the rotation matrices into a specific domain (angle, quaternion or double quaternion) representing the rotation.
The frame is divided into sub-frames, and the interpolation is performed for each sub-frame in the chosen domain.
In the 4D case, the encoder reconstructs a quantized 4D rotation matrix from the 6 quantized Euler angles, and this matrix is then converted into two unit quaternions for interpolation purposes.
In the 3D case, the encoder reconstructs a quantized 3D rotation matrix from the 3 quantized Euler angles, and this matrix is then converted into a unit quaternion for interpolation purposes.
In the 2D case, the encoder uses in block 460 the representation of the quantized 2D rotation by a rotation angle.
The rotation matrix computed for frame t is factored into 2 quaternions (a double quaternion) by Cayley factorization, and the double quaternion stored for the previous frame t − 1, denoted $(Q_{L,t-1}, Q_{R,t-1})$, is used.
The quaternions are interpolated pairwise (left with left, right with right) in each sub-frame.
The block determines the shortest path between the two possible choices (Q_L,t or -Q_L,t); depending on the case, the sign of the quaternion of the current frame is reversed.
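Per-sub-frame interpolation with the sign test described above can be sketched with spherical linear interpolation (slerp). The text does not name the interpolation formula, so slerp, its near-parallel fallback, and the sub-frame weights (k + 1)/K are assumptions, as are the function names. The last helper reflects a property of the double-quaternion representation: (Q_L, Q_R) and (-Q_L, -Q_R) give the same 4D rotation, whereas negating only one factor negates the matrix, so a sign flip chosen from Q_L has to be applied to Q_R as well.

```python
import numpy as np


def slerp(q0, q1, u):
    """Spherical linear interpolation between unit quaternions, u in [0, 1].
    Intended to be called after the shortest-path sign test (dot >= 0)."""
    dot = float(np.clip(np.dot(q0, q1), -1.0, 1.0))
    if dot > 0.9995:  # nearly identical: normalized linear interpolation
        q = q0 + u * (q1 - q0)
        return q / np.linalg.norm(q)
    omega = np.arccos(dot)
    return (np.sin((1 - u) * omega) * q0 + np.sin(u * omega) * q1) / np.sin(omega)


def shortest_path(q_prev, q_cur):
    """Pick q_cur or -q_cur, whichever is closer to q_prev."""
    return -q_cur if np.dot(q_prev, q_cur) < 0 else q_cur


def interpolate_subframes(q_prev, q_cur, K):
    """K interpolated quaternions; sub-frame k uses weight (k + 1) / K,
    so the last sub-frame lands on the current frame's quaternion."""
    q_cur = shortest_path(q_prev, q_cur)
    return [slerp(q_prev, q_cur, (k + 1) / K) for k in range(K)]


def align_double_quaternion(qL_prev, qL_cur, qR_cur):
    """4D case: flip (qL, qR) jointly when qL_cur points away from qL_prev."""
    if np.dot(qL_prev, qL_cur) < 0:
        return -qL_cur, -qR_cur
    return qL_cur, qR_cur
```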
The matrices V̂_t^interp (or their transposes) computed per sub-frame in interpolation block 460 are then used in transformation block 470, which produces n transformed channels by applying the rotation matrices thus obtained to the ambisonic channels preprocessed by block 300.
Ultimately, the difference between the corrected rotation matrix of frame t and the rotation matrix of frame t-1 measures how much the channel matrixing changes between the two frames.
The greater this difference, the greater the number of sub-frames used for the interpolation in block 460.
I_n is the identity matrix,
V_t denotes the eigenvectors of the frame of index t,
‖M‖ is a norm of the matrix M, here the sum of the absolute values of all its coefficients.
Other matrix norms can be used (for example the Frobenius norm).
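The formula these definitions belong to is not reproduced here. A plausible reading, assumed in the sketch below, is a difference measure D = ‖V_{t-1}^T V_t - I_n‖, which is zero when the two frames use the same rotation. The sketch computes D with the entrywise L1 norm (or the Frobenius norm) and maps it to a sub-frame count from the set {10, 48, 96, 192} quoted for the decoder; the thresholds and function names are hypothetical placeholders, not values from the source.

```python
import numpy as np

K_SET = (10, 48, 96, 192)  # sub-frame counts quoted for the decoder (block 610)


def l1_norm(M):
    """Sum of the absolute values of all coefficients of M."""
    return float(np.abs(M).sum())


def frobenius_norm(M):
    """Frobenius norm (NumPy's default norm for a 2D array)."""
    return float(np.linalg.norm(M))


def matrix_difference(V_prev, V_cur, norm=l1_norm):
    """Assumed criterion ||V_prev^T V_cur - I_n||: zero iff V_cur == V_prev
    for orthogonal matrices, and growing as the rotations drift apart."""
    n = V_cur.shape[0]
    return norm(V_prev.T @ V_cur - np.eye(n))


def choose_num_subframes(d, thresholds=(0.5, 2.0, 4.0)):
    """Larger difference -> more sub-frames. Thresholds are hypothetical."""
    return K_SET[int(np.searchsorted(thresholds, d, side="right"))]
```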
Performing the interpolation ultimately makes it possible to optimize the decorrelation of the input channels before multi-mono coding.
Because of this search for decorrelation, the rotation matrices calculated for a previous frame t-1 and a current frame t can be very different, but the interpolation smooths this difference.
The interpolation has a limited computational cost at both the encoder and the decoder, since it is carried out in a specific domain (angle in 2D, quaternion in 3D, double quaternion in 4D). This approach is more advantageous than interpolating the covariance matrices calculated for the PCA/KLT analysis and repeating an eigenvalue decomposition (EVD) several times per frame.
Block 470 then matrixes the ambisonic channels sub-frame by sub-frame, using the transformation matrices calculated in block 460. This matrixing amounts to one matrix product per sub-frame.
The signals contained in these channels are then sent to block 340 for multi-mono encoding.
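Per-sub-frame matrixing (block 470) can be sketched as one matrix product per contiguous sub-frame. The sketch assumes the channels of one frame are stored as an n × L array, that L is a multiple of K, and that the encoder applies the transpose of each interpolated matrix (the text allows either the matrices or their transposes). The function name is hypothetical.

```python
import numpy as np


def matrix_subframes(X, mats, transpose=True):
    """Block-470-style matrixing: one n x n matrix per sub-frame.

    X has shape (n, L); it is split into K = len(mats) equal sub-frames and
    sub-frame k is multiplied by mats[k].T (or mats[k] if transpose=False).
    """
    n, L = X.shape
    K = len(mats)
    if L % K:
        raise ValueError("frame length L must be a multiple of K")
    S = L // K  # sub-frame length in samples
    Y = np.empty_like(X)
    for k, V in enumerate(mats):
        A = V.T if transpose else V
        Y[:, k * S:(k + 1) * S] = A @ X[:, k * S:(k + 1) * S]
    return Y
```

Because every matrix is a rotation, each sub-frame keeps its energy; only its distribution across the n channels changes.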
After block 500 demultiplexes the bitstream for the current frame t, the allocation information is decoded (block 510), which makes it possible to demultiplex and decode (block 520) the bitstream(s) received for each of the n transformed channels.
Block 520 calls several separately executed instances of the core decoder.
The core decoder can be of the EVS type, optionally modified to improve its performance.
Each channel is decoded separately. If the encoding previously used was stereo or multichannel, the multi-mono approach can be replaced by multi-stereo or multichannel decoding.
The channels thus decoded are sent to block 530, which decodes the rotation matrix for the current frame and, optionally, the number K of sub-frames to be used for the interpolation (if the interpolation is adaptive).
The interpolation block 460 splits the frame into sub-frames, whose number K can be read from the bitstream by block 610 (figure 6), and interpolates the rotation matrices. The aim is to obtain, in the absence of transmission errors, the same matrices as in block 460 of the encoder, so that the transformation previously performed in block 470 can be inverted.
Block 530 performs the inverse of the matrixing of block 470 to reconstruct a decoded signal, as detailed below with reference to figure 6.
Overall, block 530 performs the decoding and the inverse PCA/KLT synthesis, undoing the analysis performed by block 310 of figure 3.
The quantization indices of the rotation quantization parameters of the current frame are decoded in block 600. Scalar quantization can be used, with an identical quantization step for each angle.
The number of interpolation sub-frames is decoded (block 610) to obtain the number K of sub-frames from the set {10, 48, 96, 192}; in variants where the frame length L is different, this set of values may be adapted.
The interpolation at the decoder is identical to that performed at the encoder (block 460).
Block 620 performs the inverse matrixing of the ambisonic channels per sub-frame, using the inverses (in practice, the transposes) of the transformation matrices calculated in block 460.
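On the decoder side, block 620 only needs the same interpolated matrices as the encoder. Assuming the encoder computed mats[k].T @ X_k for sub-frame k, a minimal sketch of the inverse multiplies each sub-frame by mats[k]; since the matrices are orthogonal, the transpose replaces a general matrix inverse. The function name is hypothetical.

```python
import numpy as np


def inverse_matrix_subframes(Y, mats):
    """Block-620-style inverse matrixing.

    Assumes sub-frame k was produced as mats[k].T @ X_k. Each mats[k] is a
    rotation (orthogonal), so the inverse of mats[k].T is simply mats[k]
    and no matrix inversion is needed.
    """
    n, L = Y.shape
    K = len(mats)
    if L % K:
        raise ValueError("frame length L must be a multiple of K")
    S = L // K  # sub-frame length in samples
    X = np.empty_like(Y)
    for k, V in enumerate(mats):
        X[:, k * S:(k + 1) * S] = V @ Y[:, k * S:(k + 1) * S]
    return X
```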
The invention takes a completely different approach from the overlap-add approach of the MPEG-H codec: it is based on a specific representation of the transformation matrices, which are restricted to rotation matrices from one frame to the next, in the time domain. This allows in particular the transformation matrices to be interpolated, with a mapping that ensures directional coherence (including by taking the sign into account).
The general approach of the invention is to code ambisonic sounds in the time domain by PCA, with in particular PCA transformation matrices constrained to be rotation matrices and interpolated per sub-frame in an optimized manner (in particular in the quaternion / double-quaternion domain) to improve quality.
The interpolation step is either fixed or adaptive, depending on a difference criterion between an inter-correlation matrix and a reference matrix (the identity) or between the matrices to be interpolated.
The quantization of the rotation matrices can be carried out in the domain of generalized Euler angles.

Landscapes

Engineering & Computer Science (AREA)
Physics & Mathematics (AREA)
Audiology, Speech & Language Pathology (AREA)
Computational Linguistics (AREA)
Signal Processing (AREA)
Health & Medical Sciences (AREA)
Human Computer Interaction (AREA)
Acoustics & Sound (AREA)
Multimedia (AREA)
Spectroscopy & Molecular Physics (AREA)
Mathematical Physics (AREA)
Stereophonic System (AREA)
Compression, Expansion, Code Conversion, And Decoders (AREA)

EP19305254.5A 2019-03-05 2019-03-05 Räumliche audiocodierung mit interpolation und quantifizierung der drehungen Withdrawn EP3706119A1 (de)

Priority Applications (14)

Application Number	Priority Date	Filing Date	Title
EP19305254.5A EP3706119A1 (de)	2019-03-05	2019-03-05	Räumliche audiocodierung mit interpolation und quantifizierung der drehungen
EP20703048.7A EP3935629B1 (de)	2019-03-05	2020-02-10	Räumliche audiocodierung mit interpolation und quantifizierung der drehungen
CN202410956721.3A CN118692474A (zh)	2019-03-05	2020-02-10	压缩n个通道的音频信号的编码方法、编码设备和介质
EP24203647.3A EP4498367A1 (de)	2019-03-05	2020-02-10	Räumliche audiocodierung mit interpolation und quantifizierung von drehungen
JP2021552656A JP7419388B2 (ja)	2019-03-05	2020-02-10	回転の補間と量子化による空間化オーディオコーディング
US17/436,390 US11922959B2 (en)	2019-03-05	2020-02-10	Spatialized audio coding with interpolation and quantization of rotations
KR1020217031995A KR20210137114A (ko)	2019-03-05	2020-02-10	회전들의 보간 및 양자화를 통한 공간화된 오디오 코딩
PCT/EP2020/053264 WO2020177981A1 (fr)	2019-03-05	2020-02-10	Codage audio spatialisé avec interpolation et quantification de rotations
CN202080031569.8A CN113728382B (zh)	2019-03-05	2020-02-10	利用旋转的插值和量化进行空间化音频编解码
PL20703048.7T PL3935629T3 (pl)	2019-03-05	2020-02-10	Przestrzenne kodowanie audio z interpolacją i kwantyzacją obrotów
BR112021017511A BR112021017511A2 (pt)	2019-03-05	2020-02-10	Codificação de áudio espacializada com interpolação e quantização de rotações
ES20703048T ES3012112T3 (en)	2019-03-05	2020-02-10	Spatialised audio encoding with interpolation and quantifying of rotations
ZA2021/06465A ZA202106465B (en)	2019-03-05	2021-09-03	Spatialized audio coding with interpolation and quantification of rotations
JP2024001364A JP7789811B2 (ja)	2019-03-05	2024-01-09	回転の補間と量子化による空間化オーディオコーディング

Applications Claiming Priority (1)

Application Number	Priority Date	Filing Date	Title
EP19305254.5A EP3706119A1 (de)	2019-03-05	2019-03-05	Räumliche audiocodierung mit interpolation und quantifizierung der drehungen

Publications (1)

Publication Number	Publication Date
EP3706119A1 true EP3706119A1 (de)	2020-09-09

Family

ID=65991736

Family Applications (3)

Application Number	Title	Priority Date	Filing Date
EP19305254.5A Withdrawn EP3706119A1 (de)	2019-03-05	2019-03-05	Räumliche audiocodierung mit interpolation und quantifizierung der drehungen
EP24203647.3A Pending EP4498367A1 (de)	2019-03-05	2020-02-10	Räumliche audiocodierung mit interpolation und quantifizierung von drehungen
EP20703048.7A Active EP3935629B1 (de)	2019-03-05	2020-02-10	Räumliche audiocodierung mit interpolation und quantifizierung der drehungen

Country Status (10)

Country	Link
US (1)	US11922959B2 (de)
EP (3)	EP3706119A1 (de)
JP (2)	JP7419388B2 (de)
KR (1)	KR20210137114A (de)
CN (2)	CN118692474A (de)
BR (1)	BR112021017511A2 (de)
ES (1)	ES3012112T3 (de)
PL (1)	PL3935629T3 (de)
WO (1)	WO2020177981A1 (de)
ZA (1)	ZA202106465B (de)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number	Priority date	Publication date	Assignee	Title
FR3118266A1 (fr) *	2020-12-22	2022-06-24	Orange	Codage optimisé de matrices de rotations pour le codage d’un signal audio multicanal
JP2024521204A (ja) *	2021-05-31	2024-05-28	華為技術有限公司	三次元音声信号処理方法および装置

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number	Priority date	Publication date	Assignee	Title
EP3706119A1 (de) *	2019-03-05	2020-09-09	Orange	Räumliche audiocodierung mit interpolation und quantifizierung der drehungen
CN116391365A (zh) *	2020-09-25	2023-07-04	苹果公司	高阶环境立体声编码和解码
US20240013793A1 (en) *	2020-12-02	2024-01-11	Dolby Laboratories Licensing Corporation	Rotation of sound components for orientation-dependent coding schemes
EP4089551A1 (de) *	2021-05-11	2022-11-16	Siemens Industry Software NV	Computerimplementiertes verfahren zur bestimmung einer transferfunktion eines moduls oder einer komponente und erzeugung solch einer komponente
CN115497485B (zh) *	2021-06-18	2024-10-18	华为技术有限公司	三维音频信号编码方法、装置、编码器和系统
EP4120255A1 (de)	2021-07-15	2023-01-18	Orange	Optimierte kugelvektorquantifizierung
FR3136099A1 (fr)	2022-05-30	2023-12-01	Orange	Codage audio spatialisé avec adaptation d’un traitement de décorrélation
KR102894176B1 (ko) *	2022-12-01	2025-12-01	백재호	보안 통신을 수행하는 시스템
WO2025000543A1 (zh) *	2023-06-30	2025-01-02	北京小米移动软件有限公司	音频数据处理方法、装置、芯片以及电子设备
CN119986659B (zh) *	2025-04-17	2025-09-16	南京理工大学	基于简化排序特征值解析式的极化合成孔径雷达数据快速分解方法

Citations (1)

* Cited by examiner, † Cited by third party
Publication number	Priority date	Publication date	Assignee	Title
US20160155448A1 (en) *	2013-07-05	2016-06-02	Dolby International Ab	Enhanced sound field coding using parametric component generation

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number	Priority date	Publication date	Assignee	Title
CN101802907B (zh) *	2007-09-19	2013-11-13	爱立信电话股份有限公司	多信道音频的联合增强
BR112012008793B1 (pt) *	2009-10-15	2021-02-23	France Telecom	Processos de codificação e de decodificação paramétrica de um sinalaudiodigital multicanal, codificador e decodificador paramétricos de um sinalaudiodigital multicanal
US9495968B2 (en) *	2013-05-29	2016-11-15	Qualcomm Incorporated	Identifying sources from which higher order ambisonic audio data is generated
CN104282309A (zh) *	2013-07-05	2015-01-14	杜比实验室特许公司	丢包掩蔽装置和方法以及音频处理系统
EP3706119A1 (de) *	2019-03-05	2020-09-09	Orange	Räumliche audiocodierung mit interpolation und quantifizierung der drehungen

2019
- 2019-03-05 EP EP19305254.5A patent/EP3706119A1/de not_active Withdrawn
2020
- 2020-02-10 KR KR1020217031995A patent/KR20210137114A/ko active Pending
- 2020-02-10 ES ES20703048T patent/ES3012112T3/es active Active
- 2020-02-10 US US17/436,390 patent/US11922959B2/en active Active
- 2020-02-10 CN CN202410956721.3A patent/CN118692474A/zh active Pending
- 2020-02-10 WO PCT/EP2020/053264 patent/WO2020177981A1/fr not_active Ceased
- 2020-02-10 EP EP24203647.3A patent/EP4498367A1/de active Pending
- 2020-02-10 EP EP20703048.7A patent/EP3935629B1/de active Active
- 2020-02-10 JP JP2021552656A patent/JP7419388B2/ja active Active
- 2020-02-10 PL PL20703048.7T patent/PL3935629T3/pl unknown
- 2020-02-10 BR BR112021017511A patent/BR112021017511A2/pt unknown
- 2020-02-10 CN CN202080031569.8A patent/CN113728382B/zh active Active
2021
- 2021-09-03 ZA ZA2021/06465A patent/ZA202106465B/en unknown
2024
- 2024-01-09 JP JP2024001364A patent/JP7789811B2/ja active Active

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ROUMEN KOUNTCHEV ET AL: "New method for adaptive karhunen-loeve color transform", TELECOMMUNICATION IN MODERN SATELLITE, CABLE, AND BROADCASTING SERVICES, 2009. TELSIKS '09. 9TH INTERNATIONAL CONFERENCE ON, IEEE, PISCATAWAY, NJ, USA, 7 October 2009 (2009-10-07), pages 209 - 216, XP031573422, ISBN: 978-1-4244-4382-6 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number	Priority date	Publication date	Assignee	Title
FR3118266A1 (fr) *	2020-12-22	2022-06-24	Orange	Codage optimisé de matrices de rotations pour le codage d’un signal audio multicanal
WO2022136760A1 (fr) *	2020-12-22	2022-06-30	Orange	Codage optimise de matrices de rotations pour le codage d'un signal audio multicanal
US12505847B2 (en)	2020-12-22	2025-12-23	Orange	Optimized encoding of rotation matrices for encoding a multichannel audio signal
JP2024521204A (ja) *	2021-05-31	2024-05-28	華為技術有限公司	三次元音声信号処理方法および装置
JP7680571B2 (ja)	2021-05-31	2025-05-20	華為技術有限公司	三次元音声信号処理方法および装置

Also Published As

Publication number	Publication date
JP2024024095A (ja)	2024-02-21
CN118692474A (zh)	2024-09-24
EP3935629B1 (de)	2024-11-27
WO2020177981A1 (fr)	2020-09-10
JP7789811B2 (ja)	2025-12-22
EP3935629C0 (de)	2024-11-27
CN113728382B (zh)	2024-08-09
JP7419388B2 (ja)	2024-01-22
KR20210137114A (ko)	2021-11-17
CN113728382A (zh)	2021-11-30
ES3012112T3 (en)	2025-04-08
BR112021017511A2 (pt)	2021-11-16
PL3935629T3 (pl)	2025-03-31
ZA202106465B (en)	2022-07-27
JP2022523414A (ja)	2022-04-22
US20220148607A1 (en)	2022-05-12
EP3935629A1 (de)	2022-01-12
EP4498367A1 (de)	2025-01-29
US11922959B2 (en)	2024-03-05

Legal Events

Date	Code	Title	Description
2020-08-07	PUAI	Public reference made under article 153(3) epc to a published international application that has entered the european phase	Free format text: ORIGINAL CODE: 0009012
2020-08-07	STAA	Information on the status of an ep patent application or granted ep patent	Free format text: STATUS: THE APPLICATION HAS BEEN PUBLISHED
2020-09-09	AK	Designated contracting states	Kind code of ref document: A1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR
2020-09-09	AX	Request for extension of the european patent	Extension state: BA ME
2021-07-16	STAA	Information on the status of an ep patent application or granted ep patent	Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN
2021-07-28	RAP3	Party data changed (applicant data changed or rights of an application transferred)	Owner name: ORANGE
2021-08-18	18D	Application deemed to be withdrawn	Effective date: 20210310

Publication	Publication Date	Title
EP3935629B1 (de)	2024-11-27	Räumliche audiocodierung mit interpolation und quantifizierung der drehungen
EP2002424B1 (de)	2015-07-29	Vorrichtung und verfahren zur skalierbaren kodierung eines mehrkanaligen audiosignals auf der basis einer hauptkomponentenanalyse
EP2374123B1 (de)	2019-04-10	Verbesserte codierung von mehrkanaligen digitalen audiosignalen
EP2374124B1 (de)	2013-05-29	Verwaltete codierung von mehrkanaligen digitalen audiosignalen
EP2005420B1 (de)	2011-10-26	Einrichtung und verfahren zur codierung durch hauptkomponentenanalyse eines mehrkanaligen audiosignals
EP2898707B1 (de)	2020-04-22	Optimierte kalibrierung eines klangwiedergabesystems mit mehreren lautsprechern
EP3427260B1 (de)	2021-04-28	Optimierte codierung und decodierung von verräumlichungsinformationen zur parametrischen codierung und decodierung eines mehrkanaligen audiosignals
EP2304721B1 (de)	2012-05-09	Raumsynthese mehrkanaliger tonsignale
FR2973551A1 (fr)	2012-10-05	Allocation par sous-bandes de bits de quantification de parametres d'information spatiale pour un codage parametrique
EP2168121B1 (de)	2018-06-06	Quantifizierung nach linearer umwandlung durch kombination von audiosignalen einer klangszene und kodiergerät dafür
Mahé et al.	2019	First-order ambisonic coding with quaternion-based interpolation of PCA rotation matrices
EP4042418B1 (de)	2023-09-06	Bestimmung von korrekturen zur anwendung auf ein mehrkanalaudiosignal, zugehörige codierung und decodierung
EP4172986B1 (de)	2025-08-13	Optimierte codierung einer für ein räumliches bild eines mehrkanaligen audiosignals repräsentativen information
EP2198425A1 (de)	2010-06-23	Verfahren, modul und computerprogramm mit quantifizierung auf der basis von gerzon-vektoren
EP4268374B1 (de)	2026-01-28	Optimierte codierung von rotationsmatrizen zur codierung eines mehrkanaligen audiosignals
WO2023232823A1 (fr)	2023-12-07	Titre: codage audio spatialisé avec adaptation d'un traitement de décorrélation