BRPI0317954B1

BRPI0317954B1 - Variable rate audio coding and decoding process

Info

Publication number: BRPI0317954B1
Authority: BR
Publication date: 2017-01-03

Description

Descriptive Report of the Patent of Invention for "VARIABLE-RATE AUDIO CODING AND DECODING PROCESS". The present invention relates to devices for coding and decoding audio signals, intended in particular for applications that transmit or store digitized and compressed audio signals (speech and/or sounds).

More particularly, the invention relates to audio coding systems capable of providing varied rates, also called multirate coding systems. These systems differ from fixed-rate coders in their ability to modify the coding rate, possibly during processing, which is particularly well suited to transmission over heterogeneous access networks, whether IP-type networks mixing fixed and mobile access, high rates (ADSL), and low rates (PSTN modems, GPRS), or networks involving terminals of varying capabilities (mobiles, PCs, ...).

Essentially two categories of multirate coders are distinguished: "switchable" multirate coders and "hierarchical" coders.

"Switchable" multirate coders rely on a coding structure belonging to one technological family (time-domain or frequency-domain coding, for example CELP, sinusoidal, or transform coding), in which a rate indication is supplied simultaneously to the coder and the decoder. The coder uses this information to select the parts of the algorithm and the tables relevant to the chosen rate. The decoder operates symmetrically. Numerous switchable multirate coding structures have been proposed for audio coding. This is the case, for example, of the mobile coders standardized by the 3GPP organization ("3rd Generation Partnership Project"): NB-AMR ("Narrow Band Adaptive Multi-Rate", Technical Specification 3GPP TS 26.090, version 5.0.0, June 2002) in the telephone band, and WB-AMR ("Wide Band Adaptive Multi-Rate", Technical Specification 3GPP TS 26.190, version 5.1.0, December 2001) in wideband. These coders operate over fairly wide ranges of rates (4.75 to 12.2 kbit/s for NB-AMR, 6.60 to 23.85 kbit/s for WB-AMR), with a fairly large granularity (8 rates for NB-AMR and 9 for WB-AMR). However, the price of this flexibility is considerable structural complexity: to reach all these rates, the coders must support numerous different options, varied quantization tables, and so on. Performance increases progressively with the rate, but the progression is not linear, and certain rates are inherently better optimized than others.

In so-called "hierarchical" coding systems, also called "scalable", the binary data produced by the coding operation are distributed into successive layers. A base layer, also called the "core", consists of the bits that are absolutely necessary for decoding the bitstream and that determine a minimum decoding quality.

The subsequent layers progressively improve the quality of the signal produced by the decoding operation: each new layer carries new information which, when exploited by the decoder, yields an output signal of increasing quality.

One particular feature of hierarchical coding is that it is possible to intervene at any level of the transmission or storage chain to delete part of the bitstream, without having to give any particular indication to the coder or the decoder. The decoder uses the binary information it receives and produces a signal of corresponding quality. The field of hierarchical coding structures has also given rise to much work. Certain hierarchical coding structures operate from a single type of coder, designed to deliver hierarchically organized coded information. When the additional layers improve the quality of the output signal without modifying the bandwidth, one speaks rather of "embedded coders" (see, for example, R. D. Iacovo et al., "Embedded CELP Coding For Variable Bit-Rate Between 6.4 and 9.6 kbit/s", Proc. ICASSP 1991, pp. 681-686). Coders of this type do not, however, allow large gaps between the lowest and the highest rates offered. The hierarchy is often used to increase the bandwidth of the signal progressively: the core supplies a baseband signal, for example telephone band (300-3400 Hz), and the subsequent layers allow additional frequency bands to be coded (for example, wideband up to 7 kHz, HiFi band up to 20 kHz, or intermediate bands, ...). Sub-band coders or coders using a time-frequency transform, such as those described in "Subband/transform coding using filter bank designs based on time domain aliasing cancellation" by J. P. Princen et al. (Proc. IEEE ICASSP-87, pp. 2161-2164) and "High Quality Audio Transform Coding at 64 kbit/s" by Y. Mahieux et al. (IEEE Trans. Commun., Vol. 42, No. 11, November 1994, pp. 3010-3019), lend themselves particularly well to such operations.

On the other hand, a different coding technique is frequently applied for the core and for the module(s) coding the additional layers; one then speaks of different coding stages, each stage consisting of a sub-coder. The sub-coder of a given stage may either code parts of the signal not coded by the preceding stages, or code the coding residual of the preceding stage, the residual being obtained by subtracting the decoded signal from the original signal. The advantage of such structures is that they make it possible to go down to relatively low rates with sufficient quality while producing good quality at high rates. Indeed, techniques applied at low rates are generally not effective at high rates, and vice versa.
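The residual cascade described above can be illustrated with a minimal sketch. The uniform quantizers below are toy stand-ins for real sub-coders, and the names `quantize`, `two_stage_code`, `decode`, and the step sizes are assumptions made for illustration only; they do not come from the patent.

```python
def quantize(samples, step):
    """Toy sub-coder (assumption): uniform quantization with the given step."""
    return [round(x / step) * step for x in samples]


def two_stage_code(signal, core_step=0.5, enh_step=0.1):
    """Code a frame with a coarse core stage and a residual stage."""
    core = quantize(signal, core_step)                # stage 1: core layer
    residual = [s - c for s, c in zip(signal, core)]  # original minus decoded core
    enhancement = quantize(residual, enh_step)        # stage 2 codes the residual
    return core, enhancement


def decode(core, enhancement=None):
    """Decode with the core alone, or with the core plus the residual layer."""
    if enhancement is None:
        return list(core)
    return [c + e for c, e in zip(core, enhancement)]


if __name__ == "__main__":
    frame = [0.12, -0.37, 0.81, 0.44]
    core, enh = two_stage_code(frame)
    print(decode(core))       # core layer only
    print(decode(core, enh))  # core plus residual layer
```

Decoding with the second layer brings the output closer to the original frame, which is the property that lets each stage use a technique suited to its own rate range.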

Such structures, which make it possible to use two different technologies (for example, CELP and time-frequency transform, ...), are particularly effective for sweeping wide ranges of rates.

However, the hierarchical coding structures proposed in the prior art precisely define the position assigned to each of the intermediate layers. Each layer corresponds to the coding of certain parameters, and the granularity of the hierarchical bitstream depends on the rate assigned to those parameters (typically a layer may contain on the order of a few tens of bits per frame, a signal frame consisting of a certain number of signal samples over a given duration; the example described below considers a frame of 960 samples, corresponding to 60 ms of signal).

Moreover, when the bandwidth of the decoded signals can vary according to the level of the layers of bits, changing the line rate can produce annoying artifacts during listening. The present invention aims in particular to propose a multirate coding solution that avoids the drawbacks cited above for existing switchable and hierarchical coding schemes. The invention thus proposes a process for coding a digital audio signal frame into a binary output sequence, in which a maximum number Nmax of coding bits is defined for a set of parameters that can be calculated from the signal frame, the set being composed of a first and a second subset. The proposed process comprises the following steps:
- the parameters of the first subset are calculated and coded on a number N0 of coding bits, such that N0 < Nmax;
- an allocation of Nmax - N0 coding bits to the parameters of the second subset is determined; and
- the Nmax - N0 coding bits allocated to the parameters of the second subset are ranked in a determined order.

The allocation and/or the ranking order of the Nmax - N0 coding bits are determined as a function of the coded parameters of the first subset. The coding process further comprises the following steps, in response to the indication of a number N of bits of the binary output sequence available for coding this set of parameters, with N0 < N ≤ Nmax:
- the parameters of the second subset to which the N - N0 coding bits ranked first in that order are allocated are selected;
- the selected parameters of the second subset are calculated, and these parameters are coded to produce the N - N0 coding bits ranked first; and
- the N0 coding bits of the first subset, together with the N - N0 coding bits of the selected parameters of the second subset, are inserted into the output sequence.

The process according to the invention makes it possible to define a multirate coding that operates at least over a range corresponding, for each frame, to a number of bits going from N0 to Nmax.
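The coding steps listed above can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented implementation: the allocation and the priority order are passed in as arguments (in the process they are derived from the coded first subset), the bits of each parameter are kept contiguous in the ranking, and `rank_bits`, `encode_frame`, and the `code_param` callback are hypothetical names.

```python
def rank_bits(allocation, priority):
    """Rank the Nmax - N0 bit slots: parameters in priority order,
    each parameter's bits kept together (a simplifying assumption)."""
    return [(p, b) for p in priority for b in range(allocation[p])]


def encode_frame(first_bits, allocation, priority, code_param, n, n_max):
    """Build an N-bit output sequence for one frame.

    first_bits : the N0 coding bits of the first subset
    allocation : dict mapping each second-subset parameter to its bit count
    priority   : second-subset parameters, most important first
    code_param : hypothetical callback (param, nbits) -> list of bits
    """
    n0 = len(first_bits)
    if sum(allocation.values()) != n_max - n0:
        raise ValueError("allocation must cover exactly Nmax - N0 bits")
    if not n0 <= n <= n_max:
        raise ValueError("N must lie between N0 and Nmax")

    slots = rank_bits(allocation, priority)[: n - n0]  # N - N0 first-ranked bits
    selected = []
    for p, _ in slots:
        if p not in selected:
            selected.append(p)
    # Only the selected parameters are calculated and coded.
    codewords = {p: code_param(p, allocation[p]) for p in selected}
    second_bits = [codewords[p][b] for p, b in slots]
    return list(first_bits) + second_bits


if __name__ == "__main__":
    allocation = {"a": 3, "b": 2, "c": 4}
    out = encode_frame([1, 0, 1, 1], allocation, ["c", "a", "b"],
                       lambda p, nbits: [1] * nbits, n=10, n_max=13)
    print(out)
```

In this example the lowest-priority parameter is never computed when N = 10, because none of the N - N0 first-ranked bits belongs to it.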

It can thus be considered that the notion of pre-established rates tied to existing switchable and hierarchical coding is replaced by the notion of a cursor, allowing the rate to be varied freely between a minimum value (which may possibly correspond to a number of bits N less than N0) and a maximum value (corresponding to Nmax). These extreme values may be far apart. The process offers good coding efficiency regardless of the rate chosen.

Advantageously, the number N of bits of the binary output sequence is strictly less than Nmax. What is notable about the coder in that case is that the bit allocation employed refers not to the coder's actual output rate but to another number, Nmax, agreed with the decoder. It is, however, possible to set Nmax = N as a function of the instantaneous rate available on a transmission channel. The output sequence of such a switchable multirate coder can be processed by a decoder that does not receive the whole sequence, provided that it can recover the structure of the coding bits of the second subset through its knowledge of Nmax.

Another case in which N = Nmax may hold is the storage of audio data at the maximum coding rate. When N' bits of this stored content are read at a lower rate, the decoder will be able to recover the structure of the coding bits of the second subset provided that N' > N0. The ranking order of the coding bits allocated to the parameters of the second subset may be a pre-established order.

In a preferred embodiment, the ranking order of the coding bits allocated to the parameters of the second subset is variable. It may in particular be an order of decreasing importance determined as a function of at least the coded parameters of the first subset. Thus, a decoder that receives a binary sequence of N' bits for the frame, with N0 < N' ≤ N ≤ Nmax, can deduce this order from the N0 bits received for the coding of the first subset. The allocation of the Nmax - N0 bits to the coding of the parameters of the second subset may be fixed (in which case the ranking order of these bits is a function of at least the coded parameters of the first subset).

In a preferred embodiment, the allocation of the Nmax - N0 bits to the coding of the parameters of the second subset is a function of the coded parameters of the first subset.

Advantageously, this ranking order of the coding bits allocated to the parameters of the second subset is determined with the aid of at least one psychoacoustic criterion as a function of the coded parameters of the first subset.

The parameters of the second subset may relate to spectral bands of the signal. In that case, the process advantageously comprises a step of estimating a spectral envelope of the coded signal from the coded parameters of the first subset, and a step of calculating a frequency masking curve by applying an auditory perception model to the estimated spectral envelope; the psychoacoustic criterion then refers to the level of the estimated spectral envelope relative to the masking curve in each spectral band.
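One possible reading of this criterion is sketched below: each band is scored by how far the estimated envelope lies above the masking curve, and bands are ranked by decreasing score. The ranking rule, the use of decibel values, and the name `rank_bands` are assumptions for illustration; the text does not specify the exact rule, and the computation of the masking curve itself is not shown.

```python
def rank_bands(envelope_db, mask_db):
    """Rank spectral bands by decreasing envelope-to-mask margin.

    envelope_db : estimated spectral envelope level per band, in dB (assumed unit)
    mask_db     : masking curve level per band, in dB (assumed unit)
    Returns band indices, most important (largest margin) first.
    """
    if len(envelope_db) != len(mask_db):
        raise ValueError("one envelope value and one mask value per band")
    margin = [e - m for e, m in zip(envelope_db, mask_db)]
    # sorted() is stable, so bands with equal margins keep their frequency order.
    return sorted(range(len(margin)), key=lambda k: margin[k], reverse=True)


if __name__ == "__main__":
    print(rank_bands([60, 50, 40, 55], [45, 48, 42, 30]))
```

Because both inputs can be derived from the coded first subset alone, a decoder holding those bits could reproduce the same ranking without side information.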

In one embodiment, the coding bits are ordered in the output sequence such that the N0 coding bits of the first subset precede the N - N0 coding bits of the selected parameters of the second subset, and such that the respective coding bits of the selected parameters of the second subset appear there in the order determined for those coding bits. This makes it possible, if the binary sequence is truncated, to receive the most important part. The number N may vary from one frame to another, in particular as a function, for example, of the available capacity of the transmission resource. Multirate audio coding according to the present invention can be used in a very flexible switchable or hierarchical mode, since any number of bits to be transmitted, chosen freely between N0 and Nmax, can be selected at any time, that is, frame by frame. The coding of the parameters of the first subset may be at a variable rate, which makes the number N0 vary from one frame to another. This allows the distribution of bits to be adjusted as closely as possible to the frames to be coded.

In one embodiment, the first subset comprises parameters calculated by a coding core. Advantageously, the coding core has an operating frequency band narrower than the bandwidth of the signal to be coded, and the first subset further comprises energy levels of the audio signal associated with frequency bands above the operating band of the coding core. This type of structure is that of a two-level hierarchical coder, which delivers, for example via the coding core, a coded signal of a quality deemed sufficient and which, depending on the position available, supplements the coding performed by the coding core with additional information arising from the coding process according to the invention.

Preferably, the coding bits of the first subset are then ordered in the output sequence such that the coding bits of the parameters calculated by the coding core are immediately followed by the coding bits of the energy levels associated with the higher frequency bands. This ensures the same bandwidth for successively coded frames, provided that the decoder receives enough bits to have the coding core information and the coded energy levels associated with the higher frequency bands.

In one embodiment, a difference signal between the signal to be coded and a synthesis signal derived from the coded parameters produced by the coding core is estimated, and the first subset further comprises energy levels of the difference signal associated with frequency bands included in the operating band of the coding core.

A second aspect of the invention relates to a process for decoding a binary input sequence in order to synthesize a digital audio signal corresponding to the decoding of a frame coded according to the coding process of the invention. According to this process, a maximum number Nmax of coding bits is defined for a set of parameters describing a signal frame, the set being composed of a first and a second subset. The input sequence comprises, for a signal frame, a number N' of coding bits of the set of parameters, with N' ≤ Nmax. The decoding process according to the invention comprises the following steps:
- a number N0 of coding bits of the parameters of the first subset is extracted from these N' bits of the input sequence, if N0 < N';
- the parameters of the first subset are recovered on the basis of these N0 extracted coding bits;
- an allocation of Nmax - N0 coding bits to the parameters of the second subset is determined; and
- the Nmax - N0 coding bits allocated to the parameters of the second subset are ranked in a determined order.

The allocation and/or the ranking order of the Nmax - N0 coding bits are determined as a function of the recovered parameters of the first subset. The decoding process further comprises the following steps:
- the parameters of the second subset to which the N' - N0 coding bits ranked first in that order are allocated are selected;
- N' - N0 coding bits of the selected parameters of the second subset are extracted from these N' bits of the input sequence;
- the selected parameters of the second subset are recovered on the basis of these N' - N0 extracted coding bits; and
- the signal frame is synthesized using the recovered parameters of the first and second subsets.

This decoding method is advantageously combined with methods for regenerating the parameters that are missing because of the truncation of the Nmax-bit sequence produced, virtually or otherwise, by the encoder.

A third aspect of the invention relates to an audio encoder comprising digital signal processing means arranged to implement a coding method according to the invention.

Another aspect of the invention relates to an audio decoder comprising digital signal processing means arranged to implement a decoding method according to the invention.

Other features and advantages of the present invention will become apparent from the following description of non-limiting embodiments, with reference to the appended drawings, in which:
- Figure 1 is a block diagram of an example of an audio encoder according to the invention;
- Figure 2 represents an N-bit output binary sequence in one embodiment of the invention; and
- Figure 3 is a block diagram of an audio decoder according to the invention.

The encoder shown in Figure 1 has a hierarchical structure with two coding stages. A first coding stage 1 consists, for example, of a CELP-type coder core operating in the telephone band (300-3400 Hz). In this example the coder is a G.723.1 coder, standardized by the ITU-T ("International Telecommunication Union"), in fixed mode at 6.4 kbit/s. It computes G.723.1 parameters in accordance with the standard and quantizes them using 192 coding bits P1 per 30 ms frame.

The second coding stage 2, which makes it possible to extend the bandwidth towards wideband (50-7000 Hz), operates on the coding residual E of the first stage, supplied by a subtractor 3 in the diagram of Figure 1. A signal synchronization module 4 delays the audio signal frame S by the time taken by the processing of the coder core 1. Its output is fed to the subtractor 3, which subtracts from it the synthetic signal S', equal to the output of the decoder core operating on the basis of the quantized parameters as represented by the output bits P1 of the coder core. As is usual, the coder 1 incorporates a local decoder that supplies S'.

The audio signal S to be coded has, for example, a bandwidth of 7 kHz and is sampled at 16 kHz. A frame consists, for example, of 960 samples, i.e. 60 ms of signal or two elementary frames of the G.723.1 coder core. Since the latter operates on signals sampled at 8 kHz, the signal S is downsampled by a factor of 2 at the input of the coder core 1. Likewise, the synthetic signal S' is upsampled to 16 kHz at the output of the coder core 1.

The rate of the first stage 1 is 6.4 kbit/s (2 x N1 = 2 x 192 = 384 bits per frame). If the encoder has a maximum rate of 32 kbit/s (Nmax = 1920 bits per frame), the maximum rate of the second stage is 25.6 kbit/s (1920 - 384 = 1536 bits per frame). The second stage 2 operates, for example, on elementary frames, or subframes, of 20 ms (320 samples at 16 kHz).

The second stage 2 comprises a time-frequency transformation module 5, for example of the MDCT ("Modified Discrete Cosine Transform") type, to which the residual E obtained by the subtractor 3 is fed. In practice, the operation of modules 3 and 5 shown in Figure 1 can be carried out by performing the following operations for each 20 ms subframe:
- MDCT transformation of the input signal S delayed by module 4, which yields 320 MDCT coefficients. Since the spectrum is limited to 7225 Hz, only the first 289 MDCT coefficients are nonzero;
- MDCT transformation of the synthetic signal S'. Since this is the spectrum of a telephone-band signal, only the first 139 MDCT coefficients are nonzero (up to 3450 Hz); and
- computation of the difference spectrum between the two preceding spectra.

The resulting spectrum is split into several bands of different widths by a module 6. By way of example, the passband of the G.723.1 codec can be subdivided into 21 bands, while the higher frequencies are distributed over 11 additional bands. In these 11 additional bands, the residual E is identical to the input signal S.
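The frame sizes and bit budgets quoted above follow directly from the rates and durations. The short sketch below reproduces that arithmetic; the function and constant names are illustrative and do not come from the patent.

```python
def bits_per_frame(rate_bps: int, frame_ms: int) -> int:
    """Number of bits available in one frame at a given bit rate."""
    return rate_bps * frame_ms // 1000


def samples_per_frame(sample_rate_hz: int, frame_ms: int) -> int:
    """Number of samples in one frame at a given sampling rate."""
    return sample_rate_hz * frame_ms // 1000


FRAME_MS = 60      # one frame = two 30 ms G.723.1 frames
SUBFRAME_MS = 20   # second-stage MDCT subframe

core_bits = bits_per_frame(6_400, FRAME_MS)       # 2 x N1
n_max = bits_per_frame(32_000, FRAME_MS)          # Nmax
second_stage_bits = n_max - core_bits             # budget left for stage 2
frame_samples = samples_per_frame(16_000, FRAME_MS)
subframe_samples = samples_per_frame(16_000, SUBFRAME_MS)
```

With the figures of the example, this gives 384, 1920, 1536, 960 and 320 respectively.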

A module 7 codes the spectral envelope of the residual E. It begins by computing the energy of the MDCT coefficients in each band of the difference spectrum. These energies are hereinafter called "scale factors". The 32 scale factors make up the spectral envelope of the difference signal. Module 7 then quantizes them in two parts. The first part corresponds to the telephone band (first 21 bands, from 0 to 3450 Hz), the second to the high bands (last 11 bands, from 3450 to 7225 Hz). In each part, the first scale factor is quantized absolutely and the following ones differentially, using conventional variable-rate Huffman coding. These 32 scale factors are quantized with a variable number N2(i) of bits P2 for each subframe of rank i (i = 1, 2, 3).
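The "absolute first value, differential afterwards" scheme applied separately to the two parts of the envelope can be sketched as follows. This is only an illustration under assumptions: the scale factors are taken to be already quantized to integer indices, the Huffman stage is omitted, and all function names are hypothetical.

```python
def diff_encode(values):
    """Keep the first value as is, then store successive differences."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def diff_decode(symbols):
    """Inverse of diff_encode: running sum of the symbols."""
    out = []
    for s in symbols:
        out.append(s if not out else out[-1] + s)
    return out


def encode_envelope(indices, split=21):
    """Code the telephone-band part and the high-band part independently."""
    return diff_encode(indices[:split]), diff_encode(indices[split:])
```

Coding the two parts independently means the high-band part can be decoded without the low-band part, which matters for the bit ordering described later in the text.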

The quantized scale factors are denoted FQ in Figure 1. The quantization bits P1, P2 of the first subset, which consists of the quantized parameters of the coder core 1 and the quantized scale factors FQ, are variable in number: N0 = (2 x N1) + N2(1) + N2(2) + N2(3). The difference Nmax - N0 = 1536 - N2(1) - N2(2) - N2(3) is available for quantizing the spectra of the bands more finely.
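As a quick check of these two expressions, the following sketch computes N0 and the remaining budget for one frame. The per-subframe Huffman bit counts used in the example are made-up values chosen only for illustration.

```python
N1 = 192      # bits per G.723.1 frame
NMAX = 1920   # maximum bits per 60 ms frame


def first_subset_bits(n2):
    """N0 = 2*N1 + N2(1) + N2(2) + N2(3)."""
    return 2 * N1 + sum(n2)


def remaining_bits(n2):
    """Nmax - N0, the budget for fine quantization of the band spectra."""
    return NMAX - first_subset_bits(n2)


# Hypothetical Huffman bit counts for the three subframes.
example_n2 = [150, 140, 145]
n0 = first_subset_bits(example_n2)
left = remaining_bits(example_n2)
```

For these illustrative counts, N0 is 819 bits and 1101 bits remain, which matches 1536 - 150 - 140 - 145.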

A module 8 normalizes the MDCT coefficients split into bands by module 6, dividing them by the quantized scale factors FQ determined for the respective bands. The spectra thus normalized are supplied to the quantization module 9, which uses a vector quantization scheme of a known type. The quantization bits output by module 9 are denoted P3 in Figure 1.

An output multiplexer 10 assembles the bits P1, P2 and P3 coming from modules 1, 7 and 9 to form the output binary sequence φ of the encoder.

According to the invention, the total number of bits N of the output sequence representing a current frame is not necessarily equal to Nmax. It may be smaller. However, the allocation of the quantization bits to the bands is carried out on the basis of the number Nmax.

In the diagram of Figure 1, this allocation is carried out for each subframe by module 12 from the number Nmax - N0, the quantized scale factors FQ and a spectral masking curve computed by a module 11.

Module 11 operates as follows. It first determines an approximate value of the original spectral envelope of the signal S from that of the difference signal, as quantized by module 7, and from that which it determines, with the same resolution, for the synthetic signal S' produced by the coder core. These two envelopes can also be determined by a decoder that has only the parameters of the aforementioned first subset. The estimated spectral envelope of the signal S will therefore also be available at the decoder. Module 11 then computes a spectral masking curve by applying, in a manner known per se, a band-by-band auditory perception model to the estimated original spectral envelope. This curve gives a masking level for each band considered.

Module 12 performs a dynamic allocation of the Nmax - N0 remaining bits of the sequence φ among the 3 x 32 bands of the three MDCT transformations of the difference signal. In the implementation of the invention described here, according to a criterion of psychoacoustic perceptual importance based on the level of the estimated spectral envelope relative to the masking curve in each band, each band is allocated a rate proportional to that level. Other ranking criteria could be used.
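A proportional allocation of this kind can be sketched as below. The text only states that the rate is proportional to the per-band level; the rounding rule used here (largest fractional remainder) and the function name are assumptions added so that the allocation sums exactly to the budget.

```python
def allocate_bits(levels, budget):
    """Share `budget` bits among bands in proportion to non-negative levels.

    The largest-remainder rounding is an assumption of this sketch,
    not something specified in the description.
    """
    total = sum(levels)
    if total <= 0:
        return [0] * len(levels)
    exact = [budget * lv / total for lv in levels]
    alloc = [int(x) for x in exact]          # floor of each share
    leftover = budget - sum(alloc)
    # Hand the bits lost to rounding to the bands with the largest remainders.
    by_remainder = sorted(range(len(levels)),
                          key=lambda b: exact[b] - alloc[b],
                          reverse=True)
    for b in by_remainder[:leftover]:
        alloc[b] += 1
    return alloc
```

Because the same inputs (quantized scale factors, masking curve, Nmax - N0) are available on both sides, a decoder running the same deterministic rule obtains the same allocation without any side information.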

Following this bit allocation, module 9 knows how many bits are to be used for quantizing each band in each subframe.

However, if N < Nmax, these allocated bits will not necessarily all be used. The bits representing the bands are ordered by a module 13 according to a criterion of perceptual importance. Module 13 ranks the 3 x 32 bands in decreasing order of importance, which may be the decreasing order of the signal-to-mask ratios (the ratio between the estimated spectral envelope and the masking curve in each band). This order is used to construct the binary sequence φ according to the invention.

Depending on the number N of bits desired in the sequence φ for coding the current frame, the bands to be quantized by module 9 are determined by selecting the bands ranked first by module 13 and retaining, for each selected band, the number of bits determined by module 12.
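The ranking by signal-to-mask ratio and the selection of the first-ranked bands within a budget of N - N0 bits can be sketched as follows. The function names are hypothetical, and stopping at the first band that no longer fits entirely is a simplifying assumption of this sketch.

```python
def rank_bands(envelope, mask):
    """Band indices sorted by decreasing signal-to-mask ratio."""
    smr = [e / m for e, m in zip(envelope, mask)]
    return sorted(range(len(smr)), key=lambda b: smr[b], reverse=True)


def select_bands(order, alloc, budget):
    """Take bands in ranked order until the next one would exceed the budget."""
    selected, used = [], 0
    for b in order:
        if used + alloc[b] > budget:
            break
        selected.append(b)
        used += alloc[b]
    return selected, used


envelope = [4.0, 9.0, 1.0, 6.0]
mask = [2.0, 1.0, 1.0, 2.0]
alloc = [5, 8, 3, 6]          # bits per band, as module 12 would provide
order = rank_bands(envelope, mask)
chosen, used_bits = select_bands(order, alloc, budget=15)
```

In this toy example the ranking is bands 1, 3, 0, 2, and a budget of 15 bits keeps bands 1 and 3 for a total of 14 bits.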

The MDCT coefficients of each selected band are then quantized by module 9, for example by means of a vector quantizer, according to the allocated number of bits, so as to produce a total number of bits equal to N - N0.

The output multiplexer 10 builds the binary sequence φ from the first N bits of the following ordered sequence, shown in Figure 2 (case N = Nmax):
a) first, the bitstreams corresponding to the two G.723.1 frames (384 bits);
b) then the bits F22(i), ..., F32(i) quantizing the scale factors, for the three subframes (i = 1, 2, 3), from the 22nd spectral band (first band beyond the telephone band) to the 32nd band (variable-rate Huffman coding);
c) then the bits F1(i), ..., F21(i) quantizing the scale factors, for the three subframes (i = 1, 2, 3), from the first spectral band to the 21st band (variable-rate Huffman coding); and
d) finally, the vector quantization indices Mc1, Mc2, ..., Mc96 of the 96 bands in order of perceptual importance, from the most important band to the least important, following the order determined by module 13.

Placing the G.723.1 parameters and the scale factors of the high bands first (a and b) makes it possible to keep the same bandwidth for the signal that the decoder can reconstruct, regardless of the effective rate, above a minimum value corresponding to the reception of groups a and b. This minimum value, sufficient for the Huffman coding of the 3 x 11 = 33 scale factors of the high bands in addition to the G.723.1 coding, is, for example, 8 kbit/s.

The above coding method allows a frame to be decoded if the decoder receives N' bits with N0 < N' < N. This number N' will generally vary from one frame to the next.
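The ordering a) to d) followed by truncation to N bits can be illustrated with a toy multiplexer. Bits are represented as strings of "0" and "1" purely for readability; the function name and this representation are assumptions of the sketch.

```python
def build_frame(core, high_sf, low_sf, band_codes, n):
    """Concatenate groups a) to d) in order and keep the first n bits.

    core       -- group a): bits of the two G.723.1 frames
    high_sf    -- group b): scale factors of bands 22 to 32
    low_sf     -- group c): scale factors of bands 1 to 21
    band_codes -- group d): per-band indices, already sorted by importance
    """
    ordered = core + high_sf + low_sf + "".join(band_codes)
    return ordered[:n]


full = build_frame("1111", "00", "10", ["01", "11"], 12)
short = build_frame("1111", "00", "10", ["01", "11"], 9)
```

Since every lower-rate frame is a prefix of the full-rate frame, truncating a frame in the network yields a sequence that the encoder itself could have produced at that lower rate.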

A decoder according to the invention, corresponding to this example, is illustrated in Figure 3. A demultiplexer 20 separates the received bit sequence φ' in order to extract the coding bits P1 and P2 from it. The 384 bits P1 are supplied to the G.723.1-type decoder core 21 so that it synthesizes two frames of the base signal S' in the telephone band. The bits P2 are decoded according to the Huffman algorithm by a module 22, which thus recovers the quantized scale factors FQ for each of the 3 subframes.

A module 23 for computing the masking curve, identical to module 11 of the encoder of Figure 1, receives the base signal S' and the quantized scale factors FQ and produces the spectral masking levels for each of the 96 bands. From these masking levels, the quantized scale factors FQ and knowledge of the number Nmax (as well as of the number N0, which is deduced from the Huffman decoding of the bits P2 by module 22), a module 24 determines a bit allocation in the same way as module 12 of Figure 1. In addition, a module 25 orders the bands according to the same ranking criterion as module 13 described with reference to Figure 1.

From the information supplied by modules 24 and 25, module 26 extracts the bits P3 from the input sequence φ' and synthesizes the normalized MDCT coefficients for the bands represented in the sequence φ'. Where applicable (N' < Nmax), the normalized MDCT coefficients for the missing bands can also be synthesized by interpolation or extrapolation as described below (module 27). These missing bands may have been dropped by the encoder because of a truncation with N < Nmax, or they may have been dropped during transmission (N' < N).

The normalized MDCT coefficients synthesized by module 26 and/or module 27 are multiplied by their respective quantized scale factors (multiplier 28) before being fed to module 29, which performs the frequency-time transformation that is the inverse of the MDCT transformation performed by module 5 of the encoder. The resulting temporal correction signal is added to the synthetic signal S' delivered by the decoder core 21 (adder 30) to produce the output audio signal S of the decoder.

It should be noted that the decoder can synthesize a signal S even in cases where it does not receive the first N0 bits of the sequence.

It is sufficient for it to receive the 2 x N1 bits corresponding to part a of the above list, decoding then being in a "degraded" mode. Only this degraded mode does not use MDCT synthesis to obtain the decoded signal. To ensure seamless switching between this mode and the other modes, the decoder performs three MDCT analyses followed by three MDCT syntheses, which allows the memories of the MDCT transformation to be updated. The output signal is then of telephone-band quality. If even the first 2 x N1 bits are not received, the decoder treats the corresponding frame as erased and can use a known algorithm for concealing erased frames.

If the decoder receives the 2 x N1 bits corresponding to part a plus bits of part b (high bands of the three spectral envelopes), it can begin to synthesize a wideband signal. It can in particular proceed as follows:
1) module 22 recovers the received parts of the three spectral envelopes;
2) the scale factors of the bands not received are temporarily set to zero;
3) the low parts of the spectral envelopes are computed from MDCT analyses performed on the signal obtained after G.723.1 decoding, and module 23 computes the three masking curves on the envelopes thus obtained;
4) the spectral envelope is corrected to regularize it, avoiding the holes caused by the bands not received: the zero values in the high part of the spectral envelopes FQ are, for example, replaced by one hundredth of the value of the masking curve computed previously, so that they remain inaudible. The complete spectrum of the low bands and the spectral envelope of the high bands are known at this stage;
5) module 27 then generates the high spectrum. The fine structure of these bands is generated by reflection of the fine structure of their known neighborhood before weighting by the scale factors (multipliers 28). If none of the bits P3 is received, the "known neighborhood" corresponds to the spectrum of the signal S' produced by the G.723.1 decoder core. The reflection may consist in copying the values of the normalized MDCT spectrum, possibly with an attenuation of its variations proportional to the distance from this known neighborhood;
6) after inverse MDCT transformation (29) and addition (30) of the resulting correction signal to the output signal of the decoder core, the synthesized wideband signal is obtained.
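Step 4 of the procedure above, replacing the zeroed scale factors by one hundredth of the masking level, can be sketched as follows. Scale factors and masking levels are assumed here to be linear per-band values held in plain lists; the function name is hypothetical.

```python
def fill_envelope_holes(scale_factors, mask, divisor=100.0):
    """Replace scale factors set to zero (bands not received) by mask/divisor.

    The factor of one hundredth is the example value given in the text.
    """
    return [m / divisor if sf == 0 else sf
            for sf, m in zip(scale_factors, mask)]


received = [2.0, 0.0, 5.0, 0.0]     # zeros mark bands not received
masking = [10.0, 200.0, 40.0, 50.0]
patched = fill_envelope_holes(received, masking)
```

Bands that were received keep their decoded scale factor, while the missing ones get a small value that stays below the masking level.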

If the decoder also receives at least part of the low spectral envelope of the difference signal (part c), it may or may not take this information into account to refine the spectral envelope in step 3.

If the decoder receives enough bits P3 to decode at least the MDCT coefficients of the most important band, ranked first in part d of the sequence, then module 26 recovers some of the normalized MDCT coefficients on the basis of the allocation and ordering indicated by modules 24 and 25. These MDCT coefficients do not need to be interpolated as in step 5 above. For the other bands, the process of steps 1 to 6 can be applied by module 27 in the same way as before, the knowledge of the MDCT coefficients received for certain bands allowing a more reliable interpolation in step 5.
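The behaviours described for the different numbers of received bits can be summarized by a small classification function. This is a simplified sketch: it assumes the sizes of groups b) and c) and of the first-ranked band are already known, whereas a real decoder learns them progressively while Huffman-decoding. The labels and parameter names are illustrative.

```python
def decoding_mode(n_received, len_a, len_b, len_c, len_first_band):
    """Coarse label for what the decoder can do with n_received bits.

    len_a          -- size of group a) (2 x N1 = 384 in the example)
    len_b, len_c   -- Huffman-coded sizes of groups b) and c) (assumed known)
    len_first_band -- bits allocated to the top-ranked band of group d)
    """
    if n_received < len_a:
        return "erased frame: conceal"
    if n_received == len_a:
        return "degraded: telephone band only"
    if n_received < len_a + len_b + len_c + len_first_band:
        return "wideband: fine structure regenerated"
    return "wideband: some MDCT bands decoded"


mode_low = decoding_mode(500, 384, 120, 200, 10)
mode_high = decoding_mode(800, 384, 120, 200, 10)
```

In the two calls above, 500 bits fall short of the first band of group d) (which ends at bit 714), so the fine structure is regenerated, while 800 bits allow at least one band to be decoded directly.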

The bands not received may vary from one MDCT subframe to the next. The known proximity of a missing band may correspond to the same band in another subframe in which it is not missing, and/or to one or more of the closest bands in the frequency domain within the same subframe. It is also possible to regenerate an MDCT spectrum that is missing in a band for a subframe by making a weighted sum of contributions evaluated from several bands/subframes of the known proximity. Since the effective rate of N' bits per frame places the last bit of a given frame arbitrarily, the last coded parameter transmitted may, depending on the case, be transmitted completely or only partially. Two cases can then arise:
- either the coding structure adopted makes it possible to exploit the partial information received (case of scalar quantizers, or of vector quantization with split dictionaries);
- or it does not, and the parameter not fully received is treated like the other parameters not received.
Note that, in the latter case, if the order of the bits varies from frame to frame, the number of bits thus lost is variable, and the selection of N' bits will produce on average, over the set of decoded frames, a better quality than would be obtained with a smaller number of bits.
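The first case, in which partial information is exploited, can be illustrated with a scalar quantizer. The sketch below assumes a uniform quantizer over the interval [lo, hi) whose index is transmitted most significant bit first; this quantizer, the function name and the midpoint reconstruction rule are assumptions made for illustration and are not taken from the description.

```python
def decode_partial_index(prefix_bits: str, total_bits: int,
                         lo: float, hi: float) -> float:
    """Reconstruct a value from the first bits of a quantizer index.

    The k received bits restrict the value to one of 2**k equal
    sub-intervals of [lo, hi); the midpoint of that sub-interval is
    returned. With k == total_bits this is ordinary decoding, and with
    k == 0 it falls back to the midpoint of the whole range.
    """
    k = len(prefix_bits)
    if k > total_bits:
        raise ValueError("more bits received than the index contains")
    prefix = int(prefix_bits, 2) if k else 0
    width = (hi - lo) / (1 << k)
    return lo + (prefix + 0.5) * width


# A 4-bit index "1011" truncated after its first two bits.
coarse = decode_partial_index("10", 4, 0.0, 16.0)
exact = decode_partial_index("1011", 4, 0.0, 16.0)
```

With a structure that cannot be refined in this way, the truncated parameter would simply be treated as not received, as in the second case.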

Claims

1. A process for coding a frame of a digital audio signal (S) into a binary output sequence (φ), in which a maximum number Nmax of coding bits is defined for a set of parameters computable from the signal frame, composed of a first and a second subset, the process comprising the following steps:
- calculating the parameters of the first subset, and coding those parameters over a number N0 of coding bits such that N0 < Nmax;
- determining an allocation of Nmax - N0 coding bits to the parameters of the second subset; and
- sorting the Nmax - N0 coding bits allocated to the parameters of the second subset in a given order,
in which the allocation and/or the sort order of the Nmax - N0 coding bits is determined as a function of the coded parameters of the first subset, the process further comprising the following steps in response to the indication of a number N of bits of the binary output sequence available for coding the set of parameters, with N0 < N ≤ Nmax:
- selecting the parameters of the second subset to which the N - N0 coding bits ranked first in that order are allocated;
- calculating the selected parameters of the second subset, and coding these parameters to produce the N - N0 coding bits ranked first; and
- inserting in the output sequence the N0 coding bits of the first subset as well as the N - N0 coding bits of the selected parameters of the second subset.

2. The process according to claim 1, wherein the sort order of the coding bits allocated to the parameters of the second subset is variable from one frame to another.

3. The process according to claim 1 or 2, wherein N < Nmax.

4. The process according to any one of the preceding claims, wherein the sort order of the coding bits allocated to the parameters of the second subset is a decreasing order of importance determined as a function of at least the coded parameters of the first subset.

5. The process according to claim 4, wherein the sort order of the coding bits allocated to the parameters of the second subset is determined with the aid of at least one psychoacoustic criterion as a function of the coded parameters of the first subset.

6. The process according to claim 5, wherein the parameters of the second subset relate to spectral bands of the signal, wherein a spectral envelope of the coded signal is estimated from the coded parameters of the first subset, wherein a frequency concealment (masking) curve is calculated by applying an auditory perception model to the estimated spectral envelope, and wherein the psychoacoustic criterion refers to the level of the estimated spectral envelope relative to the concealment curve in each spectral band.

7. The process according to any one of claims 4 to 6, wherein Nmax = N.

8. The process according to any one of the preceding claims, wherein the coding bits are arranged in the output sequence such that the N0 coding bits of the first subset precede the N - N0 coding bits of the selected parameters of the second subset, and such that the respective coding bits of the selected parameters of the second subset appear therein in the order determined for those coding bits.

9. The process according to any one of the preceding claims, wherein the number N varies from one frame to another.

10. The process according to any one of the preceding claims, wherein the coding of the parameters of the first subset is at a variable rate, so that the number N0 varies from one frame to another.

11. The process according to any one of the preceding claims, wherein the first subset comprises parameters calculated by a coding core (1).

12. The process according to claim 11, wherein the coding core (1) has an operating frequency band narrower than the bandwidth of the signal to be coded, and wherein the first subset further comprises energy levels of the signal associated with frequency bands higher than the operating band of the coding core.

13. The process according to each of claims 8 and 12, wherein the coding bits of the first subset are arranged in the output sequence such that the coding bits of the parameters calculated by the coding core are immediately followed by the coding bits of the energy levels associated with the higher frequency bands.

14. The process according to any one of claims 11 to 13, wherein a difference signal between the signal to be coded and a synthesis signal derived from the coded parameters produced by the coding core is estimated, and wherein the first subset further comprises energy levels of the difference signal associated with frequency bands included in the operating band of the coding core.

15. The process according to claim 8 and any one of claims 12 to 14, wherein the coding bits of the first subset are ordered in the output sequence such that the coding bits of the parameters calculated by the coding core (1) are followed by the coding bits of the energy levels associated with the frequency bands.

16. A process for decoding a binary input sequence (φ') to synthesize a digital audio signal (S), in which a maximum number Nmax of coding bits is defined for a set of parameters describing a signal frame, composed of a first and a second subset, the input sequence comprising, for a signal frame, a number N' of coding bits of that set of parameters, with N' ≤ Nmax, the process comprising the following steps:
- extracting, from these N' bits of the input sequence, a number N0 of coding bits of the parameters of the first subset, if N0 < N';
- retrieving the parameters of the first subset on the basis of these N0 extracted coding bits;
- determining an allocation of Nmax - N0 coding bits to the parameters of the second subset; and
- sorting the Nmax - N0 coding bits allocated to the parameters of the second subset in a given order,
in which the allocation and/or the sort order of the Nmax - N0 coding bits is determined as a function of the retrieved parameters of the first subset, the process further comprising the following steps:
- selecting the parameters of the second subset to which the N' - N0 coding bits ranked first in that order are allocated;
- extracting, from these N' bits of the input sequence, N' - N0 coding bits of the selected parameters of the second subset;
- retrieving the selected parameters of the second subset on the basis of these N' - N0 extracted coding bits; and
- synthesizing the signal frame using the retrieved parameters of the first and second subsets.

17. The process according to claim 16, wherein the sort order of the coding bits allocated to the parameters of the second subset is variable from one frame to another.

18. The process according to claim 16 or 17, wherein N' < Nmax.

19. The process according to any one of claims 16 to 18, wherein the sort order of the coding bits allocated to the parameters of the second subset is a decreasing order of importance determined as a function of at least the parameters retrieved from the first subset.

20. The process according to claim 19, wherein the sort order of the coding bits allocated to the parameters of the second subset is determined with the aid of at least one psychoacoustic criterion as a function of the parameters retrieved from the first subset.

21. The process according to claim 20, wherein the parameters of the second subset relate to spectral bands of the signal, wherein a spectral envelope of the signal is estimated from the parameters retrieved from the first subset, wherein a frequency concealment (masking) curve is calculated by applying an auditory perception model to the estimated spectral envelope, and wherein the psychoacoustic criterion refers to the level of the estimated spectral envelope relative to the concealment curve in each spectral band.

22. The process according to any one of claims 16 to 21, wherein the N0 coding bits of the parameters of the first subset are extracted from the N' received bits at positions in the sequence which precede the positions from which the N' - N0 coding bits of the selected parameters of the second subset are extracted.

23. The process according to any one of claims 16 to 22, wherein, for synthesizing the signal frame, unselected parameters of the second subset are estimated by interpolation from at least the selected parameters retrieved on the basis of the N' - N0 extracted coding bits.

24. The process according to any one of claims 16 to 23, wherein the first subset comprises input parameters of a decoder core (21).

25. The process according to claim 24, wherein the decoder core (21) has an operating frequency band narrower than the bandwidth of the signal to be synthesized, and wherein the first subset further comprises energy levels of the signal associated with frequency bands higher than the operating band of the decoder core.

26. The process according to each of claims 22 and 25, wherein the coding bits of the first subset are arranged in the input sequence such that the coding bits of the input parameters of the decoder core (21) are immediately followed by the coding bits of the energy levels associated with the higher frequency bands.

27. The process according to claim 26, comprising the following steps, if the bits of the input sequence (φ') are limited to the coding bits of the input parameters of the decoder core (21) and at least a part of the coding bits of the energy levels associated with the higher frequency bands:
- extracting from the input sequence the coding bits of the input parameters of the decoder core and that part of the coding bits of the energy levels;
- synthesizing a base signal (S') in the decoder core and recovering energy levels associated with the higher frequency bands on the basis of the extracted coding bits;
- calculating a spectrum of the base signal;
- assigning an energy level to each higher band associated with an energy level not coded in the input sequence;
- synthesizing spectral components for each higher frequency band from the corresponding energy level and from the spectrum of the base signal in at least one band of that spectrum;
- applying a transformation to the time domain to the synthesized spectral components to obtain a correction signal for the base signal; and
- adding the base signal and the correction signal to synthesize the signal frame.

28. The process according to claim 27, wherein the energy level assigned to a higher band associated with an energy level not coded in the input sequence is a fraction of a perceptual concealment (masking) level calculated from the spectrum of the base signal and from the energy levels recovered on the basis of the extracted coding bits.

29. The process according to any one of claims 24 to 28, wherein a base signal (S') is synthesized in the decoder core, and wherein the first subset further comprises energy levels of a difference signal between the signal to be synthesized and the base signal, associated with frequency bands included in the operating band of the encoder core.

30. The process according to any one of claims 25, 26 and 29, wherein, for N0 < N' < Nmax, the unselected parameters of the second subset relating to spectral components in frequency bands are estimated with the aid of a calculated spectrum of the base signal and/or of the selected parameters retrieved on the basis of the extracted coding bits.

31. The process according to claim 30, wherein the unselected parameters of the second subset in a frequency band are estimated with the aid of a spectral proximity of that band, determined on the basis of the coding bits of the input sequence.

32. The process according to claim 22 and any one of claims 25 to 31, wherein the coding bits of the input parameters of the decoder core (21) are extracted from the received bits at positions in the sequence which precede the positions from which the coding bits of the energy levels associated with the frequency bands are extracted.

33. The process according to any one of claims 16 to 32, wherein the number N' varies from one frame to another.

34. The process according to any one of claims 16 to 33, wherein the number N0 varies from one frame to another.

35. An audio encoder comprising digital signal processing means adapted to implement a coding process as defined in any one of claims 1 to 15.

36. An audio decoder comprising digital signal processing means adapted to implement a decoding process as defined in any one of claims 16 to 34.