WO2025201625A1 - Codeur et décodeur (Encoder and decoder) - Google Patents

Encoder and decoder

Info

Publication number
WO2025201625A1
Authority
WO
WIPO (PCT)
Prior art keywords
band
encoder
signal
limited
decoder
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/EP2024/057979
Other languages
English (en)
Inventor
Martin Müller
Guillaume Fuchs
Kishan GUPTA
Kacper SAGNOWSKI
Goran MARKOVIC
Sebastian BOLTEN
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fraunhofer Gesellschaft zur Foerderung der Angewandten Forschung eV
Original Assignee
Fraunhofer Gesellschaft zur Foerderung der Angewandten Forschung eV
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Fraunhofer Gesellschaft zur Foerderung der Angewandten Forschung eV
Priority to PCT/EP2024/057979 (WO2025201625A1)
Priority to PCT/EP2025/058177 (WO2025202226A1)
Publication of WO2025201625A1
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/038 Speech enhancement, e.g. noise reduction or echo cancellation using band spreading techniques
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16 Vocoder architecture
    • G10L19/18 Vocoders using multiple modes
    • G10L19/24 Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/06 Determination or coding of the spectral characteristics, e.g. of the short-term prediction coefficients

Definitions

  • the baseband encoder may, according to embodiments, comprise a neural coder and/or an auto-encoder architecture and/or a Vector Quantized Variational Auto-Encoder (VQ-VAE).
  • VQ-VAE Vector Quantized Variational Auto-Encoder
  • the baseband encoder may perform encoding based on a quantization of a latent representation of the band-limited signal portion.
  • the baseband encoder is trained or trainable using adversarial losses.
  • the baseband encoder is configured to obtain a latent representation, which is quantized and encoded by the baseband encoder.
  • the LPC analysis entity is configured to obtain a filter A_HB(z).
  • the filter A_HB(z) may be used to whiten the extended-band signal portion and to obtain a residual e_HB(n) based on the formula $e_{HB}(n) = \sum_{i=0}^{M_{HB}} a_i\, s_{hb}(n-i)$ with $a_0 = 1$, where:
  • M_HB is the LPC order;
  • n is a time-domain sample index of the audio signal;
  • a_i are the LPC coefficients, which could be obtained after quantization and interpolation in the LSF domain.
  • the bandwidth extension encoder (or its quantization entity) is configured to code and quantize the energy after the residual signal e_HB(n) has been obtained and/or after the prediction.
  • the bandwidth extension entity is configured to code and quantize the energy of the residual by exploiting information derived from the band-limited residual: the energy of the extended-band residual is predicted from the energy of the linear prediction residual computed on the band-limited signal, and the resulting residual of this energy prediction is then quantized and/or output to the decoder as information.
  • the latent representation is generally learned and difficult to interpret, which makes it difficult to extract relevant information that could steer and guide a classical speech bandwidth extension encoder.
  • the baseband encoder comprises a format definer configured to define a first multi-dimensional band-limited signal representation of the band-limited signal, the first multi-dimensional band-limited representation of the band-limited signal including at least
  • the at least one learnable layer is configured to process the first multi-dimensional band-limited signal representation of the band-limited signal, or a processed version of the first multi-dimensional band-limited audio signal representation.
  • Another aspect of the embodiments provides a decoder for decoding an audio signal comprising a band-limited decoded signal portion and an extended-band signal portion.
  • the decoder comprises two central entities: a baseband decoder and a bandwidth extension decoder.
  • the decoder comprises an LPC estimation entity configured to perform LPC estimation based on the band-limited decoded signal portion to obtain LPC coefficients and a band-limited excitation.
  • the bandwidth extension decoder is then configured to generate the extended-band excitation based on the band-limited excitation.
  • Embodiments of this aspect are based on the finding that an audio signal encoded, e.g., using the above-defined encoder, can be decoded into a band-limited decoded signal portion and an extended-band signal portion by a decoder comprising the baseband decoder and the bandwidth extension decoder.
  • the baseband decoder is preferably implemented as a neural decoder, i.e., it comprises at least one learnable layer. As discussed in the context of the encoder, this embodiment also makes it possible to combine neural speech coding and classical bandwidth extension techniques so as to increase efficiency.
  • the excitation generation may be based on a technique called Waveform Envelope Synchronized Pulse Excitation (WESPE).
  • WESPE Waveform Envelope Synchronized Pulse Excitation
  • the extended-band LPC coefficients are estimated by an entity comprising at least one learnable layer.
  • the bandwidth extension decoder is not solely based on a classical approach, but also on a neural decoder, i.e., a decoder comprising a learnable layer.
  • the band-limited LPC estimation entity may, according to further embodiments, be configured to generate the excitation, comprising an analysis of the band-limited decoded signal portion for obtaining an estimate of the harmonicity and/or voicing factor. In this way, the relevant information may advantageously be extracted from the band-limited decoded signal portion output by the baseband decoder and exploited for decoding the extended-band excitation.
  • the band-limited LPC estimation entity comprises an input for receiving a decoded band-limited signal portion of the encoded audio signal.
  • the baseband decoder comprises a learnable convolution layer and/or a learnable affine transform and/or a learnable recurrent layer and/or a weighting layer in a residual block of a neural network and/or a learnable element-wise modulation.
  • the decoder may comprise a combination entity for combining the band-limited decoded signal portion and the extended-band signal portion to obtain a reconstructed signal. For example, this combination entity may be connected to the two decoders or to their outputs.
  • the combination entity may comprise a filterbank and/or a block transform and/or a time-domain upscaling entity and/or a complex-valued low-delay filterbank (CLDFB) configured to perform additional post-processing in a filterbank domain before combining and/or before transforming the reconstructed signal to the time domain and/or to a desired sampling rate.
  • CLDFB complex valued low delay filterbank
  • an embodiment provides a method for coding an audio signal comprising a band-limited signal portion and an extended-band signal portion.
  • the method comprises the two central steps
  • Another embodiment provides a method for decoding an audio signal comprising a band-limited decoded signal portion and an extended-band signal portion. This method may comprise the two central steps:
  • embodiments of the present invention may be computed and implemented.
  • another embodiment provides a computer program for performing, when running on a processor, the steps of the two methods above.
  • Another embodiment provides a method for training the neural encoder and/or decoder. For example, the training may be performed on the decoder side, on the encoder side, or on both sides.
  • Fig. 1 schematically shows a level-zero view of the split-band encoder, involving the baseband encoder and the BWE encoder according to embodiments;
  • Fig. 2 schematically illustrates a two-band system realized with a block transform, e.g. a DFT, according to embodiments;
  • Fig. 3 shows a high-level architecture of an example of a neural baseband encoder and decoder to discuss embodiments;
  • Fig. 4 shows a basic implementation of the baseband encoder and the baseband decoder according to embodiments;
  • Fig. 5 shows a schematic block diagram of a BWE encoder according to embodiments;
  • Fig. 7 schematically shows an overall block diagram of the encoding and decoding system combining a classical time-domain speech BWE with a neural coder.
  • Fig. 1 shows a split-band encoder 100 comprising the entities baseband encoder 110 and BWE encoder 120, as well as an optional pre-processing entity 130 and a multiplexer 140.
  • the pre-processing entity 130 receives the audio signal s(n) and splits it into a band-limited portion and an extended-band portion, e.g. the two portions s_lb(n) and s_hb(n).
  • the band-limited signal portion s_lb(n), e.g. a low-band portion, is provided to the baseband encoder 110.
  • the extended-band signal portion s_hb(n), e.g. a high-band portion, is provided to the BWE encoder 120.
  • Both encoders 110 and 120 perform an encoding, as will be discussed below, and output the two encoded signals for the baseband and the extended-bandwidth portion to the multiplexer 140.
  • the multiplexer uses the two signals to generate a bitstream b.
  • the input signal is first conveyed to a pre-processing block, which is in charge of performing several analyses, such as pitch estimation and voice activity detection, and also of conveying signals at a proper sampling rate to the subsequent coding modules, which in our case consist of the baseband coder 110 and the bandwidth extension (BWE) encoder 120.
  • a filterbank such as Quadrature Mirror Filters (QMF), a pseudo-QMF, modulated lapped or block transforms, or simply downsampling filters in the time domain can be used.
  • QMF Quadrature Mirror Filters
  • the low-band signal is conveyed to the baseband coder 110, which in our preferred case is a neural coder, similar to the Neural End-to-End Speech Coder (NESC).
  • the low-band signal s_lb(n) preferably contains a wideband or broadband signal sampled at 16 kHz.
  • Fig. 2 shows a possible implementation for the pre-processing 130.
  • the truncation and normalization 133a and 133b of the DFT spectrum serve as low-pass and high-pass filtering, respectively, and the inverse DFT 135a operates at a size corresponding to the target sampling rate of the low-band signal.
  • For the high band, only the high frequencies are retained, copied and flipped to the baseband (also known as demodulation) by the demodulation and truncation module 133b before being decimated by the inverse DFT 135a with a size corresponding to the sampling rate of the high-band signal.
  • the sub-band decomposition can be achieved by time-domain decimation, like with a polyphase filterbank, or a pseudo-QMF.
  • the neural baseband coder to be used on the encoder side (cf. 110) and on the decoder side (cf. 210) will be discussed.
  • the coding architecture comprising the encoder 110 and the decoder 210 is configured to perform the following processing: the encoder 110 receives a speech signal and encodes it so as to output a bitstream B.
  • the decoder 210 decodes the bitstream B and outputs the decoded speech signal.
  • the entities 111, 112, 113 and 211, 212, 213 belong to the baseband encoder 110 and the baseband decoder 210, respectively. Consequently, no bandwidth extension is used in the current example.
  • the decoder 200 uses the baseband decoder 210 and the bandwidth extension decoder 220. Both decode the bitstream B and output the respective signals y_lb(n) and y_hb(n) to the post-processor 230.
  • the post-processor is configured to obtain the decoded audio signal y(n) based on these two signals y_lb(n) and y_hb(n).
  • the exact decoding will be discussed in the context of Fig. 6, with a focus on the bandwidth extension decoding 220. Before discussing the decoder side, the bandwidth extension encoding 120 will be discussed with respect to Fig. 5.
  • Fig. 5 shows a bandwidth extension encoder 120. It comprises an LPC analysis 142, an LPC-to-LSF transformation 144 and an LSF quantization 146, which together output the LSF parameters.
  • energy parameters are determined using the entities 150, 152 (subframe windowing), 154 (energy computation) and 156 (energy quantization).
  • the energy quantization 156 is based on the energy computation 154 and on the energy prediction 160, which receives signals from the entity 150 and from a baseband preprocessor 110.
  • the entity 150 is connected with the input for the signal and the LSF quantization 146 via the entity 147.
  • An LPC analysis, also known as short-term linear prediction analysis, is performed on s_hb(n) to obtain a set of LPC coefficients. Since speech, and audio in general, shows less (formant) structure in the high frequencies, fewer parameters are required than for the low-band signal. In our preferred mode, an order of 8 or 10 is used for a 16 kHz-sampled s_hb(n) signal.
  • the LPC analysis can be performed as in the baseband encoder, i.e., by windowing the signal and computing the autocorrelation function up to a maximum lag corresponding to the order, before finding the optimal prediction coefficients with a recursive algorithm such as Levinson-Durbin. It is worth noting that the LPC analysis windows of the low and high bands can be the same and are preferably time-aligned, which is an advantage in the subsequent processing steps and also allows the same lookahead to be exploited. The LPC coefficients so obtained are then quantized and coded. Again, since the spectral envelope of the high band is usually less structured and also perceptually less relevant, the quantization resolution can be lower for the BWE coding than for the baseband coding.
  • the method for generating the HB excitation uses a further analysis of the decoded LB signal to obtain an estimate of the harmonicity, also known as the voicing factor.
  • voicing factor an estimate of the harmonicity.
  • zero crossings or other methods may be used for estimating the harmonicity. The resulting zero-crossing measure can therefore be interpreted as a voicing factor.
  • Fig. 7 shows the combination of the encoder 100’ and the decoder 200’.
  • the baseband encoder 110 and the baseband decoder 212 are based on DNNs.
  • the latent representation of the encoder 110 is output to the decoder 210.
  • a filterbank 130 is arranged which pre-processes the input signal for the encoder 110 and the BWE encoder 120’. It performs a perceived parameter estimation 142 and LPC encoding 143.
  • the parameter encoder 143 uses LPC coefficients out of the input signal for the baseband encoder 110.
  • the encoder comprising: a baseband encoder configured to encode the band-limited signal, including at least one learnable layer; and a bandwidth extension encoder which comprises a linear prediction of the extended-band signal.
  • the decoder comprising: a baseband decoder configured to decode the band-limited signal, including at least one learnable layer; an LPC estimation based on the band-limited signal from the baseband decoder; and a bandwidth extension decoder which comprises the generation of an excitation used as input of a linear predictive synthesis filter.
  • Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus.
  • Some or all of the method steps may be executed by (or using) a hardware apparatus, like for example, a microprocessor, a programmable computer or an electronic circuit. In some embodiments, one or more of the most important method steps may be executed by such an apparatus.
  • the inventive encoded audio signal can be stored on a digital storage medium or can be transmitted on a transmission medium such as a wireless transmission medium or a wired transmission medium such as the Internet.
  • embodiments of the invention can be implemented in hardware or in software.
  • the implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a Blu-Ray, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium may be computer readable.
  • embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer.
  • the program code may for example be stored on a machine readable carrier.
  • other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier.
  • an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
  • a further embodiment of the inventive methods is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein.
  • the data carrier, the digital storage medium or the recorded medium are typically tangible and/or non-transitory.
  • a further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein.
  • the data stream or the sequence of signals may for example be configured to be transferred via a data communication connection, for example via the Internet.
  • a further embodiment comprises a processing means, for example a computer, or a programmable logic device, configured to or adapted to perform one of the methods described herein.
  • a further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
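The DFT-based pre-processing of Fig. 2 (truncation and normalization of the DFT spectrum as low-/high-pass filtering, flipping of the high band to baseband, and a half-size inverse DFT acting as decimation) can be sketched in Python with NumPy. This is a minimal single-frame sketch under stated assumptions, not the patent's implementation: the 32 kHz input rate, the rectangular non-overlapping frame, the 0.5 normalization factor and the helper name `dft_band_split` are illustrative choices.

```python
import numpy as np

def dft_band_split(frame):
    """Split one 32 kHz frame into two 16 kHz signals (hypothetical helper).

    low:  0-8 kHz content, decimated by 2
    high: 8-16 kHz content flipped to baseband (f -> 16 kHz - f), decimated by 2
    """
    n = len(frame)
    if n % 4 != 0:
        raise ValueError("frame length must be a multiple of 4")
    spec = np.fft.rfft(frame)   # bins 0..n/2 span 0..16 kHz
    q = n // 4                  # bin index of 8 kHz
    # Truncation acts as the low-pass; the half-size inverse DFT decimates,
    # and the factor 0.5 compensates for the smaller inverse-transform size.
    low = 0.5 * np.fft.irfft(spec[: q + 1], n // 2)
    # Keep 8-16 kHz and flip it: for a real signal the flipped spectrum is
    # the complex conjugate of the high-band bins taken in reverse order.
    flipped = np.conj(spec[q : 2 * q + 1][::-1])
    high = 0.5 * np.fft.irfft(flipped, n // 2)
    return low, high


# Usage: a 3 kHz tone stays at 3 kHz in the low band, while a 10 kHz tone
# appears at 16 - 10 = 6 kHz in the flipped high band.
fs = 32000
t = np.arange(640) / fs
x = np.sin(2 * np.pi * 3000 * t) + 0.5 * np.sin(2 * np.pi * 10000 * t)
low, high = dft_band_split(x)
```

Flipping (rather than only shifting) the high band mirrors its spectrum, so a sine comes out with inverted sign; a real codec would additionally use analysis windows and overlap-add, which this sketch omits.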
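The LPC analysis described for the BWE encoder (windowing, autocorrelation up to a lag equal to the LPC order, Levinson-Durbin recursion) and the whitening of s_hb(n) by A_HB(z) to obtain the residual e_HB(n) can be sketched as follows. Assumptions, not taken from the source: a Hann analysis window, the sign convention A(z) = 1 + Σ a_i z^-i (so `a[0] == 1`), no lag windowing or white-noise correction, no quantization or LSF interpolation, and the hypothetical function names `lpc_analysis` and `lpc_residual`.

```python
import numpy as np

def lpc_analysis(frame, order):
    """Windowed autocorrelation + Levinson-Durbin (sketch).

    Returns a[0..order] with a[0] = 1, i.e. A(z) = sum_i a[i] z^-i
    (this sign convention is an assumption).
    """
    windowed = frame * np.hanning(len(frame))  # Hann window assumed
    n = len(windowed)
    # autocorrelation up to the maximum lag = LPC order
    r = np.array([np.dot(windowed[: n - k], windowed[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent frame: leave remaining coefficients at zero
            break
        # reflection coefficient of stage i
        k = -(r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])) / err
        # update a[1..i-1] from the previous stage (RHS is evaluated first)
        a[1:i] = a[1:i] + k * a[i - 1 : 0 : -1]
        a[i] = k
        err *= 1.0 - k * k  # prediction error power of stage i
    return a

def lpc_residual(signal, a):
    """Whitening with A(z): e(n) = sum_{i=0}^{M} a[i] s(n - i), zero initial state."""
    return np.convolve(signal, a)[: len(signal)]
```

For the extended band, the text suggests an order of 8 or 10 for a 16 kHz-sampled s_hb(n), e.g. `a_hb = lpc_analysis(frame, order=10)` followed by `e_hb = lpc_residual(frame, a_hb)`.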
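The energy coding of the BWE encoder (subframe windowing 152, energy computation 154, energy prediction 160 from the band-limited residual, energy quantization 156) can be illustrated with a minimal sketch in which only the residual of the energy prediction is quantized. The predictor form (a linear mapping in dB with hypothetical parameters `alpha` and `beta`), the four subframes, the 1.5 dB step of the uniform scalar quantizer and all function names are assumptions for illustration; the source does not specify them.

```python
import numpy as np

def subframe_energies_db(residual, n_sub=4):
    """Mean energy (dB) of each of n_sub roughly equal-length subframes."""
    return np.array([10.0 * np.log10(np.mean(s ** 2) + 1e-12)
                     for s in np.array_split(residual, n_sub)])

def predict_energy_db(e_lb, n_sub=4, alpha=1.0, beta=-6.0):
    """Hypothetical predictor of the extended-band residual energy from the band-limited residual."""
    return alpha * subframe_energies_db(e_lb, n_sub) + beta

def encode_energy(e_hb, e_lb, n_sub=4, step_db=1.5):
    """Quantize only the prediction error; the integer indices would go to the bitstream."""
    error = subframe_energies_db(e_hb, n_sub) - predict_energy_db(e_lb, n_sub)
    return np.round(error / step_db).astype(int)

def decode_energy(indices, e_lb, n_sub=4, step_db=1.5):
    """Decoder side: repeat the prediction and add the dequantized error."""
    return predict_energy_db(e_lb, n_sub) + indices * step_db


# Usage with synthetic residuals (both 320 samples at 16 kHz)
rng = np.random.default_rng(1)
e_lb = rng.standard_normal(320)
e_hb = 0.3 * rng.standard_normal(320)
idx = encode_energy(e_hb, e_lb)
decoded = decode_energy(idx, e_lb)
```

The sketch assumes that encoder and decoder see the same band-limited residual (e.g. one derived from the decoded band-limited signal); otherwise the two predictions would drift apart and the quantization error would no longer be bounded by half a step.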
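On the decoder side, the LPC estimation entity derives LPC coefficients and a band-limited excitation from the decoded band-limited signal, and a further analysis, e.g. based on zero crossings, yields a voicing factor. A minimal sketch, assuming order-16 LPC on a 16 kHz-sampled decoded signal, a Hann window, the sign convention A(z) = 1 + Σ a_i z^-i, and a hypothetical linear mapping from zero-crossing rate to a voicing factor in [0, 1] (`zcr_unvoiced` and all function names are illustrative, not from the source):

```python
import numpy as np

def lpc_coefficients(frame, order=16):
    """Hann-windowed autocorrelation + Levinson-Durbin; a[0] = 1, A(z) = sum a[i] z^-i."""
    w = frame * np.hanning(len(frame))
    n = len(w)
    r = np.array([np.dot(w[: n - k], w[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent frame: leave remaining coefficients at zero
            break
        k = -(r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])) / err
        a[1:i] = a[1:i] + k * a[i - 1 : 0 : -1]
        a[i] = k
        err *= 1.0 - k * k
    return a

def band_limited_excitation(decoded_lb, order=16):
    """Inverse-filter the decoded band-limited signal with its own estimated A(z)."""
    a = lpc_coefficients(decoded_lb, order)
    excitation = np.convolve(decoded_lb, a)[: len(decoded_lb)]
    return excitation, a

def voicing_factor(x, zcr_unvoiced=0.5):
    """Map the zero-crossing rate to [0, 1]: few crossings -> voiced (close to 1)."""
    negative = np.signbit(x)
    zcr = np.count_nonzero(negative[1:] != negative[:-1]) / (len(x) - 1)
    return float(np.clip(1.0 - zcr / zcr_unvoiced, 0.0, 1.0))
```

A low-frequency voiced segment crosses zero rarely and therefore maps close to 1, while white-noise-like (unvoiced) segments cross zero on roughly every second sample and map close to 0.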
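The bandwidth extension decoder then generates the extended-band excitation from the band-limited excitation and feeds it to a linear predictive synthesis filter 1/A_HB(z). The source names WESPE as one option; the sketch below instead uses a simpler, different technique for illustration only: spectral folding of the band-limited excitation (multiplication by (-1)^n) mixed with energy-matched noise according to the voicing factor. The mixing rule, the `gain` parameter and the function names are assumptions.

```python
import numpy as np

def extended_band_excitation(lb_excitation, voicing, rng):
    """Hypothetical HB excitation: folded LB excitation mixed with noise by voicing."""
    n = np.arange(len(lb_excitation))
    # (-1)^n mirrors the spectrum of the band-limited excitation (f -> fs/2 - f)
    folded = lb_excitation * (-1.0) ** n
    noise = rng.standard_normal(len(lb_excitation))
    # scale the noise to the RMS of the folded excitation before mixing
    noise *= np.sqrt(np.mean(folded ** 2)) / (np.sqrt(np.mean(noise ** 2)) + 1e-12)
    return np.sqrt(voicing) * folded + np.sqrt(1.0 - voicing) * noise

def lp_synthesis(excitation, a, gain=1.0):
    """All-pole filter gain / A(z): y(n) = gain * e(n) - sum_{i>=1} a[i] y(n - i)."""
    order = len(a) - 1
    y = np.zeros(len(excitation) + order)  # first `order` entries: zero filter memory
    for n, e in enumerate(excitation):
        past = y[n : n + order][::-1]      # y(n-1), ..., y(n-order)
        y[n + order] = gain * e - np.dot(a[1:], past)
    return y[order:]
```

In a decoder, `a` would be the dequantized A_HB(z) coefficients and `gain` would follow from the decoded energy parameters, e.g. `y_hb = lp_synthesis(extended_band_excitation(exc, v, rng), a_hb, gain=g)`; those inputs are placeholders here.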
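The combination entity (post-processor 230) has to undo the band split before outputting y(n). Assuming a DFT-based split as in Fig. 2 with a 32 kHz output rate, 0.5 normalization, and the 8 kHz edge bin kept in both bands, a single-frame sketch that upsamples the low band and moves the flipped high band back to 8-16 kHz could look like this (the function name `dft_band_merge` is hypothetical; windowing and overlap-add are omitted):

```python
import numpy as np

def dft_band_merge(low, high):
    """Recombine a 16 kHz low band and a 16 kHz flipped high band into one 32 kHz frame."""
    m = len(low)                   # samples per band at 16 kHz
    if len(high) != m or m % 2 != 0:
        raise ValueError("bands must have the same even length")
    q = m // 2                     # 8 kHz bin index in the 32 kHz spectrum
    out = np.zeros(m + 1, dtype=complex)        # rfft bins of a 2m-sample frame
    out[: q + 1] += 2.0 * np.fft.rfft(low)      # undo the 0.5 normalization
    # unflip the high band: bin k of the flipped band goes to 32 kHz bin (m - k)
    out[q:] += 2.0 * np.conj(np.fft.rfft(high)[::-1])
    out[q] *= 0.5                  # 8 kHz edge bin was assumed present in both bands
    return np.fft.irfft(out, 2 * m)


# Usage: a 3 kHz low-band tone and a flipped 6 kHz high-band tone
# give a 3 kHz + 10 kHz signal at 32 kHz.
k = np.arange(320)
low = np.sin(2 * np.pi * 3000 * k / 16000)
high = -0.5 * np.sin(2 * np.pi * 6000 * k / 16000)
y = dft_band_merge(low, high)
```

With the same conventions on both sides, merging exactly inverts the split; a CLDFB or QMF-based combination, as also mentioned in the text, would follow the same idea with different filters.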

Landscapes

  • Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

Encoder for encoding an audio signal comprising a band-limited signal portion and an extended-band signal portion, the encoder comprising: a baseband encoder configured to encode the band-limited signal portion, the baseband encoder comprising at least one learnable layer; and a bandwidth extension encoder comprising a linear prediction entity for performing a linear prediction on the extended-band signal portion.

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/EP2024/057979 WO2025201625A1 (fr) 2024-03-25 2024-03-25 Codeur et décodeur
PCT/EP2025/058177 WO2025202226A1 (fr) 2024-03-25 2025-03-25 Codeur et décodeur

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2024/057979 WO2025201625A1 (fr) 2024-03-25 2024-03-25 Codeur et décodeur

Publications (1)

Publication Number Publication Date
WO2025201625A1 (fr) 2025-10-02

Family

ID=90482357

Family Applications (2)

Application Number Title Priority Date Filing Date
PCT/EP2024/057979 Pending WO2025201625A1 (fr) 2024-03-25 2024-03-25 Codeur et décodeur
PCT/EP2025/058177 Pending WO2025202226A1 (fr) 2024-03-25 2025-03-25 Codeur et décodeur

Family Applications After (1)

Application Number Title Priority Date Filing Date
PCT/EP2025/058177 Pending WO2025202226A1 (fr) 2024-03-25 2025-03-25 Codeur et décodeur

Country Status (1)

Country Link
WO (2) WO2025201625A1 (fr)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020007280A1 (en) * 2000-05-22 2002-01-17 Mccree Alan V. Wideband speech coding system and method
US20090319277A1 (en) * 2005-03-30 2009-12-24 Nokia Corporation Source Coding and/or Decoding
US20130051571A1 (en) * 2010-03-09 2013-02-28 Frederik Nagel Apparatus and method for processing an audio signal using patch border alignment
ES2627775T3 (es) * 2009-02-18 2017-07-31 Dolby International Ab Banco de filtros modulado de bajo retardo
US20190385626A1 (en) * 2013-07-12 2019-12-19 Koninklijke Philips N.V. Optimized scale factor for frequency band extension in an audio frequency signal decoder
US20200176004A1 (en) * 2018-11-30 2020-06-04 Google Llc Speech coding using auto-regressive generative neural networks
US20210287687A1 (en) 2018-12-21 2021-09-16 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Audio processor and method for generating a frequency enhanced audio signal using pulse processing
WO2023175197A1 (fr) * 2022-03-18 2023-09-21 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Techniques de vocodeur

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
BRUHN, STEFAN; POBLOTH, HARALD; SCHNELL, MARKUS; GRILL, BERNHARD; GIBBS, JON; MIAO, LEI; JARVINEN, KARI; LAAKSONEN, LASSE; HARADA, NOBORU; NAKA, NOB: "Standardization of the new 3GPP EVS codec", 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2015, SOUTH BRISBANE, QUEENSLAND, AUSTRALIA, 19 April 2015 (2015-04-19)
DOUGLAS O'SHAUGHNESSY: "Review of methods for coding of speech signals", EURASIP JOURNAL ON AUDIO, SPEECH, AND MUSIC PROCESSING, BIOMED CENTRAL LTD, LONDON, UK, vol. 2023, no. 1, 7 February 2023 (2023-02-07), pages 1 - 25, XP021314321, DOI: 10.1186/S13636-023-00274-X *
MAKINEN, JARI; BESSETTE, BRUNO; BRUHN, STEFAN; OJALA, PASI; SALAMI, REDWAN; TALEB, ANISSE: "AMR-WB+: a new audio coding standard for 3rd generation mobile audio services", 2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, ICASSP '05, PHILADELPHIA, PENNSYLVANIA, USA, 18 March 2005 (2005-03-18)
PIA, NICOLA; GUPTA, KISHAN; KORSE, SRIKANTH; MULTRUS, MARKUS; FUCHS, GUILLAUME: "NESC: ROBUST NEURAL END-2-END SPEECH CODING WITH GANS", July 2022 (2022-07-01)

Also Published As

Publication number Publication date
WO2025202226A1 (fr) 2025-10-02

Similar Documents

Publication Publication Date Title
AU2008316860B2 (en) Scalable speech and audio encoding using combinatorial encoding of MDCT spectrum
EP0981816B1 (fr) Procedes et systemes de codage audio
RU2389085C2 (ru) Способы и устройства для введения низкочастотных предыскажений в ходе сжатия звука на основе acelp/tcx
EP3039676B1 (fr) Extension de bande passante adaptative et son appareil
EP3239979B1 (fr) Codage de signaux audio génériques à faible débit binaire et à faible retard
US20060271356A1 (en) Systems, methods, and apparatus for quantization of spectral envelope representation
CN101371296B (zh) 用于编码和解码信号的设备和方法
CN103262161A (zh) 确定用于线性预测编码(lpc)系数量化的具有低复杂度的加权函数的设备和方法
US20050065788A1 (en) Hybrid speech coding and system
EP4275204B1 (fr) Procédé et dispositif de codage de domaine temporel/de domaine fréquentiel unifié d'un signal sonore
CN102460574A (zh) 用于使用层级正弦脉冲编码对音频信号进行编码和解码的方法和设备
JPWO2009125588A1 (ja) 符号化装置および符号化方法
KR20140088879A (ko) 음성 신호의 대역 선택적 양자화 방법 및 장치
Cho et al. A spectrally mixed excitation (SMX) vocoder with robust parameter determination
WO2025201625A1 (fr) Codeur et décodeur
RU2414009C2 (ru) Устройство и способ для кодирования и декодирования сигнала
EP1155405A1 (fr) Codeur de forme d'onde interpolatif ameliore
EP4553833A1 (fr) Décodeur et codeur pour l'extension de la largeur de bande
US20050065787A1 (en) Hybrid speech coding and system
EP4553832A1 (fr) Processeur audio avec extension de largeur de bande audio dirigée
EP4553830A1 (fr) Processeur audio pour extension de la largeur de bande audio d'un signal audio à bande limitée
Gupta et al. UBGAN: Enhancing Coded Speech with Blind and Guided Bandwidth Extension
Kim et al. A 4 kbps adaptive fixed code-excited linear prediction speech coder
HK40107881A (en) Coding generic audio signals at low bitrates and low delay
JP2004252477A (ja) 広帯域音声復元装置

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application
Ref document number: 24714479
Country of ref document: EP
Kind code of ref document: A1