EP3745744A2 - Audio processing - Google Patents

Publication number: EP3745744A2
Authority: EP; European Patent Office
Prior art keywords: signal; channel; input; signal component; audio
Prior art date: 2019-05-29
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.): Pending

Application number

EP20176223.4A

Other languages

German (de)

English (en)

Other versions

EP3745744A3 (fr

Inventor

Riitta VÄÄNÄNEN

Sampo VESA

Mikko-Ville Laitinen

Jussi Virolainen

Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)

Nokia Technologies Oy

Original Assignee

Nokia Technologies Oy

Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)

2019-05-29

Filing date

2020-05-25

Publication date

2020-12-02

2020-05-25 Application filed by Nokia Technologies Oy

2020-12-02 Publication of EP3745744A2

2021-03-31 Publication of EP3745744A3

Status: Pending

Classifications

- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S3/00—Systems employing more than two channels, e.g. quadraphonic
- H04S3/002—Non-adaptive circuits, e.g. manually adjustable or static, for enhancing the sound image or the spatial distribution
- H04S3/004—For headphones
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S7/00—Indicating arrangements; Control arrangements, e.g. balance control
- H04S7/30—Control circuits for electronic adaptation of the sound field
- H04S7/305—Electronic adaptation of stereophonic audio signals to reverberation of the listening space
- H04S7/306—For headphones
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S7/00—Indicating arrangements; Control arrangements, e.g. balance control
- H04S7/30—Control circuits for electronic adaptation of the sound field
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; ELECTRIC HEARING AIDS; PUBLIC ADDRESS SYSTEMS
- H04R2420/00—Details of connection covered by H04R, not provided for in its groups
- H04R2420/01—Input selection or mixing for amplifiers or loudspeakers
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2420/00—Techniques used stereophonic systems covered by H04S but not provided for in its groups
- H04S2420/01—Enhancing the perception of the sound image or of the spatial distribution using head related transfer functions [HRTF's] or equivalents thereof, e.g. interaural time difference [ITD] or interaural level difference [ILD]

Definitions

the example and non-limiting embodiments of the present invention relate to processing of audio signals.
various embodiments of the present invention relate to modification of a spatial image represented by a multi-channel audio signal, such as a two-channel stereo signal.
So-called stereo widening is a technique known in the art for enhancing the perceivable spatial audio image of a stereophonic audio signal when reproduced via audio output devices.
Such a technique aims at processing a stereophonic audio signal such that reproduced sound is not only perceived as originating from directions that are localized between the audio output devices but at least part of the sound field is perceived as if it originated from directions that are not localized between the audio output devices, thereby widening the perceivable width of spatial audio image from that conveyed in the stereophonic audio signal.
we refer to such a spatial audio image as a widened or enlarged spatial audio image.
stereo widening may be applied to multi-channel audio signals that have more than two channels, such as 5.1-channel or 7.1-channel surround sound for playback via a pair of audio output devices.
the term virtual surround is applied to refer to a processed audio signal that conveys a spatial audio image originally conveyed in a multi-channel surround audio signal.
this term should be construed broadly, encompassing a technique for processing the spatial audio image conveyed in a multi-channel audio signal (i.e. a two-channel stereophonic audio signal or a surround sound signal of more than two channels) to provide audio playback with a widened spatial audio image.
the term multi-channel audio signal refers to an audio signal that has two or more channels.
the term stereo signal is used to refer to a stereophonic audio signal and the term surround signal is used to refer to a multi-channel audio signal having more than two channels.
when applied to a stereo signal, stereo widening techniques known in the art typically involve adding a processed (e.g. filtered) version of a contralateral channel signal to each of the left and right channel signals of the stereo signal in order to derive an output stereo signal having a widened spatial audio image (referred to in the following as a widened stereo signal).
a processed version of the right channel signal of the stereo signal is added to the left channel signal of the stereo signal to create the left channel of a widened stereo signal and a processed version of the left channel signal of the stereo signal is added to the right channel signal of the stereo signal to create the right channel of the widened stereo signal.
the procedure of deriving the widened stereo signal may further involve pre-filtering (or otherwise processing) each of the left and right channel signals of the stereo signal prior to adding the respective processed contralateral signals thereto in order to preserve desired frequency response in the widened stereo signal.
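The contralateral addition described above can be sketched in a few lines. This is only an illustration: the text says the contralateral signal is "processed (e.g. filtered)" without fixing the filter, so the gain-and-delay stand-in, the function name `widen_stereo` and the parameters `cross_gain`, `cross_delay` and `direct_gain` are all assumptions made for the example.

```python
def widen_stereo(left, right, cross_gain=-0.5, cross_delay=2, direct_gain=1.0):
    """Add a processed copy of the contralateral channel to each channel.

    Assumption: the "processed version" is a delayed, scaled copy, and
    direct_gain stands in for the optional pre-filtering of each channel.
    Both channels are expected to have the same length.
    """
    out_left, out_right = [], []
    for t in range(len(left)):
        src = t - cross_delay  # index of the delayed contralateral sample
        cross_from_right = cross_gain * right[src] if src >= 0 else 0.0
        cross_from_left = cross_gain * left[src] if src >= 0 else 0.0
        out_left.append(direct_gain * left[t] + cross_from_right)
        out_right.append(direct_gain * right[t] + cross_from_left)
    return out_left, out_right
```

With an impulse in the left channel only, the right output carries a delayed, scaled copy of it, which is the cross-feed that produces the widening effect.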
stereo widening readily generalizes into widening the spatial audio image of a multi-channel input audio signal, thereby deriving an output multi-channel audio signal having a widened spatial audio image (referred to in the following as a widened multi-channel signal).
the processing involves creating the left channel of the widened multi-channel audio signal as a sum of (first) filtered versions of channels of the multi-channel input audio signal and creating the right channel of the widened multi-channel audio signal as a sum of (second) filtered versions of channels of the multi-channel input audio signal.
a dedicated predefined filter may be provided for each pair of an input channel (channels of the multi-channel input signal) and an output channel (left and right).
$S(i, b, n)$ denotes frequency bin $b$ in time frame $n$ of channel $i$ of the multi-channel signal $S$
$H_{\text{left}}(i, b)$ denotes a filter for filtering frequency bin $b$ of channel $i$ of the multi-channel signal $S$ to create a respective channel component for creation of the left channel signal $S_{\text{out,left}}(b, n)$
$H_{\text{right}}(i, b)$ denotes a filter for filtering frequency bin $b$ of channel $i$ of the multi-channel signal $S$ to create a respective channel component for creation of the right channel signal $S_{\text{out,right}}(b, n)$
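Taken together, these definitions suggest that each output channel is a per-bin sum over input channels, e.g. S_out,left(b, n) = sum over i of S(i, b, n) * H_left(i, b). The sketch below assumes exactly that form, with each filter reduced to one complex gain per bin; the nested-list layout and the function name `widen_multichannel` are illustrative choices, not taken from the source.

```python
def widen_multichannel(S, H_left, H_right):
    """Filter-and-sum in the transform domain.

    S[i][b][n]    : complex value of bin b, frame n, input channel i
    H_left[i][b]  : complex gain for input channel i, bin b, left output
    H_right[i][b] : complex gain for input channel i, bin b, right output
    Returns (out_left, out_right), each indexed as [b][n].
    """
    num_bins = len(S[0])
    num_frames = len(S[0][0])
    out_left = [[0j] * num_frames for _ in range(num_bins)]
    out_right = [[0j] * num_frames for _ in range(num_bins)]
    for i, channel in enumerate(S):
        for b in range(num_bins):
            for n in range(num_frames):
                out_left[b][n] += channel[b][n] * H_left[i][b]
                out_right[b][n] += channel[b][n] * H_right[i][b]
    return out_left, out_right
```

For a two-channel input this reduces to the stereo case: the gain for the same-side channel plays the role of the pre-filter and the gain for the opposite channel plays the role of the contralateral filter.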
a challenge involved in stereo widening is degraded timbre in the central part of the spatial audio image.
the central part of the spatial audio image includes perceptually important audio content, e.g. in case of music the voice of the vocalist is typically rendered in the center of the spatial audio image.
a sound component that is in the center of the spatial audio image is rendered by reproducing the same signal in both channels of the stereo signal and hence via both audio output devices.
the audio output devices are part of a headphone apparatus that comprises a left audio output device that is worn at, over or in a left ear of a user and a right audio output device that is worn at, over or in a right ear of a user.
Normal playback of stereo audio via headphones may cause the sound to be perceived by a user inside the user's head.
the stereo panning cues position the sound in between the ears, inside the head.
loudspeaker virtualization methods are used to process the audio signals so that the perception to the user listening via headphones is similar to the perception to a user who is listening via loudspeakers. This can be achieved by filtering the audio signals using appropriate head-related transfer functions (HRTF) or binaural room impulse responses (BRIR).
HRTF head-related transfer functions
BRIR binaural room impulse responses
an apparatus for processing an input audio signal comprising multiple channels
the apparatus comprising: means for deriving, based on the input audio signal, a first signal component, comprising at least one input channel, and a second signal component, comprising multiple input channels, wherein the first signal component is dependent upon at least a first portion of a spatial audio image conveyed by the input audio signal, and the second signal component is dependent upon at least a second portion of the spatial audio image that is different to the first portion; cross-channel mixing means for cross-channel mixing of a plurality of input channels; means for directing the second signal component to the cross-channel mixing means for cross-channel mixing of at least some of the multiple input channels of the second signal component to produce a modified second signal component; bypass means for enabling the first signal component to bypass the cross-channel mixing means; and means for combining the first signal component and the modified second signal component into an output audio signal comprising two output channels configured for rendering by headphone apparatus.
the cross-channel mixing means for cross-channel mixing of a plurality of input channels comprises means for applying head related transfer functions to each one of the plurality of input channels before mixing those channels to produce a modified second signal component comprising two output channels, wherein the head related transfer function applied to an input channel that is mixed to provide an output channel is dependent upon an identity of the input channel and an identity of the output channel.
the cross-channel mixing means for cross-channel mixing of a plurality of input channels comprises means for applying a headphone filter to each one of the plurality of input channels before mixing those channels to produce a modified second signal component comprising two output channels, wherein the headphone filter applied to an input channel that is mixed to provide an output channel is dependent upon an identity of the input channel and an identity of the output channel, wherein the headphone filter for an input channel mixes a direct version of the input channel with an ambient version of the input channel.
the relative gain of the direct version of the input channel compared to the ambient version of the input channel in a mix in the headphone filter is a user-controllable parameter.
the headphone filter for an input channel mixes a single-path direct version of the input channel with a multiple-path ambient version of the input channel; wherein a head related transfer function is used to form the single-path direct version of the input channel; wherein an indirect path filter is used in combination with a head related transfer function for each path of the multiple paths, to form the multiple-path ambient version of the input channel.
the indirect path filter comprises decorrelation means or reverberation means.
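The direct/ambient headphone filter described above can be illustrated with a toy time-domain version for one input channel and one output channel. Everything concrete here is an assumption: short FIR tap lists stand in for measured head related transfer functions, a plain delay stands in for the decorrelation or reverberation of the indirect path filter, and the names `fir`, `delay`, `headphone_filter` and `direct_to_ambient` are hypothetical.

```python
def fir(x, taps):
    """Causal FIR filter; output is truncated to len(x)."""
    return [sum(taps[k] * x[t - k] for k in range(len(taps)) if t - k >= 0)
            for t in range(len(x))]

def delay(x, d):
    """Delay x by d samples, keeping the original length."""
    d = min(d, len(x))
    return [0.0] * d + list(x[:len(x) - d])

def headphone_filter(x, direct_hrtf, ambient_paths, direct_to_ambient=0.5):
    """Mix a single-path direct version with a multiple-path ambient version.

    direct_hrtf       : FIR taps standing in for the direct-path HRTF
    ambient_paths     : list of (delay_samples, hrtf_taps), one per path
    direct_to_ambient : user-controllable mix, 1.0 = direct only
    """
    direct = fir(x, direct_hrtf)
    ambient = [0.0] * len(x)
    for d, taps in ambient_paths:
        path = fir(delay(x, d), taps)  # indirect path filter, then HRTF
        ambient = [a + p for a, p in zip(ambient, path)]
    g = direct_to_ambient
    return [g * dv + (1.0 - g) * av for dv, av in zip(direct, ambient)]
```

A full cross-channel mixer would run one such filter per (input channel, output channel) pair and sum the results per output channel.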
the cross-channel mixing means is configured to cause stereo-widening for headphone apparatus such that a width of a spatial audio image associated with the modified second signal component is greater than a width of a spatial audio image associated with the second signal component before cross-channel mixing of the second signal component.
the first portion is front and central relative to a user of the headphone apparatus, and the second portion is peripheral relative to the user of headphone apparatus and does not overlap the first portion.
the first and second portions are contiguous.
the bypass means enables components of the input audio signal that represent a sound source that is coherent between two stereo channels and is positioned to front and center, to bypass the cross-channel mixing means.
a control input controls one or more of:
when the input audio signal comprises a same sound source that is repeated at different positions, the sound source is rendered at the headphone apparatus without interaural time difference and without frequency dependent interaural level differences when the sound source of the input audio signal is positioned at a first position that is relatively front and central to a user of the headphone apparatus, and the sound source is rendered at the headphone apparatus with interaural time differences and with frequency dependent interaural level differences when the sound source of the input audio signal is repeated at a second position that is relatively peripheral and is not front and central to a user of the headphone apparatus.
a system comprising the apparatus and a headphone apparatus configured for receiving and rendering the output audio signal.
the apparatus is configured as a headphone apparatus for rendering the output audio signal.
a method for processing an input audio signal comprising at least one input channel/multiple input channels, the method comprising:
an apparatus for processing an input audio signal comprising at least one input channel/multiple input channels, the apparatus comprising at least one processor; and at least one memory including computer program code, which when executed by the at least one processor, causes the apparatus to:
a computer program comprising computer readable program code configured to cause a computer to: derive, based on an input audio signal, a first signal component, comprising at least one input channel, and a second signal component, comprising multiple input channels, wherein the first signal component is dependent upon at least a first portion of a spatial audio image conveyed by the input audio signal, and the second signal component is dependent upon at least a second portion of the spatial audio image that is different to the first portion; perform cross-channel mixing of at least some of the multiple input channels of the second signal component to produce a modified second signal component while enabling the first signal component to bypass cross-channel mixing.
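The derive / cross-mix / bypass / combine structure can be sketched end to end for a stereo input. This is a simplified illustration only: a plain mid/side split is assumed in place of the derivation described in the source, the cross-channel mixer is reduced to a single gain, and all function names are hypothetical.

```python
def derive_components(left, right):
    """Split into a centre (first) and a peripheral (second) component.

    Assumption: a mid/side split stands in for the actual derivation.
    """
    centre = [0.5 * (l + r) for l, r in zip(left, right)]
    first = (centre, list(centre))
    second = ([l - c for l, c in zip(left, centre)],
              [r - c for r, c in zip(right, centre)])
    return first, second

def cross_channel_mix(component, cross_gain=-0.5):
    """Add a scaled copy of the opposite channel to each channel."""
    left, right = component
    mixed_left = [l + cross_gain * r for l, r in zip(left, right)]
    mixed_right = [r + cross_gain * l for l, r in zip(left, right)]
    return mixed_left, mixed_right

def process(left, right):
    first, second = derive_components(left, right)
    modified_second = cross_channel_mix(second)  # first component bypasses this
    out_left = [f + s for f, s in zip(first[0], modified_second[0])]
    out_right = [f + s for f, s in zip(first[1], modified_second[1])]
    return out_left, out_right
```

A source that is identical in both channels ends up entirely in the first component and passes through unchanged, while a source panned hard to one side has its inter-channel difference increased.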
an apparatus for processing an input audio signal comprising multiple channels to produce a two-channel output audio signal configured for rendering by headphone apparatus to produce a spatial audio image comprising:
the means for deriving the first and second signal components is arranged to
the first portion of the spatial audio image comprises one or more angular ranges that define a set of sound arrival directions within the spatial audio image.
said one or more angular ranges comprise an angular range that defines a range of sound arrival directions centered around a front direction of the spatial audio image.
the means for deriving the first and second signal components comprises
the means for deriving the directional coefficients is arranged to, for said plurality of frequency sub-bands,
the means for determining the decomposition coefficients is arranged to derive, for said plurality of frequency sub-bands, the respective decomposition coefficient as the product of the coherence value and the directional coefficient derived for the respective frequency sub-band.
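The product rule for the decomposition coefficients is straightforward to write down. In the sketch below only that product comes from the text; how the coherence values and directional coefficients are computed is not shown, and the way the coefficient is applied to split the signal (beta for the first component, 1 - beta for the second) is an assumption made for illustration.

```python
def decomposition_coefficients(coherence, directional):
    """Per-sub-band coefficient = coherence value * directional coefficient."""
    return [c * d for c, d in zip(coherence, directional)]

def decompose(band_left, band_right, coeffs):
    """Split one frame of sub-band samples into two components.

    Assumption: the first component takes a share beta of each sub-band
    and the second component takes the remaining 1 - beta.
    Returns two lists of (left, right) pairs, one pair per sub-band.
    """
    first, second = [], []
    for l, r, beta in zip(band_left, band_right, coeffs):
        first.append((beta * l, beta * r))
        second.append(((1.0 - beta) * l, (1.0 - beta) * r))
    return first, second
```

A sub-band that is fully coherent and lies in the focus range (both factors equal to 1) goes entirely to the first component; a sub-band outside the focus range (directional coefficient 0) goes entirely to the second.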
the means for decomposing the input audio signal is arranged to, for said plurality of frequency sub-bands,
the apparatus comprises a means for delaying the first signal component by a predefined time delay prior to combining the first signal component with the modified second signal component, thereby creating a delayed first signal component that is temporally aligned with the modified second signal component.
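The alignment step amounts to prepending a fixed number of zeros to the bypassed component before summation. A minimal per-channel sketch, assuming the delay is given in samples and that the output takes the length of the modified second component (the function name is hypothetical):

```python
def align_and_combine(first, second_modified, delay_samples):
    """Delay the first component, then add it to the modified second one."""
    delayed = [0.0] * delay_samples + list(first)
    delayed = delayed[:len(second_modified)]                     # trim excess
    delayed += [0.0] * (len(second_modified) - len(delayed))     # pad if short
    return [a + b for a, b in zip(delayed, second_modified)]
```

The delay would typically be chosen to match the latency introduced by the processing applied to the second signal component.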
the apparatus comprises a means for modifying the first signal component prior to combining the first signal component with the modified second signal component, wherein the modification comprises generating, on basis of the first signal component, a modified first signal component wherein one or more sound sources represented by the first signal component are panned in the spatial audio image,
each of said multiple input channels comprises two channels.
a computer program comprising computer readable program code configured to cause performing at least a method according to the example embodiment described in the foregoing when said program code is executed on a computing apparatus.
the computer program according to an example embodiment may be embodied on a volatile or a non-volatile computer-readable record medium, for example as a computer program product comprising at least one computer readable non-transitory medium having program code stored thereon, the program code, which when executed by an apparatus, causes the apparatus at least to perform the operations described hereinbefore for the computer program according to an example embodiment of the invention.
a headphone apparatus is an apparatus that has a left audio output device that is worn at, over or in a left ear of a user and a right audio output device that is worn at, over or in a right ear of a user.
the audio heard in the left ear by the user is dependent upon audio output by the left audio output device and is not dependent upon audio output by the right audio output device.
the audio heard in the right ear by the user is dependent upon audio output by the right audio output device and is not dependent upon audio output by the left audio output device.
the headphone receives input signals wirelessly or over a wired connection.
the headphone apparatus comprises acoustic isolators that isolate the ears of the user from external environmental sounds.
the headphone apparatus can comprise 'cans' that cover the user's ears and provide at least some acoustic isolation. In some examples, the headphone apparatus can comprise deformable 'buds' that fit snugly inside the user's ears and provide at least some acoustic isolation.
Each audio output device comprises a transducer that converts a received electrical signal to an acoustic pressure wave or a vibration.
an apparatus 100, 100', 50 for processing an input audio signal 101 comprising multiple channels comprising: means 104 for deriving, based on the input audio signal 101, a first signal component 105-1, comprising at least one input channel, and a second signal component 105-2, comprising multiple input channels, wherein the first signal component 105-1 is dependent upon at least a first portion of a spatial audio image conveyed by the input audio signal 101, and the second signal component 105-2 is dependent upon at least a second portion of the spatial audio image that is different to the first portion; cross-channel mixing means 112, 112' for cross-channel mixing of a plurality of input channels; means 104 for directing the second signal component 105-2 to the cross-channel mixing means 112, 112' for cross-channel mixing of at least some of the multiple input channels of the second signal component 105-2 to produce a modified second signal component 113, 113'; bypass means 104, 106 for enabling the first signal component 105-1 to bypass the cross
FIG. 1A illustrates a block diagram of some components and/or entities of an audio processing system 100 that may serve as framework for various embodiments of the audio processing technique described in the present disclosure.
the audio processing system 100 obtains a stereophonic audio signal as an input signal 101 and provides a stereophonic audio signal having at least partially widened spatial audio image as an output signal 115.
the input signal 101 and the output signal 115 are referred to in the following as a stereo signal 101 and a widened stereo signal 115, respectively.
each of these signals is assumed to be a respective two-channel stereophonic audio signal unless explicitly stated otherwise.
each of the intermediate audio signals derived on basis of the input signal 101 is likewise a respective two-channel audio signal unless explicitly stated otherwise.
the audio processing system 100 readily generalizes into one that enables processing of a spatial audio signal (i.e. a multi-channel audio signal with more than two channels, such as a 5.1-channel spatial audio signal or a 7.1-channel spatial audio signal), some aspects of which are also described in the examples provided in the following.
the audio processing system 100 may further receive a control input 10 and an indication 12 of target sound source (virtual loudspeaker) positions.
the audio processing system 100 comprises a transform entity (or a transformer) 102 for converting the stereo audio signal 101 from time domain into a transform domain stereo signal 103, a signal decomposer 104 for deriving, based on the transform-domain stereo signal 103, a first signal component 105-1 that represents a focus portion of the spatial audio image and a second signal component 105-2 that represents a non-focus portion of the spatial audio image, a re-panner 106 for generating, on basis of the first signal component 105-1, a modified first signal component 107, where one or more sound sources represented in the focus portion of the spatial audio image are repositioned in dependence of the target configuration, an inverse transform entity 108-1 for converting the modified first signal component 107 from the transform domain to a time-domain modified first signal component 109-1, an inverse transform entity 108-2 for converting the second signal component 105-2 from the transform domain to a time-domain second signal component 109-2, a delay element 110 for delaying the modified
Figure 1B illustrates a block diagram of some components and/or entities of an audio processing system 100', which is a variation of the audio processing system 100 illustrated in Figure 1A.
differences to the audio processing system 100 are that the inverse transform entities 108-1 and 108-2 are omitted, the delay element 110 is replaced with the optional delay element 110' for delaying the modified first signal component 107 into delayed modified first signal component 111', the stereo widening processor 112 is replaced with a stereo widening processor 112' for generating, on basis of the transform-domain second signal component 105-2, a modified (transform-domain) second signal component 113', and the signal combiner 114 is replaced with a signal combiner 114' for combining the delayed modified first signal component 111' and the modified second signal component 113' into a widened stereo signal 115' in the transform domain.
the audio processing system 100' comprises a transform entity 108' for converting the widened stereo signal 115' from the transform domain into a time-domain widened stereo signal 115.
when the optional delay element 110' is omitted, the signal combiner 114' receives the modified first signal component 107 (instead of the delayed version thereof) and operates to combine the modified first signal component 107 with the modified second signal component 113' to create the transform-domain widened stereo signal 115'.
the audio processing technique described in the present disclosure is predominantly described via examples that pertain to the audio processing system 100 according to the example of Figure 1A and entities thereof, whereas the audio processing system 100' and entities thereof are separately described where applicable.
the audio processing system 100 or the audio processing system 100' may include further entities and/or some entities depicted in Figures 1A and 1B may be omitted or combined with other entities.
Figures 1A and 1B serve to illustrate logical components of a respective entity and hence do not impose structural limitations concerning implementation of the respective entity but, for example, respective hardware means, respective software means or a respective combination of hardware means and software means may be applied to implement any of the logical components of an entity separately from the other logical components of that entity, to implement any sub-combination of two or more logical components of an entity, or to implement all logical components of an entity in combination.
the audio processing system 100, 100' may be implemented by one or more computing devices and the resulting widened stereo signal 115 may be provided for playback via headphone apparatus.
the audio processing system 100, 100' is implemented in a computing device of any type, e.g. a portable handheld device, a desktop computer, a server device, etc. Examples of portable handheld devices include a mobile phone, a media player device, a tablet computer, a laptop computer, etc.
the computing device can also be used to play back the widened stereo signal 115 via headphone apparatus.
the audio processing system 100, 100' is provided in the headphone apparatus and the playback of the widened stereo signal 115 is provided in the headphone apparatus.
a first part of the audio processing system 100, 100' is provided in a first device, whereas a second part of the audio processing system 100, 100' and the playback of the widened stereo signal 115 is provided in the headphone apparatus.
FIG. 2 illustrates a block diagram of some components and/or entities of a portable handheld device 50 that implements the audio processing system 100 or the audio processing system 100'.
the device 50 further comprises a memory device 52 for storing information, e.g. the stereo signal 101, and a communication interface 54 for communicating with other devices and possibly receiving the stereo signal 101 therefrom.
the device 50 optionally further comprises an audio preprocessor 56 that may be usable for preprocessing the stereo signal 101 read from the memory 52 or received via the communication interface 54 before providing it to the audio processing system 100, 100'.
the audio preprocessor 56 may, for example, carry out decoding of an audio signal stored in an encoded format into a time domain stereo audio signal 101.
the audio processing system 100, 100' may further receive the first control input 10 and indication 12 together with the stereo signal 101 from or via the audio preprocessor 56.
the control input 10 is used to control signal decomposition 104 and/or re-panning 106 and/or stereo-widening 112, 112'. More details are provided in the following description.
the indication 12 indicates the target sound source (virtual loudspeaker) positions. Effectively this means the positions of loudspeakers if the input audio signal would be reproduced by loudspeakers.
the virtual loudspeaker positions typically match the loudspeaker format of the input audio signal.
for a stereo signal, the virtual loudspeaker positions could, e.g., correspond to loudspeaker angles of +/-30 degrees with respect to the front direction.
For multichannel audio signals, e.g. for 5.1 these angles are typically 0, +/-30 and +/-110 degrees.
the virtual loudspeaker positions may have any meaningful values.
The target sound source position indication may also be provided by other means (e.g. via a user interface), be a hardcoded value or be omitted.
the indication 12 is used to control signal decomposition 104. In some but not necessarily all examples, it can be used for stereo-widening 112.
the audio processing system 100, 100' provides the widened stereo signal 115 derived therein to an interface for communicating to headphone apparatus 20 for rendering.
the headphone apparatus 20 is an apparatus that has a left audio output device 21 that is worn at, over or in a left ear of a user and a right audio output device 22 that is worn at, over or in a right ear of a user.
the audio heard in the left ear by the user is dependent upon audio output by the left audio output device 21 and is not dependent upon audio output by the right audio output device 22.
the audio heard in the right ear by the user is dependent upon audio output by the right audio output device 22 and is not dependent upon audio output by the left audio output device 21.
the headphone apparatus 20 receives input signals wirelessly or over a wired connection.
the headphone apparatus 20 comprises acoustic isolators 23 that isolate the ears of the user from external environmental sounds.
the headphone apparatus can comprise left and right 'cans' 23 that cover the user's ears, house the respective audio output devices 21, 22 and provide at least some acoustic isolation.
the headphone apparatus can comprise deformable 'buds' that fit snugly inside the respective left and right ears of the user, surround the respective audio output devices 21, 22 and provide at least some acoustic isolation.
Each audio output device 21, 22 comprises a transducer that converts a received electrical signal to an acoustic pressure wave or a vibration.
the stereo signal 101 may be received at the audio processing system 100, 100' e.g. by reading the stereo signal from a memory or from a mass storage device in the device 50.
the stereo signal is obtained via a communication interface (such as a network interface) from another device that stores the stereo signal in a memory or in a mass storage device provided therein.
the widened stereo signal 115 may be provided for rendering by headphone apparatus 20. Additionally or alternatively, the widened stereo signal 115 may be stored in the memory or the mass storage device in the device 50 and/or provided via a communication interface to another device for storage therein.
the information 12 that defines the virtual loudspeaker positions may be used to control stereo widening processing such that audio sources are perceived at desired positions, which may also be at positions outside the physical locations of the headphones.
the processing may include maintaining some portions (such as the focus portion of the spatial audio image) in between the physical locations of the headphones.
the audio processing system 100, 100' may be arranged to process the stereo signal 101 arranged into a sequence of input frames, each input frame including a respective segment of digital audio signal for each of the channels, provided as a respective time series of input samples at a predefined sampling frequency.
the audio processing system 100, 100' employs a fixed predefined frame length.
the frame length may be a selectable frame length that may be selected from a plurality of predefined frame lengths, or the frame length may be an adjustable frame length that may be selected from a predefined range of frame lengths.
a frame length may be defined as the number of samples L included in the frame for each channel of the stereo signal 101, which at the predefined sampling frequency maps to a corresponding duration in time.
the frames may be non-overlapping or they may be partially overlapping. These values, however, serve as non-limiting examples and frame lengths and/or sampling frequencies different from these examples may be employed instead, depending e.g. on the desired audio bandwidth, on desired framing delay and/or on available processing capacity.
the audio processing system 100, 100' may comprise the transform entity 102 that is arranged to convert the stereo signal 101 from time domain into a transform-domain stereo signal 103.
the transform domain involves a frequency domain.
the transform entity 102 employs short-time discrete Fourier transform (STFT) to convert each channel of the stereo signal 101 into a respective channel of the transform-domain stereo signal 103 using a predefined analysis window length (e.g. 20 milliseconds).
QMF: complex-modulated quadrature-mirror filter
the STFT and QMF bank serve as non-limiting examples in this regard and in further examples any suitable transform technique known in the art may be employed for creating the transform-domain stereo signal 103.
the transform entity 102 may further divide each of the channels into a plurality of frequency sub-bands, thereby resulting in the transform-domain stereo signal 103 that provides a respective time-frequency representation for each channel of the stereo signal 101.
a given frequency band in a given frame may be referred to as a time-frequency tile.
the number of frequency sub-bands and respective bandwidths of the frequency sub-bands may be selected e.g. in accordance with the desired frequency resolution and/or available computing power.
the sub-band structure involves 24 frequency sub-bands according to the Bark scale, an equivalent rectangular band (ERB) scale or a third-octave band scale known in the art.
a different number of frequency sub-bands that have the same or different bandwidths may be employed.
a specific example in this regard is a single frequency sub-band that covers the input spectrum in its entirety or a continuous subset thereof.
a time-frequency tile that represents frequency bin b in time frame n of channel i of the transform-domain stereo signal 103 may be denoted as S ( i, b, n ).
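As a rough illustration of the framing and transform steps, the following Python sketch computes time-frequency tiles with a naive windowed DFT. The function names, the periodic Hann window and the hop size are assumptions made for this example only; the text does not prescribe them, and a practical system would use an FFT.

```python
import cmath
import math


def stft_frame(samples):
    """Return complex bins 0..N/2 for one Hann-windowed frame of one channel."""
    n = len(samples)
    # Periodic Hann window (an assumed choice, not taken from the text).
    window = [0.5 - 0.5 * math.cos(2 * math.pi * m / n) for m in range(n)]
    x = [s * w for s, w in zip(samples, window)]
    return [
        sum(x[m] * cmath.exp(-2j * math.pi * b * m / n) for m in range(n))
        for b in range(n // 2 + 1)
    ]


def stft(channels, frame_len, hop):
    """Return tiles such that tiles[i][n][b] plays the role of S(i, b, n)."""
    return [
        [
            stft_frame(ch[start:start + frame_len])
            for start in range(0, len(ch) - frame_len + 1, hop)
        ]
        for ch in channels
    ]
```

With `hop` equal to `frame_len` the frames are non-overlapping; a smaller `hop` gives partially overlapping frames.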
the channel i represents a single virtual loudspeaker or an input channel.
the transform-domain stereo signal 103 e.g. the time-frequency tiles S ( i, b, n ) are passed to the signal decomposer 104 for decomposition into the first signal component 105-1 and the second signal component 105-2 therein.
the lowest bin i.e. a frequency bin that represents the lowest frequency in that frequency sub-band
the highest bin i.e. a frequency bin that represents the highest frequency in that frequency sub-band
the audio processing system 100, 100' may comprise the signal decomposer 104 that is arranged to derive, based on the transform-domain stereo signal 103, the first signal component 105-1 and the second signal component 105-2.
the first signal component 105-1 is referred to as a signal component that represents the focus portion of the spatial audio image
the second signal component 105-2 is referred to as a signal component that represents the non-focus portion of the spatial audio image.
the focus portion represents those parts of the audio image that are front and central and can be considered as 'frontness'.
the non-focus portion represents those parts of the audio image that are not represented by the focus portion (not front and central) and may be hence referred to as a 'peripheral' portion of the spatial audio image.
the decomposition procedure does not change the number of channels and hence in the present example each of the first signal component 105-1 and the second signal component 105-2 is provided as a respective two-channel audio signal.
focus portion and non-focus portion as used in this disclosure are designations assigned to spatial sub-portions of the spatial audio image represented by the stereo signal 101, while these designations as such do not imply any specific processing to be applied (or having been applied) to the underlying stereo signal 101 or the transform-domain stereo signal 103 e.g. to actively emphasize or de-emphasize any portion of the spatial audio image represented by the stereo signal 101.
the signal decomposer 104 may derive, on basis of the transform-domain stereo signal 103, the first signal component 105-1 that represents those coherent sounds of the spatial audio image that are within a predefined focus range, such sounds hence constituting the focus portion of the spatial audio image.
the focus range can be defined by the control input 10.
the signal decomposer 104 may derive, on basis of the transform-domain stereo signal 103, the second signal component 105-2 that represents coherent sound sources or sound components of the spatial audio image that are outside the predefined focus range and all non-coherent sound sources of the spatial audio image, such sound sources or components hence constituting the non-focus portion of the spatial audio image.
the signal decomposer 104 decomposes the sound field represented by the stereo signal 101 into the first signal component 105-1 that is excluded from subsequent stereo widening processing and into the second signal component 105-2 that is subsequently subjected to the stereo widening processing.
Figure 3 illustrates a block diagram of some components and/or entities of the signal decomposer 104 according to an example.
the signal decomposer 104 may be, conceptually, divided into a decomposition analyzer 104a and a signal divider 126, as illustrated in Figure 3 .
entities of the signal decomposer 104 according to the example of Figure 3 are described in more detail.
the signal decomposer 104 may include further entities and/or some entities depicted in Figure 3 may be omitted or combined with other entities
the signal decomposer 104 may comprise a coherence analyzer 116 for estimating, on basis of the transform-domain stereo signal 103, coherence values 117 that are descriptive of coherence between the channels of the transform-domain stereo signal 103.
the coherence values 117 are provided for a decomposition coefficient determiner 124 for further processing therein.
Computation of the coherence values 117 may involve deriving a respective coherence value γ ( k, n ) for a plurality of frequency sub-bands k in a plurality of time frames n based on the time-frequency tiles S ( i, b, n ) that represent the transform-domain stereo signal 103.
the coherence values 117 may be computed e.g.
γ ( k, n ) has a large value when the audio of the channels is dominated by an audio event that is common to both channels.
a common audio event will typically cause a complex phasor distribution across the frequency bins b .
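The coherence formula itself is not reproduced in this excerpt. As a hedged illustration only, one common choice is a normalized cross-correlation of the two channels over the bins of a sub-band; the function name and the clamping of negative values to zero are assumptions of this sketch, not the disclosed method.

```python
import math


def coherence(s_left, s_right):
    """Normalized inter-channel correlation over the bins of one sub-band.

    s_left, s_right: complex bins of the left and right channel for one
    sub-band in one frame. Returns a value in [0, 1].
    """
    cross = sum((a * b.conjugate()).real for a, b in zip(s_left, s_right))
    e_left = sum(abs(a) ** 2 for a in s_left)
    e_right = sum(abs(b) ** 2 for b in s_right)
    denom = math.sqrt(e_left * e_right)
    if denom == 0.0:
        return 0.0  # silent sub-band: treat as non-coherent
    # Assumed design choice: out-of-phase content counts as non-coherent.
    return max(0.0, cross / denom)
```

Identical channels give a value close to 1, while channels with unrelated phases across the bins give a value close to 0.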
the signal decomposer 104 may comprise the energy estimator 118 for estimating energy of the transform-domain stereo signal 103 on basis of the transform-domain stereo signal 103.
the energy values 119 are provided for a direction estimator 120 for direction angle estimation therein.
Computation of the energy values 119 may involve deriving a respective energy value E ( i, k, n ) for a plurality of frequency sub-bands k in a plurality of audio channels i in a plurality of time frames n based on the time-frequency tiles S ( i, b, n ).
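A minimal sketch of such an energy computation, assuming the energy of a sub-band is the sum of squared bin magnitudes between its lowest and highest bin (the energy formula is not shown in this excerpt, and the names below are hypothetical):

```python
def subband_energies(bins, band_edges):
    """Return one energy value per sub-band for one channel and one frame.

    bins: complex frequency bins of the frame.
    band_edges: list of (lowest_bin, highest_bin) pairs, both inclusive.
    """
    return [
        sum(abs(s) ** 2 for s in bins[low:high + 1])
        for low, high in band_edges
    ]
```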
the signal decomposer 104 may comprise the direction estimator 120 for estimating perceivable arrival direction of the sound represented by the stereo signal 101 based on the energy values 119 in view of a target virtual loudspeaker configuration applied in the stereo signal 101.
the direction estimation may comprise computation of direction angles 121 based on the energy values in view of the target virtual loudspeaker positions, which direction angles 121 are provided for a focus estimator 122 for further analysis therein.
the target sound source (virtual loudspeaker) configuration may also be referred to as channel configuration (of the stereo signal 101).
This information may be obtained, for example, from metadata 12 that accompanies the stereo signal 101, e.g. metadata included in an audio container within which the stereo signal 101 is stored.
the information defining the target virtual loudspeaker configuration applied in the stereo signal 101 may be received (as user input) 12 via a user interface of the device 50.
the target virtual loudspeaker configuration may be defined by indicating, for each channel of the stereo signal 101, a respective target virtual loudspeaker position with respect to an assumed listening point.
a target position for a virtual loudspeaker may comprise a target direction, which may be defined as an angle with respect to a reference direction (e.g. a front direction).
the target virtual loudspeaker configuration may be defined as respective target angles θ_in(1) and θ_in(2) with respect to the front direction for the left and right virtual loudspeakers.
no indication 12 is received in the audio processing system 100, 100' and the elements of the audio processing system 100, 100' that make use of the information that defines the target virtual loudspeaker configuration applied in the stereo signal 101 (the signal decomposer 104, the re-panner 106) apply predefined information in this regard instead.
An example in this regard involves applying a fixed predefined target virtual loudspeaker configuration.
Another example involves selecting one of a plurality of predefined target virtual loudspeaker configurations in dependence of the number of audio channels in the received stereo signal 101.
Non-limiting examples in this regard include selecting, in response to a two-channel signal 101 (which is hence assumed as a two-channel stereophonic audio signal), a target virtual loudspeaker configuration where the channels are positioned ±30 degrees with respect to the front direction and/or selecting, in response to a six-channel signal (that is hence assumed to represent a 5.1-channel surround signal), a target virtual loudspeaker configuration where the channels are positioned at target angles θ_in( i ) of 0 degrees, ±30 degrees and ±110 degrees with respect to the front direction and complemented with a low frequency effects (LFE) channel.
the direction estimator 120 is configured to estimate perceivable arrival direction of the sound represented by the stereo signal 101.
the direction estimation may involve deriving a respective direction angle 121, θ ( k, n ), for a plurality of frequency sub-bands k in a plurality of time frames n based on the estimated energies E ( i, k, n ) and the target virtual loudspeaker positions θ_in( i ), the direction angles 121, θ ( k, n ), thereby indicating the estimated perceived arrival direction of the sound in frequency sub-bands of input frames.
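The direction formula is not included in this excerpt. One well-known way to map channel energies to a perceived direction between two loudspeakers is the tangent panning law, sketched below under the assumptions of a symmetric loudspeaker pair at plus and minus a target angle and of positive angles pointing left. The function name and default angle are illustrative only.

```python
import math


def direction_angle_deg(e_left, e_right, theta_in_deg=30.0):
    """Estimate a direction angle from two channel energies (tangent law).

    Assumes virtual loudspeakers at +theta_in_deg (left) and
    -theta_in_deg (right); returns degrees, positive towards the left.
    """
    g_left, g_right = math.sqrt(e_left), math.sqrt(e_right)
    if g_left + g_right == 0.0:
        return 0.0  # silent sub-band: default to the front direction
    ratio = (g_left - g_right) / (g_left + g_right)
    return math.degrees(math.atan(ratio * math.tan(math.radians(theta_in_deg))))
```

Equal energies map to the front direction, and energy in one channel only maps to the position of that virtual loudspeaker.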
the target positions of the left and right virtual loudspeakers may be positioned non-symmetrically with respect to the front direction (e.g. such that
$$\theta_c = \frac{\theta_{in}(1) + \theta_{in}(2)}{2}.$$
the signal decomposer 104 may comprise the focus estimator 122 for determining one or more focus coefficients 123 based on the estimated perceivable arrival direction of the sound represented by the stereo signal 101 (direction angles 121) in view of a defined focus range within the spatial audio image, where the focus coefficients 123 are indicative of the relationship between the estimated arrival direction of the sound (direction angles 121) and the focus range.
the focus range may be defined, for example, as a single angular range or as two or more angular sub-ranges in the spatial audio image. In other words, the focus range may be defined as a set of arrival directions of the sound within the spatial audio image.
the focus range can be defined by the control input 10.
the focus coefficients 123 may be derived by the focus estimator 122 based at least in part on the direction angles 121.
the focus estimator 122 may optionally further receive the indication 12 of the target virtual loudspeaker configuration applied in the stereo signal 101, and compute the focus coefficients 123 further in view of this information.
the focus coefficients 123 are provided for the decomposition coefficient determiner 124 for further processing therein.
the one or more angular ranges of the focus range define a set of arrival directions that cover a defined portion around the center of the spatial audio image, thereby rendering the focus estimation as a 'frontness' estimation.
the focus estimation may involve deriving a respective focus (frontness) coefficient χ ( k, n ) for a plurality of frequency sub-bands k in a plurality of time frames n based on the direction angles 121, θ ( k, n ), e.g.
$$\chi(k,n) = \begin{cases} 1, & |\theta(k,n)| < \theta_{Th1} \\ 1 - \dfrac{|\theta(k,n)| - \theta_{Th1}}{\theta_{Th2} - \theta_{Th1}}, & \theta_{Th1} \le |\theta(k,n)| \le \theta_{Th2} \\ 0, & |\theta(k,n)| > \theta_{Th2} \end{cases} \tag{7}$$
the first threshold value θ_Th1 and the second threshold value θ_Th2 serve to define a primary (center) angular focus range (between angles -θ_Th1 and θ_Th1 around the front direction), a secondary angular focus range (from -θ_Th2 to -θ_Th1 and from θ_Th1 to θ_Th2 with respect to the front direction) and a non-focus range (outside -θ_Th2 and θ_Th2 with respect to the front direction).
the threshold values θ_Th1, θ_Th2 that define the focus range can be defined by the control input 10.
Focus estimation according to the equation (7) hence applies a focus range that includes two angular ranges (i.e. the primary angular focus range and the secondary angular focus range), sets the focus coefficient χ ( k, n ) to unity in response to a sound source direction residing within the primary angular focus range and sets the focus coefficient χ ( k, n ) to zero in response to the sound source direction residing outside the focus range, whereas a predefined function of sound source direction is applied to set the focus coefficient χ ( k, n ) to a value between unity and zero in response to the sound source direction residing within the secondary angular focus range.
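The two-threshold focus estimation described above can be sketched as follows. The sketch assumes that both ranges are symmetric around the front direction, that the transition in the secondary range is linear, and that the second threshold is larger than the first; the function name and the threshold values in the usage are hypothetical.

```python
def focus_coefficient(theta_deg, th1_deg, th2_deg):
    """Map a direction angle to a focus (frontness) coefficient in [0, 1].

    th1_deg bounds the primary focus range, th2_deg bounds the secondary
    focus range; th2_deg must be larger than th1_deg.
    """
    magnitude = abs(theta_deg)
    if magnitude < th1_deg:
        return 1.0  # primary angular focus range
    if magnitude <= th2_deg:
        # secondary range: linear transition from 1 down to 0
        return 1.0 - (magnitude - th1_deg) / (th2_deg - th1_deg)
    return 0.0  # outside the focus range
```

For example, with thresholds of 10 and 20 degrees a source at 15 degrees to either side receives a coefficient of 0.5.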
the focus coefficient χ ( k, n ) is set to a non-zero value in response to the sound source direction residing within the focus range and the focus coefficient χ ( k, n ) is set to zero value in response to the perceived sound source direction, direction angles 121, θ ( k, n ), residing outside the focus range.
the equation (7) may be modified such that no secondary angular focus range is applied and hence only a single threshold may be applied to define the limit(s) between the focus range and the non-focus range.
the focus range may be defined as one or more contiguous, non-overlapping angular focus ranges.
the focus range may include a single defined angular range or two or more defined angular ranges.
At least one of the focus ranges is selectable, e.g. such that an angular focus range may be selected or adjusted (e.g. via selection or adjustment of one or more threshold values that define the respective angular focus range) in dependence of the target (or assumed) virtual loudspeaker configuration associated with the stereo input signal 12, and the focus range parameter present in control input 10.
the control information could be used to control how large a portion (or which angles) of the sound image will be sent to widening.
the signal decomposer 104 may comprise the decomposition coefficient determiner 124 for deriving decomposition coefficients 125 based on the coherence values 117 and the focus coefficients 123.
the decomposition coefficients 125 are provided for the signal divider 126 for decomposition of the transform-domain stereo signal 103 therein.
the signal divider 126 is configured to derive, based on the transform-domain stereo signal 103 and the decomposition coefficients 125, the first signal component 105-1 that represents the focus portion of the spatial audio image and the second signal component 105-2 that represents the non-focus portion (e.g. a 'peripheral' portion) of the spatial audio image.
the decomposition coefficient determination aims at providing a high value for a decomposition coefficient β ( k, n ) for a frequency sub-band k and frame n that exhibits relatively high coherence between the channels of the stereo signal 101 and that conveys a directional sound component that is within the focus portion of the spatial audio image (see description of the focus estimator 122 in the foregoing).
the decomposition coefficients β ( k, n ) may be applied as such as the decomposition coefficients 125 that are provided for the signal divider 126 for decomposition of the transform-domain stereo signal 103 therein.
energy-based temporal smoothing is applied to the decomposition coefficient β ( k, n ) obtained from the equation (8) in order to derive smoothed decomposition coefficients β' ( k, n ), which may be provided for the signal divider 126 to be applied for decomposition of the transform-domain stereo signal 103 therein.
Smoothing of the decomposition coefficients results in slower variations over time in sub-portions of the spatial audio image assigned to the first signal component 105-1 and the second signal component 105-2, which may enable improved perceivable quality in the resulting widened stereo signal 115 via avoidance of small-scale fluctuations in the spatial audio image therein.
the signal decomposer 104 may comprise the signal divider 126 for deriving, based on the transform-domain stereo signal 103 and the decomposition coefficients 125, the first signal component 105-1 that represents the focus portion of the spatial audio image and the second signal component 105-2 that represents the non-focus portion (e.g. a 'peripheral' portion) of the spatial audio image.
the signal divider 126 creates the first signal component 105-1 that represents the focus portion of the spatial audio image and the second signal component 105-2 that represents the non-focus portion (e.g. a 'peripheral' portion) of the spatial audio image but it does not necessarily place a time-frequency tile S ( i, b, n ) into either the first signal component 105-1 or the second signal component 105-2. It can, as in this example, scale or weight the contribution of a time-frequency tile S ( i, b, n ) more heavily in one of the first signal component 105-1 or the second signal component 105-2 dependent upon the decomposition coefficients β ( k, n ).
the scaling coefficient β ( b, n )^p in the equation (9) may be replaced with another scaling coefficient that increases with increasing value of the decomposition coefficient β ( b, n ) (and decreases with decreasing value of the decomposition coefficient β ( b, n )) and the scaling coefficient (1 - β ( b, n ))^p in the equation (10a) may be replaced with another scaling coefficient that decreases with increasing value of the decomposition coefficient β ( b, n ) (and increases with decreasing value of the decomposition coefficient β ( b, n )).
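A minimal sketch of this weighted split, assuming the first signal component is scaled by the decomposition coefficient raised to a power p and the second by its complement raised to the same power. The function name and the default p = 0.5 are assumptions of this example; with p = 0.5 the energies of the two parts add up to the energy of the input tile.

```python
def split_soft(tile, beta, p=0.5):
    """Split one time-frequency tile into focus and non-focus parts.

    tile: complex value of the tile; beta: decomposition coefficient in
    [0, 1]. Returns (s_dr, s_sw): the part kept out of stereo widening
    and the part passed on to stereo widening.
    """
    s_dr = (beta ** p) * tile
    s_sw = ((1.0 - beta) ** p) * tile
    return s_dr, s_sw
```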
the signal decomposition may be carried out for a plurality of frequency sub-bands k in a plurality of channels i in a plurality of time frames n based on the time-frequency tiles S ( i, b, n ), according to the equation (10b):
$$S_{sw}(i,b,n) = \begin{cases} S(i,b,n), & \beta(b,n) \le \beta_{Th} \\ 0, & \beta(b,n) > \beta_{Th} \end{cases}$$
$$S_{dr}(i,b,n) = \begin{cases} 0, & \beta(b,n) \le \beta_{Th} \\ S(i,b,n), & \beta(b,n) > \beta_{Th} \end{cases} \tag{10b}$$
wherein β_Th denotes a defined threshold value that has a value in the range from 0 to 1, e.g.
the signal decomposition parameter β_Th can be defined by the control input 10. If applying the equation (10b), the temporal smoothing of the decomposition coefficients 125 described in the foregoing and/or temporal smoothing of the resulting signal components S sw ( i, b, n ) and S dr ( i, b, n ) may be advantageous for improved perceivable quality of the resulting widened stereo signal 115.
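The hard, threshold-based variant of the split can be sketched as below: each tile goes entirely to one component depending on whether its decomposition coefficient exceeds the threshold. The function name and the default threshold of 0.5 are hypothetical.

```python
def split_hard(tile, beta, beta_th=0.5):
    """Assign a tile wholly to the focus or the non-focus component.

    Returns (s_dr, s_sw). A coefficient above the threshold sends the
    tile to s_dr (not widened); otherwise it goes to s_sw (widened).
    """
    if beta > beta_th:
        return tile, 0j
    return 0j, tile
```

Because the assignment switches abruptly at the threshold, a tile whose coefficient hovers around the threshold can flip between components from frame to frame, which is one reason temporal smoothing may help here.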
the decomposition coefficients β ( k, n ) according to the equation (8) are derived on time-frequency tile basis, whereas the equations (10a) and (10b) apply the decomposition coefficients β ( b, n ) on frequency bin basis.
the decomposition coefficients β ( k, n ) derived for a frequency sub-band k may be applied for each frequency bin b within the frequency sub-band k .
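Expanding per-sub-band values to per-bin values amounts to repeating each sub-band value over the bins between that sub-band's lowest and highest bin. A small sketch, with hypothetical names and band edges:

```python
def expand_to_bins(band_values, band_edges, num_bins):
    """Copy each sub-band value to every bin of that sub-band.

    band_edges: list of (lowest_bin, highest_bin) pairs, both inclusive.
    Bins not covered by any sub-band keep the value 0.0.
    """
    per_bin = [0.0] * num_bins
    for value, (low, high) in zip(band_values, band_edges):
        for b in range(low, high + 1):
            per_bin[b] = value
    return per_bin
```

The same expansion applies to any other quantity that is derived per sub-band but applied per bin.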
the transform-domain stereo signal 103 is divided, in each time-frequency tile S ( i, b, n ), into the first signal component 105-1 that represents sound components positioned in the focus portion of the spatial audio image represented by the stereo signal 101 and into the second signal component 105-2 that represents sound components positioned outside the focus portion of the spatial audio image represented by the stereo signal 101.
the first signal component 105-1 is subsequently provided for playback without applying stereo widening thereto
the second signal component 105-2 is subsequently provided for playback after being subjected to stereo widening.
the audio processing system 100, 100' may comprise the re-panner 106 that is arranged to generate a modified first signal component 107 on basis of the first signal component 105-1, wherein one or more sound sources represented by the first signal component 105-1 are repositioned in the spatial audio image.
Figure 4 illustrates a block diagram of some components and/or entities of the re-panner 106 according to an example.
entities of the re-panner 106 according to the example of Figure 4 are described in more detail.
the re-panner 106 may include further entities and/or some entities depicted in Figure 4 may be omitted or combined with other entities
the re-panner 106 may comprise an energy estimator 128 for estimating energy of the first signal component 105-1.
the energy values 129 are provided for a direction estimator 130 and for a re-panning gain determiner 136 for further processing therein.
the energy value computation may involve deriving a respective energy value E dr ( i, k, n ) for a plurality of frequency sub-bands k in a plurality of audio channels i (plurality of virtual loudspeakers) in a plurality of time frames n based on the time-frequency tiles S dr ( i, b, n ).
the energy values 119 computed in the energy estimator 118 may be re-used in the re-panner 106, thereby dispensing with a dedicated energy estimator 128 in the re-panner 106.
although the energy estimator 118 of the signal decomposer 104 estimates the energy values 119 based on the transform-domain stereo signal 103 instead of the first signal component 105-1, the energy values 119 enable correct operation of the direction estimator 130 and the re-panning gain determiner 136.
the re-panner 106 may comprise the direction estimator 130 for estimating perceivable arrival direction of the sound represented by the first signal component 105-1 based on the energy values 129 in view of the target virtual loudspeaker configuration applied in the stereo signal 101.
the direction estimation may comprise computation of direction angles 131 based on the energy values 129 in view of the target virtual loudspeaker positions, which direction angles 131 are provided for a direction adjuster 132 for further processing therein.
the direction estimation may involve deriving a respective direction angle 131, θ_dr ( k, n ), for a plurality of frequency sub-bands k in a plurality of time frames n based on the estimated energies E dr ( i, k, n ) and the positions θ_in( i ) of the target virtual loudspeakers.
the direction angles 131, θ_dr ( k, n ), indicate the estimated perceived arrival direction of the sound in frequency sub-bands of the first signal component 105-1.
the direction angles 121 computed in the direction estimator 120 may be re-used in the re-panner 106, thereby dispensing with a dedicated direction estimator 130 in the re-panner 106.
although the direction estimator 120 of the signal decomposer 104 estimates the direction angles 121 based on the energy values 119 derived from the transform-domain stereo signal 103 instead of the first signal component 105-1, the sound source positions are the same or substantially the same and hence the direction angles 121 enable correct operation of the direction adjuster 132.
the re-panner 106 may comprise the direction adjuster 132 for modifying the estimated perceivable arrival direction (direction angle 131) of the sound represented by the first signal component 105-1.
the direction adjuster 132 may derive modified direction angles 133 based on the direction angles 131.
the modified direction angles 133 are provided for a panning gain determiner 134 for further processing therein.
the direction adjustment may comprise mapping the currently estimated perceivable arrival direction, direction angles 131, into respective modified direction angles 133 that represent new adjusted perceivable arrival direction of the sound in view of the control information 10.
the mapping coefficient μ for panning can be defined explicitly by the control input 10.
if the stereo widening 112 widens the signal 105-2 by a certain amount, then the re-panner 106 widens the signal 105-1 via re-panning by the same amount.
the stereo widening 112 may widen the signal so that a sound source originally at the location of 5 degrees is perceived after the widening at the location corresponding to 10 degrees in the original signals.
the mapping coefficient μ and the derivation of the modified direction angles θ' ( k, n ) serve as a non-limiting example and a different procedure for deriving the modified direction angles 133 may be applied instead.
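The mapping equation is not shown in this excerpt. A simple reading, used here purely as an assumption, is a linear scaling of the direction angle by the mapping coefficient; this matches the earlier example in which a source at 5 degrees ends up at 10 degrees when the coefficient is 2. The optional clamp is a hypothetical safeguard and is not taken from the text.

```python
def modify_direction_deg(theta_deg, mapping_coeff, limit_deg=None):
    """Scale a direction angle by a mapping coefficient.

    If limit_deg is given, the result is clamped to [-limit_deg, limit_deg]
    (an assumed safeguard, e.g. the virtual loudspeaker angle).
    """
    modified = mapping_coeff * theta_deg
    if limit_deg is not None:
        modified = max(-limit_deg, min(limit_deg, modified))
    return modified
```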
the re-panner 106 may comprise the panning gain determiner 134 for computing a set of panning gains 135 on basis of the modified direction angles 133.
the panning gain determination may comprise, for example, using vector base amplitude panning (VBAP) technique known in the art to compute a respective panning gain g' ( i, k, n ) for a plurality of frequency sub-bands k in a plurality of audio channels i in a plurality of time frames n based on the modified direction angles θ' ( k, n ).
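For a single loudspeaker pair, two-dimensional VBAP reduces to solving a 2x2 linear system and normalizing the gains to unit energy. The sketch below assumes loudspeakers at plus and minus 30 degrees, positive angles to the left, and a source direction within the loudspeaker span; the function name is hypothetical.

```python
import math


def vbap_stereo_gains(theta_deg, left_deg=30.0, right_deg=-30.0):
    """Return energy-normalized (g_left, g_right) for a source direction."""
    def unit(angle_deg):
        rad = math.radians(angle_deg)
        return math.cos(rad), math.sin(rad)

    (lx, ly), (rx, ry), (px, py) = unit(left_deg), unit(right_deg), unit(theta_deg)
    # Solve g_left * l + g_right * r = p with Cramer's rule.
    det = lx * ry - ly * rx
    g_left = (px * ry - py * rx) / det
    g_right = (lx * py - ly * px) / det
    # Negative gains mean the source is outside the pair; clamp them.
    g_left, g_right = max(g_left, 0.0), max(g_right, 0.0)
    norm = math.hypot(g_left, g_right)
    if norm == 0.0:
        raise ValueError("direction cannot be rendered with this pair")
    return g_left / norm, g_right / norm
```

A source at the front direction yields equal gains, and a source at a loudspeaker position yields a gain of 1 for that loudspeaker and 0 for the other.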
the re-panner 106 may comprise the re-panning gain determiner 136 for deriving re-panning gains 137 based on the panning gains 135 and the energy values 129.
the re-panning gains 137 are provided for a re-panning processor 138 for derivation of a modified first signal component 107 therein.
the re-panning gains g r ( i, k, n ) obtained from the equation (20) may be applied as such as the re-panning gains 137 that are provided for the re-panning processor 138 for derivation of the modified first signal component 107 therein.
energy-based temporal smoothing is applied to the re-panning gains g r ( i, k, n ) obtained from the equation (20) in order to derive smoothed re-panning gains g' r ( i, k, n ), which may be provided for the re-panning processor 138 to be applied for re-panning therein.
the re-panner 106 may comprise the re-panning processor 138 for deriving the modified first signal component 107 on basis of the first signal component 105-1 in dependence of the re-panning gains 137.
the sound sources in the focus portion of the spatial audio image are repositioned (i.e. re-panned) in accordance with the modified direction angles 133 derived in the direction adjuster 132 to account for (possible) differences between direct reproduction of stereo signals over headphones and reproduction of stereo widening 112 processed stereo signals over headphones.
the channels of the modified first signal component 107 are provided to an inverse transform entity 108-1 for conversion from the transform domain to the time domain therein.
the re-panning gains g r ( i, k, n ) according to the equation (20) are derived on time-frequency tile basis, whereas the equation (21) applies the re-panning gains g r ( i, k, n ) on frequency bin basis.
the re-panning gain g r ( i, k, n ) derived for a frequency sub-band k may be applied for each frequency bin b within the frequency sub-band k .
panning can apply to each time-frequency tile S ( i, b, n ) different combinations of controlled gain g r ( i, b, n ), controlled reverberation or decorrelation and, optionally, controlled delays to produce the channels of the modified first signal component 107.
the reverberation or decorrelation is typically added only at a low level.
the modified first signal component 107 may be divided into two paths (e.g., using a variable received in the control information 10).
the signal in the second path is processed using reverberation or decorrelation.
the signal in the first path is passed forward without processing and without any cross-channel mixing.
the signals in the two paths are combined, e.g., by summing them.
the audio processing system may comprise the inverse transform entity 108-1 that is arranged to transform the channels of the modified first signal component 107 from the transform-domain (back) to the time domain, thereby providing a time-domain modified first signal component 109-1.
the audio processing system 100 may comprise an inverse transform entity 108-2 that is arranged to transform channels of the second signal component 105-2 from the transform-domain (back) to the time domain, thereby providing a time-domain second signal component 109-2. Both the inverse transform entity 108-1 and the inverse transform entity 108-2 make use of an applicable inverse transform that inverts the time-to-transform-domain conversion carried out in the transform entity 102.
the inverse transform entities 108-1, 108-2 may apply an inverse STFT or a (synthesis) QMF bank to provide the inverse transform.
the resulting time-domain modified first signal component 109-1 may be denoted as s dr ( i, m ) and the resulting time-domain second signal component 109-2 may be denoted as s sw ( i, m ), where i denotes the channel and m denotes a time index (i.e. a sample index).
the inverse transform entities 108-1, 108-2 are omitted, and the modified first signal component 107 is provided as a transform-domain signal to the (optional) delay element 110' and the transform-domain second signal component 105-2 is provided as a transform-domain signal to the stereo widening processor 112'.
the audio processing system 100 may comprise the stereo widening processor 112 that is arranged to generate, on basis of the second signal component 109-2, the modified second signal component 113 where the width of a spatial audio image is extended from that represented by the second signal component 109-2.
the stereo widening processor 112 may apply any stereo widening technique known in the art to extend the width of the spatial audio image.
the stereo widening processor 112 processes the second signal component s sw ( i, m ) into the modified second signal component s' sw (i, m ) .
the second signal component s sw ( i, m ) and the modified second signal component s' sw ( i, m ) are respective time-domain signals.
Stereo widening techniques can involve adding a processed (e.g. filtered) version of a contralateral channel signal to each of the left and right channel signals of the stereo signal in order to derive an output stereo signal having a widened spatial audio image (a widened stereo signal).
a processed version of the right channel signal of the stereo signal is added to the left channel signal of the stereo signal to create the left channel of a widened stereo signal and a processed version of the left channel signal of the stereo signal is added to the right channel signal of the stereo signal to create the right channel of the widened stereo signal.
the procedure of deriving the widened stereo signal may further involve pre-filtering (or otherwise processing) each of the left and right channel signals of the stereo signal prior to adding the respective processed contralateral signals thereto in order to preserve desired frequency response in the widened stereo signal.
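The cross-feed structure described above can be sketched in the time domain as follows. The contralateral "filter" here is only a delayed, scaled copy of the opposite channel, which is a deliberately simplistic stand-in chosen for illustration; the gain and delay values are hypothetical, and no pre-filtering of the direct paths is applied.

```python
def widen(left, right, cross_gain=-0.3, delay=2):
    """Add a processed copy of the opposite channel to each channel.

    left, right: equal-length lists of samples.
    Returns (widened_left, widened_right).
    """
    def processed(x):
        # Toy contralateral filter: delay by `delay` samples and scale.
        return ([0.0] * delay + [cross_gain * v for v in x])[:len(x)]

    from_right, from_left = processed(right), processed(left)
    widened_left = [a + c for a, c in zip(left, from_right)]
    widened_right = [b + c for b, c in zip(right, from_left)]
    return widened_left, widened_right
```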
stereo widening readily generalizes into widening the spatial audio image of a multi-channel input audio signal, thereby deriving an output multi-channel audio signal having a widened spatial audio image (a widened multi-channel signal).
the processing involves creating the left channel of the widened multi-channel audio signal as a sum of (first) filtered versions of channels of the multi-channel input audio signal and creating the right channel of the widened multi-channel audio signal as a sum of (second) filtered versions of channels of the multi-channel input audio signal.
a dedicated predefined filter may be provided for each pair of an input channel (channels of the multi-channel input signal) and an output channel (left and right).
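As a rough sketch of the per-pair filtering just described, assuming transform-domain processing where each input channel i has one complex value per frequency bin b. The names `spectra`, `h_left` and `h_right` and all numeric values are illustrative assumptions.

```python
def mix_to_two_channels(spectra, h_left, h_right):
    """Sum filtered input channels into a left and a right output spectrum.

    spectra[i][b]: complex value of input channel i in frequency bin b.
    h_left[i][b], h_right[i][b]: filter response for the pair
    (input channel i, output channel left/right) in bin b.
    """
    num_channels = len(spectra)
    num_bins = len(spectra[0])
    left = [sum(h_left[i][b] * spectra[i][b] for i in range(num_channels))
            for b in range(num_bins)]
    right = [sum(h_right[i][b] * spectra[i][b] for i in range(num_channels))
             for b in range(num_bins)]
    return left, right


# Three hypothetical input channels, two frequency bins each.
spectra = [[1 + 0j, 2 + 0j],
           [0 + 1j, 1 + 0j],
           [1 + 0j, 0 + 0j]]
# Placeholder responses: channel 0 goes left, channel 2 goes right,
# channel 1 is shared equally.
h_left = [[1.0, 1.0], [0.5, 0.5], [0.0, 0.0]]
h_right = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]

left_spec, right_spec = mix_to_two_channels(spectra, h_left, h_right)
```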
the filters H left ( i, b ) and H right ( i, b ) can include HRTFs, or HRTFs (or BRIRs) can be used later in the processing chain.
the filter H left ( i, b ) could be HRTFs to 90 degrees (i.e. to left).
the filter H right ( i, b ) could be HRTFs to -90 degrees (i.e. to right).
the filter H left ( i, b ) can comprise a direct (dry) part and an ambient part comprising one or more indirect (wet) paths.
$H_{\text{left}}(i, b) = r^{1/2} H_{\text{left,direct}}(i, b) + (1 - r)^{1/2} H_{\text{left,ambient}}(i, b)$, where r is the ratio between direct and ambient parts.
the direct to ambient ratio r can be defined by the control input 10.
the direct part filter H left,direct ( i, b ) can be HRTFs to 90 degrees (i.e. to left).
the indirect part filter H left,ambient ( i, b ) can represent, for each time-frequency tile S ( i, b, n ), different indirect paths that each has a controlled gain, a controlled reverberation or decorrelation and, optionally, a controlled delay.
Each different indirect path is processed using a respective HRTF.
the directions of the HRTFs are typically selected so that they cover several directions around the listener, creating a perception of envelopment and/or spaciousness.
the filters of the different indirect paths are typically combined to the single filter H left,ambient ( i, b ) before they are applied.
the filter H right ( i, b ) can comprise a direct (dry) part and an ambient part comprising one or more indirect (wet) paths.
$H_{\text{right}}(i, b) = r^{1/2} H_{\text{right,direct}}(i, b) + (1 - r)^{1/2} H_{\text{right,ambient}}(i, b)$, where r is the ratio between direct and ambient parts.
the direct part filter H right,direct ( i, b ) can be HRTFs to -90 degrees (i.e. to right).
the indirect part filter H right,ambient ( i, b ) can represent, for each time-frequency tile S ( i, b, n ), different indirect paths that each has a controlled gain, a controlled reverberation or decorrelation and, optionally, a controlled delay.
Each different indirect path is processed using a respective HRTF.
the directions of the HRTFs are typically selected so that they cover several directions around the listener, creating a perception of envelopment and/or spaciousness.
the filters of the different indirect paths are typically combined to the single filter H right, ambient ( i, b ) before they are applied.
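The direct/ambient weighting and the pre-combination of indirect paths can be sketched as below. This assumes that r lies in the range 0 to 1 (so that both square-root gains are real) and that each indirect path is already available as a per-bin complex response; the helper names and values are hypothetical.

```python
import math


def combine_indirect_paths(path_filters, path_gains):
    """Sum several indirect-path responses into a single ambient filter."""
    num_bins = len(path_filters[0])
    return [sum(g * p[b] for g, p in zip(path_gains, path_filters))
            for b in range(num_bins)]


def mix_direct_ambient(h_direct, h_ambient, r):
    """Per bin: H = sqrt(r) * H_direct + sqrt(1 - r) * H_ambient."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("r is assumed to lie in [0, 1]")
    g_direct = math.sqrt(r)
    g_ambient = math.sqrt(1.0 - r)
    return [g_direct * d + g_ambient * a for d, a in zip(h_direct, h_ambient)]


# Placeholder responses for two frequency bins (not measured HRTFs).
h_direct = [1 + 0j, 1 + 0j]
indirect_paths = [[0 + 1j, 0 + 1j],
                  [0 - 1j, 0 + 1j]]
h_ambient = combine_indirect_paths(indirect_paths, [0.5, 0.5])

h_mixed = mix_direct_ambient(h_direct, h_ambient, r=0.25)
h_dry_only = mix_direct_ambient(h_direct, h_ambient, r=1.0)
```

Because the two gains are sqrt(r) and sqrt(1 - r), their squares always sum to one, so r trades direct energy against ambient energy without changing the nominal total.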
the target virtual loudspeaker position indication 12 may be optionally provided to the stereo widening block 112.
the indicated virtual loudspeaker positions may then be used to select corresponding HRTFs for the H left and H right filters. For example, for a stereo signal, HRTFs to +/-30 degrees may be selected by default. However, in order to produce a maximally strong widening effect for a stereo signal, HRTFs to +/-90 degrees may be selected instead.
the stereo widening block 112 may map the indicated virtual loudspeaker positions to modified positions (for stronger widening effect) which are then used to derive the filters H left and H right .
Figure 5 illustrates a block diagram of some components and/or entities of the stereo widening processor 112 according to a non-limiting example.
the stereo widening processor 112 is configured to provide cross-channel mixing means for applying a headphone filter H LL , H RL , H LR and H RR to each one of the plurality of input channels before mixing those channels to produce a modified second signal component 113 comprising two output channels (LEFT, RIGHT), wherein the headphone filter H mn applied to an input channel that is mixed to provide an output channel is dependent upon an identity of the output channel m and an identity of the input channel n.
the headphone filter H mn can comprise a head related transfer function dependent upon an identity of the output channel m and an identity of the input channel n.
the headphone filter H mn for an input channel n can be configured to mix a direct-rendering version of the input channel with an ambient-rendering version of the input channel.
the relative gain of the direct version of the input channel compared to the ambient version of the input channel in a mix in the headphone filter can be controlled via a user-controllable parameter r.
the headphone filter for an input channel can be configured to mix a single-path direct version of the input channel with a multiple-path ambient version of the input channel, where a head related transfer function is used to form the single-path direct version of the input channel and an indirect path filter is used in combination with a head related transfer function for each path of the multiple paths, to form the multiple-path ambient version of the input channel.
the indirect path filter can comprise decorrelation means or reverberation means.
the cross-channel mixing causes stereo-widening for headphone apparatus such that a width of a spatial audio image associated with the modified second signal component is greater than a width of a spatial audio image associated with the second signal component before cross-channel mixing of the second signal component.
the left channel of the modified second signal component 113 is created as a sum of the left channel of the second signal component 109-2 filtered by the filter H LL and the right channel of the second signal component 109-2 filtered by the filter H LR
the right channel of the modified second signal component 113 is created as a sum of the left channel of the second signal component 109-2 filtered by the filter H RL and the right channel of the second signal component 109-2 filtered by the filter H RR .
the stereo widening procedure is carried out on basis of the time-domain second signal component 109-2.
the stereo widening procedure (e.g. one that makes use of the filtering structure of Figure 5 ) may be carried out in the transform domain.
the order of the inverse transform entity 108-2 and the stereo widening processor 112 is changed.
the stereo widening processor 112 may be provided with a dedicated set of filters H LL , H RL , H LR and H RR that is designed to produce a desired extent of stereo widening for a target virtual loudspeaker configuration.
the stereo widening processor 112 may be provided with a plurality of sets of filters H LL , H RL , H LR and H RR , each set designed to produce a desired extent of stereo widening for a target virtual loudspeaker configuration.
the set of filters is selected in dependence of the indicated target virtual loudspeaker configuration.
the stereo widening processor 112 may dynamically switch between sets of filters e.g. in response to a change in the indicated virtual loudspeaker positions. There are various ways for designing a set of filters H LL , H RL , H LR and H RR .
the filter H LL can be filter H left ( left, b ) described above
the filter H LR can be filter H left ( right, b ) described above
the filter H RR can be filter H right ( right, b ) described above
the filter H RL can be filter H right ( left, b ) described above.
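A minimal sketch of selecting one of several filter sets and applying the H LL , H LR , H RL , H RR structure of Figure 5 in the time domain. The dictionary keys, the nearest-angle selection rule and the tap values are assumptions made for illustration; real filter sets would be derived from HRTFs or similar.

```python
def fir_filter(signal, taps):
    """Causal FIR filtering; the output has the same length as the input."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out


# Hypothetical filter sets keyed by target virtual loudspeaker angle (degrees).
FILTER_SETS = {
    30.0: {"LL": [1.0], "LR": [0.0, 0.25], "RL": [0.0, 0.25], "RR": [1.0]},
    90.0: {"LL": [1.0], "LR": [0.0, 0.0, 0.5], "RL": [0.0, 0.0, 0.5], "RR": [1.0]},
}


def select_filter_set(target_angle_deg):
    """Pick the stored set whose design angle is closest to the target."""
    nearest = min(FILTER_SETS, key=lambda angle: abs(angle - target_angle_deg))
    return FILTER_SETS[nearest]


def apply_filter_set(left, right, filter_set):
    """Left out = H_LL*left + H_LR*right; right out = H_RL*left + H_RR*right."""
    out_left = [a + b for a, b in zip(fir_filter(left, filter_set["LL"]),
                                      fir_filter(right, filter_set["LR"]))]
    out_right = [a + b for a, b in zip(fir_filter(left, filter_set["RL"]),
                                       fir_filter(right, filter_set["RR"]))]
    return out_left, out_right


chosen = select_filter_set(80.0)
wide_left, wide_right = apply_filter_set([1.0, 0.0, 0.0, 0.0],
                                         [0.0, 0.0, 0.0, 0.0], chosen)
```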
the stereo-widening performed by the spatial audio processor 112 can be performed in the time domain ( FIG 1A ) or the transform domain ( FIG 1B ).
the audio processing system 100 may comprise the delay element 110 that is arranged to delay the modified first signal component 109-1 by a predefined time delay, thereby creating a delayed first signal component 111.
the time delay is selected such that it matches or substantially matches the delay resulting from stereo widening processing applied in the stereo widening processor 112, thereby keeping the delayed first signal component 111 temporally aligned with the modified second signal component 113.
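The delay matching can be illustrated with a short sketch. It assumes, purely for illustration, that the widening filter is a symmetric (linear-phase) FIR filter with an odd number of taps, so that its delay is (N - 1) / 2 samples; the disclosure does not prescribe this filter type, and the function names are hypothetical.

```python
def linear_phase_delay(taps):
    """Delay in samples of a symmetric FIR filter with an odd tap count."""
    return (len(taps) - 1) // 2


def delay_signal(signal, delay_samples):
    """Delay by an integer number of samples, keeping the original length."""
    if delay_samples < 0:
        raise ValueError("delay must be non-negative")
    padded = [0.0] * delay_samples + list(signal)
    return padded[:len(signal)]


# Hypothetical widening filter taps (symmetric, three taps -> one sample delay).
WIDENING_TAPS = [0.25, 0.5, 0.25]

focus_component = [1.0, 2.0, 3.0, 4.0]
aligned_focus = delay_signal(focus_component, linear_phase_delay(WIDENING_TAPS))
```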
the delay element 110 processes the modified first signal component s dr ( i, m ) into the delayed first signal component s' dr ( i, m ).
the time delay is applied in the time domain.
the order of the inverse transform entity 108-1 and the delay element 110 may be changed, thereby resulting in application of the predefined time delay in the transform domain.
the delay element 110' is optional and, if included, it is arranged to operate in the transform-domain, in other words to apply the predefined time delay to the modified first signal component 107 to create the delayed modified first signal component 111' in the transform-domain for provision to the signal combiner 114' as a transform-domain signal. It will be appreciated from the foregoing that if one wants to create a perception of a sound source outside the headphones, stereo widening 112 is needed (using, e.g., HRTFs). However, in between the headphones, the sound can be positioned without stereo widening.
re-panning can be used to position sound sources in between the headphones (sounds cannot be positioned outside the headphones with this method). However, the focus portion contains sounds only near the center, so positioning them in between the headphones is sufficient.
the peripheral portion 113 may contain sound sources perceived also outside the headphone positions.
the focus portion 111 does not contain sound sources perceived outside the headphone positions, but still they may be wider than they originally were.
the audio processing system 100 may comprise the signal combiner 114 that is arranged to combine the delayed first signal component 111 and the modified second signal component 113 into the widened stereo signal 115, where the width of spatial audio image is partially extended (in the peripheral but not necessarily the front focus portions) from that of the stereo signal 101.
the widened stereo signal 115 may be derived as a sum, as an average or as another linear combination of the delayed first signal component 111 and the modified second signal component 113, e.g.
$s_{\text{out}}(i, m) = s'_{\text{sw}}(i, m) + s'_{\text{dr}}(i, m)$, where s out ( i, m ) denotes the widened stereo signal 115.
the signal combiner 114' is arranged to operate in the transform-domain, in other words to combine the (transform-domain) delayed modified first signal component 111' with the (transform-domain) modified second signal component 113' into the (transform-domain) widened stereo signal 115' for provision to the inverse transform entity 108'.
the inverse transform entity 108' is arranged to convert the (transform-domain) widened stereo signal 115' from the transform domain into the (time-domain) widened stereo signal 115.
the inverse transform entity 108' may carry out the conversion in a similar manner as described in the foregoing in context of the inverse transform entities 108-1, 108-2.
Each of the exemplifying audio processing systems 100, 100' described in the foregoing via a number of examples may be further varied in a number of ways. In the following, non-limiting examples in this regard are described.
description of elements of the audio processing systems 100, 100' refer to processing of relevant audio signals in a plurality of frequency sub-bands k.
the processing of the audio signal in each element of the audio processing systems 100, 100' is carried out across (all) frequency sub-bands k .
the processing of the audio signal is carried out in a limited number of frequency sub-bands k .
the processing in a certain element of the audio processing system 100, 100' may be carried out for a predefined number of lowest frequency sub-bands k , for a predefined number of highest frequency sub-bands k, or for a predefined subset of frequency sub-bands k in the middle of the frequency range such that a first predefined number of lowest frequency sub-bands k and a second predefined number of highest frequency sub-bands k is excluded from the processing.
the frequency sub-bands k excluded from the processing e.g. ones at the lower end of the frequency range and/or ones at the higher end of the frequency range
an example in which the processing is carried out only for a limited subset of frequency sub-bands k involves one or both of the re-panner 106 and the stereo widening processor 112, 112', which may only process the respective input signal in a respective desired sub-range of frequencies, e.g. in a predefined number of lowest frequency sub-bands k or in a predefined subset of frequency sub-bands k in the middle of the frequency range.
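Restricting a processing step to a subset of sub-bands can be sketched as below. The sketch assumes that excluded sub-bands are passed through unchanged, which is an assumption for illustration rather than something stated here; the function and parameter names are hypothetical.

```python
def process_middle_bands(bands, process, num_low_excluded, num_high_excluded):
    """Apply `process` to the middle sub-bands only.

    bands[k] is the value of sub-band k for one channel and one frame.
    The lowest `num_low_excluded` and highest `num_high_excluded` sub-bands
    are passed through unchanged (an assumption of this sketch).
    """
    first = num_low_excluded
    last = len(bands) - num_high_excluded
    return [process(value) if first <= k < last else value
            for k, value in enumerate(bands)]


bands_in = [1.0, 2.0, 3.0, 4.0, 5.0]
bands_out = process_middle_bands(bands_in, lambda v: 2.0 * v,
                                 num_low_excluded=1, num_high_excluded=2)
```

Setting `num_high_excluded` to zero gives the "all but the lowest sub-bands" variant, and setting `num_low_excluded` to zero with a non-zero `num_high_excluded` gives the "lowest sub-bands only" variant.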
the input audio signal 101 may comprise a multi-channel signal different from a two-channel stereophonic audio signal, e.g. surround signal.
the audio processing technique(s) described in the foregoing with references to the left and right channels of the stereo signal 101 may be applied to the front left and front right channels of the 5.1-channel surround signal to derive the left and right channels of the output audio signal 115.
the other channels of the 5.1-channel surround signal may be processed e.g. such that the center channel of the 5.1-channels surround signal scaled by a predefined gain factor (e.g.
the rear left and right channels of the 5.1-channel surround signal may be processed using a conventional stereo widening technique that makes use of widening filter(s) (utilizing, e.g., HRTFs or BRIRs) that correspond(s) to respective target positions of the left and right rear loudspeakers (e.g. ±110 degrees with respect to the front direction).
the LFE channel of the 5.1-channel surround signal may be added to the center signal of the 5.1-channel surround signal prior to adding the scaled version thereof to the left and right channels of the output audio signal 115.
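The 5.1 handling outlined above can be sketched as a simple downmix. This assumes that the front pair has already been processed as described for the stereo signal 101, that the rear pair has already been widened, and that all contributions are summed into the two output channels; the summation of the rear pair and the gain value of 0.5 are illustrative assumptions, not values from this disclosure.

```python
def downmix_5_1(front_processed, rear_widened, center, lfe, center_gain):
    """Combine processed 5.1 channels into a left/right output pair.

    front_processed, rear_widened: (left, right) tuples of sample lists.
    center, lfe: sample lists. The LFE is added to the center before the
    scaled sum is added to both output channels.
    """
    center_plus_lfe = [c + l for c, l in zip(center, lfe)]
    scaled = [center_gain * x for x in center_plus_lfe]
    out_left = [f + r + s for f, r, s in
                zip(front_processed[0], rear_widened[0], scaled)]
    out_right = [f + r + s for f, r, s in
                 zip(front_processed[1], rear_widened[1], scaled)]
    return out_left, out_right


# Two-sample toy signals; the gain is a hypothetical placeholder.
front = ([1.0, 1.0], [0.0, 0.0])
rear = ([0.0, 0.5], [0.5, 0.0])
out_l, out_r = downmix_5_1(front, rear, center=[1.0, 0.0], lfe=[1.0, 2.0],
                           center_gain=0.5)
```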
the input audio signal 101 may comprise N spatially distributed channels that are processed to produce a two-channel audio signal 115 processed specifically for playback via headphone apparatus.
the mixing of M channels to produce a first signal component 111, 111' of the two-channel stereophonic audio signal 115 can occur at re-panner 106.
the mixing of M' channels to produce a second signal component 113, 113' of the two-channel stereophonic audio signal 115 can occur at the stereo widening processor for headphone apparatus 112.
Audio events may move within the sound image.
an audio event sound object
the audio event is rendered via the first signal component 111, 111' of the two-channel stereophonic audio signal 115.
when an audio event is positioned within the non-focus, peripheral range, the audio event is rendered via the second signal component 113, 113' of the two-channel stereophonic audio signal 115.
the audio processing system 100, 100' may enable adjusting balance between the contribution from the first signal component 105-1 and the second signal component 105-2 in the resulting widened stereo signal 115.
This may be provided, for example, by applying respective different scaling gains to the first signal component 105-1 (or a derivative thereof) and to the second signal component 105-2 (or a derivative thereof).
respective scaling gains may be applied e.g. in the signal combiner 114, 114' to scale the signal components derived from the first and second signal components 105-1, 105-2 accordingly, or in the signal divider 126 to scale the first and second signal components 105-1, 105-2 accordingly.
a single respective scaling gain may be defined for scaling the first and second signal components 105-1, 105-2 (or a respective derivative thereof) across all frequency sub-bands or in predefined sub-set of frequency sub-bands.
different scaling gains may be applied across the frequency sub-bands, thereby enabling adjustment of the balance between the contribution from the first and second signal components 105-1, 105-2 only on some of the frequency sub-bands and/or adjusting the balance differently at different frequency sub-bands.
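A short sketch of a combiner that applies separate per-sub-band scaling gains to the focus and widened components before summing them. The per-band list layout and the gain values are assumptions for illustration.

```python
def combine_with_gains(focus, widened, focus_gains, widened_gains):
    """Linear combination per sub-band k: g_f[k]*focus[k] + g_w[k]*widened[k]."""
    return [gf * f + gw * w
            for f, w, gf, gw in zip(focus, widened, focus_gains, widened_gains)]


# One channel, one frame, three sub-bands; all values are placeholders.
focus_bands = [1.0, 1.0, 1.0]
widened_bands = [2.0, 2.0, 2.0]

# Attenuate the widened component progressively in the higher sub-bands.
combined = combine_with_gains(focus_bands, widened_bands,
                              focus_gains=[1.0, 1.0, 1.0],
                              widened_gains=[1.0, 0.5, 0.0])
```

Using the same gain value in every sub-band reduces this to a single broadband balance control, while gains of 0.5 for both components give the averaging variant of the combination.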
the audio processing system 100, 100' may enable scaling of one or both of the first signal component 105-1 and the second signal component 105-2 (or respective derivatives thereof) independently of each other, thereby enabling equalization (across frequency sub-bands) for one or both of the first and second signal components.
This may be provided, for example, by applying respective equalization gains to the first signal component 105-1 (or a derivative thereof) and to the second signal component 105-2 (or a derivative thereof).
a dedicated equalization gain may be defined for one or more frequency sub-bands for the first signal component 105-1 and/or for the second signal component 105-2.
a respective equalization gain may be applied e.g. in the signal divider 126 or in the signal combiner 114, 114' to scale a respective frequency sub-band of the respective one of the first and second signal components 105-1, 105-2 (or a respective derivative thereof).
the equalization gain may be the same for both the first and second signal components 105-1, 105-2 or different equalization gains may be applied for the first and second signal components 105-1, 105-2.
Operation of the audio processing system 100, 100' described in the foregoing via multiple examples enables adaptively decomposing the stereo signal 101 into the first signal component 105-1 that represents the focus portion of the spatial audio image and that is provided for playback without application of stereo widening thereto and into the second signal component 105-2 that represents peripheral (non-focus) portion of the spatial audio image that is subjected to the stereo widening processing.
since the decomposition is carried out on basis of audio content conveyed by the stereo signal 101 on a frame-by-frame basis, the audio processing system 100, 100' enables both adaptation to relatively static spatial audio images of different characteristics and adaptation to changes in the spatial audio image over time.
the disclosed stereo widening technique that relies on excluding coherent sound sources within the focus portion of the spatial audio image from the stereo widening processing and applies the stereo widening processing predominantly to coherent sounds that are outside the focus portion and to non-coherent sounds (such as ambience) enables improved timbre and reduced 'coloration' of sounds that are within the focus portion while still providing a large extent of perceivable stereo widening.
control input 10 can have one or more different functions.
the parameters of the decomposition process can be defined by the control input.
the control input 10 can for example define the focus range used in the analysis for dividing the signals to focus (i.e. front center) and non-focus (i.e. side) signals.
the focus range can, for example, be defined via ⁇ Th 1 and ⁇ Th 2 or ⁇ Th
the signal decomposition parameter ⁇ Th can, for example, be defined by the control input 10
the control input 10 can for example control relative gains between the peripheral signals 113, 113' that are widened and the frontal signals 111, 111' that are not. For example, it can in some examples control a relative gain ratio of peripheral to frontal.
the parameters of the widening process can for example be defined by the control input 10.
the control input 10 can, for example, control the direct to ambient ratio r used in widening.
the parameters may include for example the directions to which the non-focus sounds are processed (for example with the help of HRTF processing), and/or the amount of ambience (for example reverb) added to sound for increasing the "widening" effect or the perceived externalization. Processing the non-focus sounds to different virtual directions is not necessary; in one embodiment of the invention the non-focus sounds are processed only using reverb, a decorrelator or other methods which increase the externalization of the non-focus sounds.
the control input 10 can for example control explicitly or implicitly whether or not panning occurs. For example, panning may not occur if the focus range is narrow. For example, panning may not occur if the relative gain ratio of peripheral to frontal is small.
the value of the mapping coefficient ⁇ that controls panning extent can, for example, be defined explicitly by the control input 10 or can be controlled via definition of the focus range.
the overpan factor ⁇ can be used for modifying the front center sector (i.e. focus sounds) within which the focus signal is perceived (for example, it can be made sound wider than in the original signal).
the control input 10 can be also another parameter or a set of parameters which modify where the focus sounds are heard in the left - right panning dimension.
the weighting factors for energy-based temporal smoothing ( a and b ) can, for example, be defined by the control input 10.
control input can, for example, be controlled by user input.
the control input 10 can for example comprise parameters for controlling the focus sounds (e.g. for adding ambience to produce better externalization to front sounds).
the control input 10 can for example comprise parameters that define multiple analysis sectors (for the decomposition part) and multiple virtual speaker directions (used in the stereo widening block).
Non-focus sounds may be divided to more sectors than just left and right (outside of the focus range).
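The roles of the control input 10 listed above can be pictured as a small parameter bundle. The sketch below is one possible grouping; every field name, default and validity range is a hypothetical choice made for illustration and is not defined by this disclosure.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ControlInput:
    """Hypothetical container for parameters carried by the control input 10."""
    focus_half_width_deg: float        # stand-in for the focus range
    direct_to_ambient_ratio: float     # r used in the widening filters
    peripheral_to_frontal_gain: float  # relative gain of widened vs. frontal
    smoothing_a: float                 # temporal smoothing weight a
    smoothing_b: float                 # temporal smoothing weight b
    virtual_speaker_angles_deg: Tuple[float, ...] = (-90.0, 90.0)

    def __post_init__(self):
        # Ranges below are assumptions of this sketch.
        if self.focus_half_width_deg < 0.0:
            raise ValueError("focus range must be non-negative")
        if not 0.0 <= self.direct_to_ambient_ratio <= 1.0:
            raise ValueError("r is assumed to lie in [0, 1]")
        if self.peripheral_to_frontal_gain < 0.0:
            raise ValueError("gain ratio must be non-negative")


user_settings = ControlInput(focus_half_width_deg=30.0,
                             direct_to_ambient_ratio=0.8,
                             peripheral_to_frontal_gain=1.0,
                             smoothing_a=0.1,
                             smoothing_b=0.9)
```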
Components of the audio processing system 100, 100' may be arranged to operate, for example, in accordance with a method 200 illustrated by a flowchart depicted in Figure 6 .
the method 200 serves as a method for processing an input audio signal comprising a multi-channel audio signal that represents a spatial audio image.
the method 200 comprises:
the method 200 further comprises, at block 204, cross-channel mixing of at least some of the multiple input channels of the second signal component 105-2 to produce a modified second signal component 113 while enabling the first signal component to bypass cross-channel mixing
the method 200 further comprises, at block 206, combining the first signal component 105-1 and the modified second signal component 113 into an output audio signal 115 comprising two output channels configured for rendering by headphone apparatus.
the method 200 may be varied in a number of ways, for example in view of the examples concerning operation of the audio processing system 100 and/or the audio processing system 100' described in the foregoing.
the cross-channel mixing enables a width of the spatial audio image to be extended from that of the second signal component 105-2
Figure 7 illustrates a block diagram of some components of an exemplifying apparatus 300.
the apparatus 300 may comprise further components, elements or portions that are not depicted in Figure 7 .
the apparatus 300 may be employed e.g. in implementing one or more components described in the foregoing in context of the audio processing system 100, 100'.
the apparatus 300 may implement, for example, the device 50 or one or more components thereof.
the apparatus 300 comprises a processor 316 and a memory 315 for storing data and computer program code 317.
the memory 315 and a portion of the computer program code 317 stored therein may be further arranged, with the processor 316, to implement at least some of the operations, procedures and/or functions described in the foregoing in context of the audio processing system 100, 100'.
the apparatus 300 comprises a communication portion 312 for communication with other devices.
the communication portion 312 comprises at least one communication apparatus that enables wired or wireless communication with other apparatuses.
a communication apparatus of the communication portion 312 may also be referred to as a respective communication means.
the apparatus 300 may further comprise user I/O (input/output) components 318 that may be arranged, possibly together with the processor 316 and a portion of the computer program code 317, to provide a user interface for receiving input from a user of the apparatus 300 and/or providing output to the user of the apparatus 300 to control at least some aspects of operation of the audio processing system 100, 100' implemented by the apparatus 300.
the user I/O components 318 may comprise hardware components such as a display, a touchscreen, a touchpad, a mouse, a keyboard, and/or an arrangement of one or more keys or buttons, etc.
the user I/O components 318 may be also referred to as peripherals.
the processor 316 may be arranged to control operation of the apparatus 300 e.g. in accordance with a portion of the computer program code 317 and possibly further in accordance with the user input received via the user I/O components 318 and/or in accordance with information received via the communication portion 312.
although the processor 316 is depicted as a single component, it may be implemented as one or more separate processing components.
although the memory 315 is depicted as a single component, it may be implemented as one or more separate components, some or all of which may be integrated/removable and/or may provide permanent/semi-permanent/dynamic/cached storage.
the computer program code 317 stored in the memory 315 may comprise computer-executable instructions that control one or more aspects of operation of the apparatus 300 when loaded into the processor 316.
the computer-executable instructions may be provided as one or more sequences of one or more instructions.
the processor 316 is able to load and execute the computer program code 317 by reading the one or more sequences of one or more instructions included therein from the memory 315.
the one or more sequences of one or more instructions may be configured to, when executed by the processor 316, cause the apparatus 300 to carry out at least some of the operations, procedures and/or functions described in the foregoing in context of the audio processing system 100, 100'.
the apparatus 300 may comprise at least one processor 316 and at least one memory 315 including the computer program code 317 for one or more programs, the at least one memory 315 and the computer program code 317 configured to, with the at least one processor 316, cause the apparatus 300 to perform at least some of the operations, procedures and/or functions described in the foregoing in context of the audio processing system 100, 100'.
the computer program(s) stored in the memory 315 may be provided e.g. as a respective computer program product comprising at least one computer-readable non-transitory medium having the computer program code 317 stored thereon, the computer program code, when executed by the apparatus 300, causes the apparatus 300 at least to perform at least some of the operations, procedures and/or functions described in the foregoing in context of the audio processing system 100, 100'.
the computer-readable non-transitory medium may comprise a memory device or a record medium such as a CD-ROM, a DVD, a Blu-ray disc or another article of manufacture that tangibly embodies the computer program.
the computer program may be provided as a signal configured to reliably transfer the computer program.
reference(s) to a processor should not be understood to encompass only programmable processors, but also dedicated circuits such as field-programmable gate arrays (FPGA), application specific circuits (ASIC), signal processors, etc.
the input audio signal 101 may comprise a same sound source that is repeated at different positions. When the sound source is positioned at a first position that is relatively front and central to a user of the headphone apparatus 20, it is rendered at the headphone apparatus 20 without interaural time differences and without frequency-dependent interaural level differences.
when the sound source of the input audio signal 101 is repeated at a second position that is relatively peripheral and is not front and central to the user of the headphone apparatus 20, it is rendered at the headphone apparatus 20 with interaural time differences and with frequency-dependent interaural level differences.
the stereo-widening (for headphones) processor 112, 112' spatially processes the input audio signal 101 to add at peripheral positions, but not at central positions, of the spatial audio image positionally-dependent interaural time differences measurable between coherent audio events in both of the channels of the output audio signal and frequency-dependent and positionally-dependent interaural level differences measurable between coherent audio events in both of the channels of the output audio signal.
the bypass is initiated by the signal decomposer 104 and provided via a bypass route comprising the re-panner 106, thus enabling the first signal component 105-1 to bypass the stereo-widening (for headphones) processor 112, 112'.
the bypass enables components of the input audio signal 101 that represent a sound source that is coherent between two stereo channels and is positioned to front and center, to bypass cross-channel mixing at the stereo-widening (for headphones) processor 112, 112'.
the first focus portion is front and central relative to a user of the headphone apparatus, and the second portion is peripheral relative to the user of the headphone apparatus. In at least some of the above examples, the first focus portion does not overlap the second portion. In at least some of the above examples, the first focus portion and the second non-focus portions are contiguous.
there is a central first focus portion and two second non-focus portions, to the left and right, split by the first focus portion
reference to a portion may, for example, reference a single portion or multiple portions.
different spatial audio processing can be applied to each of the second portions.
different control inputs may be used for different second portions.
the same control inputs may be used for different second portions that are disposed symmetrically either side of a central direction.
different cross-channel mixing may be used for different second portions to achieve different widening effects.
the same cross-channel mixing may be used for different second portions that are disposed symmetrically either side of a central direction.
different direct to ambient ratios r may be used for different second portions to achieve different effects.
the same direct to ambient ratio r may be used for different second portions that are disposed symmetrically either side of a central direction.
the first portion comprises multiple portions
different processing e.g. re-panning can be applied to each of the second portions.
the first (focus) portion is fixed in the audio image when the headphone apparatus move and the audio image is oriented with respect to the headphone apparatus.
the audio image is oriented with respect to the 'world' and is processed to be rotated when the headphones rotate.
the first (focus) portion can be fixed in the audio image when the headphone apparatus move or alternatively can rotate with the headphone apparatus.
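The two options for the focus portion under head rotation can be sketched as follows. The sketch assumes azimuth angles in degrees, a focus range symmetric about its center, and a head yaw reported by some orientation tracker; the sign convention, the function names and the numeric values are all illustrative assumptions.

```python
def wrap_degrees(angle):
    """Wrap an angle to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def is_in_focus(source_azimuth, head_yaw, focus_half_width, focus_follows_head):
    """Decide whether a source in the audio image falls in the focus portion.

    source_azimuth: direction of the source in the audio image.
    head_yaw: current rotation of the headphone apparatus.
    focus_follows_head: True if the focus portion rotates with the
    headphones, False if it stays fixed in the audio image.
    """
    if focus_follows_head:
        offset = wrap_degrees(source_azimuth - head_yaw)
    else:
        offset = wrap_degrees(source_azimuth)
    return abs(offset) <= focus_half_width


# A source at 30 degrees while the listener has turned 30 degrees towards it.
follows = is_in_focus(30.0, 30.0, focus_half_width=20.0, focus_follows_head=True)
fixed = is_in_focus(30.0, 30.0, focus_half_width=20.0, focus_follows_head=False)
```

In this toy case the source counts as a focus sound only when the focus portion follows the headphones, because the listener is then facing it.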
the headphone apparatus 20 can comprise circuitry for tracking its orientation.
the apparatus 100, 100' is separate from the headphone apparatus 20, for example as illustrated in Figure 3 . In other examples, the apparatus 100, 100' is part of the headphone apparatus 20.
audio is divided into two paths, central and side sound.
timbre is important, so the processing is designed in order to keep that good.
HRTF processing is avoided.
the central sounds can be widened by, for example, "re-panning", which does not degrade timbre and provides some widening, even though it cannot create sources outside the headphones.
for side sounds, a very wide perceived image is the most important consideration.
HRTFs are used to achieve that effect (and to provide sound sources outside the headphones). This degrades the timbre, but that is accepted as a trade-off for maximal width. While timbre is preserved for central sounds, it is still desirable to make them wide. Side sounds are made very wide.
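The split into a central path and a side path can be sketched roughly as follows. This is only an illustration under stated assumptions: it takes a per-component direction estimate in degrees as given, uses a hypothetical threshold `central_limit_deg` to choose the path, and models "re-panning" as constant-power amplitude panning with a hypothetical widening factor `widen`. None of these names or values come from the description, and the HRTF rendering of the side path is deliberately left out.

```python
import math


def choose_path(angle_deg: float, central_limit_deg: float = 15.0) -> str:
    """Route a component to the 'central' or 'side' path by its direction.

    The threshold is an assumed illustrative value.
    """
    return "central" if abs(angle_deg) <= central_limit_deg else "side"


def repan_gains(angle_deg: float,
                widen: float = 1.5,
                max_angle_deg: float = 30.0) -> tuple[float, float]:
    """Left/right gains for a central component after re-panning.

    The direction is scaled by `widen` and clipped to +/- max_angle_deg,
    so the result can never lie outside the headphones. Only gains are
    applied (no filtering), which is why timbre is left untouched.
    Positive angles are to the left (assumed convention).
    """
    widened = max(-max_angle_deg, min(max_angle_deg, widen * angle_deg))
    # Map [-max, +max] onto [0, pi/2] for a constant-power pan law.
    p = (widened + max_angle_deg) / (2.0 * max_angle_deg) * (math.pi / 2.0)
    return math.sin(p), math.cos(p)
```

A component at 10 degrees is routed to the central path and pushed further left by the gains alone, whereas a component at 25 degrees would be handed to the side path, where HRTF-based rendering trades some timbre for maximal width.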

Landscapes

Engineering & Computer Science (AREA)
Physics & Mathematics (AREA)
Acoustics & Sound (AREA)
Signal Processing (AREA)
Multimedia (AREA)
Computational Linguistics (AREA)
Health & Medical Sciences (AREA)
Audiology, Speech & Language Pathology (AREA)
Human Computer Interaction (AREA)
Mathematical Physics (AREA)
Stereophonic System (AREA)

EP20176223.4A 2019-05-29 2020-05-25 Audio processing Pending EP3745744A3 (fr)

Applications Claiming Priority (1)

Application Number	Priority Date	Filing Date	Title
GB1907601.7A GB2584630A (en)	2019-05-29	2019-05-29	Audio processing

Publications (2)

Publication Number	Publication Date
EP3745744A2 EP3745744A2 (fr)	2020-12-02
EP3745744A3 EP3745744A3 (fr)	2021-03-31

Family

ID=67385512

Family Applications (1)

Application Number	Priority Date	Filing Date	Title
EP20176223.4A Pending EP3745744A3 (fr)	2019-05-29	2020-05-25	Audio processing

Country Status (3)

Country	Link
EP (1)	EP3745744A3 (fr)
CN (2)	CN112019993B (fr)
GB (1)	GB2584630A (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number	Priority date	Publication date	Assignee	Title
WO2021058858A1 (fr)	2019-09-24	2021-04-01	Nokia Technologies Oy	Audio processing
EP4340396A1 (fr) *	2022-09-14	2024-03-20	Nokia Technologies Oy	Apparatus, methods and computer programs for spatial processing of audio scenes

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number	Priority date	Publication date	Assignee	Title
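CN115376530A (zh) *	2021-05-17	2022-11-22	华为技术有限公司	Three-dimensional audio signal encoding method, apparatus and encoder
CN113473352B (zh) *	2021-07-06	2023-06-20	北京达佳互联信息技术有限公司	Method and apparatus for post-processing of two-channel audio
CN118764815B (zh) *	2024-08-02	2025-08-12	湖南芒果融创科技有限公司	Audio rendering method and device for a virtual reality space
CN120751331B (zh) *	2025-08-29	2025-12-26	浙江零跑科技股份有限公司	Audio processing method, apparatus and audio output system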

Family Cites Families (20)

* Cited by examiner, † Cited by third party
Publication number	Priority date	Publication date	Assignee	Title
US6091894A (en) *	1995-12-15	2000-07-18	Kabushiki Kaisha Kawai Gakki Seisakusho	Virtual sound source positioning apparatus
FI113147B (fi) *	2000-09-29	2004-02-27	Nokia Corp	Method and signal processing device for converting stereo signals for headphone listening
FI118370B (fi) *	2002-11-22	2007-10-15	Nokia Corp	Equalization of the output of a stereo widening network
US7720230B2 (en) *	2004-10-20	2010-05-18	Agere Systems, Inc.	Individual channel shaping for BCC schemes and the like
US7991176B2 (en) *	2004-11-29	2011-08-02	Nokia Corporation	Stereo widening network for two loudspeakers
CN101243488B (zh) *	2005-06-30	2012-05-30	Lg电子株式会社	Apparatus for encoding and decoding an audio signal and method thereof
US8374365B2 (en) *	2006-05-17	2013-02-12	Creative Technology Ltd	Spatial audio analysis and synthesis for binaural reproduction and format conversion
US8619998B2 (en) *	2006-08-07	2013-12-31	Creative Technology Ltd	Spatial audio enhancement processing method and apparatus
JP5007563B2 (ja) *	2006-12-28	2012-08-22	ソニー株式会社	Music editing apparatus and method, and program
GB2467247B (en) *	2007-10-04	2012-02-29	Creative Tech Ltd	Phase-amplitude 3-D stereo encoder and decoder
CN101184349A (zh) *	2007-10-10	2008-05-21	昊迪移通(北京)技术有限公司	Three-dimensional surround sound effect technology for two-channel headphone devices
US8144902B2 (en) *	2007-11-27	2012-03-27	Microsoft Corporation	Stereo image widening
EP2154911A1 (fr) *	2008-08-13	2010-02-17	Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.	Apparatus for determining a spatial output multi-channel audio signal
FR2996094B1 (fr) *	2012-09-27	2014-10-17	Sonic Emotion Labs	Method and system for playing back an audio signal
WO2014164361A1 (fr) *	2013-03-13	2014-10-09	Dts Llc	System and methods for processing stereo audio content
EP2830334A1 (fr) *	2013-07-22	2015-01-28	Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.	Multi-channel audio decoder, multi-channel audio encoder, methods, computer programs and encoded audio representation using a decorrelation of rendered audio signals
CN104200827B (zh) *	2014-09-05	2017-04-19	赵平	Method and apparatus for obtaining Internet audio files
EP3048809B1 (fr) *	2015-01-21	2019-04-17	Nxp B.V.	System and method for stereo widening
GB2549810B (en) *	2016-04-29	2020-08-19	Cirrus Logic Int Semiconductor Ltd	Audio signal processing
KR102580502B1 (ko) *	2016-11-29	2023-09-21	삼성전자주식회사	Electronic apparatus and control method thereof

2019
- 2019-05-29: GB application GB1907601.7A (GB2584630A), withdrawn
2020
- 2020-05-25: EP application EP20176223.4A (EP3745744A3), pending
- 2020-05-29: CN application CN202010473489.XA (CN112019993B), active
- 2020-05-29: CN application CN202210643129.9A (CN115190414B), active

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number	Priority date	Publication date	Assignee	Title
WO2021058858A1 (fr)	2019-09-24	2021-04-01	Nokia Technologies Oy	Audio processing
EP4035425A4 (fr) *	2019-09-24	2023-10-11	Nokia Technologies Oy	Audio processing
EP4340396A1 (fr) *	2022-09-14	2024-03-20	Nokia Technologies Oy	Apparatus, methods and computer programs for spatial processing of audio scenes

Also Published As

Publication number	Publication date
EP3745744A3 (fr)	2021-03-31
CN115190414B (zh)	2025-08-08
CN112019993A (zh)	2020-12-01
CN115190414A (zh)	2022-10-14
CN112019993B (zh)	2022-06-17
GB201907601D0 (en)	2019-07-10
GB2584630A (en)	2020-12-16

Legal Events

Date	Code	Title	Description
2020-10-30	PUAI	Public reference made under article 153(3) epc to a published international application that has entered the european phase	Free format text: ORIGINAL CODE: 0009012
2020-10-30	STAA	Information on the status of an ep patent application or granted ep patent	Free format text: STATUS: THE APPLICATION HAS BEEN PUBLISHED
2020-12-02	AK	Designated contracting states	Kind code of ref document: A2 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR
2020-12-02	AX	Request for extension of the european patent	Extension state: BA ME
2021-02-26	PUAL	Search report despatched	Free format text: ORIGINAL CODE: 0009013
2021-03-31	AK	Designated contracting states	Kind code of ref document: A3 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR
2021-03-31	AX	Request for extension of the european patent	Extension state: BA ME
2021-03-31	RIC1	Information provided on ipc code assigned before grant	Ipc: H04S 3/00 20060101AFI20210222BHEP
2021-10-01	STAA	Information on the status of an ep patent application or granted ep patent	Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE
2021-11-03	17P	Request for examination filed	Effective date: 20210929
2021-11-03	RBV	Designated contracting states (corrected)	Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR
2023-02-17	STAA	Information on the status of an ep patent application or granted ep patent	Free format text: STATUS: EXAMINATION IS IN PROGRESS
2023-03-22	17Q	First examination report despatched	Effective date: 20230216

Publication	Publication Date	Title
EP3745744A2 (fr)	2020-12-02	Audio processing
JP5285626B2 (ja)	2013-09-11	Audio spatialization and environment simulation
EP2258120B1 (fr)	2019-08-07	Methods and devices for providing surround signals
US8180062B2 (en)	2012-05-15	Spatial sound zooming
US8045719B2 (en)	2011-10-25	Rendering center channel audio
KR101567461B1 (ko)	2015-11-09	Multi-channel sound signal generation apparatus
US20250168583A1 (en)	2025-05-22	Audio processing
EP3881566B1 (fr)	2025-09-10	Audio processing
JP2014506416A (ja)	2014-03-13	Audio spatialization and environment simulation
US11212631B2 (en)	2021-12-28	Method for generating binaural signals from stereo signals using upmixing binauralization, and apparatus therefor
JP6660982B2 (ja)	2020-03-11	Audio signal rendering method and apparatus
EP4252432A1 (fr)	2023-10-04	Systems and methods for audio upmixing
EP4264963B1 (fr)	2026-01-28	Binaural signal post-processing
US11457329B2 (en)	2022-09-27	Immersive audio rendering
US20250350898A1 (en)	2025-11-13	Object-based Audio Spatializer With Crosstalk Equalization
HK1181948A1 (zh)	2013-11-15	Stereo image widening system