WO2026072097A1 - Speech enhancement techniques - Google Patents

Speech enhancement techniques

Info

Publication number
WO2026072097A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio
speech
signal
playback device
channel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/US2025/024937
Other languages
English (en)
Inventor
Matthew BENATAN
Christopher Pike
Adib Mehrabi
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sonos Inc
Original Assignee
Sonos Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2024-09-27
Filing date
2025-04-16
Publication date
2026-04-02
Application filed by Sonos Inc

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272: Voice signal separating
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0316: Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
    • G10L21/0364: Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude for improving intelligibility
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04R: LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; ELECTRIC HEARING AIDS; PUBLIC ADDRESS SYSTEMS
    • H04R5/00: Stereophonic arrangements
    • H04R5/04: Circuit arrangements, e.g. for selective connection of amplifier inputs/outputs to loudspeakers, for loudspeaker detection, or for adaptation of settings to personal preferences or hearing impairments
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04S: STEREOPHONIC SYSTEMS
    • H04S2400/00: Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/05: Generation or adaptation of centre channel in multi-channel audio systems
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04S: STEREOPHONIC SYSTEMS
    • H04S3/00: Systems employing more than two channels, e.g. quadraphonic
    • H04S3/008: Systems employing more than two channels, e.g. quadraphonic in which the audio signals are in digital form, i.e. employing more than two discrete digital channels

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

An example method includes: detecting, using a playback device, an audio signal; applying a parametric machine learning model to dynamically detect speech in the audio signal; based on detecting the speech, separating the audio signal into speech audio and non-speech audio; applying first audio processing to the speech audio to produce processed speech audio; applying second audio processing to the non-speech audio to produce processed non-speech audio, the second audio processing being different from the first audio processing; combining the processed speech audio and the processed non-speech audio to produce an audio output signal; and playing back the audio output signal via the playback device.
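
The sequence of steps in the abstract (detect speech, separate, process each part differently, recombine) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the application: the function names, the logistic model on frame energy that stands in for the parametric machine learning model, the scalar soft mask that stands in for speech separation, and the two gain values. A practical system would more likely operate on time-frequency representations with a trained network.

```python
import math
from typing import List, Tuple

Frame = List[float]  # one block of mono samples in the range [-1.0, 1.0]


def speech_probability(frame: Frame, weight: float = 40.0, bias: float = -2.0) -> float:
    """Hypothetical stand-in for the parametric model: logistic regression on RMS energy."""
    if not frame:
        return 0.0
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return 1.0 / (1.0 + math.exp(-(weight * rms + bias)))


def separate(frame: Frame, p_speech: float) -> Tuple[Frame, Frame]:
    """Toy soft-mask split; the two parts always sum back to the input frame."""
    speech = [p_speech * s for s in frame]
    non_speech = [(1.0 - p_speech) * s for s in frame]
    return speech, non_speech


def enhance_frame(
    frame: Frame,
    speech_gain: float = 1.5,      # assumed "first audio processing": boost speech
    non_speech_gain: float = 0.5,  # assumed "second audio processing": attenuate the rest
    threshold: float = 0.5,
) -> Frame:
    """Detect speech, split the frame, process each part differently, and recombine."""
    p = speech_probability(frame)
    if p < threshold:
        # No speech detected: leave the frame untouched.
        return list(frame)
    speech, non_speech = separate(frame, p)
    processed_speech = [speech_gain * s for s in speech]
    processed_non_speech = [non_speech_gain * s for s in non_speech]
    # Combine the two processed parts into the output frame.
    return [a + b for a, b in zip(processed_speech, processed_non_speech)]


if __name__ == "__main__":
    quiet = [0.01, -0.01, 0.01, -0.01]
    loud = [0.2, -0.2, 0.2, -0.2]
    print(enhance_frame(quiet))
    print(enhance_frame(loud))
```

In this sketch a low-energy frame falls below the detection threshold and passes through unchanged, while a higher-energy frame is split and its speech part is boosted relative to the rest. Handing the resulting frame to the playback device is outside the scope of the example.
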
PCT/US2025/024937 2024-09-27 2025-04-16 Speech enhancement techniques Pending WO2026072097A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202463700280P 2024-09-27 2024-09-27
US63/700,280 2024-09-27

Publications (1)

Publication Number Publication Date
WO2026072097A1 (fr) 2026-04-02

Family

ID=95783837

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2025/024937 Pending WO2026072097A1 (fr) 2024-09-27 2025-04-16 Speech enhancement techniques

Country Status (1)

Country Link
WO (1) WO2026072097A1 (fr)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8234395B2 (en) 2003-07-28 2012-07-31 Sonos, Inc. System and method for synchronizing operations among a plurality of independently clocked digital data processing devices
US8483853B1 (en) 2006-09-12 2013-07-09 Sonos, Inc. Controlling and manipulating groupings in a multi-zone media system
US10499146B2 (en) 2016-02-22 2019-12-03 Sonos, Inc. Voice control of a media playback system
US10712997B2 (en) 2016-10-17 2020-07-14 Sonos, Inc. Room association based on name
US20230087486A1 (en) * 2020-05-29 2023-03-23 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Method and apparatus for processing an initial audio signal
US20240111484A1 (en) * 2022-09-30 2024-04-04 Sonos, Inc. Techniques for Intelligent Home Theater Configuration

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
LEE GEON WOO ET AL: "Multi-Task Learning U-Net for Single-Channel Speech Enhancement and Mask-Based Voice Activity Detection", APPLIED SCIENCES, vol. 10, no. 9, 6 May 2020 (2020-05-06), pages 3230, XP093293003, ISSN: 2076-3417, Retrieved from the Internet <URL:https://www.mdpi.com/2076-3417/10/9/3230/pdf> [retrieved on 20250704], DOI: 10.3390/app10093230 *
LEE YOUNGLO ET AL: "Spectro-Temporal Attention-Based Voice Activity Detection", IEEE SIGNAL PROCESSING LETTERS, IEEE, USA, vol. 27, 13 December 2019 (2019-12-13), pages 131 - 135, XP011768664, ISSN: 1070-9908, [retrieved on 20200123], DOI: 10.1109/LSP.2019.2959917 *
SALEEM NASIR ET AL: "On Learning Spectral Masking for Single Channel Speech Enhancement Using Feedforward and Recurrent Neural Networks", IEEE ACCESS, IEEE, USA, vol. 8, 31 August 2020 (2020-08-31), pages 160581 - 160595, XP011808493, [retrieved on 20200910], DOI: 10.1109/ACCESS.2020.3021061 *

Similar Documents

Publication Publication Date Title
US12360736B2 (en) Audio conflict resolution
US12149897B2 (en) Audio playback settings for voice interaction
US12288558B2 (en) Systems and methods of operating media playback systems having multiple voice assistant services
US11778404B2 (en) Systems and methods for authenticating and calibrating passive speakers with a graphical user interface
US11900014B2 (en) Systems and methods for podcast playback
US10891105B1 (en) Systems and methods for displaying a transitional graphical user interface while loading media information for a networked media playback system
WO2019222667A1 (fr) Linear filtering for noise-suppressed speech detection
US12417071B2 (en) Techniques for intelligent home theater configuration
US20230195783A1 (en) Speech Enhancement Based on Metadata Associated with Audio Content
AU2021382800A1 (en) Playback of generative media content
WO2026072097A1 (fr) Speech enhancement techniques
WO2026085066A1 (fr) Enhanced dialogue rendering
WO2025029609A1 (fr) Personalization techniques for media playback systems