CN114333865B - Model training and timbre conversion method, apparatus, device, and medium - Google Patents
- Publication number
- CN114333865B (application CN202111577618.0A)
- Authority
- CN
- China
- Prior art keywords
- audio data
- timbre
- semantic
- sample audio
- sample
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Concepts
| Concept | Type | Sections | Count |
|---|---|---|---|
| chemical reaction | Methods | title, claims, abstract, description | 388 |
| method | Methods | title, claims, abstract, description | 114 |
| training | Methods | title, claims, abstract, description | 75 |
| extraction | Methods | claims, abstract, description | 80 |
| vectors | Substances | claims, description | 53 |
| spectrum | Methods | claims, description | 49 |
| quantization | Methods | claims, description | 21 |
| computer program | Methods | claims, description | 18 |
| processing | Methods | claims, description | 17 |
| biosynthetic process | Effects | claims, description | 15 |
| synthesis reaction | Methods | claims, description | 15 |
| storage | Methods | claims, description | 13 |
| synthesizing effect | Effects | claims, description | 2 |
| decolorization | Methods | claims | 2 |
| effects | Effects | abstract, description | 16 |
| process | Effects | description | 44 |
| diagram | Methods | description | 22 |
| communication | Methods | description | 12 |
| engineering process | Methods | description | 11 |
| Homo sapiens Polyphosphoinositide phosphatase | Proteins | description | 7 |
| Polyphosphoinositide phosphatase | Human genes | description | 7 |
| function | Effects | description | 7 |
| enhancer | Substances | description | 4 |
| Saccharomyces cerevisiae (strain ATCC 204508 / S288c) FIG2 gene | Proteins | description | 3 |
| calculation algorithm | Methods | description | 3 |
| modification | Methods | description | 3 |
| modification | Effects | description | 3 |
| testing method | Methods | description | 3 |
| Homo sapiens L-amino-acid oxidase | Proteins | description | 2 |
| L-amino-acid oxidase | Human genes | description | 2 |
| Saccharomyces cerevisiae (strain ATCC 204508 / S288c) KAR5 gene | Proteins | description | 2 |
| labelling | Methods | description | 2 |
| normalization | Methods | description | 2 |
| reduction | Effects | description | 2 |
| transformation | Effects | description | 2 |
| artificial neural network | Methods | description | 1 |
| beneficial effect | Effects | description | 1 |
| biological transmission | Effects | description | 1 |
| calculation method | Methods | description | 1 |
| distribution | Methods | description | 1 |
| manufacturing process | Methods | description | 1 |
| mathematical function | Methods | description | 1 |
| matrix material | Substances | description | 1 |
| neural network model | Methods | description | 1 |
| optical effect | Effects | description | 1 |
| peripheral effect | Effects | description | 1 |
| recurrent effect | Effects | description | 1 |
| stabilizing effect | Effects | description | 1 |
| transposition | Effects | description | 1 |
Classifications
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
      - G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
        - G10L21/003—Changing voice quality, e.g. pitch or formants
          - G10L21/007—Changing voice quality, e.g. pitch or formants characterised by the process used
      - G10L13/00—Speech synthesis; Text to speech systems
        - G10L13/02—Methods for producing synthetic speech; Speech synthesisers
      - G10L15/00—Speech recognition
        - G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Multimedia (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Quality & Reliability (AREA)
- Acoustics & Sound (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Electrophonic Musical Instruments (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Electrically Operated Instructional Devices (AREA)
- Reverberation, Karaoke And Other Acoustics (AREA)
Priority Applications (5)
| Application Number | Publication Number | Priority Date | Filing Date | Title |
|---|---|---|---|---|
| CN202111577618.0A | CN114333865B | 2021-12-22 | 2021-12-22 | Model training and timbre conversion method, apparatus, device, and medium |
| US18/719,391 | US20250061908A1 | 2021-12-22 | 2022-12-20 | Method for model training and tone conversion, device, and medium |
| EP22909972.6A | EP4425482B1 | 2021-12-22 | 2022-12-20 | Method and apparatus for model training and tone conversion, device, and medium |
| PCT/CN2022/140253 | WO2023116660A2 | 2021-12-22 | 2022-12-20 | Method and apparatus for model training and tone conversion, device, and medium |
| JP2024538190A | JP2024547129A | 2021-12-22 | 2022-12-20 | Model training and timbre conversion method, apparatus, device, and medium |
Applications Claiming Priority (1)
| Application Number | Publication Number | Priority Date | Filing Date | Title |
|---|---|---|---|---|
| CN202111577618.0A | CN114333865B | 2021-12-22 | 2021-12-22 | Model training and timbre conversion method, apparatus, device, and medium |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN114333865A | 2022-04-12 |
| CN114333865B | 2024-07-19 |
Family
ID=81054746
Family Applications (1)
| Application Number | Status | Publication Number | Title | Priority Date | Filing Date |
|---|---|---|---|---|---|
| CN202111577618.0A | Active | CN114333865B | Model training and timbre conversion method, apparatus, device, and medium | 2021-12-22 | 2021-12-22 |
Country Status (5)
| Country | Link |
|---|---|
| US (1) | US20250061908A1 |
| EP (1) | EP4425482B1 |
| JP (1) | JP2024547129A |
| CN (1) | CN114333865B |
| WO (1) | WO2023116660A2 |
Families Citing this family (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
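| CN114333865B * | 2021-12-22 | 2024-07-19 | 广州市百果园网络科技有限公司 | Model training and timbre conversion method, apparatus, device, and medium |
| CN117546238A * | 2022-06-07 | 2024-02-09 | 北京小米移动软件有限公司 | Method, apparatus, and storage medium for generating audio |
| CN117672240A * | 2022-08-24 | 2024-03-08 | 北京小米移动软件有限公司 | Voice conversion method and apparatus, electronic device, and storage medium |
| CN117672239A * | 2022-08-31 | 2024-03-08 | 抖音视界有限公司 | Audio processing method, apparatus, and terminal device |
| CN116704999A * | 2022-09-15 | 2023-09-05 | 荣耀终端有限公司 | Audio data processing method, apparatus, storage medium, and electronic device |
| CN115662386A * | 2022-10-18 | 2023-01-31 | 出门问问创新科技有限公司 | Voice conversion method, apparatus, electronic device, and storage medium |
| CN115905858B * | 2022-10-24 | 2026-01-27 | 招联消费金融股份有限公司 | Account classification model training method, apparatus, computer device, and storage medium |
| CN116013248A * | 2022-12-30 | 2023-04-25 | 广州趣丸网络科技有限公司 | Rap audio generation method, apparatus, device, and readable storage medium |
| CN116665638B * | 2023-06-07 | 2026-03-24 | 平安科技(深圳)有限公司 | Speech synthesis method, speech synthesis apparatus, electronic device, and storage medium |
| CN117219055A * | 2023-10-27 | 2023-12-12 | 之江实验室 | Speech generation method, apparatus, medium, and device based on timbre separation |
| CN118298836B * | 2024-05-29 | 2024-08-23 | 摩尔线程智能科技(北京)有限责任公司 | Timbre conversion method, apparatus, electronic device, storage medium, and program product |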
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
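| CN109448752A * | 2018-11-28 | 2019-03-08 | 广州市百果园信息技术有限公司 | Audio data processing method, apparatus, device, and storage medium |
| CN111785261A * | 2020-05-18 | 2020-10-16 | 南京邮电大学 | Cross-lingual voice conversion method and system based on disentangled and interpretable representations |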
Family Cites Families (15)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP5647159B2 * | 2012-02-28 | 2014-12-24 | 日本電信電話株式会社 | Prior distribution calculation device, speech recognition device, prior distribution calculation method, speech recognition method, and program |
| US8744854B1 * | 2012-09-24 | 2014-06-03 | Chengjun Julian Chen | System and method for voice transformation |
| JP6347536B2 * | 2014-02-27 | 2018-06-27 | 学校法人 名城大学 | Sound synthesis method and sound synthesis device |
| US11264044B2 * | 2016-02-02 | 2022-03-01 | Nippon Telegraph And Telephone Corporation | Acoustic model training method, speech recognition method, acoustic model training apparatus, speech recognition apparatus, acoustic model training program, and speech recognition program |
| CN107785015A * | 2016-08-26 | 2018-03-09 | 阿里巴巴集团控股有限公司 | Speech recognition method and apparatus |
| JP6773634B2 * | 2017-12-15 | 2020-10-21 | 日本電信電話株式会社 | Voice conversion device, voice conversion method, and program |
| JP7108147B2 * | 2019-05-23 | 2022-07-27 | グーグル エルエルシー | Variational embedding capacity in expressive end-to-end speech synthesis |
| US11430424B2 * | 2019-11-13 | 2022-08-30 | Meta Platforms Technologies, Llc | Generating a voice model for a user |
| CN113053356B * | 2019-12-27 | 2024-05-31 | 科大讯飞股份有限公司 | Speech waveform generation method, apparatus, server, and storage medium |
| CN112037754B * | 2020-09-09 | 2024-02-09 | 广州方硅信息技术有限公司 | Method for generating speech synthesis training data and related device |
| CN112382271B * | 2020-11-30 | 2024-03-26 | 北京百度网讯科技有限公司 | Speech processing method, apparatus, electronic device, and storage medium |
| CN112652318B * | 2020-12-21 | 2024-03-29 | 北京捷通华声科技股份有限公司 | Timbre conversion method, apparatus, and electronic device |
| CN113689868B * | 2021-08-18 | 2022-09-13 | 北京百度网讯科技有限公司 | Training method for a voice conversion model, apparatus, electronic device, and medium |
| CN113470622B * | 2021-09-06 | 2021-11-19 | 成都启英泰伦科技有限公司 | Conversion method and apparatus capable of converting any voice into multiple voices |
| CN114333865B * | 2021-12-22 | 2024-07-19 | 广州市百果园网络科技有限公司 | Model training and timbre conversion method, apparatus, device, and medium |
2021
- 2021-12-22: CN application CN202111577618.0A, publication CN114333865B, status Active

2022
- 2022-12-20: EP application EP22909972.6A, publication EP4425482B1, status Active
- 2022-12-20: JP application JP2024538190A, publication JP2024547129A, status Pending
- 2022-12-20: US application US18/719,391, publication US20250061908A1, status Pending
- 2022-12-20: WO application PCT/CN2022/140253, publication WO2023116660A2, status Ceased
Also Published As
| Publication number | Publication date |
|---|---|
| CN114333865A | 2022-04-12 |
| EP4425482B1 | 2025-09-03 |
| EP4425482A4 | 2025-01-22 |
| EP4425482A2 | 2024-09-04 |
| WO2023116660A3 | 2023-08-17 |
| US20250061908A1 | 2025-02-20 |
| WO2023116660A2 | 2023-06-29 |
| JP2024547129A | 2024-12-26 |
Similar Documents
| Publication | Title |
|---|---|
| CN114333865B | Model training and timbre conversion method, apparatus, device, and medium |
| CN110415687B | Speech processing method, apparatus, medium, and electronic device |
| CN111048064B | Voice cloning method and apparatus based on a single-speaker speech synthesis dataset |
| WO2024055752A9 | Speech synthesis model training method, speech synthesis method, and related apparatuses |
| CN112634866B | Speech synthesis model training and speech synthesis method, apparatus, device, and medium |
| WO2023030235A1 | Method and system for generating target audio, readable storage medium, and electronic apparatus |
| CN112992109B | Singing assistance system, singing assistance method, and non-transitory computer-readable recording medium thereof |
| CN109376363A | Earphone-based real-time speech translation method and apparatus |
| JP7664330B2 | Text echo cancellation |
| CN113345414A | Film restoration method, apparatus, device, and medium based on speech synthesis |
| KR20240122776A | Adaptation and training in neural speech synthesis |
| CN116913268A | Speech recognition method, apparatus, electronic device, and storage medium |
| CN112562655A | Residual network training and speech synthesis method, apparatus, device, and medium |
| CN115700871A | Model training and speech synthesis method, apparatus, device, and medium |
| CN116978381A | Audio data processing method, apparatus, computer device, and storage medium |
| CN107886940B | Speech translation processing method and apparatus |
| WO2023102932A1 | Audio conversion method, electronic device, program product, and recording medium |
| CN112837688B | Speech transcription method, apparatus, related system, and device |
| CN114141259B | Voice conversion method, apparatus, device, storage medium, and program product |
| CN116072147A | Music detection model training method, apparatus, electronic device, and storage medium |
| WO2025113018A1 | Speech extraction method and apparatus, electronic device, computer-readable storage medium, and computer program product |
| CN117542378A | Speech emotion recognition method, apparatus, electronic device, and storage medium |
| RU2830834C2 | Methods for model training and voice conversion, and apparatus, further device, and data storage medium |
| CN112365884A | Whisper recognition method and apparatus, storage medium, and electronic apparatus |
| CN119763617B | Valid speech detection method and apparatus |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |