CN114333865B - Model training and timbre conversion method, apparatus, device, and medium - Google Patents

Model training and timbre conversion method, apparatus, device, and medium

Info

Publication number: CN114333865B
Authority: CN; China
Prior art keywords: audio data; timbre; semantic; sample audio; sample
Prior art date: 2021-12-22
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.): Active

Application number

CN202111577618.0A

Other versions

CN114333865A (zh)

Inventor

黄家鸿

李玉乐

项伟

Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)

Guangzhou Baiguoyuan Network Technology Co Ltd

Original Assignee

Guangzhou Baiguoyuan Network Technology Co Ltd

Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)

2021-12-22

Filing date

2021-12-22

Publication date

2024-07-19

2021-12-22 Application filed by Guangzhou Baiguoyuan Network Technology Co Ltd

2021-12-22 Priority to CN202111577618.0A (CN114333865B)

2022-04-12 Publication of CN114333865A

2022-12-20 Priority to US18/719,391 (US20250061908A1)

2022-12-20 Priority to EP22909972.6A (EP4425482B1)

2022-12-20 Priority to PCT/CN2022/140253 (WO2023116660A2)

2022-12-20 Priority to JP2024538190A (JP2024547129A)

2024-07-19 Application granted

2024-07-19 Publication of CN114333865B

Status: Active

2041-12-22 Anticipated expiration

Classifications

- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
      - G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
        - G10L21/003—Changing voice quality, e.g. pitch or formants
          - G10L21/007—Changing voice quality, e.g. pitch or formants characterised by the process used
      - G10L13/00—Speech synthesis; Text to speech systems
        - G10L13/02—Methods for producing synthetic speech; Speech synthesisers
      - G10L15/00—Speech recognition
        - G10L15/02—Feature extraction for speech recognition; Selection of recognition unit

Landscapes

Engineering & Computer Science (AREA)
Physics & Mathematics (AREA)
Multimedia (AREA)
Health & Medical Sciences (AREA)
Audiology, Speech & Language Pathology (AREA)
Human Computer Interaction (AREA)
Quality & Reliability (AREA)
Acoustics & Sound (AREA)
Computational Linguistics (AREA)
Signal Processing (AREA)
Computer Vision & Pattern Recognition (AREA)
Electrophonic Musical Instruments (AREA)
Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Electrically Operated Instructional Devices (AREA)
Reverberation, Karaoke And Other Acoustics (AREA)

CN202111577618.0A, priority date 2021-12-22, filing date 2021-12-22: Model training and timbre conversion method, apparatus, device, and medium. Active, CN114333865B (zh)

Priority Applications (5)

Application Number	Priority Date	Filing Date	Title
CN202111577618.0A CN114333865B (zh)	2021-12-22	2021-12-22	Model training and timbre conversion method, apparatus, device, and medium
US18/719,391 US20250061908A1 (en)	2021-12-22	2022-12-20	Method for model training and tone conversion, device, and medium
EP22909972.6A EP4425482B1 (fr)	2021-12-22	2022-12-20	Method and apparatus for model training and tone conversion, device, and medium
PCT/CN2022/140253 WO2023116660A2 (fr)	2021-12-22	2022-12-20	Method and apparatus for model training and tone conversion, device, and medium
JP2024538190A JP2024547129A (ja)	2021-12-22	2022-12-20	Model training and timbre conversion method, apparatus, device, and medium

Applications Claiming Priority (1)

Application Number	Priority Date	Filing Date	Title
CN202111577618.0A CN114333865B (zh)	2021-12-22	2021-12-22	Model training and timbre conversion method, apparatus, device, and medium

Publications (2)

Publication Number	Publication Date
CN114333865A (zh)	2022-04-12
CN114333865B (zh)	2024-07-19

Family

ID=81054746

Family Applications (1)

Application Number	Priority Date	Filing Date	Title
CN202111577618.0A Active CN114333865B (zh)	2021-12-22	2021-12-22	Model training and timbre conversion method, apparatus, device, and medium

Country Status (5)

Country	Link
US (1)	US20250061908A1 (en)
EP (1)	EP4425482B1 (fr)
JP (1)	JP2024547129A (ja)
CN (1)	CN114333865B (zh)
WO (1)	WO2023116660A2 (fr)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number	Priority date	Publication date	Assignee	Title
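CN114333865B (zh) *	2021-12-22	2024-07-19	广州市百果园网络科技有限公司	Model training and timbre conversion method, apparatus, device, and medium
CN117546238A (zh) *	2022-06-07	2024-02-09	北京小米移动软件有限公司	Method and apparatus for generating audio, and storage medium
CN117672240A (zh) *	2022-08-24	2024-03-08	北京小米移动软件有限公司	Voice conversion method and apparatus, electronic device, and storage medium
CN117672239A (zh) *	2022-08-31	2024-03-08	抖音视界有限公司	Audio processing method and apparatus, and terminal device
CN116704999A (zh) *	2022-09-15	2023-09-05	荣耀终端有限公司	Audio data processing method and apparatus, storage medium, and electronic device
CN115662386A (zh) *	2022-10-18	2023-01-31	出门问问创新科技有限公司	Voice conversion method and apparatus, electronic device, and storage medium
CN115905858B (zh) *	2022-10-24	2026-01-27	招联消费金融股份有限公司	Account classification model training method and apparatus, computer device, and storage medium
CN116013248A (zh) *	2022-12-30	2023-04-25	广州趣丸网络科技有限公司	Rap audio generation method, apparatus, device, and readable storage medium
CN116665638B (zh) *	2023-06-07	2026-03-24	平安科技（深圳）有限公司	Speech synthesis method, speech synthesis apparatus, electronic device, and storage medium
CN117219055A (zh) *	2023-10-27	2023-12-12	之江实验室	Speech generation method, apparatus, medium, and device based on timbre separation
CN118298836B (zh) *	2024-05-29	2024-08-23	摩尔线程智能科技(北京)有限责任公司	Timbre conversion method and apparatus, electronic device, storage medium, and program product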

Citations (2)

* Cited by examiner, † Cited by third party
Publication number	Priority date	Publication date	Assignee	Title
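CN109448752A (zh) *	2018-11-28	2019-03-08	广州市百果园信息技术有限公司	Audio data processing method, apparatus, device, and storage medium
CN111785261A (zh) *	2020-05-18	2020-10-16	南京邮电大学	Cross-lingual voice conversion method and system based on disentangled and interpretable representations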

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number	Priority date	Publication date	Assignee	Title
JP5647159B2 (ja) *	2012-02-28	2014-12-24	日本電信電話株式会社	Prior distribution calculation device, speech recognition device, prior distribution calculation method, speech recognition method, and program
US8744854B1 (en) *	2012-09-24	2014-06-03	Chengjun Julian Chen	System and method for voice transformation
JP6347536B2 (ja) *	2014-02-27	2018-06-27	学校法人名城大学	Sound synthesis method and sound synthesis device
US11264044B2 (en) *	2016-02-02	2022-03-01	Nippon Telegraph And Telephone Corporation	Acoustic model training method, speech recognition method, acoustic model training apparatus, speech recognition apparatus, acoustic model training program, and speech recognition program
CN107785015A (zh) *	2016-08-26	2018-03-09	阿里巴巴集团控股有限公司	Speech recognition method and apparatus
JP6773634B2 (ja) *	2017-12-15	2020-10-21	日本電信電話株式会社	Voice conversion device, voice conversion method, and program
JP7108147B2 (ja) *	2019-05-23	2022-07-27	グーグルエルエルシー	Variational embedding capacity in expressive end-to-end speech synthesis
US11430424B2 (en) *	2019-11-13	2022-08-30	Meta Platforms Technologies, Llc	Generating a voice model for a user
CN113053356B (zh) *	2019-12-27	2024-05-31	科大讯飞股份有限公司	Speech waveform generation method, apparatus, server, and storage medium
CN112037754B (zh) *	2020-09-09	2024-02-09	广州方硅信息技术有限公司	Method for generating speech synthesis training data and related device
CN112382271B (zh) *	2020-11-30	2024-03-26	北京百度网讯科技有限公司	Speech processing method and apparatus, electronic device, and storage medium
CN112652318B (zh) *	2020-12-21	2024-03-29	北京捷通华声科技股份有限公司	Timbre conversion method and apparatus, and electronic device
CN113689868B (zh) *	2021-08-18	2022-09-13	北京百度网讯科技有限公司	Training method and apparatus for a voice conversion model, electronic device, and medium
CN113470622B (zh) *	2021-09-06	2021-11-19	成都启英泰伦科技有限公司	Conversion method and apparatus capable of converting arbitrary speech into multiple voices
CN114333865B (zh) *	2021-12-22	2024-07-19	广州市百果园网络科技有限公司	Model training and timbre conversion method, apparatus, device, and medium

2021
- 2021-12-22 CN CN202111577618.0A (CN114333865B): Active
2022
- 2022-12-20 EP EP22909972.6A (EP4425482B1): Active
- 2022-12-20 JP JP2024538190A (JP2024547129A): Pending
- 2022-12-20 US US18/719,391 (US20250061908A1): Pending
- 2022-12-20 WO PCT/CN2022/140253 (WO2023116660A2): Ceased

Also Published As

Publication number	Publication date
CN114333865A (zh)	2022-04-12
EP4425482B1 (fr)	2025-09-03
EP4425482A4 (fr)	2025-01-22
EP4425482A2 (fr)	2024-09-04
WO2023116660A3 (fr)	2023-08-17
US20250061908A1 (en)	2025-02-20
WO2023116660A2 (fr)	2023-06-29
JP2024547129A (ja)	2024-12-26

Legal Events

Date	Code	Title
2022-04-12	PB01	Publication
2022-04-29	SE01	Entry into force of request for substantive examination
2024-07-19	GR01	Patent grant

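Similar Documents

Publication	Publication Date	Title
CN114333865B (zh)	2024-07-19	Model training and timbre conversion method, apparatus, device, and medium
CN110415687B (zh)	2021-04-13	Speech processing method, apparatus, medium, and electronic device
CN111048064B (zh)	2020-07-07	Voice cloning method and apparatus based on a single-speaker speech synthesis dataset
WO2024055752A9 (fr)	2024-09-12	Speech synthesis model training method, speech synthesis method, and related apparatuses
CN112634866B (zh)	2024-05-14	Speech synthesis model training and speech synthesis method, apparatus, device, and medium
WO2023030235A1 (fr)	2023-03-09	Method and system for producing target audio, readable storage medium, and electronic apparatus
CN112992109B (zh)	2023-11-28	Assisted singing system, assisted singing method, and non-transitory computer-readable recording medium thereof
CN109376363A (zh)	2019-02-22	Earphone-based real-time speech translation method and apparatus
JP7664330B2 (ja)	2025-04-17	Text echo cancellation
CN113345414A (zh)	2021-09-03	Film restoration method, apparatus, device, and medium based on speech synthesis
KR20240122776A (ko)	2024-08-13	Adaptation and training of neural speech synthesis
CN116913268A (zh)	2023-10-20	Speech recognition method and apparatus, electronic device, and storage medium
CN112562655A (zh)	2021-03-26	Residual network training and speech synthesis method, apparatus, device, and medium
CN115700871A (zh)	2023-02-07	Model training and speech synthesis method, apparatus, device, and medium
CN116978381A (zh)	2023-10-31	Audio data processing method and apparatus, computer device, and storage medium
CN107886940B (zh)	2021-10-08	Speech translation processing method and apparatus
WO2023102932A1 (fr)	2023-06-15	Audio conversion method, electronic device, program product, and recording medium
CN112837688B (zh)	2024-04-02	Speech transcription method and apparatus, and related system and device
CN114141259B (zh)	2025-12-09	Voice conversion method, apparatus, device, storage medium, and program product
CN116072147A (zh)	2023-05-05	Music detection model training method and apparatus, electronic device, and storage medium
WO2025113018A1 (fr)	2025-06-05	Speech extraction method and apparatus, electronic device, computer-readable storage medium, and computer program product
CN117542378A (zh)	2024-02-09	Speech emotion recognition method and apparatus, electronic device, and storage medium
RU2830834C2 (ru)	2024-11-26	Methods for model training and voice conversion, apparatus, device, and data storage medium
CN112365884A (zh)	2021-02-12	Whisper recognition method and apparatus, storage medium, and electronic apparatus
CN119763617B (zh)	2025-11-18	Valid speech detection method and apparatus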