TW476060B - Smoothening apparatus and method for quick synthesized voice - Google Patents

Info

Publication number
TW476060B
Authority
TW
Taiwan
Prior art keywords
speech signal
synthesized speech
function
signal
length
Application number
TW89110181A
Other languages
Chinese (zh)
Inventor
Huei-Liang Jiang
Original Assignee
Iqchina Technology Inc
Application filed by Iqchina Technology Inc
Priority to TW89110181A
Application granted
Publication of TW476060B

Landscapes

  • Document Processing Apparatus (AREA)

Abstract

A smoothing apparatus and method for fast speech synthesis. The consonant duration of each synthesized speech signal is measured, and each synthesized speech signal is delayed by a specified period before being output. The first and second synthesized speech signals are multiplied by a first conversion function: during the overlap period that begins at the starting point of the second synthesized speech signal, the gain of the first conversion function increases from zero to a specified value, is held at that value until the second synthesized speech signal ends, and then returns to zero. The length of the overlap period equals the consonant duration of the second synthesized speech signal. The first and second synthesized speech signals are also multiplied by a second conversion function: during the overlap period, which runs from the attenuation start point to the end of the first synthesized speech signal, the gain decreases from the specified value to zero. Finally, the signals processed by the first and second functions are superimposed, so that the transitions between characters in the synthesized speech sound smooth.
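The procedure in the abstract can be read as a cross-fade. The following is a hedged restatement only: the symbols are introduced here for illustration and are not the patent's notation, and linear gain ramps are assumed. Let $x_1$ and $x_2$ be the first and second synthesized speech signals (each zero outside its own duration), $L_1$ and $L_2$ their durations, $E$ the consonant duration of $x_2$, and $G$ the specified gain value.

$$
y(t) = g_{\downarrow}(t)\,x_1(t) + g_{\uparrow}(t)\,x_2\bigl(t - (L_1 - E)\bigr)
$$

$$
g_{\uparrow}(t) =
\begin{cases}
0, & t < L_1 - E \\
G\,\dfrac{t - (L_1 - E)}{E}, & L_1 - E \le t < L_1 \\
G, & L_1 \le t \le L_1 - E + L_2 \\
0, & t > L_1 - E + L_2
\end{cases}
\qquad
g_{\downarrow}(t) =
\begin{cases}
G, & t < L_1 - E \\
G\,\dfrac{L_1 - t}{E}, & L_1 - E \le t \le L_1 \\
0, & t > L_1
\end{cases}
$$

Under these assumptions the two gains sum to $G$ at every instant of the overlap period $[L_1 - E,\ L_1]$, so the overall level stays constant while the first signal fades out and the second fades in.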

Description

Printed by the Employees' Consumer Cooperative of the Intellectual Property Bureau, Ministry of Economic Affairs 476060 A7 B7
V. Description of the Invention

5-1 Field of the Invention

The present invention relates to a method and apparatus for smoothing synthesized speech, and in particular to a method and apparatus that can perform this smoothing quickly.

5-2 Background of the Invention

Automated equipment is in ever more frequent use. The human-machine interface through which users communicate with an automated system must have the machine produce synthesized speech, so that users can operate the machine by following its instructions or obtain the information carried in the speech. The closer the synthesized speech is to natural human speech, the more approachable the machine feels to the user and the easier it may be to operate.

Speech synthesis methods currently fall into roughly three categories: (1) the oral-cavity model parameter method, (2) the original-sound parameter method, and (3) the sound concatenation method. Each has its drawbacks.

The oral-cavity model parameter method sounds too artificial, and its output is easily recognized as machine-synthesized. The original-sound parameter method discards the high-frequency part of the signal at storage time in order to reduce the amount of data to be stored, so its output is unclear because the higher-frequency part of the human voice is missing. The concatenation method makes the joins between sounds difficult to handle and requires a large amount of stored data, so it was generally not considered when storage capacity was expensive. It nevertheless gives the best pronunciation quality when the machine utters a single character. For this reason, as storage capacity keeps growing and storage cost keeps falling, concatenation remains an important method of speech synthesis.

Concatenation must, however, overcome the problem of joining successive syllables. Otherwise, even if each syllable is pronounced naturally, two joined syllables still do not sound like natural human speech, and the quality of the synthesized speech is not improved.

Leaving aside characters with more than one reading, the pronunciation of a single character (its syllable) is in general fixed. In natural human speech, however, the same character is pronounced differently in different phrases, because its pronunciation changes with the characters that precede and follow it. This linking of sounds is especially noticeable in English. It is less obvious in Chinese, but a closer look at the structure of Chinese pronunciation shows that each syllable consists of an initial, a medial, and a final, and that for a short time after the final a breathy sound of very low volume is produced. No sound is then produced until the next syllable.

Suppose each syllable is stored separately in a speech library, and a sentence is read out by retrieving the individual syllables and simply concatenating them. The result does not sound like a sentence: the gap between every two syllables becomes obvious, and the output differs greatly from natural speech, because the pronunciation (volume or pitch) of a character in one phrase differs from the pronunciation of the same character in another phrase. Storing every phrase, to avoid the breaks caused by joining single syllables one by one, would require an enormous amount of storage and is not practical. It is therefore more feasible to store the pronunciation of single characters and join them into phrases, and for the sake of fast computation the method used to process the synthesized speech should be as simple as possible.

In the past, the concatenation method often could not handle the prosody at the joins between characters smoothly, and successive syllables did not connect, so the output sounded broken and its pitch shifted. What is needed is a method that, when a machine utters a phrase, not only processes the pronunciation of each character but also makes the connection between successive characters smoother, thereby improving the quality of the synthesized speech. The method must also be fast to compute so that it is suitable for practical use.

5-3 Object and Summary of the Invention

As described in the background above, speech synthesis is being applied ever more widely, yet conventional apparatus and methods cannot smooth synthesized speech quickly and simply. The present invention therefore proposes a smoothing apparatus and method for speech synthesis that gives a good concatenation result and, because its computation is extremely simple, is well suited to users who do not want to incur high cost.

The smoothing apparatus and method proposed by the invention smooth the transitions between syllables in the synthesized speech as the synthesized speech signals are read out. The apparatus comprises the following components.

An initial-length (consonant-length) detection device detects the duration of the initial consonant of each synthesized speech signal that is output in sequence.

A delay device receives each synthesized speech signal, delays it by a specified period, and then outputs it.

A first processing device receives the first and second synthesized speech signals in sequence. During the overlap period that begins at the starting point of the second synthesized speech signal, its gain increases gradually from zero to a specified value; the gain is held at that value and returns to zero only after the second synthesized speech signal ends. The length of the overlap period equals the consonant length that the consonant-length detection device obtains for the second synthesized speech signal, that is, the later of the two consecutive signals. While the gain rises from zero to the specified value, the conversion function of the first processing device is the first conversion function, which is an increasing function.

A second processing device operates according to the consonant duration of each synthesized speech signal as measured by the consonant-length detection device. When the first and second synthesized speech signals are input to it in sequence, then during the overlap period, which runs from the attenuation start point of the first synthesized speech signal to the end of that signal, the gain decreases gradually from the specified value to zero. The length of the overlap period equals the consonant length that the detection device obtains for the second synthesized speech signal. While the gain falls from the specified value to zero, the corresponding function is the second conversion function, which is a decreasing function.

An adding device superimposes the output signals of the first and second processing devices, so that within the overlap period the first synthesized speech signal is added to the second synthesized speech signal.
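The components described above (delay, rising gain, falling gain, and adder) can be sketched as an overlap-and-add join of two sample sequences. This is a minimal illustration, not the patent's implementation: the function name `crossfade_join`, the `peak_gain` parameter (standing in for the "specified value"), and the linear ramps are assumptions, and the delay is represented simply by the sample offset at which the second signal starts.

```python
from typing import List, Sequence


def crossfade_join(first: Sequence[float], second: Sequence[float],
                   overlap: int, peak_gain: float = 1.0) -> List[float]:
    """Join two signals, overlapping the tail of `first` with the head of `second`.

    `overlap` is the overlap length in samples (hypothetically, the measured
    consonant length of `second`). Linear ramps are an assumption.
    """
    if not 0 <= overlap <= min(len(first), len(second)):
        raise ValueError("overlap must fit inside both signals")
    start = len(first) - overlap  # attenuation start point of the first signal
    out = [peak_gain * x for x in first[:start]]
    for k in range(overlap):
        rise = peak_gain * (k + 1) / (overlap + 1)  # increasing gain for `second`
        fall = peak_gain - rise                     # decreasing gain for `first`
        out.append(fall * first[start + k] + rise * second[k])
    # After the overlap, the second signal passes at the held gain value.
    out.extend(peak_gain * x for x in second[overlap:])
    return out


if __name__ == "__main__":
    joined = crossfade_join([1.0] * 4, [1.0] * 4, overlap=2)
    print(len(joined), joined)
```

Because the two gains always sum to `peak_gain` inside the overlap, two constant-level inputs produce a constant-level output, which is one way to check that the join introduces no dip or bump.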

Λ ί n n m n an n 一 rT I n HI in —a— n teamf m I 476060 A7 B7 五、發明說明() (請先閱讀背面之注意事項再填寫本頁) 語音訊號與重疊時段内的第二合成語音訊號相加。藉以 使得上述之第一合成語音訊號,及其後段内的重疊時段 所疊加的第一合成語音訊號與第二合成語音訊號,連同 重疊時段之後的第二合成語音訊號結合之後,使所發出 的合成語音中之每個字音間之發音平滑化。 其中上述之第一轉換函數可以為遞增的步階函數 (stepfunct ion)或是遞增的斜坡函數(ramp function)。上 述之第二轉換函數可以為遞減的步階函數(stepfunction) 或是斜坡函數(ramp function)。 經濟部智慧財產局員工消費合作社印製 本發明所提出的聲音合成平滑處理方法,係在讀出 合成語音訊號時,使所發出的合成語音中每個字音間之 發音平滑化,上述之聲音合成的平滑處理方法包含下列 步驟:首先偵測出依序輸出的每個合成語音訊號的子音 發聲時間長度,同時將每個合成語音訊號延遲一段特定 時間之後再行輸出。然後以第一轉換函數乘以合成語音 訊號中的第一合成語音訊號以及第二合成語音訊號,其 中上述之第一轉換函數的轉換特性曲線相應於第二合成 語音訊號由起始點開始的重疊時段内,其增益大小由零 漸增至一特定值,並且維持在上述之特定值,直到上述 之第二合成語音訊號結束之後才又回歸到零。上述之重 疊時段的長度等於連續的第一合成語音訊號以及第二合 成語音訊號中較慢輸入的第二合成語音訊號的子音發聲 時間長度,上述之第一轉換函數為遞增函數。 接著以第二轉換函數乘以合成語音訊號中的第一 本紙張尺度適用中國國家標準(CNS)A4規格(210 X 297公釐) 經濟部智慧財產局員工消費合作社印製 476060 A7 B7_ 五、發明說明() 合成語音訊號以及第二合成語音訊號,其中上述之第二 轉換函數的轉換特性曲線相應於上述之第一合成語音訊 號,於第一合成語音訊號由衰減起始點開始到第一合成 語音訊號結束的重疊時段内,上述之第一處理裝置的增 益大小由上述之特定值漸漸減小到零。上述之重疊時段 的長度等於上述的後子音長度,上述之第二轉換函數為 遞減函數。最後將經過上述第一函數處理的部分第二合 成語音訊號與經過上述第二函數處理的部分第一合成語 音訊號疊加,使得上述之重疊時段内的第一合成語音訊 號與上述之重疊時段内的第二合成語音訊號相加,藉以 使得上述之第一合成語音訊號,及其後段内上述之重疊 時段中所疊加的經過第一函數處理的部分第二合成語音 訊號與經過上述第二函數處理的部分第一合成語音訊 號,連同上述之重疊時段後的第二合成語音訊號結合之 後,使所發出的合成語音中之每個字音間之發音平滑 化。 5-4圖式簡單說明 將後續的說明配合下列圖式,即可以對於本發明的 特徵有更為清楚之了解,其中: 圖一顯示的是本發明的快速聲音合成之平滑處理 裝置與方法的一較佳實施例的功能方塊圖; 圖二顯示的是依據本發明的快速聲音合成之平滑 處理裝置與方法中之一較佳實施例所利用的第一處理裝 本紙張尺度適用中國國家標準(CNS)A4規格(210 X 297公釐) (請先閱讀背面之注意事項再填寫本頁) 裝--------訂---------線 經濟部智慧財產局員工消費合作社印製 476060 A7 ___B7_ 五、發明說明() 置的轉換函數圖;以及 圖三顯示的是依據本發明的快速聲音合成之平滑 處理裝置與方法中之一較佳實施例所利用的第二處理裝 置的轉換函數圖。 5-5發明詳細說明 因為在一般習知的合成語音之技術中,並未有成本 低廉又有良好效果的處理合成語音之方法與裝置,以致 現在市面上的語音合成產品,通通無法以較低廉的價格 來製造出高品質的語音合成裝置。本發明不但不會增加 成本,因為本發明的方法對於語音訊號的處理,不必對 發出合成語音的裝置作大幅更改,就可以讓傳統逐字發 出合成語音的裝置在發出詞句時,避免詞句中字音與字 音間的發音不連貫。 換句話說,即使語音庫中儲存的是獨立單字的字 音,經過本發明的裝置與方法處理之後,仍然可以在發 出詞句的聲音時,讓詞句中每個字的發音之音調及音量 前後相連貫而不致造成突兀感,以使機器所發出的聲音 更加接近真實人聲,提高合成語音之品質。習知技術所 使用的技術往往只利用參數之調整來模擬發音,所以字 音與子音連接處音貝律彳主在無法流暢’而利用本發明的方 法即能將語意分析,以使得句子與句子連接處的發音能 相連接,所以聽起來會有連續之感。因為以機器發出合 成語音的裝置為習知技術之範疇,所以本發明說明書於 本紙張尺度適用中國國家標準(CNS)A4規格(210 X 297公釐) (請先閱讀背面之注意事項再填寫本頁) 裝 · 476060 
A7 ____B7_ 五、發明說明() (-1,先間讀背面之注意事項再瑣寫本頁) 此不加以贅述,僅就與本發明有關而對詞句中連接的字 音作快速聲音合成之平滑處理裝置與方法,於此說明書 中力口以說明。 圖一顯示的是本發明所提出的快速聲音合成之平 滑處理裝置與方法的功能方塊圖,當要由語音庫讀出一 連串字音以發出一整個句子的合成語音時,首先一連串 字音Si (例如第一字音S1與第二字音S2和第三字音S3) 分別被依序讀出,並且被依序饋入子音長度偵測裝置 3 與延遲裝置4,其中子音長度偵測裝置3是用以偵測出被 偵測字的字音中之氣音部分的時間長度,因為其做法乃 是取輸入字音的振幅小於一定程度時為依據,乃將習知 技術的原理轉用於本發明,故其細節於此不加贅述。此 外,延遲裝置4亦是屬於習知技術,只是本發明應用其 原理於不同應用範疇,其實施方式有很多種,例如延遲 線(delay line)或是延遲型正反器(Delay-type Flip Flop) 等可以將輸入訊號延遲一段時間之後再由其輸出端輸出 者,皆可以運用於本發明的裝置。 經濟部智慧財產局員工消費合作社印製 其中上述的子音長度偵測裝置3所輸出的訊號内 包含有輸入字音的子音之發音長度,其在本發明中係被 用於前後字音之重疊部分之時間長度,其詳細作用在後 續中說明。在上述一連串字音輸入本發明的裝置之過程 中,當例如第三字音S 3輸入第一處理裝置1 0時,同時 子音長度偵測裝置3的輸出也被饋送到第一處理裝置1 0 以及第二處理裝置15,並且其輸出包含了第三字音的子 本紙張尺度適用中國國家標準(CNS)A4規格(210 X 297公釐) 經濟部智慧財產局員工消費合作社印制π 476060 A7 __B7_ 五、發明說明() 音長度。另外,延遲裝置4的輸出也被饋送到第一處理 裝置10以及第二處理裝置15,此時延遲裝置4的輸出 包含了第二字音S 2的訊號。 在上述的例子中,第三字音S 3的訊號、第二字音 S2的訊號以及第三字音的子音長度都被輸入第一處理裝 置1 0,而在本發明的第一較佳實施例中,第一處理裝置 1 〇的轉換特性曲線如圖二所示。其中橫軸(t)代表時間而 縱軸則代表增益(g a i η ),i則代表系統的計時脈衝,在本 較佳實施例中,例如當i = 3時,則表示正要讀出第三個 字音S3。Ti代表由起始到第i個字音結束的時間長度。 上述T i所代表的亦即是處理完第i個字音的時間, 另外,E(Si)代表第i個字音的子音長度,而L(Si)代表的 是第i個字音單獨的字音長度。由圖二中包含第一部份曲 線2 0 a與第二部分曲線2 0 b的曲線2 0,可以容易的觀察 到第三個字音的發音是由小逐漸變大的,因為第一處理 裝置的轉換函數(例如第一轉換函數Η 1)的第一部份曲線 20a由零逐漸增大,因為由圖二中可以看出第一部份曲 線 2 0a是遞增函數,在此實施例中其為步階函數(step function)。並且在本發明的一較佳實施例中,第一轉換 函數Η 1可以下列式子表示: H1(n,L(Si-1),E(Si))= Γ 0 當 T i -1 S t S T i -1 - E (S I) (t/n)當 Ti-1 -E(SI) $ Ti-1 1 當 Ti-1 S tS Ti- 1-E(SI) + L(Si) 、〇 當 G Ti 本紙張尺度適用中國國家標準(CNS)A4規格(210 X 297公釐) (請先閱讀背面之注意事項再填寫本頁) ,裝 訂: 476060 A7 B7 五、發明說明() 其中第一部份曲線2 0 a的表_ J表不式即為TM-E(SI)幺 的轉換特性曲線。在此實 施例中為(t/n),但若是基於其仙 (請先閱讀背面之注意事項再填寫本頁) 一他考量,亦可以用其他遞 增函數(或嚴格遞增函數)取代。 另外,在本發明的一較佳眚 貫%例中,第二處理裝置 1 5的轉換特性曲線如圖三所示。其中橫軸⑴代表時間而 縱軸則代表增益(gain),i則代表系統的計時脈衝,在本 車交佳實施例中,例如t i = 3時’則表示正要讀出第三個 字音S3。Ti代表由起始到第丨個字音結束的時間長度, 亦即處理完第i個字音的時間,E(Si)代表第丨個字音的 子音長度,而L(Si)代表的是第j個字音單獨的字音長 度。 由圖二中包含第一部份曲線3 〇 a與第二部分曲線 3 0 b的曲線3 0 ’可以容易的觀察到第二個字音的發音是 由大逐漸變小的,因為第二處理裝置的轉換函數(例如第 二轉換函數Η 2)的第一部份曲線3 〇 a逐漸減小到零,由 圖二中可以看出第一部份曲線3〇a是遞減函數,在此實 施例中其為步階函數(step function)。並且在本發明的一 較佳實施例中,第二轉換函數Η 2可以下列式子表示: 
H2(n,L(Si-1),E(Si))=厂 〇 當 Ti-2 1 當 Ti-2StSTi-2+L(SM) + E(Si-1) 經濟部智慧財產局員工消費合作社印製 1-(t/n)當 Ti-2 + L(SM)-E(Sl)$t$Ti-1 當 Ή-1 $ t 其中代表上述第一部份曲線 30a的者,即為 Ti- 11 本紙張尺度適用中國國家標準(CNS)A4規格(210 X 297公釐) 476060 A7 B7 經濟部智慧財產局員工消費合作社印製 五、發明說明() 2 + L(SM)-E(SI)St$Ti-l 時 H2(n,L(Si-1),E(Si))的轉換 ^ ,L 每 U.1 丄 认 Χ/,,、 . .. 特性曲線 量,亦可以 經過第 _ 後,將其分別的輸出訊號加成,即可以得到經過處理的 聲音訊號vi,其中所含的第二字音S2的音量是漸漸變 小,而第三字音S3的音量是漸漸變大的,所以經過本發 明的快速聲音合成之平滑處理裝置與方法處理過的合成 語音所發出的聲音並不會有不連續之感。 本發明的1置與方法著重在兩聲音銜接時的平滑 處理,當前後兩音銜接時,應偵測後音之子音長度,並 以此長度作為前後兩音重疊時之參考長度,並且^疊處 理時前音(本實施例中是以S3為例)以階梯函數(本$ : 例中是以第二轉換函數H2為例)處理,而後音(本實施^ 中是以S 2為例)以階梯函數(本實施例中是以努 _ & μ弟一轉換函 數Η 1為例)處理,再將所處理之後的聲音舌% , 曰里:g:。依此原 理處理所有字音的發音之後,亦即將整個甸5★ # 士 —, 句的字音處 理完畢之後,其中母個字音與字音之間的聲音是互相 疊的,所以經過本發明的方法與裝置處理夕1 土 <傻,即可以 得到完整詞彙之發音。 而透過上述階梯(或步階)函數之處理, ’在時域中可 以看見前音的訊號平滑地連接到後音的$ % .^ L 无日的戒就’所以發音 時,前音之發音也順利地連接到後音的菸立 J $ 曰,因此话淫 了前後音銜接時發音的不順暢。由於复 ' 、/、十滑處理採用階 在此實施例中為1 - (t / η)。但若是基於其他考 用其他遞減函數(或嚴格遞減函數)取代。 處理裝置10及第二處理裝置15處理之 12 本紙張尺度適用中國國家標準(CNS)A4規格(210 X 297公爱 (請先閱讀背面之注意事項再填寫本頁) m 裝 訂, 476060 A7 B7_ 五、發明說明() 梯函數,其運算極為簡單,故此平滑處理模式可以運用 於電腦的快速平滑處理,大幅提昇其處理效能而避免設 備之複雜化。 但若是基於其他考量,對於本發明的此實施例中所 使用的階梯函數,也可以利用其他函數取代。例如本發 明實施例中所使用的階梯函數中之梯、度(η)決定階梯函數 的平滑度,η值越大,平滑度約高,最終趨近於線性。亦 即階梯函數在η趨近於很大的數值時,可以視為直線的 效果,亦即若有其他考量時,上述的階梯函數可以用斜 坡函數(ramp function)取代,或是以其他嚴格遞增函數 或是遞增函數(嚴格遞減函數或是遞減函數)來取代,此類 修為熟知該項技藝者依據本發明的裝置與方法之揭露而 可以輕易推知者,改並不脫離本發明的精神與範疇之 外。 經濟部智慧財產局員工消費合作社印製 !-|---_----4裝—— (請先閱讀背面之注意事項再填寫本頁) 以上所述僅為本發明之較佳實施例而已,並非用以 限定本發明之申請專利範圍;凡其它未脫離本發明所揭 示之精神下所完成之等效改變或修飾,例如在本發明的 一較佳實施例中,用以處理字音的階梯函數,也可以利 用其他函數的多項式(或將其中的高次項刪除),只要能快 速的運算即可。甚至所處理的字音訊號也不限於處理兩 個相鄰的字音訊號,也可以一次處理兩個以上的字音訊 號。只要在合成語音發音時,可以利用本發明的函數或 其他適當函數加以平滑化之後再作銜接處理即可。故任 何對本發明的函數之之改變或修飾,以將前後字音銜接 本紙張尺度適用中國國家標準(CNS)A4規格(210 X 297公釐) 476060 A7 __B7_ 五、發明說明() 之快速聲音合成之平滑處理裝置與方法,均應包含在下 述之申請專利範圍内。 (請先閱讀背面之注意事項再填寫本頁) 裝 訂---------總Λ ί nnmn an n rT I n HI in —a— n teamf m I 476060 A7 B7 V. 
Description of the invention () (Please read the notes on the back before filling this page) Voice signal and the second synthesis in the overlapping period Add the voice signals. In this way, the first synthesized speech signal and the first synthesized speech signal and the second synthesized speech signal superimposed in the overlapping period in the subsequent section are combined with the second synthesized speech signal after the overlapping period to synthesize the emitted speech. The pronunciation between each character in the speech is smoothed. The above-mentioned first conversion function may be an increasing step function or an increasing ramp function. The above-mentioned second conversion function may be a decreasing step function or a ramp function. The consumer cooperative of the Intellectual Property Bureau of the Ministry of Economic Affairs has printed the sound synthesis smoothing method proposed by the present invention, which smoothes the pronunciation between each character in the synthesized speech when the synthesized speech signal is read out. The smooth processing method includes the following steps: first detecting the length of the consonant sounding time of each synthesized speech signal output in sequence, and delaying each synthesized speech signal for a specific time before outputting. The first conversion function is then multiplied by the first synthesized speech signal and the second synthesized speech signal in the synthesized speech signal, wherein the conversion characteristic curve of the above-mentioned first conversion function corresponds to the overlap of the second synthesized speech signal from the starting point During the time period, the gain value gradually increases from zero to a specific value, and is maintained at the specific value mentioned above, and does not return to zero until the second synthetic voice signal ends. 
The length of the overlap period is equal to the length of the sub-voice sounding time of the continuous first synthetic speech signal and the slower input second synthetic speech signal in the second synthetic speech signal. The first conversion function is an increasing function. Then multiply the second paper conversion function by the first paper size in the synthesized speech signal. Applicable to China National Standard (CNS) A4 (210 X 297 mm). Printed by the Employees ’Cooperative of the Intellectual Property Bureau of the Ministry of Economic Affairs. 476060 A7 B7_ V. Invention Explanation () The synthesized speech signal and the second synthesized speech signal, in which the conversion characteristic curve of the above-mentioned second conversion function corresponds to the above-mentioned first synthesized speech signal, and the first synthesized speech signal starts from the attenuation starting point to the first synthesis speech signal. During the overlapping period when the voice signal ends, the gain of the first processing device gradually decreases from the specific value to zero. The length of the overlapping period is equal to the length of the post-consonant, and the second conversion function is a decreasing function. Finally, the part of the second synthesized speech signal processed by the first function and the part of the first synthesized speech signal processed by the second function are superimposed, so that the first synthesized speech signal in the overlap period described above and the first synthesized speech signal in the overlap period described above are superimposed. The second synthesized speech signal is added, so that the above-mentioned first synthesized speech signal and a part of the second synthesized speech signal processed by the first function superimposed in the above-mentioned overlapping period in the subsequent paragraph and the processed function of the second function are added. 
Part of the first synthesized speech signal is combined with the second synthesized speech signal after the above-mentioned overlapping period to smooth the pronunciation between each character in the synthesized speech. 5-4 Schematic illustrations The following descriptions can be combined with the following diagrams to better understand the features of the present invention, where: Figure 1 shows the smooth processing device and method of the fast sound synthesis of the present invention. A functional block diagram of a preferred embodiment; FIG. 2 shows a first processing paper used in a preferred embodiment of a smooth processing device and method for fast sound synthesis according to the present invention. CNS) A4 specification (210 X 297 mm) (Please read the precautions on the back before filling out this page) Packing -------- Order --------- Staff of Intellectual Property Bureau, Ministry of Economics Printed by the Consumer Cooperative 476060 A7 ___B7_ 5. The transfer function diagram of the invention description (); and Figure 3 shows a second preferred embodiment of the smooth processing device and method for fast sound synthesis according to the present invention. Diagram of the transfer function of the processing device. 5-5 Detailed Description of the Invention Because there is no low-cost and good-effect method and device for processing synthesized speech in the conventionally known speech synthesis technology, the current speech synthesis products on the market cannot be cheaper. To make a high-quality speech synthesis device at the price. The present invention not only does not increase costs, because the method of the present invention does not need to significantly change the device that sends out synthetic speech for the processing of speech signals, and can make the traditional device that sends out synthesized speech word by word avoid the sound of words in the words. The pronunciation is not consistent with the pronunciation. 
In other words, even if the sounds of individual words are stored in the speech database, after processing by the device and method of the present invention, the tone and volume of the pronunciation of each word in the sentence can be connected back and forth when the sound of the word is issued. It does not cause a sudden feeling, so that the sound emitted by the machine is closer to the real human voice, and the quality of synthesized speech is improved. The technology used in the conventional technology often only uses the adjustment of parameters to simulate the pronunciation, so the sound of the sound and the consonant at the junction cannot be fluent, and the method of the present invention can analyze the semantics so that the sentence is connected to the sentence. The pronunciation of places can be connected, so it sounds continuous. Because the device that synthesizes speech from a machine is a category of known technology, the specification of this invention applies the Chinese National Standard (CNS) A4 specification (210 X 297 mm) at this paper size. Page) Equipment · 476060 A7 ____B7_ V. Description of the invention () (-1, read the precautions on the back first and then write this page) This is not described here, it is only related to the present invention, and it is used to quickly connect the phonetic sounds in the words. The smooth processing device and method for sound synthesis are explained in this manual. FIG. 1 shows a functional block diagram of the smooth processing device and method for fast sound synthesis proposed by the present invention. When a series of phonetic sounds are to be read out from the speech library to produce a synthesized speech of a whole sentence, a series of phonetic sounds Si (such as the first A syllable S1, a second syllable S2, and a third syllable S3) are read out in sequence and fed into the consonant length detection device 3 and the delay device 4, respectively. 
The consonant length detection device 3 is used to detect The time length of the gaseous part in the phonetic sound of the detected word is based on the fact that the amplitude of the input phonetic sound is less than a certain level, and the principle of the conventional technology is applied to the present invention, so the details are as follows: I won't go into details here. In addition, the delay device 4 also belongs to the conventional technology, but the principle of the present invention is applied to different application fields. There are many implementation manners, such as a delay line or a delay-type flip flop. ) Those who can delay the input signal for a period of time and then output it from its output terminal can be applied to the device of the present invention. Printed by the Consumer Cooperative of the Intellectual Property Bureau of the Ministry of Economic Affairs, where the signal output by the above-mentioned consonant length detection device 3 includes the pronunciation length of the consonants of the input vowel, which is used in the present invention for the time of overlapping parts of the vowel The length and its detailed role will be described later. In the process of inputting a series of character sounds into the device of the present invention, when, for example, the third character sound S 3 is input to the first processing device 10, the output of the consonant length detection device 3 is also fed to the first processing device 10 and the first The second processing device 15, and the output of the paper sheet containing the third vowel is applicable to the Chinese National Standard (CNS) A4 specification (210 X 297 mm). Printed by the Consumer Cooperative of the Intellectual Property Bureau of the Ministry of Economic Affairs π 476060 A7 __B7_ V. Description of the invention () Tone length. In addition, the output of the delay device 4 is also fed to the first processing device 10 and the second processing device 15. 
At this time, the output of the delay device 4 includes a signal of the second phonetic S 2. In the above example, the signal of the third character S3, the signal of the second character S2, and the consonant length of the third character are all input to the first processing device 10, and in the first preferred embodiment of the present invention, The conversion characteristic curve of the first processing device 10 is shown in FIG. The horizontal axis (t) represents time and the vertical axis represents gain (gai η), and i represents the timing pulse of the system. In the preferred embodiment, for example, when i = 3, it indicates that the third Words S3. Ti represents the length of time from the beginning to the end of the i-th tone. The above T i represents the time for processing the i-th tone. In addition, E (Si) represents the consonant length of the i-th tone, and L (Si) represents the individual length of the i-th tone. From the curve 2 0 including the first part of the curve 20 a and the second part of the curve 2 0 b in FIG. 2, it can be easily observed that the pronunciation of the third character is gradually increased from small, because the first processing device The first partial curve 20a of the conversion function (for example, the first conversion function Η 1) gradually increases from zero, because it can be seen from FIG. 2 that the first partial curve 20a is an increasing function. In this embodiment, Is a step function. And in a preferred embodiment of the present invention, the first conversion function Η 1 can be expressed by the following formula: H1 (n, L (Si-1), E (Si)) = Γ 0 when T i -1 S t ST i -1-E (SI) (t / n) when Ti-1 -E (SI) $ Ti-1 1 when Ti-1 S tS Ti- 1-E (SI) + L (Si), 〇 when G Ti This paper size is in accordance with Chinese National Standard (CNS) A4 (210 X 297 mm) (Please read the precautions on the back before filling out this page), binding: 476060 A7 B7 5. 
The first partial curve 20a is the part of the conversion characteristic curve on the interval $T_{i-1} - E(S_i) \le t \le T_{i-1}$. In this embodiment it is (t/n), but for other considerations it can also be replaced by another increasing function (or strictly increasing function). In addition, in a preferred embodiment of the present invention, the conversion characteristic curve of the second processing device 15 is shown in the drawings. The horizontal axis (t) represents time, the vertical axis represents gain, and i represents the timing pulse of the system; in this embodiment, for example, i = 3 indicates that the third character sound S3 is to be read out. Ti represents the length of time from the beginning to the end of the i-th character sound, that is, the time for processing the i-th character sound; E(Si) represents the consonant length of the i-th character sound, and L(Si) represents the individual length of the i-th character sound. From curve 30 in FIG. 2, which comprises the first partial curve 30a and the second partial curve 30b, it can be seen that the pronunciation of the second character sound fades gradually from large to small, because the first partial curve 30a of the conversion function of the second processing device (the second conversion function H2) falls gradually to zero. As FIG. 2 shows, the first partial curve 30a is a decreasing function; in this embodiment it is a step function.
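The increasing and decreasing step-function gains described above can be sketched in Python. This is only one reading of the (t/n) staircase, assuming time is counted in samples over the overlap period and that n is the number of steps; the function names `rising_staircase` and `falling_staircase` are hypothetical.

```python
def rising_staircase(length, n):
    """Gain for each of `length` samples, climbing from 0 in `n` equal steps."""
    if length <= 0 or n <= 0:
        raise ValueError("length and n must be positive")
    # Integer division picks the current step; dividing by n scales it to [0, 1).
    return [(k * n // length) / n for k in range(length)]


def falling_staircase(length, n):
    """Mirror image of the rising staircase: starts at 1 and steps down."""
    return [1.0 - g for g in rising_staircase(length, n)]


print(rising_staircase(8, 4))
print(falling_staircase(8, 4))
```

With this construction the two gains sum to 1 at every sample of the overlap, so a constant signal keeps its level while one sound fades out and the next fades in. That property is a design choice of the sketch, not something the patent states.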
In a preferred embodiment of the present invention, the second conversion function H2 can be expressed as:

$$
H_2\bigl(n, L(S_{i-1}), E(S_i)\bigr) =
\begin{cases}
0, & t \le T_{i-2} \\
1, & T_{i-2} \le t \le T_{i-2} + L(S_{i-1}) - E(S_i) \\
1 - (t/n), & T_{i-2} + L(S_{i-1}) - E(S_i) \le t \le T_{i-1} \\
0, & T_{i-1} \le t
\end{cases}
$$

The first partial curve 30a described above is the part of the conversion characteristic curve of H2(n, L(Si-1), E(Si)) on the interval $T_{i-2} + L(S_{i-1}) - E(S_i) \le t \le T_{i-1}$. After the two signals have been processed according to their respective conversion characteristic curves, the respective output signals are added to obtain the processed sound signal Vi, in which the volume of the second character sound S2 gradually decreases while the volume of the third character sound S3 gradually increases. Synthesized speech processed by the smoothing device and method for fast sound synthesis of the present invention therefore does not sound discontinuous. The method of the present invention focuses on smoothing the junction between two sounds. When two sounds are joined, the consonant length of the rear sound is detected and used as the reference length of the overlap between them. The front sound (S2 in this embodiment) is processed with a step function (the second conversion function H2 in this embodiment), the rear sound (S3 in this embodiment) is processed with a step function (the first conversion function H1 in this embodiment), and the processed sounds are then added together.
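The joining of two sounds described above can be sketched as an overlap-add in Python. The sketch assumes sounds are lists of samples, that `overlap` is the rear sound's consonant length in samples, and that the staircase is normalized over the overlap; `smooth_join` and its parameters are hypothetical names, not the patent's.

```python
def smooth_join(front, back, overlap, n):
    """Overlap the tail of `front` with the head of `back` and add them.

    The front sound is faded out with a falling staircase and the rear
    sound is faded in with a rising staircase of `n` steps.
    """
    if not 0 < overlap <= min(len(front), len(back)):
        raise ValueError("overlap must fit inside both sounds")
    rise = [(k * n // overlap) / n for k in range(overlap)]
    start = len(front) - overlap  # sample where the rear sound begins
    head = list(front[:start])
    mixed = [
        front[start + k] * (1.0 - rise[k]) + back[k] * rise[k]
        for k in range(overlap)
    ]
    tail = list(back[overlap:])
    return head + mixed + tail


joined = smooth_join([2.0] * 4, [0.0] * 4, overlap=4, n=2)
print(joined)
```

The output is shorter than the two sounds laid end to end by exactly `overlap` samples, which reflects the rear sound starting before the front sound has finished.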
After the pronunciation of every character sound has been processed according to this principle, the character sounds of the whole sentence overlap one another at their junctions, so processing by the method and device of the present invention yields the pronunciation of the complete sentence. Through the step function processing described above, the signal of the front sound is smoothly connected to the signal of the rear sound in the time domain, so the speech no longer sounds unsmooth where the front and rear sounds join. In this embodiment the decreasing part of the second conversion function is 1 - (t/n), but for other considerations it can be replaced by another decreasing function (or strictly decreasing function). Because the smoothing performed by the first processing device 10 and the second processing device 15 uses step functions, whose operation is extremely simple, this smoothing approach lends itself to fast processing on a computer, which greatly improves processing efficiency and avoids complex equipment. For other considerations, however, the step function used in this embodiment can also be replaced by other functions. For example, the order (n) of the step function used in the embodiment determines its smoothness: the larger the value of n, the smoother the function, and it eventually approaches a straight line.
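The remark that a larger n brings the step function closer to a straight line can be checked numerically. The following Python sketch assumes the same normalized staircase as an illustration (n equal steps over a fixed number of samples) and measures its largest distance from a linear ramp; `max_gap_to_ramp` is a hypothetical helper.

```python
def max_gap_to_ramp(length, n):
    """Largest difference between an n-step staircase and a linear ramp."""
    stair = [(k * n // length) / n for k in range(length)]
    ramp = [k / length for k in range(length)]
    return max(abs(s - r) for s, r in zip(stair, ramp))


for steps in (4, 100, 1000):
    print(steps, max_gap_to_ramp(1000, steps))
```

Under this construction the gap stays below 1/n, and it vanishes when n equals the number of samples, at which point the staircase and the ramp coincide.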
When n becomes large, the step function has the effect of a straight line, so for other considerations the step function described above can be replaced with a ramp function, or with another strictly increasing or increasing function (strictly decreasing or decreasing function). Those skilled in the art can readily infer such modifications from the disclosure of the device and method of the present invention, and such changes do not depart from the spirit and scope of the present invention. The above description is only a preferred embodiment of the present invention and is not intended to limit the scope of the patent claims of the present invention. Other equivalent changes or modifications made without departing from the spirit disclosed by the present invention fall within that scope. For example, the step function used to process a character sound in the preferred embodiment can be replaced by a polynomial approximation of another function (or such a polynomial with its higher-order terms removed), as long as it can be computed quickly. Nor is the processing limited to two adjacent character sound signals; more than two character sound signals can be processed at a time. As long as the synthesized speech is smoothed with the functions of the present invention or other appropriate functions before the sounds are joined, any such change or modification of the functions used to join the preceding and following character sounds in the smoothing device and method for fast sound synthesis should be included in the scope of the claims below.


Claims (1)

1. A smoothing device for sound synthesis, which smooths the pronunciation between the character sounds of the synthesized speech that is uttered when synthesized speech signals are read out, the smoothing device for sound synthesis comprising at least:
a consonant length detection device for detecting the consonant sounding time length of each synthesized speech signal output in sequence;
a delay device, into which each synthesized speech signal is input and which outputs each input synthesized speech signal after delaying it for a specific period of time;
a first processing device, wherein, when a first synthesized speech signal and a second synthesized speech signal among the synthesized speech signals are sequentially input to the first processing device, the gain of the first processing device gradually increases from zero to a specific value within an overlap period that begins at the starting point of the second synthesized speech signal, remains at the specific value, and returns to zero only after the second synthesized speech signal ends; the length of the overlap period equals the rear consonant length obtained when the consonant length detection device detects the second synthesized speech signal, which is the later input of the consecutive first and second synthesized speech signals; and the conversion function of the first processing device while its gain gradually increases from zero to the specific value is a first conversion function, the first conversion function being an increasing function;
a second processing device, which operates according to the consonant sounding time length of each synthesized speech signal detected by the consonant length detection device, wherein, when the first synthesized speech signal and the second synthesized speech signal among the synthesized speech signals are sequentially input to the second processing device, the gain of the first processing device gradually decreases from the specific value to zero within the overlap period that runs from the attenuation starting point of the first synthesized speech signal to the end of the first synthesized speech signal; the length of the overlap period equals the rear consonant length obtained when the consonant length detection device detects the second synthesized speech signal; and the function corresponding to the gain of the second processing device gradually decreasing from the specific value to zero is a second conversion function, the second conversion function being a decreasing function; and
an addition device for superimposing the output signals of the first processing device and the second processing device, so that the first synthesized speech signal within the overlap period and the second synthesized speech signal within the overlap period are added, whereby the first synthesized speech signal, the first and second synthesized speech signals superimposed within the overlap period at its rear section, and the second synthesized speech signal after the overlap period are combined, and the pronunciation between the character sounds of the uttered synthesized speech is smoothed.

2. The smoothing device for sound synthesis of claim 1, wherein the first conversion function is a step function, which is one kind of the increasing function.

3. The smoothing device for sound synthesis of claim 1, wherein the first conversion function is a ramp function, which is one kind of the increasing function.

4. The smoothing device for sound synthesis of claim 1, wherein the second conversion function is a step function, which is one kind of the decreasing function.

5. The smoothing device for sound synthesis of claim 1, wherein the second conversion function is a ramp function, which is one kind of the decreasing function.

6. A smoothing device for sound synthesis, which smooths the pronunciation between the character sounds of the synthesized speech that is uttered when synthesized speech signals are read out, the smoothing device for sound synthesis comprising at least:
a consonant length detection device for detecting the consonant sounding time length of each synthesized speech signal output in sequence;
a delay device, into which each synthesized speech signal is input and which outputs each input synthesized speech signal after delaying it for a specific period of time;
a first processing device, wherein, when a first synthesized speech signal and a second synthesized speech signal among the synthesized speech signals are sequentially input to the first processing device, the gain of the first processing device gradually increases from zero to a specific value within an overlap period that begins at the starting point of the second synthesized speech signal, remains at the specific value, and returns to zero only after the second synthesized speech signal ends; the length of the overlap period equals the rear consonant length obtained when the consonant length detection device detects the second synthesized speech signal, which is the later input of the consecutive first and second synthesized speech signals; and the conversion function of the first processing device while its gain gradually increases from zero to the specific value is a first conversion function, the first conversion function being an increasing step function;
a second processing device, which operates according to the consonant sounding time length of each synthesized speech signal detected by the consonant length detection device, wherein, when the first synthesized speech signal and the second synthesized speech signal among the synthesized speech signals are sequentially input to the second processing device, the gain of the second processing device gradually decreases from the specific value to zero within the overlap period that runs from the attenuation starting point of the first synthesized speech signal to the end of the first synthesized speech signal; the length of the overlap period equals the rear consonant length obtained when the consonant length detection device detects the second synthesized speech signal; and the function corresponding to the gain of the second processing device gradually decreasing from the specific value to zero is a second conversion function, the second conversion function being a decreasing step function; and
an addition device for superimposing the output signals of the first processing device and the second processing device, so that the first synthesized speech signal within the overlap period and the second synthesized speech signal within the overlap period are added, whereby the first synthesized speech signal, the first and second synthesized speech signals superimposed within the overlap period at its rear section, and the second synthesized speech signal after the overlap period are combined, and the pronunciation between the character sounds of the uttered synthesized speech is smoothed.

7. A smoothing method for sound synthesis, which smooths the pronunciation between the character sounds of the synthesized speech that is uttered when synthesized speech signals are read out, the smoothing method for sound synthesis comprising at least:
detecting the consonant sounding time length of each synthesized speech signal output in sequence, while delaying each synthesized speech signal for a specific period of time before outputting it;
multiplying a first synthesized speech signal and a second synthesized speech signal among the synthesized speech signals by a first conversion function, wherein, within an overlap period that begins at the starting point of the second synthesized speech signal, the gain of the conversion characteristic curve of the first conversion function gradually increases from zero to a specific value, remains at the specific value, and returns to zero only after the second synthesized speech signal ends; the length of the overlap period equals the consonant sounding time length of the second synthesized speech signal, which is the later input of the consecutive first and second synthesized speech signals; and the first conversion function is an increasing function;
multiplying the first synthesized speech signal and the second synthesized speech signal among the synthesized speech signals by a second conversion function, wherein the conversion characteristic curve of the second conversion function corresponds to the first synthesized speech signal; within the overlap period that runs from the attenuation starting point of the first synthesized speech signal to the end of the first synthesized speech signal, the gain of the first processing device gradually decreases from the specific value to zero; the length of the overlap period equals the rear consonant length; and the second conversion function is a decreasing function; and
superimposing the part of the second synthesized speech signal processed by the first function and the part of the first synthesized speech signal processed by the second function, so that the first synthesized speech signal within the overlap period and the second synthesized speech signal within the overlap period are added, whereby the first synthesized speech signal, the part of the second synthesized speech signal processed by the first function and the part of the first synthesized speech signal processed by the second function that are superimposed within the overlap period at its rear section, and the second synthesized speech signal after the overlap period are combined, and the pronunciation between the character sounds of the uttered synthesized speech is smoothed.

8. The smoothing method for sound synthesis of claim 7, wherein the first conversion function is an increasing step function.

9. The smoothing method for sound synthesis of claim 7, wherein the first conversion function is an increasing ramp function.

10. The smoothing method for sound synthesis of claim 7, wherein the second conversion function is a decreasing step function.

11. The smoothing device for sound synthesis of claim 7, wherein the second conversion function is a decreasing ramp function.

12. A smoothing method for sound synthesis, which smooths the pronunciation between the character sounds of the synthesized speech that is uttered when synthesized speech signals are read out, the smoothing method for sound synthesis comprising at least:
detecting the consonant sounding time length of each synthesized speech signal output in sequence, while delaying each synthesized speech signal for a specific period of time before outputting it;
multiplying a first synthesized speech signal and a second synthesized speech signal among the synthesized speech signals by a first conversion function, wherein, within an overlap period that begins at the starting point of the second synthesized speech signal, the gain of the conversion characteristic curve of the first conversion function gradually increases from zero to a specific value, remains at the specific value, and returns to zero only after the second synthesized speech signal ends; the length of the overlap period equals the consonant sounding time length of the second synthesized speech signal, which is the later input of the consecutive first and second synthesized speech signals; and the first conversion function is an increasing step function;
multiplying the first synthesized speech signal and the second synthesized speech signal among the synthesized speech signals by a second conversion function, wherein the conversion characteristic curve of the second conversion function corresponds to the first synthesized speech signal; within the overlap period that runs from the attenuation starting point of the first synthesized speech signal to the end of the first synthesized speech signal, the gain of the first processing device gradually decreases from the specific value to zero; the length of the overlap period equals the rear consonant length; and the second conversion function is a decreasing step function; and
superimposing the part of the second synthesized speech signal processed by the first function and the part of the first synthesized speech signal processed by the second function, so that the first synthesized speech signal within the overlap period and the second synthesized speech signal within the overlap period are added, whereby the first synthesized speech signal, the part of the second synthesized speech signal processed by the first function and the part of the first synthesized speech signal processed by the second function that are superimposed within the overlap period at its rear section, and the second synthesized speech signal after the overlap period are combined, and the pronunciation between the character sounds of the uttered synthesized speech is smoothed.
TW89110181A 2000-05-25 2000-05-25 Smoothening apparatus and method for quick synthesized voice TW476060B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW89110181A TW476060B (en) 2000-05-25 2000-05-25 Smoothening apparatus and method for quick synthesized voice

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW89110181A TW476060B (en) 2000-05-25 2000-05-25 Smoothening apparatus and method for quick synthesized voice

Publications (1)

Publication Number Publication Date
TW476060B true TW476060B (en) 2002-02-11

Family

ID=21659858

Family Applications (1)

Application Number Title Priority Date Filing Date
TW89110181A TW476060B (en) 2000-05-25 2000-05-25 Smoothening apparatus and method for quick synthesized voice

Country Status (1)

Country Link
TW (1) TW476060B (en)

Similar Documents

Publication Publication Date Title
US11443733B2 (en) Contextual text-to-speech processing
EP3387646B1 (en) Text-to-speech processing system and method
US8224645B2 (en) Method and system for preselection of suitable units for concatenative speech
US7565291B2 (en) Synthesis-based pre-selection of suitable units for concatenative speech
TW466470B (en) Identification of unit overlap regions for concatenative speech synthesis system
US20130041669A1 (en) Speech output with confidence indication
JP2001034283A (en) Voice synthesizing method, voice synthesizer and computer readable medium recorded with voice synthesis program
CA2145298A1 (en) Method and apparatus for speech synthesis
JPH08512150A (en) Method and apparatus for converting text into audible signals using neural networks
CN106710585A (en) Method and system for broadcasting polyphonic characters in voice interaction process
Aida-Zade et al. The main principles of text-to-speech synthesis system
CN115273776A (en) End-to-end singing voice synthesis method, computer equipment and storage medium
TW476060B (en) Smoothening apparatus and method for quick synthesized voice
JPH03273280A (en) Voice synthesizing system for vocal exercise
TW508564B (en) Method and system for phonetic recognition
JPH0962286A (en) Speech synthesizer and speech synthesis method
JP6631186B2 (en) Speech creation device, method and program, speech database creation device
TW470927B (en) Device and method for smoothening synthesized voice speech
KR0134707B1 (en) LSP Speech Synthesis Method Using Diphone Unit
JP3270668B2 (en) Prosody synthesizer based on artificial neural network from text to speech
Xie et al. FireRedTTS: The Xiaohongshu Speech Synthesis System for Blizzard Challenge 2023.
Dessai et al. Development of Konkani TTS system using concatenative synthesis
JPH04270394A (en) Pause length determining system
JP2680643B2 (en) Character display method of rule synthesizer
JP2819904B2 (en) Continuous speech recognition device

Legal Events

Date Code Title Description
GD4A Issue of patent certificate for granted invention patent
MM4A Annulment or lapse of patent due to non-payment of fees