JPH0344312B2

Info

Publication number: JPH0344312B2
Application number: JP59115967A
Authority: JP
Inventors: Satoshi Endo; Takaaki Furuta; Masaharu Morita; Eiji Minami
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1984-06-06
Filing date: 1984-06-06
Publication date: 1991-07-05
Also published as: JPS60260082A

Description

[Detailed Description of the Invention]

Industrial Field of Application

The present invention relates to a scoring device that is used together with an audio signal recording and reproducing apparatus such as a so-called karaoke machine, and that automatically scores a user's singing ability by comparing the audio signal of the user's singing with a reference audio signal reproduced from a magnetic tape or the like.

Configuration of the Prior Art and Its Problems

One category of audio equipment, commonly called a "karaoke machine", reproduces and amplifies instrumental music recorded on a recording medium such as a magnetic tape; when the user sings along, the user's voice is mixed with the music signal and amplified. Such machines are in wide use in homes and in commercial settings.

Singing with such a karaoke machine gives users enjoyment and satisfaction, and in recent years a growing number of people wish to improve their singing. Some take lessons from a singing teacher, but that is not an option for everyone. As a means of studying singing alone, audio-multiplex recording media, such as the magnetic tapes called "audio multiplex tapes", have rapidly become widespread. Taking a magnetic tape as an example, as shown in Fig. 1, a vocal signal of a singer or the like is recorded on a first track 101 of the magnetic tape 1, and an instrumental music signal is recorded on a second track 102. Such a tape is used with an audio-multiplex karaoke machine configured as shown in Fig. 2. The vocal signal and the music signal recorded on the magnetic tape 1 are reproduced by a first magnetic tape reproducing means 2, consisting of a magnetic head 201 and an amplifier 202, and a second tape reproducing means 3, consisting of a magnetic head 301 and an amplifier 302. These two outputs, together with the output of a microphone input means consisting of a microphone 401 and an amplifier 402, are mixed and power-amplified by a mixing amplifier 5 and output from a speaker 6 as sound.

It is said that singing ability can be improved by listening to the vocal signal recorded on the medium with the above apparatus and practicing singing along with it. However, no matter how much users practice, they cannot tell how close their own singing has come to the model vocal signal, that is, how much their singing ability has improved. Users may also sing incorrectly without ever noticing. Solitary practice therefore has inherent limits, and users often lose interest and the motivation to practice.

Object of the Invention

The present invention solves the above problems of the prior art. Its object is to provide an objective means of evaluating a user's singing ability by comparing the vocal signal recorded on an audio-multiplex recording medium or the like with the audio signal of the user's singing, and calculating and displaying the degree of match as a score.

Structure of the Invention

The scoring device of the present invention comprises: first waveform conversion means for converting an input first audio signal into a pulse signal; first pitch detection means for detecting the fundamental frequency of the first audio signal from the output pulse signal of the first waveform conversion means; first pitch storage means for storing a data group of fundamental frequencies of the first audio signal, output by the first pitch detection means, covering a certain time span; second waveform conversion means for converting an input second audio signal into a pulse signal; second pitch detection means for detecting the fundamental frequency of the second audio signal from the output pulse signal of the second waveform conversion means; level change detection means for detecting a change in the signal level of the second audio signal; second pitch storage means for storing the fundamental frequency of the second audio signal, output by the second pitch detection means, at the time the level change detection means detects that the level of the second audio signal has changed; and score calculation means that collates the fundamental frequency of the second audio signal stored in the second pitch storage means against the data group stored in the first pitch storage means, selects from that data group the fundamental frequency of the first audio signal closest to the fundamental frequency of the second audio signal, compares the selected fundamental frequency with the fundamental frequency of the second audio signal stored in the second pitch storage means, and calculates as a score the degree to which the first audio signal matches the second audio signal.

With this structure, when the audio signal of the user's singing is used as the first audio signal and the reproduced vocal signal recorded on a recording medium serving as the model for the song is used as the second audio signal, the degree to which the user's singing matches the reproduced vocal signal is displayed as a score, so users can recognize the level of their singing ability relative to the vocal signal on the recording medium.

Description of the Embodiment

Fig. 3 is a block diagram showing an embodiment of the present invention. Reference numeral 4 denotes microphone input means that converts the user's singing voice into an electrical signal and amplifies it; 401 is a microphone and 402 is an amplifier. 2 denotes first magnetic tape reproducing means that reproduces the vocal signal recorded on the audio-multiplex recording medium; 201 is a magnetic head and 202 is an amplifier. 7 denotes first waveform conversion means, which converts the signal of the user's singing into a pulse signal. 8 denotes second waveform conversion means, which converts the vocal signal from the recording medium into a pulse signal. 9 denotes first pitch detection means, which detects the fundamental frequency of the user's singing. 10 denotes second pitch detection means, which detects the fundamental frequency of the vocal signal. 11 denotes first pitch storage means, which stores a data group of fundamental frequencies of the user's singing covering a certain time span. 16 denotes level change detection means, which detects changes in the level of the vocal signal. 12 denotes second pitch storage means, which stores the fundamental frequency of the vocal signal at the time its level changes. 13 denotes score calculation means, which compares the data group of fundamental frequencies of the user's singing, covering a certain time span around the time the vocal signal's level changed, with the fundamental frequency of the vocal signal at that time, and calculates as a score the degree to which the user's singing matches the vocal signal. 14 denotes score display means, which displays the score calculated by the score calculation means 13 to inform the user.

Fig. 4 is a block diagram showing a specific configuration of this embodiment, in which a microcomputer 15 implements the following functions: pitch detection for the user's singing, pitch detection for the vocal signal, storage of the data group of fundamental frequencies of the user's singing, storage of the fundamental frequency of the vocal signal, and score calculation.

Fig. 5 shows a practical circuit example of the first waveform conversion means 7. Because the first waveform conversion means 7 and the second waveform conversion means 8 usually use the same circuit, the circuit of the first waveform conversion means 7 is described as representative, with reference to the operation diagram of Fig. 6.

701 is an input terminal; 702, 704, 705, 708, 710, and 711 are resistors; 703, 706, and 709 are capacitors; 707 is an operational amplifier (hereinafter "op-amp"); 712 is a transistor; and 713 is an output terminal.

The op-amp 707, the resistors 702, 704, and 705, and the capacitors 703 and 706 form a low-pass active filter. It removes the high-frequency components of the audio signal applied to the input terminal 701, shown in Fig. 6a, and at the same time provides the necessary signal amplification through the gain of the op-amp 707. A time-constant circuit formed by the resistor 708 and the capacitor 709 additionally removes high-frequency components that the active filter does not remove sufficiently. The audio signal thus stripped of the required amount of high-frequency content, shown in Fig. 6b, is converted by the resistors 710 and 711 and the transistor 712 into a pulse waveform as shown in Fig. 6c. In this way the first waveform conversion means 7 converts the audio signal of the user's singing, which is the output of the microphone input means 4, into a pulse waveform, and the second waveform conversion means 8 likewise converts the vocal signal, which is the output of the first magnetic tape reproducing means 2, into a pulse waveform.
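The conversion chain described above (low-pass filtering followed by a switching stage) can be imitated in software. The following is a minimal sketch, not the circuit of Fig. 5: it assumes a sampled signal, a one-pole digital low-pass filter standing in for the active filter, and a fixed threshold standing in for the transistor stage. The names `low_pass`, `to_pulses`, `alpha`, and `threshold`, and the test tone, are illustrative assumptions.

```python
import math

def low_pass(samples, alpha=0.1):
    """One-pole low-pass filter (assumed stand-in for the active filter)."""
    out = []
    y = 0.0
    for x in samples:
        y += alpha * (x - y)  # move a fraction alpha toward the new sample
        out.append(y)
    return out

def to_pulses(samples, threshold=0.0):
    """Return 1 ("H") where the signal exceeds the threshold, else 0."""
    return [1 if s > threshold else 0 for s in samples]

# Assumed test input: a 100 Hz tone plus a weaker 2 kHz component,
# sampled at 8 kHz for 0.1 s.
fs = 8000
signal = [math.sin(2 * math.pi * 100 * n / fs)
          + 0.3 * math.sin(2 * math.pi * 2000 * n / fs)
          for n in range(fs // 10)]
pulses = to_pulses(low_pass(signal))
```

Filtering before thresholding matters because a residual high-frequency component near a zero crossing would otherwise split one pulse into several short ones.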

The level change detection means 16 can be realized by comparing the current level with the previous level, using an analog-to-digital converter or a sample-and-hold circuit based on conventional analog or digital techniques.
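The comparison of the current level with the previous level can be sketched as follows. This is only an illustration under stated assumptions: the levels are taken to be already digitized, and the function name `detect_level_increase` and the threshold `min_rise` are hypothetical; the patent does not specify how large a rise counts as a change.

```python
def detect_level_increase(levels, min_rise=0.2):
    """Return indices where the level exceeds the previous level by at least min_rise."""
    onsets = []
    for i in range(1, len(levels)):
        if levels[i] - levels[i - 1] >= min_rise:
            onsets.append(i)
    return onsets

# Two sharp rises, at index 2 and index 5
print(detect_level_increase([0.0, 0.05, 0.6, 0.62, 0.1, 0.7]))  # [2, 5]
```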

The operation of this embodiment is described below with reference to the flowchart of Fig. 7, which shows the main part of the processing performed by the microcomputer.

First, assume that the device is powered on and that the memory elements and the like inside the microcomputer 15 have been initialized. The user's singing is converted into an electrical audio signal and amplified by the microphone input means 4, converted into a pulse signal by the first waveform conversion means 7, and input to the microcomputer 15. In step 17 the time width of the input pulse is converted into a digital quantity and stored. Specifically, the conversion is achieved by counting, with the microcomputer's own clock signal, the period during which the pulse signal shown in Fig. 6c is "H". Conversion proceeds in order: the width from t₁ to t₂, then from t₃ to t₄, then from t₅ to t₆, and so on in Fig. 6c. This time width is inversely proportional to the fundamental frequency of the voice: an increase indicates that the pitch has fallen, and a decrease indicates that it has risen.
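Counting clock ticks while the pulse is "H" can be sketched as below. This is a hedged illustration, not the microcomputer's actual routine: it assumes the pulse signal is sampled once per clock tick, and the conversion to frequency assumes a 50 % duty cycle, so that the "H" width is half a period. The names `high_widths` and `width_to_frequency` are hypothetical.

```python
def high_widths(pulses):
    """Count consecutive 'H' samples (clock ticks) for each complete pulse."""
    widths = []
    count = 0
    for level in pulses:
        if level:
            count += 1
        elif count:
            widths.append(count)
            count = 0
    return widths  # a pulse still high at the end is dropped as incomplete

def width_to_frequency(ticks, clock_hz):
    """Assuming a 50 % duty cycle, the 'H' width is half a period."""
    return clock_hz / (2 * ticks)

print(high_widths([0, 1, 1, 1, 0, 0, 0, 1, 1, 0]))  # [3, 2]
print(width_to_frequency(5000, 1_000_000))          # 100.0
```

A longer width gives a lower frequency, which is the inverse relationship the text describes.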

Meanwhile, the vocal signal recorded on the magnetic tape 1, the audio-multiplex recording medium, is reproduced by the first magnetic tape reproducing means 2. In step 18 the output of the level change detection means 16 is checked to determine whether the signal level has increased. If it has, in step 19 the pulse time width of the output of the second waveform conversion means 8 input to the microcomputer 15, which represents one half of the reciprocal of the fundamental frequency of the voice, is converted into a digital quantity and stored.

Following step 19, steps 20 and 21 collect pulse-width data of the first audio signal for a fixed period T after step 19 was entered. Step 22 then takes the group of pulse-width data of the first audio signal covering the past 2T as seen from entry into step 22, that is, the pulse-width data of the user's singing from −T to +T relative to entry into step 19, and selects from it the value closest to the pulse width of the vocal signal obtained in step 19. Step 23 compares the pulse width of the vocal signal obtained in step 19 with the pulse width of the user's singing selected in step 22, calculates the degree of match, and calculates a score according to that degree.
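The selection in step 22 amounts to a nearest-value search over a time window of ±T around the moment the vocal level rose. The sketch below assumes the user's pulse widths are kept as `(time, width)` pairs; that representation, the function names `window` and `closest_width`, and the sample values are all illustrative assumptions.

```python
def window(samples, onset_time, T):
    """Keep widths of (time, width) samples with onset_time - T <= time <= onset_time + T."""
    return [w for t, w in samples if onset_time - T <= t <= onset_time + T]

def closest_width(user_widths, reference_width):
    """Pick the user pulse width nearest to the reference (vocal) pulse width."""
    return min(user_widths, key=lambda w: abs(w - reference_width))

# (time in s, pulse width in ms) for the user's singing; assumed example data
user = [(0.0, 12.0), (0.4, 10.4), (0.9, 9.5), (1.3, 9.9), (2.5, 10.0)]
candidates = window(user, onset_time=1.0, T=0.5)
print(candidates)                       # [9.5, 9.9]
print(closest_width(candidates, 10.0))  # 9.9
```

The sample at 2.5 s matches the reference exactly but lies outside the window, so it is not considered; the window is what limits how much timing offset is tolerated.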

An example of the score calculation in step 23 follows. If the pulse width of the first audio signal is 9.5 ms and that of the second audio signal is 10 ms, the degree of match is calculated as (10 − 9.5) / 10 × 100 = 5 %. Here a smaller percentage means a closer match. The points awarded depend on this value: for example, 100 points if it is within 3 %, 80 points for 3 to 5 %, 60 points for 5 to 7 %, and 0 points for 7 % or more. Let N be the number of times the degree of match is calculated within one song being scored, and let P_n (points) be the points awarded at the n-th calculation. The score P indicating the degree of match between the first and second audio signals can then be expressed, as one example, as

$$P = \frac{1}{N} \sum_{n=1}^{N} P_n \quad \text{(points)}$$

As an example of this calculation, suppose the degree of match is calculated 10 times within one song: 3 times it is within 3 %, 4 times it is 3 to 5 %, once it is 5 to 7 %, and twice it is 7 % or more. With the point allocation above, the score out of 100 is

$$P = \frac{3 \times 100 + 4 \times 80 + 1 \times 60 + 2 \times 0}{10} = \frac{680}{10} = 68 \quad \text{(points)}$$
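The scoring rule above can be written out as a short sketch. The band boundaries in the text overlap (3 %, 5 % and 7 % each appear in two bands), so the handling of exact boundary values below is an assumption, as are the function names.

```python
def deviation_percent(user_width, reference_width):
    """Relative difference of the two pulse widths, in percent."""
    return abs(reference_width - user_width) / reference_width * 100

def points_for(deviation):
    """Map a deviation to points; boundary handling is an assumption."""
    if deviation <= 3:
        return 100
    if deviation < 5:
        return 80
    if deviation < 7:
        return 60
    return 0

def final_score(points):
    """Average of the points awarded at each comparison (P = sum(P_n) / N)."""
    return sum(points) / len(points)

# The worked example from the text: 3 x 100, 4 x 80, 1 x 60, 2 x 0
print(final_score([100] * 3 + [80] * 4 + [60] + [0] * 2))  # 68.0
```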

Next, step 24 determines whether it is time to end scoring. This decision may be based on a push-button switch (not shown) that designates the end of scoring, or on detecting the presence of the music signal recorded on the magnetic tape 1 and ending scoring when the music signal stops. It is also possible to record an end-of-song signal in advance and use the moment it is detected, or to use detection of the end of the tape.

If it is not yet time to end scoring, the process returns from step 24 to step 17 to collect the next time-width information, compare it, and calculate the score.

When it is time to end scoring, the process proceeds from step 24 to step 25, where the score is displayed.

The significance of collating and comparing the pulse widths of the two audio signals near the time the vocal signal's level increases is explained with reference to Fig. 8, which illustrates differences in singing style. Fig. 8(ア) is an example of the user's song input from the microphone, and Fig. 8(イ) is an example of a professional singer's vocal signal on the audio-multiplex medium. As Figs. 8a and 8b show, an amateur user tends to sing later than the professional singer's vocal signal. As Figs. 8b and 8e show, the user often cannot freely vary frequency and signal level within a single utterance, as the professional does with the techniques called kobushi (melodic ornamentation) and vibrato.

Consequently, in a case like the example of Fig. 8, scoring by continuously comparing the frequencies of the user's song and the vocal signal point by point is strongly affected by kobushi and vibrato. For example, after point G in Fig. 8f the frequency of the professional's vocal signal varies widely because of vibrato, whereas the user's song has no vibrato and stays at a nearly constant frequency, so this portion yields only a very low score. Kobushi and vibrato differ greatly from one professional singer to another, and even the same singer cannot produce them identically every time. If scoring compares the user's song and the vocal signal continuously, kobushi and vibrato included, the scores therefore vary widely.

Moreover, as the offset between the lyrics in Figs. 8a and 8c shows, in this example the user's song and the vocal signal differ in timing. Such timing offsets are common. If the frequencies were compared continuously point by point, then at the point where the vocal signal of Fig. 8e first begins, the user has not yet started singing, as Fig. 8b shows, so the two fundamental frequencies cannot be compared. Likewise, at the point where the vocal signal begins the syllable "mo" shown in Fig. 8d, the user's lyrics in Fig. 8a are still in the prolongation of the first syllable "ku", so the two fundamental frequencies do not match, and the user receives a low score even when singing at the correct pitch.

The present invention resolves these problems. In the example of Fig. 8, as shown in Fig. 8f, the fundamental frequency of the vocal signal is taken at point A, then point B, then point C, that is, near the moments when the vocal signal's level rises sharply, where kobushi and vibrato are absent. As shown in Fig. 8b, this is collated with the data group of fundamental frequencies of the user's song over a time T before and after the moment the level rose, for example over the period from point A⁻ to point A⁺ around point A. The user's fundamental-frequency datum closest to the vocal signal's fundamental frequency at point A is selected, and the two are compared. Thus, even if the user's song and the vocal signal are somewhat offset in timing, a good score is obtained as long as the user's data from A⁻ to A⁺ (Fig. 8b) contain a value that nearly matches the vocal signal's fundamental frequency at point A (Figs. 8e and 8f). The timing offset between the user's song and the professional's is absorbed, and no comparison is made in portions such as kobushi and vibrato where individual differences are large, so the scoring is more accurate and closer to the evaluation a human listener would give.

As described above, this embodiment compares information on the fundamental frequency of the vocal signal from a magnetic tape or the like, taken near the time its signal level changes, with data on the fundamental frequency of the user's singing over a certain period, and calculates and displays the degree of match as a score. It can therefore provide a more accurate, objective means of evaluating the user's singing ability that excludes individual differences in singing style such as kobushi, vibrato, and timing offsets.

In this embodiment the signal being scored is the user's singing and the scoring reference is the vocal signal on a magnetic tape, an audio-multiplex recording medium, but other signals may be used instead, such as instrument performance signals, simple sine-wave signals, or human speech.

Also, in this embodiment the waveform conversion means that converts the audio signal into a pulse signal uses a low-pass active filter and a transistor, but a circuit that converts the audio waveform directly into a digital-valued pulse signal with an analog-to-digital converter may be used instead.

Also, in this embodiment the pitch detection means, the score calculation means, and so on are implemented by a microcomputer, but they may of course be implemented with conventional general-purpose logic circuits or the like.

Also, in this embodiment separate waveform conversion means and pitch detection means are provided for the user's audio signal and for the vocal signal, but a single system may be used instead, processing the user's audio signal and the vocal signal by time division.

Also, in this embodiment the fundamental frequency of the audio signal, that is, the pitch, is detected by measuring every "H" width of the pulse signal output by the waveform conversion means: in Fig. 6c, the width from t₁ to t₂, then from t₃ to t₄, and so on. Alternatively, the widths may be measured intermittently, for example the width from t₁ to t₂ followed by the width from t₅ to t₆. The "H" widths of all or some of the pulses may also be examined over a fixed period sufficiently long compared with a single "H" width, and the pitch detected from the average width per pulse, the maximum width, or the like. Instead of the "H" width, the width of one full pulse cycle, from one rising edge of "H" to the next, may also be measured and processed.
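Two of the alternatives just listed, measuring the full period between rising edges and averaging over several pulses, can be sketched as follows. As before, this assumes the pulse signal is sampled once per clock tick; the function names are hypothetical.

```python
def rising_edge_periods(pulses):
    """Ticks between successive rising edges (one full pulse cycle each)."""
    edges = [i for i in range(1, len(pulses)) if pulses[i] and not pulses[i - 1]]
    return [b - a for a, b in zip(edges, edges[1:])]

def mean(values):
    """Average of a non-empty list of measurements."""
    return sum(values) / len(values)

pulses = [0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0]
periods = rising_edge_periods(pulses)
print(periods)        # [4, 5]
print(mean(periods))  # 4.5
```

Measuring the full period removes the dependence on duty cycle, and averaging trades time resolution for robustness against a single mis-measured pulse.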

Effects of the Invention

As described above, the present invention comprises two waveform conversion means that convert two audio signals into pulse signals; two pitch detection means that detect, from those outputs, information representing the fundamental frequencies of the two audio signals; two pitch storage means that store the fundamental frequencies; level change detection means that detects a change in the signal level of the audio signal serving as the scoring reference; and score calculation means that collates and compares the fundamental frequency of the reference audio signal at the time its signal level changes with the fundamental-frequency data, covering a certain period, of the audio signal to be scored. With this structure the degree of match between the two audio signals can be obtained as a score that excludes individual differences in singing style such as kobushi, vibrato, and timing offsets.

This gives people who practice singing with an audio-multiplexed recording medium an objective means of judging their own singing: the vocal signal recorded on the medium takes the role of a singing teacher, and the device reports how many points the user's singing earns when measured against that teacher's. The benefit for practice and similar uses is considerable.
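The comparison performed by the score calculation means can be sketched in Python as follows. For each moment at which the reference signal's level changes, the sketch looks at the user's fundamental-frequency data within a surrounding time window, selects the value closest to the reference pitch, and counts a match if the two are close enough. Several details here are assumptions for illustration and are not in the specification: the symmetric window `window_s`, the semitone-based distance, the `tolerance_semitones` threshold, and the 0 to 100 scale.

```python
import math


def semitone_distance(f_a, f_b):
    """Absolute pitch difference in semitones (assumed similarity measure)."""
    return abs(12.0 * math.log2(f_a / f_b))


def score(reference_events, user_samples, window_s=0.5, tolerance_semitones=1.0):
    """Score the user's singing against the reference vocal.

    reference_events: list of (time_s, f0_hz) pairs captured at the moments
        the reference signal's level changed.
    user_samples: list of (time_s, f0_hz) pairs of the user's detected pitch.
    Returns a value from 0 to 100 (hypothetical scale).
    """
    if not reference_events:
        return 0.0
    hits = 0
    for t_ref, f_ref in reference_events:
        # User pitch data for "a certain period" around the reference event.
        candidates = [
            f for t, f in user_samples
            if abs(t - t_ref) <= window_s and f > 0
        ]
        if not candidates:
            continue
        # Select the user pitch most similar to the reference pitch.
        best = min(candidates, key=lambda f: semitone_distance(f, f_ref))
        if semitone_distance(best, f_ref) <= tolerance_semitones:
            hits += 1
    return 100.0 * hits / len(reference_events)


# Hypothetical data: the first note is matched (445 Hz vs 440 Hz),
# the second is not (no user pitch near 330 Hz within the window).
reference_events = [(1.0, 440.0), (2.0, 330.0)]
user_samples = [(0.8, 400.0), (1.2, 445.0), (1.9, 500.0), (2.3, 510.0)]
result = score(reference_events, user_samples)
```

Selecting the best candidate within a window, rather than comparing pitches at the same instant, is what lets timing offsets, vibrato, and ornamental turns pass without penalty: the user only needs to reach the reference pitch at some point near the reference event.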

[Brief explanation of drawings]

FIG. 1 is an explanatory diagram of the audio-multiplexed tracks on a magnetic tape, which is one type of audio-multiplexed recording medium. FIG. 2 is a block diagram of a so-called audio-multiplex "karaoke device" that uses such a magnetic tape. FIG. 3 is a block diagram of the main part of one embodiment of the present invention. FIG. 4 is a block diagram showing the specific configuration of this embodiment. FIG. 5 is a circuit diagram showing the specific configuration of the first waveform conversion means of this embodiment. FIG. 6 is an operation diagram explaining the operation of the first waveform conversion means. FIG. 7 is a flowchart showing the main part of the processing performed by the microcomputer of this embodiment. FIG. 8 is an explanatory diagram of how a singing style changes over time.
7 ... first waveform conversion means, 8 ... second waveform conversion means, 9 ... first pitch detection means, 10 ... second pitch detection means, 11 ... first pitch storage means, 12 ... second pitch storage means, 13 ... score calculation means, 14 ... score display means, 16 ... level change detection means.

Claims

[Claims]

1. A scoring device comprising: a first waveform conversion means for converting an input first audio signal into a pulse signal; a first pitch detection means for detecting the fundamental frequency of the first audio signal based on the output pulse signal of the first waveform conversion means; a first pitch storage means for storing and holding, for a certain time width, a data group of the fundamental frequency of the first audio signal output by the first pitch detection means; a second waveform conversion means for converting an input second audio signal into a pulse signal; a second pitch detection means for detecting the fundamental frequency of the second audio signal based on the output pulse signal of the second waveform conversion means; a level change detection means for detecting a change in the signal level of the second audio signal; a second pitch storage means for storing and holding the fundamental frequency of the second audio signal, output by the second pitch detection means, at the point in time when the level change detection means detects a change in the level of the second audio signal; and a score calculation means for collating the fundamental frequency of the second audio signal stored in the second pitch storage means against the data group of the fundamental frequency of the first audio signal stored in the first pitch storage means, selecting from that data group the fundamental frequency of the first audio signal that is most similar to the fundamental frequency of the second audio signal, then comparing the selected fundamental frequency of the first audio signal with the fundamental frequency of the second audio signal stored in the second pitch storage means, and calculating a score according to the degree to which the first audio signal agrees with the second audio signal.