JPS588024B2

JPS588024B2 - Detection and cutting device for characters with ruby

Info

Publication number: JPS588024B2
Application number: JP53127855A
Authority: JP
Inventors: 坂井邦夫; 須田正人; 平井彰一
Original assignee: Agency of Industrial Science and Technology
Current assignee: National Institute of Advanced Industrial Science and Technology AIST
Priority date: 1978-10-19
Filing date: 1978-10-19
Publication date: 1983-02-14
Also published as: JPS5556257A

Description

DETAILED DESCRIPTION OF THE INVENTION

The present invention relates to a character detection and cutting device, and more particularly to a detection and cutting device for characters with ruby that separates the ruby from the characters to which it is attached.

Conventionally, the Japanese text handled by kanji OCR contained no ruby, so it was enough to separate the text line by line and character by character. When general Japanese text such as novels and reference books is to be read, however, ruby is often attached to some of the characters, so the characters and the ruby must be separated in order to read the characters under the best conditions.

Moreover, Japanese text that includes characters with ruby can be difficult to handle with conventional detection and cutting methods.

For example, of the ruby "いなか" (inaka) attached to "田舎", the "な" lies midway between the characters "田" and "舎".

Consequently, the two characters cannot be separated by detecting a blank space between "田" and "舎".

The present invention has been made in view of these circumstances, and its object is to provide a detection and cutting device for characters with ruby that solves the above problem.

According to this invention, the device comprises: a photoelectric conversion circuit that optically scans a form on which Japanese text including characters with ruby is printed; a quantization circuit that quantizes the electrical signal obtained by the photoelectric conversion circuit; a one-line character buffer circuit that stores the quantized signal corresponding to one line of text on the form; a line character end detection circuit that detects the position of the end, on the side to which no ruby is attached, of the one line of text held in the one-line character buffer circuit; a character separation circuit that detects character separation positions within a range of predetermined width from the end detected by the line character end detection circuit; a storage circuit that stores the character separation positions detected by the character separation circuit; and a ruby separation circuit that separates characters from ruby according to the character separation positions stored in the storage circuit.

FIG. 1 is a block diagram showing one embodiment of the present invention.

Reference numeral 1 denotes a form on which Japanese text including characters with ruby is printed, and 2 denotes a character string forming one line of that text.

The form 1 is optically scanned by the photoelectric conversion circuit 3, and the character string 2 is converted into an electrical signal.

The electrical signal obtained from the photoelectric conversion circuit 3 is supplied to the binarization circuit 4 and quantized into a binary signal.

Of this binary signal, the portion corresponding to a character string of one line is stored in the buffer circuit 6 via the line separation circuit 5.

That is, the buffer circuit 6 has the capacity to store a character string of one line.

FIGS. 2a to 2d are diagrams showing the input directions of the form 1.

FIGS. 2a and 2b show the case where the form 1 is printed in horizontal writing, and FIGS. 2c and 2d show the case where it is printed in vertical writing.

FIGS. 2b and 2d show the case where the form 1 is input in the reverse direction (from the end of the text).

In every case the form 1 is drawn with the direction of the arrow as the top, and the photoelectric conversion circuit 3 is shown as scanning rightward, starting from the upper left corner.

If the input direction of the form 1 is left unrestricted in this way, then when, for example, a form printed in horizontal writing is input in the reverse direction as in FIG. 2b, the buffer circuit 6 ends up holding a reversed character string as shown in FIG. 3a.

Converting such a reversed character string into a normal character string as shown in FIG. 3b is called address conversion.

Returning to FIG. 1, reference numeral 7 denotes a format designation circuit, which indicates whether the input direction of the form 1 is normal or reversed.

This indication can be given, for example, by an operator who looks at the input direction of the form 1 and operates a switch (not shown), or by detecting the orientation of a reference mark provided in advance at a suitable place on the form 1.

If the format designation circuit 7 indicates the reversed direction, the contents of the buffer circuit 6 are converted to the normal orientation via the address conversion circuit 8.

The address conversion circuit 8 can be realized, for example, by providing another buffer circuit (not shown) capable of holding a character string of one line, and storing in it, in reverse order, the binary signals taken out sequentially from the buffer circuit 6.

Next, the contents of the buffer circuit 6, or the output of the address conversion circuit 8, are supplied to the line bottom detection circuit 9, which detects the line bottom of the character string.

Here the line bottom is on the side of the character string to which no ruby is attached, and denotes the position of the lowest scanning line that contains a character signal (black) when the character string is scanned along its arrangement direction.

As shown in FIG. 4a, the line bottom detection circuit 9 finds this position by scanning the character string 20 with scanning lines 21 from bottom to top and detecting the ordinal number of the first scanning line in which a character signal is obtained.
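The bottom-to-top scan can be written as a few lines of code. The sketch below assumes the one-line image is a list of rows of 0/1 values with row 0 at the top; the function name `find_line_bottom` and this layout are assumptions made for illustration.

```python
def find_line_bottom(line_image):
    """Return the index of the lowest row containing black (1), or None."""
    # Scan the rows from the bottom of the image toward the top.
    for row_index in range(len(line_image) - 1, -1, -1):
        if any(line_image[row_index]):
            return row_index  # first scanning line with a character signal
    return None  # the whole line image is white


line_image = [
    [0, 1, 0, 0],  # ruby side (top)
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [1, 0, 0, 1],  # lowest row with black: the line bottom
    [0, 0, 0, 0],
]
bottom = find_line_bottom(line_image)
```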

This information is then supplied to the character separation circuit 10, where it is used to separate the one-line character string into individual characters.

As shown in FIG. 4b, the character separation circuit 10 takes the line bottom of the character string 20 as a reference and scans a predetermined scanning range, represented by the distance y, in the vertical direction, one scanning line 22 after another.

The distance y can be set in advance to a suitable value according to the size of the characters printed on the form 1.

This vertical scanning yields projection data for each character, and the positions b1, b2, ..., bi, ..., bn lying at a predetermined width from the center of each projection (the character separation positions) are obtained.

This character separation position data is stored in the storage circuit 11 in FIG. 1.

Once the character separation position data is known, each character can easily be cut out one at a time.

For example, if b2 and b3 are specified, the binary signal between them consists only of the character signal representing "成".

The ruby separation circuit 12 takes two consecutive character separation positions bi and bi+1 from the storage circuit 11 in this way, one pair after another, and scans the range they define in the line direction.

That is, as shown in FIG. 4c, it scans upward from the line bottom, one scanning line 23 after another.

Scanning ends when a scanning line 23 contains no character signal (black) at all and only white is detected. In this way only the character, without its ruby, is cut out, and the ruby separation result shown in FIG. 4d is obtained.

The separated character obtained by the ruby separation circuit is supplied via the character address conversion circuit 13 to the recognition section 14, where it is recognized.

The character address conversion circuit 13 performs address conversion of the separated character according to the designation of the format designation circuit 15.

This address conversion rotates by 90 degrees, as shown in FIG. 5b, a character printed in vertical writing as shown in FIG. 5a (the case of FIG. 2c or 2d).

That is, when the format designation circuit 7 indicates that the form 1 is in vertical writing, the character address conversion circuit 13 performs the above processing; when the form is in horizontal writing, it outputs the separated character unchanged.
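A 90-degree rotation of a separated character is simple to express in software. The sketch below is illustrative only: the text does not state the direction of rotation, so clockwise is chosen here arbitrarily, and the names `rotate_90_clockwise` and `character_address_convert` are hypothetical.

```python
def rotate_90_clockwise(char_image):
    """Rotate a character bitmap (list of rows) by 90 degrees clockwise."""
    # Reversing the row order and transposing gives a clockwise rotation.
    return [list(column) for column in zip(*char_image[::-1])]


def character_address_convert(char_image, vertical):
    """Rotate characters from vertical-writing forms; pass others through."""
    if vertical:
        return rotate_90_clockwise(char_image)  # direction is an assumption
    return [list(row) for row in char_image]


glyph = [
    [1, 0],
    [0, 0],
]
rotated = character_address_convert(glyph, vertical=True)
unchanged = character_address_convert(glyph, vertical=False)
```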

FIG. 6 is a diagram showing one example of the configuration of the main part of the embodiment shown in FIG. 1.

When a character string of one line has been stored in the buffer circuit 6, binary signals are taken out sequentially in synchronization with the signal CP1 output by the scanning circuit 30 and supplied to the buffer circuit 33 via the one-line black/white determination circuit 31 and the gate 32.

In what follows, the binary signal is "1" for a character signal (black) and "0" for blank (white).

The one-line black/white determination circuit 31 outputs an output signal ("1") at terminal A when even one black element is detected within a single scanning line, indicated by reference numeral 21 in FIG. 4a, and outputs an output signal at terminal B when the scanning line is entirely white.

The output signal at terminal A sets the flip-flop 35 via the gate 34.

Therefore, once scanning of the contents of the buffer circuit 6 has started and black is detected, the gate 32 is opened, and from then on the contents of the buffer circuit 6 are stored sequentially in the buffer circuit 33.

At this point the contents of the buffer circuit 33 contain no information below the line bottom.

In FIG. 6, the signal E1 output by the scanning circuit 30 is the synchronization signal for each scanning line 23, as shown in FIG. 4a.

Meanwhile, the counting circuit 36 counts the number of scanning lines that contain black and outputs an output signal ("1") when the count reaches a predetermined number, for example five or more.

Suppose now that the count of the counting circuit 36 is 4 and the next scanning line is entirely white. An output signal then appears at terminal B of the one-line black/white determination circuit 31, and the gate 37 is opened.

The output signal of the gate 37 resets the flip-flop 35 and the buffer circuit 33.

The purpose is that, when a line is actually blank but black has been detected because of dirt or the like, the contents held in the buffer circuit 6 are treated as blank and processing moves on to the next step.
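The gating and dirt rejection just described can be sketched as follows. This is one reading of the text, not a definitive model of the circuit: it assumes that a reset means the whole line is treated as blank, that the counter counts every scanning line containing black, and that rows are copied starting from the line bottom. The names `extract_from_line_bottom` and `min_black_lines` are hypothetical; the default of 5 follows the example value in the text.

```python
def extract_from_line_bottom(line_image, min_black_lines=5):
    """Copy rows from the line bottom upward; return None for a blank line.

    line_image is a list of rows of 0/1 values with row 0 at the top.
    The returned list is ordered bottom first (index 0 is the line bottom).
    """
    copied = []        # plays the role of buffer circuit 33
    gate_open = False  # plays the role of flip-flop 35 and gate 32
    black_count = 0    # plays the role of counting circuit 36
    for row in reversed(line_image):
        if any(row):
            gate_open = True
            black_count += 1
        elif gate_open and black_count < min_black_lines:
            # A white line arrived before enough black lines were seen:
            # treat the earlier black as dirt and the line as blank.
            return None
        if gate_open:
            copied.append(list(row))
    return copied if gate_open else None


text_line = [[0, 0]] + [[1, 0]] * 6 + [[0, 0]]
speck = [[0, 0], [1, 0], [0, 0]]
rows = extract_from_line_bottom(text_line)
blank = extract_from_line_bottom(speck)
```

Because the returned rows start at the line bottom, later stages can index them directly by height above the base of the characters.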

Now, since the buffer circuit 33 holds data with the line bottom of the character string as its base, the scanning circuit 38 outputs, via the gate 39, the signal CP2 corresponding to the scanning lines 22 of width y shown in FIG. 4b, and the data in the buffer circuit 33 is supplied sequentially to the projection circuit 40.

The projection circuit 40 outputs the signal "1" if a scanning line contains black.

The output signal of the projection circuit 40 is supplied to the character separation position calculation circuit 41 together with the signal E2, output by the scanning circuit 38, which defines the number of scanning lines 22.

The character separation position calculation circuit 41 counts the signal E2 and calculates the center position A of each character from the count values a and b of the signal E2 at the moments when the output signal of the projection circuit 40 changes from "1" to "0" and from "0" to "1".

As shown in FIG. 7, the position obtained by adding a predetermined value ΔB to this center position A is taken as the character separation position bi.

In FIG. 7, reference numeral 24 denotes the output signal of the projection circuit 40.

The character separation position data obtained in this way is stored sequentially in the storage circuit 42.
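The calculation of the separation positions can be illustrated with a short sketch. The text does not give the formula for the center position A, so the midpoint of the first and last black columns of each run is assumed here; the name `separation_positions` and the parameter `delta_b` (standing for ΔB) are hypothetical.

```python
def separation_positions(projection, delta_b):
    """Return one separation position (center + delta_b) per black run."""
    positions = []
    start = None  # count value at the "0" -> "1" transition
    for count, value in enumerate(projection):
        if value == 1 and start is None:
            start = count
        elif value == 0 and start is not None:
            # "1" -> "0" transition: the run covered start .. count - 1.
            center = (start + count - 1) / 2  # assumed midpoint formula
            positions.append(center + delta_b)
            start = None
    if start is not None:
        # The run reaches the end of the projection.
        center = (start + len(projection) - 1) / 2
        positions.append(center + delta_b)
    return positions


projection = [0, 1, 1, 1, 0, 0, 1, 1, 1, 0]
positions = separation_positions(projection, delta_b=2.5)
```

With ΔB set to about half the character pitch, each position falls in the gap after its character.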

Next, two consecutive character separation positions b1 and b2 in the storage circuit 42 are set in the start position register 43 and the end position register 44, respectively.

The registers 43 and 44 are supplied with the synchronization signal CP3 from the scanning circuit 45, and each generates an output signal when the number of CP3 pulses supplied matches b1 or b2, respectively.

The output of the register 43 sets the flip-flop 46 and the output of the register 44 resets it, so that the output signal of the flip-flop 46 gives the mask signal indicated by reference numeral 25 in FIG. 4c.
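The way the two registers and the flip-flop produce a mask can be simulated in a few lines. The sketch assumes pulses are counted from 1 and that set and reset take effect on the matching pulse itself; the name `mask_signal` and these timing details are assumptions, since the text does not specify them.

```python
def mask_signal(b1, b2, total_pulses):
    """Return the flip-flop output for each CP3 pulse (1 = inside the mask)."""
    mask = []
    flip_flop = 0
    for pulse in range(1, total_pulses + 1):
        if pulse == b1:
            flip_flop = 1  # start position register matches: set
        if pulse == b2:
            flip_flop = 0  # end position register matches: reset
        mask.append(flip_flop)
    return mask


mask = mask_signal(b1=2, b2=5, total_pulses=6)
```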

During the period defined by this mask signal, CP3 scans the buffer circuit 33 via the gates 47 and 39 and also drives the white detection circuit 48 and the character buffer circuit 49.

That is, the data taken out sequentially from the buffer circuit 33 is stored sequentially in the character buffer circuit 49.

Meanwhile, the mask signal opens the gate 50, and the signal E3 from the scanning circuit 45, which corresponds to the number of scanning lines 23, is supplied to the scanning line counting circuit 51, which counts the number of scanning lines 23.

The scanning line counting circuit 51 sets the flip-flop 52 when the count reaches a predetermined value H and resets it when the count reaches H+Δ.

That is, the gate 53 is open while the count is between H and H+Δ.

If, during this interval, the white detection circuit 48 detects that a scanning line 23 shown in FIG. 4c is entirely white, storage of data in the character buffer circuit 49 is stopped via the gates 53 and 54.

Storage is stopped in the same way if the white detection circuit 48 produces no output signal between H and H+Δ.

Consequently, only the data corresponding to the scanning lines C1 through Cm shown in FIG. 4c is stored in the character buffer circuit 49, and the ruby is separated.
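The windowed stop condition can be sketched as below. This is an illustration under stated assumptions rather than a model of the actual circuit: rows are ordered bottom first, scanning lines are counted from 1, the all-white line that triggers the stop is not stored, and the line at count H+Δ is stored before stopping. The names `cut_character`, `h` and `delta` are hypothetical.

```python
def cut_character(rows_from_bottom, b_start, b_end, h, delta):
    """Collect the rows of one character cell, stopping before the ruby."""
    character_rows = []  # plays the role of character buffer circuit 49
    for line_number, row in enumerate(rows_from_bottom, start=1):
        # Mask signal: keep only the columns from b_start up to b_end - 1.
        segment = list(row[b_start:b_end])
        in_window = h <= line_number <= h + delta
        if in_window and not any(segment):
            break  # all-white line inside the window: the character has ended
        character_rows.append(segment)
        if line_number >= h + delta:
            break  # no white line found inside the window: stop anyway
    return character_rows


# Four rows of character, one white gap, then two rows of ruby.
cell = [[1, 1, 1]] * 4 + [[0, 0, 0]] + [[0, 1, 0]] * 2
character = cut_character(cell, b_start=0, b_end=3, h=4, delta=2)
```

Checking for white only inside the window from H to H+Δ means a white gap low in the character, as between the strokes of "二", does not end the cut early.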

As described above, this invention makes it possible to separate the ruby from the character in a character with ruby, which greatly widens the range of characters that a character reading device can read.

It also makes it possible to read characters with ruby without changing the configuration of the recognition section in any way.

The above embodiment is configured to remove the ruby and extract only the characters, but the ruby portion can also be detected, cut out, and supplied to the recognition section.

With such a configuration, recognizing the ruby can, for example, greatly reduce the recognition processing needed for the corresponding character.

It goes without saying that the invention can be practiced with various other modifications.

[Brief explanation of drawings]

FIG. 1 is a diagram showing one embodiment of the present invention; FIGS. 2a, 2b, 2c, 2d, 3a, 3b, 4a, 4b, 4c, 4d, 5a, 5b and 7 are diagrams for explaining the operation of the embodiment; and FIG. 6 is a configuration diagram of the main part of the embodiment.

1: form; 3: photoelectric conversion circuit; 6: buffer circuit; 9: line bottom detection circuit; 10: character separation circuit; 11: storage circuit; 12: ruby separation circuit.

Claims

[Claims]

1. A detection and cutting device for characters with ruby, characterized by comprising: a photoelectric conversion circuit that optically scans a form on which Japanese text including characters with ruby is printed and converts it into an electrical signal; a quantization circuit that quantizes the electrical signal obtained from the photoelectric conversion circuit; a buffer circuit that holds, of the quantized signal obtained from the quantization circuit, the quantized signal corresponding to a character string of one line of the Japanese text; means for detecting the position of the bottom of the character string in the buffer circuit; a projection circuit that, taking the bottom position detected by this means as a reference, obtains the projection of the quantized signal within a predetermined width in the direction perpendicular to the arrangement direction of the character string; a separation position calculation circuit that calculates the separation positions of the character string from the projection result; and a ruby separation circuit that sequentially examines the quantized signal within each range delimited by the separation positions and separates the ruby from the character of a character with ruby.