JP2004013680A

JP2004013680A - Character code compression/decompression device and method

Info

Publication number: JP2004013680A
Application number: JP2002168638A
Authority: JP
Inventors: Hiroyuki Obara; 小原　宏幸
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2002-06-10
Filing date: 2002-06-10
Publication date: 2004-01-15

Abstract

PROBLEM TO BE SOLVED: To obtain a character code compression/decompression device and method that reduce the overall storage capacity.

SOLUTION: A data processor 2 compression-converts the character codes of frequently used characters, generates delimiter information for the character codes, and combines the compression-conversion results with the delimiter information. A conversion table storing part 31 stores in advance the character code information corresponding to each bit string of characters, and a storage device 3 stores the character codes converted using the conversion table together with the delimiter information indicating the delimiter positions of the conversion result. The conversion table assigns characters to progressively shorter bit strings in order of appearance frequency, so that frequently used character codes occupy few bits, which reduces the overall storage capacity and increases the conversion efficiency of the character codes.

COPYRIGHT: (C)2004,JPO

Description

【０００１】
【発明の属する技術分野】
本発明は、文字コード圧縮・復元装置および同方法に関する。さらに詳述すると、本発明は、頻繁に使用される文字コードを短いビットに割り当てて全体の記憶容量を削減する文字コード圧縮・復元装置および同方法に関する。
【０００２】
【従来の技術】
従来、文字コード圧縮・復元装置および同方法は、たとえば、記憶装置に適用される。従来は、入力装置から得た文字コードをそのまま記憶装置に格納したり、出力装置から外部に流したりしている。
【０００３】
例えば、文字「ＡＢＣＤＥ」の場合について、より具体的な処理手順例を以下に説明する。本文字例を、通常のビット列（８ｂｉｔ　ＡＳＣＩＩ）で表した場合の、文字／ビット列、の関係は以下となる。
Ａ／０１０００００１、Ｂ／０１００００１０、Ｃ／０１００００１１、
Ｄ／０１０００１００、Ｅ／０１０００１０１
【０００４】
上記の従来例によれば、下記の特徴がある。
１）１文字を常に８ｂｉｔで表しているため、文字の区切りを示す必要がない。
２）必要なビット数は、８ｂｉｔ×５文字＝４０ｂｉｔ、である。
【０００５】
本発明と技術分野の類似する先願発明例１として、特開平１１−８５４５９号公報の「文字データ符号化方法および記録媒体」がある。本先願発明例１では、基本的にはＡＳＣＩＩコード表に従って、１文字を１つのコードに符号化するが、複数文字からなる特定のキーワードについては、制御文字用に用意された領域コードを用いて符号化する。即ち、文字数とその出現頻度とを考慮して、データ圧縮効果の高いものから順に選択された１６個のキーワードを、コード１０Ｈ〜１ｆＨに符号化する。これにより、より少ない記憶容量にて効率よく記憶媒体に格納可能なように文字データを符号化するとしている。
【０００６】
先願発明例２として、特開平１１−５５１２５号公報の「文字データの圧縮・復元方法」がある。本先願発明例２では、原データを判別して内部番号列に変換して符号化する。また、内部番号に変換された文字列が辞書に保持されていない場合には、２つのグループに分け、ひらがな、漢字等の文字を符号化し、グループを表すビットを付して出力する。これにより、小さいデータでも高い圧縮率を得ることができる、としている。
【０００７】
先願発明例３として、特開平９−６９７８５号公報の「データ圧縮方法及びデータ圧縮装置」がある。本先願発明例３では、特定文字のコードデータとその圧縮コードデータとを対応付けた対応一覧表を用意しておき、圧縮対象データから取り出した１文字分の文字コードデータがその対応一覧表に存在するか否かを検索する。その結果、当該データが対応する一覧表に存在する場合には、その圧縮コードデータを対応一覧表から読み出し、存在しない場合には、当該データをそのまま出力する。これにより、効果的なデータ圧縮を行うとしている。
【０００８】
【発明が解決しようとする課題】
しかしながら、従来の文字コード圧縮・復元装置および同方法では、入力装置から得た文字コードをそのまま記憶装置に格納している。このため、１文字当たり常に１バイト（８ビット）の容量を使用しており、文章全体に渡って無駄なビットが記憶容量の多くを占めているという問題を伴う。
【０００９】
本発明は、頻繁に使用される文字コードを短いビットに割り当て、全体の記憶容量を削減する文字コード圧縮・復元装置および同方法を提供することを目的とする。
【００１０】
【課題を解決するための手段】
かかる目的を達成するため、本発明の文字コード圧縮・復元装置は、入力された文字の文字コードを圧縮変換し、該文字コードの区切りの情報を生成し、圧縮変換結果と区切り情報とを結合するデータ処理装置と、文字の各ビット列に対応する文字コード情報を予め記憶している変換テーブルを使用して変換された文字コード、および該変換結果の区切り位置を示す区切りの情報を格納する記憶装置とを有し、文字の出現頻度順にビット数の少ない所に割り当てた変換テーブルを作成し、文字コードの変換効率を高めたことを特徴としている。
【００１１】
また、上記の記憶装置は、文字コードを変換する時に使用する変換テーブル記憶部と、圧縮変換した結果を格納する圧縮情報記憶部と、変換結果の区切り位置を示す区切り情報を格納する区切り情報記憶部とを備え、変換テーブル記憶部は、各アルファベットに対応するビット列の情報をあらかじめ記憶している記憶部とするとよい。
【００１２】
本発明の文字コード圧縮・復元方法は、入力された文字の文字コードを圧縮変換し、該文字コードの区切りの情報を生成し、圧縮変換結果と区切り情報とを結合するデータ処理工程と、文字の各ビット列に対応する文字コード情報を予め記憶している変換テーブルを使用して変換された文字コード、および該変換結果の区切り位置を示す区切りの情報を格納する記憶工程とを有し、文字の出現頻度順にビット数の少ない所に割り当てた変換テーブルを作成し、文字コードの変換効率を高めたことを特徴としている。
【００１３】
また、上記の記憶工程は、文字コードを変換する時に使用する変換テーブル記憶工程と、圧縮変換した結果を格納する圧縮情報記憶工程と、変換結果の区切り位置を示す区切り情報を格納する区切り情報記憶工程とを備え、変換テーブル記憶工程は、各アルファベットに対応するビット列の情報をあらかじめ記憶している記憶工程とするとよい。
【００１４】
【発明の実施の形態】
次に、添付図面を参照して本発明による文字コード圧縮・復元装置および同方法の実施の形態を詳細に説明する。図１から図８を参照すると、本発明の文字コード圧縮・復元装置および同方法の一実施形態が示されている。
【００１５】
本発明の文字コード圧縮・復元装置および同方法は、アルファベットとして使用されている文字コードが有しているビットサイズを減少することで、文章などで使用している文字列全体のビット数の削減を図り、全体として使用するデータの容量を圧縮するものである。また、可逆的な復元方式を用いることで、圧縮した情報から、元の文字コードへの復元動作も可能とする。
【００１６】
更に、１文字ずつ順番に圧縮・復元できる方式の為、先頭から１文字ずつ入力される場合や末尾の文字を１文字削除するなどと言った場合でも、その影響範囲を該当文字の部分だけに押さえ込むことを可能とする。本構成の内容を以下に詳述する。
【００１７】
（構成例）
図１および図２は、本発明の文字コード圧縮・復元装置および同方法の実施形態に適用される文字コード圧縮・復元装置の構成例を示すブロック図である。
図１を参照すると、文字コード圧縮装置であり、文字コード圧縮時に適用される機能部を示している。本機能部は、キーボードなどの入力装置１と、プログラム制御により動作するデータ処理装置２と、情報を記憶する記憶装置３と、情報を外部に取り出す為の出力装置４とを含む。
【００１８】
データ処理装置２は、入力装置１より入力された文字コードを圧縮する文字コード圧縮処理部２１と、文字コードの区切りの情報を生成する区切り情報生成処理部２２と、外部に出力する際に変換結果と区切り情報を結合する情報結合処理部２３とを備えている。
【００１９】
記憶装置３は、文字コードを変換する時に使用する変換テーブル記憶部３１と、圧縮変換した結果を格納する圧縮情報記憶部３２と、変換結果の区切り位置を示す区切り情報を格納する区切り情報記憶部３３とを備えている。変換テーブル記憶部３１は、各アルファベットに対応するビット列の情報を、あらかじめ記憶している。
【００２０】
次に、図２を参照すると、本実施例は、文字コードを復元する時のものであり、先の図１にて圧縮した情報を外部から入力する為の入力装置１と、プログラム制御により動作するデータ処理装置５と、情報を記憶する記憶装置６と、情報を外部に取り出す為の出力装置４とを含む。
【００２１】
記憶装置６は、文字コードの圧縮情報部分を記憶する圧縮情報記憶部６１と、区切り情報部分を記憶する区切り情報記憶部６２と、文字コードを復元する時に使用する変換テーブル記憶部６３とを備える。なお、この変換テーブル記憶部６３は、各ビット列に対応する文字コードの情報を、あらかじめ記憶している。
【００２２】
データ処理装置５は、入力装置１から得た情報を、文字コードの圧縮情報と区切り情報に分離する入力データ分離処理部５１と、記憶装置６の情報を元の文字コードに復元する文字コード復元処理部５２とを備える。
【００２３】
（動作例）
次に、図１〜図７を参照して、本実施例の動作について詳細に説明する。なお、図３は、記憶装置３および記憶装置６に記憶された変換テーブルの構成例を示す。図４は、文字列、文字のビット列、および文字の区切りビット列の構成例を示す。図５は、文字列例に対応する圧縮ビット列、区切りビット列、および従来のコード例を表した図である。図６および図７は、処理手順例を示すフローチャートである。
【００２４】
これらの図において、図１および図６は、文字コードから圧縮データへ圧縮する処理例を、図２および図７では、圧縮されたデータから元の文字コードへの復元を行う処理例を表している。
【００２５】
先ず、文字コードを入力する（ステップＡ１）。入力が未終了の場合は（ステップＡ２／ＮＯ）、入力文字を圧縮ビット列に変換し（ステップＡ３）、変換後のビット列から区切りビット列を生成する（ステップＡ４）。
入力が終了した場合は（ステップＡ２／ＹＥＳ）、圧縮ビット列と区切りビット列の結合を実行し（ステップＡ５）、実行後の結合されたビット列を外部へ出力する（ステップＡ６）。
【００２６】
図１の入力装置１から与えられた文字コードは（ステップＡ１）、データ処理装置２により記憶装置３を参照して、別のビット列に置き換える（ステップＡ３、Ａ４、Ａ５）。記憶装置３は、図３に示す通り、各文字に対応したビット列を保持し、それをテーブルとして記憶している。
【００２７】
データ処理装置２は、入力された文字コードを、記憶装置３を元にしたビット列に置き換える（ステップＡ３、Ａ４、Ａ５）。その際、データ処理装置２では、文字コードを置き換えたビット列の区切り情報を記憶するために、文字の区切り部分識別のためのビット列を生成する（ステップＡ４）。
【００２８】
図４は、文字列“ＡＢＣＤＥ”と、文字ビット列と、文字の区切りビット列とを示している。本図４に示す通り、置き換えたビット列の先頭ビットは“１”、残りのビットは“０”に相当するビット列を生成する。これにより、区切りビット列のビットが“１”の所に該当する圧縮ビット列のビットが、圧縮した文字コードの先頭ビットとなり、続く“０”のビット列の部分が、圧縮した文字コードの残りの部分となる。
【００２９】
図６に示す通り、上記の処理を入力が終了するまで行い、入力が終了した時点で、データ処理装置２にて、圧縮ビット列と区切りビット列の結合を行い、出力装置４にて出力を行う。
【００３０】
次に、図２の入力装置１から与えられたデータは、データ処理装置５の入力データ分離処理部５１で、圧縮情報データと区切り情報データの２つに分割を行う。分割したこれらのデータはそれぞれ、圧縮情報記憶部６１と区切り情報記憶部６２に格納する。次に、データ処理装置５の文字コード復元処理部５２では、記憶装置６の圧縮情報記憶部６１と区切り情報記憶部６２と変換テーブル記憶部６３のデータを、元の文字コードへの復元処理を行う。データ処理装置５は、区切り情報記憶部６２から文字の区切り情報として、ビット“１”とそれに続くビット“０”を、次にビット“１”が現れるか、区切りデータが末尾になるまで順に取り出す。その時に、何ビット取り出したかをカウントしておく。
【００３１】
次に、文字コード復元処理部５２は、先ほどカウントした数の分だけ、圧縮情報記憶部６１から圧縮情報のビットを取り出す。次に、文字コード復元処理部５２は、取り出した圧縮情報のビットを、変換テーブル記憶部６３のテーブルで検索し、該当するビット列に相当する文字コードを入手する。
【００３２】
図３に示す通り、変換テーブル記憶部６３が記憶する変換テーブルは、文字コードとビット列のテーブルとなっており、ビット列を順に検索することで、そのビット列に相当する文字コードを割り出すことが出来る。取り出した文字コードは、出力装置４により外部に取り出す。
【００３３】
図７に示す通り、先ずデータを入力する（ステップＢ１）。入力したデータを、圧縮ビット列と区切りビット列に分離する（ステップＢ２）。区切りビット列が残っている場合は（ステップＢ３／ＹＥＳ）、区切りビット列を元に、圧縮ビット列から１文字分のビット列を入手し（ステップＢ４）、１文字分のビット列から、テーブルを元に文字コードを入手する（ステップＢ５）。圧縮ビット列から１文字分のビット列入手（ステップＢ４）と、文字コード入手（ステップＢ５）は、区切りビット列が無くなるまで繰り返し実行される（ステップＢ３／ＮＯ）。
【００３４】
このように、区切り情報記憶部６２に区切り情報が残っている場合には（ステップＢ３／ＹＥＳ）、再度、区切り情報記憶部６２から取り出した情報を元に、圧縮情報記憶部６１から圧縮情報を取り出し（ステップＢ４）、変換テーブル記憶部６３のテーブルから文字コードを割り出す（ステップＢ５）。
１文字分のビット列から入手した文字コードは、出力装置４により、外部に取り出すものとする（ステップＢ６）。
【００３５】
図８の変換テーブルによるビット列で表した場合の、文字／ビット列、の関係は、以下となる。
Ａ／１０、　　　　　　Ｂ／０１１１、　　　　Ｃ／０１０１、
Ｄ／０１１、　　　　　Ｅ／１
上記の具体例では、１文字を可変長で表しているため、文字の区切りの位置を示す必要がある。そのため、区切り識別用のビット列を設けることとする。
【００３６】
例えば、区切りの識別子には、先頭ビットが“１”、残りのビットが“０”としたものを使用する。一例を、以下に示す。以下の、文字／ビット列、の関係の具体例は、図８のテーブルを使用した場合に該当している。
３）文字列；　　　　　Ａ　　　Ｂ　　　　Ｃ　　　Ｄ　　　Ｅ
４）文字のビット列；　１０　０１１１　０１０１　０１１　　１　　　１４ｂｉｔ
５）区切り用ビット列；１０　１０００　１０００　１００　　１　　　１４ｂｉｔ
上に掲げた具体例では、合計が２８ｂｉｔとなる。この場合のビット数を上述した従来の処理例と比較すると、本例／従来例が２８／４０＝７０％となる。
【００３７】
（実施例の効果）
第１の効果は、通常８ビットで表現される文字コードを、最低２ビット〜最悪８ビットで表わすことで文章全体で使用する容量を削減でき、少ないメモリで多くの文字が記憶できる。また、ネットワークなどで送受信する際にも、流すデータ量が削減され、転送速度の向上とトラフィックの軽減がなされる。
【００３８】
なお、図４に示した通り、単純な「ＡＢＣＤＥ」といった文字列の場合には、７０％の圧縮率となっている。しかし、通常の文書中に頻繁に使用される文字を割り出し、その文字を頻度順にビット数の少ない所に割り当てるようなテーブルを作成することで、更に圧縮率を高めることが可能となる。
【００３９】
（他の実施例）
図１の変換テーブル記憶部３１のテーブルの内容を、当事者間で了解され、外部に知らせていない独自なテーブルを使用することで、入出力データが外部に漏洩した際にも、データが暗号化された状態となり、機密保持を行うことが出来る。
【００４０】
なお、上述の実施形態は本発明の好適な実施の一例である。ただし、これに限定されるものではなく、本発明の要旨を逸脱しない範囲内において種々変形実施が可能である。
【００４１】
【発明の効果】
以上の説明より明らかなように、本発明の文字コード圧縮・復元装置および同方法は、入力された文字の文字コードを圧縮変換し、該文字コードの区切りの情報を生成し、圧縮変換結果と区切り情報とを結合し、文字の各ビット列に対応する文字コード情報を予め記憶している。また、変換テーブルを使用して変換された文字コード、および該変換結果の区切り位置を示す区切りの情報を格納し、文字の出現頻度順にビット数の少ない所に割り当てた変換テーブルを作成して、文字コードの変換効率を高めている。
【図面の簡単な説明】
【図１】本発明の文字コード圧縮・復元装置の実施形態に適用され、文字コード圧縮時に使用される機能部の構成例を示すブロック図である。
【図２】本発明の文字コード圧縮・復元方法の実施形態に適用され、文字コード復元時に使用される機能部の構成例を示すブロック図である。
【図３】記憶装置３および記憶装置６に記憶された変換テーブルの構成例を示す。
【図４】文字列、文字のビット列、および文字の区切りビット列の構成例を示す。
【図５】文字列例に対応する圧縮ビット列、区切りビット列、および従来のコード例を表した図である。
【図６】処理手順例を示す第１のフローチャートである。
【図７】処理手順例を示す第２のフローチャートである。
【図８】変換テーブルの構成例を示している。
【符号の説明】
１　入力装置
２　データ処理装置
３　記憶装置
４　出力装置
５　データ処理装置
６　記憶装置
２１　文字コード圧縮処理部
２２　区切り情報生成処理部
２３　情報結合処理部
３１　変換テーブル記憶部
３２　圧縮情報記憶部
３３　区切情報記憶部
５１　入力データ分離処理部
５２　文字コード復元処理部
６１　圧縮情報記憶部
６２　区切り情報記憶部
６３　変換テーブル記憶部

[0001]
[Technical field of the invention]
The present invention relates to a character code compression / decompression device and method. More specifically, the present invention relates to a character code compression / decompression device and method that assign short bit strings to frequently used character codes to reduce the overall storage capacity.
[0002]
[Prior art]
Conventionally, a character code compression / decompression device and method are applied to, for example, a storage device. Conventionally, a character code obtained from an input device is stored in a storage device as it is, or is sent to an external device from an output device.
[0003]
For example, a more specific processing procedure is described below for the character string “ABCDE”. When this example is represented by ordinary bit strings (8-bit ASCII), the correspondence between characters and bit strings is as follows.
A / 01000001, B / 01000010, C / 01000011,
D / 01000100, E / 01000101
[0004]
According to the above conventional example, the following features are provided.
1) Since one character is always represented by 8 bits, there is no need to indicate a character delimiter.
2) The required number of bits is 8 bits × 5 characters = 40 bits.
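The conventional fixed-width case above can be reproduced with a minimal Python sketch. The variable names are illustrative only and do not come from the specification.

```python
# Fixed-width baseline: every character occupies exactly 8 bits,
# so no delimiter information is needed.
text = "ABCDE"

# 8-bit ASCII bit string for each character.
bits = [format(ord(ch), "08b") for ch in text]

# Total storage: 8 bits x number of characters.
total_bits = sum(len(b) for b in bits)

for ch, b in zip(text, bits):
    print(f"{ch} / {b}")
print(total_bits, "bits")
```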
[0005]
Japanese Patent Application Laid-Open No. H11-85459, "Character data encoding method and recording medium," is a first prior application example in a technical field similar to that of the present invention. In prior application example 1, each character is basically encoded into one code in accordance with the ASCII code table, but specific keywords consisting of multiple characters are encoded using the code region reserved for control characters. That is, 16 keywords, selected in descending order of data compression effect in view of their character count and appearance frequency, are encoded into codes 10H to 1FH. The character data is thereby said to be encoded so that it can be stored efficiently in a storage medium with less storage capacity.
[0006]
Japanese Patent Application Laid-Open No. H11-55125, "Method of compressing and restoring character data," is prior application example 2. In prior application example 2, the original data is discriminated, converted into an internal number sequence, and encoded. If the character string converted to internal numbers is not held in the dictionary, the characters are divided into two groups, and characters such as hiragana and kanji are encoded and output with a bit indicating the group attached. This is said to achieve a high compression ratio even for small data.
[0007]
Japanese Patent Application Laid-Open No. H9-69785, "Data compression method and data compression apparatus," is prior application example 3. In prior application example 3, a correspondence list that associates the code data of specific characters with their compressed code data is prepared, and the list is searched to determine whether the character code data for one character extracted from the data to be compressed exists in it. If the data exists in the correspondence list, its compressed code data is read from the list; if not, the data is output as is. This is said to achieve effective data compression.
[0008]
[Problems to be solved by the invention]
However, in the conventional character code compression / decompression device and method, the character code obtained from the input device is stored in the storage device as it is. Therefore, one byte (8 bits) is always used for one character, and there is a problem that useless bits occupy a large part of the storage capacity over the entire text.
[0009]
An object of the present invention is to provide a character code compression / decompression device and method that assign short bit strings to frequently used character codes and thereby reduce the overall storage capacity.
[0010]
[Means for Solving the Problems]
In order to achieve this object, the character code compression / decompression device of the present invention includes a data processing device that compresses and converts the character code of an input character, generates delimiter information for the character code, and combines the compression conversion result with the delimiter information, and a storage device that stores the character codes converted using a conversion table, which stores in advance the character code information corresponding to each bit string of characters, together with delimiter information indicating the delimiter positions of the conversion result. A conversion table in which characters are assigned to places with a small number of bits in order of appearance frequency is created, thereby improving the conversion efficiency of the character codes.
[0011]
Further, the storage device preferably includes a conversion table storage unit used when converting character codes, a compression information storage unit that stores the result of compression conversion, and a delimiter information storage unit that stores delimiter information indicating the delimiter positions of the conversion result, and the conversion table storage unit is preferably a storage unit that stores in advance the bit string information corresponding to each letter of the alphabet.
[0012]
The character code compression / decompression method of the present invention includes a data processing step of compressing and converting the character code of an input character, generating delimiter information for the character code, and combining the compression conversion result with the delimiter information, and a storage step of storing the character codes converted using a conversion table, which stores in advance the character code information corresponding to each bit string of characters, together with delimiter information indicating the delimiter positions of the conversion result. A conversion table in which characters are assigned to places with a small number of bits in order of appearance frequency is created, thereby improving the conversion efficiency of the character codes.
[0013]
Further, the storage step preferably includes a conversion table storage step used when converting character codes, a compression information storage step of storing the result of compression conversion, and a delimiter information storage step of storing delimiter information indicating the delimiter positions of the conversion result, and the conversion table storage step is preferably a storage step in which the bit string information corresponding to each letter of the alphabet is stored in advance.
[0014]
[Embodiments of the invention]
Next, an embodiment of the character code compression / decompression device and method according to the present invention will be described in detail with reference to the accompanying drawings. FIGS. 1 to 8 show one embodiment of the character code compression / decompression device and method according to the present invention.
[0015]
The character code compression / decompression device and method of the present invention reduce the bit size of the character codes used for the alphabet, thereby reducing the number of bits of the entire character string used in text and compressing the total amount of data used. Further, because a reversible method is used, the original character codes can be restored from the compressed information.
[0016]
Furthermore, because the method compresses and decompresses one character at a time in order, even when characters are input one by one from the beginning or the last character is deleted, the affected range is confined to the part for the character concerned. The configuration is described in detail below.
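The locality property can be illustrated with a short Python sketch. It assumes the delimiter layout described later in this document (first bit of each code marked "1", remaining bits "0") and that the compressed bits and delimiter bits are held separately, as in storage units 32 and 33. The function names `append_char` and `delete_last` are hypothetical.

```python
def append_char(compressed: str, delimiter: str, code: str):
    """Append one character's code; existing bits are left untouched."""
    return compressed + code, delimiter + "1" + "0" * (len(code) - 1)


def delete_last(compressed: str, delimiter: str):
    """Remove the last character by cutting at the last '1' in the delimiter bits."""
    if not delimiter:
        return compressed, delimiter  # nothing to delete
    start = delimiter.rfind("1")      # start position of the last code
    return compressed[:start], delimiter[:start]


# "ABCDE" with the example codes A=10, B=0111, C=0101, D=011, E=1
comp, delim = "10011101010111", "10100010001001"
comp, delim = delete_last(comp, delim)        # drops E
comp, delim = append_char(comp, delim, "1")   # appends E again
print(comp, delim)
```

Only the tail of each bit string changes, which is the limited range of influence the paragraph describes.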
[0017]
(Configuration example)
FIGS. 1 and 2 are block diagrams showing configuration examples of the character code compression / decompression device applied to the embodiment of the character code compression / decompression device and method according to the present invention.
FIG. 1 shows the character code compression device, that is, the functional units used when compressing character codes. These include an input device 1 such as a keyboard, a data processing device 2 that operates under program control, a storage device 3 for storing information, and an output device 4 for taking information out to the outside.
[0018]
The data processing device 2 includes a character code compression processing unit 21 that compresses the character codes input from the input device 1, a delimiter information generation processing unit 22 that generates the delimiter information for the character codes, and an information combination processing unit 23 that combines the conversion result and the delimiter information when outputting to the outside.
[0019]
The storage device 3 includes a conversion table storage unit 31 used when converting character codes, a compression information storage unit 32 that stores the result of compression conversion, and a delimiter information storage unit 33 that stores delimiter information indicating the delimiter positions of the conversion result. The conversion table storage unit 31 stores in advance the bit string information corresponding to each letter of the alphabet.
[0020]
Next, FIG. 2 shows the configuration used when restoring character codes. It includes an input device 1 for inputting from the outside the information compressed as in FIG. 1, a data processing device 5 that operates under program control, a storage device 6 for storing information, and an output device 4 for taking information out to the outside.
[0021]
The storage device 6 includes a compression information storage unit 61 that stores the compressed information portion of the character codes, a delimiter information storage unit 62 that stores the delimiter information portion, and a conversion table storage unit 63 that is used when restoring character codes. The conversion table storage unit 63 stores in advance the character code information corresponding to each bit string.
[0022]
The data processing device 5 includes an input data separation processing unit 51 that separates the information obtained from the input device 1 into the compressed character code information and the delimiter information, and a character code restoration processing unit 52 that restores the information in the storage device 6 to the original character codes.
[0023]
(Operation example)
Next, the operation of this embodiment will be described in detail with reference to FIGS. 1 to 7. FIG. 3 shows a configuration example of the conversion table stored in the storage device 3 and the storage device 6. FIG. 4 shows a configuration example of a character string, a character bit string, and a character delimiter bit string. FIG. 5 is a diagram illustrating a compressed bit string, a delimiter bit string, and a conventional code example corresponding to a character string example. FIGS. 6 and 7 are flowcharts illustrating examples of the processing procedure.
[0024]
In these figures, FIGS. 1 and 6 show an example of processing that compresses character codes into compressed data, and FIGS. 2 and 7 show an example of processing that restores the compressed data to the original character codes.
[0025]
First, a character code is input (step A1). If the input has not been completed (step A2 / NO), the input character is converted into a compressed bit string (step A3), and a delimiter bit string is generated from the converted bit string (step A4).
When the input is completed (step A2 / YES), the compressed bit string and the delimiter bit string are combined (step A5), and the resulting combined bit string is output to the outside (step A6).
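Steps A1 to A6 can be sketched in Python as follows. This is only an illustration under stated assumptions: the table uses the example codes given for FIG. 8 elsewhere in this document, bit strings are modelled as text strings of "0" and "1", and the combined output is assumed to be the compressed bits followed by the delimiter bits, since the specification does not state the layout. The name `compress` is hypothetical.

```python
# Example codes from the FIG. 8 table (A=10, B=0111, C=0101, D=011, E=1).
TABLE = {"A": "10", "B": "0111", "C": "0101", "D": "011", "E": "1"}


def compress(chars: str) -> str:
    compressed = []   # corresponds to compression information storage unit 32
    delimiter = []    # corresponds to delimiter information storage unit 33
    for ch in chars:                          # steps A1/A2: loop until input ends
        code = TABLE[ch]                      # step A3: table lookup
        compressed.append(code)
        # step A4: first bit "1", remaining bits "0", same length as the code
        delimiter.append("1" + "0" * (len(code) - 1))
    # step A5: combine (concatenation order is an assumption of this sketch)
    return "".join(compressed) + "".join(delimiter)


print(compress("ABCDE"))  # step A6: output the combined bit string
```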
[0026]
The character code given from the input device 1 of FIG. 1 (step A1) is replaced with another bit string by the data processing device 2, which refers to the storage device 3 (steps A3, A4, A5). As shown in FIG. 3, the storage device 3 holds the bit string corresponding to each character and stores these as a table.
[0027]
The data processing device 2 replaces the input character code with a bit string based on the storage device 3 (steps A3, A4, A5). In doing so, the data processing device 2 generates a bit string that identifies the character boundaries, in order to record the delimiter information for the bit strings that replaced the character codes (step A4).
[0028]
FIG. 4 shows the character string “ABCDE”, the character bit string, and the character delimiter bit string. As shown in FIG. 4, for each replaced bit string a delimiter bit string is generated whose first bit is "1" and whose remaining bits are "0". As a result, the bit of the compressed bit string at the position where the delimiter bit is "1" is the leading bit of a compressed character code, and the bits at the positions of the following run of "0"s are the rest of that compressed character code.
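The delimiter rule can be written as a one-line function. The sketch below is a Python illustration; the name `delimiter_bits` is hypothetical, and the codes are the example values this document gives for A to E.

```python
def delimiter_bits(code: str) -> str:
    """Delimiter for one code: '1' for its first bit, '0' for every remaining bit."""
    return "1" + "0" * (len(code) - 1)


codes = ["10", "0111", "0101", "011", "1"]          # A, B, C, D, E
delims = [delimiter_bits(c) for c in codes]

# Positions of '1' in the joined delimiter string mark where each code starts.
joined = "".join(delims)
starts = [i for i, bit in enumerate(joined) if bit == "1"]

print(delims)
print(starts)
```

Because the delimiter string marks every boundary, the codes themselves do not need to be prefix-free; in the example table "1" (E) is a prefix of "10" (A).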
[0029]
As shown in FIG. 6, the above processing is performed until the input is completed, and when the input is completed, the data processing device 2 combines the compressed bit string and the delimiter bit string, and outputs the result with the output device 4.
[0030]
Next, the data given from the input device 1 of FIG. 2 is split by the input data separation processing unit 51 of the data processing device 5 into compressed information data and delimiter information data, which are stored in the compression information storage unit 61 and the delimiter information storage unit 62, respectively. The character code restoration processing unit 52 of the data processing device 5 then restores the original character codes from the data in the compression information storage unit 61, the delimiter information storage unit 62, and the conversion table storage unit 63 of the storage device 6. The data processing device 5 takes character delimiter information out of the delimiter information storage unit 62 in order: a bit "1" and the bits "0" that follow it, until the next bit "1" appears or the end of the delimiter data is reached. It counts how many bits were taken out.
[0031]
Next, the character code restoration processing unit 52 extracts the bits of the compressed information from the compressed information storage unit 61 by the number counted previously. Next, the character code restoration processing unit 52 searches the table of the conversion table storage unit 63 for the bits of the extracted compression information, and obtains a character code corresponding to the corresponding bit string.
[0032]
As shown in FIG. 3, the conversion table stored in the conversion table storage section 63 is a table of character codes and bit strings, and by sequentially searching the bit strings, a character code corresponding to the bit string can be determined. The extracted character code is extracted by the output device 4 to the outside.
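Extracting and looking up a single character can be sketched as below. This is a Python illustration under assumptions: the reverse lookup uses a dictionary rather than the sequential table search the text describes, and the name `take_one` is hypothetical. The codes are the example values given for FIG. 8.

```python
TABLE = {"A": "10", "B": "0111", "C": "0101", "D": "011", "E": "1"}
REVERSE = {bits: ch for ch, bits in TABLE.items()}  # bit string -> character


def take_one(compressed: str, delimiter: str, pos: int):
    """Return (character, next position) for the code that starts at pos."""
    if delimiter[pos] != "1":
        raise ValueError("a code must start where the delimiter bit is 1")
    end = pos + 1
    # Consume the run of '0' bits until the next '1' or the end of the data.
    while end < len(delimiter) and delimiter[end] == "0":
        end += 1
    count = end - pos                          # number of delimiter bits taken
    code = compressed[pos:pos + count]         # same number of compressed bits
    return REVERSE[code], end


print(take_one("10011101010111", "10100010001001", 0))
```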
[0033]
As shown in FIG. 7, data is input first (step B1). The input data is separated into a compressed bit string and a delimiter bit string (step B2). If any delimiter bit string remains (step B3 / YES), a bit string for one character is obtained from the compressed bit string based on the delimiter bit string (step B4), and a character code is obtained from the bit string for one character based on the table (step B5). The acquisition of the bit string for one character from the compressed bit string (step B4) and the acquisition of the character code (step B5) are repeated until no delimiter bit string remains (step B3 / NO).
[0034]
As described above, when delimiter information remains in the delimiter information storage unit 62 (step B3 / YES), compression information is again taken out of the compression information storage unit 61 based on the information taken from the delimiter information storage unit 62 (step B4), and the character code is determined from the table in the conversion table storage unit 63 (step B5).
The character code obtained from the bit string for one character is taken out to the outside by the output device 4 (step B6).
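The whole restoration loop (steps B1 to B6) can be sketched in Python as follows. The specification does not say how the combined data is laid out, so this sketch assumes the compressed bits are followed directly by the delimiter bits; since both have the same length, the input can then be split in half. The name `decompress` is hypothetical and the table holds the example codes given for FIG. 8.

```python
TABLE = {"A": "10", "B": "0111", "C": "0101", "D": "011", "E": "1"}
REVERSE = {bits: ch for ch, bits in TABLE.items()}


def decompress(combined: str) -> str:
    # Step B2: separate the two bit strings (equal halves is an assumption).
    half = len(combined) // 2
    compressed, delimiter = combined[:half], combined[half:]
    out = []
    pos = 0
    while pos < len(delimiter):                # step B3: delimiter bits remain?
        end = pos + 1
        while end < len(delimiter) and delimiter[end] == "0":
            end += 1
        code = compressed[pos:end]             # step B4: bits for one character
        out.append(REVERSE[code])              # step B5: table lookup
        pos = end
    return "".join(out)                        # step B6: output


print(decompress("1001110101011110100010001001"))
```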
[0035]
The relationship between a character and a bit string when represented by a bit string according to the conversion table of FIG. 8 is as follows.
A / 10, B / 0111, C / 0101,
D / 011, E / 1
In the above specific example, since one character is represented by a variable length, it is necessary to indicate the position of the character delimiter. Therefore, a bit string for delimiter identification is provided.
[0036]
For example, as the delimiter identifier, the one whose first bit is “1” and whose remaining bits are “0” is used. An example is shown below. The following specific example of the relationship between the character / bit string corresponds to the case where the table of FIG. 8 is used.
3) Character string;       A   B     C     D    E
4) Character bit string;   10  0111  0101  011  1    14 bits
5) Delimiter bit string;   10  1000  1000  100  1    14 bits
In the above specific example, the total is 28 bits. When the number of bits in this case is compared with the above-described conventional processing example, the present example / conventional example is 28/40 = 70%.
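The arithmetic can be checked with a few lines of Python. The variable names are illustrative; the table holds the example codes listed above.

```python
TABLE = {"A": "10", "B": "0111", "C": "0101", "D": "011", "E": "1"}
text = "ABCDE"

code_bits = sum(len(TABLE[ch]) for ch in text)   # 2 + 4 + 4 + 3 + 1
delim_bits = code_bits                           # one delimiter bit per code bit
total = code_bits + delim_bits
baseline = 8 * len(text)                         # conventional 8-bit encoding
ratio = total / baseline

print(code_bits, delim_bits, total, baseline, ratio)
```

Because each code bit is accompanied by one delimiter bit, a character costs twice its code length, so in this scheme a character saves space over 8-bit storage only when its code is shorter than 4 bits.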
[0037]
(Effects of the embodiment)
The first effect is that a character code normally expressed in 8 bits is expressed in as few as 2 bits and at worst 8 bits, which reduces the capacity used by the text as a whole and allows more characters to be stored in less memory. In addition, when data is sent and received over a network or the like, the amount of data is reduced, which improves transfer speed and reduces traffic.
[0038]
As shown in FIG. 4, for a simple character string such as "ABCDE" the compressed size is 70% of the original. The compression ratio can be improved further by identifying the characters that are frequently used in ordinary documents and creating a table that assigns them to places with fewer bits in order of frequency.
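One possible way to build such a frequency-ordered table is sketched below in Python. The specification does not say how the bit strings themselves are chosen, so this sketch makes an assumption: it enumerates all bit strings in order of increasing length (0, 1, 00, 01, ...) and hands them out to characters in descending order of frequency in a sample text. The names `short_codes` and `build_table` are hypothetical.

```python
from collections import Counter
from itertools import count, product


def short_codes():
    """Yield every bit string in order of increasing length: 0, 1, 00, 01, 10, ..."""
    for length in count(1):
        for bits in product("01", repeat=length):
            yield "".join(bits)


def build_table(sample: str) -> dict:
    """Assign the shortest codes to the most frequent characters in sample."""
    freq = Counter(sample)
    # Descending frequency; ties are broken by character so the result is stable.
    ranked = sorted(freq, key=lambda ch: (-freq[ch], ch))
    return dict(zip(ranked, short_codes()))


print(build_table("EEEEAAB"))
```

Codes assigned this way are not prefix-free, which is acceptable here only because the delimiter bit string marks every code boundary.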
[0039]
(Other embodiments)
By using, as the contents of the table in the conversion table storage unit 31 of FIG. 1, a proprietary table that is agreed between the parties and not disclosed to the outside, the data remains in an encrypted state even if the input / output data leaks to the outside, so that confidentiality can be maintained.
[0040]
The above embodiment is an example of a preferred embodiment of the present invention. However, the present invention is not limited to this, and various modifications can be made without departing from the scope of the present invention.
[0041]
[Effects of the invention]
As is clear from the above description, the character code compression / decompression device and method of the present invention compress and convert the character code of an input character, generate delimiter information for the character code, combine the compression conversion result with the delimiter information, and store in advance the character code information corresponding to each bit string of characters. In addition, they store the character codes converted using the conversion table together with the delimiter information indicating the delimiter positions of the conversion result, and create a conversion table in which characters are assigned to places with a small number of bits in order of appearance frequency, thereby improving the conversion efficiency of the character codes.
[Brief description of the drawings]
FIG. 1 is a block diagram illustrating a configuration example of a functional unit used at the time of character code compression applied to an embodiment of a character code compression / decompression device of the present invention.
FIG. 2 is a block diagram illustrating a configuration example of a functional unit used at the time of character code decompression applied to the embodiment of the character code compression / decompression method of the present invention.
FIG. 3 shows a configuration example of a conversion table stored in a storage device 3 and a storage device 6.
FIG. 4 shows a configuration example of a character string, a character bit string, and a character delimiter bit string.
FIG. 5 is a diagram illustrating a compressed bit string, a delimiter bit string, and a conventional code example corresponding to a character string example.
FIG. 6 is a first flowchart illustrating an example of a processing procedure.
FIG. 7 is a second flowchart illustrating an example of a processing procedure.
FIG. 8 shows a configuration example of a conversion table.
[Explanation of symbols]
1 Input device
2 Data processing device
3 Storage device
4 Output device
5 Data processing device
6 Storage device
21 Character code compression processing unit
22 Delimiter information generation processing unit
23 Information combination processing unit
31 Conversion table storage unit
32 Compression information storage unit
33 Delimiter information storage unit
51 Input data separation processing unit
52 Character code restoration processing unit
61 Compression information storage unit
62 Delimiter information storage unit
63 Conversion table storage unit

Claims

1. A character code compression / decompression device comprising:
a data processing device that compresses and converts a character code of an input character, generates delimiter information for the character code, and combines the compression conversion result and the delimiter information; and
a storage device that stores the character code converted using a conversion table in which character code information corresponding to each bit string of the character is stored in advance, and the delimiter information indicating a delimiter position of the conversion result,
wherein the conversion table, in which characters are assigned to places having a small number of bits in order of the appearance frequency of the characters, is created to increase the conversion efficiency of the character code.

2. The character code compression / decompression device according to claim 1, wherein the storage device includes a conversion table storage unit used when converting the character code, a compression information storage unit that stores the result of the compression conversion, and a delimiter information storage unit that stores delimiter information indicating a delimiter position of the conversion result.

3. The character code compression / decompression device according to claim 2, wherein the conversion table storage unit is a storage unit that stores bit string information corresponding to each alphabet in advance.

4. A character code compression / decompression method comprising:
a data processing step of compressing and converting a character code of an input character, generating delimiter information for the character code, and combining the compression conversion result and the delimiter information; and
a storage step of storing the character code converted using a conversion table in which character code information corresponding to each bit string of the character is stored in advance, and the delimiter information indicating a delimiter position of the conversion result,
wherein the conversion table, in which characters are assigned to places having a small number of bits in order of the appearance frequency of the characters, is created to increase the conversion efficiency of the character code.

5. The character code compression / decompression method according to claim 4, wherein the storage step includes a conversion table storage step used when converting the character code, a compression information storage step of storing the result of the compression conversion, and a delimiter information storage step of storing delimiter information indicating a delimiter position of the conversion result.

6. The character code compression / decompression method according to claim 5, wherein said conversion table storage step is a storage step in which bit string information corresponding to each alphabet is stored in advance.