WO2012083882A1 - Watermark image blocking method and apparatus for Western-language watermark processing - Google Patents

Watermark image blocking method and apparatus for Western-language watermark processing

Info

Publication number
WO2012083882A1
WO2012083882A1 PCT/CN2011/084577 CN2011084577W WO2012083882A1 WO 2012083882 A1 WO2012083882 A1 WO 2012083882A1 CN 2011084577 W CN2011084577 W CN 2011084577W WO 2012083882 A1 WO2012083882 A1 WO 2012083882A1
Authority
WO
WIPO (PCT)
Prior art keywords
character
image block
western
character image
document
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2011/084577
Other languages
English (en)
French (fr)
Inventor
王高阳
亓文法
王立东
杨斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Peking University Founder Group Co Ltd
Beijing Founder Electronics Co Ltd
Peking University Founder Research and Development Center
Original Assignee
Peking University
Peking University Founder Group Co Ltd
Beijing Founder Electronics Co Ltd
Peking University Founder Research and Development Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University, Peking University Founder Group Co Ltd, Beijing Founder Electronics Co Ltd, Peking University Founder Research and Development Center filed Critical Peking University
Priority to US13/997,258 priority Critical patent/US9111341B2/en
Priority to JP2013545033A priority patent/JP5669957B2/ja
Priority to EP11852045.1A priority patent/EP2657902B1/en
Publication of WO2012083882A1 publication Critical patent/WO2012083882A1/zh
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/0021 Image watermarking
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/158 Segmentation of character regions using character size, text spacings or pitch estimation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2201/00 General purpose image data processing
    • G06T2201/005 Image watermarking
    • G06T2201/0051 Embedding of the watermark in the spatial domain
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2201/00 General purpose image data processing
    • G06T2201/005 Image watermarking
    • G06T2201/0062 Embedding of the watermark in text images, e.g. watermarking text documents using letter skew, letter distance or row distance
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2201/00 General purpose image data processing
    • G06T2201/005 Image watermarking
    • G06T2201/0083 Image watermarking whereby only watermarked image required at decoder, e.g. source-based, blind, oblivious
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Definitions

  • The present invention relates to the field of digital typesetting, and in particular to a watermark image blocking method and apparatus for Western-language watermark processing.
  • Digital watermarking refers to embedding specific information into a digital signal, which may be audio, an image, or video. When a signal carrying a digital watermark is copied, the embedded information is copied along with it. Digital watermarks are of two types: visible and hidden. The information in a visible watermark can be seen while viewing the image or video. Visible watermarks usually contain the name or logo of the copyright owner; the logo a TV station places in a corner of the screen is one example.
  • Hidden watermarks are added to audio, images, or video as digital data, but cannot be seen under normal conditions.
  • One important application of hidden watermarks is copyright protection, where they are expected to deter or prevent unauthorized reproduction and copying of digital media.
  • Steganography is also an application of digital watermarking, in which two parties communicate using information hidden in a digital signal.
  • The annotation data in digital photos can record the shooting time, the aperture and shutter used, and even the camera brand; this is also one of the applications of digital watermarking.
  • Some file formats can contain such additional information, called "metadata".
  • The patent document with application number 200710121642.7 discloses a method for embedding a digital watermark in a binary image: part or all of the binary image is divided into at least two watermark image blocks; the blocks are grouped according to the number of black pixels in each watermark image block; and a Hadamard transform is applied to the data in each group. The watermark signal is embedded using a quantization method, and an inverse Hadamard transform yields the number of pixels to be changed in each watermark image block, thereby achieving watermark embedding and extraction.
  • The patent document with application number 200810055770.0 discloses a method and apparatus for embedding a digital watermark in a binary text image. The method includes: dividing part or all of the binary text image into an embedding portion and an adjustment portion; calculating the average number of black pixels contained in each set of the embedding portion and the adjustment portion; calculating a color change parameter from that average and the number of black pixels contained in each set of the embedding portion; and changing, according to the color change parameter, the number of black pixels contained in each set of the embedding portion and the adjustment portion, thereby embedding the watermark.
  • The patent document with application number 200610114048.0 discloses a method and apparatus for embedding and extracting a digital watermark in a black-and-white binary text image. The embedding method includes: locating the valid character regions in the text image; grouping the valid character regions and counting the black pixels in each character region; calculating, from the relative relationship between the black-pixel counts of the character regions within a group, the watermark information bit string, and a first step size, a first number of pixels to be flipped in each character region; and flipping that first number of pixels in each character region.
  • The extraction method includes: locating the valid character regions in the text image; grouping the valid character regions and counting the black pixels in each character region; and extracting the embedded watermark information bit string from the relative relationship between the black-pixel counts of the character regions in each group and the first step size.
  • In the binary text watermarking techniques above, the watermark image block is particularly important because it is the region in which the watermark is embedded. In Patent Application 1, the watermark image block serves directly as the watermark embedding region. In Patent Application 2, the binary text image is partly divided into an embedding portion, which is the watermark image block. In Patent Application 3, the grouped valid character regions of the text image serve as watermark image blocks.
  • All of the above patent applications embed the watermark by changing the number of black pixels in a watermark image block, and extract the watermark by quantizing the number of black pixels in the watermark image block.
  • Western words differ greatly in length, so the number of black pixels in the character image blocks of a word fluctuates widely. For example, in "My extraordinary power" the word lengths differ by a factor of several. If a single Western word were used as the watermark image block, the black-pixel counts of the blocks would vary too erratically for the watermark operation to be completed.
  • For Western-language text documents, the watermark image blocks obtained therefore need to satisfy the following conditions:
  • 1. The loss of watermark image block synchronization caused by character sticking can be avoided.
  • 2. The numbers of black pixels in the watermark image blocks differ little.
  • 3. For documents of different font sizes, the watermark image blocks can be divided adaptively according to size.
  • The present invention aims to provide a watermark image blocking method and apparatus for Western-language watermark processing, to solve the problem that the prior art has difficulty dividing a Western-language image correctly into watermark image blocks.
  • An embodiment of the present invention provides a watermark image blocking method for Western-language watermark processing, including: dividing a Western-language image by row and column segmentation to obtain a plurality of character image blocks; identifying valid character image blocks among the character image blocks; performing statistics on the sizes of the valid character image blocks to determine whether the Western-language image is a large-font document or a small-font document; grouping words, using a different number of words per group for large-font and small-font character documents; and dividing each word group equally into multiple portions, the portions corresponding to watermark image blocks.
  • An embodiment of the present invention provides a watermark image blocking apparatus for Western-language watermark processing, including: a segmentation module, configured to divide a Western-language image by row and column segmentation to obtain a plurality of character image blocks; an identification module, configured to identify valid character image blocks among the character image blocks; a statistics module, configured to perform statistics on the sizes of the valid character image blocks to determine whether the Western-language image is a large-font document or a small-font document; a grouping module, configured to group words using a different number of words per group for large-font and small-font character documents; and an equal-division module, configured to divide each word group equally into multiple portions, the portions corresponding to watermark image blocks.
  • Because the watermark image blocks are set reasonably according to character size, the method and apparatus of the above embodiments solve the problem that the prior art has difficulty dividing a Western-language image correctly into watermark image blocks, and ensure that the watermark embedding process is workable.
  • FIG. 1 is a flowchart of a watermark image blocking method for Western-language watermark processing according to an embodiment of the present invention;
  • FIG. 2 is a flowchart of a method for discriminating large-font and small-font character documents according to an embodiment of the present invention;
  • FIG. 3 is a flowchart of a method for grouping valid character regions according to an embodiment of the present invention;
  • FIG. 4A is a schematic diagram of the row height and center line of a Western-language binary image in an embodiment of the present invention;
  • FIG. 4B is a schematic diagram of the valid character image blocks of a Western-language binary image in an embodiment of the present invention;
  • FIG. 5 is a schematic diagram of the character segmentation result of a Western-language binary image in an embodiment of the present invention;
  • FIG. 6 is a schematic diagram of the word grouping result of a Western-language binary image in an embodiment of the present invention;
  • FIG. 7A is a schematic diagram of calculating the effective length of the characters in a word group in an embodiment of the present invention;
  • FIG. 7B is a schematic diagram of obtaining watermark image blocks in an embodiment of the present invention;
  • FIG. 8A is a schematic diagram of the watermark image blocks of a Western-language small-font character document in an embodiment of the present invention;
  • FIG. 8B is a schematic diagram of the watermark image blocks of a Western-language small-font character document after printing and scanning in an embodiment of the present invention;
  • FIG. 8C is a schematic diagram of the watermark image blocks of a Western-language small-font character document with abnormal spacing and partial character sticking in an embodiment of the present invention;
  • FIG. 8D and FIG. 8E are schematic diagrams of the watermark image blocks of a Western-language large-font character document before and after printing and scanning in an embodiment of the present invention;
  • FIG. 9 is a schematic diagram of calculating the segmentation threshold for word segmentation in an embodiment of the present invention;
  • FIG. 10 is a schematic diagram of a watermark image blocking apparatus for Western-language watermark processing according to an embodiment of the present invention.
  • FIG. 1 is a flowchart of a watermark image blocking method for Western-language watermark processing according to an embodiment of the present invention, including:
  • Step S10: dividing the Western-language image by row and column segmentation to obtain a plurality of character image blocks;
  • Step S20: identifying valid character image blocks among the character image blocks;
  • Step S30: performing statistics on the sizes of the valid character image blocks to determine whether the Western-language image is a large-font document or a small-font document;
  • Step S40: grouping words, using a different number of words per group for large-font and small-font character documents;
  • Step S50: dividing each word group equally into multiple portions, the portions corresponding to watermark image blocks.
  • This embodiment performs statistics on the sizes of the valid character image blocks, which takes full account of how row and column segmentation of Western words behaves at different font sizes, and thereby distinguishes large-font character documents from small-font character documents. Because the number of words per group is chosen adaptively, the black-pixel counts of the resulting watermark image blocks differ little, which keeps the watermark embedding process workable.
  • In addition, this embodiment divides each word group equally into multiple portions, which takes full account of the instability of Western character gaps before and after printing; at the embedding end, expanded-spacing characters and reduced-spacing characters are identified and corrected appropriately. This ensures that the word segmentation results are consistent before and after printing and scanning, so that the watermark image blocks resynchronize well and watermark embedding and extraction are more robust.
  • Preferably, before step S10 the method further includes: acquiring a Western-language image; and performing noise reduction on the Western-language image to obtain a binarized Western-language image.
  • This step pre-processes the Western-language image and is easy to implement on a computer.
  • Through noise reduction, this preferred embodiment obtains a better Western-language binary text image.
  • Preferably, step S20 includes: dividing the character image blocks into punctuation image blocks and valid character image blocks.
  • Western-language text consists mainly of punctuation and letters; punctuation marks are usually small and are not suitable for embedding watermarks.
  • By distinguishing between the character image blocks, this preferred embodiment can exclude punctuation.
  • Preferably, dividing the character image blocks into punctuation image blocks and valid character image blocks includes evaluating the following conditions: Condition 1, w > N_t1 × H; Condition 2, the bottom edge and the top edge of U both fall on the same side of m; Condition 3, the bottom edge and the top edge of U fall on opposite sides of m, and w < N_t2 × H; where U is the bounding rectangle of a character image block in the set Ω of character image blocks, H is the row height of the line in which U is located, m is the center line position of that line, h and w are the height and width of U, respectively, and N_t1 and N_t2 are preset coefficients.
  • If any one of conditions 1 to 3 holds, the character image block corresponding to U is determined to be a punctuation image block; if none of conditions 1 to 3 holds, the character image block corresponding to U is determined to be a valid character image block.
  • This preferred embodiment gives a concrete numerical decision procedure that is easy to program on a computer.
  • Preferably, N_t1 = 4 and N_t2 = 0.35. These are the best coefficients the inventors obtained after extensive, painstaking experimentation.
  • Preferably, step S30 includes: calculating the effective height H_s, where h_1, h_2, …, h_m are respectively the heights of valid character image blocks 1, 2, …, m in the line in which U is located; if H_s > Th_size, determining that the line in which U is located
  • is a large-font character line, and otherwise a small-font character line, where Th_size is a preset threshold; counting the number N_large of large-font character lines and the number N_small of small-font character lines in the Western-language image; and if N_large > N_small, determining that the Western-language image is a large-font character document, and otherwise a small-font character document.
  • This preferred embodiment gives a concrete numerical decision procedure that is easy to program on a computer.
  • Preferably, Th_size = 88. This is the best threshold the inventors obtained after extensive, painstaking experimentation. Of course, setting the threshold to a value near this one is also feasible; doing so still falls within the spirit of the invention and should be protected by the claims.
  • FIG. 2 is a flowchart of a method for discriminating large-font and small-font character documents according to an embodiment of the present invention.
  • In this embodiment, the bounding rectangles of a number of character image blocks are obtained by preliminary row and column segmentation.
  • Large-font and small-font character lines are judged from the statistical features of the character image blocks, and the document type (large-font or small-font character document) is then determined. Specifically, the method includes the following steps:
  • S201. Obtain the character image blocks: the bounding rectangles of a number of character image blocks are obtained by preliminary row and column segmentation.
  • For the current set Ω of character image blocks, H is the row height of the current line
  • and m is the center line position of the current line.
  • A character image block U is any one of the bounding rectangles in Ω.
  • h and w are the height and width of U, respectively.
  • S202. Distinguish punctuation: if the character image block U satisfies any one of the following three conditions, U is marked as a punctuation image block; otherwise, it is marked as a valid character image block.
  • Condition 1: w > N_t1 × H; Condition 2: the bottom edge and the top edge of the character image block U both fall on the same side of the center line; Condition 3: the bottom edge and the top edge of the character image block U fall on opposite sides of the center line, and w < N_t2 × H. Generally, N_t1 = 4 and N_t2 = 0.35.
  • Condition 1 filters out punctuation marks shaped like an underscore "_";
  • Condition 2 filters out punctuation marks such as commas, periods, and quotation marks; Condition 3 filters out punctuation marks such as the hyphen "-".
  • S203. Calculate the effective height H_s of the valid character image blocks of the current line, where h_1, h_2, …, h_m are respectively the heights of valid character image blocks 1, 2, …, m of the current line. S204. If H_s > Th_size, the current line is a large-font character line; otherwise, it is a small-font character line.
  • S205. The number N_large of large-font character lines and the number N_small of small-font character lines in the document are counted. If N_large > N_small, the document is a large-font character document; otherwise, it is a small-font character document.
  • Preferably, step S50 includes: dividing the word group equally into a fixed number of portions according to the effective length of the words under column projection; and combining the width occupied by each portion with the maximum height of the characters it covers into a bounding rectangle, the bounding rectangle corresponding to a watermark image block.
  • FIG. 3 is a flowchart of a method for grouping valid character regions according to an embodiment of the present invention.
  • Depending on the document type, a different number of words is taken as one group, and each group is divided equally into a fixed number of portions according to the effective length of the words under column projection. The width of each portion and the maximum height of the characters it covers
  • are combined into a new bounding rectangle, and each new bounding rectangle corresponds to a watermark image block, which completes the grouping of the valid regions. Specifically, the method includes the following steps: S301. Obtain the document type and the word segmentation result.
  • The document type, that is, large-font character document or small-font character document, is obtained in step S30.
  • For word segmentation, the distances between the bounding rectangles of all adjacent valid character image blocks in a line are sorted in ascending order. This sorted sequence can be regarded as containing two classes of data: the character spacing within words, and the spacing between words. The spacing between words is clearly larger than the character spacing within a word. As shown in FIG. 9, a segmentation threshold is selected from this ascending sequence to separate the two classes. Obtaining an accurate and stable segmentation threshold is therefore the key to word segmentation.
  • The two classes of data in the sequence can be separated by an image binarization method such as the Otsu method or the bimodal method.
  • Once the segmentation threshold has been obtained, the character image blocks whose spacing is smaller than the segmentation threshold are merged into one word, which finally yields the word segmentation result shown in FIG. 5.
  • A valid character image block whose spacing is close to the word segmentation threshold is classified as an expanded-spacing character; a valid character image block whose spacing is so small that the characters are very likely to stick together after printing is classified as a reduced-spacing character.
  • According to the attribute thus assigned to each valid character image block, the corresponding character in the original document is moved.
  • An expanded-spacing character is moved to the right, and all document content to its right moves to the right with it.
  • A reduced-spacing character is moved to the left, and all document content to its right moves to the left with it.
  • Depending on the document type, a different number of words is taken as one word group G.
  • A large-font character document is grouped N_t3 words at a time,
  • and a small-font character document is grouped N_t4 words at a time.
  • The effective length L of the word group under column projection is then calculated.
  • The sequence of valid character image blocks is divided into portions of effective length Ls each; the width of each portion and the maximum height of the characters it covers are combined into a new bounding rectangle, and each new bounding rectangle
  • corresponds to a watermark image block.
  • Every S watermark image blocks form one group, and digital watermark embedding and extraction are performed on the grouped watermark image blocks.
  • FIG. 8A and FIG. 8B show the watermark image blocks obtained before and after printing and scanning a small-font character document.
  • FIG. 8C shows the watermark image blocks obtained when the document contains abnormal spacing and partial character sticking.
  • FIG. 8D and FIG. 8E show the watermark image blocks obtained before and after printing and scanning a large-font character document. It can be seen that the watermark image blocks obtained by the present invention withstand print-scan operations and avoid interference from character sticking and from differences in font size.
  • FIG. 10 is a schematic diagram of a watermark image blocking apparatus for Western-language watermark processing according to an embodiment of the present invention, including:
  • a segmentation module 10, configured to divide a Western-language image by row and column segmentation to obtain a plurality of character image blocks;
  • an identification module 20, configured to identify valid character image blocks among the character image blocks;
  • a statistics module 30, configured to perform statistics on the sizes of the valid character image blocks to determine whether the Western-language image is a large-font document or a small-font document;
  • a grouping module 40, configured to group words using a different number of words per group for large-font and small-font character documents;
  • an equal-division module 50, configured to divide each word group equally into multiple portions, the portions corresponding to watermark image blocks.
  • This embodiment keeps the watermark embedding process workable, makes the watermark image blocks resynchronize better, and makes watermark embedding and extraction more robust.
  • Preferably, the identification module 20 includes: a judging module, configured to evaluate the following conditions: Condition 1, w > N_t1 × H; Condition 2, the bottom edge and the top edge of U both fall on the same side of m; Condition 3, the bottom edge and the top edge of U fall on opposite sides of m, and w < N_t2 × H; where U is the bounding rectangle of a character image block in the set Ω of character image blocks, H is the row height of the line in which U is located, m is the center line position of that line, h and w are respectively the height and width of U, and N_t1 and N_t2 are preset coefficients; and a determining module, configured to determine that the character image block corresponding to U is a punctuation image block if any one of conditions 1 to 3 holds, and that it is a valid character image block if none of conditions 1 to 3 holds.
  • Preferably, the statistics module 30 includes: a calculation module, configured to calculate the effective height H_s from the heights h_1, h_2, …, h_m of the valid character image blocks in the line in which U is located;
  • a line determining module, configured to determine that the line in which U is located is a large-font character line if H_s > Th_size, and otherwise a small-font character line, where Th_size is a preset threshold; and a line counting module, configured to count the number N_large of large-font character lines and the number N_small of small-font character lines in the Western-language image.
  • If N_large > N_small, the Western-language image is determined to be a large-font character document; otherwise it is a small-font character document.
  • The above embodiments of the present invention ultimately improve the extraction accuracy of Western-language image watermark processing.
  • First, the behavior of row and column segmentation of Western words at different font sizes is fully considered: a threshold separating large-font from small-font character documents is obtained, and the document type is judged with that threshold.
  • For different document types, different numbers of words per group are used, so the black-pixel counts of the grouped watermark image blocks differ little, which greatly improves the robustness of the watermark processing.
  • Second, the instability of Western character spacing before and after printing is considered.
  • At the embedding end, the local characteristics of the word segmentation threshold are taken into account, and the spacing of characters inside words that lies near the threshold
  • is finely adjusted. This ensures that word segmentation results are consistent before and after printing and scanning, so that the watermark image blocks resynchronize better and the robustness of watermark embedding and extraction is further improved.
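The word-segmentation threshold mentioned above (separating intra-word character gaps from inter-word gaps with an Otsu-style method) can be illustrated with a short sketch. This is a minimal illustration only, assuming each valid character is given as a hypothetical `(left, right)` pair of column coordinates in one text line; the function names `otsu_split` and `group_words` are not from the patent, and the patent's own threshold computation (FIG. 9) may differ in detail.

```python
from typing import List, Tuple

def otsu_split(values: List[float]) -> float:
    """Pick the threshold that maximizes between-class variance (Otsu's criterion)
    over a 1-D list of gap widths. Values <= threshold form the 'small' class."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    best_t, best_var, acc = xs[0], -1.0, 0.0
    for i in range(1, n):
        acc += xs[i - 1]                 # sum of the i smallest values
        if xs[i] == xs[i - 1]:
            continue                     # no cut possible between equal values
        w0, w1 = i / n, (n - i) / n
        mu0, mu1 = acc / i, (total - acc) / (n - i)
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_var, best_t = var, (xs[i - 1] + xs[i]) / 2
    return best_t

def group_words(boxes: List[Tuple[int, int]]) -> List[List[Tuple[int, int]]]:
    """Merge adjacent character boxes into words: a gap above the threshold
    starts a new word, a gap at or below it stays inside the current word."""
    if len(boxes) < 2:
        return [list(boxes)]
    gaps = [b[0] - a[1] for a, b in zip(boxes, boxes[1:])]
    t = otsu_split(gaps)
    words, current = [], [boxes[0]]
    for gap, box in zip(gaps, boxes[1:]):
        if gap > t:
            words.append(current)
            current = [box]
        else:
            current.append(box)
    words.append(current)
    return words
```

If all gaps in a line are equal, the sketch returns a single word, which is a simplification; a real implementation would need a fallback for single-word lines.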

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Editing Of Facsimile Originals (AREA)
  • Image Processing (AREA)

Description

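Watermark image blocking method and apparatus for Western-language watermark processing
Technical Field
The present invention relates to the field of digital typesetting, and in particular to a watermark image blocking method and apparatus for Western-language watermark processing.
Background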
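With the development of e-commerce and e-government, enterprises and institutions, Party and government organs, and departments such as national security handle large volumes of written material, including contracts, classified documents, and other important files. Copyright protection and content security for these text documents is an important problem, and digital watermarking offers one way to address it.
Digital watermarking refers to embedding specific information into a digital signal, which may be audio, an image, or video. When a signal carrying a digital watermark is copied, the embedded information is copied along with it. Digital watermarks are of two types, visible and hidden. The former is a visible watermark (visible watermarking), whose information can be seen while viewing the image or video. Visible watermarks usually contain the name or logo of the copyright owner; the logo a TV station places in a corner of the screen is one example.
Hidden watermarks are added to audio, images, or video as digital data, but cannot be seen under normal conditions. One important application of hidden watermarks is copyright protection, where they are expected to deter or prevent unauthorized reproduction and copying of digital media. Steganography is also an application of digital watermarking, in which two parties communicate using information hidden in a digital signal. The annotation data in digital photos can record the shooting time, the aperture and shutter used, and even the camera brand; this is also one of the applications of digital watermarking. Some file formats can contain such additional information, called "metadata".
In addition, many text documents exist not only in digital form but are also disseminated on paper through printing and photocopying, and a large share of them are Western-language documents. As internationalization deepens, Western-language documents are exchanged ever more frequently, so the need to protect such documents grows stronger. With the rapid development of digital technology, this mode of dissemination has become quite common, which also means that much important or confidential information leaks out via paper documents. Research on binary text watermarking for paper documents that can withstand printing and photocopying is therefore particularly important.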
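1. The patent document with application number 200710121642.7 discloses a method for embedding a digital watermark in a binary image: part or all of the binary image is divided into at least two watermark image blocks; the blocks are grouped according to the number of black pixels in each watermark image block; and a Hadamard transform is applied to the data in each group. The watermark signal is embedded using a quantization method, and an inverse Hadamard transform yields the number of pixels to be changed in each watermark image block, thereby achieving watermark embedding and extraction.
2. The patent document with application number 200810055770.0 discloses a method and apparatus for embedding a digital watermark in a binary text image. The method includes: dividing part or all of the binary text image into an embedding portion and an adjustment portion; calculating the average number of black pixels contained in each set of the embedding portion and the adjustment portion; calculating a color change parameter from that average and the number of black pixels contained in each set of the embedding portion; and changing, according to the color change parameter, the number of black pixels contained in each set of the embedding portion and the adjustment portion, thereby embedding the watermark.
3. The patent document with application number 200610114048.0 discloses a method and apparatus for embedding and extracting a digital watermark in a black-and-white binary text image. The embedding method includes: locating the valid character regions in the text image; grouping the valid character regions and counting the black pixels in each character region; calculating, from the relative relationship between the black-pixel counts of the character regions within a group, the watermark information bit string, and a first step size, a first number of pixels to be flipped in each character region; and flipping that first number of pixels in each character region. The extraction method includes: locating the valid character regions in the text image; grouping the valid character regions and counting the black pixels in each character region; and extracting the embedded watermark information bit string from the relative relationship between the black-pixel counts of the character regions in each group and the first step size.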
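In the binary text watermarking techniques above, the watermark image block is particularly important because it is the region in which the watermark is embedded. In Patent Application 1, the watermark image block serves directly as the watermark embedding region. In Patent Application 2, the binary text image is partly divided into an embedding portion, which is the watermark image block. In Patent Application 3, the grouped valid character regions of the text image serve as watermark image blocks. All of these patent applications embed the watermark by changing the number of black pixels in a watermark image block, and extract the watermark by quantizing the number of black pixels in the watermark image block.
The above methods therefore rest on two premises. 1. A correct character segmentation result. Most current character segmentation algorithms rely on the recognition results of an optical character recognition (OCR, Optical Character Recognition) system; however, given the speed and efficiency of OCR, an OCR mechanism is generally not introduced into a digital watermarking system, and for Western characters that stick together OCR also has a certain error rate. 2. The number of black pixels in a watermark image block fluctuates within a narrow range. In Chinese documents, for example, one Chinese character is used as one watermark image block. Chinese characters are square, and the areas of individual characters differ little, so the black-pixel counts of the watermark image blocks differ little, which ensures the accuracy of watermark embedding and extraction.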
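However, the above methods are not well suited to Western-language documents. The difficulties are:
a) Western letters commonly stick together before and after printing, so character segmentation cannot be guaranteed to be consistent before and after printing and scanning. Examples are "mn" and "tt". If a single Western letter is used as the watermark image block, sticking letters inevitably disturb the resynchronization of the character image block sequence between embedding and extraction, which lowers the success rate of watermark embedding and extraction.
b) Western words differ greatly in length, so the number of black pixels in the character image blocks of a word fluctuates widely. For example, in "My extraordinary power" the word lengths differ by a factor of several. If a single Western word were used as the watermark image block, the black-pixel counts of the blocks would vary too erratically for the watermark operation to be completed.
c) Character size changes with the font size used in a Western-language document. For example, "Here" and "Here" set in two different font sizes contain very different numbers of black pixels. Documents with different font sizes require different quantization methods.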
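Therefore, for Western-language text documents, the watermark image blocks obtained need to satisfy the following conditions:
1. The loss of watermark image block synchronization caused by character sticking can be avoided.
2. The numbers of black pixels in the watermark image blocks differ little.
3. For documents of different font sizes, the watermark image blocks can be divided adaptively according to size.
Summary of the Invention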
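The present invention aims to provide a watermark image blocking method and apparatus for Western-language watermark processing, to solve the problem that the prior art has difficulty dividing a Western-language image correctly into watermark image blocks.
An embodiment of the present invention provides a watermark image blocking method for Western-language watermark processing, including: dividing a Western-language image by row and column segmentation to obtain a plurality of character image blocks; identifying valid character image blocks among the character image blocks; performing statistics on the sizes of the valid character image blocks to determine whether the Western-language image is a large-font document or a small-font document; grouping words, using a different number of words per group for large-font and small-font character documents; and dividing each word group equally into multiple portions, the portions corresponding to watermark image blocks.
An embodiment of the present invention provides a watermark image blocking apparatus for Western-language watermark processing, including: a segmentation module, configured to divide a Western-language image by row and column segmentation to obtain a plurality of character image blocks; an identification module, configured to identify valid character image blocks among the character image blocks; a statistics module, configured to perform statistics on the sizes of the valid character image blocks to determine whether the Western-language image is a large-font document or a small-font document; a grouping module, configured to group words using a different number of words per group for large-font and small-font character documents; and an equal-division module, configured to divide each word group equally into multiple portions, the portions corresponding to watermark image blocks.
Because the watermark image blocks are set reasonably according to character size, the method and apparatus of the above embodiments solve the problem that the prior art has difficulty dividing a Western-language image correctly into watermark image blocks, and ensure that the watermark embedding process is workable.
Brief Description of the Drawings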
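The drawings described here are provided for further understanding of the present invention and form part of this application. The illustrative embodiments of the invention and their description serve to explain the invention and do not unduly limit it. In the drawings:
FIG. 1 is a flowchart of a watermark image blocking method for Western-language watermark processing according to an embodiment of the present invention;
FIG. 2 is a flowchart of a method for discriminating large-font and small-font character documents according to an embodiment of the present invention;
FIG. 3 is a flowchart of a method for grouping valid character regions according to an embodiment of the present invention;
FIG. 4A is a schematic diagram of the row height and center line of a Western-language binary image in an embodiment of the present invention;
FIG. 4B is a schematic diagram of the valid character image blocks of a Western-language binary image in an embodiment of the present invention;
FIG. 5 is a schematic diagram of the character segmentation result of a Western-language binary image in an embodiment of the present invention;
FIG. 6 is a schematic diagram of the word grouping result of a Western-language binary image in an embodiment of the present invention;
FIG. 7A is a schematic diagram of calculating the effective length of the characters in a word group in an embodiment of the present invention;
FIG. 7B is a schematic diagram of obtaining watermark image blocks in an embodiment of the present invention;
FIG. 8A is a schematic diagram of the watermark image blocks of a Western-language small-font character document in an embodiment of the present invention;
FIG. 8B is a schematic diagram of the watermark image blocks of a Western-language small-font character document after printing and scanning in an embodiment of the present invention;
FIG. 8C is a schematic diagram of the watermark image blocks of a Western-language small-font character document with abnormal spacing and partial character sticking in an embodiment of the present invention;
FIG. 8D and FIG. 8E are schematic diagrams of the watermark image blocks of a Western-language large-font character document before and after printing and scanning in an embodiment of the present invention;
FIG. 9 is a schematic diagram of calculating the segmentation threshold for word segmentation in an embodiment of the present invention;
FIG. 10 is a schematic diagram of a watermark image blocking apparatus for Western-language watermark processing according to an embodiment of the present invention.
Detailed Description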
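The present invention is described in detail below with reference to the drawings and in combination with embodiments.
FIG. 1 is a flowchart of a watermark image blocking method for Western-language watermark processing according to an embodiment of the present invention, including:
Step S10: dividing the Western-language image by row and column segmentation to obtain a plurality of character image blocks;
Step S20: identifying valid character image blocks among the character image blocks;
Step S30: performing statistics on the sizes of the valid character image blocks to determine whether the Western-language image is a large-font document or a small-font document;
Step S40: grouping words, using a different number of words per group for large-font and small-font character documents;
Step S50: dividing each word group equally into multiple portions, the portions corresponding to watermark image blocks.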
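The row and column segmentation of step S10 is not spelled out in the text. One common way to realize it is a projection-profile split, sketched below under that assumption: the image is a hypothetical list of rows with 1 for black and 0 for white, text lines are found from the row projection, and characters within a line from the column projection. The function names are illustrative and not taken from the patent.

```python
from typing import List, Tuple

Image = List[List[int]]  # 1 = black (ink), 0 = white

def runs(profile: List[int]) -> List[Tuple[int, int]]:
    """Return half-open [start, end) index ranges where the profile is non-zero."""
    out, start = [], None
    for i, v in enumerate(profile):
        if v > 0 and start is None:
            start = i
        elif v == 0 and start is not None:
            out.append((start, i))
            start = None
    if start is not None:
        out.append((start, len(profile)))
    return out

def segment_characters(img: Image) -> List[Tuple[int, int, int, int]]:
    """Split the image into text lines (row projection), then each line into
    character blocks (column projection). Returns bounding rectangles as
    half-open (top, bottom, left, right), tightened vertically to the ink."""
    boxes = []
    row_profile = [sum(row) for row in img]
    for top, bottom in runs(row_profile):
        width = len(img[0])
        col_profile = [sum(img[y][x] for y in range(top, bottom)) for x in range(width)]
        for left, right in runs(col_profile):
            ys = [y for y in range(top, bottom) if any(img[y][left:right])]
            boxes.append((ys[0], ys[-1] + 1, left, right))
    return boxes
```

Characters that touch each other share a column run and come out as a single block, which is the character-sticking problem the later grouping steps are designed to tolerate.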
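This embodiment performs statistics on the sizes of the valid character image blocks, which takes full account of how row and column segmentation of Western words behaves at different font sizes, and thereby distinguishes large-font character documents from small-font character documents. Because the number of words per group is chosen adaptively, the black-pixel counts of the resulting watermark image blocks differ little, which keeps the watermark embedding process workable. In addition, this embodiment divides each word group equally into multiple portions, which takes full account of the instability of Western character gaps before and after printing; at the embedding end, expanded-spacing characters and reduced-spacing characters are identified and corrected appropriately. This ensures that word segmentation results are consistent before and after printing and scanning, so that the watermark image blocks resynchronize well and watermark embedding and extraction are more robust.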
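Preferably, before step S10 the method further comprises: acquiring the Western-language image; and denoising the Western-language image to obtain a binarized Western-language image. This step preprocesses the Western-language image and is easy to implement on a computer. Through denoising, this preferred embodiment obtains a good binary Western-language text image.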
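The preprocessing step can be sketched as follows. The text does not say which denoising or binarization technique is used, so the fixed global threshold of 128, the removal of isolated black pixels, and the function names `binarize` and `remove_specks` are all assumptions made for illustration only.

```python
def binarize(gray, threshold=128):
    """Map rows of 0..255 intensities to rows of 0/1, where 1 means black.

    The global threshold is an assumed choice, not taken from the source.
    """
    return [[1 if v < threshold else 0 for v in row] for row in gray]


def remove_specks(img):
    """Clear every black pixel that has no black pixel among its 8 neighbours."""
    h, w = len(img), len(img[0])

    def has_neighbour(y, x):
        return any(
            img[j][i]
            for j in range(max(0, y - 1), min(h, y + 2))
            for i in range(max(0, x - 1), min(w, x + 2))
            if (j, i) != (y, x)
        )

    return [
        [1 if img[y][x] and has_neighbour(y, x) else 0 for x in range(w)]
        for y in range(h)
    ]
```

A real system would likely need something gentler than speck removal, since it can also erase very small genuine marks; the sketch only shows where denoising sits in the flow.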
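Preferably, step S20 comprises: classifying the character image blocks into punctuation image blocks and valid character image blocks. Western-language text consists mainly of punctuation and letters; punctuation marks are usually small and unsuitable for watermark embedding. This preferred embodiment classifies the character image blocks so that punctuation can be excluded.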
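Preferably, classifying the character image blocks into punctuation image blocks and valid character image blocks comprises:
Evaluating the following conditions:
Condition 1: $w > N_{t1} \times H$;
Condition 2: the bottom edge and the top edge of U both fall on the same side of m;
Condition 3: the bottom edge and the top edge of U fall on opposite sides of m, and $w < N_{t2} \times H$;
where U is the bounding rectangle of a character image block in the set Ω of character image blocks, H is the row height of the row containing U, m is the midline position of the row containing U, h and w are the height and width of U respectively, and $N_{t1}$ and $N_{t2}$ are preset coefficients;
If any one of conditions 1-3 holds, the character image block corresponding to U is determined to be a punctuation image block; if none of conditions 1-3 holds, the character image block corresponding to U is determined to be a valid character image block.
This preferred embodiment gives a concrete numerical decision procedure, which is convenient to program on a computer.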
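The three conditions can be sketched in Python as follows. This is a minimal illustration rather than the patent's implementation: the `Box` type, the function name `is_punctuation`, the image-coordinate convention (y grows downward), and the choice to count an edge lying exactly on the midline as being on the same side are assumptions. Only the conditions themselves and the coefficients 4 and 0.35 come from the text.

```python
from dataclasses import dataclass

N_T1 = 4.0   # coefficient for condition 1 (preferred value in the text)
N_T2 = 0.35  # coefficient for condition 3 (preferred value in the text)


@dataclass
class Box:
    """Bounding rectangle U of a character image block (hypothetical type)."""
    x: int    # left edge
    top: int  # top edge; y grows downward (assumed convention)
    w: int    # width
    h: int    # height

    @property
    def bottom(self) -> int:
        return self.top + self.h


def is_punctuation(u: Box, row_height: float, midline: float,
                   n_t1: float = N_T1, n_t2: float = N_T2) -> bool:
    """Return True if any of conditions 1-3 holds for U."""
    # Condition 1: the block is much wider than the row is tall.
    cond1 = u.w > n_t1 * row_height
    # Condition 2: top and bottom edges lie on the same side of the midline.
    cond2 = u.bottom <= midline or u.top >= midline
    # Condition 3: the block straddles the midline but is very narrow.
    cond3 = (not cond2) and u.w < n_t2 * row_height
    return cond1 or cond2 or cond3
```

For a row with `row_height=40` and `midline=20`, a letter-sized box such as `Box(0, 5, 20, 30)` is kept as valid, while a low, small box such as `Box(0, 30, 5, 8)` is classed as punctuation by condition 2.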
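Preferably, $N_{t1} = 4$ and $N_{t2} = 0.35$. These are the best coefficients the inventors obtained after extensive experiments. Setting the coefficients to values near these is of course also feasible and remains within the spirit of the invention.
Preferably, step S30 comprises: computing $H_s = \frac{1}{m}\sum_{i=1}^{m} h_i$, where $h_1, h_2, \ldots, h_m$ are the heights of the valid character image blocks 1, 2, ..., m in the row containing U; if $H_s > Th_{size}$, determining that the row containing U is a large-font row, otherwise a small-font row, where $Th_{size}$ is a preset threshold; counting the number of large-font rows $N_{large}$ and the number of small-font rows $N_{small}$ in the Western-language image; and if $N_{large} > N_{small}$, determining that the Western-language image is a large-font document, otherwise a small-font document.
This preferred embodiment gives a concrete numerical decision procedure, which is convenient to program on a computer.
Preferably, $Th_{size} = 88$. This is the best threshold the inventors obtained after extensive experiments. Setting the threshold to a value near this is of course also feasible, remains within the spirit of the invention, and should be protected by the claims.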
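Fig. 2 is a flowchart of a method for distinguishing large-font and small-font documents provided by an embodiment of the invention. In this embodiment, a preliminary row and column segmentation yields the bounding rectangles of a number of character image blocks. The statistical features of the character image blocks are used to classify each row as a large-font row or a small-font row, and the document is then classified as a large-font or small-font document. The method comprises the following steps:
S201. Obtain the character image blocks.
A preliminary row and column segmentation yields the bounding rectangles of a number of character image blocks. As shown in Fig. 4A, for the current set Ω of character image blocks, H is the row height of the current row and m is the midline position of the current row. As shown in Fig. 4B, a character image block U is any one of the bounding rectangles in Ω. h and w are the height and width of U respectively.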
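One common way to perform a preliminary row and column segmentation is with projection profiles, sketched below. The text only states that rows and columns are segmented; the use of projection profiles, the names `runs` and `segment`, the `(x, top, w, h)` box layout, and the definition of the midline as the vertical center of the row band are assumptions made for illustration.

```python
def runs(profile):
    """Return [start, end) index pairs of maximal runs where profile > 0."""
    out, start = [], None
    for i, v in enumerate(profile):
        if v > 0:
            if start is None:
                start = i
        elif start is not None:
            out.append((start, i))
            start = None
    if start is not None:
        out.append((start, len(profile)))
    return out


def segment(img):
    """Segment a binary image (rows of 0/1, 1 = black) into rows and boxes.

    Returns one dict per text row with its top, height H, midline m (assumed
    to be the vertical center of the row) and a list of boxes (x, top, w, h).
    """
    width = len(img[0])
    rows = []
    # Horizontal projection: non-empty bands of scanlines are text rows.
    for r0, r1 in runs([sum(row) for row in img]):
        band = img[r0:r1]
        # Vertical projection inside the band: non-empty runs are blocks.
        col_profile = [sum(row[c] for row in band) for c in range(width)]
        boxes = []
        for c0, c1 in runs(col_profile):
            # Tighten the box vertically to the ink it actually contains.
            ys = [y for y in range(r0, r1) if any(img[y][c0:c1])]
            boxes.append((c0, ys[0], c1 - c0, ys[-1] - ys[0] + 1))
        rows.append({"top": r0, "height": r1 - r0,
                     "midline": r0 + (r1 - r0) / 2, "boxes": boxes})
    return rows
```

Touching letters end up in a single box under this scheme, which is exactly the instability the later grouping steps are designed to tolerate.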
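S202. Separate out the punctuation character image blocks.
If a character image block U satisfies any one of the following three conditions, U is marked as a punctuation character image block; otherwise it is marked as a valid character image block.
Condition 1: $w > N_{t1} \times H$;
Condition 2: the bottom edge and the top edge of the character image block U both fall on the same side of the center line;
Condition 3: the bottom edge and the top edge of the character image block U fall on opposite sides of the center line, and $w < N_{t2} \times H$.
Typically, $N_{t1} = 4$ and $N_{t2} = 0.35$.
Condition 1 screens out punctuation shaped like an underline "_"; condition 2 screens out punctuation shaped like commas, periods and quotation marks; condition 3 screens out punctuation shaped like the hyphen "-".
After this step, every character image block U has been classified as either a punctuation image block or a valid character image block.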
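S203. Compute the statistical features of the valid character image blocks.
Suppose the current row contains m valid character image blocks U. The effective height $H_s$ of the valid character image blocks of the current row is then:
$H_s = \frac{1}{m}\sum_{i=1}^{m} h_i$
where $h_1, h_2, \ldots, h_m$ are the heights of the valid character image blocks 1, 2, ..., m of the current row.
S204. Classify the row as a large-font row or a small-font row.
If $H_s > Th_{size}$, the current row is a large-font row; otherwise it is a small-font row. Typically, $Th_{size} = 88$.
S205. Determine the document type.
Count the number of large-font rows $N_{large}$ and the number of small-font rows $N_{small}$ in the document. If $N_{large} > N_{small}$, the document is a large-font document; otherwise it is a small-font document.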
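Steps S203 to S205 can be sketched as below. The function names and the `"large"`/`"small"` labels are hypothetical, rows without any valid block are skipped as an assumption, and the text does not state the unit of the threshold 88 (presumably pixels at the authors' scan resolution).

```python
TH_SIZE = 88  # preset threshold from the text; unit not stated in the source


def effective_height(heights):
    """H_s: mean height of the valid character image blocks of one row."""
    return sum(heights) / len(heights)


def is_large_row(heights, th_size=TH_SIZE):
    """A row is a large-font row when H_s strictly exceeds the threshold."""
    return effective_height(heights) > th_size


def classify_document(rows, th_size=TH_SIZE):
    """rows: one list of valid-block heights per text row.

    Returns "large" if large-font rows outnumber small-font rows, else
    "small" (a tie therefore counts as small, following the text).
    """
    rows = [r for r in rows if r]  # assumption: ignore rows with no valid block
    n_large = sum(1 for heights in rows if is_large_row(heights, th_size))
    n_small = len(rows) - n_large
    return "large" if n_large > n_small else "small"
```

Because both comparisons are strict, a row whose mean height equals the threshold and a document with equally many rows of each kind both fall on the small-font side.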
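Preferably, step S50 comprises: dividing the word group equally into a fixed number of parts according to the effective length of its words under column projection; and combining the width occupied by each part with the maximum height of the characters it covers into a bounding rectangle, the bounding rectangle corresponding to a watermark image block.
Fig. 3 is a flowchart of a valid character region grouping method provided by an embodiment of the invention. For large-font and small-font documents, different numbers of words form a group. Each group is divided equally into a fixed number of parts according to the effective length of its words under column projection. The width occupied by each part and the maximum height of the characters it covers form a new bounding rectangle, and each new bounding rectangle corresponds to one watermark image block, which completes the grouping of the valid region. The method comprises the following steps:
S301. Obtain the document type and the word segmentation result.
The document type, large-font or small-font, has already been obtained in step S30. For word segmentation, the distances between the bounding rectangles of all adjacent valid character image blocks in a row are sorted in ascending order. The resulting sequence can be regarded as containing two classes of data: gaps between characters within a word, and gaps between words. Gaps between words are clearly larger than gaps within a word. As shown in Fig. 9, a segmentation threshold is chosen in this ascending sequence to separate the two classes. Obtaining an accurate and stable segmentation threshold is therefore the key to word segmentation.
The two classes in the sequence can be separated with an image binarization method such as Otsu's method or the bimodal method. Alternatively, one can exploit the statistical property that the subsequences to the left and right of the threshold both have small variance, and determine the best segmentation threshold as the point in the sequence where the sum of the left and right variances is smallest. Once the threshold is obtained, character image blocks separated by gaps smaller than the threshold are merged into one word. The final word segmentation result is shown in Fig. 5.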
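The variance-based variant can be sketched as follows. The text specifies minimizing the sum of the left and right variances over the sorted gap sequence; placing the threshold at the midpoint between the two gaps around the best split, the `(x, w)` box layout, and the function names are assumptions.

```python
def variance(xs):
    """Population variance of a non-empty list."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def split_threshold(gaps):
    """Pick the split of the sorted gaps with the smallest left+right variance.

    Returns a threshold halfway between the largest intra-word gap and the
    smallest inter-word gap (the midpoint is an assumed convention).
    """
    s = sorted(gaps)
    if len(s) < 2:
        raise ValueError("need at least two gaps to find a split")
    best = min(range(1, len(s)),
               key=lambda k: variance(s[:k]) + variance(s[k:]))
    return (s[best - 1] + s[best]) / 2


def split_words(boxes, threshold):
    """boxes: (x, w) pairs sorted by x. Gaps below the threshold join a word."""
    words = [[boxes[0]]]
    for prev, cur in zip(boxes, boxes[1:]):
        gap = cur[0] - (prev[0] + prev[1])
        if gap < threshold:
            words[-1].append(cur)
        else:
            words.append([cur])
    return words
```

The unweighted variance sum follows the wording of the text; Otsu's method would weight each side by its share of the samples, which can give a different split on unbalanced data.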
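S302. Locally adjust some characters.
Valid character image blocks whose spacing lies near the word segmentation threshold are classed as expand-spacing characters. Valid character image blocks whose spacing is so small that they are very likely to touch after printing are classed as shrink-spacing characters. According to these attributes, the corresponding characters are moved in the original document sequence. An expand-spacing character is moved to the right, and all document content to its right moves right with it. A shrink-spacing character is moved to the left, and all document content to its right moves left with it.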
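S303. Group the words.
As shown in Fig. 6, for large-font and small-font documents, different numbers of words form one word group G. In a large-font document $N_{t3}$ words form a group, and in a small-font document $N_{t4}$ words form a group. Typically, $N_{t3} = 3$ and $N_{t4} = 4$.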
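The grouping rule can be sketched as below. The group sizes 3 and 4 come from the text; the function name, the `"large"`/`"small"` labels, and the decision to drop a trailing incomplete group are assumptions, since the text does not say how leftover words are handled.

```python
N_T3 = 3  # words per group in a large-font document (value from the text)
N_T4 = 4  # words per group in a small-font document (value from the text)


def group_words(words, doc_type):
    """Split a word sequence into consecutive groups of N_T3 or N_T4 words.

    A trailing group with fewer words is dropped (assumed behaviour), so
    that every group covers a comparable amount of ink.
    """
    n = N_T3 if doc_type == "large" else N_T4
    return [words[i:i + n] for i in range(0, len(words) - n + 1, n)]
```

For example, eight words yield two groups of four in a small-font document, and two groups of three (with two words left over) in a large-font one.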
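S304. Compute the effective character length.
As shown in Fig. 7A, for a word group G containing m valid character image blocks U in total, the effective length L is:
$L = \sum_{i=1}^{m} w_i$
where $w_i$ is the width of the i-th valid character image block. The group is divided equally into a fixed number of parts S, so the effective length of each part is $L_s = L/S$. The value of S depends on the watermark processing. For example, in the patent with application number 200710121642.7, S equals the order of the Hadamard matrix used. Typically, S = 4.
S305. Obtain the watermark image blocks.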
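As shown in Fig. 7B, the sequence of valid character image blocks is partitioned according to the effective length $L_s$ of each part, and the width of each part is combined with the maximum height of the characters it contains to form a new bounding rectangle. Each new bounding rectangle corresponds to one watermark image block.
Every S watermark image blocks form a group, and digital watermark embedding and extraction are performed on the grouped watermark image blocks.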
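Steps S304 and S305 can be sketched as follows. The effective length is taken as the sum of the box widths with gaps excluded, as the text describes; the function name, the `(x, top, w, h)` input layout, the `(x0, top, x1, bottom)` output layout, and the small tolerance used to ignore floating-point slivers are assumptions.

```python
def watermark_blocks(boxes, s=4):
    """Divide one word group into s watermark image blocks.

    boxes: (x, top, w, h) of the group's valid character blocks, sorted by x.
    Returns s rectangles (x0, top, x1, bottom). Each rectangle covers an
    equal share L_s = L / s of the effective length L = sum of widths, and
    spans the full height range of the characters it touches.
    """
    total = sum(b[2] for b in boxes)  # effective length L, gaps excluded
    part = total / s                  # effective length of one part, L_s
    blocks = []
    for k in range(s):
        lo, hi = k * part, (k + 1) * part
        x0 = x1 = top = bottom = None
        offset = 0.0  # left edge of the current box in effective-length units
        for x, t, w, h in boxes:
            a, b = max(lo, offset), min(hi, offset + w)
            if b - a > 1e-9:  # this box contributes ink to part k
                if x0 is None:
                    x0 = x + (a - offset)
                x1 = x + (b - offset)
                top = t if top is None else min(top, t)
                bottom = t + h if bottom is None else max(bottom, t + h)
            offset += w
        blocks.append((x0, top, x1, bottom))
    return blocks
```

Because the cut positions are measured along the ink widths only, a change in the gaps between characters shifts the rectangles but does not change which share of each character falls into which block.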
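Figs. 8A and 8B show the watermark image blocks obtained for a small-font document before and after print-scan, respectively. Fig. 8C shows the watermark image blocks obtained when the document contains abnormal spacing and some touching characters. Figs. 8D and 8E show the watermark image blocks obtained for a large-font document before and after print-scan, respectively. It can be seen that the watermark image blocks obtained by the invention withstand the print-scan operation and are not disturbed by touching characters or by the difference between large-font and small-font documents.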
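Fig. 10 is a schematic diagram of a watermark image block division device for Western-language watermark processing provided by an embodiment of the invention, comprising:
a segmentation module 10 for segmenting a Western-language image by rows and columns to obtain a plurality of character image blocks;
an identification module 20 for identifying valid character image blocks among the character image blocks;
a statistics module 30 for gathering statistics on the sizes of the valid character image blocks to determine whether the Western-language image is a large-font document or a small-font document;
a grouping module 40 for grouping words using different numbers of words for large-font documents and small-font documents respectively;
an equal-division module 50 for dividing each word group equally into a plurality of parts, the parts corresponding to watermark image blocks.
This embodiment keeps watermark embedding workable, makes the watermark image blocks resynchronize well, and makes watermark embedding and extraction more robust.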
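The five-module structure can be sketched as a simple pipeline object. The class name `BlockDivisionDevice`, its field names, and the callable signatures are hypothetical; the sketch only shows how the output of each module feeds the next, with the actual processing supplied from outside.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class BlockDivisionDevice:
    """Wires modules 10-50 together; each module is an injected callable."""
    segment: Callable[[Any], Any]      # module 10: image -> character blocks
    identify: Callable[[Any], Any]     # module 20: blocks -> valid blocks
    classify: Callable[[Any], str]     # module 30: valid blocks -> doc type
    group: Callable[[Any, str], List]  # module 40: valid blocks -> word groups
    divide: Callable[[Any], Any]       # module 50: word group -> blocks

    def run(self, image):
        blocks = self.segment(image)
        valid = self.identify(blocks)
        doc_type = self.classify(valid)
        groups = self.group(valid, doc_type)
        return [self.divide(g) for g in groups]
```

Passing the modules in as callables keeps the sketch independent of any particular segmentation or classification routine.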
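Preferably, the identification module 20 comprises: a judging module for evaluating the following conditions: Condition 1: $w > N_{t1} \times H$; Condition 2: the bottom edge and the top edge of U both fall on the same side of m; Condition 3: the bottom edge and the top edge of U fall on opposite sides of m, and $w < N_{t2} \times H$; where U is the bounding rectangle of a character image block in the set Ω of character image blocks, H is the row height of the row containing U, m is the midline position of the row containing U, h and w are the height and width of U respectively, and $N_{t1}$ and $N_{t2}$ are preset coefficients; and a determining module for determining that the character image block corresponding to U is a punctuation image block if any one of conditions 1-3 holds, and that it is a valid character image block if none of conditions 1-3 holds.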
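Preferably, the statistics module 30 comprises: a computing module for computing $H_s = \frac{1}{m}\sum_{i=1}^{m} h_i$, where $h_1, h_2, \ldots, h_m$ are the heights of the valid character image blocks 1, 2, ..., m in the row containing U; a row determining module for determining that the row containing U is a large-font row if $H_s > Th_{size}$ and a small-font row otherwise, where $Th_{size}$ is a preset threshold; a row counting module for counting the number of large-font rows $N_{large}$ and the number of small-font rows $N_{small}$ in the Western-language image; and a document determining module for determining that the Western-language image is a large-font document if $N_{large} > N_{small}$ and a small-font document otherwise.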
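As the above description shows, the embodiments of the invention ultimately improve the extraction accuracy of watermark processing for Western-language images. The invention takes into account how Western words behave under row and column segmentation in different fonts, derives a threshold that separates large-font documents from small-font documents, and uses that threshold to determine the document type. Different numbers of words are grouped according to the document type, so the black-pixel counts of the grouped watermark image blocks differ little, which greatly improves the robustness of the watermark processing. The invention also takes into account that the spacing between Western characters is unstable across printing: at embedding time it considers the local behavior of the word segmentation threshold and finely adjusts the distances between the characters inside words whose gaps lie near the threshold. This keeps the word segmentation result consistent before and after print-scan, makes the watermark image blocks resynchronize well, and further improves the robustness of watermark embedding and extraction.
Those skilled in the art will understand that the modules or steps of the invention described above can be implemented with general-purpose computing devices. They can be concentrated on a single computing device or distributed over a network of several computing devices. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by a computing device; or they can each be made into a separate integrated circuit module; or several of the modules or steps can be made into a single integrated circuit module. The invention is thus not limited to any particular combination of hardware and software.
The above are only preferred embodiments of the invention and do not limit it. Those skilled in the art may make various modifications and variations to the invention. Any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within its scope of protection.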

Claims

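1. A watermark image block division method for Western-language watermark processing, characterized by comprising:
segmenting a Western-language image by rows and columns to obtain a plurality of character image blocks;
identifying valid character image blocks among the character image blocks;
gathering statistics on the sizes of the valid character image blocks to determine whether the Western-language image is a large-font document or a small-font document;
grouping words using different numbers of words for the large-font document and the small-font document respectively;
dividing each word group equally into a plurality of parts, the parts corresponding to watermark image blocks.
2. The method according to claim 1, characterized in that identifying valid character image blocks among the character image blocks comprises:
classifying the character image blocks into punctuation image blocks and the valid character image blocks.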
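3. The method according to claim 2, characterized in that classifying the character image blocks into punctuation image blocks and the valid character image blocks comprises:
evaluating the following conditions:
Condition 1: $w > N_{t1} \times H$;
Condition 2: the bottom edge and the top edge of U both fall on the same side of m;
Condition 3: the bottom edge and the top edge of U fall on opposite sides of m, and $w < N_{t2} \times H$;
where U is the bounding rectangle of a character image block in the set Ω of the character image blocks, H is the row height of the row containing U, m is the midline position of the row containing U, h and w are the height and width of U respectively, and $N_{t1}$ and $N_{t2}$ are preset coefficients;
if any one of conditions 1-3 holds, determining that the character image block corresponding to U is a punctuation image block, and if none of conditions 1-3 holds, determining that the character image block corresponding to U is a valid character image block.
4. The method according to claim 3, characterized in that $N_{t1} = 4$ and $N_{t2} = 0.35$.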
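5. The method according to claim 3, characterized in that gathering statistics on the sizes of the valid character image blocks to determine whether the Western-language image is a large-font document or a small-font document comprises:
computing $H_s = \frac{1}{m}\sum_{i=1}^{m} h_i$, where $h_1, h_2, \ldots, h_m$ are the heights of the valid character image blocks 1, 2, ..., m in the row containing U;
if $H_s > Th_{size}$, determining that the row containing U is a large-font row, otherwise a small-font row, where $Th_{size}$ is a preset threshold;
counting the number of large-font rows $N_{large}$ and the number of small-font rows $N_{small}$ in the Western-language image;
if $N_{large} > N_{small}$, determining that the Western-language image is the large-font document, otherwise the small-font document.
6. The method according to claim 5, characterized in that $Th_{size} = 88$.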
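7. The method according to claim 1, characterized in that dividing each word group equally into a plurality of parts, the parts corresponding to watermark image blocks, comprises:
dividing the word group equally into a fixed number of parts according to the effective length of its words under column projection;
combining the width occupied by each part with the maximum height of the characters it covers into a bounding rectangle, the bounding rectangle corresponding to the watermark image block.
8. The method according to claim 1, characterized in that, before segmenting the Western-language image by rows and columns to obtain a plurality of character image blocks, the method further comprises:
acquiring the Western-language image;
denoising the Western-language image to obtain the binarized Western-language image.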
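9. A watermark image block division device for Western-language watermark processing, characterized by comprising:
a segmentation module for segmenting a Western-language image by rows and columns to obtain a plurality of character image blocks;
an identification module for identifying valid character image blocks among the character image blocks;
a statistics module for gathering statistics on the sizes of the valid character image blocks to determine whether the Western-language image is a large-font document or a small-font document;
a grouping module for grouping words using different numbers of words for the large-font document and the small-font document respectively;
an equal-division module for dividing each word group equally into a plurality of parts, the parts corresponding to watermark image blocks.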
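10. The device according to claim 9, characterized in that the identification module comprises:
a judging module for evaluating the following conditions:
Condition 1: $w > N_{t1} \times H$;
Condition 2: the bottom edge and the top edge of U both fall on the same side of m;
Condition 3: the bottom edge and the top edge of U fall on opposite sides of m, and $w < N_{t2} \times H$;
where U is the bounding rectangle of a character image block in the set Ω of the character image blocks, H is the row height of the row containing U, m is the midline position of the row containing U, h and w are the height and width of U respectively, and $N_{t1}$ and $N_{t2}$ are preset coefficients;
a determining module for determining that the character image block corresponding to U is a punctuation image block if any one of conditions 1-3 holds, and that the character image block corresponding to U is a valid character image block if none of conditions 1-3 holds.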
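11. The device according to claim 10, characterized in that the statistics module comprises:
a computing module for computing $H_s = \frac{1}{m}\sum_{i=1}^{m} h_i$, where $h_1, h_2, \ldots, h_m$ are the heights of the valid character image blocks 1, 2, ..., m in the row containing U;
a row determining module for determining that the row containing U is a large-font row if $H_s > Th_{size}$ and a small-font row otherwise, where $Th_{size}$ is a preset threshold;
a row counting module for counting the number of large-font rows $N_{large}$ and the number of small-font rows $N_{small}$ in the Western-language image;
a document determining module for determining that the Western-language image is the large-font document if $N_{large} > N_{small}$ and the small-font document otherwise.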
PCT/CN2011/084577 2010-12-23 2011-12-23 Watermarking image block division method and device for western language watermarking processing Ceased WO2012083882A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US13/997,258 US9111341B2 (en) 2010-12-23 2011-12-23 Watermarking image block division method and device for western language watermarking processing
JP2013545033A JP5669957B2 (ja) 2010-12-23 2011-12-23 西洋語の透かし処理をするための透かし画像の分割方法と装置
EP11852045.1A EP2657902B1 (en) 2010-12-23 2011-12-23 Watermarking image block division method and device for western language watermarking processing

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201010620424.X 2010-12-23
CN201010620424.XA CN102567938B (zh) 2010-12-23 2010-12-23 Watermarking image block division method and device for western language watermarking processing

Publications (1)

Publication Number Publication Date
WO2012083882A1 true WO2012083882A1 (zh) 2012-06-28

Family

ID=46313189

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2011/084577 Ceased WO2012083882A1 (zh) 2010-12-23 2011-12-23 Watermarking image block division method and device for western language watermarking processing

Country Status (5)

Country Link
US (1) US9111341B2 (zh)
EP (1) EP2657902B1 (zh)
JP (1) JP5669957B2 (zh)
CN (1) CN102567938B (zh)
WO (1) WO2012083882A1 (zh)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105224833B (zh) * 2014-06-30 2018-03-30 北京金山安全软件有限公司 利用数字水印识别应用程序是否是正版的方法及系统
CN105631486A (zh) * 2014-10-27 2016-06-01 深圳Tcl数字技术有限公司 图像文字识别方法及装置
GB2572386B (en) * 2018-03-28 2021-05-19 Canon Europa Nv An image processing system and an image processing method
US10939013B2 (en) 2018-09-07 2021-03-02 International Business Machines Corporation Encoding information within features associated with a document
WO2021056183A1 (en) * 2019-09-24 2021-04-01 Citrix Systems, Inc. Watermarks for text content
CN113450243A (zh) * 2020-03-24 2021-09-28 北京四维图新科技股份有限公司 水印添加方法和设备
CN114596188A (zh) * 2022-02-22 2022-06-07 北京百度网讯科技有限公司 水印检测方法、模型训练方法、装置及电子设备
CN120997026B (zh) * 2025-10-22 2026-03-06 浣江实验室 一种针对文本图像的抗打印扫描数字水印的方法

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1700205A (zh) * 2005-06-24 2005-11-23 清华大学 一种在英文文本中嵌入和提取水印的方法
CN101169779A (zh) * 2007-11-30 2008-04-30 清华大学 在英文文本中嵌入和提取频域水印的方法

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5848191A (en) * 1995-12-14 1998-12-08 Xerox Corporation Automatic method of generating thematic summaries from a document image without performing character recognition
JP3373811B2 (ja) * 1999-08-06 2003-02-04 インターナショナル・ビジネス・マシーンズ・コーポレーション 白黒2値文書画像への透かし情報埋め込み・検出方法及びその装置
JP2003230001A (ja) * 2002-02-01 2003-08-15 Canon Inc 文書用電子透かし埋め込み装置及び文書用電子透かし抽出装置並びにそれらの制御方法
US8127137B2 (en) * 2004-03-18 2012-02-28 Digimarc Corporation Watermark payload encryption for media including multiple watermarks
JP4298588B2 (ja) * 2004-05-31 2009-07-22 株式会社リコー 情報検出装置および情報検出方法
US7644281B2 (en) * 2004-09-27 2010-01-05 Universite De Geneve Character and vector graphics watermark for structured electronic documents security
CN1897522B (zh) * 2005-07-15 2010-05-05 国际商业机器公司 水印嵌入和/或检测的方法、装置及系统
JP2008098946A (ja) * 2006-10-11 2008-04-24 Canon Inc 画像処理装置及びその制御方法
JP2009141525A (ja) * 2007-12-04 2009-06-25 Canon Inc 画像処理装置及び画像処理方法
CN101251892B (zh) * 2008-03-07 2010-06-09 北大方正集团有限公司 一种字符切分方法和装置
JP2010124451A (ja) * 2008-10-24 2010-06-03 Canon Inc 文書処理装置および文書処理方法

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP2657902A4

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102938841A (zh) * 2012-11-30 2013-02-20 西安空间无线电技术研究所 在承载图像中隐藏信息、图像质量评价及信息传输方法
CN102938841B (zh) * 2012-11-30 2015-02-11 西安空间无线电技术研究所 在承载图像中隐藏信息、图像质量评价及信息传输方法
TWI643159B (zh) * 2017-11-16 2018-12-01 國立臺北科技大學 基於奇偶特性隱寫資料於區塊截斷編碼影像的方法、影像壓縮裝置及電腦可讀取的記錄媒體
CN108830772A (zh) * 2018-05-25 2018-11-16 珠海奔图电子有限公司 水印编码转换方法及装置

Also Published As

Publication number Publication date
JP5669957B2 (ja) 2015-02-18
US20140003649A1 (en) 2014-01-02
CN102567938A (zh) 2012-07-11
EP2657902B1 (en) 2017-03-08
EP2657902A4 (en) 2013-12-25
US9111341B2 (en) 2015-08-18
CN102567938B (zh) 2014-05-14
EP2657902A1 (en) 2013-10-30
JP2014500688A (ja) 2014-01-09

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11852045

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2013545033

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

REEP Request for entry into the european phase

Ref document number: 2011852045

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 2011852045

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 13997258

Country of ref document: US