EP0059239A2 - Method for locating and delimiting text areas on a document that may contain text, graphics and/or image areas - Google Patents

Info

Publication number
EP0059239A2
Authority
EP
European Patent Office
Prior art keywords
black
line
text
numbering
area
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP81108441A
Other languages
German (de)
English (en)
Other versions
EP0059239A3 (fr)
Inventor
Wolfgang Dipl.-Ing. Scherl
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Siemens AG
Siemens Corp
Original Assignee
Siemens AG
Siemens Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Siemens AG, Siemens Corp
Publication of EP0059239A2
Publication of EP0059239A3
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/413 Classification of content, e.g. text, photographs or tables

Definitions

  • The present invention relates to a method for locating and delimiting text areas on a template that can contain text, graphics and/or image areas.
  • The object of the present invention is to provide a method by which the above-mentioned processes can be carried out simply, quickly and reliably.
  • This object is achieved by a method according to the preamble of the main claim, characterized by the features specified in the characterizing part of the main claim.
  • The invention offers the advantage of a relatively simple method, which consequently requires only a relatively simple arrangement to carry it out, in particular given the availability of inexpensive and space-saving data processing aids such as microcomputers.
  • In a first step, the template (cf. Fig. 5 or Fig. 6) is scanned optoelectronically, preferably by means of a video camera, in a manner known per se.
  • The signals, which arise in analog form and represent the optoelectronic image of the template, are each assigned either a binary number, preferably 1, representing a "white value" or a binary number, preferably 0, representing a "black value".
  • In a third step, all points determined as "black" in the decision-making process and represented by the corresponding binary number are expanded according to a predetermined rule, so that each point generates a horizontal line of predetermined length. If, for example, a text line is present in the scanned area of the template, this produces black blocks whose length corresponds to the length of the text line plus the specified extension length for the "black" points.
  • For the intermediate result line thus obtained, each "white" point is expanded in the opposite direction by a similar method to form a line of predetermined length. This length is greater than the previous black extension, so that the previously produced black block is shortened accordingly.
  • A black block freed from minor discontinuities in this way is then lengthened by the difference from the original text line length, so that a black block extending over the entire text line length is created.
  • An operator (see Fig. 1) is then used to check whether horizontal white/black transitions characteristic of text areas are present, with a predetermined white column length LW and a predetermined black column length LB.
  • A vertical black line of predetermined length LO (see Fig. 2) is generated, so that each line of text yields a black block isolated from its surroundings (see Figs. 9 and 10).
  • The left-hand and right-hand extreme coordinates of the black blocks formed in this way are determined using an area tracking method (see Figs. 11 and 12) and combined into a list.
  • The extreme coordinates calculated in this way are examined using statistical test methods to determine whether they actually delimit a text part, cf.
  • The operator is formed by storing a window of LW + LB scan lines in a "rolling" manner and inverting the first LW scan lines, so that the operator condition can be checked by column-by-column summation: the condition is fulfilled for every column whose column sum is 0.
  • The vertical black line LO can be produced line by line by providing a counter for each output column, which is cleared at the beginning of the overall process. This counter is decremented by 1 with each line, but is set to the length LO when the operator condition is met. An area in an operator output line is classified as belonging to a black block as long as the value of the respective column counter is greater than 0.
  • The output line supplied by the operator is examined for black components.
  • Each new beginning of a black area is followed in an area tracking process, in which its extreme coordinates are calculated.
  • Each new beginning of a black area is assigned a number that is incremented by 1.
  • Within a scan line, an uninterrupted black area carries the same number throughout. If the black area to be newly numbered touches a black area already numbered in the previous line, it adopts that area's number, so that an existing numbering is continued line by line over the whole black area. If a black area of the new line touches several black areas of the old line, the numbering of the leftmost area is continued.
  • This numbering is used for the area to be newly numbered. For each number assigned, a table entry is made that records the extreme coordinates occurring under that number. When a numbered area passes into another numbering, its extreme coordinates are added to the table entry of the new numbering. Table entries that disappear through such overlaps are deleted from the table once their extreme coordinates have been carried over. When a black area ends and its numbering disappears, the corresponding table entry is transferred to an output list and the black area is regarded as closed. Once a text part has been confirmed as actually present, these entries are used as its extreme coordinates, namely the start and end coordinates of the relevant line of text.
  • One embodiment of the invention provides that the signals, which arise in analog form and represent the optoelectronic image, are divided into "white values" and "black values" by an analog method.
  • Another advantageous embodiment of the invention provides that the signals, which arise in analog form and represent the optoelectronic image, are digitized in a manner known per se and that the digital values obtained in this way are divided into "white values" and "black values" using one of the methods known per se, preferably by comparison with a predetermined digital threshold value.
  • A fixed, predetermined reference value, preferably a threshold value
  • A threshold value can be used in a simple manner for dividing into "white values" and "black values".
  • An adaptable reference value, preferably a threshold value
  • Another advantageous embodiment of the invention provides that a signal representing a setting criterion is continuously derived from the scanning signals during the scanning process and is used to adapt the reference value.
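
The thresholding and the black/white expansion of a scan line described in the definitions above can be sketched for a single line as follows. This is a minimal Python illustration under stated assumptions, not the implementation from the patent: the function names, the threshold of 128, the rightward direction of the black expansion and the example lengths are all hypothetical. The text only fixes that the white expansion runs in the opposite direction and is longer than the black one. For readability the sketch marks black pixels as True instead of using the 0/1 coding preferred in the text.

```python
def binarize(gray_row, threshold=128):
    """Fixed-threshold decision: True marks a "black" point (assumed: dark = low value)."""
    return [value < threshold for value in gray_row]


def expand_right(mask, length):
    """Stretch every True point into a run that reaches `length` pixels further right."""
    out = list(mask)
    reach = 0  # pixels still covered by the most recent True point
    for x, value in enumerate(mask):
        if value:
            reach = length
        elif reach > 0:
            out[x] = True
            reach -= 1
    return out


def expand_left(mask, length):
    """Same expansion in the opposite direction."""
    return expand_right(mask[::-1], length)[::-1]


def smear_row(black, black_ext, white_ext):
    """Black expansion, longer white expansion, then restoring the lost length."""
    if white_ext <= black_ext:
        raise ValueError("white expansion must be longer than black expansion")
    # Black points grow to the right and bridge gaps up to black_ext pixels.
    grown = expand_right(black, black_ext)
    # White points grow to the left by a larger amount, which shortens the
    # black blocks and wipes out blocks that started as small specks.
    white = expand_left([not b for b in grown], white_ext)
    shortened = [not w for w in white]
    # Lengthen the surviving blocks by the difference of the two lengths.
    return expand_right(shortened, white_ext - black_ext)


line = [c == "#" for c in "..##.##...#...."]
result = smear_row(line, black_ext=2, white_ext=4)
print("".join("#" if b else "." for b in result))  # prints "..#####........"
```

In this example the two character fragments are merged into one block covering their full extent, while the isolated single point is removed because it is shorter than the difference between the two expansion lengths.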
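
The operator with its per-column counters and the line-by-line numbering of black areas described in the definitions above can be sketched in Python as follows. This is an illustrative reading under stated assumptions rather than the patented implementation: the function names are hypothetical, the operator is interpreted as LW white scan lines directly above LB black scan lines in a column, counters are assumed to stop at 0, "touching" is taken to mean direct vertical overlap, and each table entry stores a full bounding box. The coding 1 = white, 0 = black follows the preference stated in the text.

```python
from collections import deque

WHITE, BLACK = 1, 0  # binary coding preferred in the text


def operator_blocks(rows, lw, lb, lo):
    """Apply the white/black transition operator with one counter per column."""
    width = len(rows[0])
    window = deque(maxlen=lw + lb)  # "rolling" store of the last LW + LB lines
    counters = [0] * width          # cleared at the start of the process
    output = []
    for row in rows:
        window.append(row)
        # Decrement line by line; stopping at 0 is an assumption of this sketch.
        counters = [max(c - 1, 0) for c in counters]
        if len(window) == lw + lb:
            for x in range(width):
                inverted = sum(1 - window[i][x] for i in range(lw))
                rest = sum(window[i][x] for i in range(lw, lw + lb))
                if inverted + rest == 0:  # column sum 0: operator condition met
                    counters[x] = lo
        output.append([BLACK if c > 0 else WHITE for c in counters])
    return output


def black_runs(row):
    """Return (start, end) for every uninterrupted black run in a line."""
    runs, start = [], None
    for x, value in enumerate(list(row) + [WHITE]):  # sentinel ends a trailing run
        if value == BLACK and start is None:
            start = x
        elif value != BLACK and start is not None:
            runs.append((start, x - 1))
            start = None
    return runs


def track_areas(rows):
    """Number black areas line by line; return (x_min, y_min, x_max, y_max) per closed area."""
    table, finished = {}, []   # number -> [x_min, y_min, x_max, y_max]
    prev, next_number = [], 1  # runs of the previous line as (start, end, number)
    width = len(rows[0])
    for y, row in enumerate(list(rows) + [[WHITE] * width]):  # extra white line closes all
        cur = []
        for s, e in black_runs(row):
            touched = []
            for ps, pe, n in prev:  # prev is ordered left to right
                if ps <= e and pe >= s and n not in touched:
                    touched.append(n)
            if touched:
                number = touched[0]  # continue the leftmost numbering
            else:
                number, next_number = next_number, next_number + 1
                table[number] = [s, y, e, y]
            box = table[number]
            box[0], box[2], box[3] = min(box[0], s), max(box[2], e), y
            for old in touched[1:]:  # numberings that merge into this one
                ob = table.pop(old)
                box[0], box[1], box[2] = min(box[0], ob[0]), min(box[1], ob[1]), max(box[2], ob[2])
                prev = [(a, b, number if n == old else n) for a, b, n in prev]
                cur = [(a, b, number if n == old else n) for a, b, n in cur]
            cur.append((s, e, number))
        alive = {n for _, _, n in cur}
        for n in [n for n in table if n not in alive]:
            finished.append(tuple(table.pop(n)))  # area closed: move to output list
        prev = cur
    return finished


page = [[1, 1, 1, 1]] * 3 + [[1, 0, 0, 1]] * 4 + [[1, 1, 1, 1]]
print(track_areas(operator_blocks(page, lw=2, lb=2, lo=3)))  # prints [(1, 4, 2, 6)]
```

Only the previous and the current line are kept in memory, which matches the line-by-line character of the description; the table holds one entry per open numbering and releases it as soon as the area is closed.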

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Character Input (AREA)
  • Editing Of Facsimile Originals (AREA)
EP81108441A 1981-02-27 1981-10-16 Method for locating and delimiting text areas on a document that may contain text, graphics and/or image areas Withdrawn EP0059239A3 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
DE3107655 1981-02-27
DE19813107655 DE3107655A1 (de) 1981-02-27 1981-02-27 Method for locating and delimiting text areas on a template that may contain text, graphics and/or image areas

Publications (2)

Publication Number Publication Date
EP0059239A2 (fr) 1982-09-08
EP0059239A3 (fr) 1984-10-24

Family

ID=6126010

Family Applications (1)

Application Number Title Priority Date Filing Date
EP81108441A Withdrawn EP0059239A3 (fr) 1981-02-27 1981-10-16 Method for locating and delimiting text areas on a document that may contain text, graphics and/or image areas

Country Status (3)

Country Link
US (1) US4513442A (fr)
EP (1) EP0059239A3 (fr)
DE (1) DE3107655A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0206214A1 (fr) * 1985-06-18 1986-12-30 Siemens Aktiengesellschaft Method for the uniform symbolic description of document specimens in the form of data structures in an automaton
EP0350933A3 (en) * 1988-07-13 1990-04-25 Matsushita Electric Industrial Co., Ltd. Image signal processing apparatus for bar code image signal

Families Citing this family (34)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE3414455C2 (de) * 1983-04-26 1996-04-25 Wollang Peter Michael Method and device for reading and processing information consisting of decodable text information and/or non-decodable graphic information
DE3538639A1 (de) * 1984-10-31 1986-04-30 Canon K.K., Tokio/Tokyo Image processing system
ES8701398A1 (es) * 1985-05-14 1986-11-16 Intersoftware Sa Method for the automatic reading of images and apparatus for carrying it out
JPS62137974A (ja) * 1985-12-12 1987-06-20 Ricoh Co Ltd Image processing system
US4817166A (en) * 1986-05-05 1989-03-28 Perceptics Corporation Apparatus for reading a license plate
US4764973A (en) * 1986-05-28 1988-08-16 The United States Of America As Represented By The Secretary Of The Air Force Whole word, phrase or number reading
US4736109A (en) * 1986-08-13 1988-04-05 Bally Manufacturing Company Coded document and document reading system
US4866784A (en) * 1987-12-02 1989-09-12 Eastman Kodak Company Skew detector for digital image processing system
US5038381A (en) * 1988-07-11 1991-08-06 New Dest Corporation Image/text filtering system and method
US5081685A (en) * 1988-11-29 1992-01-14 Westinghouse Electric Corp. Apparatus and method for reading a license plate
JP2930612B2 (ja) * 1989-10-05 1999-08-03 株式会社リコー Image forming apparatus
US5048096A (en) * 1989-12-01 1991-09-10 Eastman Kodak Company Bi-tonal image non-text matter removal with run length and connected component analysis
US5065437A (en) * 1989-12-08 1991-11-12 Xerox Corporation Identification and segmentation of finely textured and solid regions of binary images
US5048109A (en) * 1989-12-08 1991-09-10 Xerox Corporation Detection of highlighted regions
US5202933A (en) * 1989-12-08 1993-04-13 Xerox Corporation Segmentation of text and graphics
US5272764A (en) * 1989-12-08 1993-12-21 Xerox Corporation Detection of highlighted regions
US5459826A (en) * 1990-05-25 1995-10-17 Archibald; Delbert M. System and method for preparing text and pictorial materials for printing using predetermined coding and merging regimen
US5151949A (en) * 1990-10-10 1992-09-29 Fuji Xerox Co., Ltd. System and method employing multiple predictor sets to compress image data having different portions
US5193122A (en) * 1990-12-03 1993-03-09 Xerox Corporation High speed halftone detection technique
US5315668A (en) * 1991-11-27 1994-05-24 The United States Of America As Represented By The Secretary Of The Air Force Offline text recognition without intraword character segmentation based on two-dimensional low frequency discrete Fourier transforms
JPH05151254A (ja) * 1991-11-27 1993-06-18 Hitachi Ltd Document processing method and system
US5325441A (en) * 1992-01-31 1994-06-28 Westinghouse Electric Corporation Method for automatically indexing the complexity of technical illustrations for prospective users
US6002798A (en) * 1993-01-19 1999-12-14 Canon Kabushiki Kaisha Method and apparatus for creating, indexing and viewing abstracted documents
US5410611A (en) * 1993-12-17 1995-04-25 Xerox Corporation Method for identifying word bounding boxes in text
JP3772262B2 (ja) * 1994-08-12 2006-05-10 ヒューレット・パッカード・カンパニー Method for identifying the type of an image
US5659767A (en) * 1994-11-10 1997-08-19 Canon Information Systems, Inc. Application programming interface for accessing document analysis functionality of a block selection program
US5745596A (en) * 1995-05-01 1998-04-28 Xerox Corporation Method and apparatus for performing text/image segmentation
US6038340A (en) * 1996-11-08 2000-03-14 Seiko Epson Corporation System and method for detecting the black and white points of a color image
JP3591184B2 (ja) * 1997-01-14 2004-11-17 松下電器産業株式会社 Barcode reading device
US5995661A (en) * 1997-10-08 1999-11-30 Hewlett-Packard Company Image boundary detection for a scanned image
US6832726B2 (en) 2000-12-19 2004-12-21 Zih Corp. Barcode optical character recognition
US7311256B2 (en) * 2000-12-19 2007-12-25 Zih Corp. Barcode optical character recognition
US7082219B2 (en) * 2002-02-04 2006-07-25 The United States Of America As Represented By The Secretary Of The Air Force Method and apparatus for separating text from images
US20050244060A1 (en) * 2004-04-30 2005-11-03 Xerox Corporation Reformatting binary image data to generate smaller compressed image data size

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3496543A (en) * 1967-01-27 1970-02-17 Singer General Precision On-line read/copy data processing system accepting printed and graphic material
US3509415A (en) * 1969-01-13 1970-04-28 Ibm Format scheme for vidicon scanners
US3805237A (en) * 1971-04-30 1974-04-16 Ibm Technique for the conversion to digital form of interspersed symbolic and graphic data
JPS5751310B2 (fr) * 1972-06-07 1982-11-01
DE2326644C3 (de) * 1973-05-25 1981-10-01 Licentia Patent-Verwaltungs-Gmbh, 6000 Frankfurt Method for data compression of message signals
US3987412A (en) * 1975-01-27 1976-10-19 International Business Machines Corporation Method and apparatus for image data compression utilizing boundary following of the exterior and interior borders of objects
DE2516332C2 (de) * 1975-04-15 1987-01-22 Siemens AG, 1000 Berlin und 8000 München Method for coding electrical signals obtained by scanning a graphic pattern with mixed text and image content
US4157533A (en) * 1977-11-25 1979-06-05 Recognition Equipment Incorporated Independent channel automatic gain control for self-scanning photocell array
DE3019836A1 (de) * 1980-05-23 1982-01-21 Siemens AG, 1000 Berlin und 8000 München Method for automatically recognizing image areas and text or graphics areas on print masters

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0206214A1 (fr) * 1985-06-18 1986-12-30 Siemens Aktiengesellschaft Method for the uniform symbolic description of document specimens in the form of data structures in an automaton
EP0350933A3 (en) * 1988-07-13 1990-04-25 Matsushita Electric Industrial Co., Ltd. Image signal processing apparatus for bar code image signal
US5134272A (en) * 1988-07-13 1992-07-28 Matsushita Electric Industrial Co., Ltd. Image signal processing apparatus for bar code image signal

Also Published As

Publication number Publication date
US4513442A (en) 1985-04-23
EP0059239A3 (fr) 1984-10-24
DE3107655A1 (de) 1982-09-16

Similar Documents

Publication Publication Date Title
EP0059239A2 (fr) Method for locating and delimiting text areas on a document that may contain text, graphics and/or image areas
EP0067244A2 (fr) Method for automatically recognizing white blocks as well as text, graphics and/or halftone image areas on printed documents
EP0040796B1 (fr) Method for differentiating between image areas and text or graphics areas on printed originals
DE2558498C2 (de) Device for displaying characters composed of pixels
DE3248928C2 (fr)
DE19530829C2 (de) Method for electronically retrieving information added to a document
DE3107521A1 (de) Method for automatically recognizing image areas and text or graphics areas on print masters
DE3586646T2 (de) Image display device.
DE2144596A1 (de) Video display device
DE3415470A1 (de) Apparatus and method for encoding and storing raster-scan images
DE2909153A1 (de) Device for the electronic processing of image and/or character patterns
DE3335162A1 (de) Device and method for graphic representations by means of computers
DE3625390A1 (de) Graphic display system with arbitrary overlapping of image windows
EP0111026A1 (fr) Method and apparatus for copying retouching in the electronic reproduction of color images
DE2247942A1 (de) Character recognition method for improving the recognizability of degraded characters
DE3101543A1 (de) "Office communication system"
DE3345306A1 (de) Method and device for processing image data
DE68904611T2 (de) Method and device for generating mixed images.
DE3850444T2 (de) Program management method for distributed processing systems and adapted device.
EP0059705A1 (fr) Method and circuit for partial correction of the drawing in the reproduction of color images.
DE69429562T2 (de) A single-pass detection system for image areas enclosed by a marking, and method for a photocopier
DE2618731A1 (de) Method for automatically isolating figures contained in an image and device for carrying out the method
DE2410306A1 (de) Method and device for machine character recognition
DE3882361T2 (de) Image discriminator for reproduction control.
DE3128794A1 (de) Method for locating and delimiting letters and groups of letters or words in text areas of a template which, in addition to text areas, may also contain graphics and/or image areas.

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 19811016

AK Designated contracting states

Designated state(s): AT BE FR GB IT

PUAL Search report despatched

Free format text: ORIGINAL CODE: 0009013

AK Designated contracting states

Designated state(s): AT BE FR GB IT

17Q First examination report despatched

Effective date: 19860422

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 19860501

RIN1 Information on inventor provided before grant (corrected)

Inventor name: SCHERL, WOLFGANG, DIPL.-ING.