EP0059239A2 - Method for locating and delimiting text areas on a document that may contain text, graphics and/or image areas - Google Patents
- Publication number
- EP0059239A2 (application EP81108441A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- black
- line
- text
- numbering
- area
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/413—Classification of content, e.g. text, photographs or tables
Definitions
- the present invention relates to a method for locating and delimiting text areas on a template, which can contain text, graphics and / or image areas.
- the present invention has for its object to provide a method by means of which the above-mentioned processes can be carried out in a simple, quick and reliable manner.
- the stated object is achieved by a method according to the preamble of the main claim, which is characterized by the features specified in the characterizing part of the main claim.
- the invention offers the advantage of a relatively simple method, which consequently requires only a relatively simple arrangement for carrying it out, in particular given the availability of inexpensive and space-saving data processing aids such as microcomputers, and by means of which the stated object can be achieved.
- in a first step, the template (cf. Fig. 5 or Fig. 6) is scanned optoelectronically, preferably by means of a video camera, in a manner known per se.
- the signals which arise in analog form and represent the optoelectronic image of the template are each assigned either a binary number, preferably 1, representing a "white value", or a binary number, preferably 0, representing a "black value".
- in a third step, all the points determined in the decision-making process as "black" and represented by the binary number in question are multiplied according to a predetermined rule, so that a horizontal line of predetermined length is generated from each point. If, for example, a text line is present in the scanned area of the template, this produces black blocks whose length corresponds to the length of the text line plus the specified extension length for the "black" dots.
- in the intermediate result line thus obtained, each "white" point is then expanded in the opposite direction, by a similar method, to form a line of predetermined length, this length being greater than the previous black extension, so that the previously produced black block is shortened by the corresponding length.
- a black block freed from minor discontinuities in this way is lengthened by the difference in length from the original text line length, so that a black block extending over the entire text line length is created.
- an operator (see FIG. 1) is then used to check whether the horizontal white/black transitions characteristic of text areas are present, with a predetermined white column length LW and a predetermined black column length LB.
- a vertical black line LO of predetermined length (see FIG. 2) is generated, so that for each line of text there is a black block isolated from its surroundings (see FIG. 9 and FIG. 10).
- for the black blocks formed in this way, the left-hand and right-hand extreme coordinates are determined using an area tracking method (see FIGS. 11 and 12) and combined into a list.
- the extreme co-ordinates calculated in this way are examined using statistical test methods to determine whether they actually delimit a text part, cf.
- the operator is formed by storing a window of LW + LB scan lines in a "rolling" manner and inverting the first LW scan lines, so that the operator condition can be checked by column-by-column summation; the operator condition is fulfilled for every column for which a column sum of 0 results.
- a vertical black line LO can be produced line by line by providing a counter for each output column, which is cleared at the beginning of the overall process. This counter is decremented by 1 line by line, but is set to the length LO whenever the operator condition is met. Areas in an operator output line are classified as belonging to a black block as long as the value of the respective column counter is greater than 0.
- the output line supplied by the operator is examined for black components.
- Each new beginning of the black area is scanned in an area tracking process, whereby its extreme coordinates are calculated.
- each new beginning of the black area is numbered with a number that is increased by 1.
- An uninterrupted black area within a scan line is numbered with the same number throughout. If the black area to be newly numbered touches a black area already numbered in the previous line, the numbering of that area is adopted, so that an existing numbering is continued line by line over the total black area. If a black area of the new line touches several black areas of the old line, the numbering of the leftmost area is continued.
- this numbering is used for the area to be newly numbered. For each numbering determined, a table entry is made in which the extreme coordinates occurring under this numbering are recorded. When a numbering range changes to a new numbering range, the extreme coordinates of interest are added to the table entry of the new numbering. The table entries that disappear when they overlap are deleted from the table after the extreme coordinates have been entered. When a black area is closed and the numbering disappears, the corresponding table entry is transferred to an output list and the black area is regarded as closed. After determining an actually existing text part, these entries are used as its extreme coordinates, namely the start and end coordinates of the relevant line of text.
- One embodiment of the invention provides that the signals which arise in analog form and represent the optoelectronic image are divided into "white values" and "black values" by an analog method.
- Another advantageous embodiment of the invention provides that the signals, which arise in analog form and represent the optoelectronic image, are digitized in a manner known per se and that the digital values obtained in this way are divided into "white values" and "black values" using one of the methods known per se, preferably by comparison with a predetermined digital threshold value.
- a fixed, predetermined reference value preferably a threshold value
- a threshold value can be used in a simple manner for dividing into “white values” and “black values”.
- an adaptable reference value preferably a threshold value
- Another advantageous embodiment of the invention provides that a signal representing a setting criterion is continuously derived from the scanning signals during the scanning process and is used to adapt the reference value.
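The thresholding and horizontal smearing steps described in the definitions above can be illustrated with a small sketch. It is only one possible reading of the text, not the patent's implementation: the function names, the choice of smearing black to the right and white to the left, and the interpretation of "length" as the number of additional pixels are all assumptions made for this example. White is encoded as 1 and black as 0, as the text prefers.

```python
WHITE, BLACK = 1, 0


def binarize(gray_row, threshold):
    """Map gray values to WHITE (1) or BLACK (0) using a fixed threshold.

    Assumption: larger gray values mean brighter pixels.
    """
    return [WHITE if value >= threshold else BLACK for value in gray_row]


def extend_runs(row, value, length, direction):
    """Smear every pixel equal to `value` over `length` further pixels.

    `direction` is +1 (to the right) or -1 (to the left). The source row is
    read only, so smeared pixels do not smear again.
    """
    out = list(row)
    for i, pixel in enumerate(row):
        if pixel == value:
            for k in range(1, length + 1):
                j = i + direction * k
                if 0 <= j < len(row):
                    out[j] = value
    return out


def smooth_row(row, black_ext, white_ext):
    """Apply the three smearing passes to one binary scan line.

    1. extend black to the right by black_ext (closes small gaps),
    2. extend white to the left by white_ext (shortens the block again and
       removes isolated specks),
    3. extend black to the right by the difference, restoring the block to
       the extent of the original text line.
    Assumption: white_ext >= black_ext.
    """
    widened = extend_runs(row, BLACK, black_ext, +1)
    shortened = extend_runs(widened, WHITE, white_ext, -1)
    return extend_runs(shortened, BLACK, white_ext - black_ext, +1)


# Black dots at columns 2, 4 and 6 become one block spanning columns 2..6.
text_like = [1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1]
block = smooth_row(text_like, black_ext=2, white_ext=3)
```

With these parameters a single isolated black pixel disappears entirely, while a dotted run is fused into one block of the original overall length.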
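The operator with its rolling window of LW + LB scan lines, and the per-column counter that draws the vertical black line LO, could be sketched as below. The function name `mark_text_lines`, the use of a `deque` for the rolling window, and the decision to write the output on the line at which the window becomes complete are assumptions of this sketch; the text only specifies the inversion of the first LW lines, the column sum of 0 as the condition, and the counter rule.

```python
from collections import deque

WHITE, BLACK = 1, 0


def mark_text_lines(scan_lines, lw, lb, lo):
    """Return one output line per input scan line.

    A column fulfils the operator condition when the last lw + lb scan lines
    show lw white pixels followed by lb black pixels in that column. A hit
    sets the column counter to lo; otherwise a positive counter is
    decremented. A pixel is output as BLACK while its counter is above 0.
    """
    window = deque(maxlen=lw + lb)  # rolling store of the newest scan lines
    counters = None                 # one counter per output column
    output = []
    for line in scan_lines:
        if counters is None:
            counters = [0] * len(line)  # cleared at the start of the process
        window.append(line)
        window_full = len(window) == lw + lb
        out_line = []
        for col in range(len(line)):
            hit = False
            if window_full:
                # Inverted first lw lines: white (1) contributes 0.
                total = sum(1 - window[r][col] for r in range(lw))
                # Remaining lb lines unchanged: black (0) contributes 0.
                total += sum(window[r][col] for r in range(lw, lw + lb))
                hit = total == 0
            if hit:
                counters[col] = lo
            elif counters[col] > 0:
                counters[col] -= 1
            out_line.append(BLACK if counters[col] > 0 else WHITE)
        output.append(out_line)
    return output


image = [
    [1, 1, 1],
    [1, 1, 1],
    [0, 0, 1],
    [0, 0, 1],
    [1, 1, 1],
    [1, 1, 1],
]
marked = mark_text_lines(image, lw=2, lb=1, lo=2)
```

Because the counter is set to LO on a hit and then counts down, each hit yields a black stroke exactly LO lines tall in that column, regardless of how tall the original black block was.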
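The line-by-line numbering with a table of extreme coordinates can be sketched in the same spirit. This is a hedged illustration rather than the patented procedure: `black_runs`, `track_blocks` and the entry layout `[x_min, x_max, y_min, y_max]` are hypothetical names and choices, "touching" is taken to mean that two runs in adjacent lines share at least one column, and recording the vertical extremes alongside the left and right ones is an addition of this example.

```python
WHITE, BLACK = 1, 0


def black_runs(line):
    """Return (start, end) column pairs (end inclusive) of black runs."""
    runs, start = [], None
    for x, pixel in enumerate(line):
        if pixel == BLACK and start is None:
            start = x
        elif pixel != BLACK and start is not None:
            runs.append((start, x - 1))
            start = None
    if start is not None:
        runs.append((start, len(line) - 1))
    return runs


def track_blocks(lines):
    """Number black areas line by line and collect their extreme coordinates."""
    table = {}       # numbering -> [x_min, x_max, y_min, y_max]
    finished = []    # output list of closed black areas
    previous = []    # (start, end, number) runs of the previous line
    next_number = 1
    for y, line in enumerate(lines):
        current = []
        for start, end in black_runs(line):
            touching = []
            for s, e, n in previous:  # left to right, so leftmost comes first
                if s <= end and e >= start and n not in touching:
                    touching.append(n)
            if touching:
                number = touching[0]  # continue the leftmost numbering
                for dead in touching[1:]:
                    gone = table.pop(dead)  # entry disappears after merging
                    keep = table[number]
                    keep[0], keep[1] = min(keep[0], gone[0]), max(keep[1], gone[1])
                    keep[2], keep[3] = min(keep[2], gone[2]), max(keep[3], gone[3])
                    previous = [(s, e, number if n == dead else n) for s, e, n in previous]
                    current = [(s, e, number if n == dead else n) for s, e, n in current]
            else:
                number = next_number  # new beginning of a black area
                next_number += 1
                table[number] = [start, end, y, y]
            entry = table[number]
            entry[0], entry[1] = min(entry[0], start), max(entry[1], end)
            entry[3] = y
            current.append((start, end, number))
        alive = {n for _, _, n in current}
        for n in [n for n in table if n not in alive]:
            finished.append(tuple(table.pop(n)))  # area closed: move to output
        previous = current
    finished.extend(tuple(entry) for entry in table.values())
    return finished


blocks = track_blocks([
    [1, 0, 0, 1, 1, 1, 1],
    [1, 0, 0, 0, 1, 0, 1],
    [1, 1, 1, 1, 1, 0, 1],
    [1, 1, 1, 1, 1, 1, 1],
])
```

Each tuple in `blocks` is `(x_min, x_max, y_min, y_max)`; the first two values correspond to the start and end coordinates of a candidate text line, which would then still have to pass the statistical test mentioned in the text.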
Landscapes
- Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Character Input (AREA)
- Editing Of Facsimile Originals (AREA)
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| DE3107655 | 1981-02-27 | | |
| DE19813107655 DE3107655A1 (de) | 1981-02-27 | 1981-02-27 | Method for locating and delimiting text areas on a template that may contain text, graphics and/or image areas |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| EP0059239A2 (fr) | 1982-09-08 |
| EP0059239A3 (fr) | 1984-10-24 |
Family
ID=6126010
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| EP81108441A (Withdrawn; EP0059239A3, fr) | Method for locating and delimiting text areas on a document that may contain text, graphics and/or image areas | 1981-02-27 | 1981-10-16 |
Country Status (3)
| Country | Link |
|---|---|
| US (1) | US4513442A (fr) |
| EP (1) | EP0059239A3 (fr) |
| DE (1) | DE3107655A1 (fr) |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP0206214A1 (fr) * | 1985-06-18 | 1986-12-30 | Siemens Aktiengesellschaft | Procédé de description symbolique uniforme d'épreuves de documents en forme de structures de données dans un automate |
| EP0350933A3 (en) * | 1988-07-13 | 1990-04-25 | Matsushita Electric Industrial Co., Ltd. | Image signal processing apparatus for bar code image signal |
Families Citing this family (34)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| DE3414455C2 (de) * | 1983-04-26 | 1996-04-25 | Wollang Peter Michael | Verfahren und Vorrichtung zum Lesen und Verarbeiten von Information, die aus dekodierbarer Schriftinformation und/oder nichtdekodierbarer Graphikinformation besteht |
| DE3538639A1 (de) * | 1984-10-31 | 1986-04-30 | Canon K.K., Tokio/Tokyo | Bildverarbeitungssystem |
| ES8701398A1 (es) * | 1985-05-14 | 1986-11-16 | Intersoftware Sa | Procedimiento para la lectura automatica de imagenes y aparato para la realizacion del mismo |
| JPS62137974A (ja) * | 1985-12-12 | 1987-06-20 | Ricoh Co Ltd | 画像処理方式 |
| US4817166A (en) * | 1986-05-05 | 1989-03-28 | Perceptics Corporation | Apparatus for reading a license plate |
| US4764973A (en) * | 1986-05-28 | 1988-08-16 | The United States Of America As Represented By The Secretary Of The Air Force | Whole word, phrase or number reading |
| US4736109A (en) * | 1986-08-13 | 1988-04-05 | Bally Manufacturing Company | Coded document and document reading system |
| US4866784A (en) * | 1987-12-02 | 1989-09-12 | Eastman Kodak Company | Skew detector for digital image processing system |
| US5038381A (en) * | 1988-07-11 | 1991-08-06 | New Dest Corporation | Image/text filtering system and method |
| US5081685A (en) * | 1988-11-29 | 1992-01-14 | Westinghouse Electric Corp. | Apparatus and method for reading a license plate |
| JP2930612B2 (ja) * | 1989-10-05 | 1999-08-03 | 株式会社リコー | 画像形成装置 |
| US5048096A (en) * | 1989-12-01 | 1991-09-10 | Eastman Kodak Company | Bi-tonal image non-text matter removal with run length and connected component analysis |
| US5065437A (en) * | 1989-12-08 | 1991-11-12 | Xerox Corporation | Identification and segmentation of finely textured and solid regions of binary images |
| US5048109A (en) * | 1989-12-08 | 1991-09-10 | Xerox Corporation | Detection of highlighted regions |
| US5202933A (en) * | 1989-12-08 | 1993-04-13 | Xerox Corporation | Segmentation of text and graphics |
| US5272764A (en) * | 1989-12-08 | 1993-12-21 | Xerox Corporation | Detection of highlighted regions |
| US5459826A (en) * | 1990-05-25 | 1995-10-17 | Archibald; Delbert M. | System and method for preparing text and pictorial materials for printing using predetermined coding and merging regimen |
| US5151949A (en) * | 1990-10-10 | 1992-09-29 | Fuji Xerox Co., Ltd. | System and method employing multiple predictor sets to compress image data having different portions |
| US5193122A (en) * | 1990-12-03 | 1993-03-09 | Xerox Corporation | High speed halftone detection technique |
| US5315668A (en) * | 1991-11-27 | 1994-05-24 | The United States Of America As Represented By The Secretary Of The Air Force | Offline text recognition without intraword character segmentation based on two-dimensional low frequency discrete Fourier transforms |
| JPH05151254A (ja) * | 1991-11-27 | 1993-06-18 | Hitachi Ltd | 文書処理方法およびシステム |
| US5325441A (en) * | 1992-01-31 | 1994-06-28 | Westinghouse Electric Corporation | Method for automatically indexing the complexity of technical illustrations for prospective users |
| US6002798A (en) * | 1993-01-19 | 1999-12-14 | Canon Kabushiki Kaisha | Method and apparatus for creating, indexing and viewing abstracted documents |
| US5410611A (en) * | 1993-12-17 | 1995-04-25 | Xerox Corporation | Method for identifying word bounding boxes in text |
| JP3772262B2 (ja) * | 1994-08-12 | 2006-05-10 | ヒューレット・パッカード・カンパニー | 画像の型を識別する方法 |
| US5659767A (en) * | 1994-11-10 | 1997-08-19 | Canon Information Systems, Inc. | Application programming interface for accessing document analysis functionality of a block selection program |
| US5745596A (en) * | 1995-05-01 | 1998-04-28 | Xerox Corporation | Method and apparatus for performing text/image segmentation |
| US6038340A (en) * | 1996-11-08 | 2000-03-14 | Seiko Epson Corporation | System and method for detecting the black and white points of a color image |
| JP3591184B2 (ja) * | 1997-01-14 | 2004-11-17 | 松下電器産業株式会社 | バーコード読み取り装置 |
| US5995661A (en) * | 1997-10-08 | 1999-11-30 | Hewlett-Packard Company | Image boundary detection for a scanned image |
| US6832726B2 (en) | 2000-12-19 | 2004-12-21 | Zih Corp. | Barcode optical character recognition |
| US7311256B2 (en) * | 2000-12-19 | 2007-12-25 | Zih Corp. | Barcode optical character recognition |
| US7082219B2 (en) * | 2002-02-04 | 2006-07-25 | The United States Of America As Represented By The Secretary Of The Air Force | Method and apparatus for separating text from images |
| US20050244060A1 (en) * | 2004-04-30 | 2005-11-03 | Xerox Corporation | Reformatting binary image data to generate smaller compressed image data size |
Family Cites Families (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US3496543A (en) * | 1967-01-27 | 1970-02-17 | Singer General Precision | On-line read/copy data processing system accepting printed and graphic material |
| US3509415A (en) * | 1969-01-13 | 1970-04-28 | Ibm | Format scheme for vidicon scanners |
| US3805237A (en) * | 1971-04-30 | 1974-04-16 | Ibm | Technique for the conversion to digital form of interspersed symbolic and graphic data |
| JPS5751310B2 (fr) * | 1972-06-07 | 1982-11-01 | ||
| DE2326644C3 (de) * | 1973-05-25 | 1981-10-01 | Licentia Patent-Verwaltungs-Gmbh, 6000 Frankfurt | Verfahren zur Datenkompression von Nachrichtensignalen |
| US3987412A (en) * | 1975-01-27 | 1976-10-19 | International Business Machines Corporation | Method and apparatus for image data compression utilizing boundary following of the exterior and interior borders of objects |
| DE2516332C2 (de) * | 1975-04-15 | 1987-01-22 | Siemens AG, 1000 Berlin und 8000 München | Verfahren zur Codierung von elektrischen Signalen, die bei der Abtastung eines graphischen Musters mit aus Text und Bild gemischtem Inhalt gewonnen werden |
| US4157533A (en) * | 1977-11-25 | 1979-06-05 | Recognition Equipment Incorporated | Independent channel automatic gain control for self-scanning photocell array |
| DE3019836A1 (de) * | 1980-05-23 | 1982-01-21 | Siemens AG, 1000 Berlin und 8000 München | Verfahren zum automatischen erkennen von bild- und text- oder graphikbereichen auf druckvorlagen |
1981
- 1981-02-27 DE DE19813107655 patent/DE3107655A1/de not_active Withdrawn
- 1981-10-16 EP EP81108441A patent/EP0059239A3/fr not_active Withdrawn
1982
- 1982-02-12 US US06/348,279 patent/US4513442A/en not_active Expired - Fee Related
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP0206214A1 (fr) * | 1985-06-18 | 1986-12-30 | Siemens Aktiengesellschaft | Procédé de description symbolique uniforme d'épreuves de documents en forme de structures de données dans un automate |
| EP0350933A3 (en) * | 1988-07-13 | 1990-04-25 | Matsushita Electric Industrial Co., Ltd. | Image signal processing apparatus for bar code image signal |
| US5134272A (en) * | 1988-07-13 | 1992-07-28 | Matsushita Electric Industrial Co., Ltd. | Image signal processing apparatus for bar code image signal |
Also Published As
| Publication number | Publication date |
|---|---|
| US4513442A (en) | 1985-04-23 |
| EP0059239A3 (fr) | 1984-10-24 |
| DE3107655A1 (de) | 1982-09-16 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| EP0059239A2 (fr) | Method for locating and delimiting text areas on a document that may contain text, graphics and/or image areas | |
| EP0067244A2 (fr) | Méthode de reconnaissance automatique de blocs de blanc ainsi que de régions de textes de graphiques et/ou d'images en demi-teintes sur des documents imprimés | |
| EP0040796B1 (fr) | Procédé de differenciation entre régions d'image et de texte ou de graphique sur des originaux imprimés | |
| DE2558498C2 (de) | Vorrichtung zur Darstellung von aus Bildpunkten zusammengesetzten Zeichen | |
| DE3248928C2 (fr) | ||
| DE19530829C2 (de) | Verfahren zum elektronischen Wiederauffinden von einem Dokument hinzugefügter Information | |
| DE3107521A1 (de) | Verfahren zum automatischen erkennen von bild- und text- oder graphikbereichen auf druckvorlagen | |
| DE3586646T2 (de) | Bildanzeigegeraet. | |
| DE2144596A1 (de) | Video-Anzeigevorrichtung | |
| DE3415470A1 (de) | Geraet und verfahren zum codieren und speichern von rasterabtastbildern | |
| DE2909153A1 (de) | Einrichtung zur elektronischen verarbeitung von bild- und/oder zeichenmustern | |
| DE3335162A1 (de) | Vorrichtung und verfahren fuer graphische darstellungen mittels computern | |
| DE3625390A1 (de) | Graphisches anzeigesystem mit beliebiger ueberlappung von bildausschnitten | |
| EP0111026A1 (fr) | Procédé et appareil pour la retouche copiante dans la reproduction électronique d'images en couleurs | |
| DE2247942A1 (de) | Zeichenerkennungsverfahren zur verbesserung der erkennbarkeit gestoerter zeichen | |
| DE3101543A1 (de) | "buerokommunikationssystem" | |
| DE3345306A1 (de) | Verfahren und vorrichtung zur verarbeitung von bilddaten | |
| DE68904611T2 (de) | Verfahren und vorrichtung zur erzeugung von gemischten bildern. | |
| DE3850444T2 (de) | Progammverwaltungsverfahren für verteilte Verarbeitungssysteme und angepasste Vorrichtung. | |
| EP0059705A1 (fr) | Procede et circuit pour la correction partielle du dessin lors de la reproduction d'images en couleur. | |
| DE69429562T2 (de) | Ein Detektionssystem in einem Durchlauf für Bildbereiche, die von einer Markierung umschlossen sind und Verfahren für einen Photokopierer | |
| DE2618731A1 (de) | Verfahren zur automatischen isolierung von in einem bild enthaltenen figuren und vorrichtung zur durchfuehrung des verfahrens | |
| DE2410306A1 (de) | Verfahren und vorrichtung zur maschinellen zeichenerkennung | |
| DE3882361T2 (de) | Bilddiskriminator zur wiedergabesteuerung. | |
| DE3128794A1 (de) | Verfahren zum auffinden und abgrenzen von buchstaben und buchstabengruppen oder woertern in textbereichen einer vorlage, die ausser textbereichen auch graphik-und/oder bildbereiche enthalten kann. |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
| | 17P | Request for examination filed | Effective date: 19811016 |
| | AK | Designated contracting states | Designated state(s): AT BE FR GB IT |
| | PUAL | Search report despatched | Free format text: ORIGINAL CODE: 0009013 |
| | AK | Designated contracting states | Designated state(s): AT BE FR GB IT |
| | 17Q | First examination report despatched | Effective date: 19860422 |
| | STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
| | 18D | Application deemed to be withdrawn | Effective date: 19860501 |
| | RIN1 | Information on inventor provided before grant (corrected) | Inventor name: SCHERL, WOLFGANG, DIPL.-ING. |