ATE530988T1 - Method for finding text reading order in a document

Method for finding text reading order in a document

Info

Publication number
ATE530988T1
Authority
AT
Austria
Prior art keywords
reading order
finding
document
text reading
text
Prior art date
Application number
AT05778313T
Other languages
English (en)
Inventor
Sherif Yacoub
Daniel Ortega
Paolo Faraboschi
Jose Peiro
Original Assignee
Hewlett Packard Development Co
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2005-07-27
Filing date
2005-07-27
Publication date
2011-11-15
Application filed by Hewlett Packard Development Co
Application granted
Publication of ATE530988T1

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/414 Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Geometry (AREA)
  • Computer Graphics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Character Input (AREA)
AT05778313T 2005-07-27 2005-07-27 Method for finding text reading order in a document ATE530988T1 (de)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2005/026498 WO2007018501A1 (en) 2005-07-27 2005-07-27 A method for finding text reading order in a document

Publications (1)

Publication Number Publication Date
ATE530988T1 (de) 2011-11-15

Family

ID=35169885

Family Applications (1)

Application Number Publication Priority Date Filing Date Title
AT05778313T ATE530988T1 (de) 2005-07-27 2005-07-27 Method for finding text reading order in a document

Country Status (4)

Country Link
US (1) US9098581B2 (de)
EP (1) EP1907946B1 (de)
AT (1) ATE530988T1 (de)
WO (1) WO2007018501A1 (de)

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8948511B2 (en) 2005-06-02 2015-02-03 Hewlett-Packard Development Company, L.P. Automated document processing system
US8112707B2 (en) * 2005-12-13 2012-02-07 Trigent Software Ltd. Capturing reading styles
US8301998B2 (en) 2007-12-14 2012-10-30 Ebay Inc. Identification of content in an electronic document
US8233671B2 (en) * 2007-12-27 2012-07-31 Intel-Ge Care Innovations Llc Reading device with hierarchal navigation
US8254681B1 (en) * 2009-02-05 2012-08-28 Google Inc. Display of document image optimized for reading
JP5412916B2 (ja) * 2009-03-27 2014-02-12 コニカミノルタ株式会社 Document image processing device, document image processing method, and document image processing program
JP5720147B2 (ja) * 2010-09-02 2015-05-20 富士ゼロックス株式会社 Figure region acquisition device and program
JP5812702B2 (ja) * 2011-06-08 2015-11-17 International Business Machines Corporation Reading order determination device, method, and program for determining the reading order of characters
WO2013110286A1 (en) 2012-01-23 2013-08-01 Microsoft Corporation Paragraph property detection and style reconstruction engine
US9946690B2 (en) * 2012-07-06 2018-04-17 Microsoft Technology Licensing, Llc Paragraph alignment detection and region-based section reconstruction
CN106326193A (zh) * 2015-06-18 2017-01-11 北京大学 Method for identifying footnotes in a fixed-layout document and method for associating footnotes with footnote references
US10489439B2 (en) * 2016-04-14 2019-11-26 Xerox Corporation System and method for entity extraction from semi-structured text documents
US10713519B2 (en) * 2017-06-22 2020-07-14 Adobe Inc. Automated workflows for identification of reading order from text segments using probabilistic language models
US10970458B1 (en) * 2020-06-25 2021-04-06 Adobe Inc. Logical grouping of exported text blocks
US12086551B2 (en) * 2021-06-23 2024-09-10 Microsoft Technology Licensing, Llc Semantic difference characterization for documents
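CN114239598B (zh) * 2021-12-17 2024-12-03 上海高德威智能交通系统有限公司 Method and apparatus for determining the reading order of text elements, electronic device, and storage medium
CN119380363B (zh) * 2024-10-10 2025-10-31 中南民族大学 Reading order detection method and system combining layout analysis and a language model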

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5185813A (en) * 1988-01-19 1993-02-09 Kabushiki Kaisha Toshiba Document image processing apparatus
US5159667A (en) 1989-05-31 1992-10-27 Borrey Roland G Document identification by characteristics matching
JP2579397B2 (ja) 1991-12-18 1997-02-05 インターナショナル・ビジネス・マシーンズ・コーポレイション Method and apparatus for creating a layout model of a document image
US5848184A (en) * 1993-03-15 1998-12-08 Unisys Corporation Document page analyzer and method
JP3302147B2 (ja) 1993-05-12 2002-07-15 株式会社リコー Document image processing method
US6009196A (en) 1995-11-28 1999-12-28 Xerox Corporation Method for classifying non-running text in an image
US5956468A (en) * 1996-07-12 1999-09-21 Seiko Epson Corporation Document segmentation system
US6562077B2 (en) 1997-11-14 2003-05-13 Xerox Corporation Sorting image segments into clusters based on a distance measurement
US6970602B1 (en) 1998-10-06 2005-11-29 International Business Machines Corporation Method and apparatus for transcoding multimedia using content analysis
GB2364416B (en) * 2000-06-30 2004-10-27 Post Office Image processing for clustering related text objects
US6907431B2 (en) 2002-05-03 2005-06-14 Hewlett-Packard Development Company, L.P. Method for determining a logical structure of a document
US20060104511A1 (en) 2002-08-20 2006-05-18 Guo Jinhong K Method, system and apparatus for generating structured document files
US7707039B2 (en) 2004-02-15 2010-04-27 Exbiblio B.V. Automatic modification of web pages
US7756871B2 (en) 2004-10-13 2010-07-13 Hewlett-Packard Development Company, L.P. Article extraction
US20070027749A1 (en) 2005-07-27 2007-02-01 Hewlett-Packard Development Company, L.P. Advertisement detection

Also Published As

Publication number Publication date
WO2007018501A1 (en) 2007-02-15
EP1907946A1 (de) 2008-04-09
US20100198827A1 (en) 2010-08-05
US9098581B2 (en) 2015-08-04
EP1907946B1 (de) 2011-10-26

Similar Documents

Publication Publication Date Title
ATE530988T1 (de) Method for finding text reading order in a document
DK1747540T3 (da) Method for recognizing and monitoring fibrous media, and use of the method in information technology
ATE510439T1 (de) Method and apparatus for determining, communicating, and/or using delay information
EA200602044A1 (ru) Molecularly imprinted polymers selective for nitrosamines, and methods of using them
EA200970740A1 (ru) Method for generating reservoir models using synthetic stratigraphic columns
DE602004027472D1 (de) Improved method for winding Z-filter media
ATE546958T1 (de) Apparatus and method for data processing
DE502004006864D1 (de) Method for computer-aided simulation of a machine arrangement, simulation device, computer-readable storage medium, and computer program element
EP1924903A4 (de) Systems and methods for finding relevant documents by analyzing tags
DE602005016892D1 (de) Nucleic acid characterization
DE60332394D1 (de) Method and apparatus for grouping pages within a block
DE602005018429D1 (de) Apparatus, method, processor arrangement, and computer-readable storage medium storing a program for document classification
ATE514161T1 (de) Apparatus and method for calculating a fingerprint of an audio signal, apparatus and method for synchronizing, and apparatus and method for characterizing a test audio signal
DE602006009973D1 (de) Method for selectively removing safrole from nutmeg oil
DE60327020D1 (de) Apparatus, method, and computer-readable recording medium for recognizing keywords in spontaneous speech
DK1800753T3 (da) Method and device for separating solid particles based on a difference in density
DE602006021021D1 (de) Method for generating output data
DE60334499D1 (de) Method for increasing the expansion of B cells
ATE476068T1 (de) Method and apparatus for reconfiguring a common channel
ATE407019T1 (de) Security element and method for producing it
ATE394660T1 (de) Method for immunocytological or molecular detection of disseminated tumor cells from a body fluid, and kit suitable for this purpose
DK1910999T3 (da) Method and apparatus for determining the relative position of a first object with respect to a second object, and a corresponding computer program and a corresponding computer-readable storage medium
ATE425962T1 (de) Peptide deformylase inhibitors
ATE483215T1 (de) Verification of authenticity
DE602006013666D1 (de) Method and apparatus for automatically generating a playlist by segment-wise feature comparison

Legal Events

Date Code Title Description
RER Ceased as to paragraph 5 lit. 3 law introducing patent treaties