ATE373274T1 - Method for identifying words in an electronic document

Method for identifying words in an electronic document

Info

Publication number
ATE373274T1
Authority
AT
Austria
Prior art keywords
semantic units
electronic document
glyphs
determining
geometric
Application number
AT05014369T
Other languages
English (en)
Inventor
Serge Bronstein
Original Assignee
PDFlib GmbH
Priority date
2005-07-01
Filing date
2005-07-01
Publication date
2007-09-15
Application filed by PDFlib GmbH
Application granted
Publication of ATE373274T1

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/414 Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Machine Translation (AREA)
  • Document Processing Apparatus (AREA)
AT05014369T 2005-07-01 2005-07-01 Method for identifying words in an electronic document ATE373274T1 (de)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
EP05014369A EP1739574B1 (de) 2005-07-01 Method for identifying words in an electronic document

Publications (1)

Publication Number Publication Date
ATE373274T1 (de) 2007-09-15

Family

ID=35407037

Family Applications (1)

Application Number Title Priority Date Filing Date
AT05014369T ATE373274T1 (de) 2005-07-01 2005-07-01 Method for identifying words in an electronic document

Country Status (4)

Country Link
US (1) US7705848B2 (de)
EP (1) EP1739574B1 (de)
AT (1) ATE373274T1 (de)
DE (1) DE602005002473T2 (de)

Families Citing this family (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7786994B2 (en) * 2006-10-26 2010-08-31 Microsoft Corporation Determination of unicode points from glyph elements
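JP5248845B2 (ja) * 2006-12-13 2013-07-31 Canon Inc. Document processing apparatus, document processing method, program, and storage medium
JP4834603B2 (ja) * 2007-05-09 2011-12-14 Kyocera Mita Corp. Image processing apparatus and image forming apparatus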
US8229232B2 (en) * 2007-08-24 2012-07-24 CVISION Technologies, Inc. Computer vision-based methods for enhanced JBIG2 and generic bitonal compression
US8443278B2 (en) * 2009-01-02 2013-05-14 Apple Inc. Identification of tables in an unstructured document
US8824806B1 (en) * 2010-03-02 2014-09-02 Amazon Technologies, Inc. Sequential digital image panning
US8549399B2 (en) * 2011-01-18 2013-10-01 Apple Inc. Identifying a selection of content in a structured document
US8380753B2 (en) 2011-01-18 2013-02-19 Apple Inc. Reconstruction of lists in a document
TWI476761B (zh) 2011-04-08 2015-03-11 Dolby Lab Licensing Corp Audio encoding method and system for generating a unified bitstream decodable by decoders implementing different decoding protocols
CA2753508C (en) * 2011-09-23 2013-07-30 Guy Le Henaff Tracing a document in an electronic publication
BR112014017832B1 (pt) 2012-01-23 2021-07-06 Microsoft Technology Licensing, Llc Formula detection method for identifying a mathematical formula, system for detecting a formula that appears in a fixed-format document, and computer-readable medium
US10002448B2 (en) 2012-08-10 2018-06-19 Monotype Imaging Inc. Producing glyph distance fields
WO2014025363A1 (en) * 2012-08-10 2014-02-13 Monotype Imaging Inc. Producing glyph distance fields
US9330070B2 (en) 2013-03-11 2016-05-03 Microsoft Technology Licensing, Llc Detection and reconstruction of east asian layout features in a fixed format document
US20140258852A1 (en) * 2013-03-11 2014-09-11 Microsoft Corporation Detection and Reconstruction of Right-to-Left Text Direction, Ligatures and Diacritics in a Fixed Format Document
US9047511B1 (en) * 2013-05-15 2015-06-02 Amazon Technologies, Inc. Describing inter-character spacing in a font file
IN2015CH05327A (de) * 2015-10-05 2015-10-16 Wipro Ltd
US10930045B2 (en) * 2017-03-22 2021-02-23 Microsoft Technology Licensing, Llc Digital ink based visual components
US10740602B2 (en) * 2018-04-18 2020-08-11 Google Llc System and methods for assigning word fragments to text lines in optical character recognition-extracted data
US11615244B2 (en) 2020-01-30 2023-03-28 Oracle International Corporation Data extraction and ordering based on document layout analysis
US11475686B2 (en) 2020-01-31 2022-10-18 Oracle International Corporation Extracting data from tables detected in electronic documents
AU2021201352B2 (en) * 2021-03-02 2025-09-11 Canva Pty Ltd Systems and methods for extracting text from portable document format data
US11687700B1 (en) * 2022-02-01 2023-06-27 International Business Machines Corporation Generating a structure of a PDF-document
CN118228696A (zh) * 2022-12-20 2024-06-21 Kdan Mobile Software Ltd. Method, apparatus, computer device, and storage medium for editing PDF documents
US12361736B2 (en) 2023-01-04 2025-07-15 Oracle International Corporation Multi-stage machine learning model training for key-value extraction
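CN116089668A (zh) * 2023-01-31 2023-05-09 Industrial and Commercial Bank of China Ltd. Method and apparatus for generating and processing business process configuration information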

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5416898A (en) * 1992-05-12 1995-05-16 Apple Computer, Inc. Apparatus and method for generating textual lines layouts
EP0702322B1 (de) * 1994-09-12 2002-02-13 Adobe Systems Inc. Method and apparatus for identifying words described in a portable electronic document
US5870084A (en) * 1996-11-12 1999-02-09 Thomson Consumer Electronics, Inc. System and method for efficiently storing and quickly retrieving glyphs for large character set languages in a set top box
US6327393B1 (en) * 1998-08-17 2001-12-04 Cognex Corporation Method and apparatus to transform a region within a digital image using a deformable window
US20040205568A1 (en) * 2002-03-01 2004-10-14 Breuel Thomas M. Method and system for document image layout deconstruction and redisplay system
US20040202352A1 (en) * 2003-04-10 2004-10-14 International Business Machines Corporation Enhanced readability with flowed bitmaps

Also Published As

Publication number Publication date
US7705848B2 (en) 2010-04-27
EP1739574B1 (de) 2007-09-12
US20070002054A1 (en) 2007-01-04
DE602005002473T2 (de) 2008-01-10
DE602005002473D1 (de) 2007-10-25
EP1739574A1 (de) 2007-01-03

Similar Documents

Publication Publication Date Title
ATE373274T1 (de) Method for identifying words in an electronic document
CN102411587B (zh) Web page classification method and apparatus
Kirchherr et al. Conceptualizing the circular economy: An analysis of 114 definitions
WO2007038389A3 (en) Method and apparatus for identifying and classifying network documents as spam
ATE463013T1 (de) Method and arrangement for recognizing semantic structures from a text
DE60231005D1 (de) Systems, methods, and software for classifying documents
JP2015502603A (ja) Extraction of main content from web pages
ATE432515T1 (de) Method for determining and suppressing duplicates
CN108268884B (zh) Document comparison method and apparatus
DE60139536D1 (de) Method, apparatus, and computer program products for information retrieval and document classification using a multidimensional subspace
CN103309862A (zh) Web page type identification method and system
ATE450012T1 (de) Computer-assisted retrieval of documents
WO2003057648A3 (fr) Methods and systems for searching and associating information resources such as web pages
DE60308654D1 (de) Determining the effect of a clostridial toxin on muscles
BR0309598A (pt) Method for characterizing a relationship between a first and a second audio sample, computer program product, and computer system
WO2007019691A3 (en) Automatic website generator
ATE470192T1 (de) Method and apparatus for classifying image pages using summaries
DE602006007172D1 (de) System and method for analyzing radar information
EP1736901A3 (de) Method for classifying subtrees in semi-structured documents
DE602005018429D1 (de) Apparatus, method, processor arrangement, and computer-readable storage medium program for document classification
WO2008114245A3 (en) System and method for identification, prevention and management of web-sites defacement attacks
McCollum et al. Unbounded harmony is not always myopic: Evidence from Tutrugbu
ATE375561T1 (de) Method for identifying redundant text in electronic documents
EP1907946A1 (de) Method for finding the text reading order in a document
ATE519168T1 (de) Method for analyzing a portion of multimedia content, and corresponding computer software product and analysis device

Legal Events

Date Code Title Description
UEP Publication of translation of European patent specification

Ref document number: 1739574

Country of ref document: EP