ATE373274T1 - Method for identifying words in an electronic document
Info
- Publication number
- ATE373274T1
- Authority
- AT
- Austria
- Prior art keywords
- semantic units
- electronic document
- glyphs
- determining
- geometric
- Prior art date
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/414—Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computer Graphics (AREA)
- Geometry (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Machine Translation (AREA)
- Document Processing Apparatus (AREA)
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP05014369A EP1739574B1 (de) | 2005-07-01 | 2005-07-01 | Method for identifying words in an electronic document |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| ATE373274T1 (de) | 2007-09-15 |
Family
ID=35407037
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| AT05014369T ATE373274T1 (de) | 2005-07-01 | 2005-07-01 | Method for identifying words in an electronic document |
Country Status (4)
| Country | Link |
|---|---|
| US (1) | US7705848B2 (de) |
| EP (1) | EP1739574B1 (de) |
| AT (1) | ATE373274T1 (de) |
| DE (1) | DE602005002473T2 (de) |
Families Citing this family (26)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7786994B2 (en) * | 2006-10-26 | 2010-08-31 | Microsoft Corporation | Determination of unicode points from glyph elements |
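| JP5248845B2 (ja) * | 2006-12-13 | 2013-07-31 | Canon Inc. | Document processing apparatus, document processing method, program, and storage medium |
| JP4834603B2 (ja) * | 2007-05-09 | 2011-12-14 | Kyocera Mita Corporation | Image processing apparatus and image forming apparatus |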
| US8229232B2 (en) * | 2007-08-24 | 2012-07-24 | CVISION Technologies, Inc. | Computer vision-based methods for enhanced JBIG2 and generic bitonal compression |
| US8443278B2 (en) * | 2009-01-02 | 2013-05-14 | Apple Inc. | Identification of tables in an unstructured document |
| US8824806B1 (en) * | 2010-03-02 | 2014-09-02 | Amazon Technologies, Inc. | Sequential digital image panning |
| US8549399B2 (en) * | 2011-01-18 | 2013-10-01 | Apple Inc. | Identifying a selection of content in a structured document |
| US8380753B2 (en) | 2011-01-18 | 2013-02-19 | Apple Inc. | Reconstruction of lists in a document |
| TWI476761B (zh) | 2011-04-08 | 2015-03-11 | Dolby Lab Licensing Corp | Audio encoding method and system for generating a unified bitstream decodable by decoders implementing different decoding protocols |
| CA2753508C (en) * | 2011-09-23 | 2013-07-30 | Guy Le Henaff | Tracing a document in an electronic publication |
| BR112014017832B1 (pt) | 2012-01-23 | 2021-07-06 | Microsoft Technology Licensing, Llc | Formula detection method for identifying a mathematical formula, system for detecting a formula appearing in a fixed-format document, and computer-readable medium |
| US10002448B2 (en) | 2012-08-10 | 2018-06-19 | Monotype Imaging Inc. | Producing glyph distance fields |
| WO2014025363A1 (en) * | 2012-08-10 | 2014-02-13 | Monotype Imaging Inc. | Producing glyph distance fields |
| US9330070B2 (en) | 2013-03-11 | 2016-05-03 | Microsoft Technology Licensing, Llc | Detection and reconstruction of east asian layout features in a fixed format document |
| US20140258852A1 (en) * | 2013-03-11 | 2014-09-11 | Microsoft Corporation | Detection and Reconstruction of Right-to-Left Text Direction, Ligatures and Diacritics in a Fixed Format Document |
| US9047511B1 (en) * | 2013-05-15 | 2015-06-02 | Amazon Technologies, Inc. | Describing inter-character spacing in a font file |
| IN2015CH05327A (de) * | 2015-10-05 | 2015-10-16 | Wipro Ltd | |
| US10930045B2 (en) * | 2017-03-22 | 2021-02-23 | Microsoft Technology Licensing, Llc | Digital ink based visual components |
| US10740602B2 (en) * | 2018-04-18 | 2020-08-11 | Google Llc | System and methods for assigning word fragments to text lines in optical character recognition-extracted data |
| US11615244B2 (en) | 2020-01-30 | 2023-03-28 | Oracle International Corporation | Data extraction and ordering based on document layout analysis |
| US11475686B2 (en) | 2020-01-31 | 2022-10-18 | Oracle International Corporation | Extracting data from tables detected in electronic documents |
| AU2021201352B2 (en) * | 2021-03-02 | 2025-09-11 | Canva Pty Ltd | Systems and methods for extracting text from portable document format data |
| US11687700B1 (en) * | 2022-02-01 | 2023-06-27 | International Business Machines Corporation | Generating a structure of a PDF-document |
| CN118228696A (zh) * | 2022-12-20 | 2024-06-21 | 凯钿行动科技股份有限公司 | Method, apparatus, computer device, and storage medium for editing PDF documents |
| US12361736B2 (en) | 2023-01-04 | 2025-07-15 | Oracle International Corporation | Multi-stage machine learning model training for key-value extraction |
| CN116089668A (zh) * | 2023-01-31 | 2023-05-09 | 中国工商银行股份有限公司 | Method and apparatus for generating and processing business process configuration information |
Family Cites Families (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5416898A (en) * | 1992-05-12 | 1995-05-16 | Apple Computer, Inc. | Apparatus and method for generating textual lines layouts |
| EP0702322B1 (de) * | 1994-09-12 | 2002-02-13 | Adobe Systems Inc. | Method and apparatus for identifying words described in a portable electronic document |
| US5870084A (en) * | 1996-11-12 | 1999-02-09 | Thomson Consumer Electronics, Inc. | System and method for efficiently storing and quickly retrieving glyphs for large character set languages in a set top box |
| US6327393B1 (en) * | 1998-08-17 | 2001-12-04 | Cognex Corporation | Method and apparatus to transform a region within a digital image using a deformable window |
| US20040205568A1 (en) * | 2002-03-01 | 2004-10-14 | Breuel Thomas M. | Method and system for document image layout deconstruction and redisplay system |
| US20040202352A1 (en) * | 2003-04-10 | 2004-10-14 | International Business Machines Corporation | Enhanced readability with flowed bitmaps |
2005
- 2005-07-01: AT, application AT05014369T, patent ATE373274T1 (de), status: active
- 2005-07-01: DE, application DE602005002473T, patent DE602005002473T2 (de), status: not active (Expired - Lifetime)
- 2005-07-01: EP, application EP05014369A, patent EP1739574B1 (de), status: not active (Expired - Lifetime)
2006
- 2006-04-18: US, application US11/405,782, patent US7705848B2 (en), status: active
Also Published As
| Publication number | Publication date |
|---|---|
| US7705848B2 (en) | 2010-04-27 |
| EP1739574B1 (de) | 2007-09-12 |
| US20070002054A1 (en) | 2007-01-04 |
| DE602005002473T2 (de) | 2008-01-10 |
| DE602005002473D1 (de) | 2007-10-25 |
| EP1739574A1 (de) | 2007-01-03 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| ATE373274T1 (de) | | Method for identifying words in an electronic document |
| CN102411587B (zh) | | Web page classification method and apparatus |
| Kirchherr et al. | | Conceptualizing the circular economy: An analysis of 114 definitions |
| WO2007038389A3 (en) | | Method and apparatus for identifying and classifying network documents as spam |
| ATE463013T1 (de) | | Method and arrangement for recognizing semantic structures from a text |
| DE60231005D1 (de) | | Systems, methods, and software for classifying documents |
| JP2015502603A (ja) | | Extraction of main content from web pages |
| ATE432515T1 (de) | | Method for determining and suppressing duplicates |
| CN108268884B (zh) | | Document comparison method and apparatus |
| DE60139536D1 (de) | | Method, apparatus, and computer program products for information retrieval and document classification using a multidimensional subspace |
| CN103309862A (zh) | | Web page type identification method and system |
| ATE450012T1 (de) | | Computer-assisted document retrieval |
| WO2003057648A3 (fr) | | Methods and systems for searching and associating information resources such as web pages |
| DE60308654D1 (de) | | Determining the effect of a clostridial toxin on muscles |
| BR0309598A (pt) | | Method for characterizing a relationship between a first and a second audio sample, computer program product, and computer system |
| WO2007019691A3 (en) | | Automatic website generator |
| ATE470192T1 (de) | | Method and apparatus for classifying image pages by means of summaries |
| DE602006007172D1 (de) | | System and method for analyzing radar information |
| EP1736901A3 (de) | | Method for classifying subtrees in semi-structured documents |
| DE602005018429D1 (de) | | Apparatus, method, processor arrangement, and computer-readable storage medium program for document classification |
| WO2008114245A3 (en) | | System and method for identification, prevention and management of web-sites defacement attacks |
| McCollum et al. | | Unbounded harmony is not always myopic: Evidence from Tutrugbu |
| ATE375561T1 (de) | | Method for identifying redundant text in electronic documents |
| EP1907946A1 (de) | | Method for finding the text reading order in a document |
| ATE519168T1 (de) | | Method for analyzing a portion of multimedia content, and corresponding computer software product and analysis device |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | UEP | Publication of translation of european patent specification | Ref document number: 1739574; Country of ref document: EP |