EP2776945A4 - EXTRACTION OF THE MAIN CONTENT OF WEB PAGES - Google Patents
Info
- Publication number
- EP2776945A4
- Authority
- EP
- European Patent Office
- Prior art keywords
- extraction
- web pages
- main content
- content
- main
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/958—Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/353—Clustering; Classification into predefined classes
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Document Processing Apparatus (AREA)
Applications Claiming Priority (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US201161558153P | 2011-11-10 | 2011-11-10 | |
| US13/563,060 US9152730B2 (en) | 2011-11-10 | 2012-07-31 | Extracting principal content from web pages |
| PCT/US2012/063777 WO2013070645A1 (en) | 2011-11-10 | 2012-11-07 | Extracting principal content from web pages |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| EP2776945A1 (en) | 2014-09-17 |
| EP2776945A4 (en) | 2015-05-27 |
Family ID: 48281623
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| EP12847034.1A Ceased EP2776945A4 (en) | 2011-11-10 | 2012-11-07 | EXTRACTION OF THE MAIN CONTENT OF WEB PAGES |
Country Status (5)
| Country | Link |
|---|---|
| US (1) | US9152730B2 (en) |
| EP (1) | EP2776945A4 (en) |
| JP (1) | JP2015502603A (en) |
| CA (1) | CA2853199A1 (en) |
| WO (1) | WO2013070645A1 (en) |
Families Citing this family (31)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20130339839A1 (en) * | 2012-06-14 | 2013-12-19 | Emre Yavuz Baran | Analyzing User Interaction |
| US20140380142A1 (en) * | 2013-06-20 | 2014-12-25 | Microsoft Corporation | Capturing website content through capture services |
| WO2015018244A1 (en) | 2013-08-07 | 2015-02-12 | Microsoft Corporation | Augmenting and presenting captured data |
| US20150067476A1 (en) * | 2013-08-29 | 2015-03-05 | Microsoft Corporation | Title and body extraction from web page |
| US9117280B2 (en) | 2013-08-29 | 2015-08-25 | Microsoft Technology Licensing, Llc | Determining images of article for extraction |
| US9876848B1 (en) * | 2014-02-21 | 2018-01-23 | Twitter, Inc. | Television key phrase detection |
| KR102063566B1 (en) * | 2014-02-23 | 2020-01-09 | 삼성전자주식회사 | Operating Method For Text Message and Electronic Device supporting the same |
| US9665617B1 (en) * | 2014-04-16 | 2017-05-30 | Google Inc. | Methods and systems for generating a stable identifier for nodes likely including primary content within an information resource |
| US9910644B2 (en) * | 2015-03-03 | 2018-03-06 | Microsoft Technology Licensing, Llc | Integrated note-taking functionality for computing system entities |
| US10607152B2 (en) * | 2015-05-26 | 2020-03-31 | Textio, Inc. | Using machine learning to predict outcomes for documents |
| EP4044022A1 (en) * | 2015-07-30 | 2022-08-17 | Wix.com Ltd. | System integrating a mobile device application creation, editing and distribution system with a website design system |
| US11500535B2 (en) * | 2015-10-29 | 2022-11-15 | Lenovo (Singapore) Pte. Ltd. | Two stroke quick input selection |
| US10324699B2 (en) * | 2015-12-15 | 2019-06-18 | International Business Machines Corporation | Enhanceable cross-domain rules engine for unmatched registry entries filtering |
| US10289642B2 (en) * | 2016-06-06 | 2019-05-14 | Baidu Usa Llc | Method and system for matching images with content using whitelists and blacklists in response to a search query |
| US9817806B1 (en) * | 2016-06-28 | 2017-11-14 | International Business Machines Corporation | Entity-based content change management within a document content management system |
| WO2018039774A1 (en) | 2016-09-02 | 2018-03-08 | FutureVault Inc. | Systems and methods for sharing documents |
| CA3035097C (en) | 2016-09-02 | 2024-05-21 | FutureVault Inc. | Automated document filing and processing methods and systems |
| WO2018039772A1 (en) | 2016-09-02 | 2018-03-08 | FutureVault Inc. | Real-time document filtering systems and methods |
| CN108241612B (en) * | 2016-12-27 | 2021-11-05 | 北京国双科技有限公司 | Method and device for processing punctuation marks |
| JP7009840B2 (en) | 2017-08-30 | 2022-01-26 | 富士通株式会社 | Information processing equipment, information processing method and dialogue control system |
| US11030223B2 (en) * | 2017-10-09 | 2021-06-08 | Box, Inc. | Collaboration activity summaries |
| US11928083B2 (en) | 2017-10-09 | 2024-03-12 | Box, Inc. | Determining collaboration recommendations from file path information |
| KR102462516B1 (en) | 2018-01-09 | 2022-11-03 | 삼성전자주식회사 | Display apparatus and Method for providing a content thereof |
| US10824306B2 (en) * | 2018-10-16 | 2020-11-03 | Lenovo (Singapore) Pte. Ltd. | Presenting captured data |
| CN111460272B (en) * | 2019-01-22 | 2024-02-13 | 北京国双科技有限公司 | Text page ordering method and related equipment |
| US11042555B1 (en) * | 2019-06-28 | 2021-06-22 | Bottomline Technologies, Inc. | Two step algorithm for non-exact matching of large datasets |
| US11960834B2 (en) * | 2019-09-30 | 2024-04-16 | Brave Software, Inc. | Reader mode-optimized attention application |
| US10956731B1 (en) | 2019-10-09 | 2021-03-23 | Adobe Inc. | Heading identification and classification for a digital document |
| US10949604B1 (en) * | 2019-10-25 | 2021-03-16 | Adobe Inc. | Identifying artifacts in digital documents |
| CN117707505A (en) * | 2022-09-08 | 2024-03-15 | 北京有竹居网络技术有限公司 | Webpage generation method and device, electronic equipment and storage medium |
| US12223255B2 (en) * | 2022-09-12 | 2025-02-11 | Google Llc | Reading assistant in a browser environment |
Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20110066662A1 (en) * | 2009-09-14 | 2011-03-17 | Adtuitive, Inc. | System and Method for Content Extraction from Unstructured Sources |
Family Cites Families (23)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JPS6249483A (en) * | 1985-08-28 | 1987-03-04 | Hitachi Ltd | Character input method for real-time handwritten character recognition |
| US7536561B2 (en) * | 1999-10-15 | 2009-05-19 | Ebrary, Inc. | Method and apparatus for improved information transactions |
| US7137067B2 (en) | 2000-03-17 | 2006-11-14 | Fujitsu Limited | Device and method for presenting news information |
| JP3703080B2 (en) | 2000-07-27 | 2005-10-05 | インターナショナル・ビジネス・マシーンズ・コーポレーション | Method, system and medium for simplifying web content |
| US6778986B1 (en) * | 2000-07-31 | 2004-08-17 | Eliyon Technologies Corporation | Computer method and apparatus for determining site type of a web site |
| US7467206B2 (en) | 2002-12-23 | 2008-12-16 | Microsoft Corporation | Reputation system for web services |
| US7653621B2 (en) * | 2003-07-30 | 2010-01-26 | Oracle International Corporation | Method of determining the similarity of two strings |
| US7392474B2 (en) | 2004-04-30 | 2008-06-24 | Microsoft Corporation | Method and system for classifying display pages using summaries |
| US20130212463A1 (en) | 2004-09-07 | 2013-08-15 | Evernote Corporation | Smart document processing with associated online data and action streams |
| US7774192B2 (en) | 2005-01-03 | 2010-08-10 | Industrial Technology Research Institute | Method for extracting translations from translated texts using punctuation-based sub-sentential alignment |
| US8468445B2 (en) * | 2005-03-30 | 2013-06-18 | The Trustees Of Columbia University In The City Of New York | Systems and methods for content extraction |
| US9141718B2 (en) | 2005-06-03 | 2015-09-22 | Apple Inc. | Clipview applications |
| JP4238849B2 (en) | 2005-06-30 | 2009-03-18 | カシオ計算機株式会社 | Web page browsing apparatus, Web page browsing method, and Web page browsing processing program |
| US7548929B2 (en) * | 2005-07-29 | 2009-06-16 | Yahoo! Inc. | System and method for determining semantically related terms |
| US8126898B2 (en) | 2006-11-06 | 2012-02-28 | Salesforce.Com, Inc. | Method and system for generating scored recommendations based on scored references |
| US8181107B2 (en) | 2006-12-08 | 2012-05-15 | Bytemobile, Inc. | Content adaptation |
| TW200836075A (en) | 2007-02-16 | 2008-09-01 | Esobi Inc | Method of converting hypertext markup language web page into pure text and system thereof |
| US8806325B2 (en) | 2009-11-18 | 2014-08-12 | Apple Inc. | Mode identification for selective document content presentation |
| US8819028B2 (en) * | 2009-12-14 | 2014-08-26 | Hewlett-Packard Development Company, L.P. | System and method for web content extraction |
| US8281232B2 (en) * | 2010-04-22 | 2012-10-02 | Rockmelt, Inc. | Integrated adaptive URL-shortening functionality |
| US20130155463A1 (en) * | 2010-07-30 | 2013-06-20 | Jian-Ming Jin | Method for selecting user desirable content from web pages |
| US10089404B2 (en) | 2010-09-08 | 2018-10-02 | Evernote Corporation | Site memory processing |
| CA2808943A1 (en) | 2010-09-08 | 2012-03-15 | Evernote Corporation | Site memory processing and clipping control |
2012
- 2012-07-31: US13/563,060 (US9152730B2), Active
- 2012-11-07: JP2014541166A (JP2015502603A), Pending
- 2012-11-07: PCT/US2012/063777 (WO2013070645A1), Ceased
- 2012-11-07: CA 2853199 (CA2853199A1), Abandoned
- 2012-11-07: EP12847034.1A (EP2776945A4), Ceased
Non-Patent Citations (2)
| Title |
|---|
| See also references of WO2013070645A1 * |
| TOPHER KESSLER: "How to use Safari's new 'Reader'", 9 June 2010 (2010-06-09), pages 1 - 4, XP055477560, Retrieved from the Internet <URL:https://www.cnet.com/news/how-to-use-safaris-new-reader/> [retrieved on 20180523] * |
Also Published As
| Publication number | Publication date |
|---|---|
| JP2015502603A (en) | 2015-01-22 |
| US20130124513A1 (en) | 2013-05-16 |
| EP2776945A1 (en) | 2014-09-17 |
| WO2013070645A1 (en) | 2013-05-16 |
| US9152730B2 (en) | 2015-10-06 |
| CA2853199A1 (en) | 2013-05-16 |
Similar Documents
| Publication | Title |
|---|---|
| EP2776945A4 (en) | EXTRACTION OF THE MAIN CONTENT OF WEB PAGES |
| EP2724557A4 (en) | PROVISION OF RELEVANT CONTENT |
| EP2932401A4 (en) | CONTENT DISTRIBUTION CADRICAL |
| EP2862048A4 (en) | SELECTION AND ROUTING OF ADDITIONAL CONTENT |
| EP2852952A4 (en) | AUDIO CONTENT AUDIO |
| EP2761573A4 (en) | TECHNIQUES TO MANAGE AND VIEW CONTENT FOLLOW-UP |
| EP2561455A4 (en) | Selectively adding social dimension to web searches |
| EP2817970A4 (en) | AUTOMATIC RECOMMENDATION CONTENT |
| EP2726969A4 (en) | DISPLAY OF CONTENT |
| EP2734909A4 (en) | Rich web page generation |
| ES1078354Y (en) | BANK OF FOLDING STRUCTURE ROLLERS |
| LT2775868T (en) | SMOKING PRODUCT WITH VISIBLE CONTENT |
| HUE048841T2 (en) | Cloud-based web content filtering |
| EP2915038A4 (en) | PROVIDING VIRTUALIZED CONTENT |
| AT509318B8 (en) | separation |
| BRDI7105192S (en) | CONFIGURATION APPLIES TO BAG |
| EP2748728A4 (en) | SORT OF FREQUENCY CONTENT |
| FR2962044B1 (en) | LACRYMIMETIC EMULSION |
| EP2727324A4 (en) | CONTEXT EXTRACTION |
| GB201414340D0 (en) | Web application content mapping |
| EP2807800A4 (en) | AUTHORIZATIONS FOR EXPLOITABLE CONTENT |
| FI20100033A0 (en) | extraction |
| GB201222514D0 (en) | Web page variation |
| DK2898148T3 (en) | DRAINAGE STRUCTURE |
| PL2523567T3 (en) | WATER RECOVERY |
Legal Events
| Code | Title | Description |
|---|---|---|
| PUAI | Public reference made under Article 153(3) EPC to a published international application that has entered the European phase | Free format text: ORIGINAL CODE: 0009012 |
| 17P | Request for examination filed | Effective date: 20140602 |
| AK | Designated contracting states | Kind code of ref document: A1; Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
| RIN1 | Information on inventor provided before grant (corrected) | Inventor name: BIGNERT, JAKOB; Inventor name: COARNA, GABRIEL, ALEXANDRU |
| DAX | Request for extension of the European patent (deleted) | |
| RA4 | Supplementary search report drawn up and despatched (corrected) | Effective date: 20150424 |
| RIC1 | Information provided on IPC code assigned before grant | Ipc: G06F 17/30 20060101AFI20150420BHEP |
| 17Q | First examination report despatched | Effective date: 20170919 |
| REG | Reference to a national code | Ref country code: DE; Ref legal event code: R003 |
| STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: THE APPLICATION HAS BEEN REFUSED |
| 18R | Application refused | Effective date: 20190222 |