WO2017117805A1 - Method and system for capturing web information - Google Patents
Method and system for capturing web information
- Publication number
- WO2017117805A1 (application PCT/CN2016/070499)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- time
- type
- webpage
- entry
- captured
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/958—Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3466—Performance evaluation by tracing or monitoring
- G06F11/3476—Data logging
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Definitions
- The present invention relates to the field of the Internet, and in particular to a method and system for capturing network information.
- A network consists of nodes and connections, representing many objects and their interconnections.
- A network is a kind of graph and is generally considered to be a weighted graph.
- A network also has a specific physical meaning: it is abstracted from practical problems of the same type.
- In the field of computers, the network is a virtual platform for transmitting, receiving, and sharing information. It links the information of various points, surfaces, and bodies together so that these resources can be shared.
- The network is the most important invention in the history of human development and has advanced the development of science, technology, and human society.
- Existing network information is massive, while each user is a specific individual, so crawling network information is very important. Existing approaches capture network information without any screening, so the crawling results are poor.
- The application provides a method for capturing network information.
- The invention addresses the shortcoming of prior-art technical solutions, namely that network information capture is not effective.
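- A method for crawling network information, comprising the following steps:
- acquiring an entry time and an exit time when the user clicks into the webpage;
- if the time difference between the exit time and the entry time is greater than a time threshold, capturing the type of the webpage;
- sending the captured webpage type and the identifier of the user to the background server.
- The method further includes:
- if the time difference is less than the time threshold, the type of the webpage is not captured.
- The method further includes:
- if the number of clicks on the webpage exceeds a threshold number of times, the type of the webpage is captured.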
- A crawling system for network information, comprising:
- an obtaining unit configured to obtain an entry time and an exit time when the user clicks into the webpage;
- a crawling unit configured to capture the type of the webpage when the time difference between the exit time and the entry time is greater than a time threshold;
- a sending unit configured to send the captured webpage type and the identifier of the user to the background server.
- The system further includes:
- The crawling unit is further configured to capture the type of the webpage if the number of clicks on the webpage exceeds a threshold number of times.
- The technical solution provided by the invention decides whether to capture a user's network information according to how long the user stays on a webpage after clicking into it. This screens the network information, which gives the solution the advantage of a good network information capture effect.
- FIG. 1 is a flowchart of a method for capturing network information according to a first preferred embodiment of the present invention.
- FIG. 2 is a structural diagram of a network information capture system according to a second preferred embodiment of the present invention.
- FIG. 1 shows a method for capturing network information according to a first preferred embodiment of the present invention. The method includes the following steps:
- Step S101: Acquire an entry time and an exit time when the user clicks into the webpage;
- Step S102: If the time difference between the exit time and the entry time is greater than a time threshold, capture the type of the webpage;
- Step S103: Send the captured webpage type and the identifier of the user to the background server.
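The three steps can be illustrated with a minimal Python sketch. This is not the applicant's implementation: the names `PageVisit`, `capture_page_type`, and `process_visit`, the use of seconds as the time unit, and the `send` callback standing in for the background server are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class PageVisit:
    """One visit to a webpage by one user (all field names are illustrative)."""
    user_id: str
    page_type: str
    entry_time: float  # seconds, when the user clicked into the page (S101)
    exit_time: float   # seconds, when the user left the page (S101)


def capture_page_type(visit: PageVisit, time_threshold: float) -> Optional[str]:
    """Step S102: capture the page type only if the dwell time exceeds the threshold."""
    dwell = visit.exit_time - visit.entry_time
    return visit.page_type if dwell > time_threshold else None


def process_visit(visit: PageVisit, time_threshold: float,
                  send: Callable[[Dict[str, str]], None]) -> bool:
    """Run steps S101 to S103 for one visit; return True if something was sent."""
    page_type = capture_page_type(visit, time_threshold)
    if page_type is None:
        return False
    # Step S103: pass the captured type and the user identifier to the server stub.
    send({"user_id": visit.user_id, "page_type": page_type})
    return True
```

For example, with a hypothetical threshold of 30 seconds, calling `process_visit(PageVisit("u1", "news", 0.0, 45.0), 30.0, outbox.append)` would append one record to a list named `outbox`, while a 5-second visit would be skipped.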
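- The foregoing method may further include:
- if the time difference is less than the time threshold, the type of the webpage is not captured.
- The foregoing method may further include:
- if the number of clicks on the webpage exceeds a threshold number of times, the type of the webpage is captured.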
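The system description states two further rules: the type is not captured when the time difference is less than the time threshold, and the type is captured when the number of clicks on the webpage exceeds a threshold number of times. A minimal sketch of one way these might combine follows. The text does not say how the rules interact or what happens when the time difference equals the threshold, so treating the click count as an independent trigger and treating equality as "no capture" are both assumptions, as is the function name `should_capture`.

```python
def should_capture(dwell_seconds: float, click_count: int,
                   time_threshold: float, click_threshold: int) -> bool:
    """Decide whether to capture the webpage type for one user and page."""
    # Assumption: a click count above the threshold triggers capture on its own.
    if click_count > click_threshold:
        return True
    # Otherwise apply the dwell-time rule. A dwell time equal to the threshold
    # is treated as "not captured" here, which the text leaves unspecified.
    return dwell_seconds > time_threshold
```

Keeping the decision in a single pure function makes the two thresholds easy to tune and test separately from any event collection or network code.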
- FIG. 2 is a schematic diagram of a network information capture system according to a second preferred embodiment of the present invention.
- the system includes the following:
- the obtaining unit 201 is configured to obtain an entry time and an exit time when the user clicks into the webpage;
- the crawling unit 202 is configured to capture the type of the webpage when the time difference between the exit time and the entry time is greater than a time threshold;
- the sending unit 203 is configured to send the captured webpage type and the identifier of the user to the background server.
- the above system may further include:
- the abandonment unit 204 is configured to not capture the type of the webpage if the time difference is less than the time threshold.
- the crawling unit 202 is further configured to capture the type of the webpage if the number of clicks on the webpage exceeds a threshold number of times.
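The unit structure of the second embodiment could be sketched as a small set of Python classes. The class and method names, the URL-to-type lookup table, and the in-memory `outbox` that stands in for the background server are hypothetical and not taken from the application. The abandonment unit 204 is modelled here simply as the branch that drops a visit without sending anything, and the click-count rule is omitted for brevity.

```python
from typing import Dict, List, Optional, Tuple


class ObtainingUnit:
    """Unit 201: records entry times and pairs them with exit times."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, str], float] = {}

    def on_enter(self, user_id: str, url: str, timestamp: float) -> None:
        self._entries[(user_id, url)] = timestamp

    def on_exit(self, user_id: str, url: str,
                timestamp: float) -> Optional[Tuple[float, float]]:
        entry = self._entries.pop((user_id, url), None)
        return None if entry is None else (entry, timestamp)


class CrawlingUnit:
    """Unit 202: captures the page type when the dwell time exceeds the threshold."""

    def __init__(self, time_threshold: float, page_types: Dict[str, str]) -> None:
        self.time_threshold = time_threshold
        self.page_types = page_types  # hypothetical URL -> type lookup

    def capture(self, url: str, entry_time: float, exit_time: float) -> Optional[str]:
        if exit_time - entry_time > self.time_threshold:
            return self.page_types.get(url)  # None if the type is unknown
        return None  # abandonment (unit 204): nothing is captured


class SendingUnit:
    """Unit 203: an in-memory outbox standing in for the background server."""

    def __init__(self) -> None:
        self.outbox: List[Dict[str, str]] = []

    def send(self, user_id: str, page_type: str) -> None:
        self.outbox.append({"user_id": user_id, "page_type": page_type})


class CaptureSystem:
    """Wires units 201 to 203 together for a single exit event."""

    def __init__(self, time_threshold: float, page_types: Dict[str, str]) -> None:
        self.obtaining = ObtainingUnit()
        self.crawling = CrawlingUnit(time_threshold, page_types)
        self.sending = SendingUnit()

    def handle_exit(self, user_id: str, url: str, timestamp: float) -> bool:
        times = self.obtaining.on_exit(user_id, url, timestamp)
        if times is None:
            return False  # no matching entry event was recorded
        page_type = self.crawling.capture(url, times[0], times[1])
        if page_type is None:
            return False
        self.sending.send(user_id, page_type)
        return True
```

In use, a caller would report `on_enter` when the user clicks into a page and `handle_exit` when the user leaves; only visits longer than the threshold end up in `sending.outbox`.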
- The program may be stored in a computer-readable storage medium, which may include a flash drive, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
- ROM Read-Only Memory
- RAM Random Access Memory
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Computer Hardware Design (AREA)
- Quality & Reliability (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The invention relates to a method and a system for capturing web information. The method comprises: acquiring an entry time and an exit time after a user clicks to enter a web page (101); if the time difference between the exit time and the entry time is greater than a time threshold, capturing the type of the web page (102); and sending the captured web page type and an identifier of the user to a back-end server (103). The method and system have the advantage of capturing web information accurately.
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2016/070499 WO2017117805A1 (fr) | 2016-01-08 | 2016-01-08 | Method and system for capturing web information |
| CN201680000016.XA CN105683962A (zh) | 2016-01-08 | 2016-01-08 | Method and system for capturing network information |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2016/070499 WO2017117805A1 (fr) | 2016-01-08 | 2016-01-08 | Method and system for capturing web information |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2017117805A1 (fr) | 2017-07-13 |
Family
ID=56216040
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2016/070499 (Ceased) WO2017117805A1 (fr) | 2016-01-08 | 2016-01-08 | Method and system for capturing web information |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN105683962A (fr) |
| WO (1) | WO2017117805A1 (fr) |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
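| JP2004280501A (ja) * | 2003-03-17 | 2004-10-07 | Nri & Ncc Co Ltd | Web flow-line analysis system |
| CN102347930A (zh) * | 2010-07-26 | 2012-02-08 | 中国电信股份有限公司 | Method and system for acquiring web page content |
| CN102495874A (zh) * | 2011-12-01 | 2012-06-13 | 江苏仕德伟网络科技股份有限公司 | Method for determining the number of web pages browsed and the time spent by an Internet user during a single website visit |
| CN104199874A (zh) * | 2014-08-20 | 2014-12-10 | 哈尔滨工程大学 | Web page recommendation method based on user browsing behavior |
| CN104700289A (zh) * | 2015-03-17 | 2015-06-10 | 中国联合网络通信集团有限公司 | Advertisement delivery method and device |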
- 2016-01-08: WO application PCT/CN2016/070499, published as WO2017117805A1 (fr), not active (ceased)
- 2016-01-08: CN application CN201680000016.XA, published as CN105683962A (zh), active (pending)
Also Published As
| Publication number | Publication date |
|---|---|
| CN105683962A (zh) | 2016-06-15 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| WO2017161578A1 (fr) | | Data capture method and system |
| CN104809026B (zh) | | Method for borrowing CPU computing resources using remote nodes |
| WO2017128355A1 (fr) | | Method and system for booking home-viewing appointments on a real estate network |
| WO2017117805A1 (fr) | | Method and system for capturing web information |
| WO2017117783A1 (fr) | | Network information search system and method |
| WO2017128364A1 (fr) | | Big-data-based method and system for associating person and vehicle information |
| CN112416887B (zh) | | Information interaction method, apparatus, and electronic device |
| WO2017173653A1 (fr) | | Internet-based educational question-and-answer method and system |
| WO2017128351A1 (fr) | | Method and system for evaluating an agent on real estate websites |
| WO2017117778A1 (fr) | | Network information sharing system and method |
| WO2017117781A1 (fr) | | Method and system for classifying network information |
| WO2017117716A1 (fr) | | Outdoor positioning management method and system for a smart city |
| WO2017117803A1 (fr) | | Method and system for acquiring online advertisements |
| WO2017117782A1 (fr) | | Method and system for word segmentation processing of network information |
| WO2017117785A1 (fr) | | Web search method and system |
| WO2017190284A1 (fr) | | System and method for acquiring online course users |
| WO2017173652A1 (fr) | | Internet-based method and system for restricting educational devices |
| WO2017190322A1 (fr) | | Method and system for training lawyers through online courses |
| WO2017117779A1 (fr) | | Operating method and system for network information sharing |
| WO2017190283A1 (fr) | | Method and system for filtering online courses |
| WO2017128440A1 (fr) | | Method and system for big data monitoring and reminding |
| WO2017166132A1 (fr) | | Method and system for pushing network information |
| WO2017173651A1 (fr) | | Internet-based education method and system |
| WO2017161576A1 (fr) | | Data early-warning method and system |
| WO2018027928A1 (fr) | | Method and system for capturing forum big data |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 16882950; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: PCT application non-entry in European phase | Ref document number: 16882950; Country of ref document: EP; Kind code of ref document: A1 |