WO2021098242A1 - Page processing method and device, electronic device, and computer-readable medium
- Publication number
- WO2021098242A1 (PCT/CN2020/101910)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- node
- layout object
- page
- nodes
- recall
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/957—Browsing optimisation, e.g. caching or content distillation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/957—Browsing optimisation, e.g. caching or content distillation
- G06F16/9577—Optimising the visualization of content, e.g. distillation of HTML documents
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/958—Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
Definitions
- The embodiments of the present disclosure relate to the technical fields of deep learning and intelligent search, and in particular to a page processing method and device, an electronic device, and a computer-readable medium.
- The content of the pages displayed on a website should be filtered to protect the mobile search ecosystem and improve the user's browsing experience.
- The embodiments of the present disclosure provide a page processing method and device, an electronic device, and a computer-readable medium.
- An embodiment of the present disclosure provides a page processing method, including: determining multiple layout object nodes of a page according to an obtained Hypertext Markup Language (HTML) file; after laying out the multiple layout object nodes of the page, filtering them with preset recall rules to obtain the layout object nodes that meet the recall rules; predicting whether the layout object nodes that meet the recall rules are designated target nodes; and performing shielding processing on the designated target nodes and generating the shielded page from the layout object nodes remaining after the shielding processing.
- An embodiment of the present disclosure provides a page processing device, including: a node determining module, configured to determine multiple layout object nodes of a page according to an obtained Hypertext Markup Language (HTML) file; a node screening module, configured to lay out the multiple layout object nodes of the page and then filter them with preset recall rules to obtain the layout object nodes that meet the recall rules; a prediction module, configured to predict whether the layout object nodes that meet the recall rules are designated target nodes; and a shielding processing module, configured to shield the designated target nodes and generate a shielded page from the layout object nodes remaining after shielding.
- An embodiment of the present disclosure provides an electronic device, including: one or more processors; a memory storing one or more programs that, when executed by the one or more processors, cause the one or more processors to implement any one of the above page processing methods; and one or more I/O interfaces, connected between the processor and the memory and configured to implement information interaction between the processor and the memory.
- An embodiment of the present disclosure provides a computer-readable medium storing a computer program that, when executed by a processor, implements any one of the above page processing methods.
- The page processing method, device, electronic device, and computer-readable medium process the page by combining recall rules with a node prediction model. For the layout object nodes screened by the recall rules, the node prediction model determines whether each node affects the browsing experience, and the nodes predicted to affect it are shielded. This optimizes the page browsing experience as a whole and protects the mobile search ecosystem.
- FIG. 1 is a schematic diagram of a page processing architecture provided by an embodiment of the present disclosure;
- FIG. 2 is a flowchart of a page processing method according to an embodiment of the present disclosure;
- FIG. 3 is a schematic diagram of recall rules in an exemplary embodiment of the present disclosure;
- FIG. 4 is a flowchart of a page processing method according to another embodiment of the present disclosure;
- FIG. 5 is a schematic diagram of the effect of the page processing method of the present disclosure;
- FIG. 6 is a block diagram of a page processing device provided by an embodiment of the present disclosure;
- FIG. 7 is a block diagram of an electronic device provided by an embodiment of the present disclosure;
- FIG. 8 is a block diagram of a computer-readable medium provided by an embodiment of the present disclosure.
- FIG. 1 is a schematic diagram of a page processing architecture of an embodiment of the present disclosure.
- The architecture may include a mobile device 20 and a website 30. The mobile device 20 may include a browser kernel 21, a memory 22, and a display screen 23; the website 30 may include multiple pages 31.
- The mobile device 20 may include, but is not limited to, a personal computer, a smartphone, a tablet computer, a personal digital assistant, a server, and the like, any of which can have various applications (Apps) installed, such as mailbox apps.
- the page 31 in the embodiment of the present disclosure includes, but is not limited to, a landing page.
- A landing page is an independent web page that can be used for marketing or advertising activities, such as a page that users or visitors reach by clicking an advertisement in search results or through paid search channels.
- The browser kernel 21 initiates a download of the Hypertext Markup Language (HTML) file according to the URL, and parses the downloaded HTML file to obtain the DOM (Document Object Model) tree.
- When links to resources such as Cascading Style Sheets (CSS) files and script (JavaScript, JS) files are encountered, a CSS file download and a JS file download are initiated, and the downloaded CSS and JS files are stored in the memory 22.
- The embodiments of the present disclosure provide a page processing method. Before the display screen 23 of the mobile device 20 displays the page 31, the types of the page elements in the page 31 are intelligently recognized in the rendering phase of the browser kernel 21, and the elements that affect the user's browsing experience are automatically blocked. After the page 31 is rendered, what the user 10 sees is an optimized page, which greatly improves the user's browsing experience and protects the mobile search ecosystem.
- Fig. 2 is a flowchart of a page processing method according to an embodiment of the disclosure. As shown in Figure 2, the page processing method may include the following steps.
- S110 Determine multiple layout object nodes of the page according to the obtained Hypertext Markup Language (HTML) file.
- S120 After the multiple layout object nodes of the page are laid out, filter them with preset recall rules to obtain the layout object nodes that meet the recall rule.
- S130 Predict whether the layout object node that meets the recall rule is a designated target node.
- S140 Perform shielding processing on the designated target node, and use the layout object nodes remaining after the shielding processing to generate a shielded page.
- The page is processed by combining the recall rule with the node prediction model.
- For the layout object nodes screened by the recall rule, the node prediction model predicts whether each node affects the browsing experience.
- The layout object nodes predicted to affect the browsing experience are shielded and the shielded page is generated, which optimizes the page browsing experience as a whole and protects the mobile search ecosystem.
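- The flow of steps S120 to S140 can be sketched roughly as follows. This is a minimal illustration, not the disclosed implementation: the `LayoutNode` class, the `process_page` function, the rule and model callables, and the 0.5 probability threshold are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class LayoutNode:
    """Hypothetical stand-in for a layout object node that has been laid out."""
    name: str
    width_ratio: float   # node width / screen width
    height_ratio: float  # node height / screen height
    hidden: bool = False


def process_page(
    nodes: List[LayoutNode],
    meets_recall_rule: Callable[[LayoutNode], bool],
    predict_probability: Callable[[LayoutNode], float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the disclosure
) -> List[LayoutNode]:
    """Return the nodes that remain visible after shielding."""
    for node in nodes:
        # S120: cheap rule-based screening comes first.
        if not meets_recall_rule(node):
            continue
        # S130: only recalled nodes are scored by the prediction model.
        if predict_probability(node) >= threshold:
            # S140: shield the designated target node.
            node.hidden = True
    return [n for n in nodes if not n.hidden]


page = [
    LayoutNode("article", width_ratio=1.0, height_ratio=0.9),
    LayoutNode("floating_banner", width_ratio=1.0, height_ratio=0.1),
]
visible = process_page(
    page,
    meets_recall_rule=lambda n: n.height_ratio < 0.75,
    predict_probability=lambda n: 0.9 if "banner" in n.name else 0.1,
)
print([n.name for n in visible])
```

- In this sketch the model callable is invoked only for nodes that pass the rule check, which mirrors the stated goal of keeping the model off normal nodes.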
- Laying out a layout object node means arranging the node and calculating its geometric information such as width, height, and position.
- The page processing method of the embodiments of the present disclosure need not traverse the entire page or re-lay out the entire page; instead, it actively triggers a partial layout.
- The preset recall rules can be used to filter the layout object nodes.
- Each node on the page calls its own layout method during layout, which avoids traversing the DOM tree.
- The target node is shielded, for example by setting its state to hidden, resetting the kernel layout state, and actively initiating a kernel re-layout, so that the node can be laid out locally and a re-layout at the level of the entire page is avoided.
- Step S110 may specifically include: S21, parsing the HTML file to obtain the Document Object Model (DOM) and the Cascading Style Sheets (CSS); S22, parsing the CSS to obtain the style data of the HTML element nodes in the DOM; S23, determining multiple layout object nodes of the page according to the HTML element nodes in the DOM that need to be rendered and the style data.
- Each layout object node corresponds to an HTML element node that needs to be rendered, and the style data of each layout object node is the style data of the corresponding HTML element node.
- The DOM can be tree-structured, that is, a DOM tree, and the multiple layout object nodes can be nodes in a layout object (Layout Object) tree. After the Layout Object tree is built and laid out, its nodes carry attribute information such as coordinates, width, and height.
- Each node in the Layout Object tree corresponds to an HTML element node in the DOM that needs to be rendered. The CSS attribute object that describes an HTML element node in the DOM tree is set on the newly created layout object node in the Layout Object tree, so that the layout object node can be drawn according to the style data in the CSS.
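- Step S23 can be sketched as a recursive walk that creates one layout object per DOM node that needs rendering and copies the node's style data onto it. The sketch is illustrative only: `DomNode`, `LayoutObject`, and `build_layout_tree` are hypothetical names, styles are looked up by tag name for brevity, and treating `display: none` as "does not need rendering" is an assumption of this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DomNode:
    tag: str
    children: List["DomNode"] = field(default_factory=list)


@dataclass
class LayoutObject:
    tag: str
    style: Dict[str, str]
    children: List["LayoutObject"] = field(default_factory=list)


def build_layout_tree(
    dom: DomNode, styles: Dict[str, Dict[str, str]]
) -> Optional[LayoutObject]:
    """Create layout objects only for DOM nodes that need to be rendered."""
    style = styles.get(dom.tag, {})
    # Assumption: a node styled display:none is not rendered, so it (and its
    # subtree) gets no layout object.
    if style.get("display") == "none":
        return None
    # The style data of the DOM node is set on the new layout object.
    layout_object = LayoutObject(dom.tag, dict(style))
    for child in dom.children:
        child_object = build_layout_tree(child, styles)
        if child_object is not None:
            layout_object.children.append(child_object)
    return layout_object


dom = DomNode("body", [DomNode("div", [DomNode("img")]), DomNode("script"), DomNode("p")])
styles = {"script": {"display": "none"}, "div": {"display": "block"}}
layout_root = build_layout_tree(dom, styles)
print([child.tag for child in layout_root.children])
```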
- The method may further include: S31, downloading and executing the script file that a script file link points to, to obtain the HTML element nodes corresponding to the script file; S32, taking the HTML element nodes corresponding to the script file as layout object nodes that meet the recall rule.
- The page processing method may further include: if the multiple layout object nodes include layout object nodes loaded through a script file, taking the layout object nodes loaded through the script file as layout object nodes that meet the recall rule.
- In other words, whether a node is loaded by JS is itself a feature by which the HTML element nodes corresponding to the script file can be treated as layout object nodes that meet the recall rule.
- The nodes to be identified are thus preliminarily filtered, and node re-layout is triggered by the asynchronously loaded JS resource, which effectively reduces the time the node prediction model later spends identifying nodes that affect the browsing experience.
- Fig. 3 shows a schematic diagram of a recall rule in an exemplary embodiment of the present disclosure.
- Filtering layout object nodes with preset recall rules may be referred to as rule-based coarse recall.
- The node recall conditions can be set based on the node's width and height ratios, embedded form, location feature, content, generation mechanism, and structure.
- That is, the recall rule may include a rule set in advance according to at least one of the ratio of the width and height of the node, the embedded form of the node, the location feature of the node, the content of the node, the node generation mechanism, and the node structure.
- Step S120 may specifically include: S41, laying out the multiple layout object nodes of the page to obtain attribute information of the layout object nodes that have been laid out; S42, determining whether the attribute information meets the node recall conditions defined in the recall rule; S43, taking the layout object nodes that meet the node recall conditions as the layout object nodes that meet the recall rule.
- The rule set according to the width and height ratios of the node includes: taking a node whose height ratio is less than a height ratio threshold, and/or a node whose width ratio is less than a width ratio threshold, as a node that meets the recall rule.
- Nodes that affect the browsing experience seldom occupy the entire screen; most of them are interspersed in or float over the page.
- For example, a node that occupies 75% of the screen height is very unlikely to be a target node, and the child nodes of other nodes whose width ratio is less than the width ratio threshold can be filtered out.
- The rule set according to the embedded form of the node includes: taking a node with a specified embedded form as a node that meets the recall rule. For example, data analysis shows that inline frame (iframe) nodes are common hosts for target nodes and carry the embedded content of most advertisers. Therefore, nodes with iframes are also included in the set of suspected target nodes.
- The rule set according to the location feature of the node includes: taking a node whose location feature is floating as a node that meets the recall rule.
- Target nodes can be fixed, embedded, or floating relative to the page. A floating target node has the worst impact on the browsing experience: it obstructs useful information and forces the user to close it. Therefore, floating nodes are also included in the set of suspected target nodes.
- The rule set according to the content of the node includes: taking a node with a specified type of content as a node that meets the recall rule.
- If the text, pictures, and interactive content in a node are relatively rich, the node is very likely a non-target node.
- The rule set according to the node generation mechanism includes: taking a node with a specified generation mechanism as a node that meets the recall rule.
- The nodes in a page include nodes from the HTML source code and nodes dynamically generated by JS, and JS-generated nodes are flexible and changeable. Most of the main content of a page is in the HTML, while content that needs to change dynamically, such as advertisements and related recommendations, is generated with JS. Therefore, a node generated by JS is highly likely to be a target node.
- The rule set according to the structure of the node includes: taking a node with a specified structure as a node that meets the recall rule.
- The structural characteristics of a node in the DOM tree can also serve as a basis for filtering. For example, most nodes that contain only plain text are non-target nodes (nodes that do not meet the recall rules), while block-level nodes of the form div/a/img are likely to be picture-based promotions.
- In rule-based coarse recall, a node that hits the node recall condition defined by any one of the recall rules has the characteristics of a suspected target node, and the subsequent target node judgment logic can be applied to it.
- If a node misses all the rules, it is regarded as a non-target node, so the recall-rule screening strategy filters out a large number of normal nodes that do not affect the browsing experience.
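- The "hit any rule" logic of coarse recall can be sketched as below. The sketch covers only the ratio, embedded-form, location, and generation-mechanism rules; the content and structure rules are omitted. `NodeInfo`, `RECALL_RULES`, and the width threshold are hypothetical, and the 0.75 height threshold simply reuses the 75% figure from the example above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class NodeInfo:
    """Hypothetical attribute record collected while a node is laid out."""
    width_ratio: float         # node width / screen width
    height_ratio: float        # node height / screen height
    is_iframe: bool = False    # embedded form
    is_floating: bool = False  # location feature
    from_js: bool = False      # generation mechanism


# Illustrative thresholds (assumptions, not values fixed by the disclosure).
HEIGHT_RATIO_THRESHOLD = 0.75
WIDTH_RATIO_THRESHOLD = 0.75

RECALL_RULES: Dict[str, Callable[[NodeInfo], bool]] = {
    "small_height": lambda n: n.height_ratio < HEIGHT_RATIO_THRESHOLD,
    "small_width": lambda n: n.width_ratio < WIDTH_RATIO_THRESHOLD,
    "iframe": lambda n: n.is_iframe,
    "floating": lambda n: n.is_floating,
    "js_generated": lambda n: n.from_js,
}


def hit_rules(node: NodeInfo) -> List[str]:
    """Names of all recall rules whose condition the node meets."""
    return [name for name, rule in RECALL_RULES.items() if rule(node)]


def is_suspected_target(node: NodeInfo) -> bool:
    """A node is recalled as soon as it hits any one rule."""
    return bool(hit_rules(node))


main_content = NodeInfo(width_ratio=1.0, height_ratio=0.9)
floating_banner = NodeInfo(
    width_ratio=1.0, height_ratio=0.1, is_floating=True, from_js=True
)
print(hit_rules(main_content))
print(hit_rules(floating_banner))
```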
- Before step S130, the following steps may further be included.
- The layout object nodes that meet the recall rule are taken as the layout object nodes obtained by the initial screening, and the node states of these nodes are determined.
- The preset recall rules are used again to screen the layout object nodes whose node state has changed.
- The layout object nodes obtained by the initial screening and those obtained by the second screening are together regarded as the layout object nodes that meet the recall rule.
- Step S130 may specifically include the following steps.
- S61 Calculate the node features of the layout object node that meets the recall rule according to its attribute information.
- S62 Use a preset node prediction model to process the node features and obtain the probability that the layout object node that meets the recall rule is a designated target node.
- S63 According to the probability value, determine whether the layout object node that meets the recall rule is a designated target node.
- A machine learning model can thus be used to determine whether a node that meets the recall rule is a designated target node that affects the browsing experience.
- The layout object nodes that meet the recall rule are nodes in the layout object tree of the page.
- S61 may specifically include the following steps.
- S71, obtaining the attribute information of the layout object nodes that meet the recall rule, the attribute information being obtained during the layout process; S72, using a depth-first traversal and the attribute information, performing top-down feature calculation on the layout object nodes in the layout object tree that meet the recall rule, to obtain their node features.
- The node features may be features of a specified dimension extracted and calculated from the visual information, content, structure, and the like of the node.
- The specified dimension can be set according to actual calculation requirements, for example greater than or equal to 10; the embodiments of the present disclosure do not specifically limit it.
- With bottom-up feature calculation, node features are calculated while the layout object tree is being built and are passed up to the parent node, so almost all page nodes must participate in feature calculation.
- In the embodiments of the present disclosure, normal nodes are first filtered out by the recall rules. Top-down feature calculation can therefore compute node features selectively, using a depth-first traversal over only the layout object nodes that meet the recall rules (that is, the suspected target nodes), which reduces the number of nodes whose features are calculated and speeds up node feature calculation.
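- A selective, top-down feature pass might look like the following sketch. It walks the layout object tree depth-first, carries the depth down from parent to child, and computes features only for nodes flagged as recalled. The `LayoutNode` fields and the four features (width ratio, height ratio, depth, child count) are hypothetical examples, not the feature set of the disclosure.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LayoutNode:
    name: str
    width: float
    height: float
    recalled: bool = False  # True if the node met a recall rule
    children: List["LayoutNode"] = field(default_factory=list)


def compute_features(
    root: LayoutNode, page_width: float, page_height: float
) -> Dict[str, List[float]]:
    """Depth-first, top-down feature calculation for recalled nodes only."""
    features: Dict[str, List[float]] = {}
    stack = [(root, 0)]  # (node, depth); depth is passed from parent to child
    while stack:
        node, depth = stack.pop()
        if node.recalled:
            features[node.name] = [
                node.width / page_width,
                node.height / page_height,
                float(depth),
                float(len(node.children)),
            ]
        # Push children in reverse so they are visited left to right.
        for child in reversed(node.children):
            stack.append((child, depth + 1))
    return features


ad = LayoutNode("ad", 360, 64, recalled=True, children=[LayoutNode("img", 360, 64)])
body = LayoutNode("body", 360, 640, children=[LayoutNode("article", 360, 600), ad])
node_features = compute_features(body, page_width=360, page_height=640)
print(node_features)
```

- Every node is still visited in this toy traversal, but the feature arithmetic runs only for the recalled node, which is the saving the text describes.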
- The node prediction model is trained in advance on labeled static page data whose rendering was completed offline, and is a gradient boosted decision tree model with a specified depth and a specified number of decision trees.
- When the training data is selected, static data rendered offline can be used for labeling, and a high-accuracy automatic labeling tool can assist manual labeling to form the final training data.
- The node prediction model obtained by machine learning includes a Gradient Boosted Decision Tree (GBDT) model.
- The GBDT model is pre-trained on the annotated data to obtain decision trees of a specified depth and number, for example a model file of 100 trees of depth 4, and the model file is used directly to predict whether a layout object node that meets the recall rule is a designated target node.
- The depth and the number of decision trees given above are exemplary values; the model can be trained according to the actual needs of the user, which is not specifically limited in the embodiments of the present disclosure.
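- Steps S62 and S63 can be illustrated with a toy inference routine. It assumes one common GBDT formulation for binary classification, in which the leaf scores of all trees are summed and passed through a sigmoid; the disclosure does not state the loss or output transform, so this is an assumption. The two hand-written depth-1 trees, their leaf values, the feature order, and the 0.5 threshold are all made up and stand in for a trained model file (for example 100 trees of depth 4).

```python
import math
from typing import List, Sequence, Union

# A tree is either a leaf score (float) or a split node (dict).
Tree = Union[float, dict]


def tree_score(tree: Tree, features: Sequence[float]) -> float:
    """Walk one decision tree from the root down to a leaf score."""
    while isinstance(tree, dict):
        go_left = features[tree["feature"]] < tree["threshold"]
        tree = tree["left"] if go_left else tree["right"]
    return tree


def predict_probability(trees: List[Tree], features: Sequence[float]) -> float:
    """Sum the leaf scores of all trees and squash the sum with a sigmoid."""
    raw = sum(tree_score(tree, features) for tree in trees)
    return 1.0 / (1.0 + math.exp(-raw))


def is_target(trees: List[Tree], features: Sequence[float], threshold: float = 0.5) -> bool:
    """S63: compare the probability with an assumed decision threshold."""
    return predict_probability(trees, features) >= threshold


# Hypothetical features: [height ratio, is_floating (0.0 or 1.0)].
TOY_TREES: List[Tree] = [
    {"feature": 0, "threshold": 0.3, "left": 1.2, "right": -1.5},
    {"feature": 1, "threshold": 0.5, "left": -0.4, "right": 1.0},
]

banner = [0.1, 1.0]
article = [0.9, 0.0]
print(is_target(TOY_TREES, banner), is_target(TOY_TREES, article))
```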
- In step S140, performing shielding processing on the designated target node may specifically include the following steps.
- The node characteristic information includes at least one of the node's position on the page, its width, its height, whether it is in the main content, and its area ratio in the page.
- The page processing method of the embodiments of the present disclosure provides a shielding strategy for target nodes that can apply a targeted processing mechanism to the characteristics of the designated target node.
- Elements that affect the user's browsing experience are shielded, so after the page is rendered and drawn, the user sees the optimized page, which greatly improves the browsing experience and protects the mobile search ecosystem.
- The shielding processing may, for example, set the node state to hidden, reset the kernel layout state, and actively initiate a kernel re-layout. The entire page processing process occurs before the node is drawn, so the user perceives no jitter from page nodes being hidden while browsing, which optimizes the page browsing experience as a whole.
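- The shielding step can be sketched as "hide the node if a characteristic reaches its threshold, then lay out only the affected siblings again". This is a simplified model, not browser kernel code: `LayoutNode`, `layout_column`, `shield_if_over_threshold`, the use of the height ratio as the tested characteristic, and the 0.1 threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LayoutNode:
    name: str
    height: float
    y: float = 0.0
    hidden: bool = False


def layout_column(nodes: List[LayoutNode]) -> None:
    """Stack visible nodes top to bottom; hidden nodes take up no space."""
    y = 0.0
    for node in nodes:
        if node.hidden:
            continue
        node.y = y
        y += node.height


def shield_if_over_threshold(
    node: LayoutNode, page_height: float, height_ratio_threshold: float
) -> bool:
    """Hide the node when its height ratio reaches the (assumed) threshold."""
    if node.height / page_height >= height_ratio_threshold:
        node.hidden = True
        return True
    return False


header = LayoutNode("header", 50)
banner = LayoutNode("banner", 100)
content = LayoutNode("content", 400)
siblings = [header, banner, content]

layout_column(siblings)          # initial layout: content starts at y = 150
if shield_if_over_threshold(banner, page_height=640, height_ratio_threshold=0.1):
    layout_column(siblings)      # local re-layout of this column only
print(content.y)
```

- Because both layout passes happen before anything is drawn, the first frame already shows the content moved up, which corresponds to the "no jitter" property described above.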
- FIG. 4 shows a flowchart of a page processing method according to another embodiment of the present disclosure.
- the page processing method may include the following steps.
- The DOM tree is obtained by parsing the HTML file with the parser. When links to CSS and JS file resources are found in the HTML file during parsing, the CSS is downloaded and parsed, and the JS file is downloaded and executed.
- Parsing the downloaded CSS yields the style data of the nodes in the DOM tree; executing the downloaded JS file yields the nodes dynamically loaded by JS, which can be inserted into the DOM tree.
- S203 Construct a Layout Object tree according to the HTML element nodes that need to be rendered in the DOM tree and the style data of the nodes in the DOM tree.
- layer positioning and layout can be realized based on the Layout Layer tree.
- S205 Screen for the nodes generated by JS dynamic loading in the Layout Object tree, and execute S209 to trigger the re-layout of the nodes generated by dynamic loading.
- JS dynamic loading is asynchronous resource loading, so the re-layout of nodes generated by dynamic loading can also be referred to as node re-layout triggered by asynchronous resource loading.
- S206 Collect attribute information of the layout object node in the process of laying out the nodes in the Layout Object tree.
- S207 Based on a preset node prediction model, score whether the layout object node is a designated target node that affects the browsing experience, and predict from the score whether the layout object node affects the browsing experience.
- The score of a layout object node is the probability that it is a designated target node that affects the browsing experience.
- A preset recall rule can be used to filter the layout object nodes to obtain the layout object nodes in the Layout Object tree that meet the recall rule. In step S207, the preset node prediction model then scores whether each layout object node that meets the recall rule is a designated target node that affects the browsing experience.
- The designated target node can be shielded (for example, by setting the node state to hidden) through a re-layout of the designated target node.
- S209 Perform the re-layout of the layout object nodes to obtain the layout object nodes after the shielding processing and re-layout.
- S210 Draw a page based on the layout object node after the shielding process, so as to display the drawn page on a designated display screen.
- a combination of recall rule preprocessing and a machine learning model is adopted to filter the nodes to be rendered, thereby shielding elements in the page that affect the browsing experience.
- Fig. 5 shows a schematic diagram of the effect of page processing in an embodiment of the present disclosure.
- Page 1 includes multiple layout object nodes corresponding to multiple HTML element nodes, such as node 1, node 2, node 3, and node 4.
- Each layout object node in page 1 calls its own layout method during layout, which avoids traversing the DOM tree. For each layout object node, the following steps can be performed.
- Step S301 has the same processing procedure as step S120 in the above-mentioned embodiment, which will not be repeated in the embodiment of the present disclosure.
- Step S302 has the same processing process as S53 in the above-mentioned embodiment, which is not repeated in the embodiment of the present disclosure.
- Step S303 has the same processing procedure as step S130 in the above-mentioned embodiment, which will not be repeated in the embodiment of the present disclosure.
- The designated target node is shielded, and the layout object nodes remaining after shielding are used to generate the shielded page.
- Step S304 has the same processing procedure as step S140 in the above-mentioned embodiment, which will not be repeated in the embodiment of the present disclosure.
- Fig. 6 shows a block diagram of a page processing device provided by an embodiment of the present disclosure. As shown in Figure 6, the page processing device includes the following modules.
- The node determining module 610 is configured to determine multiple layout object nodes of the page according to the obtained Hypertext Markup Language (HTML) file.
- The node screening module 620 is configured to lay out the multiple layout object nodes of the page and then filter them with preset recall rules to obtain the layout object nodes that meet the recall rules.
- The prediction module 630 is configured to predict whether a layout object node that meets the recall rule is a designated target node.
- Specifically, the prediction module 630 makes this prediction based on a preset node prediction model.
- The shielding processing module 640 is configured to perform shielding processing on the designated target node and to use the layout object nodes remaining after the shielding processing to generate the shielded page.
- The content of the pages displayed on a website can thus be filtered, which protects the mobile search ecosystem and improves the user's browsing experience.
- The node determining module 610 may include the following units.
- A first parsing unit, configured to parse the HTML file to obtain the Document Object Model (DOM) and the Cascading Style Sheets (CSS);
- a second parsing unit, configured to parse the CSS to obtain the style data of the HTML element nodes in the DOM.
- The node determining module 610 is specifically configured to determine multiple layout object nodes of the page according to the HTML element nodes in the DOM that need to be rendered and the style data.
- Each layout object node corresponds to an HTML element node that needs to be rendered, and the style data of each layout object node is the style data of the corresponding HTML element node.
- The node determining module 610 may further include a download execution unit, configured to download and execute the script file that a script file link points to, to obtain the HTML element nodes corresponding to the script file.
- The node determining module 610 is specifically configured to take the HTML element nodes corresponding to the script file as layout object nodes that meet the recall rule.
- The node screening module 620 may further be configured to: after the multiple layout object nodes of the page are determined, if they include layout object nodes loaded through a script file, take the layout object nodes loaded through the script file as layout object nodes that meet the recall rule.
- The node screening module 620 may specifically include: an attribute information obtaining unit, configured to lay out any layout object node of the page to obtain attribute information of the layout object node that has been laid out; a unit configured to determine whether the attribute information meets the node recall conditions defined in the recall rules; and a recall node determination unit, configured to take the layout object nodes that meet the node recall conditions as the layout object nodes that meet the recall rules.
- The recall rule may include a rule set in advance according to at least one of the ratio of the width and height of the node, the embedded form of the node, the location feature of the node, the content of the node, the node generation mechanism, and the node structure.
- The page processing device may further include: a node state determining module, configured to take the layout object nodes that meet the recall rules as the layout object nodes obtained by the initial screening and to determine their node states;
- a state change node obtaining module, configured to obtain, after all the layout object nodes on the page are laid out, the layout object nodes whose node state has changed;
- a node re-screening module, configured to use the preset recall rules again to screen the layout object nodes whose node state has changed;
- and a screening node determination module, configured to take the layout object nodes obtained by the initial screening and those obtained by the second screening as the layout object nodes that meet the recall rule.
- The prediction module 630 may include: a feature calculation unit, configured to calculate the node features of the layout object node that meets the recall rule based on its attribute information; a probability calculation unit, configured to use the preset node prediction model to process the node features and obtain the probability that the layout object node that meets the recall rule is a designated target node; and a target node determination unit, configured to determine, according to the probability value, whether the layout object node that meets the recall rule is a designated target node.
- The layout object nodes that meet the recall rule are nodes in the layout object tree of the page.
- The feature calculation unit may include an attribute information collection subunit, configured to obtain attribute information of the layout object nodes that meet the recall rules, the attribute information being obtained during the layout process. The feature calculation unit is specifically configured to use a depth-first traversal and the attribute information to perform top-down feature calculation on the layout object nodes in the layout object tree that meet the recall rules, obtaining their node features.
- The node prediction model is trained in advance on labeled static page data whose rendering was completed offline, and is a gradient boosted decision tree model with a specified depth and a specified number of decision trees.
- the shielding processing module 340 may specifically include: a characteristic calculation unit, configured to calculate corresponding node characteristic information according to the attribute information of the designated target node, where the node characteristic information includes at least one of the position in the page, the width, the height, whether the node is in the main content, and the area ratio in the page; and a node shielding unit, configured to shield the designated target node by setting its state to hidden if the node characteristic information reaches the corresponding preset shielding threshold.
- the shielding processing module 340 may specifically further include: a drawing unit, configured to re-lay out the layout object nodes remaining after the shielding processing, and to draw using the re-laid-out layout object nodes to obtain the drawn, shielding-processed page.
- a scheme combining rule recall and model prediction is used to shield the designated target node.
- the entire page processing process occurs before the nodes are drawn, ensuring that the user perceives no jitter from page nodes being hidden while browsing the page, thereby optimizing the overall page browsing experience.
- FIG. 7 shows a block diagram of an electronic device provided by an embodiment of the present disclosure; as shown in FIG. 7, an embodiment of the present disclosure provides an electronic device 700, including: one or more processors 701;
- the memory 702 has one or more programs stored thereon; when the one or more programs are executed by the one or more processors, the one or more processors implement the page processing method of any one of the above; and one or more I/O interfaces 703 are connected between the processor and the memory and are configured to implement information interaction between the processor and the memory.
- the processor 701 is a device with data processing capabilities, including but not limited to a central processing unit (CPU), etc.
- the memory 702 is a device with data storage capabilities, including but not limited to random access memory (RAM, more specifically SDRAM, DDR, etc.), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), and flash memory (FLASH); the I/O interface (read/write interface) 703 is connected between the processor 701 and the memory 702, can realize the information interaction between the processor 701 and the memory 702, and includes but is not limited to a data bus (Bus) and the like.
- the processor 701, the memory 702, and the I/O interface 703 are connected to each other through the bus 704, and further connected to other components of the electronic device 700.
- Fig. 8 shows a block diagram of a computer-readable medium provided by an embodiment of the present disclosure.
- an embodiment of the present disclosure provides a computer-readable medium on which a computer program is stored, and when the program is executed by a processor, any one of the above-mentioned page processing methods is implemented.
- Such software may be distributed on a computer-readable medium; the computer-readable medium may include a computer storage medium (or non-transitory medium) and a communication medium (or transitory medium).
- the term computer storage medium includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data).
- Computer storage media include but are not limited to RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cassette, tape, magnetic disk storage or other magnetic storage device, or any other medium that can be used to store desired information and that can be accessed by a computer.
- a communication medium usually contains computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transmission mechanism, and may include any information delivery medium.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- Bioethics (AREA)
- General Health & Medical Sciences (AREA)
- Computer Hardware Design (AREA)
- Computer Security & Cryptography (AREA)
- Software Systems (AREA)
- Document Processing Apparatus (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
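In the technical field of deep learning and intelligent search, a page processing method and apparatus, an electronic device, and a computer-readable medium are provided. The method includes: determining a plurality of layout object nodes of a page according to an acquired Hypertext Markup Language (HTML) file (S110); after laying out the plurality of layout object nodes of the page, screening the plurality of layout object nodes by using a preset recall rule to obtain layout object nodes that comply with the recall rule (S120); predicting whether a layout object node that complies with the recall rule is a designated target node (S130); and performing shielding processing on the designated target node, and generating a shielding-processed page by using the layout object nodes remaining after the shielding processing (S140).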
Description
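Embodiments of the present disclosure relate to the technical fields of deep learning and intelligent search, and in particular to a page processing method and apparatus, an electronic device, and a computer-readable medium.
With the full popularization of the mobile Internet, more and more sites carry out advertising and application promotion in mobile scenarios. On the one hand, because of the limited screen size of mobile devices, elements such as advertisements have an increasingly noticeable impact on the user's browsing experience; on the other hand, in order to maximize short-term gains, some sites place large numbers of false, pornographic, and deceptive advertising elements on their websites, which seriously affects the user's browsing experience and undermines the security of the mobile ecosystem.
Therefore, the page content displayed by websites should be filtered to safeguard the security of the mobile search ecosystem and thereby improve the user's browsing experience.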
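Summary
Embodiments of the present disclosure provide a page processing method and apparatus, an electronic device, and a computer-readable medium.
In a first aspect, an embodiment of the present disclosure provides a page processing method, including: determining a plurality of layout object nodes of a page according to an acquired Hypertext Markup Language (HTML) file; after laying out the plurality of layout object nodes of the page, screening the plurality of layout object nodes by using a preset recall rule to obtain layout object nodes that comply with the recall rule; predicting whether a layout object node that complies with the recall rule is a designated target node; and performing shielding processing on the designated target node, and generating a shielding-processed page by using the layout object nodes remaining after the shielding processing.
In a second aspect, an embodiment of the present disclosure provides a page processing apparatus, including: a node determination module, configured to determine a plurality of layout object nodes of a page according to an acquired Hypertext Markup Language (HTML) file; a node screening module, configured to, after the plurality of layout object nodes of the page are laid out, screen the plurality of layout object nodes by using a preset recall rule to obtain layout object nodes that comply with the recall rule; a prediction module, configured to predict whether a layout object node that complies with the recall rule is a designated target node; and a shielding processing module, configured to perform shielding processing on the designated target node and generate a shielding-processed page by using the layout object nodes remaining after the shielding processing.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: one or more processors; a memory storing one or more programs that, when executed by the one or more processors, cause the one or more processors to implement any one of the page processing methods described above; and one or more I/O interfaces, connected between the processor and the memory and configured to implement information interaction between the processor and the memory.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable medium storing a computer program that, when executed by a processor, implements any one of the page processing methods described above.
The page processing method, apparatus, electronic device, and computer-readable medium provided by the embodiments of the present disclosure process a page by combining recall rules with a node prediction model: for the layout object nodes that pass the recall-rule screening, the node prediction model is then used to determine whether they affect the browsing experience, so that the layout object nodes predicted to affect the browsing experience are shielded. This optimizes the page browsing experience as a whole and safeguards the security of the mobile search ecosystem.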
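The accompanying drawings are provided for further understanding of the embodiments of the present disclosure and constitute a part of the specification. Together with the embodiments, they serve to explain the present disclosure and do not limit it. The above and other features and advantages will become more apparent to those skilled in the art from the description of detailed example embodiments with reference to the drawings, in which:
FIG. 1 is a schematic diagram of a page processing architecture provided by an embodiment of the present disclosure;
FIG. 2 is a flowchart of a page processing method according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of recall rules in an exemplary embodiment of the present disclosure;
FIG. 4 is a flowchart of a page processing method according to another embodiment of the present disclosure;
FIG. 5 is a schematic diagram of the effect of the page processing method of the present disclosure;
FIG. 6 is a block diagram of a page processing apparatus provided by an embodiment of the present disclosure;
FIG. 7 is a block diagram of an electronic device provided by an embodiment of the present disclosure;
FIG. 8 is a block diagram of a computer-readable medium provided by an embodiment of the present disclosure.
To enable those skilled in the art to better understand the technical solutions of the present disclosure, the page processing method, apparatus, electronic device, and computer-readable medium provided by the present disclosure are described in detail below with reference to the drawings.
Example embodiments are described more fully below with reference to the drawings, but the example embodiments may be embodied in different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that the present disclosure will be thorough and complete and will enable those skilled in the art to fully understand its scope. Where no conflict arises, the embodiments of the present disclosure and the features in the embodiments may be combined with one another.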
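FIG. 1 is a schematic diagram of a page processing architecture according to an embodiment of the present disclosure. As shown in FIG. 1, the architecture may include a mobile device 20 and a website 30, where the mobile device 20 may include a browser kernel 21, a memory 22, and a display screen 23, and the website 30 may include a plurality of pages 31.
The mobile device 20 may include, but is not limited to, a personal computer, a smartphone, a tablet computer, a personal digital assistant, a server, and the like. Various applications (Apps), such as an email App, may be installed on any of them.
The page 31 in the embodiments of the present disclosure includes, but is not limited to, a landing page. A landing page may denote a standalone web page and may be used for marketing or advertising campaigns, for example a page that a user or visitor enters by clicking an advertisement in search results or by clicking through a paid search channel.
In one embodiment, when a user 10 accesses the website 30 through the mobile device 20 and clicks the Uniform Resource Locator (URL) of a page 31 in the website 30, the browser kernel 21 initiates a download of the Hypertext Markup Language (HTML) file according to the URL and parses the downloaded HTML file to obtain a Document Object Model (DOM) tree. When parsing reaches resource links in the HTML file, such as links to Cascading Style Sheets (CSS) and JavaScript (JS) files, the browser kernel initiates downloads of the CSS and JS files, and the downloaded CSS and JS files are stored in the memory 22.
Because the behavior of websites changes very quickly, a configured rule set cannot exhaustively cover all types and pages. Moreover, not all advertisements affect the user's browsing experience: when an advertising element is located where it does not interfere with browsing the main content of the page and involves no inducement or similar behavior, it is normal commercial behavior, and blocking such elements by mistake on a large scale would also damage the normal Internet ecosystem. However, many current solutions cannot distinguish between advertisements that are normal commercial behavior and advertisements that affect the user's browsing experience. In addition, if page elements of a website are filtered on the basis of a rule set, page loading speed is noticeably affected when the rule set becomes too large.
Embodiments of the present disclosure may provide a page processing method in which, before the display screen 23 of the mobile device 20 displays the page 31, the types of page elements in the page 31 are intelligently identified during the rendering stage of the browser kernel 21, and page elements that affect the user's browsing experience are automatically shielded. After the page 31 has been rendered, what the user 10 sees is the optimized page, which greatly improves the user's browsing experience and safeguards the security of the mobile search ecosystem.
Each of the following embodiments may be applied to the system architecture of this embodiment. For brevity, the following embodiments may refer to and cite one another.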
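FIG. 2 is a flowchart of a page processing method according to an embodiment of the present disclosure. As shown in FIG. 2, the page processing method may include the following steps.
S110: determine a plurality of layout object nodes of a page according to an acquired Hypertext Markup Language (HTML) file.
S120: after laying out the plurality of layout object nodes of the page, screen the plurality of layout object nodes by using a preset recall rule to obtain layout object nodes that comply with the recall rule.
S130: predict whether a layout object node that complies with the recall rule is a designated target node.
S140: perform shielding processing on the designated target node, and generate a shielding-processed page by using the layout object nodes remaining after the shielding processing.
According to the page processing method of the embodiments of the present disclosure, a page is processed by combining recall rules with a node prediction model. For the layout object nodes that pass the recall-rule screening, the node prediction model is used to determine whether they affect the browsing experience, so that the layout object nodes predicted to affect the browsing experience are shielded and a shielding-processed page is generated. This optimizes the page browsing experience as a whole and safeguards the security of the mobile search ecosystem.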
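The flow of steps S120 to S140 can be sketched as follows. This is only a minimal illustration under stated assumptions, not the disclosed implementation: the `LayoutNode` class, the `process_page` function, the example rules, and the 0.5 decision threshold are all hypothetical names and values introduced here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class LayoutNode:
    """Hypothetical stand-in for a laid-out layout object node."""
    name: str
    height_ratio: float          # node height / viewport height
    from_script: bool = False    # True if the node was loaded by a script
    hidden: bool = False


def process_page(nodes: List[LayoutNode],
                 recall_rules: List[Callable[[LayoutNode], bool]],
                 predict_probability: Callable[[LayoutNode], float],
                 threshold: float = 0.5) -> List[LayoutNode]:
    # S120: a node is a candidate if it hits ANY recall rule.
    candidates = [n for n in nodes if any(rule(n) for rule in recall_rules)]
    # S130: only the candidates are scored by the prediction model.
    targets = [n for n in candidates if predict_probability(n) >= threshold]
    # S140: hide the targets; the page is generated from the remaining nodes.
    for n in targets:
        n.hidden = True
    return [n for n in nodes if not n.hidden]


nodes = [LayoutNode("article", 0.9),
         LayoutNode("banner", 0.1, from_script=True)]
rules = [lambda n: n.from_script, lambda n: n.height_ratio < 0.75]
remaining = process_page(nodes, rules,
                         lambda n: 0.9 if n.from_script else 0.1)
print([n.name for n in remaining])  # ['article']
```

The point of the two-stage design is that the (comparatively expensive) model only runs on the nodes that the cheap rules could not rule out.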
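In the embodiments of the present disclosure, because the process by which the rendering kernel handles a web page is very complex, choosing an appropriate moment to hide a target node is extremely important from the perspective of processing performance and user experience. Layout of layout object nodes refers to the process of arranging and computing geometric information of the layout object nodes, such as width, height, and position. If advertisement identification were simply performed each time the overall layout of the page completes, followed by a re-layout of the entire page, identification could be achieved; however, a web page needs dozens or even hundreds of layouts while it is being displayed, and identifying target nodes requires traversing the entire page. Both traversal and re-layout take time, which would greatly affect the loading time of the whole page and directly make page loading feel slower.
Therefore, to achieve the best performance and user experience, the page processing method of the embodiments of the present disclosure need not traverse the entire page or re-lay out the entire page; instead, it actively triggers a local layout. Specifically, in step S120 above, the layout object nodes can be screened with the preset recall rules once the plurality of layout object nodes of the page have been laid out.
In other words, in the embodiments of the present disclosure, each node on the page calls its own layout method during layout, which avoids traversing the DOM tree. After a node has been laid out, if it is identified as a target node that affects the browsing experience, shielding processing is performed on that target node, for example by setting the state of the target node to hidden, resetting the kernel layout state, and actively initiating a kernel re-layout. The node can thus be laid out locally, avoiding a re-layout at the level of the entire page.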
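In one embodiment, step S110 may specifically include: S21, parsing the HTML file to obtain a Document Object Model (DOM) and Cascading Style Sheets (CSS); S22, parsing the CSS to obtain style data of the HTML element nodes in the DOM; and S23, determining the plurality of layout object nodes of the page according to the HTML element nodes in the DOM that need to be rendered and the style data.
Each layout object node corresponds to one HTML element node that needs to be rendered, and the style data of each layout object node is the style data of the corresponding HTML element node.
In this embodiment, the Document Object Model (DOM) may be a tree-structured DOM, that is, a DOM tree; the plurality of layout object nodes may be nodes in a Layout Object tree; and after the Layout Object tree is built and laid out, the nodes of the Layout Object tree may have a series of attribute information such as coordinates, width, and height.
In other words, in this embodiment, each node in the Layout Object tree corresponds to an HTML element node in the DOM that needs to be rendered, and the CSS attribute object that describes the HTML element node in the DOM tree is assigned to the newly created layout object node in the Layout Object tree, so that the layout object node in the Layout Object tree can be drawn according to the style data in the CSS.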
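As an illustration of the correspondence between DOM nodes and layout object nodes described above, the following sketch builds a layout object tree from a toy DOM. The classes `DomNode` and `LayoutObject` and the criterion in `needs_rendering` (skipping a few non-visual tags and `display: none` elements) are assumptions made for this example; the source only states that layout objects are created for HTML element nodes that need to be rendered.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DomNode:
    tag: str
    style: Dict[str, str] = field(default_factory=dict)
    children: List["DomNode"] = field(default_factory=list)


@dataclass
class LayoutObject:
    dom: DomNode                  # the HTML element node this object renders
    style: Dict[str, str]         # style data taken from that element node
    children: List["LayoutObject"] = field(default_factory=list)


# Assumed criterion for "does not need rendering" (illustrative only).
NON_RENDERED_TAGS = {"head", "script", "style", "meta"}


def needs_rendering(node: DomNode) -> bool:
    return (node.tag not in NON_RENDERED_TAGS
            and node.style.get("display") != "none")


def build_layout_tree(node: DomNode) -> Optional[LayoutObject]:
    """Create one layout object per DOM node that needs rendering."""
    if not needs_rendering(node):
        return None
    obj = LayoutObject(dom=node, style=node.style)
    for child in node.children:
        child_obj = build_layout_tree(child)
        if child_obj is not None:
            obj.children.append(child_obj)
    return obj


dom = DomNode("body", children=[
    DomNode("script"),
    DomNode("div", {"display": "none"}),
    DomNode("p", {"color": "black"}),
])
tree = build_layout_tree(dom)
print([c.dom.tag for c in tree.children])  # ['p']
```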
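In one embodiment, if a script file link is obtained by parsing the HTML file, then before step S23 the method may further include: S31, downloading and executing the script file corresponding to the script file link to obtain the HTML element nodes corresponding to the script file; and S32, taking the HTML element nodes corresponding to the script file as layout object nodes that comply with the recall rule.
In other words, in some embodiments, after the plurality of layout object nodes of the page are determined, the page processing method may further include: if the plurality of layout object nodes include a layout object node loaded through a script file, taking the layout object node loaded through the script file as a layout object node that complies with the recall rule.
In this embodiment, because many target nodes that affect the browsing experience are loaded dynamically by JS, whether a node is loaded by JS can be used as a criterion: the HTML element nodes corresponding to script files are taken as layout object nodes that comply with the recall rule, so as to pre-filter the nodes to be identified. Node re-layout is thus triggered by the asynchronously loaded JS resources, which effectively reduces the time subsequently spent by the node prediction model on identifying, through prediction, the nodes that affect the browsing experience.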
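FIG. 3 shows a schematic diagram of recall rules in an exemplary embodiment of the present disclosure. In the embodiments of the present disclosure, screening layout object nodes with the preset recall rules may be referred to as rule-based coarse recall.
As shown in FIG. 3, in rule-based coarse recall, node recall conditions may be set in terms of node width and height ratios, node embedding form, node position features, node content, node generation mechanism, node structure, and the like.
In other words, the recall rules may include rules set in advance according to at least one of node width and height ratios, node embedding form, node position features, node content, node generation mechanism, and node structure.
In one embodiment, step S120 may specifically include: S41, laying out the plurality of layout object nodes of the page to obtain attribute information of the laid-out layout object nodes; S42, determining whether the attribute information satisfies the node recall condition defined in the recall rule; and S43, taking the layout object nodes that satisfy the node recall condition as layout object nodes that comply with the recall rule.
As an example, the rule set according to the node width and height ratios includes taking nodes whose height ratio is less than a height-ratio threshold and/or whose width ratio is less than a width-ratio threshold as nodes that comply with the recall rule. In this example, nodes that affect the browsing experience rarely occupy the whole screen; most of them appear in the page in an interspersed or floating form. A node whose height occupies, for example, 75% of the screen is very likely not a target node, and child nodes of other nodes whose width ratio is less than the width-ratio threshold can all be filtered out.
As an example, the rule set according to the node embedding form includes taking nodes with a specified embedding form as nodes that comply with the recall rule. For example, data analysis shows that inline frame (iframe) nodes are a common host for target nodes and contain the embedded data of most advertising vendors, so nodes with an iframe are also included in the set of suspected target nodes.
As an example, the rule set according to node position features includes: taking nodes whose position features include the floating type as nodes that comply with the recall rule. In this example, target nodes may be fixed, embedded, or floating relative to the page. Floating target nodes have the worst impact on the browsing experience, because they block useful information and force the user to close them, so floating nodes are all included in the set of suspected target nodes.
As an example, the rule set according to node content features includes: taking nodes with a specified type of content as nodes that comply with the recall rule. In this example, if the text, pictures, interactive content, and so on within a node are relatively rich, the node is very likely a non-target node.
As an example, the rule set according to the node generation mechanism includes: taking nodes with a specified generation mechanism as nodes that comply with the recall rule. In this example, the nodes in a page include nodes from the HTML source and nodes dynamically generated by JS. JS-generated nodes are flexible and changeable; the main content of most pages is in the HTML, while other content that needs to change dynamically, such as advertisements and related recommendations, is generated by JS. Therefore, a node generated by JS is quite likely to be a target node.
As an example, the rule set according to node structure features includes: taking nodes with a specified structure as nodes that comply with the recall rule. In this example, the structural features of a node in the DOM tree can also serve as a filtering criterion. For example, in the DOM tree structure, nodes containing only plain text are mostly non-target nodes (nodes that do not comply with the recall rule), whereas block-level nodes of the form div/a/img are likely to be nodes that promote something through a picture.
According to the page processing method of the embodiments of the present disclosure, in rule-based coarse recall, hitting the node recall condition defined by any one of the recall rules indicates that the node has the characteristics of a suspected target node, and the subsequent target-node determination logic can then be performed. If none of the rules is hit, the node is regarded as a non-target node. A single recall-rule screening strategy can thus filter out a large number of normal nodes that do not affect the browsing experience.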
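The "hit any rule" logic of the coarse recall can be sketched as below. The field names of `NodeInfo`, the mapping of "floating" to `position == "fixed"`, the encoding of structure as a `"div/a/img"` string, and the concrete threshold are illustrative assumptions; only the rule categories and the 75% height example come from the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeInfo:
    """Hypothetical attribute information collected during layout."""
    height_ratio: float            # node height / screen height
    position: str = "static"       # assumed: "fixed" stands for floating
    has_iframe: bool = False
    from_script: bool = False
    structure: str = ""            # assumed encoding, e.g. "div/a/img"


HEIGHT_RATIO_THRESHOLD = 0.75      # example value taken from the text

# One predicate per rule category; a node is recalled if ANY of them hits.
RECALL_RULES = {
    "size": lambda n: n.height_ratio < HEIGHT_RATIO_THRESHOLD,
    "embedded_form": lambda n: n.has_iframe,
    "floating": lambda n: n.position == "fixed",
    "generated_by_script": lambda n: n.from_script,
    "structure": lambda n: n.structure == "div/a/img",
}


def matched_rules(node: NodeInfo) -> List[str]:
    """Names of the recall rules that the node hits."""
    return [name for name, rule in RECALL_RULES.items() if rule(node)]


def is_recalled(node: NodeInfo) -> bool:
    return bool(matched_rules(node))


article = NodeInfo(height_ratio=0.9)
banner = NodeInfo(height_ratio=0.1, position="fixed")
print(matched_rules(article))  # []
print(matched_rules(banner))   # ['size', 'floating']
```

Returning the names of the matched rules, rather than a bare boolean, is a design choice of this sketch that makes it easier to inspect why a node was recalled.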
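In one embodiment, the following steps may further be included before step S130 above.
S51: take the layout object nodes that comply with the recall rule as the layout object nodes obtained by the initial screening, and determine the node states of the layout object nodes obtained by the initial screening.
S52: after all layout object nodes of the page have been laid out, obtain the layout object nodes whose node state has changed.
S53: apply the preset recall rules again to screen the layout object nodes whose node state has changed.
S54: take the layout object nodes obtained by the initial screening and the layout object nodes obtained by the re-screening as the layout object nodes that comply with the recall rule.
In this embodiment, because some nodes depend on one another during node layout, accurate visual information for a node has not yet been computed at the first layout, and such nodes can hardly pass the coarse recall strategy. Therefore, after the overall layout is complete, the nodes whose state (for example, visual information) has changed need to be checked and the coarse recall strategy applied to them again. Through this Recheck mechanism, a batch of nodes that come to comply with the recall rule after their state changes during layout is retrieved, so that more nodes that comply with the recall rule are recalled and target nodes are not missed.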
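Steps S51 to S54 can be sketched as a comparison of node states before and after the overall layout. The dictionary-based state representation, the `recheck` function, and the example rule are assumptions for illustration only.

```python
from typing import Callable, Dict, List


def recheck(initial_recalled: List[str],
            states_before: Dict[str, dict],
            states_after: Dict[str, dict],
            rule: Callable[[dict], bool]) -> List[str]:
    """Merge first-pass recalls with changed nodes that now satisfy the rule."""
    # S52: find nodes whose state changed once the whole page was laid out.
    changed = [nid for nid, state in states_after.items()
               if states_before.get(nid) != state]
    # S53: apply the recall rule again, but only to the changed nodes.
    recalled_again = [nid for nid in changed if rule(states_after[nid])]
    # S54: union of both screenings, order preserved, duplicates dropped.
    return list(dict.fromkeys(list(initial_recalled) + recalled_again))


rule = lambda s: s["height_ratio"] < 0.75
before = {"a": {"height_ratio": 0.1},
          "b": {"height_ratio": 1.0},   # not yet accurate at first layout
          "c": {"height_ratio": 0.9}}
after = {"a": {"height_ratio": 0.1},
         "b": {"height_ratio": 0.2},    # final geometry after full layout
         "c": {"height_ratio": 0.9}}

initial = [nid for nid, s in before.items() if rule(s)]   # S51
print(recheck(initial, before, after, rule))  # ['a', 'b']
```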
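In one embodiment, step S130 may specifically include the following steps.
S61: calculate the node features of the layout object nodes that comply with the recall rule according to the attribute information of those nodes.
S62: process the node features with the preset node prediction model to obtain the probability value that a layout object node that complies with the recall rule is the designated target node.
S63: determine, according to the probability value, whether the layout object node that complies with the recall rule is the designated target node.
In this embodiment, a machine learning model may be used to determine whether a node that complies with the recall rule is a designated target node that affects the browsing experience.
In one embodiment, the layout object nodes that comply with the recall rule are nodes in the layout object tree of the page. Specifically, S61 may include the following steps.
S71: obtain the attribute information of the layout object nodes that comply with the recall rule, where the attribute information is information obtained during the layout process; S72: adopting a depth-first traversal and using the attribute information, perform top-down feature calculation on the layout object nodes in the layout object tree that comply with the recall rule, to obtain the node features of those nodes.
In one embodiment, the node features may be features of a specified dimensionality that are extracted and computed from aspects such as node visual information, node content, and node structure. The specified dimensionality may be set according to actual computing requirements, for example a dimensionality greater than or equal to 10; the embodiments of the present disclosure do not specifically limit it.
In this embodiment, with bottom-up feature calculation, node features could be computed while the layout object tree is being built and passed up to the parent node, but in that mode almost all page nodes have to take part in the feature calculation. Because normal nodes have already been filtered out by the recall rules during node layout, top-down feature calculation can selectively compute node features, by depth-first traversal, only for the layout object nodes that comply with the recall rule (that is, the suspected target nodes). This reduces the number of nodes for which features are computed and speeds up the node feature calculation.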
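A minimal sketch of the selective, top-down, depth-first feature calculation follows. The `LayoutObject` fields and the four features chosen here (area ratio, text length, child count, depth) are hypothetical; the source only says that features are derived from visual information, content, and structure and may have, for example, ten or more dimensions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LayoutObject:
    name: str
    width: float
    height: float
    text_length: int = 0
    recalled: bool = False        # set by the rule-based coarse recall
    children: List["LayoutObject"] = field(default_factory=list)


def compute_features(root: LayoutObject,
                     page_area: float) -> Dict[str, List[float]]:
    """Depth-first, top-down walk; features only for recalled nodes."""
    features: Dict[str, List[float]] = {}
    stack = [(root, 0)]                     # (node, depth in the tree)
    while stack:
        node, depth = stack.pop()
        if node.recalled:
            features[node.name] = [
                node.width * node.height / page_area,  # visual: area ratio
                float(node.text_length),               # content
                float(len(node.children)),             # structure
                float(depth),                          # structure
            ]
        # Push children in reverse so they are visited left to right.
        for child in reversed(node.children):
            stack.append((child, depth + 1))
    return features


root = LayoutObject("body", 100, 200, children=[
    LayoutObject("text", 100, 150, text_length=500),
    LayoutObject("ad", 100, 50, recalled=True,
                 children=[LayoutObject("img", 100, 50)]),
])
features = compute_features(root, page_area=100 * 200)
print(features)  # {'ad': [0.25, 0.0, 1.0, 1.0]}
```

Only the single recalled node produces a feature vector; the other three nodes are visited but skipped, which is the saving the paragraph above describes.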
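In one embodiment, the node prediction model is a model trained in advance on labeled static page data whose offline rendering has been completed, and the node prediction model is a gradient boosted decision tree model with a specified depth and a specified number of decision trees.
Illustratively, because the node features handled by the browser kernel change dynamically, static data whose offline rendering has been completed may be used for labeling when the training data is selected; a high-accuracy automatic labeling tool is set up to assist manual labeling, and the result forms the training data.
Illustratively, the node prediction model obtained by machine learning includes a Gradient Boosted Decision Tree (GBDT) model. The GBDT model is trained in advance on the labeled data to obtain decision trees of a specified depth and a specified number, for example a model file with 100 trees of depth 4. This model file is subsequently used directly to predict whether a layout object node that complies with the recall rule is a designated target node.
It should be understood that the depth and the number of decision trees of the trained node prediction model given above are example values. In actual application scenarios, model training may be carried out according to the user's actual needs, and the embodiments of the present disclosure do not specifically limit this.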
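To make the scoring step concrete, the sketch below evaluates a tiny hand-written tree ensemble in the style of GBDT inference: the leaf values of all trees are summed, scaled, and mapped to a probability. The nested-dictionary tree format, the learning rate, the use of a logistic (sigmoid) link, and the two toy stumps are all assumptions of this example; the source specifies only a GBDT with, for example, 100 trees of depth 4, not its file format or loss function.

```python
import math
from typing import List, Sequence


def tree_output(tree: dict, features: Sequence[float]) -> float:
    """Walk one regression tree until a leaf value is reached."""
    while "leaf" not in tree:
        go_left = features[tree["feature"]] < tree["threshold"]
        tree = tree["left"] if go_left else tree["right"]
    return tree["leaf"]


def predict_probability(trees: List[dict], features: Sequence[float],
                        learning_rate: float = 0.1,
                        bias: float = 0.0) -> float:
    """Sum the tree outputs and squash the raw score into (0, 1)."""
    raw = bias + learning_rate * sum(tree_output(t, features) for t in trees)
    return 1.0 / (1.0 + math.exp(-raw))


# Two depth-1 toy trees; a real model file would hold many deeper trees.
trees = [
    {"feature": 0, "threshold": 0.5,
     "left": {"leaf": 4.0}, "right": {"leaf": -4.0}},
    {"feature": 1, "threshold": 0.5,
     "left": {"leaf": -2.0}, "right": {"leaf": 6.0}},
]

p_suspect = predict_probability(trees, [0.1, 1.0])  # raw score  1.0
p_normal = predict_probability(trees, [0.9, 0.0])   # raw score -0.6
print(p_suspect > 0.5, p_normal > 0.5)  # True False
```

A node would then be treated as a designated target node when its probability exceeds whatever cut-off the deployment chooses; 0.5 is used here purely for illustration.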
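In one embodiment, the step of performing shielding processing on the designated target node in step S140 may specifically include the following steps.
S81: calculate the corresponding node characteristic information according to the attribute information of the designated target node.
The node characteristic information includes at least one of the position in the page, the width, the height, whether the node is in the main content, and the area ratio in the page.
S82: if the node characteristic information reaches the corresponding preset shielding threshold, perform shielding processing on the layout object node that affects the browsing experience by setting the state of the designated target node to hidden.
The page processing method of the embodiments of the present disclosure provides a shielding strategy for target nodes, which can apply a targeted handling mechanism according to the characteristics of the designated target node. After the target nodes are identified, the characteristics and area ratios of the target nodes of the whole page can be calculated, and shielding is then performed according to configurable shielding thresholds, for example with respect to a node's position in the page, its width and height, and whether it is in the main content. Designated target nodes can thus be shielded flexibly, the ecological security of mobile search is maintained and ensured, and the page browsing experience is optimized as a whole.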
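The configurable shielding thresholds can be sketched as follows. The `ShieldConfig` fields, their default values, and the way the two conditions are combined are hypothetical; the source names the characteristics that may be considered (position, width, height, whether the node is in the main content, area ratio) but not how they are weighed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TargetNode:
    """A node already predicted to be a designated target node."""
    name: str
    area_ratio: float         # node area / page area
    in_main_content: bool
    hidden: bool = False


@dataclass
class ShieldConfig:
    """Hypothetical configurable shielding thresholds."""
    min_area_ratio: float = 0.05
    shield_inside_main_content: bool = True


def apply_shielding(targets: List[TargetNode],
                    config: ShieldConfig) -> List[str]:
    """Hide every target that reaches the configured thresholds."""
    hidden = []
    for node in targets:
        reaches_area = node.area_ratio >= config.min_area_ratio
        allowed_here = (config.shield_inside_main_content
                        or not node.in_main_content)
        if reaches_area and allowed_here:
            node.hidden = True        # shielding = set node state to hidden
            hidden.append(node.name)
    return hidden


targets = [TargetNode("popup", 0.30, in_main_content=False),
           TargetNode("tiny", 0.01, in_main_content=False),
           TargetNode("inline", 0.20, in_main_content=True)]
config = ShieldConfig(min_area_ratio=0.05, shield_inside_main_content=False)
shielded = apply_shielding(targets, config)
print(shielded)  # ['popup']
```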
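In this embodiment, elements that affect the user's browsing experience are shielded. After the page has been rendered and drawn, what the user sees is the optimized page, which greatly improves the user's browsing experience and safeguards the security of the mobile search ecosystem.
The page processing method in the embodiments of the present disclosure performs shielding processing on the designated target node, for example by setting the node state to hidden, resetting the kernel layout state, and actively initiating a kernel re-layout. The entire page processing process takes place before the nodes are drawn, which ensures that the user perceives no jitter from page nodes being hidden while browsing the page and thus optimizes the page browsing experience as a whole.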
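For a better understanding of the page processing method of the present disclosure, the page processing flow of another embodiment of the present disclosure is described below with reference to FIG. 4. FIG. 4 shows a flowchart of a page processing method according to another embodiment of the present disclosure. As shown in FIG. 4, the page processing method may include the following steps.
S201: download the Hypertext Markup Language (HTML) file according to the page URL.
S202: parse the HTML file with a parser to obtain a DOM tree, and, when parsing reaches CSS and JS file resource links in the HTML file, download and parse the CSS and download and execute the JS files.
In this step, the CSS is downloaded and parsed to obtain the style data of the nodes in the DOM tree; after the JS files are downloaded and executed, the nodes dynamically loaded by JS can be obtained and inserted into or added to the DOM tree.
S203: build the Layout Object tree according to the HTML element nodes in the DOM tree that need to be rendered and the style data of the nodes in the DOM tree.
S204: after the Layout Object tree is built, create the Layout Layer tree.
In this step, layer positioning and layout can be implemented on the basis of the Layout Layer tree.
S205: filter the nodes in the layout object node tree that are produced by JS dynamic loading, and execute S209 to trigger the re-layout of the dynamically loaded nodes.
In FIG. 4, because JS dynamic loading is asynchronous resource loading, the re-layout of the dynamically loaded nodes may also be referred to as node re-layout triggered by asynchronous resource loading.
S206: collect the attribute information of the layout object nodes while the nodes in the Layout Object tree are being laid out.
S207: based on the preset node prediction model, score whether a layout object node is a designated target node that affects the browsing experience, and predict from the score whether the layout object node affects the browsing experience.
In this step, the score of a layout object node is the probability value that the layout object node is a designated target node that affects the browsing experience.
In some embodiments, after any node in the Layout Object tree is laid out, that layout object node may be screened with the preset recall rules to obtain the layout object nodes in the Layout Object tree that comply with the recall rule, so that in step S207 above the preset node prediction model scores whether the layout object nodes that comply with the recall rule are designated target nodes that affect the browsing experience.
S208: if the node is predicted to be a designated target node that affects the browsing experience, the browser kernel sets the layout state and executes S209 to actively trigger the re-layout of the layout object node.
In step S208, the designated target node may be shielded (for example, by setting the node state to hidden) through the re-layout of the designated target node.
S209: perform the re-layout of the layout object nodes to obtain the re-laid-out, shielding-processed layout object nodes.
S210: draw the page based on the shielding-processed layout object nodes, so as to display the drawn page on the designated display screen.
According to the page processing method of the embodiments of the present disclosure, the nodes to be rendered are filtered by combining recall-rule preprocessing with a machine learning model, thereby shielding the elements in the page that affect the browsing experience.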
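FIG. 5 shows a schematic diagram of the effect of page processing in an embodiment of the present disclosure. As shown in FIG. 5, page 1 includes a plurality of layout object nodes corresponding to a plurality of HTML object elements, for example node 1, node 2, node 3, or node 4.
In FIG. 5, each layout object node in page 1 calls its own layout method during layout, which avoids traversing the DOM tree. The following steps may be performed for each layout object node.
As shown by S301 "rule-based coarse recall" in FIG. 5, after the plurality of layout object nodes of the page are laid out, the layout object nodes are screened with the preset recall rules to obtain the layout object nodes in the page that comply with the recall rule.
Step S301 has the same processing as step S120 in the above embodiment and is not described again here.
As shown by S302 "Recheck mechanism" in FIG. 5, after all layout object nodes of the page have been laid out, the preset recall rules are applied again to screen the layout object nodes whose node state has changed.
Step S302 has the same processing as S53 in the above embodiment and is not described again here.
As shown by S303 "model recall" in FIG. 5, whether a layout object node that complies with the recall rule is a designated target node is predicted based on the preset node prediction model.
Step S303 has the same processing as step S130 in the above embodiment and is not described again here.
As shown by S304 "shielding processing" in FIG. 5, shielding processing is performed on the designated target node, and the shielding-processed page is generated by using the layout object nodes after the shielding processing.
Step S304 has the same processing as step S140 in the above embodiment and is not described again here.
As shown in FIG. 5, after page 1 has been rendered, what the user sees is the optimized page 2, which greatly improves the user's browsing experience and safeguards the security of the mobile search ecosystem.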
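FIG. 6 shows a block diagram of a page processing apparatus provided by an embodiment of the present disclosure. As shown in FIG. 6, the page processing apparatus includes the following modules.
A node determination module 610, configured to determine a plurality of layout object nodes of a page according to an acquired Hypertext Markup Language (HTML) file.
A node screening module 620, configured to, after the plurality of layout object nodes of the page are laid out, screen the layout object nodes by using the preset recall rules to obtain layout object nodes that comply with the recall rule.
A prediction module 630, configured to predict whether a layout object node that complies with the recall rule is a designated target node.
In some embodiments, the prediction module 630 is configured to predict, based on a preset node prediction model, whether a layout object node that complies with the recall rule is a designated target node.
A shielding processing module 640, configured to perform shielding processing on the designated target node and generate a shielding-processed page by using the layout object nodes remaining after the shielding processing.
The page processing apparatus according to the embodiments of the present disclosure can filter the page content displayed by websites, safeguarding the security of the mobile search ecosystem and thereby improving the user's browsing experience.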
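In one embodiment, the node determination module 610 may include the following units.
A first parsing unit, configured to parse the HTML file to obtain a Document Object Model (DOM) and Cascading Style Sheets (CSS); and a second parsing unit, configured to parse the CSS to obtain style data of the HTML element nodes in the DOM. The node determination module 610 is specifically configured to determine the plurality of layout object nodes of the page according to the HTML element nodes in the DOM that need to be rendered and the style data.
Each layout object node corresponds to one HTML element node that needs to be rendered, and the style data of each layout object node is the style data of the corresponding HTML element node.
In one embodiment, if a script file link is obtained by parsing the HTML file, the node determination module 610 may further include: a download and execution unit, configured to download and execute the script file corresponding to the script file link to obtain the HTML element nodes corresponding to the script file. The node determination module 610 is specifically configured to take the HTML element nodes corresponding to the script file as layout object nodes that comply with the recall rule.
In one embodiment, the node screening module 620 may further be configured to: after the plurality of layout object nodes of the page are determined, if the plurality of layout object nodes include a layout object node loaded through a script file, take the layout object node loaded through the script file as a layout object node that complies with the recall rule.
In one embodiment, the node screening module 620 may specifically include: an attribute information acquisition unit, configured to lay out any layout object node of the page to obtain attribute information of the laid-out layout object node; a condition determination unit, configured to determine whether the attribute information satisfies the node recall condition defined in the recall rule; and a recall node determination unit, configured to take the layout object nodes that satisfy the node recall condition as layout object nodes that comply with the recall rule.
In one embodiment, the recall rules may include rules set in advance according to at least one of node width and height ratios, node embedding form, node position features, node content, node generation mechanism, and node structure.
In one embodiment, the page processing apparatus may further include: a node state determination module, configured to take the layout object nodes that comply with the recall rule as the layout object nodes obtained by the initial screening and determine the node states of the layout object nodes obtained by the initial screening; a state change node acquisition module, configured to obtain the layout object nodes whose node state has changed after all layout object nodes of the page have been laid out; a node re-screening module, configured to apply the preset recall rules again to screen the layout object nodes whose node state has changed; and a screening node determination module, configured to take the layout object nodes obtained by the initial screening and the layout object nodes obtained by the re-screening as the layout object nodes that comply with the recall rule.
In one embodiment, the model prediction module 330 may include: a feature calculation unit, configured to calculate the node features of the layout object nodes that comply with the recall rule according to the attribute information of those nodes; a probability calculation unit, configured to process the node features with the preset node prediction model to obtain the probability value that a layout object node that complies with the recall rule is the designated target node; and a target node determination unit, configured to determine, according to the probability value, whether the layout object node that complies with the recall rule is the designated target node.
In one embodiment, the layout object nodes that comply with the recall rule are nodes in the layout object tree of the page.
In this embodiment, the feature calculation unit may include: an attribute information collection subunit, configured to obtain the attribute information of the layout object nodes that comply with the recall rule, where the attribute information is information obtained during the layout process. The feature calculation unit is specifically configured to adopt a depth-first traversal and, using the attribute information, perform top-down feature calculation on the layout object nodes in the layout object tree that comply with the recall rule, to obtain the node features of those nodes.
In one embodiment, the node prediction model is a model trained in advance on labeled static page data whose offline rendering has been completed, and the node prediction model is a gradient boosted decision tree model with a specified depth and a specified number of decision trees.
In one embodiment, the shielding processing module 340 may specifically include: a characteristic calculation unit, configured to calculate corresponding node characteristic information according to the attribute information of the designated target node, where the node characteristic information includes at least one of the position in the page, the width, the height, whether the node is in the main content, and the area ratio in the page; and a node shielding unit, configured to perform shielding processing on the designated target node by setting the state of the designated target node to hidden if the node characteristic information reaches the corresponding preset shielding threshold.
In one embodiment, the shielding processing module 340 may specifically further include: a drawing unit, configured to re-lay out the layout object nodes remaining after the shielding processing and to draw using the re-laid-out layout object nodes, obtaining the drawn, shielding-processed page.
The page processing apparatus according to the embodiments of the present disclosure performs shielding processing on the designated target node by using a scheme that combines rule recall with model prediction. The entire page processing process takes place before the nodes are drawn, which ensures that the user perceives no jitter from page nodes being hidden while browsing the page and thus optimizes the page browsing experience as a whole.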
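FIG. 7 shows a block diagram of an electronic device provided by an embodiment of the present disclosure. As shown in FIG. 7, an embodiment of the present disclosure provides an electronic device 700, including: one or more processors 701;
a memory 702 storing one or more programs that, when executed by the one or more processors, cause the one or more processors to implement the page processing method of any one of the above; and one or more I/O interfaces 703, connected between the processor and the memory and configured to implement information interaction between the processor and the memory.
The processor 701 is a device with data processing capability, including but not limited to a central processing unit (CPU) and the like. The memory 702 is a device with data storage capability, including but not limited to random access memory (RAM, more specifically SDRAM, DDR, etc.), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), and flash memory (FLASH). The I/O interface (read/write interface) 703 is connected between the processor 701 and the memory 702, can implement information interaction between the processor 701 and the memory 702, and includes but is not limited to a data bus (Bus) and the like.
In some embodiments, the processor 701, the memory 702, and the I/O interface 703 are connected to one another through a bus 704 and are thereby connected to other components of the electronic device 700.
FIG. 8 shows a block diagram of a computer-readable medium provided by an embodiment of the present disclosure. As shown in FIG. 8, an embodiment of the present disclosure provides a computer-readable medium storing a computer program that, when executed by a processor, implements any one of the page processing methods described above.
Those of ordinary skill in the art will understand that all or some of the steps of the methods disclosed above, and the functional modules/units of the systems and apparatuses, may be implemented as software, firmware, hardware, and appropriate combinations thereof. In a hardware implementation, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed cooperatively by several physical components. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, a digital signal processor, or a microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on a computer-readable medium, which may include a computer storage medium (or non-transitory medium) and a communication medium (or transitory medium). As is well known to those of ordinary skill in the art, the term computer storage medium includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data). Computer storage media include but are not limited to RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer. In addition, as is well known to those of ordinary skill in the art, a communication medium typically contains computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transmission mechanism, and may include any information delivery medium.
Example embodiments have been disclosed herein, and although specific terms are employed, they are used and should be interpreted only in a generic and descriptive sense and not for purposes of limitation. In some instances, as will be apparent to those skilled in the art, features, characteristics, and/or elements described in connection with a particular embodiment may be used alone or in combination with features, characteristics, and/or elements described in connection with other embodiments, unless expressly stated otherwise. Accordingly, those skilled in the art will understand that various changes in form and detail may be made without departing from the scope of the present disclosure as set forth in the appended claims.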
Claims (12)
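- A page processing method, including: determining a plurality of layout object nodes of a page according to an acquired Hypertext Markup Language (HTML) file; after laying out the plurality of layout object nodes of the page, screening the plurality of layout object nodes by using a preset recall rule to obtain layout object nodes that comply with the recall rule; predicting whether a layout object node that complies with the recall rule is a designated target node; and performing shielding processing on the designated target node, and generating a page after the shielding processing by using the layout object nodes remaining after the shielding processing.
- The method according to claim 1, further including, after the determining of the plurality of layout object nodes of the page: if the plurality of layout object nodes include a layout object node loaded through a script file, taking the layout object node loaded through the script file as a layout object node that complies with the recall rule.
- The method according to claim 1, wherein the screening, after laying out the plurality of layout object nodes of the page, of the layout object nodes by using the preset recall rule to obtain, among the plurality of layout object nodes, the layout object nodes that comply with the recall rule includes: laying out the layout object nodes of the page to obtain attribute information of the laid-out layout object nodes; determining whether the attribute information satisfies a node recall condition defined in the recall rule; and taking the layout object nodes that satisfy the node recall condition as the layout object nodes that comply with the recall rule.
- The method according to claim 3, wherein the recall rule includes a rule set in advance according to at least one of node width and height ratios, node embedding form, node position features, node content, node generation mechanism, and node structure.
- The method according to claim 1, wherein, before the predicting of whether a layout object node that complies with the recall rule is a designated target node, the method further includes: taking the layout object nodes that comply with the recall rule as layout object nodes obtained by an initial screening, and determining node states of the layout object nodes obtained by the initial screening; after all layout object nodes of the page have been laid out, obtaining the layout object nodes whose node state has changed; applying the preset recall rule again to screen the layout object nodes whose node state has changed; and taking the layout object nodes obtained by the initial screening and the layout object nodes obtained by the re-screening as the layout object nodes that comply with the recall rule.
- The method according to claim 1, wherein the predicting of whether a layout object node that complies with the recall rule is a designated target node includes: calculating node features of the layout object node that complies with the recall rule according to attribute information of the layout object node that complies with the recall rule; processing the node features by using a preset node prediction model to obtain a probability value that the layout object node that complies with the recall rule is the designated target node; and determining, according to the probability value, whether the layout object node that complies with the recall rule is the designated target node.
- The method according to claim 6, wherein the layout object node that complies with the recall rule is a node in a layout object tree of the page; and the calculating of the node features of the layout object node that complies with the recall rule according to the attribute information of the layout object node that complies with the recall rule includes: obtaining the attribute information of the layout object node that complies with the recall rule, the attribute information being information obtained during the layout process; and, adopting a depth-first traversal and using the attribute information, performing top-down feature calculation on the layout object nodes in the layout object tree that comply with the recall rule, to obtain the node features of the layout object nodes that comply with the recall rule.
- The method according to claim 6, wherein the node prediction model is a model trained in advance on labeled static page data whose offline rendering has been completed, and the node prediction model is a gradient boosted decision tree model with a specified depth and a specified number of decision trees.
- The method according to claim 1, wherein the performing of shielding processing on the designated target node includes: calculating corresponding node characteristic information according to attribute information of the designated target node, the node characteristic information including at least one of the position in the page, the width, the height, whether the node is in the main content, and the area ratio in the page; and, if the node characteristic information reaches a corresponding preset shielding threshold, performing shielding processing on the designated target node by setting the state of the designated target node to hidden.
- A page processing apparatus, including: a node determination module, configured to determine a plurality of layout object nodes of a page according to an acquired Hypertext Markup Language (HTML) file; a node screening module, configured to, after the plurality of layout object nodes of the page are laid out, screen the plurality of layout object nodes by using a preset recall rule to obtain layout object nodes that comply with the recall rule; a prediction module, configured to predict whether a layout object node that complies with the recall rule is a designated target node; and a shielding processing module, configured to perform shielding processing on the designated target node and generate a page after the shielding processing by using the layout object nodes remaining after the shielding processing.
- An electronic device, including: one or more processors; a storage device storing one or more programs that, when executed by the one or more processors, cause the one or more processors to implement the page processing method according to any one of claims 1-9; and one or more I/O interfaces, connected between the processor and the memory and configured to implement information interaction between the processor and the memory.
- A computer-readable medium storing a computer program that, when executed by a processor, implements the page processing method according to any one of claims 1-9.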
Priority Applications (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP20864282.7A EP3851981A4 (en) | 2020-02-27 | 2020-07-14 | Page processing method and apparatus, electronic device and computer readable medium |
| US17/278,370 US12353574B2 (en) | 2020-02-27 | 2020-07-14 | Page processing method, electronic apparatus and non-transitory computer-readable storage medium |
| JP2021516984A JP7212771B2 (ja) | 2020-02-27 | 2020-07-14 | ページ処理方法、デバイス、電子デバイス及びコンピュータ読み取り可能な記憶媒体 |
| KR1020217008647A KR102565950B1 (ko) | 2020-02-27 | 2020-07-14 | 페이지 처리 방법, 장치, 전자 기기 및 컴퓨터 판독 가능 매체 |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010125624.1 | 2020-02-27 | ||
| CN202010125624.1A CN111353112A (zh) | 2020-02-27 | 2020-02-27 | 页面处理方法、装置、电子设备和计算机可读介质 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2021098242A1 (zh) | 2021-05-27 |
Family
ID=71194173
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2020/101910 Ceased WO2021098242A1 (zh) | 2020-02-27 | 2020-07-14 | 页面处理方法、装置、电子设备和计算机可读介质 |
Country Status (5)
| Country | Link |
|---|---|
| US (1) | US12353574B2 (zh) |
| EP (1) | EP3851981A4 (zh) |
| JP (1) | JP7212771B2 (zh) |
| CN (1) | CN111353112A (zh) |
| WO (1) | WO2021098242A1 (zh) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113468455A (zh) * | 2021-06-29 | 2021-10-01 | 网易(杭州)网络有限公司 | 用户选择行为获取方法、装置、客户端以及服务端设备 |
Families Citing this family (10)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111353112A (zh) | 2020-02-27 | 2020-06-30 | 百度在线网络技术(北京)有限公司 | 页面处理方法、装置、电子设备和计算机可读介质 |
| CN111984891A (zh) * | 2020-08-07 | 2020-11-24 | 游艺星际(北京)科技有限公司 | 页面展示方法、装置、电子设备和存储介质 |
| CN115687815B (zh) * | 2021-07-28 | 2025-08-19 | 腾讯科技(深圳)有限公司 | 页面信息的展示方法、装置、设备和介质 |
| CN113656737B (zh) * | 2021-08-20 | 2024-05-14 | 北京百度网讯科技有限公司 | 网页内容展示方法、装置、电子设备以及存储介质 |
| CN117850731B (zh) * | 2022-09-30 | 2025-09-16 | 荣耀终端股份有限公司 | 基于终端设备的自动朗读方法和设备 |
| CN116032785B (zh) * | 2022-11-29 | 2024-12-13 | 企查查科技股份有限公司 | 网络页面的字段关系识别方法和装置 |
| CN116257714A (zh) * | 2022-12-20 | 2023-06-13 | 网易(杭州)网络有限公司 | 层叠样式表的生成方法、装置、计算机设备和存储介质 |
| CN116166152A (zh) * | 2023-02-20 | 2023-05-26 | 中国工商银行股份有限公司 | 一种报文数据展示方法及装置 |
| CN116910411B (zh) * | 2023-07-26 | 2025-09-16 | 上海微盟企业发展有限公司 | 一种html页面布局方法、装置、设备及存储介质 |
| CN120371430A (zh) * | 2024-01-18 | 2025-07-25 | 荣耀终端股份有限公司 | 一种页面数据处理方法及电子设备 |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN103294781A (zh) * | 2013-05-14 | 2013-09-11 | 百度在线网络技术(北京)有限公司 | 一种用于处理页面数据的方法与设备 |
| US20160299835A1 (en) * | 2015-04-08 | 2016-10-13 | Opshub, Inc. | Method and system for providing delta code coverage information |
| CN106598574A (zh) * | 2016-11-25 | 2017-04-26 | 腾讯科技(深圳)有限公司 | 页面渲染的方法和装置 |
| CN109739500A (zh) * | 2018-12-14 | 2019-05-10 | 中国四维测绘技术有限公司 | 一种bs架构下的浏览器前端渲染展示方法 |
| CN111353112A (zh) * | 2020-02-27 | 2020-06-30 | 百度在线网络技术(北京)有限公司 | 页面处理方法、装置、电子设备和计算机可读介质 |
Family Cites Families (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP4959603B2 (ja) | 2008-02-21 | 2012-06-27 | ネットスター株式会社 | ドキュメントを解析するためのプログラム,装置および方法 |
| JP2010257412A (ja) | 2009-04-28 | 2010-11-11 | Nec Corp | 情報フィルタリング装置、情報フィルタリング方法及びプログラム |
| JP2011044116A (ja) | 2009-08-24 | 2011-03-03 | Justsystems Corp | 閲覧制御装置、閲覧制御方法および閲覧制御プログラム |
| CN103052950A (zh) | 2010-08-20 | 2013-04-17 | 惠普发展公司,有限责任合伙企业 | 用于过滤网页内容的系统和方法 |
| US9576068B2 (en) * | 2010-10-26 | 2017-02-21 | Good Technology Holdings Limited | Displaying selected portions of data sets on display devices |
| CN103778365B (zh) * | 2012-10-18 | 2015-05-13 | 腾讯科技(深圳)有限公司 | 一种检测网页隐藏内容的方法,及设备 |
| US9954894B2 (en) * | 2016-03-04 | 2018-04-24 | Microsoft Technology Licensing, Llc | Webpage security |
| CN106095869B (zh) | 2016-06-03 | 2020-11-06 | 腾讯科技(深圳)有限公司 | 广告信息处理方法、用户设备、后台服务器及系统 |
| CN110489636A (zh) * | 2018-05-15 | 2019-11-22 | 南京大学 | 一种基于代码分析与图像处理的网页广告屏蔽方法 |
-
2020
- 2020-02-27 CN CN202010125624.1A patent/CN111353112A/zh active Pending
- 2020-07-14 EP EP20864282.7A patent/EP3851981A4/en not_active Withdrawn
- 2020-07-14 US US17/278,370 patent/US12353574B2/en active Active
- 2020-07-14 WO PCT/CN2020/101910 patent/WO2021098242A1/zh not_active Ceased
- 2020-07-14 JP JP2021516984A patent/JP7212771B2/ja active Active
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP3851981A4 * |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113468455A (zh) * | 2021-06-29 | 2021-10-01 | 网易(杭州)网络有限公司 | 用户选择行为获取方法、装置、客户端以及服务端设备 |
| CN113468455B (zh) * | 2021-06-29 | 2023-06-30 | 网易(杭州)网络有限公司 | 用户选择行为获取方法、装置、客户端以及服务端设备 |
Also Published As
| Publication number | Publication date |
|---|---|
| JP2022512056A (ja) | 2022-02-02 |
| US12353574B2 (en) | 2025-07-08 |
| US20220114269A1 (en) | 2022-04-14 |
| EP3851981A4 (en) | 2021-12-29 |
| CN111353112A (zh) | 2020-06-30 |
| EP3851981A1 (en) | 2021-07-21 |
| JP7212771B2 (ja) | 2023-01-25 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| WO2021098242A1 (zh) | 页面处理方法、装置、电子设备和计算机可读介质 | |
| JP6117452B1 (ja) | 行動計量学を使用してコンテンツレイアウトを最適化するためのシステムおよび方法 | |
| RU2611965C2 (ru) | Способ и устройство отображения страницы | |
| CN101782911B (zh) | 一种网络资源内容提示方法及系统 | |
| CN105843815B (zh) | 页面评论处理方法、装置和浏览器 | |
| US20130339840A1 (en) | System and method for logical chunking and restructuring websites | |
| US20150142567A1 (en) | Method and apparatus for identifying elements of a webpage | |
| WO2014101783A1 (en) | Method and server for performing cloud detection for malicious information | |
| CN112384940B (zh) | 用于web爬取电子商务资源页面的机制 | |
| CN104572798A (zh) | 一种用于处理网页的方法、设备与系统 | |
| JP2019536171A (ja) | ウェブページのクラスタリング方法及び装置 | |
| CN104331474A (zh) | 页面处理方法及装置 | |
| KR102565950B1 (ko) | 페이지 처리 방법, 장치, 전자 기기 및 컴퓨터 판독 가능 매체 | |
| CN105205080A (zh) | 冗余文件清理方法、装置和系统 | |
| CN102523130A (zh) | 不良网页检测方法及装置 | |
| WO2021068681A1 (zh) | 标签分析方法、装置及计算机可读存储介质 | |
| CA3044034A1 (en) | Electronic form identification using spatial information | |
| WO2021253252A1 (zh) | 网页检测方法、装置、电子设备以及存储介质 | |
| CN105653724A (zh) | 一种页面曝光量的监控方法和装置 | |
| CN111914199B (zh) | 一种页面元素过滤方法、装置、设备及存储介质 | |
| US20130073944A1 (en) | Method and system for dynamically providing contextually relevant posts on an article | |
| US10452727B2 (en) | Method and system for dynamically providing contextually relevant news based on an article displayed on a web page | |
| CN117520678A (zh) | 一种网页处理的方法、装置、电子设备及存储介质 | |
| CN110825976B (zh) | 网站页面的检测方法、装置、电子设备及介质 | |
| US12423361B2 (en) | Data extraction approach for retail crawling engine |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | ENP | Entry into the national phase | Ref document number: 2021516984; Country of ref document: JP; Kind code of ref document: A |
| | ENP | Entry into the national phase | Ref document number: 2020864282; Country of ref document: EP; Effective date: 20210324 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | WWG | Wipo information: grant in national office | Ref document number: 17278370; Country of ref document: US |