CN100514309C - System and method of storing and searching objects through multi-tiered cache memory - Google Patents
System and method of storing and searching objects through multi-tiered cache memory Download PDFInfo
- Publication number
- CN100514309C CN100514309C CNB038041057A CN03804105A CN100514309C CN 100514309 C CN100514309 C CN 100514309C CN B038041057 A CNB038041057 A CN B038041057A CN 03804105 A CN03804105 A CN 03804105A CN 100514309 C CN100514309 C CN 100514309C
- Authority
- CN
- China
- Prior art keywords
- cache memory
- floor height
- height speed
- cache
- speed cache
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Links
Images
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
相关申请的相互参考Cross-references to related applications
本申请要求以下申请的优先权:由Jeremy S.de Bonet于2002年1月18日提交的名称为“用于存储和检索多种版本内容的多层高速缓冲存储机制”的美国临时专利申请No.60/349,553,由de Bonet等人于2002年1月18日提交的名称为“模块化插件式事务处理体系结构”的美国临时专利申请No.60/349,424以及由de Bonet等人于2002年1月18日提交的名称为“同时支持多种协议的数据变换、存储和处理的网络代理平台”的美国临时专利申请No.60/349,424,以上申请的全部内容在此引入作为参考。另外,由de Bonet等人于2003年1月14日提交的名称为“使用共享资源和不同应用程序进行事务处理的方法和系统”的美国专利申请序号No._(代理案卷号No.IDET1130-1),在此引入作为参考。This application claims priority from the following application: U.S. Provisional Patent Application No. 1, entitled "Multilayer Caching Mechanism for Storing and Retrieving Multiple Versions of Content," filed January 18, 2002 by Jeremy S. de Bonet. .60/349,553, U.S. Provisional Patent Application No. 60/349,424, entitled "Modular Pluggable Transaction Processing Architecture," filed Jan. 18, 2002 by de Bonet et al. U.S. Provisional Patent Application No. 60/349,424, entitled "Network Proxy Platform Supporting Data Transformation, Storage, and Processing of Multiple Protocols Simultaneously," filed January 18, the entire contents of which are hereby incorporated by reference. Also, U.S. Patent Application Serial No. _ (Attorney Docket No. IDET 1130- 1), incorporated herein by reference.
参考附录Refer to the appendix
在本申请中以附件的方式包含附录,其全部内容在此引入作为参考,其唯一目的是作为本申请的一个组成部分。附录1命题为“网络代理平台及其应用”并包括17页正文和8张附图。The Appendix, which is incorporated by way of attachment into this application, is hereby incorporated by reference in its entirety for the sole purpose of forming an integral part of this application.
技术领域 technical field
本发明一般涉及电子实体的存储和检索。本发明特别涉及用来存储和检索对象的多层高速缓冲存储器的使用,其中各组对象可相互关联,如在网络变换代理上存储多种版本的网络内容。The present invention generally relates to storage and retrieval of electronic entities. In particular, the invention relates to the use of multi-level caches for storing and retrieving objects, wherein groups of objects may be related to each other, such as storing multiple versions of web content on a web transformation agent.
背景技术 Background technique
有多种方法用于存储数据。一种这样的方法是通过使用结合阵列。在结合阵列里,将要存储的对象与一个关键字相关。将对象存在一个特殊的位置,该位置由该相关关键字来识别。当想要检索对象时,只需要查找该关键字,因为该关键字识别该对象的位置。There are various methods for storing data. One such approach is through the use of conjugation arrays. In an associative array, the objects to be stored are associated with a key. Stores the object in a special location identified by the relative key. When you want to retrieve an object, you only need to look up the key, because the key identifies the location of the object.
结合阵列有各种各样的实现方式。例如,数据库、文件系统和高速缓冲存储器都是结合阵列。在这里特别对高速缓冲存储器感兴趣。There are various implementations of combining arrays. For example, databases, file systems, and cache memories are all bonded arrays. Caches are of particular interest here.
高速缓冲存储器是提供数据的本地存储的结合阵列。这里所用的“本地”有点相对性。在将高速缓冲存储器连接到微处理器以允许它们工作得更快和更有效率的情形下,“本地”可能意味着高速缓冲存储器包括在与微处理器相同的芯片上制造的存储器。然而,在将高速缓冲存储器用于网络代理的情形下,“本地”可能意味着高速缓冲存储器在代理机箱内的磁盘驱动器中实现。A cache memory is an associative array that provides local storage of data. "Local" as used here is somewhat relative. "Local" may mean that the cache includes memory fabricated on the same chip as the microprocessor, where caches are connected to microprocessors to allow them to work faster and more efficiently. However, in the case of using the cache for a network proxy, "local" may mean that the cache is implemented in a disk drive within the proxy enclosure.
高速缓冲存储代理使用与Web页面相关的URL作为各自的关键字存储和检索Web内容。然而,在这种情况下出现的问题之一是可能有大量不同的Web页面具有相同的URL。例如,Web页面的内容可能近似相同,但它们可能各自适用于在不同类型的装置(例如,桌上型电脑或能够上网的蜂窝电话)上观看。为了唯一识别必须检索的Web页面,这个关键字因此需要包括额外的信息。关键字可因此结合Web页面的其它特征,诸如Cookies或设计Web页面的浏览器的类型。Caching proxies store and retrieve web content using URLs associated with web pages as respective keys. However, one of the problems that arises in this case is that there may be a large number of different web pages with the same URL. For example, the content of Web pages may be approximately the same, but they may each be adapted for viewing on different types of devices (eg, desktop computers or Internet-enabled cellular phones). This keyword therefore needs to include additional information in order to uniquely identify the Web page that must be retrieved. Keywords can thus be combined with other characteristics of the web page, such as cookies or the type of browser the web page is designed for.
在现有技术的代理中实现的高速缓冲存储是典型的平面形。换句话说,存在带有多个入口的单个高速缓冲存储器。每个缓冲存储器入口包含一个与相应关键字相关的Web页面。如上面所述,该关键字可同时结合URL和唯一识别高速缓冲内容所必需的其它特征。因此,如果代理需要存储具有不同URL的1000张Web页面,就需要1000个高速缓冲存储器入口。如果要求代理服务器为这些Web页面的每一张存储10种不同的版本,就需要10,000个高速缓冲存储器入口。The caches implemented in prior art proxies are typically planar. In other words, there is a single cache with multiple entries. Each cache entry contains a web page associated with the corresponding keyword. As noted above, this key may combine both the URL and other characteristics necessary to uniquely identify the cached content. Therefore, if the proxy needs to store 1000 web pages with different URLs, 1000 cache entries are required. If the proxy server is required to store 10 different versions of each of these Web pages, 10,000 cache memory entries are required.
因为高速缓冲存储器是平面的,所以其存储和检索入口所需的时间和/或存储器随入口数目增加。[[根据所用的数据结构,查找时间可从O(n)变化到O(log(n)),甚至到O(1)(恒定时间),所以以下不准确]]入口的相似性不能带来任何好处(即,数十个入口可能只是同一Web页面的不同版本的事实)Because a cache is flat, the time and/or memory it takes to store and retrieve an entry increases with the number of entries. [[Depending on the data structure used, the lookup time can vary from O(n) to O(log(n)), or even to O(1) (constant time), so the following is not exact]] The similarity of the entry does not bring Any benefits (i.e. the fact that dozens of entry points might just be different versions of the same web page)
而且,当用平面式高速缓冲存储结构存储内容的多种版本时,没有办法处理相关内容的集合。例如,没有办法存储所有相关内容所共有的数据(例如,存储HTTP标题或者同一Web页面的多种版本所共有的其它信息)。必须为每一个独立版本单独存储共有信息。同样,没有办法处理这些相关内容的集合作为一个组。例如,如果想要更新过时Web页面的每个版本,就没有办法采取影响所有版本的单个行动--必须在高速缓冲存储器结构中分别定位和更新它们。Furthermore, when using a flat cache structure to store multiple versions of content, there is no way to handle collections of related content. For example, there is no way to store data common to all related content (eg, storing HTTP headers or other information common to multiple versions of the same Web page). Shared information must be stored separately for each individual version. Again, there is no way to handle this collection of related content as a group. For example, if one wants to update each version of an outdated Web page, there is no way to take a single action that affects all versions -- they must be located and updated separately in the cache structure.
应当指出,当对于数据库存在多层存储机制时,这些与高速缓冲存储器结构不相同。数据库不是被设计用作其它程序的内部功能库。在数据库系统中,必须由数据库程序设计员明确构建树和多级存储和检索结构以及,由于实现数据库系统所付出的努力、花费以及系统开销,这项技术不适用于高性能的高速缓冲存储器检索。It should be noted that while there are multiple layers of storage mechanisms for the database, these are not the same as cache structures. The database is not designed to be used as a library of internal functions for other programs. In database systems, trees and multi-level storage and retrieval structures must be explicitly constructed by the database programmer and, due to the effort, expense, and overhead involved in implementing the database system, this technique is not suitable for high performance cache memory retrieval .
发明内容 Contents of the invention
可由本发明的各种实施例来解决上面概括论述的一个或多个问题。一般地说,本发明包括用于提高对象的存储和检索性能的系统和方法。在一个实施例中,本发明包括一个多层高速缓冲存储系统,其中可使用多个关键字来查找高速缓冲存储器入口。高速缓冲存储系统的最低层存储对象,而较高层存储对较低层(诸如存储对象的层)的引用。用每一个关键字来查找位于高速缓冲存储系统的不同层的入口。One or more of the problems discussed generally above may be addressed by various embodiments of the present invention. Generally, the present invention includes systems and methods for improving storage and retrieval performance of objects. In one embodiment, the present invention includes a multi-level cache system in which multiple keys can be used to look up cache entries. The lowest tiers of a cache system store objects, while higher tiers store references to lower tiers, such as tiers that store objects. Each key is used to find entries located at different levels of the cache system.
在网路代理中实现一个示范性的实施例。配置网络变换代理以处理Web服务器和一个或多个客户(例如网络浏览器)之间的通信。因此,如果配置网络变换代理来高速缓冲存储服务于客户的Web页面那么它将工作得更有效率。在这个实施例中,配置网络变换代理以存储每个Web页面的多种版本,其中每一个版本对应于,例如,一个不同的客户装置,每个客户装置拥有其自己的显示特征和性能。不同于将Web页面的所有不同版本存入平面式高速缓冲存储器,将Web页面存入多层高速缓冲存储器。更具体地说,将Web页面存入两层高速缓冲存储器,其中Web页面的URL用作高速缓冲存储器第一层入口的关键字,而Web页面的版本用作高速缓冲存储器(它实际上包括多个高速缓冲存储器)的第二层入口的关键字。当有客户请求Web页面时,从该请求中识别所想要的页面的URL和客户的装置类型。网络变换代理把URL用作关键字编索引进入第一层高速缓冲存储器。与这个关键字(URL)相应的入口包含识别第二层高速缓冲存储器的对象。网络变换代理用第二个关键字(装置类型)编索引进入所识别的第二层高速缓冲存储器。对应第二个关键字的所识别的第二层高速缓冲存储器的入口包含作为想得到的Web页面的对象。然后就可检索到该Web页面并服务于客户。An exemplary embodiment is implemented in a web proxy. A network transformation proxy is configured to handle communications between a web server and one or more clients (eg, web browsers). Therefore, the Network Transformation Proxy will work more efficiently if it is configured to cache Web pages that serve clients. In this embodiment, the network transformation agent is configured to store multiple versions of each Web page, where each version corresponds to, for example, a different client device, each client device having its own display features and capabilities. Instead of storing all the different versions of a Web page in a flat cache, a Web page is stored in a multi-level cache. More specifically, web pages are stored in two levels of cache memory, where the URL of the web page is used as the key for the first level entry of the cache memory, and the version of the web page is used as the cache memory (which actually includes multiple cache memory) level 2 entry key. When a client requests a Web page, the URL of the desired page and the client's device type are identified from the request. The Network Transformation Agent indexes the URL as a key into the first level cache. The entry corresponding to this key (URL) contains the object identifying the second level cache. The Network Transformation Agent indexes into the identified Layer 2 cache with the second key (device type). The identified second-level cache entry corresponding to the second key contains the object as the desired Web page. The Web page can then be retrieved and served to the client.
可供选择的实施例包括一种在多层高速缓冲存储器中存储和检索对象的方法。每一个要存储的对象具有多个与之相关的关键字。用每一个关键字编索引进入不同层的高速缓冲存储器。除在最后一层的高速缓冲存储器外,所有层的每个高速缓冲存储器都包含对后一层中的高速缓冲存储器引用的对象。在最后一层的高速缓冲存储器存储对象本身,而不是对其它高速缓冲存储器的引用。可供选择的是,最后一层的高速缓冲存储器可能包含对所存储对象的引用而不是对象本身。因此,在多层高速缓冲存储系统中存储一个对象包括在第一层高速缓冲存储器中存储包含第一个关键字和对第二层高速缓冲存储器的引用的入口,对另外的层可能重复这种操作(例如,在第二层高速缓冲存储器中存储包括第二关键字和对第三层高速缓冲存储器的引用的入口),以及在最低层高速缓冲存储器中用最后一个关键字存储对象。检索对象包括用第一个关键字编索引进入第一层高速缓冲存储器以得到对第二层高速缓冲存储器的引用,并对其它较低层重复这个步骤直到到达最后一层为止,在最后一层用最后一个关键字编索引进入最后一层高速缓冲存储器以检索对象。Alternative embodiments include a method of storing and retrieving objects in a multi-level cache. Each object to be stored has multiple keys associated with it. Each key is indexed into a different level of cache memory. Each cache in all tiers, except the cache in the last tier, contains objects that reference caches in the next tier. The caches at the last level store the objects themselves, rather than references to other caches. Alternatively, the last level of cache may contain references to stored objects rather than the objects themselves. Thus, storing an object in a multi-level cache system involves storing an entry in the first level cache containing the first key and a reference to the second level cache, possibly repeating this for additional levels. operation (eg, storing an entry including the second key and a reference to a third-level cache in the second-level cache), and storing the object with the last key in the lowest-level cache. Retrieving an object involves indexing into the first level cache with the first key to get a reference to the second level cache, and repeating this step for other lower levels until the last level is reached, where Indexes into the last level of cache with the last key to retrieve the object.
本发明的另一个实施例包括一个软件应用程序。该软件应用程序收录在诸如软盘、CD-ROM、DVD-ROM、RAM、ROM、数据库模式等的计算机可读介质中。计算机可读介质包含为使计算机执行上面所概述的方法而配置的指令。应当指出,计算机可读介质可能包括RAM或其它组成计算机系统的一部分的存储器。因此计算机系统将被允许执行根据现在所公开的内容的方法,并认为在所附权利要求书的范围内。Another embodiment of the invention includes a software application. The software application is embodied on a computer readable medium such as a floppy disk, CD-ROM, DVD-ROM, RAM, ROM, database schema, or the like. A computer readable medium contains instructions configured to cause a computer to perform the methods outlined above. It should be noted that computer readable media may include RAM or other memory forming part of a computer system. A computer system will therefore be permitted to perform methods according to what is now disclosed and is considered to be within the scope of the appended claims.
也可能有大量其它的实施例。Numerous other embodiments are also possible.
附图说明 Description of drawings
通过阅读下面的详细描述和参考附图,本发明的其它目标和优势将变得显而易见。Other objects and advantages of the present invention will become apparent by reading the following detailed description and by referring to the accompanying drawings.
图1说明根据本发明的一个实施例的基于网络的系统的示范性结构图。FIG. 1 illustrates an exemplary block diagram of a network-based system according to one embodiment of the present invention.
图2说明根据本发明的一个实施例的适合用作网络变换代理的计算机的基本配置图。Figure 2 illustrates a basic configuration diagram of a computer suitable for use as a network transformation agent according to one embodiment of the present invention.
图3说明根据本发明的一个实施例的多层高速缓冲存储器的结构图。FIG. 3 illustrates a block diagram of a multi-level cache memory according to one embodiment of the present invention.
图4是说明根据本发明的一个实施例的适用于N层高速缓冲存储结构的通用方法的流程图。FIG. 4 is a flowchart illustrating a general method applicable to an N-level cache structure according to one embodiment of the present invention.
虽然本发明可以有各种修改和可供选择的形式,仍然通过附图的实例和附加的详细说明展示其具体的实施例。然而,应当明白附图和详细说明的目的不是将本发明限制在所描述的特殊实施例。相反此公开的目的是覆盖落在如所附权利要求书定义的本发明的范围内的所有修改、等价物以及可供选择的对象。While the invention is capable of various modifications and alternative forms, specific embodiments thereof have been shown by way of example in the drawings and the accompanying detailed description. It should be understood, however, that the drawings and detailed description are not intended to limit the invention to the particular embodiments described. On the contrary, the intention of this disclosure is to cover all modifications, equivalents and alternatives falling within the scope of the invention as defined by the appended claims.
具体实施方式 Detailed ways
下面将描述本发明的优选实施例。应当指出下面所描述的这个以及任何其他的实施例只是示范性的,而且目的是说明本发明而不是限制本发明。Preferred embodiments of the present invention will be described below. It should be noted that this and any other embodiments described below are exemplary only, and are intended to illustrate the invention rather than limit it.
一般地说,本发明包括用于提高存储和检索对象的性能的系统和方法。在一个实施例中,本发明包括一个多层高速缓冲存储系统,其中使用多个关键字来查询高速缓冲存储器的入口。高速缓冲存储器的最低层存储对象,而较高层存储对较低层(如存储对象的层)的引用。用每个关键字来查询在高速缓冲存储系统的不同层的入口。Generally, the present invention includes systems and methods for improving the performance of storing and retrieving objects. In one embodiment, the present invention includes a multi-level cache system in which multiple keys are used to query cache entries. The lowest level of the cache memory stores objects, while higher levels store references to lower levels such as those storing objects. Each key is used to query entries at different levels of the cache system.
在网路变换代理实现一个示范性实施例。配置网路变换代理以截取和操纵Web服务器和一个或多个客户(例如Web浏览器)之间的通信。因此,如果配置网路变换代理高速缓冲存储服务于客户的Web页面,那么它将工作得更有效率。在这个实施例中,配置网路变换代理以产生每一个Web页面的多种版本,其中每个版本对应于,例如,客户装置的每种类型的不同最佳化,每个客户装置有其自己的显示特征和性能。不同于将Web页面的所有不同版本存入一个平面式高速缓冲存储器,将Web页面版本存入多层高速缓冲存储器。更具体地说,将Web页面存储在两层高速缓冲存储器,其中Web页面的URL用作进入高速缓冲存储器第一层的关键字,而将页面已经为其变换的装置用作进入高速缓冲存储器第二层的关键字。An exemplary embodiment is implemented in a network transformation agent. A network transformation proxy is configured to intercept and manipulate communications between a web server and one or more clients (eg, web browsers). Therefore, the Network Transformation Proxy cache will work more efficiently if it is configured to serve the client's Web pages. In this embodiment, the network transformation agent is configured to generate multiple versions of each Web page, where each version corresponds to, for example, a different optimization for each type of client device, each client device having its own display features and performance. Instead of storing all the different versions of a web page in one flat cache, storing the versions of a web page in a multi-level cache. More specifically, Web pages are stored in two levels of cache memory, where the URL of the Web page is used as the key into the first level of cache memory, and the device for which the page has been transformed is used as the key into the second level of cache memory. Second level keywords.
当客户请求Web页面时,可从请求中识别出想得到的页面的URL和客户的装置类型。网路变换代理用URL作为索引进入第一层高速缓冲存储器的关键字。对应于这个关键字(URL)的入口包含作为第二层的高速缓冲存储器的标识符的对象。网络变换服务器用第二关键字(装置类型)编索引进入第二层中所标识的高速缓冲存储器。对应于第二关键字的所标识的第二层高速缓冲存储器的入口包含是想得到Web页面的对象。于是就可检索到Web页面并提供给客户。When a client requests a Web page, the URL of the desired page and the client's device type can be identified from the request. The Network Transformation Agent uses the URL as a key to index into the first level cache memory. The entry corresponding to this key (URL) contains an object that is an identifier of the cache memory of the second layer. The network transformation server indexes into the cache memory identified in the second tier with the second key (device type). The identified second-level cache entry corresponding to the second key contains the object that is the desired Web page. The Web page can then be retrieved and provided to the client.
虽然该优选实施例是在网络变换代理实现的,但应当指出本发明能够通用于多种系统的多层高速缓冲存储器。因此,即使现在公开的内容主要集中在本发明在网路变换代理的实现,这样的目的也只是示范,而不是限制。Although the preferred embodiment is implemented at the network translation proxy, it should be noted that the present invention can be applied generally to multi-level caches of a variety of systems. Therefore, even though the present disclosure mainly focuses on the implementation of the present invention in the network conversion proxy, such purpose is only for illustration, not limitation.
本发明的优选实施例在网络环境下工作。应用网络及其组件来将Web内容(例如Web页面)从一个或多个服务器分配给一个或多个客户。参考图1展示了示范性结构。The preferred embodiment of the present invention works in a network environment. A network and its components are used to distribute web content (eg, web pages) from one or more servers to one or more clients. An exemplary structure is shown with reference to FIG. 1 .
如图1所示,所述结构包括连接到网络变换代理14的客户12,该代理14又连接到Web服务器16。网络变换代理14包括高速缓冲存储子系统18。客户12通过第一个网络13连接到代理14。代理14通过第二个网络15连接到Web服务器16。预期网络13和网络15中至少有一个包括互联网。这些网络的另一个可能包括一个特殊企业的内部或外部网络。然而,应当指出,客户12、代理服务器14和Web服务器16的连接没有必要为了本发明以任何特殊方式配置。As shown in FIG. 1 , the architecture includes a
代理处理诸如Web浏览器的客户装置或程序和诸如Web服务器的服务器装置或程序之间的通信。在基于Web的系统里,代理处理客户对Web内容的请求,以及Web服务器响应这些请求所提供的Web内容。在处理这些通信时,代理负责模拟Web服务器并因此降低对系统(对Web服务器和网络自身)的负担。代理通过存储由Web服务器所提供的一些内容,以及当可能时提供所存储的内容给客户以响应对内容的请求来做这件事。这样,代理为Web服务器减轻了服务一部分客户的请求的负担。The proxy handles communication between a client device or program such as a Web browser and a server device or program such as a Web server. In Web-based systems, proxies process client requests for Web content, and Web servers provide Web content in response to those requests. In handling these communications, the proxy is responsible for simulating the web server and thus reducing the burden on the system (on the web server and on the network itself). The proxy does this by storing some content provided by the web server, and when possible serving the stored content to the client in response to a request for the content. In this way, the proxy relieves the web server of the burden of serving a portion of the client's requests.
在一个优选实施例中,配置网络变换代理以执行由服务器所提供的Web内容的变换。变换可能依赖于客户对内容作出请求以及请求的方式。变换可包括为了在特殊类型的客户装置上使用而使内容优化所作的修改。例如,修改Web页面以适应不同显示装置的性能(例如,对Web页面上的图像执行色彩减弱或黑白变换)。因此,代理可生成一种特殊Web页面的多种版本(或其它Web内容)。代理也可执行变换,为客户做出更多内容修改,诸如为不同客户插入不同的广告。然后代理需要存储Web内容的这些不同版本。In a preferred embodiment, a network transformation proxy is configured to perform transformations of web content provided by the server. The transformation may depend on the client making the request for the content and the way the request is made. Transformations may include modifications to optimize content for use on particular types of client devices. For example, modifying Web pages to accommodate the capabilities of different display devices (eg, performing color attenuation or black-and-white transformations on images on Web pages). Thus, an agent can generate multiple versions of a particular Web page (or other Web content). Agents can also perform transformations to make more content modifications for clients, such as inserting different advertisements for different clients. The proxy then needs to store these different versions of the web content.
为了创建和识别Web页面的不同版本,网络变换代理使用关于在页面上所作变换的类型和源服务器所提供的版本的信息。版本高速缓冲存储关键字也可指示变换中所用的参数值。例如,如果网络变换代理对一幅图像做色彩减弱,那么该关键字将包括图像将要减弱到的颜色的标号。根据本发明的代理为存储和检索Web内容提供了快速而有效的机制,即使内容的多个版本增加了必须存储的信息的总量。To create and identify different versions of a Web page, the network transformation agent uses information about the type of transformation made on the page and the version provided by the origin server. Version cache keys may also indicate parameter values used in the transformation. For example, if the network transformation agent is color attenuating an image, then the key will include the label of the color to which the image is to be attenuated. Proxies according to the present invention provide a fast and efficient mechanism for storing and retrieving Web content, even though multiple versions of the content increase the total amount of information that must be stored.
参考图2,展示了根据本发明的一个实施例适合用作网络变换代理的计算机的基本配置。在计算机系统100实现服务器14。计算机系统100包括中央处理器(CPU)112、只读存储器(ROM)114、随机存取存储器(RAM)116、硬盘驱动器(HD)118以及输入输出设备(I/O)120。计算机系统100可能有不止一个CPU、ROM、随机存取存储器、硬盘驱动器、输入输出设备或其它其硬件组件。然而将计算机系统100描述为只拥有一种单一类型的组件。应当指出图2所示的系统是示范性硬件配置的简化示图,可能有许多其它可供选择的配置。Referring to Figure 2, there is shown a basic configuration of a computer suitable for use as a network transformation agent according to one embodiment of the present invention.
在驻留于诸如ROM 114、RAM 116或硬盘驱动器118的存储器内的适当软件应用程序可执行这里所描述的部分方法。软件应用程序可包括程序指令,这些指令被配置用来使执行指令的数据处理器执行这里所描述的方法。这些指令可收录(存储)在内部存储装置,如ROM114、RAM 116或硬盘驱动器118、其它装置,以及外部存储装置,或由诸如计算机系统100或者甚至CPU 112的数据处理器可读的存储介质。这种介质可包括,例如,软盘、CD-ROM、DVD ROM、磁带、光存储介质等。Appropriate software applications residing in memory such as
在本发明的一个示意性实施例中,计算机可执行指令可以是若干行编辑的C++、Java或其它语言代码。也可以使用其它结构。例如,任何一台计算机的功能可由图2所示的不同计算机实现。另外,携带这种代码的计算机程序或其软件组成部分也可以存储在不止一台计算机的不止一种数据处理系统的可读介质上。In an exemplary embodiment of the present invention, the computer executable instructions may be several lines of compiled C ++ , Java or other language codes. Other configurations may also be used. For example, the functions of any one computer can be performed by different computers shown in FIG. 2 . In addition, a computer program carrying such code, or software components thereof, may also be stored on readable media of more than one computer and more than one data processing system.
在上面的硬件配置里,各种软件组成部分可驻留于单个计算机或独立计算机的任何组合中。在可供选择的实施例中,一些或所有软件组成部分可驻留于同一台计算机。例如,代理计算机100的一种或多种软件组成部分可驻留于一个客户计算机或服务器计算机或两者之上。在仍是另一个实施例中,如果代理计算机所实现的功能融入到客户计算机或服务器计算机中就可不要求代理计算机本身。在这样的实施例里,客户计算机和服务器计算机可定向连接到同一个网络。In the above hardware configurations, the various software components may reside on a single computer or any combination of separate computers. In alternative embodiments, some or all of the software components may reside on the same computer. For example, one or more software components of
任何客户、服务器和代理计算机之间的通信可用电子、光、射频或其它信号来完成。例如,当用户在客户计算机上时,当发送通信给用户时客户计算机可将信号转换为人可理解的形式,也可将人的输入转换为代理或服务器计算机将要使用的适当的电子、光、射频或其它信号。同样,当操作者在服务器计算机时,当传送通信给操作者时服务器计算机可将信号转换为人可理解的形式,也可将人的输入转换为计算机将要使用的适当的电子、光、射频或其它信号。Communications between any client, server and proxy computers may be accomplished electronically, optically, radio frequency or other signals. For example, when a user is at a client computer, the client computer may convert signals into human understandable form when sending communications to the user, and may convert human input into appropriate electrical, optical, radio frequency or other signals. Likewise, when the operator is at the server computer, the server computer may convert the signal into a human understandable form when communicating to the operator, and may convert human input into the appropriate electrical, optical, radio frequency, or other Signal.
如上面所解释,代理负责存储Web服务器先前提供的信息使得可将这种信息提供给客户以响应他们的请求。这种信息存储在代理的高速缓冲子系统。高速缓冲子系统实际上包括大量组织在两层或多层的高速缓冲存储器。上层和中层引用较低层的高速缓冲存储器。最低层实际上存储想得到的信息。通过依次访问这些层的每一层来访问存储在高速缓冲子系统中的信息。As explained above, the proxy is responsible for storing information previously provided by the web server so that this information can be provided to clients in response to their requests. This information is stored in the proxy's cache subsystem. The cache subsystem actually includes a large number of cache memories organized in two or more layers. The upper and middle tiers refer to the cache memory of the lower tiers. The lowest layer actually stores the desired information. Information stored in the cache subsystem is accessed by accessing each of these layers in turn.
参考图3,展示了根据本发明的一个实施例的多层高速缓冲存储系统的结构。如图所示,高速缓冲存储子系统18包括第一层22和第二层24。第一层22实际上包括具有多个入口(例如31-33)的单个高速缓冲存储器30。第二层24包括多个高速缓冲存储器(例如40、50、60)。Referring to FIG. 3 , it shows the structure of a multi-level cache storage system according to an embodiment of the present invention. As shown, the
每一个高速缓冲存储器的每一个入口包括一个关键字和一个对象。关键字用来识别进入高速缓冲存储器的期望入口。对象是存储在高速缓冲存储器的想得到的信息。由于高速缓冲存储子系统18被设计用来存储Web内容,所以第一层高速缓冲存储器26入口(例如35)的关键字包括URL(统一资源定位符)。入口对象(例如36)包括对第二层24的高速缓冲存储器的引用。例如,第一层高速缓冲存储器的入口31的对象36是对第二层高速缓冲存储器40的引用。Each entry of each cache consists of a key and an object. The key is used to identify the desired entry into the cache memory. Objects are desired information stored in cache memory. Since the
高速缓冲存储器40(如同其它第二层高速缓冲存储器)非常相似,每一个入口(例如,41、42、43)包括一个关键字和一个对象。然而,由于高速缓冲存储器40位于高速缓冲存储系统结构的最低层,所以它的高速缓冲存储入口所包含的对象包括高速缓冲存储子系统18设计将要存储的类型的对象(例如,Web页面)。如果高速缓冲存储子系统18有不止两层,在第二层24的高速缓冲存储器(如40)所包含的对象将包括对第三层高速缓冲存储器的引用。这个第三层可能是最低层,或仍是另一个中间层,其中高速缓存的对象包括对后续层的高速缓冲器的引用。因此,该结构可扩展到任何数目N层。Cache 40 (like other second-level caches) is very similar in that each entry (eg, 41, 42, 43) includes a key and an object. However, since
图4的流程图总结了在使用这种高速缓冲存储结构检索所存储的对象中所采用的方法。图4的流程图表示适用于N层高速缓冲存储结构的通用方法。如图所示,用第一关键字来编索引进入第一结构以检索对第二层高速缓冲器的引用。根据高速缓冲存储结构中的层数N重复进行该操作。对最后一层(第N层)来说,用第N个关键字编索引进入高速缓冲器以检索所存储的对象。The flowchart of Figure 4 summarizes the method employed in retrieving stored objects using this cache structure. The flowchart of Figure 4 represents a general method applicable to an N-level cache structure. As shown, the first key is used to index into the first structure to retrieve a reference to the second level cache. This operation is repeated according to the number N of levels in the cache memory structure. For the last level (level N), the Nth key is used to index into the cache to retrieve the stored object.
在该优选实施例的情况下,其中在网络变换代理实现高速缓冲存储子系统,第一层高速缓冲存储器26的关键字包括在那里所存储的各个内容的名称(例如URL)。第一层高速缓冲存储器26中的对象包括对第二层24的高速缓冲存储器的引用。第二层高速缓冲存储器中的关键字是基于指定URL所标识的内容的不同版本的参数。第二层高速缓冲存储器中的对象包括由高速缓冲存储子系统18所存储的Web内容。第一和第二层高速缓冲存储器中的关键字结合起来可用于存储或检索高速缓冲存储子系统18所存储的任何一段内容的任何版本。In the case of the preferred embodiment, where the network translation agent implements the cache subsystem, the keys of the
在本发明的简单实施例中,两种不同的高速缓冲存储器,或需要的尽可能多的高速缓冲存储器,每个都用一个关键字来存储一个值。每个高速缓冲存储器的功能相似。高速缓冲存储器可包括任何高速缓冲存储或相关的存储器结构。In a simple embodiment of the invention, two different caches, or as many caches as are required, each use a key to store a value. Each cache memory functions similarly. A cache may include any cache or related memory structure.
本发明的一个实施例是多层高速缓冲存储系统。在简单实施例中,本发明使用两层高速缓冲存储系统。在第一层,用基于内容名称的关键字来识别许多第二高速缓冲存储器中的一个。在第二高速缓冲存储器里,用封装指定版本的信息的关键字来识别第一层关键字所指定的内容的特殊版本。用程序可表示如下:One embodiment of the present invention is a multi-level cache system. In a simple embodiment, the present invention uses a two-level cache system. At the first level, a key based on the content name is used to identify one of many secondary caches. In the second cache memory, the particular version of the content specified by the first-level key is identified by a key encapsulating the version-specific information. The program can be expressed as follows:
Level_1_Cache:=CacheOf<Level_1_Cache:=CacheOf<
f(Content_Name),f(Content_Name),
Level_2_CacheLevel_2_Cache
>>
Level_2_Cache:=CacheOf< Level_2_Cache:=CacheOf<
g(Description_Of_Content_Version),g(Description_Of_Content_Version),
Content_VersionContent_Version
>>
抽象函数f()和g()将它们的自变量转换成简洁而容易匹配的关键字。在该优选实施例中,用到MD5sum,但将用到任何(近似)独一无二的编码。CacheOf<Key,Content>是通过与关键字相关来存储和检索内容的高速缓冲存储数据结构。The abstract functions f() and g() transform their arguments into concise and easy-to-match keywords. In the preferred embodiment, MD5sum is used, but any (approximately) unique encoding will be used. CacheOf<Key, Content> is a cache data structure that stores and retrieves content by associating it with a key.
第一层高速缓冲存储(Level_1_Cache)是高速缓冲存储器的高速缓冲存储,其中用基于内容名称的关键字来存储和检索第二层高速缓冲存储(Level_2_Cache)。第二层高速缓冲存储是标准的高速缓冲存储,它将内容版本的描述和适当的内容版本联系起来。值得注意的是,第二层高速缓冲存储的关键字没有必要封装内容名称,因为在每一个第二层高速缓冲存储器的所有项都是同样内容的不同版本-更具体地说,是由Content_Name所识别的内容。The first level cache (Level_1_Cache) is a cache of cache memory in which the second level cache (Level_2_Cache) is stored and retrieved with a key based on a content name. The second level of cache is a standard cache that associates content version descriptions with the appropriate content version. It is worth noting that the keys of the second-level cache do not necessarily encapsulate the content name, because all entries in each second-level cache are different versions of the same content-more specifically, as determined by the Content_Name identified content.
存储包括版本的关键字和包括Web页面的值的第二层高速缓冲存储器与本领域现有技术相似。然而,存储包括URL的关键字和包括第二高速缓冲存储器的值的第一层高速缓冲存储器在存储高速缓冲存储器方面是独一无二的。A second level cache storing keys including versions and values including Web pages is similar to prior art in the art. However, the first level cache storing the key including the URL and the value including the second cache is unique in storing the cache.
在一个优选实施例中,第一层高速缓冲存储(Level_1_Cache)是基于内容的名称(例如URL),而且它指向第二层高速缓冲存储。第二层高速缓冲存储(对应于Level_2_Cache)基于已经应用于内容的变换的类型和参数设置。In a preferred embodiment, the first level cache (Level_1_Cache) is based on the name of the content (eg URL) and it points to the second level cache. The second level of caching (corresponding to Level_2_Cache) is based on the type and parameter settings of the transformations that have been applied to the content.
在该优选实施例中,用内容URL的MD5Sum提供第一层高速缓冲存储(对应于上面的简单实施例所描述的Level_1_Cache)的关键字。由MIT的Ronald L.Rivest教授开发的MD5算法是通常所用的单向函数,具备两个不相等输入极不可能产生相同结果的性质。这种算法使识别数据的关键字更有效。In the preferred embodiment, the MD5Sum of the content URL is used to provide the key for the first level cache (corresponding to the Level_1_Cache described above for the simple embodiment). The MD5 algorithm developed by Professor Ronald L. Rivest of MIT is a commonly used one-way function, which has the property that two unequal inputs are extremely unlikely to produce the same result. This algorithm makes it more efficient to identify the keywords of the data.
第二层高速缓冲存储器包含由URL所识别的内容的多种版本。代理通过对Web内容执行参数变换来创建这些多样化版本。基于变换名称的MD5Sum及其参数提供第二层高速缓冲存储器的关键字。每个关键字用那些设置来标识所变换内容的版本。The second level cache contains multiple versions of the content identified by the URL. Proxies create these diverse versions by performing parameter transformations on the web content. MD5Sum and its parameters provide keys for the second-level cache based on the transform name. Each keyword identifies the version of the transformed content with those settings.
使用MD5sum函数,这种结构可描述为:Using the MD5sum function, this structure can be described as:
Level_1_Cache:=OneCacheOf< Level_1_Cache:=OneCacheOf<
MD5Sum(URL),MD5Sum(URL),
Level_2_CacheLevel_2_Cache
>>
Level_2_Cache:=OneCacheOf< Level_2_Cache:=OneCacheOf<
MD5Sum(Transformation_Parameters),MD5Sum(Transformation_Parameters),
Transform(Content,Transformation_Parameters) Transform(Content, Transformation_Parameters)
>>
该优选实施例用C++模板在多层高速缓冲存储结构内创建高速缓冲存储器。C++模板使得没有必要为了完成同样的任务而写出独立的代码体。它使任务抽象化,允许一个C++对象执行多个任务。为了完成某一具体任务,可将任何类型的关键字和值赋值到模板中。在本系统的情况下,C++模板使得没有必要为第一和第二层高速缓冲存储器编写两种独立的代码体。针对两种不同的高速缓冲存储器的关键字和值类型可同时装入同一个C++模板的结构内。以这种方式使用C++模板的示范性系统和方法,在由发明人Jeremy S.de Bonet、Todd A.Stiers、Jeffrey R.Annison、Phillip Alvelda VLL以及Paul M.Scanlan于2003年1月16日提交的名称为“任意内容和应用数据的存储和检索的设计”的美国专利申请No._(代理案卷号No.IDET1150-1)中详细描述。The preferred embodiment uses C ++ templates to create caches within a multi-level cache structure. C ++ templates make it unnecessary to write separate bodies of code to accomplish the same task. It abstracts tasks, allowing one C ++ object to perform multiple tasks. To accomplish a specific task, any type of keyword and value can be assigned to the template. In the case of the present system, the C ++ templates make it unnecessary to write two separate bodies of code for the first and second level caches. Key and value types for two different caches can be loaded simultaneously into the same C ++ template structure. Exemplary systems and methods for using C ++ templates in this manner are described in January 16, 2003 by Jeremy S. de Bonet, Todd A. Stiers, Jeffrey R. Annison, Phillip Alvelda VLL, and Paul M. Scanlan Described in detail in US Patent Application No._ (Attorney Docket No. IDET1150-1) entitled "Design for Storage and Retrieval of Arbitrary Content and Application Data" filed on .
使用多层高速缓冲存储使高速缓冲存储器的查找更有效。例如,高速缓冲存储代理可存储1000个URL中每个的10种独立的版本。使用目前的发明,将没有必要用10,000个独立关键字将URL存为10,000个独立实体。相反,可将URL存为使用1,000个独立关键字的1,000个独立实体。当要查找某个具体页面的具体版本时,代理只需要搜寻1,000个URL,然后再搜寻那个URL的10种版本。这种查找只要求搜寻1,010个独立实体而不是10,000个。Use multiple levels of caching to make cache lookups more efficient. For example, a caching proxy may store 10 separate versions of each of 1000 URLs. With the current invention, there would be no need to store URLs as 10,000 separate entities with 10,000 separate keywords. Instead, URLs can be stored as 1,000 separate entities using 1,000 separate keywords. When looking for a specific version of a specific page, the proxy only needs to search 1,000 URLs, and then search 10 versions of that URL. This lookup only requires searching 1,010 unique entities instead of 10,000.
而且,本发明产生一种存储所有内容所共有的数据的方法。例如,创建内容的日期或各种其他的HTTP标题也许是所有版本所共有的(如在变换代理的情形下),而且本发明提供公共地点来存储这种信息。没有必要对内容的每一个版本单独存储这种信息,而且如果有任何变化,可同时对内容的多种版本做出这些变化。Furthermore, the present invention creates a method of storing data common to all content. For example, the date the content was created or various other HTTP headers may be common to all versions (as in the case of changing proxies), and the present invention provides a common place to store such information. It is not necessary to store this information separately for each version of the content, and if there are any changes, these changes can be made to multiple versions of the content at the same time.
同时,由于所有版本的Web页面一起存储在单个高速缓冲存储系统,开发者能够一起操纵、转储或删除它们,没有必要单独标识每一个。Also, since all versions of a Web page are stored together in a single cache system, developers can manipulate, dump, or delete them all together without having to identify each one individually.
开发者可将高速缓冲存储系统扩展到超过两层。可用于编索引进入这些层的高速缓冲存储器的另外的关键字可能包括其他的标识符,诸如装置类型、浏览器或支付水平。Developers can extend the cache storage system beyond two tiers. Additional keys that may be used to index into the caches of these tiers may include other identifiers, such as device type, browser, or payment level.
随着网络客户之间差异的增多,就有必要创建内容的多种版本,每一种版本对客户的不同类型是最佳化。在此发明之前,不存在将这种多版本化的内容组织到单个的统一的高速缓冲存储结构的方法。As the differences between web customers increase, it becomes necessary to create multiple versions of content, each optimized for a different type of customer. Prior to this invention, there was no way to organize such multi-versioned content into a single unified cache structure.
在上面结合具体的实施例已经描述了本发明可能提供的好处和优势。不能将这些好处和优势以及任何使它们发生或变得更加明确的元素或限制条件解释为任何或全部权利要求的重要、必须或基本特征。如这里所用到的,短语“包括”“包含”或它的任何其它变型的目的是解释为非专有地包括跟在那些短语后的元素或限制条件。因此,不将包括一组元素的系统、方法或其他实施例限制在只是那些元素,而可能包括其他没有明确列出或所述实施例固有的元素。The benefits and advantages which may be provided by the present invention have been described above with reference to specific examples. None of these benefits and advantages, nor any element or limitation which causes or makes them more specific, should be construed as an essential, necessary or essential feature of any or all of the claims. As used herein, the phrases "comprises," "comprises," or any other variation thereof are intended to be construed as non-exclusive inclusions of the elements or limitations that follow those phrases. Accordingly, a system, method, or other embodiment comprising a set of elements is not limited to only those elements, but may include other elements not expressly listed or inherent to the described embodiment.
虽然已经参考特殊的实施例说明了本发明,应当明白所述实施例是说明性的而且不将发明的范围限制在这些实施例。对上面所描述实施例也可能有许多变化、修改、添加或改进。预期这些变化、修改、添加或改进将落入如下面权利要求书所详述的本发明的范围内。While the invention has been described with reference to specific embodiments, it should be understood that the embodiments are illustrative and do not limit the scope of the invention to these embodiments. Many variations, modifications, additions or improvements are also possible to the embodiments described above. It is intended that such changes, modifications, additions or improvements will fall within the scope of the present invention as detailed in the following claims.
Claims (14)
Applications Claiming Priority (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US34934402P | 2002-01-18 | 2002-01-18 | |
| US60/349,344 | 2002-01-18 | ||
| US60/349,424 | 2002-01-18 | ||
| US10/345,886 | 2003-01-16 |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN1774703A CN1774703A (en) | 2006-05-17 |
| CN100514309C true CN100514309C (en) | 2009-07-15 |
Family
ID=36760954
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CNB038041057A Expired - Fee Related CN100514309C (en) | 2002-01-18 | 2003-01-16 | System and method of storing and searching objects through multi-tiered cache memory |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN100514309C (en) |
Families Citing this family (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102479213B (en) * | 2010-11-25 | 2014-07-30 | 北大方正集团有限公司 | Data buffering method and device |
| KR20150132099A (en) | 2013-03-20 | 2015-11-25 | 휴렛-팩커드 디벨롭먼트 컴퍼니, 엘.피. | Caching data in a memory system having memory nodes at different hierarchical levels |
| US11507566B2 (en) * | 2020-01-31 | 2022-11-22 | Salesforce.Com, Inc. | Managing objects in shared caches using multiple chains |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN1196523A (en) * | 1997-04-14 | 1998-10-21 | 国际商业机器公司 | Cache and Architecture-Specific Functional Layering Approach |
| US5991773A (en) * | 1996-04-30 | 1999-11-23 | Brother Kogyo Kabushiki Kaisha | Information terminal unit with history management functions |
| EP1146442A2 (en) * | 2000-04-13 | 2001-10-17 | International Computers Limited | Method and apparatus for storing and accessing electronic content |
| EP1160692A2 (en) * | 2000-05-30 | 2001-12-05 | Lucent Technologies Inc. | Internet archive service providing persistent access to web resources |
-
2003
- 2003-01-16 CN CNB038041057A patent/CN100514309C/en not_active Expired - Fee Related
Patent Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5991773A (en) * | 1996-04-30 | 1999-11-23 | Brother Kogyo Kabushiki Kaisha | Information terminal unit with history management functions |
| CN1196523A (en) * | 1997-04-14 | 1998-10-21 | 国际商业机器公司 | Cache and Architecture-Specific Functional Layering Approach |
| EP1146442A2 (en) * | 2000-04-13 | 2001-10-17 | International Computers Limited | Method and apparatus for storing and accessing electronic content |
| EP1160692A2 (en) * | 2000-05-30 | 2001-12-05 | Lucent Technologies Inc. | Internet archive service providing persistent access to web resources |
Also Published As
| Publication number | Publication date |
|---|---|
| CN1774703A (en) | 2006-05-17 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US7130872B2 (en) | Multi-tiered caching mechanism for the storage and retrieval of content multiple versions | |
| US6763362B2 (en) | Method and system for updating a search engine | |
| US7440964B2 (en) | Method, device and software for querying and presenting search results | |
| US7996392B2 (en) | Changing ranking algorithms based on customer settings | |
| JP5065584B2 (en) | Application programming interface for text mining and search | |
| US7099888B2 (en) | Accessing a remotely located nested object | |
| US7103593B2 (en) | System and method for retrieving information from disparate information sources in a decentralized manner and integrating the information in accordance with a distributed domain model/ontology | |
| US20030212863A1 (en) | Predicate indexing of data stored in a computer with application to indexing cached data | |
| US20090083293A1 (en) | Way Of Indexing Web Content | |
| US8521696B2 (en) | Structure of an alternative evaluator for directory operations | |
| CN100514309C (en) | System and method of storing and searching objects through multi-tiered cache memory | |
| US20020107986A1 (en) | Methods and systems for replacing data transmission request expressions | |
| US20090077112A1 (en) | Performance Optimized Navigation Support For Web Page Composer | |
| Chauhan et al. | Design of an agent based context driven focused crawler | |
| Singh et al. | Crawling the Deep Web: A Study | |
| Beniwal et al. | Web crawlers of search engine | |
| HK1023669A1 (en) | Search method for providing fulltext search over web pages of world wide web servers |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| C06 | Publication | ||
| PB01 | Publication | ||
| C10 | Entry into substantive examination | ||
| SE01 | Entry into force of request for substantive examination | ||
| C14 | Grant of patent or utility model | ||
| GR01 | Patent grant | ||
| CF01 | Termination of patent right due to non-payment of annual fee | ||
| CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20090715 Termination date: 20200116 |