EP1006463A2 - Method and apparatus for persistent access to Web resources - Google Patents
Method and apparatus for persistent access to Web resources
- Publication number
- EP1006463A2
- Authority
- EP
- European Patent Office
- Prior art keywords
- time
- web
- stamp
- electronic document
- version
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/955—Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
- G06F16/9566—URL specific, e.g. using aliases, detecting broken or misspelled links
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Definitions
- the present invention relates to Internet resource access techniques, and more particularly, to a method and apparatus for ensuring persistent access to Internet resources.
- the World Wide Web provides a dynamic way to present and distribute a vast amount of information.
- the Web provides users with many media options and is becoming ubiquitously available in an expanding variety of personal electronic devices, far beyond its initial limited availability to users via computer terminals.
- the Web may ultimately replace traditional paper-based media altogether.
- Paper-based media generally have an associated time stamp, and permit an easy determination of the information that was available at a given time.
- a newspaper article can be cited as an authoritative reference, provided that the particular date of the newspaper publication is specified.
- Web content cannot reliably be expected to be available in the same form and addressed by the same Uniform Resource Locator ("URL") at a future time. While some Web sites may provide access to some archived Web documents, the historical Web documents may not be accessed by users in a consistent and predictable manner, if at all.
- OCLC PURL Persistent Uniform Resource Locator
- a Persistent Uniform Resource Locator provides flexible naming and name resolution services for Internet resources to ensure reliable, long-term access to Internet resources with minimal maintenance.
- OCLC PURL assists Internet users in locating Web resources.
- the Internet is constantly expanding and changing. Once a Uniform Resource Locator (URL) changes, all previous references to that URL become invalid, thereby preventing users from accessing the Internet resource. The management of these changes often becomes burdensome.
- URL Uniform Resource Locator
- a PURL points to an intermediate resolution service, which translates the PURL into the actual URL.
- the Web resource may be accessed by means of the PURL.
- a PURL assigns a persistent name to a resource even if the location of the resource changes. In this manner, PURLs referenced in Web documents and other resources can remain viable over time without having to update the references each time the Web resource is moved.
- the PURL "forwarding" address maintained by OCLC must be kept up-to-date. In other words, each time the document is moved, OCLC must be notified of the new address for the document.
- the Uniform Resource Locators that identify Web resources are optionally augmented to include a time stamp.
- the time stamp can be specified in the Uniform Resource Locator ("URL") in any suitable format.
- the present invention allows the Web to be an organized and reliable reference source, much like paper-based media.
- a web browser and a web server are disclosed that accommodate a time stamp parameter and allow a user to refer to any Web address with a precise target time. If a version of the Web resource corresponding to the requested time does not exist, a version of the document stored time-wise in the vicinity of the requested target time is provided. For example, the present invention may assume the Web resource has not changed from the previous archived version, and the version of the Web resource with the most recent time-stamp preceding the requested time is provided. Alternatively, the version of the Web resource with the next immediate time-stamp after the requested time is provided.
- the disclosed Web browser can optionally include a mechanism to facilitate the specification of the desired date and time, or the user can manually append the time stamp to the URL indicated in the "Location" window of the browser.
- the persistent Web server (i) receives URLs containing a time stamp, (ii) retrieves the Web page corresponding to the requested time-stamp, if it exists, or, retrieves a version of the Web page stored time-wise in the vicinity of the requested target time if a version of the Web page corresponding to the requested time-stamp does not exist, and (iii) returns the appropriate page to the client.
- the persistent Web server interprets the extracted URL in accordance with the selected time-stamp format.
- the persistent Web server includes a persistent archive for storing all of the versions of Web resources that will be persistently available to Web users.
- FIG. 1 illustrates a Web browser 100 in accordance with the present invention, that accesses information from one or more persistent Web servers 140, 150 over the Internet or World Wide Web ("Web") environment 130.
- the present invention provides persistent access to Web resources or electronic documents, including textual, audio, video or animation documents.
- the Uniform Resource Locators ("URLs") that identify Web resources are augmented to include a time stamp.
- the Web browser 100 and persistent Web servers 140, 150 accommodate the additional time stamp parameter and allow a user to refer to any Web address with a precise target date.
- the time stamp can be included in the Uniform Resource Locator ("URL") in any suitable format, as would be apparent to a person of ordinary skill.
- additional time granularity can be indicated by including the time-of-day in the URL.
- the time zone is assumed to be the user's default time zone.
- the illustrative time stamp format described above is a Common Gateway Interface (CGI) search argument.
- CGI Common Gateway Interface
- the month, day and year (or other time units) can be expressed in any order.
- the default value will be the most recent version.
- the URL can be represented using the labels "next_archive" or "previous_archive."
- the time stamp can be indicated as one of the HTTP request headers, such as: Time-Stamp: June 9, 1998.
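As a minimal sketch of the CGI search argument variant, the Python below separates a time stamp from a dated URL. The parameter name `time_stamp` and the ISO `YYYY-MM-DD` date format are assumptions made for illustration; the source allows any suitable format and does not fix either.

```python
from datetime import date
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter name; the source does not prescribe one.
TIME_STAMP_PARAM = "time_stamp"


def split_dated_url(url):
    """Return (url_without_time_stamp, requested_date_or_None)."""
    parts = urlsplit(url)
    requested = None
    kept = []
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key == TIME_STAMP_PARAM:
            # Assumed format: YYYY-MM-DD.
            requested = date.fromisoformat(value)
        else:
            kept.append((key, value))
    plain = urlunsplit(parts._replace(query=urlencode(kept)))
    return plain, requested
```

A URL without the parameter yields `None` for the date, which a server could treat as a request for the most recent version.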
- the Web browser 100 may be embodied as a conventional browser, such as Microsoft Internet Explorer TM or Netscape Navigator TM , as modified herein to incorporate the features and functions of the present invention. As discussed further below, the Web browser 100 only needs to incorporate a new options selection panel to permit the user to specify the desired date and time. In fact, a conventional Web browser 100 can be utilized, with the user manually appending the time stamp to the URL indicated in the "Location" window of the browser 100.
- the user has the option to turn the time stamp on or off. If the time stamp is activated, the browser 100 will change the URL accordingly before sending the URL out to the Web 130. Since there is no guarantee that the corresponding web server 140, 150 recognizes a time stamp, the document returned by the server 140, 150 might contain embedded hyperlinks that do not contain time stamps. Thus, in this situation, the web browser 100 can automatically convert the URL associated with an embedded hyperlink to add an appropriate time stamp when the user clicks on the hyperlink if the time stamp option is activated. The Web browser 100 should convert the URL in accordance with the selected time stamp format. In a request-header-scheme implementation, the browser 100 should be modified to send the special request header ("Time-Stamp: June 9, 1998").
- the HTML should be modified to include a new time stamp tag for any embedded hyperlink with a specific time stamp.
- … TIMEZONE server></TIMESTAMP>Lucent Web Site</A>.
- the persistent Web servers 140, 150 may be embodied as conventional hardware and software, as modified herein to carry out the functions and operations described below. Specifically, the persistent Web servers 140, 150 need to know how to (i) receive URLs containing a time stamp, (ii) extract the time stamp, (iii) retrieve the Web page corresponding to the requested time-stamp, if it exists, or, retrieve a version of the Web page stored time-wise in the vicinity of the requested target time if a version of the Web page corresponding to the requested time-stamp does not exist, (iv) modify the requested Web page to update embedded hyperlinks to incorporate the same time stamp as the requested Web page and (v) return the requested page to the client.
- the persistent Web servers 140, 150 should interpret the extracted URL in accordance with the selected time stamp format.
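The hyperlink update, step (iv) above, could look roughly like the following Python sketch, which appends the requested time stamp to every `href` in a page. The `time_stamp` parameter name is hypothetical, and the regular expression only handles double-quoted `href` attributes; a real implementation would likely use an HTML parser and skip non-HTTP links.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

TIME_STAMP_PARAM = "time_stamp"  # hypothetical parameter name

# Simplified pattern: matches href="..." with double quotes only.
HREF_RE = re.compile(r'(href\s*=\s*")([^"]*)(")', re.IGNORECASE)


def add_time_stamp(url, stamp):
    """Return url with its time stamp search argument set to stamp."""
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != TIME_STAMP_PARAM  # drop any older stamp first
    ]
    query.append((TIME_STAMP_PARAM, stamp))
    return urlunsplit(parts._replace(query=urlencode(query)))


def stamp_hyperlinks(html, stamp):
    """Rewrite every embedded hyperlink to carry the same time stamp."""
    return HREF_RE.sub(
        lambda m: m.group(1) + add_time_stamp(m.group(2), stamp) + m.group(3),
        html,
    )
```

The same helper would serve a browser that converts links on click, since both sides only need to add the stamp in the agreed format.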
- the present invention provides a version of the document stored time-wise in the vicinity of the requested target time. For example, the present invention may assume the Web resource has not changed from the previous archived version, and the version of the Web resource with the most recent time-stamp preceding the requested time is provided. Alternatively, the version of the Web resource with the next immediate time-stamp after the requested time is provided.
- each persistent Web server such as the servers 140, 150, includes a persistent archive 145, 155, respectively, for storing all of the versions of Web resources that will be persistently available to Web users.
- the persistent archives 145, 155 may be embodied as any storage device, although a persistent (non-erasable) storage device such as CD-ROM, CD-R, WORM or DVD-ROM may be preferred.
- For the persistent Web servers 140, 150 to support dated URLs, they need to store all of their contents in a chronological fashion to enable the retrieval of timely information.
- the persistent archives 145, 155 store the entire web site contents on permanent storage devices according to some sort of chronological directory structure.
- FIG. 2 shows a directory structure 200 that arranges the contents of the Web site chronologically.
- each leaf, such as the leaf 210, in the directory structure 200 corresponds to a dated URL.
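To illustrate how a dated URL might map onto a leaf of such a structure, here is a small Python sketch. The `archive/<year>/<month>/<day>/<path>` layout is an assumption, since FIG. 2 is not reproduced here, and the function name is hypothetical.

```python
from datetime import date
from pathlib import PurePosixPath


def archive_path(root, requested, url_path):
    """Build the archive location for a URL path on a given date.

    Assumed layout: <root>/<YYYY>/<MM>/<DD>/<url path>.
    """
    relative = url_path.lstrip("/")
    return (
        PurePosixPath(root)
        / f"{requested.year:04d}"
        / f"{requested.month:02d}"
        / f"{requested.day:02d}"
        / relative
    )
```

The sketch does not sanitize `url_path`, so a real server would still need to reject components such as `..` before touching the file system.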
- FIG. 3 illustrates an archival process 300 for reducing the redundancy of the persistent archive 145, 155. All the files or subdirectories mentioned in the algorithm are under the archive subdirectory 220 of the illustrative directory structure 200 of FIG. 2. As shown in FIG. 3, the archival process 300 initially performs a test during step 310 for each subdirectory, such as subdirectory A, to determine whether there exists a subdirectory B that is created earlier and has identical contents as subdirectory A. If it is determined during step 310 that there is no subdirectory B created earlier and having identical contents as subdirectory A, then it is not possible to reduce the redundancy on the subdirectory level of the persistent archive 145, 155 and program control proceeds to step 330.
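- If, however, it is determined during step 310 that there exists a subdirectory B that was created earlier and has identical contents as subdirectory A, then subdirectory A becomes an alias during step 320 pointing to subdirectory B.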
- As shown in FIG. 4A, if the current contents of a Web site are identical to the contents of the previous day, an alias is created for today pointing to yesterday's subdirectory.
- As shown in FIG. 4B, if the current month's contents are the same as the contents of the previous month, an alias is created for this month pointing to last month's subdirectory.
- A test is performed during step 330 for each file, such as file A, to determine whether there exists a file B that is created earlier and has identical contents as file A. If it is determined during step 330 that there is no file B created earlier and having identical contents as file A, then it is not possible to reduce the redundancy of the persistent archive 145, 155 on the file level. Thus, program control terminates during step 350.
- If, however, it is determined during step 330 that there exists a file B that is created earlier and has identical contents as file A, then file A becomes an alias during step 340 pointing to file B. Thereafter, program control terminates during step 350.
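The alias decision in steps 310 to 340 can be sketched in Python as below. This is an in-memory illustration, not the patented procedure: it finds identical entries by hashing their contents with SHA-256, which is a substitution for whatever matching search the archival process 300 performs, and the function and path names are hypothetical. The same function could be applied to subdirectories by hashing a digest of their contents.

```python
import hashlib


def find_aliases(entries):
    """Decide which archive entries can become aliases.

    entries: list of (name, content_bytes), sorted oldest first.
    Returns {name: earlier_name} for every entry whose contents are
    identical to those of an entry created earlier.
    """
    first_seen = {}  # content digest -> name of the earliest entry
    aliases = {}
    for name, content in entries:
        digest = hashlib.sha256(content).hexdigest()
        if digest in first_seen:
            # An earlier entry B has identical contents: alias A to B.
            aliases[name] = first_seen[digest]
        else:
            first_seen[digest] = name
    return aliases
```

For example, three daily copies of a page where only the third differs produce a single alias from the second day to the first.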
- the archival process 300 may be impractical, since it needs to search for matching files or directories.
- the run time increases exponentially with the number of entities in the archive.
- Many sub-optimal solutions are possible, as would be apparent to a person of ordinary skill in the art.
- a very simple solution is just checking what you want to archive today against the most recently added archive (like yesterday's contents). Since most of the web sites only differ from their previous archived ones slightly, this approach is quite reasonable. This approach is similar to the well-known incremental backup of a file system.
- If a Web server is not persistent, a dated URL should have only minimal impact. In one embodiment, if a request includes a time stamp that is not recognized by a Web server, the server should deliver the most recent version of the requested Web resource.
- persistent storage of a web resource can be limited to versions that have some difference relative to previously saved versions of the web resource.
- an illustrative archive contains the following five different versions of a web resource: 6/4/1996, 6/12/1996, 3/23/1997, 2/1/1998 and 2/3/1998
- the web server assumes that if the requested date does not equal the date of any archived version, then the web resource on the requested date is identical to the version with the closest earlier date.
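The "closest earlier date" rule can be sketched with a binary search over the sorted version dates. The Python below uses the five illustrative versions listed above, read as month/day/year, which is an assumption about the date notation; the function name is hypothetical.

```python
from bisect import bisect_right
from datetime import date

# The five illustrative versions, interpreted as month/day/year.
ARCHIVED = [
    date(1996, 6, 4),
    date(1996, 6, 12),
    date(1997, 3, 23),
    date(1998, 2, 1),
    date(1998, 2, 3),
]


def select_version(versions, requested=None):
    """Pick the archived version to serve for a requested date.

    versions must be sorted in ascending order. Returns the version
    dated on or most recently before the requested date, the latest
    version when no date is given, or None if the request predates
    the whole archive.
    """
    if requested is None:
        return versions[-1]
    index = bisect_right(versions, requested)
    if index == 0:
        return None
    return versions[index - 1]
```

A request for 2/2/1998 therefore resolves to the 2/1/1998 version. Serving the next later version instead, the alternative mentioned earlier, would use `versions[index]` when `index < len(versions)`.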
- a special symbolic link (or alias on MacOS, short cut on MS Windows) can be used in a directory to represent where to look for files or directories that are not found under the current directory. In this manner, only the changed parts are stored under appropriate directories. All the unchanged data can be referred through a chain of such special links.
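The chain of special links can be modeled in memory as follows. This Python sketch is illustrative only: the `Snapshot` class and its `fallback` attribute are hypothetical stand-ins for a directory and its special link, and a real archive would use file-system links instead.

```python
class Snapshot:
    """One dated directory that stores only what changed."""

    def __init__(self, files, fallback=None):
        self.files = dict(files)  # path -> content, changed files only
        self.fallback = fallback  # the "special link" to an earlier snapshot

    def lookup(self, path):
        """Find path here or by following the chain of links."""
        snapshot = self
        while snapshot is not None:
            if path in snapshot.files:
                return snapshot.files[path]
            snapshot = snapshot.fallback
        raise FileNotFoundError(path)
```

A day on which nothing changed is then just an empty snapshot whose link points at the previous day, which keeps storage proportional to the amount of changed data.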
- the domain name server may be embodied as conventional hardware and software, as modified herein to carry out the functions and operations described below.
- Conventional DNS servers will reject any domain name reference which is not in the DNS database.
- One benefit of dated URLs in accordance with the present invention is that they can be used to refer to historical Web resources. For example, if company A is merged into company B, all the web pages referred to through "www.A.com" may no longer be valid. Users who want to access documents from company A need to change all their references to point to some place in company B's web site.
- FIGS. 5A and 5B provide examples of data stored in a DNS server database before and after the merger of companies A and B, respectively. As shown in FIG. 5B, if a user wants to find www.A.com after the merger, the DNS server has enough information to redirect the user's request to a new IP address associated with company B. The dates listed in the database are the valid periods for the corresponding domain name. Thus, a dated domain name reference like "www.A.com 2/2/1999" is invalid, while "www.A.com 2/2/1992" is valid.
- FIG. 6 illustrates a DNS server process 600 in accordance with the present invention.
- the DNS server process 600 initially receives a domain name request during step 610.
- a test is performed during step 620 to determine if the domain name request is dated. If it is determined during step 620 that the domain name request is not dated, the regular name searching result is returned during step 630.
- the DNS server process 600 searches the DNS database for the domain name with the date constraint during step 640. A further test is performed during step 650 to determine if the dated domain name is found. If it is determined during step 650 that the dated domain name is not found, then the DNS server consults with an archive service company during step 660 for further searching before program control proceeds to step 670.
- If, however, it is determined during step 650 that the dated domain name is found, then the searching result and an indication of any redirect are returned during step 670, before program control terminates.
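The flow of the DNS server process 600 can be sketched in Python as below. The database rows, validity dates and IP addresses are hypothetical (FIGS. 5A and 5B are not reproduced here); they are chosen only so that "www.A.com 2/2/1992" is valid and "www.A.com 2/2/1999" is not. Treating an undated request as a lookup by name alone, and modeling the archive service as a callback, are also assumptions.

```python
from datetime import date

# Hypothetical rows: (domain, valid_from, valid_to, ip, redirected)
DNS_DB = [
    ("www.A.com", date(1990, 1, 1), date(1998, 12, 31), "192.0.2.20", True),
    ("www.B.com", date(1990, 1, 1), date.max, "192.0.2.20", False),
]


def resolve(db, name, requested=None, archive_service=None):
    """Return (ip, redirected) for a domain name request, or None."""
    # Steps 620 and 630: an undated request gets the regular result.
    if requested is None:
        for domain, _start, _end, ip, redirected in db:
            if domain == name:
                return ip, redirected
        return None
    # Steps 640 and 650: search with the date constraint.
    for domain, start, end, ip, redirected in db:
        if domain == name and start <= requested <= end:
            return ip, redirected
    # Step 660: not found locally, so consult an archive service.
    if archive_service is not None:
        return archive_service(name, requested)
    return None
```

Here a dated request for `www.A.com` in 1992 returns company B's address together with the redirect indication, while the same name dated 1999 falls through to the archive service or fails.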
- the Web server 140, 150 of Company B will know how to map this old address of company A's to the appropriate place and get the correct information.
- the Web is now full of dynamic content, including real time video, for example, from a WebCam, and audio streams, for example, from a WebCast event, as well as Java, Javascript or Active-X enabled web pages.
- the server 140, 150 only needs to retrieve or recalculate the data up to March 2, 1998 and return the results. Since all the transactions in such application environments have time stamps anyway, it is straightforward to add this function to the service.
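Because each transaction already carries a time stamp, recalculating a result "as of" a date amounts to ignoring later transactions. The following Python sketch assumes a hypothetical list of `(date, amount)` pairs; the actual application data is not described in this excerpt.

```python
from datetime import date


def total_as_of(transactions, cutoff):
    """Sum the amounts of all transactions dated on or before cutoff.

    transactions: iterable of (date, amount) pairs in any order.
    """
    return sum(amount for stamp, amount in transactions if stamp <= cutoff)
```

With transactions on February 27, March 2 and March 5, 1998, a request dated March 2, 1998 would include only the first two.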
- the only restriction in appending a time stamp is the storage requirement. If a lot of storage space is available compared to the amount of information to be archived, the Web site administrator can choose to archive the real time contents or to archive some of them such as one day, one week or one year's worth of data.
- the Web site administrator must decide whether it is reasonable to 'reshow' the old advertisement (for some special reason) or whether the old advertisement can be replaced with a new, up-to-date commercial which is not relevant to the 'real' archived web contents.
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer And Data Communications (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Information Transfer Between Computers (AREA)
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US20175198A | 1998-12-01 | 1998-12-01 | |
| US201751 | 2002-07-23 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| EP1006463A2 (de) | 2000-06-07 |
Family
ID=22747134
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP99309316A Withdrawn EP1006463A2 (de) | 1998-12-01 | 1999-11-23 | Method and apparatus for persistent access to Web resources |
Country Status (2)
| Country | Link |
|---|---|
| EP (1) | EP1006463A2 (de) |
| JP (1) | JP2000194644A (de) |
Cited By (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7424471B2 (en) | 2007-01-08 | 2008-09-09 | Lsr Technologies | System for searching network accessible data sets |
| US8161064B2 (en) | 2007-01-08 | 2012-04-17 | Lsr Technologies | System for searching network accessible data sets |
| US20130275379A1 (en) * | 2012-04-11 | 2013-10-17 | 4Clicks Solutions, LLC | Storing application data |
| CN110233882A (zh) * | 2019-05-23 | 2019-09-13 | 广州视源电子科技股份有限公司 | Access control method, apparatus, system, storage medium and device for page resources |
| EP3716636A1 (de) * | 2019-03-29 | 2020-09-30 | Spotify AB | Systems and methods for delivering relevant media content by inferring past media content consumption |
Family Cites Families (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2001503888A (ja) * | 1996-05-06 | 2001-03-21 | アドビ システムズ インコーポレイテッド | Document Internet URL management |
| JPH10222415A (ja) * | 1997-02-03 | 1998-08-21 | Nec Corp | Web browsing processing apparatus using update information |
| JPH10283369A (ja) * | 1997-04-10 | 1998-10-23 | Kawasaki Steel Corp | Data retrieval apparatus and method of using the same |
| GB9708175D0 (en) * | 1997-04-23 | 1997-06-11 | Xerox Corp | Feature constraint based document information retrieval and distribution |
-
1999
- 1999-11-23 EP EP99309316A patent/EP1006463A2/de not_active Withdrawn
- 1999-12-01 JP JP11341725A patent/JP2000194644A/ja not_active Abandoned
Cited By (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7424471B2 (en) | 2007-01-08 | 2008-09-09 | Lsr Technologies | System for searching network accessible data sets |
| US8161064B2 (en) | 2007-01-08 | 2012-04-17 | Lsr Technologies | System for searching network accessible data sets |
| US20130275379A1 (en) * | 2012-04-11 | 2013-10-17 | 4Clicks Solutions, LLC | Storing application data |
| US9053117B2 (en) * | 2012-04-11 | 2015-06-09 | 4Clicks Solutions, LLC | Storing application data with a unique ID |
| EP3716636A1 (de) * | 2019-03-29 | 2020-09-30 | Spotify AB | Systems and methods for delivering relevant media content by inferring past media content consumption |
| US11653048B2 (en) | 2019-03-29 | 2023-05-16 | Spotify Ab | Systems and methods for delivering relevant media content by inferring past media content consumption |
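| CN110233882A (zh) * | 2019-05-23 | 2019-09-13 | 广州视源电子科技股份有限公司 | Access control method, apparatus, system, storage medium and device for page resources |
| CN110233882B (zh) * | 2019-05-23 | 2022-01-11 | 广州视源电子科技股份有限公司 | Access control method, apparatus, system, storage medium and device for page resources |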
Also Published As
| Publication number | Publication date |
|---|---|
| JP2000194644A (ja) | 2000-07-14 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US7418655B2 (en) | Method and apparatus for persistent storage of web resources | |
| US8402010B2 (en) | Method and apparatus for resolving domain names of persistent web resources | |
| US7120862B1 (en) | Method and apparatus for persistent access to Web resources using variable time-stamps | |
| EP1160692A2 (de) | Internet-based archiving service providing persistent access to network resources | |
| US7386614B2 (en) | Method allowing persistent links to web-pages | |
| US6981210B2 (en) | Self-maintaining web browser bookmarks | |
| JP2963087B2 (ja) | Access mechanism, storage medium, data processing system, access method, web page processing method, and method of providing an access mechanism | |
| US7426544B2 (en) | Method and apparatus for local IP address translation | |
| EP1418512A2 (de) | Method and apparatus for centralized delivery of multi-domain web content | |
| GB2406399A (en) | Seaching within a computer network by entering a search term and optional URI into a web browser | |
| US6405223B1 (en) | System for personal storage of different web source versions | |
| US20100131588A1 (en) | Error processing methods to provide a user with the desired web page responsive to an error 404 | |
| US7376650B1 (en) | Method and system for redirecting a request using redirection patterns | |
| WO2002061627A2 (en) | Intelligent document linking system | |
| US20030122859A1 (en) | Cross-environment context-sensitive help files | |
| EP1006463A2 (de) | Method and apparatus for persistent access to Web resources | |
| US20090125533A1 (en) | Reference-Based Technique for Maintaining Links | |
| KR20000018242A (ko) | Method for creating a program for shortcut search and management of Internet website content | |
| KR20020085996A (ko) | Method for providing web pages using client cache memory | |
| Warnock III et al. | The Electronic Astrophysical Journal: resource location and archive management | |
| Warnock III et al. | Archive Management | |
| HK1028465B (en) | Method for connection for computer network on internet by real name and computer network system thereof | |
| HK1028465A1 (en) | Method for connection for computer network on internet by real name and computer network system thereof |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
| | AK | Designated contracting states | Kind code of ref document: A2; Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE |
| | AX | Request for extension of the european patent | Free format text: AL;LT;LV;MK;RO;SI |
| | STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: THE APPLICATION HAS BEEN WITHDRAWN |
| | 18W | Application withdrawn | Effective date: 20040618 |