Network Working Group                                       R. Moats
Request for Comments: 2517                                  R. Huber
Category: Informational                                         AT&T
                                                       February 1999
        
Network Working Group                                       R. Moats
Request for Comments: 2517                                  R. Huber
Category: Informational                                         AT&T
                                                       February 1999
        

Building Directories from DNS: Experiences from WWWSeeker

从DNS构建目录:WWWSeeker的经验

Status of this Memo

本备忘录的状况

This memo provides information for the Internet community. It does not specify an Internet standard of any kind. Distribution of this memo is unlimited.

本备忘录为互联网社区提供信息。它没有规定任何类型的互联网标准。本备忘录的分发不受限制。

Copyright Notice

版权公告

Copyright (C) The Internet Society (1999). All Rights Reserved.

版权所有(C)互联网协会(1999年)。版权所有。

Abstract

摘要

There has been much discussion and several documents written about the need for an Internet Directory. Recently, this discussion has focused on ways to discover an organization's domain name without relying on use of DNS as a directory service. This memo discusses lessons that were learned during InterNIC Directory and Database Services' development and operation of WWWSeeker, an application that finds a web site given information about the name and location of an organization. The back end database that drives this application was built from information obtained from domain registries via WHOIS and other protocols. We present this information to help future implementors avoid some of the blind alleys that we have already explored. This work builds on the Netfind system that was created by Mike Schwartz and his team at the University of Colorado at Boulder [1].

关于因特网目录的必要性,已经有很多讨论和一些文件。最近,这一讨论集中于如何在不依赖DNS作为目录服务的情况下发现组织的域名。本备忘录讨论了InterNIC Directory and Database Services开发和运行WWWSeeker过程中的经验教训。WWWSeeker是一个应用程序,可根据组织名称和位置信息查找网站。驱动此应用程序的后端数据库是根据通过WHOIS和其他协议从域注册表获得的信息构建的。我们提供这些信息是为了帮助未来的实现者避免我们已经探索过的一些死胡同。这项工作建立在Mike Schwartz和他的团队在科罗拉多大学波尔得分校(1)创建的Netfind系统上。

1. Introduction
1. 介绍

Over time, there have been several RFCs [2, 3, 4] about approaches for providing Internet Directories. Many of the earlier documents discussed white pages directories that supply mappings from a person's name to their telephone number, email address, etc.

随着时间的推移,关于提供Internet目录的方法,出现了一些RFC[2,3,4]。早期的许多文档讨论了白页目录,这些目录提供了从一个人的姓名到他们的电话号码、电子邮件地址等的映射。

More recently, there has been discussion of directories that map from a company name to a domain name or web site. Many people are using DNS as a directory today to find this type of information about a given company. Typically when DNS is used, users guess the domain name of the company they are looking for and then prepend "www.". This makes it highly desirable for a company to have an easily

最近,有人讨论从公司名称映射到域名或网站的目录。今天,许多人使用DNS作为目录来查找有关给定公司的此类信息。通常,当使用DNS时,用户猜测他们要查找的公司的域名,然后在“www.”前面加上前缀。这使得一家公司非常希望有一个简单的解决方案

guessable name.

可猜的名字。

There are two major problems here. As the number of assigned names increases, it becomes more difficult to get an easily guessable name. Also, the TLD must be guessed as well as the name. While many users just guess ".COM" as the "default" TLD today, there are many two-letter country code top-level domains in current use as well as other gTLDs (.NET, .ORG, and possibly .EDU) with the prospect of additional gTLDs in the future. As the number of TLDs in general use increases, guessing gets more difficult.

这里有两个主要问题。随着指定名称数量的增加,获得一个容易猜测的名称变得更加困难。此外,必须猜测TLD以及名称。虽然现在很多用户只是猜测“.COM”是“默认”TLD,但目前有许多两个字母的国家/地区代码顶级域以及其他GTLD(.NET、.ORG,可能还有.EDU),未来可能会有更多的GTLD。随着通用TLD数量的增加,猜测变得更加困难。

Between July 1996 and our shutdown in March 1998, the InterNIC Directory and Database Services project maintained the Netfind search engine [1] and the associated database that maps organization information to domain names. This database thus acted as the type of Internet directory that associates company names with domain names. We also built WWWSeeker, a system that used the Netfind database to find web sites associated with a given organization. The experienced gained from maintaining and growing this database provides valuable insight into the issues of providing a directory service. We present it here to allow future implementors to avoid some of the blind alleys that we have already explored.

从1996年7月到我们1998年3月关闭,InterNIC目录和数据库服务项目维护了Netfind搜索引擎[1]和将组织信息映射到域名的相关数据库。因此,该数据库充当将公司名称与域名关联的Internet目录类型。我们还构建了WWWSeeker,这是一个使用Netfind数据库查找与给定组织关联的网站的系统。从维护和扩展此数据库中获得的经验为提供目录服务的问题提供了有价值的见解。我们在这里展示它是为了让未来的实现者能够避免我们已经探索过的一些死胡同。

2. Directory Population
2. 目录总体
2.1 What to do?
2.1 怎么办?

There are two issues in populating a directory: finding all the domain names (building the skeleton) and associating those domains with entities (adding the meat). These two issues are discussed below.

填充目录有两个问题:查找所有域名(构建框架)和将这些域与实体关联(添加实体)。下面讨论这两个问题。

2.2 Building the skeleton
2.2 构建骨架

In "building the skeleton", it is popular to suggest using a variant of a "tree walk" to determine the domains that need to be added to the directory. Our experience is that this is neither a reasonable nor an efficient proposal for maintaining such a directory. Except for some infrequent and long-standing DNS surveys [5], DNS "tree walks" tend to be discouraged by the Internet community, especially given that the frequency of DNS changes would require a new tree walk monthly (if not more often). Instead, our experience has shown that data on allocated DNS domains can usually be retrieved in bulk fashion with FTP, HTTP, or Gopher (we have used each of these for particular TLDs). This has the added advantage of both "building the skeleton" and "adding the meat" at the same time. Our favorite method for finding a server that has allocated DNS domain information is to start with the list maintained at

在“构建骨架”中,通常建议使用“树漫游”的变体来确定需要添加到目录中的域。根据我们的经验,这项建议既不合理,也不有效。除了一些不经常和长期的DNS调查[5],互联网社区往往不鼓励DNS“树行走”,特别是考虑到DNS更改的频率要求每月进行一次新的树行走(如果不是更频繁的话)。相反,我们的经验表明,分配的DNS域上的数据通常可以通过FTP、HTTP或Gopher以批量方式检索(我们已将其中的每一种用于特定TLD)。这样做的另一个好处是同时“构建骨架”和“添加肉”。我们最喜欢的查找已分配DNS域信息的服务器的方法是从

http://www.alldomains.com/countryindex.html and go from there. Before this was available, it was necessary to hunt for a registry using trial and error.

http://www.alldomains.com/countryindex.html 从那里开始。在这之前,有必要使用试错法寻找注册表。

When maintaining the database, existing domains may be verified via direct DNS lookups rather than a "tree walk." "Tree walks" should therefore be the choice of last resort for directory population, and bulk retrieval should be used whenever possible.

在维护数据库时,可以通过直接DNS查找而不是“树漫游”来验证现有域。因此,“树漫游”应该是目录填充的最后选择,并且应尽可能使用批量检索。

2.3 Adding the meat
2.3 加肉

A possibility for populating a directory ("adding the meat") is to use an automated system that makes repeated queries using the WHOIS protocol to gather information about the organization that owns a domain. The queries would be made against a WHOIS server located with the above method. At the conclusion of the InterNIC Directory and Database Services project, our backend database contained about 2.9 million records built from data that could be retrieved via WHOIS. The entire database contained 3.25 million records, with the additional records coming from sources other than WHOIS.

填充目录(“添加肉类”)的一种可能性是使用自动化系统,该系统使用WHOIS协议进行重复查询,以收集有关拥有域的组织的信息。查询将针对使用上述方法定位的WHOIS服务器进行。在InterNIC目录和数据库服务项目结束时,我们的后端数据库包含大约290万条记录,这些记录是由可以通过WHOIS检索的数据构建的。整个数据库包含325万条记录,其他记录来自WHOIS以外的来源。

In our experience this information contains many factual and typographical errors and requires further examination and processing to improve its quality. Further, TLD registrars that support WHOIS typically only support WHOIS information for second level domains (i.e. ne.us) as opposed to lower level domains (i.e. windrose.omaha.ne.us). Also, there are TLDs without registrars, TLDs without WHOIS support, and still other TLDs that use other methods (HTTP, FTP, gopher) for providing organizational information. Based on our experience, an implementor of an internet directory needs to support multiple protocols for directory population. An automated WHOIS search tool is necessary, but isn't enough.

根据我们的经验,这些信息包含许多事实和印刷错误,需要进一步检查和处理以提高其质量。此外,支持WHOIS的TLD注册器通常只支持二级域(即ne.us)的WHOIS信息,而不是较低级别域(即windrose.omaha.ne.us)。此外,还有没有注册者的TLD、没有WHOIS支持的TLD,以及使用其他方法(HTTP、FTP、gopher)提供组织信息的其他TLD。根据我们的经验,internet目录的实现者需要支持目录填充的多个协议。一个自动化的WHOIS搜索工具是必要的,但还不够。

3. Directory Updating: Full Rebuilds vs Incremental Updates
3. 目录更新:完全重建与增量更新

Given the size of our database in April 1998 when it was last generated, a complete rebuild of the database that is available from WHOIS lookups would require between 134.2 to 167.8 days just for WHOIS lookups from a Sun SPARCstation 20. This estimate does not include other considerations (for example, inverting the token tree required about 24 hours processing time on a Sun SPARCstation 20) that would increase the amount of time to rebuild the entire database.

鉴于我们的数据库在1998年4月最后一次生成时的大小,仅从Sun SPARCstation 20进行WHOIS查找,就需要134.2到167.8天的时间才能从WHOIS查找中完全重建数据库。此估计不包括其他考虑因素(例如,在Sun SPARCstation 20上反转令牌树需要大约24小时的处理时间),这将增加重建整个数据库的时间。

Whether this is feasible depends on the frequency of database updates provided. Because of the rate of growth of allocated domain names (150K-200K new allocated domains per month in early 1998), we provided monthly updates of the database. To rebuild the database

这是否可行取决于提供的数据库更新频率。由于已分配域名的增长率(1998年初每月新分配150-20万个域名),我们每月更新数据库。重建数据库的步骤

each month (based on the above time estimate) would require between 3 and 5 machines to be dedicated full time (independent of machine architecture). Instead, we checkpointed the allocated domain list and rebuild on an incremental basis during one weekend of the month. This allowed us to complete the update on between 1 and 4 machines (3 Sun SPARCstation 20s and a dual-processor Sparcserver 690) without full dedication over a couple of days. Further, by coupling incremental updates with periodic refresh of existing data (which can be done during another part of the month and doesn't require full dedication of machine hardware), older records would be periodically updated when the underlying information changes. The tradeoff is timeliness and accuracy of data (some data in the database may be old) against hardware and processing costs.

每个月(基于上述时间估计)需要3到5台机器全职(独立于机器架构)。相反,我们检查了分配的域列表,并在一个月的一个周末内增量重建。这使我们能够在1到4台机器(3台Sun SPARCstation 20s和一台双处理器Sparcserver 690)上完成更新,而无需在几天内投入全部精力。此外,通过将增量更新与现有数据的定期刷新相结合(可以在当月的另一部分时间内完成,不需要完全投入机器硬件),当基础信息发生变化时,旧记录将定期更新。权衡是数据的及时性和准确性(数据库中的一些数据可能是旧的)与硬件和处理成本。

4. Directory Presentation: Distributed vs Monolithic
4. 目录表示:分布式与单片

While a distributed directory is a desirable goal, we maintained our database as a monolithic structure. Given past growth, it is not clear at what point migrating to a distributed directory becomes actually necessary to support customer queries. Our last database contained over 3.25 million records in a flat ASCII file. Searching was done via a PERL script of an inverted tree (also produced by a PERL script). While admittedly primitive, this configuration supported over 200,000 database queries per month from our production servers.

虽然分布式目录是一个理想的目标,但我们将数据库作为一个整体结构进行维护。考虑到过去的增长,不清楚在什么情况下迁移到分布式目录才是支持客户查询所必需的。我们的上一个数据库在一个简单的ASCII文件中包含325多万条记录。搜索是通过一个倒树的PERL脚本完成的(也由PERL脚本生成)。尽管承认这种配置很原始,但它每月支持来自生产服务器的200000多个数据库查询。

Increasing the database size only requires more disk space to hold the database and inverted tree. Of course, using database technology would probably improve performance and scalability, but we had not reached the point where this technology was required.

增加数据库大小只需要更多的磁盘空间来容纳数据库和倒排树。当然,使用数据库技术可能会提高性能和可伸缩性,但我们还没有达到需要这种技术的程度。

5. Security Considerations
5. 安全考虑

The underlying data for the type of directory discussed in this document is already generally available through WHOIS, DNS, and other standard interfaces. No new information is made available by using these techniques though many types of search become much easier. To the extent that easier access to this data makes it easier to find specific sites or machines to attack, security may be decreased.

本文档中讨论的目录类型的基础数据通常已通过WHOIS、DNS和其他标准接口提供。虽然许多类型的搜索变得容易得多,但使用这些技术不会提供任何新信息。如果更容易访问这些数据,就更容易找到要攻击的特定站点或机器,那么安全性可能会降低。

The protocols discussed here do not have built-in security features. If one source machine is spoofed while the directory data is being gathered, substantial amounts of incorrect and misleading data could be pulled in to the directory and be spread to a wider audience.

这里讨论的协议没有内置的安全特性。如果在收集目录数据时欺骗了一台源计算机,大量不正确和误导性的数据可能会被拉入目录并传播到更广泛的受众。

In general, building a directory from registry data will not open any new security holes since the data is already available to the public. Existing security and accuracy problems with the data sources are likely to be amplified.

一般来说,从注册表数据构建目录不会打开任何新的安全漏洞,因为这些数据已经对公众可用。现有的数据源安全性和准确性问题可能会被放大。

6. Acknowledgments
6. 致谢

This work described in this document was partially supported by the National Science Foundation under Cooperative Agreement NCR-9218179.

这份文件中所描述的这项工作得到了NCR-9218179合作科学基金会的部分支持。

7. References
7. 工具书类

[1] M. F. Schwartz, C. Pu. "Applying an Information Gathering Architecture to Netfind: A White Pages Tool for a Changing and Growing Internet", University of Colorado Technical Report CU-CS-656-93. December 1993, revised July 1994.

[1] M.F.Schwartz,C.Pu。“将信息收集体系应用到NETSCODE:一个改变和增长互联网的白页工具”,科罗拉多大学技术报告CU-CS-65-93.1993年12月,1994年7月修订。

       URL:ftp://ftp.cs.colorado.edu/pub/cs/techreports/schwartz/Netfind
        
       URL:ftp://ftp.cs.colorado.edu/pub/cs/techreports/schwartz/Netfind
        

[2] Sollins, K., "Plan for Internet Directory Services", RFC 1107, July 1989.

[2] Sollins,K.,“因特网目录服务计划”,RFC1107,1989年7月。

[3] Hardcastle-Kille, S., Huizer, E., Cerf, V., Hobby, R. and S. Kent, "A Strategic Plan for Deploying an Internet X.500 Directory Service", RFC 1430, February 1993.

[3] Hardcastle Kille,S.,Huizer,E.,Cerf,V.,Hobby,R.和S.Kent,“部署Internet X.500目录服务的战略计划”,RFC 1430,1993年2月。

[4] Postel, J. and C. Anderson, "White Pages Meeting Report", RFC 1588, February 1994.

[4] Postel,J.和C.Anderson,“白页会议报告”,RFC 1588,1994年2月。

   [5] M. Lottor, "Network Wizards Internet Domain Survey", available
       from http://www.nw.com/zone/WWW/top.html
        
   [5] M. Lottor, "Network Wizards Internet Domain Survey", available
       from http://www.nw.com/zone/WWW/top.html
        
8. Authors' Addresses
8. 作者地址

Ryan Moats AT&T 15621 Drexel Circle Omaha, NE 68135-2358 USA

瑞安护城河AT&T 15621美国东北部奥马哈德雷克塞尔环路68135-2358

   EMail:  jayhawk@att.com
        
   EMail:  jayhawk@att.com
        

Rick Huber AT&T Room C3-3B30, 200 Laurel Ave. South Middletown, NJ 07748 USA

Rick Huber美国新泽西州南米德尔顿劳雷尔大道200号电话电报公司C3-3B30室,邮编07748

   EMail: rvh@att.com
        
   EMail: rvh@att.com
        
9. Full Copyright Statement
9. 完整版权声明

Copyright (C) The Internet Society (1999). All Rights Reserved.

版权所有(C)互联网协会(1999年)。版权所有。

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

本文件及其译本可复制并提供给他人,对其进行评论或解释或协助其实施的衍生作品可全部或部分编制、复制、出版和分发,不受任何限制,前提是上述版权声明和本段包含在所有此类副本和衍生作品中。但是,不得以任何方式修改本文件本身,例如删除版权通知或对互联网协会或其他互联网组织的引用,除非出于制定互联网标准的需要,在这种情况下,必须遵循互联网标准过程中定义的版权程序,或根据需要将其翻译成英语以外的其他语言。

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

上述授予的有限许可是永久性的,互联网协会或其继承人或受让人不会撤销。

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

本文件和其中包含的信息是按“原样”提供的,互联网协会和互联网工程任务组否认所有明示或暗示的保证,包括但不限于任何保证,即使用本文中的信息不会侵犯任何权利,或对适销性或特定用途适用性的任何默示保证。