The Profitable Search for Search Engines

The future developer of the Internet's intelligent agent of choice can expect fame, fortune and glory. Here is a look at some current offerings.

Editor's Note: This is the first of a two-part article examining Internet search engines -- what they can provide, and the growing business in creating them.

Kent Summers (Electronic Book Technologies Inc.; kjs@ebt.com) wasn't just having a bad hair day when he began a presentation about the Internet recently with the pronouncement, "Surfing sucks." Rather, he was echoing the frustration Internet searchers feel when probing cyberspace looking for that particular piece of data or item of information. Finding specific documents is difficult enough; exploring collections, libraries or servers is nearly impossible with the current crop of Internet search engines.

This challenge of creating finely tuned engines to explore and exploit the information richness of the Internet -- not just the World Wide Web, but discussion groups, news groups, et al. -- confronts not only software developers but also document producers. It's also attracting serious attention from entrepreneurs and big business. Beacons broke the fog a short time ago when Microsoft signed a non-exclusive license to use one of the more popular engines, Lycos.

What's out there now? Even for those familiar with the Internet, the current crop of search engines, at times referred to as spiders, wanderers and robots, bears arcane names: Lycos (the first five letters of the Latin name for the wolf spider family, Lycosidae; URL: http://lycos.cs.cmu.edu/); JumpStation II (URL: http://js.stir.ac.uk/jsbin/jsii); World Wide Web Worm (URL: http://www.cs.colorado.edu/home/mcbryan/WWWW.html); WebCrawler (URL: http://webcrawler.cs.washington.edu/WebCrawler/); NIKOS (URL: http://www.rns.com/cgi-bin/nikos) and DE-CLOD (Distributedly Administered Categorical List of Documents; URL: http://schiller.wustl.edu/DACLOD/daclod).

These engines allow you to search for information in many ways -- some search only the titles or headers of documents, such as WWW home pages; others search the full text of documents; and still others search indexes or directories. Many offer subsystems to manipulate and manage the information you get.
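The title-versus-full-text distinction drawn above can be made concrete with a small sketch. The documents and matching rule below are invented for illustration; they do not reflect any particular engine's actual implementation.

```python
# Toy corpus: each document has a title (header) and a body.
docs = [
    {"title": "Wolf Spider Taxonomy", "body": "notes on the family Lycosidae"},
    {"title": "Crawling the Web", "body": "how a robot visits wolf spider pages"},
]

def title_search(term):
    """Match only document titles, as some early engines did."""
    return [d["title"] for d in docs if term.lower() in d["title"].lower()]

def fulltext_search(term):
    """Match anywhere in the document -- title or body."""
    return [d["title"] for d in docs
            if term.lower() in (d["title"] + " " + d["body"]).lower()]

print(title_search("wolf"))     # matches only the first document
print(fulltext_search("wolf"))  # matches both documents
```

The same query can return quite different result sets depending on which fields the engine indexes, which is why knowing an engine's scope matters as much as its query syntax.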

Lycos, for example, which bills itself as "the catalog of the Internet," trawls the Web, Gopherspace and FTP archives daily and creates a database of all the Web pages it uncovers. The index of the database is updated each week. Its search engine, called Pursuit, presents "probabilistic retrieval from this catalog, taking a user's query and returning a sorted list of hits (the list is sorted by match score, and only documents with scores above the threshold are retrieved)."
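The Pursuit description above -- score every document, keep only those above a threshold, return them best match first -- can be sketched as follows. The scoring function, threshold value and sample catalog here are invented for demonstration; Pursuit's actual ranking algorithm is not described in this article.

```python
from collections import Counter

def score(query_terms, doc_terms):
    """Crude match score: fraction of query terms present in the document."""
    doc_counts = Counter(doc_terms)
    hits = sum(1 for t in query_terms if doc_counts[t] > 0)
    return hits / len(query_terms)

def search(query, catalog, threshold=0.5):
    """Return (score, title) pairs above the threshold, best match first."""
    query_terms = query.lower().split()
    results = []
    for title, text in catalog.items():
        s = score(query_terms, text.lower().split())
        if s > threshold:
            results.append((s, title))
    return sorted(results, reverse=True)

catalog = {
    "Wolf Spiders": "the wolf spider lycosidae hunts without a web",
    "Web Robots": "robots and crawlers visit the web to build indexes",
}
print(search("wolf spider", catalog))  # [(1.0, 'Wolf Spiders')]
```

The threshold is the key design choice: raising it trades recall for precision, which is why ranked retrieval with a cutoff felt like such an improvement over unranked keyword matching.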

This Lycos site, administered by Michael L. Mauldin (fuzzy@cmu.edu), contains references to 3.85 million Web pages out of his estimate of more than 5 million Web documents. This does not include, as he notes, pages inside databases such as the Library of Congress, the Human Genome Database or WAIS indexes.

A newer set of search tools developed by Mike Schwartz is the Harvest Information Discovery and Access System (URL: http://harvest.cs.colorado.edu/). Described as an "integrated set of tools to gather, extract, organize, search, cache and replicate relevant information across the Internet," this system, so developers claim, allows users -- with only modest effort -- to "tailor Harvest to digest information in many different formats, and offer custom search services on the Internet."
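The gather/extract/organize workflow Harvest's developers describe can be sketched as a toy pipeline. The function names and data shapes below are invented for illustration; Harvest's real components (its Gatherers and Brokers) have their own interfaces and formats.

```python
def gather(sources):
    """Stand-in for fetching raw documents from (name, text) sources."""
    return dict(sources)

def extract(raw):
    """Pull a crude summary record (first line as title, word count) per document."""
    records = {}
    for name, text in raw.items():
        lines = text.strip().splitlines()
        records[name] = {"title": lines[0] if lines else "",
                         "words": len(text.split())}
    return records

def organize(records):
    """Build a searchable index keyed by lowercased title words."""
    index = {}
    for name, rec in records.items():
        for word in rec["title"].lower().split():
            index.setdefault(word, []).append(name)
    return index

raw = gather([("doc1", "Spiders of the Web\ncrawlers and robots"),
              ("doc2", "Harvest Overview\ngather extract organize")])
index = organize(extract(raw))
print(index["harvest"])  # ['doc2']
```

Separating the stages this way is the point of the Harvest design: the extraction step can be swapped per document format, which is what lets users "tailor Harvest to digest information in many different formats."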

The home page has hypertext information on such subjects as demonstrations and useful indexes, technical discussion, a user's manual, FAQ, papers, talks, press releases, HPCC blue book pages, getting the software and Harvest team contact information.

If you are interested in trying out the different search engines, you can find most of them at the heavily trafficked Yahoo (http://www.yahoo.com/Reference/Searching_the_Web/). Those who want to track this subject in finer detail should consider following patent announcements available via the Web at Source Translation & Optimization's (STO) Internet Patent Search System (URL:http://sunsite.unc.edu/patents/intropat.html) and through its mailing list (send the word, News, to: patents@world.std.com). The STO list offers a weekly mailing of all patents listed in the most recent issue of the USPTO Patent Gazette as well as other valuable services.

Further, there is the discussion group on technical aspects of WWW robots (E-mail robots-request@nexor.co.uk; type the words, "subscribe", "help", and "stop" on separate lines in your message) as well as news groups, for example, the group, comp.infosystems.www.*, which includes comp.infosystems.www.announce, comp.infosystems.www.misc, comp.infosystems.www.providers, comp.infosystems.www.servers.* and comp.infosystems.www.users.

Part 2 will address what's ahead in search engine development.

