University at Albany Libraries

The Finer Points of Web Search Engines


Updated: 22 September 1998

This tutorial covers some of the newer or lesser-known options available on several Web search engines. These options can help the searcher gain control over search results and bring in more relevant hits. A handful of newer search engines are covered, along with several of the better-known ones.

For a tutorial covering the more basic aspects of Web search engines, see Searching the Internet: Recommended Sites and Search Techniques.

Topics covered in this tutorial

  1. Review of search syntax
  2. Refining search results
  3. Relevancy ranking
  4. Storing queries for regular processing

Search engines covered in this tutorial

 
[AltaVista] [Excite] [Inference Find] [Infoseek] [Lycos] [MetaFind] [Northern Light] [ProFusion]

 

1. Review of search syntax

The most common form of search syntax employed on Web search engines is keyword searching with implied Boolean syntax. This tutorial will concentrate primarily on this type of search. For a fuller discussion of Boolean logic, see Boolean Searching on the Internet.

Keyword searching refers to a search type in which you enter terms representing the concepts you wish to retrieve. Boolean operators are not used.

Implied Boolean logic refers to a search in which symbols are used to represent Boolean logical operators. In this type of search on the Internet, the absence of a symbol is also significant, as the space between keywords defaults to either OR logic or AND logic. Many well-known search engines default to OR. For example:

Search engines defaulting to OR: AltaVista (main screen); Excite; Infoseek; MetaCrawler

Search engines defaulting to AND: HotBot; Lycos; Northern Light

[Return to Index]

Exercise: Review of implied Boolean syntax using Infoseek

 

Infoseek - http://www.infoseek.com/

Infoseek Quick Facts

Online help: http://www.infoseek.com/Help?pg=HomeHelp.html 

Infoseek is one of the most accurate search engines on the Internet. It does a good job of processing all the aspects of a search query and returning results in a useful relevancy ranked order. 

Special Features: 

  • Returns accurate results in very fast search processing time 
  • Clusters all the retrieved URLs from one site into one result; users have the option to view the other pages from the site 
  • Truncates automatically so that the user does not have to remember to use a truncation symbol 
  • Offers straightforward search syntax on a simple or Advanced Search (template) interface 
  • Field searching is useful and accurate 
  • Offers concept recognition for names, noun phrases, numbers, and word form variants 
  • Advanced Search interface offers a user-friendly template of search options 

Drawbacks:

  • At 60 million files, not as large as other services
  • Query window on the main screen is rather small

Query #1: I'm looking for information about peace in Bosnia.

Search:

a)   peace     Bosnia      [implied OR - this search is incorrect]
b)   +peace    +Bosnia     [implied AND - this search is correct]
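
The difference between the two searches can be illustrated with a small sketch in Python. It does not show how Infoseek is implemented; the sample pages, the `search` function, and its `default` parameter are hypothetical and serve only to show how a leading + turns an optional term into a required one.

```python
# Toy illustration of implied Boolean logic; the pages below are made up.
PAGES = {
    "page1": "peace talks resume in bosnia",
    "page2": "world peace day celebrated",
    "page3": "travel guide to bosnia",
}

def search(query, pages=PAGES, default="OR"):
    """Return the sorted names of pages that match the query.

    A term prefixed with + is required. Unprefixed terms are combined
    with the engine's default logic (OR or AND).
    """
    terms = query.lower().split()
    required = [t[1:] for t in terms if t.startswith("+")]
    optional = [t for t in terms if not t.startswith("+")]
    hits = []
    for name, text in pages.items():
        words = set(text.split())
        if not all(t in words for t in required):
            continue
        if optional:
            test = any if default == "OR" else all
            if not test(t in words for t in optional):
                continue
        hits.append(name)
    return sorted(hits)

print(search("peace Bosnia"))    # implied OR: any page with either term
print(search("+peace +Bosnia"))  # implied AND: only pages with both terms
```

With the default set to OR, the first query also returns pages that mention only one of the two terms, which is why search a) is marked incorrect for this question.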
[Return to Index]


2. Refining search results

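Several search engines offer the option of refining existing search results. This is useful if your first search retrieves more results than you can review, or results that are broader than your topic.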

Exercise: Narrow existing results on Infoseek

Infoseek - http://www.infoseek.com/

Query #2: I'm interested in the Dayton peace agreement for Bosnia.

Search:
Starting from the results of the search above:

+peace     +Bosnia

  1. Go to the top of any results page
  2. Click on Search only within these xxxxxx pages
  3. Type: title:Dayton

This option adds terms to your existing results using AND logic, so you do not have to retype your original search in order to add to it.
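
In effect, refining is a filter applied with AND logic to a result set you already have. The following Python sketch shows that idea under the assumption that each result is a record with a title and a body. The records and the `refine` helper are hypothetical and do not represent Infoseek's actual implementation.

```python
# Hypothetical pages in the index.
results = [
    {"title": "Dayton Peace Accords", "body": "peace agreement for bosnia"},
    {"title": "Bosnia news", "body": "peace process stalls in bosnia"},
    {"title": "Dayton, Ohio visitor guide", "body": "hotels and museums"},
]

def refine(existing, field, term):
    """Keep only the existing results whose given field contains the term."""
    term = term.lower()
    return [r for r in existing if term in r[field].lower().split()]

# First search: +peace +Bosnia (both words must appear in the body).
first_pass = [r for r in results
              if {"peace", "bosnia"} <= set(r["body"].lower().split())]

# Refinement: title:Dayton, applied only to the first-pass results.
narrowed = refine(first_pass, "title", "Dayton")
print([r["title"] for r in narrowed])
```

Because the refinement runs only over the first-pass results, a page with Dayton in its title that never matched the original search cannot appear in the narrowed list.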

[Return to Index]

Exercise: Refine feature on AltaVista

 

AltaVista - http://www.altavista.digital.com

AltaVista Quick Facts

Online help
Simple search: http://www.altavista.digital.com/av/content/help.htm

Advanced search: http://www.altavista.digital.com/av/content/help_advanced.htm 
AltaVista is one of the most respected and popular general search engines on the Web. It features numerous searchable fields, the ability to process complex searches, and a powerful method of refining search results. AltaVista continues to increase the size of its database, and now indexes 140 million files, the largest database of any search engine. 

Special features: 

  • Very large database of 140 million Web pages 
  • Offers an impressive selection of searchable fields, including language and retrieval by last-modified date 
  • Refine feature presents users with a selection of related search terms to add to or exclude from a subsequent search 
  • Offers unusual options such as searching by language and translating search results 

Drawbacks:

  • Relevancy ranking can be questionable
  • In Simple Search, switches default logic from OR to AND if two or more fields are searched, e.g., title:mars   url:nasa
  • Processes searches more slowly than Infoseek

Query #3: I'm looking for information about Watergate.

Search:

  1. Type:      Watergate
  2. Click on Refine. Examine the term cluster lists. Some clusters are relevant to the query, some are not. Notice that you can only choose to require or exclude an entire list of terms.
  3. Click on GRAPH. A Java version of this list will slowly appear. Here, you can choose to require (green check mark) or exclude (red X) individual terms.
  4. Choose individual terms to require and/or exclude. Click on Search. Examine your results.
  5. You may also choose to Refine Again if you wish to further narrow your results based on your selections before you search again.
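
Requiring and excluding terms amounts to two tests applied to each result. The Python sketch below is an illustration only: the documents, their word sets, and the `refine` helper are hypothetical and are not taken from AltaVista.

```python
# Hypothetical results for the query Watergate, each reduced to a set of words.
results = {
    "doc1": {"watergate", "nixon", "scandal"},
    "doc2": {"watergate", "hotel", "washington"},
    "doc3": {"watergate", "nixon", "hotel"},
}

def refine(results, require=(), exclude=()):
    """Keep documents containing every required term and no excluded term."""
    require, exclude = set(require), set(exclude)
    return sorted(name for name, words in results.items()
                  if require <= words and not (exclude & words))

kept = refine(results, require=["nixon"], exclude=["hotel"])
print(kept)
```

Each additional required or excluded term can only shrink the list, which is why repeated refinement narrows the results.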
[Return to Index]

Exercise: Search for more documents feature on Excite

 

Excite - http://www.excite.com/

Excite Quick Facts

Online help: http://www.excite.com/Info/searching.html?a-tip-t 

Excite is known for the relative currency of its database and its application of concept searching to a user's query. 

Special Features: 

  • Offers multiple syntax options as well as form-based searching in its Power Search interface 
  • Engine's concept searching looks for terms and concepts related to a user's search terms 
  • Retrieves Web pages related to any file in the list of search results when choosing to Search for more documents like this one 
  • With list of hits, offers the option to add displayed related terms to the original search 

Drawbacks:

  • No field searching is available
  • Excite's concept searching may pull in irrelevant hits, though exact matches are shown first in the list of hits
  • At 50 million files, not as large as other services

Query #4: I'm looking for information about nuclear waste.

Search:

  1. Type:      "nuclear waste"
  2. Retrieve the first screen of results.
  3. Note that a small number of search terms appear at the top of this screen. Choosing one or more of these will add them to your search with the Boolean OR. This will increase your number of results.
  4. Choose an item in your list of hits that is of interest to you. Click on Search for more documents like this one to retrieve related results.
  5. You may choose Search for more documents... on subsequent screens if this is helpful.
[Return to Index]


3. Relevancy ranking

A few search engines give the user a certain amount of control over the relevancy ranking of search results. Other engines offer an alternative to relevancy ranking by organizing results into clusters or folders arranged by concept, type of site, document type, etc.

A. Control over relevancy ranking

Exercise: Control of factors in the relevancy ranking of results with Lycos Pro

 

Lycos - http://www.lycos.com/

Lycos Quick Facts

Online help
Basic help: http://www.lycos.com/help/search-help.html
Advanced help: 
http://www.lycos.com/help/lycospro-help.html

http://www.lycos.com/press/pro/query_parse.html 
Lycos offers a search engine service with many options for customizing searches and ranking the results. 

Special Features: 

  • One of the smaller search engine databases with 30 million Web pages indexed 
  • Has one of the more current search engine databases on the Web 
  • Offers more proximity operators than any other search engine on the Web; these allow the user to specify the adjacency and order of terms in source documents 
  • Allows the user to control the factors in the relevancy ranking of results 
  • Truncation is automatic unless the searcher specifies otherwise 
  • Good relevancy ranking 

Drawbacks:

  • Not a full-text database; only certain portions of Web pages are indexed
  • Accuracy is weak relative to other major search engines
  • Phrase searching can be tricky if stop words are included

Query #6: What has Clinton been doing about Bosnia?

Search:

  1. Click on Advanced Search
  2. Click on Power Panel
  3. Notice the choices for scoring various relevancy ranking factors as Low, Medium, or High
  4. Click on Java Version
  5. Type:      Clinton     Bosnia
  6. Apply ratings as you see fit
  7. Conduct the search and evaluate the results.
The Power Panel does not affect the number of results, only the order in which they appear.
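
The effect can be sketched in Python as a weighted sum. The factor names, the numeric values assigned to Low, Medium, and High, and the sample scores are all assumptions made for illustration; they are not taken from Lycos.

```python
# Assumed numeric weights for the three settings.
LEVELS = {"Low": 1, "Medium": 2, "High": 3}

# Hypothetical per-page scores (0 to 1) for two ranking factors.
pages = {
    "page_a": {"match_all_words": 0.9, "appear_in_title": 0.2},
    "page_b": {"match_all_words": 0.4, "appear_in_title": 1.0},
}

def rank(pages, settings):
    """Order page names by the weighted sum of their factor scores."""
    def score(name):
        return sum(LEVELS[level] * pages[name][factor]
                   for factor, level in settings.items())
    return sorted(pages, key=score, reverse=True)

favor_words = rank(pages, {"match_all_words": "High", "appear_in_title": "Low"})
favor_title = rank(pages, {"match_all_words": "Low", "appear_in_title": "High"})
print(favor_words)  # the same two pages...
print(favor_title)  # ...in a different order
```

Both calls return the same set of pages; changing the settings changes only which page comes first.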

[Return to Index]

B. Alternatives to relevancy ranking

Exercise: Grouping of results into concept folders with Northern Light

 

Northern Light - http://www.northernlight.com/

Online help: 
Main Screen: http://www.northernlight.com/docs/prod_help.htm#simplesearch 
Power Search: http://www.northernlight.com/docs/power_help.htm#powersearch 

Northern Light organizes results into Custom Search Folders that represent concepts and/or types of sites. Results within these folders are relevancy ranked. With this system, you can ignore the folders that are irrelevant and choose those that fit your query best. This may be more convenient than working through one master list of results. 

Special Features: 

  • Sorts search results into folders by subjects, types (e.g., press releases, maps), source sites, or languages 
  • Within folder levels, a new group of folders is presented 
  • Relevancy ranked results are available on the same screens as the folders 
  • Offers a Special Collection database of relevant articles from 3000+ sources for a fee 

Drawbacks:

  • Database is relatively small (less than 50 million pages) but is growing
  • Folders may not be consistently useful for all queries; however, the user can simply skip over irrelevant folders

Query #7: What are the prospects for peace in Bosnia?

Search:

  1. Type:     +peace      +Bosnia
  2. Choose the folder Bosnia-Herzegovina
  3. Choose the folder Government Sites
  4. Next, choose the Custom Search Folders that cover the aspects of this query that interest you. Examine the results. Note that Special Collection documents are available only for a fee, usually $1.
[Return to Index]

Exercise: Organization of results into concepts and/or types of sites with Inference Find

 

Inference Find - http://www.inference.com/infind/

Online help: http://www.inference.com/infind/boolean.html 

Inference Find is a multi-threaded search engine that searches six search engines simultaneously. It merges the results, removes duplicate files, and organizes the results into sections. 
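
Merging and removing duplicates can be sketched as follows in Python. The engine names, the URLs, and the rule for treating two URLs as duplicates (ignoring letter case in the host name and a trailing slash) are assumptions for illustration, not a description of Inference Find's internals.

```python
from urllib.parse import urlsplit

# Hypothetical result lists returned by two target engines.
engine_results = {
    "engine_one": ["http://www.example.org/bosnia/",
                   "http://www.example.edu/peace.html"],
    "engine_two": ["http://WWW.EXAMPLE.ORG/bosnia",
                   "http://www.example.com/news.html"],
}

def normalize(url):
    """Reduce a URL to a key used to spot duplicates."""
    parts = urlsplit(url)
    return (parts.netloc.lower(), parts.path.rstrip("/"))

def merge(result_lists):
    """Combine several result lists, keeping the first copy of each page."""
    seen = set()
    merged = []
    for urls in result_lists.values():
        for url in urls:
            key = normalize(url)
            if key not in seen:
                seen.add(key)
                merged.append(url)
    return merged

merged = merge(engine_results)
print(len(merged))
```

The two spellings of the example.org address count as one page, so four raw hits become three merged results.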

Special Features: 

  • Retrieves the maximum number of results each search engine will allow by searching target engines in parallel. For example, Infoseek is searched three times in parallel. 
  • Groups results into sections by concepts and/or by Web site, e.g., educational institution, non-profit site, European site, federal government, etc. 
  • Returns results quickly 

Drawbacks:

  • Gives no syntax directions. Suggests the use of Boolean operators but cautions about inconsistent results
  • List of results contains only titles of Web pages, so the relevancy of the source document is not always easy to determine without visiting the page

Query #8: What are the prospects for peace in Bosnia?

Search:

  1. Type:      Bosnia and peace
  2. View results
[Return to Index]

Exercise: Organization of results by user choice with MetaFind

 

MetaFind - http://www.metafind.com/

Online help: http://www.metafind.com/syntax.html 

MetaFind is a multi-threaded search engine that searches six major search engines for a limited number of results and offers unusual options for sorting search results. 

Special Features: 

  • Results may be sorted by keyword, alphabetically, or by domain 
  • Duplicate records are removed from search results 
  • Boolean search options may be used, including the proximity limiter NEAR 
  • Each file in the results list includes an indication of the ranking of this file as retrieved from the source search engine 

Drawbacks:

  • Retrieves a limited number of hits from six search engines, as follows: 10 links from AltaVista twice, 10 from Excite twice, 50 from HotBot, 25 from Infoseek, 30 from Planetsearch and 50 from Webcrawler
  • When sorting by keyword, the "Others" category can be rather large

Query #9: What are the prospects for peace in Bosnia?

Search:

  1. Type:     Bosnia      peace      [MetaFind defaults to Boolean AND logic]
  2. Choose: Sort by Domain
This search can be quite useful if you wish to obtain results from certain types of sites. For example, you may wish to view sites on a particular topic from only the edu domain in order to see more scholarly materials. Unfortunately, most search engines cannot handle such a search because there are too many pages within this domain for them to process. MetaFind offers an interesting solution to this problem by first gathering pages and then sorting them by domain.
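
The gather-then-sort approach can be sketched in Python. The URLs are invented, and the `top_level_domain` helper is a hypothetical simplification that looks only at the last label of the host name.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical merged results gathered from several engines.
urls = [
    "http://www.example.edu/bosnia-study.html",
    "http://www.example.com/news/bosnia.html",
    "http://www.example.gov/dayton.html",
    "http://library.example.edu/peace.html",
]

def top_level_domain(url):
    """Return the last label of the host name, e.g. 'edu'."""
    host = urlsplit(url).hostname or ""
    return host.rsplit(".", 1)[-1]

def group_by_domain(urls):
    """Collect URLs under their top-level domain."""
    groups = defaultdict(list)
    for url in urls:
        groups[top_level_domain(url)].append(url)
    return dict(groups)

groups = group_by_domain(urls)
for domain in sorted(groups):
    print(domain, len(groups[domain]))
```

Because the sorting happens after a small set of pages has been gathered, the cost depends on the number of retrieved hits rather than on the size of the whole edu domain.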

[Return to Index]


4. Storing queries for regular processing

[NOTE: This feature is temporarily unavailable]

Exercise: Storing queries for regularly updated results with ProFusion

 

ProFusion - http://profusion.ittc.ukans.edu/

Online help: http://profusion.ittc.ukans.edu/help.html 

ProFusion is a multi-threaded search engine that searches 9 search engines and subject directories simultaneously and returns collated results. Queries may be stored at the site using ProFusion Personal Assistant. These queries can be run weekly, bi-weekly or monthly; ProFusion will e-mail a notification of new results. 

Special Features: 

  • Searches nine search engines, or the three best that support a particular query 
  • Queries may be registered at the site using ProFusion Personal Assistant; ProFusion will run them weekly, biweekly, or monthly, and inform the user via e-mail when new sites are found 
  • Users may enter keyword, Boolean, or phrase queries 
  • Results are presented in relevancy ranked order 
  • Offers an option to detect broken links and mark them in the list of results 
 Exercise: ProFusion

To store a query, click on the word here where it states, "Click here to view your personalized search results or to register to rerun queries." You will be asked to run your query once, and then rate each item in the results list as either relevant or irrelevant. ProFusion will use this ranking to conduct subsequent searches. [This service is temporarily unavailable]
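
The idea behind a stored query can be sketched as a comparison between the previous run and the current one. This is an assumption about the general approach, not ProFusion's code; the URLs and the `new_results` helper are hypothetical.

```python
def new_results(previous, current):
    """Return pages in the current run that were absent from the previous
    run, preserving the order of the current run."""
    seen = set(previous)
    return [url for url in current if url not in seen]

# Hypothetical results of the same stored query on two successive runs.
last_week = ["http://www.example.org/a.html", "http://www.example.org/b.html"]
this_week = ["http://www.example.org/b.html", "http://www.example.org/c.html"]

to_report = new_results(last_week, this_week)
print(to_report)
```

Only the pages in `to_report` would need to be mentioned in a notification, since the others were already seen on the earlier run.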

[Return to Index]

Laura Cohen
lcohen@cnsvax.albany.edu