University at Albany Libraries

Searching the Internet:
Recommended Sites and Search Techniques


Updated: 6 October 1998

This document will help you to explore a variety of subject directories and search engines in order to gain skills in conducting independent research on the Internet. For more information about Internet research, see Conducting Research on the Internet.

For a more comprehensive list of subject directories, see: Internet Subject Directories
For a more comprehensive list of search engines, see: Search the Internet

Sites reviewed in this document:

 

[AltaVista] [Argus Clearinghouse] [Excite] 

[HotBot] [Infoseek] [Lycos] [Magellan] 

[MetaCrawler] [Northern Light] [Yahoo!] 

 

Remember...


Subject Directories

Definition: A subject directory is a database of Internet files submitted by site creators or evaluators and organized into subject categories. Most directories offer a search engine to query the database. The service may or may not use selection criteria when choosing files to include in the database.

General Tips

  1. Subject directories differ significantly in selectivity. Consider the policies of any directory that you visit.
  2. This can be difficult, because not all directory services are willing to disclose either their policies or the names and qualifications of their site reviewers.
  3. Some subject directories consist of links accompanied by annotations to describe or evaluate site content. A well-written annotation from a known reviewer is more useful than just a list of links.
  4. Most subject directories offer a search engine to query the database. For more information, see General Search Strategies below.

Three subject directories useful for initial exploration are Yahoo!, Magellan, and The Argus Clearinghouse. These three services illustrate vastly different policies for selection criteria.

Yahoo! - http://www.yahoo.com/

When to use Yahoo? When you want to visit a site with broad but unevaluated subject coverage to get an idea about what is available on the Internet on your topic.

Strengths:

Weaknesses:

Search Engine Syntax

Main screen

 
Boolean Logic 
  • Keyword search defaulting to Boolean AND 
  • Implied Boolean logic: + for Boolean AND, - for Boolean NOT 
Case Sensitivity 
  • None 
Fields 
  • Title, e.g., t:automobiles 
  • URL, e.g., u:ibm 
Phrases 
  • Phrases within double quotations, e.g., "budget battle" 
Truncation 
  • Mandatory truncation 
  • Truncation character: * 
 

Advanced search

Click on options
User fill-in template with supplied terminology
 
Boolean Logic 
  • Matches on all words (AND) 
  • Matches on any word (OR) 
  • A person's name 
Case Sensitivity 
  • None 
Fields 
  • Date 
Phrases 
  • An exact phrase match 
Other 
  • Intelligent default (explanation not provided) 
 [Return to Index]


Magellan - http://www.mckinley.com/

When to use Magellan? When you want to find generally good quality sites without having to wade through sites of lesser quality. The database of Reviewed Sites is currently not being kept up to date.

Strengths:

Weaknesses:

Search Engine Syntax

Main screen only

Uses the Excite search engine
 
Boolean Logic 
  • Keyword search defaulting to Boolean OR 
  • Implied Boolean logic: + for Boolean AND, - for Boolean NOT 
  • Full Boolean logic: Boolean AND, OR, AND NOT with parentheses, e.g., behavior AND (cats OR felines). Boolean operators must be in CAPS. 
Case sensitivity 
  • None 
Fields 
  • None 
Phrases 
  • Phrases within double quotations, e.g., "budget battle" 
Truncation 
  • None 
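The full Boolean syntax listed above (operators in CAPS, grouping with parentheses) can be illustrated with a small evaluator. This is a minimal sketch under stated assumptions, not the Excite engine's actual logic: the function name `boolean_match` is made up for the example, terms are matched as whole words, quoted phrases are not handled, and a malformed query simply raises an error.

```python
import re

def boolean_match(query, text):
    """Evaluate a query such as: behavior AND (cats OR felines)."""
    words = set(re.findall(r"\w+", text.lower()))
    tokens = re.findall(r"\(|\)|[^\s()]+", query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_or():
        # OR binds most loosely.
        value = parse_and()
        while peek() == "OR":
            take()
            right = parse_and()
            value = value or right
        return value

    def parse_and():
        # Handles both AND and AND NOT.
        value = parse_atom()
        while peek() == "AND":
            take()
            negate = peek() == "NOT"
            if negate:
                take()
            right = parse_atom()
            value = value and ((not right) if negate else right)
        return value

    def parse_atom():
        tok = take()
        if tok == "(":
            value = parse_or()
            take()  # consume the closing parenthesis
            return value
        return tok.lower() in words

    return parse_or()

print(boolean_match("behavior AND (cats OR felines)",
                    "Why cats purr: notes on feline behavior"))
```

Because the operators are recognized only in capitals, a lowercase "and" would be treated as an ordinary search word, which mirrors the CAPS requirement noted above.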
 [Return to Index]


The Argus Clearinghouse - http://www.clearinghouse.net/

When to use the Clearinghouse? When you want to find a collection of Internet resources on specific topics recommended by specialists.

Strengths:

Weaknesses:

Search Engine Syntax

To search (information pages of the guides only), click on Search/Browse
 
Boolean Logic 
  • Keyword search defaulting to Boolean AND unless one of the terms is truncated, in which case the default is OR 
  • Full Boolean logic: Boolean OR, AND with parentheses, e.g., behavior and (cats or felines) 
  • NOT operator is not supported 
Case sensitivity 
  • None 
Fields 
  • None 
Phrases 
  • None 
Truncation 
  • Mandatory truncation 
  • Truncation character: * 
 [Return to Index]


Other recommended subject directories with evaluated content:
BUBL Link
http://bubl.ac.uk/link/
INFOMINE: Scholarly Internet Resources
http://lib-www.ucr.edu/
Librarians' Index to the Internet
http://sunsite.berkeley.edu/InternetIndex/
The WWW Virtual Library
http://vlib.stanford.edu/Overview.html

Search Engines

Definition: A search engine is a searchable database of Internet files collected by a computer program (called a wanderer, crawler, robot, worm, or spider). Indexing is created from the collected files, e.g., title, full text, size, URL, etc. There are no selection criteria for the collection of files.

A search engine might well be called a search engine service or a search service. As such, it consists of three components:

General Tips

  1. Search engines do not index all the documents available on the Web. For example, most search engines cannot index files on password-protected sites. Documents behind a firewall will not be accessible to a spider. Other files can be excluded from search engines by Web server software at the host site. Still other Web pages may not be picked up if no other pages link to them, because a search engine spider finds pages by crawling from one link to the next. Search engines rarely contain the most recent documents posted to the Internet; do not look for yesterday's news on a search engine.
  2. Search engine features are improving with time. For example, several search services offer searches on fields, programming languages, domain locations, dates, and so on. These include AltaVista, HotBot, and Infoseek. For a summary of the features available at many of the major search engines, see How to Choose a Search Engine or Research Database.
  3. Most major search engine indexes consist of the full text of source files. These include AltaVista, Excite, HotBot, and Infoseek. Lycos, however, indexes only certain portions of Web pages in its database. When you search a full text index, you will retrieve a file even if your search terms appear only once in the text and do not represent the primary topic of the document. Limiting your search to fields or using proximity operators (explained below) can be a useful way to boost the relevancy of your results.
  4. Some search engines have an interface for basic searches as well as a separate interface for advanced queries. Be sure to understand the difference in syntax requirements and to use the interface that is appropriate for your query.
  5. Search engines don't always pay attention to everything included in your search statements. For example, if you search for three terms using the Boolean AND, you may retrieve files with only two, one, or even none of these terms present. If you are unsure of the relevance of a document, use the "find" feature of your Web browser to search for your terms.
  6. Because only partially relevant files are often returned as a result of your search, good relevancy ranking is important. Most search engines use various criteria to construct a relevancy rating of each hit and will present your search results in this order. Criteria can include: search terms in the title, URL, first heading, HTML META tag; number of times search terms appear in the document; search terms appearing early in the document; search terms appearing close together; page popularity, etc.
  7. Beware of relevancy rating nonetheless. Your most relevant hits may appear beyond the first few screens. Your point of view is far more complex than the search engine's relevancy algorithm. In addition, the source documents on the Internet are extremely varied and only a small subset can be expected to meet your needs.
  8. One of the most interesting developments in search engine technology is the organization of search results by concept and/or site rather than by relevancy. To see if this may be useful to you, visit Inference Find, MetaFind, and Northern Light. Northern Light is a particularly good example of this technology.
  9. The features a search engine advertises do not always work as described.
  10. These Internet sites are not stable. Expect the interface and features to change. Be sure to read the Help files at each site.
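The ranking criteria listed in tip 6 can be made concrete with a toy scoring function. This is only a sketch: the weights, the 20-word "early" window, and the names `relevancy_score` and `rank` are arbitrary assumptions for illustration and do not reproduce any search engine's formula.

```python
import re

def relevancy_score(terms, page):
    """Score a page (a dict with 'title', 'url', 'text') against search terms."""
    words = re.findall(r"[a-z0-9]+", page["text"].lower())
    score = 0.0
    for term in (t.lower() for t in terms):
        if term in page["title"].lower():
            score += 3.0                  # search term in the title
        if term in page["url"].lower():
            score += 2.0                  # search term in the URL
        score += 0.5 * words.count(term)  # number of times the term appears
        if term in words[:20]:
            score += 1.0                  # term appears early in the document
    return score

def rank(terms, pages):
    """Return pages ordered from highest to lowest score."""
    return sorted(pages, key=lambda p: relevancy_score(terms, p), reverse=True)

pages = [
    {"title": "Cooking", "url": "http://example.org/food.html",
     "text": "A recipe page with a passing mention of slavery."},
    {"title": "Slavery in America", "url": "http://example.org/slavery.html",
     "text": "Slavery was practiced in the colonies."},
]
for page in rank(["slavery"], pages):
    print(page["title"])
```

A page that merely mentions the term once still receives a score, which is why partially relevant files appear in results and why the ordering matters.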
[Return to Index]

General Search Strategies

[Return to Index]


One of the best search engines on the Internet is Infoseek because

Infoseek - http://www.infoseek.com/

When to use Infoseek? When you want to search the full text of a relatively large database using implied Boolean logic with field search options and retrieve accurate results quickly.

Characteristics:

Strengths: Weaknesses:

Search Engine Syntax

Main screen

 
Boolean Logic 
  • Keyword search defaulting to Boolean OR 
  • Implied Boolean logic: + for Boolean AND, - for Boolean NOT 
Case Sensitivity 
  • Full case sensitivity supported 
Fields 
  • Supported fields: 
    • Title, e.g., title:"New York Times"
    • URL, e.g., url:holocaust
    • Site, e.g., site:ibm.com
    • Link e.g., link:www.albany.edu/library 
Phrases 
  • Phrase search in one of two ways: 
    • Place phrase within double quotations, e.g., "budget battle" 
    • Insert hyphen between each word, e.g., budget-battle 
Truncation 
  • No truncation symbol; engine stems each term 
Other 
  • Separate a search of proper names with a comma 
  • Bill Clinton, Bob Dole
  • Use a pipe (|) to limit search results from within a retrieved set 
  • dogs | poodles 
 

Advanced Search

User fill-in template with supplied terminology
 
Boolean Logic 
  • Boolean logic via template terminology (up to three levels) 
    • must (Boolean AND) 
    • should (Boolean OR) 
    • should not (Boolean NOT) 
Case Sensitivity 
  • Full case sensitivity supported 
Fields 
  • Document 
  • Title 
  • URL 
  • Hyperlink 
  • Search by location (top level domain) 
Phrases 
  • Phrase 
Truncation 
  • No truncation symbol; engine stems each term 
 

Exercise: Multiple concept search

Query: What is being done by Congress to reform campaigns?
Search (a):   campaign   reform    Congress

Note: This search is incorrect! OR logic is the default logic at Infoseek.

Search (b):    +campaign   +reform    +Congress

Note: This search is correct! AND logic now applies with the use of the + symbol in front of each search term.
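The difference between searches (a) and (b) can be sketched in a few lines of Python. This is an illustration of implied Boolean logic under assumed semantics, not Infoseek's implementation; the function name `implied_boolean_match` and the sample page text are invented for the example.

```python
import re

def implied_boolean_match(query, text):
    """Return True if text satisfies a query written with + and - prefixes.

    Assumed semantics: +term must appear, -term must not appear,
    and unmarked terms are combined with OR.
    """
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    required, excluded, optional = [], [], []
    for token in query.lower().split():
        if token.startswith("+"):
            required.append(token[1:])
        elif token.startswith("-"):
            excluded.append(token[1:])
        else:
            optional.append(token)
    if any(term in words for term in excluded):
        return False
    if not all(term in words for term in required):
        return False
    # With no required terms, any one optional term is enough (OR logic).
    if not required and optional:
        return any(term in words for term in optional)
    return True

page = "Tips for running a successful advertising campaign"
print(implied_boolean_match("campaign reform Congress", page))     # search (a)
print(implied_boolean_match("+campaign +reform +Congress", page))  # search (b)
```

Search (a) accepts the advertising page because a single matching word is enough under OR logic; search (b) rejects it because all three terms are required.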

Exercise: Multiple concept search

Query: I'm interested in knowing how the glut of Ph.D. graduates is affecting the job market.
Search (a):   Ph.D.    glut    job market

Note: This search is incorrect! OR logic is the default logic at Infoseek. We have also failed to place the phrase within double quotations.

Search (b):   +Ph.D.    +glut    +"job market"

Note: This search is correct! AND logic now applies with the use of the + symbol in front of each search term. We have also placed the phrase within double quotations.

Exercise: Combined Boolean search

Query: How much of a problem is university crime?
Search   college    university    campus    +title:crime

Notes on the search:

Exercise: Field Search

Query: I'd like to see information on slavery.
Search (a):    slavery

Note: This is a poor search to do in a large, full-text database.

Search (b):    title:slavery

Note: This is a much better search. This search will look for slavery in the HTML title field.

Search (c):    url:slavery

Note: This is also a very good search. This search will look for slavery in the URL of the file, e.g., in a subdirectory named slavery, or in a filename such as slavery.html.

Search (d):    title:slavery    url:slavery

Note: This is an even better search. This search looks for slavery in either the title, in the URL, or both using OR logic.

Search (e):    +title:slavery    +url:slavery

Note: This is a very restrictive search, but it could be quite useful in focusing your results. This search looks for slavery in both the title and in the URL, using AND logic.
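Searches (a) through (e) can be compared with a short Python sketch. It assumes each page is a simple record with `title`, `url`, and `text` entries and that a term matches if it appears anywhere in the chosen field; the function `field_match` and the sample pages are hypothetical and are not Infoseek's own logic.

```python
def field_match(query, page):
    """Match a query with optional field prefixes (title:, url:) and + marks."""
    required_hits, optional_hits = [], []
    for token in query.lower().split():
        is_required = token.startswith("+")
        field, sep, term = token.lstrip("+").partition(":")
        if not sep:                      # no field prefix: search the full text
            field, term = "text", field
        hit = term in page.get(field, "").lower()
        (required_hits if is_required else optional_hits).append(hit)
    if required_hits:                    # + terms are combined with AND
        return all(required_hits)
    return any(optional_hits)            # unmarked terms are combined with OR

pages = [
    {"title": "Slavery in America", "url": "http://example.org/history/slavery.html",
     "text": "An overview of slavery."},
    {"title": "A History of Rome", "url": "http://example.org/rome/slavery.html",
     "text": "Roman slavery and society."},
    {"title": "Cooking", "url": "http://example.org/food.html",
     "text": "The word slavery appears once here."},
]
for query in ["slavery", "title:slavery", "url:slavery",
              "title:slavery url:slavery", "+title:slavery +url:slavery"]:
    print(query, [p["title"] for p in pages if field_match(query, p)])
```

The plain full-text search retrieves all three pages, including the cooking page, while the field searches progressively narrow the set.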

Exercise: Field search

Query: Can I see reviews of the novels of Joyce Carol Oates?
Search:    +title:"Joyce Carol Oates"     +novels
[Return to Index]


AltaVista, HotBot, Excite, and Lycos are four other high-quality search engines.

AltaVista - http://www.altavista.digital.com/

When to use AltaVista? When you want to search a large full text database by keywords, phrase, or field, or perform a complex Boolean search.

Characteristics:

Strengths: Weaknesses:

Search Engine Syntax

 
Boolean Logic 
  • Keyword search defaulting to Boolean OR 
  • Implied Boolean logic: + for Boolean AND, - for Boolean NOT 
Case Sensitivity 
  • Full case sensitivity supported 
Fields 
  • Page retrieval by language 
  • Supported fields: 
    • Anchor, e.g., anchor:White-House 
    • Applet, e.g., applet:NervousText 
    • Domain, e.g., domain:nz 
    • Host, e.g., host:ibm.com 
    • Image, e.g., image:clinton.jpg 
    • Link, e.g., link:www.albany.edu/library/ 
    • Text, e.g., text:fungicide 
    • Title, e.g., title:"New York Times" 
    • URL, e.g., url:holocaust 
Phrases 
  • Phrases within double quotations, e.g., "budget battle" 
Truncation 
  • Mandatory truncation 
  • Truncation character: * 
  • Internal truncation supported, e.g., colo*r 
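Truncation with the * character, including internal truncation, can be sketched as a wildcard match. The version below assumes * stands for any run of characters (possibly none); the real engine may limit how many characters * can replace, so treat `truncation_match` as a hypothetical illustration only.

```python
import re

def truncation_match(pattern, word):
    """Return True if word matches a search term that may contain *."""
    # Escape the literal parts and let each * stand for any run of characters.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, word, flags=re.IGNORECASE) is not None

for word in ["color", "colour", "Colorado"]:
    print(word, truncation_match("colo*r", word))
```

"Mandatory truncation" means that a term typed without * matches only that exact word, so in this sketch `truncation_match("library", "libraries")` is False while `truncation_match("librar*", "libraries")` is True.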
 

Advanced search

Click on Advanced option
All the features above with the following additions
 
Boolean Logic 
  • Full Boolean logic 
  • Boolean AND, OR, AND NOT, NEAR [terms within 10 words of each other] with parentheses, e.g., behavior and (cats or felines) 
Fields 
  • Date 
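The NEAR operator described above can be sketched as a check on word positions. This is a simplified illustration that handles single words only (not quoted phrases); the function name `near` and the default window of 10 words follow the description above but are otherwise assumptions, not AltaVista's code.

```python
import re

def near(text, first, second, window=10):
    """True if the two words occur within `window` words of each other."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]
    return any(abs(i - j) <= window
               for i in first_positions for j in second_positions)

sentence = "Global warming is expected to raise the sea by a metre"
print(near(sentence, "warming", "sea"))
```

Unlike a plain AND, a proximity test rejects documents where the two terms appear far apart and are probably unrelated.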
 

Refine

AltaVista's Refine feature (also called Cow9) allows you to home in on your topic. Refine presents a thesaurus of terms related to your search terms, organized into concept categories. You may use these terms to modify your original search. Refine terms are chosen from the documents retrieved by your search, based on how frequently the terms appear in those documents. The most frequently appearing terms are listed first. You can either Require or Exclude terms listed in the various topic groups to modify your subsequent search.

You may view your word list either in a text or graphical mode. The graphical mode requires a Java-enabled browser.

Keep in mind these drawbacks:

  1. You cannot choose terms to add to your search using Boolean OR logic. Any terms you add to your original search will be combined with AND logic. Be sure to add only a few words to your search with the Require option or you may end up with zero results.
  2. Only single words, and not phrases, are handled by Refine.
For more information, see the Refine help page at AltaVista.


Exercise: Advanced Query

Choose "Advanced" option
Perform the following search, typing the bolded words and choosing the appropriate option:

Query: How is global warming affecting sea levels?

Search:
  1. ("global warming" or "climate change" or "global change") near "sea level"
Notes on this search:
[Return to Index]


HotBot - http://www.hotbot.com/

When to use HotBot? When you want to search a very large database with field and media search options using a fill-in template to guide your query.

Characteristics:

Strengths: Weaknesses:

Search Engine Syntax

Main screen

User fill-in template with supplied terminology
 
Boolean Logic 
  • Boolean logic via template terminology: 
    • all the words (Boolean AND) 
    • any of the words (Boolean OR) 
    • the person (proximity of terms) 
    • Boolean phrase 
  • Boolean phrase option supports AND, OR, NOT, AND NOT, and must be entered in CAPS. Boolean searches with parentheses are supported with this option, e.g., behavior AND (cats OR felines)
  • Implied Boolean logic: + for Boolean AND, - for Boolean NOT 
Case Sensitivity 
  • Limited support within words, e.g., neXT 
Fields 
  • the page title 
  • links to this URL 
  • Date, i.e., last modified date 
  • Language 
  • Meta Words offer manual input of search restrictions such as media type and date. For more information, see the Advanced Help file at http://www.hotbot.com/help/tips/search_features.asp 
Phrases 
  • Exact phrase 
  • Phrases within double quotations, e.g., "budget battle." This is useful for multi-term searches using the option all the words or any of the words. 
Truncation 
  • No truncation symbol; engine stems each term 
 

Super Search

All the features above with the following additions
 
Boolean Logic 
  • Added windows for subject searching; click on + icon to add more windows (2 may be added for a maximum of 4) 
Fields 
  • Date 
  • Language 
  • Location by domain or continent 
  • Page type 
  • File/program types including 
    • Java 
    • JavaScript 
    • ActiveX 
    • VRML 
    • Acrobat 
    • VB Script 
    • File Extension 
 

Exercise: Keyword search

Query: What can be done to prevent pollution from automobiles?
Search:    automobile    pollution    prevention

Note that "all the words" is the default.

Exercise: More complex search

Query: What is the United Nations doing about chemical weapons? I want to see files published in Adobe Acrobat format over the past two years and posted on a site in Europe.
Click on SuperSearch
Perform the following search, typing the bolded words and choosing the given options:
  1. Search the Web for "exact phrase"
  2. Type: chemical weapons
  3. Choose "must contain" "the phrase" united nations
  4. Date: specify after or on January 1, 1995
  5. Location: Continent, choose "Europe"
  6. Media Type: choose "Acrobat"
NOTE: Each file returned by this search will contain documents in Acrobat (PDF) format, as well as the terms chemical weapons and United Nations. Not all these elements, however, come together in the same PDF file in every case. These search results illustrate a limitation of HotBot, but also its unique strength in locating files of certain media types.
[Return to Index]


Excite - http://www.excite.com/

When to use Excite? When you want to search an up-to-date database with Boolean logic, keywords, or natural language, or take advantage of Excite's concept searching when you don't know what terms to use.

Characteristics:

Strengths: Weaknesses:

Search Engine Syntax

Main screen

 
Boolean logic 
  • Keyword search defaulting to Boolean OR 
  • Implied Boolean logic: + for Boolean AND, - for Boolean NOT 
  • Full Boolean logic: Boolean AND, OR, AND NOT with parentheses, e.g., behavior AND (cats OR felines). Boolean operators must be in CAPS. 
Case sensitivity 
  • None 
Fields 
  • None 
Phrases 
  • Phrases within double quotations, e.g., "budget battle" 
Truncation 
  • None 
Other 
  • With list of hits, option to add displayed related terms to the original search; terms will be added with OR logic 
  • Once list of hits is returned, option to View by Web Site (URL) 
 

Advanced search

Click on Power Search
 
Boolean Logic 
  • Boolean logic via template terminology: 
    • My search results CAN contain the words, the name or phrase (Boolean OR) 
    • My search results MUST contain the words, the name or phrase (Boolean AND) 
    • My search results MUST NOT contain the words, the name or phrase (Boolean NOT) 
  • Option to add more constraints as above 
Case Sensitivity 
  • None 
Fields 
  • None 
Phrases 
  • Phrases within double quotations, e.g., "budget battle" 
Truncation 
  • None 
 

Exercise: Single concept search using Excite's concept searching option

Query: I'm interested in learning about slavery.
Search:
  1. Type:    slavery
  2. Choose a file that interests you and click on Search for more documents like this one
  3. Examine the relevancy of these new results
[Return to Index]


Lycos - http://www.lycos.com/

When to use Lycos? When you want to search a Web page database with a variety of Boolean and term proximity options. You can also control the relevancy ranking of your results.

Characteristics:

Strengths: Weaknesses:

Search Engine Syntax

Main screen

 
Boolean Logic 
  • Keyword search defaulting to Boolean AND 
  • Implied Boolean logic: + for Boolean AND, - for Boolean NOT 
Case Sensitivity 
  • None 
Fields 
Phrases 
  • Phrases within double quotations, e.g., "budget battle" 
  • Phrases with stop words are searchable if other words also appear in the phrase 
Truncation 
  • Truncation is optional; search engine stems search terms 
  • Place a period (.) after the term to prevent truncation 
  • Optional truncation character: $ 
Other 
  • No searching a word that starts with a number, e.g., 11th 
 

Advanced search

Click on Advanced Search
User fill-in template with supplied terminology
 
Boolean Logic 
  • Boolean logic via template terminology: 
    • Any of the words (Boolean OR) 
    • All the words (Boolean AND) 
    • All the words (Good match) 
    • All the words (Near match) 
    • All the words (Close match) 
    • All the words (Strong match) 
      • The last 4 items above refer to relative proximity
  • Implied Boolean logic: +/- for Boolean AND, NOT 
  • Full Boolean logic: AND, OR, NOT with parentheses, e.g., behavior and (cats or felines) 
  • Several term proximity options: 
    • NEAR (within 25 words); NEAR/n 
    • ONEAR (within 25 words in exact query order); ONEAR/n 
    • FAR (at least 25 words apart); FAR/n 
    • OFAR (at least 25 words apart in exact query order); OFAR/n 
    • ADJ (words next to each other); ADJ/n 
    • OADJ (words next to each other in exact query order); OADJ/n 
    • BEFORE (first query word appears in document before second query word) 
      • n represents a user-specified number
Case Sensitivity 
  • None 
Fields
  • None 
Phrases 
  • The Exact Phrase 
  • Phrases within double quotations, e.g., "budget battle" 
  • Phrases with stop words are searchable if other words also appear in the phrase 
Truncation 
  • Truncation is optional; search engine stems search terms 
  • Place a period (.) after the term to prevent truncation 
  • Optional truncation character: $ 
Other 
  • Natural Language Query 
  • Relevancy ranking: Click on Power Panel (HTML and Java versions) to specify the relevancy ranking factors in terms of: 
    • Match every word 
    • Frequency of words 
    • Appear early in text 
    • Appear in title 
    • Appear close together 
    • Appear in exact order 
 

Exercise: Simple keyword search

Query: What is being done to screen the blood supply for AIDS?
Search:
  1. Type these words:    aids    "blood supply"     screen$
Note how the space between the search terms is interpreted as the Boolean AND.
[Return to Index]


Northern Light is one of the most interesting search engines on the Web. This search engine organizes results into Custom Search Folders which allow the searcher to view results in clusters that are relevant to the intent of the search. 

Northern Light - http://www.northernlight.com/

When to use Northern Light? When you want to search the full text of Web pages and view results by selecting folders which group files into concept or site clusters.

Characteristics

Strengths Weaknesses

Search Engine Syntax

Main Screen

 
Boolean Logic 
  • Keyword searching defaulting to Boolean AND 
  • Boolean AND, OR, NOT 
  • Implied Boolean logic: + for Boolean AND, - for Boolean NOT 
Case Sensitivity 
  • None 
Fields 
  • URL, e.g., URL:whitehouse 
  • Title, e.g., title:"White House" 
  • Pub (searches certain Special Collections journals), e.g., pub:Billboard 
  • Company (searches certain Special Collections journals), e.g., company:ibm 
  • Ticker, e.g., ticker:ibm 
  • Text, e.g., text:"White House" 
Phrases 
  • Phrases within double quotations, e.g., "budget battle" 
Truncation 
  • Automatic for common singular and plural word forms 
  • * truncation character replaces multiple characters if the word contains at least four characters, e.g., librar* will retrieve library, libraries, librarian, librarians, librarianship, etc. 
  • % truncation character replaces a single character, e.g., colo%r 
 

Power Search

Click on Power Search
User fill-in template with supplied terminology
All the features above with the following additions:
 
Fields
  • Words anywhere (searches full text) 
  • Words in title 
  • Words in URL 
  • Country 
  • Date 
  • Language (French, German, Italian, Spanish) 
Other 
  • Limits results to various sources, topics and document types 
 

Exercise: Multiple concept search

Query: Will El Nino have an effect on the hurricane season?
Search:
  1. Type this search:    +"el nino"     +hurricanes
  2. Select: World Wide Web
  3. When search is complete, view the yellow Custom Search Folders on the left side of the screen. Click on the folder(s) that interest you.
  4. Continue drilling down through the folders until you reach the level of specificity you wish to achieve. View results on the right side of the screen.
[Return to Index]


Multithreaded (Meta) Search Engines

Multithreaded search engines simultaneously search multiple search engines. They are also referred to as parallel search engines, mega-search engines, or meta-search engines. These are useful when:

It is important to note that many multithreaded search engines retrieve only a certain maximum number of documents from each of the individual engines they search, cutting off after a certain point as the search is processed. Inference Find claims to return the maximum number of results that its targeted search engines will allow. In addition, many multithreaded search engines stop processing a query after a certain amount of time. Other search engines give the user a certain amount of control over the number of documents returned in a search. All these factors have two implications:

The better multithreaded search engines remove duplicate files and give you some information along with the document title. MetaCrawler is an excellent choice.
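The two behaviours described above, a cap on the number of documents taken from each engine and the removal of duplicates, can be sketched as follows. This is a hypothetical illustration, not how MetaCrawler or any other service actually works; `merge_results`, the engine names, and the example URLs are invented, and duplicates are detected only by comparing URLs after dropping a trailing slash.

```python
def merge_results(results_by_engine, per_engine_limit=10):
    """Merge ranked (url, title) lists from several engines, removing duplicates."""
    merged = {}
    for engine, hits in results_by_engine.items():
        # Take only the top hits from each engine, as many services do.
        for position, (url, title) in enumerate(hits[:per_engine_limit]):
            key = url.rstrip("/")
            entry = merged.setdefault(
                key, {"url": url, "title": title, "engines": [], "best_rank": position})
            entry["engines"].append(engine)
            entry["best_rank"] = min(entry["best_rank"], position)
    # Pages found by more engines come first, then pages ranked higher.
    return sorted(merged.values(),
                  key=lambda e: (-len(e["engines"]), e["best_rank"]))

results = {
    "engine_a": [("http://example.org/a", "A"), ("http://example.org/b", "B")],
    "engine_b": [("http://example.org/b/", "B page"), ("http://example.org/c", "C")],
}
for entry in merge_results(results):
    print(entry["url"], entry["engines"])
```

Lowering `per_engine_limit` shows the cost of the cut-off: relevant pages that an individual engine ranked below the limit never reach the merged list.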

MetaCrawler - http://www.metacrawler.com/

When to use MetaCrawler? When you want a multithreaded search engine that processes results fast, removes duplicates, and presents results in relevancy ranked order.

Characteristics:

Strengths: Weaknesses:

Search Engine Syntax

Main screen

 
Boolean Logic 
  • Template terminology: any (Boolean OR), all (Boolean AND), as a phrase 
  • Implied Boolean logic: + for Boolean AND, - for Boolean NOT 
Case Sensitivity 
  • None 
Fields 
  • None 
Phrases 
  • Phrases within double quotations, e.g., "budget battle," to use with a search for any or all 
  • Exact phrases may not be searchable since some queried search engines ignore stop words 
Truncation 
  • None 
 

Power Search

Click on Power search interface
All options as above with the following addition:
 
Fields 
  • Location (URL) 
 

Exercise: Simple search - Main Screen

Query: How has the seal fur boycott affected the Inuit Indians?
Search:    fur    boycott     Inuit

Note that all is the default.

Exercise: Advanced Search

Click on Power search interface

Query: Is there a problem with Internet addiction in Europe?

Search:
  1. "Internet addiction"
  2. Results From: choose Europe
[Return to Index]


Other recommended multithreaded search engines:

CYBER411
http://www.cyber411.com/

Searches 16 search engines and subject directories, and allows for Boolean and proximity searching
INFERENCE FIND
http://www.inference.com/infind/

Searches multiple search engines and groups results by concept and Internet site
METAFIND
http://www.metafind.com/

Searches 6 major search engines for a limited number of results and sorts them alphabetically, by keyword, or by domain
PROFUSION
http://profusion.ittc.ukans.edu/

Searches 6 search engines OR the best three for your query and gives collated results; also stores queries at the site and e-mails notification when new sites are found
SAVVY SEARCH
http://www.cs.colostate.edu/~dreiling/smartform.html

Searches multiple search engines and gives collated results if you choose the option to "Integrate results"
[Return to Index]


For a summary of the instructions contained in this document, see
 
Laura Cohen
lcohen@cnsvax.albany.edu