Searching the Internet:
Recommended Sites and Search Techniques
Updated: 6 October 1998
This document will help you to explore a variety of subject directories
and search engines in order to gain skills in conducting independent research
on the Internet. For more information about Internet research, see Conducting
Research on the Internet.
For a more comprehensive list of subject directories, see: Internet
Subject Directories
For a more comprehensive list of search engines, see: Search
the Internet
Remember...
-
The Internet is a self-publishing medium. Your visits to subject directories
and search engines will yield files with a wide range of quality from a
variety of sources. Be sure to evaluate everything you encounter.
For more information, see Evaluating Internet Resources.
-
Try out multiple research sites when you are investigating a topic. Subject
directories and search engines vary in their database contents, features,
and accuracy.
Subject Directories
Definition: A subject directory is a database of Internet
files submitted by site creators or evaluators and organized into subject
categories. Most directories offer a search engine to query the database.
The service may or may not use selection criteria when choosing files to
include in the database.
General Tips
-
Subject directories differ significantly in selectivity. Consider the policies
of any directory that you visit.
-
This can be difficult, because not all directory services are willing
to disclose either their policies or the names and qualifications
of site reviewers.
-
Some subject directories consist of links accompanied by annotations to
describe or evaluate site content. A well-written annotation from a known
reviewer is more useful than just a list of links.
-
Most subject directories offer a search engine to query the database. For
more information, see General Search Strategies
below.
Three subject directories useful for initial exploration are Yahoo!,
Magellan, and The Argus Clearinghouse. These three services
illustrate vastly different policies for selection criteria.
-
Yahoo! does not evaluate content, but only categorizes sites submitted
to the database
-
Magellan exercises a fairly good level of selectivity when choosing
sites for its "Reviewed Sites Only" database
-
The Argus Clearinghouse consists of generally highly selective subject
guides that are often compiled by experts
When to use Yahoo? When you want to visit a
site with broad but unevaluated subject coverage to get an idea about what
is available on the Internet on your topic.
Strengths:
-
Largest subject database on the World Wide Web
-
Broad subject coverage
-
Has a hierarchical subject organization that is good for browsing
-
Has a search engine that runs your search in several Internet search engines
Weaknesses:
-
Accepts almost any site submitted for inclusion to its database and does
not evaluate for quality or accuracy
-
Yahoo! makes no attempt to be comprehensive in any subject area
-
There is generally sporadic coverage of academic subjects
-
Yahoo tends to index only the major landing page of a site; therefore,
any significant subsidiary pages on a related or different topic may not
show up on this site
-
Subject classifications are not always useful
-
When you do a search in Yahoo, you are searching only the title and the
short descriptive blurb about the site; by contrast, search engines usually
give you access to the full text of the document
-
Advanced searching allows for only one Boolean operator per search
Search Engine Syntax
Main screen
| Boolean Logic |
-
Keyword search defaulting to Boolean AND
-
Implied Boolean logic: + for Boolean AND, - for Boolean NOT
|
| Case Sensitivity |
|
| Fields |
-
Title, e.g., t:automobiles
-
URL, e.g., u:ibm
|
| Phrases |
-
Phrases within double quotations, e.g., "budget battle"
|
| Truncation |
-
Mandatory truncation
-
Truncation character: *
|
Advanced search
Click on options
User fill-in template with supplied terminology
| Boolean Logic |
-
Matches on all words (AND)
-
Matches on any word (OR)
-
A person's name
|
| Case Sensitivity |
|
| Fields |
|
| Phrases |
|
| Other |
-
Intelligent default (explanation not provided)
|
When to use Magellan? When you want to find
generally good quality sites without having to wade through sites of lesser
quality. The database of Reviewed Sites is currently not being kept
up to date.
Strengths:
-
There is generally good coverage in academic subject areas
-
Sites in the "Reviewed Sites Only" portion of the database must pass through
an editorial board to be accepted for review and inclusion in the database
-
Reviewed sites are accompanied by an evaluation, and are rated in terms
of content, ease of exploration, and appeal
-
You can choose to search only the reviewed sites, or else the entire database
of both reviewed and Internet-wide sites
-
The browsable subject directory contains only the reviewed sites
Weaknesses:
-
The reviewed database has not been updated in many months--beware!
This is a good example of the caution you should exercise when visiting
a site and the reason Magellan is included in this tutorial. Click on the
"Add Site" link at the bottom of the page to see why caution is advised.
-
The ratings are not very strict
Search Engine Syntax
Main screen only
Uses the Excite search engine
| Boolean Logic |
-
Keyword search defaulting to Boolean OR
-
Implied Boolean logic: + for Boolean AND, - for Boolean NOT
-
Full Boolean logic: Boolean AND, OR, AND NOT with parentheses, e.g., behavior
AND (cats OR felines). Boolean operators must be in CAPS.
|
| Case sensitivity |
|
| Fields |
|
| Phrases |
-
Phrases within double quotations, e.g., "budget battle"
|
| Truncation |
|
When to use the Clearinghouse? When you want
to find a collection of Internet resources on specific topics recommended
by specialists.
Strengths:
-
Directory tends to be highly selective
-
There is strong coverage in academic subject areas
-
Collections of recommended sites are organized into subject-specific guides
created by identified individuals
-
Guide authors are often subject specialists
-
Many resources within the guides are described and evaluated
-
Guides are rated by the Clearinghouse staff according to several parameters
Weaknesses:
-
Subject coverage is determined by the guides that are submitted
-
Guides with low ratings are not excluded
-
Guides are not necessarily kept up to date
-
Search engine has unusual syntax, so reading the Search Tips is a must
Search Engine Syntax
To search (information pages of the guides only), click on Search/Browse
| Boolean Logic |
-
Keyword search defaulting to Boolean AND unless one of the terms is
truncated, in which case the default is OR
-
Full Boolean logic: Boolean OR, AND with parentheses, e.g., behavior
and (cats or felines)
-
NOT operator is not supported
|
| Case sensitivity |
|
| Fields |
|
| Phrases |
|
| Truncation |
-
Mandatory truncation
-
Truncation character: *
|
Other recommended subject directories with evaluated content:
-
BUBL Link
-
http://bubl.ac.uk/link/
-
INFOMINE: Scholarly Internet Resources
-
http://lib-www.ucr.edu/
-
Librarians' Index to the Internet
-
http://sunsite.berkeley.edu/InternetIndex/
-
The WWW Virtual Library
-
http://vlib.stanford.edu/Overview.html
Search Engines
Definition: A search engine is a searchable database
of Internet files collected by a computer program (called a wanderer, crawler,
robot, worm, or spider). The index is built from elements of the collected
files, e.g., title, full text, size, and URL. There are no selection criteria
for the collection of files.
A search engine might well be called a search engine service
or a search service. As such, it consists of three components:
-
Spider: Program that traverses the Web from link to link, identifying
and reading pages
-
Index: Database containing a copy of each Web page gathered by the
spider
-
Search engine mechanism: Software that enables users to query the
index and that usually returns results in relevancy ranked order
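The three components can be pictured with a toy model. The Python sketch below is only an illustration under stated assumptions, not a description of how any real service is built: the `PAGES` dictionary stands in for the Web, and the names `crawl`, `build_index`, and `search` are hypothetical.

```python
from collections import deque

# Hypothetical stand-in for the Web: URL -> (page text, outgoing links).
PAGES = {
    "a.html": ("campaign reform in Congress", ["b.html"]),
    "b.html": ("Congress votes on budget", ["a.html", "c.html"]),
    "c.html": ("budget battle continues", []),
    "orphan.html": ("unlinked campaign page", []),  # no other page links here
}

def crawl(start):
    """Spider: follow links from page to page, collecting each page once."""
    seen, queue = [], deque([start])
    while queue:
        url = queue.popleft()
        if url in seen or url not in PAGES:
            continue
        seen.append(url)
        queue.extend(PAGES[url][1])
    return seen

def build_index(urls):
    """Index: map each lower-cased word to the set of pages containing it."""
    index = {}
    for url in urls:
        for word in PAGES[url][0].lower().split():
            index.setdefault(word, set()).add(url)
    return index

def search(index, term):
    """Search mechanism: look a single term up in the index."""
    return sorted(index.get(term.lower(), set()))

index = build_index(crawl("a.html"))
print(search(index, "campaign"))
```

In this toy model the page `orphan.html` also contains the word campaign, but the spider never reaches it because no other page links to it, so it cannot appear in the results.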
General Tips
-
Search engines do not index all the documents available on the Web. For
example, most search engines cannot index files on password-protected sites.
Documents behind a firewall will not be accessible to a spider. Other files
can be excluded from search engines by Web server software at the host
site. Still other Web pages may not be picked up if they are not linked
to other pages, and are therefore missed by a search engine spider as it
crawls from one page to the next. Search engines rarely contain the most
recent documents posted to the Internet; do not look for yesterday's news
on a search engine.
-
Search engine features are improving with time. For example, several search
services offer searches on fields, programming languages, domain locations,
dates, and so on. These include AltaVista, HotBot, and Infoseek. For a
summary of the features available at many of the major search engines,
see How to Choose a Search Engine or Research Database.
-
Most major search engine indexes consist of the full text of source files.
These include AltaVista, Excite, HotBot, and Infoseek. Lycos, however,
indexes only certain portions of Web pages in its database. When you search
a full text index, you will retrieve a file even if your search terms appear
only once in the text and do not represent the primary topic of the document.
Limiting your search to fields or using proximity operators (explained
below) can be a useful way to boost the relevancy of your results.
-
Some search engines have an interface for basic searches as well as a separate
interface for advanced queries. Be sure to understand the difference in
syntax requirements and to use the interface that is appropriate for your
query.
-
Search engines don't always pay attention to everything included in your
search statements. For example, if you search for three terms using the
Boolean AND, you may retrieve files with only two, one, or even none of
these terms present. If you are unsure of the relevance of a document,
use the "find" feature of your Web browser to search for your terms.
-
Because only partially relevant files are often returned as a result of
your search, good relevancy ranking is important. Most search engines use
various criteria to construct a relevancy rating of each hit and will present
your search results in this order. Criteria can include: search terms in
the title, URL, first heading, HTML META tag; number of times search terms
appear in the document; search terms appearing early in the document; search
terms appearing close together; page popularity, etc.
-
Beware of relevancy rating nonetheless. Your most relevant hits may appear
beyond the first few screens. Your point of view is far more complex than
the search engine's relevancy algorithm. In addition, the source documents
on the Internet are extremely varied and only a small subset can be expected
to meet your needs.
-
One of the most interesting developments in search engine technology is
the organization of search results by concept and/or site rather than by
relevancy. To see if this may be useful to you, visit Inference
Find, MetaFind, and Northern
Light. Northern Light is a particularly good example of this technology.
-
Advertised features do not always work as described.
-
These Internet sites are not stable. Expect the interface and features
to change. Be sure to read the Help files at each site.
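To make the idea of relevancy ranking more concrete, here is a minimal Python sketch that scores documents on a few of the criteria listed above (terms in the title, terms in the URL, term frequency, and early appearance). The weights, the `relevancy` function, and the `example.org` documents are all assumptions made up for illustration; real search engines do not publish their formulas.

```python
def relevancy(doc, terms):
    """Toy score; the weights are arbitrary assumptions chosen for illustration."""
    score = 0
    words = doc["text"].lower().split()
    for term in (t.lower() for t in terms):
        if term in doc["title"].lower():
            score += 5                  # term appears in the title
        if term in doc["url"].lower():
            score += 3                  # term appears in the URL
        score += words.count(term)      # number of appearances in the text
        if term in words[:10]:
            score += 1                  # term appears early in the text
    return score

# Hypothetical documents.
docs = [
    {"url": "http://example.org/news.html",
     "title": "Daily news",
     "text": "Today a speaker mentioned slavery once"},
    {"url": "http://example.org/slavery/index.html",
     "title": "Slavery in America",
     "text": "A history of slavery and abolition"},
]

ranked = sorted(docs, key=lambda d: relevancy(d, ["slavery"]), reverse=True)
for d in ranked:
    print(relevancy(d, ["slavery"]), d["url"])
```

Both documents contain the search term, but the one that carries it in the title and URL rises to the top. A different choice of weights would produce a different order, which is one reason to look beyond the first screen of results.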
General Search Strategies
-
Most search engines employ the principles of Boolean logic in the formulation
of search queries. See Boolean Searching on the
Internet for detailed information about search strategy.
-
Search engines offering a keyword search option have a default Boolean
logic. This means that the space between multiple search terms defaults
to either OR logic or AND logic. It is imperative that you know which logical
operator is the default. For example, AltaVista (simple search interface),
Excite, Infoseek and MetaCrawler default to OR. At these sites, you
must place a plus sign (+) in front of each search term for AND logic to
apply. If you just enter multiple keywords, the space between them
will be interpreted as the Boolean OR.
-
When searching full text databases, use proximity operators (e.g., NEAR)
if these are available rather than specifying an AND relationship between
your keywords. This will make sure that your keywords are located near
each other in the full text document. See Boolean
Searching on the Internet for a list of sites that offer proximity
searching.
-
Field searching is another extremely important way of limiting your search
results in large search engines that contain millions of full-text files.
For example,
TITLE:slavery
in a search engine such as AltaVista or Infoseek will bring you more relevant
hits than merely searching on the keyword slavery.
-
To enhance subject searches, try the URL field to narrow your results.
The URL field offers a good way to search for certain subject terms. This
is because of the make-up of the URL.
Anatomy of a URL
This is a URL on the CNN home page:
http://www.cnn.com/feedback/comments.html
This URL is typical of addresses hosted in domains in the United States.
Structure of this URL:
-
Protocol: http
-
Host computer name: www
-
Second-level domain name: cnn
-
Top-level domain name: com
-
Directory name: feedback
-
File name: comments.html
The directory name and file name often contain subject terms.
These can be searched with the URL field.
For example:
URL:slavery
will give you more relevant results than the keyword slavery
by searching for this term as a directory name or a file name.
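The breakdown above can be reproduced with Python's standard `urllib.parse` module. This is a rough sketch: the `anatomy` function is a hypothetical helper, and it assumes a simple three-part host name such as www.cnn.com and a path that ends in a file name.

```python
from urllib.parse import urlsplit

def anatomy(url):
    """Split a URL into the parts named above (simple host names only)."""
    parts = urlsplit(url)
    labels = parts.hostname.split(".")            # e.g. ['www', 'cnn', 'com']
    path = [p for p in parts.path.split("/") if p]
    return {
        "protocol": parts.scheme,
        "host": labels[0],
        "second_level_domain": labels[-2],
        "top_level_domain": labels[-1],
        "directories": path[:-1],
        "file": path[-1] if path else "",
    }

info = anatomy("http://www.cnn.com/feedback/comments.html")
for name, value in info.items():
    print(name, "=", value)
```

The directory and file entries are the parts most likely to carry subject terms, which is why a URL field search can be productive.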
-
To find a home page when you know the location or sponsor of the
information, use the SITE field. In this case, you search on the top-level
and second-level domain names together, and then use AND logic to add subject
terms to your search.
Examples of sites:
mit.edu
nasa.gov
netscape.com
For example, if you are searching for information about spacewalks conducted
by NASA, search on the SITE field for nasa.gov. Use AND logic to
include terms for the spacewalk. This search will limit your results to
files at the NASA Web site.
Using the syntax of AltaVista or Infoseek:
+spacewalks +site:nasa.gov
-
Beware of searching on top-level domains to narrow your search. Do NOT
try to search for the URL edu or com. There are too many
pages in these domains for the search engine to handle. On the other hand,
searching for the URL gov may be more successful because there are
far fewer of these pages. Still, all searches on top-level domains should
be used with caution.
-
Limiting a search by a country code might be a viable option. For a complete
list of ISO 3166 Internet country codes, see: http://www.ics.uci.edu/pub/websoft/wwstat/country-codes.txt
One of the best search engines on the Internet is Infoseek because
-
it is one of the most accurate on the Web
-
it clusters results by presenting one hit per site (this option can be
disabled)
-
it processes search results fast
When to use Infoseek? When you want to search
the full text of a relatively large database using implied Boolean logic
with field search options and retrieve accurate results quickly.
Characteristics:
-
One of the fastest search engines on the Internet
-
Default syntax is OR; when you enter multiple search terms, the space between
the terms is interpreted as a Boolean OR
-
Clusters results by presenting one hit per site; users have the option
to disable this feature
-
Offers recommended and/or reviewed Web sites for certain topics
-
Engine is case sensitive, but lower case terms will retrieve upper case
terms
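Clustering by site can be pictured as keeping only the highest-ranked hit from each host. The Python sketch below is an assumption about the general idea, not Infoseek's actual method; the function name `one_hit_per_site` and the page paths are hypothetical.

```python
from urllib.parse import urlsplit

def one_hit_per_site(ranked_urls):
    """Keep only the first (highest-ranked) hit from each host."""
    seen, clustered = set(), []
    for url in ranked_urls:
        host = urlsplit(url).hostname
        if host not in seen:
            seen.add(host)
            clustered.append(url)
    return clustered

# Hypothetical result list, already in relevancy order.
hits = [
    "http://www.nasa.gov/spacewalks/index.html",
    "http://www.nasa.gov/spacewalks/history.html",
    "http://www.mit.edu/space.html",
    "http://www.nasa.gov/news.html",
]
print(one_hit_per_site(hits))
```

Disabling the feature corresponds to showing the full `hits` list, which is worth doing when one site holds several relevant pages.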
Strengths:
-
One of the most accurate of the major search engines
-
Offers accurate searching within fields such as title, URL, link, and site
(domain)
-
Processes queries very fast
-
Updates pages in the database on a frequent basis
-
Truncation is automatic, though beware of irrelevant hits
-
Claims to remove dead links and duplicate pages from the database
-
Offers concept recognition for names, noun phrases, numbers, word form
variants
-
Advanced Search interface offers a user-friendly search template
Weaknesses:
-
Query window on the main screen is rather small
-
Boolean operators may not be used
Search Engine Syntax
Main screen
| Boolean Logic |
-
Keyword search defaulting to Boolean OR
-
Implied Boolean logic: + for Boolean AND, - for Boolean NOT
|
| Case Sensitivity |
-
Full case sensitivity supported
|
| Fields |
-
Supported fields:
-
Title, e.g., title:"New York Times"
-
URL, e.g., url:holocaust
-
Site, e.g., site:ibm.com
-
Link e.g., link:www.albany.edu/library
|
| Phrases |
-
Phrase search in one of two ways:
-
Place phrase within double quotations, e.g., "budget battle"
-
Insert hyphen between each word, e.g., budget-battle
|
| Truncation |
-
No truncation symbol; engine stems each term
|
| Other |
-
Separate a search of proper names with a comma
-
Bill Clinton, Bob Dole
-
Use a pipe (|) to limit search results from within a retrieved set
-
dogs | poodles
|
Advanced Search
User fill-in template with supplied terminology
| Boolean Logic |
-
Boolean logic via template terminology (up to three levels)
-
must (Boolean AND)
-
should (Boolean OR)
-
should not (Boolean NOT)
|
| Case Sensitivity |
-
Full case sensitivity supported
|
| Fields |
-
Document
-
Title
-
URL
-
Hyperlink
-
Search by location (top level domain)
|
| Phrases |
|
| Truncation |
-
No truncation symbol; engine stems each term
|
Exercise: Multiple concept search
Query: What is being done by Congress to reform campaigns?
Search (a): campaign reform Congress
Note: This search is incorrect! OR logic is the default
logic at Infoseek.
Search (b): +campaign +reform +Congress
Note: This search is correct! AND logic now applies with
the use of the + symbol in front of each search term.
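The difference between the two searches can be simulated in a few lines of Python. This is a simplified sketch of implied Boolean logic in general, not Infoseek's actual matching code; the `matches` function and the sample page text are hypothetical, and the sketch ignores phrases, stemming, and ranking.

```python
def matches(text, query, default="OR"):
    """Evaluate an implied-Boolean query made of +term, -term, and bare terms."""
    words = set(text.lower().split())
    required, excluded, optional = [], [], []
    for token in query.lower().split():
        if token.startswith("+"):
            required.append(token[1:])
        elif token.startswith("-"):
            excluded.append(token[1:])
        else:
            optional.append(token)
    if any(t in words for t in excluded):
        return False                      # a - term is present
    if not all(t in words for t in required):
        return False                      # a + term is missing
    if not optional:
        return True
    if default == "AND":
        return all(t in words for t in optional)
    # Default OR: bare terms decide the match only when nothing was required.
    return bool(required) or any(t in words for t in optional)

page = "the senator opened her campaign office today"
print(matches(page, "campaign reform Congress"))        # search (a)
print(matches(page, "+campaign +reform +Congress"))     # search (b)
```

Search (a) accepts this page even though it says nothing about reform or Congress, because one matching term is enough under OR logic. Search (b) rejects it.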
Exercise: Multiple concept search
Query: I'm interested in knowing how the glut of Ph.D. graduates
is affecting the job market.
Search (a): Ph.D. glut job market
Note: This search is incorrect! OR logic is the default
logic at Infoseek. We have also failed to place the phrase within double
quotations.
Search (b): +Ph.D. +glut +"job market"
Note: This search is correct! AND logic now applies with
the use of the + symbol in front of each search term. We have also placed
the phrase within double quotations.
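If you build queries like this often, the two rules (a + in front of every concept, double quotations around every phrase) are easy to automate. The helper below is a hypothetical Python sketch; `require_all` is not part of any search engine's software.

```python
def require_all(*concepts):
    """Build an implied-Boolean AND query; multi-word concepts become phrases."""
    parts = []
    for concept in concepts:
        concept = concept.strip()
        if " " in concept:
            concept = f'"{concept}"'    # keep the words together as a phrase
        parts.append("+" + concept)
    return " ".join(parts)

print(require_all("Ph.D.", "glut", "job market"))
```

The printed string is the same as Search (b) above and can be pasted into the query window of an engine that supports the + prefix and quoted phrases.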
Exercise: Combined Boolean search
Query: How much of a problem is university crime?
Search: college university campus +title:crime
Notes on the search:
-
OR logic will apply to the terms college, university, and campus
-
AND logic will apply to the term crime
-
The term crime is searched in the title to increase the relevancy
of retrieved documents
Exercise: Field Search
Query: I'd like to see information on slavery.
Search (a): slavery
Note: This is a poor search to do in a large, full-text database.
Search (b): title:slavery
Note: This is a much better search. This search will look for
slavery in the HTML title field.
Search (c): url:slavery
Note: This is also a very good search. This search will look
for slavery in the URL of the file, e.g., in a subdirectory named
slavery, or in a filename such as slavery.html.
Search (d): title:slavery url:slavery
Note: This is an even better search. This search looks for slavery
in either the title, in the URL, or both using OR logic.
Search (e): +title:slavery +url:slavery
Note: This is a very restrictive search, but it could be quite
useful in focusing your results. This search looks for slavery in
both the title and in the URL, using AND logic.
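Searches (d) and (e) can be compared with a small Python sketch. It is an illustration only: the `field_match` function and the `example.org` documents are hypothetical, each token must be a single word (no quoted phrases), and a word without a field prefix is looked up in a `text` field.

```python
def field_match(doc, query):
    """Match field:term tokens; '+' marks a required token, others are ORed."""
    required, optional = [], []
    for token in query.split():
        bucket = required if token.startswith("+") else optional
        field, sep, term = token.lstrip("+").partition(":")
        if not sep:                       # bare keyword: search the body text
            field, term = "text", field
        bucket.append(term.lower() in doc.get(field, "").lower())
    return all(required) if required else any(optional)

# Hypothetical documents.
docs = [
    {"title": "Slavery in the United States",
     "url": "http://example.org/history/usa.html"},
    {"title": "Primary sources",
     "url": "http://example.org/slavery/sources.html"},
    {"title": "Slavery archive",
     "url": "http://example.org/slavery/index.html"},
    {"title": "Daily news",
     "url": "http://example.org/news.html"},
]

for query in ("title:slavery url:slavery", "+title:slavery +url:slavery"):
    found = [d["title"] for d in docs if field_match(d, query)]
    print(query, "->", found)
```

The OR version retrieves the first three documents, while the AND version keeps only the one that has slavery in both fields.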
Exercise: Field search
Query: Can I see reviews of the novels of Joyce Carol Oates?
Search: +title:"Joyce Carol Oates" +novels
AltaVista, HotBot, Excite, and Lycos are four other
high-quality search engines.
-
AltaVista offers both a Simple and a Power Search interface with
numerous searchable fields including language. Its Refine feature
offers a useful way of locating terms with which to modify an initial search.
AltaVista continues to increase the size of its database and now indexes
140 million files, the largest of any search engine.
-
HotBot is a good choice for media, geographic, and date searching,
and offers a convenient user fill-in template that easily handles complex
searches. HotBot has also increased the size of its database, and now claims
to index 110 million files. Like Infoseek, HotBot clusters results by presenting
one hit per site; unlike Infoseek, this option cannot be disabled.
-
Excite is known for the currency of its database and its use of
concept searching for gathering results. Its unique feature to Search
for more documents like this one allows you to locate files related
to any of your search results.
-
Lycos is one of the smaller search engine databases on the Internet
with 30 million Web pages indexed. Its Power Panel allows you to
control the factors of relevancy ranking. Lycos offers an impressive array
of options for term proximity searching. This search engine updates its
database weekly, making it one of the most current on the Web.
When to use AltaVista? When you want to search
a large full text database by keywords, phrase, or field, or perform a
complex Boolean search.
Characteristics:
-
Indexes the full text in all files in its database
-
In a simple (main screen) query, defaults to OR logic; + sign must precede
search terms for AND logic to apply
-
Treats each search term as it is presented. Truncation (*) allows searcher
to stem search terms
-
Engine is case sensitive, but lower case terms will retrieve upper case
terms
-
Can search the full text of Usenet newsgroups
-
Translates any text or pages retrieved as search results. This service
is known as Babelfish, and may be used as a standalone service located
at http://babelfish.altavista.digital.com/.
Available translations:
-
English to French
-
English to German
-
English to Italian
-
English to Portuguese
-
English to Spanish
-
French to English
-
German to English
-
Italian to English
-
Spanish to English
-
Portuguese to English
Strengths:
-
Very large database
-
Allows for complex Boolean searching with parentheses in Power Search
-
Advanced search offers term proximity searching
-
Offers Refine feature to add or exclude related terms in a subsequent
search
-
Offers searching within numerous fields such as anchor, applet, host, image,
link, text, title, URL
-
Allows results to be retrieved in numerous languages
-
Allows retrieval of pages by their last modified date; this is useful depending
on the frequency and comprehensiveness of the AltaVista index update (Advanced
search)
Weaknesses:
-
Relevancy ranking can be questionable
-
In Simple Search, switches default logic from OR to AND if two or more
fields are searched, e.g., title:mars url:nasa
-
Refine can be problematic [see below]
Search Engine Syntax
| Boolean Logic |
-
Keyword search defaulting to Boolean OR
-
Implied Boolean logic: + for Boolean AND, - for Boolean NOT
|
| Case Sensitivity |
-
Full case sensitivity supported
|
| Fields |
-
Page retrieval by language
-
Supported fields:
-
Anchor, e.g., anchor:White-House
-
Applet, e.g., applet:NervousText
-
Domain, e.g., domain:nz
-
Host, e.g., host:ibm.com
-
Image, e.g., image:clinton.jpg
-
Link, e.g., link:www.albany.edu/library/
-
Text, e.g., text:fungicide
-
Title, e.g., title:"New York Times"
-
URL, e.g., url:holocaust
|
| Phrases |
-
Phrases within double quotations, e.g., "budget battle"
|
| Truncation |
-
Mandatory truncation
-
Truncation character: *
-
Internal truncation supported, e.g., colo*r
|
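To see what the * character does, it can help to expand a truncated term against a word list. The Python sketch below borrows the standard `fnmatch` module, whose * wildcard behaves in a similar way. It is an approximation: any engine-specific limits on truncation are ignored, and the vocabulary is invented for the example.

```python
from fnmatch import fnmatchcase

def expand(pattern, vocabulary):
    """Return the vocabulary words that a * truncation pattern would match."""
    return [w for w in vocabulary if fnmatchcase(w, pattern)]

# Hypothetical vocabulary.
vocabulary = ["color", "colour", "colorado", "library", "librarian", "libraries"]

print(expand("colo*r", vocabulary))    # internal truncation
print(expand("librar*", vocabulary))   # truncation at the end of the term
```

Internal truncation picks up both spellings of color without pulling in colorado, while truncation at the end gathers every word that starts with librar.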
Advanced search
Click on Advanced option
All the features above with the following additions
| Boolean Logic |
-
Full Boolean logic
-
Boolean AND, OR, AND NOT, NEAR [terms within 10 words of each other] with
parentheses, e.g., behavior and (cats or felines)
|
| Fields |
|
Refine
AltaVista's Refine feature (also called Cow9) allows you to hone
in on your topic. Refine presents a thesaurus of terms related to your
search terms organized into concept categories. You may use these terms
to modify your original search. Refine terms are chosen from documents
retrieved by your search based on the frequency of appearance of these
terms in these documents. Most frequently-appearing terms are listed first.
You can either Require or Exclude terms listed in the various
topic groups to modify your subsequent search.
You may view your word list either in a text or graphical mode. The
graphical mode requires a Java-enabled browser.
Keep in mind these drawbacks:
-
You cannot choose terms to add to your search using Boolean OR logic. Any
terms you add to your original search will be combined with AND logic.
Be sure to add only a few words to your search with the Require
option or you may end up with zero results.
-
Only single words, and not phrases, are handled by Refine.
For more information, see the Refine help
page at AltaVista.
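The frequency-based selection of Refine terms described above can be approximated in Python as follows. This is a loose sketch of the general idea, not AltaVista's method: the `refine_terms` function and the sample texts are hypothetical, and the grouping of terms into concept categories is left out.

```python
from collections import Counter

def refine_terms(retrieved_texts, query_terms, top=3):
    """List the most frequent words in the retrieved documents,
    leaving out the original query terms."""
    skip = {t.lower() for t in query_terms}
    counts = Counter(
        word
        for text in retrieved_texts
        for word in text.lower().split()
        if word not in skip
    )
    return [word for word, _ in counts.most_common(top)]

# Hypothetical texts retrieved by a search on "mars".
retrieved = [
    "mars rover mission nasa",
    "mars mission launch nasa",
    "mars chocolate bar",
]
print(refine_terms(retrieved, ["mars"], top=2))
```

Requiring one of the suggested terms would drop the page about chocolate bars from the next search, which is the kind of narrowing that Refine is meant to support.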
Exercise: Advanced Query
Choose "Advanced" option
Perform the following search, typing the bolded words and choosing
the appropriate option:
Query: How is global warming affecting sea levels?
Search:
-
("global warming" or "climate change" or "global change") near "sea level"
Notes on this search:
-
The global warming terms are placed within parentheses to ensure that the
engine processes these terms as a logical unit before searching them in
relation to sea level.
-
Near will place the terms on either side within 10 words of each
other in the source file
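The logic of this search can be sketched in Python. The `near` function below is a hypothetical approximation of a proximity operator: it measures the distance between the first words of the two phrases, which may differ in detail from the way AltaVista counts words. The sample sentence is invented.

```python
def near(text, left, right, distance=10):
    """True if some occurrence of `left` starts within `distance` words
    of some occurrence of `right`."""
    words = text.lower().split()

    def positions(phrase):
        target = phrase.lower().split()
        n = len(target)
        return [i for i in range(len(words) - n + 1) if words[i:i + n] == target]

    return any(abs(i - j) <= distance
               for i in positions(left)
               for j in positions(right))

concepts = ["global warming", "climate change", "global change"]
page = "scientists say global warming is already raising the sea level along coasts"

# The parenthesized OR group is evaluated first, then related to "sea level".
print(any(near(page, c, "sea level") for c in concepts))
```

A page that mentions global warming in its first paragraph and sea level only in a distant footnote would satisfy AND but not NEAR, which is why proximity searching tends to return more relevant files.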
When to use HotBot? When you want to search
a very large database with field and media search options using a fill-in
template to guide your query.
Characteristics:
-
One of the largest search engine databases on the Internet, with 100+ million
files
-
Clusters results by presenting one hit per site
-
Includes its channel content with the results for searches on broad or
popular terms
-
Supports complex Boolean searches with parentheses, e.g., behavior AND
(cats OR felines). Boolean operators must be in CAPS.
-
Direct Hit feature provides a list of popular sites relating to
your query [appears with about 75% of search queries]
-
Offers limited case sensitive searches, e.g., the company NeXT
Strengths:
-
Offers a helpful user fill-in template to demonstrate the available search
options
-
Allows Boolean searches with parentheses
-
Offers unusual search options with often accurate results, including media
type, URL, geographic location, and personal name
-
Supports automatic truncation
-
Allows retrieval of pages by their last modified date; this is useful depending
on the frequency and comprehensiveness of the HotBot index update
Weaknesses:
-
Relevancy ranking is not always successful
-
Fitting subject terms to options such as media type is not always possible
Search Engine Syntax
Main screen
User fill-in template with supplied terminology
| Boolean Logic |
-
Boolean logic via template terminology:
-
all the words (Boolean AND)
-
any of the words (Boolean OR)
-
the person (proximity of terms)
-
Boolean phrase
-
Boolean phrase option supports AND, OR, NOT, and AND NOT; operators must be
entered in CAPS. Boolean searches with parentheses are supported with this
option, e.g., behavior AND (cats OR felines).
-
Implied Boolean logic: + for Boolean AND, - for Boolean NOT
|
| Case Sensitivity |
-
Limited support within words, e.g., NeXT
|
| Fields |
-
the page title
-
links to this URL
-
Date, i.e., last modified date
-
Language
-
Meta Words offer manual input of search restrictions such as media type
and date. For more information, see the Advanced Help file at http://www.hotbot.com/help/tips/search_features.asp
|
| Phrases |
-
Exact phrase
-
Phrases within double quotations, e.g., "budget battle". This is useful
for multi-term searches using the option all the words or any
of the words.
|
| Truncation |
-
No truncation symbol; engine stems each term
|
Super Search
All the features above with the following additions
| Boolean Logic |
-
Added windows for subject searching; click on + icon to add more windows
(2 may be added for a maximum of 4)
|
| Fields |
-
Date
-
Language
-
Location by domain or continent
-
Page type
-
File/program types including
-
Java
-
JavaScript
-
ActiveX
-
VRML
-
Acrobat
-
VB Script
-
File Extension
|
Exercise: Keyword search
Query: What can be done to prevent pollution from automobiles?
Search: automobile pollution prevention
Note that "all the words" is the default.
Exercise: More complex search
Query: What is the United Nations doing about chemical weapons? I
want to see files published in Adobe Acrobat format over the past two years
and posted on a site in Europe.
Click on Super Search
Perform the following search, typing the bolded words and choosing
the given options:
-
Search the Web for "exact phrase"
-
Type: chemical weapons
-
Choose "must contain" "the phrase" united nations
-
Date: specify after or on January 1, 1995
-
Location: Continent, choose "Europe"
-
Media Type: choose "Acrobat"
NOTE: Each file returned by this search will contain documents in
Acrobat (PDF) format, as well as the terms chemical weapons and
United Nations. Not all these elements, however, come together in
the same PDF file in every case. These search results illustrate a limitation
of HotBot, but also its unique strength in locating files of certain media
types.
When to use Excite? When you want to search
an up-to-date database with Boolean logic, keywords, or natural language,
or take advantage of Excite's concept searching when you don't know what
terms to use.
Characteristics:
-
Searches for concepts related to your search terms
-
Database is claimed to be rather current: top pages are visited weekly
by the crawler, and the rest are visited every three weeks
-
Includes its channel content with results for searches on broad or popular
terms
Strengths:
-
Offers multiple syntax options including Boolean operators, implied Boolean
+/-, and form-based Boolean searching.
-
Offers to Search for more documents like this one to retrieve pages
related to your search results. This is helpful when you find relevant
hits and want to see more on the same aspect of the topic.
-
With list of hits, offers the option to add displayed related terms to
the original search
-
Power Search interface offers the option to display the first 40 results
grouped by Web site
Weaknesses:
-
No field searching is available
-
Excite's concept searching may pull in irrelevant hits, though exact matches
are shown first in the list of hits
Search Engine Syntax
Main screen
| Boolean logic |
-
Keyword search defaulting to Boolean OR
-
Implied Boolean logic: + for Boolean AND, - for Boolean NOT
-
Full Boolean logic: Boolean AND, OR, AND NOT with parentheses, e.g., behavior
AND (cats OR felines). Boolean operators must be in CAPS.
|
| Case sensitivity |
|
| Fields |
|
| Phrases |
-
Phrases within double quotations, e.g., "budget battle"
|
| Truncation |
|
| Other |
-
With list of hits, option to add displayed related terms to the original
search; terms will be added with OR logic
-
Once list of hits is returned, option to View by Web Site (URL)
|
Advanced search
Click on Power Search
| Boolean Logic |
-
Boolean logic via template terminology:
-
My search results CAN contain the words, the name or phrase (Boolean
OR)
-
My search results MUST contain the words, the name or phrase
(Boolean AND)
-
My search results MUST NOT contain the words, the name or phrase
(Boolean NOT)
-
Option to add more constraints as above
|
| Case Sensitivity |
|
| Fields |
|
| Phrases |
-
Phrases within double quotations, e.g., "budget battle"
|
| Truncation |
|
Exercise: Single concept search using Excite's concept searching option
Query: I'm interested in learning about slavery.
Search:
-
Type: slavery
-
Choose a file that interests you and click on Search for more documents
like this one
-
Examine the relevancy of these new results
When to use Lycos? When you want to search
a Web page database with a variety of Boolean and term proximity options.
You can also control the relevancy ranking of your results.
Characteristics:
-
Crawler searches the Internet for sites that are frequently linked to from
other sites
-
Included in the relevancy rating is the frequency of linking to a site
from other Internet sites. These sites are shown first in your list of
search results.
-
Truncation is automatic. Place a period (.) after a search term to limit
it with no expansions.
-
Truncation is available using the $ symbol, e.g., librar$
Strengths:
-
Very current database
-
Offers more proximity operators than any other search engine on the Web;
these allow the user to specify the adjacency and order of search terms
-
Offers the opportunity to control the factors in the relevancy ranking
of results
Weaknesses:
-
One of the smaller databases with 30 million Web pages indexed
-
Not a full-text database; only certain portions of the Web page are indexed
-
Main screen query window is rather small
Search Engine Syntax
Main screen
| Boolean Logic |
-
Keyword search defaulting to Boolean AND
-
Implied Boolean logic: + for Boolean AND, - for Boolean NOT
|
| Case Sensitivity |
|
| Fields |
|
| Phrases |
-
Phrases within double quotations, e.g., "budget battle"
-
Phrases with stop words are searchable if other words also appear in the
phrase
|
| Truncation |
-
Truncation is optional; search engine stems search terms
-
Place a period (.) after the term to prevent truncation
-
Optional truncation character: $
|
| Other |
-
No searching a word that starts with a number, e.g., 11th
|
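The truncation rules above can be illustrated with a small Python sketch. It is not Lycos code: term_to_regex is a hypothetical helper, and the optional -s/-es ending is only a rough stand-in for whatever stemming the engine really performs.

```python
import re

def term_to_regex(term: str) -> re.Pattern:
    """Translate one search term into a whole-word regular expression."""
    if term.endswith("$"):
        # librar$ -> any word beginning with "librar"
        body = re.escape(term[:-1]) + r"\w*"
    elif term.endswith("."):
        # screen. -> the word exactly as typed, with no expansion
        body = re.escape(term[:-1])
    else:
        # Assumption: approximate the engine's stemming with optional -s/-es
        body = re.escape(term) + r"(?:es|s)?"
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)
```

Under these assumptions, librar$ finds "librarians", screen finds "screens", and screen. finds only the word "screen" itself.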
Advanced search
Click on Advanced Search
User fill-in template with supplied terminology
| Boolean Logic |
-
Boolean logic via template terminology:
-
Any of the words (Boolean OR)
-
All the words (Boolean AND)
-
All the words (Good match)
-
All the words (Near match)
-
All the words (Close match)
-
All the words (Strong match)
-
The last 4 items above refer to relative proximity
-
Implied Boolean logic: +/- for Boolean AND, NOT
-
Full Boolean logic: AND, OR, NOT with parentheses, e.g., behavior and
(cats or felines)
-
Several term proximity options:
-
NEAR (within 25 words); NEAR/n
-
ONEAR (within 25 words in exact query order); ONEAR/n
-
FAR (at least 25 words apart); FAR/n
-
OFAR (at least 25 words apart in exact query order); OFAR/n
-
ADJ (words next to each other); ADJ/n
-
OADJ (words next to each other in exact query order); OADJ/n
-
BEFORE (first query word appears in document before second query word)
-
n represents a user-specified number
|
| Case Sensitivity |
|
| Fields |
|
| Phrases |
-
The Exact Phrase
-
Phrases within double quotations, e.g., "budget battle"
-
Phrases with stop words are searchable if other words also appear in the
phrase
|
| Truncation |
-
Truncation is optional; search engine stems search terms
-
Place a period (.) after the term to prevent truncation
-
Optional truncation character: $
|
| Other |
-
Natural Language Query
-
Relevancy ranking: Click on Power Panel (HTML and Java versions)
to specify the relevancy ranking factors in terms of:
-
Match every word
-
Frequency of words
-
Appear early in text
-
Appear in title
-
Appear close together
-
Appear in exact order
|
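To make the proximity operators listed above more concrete, here is a minimal Python sketch that tests word positions in a tokenized document. The function names are hypothetical, and two details are assumptions rather than documented Lycos behavior: FAR is taken to mean that even the closest pair of occurrences is at least n words apart, and BEFORE compares first occurrences only.

```python
from itertools import product

def positions(tokens, word):
    """Return every index at which word occurs in the token list."""
    return [i for i, t in enumerate(tokens) if t == word]

def near(tokens, a, b, n=25, ordered=False):
    """NEAR/n: a and b within n words. ordered=True gives ONEAR/n."""
    for i, j in product(positions(tokens, a), positions(tokens, b)):
        gap = j - i
        if not ordered:
            gap = abs(gap)
        if 0 < gap <= n:
            return True
    return False

def adj(tokens, a, b, ordered=False):
    """ADJ: a and b next to each other. ordered=True gives OADJ."""
    return near(tokens, a, b, n=1, ordered=ordered)

def far(tokens, a, b, n=25):
    """FAR/n (assumed meaning): closest occurrences are at least n apart."""
    gaps = [abs(j - i)
            for i, j in product(positions(tokens, a), positions(tokens, b))]
    return bool(gaps) and min(gaps) >= n

def before(tokens, a, b):
    """BEFORE (assumed meaning): a first occurs earlier than b first occurs."""
    pa, pb = positions(tokens, a), positions(tokens, b)
    return bool(pa) and bool(pb) and pa[0] < pb[0]
```

With tokens = "screening the blood supply for aids".split(), adj(tokens, "supply", "blood") succeeds, but the ordered form adj(tokens, "supply", "blood", ordered=True) does not, because the words appear in the opposite order.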
Exercise: Simple keyword search
Query: What is being done to screen the blood supply for AIDS?
Search:
-
Type these words: aids "blood supply"
screen$
Note how the space between the search terms is interpreted as the Boolean
AND.
Northern Light is one of the most interesting search engines
on the Web. This search engine organizes results into Custom Search Folders
which allow the searcher to view results in clusters that are relevant
to the intent of the search.
When to use Northern Light? When you want to
search the full text of Web pages and view results by selecting folders
which group files into concept or site clusters.
Characteristics
-
Custom Search Folders group search results into:
-
Subject
-
Type, e.g., maps, press releases, product reviews, etc.
-
Source
-
Language
-
Results are relevancy ranked within each folder
-
A single relevancy ranked list of results is also available
-
The full text of Web pages is searchable
-
For a fee, users can choose to view the full text of files in a Special
Collection of 3,000+ journals and books
-
Offers Billboard Music
Search to search articles from Billboard, music sites on the
Web, press releases, music reviews, and job listings
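The folder idea can be pictured with a short Python sketch. It is only an illustration under assumed data: the record fields (subject, type, source, language, score) and the function build_folders are hypothetical, and Northern Light's real clustering is certainly more involved.

```python
from collections import defaultdict

def build_folders(results, key):
    """Group result records by one attribute and rank each folder by score."""
    folders = defaultdict(list)
    for record in results:
        folders[record[key]].append(record)
    for items in folders.values():
        # Relevancy ranking within each folder: highest score first
        items.sort(key=lambda r: r["score"], reverse=True)
    return dict(folders)

# Hypothetical sample records
results = [
    {"title": "El Nino outlook", "type": "press release", "language": "English", "score": 0.7},
    {"title": "Storm tracks", "type": "map", "language": "English", "score": 0.9},
    {"title": "Hurricane season", "type": "press release", "language": "English", "score": 0.8},
]
by_type = build_folders(results, "type")
```

Calling build_folders again with a different key, such as "language", regroups the same hits, which is roughly what choosing another folder category does for the searcher.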
Strengths
-
Custom Search Folders allow the user to choose those results that are generally
relevant and avoid those that are not
-
Some Custom Folders open up a new set of folders, allowing the user to
drill down to a specific level
Weaknesses
-
Some Custom Search Folders may be irrelevant to the user's query
-
Boolean searching with parentheses is not yet supported
Search Engine Syntax
Main Screen
| Boolean Logic |
-
Keyword searching defaulting to Boolean AND
-
Boolean AND, OR, NOT
-
Implied Boolean logic: + for Boolean AND, - for Boolean NOT
|
| Case Sensitivity |
|
| Fields |
-
URL, e.g., URL:whitehouse
-
Title, e.g., title:"White House"
-
Pub (searches certain Special Collections journals), e.g., pub:Billboard
-
Company (searches certain Special Collections journals), e.g., company:ibm
-
Ticker, e.g., ticker:ibm
-
Text, e.g., text:"White House"
|
| Phrases |
-
Phrases within double quotations, e.g., "budget battle"
|
| Truncation |
-
Automatic for common singular and plural word forms
-
* truncation character replaces multiple characters if the word contains
at least four characters, e.g., librar* will retrieve library, libraries,
librarian, librarians, librarianship, etc.
-
% truncation character replaces a single character, e.g., colo%r
|
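The two truncation characters above can be modeled as a translation into regular expressions. This Python sketch is illustrative only: wildcard_to_regex is a hypothetical helper, the four-character rule is read as four characters besides the *, and % is assumed to stand for exactly one character.

```python
import re

def wildcard_to_regex(term: str) -> re.Pattern:
    """Translate * (many characters) and % (one character) into a regex."""
    # Assumption: the four-character minimum counts characters other than *
    if "*" in term and len(term.replace("*", "")) < 4:
        raise ValueError("'*' needs at least four other characters in the word")
    parts = []
    for ch in term:
        if ch == "*":
            parts.append(r"\w*")   # zero or more word characters
        elif ch == "%":
            parts.append(r"\w")    # exactly one word character
        else:
            parts.append(re.escape(ch))
    return re.compile(r"\b" + "".join(parts) + r"\b", re.IGNORECASE)
```

Under these assumptions, librar* finds "librarianship" and colo%r finds "colour", while a short stem such as cat* is rejected.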
Power Search
Click on Power Search
User fill-in template with supplied terminology
All the features above with the following additions:
| Fields |
-
Words anywhere (searches full text)
-
Words in title
-
Words in URL
-
Country
-
Date
-
Language (French, German, Italian, Spanish)
|
| Other |
-
Limits results to various sources, topics and document types
|
Exercise: Multiple concept search
Query: Will El Nino have an effect on the hurricane season?
Search:
-
Type this search: +"el nino"
+hurricanes
-
Select: World Wide Web
-
When search is complete, view the yellow Custom Search Folders on the left
side of the screen. Click on the folder(s) that interest you.
-
Continue drilling down through the folders until you reach the level of
specificity you wish to achieve. View results on the right side of the
screen.
Multithreaded (Meta) Search Engines
Multithreaded search engines simultaneously search
multiple search engines. They are also referred to as parallel search
engines, mega-search engines, or meta-search engines. These are useful
when:
-
you have an obscure topic
-
you are not having luck finding anything when you search
-
your search is not complex
-
you want to retrieve as many documents as possible with one
search statement, subject to special features that may limit search results
It is important to note that many multithreaded search engines retrieve
only a fixed maximum number of documents from each engine they search and
cut off retrieval once that limit is reached.
Inference Find claims to
return the maximum number of results that its targeted search engines will
allow. In addition, many multithreaded search engines stop processing a
query after a certain amount of time. Other search engines give the user
a certain amount of control over the number of documents returned in a
search. All these factors have two implications:
-
Don't expect multithreaded search engines to return all the documents available
at the individual engines they have searched
-
Results retrieved by multithreaded search engines can be highly relevant,
since they are usually presenting the first items from the relevancy-ranked
list of hits returned by the individual search engines
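The two limits described above, a cap on hits per engine and a time limit on the whole query, can be sketched in Python. Nothing here reflects how any particular service is built: meta_search, fast_engine, and slow_engine are hypothetical, and the engines are local stub functions rather than real network calls.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait

def meta_search(engines, query, per_engine=10, timeout=1.0):
    """Query every engine in parallel, keep only the first per_engine hits
    from each, and ignore engines that miss the deadline."""
    pool = ThreadPoolExecutor(max_workers=max(1, len(engines)))
    futures = {pool.submit(fn, query): name for name, fn in engines.items()}
    done, _late = wait(futures, timeout=timeout)
    pool.shutdown(wait=False)  # do not block on engines that are still running
    results = {}
    for future in done:
        results[futures[future]] = future.result()[:per_engine]
    return results

def fast_engine(query):
    # Stub: returns 50 ranked hits immediately
    return [f"{query} hit {i}" for i in range(50)]

def slow_engine(query):
    # Stub: answers only after the deadline has passed
    time.sleep(1.0)
    return ["too late"]

out = meta_search({"fast": fast_engine, "slow": slow_engine},
                  "inuit", per_engine=10, timeout=0.2)
```

Here out holds ten hits from the fast stub and nothing from the slow one, which mirrors both implications: the list is incomplete, but what it contains comes from the top of each engine's ranking.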
The better multithreaded search engines remove duplicate files and give
you some information along with the document title. MetaCrawler is an excellent
choice.
When to use MetaCrawler? When you want a multithreaded
search engine that processes results fast, removes duplicates, and presents
results in relevancy ranked order.
Characteristics:
-
Simultaneously searches several major search engines and subject directories
-
Retrieves results by location (URL) in the Power Search interface
Strengths:
-
Results are collated into a single list
-
Duplicate URLs are removed from the list of located sites
-
Ranks search results by summing the confidence scores from the source databases
and presenting files in "voted" order
-
Processes results very fast
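The collating and "voted" ranking described above can be approximated as follows. This is a sketch under stated assumptions, not MetaCrawler's algorithm: the function collate and the (url, score) input shape are hypothetical, and duplicates are detected only by ignoring a trailing slash.

```python
from collections import defaultdict

def collate(results_by_engine):
    """Merge per-engine hit lists into one list ranked by summed confidence.

    results_by_engine maps an engine name to a list of (url, score) pairs.
    """
    totals = defaultdict(float)
    for hits in results_by_engine.values():
        for url, score in hits:
            # Assumption: URLs differing only by a trailing slash are duplicates
            totals[url.rstrip("/")] += score
    # Highest combined score first
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Hypothetical scores from two engines
hits = {
    "engine_a": [("http://x.example/", 0.9), ("http://y.example", 0.5)],
    "engine_b": [("http://x.example", 0.4), ("http://z.example", 0.7)],
}
ranked = collate(hits)
```

The page reported by both engines appears once and rises to the top, because its two scores are added together.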
Weaknesses:
-
No field searching available other than location
-
Retrieves a maximum of 10 pages per search engine in the basic search interface,
and 30 pages per search engine in Power Search
Search Engine Syntax
Main screen
| Boolean Logic |
-
Template terminology: any (Boolean OR), all (Boolean AND),
as a phrase
-
Implied Boolean logic: + for Boolean AND, - for Boolean NOT
|
| Case Sensitivity |
|
| Fields |
|
| Phrases |
-
Phrases within double quotations, e.g., "budget battle"; use with a
search for any or all
-
Exact phrases may not be searchable since some queried search engines ignore
stop words
|
| Truncation |
|
Power Search
Click on Power search interface
All options as above with the following addition:
Exercise: Simple search - Main Screen
Query: How has the seal fur boycott affected the Inuit Indians?
Search: fur boycott
Inuit
Note that all is the default.
Exercise: Advanced Search
Click on Power search interface
Query: Is there a problem with Internet addiction in Europe?
Search:
-
"Internet addiction"
-
Results From: choose Europe
Other recommended multithreaded search engines:
-
CYBER411
-
http://www.cyber411.com/
Searches 16 search engines and subject directories, and allows for
Boolean and proximity searching
-
INFERENCE FIND
-
http://www.inference.com/infind/
Searches multiple search engines and groups results by concept and
Internet site
-
METAFIND
-
http://www.metafind.com/
Searches 6 major search engines for a limited number of results
and sorts them alphabetically, by keyword, or by domain
-
PROFUSION
-
http://profusion.ittc.ukans.edu/
Searches 6 search engines, or the best three for your query, and gives
collated results; also stores queries at the site and e-mails notification
when new sites are found
-
SAVVY SEARCH
-
http://www.cs.colostate.edu/~dreiling/smartform.html
Searches multiple search engines and gives collated results if you
choose the option to "Integrate results"
For a summary of the instructions contained in this document,
see
Laura Cohen
lcohen@cnsvax.albany.edu