The Finer Points of Web Search Engines
Updated: 22 September 1998
This tutorial will cover some of the newer or lesser known options available
on several Web search engines. These options can help the searcher gain
control over search results and bring in more relevant hits. This tutorial
covers a handful of newer search engines along with several of the
better-known ones.
For a tutorial covering the more basic aspects of Web search engines,
see Searching the Internet: Recommended Sites and
Search Techniques.
Topics covered in this tutorial
-
Review of search syntax
-
Refining search results
-
Relevancy ranking
-
Storing queries for regular processing
Search engines covered in this tutorial
1. Review of search syntax
The most common form of search syntax employed on Web search engines is
keyword searching with implied Boolean syntax. This tutorial will concentrate
primarily on this type of search. For a fuller discussion of Boolean logic,
see Boolean Searching on the Internet.
Keyword searching refers to a search type in which you enter
terms representing the concepts you wish to retrieve. Boolean operators
are not used.
Implied Boolean logic refers to a search in which symbols are
used to represent Boolean logical operators. In this type of search on
the Internet, the absence of a symbol is also significant, as the space
between keywords defaults to either OR logic or AND logic. Many well-known
search engines default to OR, while others default to AND. For example:
Search engines defaulting to OR: AltaVista (main screen); Excite;
Infoseek; MetaCrawler
Search engines defaulting to AND: HotBot; Lycos; Northern Light
Exercise: Review of implied Boolean syntax using Infoseek
Infoseek Quick Facts
Online help: http://www.infoseek.com/Help?pg=HomeHelp.html
Infoseek is one of the most accurate search engines on the Internet.
It does a good job of processing all the aspects of a search query and
returning results in a useful relevancy ranked order.
Special Features:
-
Returns accurate results in very fast search processing time
-
Clusters all the retrieved URLs from one site into one result; users have
the option to view the other pages from the site
-
Truncates automatically so that the user does not have to remember to use
a truncation symbol
-
Offers straightforward search syntax on a simple or Advanced Search (template)
interface
-
Field searching is useful and accurate
-
Offers concept recognition for names, noun phrases, numbers, and word form
variants
-
Advanced Search interface offers a user-friendly template of search options
Drawbacks:
-
At 60 million files, not as large as other services
-
Query window on the main screen is rather small
Query #1: I'm looking for information about peace in Bosnia
Search:
a) peace Bosnia
[implied OR - this search is incorrect]
b) +peace +Bosnia
[implied AND - this search is correct]
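The difference between the two searches above can be sketched in a few lines of Python. This is only an illustration of implied Boolean logic, not how Infoseek or any other engine is actually implemented; the function name matches and the sample sentences are made up for the example.

```python
def matches(query, text, default="OR"):
    """Return True if text satisfies an implied-Boolean keyword query.

    Terms prefixed with '+' are required. Unprefixed terms are combined
    with the engine's default operator ("OR" or "AND").
    Hypothetical helper for illustration only.
    """
    words = set(text.lower().split())
    required, optional = [], []
    for term in query.lower().split():
        if term.startswith("+"):
            required.append(term[1:])
        else:
            optional.append(term)

    # Every '+' term must be present.
    if not all(term in words for term in required):
        return False
    if not optional:
        return True
    if default == "AND":
        return all(term in words for term in optional)
    # Default OR: one matching term is enough, unless '+' terms
    # have already qualified the page.
    return bool(required) or any(term in words for term in optional)


page = "peace talks in Ireland"
print(matches("peace Bosnia", page))    # implied OR: page is retrieved
print(matches("+peace +Bosnia", page))  # implied AND: page is rejected
```

On an engine that defaults to OR, search a) retrieves pages that mention peace but never mention Bosnia, which is why the plus signs are needed.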
2. Refining search results
Several search engines offer the option of refining existing search results.
This is useful if:
-
you have searched broadly and want to narrow your topic
-
you have retrieved too many hits and are looking for a way to reduce the
number of results
-
you wish to explore aspects of the topic and want to view a list of possibilities
Exercise: Narrow existing results on Infoseek
Query #2: I'm interested in the Dayton peace agreement for Bosnia.
Search:
To the search above:
+peace +Bosnia
-
Go to the top of any results page
-
Click on Search only within these xxxxxx pages
-
Type: title:Dayton
This option adds terms to your existing results using AND logic. Using
this option, you do not have to retype your original search in order to
add to it.
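As a rough sketch of what searching within results does, the Python below narrows an existing list of hits by one more term. The record layout (title and body keys), the function name refine, and the sample pages are assumptions made for this example; they do not describe Infoseek's internals.

```python
def refine(results, term):
    """Narrow an existing result list by one more term (AND logic).

    A term written as title:word is matched against the title only;
    a plain word is matched against the title or the body.
    Hypothetical helper for illustration only.
    """
    field, _, value = term.rpartition(":")
    value = value.lower()
    if field.lower() == "title":
        return [r for r in results if value in r["title"].lower()]
    return [
        r for r in results
        if value in r["title"].lower() or value in r["body"].lower()
    ]


# Made-up hits standing in for the results of +peace +Bosnia
hits = [
    {"title": "Dayton Peace Accords", "body": "peace agreement for Bosnia"},
    {"title": "Bosnia news", "body": "peace talks held in Dayton, Ohio"},
]
narrowed = refine(hits, "title:Dayton")
print([r["title"] for r in narrowed])
```

Because the function filters the list it is given, the original search never has to be retyped; each refinement can only keep or drop pages that were already retrieved.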
Exercise: Refine feature on AltaVista
AltaVista Quick Facts
Online help
-
Simple search: http://www.altavista.digital.com/av/content/help.htm
Advanced search: http://www.altavista.digital.com/av/content/help_advanced.htm
AltaVista is one of the most respected and popular general search engines
on the Web. It features numerous searchable fields, the ability to process
complex searches, and a powerful method of refining search results.
AltaVista continues to increase the size of its database, and now indexes 140
million files, the largest of any search engine.
Special features:
-
Very large database of 140 million Web pages
-
Offers an impressive selection of searchable fields, including language
and retrieval by last-modified date
-
Refine feature presents users with a selection of related search
terms to add to or exclude from a subsequent search
-
Offers unusual options such as searching by language and translating search
results
Drawbacks:
-
Relevancy ranking can be questionable
-
In Simple Search, switches default logic from OR to AND if two or more
fields are searched, e.g., title:mars url:nasa
-
Processes searches more slowly than Infoseek
Query #3: I'm looking for information about Watergate.
Search:
-
Type: Watergate
-
Click on Refine. Examine the term cluster lists. Some clusters are
relevant to the query, some are not. Notice that you can only choose to
require or exclude an entire list of terms.
-
Click on GRAPH. A Java version of this list will slowly appear.
Here, you can choose to require (green check mark) or exclude (red X) individual
terms.
-
Choose individual terms to require and/or exclude. Click on Search.
Examine your results.
-
You may also choose to Refine Again if you wish to further narrow
your results based on your selections before you search again.
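The effect of requiring and excluding individual terms can be sketched as a simple filter. This is a minimal illustration with made-up page summaries, assuming whole-word matching; AltaVista's actual Refine processing is not documented here.

```python
def apply_refinement(results, require=(), exclude=()):
    """Keep results that contain every required term and no excluded term.

    Hypothetical helper; results is a list of plain-text page summaries.
    """
    required = [term.lower() for term in require]
    excluded = [term.lower() for term in exclude]
    kept = []
    for text in results:
        words = set(text.lower().split())
        has_all_required = all(term in words for term in required)
        has_any_excluded = any(term in words for term in excluded)
        if has_all_required and not has_any_excluded:
            kept.append(text)
    return kept


# Made-up summaries standing in for hits on Watergate
hits = [
    "Watergate scandal Nixon resignation",
    "Watergate hotel rooms and rates",
    "Watergate hearings Nixon tapes",
]
print(apply_refinement(hits, require=["Nixon"], exclude=["hotel"]))
```

Requiring a term narrows the results in the same way as AND, while excluding a term removes pages on an unrelated sense of the search word.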
Exercise: Search for more documents feature on Excite
Excite Quick Facts
Online help: http://www.excite.com/Info/searching.html?a-tip-t
Excite is known for the relative currency of its database and its application
of concept searching to a user's query.
Special Features:
-
Offers multiple syntax options as well as form-based searching in its Power
Search interface
-
Engine's concept searching looks for terms and concepts related to a user's
search terms
-
Retrieves Web pages related to any file in the list of search results when
choosing to Search for more documents like this one
-
With list of hits, offers the option to add displayed related terms to
the original search
Drawbacks:
-
No field searching is available
-
Excite's concept searching may pull in irrelevant hits, though exact matches
are shown first in the list of hits
-
At 50 million files, not as large as other services
Query #4: I'm looking for information about nuclear waste.
Search:
-
Type: "nuclear waste"
-
Retrieve the first screen of results.
-
Note that a small number of search terms appear at the top of this screen.
Choosing one or more of these will add them to your search with the
Boolean OR. This will increase your number of results.
-
Choose an item in your list of hits that is of interest to you. Click on
Search for more documents like this one to retrieve related results.
-
You may choose Search for more documents... on subsequent screens
if this is helpful
3. Relevancy ranking
A few search engines give the user a certain amount of control over the
relevancy ranking of search results. Other engines offer an alternative
to relevancy ranking by organizing results into clusters or folders arranged
by concept, type of site, document type, etc.
A. Control over relevancy ranking
Exercise: Control of factors in the relevancy ranking of results with Lycos
Pro.
Lycos Quick Facts
Online help
-
Basic help: http://www.lycos.com/help/search-help.html
-
Advanced help:
-
http://www.lycos.com/help/lycospro-help.html
http://www.lycos.com/press/pro/query_parse.html
Lycos offers a search engine service with many options for customizing
searches and ranking the results.
Special Features:
-
One of the smaller search engine databases with 30 million Web pages indexed
-
Has one of the more current search engine databases on the Web
-
Offers more proximity operators than any other search engine on the Web;
these allow the user to specify the adjacency and order of terms in source
documents
-
Allows the user to control the factors in the relevancy ranking of results
-
Truncation is automatic unless the searcher specifies otherwise
-
Good relevancy ranking
Drawbacks:
-
Not a full-text database; only certain portions of Web pages are indexed
-
Accuracy is weak relative to other major search engines
-
Phrase searching can be tricky if stop words are included
Query #6: What has Clinton been doing about Bosnia?
Search:
-
Click on Advanced Search
-
Click on Power Panel
-
Notice the choices for scoring various relevancy ranking factors as Low,
Medium, or High
-
Click on Java Version
-
Type: Clinton
Bosnia
-
Apply ratings as you see fit
-
Conduct the search and evaluate the results.
The Power Panel does not affect the number of results, only the order in
which they appear.
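One way to picture this kind of control is a weighted score, sketched below in Python. The factor names (term_in_title, term_frequency), the numeric values assigned to Low, Medium, and High, and the sample pages are all assumptions made for the example; Lycos does not publish its formula in the material covered here.

```python
# Assumed numeric values for the three settings
WEIGHTS = {"low": 1, "medium": 2, "high": 3}


def rank(results, settings):
    """Order results by a weighted sum of their ranking factors.

    settings maps a factor name to "low", "medium", or "high".
    Hypothetical helper for illustration only.
    """
    def score(result):
        return sum(
            WEIGHTS[level] * result["factors"][name]
            for name, level in settings.items()
        )

    return sorted(results, key=score, reverse=True)


# Made-up pages with made-up factor values between 0 and 1
pages = [
    {"url": "page-a", "factors": {"term_in_title": 1.0, "term_frequency": 0.2}},
    {"url": "page-b", "factors": {"term_in_title": 0.0, "term_frequency": 0.9}},
]
title_first = rank(pages, {"term_in_title": "high", "term_frequency": "low"})
frequency_first = rank(pages, {"term_in_title": "low", "term_frequency": "high"})
print([p["url"] for p in title_first])
print([p["url"] for p in frequency_first])
```

Both calls return the same two pages. Changing the settings changes only which page comes first.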
B. Alternatives to relevancy ranking
Exercise: Grouping of results into concept folders with Northern Light
Online help:
Main Screen: http://www.northernlight.com/docs/prod_help.htm#simplesearch
Power Search: http://www.northernlight.com/docs/power_help.htm#powersearch
Northern Light organizes results into Custom Search Folders that represent
concepts and/or types of sites. Results within these folders are relevancy
ranked. With this system, you can ignore the folders that are irrelevant
and choose those that fit your query best. This may be more convenient
than working through one master list of results.
Special Features:
-
Sorts search results into folders by subjects, types (e.g., press releases,
maps), source sites, or languages
-
Within folder levels, a new group of folders is presented
-
Relevancy ranked results are available on the same screens as the folders
-
Offers a Special Collection database of relevant articles from 3000+ sources
for a fee
Drawbacks:
-
Database is relatively small (less than 50 million pages) but is growing
-
Folders may not be consistently useful for all queries; however, the user
can simply skip over irrelevant folders
Query #7: What are the prospects for peace in Bosnia?
Search:
-
Type: +peace +Bosnia
-
Choose the folder Bosnia-Herzegovina
-
Choose the folder Government Sites
-
Next, choose the Custom Search Folders that cover the aspects of this query
that interest you. Examine the results. Note that Special Collection documents
are available only for a fee, usually $1.
Exercise: Organization of results into concepts and/or types of sites with
Inference Find
Online help: http://www.inference.com/infind/boolean.html
Inference Find is a multithreaded search engine that searches six
search engines simultaneously. It merges the results, removes
duplicate files, and organizes the results into sections.
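The general approach of a multithreaded search engine can be sketched as follows. The two engine functions are stand-ins that return made-up results; a real service would send the query to each target engine over the network. Treating two URLs as duplicates when they differ only by a trailing slash is also an assumption of this sketch.

```python
from concurrent.futures import ThreadPoolExecutor


def metasearch(engines, query):
    """Query every engine in parallel, then merge and de-duplicate.

    engines is a list of callables that take a query string and return
    a list of (url, title) pairs. Hypothetical helper for illustration.
    """
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        # map() returns the result lists in the same order as engines
        result_lists = list(pool.map(lambda engine: engine(query), engines))

    seen = set()
    merged = []
    for results in result_lists:
        for url, title in results:
            key = url.rstrip("/")
            if key not in seen:  # keep only the first copy of each page
                seen.add(key)
                merged.append((url, title))
    return merged


# Stand-in engines returning made-up results
def engine_a(query):
    return [("http://example.org/a", "Page A"), ("http://example.org/b", "Page B")]


def engine_b(query):
    return [("http://example.org/b/", "Page B"), ("http://example.org/c", "Page C")]


for url, title in metasearch([engine_a, engine_b], "Bosnia and peace"):
    print(title, url)
```

Grouping the merged list into sections would be a further step applied to the output of this function.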
Special Features:
-
Retrieves the maximum number of results each search engine will allow by
searching target engines in parallel. For example, Infoseek is searched
three times in parallel.
-
Groups results into sections by concepts and/or by Web site, e.g., educational
institution, non-profit site, European site, federal government, etc.
-
Returns results quickly
Drawbacks:
-
Gives no syntax directions. Suggests the use of Boolean operators but cautions
about inconsistent results
-
List of results contains only titles of Web pages, so the relevancy of
the source document is not always easy to determine without visiting the
page
Query #8: What are the prospects for peace in Bosnia?
Search:
-
Type: Bosnia and peace
-
View results
Exercise: Organization of results by user choice with MetaFind
Online help: http://www.metafind.com/syntax.html
MetaFind is a multi-threaded search engine that searches six major search
engines for a limited number of results and offers unusual options for sorting
search results.
Special Features:
-
Results may be sorted by keyword, alphabetically, or by domain
-
Duplicate records are removed from search results
-
Boolean search options may be used, including the proximity limiter NEAR
-
Each file in the results list includes an indication of the ranking of
this file as retrieved from the source search engine
Drawbacks:
-
Retrieves a limited number of hits from six search engines, as follows:
10 links from AltaVista twice, 10 from Excite twice, 50 from HotBot, 25
from Infoseek, 30 from Planetsearch and 50 from Webcrawler
-
When sorting by keyword, the "Others" category can be rather large
Query #9: What are the prospects for peace in Bosnia?
Search:
-
Type: Bosnia peace
[MetaFind defaults to Boolean AND logic]
-
Choose: Sort by Domain
This search can be quite useful if you wish to obtain results from certain
types of sites. For example, you may wish to view sites on a particular
topic from only the edu domain in order to see more scholarly materials.
Unfortunately, most search engines cannot handle such a search because
there are too many pages within this domain for them to process. MetaFind
offers an interesting solution to this problem by first gathering pages
and then sorting them by domain.
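The gather-first, sort-afterward idea can be sketched with Python's standard urllib.parse module. The function names and sample URLs are made up for the example, and taking the last label of the host name as the domain is a simplification that suits addresses such as .edu, .gov, and .com.

```python
from urllib.parse import urlparse


def top_level_domain(url):
    """Return the last label of the host name, e.g. 'edu'."""
    host = urlparse(url).hostname or ""
    return host.rsplit(".", 1)[-1]


def sort_by_domain(urls):
    """Order already-gathered URLs by domain, then alphabetically."""
    return sorted(urls, key=lambda url: (top_level_domain(url), url))


def only_domain(urls, domain):
    """Keep just the URLs from one domain, such as 'edu'."""
    return [url for url in urls if top_level_domain(url) == domain]


# Made-up results standing in for a gathered list of hits
hits = [
    "http://www.example.gov/dayton",
    "http://www.example.com/bosnia",
    "http://www.example.edu/peace",
]
print(sort_by_domain(hits))
print(only_domain(hits, "edu"))
```

Because the sorting happens after the pages are gathered, it works on a short list of results and never requires an engine to process every page in a domain.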
4. Storing queries for regular processing
[NOTE: This feature is temporarily unavailable]
Exercise: Storing queries for regularly updated results with ProFusion
Online help: http://profusion.ittc.ukans.edu/help.html
ProFusion is a multi-threaded search engine that searches 9 search engines
and subject directories simultaneously and returns collated results. Queries
may be stored at the site using ProFusion Personal Assistant. These queries
can be run weekly, bi-weekly or monthly; ProFusion will e-mail a notification
of new results.
Special Features:
-
Searches nine search engines, or the three best that support a particular
query
-
Queries may be registered at the site using ProFusion Personal Assistant;
ProFusion will run them weekly, biweekly, or monthly, and inform the user
via e-mail when new sites are found
-
Users may enter keyword, Boolean, or phrase queries
-
Results are presented in relevancy ranked order
-
Offers an option to detect broken links and mark them in the list of results
Exercise: ProFusion
To store a query, click on the word here where it states, "Click
here to view your personalized search results or to register to rerun queries."
You will be asked to run your query once, and then rate each item in the
results list as either relevant or irrelevant. ProFusion will use this
ranking to conduct subsequent searches. [This service is temporarily
unavailable]
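The core of a stored-query service can be sketched in two small functions: one decides whether a query is due to run again, and one picks out the results that were not seen last time. This is an illustration only, not ProFusion's implementation; the function names are hypothetical, and treating bi-weekly as every 14 days and monthly as every 30 days is an assumption.

```python
from datetime import date, timedelta

# Assumed lengths for the three schedules
INTERVALS = {
    "weekly": timedelta(days=7),
    "biweekly": timedelta(days=14),
    "monthly": timedelta(days=30),
}


def is_due(last_run, today, schedule):
    """Return True if a stored query should be run again today."""
    return today - last_run >= INTERVALS[schedule]


def new_results(previous_urls, current_urls):
    """Return the URLs in the current run that were absent last time."""
    seen = set(previous_urls)
    return [url for url in current_urls if url not in seen]


# Made-up example of one stored query
last_run = date(1998, 9, 1)
today = date(1998, 9, 8)
if is_due(last_run, today, "weekly"):
    old = ["http://example.org/a", "http://example.org/b"]
    new = ["http://example.org/b", "http://example.org/c"]
    print(new_results(old, new))  # these would go into the e-mail notice
```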
Laura Cohen
lcohen@cnsvax.albany.edu