-
The Internet is a self-publishing medium. It is not a library of evaluated
publications selected by professionals. Rather, the Internet is a bulletin
board containing everything from the definitive to the spurious. Everything
you find must be evaluated for its suitability for research
use. For guidelines on how to do this, see Evaluating
Internet Resources.
-
Be sure to try out a handful of sites when researching a topic on the Internet.
Do not rely on only one site or one type of site.
-
Two major resources for locating Internet materials are the subject directory
and the search engine. Be sure you understand the difference:
Subject Directory
Definition: A subject directory is a database of Internet files
submitted by site creators or evaluators and organized into subject categories.
Most directories offer a search engine to query the database. The service
may or may not use selection criteria when choosing files to include in
the database.
When using subject directories, keep in mind that:
-
Subject directories differ significantly in selectivity. Consider the policies
of any directory that you visit. This can be difficult, because
not all directory services are willing to disclose their policies
or the names and qualifications of their site reviewers.
-
Many people make too little use of subject directories and go
straight to search engines. Keep in mind that some of the more academically oriented
subject directories contain carefully chosen and annotated lists of quality
Internet sites. Don't overlook subject directories when searching for quality
sites on the Internet.
The Argus Clearinghouse is
a good example of a subject directory. A more complete list may be found
on the page Internet
Subject Directories.
Search Engine
Definition: A search engine is a searchable database of Internet
files collected by a computer program (called a spider, crawler, robot,
worm, or wanderer). An index is built from the collected files, covering fields
such as title, full text, size, and URL. There are no selection criteria for the
collection of files.
A search engine might more accurately be called a search engine service
or a search service, since it consists of three components:
-
Spider: Program that traverses the Web from link to link, identifying
and reading pages
-
Index: Database containing a copy of each Web page gathered by the
spider
-
Search engine mechanism: Software that enables users to query the
index and that usually returns results in relevancy-ranked order
Infoseek is a good example of a
search engine. A more complete list may be found on the page Search
the Internet.
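The three components above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how Infoseek or any real engine works: the "Web" is a small hypothetical dictionary of pages and links, the index maps each word to the pages whose text contains it, and relevancy is crudely approximated by counting how many query words a page contains.

```python
from collections import defaultdict, deque

# Hypothetical in-memory "Web": URL -> (page text, outgoing links).
WEB = {
    "a.example/": ("birds and their migration routes", ["a.example/geese", "b.example/"]),
    "a.example/geese": ("geese migration in autumn", ["a.example/"]),
    "b.example/": ("bird feeders for your garden", ["c.example/"]),
    "c.example/": ("whale migration", []),
}

def spider(start):
    """Spider: traverse the Web from link to link (breadth-first), reading each page once."""
    seen, queue, pages = {start}, deque([start]), {}
    while queue:
        url = queue.popleft()
        text, links = WEB[url]
        pages[url] = text  # keep a copy of each page gathered
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages

def build_index(pages):
    """Index: map each word to the set of URLs whose full text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Search mechanism: rank pages by how many query words they contain (ties by URL)."""
    scores = defaultdict(int)
    for word in query.lower().split():
        for url in index.get(word, ()):
            scores[url] += 1
    return sorted(scores, key=lambda url: (-scores[url], url))
```

For example, `search(build_index(spider("a.example/")), "birds migration")` ranks the page containing both words first. Note that "bird" on `b.example/` does not match "birds", because this sketch does no stemming.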
-
Yahoo is one of the most popular sites
on the Web. It is the Web's largest subject directory. But beware of its
drawbacks:
-
Yahoo's staff does not evaluate content when choosing to add items to the
database; therefore scholarly sites are haphazardly mixed in with everything
else
-
When you do a search in Yahoo, you are searching only the title and the
short descriptive blurb about the site; by contrast, search engines usually
give you access to the full text of the document
-
Yahoo tends to index only the main (top-level) page of a site; therefore,
significant subsidiary pages on a related or different topic may not
show up in Yahoo
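The difference between searching only a title and short blurb and searching the full text can be illustrated with a small sketch. The two site listings and the function names below are hypothetical, and matching is simple case-insensitive substring matching.

```python
# Hypothetical listings: each site has a title, a short blurb, and its full text.
SITES = [
    {"url": "x.example/", "title": "Ornithology Home", "blurb": "Bird research links",
     "text": "Pages on bird banding, migration tracking, and field guides."},
    {"url": "y.example/", "title": "Bird Migration Atlas", "blurb": "Maps of flyways",
     "text": "Seasonal flyways of North American birds."},
]

def matches(site, term, fields):
    """True if term appears (case-insensitively) in any of the given fields."""
    return any(term.lower() in site[field].lower() for field in fields)

def directory_search(term):
    # Directory style: only the title and the short descriptive blurb are searched.
    return [s["url"] for s in SITES if matches(s, term, ("title", "blurb"))]

def full_text_search(term):
    # Search-engine style: the full text of the document is searched as well.
    return [s["url"] for s in SITES if matches(s, term, ("title", "blurb", "text"))]
```

Here `directory_search("migration")` finds only the atlas, while `full_text_search("migration")` also finds the ornithology site, whose title and blurb never mention migration.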
-
It is very helpful to understand the principles of Boolean search logic
when using a search engine on the Web. This search logic is manifested
in three distinct ways on Web search engines. Review Boolean
Searching on the Internet.
-
Other search
strategies are also worth examining if you want to use Web search
engines effectively. Be sure to check these out.
-
When you enter more than one word in a Web search engine, the space between
the words has a logical meaning that directly affects your results. This
is known as the default syntax. For example:
In AltaVista, Infoseek,
and Excite, a search on the words
birds migration
means that you will get back documents that contain either the word
birds, the word migration, or both. The space between the words defaults
to the Boolean OR. This is probably not what you want for this search.
In HotBot, Lycos,
and Northern Light, a search
on the words
birds migration
means that you will get back documents that contain both the words
birds and migration. The space between the words defaults to the Boolean
AND. This is more appropriate for this search.
Be sure you know the default syntax of the search engine you are using.
For an overview of the default syntax of major search engines, see Quick
Reference Guide to Search Engine Syntax.
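To see how the default syntax changes results, the sketch below applies the same two-word query to a tiny hypothetical document set, once with the space read as OR and once as AND. The documents and the function name are illustrative only.

```python
# Hypothetical documents, each reduced to the set of words it contains.
DOCS = {
    "d1": {"birds", "migration", "routes"},
    "d2": {"birds", "feeders"},
    "d3": {"whale", "migration"},
}

def search(query, default="AND"):
    """Read the spaces in query as Boolean AND or OR, like an engine's default syntax."""
    terms = query.lower().split()
    if default == "AND":
        hits = [d for d, words in DOCS.items() if all(t in words for t in terms)]
    elif default == "OR":
        hits = [d for d, words in DOCS.items() if any(t in words for t in terms)]
    else:
        raise ValueError("default must be 'AND' or 'OR'")
    return sorted(hits)
```

With OR as the default, `birds migration` also returns the bird-feeder and whale documents; with AND, only the document about bird migration comes back.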
-
A de facto search language is emerging among Web search engines, especially
for basic search (i.e., main screen) interfaces. When in doubt, use the
following syntax:
-
+ for mandatory words: +birds +migration
-
phrases within double quotation marks: "human rights"
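A minimal sketch of how a basic search box might interpret this syntax. The parsing rules are an assumption for illustration, not any particular engine's documented grammar: +word marks a required word, text in double quotation marks is a phrase that must appear as written, and everything else is optional. The function names are hypothetical.

```python
import re

# A token is either a double-quoted phrase (group 1) or a run of non-space characters (group 2).
TOKEN = re.compile(r'"([^"]*)"|(\S+)')

def parse_query(query):
    """Split a query into (required words, phrases, optional words)."""
    required, phrases, optional = [], [], []
    for m in TOKEN.finditer(query):
        phrase, word = m.group(1), m.group(2)
        if phrase is not None:
            normalized = " ".join(phrase.lower().split())
            if normalized:  # ignore empty quotes ""
                phrases.append(normalized)
        elif word.startswith("+") and len(word) > 1:
            required.append(word[1:].lower())
        else:
            optional.append(word.lower())
    return required, phrases, optional

def matches(text, query):
    """True if text contains every required word and every phrase (punctuation is not stripped)."""
    required, phrases, _ = parse_query(query)
    words = text.lower().split()
    padded = " " + " ".join(words) + " "  # pad so phrases match only on word boundaries
    return all(w in words for w in required) and all(f" {p} " in padded for p in phrases)
```

For example, `parse_query('+birds +migration')` yields two required words, and `matches("rights of humans", '"human rights"')` is false because the phrase does not appear in that order.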
-
Search engines offer numerous features that help you home in on what you
want. For a review of these features, and the search engines that support
them, see How
to Choose a Search Engine or Research Database.
-
If you have too many search results, or results that are not relevant:
-
Add concept words
-
Use vocabulary that is specific to your topic, e.g., Honda rather
than cars.
-
Link appropriate terms with the Boolean AND (+) so that each term is
required to appear in the record
-
Choose an option requiring exact term matches
-
Use term proximity operators if they are available
-
Narrow your search to individual parts of the Web page such as title, first
page level, etc.
-
Use the Boolean NOT to keep out records containing terms you don't want
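Several of the narrowing techniques above (AND, NOT, and proximity) can be sketched as filters over a small hypothetical document set. Here proximity is modeled as "within k words of each other"; this is an assumption for illustration, since real engines define their proximity operators in their own ways.

```python
# Hypothetical documents, each a short piece of text.
DOCS = {
    "d1": "honda civic engine repair manual",
    "d2": "honda motorcycle engine repair",
    "d3": "the honda civic engine really needs a repair",
}

def near(words, a, b, k):
    """True if words a and b occur within k word positions of each other."""
    positions_a = [i for i, w in enumerate(words) if w == a]
    positions_b = [i for i, w in enumerate(words) if w == b]
    return any(abs(i - j) <= k for i in positions_a for j in positions_b)

def narrow(docs, required=(), excluded=(), near_pair=None, k=3):
    """Keep documents containing every required word (AND), none of the excluded
    words (NOT), and, if near_pair is given, its two words within k words of each other."""
    kept = []
    for name, text in docs.items():
        words = text.lower().split()
        if not all(w in words for w in required):
            continue
        if any(w in words for w in excluded):
            continue
        if near_pair is not None:
            a, b = near_pair
            if not near(words, a, b, k):
                continue
        kept.append(name)
    return kept
```

Each added condition shrinks the result set: requiring "honda" and "repair" keeps all three documents, excluding "motorcycle" drops one, and requiring the two words to be close drops another.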
-
If you have too few search results:
-
Drop the least important concept(s) to broaden your search
-
Use more general vocabulary
-
Add alternate terms or spellings for individual concepts and connect with
the Boolean OR
-
Choose an option allowing for loose or concept matches
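Adding alternate terms joined with OR, while keeping separate concepts joined with AND, can be sketched as follows. The table of alternates and the function names are hypothetical; in practice you would supply your own synonyms and spellings.

```python
# Hypothetical alternate terms and spellings for two concepts.
ALTERNATES = {
    "car": ["car", "cars", "automobile", "automobiles"],
    "color": ["color", "colour"],
}

def build_query(concepts):
    """Join each concept's alternates with OR, then join the concepts with AND."""
    groups = []
    for concept in concepts:
        terms = ALTERNATES.get(concept, [concept])
        groups.append("(" + " OR ".join(terms) + ")" if len(terms) > 1 else terms[0])
    return " AND ".join(groups)

def matches(text, concepts):
    """True if, for every concept, at least one of its alternates appears in text."""
    words = text.lower().split()
    return all(any(t in words for t in ALTERNATES.get(c, [c])) for c in concepts)
```

The parentheses matter: they keep each OR group together so that every concept is still required. A document about "automobile colour charts" now matches a search on the concepts car and color, even though it uses neither original word.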
-
Don't be impressed by a large number of hits in response to a search. Often
multiple pages are returned from a single site because they all contain
your search terms. Infoseek and
HotBot avoid this by a technique called
results grouping, whereby all the results from one site are clustered
together into one result. You are then given the opportunity to view all
the retrieved pages from that site if you choose. With these engines, you
may get a smaller number of results from a search, but each result comes
from a different site.
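Results grouping can be sketched as clustering a relevancy-ranked list of URLs by host name, so that each site appears once at the position of its best-ranked page and its other pages are kept with it. This illustrates the idea only; it is not Infoseek's or HotBot's actual method, and the URLs are hypothetical.

```python
from urllib.parse import urlparse

def group_by_site(ranked_urls):
    """Cluster ranked URLs by host; sites keep the order of their first (best) hit."""
    groups = {}  # host -> URLs from that host, in rank order (dicts keep insertion order)
    for url in ranked_urls:
        host = urlparse(url).netloc.lower()
        groups.setdefault(host, []).append(url)
    return list(groups.items())
```

Given four hits where three come from the same site, the grouped list has only two entries, one per site, with the extra pages available under the first.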
-
Multithreaded search engines simultaneously search multiple search engines.
They are also referred to as parallel search engines, mega-search engines,
or meta-search engines. These are useful when:
-
you have an obscure topic
-
you are having no luck finding anything when you search
-
your search is not complex
-
you want to retrieve as many documents as possible with one search statement,
subject to special features that may limit search results
-
Many multithreaded search engines retrieve only a fixed maximum number of
documents from each of the individual engines they search, cutting off the results
at a certain point as the search is processed. Inference
Find claims to return the maximum number of results that its targeted
search engines will allow. In addition, many multithreaded search engines
stop processing a query after a certain amount of time. Some multithreaded
engines give the user a degree of control over the number of documents
returned in a search. All these factors have two implications:
-
Multithreaded search engines often do not return all the documents available
at the individual engines they have searched.
-
Retrieved results can be highly relevant, since they are usually
the first items from the relevancy-ranked lists of hits returned
by the individual search engines.
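The effect of per-engine cutoffs can be sketched with hypothetical result lists. The merge strategy here (round-robin: each engine's first hit, then each engine's second hit, and so on) is one simple choice assumed for illustration; real multithreaded engines use their own methods, and the network queries and time limits are omitted.

```python
from itertools import zip_longest

def metasearch(engine_results, per_engine_cap=3):
    """Merge relevancy-ranked result lists from several engines.

    engine_results maps an engine name to its list of URLs, best first
    (hypothetical data standing in for live queries). Only the top
    per_engine_cap hits from each engine are used, so lower-ranked hits are
    never seen: results are incomplete but drawn from each engine's best matches.
    """
    capped = [results[:per_engine_cap] for results in engine_results.values()]
    merged = []
    # Round-robin: first hit of every engine, then second hit of every engine, ...
    for tier in zip_longest(*capped):
        merged.extend(url for url in tier if url is not None)
    return merged
```

With a cap of 3, an engine's fourth-ranked hit never reaches the merged list, which is exactly the trade-off described in the two implications above.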
-
The better multithreaded search engines remove duplicate files and give
you some information along with the document title. To see a list of multithreaded
search engines, visit Search
the Internet.
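Duplicate removal can be sketched by reducing each URL to a normalized key and keeping only the first result with each key. The normalization rules below (case-insensitive scheme and host, no trailing slash, no #fragment) are assumptions for illustration; real engines may compare documents differently.

```python
from urllib.parse import urlsplit

def normalize(url):
    """Reduce a URL to a comparison key (assumed rules: lower-case scheme and host,
    strip trailing slashes from the path, drop any #fragment, keep the query)."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    key = f"{parts.scheme.lower()}://{parts.netloc.lower()}{path}"
    return key + ("?" + parts.query if parts.query else "")

def remove_duplicates(results):
    """Keep the first (title, url) result for each normalized URL, preserving order."""
    seen, unique = set(), []
    for title, url in results:
        key = normalize(url)
        if key not in seen:
            seen.add(key)
            unique.append((title, url))
    return unique
```

Three spellings of the same bird-migration page collapse to one entry, while a different page on the same site is kept.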
-
Many search engines offer higher-end features that allow you to fine-tune
your searches. To view a tutorial about these features, see The
Finer Points of Web Search Engines.
Laura Cohen | October 1998