Conducting Research on the Internet
September 1998
The Internet provides access to a wealth of information on countless
topics contributed by people throughout the world. On the Internet, a user
has access to a wide variety of services: electronic mail, file transfer,
vast information resources, interest group membership, interactive collaboration,
multimedia displays, and more. The Internet consists primarily of a variety
of access protocols. These include e-mail, FTP, HTTP, Telnet, and Usenet
news. Many of these protocols feature programs that allow users to search
for and retrieve material made available by the protocol.
For background information on Internet access protocols, see A
Basic Guide to the Internet.
The Internet is not a library in which every available item is identified
and can be retrieved through a single catalog. In fact, no one knows how many
individual files reside on the Internet. The number certainly runs into
the many millions and is growing at a rapid pace.
The Internet is a self-publishing medium. This means that anyone with
a small amount of technical skill and access to a host computer can publish
on the Internet. It is important to remember this when you locate sites
in the course of your research. Internet sites change over time according
to the commitment and inclination of the creator. Some sites demonstrate
an expert's knowledge, while others are amateur efforts. Some may be updated
daily, while others may be outdated. As with any information resource,
it is important to evaluate what you find on the Internet. For more information,
see Evaluating Internet Resources.
Also be aware that the addresses of Internet sites frequently change.
Web sites can disappear altogether. Do not expect stability on the Internet.
One of the most efficient ways of conducting research on the Internet
is to use the World Wide Web. Since the Web includes most Internet protocols,
it offers access to a great deal of what is available on the Internet.
HOW TO FIND INFORMATION ON THE INTERNET
There are five basic ways to access information on the Internet:
-
Join an e-mail discussion group or Usenet newsgroup
-
Go directly to a site if you have the address
-
Browse
-
Explore a subject directory
-
Conduct a search using a Web search engine
Each of these options is described below.
1. JOIN AN E-MAIL DISCUSSION GROUP OR USENET NEWSGROUP
Join any of the thousands of e-mail discussion groups or Usenet newsgroups.
These groups cover a wealth of topics. You can ask questions of the experts
and read the answers to questions that others ask. Belonging to these groups
is somewhat like receiving a daily newspaper on topics that interest you.
These groups provide a good way of keeping up with what is being discussed
on the Internet about your subject area. In addition, they can help you
find out how to locate information--both online and offline--that you want.
E-mail discussion groups tend to be associated with academic institutions.
Many topics are scholarly in nature, and it is not unusual for experts
in the field to be among the participants. In contrast, Usenet newsgroups
cover a far wider variety of topics and participants have a range of expertise.
Be careful to evaluate the knowledge and opinions offered in any discussion
forum. Note also that a small number of e-mail groups are cross-posted
as Usenet newsgroups. For example, the early music e-mail group EARLYM-L
also exists as the newsgroup rec.music.early.
E-mail discussion groups are managed by software programs. There are
three in common use: Listserv, Majordomo, and Listproc. The commands for
using these programs are similar. For information on how to use listserver
software, see the tutorial Internet from the VAX Prompt.
A list of Usenet newsgroups can be accessed from within a newsreader
program. Using the RN reader on Unix, for example, you can type dir/group/all
and receive a list of every newsgroup to which the University subscribes.
For more information on the RN newsreader, see Using
the RN Newsreader from the VAX.
A good Web-based directory to assist in locating e-mail discussion groups
and Usenet newsgroups is Liszt, located at http://www.liszt.com/.
2. GO DIRECTLY TO A SITE IF YOU HAVE THE ADDRESS
If you know the Internet address of a site you wish to visit, you can use
a Web browser to access that site. All you need to do is type the URL in
the appropriate location window. URL stands for Uniform Resource Locator.
The URL specifies the Internet address of the electronic document. Every
file on the Internet, no matter what its access protocol, has a unique
URL. Web browsers use the URL to retrieve the file from the host computer
and the directory in which it resides. This file is then displayed on the
user's computer monitor.
This is the format of the URL: protocol://host/path/filename
For example:
http://cedr.lbl.gov/cdrom/doc/cdrom.html    a hypertext file on the Web
ftp://bongo.cc.utexas.edu/microlib          a file at an FTP site
telnet://library.albany.edu                 a Telnet connection
Any of these addresses can be typed into the location window of a Web browser.
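The parts of a URL can also be separated by a program. The following is a
minimal sketch in Python using the standard urllib.parse module. The helper
name split_url is an assumption made for this illustration and does not come
from any of the tools mentioned in this guide.

```python
from urllib.parse import urlsplit

def split_url(url):
    """Split a URL into the protocol, host, and path parts of the format above."""
    parts = urlsplit(url)
    return {"protocol": parts.scheme, "host": parts.hostname, "path": parts.path}

# The three example addresses given in this section.
for example in (
    "http://cedr.lbl.gov/cdrom/doc/cdrom.html",
    "ftp://bongo.cc.utexas.edu/microlib",
    "telnet://library.albany.edu",
):
    print(split_url(example))
```

Note that the Telnet address has an empty path, since it names only a host
to connect to.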
3. BROWSE
Browsing home pages on the Web is a haphazard but interesting way of finding
desired material on the Internet. Because the creator of a home page programs
each link, you never know where these links might lead. High quality starting
pages will contain high quality links. The University Libraries Home Page
contains quality links leading into the World Wide Web, and is a good place
to start your exploration. This site is located at http://www.albany.edu/library/.
4. EXPLORE A SUBJECT DIRECTORY
An increasing number of universities, libraries, companies, organizations,
and even volunteers are creating subject directories to catalog portions
of the Internet. These directories are organized by subject and consist
of links to Internet resources relating to these subjects. The major subject
directories available on the Web tend to have overlapping but different
databases. Most directories provide a search capability that allows you
to query the database on your topic of interest.
Subject directories differ significantly in selectivity. For example,
the famous Yahoo! site does not consider content when adding Web pages
to its database. In contrast, the Argus Clearinghouse collects and rates
subject guides often compiled by experts. Consider the policies of any
directory that you visit. One challenge to this is the fact that not all
directory services are willing to disclose either their policies or the
names and qualifications of site reviewers. A number of subject directories
consist of links accompanied by annotations that describe or evaluate site
content. A well-written annotation from a known reviewer is more useful
than just a list of links.
The University Libraries Home Page includes a list of recommended subject
directories, located at http://www.albany.edu/library/internet/subject.html.
Among the more prominent and useful directories, these are recommended
starting points:
-
The Argus Clearinghouse is
one of the highest quality subject directories on the Internet. This site
consists of rated collections of recommended sites organized into subject-specific
guides. The guide authors are often specialists in the field. This site
is highly recommended for academic research.
-
The WWW Virtual Library
is one of the oldest and most respected subject directories on the Web.
This directory consists of individual subject collections, many of which
are maintained at universities throughout the world.
-
INFOMINE is a large directory of
Web sites of scholarly interest compiled at the University of California,
Riverside. The directory may be browsed or searched by subject, keyword,
or title. Each site listed is accompanied by a description.
-
If you want to explore a large number and variety of sources, try Yahoo!.
This is the largest subject collection on the Internet. Yahoo! indexes
hundreds of thousands of links and organizes them in a hierarchical subject
directory. Be aware, however, that Yahoo! accepts most sites submitted
to it and does not check for quality or authority. In addition, Yahoo! does
not attempt to provide a comprehensive listing on any topic. Its coverage
of academic subject areas is generally sporadic. Certainly the Yahoo! database
includes links to excellent resources. Each resource found here, however,
must be evaluated carefully.
5. CONDUCT A SEARCH USING A WEB SEARCH ENGINE
An Internet search engine allows the user to enter keywords relating to
a topic and retrieve information about Internet sites containing those
keywords. Search engines are available for many of the Internet protocols.
Archie searches for files stored at anonymous FTP sites. Veronica and Jughead,
now of mainly historical interest, search Gopherspace.
Search engines located on the World Wide Web have become quite popular
as the Web itself has become the Internet environment of choice. Web
search engines have the advantage of offering access to a vast range of
information resources located on the Internet. Many search engines compile
a database spanning multiple Internet protocols, including HTTP, FTP, and
Usenet. Web search engines tend to be developed by private companies, though
most of them are available free of charge.
A Web search engine service consists of three components:
-
Spider: Program that traverses the Web from link to link, identifying
and reading pages
-
Index: Database containing a copy of each Web page gathered by the
spider
-
Search engine: Software that enables users to query the index and
that usually returns results in relevancy ranked order
Keep in mind that spiders are indiscriminate. Be aware that some of the
resources they collect may be outdated, inaccurate, or incomplete. Others,
of course, may come from responsible sources and provide you with valuable
information. Be sure to evaluate all your search results carefully.
With most search engines, you fill out a form with your search terms
and then ask that the search proceed. The engine searches its index and
generates a page with links to those resources containing some or all of
your terms. These resources are usually presented in relevancy ranked order.
A new development in search engine technology is the ordering of search
results by concept, keyword, or site.
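The index and search engine components described above can be illustrated
with a toy example. This is a rough sketch only: the pages are a hypothetical
in-memory collection standing in for what a spider might gather, and ranking
by the number of matching terms is a simplifying assumption, not the method
used by any engine named in this guide.

```python
from collections import defaultdict

# Hypothetical pages standing in for what a spider might have gathered.
PAGES = {
    "http://example.org/a": "budget negotiations between Clinton and the Republicans",
    "http://example.org/b": "Clinton budget deal reached",
    "http://example.org/c": "early music discussion group",
}

def build_index(pages):
    """Map each lowercase word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return URLs containing some or all query terms, best matches first."""
    scores = defaultdict(int)
    for term in query.lower().split():
        for url in index.get(term, ()):
            scores[url] += 1
    # Sort by number of matching terms (descending), then by URL for stable output.
    return sorted(scores, key=lambda url: (-scores[url], url))

index = build_index(PAGES)
print(search(index, "Clinton budget negotiations"))
```

A page that matches all three terms is listed ahead of a page that matches
only two, and a page that matches none is left out.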
All search engines have rules for formulating queries. It is imperative
that you read the help files at the site before proceeding. Online tutorials
can also help you learn the rules. A short list of recommended tutorials
appears at the end of this file.
Among the more prominent and useful search engines, these are recommended
starting points:
-
Start with Infoseek. This is a very
quick and accurate search engine that offers several field searching options
and simplified keyword and phrase searching. Infoseek clusters all your
results from one site into one result, so that each result comes from a
different site. This makes it easy to scan for a variety of documents brought
in by your search.
-
AltaVista is another good
choice. This engine has a very large database, handles sophisticated Boolean
searches, and offers several field searching options.
-
Also try Northern Light. This
unique engine clusters results into Custom Search Folders, which contain
specific subtopics and sites retrieved by your search. Northern Light therefore
gives you quick access to aspects of your topic that interest you. Excite
and HotBot are also recommended services to try.
-
MetaCrawler is a good site to
try if your topic is obscure or if you want to retrieve results from a
variety of search engines with a single search statement. This service
searches multiple search engines simultaneously and offers useful search
options. MetaCrawler returns your results in a single list and removes
the duplicate files. This type of search processing is called multi-threaded
searching. Other recommended multi-threaded search engines include Inference
Find, MetaFind and ProFusion.
-
Try The Internet Sleuth if you
want to search a topic-oriented database. This site offers access to hundreds
of searchable databases in several subject areas.
For a more extensive list of recommended Web search engines, see Search
the Internet.
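The merging step performed by a multi-threaded search service can be sketched
as follows. This is an illustration under stated assumptions: the function
name merge_results and the two result lists are hypothetical, no real service
is queried, and the lists are simply concatenated in order rather than
re-ranked.

```python
def merge_results(result_lists):
    """Combine several ranked result lists into one, dropping duplicate URLs."""
    seen = set()
    merged = []
    for results in result_lists:
        for url in results:
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged

# Hypothetical results from two engines.
engine_a = ["http://example.org/1", "http://example.org/2"]
engine_b = ["http://example.org/2", "http://example.org/3"]
merged = merge_results([engine_a, engine_b])
print(merged)
```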
PRACTICAL STEPS: WORLD WIDE WEB SEARCH ENGINES
HOW TO FORMULATE QUERIES
There are three steps to a computer database search:
-
Identify your concepts
When conducting any database search, you need to break down your topic
into its component concepts. For example, if you want to find information
on the budget negotiations between President Clinton and the Republicans,
these are your concepts: CLINTON, REPUBLICANS, BUDGET.
-
List keywords for each concept
Once you have identified your concepts, you need to list keywords which
describe each concept. Some concepts may have only one keyword, while others
may have many.
For example:
CLINTON      REPUBLICANS      BUDGET
             HOUSE SPEAKER    BUDGET NEGOTIATIONS
                              BUDGET BATTLE
                              BUDGET IMPASSE
                              BUDGET DEAL
Depending on the focus of your search, there may be other keywords you
would wish to use.
-
Specify the logical relationships among your keywords
Once you know the keywords you want to search, you need to establish
the logical relationships among them. The formal name for this is Boolean
logic. Boolean logic allows you to specify the relationships among search
terms by using any of three logical operators: AND, OR, NOT.
Search Statement                 Result of search
World War I AND World War II     Files containing both these terms
World War I OR World War II      Files containing at least one of these terms
World War I NOT World War II     Files containing the term World War I but
                                 not also the term World War II
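The three operators correspond to familiar set operations. The sketch below
uses Python sets; the file names are hypothetical and serve only to show
which files each operator would keep.

```python
# Hypothetical file names, used only for illustration.
world_war_i = {"file1", "file2", "file3"}    # files containing "World War I"
world_war_ii = {"file3", "file4"}            # files containing "World War II"

both = world_war_i & world_war_ii        # AND: files containing both terms
either = world_war_i | world_war_ii      # OR: files containing at least one term
only_first = world_war_i - world_war_ii  # NOT: World War I but not World War II

print(sorted(both))
print(sorted(either))
print(sorted(only_first))
```

AND narrows a search, OR broadens it, and NOT removes files that mention the
unwanted term.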
Some search engines offer Boolean searching without mentioning the logical
operators by name. For example, you might be asked to list your search
terms and choose that "All of these terms" be searched. This denotes AND
logic. Specifying "Any of these terms" denotes OR logic. Other search engines
use a type of implied Boolean logic, in which symbols or spaces are used
to denote logical relationships.
Certain search engines allow you to use a proximity operator. This is a
type of AND logic which specifies the distance between words in a source
file. For example, AltaVista and Lycos let you use the NEAR operator. Consider
this search: Clinton NEAR budget. In AltaVista, the two terms must
be within 10 words of each other in the source file. Lycos allows user-specified
distances. Use of this option can help you gain relevance in your search
results.
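A proximity test of this kind can be sketched in a few lines. This is not the
code of AltaVista or Lycos; the function name near, the way punctuation is
stripped, and the treatment of the distance as an inclusive count of word
positions are all assumptions made for the illustration.

```python
def near(text, first, second, distance=10):
    """Return True if the two words occur within `distance` words of each other."""
    # Strip common punctuation so "budget." still matches "budget".
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]
    return any(
        abs(i - j) <= distance
        for i in first_positions
        for j in second_positions
    )

sample = "Clinton met Republican leaders to discuss the budget."
print(near(sample, "Clinton", "budget"))              # default distance of 10
print(near(sample, "Clinton", "budget", distance=3))  # user-specified distance
```

The distance parameter plays the role of the user-specified distance
mentioned above.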
Most Web search engines cannot handle a single search statement that
includes all the terms listed in Step 2 above. You may need to repeat your
search a few times using terms in different combinations until you get
results that are satisfactory. For example, you may start with CLINTON,
REPUBLICANS, BUDGET NEGOTIATIONS and connect these terms with AND logic.
Take a look at your results. If you are not finding what you want, repeat
the search with alternative keywords for the budget concept. Your initial
results may give you ideas about which new terms to try.
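Trying terms in different combinations can be made systematic. The sketch
below takes one keyword from each concept and joins them with AND. The
grouping of keywords by concept follows Step 2, with HOUSE SPEAKER assumed to
belong to the Republicans concept. Placing phrases in quotation marks is also
an assumption, so check the help files of your search engine for its actual
phrase syntax.

```python
from itertools import product

# Keywords grouped by concept (grouping assumed from Step 2).
KEYWORDS = {
    "Clinton": ["CLINTON"],
    "Republicans": ["REPUBLICANS", "HOUSE SPEAKER"],
    "budget": ["BUDGET NEGOTIATIONS", "BUDGET BATTLE", "BUDGET IMPASSE", "BUDGET DEAL"],
}

def and_queries(keywords_by_concept):
    """Yield one AND query per combination of one keyword from each concept."""
    for combo in product(*keywords_by_concept.values()):
        # Quote multi-word keywords so they read as phrases.
        yield " AND ".join(f'"{k}"' if " " in k else k for k in combo)

queries = list(and_queries(KEYWORDS))
for query in queries:
    print(query)
```

The first query produced is the starting search suggested above, and the
remaining ones substitute alternative keywords one concept at a time.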
For more information on formulating searches, see Boolean
Searching on the Internet.
TIPS ON CONDUCTING SEARCHES
-
Read the directions at each search site. The technique for formulating
a search depends on the search engine you are using. There is a wide variety
of options available among the different search engines.
-
If you have a multi-term search, be sure to determine which type of Boolean
logic you should use. For example, a search about the relationship between
latitude and temperature would be formulated as: +latitude +temperature
on many Web search engines in order for AND logic to apply.
-
Include synonyms or alternate spellings in your search statements and connect
these terms with OR logic.
-
Check your spelling.
-
Take advantage of capitalization if the search engine is case sensitive.
-
If your results are not satisfactory, repeat the search using alternative
terms.
-
If you have too many results, or results that are not relevant:
  -
  Add concept words
  -
  Use vocabulary that is specific to your topic
  -
  Link appropriate terms with the Boolean AND ( + ) so that each term is
  required to appear in the record
  -
  Choose an option requiring exact term matches
  -
  Use term proximity operators if they are available
  -
  Narrow your search to individual parts of the Web page such as title, first
  page level, etc.
  -
  Use the Boolean NOT to keep out records containing terms you don't want
-
If you have too few results:
  -
  Drop off the least important concept(s) to broaden your subject
  -
  Use more general vocabulary
  -
  Add alternate terms or spellings for individual concepts and connect with
  the Boolean OR
  -
  Choose an option allowing for loose or concept matches
-
Try different sources within search engines to diversify your results.
Sources can include Usenet newsgroups, Internet FAQs, reviewed pages, and
more.
-
Experiment with different search engines. No two search engines work from
the same database.
-
You may want to try Web sites which allow you to search multiple search
engines simultaneously. Be aware that you will lose access to advanced
query options, since not all of the underlying engines offer them.
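The implied Boolean notation mentioned in the tips above, in which a plus
sign marks a required term, can be read by a short program. This is a sketch
only: the function name is hypothetical, and treating a leading minus sign as
NOT is an assumption based on a common convention, so check the help files of
each search engine for the symbols it actually accepts.

```python
def parse_implied_boolean(query):
    """Sort query terms into required (+), excluded (-), and optional groups."""
    required, excluded, optional = [], [], []
    for token in query.split():
        if token.startswith("+") and len(token) > 1:
            required.append(token[1:])    # AND logic: term must appear
        elif token.startswith("-") and len(token) > 1:
            excluded.append(token[1:])    # NOT logic (assumed convention)
        else:
            optional.append(token)        # no symbol: term may appear
    return required, excluded, optional

print(parse_implied_boolean("+latitude +temperature"))
```

Note that the plus sign must be attached directly to its term, with no space
between them, for the term to be read as required.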
FOR MORE INFORMATION
The following tutorials include more detailed presentations on Web search
engines and other Web-based information resources:
-
The Finer Points of Web Search Engines
http://www.albany.edu/library/internet/finer.html
-
Finding Information on the Internet: A Tutorial
http://www.lib.berkeley.edu/TeachingLab/Guides/Internet/FindInfo.html
-
How to Choose a Search Engine or Research Database
http://www.albany.edu/library/internet/choose.html
-
Searching the Internet: Recommended Sites and Search Techniques
http://www.albany.edu/library/internet/search.html
Laura Cohen
lcohen@cnsvax.albany.edu