University at Albany Libraries

Conducting Research on the Internet


September 1998

The Internet provides access to a wealth of information on countless topics contributed by people throughout the world. On the Internet, a user has access to a wide variety of services: electronic mail, file transfer, vast information resources, interest group membership, interactive collaboration, multimedia displays, and more. The Internet consists primarily of a variety of access protocols. These include e-mail, FTP, HTTP, Telnet, and Usenet news. Many of these protocols feature programs that allow users to search for and retrieve material made available by the protocol.

For background information on Internet access protocols, see A Basic Guide to the Internet.

The Internet is not a library in which every available item is identified and can be retrieved through a single catalog. In fact, no one knows how many individual files reside on the Internet. The number certainly runs into the many millions and is growing at a rapid pace.

The Internet is a self-publishing medium. This means that anyone with a small amount of technical skill and access to a host computer can publish on the Internet. It is important to remember this when you locate sites in the course of your research. Internet sites change over time according to the commitment and inclination of the creator. Some sites demonstrate an expert's knowledge, while others are amateur efforts. Some may be updated daily, while others may be outdated. As with any information resource, it is important to evaluate what you find on the Internet. For more information, see Evaluating Internet Resources.

Also be aware that the addresses of Internet sites frequently change. Web sites can disappear altogether. Do not expect stability on the Internet.

One of the most efficient ways of conducting research on the Internet is to use the World Wide Web. Since the Web includes most Internet protocols, it offers access to a great deal of what is available on the Internet.

HOW TO FIND INFORMATION ON THE INTERNET

There are five basic ways to access information on the Internet:
  1. Join an e-mail discussion group or Usenet newsgroup
  2. Go directly to a site if you have the address
  3. Browse
  4. Explore a subject directory
  5. Conduct a search using a Web search engine
Each of these options is described below.

1. JOIN AN E-MAIL DISCUSSION GROUP OR USENET NEWSGROUP

Join any of the thousands of e-mail discussion groups or Usenet newsgroups. These groups cover a wealth of topics. You can ask questions of the experts and read the answers to questions that others ask. Belonging to these groups is somewhat like receiving a daily newspaper on topics that interest you. These groups provide a good way of keeping up with what is being discussed on the Internet about your subject area. In addition, they can help you find out how to locate information--both online and offline--that you want.

E-mail discussion groups tend to be associated with academic institutions. Many topics are scholarly in nature, and it is not unusual for experts in the field to be among the participants. In contrast, Usenet newsgroups cover a far wider variety of topics and participants have a range of expertise. Be careful to evaluate the knowledge and opinions offered in any discussion forum. Note also that a small number of e-mail groups are cross-posted as Usenet newsgroups. For example, the early music e-mail group EARLYM-L also exists as the newsgroup rec.music.early.

E-mail discussion groups are managed by software programs. There are three in common use: Listserv, Majordomo, and Listproc. The commands for using these programs are similar. For information on how to use listserver software, see the tutorial Internet from the VAX Prompt.

A list of Usenet newsgroups can be accessed from within a newsreader program. Using the RN reader on the VAX, for example, you can type dir/group/all and receive a list of every newsgroup to which the University subscribes. For more information on the RN newsreader, see Using the RN Newsreader from the VAX.

A good Web-based directory to assist in locating e-mail discussion groups and Usenet newsgroups is Liszt, located at http://www.liszt.com/.

2. GO DIRECTLY TO A SITE IF YOU HAVE THE ADDRESS

If you know the Internet address of a site you wish to visit, you can use a Web browser to access that site. All you need to do is type the URL in the appropriate location window. URL stands for Uniform Resource Locator. The URL specifies the Internet address of the electronic document. Every file on the Internet, no matter what its access protocol, has a unique URL. Web browsers use the URL to retrieve the file from the host computer and the directory in which it resides. This file is then displayed on the user's computer monitor.

This is the format of the URL:    protocol://host/path/filename

For example:

http://cedr.lbl.gov/cdrom/doc/cdrom.html      a hypertext file on the Web
ftp://bongo.cc.utexas.edu/microlib      a file at an FTP site
telnet://library.albany.edu     a Telnet connection
Any of these addresses can be typed into the location window of a Web browser.
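
The URL format described above can be illustrated with a short Python sketch. It uses the standard urllib.parse module to split each example address into its parts. The function name describe_url is hypothetical, and treating everything after the last slash as the filename is a simplifying assumption, not a rule of the URL format.

```python
from urllib.parse import urlsplit

def describe_url(url):
    """Split a URL into protocol, host, path, and filename parts."""
    parts = urlsplit(url)
    # Assumption: whatever follows the last "/" is the filename (may be empty).
    directory, _, filename = parts.path.rpartition("/")
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        "path": directory,
        "filename": filename,
    }

# The three example addresses from the text above.
for example in (
    "http://cedr.lbl.gov/cdrom/doc/cdrom.html",
    "ftp://bongo.cc.utexas.edu/microlib",
    "telnet://library.albany.edu",
):
    print(describe_url(example))
```

Note that the Telnet address has a protocol and a host but no path or filename, so those two parts come back empty.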

3. BROWSE

Browsing home pages on the Web is a haphazard but interesting way of finding desired material on the Internet. Because the creator of a home page programs each link, you never know where these links might lead. High quality starting pages will contain high quality links. The University Libraries Home Page contains quality links leading into the World Wide Web, and is a good place to start your exploration. This site is located at http://www.albany.edu/library/.

4. EXPLORE A SUBJECT DIRECTORY

An increasing number of universities, libraries, companies, organizations, and even volunteers are creating subject directories to catalog portions of the Internet. These directories are organized by subject and consist of links to Internet resources relating to these subjects. The major subject directories available on the Web tend to have overlapping but different databases. Most directories provide a search capability that allows you to query the database on your topic of interest.

Subject directories differ significantly in selectivity. For example, the famous Yahoo! site does not consider content when adding Web pages to its database. In contrast, the Argus Clearinghouse collects and rates subject guides often compiled by experts. Consider the policies of any directory that you visit. One challenge to this is the fact that not all directory services are willing to disclose either their policies or the names and qualifications of site reviewers. A number of subject directories consist of links accompanied by annotations that describe or evaluate site content. A well-written annotation from a known reviewer is more useful than just a list of links.

Among the more prominent and useful directories are these: 

Argus Clearinghouse 
http://www.clearinghouse.net/ 
BUBL Link 
http://bubl.ac.uk/link/ 
INFOMINE: Scholarly Internet Resource Collections 
http://lib-www.ucr.edu/ 
Librarians' Index to the Internet 
http://sunsite.berkeley.edu/InternetIndex/ 
Magellan   [not currently updated] 
http://www.mckinley.com/ 
Scout Report Signpost 
http://www.signpost.org/signpost/ 
The WWW Virtual Library 
http://vlib.stanford.edu/Overview.html 
Yahoo! 
http://www.yahoo.com/ 
 The University Libraries Home Page includes a list of these and other recommended subject directories, located at http://www.albany.edu/library/internet/subject.html.

Recommended starting points:

5. CONDUCT A SEARCH USING A WEB SEARCH ENGINE

An Internet search engine allows the user to enter keywords relating to a topic and retrieve information about Internet sites containing those keywords. Search engines are available for many of the Internet protocols. Archie searches for files stored at anonymous FTP sites. Veronica and Jughead, now of mainly historical interest, search Gopherspace.

Search engines located on the World Wide Web have become quite popular as the Web itself has become the Internet environment of choice. Web search engines have the advantage of offering access to a vast range of information resources located on the Internet. Many search engines compile a database spanning multiple Internet protocols, including HTTP, FTP, and Usenet. Web search engines tend to be developed by private companies, though most of them are available free of charge.

A Web search engine service consists of three components:

Keep in mind that spiders are indiscriminate. Be aware that some of the resources they collect may be outdated, inaccurate, or incomplete. Others, of course, may come from responsible sources and provide you with valuable information. Be sure to evaluate all your search results carefully.

With most search engines, you fill out a form with your search terms and then ask that the search proceed. The engine searches its index and generates a page with links to those resources containing some or all of your terms. These resources are usually presented in relevancy ranked order. A new development in search engine technology is the ordering of search results by concept, keyword, or site.
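
The idea of searching an index and ranking the results can be sketched in a few lines of Python. This is only an illustration, not how any particular search engine works: the pages, the example.org addresses, and the function names are all hypothetical, and relevancy is approximated here simply by counting how many of your terms a page contains.

```python
import re
from collections import defaultdict

# Hypothetical pages standing in for documents the engine has collected.
PAGES = {
    "http://example.org/a": "Budget negotiations continue in Congress",
    "http://example.org/b": "Clinton comments on the budget",
    "http://example.org/c": "Weather report for Albany",
}

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each word to the set of page addresses containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return pages containing some or all query terms, best matches first."""
    scores = defaultdict(int)
    for term in set(tokenize(query)):
        for url in index.get(term, ()):
            scores[url] += 1
    # Rank by number of matching terms, then by address for a stable order.
    return sorted(scores, key=lambda url: (-scores[url], url))

index = build_index(PAGES)
print(search(index, "Clinton budget"))
```

The page containing both terms is listed first, followed by the page containing only one of them. Real engines use far more elaborate ranking rules, which is one reason to read each engine's help files.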

All search engines have rules for formulating queries. It is imperative that you read the help files at the site before proceeding. Online tutorials can also help you learn the rules. A short list of recommended tutorials appears at the end of this file.

Among the more prominent and useful search engines are these: 

AltaVista 
http://www.altavista.digital.com/ 
Excite 
http://www.excite.com/ 
HotBot 
http://www.hotbot.com/ 
Inference Find 
http://www.inference.com/infind/ 
Infoseek 
http://www.infoseek.com/ 
The Internet Sleuth 
http://www.isleuth.com/ 
Lycos 
http://www.lycos.com/ 
MetaCrawler 
http://www.metacrawler.com/ 
MetaFind 
http://www.metafind.com/ 
Northern Light 
http://www.northernlight.com/ 
ProFusion 
http://www.designlab.ukans.edu/profusion/ 
 Recommended starting points:
  1. Start with Infoseek. This is a very quick and accurate search engine that offers several field searching options and simplified keyword and phrase searching. Infoseek clusters all your results from one site into one result, so that each result comes from a different site. This makes it easy to scan for a variety of documents brought in by your search.
  2. AltaVista is another good choice. This engine has a very large database, handles sophisticated Boolean searches, and offers several field searching options.
  3. Also try Northern Light. This unique engine clusters results into Custom Search Folders, which contain specific subtopics and sites retrieved by your search. Northern Light therefore gives you quick access to aspects of your topic that interest you. Excite and HotBot are also recommended services to try.
  4. MetaCrawler is a good site to try if your topic is obscure or if you want to retrieve results from a variety of search engines with a single search statement. This service searches multiple search engines simultaneously and offers useful search options. MetaCrawler returns your results in a single list and removes the duplicate files. This type of search processing is called multi-threaded searching. Other recommended multi-threaded search engines include Inference Find, MetaFind and ProFusion.
  5. Try The Internet Sleuth if you want to search a topic-oriented database. This site offers access to hundreds of searchable databases in several subject areas.
For a more extensive list of recommended Web search engines, see Search the Internet.
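
The step in which a multi-threaded search service returns a single list with duplicates removed can be sketched as follows. This Python example covers only the merging step, not the simultaneous querying. The result lists, the example addresses, and the function name merge_results are hypothetical, and the rule used to decide that two addresses are the same is a deliberate simplification.

```python
def merge_results(result_lists):
    """Combine several engines' result lists into one list without duplicates."""
    seen = set()
    merged = []
    for results in result_lists:
        for url in results:
            # Simplification: addresses that differ only by letter case or a
            # trailing slash are treated as the same file.
            key = url.lower().rstrip("/")
            if key not in seen:
                seen.add(key)
                merged.append(url)
    return merged

# Hypothetical results returned by two different search engines.
engine_a = ["http://example.org/budget", "http://example.org/clinton"]
engine_b = ["http://example.org/clinton/", "http://example.net/news"]
print(merge_results([engine_a, engine_b]))
```

The address reported by both engines appears only once in the merged list.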

PRACTICAL STEPS: WORLD WIDE WEB SEARCH ENGINES

HOW TO FORMULATE QUERIES

There are three steps to a computer database search:
  1. Identify your concepts
     When conducting any database search, you need to break down your topic into its component concepts. For example, if you want to find information on the budget negotiations between President Clinton and the Republicans, these are your concepts: CLINTON, REPUBLICANS, BUDGET.

  2. List keywords for each concept
     Once you have identified your concepts, you need to list keywords which describe each concept. Some concepts may have only one keyword, while others may have many.

     For example:

          CLINTON        REPUBLICANS         BUDGET
                         HOUSE SPEAKER       BUDGET NEGOTIATIONS
                                             BUDGET BATTLE
                                             BUDGET IMPASSE
                                             BUDGET DEAL

     Depending on the focus of your search, there may be other keywords you would wish to use.

  3. Specify the logical relationships among your keywords
     Once you know the keywords you want to search, you need to establish the logical relationships among them. The formal name for this is Boolean logic. Boolean logic allows you to specify the relationships among search terms by using any of three logical operators: AND, OR, NOT.

Search Statement                   Result of search

World War I AND World War II       Files containing both these terms

World War I OR World War II        Files containing at least one of these terms

World War I NOT World War II       Files containing the term World War I but
                                   not also the term World War II

Some search engines offer Boolean searching without mentioning the logical operators by name. For example, you might be asked to list your search terms and choose that All of these terms be searched. This denotes AND logic. Specifying Any of these terms denotes OR logic. Other search engines use a type of implied Boolean logic, in which symbols or spaces are used to denote logical relationships.
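
The three Boolean operators correspond to simple operations on sets of files. The Python sketch below illustrates this with four made-up files. The file names and their contents are hypothetical, and a real search engine would of course work from its own index rather than a small table like this one.

```python
# Hypothetical files, each reduced to the set of phrases it contains.
FILES = {
    "file1": {"World War I"},
    "file2": {"World War II"},
    "file3": {"World War I", "World War II"},
    "file4": {"Cold War"},
}

def files_with(term):
    """Return the names of all files containing the given term."""
    return {name for name, terms in FILES.items() if term in terms}

ww1 = files_with("World War I")
ww2 = files_with("World War II")

print(sorted(ww1 & ww2))  # AND: files containing both terms
print(sorted(ww1 | ww2))  # OR: files containing at least one term
print(sorted(ww1 - ww2))  # NOT: files with the first term but not the second
```

AND narrows a search, OR broadens it, and NOT excludes files you do not want.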

Certain search engines allow you to use a proximity operator. This is a type of AND logic which specifies the distance between words in a source file. For example, AltaVista and Lycos let you use the NEAR operator. Consider this search: Clinton NEAR budget. In AltaVista, the two terms must be within 10 words of each other in the source file. Lycos allows user-specified distances. Use of this option can help you gain relevance in your search results.
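
A proximity test of this kind can be sketched in Python as follows. The function name near and the sample sentence are hypothetical, and the way distance is counted here (the difference between word positions) is an assumption. Individual search engines may count differently.

```python
import re

def near(text, first, second, max_distance=10):
    """Return True if the two words occur within max_distance words of each other."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]
    # Compare every occurrence of the first word with every occurrence of the second.
    return any(
        abs(i - j) <= max_distance
        for i in first_positions
        for j in second_positions
    )

text = "Clinton said on Tuesday that the budget talks would resume"
print(near(text, "Clinton", "budget"))                  # default distance of 10
print(near(text, "Clinton", "budget", max_distance=3))  # a user-specified distance
```

In the sample sentence the two words are six positions apart, so the first test succeeds and the second, stricter test fails.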

Most Web search engines cannot handle a single search statement that includes all the terms listed in Step 2 above. You may need to repeat your search a few times using terms in different combinations until you get results that are satisfactory. For example, you may start with CLINTON, REPUBLICANS, BUDGET NEGOTIATIONS and connect these terms with AND logic. Take a look at your results. If you are not finding what you want, repeat the search with alternative keywords for the budget concept. Your initial results may give you ideas about which new terms to try.
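
If you want to work through the possible combinations systematically, the keyword lists can be expanded into a series of AND searches. The Python sketch below does this for the keywords in the example above. The function name query_variants is hypothetical, and the output uses the operator AND literally. Check each engine's help files for the syntax it actually expects, including how it wants multi-word phrases entered.

```python
from itertools import product

# Keywords for each concept, taken from the example keyword lists above.
CONCEPTS = {
    "CLINTON": ["CLINTON"],
    "REPUBLICANS": ["REPUBLICANS", "HOUSE SPEAKER"],
    "BUDGET": [
        "BUDGET",
        "BUDGET NEGOTIATIONS",
        "BUDGET BATTLE",
        "BUDGET IMPASSE",
        "BUDGET DEAL",
    ],
}

def query_variants(concepts):
    """Yield one AND query for every combination of one keyword per concept."""
    for combination in product(*concepts.values()):
        yield " AND ".join(combination)

variants = list(query_variants(CONCEPTS))
print(len(variants))
for query in variants:
    print(query)
```

One keyword per concept gives ten possible searches for this example. In practice you would try only a few of them, guided by what your first results suggest.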

For more information on formulating searches, see Boolean Searching on the Internet.

TIPS ON CONDUCTING SEARCHES

  1. Read the directions at each search site. The technique for formulating a search depends on the search engine you are using. There is a wide variety of options available among the different search engines.
  2. If you have a multi-term search, be sure to determine which type of Boolean logic you should use. For example, a search about the relationship between latitude and temperature would be formulated as:    +latitude +temperature    on many Web search engines in order for AND logic to apply.
  3. Include synonyms or alternate spellings in your search statements and connect these terms with OR logic.
  4. Check your spelling.
  5. Take advantage of capitalization if the search engine is case sensitive.
  6. If your results are not satisfactory, repeat the search using alternative terms.
  7. If you have too many results, or results that are not relevant:
  8. If you have too few results:
  9. Try different sources within search engines to diversify your results. Sources can include Usenet newsgroups, Internet FAQs, reviewed pages, and more.
  10. Experiment with different search engines. No two search engines work from the same database.
  11. You may want to try Web sites which allow you to search multiple search engines simultaneously. Be aware that you will lose access to advanced query options since not all engines offer them.
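
The plus-sign form of AND logic mentioned in the tips above can be produced mechanically from a list of terms. The following Python sketch is a small illustration. The function name require_all is hypothetical, and it handles single-word terms only, since the syntax for phrases varies from one search engine to another.

```python
def require_all(terms):
    """Prefix each term with + so that engines using this syntax apply AND logic."""
    # Drop surrounding spaces and skip empty entries.
    cleaned = [term.strip() for term in terms if term.strip()]
    return " ".join("+" + term for term in cleaned)

print(require_all(["latitude", "temperature"]))
```

Note that the plus sign is attached directly to each term, with no space in between.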

FOR MORE INFORMATION

The following tutorials include more detailed presentations on Web search engines and other Web-based information resources:
The Finer Points of Web Search Engines
http://www.albany.edu/library/internet/finer.html
Finding Information on the Internet: A Tutorial
http://www.lib.berkeley.edu/TeachingLab/Guides/Internet/FindInfo.html
How to Choose a Search Engine or Research Database
http://www.albany.edu/library/internet/choose.html
Searching the Internet: Recommended Sites and Search Techniques
http://www.albany.edu/library/internet/search.html
 
Laura Cohen
lcohen@cnsvax.albany.edu