An overview of the Internet - history, organization, and operation. Major Internet services are covered, including their purposes, their protocols, and software and conventions associated with them.
The launch of the Sputnik satellite by the Soviet Union spurred the creation of ARPA (the Advanced Research Projects Agency of the United States Department of Defense).
ARPA commissioned the creation of ARPANet, a packet-switching network to allow scientific colleagues and their support teams to share information and research results. Throughout the late 1960s design papers were presented, and in 1969 the first node was installed at UCLA. The network soon expanded to include three more sites.
Grew to 15 nodes (23 hosts).
TCP/IP officially replaced older less powerful protocols. In the same year the network split into MILNET (military) and ARPAnet (research and education).
The Internet worm brought the Internet to a virtual standstill.
World Wide Web released by CERN.
The introduction of the Mosaic graphical browser spurred intense general interest in the Internet, and spiders and crawlers began roaming the Internet.
The introduction of Internet service providers and the commercialization of the Internet led to the first Internet shopping malls. In the same year the WWW took over from Telnet as the second most popular service on the Internet.
WWW is the service with the greatest amount of traffic on the Internet. See "World Wide Web Tops a Billion Pages"
The Internet is an interconnection of computer networks: a network of networks. Networks in this grand constellation include those of governments, academia, businesses, non-profits, and private individuals, and they are interconnected by fiber optics, microwave links, and satellites.
Most users access the Internet using an intermediate server operated by their Internet Service Provider, or ISP. The 'last mile' between user devices and an ISP's servers can be over telephone lines, cable, fiber optics, wi-fi, or satellite.
The speed at which information can be uploaded and downloaded depends upon a connection's bandwidth. Bandwidth is a measure of how much information can be sent or received in a given amount of time. It is measured in bits per second (bps) and expressed as Kbps (thousand bps), Mbps (million bps), and Gbps (giga, or billion, bps).
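The arithmetic behind bandwidth can be sketched in a few lines of Python. This is a best-case estimate only: the function name and the 5 MB file size are made up for illustration, and real transfers are slower because of protocol overhead and shared links.

```python
def transfer_seconds(size_bytes: int, bandwidth_bps: float) -> float:
    """Ideal time to move size_bytes over a link of bandwidth_bps bits per second."""
    bits = size_bytes * 8  # bandwidth is quoted in bits, file sizes in bytes
    return bits / bandwidth_bps


# A hypothetical 5 MB file (5,000,000 bytes) over three link speeds.
for label, bps in [("56 Kbps", 56_000), ("10 Mbps", 10_000_000), ("1 Gbps", 1_000_000_000)]:
    print(f"{label}: {transfer_seconds(5_000_000, bps):.2f} s")
```

Note the factor of eight: a "10 Mbps" connection moves at most 1.25 million bytes per second, not 10 million.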
To enable the accurate and rapid exchange of information among computers, the Internet uses the TCP/IP protocol (Transmission Control Protocol/Internet Protocol). A protocol is a set of rules and methods by which Internet devices establish connections and transfer information.
To direct information to its proper destination the Internet relies on two address systems:
Internet Protocol Addresses, or IP addresses, are the numeric addresses used by network machines to uniquely identify each other. IP addresses are sometimes referred to as dot-quad addresses because they are composed of four numbers of up to three digits each, separated by dots (periods). Each of the four numbers must be between 0 and 255. Example: 184.108.40.206.
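The dot-quad rule is easy to check mechanically. The following Python sketch uses two illustrative helper names (they are not part of any standard library) and the reserved documentation address 192.0.2.1; it also shows how the four numbers pack into the single 32-bit value that machines actually compare.

```python
def is_dotted_quad(text: str) -> bool:
    """True if text is four dot-separated numbers, each between 0 and 255."""
    parts = text.split(".")
    if len(parts) != 4:
        return False
    return all(
        p.isascii() and p.isdigit() and len(p) <= 3 and int(p) <= 255
        for p in parts
    )


def quad_to_int(text: str) -> int:
    """Pack the four numbers into one 32-bit integer, most significant first."""
    a, b, c, d = (int(p) for p in text.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d


print(is_dotted_quad("192.0.2.1"))   # a well-formed address
print(is_dotted_quad("256.1.1.1"))   # 256 is out of range
print(quad_to_int("192.0.2.1"))
```

Python's standard ipaddress module offers a more complete treatment of the same idea.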
Although machines use them in extremely efficient ways, IP addresses are quite unwieldy for people, and therefore we have a parallel address system called the Domain Name System (DNS). DNS is a hierarchical address system which uses descriptive words to represent the IP addresses of Internet hosts such as web servers.
In a DNS address, the right-most word represents its top-level domain. A top-level domain represents either a geographical area (.ca, .uk, .af) or a type of organization (.com, .org, .net).
Some common top-level domain names:
.com: A commercial organization
.edu: An educational institution (generally a university)
.gov: A government (generally US government) organization
.int: International organizations (including NATO)
.mil: A US military organization
.net: A network access provider
.org: Usually, a not-for-profit organization
Reading a DNS address from right to left, the name gets more specific until the name of the individual host computer is reached.
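As a rough illustration of that right-to-left reading, the Python sketch below splits a name into its parts, most general first. The host name www.cs.example.edu is invented for the example.

```python
def labels_right_to_left(hostname: str) -> list[str]:
    """Return the parts of a DNS name from most general to most specific."""
    return hostname.rstrip(".").split(".")[::-1]


# Hypothetical name: top-level domain first, individual host last.
print(labels_right_to_left("www.cs.example.edu"))
```
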
Reading from right to left, we see that this DNS address
The combination of a DNS address and the name and location (path) of a document at that address is called a Uniform Resource Locator, or URL.
The various parts of a URL are separated by forward slashes.
Reading from right to left we see that this resource
Notice that while backslashes (\) are used on PCs to separate parts of a path, URLs follow the UNIX convention of using forward slashes (/).
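Python's standard urllib.parse module can take a URL apart along these lines. The URL below is a made-up example, not one from this document.

```python
from urllib.parse import urlsplit

# Hypothetical URL used only for illustration.
url = "http://www.example.edu/docs/guide/intro.html"
parts = urlsplit(url)

print(parts.scheme)  # access protocol
print(parts.netloc)  # DNS address of the host
print(parts.path)    # location and name of the document

# The path itself is a series of names separated by forward slashes.
segments = parts.path.strip("/").split("/")
print(segments)
```
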
As mentioned above, humans use textual DNS addresses to navigate the Internet while computers use numerical IP addresses. Translation between the two is done transparently by DNS servers, special Internet servers which maintain databases linking corresponding DNS and IP addresses.
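Conceptually, a DNS server answers lookups against a table of names and addresses. The toy Python sketch below captures only that idea: the table, the names, and the addresses are all invented (the addresses come from a range reserved for documentation), and a real DNS server is distributed and far more elaborate. In a real program the standard socket.gethostbyname function asks the system's resolver to do the actual lookup.

```python
from typing import Optional

# Hypothetical records; real servers hold many more and share the work.
RECORDS = {
    "www.example.com": "192.0.2.10",
    "mail.example.com": "192.0.2.25",
}


def resolve(name: str, records: dict = RECORDS) -> Optional[str]:
    """Look up a DNS name in the toy table; None means 'no such name'."""
    key = name.rstrip(".").lower()  # DNS names are not case-sensitive
    return records.get(key)


print(resolve("WWW.Example.com"))
print(resolve("nowhere.example.com"))
```
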
Under the TCP/IP protocol, files and other types of information are broken down into packets before being transferred from one location to another. Special computers called routers forward the individual packets until all reach their destination where they are reassembled into their original form.
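The break-down and reassembly described above can be imitated with a toy Python sketch. It is not how TCP is actually implemented (real TCP numbers bytes rather than packets and also handles acknowledgement and retransmission); the function names and the 8-byte packet size are chosen only for illustration.

```python
import random


def to_packets(data: bytes, size: int) -> list:
    """Cut data into numbered chunks of at most size bytes."""
    return [
        (seq, data[start:start + size])
        for seq, start in enumerate(range(0, len(data), size))
    ]


def reassemble(packets: list) -> bytes:
    """Put the chunks back in sequence order and join them."""
    ordered = sorted(packets, key=lambda packet: packet[0])
    return b"".join(chunk for _, chunk in ordered)


message = b"Files are broken into packets before transfer."
packets = to_packets(message, 8)
random.shuffle(packets)  # packets may arrive in any order
assert reassemble(packets) == message
print(len(packets), "packets reassembled correctly")
```

The sequence number carried by each packet is what lets the receiver restore the original order no matter which route each packet took.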
To perform specialized tasks, the Internet uses many protocols in addition to TCP/IP. Just a few examples would include:
Internet services and their protocols
|UseNet News (Newsgroups)||NNTP|
|BitNet (Mailing Lists)|
|World Wide Web||HTTP|
File Transfer Protocol, FTP, is a protocol designed specifically for transferring (uploading and downloading) files across the Internet.
Before the advent of HTTP and the World Wide Web, FTP servers were the "information warehouses" of the Internet and researchers searched them for information using specialized tools such as:
Although FTP sites are still used to distribute documents and software, the World Wide Web has replaced them as the major repository of information on the Internet and those pioneer search tools have been supplanted by WWW search engines.
FTP downloads are made using a dedicated FTP client or a browser which is FTP-enabled. FTP clients are very compact and portable, have many enhanced capabilities, and are sometimes faster than browsers at transferring files.
Telnet is a text-based service which allows a user to establish a remote terminal connection with another computer and access its resources and services.
In the past, one of the most common uses for Telnet was to access the catalogue systems of university libraries. MUDs and MUSHes also utilize Telnet services.
While there are dedicated Telnet clients, many browsers can also utilize the Telnet protocol.
Ordinarily, connecting to a Telnet server (telnet://) requires a logon with a user account and a password.
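Email was one of the original Internet services. Email is transferred in a store-and-forward process using two separate protocols: SMTP, which sends mail and relays it between servers, and POP, which downloads mail from a server to the recipient's computer.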
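The division of labor between the two protocols can be pictured with a toy Python model. Everything here is hypothetical (the class, its method names, and the address); it imitates only the store-and-forward idea, not the real SMTP or POP commands, for which Python provides the smtplib and poplib modules.

```python
from collections import defaultdict


class ToyMailServer:
    """Holds mail for recipients until they come to collect it."""

    def __init__(self):
        self.mailboxes = defaultdict(list)

    def accept(self, recipient: str, message: str) -> None:
        # The SMTP role: mail is pushed to the server and stored there.
        self.mailboxes[recipient].append(message)

    def collect(self, recipient: str) -> list:
        # The POP role: the recipient's client downloads the stored mail,
        # which is then removed from the server.
        return self.mailboxes.pop(recipient, [])


server = ToyMailServer()
server.accept("pat@example.org", "Meeting moved to 3 pm")
server.accept("pat@example.org", "Draft attached")
print(server.collect("pat@example.org"))  # both messages, in arrival order
print(server.collect("pat@example.org"))  # nothing left on the server
```
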
Programs designed to send, receive, and manage email are called email clients. Netscape Navigator and Internet Explorer include email clients, thus making it possible to send and receive mail directly from a browser. However, these clients are considerably less powerful than dedicated clients like Eudora, Pine, and Pegasus, which provide such features as sophisticated filters, multiple signatures and mailboxes, and the ability to check and manage multiple mail accounts.
Beginning around 1998 a new type of email service emerged: Web-based email. The proprietary software used by such services sends and receives mail using the standard SMTP protocol, but replaces the POP protocol with a web link using the hypertext transfer protocol (HTTP). Users can send and receive email anywhere they can access the World Wide Web and have no need for an email client, just a browser.
Such services are usually free to the user, as revenue is generated by advertising on the website and in every email sent. While they are free and convenient, these services do not provide many of the more sophisticated features associated with dedicated clients and a POP account. Also, these services can change their terms of service at any time they wish.
Examples of more or less popular webmail services are Gmail (Google), Hotmail (Microsoft), and Yahoo Mail.
Many holders of personal POP accounts also obtain free, "throw-away" web-based accounts and use them in their online presence. Such "spam-magnet" accounts reduce the volume of spam (unsolicited email) received by personal accounts.
Netiquette (Internet etiquette) is a set of rules for acceptable behavior when using email and the Internet in general. Good Netiquette guidelines include:
Other important considerations include:
Usenet is a worldwide network of computers which, among other things, hosts what are called newsgroups. Usenet is like a huge bulletin board where people from all over the world can read and post messages. The "bulletin board" is divided into many topics, each representing an individual Newsgroup. Newsgroups are places where people with similar interests can exchange ideas and information.
A special program, called a newsreader, is needed to read and post Usenet messages. Both the Netscape and Internet Explorer browsers include newsreaders, but dedicated programs, such as Forte's Free Agent, have more powerful features.
"Lurking" is the term applied to reading postings to a newsgroup without posting to the group. Lurking is a desirable activity: it acquaints one with a group's ground rules and etiquette. Reading a group regularly and reading the group's FAQ are recommended before actually posting a message to a group.
Newsgroup names are composed of several parts, comp.compilers for example. The top-level part (comp = computers, sci = science, rec = recreation, including sports, hobbies, and arts, etc.) is the most general description, and the name becomes more specific from there.
The Usenet network is composed mostly of UNIX computers and it runs under the Network News Transfer Protocol (NNTP). The number of newsgroups hosted by Usenet has grown dramatically in recent years: from "several thousand" at the beginning of 1995 to more than 30,000 at the end of 1999, and over 50,000 by 2001.
See also "Frequently Asked Questions" and "Finding NewsGroup FAQs" below.
BITNET (the "Because It's Time" Network) brings us mailing lists whose primary purpose originally was to create communication links between academic communities. Archived information from these lists is a little difficult to get, but it is of very high quality (consider the sources) and a bonanza for researchers.
Created in 1981 at City University of New York (CUNY), it received a big push from IBM, which donated a mainframe. It links thousands of universities all around the world. The IBM protocols used by BITNET are different from those used by the rest of the Internet, so the two must be connected by a gateway.
Unlike other Internet services, a BITNET file transfer can be blocked by the failure of even a single node. BITNET does not offer an interactive log-in the way FTP and Telnet do.
Mailing lists run automatically using specialized software over a network server. The three original list management programs were LISTSERV, LISTPROC, and MAJORDOMO.
List management software automatically adds and removes subscribers, receives and re-sends user posts, and provides searchable indexing.
Thus, mailing lists transform email, which is one-to-one communication, into one-to-many communication and preserve it as part of a body of knowledge.
Listees have the option of receiving posts as they are received by the server, or of having them sent once a day in digest form.
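The behavior of list management software described above can be imitated in a short Python sketch. This is a toy model with invented names and addresses; it does not reproduce the commands or internals of LISTSERV, LISTPROC, or MAJORDOMO.

```python
class ToyMailingList:
    """Fans each post out to subscribers and keeps an archive."""

    def __init__(self):
        self.modes = {}    # subscriber address -> "each" or "digest"
        self.archive = []  # every post, preserved for later searching
        self.pending = []  # posts waiting for the next digest

    def subscribe(self, address: str, mode: str = "each") -> None:
        self.modes[address] = mode

    def unsubscribe(self, address: str) -> None:
        self.modes.pop(address, None)

    def post(self, text: str) -> list:
        """Archive a post and return the immediate deliveries."""
        self.archive.append(text)
        self.pending.append(text)
        return [(addr, text) for addr, mode in self.modes.items() if mode == "each"]

    def send_digest(self) -> list:
        """Bundle the pending posts into one message for digest subscribers."""
        body = "\n".join(self.pending)
        self.pending = []
        return [(addr, body) for addr, mode in self.modes.items() if mode == "digest"]


mailing_list = ToyMailingList()
mailing_list.subscribe("ann@example.org")
mailing_list.subscribe("bob@example.org", mode="digest")
print(mailing_list.post("First post"))
print(mailing_list.post("Second post"))
print(mailing_list.send_digest())
```
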
When replying to a mailing list be sure to send your message to the list, and not the list server. The server address is used only for subscribing and unsubscribing.
IRC (Internet Relay Chat) provides interactive, text-based communication between multiple users in widely separated locations.
Like CB radio, IRC is spontaneous, not private, and has developed its own subculture (users use nicknames for example) and rules of etiquette. Unlike CB radio, participants need not be located within a local geographic area to communicate with each other.
To obtain FAQ's for IRC, go to:
To get the IRC Primer, go to:
CU-SeeMe, developed at Cornell University, is a program for transmitting audio and video across the Internet for the purpose of video-conferencing.
The term hypertext was coined in the 1960s by Ted Nelson to describe text that is not constrained to be sequential. Hypertext links surpass mere footnotes in their ability to supply additional information.
Palo Alto Research Center (PARC) introduced a Lisp-based hypertext system and Apple bundled HyperCard with Macintoshes. In 1989 a CERN researcher proposed a hypertext system to enable efficient information sharing for members of the high-energy physics community. By 1990 it was running as a prototype, and it was made available on CERN machines in 1991.
Growing interest in the WWW was spurred in 1993 by the release of Mosaic, a graphical interface. In 1994 more browsers were announced, including Spry and Netscape Navigator.
The main protocol used by the World Wide Web service is hypertext transfer protocol, or HTTP.
Secure web pages use the HTTPS protocol which transfers information in an encrypted, or secure, format.
As the protocol of the World Wide Web, HTTP is the most widely used protocol for transferring information on the Internet.
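An HTTP request is itself just a few lines of plain text. The Python sketch below builds a minimal HTTP/1.1 GET request as a browser might send it; the helper name, host, and path are assumptions for illustration, and real browsers add many more header lines.

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request for path on host."""
    lines = [
        f"GET {path} HTTP/1.1",  # method, resource, protocol version
        f"Host: {host}",         # required header in HTTP/1.1
        "Connection: close",     # ask the server to close when done
        "",                      # blank line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


print(build_get_request("www.example.com", "/index.html").decode("ascii"))
```
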
Web pages are formatted using HTML, or Hypertext Markup Language, codes. HTML codes tell a web browser how to display information from a web page. HTML pages have either an .HTM file extension (traditionally on Windows servers) or an .HTML extension (traditionally on Unix and Linux servers).
Although completely textual at its beginning, the World Wide Web has long since acquired multimedia capabilities. Browsers are equally adept at displaying text, still and animated graphics, and video, and at playing audio files.
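To see how a program reads HTML codes, here is a small Python sketch using the standard html.parser module. The page text and the class name are invented for the example; it simply collects the addresses that the page's links point to.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it encounters."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


# A tiny hypothetical page marked up with HTML tags.
page = '<html><body><h1>Home</h1><p>See the <a href="faq.html">FAQ</a>.</p></body></html>'
collector = LinkCollector()
collector.feed(page)
print(collector.links)
```
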
A web page's address is known as its uniform resource locator, or URL for short. A Web site's default landing page is known as its home page.
Note that web browsers can be customized to open a home page of the user's choosing when they are launched. See Browser Tips, Setting Your Homepage.
A web browser is the Internet client used to access and display the contents of World Wide Web documents which have been "marked up" with HTML formatting codes, or "tags".
Browsers are designed in modular fashion. "Plug-ins", now standard in all browsers, allow them to deliver audio and video content as well as provide access to other Internet services, including FTP, Telnet, and newsgroups.
When browsers cache visited web pages, they copy them to the local machine's hard disk drive. This is designed to speed up loading on subsequent visits. Clicking on refresh or pressing F5 tells the browser to update the page by loading it from the server and not from the cache.
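The caching behavior can be modeled in a few lines of Python. This is a deliberately simplified sketch with invented names: it keeps pages in memory rather than on disk, and the fetch function stands in for a real network download.

```python
class ToyBrowserCache:
    """Remembers pages already fetched so repeat visits skip the server."""

    def __init__(self, fetch):
        self.fetch = fetch  # function that downloads a page from its server
        self.store = {}     # URL -> saved copy of the page

    def load(self, url: str, refresh: bool = False) -> str:
        # Go to the server on a first visit, or when the user asks to refresh.
        if refresh or url not in self.store:
            self.store[url] = self.fetch(url)
        return self.store[url]


downloads = []


def fake_fetch(url: str) -> str:
    """Stand-in for a real download; records each trip to the server."""
    downloads.append(url)
    return f"page contents, download #{len(downloads)}"


cache = ToyBrowserCache(fake_fetch)
cache.load("http://www.example.com/")                # first visit: from server
cache.load("http://www.example.com/")                # repeat visit: from cache
cache.load("http://www.example.com/", refresh=True)  # refresh: from server
print(len(downloads), "trips to the server for 3 page loads")
```
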
Directory sites are lists of other sites, organized by subject in a top-down arrangement. This makes them ideal for narrowing down a general topic to more and more specific sub-topics.
An example of a directory site would be Yahoo ("Yet Another Hierarchical Officious Oracle"), which lists over 80,000 sites in 14 top-level categories.
To find other directory sites use a directory of directories, such as the Clearinghouse for Subject-Oriented Internet Resource Guides at the University of Michigan.
Search engines are used to find specific information. They search for your keywords and display a "results page" which lists Web pages that match your search criteria.
Search engines do not search the Internet directly; they search databases which have been created by software programs called crawlers, spiders, robots, or simply "bots". These programs independently roam the Net looking for new web sites and new web pages to index. When something new or changed is found, they create a database entry containing the URL and keywords from the contents.
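The kind of database a crawler builds can be sketched as a keyword index in Python. The pages and URLs below are invented, and this is only one simple way to organize such an index; actual search engines use far more sophisticated indexing and ranking.

```python
import re
from collections import defaultdict


def build_index(pages: dict) -> dict:
    """Map each keyword to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index: dict, query: str) -> set:
    """Return the URLs that contain every keyword in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical pages, as a crawler might have collected them.
pages = {
    "http://www.example.com/ftp.html": "FTP transfers files across the Internet",
    "http://www.example.com/web.html": "The Web transfers hypertext pages",
}
index = build_index(pages)
print(search(index, "transfers"))
print(search(index, "transfers files"))
```

The search never touches the pages themselves, only the index, which is why results can lag behind changes on the live Web.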
Meta-search engines submit your keywords to multiple search engines and return the results from all of them at the same time. Many meta-search sites are now simply ad-farms designed to harvest clicks for pay.
To learn more about using Search Engines, check out U.C. Berkeley's Finding Info on the Internet tutorial.
Frequently Asked Questions (FAQs)
FAQs are a convenient method for distributing often requested information. They take the form of questions and answers and they are similar to leaflets and flyers but they are more accessible and up to date. They also cost much less to publish and can therefore serve smaller communities of interest.
True FAQs result from the collaboration of groups of individuals or organizations. They are public, they come from many contributors and go to many recipients, and they are authoritative. Their authoritativeness derives from having been reviewed and accepted by the community they serve.
All FAQs are copyrighted. They are legally the intellectual property of those who publish them. Copyright notices on individual FAQs vary from extremely liberal to very restrictive.
Types of FAQs
Newsgroup FAQs offer information about a particular newsgroup. This information might include suggestions for appropriate topics, the format for postings, rules concerning commercial postings (usually not allowed on newsgroups), and newcomer questions and answers.
Topical FAQs present information on a specific topic. For example: an FAQ on meditation which answers such questions as What is meditation?, How does one meditate?, When does one meditate?
Business and Commercial FAQs
These are FAQs published by organizations or businesses. Examples: a midwifery organization publishes an FAQ on how to become a midwife; computer consultants publish solutions to common problems.
Finding NewsGroup FAQs
FAQs in the News Archive at MIT
Use FTP to connect to the server
On the World Wide Web
Use browser to connect to:
Note: The rtfm.mit.edu mail server was turned off as of October 2009. The files it served may be accessed at:
Archie. Archie is a service that keeps track of the contents of most of the FTP sites on the Internet. Archie searches file titles for keywords and returns addresses for "hits".
FAQ. Frequently-Asked-Questions (and answers).
HTTP. Hypertext transfer protocol. Specifies the operations specific to the Web, such as hyperlinking.
IP. Internet Protocol is the network layer of the TCP/IP protocol suite. It handles the addressing and routing of packets across packet-switched networks.
link. A link is a place in a hypermedia document that holds the information that identifies a place to jump to (a URL) in a different hypermedia document. Also known as a hotspot, or anchor.
NNTP. Network News Transfer Protocol used for Usenet news distribution.
POP. Post Office Protocol. Your POP server is the server that hosts your mailbox and stores your mail until you download it to your personal computer.
Packet. A packet is a unit of data sent across a network. Large file transfers result in large numbers of packets being sent across a network.
PPP. Point-to-Point Protocol. See SLIP.
Protocol. A technical description of the format for a message and the rules to be followed by two or more computers as they exchange the message.
SLIP. Serial Line IP. SLIP and PPP are the two protocols that allow home computer users to connect their computers to the Internet as peer hosts. They encapsulate TCP/IP packets for transmission over phone lines.
TCP/IP. Transmission Control Protocol/Internet Protocol. Two standards which work together to reliably send and receive blocks of data across the net. These protocols (sets of rules) for Internet communication between computers allow PCs, Apples, mainframes, and Unix systems to freely exchange information.
URL. A Uniform Resource Locator is a standardized description of the location of a resource on the Internet. It consists of the access protocol, the host name, and the complete directory path of the file, with the parts of the path separated by forward slashes.