Tuesday, September 20, 2011

understanding sockets


Understanding Sockets

It is important that you have an understanding of some of the concepts behind the socket interface before you try to apply them. This section outlines some of the high level concepts surrounding the sockets themselves.

Defining a Socket
To communicate with someone using a telephone, you must pick up the handset, dial the other party's telephone number, and wait for them to answer. While you speak to that other party, there are two endpoints of communication established:

• Your telephone, at your location
• The remote party's telephone, at his location

As long as both of you communicate, there are two endpoints involved, with a line of communication in between them. Figure 1.2 shows an illustration of two telephones as endpoints, each connected to the other, through the telephone network.
Figure 1.2:

Without the telephone network, each endpoint of a telephone line is nothing more than a plastic box. A socket under Linux, is quite similar to a telephone. Sockets represent endpoints in a line of communication. In between the endpoints exists the data communications network.

Sockets are like telephones in another way. For you to telephone someone, you dial the telephone number of the party you want to contact. Sockets have network addresses instead of telephone
numbers. By indicating the address of the remote socket, your program can establish a line of communication between your local socket and that remote endpoint. Socket addresses are discussed
in my post,"Domains and Address Families." You can conclude then, that a socket is merely an endpoint in communication.

Using Sockets
You might think that Linux sockets are treated specially, because you've already learned that sockets have a collection of specific functions that operate on them. Although it is true that sockets have some special qualities, they are very similar to file descriptors that you should already be familiar with.

NOTE
Any reference to a function name like pipe(2) means that you should have online documentation (man pages) on your Linux system for that function. For information about pipe(2) for example, you can enter the command:

$ man 2 pipe

where the 2 represents the manual section number, and the function name can be used as the name of the manual page. Although the section number is often optional, there are many cases where you must specify it in order to obtain the correct information.

For example, when you open a file using the Linux open(2) call, you are returned a file descriptor if the open(2) function is successful. After you have this file descriptor, your program uses it to read(2), write(2), lseek(2), and close(2) the specific file that was opened. Similarly, a socket, when it is created, is just like a file descriptor. You can use the same file I/O functions to read, write, and close that socket. You learn in Chapter 15, "Using the inetd Daemon," that sockets can be used for standard input (file unit 0), standard output (file unit 1), or standard error (file unit 2).

NOTE
Sockets are referenced by file unit numbers in the same way that opened files are. These unit numbers share the same "number space"— for example, you cannot have both a socket with unit number 4 and an open file on unit number 4 at the same time. There are some differences, however, between sockets and opened files. The following list highlights some of these differences:
  • You cannot lseek(2) on a socket (this restriction also applies to pipes). 
  • Sockets can have addresses associated with them. Files and pipes do not have network addresses. 
  • Sockets have different option capabilities that can be queried and set using ioctl(2). 
  • Sockets must be in the correct state to perform input or output. Conversely, opened disk files can be read from or written to at any time.
Referencing Sockets
When you open a new file using the open(2) function call, the next available and lowest file descriptor is returned by the Linux kernel. This file descriptor, or file unit number as it is often called, is a zero or positive integer value that is used to refer to the file that was opened. This "handle" is used in all other functions that operate upon opened files. Now you know that file unit numbers can also refer to specific sockets.

NOTE
When a new file unit (or file descriptor) is needed by the kernel, the lowest available unit number is returned. For example, if you were to close standard input (file unit number 0), and then open a file successfully, the file unit number returned by the open(2) call will be zero. Assume for a moment that your program already has file units 0, 1, and 2 open (standard input, output, and error) and the following sequence of program operations is carried out. Notice how the file descriptors are allocated by the kernel:


  1. The open(2) function is called to open a file. 
  2. File unit 3 is returned to reference the opened file. Because this unit is not currently in use, and is the lowest file unit presently available, the value 3 is chosen to be the file unit number for the file. 
  3. A new socket is created using an appropriate function call. 
  4. File unit 4 is returned to reference that new socket. 
  5. Yet, another file is opened by calling open(2). 
  6. File unit 5 is returned to reference the newly opened file.

Notice how the Linux kernel makes no distinction between files and sockets when allocating unit numbers. A file descriptor is used to refer to an opened file or a network socket. This means that you, as a programmer, will use sockets as if they were open files. Being able to reference files and sockets interchangeably by file unit number provides you with a great deal of flexibility. This also means that functions like read(2) and write(2) can operate upon both open files and sockets.

No comments:

Post a Comment