Tying Standard Input and Output to a Socket Connection

==============================================================================  
C-Scene Issue #05
Tying Standard Input and Output to a Socket Connection
Platform: UNIX
Kyle R. Burton
==============================================================================  

Synopsis


  In this article, I attempt to show how you can establish
a socket connection and tie the standard file descriptors
to the socket.  This allows a program to read from standard
input and write to standard output, which is much simpler
than using the socket functions to get input and give output.

  I recently had a project where I needed to offer the
services of a legacy command line program across a network.
This program only ran under Unix, and its services needed
to be offered as is to various clients across a TCP/IP network.

  The proposed solution was to write a concurrent server that
accepted incoming connections and invoked this legacy
command line application, while providing a conduit between
the socket connection and the command line program's input
and output.  The source to that project can be obtained from:


http://www.voicenet.com/~mortis/projects/tcp_server/tcp_server.html


Proposed Solution


  I would like to point out that this code is extracted
from the actual production code; the major difference is
that most of the error handling has been removed here.

  Step one is to set up the skeleton server code to accept
socket connections and fork a child process to handle the
communications with the connected client.

  The following code should be sufficient for accepting
multiple socket connections:

                                   ---------
                                   Listing 1
                                   ---------
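#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>

int exec_comm_handler( int sck );   /* see Listing 2 */

int main( void )
{
  int sck, client;
  socklen_t addrlen;
  struct sockaddr_in this_addr, peer_addr;
  pid_t child_pid;
  unsigned short port = 4137; /* arbitrary port to listen on */

  memset( &this_addr, 0, sizeof this_addr );
  memset( &peer_addr, 0, sizeof peer_addr );

  this_addr.sin_port        = htons(port);
  this_addr.sin_family      = AF_INET;
  this_addr.sin_addr.s_addr = htonl(INADDR_ANY);

  sck = socket( AF_INET, SOCK_STREAM, 0 );   /* 0: default protocol, i.e. TCP */
  bind( sck, (struct sockaddr *) &this_addr, sizeof this_addr );
  listen( sck, 5 );

  addrlen = sizeof peer_addr;
  while( -1 != (client = accept( sck, (struct sockaddr *) &peer_addr, &addrlen ) ) ) {
    child_pid = fork();
    if( child_pid < 0 ) {           /* error */
      perror("Error forking");
      exit(1);
    }

    if( child_pid == 0 ) {          /* child: serve this one client */
      close(sck);                   /* the child does not need the listener */
      exec_comm_handler(client);    /* the connected socket, not the listener */
    }

    close(client);                  /* parent: the child owns the connection now */

    /* Reap any children that have already exited, without blocking. */
    while( waitpid( -1, NULL, WNOHANG ) > 0 )
      ;

    addrlen = sizeof peer_addr;     /* accept(2) overwrites addrlen */
  }

  perror("accept");
  return 1;
}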
                                   ---------
                                   Listing 1
                                   ---------

  There are four steps necessary for setting up a socket and
listening for connections.  These are socket(2), bind(2),
listen(2), and accept(2).

The first, socket(2), is used to obtain a socket file descriptor.
The arguments passed to our call to socket(2) are:

  the domain, for which we pass AF_INET - the address
  family for the internet (IPv4).

  the socket type, for which we pass SOCK_STREAM.  This
  opens a stream based, connection oriented, reliable socket
  (like TCP) versus a datagram based, connectionless, unreliable
  socket (SOCK_DGRAM, like UDP).

  and the protocol.  A protocol of 0 (which is also the value
  of IPPROTO_IP) asks for the default protocol for the given
  domain and socket type; for an AF_INET stream socket that is
  TCP.  IPPROTO_TCP requests TCP explicitly.  Had we chosen a
  different domain or type, we would be given a socket of a
  kind appropriate for that combination instead.

  Once the call to socket(2) has been made, the socket needs
to be bound to a specific port.  This is accomplished with the
bind(2) call.  bind(2) is declared to take a pointer to a generic
struct sockaddr; for AF_INET sockets we fill in a struct
sockaddr_in (a structure that is not used for other networking
domains) and pass its address with a cast, along with its size.
This structure needs to be initialized before we pass it to the
bind call.  We set the port (sin_port), address family
(sin_family), and the local address (sin_addr.s_addr).  Using
INADDR_ANY as the local address means the server accepts
connections arriving on any of the host's interfaces.

  The htons(3) and htonl(3) calls are used to put the port
and the local address into network byte order.  htons(3) stands
for host to network short, while htonl(3) stands for host to
network long.  Due to the different machine [host] byte orders
(remember little endian vs big endian?), a standard had to be
set for byte ordering while communicating over sockets; network
byte order is big endian.  Thankfully you don't need to worry
about the specific order, just how to get your data into that
order, in a machine portable way of course.  Once the sockaddr_in
structure has been initialized, bind(2) can be called.

  The third step is to put the socket into a state where it
is listening for incoming connections, using listen(2).  The two
parameters to this function are the socket file descriptor
and the size of the backlog you wish to use.  The backlog
is the number of pending connections you want to have queued
up while you're busy doing other things (like servicing other
sockets).

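  Now that we have reached this point, we can call accept(2).
accept(2) will block until a client connects to us.  When a
client connects, accept(2) returns a new file descriptor for
that particular connection and fills in peer_addr with the
client's address; addrlen must hold the size of peer_addr on
entry, and accept(2) updates it.  The original socket stays open
and keeps listening for further connections, so it is the new
descriptor that must be handed to the child.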

  Once the client connection has been accepted, and the process
has forked, the child can then go on to perform the duties of
redirecting the input/output file descriptors, and executing
the desired program.

                                   ---------
                                   Listing 2
                                   ---------
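int exec_comm_handler( int sck )
{
  close(0); /* close standard input  */
  close(1); /* close standard output */
  close(2); /* close standard error  */

  /* dup(2) returns the lowest free descriptor: 0, then 1, then 2. */
  if( dup(sck) != 0 || dup(sck) != 1 || dup(sck) != 2 ) {
    perror("error duplicating socket for stdin/stdout/stderr");
    exit(1);
  }
  if( sck > 2 )
    close(sck);  /* 0, 1 and 2 now refer to the socket; this copy is not needed */

  printf("this should now go across the socket...\n");
  fflush(stdout);  /* stdout is fully buffered on a socket, and exec discards the buffer */

  /* execl's argument list must end with a null pointer. */
  execl( "/bin/sh", "/bin/sh", "-c", "/path/to/redirected_program", (char *) NULL );
  perror("the execl(3) call failed.");
  exit(1);
}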
                                   ---------
                                   Listing 2
                                   ---------

  Our 'comm handler' first closes the file descriptors for standard input,
standard output, and standard error.  It then uses dup(2) to duplicate
the socket file descriptor.  dup(2) returns the lowest-numbered descriptor
that is not currently open, so since 0, 1, and 2 have just been closed,
the three duplicates should be 0, 1, and 2, in that order.  Now operations
on stdin/stdout/stderr [0/1/2] will act upon the socket instead of the
original stdin/stdout/stderr, and the program started with execl(3)
inherits those descriptors.
  
  I was unaware that this technique (calling dup(2) on a socket file
descriptor) was possible until seeing code written by Martin Mares.


Conclusion


  This technique is not limited to sockets; you can use the
dup(2) call on any valid file descriptor.  This includes files
opened with open(2).  So it is possible to use this technique
to redirect the input and output of a program to/from disk files
or device files under Unix.

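  One thing to look out for is buffering.  When standard output
is not a terminal, as is the case once it has been tied to a
socket, the stdio library fully buffers it, so output can sit in
the program's buffer instead of going out over the connection.
The C style I/O routines (stdio.h) seem to suffer from this more
often than the C++ style I/O routines (iostream.h).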

  If there is a way to disable this buffering from outside a
program you don't control, I am not aware of it, and would
appreciate hearing about it.  One way to get around this is to
flush stdout explicitly with fflush(3).  Although buffering is a
potential problem, I have not really seen it make a significant
impact in any of the situations where I've used this technique.




This page is Copyright © 1998 By
C Scene. All Rights Reserved