Peter Cooper : UK Web 2.0 and Ruby on Rails consultant

How to develop a stable, pre-forking HTTP daemon in Perl


This is very geeky, so I won't be subjecting the front page to this. To those techies among you, enjoy, comment, and critique. It's also verrry long. I am particularly interested to get comments from people more experienced in this area than me as I may have made major mistakes. That said, this all seems to work swimmingly and corresponds with everything I've learned along the way.




There are a few pages about developing daemons in Perl on the Web, but they largely demand a solid grasp of signals and other concepts that general programmers may not be familiar with. Here I'll attempt to cover the anatomy of a pre-forking UNIX-based daemon in Perl, and how to develop one for yourself, in a general, easy-to-understand way.

What a daemon is

O'Reilly defines a daemon as "A program that runs in the background; that is, without user interaction." Most 'server-like' programs are daemons, e.g. a Web server. A Web server waits for connections and handles those that arrive. It does this constantly without local user interaction on the computer where it is running. Mail servers, time servers, POP3 servers, and the like, are all based around daemons.

Forking or pre-forking?

Most daemon-development tutorials I've found show how to make a basic single-forked daemon, where you run a program from your shell, the program forks and hands back control to the shell, and the daemonized process - "the daemon" - accepts and handles connections one at a time, then loops around to wait for the next connection.

This is bearable with a low level of requests, but what if a request takes about a second to answer and more than one request per second is hitting the server? Requests queue up behind each other, so you'll end up with unsuccessful requests on the client end and a slow service. In any case, if your service is to be stable and deal with bursts of requests, it makes sense to pre-fork many "child" processes which can handle the requests, and have a single "parent" process which manages them.

Let's build an HTTP daemon

Let's build a Web server (an HTTP daemon) of our own. Here are the steps that our program initially needs to take:

  1. Daemonize itself.
    • Set up an INT, TERM, and HUP signal handler so that the parent process kills itself and all child processes if we 'kill' the daemon later.
    • Set up a CHLD 'reaper' signal handler so that if a child process dies, the parent knows about it and fixes it.
  2. Implement a basic HTTP daemon using the HTTP::Daemon module.
  3. Spawn a number of child processes.
    • Fork a new child process.
    • Store the PID of the child and increase the child count so we know how many children we currently have running.
  4. Keep looping around, waiting for dying child processes, and creating new ones when necessary.
    • When a dead child is detected, reduce the child count.
    • Spawn new children up to the child count limit.

When spawned, each child process needs to:

  1. Daemonize itself away from the parent process.
  2. Loop for a certain number of times
    • Accept a connection from HTTP::Daemon
    • Get the request
    • Process the request
    • Send a response
    • Close the connection
  3. Die after a certain number of responses to avoid memory leaks, aid stability, etc.

In this way, you can see what the separate tasks of the parent process and the multiple child processes are. The parent process spawns and manages the children, and the children loop around accepting requests and sending responses.

Code to start with

Let's develop the basics of the parent process. We can fill in the subroutines afterwards, but here's the core logic:

use strict;
use warnings;
use HTTP::Daemon;
use HTTP::Response;
use POSIX;

# Number of child processes to keep going
my $totalChildren = 5;
# Number of requests each child handles before dying
my $childLifetime = 100;
my $logFile = "/tmp/daemon.log";
my %children;
my $children = 0;

&daemonize;

# Create an HTTP daemon
my $d = HTTP::Daemon->new(
  LocalPort => 1981,
  LocalAddr => '127.0.0.1',
  Reuse     => 1,
  Timeout   => 180,
) or die "Cannot create socket: $!\n";

# Log the URL used to access our daemon
logMessage ("master is " . $d->url);
&spawnChildren;
&keepTicking;

This code loads the HTTP::Daemon and POSIX modules, daemonizes itself, and creates an HTTP daemon. (The name is a bit of a misnomer: HTTP::Daemon is really just an extension of IO::Socket that lets you talk HTTP in both directions on a connection.) It then spawns our initial children and goes into a 'keepTicking' subroutine that will launch replacements for dead children for eternity.

Note that the HTTP daemon is created on the 127.0.0.1 interface and uses port 1981. You may want to change these later, but note that you can't use any port below 1024 unless you're running the daemon as root. If you want to access the daemon from another machine on your network, you may need to specify your publicly addressable IP address here depending on your system.
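To make the address and port choice concrete, here is a minimal sketch using plain IO::Socket::INET, which takes the same LocalAddr and LocalPort arguments we passed to HTTP::Daemon above. The open_listener helper is a made-up name for illustration, and port 0 simply asks the operating system to pick any free unprivileged port.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Hypothetical helper (not part of the daemon): open a listening TCP
# socket on the given address and port, or return undef on failure.
sub open_listener {
    my ($addr, $port) = @_;
    return IO::Socket::INET->new(
        LocalAddr => $addr,   # '127.0.0.1' = loopback only, '0.0.0.0' = all interfaces
        LocalPort => $port,   # 0 = let the OS choose a free port
        Listen    => 5,
        Proto     => 'tcp',
        ReuseAddr => 1,
    );
}

# Loopback only: reachable from this machine and nowhere else.
my $local = open_listener('127.0.0.1', 0) or die "Cannot listen: $!\n";
printf "listening on %s:%d\n", $local->sockhost, $local->sockport;
```

Passing '0.0.0.0' instead of '127.0.0.1' would accept connections on every interface, which is usually what you want when testing from another machine.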

Writing the daemonize sub

The first subroutines we need to write are the daemonizer and reaper. These will daemonize our parent process away from the shell and provide functionality for the parent to keep track of how many children are currently running.

Let's start with daemonize itself:

sub daemonize {
  my $pid = fork;
  defined ($pid) or die "Cannot start daemon: $!";

  # If we're the shell-called process,
  # let the user know the daemon is now running.
  print "Parent daemon running.\n" if $pid;
  # If we're the shell-called process, exit back.
  exit if $pid;

  # Now we're a daemonized parent process!

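  # Detach from the shell entirely by starting our own
  # session (and with it our own process group), then
  # point the standard filehandles at /dev/null so that
  # later opens can't land on descriptors 0, 1, or 2.
  (POSIX::setsid() != -1) or die "Cannot start a new session: $!";
  open (STDIN, '<', '/dev/null') or die "Cannot reopen STDIN: $!";
  open (STDOUT, '>', '/dev/null') or die "Cannot reopen STDOUT: $!";
  open (STDERR, '>', '/dev/null') or die "Cannot reopen STDERR: $!";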

  # Set up signals we want to catch. Let's log
  # warnings, fatal errors, and catch hangups
  # and dying children

  $SIG{__WARN__} = sub {
    &logMessage ("NOTE! " . join(" ", @_));
  };

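  $SIG{__DIE__} = sub {
    # This hook also fires for a die inside an eval, which
    # is not fatal, so pass those through untouched.
    die @_ if $^S;
    &logMessage ("FATAL! " . join(" ", @_));
    exit 1;
  };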

  $SIG{HUP} = $SIG{INT} = $SIG{TERM} = sub {
    # Any sort of death trigger results in death of all
    my $sig = shift;
    $SIG{$sig} = 'IGNORE';
    kill 'INT' => keys %children;
    # die runs our __DIE__ handler, which logs and exits
    die "killed by $sig\n";
  };

  # We'll handle our child reaper in a separate sub
  $SIG{CHLD} = \&REAPER;
}

This subroutine simply daemonizes the parent process away from the shell, then sets up things to do when "warn" and "die" are used (redirecting them to our logging subroutine), and sets up things to do when the daemon is killed from outside.
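The logMessage sub that these handlers call is never shown in this article, so here is one possible version. Treat it as a sketch under my own assumptions: the timestamp and PID prefix are my choice, and it appends to the $logFile path declared at the top of the script (redeclared here only so the snippet runs on its own).

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Same default as in the main script; redeclared so this sketch runs alone.
my $logFile = "/tmp/daemon.log";

# Assumed implementation: join all arguments into one line and
# append it to $logFile with a timestamp and the writer's PID.
sub logMessage {
    my $message = join("", @_);
    chomp $message;
    my $stamp = strftime("%Y-%m-%d %H:%M:%S", localtime);
    # Open in append mode on every call so the parent and all
    # children can share one file. Never die in here, because
    # the __DIE__ handler itself calls this sub.
    open(my $fh, '>>', $logFile) or return;
    print {$fh} "[$stamp] [$$] $message\n";
    close($fh);
    return 1;
}
```

Because it joins its arguments, this version copes both with a single pre-built string and with a list of pieces.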

Now let's set up the REAPER sub that's called whenever a child process dies:

sub REAPER {

  my $stiff;
  while (($stiff = waitpid(-1, &WNOHANG)) > 0) {
    warn ("child $stiff terminated -- status $?");
    $children--;
    delete $children{$stiff};
  }

  $SIG{CHLD} = \&REAPER;
}

This sub may look a little intimidating. Basically, it calls waitpid with the POSIX WNOHANG flag, which returns immediately instead of blocking, to collect the PIDs and exit statuses of any dead children (this removes them from the process table and stops them from lingering as defunct 'zombie' processes). For each one it reduces the $children count by one (so that we know to create a new child in its place later) and removes the PID from the parent's list of child PIDs.
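If you want to watch the waitpid loop at work outside the daemon, the following standalone sketch forks a few short-lived children and reaps them without blocking. The names spawn_short_lived and reap_finished are invented for this example and are not part of the daemon.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);

# Fork $n children that exit straight away with codes 1..$n.
# Returns the child PIDs in the order they were created.
sub spawn_short_lived {
    my ($n) = @_;
    my @pids;
    for my $code (1 .. $n) {
        my $pid = fork;
        die "fork failed: $!" unless defined $pid;
        # Child: leave immediately, skipping END blocks and buffers.
        POSIX::_exit($code) if $pid == 0;
        push @pids, $pid;
    }
    return @pids;
}

# Non-blocking reap: collect every child that has already exited
# and return a hash of PID => exit code. Children that are still
# running are left alone, because WNOHANG makes waitpid return 0.
sub reap_finished {
    my %status;
    while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
        $status{$pid} = $? >> 8;   # the high byte holds the exit code
    }
    return %status;
}
```

Calling reap_finished repeatedly is safe: once there are no children left, waitpid returns -1 and the loop ends at once.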

Now let's write the routines which spawn the initial children and then create new children as others die.

Writing the parent's child maintenance subroutines

Let's start with spawnChildren:

sub spawnChildren {

  for (1..$totalChildren) {
    &newChild();
  }
}

That was painless. It simply calls newChild for each new child needed. So let's move on to the child handler that runs permanently in the parent process after initialization:

sub keepTicking {

  while ( 1 ) {
    # Sleep until a signal (such as CHLD) wakes us up
    sleep;
    # Top the pool back up to $totalChildren
    for (my $i = $children; $i < $totalChildren; $i++ ) {
      &newChild();
    }
  }
}

All this code does is loop indefinitely: it waits until a signal arrives, then spawns the difference between the number of required children and the number of existing children.

Our parent daemon is complete! Now we just need to implement newChild so that child processes are spawned and can do their job.

Writing the child process

Let's recap what the child process needs to do:

  1. Daemonize itself away from the parent process.
  2. Loop for a certain number of times
    • Accept a connection from HTTP::Daemon
    • Get the request
    • Process the request
    • Send a response
    • Close the connection
  3. Die after a certain number of responses to avoid memory leaks, aid stability, etc.

So let's get the whole sub done in one go, then look at it:

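sub newChild {

  # Fork a child, blocking SIGINT around the fork so the
  # signal can't arrive before the child is recorded.
  my $pid;
  my $sigset = POSIX::SigSet->new(SIGINT);
  sigprocmask(SIG_BLOCK, $sigset) or die "Can't block SIGINT for fork: $!";
  die "Cannot fork child: $!\n" unless defined ($pid = fork);
  if ($pid) {
    $children{$pid} = 1;
    $children++;
    # Unblock SIGINT again now that the child is recorded
    sigprocmask(SIG_UNBLOCK, $sigset) or die "Can't unblock SIGINT after fork: $!";
    warn "forked new child, we now have $children children";
    return;
  }

  # We're the child. Drop the handlers inherited from the
  # parent so a signal only ends this process, then unblock
  # SIGINT so the parent's kill 'INT' can reach us.
  $SIG{HUP} = $SIG{INT} = $SIG{TERM} = $SIG{CHLD} = 'DEFAULT';
  sigprocmask(SIG_UNBLOCK, $sigset) or die "Can't unblock SIGINT in child: $!";

  # Loop for a certain number of times
  my $i = 0;
  while ($i < $childLifetime) {
    $i++;
    # Accept a connection from HTTP::Daemon
    my $c = $d->accept or last;
    $c->autoflush(1);
    logMessage ("connect:". $c->peerhost . "\n");

    # Get the request (passing 1 reads the headers only)
    my $r = $c->get_request(1) or last;

    # Process the request..
    # you can do whatever you like here..
    # we blindly respond to anything for now..
    my $url = $r->uri;
    my $response = HTTP::Response->new(200);
    logMessage ($c->peerhost . " " . $d->url . $url . "\n");
    $response->content("This child has served $i requests.");
    $response->header("Content-Type" => "text/html");
    $c->send_response($response);
    $c->close;
  }

  warn "child terminated after $i requests";
  exit;
}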

Each stage from our initial chart is commented in the code so you can follow what's happening. The first block forks the parent process to create a child process. The child process then enters into a loop which accepts incoming connections using the HTTP::Daemon handle, then sends back a basic response. It logs various things about the request, then closes the connection and loops around for another try.

If you run the completed app, then go to http://127.0.0.1:1981/ in your Web browser, you should get a message in return. If you keep hitting refresh you will see how each child process individually responds to your requests, since it will say "This child has served 1 requests." several times before changing to "This child has served 2 requests." This is because you have multiple children, each with their own count, handling requests.
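If you'd rather not sit hitting refresh, a small client can make the requests for you and tally the replies. This is only a sketch: it uses the core HTTP::Tiny module instead of a browser, and fetchBody and tallyResponses are names I've made up for it.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Hypothetical helper: fetch a URL over a fresh connection and
# return the response body, or undef if the request failed.
sub fetchBody {
    my ($url) = @_;
    my $res = HTTP::Tiny->new(timeout => 5)->get($url);
    return $res->{success} ? $res->{content} : undef;
}

# Make $count requests and count how often each distinct body
# comes back. Failed requests are tallied under '(no response)'.
sub tallyResponses {
    my ($url, $count) = @_;
    my %seen;
    for (1 .. $count) {
        my $body = fetchBody($url);
        $seen{ defined $body ? $body : '(no response)' }++;
    }
    return %seen;
}

# Usage, with the daemon running:
#   my %tally = tallyResponses('http://127.0.0.1:1981/', 20);
#   print "$tally{$_} x $_\n" for sort keys %tally;
```

With several children sharing the work, you should see the same low counts reported more than once, because each child keeps its own counter.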

Note that to kill the daemon, you can run something like killall perl to send a TERM signal to the daemon which, if your TERM handler is coded properly, will send all the processes to their deaths. Depending on your implementation, you may need to killall <whatever-app-name-you-used>. Run a ps if in doubt.
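Since killall perl hits every Perl process on the machine, a gentler option is to have the daemon record its own PID and signal only that. The sketch below is an assumption on my part rather than something the daemon above already does: writePidFile and readPidFile are hypothetical helpers, and /tmp/daemon.pid is just an example path.

```perl
use strict;
use warnings;

# Hypothetical helper: write a PID to a file, one number on one line.
sub writePidFile {
    my ($path, $pid) = @_;
    open(my $fh, '>', $path) or die "Cannot write $path: $!\n";
    print {$fh} "$pid\n";
    close($fh) or die "Cannot close $path: $!\n";
    return 1;
}

# Hypothetical helper: read the PID back. Returns undef if the
# file is missing or does not contain a plain number.
sub readPidFile {
    my ($path) = @_;
    open(my $fh, '<', $path) or return;
    my $pid = <$fh>;
    close($fh);
    return unless defined $pid && $pid =~ /^(\d+)$/;
    return $1;
}

# In the daemon, after the fork in daemonize:
#   writePidFile('/tmp/daemon.pid', $$);
#
# In a separate stop script:
#   my $pid = readPidFile('/tmp/daemon.pid');
#   kill 'TERM', $pid if $pid;
```

The TERM signal then reaches only the parent, and the parent's own handler takes care of the children.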

Extending this

You can extend the above app as far as you like. It's only a bare-bones pre-forked HTTP daemon. You could rip out HTTP::Daemon, and use Socket or IO::Socket to write an entirely bare daemon of your own (using your own protocol) which you could telnet to. You could hook up your child processes to a MySQL connection and store and retrieve information based on the user's requests to make a chat server or similar (this is how RSS Digest works).

An alternate way: Forking children after each request

Take this page from 'Advanced Perl Programming'. It uses an alternate method to that above, where the parent process does the accept and then immediately passes the request handle to a newly forked child process. This, I feel, is inefficient in high volume situations, but may be beneficial in others. If anyone can enlighten me as to why this may be a preferred method, comment here.
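For comparison, here is roughly what that fork-per-request shape looks like. This is my own minimal sketch of the idea, not the book's code: it uses a plain IO::Socket::INET listener and a one-line echo protocol instead of HTTP, and accept_and_fork is a made-up name.

```perl
use strict;
use warnings;
use POSIX ();
use IO::Socket::INET;

# Accept one connection in the parent, then fork a child to handle
# it. Returns the child's PID to the caller (the parent), which
# should reap it later, e.g. with waitpid.
sub accept_and_fork {
    my ($listener) = @_;
    my $conn = $listener->accept or return;
    my $pid = fork;
    die "Cannot fork: $!" unless defined $pid;
    if ($pid) {
        # Parent: the child owns this connection now.
        close $conn;
        return $pid;
    }
    # Child: no need for the listening socket. Read one line,
    # echo it back, and leave without running the parent's cleanup.
    close $listener;
    my $line = <$conn>;
    $line = '' unless defined $line;
    print {$conn} "echo: $line";
    close $conn;
    POSIX::_exit(0);
}
```

The cost of this approach is one fork per connection, paid while the client is waiting; the pre-forked pool pays it up front instead.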

In conclusion..

This article has been written rather quickly, so there may be mistakes or things I've gone over too quickly. That's why comments are turned on. Leave comments, fixes, complaints, etc, and I'll deal with them. Thanks for reading!


May 08, 2005 | Posted by peter | Comments (1)
Comments

Hey, this is cool! Thanks for giving even more back to the community with this tutorial.

Posted by: Nick Gray at May 8, 2005 09:46 PM
