Safe CGI Programming Last updated: 1995-09-03
----------------------------------------------------------------------
[Note -- the last update of any thoroughness was indeed 1995-09-03.
However, it turns out people are still using this, so I feel obliged
to at least correct the glaring errors. See the section on identifying
safe characters with regular expressions for an important update.
Thanks. -- PSP 1997-07-08]
[Updated again 1997-10-27 with a few notes from Dave Andersen.]
Recent exposure of security holes in several widely used CGI packages
indicates that the existing documents on CGI security have not taken
hold in the public consciousness. These scripts are being redistributed
to people that have no programming experience and no way to determine
whether they are opening up their servers for attack. This causes
considerable frustration for all involved.
This document is intended for the beginning or intermediate
CGI programmer. It is by no means a comprehensive analysis of
the security risks -- its purpose is to help people avoid the
most common errors. This document and other CGI security resources
are available at
Please send comments on this document to Paul Phillips
Q: "Why should I care? The server runs as nobody, right? That means
you can't do anything dangerous, even if you break a CGI script."
A: Wrong. Some of the actions that can be taken in various
circumstances are:
1) Mailing the password file to the attacker (unless shadowed)
2) Mailing a map of the filesystem to the attacker
3) Mailing system information from /etc to the attacker
4) Starting a login server on a high port and telneting in
5) Many denial of service attacks: massive filesytem finds,
for example, or other resource consuming commands
6) Erasing and/or altering the server's log files
Another problem is that some sites are running their webservers
as root. I CANNOT EMPHASIZE ENOUGH HOW BAD THIS IS. You are shooting
yourself in the foot. Whatever problem inspired you to do this, you
must solve it in some other manner, or you *will* be compromised in
the future.
There has been some confusion as to what it means to "run your
webserver as root." It is fine to *start* the webserver as root.
This is necessary to bind to port 80 on Unix systems. However,
the webserver should then give away its privileges with a call
to setuid. The webserver's configuration file should allow you
to specify what user it should run as; the default is normally
"nobody", a generic unprivileged account. Remember that it is
irrelevant which account owns the binary, and the program should
not have the setuid bit set.
There is a good argument that servers should not actually run as
"nobody", but rather as a specific UID and GID dedicated to the
webserver, such as "www". This prevents other programs that run
as "nobody" from interfering with server-owned files.
There is a program called "cgiwrap"
that runs CGI scripts under the UID of the person that owns them. While
cgiwrap successfully overcomes some problems with CGI scripts, it also
exacerbates the effect of security holes. If an attacker can execute
commands under the user UID, rm -rf ~ is only a few characters long,
and the user will lose everything.
Q: "Now I'm scared, maybe my code is buggy. Can you show me some
examples of security holes?"
A: Now you're talking. The entire philosophy can be summed up as
"Never trust input data." Most security holes are exploited by
sending data to the script that the author of the script did not
anticipate. Let's look at some examples.
Foo wants people to be able to send him email via the web. She
has several different email addresses, so she encodes an element
specifying which one so she can easily change it later without
having to change the script. (She needs her sysadmin's permission
to install or change CGI scripts -- what a hassle!)
Now she writes a script called "email-foo", and cajoles the sysadmin
into installing it. A few weeks later, Foo's sysadmin calls her back:
crackers have broken into the machine via Foo's script! Where did
Foo go wrong?
Let's see Foo's mistake in three different languages. Foo has
placed the data to be emailed in a tempfile and the FooAddress
passed by the form into a variable.
Perl:
system("/usr/lib/sendmail -t $foo_address < $input_file");
C:
sprintf(buffer, "/usr/lib/sendmail -t %s < %s", foo_address, input_file);
system(buffer);
C++:
system("/usr/lib/sendmail -t " + FooAddress + " < " + InputFile);
In all three cases, system is forking a shell. Foo is unwisely
assuming that people will only call this script from *her* form, so
the email address will always be one of hers. But the cracker copied
the form to his own machine, and edited it so it looked like this:
Then he submitted it to Foo's machine, and the rest is history,
along with the machine.
Q: "I never use system. I guess my scripts are all safe then!"
A: System is not the only command that forks a shell. In Perl,
you can invoke a shell by opening to a pipe, using backticks, or
calling exec (in some cases.)
* Opening to a pipe: open(OUT, "|program $args");
* Backticks: `program $args`;
* Exec: exec("program $args");
You can also get in trouble in Perl with the eval statement or
regular expression modifier /e (which calls eval.) That's beyond
the scope of this document, but be careful.
In C/C++, the popen(3) call also starts a shell.
* popen("program", "w");
Q: "What's the right way to do it?"
A: Generally there are two answers: use the data only where it can't
hurt you, or check it to make sure it is safe.
*1* Avoid the shell.
open(MAIL, "|/usr/lib/sendmail -t");
print MAIL "To: $recipient\n";
Now the untrusted data is no longer being passed to the shell. However,
it is being passed unchecked to sendmail. In some sense you are trading
the shell problems for those of the program you are running externally,
so be sure that it cannot be tricked with the untrusted data! For example
if you use /usr/ucb/mail rather than /usr/lib/sendmail, ~-escapes can be
used (on some versions) to execute commands. Be wary.
You can use the perl system() and exec() calls without invoking a shell
by supplying more than one argument:
system('/usr/games/fortune', '-o');
You can also use open() to achieve an effect similar to popen, but
without invoking the shell, by performing
open(FH, '|-') || exec("program", $arg1, $arg2);
*2* Avoid insecure data.
unless($recipient =~ /^[\w@\.\-]+$/) {
# Print out some HTML here indicating failure
exit(1);
}
This time we're making sure the data is safe for passing to the
shell. The example regexp above specifies what is safe rather than
what is unsafe.
if($to =~ tr/;<>*|`&$!#()[]{}:'"//) {
# Print out some HTML here indicating failure
exit(1);
}
Or, to escape metacharacters rather than just detecting them,
a subroutine like this could be used:
sub esc_chars {
# will change, for example, a!!a to a\!\!a
@_ =~ s/([;<>\*\|`&\$!#\(\)\[\]\{\}:'"])/\\$1/g;
return @_;
}
[UPDATE! As if to highlight the danger inherent in specifying
unsafe characters rather than safe, several oversights in the above
regexp have been pointed out to me. First, the ^ character (carat)
acts as a pipe under some shells, and should also be escaped.
Second, the \n character (newline) is not listed, which could
delimit shell commands depending on circumstances. And perhaps
most worrisome, the shell escape character itself \ (backslash)
could be present in external input. If an input stream of
foo\;bar
were run through the substitution above, it would yield
foo\\;bar
once again exposing the ; as a shell metacharacter. In short,
pay attention to the paragraph below, it's as true now as it ever
was. Note that I *have not* modified the esc_chars routine in
light of this information, so do not use it as-is.
Update Jul 13 1997: the beat goes on. The regexp also excludes
the ? metacharacter (which is almost as dangerous as *) and ASCII
255, which is treated as a delimiter by some shells.]
These regexps specify what is unsafe. I believe them to be a complete
list of potentially dangerous metacharacters, but I have no authoritative
source to check. The difference between the latter two regexps and the
first is the difference between the two security policies "that which is
not expressly permitted is forbidden" and "that which is not expressly
forbidden is permitted." All security professionals will tell you that the
former policy is safer.
For maximum security, use both *1* and *2* where possible.
USE PERL TAINT CHECKS: Perl can be very helpful with these problems.
Invoke it with perl -T to force taint checks; to learn about taint
checks, see the perl man page. (The -T option exists only under Perl5.)
DON'T MAKE ASSUMPTIONS ABOUT YOUR ENVIRONMENT:
Just because cgi-bin programs are traditionally
executed within the sanitized environment provided by the
webserver, on multiuser systems it may be possible for other
users to execute your cgi-bin programs, or force an execution
of them in an unexpected context. To prevent against this,
cgi-bin programs (especially if they run setuid) should
sanitize their environments appropriately before spawning
any shells or invoking any other programs. At a minimum, set
the value of the PATH and IFS environment variables to a known
state:
$ENV{"PATH"} = "/bin:/usr/bin:/usr/local/bin";
$ENV{"IFS"} = "/";
It takes a bit more work, but resetting the environment
to null using undef() and then building a completely known
environment is probably a safer way to accomplish this.
Note that perl in taint-checking mode will warn you if you
attempt to system() something without first setting your
path and IFS appropriately.
Q: Can I trust user supplied data if there is no shell involved?
A: No. There are other issues as well. Consider this perl code fragment:
open(MANPAGE, "/usr/man/man1/$filename.1");
This is intended to allow HTML access to man pages. However, what if the
user supplied filename is
../../../etc/passwd
Anytime you are dealing with pathnamess, be sure to check for
the .. component.
Q: "What else?"
A: In C and C++, improperly allocated memory is vulnerable to
buffer overruns. Perl dynamically extends its data structures to
prevent this. Imagine code like this:
int foo() {
char buffer[10];
strcpy(buffer, get_form_var("feh"));
/* etc */
}
When writing this code, the author certainly expected the value of
the feh variable to be less than 10 characters. Unfortunately for
him, he didn't make sure, and it turned out to be much longer. This
means that user data is overwriting the program stack, which in some
circumstances can be used to invoke commands.
This is very difficult to exploit and you probably will not encounter
it. Still, it's worth mentioning; a very similar hole was found in
NCSA httpd 1.3 earlier in 1995. It is poor programming practice not to
check such things anyway.
Along the same lines, under no circumstances should the C gets()
function be used. It's inherently insecure, as there is no way
to specify how large the input buffer is. Use fgets() on the stdin
stream instead.
Q: "My WWW server doesn't run on a unix platform. Only unix has all
these nasty security holes."
A: This may or may not be true. The author of this document has limited
experience with servers on other platforms, but he is more than a little
skeptical that security concerns do not exist. At the very least, the
gets() and stack-overflow issues are present on Windows and MacOS as well.
Specific examples of other CGI dangers on other platforms are welcomed.
Specifics here contributed by Dave Andersen:
Not only have buffer overflows been found on other platforms, but
glaring security holes in a number of cgi-bin scripts on other platforms
(notably Windows) have been found. Scripting languages such as perl
are in common use in Windows-based webservers. The lack of a telnet
port as is found in a UNIX based webserver is no deterrent to an
attacker who has under his or her command the full power to execute
arbitrary programs on a compromised webserver.
It is very likely that Windows NT webservers will be the targets
of the future because they haven't been as thouroughly exploited
and tested. They also make relatively easy targets of
denial-of-service attacks, so particular care should be paid
by programmers of cgi scripts which run on these machines to
avoid serious resource misuse which could present an attacker
with a method of disabling the machine. (Free memory responsibly,
always check the length on input data, ensure that in the case
of an abnormal termination due to user-supplied input that you
always release any resources which you might have).
*Appendix*
Contributions to this document welcomed at .
Thanks to those that have contributed to this document:
John Halperin
Maurice L. Marvin
Dave Andersen
Zygo Blaxell
Joe Sparrow
Keith Golden
James W. Abendschan
Jennifer Myers
Jarle Fredrik Greipsland
David Sacerdote