An extremely common security flaw is vulnerability to a ``buffer overflow''.
Buffer overflows are also called ``buffer overruns'', and there are
many kinds of buffer overflow attacks (including
``stack smashing'' and ``heap smashing'' attacks).
Technically, a buffer overflow is a problem with the program's internal
implementation, but it's such a common and serious problem that
I've placed this information in its own chapter.
To give you an idea of how important this subject is,
at CERT, 9 of 13 advisories in 1998 and at least half of
the 1999 advisories involved buffer overflows.
An informal 1999 survey on Bugtraq found that approximately 2/3 of the
respondents felt that buffer overflows were the leading cause of
system security vulnerability (the remaining respondents identified
``mis-configuration'' as the leading cause) [Cowan 1999].
This is an old, well-known problem, yet it continues to resurface
[McGraw 2000].
A buffer overflow occurs when you write a set of values
(usually a string of characters) into a fixed length buffer
and write at least one value outside that buffer's boundaries
(usually past its end).
A buffer overflow can occur when reading input from the user into a buffer,
but it can also occur during other kinds of processing in a program.
If a secure program permits a buffer overflow, the overflow can often be
exploited by an adversary.
If the buffer is a local C variable, the overflow can be used to
force the function to run code of an attacker's choosing.
This specific variation is often called a ``stack smashing'' attack.
A buffer in the heap isn't much better; attackers may be able to
use such overflows to control other variables in the program.
More details can be found from Aleph1 [1996], Mudge [1995], LSD [2001],
or Nathan P. Smith's
"Stack Smashing Security Vulnerabilities" website at
http://destroy.net/machines/security/.
A discussion of the problem and some ways to counter it is given
by Crispin Cowan et al, 2000, at
http://immunix.org/StackGuard/discex00.pdf.
A discussion of the problem and some ways to counter it in Linux
is given by
Pierre-Alain Fayolle and Vincent Glaume
at
http://www.enseirb.fr/~glaume/indexen.html.
Most high-level programming languages are essentially
immune to this problem, either
because they automatically resize arrays (e.g., Perl), or because they normally
detect and prevent buffer overflows (e.g., Ada95).
However, the C language provides no protection against
such problems, and C++ can easily be used in ways that cause this problem too.
Assembly language also provides no protection, and some languages
that normally include such protection (e.g., Ada and Pascal) can have
this protection disabled (for performance reasons).
Even if most of your program is written in another language,
many library routines are written in C or C++, as well as ``glue'' code to
call them, so other languages often don't provide as complete a protection
from buffer overflows as you'd like.
C users must avoid using dangerous functions that do not check bounds
unless they've ensured that the bounds will never get exceeded.
Functions to avoid in most cases (or to use only with such protection) include
the functions strcpy(3), strcat(3), sprintf(3)
(with cousin vsprintf(3)), and gets(3).
These should be replaced with functions such as strncpy(3), strncat(3),
snprintf(3), and fgets(3) respectively, but see the discussion below.
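
As a minimal sketch of such a replacement, the following builds a path
in a fixed-length buffer with snprintf() where sprintf() or
strcpy() plus strcat() might otherwise be used.
The helper name join_path() and its error convention are hypothetical,
chosen only for illustration, and the truncation test assumes a
snprintf() with C99 return semantics.

```c
#include <stdio.h>

/* Hypothetical helper (not from the original text): build "dir/name" in
   out, which holds outsize bytes.  Returns 0 on success, or -1 if the
   arguments are unusable or the result did not fit. */
int join_path(char *out, size_t outsize, const char *dir, const char *name)
{
    int n;

    if (out == NULL || outsize == 0 || dir == NULL || name == NULL)
        return -1;

    /* snprintf() writes at most outsize bytes, terminator included;
       sprintf() would keep writing past the end of out. */
    n = snprintf(out, outsize, "%s/%s", dir, name);

    /* Assumption: C99 semantics, where n is the length the complete
       result needs.  n >= outsize therefore means truncation. */
    if (n < 0 || (size_t)n >= outsize) {
        out[0] = '\0';   /* do not hand back a silently truncated path */
        return -1;
    }
    return 0;
}
```

Rejecting a truncated result outright, rather than using whatever fit,
is a design choice; a truncated file name can be a security problem of
its own.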
The function strlen(3) should be avoided unless you can ensure that there
will be a terminating NIL character to find.
The scanf() family (scanf(3), fscanf(3), sscanf(3), vscanf(3),
vsscanf(3), and vfscanf(3)) is often dangerous to use; do not use it
to send data to a string without controlling the maximum length
(the format %s is a particularly common problem).
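
One way to control the maximum length is to give %s an explicit field
width that is one less than the buffer size.
The sketch below is only an illustration; the names WORD_MAX and
first_word() are hypothetical.

```c
#include <stdio.h>

#define WORD_MAX 15   /* hypothetical limit; buffers hold WORD_MAX + 1 bytes */

/* Hypothetical helper: copy the first whitespace-delimited word of input
   into word, which must hold at least WORD_MAX + 1 bytes.
   Returns 1 if a word was stored, 0 otherwise. */
int first_word(const char *input, char word[WORD_MAX + 1])
{
    if (input == NULL || word == NULL)
        return 0;

    /* "%15s" stores at most 15 characters plus the terminator; a bare
       "%s" has no limit at all.  The width in the format string must be
       kept in step with WORD_MAX by hand. */
    return sscanf(input, "%15s", word) == 1;
}
```

Input longer than the width is cut off rather than rejected, so callers
that need to detect over-long words must check for that separately.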
Other dangerous functions that may permit buffer overruns (depending on their
use) include
realpath(3), getopt(3), getpass(3),
streadd(3), strecpy(3), and strtrns(3).
You must be careful with getwd(3); the buffer sent to getwd(3) must be
at least PATH_MAX bytes long.
The select(2) helper macros
FD_SET(), FD_CLR(), and FD_ISSET() do not check that the index fd
is within bounds; make sure that fd >= 0 and fd < FD_SETSIZE
(this particular one has been exploited in pppd).
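
A small wrapper can make that check hard to forget.
This is a sketch only; fd_set_checked() is a hypothetical name, and it
relies on the valid descriptors for an fd_set being
0 through FD_SETSIZE - 1.

```c
#include <stddef.h>
#include <sys/select.h>

/* Hypothetical wrapper: add fd to set only if it is in range.
   Returns 0 on success, -1 if fd cannot be stored in an fd_set. */
int fd_set_checked(int fd, fd_set *set)
{
    if (set == NULL || fd < 0 || fd >= FD_SETSIZE)
        return -1;    /* FD_SET() itself would write out of bounds */
    FD_SET(fd, set);
    return 0;
}
```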
Unfortunately, snprintf()'s variants have additional problems.
Officially, snprintf() is not a standard C function in the ISO 1990
(ANSI 1989) standard, though sprintf() is,
so not all systems include snprintf().
Even worse, some systems' implementations of snprintf() do not actually protect
against buffer overflows; they just call sprintf directly.
Old versions of Linux's libc4 depended on a ``libbsd'' that did this
horrible thing, and I'm told that some old HP systems did the same.
Linux's current version of snprintf is known to work correctly, that is, it
does actually respect the boundary requested.
The return value of snprintf() varies as well;
the Single Unix Specification (SUS) version 2
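and the C99 standard differ on what is returned by snprintf()
(under C99 it is the length the complete output would have needed,
so a value greater than or equal to the buffer size signals truncation).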
Finally, it appears that at least some versions of
snprintf don't guarantee that its string will end in NIL; if the
string is too long, it won't include NIL at all.
Note that the glib library (the basis of GTK, and not the same as the
GNU C library glibc) has a g_snprintf(), which
has a consistent return semantic, always NIL-terminates, and
most importantly always respects the buffer length.
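
Where g_snprintf() isn't available, a thin wrapper over vsnprintf() can
paper over some of these differences.
The following is a sketch under stated assumptions, not a complete
answer: fmt_checked() is a hypothetical name, it assumes the underlying
vsnprintf() at least respects the size it is given, and it assumes
truncation is reported either the C99 way (a return value greater than
or equal to the size) or by a negative return value.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical wrapper: format into buf (size bytes), always terminate,
   and return 0 if the whole result fit or -1 otherwise. */
int fmt_checked(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    if (buf == NULL || size == 0 || fmt == NULL)
        return -1;

    va_start(ap, fmt);
    n = vsnprintf(buf, size, fmt, ap);
    va_end(ap);

    /* Terminate even if this vsnprintf() did not do so on truncation. */
    buf[size - 1] = '\0';

    /* A negative value or a value >= size both mean the output failed
       or was cut short.  An implementation that returns the number of
       bytes actually stored on truncation would NOT be caught here. */
    if (n < 0 || (size_t)n >= size)
        return -1;
    return 0;
}
```

Nothing in a wrapper like this can help on a system whose snprintf()
ignores the size argument entirely.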
Of course, the problem is more than just calling string functions poorly.
Here are a few additional examples of types of buffer overflow problems,
graciously suggested by Timo Sirainen, involving manipulation of
numbers to cause buffer overflows.
First, there's the problem of signedness.
If you read data that affects the buffer size,
such as the "number of characters to be read,"
be sure to check whether the number is less than zero
(or less than one, if zero is not a valid value).
Otherwise, the negative number may be cast to an unsigned number,
and the resulting large positive number
may then permit a buffer overflow problem.
Note that sometimes an attacker can provide a large positive number and
have the same thing happen;
in some cases, the large value will be interpreted as a negative number
(slipping by the check for large numbers if there's no check
for a less-than-one value),
and later be converted back into a large positive value.
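
A minimal sketch of the check described above: validate both ends of the
range while the value is still signed, and only then convert it.
The names read_full(), read_record(), and MAX_LEN are hypothetical;
MAX_LEN simply reuses the 8000-byte limit from Timo's example.

```c
#include <stdlib.h>
#include <unistd.h>

#define MAX_LEN 8000   /* hypothetical limit, as in the example in the text */

/* Hypothetical helper: read exactly count bytes from fd.
   Returns 0 on success, -1 on error or early end of file. */
int read_full(int fd, void *dst, size_t count)
{
    char *p = dst;

    while (count > 0) {
        ssize_t r = read(fd, p, count);
        if (r <= 0)
            return -1;
        p += r;
        count -= (size_t)r;
    }
    return 0;
}

/* Hypothetical helper: read a length-prefixed record from fd.
   Returns a malloc()ed buffer and stores its length in *len_out,
   or returns NULL if the length is out of range or reading fails. */
char *read_record(int fd, int *len_out)
{
    int len;
    char *buf;

    if (len_out == NULL || read_full(fd, &len, sizeof len) != 0)
        return NULL;

    /* Check the lower AND upper bound before any conversion to size_t. */
    if (len < 1 || len > MAX_LEN)
        return NULL;

    buf = malloc((size_t)len);
    if (buf == NULL)
        return NULL;
    if (read_full(fd, buf, (size_t)len) != 0) {
        free(buf);
        return NULL;
    }
    *len_out = len;
    return buf;
}
```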
/* 1) signedness - DO NOT DO THIS. */
char *buf;
int i, len;
read(fd, &len, sizeof(len));
/* OOPS! We forgot to check for < 0 */
if (len > 8000) { error("too large length"); return; }
buf = malloc(len);
read(fd, buf, len); /* len is converted to unsigned; a negative len becomes huge */
Here's a second example identified by Timo Sirainen,
involving integer size truncation.
Sometimes the different sizes of integers
can be exploited to cause a buffer overflow.
Basically, make sure that you don't truncate any integer results used to
compute buffer sizes.
Here's Timo's example for 64-bit architectures:
/* An example of an ERROR for some 64-bit architectures,
if "unsigned int" is 32 bits and "size_t" is 64 bits: */
void *mymalloc(unsigned int size) { return malloc(size); }
char *buf;
size_t len;
read(fd, &len, sizeof(len));
/* we forgot to check the maximum length */
/* 64-bit size_t gets truncated to 32-bit unsigned int */
buf = mymalloc(len);
read(fd, buf, len);
Here's a third example from Timo Sirainen, involving integer overflow.
This is particularly nasty when combined with malloc(); an attacker
may be able to create a situation where the computed buffer size
is less than the data to be placed in it.
Here is Timo's sample:
/* 3) integer overflow */
char *buf;
size_t len;
read(fd, &len, sizeof(len));
/* we forgot to check the maximum length */
buf = malloc(len+1); /* +1 can overflow to malloc(0) */
read(fd, buf, len);
buf[len] = '\0';
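
A sketch of one possible correction: enforce a maximum length before
doing any arithmetic on the value, so that len+1 cannot wrap around.
The names alloc_string() and MAX_LEN are hypothetical, and the limit of
8000 is only an assumed application limit.

```c
#include <stdlib.h>

#define MAX_LEN 8000   /* hypothetical application limit */

/* Hypothetical helper: allocate room for a string of len characters
   plus its terminator.  Returns NULL if len is too large or if
   allocation fails. */
char *alloc_string(size_t len)
{
    char *buf;

    /* Rejecting len > MAX_LEN also rules out the largest size_t value,
       so len + 1 below cannot overflow to 0. */
    if (len > MAX_LEN)
        return NULL;

    buf = malloc(len + 1);
    if (buf != NULL)
        buf[len] = '\0';   /* safe: buf really has len + 1 bytes */
    return buf;
}
```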