Invalid Query Characters
What do I mean by "sanitize"? I mean properly escaping (or rejecting outright) metacharacters like / \ + - & % ; | and so on before the data reaches anything that gives them a special meaning. Let me show you an example of what can happen if you don't.
Let's say you have some HTML form and a Perl script that takes the data and passes it to some program on the command line, like so:
#!/usr/bin/perl
#load the data from the form and split it into the variables.
#a sample string of data would be:
#   query=this%20is%20a%20test%3e&language=en-us&
#this would split into:
#   query=this%20is%20a%20test%3e
#   language=en-us
@form_data = split(/&/,$ENV{'QUERY_STRING'});
foreach $i (@form_data)
{
    #split strings into key-value pairs; we'd get:
    #   $key="query";
    #   $value="this%20is%20a%20test%3e";
    ($key,$value) = split(/=/,$i);

    #use urldecode to decode quoted characters, i.e.
    #   %20 becomes a space, %3e becomes ">", etc.
    $value =~ s/\%([A-Fa-f0-9]{2})/pack('C', hex($1))/seg;
    #$value now equals "this is a test>"

    #DO NOT DO THIS!
    $result = system("search_database -q $value");
}
Why shouldn't we do that? Well, imagine if someone maliciously passed in this for one of the fields:
dargueta; rm -rf * 2> /dev/null
Yes, this would be URL-encoded when transferred, but as soon as you decode the string and pass it to system(), the shell sees the command line below. It treats the semicolon as the end of the first command and then happily runs the second one:
search_database -q dargueta; rm -rf * 2> /dev/null
BAM. You've just lost a lot of files. Inserting a single line of code to check for invalid characters before you pass the data on the command line could've saved you a ton of trouble.
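Here's a rough sketch of that kind of check, written in C (in Perl the same idea is a one-line regex match). The name is_safe_query and the allowlist itself are made up for illustration and aren't part of any library; I'm assuming the search program only ever needs letters, digits, spaces and a little punctuation, so adjust the set to whatever yours really needs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical allowlist: letters, digits, space, underscore, period, comma.
   Everything else (; | & ` $ > < \ * / - and so on) is rejected. */
static const char ALLOWED[] =
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789 _.,";

/* Returns true only if every character of the decoded value is allowed. */
bool is_safe_query(const char *value)
{
    if (value == NULL || value[0] == '\0')
        return false;   /* treat missing or empty input as invalid */

    /* strspn counts the leading characters that appear in ALLOWED;
       the value is safe only if that count covers the whole string. */
    return strspn(value, ALLOWED) == strlen(value);
}
```

Listing what you allow is safer than listing what you forbid, since it's easy to forget a dangerous character. With this check, "this is a test." passes, while anything containing rm -rf * 2> /dev/null is rejected because of the -, *, > and / characters. Run it on the value after URL-decoding, not before.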
"But", you may say, "I had a Javascript function check all that before the form was submitted! How did I lose all my files?!" Easy. You ignore the Javascript. There are plenty of utilities like FireBug that'll let you modify a page's Javascript. Someone can bypass the form validation and send you malicious data. Or they can even use telnet and completely bypass retrieving the webpage at all and just send you a bogus query. Point is, do not trust the client to pass you valid data. Ever. It will save you more trouble (and money) in the long run if you're paranoid.
Buffer Overflows
Another possible attack is a buffer overflow. Take the following CGI code, for example:
void handle_request(void)
{
    /*should be long enough for anything...right?*/
    char query_buffer[1024];

    /*Security hole!*/
    get_client_query(query_buffer);
}
What happens if the client sends a query of 1024 characters or more? The get_client_query function has no way of knowing how large the buffer is, since we don't pass it a size, so it keeps writing past the end of query_buffer and tramples whatever sits next to it on the stack. At best your script crashes; at worst the attacker gets to run code of their choosing. And you can't trust your little client-side Javascript program to truncate the query at 1023 characters, because it can be bypassed, like I mentioned earlier. A better way to do it would've been to pass the length of the buffer in addition to a pointer to the buffer itself to get_client_query; if there isn't enough space in the buffer, the function returns an error code, and you can handle it appropriately.
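Here's a minimal sketch of what the size-aware version might look like. Since the original get_client_query isn't shown, I'm assuming it reads the CGI QUERY_STRING environment variable; the helper copy_query and the 0 / -1 return convention are my own inventions for illustration.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies src into dst only if it fits, including the
   terminating NUL. Returns 0 on success, -1 if src is missing or too long. */
int copy_query(char *dst, size_t dst_size, const char *src)
{
    if (dst == NULL || dst_size == 0)
        return -1;
    dst[0] = '\0';                 /* leave dst as an empty string on failure */
    if (src == NULL)
        return -1;

    size_t len = strlen(src);
    if (len >= dst_size)
        return -1;                 /* too long: reject instead of overflowing */

    memcpy(dst, src, len + 1);     /* len + 1 also copies the NUL */
    return 0;
}

/* Size-aware get_client_query; assumes the query arrives in QUERY_STRING. */
int get_client_query(char *buffer, size_t buffer_size)
{
    return copy_query(buffer, buffer_size, getenv("QUERY_STRING"));
}

void handle_request(void)
{
    char query_buffer[1024];

    if (get_client_query(query_buffer, sizeof query_buffer) != 0) {
        /* reject the request here, e.g. send back an error page */
        return;
    }
    /* query_buffer now holds a NUL-terminated query of at most 1023 chars */
}
```

I went with rejecting oversized queries rather than silently truncating them, because a truncated query can turn into a different, still valid-looking query. Using sizeof query_buffer at the call site also means the size can't drift out of sync if you change the array later.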
Filtering Error Messages
This isn't as important, but it's still a potential threat. Let's say you have some script with a bug in it, and someone sends your server an invalid query. Your script chokes and spits out a system error message, which could contain data like:
Fatal error while executing /WWW/cgi-bin/lib/scriptlib.pm referenced by /WWW/cgi-bin/database/script.pl.
StackOverflowException not caught.
<stack trace here>
This would give anyone great insight into how your CGI scripts are written and where they live. It's always better to catch all exceptions and print your own error messages rather than let the system do it for you; you never know what it's going to reveal to the client. Those error messages are meant for you to use when debugging; the client just needs to see something like
HTTP 500 Internal Server Error
Epic fail, we're sorry. Try again later.
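One way to keep the two audiences apart, sketched in C: the details go to a log only you can read, and the client gets the generic message. The function name report_error and its parameters are hypothetical; the "Status:" line is the standard way for a CGI script to set the response code. In a real script you'd probably pass stdout as the client stream and stderr (which most web servers send to their error log) as the log stream.

```c
#include <stdio.h>

/* Hypothetical helper: writes the full detail to the server-side log and
   only a generic message to the client.
   Returns 0 on success, -1 if either stream is missing. */
int report_error(FILE *client, FILE *log, const char *detail)
{
    if (client == NULL || log == NULL)
        return -1;

    /* Details stay on the server, where only you can read them. */
    fprintf(log, "CGI error: %s\n", detail ? detail : "(no detail)");

    /* The client learns that something failed, and nothing more. */
    fputs("Status: 500 Internal Server Error\r\n"
          "Content-Type: text/plain\r\n"
          "\r\n"
          "Epic fail, we're sorry. Try again later.\n", client);
    return 0;
}
```

Calling report_error(stdout, stderr, "scriptlib.pm: query parse failed") from your catch-all error handler gives you the detail for debugging without leaking paths or stack traces to whoever sent the request.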
There was something else I was going to write about, but I'm completely blanking on it right now...oh well. Hope you've found this useful!