This is the bane of all sorts
of contact and comment forms, and popular applications such as blogs
include plug-in modules for blocking it. But you'll have to deal with
it yourself for custom applications, particularly if you decide to avoid using a CAPTCHA for every form entry. There are a few reasons your adversaries will spew garbage into
anything with an input or textarea field.
One motivation is to attempt email injection. If they know or suspect that the
website generates an email notification based on their input, they will
try to inject email headers into that input and hijack that
notification email for their own purposes. For example, adding a line
break followed by a "Bcc:" header to the form field where you ask them
to fill in their own email address could allow them to inject a series
of blind carbon copy recipients into that server-generated email. So
now the advertisement for porn they've entered into the comments
section goes out to the list of people they've designated - unless your
application is smart enough to recognize and block that gambit.
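A minimal sketch of that kind of check, in Python, might look like the following. It assumes the submitted address is headed for a mail header such as Reply-To; the function names are hypothetical, and the only rule enforced is that a header value must never contain a carriage return or line feed, since those are what let an attacker start a new header line.

```python
import re

# Carriage returns and line feeds are what let an attacker begin a new header.
_HEADER_BREAK = re.compile(r"[\r\n]")


def is_safe_header_value(value: str) -> bool:
    """Return True if the value contains no CR or LF characters."""
    return _HEADER_BREAK.search(value) is None


def build_reply_to(submitted_email: str) -> str:
    """Build a Reply-To header line, refusing input that contains line breaks."""
    if not is_safe_header_value(submitted_email):
        raise ValueError("line break in email field; possible header injection")
    return "Reply-To: " + submitted_email.strip()
```

Rejecting the submission outright, rather than quietly stripping the line breaks, also gives you a convenient place to log the attempt.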
SQL injection is much the same thing, targeting the database instead
of the outgoing email. If they can make a few good guesses about how you're
translating their input into a SQL INSERT or UPDATE statement, they may
be able to work in their own little flourishes. Again, this relies on
you making a few dumb mistakes and failing to filter input correctly.
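One common way to avoid those mistakes is to never paste user input into the SQL text at all, and to pass it as bound parameters instead. The sketch below uses Python's built-in sqlite3 module; the rsvp table and its columns are hypothetical, chosen only for illustration.

```python
import sqlite3


def save_rsvp(conn: sqlite3.Connection, first: str, last: str, reason: str) -> None:
    """Insert one RSVP using placeholders so input is bound as data, not SQL."""
    conn.execute(
        "INSERT INTO rsvp (first, last, reason) VALUES (?, ?, ?)",
        (first, last, reason),
    )
    conn.commit()


# Demonstration against an in-memory database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rsvp (first TEXT, last TEXT, reason TEXT)")

# The hostile-looking last name is stored verbatim instead of being executed.
save_rsvp(conn, "Robert", "'); DROP TABLE rsvp; --", "testing")
rows = conn.execute("SELECT last FROM rsvp").fetchall()
```

The garbage still lands in the table, so this protects the database's integrity but does nothing about spam; that part still needs filtering.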
Sometimes the worst thing that happens is that you wind up getting a few strange emails or phantom entries in a database table.
But even if the results aren't catastrophic, pay attention and look for
patterns. I like to have the website send me an email notification any
time something strange comes in. Here is an excerpt from one of those:
array (
  'event' => '19',
  'e' => 'hjUjnYhNukbCynexVH',
  'rsvp' =>
  array (
    'answer' => '1',
    'first' => 'NdIXUMiJ',
    'last' => 'aWIyTtHhVPPLlWeG',
    'reason' => 'PQA4S7 <a
      href=\"http://snqqcyealkqs.com/\">snqqcyealkqs</a>,
      [url=http://qxphfnlsmagp.com/]qxphfnlsmagp[/url],
      [link=http://opchzvldolov.com/]opchzvldolov[/link],
      http://poqsphfxalss.com/',
    'update_id' => '',
    'event' => '19',
  ),
  'profile' =>
  array (
    'occupation' => 'HIRlonLIlTOFoyAX',
    'company' => 'VfBSEqXhvFgnbONtC',
    'address' => 'kYYidVgep',
    'city' => 'hSxhSXCdQsEDUGl',
    'state' => 'FL',
    'zip' => 'PONkddDYYDF',
    'phone' => 'hBEaMOvS',
    'phonetype' => 'work',
  )
This is what was entered into an event RSVP form. The question is
why. And I think the answer is pretty clearly that this was a probe, a
test, a way of seeing what the web application would put up with. And I
got the notification because at some point I noticed entries with this
kind of garbage turning up in the RSVP table of my database. More an
annoyance than a serious problem, on one level. On the other hand, as
long as the application was responding with "Thank you very much for your
RSVP!" it was showing weakness. Any web application that will accept
"aWIyTtHhVPPLlWeG" as a valid last name will probably accept lots of
other garbage as well.
It's like walking down the street, jiggling door handles on all the
cars, looking for the one someone forgot to lock -- much easier than
breaking into a car that has been carefully secured. I have to assume
that's the only reason for submitting a completely random string of
characters rather than choosing from a list of common names. "John
Smith" could be a spammer, too, but wouldn't look so obviously
suspicious. But if the application will accept obvious garbage, then
it's obvious no one is paying attention.
So now I filter for text that includes random assortments of upper
and lowercase letters, counting the number of each and trying to
account for natural variations: people who enter everything in
UPPERCASE or lowercase, or names with a little natural upper-lower case
combination like "du Maurier" or "McDonald". I do this because some
automated hacking tool out there keeps throwing this same crap at me.
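To make the idea concrete, here is one possible sketch in Python. It is an assumption about how such a filter could work, not a definitive rule: it skips words that are entirely upper or lower case, then flags any mixed-case word with more than a couple of capitals after the first letter. The function names and the threshold of 2 are hypothetical and would need tuning against real submissions.

```python
def interior_capitals(word: str) -> int:
    """Count uppercase letters after the first character of a word."""
    return sum(1 for ch in word[1:] if ch.isupper())


def looks_random_case(text: str, max_interior: int = 2) -> bool:
    """Flag text containing a mixed-case word with too many interior capitals."""
    for word in text.split():
        letters = [ch for ch in word if ch.isalpha()]
        if not letters:
            continue
        # Uniform case is a natural variation: "SMITH" or "smith".
        if all(ch.isupper() for ch in letters) or all(ch.islower() for ch in letters):
            continue
        # "McDonald" has one interior capital; random strings tend to have many.
        if interior_capitals(word) > max_interior:
            return True
    return False
```

With this threshold, "McDonald", "du Maurier" and "SMITH" pass, while "aWIyTtHhVPPLlWeG" is flagged. A legitimate entry with unusual capitalization could still be caught, so it is safer to treat a hit as a reason to review the entry than as proof of spam.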
I also do pattern matching on web URLs in the input and count the
number of occurrences. One or two might be legit (the user saying, "I
want to RSVP for this event, and I'm including my company's website in
this handy comment field"), but more than that is probably spam (a list
of a dozen porn sites that an attacker is trying to get you to display
publicly or relay via email). Like any statistical approach to blocking
spam, this is error-prone, but may be a necessary precaution.
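A rough sketch of the URL count in Python follows. The regular expression is deliberately simple and is an assumption, not a complete URL grammar; the function names and the limit of two are hypothetical, taken from the "one or two might be legit" rule of thumb above.

```python
import re

# Match http:// or https:// up to whitespace, a quote, or a bracket.
_URL = re.compile(r"""https?://[^\s"'<>\[\]]+""", re.IGNORECASE)


def count_urls(text: str) -> int:
    """Return how many URL-like substrings appear in the text."""
    return len(_URL.findall(text))


def too_many_urls(text: str, max_urls: int = 2) -> bool:
    """Flag input containing more URLs than a real user would plausibly enter."""
    return count_urls(text) > max_urls


# The 'reason' field from the notification excerpt, joined onto one line.
sample = (
    'PQA4S7 <a href="http://snqqcyealkqs.com/">snqqcyealkqs</a>, '
    "[url=http://qxphfnlsmagp.com/]qxphfnlsmagp[/url], "
    "[link=http://opchzvldolov.com/]opchzvldolov[/link], "
    "http://poqsphfxalss.com/"
)
```

Run against that sample, count_urls finds four links, which is over the limit; a comment with a single company link would pass.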
Most of all, you have to watch for whatever is coming next.