The anti-virus programs in common use today rely on signature detection
schemes, which can only protect a machine from viruses that have already
been identified and entered into the programs' virus databases.
Anomaly detection systems, however,
sense when normal patterns of communications change in order to stop new
viruses -- or any other system intruders like worms or unauthorized users
-- in their tracks.
The trouble is, existing anomaly detection
schemes all generate high error rates -- they cry wolf so often that they
are impractical. In order to identify the real intrusions, system managers
must spend time checking out every possibility.
Researchers from
the University of California at Davis have taken an unusual tack in
anomaly detection by adapting text classification techniques to intrusion
detection. Their initial results suggest that the technique could produce
an anomaly detection system with a reasonable error rate.
The idea
to apply text classification to intrusion detection began with a
conversation about categorizing Web pages into clusters that share a given
property, said V. Rao Vemuri, a professor of applied science and computer
science at the University of California at Davis, and a scientist at
Lawrence Livermore National Laboratories.
Instead of categorizing
Web pages, however, the researchers used the classification system to
categorize computer users into just two groups -- authorized users and
intruders. "The problem is to decide what 'text' to use for the problem,"
Vemuri said. "We wanted some objective way of characterizing a user that
the user... cannot consciously influence," in order to prevent an intruder
from fooling the system, he said.
They turned to system calls to
characterize a user. System calls are the internal requests various pieces
of software make to each other in the course of carrying out a user's
instructions. "The system calls are generated by the computer, and the
user cannot really influence the sequence in which they are generated,"
said Vemuri. The scheme treats each system call as a word and each
sequence of system calls as a document, and classifies each document as
one generated during normal activity or intrusive activity, he said.
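The idea can be illustrated with a minimal sketch in Python. The call
names, training traces, and similarity threshold below are hypothetical,
and the researchers' actual system used more sophisticated term weighting;
this only shows the shape of the technique -- term-frequency vectors over
system-call "words" compared by cosine similarity to labeled examples:

```python
from collections import Counter
from math import sqrt

def vector(calls):
    """Term-frequency vector: each distinct system call is a 'word'."""
    return Counter(calls)

def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[k] * b[k] for k in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def classify(calls, training, threshold=0.5):
    """Nearest-neighbor classification of one process's call sequence.

    training: (label, call_sequence) pairs labeled 'normal' or
    'intrusive'. Returns 'unknown' when no neighbor is similar
    enough -- the worst-case third category.
    """
    v = vector(calls)
    best_label, best_sim = "unknown", 0.0
    for label, seq in training:
        sim = cosine(v, vector(seq))
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label if best_sim >= threshold else "unknown"

# Hypothetical labeled traces; real traces are much longer.
training = [
    ("normal", ["open", "read", "read", "close", "stat"]),
    ("normal", ["open", "mmap", "read", "close"]),
    ("intrusive", ["setuid", "execve", "chmod", "execve"]),
]

print(classify(["open", "read", "close"], training))      # normal
print(classify(["setuid", "execve", "chmod"], training))  # intrusive
```

A sequence whose calls overlap none of the profiles falls below the
threshold and lands in the "unknown" bucket rather than being forced into
either class.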
Applied to Web pages, the nearest-neighbor text categorization technique
the researchers adapted categorizes pages based on how they are linked;
pages that are nearest neighbors in terms of links also tend to be close
in terms of content.
The researchers' detection scheme characterizes an authorized user
by building a profile of activities. "For example, in the course of my
normal life, I use email, browse Web pages, use Word, PowerPoint [and]
printers," said Vemuri. "Let's suppose that I rarely, if ever, use Java or
C++. I rarely use root privileges. If someone logging onto my machine uses
these, that departure from normal usage should signal... abnormal,
[possibly] intrusive activity," he said.
The problem turned out to
be easier than categorizing Web pages, said Vemuri. "Usually we use many
categories. In our example, we have only two categories -- authorized or
intruder, and in the worst case three," if the system has to resort to
classifying activity as unknown.
In addition, Web pages can be
very long and the size of the English vocabulary is around 50,000 words,
which makes categorizing Web pages a computer-intensive task. "In our
case, the vocabulary -- distinct system calls -- rarely exceeds 100, and
the size of the 'pages', [or groups of calls], is also very small," he
said.
Short sequences of system calls have been used before to
characterize a person's normal behavior, but this requires building a
database of normal sequences of system calls for each program a person
uses. The text categorization technique, however, calculates the
similarities between program activities, which involves fewer
calculations.
This allows the system to detect an intruder as the
intruder is affecting the system, said Vemuri. "The computational burden
in our case is much smaller, to the extent we started to dream about the
possibility of detecting an intruder in real-time," like the way
contestants called out titles as songs played on the TV show "Name That
Tune", he said.
The researchers' current implementation is almost
real-time, said Vemuri. "We have to wait until [a] process terminates or
halts" before completing the classification, he said. Intrusive attacks,
however, are usually conducted within one or more sessions, and every
session contains several processes, said Vemuri. Because the classifier
method monitors the execution of each process, it's likely that an attack
can be detected while it is happening, he said. The researchers are also
working on allowing the system to make a classification before a process
terminates, he added.
The researchers tested their scheme with 24
attacks within a two-week period. The method detected 22 of 24 attacks,
and had a relatively low false-positive rate of 31 false alarms out of
5,285 events, or 0.59 percent, according to Vemuri.
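Those reported figures work out as follows (simple arithmetic on the
numbers quoted above):

```python
# Reported test results: 22 of 24 attacks detected, with 31 false
# alarms among 5,285 events over the two-week test period.
detected, attacks = 22, 24
false_alarms, events = 31, 5285

detection_rate = detected / attacks          # about 0.917
false_positive_rate = false_alarms / events  # about 0.0059

print(f"{detection_rate:.1%}")       # 91.7%
print(f"{false_positive_rate:.2%}")  # 0.59%
```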
The method
shows promise, said Bennet Yee, an assistant professor of computer science
and engineering at the University of California at San Diego. "The novelty
is noticing that text classification techniques can be adapted to
intrusion detection, and doing the experiments that validate it," he said.
If it proves practical and is widely deployed, the technique could
help prevent malicious software like the Internet worms Code Red and
Klez, Yee said. "It should be able to recognize new attacks as anomalous
behavior and raise alarms earlier [than] signature detection schemes where
a database of bad behavior must be compiled first," he said.
There
is still work to do to determine if the method can be improved to a low
enough false positive rate, however, said Yee. A practical anomaly
detection system must have a very low false positive rate in order to be
commercially useful because if system administrators spend too much time
chasing down false alarms, they "will not want to use the system and will
turn the intrusion detector off," he said.
Even a false positive
rate of 0.44 percent could mean 23 false alarms per day if there are 5,285
events per day, Yee said. "Most people will not want to handle a false
alarm per hour per machine," he said. The researchers' method is an
improvement over earlier anomaly detector designs, but "further
improvements are still necessary for broader use," he added.
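Yee's back-of-the-envelope estimate is easy to reproduce -- even a
fraction-of-a-percent false positive rate adds up over a busy day's worth
of events:

```python
# A 0.44 percent false positive rate applied to 5,285 events per day.
rate = 0.0044
events_per_day = 5285

alarms_per_day = rate * events_per_day  # about 23.3
alarms_per_hour = alarms_per_day / 24   # about 0.97 -- roughly one per hour

print(round(alarms_per_day))  # 23
```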
It is
theoretically possible to use the method today, said Vemuri. The
researchers are working on proving that the method can be used without
raising too many false alarms, he said.
To cut down on false
alarms, the researchers are looking to make a redundant system "where we
use different methods on different data sets, combine the results of both
those methods, or use a best of three voting system," he said. One method
could use system call data, for instance, while another could analyze
instructions used, he said.
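A best-of-three vote over independent detectors could be combined along
these lines. The sketch below is hypothetical -- the researchers describe
the idea, not an implementation -- and simply requires two of three
detectors to agree before raising a verdict:

```python
from collections import Counter

def majority_vote(verdicts):
    """Best-of-three vote across independent detectors.

    verdicts: labels from detectors run on different data sets
    (e.g. system calls vs. instruction traces). Requires at least
    two detectors to agree; otherwise reports 'unknown'.
    """
    label, count = Counter(verdicts).most_common(1)[0]
    return label if count >= 2 else "unknown"

print(majority_vote(["normal", "intrusive", "intrusive"]))  # intrusive
print(majority_vote(["normal", "intrusive", "unknown"]))    # unknown
```

Requiring agreement suppresses the false alarms of any single detector, at
the cost of missing attacks that only one method flags.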
The researchers hope to have their
anomaly detection system worked out and supported with performance data
within a few years, said Vemuri.
Vemuri's research colleague was
Yihua Liao. They published the research in the Proceedings of the 11th
Usenix Security Symposium, which was held in San Francisco August 5
through 9, 2002. The research was funded by the Air Force Office of
Scientific Research.
Timeline: 2-3 years
Funding: Government
TRN Categories: Cryptography and Security; Computer Science; Internet; Networking
Story Type: News
Related Elements: Technical paper, "Using Text Categorization Techniques for Intrusion Detection," Proceedings of the 11th Usenix Security Symposium, San Francisco, August 5-9, 2002.