October 30/November 6, 2002

Text software spots intruders


By Kimberly Patch, Technology Research News

The anti-virus programs in common use today rely on signature detection schemes, which can protect a machine only from viruses that have already been identified and entered into the programs' virus databases.

Anomaly detection systems, however, sense when normal patterns of communications change in order to stop new viruses -- or any other system intruders like worms or unauthorized users -- in their tracks.

The trouble is, existing anomaly detection schemes all generate high error rates -- they cry wolf so often that they are impractical. In order to identify the real intrusions, system managers must spend time checking out every possibility.

Researchers from the University of California at Davis have taken an unusual tack in anomaly detection by adapting text classification techniques to intrusion detection. Their initial results suggest that the technique could produce an anomaly detection system with a reasonable error rate.

The idea of applying text classification to intrusion detection began with a conversation about categorizing Web pages into clusters that share a given property, said V. Rao Vemuri, a professor of applied science and computer science at the University of California at Davis, and a scientist at Lawrence Livermore National Laboratory.

Instead of categorizing Web pages, however, the researchers used the classification system to categorize computer users into just two groups -- authorized users and intruders. "The problem is to decide what 'text'" to use for the problem, Vemuri said. "We wanted some objective way of characterizing a user that the user... cannot consciously influence" in order to prevent an intruder from fooling the system, he said.

They turned to system calls to characterize a user. System calls are the internal requests various pieces of software make to each other in the course of carrying out a user's instructions. "The system calls are generated by the computer, and the user cannot really influence the sequence in which they are generated," said Vemuri. The scheme treats each system call as a word and each sequence of system calls as a document, and classifies each document as one generated during normal activity or intrusive activity, he said.

The nearest-neighbor text categorization technique the researchers used categorizes Web pages based on how they are linked. The nearest neighbors in terms of links also tend to be closer in terms of content.
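
The word-and-document mapping can be illustrated with a minimal Python sketch. It is not the researchers' implementation: it assumes plain call counts as the document vector, cosine similarity as the measure of nearness, and a profile built only from normal activity. The function names, the toy traces, and the values of k and threshold are hypothetical.

```python
from collections import Counter
from math import sqrt

def call_counts(calls):
    """Turn one process's system-call sequence (the 'document') into call counts."""
    return Counter(calls)

def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[call] for call, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def classify(calls, normal_profiles, k=3, threshold=0.8):
    """Label a process 'normal' if its k nearest normal profiles are, on
    average, at least `threshold` similar to it; otherwise 'intrusive'.
    k and threshold are illustrative values, not figures from the article."""
    doc = call_counts(calls)
    sims = sorted((cosine_similarity(doc, p) for p in normal_profiles), reverse=True)
    top = sims[:k]
    avg = sum(top) / len(top) if top else 0.0
    return "normal" if avg >= threshold else "intrusive"

# Hypothetical traces of normal activity; real traces would be far longer.
normal_traces = [
    ["open", "read", "read", "write", "close"],
    ["open", "read", "write", "write", "close"],
    ["open", "read", "close"],
]
normal_profiles = [call_counts(t) for t in normal_traces]

label_a = classify(["open", "read", "write", "close"], normal_profiles)
label_b = classify(["setuid", "execve", "chmod", "execve"], normal_profiles)
print(label_a, label_b)
```

In this toy setup the first process resembles the stored normal traces, while the second shares no calls with any of them and is flagged.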

The researchers' detection scheme characterizes an authorized user by building a profile of activities. "For example, in the course of my normal life, I use email, browse Web pages, use Word, PowerPoint [and] printers," said Vemuri. "Let's suppose that I rarely, if ever, use Java or C++. I rarely use root privileges. If someone logging onto my machine uses these, that departure from normal usage should signal... abnormal, [possibly] intrusive activity," he said.

The problem turned out to be easier than categorizing Web pages, said Vemuri. "Usually we use many categories. In our example, we have only two categories -- authorized or intruder, and in the worst-case three" if the system has to resort to classifying activity as unknown.

In addition, Web pages can be very long and the size of the English vocabulary is around 50,000 words, which makes categorizing Web pages a computer-intensive task. "In our case, the vocabulary -- distinct system calls -- rarely exceeds 100, and the size of the 'pages', [or groups of calls], is also very small," he said.

Short sequences of system calls have been used before to characterize a person's normal behavior, but this requires building a database of normal sequences of system calls for each program a person uses. The text categorization technique, however, calculates the similarities between program activities, which involves fewer calculations.

This allows the system to detect an intruder as the intruder is affecting the system, said Vemuri. "The computational burden in our case is much smaller, to the extent we started to dream about the possibility of detecting an intruder in real-time," like the way contestants called out titles as songs played on the TV show "Name That Tune", he said.

The researchers' current implementation is almost real-time, said Vemuri. "We have to wait until [a] process terminates, or halts" before completing the classification, he said. Intrusive attacks, however, are usually conducted within one or more sessions, and every session contains several processes, said Vemuri. Because the classifier method monitors the execution of each process, it's likely that an attack can be detected while it is happening, he said. The researchers are also working on allowing the system to make a classification before a process terminates, he added.

The researchers tested their scheme with 24 attacks within a two-week period. The method detected 22 of 24 attacks, and had a relatively low false-positive rate of 31 false alarms out of 5,285 events, or 0.59 percent, according to Vemuri.

The method shows promise, said Bennet Yee, an assistant professor of computer science and engineering at the University of California at San Diego. "The novelty is noticing that text classification techniques can be adapted to intrusion detection, and doing the experiments that validate it," he said.

If it proves practical and is widely deployed, the technique could help stop malicious software such as the Internet worms Code Red and Klez, Yee said. "It should be able to recognize new attacks as anomalous behavior and raise alarms earlier [than] signature detection schemes where a database of bad behavior must be compiled first," he said.

There is still work to do to determine if the method can be improved to a low enough false positive rate, however, said Yee. A practical anomaly detection system must have a very low false positive rate in order to be commercially useful because if system administrators spend too much time chasing down false alarms, they "will not want to use the system and will turn the intrusion detector off," he said.

Even a false positive rate of 0.44 percent could mean 23 false alarms per day if there are 5,285 events per day, Yee said. "Most people will not want to handle a false alarm per hour per machine," he said. The researchers' method is an improvement over earlier anomaly detector designs, but "further improvements are still necessary for broader use," he added.
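
The false-alarm figures quoted in the article can be checked with a few lines of Python. The variable names are illustrative, and the daily event count is the same 5,285 used in Yee's example.

```python
# Reported test result: 31 false alarms out of 5,285 events.
false_alarms = 31
events = 5285
reported_rate_percent = false_alarms / events * 100   # about 0.59 percent

# Yee's example: a 0.44 percent rate applied to 5,285 events per day.
alarms_per_day = 0.0044 * events                      # about 23 alarms
alarms_per_hour = alarms_per_day / 24                 # just under one per hour

print(round(reported_rate_percent, 2), round(alarms_per_day), round(alarms_per_hour, 2))
```

The numbers are consistent with one another: 23 alarms spread over 24 hours is roughly the "false alarm per hour per machine" Yee describes.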

It is theoretically possible to use the method today, said Vemuri. The researchers are working on proving that the method can be used without raising too many false alarms, he said.

To cut down on false alarms, the researchers are looking to make a redundant system "where we use different methods on different data sets, combine the results of both those methods, or use a best of three voting system," he said. One method could use system call data, for instance, while another could analyze instructions used, he said.
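
A best-of-three vote of the kind Vemuri mentions could look like the following Python sketch. It assumes three independent detectors that each return a label; the function name and the labels are hypothetical, and the article does not say how the researchers would combine results in practice.

```python
from collections import Counter

def best_of_three(votes):
    """Return the label that at least two of the three detectors agree on.

    If all three disagree, fall back to 'unknown' rather than raise an alarm.
    """
    if len(votes) != 3:
        raise ValueError("expected exactly three votes")
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= 2 else "unknown"

# Example: two detectors see normal activity, one raises an alarm.
verdict = best_of_three(["normal", "intrusive", "normal"])
print(verdict)
```

Requiring agreement from two detectors means a single detector's false alarm does not reach the administrator, at the cost of missing attacks that only one method catches.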

The researchers hope to have their anomaly detection system worked out and supported with performance data within a few years, said Vemuri.

Vemuri's research colleague was Yihua Liao. They published the research in the Proceedings of the 11th Usenix Security Symposium, which was held in San Francisco August 5 through 9, 2002. The research was funded by the Air Force Office of Scientific Research.

Timeline: 2-3 years
Funding: Government
TRN Categories: Cryptography and Security; Computer Science; Internet; Networking
Story Type: News
Related Elements: Technical paper, "Using Text Categorization Techniques for Intrusion Detection," Proceedings of the 11th Usenix Security Symposium, San Francisco, August 5-9, 2002.




For permission to reprint or republish this article, please email trn@trnmag.com.
© Copyright Technology Research News, LLC 2000-2002. All rights reserved.