Keywords: electronic information; network security; datasets; statistical analysis
Abstract: This thesis considers how best to use network traffic data to improve cyber security, an operational problem of great concern to network administrators and users generally. Our specific task was performed for the Network Security Division of the Army Research Laboratory, which requested that we analyze a dataset of cyber-attacks masked in a background of representative network traffic (the 'Skaion' dataset). We find that substantial preprocessing must be done before statistical methods can be applied to raw network data and that no single predictor is sufficient. Our approach is novel in that we consider not only single predictor events but also combinations of reports from multiple tools. While we consider a number of different statistical techniques, we find that the most satisfactory model is based on logistic regression. Finally, we conclude that while the Skaion dataset is valuable for its new approach to network traffic emulation, the 1999 KDD Cup and DARPA-MIT datasets, despite their many shortcomings, are more clearly organized and more accessible to academic study. Cyber security is a globally important problem, and datasets like Skaion's must maximize opportunities for cross-disciplinary academic endeavors.