Affordable Access

An Improvised k-NN Respecting Diversity of Data for Network Intrusion Detection

Authors
  • Hamid, Yasir
  • Ranganathan, Balasaraswathi
  • Journaux, Ludovic
  • Farooq, Qaiser
  • Muthukumarasamy, Sugumaran
Publication Date
Jun 30, 2017
Source
HAL-UPMC
Keywords
Language
English
License
Unknown
External links

Abstract

Network Intrusion Detection is a complex classification problem aimed at discriminating the legitimate from illegitimate and potentially harmful network connections over the communication network. What adds to the complexity of the problem is the near real-time response to a threat, imbalanced datasets to deal with and finally the data being mixed in nature with some features being numeric some discrete and some nominal. In this work we have appliedSynthetic Minority Oversampling Technique(SMOTE)to balance the dataset and eliminate the skewness of the class distribution. The success of k-Nearest Neighbour (k-NN)depends upon the set of neighbours deemed to be very close or similar to a data point which is in turn determined by the similarity/distance metric employed, where most of the metrics employed in literature deal with numeric data only, and either need conversion of categorical features to numeric features or simply eliminated the categorical features, which often leads to reduction in the results. As forthis work is considered, we take into consideration both the categories of features simultaneously byreplacingthe conventional Euclidean metric with Gower metric, which is better suited for mixed data. Gower metric provides a mechanism to deal with heterogeneous features differently and ultimately yields a quantifiable value that determines the similarity of the two instances. Experimental results show that improvised version of k-NN outperforms its conventional counterpart in terms of the Accuracy, Detection Rate, Precision, Recall, f-Measure, and Receiver Operating Characteristic (ROC) curve.

Report this publication

Statistics

Seen <100 times