
Case Study: Detecting DDoS Attacks on a SCADA System
For this case study, I have drawn on a research paper published at the 2019 International Conference on Computer and Information Sciences (ICCIS), authored by Ezaz Mohammed AL-Dahasi and Fahd A. Alhaidari.
So, let's move on!
SCADA SYSTEM AND ITS COMPONENTS
A SCADA (Supervisory Control and Data Acquisition) system is an automation architecture made up of both hardware and software. Broken down further, it comprises a Human Machine Interface (HMI), a supervisory system, Remote Terminal Units (RTUs), Programmable Logic Controllers (PLCs), and a communication interface.
Internet connectivity has made it easy to control such large networks remotely, but it has also left these systems vulnerable to cyber attacks such as DoS, DDoS, SQL injection, and Man-in-the-Middle (MITM) attacks.
SCADA systems are used in a wide range of industries, including water and wastewater, electric generation, transmission and distribution, oil and gas, food production, and mass transit.
DDoS ATTACK

A DDoS (Distributed Denial-of-Service) attack floods the targeted system or server with traffic. You might think it is the same as a DoS attack, but it is not: in a DDoS attack the traffic comes from many sources at once. There may be multiple attackers, or a single attacker may use a botnet (the attacker covertly installs malicious software or scripts on other people's machines and then uses those machines to attack the target node).
The flood of false or fraudulent requests (bad requests) spikes the load on the server and consumes its resources. As a result, the server cannot fulfil legitimate requests and ends up denying them, so customers see a buffering sign or an error message.
A DDoS attack can be detected in three ways:
- IP traceback or packet-marking techniques
- Entropy variation
- IDS/IPS (Intrusion Detection System / Intrusion Prevention System)
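To make the entropy-variation idea more concrete, here is a minimal sketch. It is not the method from the paper. It assumes we can observe the destination address of each packet in a time window and computes the Shannon entropy of those addresses. When a flood concentrates traffic on one victim, destination entropy drops compared with a normal window. The addresses and window sizes below are made up for illustration.

```python
import math
from collections import Counter

def shannon_entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of `items`."""
    counts = Counter(items)
    total = len(items)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical windows of destination IPs (made-up addresses, illustration only).
# Normal window: traffic is spread evenly over four hosts.
normal_window = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"] * 25
# Attack window: almost all packets are aimed at a single victim.
attack_window = ["10.0.0.9"] * 97 + ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

h_normal = shannon_entropy(normal_window)
h_attack = shannon_entropy(attack_window)
print(f"normal: {h_normal:.3f} bits, attack: {h_attack:.3f} bits")
```

In practice a detector would compare each window against a baseline and raise an alert when the change exceeds a threshold. Choosing that threshold is a tuning decision that this sketch does not cover.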
MACHINE LEARNING METHODS ADOPTED
In the research paper, the main goal was to detect DDoS attacks and determine their pattern using machine learning. To test and compare approaches, three different algorithms were used:
1. J48 (a decision-tree technique)
2. Random Forest (an ensemble of decision trees)
3. Naive Bayes (a non-decision-tree technique)
All three are classification algorithms; Random Forest can also be used for regression problems.
The dataset used to train and test the models was taken from KDDCup’99, and 488807 instances from it were used. The models were trained with the Waikato Environment for Knowledge Analysis (WEKA) software.
A total of 15 features was used to train the models. The list of features is taken directly from the research paper.
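The paper trained its models in WEKA, so the snippet below is not the authors' pipeline. It is a small from-scratch Gaussian Naive Bayes classifier in Python, included only to show what "training a classifier on traffic features" looks like. The two features (connection count and error rate) and every data point are hypothetical, not taken from the paper or from KDDCup’99.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(rows, labels):
    """Estimate a prior plus per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    model = {}
    for label, members in by_class.items():
        n = len(members)
        means = [sum(col) / n for col in zip(*members)]
        # A small constant keeps every variance strictly positive.
        variances = [
            sum((x - m) ** 2 for x in col) / n + 1e-9
            for col, m in zip(zip(*members), means)
        ]
        model[label] = (n / len(rows), means, variances)
    return model

def predict(model, row):
    """Return the class with the highest log-posterior for `row`."""
    def log_posterior(prior, means, variances):
        score = math.log(prior)
        for x, m, v in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        return score
    return max(model, key=lambda label: log_posterior(*model[label]))

# Hypothetical toy data: [connections in a short window, error rate].
train_rows = [[3, 0.0], [5, 0.1], [4, 0.0], [6, 0.1],
              [480, 0.9], [510, 1.0], [495, 0.95], [500, 1.0]]
train_labels = ["normal"] * 4 + ["attack"] * 4

model = fit_gaussian_nb(train_rows, train_labels)
print(predict(model, [7, 0.05]))    # few connections, few errors
print(predict(model, [490, 0.98]))  # many connections, many errors
```

A real experiment would use the full feature set and a held-out test split. This toy version only illustrates the train-then-predict flow.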

WHAT IS A CONFUSION MATRIX?

A confusion matrix summarises the predictions of a classifier against the actual outcomes. In the research paper, for instance, deciding whether traffic is a DDoS attack or not is a classification problem.
It is laid out as a table of counts, from which several important inferences can be drawn.
Values in the Confusion Matrix
TP (TRUE POSITIVE) - The predicted value is positive and the actual value is positive
TN (TRUE NEGATIVE) - The predicted value is negative and the actual value is negative
FP (FALSE POSITIVE) - The predicted value is positive but the actual value is negative
FN (FALSE NEGATIVE) - The predicted value is negative but the actual value is positive
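FP and FN are Type I and Type II errors, respectively. The more dangerous of the two is the one in which a real attack is classified as normal traffic: that is an FN if the positive class is "attack", and an FP if the positive class is "normal".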
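The four values can be counted directly from a list of actual labels and a list of predicted labels. The sketch below assumes a binary problem where "attack" is the positive class. The labels are made up for illustration and are not data from the paper.

```python
def confusion_counts(actual, predicted, positive="attack"):
    """Count TP, TN, FP and FN for a binary classification problem."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1      # predicted positive, actually positive
        elif p != positive and a != positive:
            tn += 1      # predicted negative, actually negative
        elif p == positive:
            fp += 1      # predicted positive, actually negative
        else:
            fn += 1      # predicted negative, actually positive
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}

# Made-up labels, for illustration only.
actual    = ["attack", "attack", "normal", "normal", "attack", "normal"]
predicted = ["attack", "normal", "normal", "attack", "attack", "normal"]

counts = confusion_counts(actual, predicted)
print(counts)
```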
Other important metrics that can be derived from the table are:
- Sensitivity = TP/(TP+FN)
- Specificity = TN/(TN+FP)
- Accuracy = (TP+TN)/(TP+TN+FP+FN)
- Precision = TP/(TP+FP)
- Negative Predictive Value = TN/(TN+FN)
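The formulas above translate directly into code. This is a minimal sketch with hypothetical counts, chosen only to keep the arithmetic easy to follow. They are not results from the paper.

```python
def metrics(tp, tn, fp, fn):
    """Derive the summary metrics from the four confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Hypothetical counts, for illustration only.
result = metrics(tp=90, tn=80, fp=20, fn=10)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Note that the function raises ZeroDivisionError when a denominator is zero (for example, when there are no positive predictions at all), so real code would need to guard against that case.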
USE OF CONFUSION MATRIX AND ITS RESULTS
In the paper, the authors tested the data with all three algorithms. Each algorithm produced different predictions, and therefore a different confusion matrix.
For the J48 Algorithm, the results were


For the Random Forest algorithm, the results were


For the Naive Bayes algorithm, the results were


So, the most successful algorithm was Random Forest (with an accuracy of 99.9998%), followed by J48 (with an accuracy of 99.9957%), and lastly Naive Bayes (with an accuracy of 97.74%).
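To get a feel for what these percentages mean, the accuracies can be turned into approximate numbers of misclassified records. The sketch below assumes that each accuracy was measured over all 488807 instances. This post does not state that, so treat the counts as rough estimates. The quoted accuracies are also rounded.

```python
TOTAL_INSTANCES = 488807  # record count quoted earlier in this post

# Accuracies quoted above, written as fractions.
accuracies = {
    "Random Forest": 0.999998,
    "J48": 0.999957,
    "Naive Bayes": 0.9774,
}

def approx_errors(accuracy, total=TOTAL_INSTANCES):
    """Approximate misclassified count, ASSUMING accuracy covers all records."""
    return round((1 - accuracy) * total)

errors = {name: approx_errors(acc) for name, acc in accuracies.items()}
for name, count in errors.items():
    print(f"{name}: about {count} misclassified")
```

Under that assumption, this works out to roughly 1, 21, and 11047 misclassified records respectively. The gap between the two tree-based models and Naive Bayes is therefore much larger than the percentages alone suggest.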
Final output results are provided in the image below.

CONCLUSION
Of the three algorithms compared in this case study, Random Forest performed best at detecting DDoS attacks.