
Finding anomalies is easy; deriving alerts is hard.

Anomaly detection is a key component of next-generation cybersecurity. It doesn’t matter what kind of network it is (legacy enterprise, cloud-based or IoT-centric): once the “perimeter” is breached, a user’s credentials are compromised and a foothold is achieved, the attacker will only be discovered by identifying unusual or anomalous behavior in the network. Identifying anomalous behavior of entities (people, processes and machines) is the key to finding the most dangerous threats, whether they come from an external actor or a rogue insider. The challenge is moving from anomaly detection to useful alerting.

There are many companies trying to solve this problem through the use of big data analytics, behavioral modeling, machine learning (ML) and artificial intelligence (AI). It is actually pretty easy to find anomalies in a computer network since the environment is almost constantly changing and therefore fundamentally anomalous. The key is being able to sort through the anomalies and identify those that may dictate a corrective action by a machine or a human. I derived the model illustrated here to help think through the problem.

Hierarchy of anomaly detection

The base requirement is to identify anomalies in the data. There are different approaches to this first step. Some look just at endpoint data, others look at data already ingested into SIEMs, and more ambitious approaches look very broadly at flow and log data, up to and including full packet capture. No matter what the data set is, step one is to identify anomalies in the data that indicate unusual behavior by entities on the network.
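As a minimal sketch of this first step, assume each entity has a short history of some numeric measurement, such as hourly connection counts. The function below flags an entity when its latest value sits far from its own baseline, measured in standard deviations. The name find_anomalies, the threshold of 3.0 and the choice of a z-score are all illustrative assumptions, not a description of any particular product.

```python
from statistics import mean, stdev


def find_anomalies(baseline, observed, threshold=3.0):
    """Flag entities whose latest value deviates from their own history.

    baseline: dict of entity -> list of past values (hypothetical counts)
    observed: dict of entity -> latest value
    Returns a dict of entity -> deviation score for flagged entities only.
    """
    anomalies = {}
    for entity, value in observed.items():
        history = baseline.get(entity, [])
        if len(history) < 2:
            # No usable baseline: assumed here to be worth flagging as new activity.
            anomalies[entity] = float("inf")
            continue
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # A perfectly flat history: any change at all is a deviation.
            score = 0.0 if value == mu else float("inf")
        else:
            score = abs(value - mu) / sigma
        if score >= threshold:
            anomalies[entity] = score
    return anomalies


# Hypothetical data: host-a spikes, host-b stays in range, host-c is new.
baseline = {"host-a": [10, 12, 11, 9, 10], "host-b": [100, 110, 90, 105, 95]}
observed = {"host-a": 50, "host-b": 102, "host-c": 5}
flagged = find_anomalies(baseline, observed)
```

Even this toy version flags two of three hosts, which hints at why finding anomalies is the easy part.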

The next step is to categorize the anomalies into their most likely cases. I have listed four. The first is indicators of malicious activity. These could be simple cases of unusual scanning or beaconing, or more advanced indicators such as random domain name generation or subtle signs of lateral movement. The second category is configuration errors and policy violations. Some security types see these as IT issues and brush them off but, as we all know, the initial attack frequently starts by taking advantage of a poorly configured device or a person’s actions that are not in compliance with established policies. The third category is new activity on the network that is perfectly normal but was not part of the established baseline. The fourth category is activity that is unusual but not malicious. An example might be the emerging markets analyst who is poking around websites in Myanmar looking for the next hedge fund bet.
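The four categories can be written down as a small data model. The sketch below is only illustrative: the Anomaly fields (beaconing, violates_policy, seen_before) are hypothetical stand-ins for signals a real system would derive, and the fixed rule order is a placeholder for the far richer logic this step needs in practice.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    MALICIOUS_INDICATOR = "indicator of malicious activity"
    CONFIG_OR_POLICY = "configuration error or policy violation"
    NEW_NORMAL = "new but normal activity"
    UNUSUAL_BENIGN = "unusual but not malicious"


@dataclass
class Anomaly:
    entity: str
    beaconing: bool = False        # hypothetical signal: periodic outbound calls
    violates_policy: bool = False  # hypothetical signal: breaks a known policy
    seen_before: bool = True       # hypothetical signal: activity type was in the baseline


def categorize(anomaly: Anomaly) -> Category:
    """Assign one of the four categories using placeholder rules."""
    if anomaly.beaconing:
        return Category.MALICIOUS_INDICATOR
    if anomaly.violates_policy:
        return Category.CONFIG_OR_POLICY
    if not anomaly.seen_before:
        return Category.NEW_NORMAL
    return Category.UNUSUAL_BENIGN


analyst = Anomaly(entity="emerging-markets-analyst")
print(categorize(analyst).value)
```

With no malicious or policy signal set and the activity type already in the baseline, the analyst example falls through to the fourth category.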

Once the anomalies are categorized, we need to prioritize them and deliver a list of alerts to the user. We are looking for that elusive 1-to-n list where the top alert is the one we care about most. Alert fatigue does not come from having too many alerts; it comes from a lack of prioritization. In fact, I want to see all the alerts, but I want them prioritized and categorized. The categorization will help me decide who should analyze the alert, and the prioritization will ensure that even if I only have one analyst, that analyst is working on the most important problem.
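A 1-to-n list is, mechanically, just a sort, and routing by category is just a grouping. The sketch below assumes each alert already carries a numeric score and a category label; both field names, and the sample alerts, are made up for illustration. The hard part, producing a score worth sorting on, is not shown here.

```python
from collections import defaultdict


def rank_alerts(alerts):
    """Return (rank, alert) pairs, highest score first, with rank starting at 1."""
    ordered = sorted(alerts, key=lambda a: a["score"], reverse=True)
    return list(enumerate(ordered, start=1))


def group_by_category(ranked):
    """Group ranked alerts by category so each can go to the right analyst."""
    groups = defaultdict(list)
    for rank, alert in ranked:
        groups[alert["category"]].append((rank, alert["entity"]))
    return dict(groups)


# Hypothetical alerts; nothing is dropped, everything is ordered.
alerts = [
    {"entity": "printer-7", "category": "config", "score": 2.5},
    {"entity": "db-server", "category": "malicious", "score": 9.1},
    {"entity": "laptop-42", "category": "malicious", "score": 4.0},
]
ranked = rank_alerts(alerts)
routes = group_by_category(ranked)
```

Every alert stays on the list. A single analyst simply starts at rank 1.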

Categorizing and prioritizing is not easy; otherwise we would have several solutions on the market to choose from. The two arrows on the left of the model are the key to understanding what it takes to produce a useful 1-to-n list. At the bottom of our stack, the work is data intensive. Data scientists derive algorithms to find anomalies and, in general, more data is better. As we move up the stack, we find context is the key to categorization and prioritization. What does this entity do in the network, what business process does it support, what are the configurations and policies in place and is there a temporal nature to the analysis? These and other contextual issues are the key to categorizing and prioritizing the alerts, and this process requires extensive use of ML and AI to produce the 1-to-n list. It also requires cross-functional expertise from network engineers, data scientists, threat intelligence analysts and operators with both offensive and defensive expertise.
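To make the role of context concrete, here is a deliberately simple sketch in which the same raw anomaly score yields a different priority depending on what the entity does and when the activity occurred. The Context fields, the 0-to-1 criticality scale and the multipliers are all assumptions chosen for illustration. A real system would learn or tune these rather than hard-code them.

```python
from dataclasses import dataclass


@dataclass
class Context:
    criticality: float  # assumed 0..1 scale: importance of the business process supported
    off_hours: bool     # assumed flag: activity outside the entity's usual hours


def priority(anomaly_score: float, ctx: Context) -> float:
    """Scale a raw anomaly score by business and temporal context."""
    score = anomaly_score * (0.5 + ctx.criticality)
    if ctx.off_hours:
        score *= 1.5  # illustrative weight for the temporal factor
    return score


# The same raw score of 4.0 on two hypothetical entities.
payments_db = priority(4.0, Context(criticality=1.0, off_hours=True))
lobby_kiosk = priority(4.0, Context(criticality=0.0, off_hours=False))
```

Both anomalies look identical to the data layer. Only the context separates them in the final ordering.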

There is much more to discuss in this model, but I have exceeded the “optimum” length of a blog post.  In my next post, I will discuss the people, processes and technology it takes to make this all happen.

If you have read this far, you are probably a cybersecurity professional, and I welcome your comments on LinkedIn or on my website. Also, please check out my Facebook page, The CyberSpeaker, where I provide thoughts on cybersecurity targeted at a more general audience.