AI-enabled anomaly detection—a people, process, technology challenge

There is a lot of talk about applying artificial intelligence (AI) to the challenge of cybersecurity, and our company, IronNet Cybersecurity, is one of many attempting to do so.  I have found over the last two years that you must have a well-defined linkage between people, process, and technology to have any chance of creating value.

Before you read the rest of this article, you may want to check out my earlier post on the challenge of moving from anomalies to alerts: Finding anomalies is easy, deriving alerts is hard.

People.  I believe you need three specific groups of people to work this problem.  The first group is hardware and software engineers who are expert at capturing very large data sets at line speeds (10 Gbps+).  They must be able to parse the data and, in near real time, make it available to the second group of people, the data scientists.  Not all data scientists are created equal.  They all know the same math, but how they apply that math to the data set is what creates the specialties.  To solve the security problem, you need data scientists who can apply their science/art to network flow data.  This is a different problem than delivering ads at click speed or electronic trading.  The third group you need is the hunters.  These are operators who are highly skilled in both defense and offense and who really understand what it means to “hunt.”

Process.  The process begins with the HW/SW engineers collecting full network flow data and sending it to an analytic engine.  The analytic engine hosts the algorithms created by the data scientists to identify anomalies in the data.  The first challenge to overcome is that network flow data is, almost by definition, anomalous.  The second hurdle is that the algorithms must be informed by some sense of threat intelligence, so the math is targeted at finding the anomalies most likely to indicate the presence of malicious activity.  The third step in the process is to present the output of the algorithms to the hunters, who use their experience, intuition, and understanding of threat intelligence to let the data scientists know what is useful and what is not.  The output of this step may be that the data scientists need to change features and parameters in the algorithms, or there may be a requirement for the engineers to collect different data or to process the data in a different way to produce useful results.  Success will come from a deliberate closed-loop process that produces a metric-driven, interactive relationship between the three groups of people.  A minimal sketch of that loop follows.
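To make the loop concrete, here is a minimal sketch in Python.  Everything in it is an illustrative assumption rather than IronNet's implementation: the FlowRecord fields, the z-score test standing in for the data scientists' algorithms, and a single threshold standing in for the parameters the hunters' feedback would actually retune.

    from dataclasses import dataclass
    from statistics import mean, stdev
    from typing import List, Tuple

    @dataclass
    class FlowRecord:
        src: str
        dst: str
        bytes_out: int
        matches_threat_intel: bool   # e.g., destination appears on a watch list

    def find_anomalies(flows: List[FlowRecord],
                       z_threshold: float) -> List[Tuple[float, FlowRecord]]:
        """Analytic engine step: flag flows whose volume is statistically unusual
        AND that overlap with threat intelligence, so 'anomalous' is narrowed
        toward 'likely malicious'."""
        volumes = [f.bytes_out for f in flows]
        mu = mean(volumes)
        sigma = stdev(volumes) if len(volumes) > 1 else 0.0
        scored = []
        for f in flows:
            z = (f.bytes_out - mu) / sigma if sigma else 0.0
            if z > z_threshold and f.matches_threat_intel:
                scored.append((z, f))
        return sorted(scored, key=lambda pair: pair[0], reverse=True)

    def apply_hunter_feedback(current_threshold: float, labels: List[bool]) -> float:
        """Closed-loop step: hunters label each surfaced anomaly as useful or not,
        and the data scientists use those labels to retune the parameters
        (here, just the z-score threshold)."""
        if not labels:
            return current_threshold
        useful_ratio = sum(labels) / len(labels)
        # Mostly useless anomalies -> tighten; mostly useful -> loosen slightly.
        if useful_ratio < 0.5:
            return current_threshold + 0.5
        return max(1.0, current_threshold - 0.25)

In practice the feedback would cover features, models, and collection requirements rather than one number, but the shape of the loop is the same: engineers feed the engine, algorithms surface candidates, hunters label them, and data scientists retune.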

Technology.  There is a lot of technology required to execute the process I have described.  Much of it is well known in terms of network engineering and data science.  What has not been solved is the ability to create a 1-to-n list of alerts such that the top alert is more important than the second alert, and so on down the list.  At the same time, the list must contain a very small number of benign events, so-called “false positives”; less than 0.1% would probably be a good target.  Getting to the 1-to-n list requires the application of AI.  A human would create the 1-to-n list by examining the output of the data algorithms, putting that output in the context of the network to prioritize critical issues, and applying experience and intuition to focus on the entity at the highest risk of being involved in a compromise.  Humans cannot do this at speed given the volume of network flow data, which is why we need machines to take on the task.  The trick is getting the machines to emulate the intelligence of humans, and that is where AI comes in.
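As an illustration of the kind of ranking the machines need to perform, here is a hypothetical sketch.  The entity attributes and weights are placeholders I am assuming for the example; a real system would learn the prioritization from the hunters' feedback rather than hard-code it.

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class Detection:
        entity: str               # host, account, or device the algorithms flagged
        anomaly_score: float      # output of the data scientists' algorithms, 0-1
        asset_criticality: float  # network context, e.g. a domain controller = 1.0
        threat_intel_hits: int    # overlaps with known malicious indicators

    def rank_alerts(detections: List[Detection]) -> List[Detection]:
        """Order detections so the entity at highest risk of compromise comes first."""
        def priority(d: Detection) -> float:
            # Weighted blend standing in for the contextual judgment a hunter
            # would apply; the weights are arbitrary placeholders.
            intel = min(d.threat_intel_hits, 5) / 5
            return 0.5 * d.anomaly_score + 0.3 * d.asset_criticality + 0.2 * intel
        return sorted(detections, key=priority, reverse=True)

    if __name__ == "__main__":
        alerts = rank_alerts([
            Detection("hr-laptop-12", 0.91, 0.2, 0),
            Detection("domain-controller-1", 0.74, 1.0, 2),
        ])
        for rank, d in enumerate(alerts, 1):
            print(rank, d.entity)

Note that in this toy example the domain controller outranks the laptop even though its raw anomaly score is lower; that kind of context-driven reordering is exactly what the human does today and what the AI has to emulate at network speed.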

If you like this article, please share it. Also check out my website, www.thecyberspeaker.com, and my Facebook page, https://www.facebook.com/thecyberspeaker.