
Why Machine Learning Models Usually Fail in Cyber Security

History of Machine Learning

The term Machine Learning (ML) has been around since the 1950s and has made a massive resurgence in the last five years. Gartner recently found that 42% of survey respondents did not understand the benefits of ML, yet many business leaders have bought into it anyway because it’s a trend and the rewards seem appealing.

With nearly every cyber security tool boasting of its machine learning capabilities, how do you know which vendor’s machine learning actually works?

Problems with Machine Learning

AI & Machine Learning detection capabilities for cyber security are rapidly evolving. Nearly half of organizations are expanding cognitive detection capabilities to find unknown attacks.

3 Key Challenges with using Machine Learning:

  • How adaptable are the models as the data changes?
  • How are the models trained to find the threats?
  • How much time does it take to “train” a machine learning model?
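
The first challenge, adaptability as the data changes, can be made concrete with a simple drift check. The sketch below is only an illustration under stated assumptions: the function names, the login counts, and the threshold of 2.0 are all hypothetical and are not taken from any particular product. It measures how far the average of recent data has moved from the data the model was trained on, in units of the training data's standard deviation.

```python
from statistics import mean, stdev


def drift_score(baseline, recent):
    """Return how many baseline standard deviations the recent mean has moved."""
    spread = stdev(baseline)
    if spread == 0:
        raise ValueError("baseline has no variation; drift score is undefined")
    return abs(mean(recent) - mean(baseline)) / spread


def needs_retraining(baseline, recent, threshold=2.0):
    """Flag the model for re-training when the shift exceeds the assumed threshold."""
    return drift_score(baseline, recent) > threshold


# Hypothetical daily login counts for one user (illustrative data only).
baseline_logins = [10, 12, 11, 9, 10, 12, 11, 9]
stable_week = [11, 10, 12, 9]
shifted_week = [25, 27, 24, 26]

print(needs_retraining(baseline_logins, stable_week))
print(needs_retraining(baseline_logins, shifted_week))
```

A check like this only says that the inputs have changed. How quickly a model can be re-trained once that happens is the third challenge in the list.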

AI & ML in the security world have great promise and are part of any next generation threat detection platform. There is an expectation that an AI or ML system will be able to credibly detect advanced threats more efficiently. The problem with this assumption is that all AI & ML models have inherent inaccuracy, which means that false positives are still a major issue.
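
To see why a small inaccuracy still produces many false positives, it helps to work through the arithmetic. The sketch below applies Bayes' rule to hypothetical numbers chosen only for illustration: a model that catches 99% of attacks, wrongly flags 1% of benign events, and watches traffic where 1 event in 10,000 is a real attack. The function name and all three rates are assumptions.

```python
def alert_precision(detection_rate, false_positive_rate, attack_rate):
    """Return the fraction of alerts that correspond to real attacks (Bayes' rule)."""
    true_alerts = detection_rate * attack_rate
    false_alerts = false_positive_rate * (1.0 - attack_rate)
    total_alerts = true_alerts + false_alerts
    if total_alerts == 0:
        raise ValueError("the model never raises an alert with these rates")
    return true_alerts / total_alerts


# Hypothetical rates: 99% detection, 1% false positives, 1 attack per 10,000 events.
precision = alert_precision(0.99, 0.01, 0.0001)
print(f"{precision:.2%} of alerts are real attacks")
```

Under these assumed rates, roughly 1 alert in 100 is a real attack, so analysts spend most of their time on false positives even though the model looks accurate on paper.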

Data “Labeling”

Data must be structured and organized so it can be understood by the algorithms that analyze the data. More than 80% of ML project time is spent on data labeling and preparation phases. There are thousands of algorithms – which one will solve your problem?

One method is called “human in the loop,” which uses the judgment of the people who work with the data that feeds the machine learning model. This can bring efficiency to the program, but it can also introduce bias.

Interpreting the Results

“Result Mapping” must be done in advance. One of the most common cyber attacks on machine learning systems is to trick the system into making false predictions by feeding it malicious inputs. Other issues that can arise with result interpretation include a Transfer Learning Attack or a data poisoning attack, either of which can greatly decrease the accuracy of the output.
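
A data poisoning attack can be illustrated with a toy example. The sketch below is a deliberately tiny nearest-centroid classifier, not a model any vendor is known to use, and every value and label in it is made up for illustration. It trains once on clean data and once on the same data plus a few attacker-supplied records that are mislabeled as benign, then compares accuracy on the same held-out events.

```python
from statistics import mean


def train_centroids(samples):
    """Take (value, label) pairs and return the mean value for each label."""
    by_label = {}
    for value, label in samples:
        by_label.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in by_label.items()}


def predict(centroids, value):
    """Assign the label whose centroid is closest to the value."""
    return min(centroids, key=lambda label: abs(value - centroids[label]))


def accuracy(centroids, samples):
    """Return the fraction of samples the model labels correctly."""
    correct = sum(predict(centroids, value) == label for value, label in samples)
    return correct / len(samples)


# Illustrative one-feature events; higher values look more suspicious.
clean_training = [(1, "benign"), (2, "benign"), (3, "benign"), (4, "benign"),
                  (7, "malicious"), (8, "malicious"), (9, "malicious"), (10, "malicious")]
# Attacker-supplied records: suspicious values deliberately labeled benign.
poison = [(9, "benign"), (10, "benign"), (10, "benign"), (9, "benign")]
held_out = [(2, "benign"), (3, "benign"), (5, "benign"),
            (6, "malicious"), (8, "malicious"), (9, "malicious")]

clean_accuracy = accuracy(train_centroids(clean_training), held_out)
poisoned_model = train_centroids(clean_training + poison)
poisoned_accuracy = accuracy(poisoned_model, held_out)
print(clean_accuracy, poisoned_accuracy)
```

The poisoned records drag the "benign" centroid toward the malicious range, so a borderline malicious event is now accepted as benign. Real attacks are subtler, but the mechanism is the same.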

Also, when the output is received, what is the level of “fitness,” or ML accuracy? Knowing the fitness can make all the difference when interpreting the ML results. Similarly, whether the model is “over-fit” or “under-fit” indicates how well it learns and generalizes to new data: an over-fit model memorizes its training data and performs poorly on new data, while an under-fit model fails to capture the patterns even in its training data.
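
One common way to judge fit is to compare accuracy on the training data with accuracy on held-out data the model has never seen. The sketch below shows that comparison only; the function name and both thresholds (a minimum training accuracy of 0.80 and a maximum gap of 0.10) are assumptions chosen for illustration, and sensible values depend on the problem.

```python
def diagnose_fit(train_accuracy, holdout_accuracy, min_accuracy=0.80, max_gap=0.10):
    """Classify a model as under-fit, over-fit, or reasonably fit from two accuracies."""
    if train_accuracy < min_accuracy:
        # The model cannot even explain the data it was trained on.
        return "under-fit"
    if train_accuracy - holdout_accuracy > max_gap:
        # The model does well on training data but much worse on unseen data.
        return "over-fit"
    return "reasonable fit"


# Hypothetical accuracy figures for three models.
print(diagnose_fit(0.99, 0.70))
print(diagnose_fit(0.60, 0.58))
print(diagnose_fit(0.92, 0.90))
```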

Often these accuracy, adaptability, and efficiency challenges make it difficult to create an effective ML program. The result? Most cyber threat detection companies that do not resolve them will face:

  • Loss of fidelity
  • Compromised result accuracy
  • Introduction of bias
  • Loss of characteristics
  • Increase of false positives – increasing human analysis load and expense
  • Loss of customer trust from crying wolf
  • Missed detection of true positives

The Threat Actors Are Using Machine Learning

Beyond the challenges of using ML effectively, threat actors also have access to machine learning tools. To defend against today’s sophisticated threat actors, your organization must not only combat the problems above, but also have a system that can learn from hundreds, if not thousands, of malicious parties.

For example, in August 2019, criminals employed AI-based software to replicate a CEO’s voice to command a cash transfer of €220,000 (approximately $243,000). Many of today’s intelligent ML tools are also sold to threat actors as Cyber-Attack-as-a-Service platforms. Launching successful attacks has never been easier, which makes defending against attacks more challenging than ever.

6 Questions You Should Ask Your MSSP/MDR Provider About Machine Learning

When choosing an MSSP or MDR provider that claims to use machine learning, ask these questions first:

  • What model are you using?
  • How many ML models are in the system?
  • If the models are dynamic, how often are they updated (re-trained)?
  • How much data do you use to train the models (all of it or a subset)?
  • How often is that data refreshed (periodically or continuously)?
  • How do you deal with unknown conditions or things that were not part of the training data?

How SOD Solves the Machine Learning Accuracy Problems

Security On-Demand solves the ML accuracy problems by building its ThreatWatch platform directly on a Big Data Analytics engine called AQ Technology. The AQ analytics engine enables the following:

  • Increased Speed – ThreatWatch can query petabytes of log data in minutes – a 100x performance increase over other platforms
  • Small Data Footprint – ThreatWatch only needs to store 1-3% of the original dataset
  • Powerful Detection – Automatically identifies anomalies in the data – allowing us to find threats without pre-knowledge or rules
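
For readers unfamiliar with detection that works without pre-knowledge or rules, the sketch below shows one generic statistical approach. It is not a description of how ThreatWatch or AQ Technology works; it is a textbook technique (the modified z-score based on the median absolute deviation), and the traffic figures are hypothetical. The constant 0.6745 and the threshold of 3.5 are conventional choices for this statistic.

```python
from statistics import median


def mad_anomalies(values, threshold=3.5):
    """Return the values whose modified z-score exceeds the threshold."""
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        # No typical spread to compare against, so nothing is flagged.
        return []
    return [v for v in values if 0.6745 * abs(v - center) / mad > threshold]


# Hypothetical outbound megabytes per hour for one host (illustrative data only).
traffic = [12, 14, 13, 15, 11, 13, 14, 12, 250, 13]
print(mad_anomalies(traffic))
```

No rule says that 250 MB is too much. The value is flagged only because it sits far from what is typical for this host, which is the general idea behind anomaly-based detection.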

For more information, contact us for a demo here.

About Security On-Demand

Security On-Demand (SOD) provides 24×7 advanced cyber-threat detection services for mid-market companies and state or local government agencies. SOD’s patented behavioral analytics technology platform, ThreatWatch®, enables the detection of advanced threats, helping protect brand value and reduce the risk of a data breach. Headquartered in San Diego, California, with R&D offices in Warsaw, Poland, SOD services and protects hundreds of brands globally and is the winner of multiple industry awards. Find us on LinkedIn and follow us on Twitter @SecurityOnDmand.