Big Data to Scratch the Surface in 2018

2018 will be the year people start to see the impact of big data and big data analytics, and it is just the beginning of what is possible. Until now, many of the products released as big data were big data in marketing terms only: most major companies repackaged old statistical approaches and declared them big data. Like the one-dollar apps in the iStore, the initial big data products might not wow you, but they do produce a different type of analytical result.

While big data analytics remains immature, most products released will have shortcuts built into their development lifecycle. The advantage is that their results will be far better than what traditional techniques can offer, but better-engineered solutions will replace these products in the next generation. This is the classic blue-ocean/red-ocean cycle.

So what are the three characteristics that 2018 products will share?


Data Lakes

Data Lakes are the sock drawer of the data storage world, our leftovers from 2017. To a database engineer, they are akin to a blob field. When collecting and organizing data at the same time is beyond an organization's ability, it focuses on just collecting and throws everything into a data lake. Data lakes exist for the sole reason that a program manager knows they want the data but doesn't know why. All of an organization's data goes into the lake, and then an analytical model is supposed to help that short-sighted manager figure out why. The new analytical engine then continues to use this inefficient data store for everything it collects. No, I am not a fan of the beloved data lake.

In the world of pivoting companies, data lakes make sense. They allow data to be collected without normalization; once an understanding of how to organize and normalize the data emerges, a model and a result can follow. The data lake is an engineering shortcut that buys time and flexibility, and for that reason you will continue to see data lakes, especially in new products racing to be first to market.

In 2018, data lakes are going to be very popular. Amazon kicked off AWS's first email of the year with a webinar on data lakes. This makes sense: data lakes require a massive amount of storage, and that means large AWS bills. The slower, cheaper Glacier class of storage used for long-term logs cannot serve a data lake, because the data needs to be analyzed, not just stored.


In-Memory Analytics

With the popularity of data lakes, database analytics becomes a problem. A side effect of data lakes is that analysis cannot rely on search speed, because data lakes have terrible access times. Big data in general is hampered by moving data to and from the disk itself (disk IO speeds). This means that loading large amounts of data and analyzing it in memory becomes important.

In-memory analytics, by contrast, is a good trend for 2018 and should not be confused with the evils of data lakes. When data is properly normalized, analytics can run before disk IO, analyzing in real time instead of polling the database.
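As a minimal sketch of the idea (the class, thresholds, and event values here are hypothetical, not any particular product's API), in-memory analytics can be thought of as evaluating each event as it arrives, in a bounded rolling window, before anything is written to disk:

```python
from collections import deque

# Hypothetical sketch: evaluate each event in memory as it arrives,
# before any disk write, keeping only a bounded rolling window.
class StreamAnalyzer:
    def __init__(self, window=100, threshold=2.0):
        self.window = deque(maxlen=window)  # bounded memory, no disk IO
        self.threshold = threshold

    def observe(self, value):
        """Return True if the value is anomalous versus the rolling mean."""
        if self.window:
            mean = sum(self.window) / len(self.window)
            anomalous = value > self.threshold * mean
        else:
            anomalous = False  # nothing to compare against yet
        self.window.append(value)
        return anomalous

analyzer = StreamAnalyzer(window=5, threshold=2.0)
flags = [analyzer.observe(v) for v in [10, 11, 9, 10, 50]]
# The final value (50) exceeds twice the rolling mean of the earlier values,
# so it is flagged in real time -- no database was polled to decide this.
```

The design point is that the decision is made at ingest; persisting the raw events afterward (or not at all) becomes a separate, optional step.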

This is an exciting aspect of big data analytics: the prospect of analyzing data before it is stored, in order to make real-time decisions. Areas that benefit most from this approach, such as stock trading, will be affected first. An added benefit is that in some cases the data never needs to be stored at all, eliminating a significant line item in the cost of big data analytics.


The Power of Insight

Often in life, knowing the big picture is all that matters. It took Alfred North Whitehead and Bertrand Russell more than 300 pages to prove that one plus one equals two. I feel we benefit as much from our ignorance of that logic as from its answer. And that is the point: insight is often all we are looking for in a question.

At our company, we focus on ground truth: the ability to identify, within big data, the exact elements that support an issue. That is what is needed when someone is trying to resolve a problem. When we first shipped our product, however, we found that most customers struggled simply to know where to start. They lacked insight.

It took us less than six months to build the analytics that provide insight into where to start, and this was a huge change in how our product was used. Insight is something a proper big data infrastructure can produce almost immediately.

Despite years of waiting, there are still years of waiting ahead before big data truly acts like big data. But as 2018 begins, much is in place to deliver end-user value from big data analytics. With companies getting past the stage of merely collecting data, the real fun is just beginning. Initial analytical results, and the new companies that bring them, are about to emerge. Marketing hype has to give way to execution and the ability to deliver. This year is shaping up to be a year focused on results.

The Network is Crying Wolf

Real Numbers Support Better Detection Accuracy

Data shows that automated response is not the big objective for security operations. While startups and investors like the automated-response pitch, actual operations data shows that the real need is accurate detection. The number of critical alerts over the last year has consistently increased, while the number of incidents has decreased. That means more issues to validate, but fewer real issues to address. The greatest efficiency gain lies in reducing the number of false alerts. Automating without validation, on the other hand, likely creates unnecessary prevention actions and denial of service.

Fluency is blessed with good customers who talk about their needs. I have to admit that I like the orchestration pitch, and I debated making Fluency a pure orchestration tool. That is a big decision, one that requires talking to your customers. I sent out emails, picked up the phone, and met face to face. I wanted to know where the operational effort truly was.

I focused my questions on the number of incidents that needed to be addressed, because I often see vanity metrics in use. The most significant vanity metric for a security operations center (SOC) is the number of alerts. At the Gartner Summit, I heard two large companies talk about millions of alerts a day without a single measurement of their response capability. The number of alerts does not translate into success; it is merely the information available for analysis. The objective of a SOC is its response.

Talking to our customers, a consistent answer appeared in their data: the number of incidents per month was decreasing, while the number of critical events being reported was increasing. More than one customer showed a steady decline in validated incidents while their critical alerts more than doubled over the last year. In one case, the ratio of critical alerts to incidents rose from 20-to-1 to 42-to-1.

Fluency is unique in that it performs real-time analytics for detection and validation as part of its orchestration. This is critical, because we feel that automating a response to an incorrect alert can be as damaging as missing a real one. The false-positive ratio of critical alerts tells us that you are roughly forty times more likely to respond to an incorrect alert than a real one if you do not validate it.
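The arithmetic behind that "forty times" figure is simple; this sketch just restates the ratios quoted above:

```python
# Critical-alerts-to-incidents ratios quoted above (one customer's data)
ratio_last_year = 20   # 20 critical alerts per validated incident
ratio_this_year = 42   # 42 critical alerts per validated incident

# Of every 42 critical alerts, only 1 is a validated incident,
# so the share of critical alerts that are false is 41/42.
false_share = (ratio_this_year - 1) / ratio_this_year
print(f"{false_share:.1%} of critical alerts are not validated incidents")
# -> 97.6% of critical alerts are not validated incidents

# An unvalidated automated response therefore fires on 41 false alerts
# for every real incident -- roughly forty times more often.
false_per_real = ratio_this_year - 1
```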

What does this ratio of critical alerts to confirmed issues mean? It means security products are crying wolf more often: they are alerting more frequently and are increasingly incorrect when they do. More alerts claiming to be critical puts a real strain on the staff who must review them. It also means that companies that respond without validating alerts will do more harm than good.

As for Fluency, we will continue to perform orchestration, but with added emphasis on analytics and validation. That is where our customers need us. Fluency is one of only a few companies that perform real-time analytics using machine learning; most "machine learning" approaches are static searches or statistical analysis rebranded as machine learning.

Chris Jordan is CEO of College Park, Maryland-based Fluency (www.fluencysecurity.com), a pioneer in Security Automation and Orchestration.