2018 will be the year people start to see the impact of big data and big data analytics. It’s just the beginning of what is possible. Up until now, many of the big data products being released were big data only in marketing terms. Most major companies that have released big data products have been repackaging old statistical approaches and calling them big data. Like the one-dollar apps in the iStore, initial big data products might not wow you, but they produce a different type of analytical data.
While big data analytics remains immature, most products that are released will have shortcuts built into their development lifecycle. The advantage is that the results from these products will be far better than what is available with traditional techniques, but better-engineered solutions will replace them in the next generation. This is the classic blue-ocean, red-ocean cycle.
So, what are the three characteristics that 2018 products will have?
Data lakes are the sock drawer of the data storage world. They are our leftovers from 2017. They are akin to the blob field for a database engineer. When both collecting and organizing data is beyond one’s ability, one focuses on just collecting and throws all the data into a data lake. Data lakes exist for the sole reason that a program manager knows they want the data, but doesn’t know why. All of an organization’s data goes into a data lake, and then an analytical model is supposed to help that short-sighted manager figure out why. Then the new analytical engine keeps using this inefficient data store to hold the collected data. Yes, I am not a fan of the beloved data lake.
In the world of pivoting companies, data lakes make sense. They allow data to be collected without normalization. When an understanding of how to organize and normalize the data emerges, a model and result can be achieved. The data lake is an engineering shortcut that buys time and flexibility. It’s for this reason that you will continue to see data lakes, especially in new products trying to be first in the market.
In 2018, data lakes are going to be very popular. Amazon AWS sent its first email of the year, and data lakes were the topic of its first webinar. This makes sense, for data lakes require a massive amount of storage, and that means large AWS bills. The slower and cheaper Glacier style of storage used for long-term logs cannot be used for data lakes, as the data needs to be analyzed, not just stored.
With the popularity of data lakes, database analytics becomes a problem. A side effect of data lakes is that the analysis cannot rely on search speeds, for data lakes have terrible access times. Big data, in general, is hampered by moving data to and from the disk itself (disk IO speeds). This means that grabbing large amounts of data and analyzing it in memory becomes important.
Now, in-memory analytics is a good trend in 2018 and should not be confused with the evils of data lakes. When data is properly normalized, analytics can run before disk IO, analyzing in real time instead of polling the database.
This is an exciting aspect of big data analytics: the prospect of analyzing data independently of its storage in order to make real-time decisions. Areas that benefit from this approach, such as stock trading, will be affected first. A further benefit is that in some cases the data does not need to be stored at all, which removes a significant line item from the cost of big data analytics.
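As a rough illustration of analyzing data in memory as it arrives, the Python sketch below flags unusual values in a stream of prices using a small rolling window. The function name, window size, threshold, and sample prices are all assumptions made for illustration; nothing here reflects a particular product.

```python
from collections import deque
from statistics import mean, pstdev


def detect_spikes(prices, window=5, threshold=3.0):
    """Yield (index, price) for values far from the rolling mean.

    Each value is examined as it arrives. Only the last `window`
    values are kept in memory, and nothing is written to disk.
    """
    recent = deque(maxlen=window)
    for i, price in enumerate(prices):
        if len(recent) == window:
            mu = mean(recent)
            sigma = pstdev(recent)
            # Flag the value if it sits more than `threshold` standard
            # deviations away from the mean of the recent window.
            if sigma > 0 and abs(price - mu) > threshold * sigma:
                yield i, price
        recent.append(price)


# Hypothetical price ticks; the seventh value is an obvious outlier.
ticks = [100.0, 100.5, 99.8, 100.2, 100.1, 100.3, 112.0, 100.4]
spikes = list(detect_spikes(ticks))
```

Because the generator holds only a fixed-size window, the decision is made on the data in flight; whether the raw ticks are ever stored becomes a separate choice.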
The Power of Insight
Often in life, knowing the big picture is all that matters. It took more than 300 pages for Alfred North Whitehead and Bertrand Russell to prove that one plus one equals two. I feel we benefit from our ignorance of the logic as much as its answer. And that is the point, that insight is often all we are looking for in a question.
At our company, we focus on ground truth: the ability to determine, within big data, the exact elements that support the issue. This is needed when someone is trying to resolve a problem. When we first produced our product, we found that most customers had a harder time knowing where to start. They lacked insight.
It took us less than six months to produce the analytics to provide the insight on where to start, and this was a huge change in how our product was used. Insight is something that a proper big data infrastructure can produce almost immediately.
Despite years of waiting, there are still more years to come before big data truly acts like big data. As we begin 2018, there is a significant amount in place to deliver end-user value in big data analytics. With companies getting past the point of collecting data, the real fun is just beginning. The initial results of analysis, and new companies to deliver them, are going to emerge. Marketing hype has to give way to execution and the ability to do something. This year is looking to be a year focused on results.
Real Numbers Support Better Detection Accuracy
Data shows that automated response is not the big objective for security operations. While startups and investors like the automated response pitch, reviewing actual operations data shows that the real need is for accurate detection. Data shows that the number of critical alerts over the last year has consistently increased, but the number of incidents has decreased. This means that there are more issues to validate, but fewer real issues to address. The greatest efficiency gain is in reducing the number of false alerts. On the other hand, automating without validation is likely to create unnecessary prevention and denial of service.
Fluency is blessed with good customers who talk about their needs. I have to admit that I like the orchestration pitch and was debating making Fluency a pure orchestration tool. That is a big decision, one that requires talking to your customers. I sent out emails, picked up the phone, and met face to face. I wanted to know where the operation’s effort truly was.
I focused my questions on the number of incidents that needed to be addressed. I often see the use of vanity metrics. The most significant vanity metric for a security operations center (SOC) is the number of alerts. At the Gartner Summit, I heard two large companies talk about millions of alerts a day, but offer not a single measurement of their response capability. The number of alerts does not translate into success, but merely into available information for analysis. The objective of a SOC is its response.
Talking to our customers, there was a consistent answer in their data. The number of incidents per month was decreasing, while the number of critical events being reported was increasing. More than one customer showed a consistent decrease in their validated incidents, while their critical alerts more than doubled in the last year. The ratio of critical alerts to incidents increased from 20-to-1 to 42-to-1.
Fluency is unique in that it does real-time analytics for detection and validation as part of its orchestration. This is a critical aspect, as we feel that automating a response to an incorrect alert can be as damaging as missing a real one. At a 42-to-1 ratio of critical alerts to incidents, an unvalidated response is roughly forty times more likely to act on an incorrect alert than on a real one.
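The arithmetic behind that ratio can be worked through in a few lines of Python. This is a minimal sketch that assumes each validated incident corresponds to exactly one critical alert, which is a simplification; the function names are illustrative.

```python
def false_alert_share(alerts_per_incident):
    """Fraction of critical alerts with no validated incident behind them.

    Assumes one true alert per incident, so the remaining
    (alerts_per_incident - 1) alerts are false.
    """
    return (alerts_per_incident - 1) / alerts_per_incident


def false_per_real(alerts_per_incident):
    """Number of false alerts for every real one, under the same assumption."""
    return alerts_per_incident - 1


share_before = false_alert_share(20)  # 19 of every 20 alerts are false
share_after = false_alert_share(42)   # 41 of every 42 alerts are false
odds_after = false_per_real(42)       # 41 false alerts per real alert
```

Under this assumption, the share of false critical alerts moves from 95% to about 97.6%, and at 42-to-1 there are 41 false alerts for each real one, which is where the "roughly forty times" figure comes from.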
What does this ratio of critical alerts to confirmed issues mean? It means that security products are crying wolf more often. It means that security products are alerting more often and are increasingly incorrect in their alerts. The fact is that there are more alerts saying they are critical, and this increase is putting a real strain on staff to review the alerts. It also means that companies that focus on responding without validating alerts will be doing more harm than good.
As for Fluency, we will continue to perform orchestration but with added emphasis on analytics and validation. This is where our customers need us. Fluency is one of a couple of companies that perform real-time analytics using machine learning. Most machine learning approaches rely on static searches or are just statistical analysis rebranded as machine learning.
Chris Jordan is CEO of College Park, Maryland-based Fluency (www.fluencysecurity.com), a pioneer in Security Automation and Orchestration.
Elizabeth O’Dowd wrote an excellent piece on the relationship of visibility to decision making, entitled “Visibility, Automation Defend Against Network Security Threats”.
One awesome aspect of companies is that they grow into maturity, and so does their security. The organization chart slowly starts to look like a company. Successful companies think more about their customers and less about their network administration and security. Early on, IT and Security are one-person, living programs that execute that person’s vision. But eventually, these aspects of a company need to become more business-like. A formal security approach needs to be put in place that a team can run and a business can execute.
The idea of all of a sudden turning on security is daunting to most companies. Spreadsheets filled with compliance requirements are overwhelming. There is no “Quick Start Guide to Corporate Security”.
There is a first step. Building a security foundation in a company starts with two simple objectives:
· Governance, which asks the questions, and
· Audit, which provides evidence to answer them.
That’s it? For now, yes.
A common mistake is to focus on perfection. Security is like a boat needing to cross the sea. The boat might leak, but if it can get us to where we are going, it’s good enough. If we spend all our time trying to make the boat perfect then we never leave. The object is the voyage, not a perfect boat.
Create a Governance function in the company that focuses first on the basics. Governance requires the ability to see. Without vision, it is just paperwork. Governance first focuses on developing three simple abilities.
These are to see:
· a user’s login to the network.
· the desktop endpoint protection.
· the usage of the network.
This can mean having Windows login (LDAP), running a centrally managed antivirus product, and using an Internet proxy to monitor web usage. These simple elements are the start of user, device, and network audit.
To support Governance, implement basic audit. Tools tend to already have audit built in. What we need is to centralize that audit for Governance.
Create Centralized Log Management (CLM).
· move log data to a common location.
· store log data in a common datastore.
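The two steps above can be sketched in Python as follows. This is only an illustration of moving logs to one place and storing them in one datastore: the event fields, the `normalize` helper, and the use of an in-memory SQLite table are assumptions, not a description of any specific CLM product.

```python
import sqlite3

# Hypothetical raw events from login, antivirus, and proxy sources.
RAW_EVENTS = [
    {"source": "login", "time": "2018-01-08T09:00:00", "user": "alice", "host": "pc-17"},
    {"source": "antivirus", "time": "2018-01-08T09:05:00", "host": "pc-17", "status": "clean"},
    {"source": "proxy", "time": "2018-01-08T09:06:00", "host": "pc-17", "url": "example.com"},
]


def normalize(event):
    """Map a source-specific event onto one common row layout."""
    detail = event.get("status") or event.get("url") or ""
    return (event["time"], event["source"], event.get("user"), event["host"], detail)


def build_store(events):
    """Load normalized events into a single in-memory SQLite table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE logs (time TEXT, source TEXT, username TEXT, host TEXT, detail TEXT)"
    )
    db.executemany("INSERT INTO logs VALUES (?, ?, ?, ?, ?)", [normalize(e) for e in events])
    return db


store = build_store(RAW_EVENTS)
# One query now covers everything a single host did, across all sources.
rows = store.execute(
    "SELECT source FROM logs WHERE host = ? ORDER BY time", ("pc-17",)
).fetchall()
```

Even this bare-bones version lets Governance ask one question across all sources instead of opening three separate consoles.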
The focus of this one-two punch is that it will provide the insight to determine and to prioritize the need. The order of need is the ability to:
· prevent, and finally
This simple start of CLM is the framework Governance will consistently go back to in order to gain understanding and decide how to move forward. The ability to measure the security of the network relates to the ability to make decisions.
What this might look like
With basic controls reporting audit data to a central location, Governance should have the ability to see:
It’s not perfect. But Governance can now track when a person properly enters the network. Governance can also determine when a device connects to the Internet without having gone through proper login. Lastly, there is an ability to see what network logs are associated with what users.
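One of these checks, spotting a device that reaches the Internet without a prior login, might look like the Python sketch below. The event layout and the `hosts_without_login` name are hypothetical, and the logic is deliberately simplified: it ignores logouts and treats the earliest login on a host as covering all later traffic.

```python
def hosts_without_login(login_events, proxy_events):
    """Return proxy events from hosts with no login before the traffic."""
    first_login = {}
    for event in login_events:
        host = event["host"]
        # ISO-8601 timestamps in the same format compare correctly as strings.
        if host not in first_login or event["time"] < first_login[host]:
            first_login[host] = event["time"]

    flagged = []
    for event in proxy_events:
        login_time = first_login.get(event["host"])
        if login_time is None or event["time"] < login_time:
            flagged.append(event)
    return flagged


# Hypothetical sample data.
logins = [{"host": "pc-17", "user": "alice", "time": "2018-01-08T09:00:00"}]
proxy = [
    {"host": "pc-17", "time": "2018-01-08T08:55:00", "url": "example.com"},
    {"host": "pc-17", "time": "2018-01-08T09:06:00", "url": "example.com"},
    {"host": "pc-42", "time": "2018-01-08T09:10:00", "url": "example.org"},
]
flagged = hosts_without_login(logins, proxy)
```

Here the early request from pc-17 and the request from pc-42, which never logged in, are the two events Governance would want to look at.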
To relate all this information, Governance implements a Central Log Management (CLM) system. CLM addresses a number of long-term compliance objectives, and at the same time reduces the effort in understanding all the data being collected. This data will continually grow.
Choosing Central Log Management Software
A CLM is not a giant haystack. Solutions like Fluency correlate and fuse the data. Placing all the data into a giant database does not solve anything. It just makes your problem sit on a single server.
A Japanese samurai might tell you, “Every action is an action to the final cut.” If you do something that does not bring you closer to your objective, you have wasted an action. Waste to a businessman is lost profit.
The volume, diversity, and complexity of log data grow as a security process matures. To avoid having to migrate away from an incapable CLM solution, the foundation needs to implement a proper CLM from the start.
The lack of a CLM solution that provides insight is why I left a vice president position at McAfee. People were implementing large amounts of security without insight into what products were doing. Though individually, these products helped the security posture of a company, the lack of vision meant large gaps in prevention and response. This lack of insight still exists today.
Implementing CLM correctly means more than avoiding a migration later. For instance, CLM with analytics will provide better decision-making insight. Better insight means better priority and measurement.
Implementing a basic CLM capability is a mistake. Open source CLM focuses on creating a bigger haystack. Incoming logs are parsed and placed in a transaction table. All data is treated as if it were the same. There is an immediate satisfaction in seeing data and charts, but the charts provide little management insight after the first viewing. The desired goal of relating user-to-device-to-network activity is still an inefficient manual process.
Leapfrogging Old CLM
When techies see that a newer solution is where they should be, they leapfrog. The greatest advantage of building a foundation is the ability to leapfrog old ideas without having to dismantle old infrastructure.
Central Log Management (CLM) is the cornerstone of audit. It appears not to have a technology of its own. It appears to be a giant haystack of data. This view is common even in the most cutting-edge open source projects. The haystack approach to log management, one that focuses on the database, is a poison pill.
A CLM should position you to implement and leverage analytics. Analytic algorithms do not understand data. This means that the CLM you choose needs to perform normalization, de-confliction, and fusion on its own. These aspects of log management are not in basic SIEM and CLM design.
Marketing material normally does not talk about this. To find a CLM solution ready for Security Analytics, look for solutions that implement true fusion. To do this, check whether the system needs to perform a join or a second look-up to present a view of a transaction. If it does, the system is not ready for analytics. Security Analytics works best with a graph database approach and fused data. Governance needs analytics for insight, not summary charts.
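To make the join-versus-fusion distinction concrete, here is a small Python sketch that attaches the user to each network record at ingest time, so that viewing a transaction later needs no join or second look-up. It is an assumption-laden illustration (hypothetical event fields, a simple host-to-user map, no logout handling), not a description of how Fluency or any other product implements fusion.

```python
def fuse(events):
    """Attach the current user to each proxy event as it is ingested."""
    user_by_host = {}
    fused = []
    for event in sorted(events, key=lambda e: e["time"]):
        if event["source"] == "login":
            # Remember who is on this device from now on.
            user_by_host[event["host"]] = event["user"]
        elif event["source"] == "proxy":
            # The stored record already carries user, device, and network.
            fused.append({
                "time": event["time"],
                "user": user_by_host.get(event["host"]),
                "host": event["host"],
                "url": event["url"],
            })
    return fused


# Hypothetical sample events.
events = [
    {"source": "proxy", "time": "2018-01-08T09:06:00", "host": "pc-17", "url": "example.com"},
    {"source": "login", "time": "2018-01-08T09:00:00", "host": "pc-17", "user": "alice"},
    {"source": "proxy", "time": "2018-01-08T09:10:00", "host": "pc-42", "url": "example.org"},
]
fused = fuse(events)
```

Each fused record answers "who, on what device, went where" by itself; a record whose user is `None` is immediately visible as traffic with no known login behind it.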
In starting a security program, focus on creating an element of the company responsible for Governance, and give that Governance the ability to see and measure the infrastructure. Security management is Governance.
To gain vision for governing, create a central log management (CLM) that collects user, device and network logs. The CLM should fuse the data so that there is a clear relationship between the user, device, and network. The aim is to have analytics to support Governance.
Once there is Governance and vision into the user-device-network relationship, Governance can start implementing and measuring the network.
The next step (and posting) after establishing a CLM is implementing security controls for data leaving the network. This means determining how people use the network and what interaction is allowed.
Chris Jordan is CEO of College Park, Maryland-based Fluency (www.fluencysecurity.com, Twitter:@FluencySecurity), a pioneer in Security Automation and Orchestration.
As the security market undergoes fundamental shifts towards adaptive and risk-based security, there is a growing demand for innovations and new technologies that increase an organization’s effectiveness. Automation of processes and intelligent coordination across disparate systems help besieged organizations to resolve operational issues and rapidly improve efficiency.
What We Do
Fluency® provides network behavioral analytics through artificial intelligence and machine learning. Fluency doesn’t just silence alerts, it provides the insight necessary to discover and resolve issues. Machine scoring allows Fluency to detect and notify personnel 7x24, allowing people to focus on response instead of watching a screen.