2018 will be the year people start to see the impact of big data and big data analytics. It’s just the beginning of what is possible. Up until now, many of the big data products being released were big data only in marketing terms. Most major companies that have released big data products have been repackaging old statistical approaches and calling them big data. Like the one-dollar apps in the App Store, initial big data products might not wow you, but they produce a different type of analytical data.
While big data analytics remains immature, most products that are released will have shortcuts built into their development lifecycle. The advantage is that the results from these products will be far better than what is available with traditional techniques, but better-engineered solutions will replace them in the next generation. This is the classic blue-ocean, red-ocean cycle.
So, what are the three characteristics that 2018 products will have?
Data lakes are the sock drawer of the data storage world. They are our leftovers from 2017, akin to the blob field for a database engineer. When collecting and organizing data at the same time is beyond one’s ability, one focuses on just collecting and throws all the data into a data lake. Data lakes exist for the sole reason that a program manager knows they want the data but doesn’t know why. All of an organization’s data goes into a data lake, and then an analytical model is supposed to help that short-sighted manager figure out why. Then the new analytical engine keeps using this inefficient data store for the data it collects. Yes, I am not a fan of the beloved data lake.
In the world of pivoting companies, data lakes make sense. They allow data to be collected without normalization. When an understanding of how to organize and normalize the data emerges, a model and a result can be achieved. The data lake is an engineering shortcut that buys time and flexibility. It’s for this reason that you will continue to see data lakes, especially in new products trying to be first to market.
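The collect-now, normalize-later idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample records, the field names (`user`, `customer`, `amount`, `total`) and the helpers `normalize` and `read_lake` are all hypothetical and do not come from any particular product.

```python
import json

# Hypothetical raw events, stored exactly as they arrived (the "lake").
# Nothing was validated or reshaped at collection time.
raw_lake = [
    '{"user": "a", "amount": "12.50", "ts": "2018-01-02"}',
    '{"customer": "b", "total": 7, "note": "refund?"}',
    'not even json',
]

def normalize(line):
    """Apply a schema at read time; return None for records that do not fit."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict):
        return None
    # Assumed mapping: two source systems used different field names.
    user = record.get("user") or record.get("customer")
    amount = record.get("amount", record.get("total"))
    if user is None or amount is None:
        return None
    try:
        return {"user": str(user), "amount": float(amount)}
    except (TypeError, ValueError):
        return None

def read_lake(lines):
    """Normalize every raw line and keep only the rows that fit the schema."""
    return [row for row in (normalize(line) for line in lines) if row is not None]

rows = read_lake(raw_lake)
for row in rows:
    print(row)
```

The shortcut is visible here: collection costs nothing up front, but every read pays for parsing and cleanup, and some records may never fit the schema that eventually emerges.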
In 2018, data lakes are going to be very popular. AWS sent its first email of the year, and data lakes were the subject of its first webinar. This makes sense, for data lakes require a massive amount of storage, and that means large AWS bills. The slower and cheaper Glacier style of storage used for long-term logs cannot be used for data lakes, as the data will need to be analyzed, not just stored.
With the popularity of data lakes, database analytics becomes a problem. A side effect of data lakes is that analysis cannot rely on search speed, for data lakes have terrible access times. Big data, in general, is hampered by moving data to and from the disk itself (disk IO speeds). This means that grabbing large amounts of data and analyzing it in memory becomes important.
Now, in-memory analytics is a good trend in 2018 and should not be confused with the evils of data lakes. When data is properly normalized, analytics can run before disk IO, analyzing in real time instead of polling the database.
This is an exciting aspect of big data analytics: the prospect of analyzing data independently of storing it in order to make real-time decisions. Areas that benefit from this approach, such as stock trading, will be affected first. An added benefit is that in some cases the data does not need to be stored at all, removing a significant line item from the cost of big data analytics.
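To make the idea of analyzing data as it arrives, without storing it, a little more concrete, here is a minimal Python sketch. It uses Welford's online algorithm to keep a running mean and standard deviation in memory and flags values that stray far from what has been seen so far. The class `RunningStats`, the function `flag_outliers`, the threshold, the warm-up length and the sample prices are all assumptions chosen for illustration, not a description of any real trading system.

```python
import math

class RunningStats:
    """Running mean and sample standard deviation (Welford's online algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def stddev(self):
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0

def flag_outliers(stream, threshold=3.0, warmup=5):
    """Yield values more than `threshold` standard deviations from the running mean.

    Each value is inspected once and then discarded; only three numbers
    (count, mean, sum of squared deviations) are kept in memory.
    """
    stats = RunningStats()
    for x in stream:
        if (stats.n >= warmup and stats.stddev > 0
                and abs(x - stats.mean) > threshold * stats.stddev):
            yield x
        stats.update(x)

# Hypothetical price ticks; in practice this would be a live feed.
prices = [100, 101, 99, 100, 102, 100, 150, 101]
alerts = list(flag_outliers(prices))
print(alerts)
```

Because the decision is made while the value is still in memory, nothing has to be written to disk first, and in this sketch nothing is written at all.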
The Power of Insight
Often in life, knowing the big picture is all that matters. It took more than 300 pages for Alfred North Whitehead and Bertrand Russell to prove that one plus one equals two. I feel we benefit from our ignorance of the logic as much as from its answer. And that is the point: insight is often all we are looking for in a question.
At our company, we focus on ground truth: the ability to determine, in big data, the exact elements that support the issue. This is needed when someone is trying to resolve a problem. When we first released our product, we found that most customers had a harder time simply knowing where to start. They lacked insight.
It took us less than six months to produce the analytics that provide insight on where to start, and this was a huge change in how our product was used. Insight is something that a proper big data infrastructure can produce almost immediately.
Despite years of waiting, there are still more years to come before big data truly acts like big data. As 2018 begins, a significant amount is in place to deliver end-user value from big data analytics. With companies getting past the point of merely collecting data, the real fun is just beginning. The initial results of analysis, and the new companies that bring them, are going to emerge. Marketing hype has to give way to execution and the ability to do something. This year is looking to be a year focused on results.