What’s the big deal with Big Data?

29 June 2015

IBM estimates that we create 2.5 quintillion bytes of data every day. Such large quantities present a major challenge when it comes to processing, storing and analysing data in a meaningful way.

Big Data is important to understand as an issue because the sheer volume of data being created today makes gathering and analysing information a very difficult task.

According to recent figures from Cisco’s latest Visual Networking Index released in May, IP traffic in Western Europe is forecast to reach 24.7 exabytes per month by 2019. In the UK, it predicts there will be 614.9 million networked devices in the next four years, up from 317.5 million in 2014. While this will all result in the creation of huge amounts of data, ‘Big Data’ is more specific. Gartner defines it as “high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making”.

IBM describes Big Data as a “new natural resource”. It reckons we create 2.5 quintillion bytes of data each day from a variety of sources. A quintillion is a one followed by 18 zeroes, so that works out at roughly 2.5 exabytes a day.
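
To put that figure in more familiar units, here is a quick back-of-the-envelope conversion in Python. Only the numbers quoted above are used, and the annual total simply assumes the daily rate holds all year:

```python
# Rough scale check for IBM's figure of 2.5 quintillion bytes per day.
bytes_per_day = 2.5 * 10**18                     # 2.5 quintillion = 2.5 x 10^18

exabytes_per_day = bytes_per_day / 10**18        # decimal exabytes
petabytes_per_day = bytes_per_day / 10**15       # decimal petabytes
exabytes_per_year = exabytes_per_day * 365       # assuming the daily rate holds

print(f"{exabytes_per_day:.1f} EB per day")      # 2.5 EB per day
print(f"{petabytes_per_day:,.0f} PB per day")    # 2,500 PB per day
print(f"{exabytes_per_year:,.0f} EB per year")   # roughly 913 EB per year
```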

Over the years, IBM has developed a range of cloud-based data management and analytics solutions to help organisations predict customer behaviour and outcomes with speed and ease. Steve Mills, the company’s SVP and group executive for software and systems, has previously said: “As the value of data continues to grow, the differentiator for clients will be around predicting what could happen to help transform their business with speed and conviction.” 

Jeff Veis, head of the Big Data business group at HP Software, agrees. He says companies want to improve products, personalise service, improve loyalty, cross-sell, upsell, improve operations, predict outcomes and maximise margins. “They want Big Data to help transform business into an active collaborative data-driven partner with IT, which is easier said than done. The biggest change is that business is no longer a passive consumer of data.”

With the data tsunami predicted to get bigger, one problem for network managers is where to store the deluge. 

“Managing data growth has been seen as a problem to be solved since the early days of computing,” says Molly Rector, CMO for storage vendor DataDirect Networks (DDN). “What puts us in the Big Data age now versus say, 20 years ago, is the fact that we view data growth as an opportunity, not just a problem. The ‘three Vs’ of velocity, volume and variety do a good job of describing the things that make data ‘Big’: how fast it is generated or ingested, how much of it you keep and manage, and how data varies by type and size.” 

Big Data is being created by a wide variety of networks and networked devices, such as sensors, satellites and surveillance cameras, as well as by activity such as web browsing. Rector says it all needs to be collected into central repositories. “Government institutions, universities or commercial conglomerates often host these. The more data that is available to analyse (i.e. the larger the data set), the more powerful Big Data becomes.”

She adds that during high performance functions like data ingest, generation, processing or re-processing, data too large to fit in a computer’s memory resides on external, high-performance disk systems. When the data is no longer needed for high-speed access, its owner will typically reduce the cost of storage by migrating it to lower cost archive disk or tape, or even to a cloud or off-site storage facility.
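
As a minimal sketch of the tiering policy Rector describes, the snippet below migrates data sets down the storage hierarchy according to how long they have gone unaccessed. The tier names and thresholds are invented for illustration; they are not DDN’s product behaviour.

```python
import time

# Illustrative storage tiers, fastest and most expensive first.
TIERS = ["high-performance disk", "archive disk or tape", "cloud / off-site"]

# Hypothetical idle-time thresholds before data drops a tier: 30 days, then a year.
THRESHOLDS_SECONDS = [30 * 86400, 365 * 86400]

def choose_tier(last_access_epoch, now=None):
    """Return the tier a data set should live on, given its last access time."""
    now = now if now is not None else time.time()
    idle = now - last_access_epoch
    for tier, threshold in zip(TIERS, THRESHOLDS_SECONDS):
        if idle < threshold:
            return tier
    return TIERS[-1]   # cold data ends up in the cheapest tier

# A data set last touched 90 days ago lands on archive disk or tape.
print(choose_tier(time.time() - 90 * 86400))
```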

The value of Big Data

Any organisation can benefit from Big Data if it can analyse and use it in the appropriate way. However, it is typically the large, well-resourced corporations that make the most use of it.

Rector believes any data-centric organisation must use Big Data in order to remain competitive in its field. “This includes financial services for high-frequency trading, genomic and biotech institutes to accelerate time to discovery and time to cure, manufacturing organisations for rapid design validation and simulation, and so on. 

“Using data predictively helps organisations model data-centric conclusions earlier in the design process and with significantly less hardware investment. Big Data, when used strategically, helps them predict the future with relative accuracy, greatly reducing investment risk and greatly accelerating time to market and time to revenue.”

She adds that most high-performance workflows in research, government intelligence, media, cloud and commercial environments generate, process, re-process and store Big Data, and many need to share it among multiple sites or institutions for collaboration, publication or high-availability purposes.

For example, Rector says at Oak Ridge National Laboratory in the US, multiple projects run in parallel on the country’s fastest research system, looking for practical insights into fundamental physical systems through techniques such as molecular scale modelling. 

“To deliver this kind of fidelity they rely on a 27-petaflop system that can access 40 petabytes of data at over a terabyte a second,” she says. “At some of the world’s largest banks, quantitative analysts model the effectiveness of high-frequency trading algorithms by testing them against historical market (tick) data. The more data they can test against, the more confidence they can have in the quality of the algorithm and its ability to perform competitively in a live trading environment. The faster the data can be reached, the more can be used for testing, with the most competitive institutions regularly testing algorithms against years of tick data.” 
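
A heavily simplified sketch of the kind of backtest Rector describes is shown below. The tick format, the toy trading rule and the threshold are all invented for illustration; real tick data sets run to years of history, which is exactly why fast access to large volumes matters.

```python
# Minimal backtest of a toy rule against historical tick data.
# Ticks are (timestamp, price) pairs. The "strategy" buys one unit after a
# drop of more than 0.5% between consecutive ticks and sells on the next tick.

def backtest(ticks, drop_threshold=0.005):
    pnl = 0.0
    position_price = None
    for (_, prev_price), (_, price) in zip(ticks, ticks[1:]):
        if position_price is not None:               # exit on the following tick
            pnl += price - position_price
            position_price = None
        if (prev_price - price) / prev_price > drop_threshold:
            position_price = price                   # enter after a sharp drop
    return pnl

ticks = [(0, 100.0), (1, 99.2), (2, 99.6), (3, 98.9), (4, 99.3)]
print(f"PnL over {len(ticks)} ticks: {backtest(ticks):.2f}")
```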

“A singular source of truth”

With data generation on the rise, the cost of managing data over its lifecycle also grows, leaving Big Data sites with increasing administration costs and complexity. To address this, demand is growing for end-to-end systems that can manage data automatically across high-performance workflows, archive and even cloud storage, although few solutions can actually deliver such functionality today.

Ron Lifton, senior solutions marketing manager at NetScout Systems, says there are many considerations to be made when carrying such vast quantities of data. 

“Big Data is growing in volume, variety, and velocity in a shared connected world where there is no ‘off’. Whether it is a corporate database, a network element or a web service, IT is tasked with anticipating change and reducing the time it takes to resolve service delivery issues by pinpointing the root cause. To do that, traffic data must be used as a singular source of truth, providing full visibility into Layers 2 through 7, and to gain a holistic and contextual end-to-end view of virtual, physical and hybrid environments.”
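
To make the idea of traffic data as a source of truth concrete, the toy sketch below aggregates hypothetical flow records per service. The record fields and figures are invented; a real monitoring product derives them from live packets across Layers 2 to 7.

```python
from collections import defaultdict

# Hypothetical flow records as a monitoring probe might summarise them:
# (service, transport protocol, latency in ms, error flag).
flows = [
    ("dns",  "udp", 12,  False),
    ("http", "tcp", 180, False),
    ("http", "tcp", 950, True),
    ("dhcp", "udp", 25,  False),
    ("http", "tcp", 210, False),
]

stats = defaultdict(lambda: {"count": 0, "latency_ms": 0, "errors": 0})
for service, _proto, latency, error in flows:
    s = stats[service]
    s["count"] += 1
    s["latency_ms"] += latency
    s["errors"] += int(error)

for service, s in stats.items():
    avg = s["latency_ms"] / s["count"]
    print(f"{service}: {s['count']} flows, avg {avg:.0f} ms, {s['errors']} errors")
```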

According to Lifton, this traffic-based intelligence is needed to rapidly triage user impacting problems and improve capacity and infrastructure planning for Big Data initiatives. “Through a comprehensive service assurance solution with deeper insights into network performance and resource usage, enterprises ranging from manufacturing and healthcare, to highly competitive retail banking, insurance and travel, have the confidence to operate, innovate and compete at the highest level.” 

Security is always an issue, and when so much information is being collected about so many people and objects, it is difficult to see how all of it can be protected. But as Rector points out, the volume of data does not in itself dictate how much security is needed.

“Typically, the amount of data does not dictate security considerations. For most institutions, security practices are determined by a combination of applicable regulations and policies based on best practices and the institutions’ own experiences. 

“Regulatory policies regarding data management apply in many fields – utilities, financial institutions and any publicly held company. For institutions handling personal data (this could be anything from genomics research bio-bank samples to supermarket loyalty programmes) the protection of personally identifiable data has stringent guidelines in most countries.”

But HP’s Veis believes security must be upgraded and standards raised in a Big Data environment. “Using encryption needs to become the norm. If implemented properly, security with regards to Big Data then becomes a moot point. With most large enterprises employing chief information security officers that report at the board level, standards should rise, and the risks associated with breaches lowered.” 
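
As a minimal illustration of “encryption as the norm”, the sketch below encrypts a record with the third-party Python cryptography package. The record is invented, and in practice the key would come from a key-management service rather than being generated next to the data it protects.

```python
# pip install cryptography
from cryptography.fernet import Fernet

key = Fernet.generate_key()          # in production, fetch from a KMS instead
fernet = Fernet(key)

record = b"customer_id=1042,postcode=SW1A 1AA,loyalty_points=320"
token = fernet.encrypt(record)       # ciphertext is safe to store or transmit

assert fernet.decrypt(token) == record   # only the key holder can recover it
print(token[:40])
```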

All the major security specialists have developed a number of security solutions aimed at network managers who deal with Big Data. But how to utilise them and effectively manage Big Data remains a challenge. NetScout’s Lifton says: “The connected world is evolving at warp speed, with dizzying complexity. Things can go wrong with the network, transport, servers, service enablers (like DNS and DHCP), middleware and databases. 

“IT teams can’t face the challenges of delivering Big Data services with traditional silo-specific tools because they miss the big picture. Instead, they need a comprehensive service assurance solution that uses traffic data to gain visibility into service interrelationships and dependencies.” 

Lifton says that gaining deep and real-time insight into application and network performance across both physical and virtual environments not only dramatically reduces the mean-time-to-knowledge of service issues but also assures a “flawless” Big Data user experience.

“Even Bigger Data”

With companies such as Cisco and Ericsson (in its regular Mobility Report) forecasting ever-increasing rises in traffic, how will corporate networks deal with the data storm that becomes even worse when you start factoring in Big Data? 

SevOne specialises in network performance and management. Tom Griffin, the company’s EMEA VP of systems engineering, says: “Big Data is putting an unprecedented strain on networks, which are evolving into things of previously unseen complexity and scale. 

“For the manager of these networks, the role remains the same: to monitor performance and protect against a failure that could prove catastrophic for the business. What has changed thanks to Big Data is the sheer volume of information they must contend with from across the network. Understanding traffic patterns and looking for data anomalies throughout the course of a day, week and/or month becomes much more challenging when there is so much information to hand, especially if using legacy tools.” 
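
One simple way to spot the traffic anomalies Griffin mentions is to compare each new measurement against a rolling baseline. The hourly byte counts, window size and threshold below are invented for illustration.

```python
import statistics

# Hypothetical hourly traffic volumes (GB) for one link; hour 8 is the anomaly.
hourly_gb = [40, 42, 39, 41, 43, 40, 38, 41, 120, 42, 40, 39]

WINDOW = 6       # hours of history used as the baseline
THRESHOLD = 3.0  # flag anything more than 3 standard deviations from the mean

for hour in range(WINDOW, len(hourly_gb)):
    baseline = hourly_gb[hour - WINDOW:hour]
    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline) or 1.0   # avoid division by zero
    z = (hourly_gb[hour] - mean) / stdev
    if abs(z) > THRESHOLD:
        print(f"hour {hour}: {hourly_gb[hour]} GB looks anomalous (z = {z:.1f})")
```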

One thing you can safely predict about the future of Big Data is that it will be ‘Even Bigger Data’ in the years to come. Not only will ‘conventional’ traffic continue to be generated by human beings; there will also be billions of machines connected through the Internet of Things (IoT). 

Machina Research is predicting that cars connected to the IoT will cause data jams on networks, particularly in areas near mobile phone towers where rush-hour traffic is worst. And that’s even before the inevitable appearance of driverless vehicles.

However, network managers will have a better understanding of the issues and ever more powerful tools to help them make sense of the data. 

Veis says: “In the 2000s, traditional IT was fragmented. Over the past decade it has become more integrated to the point where we are now able to perform predictive analytics today. 

“Moving forward, we are heading towards an era of prescriptive analytics. Predictive analytics helps model and forecast what might happen. However, prescriptive analytics seeks to determine the best solution or outcome among various choices, given the known parameters. Advancements in the speed of computing and the development of complex mathematical algorithms applied to the data sets are making prescriptive analysis possible. Optimisation, simulation, game theory and decision-analysis methods will become the norm by the end of 2017.”
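
The distinction Veis draws can be shown with a toy decision step: a predictive model supplies the demand forecast, and the prescriptive step simply picks the best action given the known parameters. The options and figures below are invented.

```python
# Toy prescriptive step: given a predicted demand and known unit economics,
# choose the production plan that maximises expected margin.
predicted_demand = 1200   # units, assumed to come from a predictive model
unit_price = 9.0

options = {
    "produce_1000": {"units": 1000, "unit_cost": 4.0},
    "produce_1200": {"units": 1200, "unit_cost": 4.5},
    "produce_1500": {"units": 1500, "unit_cost": 5.0},
}

def expected_margin(plan):
    sold = min(plan["units"], predicted_demand)   # cannot sell beyond demand
    return sold * unit_price - plan["units"] * plan["unit_cost"]

best = max(options, key=lambda name: expected_margin(options[name]))
print(best, expected_margin(options[best]))       # produce_1200 5400.0
```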

Veis adds that it’s also important to address social media’s role in Big Data. He says analysing semi-structured human data has been “all the rage” over the past two years, particularly the ‘sentiment’ analysis of social media. 

“However, the power of social media by itself is over-rated especially if it is analysed in isolation without linking it to machine and business data. 

“That said, exploring the Big Data potential of audio recordings, archive files, image files, RSS feeds and video extras will be a growing field with many start-ups now entering the market.”
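
As a closing illustration of Veis’s point about linking sentiment to business data rather than analysing it in isolation, the toy sketch below joins weekly sentiment scores with sales figures. All values are invented.

```python
# Sentiment only becomes actionable once joined with business data.
sentiment_by_week = {"2015-W20": 0.62, "2015-W21": 0.35, "2015-W22": 0.71}
sales_by_week     = {"2015-W20": 184000, "2015-W21": 121000, "2015-W22": 205000}

for week in sorted(sentiment_by_week):
    sentiment = sentiment_by_week[week]
    sales = sales_by_week.get(week, 0)
    flag = "investigate" if sentiment < 0.5 and sales < 150000 else "ok"
    print(f"{week}: sentiment {sentiment:.2f}, sales {sales:,} -> {flag}")
```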