Big data challenges are numerous. Big data projects have become a normal part of doing business, but that doesn't mean that big data is easy.
According to the NewVantage Partners Big Data Executive Survey 2017, 95 percent of the Fortune 1000 business leaders surveyed said that their firms had undertaken a big data project in the last five years. However, fewer than half (48.4 percent) said that their big data initiatives had achieved measurable results.
An October 2016 report from Gartner found that organizations were getting stuck at the pilot stage of their big data initiatives. “Only 15 percent of businesses reported deploying their big data project to production, effectively unchanged from last year (14 percent),” the firm said.
Clearly, organizations are facing some major challenges when it comes to implementing their big data strategies. And in fact, the IDG Enterprise 2016 Data & Analytics Research found that 90 percent of those surveyed reported running into challenges related to their big data projects.
So what are those challenges? And more importantly, what can organizations do to overcome them?
What Is Big Data?
Before we delve into the most common big data challenges, we should first define “big data.” There is no set number of gigabytes or terabytes or petabytes that separates “big data” from “average-sized data.” Data stores are constantly growing, so what seems like a lot of data right now may seem like a perfectly normal amount in a year or two. In addition, every organization is different, so the amount of data that seems challenging for a small retail store may not seem like a lot to a large financial services company.
Instead, most experts define big data in terms of the three Vs. You have big data if your data stores have the following characteristics:
- Volume: Big data is any set of data that is so large that the organization that owns it faces challenges related to storing or processing it. In reality, trends like ecommerce, mobility, social media and the Internet of Things (IoT) are generating so much information that nearly every organization probably meets this criterion.
- Velocity: If your organization is generating new data at a rapid pace and needs to respond in real time, you have the velocity associated with big data. Most organizations that are involved in ecommerce, social media or IoT satisfy this criterion for big data.
- Variety: If your data resides in many different formats, it has the variety associated with big data. For example, big data stores typically include email messages, word processing documents, images, video and presentations, as well as data that resides in structured relational database management systems (RDBMSes).
Big Data Characteristic | Description |
---|---|
Volume | Big data requires a large amount of storage space, and organizations must constantly scale their hardware and software in order to accommodate increases. |
Velocity | New data is being created quickly, and organizations need to respond in real time. |
Variety | Data resides in a variety of different formats, including text, images, video, spreadsheets and databases. |
These three characteristics cause many of the challenges that organizations encounter in their big data initiatives. Some of the most common of those big data challenges include the following:
1. Dealing with data growth
The most obvious challenge associated with big data is simply storing and analyzing all that information. In its Digital Universe report, IDC estimates that the amount of information stored in the world’s IT systems is doubling about every two years. By 2020, the total amount will be enough to fill a stack of tablets that reaches from the earth to the moon 6.6 times. And enterprises have responsibility or liability for about 85 percent of that information.
Much of that data is unstructured, meaning that it doesn’t reside in a database. Documents, photos, audio, videos and other unstructured data can be difficult to search and analyze.
It’s no surprise, then, that the IDG report found, “Managing unstructured data is growing as a challenge – rising from 31 percent in 2015 to 45 percent in 2016.”
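To put IDC's estimate that stored information doubles about every two years into concrete terms, here is a minimal sketch of the arithmetic: under that assumption, a starting volume V0 grows to V0 × 2^(t/2) after t years. The function name and the 100 TB starting point are hypothetical values chosen for illustration, not IDC figures.

```python
def projected_volume(initial_volume: float, years: float, doubling_period: float = 2.0) -> float:
    """Project data volume, assuming it doubles every `doubling_period` years."""
    return initial_volume * 2 ** (years / doubling_period)

# Hypothetical example: an organization storing 100 TB today.
for years in (2, 4, 6, 10):
    print(f"after {years:>2} years: {projected_volume(100, years):,.0f} TB")
```

Ten years at that rate is five doublings, a 32-fold increase, so the hypothetical 100 TB becomes 3,200 TB.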
In order to deal with data growth, organizations are turning to a number of different technologies. When it comes to storage, converged and hyperconverged infrastructure and software-defined storage can make it easier for companies to scale their hardware. And technologies like compression, deduplication and tiering can reduce the amount of space and the costs associated with big data storage.
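Deduplication is easiest to see in miniature. The sketch below is a simplified, hypothetical illustration of block-level deduplication, not how any particular storage product works: data is split into fixed-size chunks, each chunk is identified by its SHA-256 hash, and identical chunks are stored only once. Production systems often use variable-size (content-defined) chunking and persistent indexes instead.

```python
import hashlib

def deduplicate(data: bytes, chunk_size: int = 4096):
    """Split data into fixed-size chunks and keep one copy of each unique chunk.

    Returns (store, recipe): `store` maps chunk hash -> chunk bytes, and
    `recipe` is the ordered list of hashes needed to rebuild the data.
    """
    store: dict[str, bytes] = {}
    recipe: list[str] = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # store only the first copy of each chunk
        recipe.append(digest)
    return store, recipe

def rebuild(store: dict, recipe: list) -> bytes:
    """Reassemble the original bytes from the chunk store and the recipe."""
    return b"".join(store[digest] for digest in recipe)

# Example payload with heavily repeated content: 8 identical chunks, then 2 more.
payload = b"A" * 4096 * 8 + b"B" * 4096 * 2
store, recipe = deduplicate(payload)
stored_bytes = sum(len(chunk) for chunk in store.values())
print(len(payload), stored_bytes)  # 40960 8192
print(rebuild(store, recipe) == payload)
```

The savings depend entirely on how much repetition the data contains; highly unique data gains little from this technique.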
On the management and analysis side, enterprises are using tools like NoSQL databases, Hadoop, Spark, big data analytics software, business intelligence applications, artificial intelligence and machine learning to help them comb through their big data stores to find the insights their companies need.
2. Generating insights in a timely manner
Of course, organizations don’t just want to store their big data — they want to use that big data to achieve business goals. According to the NewVantage Partners survey, the most common goals associated with big data projects included the following:
- Decreasing expenses through operational cost efficiencies
- Establishing a data-driven culture
- Creating new avenues for innovation and disruption
- Accelerating the speed with which new capabilities and services are deployed
- Launching new product and service offerings
All of those goals can help organizations become more competitive — but only if they can extract insights from their big data and then act on those insights quickly. PwC’s Global Data and Analytics Survey 2016 found, “Everyone wants decision-making to be faster, especially in banking, insurance, and healthcare.”
To achieve that speed, some organizations are looking to a new generation of ETL and analytics tools that dramatically reduce the time it takes to generate reports. They are investing in software with real-time analytics capabilities that allow them to respond to developments in the marketplace immediately.
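Real-time analytics products differ widely, but a common underlying idea is incremental computation over a moving time window rather than re-running a batch report. The hypothetical class below sketches that idea, assuming events arrive in timestamp order: it maintains a running total of order values from the last N seconds, so an up-to-date figure is available as soon as each event arrives. The class name and the 60-second window are illustrative assumptions.

```python
from collections import deque

class SlidingWindowSum:
    """Running sum of values whose timestamps fall within the last `window` seconds.

    Assumes events are added in non-decreasing timestamp order.
    """

    def __init__(self, window: float):
        self.window = window
        self.events = deque()  # (timestamp, value) pairs in arrival order
        self.total = 0.0

    def add(self, timestamp: float, value: float) -> float:
        """Record an event and return the updated windowed total."""
        self.events.append((timestamp, value))
        self.total += value
        self._evict(timestamp)
        return self.total

    def _evict(self, now: float) -> None:
        # Drop events that have aged out of the window, adjusting the total.
        while self.events and self.events[0][0] <= now - self.window:
            _, old_value = self.events.popleft()
            self.total -= old_value

# Example: hypothetical order values (timestamp in seconds, amount) with a 60-second window.
window = SlidingWindowSum(window=60)
for ts, amount in [(0, 20.0), (30, 15.0), (59, 5.0), (61, 10.0), (130, 7.0)]:
    print(ts, window.add(ts, amount))
```

Each event is added and removed exactly once, so keeping the figure current costs a small, constant amount of work per event instead of a full recomputation.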
3. Recruiting and retaining big data talent
But in order to develop, manage and run those applications that generate insights, organizations need professionals with big data skills. That has driven up demand for big data experts — and big data salaries have increased dramatically as a result.
The 2017 Robert Half Technology Salary Guide reported that big data engineers were earning between $135,000 and $196,000 on average, while data scientist salaries ranged from $116,000 to $163,500. Even business intelligence analysts were very well paid, making $118,000 to $138,750 per year.
In order to deal with talent shortages, organizations have several options. First, many are increasing their budgets and their recruitment and retention efforts. Second, they are offering more training opportunities to their current staff members in an attempt to develop the talent they need from within. Third, many organizations are looking to technology. They are buying analytics solutions with self-service and/or machine learning capabilities. Designed to be used by professionals without a data science degree, these tools may help organizations achieve their big data goals even if they do not have a lot of big data experts on staff.
4. Integrating disparate data sources
The variety associated with big data leads to challenges in data integration. Big data comes from a lot of different places — enterprise applications, social media streams, email systems, employee-created documents, etc. Combining all that data and reconciling it so that it can be used to create reports can be incredibly difficult. Vendors offer a variety of ETL and data integration tools designed to make the process easier, but many enterprises say that they have not solved the data integration problem yet.
In response, many enterprises are turning to new technology solutions. In the IDG report, 89 percent of those surveyed said that their companies planned to invest in new big data tools in the next 12 to 18 months. When asked which kind of tools they were planning to purchase, integration technology was second on the list, behind data analytics software.
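At its simplest, the integration work described above means mapping each source's records onto a shared schema before combining them. The sketch below assumes two hypothetical sources, a CRM export and a web shop feed, that use different field names, ID casing and date formats; all field names, formats and the "first source wins" merge rule are invented for illustration and are not drawn from any specific integration tool.

```python
from datetime import datetime

# Hypothetical records from two systems that describe the same customers differently.
crm_records = [
    {"CustomerID": "C-001", "FullName": "Ada Lovelace", "Created": "2016-03-14"},
]
shop_records = [
    {"cust_id": "c-001", "name": "Ada  Lovelace", "signup": "14/03/2016"},
    {"cust_id": "c-002", "name": "Alan Turing", "signup": "01/07/2016"},
]

def from_crm(r: dict) -> dict:
    """Map a CRM record onto the shared schema."""
    return {
        "customer_id": r["CustomerID"].upper(),
        "name": " ".join(r["FullName"].split()),
        "created": datetime.strptime(r["Created"], "%Y-%m-%d").date(),
        "source": "crm",
    }

def from_shop(r: dict) -> dict:
    """Map a web shop record onto the shared schema."""
    return {
        "customer_id": r["cust_id"].upper(),      # normalize ID casing
        "name": " ".join(r["name"].split()),      # collapse stray whitespace
        "created": datetime.strptime(r["signup"], "%d/%m/%Y").date(),
        "source": "shop",
    }

def integrate(*batches):
    """Merge normalized records keyed by customer_id; the first source wins on conflicts."""
    merged = {}
    for batch in batches:
        for record in batch:
            merged.setdefault(record["customer_id"], record)
    return merged

combined = integrate(
    [from_crm(r) for r in crm_records],
    [from_shop(r) for r in shop_records],
)
for cid, rec in sorted(combined.items()):
    print(cid, rec["name"], rec["created"], rec["source"])
```

The "first source wins" rule is the crudest possible conflict policy; deciding which system to trust when records disagree is exactly the validation problem discussed next.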
5. Validating data
Closely related to the idea of data integration is the idea of data validation. Often organizations are getting similar pieces of data from different systems, and the data in those different systems doesn’t always agree. For example, the ecommerce system may show daily sales at a certain level while the enterprise resource planning (ERP) system has a slightly different number. Or a hospital’s electronic health record (EHR) system may have one address for a patient, while a partner pharmacy has a different address on record.
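A basic automated validation check compares the same metric across systems and flags disagreements. The sketch below uses made-up daily sales totals for a hypothetical ecommerce system and ERP system and reports any day that is missing from one system or where the two figures differ by more than a chosen relative tolerance; the 0.5 percent tolerance and the function name are arbitrary assumptions.

```python
def reconcile(source_a: dict, source_b: dict, tolerance: float = 0.005):
    """Compare per-day totals from two systems.

    Returns a list of (day, value_a, value_b, reason) tuples for days that are
    missing from one system or whose values differ by more than `tolerance`
    (relative to the larger of the two values).
    """
    issues = []
    for day in sorted(set(source_a) | set(source_b)):
        a, b = source_a.get(day), source_b.get(day)
        if a is None or b is None:
            issues.append((day, a, b, "missing in one system"))
            continue
        baseline = max(abs(a), abs(b))
        if baseline and abs(a - b) / baseline > tolerance:
            issues.append((day, a, b, "values disagree"))
    return issues

# Hypothetical daily sales totals (in dollars) reported by two systems.
ecommerce = {"2017-03-01": 10_250.00, "2017-03-02": 9_980.00, "2017-03-03": 11_400.00}
erp = {"2017-03-01": 10_250.00, "2017-03-02": 9_870.00}

for issue in reconcile(ecommerce, erp):
    print(issue)
```

A check like this only finds discrepancies; deciding which system holds the correct value still requires the governance policies described below.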
The process of getting those records to agree, as well as making sure the records are accurate, usable and secure, is called data governance. And in the AtScale 2016 Big Data Maturity Survey, the fastest-growing area of concern cited by respondents was data governance.
Solving data governance challenges is very complex and usually requires a combination of policy changes and technology. Organizations often set up a group of people to oversee data governance and write a set of policies and procedures. They may also invest in data management solutions designed to simplify data governance and help ensure the accuracy of big data stores, and of the insights derived from them.
6. Securing big data
Security is also a big concern for organizations with big data stores. After all, some big data stores can be attractive targets for hackers or advanced persistent threats (APTs).
However, most organizations seem to believe that their existing data security methods are sufficient for their big data needs as well. In the IDG survey, fewer than half of those surveyed (39 percent) said that they were using additional security measures for their big data repositories or analyses. Among those who do use additional measures, the most popular include identity and access control (59 percent), data encryption (52 percent) and data segregation (42 percent).
7. Organizational resistance
It is not only the technological aspects of big data that can be challenging — people can be an issue too.
In the NewVantage Partners survey, 85.5 percent of those surveyed said that their firms were committed to creating a data-driven culture, but only 37.1 percent said they had been successful with those efforts. When asked about the impediments to that culture shift, respondents pointed to three big obstacles within their organizations:
- Insufficient organizational alignment (4.6 percent)
- Lack of middle management adoption and understanding (41.0 percent)
- Business resistance or lack of understanding (41.0 percent)
In order for organizations to capitalize on the opportunities offered by big data, they are going to have to do some things differently. And that sort of change can be tremendously difficult for large organizations.
The PwC report recommended, “To improve decision-making capabilities at your company, you should continue to invest in strong leaders who understand data’s possibilities and who will challenge the business.”
One way to establish that sort of leadership is to appoint a chief data officer, a step that NewVantage Partners said 55.9 percent of Fortune 1000 companies have taken. But with or without a chief data officer, enterprises need executives, directors and managers who are going to commit to overcoming their big data challenges if they want to remain competitive in the increasingly data-driven economy.