Big Data Startups

Big Data startups are emerging quickly because Big Data itself is rapidly moving from emerging technology to mature technology. Companies that were startups five years ago, like Cloudera and Hortonworks, are now key players. Those companies, which covered the basics of Big Data, are giving way to the next generation of startups, which are more specialized and focused on vertical solutions rather than general analytics.

As Big Data moves from startup status to maturity, expect several changes in the overall market and its products. For starters, it’s going to get much faster. When Hadoop was first introduced in 2006, Big Data processing was primarily a batch job run overnight. That won’t cut it anymore. Companies demand immediate insights and want real-time processing, which they are getting thanks to Apache Spark and many commercial products.

In addition, Big Data is moving beyond Hadoop. Customers want analytics on all of their data, not just one source. Answers to their questions may come from sensor logs or machine data, which means handling both structured and unstructured data, and that calls for new kinds of databases.

Volume is also no longer king; variety of data is what counts. Companies are looking to integrate more varied data sources, from JSON to nested types in other databases to other non-flat data formats. Data sources, and the connectors that reach them, are becoming more varied and more crucial.

Big Data Market

Hype and reality don’t always match. The hype around Big Data has receded, but interest certainly has not. According to a recent report by IDC, worldwide revenues for big data and business analytics will rise 12.4% year-over-year in 2017 to $150.8 billion, while commercial purchases of related hardware, software, and services will see a compound annual growth rate (CAGR) of 11.9% through 2020.
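For readers less familiar with these growth metrics, a little arithmetic helps put the IDC figures in context. The 2016 baseline below is our own back-calculation from the 12.4% and $150.8 billion numbers, not a figure quoted in the report, and the three-year compounding example only illustrates what an 11.9% annual rate means; it is not a projection of any specific IDC total.

```latex
% Implied prior-year revenue, back-calculated from 12.4% growth to $150.8B
R_{2016} = \frac{R_{2017}}{1 + 0.124} = \frac{150.8}{1.124} \approx 134.2 \ \text{(billion USD)}

% Compound annual growth rate over n years
\mathrm{CAGR} = \left( \frac{V_{\text{end}}}{V_{\text{start}}} \right)^{1/n} - 1

% Example: 11.9% per year compounded over three years
(1 + 0.119)^{3} \approx 1.40 \quad \text{(about 40\% cumulative growth)}
```

Because CAGR compounds, the cumulative growth over several years is noticeably larger than the annual rate multiplied by the number of years.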

AngelList, which tracks angel investment in tech firms, says there are 4,701 Big Data startups with an average valuation of $4.6 million. Many of these firms won’t make it, and many will be acquired by the giants, but the numbers show that interest and energy around Big Data are not waning.

What follows is a list of some hot startups to watch for. It is by no means complete, of course, but here are a few that stand out.

Big Data Startups

MapD

MapD was named one of Gartner’s Cool Vendors of 2016 and Start-Up of the Year by the Business Intelligence Group, and it was listed in Fast Company’s 2016 Innovation by Design Awards. It uses GPU-based cloud services to query and visualize multi-billion-record datasets in milliseconds. The company claims it can render visualizations up to 100 times faster than CPU-based visualization applications, thanks to the GPU’s parallel processing capabilities.

Treasure Data

Treasure Data is a cloud-based data management platform designed to draw on data from multiple sources to create real-time insights. The service includes data collection and ingestion software, highly scalable data analytics, backend storage, and an analytics front end. It helps customers collect, store, and analyze data from sources such as the Web, applications, mobile devices, and sensors.

Dataiku

This French startup’s stated objective is a data science platform that offers visual tools for building workflows when productivity is at a premium, or what it calls notebooks when speed is needed. Its main product is Dataiku Data Science Studio, an advanced analytics software solution that connects to more than 25 different data storage systems and uses a variety of languages to integrate all of the data sources.

Couchbase

Couchbase develops a specialized document-oriented NoSQL database for building interactive applications. Its Engagement Database is designed for constantly changing customer experiences that require dynamic results. Couchbase Server provides scalable key-value and JSON document storage, while Couchbase Mobile serves mobile users.

Inbenta

Inbenta provides cloud-based semantic search technology that uses natural language processing and artificial intelligence to improve online customer service. It helps businesses increase the efficiency of their customer service, e-commerce, FAQ, and social media platforms.

BigML

BigML offers what it describes as Machine Learning as a Service (MLaaS), a SaaS-style offering that connects to cloud data sources such as Amazon S3, Microsoft Azure, Google Cloud Storage, Google Drive, and Dropbox to provide simple machine learning services. All of its services are available through the Web, where users can point and click through a gallery of free, publicly accessible datasets and models, organized into categories, along with data analysis and visualization tools to experiment with.

Striim

Striim is a streaming analytics startup that provides companies with end-to-end, real-time data integration. It processes large amounts of data from sources such as enterprise databases, IoT sensors, and log files for analysis in real time, while maintaining secure connections with the remote IoT devices.

Cognonto

Cognonto is a knowledge-based artificial intelligence (KBAI) company that develops the Cognonto Platform and KBpedia, a computable knowledge structure designed to automate much of the effort needed for machine learning. KBpedia combines six large-scale knowledge bases (Wikipedia, Wikidata, GeoNames, OpenCyc, DBpedia, and UMBEL) into a single structure designed to support artificial intelligence (AI) within enterprises. For customers looking to integrate their own data sources, the on-premises Cognonto Platform maps and integrates enterprise content to tailor the machine learning to that data.

Maana

Maana’s Enterprise Knowledge Graph lets data analysts, business analysts, data scientists, and enterprise architects collaborate in a single, integrated system. It uses machine learning and other technologies to bring together siloed data for better decision-making. This enables high-speed data mining and machine learning across data silos and data types, with the findings organized into a knowledge graph.

VoxelCloud

VoxelCloud provides automated medical image analysis services and diagnosis assistance based on AI, deep learning, and cloud computing technologies. It serves hospitals and other medical centers, and currently covers lung cancer, retinal diseases, and coronary heart disease. The company says its automated image analysis and clinical decision support services offer more accurate, efficient, and accessible personalized medical image analysis than human analysis alone.

AtScale

AtScale removes the middleman from data processing by analyzing data where it is stored rather than loading it into a separate processing engine, application, or database. It also lets data analysts use their BI application of choice in the same way, leaving the data where it resides instead of loading it into a data warehouse.

Algorithmia

Just as there are marketplaces for APIs, object frameworks, application frameworks and components, and Java code snippets, Algorithmia offers an online marketplace for Big Data algorithms developed by academics and programmers, which software developers can purchase for their own use. The company has more than 3,500 algorithms in its library.

Wavefront

Wavefront from VMware offers a real-time analytics platform that monitors an IT department’s systems and looks for potential problems, providing alerts and diagnostics before a system failure can cause serious disruption.
