Dark Data: Latent Analytics Insights That Enterprises Are Failing to Realize
Dark data is unstructured, untagged, and untapped data that has not yet been analysed or processed. The term, however, reflects a user-centric point of view: where "structured data" describes the structural qualities of the data, "dark data" describes its visibility. It is data that is collected but never surfaces in analytics.
For many businesses, managing the sheer volume of dark data can be overwhelming and time consuming. Dark data is collected and stored as part of typical business activities, but it is not used for anything other than compliance and retention purposes.
Dark data matters for two reasons. First, it costs money to capture and manage, and it often necessitates capacity upgrades for premium data warehouses. Second, it can hold latent analytics insights that enterprises are failing to realize.
Here’s how enterprises can begin to cut those costs and unlock those insights.
- Analyze more data. The value of a data point often boils down to its correlation with other data points. Structured data, such as sales and financial records, typically resides in data warehouses, where its data points are easily correlated for insights that help decision makers understand their financial standing. Unstructured and semi-structured data, by contrast, often sits dark because it is not as easy to correlate and analyze.
- Reconsider data storage architectures with an eye toward cost savings. Not all data holds immediate value. Old customer records and operational reports often gather dust, yet they still consume space in premium data warehouses in order to satisfy regulatory retention requirements.
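The "analyze more data" point above turns on correlation: a data point gains value when joined with others. As a toy illustration, the sketch below (all field names and figures are invented for the example) correlates sales transactions with customer records to surface revenue by region, the kind of insight an uncorrelated, dark dataset cannot provide.

```python
# Hypothetical illustration: correlating two structured datasets.
# Field names, IDs, and amounts are made up for this sketch.
sales = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 2, "amount": 75.5},
    {"customer_id": 1, "amount": 30.0},
]
customers = {1: {"region": "EMEA"}, 2: {"region": "APAC"}}

# Join each sale to its customer's region and aggregate revenue.
revenue_by_region = {}
for sale in sales:
    region = customers[sale["customer_id"]]["region"]
    revenue_by_region[region] = revenue_by_region.get(region, 0.0) + sale["amount"]

print(revenue_by_region)  # {'EMEA': 150.0, 'APAC': 75.5}
```

Neither dataset alone answers "which region drives revenue?"; the join does, which is why data that cannot be correlated tends to stay dark.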
We find that enterprises can best reduce the amount and cost of dark data by adopting three basic best practices.
- Automate. IT organizations can lose valuable time and energy to manual, error-prone ETL processes. Replacing this drudgery with intuitive, automated software enables IT to deliver more analytics-ready data to the business faster. Zurich Insurance, for example, has used data warehouse automation solutions to reduce ETL coding time from 45 days to two, and to accelerate EDW updates from twice annually to a monthly pace. As a result, the company has freed up resources for analytics and has lit up more of its dark data.
- Try new technologies and platforms. Apache Spark and Apache Kafka are just two emerging technologies for analyzing and acting upon data streams in real time. Kafka, for example, can stream real-time transaction updates from customer databases to big data platforms such as Hadoop, where those transactions can be correlated with individual smartphones and physical store sensors to make location-based retail offers to repeat customers. Without the Kafka real-time feed, that transaction update might have become dark data. Instead, it creates a cross-selling opportunity.
- Track data usage. Enterprises across industries can realize significant savings by identifying unused tables and databases in their data warehouses and rebalancing them to economical platforms such as Hadoop or the cloud. This frees up premium data warehouse resources, improves query performance, and postpones costly hardware upgrades.
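The Kafka scenario above boils down to a simple correlation step: match recent transactions against customers currently detected in-store, and emit an offer on a match. The sketch below is a minimal, self-contained simulation of that step; the customer IDs, items, and offer wording are invented, and in production the loop would consume from a real Kafka topic (for example via a Kafka client library) rather than an in-memory list.

```python
# Sketch of the correlation step described above. The Kafka feed is
# replaced by in-memory lists so the example runs standalone.
transactions = [  # simulated messages from a "transactions" topic
    {"customer_id": "c1", "item": "running shoes"},
    {"customer_id": "c2", "item": "coffee maker"},
]
# Simulated store-sensor feed: customers currently detected in-store.
in_store = {"c1"}

def make_offer(txn):
    """Emit a location-based offer when a recent buyer is detected in-store."""
    return f"Offer for {txn['customer_id']}: accessory for {txn['item']}"

offers = [make_offer(t) for t in transactions if t["customer_id"] in in_store]
print(offers)  # ["Offer for c1: accessory for running shoes"]
```

Without the real-time feed, the transaction for customer c1 would simply land in storage unexamined; with it, the same record triggers a cross-selling opportunity while the customer is still in the store.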
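The usage-tracking practice above can be as simple as recording when each warehouse table was last queried and flagging long-idle tables for offload. The sketch below assumes a hypothetical last-queried log; the table names and the 180-day staleness threshold are illustrative choices, not a prescription.

```python
from datetime import date, timedelta

# Hypothetical usage log: last-queried date per warehouse table.
# Names, dates, and the 180-day cutoff are invented for this sketch.
last_queried = {
    "sales_2024": date(2025, 6, 1),
    "legacy_orders": date(2023, 1, 15),
    "clickstream_raw": date(2022, 11, 3),
}
today = date(2025, 6, 30)
stale_cutoff = today - timedelta(days=180)

# Tables untouched for 180+ days are candidates to offload to
# cheaper platforms such as Hadoop or cloud object storage.
offload_candidates = sorted(t for t, d in last_queried.items() if d < stale_cutoff)
print(offload_candidates)  # ['clickstream_raw', 'legacy_orders']
```

Running this kind of report regularly is what lets a team rebalance cold tables off the premium warehouse before the next hardware upgrade becomes unavoidable.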