Dark Data

Dark data is a problem that is both new and old: analysts and statisticians have known about it for years. The term refers to all the data and metadata that are never properly gathered, structured and analyzed. This is a constant waste of potentially valuable information that businesses let slip away.
16 July 2018

BBVA API Market

Taking advantage of dark data is complicated. The first step is to identify which stored data a business is not analyzing. The second is to estimate the potential value of that data before embarking on the development work needed to extract it.
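The first step can be pictured as a set difference. Compare an inventory of what is stored against a record of what analytics jobs actually read. Anything stored but never read is a candidate for dark data. The following is a minimal sketch under stated assumptions: the `Dataset` type, its `size_gb` field, the access-log format and the largest-first ordering are all hypothetical. The ordering is only a crude way to pick where to start estimating value, not a recommended methodology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    name: str
    size_gb: float  # hypothetical field: storage used by the dataset


def find_dark_datasets(inventory, access_log):
    """Return datasets that are stored but never read by any analytics job.

    inventory: list of Dataset objects the business keeps (hypothetical).
    access_log: list of (job_name, dataset_name) pairs (hypothetical format).
    """
    analyzed = {name for _job, name in access_log}
    dark = [d for d in inventory if d.name not in analyzed]
    # Largest first: an assumed, crude triage order for estimating value.
    return sorted(dark, key=lambda d: d.size_gb, reverse=True)


inventory = [
    Dataset("sales", 10.0),
    Dataset("call_recordings", 500.0),
    Dataset("web_logs", 80.0),
]
access_log = [("monthly_report", "sales")]
print([d.name for d in find_dark_datasets(inventory, access_log)])
# ['call_recordings', 'web_logs']
```

In practice the access log would come from query logs or a data catalog, and estimating potential value would take far more than dataset size.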

Developing our own utilities that fit our needs perfectly can be excessively labor-intensive. This is especially true if we cannot foresee the final value we will obtain, whether as immediate revenue or as added value for other parts of the business. Fortunately, there are many tools and APIs for working with and exploring this mass of data.

IBM OpenWhisk

A clear example of dark data is the content of the videos hosted on many platforms. Analysis usually focuses on the metadata surrounding a video, such as its title, date, duration, or tags generated or applied by humans.

With OpenWhisk, IBM's serverless computing platform, you can analyze the content within each scene of a video. The video is split into individual frames, and the frames are analyzed in parallel to identify what happens in each one: who appears, what text is shown, what is depicted, what objects can be seen, and so on.
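The parallel, per-frame analysis described above can be sketched in plain Python. In IBM's setup the per-frame work is handled by OpenWhisk actions that call image-recognition services. In this sketch a thread pool stands in for those actions. `detect` is a hypothetical callable that takes a frame timestamp and returns a list of labels. Frame decoding is left out entirely, so this shows only the fan-out pattern, under those assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def sample_timestamps(duration_s: float, interval_s: float) -> List[float]:
    """Timestamps (in seconds) at which one frame would be grabbed."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    count = int(duration_s // interval_s) + 1
    return [i * interval_s for i in range(count)]


def analyze_video(duration_s: float, interval_s: float,
                  detect: Callable[[float], List[str]], workers: int = 4):
    """Run the hypothetical `detect` callable on every sampled frame in parallel."""
    timestamps = sample_timestamps(duration_s, interval_s)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, so they line up with timestamps.
        labels = list(pool.map(detect, timestamps))
    return [{"t": t, "labels": lab} for t, lab in zip(timestamps, labels)]


# Stub detector standing in for a real image-recognition service.
def fake_detect(t: float) -> List[str]:
    return ["car"] if t >= 4 else ["person"]


frames = analyze_video(10, 2, fake_detect)
```

Because each frame is independent, the work parallelizes naturally. That independence is what makes a function-per-frame serverless design a good fit.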

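IBM calls this approach Dark Vision. Once data describing each scene has been obtained, the range of possible improvements and uses grows dramatically. For example, a platform could search its whole catalogue for the scenes in which a particular object or person appears.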
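A small example shows why per-scene data is so useful. Once every frame carries labels, an inverted index from each label to its timestamps makes a video searchable by content. The frame records below use a hypothetical `{"t": ..., "labels": [...]}` shape and are made-up sample data, not output from Dark Vision.

```python
from collections import defaultdict
from typing import Dict, List


def index_by_label(frames: List[dict]) -> Dict[str, List[float]]:
    """Map each detected label to the sorted timestamps where it appears."""
    index: Dict[str, List[float]] = defaultdict(list)
    for frame in frames:
        for label in frame["labels"]:
            index[label].append(frame["t"])
    return {label: sorted(ts) for label, ts in index.items()}


frames = [
    {"t": 0.0, "labels": ["person"]},
    {"t": 2.0, "labels": ["person", "car"]},
    {"t": 4.0, "labels": ["car"]},
]
print(index_by_label(frames)["car"])  # [2.0, 4.0]
```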

Stanford’s DeepDive

Researchers at Stanford University in California created DeepDive, another system for extracting data in a structured way. Its main advantage is that it produces SQL tables populated with data extracted from documents. Several universities and research groups have used the platform to categorize completely disorganized corpora, with surprising results.
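The idea of "documents in, SQL rows out" can be illustrated with a deliberately naive extractor. DeepDive learns its extractions statistically. The hand-written regular expression below is a swapped-in stand-in used only to show the shape of the result. The `acquisitions` table, its columns and the "X acquired Y" pattern are hypothetical, and the sample documents are invented.

```python
import re
import sqlite3

# Hypothetical pattern: a run of capitalized words, "acquired", another run.
ACQUISITION = re.compile(
    r"\b([A-Z][\w&]*(?: [A-Z][\w&]*)*) acquired ([A-Z][\w&]*(?: [A-Z][\w&]*)*)"
)


def load_acquisitions(docs):
    """Extract (buyer, target) pairs from free text into an SQL table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE acquisitions (doc_id INTEGER, buyer TEXT, target TEXT)")
    for doc_id, text in enumerate(docs):
        for buyer, target in ACQUISITION.findall(text):
            conn.execute(
                "INSERT INTO acquisitions VALUES (?, ?, ?)", (doc_id, buyer, target)
            )
    conn.commit()
    return conn


docs = [
    "Last year Acme Corp acquired Widget Co for an undisclosed sum.",
    "No deals were announced.",
]
conn = load_acquisitions(docs)
print(conn.execute("SELECT buyer, target FROM acquisitions").fetchall())
# [('Acme Corp', 'Widget Co')]
```

Once the facts sit in a table, ordinary SQL queries can aggregate or join them, which was impossible while they were buried in prose.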

DeepDive represents a qualitative leap compared with platforms and software that rely on first identifying the data manually. It automates much of the process with machine learning. The analysis team defines the objectives to be achieved rather than programming concrete, specific tasks. Once those objectives are set, the system carries out the analysis and extraction.
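One way to picture "defining objectives instead of tasks" is distant supervision. Training labels come from a small knowledge base of known facts rather than from hand-written extraction rules, and a model learns which textual features signal the relation. DeepDive supports this style of supervision and performs probabilistic inference over a factor graph. The sketch below substitutes a tiny logistic regression for that machinery. The "married" relation, the between-words feature, and the sample sentences are all hypothetical.

```python
import math


def between_words(sentence, e1, e2):
    """Features: the lowercased words between two entity mentions."""
    tokens = sentence.split()
    i, j = tokens.index(e1), tokens.index(e2)
    lo, hi = min(i, j), max(i, j)
    return [f"between={w.lower()}" for w in tokens[lo + 1:hi]]


def probability(feats, weights, bias):
    """Logistic model: sigmoid of the bias plus the weights of active features."""
    z = bias + sum(weights.get(f, 0.0) for f in feats)
    return 1.0 / (1.0 + math.exp(-z))


def train(candidates, known_true, known_false, epochs=200, lr=0.5):
    """Label candidates from a (hypothetical) knowledge base, then fit weights
    by stochastic gradient ascent on the log-likelihood."""
    labeled = []
    for sentence, e1, e2 in candidates:
        if (e1, e2) in known_true:
            labeled.append((between_words(sentence, e1, e2), 1.0))
        elif (e1, e2) in known_false:
            labeled.append((between_words(sentence, e1, e2), 0.0))
        # Pairs in neither set stay unlabeled and are not used for training.
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for feats, label in labeled:
            error = label - probability(feats, weights, bias)
            bias += lr * error
            for f in feats:
                weights[f] = weights.get(f, 0.0) + lr * error
    return weights, bias


candidates = [
    ("Anna married Ben", "Anna", "Ben"),
    ("Chris married Dana", "Chris", "Dana"),
    ("Eve met Frank", "Eve", "Frank"),
    ("Gina met Hal", "Gina", "Hal"),
]
weights, bias = train(
    candidates,
    known_true={("Anna", "Ben"), ("Chris", "Dana")},
    known_false={("Eve", "Frank"), ("Gina", "Hal")},
)
```

The analyst only supplies the knowledge base and the feature definition. The weights that decide new cases are learned rather than programmed.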

DeepDive's developers designed it to tolerate inaccuracies and to interpret ambiguous data. For example, it can recognize that two terms refer to the same thing even when one of them is misspelled.
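Tolerance to misspellings can be illustrated with approximate string matching from Python's standard library. This is not how DeepDive resolves entities; it handles ambiguity probabilistically. `difflib.get_close_matches` is simply a readily available stand-in. The `canonicalize` helper, its vocabulary and the 0.8 cutoff are hypothetical choices for the example.

```python
from difflib import get_close_matches
from typing import List, Optional


def canonicalize(term: str, vocabulary: List[str], cutoff: float = 0.8) -> Optional[str]:
    """Map a possibly misspelled term to the closest known term, or None."""
    # Compare case-insensitively, but return the vocabulary's original spelling.
    by_lower = {v.lower(): v for v in vocabulary}
    matches = get_close_matches(term.lower(), list(by_lower), n=1, cutoff=cutoff)
    return by_lower[matches[0]] if matches else None


print(canonicalize("Stanfrod", ["Stanford", "Berkeley"]))  # Stanford
print(canonicalize("Oxford", ["Stanford", "Berkeley"]))    # None
```

A higher cutoff reduces false merges but lets more genuine typos through. The right value depends on the data.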

Background

Dark data experts say the first step is “restoring the context”: analyzing each piece of data by reconstructing the situation it was in before it was stored. These techniques can greatly improve the chances that later analysis will succeed.
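One concrete, and admittedly narrow, reading of “restoring the context” is an as-of join. Each stored record is paired with whatever context was in force when it was created, such as the app version or the campaign that was live at that moment. The sketch below assumes that interpretation. The record and context-event formats are hypothetical.

```python
from bisect import bisect_right


def restore_context(records, context_events):
    """Attach to each record the latest context event at or before its timestamp.

    records: list of (timestamp, payload) stored without context (hypothetical).
    context_events: list of (timestamp, context) sorted by timestamp (hypothetical).
    """
    times = [t for t, _ in context_events]
    enriched = []
    for t, payload in records:
        i = bisect_right(times, t) - 1  # index of last event with time <= t
        context = context_events[i][1] if i >= 0 else None
        enriched.append((t, payload, context))
    return enriched


print(restore_context([(50, "click"), (100, "purchase"), (-5, "error")],
                      [(0, "v1"), (100, "v2")]))
# [(50, 'click', 'v1'), (100, 'purchase', 'v2'), (-5, 'error', None)]
```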

Every business is different: the dark data generated by a bank is very different from that of a law firm, a social network or an e-commerce site. “Lighting up” dark data poses many technical challenges. Solutions range from applying a better methodology to existing development work to hiring a dedicated specialist team when the hidden value is expected to be very large.

Ideally, data should be structured from the moment it is gathered, which prevents it from becoming dark data through technical negligence. If the technical resources are in place, no data should be written off as lost once it has been stored.
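Keeping data structured from the moment it is gathered can be as simple as parsing every incoming row into a typed record at ingestion. Rows that do not fit are set aside with the reason they failed, rather than stored as opaque blobs. The following is a minimal sketch. The `Transaction` schema and its field names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Transaction:
    # Hypothetical schema, for illustration only.
    account_id: str
    amount: float
    booked_on: date


def ingest(raw_rows):
    """Parse rows into Transaction objects at the moment they are gathered.

    Rows that do not fit the schema go to a quarantine list together with
    the reason, so they can be fixed instead of silently becoming dark data.
    """
    accepted, quarantined = [], []
    for row in raw_rows:
        try:
            accepted.append(Transaction(
                account_id=str(row["account_id"]),
                amount=float(row["amount"]),
                booked_on=date.fromisoformat(row["booked_on"]),
            ))
        except (KeyError, TypeError, ValueError) as exc:
            quarantined.append((row, repr(exc)))
    return accepted, quarantined


rows = [
    {"account_id": "A1", "amount": "12.50", "booked_on": "2018-07-16"},
    {"account_id": "A2", "amount": "n/a", "booked_on": "2018-07-16"},
    {"amount": "3"},
]
accepted, quarantined = ingest(rows)
```

Here the first row becomes a `Transaction`. The second fails on the amount and the third on the missing `account_id`, and both are kept in quarantine rather than lost.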