Dark Data - Light It Up, Throw It Out, or Put It to Work?

Data management and cybersecurity personnel are going to have to crank the handle somewhat over the next few years. Initial imaginings of data lakes now seem positively provincial, as a massive and growing volume of data is shed by humanity for collection each day. Facebook users, for example, publish around 2.5 million snippets every minute; in that same minute, some 350,000 tweets are recorded and nearly a quarter of a billion text messages are dispatched.

Initial enterprise solutions of a few years back had the marketing pitch of making all that data manageable. The data lake would make everything available, managing sorted and unsorted inputs. Heading into 2020, however, the management of data lakes is proving problematic. Not only is data growing exponentially – and a larger and larger percentage is becoming unused as this happens – but company employees’ approach to and handling of data is poking holes in all that seamless and secure intel. Furthermore, dark data is a giant pile of messy data that’s prone to be left lying around in unsanctioned ways.

New thinking on dark data that ties in far better with human logic is emerging. If one can appreciate that humanity’s aggregated data numbers some 6 zettabytes (a zettabyte is a trillion gigabytes) and that it doubles every year or so, it’s easy to see how managing the anticipated 44 zettabytes to be captured in 2020 and 2021 demands that failures in today’s systems be improved upon, and quickly. Protocols need to regain their importance, while greater effort should be expended on ensuring that data makes profitable sense.
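To put those growth figures in perspective, here is a minimal sketch of the doubling arithmetic in Python. It takes the article's numbers at face value (about 6 zettabytes, doubling roughly once a year). The function names and the fixed one-year doubling period are assumptions made purely for illustration, not a forecasting model.

```python
import math


def projected_volume(current_zb, years, doubling_years=1.0):
    """Volume after `years`, assuming it doubles every `doubling_years`."""
    return current_zb * 2 ** (years / doubling_years)


def years_to_reach(current_zb, target_zb, doubling_years=1.0):
    """Years needed to grow from current_zb to target_zb under steady doubling."""
    return doubling_years * math.log2(target_zb / current_zb)


# Figures quoted in the article: 6 ZB today, 44 ZB anticipated.
print(f"Years from 6 ZB to 44 ZB: {years_to_reach(6, 44):.1f}")
print(f"Volume after 3 doublings: {projected_volume(6, 3):.0f} ZB")
```

Under these assumptions, getting from 6 to 44 zettabytes takes a little under three doublings, which is why the window for fixing today's systems is so short.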

Where does dark data come from?

While pooling data is still valid, what data lakes (which can accommodate sorted and unsorted data) failed to factor in was the human element. Company employees – still mostly human in 2020 – often “parcel off” data from the lake or communal pool, mostly because people need to feel as though they’re in control. It’s not quite as simple as that, but almost. People manifest “app loyalty,” which is very much the same as brand loyalty, just the tech version.

Taking ownership of a function or project is encouraged, but when that responsible and wholly understandable behaviour extends to data management, security issues can arise. For example, rather than returning sorted data to the lake, users often move on and leave data in Dropbox, when in fact company security protocols might have dictated OneDrive. That’s a single oversight, but when compounded by dozens of employees over hundreds of days in a working year, it’s easy to see how things become an exposed or stagnant mess very quickly – dark data.

Secondly, while pooling data for subsequent analysis, feedback and monetisation is a great ploy, data lakes have very often become intimidating, murky swamps rather than the vibrant, clear waters imagined. Businesses have failed to appreciate the value of big data (or to begin its tasks), as its extrapolation and monetisation almost present as a separate enterprise, divorced from most companies’ core focus. For most companies entering 2020, the result is still a giant planet of data, floating just out of reach, that no one seems keen to tackle.

There’s a persistent notion that says collecting data is 99 per cent of the battle won, whereas collected but unsorted and unused data is actually worthless. It’s dark. Ninety-nine per cent of data’s value comes from sorting and acting upon discernible patterns.

Between individual users in a company who might “prefer” their apps or methods of doing things – a phenomenon more readily enabled the higher up the ladder one goes – and deadweight data that simply accumulates to no profitable end, something has to change.

Dark data needs to come home

IT firms like Computers In The City often face the difficult task of tracing a data breach to ostensibly innocent behaviour. Employing the apps and protocols that make one feel most comfortable is only human, but it can’t coexist with the security demands of the modern enterprise.

Hope for better management is in sight, and it’s not human. One area of potentially massive application for AI will lie in its ability to process and sort – and ultimately interpret – the tsunami of data that enterprises find themselves swimming in. While stricter protocols and policing will still be needed to ensure employees operate within secure confines, they will perhaps also find wading into the lake much less intimidating if AI can sort it and present it as a welcoming space. That’s a good place to start.

Business hasn’t kept pace with the need to sort its accrued data, in spite of the fact that doing so is demonstrably profitable. New business directions – and new income streams – sit in the ether waiting to be discovered. Even if a company were to stick strictly within its current field of operation, the proactive analysis of total data grants exquisite levels of tweaking, effectiveness and efficiency. Marketers have never had it so good, but they could have it even better yet.

Data sorting is still seen as secondary, as is its funding in the typical enterprise. Hopefully, developments in AI will address this aspect of dark data going forward, but an accompanying managerial mindset needs to propel it.

Coupled with the human desire of employees to have unorthodox access to data as they want it comes the second (human) aspect of centralised data pooling that wasn’t factored in: people seldom return what they take. They leave files or folders somewhere, they shed devices that still contain data, they resign and move on. Priorities change, and an unorthodox deposit – a possibly sensitive component of the company’s IP – is left out in an unsecured environment.

AI might prove to be a carrot and a stick

The more employees there are in a business, the more loose ends accumulate, until security breaches or data leaks have a myriad of start points that cannot be policed or, often, even determined with any degree of accuracy by subsequent investigation. The AI promise can likely address the issue of dark data. Although users will still tend to want to employ “their” apps for work tasks – something that must be addressed – at least AI can make the company’s data lake far more welcoming to enter.

AI can certainly free people from the endless task of sorting data appropriately, as well as depict patterns or trends, and even project forward directions that could prove profitable, especially as the IoT looms. It can also, perhaps, find another role in the necessary policing of an enterprise’s intellectual property. It can track access and monitor subsequent handling of data, follow the movement of that data, and signal its necessary deletion or return.
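The tracking and signalling described above can be pictured with a very small sketch. This one is deliberately rule-based rather than AI, and everything in it is hypothetical: the `DataCopy` record, the `SANCTIONED_STORES` list, the 90-day idle threshold and the action labels are assumptions for illustration, not any product's API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical list of storage locations the company has approved.
SANCTIONED_STORES = {"data-lake", "onedrive"}


@dataclass
class DataCopy:
    name: str
    store: str               # where the copy currently lives
    owner_active: bool       # False once the owner has left the company
    last_accessed: datetime


def review_copies(copies, now, max_idle=timedelta(days=90)):
    """Return (name, action) pairs for copies that need attention."""
    flagged = []
    for copy in copies:
        if copy.store.lower() not in SANCTIONED_STORES:
            # Data parked outside approved storage should come home.
            flagged.append((copy.name, "return"))
        elif not copy.owner_active or now - copy.last_accessed > max_idle:
            # Orphaned or long-idle data is a candidate for deletion.
            flagged.append((copy.name, "review for deletion"))
    return flagged


now = datetime(2020, 1, 15)
copies = [
    DataCopy("q3-sales.csv", "Dropbox", True, datetime(2020, 1, 10)),
    DataCopy("old-leads.xlsx", "OneDrive", False, datetime(2019, 3, 1)),
    DataCopy("pricing.parquet", "data-lake", True, datetime(2020, 1, 14)),
]
for name, action in review_copies(copies, now):
    print(f"{name}: {action}")
```

A real system would draw these records from access logs and would likely learn what counts as stale or risky rather than rely on fixed rules, but the outcome is the same: every stray copy ends up with a clear instruction to return it or delete it.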

Knowing that to err is human, we’ll need to apply both stricter and novel approaches to handling enterprise data going into the new decade. Dark data is a decided liability and unnecessary evil. It needs to be curtailed both by stricter access protocols (especially heading towards a world with tens of billions of connected devices) and by companies actually taking the time – and making funds available – for its sorting and application.
