The Rise of Unstructured Data

17 Feb

IBM’s website features an interesting statistic: 80% of new data is unstructured. This means that new data is largely in the form of blogs, tweets, white papers and articles, amongst many other types. These vast amounts of unstructured data could have, and are already having, profound effects on the way people and organisations view and analyse “soft data”.

There has long been a tendency to value structured data in the form of numbers and statistics when making plans and decisions. This is understandable, as assigning numeric values makes data:

Easier to collect; for example, multiple choice questions, or rating a feeling from 1 to 5, make data far easier to gather and analyse on a mass scale

Easier to store; software packages like SPSS and Microsoft Excel are capable of storing and categorising relatively large amounts of statistical data

Easier to retrieve; related to the above, tests and searches make statistical data relatively easy to retrieve from storage

This is in addition to the rigour statistical analysis can bring to many domains, enabling trends, patterns and relationships to be discovered over large sample sizes and over decades. In contrast, qualitative analysis has traditionally had to rely on far smaller samples, greater depth, and inherently more subjectivity in data analysis.

Now, in 2016, we have colossal sample sizes of unstructured data. This data contains the experiences, opinions, insights and expertise of millions of teams and individuals as they use products and services, solve problems and apply expertise. It has huge value; you simply have to collect it, store it, be able to retrieve it, and then make sense of it.

If you can achieve the above, you may be able to find solutions to a problem you were ready to throw millions at. Or you might find valuable lessons from a company that has implemented a strategy similar to the one you are currently designing, and discover risks and potential problems which hadn’t even occurred to you.

Cognitive computing systems, of which IBM’s Watson is an example, provide potential mechanisms for collecting, storing and retrieving unstructured data, and for gaining insight from it. IBM’s website explains that teaching Watson a domain (an industry, area of expertise, profession etc.) involves human, field-specific experts “curating” the available data before it is ingested into Watson. Watson is then taught the language, jargon, methods and sense making of that domain so it can retrieve and answer “in context” queries. A user can then gain insight from both anecdotal data and more technical structured data.

General statistical information can be supported with a range of unstructured data, which could be in narrative form. A product that has generally good reviews (the global perspective) may not get the same results in a particular context. Data which represents “the particular context” only becomes visible by searching through anecdotes, narratives and blogs on the subject (the local view). For example, an item on Amazon may have hundreds of five-star reviews. But if you investigate the one-star reviews, you might find stories and anecdotes from disappointed customers who all used the item for exactly the purpose you have in mind. The anecdotes help you adjust the global data for your specific need.

Tetlock and Gardner (2015), in their recent study on forecasting, argue that judgement of future events improves when a forecaster uses both the “outside view” (the global statistics) and the “inside view” (the local context). This allows expectations to be adjusted for discrete local situations. The above Amazon example is a crude illustration of this.
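The Amazon example can be sketched in a few lines of Python. This is only a rough illustration, not how Watson or any real review system works: the review data, the function names (outside_view, inside_view) and the simple keyword match are all assumptions made up for this sketch.

```python
from statistics import mean

# Hypothetical review data, invented for illustration: (star rating, review text).
reviews = [
    (5, "Great tent for summer festivals"),
    (5, "Easy to pitch in the garden"),
    (5, "Kids love it for sleepovers"),
    (4, "Good value for a weekend away"),
    (1, "Leaked badly during winter camping in the hills"),
    (1, "Poles snapped in strong wind while winter camping"),
]


def outside_view(reviews):
    """Global statistic: the mean star rating across every review."""
    return mean(stars for stars, _ in reviews)


def inside_view(reviews, context):
    """Local context: only the reviews whose text mentions the intended use."""
    needle = context.lower()
    return [(stars, text) for stars, text in reviews if needle in text.lower()]


global_mean = outside_view(reviews)
local_reviews = inside_view(reviews, "winter camping")
# Guard against an empty match, where there is no local evidence to average.
local_mean = mean(stars for stars, _ in local_reviews) if local_reviews else None

print("Outside view (all reviews):", global_mean)
print("Inside view (winter camping only):", local_mean)
for stars, text in local_reviews:
    print(f"  {stars} star: {text}")
```

The overall rating looks respectable, but the handful of reviews that match the buyer's own context tell a different story. A real system would need far better matching than a substring search, which is where the language and jargon of a domain come in.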

Analytics that collect, store and retrieve unstructured data to provide insight and improve judgement are an exciting development. The ability to use structured and unstructured data simultaneously in decision making potentially provides the user with both the “inside” and the “outside” view. This brings together decision and consequence, planning and experience, and strategy and ground truth. It also reveals and shares learning, expertise and problem solving, potentially reducing the burden on centralised management, as frontline workers could have greater access to each other’s expertise.

How could an organisation maximise the benefit of using similar analytics? And how could an organisation benefit from creating greater levels of unstructured data? The answers are out there. For example, the UK think tank Policy Exchange recently produced a paper entitled Smart Devolution (2016), in reference to the devolved city strategy operating in the UK. Within the paper are suggestions of how a Mayor’s decision making could be supported and enhanced by increased use of data. The model for this suggestion is former New York Mayor Michael Bloomberg’s Mayor’s Office of Data Analytics (MODA).

The first Director of MODA, Mike Flowers, was interviewed for the publication. Flowers stated that maximising the use of data begins with collecting frontline expertise. It begins here because frontline experts understand the ground truth of situations: the problems, the solutions, the craft skills and the rules of thumb. The role of data is to support and enhance expertise. Expertise is unstructured by nature and difficult to collect (Klein, 2007).

For an organisation to maximise the use of unstructured data, it needs to begin by generating it (if it is not already doing so). This means asking frontline workers to write down stories, anecdotes, hints and tips from their experiences, particularly the tough cases. Once data is generated, it needs to be curated and analysed. This requires research and analysis skills to capture the tacit lessons and represent them in a format which informs decision making. The data then needs to be stored in a format from which it can be retrieved. This can be achieved in many different ways, with analytics similar to Watson representing the most sophisticated option.
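At the unsophisticated end of that range, the generate, curate, store and retrieve steps could be sketched as a small searchable story store. This is a minimal sketch under stated assumptions: the StoryStore class, its methods, the tags used to stand in for curation, and the example stories are all hypothetical, and simple word matching is a long way from the in-context retrieval described for Watson.

```python
import re
from collections import defaultdict


class StoryStore:
    """Hypothetical store for frontline stories with a simple word index."""

    def __init__(self):
        self.stories = []              # stored stories, in the order they were added
        self.index = defaultdict(set)  # word -> ids of the stories containing it

    def add(self, author, text, tags=()):
        """Store one story; tags stand in for the curation step."""
        story_id = len(self.stories)
        self.stories.append({"author": author, "text": text, "tags": set(tags)})
        for word in re.findall(r"[a-z]+", text.lower()):
            self.index[word].add(story_id)
        return story_id

    def search(self, query):
        """Return the stories that contain every word in the query."""
        words = re.findall(r"[a-z]+", query.lower())
        if not words:
            return []
        matching_ids = set.intersection(*(self.index.get(w, set()) for w in words))
        return [self.stories[i] for i in sorted(matching_ids)]


store = StoryStore()
store.add(
    "engineer_a",
    "Boiler leak traced to a cracked valve; replacing the seal alone did not hold.",
    tags=["tough case"],
)
store.add("engineer_b", "Customer reported noise; boiler pump bearings were worn.")

for story in store.search("boiler leak"):
    print(story["author"], "-", story["text"])
```

Even a store this basic lets one frontline worker find another's tough case by searching for the words they would naturally use to describe the problem.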

The IBM website highlights numerous benefits of collecting and analysing unstructured data in decision making. One of these is that the collection and analysis of unstructured data changes the way people create and share expertise. It also changes the way an organisation learns and adapts, bringing decision makers together with frontline workers. Unstructured data effectively creates feedback loops throughout an organisation. For some organisations this could be the first opportunity for everyone to find out that when you touch something hot, it burns.

Reading and Links

Link to Watson overview on the IBM website

Link to summary and full download of Policy Exchange publication, Smart Devolution (2016)

Tetlock, P. E. & Gardner, D. (2015). Superforecasting: The Art and Science of Prediction. New York: Crown

Klein, G. (2007). The Power of Intuition: How to Use Your Gut Feelings to Make Better Decisions at Work. Currency



