Data Rediscovered

Written by David Gulajan | Feb 16, 2021 3:16:12 PM

Looking for more clarity about the ever-developing field of Natural Language Processing (NLP)? Good news! Avineon is launching a series of NLP blog posts to outline our success with Machine Learning and Deep Learning endeavors.

 

This series assumes no prior knowledge of Machine Learning or any particular computer savviness. It will not be a masterclass on implementing solutions, but instead a primer on the possibilities, along with some of the solutions Avineon has explored. As our world evolves, so should the tools we use to interact with it. We believe that knowledge should be clear, concise, and accessible.

 

For your benefit, the topics covered are illustrated below:

 

 

These topics are geared towards those with little to no prior knowledge of the field, but we aim to engage readers of all knowledge levels. Whether you know only a few terms or have built your own model, we look forward to embarking on this series with you!

 

 

 

Data Rediscovered

 

Data is often used as a catch-all term whose meaning differs depending on the speaker’s intentions. That meaning is obscured by arcane adjectives and nuanced interpretations. When we see the word in a news article, an advertisement, or a blog post, we skip over it as if it were a placeholder. It means something approximate, and that is often close enough for our purposes.

 

It is in this blurred region that most Natural Language Processing (NLP) understanding lives. What happens when data does not look, act, or feel like data, when our data is just as messy as our speech? That is the essence of unstructured data and the problem most NLP solutions try to solve.

 

 

Defining Unstructured Data

 

Unstructured data is information that exists outside the neat, tiny boxes of Excel or your SQL database because it cannot be easily organized into rows and columns. Text is the quintessential example of unstructured data (and, unsurprisingly, the focus of this blog).


Any good writing, whether it’s technical or not, has a flow and structure to it. That flow and structure are easy for us to intuit but hard for us to explain rigidly. We can develop outlines, tables of contents, or appendices for our documents, but the picture becomes muddled when you are asked to organize a text sentence by sentence or idea by idea. The aim of NLP is to capture that intuition about unstructured data and let computers parse it out for us, saving us time and money or picking up on connections we would not be able to identify alone.

 

 

The Value of Analyzing Unstructured Data

 

All of this sounds nice, but what is the actual value of analyzing unstructured data? Well, think of an NLP solution as a speed reader that processes text like a human reader and can make judgments based on what it has read. NLP solutions could even read this blog and summarize its contents!


Deriving value from unstructured data begins with looking at your needs. Think of the swathes of text that you or your organization keep in repositories, databases, or hard drives: all the contracts, technical manuals, research papers, memos, and more stored in those folders. All of that is underutilized, unstructured data.


Your needs may vary. Maybe you need to ask questions of huge technical manuals, questions Google will not have the answers to because they are specific to your organization or business. Perhaps you have files that need to be categorized based on their content, and you need a quick, efficient, and effective way to sort them. Or maybe you need to extract key information from wildly different document types to insert into a traditional database. Whatever your needs may be, text does not need to be an abyss trodden only by humans.

 


Now that we have established the ‘what’ of NLP, I look forward to discussing part of the ‘how’ with you next week!