Natural Language Processing Demystified

An interview with a clinical informatics expert

Christina Sung

The volume of healthcare data generated is growing at a remarkable pace. But how do you turn big data into smart data? Natural language processing (NLP), a technology that streamlines the analysis of unstructured data, is a subject of great interest both inside and outside healthcare. I sat down with Dr. Rob Kalfus, Health Fidelity’s Clinical Informaticist, to address some commonly asked questions about NLP.

Welcome, Rob! Tell me about yourself.

I joined Health Fidelity as a Clinical Informaticist. My role is to enhance our NLP engine’s performance. Prior to Health Fidelity, I was a physician practicing sleep medicine. In addition to seeing patients, I worked for FairCode Associates, where I reviewed inpatient electronic medical records to ensure accurate ICD-9/ICD-10 coding and Diagnosis Related Group (DRG) assignment.

What is NLP, and how does it work?

NLP is a field of computer science and artificial intelligence that deals with human language. Human language is enormously expressive, varied, ambiguous, and vague. The core goal of NLP is to understand human language in an automated way.

NLP works by processing natural language, such as physician narratives, through a library of words, concepts, and relationships to piece together an understanding of the narrative. NLP is much more than a text search. It considers the structural nature of language (words form phrases, phrases form sentences, and sentences convey ideas) and aims to understand the meaning behind the collection of words.

What makes Health Fidelity’s NLP unique?

Our engine was specifically designed to understand clinical language and is the longest-standing clinical NLP system. Clinical language is very different from, say, legal language, which is why it’s important to use a clinically oriented NLP engine rather than just any NLP. Health Fidelity’s NLP is supported by over 20 years of research from Columbia University led by Dr. Carol Friedman, one of the world’s leading experts on NLP in the biomedical domain.

I’ll give some examples of the system’s sophistication:

It is remarkably capable of contextual analysis. A physician who documents “AF” could be referring to either atrial fibrillation or someone’s initials. The system associates “AF” with atrial fibrillation only if another part of the note contains supporting evidence, such as a relevant medication or terms describing heart rhythm.
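To make the idea concrete, here is a minimal Python sketch of context-based disambiguation, assuming a simple keyword approach. The cue list and the `contains_term` and `interpret_af` functions are hypothetical illustrations, not Health Fidelity’s engine, which relies on far richer linguistic analysis than keyword matching.

```python
import re

# Hypothetical context cues suggesting atrial fibrillation. A real clinical
# NLP engine would use curated lexicons and grammar, not a short keyword list.
AFIB_CONTEXT_CUES = [
    "warfarin", "apixaban", "anticoagulation",
    "irregularly irregular", "heart rhythm", "palpitations",
]

def contains_term(text: str, term: str) -> bool:
    """Case-insensitive whole-word (or whole-phrase) match."""
    return re.search(rf"\b{re.escape(term)}\b", text, re.IGNORECASE) is not None

def interpret_af(note: str) -> str:
    """Decide how to read the abbreviation "AF" in a clinical note."""
    if not contains_term(note, "AF"):
        return "not mentioned"
    # Only associate "AF" with atrial fibrillation when other evidence
    # elsewhere in the note supports that reading.
    if any(contains_term(note, cue) for cue in AFIB_CONTEXT_CUES):
        return "atrial fibrillation"
    return "ambiguous (possibly initials)"

print(interpret_af("Hx of AF, continues apixaban 5 mg BID."))  # atrial fibrillation
print(interpret_af("Chart reviewed by AF."))                   # ambiguous (possibly initials)
```

The key design point is that the abbreviation alone never triggers the clinical reading; supporting evidence elsewhere in the note is required.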

Some codes must be modified to comply with coding guidelines. If a physician documents both diabetes and kidney disease in a patient, the system applies logic to generate a combination code for diabetic kidney disease.
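The combination-code logic can be pictured as a rule that fires when all of its component conditions are documented. This Python sketch uses hypothetical concept names and a hypothetical rule table (`COMBINATION_RULES`, `apply_combination_rules`); it does not use real ICD-10 codes or reproduce the actual coding guidelines, which define many more conditions and exceptions.

```python
# Hypothetical rule table: when every component concept is documented,
# the components are replaced by a single combination concept.
COMBINATION_RULES = {
    frozenset({"diabetes", "kidney disease"}): "diabetic kidney disease",
}

def apply_combination_rules(concepts: set[str]) -> set[str]:
    """Return a new set with applicable component concepts merged."""
    result = set(concepts)
    for components, combined in COMBINATION_RULES.items():
        if components <= result:  # all components are documented
            result -= components
            result.add(combined)
    return result

print(sorted(apply_combination_rules({"diabetes", "kidney disease", "hypertension"})))
# ['diabetic kidney disease', 'hypertension']
```

Conditions that no rule covers, such as hypertension here, pass through unchanged.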

Lastly, I’d point to our system’s industrial strength. Our NLP provides high throughput and real-time performance; it’s capable of handling millions of transactions per hour.

How does NLP improve risk adjustment performance?

Historically, unstructured data in medical records (i.e., clinical narratives) has been impossible to analyze without an actual human reading through the records, which is resource-intensive and error-prone. But now, NLP makes it possible to systematically analyze large volumes of patient data.

We’ve seen organizations apply NLP to retrospectively review medical records in order to code documented risk conditions with greater accuracy and efficiency. NLP also mitigates compliance risks by identifying coded conditions that are not appropriately documented.

When it comes to prospective review, analytics have historically been based on administrative data such as claims. NLP makes it possible to incorporate clinical data into analytics, thereby increasing predictive power. Chart data contains the richest and most up-to-date information about patient health, much of which cannot be obtained from claims alone.

How accurate is Health Fidelity’s NLP for risk adjustment coding?

A useful metric for quantifying an NLP system’s performance is recall. Essentially, it is the probability that the system detects a documented, risk-adjustable condition: of all such conditions documented in the records, the fraction the system finds. The recall of Health Fidelity’s NLP is consistently above 95% and improving over time.
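Recall is conventionally computed as TP / (TP + FN), where true positives (TP) are documented conditions the system detected and false negatives (FN) are documented conditions it missed. The short Python sketch below computes it for a single chart; the condition names are made-up example data, and the reference set of documented conditions is assumed to come from human coder review.

```python
def recall(documented: set[str], detected: set[str]) -> float:
    """Recall = TP / (TP + FN): the share of documented conditions detected."""
    if not documented:
        raise ValueError("recall is undefined when no conditions are documented")
    true_positives = len(documented & detected)   # documented and found
    false_negatives = len(documented - detected)  # documented but missed
    return true_positives / (true_positives + false_negatives)

# Hypothetical chart: four documented conditions, three of them detected.
documented = {"heart failure", "COPD", "diabetes", "chronic kidney disease"}
detected = {"heart failure", "COPD", "diabetes", "obesity"}
print(recall(documented, detected))  # 0.75
```

The extra detection ("obesity") does not lower recall; detections unsupported by documentation are measured by precision, a separate metric.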

How does the NLP engine improve over time?

We’re constantly working to improve the NLP engine. We analyze data from coders who use our system to optimize the accuracy of its code suggestions on an ongoing basis. For example, we may refine the system’s understanding of the grammatical patterns physicians use in their notes. We are always generating new rules and testing their accuracy.

Another major area of focus is machine learning, which allows the system to continuously learn and improve its outputs.

Thanks for explaining NLP, Rob! I am intrigued by machine learning, but that is a topic for another day. One last question – what are the limitations of NLP?

Good question. No technology available today for risk adjustment coding can entirely replace human coders. Instead, technology makes coders’ jobs easier by letting them focus their expertise where it counts, rather than wading through vast amounts of data that may be irrelevant to risk adjustment.
