Introduction to Knowledge Graphs in Healthcare

Healthcare is a field of critical importance, centered on developing systems that support long-term health through accurate diagnosis, prediction, and treatment. Despite a constant drive to innovate and improve patient outcomes, healthcare services are still plagued by manual processes that slow down doctors and staff, such as writing summary notes, correlating measurements, suggesting diagnoses, reviewing a patient's personal and family history, or searching for similar past cases. The recent progress in Electronic Health Records (EHRs) is promising, but the captured information is still hard to process because of its sheer volume. Even now that EHRs are easily accessible and normalized, they usually contain far more information than a doctor can make sense of.

EHR data is itself heterogeneous: it includes both structured and unstructured information, from laboratory measurements and vital signs to physician notes and radiology images. Recent studies in psychological science show that humans can process at most about four interacting variables at the same time, which supports the view that computer-assisted healthcare is the future of the domain. The primary value that machine learning brings to healthcare is therefore its ability to process huge datasets of different data types (e.g., images and text) beyond the scope of human capability. Machine learning can also help doctors leverage healthcare data reliably and efficiently to speed up decision-making, leading to better outcomes and lower healthcare costs for patients.

In this two-part blog series, I will show how Graph Machine Learning techniques can tackle a healthcare use case: automatically predicting diagnoses for patients staying in intensive care units. In this first post, we introduce how to build Knowledge Graphs (KGs) from heterogeneous sources. In the second post, we dive deeper into the healthcare domain and demonstrate how to apply Graph Machine Learning over the constructed healthcare KG for diagnosis prediction. The techniques shown here are generic and can also be applied to other domains, such as retail or finance.

Why Knowledge Graphs in Healthcare?

Graphs have become ubiquitous as the backbone of many applications, from search engines and recommender systems to intelligent chatbots. The graph data model captures the relationships between entities by linking them through edges, based on information extracted from various heterogeneous sources. Once the data is represented as a graph, various graph analytics techniques can query multi-hop relationships between entities in the constructed KG, that is, connections that pass through one or more intermediate entities. Graphs also let users visualize the data interactively for exploratory analysis.
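To make "multi-hop" concrete, here is a minimal, library-free sketch (not PGX, and not the method used in this series) that applies breadth-first search to find every entity within k hops of a starting node. The toy graph and node names are hypothetical and only loosely mirror the use case introduced below.

```python
from collections import deque

def k_hop_neighbors(graph, start, k):
    """Return {node: hop_distance} for every node reachable from `start` in at most k hops."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == k:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    del distances[start]  # the start node is not its own neighbor
    return distances

# Hypothetical toy graph, stored as an undirected adjacency map.
edges = [
    ("Admission_1", "Headache"),
    ("Migraine", "Headache"),
    ("Migraine", "Nausea"),
    ("Gastroenteritis", "Nausea"),
]
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

print(k_hop_neighbors(graph, "Admission_1", 3))
```

Within three hops, Admission_1 reaches Headache, Migraine, and Nausea; Gastroenteritis sits four hops away and is only returned when k is at least 4.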

Knowledge graphs have proven effective in healthcare services for mapping relationships across the enormous variety and structure of healthcare data. Graphs are well suited to modeling latent relationships between information sources and to capturing linked information (i.e., entity relationships) that other data models fail to capture. This makes it easier for doctors and service providers to find the information they need across a wide array of variables and data sources.

Use Case: Knowledge Graph in Electronic Health Records

We now present an example knowledge graph from the healthcare domain. We first introduce the use case at a high level; the second post of this series walks through how to build the knowledge graph yourself. The use case we are demonstrating captures the relationships present in Electronic Health Records (EHRs).

Whenever a patient is admitted to a hospital, we call that event an “admission”; its duration can vary depending on the severity of the associated diagnoses. An admission has various properties, such as the medications administered during the admission or the "pre-diagnosis" note written by the medical practitioner.

Next, we look at a separate knowledge base containing information about various diseases and the symptoms that accompany them. For example, Migraine is associated with symptoms such as headache, nausea, and sensitivity to light. Note that the pre-diagnosis provided by the medical practitioner can be linked either to such diseases or to individual symptoms expressed in medical terms.
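One way to picture these entities is as typed nodes connected by typed edges, stored as (subject, relation, object) triples. The minimal sketch below assumes that representation; the `Admission` fields, the relation names `HAS_SYMPTOM` and `SIMILAR_PRE_DIAGNOSIS`, and the helper functions are all hypothetical. It also illustrates why linking a pre-diagnosis to a symptom is useful: an admission linked only to "headache" still reaches Migraine through a two-hop path.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)

@dataclass
class Admission:
    # Hypothetical fields mirroring the admission properties described above.
    admission_id: str
    pre_diagnosis: str
    medications: List[str] = field(default_factory=list)

# Toy external Disease-Symptom knowledge base as (disease, symptom) pairs.
DISEASE_SYMPTOMS = [
    ("Migraine", "headache"),
    ("Migraine", "nausea"),
    ("Migraine", "sensitivity to light"),
]

def build_triples(admissions: List[Admission],
                  links: Dict[str, List[str]],
                  disease_symptoms: List[Tuple[str, str]]) -> List[Triple]:
    """Combine external (disease-symptom) and internal (admission) facts.

    `links` maps an admission id to the diseases or symptoms its
    pre-diagnosis was matched to; the matching itself is out of scope here."""
    triples: List[Triple] = [(d, "HAS_SYMPTOM", s) for d, s in disease_symptoms]
    for adm in admissions:
        for target in links.get(adm.admission_id, []):
            triples.append((adm.admission_id, "SIMILAR_PRE_DIAGNOSIS", target))
    return triples

def candidate_diseases(triples: List[Triple], admission_id: str) -> Set[str]:
    """Diseases linked to an admission directly or through a shared symptom."""
    symptom_to_diseases: Dict[str, Set[str]] = {}
    for subj, rel, obj in triples:
        if rel == "HAS_SYMPTOM":
            symptom_to_diseases.setdefault(obj, set()).add(subj)
    known_diseases = {d for ds in symptom_to_diseases.values() for d in ds}

    found: Set[str] = set()
    for subj, rel, obj in triples:
        if subj == admission_id and rel == "SIMILAR_PRE_DIAGNOSIS":
            if obj in known_diseases:
                found.add(obj)  # one hop: pre-diagnosis matched a disease
            found |= symptom_to_diseases.get(obj, set())  # two hops via a symptom
    return found

admissions = [Admission("A1", "severe headache, vomiting")]
triples = build_triples(admissions, {"A1": ["headache"]}, DISEASE_SYMPTOMS)
print(candidate_diseases(triples, "A1"))
```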

The following visualization shows an instance of the healthcare knowledge graph: a heterogeneous graph whose entities are of type "Disease", "Symptom", or "Admission". Its relationships are either "Admission has a pre-diagnosis similar to a disease or a symptom" or "Disease has the following symptoms". To link Admissions with diseases or symptoms, we compute text similarity using a combination of n-grams and TF-IDF (Term Frequency–Inverse Document Frequency). "Internal information" refers to data collected from the hospital database (with patient admissions), whereas "External information" is aggregated from external sources; in this case, a Disease-Symptom KG built externally and publicly available on GitHub through this link.
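The post does not specify the exact n-gram configuration or similarity threshold, so the following is only a minimal, standard-library sketch under stated assumptions: character trigrams, a smoothed IDF, cosine similarity, and a hypothetical cutoff of 0.3. Character n-grams are one plausible choice because they tolerate misspellings and abbreviations in free-text notes. The function names and sample inputs are hypothetical.

```python
import math
from collections import Counter

def char_ngrams(text, n=3):
    """Character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def tfidf_vectors(docs, n=3):
    """Sparse TF-IDF vectors (dicts) over character n-grams."""
    counts = [Counter(char_ngrams(d, n)) for d in docs]
    df = Counter(g for c in counts for g in c)  # document frequency per n-gram
    num_docs = len(docs)
    # Smoothed IDF: n-grams present in every document keep a nonzero weight.
    idf = {g: math.log((1 + num_docs) / (1 + df[g])) + 1 for g in df}
    return [{g: tf * idf[g] for g, tf in c.items()} for c in counts]

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(g, 0.0) for g, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def link_admissions(pre_diagnoses, kb_terms, threshold=0.3, n=3):
    """Return (admission_id, kb_term, score) for pairs scoring >= threshold."""
    ids = list(pre_diagnoses)
    # Fit IDF over admissions and knowledge-base terms together.
    vecs = tfidf_vectors([pre_diagnoses[i] for i in ids] + list(kb_terms), n)
    adm_vecs, term_vecs = vecs[:len(ids)], vecs[len(ids):]
    links = []
    for adm_id, av in zip(ids, adm_vecs):
        for term, tv in zip(kb_terms, term_vecs):
            score = cosine(av, tv)
            if score >= threshold:
                links.append((adm_id, term, round(score, 3)))
    return links

pre_diagnoses = {"A1": "migrane", "A2": "chest pain"}
kb_terms = ["Migraine", "Headache", "Nausea", "Chest pain"]
print(link_admissions(pre_diagnoses, kb_terms))
```

With these inputs, the misspelled "migrane" still links to Migraine and "chest pain" links to Chest pain, while unrelated terms fall below the threshold. In practice, scikit-learn's `TfidfVectorizer` with `analyzer="char_wb"` would be a more typical choice than a hand-rolled version, and the threshold would need tuning on real data.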

Disease Symptom Knowledge Graph


In this post, we introduced some of the data-related challenges that healthcare providers face and discussed the benefits machine learning can bring. We then walked through an example knowledge graph from the healthcare domain to illustrate how data is captured in a graph model. In the next post, we will show how to apply Graph Machine Learning over a healthcare KG to improve diagnosis prediction.

Learn More

You can find more samples of how to use Parallel Graph AnalytiX (PGX) with machine learning in our GitHub repository:

For more detailed questions, please feel free to reach out to

