T-20: Understanding Heterogeneity in Rheumatoid Arthritis Disease Progression by Using Word Embedding: An Electronic Health Record
Poster Presenter
Ye Jin Eun
Senior Data Scientist
Janssen United States
Objectives
Rheumatoid arthritis (RA) is a chronic autoimmune condition that often affects multiple joints and can lead to severe disability. The goal of this study is to understand heterogeneity in the patient journey and to identify factors associated with disease progression through data-driven cluster analysis.
Method
Optum Pan-Therapeutic EHR data were used in the analysis. To preserve the sequential nature of the data, word embedding was used to represent the medical codes in a continuous vector space. Density-based spatial clustering of applications with noise (DBSCAN) and k-means algorithms were then used for cluster analysis.
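The embed-then-cluster idea described above can be illustrated with a minimal, dependency-free sketch. It is not the study's pipeline. The code tokens and patient journeys are made up, simple co-occurrence counts within a sliding window stand in for a learned word embedding, patient vectors are plain averages of code vectors, and only k-means is shown (DBSCAN is omitted for brevity). All function names and parameters here are hypothetical.

```python
import math

# Hypothetical toy journeys: ordered lists of made-up code tokens per patient.
journeys = {
    "p1": ["DX_RA", "RX_NSAID", "RX_MTX", "RX_BIOLOGIC"],
    "p2": ["DX_RA", "RX_MTX", "LAB_CRP", "RX_BIOLOGIC"],
    "p3": ["DX_RA", "DX_HYPOTHYROID", "RX_LEVOTHYROXINE", "RX_NSAID"],
    "p4": ["DX_HYPOTHYROID", "DX_RA", "RX_LEVOTHYROXINE", "RX_NSAID"],
}

def cooccurrence_embeddings(sequences, window=2):
    """Count-based stand-in for a word embedding: each code is represented
    by how often every other code appears within `window` positions of it."""
    vocab = sorted({code for seq in sequences for code in seq})
    index = {code: i for i, code in enumerate(vocab)}
    vectors = {code: [0.0] * len(vocab) for code in vocab}
    for seq in sequences:
        for i, code in enumerate(seq):
            lo, hi = max(0, i - window), min(len(seq), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[code][index[seq[j]]] += 1.0
    return vectors

def patient_vector(seq, vectors):
    """Average the code vectors of one journey (a simplification: the
    average itself ignores the order of codes within the journey)."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    for code in seq:
        for d, value in enumerate(vectors[code]):
            total[d] += value
    return [t / len(seq) for t in total]

def kmeans(points, k, iterations=20):
    """Plain k-means with deterministic farthest-point initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

code_vectors = cooccurrence_embeddings(list(journeys.values()))
points = [patient_vector(seq, code_vectors) for seq in journeys.values()]
labels = kmeans(points, k=2)
print(dict(zip(journeys, labels)))
```

On this toy input, the two journeys ending in a biologic fall into one cluster and the two hypothyroidism journeys into the other. In practice a learned embedding (for example, a word2vec-style model) and a library clustering implementation would replace the hand-written pieces.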
Results
Approximately 160k subjects met the inclusion criteria; of these, about 27k eventually received biologic treatment. Five meaningful clusters were identified based on the patient journey, and patient characteristics were compared across clusters to identify what distinguishes them. Notable findings include confirmation that a family history of RA is associated with a higher risk of RA and of requiring advanced therapy, and that patients with a diagnosis of hypothyroidism were about half as likely to receive biologic treatment as the RA cohort average.
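As a rough check on scale, the rounded counts above imply a cohort-level biologic treatment rate of about 17%. The short calculation below works this out. It also shows what the hypothyroidism comparison would imply if it is read as a relative rate of 0.5 against the cohort average. That reading is an assumption, and the resulting subgroup rate is not a figure reported in the abstract.

```python
# Rounded figures quoted in the Results text.
cohort_size = 160_000
biologic_treated = 27_000

# Share of the cohort that received biologic treatment.
cohort_rate = biologic_treated / cohort_size

# Assumption: the hypothyroidism finding is read as a relative rate of 0.5.
assumed_relative_rate = 0.5
implied_subgroup_rate = cohort_rate * assumed_relative_rate

print(f"cohort rate: {cohort_rate:.0%}")                      # 17%
print(f"implied subgroup rate: {implied_subgroup_rate:.0%}")  # 8%
```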
Conclusion
By combining word embedding with cluster analysis, we preserved the sequential nature of the patient journey, segmented patients by disease progression over time, and identified risk factors associated with disease progression.