
T-20: Understanding Heterogeneity in Rheumatoid Arthritis Disease Progression by Using Word Embedding: An Electronic Health Record

Poster Presenter

Ye Jin Eun
Senior Data Scientist, Janssen, United States

Objectives

Rheumatoid arthritis (RA) is a chronic autoimmune condition that often affects multiple joints and can lead to severe disability. The goal of this study is to understand heterogeneity in patient journeys and to identify factors associated with disease progression through data-driven cluster analysis.

Method

Optum Pan-Therapeutic electronic health record (EHR) data were used in the analysis. To preserve the sequential nature of the data, word embedding was used to represent the medical codes in a continuous vector space. Density-based spatial clustering of applications with noise (DBSCAN) and k-means algorithms were then used for cluster analysis.
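
The abstract does not name the embedding algorithm or say how code vectors were combined into one vector per patient, so the following is only a minimal, hypothetical sketch of the pipeline's shape, assuming NumPy and using made-up code tokens. It swaps in a count-based embedding (positive pointwise mutual information over a sliding context window, factorized with SVD) for whatever neural word-embedding model the study used, represents each patient as the mean of their code vectors (an assumption), and then clusters patients with hand-written k-means and DBSCAN. `PATIENTS`, `code_embeddings`, `kmeans`, and `dbscan` are illustrative names, not the study's code.

```python
import numpy as np

# Hypothetical toy patient journeys: each is a time-ordered list of made-up medical codes.
PATIENTS = [
    ["RA_DX", "MTX", "TNFI"],
    ["FAM_HX_RA", "RA_DX", "MTX", "TNFI"],
    ["RA_DX", "MTX", "HCQ", "TNFI"],
    ["HYPOTHYROID", "LEVOTHYROXINE", "RA_DX", "HCQ"],
    ["HYPOTHYROID", "LEVOTHYROXINE", "RA_DX", "MTX"],
    ["LEVOTHYROXINE", "HYPOTHYROID", "RA_DX", "HCQ"],
]


def code_embeddings(sequences, window=2, dim=2):
    """Count-based code embedding: PPMI co-occurrence matrix + truncated SVD."""
    vocab = sorted({c for seq in sequences for c in seq})
    index = {c: i for i, c in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    # Count codes that occur within `window` positions of each other in a journey.
    for seq in sequences:
        for i, center in enumerate(seq):
            for j in range(max(0, i - window), min(len(seq), i + window + 1)):
                if j != i:
                    counts[index[center], index[seq[j]]] += 1
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts * counts.sum() / (row * col))
        ppmi = np.where(np.isfinite(pmi) & (pmi > 0), pmi, 0.0)
    # Keep the top `dim` singular directions as dense code vectors.
    u, s, _ = np.linalg.svd(ppmi)
    vectors = u[:, :dim] * s[:dim]
    return {c: vectors[index[c]] for c in vocab}


def kmeans(X, k, iters=100, seed=0):
    """Plain Lloyd's k-means with random initial centers drawn from the data."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)  # nearest center for each point
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers


def dbscan(X, eps, min_samples):
    """Minimal DBSCAN; returns a cluster id per point, with -1 marking noise."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    neighbors = [np.flatnonzero(row <= eps) for row in dist]  # includes the point itself
    labels = np.full(len(X), -1)
    visited = np.zeros(len(X), dtype=bool)
    cluster = 0
    for i in range(len(X)):
        if visited[i]:
            continue
        visited[i] = True
        if len(neighbors[i]) < min_samples:
            continue  # not a core point; may still become a border point later
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # claim an unassigned border or core point
            if not visited[j]:
                visited[j] = True
                if len(neighbors[j]) >= min_samples:
                    queue.extend(neighbors[j])  # expand only through core points
        cluster += 1
    return labels


emb = code_embeddings(PATIENTS)
# Hypothetical patient-level vector: the mean of that patient's code vectors.
patient_vecs = np.array([np.mean([emb[c] for c in seq], axis=0) for seq in PATIENTS])
km_labels, _ = kmeans(patient_vecs, k=2)
db_labels = dbscan(patient_vecs, eps=0.5, min_samples=2)  # eps must be tuned to the data
print(km_labels, db_labels)
```

The two clustering steps are written out only to make their logic visible; in practice a library implementation such as scikit-learn's `KMeans` and `DBSCAN` would normally be used, with `eps`, `min_samples`, and the number of clusters chosen for the real data.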

Results

~160k subjects met the inclusion criteria; of these, ~27k went on to receive biologic treatment. Five meaningful clusters were identified based on patient journeys, and patient characteristics were compared to identify distinct differences between the clusters. Notable findings include confirmation that a family history of RA puts subjects at risk of RA as well as at risk of requiring advanced therapy, and that the subgroup of patients diagnosed with hypothyroidism was half as likely to receive biologic treatment as the RA patient cohort average.
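
The hypothyroidism comparison can be read as a rate ratio: the share of a subgroup that received a biologic divided by the share in the whole cohort. The abstract does not define exactly how the likelihood comparison was measured, so this sketch assumes that reading. It uses a hypothetical `Patient` record and an invented eight-patient toy cohort, and also shows the cohort-level arithmetic implied by the approximate counts reported in the results: about 27k of 160k patients (roughly 17%) received a biologic, so a ratio of 0.5 would correspond to roughly 8% in the subgroup.

```python
from collections import namedtuple

# Hypothetical patient record: subgroup flag and whether a biologic was eventually started.
Patient = namedtuple("Patient", ["has_hypothyroidism", "got_biologic"])


def biologic_rate(patients):
    """Fraction of patients who went on to receive a biologic."""
    return sum(p.got_biologic for p in patients) / len(patients)


def rate_ratio(patients, in_subgroup):
    """Subgroup biologic rate divided by the whole-cohort average rate."""
    subgroup = [p for p in patients if in_subgroup(p)]
    return biologic_rate(subgroup) / biologic_rate(patients)


# Invented toy cohort: 1 of 4 hypothyroid patients vs 3 of 4 others received a biologic.
toy = ([Patient(True, True)] + [Patient(True, False)] * 3
       + [Patient(False, True)] * 3 + [Patient(False, False)])
print(rate_ratio(toy, lambda p: p.has_hypothyroidism))  # 0.5

# Arithmetic from the abstract's approximate counts, assuming a rate ratio of 0.5.
approx_cohort_rate = 27_000 / 160_000
implied_subgroup_rate = approx_cohort_rate * 0.5
```

A ratio below 1 means the subgroup starts biologics less often than the cohort as a whole; this simple ratio does not adjust for any confounders.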

Conclusion

By combining word embedding with cluster analysis, we preserved the sequential nature of the patient journey, segmented patients by disease progression over time, and identified risk factors associated with disease progression.