Construct Validity of Six Sentiment Analysis Approaches in the Text of Encounter Notes of Patients with Critical Illness


In the era of widespread adoption of electronic health records (EHRs) and learning health systems, there is growing interest in the richness of free-text data sources. Among patients with critical illness, the text of clinical notes has been used to identify diagnoses and interventions in the intensive care unit (ICU) and to improve predictions of future health states. Clinical text contains objective diagnostic information not found in structured data sources within the EHR. But clinicians also make subjective assessments regarding patient outcomes that may be inscribed in the free text of clinical notes. The study of such subjective content is called "sentiment analysis." In this study, we seek to determine the construct validity of existing sentiment lexica, derived in other domains, when they are used to analyze clinical text among patients with critical illness. Specifically, we will examine predictive, concurrent, content, and convergent validity to assess different aspects of the sentiment construct. We will analyze the Medical Information Mart for Intensive Care (MIMIC) III database, which comprises all hospital admissions requiring ICU care at the Beth Israel Deaconess Medical Center, Boston, MA, between 2001 and 2012.


  • National Institutes of Health

