Publications

All Publications


  • Weakly supervised classification of rare aortic valve malformations using unlabeled cardiac MRI sequences Nature Communications Fries, J. A., Varma, P., Chen, V. S., Xiao, K., Tejeda, H., Saha, P., Dunnmon, J., Chubb, H., Maskatia, S., Fiterau, M., Delp, S., Ashley, E., Ré, C., Priest, J. R. 2019; 10
  • Snorkel: Rapid Training Data Creation with Weak Supervision PROCEEDINGS OF THE VLDB ENDOWMENT Ratner, A., Bach, S. H., Ehrenberg, H., Fries, J., Wu, S., Re, C. 2017; 11 (3): 269–82

    Abstract

    Labeling training data is increasingly the largest bottleneck in deploying machine learning systems. We present Snorkel, a first-of-its-kind system that enables users to train state-of-the-art models without hand labeling any training data. Instead, users write labeling functions that express arbitrary heuristics, which can have unknown accuracies and correlations. Snorkel denoises their outputs without access to ground truth by incorporating the first end-to-end implementation of our recently proposed machine learning paradigm, data programming. We present a flexible interface layer for writing labeling functions based on our experience over the past year collaborating with companies, agencies, and research labs. In a user study, subject matter experts build models 2.8× faster and increase predictive performance an average 45.5% versus seven hours of hand labeling. We study the modeling tradeoffs in this new setting and propose an optimizer for automating tradeoff decisions that gives up to 1.8× speedup per pipeline execution. In two collaborations, with the U.S. Department of Veterans Affairs and the U.S. Food and Drug Administration, and on four open-source text and image data sets representative of other deployments, Snorkel provides 132% average improvements to predictive performance over prior heuristic approaches and comes within an average 3.60% of the predictive performance of large hand-curated training sets.

    View details for PubMedID 29770249

  • Ontology-driven weak supervision for clinical entity classification in electronic health records. Nature communications Fries, J. A., Steinberg, E., Khattar, S., Fleming, S. L., Posada, J., Callahan, A., Shah, N. H. 2021; 12 (1): 2017

    Abstract

    In the electronic health record, using clinical notes to identify entities such as disorders and their temporality (e.g. the order of an event relative to a time index) can inform many important analyses. However, creating training data for clinical entity tasks is time consuming and sharing labeled data is challenging due to privacy concerns. The information needs of the COVID-19 pandemic highlight the need for agile methods of training machine learning models for clinical notes. We present Trove, a framework for weakly supervised entity classification using medical ontologies and expert-generated rules. Our approach, unlike hand-labeled notes, is easy to share and modify, while offering performance comparable to learning from manually labeled training data. In this work, we validate our framework on six benchmark tasks and demonstrate Trove's ability to analyze the records of patients visiting the emergency department at Stanford Health Care for COVID-19 presenting symptoms and risk factors.

    View details for DOI 10.1038/s41467-021-22328-4

    View details for PubMedID 33795682

  • Assessment of Extractability and Accuracy of Electronic Health Record Data for Joint Implant Registries. JAMA network open Giori, N. J., Radin, J., Callahan, A., Fries, J. A., Halilaj, E., Re, C., Delp, S. L., Shah, N. H., Harris, A. H. 2021; 4 (3): e211728

    Abstract

    Importance: Implant registries provide valuable information on the performance of implants in a real-world setting, yet they have traditionally been expensive to establish and maintain. Electronic health records (EHRs) are widely used and may include the information needed to generate clinically meaningful reports similar to a formal implant registry. Objectives: To quantify the extractability and accuracy of registry-relevant data from the EHR and to assess the ability of these data to track trends in implant use and the durability of implants (hereafter referred to as implant survivorship), using data stored since 2000 in the EHR of the largest integrated health care system in the United States. Design, Setting, and Participants: Retrospective cohort study of a large EHR of veterans who had 45,351 total hip arthroplasty procedures in Veterans Health Administration hospitals from 2000 to 2017. Data analysis was performed from January 1, 2000, to December 31, 2017. Exposures: Total hip arthroplasty. Main Outcomes and Measures: Number of total hip arthroplasty procedures extracted from the EHR, trends in implant use, and relative survivorship of implants. Results: A total of 45,351 total hip arthroplasty procedures were identified from 2000 to 2017 with 192,805 implant parts. Data completeness improved over the time. After 2014, 85% of prosthetic heads, 91% of shells, 81% of stems, and 85% of liners used in the Veterans Health Administration health care system were identified by part number. Revision burden and trends in metal vs ceramic prosthetic femoral head use were found to reflect data from the American Joint Replacement Registry. Recalled implants were obvious negative outliers in implant survivorship using Kaplan-Meier curves. Conclusions and Relevance: Although loss to follow-up remains a challenge that requires additional attention to improve the quantitative nature of calculated implant survivorship, we conclude that data collected during routine clinical care and stored in the EHR of a large health system over 18 years were sufficient to provide clinically meaningful data on trends in implant use and to identify poor implants that were subsequently recalled. This automated approach was low cost and had no reporting burden. This low-cost, low-overhead method to assess implant use and performance within a large health care setting may be useful to internal quality assurance programs and, on a larger scale, to postmarket surveillance of implant performance.

    View details for DOI 10.1001/jamanetworkopen.2021.1728

    View details for PubMedID 33720372

  • Measure what matters: Counts of hospitalized patients are a better metric for health system capacity planning for a reopening. Journal of the American Medical Informatics Association : JAMIA Kashyap, S., Gombar, S., Yadlowsky, S., Callahan, A., Fries, J., Pinsky, B. A., Shah, N. H. 2020

    Abstract

    OBJECTIVE: Responding to the COVID-19 pandemic requires accurate forecasting of health system capacity requirements using readily available inputs. We examined whether testing and hospitalization data could help quantify the anticipated burden on the health system given shelter-in-place (SIP) order. MATERIALS AND METHODS: 16,103 SARS-CoV-2 RT-PCR tests were performed on 15,807 patients at Stanford facilities between March 2 and April 11, 2020. We analyzed the fraction of tested patients that were confirmed positive for COVID-19, the fraction of those needing hospitalization, and the fraction requiring ICU admission over the 40 days between March 2nd and April 11th 2020. RESULTS: We find a marked slowdown in the hospitalization rate within ten days of SIP even as cases continued to rise. We also find a shift towards younger patients in the age distribution of those testing positive for COVID-19 over the four weeks of SIP. The impact of this shift is a divergence between increasing positive case confirmations and slowing new hospitalizations, both of which affects the demand on health systems. CONCLUSION: Without using local hospitalization rates and the age distribution of positive patients, current models are likely to overestimate the resource burden of COVID-19. It is imperative that health systems start using these data to quantify effects of SIP and aid reopening planning.

    View details for DOI 10.1093/jamia/ocaa076

    View details for PubMedID 32548636

  • Assessing the accuracy of automatic speech recognition for psychotherapy NPJ DIGITAL MEDICINE Miner, A. S., Haque, A., Fries, J. A., Fleming, S. L., Wilfley, D. E., Wilson, G., Milstein, A., Jurafsky, D., Arnow, B. A., Agras, W., Li Fei-Fei, Shah, N. H. 2020; 3 (1)
  • Language models are an effective representation learning technique for electronic health record data. Journal of biomedical informatics Steinberg, E., Jung, K., Fries, J. A., Corbin, C. K., Pfohl, S. R., Shah, N. H. 2020: 103637

    Abstract

    Widespread adoption of electronic health records (EHRs) has fueled the development of using machine learning to build prediction models for various clinical outcomes. However, this process is often constrained by having a relatively small number of patient records for training the model. We demonstrate that using patient representation schemes inspired from techniques in natural language processing can increase the accuracy of clinical prediction models by transferring information learned from the entire patient population to the task of training a specific model, where only a subset of the population is relevant. Such patient representation schemes enable a 3.5% mean improvement in AUROC on five prediction tasks compared to standard baselines, with the average improvement rising to 19% when only a small number of patient records are available for training the clinical prediction model.

    View details for DOI 10.1016/j.jbi.2020.103637

    View details for PubMedID 33290879

  • Assessing the accuracy of automatic speech recognition for psychotherapy. NPJ digital medicine Miner, A. S., Haque, A., Fries, J. A., Fleming, S. L., Wilfley, D. E., Wilson, G. T., Milstein, A., Jurafsky, D., Arnow, B. A., Agras, W. S., Fei-Fei, L., Shah, N. H. 2020; 3 (1): 82

    Abstract

    Accurate transcription of audio recordings in psychotherapy would improve therapy effectiveness, clinician training, and safety monitoring. Although automatic speech recognition software is commercially available, its accuracy in mental health settings has not been well described. It is unclear which metrics and thresholds are appropriate for different clinical use cases, which may range from population descriptions to individual safety monitoring. Here we show that automatic speech recognition is feasible in psychotherapy, but further improvements in accuracy are needed before widespread use. Our HIPAA-compliant automatic speech recognition system demonstrated a transcription word error rate of 25%. For depression-related utterances, sensitivity was 80% and positive predictive value was 83%. For clinician-identified harm-related sentences, the word error rate was 34%. These results suggest that automatic speech recognition may support understanding of language patterns and subgroup variation in existing treatments but may not be ready for individual-level safety surveillance.

    View details for DOI 10.1038/s41746-020-0285-8

    View details for PubMedID 33597677

  • Estimating the efficacy of symptom-based screening for COVID-19. NPJ digital medicine Callahan, A., Steinberg, E., Fries, J. A., Gombar, S., Patel, B., Corbin, C. K., Shah, N. H. 2020; 3 (1): 95

    Abstract

    There is substantial interest in using presenting symptoms to prioritize testing for COVID-19 and establish symptom-based surveillance. However, little is currently known about the specificity of COVID-19 symptoms. To assess the feasibility of symptom-based screening for COVID-19, we used data from tests for common respiratory viruses and SARS-CoV-2 in our health system to measure the ability to correctly classify virus test results based on presenting symptoms. Based on these results, symptom-based screening may not be an effective strategy to identify individuals who should be tested for SARS-CoV-2 infection or to obtain a leading indicator of new COVID-19 cases.

    View details for DOI 10.1038/s41746-020-0300-0

    View details for PubMedID 33597700

  • Snorkel: rapid training data creation with weak supervision. The VLDB journal : very large data bases : a publication of the VLDB Endowment Ratner, A., Bach, S. H., Ehrenberg, H., Fries, J., Wu, S., Re, C. 2020; 29 (2): 709–30

    Abstract

    Labeling training data is increasingly the largest bottleneck in deploying machine learning systems. We present Snorkel, a first-of-its-kind system that enables users to train state-of-the-art models without hand labeling any training data. Instead, users write labeling functions that express arbitrary heuristics, which can have unknown accuracies and correlations. Snorkel denoises their outputs without access to ground truth by incorporating the first end-to-end implementation of our recently proposed machine learning paradigm, data programming. We present a flexible interface layer for writing labeling functions based on our experience over the past year collaborating with companies, agencies, and research laboratories. In a user study, subject matter experts build models 2.8× faster and increase predictive performance an average 45.5% versus seven hours of hand labeling. We study the modeling trade-offs in this new setting and propose an optimizer for automating trade-off decisions that gives up to 1.8× speedup per pipeline execution. In two collaborations, with the US Department of Veterans Affairs and the US Food and Drug Administration, and on four open-source text and image data sets representative of other deployments, Snorkel provides 132% average improvements to predictive performance over prior heuristic approaches and comes within an average 3.60% of the predictive performance of large hand-curated training sets.

    View details for DOI 10.1007/s00778-019-00552-1

    View details for PubMedID 32214778

  • The accuracy vs. coverage trade-off in patient-facing diagnosis models. AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science Kannan, A., Fries, J. A., Kramer, E., Chen, J. J., Shah, N., Amatriain, X. 2020; 2020: 298–307

    Abstract

    A third of adults in America use the Internet to diagnose medical concerns, and online symptom checkers are increasingly part of this process. These tools are powered by diagnosis models similar to clinical decision support systems, with the primary difference being the coverage of symptoms and diagnoses. To be useful to patients and physicians, these models must have high accuracy while covering a meaningful space of symptoms and diagnoses. To the best of our knowledge, this paper is the first in studying the trade-off between the coverage of the model and its performance for diagnosis. To this end, we learn diagnosis models with different coverage from EHR data. We find a 1% drop in top-3 accuracy for every 10 diseases added to the coverage. We also observe that complexity for these models does not affect performance, with linear models performing as well as neural networks.

    View details for PubMedID 32477649

  • Cardiac Imaging of Aortic Valve Area from 34,287 UK Biobank Participants Reveal Novel Genetic Associations and Shared Genetic Comorbidity with Multiple Disease Phenotypes. Circulation. Genomic and precision medicine Córdova-Palomera, A., Tcheandjieu, C., Fries, J., Varma, P., Chen, V. S., Fiterau, M., Xiao, K., Tejeda, H., Keavney, B., Cordell, H. J., Tanigawa, Y., Venkataraman, G., Rivas, M., Ré, C., Ashley, E. A., Priest, J. R. 2020

    Abstract

    Background - The aortic valve is an important determinant of cardiovascular physiology and anatomic location of common human diseases. Methods - From a sample of 34,287 white British-ancestry participants, we estimated functional aortic valve area by planimetry from prospectively obtained cardiac MRI sequences of the aortic valve. Aortic valve area measurements were submitted to genome-wide association testing, followed by polygenic risk scoring and phenome-wide screening to identify genetic comorbidities. Results - A genome-wide association study of aortic valve area in these UK Biobank participants showed three significant associations, indexed by rs71190365 (chr13:50764607, DLEU1, p=1.8×10⁻⁹), rs35991305 (chr12:94191968, CRADD, p=3.4×10⁻⁸) and chr17:45013271:C:T (GOSR2, p=5.6×10⁻⁸). Replication on an independent set of 8,145 unrelated European-ancestry participants showed consistent effect sizes in all three loci, although rs35991305 did not meet nominal significance. We constructed a polygenic risk score for aortic valve area, which in a separate cohort of 311,728 individuals without imaging demonstrated that smaller aortic valve area is predictive of increased risk for aortic valve disease (Odds Ratio 1.14, p=2.3×10⁻⁶). After excluding subjects with a medical diagnosis of aortic valve stenosis (remaining n=308,683 individuals), phenome-wide association of >10,000 traits showed multiple links between the polygenic score for aortic valve disease and key health-related comorbidities involving the cardiovascular system and autoimmune disease. Genetic correlation analysis supports a shared genetic etiology between aortic valve area and birthweight along with other cardiovascular conditions. Conclusions - These results illustrate the use of automated phenotyping of cardiac imaging data from the general population to investigate the genetic etiology of aortic valve disease, perform clinical prediction, and uncover new clinical and genetic correlates of cardiac anatomy.

    View details for DOI 10.1161/CIRCGEN.120.003014

    View details for PubMedID 33125279

  • Assessing the accuracy of automatic speech recognition for psychotherapy. NPJ digital medicine Miner, A. S., Haque, A., Fries, J. A., Fleming, S. L., Wilfley, D. E., Wilson, G. T., Milstein, A., Jurafsky, D., Arnow, B. A., Agras, W. S., Fei-Fei, L., Shah, N. H. 2020; 3: 82

    Abstract

    Accurate transcription of audio recordings in psychotherapy would improve therapy effectiveness, clinician training, and safety monitoring. Although automatic speech recognition software is commercially available, its accuracy in mental health settings has not been well described. It is unclear which metrics and thresholds are appropriate for different clinical use cases, which may range from population descriptions to individual safety monitoring. Here we show that automatic speech recognition is feasible in psychotherapy, but further improvements in accuracy are needed before widespread use. Our HIPAA-compliant automatic speech recognition system demonstrated a transcription word error rate of 25%. For depression-related utterances, sensitivity was 80% and positive predictive value was 83%. For clinician-identified harm-related sentences, the word error rate was 34%. These results suggest that automatic speech recognition may support understanding of language patterns and subgroup variation in existing treatments but may not be ready for individual-level safety surveillance.

    View details for DOI 10.1038/s41746-020-0285-8

    View details for PubMedID 32550644

    View details for PubMedCentralID PMC7270106

  • Estimating the efficacy of symptom-based screening for COVID-19. NPJ digital medicine Callahan, A., Steinberg, E., Fries, J. A., Gombar, S., Patel, B., Corbin, C. K., Shah, N. H. 2020; 3: 95

    Abstract

    There is substantial interest in using presenting symptoms to prioritize testing for COVID-19 and establish symptom-based surveillance. However, little is currently known about the specificity of COVID-19 symptoms. To assess the feasibility of symptom-based screening for COVID-19, we used data from tests for common respiratory viruses and SARS-CoV-2 in our health system to measure the ability to correctly classify virus test results based on presenting symptoms. Based on these results, symptom-based screening may not be an effective strategy to identify individuals who should be tested for SARS-CoV-2 infection or to obtain a leading indicator of new COVID-19 cases.

    View details for DOI 10.1038/s41746-020-0300-0

    View details for PubMedID 32695885

  • Medical device surveillance with electronic health records. NPJ digital medicine Callahan, A., Fries, J. A., Ré, C., Huddleston, J. I., Giori, N. J., Delp, S., Shah, N. H. 2019; 2: 94

    Abstract

    Post-market medical device surveillance is a challenge facing manufacturers, regulatory agencies, and health care providers. Electronic health records are valuable sources of real-world evidence for assessing device safety and tracking device-related patient outcomes over time. However, distilling this evidence remains challenging, as information is fractured across clinical notes and structured records. Modern machine learning methods for machine reading promise to unlock increasingly complex information from text, but face barriers due to their reliance on large and expensive hand-labeled training sets. To address these challenges, we developed and validated state-of-the-art deep learning methods that identify patient outcomes from clinical notes without requiring hand-labeled training data. Using hip replacements, one of the most common implantable devices, as a test case, our methods accurately extracted implant details and reports of complications and pain from electronic health records with up to 96.3% precision, 98.5% recall, and 97.4% F1, improved classification performance by 12.8–53.9% over rule-based methods, and detected over six times as many complication events compared to using structured data alone. Using these additional events to assess complication-free survivorship of different implant systems, we found significant variation between implants, including for risk of revision surgery, which could not be detected using coded data alone. Patients with revision surgeries had more hip pain mentions in the post-hip replacement, pre-revision period compared to patients with no evidence of revision surgery (mean hip pain mentions 4.97 vs. 3.23; t = 5.14; p …

    View details for DOI 10.1038/s41746-019-0168-z

    View details for PubMedID 31583282

    View details for PubMedCentralID PMC6761113

  • Multi-Resolution Weak Supervision for Sequential Data Sala, F., Varma, P., Fries, J., Fu, D. Y., Sagawa, S., Khattar, S., Ramamoorthy, A., Xiao, K., Fatahalian, K., Priest, J., Re, C., Wallach, H., Larochelle, H., Beygelzimer, A., d'Alche-Buc, F., Fox, E., Garnett, R. NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2019
  • ShortFuse: Biomedical Time Series Representations in the Presence of Structured Information. Proceedings of machine learning research Fiterau, M., Bhooshan, S., Fries, J., Bournhonesque, C., Hicks, J., Halilaj, E., Ré, C., Delp, S. 2017; 68: 59–74

    Abstract

    In healthcare applications, temporal variables that encode movement, health status and longitudinal patient evolution are often accompanied by rich structured information such as demographics, diagnostics and medical exam data. However, current methods do not jointly optimize over structured covariates and time series in the feature extraction process. We present ShortFuse, a method that boosts the accuracy of deep learning models for time series by explicitly modeling temporal interactions and dependencies with structured covariates. ShortFuse introduces hybrid convolutional and LSTM cells that incorporate the covariates via weights that are shared across the temporal domain. ShortFuse outperforms competing models by 3% on two biomedical applications, forecasting osteoarthritis-related cartilage degeneration and predicting surgical outcomes for cerebral palsy patients, matching or exceeding the accuracy of models that use features engineered by domain experts.

    View details for PubMedID 30882086

  • Brundlefly at SemEval-2016 Task 12: Recurrent Neural Networks vs. Joint Inference for Clinical Temporal Information Extraction Fries, J. A. 2016: 1274–79

    View details for DOI 10.18653/v1/S16-1198
