Current Role at Stanford
I'm currently working as a staff research scientist in the Shah Lab and as a research scientist at Snorkel AI. My interests lie at the intersection of computer science and medical informatics. My research interests include:
- Machine learning with limited labeled data, e.g., weak supervision, self-supervision, and few-shot learning.
- Multimodal learning, e.g., combining text, imaging, video, and electronic health record data to improve clinical outcome prediction.
- Human-in-the-loop machine learning systems.
- Knowledge graphs and their use in improving representation learning.
Weakly supervised classification of rare aortic valve malformations using unlabeled cardiac MRI sequences, Stanford University (10/1/2017 - Present)
This work explores training deep learning models for detecting cardiac pathologies using large-scale, unlabeled MRI video data available as part of the UK Biobank.
- Chris Re, Mr, Stanford University
- Euan Ashley, Professor, Stanford University Cardiology
- James Priest, Adjunct Clinical Assistant Professor, Stanford University Medical Center
Snorkel: Rapid Training Data Creation with Weak Supervision, Stanford University (6/1/2016 - Present)
Labeling training data is increasingly the largest bottleneck in deploying machine learning systems. We present Snorkel, a first-of-its-kind system that enables users to train state-of-the-art models without hand labeling any training data. Instead, users write labeling functions that express arbitrary heuristics, which can have unknown accuracies and correlations. Snorkel denoises their outputs without access to ground truth by incorporating the first end-to-end implementation of our recently proposed machine learning paradigm, data programming. We present a flexible interface layer for writing labeling functions based on our experience over the past year collaborating with companies, agencies, and research labs. In a user study, subject matter experts built models 2.8x faster and increased predictive performance by an average of 45.5% compared with seven hours of hand labeling. We study the modeling tradeoffs in this new setting and propose an optimizer for automating tradeoff decisions that gives up to a 1.8x speedup per pipeline execution. In two collaborations, with the U.S. Department of Veterans Affairs and the U.S. Food and Drug Administration, and on four open-source text and image data sets representative of other deployments, Snorkel provides a 132% average improvement in predictive performance over prior heuristic approaches and comes within an average of 3.60% of the predictive performance of large hand-curated training sets.
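To illustrate the labeling-function idea described above, here is a minimal Python sketch. It is not Snorkel's API and does not implement data programming: the function names, keyword heuristics, and example notes are hypothetical, and a simple majority vote stands in for the learned denoising step.

```python
from collections import Counter

# Label values; ABSTAIN means a labeling function has no opinion.
ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1


# Hypothetical labeling functions: each encodes one noisy heuristic.
def lf_mentions_stenosis(note):
    return POSITIVE if "stenosis" in note.lower() else ABSTAIN


def lf_negated_finding(note):
    return NEGATIVE if "no evidence of" in note.lower() else ABSTAIN


def lf_normal_valve(note):
    return NEGATIVE if "normal" in note.lower() else ABSTAIN


LABELING_FUNCTIONS = [lf_mentions_stenosis, lf_negated_finding, lf_normal_valve]


def majority_vote(note):
    """Combine labeling-function votes; abstain on ties or when no function fires.

    This is a simple stand-in. Snorkel instead estimates the accuracies and
    correlations of the labeling functions without ground truth.
    """
    votes = [lf(note) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting votes with equal support
    return ranked[0][0]


if __name__ == "__main__":
    notes = [
        "Severe aortic stenosis noted.",
        "No evidence of stenosis; normal valve.",
        "Patient seen for follow-up.",
    ]
    for note in notes:
        print(majority_vote(note), note)
```

The labels produced this way are noisy, and the vote treats every heuristic as equally reliable. Weighting each labeling function by an estimated accuracy is the part this sketch deliberately leaves out.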
- Alex Ratner, PhD Student, Stanford University
- Steven Bach, Assistant Professor, Brown University
- Henry Ehrenberg, Software Engineer, Facebook
- Sen Wu, PhD Student, Stanford University
- Christopher Ré, Associate Professor, Computer Science, Stanford University
Service, Volunteer and Community Work
Co-organizer for Machine Learning for Health Workshop @ NeurIPS, NeurIPS (12/2016 - 12/2018)
Machine Learning for Health Workshop @ NeurIPS
Area Chair @ Machine Learning for Healthcare Conference (MLHC), Stanford University (2019 - 2021)
Palo Alto, CA