Instructor, Medicine - Oncology
B.A.Sc., University of British Columbia, Engineering Physics, Electrical Engineering Option (2006)
Ph.D., Harvard University, Engineering Sciences (2012)
ERROR! No headcode.htm file found.
Cancer progression is driven by both somatic copy number aberrations (CNAs) and chromatin remodeling, yet little is known about the interplay between these two classes of events in shaping the clonal diversity of cancers. We present Alleloscope, a method for allele-specific copy number estimation that can be applied to single-cell DNA- and/or transposase-accessible chromatin-sequencing (scDNA-seq, ATAC-seq) data, enabling combined analysis of allele-specific copy number and chromatin accessibility. On scDNA-seq data from gastric, colorectal and breast cancer samples, with validation using matched linked-read sequencing, Alleloscope finds pervasive occurrence of highly complex, multiallelic CNAs, in which cells that carry varying allelic configurations adding to the same total copy number coevolve within a tumor. On scATAC-seq from two basal cell carcinoma samples and a gastric cancer cell line, Alleloscope detected multiallelic copy number events and copy-neutral loss-of-heterozygosity, enabling dissection of the contributions of chromosomal instability and chromatin remodeling to tumor evolution.
View details for DOI 10.1038/s41587-021-00911-w
View details for PubMedID 34017141
BACKGROUND: The genome of SARS-CoV-2 is susceptible to mutations during viral replication due to the errors generated by RNA-dependent RNA polymerases. These mutations enable the SARS-CoV-2 to evolve into new strains. Viral quasispecies emerge from de novo mutations that occur in individual patients. In combination, these sets of viral mutations provide distinct genetic fingerprints that reveal the patterns of transmission and have utility in contact tracing.METHODS: Leveraging thousands of sequenced SARS-CoV-2 genomes, we performed a viral pangenome analysis to identify conserved genomic sequences. We used a rapid and highly efficient computational approach that relies on k-mers, short tracts of sequence, instead of conventional sequence alignment. Using this method, we annotated viral mutation signatures that were associated with specific strains. Based on these highly conserved viral sequences, we developed a rapid and highly scalable targeted sequencing assay to identify mutations, detect quasispecies variants, and identify mutation signatures from patients. These results were compared to the pangenome genetic fingerprints.RESULTS: We built a k-mer index for thousands of SARS-CoV-2 genomes and identified conserved genomics regions and landscape of mutations across thousands of virus genomes. We delineated mutation profiles spanning common genetic fingerprints (the combination of mutations in a viral assembly) and a combination of mutations that appear in only a small number of patients. We developed a targeted sequencing assay by selecting primers from the conserved viral genome regions to flank frequent mutations. Using a cohort of 100 SARS-CoV-2 clinical samples, we identified genetic fingerprints consisting of strain-specific mutations seen across populations and de novo quasispecies mutations localized to individual infections. We compared the mutation profiles of viral samples undergoing analysis with the features of the pangenome.CONCLUSIONS: We conducted an analysis for viral mutation profiles that provide the basis of genetic fingerprints. Our study linked pangenome analysis with targeted deep sequenced SARS-CoV-2 clinical samples. We identified quasispecies mutations occurring within individual patients and determined their general prevalence when compared to over 70,000 other strains. Analysis of these genetic fingerprints may provide a way of conducting molecular contact tracing.
View details for DOI 10.1186/s13073-021-00882-2
View details for PubMedID 33875001
Cancer cell lines are not homogeneous nor are they static in their genetic state and biological properties. Genetic, transcriptional and phenotypic diversity within cell lines contributes to the lack of experimental reproducibility frequently observed in tissue-culture-based studies. While cancer cell line heterogeneity has been generally recognized, there are no studies which quantify the number of clones that coexist within cell lines and their distinguishing characteristics. We used a single-cell DNA sequencing approach to characterize the cellular diversity within nine gastric cancer cell lines and integrated this information with single-cell RNA sequencing. Overall, we sequenced the genomes of 8824 cells, identifying between 2 and 12 clones per cell line. Using the transcriptomes of more than 28 000 single cells from the same cell lines, we independently corroborated 88% of the clonal structure determined from single cell DNA analysis. For one of these cell lines, we identified cell surface markers that distinguished two subpopulations and used flow cytometry to sort these two clones. We identified substantial proportions of replicating cells in each cell line, assigned these cells to subclones detected among the G0/G1 population and used the proportion of replicating cells per subclone as a surrogate of each subclone's growth rate.
View details for DOI 10.1093/nargab/lqaa016
View details for PubMedID 32215369
View details for PubMedCentralID PMC7079336
The tumor microenvironment (TME) consists of a heterogenous cellular milieu that can influence cancer cell behavior. Its characteristics havean impact on treatments such as immunotherapy. These features can be revealed with single-cell RNA sequencing (scRNA-seq). We hypothesized that scRNA-seq analysis ofgastric cancer (GC) together with paired normal tissue and peripheral blood mononuclear cells (PBMCs) would identify critical elements of cellular deregulation not apparent with other approaches.scRNA-seq was conducted on seven patients with GC and one patient with intestinal metaplasia. We sequenced 56,167 cells comprising GC (32,407 cells), paired normal tissue (18,657 cells) and PBMCs (5,103 cells). Protein expression was validated by multiplex immunofluorescence.Tumor epithelium had copy number alterations, a distinct gene expression program from normal, with intra-tumor heterogeneity. GC TME was significantly enriched for stromal cells, macrophages, dendritic cells (DCs) and Tregs. TME-exclusive stromal cells expressed distinct extracellular matrix components than normal. Macrophages were transcriptionally heterogenous and did not conform to a binary M1/M2 paradigm. Tumor-DCs had a unique gene expression program compared to PBMC DCs. TME-specific cytotoxic T cells were exhausted with two heterogenous subsets. Helper, cytotoxic T, Treg and NK cells expressed multiple immune checkpoint or costimulatory molecules. Receptor-ligand analysis revealed TME-exclusive inter-cellular communication.Single-cell gene expression studies revealed widespread reprogramming across multiple cellular elements in the GC TME. Cellular remodeling was delineated by changes in cell numbers, transcriptional states and inter-cellular interactions. This characterization facilitates understanding of tumor biology and enables identification of novel targets including for immunotherapy.
View details for DOI 10.1158/1078-0432.CCR-19-3231
View details for PubMedID 32060101
View details for Web of Science ID 000615970409020
The diverse cellular milieu of the gastric tissue microenvironment plays a critical role in normal tissue homeostasis and tumor development. However, few cell culture model can recapitulate the tissue microenvironment and intercellular signaling in vitro. We used a primary tissue culture system to generate a murine p53 null gastric tissue model containing both epithelium and mesenchymal stroma. To characterize the microenvironment and niche signaling, we used single cell RNA sequencing (scRNA-Seq) to determine the transcriptomes of 4,391 individual cells. Based on specific markers, we identified epithelial cells, fibroblasts and macrophages in initial tissue explants during organoid formation. The majority of macrophages were polarized towards wound healing and tumor promotion M2-type. During the course of time, the organoids maintained both epithelial and fibroblast lineages with the features of immature mouse gastric stomach. We detected a subset of cells in both lineages expressing Lgr5, one of the stem cell markers. We examined the lineage-specific Wnt signaling activation, and identified that Rspo3 was specifically expressed in the fibroblast lineage, providing an endogenous source of the R-spondin to activate Wnt signaling. Our studies demonstrate that this primary tissue culture system enables one to study gastric tissue niche signaling and immune response in vitro.
View details for PubMedID 30872643
Some gastric cancers have FGFR2 amplifications, making them sensitive to FGFR inhibitors. However, cancer cells inevitably develop resistance despite initial response. The underlying resistance mechanism to FGFR inhibition is unclear. In this study, we applied a kinome-wide CRISPR/Cas9 screen to systematically identify kinases that are determinants of sensitivity to a potent FGFR inhibitor AZD4547 in KatoIII cells, a gastric cancer cell line with FGFR2 amplification. In total, we identified 20 kinases, involved in ILK, SRC, and EGFR signaling pathways, as determinants that alter cell sensitivity to FGFR inhibition. We functionally validated the top negatively selected and positively selected kinases, ILK and CSK, from the CRISPR/Cas9 screen using RNA interference. We observed synergistic effects on KatoIII cells as well as three additional gastric cancer cell lines with FGFR2 amplification when AZD4547 was combined with small molecular inhibitors Cpd22 and lapatinib targeting ILK and EGFR/HER2, respectively. Furthermore, we demonstrated that GSK3b is one of the downstream effectors of ILK upon FGFR inhibition. In summary, our study systematically evaluated the kinases and associated signaling pathways modulating cell response to FGFR inhibition, and for the first time, demonstrated that targeting ILK would enhance the effectiveness of AZD4547 treatment of gastric tumors with amplifications of FGFR2.
View details for PubMedID 31076567
View details for Web of Science ID 000535355700022
Molecular analysis of DNA samples with limited quantities can be challenging. Repeatedly sequencing the original DNA molecules from a given sample would overcome many issues related to accurate genetic analysis and mitigate issues with processing small amounts of DNA analyte. Moreover, an iterative, replicated analysis of the same DNA molecule has the potential to improve genetic characterization. Herein, we demonstrate that the use of 'click'-based attachment of DNA sequencing libraries onto an agarose bead support enables repetitive primer extension assays for specific genomic DNA targets such as gene exons. We validated the performance of this assay for evaluating specific genetic alterations in both normal and cancer reference standard DNA samples. We demonstrate the stability of conjugated DNA libraries and related sequencing results over the course of independent serial assays spanning several months from the same set of samples. Finally, we finally applied this method to DNA derived from a tumor sample and demonstrated improved mutation detection accuracy.
View details for PubMedID 30652472
Digital PCR (dPCR) relies on the analysis of individual partitions to accurately quantify nucleic acid species. The most widely used analysis method requires manual clustering through individual visual inspection. Some automated analysis methods have emerged but do not robustly account for multiplexed targets, low target concentration, and assay noise. In this study, we describe an open source analysis software called Calico that uses "data gridding" to increase the sensitivity of clustering toward small clusters. Our workflow also generates quality score metrics in order to gauge and filter individual assay partitions by how well they were classified. We applied our analysis algorithm to multiplexed droplet-based digital PCR data sets in both EvaGreen and probes-based schemes, and targeted the oncogenic BRAF V600E and KRAS G12D mutations. We demonstrate an automated clustering sensitivity of down to 0.1% mutant fraction and filtering of artifactual assay partitions from low quality DNA samples. Overall, we demonstrate a vastly improved approach to analyzing ddPCR data that can be applied to clinical use, where automation and reproducibility are critical.
View details for PubMedID 29083143
Genomic instability is a frequently occurring feature of cancer that involves large-scale structural alterations. These somatic changes in chromosome structure include duplication of entire chromosome arms and aneuploidy where chromosomes are duplicated beyond normal diploid content. However, the accurate determination of aneuploidy events in cancer genomes is a challenge. Recent advances in sequencing technology allow the characterization of haplotypes that extend megabases along the human genome using high molecular weight (HMW) DNA. For this study, we employed a library preparation method in which sequence reads have barcodes linked to single HMW DNA molecules. Barcode-linked reads are used to generate extended haplotypes on the order of megabases. We developed a method that leverages haplotypes to identify chromosomal segmental alterations in cancer and uses this information to join haplotypes together, thus extending the range of phased variants. With this approach, we identified mega-haplotypes that encompass entire chromosome arms. We characterized the chromosomal arm changes and aneuploidy events in a manner that offers similar information as a traditional karyotype but with the benefit of DNA sequence resolution. We applied this approach to characterize aneuploidy and chromosomal alterations from a series of primary colorectal cancers.
View details for PubMedID 28977555
View details for PubMedCentralID PMC5737808
RNA-Seq measures gene expression by counting sequence reads belonging to unique cDNA fragments. Molecular barcodes commonly in the form of random nucleotides were recently introduced to improve gene expression measures by detecting amplification duplicates, but are susceptible to errors generated during PCR and sequencing. This results in false positive counts, leading to inaccurate transcriptome quantification especially at low input and single-cell RNA amounts where the total number of molecules present is minuscule. To address this issue, we demonstrated the systematic identification of molecular species using transposable error-correcting barcodes that are exponentially expanded to tens of billions of unique labels.We experimentally showed random-mer molecular barcodes suffer from substantial and persistent errors that are difficult to resolve. To assess our method's performance, we applied it to the analysis of known reference RNA standards. By including an inline random-mer molecular barcode, we systematically characterized the presence of sequence errors in random-mer molecular barcodes. We observed that such errors are extensive and become more dominant at low input amounts.We described the first study to use transposable molecular barcodes and its use for studying random-mer molecular barcode errors. Extensive errors found in random-mer molecular barcodes may warrant the use of error correcting barcodes for transcriptome analysis as input amounts decrease.
View details for PubMedID 28934929
Microsatellites are multi-allelic and composed of short tandem repeats (STRs) with individual motifs composed of mononucleotides, dinucleotides or higher including hexamers. Next-generation sequencing approaches and other STR assays rely on a limited number of PCR amplicons, typically in the tens. Here, we demonstrate STR-Seq, a next-generation sequencing technology that analyses over 2,000 STRs in parallel, and provides the accurate genotyping of microsatellites. STR-Seq employs in vitro CRISPR-Cas9-targeted fragmentation to produce specific DNA molecules covering the complete microsatellite sequence. Amplification-free library preparation provides single molecule sequences without unique molecular barcodes. STR-selective primers enable massively parallel, targeted sequencing of large STR sets. Overall, STR-Seq has higher throughput, improved accuracy and provides a greater number of informative haplotypes compared with other microsatellite analysis approaches. With these new features, STR-Seq can identify a 0.1% minor genome fraction in a DNA mixture composed of different, unrelated samples.
View details for DOI 10.1038/ncomms14291
View details for PubMedID 28169275
Genome rearrangements are critical oncogenic driver events in many malignancies. However, the identification and resolution of the structure of cancer genomic rearrangements remain challenging even with whole genome sequencing.To identify oncogenic genomic rearrangements and resolve their structure, we analyzed linked read sequencing. This approach relies on a microfluidic droplet technology to produce libraries derived from single, high molecular weight DNA molecules, 50 kb in size or greater. After sequencing, the barcoded sequence reads provide long range genomic information, identify individual high molecular weight DNA molecules, determine the haplotype context of genetic variants that occur across contiguous megabase-length segments of the genome and delineate the structure of complex rearrangements. We applied linked read sequencing of whole genomes to the analysis of a set of synchronous metastatic diffuse gastric cancers that occurred in the same individual.When comparing metastatic sites, our analysis implicated a complex somatic rearrangement that was present in the metastatic tumor. The oncogenic event associated with the identified complex rearrangement resulted in an amplification of the known cancer driver gene FGFR2. With further investigation using these linked read data, the FGFR2 copy number alteration was determined to be a deletion-inversion motif that underwent tandem duplication, with unique breakpoints in each metastasis. Using a three-dimensional organoid tissue model, we functionally validated the metastatic potential of an FGFR2 amplification in gastric cancer.Our study demonstrates that linked read sequencing is useful in characterizing oncogenic rearrangements in cancer metastasis.
View details for PubMedID 28629429
We describe a single-color digital PCR assay that detects and quantifies cancer mutations directly from circulating DNA collected from the plasma of cancer patients. This approach relies on a double-stranded DNA intercalator dye and paired allele-specific DNA primer sets to determine an absolute count of both the mutation and wild-type-bearing DNA molecules present in the sample. The cell-free DNA assay uses an input of 1 ng of nonamplified DNA, approximately 300 genome equivalents, and has a molecular limit of detection of three mutation DNA genome-equivalent molecules per assay reaction. When using more genome equivalents as input, we demonstrated a sensitivity of 0.10% for detecting the BRAF V600E and KRAS G12D mutations. We developed several mutation assays specific to the cancer driver mutations of patients' tumors and detected these same mutations directly from the nonamplified, circulating cell-free DNA. This rapid and high-performance digital PCR assay can be configured to detect specific cancer mutations unique to an individual cancer, making it a potentially valuable method for patient-specific longitudinal monitoring.
View details for PubMedID 28818432
Haplotyping of human chromosomes is a prerequisite for cataloguing the full repertoire of genetic variation. We present a microfluidics-based, linked-read sequencing technology that can phase and haplotype germline and cancer genomes using nanograms of input DNA. This high-throughput platform prepares barcoded libraries for short-read sequencing and computationally reconstructs long-range haplotype and structural variant information. We generate haplotype blocks in a nuclear trio that are concordant with expected inheritance patterns and phase a set of structural variants. We also resolve the structure of the EML4-ALK gene fusion in the NCI-H2228 cancer cell line using phased exome sequencing. Finally, we assign genetic aberrations to specific megabase-scale haplotypes generated from whole-genome sequencing of a primary colorectal adenocarcinoma. This approach resolves haplotype information using up to 100 times less genomic DNA than some methods and enables the accurate detection of structural variants.
View details for DOI 10.1038/nbt.3432
View details for PubMedID 26829319
View details for PubMedCentralID PMC4786454
In this study, we present a highly customizable method for quantifying copy number and point mutations utilizing a single-color, droplet digital PCR platform. Droplet digital polymerase chain reaction (ddPCR) is rapidly replacing real-time quantitative PCR (qRT-PCR) as an efficient method of independent DNA quantification. Compared to quantative PCR, ddPCR eliminates the needs for traditional standards; instead, it measures target and reference DNA within the same well. The applications for ddPCR are widespread including targeted quantitation of genetic aberrations, which is commonly achieved with a two-color fluorescent oligonucleotide probe (TaqMan) design. However, the overall cost and need for optimization can be greatly reduced with an alternative method of distinguishing between target and reference products using the nonspecific DNA binding properties of EvaGreen (EG) dye. By manipulating the length of the target and reference amplicons, we can distinguish between their fluorescent signals and quantify each independently. We demonstrate the effectiveness of this method by examining copy number in the proto-oncogene FLT3 and the common V600E point mutation in BRAF. Using a series of well-characterized control samples and cancer cell lines, we confirmed the accuracy of our method in quantifying mutation percentage and integer value copy number changes. As another novel feature, our assay was able to detect a mutation comprising less than 1% of an otherwise wild-type sample, as well as copy number changes from cancers even in the context of significant dilution with normal DNA. This flexible and cost-effective method of independent DNA quantification proves to be a robust alternative to the commercialized TaqMan assay.
View details for DOI 10.1021/ac403843j
View details for PubMedID 24483992
Plasmid loss rate measurements are standard in microbiology and key to understanding plasmid stabilization mechanisms. The conventional assays eliminate selection for plasmids at the beginning of the experiment and screen for the appearance of plasmid-free cells over long-term population growth. However, it has been long appreciated in plasmid biology that the growth rate differential between plasmid-free and plasmid-containing cells at some point overshadows the effect of primary loss events, such that the assays can greatly over-estimate inherent loss rates. The standard solutions to this problem are to either consider the very early phase of loss where the fraction of plasmid-free cells increases linearly, or to measure the growth rate difference either by following the population for longer time or by measuring growth rates separately. Here we mathematically show that in all these cases, seemingly small experimental errors in the growth rate estimates can overshadow the estimates of the loss rates. For many plasmids, loss rates may thus be much lower than previously thought, and for some plasmids, the estimated loss rate may have nothing to do with actual loss rates. We further modify two independent experimental methods to separate inherent losses from growth differences and apply them to the same plasmids. First we use a high-throughput microscopy-based approach to screen for plasmid-free cells at extremely short time scales--tens of minutes rather than tens of generations--and apply it to a par? version of mini-R1. Second we modify a counterselection-based plasmid loss assay inspired by the Luria-Delbrück fluctuation test that completely separates losses from growth, and apply it to various R1 and pSC101 derivatives. Concordant results from the two assays suggest that plasmids are lost at a lower frequency than previously believed. In fact, for par? mini-R1 the observed loss rate of about 10?³ per cell and generation seems to be so low as to be inconsistent with what we know about the R1 stabilization mechanisms, suggesting these well characterized plasmids may have some additional and so far unknown stabilization mechanisms, for example improving copy number control or partitioning at cell division.
View details for DOI 10.1016/j.plasmid.2013.07.007
View details for Web of Science ID 000328175600007
View details for PubMedID 24042048
View details for PubMedCentralID PMC3966108