No sample size calculations were done to predetermine group sizes, and investigators were not blinded during randomization and outcome assessments.
Analysis of EGFR variants in MD Anderson Cancer Center GEMINI, Foundation Medicine, Guardant Health and cBioPortal databases
To analyse the numbers and frequencies of different EGFR mutations among patients with NSCLC in the MD Anderson Cancer Center GEMINI database, the database was queried for patients with EGFR mutations (n = 1,054), and these mutations were manually curated as classical or atypical. The MD Anderson Cancer Center GEMINI database is prospectively collected from patients who consented to and were enrolled on protocol number PA13-0589 in accordance with the MD Anderson Institutional Review Board.
EGFR mutations were determined by CLIA-certified methods from formalin-fixed paraffin-embedded tumours or, for blood samples, by digital-droplet PCR, as previously described18,37. In brief, samples from MD Anderson Cancer Center were collected through molecular pathology, and mutations were determined by next-generation sequencing (NGS) panels of tumour tissue DNA (MD Anderson Cancer Center Molecular Diagnostics Laboratory). The MD Anderson Molecular Diagnostics Laboratory assay is an NGS-based tissue molecular profiling method that initially detected mutations in hotspot regions of 50 genes; in April 2016, it was expanded to analyse 134 unique genes, detecting somatic mutations in the coding sequences of 128 genes and selected copy number variations (amplifications) in 49 genes. Moffitt Cancer Center used diagnostic methods such as Clarinet (bi-directional sequencing of EGFR exons 18–21), pyrosequencing of the EGFR gene (exons 18–21) and Moffitt Illumina TruSight Tumor 26 (TST26). Moffitt TruSight is an Illumina NGS platform with a panel of 170 genes. Commercial NGS platforms, including FoundationOne and Guardant360, were used by both MD Anderson and Moffitt Cancer Center, as described below.
To identify patients with EGFR mutations in the Foundation Medicine database, patient samples collected between November 2011 and May 2020 that had previously undergone hybrid-capture-based comprehensive genomic profiling of formalin-fixed paraffin-embedded tissue or plasma with previously validated assays38,39 were analysed for EGFR mutations (n = 10,221). Patients were stratified by EGFR mutation, and EGFR mutations were manually curated as classical or atypical. Classical EGFR mutations were defined as L858R point mutations, T790M mutations and exon 19 deletions, including any deletion in exon 19 beginning at amino acid E746 or L747 and ending at amino acid A755. Deletions accompanied by insertions (deletion–insertions) were also considered classical exon 19 deletions. Atypical EGFR mutations were defined as non-synonymous mutations that did not meet the definition of classical mutations. Patients whose EGFR mutation sequence was unknown were excluded from the analysis.
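The classical/atypical curation rule above can be expressed as a small classifier. This is a sketch only, not the authors' curation pipeline: it assumes protein-level variant strings without exon annotation (a hypothetical input format), and it reads "ending at amino acid A755" as "ending at or before A755", which is our interpretation. Unknown or unparseable sequences return None so they can be excluded.

```python
import re
from typing import Optional

# Hypothetical input format: protein-level EGFR variant strings such as
# "L858R", "E746_A750del", "L747_P753delinsS" or "D770_N771insNPG".
POINT = re.compile(r"^([A-Z])(\d+)([A-Z*])$")
DELETION = re.compile(r"^([A-Z])(\d+)(?:_([A-Z])(\d+))?del(?:ins([A-Z]+))?$")
OTHER = re.compile(r"^[A-Z]\d+(?:_[A-Z]\d+)?(?:ins|dup)[A-Z]*$")

CLASSICAL_POINT = {"L858R", "T790M"}
EX19_STARTS = {("E", 746), ("L", 747)}
EX19_MAX_END = 755  # assumption: "ending at A755" read as "ending at or before A755"
UNKNOWN = {"", "unknown", "na"}


def classify_egfr(variant: Optional[str]) -> Optional[str]:
    """Return 'classical', 'atypical', 'synonymous', or None for unknown sequences."""
    if variant is None:
        return None
    v = variant.strip()
    if v.startswith("p."):
        v = v[2:]
    if v.lower() in UNKNOWN:
        return None  # unknown mutation sequence: excluded from the analysis

    point = POINT.match(v)
    if point:
        ref, _, alt = point.groups()
        if ref == alt:
            return "synonymous"  # only non-synonymous mutations can be atypical
        return "classical" if v in CLASSICAL_POINT else "atypical"

    deletion = DELETION.match(v)
    if deletion:
        ref, start, _, end, _ = deletion.groups()
        start_pos = int(start)
        end_pos = int(end) if end else start_pos
        # Exon 19 deletions (deletion-insertions included) starting at E746 or L747.
        if (ref, start_pos) in EX19_STARTS and end_pos <= EX19_MAX_END:
            return "classical"
        return "atypical"

    if OTHER.match(v):
        return "atypical"  # insertions and duplications, e.g. exon 20 insertions
    return None  # unparseable strings are treated as unknown
```

For example, `classify_egfr("E746_A750del")` returns `"classical"`, whereas `classify_egfr("G719A")` returns `"atypical"`.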
To determine the frequency of individual EGFR variants reported across the MD Anderson GEMINI, cBioPortal, Foundation Medicine and Guardant Health databases, each database was analysed separately, and the average frequency across all databases was calculated. To determine the frequency of atypical mutations in the MD Anderson GEMINI and Foundation Medicine databases, atypical mutations were identified as described above, and the total number of known EGFR mutations across all patients was tabulated. For cBioPortal, all non-overlapping studies were selected and exported; for overlapping studies, only the largest dataset was used, and all known EGFR mutations were tabulated. To determine the frequencies of EGFR variants in Guardant Health data, a database of sequenced circulating free DNA (cfDNA), the Guardant360 clinical database was searched for NSCLC samples tested between November 2016 and November 2019 harbouring EGFR mutations (n = 5,026 patients). Guardant360 is a CLIA-certified, CAP/NYSDOH-accredited comprehensive cfDNA NGS test that reports SNVs, indels, fusions and copy number alterations in up to 73 genes. All four datasets reported here, including the Guardant360 clinical database, are enriched for North American patients with NSCLC; the frequency of atypical EGFR mutations may therefore differ in Asia or other regions.
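The per-database frequency and cross-database average described above can be sketched as follows. The sketch assumes that each variant's frequency is its count divided by the total number of known EGFR mutations in that database, that the cross-database value is an unweighted mean, and that a variant absent from a database contributes a frequency of 0; the text does not state how absent variants were handled.

```python
from collections import Counter
from typing import Dict, Iterable


def variant_frequencies(variants: Iterable[str]) -> Dict[str, float]:
    """Frequency of each variant among all known EGFR mutations in one database."""
    counts = Counter(variants)
    total = sum(counts.values())
    return {v: n / total for v, n in counts.items()}


def average_across_databases(databases: Dict[str, Iterable[str]]) -> Dict[str, float]:
    """Unweighted mean of per-database frequencies (assumption: absent variant = 0)."""
    per_db = [variant_frequencies(v) for v in databases.values()]
    all_variants = set().union(*per_db)
    return {
        v: sum(freqs.get(v, 0.0) for freqs in per_db) / len(per_db)
        for v in sorted(all_variants)
    }
```

Averaging per-database frequencies, rather than pooling raw counts, keeps the largest database from dominating the estimate.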
Analysis of TTF in MD Anderson Cancer Center GEMINI and Moffitt Cancer Center
To determine TTF after EGFR TKI treatment, patients with NSCLC harbouring an EGFR mutation in the tyrosine kinase domain (exons 18–22) were identified in the MD Anderson GEMINI and Moffitt Cancer Center databases. Data collection for Moffitt Cancer Center (MCC) patients was performed under protocol MCC 19161, which was formally reviewed and approved by MCC in accordance with the Declaration of Helsinki and the 21st Century Cures Act. Patients were stratified into classical (L858R or Ex19del, as defined above) or atypical (non-classical) mutation groups. In the MD Anderson GEMINI database, 333 patients with NSCLC had tumours harbouring atypical mutations; of these, 88 received at least one line of EGFR TKI treatment. In addition, 21 patients with NSCLC with tumours harbouring atypical EGFR mutations were identified at Moffitt Cancer Center. Clinical parameters were extracted from the respective databases. Patients who had previously received chemotherapy were included, and outcomes were recorded only for the first EGFR TKI each patient received. TTF was determined as previously described18 and was defined as the time from commencement of EGFR TKI treatment to radiologic progression, TKI discontinuation or death; it was not based on RECIST criteria. For patients treated beyond progression, radiologic progression was recorded as the end point. The data cut-off was May 2021. Median TTF was calculated using the Kaplan–Meier method. HR and P values were determined using GraphPad Prism software and two-sided Mantel–Cox log-rank tests.
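The Kaplan–Meier median and the two-sided Mantel–Cox log-rank test can be sketched in plain Python as below. This is an illustrative re-implementation, not GraphPad Prism's code. It assumes events are coded 1 for an observed TTF failure (progression, discontinuation or death) and 0 for censoring, takes the median as the first failure time at which the curve falls to 0.5 or below (conventions differ when the curve sits exactly at 0.5), and omits the hazard ratio. The p-value uses the chi-square distribution with one degree of freedom, for which P = erfc(sqrt(chi2 / 2)).

```python
import math
from typing import List, Optional, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate; events: 1 = failure observed, 0 = censored.

    Returns (time, survival) at each distinct failure time.
    """
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        failures = censored = 0
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                failures += 1
            else:
                censored += 1
            i += 1
        if failures:
            surv *= 1.0 - failures / at_risk
            curve.append((t, surv))
        at_risk -= failures + censored  # subjects censored at t were still at risk at t
    return curve


def km_median(times: Sequence[float], events: Sequence[int]) -> Optional[float]:
    """First failure time with S(t) <= 0.5, or None if the median is not reached."""
    for t, s in kaplan_meier(times, events):
        if s <= 0.5:
            return t
    return None


def logrank_test(times_a, events_a, times_b, events_b) -> Tuple[float, float]:
    """Two-sided Mantel–Cox log-rank test; returns (chi-square, p-value)."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    obs_minus_exp = variance = 0.0
    for t in sorted({t for t, e, _ in pooled if e}):
        n_a = sum(1 for ti, _, g in pooled if g == 0 and ti >= t)
        n_b = sum(1 for ti, _, g in pooled if g == 1 and ti >= t)
        d_a = sum(1 for ti, e, g in pooled if g == 0 and ti == t and e)
        d = sum(1 for ti, e, _ in pooled if ti == t and e)
        n = n_a + n_b
        obs_minus_exp += d_a - d * n_a / n  # observed minus expected failures in group A
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0  # no information to compare the groups
    chi2 = obs_minus_exp ** 2 / variance
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

The same two functions would apply unchanged to OS, PFI and duration of treatment, since only the definition of the event differs.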
Analysis of OS and PFI from cBioPortal database
For overall survival (OS) and progression-free interval (PFI), patients in cBioPortal who received any treatment and had survival information and a qualifying EGFR mutation were analysed as previously described19. These data were curated from cBioPortal by selecting all non-overlapping studies of NSCLC; for overlapping studies, the largest dataset was selected. PFI and OS analyses were restricted to mutations in the tyrosine kinase domain. Median OS and median PFI were calculated using the Kaplan–Meier method. HR and P values were determined using GraphPad Prism software and two-sided Mantel–Cox log-rank tests.
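One way to implement the "largest dataset wins" rule for overlapping cBioPortal studies is a greedy selection, sketched below. Both the overlap criterion (sharing at least one patient or sample identifier) and the greedy order are assumptions for illustration; the text does not specify how overlap was detected.

```python
from typing import Dict, List, Set


def select_non_overlapping(studies: Dict[str, Set[str]]) -> List[str]:
    """Keep studies largest-first, skipping any study that shares an ID with a kept one.

    `studies` maps a (hypothetical) study name to its set of patient or sample IDs.
    """
    kept: List[str] = []
    seen: Set[str] = set()
    # Largest study first; ties broken by name so the result is deterministic.
    for name in sorted(studies, key=lambda s: (-len(studies[s]), s)):
        ids = studies[name]
        if ids & seen:
            continue  # overlaps a larger study that was already kept
        kept.append(name)
        seen |= ids
    return kept
```

A greedy pass is simple and reproducible, but it is not guaranteed to maximize the total number of patients retained.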
Ba/F3 cell generation, drug screening and IC50 approximations
Ba/F3 cells were obtained as a gift from G. Mills (MD Anderson Cancer Center) and maintained in RPMI (Sigma) containing 10% FBS, 1% penicillin-streptomycin and 10 ng ml−1 recombinant mIL-3 (R&D Systems). To establish stable Ba/F3 cell lines, Ba/F3 cells were transduced with retroviruses containing mutant EGFR plasmids for 12–24 h. Retroviruses were generated by Lipofectamine 2000 (Invitrogen) transfection of Phoenix 293T-ampho cells (Orbigen) with the pBabe-Puro-based vectors listed in Supplementary Table 7. Vectors were generated by GenScript or Bioinnovatise using the parental vectors from Addgene listed in Supplementary Table 7. At 48–72 h after transduction, 2 µg ml−1 puromycin (Invitrogen) was added to Ba/F3 cell lines in complete RPMI. To select EGFR-positive cells, cells were stained with PE-EGFR (BioLegend) and sorted by fluorescence-activated cell sorting. After sorting, EGFR-positive cells were maintained in RPMI containing 10% FBS, 1% penicillin-streptomycin and 1 ng ml−1 EGF to support cell viability. Drug screening was performed as previously described22,36. Briefly, cells were plated in 384-well plates (Greiner Bio-One) at 2,000–3,000 cells per well in technical triplicate. Seven different concentrations of TKIs or DMSO vehicle were added to reach a final volume of 40 µl per well. After 72 h, 11 µl of CellTiter-Glo (Promega) was added to each well. Plates were incubated for a minimum of 10 min, and bioluminescence was measured using a FLUOstar OPTIMA plate reader (BMG LABTECH). Raw bioluminescence values were normalized to those of DMSO-treated control cells and plotted in GraphPad Prism. Normalized data were fitted by non-linear regression with a variable slope, and IC50 values were determined in GraphPad Prism by interpolating the concentration at 50% inhibition. Drug screens were performed in technical triplicate on each plate, with two or three biological replicates.
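The normalization and variable-slope fit can be approximated without Prism, as sketched below. Assumptions: viability is expressed as a percentage of the mean DMSO signal; the model is the normalized form Y = 100 / (1 + 10^((X − logIC50) × Hill)) with X = log10(concentration), which we assume corresponds to Prism's normalized variable-slope fit; and a coarse-then-fine grid search stands in for Prism's iterative non-linear least squares. Under this normalized model the concentration at 50% inhibition is exactly 10^logIC50.

```python
import math
from typing import List, Sequence, Tuple


def normalize_to_dmso(raw: Sequence[float], dmso: Sequence[float]) -> List[float]:
    """Percent viability relative to the mean DMSO-treated signal."""
    ref = sum(dmso) / len(dmso)
    return [100.0 * r / ref for r in raw]


def variable_slope(log_conc: float, log_ic50: float, hill: float) -> float:
    """Normalized variable-slope inhibition curve (assumed model form)."""
    return 100.0 / (1.0 + 10.0 ** ((log_conc - log_ic50) * hill))


def _grid(start: float, stop: float, step: float) -> List[float]:
    n = int(round((stop - start) / step))
    return [start + i * step for i in range(n + 1)]


def fit_ic50(concs: Sequence[float], viability: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit by grid search (a stand-in for non-linear regression).

    Returns (IC50, Hill slope); IC50 is in the same units as `concs`.
    """
    xs = [math.log10(c) for c in concs]

    def sse(params: Tuple[float, float]) -> float:
        log_ic50, hill = params
        return sum((variable_slope(x, log_ic50, hill) - y) ** 2 for x, y in zip(xs, viability))

    # Coarse search over logIC50 (one decade beyond the tested range) and Hill slope.
    coarse = [(l, h) for l in _grid(min(xs) - 1, max(xs) + 1, 0.05) for h in _grid(0.2, 5.0, 0.05)]
    l0, h0 = min(coarse, key=sse)
    # Fine search around the coarse optimum.
    fine = [(l, h) for l in _grid(l0 - 0.05, l0 + 0.05, 0.001)
            for h in _grid(max(h0 - 0.05, 0.01), h0 + 0.05, 0.001)]
    log_ic50, hill = min(fine, key=sse)
    return 10.0 ** log_ic50, hill
```

With real replicate data, the same fit would be run on each biological replicate, and the IC50 values carried forward to the mutant-to-WT ratios.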
For each drug, mutant-to-WT ratios were calculated by dividing the IC50 value of each mutant cell line by the average IC50 value of Ba/F3 cells expressing WT EGFR supplemented with 10 ng ml−1 EGF. Statistical differences between groups were determined by one-way ANOVA, as described in the figure legends.
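The ratio calculation, together with the median log(Mut/WT) summary used for the heat maps, can be sketched as follows. The log base is not stated in the text; base 10 is assumed here. A positive value means the mutant's IC50 is higher than the WT average (less sensitive); a negative value means it is lower (more sensitive).

```python
import math
from statistics import mean, median
from typing import List, Sequence


def mut_wt_ratios(mut_ic50s: Sequence[float], wt_ic50s: Sequence[float]) -> List[float]:
    """Divide each mutant IC50 (one per replicate) by the mean WT IC50 for the same drug."""
    wt_avg = mean(wt_ic50s)
    return [ic50 / wt_avg for ic50 in mut_ic50s]


def median_log_ratio(mut_ic50s: Sequence[float], wt_ic50s: Sequence[float]) -> float:
    """Median log10(Mut/WT) across replicates (log base is an assumption)."""
    return median([math.log10(r) for r in mut_wt_ratios(mut_ic50s, wt_ic50s)])
```

For example, mutant IC50 values of 10, 100 and 1,000 nM against a WT mean of 10 nM give ratios of 1, 10 and 100, and a median log10 ratio of 1.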
In silico mutational mapping and docking experiments
X-ray structures of wild-type EGFR in complex with AMP-PNP (Protein Data Bank (PDB) ID: 2ITX) or osimertinib (PDB ID: 4ZAU), and of the EGFR L858R mutant in complex with AMP-PNP (PDB ID: 2ITV), were retrieved from the PDB. Molecular Operating Environment (2019.01; Chemical Computing Group (CCG)) was used to generate mutant homology models, construct protein–ligand models and visualize structures. PyMOL was used to visualize mutation locations on WT EGFR (PDB ID: 2ITX) and for structural alignment with EGFR D770insNPG (PDB ID: 4LRM) or EGFR G719S (PDB ID: 2ITN).
Heat map generation
Heat maps and hierarchical clustering were generated by plotting the median log(Mut/WT) value for each cell line and each drug using the ComplexHeatmap package40 (version 2.6.2) in R (R Foundation for Statistical Computing). Hierarchical clustering was based on Euclidean distances between Mut/WT ratios. For co-occurring mutations, mutation order was assigned arbitrarily; for acquired mutations, mutations were listed in the order in which they are observed clinically. Structure–function groups were assigned based on the predicted impact of each mutation on receptor conformation.
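To make the clustering step concrete, below is a minimal agglomerative clustering on Euclidean distances between per-drug profiles. The linkage rule is not stated in the text; complete linkage is assumed here for illustration. Each profile is a (hypothetical) list of values for one cell line across drugs, and the function returns the merge order with merge heights, which is the information a dendrogram displays.

```python
import math
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Tuple

Merge = Tuple[FrozenSet[str], FrozenSet[str], float]


def complete_linkage(profiles: Dict[str, Sequence[float]]) -> List[Merge]:
    """Agglomerative clustering with Euclidean distance and complete linkage (assumed)."""
    clusters = [frozenset([name]) for name in profiles]

    def cluster_distance(c1: FrozenSet[str], c2: FrozenSet[str]) -> float:
        # Complete linkage: distance between the two most dissimilar members.
        return max(math.dist(profiles[a], profiles[b]) for a in c1 for b in c2)

    merges: List[Merge] = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: cluster_distance(*pair))
        height = cluster_distance(c1, c2)
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
        merges.append((c1, c2, height))
    return merges
```

This naive version recomputes distances on every pass, which is fine for tens of cell lines; dedicated libraries use faster update formulas.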
Determination of EGFR groups and subgroups
Mutational mapping was used to separate EGFR mutations into distinct groups based on predicted drug sensitivity. Structural features of EGFR mutations with known drug sensitivity (that is, classical EGFR mutations41,42, T790M43,44,45 and exon 20 insertions22,25) were used as the basis for predicting the impact of mutations on drug sensitivity. Mutational mapping identified four distinct groups: (1) mutations with no obvious effect on the drug-binding pocket (similar to L858R); (2) mutations in the hydrophobic core (similar to T790M); (3) mutations causing a large inward shift of both the αC-helix and the P-loop (similar to exon 20 insertions); and (4) mutations causing a slight inward shift of the αC-helix and/or P-loop, either directly, through changes to the αC-helix and/or P-loop, or indirectly, through alterations of the β-pleated sheets that are predicted to affect the position of the αC-helix and/or P-loop. Groups were validated by hierarchical clustering of the in vitro sensitivity of Ba/F3 cells expressing the various EGFR mutations. Subgroups such as T790M-like-3S/T790M-like-3R and Ex20ins-NL/Ex20ins-FL were defined based on cell line sensitivity data.
Statistical analyses of structure–function groups
For each mutation, Spearman's rho was calculated between the median log(Mut/WT) values for that mutation across drugs and the average median log(Mut/WT) values of the structure–function-based group or exon-based group to which the mutation belongs. For each correlation, the tested mutation was excluded when calculating the structure–function-based and exon-based group averages. Average rho values were compared by two-sided Student's t-test. To determine whether structure–function groups or exon groups were better predictors of drug sensitivity, we performed recursive-partitioning analyses to construct a decision tree for each drug, using structure–function group and mutation data for exons 18, 19, 20 and 21 as predictors. The decision tree classified samples by applying a series of decision rules based on the predictors. Each decision rule was contained in an internal node, and each internal node poses a yes-or-no question that leads to a 'yes' or 'no' branch. We applied the CART algorithm20,21 using the rpart R package. Variable importance was calculated as the sum of the goodness-of-split measures for each split and scaled to sum to 100 for each tree. SAS version 9.4 and R version 3.5.6 were used for all computations. The structure–function group variable was involved in the first and second splits in all 18 regression trees of drug sensitivity, with variable importance ranging from 66% to 94%. Both the order of the splits and variable importance indicate that the structure–function group variable was more predictive of drug sensitivity than the exon-based variables. Code for this analysis can be found at https://github.com/MD-Anderson-Bioinformatics/EGFR-Structure-Function-Nature-Manuscript.
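The leave-one-out correlation can be sketched as follows; it is not the code in the linked repository. It assumes that each mutation's profile is a list of median log(Mut/WT) values, one per drug in a fixed order, and that group membership is given as a mapping from mutation to a (hypothetical) group label. Spearman's rho is computed as the Pearson correlation of average ranks, so tied values are handled.

```python
import math
from statistics import mean
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


def leave_one_out_rho(profiles: Dict[str, Sequence[float]],
                      groups: Dict[str, str], mutation: str) -> float:
    """Correlate a mutation's per-drug profile with the mean of the rest of its group."""
    group = groups[mutation]
    others = [m for m, g in groups.items() if g == group and m != mutation]
    if not others:
        raise ValueError("the group needs at least one other member")
    n_drugs = len(profiles[mutation])
    group_mean = [mean([profiles[m][d] for m in others]) for d in range(n_drugs)]
    return spearman(profiles[mutation], group_mean)
```

Running `leave_one_out_rho` once with structure–function labels and once with exon labels gives the paired rho values whose averages are compared.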
PDX generation and in vivo experiments
As part of the MD Anderson Cancer Center Lung Cancer Moon Shots program, PDXs harbouring EGFR G719A and EGFR L858R/E709K were generated and maintained in accordance with Good Animal Practices and with approval from the MD Anderson Cancer Center Institutional Animal Care and Use Committee (IACUC) under protocol number PA140276, as previously described46. Surgical samples were rinsed with serum-free RPMI supplemented with 1% penicillin-streptomycin and then implanted into the right flank of 5- to 6-week-old female NSG mice within 2 h of resection. Tumours were validated for EGFR mutations by DNA fingerprinting and quantitative PCR as described46. PDXs harbouring EGFR S768dupSVD were purchased from Jackson Laboratories (J100672). To propagate tumours, 5- to 6-week-old female NSG mice (NOD.Cg-Prkdcscid Il2rgtm1Wjl/SzJ) were purchased from Jackson Laboratories (005557). Fragments of NSCLC tumours expressing EGFR S768dupSVD, G719A or L858R/E709K were implanted into 6- to 8-week-old female NSG mice. Once tumours reached 2,000 mm3, they were collected and re-implanted into the right flank of 6- to 8-week-old female NSG mice. Tumours were measured three times per week, and mice were randomized into treatment groups when tumours reached a volume of 275–325 mm3 for the EGFR G719A and S768dupSVD models and 150–175 mm3 for the L858R/E709K model. Treatment groups included vehicle control (0.5% methylcellulose, 0.05% Tween-80 in dH2O), 100 mg kg−1 erlotinib, 20 mg kg−1 afatinib, 2.5 mg kg−1 poziotinib, 5 mg kg−1 osimertinib and 25 mg kg−1 osimertinib. During treatment, body weight and tumour volume were measured three times per week, and mice were treated five days per week (Monday to Friday). Dosing holidays were given if mouse body weight decreased by more than 10% or fell below 20 g. The maximum tumour burden allowed by the approved IACUC protocol was a volume of 2,000 mm3, and mice were humanely euthanized if tumours exceeded this size.
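The enrolment window, dosing-holiday and tumour-burden rules above are simple enough to encode as checks, as sketched below. Assumptions: weight loss is measured relative to each mouse's baseline weight (the text does not give the reference point), and the random assignment scheme (shuffle, then deal round-robin into groups) is hypothetical, since the randomization method is not described. Model keys and group labels are illustrative names.

```python
import random
from typing import Dict, List, Sequence

# Randomization windows (mm3) from the text; keys are illustrative model labels.
ENROLMENT_WINDOW_MM3 = {
    "G719A": (275.0, 325.0),
    "S768dupSVD": (275.0, 325.0),
    "L858R/E709K": (150.0, 175.0),
}
MAX_TUMOUR_MM3 = 2000.0
TREATMENT_GROUPS = ["vehicle", "erlotinib 100", "afatinib 20",
                    "poziotinib 2.5", "osimertinib 5", "osimertinib 25"]  # mg/kg


def eligible_for_randomization(model: str, volume_mm3: float) -> bool:
    low, high = ENROLMENT_WINDOW_MM3[model]
    return low <= volume_mm3 <= high


def needs_dosing_holiday(baseline_weight_g: float, current_weight_g: float) -> bool:
    """More than 10% weight loss (assumed relative to baseline) or weight below 20 g."""
    loss = (baseline_weight_g - current_weight_g) / baseline_weight_g
    return loss > 0.10 or current_weight_g < 20.0


def exceeds_tumour_limit(volume_mm3: float) -> bool:
    return volume_mm3 > MAX_TUMOUR_MM3


def randomize(mouse_ids: Sequence[str], groups: Sequence[str], seed: int = 0) -> Dict[str, List[str]]:
    """Hypothetical scheme: shuffle eligible mice, then deal them round-robin into groups."""
    rng = random.Random(seed)
    ids = list(mouse_ids)
    rng.shuffle(ids)
    assignment: Dict[str, List[str]] = {g: [] for g in groups}
    for i, mouse in enumerate(ids):
        assignment[groups[i % len(groups)]].append(mouse)
    return assignment
```

Dealing round-robin keeps group sizes within one mouse of each other; a stratified scheme that balances starting tumour volume would be a reasonable alternative.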
Case studies of patients treated with second-generation TKIs
For the case studies presented, patients were consented for retrospective analysis of patient outcomes and treatment course under either the GEMINI protocol (PA13-0589), approved in accordance with the MD Anderson Institutional Review Board, or protocol MCC 19161, formally reviewed and approved by Moffitt Cancer Center in accordance with the Declaration of Helsinki and the 21st Century Cures Act. Both protocols include informed consent for publication of deidentified data.
Retrospective analysis of ORR and duration of treatment with afatinib
Response to afatinib and duration of afatinib treatment were tabulated for 803 patients in the Uncommon EGFR Database (www.uncommonegfrmutations.com). Objective response was reported for 529 patients. Patients were stratified by either structure–function-based groups or exon-based groups, and ORR was calculated as the proportion of patients reported to have a complete or partial response. Fisher's exact test was used to determine statistical differences between subgroups (structure–function-based or exon-based). Duration of treatment (DOT) was provided in the Uncommon EGFR Database for 746 patients. Patients were stratified by structure–function-based groups and exon-based groups, and median DOT was calculated using the Kaplan–Meier method. Differences between Kaplan–Meier curves, HRs and P values were determined using GraphPad Prism software and the Mantel–Cox log-rank method. When mutations were not explicitly stated (for example, 'exon 19 mutation'), those patients were excluded from the structure–function-based analysis but included in the exon-based analysis.
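The ORR calculation and a two-sided Fisher's exact test for comparing two subgroups can be sketched as below. The response labels ("CR", "PR", and so on) are hypothetical, and the two-sided p-value uses the common definition: the sum of the probabilities of all tables with the same margins that are no more probable than the observed table (with a small relative tolerance for floating-point ties). The 2×2 table for two subgroups is responders vs non-responders in each.

```python
from math import comb
from typing import Sequence


def orr(responses: Sequence[str]) -> float:
    """Objective response rate: fraction of patients with complete or partial response."""
    return sum(1 for r in responses if r in ("CR", "PR")) / len(responses)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)
```

For example, comparing 3 of 4 responders with 1 of 4 responders gives `fisher_exact_two_sided(3, 1, 1, 3)`, which is 34/70 (about 0.486).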
Further information on research design is available in the Nature Research Reporting Summary linked to this paper.