Admin · 6 min read

Webinar recap: The Goldilocks problem in oncology RWD: choosing the data that’s ‘just right’

We all know that, in the story of Goldilocks and the Three Bears, it took the young girl three attempts to find the bowl of porridge, chair and bed that were ‘just right’ for her. Fortunately, researchers and clinicians in the life sciences and healthcare industries have far more sophisticated ways to choose the ‘right’ oncology real-world data (RWD) to inform decisions spanning drug development through clinical practice. In this webinar hosted by the Evidence Base, moderator Wendy Turenne and panelists Andrew Belli, MPH, and Mark Shapiro, PhD, discuss the intricacies of oncology care, the complexities of selecting and extracting oncology RWD, and how Aetion’s Accelerated Access program supports the data selection process.


Understand the intricacies of oncology care and the measurable outcomes captured through RWD

Oncology RWD is increasingly used to offer greater insight into how treatments perform in broader patient populations over longer periods of time. This comprehensive view of patients’ journeys supports more personalized treatment strategies, improved patient care and better-informed cancer research. Yet incorporating oncology RWD into clinical development programs, and accessing the relevant datasets in electronic health records (EHRs), is challenging.

The panelists agreed that the study of oncology is different from the study of other therapeutic areas, particularly in how it relates to RWD, real-world evidence (RWE) and drug development. “It's extremely complex with significant therapeutic innovations, causing treatment paradigms, which are specialized and targeted, to constantly evolve over time,” said Belli. “Additionally, a multidisciplinary team consisting of an oncologist, pathologist, radiologist and others is involved in a patient’s care – all of whom are sources of information. At COTA, we use the concept of treatment regimens and a line of therapy algorithm as ways to accurately capture and code this wealth of data from the EHR to determine what is clinically relevant and actionable for a given cancer type.”  

“In oncology, much of the diagnostic, therapeutic and adherence information is described in the clinical notes and in the free text of a pathology report, and is not accurately captured in coded data,” added Shapiro. “It’s a bit discombobulated, and finding it requires human abstraction and manual review to reduce inferences and the level of assumptions.”

Dive deep into the most critical elements for oncology, such as biomarkers, survival rates and progression

“The way in which multidisciplinary team members deliver care – their practice and treatment patterns – directly impacts the data we see in an EHR, and how the variables and metrics we need are collected and measured,” said Turenne. “Therefore, it’s increasingly important to have specific data elements, such as biomarkers, survival rates and progression for oncology studies and clinical care.”

In most EHRs, a summary report of clinically relevant biomarkers is readily accessible and provides high-level information on the positive, negative or quantitative value of a given marker in a structured data format. A more granular, full genomic sequencing panel is often stored as a PDF, requiring users to pull up and scan the document for the relevant information they need. Unstructured data, such as PDFs, images and videos, does not fit neatly into a data table.

“Knowing that practicing clinicians are laser focused on what's actionable and has prognostic or therapeutic value, we are turning these PDFs into tabular data so they can quickly search and find the information they need,” said Shapiro. “Additionally, they may want to know the type of lab test, such as immunohistochemistry (IHC), polymerase chain reaction (PCR) or next-generation sequencing (NGS) because each has a different clinical meaning, even if the biomarker is reported the same way; therefore, we capture this data and put it into a more digestible format.”
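
To make the idea of tabular biomarker data concrete, here is a minimal sketch in Python. It is not xCures’ schema: the field names, the `find_results` helper and the sample rows are hypothetical and purely illustrative. The point it shows is that each row keeps the test method (IHC, PCR or NGS) alongside the result, so two results reported the same way can still be told apart.

```python
from dataclasses import dataclass
from typing import List, Optional

# Test methods named in the discussion above.
ALLOWED_METHODS = {"IHC", "PCR", "NGS"}


@dataclass(frozen=True)
class BiomarkerResult:
    """One row of a hypothetical biomarker table."""
    patient_id: str
    biomarker: str
    method: str                    # how the marker was measured
    result: str                    # "positive", "negative" or "quantitative"
    value: Optional[float] = None  # only used for quantitative results
    unit: Optional[str] = None

    def __post_init__(self) -> None:
        if self.method not in ALLOWED_METHODS:
            raise ValueError(f"unknown test method: {self.method}")


def find_results(rows: List[BiomarkerResult], biomarker: str,
                 method: Optional[str] = None) -> List[BiomarkerResult]:
    """Return rows for a biomarker, optionally restricted to one test method."""
    return [
        row for row in rows
        if row.biomarker == biomarker and (method is None or row.method == method)
    ]


# Made-up rows, for illustration only.
rows = [
    BiomarkerResult("P001", "PD-L1", "IHC", "quantitative", 50.0, "%"),
    BiomarkerResult("P001", "EGFR", "NGS", "positive"),
    BiomarkerResult("P002", "EGFR", "PCR", "negative"),
]

egfr_by_ngs = find_results(rows, "EGFR", method="NGS")
```

Once the report lives in rows like these, a question such as “which EGFR results came from NGS?” becomes a simple filter rather than a read-through of a PDF.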

“Survival is one of the best and most objective endpoints for use in oncology RWD, but it can often be missing in the documentation, leading to potentially inaccurate survival estimates,” said Belli. “At COTA, we use a composite real-world mortality endpoint, as it leverages the structured and unstructured data from the EHR and supplements it with commercially available obituary data from other sources, such as the Social Security database. We recently validated our mortality variable against the National Death Index in a study and found that it performed well against the gold standard in terms of validation metrics. It is emerging as a best practice and way to potentially avoid or mitigate some of the pitfalls of missing documentation.”
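
As a rough illustration of what a composite mortality variable involves, the Python sketch below combines death dates from several sources into one value. The webinar did not describe COTA’s actual rule; the function name, its parameters and the choice to take the earliest available date are all assumptions made for this example.

```python
from datetime import date
from typing import Optional


def composite_death_date(ehr_structured: Optional[date],
                         ehr_unstructured: Optional[date],
                         external: Optional[date]) -> Optional[date]:
    """Combine death dates from three sources into one composite value.

    Assumed rule (illustrative only): use the earliest date any source
    reports, and return None when no source records a death.
    """
    candidates = [d for d in (ehr_structured, ehr_unstructured, external)
                  if d is not None]
    return min(candidates) if candidates else None


# Death missing from structured EHR fields but found in notes and obituary data.
found = composite_death_date(None, date(2023, 5, 2), date(2023, 5, 1))

# No source reports a death: vital status stays unknown.
unknown = composite_death_date(None, None, None)
```

The value of the composite lies in the second source filling gaps left by the first; a real implementation would also need rules for conflicting dates and for how much each source is trusted.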

Using survival as a hard endpoint requires a way to determine who is alive in the US, which is not easy to do. Information is needed from multiple consumer and government databases and is often hard to validate; data from an EHR alone is not enough. Using composite mortality as an endpoint therefore depends on identifying deaths reliably in oncology RWD and on censoring appropriately the patients whose deaths are not observed.
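
To show why censoring matters, here is a short Python sketch of the Kaplan-Meier estimator, a standard survival method that the panelists did not name explicitly and that is used here only as an illustration. The function name and the sample follow-up data are hypothetical. A censored patient (no death observed by last follow-up) still counts in the at-risk group until the time they leave it, rather than being dropped or treated as a death.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(durations: Sequence[float],
                 events: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each time with at least one death.

    durations: follow-up time per patient (months in the example below).
    events: True if death was observed, False if the patient was censored.
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have the same length")

    pairs = sorted(zip(durations, events))
    at_risk = len(pairs)
    survival = 1.0
    curve: List[Tuple[float, float]] = []

    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = 0
        leaving = 0  # deaths plus censored patients at time t
        while i < len(pairs) and pairs[i][0] == t:
            deaths += 1 if pairs[i][1] else 0
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Made-up follow-up times in months; False marks a censored patient.
months = [6, 12, 12, 18, 24]
died = [True, False, True, False, True]
curve = kaplan_meier(months, died)
```

With this toy data the estimate steps down at months 6, 12 and 24, while the patients censored at months 12 and 18 only shrink the at-risk group. If missing death documentation caused true deaths to be recorded as censored, the curve would overstate survival, which is the pitfall a composite mortality variable is meant to reduce.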

Discuss strategies for extracting insights from unstructured EMR data, such as clinical genomic data and medical images

Understanding the distinction between data abstraction and data curation and the process behind both is key for buyers of oncology RWD. Data abstraction provides only the basic or essential information, whereas curated data is more nuanced and involves aggregating multimodal data, often manually, in a longitudinal fashion to enable the end user to follow the clinical path of a patient during treatment. The goal is to get the structured and unstructured data to a point where it is easy to understand and use, be it for clinical decision-making or research.

Buyers should also consider the source and quality of the data and the process by which their RWD partners abstract and curate the data. “The use of artificial intelligence (AI) and machine learning (ML) to abstract and curate data is advancing,” said Turenne. “We partner with ConcertAI, COTA and xCures on our Accelerated Access: Oncology service to provide rapid access to fit-for-purpose RWD using our scientifically validated platform, Aetion® Substantiate. We also adhere to the US Food and Drug Administration’s (FDA) updated guidance, which offers expanded and updated recommendations to ensure the use of reliable, relevant and fit-for-purpose RWD to generate RWE.”

“At xCures, we've historically used an abstraction team to build curated data sets for training ML models by which we can then validate the performance of our oncology models, similar to validating a lab test,” said Shapiro. “Now we are incorporating large language models (LLMs) and can tune them to almost 100% sensitivity and specificity with high-quality training data. With all of this background knowledge, LLMs have the potential to make inferences, which may be correct but not necessarily verifiable, making guardrails, like microsatellite instability status, a requirement. This is similar to the use of double data entry and third-party referees in past clinical trials to ensure the quality of data.”
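
Validating a model “like a lab test” comes down to comparing its output against a trusted reference and reporting sensitivity and specificity. The Python sketch below shows that calculation under simple assumptions: the function name and the sample labels are hypothetical, and the reference labels stand in for a human-abstracted data set.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(predicted: Sequence[bool],
                            reference: Sequence[bool]) -> Tuple[float, float]:
    """Compare model labels with reference labels for one binary variable."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")

    tp = fp = tn = fn = 0
    for p, r in zip(predicted, reference):
        if p and r:
            tp += 1      # model and reference both positive
        elif p and not r:
            fp += 1      # model positive, reference negative
        elif not p and r:
            fn += 1      # model missed a true positive
        else:
            tn += 1      # model and reference both negative

    # Undefined when the reference has no positives (or no negatives).
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Made-up labels for eight records, for illustration only.
reference_labels = [True, True, True, False, False, False, False, True]
model_labels = [True, True, False, False, False, True, False, True]

sens, spec = sensitivity_specificity(model_labels, reference_labels)
```

In this toy example the model misses one of four true positives and flags one of four true negatives, so both metrics come out at 0.75. The same calculation applies whether the reference is an abstraction team’s curated data set or an external gold standard.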

“No data is perfect – it’s all subject to the questions being asked or, in Goldilocks’ case, tasting the porridge and testing out the chair and bed to determine what is ‘just right.’ The FDA's guidance around RWD is clear that the same types of principles used in clinical trials should be applied when evaluating quality systems, asking specific questions and defining data sources,” added Shapiro.

To view this webinar in its entirety, click here.