The data we leave behind in clinical trials, and why it matters for clinical care, healthcare research, and the future with AI

Every time I hear that all health conditions will be cured and fixed in 5 years with AI, I cringe. I know too much to believe in this possibility. But this is not an uninformed opinion or a disbelief in the trajectory of AI takeoff: it is grounded in the very real nature of how clinical trial data is reported and published, and in the limitations of the datasets we have today.

The sad reality is, we leave so much important data behind in clinical trials today (and in every clinical trial done before today). An example of this is how we report “positive” results for many tests or conditions using binary cutoffs and summary reporting, without reporting average titres (levels) within subgroups. This affects our ability to understand and characterize conditions, to compare overlapping conditions with similar results, and to use this information clinically alongside the symptoms and presentation of a condition. It’s not just a problem for research; it’s a problem for delivering healthcare. I have some ideas of things you (yes, you!) can do starting today to help fix this problem. It’s a great opportunity to do something now to fix the future (and today’s healthcare delivery gaps), not just complain that it’s someone else’s problem. If you contribute to clinical trials, you can help solve this!

What’s an example of this? Imagine an autoantibody test where values >20 are considered positive. That means values of 21, 53, or 82 are all considered positive. But…that’s a wide range, and a much wider spread than is possible with “negative” values, which could be 19, 8, or 3.

When labs report this test, they give suggested cutoffs for interpreting “weak”, “moderate”, or “strong” positives. In this example, a value of 20-40 is a “weak” positive, a value of 40-80 is a “moderate” positive, and a value above 80 is a “strong” positive. In our example list, the positives span barely a weak positive (21), a solidly moderate positive near the middle of that range (53), and a strong positive just above that cutoff (82). The weak positive could be interpreted as a negative, given test variance of 10% or so. But the problem lies in the moderate positive range. Clinicians are prone to reason that because it’s not a strong positive, it should be considered possibly negative, treating it more like the 21 than the 82. And because there are no studies reporting actual titres, it’s unclear whether the average or median “positive” actually sits above the “strong” (>80) cutoff or falls in the moderate positive category.
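To make this concrete, here’s a minimal sketch (in Python, with entirely made-up values) of the difference between reporting only the binary cutoff and keeping the underlying titres. The cutoff bands mirror the hypothetical example above; nothing here is real data.

```python
from statistics import median

titres = [3, 8, 19, 21, 53, 82]  # hypothetical raw autoantibody values
POSITIVE_CUTOFF = 20

def interpret(value: float) -> str:
    """Map a raw titre onto the lab's suggested interpretation bands."""
    if value <= POSITIVE_CUTOFF:
        return "negative"
    if value <= 40:
        return "weak positive"
    if value <= 80:
        return "moderate positive"
    return "strong positive"

positives = [t for t in titres if t > POSITIVE_CUTOFF]

# What typically gets published: a single binary summary.
print(f"{len(positives)}/{len(titres)} positive (>{POSITIVE_CUTOFF})")

# What rarely gets published, but is what interpretation actually needs:
print("median positive titre:", median(positives))
print("interpretation bands:", [interpret(t) for t in positives])
```

The binary summary (“3/6 positive”) is what makes it into most papers; the median and the band breakdown are the parts that get left behind.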

Also imagine the scenario where some other conditions occasionally show positive levels of this antibody, but again the titres aren’t actually published.

Today’s experience, and how clinicians in the real world interpret this data:

  • 21: positive on paper, but within ~10% of the cutoff, so it doesn’t necessarily mean true positivity
  • 53: a moderate positive, but it’s not strong and we don’t have median data for positives, so clinicians lean toward treating it as negative and/or as an artifact of a co-condition, given ~10% prevalence of this antibody in that other condition
  • 82: a strong positive, well above the cutoff, easy to treat as positive

Now imagine these same values, alongside studies reporting that the median titre in the “positive” (>20) group is actually 58 for people with the true condition.

  • 21: would still be interpreted as likely negative, even though it’s technically above the positive cutoff (>20), again because of the ~10% test error and how far it is below the median
  • 53: a moderate positive within 10% of the median positive value. Even though it’s not above the “strong” cutoff, it’s more likely to be perceived as a true positive
  • 82: still a strong positive, above the cutoff, no change in perception

And what if the titres in the co-condition have a median value of 28? If we know the co-condition median is 28 and the true-condition median is 58, then a test result of 53 is even more likely to be correctly interpreted as the true condition, rather than being read as a false negative simply because it isn’t above the >80 “strong” cutoff.
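Here’s a rough sketch of that reasoning, assuming the illustrative medians above (58 for the true condition, 28 for the co-condition) and ~10% assay variability. This is only an illustration of why published medians help interpretation, not a validated diagnostic rule.

```python
# Illustrative medians from the example above; NOT a validated diagnostic rule.
MEDIANS = {"true condition": 58, "co-condition": 28}
TEST_VARIABILITY = 0.10  # ~10% assay variance, as assumed above

def closest_condition(result: float) -> str:
    """Return the condition whose published median the result is closest to,
    in relative terms."""
    return min(MEDIANS, key=lambda cond: abs(result - MEDIANS[cond]) / MEDIANS[cond])

for result in (21, 53, 82):
    nearest = closest_condition(result)
    within = abs(result - MEDIANS[nearest]) <= TEST_VARIABILITY * MEDIANS[nearest]
    note = "(within assay variance of that median)" if within else ""
    print(result, "-> closest to", nearest, note)
```

With the medians published, 53 lines up with the true condition and 21 lines up with the co-condition; without them, all three are just “positive.”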

Why does this matter in the real world? Imagine a patient with a constellation of confusing symptoms whose positive antibody test (which would point to a diagnosis) is interpreted as negative. This may result in a missed diagnosis, given the absence of other definitive testing for the condition. That can mean a lack of effective treatment, ineligibility to enroll in clinical trials, reduced quality of life, and possibly a negative impact on survival and lifespan.

If you think I’m cherry-picking a single example, you’re wrong. This has played out again and again in my last few years of researching conditions and autoantibody data. Another real-world scenario: I had a slight positive (just above a cutoff of 20) on a test that the lab reports as correlated with condition X. My doctor was puzzled, because I have no signs of condition X. I looked up the sensitivity and specificity data for this test: it has only 30% sensitivity and 80% specificity for condition X, and 20% of people with condition Y (which I do have) also have this antibody. There is no data on the median value of positivity in either condition X or condition Y. In the context of the two pieces of information we do have, it’s easier to interpret this value as not meaningful as a diagnostic for condition X, given the lack of matching symptoms. Yet the lab reports the association with condition X only, even though this autoantibody is only slightly more probable in condition X than in condition Y and several other conditions. I went looking for research data on raw levels of this autoantibody, to see where the median value lies for positives with condition X and with condition Y, and again, as in the example above, there is no raw data to use for interpretation. Instead, it’s summaries of summaries built on a simple binary cutoff (>20), which makes clinical interpretation really hard to do, and makes it impossible to research and meta-analyze the data to support individual interpretation.
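For a sense of the numbers: with 30% sensitivity and 80% specificity, a lone positive is weak evidence for condition X on its own. The sketch below applies Bayes’ rule with assumed (hypothetical) pre-test probabilities; the raw titre data that could refine this further simply isn’t published.

```python
def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """P(condition | positive test) via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# 30% sensitivity / 80% specificity from the example above;
# the pre-test probabilities (prevalence) are hypothetical.
for prevalence in (0.01, 0.05, 0.10):
    ppv = positive_predictive_value(0.30, 0.80, prevalence)
    print(f"pre-test probability {prevalence:.0%} -> P(condition X | positive) = {ppv:.0%}")
```

Even at a generous 10% pre-test probability, the post-test probability stays well under 20%, which is exactly why a single binary “positive” report without titres or context is so easy to misread.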

And this is a key problem and limitation I see for the future of AI in healthcare, one we need to focus on fixing. For diseases that are really well defined and characterized, where we have in vitro or mouse models to use for testing diagnostics and therapies – sure, I can foresee huge breakthroughs in the next 5 years. However, many autoimmune conditions are not well characterized or defined, and the existing data we DO have is based on summaries of cutoff data like the examples above, so we can’t use it as endpoints to compare diagnostics or therapeutic targets. We need to re-do a lot of these studies, and record and store the actual data, so AI *can* do all of the amazing things we keep hearing it has the potential to do.

But right now, for a lot of things, we can’t.

So what can we do? Right now, we actually CAN make a difference on this problem. If you’re gnashing your teeth about the change in the research funding landscape, you can take action right now by re-evaluating your retrospective datasets and your current studies to figure out:

  • Where you’re summarizing data, and where raw data needs to be cleaned, tagged, and stored so we can use AI with it in the future to do all these amazing things
  • What data you could tag and archive now that would be impossible or expensive to regenerate later
  • Whether you’re cleaning and storing values in formats that AI models could work with in the future (e.g. structured tables, CSVs, or JSON files; see the sketch after this list)
  • Most simply: how you’re naming and storing the files with data so you can easily find them in the future. “Results.csv” or “results.xlsx” is maybe not ideal for helping you or your tools find this data later. How about “autoantibody_test-X_results_May-2025.csv” or similar?
  • Where you’re reporting data, and whether you can report more of it, as an associated supplementary file or a repository you can cite in your paper
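As one illustration of the formats and naming points above, here’s a minimal sketch of archiving de-identified raw values under a descriptive filename, plus a small metadata file that keeps the context (units, assay, cutoffs). The filename, columns, and metadata fields are placeholders to adapt, not a required standard.

```python
import csv
import json
from pathlib import Path

# Hypothetical de-identified raw results; the filename follows the
# descriptive-naming suggestion above.
records = [
    {"participant_id": "P001", "titre": 21, "collection_date": "2025-05-02"},
    {"participant_id": "P002", "titre": 53, "collection_date": "2025-05-02"},
    {"participant_id": "P003", "titre": 82, "collection_date": "2025-05-03"},
]

csv_path = Path("autoantibody_test-X_results_May-2025.csv")
with csv_path.open("w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)

# Keep the context needed to reuse the data later: assay, units, cutoffs.
meta_path = Path("autoantibody_test-X_results_May-2025.meta.json")
meta_path.write_text(json.dumps({
    "assay": "autoantibody test X",
    "units": "U/mL",
    "positive_cutoff": 20,
    "bands": {"weak": [20, 40], "moderate": [40, 80], "strong": [80, None]},
}, indent=2))
```

The point isn’t this particular layout; it’s that a future human or model opening these files can tell what was measured, in what units, and against what cutoffs, without hunting through the paper.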

You should also ask yourself whether you’re even measuring the right things at the right time, and whether your inclusion and exclusion criteria are too strict, excluding the bulk of the population you should be studying.

An example of this is in exocrine pancreatic insufficiency (EPI), where studies often don’t look at all of the symptoms that correlate with EPI; they include or allow only co-conditions that represent a tiny fraction of the likely EPI population; and they study the treatment (pancreatic enzyme replacement therapy) without the context of food intake, which is about as useful as studying whether insulin works in type 1 diabetes without knowing how many carbohydrates someone is consuming.

You can be part of the solution, starting right now. Don’t just think about how you report data for a published paper (although there are opportunities there, too): think about the long-term use of this data by humans (researchers and clinicians like yourself) AND by AI (capabilities and insights we can’t achieve yet, but that technology will be able to deliver in 3-5+ years).

A simple litmus test: if an interested researcher or patient reached out to me as the author of my study and asked for the data to understand what the mean or median values were for a reported cohort with “positive” values…could I provide that data to them as an array of values?

For example, if you report that 65% of people with condition Y have positive autoantibody levels, you should also be able to say:

  • The mean value of the positive cohort (>20) is 58.
  • The mean value of the negative cohort (<20) is 13.
  • The full distribution (e.g. [21, 26, 53, 58, 60, 82, 92…]) is available in a supplemental file or data repository. (A minimal sketch of regenerating these summaries from a raw array follows below.)
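As a sketch of that litmus test, with illustrative values only: if the raw array is stored, both the published summary and the full distribution can be regenerated on request.

```python
from statistics import mean

# Illustrative raw values, as you'd share them in a supplement or repository.
titres = [13, 21, 26, 53, 58, 60, 82, 92]
CUTOFF = 20

positive = [t for t in titres if t > CUTOFF]
negative = [t for t in titres if t <= CUTOFF]

print(f"{len(positive) / len(titres):.0%} positive (>{CUTOFF})")
print("mean positive titre:", round(mean(positive), 1))
print("mean negative titre:", round(mean(negative), 1))
print("full distribution for the supplement:", sorted(titres))
```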

That makes a world of difference in characterizing many of these conditions: for developing future models, testing treatments or comparative diagnostic approaches, or even getting people correctly diagnosed after previous missed diagnoses caused by the lack of data needed to correctly interpret lab results.

Maybe you’re already doing this. If so, thanks. But I also challenge you to do more:

  • Ask for this type of data via peer review, either to be reported in the manuscript and/or included in supplementary material.
  • Push for more supplemental data publication with papers, including code and datasets where possible.
  • Talk with your team, colleagues, and institution about long-term storage, accessibility, and formatting of datasets.
  • Better yet, publish your anonymized dataset, either with the supplementary appendix or in an online repository.
  • Take a step back and consider whether you’re studying the right things in the right population at the right time.

These are actionable, doable, practical things we can all be doing, today, instead of just gnashing our teeth. The sooner we course correct with improved data availability, the better off we’ll all be in the future, whether that’s tomorrow with better clinical care or in years with AI-facilitated diagnoses, treatments, and cures.

We should be thinking about:

  • What if we design data gathering and data generation in clinical trials not only for the current status quo (humans juggling data and collecting only minimal data), but also for a potential future where machines are the primary viewers of the data?
  • What data would be worth accepting, collecting, and seeking as part of trials?
  • What burdens would that add (and how might we reduce those) now while preparing for that future?

The best time to collect the data we need was yesterday. The second best time is today (and tomorrow).
