Want signal? We need more noise (more examples to address the quiet bottleneck)

Note: I’m assuming that if you are reading this post, you’ve first read this post summarizing the concept. The below is a companion piece with expanded details about the concepts and more examples of how I think addressing this bottleneck will help make a difference.

You might hold the perspective that in the growing era of AI, there’s too much noise already. Slopocalypse. (Side note: this entire post, and my other post, was written by me, a human.) That’s not the biggest problem in healthcare, though: in both research and clinical care, so much critical data is simply not collected. We are missing sooooo much important data. Some of this is an artifact of past clinical trial design and how hard it was to collect, analyze, or store the data; there were fewer established norms around data re-use; and some of it is a “collect the bare minimum to study the endpoints because that’s all we are supposed to do” phenomenon.

Nowadays, we should be thinking about how to make data and insights from clinical work and research available to AI, too, because AI will increasingly be used by humans to sort through what is known and where the opportunities are, plus make cross-domain connections that humans have been missing. And we need to be thinking about whether the incentives are set up correctly (spoiler: they’re not) to make sure all available data can be collected and shared, or at least stored, for humans and AI to have access to for future insight. (What is studied is also ‘what is able to be funded’, which is disproportionately things that eventually come to the commercial market after regulatory approval.)

“But what’s the point?” you ask. “If no one is looking at the data from this study, why collect it?” Because the way we design studies – to answer a very limited-scope question, i.e., safety and efficacy for labeling claims and regulatory approval – is very different from the studies we need around optimizing and personalizing treatments. If you look at really large populations of people accessing a treatment – think GLP-1 RA injectables, for example – you eventually start to see follow-on studies around optimization and different titration recommendations. But most diseases and most treatments aren’t for a population even 1/10th the size of the population accessing those meds. So those studies don’t get done. Doctors don’t necessarily pay attention to this; patients may or may not report relevant data about this back to the doctor (and even if they do, doctors often don’t mentally or physically store this data); and the signal is missed because we don’t capture this ‘noise’.

This means people with rare diseases, undiagnosed diseases, atypical or seronegative diseases, unusual responses to treatments, multiple conditions (comorbidities), or any other scenario that results in them being part of a small population and where customizing and individualizing their care is VERY important…they have no evidence-based guidance. Clinicians do the best they can to interpret in the absence of evidence or guidelines, but no wonder patients often turn to LLMs (AI) for additional input and contextualization and discussion of the tradeoffs and pros and cons of different approaches.

We have to: the best available human (our clinicians) may not have the relevant expertise; or they may not be available at all; or they may be biased (even subconsciously) or forgetful of critical details or not up to date on the latest evidence; or they may be faced with a truly novel situation and they don’t have the skills to address it because they’re used to cookie-cutter standardized cases. This is not a dig at clinicians, but I recognize that we now have tools that can address some of the existing flaws in our human-based healthcare system. I keep talking about how we need to recognize that evaluations of AI in healthcare shouldn’t treat the status quo as the baseline to defend, because the status quo itself has problems, as I just described.

And AI has some of the same problems. (Except for the ‘not available’ part, unless you consider the lower-utility free-access models to mean that the more advanced, thinking-based models are ‘not available’ because of cost and access barriers.) An AI may not have any training data on a rare disease… because nothing exists. It may drop information out of the context window, and we may not realize that this has happened (i.e., it ‘forgets’ something). Usually these are critiques of AI, juxtaposed against the implication that humans are better. But notice that these critiques apply to humans, too! This happens all the time with human clinicians in healthcare. A human can’t make decisions on data that doesn’t exist in the world, either!

So…how do we “fix” AI? Or, how do we fix human healthcare? We should be asking BOTH questions. Maybe the answer is the same: increase the noise so we increase the signal. Argue about the ratio later (of signal:noise), but increase the amount of everything first. That is most important.

How might we do this? I have been thinking about this a lot, and Astera recently posted an essay contest asking for ideas that don’t fit the current infrastructure. (That’s why I’m finally writing this up – not because I think I’ll win the essay contest, but mostly because it’s an opportunity for people to consider whether or not this type of solution is a good idea, as a separate analysis from “well, who is going to fund THAT?”. Let’s discuss and evaluate the idea, or riff on it, without being bogged down by the ‘how exactly it gets funded and managed over time’.)

I think we should create some kind of digitally-managed platform/ecosystem to do the following:

  1. Incentivize and facilitate written (AI-assisted allowed) output of everything. Case reports and scenarios of people and what they’re facing. All the background information that might have contributed. All the data we have access to that is related to the case, plus other passive, easy-to-collect data (that may already be collected, e.g., wearables and phone accelerometer data) even if it does not appear to be related. Patient-perspective narratives, clinical interpretations, clinical data, wearable data – all of this.

    A) And, be willing to take variable types of data even from the same study population. For example, as a person with type 1 diabetes, I have 10+ years of CGM data. For a future study on exocrine pancreatic insufficiency, for example, not everyone will have CGM data, but because CGM data is increasingly common in people with T2D (a much larger population) and in the general population, a fraction of people will have CGM data and a fraction of people are willing to share it. We should enable this, even if it’s not for a primary endpoint analysis on the EPI study and even if only a fraction of people choose to share it – it might still be useful for subgroup analysis, determining power for a future study where it is part of the protocol, and identifying new research directions! (See the sketch after this list for what such a submission record might look like.)

  2. Host storage somewhere. There can be some automated checking against identifiable information and sanity checking of what’s submitted into the repository, plus consent verification (either self-consent or signed consent forms for this purpose).
  3. Provide ‘rewards’ as incentives for inputting cases and data.

    A) Rewards might differ for patients self-submitting data and researchers and clinicians inputting data and cases. Maybe it’s one type of reward for patients and a batch reward of a free consult or service of some kind for researchers/clinicians (e.g. every 3 case reports submitted = 1 credit, credits can be used toward a new AI tool or token budget for LLMs or human expertise consult around study design, or all kinds of things.)

    B) Host data challenges where different types of funders incentivize different disease groups or families of conditions to be added, to round out what data is available in different areas.

  4. Enable AI and human access (including citizen scientists/independent researchers who don’t have institutional affiliations) to these datasets, after self-credentialing and providing a documented use case for why the data is being accessed.

    A) Require anyone accessing the dataset to make their analysis code open source, or otherwise openly available for others to use, and to make the results available as well.

    B) Tag anytime an individual dataset is used by a project, that way individuals with self-donated data (or clinicians submitting a case) can revisit and see if there are any research insights or data analysis that applies to their case.

    C) Provide starter projects to show how the data can be used for novel insight generation. For example, develop sandboxes for different types of datasets with existing ‘lab notebooks’, so to speak, to onboard people to different datasets/groups of data and different types of analyses they might do, and to cut down on the environment-setup barriers to getting started analyzing this data.

    D) Facilitate outreach to institutions, disease-area patient nonprofits and advocacy groups both to solicit data inputs AND to use the data for generating outputs.
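To make the heterogeneous-data idea above concrete, here is a minimal sketch of what a submission record on this kind of platform might look like. This is illustrative only – the field names, the consent list, and the helper function are all hypothetical, not a spec:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DataDonation:
    """Hypothetical submission record: a small required core, plus optional streams."""
    donor_type: str                       # "patient", "clinician", or "researcher"
    conditions: list[str]                 # e.g., ["type 1 diabetes", "EPI"]
    narrative: str                        # patient- or clinician-written case summary
    consented_uses: list[str]             # e.g., ["subgroup analysis", "future re-use"]
    # Optional streams, present only for the fraction of donors who have them:
    labs: Optional[dict] = None           # scrubbed lab results
    chart_notes: Optional[str] = None     # de-identified clinical notes
    cgm_data: Optional[list] = None       # CGM readings, if the donor uses a CGM
    meal_logs: Optional[list] = None      # exported meal-logging app data
    wearable_data: Optional[list] = None  # accelerometer/activity streams

def with_stream(donations: list[DataDonation], stream: str) -> list[DataDonation]:
    """Return only the donations that include a given optional stream.
    A missing stream excludes a record from *that* analysis, not from the repository."""
    return [d for d in donations if getattr(d, stream) is not None]
```

The design choice this is meant to illustrate: partial contributions are first-class, and each analysis filters down to whatever subset has the streams it needs, instead of the platform rejecting anyone who can’t provide everything.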

Why should we do this?

  • Clinical trials do not gather all of the possibly useful data, because it’s not for their primary/secondary endpoints. Plus, they would have to clean it upon collection. Plus storage costs. Plus management decisions. Etc. There are a lot of real barriers there, but the result is that clinical trials do not capture all of the potentially relevant data. So we need a different path.
  • Clinicians/clinical pathways don’t have ways to review data, so they avoid doing it. There’s no path to “submit this data to my record but I don’t expect you to look at it” in medical charts (but there should be). So we need a different path.
  • Disease groups/organizations sometimes host and capture datasets, but like clinical trials, this is limited data; it often comes through clinical pathways (further limiting participation and diversity of the data); it doesn’t include all the additional data that might be useful. So we need a different path.
  • Some patients do publish or co-author in traditional journals, but it’s a limited, self-selected group. Then there are the usual journal hurdles that filter this down further. Not everyone knows about pre-prints. And, not everyone knows how useful their data can be – either as an n=1 alone or as an n=1*many for it to be worth sharing. A lot of people’s data may be in video/social media formats inaccessible to LLMs (right now) or locked in private Facebook groups or other proprietary platforms. So we need a different path.

Thus, the answer to ‘why we should do this’ is recognizing that AI does not create magic out of nowhere. It relies on training data and extrapolating from that, plus web search. If it’s not searchable or findable, and it’s not in the training data, it’s not there. Even prompting the newer models to extrapolate doesn’t solve all of these problems. What AI can do is shaped disproportionately by formal literature, institutional documents, and whatever scraps of publicly available content remain. If we want to shape what’s possible in the future, we need to start now by shaping and collecting the inputs to make it happen. We can’t fix past trial designs, but we can start to fill in the gaps in the data!

Here is an example of how this might work and why it matters to build and incentivize this type of data sharing.

  1. If we build and incentivize data sharing, the following individuals might self-donate their data and write-ups, or a clinician might submit them. Later, someone might analyze the data and put the pieces together.
  • Someone with type 1 diabetes (an autoimmune condition) and a new-onset muscle-related issue. Glucose levels are well-managed; they don’t have neuropathy/sensory issues (which are common complications of decades of living with type 1 diabetes); muscle damage and inflammation markers are normal; MRIs are normal. The person submits their data, which includes their lab tests, clinical chart notes (scrubbed for anonymity), a patient write-up of what their symptoms are like, and exported data from their phone with years of motion/activity data.
  • A different person with Sjogren’s disease (also an autoimmune condition) and a similar new-onset muscle-related issue. There are known neurological manifestations or associations with Sjogren’s, but more typically those are small-fiber neuropathy or similar. The symptoms here are not sensory or neuropathy. MRIs don’t show inflammation or muscle atrophy. Their clinician is stumped, but knows they don’t see a lot of patients with Sjogren’s and wonders if there is a cohort of people with Sjogren’s facing this. The clinician asks the patient and gains consent; scrubs the chart note of identifying details, and submits the chart notes, labs, and a clinician summary describing the situation.
  • Later, a researcher (either a traditional institutional-affiliated researcher OR a citizen scientist, such as a third person with a new-onset muscle-related issue) decides to investigate a new-onset muscle-related issue. They register their hypothesis: that there may be a novel autoimmune condition that results in this unique muscle-related issue as a neuromuscular disease (that’s not myasthenia gravis or another common NMJ disease) that shows up in people with polyautoimmunity (multiple autoimmune conditions), but we don’t know which antibodies are likely correlated with it. The research question is to find people in the platform with similar muscle-related conditions and explore the available lab data to help find what might be the connecting situation and classify this disease or better understand the mechanism.
  • They are granted access to the platform, start analyzing, and come across an interesting correlation with antibody X – a standard autoimmune marker that doesn’t differentiate by disease – which is highly correlated with this possibly novel muscle condition: it is elevated in people who have at least one existing autoimmune condition and these muscle symptoms, yet who are seronegative for every other autoimmune and neurologic condition. Further, by reading the clinical summary related to patient B and the self-written narrative from patient A, it becomes clear that this is likely neuromuscular – stemming from transmission failure along the nerves – and that the muscles are the ‘symptom’ but not the root cause of disease. This provides an avenue for a future research protocol to 1) characterize these types of patients into a cohort so it can be determined whether this is a novel disease or a subgroup of an adjacent condition (e.g., a seronegative subgroup); 2) track whether the antibody levels are treatment-sensitive or stay elevated; and 3) identify cohorts of treatments that can be trialed off-label because they work in similar NMJ diseases, even though the mechanism isn’t identical.
  • The mechanism for this novel disease isn’t proven, yet, but in the face of all the previous negative lab data and neurological testing patient A and patient B have experienced, it narrows things down from “muscle or neuromuscular” to neuromuscular, which is a significant improvement from their previous situations. Plus, this provides pathways for additional characterization; research; and eventual treatment options to explore, versus the current dead ends both patients (and their clinical teams) are stuck in. And because there are no good clinical pathways for these types of undiagnosed cases, this type of insight development across multiple cases would not have occurred at all without this database of existing data.

  2. The above is a small-n example, but consider a large dataset where there are hundreds or thousands of people with CGM data submitted – plus meal-tracking data, because people can export and provide that from whatever meal-logging apps some of them happen to use.

  • By analyzing this big dataset, an individual researcher could hypothesize that they could build a predictor to identify the onset of exocrine pancreatic insufficiency, which can occur in up to 10% of the general population (more frequently in older adults, people with any type of diabetes, and people with other pancreas-related conditions like pancreatitis and different types of cancer), by comparing increases in glucose variability that correlate with a change in dietary consumption patterns, notably decreasing meal size and eventually lowering the quantity of fat/protein consumed. (These are natural shifts people make when they notice they don’t feel good because their body is not effectively digesting what they are eating.) They analyze and exclude the effect of GLP-1 RAs and other medications in this class: the effect size persists outside of medication usage patterns. This can later be validated and tested in a prospective clinical trial, but this dataset can be used to identify what level of correlation between meal consumption change and glucose variability change happens over what period of time, in order to power a high-quality subsequent clinical trial. This may lead to an eventual non-invasive method to diagnose exocrine pancreatic insufficiency through wearable and meal-tracking data. (None exists today: only a messy stool test that no one wants to do, and which is hampered by other accuracy issues.)
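To make this concrete, here is a hedged sketch (not a validated method!) of what the core of that analysis might look like: a rolling glucose coefficient of variation computed alongside a rolling trend of fat/protein intake, with a flag when variability rises while intake drifts down. The column names, window lengths, and flag logic are all hypothetical choices for illustration:

```python
import pandas as pd

def candidate_epi_signal(cgm: pd.DataFrame, meals: pd.DataFrame,
                         window: str = "14D") -> pd.DataFrame:
    """Illustrative only: rolling glucose variability vs. rolling intake trend.

    cgm:   columns ["time", "glucose"], e.g., 5-minute CGM readings in mg/dL
    meals: columns ["time", "grams_fat", "grams_protein"] from a meal-logging app
    Both "time" columns are assumed to be datetimes.
    """
    cgm = cgm.set_index("time").sort_index()
    meals = meals.set_index("time").sort_index()

    # Glucose coefficient of variation, smoothed over a rolling window.
    daily = cgm["glucose"].resample("1D").agg(["mean", "std"])
    glucose_cv = (daily["std"] / daily["mean"]).rolling(window).mean()

    # Rolling trend of fat + protein per day (what pancreatic enzymes digest).
    fat_protein = (meals["grams_fat"] + meals["grams_protein"]).resample("1D").sum()
    intake_trend = fat_protein.rolling(window).mean()

    out = pd.DataFrame({"glucose_cv": glucose_cv,
                        "fat_protein_per_day": intake_trend})
    # Hypothetical candidate flag: variability rising while intake drifts down.
    out["candidate_flag"] = (out["glucose_cv"].diff(14) > 0) & (
        out["fat_protein_per_day"].diff(14) < 0)
    return out
```

A real analysis would also need to handle missing data and exclude confounders such as GLP-1 RA usage, as described above.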

These two examples show how even small-n data, or a large dataset with additional subgroups and datastreams, can be useful. In the past, it’s been a case of “we need X, Y, Z data from everyone. A, B, C would be nice to have, but it’s hard and not everyone will share it (or is willing to collect it due to burden), so we won’t enable it to be shared for the 30% of people who are willing”. Thus, we lose the gift and contributions of the people who are able and willing to share that data. Sometimes that 30% is a small n, but that small n is >0 and may be the ONLY data that will eventually answer an important future research question.

We are missing so much, because we don’t collect it. So, we should collect it. We need a platform to do this outside of any single disease group patient registry; we need to support clinician and patient entries into this platform; we need to support intake of a variety of types of data; and we need to have low (but sensible) barriers to access so individuals (citizen scientists, patients themselves) can leverage this data alongside traditional researchers. We need all hands on deck, and we need more data collected.

Want signal? We need more noise (looking at the quiet bottleneck)

We need more signal, which means we want more noise. A lot of current scientific infrastructure is designed to minimize messiness: define a narrow question, collect the minimum data required to answer it, standardize the dataset, exclude complicating variables, finish the analysis, publish the result. That approach is understandable. It is also one reason we repeatedly fail the patients who most need evidence: people with unusual responses, multiple conditions, atypical phenotypes, rare diseases, or combinations of features that do not fit cleanly into existing bins. Our systems today are structurally designed to discard or never capture the kinds of heterogeneous, partial, contextual, or longitudinal data that could eventually make critical insights available to us.

My hypothesis is that one important scientific bottleneck is not a lack of intelligence, or even a lack of data in the abstract, but a lack of infrastructure for accepting, preserving, and reusing the kinds of data that fall outside formal trial endpoints and standard clinical workflows. I am proposing a solution: a shared platform that accepts optional, heterogeneous, participant- and clinician-contributed data, which will generate clinically useful subgroup hypotheses that standard trial and registry structures fail to generate.

Here are the quiet bottlenecks in the current system that I run into and that this platform would address:

  1. Atypical, comorbid, and small-population patients are systematically left without usable evidence because current research systems are optimized to suppress heterogeneous data rather than preserve it. (Clinical trials are usually designed around narrow endpoint collection, fixed inclusion and exclusion criteria, and standardized datasets that are tractable for a specific immediate question. Patients are often told there is “no evidence,” when what is really missing is a system that could have captured and preserved evidence that can help with future translation for atypical or edge-case presentations.)
  2. Useful data is routinely discarded because no existing structure is responsible for capturing it. (Clinical trials generally do not want to collect data outside primary and secondary endpoints because of cleaning, storage, analysis, and governance costs. Clinical care systems do not want to ingest large volumes of patient-generated data that clinicians are not expected to review. Disease registries are usually narrow and disease-specific. Journals are poor infrastructure for surfacing data. There is no home for this data.)
  3. AI and humans are constrained by the same missingness problem: the knowledge base is biased and uneven because we do not capture enough of the right kinds of real-world data. (LLMs and other AI systems are often criticized for bias, holes, and nonrepresentative training data. But this overlooks that clinicians and researchers are also reasoning from the same incomplete literature, selective trial populations, under-collected real-world evidence, and publication-filtered case reports. The bottleneck is upstream of both.) AI now makes it possible to overcome these constraints, but only if the data is collected and made available in the first place.
  4. Current systems privilege uniform completeness over partial but valuable contribution, which causes preventable signal loss. (If not everyone has or is willing to provide the data, it’s not collected. Subsets of participants who are willing and able to contribute additional data, such as wearables, CGM, meal logs, phone sensor streams, or symptom data, are prevented from doing so.)

Why current structures produce these bottlenecks

No single existing institution is responsible for optional data intake, long-term stewardship, and broad downstream access. And different structures prioritize different data capture, with no harmonization across settings.

Clinical trials are funded, regulated, and staffed to answer bounded questions. Clinical trial incentives reward endpoint completion, analyzable datasets, and publication-ready results. Anything beyond the core clinical trial protocol creates additional costs: more data cleaning, more storage, more governance, and more analytic work that may not directly serve the trial’s primary purpose. Under those conditions, narrowness is rational, but it is also systematically lossy.

Clinical care systems have a different but related design problem. Electronic health records are built for documentation, care coordination, and billing, not for capturing participant-generated, future-use data that a clinician does not need for real-time decision making. This illustrates the infrastructure gap: patients may have useful longitudinal data, but there is often no legitimate pathway to store it in a way that supports future analysis.

Disease registries and advocacy-group datasets help in some cases, but they are typically narrow in scope, tied to a disease silo, and constrained by the same pressures toward standardization and limited datastreams. Publications are also poorly matched to this problem, because they are optimized for polished outputs, not for preserving heterogeneous data. Much of this infrastructure was shaped by prior underlying constraints, from when data capture, storage, and analysis were more expensive.

Testable hypotheses that would address these challenges:

  1. Allowing partial, nonuniform data donation from participants will outperform “uniform minimum dataset only” models for early discovery in rare, atypical, and comorbid populations.
  2. Structured case narratives combined with quantitative data will enable more useful discovery than quantitative data alone for poorly characterized conditions, because the narrative context helps identify mechanistic or phenotypic patterns that are otherwise lost.
  3. Modern tools (LLMs etc) make it feasible to analyze and normalize real-world participant-contributed data at a scale and cost that was previously impractical.
  4. For some under-characterized conditions, a heterogeneous real-world repository can identify candidate biomarkers, phenotypic subgroups, or prospective-study designs and lead to a shorter timeline for protocol development and funding of eventual trials, compared to the status quo.

Here are a few example experiments we can use to validate these hypotheses:

  1. Use heterogeneous real-world data to test whether a shared repository can generate an earlier-detection hypothesis that existing structures are poorly positioned to generate. For example, exocrine pancreatic insufficiency. EPI is common but underdiagnosed, and its current diagnostic pathway is unpleasant (a stool test) and imperfect. As digestion becomes less effective, people often change what and how they eat before anyone labels it as a problem: meal sizes may shrink, fat and protein consumption may drift downward, symptom patterns develop, and glucose variability may shift in parallel, especially in people with diabetes or others who happen to have CGM data. A conventional clinical trial would rarely collect all of these data together (GI symptoms, meal logs, and CGM data). We have shown that a patient-developed symptom survey can effectively distinguish between EPI and non-EPI gastrointestinal symptom patterns in the general population. But this relies on people knowing they have a problem and filling out the symptom survey to assess their symptoms. By analyzing CGM and meal-logging data, we may be able to create a diagnostic signal that detects EPI or other digestive problems noninvasively and much earlier. If successful, the output would be a concrete, testable hypothesis for a future prospective study: for example, that a specific combination of changing meal patterns and glucose variability could serve as an early noninvasive screening signal for EPI.
  2. A second, smaller experiment would test the same infrastructure in a very different setting: under-characterized autoimmune or neuromuscular overlap syndromes. Here the problem is not underdiagnosis at scale, but invisibility in small-n edge cases. Patients with unusual muscle-related symptoms, normal or ambiguous standard workups, and backgrounds that include autoimmune disease often remain isolated within separate clinics and disease silos. One person may carry a type 1 diabetes diagnosis, another Sjögren’s, another a different autoimmune history, and yet their new symptoms may share an overlooked mechanism. In current systems, these cases rarely become legible as a group because the relevant data are scattered across chart notes, patient stories, normal imaging, lab panels, and passive longitudinal data that no one is collecting or comparing systematically. This tests the same core hypothesis under tougher conditions: smaller numbers, less uniform data, less obvious endpoints, and a heavier dependence on narrative context. If the EPI case shows that optional heterogeneous data can support tractable hypothesis generation in an underdiagnosed condition, the neuromuscular case shows why the same infrastructure could matter even more in sparse-data, high-ambiguity situations where it is even harder to capture data to generate evidence in support of and funding for subsequent trials.

What we might learn if these fail

We will learn what the barrier is, whether it is still a problem of infrastructure; a lack of the right people (or AI) leveraging the data; whether partial nonuniform data donation is operationally feasible; and whether limiting factors are data availability, harmonization, governance, or analytic quality. Plus, we can determine whether certain diseases or use cases (e.g. developing novel diagnostics versus assessing medication responses) are better suited than others for this type of platform.

Why now?

Passive and participant-generated data collection is easier. Wearables, phone sensing, CGM data, meal-logging apps, symptom trackers, and similar are now significantly more common. Technology makes it easier than ever to create custom apps to track n=1 data or study-specific data. Storage is cheaper. Technology improvements, most notably AI tools, have made it more tractable to collect data, and have made it more tractable for researchers to analyze this data. The remaining bottleneck is capturing, storing, and making the data available. It is less a technological bottleneck and more a bottleneck of funding, governance, etc. This is addressable now that the capture/analysis barriers have been lowered!

I don’t think the question we should be asking is whether every piece of heterogeneous data will be useful, but whether we can afford to keep doing what we are doing (throwing away data and still expecting discovery for the populations our current systems and infrastructure already quietly fail).

Note: I am submitting this post to the Astera essay contest, which you can read about here. You should write up your ideas about the bottlenecks you see and submit as well! I also wrote an additional piece with more details and examples, which you can read here. 

The data we are leaving behind in clinical trials, what it costs us, and why we should talk about it

I talk a lot about the data we leave behind. A lot of the time, this is in the context of clinical trial design. But more recently, it’s also about the data loss that drives bias in AI. And actually…that same bias exists in humans. And seeking to fix one may also fix the other!

To clarify, it’s not that no one cares (as a reason for why data loss or missingness happens). It is an artifact of our systems, funding priorities, and a whole bunch of historical decision making. Clinical trials cost money and are usually designed to answer a question around safety or efficacy, often in pursuit of regulatory approval and eventual commercial distribution. We need that (I’m not saying we don’t!), but we also need funding sources to answer additional questions related to titration, individual variance, medication impacts, more diseases, etc.

A lot of trials exclude people with type 1 diabetes, for example, so it’s always a question when new medications or treatments roll out whether they are appropriate or useful for people with T1D or whether there’s a reason why they wouldn’t work. Sometimes T1D might be seen as a muddy variable in the study, but a lot of times it’s just a copy-paste decision, propagated forward from a previous study where it did matter into the next study where it doesn’t. There is no way to tell the two apart afterward: there is simply a lack of representation of people with T1D in the study population, and this makes it hard to interpret and make decisions about this data in the future.

A lot of decisions around what data is collected, or who is enrolled in a population for a study, are artifacts of this copy-paste! Sometimes literal copy-paste, and sometimes a mental copy-paste of “we do things this way” because previously there was a reason to do so. Often that reason is cost (financial or time or hardship or lack of expertise). But in the era of increasing technological capabilities and lower costs (everything from the cost of LLM tokens to the decreasing cost of long-term data storage, plus the reduced cost of data collection from passive wearables and phone sensors, reduced hardship for participants to contribute data, and reduced hardship for researchers to clean and study the data), we should be revisiting A LOT of these decisions about ‘the way things are done’, because WHY those things were done that way has likely changed.

The TLDR is that I want – and think we all need – more data collected in clinical trials. And, I don’t think we need to be as narrow-minded any more about having a single data protocol that all participants consent to. Hold up: before you get the torches and pitchforks out, I’m not saying not to have consent. I am saying we could have MULTIPLE consents. One consent for the study protocol. And a second consent where participants can evaluate and decide what additional permissions they want to provide around data re-use from the study, once the study is completed. (And possibly opt-in to additional data collection that they may want to passively provide in parallel).

The reason I think we should shift in this direction is that, out of extreme respect for patient preferences around privacy and protecting patients (and participants in studies – I’ll use participants/patients interchangeably because it happens in clinical care as well as research), researchers sometimes err on the side of saying “data not available” after studies, because they did not design the protocol to store the data and distribute it in the future. A lot of times this is because they believe patients did not consent (because researchers did not ask!) for data re-use. Other times it’s honestly – and I say this as a researcher who has collaborated with a bunch of people at a bunch of institutions globally – inadvertent laziness / the copy-and-paste phenomenon: it’s hard to do the very first time, so people don’t do it, and then it keeps getting passed on to the next project the same way, because admittedly figuring it out and doing it the first time takes work. (And researchers may not have included data storage, etc., in their budgets.) Thus this propagates forward. Whereas researchers who do this on one project tend to continue doing it, because they’ve figured it out – and have written it into their research budgets for subsequent projects. And other times, the lack of data re-use is a protective instinct, because researchers may have concern that the data may be used in a way that is harmful or negative. Depending on the type of data (genetic data versus glucose data, as two examples), that may be a very legitimate concern of study participants. Other times, it may be at odds with the wishes of the broad community perspective. And sometimes, individuals have different preferences!

We could – and should – be able to satisfy BOTH preferences, by having an explicit consent (either opt-in or opt-out, depending on the project) that is an ADDITIONAL consent specifically indicating preferences around data being re-used for additional studies.
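As a sketch of what layered consent could look like in a study’s data model (everything here – the field names, the specific permission options – is hypothetical illustration, not a proposed standard):

```python
from dataclasses import dataclass

@dataclass
class ParticipantConsent:
    """Hypothetical layered consent: the protocol consent stays required,
    and each re-use permission is a separate, independent, revocable choice."""
    protocol_consent: bool                      # consent to the study protocol itself
    reuse_deidentified_data: bool = False       # allow re-use of de-identified study data
    deposit_in_repositories: bool = False       # allow deposit in external repositories
    recontact_for_future_studies: bool = False  # may be contacted about future studies
    optional_streams: tuple[str, ...] = ()      # extra data volunteered in parallel,
                                                # e.g., ("cgm", "wearable", "meal_logs")

# One participant opts in to de-identified re-use and volunteers CGM data,
# but declines external repository deposit:
consent = ParticipantConsent(
    protocol_consent=True,
    reuse_deidentified_data=True,
    deposit_in_repositories=False,
    optional_streams=("cgm",),
)
```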

To emphasize why this matters, I’ll circle back to AI/LLMs. A criticism of AI models is that they are trained on data that is not representative, has holes, or is biased. This is legitimate, and an artifact of the past design of trials. I have been thinking through the opportunities this provides now, in the current era of trial design, to be proactive and strategic about generating and collecting data that will fill these holes for the future! We can’t fix the way past trials were done, but we can adjust how we do trials moving forward, fill the holes, and update the knowledge base. This then actually fixes the HUMAN PROBLEM as well as the AI problem: because these criticisms of AI are the same criticisms we should be making about humans (researchers and clinicians etc.), who are also trained on the same missing, messy, possibly biased data!

More data; more opt-in/out; more strategery in thinking about plugging the holes in the knowledge base moving forward, in part by not making the same mistakes in future research that we’ve made in the past. We should be designing trials not only for narrow endpoints (which we can also do) for safety and efficacy and regulatory approval and commercial availability, but also for answering the questions patients need answered when evaluating these treatments in the future: everything from titration to co-conditions to co-medications to the arbitrary research protocol decisions around the timing of treatment or the timing of the course of treatment. A lot of this could be better answered by data re-use beyond the initial trial, as well as by additional data collection in conjunction with the primary and secondary endpoints.

All of this is to say…you may not agree with me. I support that! I’d love to know (constructively; as I said, no pitchforks!) what you disagree about, or what you are violently nodding your head in agreement about, or what nuances or big-picture things are being overlooked. And, I have an invitation for you: CTTI – the Clinical Trials Transformation Initiative – is hosting a 2-hour summit on April 28 (2026) for patients, caregivers, and patient advocates, designed to facilitate discussion around the range of perspectives on these topics as they apply to clinical trials: specifically, 1) the use of AI in clinical trials and 2) perspectives on data re-use beyond the initial trial the data was collected for.

If you have strong opinions, this is a great place for you to share them and hear from others who may have varying or similar perspectives. I am expecting to learn a lot! This event is not designed around creating consensus, because we know there are varying perspectives on these topics, and thus especially if you disagree or have a nuanced take, your voice is needed! (I may be in the minority, for example, with my thoughts above.) Participation is focused (because these topics are scoped to clinical trials) on anyone who has ever been part of a clinical trial, is currently navigating one, made the decision to leave a trial early, or even those who looked into participating but ultimately decided it wasn’t the right fit.

You can register for the summit here (April 28, 2026, 10am-noon PT / 1-3pm ET), and if you can’t attend, feel free to drop a comment here with your thoughts and I’ll make sure they are shared and included in the notes for the event, which will be shared out afterward, aggregating the perspectives we hear.

What you shouldn’t take away from my talk about patient experiences of using AI in clinical healthcare

I was asked to contribute a talk in a session at Stanford’s recent AI+Health conference. I spoke alongside two clinician researchers, and we organized the session so that I would talk about some individual patient experiences & perspectives on AI use in health, then scale up to talk about health system use, then global perspectives on AI for health-related use cases.

One thing I’ve been speaking about lately (including when I was asked to present in DC at the FDA AI workshop) is that AI…is not one thing. There are different technologies, different models, and they’re going to work differently based on the prompts (like our research showed here about chart notes + LLM responses, as evaluated by a patient and clinician), as well as whether it’s a one-off use, a recurring use, something that’s well-defined in the literature and training materials, or something that’s not well-defined. I find myself stumbling into the latter areas quite a bit, and have a knack for finding ways to deal with this stuff, so I’m probably more attuned to these spaces than most: everything from figuring out how we people with diabetes can automate our own insulin delivery (because it was not commercially available for years); to figuring out how to titrate my own pancreatic enzyme replacement therapy and building an app to help others track and figure things out themselves; to more recent experiences with titration and finding a mechanistic model for an undiagnosed disease. The grey area, the bleeding edge, no-(wo)man’s land where no one has answers or ideas of what to do next…I hang out there a lot.

Because of these experiences, I see a lot of human AND technology errors in these spaces. I see humans make mistakes in chart notes and also in verbal comments or directions to patients. I see missed diagnoses (or wrong diagnoses) because the medical literature is like a game of telephone and the awareness of which conditions are linked to other conditions is wrong. I see LLMs make mistakes. But I see so many human mistakes in healthcare. One example from my recent personal experiences – it’s 2025 and a clinical team member asked me if I had tried cinnamon to manage my glucose levels. This was after she had done my intake and confirmed that I had type 1 diabetes. I looked at her and said no, because that would kill me. She looked surprised, then abashed, when I followed that comment by explaining that I have to take insulin because I have type 1 diabetes. So I am maybe less bothered than the average person by the idea that LLMs sometimes make mistakes, say a wrong (or not quite right, even if it’s not wrong) thing, or don’t access the latest evidence base. They can, when prompted – and so can the human clinical teams.

A big point I’ve been bringing up in these talks is that we need everyone to care about the status quo of human, manual healthcare that is already riddled with errors. We need everyone to care about net risk reduction of errors and problems, not focus on additive risk.

We saw this with automated insulin delivery – people without diabetes were prone to focus on the additive risk of the technology and compare it to the zero risk they face as a person without diabetes, rather than correctly looking at the net risk assessment of how automated insulin delivery lowers the risk of living with insulin-managed diabetes even though yes, there is some additive risk. But almost a decade later, AID is now the leading therapy choice for people with insulin-managed diabetes who want it, and OS-AID is right there in the standards of care in 2026 (and has been for years!!!) with a strong evidence base.

I also talked about my experiences observing the real use of clinical scribe tools. I talked more in detail about it in this blog post, but I summarized it in my talk at Stanford, and pointed out how I was surprised to learn later – inadvertently, rather than in the consent process – that the scribe tool had access to my full medical record. It was not just transcribing the conversation and generating chart notes from it.

I also pointed out that my health system has never asked me for feedback about the tool, and that I’ve actually seen my same clinician use multiple different scribe technologies, but with no different consent process and no disclosure about chart access or any differences in privacy policy, retention timeline, etc. (Don’t pick on my clinician: she’s great. This is a point about the broader systematic failures.)

This is where I realized later that people might have taken away the wrong point. This was not about me being a whiny, self-centered patient: “oh, they didn’t ask for my feedback! My feedback is so important!”

It was about flagging that these technologies do not have embedded feedback loops from ALL stakeholders, and this is more critical as we roll out more of these technologies related to healthcare.

It’s one thing to do studies and talk about user perspectives and concerns about this technology (great, and the other two presenters did an awesome job talking about their work in these spaces).

But we need to do more. We need to have pathways built in so all stakeholders – all care team members; patients; caregivers/loved ones; etc. – have ways to talk about what is working and what is not, on everything from errors/hallucinations in the charts to the informed consent process and how it’s actually being implemented in practice.

It matters where these pathways are implemented. The reason I haven’t forced feedback into my health system is two-fold. Yes, I have agency and the ability to give feedback when asked. Because I’ve worked at a health system before, I’m aware there’s a clinic manager and people I could go find to talk to about this. But I have bigger problems I’m dealing with (and thus limited time). And I’m not bringing it up to my clinician because we have more important things to do with our time together AND because I’m cognizant of our relationship and the power dynamics.

What do I mean? The clinician in question is new-ish to me. I’ve only been seeing her for less than two years, and I’ve been carefully building a relationship with her. Both because I’m not quite a typical patient, and because I’m not dealing with a typical situation, I’m still seeking a lot from her and her support (she’s been great, yay) to manage things while we try to figure out what’s going on. And then ditto from additional specialists, who I’m also trying to build relationships with individually, and then think about how their perceptions of me interplay with how/when they work with each other to work on my case, etc.

(Should I have to think about all this? Do I think about it too much? Maybe. But I think there’s a non-zero chance that a lot of my thinking about this, and about how I’m coming across in my messages and appointments, has played a role in the micro-successes I’ve had in the face of a rubbish and progressive health situation that no human or LLM has answers for.)

So sure, I could force my feedback into the system, but I shouldn’t have to, and I definitely shouldn’t have to be doing the risk-reward calculation on feedback directly to my clinician about this. It’s not her fault/problem/role and I don’t expect it to be: thus, what I want people to take away from my talk and this post is that I’m expecting system-level fixes and approaches that do not put more stress or risk on the patient-provider relationship.

The other thing I brought up is a question I left with everyone, which I would love to spur more discussion on. In my individual case, I have an undiagnosed situation. A team of specialists (and yes, second opinions) has not been able to diagnose/name/characterize it. I have shifted to pointedly asking providers to step away from naming and instead think broadly: what is their mental model for the mechanism of what’s going on? That’s hard to answer, because providers aren’t often thinking that way. But in this situation, when everything has been ruled out but there is CLEARLY something going on, it’s the correct thing to do.

And for the last year, the LLMs had a hard time doing it, too. Because they’re trained on the literature and the human-driven approach of making differential diagnoses, the LLMs have struggled with this. I recently began asking very specifically to work on mechanistic models. I then used that mechanistic model to frame follow-up questions and discussions, to rule things in/out, to figure out where there are slight overlaps, to see where that gives us clues or evidence for/against our mechanistic model hypothesis, and to see what treatment success / failure / everything in between tells us about what is going on. Sure, a name and a diagnosis would be nice, but it’s been such a relief to have at least a hypothetical mechanistic model to use and work from. And it took specific probing (and the latest thinking/reasoning models that are now commonly available). But why am I having to do it, and why are my clinicians not doing this?

I know some of the answers to the question of why clinicians aren’t doing this. But, the question I asked the Stanford AI+Health audience was to consider why we focus so much on informed consent for taking action, but we ignore the risks and negative outcomes that occur when not taking action.

And specifically in rare/undiagnosed diseases or edge cases of known diseases…do clinical providers have an obligation now to disclose when they are NOT using AI technology to facilitate the care they are providing?

It’s an interesting question, and one I would love to keep discussing. If you have thoughts, please share them!

A new symptom score for people with exocrine pancreatic insufficiency (the EPI/PEI-SS)

One of the frequent complaints in the literature about exocrine pancreatic insufficiency (known as EPI in some parts of the world, or PEI for pancreatic exocrine insufficiency elsewhere) is that the symptoms are not specific and they can overlap with other conditions. Diarrhea, for example, can happen from a lot of conditions and a lot of medications. Not everyone with EPI has diarrhea, though. Another problem is that there are other symptoms that occur in EPI other than diarrhea and weight loss, but there’s not been any data on which groups of people experience which types of symptoms with EPI, or how common the other symptoms are, so they often aren’t listed. This leads to a cycle of lack of awareness, lack of screening, lack of diagnosis, and lack of treatment.

There’s been little effort to date to solve this problem, and I found myself wondering if we as patients, who experience the symptoms directly, could find a way to address this. Between my systematic review papers (where I’ve read hundreds of papers about the symptoms & diagnostic approaches to EPI) and personal experience with EPI, I made a list of 15 symptoms. But it’s not just about which symptoms people have: that’s where the overlap problem comes in. With EPI, many people have a lot of symptoms, a lot of the time, and they are VERY annoying. So the frequency and severity of the symptoms are a hallmark as well. I put together a way to quantify the frequency and severity of symptoms (using plain language), and the EPI/PEI-SS (Exocrine Pancreatic Insufficiency Symptom Score) was born.

With help from more than a dozen people, some with EPI and some without, I ran a pilot test with the symptom score to see if people with EPI would generate high scores the way I did, and whether people without EPI (with either everyday gastrointestinal symptoms, or other conditions that sometimes cause GI symptoms) would have similar scores. They did not: the difference was stark, with no overlap. The EPI symptom burden was quantifiably much higher than everyday GI symptoms for someone without a condition, and also higher compared to people with other conditions with GI symptoms (think food intolerances, IBS, other non-EPI GI conditions).

So I launched a bigger study that many of you participated in (thank you!), with the goal of exploring whether this score would be useful in the general population to help distinguish EPI from other conditions and whether it might possibly aid in screening for EPI.

And now, the results are published! (You can read the full open access paper here: https://doi.org/10.3390/epidemiologia6030048).

Here’s what we learned:

There were 324 participants at the time I cut off data collection for the analysis (after three weeks). This included 155 people who identified as having EPI, and 169 people without EPI. Everyone answered whether or not they had any of the 15 symptoms (falling into three groups: abdominal, toilet-related, and food-related symptoms) and indicated frequency and severity. Multiplying frequency (0–5) by severity (0–3) for each of the 15 symptoms and summing the results gives an EPI/PEI-SS score range of 0–225. (See Table 1 for a list of the symptoms and the rating description.)
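In code, the scoring itself is simple. Here’s a minimal sketch (the two symptom labels in the example are mine for illustration; see Table 1 in the paper for the actual 15 symptoms and the plain-language frequency/severity wording):

```python
def epi_pei_ss(responses: dict[str, tuple[int, int]]) -> int:
    """Total EPI/PEI-SS: sum of frequency x severity across the 15 symptoms.

    responses maps each symptom to a (frequency, severity) pair,
    where frequency is rated 0-5 and severity is rated 0-3.
    Max per symptom = 5 * 3 = 15; max total = 15 symptoms * 15 = 225.
    """
    total = 0
    for symptom, (frequency, severity) in responses.items():
        if not (0 <= frequency <= 5 and 0 <= severity <= 3):
            raise ValueError(f"out-of-range rating for {symptom}")
        total += frequency * severity
    return total

# Two of the 15 symptoms filled in, as an example:
print(epi_pei_ss({"abdominal pain": (3, 2), "bloating": (5, 3)}))  # 6 + 15 = 21
```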

The key finding: people with EPI had higher scores than people without EPI.

In this real-world study, the mean total score of those with EPI was 98.11 (min 1, max 213), in contrast to a mean total score of 38.86 for those without EPI (min 0, max 163). The difference is practically as well as statistically (p<.001) significant.

Figure 1 from the paper showing the sub-scores and total scores broken out by EPI and non-EPI groups, respectively.

Even when I separated the people without EPI into two groups, those with other gastrointestinal conditions and those without, the scores were still distinct from, and statistically significantly different from, those of people with EPI. I also did a sub-analysis of each individual condition, and none had a significant impact on the overall score. (Because so many people with diabetes in my network participated in the study, I also ran a separate sub-analysis to deeply analyze the contributions of type 1 diabetes and type 2 diabetes – and wrote a separate paper on this analysis, which is also open access and available to read here.) Also in the bucket of “things that did not affect the score” was age. However, females in the study reported higher scores compared to males (this matches other studies showing a higher gastrointestinal burden, so this isn’t necessarily unique to EPI).

In addition to the overall score, you can see the difference by looking at the number of symptoms people reported and the difference in frequency and severity:

  • EPI group: 12.39 symptoms, average frequency 3.02, average severity 1.73
  • Non-EPI group: 8.15 symptoms, with nearly half the frequency (1.55) and severity (0.91)

Figure 3 from the paper, showing each of the 15 symptoms and the range of scores for the EPI group (purple) and non-EPI group (blue), respectively.

Nerdy notes (you can read more in the full paper): Cohen’s d (1.475) indicated a large effect size; all comparisons overall and across sub-groups and across symptom categories were statistically significant (p<0.001). Cronbach’s alpha for sub-score categories was “good” (0.88 abdominal, 0.83 toilet, 0.88 food), indicating high internal consistency and good construct validity. Using an EPI/PEI-SS cutoff of 59 (out of possible 225), area under the curve was 0.85, sensitivity was 0.81, and specificity was 0.75.
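If you want to see how a cutoff analysis like that works mechanically, here’s a sketch using synthetic data shaped roughly like the study’s summary statistics (this is NOT the paper’s analysis code or data, so it won’t reproduce the exact published numbers):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
# Synthetic stand-ins loosely matching the reported group means and sizes:
epi_scores = rng.normal(98, 45, 155).clip(0, 225)      # 155 people with EPI
non_epi_scores = rng.normal(39, 30, 169).clip(0, 225)  # 169 people without EPI

scores = np.concatenate([epi_scores, non_epi_scores])
labels = np.concatenate([np.ones(155), np.zeros(169)])  # 1 = EPI, 0 = non-EPI

print("AUC:", round(roc_auc_score(labels, scores), 2))

cutoff = 59  # the cutoff reported in the paper
predicted_epi = scores >= cutoff
sensitivity = (predicted_epi & (labels == 1)).sum() / (labels == 1).sum()
specificity = (~predicted_epi & (labels == 0)).sum() / (labels == 0).sum()
print(f"cutoff {cutoff}: sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
```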

Were there limitations for this study? You bet. It was online and based on people who happened to fill it out, so follow-up studies will help confirm these results in different populations and assess whether they represent the average EPI experience. (Note that this study population did have a lot more diversity of people with EPI compared to most other EPI-related symptom assessment studies, which are often limited to chronic pancreatitis and cystic fibrosis, and/or pancreatic surgery/cancer.) There was a large number of people with diabetes who participated, in part because of my network and where I recruited participants from – however, as seen in this sub-study, presence of diabetes (any type, or split into type 1 and type 2) did not influence scores (analysis here). This study was also exploratory, meaning it was not powered for a specified outcome. We’ve now been able to use this data to power follow-up studies, now that we know what to expect score-wise in people with and without EPI!

What should you take away from this study?

If you are a person with some kind of gastrointestinal symptoms, you can use the EPI/PEI-SS to explore your symptoms and quantify them based on frequency and severity. If your score is near or above the cutoff, you may want to consider discussing your symptoms with your doctor and exploring whether testing for EPI (often fecal elastase testing) is warranted. This tool hasn’t been validated as a diagnostic method, but this data can help the shared decision making process and hopefully also aid you in a better conversation with your doctor as you explore pathways to solutions.

The EPI/PEI-SS is available online, for free, and you can use it right now: https://danamlewis.github.io/EPI-PEI-SS/

If you are a person with EPI and you are still struggling with symptoms of EPI (PEI), you may find it handy to take the EPI/PEI-SS to document your symptom burden. Then, as you adjust your enzyme dosing, you can periodically take the EPI/PEI-SS again (every few weeks or months) and use it to help you track whether things are improving. You can use the web version, or if you want to also track your enzyme (PERT) dosing, you can use the EPI/PEI-SS in both the iOS (https://bit.ly/PERT-Pilot-iOS) and Android (https://bit.ly/PERT-Pilot-Android) versions of “PERT Pilot”. Then you can see your scores and view them over time in the same place.

Note that the scores of people with EPI in the study don’t mean that ‘this is as good as it gets’ when you go on enzymes. Many people with EPI indicate that they feel they are not dosing enough enzymes (see this study); the scores on the EPI/PEI-SS reflect this. It is possible for people with EPI to get scores in the non-EPI range, once enzymes are regularly dosed to match what you’re eating. (For example, my score went from well above the cutoff to well below the average non-EPI score once I started enzymes.)

If you are a doctor, take a look at the EPI/PEI-SS (see links, or Table 1 in the paper) so you know what some of the symptoms of EPI are. Notably, be aware that diarrhea and weight loss are not the only symptoms of EPI. In the diabetes sub-study, for example, we found food-related behaviors to be a key variable, as many people intuitively adjust what or how they are eating to try to eliminate symptoms on their own. Pain is not prominent in all corners of the EPI community (it’s more common among people with pancreatitis). Feel free to have patients use the EPI/PEI-SS any time and use it as part of your shared decision making process.

A new symptom score for exocrine pancreatic insufficiency: new research on the EPI/PEI-SS (a blog by Dana M. Lewis on DIYPS.org)

If you have any feedback (for example, if it’s been helpful or not), you can email me any time (Dana+EPI-PEI-SS@OpenAPS.org). I’d also love to collaborate, if you’re interested in partnering on any research studies. We have some ongoing studies in different countries (US, Ireland, New Zealand, Australia) and in different populations (general population; people with diabetes; people with pancreatic cancer; etc.), and I’m looking forward to partnering with other researchers on additional validation studies and exploring if and how the EPI/PEI-SS can help us address some of the gaps of real-world clinical practice and life with EPI.

If you’re a researcher with shared interest in EPI…ditto the above!

Read the research referenced in this blog post: https://doi.org/10.3390/epidemiologia6030048

Cite it: Lewis DM, Landers A. Development of Novel Symptom Score to Assist in Screening for Exocrine Pancreatic Insufficiency. Epidemiologia. 2025; 6(3):48. https://doi.org/10.3390/epidemiologia6030048

Questions? Please comment below!

If you have EPI-specific questions, you might also like this blog post with 25 questions and answers about EPI (PEI) ranging from symptoms and diagnosis to treatment and dosing titration.

The data we leave behind in clinical trials and why it matters for clinical care and healthcare research in the future with AI

Every time I hear that all health conditions will be cured and fixed in 5 years with AI, I cringe. I know too much to believe in this possibility. But this is not an uninformed opinion or a disbelief in the trajectory of AI takeoff: it is grounded in the reality of how clinical trial data is reported and published, and the limitations of the datasets we have today.

The sad reality is, we leave so much important data behind in clinical trials today (and in every clinical trial done before today). An example of this is how we report “positive” results for a lot of tests or conditions using binary cutoffs and summary reporting, without reporting average titres (levels) within subgroups. This affects our ability to understand and characterize conditions, to compare overlapping conditions with similar results, and to use this information clinically alongside symptoms and presentations of a condition. It’s not just a problem for research; it’s a problem for delivering healthcare. I have some ideas of things you (yes, you!) can do starting today to help fix this problem. It’s a great opportunity to do something now in order to fix the future (and today’s healthcare delivery gaps), not just complain that it’s someone else’s problem. If you contribute to clinical trials, you can help solve this!

What’s an example of this? Imagine an autoantibody test result where values >20 are considered positive. That means values of 21, 53, and 82 are all considered positive. But that’s a wide range – and a much wider spread than is possible with “negative” values, which might be 19, 8, or 3.

When this test is reported by labs, they give suggested cutoffs to interpret “weak”, “moderate”, or “strong” positives. In this example, a value of 20-40 is a “weak” positive, a value between 40-80 is a “moderate” positive, and a value above 80 is a strong positive. Our example values thus span barely a weak positive (21), a solidly moderate positive in the middle of that range (53), and a strong positive just above that cutoff (82). The weak positive could be interpreted as a negative, given test variance of 10% or so. But the problem lies in the moderate positive range. Clinicians are prone to say it’s not a strong positive and therefore should be considered possibly negative, treating it more like the 21 value than the 82 value. And because there are no studies reporting actual titres, it’s unclear whether the average or median “positive” actually falls above the “strong” (>80) cutoff or in the moderate positive category.

Also imagine the scenario where some other conditions occasionally produce positive levels of this antibody, but again the titres aren’t actually published.

Today’s experience, and how clinicians in the real world interpret this data:

  • 21: technically positive, but within ~10% of the cutoff, so it doesn’t necessarily indicate true positivity
  • 53: a moderate positive, but it’s not strong and we don’t have median data for positives, so clinicians lean toward treating it as negative and/or an artifact of a co-condition (given ~10% prevalence in that other condition)
  • 82: strong positive, above the cutoff, easy to treat as positive

Now imagine these values with studies that have reported that the median titre in the “positive” >20 group is actually a value of 58 for the people with the true condition.

  • 21: would still be interpreted as likely negative, even though it’s technically above the positive cutoff (>20), again because of the ~10% error and how far it is below the median
  • 53: a moderate positive, but within ~10% of the median positive value; even though it’s not above the “strong” cutoff, it’s now more likely to be perceived as a true positive
  • 82: still a strong positive, above the cutoff; no change in perception

And what if the titres in the co-condition have a median value of 28? If we know the co-condition median is 28 and the true-condition median is 58, then a test result of 53 is much more likely to be correctly interpreted as the true condition, rather than incorrectly read as negative because it isn’t above the >80 “strong” cutoff.
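
Here’s a minimal sketch (in Python) of how interpretation changes once median titres are known. All of the numbers – the >20 cutoff, the weak/moderate/strong bands, and the medians of 58 and 28 – come from the hypothetical example above, not from any real assay:

```python
def interpret_titre(value, median_condition=58, median_co_condition=28):
    """Interpret a titre using lab bands, then contextualize against
    (hypothetical) published median titres for the condition vs. co-condition."""
    if value <= 20:
        band = "negative"
    elif value <= 40:
        band = "weak positive"
    elif value <= 80:
        band = "moderate positive"
    else:
        band = "strong positive"

    # This comparison is only possible when studies report actual titres:
    if value > 20:
        nearest = ("condition" if abs(value - median_condition)
                   <= abs(value - median_co_condition) else "co-condition")
    else:
        nearest = "n/a"
    return band, nearest

for v in (21, 53, 82):
    print(v, interpret_titre(v))
# 21 -> weak positive, closer to the co-condition median (28)
# 53 -> moderate positive, closer to the condition median (58)
# 82 -> strong positive, closer to the condition median
```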

Why does this matter in the real world? Imagine a patient with a constellation of confusing symptoms whose positive antibody test (which would indicate a diagnosis) is interpreted as negative. This can mean a missed diagnosis, given the absence of other definitive testing for the condition – and, in turn, lack of effective treatment, ineligibility to enroll in clinical trials, impacted quality of life, and possibly a negative effect on survival and lifespan.

If you think I’m cherry-picking a single example, you’re wrong. This has played out again and again in my last few years of researching conditions and autoantibody data. Another real-world scenario: I had a slight positive (e.g. just above a cutoff of 20) on a test that the lab reported is correlated with condition X. My doctor was puzzled, because I have no signs of condition X. I looked up the sensitivity and specificity data for this test: it has only 30% sensitivity and 80% specificity, and 20% of people with condition Y (which I do have) also have this antibody. With just those two pieces of information, it’s easier to interpret and guess that this value is not meaningful as a diagnostic for condition X, given the lack of matching symptoms – yet the lab reports the association with condition X only, even though it’s only slightly more probable for condition X than for condition Y (and several other conditions) to produce this autoantibody. I went looking for research data on raw levels of this autoantibody, to see where the median value lies for positives with condition X and condition Y, and again, like the example above, there is no raw data to use for interpretation. Instead, it’s summaries of summaries built on a simple binary cutoff (>20), which makes clinical interpretation really hard, and makes it impossible to research and meta-analyze the data to support individual interpretation.

And this is a key problem and limitation I see with the future of AI in healthcare that we need to focus on fixing. For diseases that are really well defined and characterized, where we have in vitro or mouse models to use for testing diagnostics and therapies – sure, I can foresee huge breakthroughs in the next 5 years. However, many autoimmune conditions are not well characterized or defined, and the existing data we DO have is based on summaries of cutoff data like the examples above, so we can’t use them as endpoints to compare diagnostics or therapeutic targets. We need to re-do a lot of these studies and record and store the actual data so AI *can* do all of the amazing things we hear so much about.

But right now, for a lot of things, we can’t.

So what can we do? Right now, we actually CAN make a difference on this problem. Gnashing your teeth about the changing research funding landscape? You can take action right now by re-evaluating your current and retrospective datasets and your ongoing studies, and figuring out:

  • Where am I summarizing data, and where does raw data need to be cleaned, tagged, and stored so we can use AI with it in the future to do all these amazing things?
  • What data could I tag and archive now that would be impossible or expensive to regenerate later?
  • Am I cleaning and storing values in formats that AI models could work with in the future (e.g. structured tables, CSVs, or JSON files)?
  • Most simply: how am I naming and storing data files so I can easily find them in the future? “Results.csv” or “results.xlsx” is maybe not ideal for helping you (or your tools) find this data later. How about “autoantibody_test-X_results_May-2025.csv” or similar? (See the sketch after this list.)
  • Where am I reporting data? Can I report more data, as an associated supplementary file or a repository cited in the paper?
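
As a minimal sketch of what that can look like in practice (the file name pattern follows the suggestion above; the participant values and column names are illustrative):

```python
import pandas as pd
from datetime import date

# Per-participant raw values (illustrative), not just cohort summaries.
df = pd.DataFrame(
    {
        "participant_id": ["P001", "P002", "P003"],
        "visit": ["baseline", "baseline", "baseline"],
        "autoantibody_X_titre": [21, 53, 82],
    }
)

# A descriptive, findable file name, per the naming suggestion above.
filename = f"autoantibody_test-X_results_{date.today():%B-%Y}.csv"
df.to_csv(filename, index=False)
print(f"saved {len(df)} rows to {filename}")
```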

You should also ask yourself whether you’re even measuring the right things at the right time, and whether your inclusion and exclusion criteria are too strict, excluding the bulk of the population you should be studying.

An example of this is in exocrine pancreatic insufficiency: studies often don’t look at all of the symptoms that correlate with EPI; they include or allow only co-conditions that represent a tiny fraction of the likely EPI population; and they study the treatment (pancreatic enzyme replacement therapy) without the context of food intake – which is as useful as studying whether insulin works in type 1 diabetes without knowing how many carbohydrates someone is consuming.

You can be part of the solution, starting right now. Don’t just think about how you report data for a published paper (although there are opportunities there, too): think about the long term use of this data by humans (researchers and clinicians like yourself) AND by AI (capabilities and insights we can’t do yet but technology will be able to do in 3-5+ years).

A simple litmus test: if an interested researcher or patient reached out to you, as the author of your study, and asked for the data behind the mean or median values of a reported cohort with “positive” results…could you provide that data to them as an array of values?

For example, if you report that 65% of people with condition Y have positive autoantibody levels, you should also be able to say:

  • The mean value of the positive cohort (>20) is 58.
  • The mean value of the negative cohort (<20) is 13.
  • The full distribution (e.g. [21, 26, 53, 58, 60, 82, 92…]) is available in a supplemental file or data repository.
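
As a minimal sketch, generating exactly those numbers from the raw array takes only a few lines (values here are illustrative, echoing the running example):

```python
import numpy as np

# Full per-participant titres: this is what belongs in a supplement or repository.
titres = np.array([21, 40, 53, 58, 60, 82, 92, 13, 8, 19])

positive = titres[titres > 20]
negative = titres[titres <= 20]

print(f"positive cohort (n={len(positive)}): mean={positive.mean():.0f}, "
      f"median={np.median(positive):.0f}")
print(f"negative cohort (n={len(negative)}): mean={negative.mean():.0f}")

# Publish the full distribution, not just the summary.
np.savetxt("autoantibody_test-X_full-distribution.csv", titres,
           fmt="%d", header="titre", comments="")
```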

That makes a world of difference in characterizing many of these conditions, in developing future models, in testing treatments or comparative diagnostic approaches, and even in getting people correctly diagnosed after previous missed diagnoses due to the lack of data available to correctly interpret lab results.

Maybe you’re already doing this. If so, thanks. But I also challenge you to do more:

  • Ask for this type of data via peer review, either to be reported in the manuscript and/or included in supplementary material.
  • Push for more supplemental data publication with papers, in terms of code and datasets where possible.
  • Talk with your team, colleagues, and institution about long-term storage, accessibility, and formatting of datasets.
  • Better yet, publish your anonymized dataset either with the supplementary appendix or in a repository online.
  • Take a step back and consider whether you’re studying the right things in the right population at the right time.

The data we leave behind in clinical trials (why it matters for clinical care, healthcare research, and the future with AI), a blog post by Dana M. Lewis from DIYPS.org

These are actionable, doable, practical things we can all be doing, today, rather than just gnashing our teeth. The sooner we course correct with improved data availability, the better off we’ll all be in the future, whether that’s tomorrow with better clinical care or in years with AI-facilitated diagnoses, treatments, and cures.

We should be thinking about:

  • How should we design data gathering and data generation in clinical trials not only for the current status quo (humans juggling data and collecting the bare minimum), but for a potential future where machines are the primary viewers of the data?
  • What data would be worth accepting, collecting, and seeking as part of trials?
  • What burdens would that add (and how might we reduce those) now while preparing for that future?

The best time to collect the data we need was yesterday. The second best time is today (and tomorrow).

What bends and what breaks and the importance of knowing the difference as a patient

As a patient, navigating healthcare often feels like decoding a complex rulebook. There are rules for everything: medication dosages, timing protocols, follow-up intervals. Some of these rules matter a lot, for short-term or longer-term safety or health outcomes. But at other times… the rules seem senseless, and are applied differently by different healthcare providers within the same specialty, let alone across different specialties. As a patient, it’s easy to start out wanting to follow all the rules perfectly, but to feel unable to because the rules don’t make sense in a personal context. Over time, it can be hard to resist the conclusion that the rules don’t matter or don’t apply to you. The reality is somewhere in between. And it’s the in-between part that can be a challenging balance to figure out. Learning to navigate this balance requires understanding which rules are flexible and which aren’t.

I’ve learned there’s enormous value in digging into the “why” behind medical recommendations, when I can. Take acetaminophen (Tylenol), for example. There’s a clear, non-negotiable daily limit on the bottle because exceeding it is dangerous. The over-the-counter recommendation for Extra Strength acetaminophen (500 mg tablets) is no more than two tablets every six hours, not exceeding six tablets in 24 hours – which actually means three doses per day, even though six-hour spacing alone would fit four. This maximum daily limit (no more than six tablets, or 3,000 mg) is set close to the safety threshold; exceeding it (e.g. eight tablets, or 4,000 mg, in 24 hours) increases the risk of severe liver damage.

Understanding this daily limit provides flexibility within safe boundaries (with the obvious caveat that I’m not a doctor and you should always talk to your own doctor). The “every 6 hours” recommendation ensures stable bioavailability of acetaminophen throughout the day, and helps ensure that, over the course of 24 hours, you stay safely below the maximum dose. Slight deviations in timing – such as taking a dose at 5 hours and 30 minutes instead of precisely 6 hours because you’re about to go to sleep – do not inherently cause harm, as long as the total intake remains within the safe daily limit. This is an example where a compliance-oriented guideline is designed primarily for optimal adherence at the population level, rather than marking an absolute safety threshold for each individual dose.
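
To make the “flexible timing, fixed daily maximum” idea concrete, here’s a minimal sketch (a thought exercise only, not dosing advice; the numbers reflect the Extra Strength label discussed above). The check that matters is the rolling 24-hour total, not the exact spacing:

```python
def within_daily_limit(dose_times_hours, mg_per_dose=1000, daily_max_mg=3000):
    """Return True if no rolling 24-hour window exceeds the daily maximum.
    Numbers mirror the OTC Extra Strength label discussed above (2 x 500 mg
    per dose, max six tablets/day); a thought exercise, not medical advice."""
    times = sorted(dose_times_hours)
    for start in times:
        doses_in_window = sum(1 for t in times if start <= t < start + 24)
        if doses_in_window * mg_per_dose > daily_max_mg:
            return False
    return True

print(within_daily_limit([0, 5.5, 12]))      # True: a dose 30 min "early" is still 3,000 mg/24h
print(within_daily_limit([0, 5.5, 11, 17]))  # False: a 4th dose pushes past 3,000 mg/24h
```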

There are a lot of things like this in healthcare, but they’re not always explained to patients, and patients may not always think to stop and question the why – or have the time and resources to do so – and work out from first principles whether a deviation in timing or amount is risky or not.

But many healthcare rules aren’t as clearly defined by safety as in the acetaminophen example. Other rules are shaped by convenience, compliance, and the practical constraints of research protocols.

Timelines like “two weeks,” “one month,” or “six months” for follow-up visits or medication titration points often reflect research convenience more than physiological necessity or even the ideal best practice. These intervals might mark study endpoints, or convenience to the healthcare system, but they don’t necessarily pinpoint the best timeline overall or the right timeline for an individual patient. It can be hard as a patient to decide if your experience is deviating from the typical timeline in a beneficial or non-optimal way, and if and when to speak up and try to better adjust to the system or adjust the system to meet your needs (such as scheduling an earlier appointment rather than waiting for a mythical 4 month follow up when it’s clear by months 2-3 that there is no benefit to a treatment because any impact should have been observed by then, even if it wasn’t significant).

As a patient, understanding when rules reflect safety versus when they’re crafted primarily for convenience is crucial, but hard. Compliance-driven rules can sometimes be thoughtfully bent. They might be able to be adjusted to better fit individual circumstances without compromising safety. For instance, a medication schedule set strictly every eight hours might be modified slightly based on daily activities or sleep patterns, provided the change remains within safe therapeutic boundaries over the course of 24 hours. (And patients should be able to discuss this with their doctors! But time availability or access may influence the ability to have these conversations up front or over time as conflicts or issues arise.)

Yet bending rules requires confidence, critical thinking, and often significant resources – educational, emotional, health itself, or financial. It means feeling secure enough to question a provider’s advice or advocate for adjustments tailored to individual needs. Often it’s not even questioning the advice itself, but checking your understanding and interpretation of how to apply it to your own life. Most providers understand that, and have no problem confirming your understanding. Other times, though, it can unintentionally cause conflict, if a provider perceives it as questioning their judgment.

I’ve tripped into that situation at least once before, when I had a follow-up appointment with a non-MD clinical provider who wasn’t my main doctor at the practice, whom I was seeing for an acute short-term issue. She was describing a recommendation for an rx, specifically because I have diabetes. In the past, I have received over-treatment from providers because of having type 1 diabetes: many non-diabetes care recommendations that include guidance for people with diabetes assume non-optimal healing and non-optimal glucose management. Given that at the time I was already using OpenAPS, with ideal glucose outcomes for years and no history of reduced healing, I asked whether the same prescription would be recommended to a patient without diabetes. I was trying to make an informed decision about whether the rx was appropriate for me; if it was being recommended just because I had diabetes, it warranted additional discussion. It wasn’t about her clinical judgement per se, but about a shared decision-making process to right-size the next steps to my individual situation, rather than assuming that population-based outcomes for people with diabetes were automatically applicable. From experience, I know that sometimes they are and sometimes they are not, so I’ve learned to ask these questions. However, some combination of the lack of an existing relationship with this provider, perhaps a poorly worded question, and other factors made the provider act defensively. I got the information I needed, decided the rx was appropriate for me and that I would use it, and went about my business. But I later got a follow-up call from another MD (again, not my MD) who was defensive and checking why I was questioning this non-MD provider – and it came across as if I was being accused of questioning her because the provider was a non-MD…which was not the issue at all! It was about me and my care, and making sure I understood the root of the recommendation: whether it was because of the health situation or because I had diabetes. (It was the former, about the health situation, although it was initially articulated as being simply because I had diabetes.)

This situation has colored all future encounters with healthcare providers for me. Seeing new providers who I don’t have a longstanding relationship with makes me nervous, from learned, lived experience about how some of these one-off encounters have gone in the past, like the ones above.

Unfortunately, patients who push back against compliance-driven rules – or simply ask questions to facilitate their understanding – risk being labeled “non-compliant” or “non-adherent”, and sometimes we get labels on our charts for asking questions and being misunderstood, despite our good intentions. Such labels can have lasting impacts, influencing how future providers perceive our reliability and credibility, and can cause subsequent issues with receiving, or even being granted access to, healthcare.

This creates a profound dilemma for patients: follow all rules precisely, without question, potentially sacrificing optimal care; or thoughtfully question and bend them, and risk being misunderstood or penalized for trying to optimize your individual outcomes when the one-size-fits-all approach doesn’t actually fit.

Breaking compliance-oriented rules isn’t about defiance. At least, it’s never been that way for me. It’s about personalization and achieving the best possible outcomes. But not every patient has the luxury of confidently navigating these nuances, and even when they do, as described above, it can still sometimes turn out not so well. Many patients don’t have the time, energy, resources, or privilege required to safely challenge or reinterpret guidelines. Or they’ve been penalized for doing so. Consequently, they may remain strictly compliant, potentially missing opportunities for better individual outcomes and higher quality of life.

Healthcare needs to provide clarity around which rules are absolute safety boundaries and which are recommendations optimized primarily for convenience or broad adherence for safe general public use. Patients deserve transparency and support in discerning what’s bendable for individual benefit and what’s non-negotiable for safety.

What bends, what breaks and the importance of understanding the difference in healthcare. A blog post by Dana M. Lewis from DIYPS.org

And: patients should not be punished for asking questions in order to better understand or check their understanding.

Knowing the difference on what bends and what breaks matters. But many patients remain caught in the delicate balance between bending and breaking, carefully evaluating risks and rewards, often alone.

How Medical Research Literature Evolves Over Time Like A Game of Telephone

Have you ever searched for or through medical research on a specific topic, only to find different studies saying seemingly contradictory things? Or you find something that doesn’t seem to make sense?

You may experience this, whether you’re a doctor, a researcher, or a patient.

I have found it helpful to consider that medical literature is like a game of telephone, where a fact or statement is passed from one research paper to another, which means that sometimes it is slowly (or quickly!) changing along the way. Sometimes this means an error has been introduced, or replicated.

A Game of Telephone in Research Citations

Imagine a research study from 2016 that makes a statement based on the best available data at the time. Over the next few years, other papers cite that original study, repeating the statement. Some authors might slightly rephrase it, adding their own interpretations. By 2019, newer research has emerged that contradicts the original statement. Some researchers start citing this new, corrected information, while others continue citing the outdated statement – because they haven’t updated their knowledge, or because they are relying on older sources, especially when they see other papers pointing to those older sources and find it easiest to point to them, too. It’s not always made clear in the literature that the outdated statement is now known to be incorrect. (And of course, the statement doesn’t become ‘incorrect’ until later – at the time it’s made, it’s considered correct.)

By 2022, both the correct and incorrect statements appear in the literature. Eventually, a majority of researchers transition to citing the updated, accurate information—but the outdated statement never fully disappears. A handful of papers continue to reference the original incorrect fact, whether due to oversight, habit (of using older sources and repeating citations for simple statements), or a reluctance to accept new findings.

The gif below illustrates this concept, showing how incorrect and correct statements coexist over time. It also highlights how researchers may rely on citations from previous papers without always checking whether the original information was correct in the first place.

Animated gif illustrating how citations branch off and even if new statements are introduced to the literature, the previous statement can continue to appear over time.

This is not necessarily a criticism of researchers/authors of research publications (of which I am one!), but an acknowledgement of the situation that results from these processes. Once you’ve written a paper and cited a basic fact (let’s imagine you wrote the paper in 2017, citing the 2016 paper and fact), it’s easy to keep using that citation over time. Imagine it’s 2023 and you’re writing a paper in the same topic area: it’s very easy to drop in the same 2016 citation for the same basic fact, and you may not think to update the citation or check whether the fact is still the fact.

Why This Matters

Over time, a once-accepted “fact” may be corrected or revised, but older statements can still linger in the literature, continuing to influence new research. Understanding how this process works can help you critically evaluate medical research and recognize when a widely accepted statement might actually be outdated—or even incorrect.

If you’re looking into a medical topic, it’s important to pay attention not just to what different studies say, but also when they were published and how their key claims have evolved over time. If you notice a shift in the literature—where newer papers cite a different fact than older ones—it may indicate that scientific understanding has changed.

One useful strategy is to notice how frequently a particular statement appears in the literature over time.

Whenever I have a new diagnosis or a new topic to research on one of my chronic diseases, I find myself doing this.

I go and read a lot of abstracts and research papers about the topic; I generally observe patterns in terms of key things that everyone says, which establishes what the generally understood “facts” are, and also notice what is missing. (Usually, the question I’m asking is not addressed in the literature! But that’s another topic…)

I pay attention to the dates, observing when something is said in papers in the 1990s and whether it’s still being repeated in the 2020s era papers, or if/how it’s changed. In my head, I’m updating “this is what is generally known” and “this doesn’t seem to be answered in the literature (yet)” and “this is something that has changed over time” lists.

Re-Evaluating the Original ‘Fact’

In some cases, it turns out the original statement was never correct to begin with. This can happen when early research is based on small sample sizes, incomplete data, or incorrect assumptions. Sometimes the statement was correct in context, but was immediately taken out of context, and that out-of-context use was never corrected.

For example, a widely cited statement in medical literature once claimed that chronic pancreatitis is the most common cause of exocrine pancreatic insufficiency (EPI). This claim was repeated across numerous papers, reinforcing it as accepted knowledge. However, a closer examination of population data shows that while chronic pancreatitis is a known co-condition of EPI, it is far less common than diabetes—a condition that affects a much larger population and is also strongly associated with EPI. Despite this, many papers still repeat the outdated claim without checking the original data behind it.

(For a deeper dive into this example, you can read my previous post here. But TL;DR: even 80% of 0.03% is a smaller number than 10% of 10% of the overall population…so it is not plausible that CP is the biggest cause of EPI/PEI.)
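
Spelling out that back-of-the-envelope arithmetic (using the rough prevalence figures from the linked post):

```python
# Chronic pancreatitis (CP): ~0.03% of the population, of whom ~80% develop EPI.
cp_epi = 0.0003 * 0.80   # = 0.00024 -> ~0.024% of the overall population

# Diabetes: ~10% of the population, of whom ~10% have EPI.
dm_epi = 0.10 * 0.10     # = 0.01 -> ~1% of the overall population

print(dm_epi / cp_epi)   # ~42: diabetes-related EPI is roughly 40x more common
```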

Stay Curious

This realization can be really frustrating, because if you’re trying to do primary research to help you understand a topic or question, how do you know what the truth is? This is peer-reviewed research, but what this shows us is that the process of peer-review and publishing in a journal is not infallible. There can be errors. The process for updating errors can be messy, and it can be hard to clean up the literature over time. This makes it hard for us humans – whether in the role of patient or researcher or clinician – to sort things out.

But beyond a ‘woe is me, this is hard’ moment of frustration, I find that this perspective of literature as a game of telephone makes me a better reader of the literature and forces me to think more critically about what I’m reading, taking each paper in the context of the broader landscape of literature and the evolving knowledge base. It tempers the weight I would otherwise be prone to assign any one paper (and any one ‘fact’ or finding from a single paper), and encourages me to calibrate against that broader knowledge base and its timeline.

That can also be hard to deal with personally as a researcher/author, especially as someone who tends to work in the gaps, establishing new findings and facts and introducing them to the literature. Some of my work also involves correcting errors in the literature – errors that, from my outsider/patient perspective, are often obvious because I come with fresh eyes and evaluate at a systematic-review level, without being as deep in the weeds. That means my work, disseminating new or corrected knowledge, is even more challenging. It’s also challenging personally as a patient, when I “just” want answers and for everything to already be studied, vetted, published, and widely known by everyone (including me and my clinician team).

But it’s usually not, and that’s just something I – and we – have to deal with. I’m curious whether we will eventually develop AI tools to address this. Perhaps a mini systematic-review tool that scrapes the literature and analyzes how statements have changed over time. This is done in systematic reviews and narrative reviews of the literature, but those papers are driven by researcher interests (and time and funding), and I often have questions that no systematic or narrative review covers. Some I turn into papers myself (such as my paper systematically reviewing the dosing guidelines and research on pancreatic enzyme replacement therapy for people with exocrine pancreatic insufficiency, known as EPI or PEI; a systematic review on the prevalence of EPI in the general population; and a systematic review on the prevalence of EPI in people with diabetes (Type 1 and Type 2)), but sometimes it’s just a personal question, and it would be great to have a tool to help see how information has changed over time. Maybe someone will eventually build that tool, or it’ll go on my list of things I might want to build, and I’ll build it myself like I have done with other research tools in the past, both without and with AI assistance. We’ll see!

TL;DR: be cognizant that medical literature changes over time, and keep this in mind when reading any single paper. Sometimes there are competing “facts” or beliefs or statements in the literature, and sometimes you can identify how they evolved over time, so that you can better assess the accuracy of research findings and avoid relying on outdated or incorrect information.

Whether you’re a researcher, a clinician, or a patient doing research for yourself, this awareness can help you better navigate the scientific literature.

A screenshot from the animated gif showing how citation strings happen in the literature, branching off over time but often still resulting in a repetition of a fact that is later considered to be incorrect, thus both the correct and incorrect fact occur in the literature at the same time.

The prompt matters when using Large Language Models (LLMs) and AI in healthcare

I see more and more research papers coming out these days about different uses of large language models (LLMs, a type of AI) in healthcare. There are papers evaluating LLMs for supporting clinicians in decision-making, aiding in note-taking and improving clinical documentation, and enhancing patient education. But I see a trend in the titles and conclusions of these papers, exacerbated by media headlines, of making sweeping claims about the performance of one model versus another. I challenge everyone to pause and consider a critical fact that is less obvious: the prompt matters just as much as the model.

As an example of this, I will link to a recent research article I worked on with Liz Salmi (published article here; pre-print here).

Liz nerd-sniped me with an idea for a study: have a patient and a neuro-oncologist evaluate LLM responses to patient-generated queries about a chart note (or visit note, open note, clinical note – whatever you want to call it). I say nerd-sniped because I got very interested in designing the methods of the study, including making sure we used the APIs to model these ‘chat’ sessions, so that the prompts were not influenced by custom instructions, ‘memory’ features within the account or chat sessions, etc. I also wanted to test something I’ve observed anecdotally in personal LLM use across other topics: with 2024-era models, the prompt matters a lot for what type of output you get. So that’s the study we designed, and wrote with Jennifer Clarke, Zhiyong Dong, Rudy Fischmann, Emily McIntosh, Chethan Sarabu, and Catherine (Cait) DesRoches. I encourage you to check out the article (or the pre-print) and enjoy the methods section, which is critical for understanding the point I’m trying to make here.

In this study, the data showed that when LLM outputs were evaluated for a healthcare task, the results varied significantly depending not just on the model but also on how the task was presented (the prompt). Specifically, persona-based prompts—designed to reflect the perspectives of different end users like clinicians and patients—yielded better results, as independently graded by both an oncologist and a patient.

The Myth of the “Best Model for the Job”

Many research papers conclude with simplified takeaways: Model A is better than Model B for healthcare tasks. While performance benchmarking is important, this approach often oversimplifies reality. Healthcare tasks are rarely monolithic. There’s a difference between summarizing patient education materials, drafting clinical notes, and assisting with complex differential diagnosis tasks.

But even within a single task, the way you frame the prompt makes a profound difference.

Consider these three prompts for the same task:

  • “Explain the treatment options for early-stage breast cancer.”
  • “You’re an oncologist. Explain the treatment options for early-stage breast cancer.”
  • “You’re an oncologist. Explain the treatment options for early-stage breast cancer as you would to a newly diagnosed patient with no medical background.”

The second and third prompts likely result in more accessible and tailored responses. If a study only tests general prompts (e.g. prompt one), it may fail to capture how much more effective an LLM can be with task-specific guidance.
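
If you want to see this effect yourself, here’s a rough sketch of testing the three prompts via an API – using the API (rather than a chat window) keeps account-level custom instructions and ‘memory’ out of the comparison, which is also why we used APIs in the study’s methods. This assumes the OpenAI Python client; the model name is a placeholder, and any chat-completion API works similarly:

```python
from openai import OpenAI  # assumes the OpenAI Python client; other APIs are similar

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

prompts = [
    "Explain the treatment options for early-stage breast cancer.",
    "You're an oncologist. Explain the treatment options for early-stage breast cancer.",
    "You're an oncologist. Explain the treatment options for early-stage breast cancer "
    "as you would to a newly diagnosed patient with no medical background.",
]

for prompt in prompts:
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; substitute whichever model you're evaluating
        messages=[{"role": "user", "content": prompt}],  # fresh context per call: no memory
        temperature=0,   # reduce run-to-run variation so prompt differences stand out
    )
    print(prompt[:60], "->", response.choices[0].message.content[:200], "\n---")
```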

Why Prompting Matters in Healthcare Tasks

Prompting shapes how the model interprets the task and generates its output. Here’s why it matters:

  • Precision and Clarity: A vague prompt may yield vague results. A precise prompt clarifies the goal and the speaker (e.g. in prompt 2), and also often the audience (e.g. in prompt 3).
  • Task Alignment: Complex medical topics often require different approaches depending on the user—whether it’s a clinician, a patient, or a researcher.
  • Bias and Quality Control: Poorly constructed prompts can inadvertently introduce biases into the output.

Selecting a Model for a Task? Test Multiple Prompts

When evaluating LLMs for healthcare tasks—or applying insights from a research paper—consider these principles:

  1. Prompt Variation Matters: If an LLM fails on a task, it may not be the model’s fault. Try adjusting your prompts before concluding the model is ineffective, and avoid broad sweeping claims about a field or topic that aren’t supported by the test you are running.
  2. Multiple Dimensions of Performance: Look beyond binary “good” vs. “bad” evaluations. Consider dimensions like readability, clinical accuracy, and alignment with user needs, as an example when thinking about performance in healthcare. In our paper, we saw some cases where a patient and provider overlapped in ratings, and other places where the ratings were different.
  3. Reproducibility and Transparency: If a study doesn’t disclose how prompts were designed or varied, its conclusions may lack context. Reproducibility in AI studies depends not just on the model, but on the interaction between the task, model, and prompt design. You should be looking for these kinds of details when reading or peer reviewing papers. Take results and conclusions with a grain of salt if these methods are not detailed in the paper.
  4. Involve Stakeholders in Evaluation: As shown in the preprint mentioned earlier, involving both clinical experts and patients in evaluating LLM outputs adds critical perspectives often missing in standard evaluations, especially as we evolve to focus research on supporting patient needs and not simply focusing on clinician and healthcare system usage of AI.

What This Means for Healthcare Providers, Researchers, and Patients

  • For healthcare providers, understand that the way you frame a question can improve the usefulness of AI tools in practice. A carefully constructed prompt, adding a persona or requesting information for a specific audience, can change the output.
  • For researchers, especially those developing or evaluating AI models, it’s essential to test prompts across different task types and end-user needs. Transparent reporting on prompt strategies strengthens the reliability of your findings.
  • For patients, recognize that AI-generated health information is shaped by both the model and the prompt. This awareness can support critical thinking when interpreting AI-driven health advice. Remember that LLMs can be biased, but so can humans in healthcare. The same approach for assessing bias and evaluating experiences in healthcare should be applied to LLM output as well as human output. Everyone (humans) and everything (LLMs) is capable of bias or errors in healthcare.

Prompts matter, so consider model type as well as the prompt as a factor in assessing LLMs in healthcare. Blog by Dana M. Lewis

TL;DR: Instead of asking “Which model is best?”, a better question might be:

“How do we design and evaluate prompts that lead to the most reliable, useful results for this specific task and audience?”

I’ve observed – and this study adds evidence – that the prompt’s interaction with the model matters.

Best practices in communication related to writing a journal article and sharing it with co-authors

I’ve been a single author, a lead author, a co-author, a corresponding author, AND a last author. Basically, I have written a lot of journal articles, both solo and with other people. One area of this process that I observe frequently gets overlooked is what happens during and after submission, as it relates to communicating about the article itself.

I’m not talking about disseminating the article to your target audience or the public, either (although that is important as well). I’m talking about making sure all authors know when the article has been accepted and when it is live – and that they all have access to a copy of the article (!), and so on.

Most people don’t know that by default, not all journals give all authors access to their own articles for free.

Here are some tips about the process of submitting and saving published articles that will help all authors – even solo authors – in the future.

Basically, help you help your future self! (As well as help your co-authors).

Journals typically notify only the lead/corresponding/submitting author about where the manuscript is in terms of revision, acceptance, and publication. That puts the responsibility on that author to tell the full team of authors where the article is in the process. Similarly, some journals will send a PDF of the proofed, final, version-of-record article to the lead author (not always, but usually), but that often does not go out to the full author team by default.

This means that it is the lead author’s responsibility to forward the copy of the final, PDF, proofed article to the entire authorship team so everyone has a copy.

(No, most of the time authors do not have free access to the journal they are submitting to. No, most authors do not have the budget to make articles open access and free to all. That means unless they manage to snag and save the PDF when it is sent to them at the time of publication, they may not have future access to their very own article! Just because you, as the lead/corresponding author, have access does not mean everyone on your author team will.

I’m a good example of someone who authors frequently but is not at an institution and has zero access to any paywalled journals. If I’m not given a copy of my articles at the time of publication, I have to phone-a-friend (thanks, Liz Salmi, for being the go-to for me here) to help pull articles. There are things like S c i H u b, but they more often than not do not have super recent, fresh off the press articles. So yes, people like me exist on your authorship teams.)

Best practices for authors include:

  • Once you submit a manuscript, mark your file name (somehow) with “Submitted”. This way you know this is the version that was submitted. This is useful for the steps below; we’ll come back to why you may want to use only the ‘submitted’ version.

    Example: “JournalAcronym-Article-Blah-Blah-SUBMITTED.docx”.

    Even when I’m not the lead author on a co-written article, I prefer to have access to this submitted version. This way, I can see all incorporated edits and the ‘final’ version we submitted. There are also cases (see below) where I need it for sharing with other people.

  • Usually, the article goes through peer review and you get comments, so you make revisions and re-submit your article. Again, once submitted, make sure you’ve marked this as a ‘revision’ somehow (usually people do) and that it was submitted.

    Example: “JournalAcronym-Article-Blah-Blah-SUBMITTED-R1.docx”.

    Again, best practice would be to send out this re-submitted revision version to all authors so everyone has it.

  • You may end up with multiple rounds of revisions and peer review (moving to R2, etc.), or you may get an acceptance notice. Your article will then move to the copyediting stage, and you get proofs. It’s useful to save these for your own purposes, such as making sure that the edits you make are actually executed in the final article. This is less important for dissemination, although I do recommend giving all co-authors the ability to edit/review/proof and request changes.
  • Accepted, proofed, published! THIS is the step that I see most people miss, so pay attention. If you are the lead or solo author, you will probably get an email saying your article is now online, either online-first or published. You may get an attached PDF of your article. If not, you should be able to click on your access link and view the article online.

    IMPORTANT STEP HERE: go ahead and download the PDF of the article then. Right then, go ahead and save it.

    Example: “JournalAcronym-Article-Blah-Blah-Year.PDF”.

    (Why do you care about this if you are a solo author? Because the link may expire and you may lose access to this article. More on sharing your article below.)

  • Email your entire author team (if you’re not a solo author). Tell them the article was published; provide a link and/or the DOI link; and attach the PDF to the email so everyone on the team has a copy of the final article. Not all of your co-authors will work at an institution with unlimited library access – and if they do, that might change in the future. Give everyone a copy of the article to save for themselves. You can also remind everyone what the sharing permissions (or limitations) are for the article.

    For example, some articles are paywalled, but authors have permission to store the final copy (PDF of the final version) in their own repository or on a not-for-profit website. For an example, see my research page at DIYPS.org/research – you’ll notice that sometimes I link to an “author copy” PDF, which is exactly this: the final article PDF, like you would get by accessing the paywalled journal.

    Other times, though, you are specifically not permitted to share the final/proofed/formatted copy. Instead, you’ll be allowed to share the “submitted” manuscript (usually the version prior to the revision stage). Remember how step 1 was to save a SUBMITTED copy? This is why! You can PDF this up; add a note at the top that references the final version of record (usually, journals give you recommended language for this) with a link/DOI link to it; and share away on your own site. Again, look at DIYPS.org/research and you’ll notice some of my “author copy” versions are these submitted versions rather than the final versions.

    You’ll also notice that sometimes I link to articles that are open access and then also have a link to a PDF author copy. This is in case something changes in the future with open access links breaking, the journal changing, etc. I have actually had free non-paywalled articles get turned into paywalled journal articles years later, which is why I do point to both places (the open access version and a backup author copy).

    Regardless of what the permissions are for sharing on your own website/repository/institutional repository: you, as the author, always have permission to give this PDF out when you are asked directly. For example, if someone emails you and asks for a copy, you can email back and attach the PDF! Even if the permissions for your own website cover only the submitted version (not the final version), you can still hand out the final, formatted, pretty PDF when asked directly.

    As a related tip, this is a great way to disseminate your research and build relationships, so if someone does email you and ask for an author copy…please reply and send them a copy. (Saying this as someone without access to articles who sends requests to many authors to get access to their research, and I only get responses from 50% of authors. Sad panda.) Again, this is why it is helpful to get in the habit of saving your articles as you submit and have them published; it makes it easy to jump into the “Published copy” folder (or however you name it) and attach the PDF to the email and send it.

To recap, as a best practice, you should disseminate various versions of articles to your entire co-author team at the following points in time:

  • Original submission.

    Suggestion: Write an email, say you’ve successfully submitted, remind everyone which journal this was submitted to, and attach a copy of your “JournalAcronym-Article-Blah-Blah-SUBMITTED.docx”. (If you end up getting a desk rejection and are re-submitting elsewhere, it is also nice to email co-authors and tell them so. You don’t necessarily need to send out a newly retitled version, unless there are new changes to the submission – such as if you went through a partial round of peer review before getting rejected and are submitting the revised version to the new target journal.)

  • Revision submission.

    Suggestion: Write an email, say you’ve successfully submitted the revisions, remind everyone which journal this was submitted to, and attach a copy of your “JournalAcronym-Article-Blah-Blah-SUBMITTED-R1.docx” along with the reviewer response document, so everyone can see how edits/feedback were incorporated (or not).

  • Acceptance.

    Suggestion:

    A) If the email has the PDF attached, forward it to your full author team. Say congratulations; note that the article was accepted; and point out that the article is attached as a PDF.

    B) If you don’t have a PDF attachment in your email already, go to the online access link the journal gave you and save a copy of the PDF. Then, email the author team with the FYI that the article is live; provide the link to the online version; and attach the PDF directly to that email so everyone has a final version.

    Regardless of A or B, remind everyone what the permissions are for sharing to their own or institutional repository (e.g. the final PDF, or the submitted version, which you previously shared or could re-share here).

Bonus tip:

Depending on the content of your article, you may also want to think about sending copies of the final PDF article to certain people who are not co-authors with you.

For example, if you are heavily citing someone’s work or talking about their work in a constructive way – you could email them and give them a heads up and provide a copy of the article. It’s a great way to contribute to your relationship (if you have an existing relationship) and/or foster a relationship. Remember that many people will have Google Scholar Alerts or similar with their name and/or citation alerts from various services, so people are likely to see when you talk about them or their work or are heavily citing their work. Again, some of those people may not have access to your article and may reach out to ask for an article; you can (and should) send them a copy! (And again, consider thinking about it as a relationship building opportunity rather than a transactional thing related to this single article.)

I would particularly flag this as something to pay attention to (and do) if you work in the space of patient engagement in healthcare. For example, if you write an article that mentions a patient or their body of work by name, it would be courteous to email them, let them know about the article, and send them a PDF.

Otherwise, I can speak from experience about being talked about as a patient like an ant under a microscope: someone publishes an article in which my work is mentioned, talks about me by name and references my perspective, and I get a notification about the article…but I can’t access it, because it’s in a paywalled journal. Awkward – and a little weird in cases where the very subject of the article is patient engagement and involving patients in research. Remember, research involvement should include all stages: design, planning, doing the research, and then disseminating the research. So the meta point is that if there is scholarly literature of any kind (whether original research articles or reviews, commentaries, letters in response to other articles, etc.) talking about specific patients and their bodies of work, best practice should be to email them and send a copy of the article. Again, think less transactional and more about relationships – it will likely benefit you in the long run! Plus, it’s less awkward: a short-term benefit.

—-

best practices for communicating with co-authors about published articles, by Dana M. Lewis from DIYPS.org

As an example of how I like to disseminate my articles personally: every time a journal article is published and I have access to it, I update DIYPS.org/research with the title, the journal, a DOI link (to help people find it online and/or cite it), and a link to the open access version if available – or, if not, an author copy PDF of the final or submitted version. So, if you’re ever looking for any of my articles, you can head there (DIYPS.org/research) first and grab copies any time!

If you are looking for a particular article and can’t find it or it’s not listed there yet (e.g. likely because it just came out and I haven’t been sent my own copy by my co-authors yet…), you can always email me directly (Dana@OpenAPS.org) and I’m more than happy to send you a copy of whatever version I have available and/or the final PDF once I have access to it.