A new symptom score for people with exocrine pancreatic insufficiency (the EPI/PEI-SS)

August 13, 2025 by Dana Lewis

One of the frequent complaints in the literature about exocrine pancreatic insufficiency (known as EPI in some parts of the world, or PEI for pancreatic exocrine insufficiency elsewhere) is that its symptoms are not specific and can overlap with other conditions. Diarrhea, for example, can be caused by many conditions and many medications. Not everyone with EPI has diarrhea, though. Another problem is that EPI causes symptoms beyond diarrhea and weight loss, but there has been no data on which groups of people with EPI experience which types of symptoms, or how common those other symptoms are, so they often aren’t listed. This leads to a cycle of lack of awareness, lack of screening, lack of diagnosis, and lack of treatment.

There’s been little effort to date to solve this problem, and I found myself wondering whether we as patients, who experience the symptoms directly, could find a way to address it. Drawing on my systematic review papers (where I’ve read hundreds of papers about the symptoms & diagnostic approaches to EPI) and my personal experience with EPI, I made a list of 15 symptoms. But it’s not just about which symptoms people have: that’s where the overlap problem comes in. With EPI, many people have a lot of symptoms, a lot of the time, and they are VERY annoying. So the frequency and severity of the symptoms are a hallmark as well. I put together a way to quantify the frequency and severity of symptoms (using plain language), and the EPI/PEI-SS (Exocrine Pancreatic Insufficiency Symptom Score) was born.

With help from more than a dozen people, some with EPI and some without, I ran a pilot test of the symptom score to see whether people with EPI would generate scores like mine, and whether people without EPI (with either everyday gastrointestinal symptoms or other conditions that sometimes cause GI symptoms) would have matching scores. They did not: the difference was stark, with no overlap. The EPI symptom burden was quantifiably much higher than everyday GI symptoms in someone without a condition, and also higher than in people with other conditions that cause GI symptoms (think food intolerances, IBS, or other non-EPI GI conditions).

So I launched a bigger study that many of you participated in (thank you!), with the goal of exploring whether this score would be useful in the general population to help distinguish EPI from other conditions and whether it might possibly aid in screening for EPI.

And now, the results are published! (You can read the full open access paper here: https://doi.org/10.3390/epidemiologia6030048).

Here’s what we learned:

There were 324 participants when I cut off data collection for the analysis (after three weeks): 155 people who identified as having EPI and 169 people without EPI. Everyone answered whether or not they had each of the 15 symptoms (falling into three groups: abdominal, toilet-related, and food-related symptoms) and rated its frequency and severity. For each symptom, frequency (0–5) is multiplied by severity (0–3), and the 15 symptom scores are added together, so the EPI/PEI-SS total ranges from 0 to 225. (See Table 1 for a list of the symptoms and rating descriptions.)
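To make the scoring arithmetic concrete, here is a minimal Python sketch of how a total score could be computed from 15 (frequency, severity) ratings. The function and constant names are hypothetical, and it assumes a symptom someone doesn’t have is entered as frequency 0; the actual symptom list and wording are in Table 1 of the paper and in the online tool.

```python
# Hypothetical sketch of the EPI/PEI-SS arithmetic: 15 symptoms, each rated
# for frequency (0-5) and severity (0-3). Symptom names are not included here.
MAX_FREQUENCY = 5
MAX_SEVERITY = 3
NUM_SYMPTOMS = 15


def symptom_score(frequency: int, severity: int) -> int:
    """Score for one symptom: frequency multiplied by severity."""
    if not 0 <= frequency <= MAX_FREQUENCY:
        raise ValueError(f"frequency must be 0-{MAX_FREQUENCY}, got {frequency}")
    if not 0 <= severity <= MAX_SEVERITY:
        raise ValueError(f"severity must be 0-{MAX_SEVERITY}, got {severity}")
    return frequency * severity


def total_score(ratings: list[tuple[int, int]]) -> int:
    """Sum of the per-symptom scores; ratings is a list of (frequency, severity) pairs."""
    if len(ratings) != NUM_SYMPTOMS:
        raise ValueError(f"expected {NUM_SYMPTOMS} ratings, got {len(ratings)}")
    return sum(symptom_score(f, s) for f, s in ratings)


# Highest possible total: 15 symptoms * 5 * 3 = 225.
MAX_TOTAL = NUM_SYMPTOMS * MAX_FREQUENCY * MAX_SEVERITY
```

Because each symptom contributes at most 5 × 3 = 15 points, the 225 ceiling falls straight out of the design: every symptom at maximum frequency and maximum severity.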

The key finding: people with EPI had higher scores than people without EPI.

In this real-world study, the mean total score of those with EPI was 98.11 (min 1, max 213), in contrast to a mean total score of 38.86 for those without EPI (min 0, max 163). The difference is practically as well as statistically (p<.001) significant.
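“Practically significant” is what an effect size captures, and the paper reports Cohen’s d (see the nerdy notes below). A minimal sketch of the common pooled-standard-deviation form of Cohen’s d, assuming sample variances; the paper may have computed it differently, and the scores below are made-up toy values, not study data.

```python
import math
import statistics


def pooled_sd(a: list[float], b: list[float]) -> float:
    """Pooled standard deviation of two samples (sample variances, n - 1)."""
    na, nb = len(a), len(b)
    var_a = statistics.variance(a)
    var_b = statistics.variance(b)
    return math.sqrt(((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2))


def cohens_d(a: list[float], b: list[float]) -> float:
    """Standardized mean difference: (mean(a) - mean(b)) / pooled SD."""
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd(a, b)


# Hypothetical toy scores, NOT study data.
epi_scores = [120, 95, 60, 150, 88]
non_epi_scores = [30, 45, 12, 70, 40]
print(round(cohens_d(epi_scores, non_epi_scores), 2))
```

Working backwards, and only if the paper used this pooled form: a mean difference of 98.11 − 38.86 = 59.25 points with d = 1.475 implies a pooled standard deviation of roughly 40 points.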

Figure 1 from the paper showing the sub-scores and total scores broken out by EPI and non-EPI groups, respectively.

Even when I separated the people without EPI into two groups, those with other gastrointestinal conditions and those without, the EPI group’s scores were still distinct and statistically significantly different from both. I also did a sub-analysis of each individual condition, and none had a significant impact on the overall score. (Because so many people with diabetes in my network participated in the study, I also ran a separate, deeper sub-analysis of the contributions of type 1 diabetes and type 2 diabetes, and wrote a separate paper on this analysis, which is also open access and available to read here.) Age was also in the bucket of “things that did not affect the score.” However, females in the study reported higher scores than males (this matches other studies showing a higher gastrointestinal burden in females, so it isn’t necessarily unique to EPI).

In addition to the overall score, you can see the difference by looking at the number of symptoms people reported and the difference in frequency and severity:

EPI group: 12.39 symptoms, average frequency 3.02, average severity 1.73
Non-EPI group: 8.15 symptoms, with roughly half the average frequency (1.55) and severity (0.91)

Figure 3 from the paper, showing each of the 15 symptoms and the range of scores for the EPI group (purple) and non-EPI group (blue), respectively.

Nerdy notes (you can read more in the full paper): Cohen’s d (1.475) indicated a large effect size; all comparisons overall and across sub-groups and across symptom categories were statistically significant (p<0.001). Cronbach’s alpha for sub-score categories was “good” (0.88 abdominal, 0.83 toilet, 0.88 food), indicating high internal consistency and good construct validity. Using an EPI/PEI-SS cutoff of 59 (out of possible 225), area under the curve was 0.85, sensitivity was 0.81, and specificity was 0.75.
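For readers who want to see what these metrics measure, here is a minimal Python sketch (standard library only) of Cronbach’s alpha, sensitivity/specificity at a cutoff, and AUC. It assumes a score at or above the cutoff counts as a positive screen (the post doesn’t say whether a score of exactly 59 is positive) and uses the rank-based (Mann-Whitney) form of AUC; the paper may have used different software or formulas, and the function names are illustrative.

```python
import statistics


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha; items[i][j] is item i's score for respondent j."""
    k = len(items)
    item_variance_sum = sum(statistics.variance(item) for item in items)
    totals = [sum(scores) for scores in zip(*items)]  # per-respondent total
    return (k / (k - 1)) * (1 - item_variance_sum / statistics.variance(totals))


def sensitivity_specificity(
    epi: list[float], non_epi: list[float], cutoff: float
) -> tuple[float, float]:
    """Assumes a score >= cutoff counts as a positive screen."""
    true_pos = sum(score >= cutoff for score in epi)
    true_neg = sum(score < cutoff for score in non_epi)
    return true_pos / len(epi), true_neg / len(non_epi)


def auc(epi: list[float], non_epi: list[float]) -> float:
    """Probability a random EPI score exceeds a random non-EPI score (ties count half)."""
    wins = 0.0
    for e in epi:
        for n in non_epi:
            if e > n:
                wins += 1.0
            elif e == n:
                wins += 0.5
    return wins / (len(epi) * len(non_epi))
```

Read this way, a sensitivity of 0.81 means about 81% of the EPI group scored at or above the cutoff, and an AUC of 0.85 means a randomly chosen person with EPI outscores a randomly chosen person without EPI about 85% of the time.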

Were there limitations for this study? You bet. It was online and based on people who happened to fill it out, so follow-up studies in different populations will help confirm these results and whether they represent the average EPI experience. (Note, though, that this study population included a much wider diversity of people with EPI than most other EPI-related symptom assessment studies, which are often limited to chronic pancreatitis and cystic fibrosis, and/or pancreatic surgery/cancer.) A large number of people with diabetes participated, in part because of my network and where I recruited participants from; however, as seen in this sub-study, the presence of diabetes (any type, or split into type 1 and type 2) did not influence scores (analysis here). This study was also exploratory, meaning it was not powered for a specified outcome. Now that we know what to expect score-wise in people with and without EPI, we’ve been able to use this data to power follow-up studies!
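“Powering” a study means choosing a sample size large enough to reliably detect the expected difference. As a rough illustration only, here is the textbook normal-approximation formula for a two-group comparison, fed with an effect size like the one reported above. This is not the calculation used for the follow-up studies; the function name and the default alpha (0.05) and power (80%) are assumptions.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size_d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate participants per group for a two-sided, two-sample comparison.

    Normal approximation: n = 2 * (z_{1 - alpha/2} + z_{power})^2 / d^2, rounded up.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size_d ** 2)


print(n_per_group(1.475))  # large effect, like the overall EPI vs non-EPI difference
print(n_per_group(0.5))    # a more modest effect needs far more people
```

With d = 1.475 this works out to 8 people per group, while d = 0.5 needs 63 per group. That is why knowing the expected score difference up front matters so much for designing the next studies, especially for subgroups where the gap may be smaller.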

What should you take away from this study?

If you are a person with some kind of gastrointestinal symptoms, you can use the EPI/PEI-SS to explore your symptoms and quantify them based on frequency and severity. If your score is near or above the cutoff, you may want to consider discussing your symptoms with your doctor and exploring whether testing for EPI (often fecal elastase testing) is warranted. This tool hasn’t been validated as a diagnostic method, but this data can help the shared decision making process and hopefully also aid you in a better conversation with your doctor as you explore pathways to solutions.

The EPI/PEI-SS is available online, for free, and you can use it right now: https://danamlewis.github.io/EPI-PEI-SS/

If you are a person with EPI and you are still struggling with symptoms of EPI (PEI), you may find it handy to take the EPI/PEI-SS to document your symptom burden. Then, as you adjust your enzyme dosing, you can periodically take the EPI/PEI-SS again (every few weeks or months) and use it to help you track whether things are improving. You can use the web version, or if you want to also track your enzyme (PERT) dosing, you can use the EPI/PEI-SS in both the iOS (https://bit.ly/PERT-Pilot-iOS) and Android (https://bit.ly/PERT-Pilot-Android) versions of “PERT Pilot”. Then you can see your scores and view them over time in the same place.

Note that the scores of people with EPI in the study don’t mean that ‘this is as good as it gets’ when you go on enzymes. Many people with EPI indicate that they feel they are not dosing enough enzymes (see this study); the scores on the EPI/PEI-SS reflect this. It is possible for people with EPI to get scores in the non-EPI range, once enzymes are regularly dosed to match what you’re eating. (For example, my score went from well above the cutoff to well below the average non-EPI score once I started enzymes.)

If you are a doctor, take a look at the EPI/PEI-SS (see links, or Table 1 in the paper) so you know what some of the symptoms of EPI are. Notably, be aware that diarrhea and weight loss are not the only symptoms of EPI. In the diabetes sub-study, for example, we found food-related behaviors to be a key variable, as many people intuitively adjust what or how they are eating to try to eliminate symptoms on their own. Pain is not prominent in all corners of the EPI community (it’s more common among people with pancreatitis). Feel free to have patients use the EPI/PEI-SS any time and use it as part of your shared decision making process.

A new symptom score for exocrine pancreatic insufficiency: new research on the EPI/PEI-SS (a blog by Dana M. Lewis on DIYPS.org)

If you have any feedback (for example, whether it’s been helpful or not), you can email me any time (Dana+EPI-PEI-SS@OpenAPS.org). I’d also love to collaborate, if you’re interested in partnering on any research studies. We have some ongoing studies in different countries (US, Ireland, New Zealand, Australia) in different populations (general population; people with diabetes; people with pancreatic cancer; etc.) and I’m looking forward to partnering with other researchers on additional validation studies and exploring if and how the EPI/PEI-SS can help us address some of the gaps of real-world clinical practice and life with EPI.

If you’re a researcher with shared interest in EPI…ditto the above!

—

Read the research referenced in this blog post: https://doi.org/10.3390/epidemiologia6030048

—

Cite it: Lewis DM, Landers A. Development of Novel Symptom Score to Assist in Screening for Exocrine Pancreatic Insufficiency. Epidemiologia. 2025; 6(3):48. https://doi.org/10.3390/epidemiologia6030048

—

Questions? Please comment below!

If you have EPI-specific questions, you might also like this blog post with 25 questions and answers about EPI (PEI) ranging from symptoms and diagnosis to treatment and dosing titration.

The data we leave behind in clinical trials and why it matters for clinical care and healthcare research in the future with AI

May 28, 2025 by Dana Lewis

Every time I hear that all health conditions will be cured and fixed in 5 years with AI, I cringe. I know too much to believe in this possibility. But this is not an uninformed opinion or a disbelief in the trajectory of AI takeoff: it is grounded in the reality of how clinical trial data are reported and published, and in the limitations of the datasets we have today.

The sad reality is that we leave so much important data behind in clinical trials today (and in every clinical trial done before today). One example is how we report “positive” results for a lot of tests or conditions, using binary cutoffs and summary reporting without reporting average titres (levels) within subgroups. This affects our ability to understand and characterize conditions, to compare overlapping conditions with similar results, and to use this information clinically alongside the symptoms and presentation of a condition. It’s not just a problem for research, it’s a problem for delivering healthcare. I have some ideas of things you (yes, you!) can do starting today to help fix this problem. It’s a great opportunity to do something now in order to fix the future (and today’s healthcare delivery gaps), not just complain that it’s someone else’s problem. If you contribute to clinical trials, you can help solve this!

What’s an example of this? Imagine an autoantibody test result where values >20 are considered positive. That means values of 21, 58, and 82 are all considered positive. But…that’s a wide range, and a much wider spread than is possible with “negative” values, which could be 19, 8, or 3.

When labs report this test, they give suggested cutoffs to interpret “weak”, “moderate”, or “strong” positives. In this example, a value of 20-40 is a “weak” positive, a value of 40-80 is a “moderate” positive, and a value above 80 is a “strong” positive. Our three example positives span barely a weak positive (21), a solidly moderate positive in the middle of that range (58), and a strong positive just above that cutoff (82). The weak positive could be interpreted as a negative, given test variance of 10% or so. But the problem lies in the moderate positive range. Clinicians are prone to say that because it’s not a strong positive, it should be considered possibly negative, treating it more like the 21 value than the 82 value. And because there are no studies reporting actual titres, it’s unclear whether the average or median “positive” reported is above the “strong” (>80) cutoff or falls in the moderate positive category.
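The banding and the “within 10%” reasoning can be written down explicitly. A minimal Python sketch of this hypothetical example test; exactly where the boundaries at 40 and 80 fall is an assumption (the lab ranges as described overlap at those values), and the 10% error figure comes from the example above rather than from any real assay.

```python
CUTOFF = 20.0             # values > 20 are "positive" in the example
MEASUREMENT_ERROR = 0.10  # assumed ~10% test variance, per the example


def titre_band(value: float) -> str:
    """Map a titre to the example's bands (boundary handling is an assumption)."""
    if value <= CUTOFF:
        return "negative"
    if value <= 40:
        return "weak positive"
    if value <= 80:
        return "moderate positive"
    return "strong positive"


def near_cutoff(value: float, error: float = MEASUREMENT_ERROR) -> bool:
    """True if the value, +/- the assumed error, straddles the positivity cutoff."""
    low, high = value * (1 - error), value * (1 + error)
    return low <= CUTOFF < high


for titre in (19, 21, 58, 82):
    label = titre_band(titre)
    print(titre, label, "(borderline)" if near_cutoff(titre) else "")
```

Notice what this can’t tell you: nothing in the bands says where real positives with the condition actually cluster. That information only exists if someone publishes the underlying titres.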

Also imagine a scenario where some other conditions occasionally have positive levels of this antibody, but again, the titres aren’t actually published.

Today’s experience and how clinicians in the real world are interpreting this data:

21: positive, but within 10% of the cutoff, so not necessarily a true positive
53: moderate positive, but it’s not strong and we don’t have median data for positives, so clinicians lean toward treating it as negative and/or an artifact of a co-condition, given 10% prevalence in the other condition
82: strong positive, above cutoff, easy to treat as positive

Now imagine these values alongside studies reporting that the median titre in the “positive” >20 group is actually 58 for people with the true condition.

21: would still be interpreted as likely negative even though it’s technically above the >20 positive cutoff, again because of the 10% error and how far it is below the median
53: moderate positive but within 10% of the median positive value. Even though it’s not above the “strong” cutoff, more likely to be perceived as a true positive
82: still a strong positive, above cutoff, no change in perception

And what if the titres in the co-condition have a median value of 28? Knowing that the co-condition median is 28 and the true-condition median is 58 makes it even more likely that a test result of 53 will be correctly interpreted as the true condition, rather than given a false negative interpretation because it’s not above the >80 strong cutoff.
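Here is the same comparison as a deliberately crude Python sketch, using the hypothetical medians from this thought experiment (58 for the true condition, 28 for the co-condition). It only asks which median a result sits closer to and whether it is within 10% of the true-condition median; a real interpretation would also need the spread of each distribution and how common each condition is. The function names are illustrative.

```python
# Hypothetical medians from the thought experiment above (not real data).
MEDIAN_TRUE_CONDITION = 58.0
MEDIAN_CO_CONDITION = 28.0


def closer_median(value: float) -> str:
    """Toy heuristic: which group's median is the value closer to? (ties -> co-condition)"""
    d_true = abs(value - MEDIAN_TRUE_CONDITION)
    d_co = abs(value - MEDIAN_CO_CONDITION)
    return "true condition" if d_true < d_co else "co-condition"


def within_fraction_of(value: float, reference: float, fraction: float = 0.10) -> bool:
    """True if value is within +/- fraction of the reference value (e.g. a median)."""
    return abs(value - reference) <= fraction * reference


for result in (21, 53, 82):
    print(result, closer_median(result), within_fraction_of(result, MEDIAN_TRUE_CONDITION))
```

Even this toy version is impossible without published medians for both groups, which is the point: the cutoff alone (>20, or >80 for “strong”) can’t separate 53 from 21.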

Why does this matter in the real world? Imagine a patient with a constellation of confusing symptoms whose positive antibody test (which would indicate a diagnosis of a disease) is interpreted as negative. This may result in a missed diagnosis, even when it is the correct diagnosis, particularly if there is no other definitive test for the condition. That may mean lack of effective treatment, ineligibility to enroll in clinical trials, reduced quality of life, and possibly negative impacts on survival and lifespan.

If you think I’m cherry-picking a single example, you’re wrong. This has played out again and again in my last few years of researching conditions and autoantibody data. In another real-world scenario, I had a slightly positive value (e.g. above a cutoff of 20) on a test that the lab reported is correlated with condition X. My doctor was puzzled because I have no signs of condition X. I looked up the sensitivity and specificity data for this test: it has only 30% sensitivity and 80% specificity, and 20% of people with condition Y (which I do have) also have this antibody. There is no data on the median positive value in either condition X or condition Y. With the two pieces of information we do have, it’s easier to interpret this value as not meaningful for diagnosing condition X, given the lack of matching symptoms. Yet the lab reports only the association with condition X, even though people with condition X are only slightly more likely to have this autoantibody than people with condition Y and several other conditions. I went looking for research data on raw levels of this autoantibody, to see where the median positive value is for conditions X and Y, and again, like the example above, there is no raw data that can be used for interpretation. Instead, there are summaries of summaries built on a simple binary cutoff of >20, which makes clinical interpretation really hard, and makes it impossible to research and meta-analyze the data to support individual interpretation.
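One standard way to put a number on “only slightly more probable” is the positive likelihood ratio, sensitivity / (1 − specificity). With 30% sensitivity and 80% specificity that is 0.30 / 0.20 = 1.5: a positive result is only 1.5 times as likely in someone with condition X as in someone without it. A minimal Python sketch; the pre-test probability below is a made-up value for illustration, not a real prevalence.

```python
def positive_likelihood_ratio(sensitivity: float, specificity: float) -> float:
    """LR+ = P(test+ | condition) / P(test+ | no condition) = sens / (1 - spec)."""
    return sensitivity / (1 - specificity)


def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Bayes' rule in odds form: post-test odds = pre-test odds * LR."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


lr = positive_likelihood_ratio(0.30, 0.80)  # the test described above
print(f"LR+ = {lr:.2f}")
# Hypothetical 1% pre-test probability, chosen only for illustration.
print(f"post-test probability = {post_test_probability(0.01, lr):.3f}")
```

Starting from a hypothetical 1% chance of condition X, a positive result only nudges it to about 1.5%. And because this works from a binary positive/negative, it still can’t use where a value sits within the positive range; that would require the raw titre data.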

And this is a key problem or limitation I see with the future of AI in healthcare that we need to focus on fixing. For diseases that are really well defined and characterized, and for which we have in vitro or mouse models etc. to use for testing diagnostics and therapies, sure, I can foresee huge breakthroughs in the next 5 years. However, many autoimmune conditions are not well characterized or defined, and the existing data we DO have is based on summaries of cutoff data like the examples above, so we can’t use them as endpoints to compare diagnostics or therapeutic targets. We need to re-do a lot of these studies and record and store the actual data so AI *can* do all of the amazing things we hear it could potentially do.

But right now, for a lot of things, we can’t.

So what can we do? Right now, we actually CAN make a difference on this problem. Are you gnashing your teeth about the change in the research funding landscape? You can take action right now by re-evaluating your current and retrospective datasets and your current studies to figure out:

Where are you summarizing data, and where does raw data need to be cleaned, tagged, and stored so we can use AI with it in the future to do all these amazing things?
What data could I tag and archive now that would be impossible or expensive to regenerate later?
Am I cleaning and storing values in formats that AI models could work with in the future (e.g. structured tables, CSVs, or JSON files)?
Most simply: how am I naming and storing the files with data so I can easily find them in the future? “Results.csv” or “results.xlsx” is maybe not ideal for helping you or your tools in the future find this data. How about “autoantibody_test-X_results_May-2025.csv” or similar.
Where are you reporting data? Can you report more data, as an associated supplementary file or a repository you can cite in your paper?
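As a concrete version of the storage and naming questions above, here is a minimal Python sketch that writes the same raw, per-participant values to both a CSV and a JSON file with a descriptive name. The record fields (participant_id, group, titre) and the helper name are hypothetical; the file name follows the example above.

```python
import csv
import json
import tempfile
from pathlib import Path


def save_raw_values(records: list[dict], directory: Path, stem: str) -> tuple[Path, Path]:
    """Write the same per-participant records as <stem>.csv and <stem>.json."""
    if not records:
        raise ValueError("no records to save")
    directory.mkdir(parents=True, exist_ok=True)
    csv_path = directory / f"{stem}.csv"
    json_path = directory / f"{stem}.json"

    # CSV: one row per participant, header taken from the first record's keys.
    with csv_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(records[0].keys()))
        writer.writeheader()
        writer.writerows(records)

    # JSON: the same records, with numbers kept as numbers.
    with json_path.open("w", encoding="utf-8") as f:
        json.dump(records, f, indent=2)

    return csv_path, json_path


if __name__ == "__main__":
    # Hypothetical records: keep the raw titre, not just a positive/negative flag.
    example = [
        {"participant_id": "P001", "group": "condition_X", "titre": 58},
        {"participant_id": "P002", "group": "condition_Y", "titre": 21},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        paths = save_raw_values(example, Path(tmp), "autoantibody_test-X_results_May-2025")
        print(*paths, sep="\n")
```

Storing the raw titre alongside (or instead of) a binary flag is the key design choice: a positive/negative label can always be recomputed from the value, but the value can never be recovered from the label.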

You should also ask yourself whether you’re even measuring the right things at the right time, and whether your inclusion and exclusion criteria are too strict and are excluding the bulk of the population you should be studying.

An example of this is exocrine pancreatic insufficiency, where studies often don’t look at all of the symptoms that correlate with EPI; they include or allow only co-conditions that represent a tiny fraction of the likely EPI population; and they study the treatment (pancreatic enzyme replacement therapy) without the context of food intake, which is as useful as studying whether insulin works in type 1 diabetes without the context of how many carbohydrates someone is consuming.

You can be part of the solution, starting right now. Don’t just think about how you report data for a published paper (although there are opportunities there, too): think about the long-term use of this data by humans (researchers and clinicians like yourself) AND by AI (capabilities and insights that aren’t possible yet, but that technology will be able to deliver in 3-5+ years).

A simple litmus test can be: if an interested researcher or patient reached out to me as the author of my study and asked for the data to understand the mean or median values of a reported cohort with “positive” values…could I provide this data to them as an array of values?

For example, if you report that 65% of people with condition Y have positive autoantibody levels, you should also be able to say:

The mean value of the positive cohort (>20) is 58.
The mean value of the negative cohort (≤20) is 13.
The full distribution (e.g. [21, 26, 53, 58, 60, 82, 92…]) is available in a supplemental file or data repository.
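If the raw values are kept, producing every line of that list takes only a few lines of code. A minimal Python sketch with made-up titres (not the example distribution above), treating >20 as positive and ≤20 as negative; the function name is hypothetical.

```python
import statistics


def summarize_titres(values: list[float], cutoff: float = 20.0) -> dict:
    """Summarize raw titres; values > cutoff count as 'positive' (as in the >20 example)."""
    positives = [v for v in values if v > cutoff]
    negatives = [v for v in values if v <= cutoff]

    def describe(group: list[float]) -> dict:
        if not group:
            return {"n": 0, "mean": None, "median": None}
        return {"n": len(group), "mean": statistics.mean(group), "median": statistics.median(group)}

    return {
        "percent_positive": 100 * len(positives) / len(values),
        "positive": describe(positives),
        "negative": describe(negatives),
        "distribution": sorted(values),  # the raw array itself, for a supplement or repository
    }


# Hypothetical titres, for illustration only.
summary = summarize_titres([5, 12, 19, 22, 45, 58, 61, 90])
print(summary)
```

The percentage positive is the number that usually gets published; the means, medians, and sorted distribution are what usually get left behind, even though they come from the same array.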

That makes a world of difference in characterizing many of these conditions, developing future models, testing treatments or comparative diagnostic approaches, or even getting people correctly diagnosed after previously missed diagnoses caused by a lack of the data needed to correctly interpret lab results.

Maybe you’re already doing this. If so, thanks. But I also challenge you to do more:

Ask for this type of data via peer review, either to be reported in the manuscript and/or included in supplementary material.
Push for more supplemental data publication with papers, in terms of code and datasets where possible.
Talk with your team, colleagues, and institution about long-term storage, accessibility, and formatting of datasets
Better yet, publish your anonymized dataset either with the supplementary appendix or in a repository online.
Take a step back and consider whether you’re studying the right things in the right population at the right time

The data we leave behind in clinical trials (why it matters for clinical care, healthcare research, and the future with AI), a blog post by Dana M. Lewis from DIYPS.org

These are actionable, doable, practical things we can all be doing, today, and not just gnashing our teeth. The sooner we course correct with improved data availability, the better off we’ll all be in the future, whether that’s tomorrow with better clinical care or in years with AI-facilitated diagnoses, treatments, and cures.

We should be thinking about:

What if we designed data gathering & data generation in clinical trials not only for the current status quo (humans juggling data and collecting only minimal data), but also for a potential future in which machines are the primary viewers of the data? How should we design trials for that future?
What data would be worth accepting, collecting, and seeking as part of trials?
What burdens would that add (and how might we reduce those) now while preparing for that future?

The best time to collect the data we need was yesterday. The second best time is today (and tomorrow).

What bends and what breaks and the importance of knowing the difference as a patient

March 10, 2025 by Dana Lewis

As a patient, navigating healthcare often feels like decoding a complex rulebook. There are rules for everything: medication dosages, timing protocols, follow-up intervals. Some of these rules matter a lot for short-term or longer-term safety or health outcomes. But at other times… the rules seem senseless and are applied differently by different healthcare providers within the same specialty, let alone across different specialties. As a patient, it’s easy to initially want to follow all rules perfectly, yet feel unable to because the rules don’t make sense in a personal context. Over time, it can be hard to resist the conclusion that the rules don’t matter or don’t apply to you. The reality is somewhere in between. And it’s the in-between part that can be a challenging balance to figure out. Learning to navigate this balance requires understanding which rules are flexible and which aren’t.

I’ve learned there’s enormous value in digging into the “why” behind medical recommendations, when I can. Take acetaminophen (Tylenol), for example. There’s a clear, non-negotiable daily limit on the bottle because exceeding it is dangerous. The over-the-counter recommendation for Extra Strength acetaminophen (500 mg tablets) is no more than two tablets every six hours, not exceeding six tablets in 24 hours. That actually means three doses per day, despite the every-6-hours recommendation. This maximum daily limit (no more than six tablets) is set close to the safety threshold; exceeding that limit (eight tablets in 24 hours) increases the risk of severe liver damage.

Understanding this daily limit provides flexibility within safe boundaries (with the obvious caveat that I’m not a doctor and you should always talk to your own doctor). The “every 6 hours” recommendation keeps acetaminophen levels stable throughout the day and makes sure that, over the course of 24 hours, you stay safely and completely below the maximum dose. Slight deviations in timing, such as taking a dose at 5 hours and 30 minutes instead of precisely 6 hours because you’re about to go to sleep, do not inherently cause harm, as long as total intake remains within the safe daily limit. This is an example of a compliance-oriented guideline designed primarily for optimal adherence at the population level, rather than marking an absolute safety threshold for each individual dose.

There are a lot of things like this in healthcare, but they aren’t always explained to patients, and patients may not always think to stop and question the why (or have the time and resources to do so) and figure it out from first principles to decide whether a deviation in timing or amount is risky or not.

But many healthcare rules aren’t as clearly defined by safety as in the acetaminophen example. Other rules are shaped by convenience, compliance, and the practical constraints of research protocols.

Timelines like “two weeks,” “one month,” or “six months” for follow-up visits or medication titration points often reflect research convenience more than physiological necessity or even ideal best practice. These intervals might mark study endpoints or suit the healthcare system, but they don’t necessarily pinpoint the best timeline overall or the right timeline for an individual patient. It can be hard as a patient to decide whether your experience is deviating from the typical timeline in a beneficial or non-optimal way, and if and when to speak up and try to better adjust to the system or adjust the system to meet your needs (such as scheduling an earlier appointment rather than waiting for a mythical 4-month follow-up, when it’s clear by months 2-3 that a treatment has no benefit because any impact, even a non-significant one, should have been observed by then).

As a patient, understanding when rules reflect safety versus when they’re crafted primarily for convenience is crucial, but hard. Compliance-driven rules can sometimes be thoughtfully bent: they can sometimes be adjusted to better fit individual circumstances without compromising safety. For instance, a medication schedule set strictly every eight hours might be modified slightly based on daily activities or sleep patterns, provided the change remains within safe therapeutic boundaries over the course of 24 hours. (And patients should be able to discuss this with their doctors! But time availability or access may limit the ability to have these conversations up front, or over time as conflicts or issues arise.)

Yet bending rules requires confidence, critical thinking, and often significant resources, whether those are educational, emotional, health-related, or financial. It means feeling secure enough to question a provider’s advice or advocate for adjustments tailored to individual needs. It’s not always even questioning the advice itself, but checking your understanding and interpretation of how to apply it to your own life. Most providers understand that, and have no problem confirming your understanding. Other times, though, it can unintentionally cause conflict, if providers perceive it as questioning their judgement.

I’ve accidentally tripped into that situation at least once before, at a follow-up appointment for an acute, short-term issue with a non-MD clinical provider who wasn’t my main doctor at the practice. She was describing a recommendation for a prescription (rx), specifically because I have diabetes. In the past, I have received over-treatment from most providers because I have type 1 diabetes: many recommendations for non-diabetes issues include guidance for people with diabetes that assumes non-optimal healing and non-optimal glucose management. Given that at the time I was already using OpenAPS, with ideal glucose outcomes for years and never any issues with reduced healing, I asked whether the prescription recommendation would be given to the same type of patient without diabetes. I was trying to make an informed decision about whether to accept the rx recommendation by determining whether it was appropriate. If it was just because I had diabetes, it warranted additional discussion. It wasn’t about her clinical judgement per se, but about a shared decision making process to right-size the next steps to my individual situation, rather than assuming that population-based outcomes for people with diabetes automatically applied. Because of my experience, I know that sometimes they do and sometimes they don’t, so I’ve learned to ask these questions. However, some combination of the lack of an existing relationship with this provider, perhaps a poorly worded question, and other factors made the provider defensive. I got the information I needed, decided the rx was appropriate for me and I would use it, and went about my business. But I later got a follow-up call from another MD (again, not my MD) who was defensive and calling to ask why I was questioning this non-MD provider, and it came across as if they thought I was questioning her because she was a non-MD…which was not the issue at all! It was about me and my care and making sure I understood the root of the recommendation: whether it was because of the health situation or because I had diabetes. (It was the former, the health situation, although it was initially articulated as being simply because I had diabetes.)

This situation has colored all my future encounters with healthcare providers. Seeing new providers with whom I don’t have a longstanding relationship makes me nervous, from learned, lived experience of how some of these one-off encounters have gone in the past, like the one above.

Unfortunately, patients who push back against compliance-driven rules or simply ask questions to facilitate their understanding risk being labeled “non-compliant” or “non-adherent”, and sometimes we get labels on our chart for asking questions and being misunderstood, despite our good intentions. Such labels can have lasting impacts, influencing how future providers perceive our reliability and credibility and can cause subsequent issues for receiving or even being granted access to healthcare.

This creates a profound dilemma for patients: follow all rules precisely, without question, potentially sacrificing optimal care; or thoughtfully question and bend them, risking being misunderstood or penalized for trying to optimize your individual outcomes when the one-size-fits-all approach doesn’t actually fit.

Breaking compliance-oriented rules isn’t about defiance. At least, it’s never been that way for me. It’s about personalization and achieving the best possible outcomes. But not every patient has the luxury of confidently navigating these nuances, and even when they do, as described above, it can still sometimes turn out not so well. Many patients don’t have the time, energy, resources, or privilege required to safely challenge or reinterpret guidelines. Or they’ve been penalized for doing so. Consequently, they may remain strictly compliant, potentially missing opportunities for better individual outcomes and higher quality of life.

Healthcare needs to provide clarity around which rules are absolute safety boundaries and which are recommendations optimized primarily for convenience or for broad, safe adherence across the general public. Patients deserve transparency and support in discerning between what’s bendable for individual benefit and what’s non-negotiable for safety.

And: patients should not be punished for asking questions in order to better understand or check their understanding.

What bends, what breaks and the importance of understanding the difference in healthcare. A blog post by Dana M. Lewis from DIYPS.org

Knowing the difference between what bends and what breaks matters. But many patients remain caught in the delicate balance between bending and breaking, carefully evaluating risks and rewards, often alone.

How Medical Research Literature Evolves Over Time Like A Game of Telephone

February 3, 2025 / February 2, 2025 by Dana Lewis

Have you ever searched for or through medical research on a specific topic, only to find different studies saying seemingly contradictory things? Or you find something that doesn’t seem to make sense?

You may experience this, whether you’re a doctor, a researcher, or a patient.

I have found it helpful to consider that medical literature is like a game of telephone, where a fact or statement is passed from one research paper to another and sometimes changes slowly (or quickly!) along the way. Sometimes this means an error has been introduced, or replicated.

A Game of Telephone in Research Citations

Imagine a research study from 2016 that makes a statement based on the best available data at the time. Over the next few years, other papers cite that original study, repeating the statement. Some authors might slightly rephrase it, adding their own interpretations. By 2019, newer research has emerged that contradicts the original statement. Some researchers start citing this new, corrected information, while others continue citing the outdated statement because they either haven’t updated their knowledge or are relying on older sources, especially because they see other papers pointing to these older sources and find it easiest to point to them, too. It’s not necessarily made clear that this outdated statement is now known to be incorrect. Sometimes that becomes obvious in the literature and field of study, and sometimes it’s not made explicit that the prior statement is ‘incorrect’. (And if it is incorrect, it doesn’t become known as incorrect until later – at the time it’s made, it’s considered to be correct.)

By 2022, both the correct and incorrect statements appear in the literature. Eventually, a majority of researchers transition to citing the updated, accurate information—but the outdated statement never fully disappears. A handful of papers continue to reference the original incorrect fact, whether due to oversight, habit (of using older sources and repeating citations for simple statements), or a reluctance to accept new findings.

The gif below illustrates this concept, showing how incorrect and correct statements coexist over time. It also highlights how researchers may rely on citations from previous papers without always checking whether the original information was correct in the first place.

Animated gif illustrating how citations branch off and even if new statements are introduced to the literature, the previous statement can continue to appear over time.

This is not necessarily a criticism of researchers/authors of research publications (of which I am one!), but an acknowledgement of the situation that results from these processes. Once you’ve written a paper and cited a basic fact (let’s imagine you wrote this paper in 2017 and cited the 2016 paper and fact), it’s easy to keep using this citation over time. Now imagine it’s 2023 and you’re writing another paper in the same topic area: it’s very easy to drop in the same 2016 citation for the same basic fact, and you may not think to update the citation or check whether the fact is still the fact.
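To make the “never fully disappears” pattern concrete, here is a toy model, not derived from any real citation data. It assumes every paper repeats the original claim until a correction appears, and that from then on a fixed fraction of the remaining authors (the hypothetical `update_rate`) switch to the newer source each year. The years mirror the 2016/2019 example above; the rate is an arbitrary assumption.

```python
def outdated_share_by_year(start_year, correction_year, end_year, update_rate):
    """Toy model: share of new papers that still repeat the original claim.

    Before the correction, every paper repeats the original claim (share 1.0).
    From the correction year on, each year a constant fraction `update_rate`
    of the authors still repeating it switch to the newer source instead.
    """
    if not 0.0 < update_rate < 1.0:
        raise ValueError("update_rate must be strictly between 0 and 1")
    shares = {}
    share = 1.0
    for year in range(start_year, end_year + 1):
        if year >= correction_year:
            share *= 1.0 - update_rate  # some authors update their citation
        shares[year] = share
    return shares


shares = outdated_share_by_year(2016, 2019, 2025, update_rate=0.4)
for year, share in shares.items():
    print(f"{year}: {share:.0%} of new papers repeat the original claim")
```

Under this assumption the outdated share shrinks every year but never reaches zero. Real citation behavior is messier than a constant rate, so treat this as an illustration of the shape of the problem, not a prediction.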

Why This Matters

Over time, a once-accepted “fact” may be corrected or revised, but older statements can still linger in the literature, continuing to influence new research. Understanding how this process works can help you critically evaluate medical research and recognize when a widely accepted statement might actually be outdated—or even incorrect.

If you’re looking into a medical topic, it’s important to pay attention not just to what different studies say, but also when they were published and how their key claims have evolved over time. If you notice a shift in the literature—where newer papers cite a different fact than older ones—it may indicate that scientific understanding has changed.

One useful strategy is to notice how frequently a particular statement appears in the literature over time.

Whenever I have a new diagnosis or a new topic to research on one of my chronic diseases, I find myself doing this.

I go and read a lot of abstracts and research papers about the topic; I generally observe patterns in terms of key things that everyone says, which establishes what the generally understood “facts” are, and also notice what is missing. (Usually, the question I’m asking is not addressed in the literature! But that’s another topic…)

I pay attention to the dates, observing when something is said in papers in the 1990s and whether it’s still being repeated in 2020s-era papers, or if/how it’s changed. In my head, I’m updating “this is what is generally known” and “this doesn’t seem to be answered in the literature (yet)” and “this is something that has changed over time” lists.
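If you jot down notes as you read, one minimal way to see how often a statement shows up over time is to tally claims by publication decade. This sketch assumes you record each paper as a hypothetical `(publication_year, claim_label)` pair; the labels “claim A” and “claim B” are placeholders, not real findings.

```python
from collections import Counter, defaultdict


def claim_counts_by_decade(notes):
    """Count how often each claim appears, grouped by publication decade.

    `notes` is a list of (publication_year, claim_label) pairs recorded by
    hand while reading; the labels are whatever shorthand you choose.
    """
    by_decade = defaultdict(Counter)
    for year, claim in notes:
        decade = year - year % 10  # e.g. 2017 -> 2010
        by_decade[decade][claim] += 1
    return {decade: dict(counts) for decade, counts in sorted(by_decade.items())}


# Hypothetical placeholder notes, not real data.
notes = [
    (1994, "claim A"), (1998, "claim A"),
    (2012, "claim A"), (2017, "claim A"), (2019, "claim B"),
    (2021, "claim A"), (2022, "claim B"), (2023, "claim B"),
]
for decade, counts in claim_counts_by_decade(notes).items():
    print(f"{decade}s: {counts}")
```

A shift from mostly “claim A” to mostly “claim B”, with a few stragglers still repeating “claim A”, is the kind of pattern worth noticing.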

Re-Evaluating the Original ‘Fact’

In some cases, it turns out the original statement was never correct to begin with. This can happen when early research is based on small sample sizes, incomplete data, or incorrect assumptions. Sometimes the statement was correct in its original context, but it was immediately taken out of context, and the out-of-context use was never corrected.

For example, a widely cited statement in medical literature once claimed that chronic pancreatitis is the most common cause of exocrine pancreatic insufficiency (EPI). This claim was repeated across numerous papers, reinforcing it as accepted knowledge. However, a closer examination of population data shows that while chronic pancreatitis is a known co-condition of EPI, it is far less common than diabetes—a condition that affects a much larger population and is also strongly associated with EPI. Despite this, many papers still repeat the outdated claim without checking the original data behind it.

(For a deeper dive into this example, you can read my previous post here. But TL;DR: even 80% of 0.03% of the population is a smaller number than 10% of 10% of the overall population…so it is not plausible that CP is the biggest cause of EPI/PEI.)
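The TL;DR arithmetic can be checked directly. This sketch reads the shorthand as: about 0.03% of people have chronic pancreatitis and up to 80% of them have EPI, versus about 10% of people having diabetes and about 10% of those having EPI. That reading is my interpretation of the shorthand; the linked post has the underlying figures. The helper name is hypothetical.

```python
def share_of_population(condition_prevalence, epi_rate):
    """Fraction of the whole population that has both the condition and EPI."""
    return condition_prevalence * epi_rate


cp_with_epi = share_of_population(0.0003, 0.80)      # 80% of 0.03%
diabetes_with_epi = share_of_population(0.10, 0.10)  # 10% of 10%

per_million = 1_000_000
print(f"Chronic pancreatitis + EPI: {cp_with_epi * per_million:,.0f} per million")  # 240
print(f"Diabetes + EPI: {diabetes_with_epi * per_million:,.0f} per million")        # 10,000
print(f"Ratio: about {diabetes_with_epi / cp_with_epi:.0f}x")                       # 42x
```

Even with a conservative 10% EPI rate in diabetes, that is roughly 240 versus 10,000 people per million, about a 40-fold difference.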

Stay Curious

This realization can be really frustrating, because if you’re trying to do primary research to help you understand a topic or question, how do you know what the truth is? This is peer-reviewed research, but what this shows us is that the process of peer-review and publishing in a journal is not infallible. There can be errors. The process for updating errors can be messy, and it can be hard to clean up the literature over time. This makes it hard for us humans – whether in the role of patient or researcher or clinician – to sort things out.

But beyond a ‘woe is me, this is hard’ moment of frustration, I do find that this perspective of literature as a process of telephone makes me a better reader of the literature and forces me to think more critically about what I’m reading, and take papers in context of the broader landscape of literature and evolving knowledge base. It keeps me from assigning too much weight to any one paper (and any one ‘fact’ or finding from a single paper), and encourages me to calibrate it against the broader knowledge base and the timeline of that knowledge base.

That can also be hard to deal with personally as a researcher/author, especially someone who tends to work in the gaps, establishing new findings and facts and introducing them to the literature. Some of my work also involves correcting errors in the literature, which I find from my outsider/patient perspective to be obvious because I’ve been able to use fresh eyes and evaluate at a systematic review level/high level view, without being as much in the weeds. That means my work, to disseminate new or corrected knowledge, is even more challenging. It’s also challenging personally as a patient, when I “just” want answers and for everything to already be studied, vetted, published, and widely known by everyone (including me and my clinician team).

But it’s usually not, and that’s just something I – and we – have to deal with. I’m curious as to whether we will eventually develop tools with AI to address this. Perhaps a mini systematic review tool that scrapes the literature and includes an analysis of how things have changed over time. This is done in systematic reviews or narrative reviews of the literature, but those papers are based on researcher interests (and time and funding), and I often have so many questions that don’t have systematic reviews/narrative reviews covering them. Some I turn into papers myself (such as my paper on systematically reviewing the dosing guidelines and research on pancreatic enzyme replacement therapy for people with exocrine pancreatic insufficiency, known as EPI or PEI, or a systematic review on the prevalence of EPI in the general population or a systematic review on the prevalence of EPI in people with diabetes (Type 1 and Type 2)), but sometimes it’s just a personal question and it would be great to have a tool to help facilitate the process of seeing how information has changed over time. Maybe someone will eventually build that tool, or it’ll go on my list of things I might want to build, and I’ll build it myself like I have done with other types of research tools in the past, both without and with AI assistance. We’ll see!

—

TL;DR: be cognizant of the fact that medical literature changes over time, and keep this in mind when reading a single paper. Sometimes there are competing “facts” or beliefs or statements in the literature, and sometimes you can identify how it evolves over time, so that you can better assess the accuracy of research findings and avoid relying on outdated or incorrect information.

Whether you’re a researcher, a clinician, or a patient doing research for yourself, this awareness can help you better navigate the scientific literature.

A screenshot from the animated gif showing how citation strings happen in the literature, branching off over time but often still resulting in a repetition of a fact that is later considered to be incorrect, thus both the correct and incorrect fact occur in the literature at the same time.

The prompt matters when using Large Language Models (LLMs) and AI in healthcare

January 15, 2025 / April 12, 2025 by Dana Lewis

I see more and more research papers coming out these days about different uses of large language models (LLMs, a type of AI) in healthcare. There are papers evaluating them for supporting clinicians in decision-making, aiding in note-taking and improving clinical documentation, and enhancing patient education. But I see a trend in the titles and conclusions of these papers, amplified by media headlines, of sweeping claims about the performance of one model versus another. I challenge everyone to pause and consider a critical fact that is less obvious: the prompt matters just as much as the model.

As an example of this, I will link to a recent ~~pre-print of a~~ research article I worked on with Liz Salmi (published article here ~~pre-print here~~).

Liz nerd-sniped me with an idea for a study to have a patient and a neuro-oncologist evaluate LLM responses related to patient-generated queries about a chart note (or visit note or open note or clinical note, whatever you want to call it). I say nerd-sniped because I got very interested in designing the methods of the study, including making sure we used the APIs to model these ‘chat’ sessions so that the prompts were not influenced by custom instructions, ‘memory’ features within the account or chat sessions, etc. I also wanted to test something I’ve observed anecdotally from personal LLM use across other topics, which is that with 2024-era models the prompt matters a lot for what type of output you get. So that’s the study we designed, and wrote with Jennifer Clarke, Zhiyong Dong, Rudy Fischmann, Emily McIntosh, Chethan Sarabu, and Catherine (Cait) DesRoches, and I encourage you to check out the article here ~~pre-print~~ and enjoy the methods section, which is critical for understanding the point I’m trying to make here.

In this study, the data showed that when LLM outputs were evaluated for a healthcare task, the results varied significantly depending not just on the model but also on how the task was presented (the prompt). Specifically, persona-based prompts—designed to reflect the perspectives of different end users like clinicians and patients—yielded better results, as independently graded by both an oncologist and a patient.

The Myth of the “Best Model for the Job”

Many research papers conclude with simplified takeaways: Model A is better than Model B for healthcare tasks. While performance benchmarking is important, this approach often oversimplifies reality. Healthcare tasks are rarely monolithic. There’s a difference between summarizing patient education materials, drafting clinical notes, or assisting with complex differential diagnosis tasks.

But even within a single task, the way you frame the prompt makes a profound difference.

Consider these three prompts for the same task:

“Explain the treatment options for early-stage breast cancer.”
“You’re an oncologist. Explain the treatment options for early-stage breast cancer.”
“You’re an oncologist. Explain the treatment options for early-stage breast cancer as you would to a newly diagnosed patient with no medical background.”

The second and third prompts likely produce more accessible and tailored responses. If a study only tests general prompts (e.g. prompt one), it may fail to capture how much more effective an LLM can be with task-specific guidance.
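The three prompts above differ only in whether a persona and an audience are layered onto the same task. A minimal sketch of generating those variants systematically, so each one can be sent to every model you test, might look like this; `build_prompt` is a hypothetical helper, not code from the study.

```python
def build_prompt(task, persona=None, audience=None):
    """Compose a prompt from an optional persona, the task, and an optional audience."""
    prompt = task
    if audience:
        # Drop the task's trailing period so the audience clause joins the sentence.
        prompt = f"{prompt.rstrip('.')} as you would to {audience}."
    if persona:
        prompt = f"You're {persona}. {prompt}"
    return prompt


task = "Explain the treatment options for early-stage breast cancer."
variants = [
    build_prompt(task),
    build_prompt(task, persona="an oncologist"),
    build_prompt(task, persona="an oncologist",
                 audience="a newly diagnosed patient with no medical background"),
]
for text in variants:
    print(text)
```

Keeping the task text fixed and varying only the persona and audience makes it easier to attribute differences in output to the prompt rather than to wording drift.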

Why Prompting Matters in Healthcare Tasks

Prompting shapes how the model interprets the task and generates its output. Here’s why it matters:

Precision and Clarity: A vague prompt may yield vague results. A precise prompt clarifies the goal and the speaker (e.g. in prompt 2), and also often the audience (e.g. in prompt 3).
Task Alignment: Complex medical topics often require different approaches depending on the user—whether it’s a clinician, a patient, or a researcher.
Bias and Quality Control: Poorly constructed prompts can inadvertently introduce biases.

Selecting a Model for a Task? Test Multiple Prompts

When evaluating LLMs for healthcare tasks—or applying insights from a research paper—consider these principles:

Prompt Variation Matters: If an LLM fails on a task, it may not be the model’s fault. Try adjusting your prompts before concluding the model is ineffective, and avoid broad sweeping claims about a field or topic that aren’t supported by the test you are running.
Multiple Dimensions of Performance: Look beyond binary “good” vs. “bad” evaluations. In healthcare, for example, consider dimensions like readability, clinical accuracy, and alignment with user needs. In our paper, we saw some cases where the patient’s and the provider’s ratings overlapped, and other places where the ratings differed.
Reproducibility and Transparency: If a study doesn’t disclose how prompts were designed or varied, its conclusions may lack context. Reproducibility in AI studies depends not just on the model, but on the interaction between the task, model, and prompt design. You should be looking for these kinds of details when reading or peer reviewing papers. Take results and conclusions with a grain of salt if these methods are not detailed in the paper.
Involve Stakeholders in Evaluation: As shown in the article mentioned earlier, involving both clinical experts and patients in evaluating LLM outputs adds critical perspectives often missing in standard evaluations, especially as we evolve to focus research on supporting patient needs and not simply focusing on clinician and healthcare system usage of AI.
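The principles above suggest keeping ratings at the level of model, prompt, rater, and dimension, rather than collapsing straight to one score per model. A minimal sketch, assuming each graded response is stored as one record per model, prompt variant, rater, and dimension; the `Rating` fields and the `mean_scores` helper are hypothetical and are not the study's actual analysis code.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Rating:
    model: str      # which LLM produced the response
    prompt_id: str  # which prompt variant was used, e.g. "plain" or "persona"
    rater: str      # who graded it, e.g. "clinician" or "patient"
    dimension: str  # e.g. "readability" or "accuracy"
    score: float


def mean_scores(ratings, *keys):
    """Average scores grouped by the given Rating attribute names."""
    groups = defaultdict(list)
    for r in ratings:
        groups[tuple(getattr(r, k) for k in keys)].append(r.score)
    return {group: mean(scores) for group, scores in groups.items()}
```

Comparing `mean_scores(ratings, "model")` with `mean_scores(ratings, "model", "prompt_id")` shows whether a per-model average is hiding a prompt effect, and adding `"rater"` and `"dimension"` to the keys shows where patient and clinician ratings agree or diverge.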

What This Means for Healthcare Providers, Researchers, and Patients

For healthcare providers, understand that the way you frame a question can improve the usefulness of AI tools in practice. A carefully constructed prompt, adding a persona or requesting information for a specific audience, can change the output.
For researchers, especially those developing or evaluating AI models, it’s essential to test prompts across different task types and end-user needs. Transparent reporting on prompt strategies strengthens the reliability of your findings.
For patients, recognize that AI-generated health information is shaped by both the model and the prompt. This can support critical thinking when interpreting AI-driven health advice. Remember that LLMs can be biased, but so can humans in healthcare. The same approach for assessing bias and evaluating experiences in healthcare should be used for LLM output as well as human output. Everyone (humans) and everything (LLMs) is capable of bias or errors in healthcare.

TLDR: Instead of asking “Which model is best?”, a better question might be:

“How do we design and evaluate prompts that lead to the most reliable, useful results for this specific task and audience?”

I’ve observed, and this study adds evidence, that prompt interaction with the model matters.

Best practices in communication related to writing a journal article and sharing it with co-authors

January 2, 2025 by Dana Lewis

I’ve been a single author, a lead author, a co-author, a corresponding author, AND a last author. Basically, I have written a lot of journal articles myself, solo / single, and with other people. One area in this process that I observe frequently gets overlooked is what happens during and after the submission process, as it relates to communicating about the article itself.

I’m not talking about disseminating the article to your target audience or the public, either (although that is important as well). I’m talking about making sure all authors know when the article has been accepted and when it is live, and that they have access to a copy of the article (!).

Most people don’t know that by default, not all journals give all authors access to their own articles for free.

Here are some tips about the process of submitting and saving published articles that will help all authors – even solo authors – in the future.

Basically, help you help your future self! (As well as your co-authors.)

Journals typically only notify the lead/corresponding/submitting author about where the manuscript is in terms of revision, acceptance, and publication. That puts the responsibility on the lead/corresponding/submitting author to notify the full team of authors of where the article is in the process. Similarly, some journals will send a PDF/final copy of the proofed, final, version of record article to the lead author (not always, but usually), but that often does not go out to the full author team by default.

This means that it is the lead author’s responsibility to forward the copy of the final, PDF, proofed article to the entire authorship team so everyone has a copy.

(No, most of the time authors do not have free access to the journal they are submitting to. No, most authors do not have budget to make articles open access and free to all, which means that unless they manage to snag and save this PDF when it is sent to them at the time of publication, they may not have access to their very own article in the future! Just because you, as the lead/corresponding author, have access, this does not mean everyone on your article team will.

I’m a good example of someone who authors frequently but is not at an institution and has zero access to any paywalled journals. If I’m not given a copy of my articles at the time of publication, I have to phone-a-friend (thanks, Liz Salmi, for being the go-to for me here) to help pull articles. There are things like S c i H u b, but they more often than not do not have super recent, fresh off the press articles. So yes, people like me exist on your authorship teams.)

Best practices for authors include:

Once you submit a manuscript, mark your file name (somehow) with “Submitted”. This way you know this is the version that was submitted. This step matters for a later one; we’ll come back below to why you may want to use only the ‘submitted’ version.
Example: “JournalAcronym-Article-Blah-Blah-SUBMITTED.docx”.

Even when I’m not the lead author, I prefer to have access to this submitted version when co-writing articles. This way, I can see all incorporated edits and the ‘final’ version we submitted. There are also cases (see below) where I need this version to share with other people.
Usually, the article goes through peer review and you get comments, so you make revisions and re-submit your article. Again, once submitted, make sure you’ve marked this as a ‘revision’ somehow (usually people do) and that it was submitted.
Example: “JournalAcronym-Article-Blah-Blah-SUBMITTED-R1.docx”.

Again, best practice would be to send out this re-submitted revision version to all authors so everyone has it.
You may end up with multiple rounds of revisions and peer review (moving to R2, etc.), or you may get an acceptance notice. Your article will then move to the copyediting stage and you get proofs. It’s useful to save these for your own purposes, such as making sure that the edits you make are actually executed in the final article. This is less important for dissemination, although I do recommend giving all co-authors the ability to edit/review/proof and request changes.
Accepted, proofed, published! THIS is the step that I see most people miss, so pay attention. If you are the lead or solo author, you will probably get an email saying your article is now online, either online first or published. You may get a PDF of your article as an attachment. If not, you should be able to click on your access link and access the article online.
IMPORTANT STEP HERE: go ahead and download the PDF of the article then. Right then, go ahead and save it.

Example: “JournalAcronym-Article-Blah-Blah-Year.PDF”.

(Why do you care about this if you are a solo author? Because the link may expire and you may lose access to this article. More on sharing your article below.)
Email your entire author team (if you’re not a solo author). Tell them the article was published; provide a link and/or the DOI link; and attach the PDF to the email so everyone on the team has a copy of the final article. Not all of your co-authors will work at an institution that has unlimited library access; if they do, that might change in the future. Give everyone a copy of the article to save for themselves. You can also remind everyone what the sharing permissions (or limitations) are for the article.
For example, some articles are paywalled but authors have permission to store the final copy (PDF of the final version) on their own repository or not-for-profit website. For an example, see my research page at DIYPS.org/research – you’ll notice how sometimes I link to an “author copy” PDF, which is what this is – the final article PDF like you would get by accessing the paywalled journal.

Other times, though, you are specifically not permitted to share the final/proofed/formatted copy. Instead, you’ll be allowed to share the “submitted” manuscript (usually prior to the revision stage). Remember how step 1 was to save a SUBMITTED copy? This is why! You can PDF this up; add a note to the top that references the final version of record (usually, journals give you recommended language for this) and a link/DOI link to it, and share away on your own site. Again, look at DIYPS.org/research and you’ll notice some of my “author copy” versions are these submitted versions rather than the final versions.

You’ll also notice that sometimes I link to articles that are open access and then also have a link to a PDF author copy. This is in case something changes in the future with open access links breaking, the journal changing, etc. I have actually had free non-paywalled articles get turned into paywalled journal articles years later, which is why I do point to both places (the open access version and a back up author copy).

Regardless of what the permissions are for sharing on your own website/repository/institutional repository: you as the author always have permission to give this PDF out when you are asked directly. For example, someone emails you and asks for a copy: you can email back and attach the PDF! This is true even if you are only permitted to post the submitted version (not the final version) on your own website; you can still hand out the final, formatted, pretty PDF version when asked directly.

As a related tip, this is a great way to disseminate your research and build relationships, so if someone does email you and ask for an author copy…please reply and send them a copy. (Saying this as someone without access to articles who sends requests to many authors to get access to their research, and I only get responses from 50% of authors. Sad panda.) Again, this is why it is helpful to get in the habit of saving your articles as you submit and have them published; it makes it easy to jump into the “Published copy” folder (or however you name it) and attach the PDF to the email and send it.
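If you’d like the file names from the steps above to be consistent without thinking about it, a tiny helper can generate them. This is a sketch under stated assumptions: `manuscript_filename` is a hypothetical function, submitted versions are assumed to be Word files, and the published copy is assumed to be a PDF (lowercase `.pdf` here, where the example above wrote `.PDF`).

```python
def manuscript_filename(journal, short_title, stage, revision=0, year=None):
    """Build a file name for one stage of a manuscript's life.

    stage is "submitted" (Word file, optionally with a revision number)
    or "published" (final PDF, tagged with the publication year).
    """
    base = f"{journal}-{short_title}"
    if stage == "submitted":
        suffix = f"-SUBMITTED-R{revision}" if revision else "-SUBMITTED"
        return f"{base}{suffix}.docx"
    if stage == "published":
        if year is None:
            raise ValueError("a published file needs a publication year")
        return f"{base}-{year}.pdf"
    raise ValueError(f"unknown stage: {stage!r}")


print(manuscript_filename("JournalAcronym", "Article-Blah-Blah", "submitted"))
print(manuscript_filename("JournalAcronym", "Article-Blah-Blah", "submitted", revision=1))
print(manuscript_filename("JournalAcronym", "Article-Blah-Blah", "published", year=2025))
```

Whatever convention you choose matters less than using it every time, so the right version is easy to find years later.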

To recap, as a best practice, you should disseminate various versions of articles to your entire co-author team at the following points in time:

Original submission.
Suggestion: Write an email, say you’ve successfully submitted, remind everyone which journal this was submitted to, and attach a copy of your “JournalAcronym-Article-Blah-Blah-SUBMITTED.docx”. (If you end up getting a desk rejection and you are re-submitting elsewhere, it is also nice to email co-authors and tell them so. You don’t necessarily need to send out a newly retitled version unless there are new changes to the submission, such as if you went through a partial round of peer review before getting rejected and you are submitting the revised version to the new target journal.)
Revision submission.
Suggestion: Write an email, say you’ve successfully submitted the revisions, remind everyone which journal this was submitted to, and attach a copy of your “JournalAcronym-Article-Blah-Blah-SUBMITTED-R1.docx” and the reviewer response document so everyone can see how edits/feedback were incorporated (or not).
Acceptance.

Suggestion:

A) Forward the email if it has the PDF attached to your full author team. Say congratulations; the article was accepted; and point out the article is attached as PDF.

B) If you don’t have a PDF attachment in your email already, go to the online access link the journal gave you and save a copy of the PDF. Then, email the author team with the FYI that the article is live; provide the link to the online version; and attach the PDF directly to that email so everyone has a final version.

Regardless of A or B, remind everyone what the permissions are for sharing to their own/institution repository (eg final PDF or use the submitted version, which you previously shared or could also re-share here).

Bonus tip:

Depending on the content of your article, you may also want to think about sending copies of the final PDF article to certain people who are not co-authors with you.

For example, if you are heavily citing someone’s work or talking about their work in a constructive way – you could email them and give them a heads up and provide a copy of the article. It’s a great way to contribute to your relationship (if you have an existing relationship) and/or foster a relationship. Remember that many people will have Google Scholar Alerts or similar with their name and/or citation alerts from various services, so people are likely to see when you talk about them or their work or are heavily citing their work. Again, some of those people may not have access to your article and may reach out to ask for an article; you can (and should) send them a copy! (And again, consider thinking about it as a relationship building opportunity rather than a transactional thing related to this single article.)

I would particularly flag this as something to pay attention to and do if you are someone working in the space of patient engagement in healthcare. For example, if you write an article and mention a patient or their body of work by name, it would be courteous to email them, let them know about the article, and send them a PDF.

Otherwise, I can speak from the experience of being talked about as a patient like I’m an ant under the microscope where someone cites an article where my work is mentioned; talks about me by name and references my perspective; and I get a notification about this article….but I can’t access it because it’s in a paywalled journal. Awkward, and a little weird in some cases when the very subject of the article(s) is patient engagement and involving patients in research. Remember, research involvement should include all stages from design, planning, doing the research, and then disseminating the research. So this meta point is that if there is scholarly literature of any kind (whether original research articles or reviews, commentaries, letters in response to other articles, etc) talking about specific patients and their bodies of work – best practice should be to email them and send a copy of the article. Again, think less transactional and more about relationships – it will likely give you benefit in the long run! Plus, less awkward, a short-term benefit.

—-

best practices for communicating with co-authors about published articles, by Dana M. Lewis from DIYPS.org

As an example of how I like to disseminate my articles personally, every time a journal article is published and I have access to it, I update DIYPS.org/research with the title, journal, a DOI link (to help people find it online and/or cite it), and a link to the open access version if available and if not, an author copy PDF of the final or submitted version. So, if you’re ever looking for any of my articles, you can head there (DIYPS.org/research) first and grab copies any time!

If you are looking for a particular article and can’t find it or it’s not listed there yet (e.g. likely because it just came out and I haven’t been sent my own copy by my co-authors yet…), you can always email me directly (Dana@OpenAPS.org) and I’m more than happy to send you a copy of whatever version I have available and/or the final PDF once I have access to it.

Assessing the Impact of Diabetes on Gastrointestinal Symptom Severity in Exocrine Pancreatic Insufficiency (EPI/PEI): A Diabetes Subgroup Analysis of EPI/PEI-SS Scores – Poster at #ADA2024

June 21, 2024 / April 2, 2025 by Dana Lewis

Last year, I recognized that there was a need to improve the documentation of symptoms of exocrine pancreatic insufficiency (known as EPI or PEI). There is no standardized way to discuss symptoms with doctors, and this influences whether or not people get the right amount of enzymes (pancreatic enzyme replacement therapy; PERT) to treat EPI and eliminate symptoms completely. It can be done, but like insulin, it requires matching PERT to the amount of food you’re consuming. I also began observing that EPI is underscreened and underdiagnosed, whether that’s in the general population or in people with diabetes. I thought that if we could create a list of common EPI symptoms and a standardized scale to rate them, this might help address some of these challenges.

I developed this scale to address these needs. It is called the “Exocrine Pancreatic Insufficiency Symptom Score” or “EPI/PEI-SS” for short.

I had a handful of people with and without EPI help me test the scale last year, and then I opened up a survey to the entire world and asked people to share their experiences with GI-related symptoms. I specifically sought people with EPI diagnoses as well as people who don’t have EPI, so that we could compare the symptom burden and experiences to people without EPI. (Thank you to everyone who contributed their data to this survey!)

After the first three weeks, I started analyzing the first set of data. While doing that, I realized that I had a large subgroup of people with diabetes who had contributed to the survey (both because of my network of people with diabetes and because I also posted in at least one diabetes-specific group), and I was able to do a full subgroup analysis to assess whether having diabetes seemed to correlate with a different symptom experience of EPI or not.

Here’s what I found, and what my poster is about (you can view my poster as a PDF here), presented at ADA Scientific Sessions 2024 (#ADA2024):

1985-LB at #ADA2024, “Assessing the Impact of Diabetes on Gastrointestinal Symptom Severity in Exocrine Pancreatic Insufficiency (EPI/PEI): A Diabetes Subgroup Analysis of EPI/PEI-SS Scores”

Exocrine pancreatic insufficiency has a high symptom burden and is present in as many as 3 of 10 people with diabetes. (See my systematic review from last year here.) To help improve conversations about symptoms of EPI, which can then be used to improve screening, diagnosis, and treatment success with EPI, I created the Exocrine Pancreatic Insufficiency Symptom Score (EPI/PEI-SS). It consists of 15 individual symptoms, and for each one, people separately rate how often (frequency, 0-5) and how severely (severity, 0-3) they experience it, if at all. The frequency and severity are multiplied to give an individual symptom score (0-15 possible), and these are added up for a total EPI/PEI-SS score (0-225 possible, because 15 symptoms times 15 possible points per symptom is 225).
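To make the arithmetic concrete, here is a minimal Python sketch of the scoring rule described above: frequency × severity for each symptom, summed across 15 symptoms. It is only an illustration, not the code behind the web version of the EPI/PEI-SS; the symptom names are left out, and the function names and example ratings are hypothetical.

```python
# Sketch of the EPI/PEI-SS arithmetic: 15 symptoms, each rated for
# frequency (0-5) and severity (0-3). Symptom names are omitted here.

NUM_SYMPTOMS = 15
MAX_FREQUENCY = 5
MAX_SEVERITY = 3


def individual_symptom_score(frequency: int, severity: int) -> int:
    """Return frequency x severity for one symptom (0-15)."""
    if not 0 <= frequency <= MAX_FREQUENCY:
        raise ValueError(f"frequency must be 0-{MAX_FREQUENCY}, got {frequency}")
    if not 0 <= severity <= MAX_SEVERITY:
        raise ValueError(f"severity must be 0-{MAX_SEVERITY}, got {severity}")
    return frequency * severity


def total_epi_pei_ss(ratings: list[tuple[int, int]]) -> int:
    """Sum the 15 individual symptom scores into a total score (0-225)."""
    if len(ratings) != NUM_SYMPTOMS:
        raise ValueError(f"expected {NUM_SYMPTOMS} (frequency, severity) pairs")
    return sum(individual_symptom_score(f, s) for f, s in ratings)


# Hypothetical ratings: three symptoms experienced, twelve not at all.
example = [(5, 3), (4, 2), (2, 1)] + [(0, 0)] * 12
print(total_epi_pei_ss(example))  # 15 + 8 + 2 = 25
```

The maximum of 225 is only reached when all 15 symptoms are rated at the highest frequency and the highest severity.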

I conducted a real-world study of the EPI/PEI-SS in the general population to assess the gastrointestinal symptom burden in individuals with (n=155) and without (n=169) EPI. Because there was a large cohort of people with diabetes (PWD) within these groups, I analyzed them separately to evaluate whether diabetes contributes to a difference in EPI/PEI-SS score.

Methods:

I calculated EPI/PEI-SS scores for all survey participants. Previously, I had analyzed the differences between people with and without EPI overall. For this sub-analysis, I compared PWD (n=118 total), with EPI (T1D: n=14; T2D: n=20) or without EPI (T1D: n=78; T2D: n=6), against people without diabetes (n=206 total), with and without EPI.

I also looked at sub-groups within the non-EPI cohorts and broke them into two groups to see whether other GI conditions contributed to a higher EPI/PEI-SS score and whether we could distinguish EPI from other GI and non-GI conditions.

Results:

People with EPI have a much higher symptom burden than people without EPI. This can be seen in the statistically significantly higher mean EPI/PEI-SS score, as well as in the higher average number of symptoms, average severity score of individual symptoms, and average frequency score of individual symptoms.

This remains true irrespective of diabetes. In other words, diabetes does not appear to influence any of these metrics.

People with diabetes with EPI had statistically significantly higher mean EPI/PEI-SS scores (102.62 out of 225, SD: 52.46) than did people with diabetes without EPI (33.64, SD: 30.38), irrespective of the presence of other GI conditions (all group comparisons p<0.001). As you can see below, that is the same pattern we see in people without diabetes. And the stats confirm what you can see: there is no significant difference, overall or in any of the subgroups, between people with and without diabetes.

T1D and T2D subgroups were similar (but because the T2D cohort is small, I did not break them out separately in this graph).

For example, people with diabetes with EPI had an average of 12.59 (out of 15) symptoms, with an average frequency score of 3.06, an average severity score of 1.79, and an average individual symptom score of 5.48. This is a pretty clear contrast to people with diabetes without EPI, who had an average of 7.36 symptoms, with an average frequency score of 1.4, an average severity score of 0.8, and an average individual symptom score of 1.12. All comparisons are statistically significant (p<0.001).

A table comparing the average number of symptoms, frequency, severity, and individual symptom scores between people with diabetes with and without exocrine pancreatic insufficiency (EPI). People with EPI have more symptoms and higher frequency and severity than without EPI: regardless of diabetes.

Conclusion

EPI has a high symptom burden, irrespective of diabetes.
High scores using the EPI/PEI-SS among people with diabetes can distinguish between EPI and other GI conditions.
The EPI/PEI-SS should be further studied as a possible screening method for EPI and assessed as a tool to aid people with EPI in tracking changes to EPI symptoms over time based on PERT titration.

What does this mean if you are a healthcare provider? What actionable information does this give you?

If you’re a healthcare provider, you should be aware that people with diabetes may be more likely to have EPI – rather than celiac or gastroparesis (source) – if they mention having GI symptoms. This means you should incorporate fecal elastase screening into your care plans to help further evaluate GI-related symptoms.

If you want to further improve the pre-test probability of elastase testing, you can use the EPI/PEI-SS with your patients to assess the severity and frequency of their GI-related symptoms. I will explain the cutoff and AUC numbers we calculated, but first, understand the caveat: these were calculated in the initial real-world study, which included people with EPI who are already treating with PERT. These numbers might change a little when we repeat this study and evaluate the EPI/PEI-SS in people with untreated EPI. (I actually predict the mean score will go up in an undiagnosed population, because scores should go down with treatment.) But a study in that different population may change these exact cutoff, sensitivity, and specificity numbers, which is why I’m giving this caveat. That being said: the AUC was 0.85, which means the EPI/PEI-SS is pretty good at differentiating between having EPI and not having EPI. In the diabetes sub-population specifically, I calculated a suggested cutoff of 59 (out of 225), with a sensitivity of 0.81 and a specificity of 0.75. This means that if people bring up GI symptoms to you and you have them take the EPI/PEI-SS, treating a score greater than or equal to 59 as a positive screen, we estimate that out of 100 people with EPI, 81 would be identified (and out of 100 people without EPI, 75 would correctly be identified via scores lower than 59). That doesn’t mean that people with EPI can’t have a lower score, or that everyone with a higher score has EPI; but it does mean that having a fecal elastase ≤200 µg/g is a lot more likely in those with higher EPI/PEI-SS scores.
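For readers who want to see how sensitivity and specificity fall out of a cutoff, here is a small Python sketch. It assumes the rule “score ≥ 59 means screen positive” and that each person’s EPI status is already known (for example, from fecal elastase testing). The function name is hypothetical, and the scores below are made up for illustration; they are not data from the study. It also does not compute the AUC, which summarizes performance across all possible cutoffs rather than at one.

```python
# Sketch: sensitivity and specificity of the rule "score >= cutoff" against
# known EPI status. The scores below are invented for illustration only.

def sensitivity_specificity(scores, has_epi, cutoff=59):
    """Return (sensitivity, specificity) for treating score >= cutoff as positive.

    Assumes both groups (with and without EPI) contain at least one person.
    """
    if len(scores) != len(has_epi):
        raise ValueError("scores and has_epi must be the same length")
    true_pos = false_neg = true_neg = false_pos = 0
    for score, epi in zip(scores, has_epi):
        flagged = score >= cutoff
        if epi and flagged:
            true_pos += 1
        elif epi:
            false_neg += 1
        elif flagged:
            false_pos += 1
        else:
            true_neg += 1
    sensitivity = true_pos / (true_pos + false_neg)  # people with EPI who were flagged
    specificity = true_neg / (true_neg + false_pos)  # people without EPI who were not flagged
    return sensitivity, specificity


# Hypothetical example: 4 people with EPI, then 4 people without EPI.
scores = [120, 95, 60, 45, 20, 35, 70, 10]
has_epi = [True, True, True, True, False, False, False, False]
print(sensitivity_specificity(scores, has_epi))  # (0.75, 0.75)
```

In this toy example, one person with EPI scores below the cutoff (a false negative) and one person without EPI scores above it (a false positive), which is exactly the kind of overlap the caveat about lower and higher scores describes.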

In addition to the cutoff score, there is a notable difference between people with diabetes and EPI and people with diabetes without EPI in their top individual symptom scores (representing symptom burden based on frequency and severity). For example, the top 3 symptoms of those with EPI and diabetes are avoiding certain food/groups; urgent bowel movements; and avoiding eating large meals. People with diabetes but without EPI also have “Avoid certain food/groups” as their top-scoring symptom, but the score is markedly different: a mean score of 8.94 for people with EPI, compared to 3.49 for people without EPI. In fact, the lowest mean individual symptom score among people with EPI is higher than the highest mean individual symptom score among people without EPI.

Image: QR code for EPI/PEI-SS, which takes you to https://bit.ly/EPI-PEI-SS-Web

How do you have people take the EPI/PEI-SS? You can pull this link up (https://bit.ly/EPI-PEI-SS-Web), give the link to them and ask them to take it on their phone, or save this QR code and give it to them to take later. The link and the QR code both go to a free web-based version of the EPI/PEI-SS that calculates the total EPI/PEI-SS score, which you can use in shared decision-making about whether this person would benefit from a fecal elastase test or other follow-up screening for EPI. Note that the EPI/PEI-SS does not collect any identifiable information and is fully anonymous.

(Bonus: people who use this tool can opt to contribute their anonymized symptom and score data for an ongoing observational study.)

If you have feedback about whether the EPI/PEI-SS was helpful – or not – in your care of people with diabetes; or if you want to discuss collaborating on some prospective studies to evaluate EPI/PEI-SS in comparison to fecal elastase screening, please reach out anytime to Dana@OpenAPS.org.

What does this mean if you are a patient (person with diabetes)? What actionable information does this give you?

If you don’t have GI symptoms that bother you, you don’t necessarily need to take action. (Just put a note in your brain that EPI is more likely than celiac or gastroparesis in people with diabetes so if you or a friend with diabetes have GI symptoms in the future, you can make sure you are assessed for EPI.) You can also choose to take the EPI/PEI-SS regardless, and also opt in to donate your data.

If you do have GI symptoms that are annoying, you may want to take the EPI/PEI-SS to help you evaluate the frequency and severity of your GI symptoms. You can take it for free and anonymously – no identifiable information is needed to access the tool. It will generate the EPI/PEI-SS score for you.

Based on the score, you may want to ask your doctor (which could be the doctor who treats your diabetes, a primary/general care provider, or a gastroenterologist: whoever you seek routine care from or have your next appointment with) about your symptoms, share your EPI/PEI-SS score, and explain that you think you may warrant screening for EPI.

(You can also choose to contribute your anonymous symptom data to a research dataset, to help us improve the EPI/PEI-SS and figure out how to improve screening, diagnosis, and treatment of EPI. Remember, this tool will not ask you for any identifying information. This is 100% optional, and you can opt out if you prefer not to contribute to research, while still using the tool.)

—

You can see a pre-print version of the diabetes sub-study here or pre-print of the general population data here.

If you’re looking for more personal experiences about living with EPI, check out DIYPS.org/EPI, and also for people with EPI looking to improve their dosing with pancreatic enzyme replacement therapy – you may want to check out PERT Pilot (a free iOS app to record enzyme dosing, also available for free for Android).

Researchers & clinicians, if you’re interested in collaborating on studies in EPI (in diabetes, or more broadly on EPI), whether specifically on EPI/PEI-SS or broader EPI topics, please reach out! My email is Dana@OpenAPS.org

Effective Pair Programming and Coding and Prompt Engineering and Writing with LLMs like ChatGPT and other AI tools

February 12, 2024 (updated April 12, 2025) by Dana Lewis

I’ve been puzzled when I see people online say that LLMs “don’t write good code”. In my experience, they do. But given that most of these LLMs are used in chatbot mode (meaning you chat and give it instructions to generate the code), that might be where the disconnect lies. To get good code, you need effective prompting, and to prompt effectively, you need clear thinking and clear ideas about what you are trying to achieve and how.

My recipe and understanding is:

Clear thinking + clear communication of ideas/request = effective prompting => effective code and other outputs

It also involves understanding what these systems can and can’t do. For example, as I’ve written about before, they can’t “know” things (although they can increasingly look things up) and they can’t do “mental” math. But, they can generally repeat patterns of words to help you see what is known about a topic and they can write code that you can execute (or it can execute, depending on settings) to solve a math problem.

What the system does well is help code small chunks, walk you through processes to link these sections of code up, and help you implement them (if you ask for it). The smaller the task (ask), the more effective it is. A smaller ask also makes it easier for you to see when it has completed the task and when it hasn’t been able to finish due to limitations like response length limits; information falling out of the context window (what it knows about what you’ve told it); unclear prompting; and/or because you’re asking it to do things for which it doesn’t have expertise. The last of these, lack of expertise, can be partly improved with specific prompting techniques, and the same is true for right-sizing the task it’s focusing on.

Right-size the task by giving a clear ask

If I were to ask an LLM to write me code for an iOS app to do XYZ, it could write me some code, but it certainly wouldn’t (at this point in history; I’m writing this in February 2024) write all the code and give me a downloadable file that includes it all and the ability to simply run it. What it can do is write chunks and snippets of code for bits and pieces of files that I can take, place, and build upon.

How do I know this? Because I made that mistake when trying to build my first iOS apps in April and May 2023 (last year). It can’t do that (and still can’t today; I repeated the experiment). I had zero idea how to build an iOS app; I had a sense that it involved XCode and pushing to the Apple iOS App Store, and that I needed “Swift” as the programming language. Luckily, though, I had a much stronger sense of how I wanted to structure the app user experience and what the app needed to do.

I followed these steps:

First, I initiated chat as a complete novice app builder. I told it I was new to building iOS apps and wanted to use XCode. I had XCode downloaded, but that was it. I told it to give me step by step instructions for opening XCode and setting up a project. Success! That was effective.
I opened a different chat window after that, to start a new chat. I told it that it was an expert in iOS programming using Swift and XCode. Then I described the app that I wanted to build, said where I was in the process (e.g. had opened and started a project in XCode but had no code yet), and asked it for code to put on the home screen so I could build and open the app and it would have content on the home screen. Success!
From there, I was able to stay in the same chat window and ask it for pieces at a time. I wanted to have a new user complete an onboarding flow the very first time they opened the app. I explained the number of screens and content I wanted on those screens; the chat was able to generate code, tell me how to create that in a file, and how to write code that would trigger this only for new users. Success!
I was able to then add buttons to the home screen; have those buttons open new screens of the app; add navigation back to the home; etc. Success!
(Rinse and repeat, continuing until all of the functionality was built out a step at a time).

To someone with familiarity building and programming things, this probably follows a logical process of how you might build apps. If you’ve built iOS apps before and are an expert in Swift programming, you’re either not reading this blog post or are thinking I (the human) am dumb and inexperienced.

Inexperienced, yes, I was (in April 2023). But what I am trying to show here is that for someone new to a process and language, this is how we need to break down steps and work with LLMs: give them small tasks that help us understand and implement the code they produce before moving forward with a new task (ask). It takes these small building-block tasks to build up to a complete app with all the functionality that we want. Nowadays, even though I can whip up a prototype project and iOS app and deploy it to my phone within an hour (by working with an LLM as described above, but skipping some of the introductory set-up steps now that I have experience with those), I still follow the same general process to give the LLM the big picture and efficiently ask it to code the pieces of the puzzle I want to create.

As the human, you need to be able to keep the big picture (full app purpose and functionality) in mind while subcontracting to the LLM to generate specific chunks of code that achieve new functionality in your project.

In my experience, this is very much like pair programming with a human. In fact, this is exactly what we did when we built DIYPS over ten years ago (wow) and then OpenAPS within the following year. I’ve talked endlessly about how Scott and I would discuss an idea and agree on the big picture task; then I would direct sub-tasks and asks that he, then also Ben and others would be coding on (at first, because I didn’t have as much experience coding and this was 10 years ago without LLMs; I gradually took on more of those coding steps and roles as well). I was in charge of the big picture project and process and end goal; it didn’t matter who wrote which code or how; we worked together to achieve the intended end result. (And it worked amazingly well; here I am 10 years later still using DIYPS and OpenAPS; and tens of thousands of people globally are all using open source AID systems spun off of the algorithm we built through this process!)

Image: two purple boxes. The one on the left says "big picture project idea" and has a bunch of smaller boxes within it labeled LLM, showing how an LLM can do small tasks within the scope of a bigger project that you direct. The box on the right simply says "finished project".

Today, I would say the same is true. It doesn’t matter, for my types of projects, whether a human or an LLM “wrote” the code. What matters is: does it work as intended? Does it achieve the goal? Does it contribute to the goal of the project?

Coding can be done – often by anyone (human with relevant coding expertise) or anything (LLM with effective prompting) – for any purpose. The critical key is knowing what the purpose is of the project and keeping the coding heading in the direction of serving that purpose.

Tips for right-sizing the ask

Consider using different chat windows for different purposes, rather than trying to do it all in one. Yes, context windows are getting bigger, but you’ll still likely benefit from giving different prompts in different windows (more on effective prompting below). Start with one window for setting up a project (e.g. how to get XCode on a Mac and start a project; what file structure to use for an app/project that will do XYZ; how to start a Jupyter notebook for doing data science with Python; etc.) and for brainstorming ideas to scope your project; then use a separate window for a series of coding sub-tasks (e.g. write code for the home page screen of your app; add a button that allows voice entry functionality; add in HealthKit permission functionality; etc.) that serve the big picture goal.
Make a list for yourself of the steps needed to build a new piece of functionality for your project. If you know what the steps are, you can ask the LLM for each of them specifically. Again, use a separate window if you need to. For example, if you want to add the ability to save data to HealthKit from your app, you may start a new chat window that asks the LLM, in general terms, how to add HealthKit functionality to an app. It’ll describe the process: certain settings that need to be configured in XCode for the project; adding code that prompts the user for the correct permissions; and then code that actually does the saving/revising to HealthKit.
Make your list (by yourself or with help), then you can go ask the LLM to do those things in your coding/task window for your specific project. You can go set the settings in XCode yourself, and skip to asking it for the task you need it to do, e.g. “write code to prompt the user with HealthKit permissions when button X is clicked”.

(Sure, you can do the ask for help in outlining steps in the same window that you’ve been prompting for coding sub-tasks, just be aware that the more you do this, the more quickly you’ll burn through your context window. Sometimes that’s ok, and you’ll get a feel for when to do a separate window with the more experience you get.)

Pay attention as you go and see how much code it can generate and when it falls short of an ask. This will help you size future asks so that it fully completes them more often. I observe that when I don’t know the right size of a task (due to my lack of expertise), it’s more prone to give me ½-⅔ of the code and solution and need additional prompting after that. Sometimes I ask it to continue where it cut off; other times I start implementing/working with the bits of code (the first ⅔) it gave me, and keep a mental or written note that it did not generate all the steps/code for the functionality and that I need to come back to it. Part of why it is sometimes effective to get started with ⅔ of the code is that you’ll likely need to debug/test the first bit of code anyway. Sometimes when you paste in code, it uses methods that don’t match the version you’re targeting (e.g. functionality that is outdated as of iOS 15 when you’re targeting iOS 17 and newer), and you’ll get a warning, or it will be blocked from working until you fix it.
Once you’ve debugged/tested as much as you can of the original ⅔ of code it gave you, you can prompt it to say “Ok, I’ve done X and Y. We were trying to (repeat initial instructions/prompt) – what are the remaining next steps? Please code that.” to go back and finish the remaining pieces of that functionality.

(Note that saying “please code that” isn’t necessarily good prompt technique, see below).

Again, much of this is paying attention to how the sub-task is getting done in service of the overall big picture goal of your project; or the chunk that you’ve been working on if you’re building new functionality. Keeping track with whatever method you prefer – in your head, a physical written list, a checklist digitally, or notes showing what you’ve done/not done – is helpful.
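If you prefer keeping track in code rather than on paper, here is a hypothetical Python sketch of a checklist for one piece of functionality. It records which sub-tasks are done and builds a specific follow-up prompt that names the goal, what’s finished, and the next step, instead of a bare “please code that”. The class names and prompt wording are illustrative assumptions, not part of any LLM tool or API; you would paste the printed text into your chat window yourself.

```python
from dataclasses import dataclass, field


@dataclass
class SubTask:
    description: str
    done: bool = False


@dataclass
class FeaturePlan:
    """Hypothetical checklist for one piece of functionality in a bigger project."""

    goal: str
    subtasks: list[SubTask] = field(default_factory=list)

    def mark_done(self, description: str) -> None:
        """Mark the sub-task with this exact description as finished."""
        for task in self.subtasks:
            if task.description == description:
                task.done = True
                return
        raise KeyError(f"no sub-task named {description!r}")

    def remaining(self) -> list[str]:
        """Descriptions of sub-tasks that still need doing, in order."""
        return [t.description for t in self.subtasks if not t.done]

    def follow_up_prompt(self) -> str:
        """Build a specific prompt naming the goal, progress, and next step."""
        todo = self.remaining()
        if not todo:
            return f"All sub-tasks to {self.goal} are done."
        finished = [t.description for t in self.subtasks if t.done]
        finished_text = "; ".join(finished) if finished else "none yet"
        return (
            f"We are trying to {self.goal}. "
            f"Steps completed so far: {finished_text}. "
            f"Next step: {todo[0]}. Please write the code for that step only."
        )


# Example using the HealthKit steps from the tips above.
plan = FeaturePlan(
    goal="add the ability to save data to HealthKit from the app",
    subtasks=[
        SubTask("set the HealthKit settings for the project in XCode"),
        SubTask("write code to prompt the user with HealthKit permissions when button X is clicked"),
        SubTask("write code that saves data to HealthKit"),
    ],
)
plan.mark_done("set the HealthKit settings for the project in XCode")
print(plan.follow_up_prompt())
```

Asking for only the next step keeps each ask right-sized, and repeating the goal in every prompt helps when earlier context has fallen out of the window.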

I used coding examples for most of the above, but I follow the same general process when writing research papers, blog posts, research protocols, etc. My point is that this works for all types of projects that you’d work on with an LLM, whether the intended output is code or human-focused language that you’d write or speak.

But whether you’re coding or writing language, the other thing that makes a difference, in addition to right-sizing the task, is effective prompting. I’ve intuitively noticed that this has made the biggest difference in my projects for getting output that matches my expertise. Conversely, I have actually peer reviewed papers for medical journals that do a horrifying job with prompting. You’ll hear people talk about “prompt engineering”, and this is what it refers to: how do you engineer (write) a prompt to get the ideal response from the LLM?

Tips for effective prompting with an LLM

Personas and roles can make a difference, both for you and for the LLM. What do I mean by this? Start your prompt by telling the LLM what perspective you want it to take. Without that, you’re making it guess what information and style of response you’re looking for. Here’s an example: if you ask it what causes cancer, it’s going to default to safety and give you a general-public answer about causes of cancer in very plain, lay language. Which may be fine. But if you’re looking for a better understanding of the causal mechanisms of cancer, what is known, and what is not known, you will get better results if you prompt it with “You are an experienced medical oncologist” so it speaks from the generated perspective of that role. Similarly, you can tell it your role. Follow it with “Please describe the causal mechanisms of cancer and what is known and not known” and/or “I am also an experienced medical researcher, although not an oncologist” to signal that you want a deeper, technical answer and not a high-level, plain-language response.
Compare and contrast the responses you get when you prompt with the following:

A. “What causes cancer?”

B. “You are an experienced medical oncologist. What causes cancer? How would you explain this differently in lay language to a patient, and how would you explain this to another doctor who is not an oncologist?”

C. “You are an experienced medical oncologist. Please describe the causal mechanisms of cancer and what is known and not known. I am also an experienced medical researcher, although not an oncologist.”

You’ll likely get different types of answers, with some overlap between A and the first part of answer B, and a tiny bit of overlap between the latter half of answer B and C.

I do the same kind of prompting with technical projects where I want code. Often, I will say “You are an expert data scientist with experience writing code in Python for a Jupyter Notebook” or “You are an AI programming assistant with expertise in building iOS apps using XCode and SwiftUI”. Those will then be followed with a brief description of my project (more on why this is brief below) and the first task I’m giving it.

The same also goes for writing-related tasks; the persona I give it and/or the role I reference for myself makes a sizable difference in getting the quality of the output to match the style and quality I was seeking in a response.

Be specific. Saying “please code that” or “please write that” might work sometimes, but more often than not it will get a less effective output than a more specific prompt would. I am a literal person, so this is something I think about a lot: I’m always parsing and mentally reviewing what people say to me, because my instinct is to take their words literally, and I have to think through the likelihood that those words were intended literally or whether there is context that should be used to filter those words to be less literal. Sometimes, you’ll be thinking about something and start talking to someone about it, and they have no idea what on earth you’re talking about because the last part of your out-loud conversation with them was about a completely different topic!
LLMs are the same as the confused conversational partner who doesn’t know what you’re thinking about. LLMs only know what you’ve recently told them (and they will ‘forget’ what you told them about a project more quickly than humans do). Remember the above tips about brainstorming and making a list of tasks for a project? Providing a description of the task along with the ask (e.g. we are doing X related to the purpose of achieving Y, please code X) will get you output more closely matching what you wanted than saying “please code that”, where the LLM might code something else to achieve Y if you didn’t tell it you wanted to focus on X.

I find this even more necessary with writing related projects. I often find I need to give it the persona “You are an expert medical researcher”, the project “we are writing a research paper for a medical journal”, the task “we need to write the methods section of the paper”, and a clear ask “please review the code and analyses and make an outline of the steps that we have completed in this process, with sufficient detail that we could later write a methods section of a research paper”. A follow up ask is then “please take this list and draft it into the methods section”. That process with all of that specific context gives better results than “write a methods section” or “write the methods” etc.
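The persona, project, task, and ask pattern can be sketched as a tiny Python helper that assembles a prompt from those four pieces and refuses to build one if a piece is missing. This is a hypothetical convenience function, not part of any LLM library; it only builds the text you would paste into the chat, and the example strings are the ones quoted in the paragraph above.

```python
def build_prompt(persona: str, project: str, task: str, ask: str) -> str:
    """Join persona, project, task, and ask (in that order) into one prompt.

    Raises ValueError if any piece is empty, as a reminder to supply all four.
    """
    pieces = {"persona": persona, "project": project, "task": task, "ask": ask}
    missing = [name for name, text in pieces.items() if not text.strip()]
    if missing:
        raise ValueError(f"missing prompt piece(s): {', '.join(missing)}")
    # dicts keep insertion order, so the pieces are joined in the order above
    return " ".join(text.strip() for text in pieces.values())


prompt = build_prompt(
    persona="You are an expert medical researcher.",
    project="We are writing a research paper for a medical journal.",
    task="We need to write the methods section of the paper.",
    ask=(
        "Please review the code and analyses and make an outline of the steps "
        "that we have completed in this process, with sufficient detail that we "
        "could later write a methods section of a research paper."
    ),
)
print(prompt)
```

The follow-up ask (“please take this list and draft it into the methods section”) can then go into the same chat, since the persona, project, and task are already in context.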
Be willing to start over with a new window/chat. Sometimes the LLM can get itself lost in solving a sub-task and lose sight (via lost context window) of the big picture of a project, and you’ll find yourself having to repeat over and over again what you’re asking it to do. Don’t be afraid to cut your losses and start a new chat for a sub-task that you’ve been stuck on. You may be able to eventually come back to the same window as before, or the new window might become your new ‘home’ for the project…or sometimes a third, fourth, or fifth window will.
Try, try again.
I may hold the record for the longest-running bug that I (and the LLM) could. Not. solve. This was so, so annoying. No users apparently noticed it, but I knew about it and it bugged me for months and months. Every few weeks I would go to an old window and also start a new window, describe the problem, paste the code in, and ask for help to solve it. I asked it to identify problems with the code; I asked it to explain the code and unexpected/unintended functionality from it; I asked it what types of general things would be likely to cause that type of bug. It couldn’t find the problem. I couldn’t find the problem. Finally, one day, I did all of the above, but then also started pasting every single file from my project and asking if it was likely to include code that could be related to the problem. By forcing myself to review all my code files with this problem in mind, even though the files weren’t related at all to the file/bug….I finally spotted the problem myself. I pasted the code in, asked if it was possible that it was related to the problem, the LLM said yes, I tried a change and…voila! Bug solved on January 16 after plaguing me since November 8. (It probably existed before then, but I didn’t have the functionality built until November 8, when I realized it was a problem.) I was beating myself up about it and posted to Twitter about finally solving the bug (but very much with the mindset of feeling very stupid about it). Someone replied and said “congrats! sounds like it was a tough one!”. I realized that was a very kind framing and one that I liked, because it was a tough one; and also, I am doing a tough thing that no one else is doing and that I would not have been willing to try without an LLM to support me.

Similarly, just this last week on Tuesday I spent about 3 hours working on a sub-task for a new project. It took 3 hours to do something that on a previous project took me about 40 minutes, so I was hyper aware of the time mismatch and perceiving that 3 hours was a long time to spend on the task. I vented to Scott quite a bit on Tuesday night, and he reminded me that sure it took “3 hours” but I did something in 3 hours that would take 3 years otherwise because no one else would do (or is doing) the project that I’m working on. Then on Wednesday, I spent an hour doing another part of the project and Thursday whipped through another hour and a half of doing huge chunks of work that ended up being highly efficient and much faster than they would have been, in part because the “three hours” it took on Tuesday wasn’t just about the code but about organizing my thinking, scoping the project and research protocol, etc. and doing a huge portion of other work to organize my thinking to be able to effectively prompt the LLM to do the sub-task (that probably did actually take closer to the ~40 minutes, similar to the prior project).

All this to say: LLMs have become pair programmers and collaborators and writers that are helping me achieve tasks and projects that no one else in the world is working on yet. (It reminds me very much of my early work with DIYPS and OpenAPS where we did the work, quietly, and people eventually took notice and paid attention, albeit slower than we wished but years faster than had we not done that work. I’m doing the same thing in a new field/project space now.) Sometimes, the first attempt to delegate a sub-task doesn’t work. It may be because I haven’t organized my thinking enough, and the lack of ideal output shows that I have not prompted effectively yet. Sometimes I can quickly fix the prompt to be effective; but sometimes it highlights that my thinking is not yet clear; my ability to communicate the project/task/big picture is not yet sufficient; and the process of achieving the clarity of thinking and translating to the LLM takes time (e.g. “that took 3 hours when it should have taken 40 minutes”) but ultimately still moves me forward to solving the problem or achieving the tasks and sub-tasks that I wanted to do. Remember what I said at the beginning:

Clear thinking + clear communication of ideas/request = effective prompting => effective code and other outputs

Try it anyway.
I am trying to get out of the habit of saying “I can’t do X”, like “I can’t code/program an iOS app”…because now I can. I’ve in fact built and shipped/launched/made available multiple iOS apps (check out Carb Pilot if you’re interested in macronutrient estimates for any reason; you can customize so you only see the one(s) you care about; or if you have EPI, check out PERT Pilot, which is the world’s first and only app for tracking pancreatic enzyme replacement therapy and has the same AI feature for generating macronutrient estimates to aid in adjusting enzyme dosing for EPI.) I’ve also made really cool, 100% custom-to-me niche apps to serve a personal purpose that save me tons of time and energy. I can do those things, because I tried. I flopped a bunch along the way – it took me several hours to solve a simple iOS programming error related to home screen navigation in my first few apps – but in the process I learned how to do those things and now I can build apps. I’ve coded and developed for OpenAPS and other open source projects, including a tool for data conversion that no one else in the world had built. Yet, my brain still tries to tell me I can’t code/program/etc (and to be fair, humans try to tell me that sometimes, too).

I bring that up to explain what I’m working on, and what I wish others would work on too: trying to address our reflexive thoughts about what we can and can’t do, based on prior knowledge. The world is different now, and tools like LLMs make it possible to learn new things and build new projects that maybe we didn’t have the time or energy to do before (not that we couldn’t). The barrier to entry, to starting and trying, is so much lower than it was even a year ago. It really comes down to a willingness to try and see, which I recognize is hard. I have those “I can’t do X” thought patterns too, but I’m trying to notice when I have them and shift my thinking to “I used to not be able to do X; I wonder if it is possible to work with an LLM to do part of X, or to learn how to do Y so that I could try to do X”.

A recent real example for me is power calculations and sample size estimates for future clinical trials. That’s something I can’t do; it requires a statistician and specialized software and expertise.

Or…does it?

I asked my LLM how power calculations are done. It explained. I asked if it was possible to do one using Python code in a Jupyter notebook. I asked what information would be needed to do so. It walked me through the decisions I needed to make about power and significance, and highlighted the variables I needed to define or collect to put into the calculation. I had generated the data from a previous study, so I had all the pieces (variables) I needed. I asked it to write code for me to run in a Jupyter notebook, and it did. I tweaked the code, input my variables, ran it… and got the result. I had run a power calculation! (Shocked face here.) But then my imposter syndrome kicked in again, so I reached out to a statistician I had previously worked with on a research project. I shared my code and asked whether it was a correct or acceptable approach and whether I was interpreting it correctly. His response? It was correct, and “I couldn’t have done it better myself”.

(I’m still shocked about this).

He also kindly took my variables, put them into the specialized software he uses, and confirmed that its output matched what my code produced. He then pointed out something that taught me a lesson for future projects that might be different (depending on whether the data is or isn’t normally distributed), although it didn’t influence the output of my calculation for this project.

What I learned from this was a) this statistician is amazing (which I already knew from working with him in the past) and kind to support my learning like this; b) I can do pieces of projects that I previously thought were far beyond my expertise; c) the blocker is truly in my head, and the more we break out of or identify the patterns stopping us from trying, the farther we will get.
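For anyone curious what this kind of calculation can look like, here is a minimal Python sketch of a sample size estimate. It is not the code from my project (that code and its variables aren’t shared here), and the function names are hypothetical. It assumes a two-sided comparison of means between two equal-sized groups with roughly normally distributed outcomes, uses the normal approximation rather than the exact t-distribution that specialized software typically uses, and plugs in a hypothetical standardized effect size (Cohen’s d) of 0.5.

```python
import math
from statistics import NormalDist


def sample_size_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate participants needed per group for a two-sided, two-sample
    comparison of means, using the normal approximation:

        n = 2 * ((z_{1 - alpha/2} + z_{power}) / d) ** 2

    where d is the standardized effect size. Assumes roughly normally
    distributed outcomes and equal group sizes (hypothetical helper).
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()  # standard normal distribution (mean 0, sd 1)
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for a two-sided test
    z_beta = z.inv_cdf(power)           # quantile matching the desired power
    n = 2 * ((z_alpha + z_beta) / effect_size) ** 2
    return math.ceil(n)  # always round up to whole participants


def achieved_power(n_per_group: int, effect_size: float, alpha: float = 0.05) -> float:
    """Approximate power for a given per-group sample size (same assumptions)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    # Expected test statistic under the alternative hypothesis
    shift = effect_size * math.sqrt(n_per_group / 2)
    return z.cdf(shift - z_alpha)


if __name__ == "__main__":
    # Hypothetical inputs: medium effect size, 5% significance, 80% power
    n = sample_size_per_group(effect_size=0.5, alpha=0.05, power=0.80)
    print(n)  # 63
    print(round(achieved_power(n, effect_size=0.5), 3))
```

Tools based on the t-distribution usually return a slightly larger number (64 per group for these inputs), which is one reason to compare against established software, as the statistician did. And as he pointed out, if the data aren’t normally distributed, a different approach may be needed; this sketch doesn’t handle that case.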

“Try it anyway” also refers to trying things over time. LLMs improve every few months and often gain capabilities they didn’t have before. Much of my work is done with GPT-4, which handles nuanced, advanced technical tasks way more efficiently than GPT-3.5. That being said, some tasks can absolutely be done with GPT-3.5-level AI. Something you try now and don’t quite figure out could be something you sort out in a few weeks or months (see above about my 3 month bug); it could be easier to do once you advance your thinking; or it could be done more efficiently with the next model of the LLM you’re working with.
Test whether custom instructions help. Be aware, though, that too many instructions can sometimes conflict, and they also take up some of your context window. Plus, if you forget what instructions you gave, you might get seemingly unexpected responses in future chats. (You can always change the custom instructions and/or turn them on and off.)

I’m hoping this gives people the confidence or context to try things with LLMs that they weren’t willing to try before, helps them get in the habit of remembering to try things with LLMs, and helps them get the best possible output for the projects they’re working on.

Remember:

Right-size the task by making a clear ask.
You can use different chat windows for different levels of the same project.
Use a list to help you, the human, keep track of all the pieces that contribute to the bigger picture of the project.
Try giving the LLM a persona for an ask; and test whether you also need to assign yourself a persona or not for a particular type of request.
Be specific; think of the LLM as a conversational partner that can’t read your mind.
Don’t be afraid to start over with a new context window/chat.
Things that were hard a year ago might be easier with an LLM; you should try again.
You can do more, partnering with an LLM, than you can on your own, and likely can do things you didn’t realize were possible for you to do!

Clear thinking + clear communication of ideas/request = effective prompting => effective code and other outputs

—

Have any tips to help others get more effective output from LLMs? I’d love to hear them. Please comment below and share your tips!

Tips for prompting LLMs like ChatGPT, written by Dana M. Lewis and available from DIYPS.org

Accepted, Rejected, and Conflict of Interest in Gastroenterology (And Why This Is A Symptom Of A Bigger Problem)

December 18, 2023 by Dana Lewis

Recently, a new clinical practice update on exocrine pancreatic insufficiency (known as EPI or PEI) was published in the journal Gastroenterology, from the American Gastroenterological Association (AGA). Those of you who’ve read any of my blog posts in the last year know how much I’ve been working to raise awareness of EPI, which is very under-researched and under-treated clinically despite its prevalence in the general population and in key sub-populations such as PWD. So when there was a new clinical practice update and another publication on EPI in general, I was jazzed and set out to read it immediately. Then I frowned. Because, like so many articles about EPI, it’s not *quite* right about many things, and it perpetuates a lot of the existing problems in the literature. So I did what I could: I checked the journal requirements for writing a letter to the editor (LTE) in response to this article, then drafted and submitted an LTE about it. To my delight, on October 17, 2023, I got an email indicating that my LTE was accepted.

You can find my LTE as a pre-print here.

Read on to see why this pre-print version is important and why you should read it, plus what this episode reminds us about what journal articles can and cannot tell us in healthcare.

Here’s an image of my acceptance email. I’ll call out a key part of the email:

A print of the acceptance email I received on October 17, 2023, indicating my letter would be sent to authors of the original articles for a chance to choose to respond (or not). Then my LTE would be published.

Letters to the Editor are sent to the authors of the original articles discussed in the letter so that they might have a chance to respond. Letters are not sent to the original article authors until the window of submission for letters responding to that article is closed (the last day of the issue month in which the article is published). Should the authors choose to respond to your letter, their response will appear alongside your letter in the journal.

Given that timeline, I knew I wouldn’t hear more from the journal until the end of November: the article went online ahead of print in September, meaning it was likely officially published in October, so letters wouldn’t be sent to the authors until the end of October.

And then I did indeed hear back from the journal. On December 4, 2023, I got the following email:

A print of the email I received saying the LTE was now rejected
TLDR: just kidding, the committee – members of which published the article you’re responding to – and the editors have decided not to publish your article.

I was surprised – and confused. The committee members, or at least 3 of them, wrote the article. They should have a chance to decide whether or not to write a response letter, which is standard. But telling the editors not to publish my LTE? That seems odd and in contrast to the initial acceptance email. What was going on?

I decided to write back and ask. “Hi (name redacted), this is very surprising. Could you please provide more detail on the decision making process for rescinding the already accepted LTE?”

The response?

“In terms of this decision, possible commercial affiliations, as well as other judgments of priority and relevance among other submissions, dampened enthusiasm for this particular manuscript. Ultimately, it was not judged to be competitive for acceptance in the journal. “

Huh? I don’t have any commercial affiliations. So I asked again, “Can you clarify what commercial affiliations were perceived? I have none (nor any financial conflict of interest; nor any funding related to my time spent on the article) and I wonder if there was a misunderstanding when reviewing this letter to the editor.”

The response was “There were concerns with the affiliation with OpenAPS; with the use of the term “guidelines,” which are distinct from this Clinical Practice Update; and with the overall focus being more fit for a cystic fibrosis or research audience rather than a GI audience.”

A final email saying the concern with my affiliation of OpenAPS, which is not a commercial organization nor related to the field of gastroenterology and EPI

Aha, I thought, there WAS a misunderstanding. (And the last concern makes no sense in the context of my LTE: its point is that most research and clinical literature has too narrow a focus, with cystic fibrosis as one example, and that a broad gastroenterology audience should therefore pay attention to EPI.)

I wrote back and explained how I, as a patient and independent researcher, struggle to submit articles through manuscript systems without a Ringgold-verified organization. (You can also listen to me describe the problem in a podcast, here, and I also talked about it in a peer-reviewed journal article about citizen science and health-related journal publishing, here.) So I use OpenAPS as an “affiliation” even though OpenAPS isn’t an organization, let alone a commercial organization. I have no financial conflict of interest related to OpenAPS, and zero financial conflicts of interest or commercial or any other type of funding in gastroenterology at all, related to EPI or not. I actually go to great lengths to disclose even perceived conflicts of interest, including non-financial ones, as you can see in my disclosure statement, publicly available from the New England Journal of Medicine here, on our CREATE trial article (scroll to Supplemental Information and click on Disclosure Forms). There, I state that I have no financial conflicts of interest but openly acknowledge that I created the algorithm used in the study. Even so, there’s no commercial or financial conflict of interest.

A screenshot from the publicly available disclosure form on NEJM's site, where I am so careful to indicate possible conflicts of interest that are not commercial or financial, such as the fact that I developed the algorithm that was used in that study. Again, that's a diabetes study and a diabetes example, the paper we are discussing here is on exocrine pancreatic insufficiency (EPI) and gastroenterology, which is unrelated. I have no COI in gastroenterology.

I sent this information back to the journal, explaining this, and asking if the editors would reconsider the situation, given that the authors (committee members?) have misconstrued my affiliation, and given that the LTE was originally accepted.

Sadly, there was no change. They are still declining to publish this article. And there is no change in my level of disappointment.

Interestingly, here is the conflict of interest statement from the article my LTE replied to, by the authors (committee members?) who possibly raised a flag about my supposed commercial affiliation (which does not exist):

The authors disclose the following: David C. Whitcomb: consultant for AbbVie, Nestlé, Regeneron; cofounder, consultant, board member, chief scientific officer, and equity holder for Ariel Precision Medicine. Anna M. Buchner: consultant for Olympus Corporation of America. Chris E. Forsmark: grant support from AbbVie; consultant for Nestlé; chair, National Pancreas Foundation Board of Directors.

As a side note, one of the companies providing consulting and/or grant funding to two of the three authors is the biggest manufacturer of pancreatic enzyme replacement therapy (PERT), which is the treatment for EPI. I don’t think this conflict of interest makes these clinicians ineligible to write their article, nor do I think commercial interests should preclude anyone from publishing; in my case, it’s irrelevant anyway, because I have none. But given the authors’ own stated COIs, it does seem weird for my (nonexistent) COI to be a reason to reject an LTE, of all things.

Here’s the point, though.

It’s not really about the fact that I had an accepted article rejected (although that is weird, to say the least…).

The point is that the presence of information in medical and research journals does not mean that it is correct. (See this post describing the incorrect facts presented about prevalence of EPI, for example.)

And similarly, the lack of presence of material in medical and research journals does not mean that something is not true or is not fact!

There is a lot of gatekeeping in scientific and medical research. You can see it illustrated here, both in this accepted-rejected dance over a supposed COI (when there are zero commercial ties, let alone a COI) and in the allusion to the “priority” of what gets published.

I see this often.

There is good research that goes unpublished because editors decide not to prioritize it (aka do not allow it to get published). There are many such factors in play affecting what gets published.

There are also systemic barriers.

Many journals require fees (called article processing charges or “APC”s) if your article is accepted for publication. If you don’t have funding, that means you can’t publish there unless you want to pay $2500 (or more) out of pocket. Some journals even have submission fees of hundreds of dollars, just to submit! (At least APCs are usually only levied if your article is accepted, but you won’t submit to these journals if you know you can’t pay the APC). That means the few journals in your field that don’t require APCs or fees are harder to get published in, because many more articles are submitted (thus, influencing the “prioritization” problem at the editor level) to the “free” journals.
Journals often require, as previously described, your organization to be part of a verified list (maintained by a third party org) in order for your article to be moved through the queue once submitted. Instead of n/a, I started listing “OpenAPS” as my affiliation and proactively writing to admin teams to let them know that my affiliation won’t be Ringgold-verified, explaining that it’s not an org/I’m not at any institution, and then my article can (usually) get moved through the queue ok. But as I wrote in this peer-reviewed article with a lot of other details about barriers to publishing citizen science and other patient-driven work, it’s one of many barriers involved in the publication process. It’s a little hard, every journal and submission system is a little different, and it’s a lot harder for us than it is for people who have staff/support to help them get articles published in journals.

I’ve seen grant funders say no to funding researchers who haven’t published yet, while editors, in turn, won’t prioritize publishing their work on a topic in a field where they haven’t been funded yet or aren’t well known. Or they aren’t at a prestigious organization. Or they don’t have the “right” credentials. (Ahem, ahem, ahem.) It can be a vicious cycle even for traditional (aka day job) researchers and clinicians. Now imagine that for people who are not inside those systems of academia or medical organizations.

Yet think about where much of our knowledge is captured, created, translated, and studied: it’s not solely in these organizations.

Thus, the mismatch. What’s in journals isn’t always right, and the process of peer review can’t catch everything. It’s not a perfect system. But what I want you to take away, if you didn’t already have this context, is an understanding that what’s NOT in a journal is not because the information is not fact or does not exist. It may have not been studied yet; or it may have been blocked from publication by the systemic forces in play.

As I said at the end of my LTE:

“It is also critical to update the knowledge base of EPI beyond the sub-populations of cystic fibrosis and chronic pancreatitis that are currently over-represented in the EPI-related literature. Building upon this updated research base will enable future guidelines, including those like the AGA Clinical Practice Update on EPI, to be clearer, more evidence-based, and truly patient-centric ensuring that every individual living with exocrine pancreatic insufficiency receives optimal care. “

PS – want to read my LTE that was accepted then rejected, meaning it won’t be present in the journal? Here it is on a preprint server with a DOI, which means it’s still easily citable! Here’s an example citation:

Lewis, D. Navigating Ambiguities in Exocrine Pancreatic Insufficiency. OSF Preprints. 2023. DOI: 10.31219/osf.io/xcnf6

New Survey For Everyone (Including You – Yes, You!) To Help Us Learn More About Exocrine Pancreatic Insufficiency

November 9, 2023 by Dana Lewis

If you’ve ever wanted to help with some of my research, this is for you. Yes, you! I am asking people in the general public to take a survey (https://bit.ly/GI-Symptom-Survey-All) and share their experiences.

Why?

Many people have stomach or digestion problems occasionally. For some people, these symptoms happen more often. In some cases, the symptoms are related to exocrine pancreatic insufficiency (known as EPI or PEI). But to date, there have been few studies looking at the frequency of symptoms – or the level of their self-rated severity – in people with EPI or what symptoms may distinguish EPI from other GI-related conditions.

That’s where this survey comes in! We want to compare the experiences of people with EPI to people without EPI (like you!).

Will you help by taking this survey?

Your anonymous participation in this survey will help us understand the unique experiences individuals have with GI symptoms, including those with conditions like exocrine pancreatic insufficiency (EPI). In particular, data contributed by people without EPI will help us understand how the EPI experience is different (or not).

A note on privacy:

The survey is completely anonymous; no identifying information will be collected.
You can stop the survey at any point.

Who designed this survey:

Dana Lewis, an independent researcher, developed the survey and will manage the survey data. This survey design and the choice to run this survey is not influenced by funding from or affiliations with any organizations.

What happens to the data collected in this survey:

The aggregated data will be analyzed for patterns and shared through blog posts and academic publications. No individual data will be shared. This will help fill some of the documented gaps in the EPI-related medical knowledge and may influence the design of targeted research studies in the future.

Have Questions?
Feel free to reach out to Dana+GISymptomSurvey@OpenAPS.org.

How else can you help?
Remember, ANYONE can take this survey. So, feel free to share the link with your family and friends – they can take it, too!

Here’s a link to the survey that you can share (after taking it yourself, of course!): https://bit.ly/GI-Symptom-Survey-All

You (yes you!) can help us learn about exocrine pancreatic insufficiency by taking the survey linked on this page.

DIYPS.org

#WeAreNotWaiting to make the world a better place
