The prompt matters when using Large Language Models (LLMs) and AI in healthcare

I see more and more research papers coming out these days about different uses of large language models (LLMs, a type of AI) in healthcare. There are papers evaluating it for supporting clinicians in decision-making, aiding in note-taking and improving clinical documentation, and enhancing patient education. But I see a wide-sweeping trend in the titles and conclusions of these papers, exacerbated by media headlines, making sweeping claims about the performance of one model versus another. I challenge everyone to pause and consider a critical fact that is less obvious: the prompt matters just as much as the model.

As an example of this, I will link to a recent pre-print of a research article I worked on with Liz Salmi (pre-print here).

Liz nerd-sniped me about an idea of a study to have a patient and a neuro-oncologist evaluate LLM responses related to patient-generated queries about a chart note (or visit note or open note or clinical note, however you want to call it). I say nerd-sniped because I got very interested in designing the methods of the study, including making sure we used the APIs to model these ‘chat’ sessions so that the prompts were not influenced by custom instructions, ‘memory’ features within the account or chat sessions, etc. I also wanted to test something I’ve observed anecdotally from personal LLM use across other topics, which is that with 2024-era models the prompt matters a lot for what type of output you get. So that’s the study we designed, and wrote with Jennifer Clarke, Zhiyong Dong, Rudy Fischmann, Emily McIntosh, Chethan Sarabu, and Catherine (Cait) DesRoches, and I encourage you to check out the pre-print and enjoy the methods section, which is critical for understanding the point I’m trying to make here. 

In this study, the data showed that when LLM outputs were evaluated for a healthcare task, the results varied significantly depending not just on the model but also on how the task was presented (the prompt). Specifically, persona-based prompts—designed to reflect the perspectives of different end users like clinicians and patients—yielded better results, as independently graded by both an oncologist and a patient.

The Myth of the “Best Model for the Job”

Many research papers conclude with simplified takeaways: Model A is better than Model B for healthcare tasks. While performance benchmarking is important, this approach often oversimplifies reality. Healthcare tasks are rarely monolithic. There’s a difference between summarizing patient education materials, drafting clinical notes, or assisting with complex differential diagnosis tasks.

But even within a single task, the way you frame the prompt makes a profound difference.

Consider these three prompts for the same task:

  • “Explain the treatment options for early-stage breast cancer.”
  • “You’re an oncologist. Explain the treatment options for early-stage breast cancer.”
  • “You’re an oncologist. Explain the treatment options for early-stage breast cancer as you would to a newly diagnosed patient with no medical background.”

The second and third prompt likely result in a more accessible and tailored response. If a study only tests general prompts (e.g. prompt one), it may fail to capture how much more effective an LLM can be with task-specific guidance.

Why Prompting Matters in Healthcare Tasks

Prompting shapes how the model interprets the task and generates its output. Here’s why it matters:

  • Precision and Clarity: A vague prompt may yield vague results. A precise prompt clarifies the goal and the speaker (e.g. in prompt 2), and also often the audience (e.g. in prompt 3).
  • Task Alignment: Complex medical topics often require different approaches depending on the user—whether it’s a clinician, a patient, or a researcher.
  • Bias and Quality Control: Poorly constructed prompts can inadvertently introduce biases

Selecting a Model for a Task? Test Multiple Prompts

When evaluating LLMs for healthcare tasks—or applying insights from a research paper—consider these principles:

  1. Prompt Variation Matters: If an LLM fails on a task, it may not be the model’s fault. Try adjusting your prompts before concluding the model is ineffective, and avoid broad sweeping claims about a field or topic that aren’t supported by the test you are running.
  2. Multiple Dimensions of Performance: Look beyond binary “good” vs. “bad” evaluations. Consider dimensions like readability, clinical accuracy, and alignment with user needs, as an example when thinking about performance in healthcare. In our paper, we saw some cases where a patient and provider overlapped in ratings, and other places where the ratings were different.
  3. Reproducibility and Transparency: If a study doesn’t disclose how prompts were designed or varied, its conclusions may lack context. Reproducibility in AI studies depends not just on the model, but on the interaction between the task, model, and prompt design. You should be looking for these kinds of details when reading or peer reviewing papers. Take results and conclusions with a grain of salt if these methods are not detailed in the paper.
  4. Involve Stakeholders in Evaluation: As shown in the preprint mentioned earlier, involving both clinical experts and patients in evaluating LLM outputs adds critical perspectives often missing in standard evaluations, especially as we evolve to focus research on supporting patient needs and not simply focusing on clinician and healthcare system usage of AI.

What This Means for Healthcare Providers, Researchers, and Patients

  • For healthcare providers, understand that the way you frame a question can improve the usefulness of AI tools in practice. A carefully constructed prompt, adding a persona or requesting information for a specific audience, can change the output.
  • For researchers, especially those developing or evaluating AI models, it’s essential to test prompts across different task types and end-user needs. Transparent reporting on prompt strategies strengthens the reliability of your findings.
  • For patients, recognizing that AI-generated health information is shaped by both the model and the prompt. This can support critical thinking when interpreting AI-driven health advice. Remember that LLMs can be biased, but so too can be humans in healthcare. The same approach for assessing bias and evaluating experiences in healthcare should be used for LLM output as well as human output. Everyone (humans) and everything (LLMs) are capable of bias or errors in healthcare.

Prompts matter, so consider model type as well as the prompt as a factor in assessing LLMs in healthcare. Blog by Dana M. LewisTLDR: Instead of asking “Which model is best?”, a better question might be:

“How do we design and evaluate prompts that lead to the most reliable, useful results for this specific task and audience?”

I’ve observed, and this study adds evidence, that prompt interaction with the model matters.

Assessing the Impact of Diabetes on Gastrointestinal Symptom Severity in Exocrine Pancreatic Insufficiency (EPI/PEI): A Diabetes Subgroup Analysis of EPI/PEI-SS Scores – Poster at #ADA2024

Last year, I recognized that there was a need to improve the documentation of symptoms of exocrine pancreatic insufficiency (known as EPI or PEI). There is no standardized way to discuss symptoms with doctors, and this influences whether or not people get the right amount of enzymes (pancreatic enzyme replacement therapy; PERT) to treat EPI and eliminate symptoms completely. It can be done, but like insulin, it requires matching PERT to the amount of food you’re consuming. I also began observing that EPI is underscreened and underdiagnosed, whether that’s in the general population or in people with diabetes. I thought that if we could create a list of common EPI symptoms and a standardized scale to rate them, this might help address some of these challenges.

I developed this scale to address these needs. It is called the “Exocrine Pancreatic Insufficiency Symptom Score” or “EPI/PEI-SS” for short.

I had a handful of people with and without EPI help me test the scale last year, and then I opened up a survey to the entire world and asked people to share their experiences with GI-related symptoms. I specifically sought people with EPI diagnoses as well as people who don’t have EPI, so that we could compare the symptom burden and experiences to people without EPI. (Thank you to everyone who contributed their data to this survey!)

After the first three weeks, I started analyzing the first set of data. While doing that, I realized that (both because of my network of people with diabetes and because I also posted in at least one diabetes-specific group), I had a large sub-group of people with diabetes who had contributed to the survey, and I was able to do a full subgroup analyses to assess whether having diabetes seemed to correlate with a different symptom experience of EPI or not.

Here’s what I found, and what my poster is about (you can view my poster as a PDF here), presented at ADA Scientific Sessions 2024 (#ADA2024):

1985-LB at #ADA2024, “Assessing the Impact of Diabetes on Gastrointestinal Symptom Severity in Exocrine Pancreatic Insufficiency (EPI/PEI): A Diabetes Subgroup Analysis of EPI/PEI-SS Scores”

Exocrine pancreatic insufficiency has a high symptom burden and is present in as many as 3 of 10 people with diabetes. (See my systematic review from last year here). To help improve conversations about symptoms of EPI, which can then be used to improve screening, diagnosis, and treatment success with EPI, I created the Exocrine Pancreatic Insufficiency Symptom Score (EPI/PEI-SS), which consists of 15 individual symptoms that people separately rate the frequency (0-5) and severity (0-3) for which they experience those symptoms, if at all. The frequency and severity get multiplied for an individual symptom score (0-15 possible) and these get added up for a total EPI/PEI-SS score (0-225 possible, because 15 symptoms times 15 possible points per symptom is 225).

I conducted a real-world study of the EPI/PEI-SS in the general population to assess the gastrointestinal symptom burden in individuals with (n=155) and without (n=169) EPI. Because there was a large cohort of PWD within these groups, I separately analyzed them to evaluate whether diabetes contributes to a difference in EPI/PEI-SS score.

Methods:

I calculated EPI/PEI-SS scores for all survey participants. Previously, I had analyzed the differences of people with and without EPI overall. For this sub-analysis, I analyzed and compared between PWD (n=118 total), with EPI (T1D: n=14; T2D: n=20) or without EPI (T1D: n=78; T2D: n=6), and people without diabetes (n=206 total) with and without EPI.

I also looked at sub-groups within the non-EPI cohorts and broke them into two groups to see whether other GI conditions contributed to a higher EPI/PEI-SS score and whether we could distinguish EPI from other GI and non-GI conditions.

Results:

People with EPI have a much higher symptom burden than people without EPI. This can be assessed by looking at the statistically significant higher mean EPI/PEI-SS score as well as the average number of symptoms; the average severity score of individual symptoms; and the average frequency score of individual symptoms.

This remains true irrespective of diabetes. In other words, diabetes does not appear to influence any of these metrics.

People with diabetes with EPI had statistically significant higher mean EPI/PEI-SS scores (102.62 out of 225, SD: 52.46) than did people with diabetes without EPI (33.64, SD: 30.38), irrespective of presence of other GI conditions (all group comparisons p<0.001). As you can see below, that is the same pattern we see in people without diabetes. And the stats confirm what you can see: there is no significant difference overall or in any of the subgroups between people with and without diabetes.

Box plot showing EPI/PEI-SS scores for people with and without diabetes, and with and without EPI or other GI conditions. The scores are higher in people with EPI regardless of whether they have diabetes. The plot makes it clear that the scores are distinct between the groups with and without EPI, even when the people without EPI have other GI conditions. This suggests the EPI/PEI-SS can be useful in distinguishing between EPI and other conditions that may cause GI symptoms, and that the EPI/PEI-SS could be a useful screening tool to help identify people who need screening for EPI.

T1D and T2D subgroups were similar
(but because the T2D cohort is small, I did not break them out separately in this graph).

For example, people with diabetes with EPI had an average of 12.59 (out of 15) symptoms, with an average frequency score of 3.06 and average severity score of 1.79, and an average individual symptom score of 5.48. This is a pretty clear contrast to people with diabetes without EPI who had had an average of 7.36 symptoms, with an average frequency score of 1.4 and average severity score of 0.8, and an average individual symptom score of 1.12. All comparisons are statistically significant (p<0.001).

A table comparing the average number of symptoms, frequency, severity, and individual symptom scores between people with diabetes with and without exocrine pancreatic insufficiency (EPI). People with EPI have more symptoms and higher frequency and severity than without EPI: regardless of diabetes.

Conclusion 

  • EPI has a high symptom burden, irrespective of diabetes.
  • High scores using the EPI/PEI-SS among people with diabetes can distinguish between EPI and other GI conditions.
  • The EPI/PEI-SS should be further studied as a possible screening method for EPI and assessed as a tool to aid people with EPI in tracking changes to EPI symptoms over time based on PERT titration.

What does this mean if you are a healthcare provider? What actionable information does this give you?

If you’re a healthcare provider, you should be aware that people with diabetes may be more likely to have EPI – rather than celiac or gastroparesis (source) – if they mention having GI symptoms. This means you should incorporate fecal elastase screening into your care plans to help further evaluate GI-related symptoms.

If you want to further improve your pre-test probability of the elastase testing, you can use the EPI/PEI-SS with your patients to assess the severity and frequency of their GI-related symptoms. I will explain the cutoff and AUC numbers we calculated, but first understand the caveat that these were calculated in the initial real-world study that included people with EPI who are already treating with PERT; thus these numbers might change a little when we repeat this study and evaluate it in people with untreated EPI. (However, I actually predict the mean score to go up in an undiagnosed population, because scores should go down with treatment.) But that different population study may change these exact cutoff and sensitivity specificity numbers, which is why I’m giving this caveat. That being said: the AUC was 0.85 which means a higher EPI/PEI-SS is pretty good for differentiating between EPI and not having EPI. (In the diabetes sub-population specifically, I calculated a suggested cutoff of 59 (out of 225) with a sensitivity of 0.81 and specificity of 0.75. This means we estimate that if people are bringing up GI symptoms to you and you have them take the EPI/PEI-SS and their score is greater than or equal to 59, you would expect that out of 100 people that 81 with EPI would be identified (and 75 of 100 people without EPI would also correctly be identified via scores lower than 59). That doesn’t mean that people with EPI can’t have a lower score; or that people with a higher score do have EPI; but it does mean that the chances of having fecal elastase <=200 ug/g is a lot more likely in those with higher EPI/PEI-SS scores.

In addition to the cutoff score, there is a notable difference in people with diabetes and EPI compared to people with diabetes without EPI in their top individual symptom scores (representing symptom burden based on frequency and severity). For example, the top 3 symptoms of those with EPI and diabetes include avoiding certain food/groups; urgent bowel movements; and avoiding eating large meals. People without EPI and diabetes also score “Avoid certain food/groups” as their top score, but the score is markedly different: the mean score of 8.94 for people with EPI as compared to 3.49 for people without EPI. In fact, the mean score on the lowest individual symptom is higher for people with EPI than the highest individual symptom score for people without EPI.

QR code for EPI/PEI-SS - takes you to https://bit.ly/EPI-PEI-SS-WebHow do you have people take the EPI/PEI-SS? You can pull this link up (https://bit.ly/EPI-PEI-SS-Web), give this link to them and ask them to take it on their phone, or save this QR code and give it to them to take later. The link (and the QR code) go to a free web-based version of the EPI/PEI-SS that will calculate the total EPI/PEI-SS score, and you can use it for shared decision making processes about whether this person would benefit from a fecal elastase test or other follow up screening for EPI. Note that the EPI/PEI-SS does not collect any identifiable information and is fully anonymous.

(Bonus: people who use this tool can opt to contribute their anonymized symptom and score data for an ongoing observational study.)

If you have feedback about whether the EPI/PEI-SS was helpful – or not – in your care of people with diabetes; or if you want to discuss collaborating on some prospective studies to evaluate EPI/PEI-SS in comparison to fecal elastase screening, please reach out anytime to Dana@OpenAPS.org

What does this mean if you are a patient (person with diabetes)? What actionable information does this give you?

If you don’t have GI symptoms that bother you, you don’t necessarily need to take action. (Just put a note in your brain that EPI is more likely than celiac or gastroparesis in people with diabetes so if you or a friend with diabetes have GI symptoms in the future, you can make sure you are assessed for EPI.) You can also choose to take the EPI/PEI-SS regardless, and also opt in to donate your data.

If you do have GI symptoms that are annoying, you may want to take the EPI/PEI-SS to help you evaluate the frequency and severity of your GI symptoms. You can take it for free and anonymously – no identifiable information is needed to access the tool. It will generate the EPI/PEI-SS score for you.

Based on the score, you may want to ask your doctor (which could be the doctor that treats your diabetes, or a primary/general care provider, or a gastroenterologist – whoever you seek routine care from or have an appointment from next) about your symptoms; share the EPI/PEI-SS score; and explain that you think you may warrant screening for EPI.

(You can also choose to contribute your anonymous symptom data to a research dataset, to help us improve the EPI/PEI-SS and help us figure out how to help improve screening and diagnosis and treatment of EPI. Remember, this tool will not ask you for any identifying information. This is 100% optional and you can opt out of doing so if you do not prefer to contribute to research, while still using the tool.)

You can see a pre-print version of the diabetes sub-study here or pre-print of the general population data here.

If you’re looking for more personal experiences about living with EPI, check out DIYPS.org/EPI, and also for people with EPI looking to improve their dosing with pancreatic enzyme replacement therapy – you may want to check out PERT Pilot (a free iOS app to record enzyme dosing).

Researchers & clinicians, if you’re interested in collaborating on studies in EPI (in diabetes, or more broadly on EPI), whether specifically on EPI/PEI-SS or broader EPI topics, please reach out! My email is Dana@OpenAPS.org

Pain and translation and using AI to improve healthcare at an individual level

I think differently from most people. Sometimes, this is a strength; and sometimes this is a challenge. This is noticeable when I approach healthcare encounters in particular: the way I perceive signals from my body is different from a typical person. I didn’t know this for the longest time, but it’s something I have been becoming more aware of over the years.

The most noticeable incident that brought me to this realization involved when I pitched head first off a mountain trail in New Zealand over five years ago. I remember yelling – in flight – help, I broke my ankle, help. When I had arrested my fall, clung on, and then the human daisy chain was pulling me back up onto the trail, I yelped and stopped because I could not use my right ankle to help me climb up the trail. I had to reposition my knee to help move me up. When we got up to the trail and had me sitting on a rock, resting, I felt a wave of nausea crest over me. People suggested that it was dehydration and I should drink. I didn’t feel dehydrated, but ok. Then because I was able to gently rest my foot on the ground at a normal perpendicular angle, the trail guides hypothesized that it was not broken, just sprained. It wasn’t swollen enough to look like a fracture, either. I felt like it hurt really bad, worse than I’d ever hurt an ankle before and it didn’t feel like a sprain, but I had never broken a bone before so maybe it was the trauma of the incident contributing to how I was feeling. We taped it and I tried walking. Nope. Too-strong pain. We made a new goal of having me use poles as crutches to get me to a nearby stream a half mile a way, to try to ice my ankle. Nope, could not use poles as crutches, even partial weight bearing was undoable. I ended up doing a mix of hopping, holding on to Scott and one of the guides. That got exhausting on my other leg pretty quickly, so I also got down on all fours (with my right knee on the ground but lifting my foot and ankle in the air behind me) to crawl some. Eventually, we realized I wasn’t going to be able to make it to the stream and the trail guides decided to call for a helicopter evacuation. The medics, too, when they arrived via helicopter thought it likely wasn’t broken. I got flown to an ER and taken to X-Ray. When the technician came out, I asked her if she saw anything obvious and whether it looked broken or not. She laughed and said oh yes, there’s a break. When the ER doc came in to talk to me he said “you must have a really high pain tolerance” and I said “oh really? So it’s definitely broken?” and he looked at me like I was crazy, saying “it’s broken in 3 different places”. (And then he gave me extra pain meds before setting my ankle and putting the cast on to compensate for the fact that I have high pain tolerance and/or don’t communicate pain levels in quite the typical way.)

A week later, when I was trying not to fall on my broken ankle and broke my toe, I knew instantly that I had broken my toe, both by the pain and the nausea that followed. Years later when I smashed another toe on another chair, I again knew that my toe was broken because of the pain + following wave of nausea. Nausea, for me, is apparently a response to very high level pain. And this is something I’ve carried forward to help me identify and communicate when my pain levels are significant, because otherwise my pain tolerance is such that I don’t feel like I’m taken seriously because my pain scale is so different from other people’s pain scales.

Flash forward to the last few weeks. I have an autoimmune disease causing issues with multiple areas of my body. I have some progressive slight muscle weakness that began to concern me, especially as it spread to multiple limbs and areas of my body. This was followed with pain in different parts of my spine which has escalated. Last weekend, riding in the car, I started to get nauseous from the pain and had to take anti-nausea medicine (which thankfully helped) as well as pain medicine (OTC, and thankfully it also helped lower it down to manageable levels). This has happened several other times.

Some of the symptoms are concerning to my healthcare provider and she agreed I should probably have a MRI and a consult from neurology. Sadly, the first available new patient appointment with the neurologist I was assigned to was in late September. Gulp. I was admittedly nervous about my symptom progression, my pain levels (intermittent as they are), and how bad things might get if we are not able to take any action between now and September. I also, admittedly, was not quite sure how I would cope with the level of pain I have been experiencing at those peak moments that cause nausea.

I had last spoken to my provider a week prior, before the spine pain started. I reached out to give her an update, confirm that my specialist appointment was not until September, and express my concern about the progression and timeline. She too was concerned and I ended up going in for imaging sooner.

Over the last week, because I’ve been having these progressive symptoms, I used Katie McCurdy’s free templates from Pictal Health to help visualize and show the progression of symptoms over time. I wasn’t planning on sending my visuals to my doctor, but it helped me concretely articulate my symptoms and confirm that I was including everything that I thought was meaningful for my healthcare providers to know. I also shared them with Scott to confirm he didn’t think I had missed anything. The icons in some cases were helpful but in other cases didn’t quite match how I was experiencing pain and I modified them somewhat to better match how I saw the pain I was experiencing.

(PS – check out Katie’s templates here, you can make a copy in Google Drive and use them yourself!)

As I spoke with the nurse who was recording my information at intake for imaging, she asked me to characterize the pain. I did and explained that it was probably usually a 7/10 then but periodically gets stronger to the point of causing nausea, which for me is a broken bone pain-level response. She asked me to characterize the pain – was it burning, tingling…? None of the words she said matched how it feels. It’s strong pain; it sometimes gets worse. But it’s not any of the words she mentioned.

When the nurse asked if it was “sharp”, Scott spoke up and explained the icon that I had used on my visual, saying maybe it was “sharp” pain. I thought about it and agreed that it was probably the closest word (at least, it wasn’t a hard no like the words burning, tingling, etc. were), and the nurse wrote it down. That became the word I was able to use as the closest approximation to how the pain felt, but again with the emphasis of it periodically reaching nausea-inducing levels equivalent to broken bone pain, because I felt saying “sharp” pain alone did not characterize it fully.

This, then, is one of the areas where I feel that artificial intelligence (AI) gives me a huge helping hand. I often will start working with an LLM (a large language model) and describing symptoms. Sometimes I give it a persona to respond as (different healthcare provider roles); sometimes I clarify my role as a patient or sometimes as a similar provider expert role. I use different words and phrases in different questions and follow ups; I then study the language it uses in response.

If you’re not familiar with LLMs, you should know it is not human intelligence; there is no brain that “knows things”. It’s not an encyclopedia. It’s a tool that’s been trained on a bajillion words, and it learns patterns of words as a result, and records “weights” that are basically cues about how those patterns of words relate to each other. When you ask it a question, it’s basically autocompleting the next word based on the likelihood of it being the next word in a similar pattern. It can therefore be wildly wrong; it can also still be wildly useful in a lot of ways, including this context.

What I often do in these situations is not looking for factual information. Again, it’s not an encyclopedia. But I myself am observing the LLM in using a pattern of words so that I am in turn building my own set of “weights” – meaning, building an understanding of the patterns of words it uses – to figure out a general outline of what is commonly known by doctors and medical literature; the common terminology that is being used likely by doctors to intake and output recommendations; and basically build a list of things that do and do not match my scenario or symptoms or words, or whatever it is I am seeking to learn about.

I can then learn (from the LLM as well as in person clinical encounters) that doctors and other providers typically ask about burning, tingling, etc and can make it clear that none of those words match at all. I can then accept from them (or Scott, or use a word I learned from an LLM) an alternative suggestion where I’m not quite sure if it’s a perfect match, but it’s not absolutely wrong and therefore is ok to use to describe somewhat of the sensation I am experiencing.

The LLM and AI, basically, have become a translator for me. Again, notice that I’m not asking it to describe my pain for me; it would make up words based on patterns that have nothing to do with me. But when I observe the words it uses I can then use my own experience to rule things in/out and decide what best fits and whether and when to use any of those, if they are appropriate.

Often, I can do this in advance of a live healthcare encounter. And that’s really helpful because it makes me a better historian (to use clinical terms, meaning I’m able to report the symptoms and chronology and characterization more succinctly without them having to play 20 questions to draw it out of me); and it saves me and the clinicians time for being able to move on to other things.

At this imaging appointment, this was incredibly helpful. I had the necessary imaging and had the results at my fingertips and was able to begin exploring and discussing the raw data with my LLM. When I then spoke with the clinician, I was able to better characterize my symptoms in context of the imaging results and ask questions that I felt were more aligned with what I was experiencing, and it was useful for a more efficient but effective conversation with the clinician about what our working hypothesis was; what next short-term and long-term pathways looked like; etc.

This is often how I use LLMs overall. If you ask an LLM if it knows who Dana Lewis is, it “does” know. It’ll tell you things about me that are mostly correct. If you ask it to write a bio about me, it will solidly make up ⅓ of it that is fully inaccurate. Again, remember it is not an encyclopedia and does not “know things”. When you remember that the LLM is autocompleting words based on the likelihood that they match the previous words – and think about how much information is on the internet and how many weights (patterns of words) it’s been able to build about a topic – you can then get a better spidey-sense about when things are slightly more or less accurate at a general level. I have actually used part of a LLM-written bio, but not by asking it to write a bio. That doesn’t work because of made up facts. I have instead asked it to describe my work, and it does a pretty decent job. This is due to the number of articles I have written and authored; the number of articles describing my work; and the number of bios I’ve actually written and posted online for conferences and such. So it has a lot of “weights” probably tied to the types of things I work on, and having it describe the type of work I do or am known for gets pretty accurate results, because it’s writing in a general high level without enough detail to get anything “wrong” like a fact about an award, etc.

This is how I recommend others use LLMs, too, especially those of us as patients or working in healthcare. LLMs pattern match on words in their training; and they output likely patterns of words. We in turn as humans can observe and learn from the patterns, while recognizing these are PATTERNS of connected words that can in fact be wrong. Systemic bias is baked into human behavior and medical literature, and this then has been pattern-matched by the LLM. (Note I didn’t say “learned”; but they’ve created weights based on the patterns they observe over and over again). You can’t necessarily course-correct the LLM (it’ll pretend to apologize and maybe for a short while adjust it’s word patterns but in a new chat it’s prone to make the same mistakes because the training has not been updated based on your feedback, so it reverts to using the ‘weights’ (patterns) it was trained on); instead, we need to create more of the correct/right information and have it voluminously available for LLMs to train on in the future. At an individual level then, we can let go of the obvious not-right things it’s saying and focus on what we can benefit from in the patterns of words it gives us.

And for people like me, with a high (or different type of) pain tolerance and a different vocabulary for what my body is feeling like, this has become a critical tool in my toolbox for optimizing my healthcare encounters. Do I have to do this to get adequate care? No. But I’m an optimizer, and I want to give the best inputs to the healthcare system (providers and my medical records) in order to increase my chances of getting the best possible outputs from the healthcare system to help me maintain and improve and save my health when these things are needed.

TLDR: LLMs can be powerful tools in the hands of patients, including for real-time or ahead-of-time translation and creating shared, understandable language for improving communication between patients and providers. Just as you shouldn’t tell a patient not to use Dr. Google, you should similarly avoid falling into the trap of telling a patient not to use LLMs because they’re “wrong”. Being wrong in some cases and some ways does not mean LLMs are useless or should not be used by patients. Each of these tools has limitations but a lot of upside and benefits; restricting patients or trying to limit use of tools is like limiting the use of other accessibility tools. I spotted a quote from Dr. Wes Ely that is relevant: “Maleficence can be created with beneficent intent”. In simple words, he is pointing out that harm can happen even with good intent.

Don’t do harm by restricting or recommending avoiding tools like LLMs.

Effective Pair Programming and Coding and Prompt Engineering and Writing with LLMs like ChatGPT and other AI tools

I’ve been puzzled when I see people online say that LLM’s “don’t write good code”. In my experience, they do. But given that most of these LLMs are used in chatbot mode – meaning you chat and give it instructions to generate the code – that might be where the disconnect lies. To get good code, you need effective prompting and to do so, you need clear thinking and ideas on what you are trying to achieve and how.

My recipe and understanding is:

Clear thinking + clear communication of ideas/request = effective prompting => effective code and other outputs

It also involves understanding what these systems can and can’t do. For example, as I’ve written about before, they can’t “know” things (although they can increasingly look things up) and they can’t do “mental” math. But, they can generally repeat patterns of words to help you see what is known about a topic and they can write code that you can execute (or it can execute, depending on settings) to solve a math problem.

What the system does well is help code small chunks, walk you through processes to link these sections of code up, and help you implement them (if you ask for it). The smaller the task (ask), the more effective it is. Or also – the easier it is for you to see when it completes the task and when it hasn’t been able to finish due to limitations like response length limits, information falling out of the context window (what it knows that you’ve told it); unclear prompting; and/or because you’re asking it to do things for which it doesn’t have expertise. Some of the last part – lack of expertise – can be improved with specific prompting techniques –  and that’s also true for right-sizing the task it’s focusing on.

Right-size the task by giving a clear ask

If I were to ask an LLM to write me code for an iOS app to do XYZ, it could write me some code, but it certainly wouldn’t (at this point in history, written in February 2024), write all code and give me a downloadable file that includes it all and the ability to simply run it. What it can do is start writing chunks and snippets of code for bits and pieces of files that I can take and place and build upon.

How do I know this? Because I made that mistake when trying to build my first iOS apps in April and May 2023 (last year). It can’t do that (and still can’t today; I repeated the experiment). I had zero ideas how to build an iOS app; I had a sense that it involved XCode and pushing to the Apple iOS App Store, and that I needed “Swift” as the programming language. Luckily, though, I had a much stronger sense of how I wanted to structure the app user experience and what the app needed to do.

I followed the following steps:

  1. First, I initiated chat as a complete novice app builder. I told it I was new to building iOS apps and wanted to use XCode. I had XCode downloaded, but that was it. I told it to give me step by step instructions for opening XCode and setting up a project. Success! That was effective.
  2. I opened a different chat window after that, to start a new chat. I told it that it was an expert in iOS programming using Swift and XCode. Then I described the app that I wanted to build, said where I was in the process (e.g. had opened and started a project in XCode but had no code yet), and asked it for code to put on the home screen so I could build and open the app and it would have content on the home screen. Success!
  3. From there, I was able to stay in the same chat window and ask it for pieces at a time. I wanted to have a new user complete an onboarding flow the very first time they opened the app. I explained the number of screens and content I wanted on those screens; the chat was able to generate code, tell me how to create that in a file, and how to write code that would trigger this only for new users. Success!
  4. I was able to then add buttons to the home screen; have those buttons open new screens of the app; add navigation back to the home; etc. Success!
  5. (Rinse and repeat, continuing until all of the functionality was built out a step at a time).

To someone with familiarity building and programming things, this probably follows a logical process of how you might build apps. If you’ve built iOS apps before and are an expert in Swift programming, you’re either not reading this blog post or are thinking I (the human) am dumb and inexperienced.

Inexperienced, yes, I was (in April 2023). But what I am trying to show here is for someone new to a process and language, this is how we need to break down steps and work with LLMs to give it small tasks to help us understand and implement the code it produces before moving forward with a new task (ask). It takes these small building block tasks in order to build up to a complete app with all the functionality that we want. Nowadays, even though I can now whip up a prototype project and iOS app and deploy it to my phone within an hour (by working with an LLM as described above, but skipping some of the introductory set-up steps now that I have experience in those), I still follow the same general process to give the LLM the big picture and efficiently ask it to code pieces of the puzzle I want to create.

As the human, you need to be able to keep the big picture – full app purpose and functionality – in mind while subcontracting with the LLM to generate code for specific chunks of code to help achieve new functionality in our project.

In my experience, this is very much like pair programming with a human. In fact, this is exactly what we did when we built DIYPS over ten years ago (wow) and then OpenAPS within the following year. I’ve talked endlessly about how Scott and I would discuss an idea and agree on the big picture task; then I would direct sub-tasks and asks that he, then also Ben and others would be coding on (at first, because I didn’t have as much experience coding and this was 10 years ago without LLMs; I gradually took on more of those coding steps and roles as well). I was in charge of the big picture project and process and end goal; it didn’t matter who wrote which code or how; we worked together to achieve the intended end result. (And it worked amazingly well; here I am 10 years later still using DIYPS and OpenAPS; and tens of thousands of people globally are all using open source AID systems spun off of the algorithm we built through this process!)

Two purple boxes. The one on the left says "big picture project idea" and has a bunch of smaller size boxes within labeled LLM, attempting to show how an LLM can do small-size tasks within the scope of a bigger project that you direct it to do. On the right, the box simply says "finished project". Today, I would say the same is true. It doesn’t matter – for my types of projects – if a human or an LLM “wrote” the code. What matters is: does it work as intended? Does it achieve the goal? Does it contribute to the goal of the project?

Coding can be done – often by anyone (human with relevant coding expertise) or anything (LLM with effective prompting) – for any purpose. The critical key is knowing what the purpose is of the project and keeping the coding heading in the direction of serving that purpose.

Tips for right-sizing the ask

  1. Consider using different chat windows for different purposes, rather than trying to do it all in one. Yes, context windows are getting bigger, but you’ll still likely benefit from giving different prompts in different windows (more on effective prompting below).Start with one window for getting started with setting up a project (e.g. how to get XCode on a Mac and start a project; what file structure to use for an app/project that will do XYZ; how to start a Jupyter notebook for doing data science with python; etc); brainstorming ideas to scope your project; then separately for starting a series of coding sub-tasks (e.g. write code for the home page screen for your app; add a button that allows voice entry functionality; add in HealthKit permission functionality; etc.) that serves the big picture goal.
  2. Make a list for yourself of the steps needed to build a new piece of functionality for your project. If you know what the steps are, you can specifically ask the LLM for that.Again, use a separate window if you need to. For example, if you want to add in the ability to save data to HealthKit from your app, you may start a new chat window that asks the LLM generally how does one add HealthKit functionality for an app? It’ll describe the process of certain settings that need to be done in XCode for the project; adding code that prompts the user with correct permissions; and then code that actually does the saving/revising to HealthKit.

    Make your list (by yourself or with help), then you can go ask the LLM to do those things in your coding/task window for your specific project. You can go set the settings in XCode yourself, and skip to asking it for the task you need it to do, e.g. “write code to prompt the user with HealthKit permissions when button X is clicked”.

    (Sure, you can do the ask for help in outlining steps in the same window that you’ve been prompting for coding sub-tasks, just be aware that the more you do this, the more quickly you’ll burn through your context window. Sometimes that’s ok, and you’ll get a feel for when to do a separate window with the more experience you get.)

  • Pay attention as you go and see how much code it can generate and when it falls short of an ask. This will help you improve the rate at which you successfully ask and it fully completes a task for future asks. I observe that when I don’t know – due to my lack of expertise – the right size of a task, it’s more prone to give me ½-⅔ of the code and solution but need additional prompting after that. Sometimes I ask it to continue where it cut off; other times I start implementing/working with the bits of code (the first ⅔) it gave me, and have a mental or written note that this did not completely generate all steps/code for the functionality and to come back.Part of why sometimes it is effective to get started with ⅔ of the code is because you’ll likely need to debug/test the first bit of code, anyway. Sometimes when you paste in code it’s using methods that don’t match the version you’re targeting (e.g. functionality that is outdated as of iOS 15, for example, when you’re targeting iOS 17 and newer) and it’ll flag a warning or block it from working until you fix it.

    Once you’ve debugged/tested as much as you can of the original ⅔ of code it gave you, you can prompt it to say “Ok, I’ve done X and Y. We were trying to (repeat initial instructions/prompt) – what are the remaining next steps? Please code that.” to go back and finish the remaining pieces of that functionality.

    (Note that saying “please code that” isn’t necessarily good prompt technique, see below).

    Again, much of this is paying attention to how the sub-task is getting done in service of the overall big picture goal of your project; or the chunk that you’ve been working on if you’re building new functionality. Keeping track with whatever method you prefer – in your head, a physical written list, a checklist digitally, or notes showing what you’ve done/not done – is helpful.

Most of the above I used for coding examples, but I follow the same general process when writing research papers, blog posts, research protocols, etc. My point is that this works for all types of projects that you’d work on with an LLM, whether the output generation intended is code or human-focused language that you’d write or speak.

But, coding or writing language, the other thing that makes a difference in addition to right-sizing the task is effective prompting. I’ve intuitively noticed that has made the biggest difference in my projects for getting the output matching my expertise. Conversely, I have actually peer reviewed papers for medical journals that do a horrifying job with prompting. You’ll hear people talk about “prompt engineering” and this is what it is referring to: how do you engineer (write) a prompt to get the ideal response from the LLM?

Tips for effective prompting with an LLM

    1. Personas and roles can make a difference, both for you and for the LLM. What do I mean by this? Start your prompt by telling the LLM what perspective you want it to take. Without it, you’re going to make it guess what information and style of response you’re looking for. Here’s an example: if you asked it what caused cancer, it’s going to default to safety and give you a general public answer about causes of cancer in very plain, lay language. Which may be fine. But if you’re looking to generate a better understanding of the causal mechanism of cancer; what is known; and what is not known, you will get better results if you prompt it with “You are an experienced medical oncologist” so it speaks from the generated perspective of that role. Similarly, you can tell it your role. Follow it with “Please describe the causal mechanisms of cancer and what is known and not known” and/or “I am also an experienced medical researcher, although not an oncologist” to help contextualize that you want a deeper, technical approach to the answer and not high level plain language in the response.

      Compare and contrast when you prompt the following:

      A. “What causes cancer?”

      B. “You are an experienced medical oncologist. What causes cancer? How would you explain this differently in lay language to a patient, and how would you explain this to another doctor who is not an oncologist?”

      C. “You are an experienced medical oncologist. Please describe the causal mechanisms of cancer and what is known and not known. I am also an experienced medical researcher, although not an oncologist.”

      You’ll likely get different types of answers, with some overlap between A and the first part of answer B. Ditto for a tiny bit of overlap between the latter half of answer B and for C.

      I do the same kind of prompting with technical projects where I want code. Often, I will say “You are an expert data scientist with experience writing code in Python for a Jupyter Notebook” or “You are an AI programming assistant with expertise in building iOS apps using XCode and SwiftUI”. Those will then be followed with a brief description of my project (more on why this is brief below) and the first task I’m giving it.

      The same also goes for writing-related tasks; the persona I give it and/or the role I reference for myself makes a sizable difference in getting the quality of the output to match the style and quality I was seeking in a response.

  • Be specific. Saying “please code that” or “please write that” might work, sometimes, but more often or not will get a less effective output than if you provide a more specific prompt.I am a literal person, so this is something I think about a lot because I’m always parsing and mentally reviewing what people say to me because my instinct is to take their words literally and I have to think through the likelihood that those words were intended literally or if there is context that should be used to filter those words to be less literal. Sometimes, you’ll be thinking about something and start talking to someone about something, and they have no idea what on earth you’re talking about because the last part of your out-loud conversation with them was about a completely different topic!

    LLMs are the same as the confused conversational partner who doesn’t know what you’re thinking about. LLMs only know what you’ve last/recently told it (and more quickly than humans will ‘forget’ what you told it about a project). Remember the above tips about brainstorming and making a list of tasks for a project? Providing a description of the task along with the ask (e.g. we are doing X related to the purpose of achieving Y, please code X) will get you better output more closely matching what you wanted than saying “please code that” where the LLM might code something else to achieve Y if you didn’t tell it you wanted to focus on X.

    I find this even more necessary with writing related projects. I often find I need to give it the persona “You are an expert medical researcher”, the project “we are writing a research paper for a medical journal”, the task “we need to write the methods section of the paper”, and a clear ask “please review the code and analyses and make an outline of the steps that we have completed in this process, with sufficient detail that we could later write a methods section of a research paper”. A follow up ask is then “please take this list and draft it into the methods section”. That process with all of that specific context gives better results than “write a methods section” or “write the methods” etc.

  • Be willing to start over with a new window/chat. Sometimes the LLM can get itself lost in solving a sub-task and lose sight (via lost context window) of the big picture of a project, and you’ll find yourself having to repeat over and over again what you’re asking it to do. Don’t be afraid to cut your losses and start a new chat for a sub-task that you’ve been stuck on. You may be able to eventually come back to the same window as before, or the new window might become your new ‘home’ for the project…or sometimes a third, fourth, or fifth window will.
  • Try, try again.
    I may hold the record for the longest running bug that I (and the LLM) could. Not. solve. This was so, so annoying. No users apparently noticed it but I knew about it and it bugged me for months and months. Every few weeks I would go to an old window and also start a new window, describe the problem, paste the code in, and ask for help to solve it. I asked it to identify problems with the code; I asked it to explain the code and unexpected/unintended functionality from it; I asked it what types of general things would be likely to cause that type of bug. It couldn’t find the problem. I couldn’t find the problem. Finally, one day, I did all of the above, but then also started pasting every single file from my project and asking if it was likely to include code that could be related to the problem. By forcing myself to review all my code files with this problem in mind, even though the files weren’t related at all to the file/bug….I finally spotted the problem myself. I pasted the code in, asked if it was a possibility that it was related to the problem, the LLM said yes, I tried a change and…voila! Bug solved on January 16 after plaguing me since November 8. (And probably existed before then but I didn’t have functionality built until November 8 where I realized it was a problem). I was beating myself up about it and posted to Twitter about finally solving the bug (but very much with the mindset of feeling very stupid about it). Someone replied and said “congrats! sounds like it was a tough one!”. Which I realized was a very kind framing and one that I liked, because it was a tough one; and also I am doing a tough thing that no one else is doing and I would not have been willing to try to do without an LLM to support.

    Similarly, just this last week on Tuesday I spent about 3 hours working on a sub-task for a new project. It took 3 hours to do something that on a previous project took me about 40 minutes, so I was hyper aware of the time mismatch and perceiving that 3 hours was a long time to spend on the task. I vented to Scott quite a bit on Tuesday night, and he reminded me that sure it took “3 hours” but I did something in 3 hours that would take 3 years otherwise because no one else would do (or is doing) the project that I’m working on. Then on Wednesday, I spent an hour doing another part of the project and Thursday whipped through another hour and a half of doing huge chunks of work that ended up being highly efficient and much faster than they would have been, in part because the “three hours” it took on Tuesday wasn’t just about the code but about organizing my thinking, scoping the project and research protocol, etc. and doing a huge portion of other work to organize my thinking to be able to effectively prompt the LLM to do the sub-task (that probably did actually take closer to the ~40 minutes, similar to the prior project).

    All this to say: LLMs have become pair programmers and collaborators and writers that are helping me achieve tasks and projects that no one else in the world is working on yet. (It reminds me very much of my early work with DIYPS and OpenAPS where we did the work, quietly, and people eventually took notice and paid attention, albeit slower than we wished but years faster than had we not done that work. I’m doing the same thing in a new field/project space now.) Sometimes, the first attempt to delegate a sub-task doesn’t work. It may be because I haven’t organized my thinking enough, and the lack of ideal output shows that I have not prompted effectively yet. Sometimes I can quickly fix the prompt to be effective; but sometimes it highlights that my thinking is not yet clear; my ability to communicate the project/task/big picture is not yet sufficient; and the process of achieving the clarity of thinking and translating to the LLM takes time (e.g. “that took 3 hours when it should have taken 40 minutes”) but ultimately still moves me forward to solving the problem or achieving the tasks and sub-tasks that I wanted to do. Remember what I said at the beginning:

    Clear thinking + clear communication of ideas/request = effective prompting => effective code and other outputs

 

  • Try it anyway.
    I am trying to get out of the habit of saying “I can’t do X”, like “I can’t code/program an iOS app”…because now I can. I’ve in fact built and shipped/launched/made available multiple iOS apps (check out Carb Pilot if you’re interested in macronutrient estimates for any reason; you can customize so you only see the one(s) you care about; or if you have EPI, check out PERT Pilot, which is the world’s first and only app for tracking pancreatic enzyme replacement therapy and has the same AI feature for generating macronutrient estimates to aid in adjusting enzyme dosing for EPI.) I’ve also made really cool, 100% custom-to-me niche apps to serve a personal purpose that save me tons of time and energy. I can do those things, because I tried. I flopped a bunch along the way – it took me several hours to solve a simple iOS programming error related to home screen navigation in my first few apps – but in the process I learned how to do those things and now I can build apps. I’ve coded and developed for OpenAPS and other open source projects, including a tool for data conversion that no one else in the world had built. Yet, my brain still tries to tell me I can’t code/program/etc (and to be fair, humans try to tell me that sometimes, too).

    I bring that up to contextualize that I’m working on – and I wish others would work on to – trying to address the reflexive thoughts of what we can and can’t do, based on prior knowledge. The world is different now and tools like LLMs make it possible to learn new things and build new projects that maybe we didn’t have time/energy to do before (not that we couldn’t). The bar to entry and the bar to starting and trying is so much lower than it was even a year ago. It really comes down to willingness to try and see, which I recognize is hard: I have those thought patterns too of “I can’t do X”, but I’m trying to notice when I have those patterns; shift my thinking to “I used to not be able to do X; I wonder if it is possible to work with an LLM to do part of X or learn how to do Y so that I could try to do X”.

    A recent real example for me is power calculations and sample size estimates for future clinical trials. That’s something I can’t do; it requires a statistician and specialized software and expertise.

    Or…does it?

    I asked my LLM how power calculations are done. It explained. I asked if it was possible to do it using Python code in a Jupyter notebook. I asked what information would be needed to do so. It walked me through the decisions I needed to make about power and significance, and highlighted variables I needed to define/collect to put into the calculation. I had generated the data from a previous study so I had all the pieces (variables) I needed. I asked it to write code for me to run in a Jupyter notebook, and it did. I tweaked the code, input my variables, ran it..and got the result. I had run a power calculation! (Shocked face here). But then I got imposter syndrome again, reached out to a statistician who I had previously worked with on a research project. I shared my code and asked if that was the correct or an acceptable approach and if I was interpreting it correctly. His response? It was correct, and “I couldn’t have done it better myself”.

    (I’m still shocked about this).

    He also kindly took my variables and put it in the specialized software he uses and confirmed that the results output matched what my code did, then pointed out something that taught me something for future projects that might be different (where the data is/isn’t normally distributed) although it didn’t influence the output of my calculation for this project.

    What I learned from this was a) this statistician is amazing (which I already knew from working with him in the past) and kind to support my learning like this; b) I can do pieces of projects that I previously thought were far beyond my expertise; c) the blocker is truly in my head, and the more we break out of or identify the patterns stopping us from trying, the farther we will get.

    “Try it anyway” also refers to trying things over time. The LLMs are improving every few months and often have new capabilities that didn’t before. Much of my work is done with GPT-4 and the more nuanced, advanced technical tasks are way more efficient than when using GPT-3.5. That being said, some tasks can absolutely be done with GPT-3.5-level AI. Doing something now and not quite figuring it out could be something that you sort out in a few weeks/months (see above about my 3 month bug); it could be something that is easier to do once you advance your thinking ; or it could be more efficiently done with the next model of the LLM you’re working with.

  • Test whether custom instructions help. Be aware though that sometimes too many instructions can conflict and also take up some of your context window. Plus if you forget what instructions you gave it, you might get seemingly unexpected responses in future chats. (You can always change the custom instructions and/or turn it on and off.)

I’m hoping this helps give people confidence or context to try things with LLMs that they were not willing to try before; or to help get in the habit of remembering to try things with LLMs; and to get the best possible output for the project that they’re working on.

Remember:

  • Right-size the task by making a clear ask.
  • You can use different chat windows for different levels of the same project.
  • Use a list to help you, the human, keep track of all the pieces that contribute to the bigger picture of the project.
  • Try giving the LLM a persona for an ask; and test whether you also need to assign yourself a persona or not for a particular type of request.
  • Be specific, think of the LLM as a conversational partner that can’t read your mind.
  • Don’t be afraid to start over with a new context window/chat.
  • Things that were hard a year ago might be easier with an LLM; you should try again.
  • You can do more, partnering with an LLM, than you can on your own, and likely can do things you didn’t realize were possible for you to do!

Clear thinking + clear communication of ideas/request = effective prompting => effective code and other outputs

Have any tips to help others get more effective output from LLMs? I’d love to hear them, please comment below and share your tips as well!

Tips for prompting LLMs like ChatGPT, written by Dana M. Lewis and available from DIYPS.org

Accepted, Rejected, and Conflict of Interest in Gastroenterology (And Why This Is A Symptom Of A Bigger Problem)

Recently, someone published a new clinical practice update on exocrine pancreatic insufficiency (known as EPI or PEI) in the journal called Gastroenterology, from the American Gastroenterology Association (AGA). Those of you who’ve read any of my blog posts in the last year know how much I’ve been working to raise awareness of EPI, which is very under-researched and under-treated clinically despite the prevalence rates in the general population and key sub-populations such as PWD. So when there was a new clinical practice update and another publication on EPI in general, I was jazzed and set out to read it immediately. Then frowned. Because, like so many articles about EPI, it’s not *quite* right about many things and it perpetuates a lot of the existing problems in the literature. So I did what I could, which was to check out the journal requirements for writing a letter to the editor (LTE) in response to this article and drafting and submitting a LTE article about it. To my delight, on October 17, 2023, I got an email indicating that my LTE was accepted.

You can find my LTE as a pre-print here.

See below why this pre-print version is important, and why you should read it, plus what it reminds us about what journal articles can or cannot tell us in healthcare.

Here’s an image of my acceptance email. I’ll call out a key part of the email:

A print of the acceptance email I received on October 17, 2023, indicating my letter would be sent to authors of the original articles for a chance to choose to respond (or not). Then my LTE would be published.

Letters to the Editor are sent to the authors of the original articles discussed in the letter so that they might have a chance to respond. Letters are not sent to the original article authors until the window of submission for letters responding to that article is closed (the last day of the issue month in which the article is published). Should the authors choose to respond to your letter, their response will appear alongside your letter in the journal.

Given the timeline described, I knew I wouldn’t hear more from the journal until the end of November. The article went online ahead of print in September, meaning likely officially published in October, so the letters wouldn’t be sent to authors until the end of October.

And then I did indeed hear back from the journal. On December 4, 2023, I got the following email:

A print of the email I received saying the LTE was now rejected
TLDR: just kidding, the committee – members of which published the article you’re responding to – and the editors have decided not to publish your article. 

I was surprised – and confused. The committee members, or at least 3 of them, wrote the article. They should have a chance to decide whether or not to write a response letter, which is standard. But telling the editors not to publish my LTE? That seems odd and in contrast to the initial acceptance email. What was going on?

I decided to write back and ask. “Hi (name redacted), this is very surprising. Could you please provide more detail on the decision making process for rescinding the already accepted LTE?”

The response?

Another email explaining that possible commercial affiliations influenced their choice to reject the article after accpeting it originally
In terms of this decision, possible commercial affiliations, as well as other judgments of priority and relevance among other submissions, dampened enthusiasm for this particular manuscript. Ultimately, it was not judged to be competitive for acceptance in the journal.

Huh? I don’t have any commercial affiliations. So I asked again, “Can you clarify what commercial affiliations were perceived? I have none (nor any financial conflict of interest; nor any funding related to my time spent on the article) and I wonder if there was a misunderstanding when reviewing this letter to the editor.”

The response was “There were concerns with the affiliation with OpenAPS; with the use of the term “guidelines,” which are distinct from this Clinical Practice Update; and with the overall focus being more fit for a cystic fibrosis or research audience rather than a GI audience.”

A final email saying the concern with my affiliation of OpenAPS, which is not a commercial organization nor related to the field of gastroenterology and EPI

Aha, I thought, there WAS a misunderstanding. (And the latter makes no sense in the context of my LTE – the point of it is that most research and clinical literature is a too-narrow focus, cystic fibrosis as one example – the very point is that a broad gastroenterology audience should pay attention to EPI).

I wrote back and explained how I, as a patient/independent researcher, struggle to submit articles to manuscript systems without a Ringgold-verified organization. (You can also listen to me describe the problem in a podcast, here, and I also talked about it in a peer-reviewed journal article about citizen science and health-related journal publishing here). So I use OpenAPS as an “affiliation” even though OpenAPS isn’t an organization. Let alone a commercial organization. I have no financial conflict of interest related to OpenAPS, and zero financial conflict of interest or commercial or any type of funding in gastroenterology at all, related to EPI or not. I actually go to such extremes to describe even perceived conflicts of interest, even non-financial ones, as you can see this in my disclosure statement publicly available from the New England Journal of Medicine here on our CREATE trial article (scroll to Supplemental Information and click on Disclosure Forms) where I articulate that I have no financial conflicts of interest but acknowledge openly that I created the algorithm used in the study. Yet, there’s no commercial or financial conflict of interest.

A screenshot from the publicly available disclosure form on NEJM's site, where I am so careful to indicate possible conflicts of interest that are not commercial or financial, such as the fact that I developed the algorithm that was used in that study. Again, that's a diabetes study and a diabetes example, the paper we are discussing here is on exocrine pancreatic insufficiency (EPI) and gastroenterology, which is unrelated. I have no COI in gastroenterology.

I sent this information back to the journal, explaining this, and asking if the editors would reconsider the situation, given that the authors (committee members?) have misconstrued my affiliation, and given that the LTE was originally accepted.

Sadly, there was no change. They are still declining to publish this article. And there is no change in my level of disappointment.

Interestingly, here is the article in which my LTE is in reply to, and the conflict of interest statement by the authors (committee members?) who possibly raised a flag about supposed concern about my (this is not true) commercial affiliation:

The conflict of interest statement for authors from the article "AGA Clinical Practice Update on the Epidemiology, Evaluation, and Management of Exocrine Pancreatic Insufficiency 2023"

The authors disclose the following: David C. Whitcomb: consultant for AbbVie, Nestlé, Regeneron; cofounder, consultant, board member, chief scientific officer, and equity holder for Ariel Precision Medicine. Anna M. Buchner: consultant for Olympus Corporation of America. Chris E. Forsmark: grant support from AbbVie; consultant for Nestlé; chair, National Pancreas Foundation Board of Directors.

As a side note, one of the companies with consulting and/or grant funding to two of the three authors is the biggest manufacturer of pancreatic enzyme replacement therapy (PERT), which is the treatment for EPI. I don’t think this conflict of interest makes these clinicians ineligible to write their article; nor do I think commercial interests should preclude anyone from publishing – but in my case, it is irrelevant, because I have none. But, it does seem weird given the stated COI for my (actually not a) COI then to be a reason to reject a LTE, of all things.

Here’s the point, though.

It’s not really about the fact that I had an accepted article rejected (although that is weird, to say the least…).

The point is that the presence of information in medical and research journals does not mean that they are correct. (See this post describing the incorrect facts presented about prevalence of EPI, for example.)

And similarly, the lack of presence of material in medical and research journals does not mean that something is not true or is not fact! 

There is a lot of gatekeeping in scientific and medical research. You can see it illustrated here in this accepted-rejected dance because of supposed COI (when there are zero commercial ties, let alone COI) and alluded to in terms of the priority of what gets published.

I see this often.

There is good research that goes unpublished because editors decide not to prioritize it (aka do not allow it to get published). There are many such factors in play affecting what gets published.

There are also systemic barriers.

  • Many journals require fees (called article processing charges or “APC”s) if your article is accepted for publication. If you don’t have funding, that means you can’t publish there unless you want to pay $2500 (or more) out of pocket. Some journals even have submission fees of hundreds of dollars, just to submit! (At least APCs are usually only levied if your article is accepted, but you won’t submit to these journals if you know you can’t pay the APC). That means the few journals in your field that don’t require APCs or fees are harder to get published in, because many more articles are submitted (thus, influencing the “prioritization” problem at the editor level) to the “free” journals.
  • Journals often require, as previously described, your organization to be part of a verified list (maintained by a third party org) in order for your article to be moved through the queue once submitted. Instead of n/a, I started listing “OpenAPS” as my affiliation and proactively writing to admin teams to let them know that my affiliation won’t be Ringgold-verified, explaining that it’s not an org/I’m not at any institution, and then my article can (usually) get moved through the queue ok. But as I wrote in this peer-reviewed article with a lot of other details about barriers to publishing citizen science and other patient-driven work, it’s one of many barriers involved in the publication process. It’s a little hard, every journal and submission system is a little different, and it’s a lot harder for us than it is for people who have staff/support to help them get articles published in journals.

I’ve seen grant funders say no to funding researchers who haven’t published yet; but editors also won’t prioritize them to publish on a topic in a field where they haven’t been funded yet or aren’t well known. Or they aren’t at a prestigious organization. Or they don’t have the “right” credentials. (Ahem, ahem, ahem). It can be a vicious cycle for even traditional (aka day job) researchers and clinicians. Now imagine that for people who are not inside those systems of academia or medical organizations.

Yet, think about where much of knowledge is captured, created, translated, studied – it’s not solely in these organizations.

Thus, the mismatch. What’s in journals isn’t always right, and the process of peer review can’t catch everything. It’s not a perfect system. But what I want you to take away, if you didn’t already have this context, is an understanding that what’s NOT in a journal is not because the information is not fact or does not exist. It may have not been studied yet; or it may have been blocked from publication by the systemic forces in play.

As I said at the end of my LTE:

It is also critical to update the knowledge base of EPI beyond the sub-populations of cystic fibrosis and chronic pancreatitis that are currently over-represented in the EPI-related literature. Building upon this updated research base will enable future guidelines, including those like the AGA Clinical Practice Update on EPI, to be clearer, more evidence-based, and truly patient-centric ensuring that every individual living with exocrine pancreatic insufficiency receives optimal care.

PS – want to read my LTE that was accepted then rejected, meaning it won’t be present in the journal? Here it is on a preprint server with a DOI, which means it’s still easily citable! Here’s an example citation:

Lewis, D. Navigating Ambiguities in Exocrine Pancreatic Insufficiency. OSF Preprints. 2023. DOI: 10.31219/osf.io/xcnf6

New Survey For Everyone (Including You – Yes, You!) To Help Us Learn More About Exocrine Pancreatic Insufficiency

If you’ve ever wanted to help with some of my research, this is for you. Yes, you! I am asking people in the general public to take a survey (https://bit.ly/GI-Symptom-Survey-All) and share their experiences.

Why?

Many people have stomach or digestion problems occasionally. For some people, these symptoms happen more often. In some cases, the symptoms are related to exocrine pancreatic insufficiency (known as EPI or PEI). But to date, there have been few studies looking at the frequency of symptoms – or the level of their self-rated severity – in people with EPI or what symptoms may distinguish EPI from other GI-related conditions.

That’s where this survey comes in! We want to compare the experiences of people with EPI to people without EPI (like you!).

Will you help by taking this survey?

Your anonymous participation in this survey will help us understand the unique experiences individuals have with GI symptoms, including those with conditions like exocrine pancreatic insufficiency (EPI). In particular, data contributed by people without EPI will help us understand how the EPI experience is different (or not).

A note on privacy:

  • The survey is completely anonymous; no identifying information will be collected.
  • You can stop the survey at any point.

Who designed this survey:

Dana Lewis, an independent researcher, developed the survey and will manage the survey data. This survey design and the choice to run this survey is not influenced by funding from or affiliations with any organizations.

What happens to the data collected in this survey:

The aggregated data will be analyzed for patterns and shared through blog posts and academic publications. No individual data will be shared. This will help fill some of the documented gaps in the EPI-related medical knowledge and may influence the design of targeted research studies in the future.

Have Questions?
Feel free to reach out to Dana+GISymptomSurvey@OpenAPS.org.

How else can you help?
Remember, ANYONE can take this survey. So, feel free to share the link with your family and friends – they can take it, too!

Here’s a link to the survey that you can share (after taking it yourself, of course!): https://bit.ly/GI-Symptom-Survey-All

You (yes you!) can help us learn about exocrine pancreatic insufficiency by taking the survey linked on this page.

MacrosOnTheRun: an iOS app for tracking activity fuel consumption

Last year, I built a spreadsheet template (and shared it here) to use while training and running ultramarathons to track my fuel consumption. It was helpful for me, as a person with exocrine pancreatic insufficiency, to see and decide based on macronutrient counts for each snack how many enzyme pills I needed to take each time I fueled, which is every 30 minutes.

This year, I got tired of messing with the spreadsheet while running. I don’t mind the data entry, but because of the iterative calculations updating with the hourly and overall totals of carbs, sodium, calories per hour etc, the Google Sheet would get bogged down over time, especially when I was running for 16 hours (like during my 100k in March). That would cause the Google Sheets app to crash and reload, or kick me out of the sheet and require me to click back in, wait for it to catch up, before entering my fuel item. It only took a couple of seconds, but it was annoying to have that delay while I was running.

I thought about not logging my fueling while running, especially because I had switched to a slightly more expensive but also larger over-the-counter (OTC) enzyme pill that basically covers every single snack I take with one single pill. That requires less mid-run decision making about how many to take, so it’s less important during the run to see each snack’s composition: I simply swallow a pill each time I do fuel.

Yet, after 1-2 runs of 2-3 hours where I didn’t log my intake, I still found myself missing the data from the run. Although the primary use case of in-run decision making wasn’t there for enzyme dosing, the secondary use case of making sure I was consuming enough sodium per hour and calories per hour relative to my goals was still there. I still wanted to offload that hourly tracking so I didn’t have to remember how much I had had in the last hour. Plus, the post-run data summary was nice, because it helped me evaluate my fueling overall in the grand scheme of my daily nutritional intake, which is particularly helpful for me in making sure I’m consuming enough protein to match my ultra-running activities.

And, I had figured out last year how to develop iOS apps (check out PERT Pilot if you have EPI, and Carb Pilot if you’re someone who’d like to simply use AI to generate estimates of how many carbs or macronutrients are in what you’re eating) with the help of an LLM. So I decided to try to build a custom, just for me app to mimic my spreadsheet in order to easily track my fueling on the run.

Tada! I made MacrosOnTheRun.Macros on the run logo showing "on the run" below the word Macros, stylized to look like 'on the run' is a drop down menu, reminiscent of the fuel list drop down in the app

It’s pretty simple: I open the app, hit ‘start run’, and then click the drop down and tap the fuel item (or electrolyte) that I’m consuming. I hit “add fuel”, and the items drops into the list on the screen and is added to the hourly and overall estimates shown above the drop down.

Screenshot of MacrosOnTheRun showing a pre-populated fuel list to select from and on the right, a screenshot at the end of a 9 hour run with fuel totals and individual fuel items entered
An example during a long run where after the run I open the app to export my in-run data. This is after the run, so you’ll see it’s been 97 minutes since the last fuel when I took that screenshot, and thus the sodium per hour and calories per hour calculation shows 0 given that it’s been >60 minutes since the last fuel. Below that is the total run stats, including enzymes and electrolytes counts. Given that I fuel like clockwork every 30 minutes, you can infer this was a 9 hour run since I took 18 enzymes!

When I’m done with the run, I tap the “stop and export” button at the bottom, which opens the iOS share sheet and enables me to email the CSV file to myself, so I can copy/paste the data back into the same spreadsheet template I was using before. It’s useful because I have all my runs stored as individual tabs in the sheet, and the template (same one I was using last year) autopopulates the pivot table with hourly summaries so I can see across each hour whether I was meeting my sodium and fueling goals. (Check out the 27 hour summary table in my 100 mile recap if you’re curious to see an example!)

Right now, I haven’t bothered to add a feature to edit in-app what the fuel list is – mine is programmed in via the code of the app itself, since I’m the only one using it – and I haven’t published it to the iOS App Store because I didn’t think anyone else would want to use it.

But, if I’m wrong, and this is something you’d like to use – let me know by commenting here or emailing me (Dana+MacrosOnTheRun@OpenAPS.org) and letting me know. If there’s interest, I can modify the app to allow in-app fuel list entry and modifications of the fuel list and then share it via TestFlight or in the App Store for other people to download and use.

Running a Multi-Day Ultramarathon (Aiming for 200 Miles)

I used to make a lot of statements about things I thought I couldn’t do. I thought I couldn’t run overnight, so I couldn’t attempt to run 100 miles. I could never run 200 mile races the way other people did. Etc. Yet last year I found myself training for and attempting 100 miles (I chose to stop at 82, but successfully ran overnight and for 25 hours) and this year I found myself working through the excessive mental logistics and puzzle of determining that I could train for and attempt to run 200 miles, or as many miles as I could across 3-4 days.

Like my 100 mile attempt, I found some useful blog recaps and race reports of people’s official races they did for 200-ish mile races. However, like the 100 attempts, I found myself wanting more information for the mental training and logistical preparation people put into it. While my 200 mile training and prep anchored heavily on what I did before, this post describes more detail on how my training, prep, and ‘race’ experience for a multi-day or 200 mile ultra attempt.

DIY-ing a 200

For context, I have a previous post describing the myriad reasons of why I often choose to run DIY ultras, meaning I’m not signing up for an official race. Most of those reasons hold true for why I chose to DIY my 200. Like my 100 (82) miles, I mapped a route that was based on my home paved trail that takes me out and around the trails I’m familiar with. It has its downsides, but also the upsides: really good trail bathrooms and I feel safe running them. Plus, it’s easy and convenient for my husband to crew me. Since I expected this adventure to take 3-4 days (more on that below), that’s a heavy ask of my husband’s time and energy, so sticking with the easy routes that work for him is optimal, too. So while I also sought to run 200 miles just like any other 200-mile ultra runner, my course happens to have minimal elevation. Not all 200 mile ultramarathon races have a ton of elevation – some like the Cowboy 200 are pretty flat – so my experience is closer to that than the experience of those running mountain based ultras with 30,000 feet (or more) of elevation gain. And I’m ok with that!

Sleep

One of the puzzles I had to figure out to decide I could even attempt a 200 miler is sleep. With a 100 mile race, most people don’t sleep at all (nor did I) and we just run through the night. With 200 miles, that’s impossible, because it takes 3, 4, 5 days to finish and biologically you need sleep. Plus, I need more sleep than the average person. I’m a champion sleeper; I typically sleep much longer than everyone else; and I know I couldn’t function with an hour here or there like many people do at traditional races. So I actually designed my 200 mile ultra with this in mind: how could I cover 200 miles AND get sleep? Because I’m running to/from home, I have access to my kitchen, shower, and bed, so I decided that I would set up my run to run each day and come home and eat dinner, shower, and sleep each night for a short night in my bed.

I then decided that instead of winging it and running until I dropped before eating, showering, and sleeping, I would aim for running 50 miles each day. Then I’d come in, eat, shower, and sleep and get up the next morning and go again. 4 days, 3 nights, 50 miles each day: that would have me finishing around 87-90ish hours total (with the clock running from my initial start), including ~25 hours or more of total downtime between the eating/showering/sleeping/getting ready. That breakdown of 3.67 days is well within the typical finish times of many 200 mile ultras (yes, comparing to those with elevation gain), so it felt like it was both a stretch for me but also doable and in a sensible way that works for me and my needs. I mapped it all out in my spreadsheet, with the number of laps and my routes and pacing to finish 50 miles per day; the two times per day I would need my husband to come out and crew me at ‘aid station stops’ in between laps, and what time I would finish each night. I then factored in time to eat and shower and get ready for bed, sleep, and time to get up in the morning. Given the fact that I expected to run slower each day, the sleep windows go from 8 hours down to less than 6 hours by night 3. That being said, if I managed to sleep 5 hours per night and 15 hours total, that’s probably almost twice as much as most people get during traditional races!

Like sleep, I was also very cognizant of the fact that a 200 probably comes down to mental fortitude and will power to keep going; meticulous fueling; and excellent foot care. Plus reasonable training, of course.

Meticulous fueling

I have previously written about building and using a spreadsheet to track my fuel intake during ultras. This method works really well for me because after each training run I can see how much I consumed and any trends. I started to spot that as I got tired, I would tend to choose certain snacks that happened to be slightly lower calorie. Not by much, but the snack selections went from those that are 150-180 calories to 120-140 calories, in part because I perceived them to be both ‘smaller’ (less volume) and ‘easier to swallow’ when I was tired. Doubled up in the same hour, this meant that I started to have hours of 240 calories instead of more than 250. That doesn’t sound like much, but I need every calorie I can get.

I mapped out my estimated energy expenditure based on the 50 miles per day, and even consuming 250 calories per hour, I would end up with several thousand calories of deficit each day! I spent a lot of time testing food that I think I can eat for dinner on the 3 nights to ensure that I get a good 1000 calories or more in before going to bed, to help address and reduce the growing energy deficit. But I also ended up optimizing my race fuel, too. Because I ran so many long runs in training where I fueled every 30 minutes, and because I had been mapping out my snack list for each lap for 50 miles a day for 4 days, I’ve been aware for months that I would probably get food fatigue if I didn’t expand my fuel list. I worked really hard to test a bunch of new snacks and add them to the rotation. That really helped even in training, across all 12 laps (3 laps a day to get 50 miles, times 4 days), I carefully made sure I wouldn’t have too many repeats and get sick of one food or one group of things I planned to eat. I also recently realized that some of the smaller items (e.g. 120 calorie servings) could be increased. I’m already portioning out servings from a big bag into small baggies; in some cases adding one more pretzel or one more piece of candy (or more) would drive up the calories by 10-20 per serving. Those small tweaks I made to 5 of my ~18 possible snacks means that I added about 200 calories on top of what was already represented in those snacks. If I happen to choose those 5 snacks as part of my list for any one lap, that means I have a bonus 200 calories I’ve convinced myself to consume without it being a big deal, because it’s simply one more pretzel or one more piece of candy in the snack that I’m already use to consuming. (Again, because I’m DIYing my race and have specific needs relative to running with celiac, diabetes, and exocrine pancreatic insufficiency, for me, pre-planning my fuel and having it laid out in advance for every run, or in the race every single lap, is what works for me personally.)

Here’s a view of how I laid out my fuel. I had worked on a list of what I wanted for each lap, checking against repeats across the same day and making sure I wasn’t too heavily relying on any one snack throughout all the days. I then bagged up all snacks individually, then followed my list to lay them out by each lap and day accordingly. I also have a bag per day each for enzymes and electrolytes, which you’ll see on the left. Previously, I’ve done one bag per lap, but to reduce the number of things I’m pulling in and out of my vest each time, I decided I could do one big bag each per day (and that did end up working out well).

Two pictures side by side, with papers on the floor showing left to right laps 1-3 on the top and along the left side days 1-4, to create a grid to lay out my snacks. On the left picture, I have my enzymes, electrolytes per day and then a pile of snacks grouped for each lap. On the right, all the snacks and enzymes and electrolytes have been put into gallon bags, one for each lap.

Contingency planning

Like I did for my 100, I was (clearly) planning for as many possibilities as I could. I knew that during the run – and each evening after the run – I would have limited excess mental capacity for new ideas and brainstorming solutions when problems come up. The more I prepared for things that I knew were likely to happen – fatigue, sore body, blisters, chafing, dropping things, getting tired of eating, etc – the more likely that they would be small things and not big things that can contribute to ending a race attempt. This includes learning from my past 100 attempt and how I dealt with the rain. First of all, I planned to move my race if it looks like we’ll get 6 months of rain in a single 24 hour period! But also, I scheduled my race so that if I do have a few hours of really hard rain, I could choose to take a break and come in and eat/shower/change/rest and go back out later, or extend and finish a lap on the last day or the day after that. I was not running a race that would yank me from the course, but I did have a hard limit after day 5 based on a pre-planned doctor’s appointment that would be a hassle to reschedule, so I needed to finish by the night after day 5. But this gave me the flexibility to take breaks (that I wasn’t really planning to take but was prepared to if I needed to due to weather conditions).

Training for a 200 mile ultramarathon

Like training plans for marathons and 100 milers, the training plans I’ve read about for 200 mile ultramarathons intimidate me. So much mileage! So much time for a slow run/walker like me. I did try to look at sample 200 mile ultra plans and get a sense for what they’re trying to achieve – e.g. when do they peak their mileage before the race, how many back to back runs of what general length in terms of time etc – and then loosely keep that in mind.

But basically, I trained for this 200 mile ultra just like I trained for my marathon, 50k, 100k, and 82 miler. I like to end up doing long runs (which for me are run/walks of 30 seconds run, 60 seconds walk, just like I do shorter runs) of up to around 50k distance. This time, I did two total training runs that were each around 29 miles, just based on the length of the trail I had to run. I could have run longer, but mentally had the confidence that another ~45 minutes per run wasn’t going to change my ability to attempt 50 miles a day for 4 days. If I didn’t have 3 years of this training style under my personal belt, I might feel different about it. That’s longer than many people run, but I find the experience of 7-8 hours of time on my feet fueling, run/walking, and problem solving (including building up my willpower to spend that much time moving) to be what works for me.

The main difference for my 200 is probably also that it’s my 3rd year of ultrarunning. I was able to increase my long runs a little bit more of a time, when historically I used to add 2 miles a time to a long run. I jumped up 4 miles at a time – again, run/walking so very easy on my legs – when building up my long runs, so I was able to end up with 2 different 29 mile runs, two weeks apart, even though I really kicked off training specifically for this 8 weeks prior (10 weeks including taper) to the run. In between I also did a weekend of back to back to back runs (meaning 3 days in a row) where I ran 16 miles, another 16 miles, and 13 miles to practice getting up and running on tired legs. In past cycles I had done a lot more back to back (2-day) with a long and a medium run, but this time I did less of the 2-day and did the one big 3-day since I was targeting a 4-day experience. In future, if I were to do this again, given how well my body held up with all this training, I might have done more back to back, but I took things very cautiously and wanted to not overtrain and cause injury from ramping up too quickly.

As part of that (trying not to over do it), instead of doing several little runs throughout the week I focused on more medium-long runs with my vest and fueling, so I would do something like a long run (starting at 10 miles building up to 29 miles), a medium-long run (8 miles up to 13 miles or 16 miles) and another medium-ish run (usually 8 miles). Three runs a week, and that was it. Earlier in the 8 weeks, I was still doing a lot of hiking off the season, so I had plenty of other time-on-feet experiences. Later in the season I sometimes squeezed in a 4th short run of the week if we wouldn’t be hiking, and ran without my vest and tried to do some ‘speed work’ (aka run a little faster than my easy long run pace). Nothing fancy. Again, this is based on my slow running style (that’s actually a fixed interval of short run and short walk, usually 30 seconds run and 60 seconds walk), my schedule, my personality, and more. If you read this, don’t think my mileage or training style is the answer. But I did want to share what I did and that it generally worked for me.

I did struggle with wondering if I was training “enough”. But I never train “enough” compared to others’ marathon, 50k, 100k, 100 mile plans, either. I’m a low mileage-ish trainer overall, even though I do throw in a few longer runs than most people do. My peak training for marathon, 50k, and 100k is usually around low 50s (miles per week). Surprisingly, this 200 cycle did get me to some mid 60 mile weeks! One thing that also helped me mentally was adding in a rolling 7 day calculation of the miles, not just looking at miles per calendar week. That helped when I shifted some runs around due to scheduling, because I could see that I was still keeping a reasonable 55-low60s mileage over 7 days even though the calendar week total dropped to low 40s because of the way the runs happened to land in the calendar weeks.

Generally, though, looking back at how my training was more than I had accomplished for previous races; I feel better than ever (good fueling really helps!); I didn’t have any accidents or overtraining injuries or niggles; I decided a few weeks before peak that I was training enough and it was the right amount for me.

Another factor that was slightly different was how much hiking I had done this year. I ran my 100k in March then took some time off, promising my husband that we would hike “more” this year. That also coincided with me not really bouncing back from my 100k recovery period: I didn’t feel like doing much running, so we kept planning hiking adventures. Eventually I realized (because I was diagnosed with Graves’ disease last year, I’m having my thyroid and antibody and other related blood work done every 3 months while we work on getting everything into range) that this coincided with my TSH going too high for my body’s happiness; and my disinterest in long runs was actually a symptom (for me) of slightly too-high TSH. I changed my thyroid medication and within two weeks felt HUGELY more interested in long running, which is what coincided with reinvigorating my interest in a fall ultra, training, and ultimately deciding to go for the 200. But in the meantime, we kept hiking a lot – to the tune of over 225 miles hiked and over 53,000 feet of elevation gain! I never tracked elevation gain for hiking before (last year, not sure I retrospectively tracked it all but it was closer to 100 miles – so definitely likely 2x increase), but I can imagine this is definitely >2x above what I’ve done on my previous biggest hiking year, just given the sheer number of hikes that we went out on. So overall, the strengthening of my muscles from hiking helped, as did the time on feet. Before I kicked off my 8 week cycle, we were easily spending 3-4 hours a hike and usually at least two hikes a weekend, so I had a lot of time on feet almost every hike equivalent to 12 or more miles of running at that point. That really helped when I reintroduced long runs and aided my ability to jump my long run in distance by 4 miles at a time instead of more gently progressing it by 2 miles a week as I had done in the past.

How my 200 mile attempt actually went

Spoiler alert: I DNF (did not finish) 200 miles. Instead, I stopped – happily – at 100 miles. But it wasn’t for a lack of training.

Day 1 – 51 miles – All as planned

I set out on lap 1 on Day 1 as planned and on time, starting in the dark with a waist lamp at 6am. It was dark and just faintly cool, but warm enough (51F) that I didn’t bother with long sleeves because I knew I would warm up. (Instead, for all days, I was happy in shorts and a short sleeve shirt when the temps would range from 49F to 76F and back down again.) I only had to run for about an hour in the dark and the sky gradually brightened. It ended up being a cloudy, overcast and nice weather day so it didn’t get super bright first thing, but because it wasn’t wet and cold, it wasn’t annoying at all. I tried to start and stay at an easy pace, and was running slow enough (about ~30s/mile slower than my training paces) that I didn’t have to alter my planned intervals to slow me down any more. All was fairly well and as planned in the first lap. I stopped to use the bathroom at mile 3.5 and as planned at my 8 mile turnaround point, and also stopped to stuff a little more wool in a spot in my shoe a mile later. That added 2 minutes of time, but I didn’t let it bother me and still managed to finish lap 1 at about a 15:08 min/mi average pace, which was definitely faster than I had predicted. I used the bathroom again at the turnaround while my husband re-filled my hydration pack, then I stuffed the next round of snacks in my vest and took off. The bathroom and re-fueling “aid station” stop only took 5 minutes. Not bad! And on I went.

A background-less shot of me in my ultrarunning gear. I'm wearing a grey moisture-wicking visor; sunglasses; a purple ultrarunning vest packed with snacks in front and the blue tube of my hydration pack looped in front; a bright flourescent pink short sleeve shirt; grey shorts with pockets bulging on the side with my phone (left pocket) and skittles and headphones and keys (right pocket), and in this lap I was wearing bright pink shoes. Lap 2 was also pretty reasonable, although I was surprised by how often I wanted a bathroom. My period had started that morning (fun timing), and while I didn’t have a lot of flow, the signals my abdomen was giving my brain was telling me that I needed to go to the bathroom more often than I would have otherwise. That started to stress me out slightly, because I found myself wishing for a bathroom in the longest stretch without trail bathrooms and in a very populated area, the duration of which was about 5.5 miles long. I tried to drink less but was also aware of trying not to under hydrate or imbalance my electrolytes. I always get a little dehydrated during my period; and I was running a multi-day ultra where I needed a lot of hydration and more sodium than usual; this situation didn’t add up well! But I made it without any embarrassing moments on the trail. The second aid station again only took 5 minutes. (It really makes a world of difference to not have to dry off my feet, Desitin them up, and re-do socks and shoes every single aid station like I did last year!) I could have moved faster, but I was trying to not let small minutes of time frazzle me, and I was succeeding with being efficient but not rushed and continuing on my way. I had slowed down some during lap 2, however – dropping from a 15:08 to 15:20ish min/mi pace. Not much, but noticeable.

At sunset, with light blue sky fading to yellow at the horizon behind the row of tall, skinny bush like trees with gaps and a hot air balloon a hundred or so feet off the ground seen between the trees.Lap 3 I did feel more tired. I talked my husband into bringing me my headlamp toward the end of the last lap, instead of me having to carry it for 4+ hours before the sun went down. (Originally, I thought I would need it 2-3 hours into this last lap, but because I was moving so well it was now looking like 4 hours, and it would be a 2-3 mile e-bike ride for him to bring me the lamp when I wanted it. That was a mental win to not have to run with the lamp when I wasn’t using it!) I was still run/walking the same duration of intervals, but slowed down to about 16:01 pace for this lap. Overall, I would be at 15:40 average for the whole day, but the fatigue and my tired feet started to kick in on the third lap between miles 34-51. Plus, I stopped to take a LOT more pictures, because there was a hot air balloon growing in the distance as it was flying right toward me – and then by me next to the trail! It ended up landing next to the soccer fields a mile behind me after it passed me in this picture. I actually made it home right as the sun set and didn’t have to wear my lamp at all that evening.

Day 1 recovery was better and worse than I expected. I sat down and used my foot massager on my still-socked feet, which felt very good. I took a shower after I peeled my socks off and took a look at my feet for the first time. I had one blister that I didn’t know was growing at all pop about an hour before I finished, but it was under some of my pre-taped area. I decided to leave the tape and see how it looked and felt in the morning. I had 2-3 other tiny, not a big deal blisters that I would tape in the morning but didn’t need any attention that night.

I had planned to eat a reasonably sized dinner – preferably around 1000 calories – each night, to help me address my calorie deficit. And I had a big deficit: I had burned 5,447 calories and consumed 3,051 calories in my 13 hours and 13 minutes of running. But I could only eat ¼ of the pizza I planned for dinner, and that took a lot of work to force myself to eat. So I gave up, and went to bed with a 3,846 calorie deficit, which was bigger than I wanted.

And going to bed hurt. I was stiff, which I could deal with, but my feet that didn’t hurt much while running started SCREAMING at me. All over. They hurt so bad. Not blisters, just intense aches. Ouch! I started to doubt my ability to run the next day, but this is where my pre-planning kicked in (aided by my husband who had agreed to the rules we had decided upon): no matter what, I would get up in the morning, get dressed, and go out and start my first lap. If I decided to quit, I could, but I could not quit at night in bed or in the morning in the bed or in the house. I had to get up and go. So I went to sleep, less optimistic about my ability to finish 50 miles again on day 2, but willing to see what would happen.

Day 2: 34 instead of 50 miles, and walking my first ever lap

I actually woke up before my alarm went off on day 2. Because I had finished so efficiently the day before, I was able to again get a good night’s sleep, even with the early alarm and waking up again at 4:30am with plans to be going by 6am. The extra time was helpful, because I didn’t feel rushed as I got ready to go. I spent some extra time taping my new blisters. Because they hadn’t popped, I put small torn pieces of Kleenex against them and used cut strips of kinesio tape to protect the area. (Read “Fixing Your Feet” for other great ultra-related foot care tips; I learned about Kleenex from that book.) I also use lambs’ wool for areas that rub or might be getting hot spots, so I put wool back in my usual places (between big and second toes, and on the side of the foot) plus another toe that was rubbing but not blistered and could use some cushion. I also this year have been trying Tom’s blister powder in my socks, which seems to help since my feet are extra sweat prone, and I had pre-powdered a stack of socks so I could simply slip them on and get going once I had done the Kleenex/tape and wool setup. The one blister that had popped under my tape wasn’t hurting when I pressed on it, so I left it alone and just added loose wool for a little padding.

A pretty view of the trail with bright blue sky after the sun rose with green bushes (and the river out of sight) to the left, with the trail parallel to a high concrete wall of a road with cheery red and yellow leaved trees leaning over the trail.And off I went. I managed to run/walk from the start, and faster than I had projected on my spreadsheets originally and definitely faster than I thought was possible the night before or even before I started that morning. Sure, I was slower than the day before, but 15:40 min/mi pace was nothing to sneeze at, and I was feeling good. I was really surprised that my legs, hips and body did not hurt at all! My multi-day or back-to-back training seemed to pay off here. All was well for most of the first lap (17 miles again), but then the last 2 or so miles, my pace started dipping unexpectedly so I was doing 16+ min/mi without changing my easy effort. I was disappointed, and tired, when I came into my aid station turnaround. I again didn’t need foot care and spent less than 5 minutes here, but I told Scott as I left that I was going to walk for a while, because my feet had been hurting and they were getting worse. Not blisters: but the balls of my feet were feeling excruciating.

A close up of a yellow shelled snail against the paved trail that I saw while walking the world's slowest 17-mile lap on day 2.I headed out, and within a few minutes he had re-packed up and biked up to ride alongside me for a few minutes and chat. I told him I was probably going to need to walk this entire lap. We agreed this was fine and to be expected, and was in fact built into my schedule that I would slow down. I’ve never walked a full lap in an ultra before, so this would be novel to me. But then my feet got louder and louder and I told him I didn’t think I could even walk the full lap. We decided that I should take some Tylenol, because I wasn’t limping and this wouldn’t mask any pain that would be important cues for my body that I would be overriding, but simply muting the “ow this is a lot” screams that the bones in the balls of my feet were feeling. He biked home, grabbed some, and came back out. I took the Tylenol and sent him home again, walking on. Luckily, the Tylenol did kick in and it went from almost unbearable to manageable super-discomfort, so I continued walking. And walking. And walking. It took FOREVER, it felt like, having gone from 15-16 min/mi pace with 30 seconds of running, 60 seconds of walking, to doing 19-20 minute miles of pure walking. It was boring. I had podcasts, music, audiobooks galore, and I was still bored and uncomfortable and not loving this experience. I also was thinking about it on the way back about how I did not want to do a 3rd lap that day (to get me to my planned 50 miles) walking again.

Scott biked out early to meet me and bring me extra ice, because it was getting hot and I was an hour slower than the day before and risking running out of water that lap if he didn’t. After he refilled my hydration pack and brought it back to me while I walked on, I told him I wanted to be done for the day. He pointed out that when I finished this lap, I would be at 34 miles for the day, and combined with the day before (51), that put me at 85 miles, which would be a new distance PR for me since last year I had stopped at 82. That was true, and that would be a nice place to stop for the day. He reminded me of our ‘rules’ that I could go out the next day and do another lap to get me to 100, and decide during that lap what else I wanted to do. I was pretty sure I didn’t want to do more, but agreed I would decide the next day. So I walked home, completing lap 2 and 34 miles for the day, bringing me to 85 miles overall across 2 days.

Day 2 recovery went a little better, in part because I didn’t do 51 miles (only 34) and I had walked rather than ran the second lap, and also stopped earlier in the day (4pm instead of 7pm). I had more time to shower and bring myself to finally eat an entire 1000 calories before going to bed, again with my feet screaming at me. I had more blisters this time, mostly again on my right foot, but the balls of my feet and the bones of my feet ached in a way they never had before. This time, though, instead of setting my alarm to get up and go by 6am, I decided to sleep for longer, and go out a little later to start my first lap. This was a deviation from my plan, but another deviation I felt was the right one: I needed the sleep to help my body recover to be able to even attempt another lap.

Day 3: Only 16 miles, but hitting 100 for the first time ever

Instead of 6am, I set out on Day 3 around 8:30am. I would have taken even longer to go, but the forecast was for a warm day (we ended up hitting 81F) and I wanted to be done with the lap before the worst of the heat. I thought there was a 10% chance I’d keep going after this lap, but it was a pretty small chance. However, I set out for the planned 16 mile lap and was pleasantly surprised that I was run/walking at about a 15:40 pace! Again, better than I had projected (although yes, I had deviated from my mileage plan the day before), and it felt like a good affirmation that stopping the day before instead of slogging out another walking lap was the right thing to do.

After a first few miles, I toyed with the idea of continuing on. But I knew with the heat I probably wouldn’t stand more than one more lap, which would get me to 116. Even if I went out again the fourth day, and did 1-2 laps, that would MAYBE get me to 150, but I doubted I could do that without starting to cause some serious damage. And it honestly wasn’t feeling fun. I had enjoyed the first day, running in the dark, the fog, the daylight, and the twilight, seeing changing fall leaves and running through piles of them. The second day was also fun for the first lap, but the second lap walking was probably what a lot of ultra marathoners call the “death march” and just not fun. I didn’t want to keep going if it wasn’t fun, and I didn’t want to run myself into the ground (meaning to be so worn down that it would take weeks to months to recover) or into injury, especially when the specific milestones didn’t really mean anything. Sure, I wanted to be a 200 mile ultramarathoner, something that only a few thousand people have ever done – but I didn’t want to do it at the expense of my well-being. I spent a lot of time thinking about it, especially miles 4-8, and was thinking about the fact that the day before I had started, I had gone to a doctor’s appointment and had an official diagnosis confirming my fifth autoimmune disease, then proceeded to run (was running) 100 miles. Despite all the fun challenges of running with autoimmune conditions, I’m in really good health and fitness. My training this year went so well and I really enjoyed it. Most of this ultra had gone so well physically, and my legs and body weren’t hurting at all: the weakness was my feet. I didn’t think I could have trained any differently to address that, nor do I think I could change it moving forward. It’s honestly just hard to run that many hours or that many miles, as most ultramarathoners know, and your feet take a beating. Given that I was running on pavement for all of those hours, it can be even harder – or a different kind of hard – than kicking roots and rocks on a dirt trail. I figured I would metaphorically kick myself if I tried for 116 or 134 and injured myself in a way that would take 6-8 weeks to recover, whereas I felt pretty confident that if I stopped after this lap (at 100), I would have a relatively short and easy recovery, no major issues, and bounce back better than I ever have, despite it being my longest ever ultramarathon. Yes, I was doing it as a multi-day with sleep in between, but both in time on feet and in mileage, it was still the most I’d ever done in 2 or 3 days.

And, I was tired of eating. I was fueling SO well. Per my plans, I set out to do >500 mg of sodium per hour and >250 calories per hour. I had been nailing it every lap and every day! Day 1 I averaged 809 mg of sodium per hour and 290 calories per hour. Day 2 was even increased from that, averaging 934 mg of sodium per hour and 303 calories per hour! Given the decreased caloric burn of day 2 because I walked the second lap, my caloric deficit for day 2 was a mere ~882 calories (given that I also managed to eat a full dinner that night), even though I skipped the last hour as I finished the walking lap. Day 3 I was also fueling above my goals, but I was tired of it. Sooooo tired of it. Remember, I have to take a pill every time I eat, because I have exocrine pancreatic insufficiency (EPI or PEI). I was eating every 30 minutes as I ran or walked, so that meant swallowing at least one pill every 30 minutes. I had swallowed 57 pills on Day 1 and 48 pills on Day 2, between my enzymes and electrolyte pills. SO MANY PILLS. The idea of continuing to eat constantly every 30 minutes for another lap of ~5 or more hours was also not appealing. I knew if I didn’t eat, I couldn’t continue.

A chart with an hourly break down of sodium, calories, and carbs consumed per hour, plus totals of caloric consumption, burn, and calculated deficit across ~27 hours of move time to accomplish 100 miles run.

And so, I decided to stop after one more lap on day 3, even though I was holding up a respectable 15:41 min/mi pace throughout. I hit 100 miles and finished the lap at home, happy with my decision.

Two pictures of me leaning over after my run holding a sign (one reading 50 miles, one reading 100 miles) for each of my cats to sniff.(You can see from these two pictures that I smelled VERY interesting, sweaty and salty and exhausted at the end of day 1 and day 3, when I hit 50 miles and 100 miles, respectively. We have two twin kittens (now 3 years old) and one came out to sniff me first on the first day, and the other came out as I came home on the third day!)

Because I had only run one final lap (16 miles) on day 3, and had so many bonus hours in the rest of the day afterward when I was done and home, I was able to eat more and end up with only a 803 calorie deficit for the day. So overall, day 1 had the biggest deficit and probably influenced my fatigue and perception of pain on day 2, but because I had shortened day 2 and then day 3, my very high calorie intake every hour did a pretty good job matching my calorie expenditure, which is probably why I felt very little muscle fatigue in my body and had no significant sore areas other than the bottoms of my feet. I ended up averaging 821 mg/hr of sodium and 279 calories per hour (taking into account the fact that I skipped two final snacks at the end of day 2 when I was walking it out; ignoring that completely skipped hour would mean the average caloric intake on hours I ate anything at all was closer to 290 calories/hr!)

In total, I ended up consuming 124 pills in approximately 27 hours of move time across my 100 miles. (This doesn’t include enzyme pills for my breakfast or dinners each of those days, either – just the electrolyte and enzyme pills consumed while running!)

AFTERMATH

Recovery after day 3 was pretty similar to day 2, with me being able to eat more and limit my calorie deficit. I’ve had long ~30 mile training runs where I wasn’t very hungry afterward, but it surprised me that even two days after my ultra, I still haven’t really regained my appetite. I would have figured my almost 4000 calorie deficit from day 1 would drive a lot of hunger, so this surprised me.

So too has my physical state: 48 hours following the completion of my 100 miles, I am in *fantastic* shape compared to other multi-day back to back series of runs I’ve done, ultramarathons or not. The few blisters I got, mainly on my right foot, have already flattened themselves up and mostly vanished. I think I get more blisters on my right foot because of breaking my toe last year: my right foot now splays wider in my shoe, so it tends to get more blisters and cause more trouble than my left foot. I got only one blister on my left foot, which is still fluid filled but not painful and starting to visibly deflate now that I’m not rubbing it onto a shoe constantly any more. And my legs don’t feel like I ran at all, let alone running 51+34+16 miles!

I am tired, though. I don’t have brain fog, probably because of my excellent fueling, but I am fatigued in terms of overall energy and lack of motivation to get a lot done yesterday and today (other than writing this blog post!). So that’s probably pretty on par with my effort expended and matches what I expected, but it’s nice to be able to move around without hurting (other than my feet).

My feet in terms of general aches and ows are what came out the worst from my run. Day 2, what hurt was the bottom of the balls of my feet. Starting each night though, I was getting aches all over in all of the bones of my feet. After day 3, that night the foot aches were particularly strong, and I took some Tylenol to help with that. Yesterday evening and today though, the ache has settled down to very minor and only occasionally noticeable. The tendon from the top of my left foot up my ankle is sore and gets cranky when I wear my sneakers (although it didn’t bother me at all while running any of the days), so after tying and re-tying my shoelaces 18 times yesterday to try to find the perfect fit for my left foot, today I went on my recovery walk in flip flops and was much happier.

What I’m taking away from this 200 mile attempt that was only 100 miles:

I feel a little disappointed that I didn’t get anywhere near 200 miles, but obviously, I was not willing to hurt long enough or hard enough to get there. My husband called it a stretch goal. Rationally, I am very happy with my choices to stop at 100 and end up in the fantastic physical shape that I am in, and I recognize that I made a very rational choice and tradeoff between ending in good shape (and health) and the mainly ego-driven benefits of possibly achieving 200 miles (for me).

Would I do anything different? I can’t think of anything. If I somehow had an alternate do-over, I can’t think of anything I would think to change. I’d like to reduce my risk of blisters but I’m already doing all I can there, and dealing with changes in my right foot shape post-broken toe that I have no control over. And I’m not sure how to train more/better for reducing the bottom ball of foot pain that I got: I already trained multiple days, back to back, long hours of feet on pavement. It’s possible that having my doctor’s appointment the day before I started influenced my mental calculation of my future risk/benefit tradeoff of continuing more miles, and so not having had that then may have changed my calculations to do another lap or two, or go out on the 4th day (which I did not). But, I don’t have a do over, and I’ll never know, and I’m not too upset about that because I was able to control what I could control and am again pretty happy with the outcomes. 100 or 150 miles felt about the same to me, psychologically, in terms of satisfaction.

What I would tell other people about attempting multiple day ultramarathons or 200 mile ultramarathons:

Training back to back days is one option, as is long spurts of time on feet walking/hiking/running. I don’t think “just running” has to be the only way to train for these things. I’m also a big proponent of short intervals: If you hear people recommend taking walk breaks, it doesn’t have to be 1 minute every 10 minutes or every mile. It can be as short as every 30 seconds of running, take a walk break! There’s no wrong way to do it, whatever makes your body and brain happy. I get bored running longer (and don’t like it); other people get bored running the short intervals that I do – so find what works for you and what you’re actually willing to do.

Having plans for how you’ll rest X hours and go out and try to make it another lap or to the next aid station works really well, especially if you have crew/pacers/support (for me, my husband) who will stick to those rules and help you get back out there to try the next lap/section. Speaking of sleep/rest, laying down for a while helps as much as sleeping, so even if you can’t sleep, committing to the rest of X hours is also good for resting your feet and everything. I found that the hour laying down before I fell asleep helped my body process the noise of the “ouch” from my feet and it was a lot easier to sleep after that. Plan that you’ll have some down/up time before and after your sleep/rest time, and figure that into your time plans accordingly.

The cheesy “know your why” and “know what you want” recommendations do help. I didn’t want 200 miles badly enough to hurt more for longer and risk months of recovery (or the inability to recover). Maybe you’d be lucky enough to achieve 200 without hurting that bad, that long, or risking injury – or maybe you’ll have to make that choice, and you might make it differently than I did. (Maybe you’re lucky enough to not have 5 autoimmune things to juggle! I hope you don’t have to!) I kind of knew going in that I was only going to hit 200 if all went perfect.

Diabetes and this 200 mile ultramarathon that was a 100 mile ultra:

I just realized that I managed to write an ENTIRE race report without talking about diabetes and glucose management…because I had zero diabetes-related thoughts or issues during these several days of my run! Sweet! (Pun fully intended.)

Remember, I have type 1 diabetes and use an open source automated insulin delivery (AID) system (in my case, still using OpenAPS after alllllll these years), and I’ve talked previously about how I fuel while ultrarunning and juggling blood glucose management. Unlike previous ultras, I had zero pump site malfunctions (phew) and my glucose stayed nicely in range throughout. I think I had one small drift above range for 2 hours due to an hour of higher carb activity right when I shifted to walking the second lap on day 2, but otherwise was nicely in range all days and all nights without any extra thought or energy expended. I didn’t have to take a single “low carb”/hypoglycemia treatment! I think there was one snack I took a few minutes early when I saw I was drifting down slightly, but that was mostly a convenience thing and I probably would not have gone low (below target) even if I had waited for my planned fuel interval. But out of 46 snacks, only one 5-10 minutes early is impressive to me.

I had no issues after each day’s run, either: OpenAPS seamlessly adjusted to the increasing insulin sensitivity (using “autosensitivity” or “autosens”) so I didn’t have to do manual profile shifts or overrides or any manual interference. I did decide each night whether I wanted to let it SMB (supermicrobolus) as usual or stick to temp basal only to reduce the risk of hypoglycemia, but I had no post-dinner or overnight lows at all.

The most “work” I had to do was deciding to wear a second CGM sensor (staggered, 5 days after my other one started) so that I had a CGM sensor session going with good quality data that I could fall back to if my other sensor started to get jumpy, because the sensor session was supposed to end the night of day 4 of my planned run. I obviously didn’t run day 4, but even so I was glad to have another sensor going (worth the cost of overlapping my sensors) in order to have the reassurance of constant data if the first one died or fell out and I could seamlessly switch to an already-warmed up sensor with good data. I didn’t need it, but I was glad to have done that in prep.

(Because I didn’t talk about diabetes a lot in this post, because it was not very relevant to my experiences here, you might want to check out my previous race recaps and posts about utlrarunning like this one where I talk in more detail about balancing fueling, insulin, and glucose management while running for zillions of hours.)

TLDR: I ran 100 miles, and I did it my DIY way: my own course, my own (slow pace), with sleep breaks, a lot of fueling, and a lot of satisfaction of setting big goals and attempting to achieve them. I think for me, the process goals of figuring out how to even safely attempt ultramarathons are even more rewarding than the mileage milestones of ultrarunning.

Running a multi-day ultramarathon by Dana M. Lewis from DIYPS.org

What I’ve Learned From 5,000 Pills Of Pancreatic Enzyme Replacement Therapy (PERT) For Exocrine Pancreatic Insufficiency (EPI/PEI)

I recently reached a weird milestone that no one likely cares about, but that I find fascinating: in the first 534 days of exocrine pancreatic insufficiency (EPI / PEI), I’ve taken more than 5,000 pills of pancreatic enzyme replacement therapy (PERT).

That’s an average of 9.41 pills per day!

PERT (enzymes) helps my body successfully digest the food that I eat, because my pancreas is no longer producing enough enzymes. Like insulin treatment for diabetes, PERT will be a lifelong necessity for me: this number of pills consumed is one that only goes up from here.

Here’s a look at what the pills per day intake has looked like over this time:

  • Min: 2 (early days)
  • Max: 72 (hello, outlier of two ultramarathons! One was 62 miles, the other was 82 miles! Other 30-40+ pill days are likely also ultra 🏃🏼‍♀️ training days, e.g. around 50k of running, which is still 8-9 hours of running and fueling every 30 minutes)
  • Median: 8

Analyzing a graph of my daily PERT enzyme pills, there are noticeable spikes, particularly around my ultramarathon training days. Two distinct spikes at 72 pills per day correspond to my 100k (62 mile) and 82-mile ultra runs.

Here is a graph showing my PERT (enzyme) pills per day totals, there are a few noticeable spikes in the 20-40ish range that are likely ultra training days. The two spikes around 72/day are my 100k (62 mile) and 82 mile ultra runs.

Why so many pills?!

Not everyone with EPI takes as many pills as I do. The number is titrated (adjusted) based on what and how often I eat. A typical meal for me requires 2-3 prescription pills of PERT.

In my case, I sometimes use over-the-counter (OTC) enzymes to ‘top off’ a prescription pill.

For hikes and runs, which I do 4-5 times each week, I eat small amounts every 30 minutes if I’m out for more than 2 hours, which is 3+ times a week. For a run of 5 hours, where I consume 10 snacks, I’d use 10 pills if I went the prescription route. In contrast, I usually use 2-4 OTC pills per snack, which combined costs an average of $0.70. That means $7 in enzyme costs for 5 hours compared to $80 if I had taken prescription PERT! Multiply times several times a week, and you can see why I choose this strategy.

Balancing Cost ($) and Convenience (Fewer Pills)

The “cost” for using OTC pills, though, is 20-40 pills ($7) instead of 10 pills ($80). On a day-to-day basis, my choice depends on convenience, how confident I am in my counts/dosing (I’m very confident for hike/run pre-portioned snacks that I’ve tested rigorously), and other factors.

Increasingly, when I’m not pursuing physical activity, I’m more likely to choose fewer pills at the financial cost of prescription PERT. I’d like to choose fewer pills for physical activity, too, which is why I’ve recently shifted to a slightly more expensive OTC pill that has more enzymes in it, in order to take 1 pill for most snacks instead of 2-4. In a typical long run of 4 hours, for example, instead of 7 snacks resulting in 28 pills, those 7 snacks would instead result in 7 pills! (There’s also a challenge with finding these particular OTC pills, as prescription pill shortage has driven more people to try OTCs and now the OTC pills I prefer are regularly out of stock, too. If you’re curious about using OTC pills with EPI, or prior to a diagnosis of EPI, you may be interested in this post where I describe in more detail using over the counter (OTC) enzyme pills for this purpose.)

Long run days are outliers in my pill count per day numbers and graphs. However, even if I skipped those and only took 8 prescription PERT per day, I’d still have consumed over 4,200 enzyme pills at this point.

EPI or PEI leads to a lot of pill-swallowing, regardless of whether you’re using over the counter enzymes or prescription enzymes.

But they work! Oh, do they work. My GI symptoms used to be most days a week and caused me to feel miserable (read about my experience getting diagnosed with EPI here). Now, I rarely have any symptoms, and when they do occur (likely mistiming a dose compared to what I was eating or taking not quite enough to match what I was eating), they are significantly less bothersome. It’s awesome, and I feel back to “normal” for me well before all of my GI symptoms started years ago! So yes, I have to swallow many pills a day for EPI but my symptoms are completely and regularly managed as a result and my quality of life is back to being what it was before.

If you’re curious to read more about my experiences with EPI, or posts about adjusting enzymes to match what you’re eating, check out DIYPS.org/EPI for a list of other EPI related posts.

If you have EPI and have an iOS device, you also might be interested in checking out PERT Pilot, a free iOS app to track food intake and PERT dosing and outcomes.


You can also contribute to a research study and help us learn more about EPI/pEI – take this anonymous survey to share your experiences with EPI-related symptoms!

You’d Be Surprised: Common Causes of Exocrine Pancreatic Insufficiency

Academic and medical literature often is like the game of “telephone”. You can find something commonly cited throughout the literature, but if you dig deep, you can watch the key points change throughout the literature going from a solid, evidence-backed statement to a weaker, more vague statement that is not factually correct but is widely propagated as “fact” as people cite and re-cite the new incorrect statements.

The most obvious one I have seen, after reading hundreds of papers on exocrine pancreatic insufficiency (known as EPI or PEI), is that “chronic pancreatitis is the most common cause of exocrine pancreatic insufficiency”. It’s stated here (“Although chronic pancreatitis is the most common cause of EPI“) and here (“The most frequent causes [of exocrine pancreatic insufficiency] are chronic pancreatitis in adults“) and here (“Besides cystic fibrosis and chronic pancreatitis, the most common etiologies of EPI“) and here (“Numerous conditions account for the etiology of EPI, with the most common being diseases of the pancreatic parenchyma including chronic pancreatitis, cystic fibrosis, and a history of extensive necrotizing acute pancreatitis“) and… you get the picture. I find this statement all over the place.

But guess what? This is not true.

First off, no one has done a study on the overall population of EPI and the breakdown of the most common co-conditions.

Secondly, I did research for my latest article on exocrine pancreatic insufficiency in Type 1 diabetes and Type 2 diabetes and was looking to contextualize the size of the populations. For example, I know overall that diabetes has a ~10% population prevalence, and this review found that there is a median prevalence of EPI of 33% in T1D and 29% in T2D. To put that in absolute numbers, this means that out of 100 people, it’s likely that 3 people have both diabetes and EPI.

How does this compare to the other “most common” causes of EPI?

First, let’s look at the prevalence of EPI in these other conditions:

  • In people with cystic fibrosis, 80-90% of people are estimated to also have EPI
  • In people with chronic pancreatitis, anywhere from 30-90% of people are estimated to also have EPI
  • In people with pancreatic cancer, anywhere from 20-60% of people are estimated to also have EPI

Now let’s look at how common these conditions are in the general population:

  • People with cystic fibrosis are estimated to be 0.04% of the general population.
    • This is 4 in every 10,000 people
  • People with chronic pancreatitis combined with all other types of pancreatitis are also estimated to be 0.04% of the general population, so another 4 out of 10,000.
  • People with pancreatic cancer are estimated to be 0.005% of the general population, or 1 in 20,000.

What happens if you add all of these up: cystic fibrosis, 0.04%, plus all types of pancreatitis, 0.04%, and pancreatic cancer, 0.005%? You get 0.085%, which is less than 1 in 1000 people.

This is quite a bit less than the 10% prevalence of diabetes (1 in 10 people!), or even the 3 in 100 people (3%) with both diabetes and EPI.

Let’s also look at the estimates for EPI prevalence in the general population:

  • General population prevalence of EPI is estimated to be 10-20%, and if we use 10%, that means that 1 in 10 people may have EPI.

Here’s a visual to illustrate the relative size of the populations of people with cystic fibrosis, chronic pancreatitis (visualized as all types of pancreatitis), and pancreatic cancer, relative to the sizes of the general population and the relative amount of people estimated to have EPI:

Gif showing the relative sizes of populations of people with cystic fibrosis, chronic pancreatitis, pancreatic cancer, and the % of those with EPI, contextualized against the prevalence of these in the general population and those with EPI. It's a small number of people because these conditions aren't common, therefore these conditions are not the most common cause of EPI!

What you should take away from this:

  • Yes, EPI is common within conditions such as cystic fibrosis, chronic pancreatitis (and other forms of pancreatitis), and pancreatic cancer
  • However, these conditions are not common: even combined, they add up to less than 1 in 1000!
  • Therefore, it is incorrect to conclude that any of these conditions, individually or even combined, are the most common causes of EPI.

You could say, as I do in this paper, that EPI is likely more common in people with diabetes than all of these conditions combined. You’ll notice that I don’t go so far as to say it’s the MOST common, because I haven’t seen studies to support such a statement, and as I started the post by pointing out, no one has done studies looking at huge populations of EPI and the breakdown of co-conditions at a population level; instead, studies tend to focus on the population of a co-condition and prevalence of EPI within, which is a very different thing than that co-condition’s EPI population as a percentage of the overall population of people with EPI. However, there are some great studies (and I have another systematic review accepted and forthcoming on this topic!) that support the overall prevalence estimates in the general population being in the ballpark of 10+%, so there might be other ‘more common’ causes of EPI that we are currently unaware of, or it may be that most cases of EPI are uncorrelated with any particular co-condition.

(Need a citation? This logic is found in the introduction paragraph of a systematic review found here, of which the DOI is 10.1089/dia.2023.0157. You can also access a full author copy of it and my other papers here.)


You can also contribute to a research study and help us learn more about EPI/PEI – take this anonymous survey to share your experiences with EPI-related symptoms!