Understanding Fecal Elastase Test Results Including Sensitivity And Specificity And What It Means For Exocrine Pancreatic Insufficiency (EPI or PEI)

One of the challenges related to diagnosing exocrine pancreatic insufficiency (known as EPI or PEI) is that there is no perfect test.

With diabetes, we can see in several different ways what glucose is doing: via fasting glucose levels, HbA1c (a measure of average glucose over roughly the past 3 months), and/or continuous glucose monitoring. We can also test for c-peptide to see if insulin production is ongoing.

Yet for EPI, the tests for assessing whether and how much the pancreas is producing digestive enzymes are much less direct, more invasive, or both.

Some of the tests include a breath test, an invasive secretin pancreatic function test, a 72-hour fecal fat collection test, or a single-sample fecal elastase test.

  • A breath test is an indirect test, which assesses the end-product of digestion rather than digestion itself, and other conditions (like SIBO) can influence the results of this test. It’s also not widely available or widely used.
  • The secretin pancreatic function test is an invasive test involving inserting a tube into the small intestine after giving secretin, which is a hormone that stimulates the pancreas. The tube collects digestive juices produced by the pancreas, which are tested. It’s invasive, costly, and therefore not ideal.
  • For reliability, the 72-hour fecal fat collection test might be ideal, because it directly measures the amount of fat in the stool. It requires stopping enzymes (if someone is already taking them) and consuming a high fat diet. But it also requires collecting stool samples for 3 days – ugh. (The “ugh” is my personal opinion, clearly).
  • The fecal elastase test, in contrast, does not require stopping enzymes. It measures human elastase, whereas digestive enzymes are typically pig-based, so you don’t have to stop enzymes when doing this test. It’s also a single stool sample (so you’re not collecting poop for 3 days in a row). The sensitivity and specificity are different based on the diagnostic threshold, which I’ll talk about below, and the accuracy can be influenced by the sample. Diarrhea, meaning watery poop, can make this test even less reliable. But that’s why it’s good that you can take enzymes while doing this test. Someone with diarrhea and suspected EPI could go on enzymes, reduce their diarrhea so they could have a formed (non-watery) sample for the elastase test, and get a better answer from the fecal elastase test.

The fecal elastase test is commonly used for initial screening or diagnosis of EPI. But over the last two years, I’ve observed a series of problems with how it is being used clinically, based on reading hundreds of research and clinical practice articles and reading thousands of posts of people with EPI describing how their doctor is ordering/reviewing/evaluating this test.

Frequent problems include:

  • Doctors refuse to test elastase, because they don’t believe the test indicates EPI due to the sensitivity/specificity results for mild/moderate EPI.
  • Doctors test elastase, but won’t diagnose EPI when test results are <200 (especially if 100-200).
  • Doctors test elastase, but won’t diagnose EPI even when test results are <100!
  • Doctors test elastase, diagnose EPI, but then do not prescribe enzymes because of the level of elastase (even when <200).
  • Doctors test elastase, diagnose EPI, but prescribe a too-low level of enzymes based on the level of elastase, even though there is no evidence indicating elastase should be used to determine dosing of enzymes.

Some of the problems seem to result from the fact that the elastase test has different sensitivity and specificity at different threshold levels of elastase.

When we talk about “levels” of elastase or “levels” or “types” of EPI (PEI), that usually means the following thresholds / ranges:

  • Elastase <= 200 ug/g indicates EPI
  • Elastase 100-200 ug/g indicates “mild” or “mild/moderate” or “moderate” EPI
  • Elastase <100 ug/g often is referred to as “severe” EPI

You should know that:

  • People with severe EPI (elastase <100) could have no symptoms
  • People with mild/moderate EPI (elastase 100-200) could have a very high level of symptoms and be malnourished
  • People with any level of elastase indicating EPI (elastase <=200) can have EPI even if they don’t have malnourishment (usually meaning blood vitamin levels like A, D, E, or K are below range).

So let’s talk about sensitivity and specificity at these different levels of elastase.

First, let’s grab some sensitivity and specificity numbers for EPI.

  1. One paper that is widely cited, albeit old, reports the sensitivity and specificity of fecal elastase for EPI in people with chronic pancreatitis. You’ll see me talk in other posts about how chronic pancreatitis and cystic fibrosis-related research is over-represented in EPI research, and it may or may not reflect the overarching population of people with EPI. But since it’s widely used, I’ll use it in the examples below, especially because it may be what is driving clinician misunderstanding about this test. With a cut off of <200 ug/g, they found that the sensitivity in detecting moderate/severe EPI is 100%, and the sensitivity for detecting mild EPI is 63%. At that <200 ug/g threshold, the specificity is 93% (which doesn’t distinguish between severities). With a cut off of <100 ug/g, the sensitivity for detecting mild EPI drops to 50%, but the specificity increases to 98%. This means that:
    1. 63% of people with mild EPI would be correctly diagnosed using an elastase threshold of 200 ug/g (vs. only 50% at 100 ug/g).
    2. 100% of people with moderate/severe EPI would be correctly diagnosed using an elastase threshold of 200 ug/g (compared to only 93% or 96% for moderate/severe at 100 ug/g).
    3. Only 7% of people without EPI would falsely test <200 ug/g, and only 2% would falsely test <100 ug/g.
  2. For comparison, a systematic review evaluated a bunch of studies (428 people from 14 studies) and found an average sensitivity of 77% (95% CI of 58-89%) and average specificity of 88% (95% CI of 78-93%).This sensitivity is a little higher than the above number, which I’ll discuss at the end for some context.

So what do sensitivity and specificity mean, and why do we care?

At an abstract level, I personally find it hard to remember what sensitivity and specificity mean.

  • Sensitivity means: how often does it correctly identify the thing we want to identify?

This means a true positive. (Think about x-ray screening at airport security: how often do they find a weapon that is there?)

  • Specificity means: how often does it avoid mistakenly flagging the thing we want to identify? In other words, how often does the test correctly return a negative result for someone without the condition (a true negative rather than a false positive)?

(Think about x-ray screening at airport security: how often does it correctly identify that there are no weapons in the bag? Or how often do they accidentally think that your jam-packed bag of granola and snacks might be a weapon?)
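For readers who like code, the two definitions reduce to simple ratios over a confusion matrix. Here's a minimal sketch; the cohort counts are illustrative, assuming the chronic pancreatitis paper's rates at the <200 ug/g cutoff:

```python
def sensitivity(tp, fn):
    # Of everyone who truly has the condition, what fraction does the test catch?
    return tp / (tp + fn)

def specificity(tn, fp):
    # Of everyone who truly does NOT have the condition, what fraction does
    # the test correctly clear?
    return tn / (tn + fp)

# Illustrative counts: out of 100 people WITH mild EPI, 63 test positive
# (true positives) and 37 are missed (false negatives); out of 100 people
# WITHOUT EPI, 93 test negative (true negatives) and 7 are flagged
# (false positives).
print(sensitivity(tp=63, fn=37))   # 0.63
print(specificity(tn=93, fp=7))    # 0.93
```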

Here is how we apply this to fecal elastase testing for EPI.

For those with moderate/severe EPI, the test is 100% sensitive at correctly detecting those cases if you use an elastase cut off of <200 ug/g. For those with mild EPI, the test drops to only 63% sensitivity at correctly detecting those cases. And 93% of the time, the test correctly excludes EPI when it doesn’t exist (at a <200 ug/g cut off, vs. 98% of the time at a <100 ug/g cut off). Conversely, 7% of people without EPI (100% minus the 93% specificity) would still falsely test below 200 ug/g, and 2% (100% minus the 98% specificity) would still falsely test below 100 ug/g.

Here’s another way of thinking about it, using a weather forecast analogy. Think about how easy it is to predict rain when a major storm is coming. That’s like trying to detect severe EPI: it’s a lot easier, and forecasters are pretty good at spotting major storms.

But in contrast, what about correctly predicting light rain? In Seattle, that feels close to impossible – it rains a lot, very lightly. It’s hard to predict, so we often carry a light rain jacket just in case!

And for mild EPI, that’s what the sensitivity of 63% means: less than two thirds of the time can it correctly spot mild EPI by looking for <200 ug/g levels, and only half the time by looking for <100 ug/g. The signal isn’t as strong so it’s easier to miss.

The specificity of 93% means that the forecast is pretty good at identifying not-rainy days – the test correctly clears most people who don’t have EPI (elastase >200 ug/g). But occasionally (around 7/100 times), it’s wrong.

Table comparing the sensitivity for severe and mild EPI alongside specificity, plus comparing to weather forecast ability for rain in major storms.

Why might clinicians be misinterpreting these numbers for the fecal elastase test?

I hypothesize that in many cases, for the elastase levels now considered to indicate mild/moderate EPI (elastase 100-200 ug/g), clinicians might be accidentally swapping the sensitivity (63%) and specificity (93%) numbers in their mind.

What these numbers tell us is that 63% of the time, we’ll catch mild EPI through elastase testing. This means 37/100 people with actual mild EPI might be missed!

In contrast, the specificity of 93% tells us about accidental false positives, and that 7/100 people without EPI might accidentally get flagged as having possible EPI.

Yet real-world clinical practice seems to swap these numbers, acting as if the accuracy goes the other way and suspecting that elastase 100-200 doesn’t indicate EPI (e.g. assuming 37/100 false positives, which is incorrect: the false positive rate is 7/100).
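Here's a minimal numeric sketch of that difference, using the 63% sensitivity and 93% specificity figures from the chronic pancreatitis paper cited earlier:

```python
# Correct reading of the numbers at the <200 ug/g cutoff:
sens = 0.63   # sensitivity for mild EPI
spec = 0.93   # specificity

n = 100
# False negatives among 100 people WITH mild EPI:
missed_mild_epi = round((1 - sens) * n)
# False positives among 100 people WITHOUT EPI:
false_alarms = round((1 - spec) * n)

print(missed_mild_epi)  # 37 people with mild EPI whom the test misses
print(false_alarms)     # 7 people without EPI incorrectly flagged
```

Swapping the two numbers in your head turns a test that rarely over-diagnoses (7/100 false alarms) into one that seems to over-diagnose a third of the time, which matches the skepticism described above.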

There’s plenty of peer-reviewed and published evidence that people with elastase 100-200 have a clear symptom burden. There’s even a more recent paper suggesting that those with symptoms and elastase of 200-500 benefit from enzymes!

Personally, as a person with EPI, I am frustrated when I see/hear cases of people whose clinicians refuse testing, or don’t prescribe PERT when elastase is <=200 ug/g, because they don’t believe elastase 100-200 ug/g is an accurate indicator of EPI. This data shows that’s incorrect. Regardless of which paper you use and which numbers you cite for sensitivity and specificity, they all end up with way more common rates of false negatives (missing people with EPI) than false positives.

And, remember that many people with FE 200-500 benefit from enzymes, too. At a cutoff of 200 ug/g, the number of people we are likely to miss (sensitivity) at the mild/moderate level is much higher than the number of false positives who don’t actually have EPI. That puts the risk/benefit calculation – to me – such that it warrants using this test, putting people on enzymes, and evaluating symptom resolution over time following PERT dosing guidelines. If people’s symptom burden does not improve, titrating PERT and re-testing elastase makes sense (and that is what the clinical guidelines say to do), but the cost of missing ~37 people out of 100 with EPI is too high!

Let’s also talk about elastase re-testing and what to make of changed numbers.

I often also observe people with EPI who have their elastase re-tested multiple times. Here are some examples and what they might mean.

  • A) Someone who tests initially with a fecal elastase of 14, later retests as 16, then 42 ug/g.
  • B) Someone who tests initially at 200 and later 168.
  • C) Someone who tests initially at 72 and later 142.
  • D) Someone who tests initially as 112 and later 537.

Remember, the key to interpreting elastase is that <=200 ug/g is generally accepted as indicating EPI. It’s also important to remember that the pancreas is still producing some enzymes, so elastase production will vary slightly. But in scenarios A, B, and C, those changes are not meaningful. In scenario A, someone still has clear indicators of severe (elastase <100) EPI. Slight fluctuations don’t change that. The same goes for scenario B: 200 and 168 are both still in the mild/moderate EPI range (elastase <=200). Even scenario C isn’t very meaningful: even though there is an “increase”, the result still clearly indicates EPI.

In most cases, fluctuations in test results likely reflect a combination of natural variation in pancreatic output and test reliability. If someone was eating a very low fat diet and taking enzymes effectively, that may influence the pancreas’s own enzyme production – we don’t actually know what causes natural enzyme levels to fluctuate.

The only meaningful case among these examples is scenario D, where someone initially had a result of 112 and later tested clearly above the EPI threshold (e.g. 537). There are a few cases in the literature where people with celiac seem to have temporary EPI, with elastase production later returning to normal. This hasn’t been documented in other conditions – which doesn’t mean it’s not possible, but we don’t know how common it is. It’s also possible the first sample of 112 was due to a watery sample (e.g. during diarrhea) or other testing inaccuracy. If a third test result was >500, I’d assume it was a temporary fluctuation or test issue, and not a case of EPI. (Yay for that person!) If it were me (and I am not a doctor), I’d try a period without enzymes to confirm that symptoms remained well managed. If the third test was anywhere around 200 or below, I’d suspect something was contributing to fluctuations in pancreatic production and wouldn’t be surprised if enzymes continued to be needed, unless the cause could be resolved.

But what about scenario C where someone “went from severe to mild/moderate EPI”?!

A lot of people ask that. Nothing in the hundreds (seriously, hundreds) of papers about EPI clearly indicates that enzymes should be dosed based on elastase level, or that needs differ across these categories. The “categories” of EPI originally came from direct measurements of enzyme secretion via invasive tests, combined with quantitative measurements of bicarbonate and fat in stools. Now that fecal elastase is well established as a non-invasive diagnostic method, severities are usually estimated based on the sensitivity of these cutoffs for detecting EPI, and that’s it. The elastase level doesn’t actually indicate the severity of the symptom experience, so enzymes should be dosed and adjusted based on the individual’s symptoms and their diet.

In summary:

  • Elastase <=200 ug/g is very reliable, indicates EPI, and warrants starting PERT.
  • There is one small study suggesting even people with elastase 200-500 might benefit from PERT, if they have symptoms, but this needs to be studied more widely.
  • It’s possible clinicians are conflating the sensitivity and specificity, thus misunderstanding how accurately elastase tests can detect cases of mild/moderate EPI (when elastase is 100-200 ug/g).

Let me know if anyone has questions about elastase testing, sensitivity, and specificity that I haven’t answered here! Remember I’m not a doctor, and you should certainly talk with your doctor if you have questions about your specific levels. But make sure your doctor understands the research, and feel free to recommend this post to them if they aren’t already familiar with it: https://bit.ly/elastase-sensitivity-specificity

Personalized Story Prompts for Kids Books and Early Reader Books

For the holidays this year, I decided to try my hand at creating another set of custom, illustrated stories for my nieces and nephews (and bonus nieces and nephews). I have a few that are very advanced readers and/or too old for this, but I ended up with a list of 8 kids in my life from not-yet-reading to beginning reading to early 2nd grade reading level. I wanted to write stories that would appeal to each kid, include them as the main character, be appropriate for their reading (or read-to) level, and also include some of their interests.

Their interests were varied which made it quite a challenge! Here’s the list I worked from:

  • 2nd grade reading level: Minecraft
  • early 2nd grade reading level: soccer, stunt biking, parkour, ninja, Minecraft
  • beginning reading level: soccer, stunt biking, ninja, Spiderman
  • beginning reading level: Peppa Pig, moko jumbies
  • (read to younger child): Minnie Mouse, Peppa Pig, Bluey, and tea parties
  • (read to younger child): Bluey, Olaf, Elsa, & Anna
  • (read to younger child): cars/vehicles

I enlisted ChatGPT, an LLM, and ended up creating stories for each kid, matching their grade levels and interests, then illustrating them.

But illustrating them was still a real challenge: creating images of the characters for every page of the story that stayed similar enough throughout to read as the “same” character.

Illustration challenges and how I got successful prompts:

My first pass on images wasn’t very good. I could get basic details to repeat, but often had images that looked like this – slightly different style and character throughout:

8 different illustrations in slightly different styles and almost different characters of a girl with blonde, shoulder length hair and a purple dress in an enchanted forest

Different styles throughout and that makes it look like a different character, even though it’s the same character in the whole story. This was a book to read to a <3 year old, though, and I thought she wouldn’t mind the different styles and left it as is. I also battled with adding, for personal use, the characters that most interested her: Peppa Pig and Minnie Mouse.

Interestingly, if I prompted it to illustrate a scene including a character “inspired by, but distinct from, Peppa Pig”…it essentially drew Peppa Pig or a character from that show. No problem.

But if you gave the same prompt “inspired by, but distinct from, Minnie Mouse”? No go. No image at all: ChatGPT would block it for copyright reasons and wouldn’t draw any of the image. I riffed a bunch of times and finally was able to prompt a good enough mouse with round ears and a red dress with white polka dots. I had to ultimately illustrate the mouse character alone with the human character, because if I tried to get a Peppa-inspired character and then separately a mouse character, it wanted to draw the mouse with a pig-style face in the correct dress! I could never work around that effectively for the time I had available (and all the other books I was trying to illustrate!) so I stopped with what I had.

This was true for other characters, too, with copyright issues. It won’t draw anything from or like Bluey – or Frozen, when prompted. But I could get it to draw “an ethereal but warm, tall female adult with icy blonde hair, blue eyes, in an icy blue dress”, which you can see in the fourth image on the top row here:

Another series of illustrations with slightly different characters but closer in style throughout. there's one image showing a Frozen-inspired female character that I got by not prompting with Frozen.

I also managed to get slightly closer matching characters throughout this, but still quite a bit of variability. Again, for a young being-read-to-child, it was good enough for my purposes. (I never could get it to draw a Bluey-like character, even when I stopped referencing Bluey by name and described the shape and character, so I gave up on that.)

I tried a variety of prompts and series of prompts for each book. Sometimes, I would give it the story and prompt it with each page’s text, asking for an illustration and to keep it in the same style and the same character as the previous image. That didn’t work well, even when I told it in every prompt to use the same style and character plus the actual image prompt. I then tried to create a “custom” GPT, with the GPT’s instructions to use the same style throughout. That started to give me slightly better results, but I still had to remind it constantly to use the same style.

I also played around with taking an image that I liked, starting a new chat, and asking it to describe that image. Then I’d use that prompt to create a new prompt, describing the character in the same way. That started to get me slightly better results, especially when I did so using the custom GPT I had designed (you can try using this GPT here). I started to get better, more consistent characters:

A series of images of a young cartoon-drawn boy with wavy blonde hair riding a bike through an enchanted forest.

 

A series of drawings of a cartoon-like character with spiky blonde hair, blue eyes, and various outfits including a ninja costume

Those two still had some variability, but were much improved over the first several books. They are for the beginning and second-grade reading levels, too – older kids with more attention to detail – so it was worth the extra effort to make theirs more consistent.

The last one with the ninja and ninja outfits is another one that ran into copyright issues. I tried to have it illustrate a character inspired by, but distinct from, Spiderman – nope, no illustration at all. I asked it to illustrate the first picture in the soccer park with a spider strand looping in the corner of the image, like Spiderman had swung by but was out of sight and not pictured – NOPE. You can’t even use a prompt that mentions Spiderman at all, even if Spiderman isn’t in the picture! (I gave up and moved on without illustrating spiderwebs, even though Spiderman is described in the story.)

My other favorites – also fairly consistent – were two more of the early reader books:

A series of images showing a young cartoon boy with wavy brown hair at a car fair

The hard part from that book was actually trying to do the cars consistently, rather than the human character. The human character was fairly consistent (although in different outfits, despite clear outfit prompts – argh) throughout, because I had learned from the previous images and prompt processes and used the Custom GPT, but the cars varied more. But, for a younger reader, hopefully that doesn’t matter.

The other, more-consistent character one for an early reader had some variations in style but did a better job matching the character throughout even when the style changed.

Another example with a mostly consistent young cartoon drawn girl with whispy blonde pigtails and big blue eyes, plus moko jumbies and peppa pig

How I wrote each story:

I also found some processes for building better stories. Again, see the above list of very varied interests for each kid. Some prompts were straightforward (Minecraft) and others were about really different characters or activities (moko jumbies and Peppa Pig? Minnie Mouse and Peppa Pig? soccer ninja and Minecraft?).

What I ended up doing for each:

  1. In a new ChatGPT window (not the custom GPT for illustrating): Describe the reading level; the name of the character(s); and the interests. Ask it to brainstorm story ideas based on these interests.
  2. It usually gave 3 story ideas in a few sentences each, including a title. Sometimes, I would pick one and move on. Other times, I would take one of the ideas and tweak it a bit and ask for more ideas based on that. Or, I’d have it try again generally, asking for 3 more ideas.
  3. Once I had an idea that I liked, I would ask it to outline the story, based on the chosen story idea and the grade level we were targeting. Sometimes I would tweak the title and other times I would take the title as-is.
  4. Once it had the outline, I could have it then write the entire story (especially for the younger, beginner reader or read-to levels that are so short), but for the “chapter” books of early 2nd and 2nd grade reading level, I had it give me a chapter at a time, based on the outline. As each chapter was generated, I edited and tweaked it and took the text to where I would build the book. Sometimes, I would re-write the whole chapter myself, then give it back the chapter text and ask it to write the next one. If you didn’t give it back, it wouldn’t know what the chapter ended up as, so this is an important step to do when you’re making more than minor sentence construction changes.
  5. Because I know my audience(s) well, I tweaked it heavily as I went, incorporating their interests. For example, in the second images I showed above, there’s a dancing dog. It’s their actual dog, with the dog named in the story along with them as characters. Or in the chapter book for the character with the bike, it described running up a big mountain on a quest and being tired. I tossed in an Aunt-Dana reference including reminding the character about run-walking as a way to keep moving forward without stopping and cover the distance that needs to be covered. I also tweaked the stories to include character traits (like kindness) that each child has, and/or behaviors that their family prioritizes.
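The step-by-step process above could be sketched as prompt templates. This is a hypothetical illustration – the function names and exact wording are mine, not a transcript of my actual prompts:

```python
def brainstorm_prompt(name, reading_level, interests):
    # Step 1: ask for story ideas matched to the kid's level and interests.
    return (f"Brainstorm 3 story ideas (a few sentences each, with a title) "
            f"for a {reading_level} book starring {name}, "
            f"based on these interests: {', '.join(interests)}.")

def outline_prompt(idea, reading_level):
    # Step 3: outline the chosen (possibly tweaked) idea at the target level.
    return (f"Outline a story at a {reading_level} reading level "
            f"based on this idea: {idea}")

def next_chapter_prompt(outline, previous_chapter_as_edited):
    # Step 4: feed back the chapter *as edited* -- otherwise the model
    # doesn't know what the chapter ended up as.
    return (f"Outline: {outline}\n"
            f"Previous chapter, as it now stands: {previous_chapter_as_edited}\n"
            f"Write the next chapter, consistent with both.")

# Hypothetical example (names/interests are placeholders):
p = brainstorm_prompt("Ada", "early 2nd grade", ["soccer", "Minecraft"])
print(p)
```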

I described the image process first, then the story writing, in this blog post, but I actually did the opposite for each book. I would write (brainstorm, outline, write, edit, write) the entire book, then start a new chat window (eventually solely using my custom GPT) and ask for illustrations. Sometimes, I would give it the page of the story’s text and ask it to illustrate it. That’s helpful when you don’t know what to illustrate, and it did fairly well for some of the images (especially the Minecraft-inspired ones!). Ultimately, though, I would often get an image, ask what the prompt was for that image, tweak the prompt, and give it back to better match the story or what I wanted to illustrate. Once I was regularly asking for the image prompts, I realized that repeating the character details in every image prompt helped with consistency. So I would reuse those ad-nauseam details myself in a longer prompt, which resulted in better images throughout, and spent more energy deciding myself what to illustrate to best match the story.

All in all, I made 7 custom books (and 8 copies – I copied one of the Minecraft books and converted it to a different named character for a friend’s child!). Between writing, editing, and illustrating, I probably spent an average of one hour per book! That’s a lot of time, but it did get more efficient as I went, and in some cases the hour included completely starting over and re-working the images in the book for consistency compared to the version I had before. The next books I create will probably take less time, both because I figured out the above processes and because hopefully DALL-E and other illustration tools will get better at illustrating the same character consistently across the multiple prompts needed to illustrate a story.

How other people can use this to create stories – and why:

I have been so excited about this project. I love, love, love to read and I love reading with my nieces and nephews (and bonus kids in my life) and finding books that match their interest and help spark or maintain their love of reading. That’s why I did this project, and I have been bursting for WEEKS waiting to be able to give everyone their books! I wanted it to be a surprise for their parents, too, which meant that I couldn’t tell 2/3 of my closest circles about my cool project.

One of my friends without young kids that I finally told about my project loved the idea: she works as staff at an elementary school, supporting some nonverbal students who are working on their reading skills. She thought it would be cool to make a book for one student in particular, and described some of her interests: violins, drums, raspberries, and unicorns. I was in the car when she told me this, and I was able to follow the same process as above in the mobile ChatGPT app: list the interests, then ask for a brainstorm of story ideas for a beginning-reading-level book with some repetitive text using those interests to aid in reading. It created a story about a unicorn who gathers other animals in the forest to play in an orchestra (with drums and violins) and eat raspberries. I had it illustrate the story, and it did so (with slightly different unicorns throughout). I only had to have it re-draw one image, because it put text in one of the last images that didn’t need to be there.

Illustrations from a quick story about a unicorn, drums, violin, and an orchestra, plus raspberries

It was quick and easy, and my friend and her student LOVED it, and the other teachers and staff at the school are now working on personalized books for a lot of other students to help them with reading skills!

It really is an efficient and relatively easy way to generate personalized content; it can do so at different reading levels (especially when a teacher or someone who knows the student can tweak it to better match the reading level or sounds and words they are working on next); and you can generate pretty good matching illustrations too.

The hardest part is consistent characters; when you don’t need consistency throughout a whole book, the time drops to about 5 minutes to write, tweak, and illustrate an entire story.

Illustrations require a paid ChatGPT account, but if you have one and want to try out the custom GPT I built for (slightly more consistent) illustrations of stories, you can check it out here.

Custom stories: prompting and effective illustrating with ChatGPT, a blog post by Dana M. Lewis from DIYPS.org