Making it possible for researchers to work with #OpenAPS or general Nightscout data – and creating a complex json to csv command line tool that works with unknown schema

This is less of an OpenAPS/DIYPS/diabetes-related post, although that is normally what I blog about. However, since we created the #OpenAPS Data Commons on Open Humans, to allow those of us who desire to donate our diabetes data to research, I have been spending a lot of time figuring out the process from uploading your data to how data is managed and shared securely with researchers. The hardest part is helping researchers figure out how to handle the data – because we PWDs produce a lot of data :) . So this post explains some of the challenges of the data management to get it to a researcher-friendly format. I have been greatly helped over the years by general purpose open-source work from other people, and one of the things that helps ME the most as a non-traditional programmer is plain language posts explaining the thought process by behind the tools and the attempted solution paths. Especially because sometimes the web pages and blog posts pop higher in search than nitty gritty tool documentation without context. (Plus, I’ve been taking my own advice about not letting myself hold me back from trying, even when I don’t know how to do things yet.) So that’s what this post is!

Background/inspiration for the project and the tools I had to build:

We’re using Nightscout, which is a remote data-viewing platform for diabetes data, made with love and open source and freely available for anyone with diabetes to use. It’s one of the best ways to display not only continuous glucose monitor (CGM) data, but also data from our DIY closed loop artificial pancreases (#OpenAPS). It can store data from a number of different kinds and brands of diabetes devices (pumps, CGMs, manual data entries, etc.), which means it’s a rich source of data. As the number of DIY OpenAPS users are growing, we estimate that our real-world use is overtaking the amount of total hours of data from clinical trials of closed loop artificial pancreas systems.  In the #WeAreNotWaiting spirit of moving quickly (rather than waiting years for research teams to collect and analyze their own data) we want to see what we can learn from OpenAPS usage, not only by donating data to help traditional researchers speed up their work, but also by co-designing research studies of the things of most value to the diabetes community.

Step 1: Data from users to Open Humans

I thought Step 1 would be the hardest. However, thanks to Madeleine Ball, John Costik, and others in the Nightscout community, a simple Nightscout Data Transfer App was created that enables people with Nightscout data to pop it into their Open Humans accounts. It’s then very easy to join different projects (like the OpenAPS Data Commons) and share your data with those projects. And as the volunteer administrator of the OpenAPS Data Commons, it’s also easy for me to provide data to researchers.

The biggest challenge at this stage was figuring out how much data to pull from the API. I have almost 3 years worth of DIY diabetes data, and I have numerous devices over time uploading all at once…which makes for large chunks of data. Not everyone has this much data (or 6-7 rigs uploading constantly ;)). Props to Madeleine for the patience in working with me to make sure the super users with large data sets will be able to use all of these tools!

Step 2: Sharing the data with researchers

This was easy. Yay for data-sharing tools like Dropbox.

Step 3: Researchers being able to use the data

Here’s where thing started to get interesting. We have large data files that come in json format from Nightscout. I know some researchers we will be working with are probably very comfortable working with tools that can take large, complex json files. However…not all will be, especially because we also want to encourage independent researchers to engage with the data for projects. So I had the belated realization that we need to do something other than hand over json files. We need to convert, at the least, to csv so it can be easily viewed in Excel.

Sounds easy, right?

According to basic searches, there’s roughly a gazillion ways to convert json to csv. There’s even websites that will do it for you, without making you run it on the command line. However, most of them require you to know the types of data and the number of types, in order to therefore construct headers in the csv file to make it readable and useful to a human.

This is where the DIY and infinite possibility nature of all the kinds of diabetes tools anyone could be using with Nightscout, plus the infinite ways they can self-describe profiles and alarms and methods of entering data, makes it tricky. Just based on an eyeball search between two individuals, I was unable to find and count the hundred+ types of data entry possibilities. This is definitely a job for the computer, but I had to figure out how to train the computer to deal with this.

Again, json to csv tools are so common I figured there HAD to be someone who had done this. Finally, after a dozen varying searches and trying a variety of command line tools, I finally found one web-based tool that would take json, create the schema without knowing the data types in advance, and convert it to csv. It was (is) super slick. I got very excited when I saw it linked to a Github repository, because that meant it was probably open source and I can use it. I didn’t see any instructions for how to use it on the command line, though, so I message the author on Twitter and found out that it didn’t yet exist and was a not-yet-done TODO for him.

Sigh. Given this whole #WeAreNotWaiting thing (and given I’ve promised to help some of the researchers in figuring this out so we can initiate some of the research projects), I needed to figure out how to convert this tool into a command line version.

So, I did.

  • I taught myself how to unzip json files (ended up picking `gzip -cd`, because it works on both Mac and Linux)
  • I planned to then convert the web tool to be able to work on the command line, and use it to translate the json files to csv.

But..remember the big file issue? It struck again. So I first had to figure out the best way to estimate the size and splice or split the json into a series of files, without splitting it in a weird place and messing up the data. That became jsonsplit.sh, a tool to split a json file based on the size you give it (and if you don’t specify, it defaults to something like 100000 records).

FWIW: 100,000 records was too much for the more complex schema of the data I was working with, so I often did it in smaller chunks, but you can set it to whatever size you prefer.

So now “all” I had to do was:

  • Unzip the json
  • Break it down if it was too large, using jsonsplit.sh
  • Convert each of these files from json to csv

Phew. Each of these looks really simple now, but took a good chunk of time to figure out. Luckily, the author of the web tool had done much of the hard json-to-csv work, and Scott helped me figure out how to take the html-based version of the conversion and make it useable in the command line using javascript. That became complex-json2csv.js.

Because I knew how hard this all was, and wanted other people to be able to easily use this tool if they had large, complex json with unknown schema to deal with, I created a package.json so I could publish it to npm so you can download and run it anywhere.

I also had to create a script that would pass it all of the Open Humans data; unzip the file; run jsonsplit.sh, run complex-json2csv.js, and organize the data in a useful way, given the existing file structure of the data. Therefore I also created an “OpenHumansDataTools” repository on Github, so that other researchers who will be using Nightscout-based Open Humans data can use this if they want to work with the data. (And, there may be something useful to others using Open Humans even if they’re not using Nightscout data as their data source – again, see “large, complex, challenging json since you don’t know the data type and count of data types” issue. So this repo can link them to complex-json2csv.js and jsonsplit.sh for discovery purposes, as they’re general purpose tools.) That script is here.

My next TODO will be to write a script to take only slices of data based on information shared as part of the surveys that go with the Nightscout data; i.e. if you started your DIY closed loop on X data, take data from 2 weeks prior and 6 weeks after, etc.

I also created a pull request (PR) back to the original tool that inspired my work, in case he wants to add it to his repository for others who also want to run his great stuff from the command line. I know my stuff isn’t perfect, but it works :) and I’m proud of being able to contribute to general-purpose open source in addition to diabetes-specific open source work. (Big thanks as always to everyone who devotes their work to open source for others to use!)

So now, I can pass researchers json or csv files for use in their research. We have a number of studies who are planning to request access to the OpenAPS Data Commons, and I’m excited about how work like this to make diabetes data more broadly available for research will help improve our lives in the short and long term!

The only thing to fear is fear itself

(Things I didn’t realize were involved in open-sourcing a DIY artificial pancreas: writing “yes you can” style self-help blog posts to encourage people to take the first step to TRY and use the open source code and instructions that are freely available….for those who are willing to try.)

You are the only thing holding yourself back from trying. Maybe it’s trying to DIY closed loop at all. Maybe it’s trying to make a change to your existing rig that was set up a long time ago.  Maybe it’s doing something your spouse/partner/parent has previously done for you. Maybe it’s trying to think about changing the way you deal with diabetes at all.

Trying is hard. Learning is hard. But even harder (I think) is listening to the negative self-talk that says “I can’t do this” and perhaps going without something that could make a big difference in your daily life.

99% of the time, you CAN do the thing. But it primarily starts with being willing to try, and being ok with not being perfect right out of the gate.

I blogged last year (wow, almost two years ago actually) about making and doing and how I’ve learned to do so many new things as part of my OpenAPS journey that I never thought possible. I am not a traditional programmer, developer, engineer, or anything like that. Yes, I can code (some)…because I taught myself as I went and continue to teach myself as I go. It’s because I keep trying, and failing, then trying, and succeeding, and trying some more and asking lots of questions along the way.

Here’s what I’ve learned in 3+ years of doing DIY, technical diabetes things that I never thought I’d be able to accomplish:

  1. You don’t need to know everything.
  2. You really don’t particularly need to have any technical “ability” or experience.
  3. You DO need to know that you don’t know it all, even if you already know a thing or two about computers.
  4. (People who come into this process thinking they know everything tend to struggle even more than people who come in humble and ready to learn.)
  5. You only need to be willing to TRY, try, and try again.
  6. It might not always work on the first try of a particular thing…
  7. …but there’s help from the community to help you learn what you need to know.
  8. The learning is a big piece of this, because we’re completely changing the way we treat our diabetes when we go from manual interventions to a hybrid closed loop (and we learned some things to help do it safely).
  9. You can do this – as long as you think you can.
  10. If you think you can’t, you’re right – but it’s not that you can’t, it’s that you’re not willing to even try.

This list of things gets proved out to me on a weekly basis.

I see many people look at the #OpenAPS docs and think “I can’t do that” (and tell me this) and not even attempt to try.

What’s been interesting, though, is how many non-technical people jumped in and gave autotune a try. Even with the same level of no technical ability, several people jumped in, followed the instructions, asked questions, and were able to spin up a Linux virtual machine and run beta-level (brand new, not by any means perfect) code and get output and results. It was amazing, and really proved all those points above. People were deeply interested in getting the computer to help them, and it did. It sometimes took some work, but they were able to accomplish it.

OpenAPS, or anything else involving computers, is the same way. (And OpenAPS is even easier than most anything else that requires coding, in my opinion.) Someone recently estimated that setting up OpenAPS takes only 20 mouse clicks; 29 copy and paste lines of code; 10 entries of passwords or logins; and probably about 15-20 random small entries at prompts (like your NS site address or your email address or wifi addresses). There’s a reference guide, documentation that walks you through exactly what to do, and a supportive community.

You can do it. You can do this. You just have to be willing to try.

Improving #OpenAPS connectivity with automatic Bluetooth tethering (and switching)

One of my favorite things about developing and designing new OpenAPS tools is that if it works for me, it probably will work for someone else, too, and is worth sharing. These little tweaks and hacks add up to improving the real-world lived experience (usability) of living with DIY devices quite a bit…and I’m hoping that continuing to remove that friction enables people with diabetes to live their lives & take action more easily elsewhere, less distracted by diabetes.

So this weekend, Saturday was about enabling easier re-running of the setup scripts to add advanced features more easily in the future.

But Sunday became all about Bluetooth.

Background

Recently, several people have made a concerted effort to create and improve the directions to enable people to connect their OpenAPS rigs to their phones, using Bluetooth.

Without Bluetooth capabilities, when someone left the house or a known wifi network, they would either have to plug in a CGM receiver to get BGs (or have xDrip); or “hotspot” their phone to connect the rig to the Internet. It wasn’t a big deal, but it was something else you had to get into the habit of doing every time you left.

With Bluetooth tethering, you can connect your rig to the phone. And we added the feature so that if you dropped off a wifi network (you left home; or your router at home went down), then your rig automatically established Bluetooth connection and your phone would provide Internet connectivity to your rig. Great!

Making it easier for PWDs with loved ones (spouses/partners/parents/etc.) supporting them

However, today I noticed that because I have both Scott and my phones enabled and configured, sometimes the rigs would grab my phone’s hotspot, and sometimes his (depending on the timing). As the PWD, I would prefer my phone to be the primary phone for Bluetooth, and to only grab Scott’s if mine is out of range/unavailable. And I realized that this will probably be true for most people: kids may sometimes carry a phone, but not always, so it’ll make sense to check for a PWD’s phone first before cycling to try their support network’s phones next.

..so off we went to build that in. Scott also added code that makes it so that if your rig spots an open wifi, but it has a captive portal (meaning it requires passwords or accepting T&C, which the computer can’t automatically do, so it really doesn’t enable Internet access) and wifi ultimately doesn’t work, it will turn off wifi so the Bluetooth can provide connectivity..until the Bluetooth goes away. So it makes it easier for the rig to automatically stay online while you’re going to and from various places that do and don’t have open wifi networks for connectivity.

More connectivity is awesome

I was telling someone the other day why having easier connectivity and remote troubleshooting options is awesome – even as an adult. When a PWD is busy (at school, or on a stage presenting, or at a meeting, or whatever), a loved one can remote in and see what’s going on in the rig and resolve any issues, allowing the PWD to live their life.

That’s something to ask the commercial manufacturers of AP systems as they are in the pipeline to roll out to the broader community of people living with diabetes. For any commercial system you’re considering, ask the manufacturer:

  • How will your system enable me to live my life successfully?
  • How can see I easily see my data in the ways that I want to see it, on the devices that I want to see it on?
  • How will my loved ones be able to see my data?
  • How will my loved ones in a different location be able to help troubleshoot when things are going on?

These are the details that make the difference. This is why #WeAreNotWaiting.