Musty Thoughts

Dear Reader!

This article was first published on the blog of Zapata Computing, the company I work for, on August 7th, 2020. You can find more articles written by my colleagues there; the link is here.

TL;DR

Complex computational science is hard to manage, especially when operating at any significant scale. Some of the most prevalent problems are even more pronounced when doing quantum computations.
I’ll share some of my best practices (would love to hear yours, too!) to avoid or solve these challenges.
Our workflow management system, Orquestra®, handles many of these challenges for me by design. They are the reason we built Orquestra in the first place and why we continue to evolve it.
I added some visual examples below, including a MaxCut with QAOA.

The challenges of computational sciences learned the hard way

Doing science has never been easy. Thankfully it has gotten much better in the last hundred years. Think about Rutherford’s students who had to sit in some pitch-black cellar counting light flashes! We can just sit in our office (nowadays probably a home office) and run some simulations on a computer. While convenient, it does come with its own set of challenges. Let me share a couple of examples from my own life with you to illustrate some of these challenges.

One of the first times I realized that using a computer as a primary tool for work might not be as great as it seems was when I was working on my BSc. I was running simulations of pedestrians. Each simulation was performed for different parameters of the agents, and the outputs were saved to a file. Another script read the data from all the files and performed some analysis. I used my MacBook to run it all, and unfortunately, it broke down (I guess from overheating). Fortunately, the hard drive survived, but I needed some other machine for the analysis. The only one I had was a Raspberry Pi running Ubuntu. I copied my Python scripts there, installed all the libraries, and ran the analysis. Imagine my surprise when I discovered that the results looked like absolute garbage! Fast forward two days of debugging, and it turned out that a function I was using for sorting filenames worked differently on Ubuntu and OS X.

Lesson learned 1: The OS you’re using for running your code matters.
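
Here is a minimal sketch of one way to guard against this kind of surprise. The story above doesn't name the function involved, so this assumes the general problem: the order of a file listing depends on the platform. The names `natural_key` and `sorted_filenames`, as well as the file names, are made up for illustration. The idea is to never trust the order the OS hands you and to sort explicitly with a key you control:

```python
import re


def natural_key(filename: str) -> tuple:
    """Build a sort key so that 'run_10' comes after 'run_2'."""
    # A capturing group makes re.split keep the digit runs, so the parts
    # always alternate: text, digits, text, digits, ...
    parts = re.split(r"(\d+)", filename)
    chunks = [int(p) if i % 2 else p for i, p in enumerate(parts)]
    # The raw name breaks ties such as 'run_1' versus 'run_01'.
    return (chunks, filename)


def sorted_filenames(filenames) -> list:
    """Return filenames in an explicit, platform-independent order."""
    return sorted(filenames, key=natural_key)


# Hypothetical listing, in whatever order the OS happened to return it.
listing = ["run_10.csv", "run_2.csv", "run_1.csv"]
print(sorted_filenames(listing))  # ['run_1.csv', 'run_2.csv', 'run_10.csv']
```

Because the key depends only on the names themselves, the analysis script sees the files in the same order on every machine.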

On another occasion, I was doing research for my MSc thesis. It was a similar thing – many simulations, many different combinations of parameters, a couple of hours for each. It was hard to keep track of all that, but obviously, I had a good system for doing it. I stored each result in a separate file. The name of the file indicated the set of parameters I used. There was no space for confusion!

Unfortunately, it turned out there was. I started a simulation, then discovered a bug, so I terminated it and fixed it. I started it again but quickly realized that I should introduce another parameter. I added it and, in the meantime, also fixed how the optimizer worked. After a week, I had no idea which run was done with what version of the code, and I had to repeat hundreds of hours of simulations.

Lesson learned 2: You have to be meticulous about keeping track of your experiments. (I also became acquainted with git, my new best friend for life, who now helps with this task.)
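
As a rough sketch of what "meticulous" can look like in practice: store every result together with its parameters and the version of the code that produced it. This is not the system I used back then; `save_run`, the record layout, and the example parameters are all hypothetical. It only assumes that the code lives in a git repository, and it falls back to "unknown" if it doesn't:

```python
import json
import subprocess
import tempfile
import time
from pathlib import Path


def _git(*args: str) -> str:
    """Run a git command and return its output, or 'unknown' if it fails."""
    try:
        done = subprocess.run(
            ["git", *args], capture_output=True, text=True, check=True
        )
        return done.stdout.strip()
    except (subprocess.CalledProcessError, FileNotFoundError):
        return "unknown"


def save_run(result: dict, params: dict, out_dir: Path) -> Path:
    """Write a result together with what is needed to trace its origin."""
    out_dir.mkdir(parents=True, exist_ok=True)
    record = {
        "params": params,
        "commit": _git("rev-parse", "HEAD"),
        # Non-empty output means the run used code that was never committed.
        "uncommitted_changes": _git("status", "--porcelain"),
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "result": result,
    }
    path = out_dir / f"run_{time.time_ns()}.json"
    path.write_text(json.dumps(record, indent=2, sort_keys=True))
    return path


# Hypothetical run, written to a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_run({"energy": -1.23}, {"optimizer": "L-BFGS-B", "steps": 100}, Path(tmp))
    print(saved.read_text())
```

The `uncommitted_changes` field matters for exactly the situation described above: a bug fixed between two runs but never committed would otherwise leave both runs pointing at the same commit.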

One or two years later, when I was working at Estimote, our team faced a problem that we were not able to solve alone. We needed to work together with another team to make use of their expertise. They had previously performed tests similar to ours, so we wanted to see how our methods worked with their data and vice versa. Unfortunately, it turned out that it wasn’t easy. We used different data formats, conventions, and tools to do similar things, and it wasn’t immediately obvious how to translate between these two worlds. Of course, we finally succeeded, learned a lot in the process, and made our code and processes much more compatible, but it took us more time than we had expected at the beginning.

Lesson learned 3: It’s important to operate within a unified framework. It requires some overhead, but if your project is bigger than you or your team, it will pay off.

I could tell you more failure stories like this, and if you work in the computational sciences business, you likely have plenty of your own. Let’s take a moment to summarize some common challenges, which, by the way, are magnified when doing quantum computations:

Long runs – There is a big difference between working on a simulation where a single run takes a second, a couple of minutes, or a couple of days. Finding a critical bug in your code that has been running for a couple of days might be, well, upsetting.
Parallelization – One way to deal with long runs is to parallelize them. Sometimes it’s just a matter of a couple of lines of code, sometimes a couple of days of work.
Interoperability – Some software packages simply don’t go well together, and you can spend hours trying to have them both work on one machine. At the same time, the same setup might not work on another computer.
Keeping track of your work – Making sure all the settings and parameters of a given simulation are stored properly might be challenging. Using version control helps but doesn’t solve all the issues.
Sharing your work – Making it easy for others to use your results or code often requires good communication and following standard conventions.
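
To illustrate the "couple of lines of code" case from the parallelization point, here is a minimal sketch of a parameter sweep using Python's standard `concurrent.futures` module. The `simulate` function and its two parameters are placeholders, not code from any of the projects mentioned here, and the sketch assumes the runs are fully independent of each other:

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def simulate(params: tuple) -> float:
    """Stand-in for an expensive simulation; here it is just arithmetic."""
    density, speed = params
    return density * speed


def run_sweep(grid: list, use_processes: bool = True) -> dict:
    """Run simulate() for every parameter combination on a pool of workers."""
    # Processes suit CPU-heavy simulations; threads suit I/O-bound work.
    executor_cls = ProcessPoolExecutor if use_processes else ThreadPoolExecutor
    with executor_cls() as pool:
        # map() preserves input order, so results line up with the grid.
        results = list(pool.map(simulate, grid))
    return dict(zip(grid, results))
```

When using processes, call `run_sweep(grid)` from under an `if __name__ == "__main__":` guard, since on some platforms worker processes re-import the main script. The "couple of days of work" case starts when the runs share state or need to be spread over several machines.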

How I’ve addressed some of these challenges in my work

Here are some good practices I’ve learned over the years to make my life easier – sometimes slightly, sometimes significantly. The list is long; these are just some examples.

Versioning/version control – Keeping track of the version of code that you’re using is helpful. I cannot imagine working without git. At the same time, it’s useful to also keep track of the exact versions of the libraries you’re using.
Automation – “If you need to do something more than twice, it’s time to automate it” – this rule of thumb is useful in general, not only in this context. And while most of us appreciate the usefulness of automation, we often tend to automate some tasks much later than we should.
DRY (Don’t Repeat Yourself) – One of the most important rules of software development. Code duplication leads to inconsistencies and errors.
Avoid “quick & dirty” solutions – Sometimes, settling for a suboptimal solution under time pressure is a smart move, but immediately add a “TODO” to your code. Make a note to yourself or set a reminder so that you fix it as soon as you’re back on track. If you make poor coding choices out of laziness, do yourself a favor and stop now. Reality will strike back sooner or later.
Use a profiler – Sometimes, a bottleneck in the performance can be in the most surprising places. It’s really useful to use a profiler from time to time to see how much time your program spends on calling particular functions. The investment of 30 minutes might shorten your runtimes significantly.
Following good software practices – This is a broad category, and I’ve already touched on this in previous points. If you don’t have a good software engineering background, educate yourself. Read a book on this topic (“Clean Code” is my personal favorite) or watch a couple of YouTube videos, and it will pay off.
Talk to others – Not only about what research they do, but also about how they do it.
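
For the profiler point above, here is a small sketch using `cProfile` and `pstats` from Python's standard library. The functions `slow_sum` and `analysis` are invented stand-ins for real research code, and `profile_report` is just one convenient way to wrap the profiler:

```python
import cProfile
import io
import pstats


def slow_sum(n: int) -> int:
    """Deliberately wasteful helper so that it shows up in the profile."""
    return sum(i * i for i in range(n))


def analysis() -> int:
    """Hypothetical analysis step that calls the helper many times."""
    return sum(slow_sum(2_000) for _ in range(200))


def profile_report(func, top: int = 5) -> str:
    """Run func under cProfile and return the `top` most expensive entries."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time: the function's own time plus everything it calls.
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


print(profile_report(analysis))
```

For a whole script, running `python -m cProfile -s cumulative your_script.py` gives a similar report without touching the code.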

Enter, Orquestra — a tool built to tackle these challenges for me

It all started on a warm summer day in June 2019. I came to Cambridge for a job interview at Zapata. I had a whole day of conversations, learned a lot about what people do at Zapata, and about the product they were building – a computational platform that we now call Orquestra. At the end of the day, tired and jetlagged, I had a meeting with Peter Johnson, one of the founders, and we decided to take a stroll around Cambridge. He asked me what I thought of the platform and the idea of working to build it. I had fallen in love with the idea of the product, but the opportunity to build a tool I had always dreamt of was pure bliss!

Okay, so what is Orquestra and why am I so enthusiastic about it?

Orquestra is a workflow management system designed for performing computations utilizing quantum computers (though it works equally well with classical and classical-quantum computations).

At the top, there are workflows. A workflow is a representation of an algorithm – it defines what steps need to be performed, in what order, and the relationship between them. (A workflow is not limited to a single algorithm, though; it could also chain multiple algorithms together.)

The next layer is tasks, the specific steps in the workflow. For each task, we define the inputs, how we want to process them, and what the output should be.
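
To make the idea of "steps, their order, and the relationships between them" concrete, here is a toy sketch in plain Python. To be clear, this is not Orquestra's workflow format or API. The task names are invented, and the standard-library `graphlib` module (Python 3.9+) merely stands in for what a workflow system has to work out: an order in which every task runs after the tasks it depends on.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each task lists the tasks whose outputs it needs.
workflow = {
    "generate_graph": [],
    "build_circuit": ["generate_graph"],
    "optimize_parameters": ["build_circuit"],
    "save_results": ["optimize_parameters", "generate_graph"],
}


def execution_order(tasks: dict) -> list:
    """Return an order in which every task runs after its inputs are ready."""
    return list(TopologicalSorter(tasks).static_order())


print(execution_order(workflow))
# ['generate_graph', 'build_circuit', 'optimize_parameters', 'save_results']
```

Tasks that don't depend on each other have no fixed relative order, and that freedom is what lets a workflow system run them in parallel.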

I know it doesn’t sound particularly exciting, so I’ll show you where the magic (at least part of it) lies. Here is how we run a workflow for solving MaxCut with QAOA:

[Figure: running an Orquestra workflow that solves MaxCut with QAOA]

(It’s a small thing, but I just love to watch how it unfolds.)

Here is how we run a workflow for solving MaxCut with QAOA for three graphs, each using three different optimization algorithms.

[Figure: running the MaxCut with QAOA workflow for three graphs, each with three optimization algorithms]

A couple of cool facts about these examples:

Both cases take roughly the same amount of time to run. Why? Because once I have defined in my workflow that I want to run these things in parallel, I don’t have to worry about how it happens.
Another thing – since we’ve designed Orquestra to be very modular, changing the optimization algorithm or the backend we use is just a matter of one line of code in the workflow.
Since every step is executed on a separate machine (a Docker image), I don’t have to worry about software dependencies between different steps.
Once the calculations are finished, I can access all the data (both final and intermediate results) either by downloading a JSON file or connecting to a database.
All the code that’s executed is either part of some git repository or the workflow itself. I can always go back and check what workflow was submitted and what version of the code I used.
If I wanted to optimize the cost of running the workflow, I could specify what machines I want to use – add RAM and CPU for the heavy tasks and use smaller ones for the trivial steps.
If you want to introduce any changes to the code someone else has written, you can create another branch or fork of their repository with your changes, specify it in your workflow and run your version.

I could go on, but you get the idea. We’ve been using prototypes of Orquestra internally for well over a year to do our research and customer work, and it definitely speeds up our work and saves us some major headaches.

Is Orquestra a silver bullet that will seamlessly solve all of your problems? Is it the right tool for any kind of project? Does Orquestra work flawlessly 100% of the time? Of course not. But we work hard – mostly by day and sometimes by night – to make the answers to these questions as close to “yes” as we can, so that one day, some scientists will look back at the software tools we’re using today and think of us with the same pity that we feel today for Rutherford’s students.

Yours forever in computational struggles,

Michał

The photo next to the title of this article on the main page comes from ORNL and is used under the CC BY 2.0 license.

Common Computational Pitfalls and How to Avoid Them?