Dear Reader,
Last time we talked about challenges associated with Variational Quantum Algorithms in general and today we’ll focus more specifically on Variational Quantum Eigensolver (VQE). All of the things we’ve covered previously are relevant for VQE as well, but here we’ll focus on challenges and progress that are more specific for VQE.
If you don’t remember exactly how VQE works, you can refresh your memory with the first article in the series – “VQE explained”, which is also a good place to start if you’re new here.
One more thing – while I was working on this article a new preprint came out on arxiv by Tilly et al. It is a really great piece of work and it made my life much easier. Therefore I consider this article a lightweight review of the current state of VQE, but if you’d prefer an actual deep scientific dive, that’s the place to go!
Without further ado, let’s start with…
Ansatz design
Ansatz design is a big part of ongoing research for VQE. However, I won’t be going into more details in this article, for a couple of reasons:
 I covered ansatz design in the previous part.
 Reviewing various ansatzes would require some introduction to quantum chemistry, which is beyond the scope of this piece.
If anyone is interested, you can find a great review in Fedorov et al. or in more details in Anand et al. And if you need some introduction to quantum chemistry, the first part of this review by Cao et al is a decent starter.
Hamiltonian construction
This is a topic that requires way more quantum chemistry background than I have. I found section 3 and 4 of Tilly et al. covers this topic way better than I would, so if you’re looking for a good reference, that’s the place. It contains explanations of concepts such as the JordanWigner or BravyiKitaev transformations, which allow us to translate Hamiltonians from the realm of quantum chemistry to quantum computing and which are definitely some terms I’ve come across a lot but did not understand for quite a long time.
It’s also worth checking out this tutorial to the FermiHubbard model by Warren Alphonso, it also explains some of these concepts very well.
Ok, after these two disappointingly short sections, let’s get into concepts which do not require background in quantum chemistry!
Measurement problem
VQE is basically an estimation problem – given a parameterized quantum circuit you want to estimate the expectation value of an operator (Hamiltonian) with certain precision. Two big issues are how to design the circuit and how to get the right parameters, but we’ve talked about them a lot in the previous article. Now let’s talk about “certain precision” – we’ll work on a concrete example of a \(CO_2\) molecule. To be clear, this problem is associated with every “cost function evaluation” in VQE that we do while optimizing parameters, not just the final result.
A standard accuracy that we desire in quantum chemistry is called “chemical accuracy” and is equal to 1kcal/mol^{1}, which is roughly equal to 1.6 mHa^{2}). But for this exampe we want to be extra safe – in the end, these quantum devices are far from perfect, so let’s aim for an accuracy of \(\epsilon = 0.5 mHa\). Fortunately, the relation between the number of measurements we need to do and the target accuracy is fairly simple: \(M = \frac{K}{\epsilon^2}\). \(K\) is a constant that depends on a molecule of interest, the estimation strategy used^{3} and certain other assumptions about the problem. In Gonthier et al they calculated that for \(CO_2\) it’s \(K=8000\). After substituting values in the equation above it turns out we need 32 billions measurements to properly estimate the expectation value of interest. This number seems high, but to give you a better sense of how much it actually is, this is equivalent to (given some reasonable assumptions described in Gonthier et al ) roughly 39 days of calculations.
Let’s stop here for a moment to appreciate that.
We need to run a quantum computer, for 39 days, non stop, to get a single estimation of energy^{4}.
And you know how many energy estimations you need to perform to actually find good parameters for VQE if we run an optimization loop? I tried it for a simple \(H_2\) molecule with much more optimistic assumptions and got 87, but it can easily increase to thousands.
Therefore, one of the biggest challenges for VQE is how to reduce this number of measurements and fortunately, there are a couple of methods to do this. I described some of them in the previous post, like “interpolating algorithms”, and below you can find the description of how grouping and measurements allocation works. They tackle different components in the “overall VQA structure” (see VQA post), so some of these methods can be used in conjunction.
Grouping
VQE is all about finding the lowest eigenvalue of a given Hamiltonian, which is expressed as a sum of Pauli terms. For each term we need to run a separate circuit and perform multiple measurements. Then we combine the results of all the runs and we calculate the energy of a given state based on that.
However, there’s a big problem with this naive way of using VQE. Let’s say our Hamiltonian looks like this: \(X_0 + Y_1 + Z_2\). It consists of three terms and in the most basic implementation we would just run a separate circuit for each term. However, each term affects a different qubit, and hence, you can run just one circuit and measure all of them at the same time (we say that these terms are comeasurable).
Let’s consider a slightly more complicated example:
\[Z_0 \cdot X_1 + Y_1 \cdot X_2 + X_2 \cdot X_3 + X_0 + Z_3\]There are a couple of ways to gather these terms in comeasurable groups, one being:
Group 1: \(Z_0 \cdot X_1 + Z_3\)
Group 2: \(Y_1 \cdot X_2 + X_0\)
Group 3: \(X_2 \cdot X_3\)
And another being:
Group 1: \(Z_0 \cdot X_1 + X_2 \cdot X_3\)
Group 2: \(Y_1 \cdot X_2 + X_0 + Z_3\)
Obviously the second allows us to perform a smaller number of evaluations to get the same result.
It turns out that performing grouping efficiently is already a pretty hard problem – there has been a lot of research in recent years to find better algorithms to do this. You can find a summary of these methods in Table 1 of Huggins et al. Some key takeaways from this table for our discussion:
 Number of groups depends polynomially on the number of qubits^{5}. While “polynomial scaling” usually means “good” in algorithms, \(N^4\) might be actually pretty brutal in practice.
 There are different tradeoffs that we need to take into account when choosing grouping methods. Some work best with certain QPU topologies, some others increase the number of gates, etc.
 Smaller number of groups does not necessarily imply a smaller total number of overall measurements! More groups might actually allow us to get higher precision by achieving better energy estimation per group with the same number of measurements^{6}.
Section 5 of Tilly et al. provides an introduction into some of the grouping methods, so if this topic is of interest for you, that’s where you should go.
Measurements allocation
Let’s say we have our groups now and we want to actually run the circuits and get some measurements. Should we just go ahead and do that? Well, not necessarily. Let’s say we have a Hamiltonian that looks like this:
\[H = H_1 + H_2 + H_3 = 5 Z_0 + 3 Z_1 + 2 Z_0 \cdot Z_1\]And a circuit which creates a state:
\[ \psi \rangle = ( \cos{(\frac{\pi}{6})} 00 \rangle + \sin{(\frac{\pi}{6})} 10 \rangle)\]By calculating expectation values by hand, we can see that
\(\langle H_1 \rangle = 5 \langle Z_0 \rangle = 5 \cdot 0.5 = 2.5\)
\(\langle H_2 \rangle = 3 \langle Z_1 \rangle = 3 \cdot 1 = 3\)
\(\langle H_3 \rangle = 2 \langle Z_0 \cdot Z_1 \rangle = 2 \cdot 0.5 = 1\)
\(\langle H \rangle = 6.5\)
Looks easy, right?
But now let’s see what happens if we try to measure these on a QPU^{7}, with 100 shots for each term^{8}.
\(\langle H_1 \rangle = 5 \cdot 0.66 = 3.3\)
\(\langle H_2 \rangle = 3 \cdot 1.0 = 3\)
\(\langle H_3 \rangle = 2 \cdot 0.58 = 1.16\)
\(\langle H \rangle = 7.46\)
Ok, let’s try once again, now with 1000 shots:
\(\langle H_1 \rangle = 5 \cdot 0.488 = 2.44\)
\(\langle H_2 \rangle = 3 \cdot 1.0 = 3\)
\(\langle H_3 \rangle = 2 \cdot 0.504 = 1.008\)
\(\langle H \rangle = 6.448\)
As we can see, the estimates we got are not exact and they depend on the number of shots. So we can increase the accuracy by increasing the number of shots, problem solved, right? Well, not really, as when we increase the number of shots, we also increase the time of running calculations.
Can we be smarter about it? Absolutely…
One can use a fixed total number of shots (budget) and allocate it among various groups. So let’s say we have 300 shots in our budget, how should we allocate it? The first approach is to do a uniform allocation, i.e. 100 shots for each operator. This is what we did in the first example. But we can do better than this. We know that the operators have weights 5, 3 and 2. So it would make sense to put most shots from our budget towards the first operator and least shots towards the last one. Let’s say we do that proportionally – all weights add up to 10, so we use \(\frac{5}{10}\) shots for \(H_1\) (150), \(\frac{3}{10}\) for \(H_2\) (90) and \(\frac{2}{10}\) for \(H_3\) (60). What did we get?
\(\langle H_1 \rangle = 5 \cdot 0.53 = 2.65\)
\(\langle H_2 \rangle = 3 \cdot 1.0 = 3\)
\(\langle H_3 \rangle = 2 \cdot 0.66 = 1.32\)
\(\langle H \rangle = 6.97\)
This result is better from what we got from just using 100 shots per operator. But this could be a random fluke, so in order to evaluate that, we should calculate the standard deviation (std) of our final energies. So after repeating my experiment 10000 times, it turns out that for uniform shot allocation, std is 0.469 and for proportional shot allocation it’s 0.420.
You might wonder – proportional allocation is definitely better than uniform, but what is the optimal allocation strategy? Well, as always, the answer is not simple. It will generally depend on the exact grouping method used, and getting a perfect answer is generally not possible: indeed, the number of shots necessary depends on the standard deviation of each individual term, which itself depends on the current wavefunction on the quantum computer. In some cases it is possible to mathematically solve for the optimal allocation by making some assumptions about these standard deviations, for more details see Rubin et al. section 5.1. Arrasmith et al. is also a good reference for this problem.
What we did so far was allocation of shots for a single evaluation of energy. But what if we wanted to have a shot budget for the whole optimization process? Perhaps at the beginning of the optimization we don’t need as much precision as towards the end? One approach to this has been proposed in Cade et al. (but also described very well in BonetMonroig et al. ), which authors call “3stageapproach”. It works by defining a shot budget for the whole optimization process and later dividing the optimization in 3 stages. As an example, they use 100, 1000 and 10000 samples in stages 1, 2 and 3 accordingly. However, different phases have assigned different numbers of energy evaluations. The ratio they proposed is 10:3:1, so for the first phase they use \(\frac{10}{10+3+1}\) of all energy evaluations, in the second \(\frac{3}{14}\) and just \(\frac{1}{14}\) in the last.
This allows for much more efficient usage of the available resources, as you can see in the plot below (it’s, fig 6 from Cade et al.).
This is just one way to approach this problem, you can find others in Gu et al. or Arrasmith et al.
VQE beyond ground states
VQE is usually used for finding the ground state of a given Hamiltonian. However, it could also be used to solve other problems, such as:
 Finding excited states of the system
 Calculating vibrational spectrum of a molecule
Finding excited states of a system
In many cases we’re interested in finding the excited states of a given system, not just the ground state. For example, in order to get information about the color of a material we need to know what the energy difference between the ground stare and first excited state of the molecule is. One approach to do this has been described in McClean et al. and Colless et al. It basically boils down to the following steps:
 Run a regular VQE in order to find the ground state of the system.
 Once the algorithm has converged, modify the Hamiltonian in such a way that it contains information about excited states.
 Measure new Hamiltonian.
 Perform some classical postprocessing of the results.
 Voila!
There are many other variations of VQE, if this is of interest to you I recommend section 9 of Tilly et al.
Calculating vibrational modes
Another interesting property that we might want to learn is associated with the vibrations existing in the molecule. What is it about? Well, atoms in the molecules are not rigid, they are in constant motion in respect to each other and this movement, the vibrations, can affect the properties of the molecule in nontrivial ways. This is important for topics such as astrochemistry or modeling fuel combustion, but also particularly useful for a class of molecules that have certain structures.
In principle, we can use the old good VQE to solve it – we just need the to use a different Hamiltonian, which describes those vibrational modes. However, since this problem is different from the electronic structure problem that we usually solve, there are somewhat different considerations, e.g. we use different mappings to construct the Hamiltonian or we need to pay more attention to the excited states.
If you would like to learn more about this topic, I talked about it with Nicolas Sawaya, research scientists working at Intel in my podcast and here you can find his paper where he describes these ideas in more details.
Outside of quantum chemistry
In principle you can use VQE for any situation where you have a matrix and you want to learn what its lowest eigenvalue is.
One example can be using VQE for solving combinatorial optimization problems. Usually, we think about using QAOA for such problems, but QAOA also takes a Hamiltonian as a problem, so we could use VQE to find its ground state as well. If you’d like to better understand the difference, I have covered the difference between these two methods in my “QAOA explained” article. An example of using VQE for combinatorial optimization problems can be found e.g.: in Liu et al.
Apart from that, I have not encountered any particularly exciting or promising use cases for VQE outside of chemistry, so I’m not going to talk about it too much. However, if you know one, please leave me a comment under this article and I will be happy to include it :)
Conclusion
Thank you for reading!
As it turns out, getting to the forefront of VQE actually requires more quantum chemistry background than I expected, hence article ended up being shorter than I expected.
If you’d like to get deeper into the topic, follow the references I left throughout the text and make sure to spend some time with Tilly et al!
I wanted to thank people who helped me review the draft!
 Rafał Ociepa
 Nicolas Sawaya
 Jérôme Gonthier
 Boniface Yogendran
 Peter Johnson
If you don’t want to miss the next (and the last!) part of the series on VQAs, which will be a deep dive into QAOA, sign up for the newsletter. And since QAOA is kind of my thing, expect this to be a really deep dive – I can’t wait to write it!
Have a nice day!
Michał
P.S. The graphics I used as a “thumbnail” on the main page of my blog comes from the Wombo project. It’s amazing!
Footnotes:

Why does anyone use this particular number? As it gives you orderofmagnitude correct answers to reactions rates, based on Arrhenius equation at room temperature. ↩

mHa is a unit of energy used in quantum chemistry. ↩

For example it might change depending on whether you group measurements or not. ↩

Nowadays you need to recalibrate your device every couple of hours in order to make sure its performance doesn’t degregate, source. ↩

It’s actually in the number of spin orbitals, which corresponds to the number of qubits in the most commonly used JordanWigner transform. ↩

For more details see Huggins et al, and section 5.3 from Tilly et al., or section V of Yen et al. ↩

Well, to be honest with you, I have used a simulator ;) ↩

We often say “shots” instead of “measurements” in QC. ↩