Quantum Computing Report

Podcast with Yudong Cao, co-founder and CTO of Zapata Computing

Yudong Cao, co-founder and CTO of Zapata Computing, is interviewed by Yuval Boger. Yudong and Yuval talk about how quantum computing can be used for generative modeling, the implications of this usage for various industries, classification vs prediction modeling, and much more.

Transcript

Yuval Boger: Hello, Yudong, and thank you for joining me today.

Yudong Cao: Hello, Yuval. Very happy to be here.

Yuval: Happy to have you. So who are you and what do you do?

Yudong: I’m co-founder and CTO at Zapata Computing. We’re an enterprise software company focused on bringing quantum computing, and quantum advantage, to enterprises. At Zapata, I oversee all of our research efforts, including quantum algorithms for different applications as well as what we call quantum-related classical techniques, which I’d be excited to touch on at some point as well.

Yuval: Would I be correct in saying that the view is basically that customers are looking for a solution, they don’t particularly care whether it’s quantum, quantum-inspired, or something else, and therefore you basically just try to deliver the best solution for the customer?

Yudong: I think the answer will be yes and no. It is true that there are highly critical tasks in an enterprise setting, where the bottleneck can be solved by whatever means necessary. For highly critical tasks like some sort of logistics optimization, scheduling, or planning, the sort of tasks that naturally come with the sheer scale of the business, it doesn’t matter if you’re selling cars or drugs or anything else. Because of the scale and complexity of a business operation, these are the bottlenecks that naturally arise, and those are the types of problems where enterprises ask, how much does it cost to actually solve it, and how well does it solve it? There are also other flavors of problems that are intentionally open-ended and innovative. That’s another aspect we see: global enterprises are making an active, strategic investment in those fundamental research ideas so that they can become early adopters in the field.

And we’ve been fortunate to work with quite a few of those forward-looking enterprises, where our scientists actually sit down with their teams to collaborate on science projects and advance the field together. Those are the types of projects typically driven by innovation units, the kind of organizations whose mandate is, by construction, forward-looking. With them, we want to be open-minded and collaborative. Then, with a business unit, we want to be very clearly defined in terms of the results, the timeline, and the deliverables. So I would say it’s a mix of two different approaches.

Yuval: When the idea of having Zapata on the podcast first came about, a topic of generative AI was suggested, and I said, well, this is really interesting, but this is a quantum computing podcast. And I understand that, actually, there’s a connection there. So what is the connection between generative AI, which we know so well today because of ChatGPT and others, and quantum computing?

Yudong: Yeah, that’s a great question, and especially a question that really hits at the core of what we’ve been working on for pretty much the past three or four years, where our bet for near-term quantum advantage is generative modeling. This is, I would say, a bit different from what people typically refer to as generative AI these days, but it’s highly related. Generative modeling is a paradigm of machine learning where you’re training a machine learning model to learn the probability distribution of a set of training samples. It’s an unsupervised scheme – a real-world machine learning setting, because most data in the world are unlabeled. Unsupervised learning is the whole iceberg under the surface of machine learning – it allows you to build a model that, without any training labels, without any specification of where you want the model to go, captures the inherent correlations within the data.
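To make that concrete, here is a minimal, hypothetical sketch of generative modeling in the sense Yudong describes: fitting the probability distribution of unlabeled bitstring samples by maximum likelihood, then sampling new data from the trained model. The source distribution, sample size, and learning rate are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Unlabeled training samples: 4-bit strings (integers 0..15) drawn from an
# unknown source distribution that happens to favor strings with two 1s.
src = np.array([3.0 if bin(x).count("1") == 2 else 1.0 for x in range(16)])
src /= src.sum()
data = rng.choice(16, size=2000, p=src)

# Model: a categorical distribution parameterized by logits theta, trained
# by gradient ascent on the average log-likelihood of the samples.
theta = np.zeros(16)
emp = np.bincount(data, minlength=16) / len(data)  # empirical frequencies
for step in range(500):
    p = np.exp(theta - theta.max())
    p /= p.sum()
    theta += 0.5 * (emp - p)  # gradient of log-likelihood w.r.t. logits

# The trained model now generates new samples that resemble the data,
# with no labels involved anywhere.
samples = rng.choice(16, size=5, p=p)
print("generated:", [format(int(s), "04b") for s in samples])
```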

So in the case of generative AI, this is the pre-training stage. For some of the common transformer architectures, for, say, large language modeling, you typically do this unsupervised training for a very long time to arrive at a pre-trained model, which is just a very good autocomplete machine. It’s trained on a large corpus of data, and it’s good at replicating text similar to the text in that corpus. But this is still very far from what people call generative AI. You need to take two or three more steps on top of this generative modeling to actually fine-tune the model, and do even more fine-tuning with human input, to turn it into something like ChatGPT. So that’s the precise relationship between generative AI and what we’ve referred to as generative modeling in the past few years.

I would say that in terms of computing cost, pre-training is the dominant portion of the journey of building generative AI capabilities. These days, there’s a flurry of foundation models. These are the pre-trained generative models that lie at the basis of many generative AI applications. Once you have a foundation model, you can proceed to fine-tune it for different tasks. And that is the type of task that we’ve been interested in for the past three or four years. The strategy is to think of quantum computers as statistical engines. The intuition is, in fact, aligned with many existing results in the field – the quantum supremacy experiments, of which we’ve seen at least three examples around the world, essentially say that quantum devices are very powerful statistical engines. They can create statistical correlations that are otherwise very difficult to replicate; you would probably need a supercomputer to replicate them.

But the quantum supremacy experiments are a very contrived setup. They are a great engineering milestone, and great evidence that such an advantage in terms of statistics exists. There’s also a lot of work on the theoretical side in the literature characterizing different regimes of how you can create this kind of statistical power that is very hard for classical computers to catch up with in these generative modeling settings. So our approach in the past few years has been more of a roll-up-our-sleeves-and-actually-do-it type of attitude, where we actually apply this hybrid quantum-classical architecture, taking a quantum computer as a latent space and combining that with a state-of-the-art classical machine learning architecture to generate images. We set up this process so that we can not only generate images, but also generate other more interesting structures like molecules, and in the future, also text and even higher-dimensional data.
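As a rough illustration of that hybrid pattern (a classically simulated toy, not Zapata’s actual architecture): a stand-in “quantum prior” produces correlated latent bitstrings by Born-rule measurement, and a classical generator, here just a linear map standing in for a deep network, turns them into data points. The state, the map, and all dimensions below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3  # number of latent qubits (simulated classically here)

# Stand-in "quantum prior": a random 3-qubit state. Measuring it in the
# computational basis yields correlated latent bitstrings; on hardware this
# would be a parameterized circuit trained jointly with the generator.
psi = rng.normal(size=2**n) + 1j * rng.normal(size=2**n)
psi /= np.linalg.norm(psi)
probs = np.abs(psi) ** 2

def sample_latent(batch):
    ints = rng.choice(2**n, size=batch, p=probs)
    return ((ints[:, None] >> np.arange(n)) & 1).astype(float)  # bitstrings

# Classical generator: a toy linear map standing in for a deep network
# (e.g., the generator half of a GAN) that maps latent bits to data space.
W = rng.normal(size=(n, 2))
x = sample_latent(4) @ W  # four generated "data" points
print(x)
```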

The logical chain starts from the statistical power of quantum devices; the next step is to apply that in a hybrid machine learning algorithm, and then apply that to the generative model. And that generative modeling capability is, today, the foundation of what most people would call generative AI. Of course, it’s not generative AI by itself, but it’s the foundation: with some fine-tuning, you can get to useful generative AI applications. I hope that clarifies the precise relationship of how everything fits together.

Yuval: I think it would be good to dive into a couple of examples in a little bit. But first, because you mentioned language models: my understanding is that when you do classical training of language models for the generative model, you’re using billions of parameters. How large does a quantum computer need to be to create an effective model, whether for language or for something else?

Yudong: The precise answer should be, we won’t know until we’ve tried it. But I can point to some intuitions that people have found, where you can map quantum states into probability distributions. There are a lot of examples in the physics literature where people have shown ways of leveraging the unique quantum mechanical ability to derive statistics from a quantum state. In quantum mechanics, you can measure qubits in different bases. The CHSH game is ironclad evidence of how quantum correlations can give rise to something that’s hard to replicate without quantum mechanics.
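For reference, the CHSH game he mentions can be checked in a few lines: with the optimal measurement angles on a singlet state, the quantum CHSH value reaches 2√2 ≈ 2.83, above the bound of 2 that any classical (local hidden variable) strategy can achieve. A small numpy verification:

```python
import numpy as np

# CHSH with the singlet state: quantum correlations reach |S| = 2*sqrt(2),
# beyond the classical (local hidden variable) bound of 2.
X = np.array([[0, 1], [1, 0]])
Z = np.array([[1, 0], [0, -1]])
singlet = np.array([0, 1, -1, 0]) / np.sqrt(2)

def meas(theta):
    # Spin measurement along angle theta in the X-Z plane.
    return np.cos(theta) * Z + np.sin(theta) * X

def E(ta, tb):
    # Correlation <A(ta) (x) B(tb)> in the singlet state.
    return singlet @ np.kron(meas(ta), meas(tb)) @ singlet

a, ap, b, bp = 0, np.pi / 2, np.pi / 4, -np.pi / 4
S = E(a, b) + E(ap, b) + E(a, bp) - E(ap, bp)
print(abs(S))  # ~2.828 = 2*sqrt(2), above the classical bound of 2
```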

I think there are more generalized demonstrations of this sort of non-classical behavior. This is what we are betting on, essentially: quantum correlations can give us a richer model than the conventional classical techniques that we’ve been using. Another point worth mentioning is quantum-related or quantum-inspired techniques, such as tensor networks, which are a near-term bridge toward quantum advantage. Tensor networks are a very elegant way to represent correlations among different variables. Those variables could very well be quantum degrees of freedom, and in that case, a tensor network is a very good representation of quantum states. There are ways to map a tensor network description of quantum states directly onto a quantum computer and then go even further on the quantum computer to enrich the kind of correlations.

We’ve actually done some work that demonstrates this. You can think of it like a relay race: you go as far as you can on the classical computer with tensor networks, then you load the model onto a quantum computer and do even more fine-tuning to enrich it. You can use tensor networks and quantum computers for many other things.
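The relay-race idea can be illustrated with a purely classical toy (standing in for the tensor-network-to-quantum handoff, which it is not): stage one trains a restricted model until its expressivity saturates, and stage two warm-starts a richer model from stage one’s parameters, so the loss keeps dropping after the handoff. All data and models here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Target: 2-bit data with a correlation a product model cannot capture
# (mostly 00 or 11).
target = np.array([0.45, 0.05, 0.05, 0.45])
data = rng.choice(4, size=5000, p=target)
emp = np.bincount(data, minlength=4) / len(data)

def nll(p):
    return -np.sum(emp * np.log(p + 1e-12))

# Stage 1: independent-bit (product) model -- its expressivity saturates.
m = np.array([emp[1] + emp[3], emp[2] + emp[3]])  # per-bit marginals
p1 = np.array([(1 - m[0]) * (1 - m[1]), m[0] * (1 - m[1]),
               (1 - m[0]) * m[1], m[0] * m[1]])
print("stage-1 loss:", nll(p1))

# Stage 2: hand off to a fully expressive model, warm-started at stage 1.
theta = np.log(p1)
for _ in range(300):
    p2 = np.exp(theta)
    p2 /= p2.sum()
    theta += 0.5 * (emp - p2)  # gradient ascent on log-likelihood
print("stage-2 loss:", nll(p2))  # lower: the correlation is now captured
```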

Tensor networks are in fact a very good approximation of these high-dimensional, nonlinear structures. There is plenty of literature on so-called tensorizing of neural networks, and we know there’s a very intimate connection between tensor networks and neural networks. In the case of a large language model, there’s a last step of sampling from a probability distribution over the next token, or the next set of tokens, to generate. In that scenario, you would replace the neural network construction with a tensor network, and then we’re off to the races, because now we can either stick with the tensor network and run it classically, or we can set ourselves up for quantum computing, where the tensor network is ready to be ported onto a quantum device and we can get an even richer representation on a quantum computer.
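As a toy version of that last step, here is a hypothetical matrix product state (MPS, one common tensor network) used as a sequence model: contract it into a joint distribution via the Born rule, then sample tokens one at a time from the conditionals, just like next-token sampling. A real MPS sampler would use cached environments rather than the full tensor; the shapes and random cores below are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, D = 4, 2, 3  # sequence length, vocabulary size, bond dimension

# A random matrix product state (MPS): one (D, d, D) tensor per position,
# with trivial (size-1) bonds at the boundaries. In an LLM-style use, this
# object would replace the neural net defining the next-token distribution.
cores = [rng.normal(size=(1 if i == 0 else D, d, 1 if i == n - 1 else D))
         for i in range(n)]

# Contract the MPS into the full amplitude tensor (fine at toy scale).
amp = cores[0]
for c in cores[1:]:
    amp = np.tensordot(amp, c, axes=([-1], [0]))
amp = amp.reshape([d] * n)      # amplitude psi(x1..xn)
p = amp**2 / np.sum(amp**2)     # Born-rule joint distribution

# Ancestral sampling: draw tokens one at a time from the conditionals,
# exactly the "what's the next token" step in autoregressive generation.
tokens = []
for i in range(n):
    marg = p.sum(axis=tuple(range(1, p.ndim)))  # marginal of current slot
    t = rng.choice(d, p=marg / marg.sum())
    tokens.append(int(t))
    p = p[t]                                    # condition on the draw
print("sampled sequence:", tokens)
```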

But I have to caveat that there’s a lot of subtlety, especially when it comes to this kind of relay race. When you pass the baton from the classical computer to the quantum computer, there are many engineering subtleties to work through. If you really think about what exactly happens in this handoff from classical to quantum, and then the quantum computer makes a measurement and comes back, there are many stages, and to unpack all of that would probably take another hour. But I’ll stop here; this is the basic idea.

Yuval: Moving from theory to practice, you’ve been working on this for a number of years. Can you give me some examples of industries or applications where you’ve experimented and you’re seeing good results from this approach?

Yudong: Yes, the first proof point was in 2020, where we worked with an eight-qubit IonQ device in conjunction with a classical deep learning architecture, a variant of a generative adversarial network, to generate very high-resolution MNIST handwritten digits. So that’s the first proof point. Later on, we built on top of this generative modeling ability to think about how we could solve combinatorial optimization problems. Instead of using the hybrid machine learning setup to dream up handwritten digits, which by itself is not particularly interesting, we did generative modeling for financial portfolios. But we did that in the context of an optimization problem: basically, an additional layer on top of the generative modeling. We have an optimizer that’s exploring the space of possible portfolios, and some of the portfolios are better than others. What the generative model does is try to replicate the portfolios discovered by the algorithm that perform better than the others. If you have a hundred portfolios, maybe there are ten that are very high-performing. The generative model will pay a lot of attention to those ten portfolios and try to learn: what is it about these portfolios that gives them high performance? Then it tries to generate as many portfolios as it can that look like those high-performing portfolios. So generative modeling in this sense is like a booster for the optimization scheme. What we discovered was that we could actually discover portfolios of much lower risk than the best portfolio discovered by a classical algorithm.
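In outline, that booster loop looks something like the sketch below. The generative model here is a simple independent-Bernoulli distribution standing in for the trained tensor-network or quantum model Yudong describes, and the risk function, asset count, and population sizes are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
n_assets, pop, elite = 12, 100, 10

# Toy risk model: a random covariance matrix; a "portfolio" is a binary
# asset-selection vector, scored by the variance of the equal-weighted mix.
A = rng.normal(size=(n_assets, n_assets))
cov = A @ A.T

def risk(x):
    if x.sum() == 0:
        return np.inf  # an empty selection is not a portfolio
    w = x / x.sum()
    return w @ cov @ w

# Generator-enhanced optimization loop: score candidates, fit the
# generative model to the best ones, sample lookalikes, repeat.
X = rng.integers(0, 2, size=(pop, n_assets))           # random start
for it in range(30):
    scores = np.array([risk(x) for x in X])
    best = X[np.argsort(scores)[:elite]]               # the 10 best of 100
    q = np.clip(best.mean(axis=0), 0.05, 0.95)         # "train" the model
    X = (rng.random((pop, n_assets)) < q).astype(int)  # sample lookalikes
print("lowest risk found:", min(risk(x) for x in X))
```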

I don’t have the numbers off the top of my head, but the team recently did some benchmarks against, I think, nine different state-of-the-art portfolio optimizers. Under some hypothesis testing, our algorithm outperformed at least two of the nine and is on par with the other seven. So that’s another proof point for the generative model approach relative to these combinatorial optimization problems. Another thing to note is that in this example, the generative model is actually just a matrix product state. It’s a tensor network, a classical representation, but it has a direct connection to quantum computing. So we expect that if we actually port it onto a quantum computer, the model will be at least as expressive, because you can map tensor networks exactly onto a quantum state – even though doing that may not be very useful without some additional fine-tuning on the quantum computer to enrich the model with quantum correlations.

So that’s the second proof point. The third proof point is a paper that came out last year – I think it’s called “Synergy between quantum and classical computing,” something like that. That’s a paper that illustrates this relay race concept, where you train a model with a tensor network first, and then once you move onto a quantum computer, the loss function continues to decrease and you can actually learn more correlations in the data. We also have a collaboration with MIT and BMW on using the same technique for optimizing the factory floor schedule. In a car company, there are different stations, like the paint shop and the body shop. At each station, there’s typically a buffer zone to store inventory, because each station works at a different rate, and workers go on and off the stations according to their shifts.

You can appreciate how the complexity multiplies as you consider more and more variables. How do you work out the schedule of the workers and the stations in a way that minimizes the buffers at those stations? That’s another use case where we have done research, and I think a paper is coming out soon in which we observe that we can actually enhance some of the existing classical solutions when it comes to finding the best arrangements for the factory. We’ve also used it for other combinatorial optimization problems like feature selection, where you’re trying to find the optimal subset of a set of items – except that this time, it’s a subset of features rather than a subset of assets, as in the portfolio optimization. So there’s quite a broad range of things that we’ve tried it on, and we can do it with a classical tensor network. It could be all classical, or it could be driven by some sort of quantum device.

Yuval: Do you feel that this technique is particularly suited for time series data or not for time series data, or does it not make a difference?

Yudong: This technique, in its vanilla version, is intended for data sets of fixed dimension – images, say, or solutions to a given combinatorial optimization problem. But by introducing some recurrent structure, I think this could very well be used for data of variable dimension, like time series or language, or for very large combinatorial optimization problems. Say you have problems with millions of variables; commercial solvers may struggle to solve those. But you can imagine some kind of… well, I guess you don’t have to imagine, because there’s already existing literature. People have used, for example, reinforcement learning to solve billion-variable problems like chip floorplanning. For those schemes, fundamentally, you still need to do sampling.

For example, some kind of agent sampling from a policy distribution, or an environment sampling from a transition rule. So there’s still going to be sampling involved, and this technique could very well be applied there. It’s definitely on our minds to see how we can apply this technique to variable-dimension problems.

Yuval: You mentioned generative models for portfolios. One other area where I’m curious if there’s applicability is generative models for molecules, or anything that has to do with the pharma industry. What can you share on that?

Yudong: Yeah, we came from a chemistry lab, so it’s very ingrained in our DNA to think about what we can do for chemistry. In Alán’s lab, there’s already been a history, a trail of research using classical machine learning, say with GANs, to generate molecules. So I think we have every reason to believe that generative models can also play a role in molecular generation, and we’ll have some results published very soon as well.

Yuval: As we get closer to the end of our conversation, I wanted to ask you, you’ve been doing this for a while. What have you learned over the last six or 12 months that you didn’t know before? What surprised you the most in the past year or so?

Yudong: Yeah, there are multiple things. I’ll pick one that really stood out, which is the importance of tools and infrastructure. We came from a very deep scientific background – the founding team, as well as a lot of the early team members we recruited, are very embedded in the scientific community. So naturally, we like to think about what the scientific approach to solving a problem is. But what has really been highlighted for me in the last six months – in fact, pretty much since we started the company – is the importance of having the right kind of tools and infrastructure in place to make sure that we can launch very big experiments and manage the data in a thoughtful way. When we do these projects, we do them in a way that leaves behind the maximum amount of information we can leverage later, so that a project is not just a scientific exercise to create just enough data to produce the plot in the paper, after which we publish and move on to the next thing.

That was the academic style of operating, where you do just the bare minimum of software engineering to get through the project and get it published. For us, this has been a big change in culture: in addition to delivering the results, it’s also very important to set up the kind of processes and tools so that whatever experiment we do is in some way reproducible, by somebody else in the company or by us in the future, and so that whatever we build is modular in such a way that it can be reused later on.

So there was a bit of learning in terms of using the tools at first, but later on it became building the tools. We realized that the tools out there don’t necessarily scratch all the itches that we have. That was the original impetus for us to build Orquestra, which is that we wanted a tool that is really seamless for everybody to use, but first and foremost seamless for us to use. So that was, I think, one of the biggest lessons. There are a few more, but I think it’s best to stop here.

Yuval: And a hypothetical, so if you could have dinner with one of the quantum greats, dead or alive, who would that person be?

Yudong: I would have dinner with Richard Feynman. I would be curious to see how… well, yeah, I guess I’d be curious to see what he thinks of these well-known quantum computing results – Shor’s algorithm, error correction, et cetera. Maybe over dinner, he would’ve figured out Shor’s algorithm and error correction all by himself. Who knows? I’d be curious to see what he thinks of the development of quantum computing in the past 20 years. I wouldn’t be surprised if he thinks, “Well, is that all you have done in 20 years? I could have done it in a week. You should instead look at X, Y, and Z.” That’s the kind of thing I’d be excited to hear about, hypothetically. It would be great to be told that, the whole time, we’ve only been looking at a very narrow part of a much bigger landscape that we don’t even know about. That’s usually the kind of thing that I find truly exciting.

Yuval: Excellent. Yudong, thank you so much for joining me today.

Yudong: Yeah, thank you for having me as well.

Yuval Boger is an executive working at the intersection of quantum technology and business. Known as the “Superposition Guy” as well as the original “Qubit Guy,” he can be reached on LinkedIn or at this email.

May 15, 2023
