Q-CTRL, in partnership with NVIDIA and Oxford Quantum Circuits (OQC), has demonstrated a 500,000x reduction in classical compute costs for quantum error suppression tasks by leveraging NVIDIA GPUs and accelerated libraries. The work focuses on optimizing the layout ranking process, a computationally intensive step in mapping abstract quantum circuits to physical qubits. This process involves evaluating potential mappings while accounting for qubit connectivity constraints and hardware performance variations, which becomes exponentially complex as qubit counts scale.
The team developed GPU-accelerated implementations of the layout ranking process using NVIDIA's RAPIDS libraries, including cuDF. These implementations employ two levels of parallelism: (1) layout-level parallelism, where multiple circuit layouts are evaluated simultaneously across GPU threads, and (2) qubit-level parallelism, where the computations within each layout are further distributed across thousands of GPU operations. Benchmarking showed a 10x speedup for real quantum circuits (e.g., Bernstein-Vazirani circuits) and up to 300,000x speedup for large-scale, randomized layout workloads compared to CPU-based methods, with the cost per layout dropping from roughly $1 on CPUs to $0.01 on GPUs.
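The announcement does not include source code, so the following is only a rough sketch of how the two levels of parallelism could look using CuPy, a GPU array library commonly used alongside RAPIDS. The error model, array shapes, and scoring function are illustrative assumptions, not Q-CTRL's implementation.

```python
# Minimal sketch of layout-level + qubit-level parallelism with CuPy.
# Assumptions (not from the announcement): layouts are rows of physical-qubit
# indices, and a layout's score is the product of per-qubit "survival"
# probabilities (1 - error rate), accumulated in log space.
import cupy as cp

num_layouts, num_logical, num_physical = 100_000, 100, 1_000

# Per-physical-qubit error rates (placeholder calibration data).
error_rates = cp.random.uniform(1e-4, 1e-2, size=num_physical)
log_survival = cp.log1p(-error_rates)

# Candidate layouts: each row maps logical qubits -> physical qubits
# (injectivity is not enforced here; this is purely illustrative).
layouts = cp.random.randint(0, num_physical, size=(num_layouts, num_logical))

# Layout-level parallelism: every row is scored in the same launch.
# Qubit-level parallelism: the gather and reduction span all columns at once.
scores = log_survival[layouts].sum(axis=1)

best = int(cp.argmax(scores))
print("best layout:", best, "log-fidelity score:", float(scores[best]))
```

The key design point is that neither loop (over layouts or over qubits) appears in the code: both collapse into a single vectorized gather-and-reduce that the GPU executes in bulk.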
The layout ranking process is essential for minimizing errors and maximizing execution fidelity: selecting an optimal layout can improve circuit fidelity by more than 10x compared to a suboptimal choice. As qubit counts increase, however, the number of candidate layouts explodes, turning ranking into a computational bottleneck. The GPU-accelerated approach removes this bottleneck by evaluating layouts, and the qubit-level operations within each layout, massively in parallel, cutting both computational time and cost. In the benchmarks described below, ranking 1 million layouts at 200 qubits took 1.2 minutes on a GPU versus 11.7 minutes on a CPU.
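As a rough sense of that growth (a standard counting fact rather than a figure from the announcement), assigning k logical qubits to distinct physical qubits on an n-qubit device admits

n! / (n − k)! = n · (n − 1) · ⋯ · (n − k + 1)

candidate layouts, so a 20-qubit circuit on a 100-qubit device already has on the order of 10^39 possibilities, far beyond exhaustive evaluation, which is why fast ranking of a sampled subset of layouts matters.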
Two benchmarking experiments were conducted:
- Full-Pipeline Benchmark: Using real quantum circuits, the GPU-accelerated approach delivered a 10x speedup over the Qiskit VF2PostLayout pass, reducing ranking time from 11.7 minutes (CPU) to 1.2 minutes (GPU) for 1 million layouts at 200 qubits; a minimal sketch of this CPU baseline follows the list.
- Modular Benchmark: Scaling to 1,000 qubits and 500,000 layouts per circuit, the GPU-accelerated ranking was 100,000 to 300,000 times faster than the CPU-based approach. Cost analysis showed GPU-based ranking at $0.01 per layout compared to CPU-based ranking at $1 per layout.
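For orientation, the CPU baseline mentioned above corresponds to Qiskit's standard transpilation path, whose preset pipelines include the VF2-based layout passes (VF2Layout / VF2PostLayout) when device error data is available. The sketch below runs that path on a small Bernstein-Vazirani circuit; the backend model, circuit size, and seed are placeholders, not the hardware or workloads used in the benchmark.

```python
# Minimal sketch of the CPU baseline: Qiskit's preset transpilation pipeline.
# The backend is a generic simulated device, not OQC hardware.
from qiskit import QuantumCircuit, transpile
from qiskit.providers.fake_provider import GenericBackendV2
from qiskit.transpiler import CouplingMap

def bernstein_vazirani(secret: str) -> QuantumCircuit:
    """Textbook Bernstein-Vazirani circuit for a given secret bit string."""
    n = len(secret)
    qc = QuantumCircuit(n + 1, n)
    qc.x(n)                 # prepare the ancilla in |1>
    qc.h(range(n + 1))
    for i, bit in enumerate(reversed(secret)):
        if bit == "1":
            qc.cx(i, n)     # oracle: phase kickback onto query qubits
    qc.h(range(n))
    qc.measure(range(n), range(n))
    return qc

# Placeholder device with heavy-hex connectivity and synthetic error data.
cmap = CouplingMap.from_heavy_hex(3)
backend = GenericBackendV2(num_qubits=cmap.size(), coupling_map=cmap)

qc = bernstein_vazirani("1011011")

# Layout selection, including VF2-based scoring against reported error rates,
# happens inside transpile; this is the CPU step the GPU ranking accelerates.
tqc = transpile(qc, backend=backend, optimization_level=3, seed_transpiler=7)
print("physical qubits chosen:", tqc.layout.final_index_layout())
```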
The collaboration also explored the integration of AI-driven techniques for layout ranking, building on Q-CTRL’s prior work in AI-augmented quantum error suppression. For example, Q-CTRL’s “Learning to Rank” method uses machine learning to optimize layout selection, further improving circuit fidelity. Future advancements could include AI-powered layout ranking and faster layout generation, which would reduce compilation times and improve algorithm performance. These innovations are critical for scaling quantum computing to thousands of qubits, enabling efficient hardware-aware compilation and reducing errors in practical quantum applications.
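Q-CTRL has not published the internals of its "Learning to Rank" method, but the general technique can be illustrated with a toy pairwise ranker: train a model so that layouts which historically yielded higher fidelity score above those that yielded lower fidelity. The features, synthetic data, and training loop below are illustrative assumptions only.

```python
# Toy pairwise learning-to-rank sketch (illustrative only; not Q-CTRL's model).
# Each layout is summarized by a small feature vector, e.g. mean two-qubit
# error on used couplers, mean readout error, and routing overhead.
import numpy as np

rng = np.random.default_rng(0)

n_layouts, n_features = 2_000, 3
X = rng.normal(size=(n_layouts, n_features))        # layout features
true_w = np.array([-2.0, -1.0, -0.5])                # hidden "quality" weights
fidelity = X @ true_w + 0.1 * rng.normal(size=n_layouts)

# Pairwise training examples: prefer layout i over j when it had higher
# observed fidelity; fit weights w with a logistic pairwise loss.
i, j = rng.integers(0, n_layouts, size=(2, 20_000))
keep = fidelity[i] != fidelity[j]
i, j = i[keep], j[keep]
diff = X[i] - X[j]
y = (fidelity[i] > fidelity[j]).astype(float)

w = np.zeros(n_features)
lr = 0.1
for _ in range(200):                                 # plain gradient descent
    p = 1.0 / (1.0 + np.exp(-(diff @ w)))
    w -= lr * diff.T @ (p - y) / len(y)

# Rank unseen layouts by the learned score; higher means "run this layout".
scores = X @ w
print("top-5 predicted layouts:", np.argsort(scores)[::-1][:5])
```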
For more details, see Q-CTRL's announcement.
March 21, 2025