Quantum networking company Welinq has integrated its distributed quantum compiler, araQne, with the NVIDIA CUDA-Q hybrid quantum-classical software platform. araQne partitions, maps, and orchestrates monolithic quantum algorithms across heterogeneous processors, addressing the hardware scaling limits of individual, isolated devices. The integration with NVIDIA CUDA-Q establishes a unified compilation-to-verification workflow: developers can optimize circuits for multi-processor networks and use graphics processing unit (GPU) simulation to validate the compiled circuits before deployment in quantum-augmented data centers.
[ Monolithic Circuit ] ──► [ araQne Hypergraph Partitioning ] ──► [ Optimized Distributed Subcircuits ] ──► [ CUDA-Q GPU Validation ]
Hypergraph Partitioning and Entanglement Cost Reduction
Distributed quantum computing links separate, modular Quantum Processing Units (QPUs) into a single, higher-capacity system connected by classical and quantum communication channels. To minimize the consumption of entangled Einstein-Podolsky-Rosen (EPR) pairs required for inter-QPU data and gate teleportation, araQne employs a hypergraph-based compilation model. As described in a paper published in EPJ Quantum Technology by R. Mengoni, W. Nadalin, and M. Rennela, the compiler combines circuit partitioning, gate reordering, and greedy gate-packing heuristics to reduce routing overhead. In multi-node benchmarks, this optimization scheme reduced EPR pair consumption relative to unoptimized baselines by 10% for 8-QPU splits, 19% for 4-QPU networks, and 30% for 2-QPU clusters.
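To make the idea of gate packing concrete, here is a minimal Python sketch of how packing can lower the EPR count for a fixed partition. It is a toy model, not araQne's algorithm: the function name `count_epr_pairs`, the tuple-based circuit format, and the rule that a shared EPR pair stays usable for a qubit until a Hadamard acts on it are all assumptions made for illustration. Hypergraph partitioning itself (choosing `qpu_of`) is not shown.

```python
def count_epr_pairs(ops, qpu_of):
    """Return (naive, packed) EPR-pair counts for a fixed qubit-to-QPU map.

    ops    : list of ("cz", a, b) or ("h", q) tuples (hypothetical format)
    qpu_of : dict mapping qubit index -> QPU index

    naive  : one EPR pair per non-local CZ gate
    packed : greedy packing, where one EPR pair shares qubit `a` with a
             remote QPU and serves every following non-local CZ between
             `a` and that QPU until a non-diagonal gate (here: H) hits `a`
    """
    naive = 0
    packed = 0
    open_bursts = set()  # (root qubit, remote QPU) pairs still usable

    for op in ops:
        if op[0] == "h":
            q = op[1]
            # A Hadamard on q invalidates every shared copy rooted at q.
            open_bursts = {b for b in open_bursts if b[0] != q}
            continue

        _, a, b = op
        if qpu_of[a] == qpu_of[b]:
            continue  # local gate, no entanglement needed

        naive += 1
        if (a, qpu_of[b]) in open_bursts or (b, qpu_of[a]) in open_bursts:
            continue  # reuse an existing EPR pair
        open_bursts.add((a, qpu_of[b]))
        packed += 1

    return naive, packed


# Four qubits split over two QPUs.
qpu_of = {0: 0, 1: 0, 2: 1, 3: 1}
ops = [("cz", 0, 2), ("cz", 0, 3), ("h", 0), ("cz", 0, 2), ("cz", 1, 2)]
naive, packed = count_epr_pairs(ops, qpu_of)
print(naive, packed)  # prints: 4 3
```

In this example the first two CZ gates share one EPR pair because both use qubit 0 as the root and target the same remote QPU; the Hadamard then forces a fresh pair. Gate reordering helps precisely by moving such interrupting gates out of the way so that more gates fall into one packed group.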
Sliced-Statevector Validation and GPU Performance Gain
Because complex gate reordering and gate packing can introduce unintended compilation errors, Welinq developed a sliced-statevector verification pipeline accelerated by NVIDIA CUDA-Q’s GPU statevector simulator backend. Rather than simulating the full, exponentially large state of the system, the verification utility decomposes the global circuit into fixed 20-qubit slices. The validator treats external operations acting on a slice as boundary-conditioned matrices and compares the projected statevectors before and after compiler reordering, flagging common errors such as missing operations, dropped controls, and incorrect targets.
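The core check, comparing statevectors before and after reordering, can be sketched on a single small slice. The example below is an illustration only: it uses plain NumPy on 4 qubits instead of the CUDA-Q GPU backend, it does not model the boundary handling of operations that cross slices, and the helper names (`run`, `equivalent`) and the circuit format are hypothetical. A seeded random input state is used so that errors are not hidden by a special starting state.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
ONE_QUBIT = {"h": H, "x": X}


def apply_1q(state, gate, q, n):
    """Apply a 2x2 gate to qubit q of an n-qubit statevector."""
    psi = state.reshape([2] * n)
    psi = np.tensordot(gate, psi, axes=([1], [q]))
    return np.moveaxis(psi, 0, q).reshape(-1)


def apply_cx(state, c, t, n):
    """Apply CNOT(control=c, target=t): flip t where c is |1>."""
    psi = state.reshape([2] * n).copy()
    idx = [slice(None)] * n
    idx[c] = 1
    t_sub = t if t < c else t - 1  # target axis after fixing axis c
    psi[tuple(idx)] = np.flip(psi[tuple(idx)], axis=t_sub).copy()
    return psi.reshape(-1)


def run(circuit, n, seed=0):
    """Run a circuit on a fixed pseudo-random normalized input state."""
    rng = np.random.default_rng(seed)
    state = rng.normal(size=2**n) + 1j * rng.normal(size=2**n)
    state /= np.linalg.norm(state)
    for op in circuit:
        if op[0] == "cx":
            state = apply_cx(state, op[1], op[2], n)
        else:
            state = apply_1q(state, ONE_QUBIT[op[0]], op[1], n)
    return state


def equivalent(circ_a, circ_b, n):
    """True if both circuits give the same output up to a global phase."""
    overlap = abs(np.vdot(run(circ_a, n), run(circ_b, n)))
    return bool(np.isclose(overlap, 1.0))


original = [("h", 0), ("cx", 0, 1), ("h", 2), ("cx", 2, 3)]
reordered = [("h", 2), ("h", 0), ("cx", 2, 3), ("cx", 0, 1)]  # valid reordering
faulty = [("h", 0), ("x", 1), ("h", 2), ("cx", 2, 3)]  # control on CX dropped

print(equivalent(original, reordered, 4))
print(equivalent(original, faulty, 4))
```

The valid reordering only swaps gates acting on disjoint qubits, so the overlap stays at 1; the dropped control turns a CNOT into an unconditional X and the overlap falls below 1, which is the kind of fault the article describes.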
This sliced-statevector verification protocol was stress-tested on random quantum circuits (RQCs) with a 35% two-qubit gate density to evaluate performance under unstructured, high-entropy conditions. When benchmarked on an NVIDIA L4 Tensor Core GPU against a central processing unit (CPU) simulation baseline, the NVIDIA CUDA-Q backend consistently reduced execution times by at least an order of magnitude. The advantage held along both test axes: when scaling a 100-qubit circuit up to a depth of 100 layers, and when validating a shallow 10-layer circuit across widths of up to 1,000 qubits.
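Simple memory arithmetic shows why fixed-width slices matter at these circuit sizes. The short Python calculation below assumes double-precision complex amplitudes (16 bytes each); this precision is an assumption for illustration, since the article does not state which one was used.

```python
def statevector_bytes(num_qubits, bytes_per_amplitude=16):
    """Memory for a dense statevector: one amplitude per basis state.

    bytes_per_amplitude=16 assumes complex128 (an assumption, not a
    figure from the article); use 8 for single-precision complex.
    """
    return bytes_per_amplitude * 2**num_qubits


slice_mib = statevector_bytes(20) / 2**20
print(f"20-qubit slice: {slice_mib:.0f} MiB")  # prints: 20-qubit slice: 16 MiB
print(f"100-qubit full state: {statevector_bytes(100):.3e} bytes")
```

A 20-qubit slice fits comfortably in GPU memory, while a dense 100-qubit statevector is far beyond any existing hardware, so verification at these widths has to work slice by slice.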
The benchmarking metrics, compiler derivations, and verification algorithms are described in the official Welinq Research Brief and in the corresponding peer-reviewed paper published in EPJ Quantum Technology.
June 24, 2026
