flasharray/kvm/adaptive: NVMe-TCP transport for FlashArray primary storage #13061
genegr wants to merge 7 commits into apache:main
Conversation
Preparatory data-model changes for NVMe-TCP support on the adaptive
storage framework. No behaviour change for existing Fibre Channel
users - the extra enum value, field, and getter/setter are only
exercised by callers that explicitly use them.
ProviderVolume.AddressType gains a NVMETCP value alongside FIBERWWN,
so adapters can declare that a volume is addressed by an NVMe EUI-128
(NGUID) rather than a SCSI WWN.
FlashArrayVolume.getAddress() produces the NGUID layout expected by
the Linux kernel for a FlashArray NVMe namespace:
00 + serial[0:14] + 24a937 (Pure 6-hex OUI) + serial[14:24]
which matches the /dev/disk/by-id/nvme-eui.<id> symlink emitted by
udev. Fibre Channel callers (addressType != NVMETCP) still get the
existing 6 + 24a9370 + serial form.
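For illustration, the NGUID construction above can be sketched as a small standalone helper. The class and method names (NvmeEuiSketch, buildNvmeEui) are hypothetical, not the actual FlashArrayVolume internals:

```java
public class NvmeEuiSketch {
    // Pure Storage OUI embedded in the NGUID, per the layout above.
    static final String PURE_OUI = "24a937";

    /** Build the 32-hex-digit EUI from a 24-hex-digit FlashArray volume serial. */
    static String buildNvmeEui(String serial) {
        if (serial == null || serial.length() != 24) {
            throw new IllegalArgumentException("expected a 24-hex-digit serial");
        }
        // 00 + serial[0:14] + OUI + serial[14:24] -> 32 hex digits (128 bits)
        return "00" + serial.substring(0, 14) + PURE_OUI + serial.substring(14);
    }

    public static void main(String[] args) {
        // The udev symlink the kernel would emit for this namespace:
        System.out.println("/dev/disk/by-id/nvme-eui." + buildNvmeEui("0123456789ABCDEF01234567"));
    }
}
```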
FlashArrayConnection gains a nsid field to carry the namespace id the
FlashArray REST API attaches to host-group-scoped NVMe connections,
when it is present.
Teach FlashArrayAdapter to talk to a pool over NVMe over TCP instead of
Fibre Channel.
The transport is selected from a new transport= option on the storage
pool URL (or the equivalent storage_pool_details entry), e.g.
https://user:pass@fa:443/api?pod=cs&transport=nvme-tcp&hostgroup=cluster1
Defaults remain Fibre Channel / WWN addressing when transport is absent
or anything other than nvme-tcp, so existing FC pools are unaffected.
Beyond the transport parsing itself the adapter now:
* Tracks a per-pool volumeAddressType (AddressType.NVMETCP or
FIBERWWN) and stamps every volume it hands back to the framework
with it (withAddressType), so the adaptive driver path stores the
correct type=... field in the CloudStack volume path (used later
by the KVM driver to locate the device).
* Attaches pod-backed NVMe-TCP volumes at the host-group level
(POST /connections?host_group_names=...) instead of per-host, so
the array assigns a consistent NSID to every member host; falls
back to per-host attach for FC or when no hostgroup is configured.
* Tolerates a missing nsid in the FlashArray connections response
for NVMe-TCP - Purity does not return one for host-group NVMe
connections; the namespace is identified on the host by EUI-128
from FlashArrayVolume.getAddress(), so a placeholder value is
returned to the caller purely for informational tracking.
* Resolves NVMETCP addresses back to volumes in getVolumeByAddress
by reversing the EUI-128 layout (strip optional eui. prefix, drop
leading 00 and the embedded Pure OUI).
* Indexes NVMe connections in getConnectionIdMap by host name (the
array returns one entry per host inside a host-group connection),
so connid.<hostname> tokens in the path still match in
parseAndValidatePath on the KVM side.
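The getVolumeByAddress reversal described above can be sketched as follows; the helper name (serialFromAddress) is illustrative, not the adapter's actual code:

```java
public class NvmeEuiDecodeSketch {
    /**
     * Recover the 24-hex-digit volume serial from an EUI-128 address,
     * reversing the 00 + serial[0:14] + OUI + serial[14:24] layout.
     * Accepts both bare EUIs and the "eui."-prefixed udev form.
     */
    static String serialFromAddress(String address) {
        String eui = address.startsWith("eui.") ? address.substring(4) : address;
        if (eui.length() != 32) {
            throw new IllegalArgumentException("expected 32 hex digits, got: " + eui);
        }
        // Drop the leading "00"; skip the 6-digit OUI at positions 16-21.
        return (eui.substring(2, 16) + eui.substring(22)).toUpperCase();
    }

    public static void main(String[] args) {
        System.out.println(serialFromAddress("eui.000123456789abcd24a937ef01234567"));
    }
}
```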
Followed by a matching adaptive/KVM driver change (separate commit).
NVMe-oF over TCP (NVMe-TCP) is conceptually a separate storage fabric from Fibre Channel / iSCSI: it speaks the NVMe command set rather than SCSI, identifies namespaces by EUI-128 NGUIDs rather than WWNs, and on Linux is multipathed natively by the nvme driver rather than by device-mapper multipath. Giving it its own StoragePoolType lets the KVM agent dispatch the adaptive driver to a dedicated NVMe-oF adapter (added in the next commit) without polluting the existing Fibre Channel code path.

The new value is wired into the same format-routing and derivePath fall-through paths that already special-case FiberChannel in KVMStorageProcessor: NVMe-TCP volumes are also RAW and carry their device path in DataObjectTO.path rather than in a managedStoreTarget detail.
Introduce an NVMe-over-Fabrics counterpart to the existing
MultipathSCSIAdapterBase / FiberChannelAdapter pair.
NVMe-oF is conceptually distinct from SCSI - it speaks the NVMe command
set, identifies namespaces by EUI-128 NGUIDs, and is multipathed by the
kernel natively rather than by device-mapper - so keeping it out of the
SCSI code path avoids special-casing inside every method that handles
volume paths, connect, disconnect, or size lookup.
MultipathNVMeOFAdapterBase (abstract)
* Parses volume paths of the form
type=NVMETCP; address=<eui>; connid.<host>=<nsid>; ...
into an AddressInfo whose path is
/dev/disk/by-id/nvme-eui.<eui>
which is the udev symlink the kernel emits for every NVMe namespace.
* connectPhysicalDisk polls the udev path and, on every iteration,
triggers nvme ns-rescan on all local NVMe controllers, to cover
target/firmware combinations that do not send an asynchronous event
notification when a new namespace is mapped.
* disconnectPhysicalDisk is a no-op; the kernel drops the namespace
when the target removes the host-group connection. The
ByPath variant only claims paths starting with
/dev/disk/by-id/nvme-eui. so foreign paths still fall through to
other adapters.
* Delegates getPhysicalDisk, isConnected, and getPhysicalDiskSize to
plain test -b / blockdev --getsize64 calls - no SCSI rescan, no dm
multipath, no multipath-map cleanup timer.
* createPhysicalDisk / createTemplateFromDisk / listPhysicalDisks /
copyPhysicalDisk all throw UnsupportedOperationException - these
are the responsibility of the storage provider, not the KVM
adapter, same as the SCSI base.
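As a rough sketch of the path parsing the base class performs (names here are illustrative; the real class builds an AddressInfo object rather than a map):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class NvmePathSketch {
    /** Split "key=value; key=value; ..." volume-path tokens into a map. */
    static Map<String, String> parseProperties(String path) {
        Map<String, String> props = new LinkedHashMap<>();
        for (String token : path.split(";")) {
            String t = token.trim();
            int eq = t.indexOf('=');
            if (eq > 0) {
                props.put(t.substring(0, eq), t.substring(eq + 1));
            }
        }
        return props;
    }

    /** Derive the udev symlink the kernel emits for the namespace. */
    static String devicePath(Map<String, String> props) {
        return "/dev/disk/by-id/nvme-eui." + props.get("address");
    }

    public static void main(String[] args) {
        Map<String, String> p = parseProperties("type=NVMETCP; address=00ab12; connid.kvm01=1");
        System.out.println(devicePath(p));
    }
}
```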
MultipathNVMeOFPool
* KVMStoragePool mirror of MultipathSCSIPool. Defaults to
Storage.StoragePoolType.NVMeTCP in the parameterless-fallback
constructor.
NVMeTCPAdapter
* Concrete adapter that registers itself for
Storage.StoragePoolType.NVMeTCP via the reflection-based scan in
KVMStoragePoolManager. Carries no logic of its own beyond binding
the base to the pool type.
A similar MultipathNVMeOFAdapterBase-derived NVMeRoCEAdapter (or
NVMeFCAdapter) can later be added by adding one concrete subclass and a
new pool-type value; the base does not assume any particular
fabric-level transport.
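The connectPhysicalDisk polling behaviour described above can be sketched roughly as below; a Runnable stands in for the `nvme ns-rescan` side effect, and the names are hypothetical:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class NamespaceWaitSketch {
    /**
     * Poll for the udev device path, running a rescan action between
     * attempts to cover targets that send no async event notification
     * when a new namespace is mapped.
     */
    static boolean waitForDevice(Path device, int attempts, long sleepMs, Runnable rescan) {
        for (int i = 0; i < attempts; i++) {
            if (Files.exists(device)) {
                return true;
            }
            rescan.run(); // e.g. nvme ns-rescan on every local NVMe controller
            try {
                Thread.sleep(sleepMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return Files.exists(device);
    }

    public static void main(String[] args) {
        System.out.println(waitForDevice(Paths.get("/dev/null"), 1, 1, () -> { }));
    }
}
```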
The adaptive storage framework hard-coded FiberChannel as the KVM-side pool type for every provider it fronts. With a separate NVMeTCP pool type now available (and a dedicated NVMe-oF adapter on the KVM side), teach the lifecycle to route a pool to the right adapter based on a transport= URL parameter:

https://user:pass@host/api?...&transport=nvme-tcp -> StoragePoolType.NVMeTCP -> NVMeTCPAdapter on the KVM host

When the query parameter is absent the default stays FiberChannel, so existing FC deployments on Primera or FlashArray continue to work unchanged. The choice is made in the shared AdaptiveDataStoreLifeCycleImpl rather than inside each vendor plugin so every adaptive provider (FlashArray, Primera, any future one) speaks the same configuration vocabulary.
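A minimal sketch of that transport-to-pool-type routing, returning type names as strings for brevity (the real lifecycle code works with Storage.StoragePoolType values):

```java
import java.net.URI;

public class TransportRoutingSketch {
    /**
     * Map the pool-URL transport= query parameter to a pool-type name.
     * Absent or unrecognised values fall back to the FiberChannel default,
     * mirroring the behaviour described above.
     */
    static String poolTypeFor(String url) {
        String query = URI.create(url).getQuery();
        if (query != null) {
            for (String pair : query.split("&")) {
                if ("transport=nvme-tcp".equalsIgnoreCase(pair)) {
                    return "NVMeTCP";
                }
            }
        }
        return "FiberChannel";
    }

    public static void main(String[] args) {
        System.out.println(poolTypeFor("https://user:pass@fa:443/api?pod=cs&transport=nvme-tcp&hostgroup=cluster1"));
    }
}
```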
@blueorangutan package

@winterhazel a [SL] Jenkins job has been kicked to build packages. It will be bundled with no SystemVM templates. I'll keep you posted as I make progress.
The NVMe-oF KVM adapter refused every template copy request from the adaptive storage orchestrator with UnsupportedOperationException, which made it impossible to use an NVMe-TCP pool as primary storage for a VM root disk: every deploy that landed a root volume on the pool failed as soon as CloudStack tried to lay down the template.

Implement it the same way FiberChannel (SCSI) does: the storage provider creates and connects a raw namespace ahead of time, then the adapter resolves the host-side /dev/disk/by-id/nvme-eui.<NGUID> path via the existing getPhysicalDisk plumbing (which will nvme ns-rescan and wait for the symlink if the kernel has not yet picked it up), and qemu-img converts the source image into the raw block device.

User-space encrypted source or destination volumes are rejected: the FlashArray already encrypts at rest, and layering qemu-img LUKS on top of a hostgroup-scoped namespace shared between hosts is not sensible. Source encryption would also break on migration because the passphrase does not travel.

With this change a CloudStack KVM VM can have its ROOT volume on an NVMe-TCP pool (tested end-to-end on 4.23-SNAPSHOT against Purity 6.7.7: template copy, first boot, live migrate with data disk, VM snapshot with quiesce, and revert all work).

Signed-off-by: Eugenio Grosso <eugenio.grosso@gmail.com>
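The qemu-img step can be illustrated with a hypothetical argv builder; the actual adapter goes through CloudStack's QemuImg utilities rather than assembling a command line by hand:

```java
import java.util.Arrays;
import java.util.List;

public class QemuConvertSketch {
    /**
     * Build a qemu-img invocation that converts a source image straight
     * into the raw NVMe namespace block device. -f/-O are standard
     * qemu-img convert options.
     */
    static List<String> convertCommand(String srcFormat, String srcPath, String nvmeDevice) {
        return Arrays.asList(
                "qemu-img", "convert",
                "-f", srcFormat,   // e.g. "qcow2" for a template
                "-O", "raw",       // the namespace is a raw block device
                srcPath, nvmeDevice);
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ",
                convertCommand("qcow2", "/tmp/template.qcow2", "/dev/disk/by-id/nvme-eui.00ab")));
    }
}
```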
Heads-up: pushed an additional commit. I've also updated the PR description to reflect the 7-commit set and add the full-NVMe test evidence. Happy to split this commit into a separate follow-up PR if reviewers prefer; let me know.
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 17578
In NVMeTCPAdapter.java:

 * {@link KVMStoragePoolManager} can find it via reflection.
 */
public class NVMeTCPAdapter extends MultipathNVMeOFAdapterBase {
    private static final Logger LOGGER_NVMETCP = LogManager.getLogger(NVMeTCPAdapter.class);

Suggested change:

-    private static final Logger LOGGER_NVMETCP = LogManager.getLogger(NVMeTCPAdapter.class);
+    private static final Logger LOGGER = LogManager.getLogger(NVMeTCPAdapter.class);
Pull request overview
Adds opt-in NVMe-over-TCP (NVMe-oF/TCP) support for KVM managed primary storage via the adaptive storage framework, with the FlashArray adaptive plugin as the first consumer. This introduces a new StoragePoolType.NVMeTCP, NVMe EUI-128 addressing, and a KVM-side NVMe-oF adapter base to surface namespaces via /dev/disk/by-id/nvme-eui.<eui>.
Changes:
- Introduces NVMe-TCP transport selection (transport=nvme-tcp) and maps it to a new StoragePoolType.NVMeTCP.
- Extends the FlashArray adapter to generate/parse NVMe EUI-128 addresses and use host-group-scoped connections for consistent namespace identity.
- Adds KVM NVMe-oF adapter/pool implementations and updates KVM storage processor handling (RAW format + path derivation) for the new pool type.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| plugins/storage/volume/flasharray/src/main/java/org/apache/cloudstack/storage/datastore/adapter/flasharray/FlashArrayVolume.java | Adds NVMe EUI-128 address construction for NVMe-TCP volumes. |
| plugins/storage/volume/flasharray/src/main/java/org/apache/cloudstack/storage/datastore/adapter/flasharray/FlashArrayConnection.java | Adds nsid field to model NVMe namespace IDs in connection payloads. |
| plugins/storage/volume/flasharray/src/main/java/org/apache/cloudstack/storage/datastore/adapter/flasharray/FlashArrayAdapter.java | Adds transport selection, NVMe attach/lookup behavior, and address-type stamping for returned volumes. |
| plugins/storage/volume/adaptive/src/main/java/org/apache/cloudstack/storage/datastore/lifecycle/AdaptiveDataStoreLifeCycleImpl.java | Chooses pool type from provider URL transport= query parameter (defaults to FiberChannel). |
| plugins/storage/volume/adaptive/src/main/java/org/apache/cloudstack/storage/datastore/adapter/ProviderVolume.java | Adds AddressType.NVMETCP for provider volume addressing. |
| plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/storage/NVMeTCPAdapter.java | Registers a KVM storage adapter for StoragePoolType.NVMeTCP. |
| plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/storage/MultipathNVMeOFPool.java | Adds a pool implementation delegating operations back to the NVMe-oF adapter. |
| plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/storage/MultipathNVMeOFAdapterBase.java | Implements NVMe-oF attach/wait-for-namespace and qemu-img convert copy into namespaces. |
| plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/storage/KVMStorageProcessor.java | Treats NVMeTCP pools like other managed/shared block pools for RAW format and path derivation. |
| api/src/main/java/com/cloud/storage/Storage.java | Adds new enum value StoragePoolType.NVMeTCP. |
| PendingReleaseNotes | Documents the new NVMe-oF/TCP support and required components. |
    p.waitFor(NS_RESCAN_TIMEOUT_SECS, TimeUnit.SECONDS);
}
if (AddressType.NVMETCP.equals(volumeAddressType)) {
    if (conn.getHostGroup() != null && conn.getHostGroup().getName() != null
            && conn.getHostGroup().getName().equals(hostgroup)) {
        return conn.getNsid() != null ? "" + conn.getNsid() : "1";
    }
} else if (conn.getHost() != null && conn.getHost().getName() != null &&
        (conn.getHost().getName().equals(hostname) || conn.getHost().getName().equals(hostname.substring(0, hostname.indexOf('.')))) &&
        conn.getLun() != null) {
if (list == null || list.getItems() == null || list.getItems().size() == 0) {
    throw new RuntimeException("Volume attach did not return lun information");
}
    // Reverse the EUI-128 layout: serial = eui[2:16] + eui[22:32], after
    // stripping the optional "eui." prefix that appears in udev paths.
    String eui = address.startsWith("eui.") ? address.substring(4) : address;
    serial = (eui.substring(2, 16) + eui.substring(22)).toUpperCase();
} else {
@@ -781,6 +816,13 @@ private FlashArrayVolume getSnapshot(String snapshotName) {
        return (FlashArrayVolume) getFlashArrayItem(list);

if (AddressType.NVMETCP.equals(addressType)) {
    // EUI-128 layout for FlashArray NVMe namespaces:
    //   00 + serial[0:14] + <Pure OUI (24a937)> + serial[14:24]
    // This is the value the Linux kernel exposes as
    //   /dev/disk/by-id/nvme-eui.<result>
if (details != null && details.containsKey(com.cloud.storage.StorageManager.STORAGE_POOL_DISK_WAIT.toString())) {
    String waitTime = details.get(com.cloud.storage.StorageManager.STORAGE_POOL_DISK_WAIT.toString());
    if (StringUtils.isNotEmpty(waitTime)) {
        waitSecs = Integer.parseInt(waitTime);
Codecov Report

❌ Patch coverage is

Additional details and impacted files

@@ Coverage Diff @@
## main #13061 +/- ##
============================================
+ Coverage 18.01% 19.15% +1.13%
+ Complexity 16607 16603 -4
============================================
Files 6029 5568 -461
Lines 542160 502404 -39756
Branches 66451 58940 -7511
============================================
- Hits 97682 96245 -1437
+ Misses 433461 395337 -38124
+ Partials 11017 10822 -195
Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
Description
Adds an end-to-end NVMe-over-TCP data path for CloudStack on KVM, using the FlashArray adaptive plugin as the first (and currently only) consumer. The change is opt-in — existing Fibre Channel FlashArray / Primera deployments continue to work unchanged.
A FlashArray pool is switched to NVMe-TCP by adding a single transport=nvme-tcp query parameter to the pool URL on createStoragePool. When that parameter is present the adaptive lifecycle stamps the pool with the new StoragePoolType.NVMeTCP, the KVM agent dispatches to a brand-new MultipathNVMeOFAdapterBase / NVMeTCPAdapter pair, and the FlashArray adapter attaches volumes as host-group-scoped NVMe connections, builds EUI-128 NGUIDs in the layout /dev/disk/by-id/nvme-eui.<32-hex> that udev emits for a Pure namespace, and reverses that layout when CloudStack looks up a volume by address.

The seven commits are split along natural seams (address type, FA REST-side support, storage pool type, KVM adapter, adaptive lifecycle routing, docs, copyPhysicalDisk) so each can be reviewed independently.

Why a separate NVMeTCP pool type (and a separate MultipathNVMeOFAdapterBase) rather than reusing FiberChannel / MultipathSCSIAdapterBase? NVMe-oF speaks the NVMe command set rather than SCSI, identifies namespaces by EUI-128 NGUIDs rather than WWNs, and is multipathed natively by the kernel's nvme driver rather than by device-mapper multipath. Keeping it out of the SCSI code path avoids special-casing inside every method that handles paths, connect, disconnect, or size lookup.
Feature/Enhancement Scale or Bug Severity
Feature. Opt-in via
transport=nvme-tcpURL parameter on pool registration. Defaults are unchanged.How Has This Been Tested?
Validated end-to-end on a 4.23-SNAPSHOT lab against a Pure Storage FlashArray running Purity 6.7.7:
- KVM hosts: a bridge cloudbr-nvme with an IP on the NVMe subnet, nvme-cli + the nvme_tcp kernel module, a persistent /etc/nvme/hostnqn, a populated /etc/nvme/discovery.conf, and nvme connect-all enabled at boot.
- Array side: a pod (cloudstack), a hostgroup matching the CloudStack cluster name (cluster1), one host per KVM host inside the hostgroup bound to the host's NQN.
- Pool registration with provider="Flash Array", transport=nvme-tcp, hostgroup=cluster1 → pool enters Up state, type: NVMeTCP.
- Attached a tags=nvme disk offering volume to a Rocky 9 VM: the volume's path carried type=NVMETCP; address=<EUI-128>; connid.kvm01=1; connid.kvm02=1; both hosts saw /dev/disk/by-id/nvme-eui.<that EUI> via the host-group NVMe connection; libvirt presented the namespace to the guest as /dev/vdb.
- In the guest: mkfs.ext4 /dev/vdb, wrote 16 MiB of /dev/urandom with conv=fsync, recorded the SHA-256, unmounted/remounted, re-checksummed → hash matched.
- Live migration while a sha256sum probe loop ran against /mnt/nvme/pattern.bin every 2 s. Migration completed in 6 s; the loop output showed the same hash across the migration window with no gap (multipath/hostgroup-scope proof).
- copyPhysicalDisk converted the Rocky 9 cloud template qcow2 into a raw NVMe namespace (10 GB root), the VM booted from it, cloud-init injected an SSH key, and a 20 GB tags=nvme data disk was attached. lsblk inside the guest showed both vda and vdb as NVMe-backed virtio block devices.
- VM snapshot: wrote sentinel files on vda and vdb with a known SHA-256, took a createVMSnapshot with quiescevm=true, snapshotmemory=false, deleted both sentinel files, issued revertToVMSnapshot, restarted, and confirmed both files reappeared with the identical SHA-256 content. Array-side snapshots cloudstack::vol-4-1-2-<id>.1 for both volumes were visible on Purity during the window. The StorageVMSnapshotStrategy path is what CloudStack dispatches here, so any adaptive-plugin consumer gets the same behaviour.
- FC regression: existing Fibre Channel pools (no transport= parameter) continue to work: type: FiberChannel, FC WWN addressing, same MultipathSCSIAdapterBase code path as before.
Notes
- Passing capacitybytes= on createStoragePool is a workaround without "flasharray: fall back to array capacity when pod has no quota" (#13050) merged. A companion fix sits in "adaptive: honor user-provided capacityBytes when provider stats are unavailable" (#13059): AdaptiveDataStoreLifeCycleImpl honouring user-supplied capacity when provider stats are null.
- snapshotmemory=true is not yet supported on an NVMe-TCP pool (or on any managed pool): StorageVMSnapshotStrategy.canHandle in core CloudStack explicitly rejects memory snapshots. Disk-only snapshots (with or without quiesce) work. Lifting that restriction would be a separate feature PR touching core CloudStack, not the adaptive/NVMe plugin.