Whatever replica count a volume is provisioned with (its repl parameter), the control plane will attempt to maintain, through a 'Kubernetes-like' reconciliation loop, that number of identical copies of the volume's data ("replicas" or "children") at any point in time. When a volume is first provisioned the control plane will attempt to create the required number of replicas, whilst adhering to its internal heuristics for their location within the cluster (which will be discussed shortly). If it succeeds, the volume will become available and will bind with the PVC. If the control plane cannot identify a sufficient number of eligible Mayastor Pools in which to create the required replicas at the time of provisioning, the operation will fail; the Mayastor Volume will not be created and the associated PVC will not be bound. Kubernetes will periodically retry the volume creation and, if at any time the appropriate number of pools can be selected, the volume provisioning should succeed.
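For context, the desired replica count is normally set when the volume's StorageClass is defined. The manifest below is a minimal sketch of that idea; the class name, provisioner name and protocol parameter are assumptions and may differ between Mayastor releases.

```yaml
# Illustrative StorageClass requesting two copies ("replicas") of each volume's data.
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: mayastor-repl-2                 # hypothetical name
provisioner: io.openebs.csi-mayastor    # assumed CSI provisioner name; check your release
parameters:
  repl: "2"                             # desired number of replicas maintained by the control plane
  protocol: "nvmf"                      # assumed transport parameter
---
# A PVC using the class above; it binds only once the control plane has
# successfully placed the required number of replicas.
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: example-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi
  storageClassName: mayastor-repl-2
```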
If a replica of the volume experiences too many I/O failures, it will be marked CHILD_FAULTED and it will no longer receive I/O requests from the nexus. It will remain a member of the volume, whose departure from the desired state with respect to replica count will be reflected with a volume status of degraded. How many I/O failures are considered "too many" in this context is outside the scope of this discussion.
Where it is able to do so, the control plane will create a replacement replica and rebuild its contents from a remaining healthy replica with state CHILD_ONLINE, i.e. the source. This process can proceed whilst the volume continues to process application I/Os, although it will contend for disk throughput at both the source and destination disks. It will also 'retire' the old, faulted one, which will then no longer be associated with the volume. Once retired, a replica will become available for garbage collection (deletion from the Mayastor Pool containing it), assuming that the nature of the failure was such that the pool itself is still viable (i.e. the underlying disk device is still accessible). N.B.: For a replica to be retired from a volume, it must have been possible to create a suitable replacement first. In the absence of a replacement, the replica remains a member of the nexus, albeit faulted and thus unusable.
An edge case exists whereby a replica which merely becomes disconnected from the volume's nexus, rather than failing outright, is also marked CHILD_FAULTED. If faulted replicas can be re-connected successfully, then the control plane will attempt to rebuild them directly, rather than seek replacements for them first. This edge case therefore does not result in the retirement of the affected replicas; they are simply reused. If they are successfully re-attached but then continue to encounter I/O failures, the rebuild will fail and will not be attempted again.
A volume's Replica Count field may be either increased or decreased, and the control plane will attempt to reconcile the desired and actual state, following the same replica placement rules as described herein. If the replica count is reduced, faulted replicas will be selected for removal in preference to healthy ones.
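By way of illustration only, such an edit can be made by patching the volume's MSV custom resource; the field path spec.replicaCount and the command shown in the comment are assumptions and should be checked against the CRD shipped with the release in use.

```yaml
# Hypothetical patch increasing a volume's desired replica count from 2 to 3.
# Applied with something like:
#   kubectl -n mayastor patch msv <volume-uuid> --type merge --patch-file replica-count.yaml
spec:
  replicaCount: 3   # the control plane reconciles the actual replica count toward this value
```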
Replicas with state CHILD_FAULTED are always selected for retirement in preference to those with state CHILD_ONLINE.
The volume state becomes degraded and the replica associated with Pool-2-A enters the CHILD_FAULTED state (as seen in the MSV custom resource).
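To make that observation concrete, the snippet below sketches what an inspection of the MSV custom resource might show for such a volume; the field names and layout are illustrative assumptions only, not guaranteed output.

```yaml
# Abridged, hypothetical status from `kubectl -n mayastor get msv <volume-uuid> -o yaml`
# for a two-replica volume whose replica on Pool-2-A has faulted.
status:
  state: degraded                  # the desired replica count is not currently met
  nexus:
    children:
    - uri: bdev:///pool-1-a/...    # healthy replica, still receiving I/O
      state: CHILD_ONLINE
    - uri: nvmf://node-2/...       # faulted replica, excluded from I/O by the nexus
      state: CHILD_FAULTED
```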
The volume will remain degraded as it is not currently possible for the control plane to create a new replica. Rule 2 applies: the faulted replica is situated on Node-2 and is still a member of the volume's nexus (Rule 3), so a new replica cannot be placed there. Even though that node has a second, unused pool, the requirement is that ALL replicas of a volume must be placed on different nodes. There are no other MSNs in the cluster, so the selection of a new replica location fails.
degraded and the faulted replica's state becomes
Whilst the volume is in the healthy state, a user edits the MSV's Replica Count field, increasing the value from 2 to 3. Before doing so they corrected the SAN misconfiguration and ensured that the MSP on Node-2 was online.
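As an aside, the pool's recovery can be confirmed from its MSP custom resource before making such an edit; the resource name and field layout below are assumptions for illustration.

```yaml
# Abridged, hypothetical status from `kubectl -n mayastor get msp <pool-name> -o yaml`
# after the SAN misconfiguration affecting Node-2 has been corrected.
status:
  state: online          # the pool is usable again and is eligible to host new replicas
  capacity: 10724835328  # bytes
  used: 1073741824       # bytes
```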
The volume state will initially become degraded, to reflect the difference in actual vs required redundant data copies, but a rebuild of the new replica will be performed and eventually the volume state will be healthy.
The volume will enter the degraded state. The replica in Pool-3-A will be in the state CHILD_FAULTED, as observed in the volume's MSV custom resource. No replica replacement (nor subsequent rebuild) will occur, since Rule 3 states that the faulted replica hosted in Pool-3-A on Node-3 remains a part of that volume's nexus, and therefore the other pool on Node-3 (Pool-3-B) cannot be selected as a location for its replacement because of Rule 2.
The user edits the MSV custom resource, reducing the Replica Count from 3 to 2.
The replica in Pool-3-A is still in the CHILD_FAULTED state whilst the other two replicas are healthy (CHILD_ONLINE), so it is the one selected for removal. This is Rule 5 in action. Once the faulted replica has been retired, the volume state will become healthy again. The desired and actual replica counts are now both 2. The volume's replicas are located in MSPs on both Node-1 and Node-2. The user then edits the MSV custom resource again, increasing the Replica Count from 2 to 3.
The volume state will become degraded, reflecting the difference in desired vs actual replica count. The control plane will select a pool on Node-3 as the location for the new replica required. It is following Rule 2; the replica previously in Pool-3-A, which failed, has already been retired from the membership of the volume's nexus as a result of the user's actions in Scenario Five. Node-3 is therefore again a suitable candidate and has online pools with sufficient capacity. If the control plane selects Pool-3-B, which was unaffected by the previous disk failure in Pool-3-A, then once the new replica has been created, a rebuild will take place and eventually the volume state will return to healthy, but now with a Replica Count of 3. However, if Pool-3-A were to be selected (which still has a permanent disk fault and is unable to complete any I/O), then the replica creation will fail; the control plane will not attempt further reconciliation and the volume state will remain degraded, with a Replica Count of 2.