Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
This website/page will be End-of-life (EOL) after 31 August 2024. We recommend you to visit OpenEBS Documentation for the latest Mayastor documentation (v2.6 and above).
Mayastor is now also referred to as OpenEBS Replicated PV Mayastor.
All worker nodes must satisfy the following requirements:
x86-64 CPU cores with SSE4.2 instruction support
(Tested on) Linux kernel 5.15 (Recommended) Linux kernel 5.13 or higher. The kernel should have the following modules loaded:
nvme-tcp
ext4 and optionally xfs
Helm version must be v3.7 or later.
Each worker node which will host an instance of an io-engine pod must have the following resources free and available for exclusive use by that pod:
Two CPU cores
1GiB RAM
HugePage support
A minimum of 2GiB of 2MiB-sized pages
Ensure that the following ports are not in use on the node:
10124: Mayastor gRPC server will use this port.
8420 / 4421: NVMf targets will use these ports.
The firewall settings should not restrict connection to the node.
Disks must be unpartitioned, unformatted, and used exclusively by the DiskPool.
The minimum capacity of the disks should be 10 GB.
Kubernetes core v1 API-group resources: Pod, Event, Node, Namespace, ServiceAccount, PersistentVolume, PersistentVolumeClaim, ConfigMap, Secret, Service, Endpoint, Event.
Kubernetes batch API-group resources: CronJob, Job
Kubernetes apps API-group resources: Deployment, ReplicaSet, StatefulSet, DaemonSet
Kubernetes storage.k8s.io
API-group resources: StorageClass, VolumeSnapshot, VolumeSnapshotContent, VolumeAttachment, CSI-Node
Kubernetes apiextensions.k8s.io
API-group resources: CustomResourceDefinition
Mayastor Custom Resources that is openebs.io
API-group resources: DiskPool
Custom Resources from Helm chart dependencies of Jaeger that is helpful for debugging:
ConsoleLink Resource from console.openshift.io
API group
ElasticSearch Resource from logging.openshift.io
API group
Kafka and KafkaUsers from kafka.strimzi.io
API group
ServiceMonitor from monitoring.coreos.com
API group
Ingress from networking.k8s.io
API group and from extensions API group
Route from route.openshift.io
API group
All resources from jaegertracing.io
API group
The minimum supported worker node count is three nodes. When using the synchronous replication feature (N-way mirroring), the number of worker nodes to which Mayastor is deployed should be no less than the desired replication factor.
Mayastor supports the export and mounting of volumes over NVMe-oF TCP only. Worker node(s) on which a volume may be scheduled (to be mounted) must have the requisite initiator support installed and configured. In order to reliably mount Mayastor volumes over NVMe-oF TCP, a worker node's kernel version must be 5.13 or later and the nvme-tcp kernel module must be loaded.
This website/page will be End-of-life (EOL) after 31 August 2024. We recommend you to visit OpenEBS Documentation for the latest Mayastor documentation (v2.6 and above).
Mayastor is now also referred to as OpenEBS Replicated PV Mayastor.
Before deploying and using Mayastor please consult the Known Issues section of this guide.
The steps and commands which follow are intended only for use in conjunction with Mayastor version(s) 2.2.x.
Installing Mayastor via the Helm chart sets the StorageClass as 'default' for Loki and etcd StatefulSets. This will work in clusters which have a StorageClass named 'default'. If there is no 'default' StorageClass in the cluster, storage provisioning will fail for Loki and etcd. In such cases, one can change the StorageClass to 'manual' which will provision a PersistentVolume of type hostPath. To do this, make the following changes in the values.yaml
file:
loki-stack.persistence.storageClassName: manual
etcd.persistence.storageClassName: manual
Add the OpenEBS Mayastor Helm repository.
Run the following command to discover all the stable versions of the added chart repository:
To discover all the versions (including unstable versions), execute: helm search repo mayastor --devel --versions
Run the following command to install Mayastor _version 2.2.
Verify the status of the pods by running the command:
The native NVMe-oF CAS engine of OpenEBS
This website/page will be End-of-life (EOL) after 31 August 2024. We recommend you to visit OpenEBS Documentation for the latest Mayastor documentation (v2.6 and above).
Mayastor is now also referred to as OpenEBS Replicated PV Mayastor.
Mayastor is a performance optimised "Container Attached Storage" (CAS) solution of the CNCF project OpenEBS. The goal of OpenEBS is to extend Kubernetes with a declarative data plane, providing flexible persistent storage for stateful applications.
Design goals for Mayastor include:
Highly available, durable persistence of data
To be readily deployable and easily managed by autonomous SRE or development teams
To be a low-overhead abstraction for NVMe-based storage devices
Mayastor incorporates Intel's Storage Performance Development Kit. It has been designed from the ground up to leverage the protocol and compute efficiency of NVMe-oF semantics, and the performance capabilities of the latest generation of solid-state storage devices, in order to deliver a storage abstraction with performance overhead measured to be within the range of single-digit percentages.
By comparison, most "shared everything" storage systems are widely thought to impart an overhead of at least 40% (and sometimes as much as 80% or more) as compared to the capabilities of the underlying devices or cloud volumes; additionally traditional shared storage scales in an unpredictable manner as I/O from many workloads interact and compete for resources.
While Mayastor utilizes NVMe-oF it does not require NVMe devices or cloud volumes to operate and can work well with other device types.
Mayastor's source code and documentation are distributed amongst a number of GitHub repositories under the OpenEBS organisation. The following list describes some of the main repositories but is not exhaustive.
openebs/mayastor : contains the source code of the data plane components
openebs/mayastor-control-plane : contains the source code of the control plane components
openebs/mayastor-api : contains common protocol buffer definitions and OpenAPI specifications for Mayastor components
openebs/mayastor-dependencies : contains dependencies common to the control and data plane repositories
openebs/mayastor-extensions : contains components and utilities that provide extended functionalities like ease of installation, monitoring and observability aspects
openebs/mayastor-docs : contains Mayastor's user documenation
This website/page will be End-of-life (EOL) after 31 August 2024. We recommend you to visit OpenEBS Documentation for the latest Mayastor documentation (v2.6 and above).
Mayastor is now also referred to as OpenEBS Replicated PV Mayastor.
The objective of this section is to provide the user and evaluator of Mayastor with a topological view of the gross anatomy of a Mayastor deployment. A description will be made of the expected pod inventory of a correctly deployed cluster, the roles and functions of the constituent pods and related Kubernetes resource types, and of the high level interactions between them and the orchestration thereof.
More detailed guides to Mayastor's components, their design and internal structure, and instructions for building Mayastor from source, are maintained within the project's GitHub repository.
Control Plane
agent-core
Deployment
Principle control plane actor
Single
csi-controller
Deployment
Hosts Mayastor's CSI controller implementation and CSI provisioner side car
Single
api-rest
Pod
Hosts the public API REST server
Single
api-rest
Service
Exposes the REST API server
operator-diskpool
Deployment
Hosts DiskPool operator
Single
csi-node
DaemonSet
Hosts CSI Driver node plugin containers
All worker nodes
etcd
StatefulSet
Hosts etcd Server container
Configurable(Recommended: Three replicas)
etcd
Service
Exposes etcd DB endpoint
Single
etcd-headless
Service
Exposes etcd DB endpoint
Single
io-engine
DaemonSet
Hosts Mayastor I/O engine
User-selected nodes
DiskPool
CRD
Declares a Mayastor pool's desired state and reflects its current state
User-defined, one or many
Additional components
metrics-exporter-pool
Sidecar container (within io-engine DaemonSet)
Exports pool related metrics in Prometheus format
All worker nodes
pool-metrics-exporter
Service
Exposes exporter API endpoint to Prometheus
Single
promtail
DaemonSet
Scrapes logs of Mayastor-specific pods and exports them to Loki
All worker nodes
loki
StatefulSet
Stores the historical logs exported by promtail pods
Single
loki
Service
Exposes the Loki API endpoint via ClusterIP
Single
The io-engine pod encapsulates Mayastor containers, which implement the I/O path from the block devices at the persistence layer, up to the relevant initiators on the worker nodes mounting volume claims. The Mayastor process running inside this container performs four major functions:
Creates and manages DiskPools hosted on that node.
Creates, exports, and manages volume controller objects hosted on that node.
Creates and exposes replicas from DiskPools hosted on that node over NVMe-TCP.
Provides a gRPC interface service to orchestrate the creation, deletion and management of the above objects, hosted on that node. Before the io-engine pod starts running, an init container attempts to verify connectivity to the agent-core in the namespace where Mayastor has been deployed. If a connection is established, the io-engine pod registers itself over gRPC to the agent-core. In this way, the agent-core maintains a registry of nodes and their supported api-versions. The scheduling of these pods is determined declaratively by using a DaemonSet specification. By default, a nodeSelector field is used within the pod spec to select all worker nodes to which the user has attached the label openebs.io/engine=mayastor
as recipients of an io-engine pod. In this way, the node count and location are set appropriately to the hardware configuration of the worker nodes, and the capacity and performance demands of the cluster.
The csi-node pods within a cluster implement the node plugin component of Mayastor's CSI driver. As such, their function is to orchestrate the mounting of Mayastor-provisioned volumes on worker nodes on which application pods consuming those volumes are scheduled. By default, a csi-node pod is scheduled on every node in the target cluster, as determined by a DaemonSet resource of the same name. Each of these pods encapsulates two containers, csi-node, and csi-driver-registrar. The node plugin does not need to run on every worker node within a cluster and this behavior can be modified, if desired, through the application of appropriate node labeling and the addition of a corresponding nodeSelector entry within the pod spec of the csi-node DaemonSet. However, it should be noted that if a node does not host a plugin pod, then it will not be possible to schedule an application pod on it, which is configured to mount Mayastor volumes.
etcd is a distributed reliable key-value store for the critical data of a distributed system. Mayastor uses etcd as a reliable persistent store for its configuration and state data.
The supportability tool is used to create support bundles (archive files) by interacting with multiple services present in the cluster where Mayastor is installed. These bundles contain information about Mayastor resources like volumes, pools and nodes, and can be used for debugging. The tool can collect the following information:
Topological information of Mayastor's resource(s) by interacting with the REST service
Historical logs by interacting with Loki. If Loki is unavailable, it interacts with the kube-apiserver to fetch logs.
Mayastor-specific Kubernetes resources by interacting with the kube-apiserver
Mayastor-specific information from etcd (internal) by interacting with the etcd server.
Loki aggregates and centrally stores logs from all Mayastor containers which are deployed in the cluster.
Promtail is a log collector built specifically for Loki. It uses the configuration file for target discovery and includes analogous features for labeling, transforming, and filtering logs from containers before ingesting them to Loki.
This website/page will be End-of-life (EOL) after 31 August 2024. We recommend you to visit for the latest Mayastor documentation (v2.6 and above).
Mayastor is now also referred to as OpenEBS Replicated PV Mayastor.
Storage class resource in Kubernetes is used to supply parameters to volumes when they are created. It is a convenient way of grouping volumes with common characteristics. All parameters take a string value. Brief explanation of each supported Mayastor parameter follows.
The storage class parameter local
has been deprecated and is a breaking change in Mayastor version 2.0. Ensure that this parameter is not used.
File system that will be used when mounting the volume. The default file system when not specified is 'ext4'. We recommend to use 'xfs' that is considered to be more advanced and performant. Though make sure that XFS is installed on all nodes in the cluster before using it.
Expressed in seconds and it sets the io_timeout
parameter in the linux block device driver for Mayastor volume. It also sets ctrl-loss-tmo
timeout in linux NVMe driver that is used for detecting inaccessible target (nexus in our case). In either case, if IO cannot be served by Mayastor volume, because it is stuck or because the nvmf target is not there, it will take roughly this time to the initiator to return an error to data consumer, which can be either filesystem or an application if using raw block volume. The usual reaction of a filesystem to such error is to switch the filesystem to read-only state and require manual intervention from user - remounting. Be careful when setting this to a low value because any node reboot that does not fit into this time interval may require manual intervention afterwards or can result in application failure - depending on how the error is handled by the application.
The setting is supported only when using "nvmf" protocol.
The parameter 'protocol' takes the value nvmf
(NVMe over TCP protocol). It is used to mount the volume (target) on the application node.
The string value should be a number and the number should be greater than zero. Mayastor control plane will try to keep always this many copies of the data if possible. If set to one then the volume does not tolerate any node failure. If set to two, then it tolerates one node failure. If set to three, then two node failures, etc.
The agents.core.capacity.thin
spec present in the Mayastor helm chart consists of the following configurable parameters that can be used to control the scheduling of thinly provisioned replicas:
poolCommitment parameter specifies the maximum allowed pool commitment limit (in percent).
volumeCommitment parameter specifies the minimum amount of free space that must be present in each replica pool in order to create new replicas for an existing volume. This value is specified as a percentage of the volume size.
volumeCommitmentInitial minimum amount of free space that must be present in each replica pool in order to create new replicas for a new volume. This value is specified as a percentage of the volume size.
Note:
By default, the volumes are provisioned as thick
.
For a pool of a particular size, say 10 Gigabytes, a volume > 10 Gigabytes cannot be created, as Mayastor 2.2.0 does not support pool expansion.
The replicas for a given volume can be either all thick or all thin. Same volume cannot have a combination of thick and thin replicas
stsAffinityGroup
represents a collection of volumes that belong to instances of Kubernetes StatefulSet. When a StatefulSet is deployed, each instance within the StatefulSet creates its own individual volume, which collectively forms the stsAffinityGroup
. Each volume within the stsAffinityGroup
corresponds to a pod of the StatefulSet.
This feature enforces the following rules to ensure the proper placement and distribution of replicas and targets so that there isn't any single point of failure affecting multiple instances of StatefulSet.
Anti-Affinity among single-replica volumes : This rule ensures that replicas of different volumes are distributed in such a way that there is no single point of failure. By avoiding the colocation of replicas from different volumes on the same node.
Anti-Affinity among multi-replica volumes : If the affinity group volumes have multiple replicas, they already have some level of redundancy. This feature ensures that in such cases, the replicas are distributed optimally for the stsAffinityGroup volumes.
By default, the stsAffinityGroup
feature is disabled. To enable it, modify the storage class YAML by setting the parameters.stsAffinityGroup
parameter to true.
The volumes can either be thick
or thin
provisioned. Adding the thin
parameter to the StorageClass YAML allows the volume to be thinly provisioned. To do so, add thin: true
under the parameters
spec, in the StorageClass YAML. When the volumes are thinly provisioned, the user needs to monitor the pools, and if these pools start to run out of space, then either new pools must be added or volumes deleted to prevent thinly provisioned volumes from getting degraded or faulted. This is because when a pool with more than one replica runs out of space, Mayastor moves the largest out-of-space replica to another pool and then executes a rebuild. It then checks if all the replicas have sufficient space; if not, it moves the next largest replica to another pool, and this process continues till all the replicas have sufficient space.
The capacity usage on a pool can be monitored using .
Anti-affinity among targets : The feature ensures that there is no single point of failure for the targets. The stsAffinityGroup
ensures that in such cases, the targets are distributed optimally for the stsAffinityGroup volumes.
This website/page will be End-of-life (EOL) after 31 August 2024. We recommend you to visit OpenEBS Documentation for the latest Mayastor documentation (v2.6 and above).
Mayastor is now also referred to as OpenEBS Replicated PV Mayastor.
This quickstart guide describes the actions necessary to perform a basic installation of Mayastor on an existing Kubernetes cluster, sufficient for evaluation purposes. It assumes that the target cluster will pull the Mayastor container images directly from OpenEBS public container repositories. Where preferred, it is also possible to build Mayastor locally from source and deploy the resultant images but this is outside of the scope of this guide.
Deploying and operating Mayastor in production contexts requires a foundational knowledge of Mayastor internals and best practices, found elsewhere within this documentation.
This website/page will be End-of-life (EOL) after 31 August 2024. We recommend you to visit OpenEBS Documentation for the latest Mayastor documentation (v2.6 and above).
Mayastor is now also referred to as OpenEBS Replicated PV Mayastor.
2MiB-sized Huge Pages must be supported and enabled on the mayastor storage nodes. A minimum number of 1024 such pages (i.e. 2GiB total) must be available exclusively to the Mayastor pod on each node, which should be verified thus:
If fewer than 1024 pages are available then the page count should be reconfigured on the worker node as required, accounting for any other workloads which may be scheduled on the same node and which also require them. For example:
This change should also be made persistent across reboots by adding the required value to the file/etc/sysctl.conf
like so:
If you modify the Huge Page configuration of a node, you MUST either restart kubelet or reboot the node. Mayastor will not deploy correctly if the available Huge Page count as reported by the node's kubelet instance does not satisfy the minimum requirements.
All worker nodes which will have Mayastor pods running on them must be labelled with the OpenEBS engine type "mayastor". This label will be used as a node selector by the Mayastor Daemonset, which is deployed as a part of the Mayastor data plane components installation. To add this label to a node, execute:
This website/page will be End-of-life (EOL) after 31 August 2024. We recommend you to visit for the latest Mayastor documentation (v2.6 and above).
Mayastor is now also referred to as OpenEBS Replicated PV Mayastor.
If all verification steps in the preceding stages were satisfied, then Mayastor has been successfully deployed within the cluster. In order to verify basic functionality, we will now dynamically provision a Persistent Volume based on a Mayastor StorageClass, mount that volume within a small test pod which we'll create, and use the utility within that pod to check that I/O to the volume is processed correctly.
Use kubectl
to create a PVC based on a StorageClass that you created in the . In the example shown below, we'll consider that StorageClass to have been named "mayastor-1". Replace the value of the field "storageClassName" with the name of your own Mayastor-based StorageClass.
For the purposes of this quickstart guide, it is suggested to name the PVC "ms-volume-claim", as this is what will be illustrated in the example steps which follow.
If you used the storage class example from previous stage, then volume binding mode is set to