Basic Troubleshooting

Logs

The correct set of log files to collect depends on the nature of the problem. If unsure, it is best to collect log files for all Mayastor containers. In nearly every case, the logs of all the control plane component pods will be needed:

  • csi-controller

  • core-agent

  • rest

  • msp-operator

kubectl -n mayastor get pods -o wide
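To gather the control plane logs in one pass, a loop along the following lines may help. This is a minimal sketch rather than an official tool: the function name collect_control_plane_logs and the default output directory are hypothetical, and it assumes that each pod name contains the component name from the list above.

```shell
# Sketch: save the logs of every container in each control plane pod.
# Assumption: pod names contain the component names listed in this section.
# Arguments: $1 = namespace (default: mayastor), $2 = output directory.
collect_control_plane_logs() (
  ns="${1:-mayastor}"
  outdir="${2:-./mayastor-logs}"
  mkdir -p "$outdir" || exit 1
  for component in csi-controller core-agent rest msp-operator; do
    # "-o name" prints one "pod/<name>" entry per line.
    kubectl -n "$ns" get pods -o name | grep -- "$component" |
      while read -r pod; do
        name="${pod#pod/}"
        kubectl -n "$ns" logs "$pod" --all-containers=true > "$outdir/$name.log"
      done
  done
)
```

Calling collect_control_plane_logs mayastor ./logs would write one file per matching pod. Because the match is a plain substring, a short name such as rest could also pick up unrelated pods, so check the resulting file list.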

Mayastor pod log file

Mayastor containers form the data plane of a Mayastor deployment. A cluster should schedule as many mayastor container instances as there are storage nodes defined. This log file is most useful when troubleshooting I/O errors; however, provisioning and management operations might also fail because of a problem on a storage node.

kubectl -n mayastor logs mayastor-qgpw6 mayastor

CSI agent pod log file

If you are experiencing problems with mounting or unmounting a volume on an application node, this log file can be useful. Generally, all worker nodes in the cluster will be configured to schedule a Mayastor CSI agent pod, so it is worth identifying which specific node is experiencing the issue and inspecting the log file only for that node.

kubectl -n mayastor logs mayastor-csi-7pg82 mayastor-csi
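When the affected node is known, the matching CSI agent pod can be looked up by node name instead of reading it off the pod list. The helpers below are a hedged sketch: the function names are hypothetical, and they assume the pod name contains mayastor-csi, as in the example above. The --field-selector spec.nodeName filter is a standard kubectl option for pods.

```shell
# Sketch: print the mayastor-csi pod(s) scheduled on the node given as $1.
# Assumption: CSI agent pod names contain "mayastor-csi".
csi_agent_pod_on_node() {
  kubectl -n mayastor get pods -o name \
    --field-selector "spec.nodeName=$1" | grep -- mayastor-csi
}

# Sketch: show the mayastor-csi container log for the first such pod.
csi_agent_logs_on_node() (
  pod="$(csi_agent_pod_on_node "$1" | head -n 1)"
  [ -n "$pod" ] || { echo "no mayastor-csi pod found on node $1" >&2; exit 1; }
  kubectl -n mayastor logs "$pod" -c mayastor-csi
)
```

For example, csi_agent_logs_on_node worker-1 would print the log for a node named worker-1, which is a placeholder name.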

CSI sidecars

These containers implement the CSI spec for Kubernetes and run within the same pods as the csi-controller and mayastor-csi (node plugin) containers. Whilst they are not part of Mayastor's code, their logs can contain useful information when a Mayastor CSI controller or node plugin fails to register with the Kubernetes cluster.

kubectl -n mayastor logs $(kubectl -n mayastor get pod -l app=moac -o jsonpath="{.items[0].metadata.name}") csi-attacher
kubectl -n mayastor logs $(kubectl -n mayastor get pod -l app=moac -o jsonpath="{.items[0].metadata.name}") csi-provisioner
kubectl -n mayastor logs mayastor-csi-7pg82 csi-driver-registrar

Coredumps

A coredump is a snapshot of a process's memory combined with auxiliary information (PID, state of registers, etc.) and saved to a file. It is used for post-mortem analysis and is generated automatically by the operating system in case of a severe, unrecoverable error (e.g. memory corruption) that causes the process to panic. Using a coredump for problem analysis requires deep knowledge of program internals and is usually done only by developers. However, there is one very useful piece of information that users can retrieve from it, and this information alone can often identify the root cause of the problem: the stack (backtrace), a record of the last action that the program was performing when it crashed. Here we describe how to get it. The steps shown apply specifically to Ubuntu; other Linux distributions might require variations.

We rely on systemd-coredump, which saves and manages coredumps on the system, the coredumpctl utility, which is part of the same package, and the gdb debugger.

sudo apt-get install -y systemd-coredump gdb lz4

If installed correctly, the global core pattern will be set so that all generated coredumps are piped to the systemd-coredump binary.

cat /proc/sys/kernel/core_pattern
coredumpctl list
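The core pattern check can be scripted. The sketch below assumes that a correctly configured pattern starts with a pipe character and names the systemd-coredump binary; the exact path and arguments differ between releases, so only those two properties are tested. The function name uses_systemd_coredump is hypothetical.

```shell
# Sketch: succeed when the core pattern passed as $1 pipes dumps to
# systemd-coredump. Assumption: such a pattern begins with "|" and
# contains the string "systemd-coredump".
uses_systemd_coredump() {
  case "$1" in
    "|"*systemd-coredump*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage on a real host:
#   uses_systemd_coredump "$(cat /proc/sys/kernel/core_pattern)" \
#     && echo "coredumps are handled by systemd-coredump"
```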

If there is a new coredump from the mayastor container, the coredump alone won't be that useful. GDB needs access to the binary of the crashed process in order to print at least some information in the backtrace. For that, we need to copy the contents of the container's filesystem to the host.

docker ps | grep mayadata/mayastor
mkdir -p /tmp/rootdir
docker cp b3db4615d5e1:/bin /tmp/rootdir
docker cp b3db4615d5e1:/nix /tmp/rootdir
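The container ID in the commands above is specific to one host. A hedged sketch of looking it up automatically follows; the function name copy_mayastor_rootfs is hypothetical, and it assumes a Docker runtime with a running container whose image name is mayadata/mayastor, optionally followed by a tag or digest.

```shell
# Sketch: copy /bin and /nix from the running mayastor container to the
# directory given as $1 (default: /tmp/rootdir).
# Assumption: the image is named mayadata/mayastor[:tag|@digest].
copy_mayastor_rootfs() (
  dest="${1:-/tmp/rootdir}"
  # Match the image name exactly so that e.g. mayadata/mayastor-csi is skipped.
  cid="$(docker ps --format '{{.ID}} {{.Image}}' |
    awk '$2 ~ /mayadata\/mayastor([:@]|$)/ { print $1; exit }')"
  [ -n "$cid" ] || { echo "no running mayadata/mayastor container found" >&2; exit 1; }
  mkdir -p "$dest" || exit 1
  docker cp "$cid:/bin" "$dest" && docker cp "$cid:/nix" "$dest"
)
```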

Now we can start GDB. Don't use the coredumpctl command to start the debugger: it invokes GDB with an invalid path to the debugged binary, so stack unwinding fails for Rust functions. First, we extract the compressed coredump.

coredumpctl info | grep Storage | awk '{ print $2 }'
sudo lz4cat /var/lib/systemd/coredump/core.mayastor.0.6a5e550e77ee4e77a19bd67436ce7a98.64074.1615374302000000000000.lz4 >core
gdb -c core /tmp/rootdir$(readlink /tmp/rootdir/bin/mayastor)

Once in GDB we need to set a sysroot so that GDB knows where to find the binary for the debugged program.

set auto-load safe-path /tmp/rootdir
set sysroot /tmp/rootdir

After that we can print backtrace(s).

thread apply all bt
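The same GDB commands can be run without an interactive session, which is convenient when the backtrace has to be attached to a bug report. The sketch below uses GDB's -batch and -ex options. The function name print_backtraces is hypothetical, and it assumes the coredump has already been extracted and the container filesystem copied as described above.

```shell
# Sketch: print backtraces of all threads non-interactively.
# Arguments: $1 = extracted core file, $2 = copied root dir (default: /tmp/rootdir).
print_backtraces() (
  core="${1:?path to the extracted core file is required}"
  rootdir="${2:-/tmp/rootdir}"
  # bin/mayastor is a symlink; resolve its target relative to the copied root.
  binary="$rootdir$(readlink "$rootdir/bin/mayastor")"
  gdb -batch \
    -ex "set auto-load safe-path $rootdir" \
    -ex "set sysroot $rootdir" \
    -ex "thread apply all bt" \
    -c "$core" "$binary"
)
```

Redirecting the output, for example print_backtraces core > backtrace.txt, saves the result to a file.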

Diskpool behaviour

The behaviour described below may be encountered while upgrading from older releases to Mayastor 2.5 and above.

Get Dsp

Running kubectl get dsp -n mayastor could result in an error caused by the v1alpha1 schema in the discovery cache. To resolve this, run the command kubectl get diskpools.openebs.io -n mayastor. After this, the kubectl discovery cache will be updated with the v1beta1 object for dsp.

Create API

When creating a Disk Pool with kubectl create -f dsp.yaml, you might encounter an error related to v1alpha1 CR definitions. To resolve this, ensure your CR definition is updated to v1beta1 in the YAML file (for example, apiVersion: openebs.io/v1beta1).

You can validate the schema changes by executing kubectl get crd diskpools.openebs.io.
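Before running kubectl create, a manifest can be checked for the outdated API version. This is a minimal sketch with hypothetical function names; it reads only the first top-level apiVersion line in the file, so multi-document YAML files or quoted values would need extra handling.

```shell
# Sketch: print the first top-level apiVersion value found in the file $1.
dsp_manifest_api_version() {
  sed -n 's/^apiVersion:[[:space:]]*//p' "$1" | head -n 1
}

# Sketch: succeed when the manifest $1 still uses the v1alpha1 definition.
needs_v1beta1_update() {
  [ "$(dsp_manifest_api_version "$1")" = "openebs.io/v1alpha1" ]
}

# Usage:
#   needs_v1beta1_update dsp.yaml && echo "update apiVersion to openebs.io/v1beta1"
```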
