beta.blog

Archive for December, 2023

GlusterFS – Resolving the “split-brain” issue

by on Dec.01, 2023, under News

About GlusterFS

GlusterFS is an open-source, scalable network filesystem suited to data-intensive tasks such as cloud storage and media streaming. It takes a distributed file system approach, pooling the storage resources of multiple machines into a single namespace.

Key features and concepts include:

  • Scale-Out Architecture: GlusterFS is designed to scale horizontally, meaning we can simply add more machines to expand storage capacity.
  • Elasticity: Servers can be added to or removed from the storage pool without downtime.
  • Flexibility: Supports diverse workloads, allowing it to be used for archival, rich media, or cloud storage.
  • Self-healing: When a failed node comes back online, GlusterFS automatically re-synchronizes (heals) its data from the healthy replicas, keeping the data highly available.
  • Replication: Data can be mirrored across multiple nodes to ensure data redundancy and increase fault tolerance.
  • Striping: Data can be split across multiple nodes to improve performance (striped volumes are deprecated in current releases, where sharding fills this role).
  • Unified Namespace: All storage is presented as a single file namespace, so clients do not need to know which server holds a given file.
  • Geo-replication: Allows for asynchronous replication of data between geographically dispersed clusters.
  • Agility: Being software-defined, it allows users to scale and manage storage resources on commodity hardware, which can lead to significant cost savings.

In essence, GlusterFS offers a cost-effective, highly scalable and robust storage solution, making it a strong option for organizations that need to manage large amounts of data without sacrificing performance or resilience.

The “Split-Brain” issue

Split-brain is a problematic scenario in distributed systems, particularly in clusters, where two or more nodes operate independently instead of staying synchronized. It arises when nodes lose communication with each other and, each assuming the others have failed, keep accepting changes on their own. As a result, the system no longer has a single source of truth, and the data becomes inconsistent. In a replicated GlusterFS volume, if the bricks holding copies of the same file receive different writes while they cannot reach each other, they end up with divergent copies of that file, each marked as needing a heal from the other. Because self-heal cannot tell which copy is correct, resolving the inconsistency afterwards can be complex and usually requires manual intervention.

A typical symptom is an input/output error (EIO) when a client accesses a file that is in split-brain.

Debugging and resolving a split-brain situation

Let’s assume our Gluster volume is called gv0. To find files in split-brain, run the following command from a terminal on any node of the cluster:

gluster volume heal gv0 info

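Suppose, for example, that it reports the file data/ib_logfile0 as being in split-brain. In this setup the brick directory on each node is /data/glusterfs/gv0, so the brick copy of that file is /data/glusterfs/gv0/data/ib_logfile0; the following steps operate on that brick path, not on the GlusterFS mount. (gluster volume heal gv0 info split-brain lists only the entries that are in split-brain.)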

First, back up the file. The copies on the different nodes may not be identical, so make a separate backup of the brick copy on each node of the cluster.
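A minimal sketch of that backup step, to be run as root on each node. The function name backup_brick_copy and the backup directory /root/splitbrain-backup are hypothetical choices for illustration; the node name is appended to each copy so that backups taken on different nodes can later be compared side by side.

```shell
# Back up this node's brick copy of a split-brain file.
# Usage: backup_brick_copy <file-on-brick> <backup-dir>
# (hypothetical helper; the backup directory is an assumption, pick your own)
backup_brick_copy() {
    src=$1
    dest_dir=$2
    mkdir -p "$dest_dir" || return 1
    # Suffix the copy with this node's name so copies from different
    # nodes do not collide once they are gathered in one place.
    dest="$dest_dir/$(basename "$src").$(uname -n).bak"
    # -p keeps the mode, ownership and timestamps of the brick copy.
    cp -p "$src" "$dest" && echo "$dest"
}

# Example, run on every node:
# backup_brick_copy /data/glusterfs/gv0/data/ib_logfile0 /root/splitbrain-backup
```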

Next, as root, read the file’s GFID (the unique identifier GlusterFS assigns to every file) from its extended attributes on the brick:

getfattr -m . -d -e hex /data/glusterfs/gv0/data/ib_logfile0

This should print some file attributes, along with the line:

trusted.gfid=0x23d6d1f6bc6b4e93b23b075e80eaf7f0
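If you want to script this step, the GFID can be pulled out of the getfattr output with a small sed filter. This is a sketch that assumes the output format shown above (a line of the form trusted.gfid=0x followed by 32 hex digits); the function name extract_gfid is hypothetical.

```shell
# Print the GFID (32 hex digits, without the 0x prefix) from
# "getfattr -m . -d -e hex" output read on stdin.
# Lines such as trusted.gfid2path.* are ignored because "=" must follow "gfid".
extract_gfid() {
    sed -n 's/^trusted\.gfid=0x\([0-9a-fA-F]\{32\}\)$/\1/p'
}

# Example, run against the brick copy:
# getfattr -m . -d -e hex /data/glusterfs/gv0/data/ib_logfile0 | extract_gfid
```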

With the backups in place, delete both the file and its GFID hard link under .glusterfs (the “meta file” GlusterFS keeps for every file on the brick). Run the following command on each node of the cluster, against the brick paths rather than the GlusterFS mount:

rm /data/glusterfs/gv0/data/ib_logfile0 /data/glusterfs/gv0/.glusterfs/23/d6/23d6d1f6-bc6b-4e93-b23b-075e80eaf7f0
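The second path in that command follows a fixed layout: each brick keeps a hard link to every regular file under .glusterfs/, in two directory levels named after the first two and the next two hex digits of the GFID, with the GFID in 8-4-4-4-12 UUID form as the file name. Removing only the named file would leave the data reachable through that link. A minimal sketch of a helper that builds the link path from the getfattr value; the name gfid_link_path is hypothetical.

```shell
# Build the .glusterfs hard-link path for a GFID on a given brick.
# Usage: gfid_link_path <brick-root> <gfid-hex>
#   <gfid-hex> may carry the 0x prefix printed by "getfattr -e hex".
gfid_link_path() {
    brick=$1
    hex=${2#0x}
    # Reject anything that is not exactly 32 hex digits.
    case $hex in
        *[!0-9a-fA-F]*) echo "invalid gfid: $2" >&2; return 1 ;;
    esac
    if [ ${#hex} -ne 32 ]; then
        echo "invalid gfid: $2" >&2
        return 1
    fi
    # Insert dashes to get the 8-4-4-4-12 UUID form.
    uuid=$(printf '%s\n' "$hex" | sed 's/^\(.\{8\}\)\(.\{4\}\)\(.\{4\}\)\(.\{4\}\)\(.\{12\}\)$/\1-\2-\3-\4-\5/')
    # The first two and the next two hex digits name the directory levels.
    d1=$(printf '%s\n' "$hex" | cut -c1-2)
    d2=$(printf '%s\n' "$hex" | cut -c3-4)
    printf '%s/.glusterfs/%s/%s/%s\n' "$brick" "$d1" "$d2" "$uuid"
}

# Example:
# gfid_link_path /data/glusterfs/gv0 0x23d6d1f6bc6b4e93b23b075e80eaf7f0
# prints /data/glusterfs/gv0/.glusterfs/23/d6/23d6d1f6-bc6b-4e93-b23b-075e80eaf7f0
```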

Once this has been done on every node, gluster volume heal gv0 info should no longer report the file as being in split-brain.

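Finally, decide which node’s backup holds the correct version of the file and copy it back through the GlusterFS mount point (not directly into a brick directory); GlusterFS then writes the restored file to every replica.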
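A sketch of the restore step. It assumes the volume is mounted at /mnt/gv0 (a hypothetical mount point) and that the file lived at data/ib_logfile0 relative to the volume root; the function name restore_via_mount is also hypothetical. The key point is that the copy goes through the client mount, so GlusterFS creates the file on all replicas with a fresh GFID.

```shell
# Restore the chosen backup through the GlusterFS client mount.
# Usage: restore_via_mount <backup-file> <mount-point> <path-relative-to-volume-root>
restore_via_mount() {
    backup=$1
    mnt=$2
    rel=$3
    target="$mnt/$rel"
    # The file should be gone after the clean-up step; refuse to overwrite.
    if [ -e "$target" ]; then
        echo "refusing to overwrite existing $target" >&2
        return 1
    fi
    mkdir -p "$(dirname "$target")" || return 1
    cp -p "$backup" "$target"
}

# Example, using the copy judged correct (hypothetical names):
# restore_via_mount /root/splitbrain-backup/ib_logfile0.node1.bak /mnt/gv0 data/ib_logfile0
# gluster volume heal gv0 info
```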
