vmkfstools -x repair /vmfs/volumes/datastore01/server01/server01.vmdk Output: "Failed to repair disk. Disk chain is inconsistent."

if [ ! -z "$FLAT_SIZE" ]; then CALC_MB=$(($FLAT_SIZE / 1024 / 1024)) echo "Health: $vmdk is $CALC_MB MB nominal." >> $LOG_FILE fi done VMDK corruption is rarely a hardware failure. It is a timing failure —where the hypervisor assumes a write completed before the storage array confirms it. The fix involves bypassing the broken descriptor and forcing a rebuild from the raw extent. Always keep a backup of the .vmdk descriptor file separately from the flat file.

I pull the manifest. The VMDK is 2TB thin-provisioned. The VM won't boot. The backup ran six hours ago, but the differential is useless because the block map is scrambled.

#!/bin/bash # vmdk_integrity_check.sh # Checks for early signs of VMDK corruption DATASTORE_PATH="/vmfs/volumes/VM_Storage" LOG_FILE="/var/log/vmdk_health.log"

"Server is slow. Throwing I/O errors. Tried to restart; now it won't boot."

[SUCCESS] Grain table rebuilt. 1,892,471 sectors recovered. [WARNING] 204 sectors unrecoverable (log files, temporary data). The SQL database performs a recovery on mount. I lose 15 minutes of transaction log data. The business accepts the loss. To prevent this, implement a VMDK health check cron job on the ESXi host:

# Check for zeroed CID (Corruption flag) CID=$(grep "CID=" $vmdk | cut -d'=' -f2) if [ "$CID" == "fffffffe" ]; then echo "ALERT: $vmdk has invalid CID. Possible corruption." >> $LOG_FILE /usr/bin/vmkfstools -x check $vmdk >> $LOG_FILE fi

# Verify descriptor matches flat extent size DESCRIPTOR_SIZE=$(grep "RW" $vmdk | awk 'print $2') FLAT_SIZE=$(stat -c %s $vmdk%-flat.vmdk.vmdk 2>/dev/null)

Ads