Q:

Drive has failed, resilvering is extremely slow, and the system won’t boot.

I have a 2023 system with a simple ZFS mirror pool (rust) using 2x14TB Seagate drives. The system also has 4 SSDs (boot, mirrored data, L2ARC).

Yesterday, TrueNAS reported one drive (ZHZ3Q546) had failed, and the pool went DEGRADED, then SUSPENDED. After a reboot, the pool showed ONLINE and started resilvering. Later, the other drive (WAINR7DV) became DEGRADED. Resilvering is extremely slow (around 600KB/s), causing high I/O load, and SSH/web access is difficult.

I powered off and removed WAINR7DV, checked it on another system, and it looked fine. TrueNAS now keeps rebooting. A scrub on WAINR7DV found no issues.

I want to keep the system running with a single degraded drive while I get a replacement. I’m looking for advice on what might have happened, what I may have done wrong during recovery, and the best next steps.

EDIT: I left WAINR7DV in the system after the first alert. The system now boots in degraded state, and a long SMART test is running.

All Replies

The mirror went DEGRADED when the first drive (ZHZ3Q546) failed, and the second drive (WAINR7DV) then reported errors during the resilver, most likely because of the sustained heavy I/O that a resilver places on large drives. Keeping both drives active and continuing to use the system during recovery slowed the resilver further and may have contributed to the later problems. The system is now unstable and boots only with the pool in a degraded state.

The best course of action is to stop all non-essential activity, run SMART tests on the remaining drive, back up important data immediately, and avoid writing to the pool. Once a replacement drive is ready, replace the failed drive and let the resilver finish without additional load. Temporarily removing the L2ARC device (and a SLOG device, if the pool has one) may reduce stress during recovery. The priority is to stabilize the pool and secure the data before doing anything else.
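
To put the reported resilver rate in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes decimal units (1 TB = 10^12 bytes, 1 KB = 10^3 bytes), a constant rate, and a worst case where the full 14 TB is allocated; a real resilver only copies allocated data, so the actual figure depends on how full the pool is. The function name and the 150 MB/s comparison rate are illustrative assumptions, not values from the thread.

```python
# Rough resilver-time estimate. Assumptions: decimal units, a constant
# transfer rate, and (worst case) a completely full 14 TB mirror.
TB = 10**12
MB = 10**6
KB = 10**3
SECONDS_PER_HOUR = 3_600
SECONDS_PER_DAY = 86_400


def resilver_seconds(allocated_bytes: float, rate_bytes_per_s: float) -> float:
    """Return the time in seconds to copy allocated_bytes at a constant rate."""
    if rate_bytes_per_s <= 0:
        raise ValueError("rate must be positive")
    return allocated_bytes / rate_bytes_per_s


# Rate reported in the question: about 600 KB/s.
slow = resilver_seconds(14 * TB, 600 * KB)
# Hypothetical healthy sequential rate, chosen only for comparison.
healthy = resilver_seconds(14 * TB, 150 * MB)

print(f"at 600 KB/s: {slow / SECONDS_PER_DAY:.0f} days")
print(f"at 150 MB/s: {healthy / SECONDS_PER_HOUR:.0f} hours")
```

At 600 KB/s a full 14 TB copy would take on the order of 270 days, against roughly a day at the assumed 150 MB/s. A rate that low points to a struggling drive or an overloaded system rather than a resilver that just needs more time, which is why backing up first and removing other load matters.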
