Well it’s been a VERY long time. So much has changed. But enough about me and my gap in posting.
Given the discontinuation of Apple’s Time Capsules a while back, I knew that ultimately I would have to put some other sort of NAS storage system on the network to continue to enjoy the seamless backups. That time came this summer when the Time Capsule was starting to make some unpleasant sounds which often precede a disk failure. It was time.
Cue a fair bit of research. Looking at Synology, Western Digital, Buffalo, Asustor, QNAP, TerraMaster. Many great products. All commercial, simple, decently supported. But pricey for what you get. I had just been burned by a bad Kickstarter that was promising a good home NAS and server system, and I wanted all that storage and flexibility, and that price. So I looked at what the state of the DIY sphere was.
Ultimately it came down to TrueNAS and Unraid for being able to have solid disk arrays, flexibility and resilience. TrueNAS is more of an enterprise offering and also had what seemed to be an amazing ecosystem with TrueCharts. Unraid was also a great alternative, simpler, perhaps a bit less powerful, but also booted off of a flash drive, which just didn’t quite sit right with me. So ultimately, I went TrueNAS.
The config: AMD Ryzen 7 8700G with 64 GB of DDR5 RAM (not ECC; if you’re into the NAS/TrueNAS scene, it’s a debate), 1 TB NVMe boot drive, Asus Prime B650M-A AX motherboard, 750W Seasonic Prime Gold 80+ power supply, Fractal Design Node 804 case (GREAT case for a server with space for eight 3.5” drives, and really easy access), five 8 TB IronWolf NAS drives and three 4 TB IronWolf NAS drives (2 pools and a somewhat constrained budget). All that was going to get me a lot more storage than the commercial offerings, a lot of performance for running containers/servers at home, transcoding capability if I did put a Plex server together, and 2.5 Gbit ethernet wired connectivity. Start the adventure.
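For a rough sense of what those two sets of drives add up to, here is a minimal sketch of the arithmetic. The post doesn't say how the pools were laid out, so the single-parity RAIDZ1 layout, the pool names, and the helper function are all assumptions for illustration, and real ZFS overhead will shave a bit more off the usable numbers.

```python
def raidz_usable_tb(drive_count: int, drive_tb: float, parity: int = 1) -> float:
    """Approximate usable capacity of one RAIDZ vdev, ignoring ZFS overhead."""
    if drive_count <= parity:
        raise ValueError("need more drives than parity disks")
    return (drive_count - parity) * drive_tb


# Hypothetical pool names; drive counts and sizes are the ones from the build.
pools = {"big": (5, 8.0), "small": (3, 4.0)}

for name, (count, size_tb) in pools.items():
    raw_tb = count * size_tb
    # ASSUMPTION: each pool is a single RAIDZ1 vdev (one drive's worth of parity).
    usable_tb = raidz_usable_tb(count, size_tb, parity=1)
    print(f"{name}: raw {raw_tb:.0f} TB, roughly {usable_tb:.0f} TB usable")
```

Swap in `parity=2` to see what a RAIDZ2 layout would cost in capacity instead.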
I put it all together, installed TrueNAS and got things up and running. Just started with a pure Time Machine backup on an SMB share configured for multi-user Time Machine. MUCH faster, and worked great. I made some mistakes though, and the array started getting fatally corrupted. Cue a fair bit of troubleshooting looking for drive failures, long drive testing, etc. It seems the motherboard SATA controllers might have been getting overloaded, and ZFS is a bit temperamental about the hardware (again, the ECC debate comes up, but that wasn’t where it was happening). So I added a PCIe card with 8 SATA connectors, the JBOD-configured 9211 card that the TrueNAS forums recommend. Rebuilt the pools. All is well. No more disk corruption. So. Pretty sure that problem was diagnosed and solved correctly.
After more successful backups and stability, it’s time to add in a few servers. Simple. Gitea (a source code control server based on Git) and the Postgres database server to support it. Easy install, easy startup, all good. TrueCharts was as promised.
Then I started to get random reboots. Zero logs, just the computer would randomly reboot itself. No warning. Cue another investigation. It seemed to correlate with the start of Time Machine backups and disk activity spikes. Hmmmm. Again testing disks, nothing. Power? 750W was WELL in excess of what the disks need when all starting up at once. Not sure. One of my sons had a use for a power supply on a gaming build, so I grabbed a 1000W Gold 80+ Seasonic. But before that I grabbed a live Ubuntu image on USB, and booted the system clean on that. Mounted all the drives and did stress tests on the CPU, RAM, and every disk including the NVMe. For a week. Perfectly stable. Tuned the stress tests to maximize the load and minimize wait times on the arrays, and also to burst it all. Rock solid. The hardware recommendations from the TrueNAS forums were as promised.
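The power-supply reasoning above is basically a headroom calculation, and it can be sketched in a few lines. None of the wattages below were measured on this build; the per-drive spin-up draw, CPU peak, and "everything else" figure are placeholder assumptions, so plug in the datasheet numbers for your own parts.

```python
# ASSUMED placeholder figures, not measurements from this system.
ASSUMED_SPINUP_W_PER_HDD = 30.0  # peak draw of one 3.5" drive while spinning up
ASSUMED_CPU_PEAK_W = 90.0        # CPU under full load
ASSUMED_OTHER_W = 50.0           # motherboard, RAM, NVMe, fans, SATA card

PSU_W = 750.0    # rated output of the power supply in the build
HDD_COUNT = 8    # five 8 TB drives plus three 4 TB drives


def peak_draw_w(hdd_count: int) -> float:
    """Worst case: every drive spins up at once while the CPU is fully loaded."""
    return hdd_count * ASSUMED_SPINUP_W_PER_HDD + ASSUMED_CPU_PEAK_W + ASSUMED_OTHER_W


peak = peak_draw_w(HDD_COUNT)
headroom = PSU_W - peak
print(f"Estimated peak draw: {peak:.0f} W, headroom: {headroom:.0f} W")
```

With these assumed numbers the estimate lands well under the supply's rating, which matches the gut feeling that 750W should have been plenty.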
So. Full clean install then. Wiped everything. EVERYTHING. Latest 24.04.2.3 version of TrueNAS, clean install, new arrays, and off we go again. No load. How long will it run by itself just doing snapshots? (DO THESE WITH ZFS. It’s like Time Machine for your arrays without piles of backup storage, but you DO still need full backups outside the system; that’s a different post.) Flawless. No issues, smooth as silk for a week again. Start the Time Machine backups. Also all good. Ran for over a week. Zero issues. Then random reboots started up again. The reboots stopped when I reinstalled the entire system, and now they are starting again? This smells of a bug now. Zero errors of any kind reported by ZFS or S.M.A.R.T. on any of the disks, NVMe or HDD. But no disk corruption. Before, I had to revert to a snapshot to have Time Machine work again. Not this time. Curiouser and curiouser.
Seemed semi-stable now though. I started the servers again. Same behaviour for another week. Still generally stable but random reboots. Keep looking, keep checking firmware and any other clue or possibility I could find in the forums. Everything SHOULD be solid.
And now, corruption of the Time Machine backup again. The array is again fine, as it has been ever since I put the PCIe SATA card in the system. There’s a bug somewhere. And all through this the TrueNAS system will randomly reprint the IP addresses for the interfaces despite them being static (I also tried a permanently assigned DHCP address, same behaviour).
Well, I KNOW TrueNAS is amazing and solid and brilliant for literally thousands of people and that iXsystems has a great offering for thousands of companies. But I think I’m throwing the towel in on this. TrueCharts had a major blow-up with the community and abruptly just UNPUBLISHED all of the repo. So now there’s an open source war on the ecosystem that was also valuable. It was all built on Helm Charts on k3s (mini Kubernetes, or k8s if you’re into that) and it was another lift to just take a simple docker container and get that going, but I was less worried about that. Now it was a pain.
Add it all up, and on top of that Unraid is now supporting ZFS in their 7.0.0-beta-x beta stream. So I looked into backup options and people’s experience with that flash boot drive. Nothing alarming, and generally really great reviews of response and support. It is a paid license. But it’s a very reasonable amount and it is a lifetime license. None of the call-home behaviour of some of the mainstream commercial stuff, all very open, open standards, and generally a much more “grass roots” company, but very successful. Standard docker support, and again a great and flourishing ecosystem. Noted as much more user friendly.
To be clear, I’ve done a fair bit of sysadmin work over my career, and nothing TrueNAS is doing is leaving me in the dust, but at some point, I want the systems to work. I’m not hacking on these things, they are supporting my hacking efforts. I’m more software than hardware.
So I’m going to look into pulling it all down and trying out Unraid now. I still back up everything to detached SSD drives and the Git and DB stuff is all also locally mirrored so nothing is at risk. And I have offsite backups on separate HDDs. This is an experiment to do more and enable more and also move past the Time Capsules without having drives dangling everywhere.
So I still have a pile of respect for the TrueNAS system and community, and they are moving with the upcoming 24.10 Electric Eel releases (in release candidate status as of this posting) to support pure docker and put the TrueCharts fiasco behind them. All great things. It’s the fact that the foundational reason for this all, the backups, is not reliable FOR ME. It’s obviously reliable for other people, but I’ve invested a lot of time to make it work, and just haven’t had success, and I can’t swap out every piece of hardware just to try to find some edge case issue in the software after all the mainstream recommended stress tests showed all the hardware to be rock solid.
So hopefully I will actually do some follow up posts for anyone reading this with what I discover. 🙂