Alex’s Blog

I try to be smart, sometimes

Server Crash of 2021

I ended 2020 with a bang in my homelab. Well, not an actual bang, more like a sudden power cut on my main server. Before we dive into the details (I am currently typing this while my new OS install is reapplying permissions to all my files), let’s get to know my main server, “Alpha Centauri,” a little bit. Alpha Centauri, or A.C. from here on, is the heart and soul of my homelab. I built this beast back in 2018 after I realized my HP Z400 workstation was no longer cutting it for my needs.

Parts for A.C. were sourced mostly from IT recyclers on eBay and random Reddit users on /r/hardwareswap and /r/homelabsales. She was originally built with 128GB of DDR3 ECC RAM, 2x Intel E5-2670v2 (6c/12t), 3x 3TB WD Red HDDs, an LSI SAS HBA, 2x HP quad-port NICs, and a PCIe adapter for M.2 NGFF + NVMe. Over the years she was upgraded with a Quadro P2000, 3 more 3TB WD Reds, a Dell 1.2TB Fusion IO MLC SSD, an Intel x520 SFP+ adapter, and a few random SATA SSDs. As of 2021, the current specs for the server are:

CPU: x2 Intel Xeon E5-2670 v2 (6c/12t)
MOBO: x1 Intel S2600CP
RAM: 128GB 1333MHz ECC Unbuffered (64GB / CPU)
HDD: x8 WD Red 3TB (21.8TB total)
HDD: x2 WD Red 2TB
SSD: x1 256GB WD Blue M.2 NGFF + x1 256GB Samsung M.2 NGFF
SSD: x1 Samsung 850 Evo 500GB
SSD: x1 Crucial 128GB
SSD: x1 Dell 1.2TB MLC Fusion IO SSD
GPU: x1 PNY Nvidia Quadro P2000
NIC-1: x1 4 port HP Gigabit
NIC-2: x1 10Gtek Intel x520 SFP+
PSU: x1 EVGA G2 1000w
CASE: x1 Rosewill RSV-L4500
OS: Windows Server 2019 Datacenter [Core]
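A quick aside on the "21.8TB total" figure for the eight 3TB drives: it lines up if the drives are sized in decimal terabytes while the OS reports binary units. The sketch below shows that arithmetic. It assumes the figure is raw capacity with no parity or filesystem overhead, and the function name is just something made up for illustration.

```python
# Convert marketed drive capacity (decimal TB) into binary TiB,
# which is the unit Windows displays under the label "TB".
# Assumption: raw capacity only, no parity or formatting overhead.

def marketed_tb_to_tib(drive_count: int, tb_per_drive: float) -> float:
    """Return the combined capacity of the drives in TiB."""
    total_bytes = drive_count * tb_per_drive * 10**12  # 1 TB = 10^12 bytes
    return total_bytes / 2**40                         # 1 TiB = 2^40 bytes


if __name__ == "__main__":
    # Eight 3TB WD Reds, as in the spec list above.
    print(f"{marketed_tb_to_tib(8, 3):.1f} TiB")
```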

So What Happened?

I had just finished watching an episode of The Blacklist on Plex when my wife informed me that the TV in the living room could not connect to our Plex server. I wasn’t really surprised, as the LG WebOS TV app for Plex is terrible and disconnects frequently. So I fired up the Plex app on my PC to double-check that the server was still running, since I was dealing with a memory issue in my new PlexWatchDog script that caused it to crash every few hours.

Server Unavailable

Looks like it crashed again. Easy fix. Fired up Hyper-V Manager and tried to connect to A.C. to reboot the Plex VM.

Unable to connect to Alphacentauri.lab. Please verify…

Crap, the entire server is down. Okay, let’s check IPMI.

Host Offline

Yup, definitely crashed.
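The order I checked things in (Plex, then the Hyper-V host, then IPMI) could be scripted as a simple layered probe. This is only a rough sketch, not something I actually ran that night. The port numbers are assumptions (32400 is the Plex default, 5985 is WinRM over HTTP, and 443 is a guess at the BMC web UI), and `plex.lab` and `ipmi.lab` are hypothetical hostnames.

```python
import socket
from typing import Callable, Iterable, List, Tuple


def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False


def triage(checks: Iterable[Tuple[str, Callable[[], bool]]]) -> List[str]:
    """Run every check in order and return the names of the layers that failed."""
    return [name for name, probe in checks if not probe()]


def homelab_checks() -> List[Tuple[str, Callable[[], bool]]]:
    """Checks ordered from the application down to the hardware.

    Hostnames other than Alphacentauri.lab are hypothetical placeholders.
    """
    return [
        ("Plex VM", lambda: tcp_reachable("plex.lab", 32400)),
        ("Hyper-V host", lambda: tcp_reachable("Alphacentauri.lab", 5985)),
        ("IPMI / BMC", lambda: tcp_reachable("ipmi.lab", 443)),
    ]
```

Calling `triage(homelab_checks())` returns the list of layers that did not answer. If the Plex VM and the host both fail while the BMC still responds, the problem is the host itself rather than the network.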

Diagnosing the Issue

So after hooking up my “crash cart” (an Elgato capture card in my PC and a spare keyboard) to the server and rebooting, I noticed that one of the drives in my Plex array was offline, and so was the SSD hosting my VMs’ VHDs. The latter would definitely crash the host with a BSOD. Sure enough, the host crashed again while loading Server 2019 Core, which meant it was time to do a rack pull and investigate.

After verifying that all my drives and the server itself were OK, I booted it using a full boot cycle instead of the normal “fast boot.” While watching the process, I noticed that the SSD had come back and was showing again. However, the offline HDD in the Plex array was not back. I decided to try something completely random: I unplugged one of my non-essential drives from power and booted the system again.

Huzzah! The drive was back! Which means my 850w PSU was not sufficient to run the system at full capacity.

Replacing the 850w PSU with a 1000w unit fixed the problem, and all drives came back up after the replacement.
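For anyone wanting to sanity-check their own PSU, the idea can be roughed out as a startup power budget: steady load plus the surge from every spinning drive starting at once, compared against a derated PSU rating. The sketch below is only that, a sketch. The wattage numbers in the example are made-up placeholders, NOT measurements from A.C., and the 80% derate is a rule-of-thumb assumption on my part. The drive count of 10 comes from the spec list (eight 3TB plus two 2TB HDDs).

```python
def startup_headroom(
    psu_watts: int,
    base_load_watts: int,
    drive_count: int,
    spinup_watts_per_drive: int,
    derate_percent: int = 80,
) -> float:
    """Return the watts left over at power-on (negative means over budget).

    derate_percent is the share of the PSU rating treated as usable;
    80 is an assumed rule of thumb, not a manufacturer figure.
    """
    usable = psu_watts * derate_percent / 100
    peak = base_load_watts + drive_count * spinup_watts_per_drive
    return usable - peak


if __name__ == "__main__":
    # Placeholder figures for illustration only: 500 W for everything that
    # is not a hard drive, 25 W per drive during spin-up.
    for psu in (850, 1000):
        print(psu, startup_headroom(psu, 500, 10, 25))
```

With real numbers from a wall meter or the PSU's 12V rail spec, a negative result would point to the kind of drive dropouts described above. It does not prove the PSU was the cause here.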

Conclusion

I am still not entirely sure why the server crashed, as I was unable to figure out what triggered it. This had never happened before on the 850w PSU, and the server was originally built with it, so I had no reason to suspect the power supply. Guess I was wrong there!

PS: No, I will not be switching to Unraid. 🙂

Written by Alexander Henderson

I do IT things sometimes.
