• 1 Post
  • 9 Comments
Joined 2 years ago
Cake day: June 30th, 2023

  • The problem is that I want failover to work if a site goes offline. This happens quite a bit with private ISPs where I live, and instead of waiting for the connection to be restored, my idea was that Kubernetes would see the failed node and replace it.

    Most data will be transferred locally (with node affinity), and only on failure would the pods spread out to other sites. The remaining problem is storage, which is why I’m here looking for options.
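
A minimal sketch of the "prefer local, spread out on failure" idea, written as a Python function that builds the scheduling part of a pod spec. It assumes each site is labelled with the well-known `topology.kubernetes.io/zone` node label and that the pods are managed by a controller such as a Deployment; the function name, the site name `site-a`, and the 30-second value are made up for illustration. The affinity is only a preference, so pods can still land on another site, and the tolerations shorten how long a pod stays bound to a node that has been marked unreachable.

```python
import json


def failover_pod_settings(site: str, evict_after_seconds: int = 30) -> dict:
    """Build the affinity/tolerations part of a pod spec as a plain dict.

    `site` is a hypothetical zone label value; `evict_after_seconds` is an
    illustrative number, not a recommendation.
    """
    return {
        "affinity": {
            "nodeAffinity": {
                # "preferred" (not "required"): the scheduler favours the home
                # site but may still place the pod elsewhere if it is down.
                "preferredDuringSchedulingIgnoredDuringExecution": [
                    {
                        "weight": 100,
                        "preference": {
                            "matchExpressions": [
                                {
                                    "key": "topology.kubernetes.io/zone",
                                    "operator": "In",
                                    "values": [site],
                                }
                            ]
                        },
                    }
                ]
            }
        },
        # Evict the pod after this many seconds once its node carries the
        # unreachable / not-ready NoExecute taint.
        "tolerations": [
            {
                "key": key,
                "operator": "Exists",
                "effect": "NoExecute",
                "tolerationSeconds": evict_after_seconds,
            }
            for key in (
                "node.kubernetes.io/unreachable",
                "node.kubernetes.io/not-ready",
            )
        ],
    }


if __name__ == "__main__":
    # Print the fragment as JSON so it can be merged into a pod template.
    print(json.dumps(failover_pod_settings("site-a"), indent=2))
```

This only covers scheduling. A pod that moves to another site still needs its volume to be reachable there, which is the storage question this thread is about.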


  • Thanks for the info!

    I’ll try Rook-Ceph; Ceph has been recommended quite a lot now. Sadly my NVMe drives don’t have PLP (power-loss protection), but AFAICT that should still work because not all nodes will lose power at the same time.

    I’d rather start with the hardware I have and upgrade as necessary. Backups are always running for emergencies, and I can’t afford to replace all the drives.

    I’ll join Home Operations and see what info I can find.


  • It’s fine if the bottleneck is upload/download speed; there’s no easy way around that.
    What worries me more are the other problems, like high latency or using more bandwidth than necessary. Maybe a local read cache or something similar could be a solution too, but that’s why I’m asking what is in use and what works, versus what is better reserved for dedicated networks.
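
A rough back-of-the-envelope sketch for the bandwidth worry, assuming a replicated backend that sends every write once to each replica at another site and ignoring protocol overhead, compression, and reads. The function names and the example numbers are made up for illustration; MB means 10^6 bytes and Mbit means 10^6 bits here.

```python
def replication_uplink_mbit(write_mb_per_s: float, remote_replicas: int) -> float:
    """Uplink (Mbit/s) needed if each write is shipped once per remote replica."""
    return write_mb_per_s * 8 * remote_replicas


def fits_uplink(write_mb_per_s: float, remote_replicas: int, uplink_mbit: float) -> bool:
    """True if the estimated replication traffic fits into the given uplink."""
    return replication_uplink_mbit(write_mb_per_s, remote_replicas) <= uplink_mbit


if __name__ == "__main__":
    # Hypothetical workload: 5 MB/s of writes, two replicas at other sites.
    needed = replication_uplink_mbit(5, 2)
    print(f"needed uplink: {needed} Mbit/s")
    print("fits a 40 Mbit/s uplink:", fits_uplink(5, 2, 40))
```

With these example numbers the estimate is 80 Mbit/s, so a 40 Mbit/s upload would not keep up with sustained writes. Latency is a separate issue that this estimate does not capture.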




  • They both support k8s: JuiceFS with either just a hostPath (not what I’d use) or the JuiceFS CSI Driver, and LINSTOR through an operator which uses DRBD and provides a CSI driver too.

    If you know of storage classes which are useful for this deployment (or just ones you want to talk about in general), then go ahead. From what I’m seeing in this thread, I’ll probably have to deploy all the options that seem reasonable and test them myself anyway.




  • By storage backends I mean the provisioner. I will use local storage on the nodes, with either LVM or just a plain filesystem.

    I already set up a cluster and tried LINSTOR. I’m looking for experiences with the other options because I don’t want to test them all.

    I currently manage all the servers with a NixOS repository, but I’m looking for better failover.