Wednesday 8 April 2020

Re: RFC: Ubuntu HA resource-agents supportability

On Tue, Apr 7, 2020 at 10:47 PM Rafael David Tinoco
<rafaeldtinoco@ubuntu.com> wrote:
>
> > I added a few comments below, otherwise all the categories look
> > reasonable to me, thanks!
>
> Thanks for the feedback Dan!
>
> > > clvm - clvmd daemon (cluster logical vol manager)
> >
> > Was clvm dropped from lvm2?
> > https://launchpad.net/ubuntu/+source/lvm2/2.03.02-2ubuntu1
> > I haven't used clustered lvm myself; maybe it was just rolled into lvm2.
>
> LVM2 can now use lvmlockd (lvm2-lockd in Focal) to coordinate VG access, with either dlm (dlm-controld) or sanlock as the lock manager. Unlike clvm, it does not lock at the command-line level, and it supports lvmetad.
>
> I believe we will keep seeing dlm as the lock manager, since it uses corosync underneath for messaging (the same layer pacemaker uses), so the same dlm can also support gfs2, for example.
>
> > > docker - docker container resource agent
> >
> > as cpaelzer said, docker itself shouldn't be in the fully supported list.
> >
>
> Yep, changed it already!
>
> > > lxc - allows LXC containers to be managed by the cluster
> >
> > presumably, this includes lxc and lxd?
>
> That's actually a really good question. I listed the LXC resource agent here ("fully supported agents - containers") and the LXD resource agent in the "best effort - registration agents" section.
>
> Let me explain:
>
> * We do have ocf_heartbeat_lxd resource:
>
> It is just a service agent that tells the cluster (through CIB attributes) how many containers are running on that node. The Pacemaker pengine can then base its decisions on that CIB attribute.
>
> LXD itself seems to use Raft, through its internal SQLite database, for clustering consensus, and it controls all of its own internal resources, so I'm not sure there would be any advantage in creating a pacemaker agent for LXD.
>
> It's quite a long discussion whether to rely on Paxos/Raft/Zab protocols or on a "totem single ring" protocol plus fencing mechanisms, as corosync & pacemaker do... But I think it's not worth pursuing an lxd agent if we are not managing multiple resources in big resource groups and clusters.
>
> * We also have ocf_heartbeat_lxc resource: Basically manages lxc containers.
>
> Since I believe LXD is light years ahead of pacemaker + lxc, the strategy here should be to support users of the ocf lxc agent (if any) and point them at lxd clustering.
>
> I could even move lxc agent to "best effort" as our strategy is targeted to lxd.

While lxc is still supported in Xenial, if we're talking only about
future direction, then I absolutely agree.

I also agree it's probably better to allow lxd to manage its
clustering itself, instead of with a pacemaker agent.
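
For anyone following along, the attribute-driven placement described for the
lxd agent can be pictured with a small Python sketch. This is not Pacemaker
code: the attribute name "lxd_containers" and the node names are hypothetical,
and it only illustrates a scheduler preferring the node that reports the
fewest running containers.

```python
# Illustrative only: Pacemaker does this through CIB node attributes and
# location rules, not through Python. "lxd_containers" is a made-up name.
from typing import Dict


def pick_node(attrs: Dict[str, Dict[str, str]], key: str = "lxd_containers") -> str:
    """Return the node that reports the fewest running containers."""
    # Ignore nodes that have not published the attribute at all.
    counts = {node: int(values[key]) for node, values in attrs.items() if key in values}
    if not counts:
        raise ValueError(f"no node reports attribute {key!r}")
    # Compare on (count, name) so that ties resolve deterministically.
    return min(counts, key=lambda node: (counts[node], node))


cib_attrs = {
    "node1": {"lxd_containers": "4"},
    "node2": {"lxd_containers": "1"},
    "node3": {},  # agent not running here, so no attribute is published
}
print(pick_node(cib_attrs))
```

A node without the attribute is simply skipped here; a real cluster would
express the same preference with a rule on the node attribute.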

>
> > > exportfs - nfs exports (not the nfs server)
> >
> > wouldn't this be fully supported?
>
> That's a nice catch. I'm supporting NFS HA, so I should support this along with it, since it specifies the exported directories. We can configure one per running agent and give it the same fsid, for example. Moving to supported.
>
> >
> > > fio - fio instance
> > > galera - galera instance
> > > garbd - galera arbitrator instance
> > > jboss - JBoss application server instance
> > > jira - JIRA server instance
> > > kamailio - kamailio SIP proxy/registrar instance
> > > mariadb - MariaDB master/slave instance
> > > nagios - nagios instance
> > > ovsmonitor - clone resource to monitor network bonds on diff
> > > nodes
> > > pgagent - pgagent instance
> > > pgsql - pgsql database instance
> >
> > shouldn't this be in fully supported?
>
> pgsql is in [universe], unless SEG supports pgsql "by default". Do you?

We're talking about postgresql, right? It's definitely supported (it's
in main), and from a quick look at cases, it gets quite a few
support requests, especially around cluster management.

>
> > also Brett (I added to cc) brought up that resource-agents-paf might
> > be worth considering supporting:
> > https://launchpad.net/ubuntu/+source/resource-agents-paf
>
> Also in [universe]. Open to discussion, same as for pgsql.

Right exactly, Brett was suggesting it may be worth considering this
for main as well. Just wanted to let you know in case you had any
opinions on it and FYI (and we'll be coming to you anyway if we do
pursue an MIR ;-).

>
> > >
> > > # openstack
> > >
> > > openstack-cinder-volume - attach cinder vol to an instance (os-info <->)
> > > openstack-floating-ip - move a floating IP from an instance to another
> >
> > I would expect both these to be in the fully supported category?
>
> So, ocf_heartbeat_openstack-info gets attributes from the openstack instance and records them in the cluster CIB (openstack_flavor, openstack_ports, etc). ocf_heartbeat_openstack-cinder-volume uses those attributes to attach cinder volumes to, or remove them from, an instance.
>
> I have no know-how in this, and judging by the description it felt fragile to say "fully supported". I thought making no hard commitment was "safer". What's your opinion?

Honestly for openstack stuff, I'm not the right person to say, but I'm
sure either James or Ed (on cc) could comment.
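
To make the coupling between the two agents concrete, here is a toy Python
model of it. Nothing below calls OpenStack or Pacemaker; the functions are
stand-ins, and "openstack_id" is a hypothetical attribute name (only
openstack_flavor and openstack_ports are mentioned above). It just shows that
the volume agent cannot act until the info agent has published its attributes.

```python
# Toy model of the dependency between the two agents; no OpenStack or
# Pacemaker API is called. "openstack_id" is a hypothetical attribute name.
from typing import Dict, Optional

NodeAttrs = Dict[str, str]


def record_info(cib: Dict[str, NodeAttrs], node: str, metadata: NodeAttrs) -> None:
    """Stand-in for openstack-info: publish instance metadata as node attributes."""
    cib.setdefault(node, {}).update({f"openstack_{k}": v for k, v in metadata.items()})


def plan_volume_action(cib: Dict[str, NodeAttrs], node: str, volume_id: str) -> Optional[str]:
    """Stand-in for openstack-cinder-volume: act only if the info attributes exist."""
    instance = cib.get(node, {}).get("openstack_id")
    if instance is None:
        return None  # nothing published yet, so there is nothing to attach to
    return f"attach {volume_id} to {instance}"


cib: Dict[str, NodeAttrs] = {}
before = plan_volume_action(cib, "node1", "vol-1")  # info agent has not run yet
record_info(cib, "node1", {"id": "abc", "flavor": "m1.small"})
after = plan_volume_action(cib, "node1", "vol-1")
print(before, after)
```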

>
> > > Xen - xen unprivileged domains
> >
> > as cpaelzer mentioned, Xen should probably move up to the 'best
> > effort' section; this was just moved out of main in focal.
>
> Yep =(. Will do.
>
> Thanks a lot for all the input, very helpful!

Thank you! :)

>
> -rafaeldtinoco
>

--
ubuntu-devel mailing list
ubuntu-devel@lists.ubuntu.com