Tuesday, 10 January 2017

Re: netplan and post-up/pre-down scripts

Hi Martin,

Thanks for the reply.

>> Let me explain my use case: when an interface goes up or down, I want to
>> be able to do event-driven things with the network configuration, such as
>> add or remove routes, run a DHCP client, etc.

> These two and more are already supported by networkd and NM (i.e. both of
> netplan's current backends) and also in the netplan YAML itself of course. OOI,
> what is your particular use case?

I had two test cases, which I first tried implementing using 'auto' and 'dhcp' interfaces in ifupdown, then in netplan.

Physical setup: two interfaces intended for routing, one serving as the WAN interface and the other as the LAN interface.

Logical setup: each physical interface has an attached bridge, each containing just that one port. For example:

Physical: eth0 ---> Bridge: wan0
Physical: eth1 ---> Bridge: lan0

I configured it this way because I wanted a no-hassle way to attach LXD containers and/or VMs to a physical network.[1]

Now, this may have a lot to do with the fact that I'm using the bridge; the behavior I'm describing may be very different for a "pure physical" interface. But with this setup, I see 5-minute timeouts at boot when I:

 - Declare an 'auto' interface in 'ifupdown' which is disconnected at system boot.
 - Declare a 'dhcp' interface in 'ifupdown' which is disconnected at system boot.

I've seen similar 5-minute timeouts occur in the field with bond interfaces whose physical links are not [all?] available.

After the system booted, I tried more tests, such as checking whether routes and addresses were added or removed when I disconnected the physical link. I was also testing a redundant setup with interfaces on the same network and route metrics in place for shortest-path selection, so I expected the higher-metric route to be used when the lower-metric interface's link went down. Instead, the routes via the downed interface were marked "linkdown" in the "ip route" output, but route lookups still selected the "linkdown" route rather than switching to the higher-metric route to the same destination. (So the only option is to remove them from the kernel on link down. Note that I don't want to configure a bond; this is for the case where "someone accidentally, or maybe on purpose, plugged the WAN port into the LAN, or bridged the networks". You want traffic to keep flowing in that case, and to return to normal when it's fixed.)
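Removing the "linkdown" routes by hand means a link-down hook first has to find them. Below is a minimal Python sketch that assumes the plain-text output of iproute2's `ip route show`, with one route per line and status flags such as `linkdown` printed as trailing words. The function name and sample addresses are hypothetical. The set of flags stripped before calling `ip route del` is also an assumption and is not an exhaustive list.

```python
# Route status flags that `ip route show` prints but that are not route
# selectors, so they are dropped before passing the words to
# `ip route del` (assumption: covers the common cases only).
STATUS_FLAGS = {"linkdown", "dead"}


def linkdown_route_specs(route_lines):
    """Return `ip route del` arguments for every route flagged "linkdown"."""
    specs = []
    for line in route_lines:
        words = line.split()
        if "linkdown" not in words:
            continue  # route is usable; leave it alone
        specs.append([w for w in words if w not in STATUS_FLAGS])
    return specs


if __name__ == "__main__":
    # Hypothetical output of `ip route show` on the redundant setup.
    sample = [
        "default via 192.0.2.1 dev wan0 proto static metric 100 linkdown",
        "default via 192.0.2.254 dev lan0 proto static metric 200",
    ]
    for spec in linkdown_route_specs(sample):
        # A real hook would run:
        # subprocess.run(["ip", "route", "del", *spec], check=True)
        print("ip route del " + " ".join(spec))
```

The sketch only builds the argument lists; actually deleting the routes is left to the caller. IPv6 routes (`ip -6 route show`) would need the same treatment.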

I would have to go back and test more to confirm, but I suspect some of these issues have to do with the bridge. The behavior I want is for the *bridge* to, for example, release its DHCP address, or delete its routes, if the *underlying* physical interface goes down, and bring up the DHCP client and/or static routes if the link comes back up. That way, any running containers correctly see "no route to host" ICMP messages, rather than trying to route packets to an interface whose underlying link is down.
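The bridge-side behaviour described above can be sketched as two pieces: a table mapping each physical port to its bridge, and a function that turns a link change into commands for that bridge. This is an illustration only. The `PORTS` table, the example static route, and the use of ISC dhclient (`dhclient` to obtain a lease, `dhclient -r` to release it) are assumptions, not part of the setup described here.

```python
# Hypothetical per-port configuration mirroring the eth0 -> wan0 and
# eth1 -> lan0 layout; the static route is an example only.
PORTS = {
    "eth0": {"bridge": "wan0", "dhcp": True, "routes": []},
    "eth1": {"bridge": "lan0", "dhcp": False,
             "routes": [["198.51.100.0/24", "via", "192.0.2.254"]]},
}


def bridge_actions(port, link_up):
    """Return commands (argv lists) to run on the bridge that owns `port`."""
    cfg = PORTS.get(port)
    if cfg is None:
        return []  # not a port this script manages
    bridge = cfg["bridge"]
    cmds = []
    if cfg["dhcp"]:
        # Obtain a lease on link up; release it on link down (assumes ISC
        # dhclient, whose release path normally removes the leased address).
        cmds.append(["dhclient", bridge] if link_up else ["dhclient", "-r", bridge])
    # Re-add static routes on link up, delete them on link down.
    verb = "replace" if link_up else "del"
    for route in cfg["routes"]:
        cmds.append(["ip", "route", verb, *route, "dev", bridge])
    return cmds
```

For example, `bridge_actions("eth0", False)` yields a single `dhclient -r wan0` command. Keeping the decision separate from its execution makes it testable without touching real interfaces.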

I ended up having to write the configuration in 'ifupdown' with "auto" and "manual" on all the interfaces, and to write a separate script (called from the post-up and pre-down callouts) that monitors link status, so that I could take action on the corresponding bridge when the underlying link goes up or down.
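The link-monitoring script could look something like the minimal Python sketch below, which polls the kernel's `/sys/class/net/<iface>/carrier` file and reports transitions. The function names and the one-second interval are hypothetical. The only kernel interface assumed is sysfs, where reading `carrier` fails while the interface is administratively down.

```python
import os
import time

SYSFS_NET = "/sys/class/net"


def read_carrier(iface, sysfs_root=SYSFS_NET):
    """Return True if the kernel reports carrier on `iface`, else False.

    Reading `carrier` fails (EINVAL) while the interface is administratively
    down, and the file is missing for unknown interfaces, so any OSError is
    treated as "no link".
    """
    try:
        with open(os.path.join(sysfs_root, iface, "carrier")) as f:
            return f.read().strip() == "1"
    except OSError:
        return False


def changed_links(previous, ports, sysfs_root=SYSFS_NET):
    """Return ({port: link_up} for ports whose state changed, current states).

    With an empty `previous`, every port counts as changed, which gives the
    caller an initial sync at startup.
    """
    current = {p: read_carrier(p, sysfs_root) for p in ports}
    changes = {p: up for p, up in current.items() if previous.get(p) != up}
    return changes, current


def monitor(ports, on_change, interval=1.0):
    """Poll forever, calling on_change(port, link_up) for each transition."""
    state = {}
    while True:
        changes, state = changed_links(state, ports)
        for port, up in changes.items():
            on_change(port, up)
        time.sleep(interval)
```

Polling keeps the sketch dependency-free. A longer-lived version would probably react to rtnetlink link events instead (the same events `ip monitor link` prints) rather than waking up every second. The `on_change` callback is where the per-bridge actions would go.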

>> My first attempt to make this happen was to add `post-up` and `pre-down`
>> scripts to do this. However, this had a fatal flaw for my application:
>> `ifupdown` doesn't separate the concept of operational status from the
>> concept of administrative status.  (That is, in `ifupdown`, an interface is
>> "up" if the admin says it is up.  Link up or link down does not seem to
>> matter; it's strictly an /administrative/ status[3].)

> The ifup@.service (more or less) deals with hotplugging, so normally as an
> admin you would not explicitly "ifup" any interface (unless you mark them as
> "manual", but then you are on your own anyway). So I fail to see the problem
> here -- for "auto" and "allow-hotplug" interfaces this should just work?

I have not used allow-hotplug, to be honest. The documentation isn't clear on how it works; I'm not sure what exactly triggers the hotplug event. This might solve part of the issue, since I want to allow booting the system even if the link is down. (Is this the officially supported way [across all backends] to get rid of the 5-minute timeout I love to hate?) But I also want to be able to take action on link down, so I'll have to look into it more.[2]

> If you need this, then I suggest using the NM backend, which gives you
> /etc/network/if-up.d/. We will never use NM in confined scenarios like the
> initramfs, so that should be reasonably safe. OTOH netplan itself (with
> networkd) was meant to work in initrd and other early-boot scenarios where
> arbitrary script callouts are not supportable.

Again, this is for a router, so NetworkManager doesn't seem like a good option. For the record, I'm fine with the early-boot limitation. I just want the flexibility to both cause and solve my own problems. ;-)

But at this point, maybe I'm using so little of 'ifupdown' that I might as well not configure anything, and do everything in custom scripts. But I'd hate for everyone in my situation to have to do that.


[1]: You could also throw in some VLANs, to add to the complexity. I have at least one, but it might not be relevant to this discussion. (I think I had an eth0.100 and was planning to create a bridge on top of that as well, to attach virtual interfaces without giving them access to every VLAN on the physical link.) This probably causes similar issues if set to 'auto'; I wonder if 'allow-hotplug' will work.

[2]: So far, I think "allow-hotplug" is a little strange in that it seems to blur the distinction between operational status and administrative status. (I guess you could say that the administrative status is "always up" if you have "allow-hotplug" in your /etc/network/interfaces. But what if a flaky NIC is toggling on and off rapidly, and you keep getting hotplug events? I suspect 'ifdown' would not be enough to bring the interface down permanently; you'd need to comment out the "allow-hotplug" in /e/n/i first.) With my solution, by contrast, if an interface is set to "auto" and "manual", you can just "ifdown" the interface and it stays down until both (1) "ifup <interface>" occurs and (2) the interface link is up.
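The two conditions above amount to a small state machine: the bridge is configured only while the port is administratively up and its link is up. The minimal Python sketch below assumes that the post-up/pre-down callouts call `ifup()`/`ifdown()` and that the link monitor calls `link()`. The `PortGate` class and its method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PortGate:
    admin_up: bool = False    # set by the post-up hook, cleared by pre-down (hypothetical)
    link_up: bool = False     # last carrier state seen by the link monitor
    configured: bool = False  # whether the bridge configuration is currently applied

    def _reconcile(self):
        """Return "configure", "deconfigure", or None, and record the new state."""
        want = self.admin_up and self.link_up
        if want == self.configured:
            return None  # nothing to do
        self.configured = want
        return "configure" if want else "deconfigure"

    def ifup(self):
        self.admin_up = True
        return self._reconcile()

    def ifdown(self):
        self.admin_up = False
        return self._reconcile()

    def link(self, up):
        self.link_up = up
        return self._reconcile()
```

Because the bridge is configured only when both flags are true, a flapping NIC after `ifdown` produces no actions at all; the next change happens only after an explicit `ifup` with the link present.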