Monday, 13 February 2017

Re: netplan and post-up/pre-down scripts

On Tue, Jan 17, 2017 at 1:02 AM, Mike Pontillo <[email protected]> wrote:
On Mon, Jan 16, 2017 at 7:35 AM, Mark Shuttleworth <[email protected]> wrote:
Would 'got-link' and 'lost-link' be good names for this?

I'm not certain a new event name is needed for this functionality; it seems to me that the current definition of 'up' isn't quite correct.[1] (But all this might be a moot point depending on what is supported in networkd, and how it behaves.)

I understand there have been several attempts to address this in the past, such as the 'allow-hotplug' option, ifplugd, ifupdown-extra, NetworkManager, and now networkd. IMHO, no solution is complete unless it properly separates adminStatus from operStatus, and holds off on confirming "link up" until both are "up". For backward compatibility, a boolean flag (similar to "allow-hotplug") should indicate whether or not the system is allowed to continue booting if the interface is down.[2]
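To make the adminStatus/operStatus distinction concrete, here is a minimal sketch of the "both must be up" check on Linux. It assumes the sysfs files /sys/class/net/<iface>/flags and /sys/class/net/<iface>/operstate are used as the data source. The function names are hypothetical and do not come from netplan, networkd, or NetworkManager.

```python
from pathlib import Path
from typing import Tuple

IFF_UP = 0x1  # administrative "up" bit, as defined in <linux/if.h>


def admin_up(flags_text: str) -> bool:
    """adminStatus: True if the IFF_UP bit is set in the sysfs 'flags' value."""
    # sysfs reports the flags as a hex string such as "0x1003\n".
    return bool(int(flags_text.strip(), 16) & IFF_UP)


def oper_up(operstate_text: str) -> bool:
    """operStatus: True only if the kernel reports the operational state 'up'."""
    # Other values include "down", "dormant", "lowerlayerdown" and "unknown".
    return operstate_text.strip() == "up"


def link_confirmed_up(flags_text: str, operstate_text: str) -> bool:
    """Hypothetical policy: confirm 'link up' only when both statuses are up."""
    return admin_up(flags_text) and oper_up(operstate_text)


def read_status(iface: str, sysfs: Path = Path("/sys/class/net")) -> Tuple[str, str]:
    """Read the raw flags and operstate strings for one interface."""
    base = sysfs / iface
    return (base / "flags").read_text(), (base / "operstate").read_text()
```

On a Linux machine, link_confirmed_up(*read_status("eth0")) would apply the check to a real interface, where "eth0" is only a placeholder name. Some virtual interfaces, such as loopback, report "unknown" as their operational state, so a real implementation would need a policy for that case.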

I agree that booting should continue past any device that is down, as long as it is configured and marked as allowed to stay down. Rather than trying to skip a long delay that would happen in some other cases, I think we should revisit whether it makes sense for the wait to take 5 minutes, or even make this configurable. 5 minutes is a lot; 1 minute is acceptable in many configurations, and 30 seconds is ideal in some. All this only makes sense for DHCP-configured interfaces where the link is up but no DHCP response has been received. If the link is down, there is no point in ever waiting. This may need to be fixed in networkd and NM to allow configuring the DHCP timeout.
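The waiting rule described above can be sketched as a small decision function. This is only an illustration of the policy, not how networkd or NetworkManager behave today. The function name, its parameters, and the 60-second default are assumptions chosen for the example.

```python
def boot_wait_seconds(carrier: bool, uses_dhcp: bool,
                      dhcp_timeout: float = 60.0) -> float:
    """Return how long boot may wait on one interface (hypothetical policy).

    carrier      -- True if the physical link is up
    uses_dhcp    -- True if the interface is configured via DHCP
    dhcp_timeout -- configurable upper bound in seconds (assumed default)
    """
    if not carrier:
        # Link is down: there is no point in ever waiting.
        return 0.0
    if not uses_dhcp:
        # Statically configured interfaces have no DHCP response to wait for.
        return 0.0
    # Link is up but no DHCP response yet: wait up to the configured timeout.
    return dhcp_timeout
```

For example, boot_wait_seconds(carrier=True, uses_dhcp=True, dhcp_timeout=30.0) allows a 30-second wait, while any interface without carrier gets no wait at all.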

I do not know that any of the different network management tools were planned to address administrative status of an interface. It may even in fact require kernel work (I haven't looked yet) to allow its state to be set to administratively down.

Please file a bug about this. I'll review it and look at how we can specify this in netplan, and how we can drive it in the various backends... But I expect it will require work in both networkd and NetworkManager before it works correctly. Servers aren't typically used in such a way, and setting that option seems like it might be quite intrusive (I mean, of course the default wouldn't be for interfaces to be administratively down, but I can see issues coming up from it. One of them is how to do it in the first place, and another is that it would only happen once the renderer (networkd/NM) is up and running, so potentially you've already sent/received packets on the interfaces... ideally, that shouldn't happen, but maybe I'm overthinking it).

Another subtle detail is that if an interface is administratively down, there should be an option to cause the NIC to take its physical link down. That way, whatever is on the other side of the link doesn't assume its peer is active. (This is standard behavior on a router or a switch, but may be atypical for a server... so I think the default behavior should continue to be "leave the physical link up".)

Of course. If something is administratively down, we must think of it in terms of *powered down*. Otherwise it's simply not configured.

/ Matt