WEBVTT 00:00.000 --> 00:22.220 Hello, everyone. I'm Fernando. I'm a 00:22.220 --> 00:27.400 kernel network engineer at SUSE and today I'm here to talk about 00:27.400 --> 00:36.520 tunnels and the lightweight tunnel infrastructure with nftables. So let's start with the 00:36.520 --> 00:41.560 topic. So what's a lightweight tunnel? Well, it's something that has been around 00:41.560 --> 00:47.800 in the kernel since 2015. It's not new, and it allows you to attach 00:47.800 --> 00:54.120 encapsulation instructions to routes, originally. We will go back to that 00:54.120 --> 01:01.680 later. And it has support for multiple kinds of tunnels like IPIP, VXLAN and many 01:01.680 --> 01:07.600 others. So why is this useful? This is useful for virtualized environments. If you are 01:07.600 --> 01:14.600 familiar with Kubernetes or environments with a lot of VMs, usually all the 01:14.600 --> 01:20.360 connectivity is tunnels. In enterprise networks, you have the machines, VMs or 01:20.360 --> 01:24.560 bare metal, connected to switches connected to a huge network, and it's 01:24.560 --> 01:31.040 common that you want to connect everything together. In Kubernetes it's also quite 01:31.040 --> 01:38.000 common to have pod-to-pod or node-to-node communication over a tunnel, and also there 01:38.000 --> 01:45.840 is a lot of work in virtual networking, like OVS, that works through tunnels. So in 01:45.840 --> 01:51.000 essence the initial support was something like what's shown there. You add a 01:51.000 --> 01:57.920 route to a specific endpoint and you add some encapsulation instructions. So for 01:57.920 --> 02:02.480 example here, in the case of VXLAN, you add the ID and you add the 02:02.480 --> 02:07.840 destination endpoint and then the device that is going to handle this.
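The route-based form just described can be sketched with iproute2 roughly as follows. This is a minimal illustration, not the slide from the talk: the device name `vxlan0`, the VNI `100` and all addresses are assumptions chosen for the example, and the commands need root.

```sh
# One VXLAN device in "external" (metadata) mode: it carries no
# per-peer configuration of its own.
ip link add vxlan0 type vxlan dstport 4789 external
ip link set vxlan0 up

# The encapsulation instructions live on the routes instead.
# Each route names the tunnel ID and the remote endpoint, and
# points at the shared device (all values are hypothetical).
ip route add 10.0.1.0/24 encap ip id 100 dst 192.168.100.11 dev vxlan0
ip route add 10.0.2.0/24 encap ip id 100 dst 192.168.100.12 dev vxlan0
```

Both routes reuse the same `vxlan0` device, so adding another remote endpoint means adding a route rather than another interface.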
So the good 02:07.840 --> 02:12.520 thing about this, and if you are familiar with VXLAN, or with any 02:12.520 --> 02:15.800 tunnel actually, is that usually you would need to create one interface 02:15.800 --> 02:23.640 for every tunnel that you create. So, let's say, for every endpoint that you 02:23.640 --> 02:28.920 want to connect, you have one tunnel interface that has source, destination, 02:28.920 --> 02:34.120 some options, and then on the other end you have another tunnel with source, 02:34.120 --> 02:39.760 destination and some options, and then the network in the middle. The good thing is that 02:39.760 --> 02:46.120 with this you only create a single device, it could be an IPIP, a VXLAN 02:46.120 --> 02:52.960 or whatever, and that single device can be used for different endpoints. This is 02:52.960 --> 02:56.840 super useful in Kubernetes, for example, when you need to create and manage 02:56.840 --> 03:02.640 a huge amount of interfaces, so you can reduce that amount of interfaces and in 03:02.640 --> 03:09.920 essence make it simpler and more efficient. All right, so what's new? Why 03:09.920 --> 03:16.160 am I talking about this in 2026, 11 years later? Well, we have added support 03:16.160 --> 03:23.800 to netfilter and nftables for it. And it's true that lightweight tunneling is not 03:23.800 --> 03:27.680 very well documented, it's hard to find documentation on how it works and how to 03:27.680 --> 03:33.840 configure it. If you are an expert on virtualized networking then it's 03:33.840 --> 03:39.440 easier, but usually it's pretty hard to figure out how to use it.
03:39.440 --> 03:45.360 So there is a low adoption rate, as I say probably due to the lack of documentation, 03:45.360 --> 03:50.960 and I must say that the current documentation in nftables is not that good. 03:50.960 --> 03:55.680 So that's on my side, I'm working on it, sorry, and I will try to 03:55.680 --> 04:01.840 have a patch to improve that documentation. The other thing is that with ip route 04:01.840 --> 04:07.360 the implementation and configuration is sometimes not very compatible with 04:07.360 --> 04:11.360 NetworkManager, because NetworkManager is also touching routes and all this stuff, so 04:11.360 --> 04:15.440 there are conflicts and things sometimes don't work very well. So either you 04:15.440 --> 04:22.080 cannot use it or you need to drop NetworkManager. In this case, as the 04:22.080 --> 04:26.800 implementation in nftables does not touch the routes, what it does is create an 04:26.800 --> 04:33.840 nftables ruleset and inject the encapsulation instructions there, it's completely 04:33.840 --> 04:37.280 compatible with this kind of engines like NetworkManager, systemd-networkd 04:37.280 --> 04:45.120 and so on. So that's another thing. Well, the support in 04:45.120 --> 04:50.080 nftables is more flexible because it's not dependent on routes anymore. You could match 04:50.720 --> 04:57.040 on any characteristic that you want to split the traffic, and it doesn't necessarily need to be 04:57.040 --> 05:01.600 the source and the destination, it could be the ID or it could be whatever other stuff that 05:01.600 --> 05:07.920 you want to match. And it allows you also to combine it with some features like the maps, 05:07.920 --> 05:16.160 the sets, dynamic sets, dynamic maps, and actually any nftables expression that you want to use, 05:16.240 --> 05:23.280 and this helps you to, well, scale up and simplify the configuration from the system administrator 05:23.280 --> 05:30.080 point of view. So also,
by the way, in nft this is an object, so it's not like a standard 05:30.080 --> 05:36.160 expression, it's an object statement, and what is good about this is that it supports the update 05:36.160 --> 05:42.080 operation. So imagine that you have one tunnel created with one tunnel template, which we are going to look 05:42.080 --> 05:47.280 at later, and you want to update some of the fields configured. You can do it using the 05:47.280 --> 05:52.320 transaction system that nftables supports, so you won't have intermediate states like you could have, 05:52.320 --> 05:59.440 for example, with ip route, because there you are executing commands in a specific order. If you have 06:00.160 --> 06:06.160 a high traffic flow, there could be some packets in the middle that suffer some drops or some 06:06.160 --> 06:13.760 connectivity issues. All right, so first let's look at what we did in the kernel. So this is 06:13.760 --> 06:20.880 the key piece of the work, in which in essence we just take the nft tunnel object information, we get the 06:20.880 --> 06:28.960 skb when we have an evaluation, and an evaluation is in essence a match, and we first drop the 06:28.960 --> 06:37.600 dst metadata that is attached to the skb, and, well, we get a reference to the 06:37.600 --> 06:46.000 metadata that we have generated and we attach it to the skb. And this sounds super simple, but 06:47.520 --> 06:54.320 well, first, md is a metadata structure, in essence it contains an ip_tunnel_info 06:54.320 --> 06:59.920 and it holds all the information of the tunnel, and this goes from generic things like, 06:59.920 --> 07:10.640 say, source address, IPv4 and IPv6, destination port, ID, TTL or other stuff, but also the specifics 07:11.520 --> 07:16.000 of that tunnel. Let's say for example if you have Geneve, you know that you can have multiple 07:16.000 --> 07:21.760 options configured, if you have VXLAN you have the GBP configured, so all of that is contained there. 07:22.320 --> 07:29.920 And the hard work that is
done in nftables is, in essence, that when you, from user space, 07:29.920 --> 07:34.560 configure the tunnel, it validates all of that and makes sure that you are not configuring something 07:34.560 --> 07:42.880 wrong, in the sense that it makes sense: you're not mixing IPv4 with IPv6 or you are not 07:42.880 --> 07:49.760 setting invalid values. And it's parsing all of that into the correct tunnel type and then 07:49.760 --> 07:57.680 into the md metadata, so attaching it to the skb is kind of trivial thanks 07:57.680 --> 08:03.440 to all the mechanisms that we already have, because actually without them it would be quite complex. 08:04.400 --> 08:12.080 So let's look at a real nft tunnel object. So this is a generic template, 08:12.640 --> 08:21.280 this could be used for example for any kind of tunnel, IPIP, VXLAN, Geneve and so on, because it doesn't 08:21.280 --> 08:27.920 have any specific options for it. But of course this is not configuring any of the specific options, 08:27.920 --> 08:35.520 and the default will be whatever is defined in the driver of the tunnel. 08:35.520 --> 08:42.080 So if you want to configure them, we'll look at that later. So in essence we have a table, in that table 08:42.800 --> 08:49.280 we create the object that will be the tunnel, we put the name and we can define the ID, 08:50.240 --> 08:55.280 source address and destination address, this example is IPv4 but IPv6 is supported, 08:56.160 --> 09:03.120 and then the destination port and TTL. That's it. We are going to look later at how to use this 09:03.120 --> 09:12.160 object. But there are variants. So as I say, if you want to define some of the options that are more 09:12.160 --> 09:18.400 specific to the tunnel type, like for example the VXLAN GBP option, you need to add a 09:18.400 --> 09:25.280 section. So you add a small section under the tunnel definition and you put vxlan gbp, in this 09:25.280 --> 09:31.200 case 100, and
for Geneve it will be the same: geneve and the options configured. 09:32.960 --> 09:41.520 So, all right, what are the real-world examples that we can deal with with this kind of 09:41.520 --> 09:46.880 tunneling? So we have two VMs that are connected to each other, like for example in a Kubernetes 09:46.880 --> 09:55.520 cluster, which are the nodes. Then these VMs have a VXLAN and there are containers 09:55.520 --> 10:03.440 configured with a veth to allow container communication with the host of the VM, with the 10:03.440 --> 10:11.680 guest VM, sorry. So here we will use the template to allow that encapsulation and be able to communicate. 10:11.840 --> 10:18.000 And the good thing is that if we have three or four containers we could use the same VXLAN 10:18.000 --> 10:28.240 interface, we wouldn't need to create one per container. All right, so this will be the ruleset 10:28.240 --> 10:35.600 for that specific scenario. As I say, we have the definition of the tunnel and then we need to 10:35.600 --> 10:41.600 redirect the traffic to the tunnel and from the tunnel, because here we need to redirect 10:41.600 --> 10:48.880 from the container and to the container. So we in essence match the address, and that address will be 10:49.520 --> 10:58.320 the one that is configured on the containers, and after that we define that we want the tunnel 10:58.320 --> 11:07.280 object to kick in, and we define tunnel name, the name of the tunnel, and later we forward it to 11:07.280 --> 11:13.680 vxlan0. So the VXLAN device will be able to look at the encapsulation 11:13.680 --> 11:22.320 instructions that are defined there and do proper encapsulation. But we also need to do something else: 11:22.320 --> 11:30.240 when we get the traffic from the other container we need to forward it 11:30.240 --> 11:38.560 to the veth in the host so it actually lands in the container. So you have traffic in 11:38.560 --> 11:46.560 both ways. This, by the
way: if you use ip route to do this you will probably need to use tc 11:46.560 --> 11:51.680 to redirect the traffic from the guest to the container when you are receiving it, because with a 11:51.680 --> 12:00.160 route alone it's usually not enough. Another example, that I'm not going to go into very much, 12:00.160 --> 12:07.040 is the simple tunnel example: you have two VMs connected to a switch connected to a network, 12:07.680 --> 12:15.120 and you create a VXLAN and then you encapsulate the traffic between them. So 12:15.280 --> 12:23.280 let's look at a real example for this specific scenario. 12:25.680 --> 12:33.120 So this demo is recorded with asciinema, so I'm going to explain what's happening. 12:33.120 --> 12:39.120 So first I'm going to log into one of the VMs, because we need to configure both VMs. 12:45.920 --> 12:57.440 Right, so what we have done here... first let me try to make it bigger. Fine, okay, yeah, so 12:57.440 --> 13:12.240 what we have done here is to create two namespaces. Okay, a bit more. Okay, okay, that's fine. 13:12.400 --> 13:23.120 So we have created two namespaces, those namespaces have the veths configured and so on, 13:23.120 --> 13:29.520 and we have used in essence the ruleset that we showed before, but instead of having only one address we 13:29.520 --> 13:36.080 are having two addresses, because both containers have different addresses. And in essence, yeah, this is the 13:36.160 --> 13:42.800 ruleset, and now I'm going to log into the other machine to configure the other end. 13:53.680 --> 13:59.200 Okay, yeah, so it's configured, and now I'm going to ping from the containers to the other 13:59.200 --> 14:12.320 container using the tunnel, which should be happening. So yes. All right, so I'm going into the 14:13.120 --> 14:18.880 namespace. This is done with namespaces, but with a VM it works kind of the same way. 14:20.160 --> 14:26.080 All right, and as you can see I'm able to ping from the container to the other end
and from 14:29.200 --> 14:36.560 here, the container to the other one, so both ends. And I'm going to show you now that I'm only 14:36.560 --> 14:43.120 using one VXLAN. And also, yeah, here you can see there is only one, which doesn't have any IP 14:43.120 --> 14:49.680 configured, and I have both veths on the host, both peers on the host, and the physical 14:49.680 --> 14:56.320 NIC that I'm using with the address configured, and no routes, well, the default routes of course for 14:56.320 --> 15:05.360 connectivity. And yeah, that's basically it. We were able to achieve connections 15:06.080 --> 15:14.640 from different endpoints using a single VXLAN, and this can scale up to a very, very 15:15.600 --> 15:34.640 big amount of containers and endpoints. So yeah, all right, so that's it, thank you very much 15:34.640 --> 15:44.960 everyone here, thank you for listening, and also thank you to all the volunteers and 15:45.040 --> 15:48.880 the organization. 15:52.000 --> 15:57.760 Any questions? Thank you. 15:57.760 --> 15:58.760 Oh, there. 16:10.760 --> 16:11.760 Yeah, hey. 16:11.760 --> 16:15.760 So I just wanted to ask about the minimum 16:15.760 --> 16:17.760 kernel version needed. 16:17.760 --> 16:21.760 Is this supported in nftables? 16:21.760 --> 16:26.760 So with the latest bug fixes, I believe it is 6.18. 16:26.760 --> 16:28.760 It is 6.18. 16:28.760 --> 16:31.760 So we fixed some bugs in Geneve. 16:31.760 --> 16:37.760 But for VXLAN, you should be able to use it since 6.4. 16:37.760 --> 16:38.760 Something like that. 16:38.760 --> 16:41.760 Off the top of my head, maybe it's a little bit off. 16:41.760 --> 16:44.760 But if you want to use it, you need 6.18. 16:44.760 --> 16:45.760 Thank you. 16:45.760 --> 16:46.760 Thank you. 16:46.760 --> 16:59.760 So suppose that some person from the neighbouring room, 16:59.760 --> 17:01.760 hosting eBPF, will come here. 17:01.760 --> 17:02.760 And that's the question.
17:02.760 --> 17:05.760 Is it possible to port this implementation 17:05.760 --> 17:09.760 from nftables to some XDP or BPF program? 17:09.760 --> 17:10.760 Is it possible? 17:10.760 --> 17:11.760 Probably. 17:11.760 --> 17:12.760 I don't know. 17:12.760 --> 17:13.760 I'm not a BPF expert. 17:13.760 --> 17:14.760 I don't know. 17:14.760 --> 17:17.760 But I'm pretty sure that someone could do it. 17:17.760 --> 17:19.760 It's not rocket science. 17:19.760 --> 17:23.760 So it's just manipulation of the packet. 17:23.760 --> 17:25.760 Yeah, it's manipulation of the packet. 17:25.760 --> 17:26.760 In essence, what is it? 17:26.760 --> 17:29.760 Well, actually, I'm not very sure that you can do it. 17:29.760 --> 17:32.760 Because as far as I know, from the XDP or BPF context, 17:32.760 --> 17:35.760 you cannot see the skb, which is the packet representation in the kernel. 17:35.760 --> 17:39.760 And what we are doing here is attaching the 17:39.760 --> 17:43.760 instructions to do the encapsulation to that skb structure. 17:43.760 --> 17:47.760 So actually, I don't know. 17:47.760 --> 17:59.760 Maybe you can go to the, I guess the, he might know better. 17:59.760 --> 18:00.760 Thank you very much. 18:00.760 --> 18:01.760 All right. 18:01.760 --> 18:02.760 Thank you.
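The tunnel object and the two-way ruleset described in the talk (tunnel template with ID, addresses, destination port, TTL and a VXLAN GBP section, then `tunnel name` followed by a forward to `vxlan0`, plus the return path to the veth) might look roughly like the sketch below. It is not the slide from the talk: the table, chain and tunnel names, the `veth0` device and every address are assumptions, and the exact keywords should be checked against nft(8) for the installed nftables version. It assumes `vxlan0` already exists in external (metadata) mode and needs root.

```sh
# Load a hypothetical ruleset atomically; all names and addresses are
# illustrative assumptions, not values from the talk.
nft -f - <<'EOF'
table netdev tunnel_demo {
        # Tunnel template: generic fields plus a VXLAN-specific section.
        tunnel to_node2 {
                id 100
                ip saddr 192.168.100.10
                ip daddr 192.168.100.11
                dport 4789
                ttl 64
                vxlan {
                        gbp 100
                }
        }

        # Container to tunnel: attach the encapsulation metadata,
        # then hand the packet to the shared VXLAN device.
        chain from_container {
                type filter hook ingress device veth0 priority 0; policy accept;
                ip daddr 10.0.2.2 tunnel name to_node2 fwd to vxlan0
        }

        # Tunnel to container: forward decapsulated traffic to the
        # host side of the veth so it lands in the container.
        chain from_tunnel {
                type filter hook ingress device vxlan0 priority 0; policy accept;
                ip daddr 10.0.1.2 fwd to veth0
        }
}
EOF
```

Because the whole ruleset is loaded in one transaction, updating a field of the tunnel object does not leave the intermediate states that a sequence of separate route commands could.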