WEBVTT 00:00.000 --> 00:15.440 I will be talking about PythonBPF, what it is and how it works, but before that, a quick 00:15.440 --> 00:22.680 introduction about me and Varun. I am Pragyansh, I work at Canonical in Ubuntu engineering, 00:22.680 --> 00:29.320 and I am also the co-maintainer of PythonBPF, which has nothing to do with my day job. Varun is an 00:29.480 --> 00:34.400 engineering student who works with me on PythonBPF; he could not make it here because 00:34.400 --> 00:40.360 of some reasons. There is another person I forgot to include in this slide, but I 00:40.360 --> 00:46.560 must acknowledge Karthik here, a PhD student at EPFL, who helped us with our doubts about 00:46.560 --> 00:57.560 eBPF and was very helpful in guiding us through this project. So, the project: PythonBPF is 00:57.960 --> 01:07.000 a Python front end for writing eBPF programs. It allows you to write both eBPF and its corresponding 01:07.000 --> 01:13.080 user space code in Python, and they can be in the same file, they can be split 01:13.080 --> 01:20.040 across different files, or they can be in a Python notebook across multiple cells. Now, writing 01:20.040 --> 01:27.480 eBPF in Python is not a new idea and might sound somewhat outdated as well, as BCC 01:27.560 --> 01:35.800 already existed in, like, 2015 or 2016, but how PythonBPF does this is very different from BCC. 01:38.680 --> 01:45.960 If you have used BCC for writing eBPF programs before, you might 01:45.960 --> 01:50.760 know that you still have to write the BPF-specific part in a multi-line string if you are keeping 01:50.920 --> 01:57.080 it in the same file, or put it in a different file and then open and read that 01:57.080 --> 02:04.520 in your Python file. Your eBPF-specific code is written in a flavor of C, and only the 02:04.520 --> 02:11.560 user space processing part is in Python.
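To make the BCC style described above concrete, here is a minimal sketch (not taken from the slides) of why a BPF program kept in a multi-line string is opaque to Python tooling: the standard `ast` module sees the C text as a single string constant, while a function written in Python is a real syntax node. The two embedded sources and the helper names are illustrative assumptions.

```python
import ast

# Hypothetical BCC-style file: the BPF part is C text inside a Python string.
BCC_STYLE_SOURCE = '''
prog = """
int hello(void *ctx) {
    return 0;
}
"""
'''

# Hypothetical file where the same logic is written as a Python function.
PYTHON_STYLE_SOURCE = '''
def hello(ctx):
    return 0
'''


def top_level_node_kinds(source: str) -> list[str]:
    """Return the AST node type names of the top-level statements."""
    return [type(node).__name__ for node in ast.parse(source).body]


def string_constants(source: str) -> list[str]:
    """Collect every string literal, which tooling treats as opaque text."""
    return [
        node.value
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Constant) and isinstance(node.value, str)
    ]


bcc_kinds = top_level_node_kinds(BCC_STYLE_SOURCE)
py_kinds = top_level_node_kinds(PYTHON_STYLE_SOURCE)
opaque = string_constants(BCC_STYLE_SOURCE)
```

A linter or formatter working on the first source can only check the assignment, not the C inside the string; in the second source the whole function body is visible to it.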
In PythonBPF the users use a subset of the existing 02:11.560 --> 02:19.080 Python grammar to write eBPF code, and this has some benefits: Python dev tools like 02:19.160 --> 02:26.680 linters and formatters are useful for the whole file now. This also potentially lowers the barrier 02:26.680 --> 02:33.240 of entry for people learning eBPF by hiding a lot of typing details and handling common 02:33.240 --> 02:41.640 verification failure mitigations behind the scenes. So, this is a side-by-side comparison of 02:41.720 --> 02:50.040 the same code written in BCC and PythonBPF. This was one of the examples in the BCC tutorial. 02:51.000 --> 02:56.600 You can see the dev tools at work with proper syntax highlighting, as compared to the BCC 02:56.600 --> 03:07.240 example given on the left side. This is somewhat superficial when it comes to benefits, as we will 03:07.320 --> 03:16.760 discuss what else PythonBPF enables in our later slides. Let me give you a basic overview of 03:16.760 --> 03:25.320 this program. At the top we have the various imports from PythonBPF: the decorators, the helpers, 03:25.320 --> 03:34.760 the maps and the type hints, which are necessary. Then we declare a custom data type, data_t, 03:35.720 --> 03:48.360 which has a pid, a timestamp and the command, which is a char array. Then we also have a perf event 03:48.360 --> 03:56.040 array map, of type BPF perf event array, and then there is the function, which attaches itself to the 03:56.280 --> 04:05.640 sys_enter_clone tracepoint and then outputs some data to the perf event array map. So, 04:05.640 --> 04:15.240 that is what a PythonBPF program would typically look like. Let us move on 04:15.240 --> 04:23.480 to how we compile this. This is a big image, and I wish I could implore you all to remember 04:23.480 --> 04:29.080 it for the rest of the presentation, but we can come back to this slide again if we need it.
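The program walked through above (a struct, a perf event array map, and a function attached to the sys_enter_clone tracepoint) has roughly the shape sketched below. The decorators here are local stand-ins that only record what each chunk is, so the sketch runs without the library; they are not the real PythonBPF implementation, and the section string, field types and placeholder bodies are assumptions.

```python
from ctypes import c_int64, c_uint64, c_void_p

# Stand-in registry and decorators: NOT the real library, only markers that
# record what each chunk is so that this sketch is runnable on its own.
CHUNKS = []


def bpf(obj):
    """Mark an object as a BPF chunk (outermost decorator)."""
    CHUNKS.append(obj.__name__)
    return obj


def struct(cls):
    cls.chunk_kind = "struct"
    return cls


def map(fn):  # shadows the builtin on purpose, mirroring the talk's naming
    fn.chunk_kind = "map"
    return fn


def section(name):
    def mark(fn):
        fn.chunk_kind = "section"
        fn.section_name = name
        return fn
    return mark


@bpf
@struct
class data_t:
    pid: c_uint64
    ts: c_uint64
    comm: bytes  # placeholder for the char array field


@bpf
@map
def events():
    # Placeholder: the talk declares a perf event array map here.
    return "PerfEventArray"


@bpf
@section("tracepoint/syscalls/sys_enter_clone")  # assumed section string
def hello(ctx: c_void_p) -> c_int64:
    # The real program fills a data_t and outputs it to the events map.
    return c_int64(0)
```

Each chunk carries two markers, one saying "this is BPF" and one saying what kind of chunk it is, which matches the description of the decorators later in the talk.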
04:30.280 --> 04:38.680 This is what comes first: as a one-time step we have to generate vmlinux.py, the vmlinux 04:38.680 --> 04:47.720 header. We generate it using a script that we have written, which runs bpftool to create vmlinux.h 04:47.800 --> 04:56.280 and then runs clang2py on it with some tweaks to give us vmlinux.py, so that you can use your running 04:56.280 --> 05:04.760 kernel's data structures in the BPF code that you are writing. Then we take the input, a Python 05:04.760 --> 05:15.080 file or Python notebook source, and the BPF chunks are marked out and processed by PythonBPF. 05:15.160 --> 05:22.920 What are BPF chunks? These are just code snippets which will be compiled to a BPF 05:22.920 --> 05:28.280 object. So, if there is a file that contains both BPF code and user space code, 05:29.320 --> 05:36.120 we are just concerned with the BPF chunks; the rest of the code is run through any Python 05:36.200 --> 05:47.720 interpreter. Then we first look at the vmlinux imports and how they are being used, 05:47.720 --> 05:54.920 that is, whether any kernel data structure or kfunc or something else that is present in vmlinux.py is 05:54.920 --> 06:01.160 imported from there and being used. As a first step we create the LLVM IR for 06:01.800 --> 06:09.720 the said import, and then we create the symbol table for those imports and 06:10.760 --> 06:16.680 inject them into the local and global symbol tables for the input file, 06:17.320 --> 06:24.280 which will be populated with other program logic later.
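The idea of marking out BPF chunks can be sketched with the standard `ast` module: parse the input file and treat every top-level definition decorated with `@bpf` as a chunk, leaving everything else to the normal interpreter. This is a simplified illustration under that assumption, not the project's actual implementation; the embedded source and function names are made up for the example.

```python
import ast

# Hypothetical input file mixing BPF chunks and user space code.
SOURCE = '''
from ctypes import c_int64

@bpf
@section("tracepoint/syscalls/sys_enter_clone")
def hello(ctx) -> c_int64:
    return 0

def report(events):
    for event in events:
        print(event)

@bpf
@bpfglobal
def LICENSE() -> str:
    return "GPL"
'''


def decorator_name(node: ast.expr) -> str:
    """Return the bare name of a decorator such as @bpf or @section("...")."""
    target = node.func if isinstance(node, ast.Call) else node
    return target.id if isinstance(target, ast.Name) else ""


def split_chunks(source: str) -> tuple[list[str], list[str]]:
    """Split top-level definitions into BPF chunks and user space code."""
    bpf_chunks, user_space = [], []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            names = [decorator_name(d) for d in node.decorator_list]
            (bpf_chunks if "bpf" in names else user_space).append(node.name)
    return bpf_chunks, user_space


bpf_chunks, user_space = split_chunks(SOURCE)
```

Because only the syntax tree is inspected, the source never has to be executed to find the chunks, which is what lets the same file also hold ordinary user space Python.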
Then we do struct processing, for 06:24.600 --> 06:32.440 any custom data types you might have declared, and then comes map processing, where 06:33.480 --> 06:39.320 we generate LLVM IR for any maps you might be using. We do this in this specific order because 06:39.880 --> 06:46.280 you might be using those custom struct types in your map declarations, and you are, 06:46.680 --> 06:52.280 of course, always using those maps in the functions that you will be writing. 06:52.360 --> 06:59.720 So, this is the order we follow. Then comes the real part, function processing, where we 07:01.240 --> 07:09.240 do a pass to estimate how many local symbols you need and 07:09.960 --> 07:19.000 how much local stack space you need, and then assign those stack spaces to symbols (at times 07:19.160 --> 07:27.160 you can have the same stack space for multiple symbols), fill in type metadata and then process 07:27.160 --> 07:39.800 all of the expressions one by one. Conversion of the Python AST to LLVM IR is done with a 07:39.800 --> 07:47.640 project called llvmlite, which comes from Numba. We were thinking of using just LLVM, but we figured, 07:47.880 --> 07:53.880 let us try with llvmlite: it allows us to write everything in Python, and this goes 07:53.880 --> 08:06.280 with the spirit of the project. Now that we have that IR file, we can just pass it to 08:06.280 --> 08:15.480 llc, and it gives us a BPF object file. We also have a sister project called pylibbpf, which is 08:15.800 --> 08:21.880 essentially just Python bindings for libbpf, and we can take the 08:23.400 --> 08:31.800 struct definitions that we created during this pass and pass them on to our user space code, 08:31.800 --> 08:37.880 which is using pylibbpf. So, you do not have to declare the same structs twice, and you can use 08:38.760 --> 08:46.360 those map and struct definitions in your user space. So, that is the compilation flow.
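The pre-pass in function processing, which works out the local symbols before any expression is lowered, might look roughly like the sketch below. It walks a function's syntax tree, collects every assigned name (including names first assigned inside an `if` body), and gives each one a stack slot. The fixed 8-byte slot size, the offsets and the function names are assumptions for illustration; the real pass also tracks types and can share one slot between symbols, which this sketch does not do.

```python
import ast

# Hypothetical BPF function body used only as input for the pre-pass.
FUNCTION_SOURCE = '''
def trace(ctx):
    ts = 1
    if ts:
        delta = ts + 1
        ts = delta
    return 0
'''


def local_symbols(source: str) -> list[str]:
    """Collect assigned names in first-seen order, including nested blocks."""
    func = ast.parse(source).body[0]
    seen: list[str] = []
    for node in ast.walk(func):
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            if node.id not in seen:
                seen.append(node.id)
    return seen


def assign_stack_slots(symbols: list[str], slot_size: int = 8) -> dict[str, int]:
    """Give every symbol its own fixed-size slot at a negative frame offset."""
    return {name: -slot_size * (i + 1) for i, name in enumerate(symbols)}


symbols = local_symbols(FUNCTION_SOURCE)
slots = assign_stack_slots(symbols)
```

Collecting `delta` even though it only appears inside the `if` body is the point of doing this as a separate pass: all slots are known before code for the first statement is generated.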
08:48.600 --> 08:57.880 Now, I will go through the anatomy of a PythonBPF program through a walkthrough. 08:57.880 --> 09:07.160 This is the disksnoop example, which we have ported from BCC to PythonBPF. 09:08.120 --> 09:15.880 First we take in the ctypes imports. Even though type hints are 09:16.760 --> 09:25.080 optional in Python, for PythonBPF they are necessary, because you need to compile the code down to LLVM IR 09:25.080 --> 09:32.040 and there are some places where you just cannot guess types after some point. So, these are 09:32.120 --> 09:39.800 common. Also, when you are creating, say, this HashMap, which is a map of type BPF hash 09:39.800 --> 09:49.720 (it is not the hash map data structure), you have to give the key and value types. They could 09:49.720 --> 09:55.880 have been written with a colon instead of equals, but that is something that we will be 09:55.880 --> 10:04.200 talking about in the next slides. Then there are some imports from vmlinux.py: struct pt_regs, which 10:04.200 --> 10:10.600 is the type of the context argument passed to functions attached to kprobes, and struct request, 10:10.600 --> 10:20.840 which is a common data structure in the vmlinux header. Then we have the decorators: bpf, 10:21.160 --> 10:29.400 bpfglobal, map and section (compile is not a decorator). These are necessary to point out the BPF chunks 10:29.400 --> 10:37.560 and what kind of BPF chunk each one is. Then we have the ktime helper, which maps to bpf_ktime_get_ns, and the HashMap 10:37.560 --> 10:48.040 type is imported from pythonbpf.maps; we have other map types listed there as well. So, 10:50.840 --> 11:04.440 next we have what a function body would look like.
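To illustrate why the ctypes hints are needed before anything can be lowered to LLVM IR, here is a small sketch that reads a function's annotations and turns them into an IR-style signature. The mapping table, the placeholder function and the output format are assumptions made for this example and are not the project's actual lowering code.

```python
from ctypes import c_int32, c_int64, c_uint64, c_void_p
from typing import get_type_hints

# Hypothetical mapping from ctypes hints to LLVM IR type spellings.
IR_TYPES = {
    c_int32: "i32",
    c_int64: "i64",
    c_uint64: "i64",  # LLVM integer types carry no signedness
    c_void_p: "ptr",
}


def trace_completion(ctx: c_void_p, delta: c_uint64) -> c_int64:
    """Placeholder body; only the annotations matter for this sketch."""
    return c_int64(0)


def ir_signature(fn) -> str:
    """Build an LLVM-IR-like signature string from a function's type hints."""
    hints = get_type_hints(fn)
    ret = IR_TYPES[hints.pop("return")]
    params = ", ".join(f"{IR_TYPES[t]} %{name}" for name, t in hints.items())
    return f"define {ret} @{fn.__name__}({params})"


signature = ir_signature(trace_completion)
```

Without the annotations there would be nothing to look up in the table, which is the sense in which hints that are optional for Python become mandatory for a compiler front end.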
So first, if we have to attach a function 11:04.440 --> 11:15.720 to a section, we provide that section using the section decorator. Of course, the bpf decorator 11:15.880 --> 11:25.400 should come first, to mark that this is to be compiled down to a BPF object. I am not quite 11:25.400 --> 11:32.360 happy with the syntax we use for the argument to section, and I will come to that 11:32.360 --> 11:42.040 later. Then we have trace_completion; I think I have shown this before, and this is a 11:42.920 --> 11:51.400 zoomed-in part of that. struct pt_regs, as I said, is the type for the context, and this function returns 11:51.400 --> 11:59.960 a c_int64. At the end I have to add a "type: ignore[return-value]" comment because mypy complains that 0 11:59.960 --> 12:09.720 is not a c_int64, but PythonBPF internally assumes a constant is an int64 if no type is specified, 12:09.720 --> 12:16.360 so "return 0" is actually a c_int64. mypy does not know that, so to satisfy it we have to add the comment. 12:16.360 --> 12:25.000 Then we have ctx.di. If we just take the type of ctx.di 12:25.000 --> 12:35.480 in LLVM IR, it will come out to be an int64, but the user knows that it can be converted to 12:35.480 --> 12:41.880 a pointer to a struct, so you just call struct request on it to get the request object, and then read 12:41.880 --> 12:47.320 data_len and cmd_flags, that is, request.__data_len and request.cmd_flags. There is also this 12:47.320 --> 12:53.800 interesting bit: we do not need to know whether data_len and cmd_flags are pointers, or pointers 12:53.800 --> 13:02.440 to pointers, or integers; we just use a dot and dereference it to any depth level. 13:02.520 --> 13:10.360 This is part of the type deduction we have to do. Then another interesting bit is start.lookup. 13:10.360 --> 13:20.440 In C, for example, if you were doing a map lookup, that would have 13:20.440 --> 13:28.680 been a stateless function where you have to pass
the map as well as the argument, but I think 13:29.320 --> 13:36.600 for Python it makes sense to make it look like start is a data container and you can do a lookup 13:36.600 --> 13:45.400 on it. You can also overload the dunder method for indexing, and then you have start, square 13:45.400 --> 13:53.160 brackets, request pointer, and that does it pretty well. Then we have the usage of ktime minus 13:53.240 --> 14:00.760 the request tsp, and this is interesting because the types of ktime and the request 14:00.760 --> 14:07.320 tsp do not match here, but it still works. Then we have print, which allows you to use Python-like 14:07.320 --> 14:15.560 format strings; print is actually bpf_printk, so you will see there are only three 14:15.640 --> 14:24.440 arguments, because we cannot do four. I won't explain trace_start, 14:24.440 --> 14:31.080 because I think the explanation of trace_completion covered the things happening there pretty 14:31.080 --> 14:39.000 well. There is this part with bpfglobal where we specify the GPL license; you need 14:39.000 --> 14:47.160 to do this to use many of the helpers and kfuncs in the Linux kernel. We haven't 14:47.160 --> 14:53.560 automated it; it has to be done by the user. At the end is the compile function, 14:53.560 --> 14:58.840 and it is important that the compile function comes at the end, after all BPF chunks. 14:58.920 --> 15:10.840 What compile does is spit out a BPF object file. 15:10.840 --> 15:18.680 There are some variations of compile that you can use: you can use compile_to_ir to get only 15:18.680 --> 15:24.520 the LLVM IR instead of the object file, if you need to inspect something while debugging. 15:55.480 --> 16:04.520 Instead of compile and compile_to_ir, you can also use the BPF object from pylibbpf, 16:04.520 --> 16:12.440 which works pretty much like what
BCC user space code does, so you can have all of your 16:12.440 --> 16:21.000 things in the same file, and that is pretty neat. Then, coming to demos and examples: this was presented 16:21.080 --> 16:27.080 at Linux Plumbers Conference 2025, where we showed some demos. I don't want to do that again 16:27.080 --> 16:32.440 for two reasons: (a) the demo was about 10 minutes showing four different things, and (b) 16:32.440 --> 16:38.440 there are some people here who attended LPC 2025, and this talk would have been 16:38.440 --> 16:47.320 a waste of time for them if I did it again. But I do implore you to go see the recording 16:47.320 --> 16:54.600 from the timestamp 5 minutes 52 seconds. We did four examples: a TUI-based container monitor, 16:54.600 --> 17:02.040 syscall anomaly detection for Spotify, kernel symbolization using 17:02.040 --> 17:13.000 blazesym, and a vfs_read latency example shown in different ways: in a Python notebook, then in a TUI, then in a 17:13.560 --> 17:19.720 web dashboard, showing how you can leverage the Python ecosystem to make your BPF tools better using 17:19.720 --> 17:30.520 PythonBPF. I also welcome you to mess with the project and the examples we have on GitHub; most 17:30.520 --> 17:36.840 of these are ported over from BCC. We have a "try it out" section where we list the steps you need 17:36.840 --> 17:48.280 to take to set up PythonBPF and mess with it. I think I have already covered bits of this slide: 17:49.480 --> 17:58.440 the bpf decorator is what marks a BPF chunk, and then you can specify whether it has to be 17:58.440 --> 18:04.040 a struct, a map, a global, or whether there is a section that you need to 18:04.040 --> 18:14.520 attach it to. So those are the decorators; any BPF chunk will have at least two of these decorators, 18:14.520 --> 18:24.120 one for marking it and one for specifying it. Now, a big challenge while working on this, and one 18:24.120 --> 18:30.440 of the things that we are still working on,
which is why the slide is named "The internals: trying 18:30.440 --> 18:38.200 to make typing work". One of the reasons why we are still working on PythonBPF is to lower the 18:38.200 --> 18:48.760 barrier of entry to learning BPF and writing those programs, and abstracting away typing has some 18:48.760 --> 18:54.840 benefits, because now people don't have to know everything about what they are writing. 18:54.920 --> 19:04.360 If we handle some vmlinux or verifier errors and typing ourselves behind the scenes, it makes 19:04.360 --> 19:15.480 for a much easier experience. So, let us define an action on a data container, or data, to be an 19:15.480 --> 19:27.080 operation which involves the said data, or a function call to which this data is an argument. 19:27.080 --> 19:37.320 Then we can find such action points in the input file and employ our type deduction on them, 19:37.320 --> 19:42.840 because these action points are the only places where we have to care about the data type of the 19:42.920 --> 19:52.200 container in question. I tried to read about auto and decltype and how they perform type 19:52.200 --> 19:59.720 deduction in C++, and the outcome is to have a set of rules and just wing it. I don't know if we can 19:59.720 --> 20:06.520 ensure that all cases will be covered by us; that might need some formal proof, I don't know. 20:06.520 --> 20:12.280 But for now we look at the expected type and the current type of the data at any action point, 20:12.280 --> 20:18.040 check if we can actually convert between them according to our set of rules, and then try to do it. 20:18.840 --> 20:24.200 One quirk which we have in PythonBPF is that at times you need to convert some r-values to 20:24.760 --> 20:38.760 l-values. Suppose you have a hash map with integer keys and integer values. If you look 20:38.760 --> 20:48.440 at the helpers in the docs and see the helper signature, you will find that 20:48.760 --> 20:56.600 for look
up, the key has to be an l-value, but PythonBPF allows you to do stuff like 20:57.800 --> 21:07.480 map.lookup(1), or it can be 1 + 1, or it can be ktime(), 21:09.320 --> 21:18.040 which returns an int64, so this can be any expression: ktime() plus one, or x plus one. 21:18.920 --> 21:27.320 Now, the outcome of these operations is usually an r-value, but you have to pass it to a helper 21:27.320 --> 21:35.800 which needs the address of this thing, so this introduces a lot of complexity in how we allocate 21:36.600 --> 21:45.160 stack space for local variables. Another quirk of working with BPF is that you can only 21:45.640 --> 21:56.040 allocate stack space in the first basic block of a function. So if your function has an if statement 21:56.040 --> 22:04.200 and you are creating a variable inside the body of that if statement, we need to see whether we 22:04.200 --> 22:12.520 will ever reach that if statement and then create space for that variable as well. Now, another thing 22:12.520 --> 22:20.200 is that map lookup takes an l-value, but what kind of l-value? Maybe it takes, 22:20.200 --> 22:26.040 and this applies not just to this helper but to any function which we might have already declared, 22:27.000 --> 22:33.800 a pointer to a pointer, and you are passing just one integer, so you need to create 22:34.520 --> 22:43.000 two temporary stack spaces here, which allow you to first create a pointer and then 22:43.000 --> 22:51.640 a pointer to a pointer, and then pass it over to the helper. Because of this we have to create 22:51.640 --> 22:59.240 scratch spaces: we go through each basic block, see how many such temporary 23:00.200 --> 23:09.080 scratch registers we need, and then create them for each function. There is also the depth 23:09.800 --> 23:19.000 problem. I think this is what I was hinting at earlier with ktime minus the request 23:19.000 --> 23:25.320 tsp. Here it is very easy, because
the request timestamp is a pointer and you just need to 23:25.400 --> 23:34.760 dereference it. We know that the outcome of this expression has to be an int64 or an 23:34.760 --> 23:42.920 int32, depending on what the widest integer in the expression is. ktime is an int64, so this will be 23:42.920 --> 23:50.840 64-bit: dereference the request tsp until you get a 64-bit integer, and if you don't, then just don't 23:51.000 --> 23:57.240 do it. That is one of the things we do, and we also have to do it the other way: 23:57.960 --> 24:03.960 allocate as much stack space as you need until you get to a point where you can pass a value 24:04.920 --> 24:12.040 with the right type to the function or helper you want to pass it to. Then, 24:12.920 --> 24:18.680 Victor's talk here was also interesting for strings and char arrays, and this is a problem that we 24:18.920 --> 24:25.320 are facing here: there is no clear distinction between strings and char arrays in PythonBPF. 24:25.320 --> 24:32.120 So if a function needs a pointer to a string, you just give it a pointer to the first character 24:32.120 --> 24:40.520 of the character array. But if you want to have a character array at an action point, we will first 24:40.520 --> 24:47.480 do a call to the helper bpf_probe_read_kernel_str and give the output of that behind the 24:47.480 --> 25:00.200 scenes to whatever action point that char array is required at. So, coming back: 25:00.600 --> 25:07.000 this is somewhat difficult, and we constantly test and improve our type deduction. This is 25:07.000 --> 25:13.480 the area where we face most of our hiccups whenever we try something new in PythonBPF. 25:14.440 --> 25:22.600 This is an example, and I think I did go back to this example, like here, 25:22.600 --> 25:31.240 where you can see you don't need to worry about types. Then we have this CO-RE read, which will read the 25:33.240 --> 25:39.800 exact data for you, but here we employ the same
thing using just the dereference, the struct member 25:39.960 --> 25:56.200 syntax. That is it for this slide, I believe. One more thing about this is 25:56.200 --> 26:06.200 that you might see that this looks cleaner and less complex, and this is not a code 26:06.200 --> 26:13.960 golf thing; it is that it is fine if the user doesn't know the exact types, as I don't think 26:13.960 --> 26:26.440 they should need to know them to write BPF programs. Then we have this slide about vmlinux.py and how it works. 26:26.440 --> 26:35.880 The quirk here was that we have to generate debug info for this, and we encountered some types 26:35.960 --> 26:42.440 that we weren't encountering when we wrote programs without vmlinux, like function pointers or 26:42.440 --> 26:50.680 structs within structs within structs, and that led to more challenges, which we overcame. 26:52.760 --> 27:01.160 Then, what PythonBPF does well: it tries to shield users from typing and verifier 27:01.480 --> 27:08.920 pitfalls with basic verifier failure mitigations, like auto-generating null checks for pointers 27:08.920 --> 27:17.000 or boundary checks for raw data from packets. Then, a concise Pythonic syntax. 27:18.040 --> 27:23.080 You can leverage the Python ecosystem to create data analysis and visualization tools, 27:23.160 --> 27:33.880 which we demoed in the LPC recording that I urge you to watch. And this can be ideal for 27:33.880 --> 27:44.920 beginners to eBPF and for quick prototyping. There are some things that we still have to work on. 27:45.480 --> 27:51.640 We got some nice reviews after the presentation at LPC from the attendees. One of the things that we 27:51.720 --> 27:59.320 have to work on is that it is not quite feature complete, which is true: we only have support for some 27:59.320 --> 28:07.080 maps and helpers right now, partly because we started by handwriting all of the helpers. 28:07.080 --> 28:16.760 But now that we can parse vmlinux.py and
have a somewhat robust type inference system, we can 28:16.760 --> 28:23.720 probably auto-generate Python signatures from vmlinux.py and, 28:25.640 --> 28:34.120 instead of writing each one by hand, just use the exact helper or kfunc from there and pass arguments, 28:34.120 --> 28:38.680 and it should work, barring some cases in which we will still have to write exceptions. 28:39.640 --> 28:47.880 There was also this review: in bpftrace someone wrote a snake game, so can someone write a 28:47.880 --> 28:53.640 snake game in PythonBPF? And you cannot right now, because we don't support for loops. 28:55.160 --> 29:01.800 This becomes a whack-a-mole game: you present this project to someone, and someone will 29:01.960 --> 29:09.560 come with a feature request, and then you rush to complete that feature request. This might work, 29:09.560 --> 29:18.200 but this is not how a project should work. We should know what things we have to implement 29:18.200 --> 29:25.720 for this to work. One idea for that was to try to port all the kernel selftests to PythonBPF, 29:26.600 --> 29:36.200 but that seemed like a pretty big undertaking. True, it is very representative of what kind 29:36.200 --> 29:44.760 of BPF programs you can write, but it is a huge amount of code that we would have to work on. So 29:44.760 --> 29:56.280 another thing that we would focus on is making a simple minimum viable product 29:56.280 --> 30:01.480 and perfecting it, and that brings us to the second thing: establishing a feedback loop with our 30:01.480 --> 30:07.480 early adopters. People will come, try this, find one or two features they don't like 30:07.480 --> 30:14.040 or that PythonBPF doesn't support, and leave. We need to have a feedback loop with 30:14.040 --> 30:22.040 our early users to be able to make this better. So we need users whose needs and use cases are 30:22.040 --> 30:29.640 very clearly defined, and we need to have a rapport with them to have
constant feedback, a good 30:29.640 --> 30:39.000 feedback loop. This can solve the problem we had with the kernel selftests as well: it is a huge thing, 30:39.000 --> 30:47.960 but what if we build it for grad students first, for courses where you have to do a lab? 30:49.640 --> 31:03.400 Is the time done? Yeah, okay. So this is something we are working on, to make a lab for 31:03.560 --> 31:15.080 my alma mater for the fall semester, and that would be a good feedback loop from 31:15.080 --> 31:23.560 the students. There is also the question of how opinionated PythonBPF has to be: things like 31:23.560 --> 31:29.880 ktime, which could be time, and using print instead of printk, because print already exists in Python, 31:29.960 --> 31:38.120 so why should we use the whole bpf_printk thing? And, instead of having string arguments to sections, 31:38.120 --> 31:45.960 having named tuples, so that your IDE hints can help you find whether a specific tracepoint exists or not. 31:46.680 --> 31:54.680 Things like that can be added. The goal is to make PythonBPF a prime choice and make this 31:55.640 --> 32:06.520 accessible. And this is the summary of this presentation, with the links. I didn't have a QR code, 32:06.520 --> 32:13.800 but you can download the slides and visit them. That is all. Any questions? 32:14.440 --> 32:32.760 So you mentioned that you are reproducing the tools from BCC as examples using PythonBPF. Do you have 32:32.760 --> 32:42.120 some other tools, proper to PythonBPF, that are not in BCC? I mean, not tools, but we have 32:44.200 --> 32:50.840 kind of tests: we implement a feature and then try to write tests for that, and those are 32:50.840 --> 33:00.040 not tools that people can use. We pivoted to converting BCC tools to 33:00.040 --> 33:06.600 PythonBPF because we thought that they might already have users. Maybe just a small comment: if you want 33:06.600 --> 33:13.480 to get grad students interested, I
think it would be good to support, for example, BPF 33:13.800 --> 33:21.560 LSM or sched_ext to implement schedulers, and start experimenting with that, to make it really easy 33:21.560 --> 33:29.480 for students to hack on it, or networking, you know, use cases like that. Yeah, that is something 33:29.480 --> 33:36.040 we are discussing with the professors: what exactly they need for their labs. Since we have 33:36.040 --> 33:42.440 a good timeline for the fall semester, we can implement all of these and then see if they are robust, 33:42.520 --> 33:48.200 because students would essentially be fuzzing it for us. So that is the plan. 33:50.600 --> 33:54.600 All right, thank you. Thank you.