WEBVTT 00:00.000 --> 00:15.000 So, to start, a show of hands: who here has dealt with OOM killer errors before? 00:15.000 --> 00:17.000 They're kind of a pain, right? 00:17.000 --> 00:23.000 So, like, your program's running along, and somebody allocates memory, which might not be your program, 00:23.000 --> 00:28.000 and the kernel decides that your program is the fat program that's using too much memory. 00:28.000 --> 00:34.000 It's killed, and then in your dmesg log you see some kernel stack trace and nothing about, you know, 00:34.000 --> 00:38.000 what your program was doing at the time, so it just got nuked. 00:38.000 --> 00:44.000 So this can be tough if you're in, you know, production and trying to solve these issues; it can be a challenge. 00:44.000 --> 00:53.000 So, what is oomprof? So, oomprof is a set of eBPF programs that knows how to read 00:53.000 --> 00:57.000 Go's internal memory profiler data. 00:57.000 --> 01:02.000 And at OOM time, it actually goes and reads that, records it to some eBPF maps, 01:02.000 --> 01:08.000 and then allows, after your program is dead, for your memory profile to be saved and uploaded or logged, 01:08.000 --> 01:10.000 or whatever you want to do with it. 01:10.000 --> 01:13.000 So, that's it, so we're done. 01:13.000 --> 01:23.000 So, the motivation: Polar Signals provides a profiling solution. 01:23.000 --> 01:27.000 You know, CPU profiling is kind of our bread and butter, but memory profiling 01:27.000 --> 01:31.000 and other forms of profiling are things we do as well. 01:31.000 --> 01:39.000 But, you know, memory profiling typically works by scraping 01:39.000 --> 01:44.000 programs, maybe like once every minute, once every five minutes; you scrape your pprof 01:44.000 --> 01:48.000 endpoint and get a memory profile, but that might not be good enough, right?
01:48.000 --> 01:53.000 So, like, sometimes you have some allocation that is very big, 01:53.000 --> 01:58.000 and you can just die right away with one allocation, or sometimes you have some, like, service failure, 01:58.000 --> 02:02.000 and you start getting into some failure-retry loop, and you can kind of die right away, 02:02.000 --> 02:08.000 and those allocations may not ever show up in a profile if you're using ad hoc profiling. 02:08.000 --> 02:12.000 So, that's the motivation. 02:12.000 --> 02:13.000 Not really. 02:13.000 --> 02:19.000 Actually, the motivation is that I've been battling, um, OOM kill problems for longer than I 02:19.000 --> 02:20.000 care to admit. 02:20.000 --> 02:27.000 I used to work on databases, mostly working on garbage collectors, and, you know, 02:27.000 --> 02:31.000 when these memory problems happen, it's like, there's got to be a better way, right? 02:31.000 --> 02:35.000 There's got to be a way to figure out why this OOM happened and understand what happened. 02:35.000 --> 02:39.000 So, the real answer is that this was an issue I had while I was working on something else, 02:39.000 --> 02:44.000 and because Polar Signals is such a cool company, they allowed me to go take a little while and work on this. 02:44.000 --> 02:51.000 So, anyway. So, for those who don't know: who here writes Go programs? 02:51.000 --> 02:54.000 Alright, so you guys probably already know all of this. 02:54.000 --> 02:58.000 Anyway, one of Go's cool features is a built-in memory profiler, 02:58.000 --> 03:02.000 and it's just a statistical memory profiler; it basically just has a little counter, 03:02.000 --> 03:06.000 and every time it crosses 512K (it's configurable),
03:06.000 --> 03:11.000 but every time that happens, it takes a sample, and that sample 03:11.000 --> 03:16.000 is just a call stack, and it sticks that into a hash table, and that allows it to, 03:16.000 --> 03:20.000 you know, track your memory usage with very low overhead. 03:20.000 --> 03:26.000 So, you know, 512K is a pretty big number, so for most allocations: zero overhead. 03:27.000 --> 03:34.000 And then, if an, you know, allocation gets tracked by the profiler, a bit gets flipped, 03:34.000 --> 03:36.000 and so when that thing gets freed by the garbage collector, 03:36.000 --> 03:41.000 some stats are updated, so it knows when those things are freed, 03:41.000 --> 03:46.000 and in addition to seeing allocation volume, you can also see in-use statistics, 03:46.000 --> 03:48.000 which is usually what we care about, right? 03:48.000 --> 03:51.000 We don't care about all the garbage, all the things that were allocated and freed. 03:51.000 --> 03:55.000 We want to know what the kind of outstanding, in-use allocations are. 03:57.000 --> 04:01.000 But Go's GC... so, do we want to count garbage? 04:01.000 --> 04:04.000 Like, garbage, you know, is things that are going to go away, 04:04.000 --> 04:07.000 but if you're dealing with OOMs, you do want to count garbage, right? 04:07.000 --> 04:11.000 Because until the garbage collector gets to the sweep point of its cycle, 04:11.000 --> 04:14.000 you know, all your garbage is actually consuming memory. 04:14.000 --> 04:22.000 So, the way this works in the profiler is that, as it's profiling, the allocs 04:22.000 --> 04:26.000 and frees get stored in this kind of, like, scratch, 04:26.000 --> 04:31.000 temporary, you know, "future" bucket, and then at the end of the sweep, 04:31.000 --> 04:37.000 all that, you know, future, temporary stuff gets swept into the active bucket, 04:37.000 --> 04:41.000 and then that's what's reported to you from the Go pprof endpoints.
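The sampling behavior described here can be poked at directly from Go. This is a minimal sketch using the real runtime knobs (runtime.MemProfileRate, runtime.MemProfile); setting the rate to 1 is purely for demonstration, since real programs normally leave the default 512 KiB alone:

```go
package main

import (
	"fmt"
	"runtime"
)

// sampledRecords demonstrates Go's statistical memory profiler: the runtime
// keeps a counter and records one call stack roughly every
// runtime.MemProfileRate allocated bytes (512 KiB by default).
func sampledRecords() int {
	// Normally set once, early; 1 means "sample every allocation", used
	// here only so the demo reliably produces records.
	runtime.MemProfileRate = 1

	keep := make([][]byte, 0, 64)
	for i := 0; i < 64; i++ {
		keep = append(keep, make([]byte, 32*1024))
	}
	runtime.KeepAlive(keep)

	// As described in the talk, fresh samples sit in "future" buckets
	// until a GC sweep completes, so force full cycles before reading.
	runtime.GC()
	runtime.GC()

	// With a nil slice, MemProfile just reports how many unique
	// allocation-site records (buckets) the runtime currently holds.
	n, _ := runtime.MemProfile(nil, true)
	return n
}

func main() {
	fmt.Println("profile records:", sampledRecords())
}
```

Note how the runtime.GC calls mirror the "future bucket" flushing the talk describes: without a completed sweep, the freshly sampled allocations would not yet be visible in the active profile.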
04:41.000 --> 04:46.000 So the active bucket, the active, you know, profile, is never going to have 04:46.000 --> 04:49.000 any garbage in it; it's as of the end of the sweep. 04:49.000 --> 04:55.000 So what we do is we count all those things; we read not only just the active thing, 04:55.000 --> 05:00.000 oomprof can read those future buckets too, and so if, you know, 05:00.000 --> 05:04.000 the allocation that caused your problems was recent, a recent allocation, 05:04.000 --> 05:10.000 which can happen, we'll see those, even though they may not have shown up in a memory profile. 05:10.000 --> 05:12.000 Does that make sense? 05:12.000 --> 05:18.000 So we're ahead of the garbage collection cycles. But anyway, so the first step 05:18.000 --> 05:23.000 of oomprof is that, you know, we have a watcher program that scans 05:23.000 --> 05:26.000 all the pids in your system; it finds the Go programs, 05:26.000 --> 05:30.000 and then it tries to find this pointer called runtime.mbuckets. 05:30.000 --> 05:34.000 So this is the address of the actual memory profile buckets. 05:34.000 --> 05:40.000 So if it can find them, we'll register them, so we'll have a BPF map that says: 05:40.000 --> 05:44.000 these are the Go programs we care about, these are the ones where we know 05:44.000 --> 05:51.000 where the profiling data lives, so these are the ones we're going to watch for. 05:51.000 --> 05:57.000 This stuff is all implemented in the Parca agent, and the oomprof code itself 05:57.000 --> 05:59.000 is a library that we just use. 05:59.000 --> 06:03.000 So you can use the oomprof library separate from the Parca agent, 06:03.000 --> 06:06.000 but I'm not really going to talk about that because it's a little complicated, 06:06.000 --> 06:12.000 but if you're interested in that, you can see me after class.
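The symbol-lookup step described above can be sketched with Go's standard debug/elf package. The findSymbol helper and the /proc/self/exe target are illustrative, not oomprof's actual code, and the runtime.mbuckets symbol name is as given in the talk:

```go
package main

import (
	"debug/elf"
	"fmt"
)

// findSymbol scans an ELF binary's symbol table for the named symbol and
// returns its virtual address. oomprof does something similar for each Go
// process it discovers, to locate the memory profile buckets; the exact
// discovery logic differs.
func findSymbol(path, name string) (uint64, error) {
	f, err := elf.Open(path)
	if err != nil {
		return 0, err
	}
	defer f.Close()

	syms, err := f.Symbols()
	if err != nil {
		// elf.ErrNoSymbols lands here for stripped binaries, which is
		// exactly the limitation mentioned later in the talk.
		return 0, err
	}
	for _, s := range syms {
		if s.Name == name {
			return s.Value, nil
		}
	}
	return 0, fmt.Errorf("symbol %q not found (stripped binary?)", name)
}

func main() {
	// Hypothetical target: a running process's binary via procfs.
	addr, err := findSymbol("/proc/self/exe", "runtime.mbuckets")
	fmt.Println(addr, err)
}
```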
06:12.000 --> 06:17.000 Anyway, so it works with a couple of tracepoints: 06:17.000 --> 06:23.000 there's a mark_victim tracepoint that allows you to see 06:23.000 --> 06:27.000 when the OOM killer has picked a victim, 06:27.000 --> 06:30.000 and so you can get the pid from that data. 06:30.000 --> 06:33.000 So we can use that to say, okay, here's the pid that's dying; 06:33.000 --> 06:37.000 is this in our map of the Go processes we care about? 06:37.000 --> 06:41.000 And if it is, we know: oh, we care about this, so we can go read 06:41.000 --> 06:44.000 its memory, and we're done, right? 06:44.000 --> 06:48.000 Well, not really; it's a little more complicated, 06:48.000 --> 06:52.000 because the OOM killer runs in the context of whatever program 06:52.000 --> 06:55.000 is allocating memory, and that may not be your program, right? 06:55.000 --> 06:58.000 So the mark_victim, uh, eBPF program 06:58.000 --> 07:02.000 can't just start reading data from your Go program, 07:02.000 --> 07:06.000 because the context may be systemd or some other program. 07:06.000 --> 07:09.000 So how do we solve that? 07:09.000 --> 07:14.000 So the way we solved that, after much spelunking, 07:14.000 --> 07:19.000 was to realize that the way the OOM killer works is it uses kill, 07:19.000 --> 07:22.000 so it sends a kill signal to the program that's dying, 07:22.000 --> 07:27.000 so we can hook on to the signal_deliver tracepoint 07:27.000 --> 07:31.000 to see when our program receives this, 07:31.000 --> 07:36.000 and luckily for us, the signal_deliver eBPF tracepoint 07:36.000 --> 07:39.000 fires in the context of your program. 07:39.000 --> 07:43.000 So our eBPF program that's hooked up to the signal_deliver tracepoint 07:43.000 --> 07:47.000 can just start reading the memory from your process, 07:47.000 --> 07:51.000 and it's still there, and it can read it, and it just kind of works, 07:51.000 --> 07:53.000 which is nice.
07:53.000 --> 07:56.000 So, you know, just a brief point about this: yes, 07:56.000 --> 07:59.000 there is some overhead attaching a tracepoint 07:59.000 --> 08:01.000 to signal_deliver, but, you know, 08:01.000 --> 08:05.000 typically signal delivery is not like a super high throughput thing, 08:05.000 --> 08:10.000 and typically it's not on the critical path 08:10.000 --> 08:13.000 for key workloads, but who knows, 08:13.000 --> 08:15.000 you know, your mileage may vary, so: 08:15.000 --> 08:18.000 when in doubt, measure. 08:18.000 --> 08:22.000 Let's see. 08:22.000 --> 08:24.000 All right. 08:24.000 --> 08:31.000 So, just an overview of, you know, how this all works: 08:32.000 --> 08:35.000 the Parca agent, which is, you know, the 08:35.000 --> 08:39.000 profiling agent that Polar Signals helps develop, 08:39.000 --> 08:43.000 installs the tracepoints and sets up the maps. 08:43.000 --> 08:45.000 And then it scans the processes; 08:45.000 --> 08:48.000 it registers the ones where it can find the bucket address; 08:48.000 --> 08:52.000 then an OOM happens, and the mark_victim and signal_deliver tracepoints fire 08:52.000 --> 08:54.000 (and exit and exec ones too, sorry), 08:54.000 --> 09:00.000 and then what we do in the eBPF signal_deliver tracepoint 09:00.000 --> 09:05.000 is we rip through the buckets of the Go profile, 09:05.000 --> 09:07.000 copy those into an eBPF map, 09:07.000 --> 09:10.000 and then we send a signal via a BPF 09:10.000 --> 09:13.000 perf event output to the user-space agent. 09:13.000 --> 09:16.000 And then Parca agent gets that signal and says: 09:16.000 --> 09:19.000 oh, a Go program died; that's an event. 09:19.000 --> 09:21.000 Now I can read that map, read those buckets, 09:21.000 --> 09:23.000 which have been copied from your Go program, 09:23.000 --> 09:26.000 and your Go program at this point is dead and gone, 09:26.000 --> 09:29.000 but that eBPF map is still there.
09:29.000 --> 09:32.000 And we can read all the entries from that, 09:32.000 --> 09:34.000 and then create a pprof profile, 09:34.000 --> 09:37.000 and it can be uploaded to Polar Signals, 09:37.000 --> 09:39.000 and then you see your profile, 09:39.000 --> 09:42.000 and you can figure out what happened and save the day, right? 09:42.000 --> 09:44.000 So, does it work? 09:44.000 --> 09:46.000 Sometimes, you know? 09:46.000 --> 09:48.000 Actually, it works pretty well, but, like, 09:48.000 --> 09:50.000 there are things that can go wrong, right? 09:50.000 --> 09:56.000 So, the hash table that Go uses to store these 09:56.000 --> 09:59.000 things has a massive prime number of entries, 09:59.000 --> 10:03.000 and then each entry has an unbounded number of 10:03.000 --> 10:07.000 possible records, because it just uses linked-list 10:07.000 --> 10:08.000 chaining. 10:08.000 --> 10:12.000 So, in theory, we could have an infinite number of records to read. 10:12.000 --> 10:17.000 And obviously, that kind of doesn't fly with an eBPF program, 10:17.000 --> 10:18.000 right? It's pretty strict 10:18.000 --> 10:21.000 about, you know, how much execution you can do. 10:21.000 --> 10:25.000 So, our current implementation 10:25.000 --> 10:29.000 can, in one eBPF program, 10:29.000 --> 10:35.000 read 3,362 buckets, which seems like a small number, 10:35.000 --> 10:38.000 but we haven't done a lot to optimize that, 10:38.000 --> 10:41.000 so it could probably get a little better, but anyway, 10:41.000 --> 10:45.000 that's not enough for a kind of, you know, 10:45.000 --> 10:48.000 reasonably sized Go program. 10:48.000 --> 10:50.000 So, that's no good. 10:50.000 --> 10:53.000 But then, luckily, we can tap into tail calls 10:53.000 --> 10:56.000 and get that up to 100,000 buckets. 10:56.000 --> 10:59.000 And that turns out, you know, 10:59.000 --> 11:01.000 for all the things we've tried to throw at it, 11:01.000 --> 11:02.000 to be plenty of buckets.
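The per-program limit plus tail-call chaining described above can be sketched as a small simulation. readBuckets is a hypothetical helper, and the numbers only mirror the talk (roughly 3,000 buckets per program invocation, raised to roughly 100,000 via tail calls); the real limits come from the eBPF verifier, not from this arithmetic:

```go
package main

import "fmt"

// readBuckets models how oomprof bounds its work: each eBPF program
// invocation may read at most perProgram buckets, and tail calls let it
// chain up to maxTailCalls further invocations. If the budget runs out
// before the hash table is exhausted, the result is flagged incomplete.
func readBuckets(total, perProgram, maxTailCalls int) (read int, complete bool) {
	for call := 0; call <= maxTailCalls; call++ {
		n := total - read
		if n > perProgram {
			n = perProgram
		}
		read += n
		if read == total {
			return read, true
		}
	}
	// Ran out of tail calls: keep what we have, but mark it incomplete,
	// matching the talk's "incomplete read" flag.
	return read, false
}

func main() {
	read, complete := readBuckets(250000, 3000, 33)
	fmt.Println(read, complete)
}
```

With 34 total invocations of 3,000 buckets each, the budget tops out just over 100,000 buckets, which matches the ceiling mentioned in the talk.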
11:02.000 --> 11:04.000 So, sizing the map is a little hard, right? 11:04.000 --> 11:07.000 So, what is a bucket? A bucket is a unique allocation stack. 11:07.000 --> 11:08.000 And so, if you look at a program, 11:08.000 --> 11:10.000 like, you can't just look at it and go, 11:10.000 --> 11:13.000 how many unique allocation paths are in that program? 11:13.000 --> 11:14.000 You know, it could be a small number, 11:14.000 --> 11:16.000 it could be a huge number; you know, 11:16.000 --> 11:18.000 it basically probably roughly correlates 11:18.000 --> 11:19.000 with the size of the program. 11:19.000 --> 11:21.000 So, if you have, like, a huge Go program, 11:22.000 --> 11:23.000 maybe that's not enough. 11:23.000 --> 11:25.000 But in practice, from what we've seen, 11:25.000 --> 11:26.000 that's plenty. 11:26.000 --> 11:29.000 And then also, because that map's pre-allocated, 11:29.000 --> 11:32.000 how many buckets you support 11:32.000 --> 11:34.000 factors into the size of the map. 11:34.000 --> 11:37.000 And the map roughly works out to: 11:37.000 --> 11:39.000 60,000 buckets takes 40 megabytes. 11:39.000 --> 11:42.000 So, that's not a ton of memory, 11:42.000 --> 11:44.000 but this is configurable, right? 11:44.000 --> 11:46.000 So, you could jack it up, 11:46.000 --> 11:48.000 and you can get up to, like I said, 11:48.000 --> 11:50.000 100,000 before we start running into 11:50.000 --> 11:53.000 our eBPF limits. 11:53.000 --> 11:55.000 So, if you have more, 11:55.000 --> 11:56.000 what happens? 11:56.000 --> 11:58.000 Well, we just stop, right? 11:58.000 --> 12:00.000 I mean, we only, you know... 12:00.000 --> 12:03.000 the loop in our program is limited 12:03.000 --> 12:05.000 to a certain number of iterations.
12:05.000 --> 12:06.000 If we get to the end, 12:06.000 --> 12:08.000 and we haven't reached the end of all 12:08.000 --> 12:10.000 of the memory allocation 12:10.000 --> 12:11.000 buckets in the hash table, 12:11.000 --> 12:12.000 we'll just record 12:12.000 --> 12:14.000 that we didn't get to the end: 12:14.000 --> 12:16.000 this is incomplete. 12:16.000 --> 12:18.000 So, now, that may or may not be useful; 12:18.000 --> 12:20.000 like, you know, it depends: 12:20.000 --> 12:22.000 did we hit the interesting allocations, 12:22.000 --> 12:24.000 or do the interesting allocations 12:24.000 --> 12:26.000 come later in the hash map, 12:26.000 --> 12:27.000 and we didn't get to them? 12:27.000 --> 12:29.000 But you'll still have something, 12:29.000 --> 12:30.000 so you can look at it. 12:30.000 --> 12:33.000 So, it's a bit of a crapshoot in that situation, 12:33.000 --> 12:34.000 but at least, you know, 12:34.000 --> 12:35.000 we'll tell you 12:35.000 --> 12:38.000 whether it's a complete read or not. 12:38.000 --> 12:40.000 So, why do these limits exist? 12:40.000 --> 12:42.000 Just a little more detail; 12:42.000 --> 12:43.000 like, 12:43.000 --> 12:44.000 the... 12:44.000 --> 12:47.000 my slides suck, by the way, 12:47.000 --> 12:49.000 so I apologize for that. 12:49.000 --> 12:50.000 Like, I should have, like, 12:50.000 --> 12:51.000 pictures and stuff. 12:51.000 --> 12:52.000 Anyway, 12:52.000 --> 12:56.000 the way these records are laid out: 12:56.000 --> 12:58.000 you have, like, a header, 12:58.000 --> 13:01.000 you have the stack, 13:01.000 --> 13:04.000 and then you have, like, 13:04.000 --> 13:07.000 the actual mem stuff that tells you 13:07.000 --> 13:08.000 how many, you know, 13:08.000 --> 13:10.000 allocs and frees and whatnot.
13:10.000 --> 13:11.000 So, the header's a fixed size, 13:11.000 --> 13:13.000 and it's got a number at the end, 13:13.000 --> 13:14.000 which is the number of stack frames, 13:14.000 --> 13:16.000 and the stack part is anywhere from zero to a thousand 13:16.000 --> 13:18.000 stack entries, 13:18.000 --> 13:20.000 which is just the call stack up to that point, 13:20.000 --> 13:22.000 and then after that is the mem stats. 13:22.000 --> 13:25.000 So, our eBPF program has to do three reads 13:25.000 --> 13:28.000 to figure this out, right: it reads the header, 13:28.000 --> 13:29.000 reads the stack, 13:29.000 --> 13:30.000 and then reads the mem. 13:30.000 --> 13:31.000 So, that's... 13:31.000 --> 13:32.000 it's pretty simple; 13:32.000 --> 13:33.000 you know, 13:33.000 --> 13:34.000 if you want to go look at the code, 13:34.000 --> 13:35.000 I encourage you to do so. 13:35.000 --> 13:36.000 But, 13:36.000 --> 13:37.000 you know, 13:37.000 --> 13:39.000 doing those three reads, 13:39.000 --> 13:41.000 plus the instructions to pull out the data 13:41.000 --> 13:42.000 we're interested in 13:42.000 --> 13:44.000 and copy it to the eBPF map, 13:45.000 --> 13:46.000 is why, 13:46.000 --> 13:47.000 you know, 13:47.000 --> 13:50.000 we only get three thousand or so buckets per program. 13:55.000 --> 13:56.000 The... 13:56.000 --> 13:57.000 you know, 13:57.000 --> 13:58.000 the other thing I was going to point out is that 13:58.000 --> 14:01.000 the stack can be up to a thousand frames, 14:01.000 --> 14:04.000 but our representation fixes it at 64, 14:04.000 --> 14:06.000 so it's a fixed-size allocation; 14:06.000 --> 14:08.000 that's why it's three reads instead of two, 14:08.000 --> 14:13.000 because we don't support the full thousand stack frames 14:13.000 --> 14:15.000 that Go does. 14:15.000 --> 14:17.000 Typically, 64 is enough. 14:17.000 --> 14:18.000 This is all configurable; 14:18.000 --> 14:19.000 like, 14:19.000 --> 14:20.000 if your use case is different,
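The fixed-size record and the map-sizing arithmetic can be sketched as follows. The header and counter byte counts here are guesses chosen so the talk's numbers (64 frames kept, roughly 40 MB for 60,000 buckets) come out; they are not the actual struct layout:

```go
package main

import "fmt"

// Sketch of the fixed-size record copied per bucket, per the talk: a
// header, a call stack capped at 64 frames (Go itself allows up to 1,000),
// and the alloc/free counters.
const (
	maxStackDepth = 64  // frames kept per bucket
	frameBytes    = 8   // one 64-bit program counter per frame
	headerBytes   = 48  // assumed fixed header size
	memStatsBytes = 128 // assumed alloc/free counter block
)

// recordSize is the fixed per-bucket footprint in the pre-allocated map.
func recordSize() int {
	return headerBytes + maxStackDepth*frameBytes + memStatsBytes
}

// mapSizeMB shows why the pre-allocated map costs real memory: a ~700-byte
// fixed record times 60,000 buckets lands near the 40 MB quoted in the talk.
func mapSizeMB(buckets int) float64 {
	return float64(buckets*recordSize()) / (1 << 20)
}

func main() {
	fmt.Printf("record: %d bytes; 60000 buckets ~ %.0f MB\n",
		recordSize(), mapSizeMB(60000))
}
```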
14:20.000 --> 14:22.000 you can go in there and change it. 14:22.000 --> 14:24.000 So, 14:24.000 --> 14:26.000 what can go wrong? 14:26.000 --> 14:27.000 You know, 14:27.000 --> 14:28.000 eBPF reads can fail; 14:28.000 --> 14:29.000 you know, 14:29.000 --> 14:30.000 it's a fact of life. 14:30.000 --> 14:31.000 You know, 14:31.000 --> 14:32.000 if the, you know, 14:32.000 --> 14:34.000 program's under memory pressure, 14:34.000 --> 14:35.000 which, 14:35.000 --> 14:37.000 if the OOM killer's happening, 14:37.000 --> 14:38.000 you know, 14:38.000 --> 14:40.000 the machine is under memory pressure, 14:40.000 --> 14:42.000 it could be possible that parts of the 14:43.000 --> 14:45.000 bucket array, 14:45.000 --> 14:47.000 that massive array, or any of those allocations, 14:47.000 --> 14:48.000 whatever pages they live on, 14:48.000 --> 14:50.000 could get put out to swap, 14:50.000 --> 14:51.000 and if we try to read them, 14:51.000 --> 14:53.000 we just get an error back. 14:53.000 --> 14:54.000 So, that happens. 14:54.000 --> 14:55.000 So, again, 14:55.000 --> 14:57.000 that's why we have that incomplete flag. 14:57.000 --> 14:58.000 If we can't, for 14:58.000 --> 14:59.000 some reason, 14:59.000 --> 15:01.000 get to the end of all of your buckets, 15:01.000 --> 15:03.000 we'll let you know that's an incomplete read.
15:06.000 --> 15:07.000 And then, 15:07.000 --> 15:09.000 something else that can go wrong 15:09.000 --> 15:10.000 is, 15:10.000 --> 15:11.000 sometimes, 15:11.000 --> 15:12.000 the OOM killer gets impatient, 15:12.000 --> 15:13.000 and, 15:13.000 --> 15:15.000 while it's trying to kill your program, 15:15.000 --> 15:17.000 it will come along and kill another program; 15:17.000 --> 15:18.000 in that case, 15:18.000 --> 15:19.000 it can also be the Parca agent 15:19.000 --> 15:20.000 that gets killed, 15:20.000 --> 15:21.000 and then, 15:21.000 --> 15:22.000 you know, 15:22.000 --> 15:24.000 what's the point of oomprof at that point, 15:24.000 --> 15:26.000 because that data's not going anywhere. 15:26.000 --> 15:27.000 But 15:27.000 --> 15:29.000 it works surprisingly well in practice; 15:29.000 --> 15:30.000 you know, 15:30.000 --> 15:32.000 we have a bunch of different tests in our CI 15:32.000 --> 15:33.000 that test different, 15:33.000 --> 15:34.000 you know, 15:34.000 --> 15:36.000 programs that crash in different ways, 15:36.000 --> 15:38.000 and it's pretty reliable for the tests 15:38.000 --> 15:40.000 that we have so far. 15:40.000 --> 15:41.000 But, 15:41.000 --> 15:42.000 you know, 15:42.000 --> 15:43.000 there are probably lots of failure modes 15:43.000 --> 15:44.000 I don't know of, 15:44.000 --> 15:45.000 but 15:45.000 --> 15:47.000 those are the ones we've seen. 15:47.000 --> 15:50.000 So, future directions for oomprof.
15:50.000 --> 15:51.000 We, 15:51.000 --> 15:55.000 you know... 15:55.000 --> 15:56.000 kind of, 15:56.000 --> 15:57.000 I think the inspiration for Go's 15:57.000 --> 15:58.000 memory profiler 15:58.000 --> 15:59.000 probably came from jemalloc 15:59.000 --> 16:00.000 and its profiler, 16:00.000 --> 16:01.000 so 16:01.000 --> 16:04.000 including jemalloc support would be pretty easy 16:04.000 --> 16:05.000 and be cool, 16:05.000 --> 16:07.000 so it wouldn't be just Go programs; 16:08.000 --> 16:09.000 or tcmalloc, 16:09.000 --> 16:10.000 or mimalloc, 16:10.000 --> 16:12.000 which is commonly used in Rust programs, 16:12.000 --> 16:14.000 would be good to support. 16:14.000 --> 16:15.000 In theory, 16:15.000 --> 16:17.000 we could probably support glibc, 16:17.000 --> 16:21.000 but that would be a little bit more of a project, 16:21.000 --> 16:24.000 because it doesn't have a profiler built in. 16:24.000 --> 16:26.000 And then, right now, 16:26.000 --> 16:28.000 we have that limitation where, 16:28.000 --> 16:30.000 if we can't find the runtime 16:30.000 --> 16:32.000 mbuckets symbol in your program, 16:32.000 --> 16:33.000 then we're dead, 16:33.000 --> 16:34.000 right; 16:34.000 --> 16:36.000 we can't profile it. A lot of Go binaries... 16:36.000 --> 16:37.000 well, 16:37.000 --> 16:38.000 you know, 16:38.000 --> 16:39.000 strip out those symbols; 16:39.000 --> 16:40.000 so, 16:40.000 --> 16:41.000 not a lot, 16:41.000 --> 16:43.000 I would say some. 16:43.000 --> 16:45.000 But there are ways of finding that stuff 16:45.000 --> 16:46.000 if it's a stripped binary; 16:46.000 --> 16:49.000 we just haven't gone to those lengths yet. 16:49.000 --> 16:50.000 But, 16:50.000 --> 16:51.000 you know, 16:51.000 --> 16:52.000 in theory, 16:52.000 --> 16:53.000 you can find a function that 16:53.000 --> 16:54.000 references that thing, 16:54.000 --> 16:56.000 and then disassemble the function 16:56.000 --> 16:58.000 and find the address that way.
16:58.000 --> 17:00.000 And the other thing we're looking at 17:00.000 --> 17:01.000 is, 17:01.000 --> 17:02.000 you know, 17:02.000 --> 17:03.000 like I said before, 17:03.000 --> 17:05.000 a lot of profiles for Go programs 17:05.000 --> 17:07.000 come from scraping the pprof endpoint 17:07.000 --> 17:08.000 at an interval. 17:08.000 --> 17:10.000 But what if we could do push profiling, 17:10.000 --> 17:12.000 and what if your program could say, 17:12.000 --> 17:13.000 oh, 17:13.000 --> 17:14.000 something interesting happened, 17:14.000 --> 17:16.000 or I'm in some interesting phase of my program, 17:16.000 --> 17:18.000 we're doing some kind of memory- 17:18.000 --> 17:20.000 intensive optimization thing: 17:20.000 --> 17:22.000 let's push. 17:22.000 --> 17:23.000 You know, 17:23.000 --> 17:24.000 so we have the ability, 17:24.000 --> 17:26.000 because we can just read the buckets at any time, 17:26.000 --> 17:28.000 to push those profiles 17:28.000 --> 17:29.000 instead of, you know, 17:29.000 --> 17:31.000 having them get pulled. 17:32.000 --> 17:34.000 So, 17:34.000 --> 17:35.000 I encourage you to check it out. 17:35.000 --> 17:36.000 Like I said, 17:36.000 --> 17:38.000 the oomprof project is 17:38.000 --> 17:40.000 under the Parca umbrella 17:40.000 --> 17:43.000 that Polar Signals created. 17:43.000 --> 17:45.000 That's the library, 17:45.000 --> 17:46.000 and then it's included 17:46.000 --> 17:48.000 and built into the Parca agent. 17:48.000 --> 17:51.000 And then the Parca website has a bunch of... 17:51.000 --> 17:53.000 a link to the docs on it. 17:53.000 --> 17:54.000 Has anyone here ever used Parca? 17:54.000 --> 17:56.000 Anybody? 17:57.000 --> 18:00.000 So, check it out. 18:00.000 --> 18:02.000 You get Parca agent; 18:02.000 --> 18:03.000 there's a lot of useful things, 18:03.000 --> 18:07.000 in addition to just doing profiling. 18:07.000 --> 18:09.000 So... 18:09.000 --> 18:11.000 and then I guess my last slide wasn't included.
18:11.000 --> 18:12.000 That's all right. 18:12.000 --> 18:13.000 It was just my name. 18:13.000 --> 18:14.000 So, 18:14.000 --> 18:16.000 well, 18:16.000 --> 18:17.000 thank you. 18:30.000 --> 18:32.000 So, we have lots of time for questions, 18:32.000 --> 18:34.000 so I hope you have questions. 18:34.000 --> 18:35.000 Yeah, thanks very much. 18:35.000 --> 18:37.000 I definitely will try it out. 18:37.000 --> 18:39.000 Especially since I do a lot of direct... 18:39.000 --> 18:40.000 we do a lot of direct profiling, 18:40.000 --> 18:41.000 just like you said, 18:41.000 --> 18:42.000 directly 18:42.000 --> 18:43.000 with a running 18:43.000 --> 18:44.000 executable, 18:44.000 --> 18:45.000 in Go. 18:45.000 --> 18:46.000 I was just wondering, 18:46.000 --> 18:49.000 does this offer any additional information? 18:49.000 --> 18:51.000 Or is it literally that I'm just going to get pprof 18:51.000 --> 18:52.000 at the end? 18:52.000 --> 18:54.000 Is there something extra I get 18:54.000 --> 18:56.000 in addition to the pprof itself? 18:56.000 --> 18:57.000 So, 18:57.000 --> 18:59.000 it's just the pprof, 18:59.000 --> 19:02.000 and then the information you get is: 19:02.000 --> 19:03.000 did a read error occur; 19:03.000 --> 19:04.000 so, 19:04.000 --> 19:05.000 along the way, 19:05.000 --> 19:07.000 if we failed to read from the map, 19:07.000 --> 19:08.000 we'll tell you that that happened; 19:08.000 --> 19:10.000 and then: is it complete? 19:10.000 --> 19:13.000 It can be incomplete if 19:13.000 --> 19:15.000 there are too many buckets for us to read, 19:15.000 --> 19:16.000 so we didn't get to the end.
19:16.000 --> 19:17.000 So, 19:17.000 --> 19:19.000 for most cases, 19:19.000 --> 19:20.000 you know, 19:20.000 --> 19:21.000 complete will be true 19:21.000 --> 19:22.000 and read-error will be false, 19:22.000 --> 19:24.000 and that's a full profile, 19:24.000 --> 19:25.000 and you should be able to use that 19:25.000 --> 19:27.000 to kind of figure out what happened. 19:30.000 --> 19:31.000 Hi, 19:31.000 --> 19:32.000 yeah, 19:32.000 --> 19:33.000 I have a question myself, 19:33.000 --> 19:34.000 over here. 19:34.000 --> 19:35.000 Yeah. 19:37.000 --> 19:39.000 This seems really useful; 19:39.000 --> 19:41.000 I'm super excited to try it out. 19:41.000 --> 19:42.000 I'm sure it will help 19:42.000 --> 19:45.000 to debug lots of these OOM issues. 19:45.000 --> 19:46.000 While you were beginning the talk, 19:46.000 --> 19:47.000 I had a thought, 19:47.000 --> 19:50.000 and I'm wondering what your opinion is on it. 19:50.000 --> 19:52.000 So, as the memory pressure builds up, 19:52.000 --> 19:54.000 the kernel obviously tries to do a bunch of things, 19:54.000 --> 19:56.000 like try to swap pages out, 19:56.000 --> 19:59.000 and basically get them out of memory 19:59.000 --> 20:01.000 so it can free up space for the allocation, 20:01.000 --> 20:03.000 and it's only after a lot of trying, 20:03.000 --> 20:04.000 when it can't, 20:04.000 --> 20:06.000 that it OOMs the process.
20:06.000 --> 20:08.000 And I'm wondering if you can 20:08.000 --> 20:10.000 maybe use eBPF in this process 20:10.000 --> 20:13.000 to kind of start to build a very compact model 20:13.000 --> 20:14.000 of which pages 20:14.000 --> 20:16.000 the kernel is having problems with, 20:16.000 --> 20:19.000 and if you can communicate that to user space, 20:19.000 --> 20:21.000 so maybe the user-space application can get 20:21.000 --> 20:22.000 ahead of the OOM; 20:22.000 --> 20:24.000 like, it has more context about 20:24.000 --> 20:25.000 which memory should be available 20:25.000 --> 20:26.000 and not be available. 20:26.000 --> 20:27.000 Like an early warning system. 20:27.000 --> 20:29.000 Yeah, like an early warning system, 20:29.000 --> 20:31.000 but, like, with a model of the memory that the kernel has, 20:31.000 --> 20:32.000 so it knows that, 20:32.000 --> 20:34.000 okay, the kernel's having trouble with this section 20:34.000 --> 20:35.000 of its memory, 20:35.000 --> 20:37.000 and maybe I have some buckets over here 20:37.000 --> 20:39.000 that I really don't need, 20:39.000 --> 20:41.000 and I can shunt them out. 20:41.000 --> 20:43.000 Yeah, I mean, that would be nice, right? 20:43.000 --> 20:46.000 If there was, like, a signal or something you could tap into... 20:46.000 --> 20:48.000 I'm not aware of anything like that. 20:48.000 --> 20:51.000 And I do know that, like, the Go garbage collector 20:51.000 --> 20:52.000 will, you know, 20:52.000 --> 20:54.000 if it can't allocate, 20:54.000 --> 20:56.000 will, you know, before, 20:56.000 --> 20:57.000 you know, dying, 20:57.000 --> 20:59.000 try to run the garbage collector 20:59.000 --> 21:01.000 and try to free up space. 21:01.000 --> 21:04.000 But I don't think there's any kind of, like, proactive signal 21:04.000 --> 21:06.000 that Linux sends, 21:06.000 --> 21:08.000 but that would probably be the way to do it, right?
21:08.000 --> 21:10.000 Like, if basically the garbage collector could know 21:10.000 --> 21:12.000 that it's running out of memory, 21:12.000 --> 21:14.000 then maybe it can try to adjust its, 21:14.000 --> 21:15.000 you know, policy, 21:15.000 --> 21:18.000 so that maybe it doesn't over-allocate too much. 21:18.000 --> 21:20.000 But yeah, it's a good idea. 21:20.000 --> 21:22.000 I mean, I also know that there's work 21:22.000 --> 21:24.000 ongoing in the kernel 21:24.000 --> 21:26.000 on improving some of this stuff 21:26.000 --> 21:27.000 and making, you know, 21:27.000 --> 21:28.000 kind of more information available, 21:28.000 --> 21:29.000 like you're talking about. 21:29.000 --> 21:31.000 So, I think the issue is also 21:31.000 --> 21:33.000 preemptively knowing an OOM 21:33.000 --> 21:34.000 is going to happen, 21:34.000 --> 21:35.000 because you could get high pressure, 21:35.000 --> 21:37.000 but it's not necessarily... 21:37.000 --> 21:38.000 it's not necessarily going to OOM. 21:38.000 --> 21:41.000 That's the challenge, I think. 21:42.000 --> 21:43.000 Sorry? 21:45.000 --> 21:46.000 Well, he said... 21:46.000 --> 21:47.000 oh, 21:47.000 --> 21:48.000 so what did you say? 21:48.000 --> 21:50.000 Preemptively. 21:50.000 --> 21:51.000 Preemptive is hard. 21:51.000 --> 21:53.000 It's hard to know preemptively 21:53.000 --> 21:55.000 when you're going to get the OOM, right? 21:55.000 --> 21:56.000 So even if the pressure's high, 21:56.000 --> 21:58.000 even if I send the signals out, 21:58.000 --> 22:00.000 it could be 10,000 false signals, 22:00.000 --> 22:01.000 so, like, you know, 22:01.000 --> 22:02.000 which one do I act on? 22:02.000 --> 22:04.000 Which one corresponds to the OOM I'm looking for? 22:04.000 --> 22:06.000 So, I don't think... 22:06.000 --> 22:07.000 I think that... 22:07.000 --> 22:08.000 that's okay. 22:08.000 --> 22:09.000 Preemptive is hard. 22:10.000 --> 22:14.000 Yeah, thanks again for your presentation.
22:14.000 --> 22:16.000 I'm wondering a bit, 22:16.000 --> 22:18.000 in that we kind of have the 22:18.000 --> 22:20.000 OOM split into two use cases. 22:20.000 --> 22:22.000 Either the machine runs out of memory, 22:22.000 --> 22:24.000 or you have a cgroup-limited process, 22:24.000 --> 22:28.000 which is like 99% of the OOMs 22:28.000 --> 22:30.000 that you see, 22:30.000 --> 22:32.000 because it's not the machine that runs out of memory, 22:32.000 --> 22:34.000 it's the limit that's been given. 22:34.000 --> 22:35.000 Yeah. 22:35.000 --> 22:36.000 So the cgroup. 22:36.000 --> 22:38.000 And so I was wondering, 22:38.000 --> 22:40.000 there's work being done 22:40.000 --> 22:43.000 by Roman Gushchin, I believe, 22:43.000 --> 22:48.000 to add OOM management to the BPF capabilities. 22:48.000 --> 22:50.000 So you don't get a warning, 22:50.000 --> 22:53.000 but you get plenty more abilities to manage that. 22:53.000 --> 22:55.000 And then the machine is not out of memory. 22:55.000 --> 22:58.000 So you have plenty of time and plenty of memory 22:58.000 --> 23:01.000 to get done whatever else you want, 23:01.000 --> 23:03.000 which is more of a thing 23:03.000 --> 23:05.000 than managing the out-of-memory of the machine, 23:05.000 --> 23:07.000 because then that's a panic. 23:07.000 --> 23:08.000 Yeah. 23:08.000 --> 23:09.000 Yeah. 23:09.000 --> 23:10.000 I mean, oomprof works whether 23:10.000 --> 23:12.000 it's a cgroup or a machine OOM. 23:12.000 --> 23:15.000 So, you know, Parca Agent usually runs in its own, 23:15.000 --> 23:17.000 you know, pod in Kubernetes, and when it 23:17.000 --> 23:19.000 OOMs, that can happen, 23:19.000 --> 23:22.000 because people will typically tell Parca Agent 23:22.000 --> 23:23.000 like don't use a lot of memory, 23:23.000 --> 23:24.000 because it's a, you know, 23:24.000 --> 23:26.000 continuous production profiler.
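[Editor's note: for the cgroup-limited case described above, a process can read its own limit and usage from the cgroup v2 interface files and estimate its headroom itself. A hedged sketch; /sys/fs/cgroup is the usual v2 mount point and the paths may differ per system, so only the parsing is exercised here:]

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMemoryMax interprets the contents of a cgroup v2 memory.max file:
// either the literal "max" (no limit at this level) or a byte count.
func parseMemoryMax(s string) (limit int64, limited bool) {
	s = strings.TrimSpace(s)
	if s == "max" {
		return 0, false
	}
	n, err := strconv.ParseInt(s, 10, 64)
	if err != nil {
		return 0, false
	}
	return n, true
}

func main() {
	// A real check would read /sys/fs/cgroup/memory.max and
	// /sys/fs/cgroup/memory.current and compare the two for headroom.
	if n, ok := parseMemoryMax("1073741824\n"); ok {
		fmt.Println(n >> 20) // limit in MiB → 1024
	}
	_, ok := parseMemoryMax("max\n")
	fmt.Println(ok) // → false, i.e. unlimited at this level
}
```

[This matches the questioner's point: in the cgroup case the machine still has memory, so there is room to do this kind of bookkeeping, unlike a whole-machine OOM.]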
23:26.000 --> 23:27.000 So it's not supposed to, 23:27.000 --> 23:30.000 but yeah, it works in both cases. 23:30.000 --> 23:32.000 And, you know, these things are constantly 23:32.000 --> 23:34.000 going to get refined, and it's a good thing, right? 23:34.000 --> 23:37.000 Like, I think, you know, for certain applications, 23:37.000 --> 23:38.000 especially databases, 23:38.000 --> 23:40.000 like, you want to be able to use as much memory 23:40.000 --> 23:42.000 as you possibly can and run tight 23:42.000 --> 23:45.000 and, you know, cache as many things as you want 23:45.000 --> 23:48.000 and have flexible, you know, flexible ways 23:48.000 --> 23:50.000 to respond to a low-memory environment 23:50.000 --> 23:52.000 and maybe clean up some stuff. 23:52.000 --> 23:55.000 But this is kind of the backstop, like, 23:55.000 --> 23:57.000 what happened before for most: 23:57.000 --> 23:59.000 like, when things got too bad, 23:59.000 --> 24:01.000 the process just had to die, 24:01.000 --> 24:03.000 but now you know why. 24:03.000 --> 24:05.000 One question over here. 24:05.000 --> 24:06.000 Yeah. 24:06.000 --> 24:07.000 So if I understand correctly, 24:07.000 --> 24:08.000 at the end of the day 24:08.000 --> 24:09.000 it's a correlation of information 24:09.000 --> 24:11.000 from user space and kernel space 24:11.000 --> 24:13.000 via a regular BPF map. 24:13.000 --> 24:15.000 So is it feasible or reasonable 24:15.000 --> 24:17.000 to try maybe to use task local storage 24:17.000 --> 24:19.000 for these purposes, to store those buckets? 24:19.000 --> 24:20.000 Use which storage? 24:20.000 --> 24:21.000 Task local storage. 24:21.000 --> 24:23.000 So like a different type of BPF map 24:23.000 --> 24:25.000 that is local to tasks. 24:25.000 --> 24:26.000 Ah, maybe. 24:26.000 --> 24:28.000 So what would be the advantage there?
24:28.000 --> 24:30.000 Essentially the advantage 24:30.000 --> 24:31.000 would be that you will get it 24:31.000 --> 24:33.000 inside your BPF program, 24:33.000 --> 24:35.000 right there, local to your thread; 24:35.000 --> 24:37.000 that's going to be a performance advantage 24:37.000 --> 24:39.000 and probably a couple of other perks. 24:39.000 --> 24:42.000 It is a different type of BPF map for these purposes. 24:42.000 --> 24:43.000 Interesting idea. 24:43.000 --> 24:44.000 Yeah. 24:44.000 --> 24:45.000 Yeah. 24:45.000 --> 24:46.000 Because I'm really, I'm asking 24:46.000 --> 24:48.000 because it seems to be popular nowadays 24:48.000 --> 24:49.000 for profiling of different types of things, 24:49.000 --> 24:51.000 mostly for CPU profiling, like, 24:51.000 --> 24:52.000 if you look at the Python 24:52.000 --> 24:54.000 runtime or something like that. 24:54.000 --> 24:56.000 It still seems a similar idea, 24:56.000 --> 24:57.000 runtime profiling, 24:57.000 --> 24:59.000 but in your case it's going to be memory. 25:00.000 --> 25:01.000 Okay. 25:01.000 --> 25:02.000 And I was curious if you're, like, 25:02.000 --> 25:04.000 investigating that direction? 25:04.000 --> 25:06.000 Not at the moment. 25:06.000 --> 25:07.000 Yeah. 25:07.000 --> 25:09.000 Excuse me, could I have this? 25:09.000 --> 25:10.000 Yeah. 25:10.000 --> 25:15.000 This project was used in the context of a project 25:15.000 --> 25:18.000 where we had to do an unwinder for LuaJIT. 25:18.000 --> 25:21.000 And in the course of doing that project, 25:21.000 --> 25:25.000 we had some code that was written by your 25:25.000 --> 25:26.000 colleague. 25:26.000 --> 25:28.000 It wasn't so great that it OOMed, 25:28.000 --> 25:30.000 so that was kind of the, 25:30.000 --> 25:31.000 that was the thing it was solving. 25:31.000 --> 25:32.000 And it worked.
25:32.000 --> 25:34.000 So, I'm sure there's a million ways 25:34.000 --> 25:35.000 it could be improved, 25:35.000 --> 25:37.000 but it's open source and it's out there. 25:37.000 --> 25:39.000 So, going through that, 25:39.000 --> 25:42.000 I'm curious if you have plans to work 25:42.000 --> 25:44.000 with the Go runtime, 25:44.000 --> 25:46.000 and you mentioned LuaJIT, 25:46.000 --> 25:49.000 to have some support on their side, 25:49.000 --> 25:51.000 to make it more reliable. 25:52.000 --> 25:53.000 Thank you. 25:53.000 --> 25:55.000 So, making things more reliable 25:55.000 --> 25:57.000 from a running-out-of-memory perspective. 25:57.000 --> 26:00.000 Well, the way Go does this right now, 26:00.000 --> 26:01.000 it's super inconvenient, 26:01.000 --> 26:03.000 because this hash map, for instance, 26:03.000 --> 26:06.000 is scattered all over the memory space. 26:06.000 --> 26:09.000 So, for instance, just, you know, 26:09.000 --> 26:10.000 it's done by the, 26:10.000 --> 26:12.000 if they did something that wasn't 26:12.000 --> 26:13.000 that scattered, 26:13.000 --> 26:14.000 but instead, 26:14.000 --> 26:18.000 I don't know, 26:18.000 --> 26:20.000 three megabytes of contiguous memory space, 26:20.000 --> 26:22.000 then this could be some special mapping 26:22.000 --> 26:27.000 that lived on after a process has been killed. 26:27.000 --> 26:30.000 But I don't want to suggest this. 26:30.000 --> 26:32.000 I'm just thinking, in general, 26:32.000 --> 26:34.000 that maybe if programming languages 26:34.000 --> 26:36.000 added some support, 26:36.000 --> 26:38.000 the whole infrastructure could work better. 26:38.000 --> 26:39.000 And since you're, like, 26:39.000 --> 26:41.000 doing the pioneering work in this area, 26:41.000 --> 26:45.000 maybe you started doing something like that already.
26:45.000 --> 26:46.000 I haven't, 26:46.000 --> 26:47.000 but I mean, 26:47.000 --> 26:48.000 I know there are solutions, 26:49.000 --> 26:51.000 especially in database software, that do stuff like this, 26:51.000 --> 26:52.000 that kind of monitor, 26:52.000 --> 26:53.000 you know, 26:53.000 --> 26:54.000 how much memory is free, 26:54.000 --> 26:56.000 and will kind of 26:56.000 --> 26:58.000 react early 26:58.000 --> 27:00.000 to low-memory situations 27:00.000 --> 27:02.000 to avoid this kind of stuff. 27:02.000 --> 27:04.000 But things happen, 27:04.000 --> 27:07.000 and it's good to have an escape, 27:07.000 --> 27:08.000 you know, 27:08.000 --> 27:11.000 hatch for when it dies. 27:11.000 --> 27:12.000 All right. 27:12.000 --> 27:13.000 Thanks for coming out. 27:13.000 --> 27:15.000 Thank you, Tommy. 27:18.000 --> 27:20.000 Thank you.