WEBVTT

00:00.000 --> 02:04.080
I'll summarize what I've said. Our goal is to have an always-on verbose tracing system, in such a way that your performance won't be hurt too much. Zero overhead is impossible, I think. But that's the goal, because sometimes you get those issues that happen somewhere, sometimes, and are impossible to reproduce and deal with. So that's the motivation. Technically it's somewhat similar to ftrace. I hope you're familiar with it: it's tracing, but it's usually used in development or in debugging, when you control the system, and it's a single, system-wide instance. Here we're talking about production, per-kernel-module tracing, where you control what is logged, and how, etc. So the problem, as I just mentioned: full string formatting for logging is slow. Even if you don't dump it into dmesg, you still pay the overhead. So this is going to be a technical deep dive into what we did and how we did it. The problem is memory, memory, memory; that's the problem. If you want good performance, you have to stay close to your memory and do as little as possible. And logging means that you may be reaching outside of your working set, because you're writing somewhere else.
02:04.080 --> 03:41.760
You have your working memory, and you have your logging memory. Going outside of your working memory means you get cache misses, and you will hurt performance. That's the main problem of any logging or tracing. So: try to minimize the size of the trace, and the number of times you access remote memory, remote meaning outside of L1 or L2. It's not really remote, but it's still something we try to minimize to reduce the performance hit. The goal is to make that as small as possible. Why am I saying this? Because you will see some serialization decisions, and the idea is to make everything as compact as possible. That's also why ftrace fell a bit flat in my mind, at least for this use case. So why binary tracing? Binary tracing is the fastest way to do it, and many of the things you do, you can do at compile time. We keep in memory only the parameters that you save: some kernel pointers, or some ID, some token. You don't need to save the whole string, because the string is somewhere else; we save it just once, at compile time. So for each logging instance we have this data structure. The timestamp is a delta, which means that we don't keep the whole u64 for your timestamp; we keep only a delta from a baseline.
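The delta-timestamp idea described above can be sketched as follows. This is a minimal, illustrative userspace version, not the actual patch-set code: each buffer records one full u64 baseline, and every entry stores only a small offset from it. The u16 width is an assumption for illustration.

```c
#include <stdint.h>

/* Encode a timestamp as a small delta from the buffer's baseline
 * (e.g. the jiffies value recorded when the buffer was activated).
 * Assumes the buffer is retired before the delta overflows 16 bits. */
static uint16_t ts_encode(uint64_t base, uint64_t now)
{
    return (uint16_t)(now - base);
}

/* The deserializer recovers the full timestamp from baseline + delta. */
static uint64_t ts_decode(uint64_t base, uint16_t delta)
{
    return base + delta;
}
```

Storing 2 bytes instead of 8 per entry is what lets several log entries share a single cache line, which is the point made below.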
03:41.760 --> 05:43.080
We keep the baseline, which is actually on the next slide. We don't keep the whole timestamp, only the delta from a baseline recorded when you allocate a page-frag buffer, which is something I'm going to talk about, so each entry is more compact. The source ID is where your line was. The first time you hit a logging line, you save the location, the file name and the line number, and it gets an ID. The next time you log there, you only save the ID. So when you are deserializing this log, you have the exact site; you don't need the string, you just have a small integer, and the deserializer resolves it back to the file and line. And you have a user context: for example, in Ceph we have the FSID globally, but you can also save something client-specific, something specific to your logging, so you can get at it fast. And the length. So it's basically a TLV structure, and that's the idea: be as compact as possible in how we do the serialization. With TLV, as much as you can do at compile time, you save. That's kind of the magic. And then it's always on, and you read it on demand. So what's the magic? The magic is that I'm not using actual circular buffers; it's a cyclical memory allocation scheme.
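The source-location interning described above can be sketched like this. This is a hypothetical userspace illustration, not the real implementation: the table, its size, and the linear lookup are all placeholders for whatever the patch set actually does on first hit.

```c
#include <stdint.h>
#include <string.h>

#define MAX_SITES 256   /* illustrative capacity */

struct source_site {
    const char *file;
    int line;
};

static struct source_site source_map[MAX_SITES];
static uint16_t nr_sites;

/* Return the ID for (file, line), registering the site on first use.
 * A log statement would record source_id(__FILE__, __LINE__) once and
 * store only the small ID from then on; the deserializer maps the ID
 * back to the strings via source_map[]. */
static uint16_t source_id(const char *file, int line)
{
    uint16_t i;

    for (i = 0; i < nr_sites; i++)
        if (source_map[i].line == line && !strcmp(source_map[i].file, file))
            return i;
    if (nr_sites == MAX_SITES)
        return 0;   /* table full; a real version would handle this */
    source_map[nr_sites].file = file;
    source_map[nr_sites].line = line;
    return nr_sites++;
}
```

The payoff is that every log entry carries two bytes instead of a file-name string, and the strings are emitted exactly once.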
05:43.080 --> 07:31.480
I should have invested in a slide for this, but basically we have a magazine allocator, the way kmem caches work. With a magazine allocator you get magazines of buffers, an allocator per core; you can have two of those. There's a paper on it, unfortunately not mine, and it's been around for a long time. We're using this approach so that the allocation is very fast for each frag. But we are allocating very, very small segments: as I mentioned, only your parameters and the TLV header. In our use case, from what I've seen, this way we could fit two logs per cache line. That's an improvement on whatever it was before, and it basically means that when you write one log and the cache line is already in your L1, the next log is essentially free. So how do you allocate fast? First, you have the magazine allocator, so you have per-core allocations, and you go to the global context only once in a blue moon. Again, magazines deserve a talk of their own, but if you're not familiar with the pattern, my advice is to use it, because it's very powerful. The next thing we do is take each one of those buffers and, what's the name, I'm forgetting the name, we slice it up. It's borrowed from networking.
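The magazine pattern mentioned above can be sketched in a few lines. This is a minimal single-threaded illustration of the general idea (a per-core stack of free buffers, refilled from a shared depot only rarely), with malloc standing in for the depot; it is not the kernel code from the talk.

```c
#include <stddef.h>
#include <stdlib.h>

#define MAG_SIZE 8   /* illustrative magazine capacity */

struct magazine {
    void *rounds[MAG_SIZE];
    int nr;                 /* number of free buffers held */
};

/* Slow path: in the kernel this would swap in a full magazine from a
 * global list; here we just fabricate fresh buffers with malloc(). */
static void depot_refill(struct magazine *mag, size_t bufsz)
{
    while (mag->nr < MAG_SIZE)
        mag->rounds[mag->nr++] = malloc(bufsz);
}

/* Common case: a pop from the per-core magazine, no shared state. */
static void *mag_alloc(struct magazine *mag, size_t bufsz)
{
    if (mag->nr == 0)
        depot_refill(mag, bufsz);
    return mag->rounds[--mag->nr];
}

/* Common case: a push back into the per-core magazine. */
static void mag_free(struct magazine *mag, void *buf)
{
    if (mag->nr == MAG_SIZE) {
        free(buf);          /* stand-in for returning to the depot */
        return;
    }
    mag->rounds[mag->nr++] = buf;
}
```

The design point is that the fast path is one push or pop with no locks, which is why per-frag allocation stays cheap.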
07:31.480 --> 09:18.920
In networking applications, you usually have a 32K buffer allocated, and then each time you have an incoming network packet you just slice a piece off, so your allocation is just advancing an offset by the size you need. That's the same mechanism we're using here. From my experience, having this pairing of magazines with page frags, where you fragment into each page, is very, very fast. If you want to see the performance: I've used it with 100-gigabit NICs, with per-packet memory operations. The allocation is not what costs you performance; what costs you performance is the actual access. That's the first slide I showed: memory ops. So we have this magazine allocation on one side, we have your page-frag allocation while you're using it, and then you have another magazine allocator that puts it into an existing-logs cache, basically. All magazines go into a global list eventually, and are then recycled. So you have full control, and this cyclical allocation scheme provides two things: you control the amount of memory you're willing to provide for your logging, and you make sure that each log lives as long as possible. Only when it goes through the whole cycle will your logs be recycled, globally.
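The slicing trick borrowed from networking amounts to a bump allocator over one large buffer. A sketch, under the assumption that refcounting and refill (which the kernel's page-frag machinery also handles) are out of scope:

```c
#include <stddef.h>
#include <stdint.h>

struct frag_buf {
    uint8_t *base;   /* the one big buffer taken from a magazine */
    size_t   size;
    size_t   off;    /* next free byte */
};

/* Allocation is a single add-and-check; returns NULL when the buffer
 * is exhausted and the caller must grab a fresh one from the magazines. */
static void *frag_alloc(struct frag_buf *fb, size_t len)
{
    void *p;

    if (fb->off + len > fb->size)
        return NULL;
    p = fb->base + fb->off;
    fb->off += len;
    return p;
}
```

Because each log entry is tiny (header plus parameters), consecutive entries land adjacently, which is how two logs end up sharing a cache line.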
09:18.920 --> 10:38.120
So: performance and lifetime. Additionally, we want to have process context, but not everything happens in process context; something may happen in IRQ context, where that's inappropriate. For this, and that's the one big change I need to get in, we are adding a pointer in the task structure. When you have a process that is logging, and we're talking about a file system here, so when it does a read or write operation, you want to be able, when you get the logs back out, to see which process did what. You want to be able to take your Python script and organize the logs in whichever way you like, because you won't get them properly serialized: you'll get each page frag, which will be properly ordered by time internally, but the frags themselves will be out of order. Then anyone can organize them by process, or by whatever context you have. And the second thing, you have the dynamic context: you have packets coming in, and there the process context is irrelevant or wrong.
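The post-processing step implied above (frags internally time-ordered, but out of order with respect to each other) can be sketched as a sort over frag headers. The header layout and the grouping key are assumptions for illustration; the talk only says a script can reorder by process or time.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Illustrative header for one dumped page frag. */
struct frag_hdr {
    uint32_t pid;            /* owning process (illustrative key)     */
    uint64_t base_jiffies;   /* baseline all entry deltas are against */
};

/* Order frags by process, then by their baseline timestamp. */
static int frag_cmp(const void *a, const void *b)
{
    const struct frag_hdr *x = a, *y = b;

    if (x->pid != y->pid)
        return x->pid < y->pid ? -1 : 1;
    if (x->base_jiffies != y->base_jiffies)
        return x->base_jiffies < y->base_jiffies ? -1 : 1;
    return 0;
}

static void order_frags(struct frag_hdr *frags, size_t n)
{
    qsort(frags, n, sizeof(*frags), frag_cmp);
}
```

Within each frag, entries are already in order, so sorting only the headers is enough to reconstruct a coherent timeline per process.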
10:38.120 --> 12:21.000
So that's why we separate the two: we have per-process logging and per-CPU logging, and they're separate, but they all belong to your module context, and you'll get them both. With per-process you will see something that happened when you actually did something, when you made the syscall. With per-CPU you will see that something happened with your module, but you don't really know who paid the price, because it was whatever was running on the CPU when it happened. Something came from the wire, something came in an IRQ, it isn't tied to a syscall, and you still make sure you have the proper logging. Memory layout: let's get to the slide. This is the explanation of what I was describing. We have the empty magazines, they go to a global list, that's the blue, and there is a full-magazines list. We allocate a page-frag context: from the magazines we allocate an entry, but it belongs to a page-frag context, which has a list head, because, as I mentioned, when we allocate one it goes into this global list, and the page head remains alive and usable throughout, no matter where in the life cycle each page frag is. That's so that when we dump your information, when you actually want to deserialize the logging, you can get all the logs.
12:21.000 --> 14:09.480
So we have the logs list, and we have the identifier, like the PID or the CPU, whichever, and the base jiffies: when this context becomes active, you record the base jiffies, and then each log only needs the delta. That's how we make it small. So that's the description: you have the buffer, the metadata, and then you have the logs. And when we retire a page frag, instead of trying to do cyclical allocation in place, we just move it aside into the existing-logs cache and get a new one, again because we want to keep the logs alive as long as possible. The global context is where it all starts. We have the allocation, and this is the live log batch (I think there's a typo on the slide), and you have the source map and the client map. The client map, as I said, holds the identifiers that are specific to your module, whatever you wanted to save, and the source map is the locations, so you have the files and lines. So that was a deep dive; the original talk was longer, and I hope it wasn't too dense. Thank you. Questions?
[Audience] How do you read the trace once you've written it? Is there integration with any tracing GUI, or do you use some custom scripting to then reason about it?
[Speaker] In the current solution, we have a debugfs file for your module, and you just cat and read it.
14:09.480 --> 15:38.840
I'm not sure that will be the final form. I sent the patch set back in October; it's longish. And Steven, who wrote ftrace, asked: why wouldn't you use ftrace? So the current implementation is debugfs, and we'll see how it goes; it might go into ftrace.
[Audience] Are you tracing only kernel space, only user space, or both?
[Speaker] It's only kernel space. It's not a system-wide solution; it's just inside your kernel module, at compile time.
[Audience] You started your talk with comments about memory latency. Does this really play such a big role? Because as a tracing provider you're just writing; you never have to wait on memory, you just store into write buffers and eventually the system flushes them out. Are you seeing stalls?
[Speaker] It's not about the stalls, it's about the memory footprint. Whenever you're going outside of your working set, you're evicting your L1, your L2, etc., whenever you have a working context on the CPU. It's a longer discussion.
[Audience] I'm trying to remember: Intel has had, what do we call them, the non-temporal stores. Are you using them? That would bypass all these caches.
15:38.840 --> 17:06.760
[Speaker] We actually tried that; from my memory, the results were not great. We made some experiments, like a decade ago, in a different context, not this one. It's actually an interesting idea; I would need to redo it. My memory of it was that it wasn't very useful, but that may be out of date, and this specific context may have changed enough to make it worthwhile, so it's worth looking at.
[Audience] Is this specific to kernel modules? Do you need to modify a kernel module, or can we use it in the core kernel?
[Speaker] It's a subsystem; it's a library you can use. You just change your logging calls to the binary trace calls and you get it. All right, okay folks, any more questions? If you want to see the details, my email is right there. Thank you for listening.
[Applause]