WEBVTT
00:00.000 --> 00:18.000 Okay, so let me introduce the next very special talk, and it's especially special for me, as I'm actually the chief sticker officer in the company and I was bullied into designing a sticker.
00:19.000 --> 00:33.000 But yes, Gianluca will guide us into what ET platform is, and if somebody listened to my talk, this is the platform that I butcher all the time when I try to use it, and Gianluca is actually the one who builds it.
00:33.000 --> 00:35.000 So please go ahead.
00:37.000 --> 00:39.000 Hello everyone.
00:39.000 --> 00:45.000 I am the one who builds it, but there are people on the Zoom who actually fix it.
00:45.000 --> 00:49.000 So, hi, my name is Gianluca.
00:49.000 --> 00:57.000 This is a practical introduction to ET platform, because I didn't want to speak about the hardware and the theory too much.
00:57.000 --> 01:05.000 I just wanted to show: this is how you get hands-on with the project, this is how you can see what's wrong, what you can fix, and where you can join.
01:05.000 --> 01:10.000 First of all, I think we need to start by saying what we're going to talk about.
01:10.000 --> 01:18.000 I believe not everyone here knows what ET platform is, and so we're going to define it.
01:18.000 --> 01:26.000 Then we're going to see how to actually get the platform on your machine, run it, and see what you have and what you can do.
01:26.000 --> 01:34.000 And then we're going to go from ET platform to what AIFoundry is, what it does, and what we're up to.
01:34.000 --> 01:40.000 Okay, about me: this slide has become very static and doesn't change much.
01:40.000 --> 01:45.000 The two things that have changed in the past year are number three, because I joined Ainekko,
01:45.000 --> 01:49.000 and number five, because I've had zero free time for personal projects.
01:50.000 --> 02:09.000 Okay, so what is it? The board that you see here is the development board that we actually use. It is based on the former Esperanto ET-SoC-1, which is now part of Ainekko, and we're planning to open source everything.
02:09.000 --> 02:19.000 The way to think about this chip, and I'm going to go into it later, is essentially as an accelerator based on circa a thousand RISC-V cores, and it has incredibly low power.
02:19.000 --> 02:24.000 And as you can see there, to be more precise, you actually have over a thousand RISC-V cores.
02:24.000 --> 02:44.000 So I'm going to describe the hardware quickly, so at least we know what we're talking about, and then go back to the software you can run on it.
02:44.000 --> 02:48.000 Some people might have a sense of deja vu: I already used this slide today.
02:48.000 --> 02:59.000 So this is how the board looks. When you see the board, you will quickly notice that there's this big chip under the heat sink, and you find the LPDDR4 around it.
03:00.000 --> 03:17.000 This is the PMIC, and the one on the top left is actually the FTDI. So what are they useful for? The ET-SoC-1 is the big chip; the FTDI is the one that gives you the UART-to-USB, so you can actually check and control things when you have the physical card.
03:17.000 --> 03:27.000 The PMIC is the guy who actually handles the power of the chip, you see the memory, and the PCIe is the actual slot where you fit the card into.
03:27.000 --> 03:46.000 So, I have various graphs that I use to explain how the hardware works, and none of them does any justice to the elegance of the actual chip, which is actually very uniform.
03:46.000 --> 03:51.000 But when I think about programming this chip, this is probably the map that I have in my mind.
03:51.000 --> 03:54.000 As you can see, we have the PCIe, where you connect to the host.
03:54.000 --> 03:59.000 We have a service processor that kind of controls everything.
03:59.000 --> 04:04.000 We have the compute shires, and that's where most of the thousand cores are.
04:04.000 --> 04:14.000 We even have four out-of-order CPUs; those large cores are called Maxions, while the small cores are called Minions, and then you have various devices, as you'd expect.
04:14.000 --> 04:26.000 The thing to understand from this is that you get this huge array of CPUs, which are small and in-order and can be programmed, and then you get a service processor that controls everything.
04:26.000 --> 04:37.000 Right, so what's a Minion? A Minion is one of the many thousand CPUs that we have, and essentially you can think of it as a very small RISC-V 64-bit CPU.
04:37.000 --> 04:51.000 So you actually have machine mode, supervisor mode, and user mode; it has various interesting extensions, like the atomics; it definitely has SIMD instructions, and fixed-point packed extensions.
04:51.000 --> 04:56.000 And you get an actual tensor accelerator that does matrix multiplication.
04:56.000 --> 05:02.000 So it's actually quite powerful, and you've got a thousand of them.
05:02.000 --> 05:19.000 Right, so now that we've seen how the hardware works, we can actually start understanding what ET platform is. ET platform is software; it is completely open source, it is on GitHub, and it contains what you see here: essentially everything you need to,
05:19.000 --> 05:31.000 if you have a card, run kernels on it, run software on these thousand CPUs; or, if you don't have a card, simulate it, just to learn, to test, and to see how it works.
05:32.000 --> 05:37.000 Ideally, going from higher level to lower level, you get the runtime library.
05:37.000 --> 05:44.000 The runtime library is the library that you actually interface with, and the way you interface with the runtime library is that you write a program.
05:44.000 --> 05:57.000 You write a program, you compile it for these small CPUs, you use the runtime library to load it into device RAM, and then you tell all the CPUs: execute this program right now.
05:57.000 --> 06:10.000 And of course the program can easily say: okay, if I'm CPU one, I'm going to do the job for CPU one, and CPU two does the job for CPU two. That's the differentiation that you have, simplifying a bit.
06:10.000 --> 06:17.000 The other thing you'll find in ET platform is the device layer, which is the thing that abstracts the mechanism to reach a device.
06:17.000 --> 06:31.000 And this allows us to run the same software against both a simulator that you can just run on your computer, and the actual hardware, which is going to use one of our Linux servers.
06:31.000 --> 06:34.000 And as I said, this is on GitHub.
06:34.000 --> 06:42.000 This is the software that you will find: as I said, you're going to find the runtime library, the device layer, the simulator, but also the firmware.
06:42.000 --> 06:54.000 And this is what's funny: you can actually run the firmware in the simulator, so you can even test the firmware for this very complex machine in simulation.
06:54.000 --> 06:59.000 And of course, everything is open source.
06:59.000 --> 07:08.000 This is what I wanted to do today: I wanted to be very precise on how to actually clone, compile, and run.
07:08.000 --> 07:17.000 First of all, I went the hard way, because there's actually a Docker build, but that was too short for the slides, so I decided to actually use the full thing.
07:17.000 --> 07:31.000 There's definitely a Dockerfile you can look at, and AIFoundry has a README for that. But anyway, the first step: to actually compile everything, you need the GCC toolchain for the device, which has particular extensions.
07:31.000 --> 07:42.000 And this is the step: as you can see, it uses the riscv-gnu-toolchain, which is a very common way to deploy a RISC-V toolchain in the world.
07:42.000 --> 07:47.000 I know it doesn't work right now, but we're going to fix it by the end of the week.
07:47.000 --> 07:50.000 It's not our problem, it's upstream.
07:50.000 --> 07:55.000 So yes, you're going to clone the riscv-gnu-toolchain, and then this is essentially it.
07:55.000 --> 08:02.000 I'm assuming you're on Ubuntu 24.04, but there are people that have been able to build and install this on Arch Linux and other
08:02.000 --> 08:11.000 distributions. But Linux is necessary; there are patches to build this on macOS, but those are off-branch so far.
08:11.000 --> 08:22.000 So essentially, yeah, these are the usual things required to build GCC: you configure and you build, you configure and make, and then you will actually have,
08:22.000 --> 08:26.000 installed in /opt as you can see,
08:26.000 --> 08:31.000 the full toolchain for the Minion. That's all you need to do.
08:31.000 --> 08:37.000 Once you have the toolchain, you can start compiling ET platform, and you can start compiling the firmware.
08:37.000 --> 08:44.000 And sorry, the first line is long; these slides are very fresh.
08:44.000 --> 08:49.000 So the next thing you need to do is clone et-platform.
08:49.000 --> 09:00.000 And there's a README that explains even how to run with Docker; you get the dependencies, and then it's just a very normal CMake configure and build.
09:00.000 --> 09:07.000 The only thing that is different is that I need to tell it where to find the toolchain we just installed.
09:07.000 --> 09:10.000 But that's pretty much it.
09:10.000 --> 09:17.000 The moment you do this, you're actually going to have everything that I described before,
09:17.000 --> 09:23.000 including the simulator and everything. But that's not all, because once you've compiled, what can you do? Well.
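NOTE: A condensed sketch of the two build steps just described, toolchain first, then et-platform. The repository URLs, the /opt prefix, and the CMake variable name are assumptions reconstructed from the talk; the project's README is the authoritative recipe.

```shell
# Step 1: build the RISC-V GNU toolchain for the Minion cores.
# The prefix and configure flags here are assumptions, not the exact
# slide contents; check the et-platform README for the real recipe.
git clone https://github.com/riscv-collab/riscv-gnu-toolchain
cd riscv-gnu-toolchain
./configure --prefix=/opt/riscv     # install under /opt, as in the talk
make                                # builds and installs the cross toolchain
cd ..

# Step 2: clone et-platform and do a normal CMake build, pointing it at
# the toolchain we just installed (the variable name is hypothetical).
git clone https://github.com/AIFoundry-org/et-platform
cd et-platform
cmake -B build -DRISCV_TOOLCHAIN_PATH=/opt/riscv
cmake --build build
```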
09:23.000 --> 09:25.000 You can run tests.
09:25.000 --> 09:30.000 If you look in /opt, you will find a lot of tests that actually stress the simulator.
09:30.000 --> 09:34.000 But if you just add --mode=pcie,
09:34.000 --> 09:38.000 you're actually going to run these same tests on the card, if you happen to have one.
09:38.000 --> 09:45.000 And I'll actually say something about how to access a card later.
09:45.000 --> 09:46.000 Exciting things.
09:46.000 --> 09:50.000 There's a full simulator already built and installed.
09:50.000 --> 09:54.000 It's essentially the golden model for us.
09:54.000 --> 09:57.000 And you will find it there.
09:57.000 --> 10:00.000 I will tell you later how to run stuff on it.
10:00.000 --> 10:04.000 If you're interested in the full firmware that runs on the real card,
10:04.000 --> 10:07.000 you will find it in /opt, in the Esperanto firmware.
10:07.000 --> 10:12.000 And you can modify it, compile it, and test it on the simulator.
10:12.000 --> 10:14.000 Right.
10:14.000 --> 10:16.000 So what else can you do?
10:16.000 --> 10:23.000 Well, there's definitely an example kernel, which is usually very interesting to run with the simulator, just to see what happens.
10:23.000 --> 10:27.000 So you can just go into et-platform's examples directory.
10:27.000 --> 10:31.000 You type make, and we should probably fix the Makefile,
10:31.000 --> 10:37.000 but you will actually find the kernel ELF in the build directory.
10:37.000 --> 10:42.000 And once you have that, you can actually run the simulator with it.
10:42.000 --> 10:47.000 And you actually see every single instruction and the registers that it changed, and how it works.
10:47.000 --> 10:50.000 So this is actually how you learn to do this.
10:50.000 --> 10:55.000 And that's actually how you can experiment: modify the test, experiment, and see how it goes.
10:55.000 --> 10:58.000 But you want to see actual kernels.
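NOTE: The test and simulator runs described above might look like the commands below. Only the --mode=pcie flag is quoted from the slides; every binary name and path here is a hypothetical placeholder, not the real et-platform layout.

```shell
# Run the test suite against the simulator (the default device):
./build/tests/run-tests                 # test-runner name is hypothetical

# Same tests, but against a real card in the PCIe slot:
./build/tests/run-tests --mode=pcie     # flag quoted from the talk

# Run the example kernel ELF on the installed simulator, tracing every
# instruction and the registers it changes (paths are placeholders):
/opt/et-platform/bin/et-simulator examples/build/kernel.elf
```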
10:58.000 --> 11:04.000 And even if it's simple, we actually have a port of llama.cpp; it is a basic port.
11:04.000 --> 11:10.000 But it has a lot of kernels that actually work, in order to be able to run a model.
11:10.000 --> 11:13.000 You can actually just, you know, AIFoundry has this.
11:13.000 --> 11:18.000 You can just go get the llama.cpp fork, and,
11:18.000 --> 11:22.000 in the ggml structure inside it, in ggml,
11:22.000 --> 11:24.000 you're going to find the kernels.
11:24.000 --> 11:30.000 You can actually see how this can actually be used in a proper system.
11:30.000 --> 11:36.000 You see, this is all ET-SoC-1. Well, not at all, because we are a very open company.
11:36.000 --> 11:41.000 And actually, ET platform already has the simulator support for the new chip,
11:41.000 --> 11:44.000 which is the next chip.
11:44.000 --> 11:48.000 And so when you actually build ET platform, you build two simulators.
11:48.000 --> 11:49.000 One is for the ET-SoC-1.
11:49.000 --> 11:52.000 And one is for the new chip.
11:52.000 --> 11:54.000 And that is the next chip.
11:54.000 --> 11:58.000 And you can actually even run its tests.
11:58.000 --> 12:00.000 Now, of course, what is the new chip?
12:00.000 --> 12:03.000 You would expect a slide from me, but I didn't do any.
12:03.000 --> 12:08.000 I'm not doing it because, actually, the whole spec, while we're developing the chip, is already
12:08.000 --> 12:09.000 there.
12:09.000 --> 12:13.200 So, actually, go there and even get the reference manual directly from the hardware
12:13.200 --> 12:16.200 engineers working on it.
12:16.200 --> 12:22.000 So, this is the basics, and the question is: how can you join?
12:22.000 --> 12:27.080 Because there's a lot going on, we're very open, and this gets everyone, from people
12:27.080 --> 12:32.760 working on llama.cpp and AI accelerators based on RISC-V, to people interested
12:32.760 --> 12:38.880 in seeing how firmware works and how hardware is handled. And we say, yeah, well, we have this
12:38.880 --> 12:39.880 Discord.
12:39.880 --> 12:44.880 If you actually go to github.com/AIFoundry-org, you will find the Discord invite
12:44.880 --> 12:45.880 right there.
12:45.880 --> 12:46.880 You should join us.
12:46.880 --> 12:51.400 Honestly, it's very hard to keep up with the average intelligence of the people there;
12:51.400 --> 12:54.560 I feel very stupid, but that's my life.
12:54.560 --> 12:59.400 It's actually very interesting: there are discussions going on, from compiler technology,
12:59.400 --> 13:05.800 to whatever new thing is happening in the world of AI, to "how do I fix the Linux driver", to people
13:05.800 --> 13:09.560 suggesting artifacts, kernel drivers, and everything else.
13:09.560 --> 13:16.560 And the Discord is not the only important thing: every Tuesday, I think at 5pm local time,
13:16.560 --> 13:23.080 we actually have an ET platform call, where you'll find a lot of Ainekko and other
13:23.080 --> 13:28.800 engineers just joining and discussing technology: what's our plan, what's happening
13:28.800 --> 13:34.720 in the project, where people can help, what people are doing. And it's a completely
13:34.720 --> 13:40.680 open-source project, it's completely open, and you'll learn a lot just by being there;
13:40.680 --> 13:43.080 at least I did.
13:43.080 --> 13:47.080 And yes, you can actually go to the event calendar in the Discord and you'll see it.
13:47.080 --> 13:53.840 And as I said, when we say that ET platform is open, we actually mean it.
13:53.840 --> 13:57.840 That's it for me. Any questions?
13:57.840 --> 13:58.840 Please.
13:58.840 --> 14:08.800 I have a question: is the new chip RVA23-compatible?
14:08.800 --> 14:10.800 No, absolutely not.
14:10.800 --> 14:16.800 So the question is: is the new chip RVA23-compatible? Absolutely
14:16.800 --> 14:17.800 not.
14:17.800 --> 14:19.800 That's why I linked the spec.
14:19.800 --> 14:27.880 No, the new chip is actually, if you look at the architecture, what is technically called a neighborhood.
14:27.880 --> 14:32.760 So essentially, eight of these Minions, so it's still the same architecture,
14:32.760 --> 14:39.040 with some errata fixed, and then 24 megabytes of memory. And then it's actually
14:39.040 --> 14:43.600 going to have a dual-mode operation: either as a small microcontroller, so it's got GPIO,
14:43.600 --> 14:51.640 SPI, that kind of thing, or as, let's say, a device: it actually has an octal SPI interface
14:51.640 --> 14:52.640 for memory.
14:52.640 --> 14:56.640 So it can be a smart memory with a lot of compute.
14:56.640 --> 14:58.640 Please.
14:58.640 --> 14:59.640 Yes.
14:59.640 --> 15:12.880 We do have, we do have the hardware; the chip exists, you saw the picture, right?
15:12.880 --> 15:18.960 We do. So the way it works is that we're not selling it, so far, but we do have access
15:18.960 --> 15:23.280 for the community: people that are involved in the community can actually SSH to a machine
15:23.280 --> 15:27.640 in San Francisco, and hopefully we're going to bring one to the other side of the Atlantic. It's
15:27.640 --> 15:32.720 part of the community access to experiment with the architecture.
15:32.720 --> 15:47.880 So, does the emulator include performance models? It's definitely an emulator
15:47.880 --> 15:52.840 that was used to simulate correctly, but if you see the -l flag that I added, you
15:52.840 --> 15:57.600 get the traces used by DV, so you can actually see, clock by clock, how things changed.
15:57.600 --> 16:04.400 It's not clocked, so, let me say, it's not cycle-accurate, but it's mostly a way to actually
16:04.400 --> 16:13.280 verify that the behavior of the model is exactly the same as the behavior of the chip.
16:13.280 --> 16:20.000 So the question is: will we make a chip with many thousands of cores?
16:20.000 --> 16:21.000 Yeah.
16:21.600 --> 16:29.440 We're not building that, of course. Sorry, no, no, no: how many cores?
16:29.440 --> 16:32.720 Yeah, we're not building one with that many cores, so.
16:32.720 --> 16:36.960 Well, for now we're starting with eight, and then we'll grow, gradually.
16:36.960 --> 16:39.200 No, it's up to, well, where do we want to go?
16:39.200 --> 16:42.600 Essentially, the CEO is probably always better equipped to answer that question, but
16:42.600 --> 16:49.680 I can say, if you want: we're going for inference, we're going for the middle of the market.
16:50.240 --> 16:53.720 I'm actually here for the blinking lights, so you can speak with them.
16:53.720 --> 16:57.040 I like to say that I'm here for the blinking lights and not for the business, because I prefer
16:57.040 --> 17:03.400 to talk about problems rather than actually selling something. So, anyone else? Please.
17:12.400 --> 17:17.000 That's a very good question. As I said, it is a simple port.
17:17.000 --> 17:22.080 So the question is: what is the performance of llama.cpp on the ET-SoC-1?
17:22.080 --> 17:26.080 It can go very well; the chip is definitely fast.
17:26.080 --> 17:34.000 llama.cpp is not, because this was a demo for us: we used llama.cpp as a way
17:34.000 --> 17:39.880 to show anyone, quite early on, mind you, that the software stack we were building
17:39.880 --> 17:42.600 actually could compile and work on the real card.
17:42.600 --> 17:48.920 So llama.cpp was actually a way for us to not only learn, but also see that, using the highest
17:48.920 --> 17:52.040 possible level of the stack, we could actually make it work.
17:52.040 --> 17:55.680 It's quite interesting, because if you see how the kernels work, we actually wanted
17:55.680 --> 18:02.040 to test the fact that these are CPUs, because these are CPUs, not black-box accelerators,
18:02.040 --> 18:03.040 right?
18:03.040 --> 18:07.320 So we actually wrote it like that: we wrote C, plain C, because we wanted to be sure
18:07.320 --> 18:08.320 of that.
18:08.320 --> 18:13.240 There are efforts in the community, and people are happy to join them: there's
18:13.240 --> 18:16.600 definitely an effort going on right now for actually writing optimized kernels, and of course
18:16.600 --> 18:19.600 we started from matmul.
18:19.600 --> 18:21.600 Anyone else?
18:21.600 --> 18:25.600 I think we're done.