WEBVTT 00:00.000 --> 00:11.480 Hey everybody, my name is Eric Ernst, I'm excited to be here to talk about our open source 00:11.480 --> 00:16.640 projects and some of the community work that we're doing around containers on macOS. 00:16.640 --> 00:22.400 So last June, our team at Apple open sourced the Apple Containerization framework and 00:22.400 --> 00:27.800 container tooling projects, which enable developers to create, run, build, and push 00:27.800 --> 00:33.120 container images directly on their Mac in a way that focuses on security and privacy. 00:33.120 --> 00:38.480 Really, what we did is we introduced two key projects, one of them a command line tool called 00:38.480 --> 00:43.760 container (naming is hard), and it allows users, again, to quickly be able to get 00:43.760 --> 00:47.640 up and running with containers on top of a macOS operating system. 00:47.640 --> 00:51.560 The second thing that we open sourced, project-wise, is Containerization, and this is a framework 00:51.560 --> 00:57.080 that allows us to build solutions like that command line tool, container. 00:57.080 --> 01:01.560 Users will typically interact with just container, that command line tool, but the core 01:01.560 --> 01:07.920 functionality for the project is really behind simple APIs in Containerization. 01:07.920 --> 01:13.240 So I want to talk about what I'm going to focus on in this quick session, so I'm going 01:13.240 --> 01:17.520 to talk about some of the design principles that guided the work that we're doing. 01:17.520 --> 01:24.240 And then I want to briefly talk about how we designed it to be extensible, with some examples 01:24.240 --> 01:29.400 of how developers can go ahead and do that, as well as showing some of the key APIs 01:29.400 --> 01:34.320 in Containerization, again, as motivation that it is pretty easy to build on top of this. 01:34.320 --> 01:39.160 We'll talk about how we wrote it in Swift, and what that experience has been. 
01:39.160 --> 01:43.600 It was a new process for the team, and then we're going to talk a bit more about just 01:43.600 --> 01:47.120 the overall open source community and what we're doing going forward. 01:47.120 --> 01:50.680 So first off, let's talk about the actual design side of things. 01:50.680 --> 01:57.040 So in order to run a Linux container on macOS, you need to virtualize a Linux environment. 01:57.040 --> 02:03.840 A historical solution is to spawn a large virtual machine for all the different containers. 02:03.840 --> 02:06.840 Resources are allocated to that one virtual machine. 02:06.840 --> 02:08.280 The containers are added to it. 02:08.280 --> 02:11.640 These resources are divvied up per container. 02:11.640 --> 02:15.760 When you need to share additional directories and files from your Mac, they're passed 02:15.760 --> 02:20.280 to that virtual machine before they are then passed to the corresponding container 02:20.280 --> 02:22.800 that needs it. 02:22.800 --> 02:27.480 So one of the areas we focused on is security, and for that, we want to have the same 02:27.480 --> 02:32.040 level of isolation that typical platforms have for their large virtual machines, and apply that 02:32.040 --> 02:34.520 to every container that is launched. 02:34.520 --> 02:38.640 We also want to reduce the need for core utilities or any dynamic libraries or anything 02:38.640 --> 02:44.520 else inside of the guest, and this helps reduce both the attack surface and the 02:44.520 --> 02:49.480 burden of maintenance costs for keeping these up to date. 02:49.480 --> 02:51.920 We also focus on privacy. 02:51.920 --> 02:57.040 Because of that, we want to limit access to what files are needed for that guest virtual 02:57.040 --> 02:59.920 machine and do it on a per container basis instead. 02:59.920 --> 03:04.080 So only the container requesting a specific directory at that point should have access 03:04.080 --> 03:05.960 to those contents. 
03:05.960 --> 03:09.880 And we want to hit these goals while still providing an efficient and performant experience 03:09.880 --> 03:11.640 for end users. 03:11.640 --> 03:15.960 So for security, our goal was to provide the same level of isolation used by the large virtual 03:15.960 --> 03:20.000 machine, and apply that to each and every single container that is started. 03:20.000 --> 03:24.440 Containerization does this by running each container in its own lightweight virtual machine, 03:24.440 --> 03:27.080 while still providing sub-second start times. 03:27.080 --> 03:31.480 This also provides the benefit that each container has its own dedicated IP address. 03:31.480 --> 03:37.120 So if you've used different solutions, usually what you end up having to do is different port forwarding 03:37.120 --> 03:40.280 magic in order to be able to access the different services. 03:40.280 --> 03:44.720 So with this, now you can just access the IP of that container workload directly instead. 03:45.720 --> 03:50.600 When sharing directories and files, only the container requesting that directory has access 03:50.600 --> 03:56.160 to those contents, and for resources like CPU and memory, if there are no containers running, no 03:56.160 --> 03:58.600 resources will be allocated. 03:58.600 --> 04:03.360 Now I've said lightweight virtual machine several times, so let me dig into what I mean by 04:03.360 --> 04:04.360 that. 04:04.360 --> 04:09.400 To be lightweight, we first focus on the actual machine itself, specifically what devices 04:09.400 --> 04:12.640 associated with it we need and what we're virtualizing.
04:12.960 --> 04:17.200 Using Apple's Virtualization framework, we are using the minimal set of devices that we 04:17.200 --> 04:22.320 need for the user experience and nothing else, so these are paravirtualized devices, 04:22.320 --> 04:28.360 not chipsets or machines or anything else: just a virtual socket, virtual block device, network 04:28.360 --> 04:30.760 device, and shared file system. 04:30.760 --> 04:35.680 This is more aligned with a microVM type of model. 04:35.680 --> 04:39.960 Also what we're doing is making sure that that machine itself is sized appropriately. 04:39.960 --> 04:44.840 So we have a reasonable default, but then again, we're sizing the virtual machine 04:44.840 --> 04:51.000 based on the requests of the actual workload itself, and the same exact thing for CPU. 04:51.000 --> 04:55.080 So that's the lightweight machine; next is what we do when we turn that machine on. 04:55.080 --> 04:58.320 In our case, we direct boot a Linux kernel. 04:58.320 --> 05:01.320 There's no need for any kind of firmware, bootloader, or anything else. 05:01.320 --> 05:05.920 This saves on boot time and reduces our bill of materials of what we need. 05:05.920 --> 05:09.920 Now for this direct booted kernel, we also want to make sure it's a minimal configuration. 05:09.920 --> 05:14.880 We chose an ARM64 configuration that, again, addresses our needs, but minimizes otherwise 05:14.880 --> 05:16.880 what we would have inside of there. 05:16.880 --> 05:21.320 This reduces the attack surface, reduces the footprint of that actual kernel, and helps provide 05:21.320 --> 05:23.120 a performant boot time. 05:23.120 --> 05:26.280 Getting from reset to user space is very fast. 05:26.280 --> 05:31.240 So speaking of user space, I want to talk about what the user space looks like inside of 05:31.240 --> 05:32.240 our guest. 
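As a rough illustration of the two ideas in this part of the talk (a minimal paravirtualized device set, and sizing each VM from the workload's request), here is a sketch. The `WorkloadRequest` and `VMSize` types, the default values, the NAT network attachment, and `makeMicroVMConfiguration` are assumptions made for illustration, not Containerization's actual API or defaults. The `VZ…` types come from Apple's Virtualization framework, so that part only compiles on macOS.

```swift
import Foundation
#if canImport(Virtualization)
import Virtualization
#endif

// Hypothetical request type: what a container asks for (nil means "not specified").
struct WorkloadRequest {
    var cpus: Int? = nil
    var memoryMiB: UInt64? = nil
}

// Hypothetical result type: what the per-container VM is sized to.
struct VMSize: Equatable {
    var cpus: Int
    var memoryBytes: UInt64
}

// Size the VM from the workload's request, falling back to illustrative defaults.
// The defaults (2 CPUs, 512 MiB) and floors are assumptions, not the project's values.
func vmSize(for request: WorkloadRequest) -> VMSize {
    let cpus = max(1, request.cpus ?? 2)
    let memoryMiB = max(128, request.memoryMiB ?? 512)
    return VMSize(cpus: cpus, memoryBytes: memoryMiB * 1024 * 1024)
}

#if canImport(Virtualization)
// Build a configuration with only the four device classes named in the talk,
// direct-booting a Linux kernel (no firmware or bootloader stage).
@available(macOS 12.0, *)
func makeMicroVMConfiguration(kernel: URL, rootfs: URL, sharedDirectory: URL,
                              size: VMSize) throws -> VZVirtualMachineConfiguration {
    let config = VZVirtualMachineConfiguration()
    config.cpuCount = size.cpus
    config.memorySize = size.memoryBytes
    config.bootLoader = VZLinuxBootLoader(kernelURL: kernel)

    // Virtual socket: host <-> guest control channel.
    config.socketDevices = [VZVirtioSocketDeviceConfiguration()]

    // Virtual block device backed by a raw disk image (throws if the file is missing).
    let disk = try VZDiskImageStorageDeviceAttachment(url: rootfs, readOnly: false)
    config.storageDevices = [VZVirtioBlockDeviceConfiguration(attachment: disk)]

    // Network device; NAT is an assumption chosen to keep the sketch short.
    let network = VZVirtioNetworkDeviceConfiguration()
    network.attachment = VZNATNetworkDeviceAttachment()
    config.networkDevices = [network]

    // Shared file system exposing a single host directory to this one guest.
    let share = VZVirtioFileSystemDeviceConfiguration(tag: "share")
    share.share = VZSingleDirectoryShare(
        directory: VZSharedDirectory(url: sharedDirectory, readOnly: true))
    config.directorySharingDevices = [share]

    try config.validate()
    return config
}
#endif

print(vmSize(for: WorkloadRequest(cpus: 4, memoryMiB: 2048)))
print(vmSize(for: WorkloadRequest()))
```

Because each container would get its own configuration, the shared directory above is visible only to that one guest, which matches the per-container privacy goal described earlier.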
05:32.240 --> 05:37.440 So historically, when using a large virtual machine, you have different dynamic libraries, utilities, 05:37.440 --> 05:39.040 and everything else inside of it. 05:39.080 --> 05:41.720 You're booting a full system. 05:41.720 --> 05:42.720 The file system? 05:42.720 --> 05:43.720 Yeah, I said that. 05:43.720 --> 05:44.720 Okay, sorry. 05:44.720 --> 05:47.160 For security, we want to reduce the attack surface. 05:47.160 --> 05:49.560 So we don't have any core utilities. 05:49.560 --> 05:51.200 There are no dynamic libraries. 05:51.200 --> 05:54.080 There's no libc implementation or anything else. 05:54.080 --> 05:58.680 We created our own init process that is purpose built just for running containers in this 05:58.680 --> 06:00.400 constrained environment. 06:00.400 --> 06:05.120 And we call it vminitd, which is written in Swift; it's statically built because there are 06:05.120 --> 06:06.120 no libraries. 06:06.120 --> 06:11.120 And it's designed to manage the full lifecycle of processes associated with running the container 06:11.120 --> 06:13.440 inside that guest environment. 06:13.440 --> 06:19.200 Now running as the init process comes with a bunch of responsibilities before, as well as 06:19.200 --> 06:21.400 during, execution of containers. 06:21.400 --> 06:27.340 So it's responsible for doing things like setting up the actual interface IPs and mounting 06:27.340 --> 06:33.800 file systems, such as any volumes or the actual rootfs itself. 06:33.800 --> 06:38.160 It is responsible for launch and supervision of all the processes inside of it, namely 06:38.160 --> 06:40.320 the container entry point. 06:40.320 --> 06:45.240 And it has a gRPC API that we have for it in order for us to be able to drive that from 06:45.240 --> 06:47.880 the host side. 
06:47.880 --> 06:54.200 Looking at our three focus areas of security, privacy and performance: with a minimal kernel, 06:54.200 --> 06:59.240 we kind of hit both security as well as performance, I would say, and with direct booting, 06:59.280 --> 07:01.120 a quicker boot as well. 07:01.120 --> 07:07.600 Using that static, purpose-built init process helps, again, to have a much faster boot time 07:07.600 --> 07:11.520 for us as well as a reduced attack surface inside of the guest. 07:11.520 --> 07:15.280 And each workload being inside its own virtual machine kind of helps across the board, 07:15.280 --> 07:16.840 I would say, for all three of these. 07:16.840 --> 07:21.520 So let's talk about extending then. 07:21.520 --> 07:25.520 So I don't have time to really go into many demos and everything else, but it's a command 07:25.520 --> 07:29.840 line tool: container run, 07:29.840 --> 07:32.840 then any image you want, as you would expect. 07:32.840 --> 07:37.560 One of the things we wanted to look at, make sure there, yeah. 07:37.560 --> 07:41.920 So we wanted to make it so that people can extend it for their own use cases, and actually 07:41.920 --> 07:45.480 for us to easily extend it for our own internal use cases as well. 07:45.480 --> 07:49.680 So what we did was we created a plug-in framework for it, and this kind of behaves a little 07:49.680 --> 07:53.280 bit like what you would have for git subcommands. 07:53.280 --> 07:58.560 So with this, it allows us to have other developers who want to contribute, but don't 07:58.560 --> 08:00.600 want to make changes to the core code. 08:00.600 --> 08:03.640 Maybe they're specific to a use case or custom workflow or anything else. 08:03.640 --> 08:08.880 They can go ahead and easily just extend what we have functionality-wise. 08:08.880 --> 08:14.200 We can just add simple CLI subcommands or these could be different services. 
08:14.200 --> 08:18.960 XPC services that maybe are tied to the lifecycle of the whole thing running, or just a container 08:18.960 --> 08:22.160 running, or anything else. 08:22.160 --> 08:25.960 So we actually use plugins today for a bunch of the components that we need in order 08:25.960 --> 08:28.640 to be able to run the baseline tool itself. 08:28.640 --> 08:31.240 And then later on in the talk, I'll talk about a couple of plugins that we're kind of thinking 08:31.240 --> 08:34.120 about for open. 08:34.120 --> 08:37.680 Creating these is easy: it's auto-discovery, you just drop it into a known location with a little 08:37.680 --> 08:39.280 bit of config metadata. 08:39.280 --> 08:43.200 And after that, it's picked up automatically; the next time you run the CLI command, you'll 08:43.200 --> 08:46.160 see it. 08:46.160 --> 08:48.320 So that's extending container. 08:48.320 --> 08:53.560 But Containerization itself: if I were to talk to all the developers on the team, they would 08:53.560 --> 08:57.960 say that the command line tool is fine, it has high utility for people, but they're really most 08:57.960 --> 09:01.640 proud of Containerization itself and all the different APIs that are provided inside of 09:01.640 --> 09:02.640 it. 09:02.640 --> 09:06.480 So with that in mind, I want to talk about how you can easily build a client like ours on 09:06.480 --> 09:08.840 top of Containerization. 09:08.840 --> 09:13.320 So as I mentioned before, Containerization is where the core logic is for working with 09:13.320 --> 09:15.480 Linux containers on macOS. 09:15.480 --> 09:19.960 To this end, we created many modules which could be useful in their own right. 09:19.960 --> 09:24.160 You don't have to be making containers to want to use some of these or take them for anything 09:24.160 --> 09:25.160 else. 09:25.160 --> 09:29.640 We have ContainerizationOS, for interacting with low-level OS components like terminals and 09:29.640 --> 09:31.240 different process management. 
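Going back to the plugin auto-discovery described a moment ago (drop a plugin into a known location with a little config metadata, then dispatch to it like a git subcommand), a minimal sketch of that pattern could look like the following. The directory layout, the `config.json` file name, the `PluginConfig` fields, and the `resolve` function are all hypothetical; the talk does not spell out the real on-disk format used by container.

```swift
import Foundation

// Hypothetical metadata a plugin ships next to its executable.
struct PluginConfig: Codable, Equatable {
    var name: String      // subcommand name, e.g. "hello"
    var summary: String   // one-line help text
}

// Scan a plugins directory: every subdirectory holding a decodable config.json is a plugin.
func discoverPlugins(in directory: URL) -> [PluginConfig] {
    let fm = FileManager.default
    guard let entries = try? fm.contentsOfDirectory(atPath: directory.path) else { return [] }
    var plugins: [PluginConfig] = []
    for entry in entries.sorted() {
        let configURL = directory
            .appendingPathComponent(entry)
            .appendingPathComponent("config.json")
        guard let data = try? Data(contentsOf: configURL),
              let config = try? JSONDecoder().decode(PluginConfig.self, from: data)
        else { continue }  // skip anything without valid metadata
        plugins.append(config)
    }
    return plugins
}

// Git-style dispatch: built-in commands win, otherwise fall back to a discovered plugin.
func resolve(_ subcommand: String, builtins: Set<String>, plugins: [PluginConfig]) -> String {
    if builtins.contains(subcommand) { return "builtin:\(subcommand)" }
    if plugins.contains(where: { $0.name == subcommand }) { return "plugin:\(subcommand)" }
    return "unknown:\(subcommand)"
}

// Demo: drop one plugin into a temporary "known location" and resolve two commands.
let root = FileManager.default.temporaryDirectory
    .appendingPathComponent("plugins-\(UUID().uuidString)")
let helloDir = root.appendingPathComponent("hello")
try FileManager.default.createDirectory(at: helloDir, withIntermediateDirectories: true)
let metadata = PluginConfig(name: "hello", summary: "Say hello")
try JSONEncoder().encode(metadata).write(to: helloDir.appendingPathComponent("config.json"))

let found = discoverPlugins(in: root)
print(resolve("run", builtins: ["run", "build"], plugins: found))
print(resolve("hello", builtins: ["run", "build"], plugins: found))
try? FileManager.default.removeItem(at: root)
```

The point of the design is that the core tool never needs to know a plugin exists at build time; discovery happens on each invocation.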
09:31.240 --> 09:36.520 We have a minimal netlink library for being able to do interface management inside of 09:36.520 --> 09:43.480 a Linux guest, with OCI so we can have different runtime primitives as well as different container 09:43.480 --> 09:46.280 image types and everything. 09:46.280 --> 09:50.880 Then we also created an EXT4 one, which is I think pretty interesting; it makes it so that 09:50.880 --> 09:56.960 we can create and manipulate EXT4 file systems from macOS, so you don't have to boot 09:56.960 --> 10:00.560 a Linux kernel and do all this; you can actually do it directly, then have that raw block 10:00.560 --> 10:04.040 device and boot with it. 10:04.040 --> 10:08.680 Throughout all of this, we made sure to make heavy use of protocols, which in Swift really 10:08.680 --> 10:13.560 means that you can go ahead and replace this with a different implementation and make 10:13.560 --> 10:16.280 it pluggable and everything else. 10:16.280 --> 10:21.680 In addition to protocols, we really did focus on the APIs themselves, and we're not 10:21.680 --> 10:27.120 1.0, so they still move around a little bit, but we were really looking at it to make 10:27.120 --> 10:29.840 sure that we have different layers for this. 10:29.840 --> 10:35.120 So at the highest level, it makes it really easy, and I'll show a quick example, to just 10:35.120 --> 10:38.920 be able to run a Linux environment on top of macOS. 10:38.920 --> 10:44.460 Kind of in the mid-level, you have different virtual machine management type of APIs or 10:44.460 --> 10:50.000 process management APIs in there, and then you can go lower level, where we get into more 10:50.000 --> 10:54.240 file system and Linux systems management type of layers, and they kind of build on top 10:54.240 --> 10:58.360 of each other, so you can go as deep as you want depending on your application. 
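To make the point about protocols concrete, here is a small, self-contained sketch of how a Swift protocol lets higher-level code work against swappable implementations. `ContentStore`, `InMemoryStore`, `CachingStore`, and `totalSize` are hypothetical names invented for this example; they are not types from Containerization.

```swift
// Hypothetical protocol: anything that can hand back the bytes for a named blob.
protocol ContentStore {
    func fetch(_ digest: String) -> [UInt8]?
}

// One implementation: blobs held in a dictionary.
struct InMemoryStore: ContentStore {
    var blobs: [String: [UInt8]]
    func fetch(_ digest: String) -> [UInt8]? { blobs[digest] }
}

// A second implementation that wraps any other store and caches successful lookups.
final class CachingStore: ContentStore {
    private let backing: ContentStore
    private var cache: [String: [UInt8]] = [:]
    private(set) var backingLookups = 0

    init(backing: ContentStore) { self.backing = backing }

    func fetch(_ digest: String) -> [UInt8]? {
        if let hit = cache[digest] { return hit }
        backingLookups += 1
        let value = backing.fetch(digest)
        cache[digest] = value  // assigning nil simply leaves the key absent
        return value
    }
}

// Higher-level code depends only on the protocol, so either store can be plugged in.
func totalSize<S: ContentStore>(of digests: [String], in store: S) -> Int {
    digests.compactMap { store.fetch($0) }.reduce(0) { $0 + $1.count }
}

let memory = InMemoryStore(blobs: ["a": [1, 2, 3], "b": [4]])
let cached = CachingStore(backing: memory)
print(totalSize(of: ["a", "b", "missing"], in: memory))
print(totalSize(of: ["a", "a", "b"], in: cached), cached.backingLookups)
```

The caller of `totalSize` never changes when the store does, which is the kind of replaceability the talk attributes to protocol-heavy design.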
10:58.360 --> 11:02.120 At this point, I'm going to risk trying to change to the terminal and show that we can make 11:02.120 --> 11:09.960 a macOS application use Linux in very little code. 11:09.960 --> 11:11.960 We did it. 11:11.960 --> 11:12.960 Okay. 11:12.960 --> 11:17.280 I hope that the people in the back can see it okay. 11:17.280 --> 11:22.520 This is 38 lines in the file, of which maybe we have about six function calls. 11:22.520 --> 11:29.040 If I look at it, one of the top level types that we have is ContainerManager, where this 11:29.040 --> 11:32.560 is taking a kernel and an initfs reference. 11:32.560 --> 11:38.760 That's that minimal kernel that every container is going to use, with each getting an individual copy 11:38.760 --> 11:40.120 of it then. 11:40.120 --> 11:45.880 The initfs is what we use for that vminitd process that I talked about, which is statically built. 11:45.880 --> 11:52.080 With that, we can now go ahead and create containers using that factory. 11:52.080 --> 11:56.760 With the container create, what we need is the reference to the actual OCI image. 11:56.760 --> 12:01.800 Again, the standard OCI images that we're building and consuming, and then just give 12:01.800 --> 12:07.600 a unique name to it; you can configure it for whatever resources you want, set the terminal, and 12:07.600 --> 12:15.640 I'll just do a little echo because we're here, I can type that, and then this is just 12:15.640 --> 12:18.280 setting up all the metadata. After we execute that, 12:18.280 --> 12:22.280 it's pulling down and unpacking and making a block device that we can use inside the guest 12:22.280 --> 12:26.240 for that rootfs, which is the container image. 12:26.280 --> 12:30.920 You create, then we do start, which at this point boots the VM and starts the process, and 12:30.920 --> 12:31.920 then we wait.
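The demo source itself is not in the transcript, so the following is only a stand-in that mirrors the order of operations described: a manager built from a kernel and an initfs reference, a container made from an image reference and a unique name, then create, start, wait, and a stop on cleanup. Every type and method here (`SketchContainerManager`, `SketchContainer`, `runOnce`) is a mock written for this example, and the kernel path and references are placeholders. None of it is the real Containerization API, and nothing is actually booted.

```swift
// Stand-in types only: they mimic the shape of the flow from the talk, not the real API.
enum SketchError: Error { case notCreated, notRunning }

final class SketchContainer {
    enum State { case configured, created, running, exited, stopped }

    let id: String
    let imageReference: String
    var arguments: [String] = []
    var cpus = 2
    private(set) var state = State.configured
    private(set) var events: [String] = []

    init(id: String, imageReference: String) {
        self.id = id
        self.imageReference = imageReference
    }

    // In the talk: pull and unpack the image, build the rootfs block device.
    func create() { events.append("create"); state = .created }

    // In the talk: boot the VM and start the process.
    func start() throws {
        guard state == .created else { throw SketchError.notCreated }
        events.append("start"); state = .running
    }

    // In the talk: block until the process exits. The mock pretends it exited with 0.
    func wait() throws -> Int32 {
        guard state == .running else { throw SketchError.notRunning }
        events.append("wait"); state = .exited
        return 0
    }

    func stop() { events.append("stop"); state = .stopped }
}

struct SketchContainerManager {
    let kernelPath: String        // placeholder path to the minimal kernel
    let initfsReference: String   // placeholder reference to the init file system

    // Factory: create a container description and let the caller configure it.
    func makeContainer(id: String, imageReference: String,
                       configure: (SketchContainer) -> Void) -> SketchContainer {
        let container = SketchContainer(id: id, imageReference: imageReference)
        configure(container)
        return container
    }
}

// The lifecycle in order; defer guarantees stop() runs even if start or wait throws.
func runOnce(_ container: SketchContainer) throws -> Int32 {
    container.create()
    defer { container.stop() }
    try container.start()
    return try container.wait()
}

let manager = SketchContainerManager(kernelPath: "/path/to/vmlinux",
                                     initfsReference: "example/initfs:latest")
let container = manager.makeContainer(id: "demo",
                                      imageReference: "example/image:latest") { c in
    c.cpus = 2
    c.arguments = ["echo", "hello world"]
}
let status = try runOnce(container)
print(status, container.events)
```

Putting the stop in a `defer` is what keeps cleanup tied to scope exit rather than to the success path, which is the behavior described right after this point in the talk.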
12:31.920 --> 12:34.720 If it was a shell or interactive or anything else, it would keep going; in our case that 12:34.720 --> 12:38.800 wait should come back pretty quick once the echo completes, and then we'll stop, and 12:38.800 --> 12:40.800 on the defer we'll clean up. 12:40.800 --> 12:45.840 So that was about six function calls, and at that point, you can have a custom macOS application; 12:45.840 --> 12:50.960 yes, it's using a container image, but that's just a payload. 12:50.960 --> 12:55.440 You can run a Linux environment in about 10 lines and have that embedded in Swift on top 12:55.600 --> 12:56.600 of your Mac. 12:56.600 --> 13:02.440 So it built, and then if we just run it, it'll just say hello world eventually, yes, 13:02.440 --> 13:05.760 okay, that took a while, I didn't like that, I'll taste it on. 13:05.760 --> 13:06.760 Okay. 13:06.760 --> 13:25.360 So let me tell you about Swift a little bit. The background of the team is a lot of folks 13:25.360 --> 13:30.560 who contribute to containerd, runc authors, different tools like this, so very container 13:30.560 --> 13:36.000 runtime focused, very C focused and very, very Go focused. 13:36.000 --> 13:39.600 This is our first project using Swift, so I'm just going to give a few different data points 13:39.600 --> 13:42.960 of what the experience was like. 13:42.960 --> 13:48.400 First, C interoperability: you know, with terminal and all the different ContainerizationOS 13:48.400 --> 13:52.080 pieces, all these different modules use a lot of syscalls and things like this, so we often 13:52.080 --> 13:58.080 do need to call into C. Compared to using Go, it was very easy; the interoperability, 13:58.080 --> 14:00.440 really, it couldn't be any easier. 14:00.440 --> 14:06.400 The second area was around enums and tagged unions; this really changed how 14:06.400 --> 14:12.040 we're designing our APIs in Containerization, and we found them really quite nice and useful. 
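For readers coming from C or Go, this is what a tagged union looks like in Swift: an enum whose cases carry associated values, with the compiler checking that every case is handled. The `MountSource` type below is a hypothetical example made up for this note, not a type from Containerization.

```swift
// Hypothetical example: a mount source modeled as a tagged union.
// Each case carries exactly the data that makes sense for it.
enum MountSource: Equatable {
    case blockDevice(path: String)
    case sharedDirectory(hostPath: String, readOnly: Bool)
    case tmpfs(sizeMiB: Int)
}

// The switch must be exhaustive: adding a new case is a compile error here until handled.
func describe(_ source: MountSource) -> String {
    switch source {
    case .blockDevice(let path):
        return "block device at \(path)"
    case .sharedDirectory(let hostPath, let readOnly):
        let mode = readOnly ? "ro" : "rw"
        return "shared directory \(hostPath) (\(mode))"
    case .tmpfs(let sizeMiB):
        return "tmpfs of \(sizeMiB) MiB"
    }
}

let mounts: [MountSource] = [
    .blockDevice(path: "/dev/vda"),
    .sharedDirectory(hostPath: "/Users/me/project", readOnly: true),
    .tmpfs(sizeMiB: 64),
]
for mount in mounts {
    print(describe(mount))
}
```

Compared with a struct that has optional fields for every variant, invalid combinations (such as a tmpfs with a host path) simply cannot be constructed.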
14:12.040 --> 14:17.280 Memory safety, again, is pretty important given our focus on security; you know, using 14:17.280 --> 14:23.560 optionals, designed to prevent seg faults, they're pretty nice to use. 14:23.560 --> 14:28.560 In addition, you know, it's kind of table stakes: we need a static SDK available for Linux, 14:28.640 --> 14:33.600 and it, you know, meets our needs so that we can have that minimal guest image, and we're 14:33.600 --> 14:37.520 often calling into different macOS frameworks and everything else, so being able to call 14:37.520 --> 14:42.720 from Swift into these frameworks was a pretty natural and easy thing to do. 14:42.720 --> 14:46.880 Now, let's talk about the open source community aspects, to tell you some of the goals we have 14:46.880 --> 14:50.320 and some of the work that we're going to be doing going forward. 14:50.320 --> 14:54.720 We open sourced this last year and we've been really excited; it's been a pretty positive response 14:54.720 --> 14:59.680 so far. Now that it's in the open, our first priority is encouraging contributions and trying 14:59.680 --> 15:07.760 to develop in the open with a community. So we welcome all feedback, whether it's issues, PRs, questions, 15:08.320 --> 15:15.520 ideas, anything; really, this brings us a ton of joy. So we'd love that. We're also looking 15:15.520 --> 15:19.520 forward to seeing how different projects can build on top of Containerization; since it's 15:19.520 --> 15:24.880 a standalone framework, again, developers can build directly on top of it and use the APIs 15:24.880 --> 15:30.160 to build their own custom solutions. And finally, we want to expand the container ecosystem 15:30.160 --> 15:36.240 through that plug-in architecture. So transitioning to kind of more of a roadmap type of focus, 15:36.240 --> 15:40.640 we have several features and integrations which we're currently working through and thinking about. 
15:40.640 --> 15:47.520 One of them is trying to create a container plug-in that will allow for more long 15:47.520 --> 15:53.440 running Linux environments on top of macOS. So like, less ephemeral compared to containers. 15:54.240 --> 16:00.080 I would say that a lot of this is motivated by, hey, we like how WSL does this, and kind of having 16:00.080 --> 16:06.400 that type of UX where we start to blur the Unix environments a little bit back and forth. 16:07.280 --> 16:11.040 So that's an interesting area. Another one that we're looking at is to be able to have an 16:11.040 --> 16:16.880 ephemeral Kubernetes environment locally. So this is effectively, how do we work with things like 16:17.840 --> 16:23.120 kind or minikube or any of these? We're doing some work in order to be able to see if we can 16:23.120 --> 16:30.160 be like a provider with kind upstream. They were like, we probably won't be able to do that. 16:30.880 --> 16:35.600 Container is pretty different from a lot of the other macOS based solutions that kind is 16:35.600 --> 16:39.360 supporting, because we do have pretty nice resource management with a one-to-one mapping of 16:39.360 --> 16:45.200 a container to a VM. So since their focus is on the other ones, essentially the feedback from 16:45.200 --> 16:50.320 Ben was like, hey, you guys should just write your own, it's easy. So we're going to look at that 16:50.320 --> 16:55.440 for an ephemeral environment pretty shortly here. That'll probably be a container plug-in. 16:56.400 --> 17:02.080 In addition to this, we're always looking to improve our performance. So we currently use BuildKit, 17:02.080 --> 17:07.200 and BuildKit's great, BuildKit's the de facto; everyone is using BuildKit. Using it inside of a 17:07.600 --> 17:14.720 guest, across from Swift, across the machine boundary, over gRPC, the performance is 17:14.720 --> 17:21.280 suboptimal on our side and we want it to be better. 
The problem is fixing it, it's complex, 17:21.280 --> 17:27.040 as far as being able to have the equivalent functionality. So this is kind of a much larger thing 17:27.040 --> 17:30.400 that we've been talking about in the community and trying to figure out how we'd push that forward. 17:32.000 --> 17:36.640 On top of that, we want more ecosystem integration. So this is more like, if we do Kubernetes, 17:37.200 --> 17:42.240 how can we build on top of Containerization to have something that would fit underneath that 17:42.240 --> 17:47.520 Container Runtime Interface type of API. And in that case you could run a more typical 17:47.520 --> 17:54.160 Kubernetes stack on this. One of the most common things is, hey, this is cool. Can I use Compose? 17:54.160 --> 17:59.760 And we say not yet. And dev containers are kind of similar to this too. The API that they kind of 17:59.760 --> 18:07.040 have is more like the Docker CLI. So it's challenging for us to decide how much we should make 18:07.040 --> 18:12.320 it a drop-in replacement versus being opinionated for a very easy entry point. So this is still 18:12.320 --> 18:18.160 something we're looking at. There are different community PRs for adding Compose, which is super fun, 18:18.160 --> 18:23.920 but not all the way there yet. So we're still kind of working through that. Next step, final 18:23.920 --> 18:29.120 wrapping up is, if you're on a Mac, you can try it out. Again, we would really love feedback. 18:29.120 --> 18:34.160 For curious developers who want to build for their own use case, check out the APIs in Containerization, 18:34.160 --> 18:39.280 build your own solution or extend it with plugins. If the intersection of virtualization and containers 18:39.280 --> 18:48.720 is interesting, we're growing as well on the team. But thank you. I appreciate it. And we have stickers 18:48.720 --> 18:52.800 and some things here as well.