WEBVTT

00:00.000 --> 02:41.000
Jim will be talking about a programming language perspective on consistency and replication. It's super high level, super interesting. Let's go. Let's welcome him.

So, I'll talk about this from the scientific side. I can give a perspective as a postdoctoral researcher at the university, so I'm here for the programming language perspective. We're talking about replicated data types. The talk has two parts. First, we'll look at the problems that we face and how we've attempted to solve them in our research. And then I'll go a bit into my personal framework, which lets me experiment with these things very easily.

So a lot of the problems that we face come from the fact that we need distributed systems. In these systems the network can fail and partition, nodes can drop out, but we still want our system to remain correct and available. What that means is that we have to replicate state across nodes, and then we can keep building on top of that. So that's why we replicate. But the reality once data is replicated is not like this: consistency that is easy to get in a sequential setting is hard to get in a replicated setting. It's a difficult problem, but it is not impossible. I'm going to pick out several of these problems and how we tackle them in our systems.

So the first one: selecting the right replicated data type for your application. If you're building an application, you have to ask yourself: do we want strong consistency, where we synchronize all data and get the strong guarantees? Or can we tolerate a bit of delay in our synchronization and accept eventual consistency, where replicas temporarily diverge but eventually converge? Or do we need causal consistency, which is something in between?

All right, I'll just recap the slide, just for the online audience. So basically, selecting the right CRDT is difficult. We have to select the right consistency level.
02:41.000 --> 04:53.000
And based on that, we also have to think about which invariants in our application we have to guarantee. What does our application need? And often this is very specific to what you're building.

So one of the things that we've been developing, and actually this was done by Kevin, who gave a presentation earlier, is that we let the computer decide. Basically, you encode in a formal specification, in a high-level language, what you need: what the semantics of your data type have to be, what invariants you have to uphold, and we run computations on this. So basically, what happens is you give a Scala-like spec, we let Z3 run on this together with some analysis tools, and it gives you back a bunch of results. Based on this, we can automatically generate Scala code, JavaScript code, or other code, which will allow you to synchronize your data. We also have a proof that whatever data type the computer decides to generate out of this is correct.

So a really cool part of this is that it allows you to offload a bit of the thinking about how to design correct CRDTs, replicated data types, or strongly consistent data types, because not only can it verify and generate a CRDT, it can also tell you: for this replicated data type you cannot have a CRDT, because we will never be able to uphold this invariant, and you will have to synchronize at this point. So this gives the developer some flexibility and lets them be more confident about what they have. So that's important.

The second problem is that in a replicated data type you often have all these events happening concurrently, right? This is why you need CRDTs, or some other mechanism, to get convergence. One of the ways you implement a CRDT is by tracking causality: tracking what links you have between events, which operations happened before other operations. And this actually leads to a problem, because you have to track this metadata inside your replicated state, either as data or, often, when you are transmitting packets over the network, as logical timestamps attached to the data, and this metadata can grow with the number of events in your system. So basically, we have a metadata management problem, right?
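To make the causality tracking concrete, here is a minimal TypeScript sketch of a vector clock, one common way of encoding the happened-before relation mentioned above. The names (VectorClock, happenedBefore, and so on) are illustrative, not Flec's actual API; note how the clock carries one counter per replica id, which is exactly the kind of metadata that has to be managed.

// Minimal vector-clock sketch (illustrative names, not Flec's API).
// Each replica keeps a counter per replica id; comparing two clocks tells us
// whether one event happened before another or whether they are concurrent.
type ReplicaId = string;
type VectorClock = Map<ReplicaId, number>;

// Increment the local counter when this replica generates a new event.
function tick(clock: VectorClock, self: ReplicaId): VectorClock {
  const next = new Map(clock);
  next.set(self, (next.get(self) ?? 0) + 1);
  return next;
}

// Merge on receive: take the pointwise maximum of both clocks.
function mergeClocks(a: VectorClock, b: VectorClock): VectorClock {
  const out = new Map(a);
  for (const [id, n] of b) out.set(id, Math.max(out.get(id) ?? 0, n));
  return out;
}

// a happened before b iff a <= b pointwise and b is strictly larger somewhere.
function happenedBefore(a: VectorClock, b: VectorClock): boolean {
  for (const [id, n] of a) if (n > (b.get(id) ?? 0)) return false;
  for (const [id, n] of b) if (n > (a.get(id) ?? 0)) return true;
  return false;
}

function concurrent(a: VectorClock, b: VectorClock): boolean {
  return !happenedBefore(a, b) && !happenedBefore(b, a);
}

Comparing two clocks tells a CRDT whether one operation causally precedes another or whether the two are concurrent, which is the information it needs to resolve conflicts; the size of this metadata is what the garbage-collection and delta work mentioned next tries to keep under control.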
04:53.000 --> 06:41.800
So what we've been doing is looking at ways to have distributed garbage collection in CRDTs, which can happen in the background and makes them more efficient: we remove metadata when it is no longer needed. And Tomas here has been working on delta vector clocks, which give us an efficient representation of the logical timestamps that are replicated in the network.

Finally, designing a sensible CRDT is hard. Earlier you saw that we can let the computer decide how to make CRDTs. But if you want a general set of CRDTs that people can directly use, we also have to think a bit about the semantics that we give to the end user. Take a set, for example: you cannot replicate it directly, because if replicas receive add and remove operations in different orders, they get different results. This is why you have CRDT variants such as add-wins sets, remove-wins sets, or last-writer-wins sets. They force a certain ordering on concurrent operations so that your system converges, right?

But even if you pick one of these, implementing them can still lead to strange behavior. Imagine we have two replicas, and we add an item and then remove it, and we do exactly the same on the other replica. Now say we have add-wins semantics: in a concurrent context, we let adds win over removes. That would mean that in this particular case, if we merge everything, we might end up with the item still in the set, because the concurrent adds cancel out the removes. But sequentially, on each replica, we had actually removed the element. So depending on your interpretation, you can get very strange results that might not make sense to the end user.
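To make "add-wins" concrete, here is a minimal TypeScript sketch of one common add-wins formulation, the observed-remove set: every add gets a unique tag, and a remove only deletes the tags it has observed, so an add that is concurrent with a remove survives it. This is a generic textbook-style sketch with illustrative names, not code from the tools discussed here, and the exact outcome of the two-replica scenario above depends on which add-wins formulation you pick.

// Observed-remove (add-wins) set sketch: adds are tagged uniquely, removes only
// delete tags that were observed locally, and merge is a per-component union.
type Tag = string;

interface ORSet<T> {
  adds: Map<Tag, T>;  // every add operation, identified by a unique tag
  removed: Set<Tag>;  // tags that some replica observed and then removed
}

function emptyORSet<T>(): ORSet<T> {
  return { adds: new Map(), removed: new Set() };
}

function add<T>(s: ORSet<T>, value: T, replica: string, seq: number): void {
  s.adds.set(`${replica}:${seq}`, value); // unique tag per add
}

function remove<T>(s: ORSet<T>, value: T): void {
  for (const [tag, v] of s.adds) if (v === value) s.removed.add(tag); // only observed tags
}

function mergeORSet<T>(a: ORSet<T>, b: ORSet<T>): ORSet<T> {
  return {
    adds: new Map([...a.adds, ...b.adds]),
    removed: new Set([...a.removed, ...b.removed]),
  };
}

function lookup<T>(s: ORSet<T>): Set<T> {
  const out = new Set<T>();
  for (const [tag, v] of s.adds) if (!s.removed.has(tag)) out.add(v);
  return out;
}

The unique tags and tombstones here are exactly the kind of per-operation metadata that feeds the management problem described earlier, which is why the garbage collection and compaction work matters.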
06:41.800 --> 09:35.160
So, building on the automatic generation that we saw for the first problem, we've also been working on an extension where we try to create building blocks, different building blocks that we can provide to the end user for building replicated data types. You first analyze the data type, automatically detect the conflicting behaviors and the different properties of your sequential implementation, and then it proposes to the user: look, I found five ways in which I can produce a CRDT, and these would be their behaviors, right? So we can give these components, and users can pick what suits them best. In the first version that I showed, the computer will give you something that solves the problem; here, we're actually trying to give the end user the ability to pick a bit and see what fits their application best. And the reality is that what it picks first may not be the best choice, but at least we give them a way to envision this.

This work, excitingly enough, we're doing together with two important figures who worked on the original CRDT paper, Carlos Baquero and Nuno Preguiça, so we're really getting important feedback from them on doing this correctly.

Finally, the understanding problem: well, understanding CRDTs is hard, but it doesn't have to be. If you're a master's student or a PhD student, you can go to the summer school on distributed and replicated environments, where you can get classes directly from some of the experts in the domain.

So what I want to say about this first part is basically that, well, consistency can be easy if you have the right components. And this is the research that we're doing at the university: we're looking at ways to make CRDTs more efficient and at new settings we can target. For example, imagine a system where we transmit packets over LoRa. We have very little bandwidth, right? We cannot just broadcast data at random, so we really have to consider how we package our data, what representation we use, how our causality information is encoded, so that our data structures are as efficient as possible.

So this finishes the first part, where I gave a bit of a view of the problems. Now I will go into Flec, which is my personal laboratory for experimenting with CRDTs. I started my research on CRDTs back in 2018, and I was using a lot of Lua code to implement my CRDTs; I had a small little framework. Later it evolved into a TypeScript-based implementation, because I wanted to use AssemblyScript to compile it to WebAssembly, and we were running this on embedded microcontrollers. In the end, we just focused on the TypeScript.
09:35.160 --> 11:46.720
And so what is this? What Flec actually is, is a small tool which gives you several libraries: a causal delivery middleware, a CRDT library where you can implement state-based CRDTs and operation-based CRDTs, and an important layer called the pure operation-based CRDT layer. Pure operation-based CRDTs are a non-ad-hoc way of designing operation-based CRDTs, and what's really nice is that we can reason easily about things like when we should drop certain metadata in the network.

This framework is built on something I call TSAT, which is actually an implementation of what is called ambient-oriented programming, which was a bit popular around 2009, on top of TypeScript. So I'll give a bit of an example of that and show you how we use it. The most basic example that I can show you, with just TSAT and not yet Flec, is a ping-pong application. On one side, we make a new actor, and we export an object, a ping-pong object, on the network; so here you see the ping-pong object. These nodes can connect peer-to-peer, you can swap the middleware layer underneath, you could even add a Bluetooth layer if you wanted. But the idea is that you have a full mesh network where nodes can discover all objects exported by the other nodes.

And that's exactly what we do on the other side: we have an actor which is discovering a ping-pong object, and what we get here is a reference. What's really cool is that this reference is actually a far reference, and any operation that would originally be synchronous on the other side is automatically turned into an asynchronous call, even at the type level. So here you would see the reference, and you get a full type for it; but if the original sequential implementation would return a string, here we get a promise of a string. So we can safely type and implement asynchronous applications.

On top of this, we have actually implemented the CRDT layer.
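Since the slides themselves are not in the transcript, here is a rough TypeScript sketch of the ping-pong idea as described: one actor exports an object under a name, another discovers it and gets a far reference whose method calls are lifted to promises at the type level. All of the names here (FarRef, exportAs, whenDiscovered, the in-process registry) are hypothetical stand-ins for illustration, not TSAT's or Flec's real API.

// A far reference turns every method's return type T into Promise<T> at the type level.
type FarRef<T> = {
  [K in keyof T]: T[K] extends (...args: infer A) => infer R
    ? (...args: A) => Promise<Awaited<R>>
    : never;
};

class PingPong {
  ping(): string { return "pong"; } // synchronous on the exporting side
}

// Tiny in-process stand-in for the discovery middleware (illustrative only):
// a real setup would discover objects exported by other nodes in the mesh.
const registry = new Map<string, object>();

function exportAs(name: string, obj: object): void {
  registry.set(name, obj);
}

function whenDiscovered<T extends object>(name: string, cb: (ref: FarRef<T>) => void): void {
  const target = registry.get(name) as T;
  // Wrap every method so calls become asynchronous, mimicking a far reference.
  const ref = new Proxy({}, {
    get: (_, prop) => (...args: unknown[]) =>
      Promise.resolve((target as any)[prop](...args)),
  }) as FarRef<T>;
  cb(ref);
}

// "Node A": make the object available under a name.
exportAs("pingPong", new PingPong());

// "Node B": discover it and call it; string has become Promise<string>.
whenDiscovered<PingPong>("pingPong", async (ref) => {
  const answer: string = await ref.ping();
  console.log(answer); // "pong"
});

The mapped type is what gives the "a string becomes a promise of a string" behavior described above, so remote calls stay safely typed even though they are asynchronous.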
11:46.720 --> 14:16.160
The most basic example: say we want a simple shopping list, the most basic one. We say we want an add-wins set, and we say this add-wins set should go online with the name "shopping list". Now, anywhere in the network, I can discover this shopping list. And that's what I do here: I discover the shopping list, I set a callback on it to get updates whenever the underlying CRDT is updated, and here you can see I add some items, just as you would do with a regular sequential data type. I forgot to go over my arrows, but OK.

So the implementation of the add-wins set itself builds again on top of, let's say, our pure operation-based CRDT layer. We basically define these redundancy rules, which map directly to the specification as you can find it in the literature. One of the goals when I was building this framework was that I could easily experiment and make changes to it, try new concepts for CRDTs. For example, I introduced redundancy relations that allow you to directly access data that is cached in the reliable causal broadcast layer, to allow for more efficient updates. That was my goal with building this little laboratory for myself: how do I make something that I can easily extend and play with?

This may be a bit unclear, so maybe I should give a visualization of what actually happens when we receive operations and which redundancy relations are used. Imagine that we have a CRDT, a pure operation-based CRDT, and internally we keep a log of the operations that happened, only the ones that we need to. We actually compact this log regularly and keep a compact state. Basically, what we keep in the log is the frontier of operations that are not yet causally stable, for which we don't have the information that they have been observed by the other replicas. So let's say we receive a remove operation. What we first do is check whether anything in the existing log of the CRDT becomes redundant, right? And we do that using the redundancy rules of the pure operation-based CRDT framework, well, our implementation of it. Basically, we say that an existing item is redundant if either the newly arriving operation is a clear operation, or the existing item has the same argument.
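As a sketch of what such a redundancy relation can look like for an add-wins set in the pure operation-based style, assuming operations carry vector clocks and are delivered in causal order, here is an illustrative TypeScript version (reusing the VectorClock helpers from the earlier sketch; this is not Flec's actual code): removes and clears are never kept in the log, and an arriving remove or clear makes the causally preceding adds it targets redundant.

// Pure operation-based add-wins set: redundancy rules (sketch, illustrative names).
// Reuses VectorClock and happenedBefore from the vector-clock sketch above.
type SetOp =
  | { kind: "add"; elem: string; clock: VectorClock }
  | { kind: "rmv"; elem: string; clock: VectorClock }
  | { kind: "clear"; clock: VectorClock };

// Is the arriving operation itself worth keeping in the log?
// For an add-wins set, removes and clears are never stored; only adds are.
function keepArriving(op: SetOp): boolean {
  return op.kind === "add";
}

// Does the arriving operation make an existing log entry redundant?
// An add already in the log is redundant if it causally precedes an arriving
// clear, or an arriving operation on the same element.
function makesRedundant(existing: SetOp, arriving: SetOp): boolean {
  if (!happenedBefore(existing.clock, arriving.clock)) return false;
  if (arriving.kind === "clear") return true;
  return existing.kind === "add" && existing.elem === arriving.elem;
}

// Deliver an operation to the log (the RCB layer guarantees causal order).
function deliver(log: SetOp[], arriving: SetOp): SetOp[] {
  const pruned = log.filter((e) => !makesRedundant(e, arriving));
  return keepArriving(arriving) ? [...pruned, arriving] : pruned;
}

// The observable set is simply the elements of the adds left in the log.
function elements(log: SetOp[]): Set<string> {
  const out = new Set<string>();
  for (const e of log) if (e.kind === "add") out.add(e.elem);
  return out;
}

In this formulation an add only disappears once a causally later remove or clear covers it, which is what lets adds win over concurrent removes.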
14:16.160 --> 16:43.680
What's important to note is that all these operations arrive in causal order; the underlying framework ensures that they are causally ordered. So in this case, we can mark the add for A as redundant. And then, before we put the arriving operation itself in the log, we check whether we actually need to keep it; and in this case, for the add-wins set, we do not need to keep removes, because the CRDT will converge without them. What happens then, in this naive version, is that we just iterate over the log and compute the state, and the result here is the set containing B and C. Now, in an actual implementation, so in Flec, what we do is keep this compact state, and whenever operations are causally stable, we apply them to the sequential version, to ensure that we have as little internal data usage as possible. And there are different techniques to clean this up and compact it.

What was important for me was that I had these different layers that I could poke at and play with, so I built this extensible implementation where, at different levels, we have different hooks that I can hook into. For example, in the TSAT core layer I can hook into message sending, message receiving, the ID generation of the different nodes, when events are received or updated, et cetera. It's really nice that I had these distinct points, which helped me reason about CRDTs, replicas, and network messages in a flexible way; and that's also where the name Flec actually comes from. For the CRDT layer, I could hook into when nodes go online, when operations are generated, when operations are received, when they sit in the causal buffer, and when they are finally applied. So, different layers. And because I have these hooks, I can build on top of them; that's what the pure operation-based layer and the RCB layer build on. The RCB layer adds the causal buffering, and the pure-op layer adds the machinery for dealing with all the redundancy relations.

So I actually went really fast through my presentation, and this is my last slide, so I'm giving you some buffer time.
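Tying back to the causal-stability step just described, here is a rough sketch of how such compaction can look, building on the SetOp and deliver definitions from the previous sketch; the split between log and compact state and all of the names are illustrative, not Flec's actual hooks.

// Causal-stability compaction sketch (illustrative; builds on SetOp and deliver above).
interface AWSetReplica {
  compact: Set<string>; // combined effect of all causally stable operations
  log: SetOp[];         // frontier of operations that are not yet stable
}

// Called when the causal-broadcast layer reports `op` as causally stable,
// i.e. every replica has observed it and nothing concurrent with it can still
// arrive. Its effect moves into the compact sequential state and the log entry
// is dropped, so the metadata does not keep growing.
function onCausallyStable(replica: AWSetReplica, op: SetOp): void {
  if (op.kind === "add") replica.compact.add(op.elem);
  replica.log = replica.log.filter((e) => e !== op);
}

// Delivery also has to touch the compact state: a compacted add lies in the
// causal past of every newly arriving operation, so an arriving remove or
// clear may simply erase the element from the compact set.
function deliverToReplica(replica: AWSetReplica, arriving: SetOp): void {
  if (arriving.kind === "rmv") replica.compact.delete(arriving.elem);
  if (arriving.kind === "clear") replica.compact.clear();
  replica.log = deliver(replica.log, arriving);
}

// Reading the set combines the compact state with the live log entries.
function currentElements(replica: AWSetReplica): Set<string> {
  const out = new Set(replica.compact);
  for (const e of replica.log) if (e.kind === "add") out.add(e.elem);
  return out;
}

This is only the shape of the idea, not how Flec actually structures it; in the framework this kind of logic sits behind the hooks of the RCB and pure-op layers described above.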
16:43.680 --> 19:06.640
I will conclude by basically saying that designing and implementing replicated data types is hard. This is why you will often go to the one-size-fits-all solutions that are out there, made by very smart people who have already thought about a lot of the problems. But sometimes your problem might not fit these one-size-fits-all solutions. So we've been working on getting computer-generated, provably correct designs out there. And this is important, because we can optimize for efficiency, we can optimize for correctness, and we can also really try to make CRDTs whose semantics make sense, right? Because sometimes we want to build an application, and in reality concurrency forces you to make compromises. So we want the developer, at least, to understand what compromises they are making and what it means for the end user. So yeah, the core idea is that we're trying to distill the core building blocks of CRDTs and let the computer do as much of the work as possible. That's the end of my presentation, and thank you for listening.

Thank you, Jim. I'm sure we have questions. Yes.

Is it on? Yeah, it is. It might be a bit of a naive question, but my train of thinking is that this is still highly dependent on some notion of time and date. Do you foresee, or can you imagine, some way of achieving CRDTs, you know, consistency at all, without that, without dealing with time and dates?

Ah. Well, the thing is, what's really important when developing CRDTs and applications is that you verify the correctness, right? And the benefit of the tools that we develop is that you give the sequential specification, which is always based on the business logic of what your application needs, right? And that is still something that has to come from a human, or maybe these days you could use an LLM, but I don't know how much you could trust it to do the correct thing. But you don't have to, or at least you much less have to, think about designing the correct CRDT, because the semantics will be automatically verified by the computer for you; that's the goal.
19:06.640 --> 21:39.920
So you get something back, you can see: look, we can offer you these designs based on your spec, and, well, you decide whether that fits your application. But at least you don't have to think about how to get all the details right.

Thanks, Jim. This is a really, really great body of work. And as a creator of a one-size-fits-all CRDT, I think it's an empirical question. We don't think it's the right approach; we think it's an expedient approach, to try and test whether you can have a one-size-fits-most approach. And we don't think it's optimal, but it's useful, and it's gotten pretty far. I, in a past life, did a lot of database stuff, and what I noticed is that nobody actually understands how to use databases, and the vast majority of Postgres features are never used by anyone except consultants. I see someone from the Postgres community having a chuckle in front of me. It's very true, though. It's very true. And so I think historically I would have been worried about this, but I'm curious now, it's been a few years since Flec: has anyone been making progress on type inference, static analysis, so that we don't get either the laziest or most conservative interpretations, but could actually use PL techniques to find the weakest forms of consistency necessary? Does that feel like a promising direction?

Yeah, so maybe I will go back to this slide. I went too far, sorry. Two back, sorry. No, let me see, show navigator. Uh, I won't. Yeah, this is fine. Yeah. So it's a very good question, right? Because there is no one right solution for very many things; we're just trying to create something and see if it fits. So in this last paper that we're working on, we're trying to really analyze the building blocks of what makes a replicated data type tick, what the core essential parts are. And this is why we're also working together with people like Carlos Baquero, because they have very good insights into what is important and what was needed to come up with the concept of CRDTs in the first place. So it's a very difficult problem. We're trying to analyze as much as possible given, let's say, some goal, some business goal, some application, right?
21:39.920 --> 24:05.000
What is the minimum we need so that you can automatically compose these components and get something useful out of them? So in the end, the one-size-fits-all solution is actually really good, because you get these components that you can play with and build on and easily iterate. We're also trying to do that somehow, but generated by algorithms that help you formulate it in a way that fits your use case, right?

Are you still bringing this to PaPoC this year, or will we have to wait a little longer?

So the initial version of this paper is going to be submitted to ECOOP.

OK, great. Well, hopefully it's going to come out.

Hopefully it gets accepted, let's say, wherever we submit it first, because obviously it has to be reviewed for correctness.

It's okay if it's the scoop of the show.

Yeah, but we don't want to burn the paper immediately; we first try to get it published, and then we'll see.

Well, thank you so much. Thank you for your question. Another question? Yes. Next question.

Yeah, thanks. This was a great presentation, I think, very information dense. So I'm mainly wondering: do you have these slides, or links to the papers and all the work, somewhere? I saw at the end you have a page for the research group; is it all on there?

You can find more information at the end, indeed. The animations make navigating there slow. But basically, if you go here, that's where all our publications are listed as well. Some papers I haven't listed there because they're still in progress, but you can find a lot through this site, or you can look up my name at, like, the name of the...

Ah, on your personal page? Yeah, yes, as well. And also really great is the crdt.tech website, which lists a huge amount of resources on CRDTs from very prominent, good researchers. Well, not that I'm not good, but I mean more famous researchers, like Martin Kleppmann. So crdt.tech is really a very good resource. Good, thanks.

Yeah, I'm Willem.
24:05.000 --> 26:02.320
You describe a system where, basically, you don't want the programmer to manually specify the CRDT, but you want it to be generated automatically to a large extent, depending on the use case. Are there maybe invariants or properties of the use case that somehow need to be encoded to enable this derivation, and is there tooling to help in identifying those?

So that's actually a really good question, because that's exactly where the problem lies, right? Invariants, dealing with application-specific constraints. If you don't have any constraints, you can easily build a CRDT. However, as soon as you add invariants to the mix, or you have data relations, it becomes much more complicated. And because we have a solver that can understand this stuff, the goal is indeed that we can automatically generate CRDTs, even ones that have data dependencies among them and have to respect certain rules. Now, the first bit that we're going to publish doesn't have this invariant part yet, because it would be too large, but the goal is to have it in subsequent releases, and we've already been working on the solution there.

And can you help the programmer to specify the invariants?

So, for the specification of the invariants, the spec is given in a Scala-like language, and there you can describe how your business rules should hold with respect to the operations that are applied on the data type itself, the sequential one. So you're supposed to write the invariants there, basically, but you do it in a high-level language, and not in something like Coq.

And is it, like, open source?

Flec is open source. It's not something you should use to build your application, because it's a laboratory, mainly made for myself, but it's open source, and you can find all the source code on GitHub, which is linked from my pages as well.