WEBVTT 00:00.000 --> 00:13.000 I want to welcome now Bartosz, who will be talking about taming your Yjs documents. 00:13.000 --> 00:15.000 Please welcome him with a round of applause. 00:15.000 --> 00:17.000 Thank you very much. 00:45.000 --> 00:47.000 Thank you very much. 00:47.000 --> 00:49.000 Thank you. 01:15.000 --> 01:25.000 Okay, so first, we'll be talking about how Yjs operates on the data that you are passing to it. 01:25.000 --> 01:33.000 What patterns and anti-patterns come out of it, because, as you might have known, 01:33.000 --> 01:43.000 or not known, Yjs is one of the most popular libraries for writing collaborative apps, and it is mostly optimized for writing collaborative text editors. 01:43.000 --> 01:48.000 So this is like a standard for Yjs. 01:48.000 --> 01:54.000 The thing that we are optimizing for is collaborative text editors, and they are used, for example, in Evernote by now. 01:54.000 --> 02:00.000 If you are using Jupyter Notebooks in collaborative mode, you are using Yrs for that. 02:00.000 --> 02:06.000 There are, let's say, plenty of Notion alternatives, right? 02:06.000 --> 02:17.000 In open source or in VC-funded companies, and quite a few of them are actually using Yjs or Yrs under the hood for the collaborative features. 02:17.000 --> 02:33.000 And from these experiences, we came across different and weird patterns, weird for us as creators, that people are using for working with Yjs documents, and we decided to address them in this talk. 02:33.000 --> 02:35.000 So yeah. 02:35.000 --> 02:41.000 And first, oh, can you see the left one? 02:41.000 --> 02:48.000 Yeah, okay. So on the left there is Yjs, the standard Yjs API, let's say, and on the right is Yrs. 02:48.000 --> 02:57.000 The latter is mimicking it in terms of behavior and is fully compatible in terms of binary format.
02:57.000 --> 03:02.000 So you can work between Yjs and Yrs interchangeably. 03:02.000 --> 03:08.000 And so with that, tip number one: keep writes sequential as long as possible. 03:08.000 --> 03:11.000 So what does sequential writing mean? 03:11.000 --> 03:25.000 In order to dive deep into it, we need to explain a little bit about how Yjs and Yrs are storing your updates and attaching metadata for possible conflict resolution. 03:25.000 --> 03:34.000 When several people are writing in the same place, for example, we don't want them to overwrite each other's characters while typing, right? And things like that. 03:34.000 --> 03:43.000 So first, the core element of Yjs and Yrs is called a block or an item. 03:43.000 --> 03:50.000 And all of the updates that you are making that land in the Yjs doc are wrapped in this block structure. 03:50.000 --> 04:05.000 It has its identifier and its relationship to other blocks. When you are typing, inserting something at position n, let's say, we are actually storing information about the identifiers of the elements at positions n minus one and n plus one. 04:05.000 --> 04:15.000 So everything is expressed as a relationship, because indexes are relative to the person who is looking at them, and what is stored there changes over time. 04:15.000 --> 04:25.000 And we are using those IDs to basically fix the position in time and in space as a relationship between the different items that you are putting in. 04:25.000 --> 04:39.000 So there is also the identifier of the parent collection and, for entries of a key-value map, the key, because Yjs also supports key-value maps. 04:39.000 --> 04:50.000 And there is some extra metadata, and finally your update, and the thing is that your update can become quite big because of that, because there are a lot of fields to it, right? 04:50.000 --> 04:58.000 And when you are typing characters, you are basically inserting them individually into your document.
04:58.000 --> 05:02.000 And every time you are inserting a character, we need to attach this metadata to it, right? 05:02.000 --> 05:12.000 So there are a lot of fields that have some weight in memory and on disk if you are going to store them, and putting them on every single character is expensive, right? 05:12.000 --> 05:27.000 So what we are doing is basically an operation called merging, and the thing is that as long as you don't change the cursor position and you are typing one character after another, you are basically putting them in a relationship. 05:27.000 --> 05:33.000 You know, we are preserving this sequential information, basically, like here. 05:33.000 --> 05:38.000 A2 knows that its left neighbor is A1. 05:38.000 --> 05:44.000 So we know that if those two blocks are next to each other, they were inserted one after the other, basically. 05:44.000 --> 05:54.000 And once this is done, we can basically squash them together into a single block, which now holds two values, A and B. 05:54.000 --> 06:03.000 And the amount of metadata that we store is basically the same as if there was a single item, because metadata is stored per item. 06:03.000 --> 06:10.000 So technically those two notations are equivalent to each other, just one of them is compressed and the other one is not. 06:11.000 --> 06:18.000 Now, how can we keep this sequentiality, this ability to merge those blocks together? 06:18.000 --> 06:26.000 First, they need to happen in the same session, because every time you create a new doc, by default it's creating a client ID. 06:26.000 --> 06:38.000 And it's very important that no two processes write under the same client ID at the same time, because then we have duplicate IDs with different elements stored under them, and this would cause problems.
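The merge rule described above can be sketched as a toy model. This is a simplification under stated assumptions, not the actual Yjs or Yrs implementation: `Block`, `can_merge`, and `squash` are hypothetical names, IDs are modeled as (client ID, clock) pairs, and real blocks carry more fields (right neighbor, parent collection, deletion state) that would also have to match before two blocks are squashed.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

BlockId = Tuple[int, int]  # (client id, clock); a simplified stand-in for a block ID


@dataclass
class Block:
    client: int                 # client that created the block
    clock: int                  # clock of the first element stored in the block
    content: str                # one character per element
    origin: Optional[BlockId]   # ID of the element to the left at insertion time

    @property
    def last_id(self) -> BlockId:
        """ID of the last element held by this block."""
        return (self.client, self.clock + len(self.content) - 1)


def can_merge(left: Block, right: Block) -> bool:
    """Toy rule: same client, consecutive clocks, and right was typed directly after left."""
    return (
        left.client == right.client
        and right.clock == left.clock + len(left.content)
        and right.origin == left.last_id
    )


def squash(blocks: List[Block]) -> List[Block]:
    """Merge neighboring blocks wherever the toy rule allows it."""
    out: List[Block] = []
    for block in blocks:
        if out and can_merge(out[-1], block):
            prev = out[-1]
            out[-1] = Block(prev.client, prev.clock, prev.content + block.content, prev.origin)
        else:
            out.append(block)
    return out


# Client 1 types "a" then "b" without moving the cursor; client 2 then appends "c".
typed = [
    Block(client=1, clock=0, content="a", origin=None),
    Block(client=1, clock=1, content="b", origin=(1, 0)),
    Block(client=2, clock=0, content="c", origin=(1, 1)),
]
merged = squash(typed)
print([(b.client, b.clock, b.content) for b in merged])
```

In this model the two blocks from client 1 collapse into one block holding "ab" with a single set of metadata, while the block from client 2 stays separate because it was written under a different client ID.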
06:38.000 --> 06:53.000 So we need to keep in mind that a client ID needs to be used by one process at a time, which is hard for browsers, because this means that basically every time you open the document in a new browser tab, you are generating a new client ID. 06:54.000 --> 07:01.000 Well, you can basically recycle the same client ID over and over, which is why Yrs is so great also. 07:01.000 --> 07:07.000 Another is that they need to happen in the same collection, because those merges happen within the same collection boundaries. 07:07.000 --> 07:12.000 Another: if you are using a Y.Map, they need to happen in the same key-value entry. 07:12.000 --> 07:27.000 So basically, updating one key-value entry and then another is similar to changing the cursor position. So we are breaking the sequentiality of those updates, basically, and I will explain it a little bit more later on. 07:27.000 --> 07:36.000 And yeah, if you are doing this on text, as long as you don't move the cursor position and keep typing one character after another, we can merge those together. 07:36.000 --> 07:52.000 It also means that you can push more than one character at a time, right? You can basically copy-paste an entire book, put it in as plain text, and it counts as a single block with a single set of metadata, basically. 07:52.000 --> 08:06.000 And it also needs to store the same type of object, which is why I said copy-paste as plain text, because if you copy-paste formatted text, then it will be broken into parts with the formatting attributes in between. 08:06.000 --> 08:15.000 So this is also pretty important. So how does this sequentiality behave? Let's see an example. We are creating a Y.Map. 08:15.000 --> 08:24.000 In one example, we are basically updating entries A and B alternately, 100,000 times. In the other, we are updating them. 08:24.000 --> 08:46.000 First, A 10,000 times, then B 10,000 times.
And let's encode those documents and see how big they are. So the first one is two megabytes, basically. The second one is 77 bytes. So where does the difference come from? Well, first of all. 08:46.000 --> 09:13.000 There's also a second size, because Yjs also lets you encode in one of two variants. Version V2 is not used by default by any of the default attachments to Yjs, I mean the persistence providers or network providers, but you can use it. And in this case, even this big update would still be 74 bytes. 09:13.000 --> 09:35.000 Because of how well Yjs can compress the information in the V2 setting. However, here V2 is a little bit bigger, 79 bytes. So basically the rule of thumb is: use V1 for incremental updates, like single characters, small packets of data. 09:35.000 --> 09:48.000 And for big chunks, like, you know, the whole document state, V2 is quite a bit better. So where does the difference come from? Right? Those 77 bytes versus two megabytes. 09:48.000 --> 10:12.000 So how does map sequentiality work when it works as intended, right? So we are updating the same entries over and over again. In Yjs, basically every Y.Map is a set of lists, I would say, where the value is the last element of the list in the given entry. 10:12.000 --> 10:23.000 So when we are inserting a new element, and then inserting a new element again, we are deleting the previous value of that entry. 10:23.000 --> 10:37.000 And there is a thing: Yjs doesn't really have updates, it has inserts. You can either insert a block or delete it. Once it's deleted, you can no longer access it, with some caveats. 10:37.000 --> 10:56.000 So when you are, again, inserting the next element, you'll see that we are adding the next block and deleting the previous one, and as we can see, N0 and N1 are the block IDs of the deleted elements.
They are sequential, happening one right after the other, so we can merge them together into a single block. 10:56.000 --> 11:06.000 Okay. And this way, we can basically squash those hundreds of thousands of updates into several blocks. 11:06.000 --> 11:22.000 However, when we are inserting elements, you know, one after another into different entries, we are breaking the sequentiality, so we can no longer merge those updates together, and we need to basically save them one by one. 11:22.000 --> 11:40.000 This is where those two megabytes come from. So one of the ways to solve it, honestly: if you don't need to resolve conflicts on an entry-by-entry basis, it's better to just save entire JavaScript objects or JSON objects. 11:40.000 --> 11:58.000 And this way, our output size is 59 bytes, which is even lower than the sequential case previously, as long as you don't need to update A and B in a collaborative manner, right? 11:58.000 --> 12:17.000 Now, tip number two: not every object needs to be collaborative, going back to the same idea. For example, let's say we are using Yjs just to update a vehicle position, right? So we are representing the position as a Y.Map with latitude and longitude, and two clients are updating it concurrently. 12:17.000 --> 12:34.000 The problem is that if two of them are updating it concurrently, and we cross-sync them with one another, what we can end up with is a consistent state, but an invalid state, because we are basically resolving conflicts on an entry-by-entry basis, right? 12:34.000 --> 13:01.000 So one client might win on one entry, another might win on another entry, producing a result that is not satisfactory for us.
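The vehicle position problem can be illustrated with a generic per-key merge. This is a sketch under stated assumptions and deliberately not the Yjs conflict resolution rule: every value carries a hypothetical (logical time, client ID) stamp and the higher stamp wins for each key independently. The `merge` helper, the stamps, and the coordinates are all made up for illustration.

```python
from typing import Any, Dict, Tuple

Stamp = Tuple[int, int]    # (logical time, client id); hypothetical ordering, not Yjs's rule
Entry = Tuple[Stamp, Any]  # a value together with the stamp of the write that produced it
Replica = Dict[str, Entry]


def merge(a: Replica, b: Replica) -> Replica:
    """Resolve conflicts independently for every key: the higher stamp wins."""
    out = dict(a)
    for key, entry in b.items():
        if key not in out or entry[0] > out[key][0]:
            out[key] = entry
    return out


# Field by field: every coordinate is its own entry.
client_1 = {"lat": ((1, 1), 52.23), "lng": ((2, 1), 21.01)}  # wrote lat first, then lng
client_2 = {"lng": ((1, 2), 4.35), "lat": ((2, 2), 50.85)}   # wrote lng first, then lat
mixed = {key: value for key, (_, value) in merge(client_1, client_2).items()}

# Atomic: the whole position is stored as one entry.
whole_1 = {"position": ((1, 1), {"lat": 52.23, "lng": 21.01})}
whole_2 = {"position": ((1, 2), {"lat": 50.85, "lng": 4.35})}
atomic = merge(whole_1, whole_2)["position"][1]

print(mixed)   # latitude from one client, longitude from the other
print(atomic)  # one client's position, intact
```

Both replicas converge in either case, but the field-by-field result combines the latitude of one report with the longitude of the other, a point that neither client ever reported. Stored as a single value, one complete position wins.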
So the idea here is that we don't use Y.Maps when we don't need them. It's just better to use simple JSON objects, because a JSON object is treated as a single entry, and we resolve it as basically an atomic value conflict, right? So this is another. 13:01.000 --> 13:26.000 So basically, this is a common pattern when people are building something like a CRUD application. What they are doing is: well, there is a user form that we can put into JSON, we update that data from JSON and put it back again, basically saving all of the JSON data field by field in the Yjs Y.Map, let's say, right? 13:26.000 --> 13:36.000 The thing is that every time you are writing something to a Y.Map, you are creating a new block, even if the value that was there previously is the same. 13:36.000 --> 13:47.000 It's a way of saying: I really, really want this to have this value, so that in case of concurrent conflicts my value can still fight with the value from another person. 13:47.000 --> 13:55.000 But the problem is that we are writing to two or three entries while only one was changed, and this means that we are also breaking sequentiality. 13:55.000 --> 14:02.000 So we are basically blowing up the document size by inserting the same value over and over again. 14:02.000 --> 14:10.000 So of course, it's better to only write the values that changed, and never write the values that didn't change. 14:10.000 --> 14:31.000 But collections like Y.Maps can erase each other. This is a problem that most people run into when they are saying: I've made an application in Yjs, and now I've made some changes here and here, and my data disappeared, the sync was broken, I've lost my data, right? 14:31.000 --> 14:44.000 Let's say that we have the following: we have a map of users, and we are trying to get a map representing a person, Alice. This is a collaborative map, but it might not exist because it was not initialized yet.
14:44.000 --> 14:55.000 So if it doesn't exist, we create a new Y.Map for Alice, insert it into the users Y.Map, the parent Y.Map, and then fill it with some data, right? 14:55.000 --> 15:09.000 Then let's say we have two documents that represent concurrent updates. On one, we are basically adding age equal to 28, on the other one, we are adding an email, we are applying the updates, 15:09.000 --> 15:21.000 syncing those documents together, and the result is that they both have the same value, but neither of them has the email value. And why does it happen? 15:21.000 --> 15:35.000 Basically, Yjs has two levels of types. The root-level types, created on the document itself, are identified by the string name that you are giving them, in this case it's "users". 15:35.000 --> 15:49.000 But if you are inserting Y.Maps nested under those collections, they are getting a block ID as their identifier, and this means that on every insert a new block ID is created, 15:49.000 --> 16:03.000 and this Y.Map, even though in our heads it represents the same element, the same entity, on the Yrs side or on the Yjs side these are two different elements, and as such they are basically competing with each other. 16:03.000 --> 16:15.000 So when the conflict resolution happens, we see that under Alice we have one object that has this ID and another object that has that ID. One of those objects will win, and only that object will preserve its value. 16:15.000 --> 16:23.000 If the ID is the same, because we synchronized it first before inserting something from another client, then all is fine, really. 16:23.000 --> 16:35.000 So, root-level types are mostly, since we have a little bit of time, they are mostly for things like schema. If you are database developers, think of tables in SQL, right? 16:35.000 --> 16:41.000 So, something that is expected to be there already, and we are writing our application logic against that schema.
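Why two concurrently created nested maps erase each other can be shown with a toy model. It is a sketch under stated assumptions, not the Yjs or Yrs data model: a nested map is reduced to an ID plus a dictionary of fields, and `merge_entry` uses a hypothetical tie-break (the higher ID wins) when two different instances land under the same key. The field values, including the email address, are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

BlockId = Tuple[int, int]  # (client id, clock), assigned when the nested map is inserted


@dataclass
class NestedMap:
    id: BlockId
    fields: Dict[str, object] = field(default_factory=dict)


def merge_entry(a: NestedMap, b: NestedMap) -> NestedMap:
    """Merge what two replicas hold under the same key of the parent map."""
    if a.id == b.id:
        # Same nested instance on both sides: their field edits combine.
        return NestedMap(a.id, {**a.fields, **b.fields})
    # Different instances compete for the key; hypothetical tie-break by ID.
    return a if a.id > b.id else b


# Each replica ran "create the map if it is missing" on its own, so the IDs differ.
doc_1 = NestedMap(id=(2, 0), fields={"name": "Alice", "age": 28})
doc_2 = NestedMap(id=(1, 0), fields={"name": "Alice", "email": "alice@example.com"})
lost = merge_entry(doc_1, doc_2)

# One replica created the map and the other synced before editing it, so the ID is shared.
shared_1 = NestedMap(id=(1, 0), fields={"name": "Alice", "age": 28})
shared_2 = NestedMap(id=(1, 0), fields={"name": "Alice", "email": "alice@example.com"})
kept = merge_entry(shared_1, shared_2)

print(lost.fields)  # the email is gone
print(kept.fields)  # age and email both survive
```

In the first case the whole losing instance is dropped, including the email it carried. In the second case both edits target the same instance, so they are merged field by field.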
16:41.000 --> 16:57.000 Whereas nested types, they are just entities. If you have the possibility of conflict, it's better to have some person that is the first initializer of that element, and then, for example, refer to it by its unique ID, right? 16:57.000 --> 17:07.000 An ID that you have control over in this case, something like a UUID, or any kind of ID that makes sense in your business logic, let's say. 17:07.000 --> 17:25.000 But why did we make that peculiar choice? Like, why can Y.Maps delete each other? It's not very safe to lose user data, but there is a reason for that, and one is that Y types can be used to basically stabilize data. 17:25.000 --> 17:33.000 Let's say, this is an example: three people are collaborating with each other and making changes in the document, but one of them is offline, right? 17:33.000 --> 17:53.000 And there comes a time for a peer review. They are reviewing the document, those changes are fine, okay? The document is accepted, and now we want to freeze the state of the document, but then the third person comes in after a long period of being offline, and they were also editing this document and they didn't 17:53.000 --> 18:09.000 know that it was peer reviewed already, right? But anyway, their changes are jumping into it, right? So this is a problem, and basically a new Y document can be used to optimize that at that moment. 18:09.000 --> 18:19.000 And the other is garbage collection, because the biggest problem with document updates is that when we are deleting data, garbage starts to pile up, and it is present in the document state, 18:19.000 --> 18:30.000 as we've seen before with those two-megabyte updates, right? So to close, Yjs has three different ways of representing deleted data. 18:30.000 --> 18:43.000 One is basically a bit marker that is used when you turn garbage collection off in the Yjs document, when you are creating it.
It is necessary for things like snapshotting, time-travel debugging, that sort of time travel, and the undo/redo manager. 18:43.000 --> 18:55.000 You can undo and redo operations, because we need that data, we cannot just delete it. Another one, the default one, is that we are deleting the user data, just keeping information about how many elements were in there, 18:55.000 --> 19:05.000 but we still keep the metadata around, because one person can delete a fragment of text, but two other people could possibly insert something in that deleted range, 19:05.000 --> 19:10.000 and those are still subject to conflicts, so we still need this metadata to resolve those conflicts. 19:10.000 --> 19:24.000 And the third one is when the parent collection is deleted. It's deleted forever, basically, because once some element is deleted, you can no longer access it. You can create a new one, of course, and put it on top of it. 19:24.000 --> 19:37.000 But this is another collection. Once the collection is deleted, all of the blocks that were referring to it are also deleted, no longer accessible, there is no need for conflict resolution, and we can just put a very simple, very small placeholder for them. 19:37.000 --> 19:46.000 And you can use it, this is like the final slide, so you can use it basically to stabilize your data. 19:46.000 --> 19:53.000 So, for example, here we are doing a thousand insert and delete operations at random positions. 19:53.000 --> 19:59.000 This is the string that I've generated from it. I can call toJSON on it, which basically returns a string, a plain string. 19:59.000 --> 20:10.000 I can then encode the document as an update. For this string, the update is six thousand bytes, because of those thousand operations that we had to store, and they were at random positions, so probably no merge happened.
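Why random-position edits leave so little to merge can be sketched with a toy model. It rests on stated assumptions and is not how Yjs or Yrs store text: the document is a flat list of (clock, character, deleted) elements written by one client, a block is a maximal run of neighbors with consecutive clocks and the same deleted flag, and the seed, alphabet, and 30 percent delete ratio are arbitrary choices. `visible_positions` and `count_blocks` are hypothetical helpers, and the model counts blocks, not bytes.

```python
import random
from typing import List, Tuple

Element = Tuple[int, str, bool]  # (clock, character, deleted)


def visible_positions(doc: List[Element]) -> List[int]:
    """Indexes of the elements that have not been deleted."""
    return [i for i, (_, _, deleted) in enumerate(doc) if not deleted]


def count_blocks(doc: List[Element]) -> int:
    """Count maximal runs of neighbors with consecutive clocks and equal deleted flags."""
    blocks = 0
    prev = None
    for clock, _, deleted in doc:
        if prev is None or clock != prev[0] + 1 or deleted != prev[1]:
            blocks += 1
        prev = (clock, deleted)
    return blocks


rng = random.Random(42)
doc: List[Element] = []
clock = 0
for _ in range(1000):
    visible = visible_positions(doc)
    if visible and rng.random() < 0.3:
        # Delete one visible character at a random position.
        i = rng.choice(visible)
        c, ch, _ = doc[i]
        doc[i] = (c, ch, True)
    else:
        # Insert before a random visible character, or at the very end.
        slot = rng.randint(0, len(visible))
        at = visible[slot] if slot < len(visible) else len(doc)
        doc.insert(at, (clock, rng.choice("abcdefgh"), False))
        clock += 1

text = "".join(ch for _, ch, deleted in doc if not deleted)
fragmented = count_blocks(doc)
# The same visible string written in one go, as a single sequential insert.
fresh = count_blocks([(i, ch, False) for i, ch in enumerate(text)])

print(len(text), fragmented, fresh)
```

Because almost no insert lands directly after the previously inserted character, nearly every element remains its own block, whereas the same visible string inserted in one go forms a single block in this model.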
20:10.000 --> 20:21.000 But you can create a new Y.Text object, which is the Yjs element used for rich text editing, and initialize it with the data from the previous text. 20:22.000 --> 20:50.000 Now, anything that happened in concurrent updates in between those will be erased, because text2 will erase text and all of the updates that happened on the original text, which is bad, but if you are willing to accept it, because you want to have a stable version of the document, then you can encode it again, and now it counts as a single string of characters, which after update encoding 20:51.000 --> 21:10.000 has 87 bytes in the document size, because we got rid of the previous text, got rid of all of the blocks and all of the metadata that were associated with it, so we no longer need that information, which was basically six kilobytes long, right? 21:10.000 --> 21:16.000 So that was very fast, but I hope that you managed to keep up with me. 21:16.000 --> 21:25.000 So if you want, those are the references. I'm writing mostly about Yrs, because this is my field, but if you want to see how Yjs conflict resolution works, 21:25.000 --> 21:30.000 and this is like a tutorial, so you can write it yourself, this is the last link. 21:30.000 --> 21:39.000 If you are interested in more details about how Yjs and Yrs work under the hood, there are two different things, and how to write a quite performant 21:39.000 --> 21:49.000 CSV table, depending on how big the CSVs we are talking about are, but this one shows that it can work up to a certain point for sure. 21:49.000 --> 21:51.000 So thank you. 21:51.000 --> 22:01.000 Thank you so much, Bartosz, for your talk. 22:01.000 --> 22:09.000 We're going to take questions in the room. On the net, nothing. Please raise your hand if you want to ask. 22:09.000 --> 22:16.000 Just a moment, or if everything was clear, we don't take questions. 22:16.000 --> 22:18.000 Okay, then.
22:22.000 --> 22:30.000 Sorry, originally this talk was meant for like an hour, at a slower and much better pace than that. 22:30.000 --> 22:37.000 No, really, we've very often used a much, I wouldn't throw much about why I'm trying to do that. 22:37.000 --> 22:43.000 One burial challenge that we had a lot of work to do, I see previous problems. 22:43.000 --> 22:45.000 You know, we didn't know why, once I write in it. 22:45.000 --> 22:48.000 You play a big life on dinner, all your sort of fitted in it. 22:48.000 --> 22:52.000 And we talk about it every day, and we talk in this day. 22:52.000 --> 22:55.000 She can do it often, come and see each other today. 22:55.000 --> 23:00.000 Because it can prevent updates being routed to the wrong object that tends to be in replace. 23:00.000 --> 23:03.000 It's also desirable sometimes. 23:03.000 --> 23:10.000 And I wonder if you've given any thought to, like, whether you could tune the semantics better or give different kinds of knobs, 23:10.000 --> 23:13.000 that is the right thing for many years. 23:13.000 --> 23:15.000 And you know, as a conversation, we had all the time. 23:15.000 --> 23:17.000 I'm curious if you can give up that tip. 23:17.000 --> 23:24.000 So what I was presenting here is based on the experiences I have with different people in different companies, 23:24.000 --> 23:25.000 right? 23:25.000 --> 23:31.000 And it turns out that more often people, you know, they are not interested in things like internals 23:31.000 --> 23:34.000 or facilities, as long as they are working 23:34.000 --> 23:38.000 as they expect. The problems are the things that they usually don't expect. 23:38.000 --> 23:42.000 And this is why I was describing this problem in the first place. 23:42.000 --> 23:48.000 I think that this is the more common case, when people are inserting things into the same place. 23:48.000 --> 23:53.000 And they expect that, you know, the things will magically resolve themselves without losing their data.
23:53.000 --> 23:55.000 Which is not what happens in Yjs. 23:55.000 --> 23:58.000 From what I know, it also doesn't happen in Loro. 23:58.000 --> 24:01.000 I think that it also doesn't happen in Automerge. 24:01.000 --> 24:02.000 But... 24:02.000 --> 24:03.000 What... 24:03.000 --> 24:04.000 How much does it retain the whole history? 24:04.000 --> 24:05.000 Is that everything? 24:05.000 --> 24:07.000 So by default, you still have the data? 24:07.000 --> 24:08.000 Yeah. 24:08.000 --> 24:09.000 But if... 24:09.000 --> 24:10.000 Yeah. 24:10.000 --> 24:11.000 This is the visibility challenge. 24:11.000 --> 24:12.000 So the same goes here. 24:12.000 --> 24:14.000 If you turn off the GC 24:14.000 --> 24:17.000 and you are willing to keep the whole history around, 24:17.000 --> 24:21.000 then you can go back to those previous variants of the document, right? 24:21.000 --> 24:23.000 But... 24:23.000 --> 24:29.000 You know, you cannot conduct conflict resolution on them. 24:29.000 --> 24:33.000 Right now, with the latest version of Yjs that is still in the works, 24:33.000 --> 24:37.000 there is a possibility to basically build data in Yjs. 24:37.000 --> 24:40.000 And that data can work like Git, 24:40.000 --> 24:41.000 basically, right? 24:41.000 --> 24:42.000 So you can rebase. 24:42.000 --> 24:48.000 You can rebase the changes onto some alternative history of updates, 24:48.000 --> 24:52.000 which includes cherry-picking the changes from a Y.Map that was deleted 24:52.000 --> 24:56.000 onto the Y.Map that won the conflict resolution. 24:56.000 --> 24:57.000 Very cool. 24:57.000 --> 24:58.000 Yeah. 24:58.000 --> 25:00.000 Let's see. 25:00.000 --> 25:01.000 Yeah. 25:01.000 --> 25:02.000 Thanks. 25:04.000 --> 25:06.000 Any other question in the room? 25:07.000 --> 25:08.000 Is your time? 25:08.000 --> 25:09.000 Yeah. 25:09.000 --> 25:10.000 Doctor? 25:10.000 --> 25:11.000 You won't have it again. 25:11.000 --> 25:14.000 You're the one.
25:14.000 --> 25:16.000 So we can wrap up. 25:16.000 --> 25:18.000 Thank you very much. 25:18.000 --> 25:19.000 Thanks. 25:19.000 --> 25:20.000 Thank you.