WEBVTT

00:00.000 --> 00:13.000
I want to welcome now Bartosz, who will be talking about taming your Yjs documents.

00:13.000 --> 00:15.000
Please welcome him with a round of applause.

00:15.000 --> 00:17.000
Thank you very much.

01:15.000 --> 01:25.000
Okay, so first, we'll be talking about how Yjs operates on the data that you are passing to it.

01:25.000 --> 01:33.000
What patterns and anti-patterns come out of it, because, as you might know,

01:33.000 --> 01:43.000
or not know, Yjs is one of the most popular libraries for writing collaborative apps, and it is mostly optimized for writing collaborative text editors.

01:43.000 --> 01:48.000
So this is like a standard for Yjs.

01:48.000 --> 01:54.000
The thing that we are optimizing for is collaborative text editors, and it is used, for example, in Evernote by now.

01:54.000 --> 02:00.000
If you are using Jupyter Notebooks in collaborative mode, you are using Yrs to work with them.

02:00.000 --> 02:06.000
If you are using, there are, let's say, plenty of Notion alternatives, right?

02:06.000 --> 02:17.000
In open source or in VC-funded companies, and quite a few of them are actually using Yjs or Yrs under the hood for the collaborative features.

02:17.000 --> 02:33.000
And from these experiences, we came across different and weird patterns, weird for us as creators, that people are using for working with Yjs documents, and we decided to address them in this talk.

02:33.000 --> 02:35.000
So yeah.

02:35.000 --> 02:41.000
And first, on the, oh, can you see the left one?

02:41.000 --> 02:48.000
Yeah, okay. So on the left there is Yjs, the standard Yjs API, let's say, and on the right is Yrs.

02:48.000 --> 02:57.000
So the latter is mimicking it in terms of behavior and is fully compatible in terms of binary format.

02:57.000 --> 03:02.000
So you can work between Yjs and Yrs interchangeably.

03:02.000 --> 03:08.000
And so with that, tip number one: keep writes sequential as long as possible.

03:08.000 --> 03:11.000
So what does it mean, sequential writing?

03:11.000 --> 03:25.000
And in order to deep dive into it, we need to explain a little bit about how Yjs and Yrs are storing your updates and attaching metadata for possible conflict resolution.

03:25.000 --> 03:34.000
When several people are writing in the same place, for example, we don't want them to overwrite each other's characters while typing, right? And things like that.

03:34.000 --> 03:43.000
So first, the core element of Yjs and Yrs is called a block, or an item.

03:43.000 --> 03:50.000
And all of the updates that you are making and that land in the Yjs doc are wrapped in this block structure.

03:50.000 --> 04:05.000
It has its identifier and its relationship to other blocks. When you are typing, inserting something at position n, let's say, we are actually storing the identifiers of the elements at positions n minus one and n plus one.

04:05.000 --> 04:15.000
So everything happens in terms of relationships, because indexes are relative to the person who is looking at them, and what is stored there changes over time.

04:15.000 --> 04:25.000
And we are using those IDs to basically fix the position in time and in space as a relationship between different items that you are putting in.
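
NOTE
A minimal sketch of why positions are stored as relationships between IDs rather than as indexes. This is a toy model written for this transcript, not the Yjs or Yrs implementation; the Char type, the insertAfter and render helpers, and the ID strings are all hypothetical.
```typescript
// Toy sequence where every character has a stable id (illustrative only).
interface Char { id: string; value: string; }
// Insert a character right after the one with id `leftId` (null means "at the start").
function insertAfter(seq: Char[], leftId: string | null, ch: Char): Char[] {
  let at = 0;
  if (leftId !== null) {
    for (let i = 0; i < seq.length; i++) {
      if (seq[i].id === leftId) at = i + 1;
    }
  }
  return seq.slice(0, at).concat([ch], seq.slice(at));
}
function render(seq: Char[]): string {
  return seq.map((c) => c.value).join("");
}
// Shared starting state "ac". Client 1 wants to put "b" between "a" and "c".
let chars: Char[] = [{ id: "0:0", value: "a" }, { id: "0:1", value: "c" }];
// Meanwhile client 2 prepends "x", so every index shifts by one.
chars = insertAfter(chars, null, { id: "2:0", value: "x" });
// Anchoring on the id of "a" keeps client 1's intent even though the index of "a" changed.
chars = insertAfter(chars, "0:0", { id: "1:0", value: "b" });
console.log(render(chars));
```
The ID-anchored insert yields "xabc"; replaying the same edit at the original index 1 would have produced "xbac".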

04:25.000 --> 04:39.000
So there is also the identifier of the parent collection and the identifier of the entry of the key-value map, because Yjs also supports key-value maps.

04:39.000 --> 04:50.000
And there is some extra metadata, and finally there is your update. The thing is that the block can be quite big because of that, because there are a lot of fields to it, right?

04:50.000 --> 04:58.000
And when you are typing characters, you are inserting them individually in your document basically.

04:58.000 --> 05:02.000
And every time you are inserting a character, we need to attach this metadata to it, right?

05:02.000 --> 05:12.000
So there are a lot of fields that have some weight in memory and on disk when you are storing them, and attaching them to every single character is expensive, right?

05:12.000 --> 05:27.000
So what we are doing is basically an operation called merging. The thing is that as long as you don't change the cursor position and you are typing one character after another, you are basically putting them in a sequential relationship.

05:27.000 --> 05:33.000
So, you know, we are preserving this sequential information, basically, like here.

05:33.000 --> 05:38.000
A2 knows that its left neighbor is A1.

05:38.000 --> 05:44.000
So we know that those two blocks are next to each other and that they were inserted one after another, basically.

05:44.000 --> 05:54.000
And once this is done, we can basically squash them together into a single block, which now holds two values, A and B.

05:54.000 --> 06:03.000
And the amount of metadata that we store is basically the same as if there was a single item, because metadata is stored per item.

06:03.000 --> 06:10.000
So technically those two notations are equivalent to each other, just one of them is compressed and the other one is not.
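
NOTE
A minimal sketch of the merging idea described above. It is a simplified toy model, not the real Yjs or Yrs code: the Block shape and the tryMerge and compact helpers are hypothetical, and the real libraries check more conditions (neighbours, parent collection, content type) before squashing.
```typescript
// Toy block: one set of metadata plus the characters it holds (illustrative only).
interface Block {
  client: number;  // id of the session that created the block
  clock: number;   // sequence number of the block's first character
  content: string; // characters stored in this block
}
// Squash b into a when b directly continues a: same client, and b starts where a ends.
function tryMerge(a: Block, b: Block): boolean {
  if (a.client === b.client && a.clock + a.content.length === b.clock) {
    a.content += b.content;
    return true;
  }
  return false;
}
// Walk the blocks in document order and merge every run of sequential blocks.
function compact(blocks: Block[]): Block[] {
  const out: Block[] = [];
  for (const b of blocks) {
    const last = out.length > 0 ? out[out.length - 1] : undefined;
    if (last === undefined || !tryMerge(last, b)) {
      out.push({ client: b.client, clock: b.clock, content: b.content });
    }
  }
  return out;
}
const typed: Block[] = [
  { client: 1, clock: 0, content: "A" },
  { client: 1, clock: 1, content: "B" }, // typed right after "A" in the same session
  { client: 2, clock: 0, content: "x" }, // another client: cannot be merged
];
const merged = compact(typed);
console.log(merged.length, merged[0].content);
```
Three inserted characters end up as two blocks, and the first block holds "AB" under a single set of metadata.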

06:11.000 --> 06:18.000
Now, how can we keep this sequentiality, this ability to merge those blocks together?

06:18.000 --> 06:26.000
First, they need to happen in the same session, because every time you create a new doc, by default it creates a new client ID.

06:26.000 --> 06:38.000
And it's very important that no two processes write under the same client ID at the same time, because then we have duplicate IDs with different elements stored under them, and this would cause problems.

06:38.000 --> 06:53.000
So we need to keep in mind that those client IDs need to be used by one process at a time, which is hard for browsers, because this means that basically every time you are opening the document in a new browser tab, you are generating a new client ID.

06:54.000 --> 07:01.000
Well, you can basically recycle the same client ID over and over, which is why Yrs is so great also.

07:01.000 --> 07:07.000
Another is that they need to happen in the same collection, because those merges happen within the same collection boundaries.

07:07.000 --> 07:12.000
Another: if you are using a Y.Map, they need to happen in the same key-value entry.

07:12.000 --> 07:27.000
So basically updating one key-value entry and then another is similar to changing the cursor position. So we are breaking the sequentiality of those updates, basically, and I will explain it a little bit more later on.

07:27.000 --> 07:36.000
And yeah, if you are doing this on text, as long as you don't move the cursor position and keep typing one character after another, we can merge those together.

07:36.000 --> 07:52.000
It also means that you can push more than one character at a time, right? You can basically copy-paste an entire book, put it in as plain text, and it counts as a single block with a single set of metadata, basically.

07:52.000 --> 08:06.000
And also they need to store the same type of object, which is why I said copy-paste as plain text, because if you copy-paste formatted text, then it will be broken into parts with the formatting attributes in between.

08:06.000 --> 08:15.000
So this is also pretty important. So how does this sequentiality behave? Let's see an example. We are creating a Y.Map.

08:15.000 --> 08:24.000
In one example, we are basically updating entries A and B in turns, 100,000 times. In the other, we are updating them sequentially.

08:24.000 --> 08:46.000
First, A 10,000 times, then B 10,000 times. And let's encode those documents and see how big they are. So the first one is two megabytes, basically. The second one is 77 bytes. So where does the difference come from? Well, first of all.

08:46.000 --> 09:13.000
There's also a second size, because Yjs is able to encode in one of two variants. Version V2 is not used by default by any of the default attachments to Yjs, I mean the persistence providers or network providers, but you can use it. And in that case, even this big update would still be 74 bytes.

09:13.000 --> 09:35.000
That is because of how well Yjs can compress the information in the V2 setting. However, here V2 is a little bit bigger, 79 bytes. So basically the rule of thumb is to use V1 for incremental updates, like single characters, small packages of data.

09:35.000 --> 09:48.000
And for big chunks, like, you know, the whole document state, V2 is quite a bit better. So where does the difference come from, right? Those 77 bytes versus two megabytes.

09:48.000 --> 10:12.000
So this is how map sequentiality works when it works as intended, right? We are updating the same entries over and over again. In Yjs, basically every Y.Map is a set of lists, I would say, where the value is the last element of the list for the given entry.

10:12.000 --> 10:23.000
So we are inserting a new element, and then we are inserting a new element again, and we are deleting the previous value of that entry.

10:23.000 --> 10:37.000
And here is the thing: Yjs doesn't really have updates, it has inserts. You can insert a block or delete it. Once it's deleted, you can no longer access it, with some caveats.

10:37.000 --> 10:56.000
So when you are, again, inserting the next element, you'll see that we are adding the next block and deleting the previous one, and as we can see, N0 and N1 are the block IDs of deleted elements. They are sequential, happening one after another, so we can merge them together into a single block.

10:56.000 --> 11:06.000
Okay. And this way, we can basically squash those hundreds of thousands of updates into several blocks.

11:06.000 --> 11:22.000
However, when we are inserting elements, you know, one after another into different entries, we are breaking the sequentiality, so we can no longer merge those updates together, and we need to basically save them one by one.
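
NOTE
A rough sketch of why interleaved map writes cannot be squashed while sequential ones can. This is a toy model under simplifying assumptions, not the actual Yjs or Yrs logic: the WriteEntry shape and the countBlocks helper are hypothetical, and real block sizes are not modelled, only block counts.
```typescript
// Toy history of one client's writes into a map (illustrative only).
interface WriteEntry {
  key: string;      // map key this write belongs to
  clock: number;    // per-client sequence number of the write
  deleted: boolean; // true once a later write to the same key replaced it
}
function countBlocks(writes: string[]): number {
  const log: WriteEntry[] = [];
  const live: { [key: string]: WriteEntry } = {};
  writes.forEach((key, clock) => {
    const previous = live[key];
    if (previous !== undefined) previous.deleted = true; // an "update" is an insert plus a delete
    const entry: WriteEntry = { key: key, clock: clock, deleted: false };
    live[key] = entry;
    log.push(entry);
  });
  // Deleted neighbours that belong to the same key squash into a single block.
  let blocks = 0;
  for (let i = 0; i < log.length; i++) {
    const prev = i > 0 ? log[i - 1] : undefined;
    const cur = log[i];
    const mergeable = prev !== undefined && prev.deleted && cur.deleted && prev.key === cur.key;
    if (!mergeable) blocks++;
  }
  return blocks;
}
const n = 1000;
const interleaved: string[] = [];
const sequential: string[] = [];
for (let i = 0; i < n; i++) { interleaved.push("a", "b"); }
for (let i = 0; i < n; i++) { sequential.push("a"); }
for (let i = 0; i < n; i++) { sequential.push("b"); }
console.log(countBlocks(interleaved), countBlocks(sequential));
```
With 1,000 writes per key, the interleaved order keeps 2,000 separate blocks in this model, while the sequential order collapses to 4 (one merged run of deleted values plus one live value per key).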

11:22.000 --> 11:40.000
This is where those two megabytes come from. So one of the ways to solve it, honestly: if you don't need to, you know, resolve conflicts on an entry-by-entry basis, it's better to just save entire JavaScript objects or JSON objects.

11:40.000 --> 11:58.000
And this way, our output size is 59 bytes, which is even lower than the sequential case previously, as long as you don't need to update A and B in a collaborative manner, right?

11:58.000 --> 12:17.000
Now, tip number two: not every object needs to be collaborative, going back to the same idea. For example, let's say we are using Yjs to update a vehicle position, right? So we are representing the position as a Y.Map with latitude and longitude, and two clients are updating it concurrently.

12:17.000 --> 12:34.000
The problem is that if two of them are updating concurrently, and we cross-sync them with one another, what we can end up with is a consistent but invalid state, because we are basically resolving conflicts on an entry-by-entry basis, right?

12:34.000 --> 13:01.000
So one client might win on one entry, another might win on another entry, producing a result that is not satisfactory for us. So the idea here is that we don't use Y.Maps when we don't need them. It's just better to use simple JSON objects, because a JSON object is treated as a single entry, and we resolve it as, basically, an atomic value conflict, right? So this is another one.
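
NOTE
A small sketch of the difference between entry-by-entry and atomic conflict resolution. It is an illustration only: the "priority" tag is a hypothetical stand-in for whatever deterministic rule picks a winner, and the LatLng type, the pick helper and the two sample positions are assumptions made for this example, not part of Yjs or Yrs.
```typescript
// A value plus a hypothetical tag that decides which concurrent write wins.
interface Tagged<T> { value: T; priority: number; }
interface LatLng { lat: number; lng: number; }
function pick<T>(a: Tagged<T>, b: Tagged<T>): Tagged<T> {
  return a.priority >= b.priority ? a : b;
}
// Client 1 reports one position and client 2 reports another, concurrently.
const warsaw: LatLng = { lat: 52.23, lng: 21.01 };
const brussels: LatLng = { lat: 50.85, lng: 4.35 };
// Entry by entry: every field is resolved on its own, so the winners can differ.
const perField: LatLng = {
  lat: pick({ value: warsaw.lat, priority: 2 }, { value: brussels.lat, priority: 1 }).value,
  lng: pick({ value: warsaw.lng, priority: 1 }, { value: brussels.lng, priority: 3 }).value,
};
// Atomic: the whole object is one entry, so exactly one client's position survives.
const atomic: LatLng = pick({ value: warsaw, priority: 2 }, { value: brussels, priority: 3 }).value;
console.log(perField, atomic);
```
The per-field result mixes the latitude of one report with the longitude of the other, a position nobody reported; the atomic result is always one of the two original positions.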

13:01.000 --> 13:26.000
So basically, this is a common pattern when people are building something like a CRUD application. So what are they doing? Well, there is a user form that we can turn into JSON; we update the data from that JSON and put it back again by basically saving all of the JSON data field by field into a Yjs Y.Map, let's say, right?

13:26.000 --> 13:36.000
The thing is that every time you are writing something to a Y.Map, you are creating a new block, even if the value that was there previously is the same.

13:36.000 --> 13:47.000
It's a way of saying: I really, really want this to have this value, so that in case of concurrent conflicts my value can still fight against the value from another person.

13:47.000 --> 13:55.000
But the problem is that we are writing to two or three entries while only one was changed, and this means that we are also breaking sequentiality.

13:55.000 --> 14:02.000
So we are basically blowing up the document size by inserting the same value over and over again.

14:02.000 --> 14:10.000
So of course, it's better to only write the values that changed and never write the values that didn't change.
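
NOTE
One possible way to follow this advice, sketched without any library. The MapLike interface is a hypothetical minimal shape (a Y.Map offers get and set methods along these lines), and FakeMap and writeChanged are stand-ins invented for this example. Only primitive field values are compared here.
```typescript
// Minimal map interface assumed by the sketch.
interface MapLike {
  get(key: string): unknown;
  set(key: string, value: unknown): void;
}
// Write only the fields whose value differs from what is already stored.
function writeChanged(target: MapLike, form: { [key: string]: string | number | boolean }): number {
  let writes = 0;
  for (const key of Object.keys(form)) {
    if (target.get(key) !== form[key]) {
      target.set(key, form[key]);
      writes++;
    }
  }
  return writes;
}
// Plain in-memory stand-in so the sketch runs on its own.
class FakeMap implements MapLike {
  private data: { [key: string]: unknown } = {};
  get(key: string): unknown { return this.data[key]; }
  set(key: string, value: unknown): void { this.data[key] = value; }
}
const user = new FakeMap();
const first = writeChanged(user, { name: "Alice", age: 28, email: "alice@example.com" });
const second = writeChanged(user, { name: "Alice", age: 29, email: "alice@example.com" });
console.log(first, second);
```
The first save writes all three fields; the second save, where only the age changed, performs a single write instead of three.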

14:10.000 --> 14:31.000
Next, collections like Y.Maps can erase each other. So this is a problem that most people run into when they are saying: I've made an application in Yjs, and now I've made some changes here and here, and my data disappeared, the sync was broken, I've lost my data, right?

14:31.000 --> 14:44.000
Let's say that we have the following: we have a map of users, and we are trying to get a map representing a person, Alice. This is a collaborative map, but it might not exist, because it was not initialized yet.

14:44.000 --> 14:55.000
So if it doesn't exist, we create a new Y.Map for Alice, insert it into the users Y.Map, the parent Y.Map, and then fill it with some data, right?

14:55.000 --> 15:09.000
Then let's say we have two documents that represent concurrent updates. On one, we are basically adding age equal to 28; on the other one, we are adding email. We are applying the updates,

15:09.000 --> 15:21.000
syncing those documents together, and the result is that they both have the same value, but neither of them has the email value. And why does it happen?

15:21.000 --> 15:35.000
Basically, Yjs has two levels of types. The root-level types, created under the document itself, are identified by the string name that you are giving them; in this case it's users.

15:35.000 --> 15:49.000
But if you are inserting Y.Maps under those collections, they are getting a block ID as their identifier, and this means that on every insert a new block ID is created,

15:49.000 --> 16:03.000
and this Y.Map, even though in our heads it represents the same element, the same entity, on the Yrs side or on the Yjs side these are two different elements, and as such they are basically competing with each other.

16:03.000 --> 16:15.000
So when the conflict resolution happens, we see that under Alice we have one object that has this ID and another object that has that ID. One of those objects will win, and only that one will preserve its values.

16:15.000 --> 16:23.000
If the ID is the same, because we synchronized first before inserting something from another client, then all is fine, really.
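
NOTE
A toy sketch of the effect described above, not the Yjs or Yrs implementation. The Nested type, the mergeEntry helper and the ID strings are hypothetical, and "the larger ID wins" is an arbitrary rule chosen only to make the example deterministic.
```typescript
// Toy nested map: the block id it got on insert plus the entries written into it.
interface Nested {
  id: string;
  fields: { [key: string]: string };
}
// Resolve two replicas' values for the same entry of the parent map.
function mergeEntry(a: Nested, b: Nested): Nested {
  if (a.id === b.id) {
    // Same nested map on both sides: their fields are merged.
    const fields: { [key: string]: string } = {};
    for (const k of Object.keys(a.fields)) fields[k] = a.fields[k];
    for (const k of Object.keys(b.fields)) fields[k] = b.fields[k];
    return { id: a.id, fields: fields };
  }
  // Different ids: two competing values, one wins as a whole and the other is dropped.
  return a.id > b.id ? a : b;
}
// Both replicas created their own "alice" map before syncing.
const lost = mergeEntry(
  { id: "2:0", fields: { name: "Alice", age: "28" } },
  { id: "1:0", fields: { name: "Alice", email: "alice@example.com" } }
);
// One replica created "alice"; the other synced first and then wrote into the same map.
const kept = mergeEntry(
  { id: "1:0", fields: { name: "Alice", age: "28" } },
  { id: "1:0", fields: { name: "Alice", email: "alice@example.com" } }
);
console.log(Object.keys(lost.fields), Object.keys(kept.fields));
```
In the first case the email is gone on both replicas after syncing; in the second case both age and email survive, because both writes went into the same nested map.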

16:23.000 --> 16:35.000
So, root-level types are mostly, since we have a little bit of time, they are mostly for things like schema. If you are a database developer, think of tables in SQL, right?

16:35.000 --> 16:41.000
So it's something that is expected to be there already, and we are writing our application logic against that schema.

16:41.000 --> 16:57.000
Whereas nested types are just entities. If there is a possibility of conflict, you need to have one peer that is the first initializer of that element, and then, for example, refer to it by its unique ID, right?

16:57.000 --> 17:07.000
An ID that you have control over in this case, something like a UUID, or any kind of ID that makes sense in your business logic, I'd say.

17:07.000 --> 17:25.000
But why did we make that peculiar choice? Like, why can Y.Maps delete each other? It's not very safe to lose user data, but there is a reason for that, and one is that Y types can be used to basically stabilize data.

17:25.000 --> 17:33.000
Let's say, this is an example where three people are collaborating with each other and making changes in the document, but one of them is offline, right?

17:33.000 --> 17:53.000
And there comes a time for a peer review. They are reviewing the document, those changes are fine, okay, the document is accepted. Now we want to freeze the state of the document, but then the third person comes in after a long period of being offline, and they are also editing this document, and they didn't

17:53.000 --> 18:09.000
know that it was peer reviewed already, right? But anyway, their changes are jumping into it, right? So this is a problem, and basically a new Y type can be used to stabilize the data at that moment.

18:09.000 --> 18:19.000
And the other is garbage collection, because the biggest problem with document updates is that when we are deleting data, garbage starts to pile up, and it is present in the document state,

18:19.000 --> 18:30.000
as we've seen before with those two-megabyte updates, right? So to close, Yjs has three different ways of representing deleted data.

18:30.000 --> 18:43.000
One is basically a bit marker that is used when you turn garbage collection off in the Yjs document when you are creating it. It is necessary for things like snapshotting, time-travel debugging, sort of time travel, and the undo/redo manager.

18:43.000 --> 18:55.000
You can undo and redo operations because we keep that data; we cannot just delete it. Another one, the default one, is that we are deleting the user data and keeping only the information about how many elements were in there,

18:55.000 --> 19:05.000
but we still keep the metadata around, because one person can delete a fragment of text, but two other people could possibly insert something in that deleted range,

19:05.000 --> 19:10.000
and they are still subject to conflicts, so we still need this metadata to resolve those conflicts.

19:10.000 --> 19:24.000
And the third one is when the parent collection is deleted. It's deleted forever, basically, because once some element is deleted, you can no longer access it. You can create a new one, of course, and put it on top of it.

19:24.000 --> 19:37.000
But this is another collection. Once the collection is deleted, all of the blocks that were referring to it are also deleted and no longer accessible, there is no need for conflict resolution, and we can just put a very simple, very small placeholder in their place.
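
NOTE
The three representations of deleted data described above can be sketched as a tagged union. This is a hedged illustration only: the Meta and DeletedBlock types, their field names and the represent helper are hypothetical and do not mirror the internal structures of Yjs or Yrs.
```typescript
// Conflict-resolution metadata a block carries in this toy model.
interface Meta { id: string; left: string | null; right: string | null; parent: string; }
type DeletedBlock =
  | { kind: "flagged"; meta: Meta; content: string }     // GC off: data kept, only marked as deleted
  | { kind: "tombstone"; meta: Meta; length: number }    // GC on: data dropped, metadata and length kept
  | { kind: "placeholder"; id: string; length: number }; // parent deleted: nothing left to resolve
// Decide how a deleted block is represented.
function represent(meta: Meta, content: string, gc: boolean, parentDeleted: boolean): DeletedBlock {
  if (!gc) return { kind: "flagged", meta: meta, content: content };
  if (parentDeleted) return { kind: "placeholder", id: meta.id, length: content.length };
  return { kind: "tombstone", meta: meta, length: content.length };
}
const blockMeta: Meta = { id: "1:0", left: null, right: "1:5", parent: "text" };
const withoutGc = represent(blockMeta, "hello", false, false);
const withGc = represent(blockMeta, "hello", true, false);
const parentGone = represent(blockMeta, "hello", true, true);
console.log(withoutGc.kind, withGc.kind, parentGone.kind);
```
Each step keeps less: the first variant still holds the content, the second only the metadata and a length, and the third only an ID and a length.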

19:37.000 --> 19:46.000
And you can use it, this is like the final slide, so you can use it basically to stabilize your data.

19:46.000 --> 19:53.000
So, for example, here we are doing thousands of insert and delete operations at random positions.

19:53.000 --> 19:59.000
This is the string that I've generated from it. I can call toJSON on it, which basically returns a plain string.

19:59.000 --> 20:10.000
I can then encode the document as an update. For this string, the update is six thousand bytes, because of those thousands of operations that we had to store, and they were at random positions, so probably no merge happened.

20:10.000 --> 20:21.000
But you can create a new Y.Text object, which is the Yjs element used for rich text editing, and initialize it with the data from the previous text.

20:22.000 --> 20:50.000
Now, anything that happened in the concurrent updates in between those will be erased, because text two will erase text one and all of the updates that happened on the original text, which is bad. But if you are willing to accept it, because you want to have a stable version of the document, then you can encode it again, and now it counts as a single string of characters, which after update encoding

20:51.000 --> 21:10.000
has 87 bytes in the document size, because we got rid of the previous text, all of its blocks, and all of the metadata that was associated with them, so we no longer need that information, which was basically six kilobytes long, right?

21:10.000 --> 21:16.000
So that was very fast, but I hope that you managed to keep up with me.

21:16.000 --> 21:25.000
So, if you want, those are the references. I'm writing mostly about Yrs, because this is my field, but if you want to see how Yjs conflict resolution works,

21:25.000 --> 21:30.000
and this is like a tutorial, so you can write it yourself, this is the last link.

21:30.000 --> 21:39.000
If you are interested in more details about how Yjs and Yrs work under the hood, there are two different links, and one on how to write a quite performant

21:39.000 --> 21:49.000
CSV table, depending on how big a CSV we are talking about, but this one shows that it can work up to a certain point for sure.

21:49.000 --> 21:51.000
So thank you.

21:51.000 --> 22:01.000
Thank you so much, Bartosz, for your talk.

22:01.000 --> 22:09.000
We're going to take questions in the room. On the net, nothing. Please raise your hand if you want to ask.

22:09.000 --> 22:16.000
It's the moment, or, if everything was clear, we don't take questions.

22:16.000 --> 22:18.000
Okay, then.

22:22.000 --> 22:30.000
Sorry, originally this talk was set for like an hour, at a slower and much better pace than that.

22:30.000 --> 22:37.000
No, really, we've very often used a much, I wouldn't throw much about why I'm trying to do that.

22:37.000 --> 22:43.000
One burial challenge that we had a lot of work to do, I see previous problems.

22:43.000 --> 22:45.000
You know, we didn't know why, once I write in it.

22:45.000 --> 22:48.000
You play a big life on dinner, all your sort of fitted in it.

22:48.000 --> 22:52.000
And we talk about it every day, and we talk in this day.

22:52.000 --> 22:55.000
She can do it often, come and see each other today.

22:55.000 --> 23:00.000
Because it can prevent updates being routed to the wrong object that tends to be in replace.

23:00.000 --> 23:03.000
It's also desirable sometimes.

23:03.000 --> 23:10.000
And I wonder if you've given any thought to, like, whether you could tune the semantics better or give different kinds of knobs,

23:10.000 --> 23:13.000
that is the right thing for many years.

23:13.000 --> 23:15.000
And you know, as a conversation, we had all the time.

23:15.000 --> 23:17.000
I'm curious if you can give up that tip.

23:17.000 --> 23:24.000
So what I was presenting here is based on the experiences I have with different people in different companies,

23:24.000 --> 23:25.000
right?

23:25.000 --> 23:31.000
And it turns out that more often people, you know, they are not interested in things like internal

23:31.000 --> 23:34.000
facilities, as long as they are working

23:34.000 --> 23:38.000
as they expect. The problem is that they usually don't expect that.

23:38.000 --> 23:42.000
And this is why I was describing this problem in the first place.

23:42.000 --> 23:48.000
I think that this is the more common case, when people are inserting things into the same place.

23:48.000 --> 23:53.000
And they expect that, you know, the things will magically resolve themselves without losing their data.

23:53.000 --> 23:55.000
Which is not what happens in Yjs.

23:55.000 --> 23:58.000
From what I know, it also doesn't happen in Loro.

23:58.000 --> 24:01.000
I think that it also doesn't happen in Automerge.

24:01.000 --> 24:02.000
But...

24:02.000 --> 24:03.000
What...

24:03.000 --> 24:04.000
Automerge does retain the whole history?

24:04.000 --> 24:05.000
Is that everything?

24:05.000 --> 24:07.000
So by default, you still have the data?

24:07.000 --> 24:08.000
Yeah.

24:08.000 --> 24:09.000
But if...

24:09.000 --> 24:10.000
Yeah.

24:10.000 --> 24:11.000
This is the visibility challenge.

24:11.000 --> 24:12.000
So the same goes here.

24:12.000 --> 24:14.000
If you turn off the GC

24:14.000 --> 24:17.000
And you are willing to store the whole history around.

24:17.000 --> 24:21.000
Then you can go back to those previous variants of the document, right?

24:21.000 --> 24:23.000
But...

24:23.000 --> 24:29.000
You know, you cannot conduct conflict resolution on them.

24:29.000 --> 24:33.000
Right now, with the latest version of Yjs that is still in the works,

24:33.000 --> 24:37.000
there is a possibility to basically build data in Yjs.

24:37.000 --> 24:40.000
And that data can work like Git.

24:40.000 --> 24:41.000
Basically, right?

24:41.000 --> 24:42.000
So you can rebase.

24:42.000 --> 24:48.000
You can rebase the changes onto some alternative history of updates,

24:48.000 --> 24:52.000
which includes cherry-picking the changes from the Y.Map that was deleted

24:52.000 --> 24:56.000
onto the Y.Map that won the conflict resolution.

24:56.000 --> 24:57.000
Very cool.

24:57.000 --> 24:58.000
Yeah.

24:58.000 --> 25:00.000
Let's see.

25:00.000 --> 25:01.000
Yeah.

25:01.000 --> 25:02.000
Thanks.

25:04.000 --> 25:06.000
Any other question in the room?

25:07.000 --> 25:08.000
Is your time?

25:08.000 --> 25:09.000
Yeah.

25:09.000 --> 25:10.000
Doctor?

25:10.000 --> 25:11.000
You won't have it again.

25:11.000 --> 25:14.000
You're the one.

25:14.000 --> 25:16.000
So we can wrap up.

25:16.000 --> 25:18.000
Thank you very much.

25:18.000 --> 25:19.000
Thanks.

25:19.000 --> 25:20.000
Thank you.