WEBVTT

00:00.000 --> 00:14.880
Yep, okay, welcome here. I can hear myself. So hi, my name is Hubert, you might know me

00:14.880 --> 00:23.600
online as you are egg. I'm on the crypto team at Eleman, where crypto means encryption,

00:23.600 --> 00:32.800
and I'm also on the matrix spec core team. But my talk isn't going to be about matrix,

00:32.800 --> 00:40.720
so if you're came here trying to find out what we're going to do about MOS on matrix, sorry,

00:41.920 --> 00:50.400
I'll talk a little bit about it, but not much. I did some investigation into decentralizing

00:50.400 --> 01:00.240
MOS for matrix, and I wrote a partial experimental MOS implementation in TypeScript,

01:01.840 --> 01:14.800
which was not very fun. So what is MOS for messaging layer security? So it is an ITF standard.

01:15.280 --> 01:20.400
It's RFC 9420 for enter and encryption in instant messaging.

01:24.720 --> 01:29.520
And it can be it's intended to apply to many different systems, so

01:32.560 --> 01:42.080
so any or many different messaging platforms can use it. It encrypts messages, obviously,

01:42.080 --> 01:50.480
if it's intended for messaging. And also handshake messages, which are the messages used for

01:50.480 --> 01:57.760
controlling the MOS group, can be either encrypted or in clear text, although the sensitive

01:57.760 --> 02:05.520
parts are always encrypted. So that means that for example, you have the option of either hiding

02:06.480 --> 02:12.240
the group membership from the server, or the server can see the group membership, so that it

02:12.240 --> 02:22.480
can apply its own access controls or whatnot. It ensure its consistent group state, so it

02:25.120 --> 02:34.320
controls the group membership. So everyone in the room can be assured that they

02:36.480 --> 02:44.000
that everyone knows who's in the room. And then there is also this concept of group context,

02:44.000 --> 02:51.600
which is sort of you can shove arbitrary data into the group, and then everyone in the group agrees

02:51.600 --> 02:58.640
what's in the group context. It provides sort of the normal things that you expect from

02:59.600 --> 03:06.720
and turned encryption in messaging. So forward, you can see post compromise security.

03:06.720 --> 03:16.480
The one thing it does not provide is deniability. So there is some work on trying to do that,

03:16.480 --> 03:27.760
but it's not in the standard. And updates are intended to be fast. So order log n,

03:28.480 --> 03:34.880
work where n is the number of members in the group. On average, there's a big

03:34.880 --> 03:44.240
asterisk there because it really depends on how the group is managed. You can get into a

03:44.240 --> 03:53.280
really bad state where there's lots of holes in the tree and you end up having to do order and

03:53.280 --> 04:02.400
work anyways. But the intention is that the common case is that it's order log n. Though

04:02.400 --> 04:11.520
inviting someone still requires a linear time work. So how does MLS work? So I am only going to

04:11.520 --> 04:19.360
focus on a very tiny part of MLS because I don't have two days to talk about it.

04:19.920 --> 04:30.880
So the part I'm going to talk about is the MLS epochs. And so an MLS group goes through

04:31.920 --> 04:40.400
different epochs and each epoch has a secret associated, which is called an epoch secret.

04:41.360 --> 04:47.200
From the epoch secret, you derive other secrets. For example, encryption keys.

04:49.520 --> 04:55.520
I believe a message authentication code keys are all sorts of things.

04:57.040 --> 05:04.880
And the epoch secret, each epoch secret is derived from the previous epoch secret and

05:05.760 --> 05:11.600
a secret that you get from the commit message that created the epoch. And then you sort of

05:12.160 --> 05:22.080
when you get a new epoch, then you throw away the old epochs. So this is a diagram that comes from the

05:22.160 --> 05:32.800
RFC. So this sort of shows that you have the epoch secret. You derive, well, this just shows

05:32.800 --> 05:40.000
the encryption secret. But you derive a bunch of other secrets from it. You use the encryption secret

05:40.000 --> 05:51.840
to derive keys from the secret. The secret key you use to decrypt messages. So when you get in

05:51.920 --> 05:59.920
a message that creates a new epoch, you decrypt it. And then that gives you the commit secret.

05:59.920 --> 06:06.800
You combine it with the previous epochs secret. And then you get the next epochs secret.

06:10.800 --> 06:18.960
So in normal MLS, commit messages are ordered. So you have epoch number zero and then epoch

06:18.960 --> 06:27.440
number one. And then epoch number two. And you only apply one commit to each epoch.

06:28.480 --> 06:35.600
And they're all in a linear order. You have a delivery service that enforces the order.

06:35.600 --> 06:44.240
So that everyone agrees that this is the commit that came, that modifies epoch zero.

06:44.240 --> 06:51.920
And then I get epoch one. So everyone is in sync. But what happens if you don't have a single

06:51.920 --> 06:58.960
delivery service to order them. And so you can't ensure that everyone sees the commit messages

06:58.960 --> 07:07.200
in the same order. So there are two different approaches that have been published as ITF drafts.

07:08.080 --> 07:16.720
There's decentralized MLS, also called DMLS. And this was written by people from

07:17.680 --> 07:24.080
Phoenix R&D. And it was based on work by people from Amazon who were

07:26.480 --> 07:35.200
formerly wicker people, but they got bought out by Amazon. And then there's also distributed MLS,

07:35.280 --> 07:44.320
also known as DMLS. And this was written by people from German at work and researchers from the

07:44.320 --> 07:51.760
US Naval Postgraduate School. So I'm going to give a brief introduction for both of them and then do

07:51.760 --> 08:01.200
some comparisons. Decentralized MLS. So instead of having a linear order of epochs,

08:01.920 --> 08:11.680
it allows you to have multiple descendants. So you can have multiple senders modifying the same

08:11.680 --> 08:19.840
epoch. And then so you get sort of a tree of epochs rather than lying. I use it some fancy

08:19.840 --> 08:26.560
cryptography called puncturable pseudo random functions. So you can't redevive an epoch secret

08:27.440 --> 08:34.880
to preserve forward secrecy. So here's a little diagram that I drew up. The circles are epochs.

08:36.000 --> 08:44.720
The labels on the lines are the people who created the epochs. So we start off with a single epoch.

08:44.720 --> 08:51.120
And then L.S. and Bob try to do something to the group at the same time. So maybe they want to

08:51.120 --> 09:00.320
update the wrong keys in the group. They happen at the same time. So it's not like Alice sends

09:02.400 --> 09:08.320
her commit and then Bob sees Alice's commit and then sends his own. They happen at the same time.

09:08.320 --> 09:18.640
So they both modify the same epoch. So then there's a fork in the tree. And then Carol wants to

09:18.640 --> 09:28.800
send her own commit. She has to decide which epoch to base her commit off of, say she decides to

09:28.800 --> 09:40.160
base it off of Alice's epoch and then so she creates her commit based on that. So yeah, when Alice sends

09:40.240 --> 09:48.800
her commit, then she will show send it to the group. Everyone will see the group. And then so

09:48.800 --> 10:04.000
everyone won't drive of these the three new epochs. So one of the problems with just doing this

10:04.080 --> 10:15.200
with MLS is that so say after Alice drives for epoch, then she still needs to keep around

10:16.080 --> 10:26.080
the epochs secret for the original commit because after Bob sends his commit, then she needs

10:26.080 --> 10:33.760
the epochs secret from the original commit to drive Bob's commit. And that poses a problem for

10:33.760 --> 10:42.960
forward secrecy because you're holding on to chemotherapy for longer than you need to or longer

10:42.960 --> 10:51.200
than you should. And so if someone compromises Alice's client, then they can get the epochs secret

10:51.200 --> 10:58.160
and then replay Alice's commit and then re-derive this whole epoch and then decrypt

10:58.160 --> 11:06.080
all messages using the secret. So that's where this puncturable pseudo random functions comes in.

11:07.200 --> 11:12.640
So if you know pseudo random function is sort of function, you give it some input and then it

11:12.640 --> 11:21.280
spits out some pseudo random data, the puncturable. So if you give it the same data, it'll give you

11:21.280 --> 11:32.400
the same pseudo random data. The puncturable part is once you've given it the input data once,

11:32.400 --> 11:39.920
then you can't give it the same data again. So it can only derive the pseudo random data once

11:41.680 --> 11:48.480
and it's a bit of cryptomagic. I'm not going to try to explain it because I don't understand it

11:48.480 --> 11:58.240
either. It just works. So with this approach, there are a few open questions. So how do you manage

11:59.200 --> 12:08.240
the forks? So if you remember here, we had Alice and Bob's commits. So Carol needs to decide which

12:08.320 --> 12:18.400
commit to use. So Carol uses Alice's, she decided to use Alice's commit, you know,

12:18.400 --> 12:27.840
woods that the best choice. And then what are we going to do about Bob's commit? And then another

12:27.840 --> 12:38.240
question is, now that we have, now that we have multiple epochs that we need to worry about,

12:39.280 --> 12:45.840
how do we make sure that everything's consistent for example membership. So say if on the left

12:45.840 --> 13:01.440
Alice invited her friend Dan and Bob invited Emily, you know, so then on one side of the tree,

13:01.440 --> 13:08.960
we have different memberships. How do we combine them? So this sort of goes into my, the work

13:08.960 --> 13:17.440
that I did, which I also called decentralized MLS, another DMLS, but I spelled it with an S instead

13:17.440 --> 13:28.320
of a Z. So if you get confused, yeah. So it was just a way to decide which epoch to use, how to copy

13:28.320 --> 13:35.600
updates from one epoch to another. It sort of assumes that we have some sort of algorithm to

13:35.680 --> 13:43.280
determine membership in group state. So in matrix, we have the state resolution algorithms

13:44.160 --> 13:53.120
that we run in matrix currently that's done on the server side. And when I was doing my experimentation,

13:53.120 --> 14:02.240
I just used, I just used what the server had told the client to use for the membership,

14:02.240 --> 14:08.400
but there's no reason, well, other than complexity that we couldn't also run the state resolution

14:08.400 --> 14:20.320
algorithm on the client. So some downsides to decentralized, decentralized with a ZMLS.

14:23.520 --> 14:30.240
So it potentially requires a lot of storage because you need to keep all the previous epochs

14:30.560 --> 14:43.680
or information about the previous epochs in order to handle forks. So there may be ways of

14:45.280 --> 14:52.160
cleaning things up if you have sort of a time window that you accept old epochs from, but it's

14:52.640 --> 15:00.320
that's something that would need to be figured out. Managing forks can be complicated. So if you

15:00.320 --> 15:07.440
have a lot of forks, then you sort of have to look at all of them, figure out what information

15:07.440 --> 15:16.000
you need to take from each of them and so forth. And it's sort of an open question. The decentralized

15:16.000 --> 15:29.760
MLS draft, the ITF draft doesn't really deal with any of these issues. It just describes

15:29.760 --> 15:36.560
what messages and modifications you need in MLS in order to do those things, but it doesn't

15:36.640 --> 15:49.200
describe how to do all this management. So the other DMLS distributed MLS takes a completely different

15:50.560 --> 15:59.920
approach where it says that each sender will maintain their own MLS group.

16:00.800 --> 16:07.840
And so each sender sort of acts as their own delivery service because they're the only person

16:07.840 --> 16:19.520
who can commit to make commits to their own group. But you can also look at other people's groups

16:19.520 --> 16:30.640
and pulling information from their group. So for example, if Alice updates her key in her group,

16:30.640 --> 16:38.960
then Bob can look at that update and then update Alice's key in his group so that

16:38.960 --> 16:52.080
the cryptographic information is up to date. It sort of sidesteps the question of how do

16:52.080 --> 17:02.960
maintain consistent membership by saying that each sender decides on their own who's going to

17:02.960 --> 17:16.640
be a member of their group. So there is less of a need to have a consistent membership between groups.

17:20.160 --> 17:27.280
So downsides to this approach is that you lose the log-and performance of MLS.

17:27.360 --> 17:34.160
So remember at the beginning I said that the log-and performance depends on how you manage

17:34.160 --> 17:43.840
your group. So distributed MLS basically the way it uses MLS you just lose the log-and performance.

17:43.840 --> 17:50.240
It's always linear time. And you lose the shared group context and the shared membership.

17:50.240 --> 18:02.000
So yeah, depending on your application that may or may not be important. So even though you

18:02.000 --> 18:10.480
lose shared membership, you can still see the group membership from everyone else's sender group.

18:10.560 --> 18:17.920
So that you can tell who they've sent the message to.

18:22.960 --> 18:31.360
So now I'll do some comparisons between the two. So the first question is which one is more MLS

18:31.440 --> 18:39.280
like? And this is possibly a bit controversial because it is very subjective.

18:40.080 --> 18:47.040
And so I will say that they're both more MLS like than the other in different ways.

18:48.080 --> 18:57.120
So decentralized MLS it tries to keep more of the MLS semantics in my opinion. So you get like the

18:58.000 --> 19:04.320
it tries to keep the shared membership. The group context and the

19:05.600 --> 19:12.560
order log-and performance. But it requires more modifications to the protocol. So it requires this

19:13.680 --> 19:19.680
new way of storing the epoch secret as a punctual bone pseudo random function.

19:20.320 --> 19:31.440
And so that's some requires some cryptography analysis that hasn't been done yet.

19:33.120 --> 19:42.160
So distributed MLS it requires fewer changes to the MLS group because you can basically just use

19:42.160 --> 19:49.520
MLS as it is. But you lose the semantics such as the shared group context.

19:52.480 --> 20:00.480
Which one is faster? So decentralized MLS by itself can be word of log-and.

20:01.840 --> 20:09.120
Although again it depends on how you manage your how you manage your group.

20:10.080 --> 20:15.680
But managing forks can be expensive and it depends on how many forks you have. So if

20:15.680 --> 20:21.600
you know most of the time everything looks linear then it'll basically be the same performance as

20:21.600 --> 20:28.960
MLS. If you have you know a few forks here and there then it'll quickly resolve. But if you get a lot

20:28.960 --> 20:36.080
of forking in your tree then it can get expensive because you have to look at a whole lot of different

20:36.160 --> 20:45.840
epochs and try to merge things together. And distributed MLS as I mentioned before is always

20:45.840 --> 20:55.520
linear time. What about storage requirements? So decentralized MLS is potentially unbounded.

20:56.480 --> 21:05.040
But you know if you add in some way of pruning old forks then you know it may be only need to store

21:06.080 --> 21:14.960
a few different epochs at a time. So it might be closer to or linear space like MLSs.

21:15.920 --> 21:25.520
Distributed MLS if you just do a naive implementation you store you have end-different

21:25.520 --> 21:36.000
centers each MLS group is linear space. So that's you know quadratic space in total which isn't that great.

21:36.320 --> 21:49.200
But the way distributed MLS works a lot of that information is irrelevant to you as a receiver

21:49.200 --> 22:00.800
because like in each MLS group every node has a signing key so you can verify that yes this

22:00.880 --> 22:06.480
person actually sent the message. But if you know that this group is only going to have one

22:06.480 --> 22:13.600
center then you only need to keep the signing key for that one center. You also don't need to keep

22:13.600 --> 22:20.240
any information about how do you encrypt things to other people in that group because you're not

22:20.240 --> 22:29.120
going to use that group for encryption. So you can throw away a lot of information. I think that you

22:29.200 --> 22:40.880
can get away with storing a constant amount of information per receiving group. I haven't actually

22:40.880 --> 22:47.760
gone through to figure it out but I believe it's constant at worst I think it's logarithmic.

22:49.360 --> 22:56.800
So you of course need to keep your own MLS tree for sending which will be linear

22:57.680 --> 23:05.280
and you have a linear number of receiving groups. So in total it'll just be linear or I think that

23:05.280 --> 23:15.360
worse log n log n which isn't too bad. So which one should you use if you're designing a distributed

23:16.640 --> 23:26.400
or decentralized protocol? So distributed MLS I think is in my opinion it's more ready to use

23:27.760 --> 23:36.880
because there's fewer open questions you don't need to worry about. So distributed MLS is the

23:36.880 --> 23:43.200
one where each person has their own sending group. So you don't have to worry about managing the

23:43.200 --> 23:53.760
forks. Things like that it's and yeah decentralized MLS has more details to work out

23:54.560 --> 24:00.880
such as the group management and the state resolution and sort of things but they have different

24:00.880 --> 24:08.960
trade-offs and it also depends on the underlying message messaging protocol. So for example

24:09.680 --> 24:20.160
in matrix I believe that decentralized MLS would be a better fit. So in matrix servers are usually

24:20.160 --> 24:28.160
available in the common case servers are mostly up and in contact with each other.

24:29.840 --> 24:36.400
It's designed to be able to handle net splits but most of the time if a server is up

24:36.400 --> 24:46.640
then it can it can contact all the other servers. So in the common case I believe that decentralized

24:46.640 --> 24:58.720
MLS is a better fit for MLS sorry for matrix for peer to peer messaging systems such as

24:58.720 --> 25:08.800
rain talks or briar. I believe that distributed MLS is a better fit for XMPP. So

25:10.160 --> 25:19.360
most of you they're chatting XMPP for the purposes of this talk only I would consider them centralized.

25:20.400 --> 25:28.160
XMPP is obviously not centralized but each room individually has its server that every message

25:28.160 --> 25:35.680
goes through and it can act as a centralized delivery service. So for multi user chats you can just use

25:35.680 --> 25:42.720
plain MLS you don't need to worry about this stuff. One on one chats are different in XMPP though.

25:43.760 --> 25:51.760
One on one chats you send your message to your server it relays the message to the other person

25:51.760 --> 25:58.400
server and then sends it on. So there are a couple different approaches one of them is you could

25:59.360 --> 26:07.840
you could just use plain MLS but just designate one server as the delivery service because if you're

26:07.840 --> 26:13.120
doing a one on one chat and the other person's server is down then you're not going to have

26:13.200 --> 26:24.160
much success chatting anyways. If you don't want to do that then my suspicion is that distributed

26:24.160 --> 26:33.280
MLS would be a better fit for one on one chats in XMPP. So there are other approaches to

26:33.280 --> 26:47.440
doing MLS on decentralized platforms. One of them is that you can centralize at handshake messages

26:47.440 --> 26:56.160
but decentralize the application messages. So when you're just sending like a chat message

26:56.640 --> 27:04.000
you can do that normally as you would over your decentralize network but then when you're trying

27:04.000 --> 27:12.400
to do an update to the MLS group then you would have a single server that handles the ordering of

27:12.400 --> 27:20.400
those things. So the downside to that of course is that if this single point of failure fails

27:21.280 --> 27:29.040
then you can still send messages but you can't make any changes to the group which means that

27:29.040 --> 27:37.200
we would lose for example post-conformized security. So if someone's client gets compromised then you

27:37.200 --> 27:43.440
can update the MLS group and then the attacker can still keep reading messages.

27:43.520 --> 27:56.160
You could go a bit further and have a pool of delivery services and you pick one of them to be

27:57.120 --> 28:04.160
the delivery service and if that one goes down then you pick a different one. There's a bunch

28:04.160 --> 28:12.800
of literature out there on later elections. Most of them they pretty much require that most of the

28:12.800 --> 28:20.080
candidates are online so that they can come through consensus so this might not work for

28:21.360 --> 28:30.480
some situations. For example peer peer where you have a bunch of things and the most people

28:30.480 --> 28:38.800
won't be online at the same time and then there's completely different algorithms that can be used.

28:38.800 --> 28:48.720
There's this thing distributed continuous group key agreement which was written by widener

28:48.720 --> 28:56.080
and gluttonin and a few other people. There's this thing called bechem from ink and switch.

28:57.840 --> 29:04.640
Both of those are on my reading list. I'm not very familiar with them but they're both

29:05.600 --> 29:14.560
intended for decentralized situations or we can just keep doing what most people have been

29:14.560 --> 29:21.280
doing which is using the signal protocol and variance of the signal protocol such as omin

29:21.280 --> 29:38.880
megong in matrix and omemo in x and pp. So that's it. Any questions?

29:39.200 --> 29:46.560
For the folks to color less down the line. Part in the future can you see the opportunities for

29:47.840 --> 29:53.840
where the e-box of folks do them? Yes. There are opportunities for them to color less later on in

29:53.840 --> 30:04.480
time. Right so the question is if I'm in the diagram so the question is when you have the

30:04.480 --> 30:16.400
forking is there opportunity for the forks to color less later on. So the way the decentralized

30:19.440 --> 30:27.920
ITF draft is written it just handles the forking. Part of the work that I was doing was sort of

30:34.480 --> 30:40.960
it was so pulling in information from from the different forks and then adding information

30:40.960 --> 30:50.640
to say that yes I've so Carol would say I'm basing it off of Alice's e-poc but I've also seen

30:50.640 --> 30:58.240
Bob's e-poc and I've merged in information from there and then so there isn't sort of a formal

30:58.320 --> 31:08.960
merge operation but it sort of assumes that after Carol sends the e-poc then Bob's e-poc

31:08.960 --> 31:16.880
will sort of be ignored for the rest of time and everyone will sort of base it off of there.

31:18.480 --> 31:24.000
Yep. To me it sounds like this will pop here. Thank the scalability of the chat so if you have

31:24.000 --> 31:30.640
a chat with many people they would have many different forks to get together is there any

31:30.640 --> 31:37.840
or not about this or how it could be solved. Yeah so the question was about scalability if you have

31:37.840 --> 31:47.680
a lot of people and you have a lot of forks yeah so this is an open question we haven't really

31:47.760 --> 31:57.600
figured it out yet but yeah so the forking only happens when you do an update to the group

31:59.040 --> 32:05.600
so that's if you need to change your key material or if you invite someone or someone leaves the group

32:08.080 --> 32:16.640
so it's not something that happens all the time. It's something that happens occasionally so

32:18.000 --> 32:26.480
if you have something that happens even in a big group I wouldn't expect people to be sending

32:26.480 --> 32:34.560
commits more than more often than say like once a minute so in a situation like in matrix where

32:34.560 --> 32:41.760
servers are mostly connected I would not expect there to be much forking even in large rooms but

32:41.760 --> 32:53.760
it's it's it's an open question no it's yeah it's it's it's not on every message that you send it's only

32:53.760 --> 33:00.880
on certain certain group operations. Yep. Did you already compare this video to MLS with

33:00.880 --> 33:07.200
sender keys like the approach signal users in terms of performance and security because it seems

33:07.920 --> 33:15.120
right as a consumer because it uses something with his key from the school. Yeah so the question is

33:15.120 --> 33:22.960
about comparing distributed MLS with sender keys because they they seem kind of similar. I personally

33:23.680 --> 33:32.560
don't have much experience with distributed MLS so yeah and I don't know if the group

33:32.640 --> 33:41.360
who's worked on that has done comparisons with with sender keys.

33:58.400 --> 34:00.560
Yeah so the question was about

34:03.200 --> 34:12.640
distributed MLS on a federated service whether you can do a group per server rather than

34:12.640 --> 34:22.880
per sender and yeah that that might be possible it would be something we would have to explore

34:24.000 --> 34:30.240
we haven't really there's there's a lot of possibilities that we could we could look into and we

34:30.240 --> 34:41.600
haven't really looked into it that much okay I guess that's it