WEBVTT 00:00.000 --> 00:14.880 Yep, okay, welcome here. I can hear myself. So hi, my name is Hubert, you might know me 00:14.880 --> 00:23.600 online as you are egg. I'm on the crypto team at Eleman, where crypto means encryption, 00:23.600 --> 00:32.800 and I'm also on the matrix spec core team. But my talk isn't going to be about matrix, 00:32.800 --> 00:40.720 so if you're came here trying to find out what we're going to do about MOS on matrix, sorry, 00:41.920 --> 00:50.400 I'll talk a little bit about it, but not much. I did some investigation into decentralizing 00:50.400 --> 01:00.240 MOS for matrix, and I wrote a partial experimental MOS implementation in TypeScript, 01:01.840 --> 01:14.800 which was not very fun. So what is MOS for messaging layer security? So it is an ITF standard. 01:15.280 --> 01:20.400 It's RFC 9420 for enter and encryption in instant messaging. 01:24.720 --> 01:29.520 And it can be it's intended to apply to many different systems, so 01:32.560 --> 01:42.080 so any or many different messaging platforms can use it. It encrypts messages, obviously, 01:42.080 --> 01:50.480 if it's intended for messaging. And also handshake messages, which are the messages used for 01:50.480 --> 01:57.760 controlling the MOS group, can be either encrypted or in clear text, although the sensitive 01:57.760 --> 02:05.520 parts are always encrypted. So that means that for example, you have the option of either hiding 02:06.480 --> 02:12.240 the group membership from the server, or the server can see the group membership, so that it 02:12.240 --> 02:22.480 can apply its own access controls or whatnot. It ensure its consistent group state, so it 02:25.120 --> 02:34.320 controls the group membership. So everyone in the room can be assured that they 02:36.480 --> 02:44.000 that everyone knows who's in the room. And then there is also this concept of group context, 02:44.000 --> 02:51.600 which is sort of you can shove arbitrary data into the group, and then everyone in the group agrees 02:51.600 --> 02:58.640 what's in the group context. It provides sort of the normal things that you expect from 02:59.600 --> 03:06.720 and turned encryption in messaging. So forward, you can see post compromise security. 03:06.720 --> 03:16.480 The one thing it does not provide is deniability. So there is some work on trying to do that, 03:16.480 --> 03:27.760 but it's not in the standard. And updates are intended to be fast. So order log n, 03:28.480 --> 03:34.880 work where n is the number of members in the group. On average, there's a big 03:34.880 --> 03:44.240 asterisk there because it really depends on how the group is managed. You can get into a 03:44.240 --> 03:53.280 really bad state where there's lots of holes in the tree and you end up having to do order and 03:53.280 --> 04:02.400 work anyways. But the intention is that the common case is that it's order log n. Though 04:02.400 --> 04:11.520 inviting someone still requires a linear time work. So how does MLS work? So I am only going to 04:11.520 --> 04:19.360 focus on a very tiny part of MLS because I don't have two days to talk about it. 04:19.920 --> 04:30.880 So the part I'm going to talk about is the MLS epochs. And so an MLS group goes through 04:31.920 --> 04:40.400 different epochs and each epoch has a secret associated, which is called an epoch secret. 04:41.360 --> 04:47.200 From the epoch secret, you derive other secrets. For example, encryption keys. 04:49.520 --> 04:55.520 I believe a message authentication code keys are all sorts of things. 04:57.040 --> 05:04.880 And the epoch secret, each epoch secret is derived from the previous epoch secret and 05:05.760 --> 05:11.600 a secret that you get from the commit message that created the epoch. And then you sort of 05:12.160 --> 05:22.080 when you get a new epoch, then you throw away the old epochs. So this is a diagram that comes from the 05:22.160 --> 05:32.800 RFC. So this sort of shows that you have the epoch secret. You derive, well, this just shows 05:32.800 --> 05:40.000 the encryption secret. But you derive a bunch of other secrets from it. You use the encryption secret 05:40.000 --> 05:51.840 to derive keys from the secret. The secret key you use to decrypt messages. So when you get in 05:51.920 --> 05:59.920 a message that creates a new epoch, you decrypt it. And then that gives you the commit secret. 05:59.920 --> 06:06.800 You combine it with the previous epochs secret. And then you get the next epochs secret. 06:10.800 --> 06:18.960 So in normal MLS, commit messages are ordered. So you have epoch number zero and then epoch 06:18.960 --> 06:27.440 number one. And then epoch number two. And you only apply one commit to each epoch. 06:28.480 --> 06:35.600 And they're all in a linear order. You have a delivery service that enforces the order. 06:35.600 --> 06:44.240 So that everyone agrees that this is the commit that came, that modifies epoch zero. 06:44.240 --> 06:51.920 And then I get epoch one. So everyone is in sync. But what happens if you don't have a single 06:51.920 --> 06:58.960 delivery service to order them. And so you can't ensure that everyone sees the commit messages 06:58.960 --> 07:07.200 in the same order. So there are two different approaches that have been published as ITF drafts. 07:08.080 --> 07:16.720 There's decentralized MLS, also called DMLS. And this was written by people from 07:17.680 --> 07:24.080 Phoenix R&D. And it was based on work by people from Amazon who were 07:26.480 --> 07:35.200 formerly wicker people, but they got bought out by Amazon. And then there's also distributed MLS, 07:35.280 --> 07:44.320 also known as DMLS. And this was written by people from German at work and researchers from the 07:44.320 --> 07:51.760 US Naval Postgraduate School. So I'm going to give a brief introduction for both of them and then do 07:51.760 --> 08:01.200 some comparisons. Decentralized MLS. So instead of having a linear order of epochs, 08:01.920 --> 08:11.680 it allows you to have multiple descendants. So you can have multiple senders modifying the same 08:11.680 --> 08:19.840 epoch. And then so you get sort of a tree of epochs rather than lying. I use it some fancy 08:19.840 --> 08:26.560 cryptography called puncturable pseudo random functions. So you can't redevive an epoch secret 08:27.440 --> 08:34.880 to preserve forward secrecy. So here's a little diagram that I drew up. The circles are epochs. 08:36.000 --> 08:44.720 The labels on the lines are the people who created the epochs. So we start off with a single epoch. 08:44.720 --> 08:51.120 And then L.S. and Bob try to do something to the group at the same time. So maybe they want to 08:51.120 --> 09:00.320 update the wrong keys in the group. They happen at the same time. So it's not like Alice sends 09:02.400 --> 09:08.320 her commit and then Bob sees Alice's commit and then sends his own. They happen at the same time. 09:08.320 --> 09:18.640 So they both modify the same epoch. So then there's a fork in the tree. And then Carol wants to 09:18.640 --> 09:28.800 send her own commit. She has to decide which epoch to base her commit off of, say she decides to 09:28.800 --> 09:40.160 base it off of Alice's epoch and then so she creates her commit based on that. So yeah, when Alice sends 09:40.240 --> 09:48.800 her commit, then she will show send it to the group. Everyone will see the group. And then so 09:48.800 --> 10:04.000 everyone won't drive of these the three new epochs. So one of the problems with just doing this 10:04.080 --> 10:15.200 with MLS is that so say after Alice drives for epoch, then she still needs to keep around 10:16.080 --> 10:26.080 the epochs secret for the original commit because after Bob sends his commit, then she needs 10:26.080 --> 10:33.760 the epochs secret from the original commit to drive Bob's commit. And that poses a problem for 10:33.760 --> 10:42.960 forward secrecy because you're holding on to chemotherapy for longer than you need to or longer 10:42.960 --> 10:51.200 than you should. And so if someone compromises Alice's client, then they can get the epochs secret 10:51.200 --> 10:58.160 and then replay Alice's commit and then re-derive this whole epoch and then decrypt 10:58.160 --> 11:06.080 all messages using the secret. So that's where this puncturable pseudo random functions comes in. 11:07.200 --> 11:12.640 So if you know pseudo random function is sort of function, you give it some input and then it 11:12.640 --> 11:21.280 spits out some pseudo random data, the puncturable. So if you give it the same data, it'll give you 11:21.280 --> 11:32.400 the same pseudo random data. The puncturable part is once you've given it the input data once, 11:32.400 --> 11:39.920 then you can't give it the same data again. So it can only derive the pseudo random data once 11:41.680 --> 11:48.480 and it's a bit of cryptomagic. I'm not going to try to explain it because I don't understand it 11:48.480 --> 11:58.240 either. It just works. So with this approach, there are a few open questions. So how do you manage 11:59.200 --> 12:08.240 the forks? So if you remember here, we had Alice and Bob's commits. So Carol needs to decide which 12:08.320 --> 12:18.400 commit to use. So Carol uses Alice's, she decided to use Alice's commit, you know, 12:18.400 --> 12:27.840 woods that the best choice. And then what are we going to do about Bob's commit? And then another 12:27.840 --> 12:38.240 question is, now that we have, now that we have multiple epochs that we need to worry about, 12:39.280 --> 12:45.840 how do we make sure that everything's consistent for example membership. So say if on the left 12:45.840 --> 13:01.440 Alice invited her friend Dan and Bob invited Emily, you know, so then on one side of the tree, 13:01.440 --> 13:08.960 we have different memberships. How do we combine them? So this sort of goes into my, the work 13:08.960 --> 13:17.440 that I did, which I also called decentralized MLS, another DMLS, but I spelled it with an S instead 13:17.440 --> 13:28.320 of a Z. So if you get confused, yeah. So it was just a way to decide which epoch to use, how to copy 13:28.320 --> 13:35.600 updates from one epoch to another. It sort of assumes that we have some sort of algorithm to 13:35.680 --> 13:43.280 determine membership in group state. So in matrix, we have the state resolution algorithms 13:44.160 --> 13:53.120 that we run in matrix currently that's done on the server side. And when I was doing my experimentation, 13:53.120 --> 14:02.240 I just used, I just used what the server had told the client to use for the membership, 14:02.240 --> 14:08.400 but there's no reason, well, other than complexity that we couldn't also run the state resolution 14:08.400 --> 14:20.320 algorithm on the client. So some downsides to decentralized, decentralized with a ZMLS. 14:23.520 --> 14:30.240 So it potentially requires a lot of storage because you need to keep all the previous epochs 14:30.560 --> 14:43.680 or information about the previous epochs in order to handle forks. So there may be ways of 14:45.280 --> 14:52.160 cleaning things up if you have sort of a time window that you accept old epochs from, but it's 14:52.640 --> 15:00.320 that's something that would need to be figured out. Managing forks can be complicated. So if you 15:00.320 --> 15:07.440 have a lot of forks, then you sort of have to look at all of them, figure out what information 15:07.440 --> 15:16.000 you need to take from each of them and so forth. And it's sort of an open question. The decentralized 15:16.000 --> 15:29.760 MLS draft, the ITF draft doesn't really deal with any of these issues. It just describes 15:29.760 --> 15:36.560 what messages and modifications you need in MLS in order to do those things, but it doesn't 15:36.640 --> 15:49.200 describe how to do all this management. So the other DMLS distributed MLS takes a completely different 15:50.560 --> 15:59.920 approach where it says that each sender will maintain their own MLS group. 16:00.800 --> 16:07.840 And so each sender sort of acts as their own delivery service because they're the only person 16:07.840 --> 16:19.520 who can commit to make commits to their own group. But you can also look at other people's groups 16:19.520 --> 16:30.640 and pulling information from their group. So for example, if Alice updates her key in her group, 16:30.640 --> 16:38.960 then Bob can look at that update and then update Alice's key in his group so that 16:38.960 --> 16:52.080 the cryptographic information is up to date. It sort of sidesteps the question of how do 16:52.080 --> 17:02.960 maintain consistent membership by saying that each sender decides on their own who's going to 17:02.960 --> 17:16.640 be a member of their group. So there is less of a need to have a consistent membership between groups. 17:20.160 --> 17:27.280 So downsides to this approach is that you lose the log-and performance of MLS. 17:27.360 --> 17:34.160 So remember at the beginning I said that the log-and performance depends on how you manage 17:34.160 --> 17:43.840 your group. So distributed MLS basically the way it uses MLS you just lose the log-and performance. 17:43.840 --> 17:50.240 It's always linear time. And you lose the shared group context and the shared membership. 17:50.240 --> 18:02.000 So yeah, depending on your application that may or may not be important. So even though you 18:02.000 --> 18:10.480 lose shared membership, you can still see the group membership from everyone else's sender group. 18:10.560 --> 18:17.920 So that you can tell who they've sent the message to. 18:22.960 --> 18:31.360 So now I'll do some comparisons between the two. So the first question is which one is more MLS 18:31.440 --> 18:39.280 like? And this is possibly a bit controversial because it is very subjective. 18:40.080 --> 18:47.040 And so I will say that they're both more MLS like than the other in different ways. 18:48.080 --> 18:57.120 So decentralized MLS it tries to keep more of the MLS semantics in my opinion. So you get like the 18:58.000 --> 19:04.320 it tries to keep the shared membership. The group context and the 19:05.600 --> 19:12.560 order log-and performance. But it requires more modifications to the protocol. So it requires this 19:13.680 --> 19:19.680 new way of storing the epoch secret as a punctual bone pseudo random function. 19:20.320 --> 19:31.440 And so that's some requires some cryptography analysis that hasn't been done yet. 19:33.120 --> 19:42.160 So distributed MLS it requires fewer changes to the MLS group because you can basically just use 19:42.160 --> 19:49.520 MLS as it is. But you lose the semantics such as the shared group context. 19:52.480 --> 20:00.480 Which one is faster? So decentralized MLS by itself can be word of log-and. 20:01.840 --> 20:09.120 Although again it depends on how you manage your how you manage your group. 20:10.080 --> 20:15.680 But managing forks can be expensive and it depends on how many forks you have. So if 20:15.680 --> 20:21.600 you know most of the time everything looks linear then it'll basically be the same performance as 20:21.600 --> 20:28.960 MLS. If you have you know a few forks here and there then it'll quickly resolve. But if you get a lot 20:28.960 --> 20:36.080 of forking in your tree then it can get expensive because you have to look at a whole lot of different 20:36.160 --> 20:45.840 epochs and try to merge things together. And distributed MLS as I mentioned before is always 20:45.840 --> 20:55.520 linear time. What about storage requirements? So decentralized MLS is potentially unbounded. 20:56.480 --> 21:05.040 But you know if you add in some way of pruning old forks then you know it may be only need to store 21:06.080 --> 21:14.960 a few different epochs at a time. So it might be closer to or linear space like MLSs. 21:15.920 --> 21:25.520 Distributed MLS if you just do a naive implementation you store you have end-different 21:25.520 --> 21:36.000 centers each MLS group is linear space. So that's you know quadratic space in total which isn't that great. 21:36.320 --> 21:49.200 But the way distributed MLS works a lot of that information is irrelevant to you as a receiver 21:49.200 --> 22:00.800 because like in each MLS group every node has a signing key so you can verify that yes this 22:00.880 --> 22:06.480 person actually sent the message. But if you know that this group is only going to have one 22:06.480 --> 22:13.600 center then you only need to keep the signing key for that one center. You also don't need to keep 22:13.600 --> 22:20.240 any information about how do you encrypt things to other people in that group because you're not 22:20.240 --> 22:29.120 going to use that group for encryption. So you can throw away a lot of information. I think that you 22:29.200 --> 22:40.880 can get away with storing a constant amount of information per receiving group. I haven't actually 22:40.880 --> 22:47.760 gone through to figure it out but I believe it's constant at worst I think it's logarithmic. 22:49.360 --> 22:56.800 So you of course need to keep your own MLS tree for sending which will be linear 22:57.680 --> 23:05.280 and you have a linear number of receiving groups. So in total it'll just be linear or I think that 23:05.280 --> 23:15.360 worse log n log n which isn't too bad. So which one should you use if you're designing a distributed 23:16.640 --> 23:26.400 or decentralized protocol? So distributed MLS I think is in my opinion it's more ready to use 23:27.760 --> 23:36.880 because there's fewer open questions you don't need to worry about. So distributed MLS is the 23:36.880 --> 23:43.200 one where each person has their own sending group. So you don't have to worry about managing the 23:43.200 --> 23:53.760 forks. Things like that it's and yeah decentralized MLS has more details to work out 23:54.560 --> 24:00.880 such as the group management and the state resolution and sort of things but they have different 24:00.880 --> 24:08.960 trade-offs and it also depends on the underlying message messaging protocol. So for example 24:09.680 --> 24:20.160 in matrix I believe that decentralized MLS would be a better fit. So in matrix servers are usually 24:20.160 --> 24:28.160 available in the common case servers are mostly up and in contact with each other. 24:29.840 --> 24:36.400 It's designed to be able to handle net splits but most of the time if a server is up 24:36.400 --> 24:46.640 then it can it can contact all the other servers. So in the common case I believe that decentralized 24:46.640 --> 24:58.720 MLS is a better fit for MLS sorry for matrix for peer to peer messaging systems such as 24:58.720 --> 25:08.800 rain talks or briar. I believe that distributed MLS is a better fit for XMPP. So 25:10.160 --> 25:19.360 most of you they're chatting XMPP for the purposes of this talk only I would consider them centralized. 25:20.400 --> 25:28.160 XMPP is obviously not centralized but each room individually has its server that every message 25:28.160 --> 25:35.680 goes through and it can act as a centralized delivery service. So for multi user chats you can just use 25:35.680 --> 25:42.720 plain MLS you don't need to worry about this stuff. One on one chats are different in XMPP though. 25:43.760 --> 25:51.760 One on one chats you send your message to your server it relays the message to the other person 25:51.760 --> 25:58.400 server and then sends it on. So there are a couple different approaches one of them is you could 25:59.360 --> 26:07.840 you could just use plain MLS but just designate one server as the delivery service because if you're 26:07.840 --> 26:13.120 doing a one on one chat and the other person's server is down then you're not going to have 26:13.200 --> 26:24.160 much success chatting anyways. If you don't want to do that then my suspicion is that distributed 26:24.160 --> 26:33.280 MLS would be a better fit for one on one chats in XMPP. So there are other approaches to 26:33.280 --> 26:47.440 doing MLS on decentralized platforms. One of them is that you can centralize at handshake messages 26:47.440 --> 26:56.160 but decentralize the application messages. So when you're just sending like a chat message 26:56.640 --> 27:04.000 you can do that normally as you would over your decentralize network but then when you're trying 27:04.000 --> 27:12.400 to do an update to the MLS group then you would have a single server that handles the ordering of 27:12.400 --> 27:20.400 those things. So the downside to that of course is that if this single point of failure fails 27:21.280 --> 27:29.040 then you can still send messages but you can't make any changes to the group which means that 27:29.040 --> 27:37.200 we would lose for example post-conformized security. So if someone's client gets compromised then you 27:37.200 --> 27:43.440 can update the MLS group and then the attacker can still keep reading messages. 27:43.520 --> 27:56.160 You could go a bit further and have a pool of delivery services and you pick one of them to be 27:57.120 --> 28:04.160 the delivery service and if that one goes down then you pick a different one. There's a bunch 28:04.160 --> 28:12.800 of literature out there on later elections. Most of them they pretty much require that most of the 28:12.800 --> 28:20.080 candidates are online so that they can come through consensus so this might not work for 28:21.360 --> 28:30.480 some situations. For example peer peer where you have a bunch of things and the most people 28:30.480 --> 28:38.800 won't be online at the same time and then there's completely different algorithms that can be used. 28:38.800 --> 28:48.720 There's this thing distributed continuous group key agreement which was written by widener 28:48.720 --> 28:56.080 and gluttonin and a few other people. There's this thing called bechem from ink and switch. 28:57.840 --> 29:04.640 Both of those are on my reading list. I'm not very familiar with them but they're both 29:05.600 --> 29:14.560 intended for decentralized situations or we can just keep doing what most people have been 29:14.560 --> 29:21.280 doing which is using the signal protocol and variance of the signal protocol such as omin 29:21.280 --> 29:38.880 megong in matrix and omemo in x and pp. So that's it. Any questions? 29:39.200 --> 29:46.560 For the folks to color less down the line. Part in the future can you see the opportunities for 29:47.840 --> 29:53.840 where the e-box of folks do them? Yes. There are opportunities for them to color less later on in 29:53.840 --> 30:04.480 time. Right so the question is if I'm in the diagram so the question is when you have the 30:04.480 --> 30:16.400 forking is there opportunity for the forks to color less later on. So the way the decentralized 30:19.440 --> 30:27.920 ITF draft is written it just handles the forking. Part of the work that I was doing was sort of 30:34.480 --> 30:40.960 it was so pulling in information from from the different forks and then adding information 30:40.960 --> 30:50.640 to say that yes I've so Carol would say I'm basing it off of Alice's e-poc but I've also seen 30:50.640 --> 30:58.240 Bob's e-poc and I've merged in information from there and then so there isn't sort of a formal 30:58.320 --> 31:08.960 merge operation but it sort of assumes that after Carol sends the e-poc then Bob's e-poc 31:08.960 --> 31:16.880 will sort of be ignored for the rest of time and everyone will sort of base it off of there. 31:18.480 --> 31:24.000 Yep. To me it sounds like this will pop here. Thank the scalability of the chat so if you have 31:24.000 --> 31:30.640 a chat with many people they would have many different forks to get together is there any 31:30.640 --> 31:37.840 or not about this or how it could be solved. Yeah so the question was about scalability if you have 31:37.840 --> 31:47.680 a lot of people and you have a lot of forks yeah so this is an open question we haven't really 31:47.760 --> 31:57.600 figured it out yet but yeah so the forking only happens when you do an update to the group 31:59.040 --> 32:05.600 so that's if you need to change your key material or if you invite someone or someone leaves the group 32:08.080 --> 32:16.640 so it's not something that happens all the time. It's something that happens occasionally so 32:18.000 --> 32:26.480 if you have something that happens even in a big group I wouldn't expect people to be sending 32:26.480 --> 32:34.560 commits more than more often than say like once a minute so in a situation like in matrix where 32:34.560 --> 32:41.760 servers are mostly connected I would not expect there to be much forking even in large rooms but 32:41.760 --> 32:53.760 it's it's it's an open question no it's yeah it's it's it's not on every message that you send it's only 32:53.760 --> 33:00.880 on certain certain group operations. Yep. Did you already compare this video to MLS with 33:00.880 --> 33:07.200 sender keys like the approach signal users in terms of performance and security because it seems 33:07.920 --> 33:15.120 right as a consumer because it uses something with his key from the school. Yeah so the question is 33:15.120 --> 33:22.960 about comparing distributed MLS with sender keys because they they seem kind of similar. I personally 33:23.680 --> 33:32.560 don't have much experience with distributed MLS so yeah and I don't know if the group 33:32.640 --> 33:41.360 who's worked on that has done comparisons with with sender keys. 33:58.400 --> 34:00.560 Yeah so the question was about 34:03.200 --> 34:12.640 distributed MLS on a federated service whether you can do a group per server rather than 34:12.640 --> 34:22.880 per sender and yeah that that might be possible it would be something we would have to explore 34:24.000 --> 34:30.240 we haven't really there's there's a lot of possibilities that we could we could look into and we 34:30.240 --> 34:41.600 haven't really looked into it that much okay I guess that's it