WEBVTT 00:00.000 --> 00:13.360 Hi everybody. I'm here to present the Mobility Database. I'm Isabelle de Robert. I work for 00:13.360 --> 00:21.240 MobilityData as a regional director for Europe. I'm based in Barcelona. I speak English, French and a 00:21.240 --> 00:26.880 bit of Spanish. It's a work in progress. So if there are any native Spanish speakers, come 00:26.880 --> 00:31.880 talk to me after; I need to practice. And the reason I do this work is to make sustainable 00:31.880 --> 00:39.280 transport (I forgot to put "transport") the easiest choice. I'm sure all of you take public transport 00:39.280 --> 00:47.000 on principle already. I try, but I'm also a bit lazy and sometimes I need to choose between 00:47.000 --> 00:52.720 the easiest and the most sustainable, often actually. And so my goal is to make sustainable 00:52.720 --> 01:00.800 transport the easiest for other lazy people like me. So who is MobilityData? Who knows about 01:00.800 --> 01:11.360 MobilityData in the room? Oh, nice. I was expecting a lot fewer. We're a global non-profit, HQ 01:11.360 --> 01:21.160 in Montreal, Canada. We're about 25 people. And we govern open data formats. All of you might 01:21.160 --> 01:27.240 know GTFS and GBFS. So we govern these standards. And something you might not know is that 01:27.240 --> 01:32.760 we also have a software engineering team that builds free and open source tools for the 01:32.760 --> 01:41.360 community of users. Our goal as an organization is to empower sustainable mobility stakeholders 01:41.360 --> 01:50.680 to make the best of their data, their digital assets. So you might already know GTFS and 01:50.680 --> 01:58.120 GBFS as what we call de facto standards. So it means a format or a specification that becomes 01:58.120 --> 02:07.560 a standard because of its worldwide adoption. So this is the network between GTFS, GBFS for 02:07.560 --> 02:15.000 shared mobility, shared modes, bike share, scooter share, car share.
A lot of the work has 02:15.080 --> 02:21.080 been done in Europe to build this format because the market is big in Europe. And about 02:21.080 --> 02:31.240 a thousand shared mobility operators use this format today. And what they do is they handle the, 02:31.240 --> 02:37.160 well, okay, they handle the communication between the operators and the general public using 02:37.160 --> 02:42.440 the service. So they represent rider-facing information. Nothing kind of operational. It's 02:42.440 --> 02:47.880 really meant to share information with the public using the service. Most of the time in your 02:47.880 --> 02:54.680 trip-planning apps. So these are a few examples. There's a huge open source community 02:54.680 --> 03:02.280 around these two formats. There is an official GitHub and Slack for GTFS, GBFS. We're 03:02.280 --> 03:08.360 maintaining the Slack. But there is also a very big community of folks building open source software. 03:08.600 --> 03:16.200 We also have a list we've had for a while called Awesome Transit, and we should link the two lists 03:16.200 --> 03:22.920 somehow. And we try to keep up with all of the tools that are available, with the focus on free 03:22.920 --> 03:28.600 and open source. There are no rules to this list. Anyone can add their tool, whether there is one 03:28.600 --> 03:35.320 user or thousands of users. It's hard to keep up. So if you see tools that are not there, 03:35.320 --> 03:41.240 if you build tools, please add them to this list. We try to maintain it and keep it up to date. 03:44.280 --> 03:48.840 So before introducing the Mobility Database, I want to talk about the problem it's meant to 03:48.840 --> 03:57.400 solve. We talked a little bit about the quality of the data. Sorry for the Google Maps screenshot. 03:57.400 --> 04:03.960 I know it's bad to bring that to an open source conference. But the point is that oftentimes this happens, 04:04.040 --> 04:11.320 so often we are stressed when we take public transport.
We don't know what to do with this information. 04:12.680 --> 04:19.080 Having bad data, sometimes it's worse than having no data. It reduces the trust people 04:19.080 --> 04:25.640 have in the service, creates a lot of complaints. The people that can afford to take a taxi, they take 04:25.640 --> 04:30.360 a taxi, and the ones that can't, they arrive late to work. They are very stressed. 04:30.360 --> 04:39.400 It takes a whole industry to improve data quality. Originally, I didn't have policy makers here 04:39.400 --> 04:48.040 and I feel very ashamed. When you spoke, I added policy makers. Because in the open source system, 04:48.040 --> 04:53.400 we are a bit outside of what's happening in policy, I find. And so I'm very glad you're here 04:54.040 --> 05:00.440 and I'm sorry that I didn't put policy makers on this slide. In our ecosystem, it's more 05:00.440 --> 05:06.600 the industry. So the public transport authorities, the technology partners, the trip planners. 05:06.600 --> 05:11.720 I make a commitment to collaborate more with policy makers, in Europe especially. 05:13.000 --> 05:18.280 So why do we have this bad experience? In the world, everything is digital. We can do everything with AI. 05:18.280 --> 05:22.360 And when we take the bus, we have wrong information. The bus doesn't show up. 05:24.120 --> 05:30.360 It's very hard to translate data into a problem for the rider. So when the bus doesn't show up, 05:30.360 --> 05:36.680 it's a real experience. But when you see a data set, it's hard to trace and to be able to say, 05:36.680 --> 05:42.600 well, this will lead to a bad rider experience. Sometimes it's a bit obscure. We don't know how to 05:42.600 --> 05:50.600 measure data quality. We don't have tools to measure it. We have accountability issues. Oftentimes, 05:50.600 --> 05:57.400 between the operator and your app, there are five intermediaries in Europe.
There are the national 05:57.400 --> 06:02.440 access points that are supposed to help with all of this. It also adds another stakeholder. Sometimes 06:02.440 --> 06:06.520 they point fingers: "You're responsible." "You're responsible." We don't know who's responsible 06:06.520 --> 06:13.640 for the data quality. So the solution, one solution, is a product that we built called the Mobility 06:13.640 --> 06:22.120 Database. So it's an open data platform containing data in the GTFS and GBFS formats, essentially. 06:22.120 --> 06:32.600 It's free and open source. Currently it has data from 4,500 sources, 75 countries. 06:33.240 --> 06:39.800 And the reason we built this platform is the data quality. I know we talked about availability. 06:39.800 --> 06:48.760 But it's the main thing we're trying to solve with it. And so each feed contains data quality 06:48.760 --> 06:56.680 reports. Our team also builds validators that evaluate the quality of GTFS and GBFS data and 06:56.680 --> 07:05.080 provide this official way to measure the quality. It is required in some places to use these tools. 07:07.560 --> 07:13.240 Here's an example of what a page looks like. So the metadata, I would say, on the left: 07:14.520 --> 07:19.800 license, contact information and the features that are included. Is it only timetables? Or is there 07:19.800 --> 07:28.280 also accessibility information, fares, etc.? It links to documentation. We just released visualizations 07:29.320 --> 07:35.960 last year. So these you can zoom in and out and be able to kind of see what the data looks like 07:35.960 --> 07:41.480 in practice. And the assumption is to try to make people understand the impact of having data 07:41.480 --> 07:48.200 problems. And we have the compliance reports according to the spec, not the law, in that case. 07:50.600 --> 07:57.160 And the value of this product is to make this data available for many users. 07:59.640 --> 08:05.480 So we have the cities that can use it for all of the planning.
They need to do different scenarios. 08:05.480 --> 08:10.520 We have collaborations with the national access points. We exchange data. Some of them 08:10.520 --> 08:18.760 have integrated our tooling within their platforms. Of course researchers. Transport operators 08:18.760 --> 08:25.560 can use it to check the quality of their feed. Maybe longer term, is their data reused? 08:25.560 --> 08:30.120 For example, it's something we could add. Rider applications. 08:33.400 --> 08:40.680 So it has the reports from the official validators that we maintain. We have one for 08:41.560 --> 08:45.960 GTFS Schedule. We also have one for GTFS Realtime. It's not yet integrated. 08:46.040 --> 08:55.880 We have one for GBFS. It has visualizations to troubleshoot problems. You can see kind of the location 08:55.880 --> 09:06.040 of the stops, the shapes. There is a lot of metadata. Historical data is also available, 09:06.040 --> 09:10.280 or historical versions of the data, for GTFS Schedule only. 09:10.520 --> 09:20.520 And they are widely used. We have a lot of contributors. We have our team, but 09:20.520 --> 09:26.280 as an open source project, there are also a lot of people contributing. And we are lucky to have 09:26.280 --> 09:35.240 a lot of them. But we struggle with keeping this platform up to date. These are some of our 09:35.240 --> 09:41.320 issues. I'm sure that any of you building products on open data are facing similar ones. 09:41.320 --> 09:45.800 Feeds become inactive. We don't know if there is a replacement. We don't know what the 09:45.800 --> 09:53.640 license is, or the license is unclear. Incomplete data sources. You mentioned some aggregates. Sometimes 09:53.640 --> 09:59.480 you have either a bad aggregate or incomplete small local sources. We don't know if the 09:59.480 --> 10:07.560 source is official. Is it maintained? Or is it an awesome contributor that did it for a hackathon?
10:07.560 --> 10:13.560 But then they don't do anything with it anymore. No coverage in certain areas; in certain countries, 10:13.560 --> 10:22.760 we have no feed at all. And so I'm lucky to be in a room full of developers to present this. 10:23.400 --> 10:31.080 And so I couldn't help but prompt you to support this project. I imagine you come from many 10:31.080 --> 10:36.680 different places in Europe. So I encourage you to go on the platform to see if there is data 10:36.680 --> 10:43.960 from your city or from your country. Check the quality. Bring it to the transport authority 10:43.960 --> 10:51.400 if they don't know about it. Add a missing feed, if you know of a feed that is missing. 10:51.480 --> 10:59.160 Replace an outdated source. It's all on GitHub. Open issues, whether for missing data or 10:59.800 --> 11:04.920 problems with the quality. It helps us also go to the official source and say people are complaining 11:06.360 --> 11:12.680 about your source, and it's not only us saying it. And open a GitHub issue also for your ideas 11:12.680 --> 11:20.360 on how to improve the platform, new features and functionalities that will make it useful for you. 11:22.360 --> 11:28.120 And for the time that is left, I encourage you to go ahead and open it. The people that have 11:28.120 --> 11:34.280 their computers. I'm going to show a little demo, but go ahead. I won't be mad if you go on your 11:34.280 --> 11:44.040 laptops or on your phones. Open it, browse it, and I will show a little demo. I think it's my last 11:44.040 --> 11:56.600 slide. How many minutes? 5 minutes left. Here's what it looks like. I'll make it a bit bigger. 11:57.400 --> 12:16.600 And I'll put TMB in Barcelona. So this is what the page looks like. We know it's an official feed, 12:16.600 --> 12:24.360 so it means they have confirmed that the source is maintained by them. We can download the GTFS. 12:24.360 --> 12:33.240 We can open the quality report with issues in the data.
12:36.840 --> 12:42.440 We can see the service range. So some of the things we are thinking about doing is, for example, 12:42.440 --> 12:49.640 sending notifications when the feed is about to expire, because having the feed expire and not 12:49.640 --> 12:56.760 having a replacement is actually a major problem we have. Which features are included. So here we 12:56.760 --> 13:02.040 can see they have, for example, wheelchair accessibility. We can click on it and get to the documentation. 13:02.840 --> 13:09.560 Another thing we're thinking about doing is adding what features could be included. If one feed 13:09.560 --> 13:14.520 only has minimum information, we can say, hey, you could add accessibility, here's the documentation. 13:14.680 --> 13:21.400 And this is the big new thing, the visualization. So I'll also open them up. 13:23.400 --> 13:30.840 And I think we'll do just with subways and metro. And this shows the shapes. 13:33.000 --> 13:37.560 So the form that the rider sees in a trip planner, and the location of the stops. 13:37.560 --> 13:48.520 And that's it. I want to keep time for questions. How much time do you have? 13:51.560 --> 14:01.400 There should be. Let's check. But maybe if you type STIB, that's how you find it. 14:01.400 --> 14:08.040 Normally, if you type Brussels, you should, you should, right, why am I? 14:12.360 --> 14:25.880 Yeah. Here's the feed for Brussels. Or for the STIB. Any questions? Yes. 14:26.840 --> 14:33.800 I'm wondering, do you have an example of an unofficial feed that is, like, trusted by the community? 14:36.040 --> 14:49.000 Yes. The feeds from, well, we removed them now, but we had some from GTFS.BE. It's a platform actually in Belgium 14:49.560 --> 15:01.400 that was making the Belgian feeds available without a login, because you needed to sign a PDF document. 15:02.440 --> 15:06.040 But now it's not the case anymore. So we removed them. But that's an example. 15:19.960 --> 15:28.920 Me, the same thing. Sorry.
For example, a couple of people would be wheelchairing about the transit and sometimes they say, 15:28.920 --> 15:35.080 oh, this is wheelchair accessible, and they get there and they do. So do you look at, sort of like, 15:36.280 --> 15:44.600 crowdsourced information or just the feed from the transit agency? We take both. Our philosophy is to add 15:44.680 --> 15:52.760 all of the data sources we have, either crowdsourced or official, or aggregates, disaggregated, 15:52.760 --> 16:02.920 but just document it so people can choose. And then, um, with regards to being able to compare 16:02.920 --> 16:11.480 feeds with each other, our validator measures what is included in the data. And so when we say the 16:11.480 --> 16:19.960 data has wheelchair accessibility, it will mean the same thing in every country. If the producer says 16:19.960 --> 16:25.480 it, or if the producers and the consumers use the same tool that measures it, then they have the same 16:25.480 --> 16:32.040 definition. Okay. Yes. And then you. Yes. And then you. Yes. 16:32.120 --> 16:40.680 So, I'm going to say to happen this, are you actually consuming the data? Well, we are 16:40.680 --> 16:46.680 pointing to the feed. We are consuming to build this platform, but we don't build any additional service. 16:46.680 --> 16:55.480 We are consuming to display them, but we don't, um, we don't have, I don't have an API on the data. 16:55.480 --> 17:00.120 No, we do have an API, so that's not completely accurate. And we have a stable ID as well. 17:00.200 --> 17:09.560 So, um, so, for example, in France, we rely on the national access point, 17:09.560 --> 17:17.240 transport.data.gouv. And so, they, with Île-de-France Mobilités, they have, uh, they have a stable ID. So, 17:17.240 --> 17:21.720 if you look, and they are in touch with Île-de-France Mobilités, they are the person. And then we, 17:23.480 --> 17:28.120 we provide a URL, uh, based on the transport.data.gouv feed.
17:28.120 --> 17:32.120 In that case, how do you deal with licensing? We have something up. I mean, I do really 17:32.120 --> 17:40.760 get info about data that they license. Yeah. Yeah, exactly. So, whatever opinions people might 17:40.760 --> 17:46.520 have on the weaknesses in France, well, can I say five questions or? Also, I want you to be able 17:46.520 --> 17:53.240 to ask us your hand the whole time. But we, um, GTFS and NeTEx, it's obviously from both the 17:53.240 --> 17:58.680 presentation, the EU guy, but NeTEx is the way that things will be going in the EU. 17:58.680 --> 18:04.520 Yeah. Does your tool have any support for NeTEx, or are you planning to do this? Because having 18:04.520 --> 18:09.560 something like this that tracks the quality of NeTEx would be very useful. 18:09.560 --> 18:18.200 We, yes, some people ask us, we consider it. But we govern this format. And so, 18:18.680 --> 18:23.080 from a product perspective, it could make sense, but we also use the Mobility 18:23.080 --> 18:29.800 Database, uh, more selfishly, to analyze and then make changes to the standard based on what we see, 18:29.800 --> 18:37.320 which we cannot really do in NeTEx. And so, but it's possible, but currently it's not really 18:37.320 --> 18:43.880 a short-term plan, no. Yeah, we did not go a project there. It's about to be a waste of that purpose. 18:43.880 --> 18:54.440 Oh, cool. But we cooperate as much as, as much as we can. Yeah. Okay. Sorry.