WEBVTT 00:00.000 --> 00:10.000 Thanks everyone for attending. My name is Andreas. 00:10.000 --> 00:18.000 I'm a member of the GNU Octave project, I'm a developer for core Octave and I'm also a 00:18.000 --> 00:24.000 maintainer and the primary developer of a few packages, including the statistics and the datatypes packages. 00:25.000 --> 00:33.000 I'm an electronic engineer by training, but I've also done a PhD in biological anthropology, 00:33.000 --> 00:44.000 so I ended up on a different career path, dealing with anthropology and basically doing statistics 00:44.000 --> 00:50.000 and data analysis in a different context than engineering and computer science. 00:51.000 --> 01:00.000 So today I would like to give you my insight about GNU Octave and the GNU Octave ecosystem 01:00.000 --> 01:02.000 in education, 01:02.000 --> 01:12.000 as I've experienced it during my research time, but also as a lecturer in the various universities 01:12.000 --> 01:19.000 I've worked at over the past ten years, and also from the maintainer's and developer's point of view, 01:20.000 --> 01:25.000 because I've been heavily involved with GNU Octave during the past four years 01:25.000 --> 01:31.000 and I've been using it for more than eleven years now for my work and my teaching. 01:31.000 --> 01:43.000 So here we are: GNU Octave in education, an insight beyond engineering, into statistics and data analysis basically. 01:44.000 --> 01:55.000 So what is GNU Octave? GNU Octave is a scientific programming language which is mostly focused on numerical 01:55.000 --> 01:58.000 computations. 01:58.000 --> 02:10.000 Most people know it as the open source alternative to MATLAB, or the free clone of MATLAB, which is kind of true. 02:10.000 --> 02:20.000 I mean, we share the same syntax, but it's also a free, open source project. 02:20.000 --> 02:27.000 So to give you a bit of historical background on this project: it's not new, actually.
02:27.000 --> 02:36.000 It was conceived back in the late 80s, and it was conceived actually as a tool, as an educational tool. 02:37.000 --> 02:42.000 They needed to make a scripting language, so to speak, 02:42.000 --> 02:48.000 in order to help students understand the computations in chemical reactions. 02:48.000 --> 02:57.000 And back then the idea was: well, we can do that in Fortran, which was what scientists used at the time. 02:57.000 --> 03:13.000 And then the whole rationale behind it was: yes, but we don't want the students paying attention to everything that is going on with the Fortran language, with debugging, the compiler and so on. 03:13.000 --> 03:17.000 We want to make a language that is straightforward, 03:17.000 --> 03:29.000 that has a very easy learning curve, so that students can dedicate their focus and their effort to understanding the underlying mathematics. 03:29.000 --> 03:36.000 So this initial idea still stands today, 36 years later or so. 03:36.000 --> 03:44.000 The development of Octave started back in 1992 by a guy named John W. Eaton. 03:44.000 --> 03:48.000 Thankfully he's still around with us. 03:48.000 --> 04:05.000 And so the initial release was in 1993, and now we're almost at Octave 11, for which we've just made a release candidate, 04:05.000 --> 04:15.000 about 10 days ago, and hopefully there will be the major release within this month if everything goes well. 04:15.000 --> 04:34.000 So during these 34 years there have of course been more than 450 contributors to this project, because 34 or 35 years is quite a long time, as you understand, but nevertheless it's always been kind of 04:35.000 --> 04:42.000 a small project in terms of the actual people that were engaged at any one time.
04:42.000 --> 04:53.000 During these 34 years there's always been just a handful of people engaged for certain periods and then leaving and so on, so we're not a big project in this respect. 04:53.000 --> 04:59.000 We don't have a large community; we do have a large user base though. 04:59.000 --> 05:09.000 But we're kind of thin in terms of developers and contributors. 05:09.000 --> 05:12.000 We don't have a foundation supporting us. 05:12.000 --> 05:15.000 I mean, our concept is that basically 05:15.000 --> 05:21.000 we're just a repository and the code base and the group of people who are actually working on that. 05:21.000 --> 05:39.000 And of course the license is the GNU General Public License version 3, because the Octave project is part of the GNU project; it was released like that back then and we continue that. 05:39.000 --> 05:45.000 So a few things about Octave: it's written in C++ mostly. 05:46.000 --> 05:55.000 It's an interpreted language of course, and the scripting language of Octave, 05:55.000 --> 05:57.000 well, it's almost identical, 05:57.000 --> 06:05.000 well, it is identical to MATLAB's, but it has a few extensions that are Octave specific. 06:05.000 --> 06:28.000 And it can be extended by using dynamic libraries written in C++, and this is an integral part of the Octave interpreter, which makes it quite easy to implement or link any kind of library into Octave without any intermediate... 06:28.000 --> 06:33.000 without the need for intermediate libraries or other code. 06:34.000 --> 06:47.000 We use an OpenGL backend for plotting, which is quite useful in most statistical data analysis tasks. 06:47.000 --> 07:02.000 And it comes with a graphical user interface, which I will refer to later on in my talk, and of course it has a traditional command line interface that you can run.
07:02.000 --> 07:11.000 Last but not least, Octave itself is also a library. 07:11.000 --> 07:24.000 So whatever you can do with the interpreter, with the language, you can also do by using Octave as a library in your own projects. 07:24.000 --> 07:34.000 Of course, when you link to Octave your project has to be free as well, because you have to abide by the GPL version 3 of the library. 07:34.000 --> 07:38.000 But the potential is there nevertheless. 07:38.000 --> 07:54.000 So the key feature of Octave that sets it apart from other interpreted languages is that it has built-in support for multidimensional arrays and sparse matrices. 07:54.000 --> 08:09.000 And this is basically an integral part of the Octave library, because it was built upon this concept that you need to work with multidimensional arrays. 08:09.000 --> 08:30.000 So basically when you have, say, a color picture that you need to process somehow, in Octave you don't really need to do some sort of workaround or load some external library, as you would do for example in R or Python. 08:30.000 --> 08:42.000 It's already there: you load the data and it's natively multidimensional, and this helps a lot because you can get a substantial speedup. 08:42.000 --> 08:54.000 Because of course Octave also supports full broadcasting in all math operations, comparisons and Boolean operations, on both dense and sparse matrices. 08:54.000 --> 09:10.000 So this makes it quite easy to write highly vectorized code, which makes the processing in the interpreted language 09:10.000 --> 09:16.000 quite, quite fast. 09:16.000 --> 09:34.000 Another key feature is that we support nested indexing, which plays very well with multidimensional arrays and with writing vectorized code, especially with complex structures.
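To make the broadcasting idea concrete, here is a minimal sketch in plain Python, with nested lists standing in for arrays. It only spells out what a broadcast addition of a column vector and a row vector computes; `broadcast_add` is a hypothetical helper written for this illustration and is not part of Octave or of any library. In Octave the same result comes directly from `[10; 20; 30] + [1, 2]`, with no loop written by the user.

```python
# Illustrative sketch only: plain Python lists standing in for Octave arrays.
# `broadcast_add` is a hypothetical helper, not an Octave or library function.

def broadcast_add(column, row):
    """Add an n-element column to an m-element row, giving an n-by-m result.

    Every column element is paired with every row element, which is what
    broadcasting does when a column vector meets a row vector.
    """
    return [[c + r for r in row] for c in column]


column = [10, 20, 30]   # think of a 3x1 column vector
row = [1, 2]            # think of a 1x2 row vector

result = broadcast_add(column, row)
for line in result:
    print(line)
```

The explicit double loop here is exactly the work that a vectorized expression hands over to the interpreter's compiled core, which is where the speedup mentioned in the talk comes from.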
09:34.000 --> 09:47.000 As I said, we have an OpenGL backend, so there are extensive plotting capabilities that are built in; we don't have to rely on external packages like gnuplot or whatever else. 09:47.000 --> 10:00.000 And of course there is an extensive set of core functions that deal with geometry, linear algebra; we have ordinary differential equation solvers, 10:00.000 --> 10:05.000 a whole set of them; you get statistics, set operations, etc. 10:05.000 --> 10:29.000 And, well, I consider this one of the very important aspects of Octave: there is an integrated testing suite, which is used by Octave internally but is also used by all the packages that can be used as add-ons to the Octave programming language. 10:30.000 --> 10:42.000 So if you've used Octave before, most likely you've heard of Octave Forge, for sure, and of Octave Packages. 10:42.000 --> 10:53.000 Octave Forge is the legacy system that was used; it started back in 2000 and basically it was kind of a 10:53.000 --> 11:06.000 sibling project to the GNU Octave project that dealt with the packages that would extend the Octave functionality for specific 11:06.000 --> 11:18.000 needs and for specific scientific fields, so you would get a statistics package, you would get a package for optimization, you would get geographic packages dealing with... 11:18.000 --> 11:33.000 So, as I said, the Octave community, the Octave developers team, has always been 11:33.000 --> 11:43.000 slim in terms of the number of people, so the original Octave Forge 11:43.000 --> 11:50.000 packaging system came to a stall basically somewhere about 11:50.000 --> 11:58.000 2014, so to speak, and there were problems actually maintaining it. 11:58.000 --> 12:04.000 It also had to do with the fact that when it was built back in 2000, it was built
12:04.000 --> 12:21.000 with an old system and old concepts in mind in terms of packaging, so back in 2020 there was a shift: we changed the whole packaging and we moved it to GitHub; the previous 12:21.000 --> 12:25.000 Octave Forge was hosted on SourceForge. 12:25.000 --> 12:42.000 And on GitHub we made good use of the continuous integration capabilities, which are free for open source projects, until now at least. 12:42.000 --> 12:57.000 And the concept was to make an automated system for publishing packages, so that other maintainers, other users, can publish their own code without 12:57.000 --> 13:16.000 the necessity of any of the core Octave developers being actively engaged or involved in this, and the idea behind this was to be able to help people expand the use of Octave. 13:16.000 --> 13:32.000 Because what Octave has mostly been used for so far was basically linear algebra and matrix computations, and this is why, when you hear about doing statistics and data analysis with Octave, 13:32.000 --> 13:41.000 it sounds a bit weird, right? Because nowadays most people use either R or Python, for example. 13:41.000 --> 14:04.000 But nevertheless the transition that we did worked: Octave Forge used to have something like 55 packages, and during the last years the majority of those were not maintained, whereas with the new system 14:04.000 --> 14:20.000 the Octave Packages index has more than 130 packages, with more than 100 of them being actively developed and maintained, so from our perspective 14:20.000 --> 14:24.000 the effort of making this transition did work. 14:24.000 --> 14:40.000 The Octave Packages index is basically a single file that is automatically generated with all the packages that are listed on the Octave Packages index, and basically what we actually do is that
14:40.000 --> 14:57.000 we have set up continuous integration to monitor and, most importantly, test all the packages, so any of you can write your own package that you want to use in your research project, in the class, wherever, and you can 14:57.000 --> 15:17.000 publish it there and have it readily available for your students and your colleagues to install in Octave, and what we do is make sure that it goes through the continuous integration testing, that it doesn't break Octave when you install it. 15:17.000 --> 15:31.000 And the other thing that we also have nowadays is that Octave also checks the integrity of what is being downloaded, 15:31.000 --> 15:58.000 which was a long-standing request from many users, and we did that. Of course, you still have to know what you're downloading, because we are just testing the package that you download, but you're downloading code for a programming language. 15:58.000 --> 16:20.000 So when you download a package you have to keep this in mind. But anyway, you can install all this in Octave with a simple command, pkg install and the package that you want to install, and this makes it quite 16:20.000 --> 16:34.000 helpful not only for colleagues but especially in classrooms, at least in my experience, because it has been quite a few times that I just 16:34.000 --> 16:50.000 had to make something for the classroom, then at least have it available and ask the students to download it, either in the lab or on their personal computers, and to work with that, 16:50.000 --> 17:06.000 which is quite handy in certain systems. So during my research I've mostly been dealing with population statistics in biological anthropology and doing classification on biological parameters.
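The integrity check mentioned above can be pictured with a short Python sketch. The talk does not say how Octave performs the check, so this assumes the common approach of comparing a SHA-256 digest of the downloaded archive against a digest published with the package; the function names and the sample bytes are made up for illustration.

```python
import hashlib

# Sketch under an assumption: integrity is checked by comparing SHA-256 digests.
# The talk does not specify the mechanism; names below are hypothetical.

def sha256_of(data: bytes) -> str:
    """Return the hexadecimal SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def is_intact(data: bytes, expected_hex: str) -> bool:
    """True when the downloaded bytes match the published digest."""
    return sha256_of(data) == expected_hex.lower()


# Stand-in for a downloaded package archive (made-up content).
archive = b"pretend these bytes are a downloaded package archive"
published_digest = sha256_of(archive)   # what an index would list

print(is_intact(archive, published_digest))                # unchanged download
print(is_intact(archive + b"tampered", published_digest))  # altered download
```

As the talk points out, a matching digest only shows that the download is the file that was published. It says nothing about whether the code inside is something you want to run.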
17:06.000 --> 17:21.000 So using Octave for the most part meant dealing with the statistics package, and this is how I started contributing back in 17:21.000 --> 17:30.000 2016-17, I think, and then at a certain point, because there were no maintainers, I said okay, 17:30.000 --> 17:40.000 and I took over the maintenance and the main development back in 2022. 17:40.000 --> 17:51.000 And because I've been using this for the classes as well, I put substantial effort into expanding it and making it, 17:51.000 --> 18:11.000 well, not complete, but as complete as I could, so nowadays the latest statistics release is 1.8 and there are more than 450 functions and class objects that are supporting the... 18:11.000 --> 18:30.000 There is support for more than 30 different distributions, and as far as I know it's the only package that fully supports so many distributions in a single package. 18:30.000 --> 18:39.000 And for these distributions it includes random generators, distribution fitting, log-likelihood functions, etc. 18:39.000 --> 18:57.000 Of course there's a very large set of functions for hypothesis testing, and most important are the 11,000 unit tests that are integrated in the statistics package. 18:57.000 --> 19:22.000 One of the important aspects of Octave, both core Octave and a lot of the packages, is that we pay a lot of attention to regression testing; it's not about writing some code and making it somehow work, we really have to 19:22.000 --> 19:43.000 make sure that what we produce as a result is correct, well, at least to the best of our effort, because after all we're making a library for numerical computation, and if the output is not numerically correct it's kind of useless, no matter how fast it is or how...
19:43.000 --> 19:52.000 And of course in the statistics package I've put a lot of... I mean I've made a lot of 19:52.000 --> 20:08.000 integrated algorithms for classification and regression. I've written these because I needed them for my own reasons, but after some point, when I also started teaching, I found it quite handy 20:08.000 --> 20:26.000 that I can use Octave to teach statistics, and I will get to that in a few minutes. The other thing about the statistics package in particular, well, basically all the packages I maintain, but also Octave, is that 20:26.000 --> 20:35.000 apart from testing we also have a focus on good documentation. 20:35.000 --> 20:41.000 Because it is important to be able to... 20:41.000 --> 20:49.000 I mean, when you write the function yourself you know how it works and you know how to call it and so on, but 20:49.000 --> 20:55.000 when you want your students to use it, well, the only way to get there is to actually 20:56.000 --> 21:00.000 write proper documentation, which is 21:00.000 --> 21:09.000 work on its own; I mean, writing the documentation for a function usually takes the same amount of time as writing the code for the function. 21:09.000 --> 21:14.000 And this is just an example from the online documentation 21:14.000 --> 21:29.000 of the statistics package: this is the function for the geometric mean, and this documentation is also automatically generated with another package I've written, the pkg-octave-doc. 21:29.000 --> 21:42.000 So basically what it does is take all the help docstrings that are embedded in the function files, and also the demos that are available, and produce this online documentation for users 21:42.000 --> 21:50.000 to be able to read how to use the function and also find certain examples.
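As a rough analogue of a documented function whose help text can be extracted automatically, here is a small Python sketch of a geometric mean with an embedded docstring. It is not the statistics package's implementation and it does not reproduce how pkg-octave-doc works; the function body, the choice to reject non-positive values, and the sample data are assumptions made for illustration.

```python
import inspect
import math

# Sketch only: a Python stand-in, not the Octave statistics package's function.

def geometric_mean(values):
    """Return the geometric mean of positive numbers.

    Computed as exp(mean(log(x))), which avoids overflow from
    multiplying many values together.
    """
    values = list(values)
    if not values:
        raise ValueError("geometric_mean requires at least one value")
    if any(v <= 0 for v in values):
        raise ValueError("geometric_mean requires strictly positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


sample = [1, 3, 9]                 # made-up data
gm = geometric_mean(sample)
print(gm)

# The help text lives next to the code, so a documentation tool can pull it
# out without a separately maintained manual page.
print(inspect.getdoc(geometric_mean))
```

Keeping the help text inside the function file is what lets the documentation be regenerated whenever the code changes.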
21:50.000 --> 22:05.000 So working with statistics, inevitably you end up needing certain data types, and especially tables, 22:05.000 --> 22:11.000 which is something that core Octave lacks, and, well, they are available in 22:11.000 --> 22:20.000 MATLAB; other languages have their own implementations of tables, like R has the data frame concept, etc. 22:20.000 --> 22:30.000 So over the past three years now I've been implementing the datatypes package from scratch. 22:30.000 --> 22:40.000 The idea was to make table and categorical arrays available, because those are the two things that I need for the statistics package. 22:40.000 --> 22:46.000 But doing that, you know, you get into the rabbit hole and you start... 22:46.000 --> 22:53.000 I mean you realize that you then need to make datetime arrays and duration arrays and string arrays, etc. 22:53.000 --> 22:59.000 So I ended up with the datatypes package, which is... 22:59.000 --> 23:10.000 I mean, it's not production ready, there is still functionality missing, but what is already there is working, because as I said, 23:10.000 --> 23:23.000 what we build, we also build tests for, so we know that if it's working, at least it's working correctly, so it won't mess up your data. 23:23.000 --> 23:28.000 If you call a function and it's not there, it will just not work. 23:28.000 --> 23:38.000 And this is what you get, for example, which is similar to how MATLAB uses tables. 23:38.000 --> 23:53.000 And when you do it for yourself during your research it doesn't really matter; you can have your numbers and your data in CSV files. 23:53.000 --> 24:11.000 But when you get to the class and you actually have to teach students how to use the data, load them from CSV files and use them to do an ANOVA, etc.,
24:11.000 --> 24:20.000 being able to show the data in tabular format actually helps a lot. 24:20.000 --> 24:24.000 So, I mean, we are in the education 24:24.000 --> 24:38.000 room of course, and most likely a lot of you will be wondering: okay, nice work that you guys are doing with MATLAB, thank you very much, so what's the education in it? 24:38.000 --> 24:56.000 So, from my own personal experience, which is not huge, I mean I've been teaching classes for the past five or six years now, mostly to undergraduate students. 24:56.000 --> 25:04.000 And the other thing is that I haven't been teaching in CS. 25:04.000 --> 25:08.000 The students I teach are from the humanities, they are biologists, 25:08.000 --> 25:15.000 they are people doing social sciences, and they need to learn statistics. 25:15.000 --> 25:23.000 So basically you don't have tech students or STEM students. 25:23.000 --> 25:44.000 And this is quite important, because what we are dealing with today in the class is that we have this AI thing, and the question is, to AI or not to AI, basically. 25:44.000 --> 25:54.000 So a lot of people are saying: okay, you will just have the AI code anything you want. 25:54.000 --> 26:03.000 And it kind of works; if you want to plot something, the AI will do it for you today, somehow. 26:03.000 --> 26:10.000 But when it comes to teaching students to actually understand the statistical concepts and apply them, 26:10.000 --> 26:19.000 they still have to understand them in order to be able to apply them, even if they use an AI assistant to do it for them. 26:19.000 --> 26:30.000 Because at the end of the day, if you go to the grocery and buy something that costs two euros and you pay with five, if you don't know math, well, basic math,
26:30.000 --> 26:38.000 the best calculator in the world will not help you, because you don't know what to do with two and five. At the end of the day the same applies to statistics. 26:38.000 --> 26:59.000 So from my perspective, what are the key aspects that make a great programming language for teaching students statistics and data analysis? 26:59.000 --> 27:06.000 Well, to begin with, indexing starts at one instead of zero. 27:06.000 --> 27:19.000 I mean, most of us are developers, programmers and so on, and at the end of the day each one of us has a favorite language, but it's the language that we use the most and that we know. 27:19.000 --> 27:25.000 You would guess that my favorite languages are Octave and C++; this is what I write, this is what I know. 27:25.000 --> 27:31.000 But first-year students, well, they don't know anything. 27:31.000 --> 27:42.000 So basically every little detail matters in how easy it is to attract their attention and get them involved. 27:42.000 --> 27:50.000 The other benefit is that the syntax naturally resembles mathematical notation: x equals five, that's it. 27:50.000 --> 27:56.000 You see it on the blackboard, you just type it and it's the same. 27:56.000 --> 28:10.000 And also the self-intuitive vectorization syntax, which also has to do with the way Octave's syntax does array indexing and broadcasting. 28:10.000 --> 28:20.000 And last but not least, it's the modern yet simple graphical user interface that Octave has, which also has an integrated debugger, 28:20.000 --> 28:36.000 the workspace view and the variable editor, and at least in my experience in the class it has been very, very helpful in getting students to understand basic concepts of programming.
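The point about indexing starting at one can be illustrated with the blackboard formula for the mean, (1/n) times the sum of x_i for i from 1 to n. The Python sketch below shows the translation step a zero-based language asks of a beginner; `x_at` is a hypothetical helper and the data are made up. In Octave the first element is simply `x(1)`, matching the formula as written.

```python
# Sketch: the mean formula (1/n) * sum_{i=1..n} x_i in a zero-based language.
# `x_at` is a hypothetical helper for this illustration; the data are made up.

x = [2.0, 4.0, 9.0]
n = len(x)


def x_at(i):
    """Return x_i using the 1-based index from the blackboard formula."""
    if not 1 <= i <= n:
        raise IndexError(f"index {i} is outside 1..{n}")
    return x[i - 1]   # the shift by one is the extra step students must learn


# The loop runs i = 1, ..., n exactly as the summation is written.
mean = sum(x_at(i) for i in range(1, n + 1)) / n
print(mean)
```

The shift inside `x_at` is a small detail, but it is exactly the kind of detail the talk argues gets in the way for first-year students with no programming background.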
28:36.000 --> 28:49.000 And I think this is highly important, and this is why I wanted to talk to you about GNU Octave in education, at least from my own experience over the past few years. 28:49.000 --> 28:54.000 So thank you very much for your attention, and I'm open to any questions. 28:54.000 --> 29:04.000 Thank you very much. 29:04.000 --> 29:11.000 Yeah, other questions? Yes, would you like to... 29:11.000 --> 29:25.000 So, R and Octave: I understand there is a lot of overlap, but what would be the main distinction? And I also want to follow up with another question. 29:25.000 --> 29:29.000 Last year there was a talk about robotics and education. 29:29.000 --> 29:40.000 With R, I know it's not recommended for embedded because of memory, reliance on the OS and other reasons. 29:40.000 --> 29:52.000 Do you know what the situation is with Octave and its use in embedded systems? 29:52.000 --> 29:56.000 Thank you for the question. 29:56.000 --> 30:02.000 Well, I don't know much about embedded systems, to be honest. 30:02.000 --> 30:12.000 I know you can write code in MATLAB and have it translated for embedded systems. 30:12.000 --> 30:16.000 I don't think you can do that with Octave. 30:16.000 --> 30:31.000 There are packages in Octave with which you can interface with Arduino, for example, or you can interface with low-level I/O systems through serial ports, USB, etc. 30:31.000 --> 30:36.000 So that's regarding the second question you made. 30:36.000 --> 30:48.000 On the first question, about R: well, R is mostly used for statistical computations. 30:48.000 --> 31:01.000 And you can do a lot of stuff, but as I said in the previous slide, well, I mean it has a very bad syntax. 31:01.000 --> 31:19.000 So I mean it's a great software, but in terms of education, 31:19.000 --> 31:39.000 when you're in high school, or when you have first-year students who know nothing about programming languages,
31:39.000 --> 31:52.000 well, you have to explain to them why x <- 5 is equivalent to x = 5, so to speak. 31:52.000 --> 32:02.000 If you see what I mean. Or for example if you use Python and you want to do some vectorization, 32:02.000 --> 32:15.000 you have all this semicolon, semicolon, comma, semicolon, a number, a minus number sort of indexing; I mean, if you know Python it's like, yeah, okay. 32:15.000 --> 32:27.000 But in education we're dealing with students, well, especially in my field, where I teach students from the social sciences. 32:27.000 --> 32:41.000 I mean, I put a lot of effort into convincing them that you need some sort of knowledge in data analysis, you need to be able to write five lines of code 32:41.000 --> 32:53.000 and merge two CSV files to do your data analysis, instead of copy-pasting stuff in Excel, you know. 32:53.000 --> 33:06.000 I hope I just answered your question. Anybody else? Is time up? Okay, is there time for a last question or... 33:06.000 --> 33:24.000 I think there is an ongoing effort by... 33:25.000 --> 33:41.000 There's a company, I think, that has done an integration with WebAssembly, but I don't know much about it, to be honest, to be more informative. 33:41.000 --> 33:56.000 But I know that there is some effort to integrate it; I mean, I don't know much about it. 33:56.000 --> 33:57.000 Okay, thank you very much.