Brendan Malone talks about DSM with R

What led up to this DSM “cookbook”?

Things really kicked off in 2010 when I was invited, with Budiman and Alex, to prepare a digital soil mapping training course for a number of Australian government scientists and soil scientists. Digital soil mapping was becoming operationalised in Australia around this time, with initiatives such as GlobalSoilMap and the Australian Soil and Landscape Grid all being newly established. We needed to build DSM capacity in the country. Since then there have been a number of workshops and training courses related to DSM in Australia and around the world. The course materials have evolved significantly since 2010 and have been extended somewhat. Right at the beginning we were using a smorgasbord of software throughout the course, but it has been streamlined a fair bit and it's now given exclusively using the R statistical software.

Why did you choose R?

R is very popular. It has its limitations of course, but I like it for all the obvious reasons: it is free and open source and has very extensive help and support. I had never been exposed to scripting until the beginning of my PhD in 2009, when Budiman encouraged me to learn R. What actually helped with my learning was translating some of Budi’s Matlab code to R! A very cathartic exercise that was, I might say. But anyhow, R is quite easy to learn, it is intuitive, and I like how you can develop complete workflows of a statistical analysis. This is good for reproducing your work and effective in a teaching context too.

You travelled around the world giving this DSM course?

I have! It is definitely one of the perks of the job, I might say. I won’t list them all, but a general observation from all my international training experiences is that soil scientists are a friendly bunch of people. Our workshops attract all skillsets. We have had seasoned pedologists, fledgling undergrad and postgrad students, government scientists, private consultants and professionals. Furthermore, people’s experience with scripting and R has been equally diverse. This makes our workshops pretty fun. As facilitators of the courses we have to be on our toes too. I don’t think any two workshops are the same really, so we have to be prepared for anything and improvise often. Ultimately this creates quite nuanced experiences for the participants, which is why I suppose we have not had too many complaints yet! Overall, I think this type of high-intensity training environment has been really good for us in honing our training materials over time.

How long did it take to write the book?

The book materials have taken some time to develop. There have not been extended periods of writing it though, just bits and pieces here and there. I did do a fair bit of writing for it in 2015 though, once we had established the publishing contract with Springer. I migrated everything to LaTeX and knitr in 2013 to give seamless integration between script and text. That was a rather fiddly exercise but worth it, I think.

Which part or chapter of the book do you enjoy most?

Are you implying this book is not a masterpiece! There is not a single part, but I would say I quite like the parts where a particular method for addressing a question is elaborated upon. For example, I like the chapters about soil map disaggregation and the 2-stage DSM. I also like the section about estimating soil homologues. There was actually quite a bit of stuff going on behind the scenes for this book which I am quite happy with too, namely the development of a number of R packages. These include ithir, which has all the data that is used in the book. There is also the dsmart R package (soil map disaggregation) and the fuzme R package, which does fuzzy k-means and fuzzy k-means with extragrades.

Did you get feedback from readers?

No, I think we have stunned everyone into silence. No, seriously, there have been quite a few personal congratulatory emails that I have found very satisfying. I can’t say I have fielded any sharp criticisms (yet). Ultimately the book is not for seasoned DSM people, but rather for new students and practitioners wanting an entry point to our science.

Can we get all the R code from the book?

Yes, you can. We are setting up a book website that will have all the R scripts. I also have most of the R scripts up on my GitHub repo “USYD_DSM”. Anyone can get the data by getting hold of the ithir R package I have developed.

Isn’t a cookbook of DSM dangerous? Anyone without knowledge of soil can now make soil maps from a few lines of code.

Well, the flip side of your argument is that if we shroud our workflows in mystery and jargon, then nobody will get on board. Part of the reason why DSM is operationalised, for example in Australia, is that this book and our courses have made it accessible. I don’t think we are dumbing it down; rather, we have partially designed the pedagogical framework needed to build capacity in DSM around the world. So let’s get it out there, I say.

I do believe that in the book we have gone to some lengths to stress the importance of validation, and have covered different approaches for how to go about doing it. Also, the chapter on uncertainty quantification is integral to the workflow of DSM. Evaluating the quality of our outputs by communicating the uncertainties is a strong characteristic of the DSM community, and probably sets us apart a bit from other communities where geospatial science is undertaken.

What would you like to see in a future edition of this book?

A future edition of the book would be one that stays up to date with what is going on in the R GIS space and the spatial modeling space. I think I would also like to see the book in digital form, complete with interactive scripting. There are some other unborn ideas bubbling away too. As a start though, what would you (meaning the DSM community) like to see in a future edition? I would like to see what other people think about this too, you know.
