22 Concluding remarks, next steps, and open problems

STATUS: Under construction.

22.1 Concluding remarks

There’s an old saying, something along the lines of ‘may you live in interesting times.’ I’m not sure if every generation feels this, but we sure live in interesting times. In this book, I have tried to convey some essentials that I think would allow you to contribute. But we are just getting started.

I’m 35 and so I am in the ‘data science didn’t exist when I was an undergraduate’ generation. In a little over a decade data science has gone from something that barely existed to a defining part of academia and industry. What does that imply for you? It may imply that one should not just be making decisions that optimize for what data science looks like right now, but also for what it could become. While that’s a little difficult, that’s also one of the things that makes data science so exciting. That might mean choices like:

  • taking courses on fundamentals, not just fashionable applications;
  • reading books, not just whatever is trending; and
  • trying to be at the intersection of at least a few different areas, rather than hyper-specialized.

I’m just someone who likes to play with data using R. A decade ago I wouldn’t have fit into any particular department. I’m lucky that these days there is space in data science for someone like me. And the nice thing about what we now call data science is that there’s space for you as well.

Data science needs diversity. Data science needs your intelligence and enthusiasm. It needs you to be in the room, and able to make contributions. We live in interesting times and it’s just such an exciting time to be enthusiastic about data. I can’t wait to see what you build.

22.2 Next steps

This book has covered a lot of ground. Chances are there are aspects that you want to explore further, building on the foundation that you have established. If so, then I’ve accomplished what I set out to do.

For learning more about R in terms of data science, there is only one recommendation that is possible and that is Wickham and Grolemund (2017). To deepen your understanding of R itself, go next to Wickham (2019a).

If you’re interested in learning more about causality then start with Cunningham (2021) and Huntington-Klein (2021).

If you’re interested in learning more about statistics then begin with Johnson, Ott, and Dogucu (2022), McElreath (2020), and Gelman et al. (2014). You should probably also backfill some of the fundamentals, starting with Wasserman (2005).

There is only one natural next step if you’re interested in learning more about statistical learning (what’s come to be called machine learning), and that’s James et al. (2017) followed by Friedman, Tibshirani, and Hastie (2009).

If you’re interested in sampling then the next book to turn to is Lohr (2019). To deepen your understanding of surveys and experiments, go next to Gerber and Green (2012) in combination with Kohavi, Tang, and Xu (2020).

For graphs turn to Healy (2018).

For writing go to…

For thinking through production, SQL, and things like that, a natural next step is…

We often hear the phrase ‘let the data speak.’ Hopefully by this point you understand that never happens. Never do anything in moderation, and if you can’t do anything else, then be kind.