Corey Wade is the author of Hands-On Gradient Boosting with XGBoost and scikit-learn. We got the chance to sit down with him and find out more about his experience of writing with Packt.
Q: How did you become an author for Packt? Tell us about your journey.
Corey: I was recruited by Packt in 2018 to help develop and write The Python Workshop. They were looking for a team of programmers, educators, and writers. My profession is education, so it seemed like a nice fit. I researched Python and proposed a table of contents, subsequently writing the first chapter on Vital Python, and the last two chapters on Data Analytics and Machine Learning. The book is currently available on Amazon with over 37 five-star reviews.
After The Python Workshop was published, Packt asked me to submit a proposal for an XGBoost book. I wanted to learn more about XGBoost, so I did some intense research before submitting a proposal, which was accepted. I included the historical and theoretical development in the proposal, placing XGBoost and Gradient Boosting in the larger context of Decision Trees and Ensemble Methods within the Machine Learning Landscape.
Q: How long did it take you to write the book?
Corey: This book has taken approximately a year to write.
Q: What kind of research did you do, and how long did you spend researching before beginning the book?
Corey: I researched XGBoost during the entire process of writing the book. My research consisted of reading original papers, experimenting with code, frequenting Stack Overflow, scouring Kaggle forums, reading interviews from Kaggle winners who had used XGBoost, and consulting excellent third-party sources online like Jason Brownlee’s Machine Learning Mastery blog and DataCamp. Additionally, whenever I came across a new topic, or a term that needed clarification, I looked up multiple references online and consulted the original documentation.
Q: Why should readers choose this book over others already on the market? How would you differentiate your book from its competition?
Corey: This book stands out because it includes the XGBoost Scikit-learn wrapper in nearly all examples, details and models from the Higgs boson Kaggle competition, the theoretical mathematics behind XGBoost, foundational machine learning topics, and advanced machine learning topics like building non-correlated ensembles and customized transformers.
Q: What’s your take on the technologies discussed in the book? Where do you see these technologies heading in the future?
Corey: XGBoost is the best machine learning algorithm for handling tabular data in the majority of cases. Before this book, I used XGBoost sparingly. Now I use it all the time. As more data is developed and analyzed, the need for more machine learning algorithms will continue to rise, and XGBoost will only be in higher demand.
Q: Did you face any challenges during the writing process? How did you overcome them?
Corey: There are always challenges when writing a book of this scope! The biggest challenge for me was analyzing the theoretical mathematics behind XGBoost so that I could explain it in a way that made sense to the reader. Working with the original paper, I found an additional reference to the mathematics online, and worked with pencil and paper over multiple nights until it made sense. I went through a similar process trying to fully understand the data behind the Higgs boson competition where XGBoost first made its mark on the world. Both topics are included in the advanced introduction to XGBoost in Chapter 5.
Q: What advice would you give to readers learning tech? Do you have any top tips?
Corey: Always code. It’s not enough to read code. It’s essential to write code at the same time. Take classes, enroll in Kaggle competitions (machine learning), and research every topic that you come across. If you want to get better, consult academic papers and read the original documentation. And get involved with a big project!
Q: How do you keep up-to-date on your tech?
Corey: You can’t rest on prior knowledge in the tech world. I always do research when I teach new classes for Berkeley Coding Academy, and I check the dates of my references. As for regular sources, DataCamp stays on top of the data science world, and I like how their courses are focused on clean code without the fluff. I also look for new classes that are being offered from reputable sites like MITx and Coursera. It’s not enough just to take classes; it’s also important to make sure that the classes are relevant to the changes going on in the world today. Regular reading, writing, teaching, and learning do the trick for me.
Q: Do you have a blog that readers can follow?
Q: Can you share any blogs, websites and forums to help readers gain a holistic view of the tech they are learning?
Corey: Yes! XGBoost has its own forum, https://discuss.xgboost.ai/, in addition to its own documentation, https://xgboost.readthedocs.io/en/latest/. Be sure to read the original paper on XGBoost, https://arxiv.org/pdf/1603.02754.pdf, and check out some YouTube videos of XGBoost author Tianqi Chen lecturing, such as this one: https://www.youtube.com/watch?v=Vly8xGnNiWs&t=2132s&ab_channel=DataScience.LA. Also, the Higgs boson Kaggle competition is a mine of XGBoost information: https://www.kaggle.com/c/higgs-boson/discussion/10335.
Your general machine learning journey may be greatly enhanced with Jason Brownlee’s Machine Learning Mastery blog, https://machinelearningmastery.com/about/, which includes XGBoost documentation along with a huge range of other topics. Finally, Stack Overflow is gold for any coding questions or issues that arise along your journey – I use it many times every week.
Q: How would you describe your author journey with Packt? Would you recommend Packt to aspiring authors?
Corey: My journey with Packt has been a struggle at times, but ultimately rewarding. I would definitely recommend Packt for authors, educators, and programmers who are motivated to break into the publishing world and communicate with a wider audience. With Packt, if accepted, you can be sure that your book will be published, and you will be provided with a team to help ensure the professionalism and consistency of the work. The process isn’t perfect, but the results speak for themselves.
You can find Corey’s book on Amazon by following the links below the cover image: