Sarthak is the co-author of Optimizing Databricks Workloads. We got the chance to interview him and find out more about his writing experience with Packt.
Q: What is/are your specialist tech area(s)?
Sarthak: My specialist tech areas include data engineering and analytics services on Microsoft Azure, Azure Databricks, Synapse Analytics, Data Lake, Data Factory, Event Hubs, and Cosmos DB. Power BI is the primary business intelligence and visualization tool that I occasionally work with.
Q: How did you become an author for Packt? Tell us about your journey. What was your motivation for writing this book?
Sarthak: It all started when Anshul reached out to me with this book-writing opportunity. I accepted immediately because I have always enjoyed writing and wanted to share my technical knowledge. The idea of writing a book on Databricks also excited me because I had been learning and working with it from the very first day of my big data career. Overall, the journey was smooth, though it did not go as I had initially planned. I quickly slipped from my initial target of writing two pages per day because of my full-time job, and ended up spending weekends finishing the chapters to meet the deadlines. But the journey was definitely worth the effort!
Q: What kind of research did you do, and how long did you spend researching before beginning the book?
Sarthak: My research involved reading Databricks blogs and Microsoft documentation pages. Since the book relied heavily on Spark/Databricks optimization techniques, I had to understand them well. To do that, I spent a lot of time on personal proofs of concept and R&D. This practical know-how helped me explain the more complex concepts clearly.
Q: Did you face any challenges during the writing process? How did you overcome them?
Sarthak: Yes. The main challenge I faced was sticking to the timelines. Initially, I planned to write two pages a day but fell behind that goal because of my full-time job. As a result, I had to spend quite a lot of time on weekends to meet the deadlines and ensure the chapters were ready on time for review.
Q: What’s your take on the technologies discussed in the book? Where do you see these technologies heading in the future?
Sarthak: Databricks recently reached a $38 billion valuation and is disrupting the cloud and data industry across the globe. The pace at which Databricks is innovating is unprecedented, and I expect it to continue for a very long time. In the near future, Databricks is going to become a must-have skill for all data engineers, scientists, and analysts as the big data industry continues to grow.
Q: Why should readers choose this book over others already on the market? How would you differentiate your book from its competition?
Sarthak: “Optimizing Databricks Workloads” is for data engineers, data scientists, and cloud architects who have worked with Spark/Databricks and have a basic understanding of data engineering principles. But even if a beginner picks up the book, they will grasp the concepts quickly. The code samples and worked examples are simple and easy to follow, making the book suitable for both beginners and experienced professionals. Readers should choose this book over other Databricks books already on the market because it gives a holistic picture of this cloud-native data analytics platform.
The book starts with a brief introduction to Azure Databricks and then quickly covers essential optimization techniques. Readers will understand how to select the optimal Spark cluster configurations for running big data processing and workloads in Databricks and get to grips with some beneficial optimization techniques for Spark data frames, best practices for optimizing Delta Lake, and techniques to optimize Spark jobs through Spark Core. They will also learn about some real-world scenarios where Databricks has helped organizations increase performance and save costs across various domains.
Q: What are the key takeaways you want readers to come away from the book with?
Sarthak: The key takeaways are as follows:
1. Get to grips with Spark fundamentals and the Databricks platform
2. Process big data using the Spark DataFrame API with Delta Lake
3. Analyze data using graph processing in Databricks
4. Use MLflow to manage machine learning lifecycles in Databricks
5. Learn to choose the right cluster configuration for your workloads
6. Explore file compaction and clustering methods to tune Delta tables
7. Discover advanced optimization techniques to speed up Spark jobs
Q: What advice would you give to readers learning tech? Do you have any top tips?
Sarthak: For anyone learning tech, the most important thing is to keep learning daily. As the tech around us evolves every day, we need to ensure that we keep up with that pace. I believe in learning by doing and would recommend the same to everyone passionate about working in tech. We are solely responsible for our personal learning journeys and should not shy away from research and proofs of concept.
Q: Can you share any blogs, websites, and forums to help readers gain a holistic view of the tech they are learning?
Sarthak: The Databricks Blog and Azure Databricks documentation.
Q: How did you organize, plan, and prioritize your work and write the book?
Sarthak: I spread the book-writing work across weekends so that I could prioritize my job on weekdays.
Q: How would you describe your author journey with Packt? Would you recommend Packt to aspiring authors?
Sarthak: My author journey with Packt was excellent. The entire Packt team was very helpful and prompt in answering all our queries. From the beginning, when we were introduced to the Packt writing style, to submitting the last chapter’s final draft, the journey was very smooth. The technical reviewers ensured that the content was complete and up to date. Yes, I would recommend Packt to aspiring authors!
Q: What are your favourite tech journals? How do you keep yourself up to date on tech?
Sarthak: My favourite tech journals include the ‘Databricks Blog’ and the ‘Official Microsoft Blog’. I also follow multiple tech YouTube channels focusing on Azure.
1. Databricks Blog – https://databricks.com/blog
2. Official Microsoft Blog – https://blogs.microsoft.com/
Q: Do you belong to any tech community groups?
Sarthak: Yes, I am part of the Microsoft Tech Community.
Q: What is the one writing tip that you found most crucial and would like to share with aspiring authors?
Sarthak: Try to stick to the timelines and keep a rough template for each chapter ready before beginning the writing process.
You can find Sarthak’s book on Amazon.