Deciding to participate in Data Science Dojo’s bootcamp, A Hands-on Introduction to Data Science, was simple: I have been a consumer of bootcamps for several years, with varying success. In my prior self-paced learning, I found that there were concepts that I simply did not understand well, or that perhaps were not explicitly stated in whatever course I was taking. I wanted to experience an in-person immersive bootcamp, in the hope that practical examples and in-person interactions would help me understand and retain the material. It was also a chance to network with others who are interested in this field.
The Data Science Dojo bootcamp is taught by Raja Iqbal, CEO and Chief Data Scientist. He is a talented presenter, and I appreciated his style of teaching the material. He was accompanied by Arham Akheel, who assisted Raja in helping students and also provided us with machine learning demonstrations. The two complemented one another, and the combination worked well. Please check out Data Science Dojo’s website and their schedule, for they may be coming to a city near you!
The bootcamp was offered in Albuquerque, New Mexico, as a 3-day course instead of the prior 5-day format. From what I understand, we were the first cohort to try this format.
On the first day, we spent some time looking into data exploration and how to approach data problems. We discussed things as a group, and I enjoyed the energy from the class. We discussed that a model is only as good as the data provided to it: garbage in, garbage out. Data is the new oil, and is the most valuable asset a company can have. However, we as data scientists need to tap into that resource by refining it and getting the most value from it. One thing I have personally struggled with, and that this course was VERY helpful for, was learning how to ask the right questions and evaluate business impact. It is our job to ask questions. Many times in the past, I was given a task and simply began to hammer away, no questions asked. In data science, feature engineering and data exploration are the most important tasks, as these activities help to further define the problem and evaluate whether it is a worthy endeavor for a company.
On the second day, we began to delve into machine learning algorithms, specifically supervised learning. I found this valuable, as supervised learning is where I have the most experience and understanding. We stressed again that before building a model we should ask, “What is the intended use of this model?”, as the answer is pertinent in determining what features to include and in what format to provide the model to the stakeholders who will use it. We analyzed the Titanic dataset in detail and discussed what features to include in our decision tree model. We also discussed entropy, stopping criteria, and splitting. Our homework assignment was to submit our Titanic model to our leaderboard. I did not place very high, lol.
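To make the entropy and splitting discussion a little more concrete, here is a minimal sketch of how a decision tree might score a candidate split using entropy and information gain. This is not code from the bootcamp; the function names and the toy labels are my own assumptions for illustration, and the labels are made up rather than taken from the Titanic dataset.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(parent, children):
    """Entropy reduction from splitting `parent` into the `children` groups."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


# Hypothetical toy labels (1 = survived, 0 = did not); not real Titanic data.
parent = [1, 1, 1, 1, 0, 0, 0, 0]

# Hypothetical split on some feature into two groups.
left = [1, 1, 1, 0]
right = [1, 0, 0, 0]

gain = information_gain(parent, [left, right])
print(f"parent entropy: {entropy(parent):.3f} bits")
print(f"information gain of the split: {gain:.3f} bits")
```

A tree builder would typically compare the gain of several candidate splits and keep the best one, stopping when a criterion such as a maximum depth or a minimum gain is reached.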
On the last day of the bootcamp, we discussed pitfalls in machine learning such as overfitting and underfitting, and the bias/variance tradeoff. I have read about this topic to the point of nausea in other settings, but this session truly helped me to understand it. Seeing practical examples helped me to put it in context. What was interesting and new to me was discussing how to properly evaluate a model, as it is NOT always about the accuracy; sometimes (depending upon the problem and domain) it is about the precision or recall! We then spent a great deal of time on hyperparameter tuning, and then on how to deploy our machine learning model as a web service, which was way too cool.
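As a rough illustration of why accuracy alone can mislead, the sketch below computes accuracy, precision, and recall on a made-up, heavily imbalanced set of labels. The helper names and the data are assumptions for this example only; they are not from the course material.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)


def precision(tp, fp):
    # Defined as 0.0 here when the model makes no positive predictions.
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    # Defined as 0.0 here when there are no actual positives.
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical imbalanced data: 5 positives out of 100 samples.
y_true = [1] * 5 + [0] * 95
# Hypothetical model that catches only one of the five positives.
y_pred = [1] * 1 + [0] * 4 + [0] * 95

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(f"accuracy:  {accuracy(tp, fp, fn, tn):.2f}")
print(f"precision: {precision(tp, fp):.2f}")
print(f"recall:    {recall(tp, fn):.2f}")
```

In this toy case the accuracy looks excellent while the recall shows that most positives are missed, which is the kind of gap that matters when missing a positive case is costly.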
What I’ve Learned
Before the bootcamp, I did not completely understand how to tune hyperparameters or how to properly evaluate the performance of a model. Now I understand why this is necessary and how to carry it out. We bridged the gap between data science and business value in this course, and that was the foundation going forward. I learned that it is not always about the accuracy of a model, and that the choice between optimizing for precision or recall should be aligned with the business needs, depending upon the domain and the problem one is looking to solve.
I have learned why it is important for the data scientist to ask questions, and not just questions in general, but the RIGHT questions, and that the most important tasks before building a model are data exploration, data discovery, and feature engineering. We need to understand the business impact and how the model will add value. This for me was paramount. Too often we focus on wanting cool models so we can say we are involved in machine learning, rather than focusing on the business need.
I have learned how to use Microsoft tools to build and deploy a model as a web service. I found the ease and simplicity of this to be amazing and something I would like to continue to explore.
- The in-person class setting helped me understand and connect with the topics at hand. If you have taken online bootcamps with varying success, you may also appreciate being able to interact with the instructor and other students.
- The breadth of material covered was impressive. I appreciated that we covered the most important topics in machine learning and addressed common mistakes. We dedicated some of the day to hyperparameter tuning when a model is not performing optimally.
- We addressed the proper mindset for data analysis: how to ask the right questions, and to not be afraid to ask questions!
- Raja and Arham have great chemistry as teammates, and are fantastic instructors.
- The condensed format was rather overwhelming. This material isn’t truly suited to a 3-day setting, and we really only scratched the surface. This cannot really be helped, but it is worth mentioning.
- This course is not for those who are new to programming and/or data science. Although we did use Microsoft Azure for Machine Learning, there is an assumption that the student has some familiarity with programming and data science concepts. You will likely get more out of this course if you have some prior knowledge.
I highly recommend this bootcamp for those who would like to increase their knowledge of data science. This experience was valuable to me because it helped me bridge the gap between theory and implementation. From this point on, more learning will be required, but this gave me a boost in the right direction. Cheers!