The Story

Kaggle 2020 Report: "State of Machine Learning and Data Science"

A recent 35-question Kaggle survey was given to several thousand data scientists from a wide variety of backgrounds. The resulting report, State of Machine Learning and Data Science, delves deep into data scientist demographics as well as their methodologies and algorithms. Respondents were asked anonymously about a wide range of topics, from age and gender to advanced questions about their coding experience. A lot can be gleaned from this in-depth report, but we would like to highlight some of the most important takeaways on the dynamic nature of the machine learning and data science industries.

Experience and Background:

- 80% are male and under age 35

- 36% are from either the USA or India, and 73% hold a Master's degree or higher

- 76% have less than 10 years of programming experience, and even less experience in machine learning

Choice of Technology and Methods/Algorithms:

- For Integrated Development Environments (IDEs), JupyterLab was by far the most popular, used by almost 75% of data scientists. RStudio, PyCharm, and Visual Studio Code were each used by around 30% of respondents.

- Python-based packages were the most commonly used for machine learning, with Scikit-learn (83%), TensorFlow (50%), and Keras (50%) leading the way.

- The most commonly used algorithms included Linear and Logistic Regression (84%), Decision Trees (78%), Gradient Boosting (61%), and Neural Networks (43%).

