As a data scientist, I am often asked what the Python data science stack actually is and where a beginner should start learning. The Python ecosystem around data analytics, data mining, data science and machine learning is so vast and rich that it confuses many newcomers.

For this audience I created a slide deck that starts by pointing out the benefits of the Python language for analytics. A few slides address newcomers to Python by explaining its syntax and how to get started. The following slides present the most important packages of the data science stack, namely NumPy, SciPy, Pandas, Scikit-Learn, Jupyter and IPython. The power of Jupyter is best conveyed in a live demonstration. The interplay between Pandas and Scikit-Learn is shown using Kaggle’s Titanic: Machine Learning from Disaster dataset. Finally, an outlook on further libraries in the data science domain is presented.
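To give a flavour of that interplay, here is a minimal sketch; it is not taken from the slides. It assumes a tiny hand-made table that only borrows Titanic-style column names (`Pclass`, `Sex`, `Age`, `Survived`) rather than the real Kaggle file, and the `IsFemale` feature and the choice of logistic regression are illustrative assumptions of mine.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical hand-made rows with Titanic-style column names (not the real Kaggle data).
df = pd.DataFrame({
    "Pclass": [3, 1, 3, 1, 2, 3, 1, 2],
    "Sex": ["male", "female", "female", "female", "male", "male", "male", "female"],
    "Age": [22.0, 38.0, 26.0, 35.0, None, 54.0, 40.0, 27.0],
    "Survived": [0, 1, 1, 1, 0, 0, 0, 1],
})

# Pandas does the data preparation: fill the missing age with the median
# and turn the categorical column into a numeric feature.
df["Age"] = df["Age"].fillna(df["Age"].median())
df["IsFemale"] = (df["Sex"] == "female").astype(int)

# Scikit-Learn accepts the prepared DataFrame columns directly.
X = df[["Pclass", "Age", "IsFemale"]]
y = df["Survived"]

model = LogisticRegression(max_iter=1000)
model.fit(X, y)

# Predict on the training rows only to show the API; this is not a proper evaluation.
predictions = model.predict(X)
```

The division of labour is the point here: Pandas cleans and reshapes the table, and Scikit-Learn consumes the result through its uniform `fit` and `predict` interface.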
