Causal Inference and Propensity Score Methods

In machine learning, and particularly in supervised learning, correlation is crucial for predicting the target variable from the feature variables. Rarely do we think about causation and the actual effect of a single feature variable or covariate on the target or response. Some even …

Hive UDFs and UDAFs with Python

Sometimes the analytical power of the built-in Hive functions is just not enough. In that case, it is possible to write hand-tailored User-Defined Functions (UDFs) for transformations and even for aggregations, where they are called User-Defined Aggregation Functions (UDAFs). In this post we focus on how to write sophisticated UDFs and UDAFs …

Interactively visualizing distributions in a Jupyter notebook with Bokeh

If you are doing probabilistic programming, you deal with all kinds of distributions. That means choosing a set of suitable distributions that describe the underlying real-world process, and also choosing the right parameters for the prior distributions. At that point I often start visualizing the …

Introduction to the Python Data Science Stack

As a data scientist, I am often asked what the Python Data Science Stack actually is and where a beginner should start learning. The Python ecosystem, especially around topics such as data analytics, data mining, data science and machine learning, is so vast and rich that it …