Florian Wilhelm's blog

More Efficient UD(A)Fs with PySpark

With the release of Spark 2.3, implementing user-defined functions with PySpark became a lot easier and faster. Unfortunately, complex data types still have some rough edges that need to be worked around.


How mobile.de brings Data Science to Production for a Personalized Web Experience

As Germany’s biggest online car marketplace, mobile.de provides a personalized web experience. Our Data Team leverages the interactions of our users to infer their preferences. For these tasks, we often apply Python and Spark to wrangle massive amounts of data. In this talk, we are going to present …


Efficient UD(A)Fs with PySpark

Nowadays, Spark is surely one of the most prevalent technologies in the fields of data science and big data. Luckily, even though it is developed in Scala and runs in the Java Virtual Machine (JVM), it comes with Python bindings, also known as PySpark, whose API was heavily influenced by …


Hive UDFs and UDAFs with Python

Sometimes the analytical power of built-in Hive functions is just not enough. In this case, it is possible to write hand-tailored User-Defined Functions (UDFs) for transformations and even for aggregations, the latter being called User-Defined Aggregation Functions (UDAFs). In this post we focus on how to write sophisticated UDFs and UDAFs …


Leveraging the Value of Big Data with Automated Decision Making

It is a widely accepted fact that we are living in the era of Big Data. Many traditional companies are looking for ways to improve their business through the virtues of Big Data and Data Science. While mature startups born in this era, like Facebook and Twitter, seem to naturally …


Handling Big Data with Python

The talk, presented at PyCon 2013 in Cologne, gives a short introduction to how Blue Yonder applies machine learning and predictive analytics in various fields, as well as to the challenges of Big Data. Using the example of Blue Yonder’s machine learning software NeuroBayes, I show the efforts made …
