Managing isolated Environments with PySpark

The Spark data processing platform is becoming more and more important for data scientists using Python. PySpark, the official Python API for Spark, makes it easy to get started, but managing applications and their dependencies in isolated environments is no easy task.

more ...
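A minimal sketch of one way to ship an isolated environment to the executors (not necessarily the approach the post describes): pack a conda environment with conda-pack and let Spark distribute it via spark.archives. The archive name, environment name, and app name below are placeholders, and spark.archives requires Spark 3.1 or later; older versions take --archives on spark-submit instead.

    import os
    from pyspark.sql import SparkSession

    # Hypothetical archive created beforehand, e.g. with conda-pack:
    #   conda pack -n my_pyspark_env -o pyspark_env.tar.gz
    # The '#environment' suffix makes Spark unpack the archive into a
    # directory called 'environment' next to every executor.
    os.environ["PYSPARK_PYTHON"] = "./environment/bin/python"

    spark = (
        SparkSession.builder
        .appName("isolated-env-demo")
        .config("spark.archives", "pyspark_env.tar.gz#environment")
        .getOrCreate()
    )

    # Anything that runs Python code on the executors now uses the
    # interpreter and packages from the shipped environment.
    print(spark.range(5).count())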


Efficient UD(A)Fs with PySpark

Nowadays, Spark is surely one of the most prevalent technologies in the fields of data science and big data. Luckily, even though it is developed in Scala and runs in the Java Virtual Machine (JVM), it comes with Python bindings, also known as PySpark, whose API was heavily influenced by …

more ...
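To give a rough idea of what an efficient UDF can look like in PySpark (a sketch under my own assumptions, not the code from the post): a vectorized pandas UDF processes a whole column as pandas Series batches instead of calling into Python once per row. The column and function names are made up, and pyarrow must be installed for pandas UDFs to work.

    import pandas as pd
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import pandas_udf

    spark = SparkSession.builder.appName("pandas-udf-demo").getOrCreate()

    # Vectorized UDF: receives the column in pandas Series batches,
    # so the per-row Python call overhead disappears.
    @pandas_udf("double")
    def celsius_to_fahrenheit(celsius: pd.Series) -> pd.Series:
        return celsius * 9.0 / 5.0 + 32.0

    df = spark.createDataFrame([(0.0,), (21.5,), (100.0,)], ["temp_c"])
    df.withColumn("temp_f", celsius_to_fahrenheit("temp_c")).show()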

Declarative Thinking and Programming

Declarative programming is a programming paradigm that focuses on describing what should be computed in a problem domain without describing how it should be done. The post starts by explaining the differences between a declarative and an imperative approach with the help of examples from everyday life.

more ...
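As a small code counterpart to those everyday-life examples (my own sketch, not taken from the post), here is the same computation written imperatively and declaratively in Python:

    numbers = [3, 1, 4, 1, 5, 9, 2, 6]

    # Imperative: spell out *how* to build the result, step by step.
    squares_of_evens = []
    for n in numbers:
        if n % 2 == 0:
            squares_of_evens.append(n * n)

    # Declarative: state *what* the result is and leave the iteration
    # to the language (here, a list comprehension).
    squares_of_evens_decl = [n * n for n in numbers if n % 2 == 0]

    assert squares_of_evens == squares_of_evens_decl == [16, 4, 36]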