The job system of Octopath Traveler and the symmetric group

Octopath Traveler is an RPG where you take control of eight characters, each having a unique job with a unique set of skills. In the course of the game you may give each character a secondary job (but you can’t endow them with the same job twice). Since a party may only consist of four …

Creating test data with Faker and Factory Boy

Creating test data is essential for data scientists and data engineers, especially when working with large datasets that need to be transformed. It goes without saying that such transformations should be tested thoroughly: You do not want to wait for a few minutes for your transformation to finish only to realize you’ve misspelled the column …

Classifying SCPs, Part 2: Data transformation (TF-IDF) and preprocessing

After we have obtained data through the means described in the first part of this blog post series, it is time to deal with data transformations and data preprocessing. While humans can comprehend textual information in the form of articles, it is hard to digest for a Machine Learning algorithm. In this blog post, we …

Effective Python 2nd Edition: What’s new?

Effective Python by Brett Slatkin is a book filled with best practices for the Python programming language. I devoured the first edition as an ebook and was eager to buy the second edition as a physical book. Having skimmed through it, I was already satisfied. The new edition features updates, the removal of all Python …

Why I created my own fork of the Data Science Cookiecutter template

The Data Science Cookiecutter template is a great way to quickly set up your Data Science project. For instance, I have used and recommended it for my Machine Learning project as well as for a Data Analysis project at work. In this blog post, I want to emphasize four reasons why I created my own …

Quickfix: jupyter nbconvert with clear-output flag not working

Jupyter comes with a command-line utility, jupyter nbconvert, that can transform a Jupyter notebook into different formats. Furthermore, it has options to tweak the output. For instance, the execute flag executes the notebook before transforming it, while the clear-output flag is supposed to remove outputs from the notebook. Thus, if you want to execute …

Classifying SCPs, Part 1: Building a web crawler

This first blog post in the series of our Data Science project of classifying SCPs is concerned with the starting point of any data-related problem, really: How do we obtain data? Fortunately, the data we want to build a Machine Learning model upon is readily available in the form of articles with standardized URLs. Thus, …

A Machine Learning Project: Classifying SCPs, Part 0, Overview

Tutorials on Machine Learning tend to emphasize the process of training models, interpreting their scores and tuning hyper-parameters. While these are certainly important tasks, most tutorials and documentation are not concerned with how the data is obtained; mostly, a toy data set provided by the Machine Learning library is used. In a series …

Quickfix: ‘Push rejected’ when deploying a Python app on Heroku

Heroku is a platform that enables you to deploy your web application in a quick and painless manner, unless you stumble upon a ‘Push rejected’ error with next to no hint on how to resolve it. Problem description: I stumbled upon this when trying to deploy a Flask app, but I’m pretty sure this will also …

The Bayes classifier

The Bayes classifier is a Data Scientist’s unbeatable archenemy. When faced with the challenge of finding an optimal solution to a classification problem, it is the ultimate (theoretical!) benchmark when evaluating the performance of a classifier. The problem with the Bayes classifier is that it uses information that generally is not available to us …