Classifying SCPs, Part 2: Data transformation (TF-IDF) and preprocessing

After we have obtained data through the means described in the first part of this blog post series, it is time to deal with data transformations and data preprocessing. While humans can comprehend textual information in the form of articles, it is hard to digest for a Machine Learning algorithm. In this blog post, we …

Effective Python 2nd Edition: What’s new?

Effective Python by Brett Slatkin is a book filled with best practices of the Python programming language. I devoured the first edition as an ebook and was eager to buy the second edition as a physical book. Having skimmed through it, I was already satisfied. The new edition features updates, the removal of all Python …

Why I created my own fork of the Data Science Cookiecutter template

The Data Science Cookiecutter template is a great way to quickly set up your Data Science project. For instance, I have used and recommended it for my Machine Learning project as well as for a Data Analysis project at work. In this blog post, I want to emphasize four reasons why I created my own …

Quickfix: jupyter nbconvert with clear-output flag not working

Jupyter comes with a command-line utility, jupyter nbconvert, that can transform a Jupyter notebook into different formats. Furthermore, it has options to tweak the output. For instance, the execute flag executes the notebook before transforming it, while the clear-output flag is supposed to remove outputs from the notebook. Thus, if you want to execute …

Classifying SCPs, Part 1: Building a web crawler

This first blog post in the series of our Data Science project of classifying SCPs is concerned with the starting point of any data-related problem, really: How do we obtain data? Fortunately, the data we want to build a Machine Learning model upon is readily available in the form of articles with standardized URLs. Thus, …

A Machine Learning Project: Classifying SCPs, Part 0, Overview

Tutorials on Machine Learning tend to emphasize the process of training models, interpreting their scores and tuning hyper-parameters. While these are certainly important tasks, most tutorials and documentation are not concerned with how the data is obtained; mostly, a toy data set is used that is provided by the Machine Learning library. In a series …

Quickfix: ‘Push rejected’ when deploying a Python app on Heroku

Heroku is a platform that enables you to deploy your web application in a quick and painless manner, unless you stumble upon a ‘Push rejected’ error with next to no hint on how to resolve it. Problem description: I stumbled upon this when trying to deploy a Flask app, but I’m pretty sure this will also …

Quickfix: Muting search highlights in Emacs Evil mode

This quickfix is about muting search highlights when using the text editor Emacs with its Vim emulation, Evil mode. Problem description: Sometimes the quickest way to navigate through a file with Vim is to enter a search term and work your way through the search results if necessary. Search highlighting may assist you by providing …

Quickfix: Using Lombok with IntelliJ causes compiler error “cannot resolve method”

Quickfixes are short posts that deal with small problems that I encounter and their solutions. This blog post is about a compiler error I stumbled upon when using the Java library Lombok in conjunction with the IDE IntelliJ. Problem description: To understand the problem, we first need to know what the library Lombok is all …