From Linear Regression to Neural Networks

There is a lot of hype these days around sophisticated machine learning methods such as Neural Networks, which are powerful models that let us fit highly flexible functions to data. However, we do not always need the full complexity of a Neural Network: sometimes a simpler model does the job just fine. In this project, we start from the most fundamental statistical tool for modelling data, linear regression, and then explain the benefits of constructing more complex models, such as logistic regression or a Neural Network. In this way, this text aims to build a bridge...
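As a small illustration of the starting point named above, the sketch below fits a one-feature linear regression with the closed-form least-squares solution, then shows how logistic regression reuses the same linear score by passing it through a sigmoid. The function names (`fit_line`, `logistic_predict`) and the toy data are assumptions made for this example; they are not taken from the project itself.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sigmoid(z):
    """Squash a real-valued score into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def logistic_predict(x, weight, bias):
    """Logistic regression prediction: a linear score passed through a sigmoid."""
    return sigmoid(weight * x + bias)


# Hypothetical toy data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

The only structural difference between the two models here is the sigmoid applied to the linear score; a Neural Network can be seen as stacking several such linear-plus-nonlinearity steps.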

Making Art with Generative Adversarial Networks

Generative Adversarial Networks (GANs) are a relatively new technique for generating samples from a learned distribution, in which two networks are trained simultaneously while competing against each other. Applications of GANs are numerous, including image up-sampling, image generation, and the recently popular 'deep fakes'. In this project, we train such a Generative Adversarial Network ourselves, specifically for image generation. As the generation of human faces has already been widely studied, we chose a different subject: the generation of paintings. While large datasets of paintings are available, we have opted to restrict ourselves...

Musical Key Recognition using a Hidden Markov Model

Musical keys can be thought of as the foundation on which music and songs are built. The development of an automated technique to recognise the overall key of a musical recording is driven by needs in music production and mixing. In this paper, Hidden Markov Models (HMMs) are used to detect the key of tracks retrieved from Spotify's web API. The relative strength of each pitch in different segments of a track, represented as a chroma vector, was extracted for about 10,000 tracks. HMMs were trained for each mode, major and minor, and shifted to...

Finding 'God' components in Apache Tika

God Components (or 'God objects') are components in a software system that have accumulated a large number of classes and lines of code over time. Such large, bulky components are hard to maintain and to reason about; they are in fact a software anti-pattern. Smaller, isolated components are preferred instead. Although it is common good practice to build software from small building blocks of reusable code, accessed through a declarative and well-documented API, big code-bases might still suffer from scaling issues: large, interwoven software components might develop that become difficult to reason...

Backdoors in Neural Networks

Neural Networks are increasingly popular and are being applied in ever more fields and applications. The expanding set of tools available to train Neural Networks makes it easier for both consumers and professionals to utilize the power of the architecture. The networks do come with a risk, however. Because large computer vision networks can take vast computational resources to train, consumers resort to using pre-trained, off-the-shelf models. Using pre-trained networks in critical applications without precaution might pose serious security risks: think of applications like biometric identification with face recognition, traffic sign recognition for autonomous driving, or usage in robotics...

COVID-19 Dashboard

We are in the midst of a global pandemic. At the time this project started, the coronavirus was still just a headline for most, but it has since reached and affected all of our lives. Fighting such a pandemic happens in many ways and on multiple scales. We are interested in how this can be done at the societal level: using data. We built a pipeline capable of processing a large dataset and created a visualization of the areas most vulnerable to the coronavirus, which includes reported cases in real time.
