
thirdwave

Finding Gold in Data

Interesting talk: this gentleman (along with his team) won the top prize in a data analysis contest where the task was to detect the presence of whales from a few seconds of audio. A few interesting things happen here. First of all, Kridler (the winner) tries to access some academic work related to the subject, but is blocked by paywalls. Then he basically says “f–k domain knowledge”, uses the data as is, extracts some key features, and runs machine learning algorithms on them. The result: detection accuracy jumps from 72% to 98%, where the former number belongs to the existing, older models. The more interesting point is that the existing models were surpassed within the first few days of the challenge. Oh, and Kridler does not even make use of basic audio processing / speech recognition research. This is where a lot of success in data mining comes from: just communicate with your data, create features, tune your model (beware of overfitting, however), turn the crank, and iterate.
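To make the "create features, tune, check for overfitting, iterate" loop a bit more concrete, here is a minimal sketch in plain Python. It is not Kridler's method and uses none of the contest data: the clips are synthetic (noise, plus a made-up rising tone standing in for a call), the two features (energy and zero-crossing rate) are just simple guesses, and the nearest-centroid classifier is swapped in for brevity. All function names are hypothetical. The one habit it tries to show is scoring on clips the model never saw during fitting.

```python
import math
import random


def make_clip(has_call, rng, n=400, rate=2000):
    """Hypothetical stand-in for a clip: white noise, plus a rising tone if has_call."""
    clip = [rng.gauss(0.0, 1.0) for _ in range(n)]
    if has_call:
        duration = n / rate
        for i in range(n):
            t = i / rate
            freq = 100.0 + 200.0 * t / duration  # tone sweeps upward
            clip[i] += 1.5 * math.sin(2.0 * math.pi * freq * t)
    return clip


def features(clip):
    """Two hand-made features: mean energy and zero-crossing rate."""
    energy = sum(x * x for x in clip) / len(clip)
    crossings = sum(1 for a, b in zip(clip, clip[1:]) if (a < 0) != (b < 0))
    return (energy, crossings / (len(clip) - 1))


def centroid(rows):
    """Column-wise mean of a list of feature tuples."""
    return tuple(sum(col) / len(rows) for col in zip(*rows))


def fit(X, y):
    """Nearest-centroid model: one mean feature vector per class."""
    return {label: centroid([x for x, lab in zip(X, y) if lab == label])
            for label in set(y)}


def predict(model, x):
    """Return the class whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(label):
        return sum((a - b) ** 2 for a, b in zip(x, model[label]))
    return min(model, key=sq_dist)


def accuracy(model, X, y):
    return sum(predict(model, x) == lab for x, lab in zip(X, y)) / len(y)


def evaluate(seed=0, n_clips=200):
    """Fit on the first half of the clips, score on both halves."""
    rng = random.Random(seed)
    labels = [i % 2 == 0 for i in range(n_clips)]  # alternate call / no call
    X = [features(make_clip(lab, rng)) for lab in labels]
    split = n_clips // 2
    model = fit(X[:split], labels[:split])
    return (accuracy(model, X[:split], labels[:split]),
            accuracy(model, X[split:], labels[split:]))


if __name__ == "__main__":
    train_acc, heldout_acc = evaluate()
    print(f"train accuracy:    {train_acc:.2f}")
    print(f"held-out accuracy: {heldout_acc:.2f}")
```

If the training score looks great and the held-out score does not, that is the overfitting warning sign; the iteration step is then to change `features` or the model and run `evaluate` again.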