It feels great to be in the top 1% of the Kaggle competition hosted by MIT.

The competition is pretty fierce, and a lot of seasoned ML implementers are taking part.

But the top 3 are on another level. They have reached more than 90% accuracy, which is incredible given how noisy the data is.

The competition asks: what predicts voting outcomes? The data set comes from ‘Show of Hands’ and includes 108 variables and around 5500 observations. The data is pretty noisy in the sense that there are lots of missing values: there are 100 ‘Question’ variables, and on average each one has about 1900 empty observations.
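To get a feel for that kind of missingness, here is a minimal sketch of how one might count empty values per column. It is not the code I used in the competition: the column names and the tiny sample below are made up for illustration, and it assumes missing answers show up as empty strings or None.

```python
from collections import Counter


def count_missing(rows, missing=("", None)):
    """Return a Counter mapping column name -> number of missing values."""
    counts = Counter()
    for row in rows:
        for column, value in row.items():
            if value in missing:
                counts[column] += 1
    return counts


# Hypothetical sample rows; the real data set has 108 variables.
sample = [
    {"Q1": "Yes", "Q2": "", "Party": "A"},
    {"Q1": "", "Q2": "", "Party": "B"},
    {"Q1": "No", "Q2": "Yes", "Party": "A"},
]

missing_per_column = count_missing(sample)

# Average number of empty observations across the 'Question' columns only
# (assumed here to be the columns whose names start with "Q").
question_cols = [c for c in sample[0] if c.startswith("Q")]
average_missing = sum(missing_per_column[c] for c in question_cols) / len(question_cols)
```

On the real data the same loop could run over rows read with csv.DictReader, which gives a quick view of which questions are answered least often before deciding how to clean them.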

I have spent more than 40 hours on this competition and have learned a lot during this course. From my understanding, for this particular data set, cleaning the data is more important than modeling. My position was 7th, but within a couple of days I moved down to 12th. There are 9 more days to go, and it will be very interesting to see how the standings turn out, as I think many people have overfitted their models. Will update once the final standings are up.