Calculating MSE and MAE
The mean squared error for all the predictions is 281726.7. The mean absolute error for all predictions is 232.1.
The mean squared error for the 10 biggest over-predictions is 198050.7. The mean absolute error for the 10 biggest over-predictions is 444.1. Even though these are the 10 worst over-predictions, their MSE is lower than the MSE for all the predictions, which suggests that the over-predictions were not that severe compared with the under-predictions.
The mean squared error for the 10 biggest under-predictions is 9073386.6. The mean absolute error for the 10 biggest under-predictions is 2401.4.
The MSE and MAE for the 10 biggest under-predictions are very large compared with those for all predictions, and even with those for the over-predictions. They seem especially large considering that the actual and predicted prices are in units of thousands of dollars. However, looking closer at these values, the houses that the model undervalued are worth millions in real life; I suspect that this is caused by geographic location rather than a physical attribute of the house. Geographic location makes a significant contribution to house value, especially in such extreme cases.
The mean squared error for the 10 most accurate predictions is 42.1. The mean absolute error for the 10 most accurate predictions is 5.75.
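The four pairs of figures above can be reproduced with a calculation along the following lines. This is only a minimal sketch: the prices are made-up values in thousands of dollars, the subset size `k` is 2 rather than 10 to keep the example small, and the error is assumed to be predicted minus actual, so a positive error is an over-prediction.

```python
def mse(pairs):
    """Mean squared error over (actual, predicted) pairs."""
    return sum((p - a) ** 2 for a, p in pairs) / len(pairs)


def mae(pairs):
    """Mean absolute error over (actual, predicted) pairs."""
    return sum(abs(p - a) for a, p in pairs) / len(pairs)


# Hypothetical prices in thousands of dollars (not the real data set).
actual = [300, 450, 520, 2500, 150, 410]
predicted = [310, 440, 600, 900, 260, 412]
pairs = list(zip(actual, predicted))

k = 2  # the report uses 10; 2 keeps this toy example readable

# Assumed sign convention: error = predicted - actual.
by_signed_error = sorted(pairs, key=lambda ap: ap[1] - ap[0])
biggest_under = by_signed_error[:k]   # most negative errors
biggest_over = by_signed_error[-k:]   # most positive errors
most_accurate = sorted(pairs, key=lambda ap: abs(ap[1] - ap[0]))[:k]

for name, subset in [
    ("all predictions", pairs),
    ("biggest over-predictions", biggest_over),
    ("biggest under-predictions", biggest_under),
    ("most accurate predictions", most_accurate),
]:
    print(f"{name}: MSE={mse(subset):.1f}, MAE={mae(subset):.1f}")
```

In this toy data a single multi-million-dollar home that is badly under-predicted dominates the overall MSE, which is the same pattern described above.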
In which percentile do the 10 most accurate predictions reside? Did your model trend towards over or under predicting home values?
The homes with the most accurate predictions cost between 286k and 520k, placing them between the 30th and 80th percentiles. This is a wide range for the most accurate predictions, indicating that the model predicts the value of a large share of the homes well.
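One way to place a price in a percentile is to count the share of homes priced at or below it. The sketch below assumes that definition (other definitions of percentile rank exist and can give slightly different numbers), and the helper name `percentile_rank` and the price list are hypothetical.

```python
def percentile_rank(values, x):
    """Percentage of values that are less than or equal to x."""
    return 100.0 * sum(v <= x for v in values) / len(values)


# Hypothetical actual prices in thousands of dollars (not the real data set).
prices = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]

# Cheapest and most expensive home among the most accurate predictions
# (made-up values for illustration).
low, high = 250, 750

print(percentile_rank(prices, low), percentile_rank(prices, high))
```

Applying the same function to the cheapest and most expensive of the 10 most accurate homes gives the two ends of the percentile range.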
The average of all the differences is 27.3, so the model trends slightly toward over-predicting home values. The model seems fairly accurate for homes in the middle of the price range but less accurate at the extremes (the most expensive and the cheapest houses). Because the MSE for the 10 biggest under-predictions is so high while the most accurate predictions fall between the 30th and 80th percentiles, I’d hypothesize that a small subset of very expensive homes in the top 10 to 15 percent of prices skews the MSE. It also seems that the range of prices for the over-predicted homes is much smaller, but that more homes are over-predicted than under-predicted. Perhaps this is because of the presence of such drastically expensive homes.
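The direction of the bias can be checked by averaging the signed differences and counting how many homes fall on each side. The following is a rough sketch with made-up prices in thousands of dollars, again assuming the difference is predicted minus actual; the function name `mean_signed_error` is hypothetical.

```python
def mean_signed_error(pairs):
    """Average of (predicted - actual); positive means over-prediction on average."""
    return sum(p - a for a, p in pairs) / len(pairs)


# Hypothetical prices in thousands of dollars (not the real data set).
actual = [300, 450, 520, 150, 2500, 410]
predicted = [330, 490, 540, 200, 2400, 445]
pairs = list(zip(actual, predicted))

bias = mean_signed_error(pairs)
n_over = sum(p > a for a, p in pairs)    # homes the model over-predicted
n_under = sum(p < a for a, p in pairs)   # homes the model under-predicted

print(f"mean signed error: {bias:.1f}")
print(f"over-predicted: {n_over}, under-predicted: {n_under}")
```

In this toy data most homes are over-predicted by a small amount while one expensive home is under-predicted by a larger amount, so the mean signed error is positive even though the largest single miss is an under-prediction.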
Which feature appears to be the most significant predictor in the above cases?