I spent this week fixing issues we encountered while integrating our three subparts, and trying new training methods for my models that give better results than my previous approaches.
On the modeling side, I used the rankings provided by the Auton Lab’s AutoML tool to select Random Forest Regression as a more effective model for our wind data than LSTMs. I also identified the most effective preprocessing steps and the optimal hyperparameters, and used all of this to write a new Python script that trains and predicts with a random forest regression pipeline. This method achieved strong test metrics: an R^2 score of 0.922 and an NRMSE of 9.8%, the best scores we’ve gotten yet and well within our goal for this project. Next week I plan to use the AutoML tool to analyze our load and solar data as well, to see whether a better model can be used for each of them.
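For readers who want a concrete picture, here is a minimal sketch of what a random forest regression pipeline of this kind could look like in scikit-learn. It is not the actual script: the feature columns (`wind_speed`, `temperature`), the synthetic data, the `StandardScaler` step, and the hyperparameter values are all placeholders I am assuming for illustration, and the chronological (unshuffled) split is likewise an assumption.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def build_pipeline(n_estimators=200, max_depth=None, random_state=0):
    """Preprocessing followed by a random forest regressor.

    The scaler and the default hyperparameters are illustrative
    placeholders, not the values selected with the AutoML tool.
    """
    return Pipeline([
        ("scale", StandardScaler()),
        ("forest", RandomForestRegressor(
            n_estimators=n_estimators,
            max_depth=max_depth,
            random_state=random_state,
        )),
    ])


# Synthetic stand-in for the wind dataset (hypothetical features and target).
rng = np.random.default_rng(0)
wind_speed = rng.uniform(0, 25, size=500)
temperature = rng.uniform(-5, 30, size=500)
X = np.column_stack([wind_speed, temperature])
y = np.clip(wind_speed, 3, 12) ** 3 + rng.normal(0, 20, size=500)

# shuffle=False keeps the last 20% of rows as the held-out test set,
# which mimics a chronological split for time-ordered data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)

model = build_pipeline().fit(X_train, y_train)
predictions = model.predict(X_test)
```

Wrapping the preprocessing and the regressor in one `Pipeline` means the scaler is fit only on the training rows, so nothing from the test set leaks into training.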
To verify my portion of the project, I’ve been using scikit-learn’s evaluation functions (mean_absolute_error, mean_squared_error, r2_score, etc.) to record metrics for every algorithm I try in my modeling of power data. These metrics are computed on test data, which the model never sees during training, to avoid data leakage. I also compare them against the linear regression metrics I generated at the beginning of this project, to confirm that the newer models are actual improvements. Since there is no ground truth for predictions made from future weather forecasts, I manually check that those results follow a logical trend over time (e.g. solar output rises during the day and falls at night).