I’m billing this post as a knock-down, drag-out steel cage match: Physics versus Neural Networks. Can a model of pool heat flows based on physics predict the temperature of my pool water better than a neural network?
Let’s find out. The heat-flow model that can best predict the observed pool temperatures for a test set of measurements, given the same weather data points as inputs, will be deemed the champion.
In this corner… Physics
The Physics model for pool heat flows was created and optimized in the previous post in this series. It was built from knowledge of physical effects and the thermodynamics that should control how much heat flows into or out of the pool, based on various weather variables. From the modeled heat flow, the pool water temperature can be predicted.
In the Physics model, the weather variables were combined in ways based on physics - e.g., the temperature, humidity, wind, and barometric-pressure variables were combined to represent the rate at which pool water is lost to evaporation and, from that, the amount of heat drawn from the pool to evaporate that water. Similarly, the model included physics-based formulas for convection and radiative cooling. The total heat flux into or out of the pool is the sum of the heat fluxes from each contributing effect.
Then, to account for model inaccuracies and the ‘microclimate’ around the pool, the Physics model was given a ‘free parameter’ for each contributor to the total heat flux, and those parameters were tuned and optimized (using standard fitting techniques) so that the heat flux calculated from the optimized model best matched the observed heat flux. Obtaining and checking the measured pool-temperature data and the historical weather data used to tune the model were explained in the first two posts in this series.
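If those free parameters are simple multiplicative scale factors (a plausible reading of the setup), the tuning step can be sketched as an ordinary least-squares fit. Everything below - the component values, the 'true' scale factors, the noise level - is synthetic and purely illustrative, not the actual model or data:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Synthetic stand-ins for the physics-based heat-flux components
# (evaporation, convection, longwave, solar); the real values come from
# the formulas developed in the previous post.
q_components = rng.normal(size=(n, 4))

# Hypothetical 'microclimate' scale factors used to fabricate observations
true_scale = np.array([1.2, 0.8, 1.0, 0.9])
q_observed = q_components @ true_scale + rng.normal(scale=0.05, size=n)

# One free multiplicative parameter per component, fit by least squares so
# that the scaled sum of components best matches the observed heat flux
params, *_ = np.linalg.lstsq(q_components, q_observed, rcond=None)
print(params.round(2))
```

With enough samples, the recovered parameters land very close to the factors used to generate the synthetic observations.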
Thanks to the parameter tuning, the Physics model was able to reproduce the observed pool temperature from the historical weather data quite well. But how well would the Physics model predict pool temperatures from weather data that it had not ‘seen’ before?
And the Challenger… the Neural Network
Weighing in at just 8 hidden-layer neurons, the Neural Network.1 The Neural Network was trained to predict heat flow using the same 1,368 hourly historical weather and pool-temperature measurements that were used to tune the Physics model. But unlike the Physics model, the Neural Network used the ‘raw’ weather variables (air temperature, wind speed, humidity, etc.) as inputs. The weather variables were not ‘pre-combined’ into heat-flux components, so there was no physics imposed on the inputs. Just the raw weather data.
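For concreteness, here is a minimal sketch of training such a network with scikit-learn's MLPRegressor, matching the setup described in the footnote (8 hidden neurons, ReLU activation, Adam solver). The weather data and heat-flux target here are synthetic stand-ins:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-ins for the 1,368 hourly samples of raw weather
# variables (air temperature, wind speed, humidity, ...) and heat flux
X = rng.normal(size=(1368, 6))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=1368)

# 8 hidden neurons, ReLU activation, Adam solver (per the footnote)
model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(8,), activation="relu",
                 solver="adam", max_iter=2000, random_state=0),
)
model.fit(X, y)
print(model.score(X, y))  # R^2 on the training data
```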
The Neural Network doesn’t ‘know’ or care that the input data values are historical weather variables or that the goal is predicting the observed heat flux for a pool. It is just trying to exploit relationships between its inputs (weather variables) and its output (the heat flux) to best match the observed output. The Neural Network scoffs at the notion that knowledge of physics is necessary for the task at hand. All you need to do is throw the ‘raw’ historical weather data at the network, give it the desired heat-flux outputs for those inputs, and trust it to figure out how heat flux depends on those weather variables.
Let’s get ready to rumble!
As mentioned above, the historical weather data and pool-temperature data used to tune or train the models were the same, collected between Oct 21 and Dec 16, 2025. For this contest, the proving ground for the models was chosen to be a very challenging stretch of data from Dec 24, 2025 to Jan 7, 2026 that contained a precipitous temperature drop2. Here are plots of the weather variables and the pool temperature over the test data set:
Measured pool temperature and weather conditions during the test set
There were 337 samples in the test set.
Both models have been trained/tuned to produce a predicted heat flux \(Q_{predicted}\), which is then directly converted to a \(\frac{dT}{dt}\) by multiplying by a constant according to the relation
\[ \frac{dT}{dt} = \frac{Q_{predicted} \cdot A}{m_{th}} \approx Q_{predicted} \cdot 1.29 \times 10^{-7} \]

as explained in the previous post in this series. Thus, both models use the same pool surface area and thermal mass, as seems fair. The predicted \(\frac{dT}{dt}\) is then integrated (using a measured starting pool temperature at the beginning of the test-set data range) to get the predicted temperature over the entire test set. This predicted pool temperature is then compared to the measured pool temperature.
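That integration step can be sketched as a forward-Euler loop over the hourly samples. The flux values and starting temperature below are invented, and the constant is assumed to be per second:

```python
import numpy as np

# Hypothetical hourly heat-flux predictions (W/m^2) from either model
q_predicted = np.array([150.0, -80.0, -120.0, 60.0])

K = 1.29e-7    # the A / m_th conversion constant from the post (per second)
dt = 3600.0    # seconds between hourly samples
t0 = 62.0      # measured starting pool temperature in degrees F (invented)

# Forward-Euler integration of dT/dt = K * Q_predicted
temps = [t0]
for q in q_predicted:
    temps.append(temps[-1] + K * q * dt)
print(temps)
```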
Whichever model produces the best match is the winner.
The Physics model comes out swinging!
On the test data set, the predicted and measured \(\frac{dT}{dt}\) for the Physics model looks like:
The Physics model prediction of dT/dt
Which leads to the predicted pool temperature plotted below, along with the observed pool temperature:
The Physics model prediction of temperature
Which is a strong performance. Let’s put numbers to how well the predicted pool temperature matched the observed temperature:
| Metric | Value |
|---|---|
| RMSE | 1.96 °F |
| MAE | 1.76 °F |
As explained in the previous post, both RMSE and MAE are measurements of mismatch between the observed and predicted temperatures; lower numbers are better.
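For reference, both metrics are a few lines of NumPy; the temperature values here are invented for illustration:

```python
import numpy as np

# Invented predicted vs. observed pool temperatures (degrees F)
observed = np.array([62.0, 61.5, 60.8, 60.1])
predicted = np.array([62.3, 61.1, 61.0, 59.6])

errors = predicted - observed
rmse = np.sqrt(np.mean(errors ** 2))  # penalizes large misses more heavily
mae = np.mean(np.abs(errors))         # the average absolute miss
print(rmse, mae)
```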
But the Neural Network fights back!
On the same test data set, \(\frac{dT}{dt}\) for the trained Neural Network model looks like:
The Neural Network model prediction of dT/dt
Which leads to the pool temperature prediction:
The Neural Network model prediction of temperature
| Metric | Value |
|---|---|
| RMSE | 1.66 °F |
| MAE | 1.42 °F |
Both the RMSE and the MAE for the Neural Network are lower than for the Physics model. So it looks like we have a new champion: the Neural Network!
But wait! What’s this?
A new contender just entered the ring!
This challenger claims to be a hybrid, best-of-both-worlds competitor. It is again a neural network with the same architecture as the first Neural Network model, but instead of being trained on just the ‘raw’ historical weather data, it has been given additional inputs: the calculated physical heat-flux components that went into the Physics model. That is, rather than just throwing the raw temperature, humidity, wind, and pressure at the network, those quantities have also been combined to compute heat-flux components, just as was done in the Physics model.

So instead of forcing the neural network to figure out the (rather complex) combination of inputs that bears on, e.g., evaporative heat loss, the evaporation rate is pre-calculated and fed to the network as an input. The same goes for the heat flux due to convection, longwave cooling, and solar radiation (the last to a lesser degree - it is just the raw solar-radiation weather variable scaled by a constant). The inputs to the Neural Network Plus Physics network are thus the raw weather variables as before, plus some new inputs that have some physics behind them. If that physics is correct and relevant, perhaps this Neural Network Plus Physics model can use it to do a better job of predicting pool temperature.
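The feature-augmentation idea itself is simple to sketch. The 'physics' features below are placeholders, not the actual evaporation, convection, and longwave formulas from the previous post:

```python
import numpy as np

rng = np.random.default_rng(2)

# Raw hourly weather variables (synthetic stand-ins): air temperature,
# wind speed, humidity, pressure, solar radiation, cloud cover
X_raw = rng.normal(size=(1368, 6))

# Placeholder 'physics' features; the real ones come from the heat-flux
# formulas in the Physics model
q_evap = X_raw[:, 0] * X_raw[:, 1]
q_conv = 0.5 * X_raw[:, 1]
q_long = np.full(1368, -0.3)

# The 'Plus Physics' inputs: raw variables plus the physics-based hints
X_plus = np.column_stack([X_raw, q_evap, q_conv, q_long])
print(X_plus.shape)  # → (1368, 9)
```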
The trained Neural Network Plus Physics model was tested on the same test set, and the results are in:
| Metric | Value |
|---|---|
| RMSE | 1.18 °F |
| MAE | 1.06 °F |
Ladies and gentlemen, we have a new champion: the Neural Network Plus Physics model!
The Final Results
The performance of each of the models is summarized in this table:
| Model | RMSE (°F) | MAE (°F) |
|---|---|---|
| Physics | 1.96 | 1.76 |
| Neural Network | 1.66 | 1.42 |
| Neural Network Plus Physics | 1.18 | 1.06 |
The \(\frac{dT}{dt}\) and temperature plots for all the contenders are:
dT/dt predictions for all models
Temperature predictions for all models
While it is hard to tell which model performed best from the \(\frac{dT}{dt}\) plots, the temperature plot magnifies the differences between the models, because errors in \(\frac{dT}{dt}\) compound over time during integration (normally an undesirable feature in an analysis, but handy here for telling the contenders apart).
It is interesting to note the similarity between the models: they all under-predicted the heat loss on several evenings.
Controversy
No competition is complete without the possibility of some kind of scandal.
A feature-importance analysis is often done in studies that utilize neural networks. There are different ways to perform it, but one, called permutation feature importance, involves randomly shuffling the values of one input feature at a time across the samples - breaking any relationship between that feature and the output - while leaving the other features untouched, then measuring how much the model’s predictive score degrades. If scrambling one of the inputs causes a large drop in performance, that input is deemed to be important.
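Here is a sketch of the procedure using scikit-learn's permutation_importance helper, on synthetic data in which the target depends mostly on the first feature (none of these names or values come from the actual pool analysis):

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(3)

# Synthetic data: the target depends strongly on feature 0, weakly on
# feature 1, and not at all on feature 2
X = rng.normal(size=(800, 3))
y = 3.0 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(scale=0.1, size=800)

model = MLPRegressor(hidden_layer_sizes=(8,), max_iter=3000,
                     random_state=0).fit(X, y)

# Shuffle each feature in turn and record the drop in the R^2 score;
# a bigger drop means a more important feature
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
print(result.importances_mean)
```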
Performing permutation feature analysis on the Neural Network that used the raw weather variables as input produced the following results:
Feature importance analysis of the Neural Network
The taller the bar on the chart, the more important the feature. The air temperature and the solar radiation weather inputs were determined to be most important, while barometric pressure and cloud cover barely mattered at all. This seems somewhat reasonable, and probably would not surprise anyone who has a pool.
But performing the same feature importance analysis on the champion Neural Network Plus Physics model produced:
Feature importance analysis of the Neural Network Plus Physics model
Longwave cooling \(Q_{longwave}\) was the most important factor?! That makes no sense at all. \(Q_{longwave}\) was a minor player in the Physics model (see plots in the previous post in this series), so why would it have a starring role in the best-performing neural network?
Some people would say that it doesn’t matter - the best-performing network is the best-performing network. Who cares how the neural network decided to ‘assign importance’ to the various inputs to produce the best output? Other people might say that this illustrates a fatal flaw of neural networks in general - they are inscrutable. It is hard for humans to understand just how a neural network produced its output, and what aspects of the training data the neural network found to be most significant. There is a whole subfield of research concerning interpretability and explainability of neural networks. The fear is that the neural network somehow incorrectly focused on a spurious or insignificant feature of the data to get results that only appeared to be correct, but would be catastrophically flawed if tested in some slightly different way. But maybe, just maybe, the neural network found some real connection between the inputs and the desired output during training that humans weren’t smart enough to spot?
For this case of \(Q_{longwave}\) importance, my theory is that there is an unexciting explanation. The Neural Network Plus Physics network perhaps decided that it needed a ‘constant term’ to best predict the heat flux. While it is possible to set up neural networks that include bias nodes to account for this constant component, that was not done here. So the network did the next best thing and used \(Q_{longwave}\) as the constant bias. From the heat-flux component plots in the previous post, \(Q_{longwave}\) is a small and nearly constant contributor to total heat flux - it changes little over time, but its small value means that it will have to be multiplied by a larger weight to make a significant (and nearly constant) contribution to the heat flux. So basically, \(Q_{longwave}\) only appeared to be important because it got pressed into service in place of a bias node to provide a constant offset to the other factors, which were themselves providing the hour-by-hour changes in heat flux.
Summary
Forgive the theatrics in this post; I perhaps had a bit too much fun with it.
Physics-Informed Neural Networks (PINNs) are a hot topic now. Generally speaking, PINNs incorporate physics into neural networks, and try to get the neural network to ‘respect’ any physics equations that should govern the data and outputs. Again speaking generally, we perhaps did a rudimentary form of this by incorporating pre-calculated quantities like the evaporation rate into the inputs of the Neural Network Plus Physics model. In effect, this was a hint to the neural network that a certain combination of weather variables (e.g., evaporation rate from the pool) has special relevance. Otherwise, the neural network has no physical ‘intuition’ at all about what aspects of the input data may be important. One input is just as potentially relevant (or not) as the next.
But this does not make the Neural Network Plus Physics model a PINN. PINNs are generally used to solve complicated partial differential equations, and any physics that governs the solutions to those equations (e.g. conservation of energy) is incorporated during training by modifying the training loss function to penalize outputs that do not respect the physics. The network is thus pushed during training towards producing physically valid solutions. That is not what we did here.
I was actually pleased that the feature-importance analysis turned up such a surprising result, even though it may be explained away by something fairly prosaic. It did allow me to briefly discuss neural-network interpretability, though, which is a topic of interest to me that I would like to explore further in future projects.
1. The neural network used MLPRegressor from the sklearn package, utilizing the ReLU activation function and the Adam solver. The size of the hidden layer was determined by scanning different sizes (and numbers of layers) and picking the size that performed best. ↩︎
2. The temperature may have dropped low enough that some of the formulas and approximations used in building the Physics model may not be valid anymore. Does this perhaps give the Neural Network a slight advantage? ↩︎