FDEP - Office of Water Policy and Ecosystem Restoration
January 13, 2021
Goal: Develop a series of water quality models based on hydrodynamic indicators to be used in planning model scenario evaluation for LOSOM using RSMBN.
Period of Record: May 1981 – April 2019 (WY1982 – 2019)
Parameters of Interest: Total Phosphorus and Total Nitrogen.
Predictor Variables: Discharge (S80, S308 and C44 Basin), converted from ft³ s⁻¹ to acre-ft d⁻¹, and Lake Okeechobee stage elevation were considered.
Statistical Modeling:
Consistent with Caloosahatchee River Estuary Nutrient Loading Model.
Cumulative discharge (S80) and rainfall (across the C44 basin) were evaluated for the period of May 1979 - April 2019 (WY1980 - 2019), with breakpoints identified using segmented regression.
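The discharge predictors are converted from ft³ s⁻¹ to acre-ft d⁻¹ before modeling. A minimal sketch of that conversion is given here, assuming only the standard definitions of 86,400 seconds per day and 43,560 ft³ per acre-foot; the function name is illustrative and not taken from the original analysis.

```python
SECONDS_PER_DAY = 86_400            # 24 h x 60 min x 60 s
CUBIC_FEET_PER_ACRE_FOOT = 43_560   # one acre (43,560 ft^2) flooded 1 ft deep


def cfs_to_acre_feet_per_day(q_cfs: float) -> float:
    """Convert a mean daily discharge (ft^3/s) to a daily volume (acre-ft/d)."""
    return q_cfs * SECONDS_PER_DAY / CUBIC_FEET_PER_ACRE_FOOT


# Hypothetical example: a mean daily discharge of 1,000 ft^3/s
example_volume = cfs_to_acre_feet_per_day(1000.0)
```

One ft³ s⁻¹ sustained for a day is a little under 2 acre-ft, so the conversion factor is roughly 1.98.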
$$\ln(\text{TP Load}_{S80}) = Q_{C44\,Basin} + Q_{S308} + \ln(Q_{S80}) + \text{Mean Lake Stage}$$
TP load was log-transformed to fit the assumptions of linear modeling.
Model assumptions were tested and verified (see Model Diagnostics).
Variance inflation factors (VIF) were evaluated for the model:
| Variable | VIF |
|---|---|
| QC44 | 2.35 |
| QS308 | 2.15 |
| ln(QS80) | 3.53 |
| Mean Lake Stage | 2.88 |
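For readers who want to reproduce a VIF check like the one tabulated above, the sketch below uses the identity that the VIF of predictor j equals the j-th diagonal element of the inverse of the predictor correlation matrix, which is equivalent to 1 / (1 − R²) from regressing predictor j on the others. It is an illustrative NumPy version, not the code behind the table, and the function name is an assumption.

```python
import numpy as np


def variance_inflation_factors(X: np.ndarray) -> np.ndarray:
    """Return the VIF of each column of X (rows = observations).

    VIF_j is the j-th diagonal element of the inverse correlation
    matrix of the predictors, i.e. 1 / (1 - R_j^2).
    """
    corr = np.corrcoef(X, rowvar=False)   # predictor correlation matrix
    return np.diag(np.linalg.inv(corr))
```

Values near 1 indicate little collinearity; the values in the table (all below 4) sit at the low end of the thresholds commonly used to flag a problem.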
S-80 total phosphorus model results and estimates using available data during the water year 1982 - 2019 period. Data were split into training and testing datasets (70:30).
| | Estimate | Standard Error | t-value | p-value |
|---|---|---|---|---|
| (Intercept) | -2.49 | 0.68 | -3.64 | ≤ 0.01 |
| QC44 | -2.85 × 10⁻⁷ | 7.07 × 10⁻⁷ | -0.40 | 0.69 |
| QS308 | -5.29 × 10⁻⁸ | 2.35 × 10⁻⁷ | -0.22 | 0.82 |
| ln(QS80) | 1.22 | 0.06 | 20.21 | ≤ 0.01 |
| Mean Lake Stage | -0.13 | 0.05 | -2.57 | 0.02 |
Residual standard error: 0.22 on 20 degrees of freedom
Multiple R-squared: 0.98, Adjusted R-squared: 0.98
F-statistic: 289.8 on 4 and 20 degrees of freedom, p-value: ≤ 0.01
$$\ln(\text{TP Load}_{S80}) = -2.49 - (2.85 \times 10^{-7} \times Q_{C44\,Basin}) - (5.29 \times 10^{-8} \times Q_{S308}) + (1.22 \times \ln(Q_{S80})) - (0.13 \times \text{Mean Lake Stage})$$
Model Diagnostics plots
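The fitted TP equation can be evaluated directly for a given set of predictor values. The sketch below uses the rounded coefficients from the results table, so its output will differ slightly from the full-precision model. The function and argument names are illustrative, the inputs must be in the same units used to fit the model, and the back-transformation is a plain exponential because the slides do not say whether a bias correction was applied.

```python
import math

# Rounded coefficients from the S-80 TP model results table
TP_COEF = {
    "intercept": -2.49,
    "q_c44": -2.85e-7,
    "q_s308": -5.29e-8,
    "ln_q_s80": 1.22,
    "mean_stage": -0.13,
}


def predict_ln_tp_load(q_c44, q_s308, q_s80, mean_stage, coef=TP_COEF):
    """Predicted ln(TP load) at S-80 from the fitted linear model."""
    return (
        coef["intercept"]
        + coef["q_c44"] * q_c44
        + coef["q_s308"] * q_s308
        + coef["ln_q_s80"] * math.log(q_s80)
        + coef["mean_stage"] * mean_stage
    )


def predict_tp_load(q_c44, q_s308, q_s80, mean_stage, coef=TP_COEF):
    """Back-transform the prediction with exp(); no bias correction applied."""
    return math.exp(predict_ln_tp_load(q_c44, q_s308, q_s80, mean_stage, coef))
```

Because the coefficient on ln(QS80) is positive and by far the most significant term, predicted load rises with S-80 discharge when the other inputs are held fixed.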
Relative importance of each predictor was calculated by partitioning R² through averaging sequential sums of squares over all orderings of regressors (Lindeman et al. 1979). All metrics are normalized to sum to 100%.
Relative Importance Metrics for the S80 TP Load annual model.
| Predictor | Percent of R² |
|---|---|
| QC44 | 13.5 |
| QS308 | 14.8 |
| ln(QS80) | 56.2 |
| Mean Lake Stage | 15.6 |
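The averaging over regressor orderings behind these relative importance values can be sketched as follows. This is an illustrative NumPy implementation of the general idea (each predictor's share is its average gain in R² when it enters the model, taken over every possible entry order), not the code used to produce the table; the function names are hypothetical.

```python
from itertools import permutations

import numpy as np


def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def relative_importance(X, y):
    """Percent of R^2 attributed to each column of X, averaged over orderings."""
    n_pred = X.shape[1]
    shares = np.zeros(n_pred)
    orders = list(permutations(range(n_pred)))
    for order in orders:
        r2_prev = 0.0
        for k, j in enumerate(order):
            # R^2 after predictor j joins those entered before it
            r2_now = r_squared(X[:, list(order[: k + 1])], y)
            shares[j] += r2_now - r2_prev
            r2_prev = r2_now
    shares /= len(orders)
    return 100.0 * shares / shares.sum()   # normalize to sum to 100%
```

With four predictors there are only 24 orderings, so the brute-force loop is cheap; the cost grows factorially with the number of predictors.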
Actual versus predicted TP loads at S-80 based on predictive model. Actual and predicted loads were highly correlated (Spearman’s correlation: r = 0.97, p < 0.01).
Model Fit
Train:Test
Model RSE (backtransformed): 26739.11
Mean absolute percentage error: lower is better
Min-Max Accuracy: higher is better
Nash-Sutcliffe: 1 = perfect model (one minus the ratio of error variance to observed variance); https://en.wikipedia.org/wiki/Nash%E2%80%93Sutcliffe_model_efficiency_coefficient
Kling-Gupta: similar to NS; ranges from −∞ to 1, with 1 = perfect model
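The slides do not give formulas for the first two metrics, so the sketch below assumes the commonly used definitions: MAPE as the mean of |predicted − observed| / |observed|, and Min-Max Accuracy as the mean of min(observed, predicted) / max(observed, predicted), both expressed in percent. Both assume strictly positive values, which holds for back-transformed loads.

```python
def mape(observed, predicted):
    """Mean absolute percentage error, in percent (lower is better)."""
    ratios = [abs(p - o) / abs(o) for o, p in zip(observed, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


def min_max_accuracy(observed, predicted):
    """Mean of min/max for each pair, in percent (higher is better)."""
    ratios = [min(o, p) / max(o, p) for o, p in zip(observed, predicted)]
    return 100.0 * sum(ratios) / len(ratios)
```

A perfect prediction gives a MAPE of 0 and a Min-Max Accuracy of 100.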
Actual versus predicted TP loads at S-80 with each k-model presented.
k-fold (k=10)
Cross-validation error (average of the k errors)
| | Parameter | Mean | Min | Max |
|---|---|---|---|---|
| Model | R²adj | 0.97 | 0.96 | 0.98 |
| | RMSE | 0.20 | 0.17 | 0.23 |
| Train:Test | MAPE ¹ | 19 | 13 | 28 |
| | MMA ¹ | 84 | 78 | 88 |
| | NS ² | 0.93 | 0.89 | 0.97 |
| | KG ² | 0.86 | 0.74 | 0.98 |
¹ Mean Absolute Percent Error (MAPE) and Min-Max Accuracy (MMA) expressed in percent
² NS = Nash-Sutcliffe coefficient; KG = Kling-Gupta coefficient
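A k-fold cross-validation of a linear model like the one summarized above can be sketched as below. This is a generic NumPy illustration under stated assumptions (random fold assignment, ordinary least squares, RMSE on the held-out fold); it is not the original analysis code, and the function name and seed are hypothetical.

```python
import numpy as np


def k_fold_rmse(X, y, k=10, seed=0):
    """Held-out RMSE of an OLS fit for each of k randomly assigned folds."""
    n = len(y)
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, k)            # k nearly equal index groups
    A = np.column_stack([np.ones(n), X])      # add an intercept column
    errors = []
    for test_idx in folds:
        train_idx = np.setdiff1d(idx, test_idx)
        beta, *_ = np.linalg.lstsq(A[train_idx], y[train_idx], rcond=None)
        resid = y[test_idx] - A[test_idx] @ beta
        errors.append(float(np.sqrt(np.mean(resid ** 2))))
    return errors
```

The Mean, Min and Max columns of a table like the one above would then correspond to `np.mean(errors)`, `min(errors)` and `max(errors)`.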
$$\ln(\text{TN Load}_{S80}) = Q_{C44\,Basin} + Q_{S308} + \ln(Q_{S80}) + \text{Mean Lake Stage}$$
TN load was log-transformed to fit the assumptions of linear modeling.
Model assumptions were tested and verified (see Model Diagnostics).
Variance inflation factors (VIF) were evaluated for the model:
| Variable | VIF |
|---|---|
| QC44 | 2.35 |
| QS308 | 2.15 |
| ln(QS80) | 3.53 |
| Mean Lake Stage | 2.88 |
S-80 total nitrogen model results and estimates using available data during the water year 1982 - 2019 period. Data were split into training and testing datasets (70:30).
| | Estimate | Standard Error | t-value | p-value |
|---|---|---|---|---|
| (Intercept) | 1.76 × 10⁻² | 0.51 | 0.03 | 0.97 |
| QC44 | 6.60 × 10⁻⁸ | 5.24 × 10⁻⁷ | 0.13 | 0.90 |
| QS308 | 1.99 × 10⁻⁷ | 1.74 × 10⁻⁷ | 1.14 | 0.27 |
| ln(QS80) | 1.06 | 0.04 | 23.66 | ≤ 0.01 |
| Mean Lake Stage | -1.70 × 10⁻² | 0.04 | -0.47 | 0.65 |
Residual standard error: 0.16 on 20 degrees of freedom
Multiple R-squared: 0.99, Adjusted R-squared: 0.99
F-statistic: 510.9 on 4 and 20 degrees of freedom, p-value: ≤ 0.01
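$$\ln(\text{TN Load}_{S80}) = 1.76 \times 10^{-2} + (6.60 \times 10^{-8} \times Q_{C44\,Basin}) + (1.99 \times 10^{-7} \times Q_{S308}) + (1.06 \times \ln(Q_{S80})) - (1.70 \times 10^{-2} \times \text{Mean Lake Stage})$$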
Model Diagnostics plots
Relative importance of each predictor was calculated by partitioning R² through averaging sequential sums of squares over all orderings of regressors (Lindeman et al. 1979). All metrics are normalized to sum to 100%.
Relative Importance Metrics for the S80 TN Load annual model.
| Predictor | Percent of R² |
|---|---|
| QC44 | 14.7 |
| QS308 | 16.2 |
| ln(QS80) | 51.2 |
| Mean Lake Stage | 17.9 |
Actual versus predicted TN loads at S-80 based on predictive model. Actual and predicted loads were highly correlated (Spearman’s correlation: r = 0.96, p < 0.01).
Model Fit
Train:Test
Model RSE (backtransformed): 102326
Mean absolute percentage error: lower is better
Min-Max Accuracy: higher is better
Actual versus predicted TN loads at S-80 with each k-model presented.
k-fold (k=10)
Cross-validation error (average of the k errors)
| | Parameter | Mean | Min | Max |
|---|---|---|---|---|
| Model | R²adj | 0.97 | 0.96 | 0.99 |
| | RMSE | 0.19 | 0.14 | 0.24 |
| Train:Test | MAPE ¹ | 19 | 9 | 26 |
| | MMA ¹ | 85 | 80 | 92 |
| | NS ² | 0.93 | 0.88 | 0.99 |
| | KG ² | 0.88 | 0.74 | 0.96 |
¹ Mean Absolute Percent Error (MAPE) and Min-Max Accuracy (MMA) expressed in percent
² NS = Nash-Sutcliffe coefficient; KG = Kling-Gupta coefficient
Annual observed versus predicted ( ± 95% CI) S-80 load during the period of record (WY1982 – WY 2019) with hurricane years identified.
Annual observed versus predicted (± 95% CI) S-80 load during the period of record (WY1982 – WY2019) with hurricane years identified.
Similar to the CRE models, period of record monthly nutrient concentrations were considered.
Other restoration planning efforts (e.g. Restoration Strategies) have used this method in the past.
Estimates were evaluated by comparing observed versus estimated (i.e. "predicted") values and computing RMSE.
Root Mean Square Error (RMSE)
$$RMSE = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \hat{X}_i)^2}{n}}$$
$X_i$: Observed value
$\hat{X}_i$: Predicted value
$n$: Number of observations
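The RMSE definition above translates directly into code. The following is a minimal standard-library sketch; the function name is illustrative.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between paired observed and predicted values."""
    n = len(observed)
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / n)
```

Because the errors are squared before averaging, a few large misses (for example, in high-discharge years) weigh more heavily than many small ones.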
| Month | Total Phosphorus ᵃ | Total Nitrogen ᵃ |
|---|---|---|
| Jan | 106 ± 36 (64) | 1.18 ± 0.32 (63) |
| Feb | 108 ± 49 (66) | 1.28 ± 0.41 (66) |
| Mar | 114 ± 51 (67) | 1.27 ± 0.47 (68) |
| Apr | 117 ± 42 (66) | 1.24 ± 0.47 (65) |
| May | 132 ± 72 (67) | 1.12 ± 0.34 (65) |
| Jun | 178 ± 83 (59) | 1.29 ± 0.40 (56) |
| Jul | 216 ± 106 (63) | 1.36 ± 0.45 (63) |
| Aug | 197 ± 82 (69) | 1.37 ± 0.67 (67) |
| Sep | 221 ± 97 (65) | 1.45 ± 0.38 (62) |
| Oct | 192 ± 62 (62) | 1.48 ± 0.41 (64) |
| Nov | 155 ± 62 (69) | 1.41 ± 0.45 (70) |
| Dec | 113 ± 39 (62) | 1.24 ± 0.29 (61) |
ᵃ Mean ± Std Dev (N)
POR: Jan 1981 - April 2019
Station ID: C44S80
Data Source: SFWMD DBHydro
Root mean square error (RMSE) for models and period of record estimates.
| Model | Estimate Method | RMSE |
|---|---|---|
| TP Load | Model ᴮ | 23916 |
| | POR Est. ᴬ | 36478 |
| TN Load | Model ᴮ | 91523 |
| | POR Est. ᴬ | 329708 |
ᴬ RMSE value for POR Est. calculated using observed values versus annual estimated values using monthly mean concentrations
ᴮ RMSE value for Model (all data) back-calculated on untransformed predicted and observed values
Comparison of observed, modelled and period of record estimated nutrient loads at S-80 between Florida Water Year 1982 - 2019 (May 1981 - April 2019).
Comparison of observed, modelled and period of record estimated flow-weighted mean nutrient concentrations at S-80 between Florida Water Year 1982 - 2019 (May 1981 - April 2019).
Application of model with RSM-BN outputs ¹
¹ Provisional RSM-BN outputs with POR extension. For demonstration/testing purposes only.
Compare loading conditions of selected alternatives to base conditions (e.g. ECB, LORS08).
Both models assume that the C43 and C44 Reservoirs provide temporary storage of existing/available water.
Neither model incorporates potential water quality treatment features.
To evaluate potential WQ improvements, loading could be evaluated in post-processing using a Monte Carlo-like evaluation that assumes a degree of treatment (e.g. % reduction, X metric tons).
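One way the Monte Carlo-like post-processing suggested above could look is sketched here. Everything specific in it is an assumption for illustration only: the uniform distribution, the 10 to 30% reduction range, the number of draws and the function name are hypothetical and are not taken from the source.

```python
import random


def simulate_treated_loads(load, low=0.10, high=0.30, n_draws=1000, seed=42):
    """Apply randomly drawn fractional reductions to a modeled load.

    load      : modeled nutrient load (any consistent unit)
    low, high : hypothetical bounds on the treatment reduction fraction
    """
    rng = random.Random(seed)   # seeded for reproducibility
    return [load * (1.0 - rng.uniform(low, high)) for _ in range(n_draws)]
```

Summarizing the draws (for example, with the median and a percentile interval) would give a range of post-treatment loads for each alternative rather than a single value.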
South Florida Water Management District (DBHYDRO)
HTML Slide deck © Julian (2020)
RMarkdown Source
S80 TP Model diagnostics
S80 TP model diagnostics plots (Top left: Residuals vs Fitted, Bottom left: Normal Q-Q, Top right: Scale-Location, Bottom right: Residuals vs Leverage).
GVLMA (Global Stat = 1.75, p = 0.78)
Shapiro-Wilk normality test (W = 0.94, p = 0.12)
S80 TP Model residual Autocorrelation Function.
TP Model plots
S80 TN Model diagnostics
S80 TN model diagnostics plots (Top left: Residuals vs Fitted, Bottom left: Normal Q-Q, Top right: Scale-Location, Bottom right: Residuals vs Leverage).
GVLMA (Global Stat = 6.68, p = 0.27)
Shapiro-Wilk normality test (W = 0.98, p = 0.92)
S80 TN Model residual Autocorrelation Function.
TN Model plots
Nash-Sutcliffe
$$NS = 1 - \frac{\sum_{t=1}^{n} (X_{s,t} - X_{o,t})^2}{\sum_{t=1}^{n} (X_{o,t} - \mu_o)^2}$$
$n$ : total number of time-steps
$X_{s,t}$ : simulated value at time-step $t$
$X_{o,t}$ : observed value at time-step $t$
$\mu_o$ : mean of observed values
One minus the ratio of the error variance of the modeled time series to the variance of the observed time series.
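The Nash-Sutcliffe definition above can be computed with a few lines of standard-library Python. The function name is illustrative.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(observed) / len(observed)
    error_ss = sum((s - o) ** 2 for o, s in zip(observed, simulated))
    total_ss = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - error_ss / total_ss
```

A value of 0 means the model predicts no better than simply using the mean of the observations, and negative values mean it does worse.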
Kling-Gupta
$$KG = 1 - \sqrt{(r_{pearson} - 1)^2 + \left(\frac{\sigma_s}{\sigma_o} - 1\right)^2 + \left(\frac{\mu_s}{\mu_o} - 1\right)^2}$$
$r_{pearson}$ : Pearson correlation coefficient
$\mu_s$ : mean of simulated values
$\mu_o$ : mean of observed values
$\sigma_o$ : standard deviation of observed values
$\sigma_s$ : standard deviation of simulated values
Decomposition of NS representing the degree of correlation, bias and variability of simulated and observed values.
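The three components of the Kling-Gupta formula above (correlation, variability ratio and bias ratio) can be computed as in this standard-library sketch. It uses population standard deviations; because only the ratio of the two standard deviations enters the formula, the choice between population and sample forms does not change the result. The function name is illustrative.

```python
import math
import statistics


def kling_gupta(observed, simulated):
    """Kling-Gupta efficiency from correlation, variability and bias terms."""
    n = len(observed)
    mu_o = statistics.fmean(observed)
    mu_s = statistics.fmean(simulated)
    sd_o = statistics.pstdev(observed)
    sd_s = statistics.pstdev(simulated)
    # Pearson correlation from the population covariance
    cov = sum((o - mu_o) * (s - mu_s) for o, s in zip(observed, simulated)) / n
    r = cov / (sd_o * sd_s)
    return 1.0 - math.sqrt(
        (r - 1.0) ** 2 + (sd_s / sd_o - 1.0) ** 2 + (mu_s / mu_o - 1.0) ** 2
    )
```

A simulation that is perfectly correlated with the observations but twice as large scores 1 − √2 (about −0.41), which shows how the bias and variability terms penalize a model that correlation alone would rate as perfect.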