Chapter 6

Discussion and Future Work

6.1 Synthesis of Contributions

This thesis successfully develops, validates, and applies the ‘Potential Height’ framework, a novel methodology for quantifying the physical upper bound of tropical cyclone storm surges and its response to climate change. The core argument is constructed as a logical chain, with each chapter’s findings providing a necessary component for the next.

First, the framework requires a new, robust theoretical foundation. Chapter 2 provides this by adapting the nascent ‘potential outer size’ (\(r_a\)) theory from Wang et al. (2022) into a new, observable metric: the ‘potential inner size’ (\(r_{max}\)). This chapter compares this metric against observations and proposes its physical domain of applicability (c. \(10^\circ\)-\(30^\circ\) latitude). It also proposes a new physical diagnostic: that ‘supersize’ events could identify storms undergoing extratropical transition above \(30^\circ\) latitude.

With these thermodynamic bounds (PI and PI potential size, \(V_p\) and \(r_3\)) established, Chapter 3 builds the core computational framework. It integrates these bounds with Bayesian optimisation to efficiently find the worst-case TC tracks in ADCIRC, and finds a quantifiable and significant non-stationary trend (e.g., an 11% increase in ‘potential height’ for New Orleans by 2097). Crucially, this chapter demonstrates the framework’s practical value through an idealised statistical simulation, which shows that this physical bound substantially reduces uncertainty in high-return-period (e.g., 1-in-500-year) events.

Chapter 4 provides essential methodological and geographical validation. It shows that the Bayesian optimisation approach is robust, with the final potential height being stable across different random seeds. Its comprehensive kernel analysis, using over 700 simulations, provides a key methodological insight: that Matérn kernels are superior for modelling the non-smooth surge response. By expanding the analysis to Galveston and Miami, this chapter also illustrates the framework’s generalisability, correctly identifying that local geography dictates distinct optimal scenarios (e.g., slow-moving TCs for enclosed bays vs. fast-moving TCs for open coasts).
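For reference, the distinction between kernel families can be written out explicitly. The forms below are the standard Matérn (smoothness \(\nu = 5/2\)) and squared-exponential kernels; the symbols \(d\) (distance between two inputs), \(\ell\) (length scale) and \(\sigma_f^2\) (signal variance) are notation assumed here, and the particular smoothness values compared in Chapter 4 are not restated in this section.

\[
k_{5/2}(d) = \sigma_f^2 \left(1 + \frac{\sqrt{5}\,d}{\ell} + \frac{5 d^2}{3 \ell^2}\right) \exp\!\left(-\frac{\sqrt{5}\,d}{\ell}\right),
\qquad
k_{\text{SE}}(d) = \sigma_f^2 \exp\!\left(-\frac{d^2}{2 \ell^2}\right)
\]

A Gaussian process with \(k_{5/2}\) is only twice mean-square differentiable, whereas one with \(k_{\text{SE}}\) is infinitely differentiable, which is one plausible reason why a Matérn prior copes better with a surge response that changes abruptly as the track moves across the coastline.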

Finally, Chapter 5 demonstrates the dual utility of the framework’s products. It repurposes the ‘potential height’ scenarios as a critical out-of-distribution testbed for machine learning. By developing the ‘SurgeNet’ GNN, this chapter provides a proof-of-concept for the next generation of physically-inspired emulators and, more importantly, a methodology for testing their ability to extrapolate to the unobserved extremes that matter most.

Taken together, this work contributes not just a new hazard metric (the ‘Potential Height’), but a complete, validated methodology for its calculation and a clear demonstration of its two primary applications: (1) as a direct physical constraint for probabilistic risk assessment and (2) as a novel testbed for validating AI-driven hazard models.

6.2 Critical Analysis, Limitations, and Uncertainty Quantification

6.2.1 Uncertainty Quantification in the Potential Height Framework

While the Potential Height framework introduced here is a conceptual advance, its implementation and interpretation warrant critical examination. The framework’s robustness rests on several key assumptions and dependencies that represent important areas for future research. In order for it to be used, we must consider both the systematic and random errors that might arise in each step of the framework.

We show in Chapter 3, Section 3.2.5, that even with large random errors in the potential height estimate (under idealised assumptions, up to 5 m error on a 7 m potential height), it can still provide useful constraints on extreme value analysis. However, the assumptions that we make in that statistical simulation may be too optimistic (such as the choice of the shape parameter of the generating function), and we do not explore how systematic bias in the potential height estimate might affect the results. Before the framework can be used in practice, we must show that the combination of systematic and random errors is small enough that the potential height still provides useful information.

The errors caused by the idealisations made in the potential intensity and potential size models are difficult to quantify. We could investigate the sensitivity of the model to the free parameters such as the supergradient factor, \(\gamma_{\text{sg}}\), the lift parameter, \(\beta_l\), and the efficiency relative to the Carnot engine, \(\eta\). We could also compare the potential size model against more observational data, focusing on radii with better quality and more widespread observations such as the radius of 34 knot winds \(r_{34}\) as suggested in Chapter 2. However, if we merely identify that the potential size is rarely exceeded in observations, this does not necessarily validate the model, as it could be too high rather than too low.

Such a comparison would also not address all of the problems highlighted in our approximations. For instance, a limitation of the PI and PS models that we discuss in Chapter 2 is that they do not explicitly account for TC translation speed, a parameter with a significant influence on storm surge magnitude (e.g., Mori et al. 2022). While the Bayesian optimisation loop varies translation speed to find the maximum surge, the physical limits on potential intensity and PI potential size are calculated independently of the track, rather than separately for the gridpoint that corresponds to each approach direction. This disconnect could be explored in future theoretical model refinements.

The climate model data (e.g., CMIP6, Eyring et al. 2016) is also a significant source of uncertainty. As discussed in Chapter 2, we should carefully select GCMs that accurately represent the relevant thermodynamic variables (e.g., sea surface temperature, atmospheric humidity, atmospheric temperature) in the regions of interest, and should consider the most effective way to bias-correct these variables against the ERA5 reanalysis data (Goldenson et al. 2023). We could estimate the uncertainty through the spread between ensemble members, but this is highly likely to underestimate the true uncertainty (Hourdin et al. 2023), as GCMs tend to share similar biases and structural errors (e.g., Stainforth 2023; Melo Viríssimo and Stainforth 2025).

For the ADCIRC storm surge model, this uncertainty quantification may be possible through comparing the output data produced in Chapter 5 to historical tidal gauge observations in the domain (e.g., Center for Operational Oceanographic Products and Services 2025). Thomas (2025b) provides the full set of successful simulations run with our ADCIRC setup with IBTrACS input. If the comparison shows that the model is systematically biased in particular regions, we could change our ADCIRC settings, perhaps systematically via Bayesian optimisation, in order to address this. We could use the error in any historical examples that have not been used to tune the model settings as an estimate of the random and systematic errors that might be expected in the potential height simulations. However, the potential height simulations are far more extreme than any historical events, and so we must be careful about extrapolating these error statistics to the potential height simulations.
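The split into systematic and random error described above can be sketched as follows. This is a minimal illustration, assuming that peak water levels from held-out historical simulations have already been paired with tide gauge peaks; the function name `error_statistics` and the numbers are hypothetical and are not results from this thesis.

```python
import math
import statistics


def error_statistics(modelled, observed):
    """Split model-minus-observation residuals into systematic and random parts."""
    if len(modelled) != len(observed) or len(modelled) < 2:
        raise ValueError("need two equal-length series with at least two events")
    residuals = [m - o for m, o in zip(modelled, observed)]
    bias = statistics.fmean(residuals)    # systematic error (mean residual)
    spread = statistics.stdev(residuals)  # random error about the bias (sample std)
    rmse = math.sqrt(statistics.fmean([r * r for r in residuals]))
    return {"bias": bias, "spread": spread, "rmse": rmse}


# Hypothetical peak water levels (m) for events NOT used to tune the model.
held_out_model = [1.9, 2.6, 3.1, 1.2]
held_out_gauge = [2.0, 2.9, 3.0, 1.5]
stats = error_statistics(held_out_model, held_out_gauge)
```

Because the potential height simulations lie far outside the historical range, statistics of this kind would at best give a lower bound on the error to expect at the extreme.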

Each of the major elements we have excluded from our storm surge model will lead to an underestimate of the maximum sea level that could be experienced at a location. We do not include tides, wave setup and runup, or sea level rise. Tides are quite small in the Gulf of Mexico, with a range of only 0.3 m at New Orleans. They are significantly larger at Miami, where the mean range exceeds 0.75 m (Center for Operational Oceanographic Products and Services 2025). Tide-surge interaction can also add to their significance (Arns et al. 2020; Feng et al. 2019). Wave setup is the increase in mean water level caused by wave breaking, and can be incorporated by coupling ADCIRC to a wave model such as SWAN (Booij et al. 1996). It can add significantly to the storm surge, contributing 15–20% of total water height (Bunya et al. 2010; Dietrich et al. 2011; Forbes et al. 2010; Phan et al. 2013). Wave runup is the addition of periodic waves on top of this heightened sea level, and it can contribute a further 5% to total water level (Phan et al. 2013). Finally, sea level rise is significant, with relative sea level projections of \(\sim 1.4 \text{ m}\) under SSP5-8.5 for New Orleans by 2100, which include the expected effect of subsidence (e.g., Dokka 2011; Sweet et al. 2022). For SSP5-8.5, the global mean sea level rise is expected to be 0.77 m by 2100 (likely range between 0.63 m and 1.01 m) (IPCC 2021, chap. 9). All of these factors could interact nonlinearly with each other, leading to more than a simple addition of their individual contributions. Future work should explore how to include these ignored processes in the potential height framework.

6.2.2 Comparison to Existing Methods and Practical Limitations

From a practical standpoint, the methodology is computationally demanding (and likely more demanding than the Maximum Potential Surge, MPS, as described in Mori et al. 2022). Although Bayesian optimisation is far more efficient than a brute-force grid search, the process of finding the true maximum surge for every point along a coastline requires significant computational resources (see Chapter 3 and Chapter 4). This may limit its application to specific, high-risk locations rather than broad regional assessments, where the less computationally intensive MPS model may still serve as a valuable screening tool.

The interpretation and application of the potential height value require careful consideration. It represents a theoretical physical ceiling for a given climate state, not an event with a quantifiable return period. Its primary role is as a constraint for probabilistic hazard curves, not as a direct design specification for most coastal infrastructure, which is typically designed to a probabilistic standard (e.g., the 100-year flood level). There is a risk that the concept could be misinterpreted as a new deterministic design standard, potentially leading to inefficient over-engineering. Communicating the nuanced role of this physical bound within a broader, probabilistic risk assessment framework will be crucial for its effective use in coastal planning and engineering practice.

The potential height is an interesting way to probe, and perhaps constrain, the existing catalogue methods used to estimate storm surge hazard, not a replacement for them. As we detail in Chapter 1, the existing state of the art in storm surge catastrophe modelling requires large catalogues of synthetic storms to estimate both the hazard for a particular location and how that hazard is linked to other locations through a joint probability distribution. As it stands, the potential height framework focuses on single points, so it cannot provide this joint distribution information and cannot replace these catalogue methods. However, catastrophe modellers may doubt whether the large synthetic catalogues contain events that are extreme enough, or may instead suspect that they are too extreme. It can be very difficult to reason about these unobservable extremes, and even harder to plausibly model how they might change with climate change. Even if the method is too expensive to be used for every location, it could be used selectively to model points of particular economic and social value, helping modellers to critique their existing view of risk. As mandates from central banks call for the calculation of non-stationary physical climate risk exposure (Basel Committee on Banking Supervision 2022, 2025; Holden et al. 2024), for both insurers and banks (see e.g. Bank of England 2022), the catastrophe modelling industry will continue to move from backward-looking stationary catastrophe models to non-stationary climate change-aware catastrophe models. A physically based upper bound on hazard is a vital tool for this transition.

6.3 Future Directions for the Potential Height Framework

6.3.1 An Improved Potential Height of Tropical Cyclone Storm Surges

A relatively simple way of improving the validity of the potential height of TC storm surges would be to add more degrees of freedom to the Bayesian optimisation loop:

The trade-off between intensity and size: As discussed in Section 6.2.1, the current framework calculates the potential size assuming the windspeed is at its potential intensity. However, in Chapter 2 we see that we can now cheaply model this trade-off, calculating a new potential size at a different intensity in perhaps 44 seconds, quick enough to be included in the Bayesian optimisation loop. A less intense storm should be able to reach a larger size than a more intense storm existing in the same climatological conditions. This would allow us to find areas where a larger but less intense storm produces a larger surge, and vice versa. In this context, the potential intensity would be the upper bound on the intensity parameter, rather than the fixed value to put into the potential size model.
Curvature of the storm track: Instead of assuming that the storm travels along a line of constant bearing (as in Chapters 3 and 4), we can add another parameter to add parabolic curvature to the track. Curvature is included in Ide et al. (2024), and they show that adding it as an additional degree of freedom has a small positive effect on the maximum storm surge height found (\(<5\%\)). Track curvature is already implemented in our Python repository PotentialHeight (see Appendix A).
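To make the curvature parameter concrete, the sketch below generates a parabolic track on a local flat plane. It is an illustration only and is not the implementation in the PotentialHeight repository: the function `parabolic_track`, the local Cartesian coordinates in km, and the convention that the curvature coefficient bends the track to the right of the direction of travel are all assumptions made here.

```python
import math


def parabolic_track(x0, y0, bearing_deg, curvature, s_values):
    """Track points (x east, y north, in km) through (x0, y0).

    bearing_deg: direction of travel at (x0, y0), clockwise from north.
    curvature:   coefficient c (1/km) of the sideways offset c * s**2;
                 c = 0 recovers the straight, constant-bearing track.
    s_values:    along-track distances (km); negative before reaching (x0, y0).
    """
    theta = math.radians(bearing_deg)
    along = (math.sin(theta), math.cos(theta))    # unit vector of travel
    normal = (math.cos(theta), -math.sin(theta))  # 90 degrees clockwise of travel
    points = []
    for s in s_values:
        offset = curvature * s * s
        points.append((x0 + s * along[0] + offset * normal[0],
                       y0 + s * along[1] + offset * normal[1]))
    return points


distances = [-200.0, -100.0, 0.0, 100.0]
straight = parabolic_track(0.0, 0.0, 0.0, 0.0, distances)
curved = parabolic_track(0.0, 0.0, 0.0, 0.001, distances)
```

In a Bayesian optimisation loop the coefficient `curvature` would simply be one more bounded search dimension alongside bearing, landfall position and translation speed.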

In addition, it would be helpful to make this small correction:

Variable thermodynamic reference location: Currently we choose a single thermodynamic reference location for each city for which we want to calculate potential height. In Chapter 4 we choose the closest ocean cell to the city, and use that location to calculate the PI and PI PS. This location should probably depend on the track: a TC that approaches over warmer water could be more intense and larger, whereas one that approaches over cooler water could be less intense and smaller. One method might be to find the last ocean grid point that is intersected by the track before TC impact.
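One possible reading of "the last ocean grid point before impact" is sketched below. The function `last_ocean_point` and the toy land-sea mask are hypothetical; in practice `is_ocean` would be a lookup into the land-sea mask of the climate dataset, and the track points would first be snapped to that grid.

```python
def last_ocean_point(track, is_ocean):
    """Return the last over-ocean track point before the first landfall.

    track:    list of (lon, lat) points ordered in time.
    is_ocean: callable (lon, lat) -> bool, e.g. a land-sea mask lookup.
    Returns None if the track never passes over the ocean.
    """
    last = None
    for lon, lat in track:
        if is_ocean(lon, lat):
            last = (lon, lat)
        elif last is not None:
            break  # first land point after an ocean segment: landfall
    return last


def toy_mask(lon, lat):
    """Toy mask (assumption): everything south of 29N is ocean."""
    return lat < 29.0


northward_track = [(-90.0, 27.0 + 0.5 * i) for i in range(7)]
reference_point = last_ocean_point(northward_track, toy_mask)
```

A track that crosses a peninsula before its final landfall would stop at the first coastline under this rule, so the choice of which landfall counts as "impact" would need to be made explicitly.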

6.3.2 A related family of potentials

6.3.2.1 A Maximum Potential Height for TC Storm Surges

Instead of using the potential intensity from a single monthly average, Mori et al. (2022) calculate potential intensity for the CMIP6 datasets, and explore how the distribution of potential intensity changes their measure of maximum potential surge. This allows them to capture crucial variability in potential intensity through events such as marine heat waves, which may drastically increase the energy available to a TC, presumably increasing both PI and PI PS. Schwerdt et al. (1979) also use the 99th percentile sea surface temperature to calculate the probable maximum hurricane (PMH) central pressure deficit. The simplest method for us to follow would be to take the conditions corresponding to the 99th percentile of calculated potential intensity, and use these to calculate the potential size as we vary the intensity input. This would give a higher upper bound.

6.3.2.2 A Potential Height for TC Extreme Sea Level

Currently our ADCIRC setup does not include tides. In order to work out how high the water levels might really become we could include these, and add the phase of the tide in relation to the TC impact location as another variable for the Bayesian optimisation loop to choose. We would expect a TC that hits the coast at high tide to lead to much higher water levels at the coast. A method for this might be to vary the time in the 12 hours around spring high water, which is the point in the tidal cycle when the tidal amplitude is largest. Allowing this phase parameter to vary more than this would make the optimisation space increasingly multimodal without commensurate increases in potential height.
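The effect of the phase parameter can be illustrated with a deliberately simple sketch. It assumes linear superposition of a single-constituent tide on a precomputed surge series, which ignores the tide-surge interaction discussed in Section 6.2.1; in the framework itself the tide would be forced inside ADCIRC. The names `tide_level`, `peak_total_level` and `PHASE_BOUNDS_H` are hypothetical, the 0.15 m amplitude is half the 0.3 m New Orleans range quoted earlier, and the triangular surge series is invented for illustration.

```python
import math

M2_PERIOD_H = 12.42            # principal lunar semidiurnal period (hours)
PHASE_BOUNDS_H = (-6.0, 6.0)   # search window around spring high water (hours)


def tide_level(hours_from_high_water, amplitude_m):
    """Single-constituent tide, with high water at hours_from_high_water = 0."""
    return amplitude_m * math.cos(
        2.0 * math.pi * hours_from_high_water / M2_PERIOD_H)


def peak_total_level(surge_series, amplitude_m, landfall_offset_h):
    """Peak of surge plus tide for a given landfall time relative to high water.

    surge_series: list of (hours relative to landfall, surge in m).
    """
    low, high = PHASE_BOUNDS_H
    if not low <= landfall_offset_h <= high:
        raise ValueError("phase offset outside the optimisation bounds")
    return max(surge + tide_level(t + landfall_offset_h, amplitude_m)
               for t, surge in surge_series)


# Invented triangular surge peaking at 3 m at landfall.
surge_series = [(float(t), max(0.0, 3.0 - 0.5 * abs(t))) for t in range(-6, 7)]
at_high_water = peak_total_level(surge_series, 0.15, 0.0)
near_low_water = peak_total_level(surge_series, 0.15, 6.0)
```

Bounding the offset to roughly one tidal cycle, as `PHASE_BOUNDS_H` does, keeps a single dominant maximum in this dimension of the search space.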

6.3.2.3 Tropical Cyclone Potential Inundation

Rather than merely focusing on the height at a coastal point, if we use an ADCIRC mesh that includes the land, we could instead model how much of the land is covered by water during each storm surge simulation. The target for the Bayesian optimisation loop to optimise could be the maximum area inundated during the simulation, perhaps within the city limits. Such ADCIRC models are significantly more computationally expensive, and would themselves have to be carefully validated against historical events, but this would provide a more direct measure of hazard than the potential height at a single point.
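A possible form for such an objective is sketched below for a triangular mesh. It is a simplification under stated assumptions: an element counts as inundated only if all three of its nodes were wet, and wetness is judged from the per-node maximum water elevation over the run, so the result is the area of the maximum envelope rather than the largest area flooded at any single instant. The function names and the four-node mesh are hypothetical.

```python
def triangle_area(p1, p2, p3):
    """Area of a planar triangle from its vertex coordinates."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    return abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)) / 2.0


def inundated_area(nodes, elements, ground_elev, max_water_elev, min_depth=0.0):
    """Total area of elements whose three nodes all exceeded min_depth of water.

    nodes:          list of (x, y) in projected coordinates.
    elements:       list of (i, j, k) node indices per triangle.
    ground_elev:    ground elevation at each node (m, same datum as the water).
    max_water_elev: maximum water elevation at each node over the run (m).
    """
    total = 0.0
    for i, j, k in elements:
        if all(max_water_elev[n] - ground_elev[n] > min_depth for n in (i, j, k)):
            total += triangle_area(nodes[i], nodes[j], nodes[k])
    return total


# Toy unit square split into two triangles; node 3 sits on high ground.
nodes = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
elements = [(0, 1, 2), (0, 2, 3)]
ground = [0.5, 0.5, 0.5, 2.0]
water = [1.0, 1.0, 1.0, 1.0]
area = inundated_area(nodes, elements, ground, water)
```

Restricting `elements` to those inside the city limits would give the city-specific objective suggested above.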

6.3.2.4 Potential Economic or Social Damage

The final logical goal, once we are happy with the consistency of our physical hazard model components, might be to begin to drive the other components of a catastrophe model (see e.g. Grossi et al. 2005; Guin 2018). Once we can accurately calculate the area that could be flooded, we might want to work out how much damage this flood would cause. This would require a trustworthy vulnerability model to translate the physical inputs, including inundation and windspeed, into economic or social damage. These additional models would add significant computational cost, leading to a very slow Bayesian optimisation loop. We could then work out what the most damaging storm for a city might be, either economically or in terms of particular social goals. This would be a way to test the resilience of important infrastructure such as hospitals to possible TCs.

6.3.2.5 Worldwide Potential Height

For each of these improvements, as long as we are satisfied with the generalisability of our calculation of potential size, we could extend the framework to other coastlines. In particular, coastlines with high exposure and hazard, such as those near Hong Kong (as explored in Chapter 2) and around the Bay of Bengal, would be valuable areas to extend the framework to. This would require setting up ADCIRC meshes for these locations, and validating ADCIRC setups including them against historical events. Once this is done, we could calculate the potential height for these locations, and explore how they might change with climate change.

6.4 Implications and Broader Impact

6.4.1 Integrating an upper bound into risk estimation

We show that the upper bound can be integrated into a univariate GEV distribution (Chapter 3, Section 3.2.5), and that this yields benefits even when the upper bound is quite uncertain. However, we do not consider how it could be integrated into a more advanced setting (see e.g., Coles 2001). A simple way to do this might be to calculate the potential height over time for a point for which we have observations, and then introduce this as a time-evolving upper bound in a non-stationary extreme value analysis.
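The link between a physical upper bound and the GEV parameters can be made explicit. For a negative shape parameter \(\xi\), the GEV has a finite upper endpoint \(\mu - \sigma/\xi\) (Coles 2001), so fixing that endpoint at the potential height removes one free parameter. The sketch below shows one way to use this, by solving for the scale; how Section 3.2.5 imposes the bound is not reproduced here, the function names are hypothetical, and the values \(\mu = 2\) m and \(\xi = -0.2\) are invented, with only the 7 m bound echoing the idealised example discussed earlier.

```python
import math


def scale_from_upper_bound(mu, xi, z_max):
    """Scale sigma implied by a known upper endpoint z_max = mu - sigma / xi."""
    if xi >= 0.0:
        raise ValueError("a finite upper endpoint requires a negative shape")
    if z_max <= mu:
        raise ValueError("the upper bound must lie above the location parameter")
    return -xi * (z_max - mu)


def gev_return_level(mu, sigma, xi, return_period_years):
    """Level exceeded with annual probability 1 / return_period_years (xi != 0)."""
    y = -math.log(1.0 - 1.0 / return_period_years)
    return mu + (sigma / xi) * (y ** (-xi) - 1.0)


mu, xi, z_max = 2.0, -0.2, 7.0  # illustrative values only
sigma = scale_from_upper_bound(mu, xi, z_max)
levels = {T: gev_return_level(mu, sigma, xi, T) for T in (100, 500, 10_000)}
```

In a non-stationary analysis, `z_max` would become a function of time and the same relation would tie \(\sigma(t)\) to \(\mu(t)\) and \(\xi\), which is one candidate for the time-evolving bound suggested above.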

6.4.2 A testbed for future ML developments

We show in Chapter 5 that the potential height simulations are an informative test of how well emulators extrapolate to these extreme values, and we provide the test dataset for download via Hugging Face (Thomas 2025a). It is likely that these types of tests will become progressively more important as emulators are increasingly applied as quick stand-ins for storm surge models in a forecast setting. As the potential height framework is expanded to other coastlines, we can also generate new test data for these locations. As we add more processes to the storm surge model (e.g., tides, wave runup), we can also generate new training and test data that can be used to test the corresponding emulator.

6.5 Final Remarks

Ultimately, this thesis seeks to transform the worst-case storm surge from a ‘Black Swan’, a fundamentally unpredictable outlier, to a ‘Grey Swan’, an event that is physically foreseeable given sufficient knowledge of the climate system (Taleb 2007; Lin and Emanuel 2016). By imposing thermodynamic constraints to quantify this limit, the Potential Height framework stretches our current modelling capabilities to their edge, yet provides a necessary counterweight to standard probabilistic approaches. While this physical upper bound cannot offer a complete view of risk on its own, it serves as a critical vantage point to contextualise and critique the often inscrutable catalogues of synthetic cyclones. This ensures that our preparation for a non-stationary future is bounded not by a lack of imagination, but by the fundamental limits of the atmosphere and ocean.

References

Arns, Arne, Thomas Wahl, Claudia Wolff, et al. 2020. “Non-Linear Interaction Modulates Global Extreme Sea Levels, Coastal Flood Exposure, and Impacts.” Nature Communications 11 (1): 1–9. https://doi.org/10.1038/s41467-020-15752-5.

Bank of England. 2022. Results of the 2021 Climate Biennial Exploratory Scenario (CBES).

Basel Committee on Banking Supervision. 2022. Principles for the Effective Management and Supervision of Climate-Related Financial Risks. https://www.bis.org/bcbs/publ/d532.pdf.

Basel Committee on Banking Supervision. 2025. A Framework for the Voluntary Disclosure of Climate-Related Financial Risks. https://www.bis.org/bcbs/publ/d597.pdf.

Booij, N, LH Holthuijsen, and RC Ris. 1996. “The ‘SWAN’ Wave Model for Shallow Water.” In Coastal Engineering 1996. https://doi.org/10.1061/9780784402429.053.

Bunya, S., J. C. Dietrich, J. J. Westerink, et al. 2010. “A High-Resolution Coupled Riverine Flow, Tide, Wind, Wind Wave, and Storm Surge Model for southern Louisiana and Mississippi. Part I: Model Development and Validation.” Monthly Weather Review 138 (2): 345–77. https://doi.org/10.1175/2009mwr2906.1.

Center for Operational Oceanographic Products and Services. 2025. NOS CO-OPS Water Level Data, Verified Hourly. National Oceanic and Atmospheric Administration. https://tidesandcurrents.noaa.gov/.

Coles, Stuart. 2001. An Introduction to Statistical Modeling of Extreme Values. Vol. 208. Springer.

Dietrich, J. C., J. J. Westerink, A. B. Kennedy, et al. 2011. “Hurricane Gustav (2008) Waves and Storm Surge: Hindcast, Synoptic Analysis, and Validation in Southern Louisiana.” Monthly Weather Review 139 (8): 2488–522. https://doi.org/10.1175/2011mwr3611.1.

Dokka, Roy K. 2011. “The Role of Deep Processes in Late 20th Century Subsidence of New Orleans and Coastal Areas of southern Louisiana and Mississippi.” Journal of Geophysical Research: Solid Earth 116 (B6). https://doi.org/10.1029/2010jb008008.

Eyring, Veronika, Sandrine Bony, Gerald A Meehl, et al. 2016. “Overview of the Coupled Model Intercomparison Project Phase 6 (CMIP6) Experimental Design and Organization.” Geoscientific Model Development 9 (5): 1937–58.

Feng, Jianlong, Wensheng Jiang, Delei Li, Qiulin Liu, Hui Wang, and Kexiu Liu. 2019. “Characteristics of Tide–Surge Interaction and Its Roles in the Distribution of Surge Residuals Along the Coast of China.” Journal of Oceanography 75 (3): 225–34.

Forbes, Cristina, Richard A Luettich Jr, Craig A Mattocks, and Joannes J Westerink. 2010. “A Retrospective Evaluation of the Storm Surge Produced by Hurricane Gustav (2008): Forecast and Hindcast Results.” Weather and Forecasting 25 (6): 1577–602. https://doi.org/10.1175/2010waf2222416.1.

Goldenson, Naomi, L. Ruby Leung, Linda O. Mearns, et al. 2023. “Use-Inspired, Process-Oriented GCM Selection: Prioritizing Models for Regional Dynamical Downscaling.” Bulletin of the American Meteorological Society 104 (9): E1619–29. https://doi.org/10.1175/bams-d-23-0100.1.

Grossi, Patricia, Howard Kunreuther, and Don Windeler. 2005. “An Introduction to Catastrophe Models and Insurance.” In Catastrophe Modeling: A New Approach to Managing Risk. Springer. https://doi.org/10.1007/0-387-23129-3_2.

Guin, Jayanta. 2018. “What Makes a Catastrophe Model Robust.” In Risk Modeling for Hazards and Disasters. Elsevier. https://doi.org/10.1016/b978-0-12-804071-3.00002-1.

Holden, Lewis, Jordan King, Harriet Richards, Caspar Siegert, and Lukasz Krebel. 2024. “Measuring Climate-Related Financial Risks Using Scenario Analysis.” Bank of England Quarterly Bulletin, ahead of print. https://doi.org/10.2139/ssrn.4947406.

Hourdin, Frédéric, Brady Ferster, Julie Deshayes, Juliette Mignot, Ionela Musat, and Daniel Williamson. 2023. “Toward Machine-Assisted Tuning Avoiding the Underestimation of Uncertainty in Climate Change Projections.” Science Advances 9 (29): eadf2758. https://doi.org/10.1126/sciadv.adf2758.

Ide, Yoshihiko, Shinichiro Ozaki, Masaru Yamashiro, and Mitsuyoshi Kodama. 2024. “Development and Improvement of a Method for Determining the Worst-Case Typhoon Path for Storm Surge Deviation Through Bayesian Optimization.” Engineering Applications of Artificial Intelligence 132: 107950.

IPCC, Intergovernmental Panel on Climate Change. 2021. The Physical Science Basis: Working Group I Contribution to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change. Cambridge University Press.

Lin, Ning, and Kerry A Emanuel. 2016. “Grey swan tropical cyclones.” Nature Climate Change 6 (1): 106–11. https://doi.org/10.1038/nclimate2777.

Melo Viríssimo, Francisco de, and David A Stainforth. 2025. “Micro and Macro Parametric Uncertainty in Climate Change Prediction: A Large Ensemble Perspective.” Bulletin of the American Meteorological Society, BAMS–D.

Mori, Sotaro, Tomoya Shimura, Takuya Miyashita, Adrean Webb, and Nobuhito Mori. 2022. “Future Changes in Extreme Storm Surge Based on a Maximum Potential Storm Surge Model for East Asia.” Coastal Engineering Journal 64 (4): 630–47. https://doi.org/10.1080/21664250.2022.2145682.

Phan, LT, DN Slinn, and SW Kline. 2013. “Wave Effects on Hurricane Storm Surge Simulation.” In Advances in Hurricane Engineering: Learning from Our Past. https://doi.org/10.1061/9780784412626.065.

Schwerdt, Richard W, Francis P Ho, and Roger R Watkins. 1979. “NOAA Technical Report NWS 23: Meteorological Criteria for Standard Project Hurricane and Probable Maximum Hurricane Windfields, Gulf and East Coasts of the United States.” NOAA Technical Report NWS 23, National Oceanic and Atmospheric Administration, US Department of Commerce, Washington, D.C. https://library.oarcloud.noaa.gov/noaa_documents.lib/NWS/TR_NWS/TR_NWS_23.pdf.

Stainforth, David A. 2023. Predicting Our Climate Future: What we know, what we don’t know, and what we can’t know. Oxford University Press.

Sweet, William V., Benjamin D. Hamlington, Robert E. Kopp, et al. 2022. Global and Regional Sea Level Rise Scenarios for the United States: Updated Mean Projections and Extreme Water Level Probabilities Along U.S. Coastlines. NOAA Technical Report NOS 01. National Oceanic and Atmospheric Administration. https://oceanservice.noaa.gov/hazards/sealevelrise/noaa-nos-techrpt01-global-regional-SLR-scenarios-US.pdf.

Taleb, Nassim Nicholas. 2007. The Black Swan: The Impact of the Highly Improbable. Random House.

Thomas, Simon D. A. 2025a. SurgeNet Test Dataset – Potential Height Simulations – Alpha Version. Hugging Face. https://doi.org/10.57967/hf/7006.

Thomas, Simon D. A. 2025b. SurgeNet Training Dataset. Hugging Face. https://doi.org/10.57967/hf/6971.

Wang, Danyang, Yanluan Lin, and Daniel R Chavas. 2022. “Tropical Cyclone Potential Size.” Journal of the Atmospheric Sciences 79 (11): 3001–25. https://doi.org/10.1175/jas-d-21-0325.1.