Performance evaluation of cetacean species distribution models developed using generalized additive models and boosted regression trees

Citation
Becker EA, Carretta JV, Forney KA, et al (2020) Performance evaluation of cetacean species distribution models developed using generalized additive models and boosted regression trees. Ecology and Evolution 10:5759–5784. https://doi.org/10.1002/ece3.6316
Abstract

Species distribution models (SDMs) are important management tools for highly mobile marine species because they provide spatially and temporally explicit information on animal distribution. Two prevalent modeling frameworks used to develop SDMs for marine species are generalized additive models (GAMs) and boosted regression trees (BRTs), but comparative studies have rarely been conducted; most rely on presence-only data; and few have explored how features such as species distribution characteristics affect model performance. Because most BRTs for marine species have been used to predict habitat suitability, we first compared BRTs to GAMs that used presence/absence as the response variable. We then compared results from these habitat suitability models to GAMs that predict species density (animals per km²) because density models built with a subset of the data used here have previously received extensive validation. We compared both the explanatory power (i.e., model goodness of fit) and predictive power (i.e., performance on a novel dataset) of the GAMs and BRTs for a taxonomically diverse suite of cetacean species using a robust set of systematic survey data (1991–2014) within the California Current Ecosystem. Both BRTs and GAMs were successful at describing overall distribution patterns throughout the study area for the majority of species considered, but when predicting on novel data, the density GAMs exhibited substantially greater predictive power than both the presence/absence GAMs and BRTs, likely due to both the different response variables and fitting algorithms. Our results provide an improved understanding of some of the strengths and limitations of models developed using these two methods. These results can be used by modelers developing SDMs and resource managers tasked with the spatial management of marine species to determine the best modeling technique for their question of interest.
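
The distinction drawn above between explanatory power (goodness of fit on the data used to build a model) and predictive power (performance on a novel dataset) can be illustrated with a minimal sketch. It assumes a presence/absence model scored by the area under the ROC curve (AUC); the abstract does not state which metrics were used, so AUC is an illustrative choice here, and all labels, scores, and variable names below are hypothetical toy values, not data or results from the study.

```python
def auc(labels, scores):
    """Area under the ROC curve, computed as the fraction of
    presence/absence pairs in which the presence record receives the
    higher predicted score (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one presence and one absence")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical predictions on the data the model was fitted to.
train_labels = [1, 1, 1, 0, 0, 0]
train_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]

# Hypothetical predictions on a held-out (novel) dataset.
test_labels = [1, 1, 0, 0]
test_scores = [0.6, 0.3, 0.5, 0.2]

explanatory_auc = auc(train_labels, train_scores)  # goodness of fit
predictive_auc = auc(test_labels, test_scores)     # performance on novel data

print(f"explanatory AUC: {explanatory_auc:.2f}")
print(f"predictive AUC:  {predictive_auc:.2f}")
```

In this toy case the model separates the fitted data perfectly but ranks one held-out absence above a held-out presence, so the predictive score is lower than the explanatory one. A gap of this kind is what a comparison on novel data is designed to expose.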