The Species Distribution Model Experiment (SDM) lets you investigate the potential distribution of a species under current climatic conditions. The BCCVL currently provides 17 algorithms across 4 different categories to run your species distribution model.

**Note:** You will need to run a Species Distribution Model before you can run a Climate Change or Biodiverse Experiment.

#### Overview of SDM algorithms/methods in the BCCVL

**Profile models**

These models only use occurrence data, and are based on the characterisation of the environmental conditions of locations associated with species presence.

| Algorithm | Description |
| --- | --- |
| Bioclim / Surface Range Envelope | Defines a multi-dimensional environmental space, bounded by the minimum and maximum values of the environmental variables across all occurrences, as the potential range where a species can occur. |
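
The envelope idea can be illustrated with a minimal Python sketch. It is not the BCCVL implementation: the function names, variable names and occurrence values below are hypothetical, and it only covers the basic rule of taking the minimum and maximum of each variable across the occurrences.

```python
def fit_envelope(occurrences):
    """Return per-variable (min, max) bounds from occurrence records.

    `occurrences` is a list of dicts mapping variable name -> value.
    """
    variables = occurrences[0].keys()
    return {
        var: (min(rec[var] for rec in occurrences),
              max(rec[var] for rec in occurrences))
        for var in variables
    }


def inside_envelope(envelope, site):
    """True if every variable at `site` lies within the fitted bounds."""
    return all(lo <= site[var] <= hi for var, (lo, hi) in envelope.items())


# Hypothetical occurrence records (illustrative values only).
occurrences = [
    {"mean_temp_c": 18.0, "annual_rain_mm": 900.0},
    {"mean_temp_c": 22.5, "annual_rain_mm": 1400.0},
    {"mean_temp_c": 20.0, "annual_rain_mm": 1100.0},
]
envelope = fit_envelope(occurrences)

# A site within both ranges is inside the envelope; a site that is too warm is not.
suitable = inside_envelope(envelope, {"mean_temp_c": 21.0, "annual_rain_mm": 1000.0})
unsuitable = inside_envelope(envelope, {"mean_temp_c": 25.0, "annual_rain_mm": 1000.0})
print(suitable, unsuitable)
```

A site is only considered suitable when all variables fall inside their ranges, so a single extreme value is enough to exclude it.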

**Statistical regression models**

These models produce estimates of the effect of different environmental variables on the distribution of a species. These models use all the data available to estimate the parameters of the environmental variables, and construct a function that best describes the effect of these predictors on species occurrence. The suitability of a particular model is often defined by specific model assumptions.

| Algorithm | Description |
| --- | --- |
| Generalized Linear Model | A regression model for data with a non-normal distribution, fitted with maximum likelihood estimation. |
| Generalized Additive Model | A multiple regression model that uses smoothed functions of the environmental variables to model non-linear relationships between the response and the predictors. |
| Multivariate Adaptive Regression Splines | A regression model that builds multiple linear regression models across the range of predictor values by partitioning the data and running a linear regression model on each partition. This makes it possible to model complex relationships between the response and predictor variables. |
| Flexible Discriminant Analysis | A classification model based on a mixture of linear regression models, which uses optimal scoring to transform the response variable so that the data are in a better form for linear separation, and multivariate adaptive regression splines to generate the discriminant surface. |
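
As a rough illustration of the regression approach, the sketch below fits a generalized linear model with a logit link (logistic regression) to presence/absence data by gradient ascent on the log-likelihood. This is a simplified stand-in for the maximum likelihood fitting that statistical packages perform, not the routine used by the BCCVL. The single standardised predictor, the data values, and the learning rate and iteration count are all assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    """Inverse of the logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_glm(xs, ys, learning_rate=0.1, n_iter=5000):
    """Estimate intercept and slope by gradient ascent on the mean
    Bernoulli log-likelihood (a simple route to the maximum likelihood fit)."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(n_iter):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            residual = y - sigmoid(b0 + b1 * x)
            g0 += residual
            g1 += residual * x
        b0 += learning_rate * g0 / n
        b1 += learning_rate * g1 / n
    return b0, b1


# Hypothetical data: standardised temperature and presence (1) / absence (0).
temperature = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
presence = [0, 0, 0, 1, 0, 1, 1, 1, 1]

intercept, slope = fit_logistic_glm(temperature, presence)
p_cold = sigmoid(intercept + slope * -2.0)
p_warm = sigmoid(intercept + slope * 2.0)
print(round(p_cold, 3), round(p_warm, 3))
```

The fitted slope describes the estimated effect of the predictor on the probability of occurrence. In this made-up example the species is more likely to be present at warmer sites.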

**Machine learning models**

These models typically use one part of the dataset to ‘learn’ and describe the dataset (training) and the other part to assess the accuracy of the model (testing).

| Algorithm | Description |
| --- | --- |
| Maxent | Predicts species occurrences by finding the distribution that is most spread out, or closest to uniform, while taking into account the limits of the environmental variables of known locations. |
| Classification Tree | Predicts species occurrence by repeatedly splitting the dataset into mutually exclusive groups based on a threshold value of one of the environmental variables. |
| Random Forest | Grows many decision trees based on random subsets of the data and averages the predictions of these trees to estimate the importance of each environmental variable. |
| Boosted Regression Tree / General Boosting Model | Predicts species occurrence probabilities based on a combination of decision trees and boosting. It uses a stagewise procedure to iteratively fit random subsets of the data that are weighted in such a way that new trees take into account the error of previously built trees. |
| Artificial Neural Network | A ‘black box’ model that predicts species occurrence probabilities as a weighted combination of features, which are calculated in a hidden layer from linear combinations of the predictor variables. |
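
The training/testing idea, and the threshold split at the heart of a classification tree, can be sketched in a few lines of Python. This is a deliberately reduced example under stated assumptions: it makes a single split on one variable and only predicts presence above the threshold, whereas a real tree splits repeatedly on several variables. The records, the 75/25 split and the random seed are hypothetical.

```python
import random


def best_threshold(records):
    """Find the threshold on a single variable that misclassifies the fewest
    records when predicting presence for values above the threshold."""
    values = sorted({x for x, _ in records})
    # Candidate thresholds are the midpoints between neighbouring values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(threshold):
        return sum((x > threshold) != bool(y) for x, y in records)

    return min(candidates, key=errors)


def accuracy(threshold, records):
    """Fraction of records whose presence/absence is predicted correctly."""
    correct = sum((x > threshold) == bool(y) for x, y in records)
    return correct / len(records)


# Hypothetical (annual rainfall in mm, presence) records.
records = [
    (400, 0), (500, 0), (550, 0), (650, 0), (700, 1), (800, 1),
    (850, 0), (900, 1), (1000, 1), (1100, 1), (1200, 1), (1300, 1),
]

# Shuffle with a fixed seed, then keep 75% for training and 25% for testing.
shuffled = records[:]
random.Random(42).shuffle(shuffled)
split = int(0.75 * len(shuffled))
train, test = shuffled[:split], shuffled[split:]

threshold = best_threshold(train)   # learned from the training part only
test_accuracy = accuracy(threshold, test)  # assessed on the held-out part
print(threshold, test_accuracy)
```

Evaluating on records that were not used for fitting gives a more honest estimate of how the model performs on new locations.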

**Geographic models**

These models only use the geographic location of known occurrences of a species to predict the likelihood of presence in other locations, and do not rely on the values of environmental variables. They are therefore not regarded as true species distribution models, but they can give a good overview of the spatial extent of the occurrence of a species.

| Algorithm | Description |
| --- | --- |
| Circles | Predicts that a species is present at sites within a certain radius around observed occurrences, and absent beyond that radius. |
| Convex Hull | Predicts that a species is present at sites inside the minimum spatial convex hull around observed occurrences, and absent outside that hull. |
| Geographic Distance | Predicts species occurrences based on the assumption that the closer a site is to a known presence, the more likely it is that the species will be found there. |
| Inverse-Distance Weighted Model | Predicts species occurrence probabilities for unknown locations as the average of values at nearby known locations, weighted by their inverse distance from the unknown location. |
| Voronoi Hull | Predicts that a species is present inside Voronoi hulls around observed occurrences, which consist of all points whose distance to the known location is less than or equal to their distance to any other known location, and absent outside those hulls. |
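
Two of these geographic models are simple enough to sketch directly. The Python example below is illustrative only and makes several assumptions: coordinates are treated as planar (a real implementation would work with geographic distances), the function names are hypothetical, and the `power` parameter of the inverse-distance weighting defaults to 1 to match the description above.

```python
import math


def circles_predict(presences, site, radius):
    """Circles model: present if the site lies within `radius` of any occurrence."""
    return any(math.dist(p, site) <= radius for p in presences)


def idw_predict(known, site, power=1.0):
    """Inverse-distance weighted average of values at known locations.

    `known` is a list of ((x, y), value) pairs, with value 1 for presence
    and 0 for absence. Weights are 1 / distance**power.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for location, value in known:
        distance = math.dist(location, site)
        if distance == 0.0:
            # The site coincides with a known location: use its value directly.
            return float(value)
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical planar coordinates (arbitrary units).
presences = [(0.0, 0.0), (10.0, 0.0)]
near = circles_predict(presences, (3.0, 4.0), radius=5.0)
far = circles_predict(presences, (5.0, 9.0), radius=5.0)

known = [((0.0, 0.0), 1), ((3.0, 0.0), 0)]
probability = idw_predict(known, (1.0, 0.0))
print(near, far, round(probability, 3))
```

Neither function looks at environmental values, which is why these models describe the spatial extent of the records rather than the habitat of the species.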

#### What is a Species Distribution Model?

Let’s start by asking why it is important to understand where species occur. There are many different answers to this question. For a start, it is fundamental to our understanding of the biology and the natural history of a species. But there are also many different applications of species distribution models: they can help to identify areas that should be prioritised for conservation, for example for endangered species that are vulnerable to extinction. They can be of value in evaluating the potential of an invasive species to settle in particular areas. They can also help determine potential routes of infections and diseases, which makes them important for public health and safety as well. Another application is to combine them with future projections of changes in the natural environment. This means they can be used to predict how biodiversity will be affected by impacts such as climate change or changes in land use. So there are many important reasons why we want to know where particular species can occur, but how do we predict species distributions?

Developing a species distribution model begins with observations of species occurrences: these are places where we know a species has been found. These occurrences are mostly point-based and come from sources such as museum records and observations of experts in the field. However, if you look up a distribution map of a species, it often shows a range rather than dots on a map. So, how do we go from specific places where individuals of a species have been observed to producing a map that gives an estimate of the distribution of that species? This is where species distribution models come into play.

There are two approaches that you can take in estimating species distributions. The first is to use a mechanistic model, which explicitly incorporates known species’ tolerances to environmental conditions, such as the maximum temperature in which a species can survive. This requires detailed data on the physiological response of species to environmental factors, but this data is often not available. The second approach is the correlative approach, which is the one most commonly used in species distribution models, and is also the focus of this course. This approach is used when we don’t have detailed information about species’ tolerances to particular environmental variables. The correlative approach is based on the assumption that the current distribution of a species is a good indicator of its ecological requirements.

To calibrate a correlative species distribution model we need two types of input data: species occurrences, and measurements of a suite of environmental variables, such as temperature and rainfall. These two types of data are then put into an algorithm to find associations between the known occurrences of a species and the environmental conditions at those sites, so we can identify the environmental conditions that are suitable for a species to survive. In other words, the model describes the relationship between the distribution of a species and environmental variables. So we know something about where species occur and something about the environmental conditions of those places. The next thing we need to understand is how the algorithm uses this data. The algorithm uses these two types of information to estimate the probability of a species occurring in a place as some function of the environmental conditions of that place.
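
The first step, pairing each occurrence with the environmental conditions at that site, can be sketched as follows. This is a hypothetical illustration rather than the BCCVL workflow: the environmental layers are small made-up grids stored as nested lists, the grid origin and cell size are assumptions, and the function names are invented for the example.

```python
def cell_index(x, y, x_min, y_max, cell_size):
    """Convert a coordinate to the (row, column) of the grid cell containing it.

    Row 0 is the northernmost row, as is usual for raster data.
    """
    col = int((x - x_min) // cell_size)
    row = int((y_max - y) // cell_size)
    return row, col


def extract_values(layers, points, x_min, y_max, cell_size):
    """Look up the value of every environmental layer at each occurrence point."""
    table = []
    for x, y in points:
        row, col = cell_index(x, y, x_min, y_max, cell_size)
        n_rows = len(next(iter(layers.values())))
        n_cols = len(next(iter(layers.values()))[0])
        if not (0 <= row < n_rows and 0 <= col < n_cols):
            raise ValueError(f"point {(x, y)} lies outside the grid")
        table.append({name: grid[row][col] for name, grid in layers.items()})
    return table


# Hypothetical 2 x 3 environmental layers covering longitude 140 to 143
# and latitude -20 to -22, with 1-degree cells.
layers = {
    "mean_temp_c": [[24.0, 23.5, 23.0], [22.0, 21.5, 21.0]],
    "annual_rain_mm": [[600.0, 700.0, 800.0], [900.0, 1000.0, 1100.0]],
}
occurrence_points = [(140.5, -20.5), (142.2, -21.7)]

site_conditions = extract_values(
    layers, occurrence_points, x_min=140.0, y_max=-20.0, cell_size=1.0
)
print(site_conditions)
```

The resulting table of environmental values at known occurrences is the kind of input that a correlative algorithm uses to relate species presence to environmental conditions.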

Once we’ve built the model, we can then project the predicted species distribution geographically on a map. For every point in the landscape, the model estimates the probability of a species occurring there. This can either be displayed as a binary outcome, that is, as a presence/absence map, or as a probability on a scale from 0 to 1, with, for example, darker coloured areas representing a higher likelihood that a species can occur in that place. It is important to note that these maps do not show actual occurrences of a species, but highlight areas that have similar environmental conditions to areas where we have already found the species. They are therefore an estimate of where a species can occur, which does not necessarily mean that the species actually exists in the area.
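
Converting a probability map into a presence/absence map comes down to applying a threshold to every cell. The sketch below assumes a threshold of 0.5 and a small made-up grid purely for illustration. In practice the choice of threshold is a modelling decision and is not prescribed here.

```python
def to_presence_absence(probabilities, threshold=0.5):
    """Turn a grid of occurrence probabilities (0 to 1) into a binary grid:
    1 where the probability reaches the threshold, 0 elsewhere."""
    return [
        [1 if p >= threshold else 0 for p in row]
        for row in probabilities
    ]


# Hypothetical model output for a 2 x 3 landscape.
probability_map = [
    [0.10, 0.45, 0.80],
    [0.50, 0.95, 0.30],
]
binary_map = to_presence_absence(probability_map)
print(binary_map)
```

Raising the threshold produces a more conservative map with fewer predicted presences, and lowering it does the opposite.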