Article Category: Research Article
Online Publication Date: 01 Jun 2013

The Importance of Geographic and Biological Variables in Predicting the Naturalization of Non-Native Woody Plants in the Upper Midwest

Page Range: 124 – 131
DOI: 10.24266/0738-2898.31.2.124

The selection, introduction, and cultivation of non-native woody plants beyond their native ranges can have great benefits, but also unintended consequences. Among these consequences is the tendency for some species to naturalize and become invasive pests in new environments to which they were introduced. In lieu of lengthy and costly field trials, risk-assessment models can be used to predict the likelihood of naturalization. We compared the relative performance of five established risk-assessment models on species datasets from two previously untested areas: southern Minnesota and northern Missouri. Model classification rates ranged from 64.2 to 90.5%, biologically significant errors ranged from 4.4 to 9.3%, and horticulturally limiting errors ranged from 6.6 to 30.4%. For the random forest model, we investigated the importance of variables used to predict naturalization by examining datasets for five distinct study areas across the Upper Midwest. Geographic-risk ratios were the most important predictors of species' tendency to naturalize. Other factors, such as quick maturity, record of invading elsewhere, and production of fleshy, bird-dispersed fruit were also important in the random forest models. Although some models tested need additional refinement, the random forest models maintain robustness and provide additional information on plant-specific characteristics that contribute to naturalization.

Significance to the Nursery Industry

The introduction of new woody landscape plants can generate economic benefits for consumers and the nursery industry. Some new introductions escape from cultivation, naturalize, and invade existing plant communities. This is a concern to many stakeholders, including gardeners, members of the nursery industry, and land managers. By studying past introductions, researchers can uncover patterns in life-history characteristics and native geographic ranges that allow prediction of naturalization and invasion. We seek to develop methods to safely introduce beneficial non-native plants while protecting native ecosystems and agricultural lands from invasion by non-native plants. To this end, we developed models to predict naturalization of non-native woody landscape plants cultivated in southern Minnesota and northern Missouri. We then combined these findings with results of similar analyses from Iowa and two study areas in the Chicago region to investigate which life-history and geographic characteristics are the strongest predictors across the Upper Midwest. In southern Minnesota and northern Missouri, the random forest method generated the best combination of species classification (as naturalizing or not naturalizing) and low error rates. When we ranked the importance of plant characteristics in random forest model predictions across the five datasets, by far the most powerful and consistent predictor of naturalization was a plant's geographic-risk ratio (G-value), a statistic based on the mean proportion of naturalizing species across an introduced plant's full native range. The next strongest predictive characteristic was whether a plant displayed rapid reproductive maturity, a trait that can enhance marketability when floral or fruit displays attract buyers. This suggests a need to carefully balance some desirable production characteristics against the risks of naturalization and invasion.

Introduction

Over millennia, the native ranges of woody plant species have naturally expanded and contracted. In recent decades, human actions have greatly accelerated their expansion on a global basis (36). While many woody plant introductions have benefits that far outweigh their costs (42), some human-assisted plant introductions have resulted in naturalization (the ability of a plant to propagate and sustain a population outside of cultivation) and invasion (the ability of a naturalized plant to aggressively colonize and displace native plants) into new habitats, causing harm to native ecosystems and species (2, 6, 11). A recent inventory indicates that more than 700 species of woody plants (425 trees and 303 shrubs) are considered invasive in some part of the world (35). While this is only a small proportion of species that have been deliberately introduced for horticultural purposes, it is a serious and growing problem worldwide (37).

Within the scientific community, there has long been interest in understanding the characteristics that lead to species invasiveness (8) and this interest has rapidly expanded in recent years (e.g. 7, 9, 14, 15, 20). There is also growing interest in the specific impacts of these invasions on natural ecosystems' structure and function (12, 31, 47). It is clear that naturalization and invasion depend on both the life-history characteristics of the introduced plant and how well it is pre-adapted to a new habitat area (36).

Considerable study has been devoted to understanding how woody plants successfully naturalize or invade once arriving in a new habitat area. Researchers have proposed general schemes that include life-history characteristics, such as population fitness, generation time and level of fecundity, rate of population expansion, and individuals' competitiveness, as critical elements affecting the likelihood of becoming invasive (34, 36). Additionally, the native range of imported species, or the degree of the environmental match between the population source and the location of introduction, are important considerations (32, 38), especially at local and regional scales (49, 51, 52).

By using information gathered about introduced species, a growing body of research has specifically focused on developing predictive risk-assessment models to evaluate (ideally, before their release into the market and the landscape) the probability that a new woody plant will naturalize, invade, and cause harm (16, 20, 27, 33). As part of a larger strategy, risk assessments can proactively balance the societal and economic benefits of plant introductions against potential harms and avoid costly efforts to contain or eradicate species after escape (13, 28).

To these ends, we have developed and tested a set of predictive models to assess the likelihood of naturalization (a necessary prerequisite of invasion) for non-native woody plants in the Upper Midwest. The models use various statistical approaches to make their predictions based on both life-history characteristics (e.g., quick maturity, quick vegetative spread, invasive outside North America, and fleshy, bird-dispersed fruit) and a native-range criterion, the G-value (14, 52, 53), which is described fully in Widrlechner et al. (52) and summarized herein. Our past work includes testing decision trees, CART models, combination models, and random forest techniques developed at continental, regional, and local scales on a series of datasets of introduced woody plants for Iowa (14, 52) and the Chicago area (53). Risk-assessment models generally assign one of three possible outcomes: ‘accept’ for plants not likely to naturalize, ‘reject’ for plants that are likely to naturalize, and ‘further analysis’ for plants when model results are unclear. The accuracy of risk-assessment models is assessed in two ways, based on analysis of known naturalizing and non-naturalizing species. First, a high classification rate indicates the model is successfully placing species in ‘accept’ or ‘reject’ outcomes. Because successful classification does not necessarily indicate correct classification, the second assessment evaluates the degree to which the classifications are correct. Errors are described as either ‘horticulturally limiting’ (rejecting a species not known to naturalize) or ‘biologically significant’ (accepting a species known to naturalize). In our previous work, classification rates have ranged between 62.0% (for a CART model applied to an Iowa dataset) and 93.1% (for decision tree and matrix models also applied to an Iowa dataset) (14); ‘horticulturally limiting’ errors have ranged from 3.7 to 38.5%, and ‘biologically significant’ errors have ranged from 1.8 to 18.5% (14, 52, 53).

This paper represents another step in the ongoing work to assess classification and error rates of risk-assessment models in the Upper Midwest. First, we test the relative performance of previously developed models on two new woody plant datasets: one for southern Minnesota and another for northern Missouri. Second, we explore the relative importance of the various life-history and native-range criteria included in random forests generated from each of five Upper Midwest datasets.

Materials and Methods

Our earlier analyses included plant datasets for Iowa and the Chicago area. For the research reported in this paper, we first established the scope of two new study areas, in southern Minnesota and northern Missouri. Both study areas and their corresponding datasets were originally compiled as part of an ongoing effort to validate the risk-assessment models developed for Iowa (52), and were therefore tailored to roughly resemble the range of variation in climatic conditions found in Iowa. We evaluated moisture balance, January mean temperatures, and natural geologic and plant community subdivisions to determine the counties to be included in the Minnesota and Missouri study areas (Fig. 1). Study areas in both states were required to have a positive moisture balance, with a moisture index (Im) ≤ 40 (48). In addition to this moisture requirement, the Minnesota study area was defined by January mean air temperatures of ≥ −15 °C (5 °F), excluding the Laurentian Mixed Forest plant community area (24). The Missouri study area was defined by January mean air temperatures of ≤ 0 °C (32 °F), excluding the Ozark and Mississippi Lowlands regions (25).
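As a reading aid, the county-selection criteria above can be expressed as simple filters. This is a minimal sketch, not the authors' procedure: it assumes each county is summarized by a hypothetical record (moisture index, January mean temperature in °C, and a region label), and it interprets "a positive moisture balance with Im ≤ 40" as 0 < Im ≤ 40. The field names, region labels, and example counties are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class County:
    # Hypothetical per-county summary; field names are illustrative only.
    name: str
    moisture_index: float  # Im, read here as requiring 0 < Im <= 40
    jan_mean_c: float      # January mean air temperature, degrees C
    region: str            # natural division or plant community area


def has_moisture_balance(c: County) -> bool:
    # Shared requirement for both states: positive balance, Im <= 40.
    return 0 < c.moisture_index <= 40


def in_minnesota_area(c: County) -> bool:
    # January mean >= -15 C (5 F), outside the Laurentian Mixed Forest.
    return (has_moisture_balance(c)
            and c.jan_mean_c >= -15.0
            and c.region != "Laurentian Mixed Forest")


def in_missouri_area(c: County) -> bool:
    # January mean <= 0 C (32 F), outside the Ozark and Mississippi Lowlands.
    return (has_moisture_balance(c)
            and c.jan_mean_c <= 0.0
            and c.region not in {"Ozark", "Mississippi Lowlands"})


# Made-up counties, for illustration only.
counties = [
    County("A", 12.0, -11.5, "Prairie Parkland"),
    County("B", 25.0, -16.0, "Laurentian Mixed Forest"),
]
print([c.name for c in counties if in_minnesota_area(c)])  # ['A']
```

Keeping the shared moisture test in its own function mirrors the text, where that requirement applies to both states and only the temperature and region exclusions differ.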

Fig. 1. Map of study areas for five risk-assessment datasets in the Upper Midwest. Study areas for their respective states are shaded. Chicago A is the more darkly shaded area in Illinois and Wisconsin and Chicago B is the less darkly shaded area in Indiana and Michigan.

Citation: Journal of Environmental Horticulture 31, 2; 10.24266/0738-2898.31.2.124

To be included in the lists for either of these newly established study areas, non-native woody plant taxa (naturalizing or otherwise) had to have been commonly cultivated in the respective state for a significant period of time (at least 30 years). We began by consulting lists of woody plants drawn from previous studies (52, 53), pre-1980s nursery and gardening catalogs, and books on woody plants cultivated in these areas (19, 39, 40, 41). These lists were reviewed by regional experts (for southern Minnesota: Neil Anderson, Jeff Gilman, Gary Johnson, Laurie Robinson, Harold Pellett, and Mike Zins; for northern Missouri: Alan Branhagen, June Hutson, Chris Starbuck, and Jan Vinyard) to generate a final list of species. Naturalization status was determined by consulting herbarium records at the University of Minnesota (Saint Paul) and the University of Missouri (Columbia), as well as George Yatskievych's Flora of Missouri Database. To designate a non-native woody plant taxon as naturalized in a study area, we required a minimum of two herbarium vouchers documenting naturalization, collected from different locations within that study area. Taxa with ambiguous naturalization status were removed from the study. The final list for southern Minnesota included 23 naturalizing and 71 non-naturalizing species (n = 94), and the final list for northern Missouri included 39 naturalizing and 87 non-naturalizing species (n = 126) (Table 1).
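The voucher rule can be sketched as a check on the number of distinct collection locations per taxon. This sketch assumes, hypothetically, that voucher locations have already been normalized to comparable strings and restricted to one study area; the function name and example taxa are invented for illustration, and the sketch does not attempt to model how ambiguous cases were judged.

```python
def is_naturalized(voucher_locations: list[str], min_locations: int = 2) -> bool:
    """Return True if naturalization vouchers for one taxon come from at
    least `min_locations` distinct locations within a single study area.

    Assumes locations are already normalized strings, so that repeated
    collections from the same site compare equal and count only once.
    """
    return len(set(voucher_locations)) >= min_locations


# Hypothetical voucher records (taxon -> collection locations).
vouchers = {
    "Taxon X": ["Site 1", "Site 2"],  # two different sites
    "Taxon Y": ["Site 3", "Site 3"],  # two vouchers, one site
    "Taxon Z": [],                    # no naturalization vouchers
}
status = {taxon: is_naturalized(locs) for taxon, locs in vouchers.items()}
print(status)  # {'Taxon X': True, 'Taxon Y': False, 'Taxon Z': False}
```

The point the example makes is that two vouchers alone are not sufficient; they must come from different locations.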

Table 1. Study area and naturalization status of species used in the southern Minnesota and northern Missouri data sets.

Life-history characteristics were populated with data from previous studies (14, 52, 53) and additional reference sources (4, 30, 40, 41, 45). Because non-native taxa can behave differently in different regions, our life-history characteristics were reviewed by horticultural experts to ensure local accuracy (for southern Minnesota: Jeff Gilman, Harold Pellett, Laurie Robinson, Nancy Rose, and Mike Zins; for northern Missouri: Alan Branhagen, Boyce Tankersley, Chris Starbuck, and Guy Sternberg). Geographic-risk ratios (G-values) were calculated independently for the southern Minnesota and northern Missouri datasets by using native range data for each species. Native range data were primarily obtained from the USDA-ARS Germplasm Resources Information Network database (46) and previous studies (14, 52, 53), with supplementation from published floras (5, 17, 44). Native range data were organized into 360 geographic subdivisions for the southern Minnesota study area and 390 geographic subdivisions for northern Missouri. For each geographic subdivision, we calculated the proportion of species native to that subdivision that have naturalized in the study area (southern Minnesota or northern Missouri); a species' G-value is the mean of these proportions across all subdivisions in its native range, as described by Widrlechner et al. (52). Life-history characteristics and G-values were compiled into spreadsheets for analysis and may be accessed at http://www.nrem.iastate.edu/research/jan-t/index.php.
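A minimal sketch of the G-value calculation described above, assuming each species' native range is given as a set of subdivision codes and that the proportion for a subdivision is taken over the species in the same dataset. The subdivision codes and species below are made up, and the sketch does not attempt to reproduce every detail of the procedure in Widrlechner et al. (52).

```python
from collections import defaultdict
from statistics import mean


def subdivision_ratios(native_ranges: dict[str, set[str]],
                       naturalized: set[str]) -> dict[str, float]:
    """For each geographic subdivision, the proportion of dataset species
    native to it that naturalize in the study area."""
    native_count = defaultdict(int)
    naturalized_count = defaultdict(int)
    for species, subdivisions in native_ranges.items():
        for s in subdivisions:
            native_count[s] += 1
            if species in naturalized:
                naturalized_count[s] += 1
    return {s: naturalized_count[s] / native_count[s] for s in native_count}


def g_values(native_ranges: dict[str, set[str]],
             naturalized: set[str]) -> dict[str, float]:
    """G-value of each species: mean subdivision ratio across its native range."""
    ratios = subdivision_ratios(native_ranges, naturalized)
    return {species: mean(ratios[s] for s in subdivisions)
            for species, subdivisions in native_ranges.items() if subdivisions}


# Made-up subdivision codes and species.
native_ranges = {
    "sp1": {"JP", "KR"},
    "sp2": {"JP"},
    "sp3": {"KR", "EU"},
    "sp4": {"EU"},
}
g = g_values(native_ranges, naturalized={"sp1", "sp2"})
print(g)  # {'sp1': 0.75, 'sp2': 1.0, 'sp3': 0.25, 'sp4': 0.0}
```

In this toy example, every species native to "JP" naturalizes (ratio 1.0), so any species whose range includes "JP" is pulled toward a high G-value. That is the sense in which the statistic captures geographic pre-adaptation.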

After reviewing the life-history and geographic characteristics for both datasets, we independently assessed each dataset with the five risk-assessment models described in Kapler et al. (14). Four of these five models are discussed in detail in Widrlechner et al. (52). They include a ‘continental decision tree’ developed by Reichard and Hamilton (33) and three models developed specifically for Iowa (52): 1) a ‘modified decision tree’ that adds ten steps to the continental decision tree, 2) a ‘decision tree/matrix model’ that focuses on reevaluating ‘further analysis’ species produced by the continental decision tree, and 3) a classification and regression tree model (‘CART model’). In addition, following Kapler et al. (14), we fitted random forest models, which are an extension of CART modeling techniques, to the southern Minnesota and northern Missouri datasets with the randomForest package (21) in R software (29). Random forest models provide a predicted probability of naturalization ranging from zero (will not naturalize) to one (will naturalize). We set cutoff rules within these probabilities for each dataset in order to create ‘accept’, ‘reject’, and ‘further analysis’ outputs as follows:

For southern Minnesota:

  • If the predicted probability is < 0.205, then classify as ‘accept’;

  • If the predicted probability is ≥ 0.370, then classify as ‘reject’; and

  • If the predicted probability is ≥ 0.205 and < 0.370, then classify as ‘further analysis’.

For northern Missouri:

  • If the predicted probability is < 0.278, then classify as ‘accept’;

  • If the predicted probability is ≥ 0.490, then classify as ‘reject’; and

  • If the predicted probability is ≥ 0.278 and < 0.490, then classify as ‘further analysis’.
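The cutoff rules amount to a three-way threshold on the predicted probability. The sketch below hard-codes the published cutoffs; the function and dictionary names are hypothetical, and in practice the probabilities would come from a fitted random forest (the authors used the randomForest package in R).

```python
# Published cutoffs: accept if p < low, reject if p >= high,
# otherwise 'further analysis'.
CUTOFFS = {
    "southern Minnesota": (0.205, 0.370),
    "northern Missouri": (0.278, 0.490),
}


def classify(p: float, study_area: str) -> str:
    """Map a random forest probability of naturalization to an outcome."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("predicted probability must lie in [0, 1]")
    low, high = CUTOFFS[study_area]
    if p < low:
        return "accept"
    if p >= high:
        return "reject"
    return "further analysis"


# The same probability can lead to different outcomes in the two areas.
for area in CUTOFFS:
    print(area, "->", classify(0.25, area))
# southern Minnesota -> further analysis
# northern Missouri -> accept
```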

These cutoff values were set to maximize classification rates while keeping horticulturally limiting errors and biologically significant errors at or below values that are considered acceptable by a broad cross-section of stakeholders (13).
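One way to search for such cutoffs is an exhaustive grid search that maximizes the classification rate subject to caps on the two error rates. This is a sketch under stated assumptions, not the authors' procedure: both error rates use the number of classified species as the denominator (our reading of the definitions in the next paragraph), ties are broken in favor of the smaller total error, and the caps `max_hl` and `max_bs` are placeholders because the stakeholder-derived values (13) are not given here.

```python
def score(probs, naturalizes, low, high):
    """Classification rate and the two error rates for cutoffs low <= high.

    Both error rates use the number of classified ('accept' or 'reject')
    species as the denominator.
    """
    accepted = [y for p, y in zip(probs, naturalizes) if p < low]
    rejected = [y for p, y in zip(probs, naturalizes) if p >= high]
    n_classified = len(accepted) + len(rejected)
    if n_classified == 0:
        return 0.0, 0.0, 0.0
    hort_limiting = sum(1 for y in rejected if not y) / n_classified
    bio_significant = sum(1 for y in accepted if y) / n_classified
    return n_classified / len(probs), hort_limiting, bio_significant


def choose_cutoffs(probs, naturalizes, max_hl, max_bs, step=0.005):
    """Grid search for (low, high) maximizing the classification rate while
    keeping both error rates at or below their caps; ties go to the pair
    with the smaller total error. Returns (low, high, rate, hl, bs)."""
    grid = [i * step for i in range(int(round(1 / step)) + 2)]  # 0 .. just above 1
    best = None
    for i, low in enumerate(grid):
        for high in grid[i:]:
            rate, hl, bs = score(probs, naturalizes, low, high)
            if hl <= max_hl and bs <= max_bs:
                key = (rate, -(hl + bs))
                if best is None or key > best[0]:
                    best = (key, low, high, rate, hl, bs)
    # With non-negative caps, low = 0 and high above 1 (classify nothing)
    # always qualifies, so best is never None here.
    return best[1:]


# Made-up, perfectly separable example with zero-error caps.
probs = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
naturalizes = [False, False, False, True, True, True]
print(choose_cutoffs(probs, naturalizes, max_hl=0.0, max_bs=0.0))
```

Predicted probabilities from out-of-bag or otherwise held-out predictions would generally give a less optimistic picture of error rates than in-sample predictions.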

The power and accuracy of each model were assessed in three ways. First, the ‘classification rate’, or the proportion of species that a model successfully classifies (as ‘accept’ or ‘reject’), was examined as a measure of the model's power. We also assessed two types of errors, each expressed as the proportion of misclassifications to the total number of classified species: 1) ‘horticulturally limiting errors’, or non-naturalizing species that the models rejected as naturalizing, and 2) ‘biologically significant errors’, or naturalizing species that the models accepted as non-naturalizing. Both of these metrics are described in detail by Widrlechner et al. (52, 53). To determine the relative contributions of each individual life-history and geographical characteristic to the predictive strength of the random forest models, variable importance plots were generated in R software as part of the modeling process for each dataset. The quality of a node in one tree in the random forest can be measured by the change in the Gini index (10) when that node is added to the tree. The Gini index of a node is

$$G = \sum_{j=1}^{2} p_j \,(1 - p_j) = 1 - \sum_{j=1}^{2} p_j^{2}$$

where $p_j$ is the probability that a species classified into that node naturalizes ($j = 1$) or does not ($j = 2$). A node in which all species belong to a single class (all naturalizing or all non-naturalizing) has $G = 0$. The importance of a variable is calculated by considering all splits based on that variable and summing the drop in the Gini index for those splits (10). Because the sum of variable importance values for all traits depends on the overall proportion of naturalizing species in the study and the number of nodes in the tree, we calculated the relative importance of each variable by expressing its importance as a proportion of the total importance for all variables.
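The node-level Gini index and the relative-importance calculation can be sketched as follows. The split decrease here is weighted by node counts, which is one common convention. Implementations differ by constant factors (for example, dividing by the total number of species or by the number of trees), and such global constants cancel when importance is expressed as a proportion of the total. Variable names and numbers are invented and are not the values shown in Fig. 2.

```python
def gini(counts):
    """Gini index of a node from class counts, G = 1 - sum_j p_j**2."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts)


def gini_decrease(parent, left, right):
    """Count-weighted drop in Gini when `parent` is split into `left` and
    `right`; counts are [naturalizing, non-naturalizing]."""
    return (sum(parent) * gini(parent)
            - sum(left) * gini(left)
            - sum(right) * gini(right))


def relative_importance(decreases):
    """decreases: variable -> Gini decreases from every split on that
    variable (pooled over all trees). Returns each variable's share of
    the total importance."""
    totals = {v: sum(d) for v, d in decreases.items()}
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("no impurity decrease recorded")
    return {v: t / grand_total for v, t in totals.items()}


print(gini([5, 0]))                             # pure node
print(gini_decrease([10, 10], [8, 2], [2, 8]))  # approximately 3.6
# Invented variables and decreases, not the values in Fig. 2.
print(relative_importance({"x1": [3.6, 1.0], "x2": [1.4], "x3": [0.0]}))
```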

Results and Discussion

Model performance. Fitting the models to predict naturalization status in southern Minnesota and northern Missouri generated similar results to those from Iowa (14, 52) and two regions in the Chicago area (53). The extended Reichard and Hamilton models (the modified decision tree and the decision tree/matrix model) have higher classification rates (between 87.3 and 90.4%) than does the original Reichard and Hamilton continental model (about 72% for both areas; Table 2). All three models have similar horticultural (23.5 to 30.9%) and biological (4.4 to 9.4%) error rates (Table 2). The CART model has a much lower classification rate for the southern Minnesota study area (63.8%), a somewhat lower classification rate for the northern Missouri study area (81.0%) and slightly elevated biological error rates (10.0% in Minnesota and 7.8% in Missouri) compared to the original and extended Reichard and Hamilton models. To its credit, the CART model produced much lower horticulturally limiting error rates in both states (6.7% in southern Minnesota and 11.8% in northern Missouri) compared to the original and extended Reichard and Hamilton models. Relative to Reichard and Hamilton, the modified decision tree, and the decision tree/matrix model, the random forest model had lower classification rates (80.0% for southern Minnesota and 81.8% for northern Missouri), similar biological error rates (5.9% in southern Minnesota and 7.8% in northern Missouri), and much lower horticulturally limiting error rates (11.8% in southern Minnesota and 9.7% in northern Missouri) (Table 2).

Table 2. Summary of classification and error rates for five risk-assessment models by data set.

Misclassified species. As we have previously noted (14, 52), it is instructive to examine misclassified species more carefully. The random forest model gives unanticipated results for several species in the two study areas (Table 3). Two naturalizing species were misclassified as ‘accept’ in both study areas: Berberis thunbergii and Viburnum opulus. Berberis thunbergii is endemic to Japan, and in contrast to many woody plants endemic to Japan (50), this shrub is well adapted to seasonal moisture deficits similar to those that characterize the Upper Midwest. We have also observed it to be increasing in woodlands in Iowa and Illinois that are heavily browsed by deer. Notably, Japanese barberry was also misclassified by risk-assessment models for both Iowa and the Chicago region (52, 53).

Table 3. Species producing biologically significant and horticulturally limiting errors in local random forests for the southern Minnesota and northern Missouri datasets.

Old World populations of V. opulus have also been naturalizing throughout the Midwest, although typically in relatively low abundance. It is possible that the introduced Old World populations are hybridizing with local populations of V. opulus var. americanum, at least in the northern part of our region, which may be increasing their adaptation to local conditions.

The naturalization of Wisteria frutescens in northern Missouri may be related to a very different phenomenon. Compared to other regions of the world, the southeastern United States is a low-risk source region for the introduction of naturalizing woody plants into the Midwest (51). However, the native range of this southeastern vine extends north and west into the forests of southeastern Missouri and southern Illinois (19, 26), very near the boundary of our Missouri study area. It is possible that a combination of its cultivation as an ornamental vine and natural dispersal events is leading to its range expansion.

When examining species that are predicted to naturalize but have not, one must also keep in mind that these species may still be in a lag phase (18) prior to naturalization, or that they may be undocumented naturalizers. Long-lived trees, such as Castanea mollissima, often have lengthy lag phases. Euonymus europaeus may be an example of the latter, an undocumented naturalizer. The European spindle-tree is quite difficult to distinguish from its native congener, E. atropurpureus, except when these plants are in flower; because flowering lasts only a short period in the spring, the likelihood of detection is greatly reduced.

There are other potential explanations for the lack of naturalization among species predicted to naturalize. For example, Alnus glutinosa is known to naturalize in northern Illinois (53) and has become so invasive along waterways in Du Page County, Illinois, that extensive efforts have been directed toward large-scale removal (Joseph Suchecki, personal communication). In the Old World, A. glutinosa has an extensive native range, covering an extremely broad latitudinal band. However, provenance testing across multiple locations in the Midwest has shown that each Old World population has a very narrow band of latitudinal adaptation (3, 23). Thus, it is likely that A. glutinosa will not naturalize in southern Minnesota unless populations that are well adapted to that latitude and climate are widely cultivated there. The case of Ribes alpinum is somewhat different. It is very well adapted in southern Minnesota and is widely planted as a low-growing, dense shrub in residential and commercial landscapes. For this dioecious species, however, fruit production is only rarely observed, which may be due to a preference for the cultivation of single clones in blocks, and for selection of staminate clones. Without fruit production, it is unlikely to escape from cultivation. And, finally, in the case of the species with the highest predicted chance of naturalization (Table 3), Cotoneaster divaricatus, we suspect that the discrepancy may be the result of a gradual decline in its cultivation in northern Missouri. As noted by Lockwood et al. (22), a critical degree of propagule pressure is often needed before naturalization or invasion is observed.

Variable importance. We determined the relative importance of life-history and geographical characteristics in the random forest models for each study area (one Iowa and two Chicago datasets, along with the Minnesota and Missouri datasets presented in this paper) to look for patterns in the results (Fig. 2). Using simulation, Archer and Kimes (1) showed that the random forest variable importance plot successfully identifies the correct variables to include in a classification model. The two variables most closely associated with native range, ‘Native to North America’ and ‘G-value’, display strikingly different results. ‘Native to North America’ is the least important variable in model construction, whereas ‘G-value’ is, by far, the most important determinant in all five study areas. The contribution of ‘G-value’ to each of the five random forest models, as reflected in standardized relative importance, ranges between 41.5 and 54.5% (Fig. 2). At least within this region, the role that geographic pre-adaptation plays in allowing cultivated woody plants to naturalize cannot be overemphasized. Previous research has demonstrated that much of this pre-adaptation is related to climatic analogs (a close match between the climatic characteristics of a plant's native range and those of the location where it is introduced). This is true both within our region (49, 51) and more broadly (as reviewed by Richardson and Thuiller, 38).

Fig. 2. Variable importance in the random forest model for five datasets representing the Upper Midwest.


Of the biological characteristics included in the random forest model, ‘Quick maturity’ stands out as being of particular importance, especially in Missouri and Iowa, where its relative importance is 2.0 to 2.5 times greater than that of any other biological trait. This underscores the importance of a relatively brief establishment phase leading to consistent reproduction. In other words, a short minimum generation time, especially for woody plants, leads to greater propagule pressure in a shorter period of time, which has been widely documented as a likely precursor of naturalization and invasion (36).

‘Fleshy, bird-dispersed fruits’ and ‘Quick vegetative spread’ are of somewhat lesser importance. Of the five study areas, ‘Fleshy, bird-dispersed fruits’ is most important in Iowa. This characteristic was added to risk-assessment models developed solely from the Iowa dataset (52) to improve upon the classification and error rates of the continental decision tree, so this result is not surprising. We hypothesize that seed dispersal by birds may be most important in landscapes resembling those of Iowa, where forest areas are highly fragmented. Bird dispersal can facilitate colonization from managed landscapes to new sites, even across inhospitable areas that may exist between them. ‘Quick vegetative spread’ is most important in northern Missouri, and it may prove to be more important in densely wooded plant communities in milder climates, wherever seed propagation is limiting.

Taken together, the two characteristics related to invasion history for each species, ‘Group invasive in North America’ and ‘Invades outside North America’, are comparable to ‘Quick maturity’ in contributing to our random forest models. These two traits play key roles in Reichard and Hamilton's continental model (33) for North America, wherein ‘Invades outside North America’ is the first criterion, and ‘Group invasive in North America’ is used at three subsequent nodes in the decision tree.

In our study areas, the remaining characteristics contribute relatively little to predicting the probability of naturalization in the random forest models. Information on seed dormancy and on leaf persistence of broad-leaved evergreens under a wide range of environmental conditions is not always readily available. Our results suggest that special efforts may not be needed to identify these two character states before proceeding to model development. In contrast, for most taxa, it is relatively easy to determine whether a plant is native to North America (required to calculate G-values), but that fact per se also contributes little to random forest models.

Concluding thoughts. Classification and error rates for the five risk-assessment models applied to species datasets from southern Minnesota and northern Missouri resembled those reported in earlier work for other areas within the Upper Midwest (Iowa and two areas near Chicago; 14, 52, 53). Across all of these datasets, the random forest models have consistently produced acceptable classification rates, as well as reasonable biologically significant and horticulturally limiting error rates (results reported here, as well as in Kapler et al., 14). In addition, because random forest models reveal which variables make important contributions to predicting species' naturalization, they help in understanding the relative performance of different models. In our ongoing work, we are developing regional-scale models and comparing their performance to that of the local models generated for each of these datasets. It is important to note that any floristic risk-assessment model is only as good as the data used to construct it. We encourage regular floristic surveys and voucher specimen collection, particularly documenting non-native species, to provide better data on floristic change.

Literature Cited

Copyright: © 2013 Horticultural Research Institute

Contributor Notes

Journal paper of the Iowa Agriculture and Home Economics Experiment Station, Ames, IA, and supported by Hatch Act, McIntire-Stennis, and State of Iowa funds. We acknowledge additional financial support from USDA-ARS through the Floral and Nursery Crops Research Initiative. We also thank Anita Cholewa, Robin Kennedy, Matt O'Hearn, and Welby Smith for herbarium assistance, and Jeffery Iles and two anonymous peer reviewers for their useful critiques of our manuscript. Mention of commercial brand names does not constitute an endorsement of any product by the U.S. Department of Agriculture or cooperating agencies.

2 Affiliate Associate Professor. Departments of Ecology, Evolution and Organismal Biology and Horticulture, Iowa State University, Ames, IA. isumw@iastate.edu.

3 Master of Science. Department of Natural Resource Ecology and Management, Iowa State University. ekapler@gmail.com.

4 University Professor. Department of Statistics, Iowa State University. pdixon@iastate.edu.

5 Professor. Department of Natural Resource Ecology and Management, Iowa State University. jrrt@iastate.edu.

Received: 14 Feb 2013