Introduction

The trunk is an important component of trees, accounting for 60 ~ 70% of tree volume (Li 2019). Volume is the main basis for evaluating tree economic value, and the quality of the stem form is also an important indicator for wood utilization. The taper equation is often used to describe the stem profile in forestry. The taper equation describes how sharply the stem diameter becomes thinner with increasing height, and its mathematical expression is d = ƒ(h, H, D) (Li 2019). The taper equation not only enables estimation of the diameter at any height of the stem or of the height at any diameter but also makes it possible to obtain the total volume of the trunk and the volume at any given position by integration. Because the taper equation is not affected by the change in material specifications, it has become the preferred method for preparing volume and species yield tables (Meng 1982; Jiang et al. 2016). It is widely used in timber yielding and utilization, forest management planning, actual timber production, forest stock and biomass estimation, etc. (Jiang et al. 2005; Özçelik and Crecente-Campo 2016; He et al. 2021). Tree volume estimates are critical to constructing growth yield models, not only as an important way to evaluate forest health and productivity but also as a primary basis for determining the method and time of forest harvesting (Alkan and Özçelik 2020). In recent years, with scientific and technological developments, foresters have derived a consistent volume equation by integrating the taper equation. The consistent taper and volume equation system is highly compatible and can simultaneously estimate stem diameter, total volume and merchantable volume, providing flexibility for forest management and planning in the context of changing wood use standards (Quiñonez-Barraza et al. 2019; Zhao et al. 2019).

Stem taper is influenced by a variety of factors, such as species (Özçelik et al. 2016), site quality (Lee et al. 2006), and management practices (Jiang and Liu 2011); therefore, no single taper equation can be applied to all species or to the description of the stem profile of the same species in different regions (He et al. 2020). Larix olgensis (LO) and Larix kaempferi (LK) are the main timber trees in northeast China with good timber quality, fast growth rates and high yields, characteristics which have made important contributions to modernization of the Chinese economy. Foresters have conducted numerous studies on larch growth and yield models in terms of diameter at breast height (DBH) and tree height (TH) growth (Wang and Li 2018; Zhang et al. 2021; Qiao and Sun 2022), stem taper and volume (Jiang et al. 2016), and forest stock and biomass (Fu et al. 2015; Jia and Chen 2019; Dong et al. 2022). Previous studies have covered LO, LK and Larix gmelinii (LG) together as a whole or have focused on a single Larix species; however, the differences and commonalities among the different species have rarely been reported. In recent years, researchers have developed a compatible volume-biomass system for different tree species by using dummy variables and mixed effects models that incorporate both geography and origin—a system that reflects not only the relationship between biomass and independent variables but also the differences in biomass among different tree species (Fu et al. 2015; Zeng et al. 2019). In previous studies, the taper and volume system has been studied only for a specific tree species or only for taper and total volume (Jiang et al. 2011). However, studies on the compatible taper and merchantable volume equation systems of different larch species in northeast China are rarely reported.

The objectives of this study were to use destructive sampling to investigate the diameter and height data of LO and LK in northeast China to construct a system of compatible taper and volume equations and then construct a generalized equation appropriate for both larch species using species as dummy variables. With this approach, a reference can be provided for accurate estimation of the diameter and merchantable volume of different larch species in northeast China, offering insight into larch growth yield to inform management decisions.

Materials and methods

Study area

Larix kaempferi data were collected in 2017 and 2018 at Dagujia Forest Farm (Fig. 1), which is located in northeast Qingyuan County, Fushun City, Liaoning Province, with longitude 124°47′–125°12′ E and latitude 42°22′–42°16′ N and an altitude from 200 to 600 m. It belongs to the middle temperate continental climate, with long, cold winters, warm and rainy summers, an average annual temperature of 6 °C and an annual precipitation of 500–800 mm. The forest mainly consists of larch, Pinus koraiensis, Picea asperata, Betula platyphylla, Tilia tuan and other species.

Fig. 1
figure 1

Location of the study areas within Liaoning and Heilongjiang province

Larix olgensis data were collected from the Mengjiagang Forest Farm in 2020. Mengjiagang Forest Farm is located in northeast Huanan County, Jiamusi City, Heilongjiang Province (Fig. 1), with longitude 130°32′42″–130°52′36″E and latitude 46°20′16″–46°30′50″N and an altitude from 170 to 575 m. It has a temperate continental monsoon climate with long winters, average annual temperature of 2.7 °C and average annual precipitation of 550 mm. The forest farm is dominated by planted forests, and the main coniferous species are larch, Pinus sylvestris and Pinus koraiensis, as well as secondary broad-leaved mixed forests such as Fraxinus mandshurica, Populus davidiana and Quercus mongolica.

Data

LK sample trees were collected from 30 sample plots in Dagujia Forest Farm, and three trees (dominant, average and inferior) were felled in each plot for trunk analysis. A total of 90 LKs were collected. The LO sample trees were obtained from 9 clear-cut areas in the Mengjiagang Forest Farm, and 262 representative LOs were selected according to diameter and height grades. The diameter at breast height (DBH, 1.3 m), stump diameter (d0, 0.1 m), tree height (TH) and diameters at relative heights of 0.02, 0.04, 0.06, 0.08, 0.1, 0.2, 0.25, 0.3, 0.4, 0.5, 0.6, 0.7, 0.75, 0.8, and 0.9 were measured after felling the sample trees. Bole volumes were calculated with the Smalian formula, and the tip volumes were calculated with the cone equation. The collected LO and LK data were randomly divided into two groups with same number of trees, and the two-fold evaluation scheme was used to fit and test the model system (Bohora and Cao 2014; Özçelik et al. 2018). Removing sample trees with DBH less than 5 cm, the actual LK used for modeling was 86. The summary statistics for the LO and LK sample trees are shown in Table 1. Figure 2 shows a local nonparametric quadratic fit of the two larch species using the PROC LOESS procedure in Statistical Analysis System (SAS) software. The scatter plot of the relative LO and LK diameter and height was assessed to remove outliers.

Table 1 Summary statistics for Larix olgensis and Larix kaempferi sample trees
Fig. 2
figure 2

Data points for relative diameter and relative height plotted with a local regression loess smoothing curve (smoothing factor = 0.2) for each larch species

Methods

System of equations selected for comparison

The volume equation obtained by integrating the taper equation, or the taper equation derived by derivation of the volume equation, were combined into a compatible equation system. The volume calculated by the integral of the compatible taper equation is equal to the volume calculated by the volume equation, and the two equations provide consistent estimates to ensure consistency and logical interconversion between the tree volume table and the merchantable volume table (Li 2019). The two equations in the compatible system share one set of parameters to ensure the robustness of the model parameter estimation and the coordinated consistency of the calculated results, which is conducive to the standardization and serialization of forestry management (Zeng and Tang 2011). Based on previous research, this study adopted five compatible taper and volume equation systems commonly used in forestry: those presented in Goulding (1976), Max (1976), Cao (1980), Fang (2000) and Zhao (2017).

The Goulding and Murray (1976) equation, which uses a power function of relative height to define the stem profile, was originally introduced by Demaerschalk (1972) to develop a compatible volume system (Diéguez-Aranda et al. 2006). In the present study, it is rewritten as follows:

$$\left\{ {\begin{array}{*{20}c} {d^{2} = \frac{{a_{0} DBH^{{a_{1} }} TH^{{a_{2} }} }}{kTH}\left( {b_{1} z + b_{2} z^{2} + b_{3} z^{3} + b_{4} z^{4} + b_{5} z^{5} } \right)} \\ {V_{m} = a_{0} DBH^{{a_{1} }} TH^{{a_{2} }} - a_{0} DBH^{{a_{1} }} TH^{{a_{2} }} \left( {\frac{{b_{1} }}{2}z^{2} + \frac{{b_{2} }}{3}z^{3} + \frac{{b_{3} }}{4}z^{4} + \frac{{b_{4} }}{5}z^{5} + \frac{{b_{5} }}{6}z^{6} } \right)} \\ \end{array} } \right.$$
(1)

where d is the diameter at height h (cm), DBH is the diameter at breast height (cm), TH is the total tree height (m), h is the stem height from the butt (m), Vm is the merchantable volume (m3), ai, bi are model parameters, \(Z=\left(TH-h\right)/TH\), and k is a metric constant for converting from diameter squared in cm2 to cross section area in m2, \(k=\pi /40000\).

The Max and Burkhart (1976) equation is currently the most applied segmented taper equation with the highest accuracy (Jiang et al. 2011), and Zhao et al. (2019) integrated it to obtain the following merchantable volume equation.

$$\left\{ {\begin{array}{*{20}c} {d^{2} = DBH^{2} \left[ {b_{1} \left( {q - 1} \right) + b_{2} \left( {q^{2} - 1} \right) + b_{3} (p_{1} - q)^{2} I_{1} + b_{4} (p_{2} - q)^{2} I_{2} } \right]} \\ {V_{m} = kDBH^{2} TH\left\{ {\frac{{b_{3} p_{1}^{3} + b_{4} p_{2}^{3} }}{3} + \frac{{b_{1} }}{2}q^{2} + \frac{{b_{2} }}{3}q^{3} - \left( {b_{1} + b_{2} } \right)q - \frac{{b_{3} }}{3}\left( {p_{1} - q} \right)^{3} I_{1} - \frac{{b_{4} }}{3}\left( {p_{2} - q} \right)^{3} I_{2} } \right\}} \\ \end{array} } \right.$$
(2)
$$I_{1} = {\text{1,if }}q \le p_{1} ;0{\text{ otherwise}}$$
$$I_{2} = {\text{1,if }}q \le p_{2} ;0{\text{ otherwise}}$$

where p1 and p2 are the relative heights at the lower and upper inflection points of the stem, respectively, q = h/TH, all other variables as previously defined.

The Cao and Burkhart (1980) equation is derived from the volume ratio equation. This equation system is used to estimate the volume of the stem at any height, and its merchantable volume is converted from the total volume (Quiñonez-Barraza et al. 2019).

$$\left\{ {\begin{array}{*{20}c} {d^{2} = - \frac{{a_{0} {\text{DBH}}^{{a_{1} }} {\text{TH}}^{{a_{2} }} }}{k}b_{1} b_{2} \left[ {\frac{{\left( {{\text{TH}} - h} \right)^{{b_{2} - 1}} }}{{{\text{TH}}^{{b_{3} }} }}} \right]} \\ {V_{m} = a_{0} {\text{DBH}}^{{a_{1} }} {\text{TH}}^{{a_{2} }} \left\{ {1 + b_{1} \left[ {\frac{{\left( {{\text{TH}} - h} \right)^{{b_{2} }} }}{{{\text{TH}}^{{b_{3} }} }}} \right]} \right\}} \\ \end{array} } \right.$$
(3)

All variables as previously defined.

Fang et al. (2000) presented a taper model proposed for stems with two inflection points that divide the stem into three parts with different shapes. b1, b2, and b3 are the shape factors of the corresponding segments, and p1 and p2 are the relative positions of the two inflection points. The equations were mainly used for the study of pine species, and they have good statistical performance and low multicollinearity in estimating stem diameter, total volume and merchantable volume (Li and Weiskittel 2010).

$$\left\{ {\begin{array}{*{20}c} {d = c_{1} \sqrt {TH^{{\left( {k - b_{1} } \right)/b_{1} }} \left( {1 - q} \right)^{{\left( {k - \beta } \right)/\beta }} q_{1}^{{J_{1} + J_{2} }} q_{2}^{{J_{2} }} } } \\ {V_{m} = c_{1}^{2} TH^{{k/b_{1} }} \left[ {b_{1} t_{0} + \left( {J_{1} + J_{2} } \right)\left( {b_{2} - b_{1} } \right)t_{1} + J_{2} \left( {b_{3} - b_{2} } \right)q_{1} t_{2} - \beta \left( {1 - q} \right)^{k/\beta } q_{1}^{{J_{1} + J_{2} }} q_{2}^{{J_{2} }} } \right]} \\ \end{array} } \right.$$
(4)
$$\begin{aligned} c_{1} = & \sqrt {a_{0} {\text{DBH}}^{{a_{1} }} {\text{TH}}^{{a_{2} - k/b_{1} }} /\left[ {b_{1} \left( {t_{0} - t_{1} } \right) + b_{2} \left( {t_{1} - q_{1} t_{2} } \right) + b_{3} q_{1} t_{2} } \right]}; \\ & t_{0} = \left( {1 - p_{0} } \right)k/b_{1} ;t_{1} = (1 - p_{1} )^{{k/b_{1} }} ;t_{2} = (1 - p_{2} )^{{k/b_{2} }}; \\ & q_{1} = (1 - p_{1} )^{{\frac{{\left( {b_{2} - b_{1} } \right)k}}{{b_{1} b_{2} }}}} ;q_{2} = (1 - p_{2} )^{{\frac{{\left( {b_{3} - b_{2} } \right)k}}{{b_{2} b_{3} }}}} ; \beta = b_{1}^{{1 - \left( {J_{1} + J_{2} } \right)}} b_{2}^{{J_{1} }} b_{3}^{{J_{2} }} ; \\ & p_{0} = h_{0} /{\text{TH}};p_{1} = {{h_{1} } \mathord{\left/ {\vphantom {{h_{1} } {{\text{TH}}}}} \right. \kern-0pt} {{\text{TH}}}};p_{2} = h_{2} /{\text{TH}} \\ & J_{1} = 1,if\;p_{1} \le q \le p_{2} ;0,{\text{otherwise}} \\ & J_{2} = 1,if\;p_{2} < q < 1;0,{\text{otherwise}} \\ \end{aligned}$$

where h0 is the stump height; 0.1 m, h1, h2 are the heights of the lower and upper inflection points (m), respectively, all other variables as previously defined.

Zhao and Kane (2017) developed a variable merchantable volume equation based on the cumulative relative distribution of height, which in essence is a partial derivative of the volume ratio function for the upper height of the stem to derive the corresponding taper equation (Quiñonez-Barraza et al. 2019):

$$\left\{ {\begin{array}{*{20}l} {d^{2} = \frac{{b_{1} }}{kTH}a_{0} {\text{DBH}}^{{a_{1} }} {\text{TH}}^{{a_{2} }} \left( {1 - q} \right)^{{b_{1} - 1}} \left\{ {1 - b_{2} \exp \left[ { - \exp \left( {b_{3} {\text{DBH}}^{{a_{1} }} {\text{TH}}^{{a_{2} }} } \right)} \right]} \right\}\left[ {1 - \left( {1 - q} \right)^{{b_{1} }} } \right]^{{\left\{ { - b_{2} \exp \left[ { - \exp \left( {b_{3} DBH^{{a_{1} }} TH^{{a_{2} }} } \right)} \right]} \right\}}} } \hfill \\ {V_{m} = a_{0} {\text{DBH}}^{{a_{1} }} {\text{TH}}^{{a_{2} }} \left[ {1 - \left( {1 - q} \right)^{{b_{1} }} } \right]^{{\left\{ {1 - b_{2} \exp \left[ { - \exp \left( {b_{3} {\text{DBH}}^{{a_{1} }} {\text{TH}}^{{a_{2} }} } \right)} \right]} \right\}}} } \hfill \\ \end{array} } \right.$$
(5)

All variables as previously defined.

Heteroscedasticity

Heteroscedasticity is a common problem in forest modeling, especially in volume equations (Parresol 1993). The presence of heteroscedasticity will lead to inaccuracy in the parameter estimation and prediction intervals. Weighted regression or logarithmic transformation methods are commonly used in forestry to eliminate heteroscedasticity (Diéguez-Aranda et al. 2006; Wang et al. 2014; Quiñonez-Barraza et al. 2019). The logarithmic transformation process is prone to errors, so this study adopts the weighted regression method to correct for heteroscedasticity. According to relevant literature (Quiñonez-Barraza et al. 2014; López-Martínez et al. 2020; Zhang et al. 2022), DBH and TH are used as independent variables, and the power function of residual variance (\({\sigma }_{i}^{2}={\left(DB{H}^{2}TH\right)}^{\varphi }\)) is used as the error variance function. The estimated error of the unweighted model \({\widehat{e}}_{i}^{2}\) is used as the dependent variable \({\widehat{e}}_{i}^{2}={\varphi }_{0}{\left(DB{H}^{2}TH\right)}^{{\varphi }_{1}}\) in the error variance model, and the parameters are estimated in SAS/ETS PROC MODEL.

$${\text{resid}}.V_{m} = {{{\text{resid}}.V_{m} } \mathord{\left/ {\vphantom {{{\text{resid}}.V_{m} } {\left[ {\left( {DBH^{2} TH} \right)^{\varphi } } \right]^{0.5} }}} \right. \kern-0pt} {\left[ {\left( {DBH^{2} TH} \right)^{\varphi } } \right]^{0.5} }}$$

Generalized equation

There are some differences in stem form between LO and LK. To more accurately reflect their differences and to improve the generality and practicability of the equations, compatible stem taper and merchantable volume equations appropriate for both were constructed using the tree species as dummy variables. The dummy variables are denoted by S, i.e., S = 1 for LO and S = 0 for LK. All parameters in Eq. (1)–(5) are assumed to depend linearly on these dummy variables:

$$\begin{gathered} a_{0} = a_{01} + a_{02} S;a_{1} = a_{11} + a_{12} S;a_{2} = a_{21} + a_{22} S;b_{1} = b_{11} + b_{12} S; \hfill \\ b_{2} = b_{21} + b_{22} S;b_{3} = b_{31} + b_{32} S;b_{4} = b_{41} + b_{42} S; \hfill \\ b_{5} = b_{51} + b_{52} S;p_{1} = p_{11} + p_{12} S;p_{2} = p_{21} + p_{22} S \hfill \\ \end{gathered}$$
(6)

where \({a}_{i1}\), \({b}_{i1}\), and \({p}_{i1}\) are model fixed parameters and \({a}_{i2}\), \({b}_{i2}\), and \({p}_{i2}\) are dummy variable parameters. Substituting (6) into Eq. (1)–(5) results in the dummy variable model for describing stem diameter and merchantable volume varying across the two species.

Fitting and evaluation of models

To minimize the error of taper and volume, the taper and volume equations for each model system are fitted simultaneously by tree species in SAS PROC MODEL using seemingly unrelated regression (SUR) based on diameter and cumulative volume data at each relative height. Parameter estimation is performed by solving nonlinear simultaneous equations (Jiang et al. 2011), with the taper and volume equations sharing a set of parameters. The adjusted coefficient of determination (\({R}_{a}^{2}\)), root mean square error (RMSE), RMSE%, Akaike information criterion (AIC), and condition number (CN) were used as model fitting statistical criteria. The condition number (CN) assesses whether there is a multicollinearity problem among the variables of the model system. Generally, if the CN is less than 30, then collinearity is not a problem; if the CN is between 30 and 100, then there are associated problems of multicollinearity, but the model is acceptable; and if the CN is greater than 100, then there is serious multicollinearity (Belsley 1991). The expressions of these statistical criteria are:

$$R_{a}^{2} = 1 - \frac{{\mathop \sum \nolimits_{i = 1}^{n} \left( {y_{i} - \hat{y}_{i} } \right)^{2} }}{{\mathop \sum \nolimits_{i = 1}^{n} \left( {y_{i} - \overline{y}} \right)^{2} }} \times \left( {\frac{n - 1}{{n - p}}} \right)$$
(7)
$${\text{RMSE}} = \left[ {\frac{{\mathop \sum \nolimits_{i = 1}^{n} \left( {y_{i} - \hat{y}_{i} } \right)^{2} }}{n - p}} \right]^{0.5}$$
(8)
$${\text{RMSE}}\% = \frac{{{\text{RMSE}}}}{{\overline{y}}} \times 100$$
(9)
$${\text{AIC}} = n\log \left( {\mathop \sum \limits_{i = 1}^{n} \frac{{\left( {y_{i} - \hat{y}_{i} } \right)^{2} }}{n}} \right) + 2p$$
(10)

where \({y}_{i}\) and \({\widehat{y}}_{i}\) are the measured and predicted diameter or \({V}_{m}\) at different height; n is the total number of observations; \(\overline{y}\) is the mean of \({y}_{i}\); and p is the number of estimated parameters in a model.

In this study, the data were randomly divided into two groups, each contained the same number of trees. We used the two-fold evaluation scheme (Özçelik et al. 2018), in which parameters of the equation systems fitted to one group was applied to predict for the other group. The predictions from both groups were then used to calculate evaluation statistics for the equation systems. The mean absolute bias (MAB) and the mean percentage of bias (MPB) were used as the model evaluation criteria. The expressions for the test criteria are:

$${\text{MAB}} = \frac{1}{n}\mathop \sum \limits_{i = 1}^{n} \left| {y_{i} - \hat{y}_{i} } \right|$$
(11)
$${\text{MPB}} = \frac{{\mathop \sum \nolimits_{i = 1}^{n} \left| {y_{i} - \hat{y}_{i} } \right|}}{{\mathop \sum \nolimits_{i = 1}^{n} y_{i} }} \times 100$$
(12)

All variables as previously defined.

Results

Compatible taper and volume equations system

Table 2 shows the parameter estimates and standard errors of the simultaneous fitting taper and volume equation systems of Larix olgensis and Larix kaempferi. For LK, all parameters were significant (P < 0.05) except b1 in the equation of Goulding (1976) and b2 in the equation of Zhao (2017). The equations from Max (1976) and Fang (2000) are segmented taper equations, and they both have two inflection points p1 and p2, which divide the stem into three parts (neiloid, paraboloid and cone). However, the positions of the inflection points for these two equations are different. For LO, the lower and upper inflection points with the Max (1976) equation were 0.036 and 0.909, respectively, while the lower and upper inflection points with the Fang (2000) equation were 0.027 and 0.726, respectively. For LK, the lower and upper inflection points with the Max (1976) equation were 0.055 and 0.798, respectively, while the lower and upper inflection points with the Fang (2000) equation were 0.030 and 0.663, respectively. The lower inflection points with the Max (1976) equation for LO and LK were only slightly different, but the upper inflection points differed by approximately 10%. For the Fang (2000) equation, the upper inflection point of LO was higher than that of LK.

Table 2 Parameter estimates and standard errors of the taper and volume model system for Larix olgensis and Larix kaempferi

Table 3 shows the statistics for goodness of fit, RMSE%, condition number (CN) and test criteria of the taper and volume equations of both larches. For LO, the \({R}_{a}^{2}\) of all the taper equations were above 0.95 except for in the case of the Cao (1980) Equation (0.921). The Fang (2000) equation performed slightly better than the Max (1976) equation, with RMSEs of diameter and Vm of 1.04 and 0.0148, respectively. The merchantable volume equation had a good fitting effect, with an accuracy of more than 99%. In the model test, the mean absolute bias (MAB) for diameter ranged from 0.83 to 1.28 cm and was less than 0.0128 m3 for merchantable volume; the mean percentage of bias (MPB) for both taper and merchantable volume was less than 7.72%. For LK, all five taper equations explained more than 97% of the variation in diameter. Similarly, the fitting accuracy of the merchantable volume equations was more than 99%. The Fang (2000) and Zhao (2017) equation had the lowest multicollinearity, while the Fang (2000) equation had the best overall performance. In the model test, the mean absolute bias (MAB) for diameter was less than 0.85 cm (less than that of LO) and less than 0.0051 m3 for merchantable volume. The mean percentage of bias (MPB) of both taper and merchantable volume were less than 7.3%.

Table 3 Fitting and testing statistical criteria of the basic taper and volume model system

The Goulding (1976), Max (1976) and Cao (1980) equations had strong multicollinearity. For both LO and LK, the Fang (2000) model had the smallest AIC and RMSE%, the condition number (CN) was less than 100, and the model had the best performance in terms of fit and test statistical criteria. Therefore, the Fang (2000) model was selected as the optimal base model in this study to establish a generalized larch taper and volume equation system appropriate for northeast China with species as dummy variables.

In this study, a power function (DBH2TH) of the variance covariate was used for weighted regression in fitting the model, and the heteroscedasticity of the volume equation was corrected. Figure 3 shows the standardized residual plots of the unweighted and weighted merchantable volume for the two larch species from the Fang (2000) model. In Fig. 3A and C are the uncorrected residual distributions for LO and LK, respectively. It can be seen from the figure that both scatter plots are trumpet shaped, indicating that the volume has variance heterogeneity. The model was then corrected by the variance function, and the resulting standardized residual distributions of LO and LK are shown in Fig. 3B and D. The model residuals are evenly distributed, and the heteroscedasticity was eliminated. The other models had similar results, so we will not go into detail here.

Fig. 3
figure 3

Unweighted and weighted standardized residual distributions of the Fang (2000) model for LO and LK

A, unweighted residual distribution of the LO merchantable volume equation; B, weighted residual distribution of the LO merchantable volume equation; C, unweighted residual distribution of the LK merchantable volume equation; D, weighted residual distribution of the LK merchantable volume equation.

Dummy variable model

Based on the optimal basic model of Fang (2000), a generalized equation applicable to both larch species was constructed with tree species as dummy variables, and the model parameters and statistics are shown in Tables 4 and 5. Model 4 is an overall model without distinguishing tree species, and Model 6 is a generalized equation with tree species as a dummy variable. We used the overall data to fit Model 4 and Model 6.

Table 4 Parameters of the overall Model 4 and generalized Model 6
Table 5 LO and LK’s fitting and testing statistical criteria based on the dummy variable Model 6

There was little difference in the fixed parameters between the overall model and the dummy variable model (Table 4). The Fang (2000) equation is a segmented taper equation with two inflection points, and the dummy variable parameters p12 and p22 are important for distinguishing the species inflection points and accurately calculating the diameter and volume of different species. The dummy variable parameter p12 was less than 0 (−0.019), which means that the lower stem inflection point for LK was higher than that for LO under the same conditions; p22 was greater than 0 (0.029), indicating that the upper stem inflection point for LO was higher than that of LK. This is similar to the location of the upper and lower inflection points calculated by fitting LO and LK separately.

Model performance

In this study, the overall model and the dummy variable model explained more than 98% and approximately 99.5% of the variation in diameter and merchantable volume, respectively. The root mean square error (RMSE), AIC, and RMSE% of the dummy variable model were smaller than those of the overall model, and the test criteria (MAB, MPB) were also smaller than those of the overall model. In general, the dummy variable model presents advantages in statistics of fit compared to the overall model for both larch types. The results of LO and LK based on the dummy variable model are shown in Table 5. Compared with the results of Fang (2000) model in Table 3, both are almost identical, and the dummy variable model still outperforms other models. Figure 4 shows the standardized residual distributions of the overall model (A, B, C, D) and the dummy variable model (E, F, G, H) for LO and LK. Similar to the optimal model of Fang (2000), the uncorrected residual distributions are trumpet shaped, and weighted regression eliminates heteroscedasticity.

Fig. 4
figure 4

Unweighted and weighted standardized residual distributions of the overall model (A, B, C, D) and the dummy variable model (E, F, G, H) for LO and LK

A and E, unweighted residual distribution of the LO merchantable volume equation; B and F, weighted residual distribution of the LO merchantable volume equation; C and G, unweighted residual distribution of the LK merchantable volume equation; D and H, weighted residual distribution of the LK merchantable volume equation.

To verify the applicability and accuracy of the dummy variable model, the parameters of Model 6 were used to predict the stem diameter and volume of the relative height classes of the two larch species in this study. The mean absolute bias (MAB) and the mean percentage of bias (MPB) of the estimates are shown in Table 6. At a relative height of 0.9–1, the MAB of the LO diameter was equal to 0.91 cm, and the MAB of the LK diameter was also the largest at this position, while the others were all less than 0.90 cm. The MPB of both larch species below a relative height of 0.8 was less than 10%. The MAB of the LO merchantable volume ranged from 0.0031 to 0.0191 m3; that of LK ranged from 0.0009 to 0.0060 m3; and the MPB of both larch species was relatively stable (LO: 2.97–5.73%; LK: 2.61–4.43%). In general, the dummy variable Model 6 performed better for both LO and LK in terms of diameter and volume at relative height classes; it is therefore very suitable for describing the shapes of stems and can be used as a general equation for both species of larch in northeast China.

Table 6 Mean absolute bias (MAB) and the mean percentage of bias (MPB) of the estimate by relative height (RH) class for both diameter and volume for Larix olgensis and Larix kaempferi

Discussion

Larix olgensis and Larix kaempferi are two important tree species in northeast China, and their taper and merchantable volume are crucial evaluation criteria for forest production, yield and management. Due to the variety of taper and volume equation forms, it is difficult to select an appropriate model that is applicable to multiple species and to the conditions of each stand. Most previous studies have fitted the taper equation or the volume equation separately (Corral-Rivas et al. 2007; Li and Weiskittel 2010; Shahzad et al. 2020), which may lead to large deviations in the calculation of trunk volume, and few researchers have connected the two to construct a consistent model system (Jiang et al. 2011). On the other hand, Zeng and Liao (1997) indicated that the compatible taper equation (variable model in his article) was affected by the constraint of the volume equation. Fang et al. (2000) developed a compatible segmented taper and volume equation system. In this study, we used Fang's segmented model, the goodness-of-fit of the compatible taper and volume equation did not loss.

In this study, five compatible taper and volume model systems, including simple and segmented models, were compared. The taper and volume equations for each system were fitted simultaneously using seemingly unrelated regression (SUR) in the SAS PROC MODEL procedure to minimize diameter and volume errors. All five equation systems demonstrated good predictive performance, and the segmented model performed better than the simple model in the estimation of diameter and volume. Among the simple models, the Zhao (2017) equation has a simple form, few parameters, and good fitting and testing results, which indicate that it can be popularized and utilized in general forestry production (Zhao and Kane 2017; Zhao et al. 2019). The Fang (2000) and Max (1976) models are segmented models, which divides the trunk into three parts with two inflection points, upper and lower, and describes the variation in the profiles of the stem at different positions in more detail, so the estimation of diameter and volume is more accurate (Fang et al. 2000; Quiñonez-Barraza et al. 2019; Hussain et al. 2020; López-Martínez et al. 2020). In this study, both larch species had higher upper and lower inflection points when these values were calculated by the Max (1976) model than when they were calculated by the Fang (2000) model, and the same results were obtained in a study of Larix gmelinii (Hussain et al. 2020). In addition, stem form is also influenced by site quality (Lee et al. 2006), management practices (Jiang and Liu 2011), and temperature and precipitation (Özçelik et al. 2016). Although both the Max (1976) and Fang (2000) models are segmented models, the Fang (2000) equation provides better overall performance than the Max (1976) equation in estimating diameter and merchantable volume (Quiñonez-Barraza et al. 2014; López-Martínez et al. 2020), and the Fang (2000) model exhibited lower multicollinearity in this study.

In this study, the compatible taper and volume system of Fang (2000) provided the best predictive performance. On the one hand, the original Fang (2000) equation was developed with pine species as the research object (Fang et al. 2000; Li and Weiskittel 2010), which corresponded to the objective species of this study; on the other hand, the two inflection points more suitably represented the trunk profile, which also helped in the accurate estimation for LO and LK. At present, the Fang (2000) taper equation is rarely reported in the literature, and there are even fewer volume equations compatible with it, mainly because its formula is complex and the model does not easily converge. However, because the Fang (2000) formula is complex, it has greater accuracy, flexibility and applicability (Fang et al. 2000; Quiñonez-Barraza et al. 2014; Hussain et al. 2020; López-Martínez et al. 2020).

Heteroscedasticity is often present in the modeling of the volume equation, and in previous papers, most researchers used the power function (DBH2TH) as the weight function to correct it (Quiñonez-Barraza et al. 2014; López-Martínez et al. 2020). This study used the same method to correct the heteroscedasticity. The distribution of the corrected residuals was uniform, and heteroscedasticity was eliminated. Notably, Zhang et al. (2022) used three kinds of weighting functions (power function, exponential function and constant plus power function) to address the heteroscedasticity of the volume equation, and the volume model performed best when the predicted value V was used as the function variable for correction. In future research, different variance functions can also be tried to correct the heteroscedasticity of the compatible taper and volume equations system.

Dummy variables have wide application in forestry modeling of biomass, carbon stock, and stand basal area (Castedo-Dorado et al. 2007; Zeng 2015; Jia et al. 2019), which can enable good integration of the simultaneous estimation problems for different stand types, different management practices, and different study areas and species, but they are rarely used in taper and volume modeling. Subati and Jia (2021) used geographical regions as dummy variables to construct an additive model system for heartwood, sapwood and bark taper for a Pinus koraiensis plantation in Heilongjiang Province, China, which did not explain the geographical differences represented by the dummy variables, although the model accuracy and applicability were improved. Fu et al. (2015) constructed a generalized equation for the aboveground biomass of Larix gmelinii and Larix olgensis in northeast China with tree species as dummy variables and explained the biomass difference between both species reflected by the dummy variable parameters. The overall model has a bias in the estimation of LK (Fig. 4C and D), and the standardized residuals are biased upward for Vm greater than 0.3 m3, which may be related to the samples used. For LO, the optimal model, the overall model and the dummy variable model do not differ significantly. The dummy variable segmented model of Fang (2000), constructed based on the tree species factor, reflected the differences in stem curves among tree species quite well. In the separate modeling of both tree species, the lower inflection point for LO was lower than that for LK, and the upper inflection point was higher than that for LK. The inflection points estimated by the dummy variable model produced similar results, i.e., the middle segment (paraboloid) for LO was longer than that for LK, which can be interpreted in forestry as LO having a better stem form index and being able to produce a higher yield. Therefore, the Fang (2000) equation is applicable to the compatible taper and volume model system that includes different tree species, and the introduction of dummy variables not only circumvents the problem of repeated modeling but also explains the differences in trunk form among different tree species. In addition, a mixed model with tree species as random effects can also reflect the differences among different species (Zeng et al. 2011; Xie et al. 2022), which are not explained in this study because there were only two species, i.e., fewer species variables. In the future, multiple tree species can be collected for further research.

There was also a significant difference between LO and LK in the diameter and volume predicted by the dummy variable model. The error for LO was slightly larger than that for LK, probably due to the DBH size of the sample trees (LO: 7.3–43.5 cm; LK: 5.5–28.1 cm). For relative heights between 0.0–0.2 and 0.7–0.9, both tree species showed larger estimation errors than at other height intervals. These relative height classes are related to trunk butt swell and the point at the base of the canopy (Jiang et al. 2005).

In this study, tree species were used as dummy variables to solve the problem of compatibility between different species variables and simultaneous modeling of taper and volume, improving the prediction accuracy and applicability to a certain extent, and simplifying the modeling process. It is advantageous to construct a generalized biomathematical model, which has important reference significance in related forestry research (Zeng et al. 2019).

Conclusions

We used LO and LK as the objects of the study, and the power function (DBH2TH) was used to eliminate the heteroscedasticity of the volume equation and to construct a compatible taper and volume model system appropriate for different tree species by introducing dummy variables. The results showed that five compatible model systems could accurately predict stem diameter and merchantable volume, with the Fang (2000) equation having the best statistical criteria and prediction performance. The Fang (2000) equation was not only able to predict the diameter and volume of stems at any given position but could also accurately characterize the differences in stem profile between the two tree species, which indicates the wide applicability and flexibility of the equation in forest production, yield and management. The inflection points estimated by the Fang (2000) equation system showed that LO has a longer middle section and shorter tip than LK, i.e., LO has a better stem form and produces more merchantable volume with higher yield for the same DBH and TH.