Performance Data Analysis

Chris Brantner

2024-03-18

Summary

In the following, we analyze the performancedata data set, which contains several health and fitness metrics along with body fat percentage. We attempt to build a model that accurately predicts body fat percentage from more easily measured predictor variables. We first fit the full first-order model and used VIF to thin collinear predictors, then fit the model with all two-way interaction terms among the remaining predictors and used backward stepwise regression to reduce it. We then performed a model utility test and compared the nested (reduced) model to the full model to see whether the extra terms are needed. Finally, we validated the final reduced model using the caret library in R to determine how well it performs when making honest predictions.

Methods and Analysis

Data Types:

The performancedata data set has \(1000\) observations of \(12\) variables: age, gender, height (cm), weight (kg), diastolic and systolic pressure (mmHg), grip force (kg), sit bend (cm), sit ups (count), broad jump (cm), fitness class (A, B, C, D from best to worst), and body fat percentage. All are numeric except for fitness class and gender, which are converted to factors below. There is one noticeable data error in sit_bend_cm, seen in the histogram in Output 5; that observation is removed before modeling.

Explore Relationships:

As we are trying to predict body_fat_perc, it is a useful exploratory step to look at a heatmap of the correlation values between each of the predictors and the response. From this, it is clear that broad_jump_cm, grip_force and height_cm are the predictors most highly correlated with body_fat_perc, and so these will be good potential predictors. Also note that many features in the bottom left of the heatmap are highly correlated with one another, as are diastolic and systolic, indicating potential collinearity between these predictors. We will keep this in mind as we build initial models and thin variables. Looking at the three predictors most highly correlated with body_fat_perc, the scatterplots below show a roughly linear relationship for broad_jump_cm but a less clear relationship for grip_force and height_cm. Grouping each of these instead by fit_class reveals a clearer pattern, with the better fitness classes being markedly lower in body fat percentage, indicating fit_class may also be a useful predictor. Looking at the pairs plots below (stratified by gender first and then fit_class), we can see that the fitness measures sit_bend_cm and broad_jump_cm have roughly linear relationships to body_fat_perc. There are also clear patterns by both gender and fit_class across all predictors.

Variable-Thinning:

With \(11\) potential predictor variables, the first step in finding an accurate model was to build a full model and check the VIF to identify features that could be removed. As shown below, gender, broad_jump_cm, grip_force, and systolic were removed and the remaining variables were stored in performancedataRed; these predictors were collinear with other predictors, so a useful model could be made without them.
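To make the VIF idea concrete, the following is a minimal sketch in base R, using simulated data rather than performancedata. The variable names x1, x2, x3 and the helper vector vif_by_hand are hypothetical. It computes \(VIF_j = 1/(1-R_j^2)\) by regressing each predictor on the others, and covers numeric predictors only (the report's output uses the generalized VIF, which also handles the fit_class factor).

```r
# Simulated data (hypothetical; not the report's data set)
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.9 * x1 + rnorm(n, sd = 0.3)  # deliberately collinear with x1
x3 <- rnorm(n)                       # generated independently of x1 and x2
X  <- data.frame(x1, x2, x3)

# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing x_j on the other predictors
vif_by_hand <- sapply(names(X), function(v) {
  f  <- reformulate(setdiff(names(X), v), response = v)
  r2 <- summary(lm(f, data = X))$r.squared
  1 / (1 - r2)
})

# Equivalent closed form: the diagonal of the inverse correlation matrix
vif_from_cor <- diag(solve(cor(X)))

print(round(vif_by_hand, 2))
```

In this sketch the collinear pair x1 and x2 should show clearly inflated values while x3 stays near 1, which mirrors the reasoning used above for dropping predictors.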

Model-Building:

Once predictors were reduced, backward elimination stepwise regression was applied to the full model with all two-way interactions to minimize the AIC and produce a useful reduced model. The reduced model is \[body\_fat\_perc \sim age + height\_cm + weight\_kg + diastolic + sit\_bend\_cm + sit\_ups\_count + fit\_class + age:weight\_kg + age:diastolic + age:sit\_ups\_count + age:fit\_class + height\_cm:sit\_bend\_cm + height\_cm:sit\_ups\_count + weight\_kg:sit\_bend\_cm + weight\_kg:sit\_ups\_count + diastolic:fit\_class + sit\_bend\_cm:fit\_class + sit\_ups\_count:fit\_class\]
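As a rough check on the stepwise trace, the AIC values that step reports for a linear model follow, up to an additive constant, the form below, where \(n\) is the number of observations, \(k\) the number of estimated coefficients, and \(RSS\) the residual sum of squares. The values \(n=999\) and \(k=29\) are read off the reduced model summary (970 residual degrees of freedom plus 29 coefficients); \(RSS=17693\) is the rounded value printed in the final step.

\[AIC = n \ln\left(\frac{RSS}{n}\right) + 2k\]

\[AIC_{reduced} \approx 999 \ln\left(\frac{17693}{999}\right) + 2(29) \approx 2871.3 + 58 = 2929.3\]

This agrees with the final reported step value of \(2929.29\), to within the rounding of \(RSS\).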

Model Specification:

From the reduced model summary below, \(R^2_{adj}=0.6807\), \(F=76.99\), and \(p \approx 0\), indicating that this is a useful model. Testing this against the full model using the nested-model ANOVA \(F\) test, we have \(F=0.567\) and \(p=0.8917\), so we fail to reject the null hypothesis that the extra terms in the full model are all zero; there is no significant evidence that the full model is needed over the reduced model. Given that this model is useful and the full model does not fit significantly better, the more parsimonious reduced model is a sensible choice.
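The nested-model comparison can be reproduced approximately by hand from the printed output. The sketch below uses the rounded residual sums of squares from the stepwise trace (\(RSS_{full}=17547\), \(RSS_{red}=17693\)) and the reduced model's \(970\) residual degrees of freedom. The full model's \(956\) residual degrees of freedom is our own count, assuming \(43\) estimated coefficients; it is consistent with the \(14\) degrees of freedom dropped over the course of the trace.

\[F = \frac{(RSS_{red} - RSS_{full})/(df_{red} - df_{full})}{RSS_{full}/df_{full}}\]

\[F \approx \frac{(17693 - 17547)/(970 - 956)}{17547/956} \approx \frac{10.43}{18.35} \approx 0.57\]

This matches the reported \(F=0.567\) up to rounding of the sums of squares. A value well below \(1\) means the dropped terms explain less variability per degree of freedom than the residual noise.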

Assumption-Checking:

To perform inference using our model, assumptions must be checked first. The assumptions of linearity in the mean function for each \(x_i\), reasonably constant variance, and approximate normality of the residuals were checked below, along with potential outliers and influential points (none found). The test for constant variance resulted in \(p=0.0869\), which exceeds the conventional \(0.05\) level, so we do not reject constant error variance. The Anderson-Darling test for normality resulted in \(p=0.5279\), so we do not reject normality of the errors.

Response-Transform:

Given that the assumptions of the multiple linear regression model were all satisfied, a response transformation is not necessary in this case.

Final Model:

Fitting the reduced model, the model utility was tested above (with \(F=76.99\), \(p \approx 0\)), and the model was validated below using the train function from the caret library. From this, we can see that \(R^2_{LOO}=0.6699735\) and \(PRESS=18826.95\), so when making honest predictions we are still able to explain approximately \(67\%\) of the variability in body fat percentage with this fitted regression equation.
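To illustrate what leave-one-out validation measures, here is a minimal base R sketch on simulated data. It does not use caret or performancedata, and the names d, press_shortcut, press_loop, r2_press and r2_cor are hypothetical. It computes PRESS two ways (via hat values and via explicit refitting) and then two versions of a leave-one-out \(R^2\). To our understanding, caret's default Rsquared is the squared correlation between held-out predictions and observed values, so it need not equal \(1 - PRESS/SST\) exactly.

```r
# Simulated data (hypothetical; stands in for the report's data set)
set.seed(42)
n <- 100
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- 2 + 1.5 * d$x1 - 0.8 * d$x2 + rnorm(n)

fit <- lm(y ~ x1 + x2, data = d)

# Shortcut for linear models: the leave-one-out residual is e_i / (1 - h_ii)
press_shortcut <- sum((residuals(fit) / (1 - hatvalues(fit)))^2)

# Brute force: refit n times, each time predicting the single held-out row
loo_pred <- sapply(seq_len(n), function(i) {
  fit_i <- lm(y ~ x1 + x2, data = d[-i, ])
  predict(fit_i, newdata = d[i, ])
})
press_loop <- sum((d$y - loo_pred)^2)

# Two common "honest" R^2 summaries
sst      <- sum((d$y - mean(d$y))^2)
r2_press <- 1 - press_shortcut / sst   # PRESS-based predictive R^2
r2_cor   <- cor(d$y, loo_pred)^2       # squared correlation of held-out predictions

print(c(PRESS = press_shortcut, R2_press = r2_press, R2_cor = r2_cor))
```

Because each leave-one-out residual is at least as large in magnitude as the ordinary residual, PRESS is never smaller than the in-sample RSS, so the PRESS-based \(R^2\) is never larger than the usual \(R^2\). The drop from \(0.6897\) to roughly \(0.67\) reported above is consistent with that direction.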

Graphs and Output

Read in performance data:

Return to Data Types

library(readxl)  # provides read_excel()
library(DT)      # provides datatable()
performancedata <- read_excel("bodyPerformance_fit.xlsx")
datatable(performancedata, rownames = FALSE, filter = "top", options = list(pageLength = 5, scrollX = TRUE))
str(performancedata)
## tibble [1,000 × 12] (S3: tbl_df/tbl/data.frame)
##  $ age          : num [1:1000] 36 23 26 23 36 24 21 33 21 39 ...
##  $ gender       : chr [1:1000] "M" "M" "M" "F" ...
##  $ height_cm    : num [1:1000] 173 172 178 172 172 ...
##  $ weight_kg    : num [1:1000] 73.4 64.7 83.1 64.3 71.8 ...
##  $ diastolic    : num [1:1000] 77 75 82 76 76 72 86 80 70 79 ...
##  $ systolic     : num [1:1000] 121 130 120 120 130 113 131 121 120 131 ...
##  $ grip_force   : num [1:1000] 51.4 54.1 50.6 29.6 45 23.8 41.9 45.1 19.7 30.4 ...
##  $ sit_bend_cm  : num [1:1000] 23 7.8 22.1 15.2 21.4 ...
##  $ sit_ups_count: num [1:1000] 54 43 57 46 42 40 55 31 43 31 ...
##  $ broad_jump_cm: num [1:1000] 232 212 255 177 221 171 228 213 164 160 ...
##  $ fit_class    : chr [1:1000] "A" "C" "A" "B" ...
##  $ body_fat_perc: num [1:1000] 15.1 15.2 16.8 29.5 16.4 22 29.8 27.7 28.9 25.5 ...

Data Types:

Output 1

Return to Data Types

Factoring qualitative variables gender and fit_class for analysis, and checking structure again

#factor qualitative variables gender and fit_class
performancedata = mutate(performancedata, gender = factor(gender, levels = c("M","F")), fit_class = factor(fit_class, levels = c("A","B","C","D")))
str(performancedata)
## tibble [1,000 × 12] (S3: tbl_df/tbl/data.frame)
##  $ age          : num [1:1000] 36 23 26 23 36 24 21 33 21 39 ...
##  $ gender       : Factor w/ 2 levels "M","F": 1 1 1 2 1 2 1 1 2 2 ...
##  $ height_cm    : num [1:1000] 173 172 178 172 172 ...
##  $ weight_kg    : num [1:1000] 73.4 64.7 83.1 64.3 71.8 ...
##  $ diastolic    : num [1:1000] 77 75 82 76 76 72 86 80 70 79 ...
##  $ systolic     : num [1:1000] 121 130 120 120 130 113 131 121 120 131 ...
##  $ grip_force   : num [1:1000] 51.4 54.1 50.6 29.6 45 23.8 41.9 45.1 19.7 30.4 ...
##  $ sit_bend_cm  : num [1:1000] 23 7.8 22.1 15.2 21.4 ...
##  $ sit_ups_count: num [1:1000] 54 43 57 46 42 40 55 31 43 31 ...
##  $ broad_jump_cm: num [1:1000] 232 212 255 177 221 171 228 213 164 160 ...
##  $ fit_class    : Factor w/ 4 levels "A","B","C","D": 1 3 1 2 2 1 4 4 1 1 ...
##  $ body_fat_perc: num [1:1000] 15.1 15.2 16.8 29.5 16.4 22 29.8 27.7 28.9 25.5 ...

Explore Relationships:

Figure / Output 2

Return to Exploring Relationships

Pairs plots showing patterns in features

#focus on patterns in qualitative features, colored by gender
ggpairs(performancedata, 
        aes(color = gender, alpha = 0.5),
        lower = list(continuous = "smooth"),
        upper = list(continuous = "points"))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

#focus on patterns in combo features (quant v qual) colored by fit_class
ggpairs(performancedata, 
        aes(color = fit_class, alpha = 0.5),
        lower = list(combo = "count"),
        upper = list(combo = "facetdensity"))
## Warning: The dot-dot notation (`..scaled..`) was deprecated in ggplot2 3.4.0.
## ℹ Please use `after_stat(scaled)` instead.
## ℹ The deprecated feature was likely used in the GGally package.
##   Please report the issue at <https://github.com/ggobi/ggally/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

Figure / Output 3

Return to Exploring Relationships

Correlation heat map to figure out which predictors will be most useful in model

heatmaply_cor(cor(select_if(performancedata, is.numeric)))

Figure / Output 4

Return to Exploring Relationships

Scatterplots for the three predictors most highly correlated with body fat percentage, colored by fitness class and faceted by gender

ggplotly(ggplot(performancedata, aes(x=broad_jump_cm, y=body_fat_perc, color=fit_class)) + geom_point(size=1) + facet_wrap(~gender))
ggplotly(ggplot(performancedata, aes(x=grip_force, y=body_fat_perc, color=fit_class)) + geom_point(size=1) + facet_wrap(~gender))
ggplotly(ggplot(performancedata, aes(x=height_cm, y=body_fat_perc, color=fit_class)) + geom_point(size=1) + facet_wrap(~gender))

Output 5

Return to Data Types

Removing data error seen in sit_bend_cm predictor

hist(performancedata$sit_bend_cm)

performancedata = performancedata[which(performancedata$sit_bend_cm < 50),]

Variable-Thinning:

Figure / Output 6

Return to Variable-Thinning Return to Model Building

Checking VIF of full model to figure out which predictors can be removed before running backward step regression for reduced model

vif(lm(body_fat_perc ~ ., data =performancedata))
##                   GVIF Df GVIF^(1/(2*Df))
## age           2.240800  1        1.496930
## gender        5.356699  1        2.314454
## height_cm     3.543935  1        1.882534
## weight_kg     2.984608  1        1.727602
## diastolic     1.973425  1        1.404787
## systolic      2.227747  1        1.492564
## grip_force    4.161785  1        2.040045
## sit_bend_cm   2.007439  1        1.416841
## sit_ups_count 3.633076  1        1.906063
## broad_jump_cm 4.312311  1        2.076610
## fit_class     2.690454  3        1.179336
performancedataRed = select(performancedata, -c('gender','broad_jump_cm','grip_force','systolic'))
vif(lm(body_fat_perc ~ ., data =performancedataRed))
##                   GVIF Df GVIF^(1/(2*Df))
## age           1.641519  1        1.281218
## height_cm     2.740313  1        1.655389
## weight_kg     2.496352  1        1.579985
## diastolic     1.152320  1        1.073462
## sit_bend_cm   1.694215  1        1.301620
## sit_ups_count 2.559522  1        1.599851
## fit_class     2.504954  3        1.165378
fullFit = lm(body_fat_perc ~ .^2, data = performancedataRed)
reducedFit = step(fullFit)
## Start:  AIC=2949.03
## body_fat_perc ~ (age + height_cm + weight_kg + diastolic + sit_bend_cm + 
##     sit_ups_count + fit_class)^2
## 
##                             Df Sum of Sq   RSS    AIC
## - height_cm:fit_class        3     38.61 17586 2945.2
## - weight_kg:fit_class        3     61.98 17609 2946.6
## - age:height_cm              1      0.44 17548 2947.1
## - age:sit_bend_cm            1      0.83 17548 2947.1
## - diastolic:sit_bend_cm      1      1.21 17548 2947.1
## - sit_bend_cm:sit_ups_count  1     10.06 17557 2947.6
## - diastolic:sit_ups_count    1     12.33 17560 2947.7
## - height_cm:sit_bend_cm      1     16.12 17563 2947.9
## - weight_kg:diastolic        1     16.29 17564 2948.0
## - height_cm:diastolic        1     17.68 17565 2948.0
## - age:diastolic              1     19.70 17567 2948.2
## - age:weight_kg              1     21.27 17568 2948.2
## - height_cm:weight_kg        1     22.52 17570 2948.3
## <none>                                   17547 2949.0
## - height_cm:sit_ups_count    1     37.47 17585 2949.2
## - sit_bend_cm:fit_class      3    108.30 17656 2949.2
## - age:sit_ups_count          1     41.45 17589 2949.4
## - weight_kg:sit_bend_cm      1     42.08 17589 2949.4
## - weight_kg:sit_ups_count    1     48.45 17596 2949.8
## - age:fit_class              3    137.28 17684 2950.8
## - diastolic:fit_class        3    199.21 17746 2954.3
## - sit_ups_count:fit_class    3    358.66 17906 2963.2
## 
## Step:  AIC=2945.23
## body_fat_perc ~ age + height_cm + weight_kg + diastolic + sit_bend_cm + 
##     sit_ups_count + fit_class + age:height_cm + age:weight_kg + 
##     age:diastolic + age:sit_bend_cm + age:sit_ups_count + age:fit_class + 
##     height_cm:weight_kg + height_cm:diastolic + height_cm:sit_bend_cm + 
##     height_cm:sit_ups_count + weight_kg:diastolic + weight_kg:sit_bend_cm + 
##     weight_kg:sit_ups_count + weight_kg:fit_class + diastolic:sit_bend_cm + 
##     diastolic:sit_ups_count + diastolic:fit_class + sit_bend_cm:sit_ups_count + 
##     sit_bend_cm:fit_class + sit_ups_count:fit_class
## 
##                             Df Sum of Sq   RSS    AIC
## - weight_kg:fit_class        3     39.76 17626 2941.5
## - diastolic:sit_bend_cm      1      0.73 17587 2943.3
## - age:sit_bend_cm            1      1.80 17588 2943.3
## - age:height_cm              1      3.61 17589 2943.4
## - sit_bend_cm:sit_ups_count  1      7.06 17593 2943.6
## - diastolic:sit_ups_count    1     12.33 17598 2943.9
## - age:weight_kg              1     12.71 17598 2943.9
## - height_cm:weight_kg        1     14.61 17600 2944.1
## - height_cm:diastolic        1     21.50 17607 2944.4
## - weight_kg:diastolic        1     21.93 17608 2944.5
## - age:diastolic              1     22.01 17608 2944.5
## <none>                                   17586 2945.2
## - sit_bend_cm:fit_class      3    119.05 17705 2946.0
## - age:sit_ups_count          1     57.66 17644 2946.5
## - height_cm:sit_bend_cm      1     64.93 17651 2946.9
## - weight_kg:sit_ups_count    1     84.18 17670 2948.0
## - age:fit_class              3    155.50 17741 2948.0
## - weight_kg:sit_bend_cm      1     84.62 17670 2948.0
## - height_cm:sit_ups_count    1    101.05 17687 2948.9
## - diastolic:fit_class        3    196.08 17782 2950.3
## - sit_ups_count:fit_class    3    344.30 17930 2958.6
## 
## Step:  AIC=2941.48
## body_fat_perc ~ age + height_cm + weight_kg + diastolic + sit_bend_cm + 
##     sit_ups_count + fit_class + age:height_cm + age:weight_kg + 
##     age:diastolic + age:sit_bend_cm + age:sit_ups_count + age:fit_class + 
##     height_cm:weight_kg + height_cm:diastolic + height_cm:sit_bend_cm + 
##     height_cm:sit_ups_count + weight_kg:diastolic + weight_kg:sit_bend_cm + 
##     weight_kg:sit_ups_count + diastolic:sit_bend_cm + diastolic:sit_ups_count + 
##     diastolic:fit_class + sit_bend_cm:sit_ups_count + sit_bend_cm:fit_class + 
##     sit_ups_count:fit_class
## 
##                             Df Sum of Sq   RSS    AIC
## - diastolic:sit_bend_cm      1      0.74 17626 2939.5
## - age:sit_bend_cm            1      2.59 17628 2939.6
## - age:height_cm              1      4.31 17630 2939.7
## - sit_bend_cm:sit_ups_count  1      9.21 17635 2940.0
## - age:weight_kg              1     12.93 17638 2940.2
## - diastolic:sit_ups_count    1     14.19 17640 2940.3
## - height_cm:weight_kg        1     15.78 17641 2940.4
## - age:diastolic              1     21.03 17647 2940.7
## - weight_kg:diastolic        1     23.32 17649 2940.8
## - height_cm:diastolic        1     25.30 17651 2940.9
## <none>                                   17626 2941.5
## - sit_bend_cm:fit_class      3    113.14 17739 2941.9
## - age:sit_ups_count          1     51.97 17678 2942.4
## - height_cm:sit_bend_cm      1     63.84 17689 2943.1
## - age:fit_class              3    148.39 17774 2943.9
## - height_cm:sit_ups_count    1     99.97 17726 2945.1
## - weight_kg:sit_bend_cm      1    114.11 17740 2945.9
## - weight_kg:sit_ups_count    1    120.36 17746 2946.3
## - diastolic:fit_class        3    194.50 17820 2946.4
## - sit_ups_count:fit_class    3    445.87 18072 2960.4
## 
## Step:  AIC=2939.52
## body_fat_perc ~ age + height_cm + weight_kg + diastolic + sit_bend_cm + 
##     sit_ups_count + fit_class + age:height_cm + age:weight_kg + 
##     age:diastolic + age:sit_bend_cm + age:sit_ups_count + age:fit_class + 
##     height_cm:weight_kg + height_cm:diastolic + height_cm:sit_bend_cm + 
##     height_cm:sit_ups_count + weight_kg:diastolic + weight_kg:sit_bend_cm + 
##     weight_kg:sit_ups_count + diastolic:sit_ups_count + diastolic:fit_class + 
##     sit_bend_cm:sit_ups_count + sit_bend_cm:fit_class + sit_ups_count:fit_class
## 
##                             Df Sum of Sq   RSS    AIC
## - age:sit_bend_cm            1      2.16 17628 2937.7
## - age:height_cm              1      3.94 17630 2937.8
## - sit_bend_cm:sit_ups_count  1      9.01 17635 2938.0
## - age:weight_kg              1     13.38 17640 2938.3
## - diastolic:sit_ups_count    1     14.13 17640 2938.3
## - height_cm:weight_kg        1     15.47 17642 2938.4
## - age:diastolic              1     20.81 17647 2938.7
## - weight_kg:diastolic        1     22.90 17649 2938.8
## - height_cm:diastolic        1     24.56 17651 2938.9
## <none>                                   17626 2939.5
## - sit_bend_cm:fit_class      3    114.29 17741 2940.0
## - age:sit_ups_count          1     52.31 17679 2940.5
## - height_cm:sit_bend_cm      1     64.06 17690 2941.2
## - age:fit_class              3    147.65 17774 2941.9
## - height_cm:sit_ups_count    1     99.23 17726 2943.1
## - weight_kg:sit_ups_count    1    119.72 17746 2944.3
## - diastolic:fit_class        3    193.77 17820 2944.4
## - weight_kg:sit_bend_cm      1    123.46 17750 2944.5
## - sit_ups_count:fit_class    3    447.85 18074 2958.6
## 
## Step:  AIC=2937.65
## body_fat_perc ~ age + height_cm + weight_kg + diastolic + sit_bend_cm + 
##     sit_ups_count + fit_class + age:height_cm + age:weight_kg + 
##     age:diastolic + age:sit_ups_count + age:fit_class + height_cm:weight_kg + 
##     height_cm:diastolic + height_cm:sit_bend_cm + height_cm:sit_ups_count + 
##     weight_kg:diastolic + weight_kg:sit_bend_cm + weight_kg:sit_ups_count + 
##     diastolic:sit_ups_count + diastolic:fit_class + sit_bend_cm:sit_ups_count + 
##     sit_bend_cm:fit_class + sit_ups_count:fit_class
## 
##                             Df Sum of Sq   RSS    AIC
## - age:height_cm              1      3.21 17632 2935.8
## - sit_bend_cm:sit_ups_count  1      6.85 17635 2936.0
## - age:weight_kg              1     12.74 17641 2936.4
## - diastolic:sit_ups_count    1     13.78 17642 2936.4
## - height_cm:weight_kg        1     16.26 17645 2936.6
## - age:diastolic              1     20.65 17649 2936.8
## - weight_kg:diastolic        1     22.61 17651 2936.9
## - height_cm:diastolic        1     23.51 17652 2937.0
## <none>                                   17628 2937.7
## - sit_bend_cm:fit_class      3    112.13 17741 2938.0
## - age:sit_ups_count          1     51.74 17680 2938.6
## - height_cm:sit_bend_cm      1     61.90 17690 2939.2
## - age:fit_class              3    159.44 17788 2940.6
## - height_cm:sit_ups_count    1     98.77 17727 2941.2
## - weight_kg:sit_bend_cm      1    122.07 17751 2942.5
## - weight_kg:sit_ups_count    1    122.17 17751 2942.6
## - diastolic:fit_class        3    193.50 17822 2942.6
## - sit_ups_count:fit_class    3    478.24 18107 2958.4
## 
## Step:  AIC=2935.83
## body_fat_perc ~ age + height_cm + weight_kg + diastolic + sit_bend_cm + 
##     sit_ups_count + fit_class + age:weight_kg + age:diastolic + 
##     age:sit_ups_count + age:fit_class + height_cm:weight_kg + 
##     height_cm:diastolic + height_cm:sit_bend_cm + height_cm:sit_ups_count + 
##     weight_kg:diastolic + weight_kg:sit_bend_cm + weight_kg:sit_ups_count + 
##     diastolic:sit_ups_count + diastolic:fit_class + sit_bend_cm:sit_ups_count + 
##     sit_bend_cm:fit_class + sit_ups_count:fit_class
## 
##                             Df Sum of Sq   RSS    AIC
## - sit_bend_cm:sit_ups_count  1      6.48 17638 2934.2
## - diastolic:sit_ups_count    1     13.27 17645 2934.6
## - height_cm:weight_kg        1     15.39 17647 2934.7
## - age:diastolic              1     20.63 17652 2935.0
## - height_cm:diastolic        1     21.06 17653 2935.0
## - weight_kg:diastolic        1     21.21 17653 2935.0
## <none>                                   17632 2935.8
## - age:weight_kg              1     37.56 17669 2935.9
## - sit_bend_cm:fit_class      3    112.73 17744 2936.2
## - age:sit_ups_count          1     58.78 17690 2937.2
## - height_cm:sit_bend_cm      1     66.08 17698 2937.6
## - age:fit_class              3    166.30 17798 2939.2
## - height_cm:sit_ups_count    1    109.07 17741 2940.0
## - diastolic:fit_class        3    192.24 17824 2940.7
## - weight_kg:sit_ups_count    1    123.04 17755 2940.8
## - weight_kg:sit_bend_cm      1    125.08 17757 2940.9
## - sit_ups_count:fit_class    3    475.42 18107 2956.4
## 
## Step:  AIC=2934.2
## body_fat_perc ~ age + height_cm + weight_kg + diastolic + sit_bend_cm + 
##     sit_ups_count + fit_class + age:weight_kg + age:diastolic + 
##     age:sit_ups_count + age:fit_class + height_cm:weight_kg + 
##     height_cm:diastolic + height_cm:sit_bend_cm + height_cm:sit_ups_count + 
##     weight_kg:diastolic + weight_kg:sit_bend_cm + weight_kg:sit_ups_count + 
##     diastolic:sit_ups_count + diastolic:fit_class + sit_bend_cm:fit_class + 
##     sit_ups_count:fit_class
## 
##                           Df Sum of Sq   RSS    AIC
## - diastolic:sit_ups_count  1     13.25 17651 2932.9
## - height_cm:diastolic      1     20.62 17659 2933.4
## - weight_kg:diastolic      1     20.99 17659 2933.4
## - height_cm:weight_kg      1     21.10 17659 2933.4
## - age:diastolic            1     21.41 17660 2933.4
## <none>                                 17638 2934.2
## - sit_bend_cm:fit_class    3    109.06 17747 2934.3
## - age:weight_kg            1     39.10 17677 2934.4
## - age:sit_ups_count        1     56.29 17694 2935.4
## - age:fit_class            3    164.45 17803 2937.5
## - height_cm:sit_bend_cm    1    100.09 17738 2937.8
## - height_cm:sit_ups_count  1    102.59 17741 2938.0
## - diastolic:fit_class      3    189.12 17827 2938.8
## - weight_kg:sit_bend_cm    1    121.76 17760 2939.1
## - weight_kg:sit_ups_count  1    129.70 17768 2939.5
## - sit_ups_count:fit_class  3    559.60 18198 2959.4
## 
## Step:  AIC=2932.95
## body_fat_perc ~ age + height_cm + weight_kg + diastolic + sit_bend_cm + 
##     sit_ups_count + fit_class + age:weight_kg + age:diastolic + 
##     age:sit_ups_count + age:fit_class + height_cm:weight_kg + 
##     height_cm:diastolic + height_cm:sit_bend_cm + height_cm:sit_ups_count + 
##     weight_kg:diastolic + weight_kg:sit_bend_cm + weight_kg:sit_ups_count + 
##     diastolic:fit_class + sit_bend_cm:fit_class + sit_ups_count:fit_class
## 
##                           Df Sum of Sq   RSS    AIC
## - height_cm:diastolic      1     12.63 17664 2931.7
## - height_cm:weight_kg      1     19.40 17671 2932.0
## - weight_kg:diastolic      1     23.35 17675 2932.3
## <none>                                 17651 2932.9
## - sit_bend_cm:fit_class    3    110.40 17762 2933.2
## - age:weight_kg            1     40.94 17692 2933.3
## - age:diastolic            1     57.43 17709 2934.2
## - age:sit_ups_count        1     66.39 17718 2934.7
## - age:fit_class            3    170.06 17822 2936.5
## - height_cm:sit_bend_cm    1    105.97 17757 2936.9
## - height_cm:sit_ups_count  1    106.69 17758 2937.0
## - weight_kg:sit_ups_count  1    119.43 17771 2937.7
## - weight_kg:sit_bend_cm    1    131.32 17783 2938.3
## - diastolic:fit_class      3    221.28 17873 2939.4
## - sit_ups_count:fit_class  3    586.78 18238 2959.6
## 
## Step:  AIC=2931.66
## body_fat_perc ~ age + height_cm + weight_kg + diastolic + sit_bend_cm + 
##     sit_ups_count + fit_class + age:weight_kg + age:diastolic + 
##     age:sit_ups_count + age:fit_class + height_cm:weight_kg + 
##     height_cm:sit_bend_cm + height_cm:sit_ups_count + weight_kg:diastolic + 
##     weight_kg:sit_bend_cm + weight_kg:sit_ups_count + diastolic:fit_class + 
##     sit_bend_cm:fit_class + sit_ups_count:fit_class
## 
##                           Df Sum of Sq   RSS    AIC
## - weight_kg:diastolic      1     10.76 17675 2930.3
## - height_cm:weight_kg      1     14.32 17678 2930.5
## <none>                                 17664 2931.7
## - age:weight_kg            1     36.71 17701 2931.7
## - sit_bend_cm:fit_class    3    109.80 17774 2931.8
## - age:diastolic            1     45.38 17710 2932.2
## - age:sit_ups_count        1     68.07 17732 2933.5
## - age:fit_class            3    165.12 17829 2935.0
## - height_cm:sit_bend_cm    1    107.71 17772 2935.7
## - height_cm:sit_ups_count  1    114.40 17778 2936.1
## - weight_kg:sit_ups_count  1    125.06 17789 2936.7
## - weight_kg:sit_bend_cm    1    133.18 17797 2937.2
## - diastolic:fit_class      3    212.51 17877 2937.6
## - sit_ups_count:fit_class  3    592.96 18257 2958.7
## 
## Step:  AIC=2930.27
## body_fat_perc ~ age + height_cm + weight_kg + diastolic + sit_bend_cm + 
##     sit_ups_count + fit_class + age:weight_kg + age:diastolic + 
##     age:sit_ups_count + age:fit_class + height_cm:weight_kg + 
##     height_cm:sit_bend_cm + height_cm:sit_ups_count + weight_kg:sit_bend_cm + 
##     weight_kg:sit_ups_count + diastolic:fit_class + sit_bend_cm:fit_class + 
##     sit_ups_count:fit_class
## 
##                           Df Sum of Sq   RSS    AIC
## - height_cm:weight_kg      1     18.08 17693 2929.3
## <none>                                 17675 2930.3
## - sit_bend_cm:fit_class    3    110.74 17786 2930.5
## - age:weight_kg            1     50.96 17726 2931.2
## - age:diastolic            1     56.92 17732 2931.5
## - age:sit_ups_count        1     67.00 17742 2932.1
## - age:fit_class            3    162.82 17838 2933.4
## - height_cm:sit_bend_cm    1    108.64 17784 2934.4
## - height_cm:sit_ups_count  1    112.81 17788 2934.6
## - weight_kg:sit_ups_count  1    121.00 17796 2935.1
## - diastolic:fit_class      3    201.88 17877 2935.6
## - weight_kg:sit_bend_cm    1    135.25 17810 2935.9
## - sit_ups_count:fit_class  3    592.27 18267 2957.2
## 
## Step:  AIC=2929.29
## body_fat_perc ~ age + height_cm + weight_kg + diastolic + sit_bend_cm + 
##     sit_ups_count + fit_class + age:weight_kg + age:diastolic + 
##     age:sit_ups_count + age:fit_class + height_cm:sit_bend_cm + 
##     height_cm:sit_ups_count + weight_kg:sit_bend_cm + weight_kg:sit_ups_count + 
##     diastolic:fit_class + sit_bend_cm:fit_class + sit_ups_count:fit_class
## 
##                           Df Sum of Sq   RSS    AIC
## <none>                                 17693 2929.3
## - sit_bend_cm:fit_class    3    107.54 17800 2929.3
## - age:weight_kg            1     39.33 17732 2929.5
## - age:diastolic            1     53.94 17747 2930.3
## - age:sit_ups_count        1     80.70 17774 2931.8
## - age:fit_class            3    174.39 17867 2933.1
## - height_cm:sit_bend_cm    1    103.03 17796 2933.1
## - weight_kg:sit_ups_count  1    107.18 17800 2933.3
## - height_cm:sit_ups_count  1    120.08 17813 2934.1
## - diastolic:fit_class      3    201.57 17894 2934.6
## - weight_kg:sit_bend_cm    1    162.31 17855 2936.4
## - sit_ups_count:fit_class  3    609.71 18303 2957.1
summary(reducedFit)
## 
## Call:
## lm(formula = body_fat_perc ~ age + height_cm + weight_kg + diastolic + 
##     sit_bend_cm + sit_ups_count + fit_class + age:weight_kg + 
##     age:diastolic + age:sit_ups_count + age:fit_class + height_cm:sit_bend_cm + 
##     height_cm:sit_ups_count + weight_kg:sit_bend_cm + weight_kg:sit_ups_count + 
##     diastolic:fit_class + sit_bend_cm:fit_class + sit_ups_count:fit_class, 
##     data = performancedataRed)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -14.9514  -2.6864   0.0187   2.8180  12.8339 
## 
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)    
## (Intercept)              166.703530  12.886461  12.936  < 2e-16 ***
## age                       -0.235463   0.102166  -2.305 0.021393 *  
## height_cm                 -0.921805   0.074961 -12.297  < 2e-16 ***
## weight_kg                  0.461923   0.069487   6.648 4.96e-11 ***
## diastolic                  0.089326   0.043671   2.045 0.041081 *  
## sit_bend_cm               -0.849463   0.417404  -2.035 0.042112 *  
## sit_ups_count             -1.008046   0.279637  -3.605 0.000328 ***
## fit_classB               -11.884731   4.373865  -2.717 0.006701 ** 
## fit_classC               -18.460473   4.169851  -4.427 1.06e-05 ***
## fit_classD                -9.997982   3.928823  -2.545 0.011089 *  
## age:weight_kg              0.001697   0.001156   1.468 0.142328    
## age:diastolic             -0.001724   0.001003  -1.720 0.085805 .  
## age:sit_ups_count          0.002217   0.001054   2.103 0.035686 *  
## age:fit_classB             0.034059   0.038657   0.881 0.378512    
## age:fit_classC             0.111448   0.038884   2.866 0.004245 ** 
## age:fit_classD             0.091578   0.042681   2.146 0.032150 *  
## height_cm:sit_bend_cm      0.006659   0.002802   2.377 0.017665 *  
## height_cm:sit_ups_count    0.004475   0.001744   2.566 0.010442 *  
## weight_kg:sit_bend_cm     -0.005158   0.001729  -2.983 0.002926 ** 
## weight_kg:sit_ups_count   -0.003025   0.001248  -2.424 0.015530 *  
## diastolic:fit_classB       0.050473   0.037757   1.337 0.181604    
## diastolic:fit_classC       0.082208   0.037176   2.211 0.027247 *  
## diastolic:fit_classD      -0.029348   0.036878  -0.796 0.426335    
## sit_bend_cm:fit_classB     0.184962   0.089861   2.058 0.039827 *  
## sit_bend_cm:fit_classC     0.082722   0.078709   1.051 0.293527    
## sit_bend_cm:fit_classD     0.148015   0.072983   2.028 0.042827 *  
## sit_ups_count:fit_classB   0.078457   0.044418   1.766 0.077655 .  
## sit_ups_count:fit_classC   0.162484   0.041714   3.895 0.000105 ***
## sit_ups_count:fit_classD   0.207784   0.037969   5.473 5.65e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.271 on 970 degrees of freedom
## Multiple R-squared:  0.6897, Adjusted R-squared:  0.6807 
## F-statistic: 76.99 on 28 and 970 DF,  p-value: < 2.2e-16

Model-Building:

and

Model Specification:

Figure / Output 7

Return to Variable Thinning Return to Model Specification

Summary of full model with interactions

fullFit = lm(body_fat_perc ~ .^2, data = performancedataRed)
summary(fullFit)
## 
## Call:
## lm(formula = body_fat_perc ~ .^2, data = performancedataRed)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -14.8630  -2.6857   0.0226   2.8591  12.9568 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                1.495e+02  3.989e+01   3.749 0.000188 ***
## age                       -3.171e-01  3.639e-01  -0.872 0.383676    
## height_cm                 -7.038e-01  2.778e-01  -2.534 0.011444 *  
## weight_kg                 -7.441e-02  3.418e-01  -0.218 0.827697    
## diastolic                  3.720e-01  3.896e-01   0.955 0.339868    
## sit_bend_cm               -5.113e-01  5.874e-01  -0.870 0.384273    
## sit_ups_count             -9.695e-01  3.997e-01  -2.426 0.015470 *  
## fit_classB                -5.718e+00  1.264e+01  -0.453 0.650974    
## fit_classC                -1.774e+01  1.407e+01  -1.261 0.207623    
## fit_classD                 5.467e+00  1.688e+01   0.324 0.746123    
## age:height_cm              3.810e-04  2.465e-03   0.155 0.877186    
## age:weight_kg              1.827e-03  1.697e-03   1.076 0.282016    
## age:diastolic             -1.347e-03  1.301e-03  -1.036 0.300432    
## age:sit_bend_cm            4.245e-04  1.992e-03   0.213 0.831286    
## age:sit_ups_count          1.714e-03  1.140e-03   1.503 0.133235    
## age:fit_classB             2.454e-02  4.112e-02   0.597 0.550792    
## age:fit_classC             1.126e-01  4.433e-02   2.540 0.011232 *  
## age:fit_classD             8.010e-02  5.427e-02   1.476 0.140258    
## height_cm:weight_kg        1.869e-03  1.687e-03   1.108 0.268254    
## height_cm:diastolic       -2.622e-03  2.672e-03  -0.982 0.326578    
## height_cm:sit_bend_cm      3.570e-03  3.810e-03   0.937 0.348947    
## height_cm:sit_ups_count    3.551e-03  2.485e-03   1.429 0.153400    
## height_cm:fit_classB      -6.043e-02  8.892e-02  -0.680 0.496931    
## height_cm:fit_classC      -1.469e-02  9.678e-02  -0.152 0.879397    
## height_cm:fit_classD      -1.285e-01  1.125e-01  -1.143 0.253359    
## weight_kg:diastolic        1.557e-03  1.653e-03   0.942 0.346397    
## weight_kg:sit_bend_cm     -3.405e-03  2.249e-03  -1.514 0.130319    
## weight_kg:sit_ups_count   -2.767e-03  1.703e-03  -1.625 0.104561    
## weight_kg:fit_classB       7.920e-02  6.512e-02   1.216 0.224262    
## weight_kg:fit_classC      -6.996e-03  7.235e-02  -0.097 0.922981    
## weight_kg:fit_classD       8.288e-02  7.741e-02   1.071 0.284611    
## diastolic:sit_bend_cm     -5.255e-04  2.048e-03  -0.257 0.797550    
## diastolic:sit_ups_count    1.236e-03  1.508e-03   0.820 0.412560    
## diastolic:fit_classB       4.246e-02  4.095e-02   1.037 0.300031    
## diastolic:fit_classC       9.240e-02  4.306e-02   2.146 0.032120 *  
## diastolic:fit_classD      -2.912e-02  5.726e-02  -0.508 0.611230    
## sit_bend_cm:sit_ups_count  1.530e-03  2.067e-03   0.740 0.459321    
## sit_bend_cm:fit_classB     2.079e-01  9.326e-02   2.230 0.025991 *  
## sit_bend_cm:fit_classC     1.200e-01  8.390e-02   1.431 0.152831    
## sit_bend_cm:fit_classD     1.814e-01  8.831e-02   2.054 0.040227 *  
## sit_ups_count:fit_classB   6.226e-02  5.309e-02   1.173 0.241180    
## sit_ups_count:fit_classC   1.802e-01  5.152e-02   3.497 0.000492 ***
## sit_ups_count:fit_classD   2.181e-01  5.687e-02   3.835 0.000134 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.284 on 956 degrees of freedom
## Multiple R-squared:  0.6922, Adjusted R-squared:  0.6787 
## F-statistic:  51.2 on 42 and 956 DF,  p-value: < 2.2e-16
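As a quick check on the size of the full model: the formula `body_fat_perc ~ .^2` expands to every main effect plus every pairwise interaction, and a four-level factor such as fit_class contributes three dummy columns to each term it appears in. With six numeric predictors and fit_class this gives 9 main-effect coefficients, 15 numeric-by-numeric interactions and 18 numeric-by-fit_class interactions, for the 42 model degrees of freedom reported above. The sketch below illustrates the same expansion on a small simulated data frame; `toy`, `a`, `b` and `g` are hypothetical names, not part of the analysis.

```r
# Hypothetical toy data: two numeric predictors and one four-level factor
set.seed(1)
toy <- data.frame(
  y = rnorm(40),
  a = rnorm(40),
  b = rnorm(40),
  g = factor(rep(c("A", "B", "C", "D"), times = 10))
)

# .^2 means "all main effects and all two-way interactions"
toyFit <- lm(y ~ .^2, data = toy)

# Terms in the expanded formula: a, b, g, a:b, a:g, b:g
term_labels <- attr(terms(toyFit), "term.labels")
print(term_labels)

# Coefficients: 1 intercept + 2 numeric + 3 factor dummies
#               + 1 (a:b) + 3 (a:g) + 3 (b:g) = 13
n_coef <- length(coef(toyFit))
print(n_coef)
```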

Figure / Output 8


Reduced model, created by removing high-VIF predictors from the full model and then running backward stepwise regression

vif(reducedFit)
## there are higher-order terms (interactions) in this model
## consider setting type = 'predictor'; see ?vif
##                                 GVIF Df GVIF^(1/(2*Df))
## age                     1.042929e+02  1       10.212387
## height_cm               2.192168e+01  1        4.682059
## weight_kg               3.919055e+01  1        6.260236
## diastolic               1.242130e+01  1        3.524386
## sit_bend_cm             6.891709e+02  1       26.252064
## sit_ups_count           9.249102e+02  1       30.412337
## fit_class               1.265824e+06  3       10.400692
## age:weight_kg           6.709045e+01  1        8.190876
## age:diastolic           8.271813e+01  1        9.094951
## age:sit_ups_count       1.769913e+01  1        4.207034
## age:fit_class           5.800390e+03  3        4.238799
## height_cm:sit_bend_cm   8.646334e+02  1       29.404648
## height_cm:sit_ups_count 1.136245e+03  1       33.708235
## weight_kg:sit_bend_cm   5.357833e+01  1        7.319722
## weight_kg:sit_ups_count 1.204426e+02  1       10.974635
## diastolic:fit_class     2.317788e+05  3        7.837525
## sit_bend_cm:fit_class   8.627231e+02  3        3.085403
## sit_ups_count:fit_class 6.994892e+03  3        4.373175
summary(reducedFit)
## 
## Call:
## lm(formula = body_fat_perc ~ age + height_cm + weight_kg + diastolic + 
##     sit_bend_cm + sit_ups_count + fit_class + age:weight_kg + 
##     age:diastolic + age:sit_ups_count + age:fit_class + height_cm:sit_bend_cm + 
##     height_cm:sit_ups_count + weight_kg:sit_bend_cm + weight_kg:sit_ups_count + 
##     diastolic:fit_class + sit_bend_cm:fit_class + sit_ups_count:fit_class, 
##     data = performancedataRed)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -14.9514  -2.6864   0.0187   2.8180  12.8339 
## 
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)    
## (Intercept)              166.703530  12.886461  12.936  < 2e-16 ***
## age                       -0.235463   0.102166  -2.305 0.021393 *  
## height_cm                 -0.921805   0.074961 -12.297  < 2e-16 ***
## weight_kg                  0.461923   0.069487   6.648 4.96e-11 ***
## diastolic                  0.089326   0.043671   2.045 0.041081 *  
## sit_bend_cm               -0.849463   0.417404  -2.035 0.042112 *  
## sit_ups_count             -1.008046   0.279637  -3.605 0.000328 ***
## fit_classB               -11.884731   4.373865  -2.717 0.006701 ** 
## fit_classC               -18.460473   4.169851  -4.427 1.06e-05 ***
## fit_classD                -9.997982   3.928823  -2.545 0.011089 *  
## age:weight_kg              0.001697   0.001156   1.468 0.142328    
## age:diastolic             -0.001724   0.001003  -1.720 0.085805 .  
## age:sit_ups_count          0.002217   0.001054   2.103 0.035686 *  
## age:fit_classB             0.034059   0.038657   0.881 0.378512    
## age:fit_classC             0.111448   0.038884   2.866 0.004245 ** 
## age:fit_classD             0.091578   0.042681   2.146 0.032150 *  
## height_cm:sit_bend_cm      0.006659   0.002802   2.377 0.017665 *  
## height_cm:sit_ups_count    0.004475   0.001744   2.566 0.010442 *  
## weight_kg:sit_bend_cm     -0.005158   0.001729  -2.983 0.002926 ** 
## weight_kg:sit_ups_count   -0.003025   0.001248  -2.424 0.015530 *  
## diastolic:fit_classB       0.050473   0.037757   1.337 0.181604    
## diastolic:fit_classC       0.082208   0.037176   2.211 0.027247 *  
## diastolic:fit_classD      -0.029348   0.036878  -0.796 0.426335    
## sit_bend_cm:fit_classB     0.184962   0.089861   2.058 0.039827 *  
## sit_bend_cm:fit_classC     0.082722   0.078709   1.051 0.293527    
## sit_bend_cm:fit_classD     0.148015   0.072983   2.028 0.042827 *  
## sit_ups_count:fit_classB   0.078457   0.044418   1.766 0.077655 .  
## sit_ups_count:fit_classC   0.162484   0.041714   3.895 0.000105 ***
## sit_ups_count:fit_classD   0.207784   0.037969   5.473 5.65e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.271 on 970 degrees of freedom
## Multiple R-squared:  0.6897, Adjusted R-squared:  0.6807 
## F-statistic: 76.99 on 28 and 970 DF,  p-value: < 2.2e-16
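The last column of the VIF table above, GVIF^(1/(2*Df)), rescales the generalized VIF so that terms with different degrees of freedom (for example the three-dummy fit_class terms) can be compared on the same footing as single-coefficient terms. A minimal sketch that recomputes that column from the printed GVIF and Df values for three of the rows; the values are copied from the output at their printed precision, so the results agree with the table only up to rounding.

```r
# GVIF and Df copied from the vif(reducedFit) output for three terms
gvif <- c(age = 104.2929,
          fit_class = 1265824,
          "diastolic:fit_class" = 231778.8)
df   <- c(age = 1, fit_class = 3, "diastolic:fit_class" = 3)

# Scaled GVIF: for Df = 1 this is simply sqrt(VIF)
scaled_gvif <- gvif^(1 / (2 * df))
print(round(scaled_gvif, 3))
```

For one-degree-of-freedom terms the scaled value is the square root of the ordinary VIF, so it should be judged against the square root of whatever VIF cut-off is in use.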

Figure / Output 9


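ANOVA (nested F-test) comparing the reduced model against the full model. A large p-value indicates that the terms dropped from the full model do not significantly improve the fit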

anova(reducedFit, fullFit)
## Analysis of Variance Table
## 
## Model 1: body_fat_perc ~ age + height_cm + weight_kg + diastolic + sit_bend_cm + 
##     sit_ups_count + fit_class + age:weight_kg + age:diastolic + 
##     age:sit_ups_count + age:fit_class + height_cm:sit_bend_cm + 
##     height_cm:sit_ups_count + weight_kg:sit_bend_cm + weight_kg:sit_ups_count + 
##     diastolic:fit_class + sit_bend_cm:fit_class + sit_ups_count:fit_class
## Model 2: body_fat_perc ~ (age + height_cm + weight_kg + diastolic + sit_bend_cm + 
##     sit_ups_count + fit_class)^2
##   Res.Df   RSS Df Sum of Sq     F Pr(>F)
## 1    970 17693                          
## 2    956 17547 14     145.7 0.567 0.8917
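The F statistic in the ANOVA table above can be reproduced by hand from the printed residual sums of squares and degrees of freedom: it is the drop in RSS per dropped term, divided by the full model's residual mean square. A short sketch using the rounded values from the table (variable names are illustrative only):

```r
# Values copied from the anova(reducedFit, fullFit) table
ss_diff  <- 145.7   # RSS(reduced) - RSS(full), "Sum of Sq" column
df_diff  <- 14      # number of terms dropped (970 - 956)
rss_full <- 17547   # residual sum of squares of the full model
df_full  <- 956     # residual degrees of freedom of the full model

# Nested-model F statistic
f_stat <- (ss_diff / df_diff) / (rss_full / df_full)

# Upper-tail probability under F(df_diff, df_full)
p_value <- pf(f_stat, df_diff, df_full, lower.tail = FALSE)

print(round(f_stat, 3))
print(p_value)
```

Because the inputs are rounded, the p-value will match the table only approximately.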

Assumption-Checking and Response-Transform:

Figure / Output 10

Assumptions check using the CheckAll function. High VIFs are expected for the interaction terms in the model

source("RegressionOverallAssumptions.R")
## [1] "argument is:  lmfit"
## [1] "needs the 'car', 'nortest', 'gridExtra', and 'ggplot2' packages"
CheckAll(reducedFit)
## For the model fit by
## body_fat_perc ~ age + height_cm + weight_kg + diastolic + sit_bend_cm + sit_ups_count + fit_class + age:weight_kg + age:diastolic + age:sit_ups_count + age:fit_class + height_cm:sit_bend_cm + height_cm:sit_ups_count + weight_kg:sit_bend_cm + weight_kg:sit_ups_count + diastolic:fit_class + sit_bend_cm:fit_class + sit_ups_count:fit_class
##  
##     the adjusted R^2 is 68.07%  
##    and the overall model is significant, with a Pvalue of 0    
## The P-value for the test about error variance is about 0.0869 so
##      Residuals have approximately constant variance.
##  
##  The P-value for the Anderson-Darling test about normality of errors is 0.5279 so
##      Residuals are reasonably normal.
##  
## 
## there are higher-order terms (interactions) in this model
## consider setting type = 'predictor'; see ?vif
## there are higher-order terms (interactions) in this model
## consider setting type = 'predictor'; see ?vif
## The following variables: age height_cm weight_kg diastolic sit_bend_cm sit_ups_count fit_class age:weight_kg age:diastolic age:sit_ups_count age:fit_class height_cm:sit_bend_cm height_cm:sit_ups_count weight_kg:sit_bend_cm weight_kg:sit_ups_count diastolic:fit_class sit_bend_cm:fit_class sit_ups_count:fit_class   have large VIF values 104.3 21.9 39.2 12.4 689.2 924.9 1265824 67.1 82.7 17.7 5800.4 864.6 1136.2 53.6 120.4 231778.8 862.7 6994.9     
##   
## Potential outliers are: 
## none   
##    and potential influential points are 
## none   
##    and LIKELY influential points are 
## none   
##   
## The following point(s) have been identified as high-leverage: 
## 16 18 19 32 33 39 95 117 118 129 146 197 207 225 255 256 257 264 272 278 311 328 331 355 366 368 376 386 395 424 442 443 448 452 458 467 490 514 521 539 607 620 624 632 643 644 645 659 661 663 735 741 755 761 808 819 821 827 877 880 917 930 945 956 
##   
## In power transformation, the range of values for the power is between 0.72 and 1

Final Model:

Figure / Output 11


Final model used:

summary(reducedFit)
## 
## Call:
## lm(formula = body_fat_perc ~ age + height_cm + weight_kg + diastolic + 
##     sit_bend_cm + sit_ups_count + fit_class + age:weight_kg + 
##     age:diastolic + age:sit_ups_count + age:fit_class + height_cm:sit_bend_cm + 
##     height_cm:sit_ups_count + weight_kg:sit_bend_cm + weight_kg:sit_ups_count + 
##     diastolic:fit_class + sit_bend_cm:fit_class + sit_ups_count:fit_class, 
##     data = performancedataRed)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -14.9514  -2.6864   0.0187   2.8180  12.8339 
## 
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)    
## (Intercept)              166.703530  12.886461  12.936  < 2e-16 ***
## age                       -0.235463   0.102166  -2.305 0.021393 *  
## height_cm                 -0.921805   0.074961 -12.297  < 2e-16 ***
## weight_kg                  0.461923   0.069487   6.648 4.96e-11 ***
## diastolic                  0.089326   0.043671   2.045 0.041081 *  
## sit_bend_cm               -0.849463   0.417404  -2.035 0.042112 *  
## sit_ups_count             -1.008046   0.279637  -3.605 0.000328 ***
## fit_classB               -11.884731   4.373865  -2.717 0.006701 ** 
## fit_classC               -18.460473   4.169851  -4.427 1.06e-05 ***
## fit_classD                -9.997982   3.928823  -2.545 0.011089 *  
## age:weight_kg              0.001697   0.001156   1.468 0.142328    
## age:diastolic             -0.001724   0.001003  -1.720 0.085805 .  
## age:sit_ups_count          0.002217   0.001054   2.103 0.035686 *  
## age:fit_classB             0.034059   0.038657   0.881 0.378512    
## age:fit_classC             0.111448   0.038884   2.866 0.004245 ** 
## age:fit_classD             0.091578   0.042681   2.146 0.032150 *  
## height_cm:sit_bend_cm      0.006659   0.002802   2.377 0.017665 *  
## height_cm:sit_ups_count    0.004475   0.001744   2.566 0.010442 *  
## weight_kg:sit_bend_cm     -0.005158   0.001729  -2.983 0.002926 ** 
## weight_kg:sit_ups_count   -0.003025   0.001248  -2.424 0.015530 *  
## diastolic:fit_classB       0.050473   0.037757   1.337 0.181604    
## diastolic:fit_classC       0.082208   0.037176   2.211 0.027247 *  
## diastolic:fit_classD      -0.029348   0.036878  -0.796 0.426335    
## sit_bend_cm:fit_classB     0.184962   0.089861   2.058 0.039827 *  
## sit_bend_cm:fit_classC     0.082722   0.078709   1.051 0.293527    
## sit_bend_cm:fit_classD     0.148015   0.072983   2.028 0.042827 *  
## sit_ups_count:fit_classB   0.078457   0.044418   1.766 0.077655 .  
## sit_ups_count:fit_classC   0.162484   0.041714   3.895 0.000105 ***
## sit_ups_count:fit_classD   0.207784   0.037969   5.473 5.65e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.271 on 970 degrees of freedom
## Multiple R-squared:  0.6897, Adjusted R-squared:  0.6807 
## F-statistic: 76.99 on 28 and 970 DF,  p-value: < 2.2e-16

Figure / Output 12


Model validation with \(R^2_{LOO}\) and \(PRESS\) statistics

train(body_fat_perc ~ age + height_cm + weight_kg + diastolic + 
    sit_bend_cm + sit_ups_count + fit_class + age:weight_kg + 
    age:diastolic + age:sit_ups_count + age:fit_class + height_cm:sit_bend_cm + 
    height_cm:sit_ups_count + weight_kg:sit_bend_cm + weight_kg:sit_ups_count + 
    diastolic:fit_class + sit_bend_cm:fit_class + sit_ups_count:fit_class, 
    data = performancedata, 
    method = "lm",
    trControl = trainControl(method = "LOOCV"))
## Linear Regression 
## 
## 999 samples
##   7 predictor
## 
## No pre-processing
## Resampling: Leave-One-Out Cross-Validation 
## Summary of sample sizes: 998, 998, 998, 998, 998, 998, ... 
## Resampling results:
## 
##   RMSE      Rsquared   MAE     
##   4.341163  0.6699735  3.409795
## 
## Tuning parameter 'intercept' was held constant at a value of TRUE
PRESS = 999*(4.341163^2)
PRESS
## [1] 18826.85
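The PRESS value above is recovered from the leave-one-out RMSE as n times RMSE squared. For an ordinary least-squares fit, PRESS can also be computed directly without refitting, because each leave-one-out residual equals the ordinary residual divided by one minus its hat value. The sketch below shows this on the built-in mtcars data with a hypothetical model, and uses 1 - PRESS/SST as one common definition of \(R^2_{LOO}\); the Rsquared reported by caret may be computed differently.

```r
# Illustrative model on built-in data (not the model from this report)
pressFit <- lm(mpg ~ wt + hp, data = mtcars)

# Leave-one-out residuals via the hat-value shortcut
loo_resid <- residuals(pressFit) / (1 - hatvalues(pressFit))

# PRESS and the leave-one-out RMSE implied by it
press    <- sum(loo_resid^2)
rmse_loo <- sqrt(press / nobs(pressFit))

# One common definition of the leave-one-out R^2
sst    <- sum((mtcars$mpg - mean(mtcars$mpg))^2)
r2_loo <- 1 - press / sst

print(c(PRESS = press, RMSE_LOO = rmse_loo, R2_LOO = r2_loo))
```

The shortcut applies to linear models fit by least squares; for other model types the observations have to be held out and refit explicitly, as caret does.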