Summary
In this report we analyze the performancedata data set, which contains
several health and fitness metrics along with body fat percentage. The
goal is to build a model that accurately predicts body fat percentage
from more easily measured predictor variables. We first fit the full
first-order model and used VIF to remove collinear predictors, then
added all two-way interaction terms and used backward stepwise
regression to reduce the model. We then performed a model utility test
and compared the nested (reduced) model to the full model to see whether
the extra terms were needed. Finally, we validated the reduced model
using the caret library in R to determine how well it performs when
making honest predictions.
Methods and Analysis
Data Types:
The performancedata data set has \(1000\) observations of \(12\)
variables: age, gender, height (cm), weight (kg), diastolic and systolic
pressure (mmHg), grip force (kg), sit bend (cm), sit ups (count), broad
jump (cm), fitness class (A, B, C, D from best to worst), and body fat
percentage. All are numeric except for fitness class and gender, which
are converted to factors below. There is one noticeable data error in
sit_bend_cm, visible in the histogram below; that observation is removed
before modeling.
Explore Relationships:
As we are trying to predict body_fat_perc, a useful exploratory step is
to look at a heatmap of the correlations between each of the predictors
and the response. From this, it is clear that broad_jump_cm, grip_force,
and height_cm all correlate strongly with body_fat_perc, so these are
good potential predictors. Many of the features in the bottom left of
the heatmap are also highly correlated with one another, as are
diastolic and systolic, indicating potential collinearity between these
predictors. We will keep this in mind as we build initial models and
thin variables. For the three predictors most highly correlated with
body_fat_perc, the scatterplots below show a roughly linear relationship
for broad_jump_cm but a less clear relationship for grip_force and
height_cm. Coloring each of these by fit_class reveals a clearer
pattern, with the better fitness classes being markedly lower in body
fat percentage, indicating that fit_class may also be a useful
predictor. In the pairs plots below (colored by gender first and then by
fit_class), the fitness measures sit_bend_cm and broad_jump_cm have
roughly linear relationships with body_fat_perc. There are also clear
patterns across all predictors for both gender and fit_class.
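The ranking that the heatmap conveys visually can also be sketched numerically. The example below is only an illustration on R's built-in mtcars data, with mpg standing in for the response; the names dat, response, and ranked are placeholders and are not part of this analysis.

```r
# Stand-in data: built-in mtcars, with mpg playing the role of the response.
# This is NOT the report's data set.
dat <- mtcars
response <- "mpg"

# Keep numeric columns only, as the heatmap code in this report does.
num <- dat[sapply(dat, is.numeric)]

# Correlation of every predictor with the response.
cors <- cor(num)[, response]
cors <- cors[names(cors) != response]

# Rank by absolute correlation, strongest first.
ranked <- cors[order(abs(cors), decreasing = TRUE)]
print(round(ranked, 3))
```

Sorting on the absolute value keeps strong negative correlations at the top of the list, which matters when the response tends to fall as a fitness measure rises.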
Variable-Thinning:
With \(11\) potential predictor variables, the first step in finding an
accurate model was to fit the full first-order model and check the VIF
of each predictor to see which could be removed. This is done below:
gender, broad_jump_cm, grip_force, and systolic were removed and the
remaining variables stored in performancedataRed, since these predictors
were collinear with others and a useful model could be built without
them.
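For intuition about what the VIF measures, it can be computed by hand for numeric predictors as \(1/(1-R_j^2)\), where \(R_j^2\) comes from regressing predictor \(j\) on all the others. The sketch below uses the built-in mtcars data as a stand-in; the column choice and the name vif_by_hand are illustrative assumptions. For factor predictors such as fit_class, the vif function reports a generalized VIF instead, which this sketch does not cover.

```r
# Stand-in data: a few numeric columns of built-in mtcars (not the report's data).
predictors <- mtcars[, c("disp", "hp", "wt", "drat")]

# VIF_j = 1 / (1 - R_j^2), where R_j^2 is from regressing
# predictor j on all of the other predictors.
vif_by_hand <- sapply(names(predictors), function(v) {
  others <- setdiff(names(predictors), v)
  fit <- lm(reformulate(others, response = v), data = predictors)
  1 / (1 - summary(fit)$r.squared)
})

print(round(vif_by_hand, 2))
```

A value near 1 means the predictor is nearly uncorrelated with the rest; larger values mean more of its variation is already explained by the other predictors.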
Model-Building:
Once the predictors were reduced, backward stepwise elimination was applied to the full model with all two-way interactions, minimizing AIC to produce a useful reduced model. The reduced model is \[body\_fat\_perc \sim age + height\_cm + weight\_kg + diastolic + sit\_bend\_cm + sit\_ups\_count + fit\_class + age:weight\_kg + age:diastolic + age:sit\_ups\_count + age:fit\_class + height\_cm:sit\_bend\_cm + height\_cm:sit\_ups\_count + weight\_kg:sit\_bend\_cm + weight\_kg:sit\_ups\_count + diastolic:fit\_class + sit\_bend\_cm:fit\_class + sit\_ups\_count:fit\_class\]
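A minimal sketch of the same procedure on a small stand-in data set may make the mechanics clearer. It uses a handful of columns from the built-in mtcars data (an assumption for illustration only), fits all main effects and two-way interactions with the .^2 shorthand, and lets step drop terms while AIC improves.

```r
# Stand-in data: a few mtcars columns (not the report's data).
dat <- mtcars[, c("mpg", "wt", "hp", "disp", "qsec")]

# Full model: main effects plus every two-way interaction.
full_fit <- lm(mpg ~ .^2, data = dat)

# Backward elimination on AIC; trace = 0 suppresses the step-by-step log.
reduced_fit <- step(full_fit, direction = "backward", trace = 0)

print(formula(reduced_fit))
cat("AIC full:", AIC(full_fit), " AIC reduced:", AIC(reduced_fit), "\n")
```

Because step respects marginality, an interaction is considered for removal before the main effects it contains, which is why the main effects all survive in the reduced model above.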
Model Specification:
From the reduced model summary below, \(R^2_{adj}=0.6807\), \(F=76.99\), and \(p \approx 0\), indicating that this is a useful model. Comparing it against the full model with a nested-model ANOVA \(F\) test gives \(F=0.567\) and \(p=0.8917\), so there is not significant evidence that the additional terms in the full model are needed. Since the reduced model is useful and the full model offers no significant improvement over it, the reduced model is a sensible choice.
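The nested-model \(F\) statistic can be approximately reproduced by hand from numbers printed elsewhere in this report: the residual sums of squares at the first and last steps of the stepwise log, and the residual degrees of freedom in the two model summaries. The sketch below assumes those printed (rounded) values, so the result should land close to, but not exactly on, the reported \(F\) and \(p\).

```r
# RSS values as printed (rounded) in the step() log:
# the full model at the start, the reduced model at the final step.
rss_full    <- 17547
rss_reduced <- 17693

# Residual degrees of freedom from the two model summaries.
df_full    <- 956
df_reduced <- 970

# Partial F statistic for the terms dropped from the full model.
df_diff <- df_reduced - df_full
f_stat  <- ((rss_reduced - rss_full) / df_diff) / (rss_full / df_full)
p_val   <- pf(f_stat, df_diff, df_full, lower.tail = FALSE)

cat("F =", round(f_stat, 3), " p =", round(p_val, 4), "\n")
```

A large \(p\)-value here means the dropped terms reduce the residual sum of squares by no more than would be expected by chance.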
Assumption-Checking:
To perform inference using our model, its assumptions must first be checked. Linearity in the mean function for each \(x_i\), reasonably constant variance, and approximate normality of the residuals were checked below, along with potential outliers and influential points (none found). The test for constant variance gave \(p=0.0869\), so at the \(0.05\) level there is no significant evidence against constant variance. The Anderson-Darling test for normality gave \(p=0.5279\), so there is no significant evidence against normally distributed errors.
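As a rough illustration of this kind of checking, the sketch below runs comparable diagnostics in base R on a stand-in model. It is not the procedure used in this report: Shapiro-Wilk is swapped in for Anderson-Darling because it ships with base R, and the constant-variance check is a hand-rolled studentized Breusch-Pagan test, since the report does not name the variance test it used. The data, model, and the 4/n Cook's distance cutoff are all assumptions for illustration.

```r
# Stand-in model on built-in data (not the report's model).
fit <- lm(mpg ~ wt + hp, data = mtcars)
res <- residuals(fit)
n   <- nrow(mtcars)

# Normality of residuals: Shapiro-Wilk, swapped in for Anderson-Darling.
norm_p <- shapiro.test(res)$p.value

# Constant variance: studentized Breusch-Pagan check done by hand.
# Regress squared residuals on the predictors; n * R^2 is compared
# to a chi-square with one degree of freedom per predictor.
aux  <- lm(I(res^2) ~ wt + hp, data = mtcars)
bp   <- n * summary(aux)$r.squared
bp_p <- pchisq(bp, df = 2, lower.tail = FALSE)

# Influence: flag points above a common rule-of-thumb Cook's distance cutoff.
influential <- which(cooks.distance(fit) > 4 / n)

cat("normality p =", round(norm_p, 4),
    " constant-variance p =", round(bp_p, 4),
    " flagged points =", length(influential), "\n")
```

In both tests the null hypothesis is that the assumption holds, so a \(p\)-value above the chosen level is read as no evidence of a violation rather than as proof that the assumption is true.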
Response-Transform:
Given that the assumptions of the multiple linear regression model were all satisfied, a response transformation is not necessary in this case.
Final Model:
The utility of the reduced model was tested above (\(F=76.99\),
\(p \approx 0\)), and the model was validated below using the train
function from the caret library. From this, \(R^2_{LOO}=0.6699735\) and
\(PRESS=18826.95\), so when making honest predictions we are still able
to explain approximately \(67\%\) of the variability in body fat
percentage with this fitted regression equation.
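For a linear model, leave-one-out quantities can be obtained without refitting, using the identity that the deleted residual equals \(e_i/(1-h_{ii})\). The sketch below shows this on a stand-in model, taking \(R^2_{LOO} = 1 - PRESS/SST\) as the definition; that definition is an assumption here, and caret may compute its reported R-squared differently (for example as a squared correlation between predictions and observations), so small differences are possible.

```r
# Stand-in model on built-in data (not the report's model).
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Leave-one-out (deleted) residuals without refitting: e_i / (1 - h_ii).
loo_res <- residuals(fit) / (1 - hatvalues(fit))

# PRESS and the leave-one-out R^2 defined as 1 - PRESS / SST.
press  <- sum(loo_res^2)
sst    <- sum((mtcars$mpg - mean(mtcars$mpg))^2)
r2_loo <- 1 - press / sst

cat("PRESS =", round(press, 2), " R2_LOO =", round(r2_loo, 4),
    " in-sample R2 =", round(summary(fit)$r.squared, 4), "\n")
```

PRESS is never smaller than the ordinary residual sum of squares, so \(R^2_{LOO}\) defined this way is never larger than the in-sample \(R^2\); a small gap between the two suggests the model is not badly overfit.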
Graphs and Output
Read in performance data:
Return to Data Types
library(readxl)  # provides read_excel()
library(DT)      # provides datatable()
performancedata <- read_excel("bodyPerformance_fit.xlsx")
datatable(performancedata, rownames = FALSE, filter = "top", options = list(pageLength = 5, scrollX = TRUE))
str(performancedata)
## tibble [1,000 × 12] (S3: tbl_df/tbl/data.frame)
## $ age : num [1:1000] 36 23 26 23 36 24 21 33 21 39 ...
## $ gender : chr [1:1000] "M" "M" "M" "F" ...
## $ height_cm : num [1:1000] 173 172 178 172 172 ...
## $ weight_kg : num [1:1000] 73.4 64.7 83.1 64.3 71.8 ...
## $ diastolic : num [1:1000] 77 75 82 76 76 72 86 80 70 79 ...
## $ systolic : num [1:1000] 121 130 120 120 130 113 131 121 120 131 ...
## $ grip_force : num [1:1000] 51.4 54.1 50.6 29.6 45 23.8 41.9 45.1 19.7 30.4 ...
## $ sit_bend_cm : num [1:1000] 23 7.8 22.1 15.2 21.4 ...
## $ sit_ups_count: num [1:1000] 54 43 57 46 42 40 55 31 43 31 ...
## $ broad_jump_cm: num [1:1000] 232 212 255 177 221 171 228 213 164 160 ...
## $ fit_class : chr [1:1000] "A" "C" "A" "B" ...
## $ body_fat_perc: num [1:1000] 15.1 15.2 16.8 29.5 16.4 22 29.8 27.7 28.9 25.5 ...
Data Types:
Output 1
Return to Data Types
Factoring qualitative variables gender and fit_class for analysis, and checking structure again
#factor qualitative variables gender and fit_class
performancedata = mutate(performancedata, gender = factor(gender, levels = c("M","F")), fit_class = factor(fit_class, levels = c("A","B","C","D")))
str(performancedata)
## tibble [1,000 × 12] (S3: tbl_df/tbl/data.frame)
## $ age : num [1:1000] 36 23 26 23 36 24 21 33 21 39 ...
## $ gender : Factor w/ 2 levels "M","F": 1 1 1 2 1 2 1 1 2 2 ...
## $ height_cm : num [1:1000] 173 172 178 172 172 ...
## $ weight_kg : num [1:1000] 73.4 64.7 83.1 64.3 71.8 ...
## $ diastolic : num [1:1000] 77 75 82 76 76 72 86 80 70 79 ...
## $ systolic : num [1:1000] 121 130 120 120 130 113 131 121 120 131 ...
## $ grip_force : num [1:1000] 51.4 54.1 50.6 29.6 45 23.8 41.9 45.1 19.7 30.4 ...
## $ sit_bend_cm : num [1:1000] 23 7.8 22.1 15.2 21.4 ...
## $ sit_ups_count: num [1:1000] 54 43 57 46 42 40 55 31 43 31 ...
## $ broad_jump_cm: num [1:1000] 232 212 255 177 221 171 228 213 164 160 ...
## $ fit_class : Factor w/ 4 levels "A","B","C","D": 1 3 1 2 2 1 4 4 1 1 ...
## $ body_fat_perc: num [1:1000] 15.1 15.2 16.8 29.5 16.4 22 29.8 27.7 28.9 25.5 ...
Explore Relationships:
Figure / Output 2
Return to Exploring Relationships
Pairs plots showing patterns in features
#focus on patterns in qualitative features, colored by gender
ggpairs(performancedata,
aes(color = gender, alpha = 0.5),
lower = list(continuous = "smooth"),
upper = list(continuous = "points"))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#focus on patterns in combo features (quant v qual) colored by fit_class
ggpairs(performancedata,
aes(color = fit_class, alpha = 0.5),
lower = list(combo = "count"),
upper = list(combo = "facetdensity"))
## Warning: The dot-dot notation (`..scaled..`) was deprecated in ggplot2 3.4.0.
## ℹ Please use `after_stat(scaled)` instead.
## ℹ The deprecated feature was likely used in the GGally package.
## Please report the issue at <https://github.com/ggobi/ggally/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
Figure / Output 3
Return to Exploring Relationships
Correlation heat map to figure out which predictors will be most useful in model
heatmaply_cor(cor(select_if(performancedata, is.numeric)))
Figure / Output 4
Return to Exploring Relationships
Scatterplots for the three predictors most highly correlated with body fat percentage, colored by fitness class
ggplotly(ggplot(performancedata, aes(x=broad_jump_cm, y=body_fat_perc, color=fit_class)) + geom_point(size=1) + facet_wrap(~gender))
ggplotly(ggplot(performancedata, aes(x=grip_force, y=body_fat_perc, color=fit_class)) + geom_point(size=1) + facet_wrap(~gender))
ggplotly(ggplot(performancedata, aes(x=height_cm, y=body_fat_perc, color=fit_class)) + geom_point(size=1) + facet_wrap(~gender))
Output 5
Return to Data Types
Removing data error seen in sit_bend_cm predictor
hist(performancedata$sit_bend_cm)
performancedata = performancedata[which(performancedata$sit_bend_cm < 50),]
Variable-Thinning:
Figure / Output 6
Return to Variable-Thinning Return to Model Building
Checking VIF of full model to figure out which predictors can be removed before running backward step regression for reduced model
vif(lm(body_fat_perc ~ ., data =performancedata))
## GVIF Df GVIF^(1/(2*Df))
## age 2.240800 1 1.496930
## gender 5.356699 1 2.314454
## height_cm 3.543935 1 1.882534
## weight_kg 2.984608 1 1.727602
## diastolic 1.973425 1 1.404787
## systolic 2.227747 1 1.492564
## grip_force 4.161785 1 2.040045
## sit_bend_cm 2.007439 1 1.416841
## sit_ups_count 3.633076 1 1.906063
## broad_jump_cm 4.312311 1 2.076610
## fit_class 2.690454 3 1.179336
performancedataRed = select(performancedata, -c('gender','broad_jump_cm','grip_force','systolic'))
vif(lm(body_fat_perc ~ ., data =performancedataRed))
## GVIF Df GVIF^(1/(2*Df))
## age 1.641519 1 1.281218
## height_cm 2.740313 1 1.655389
## weight_kg 2.496352 1 1.579985
## diastolic 1.152320 1 1.073462
## sit_bend_cm 1.694215 1 1.301620
## sit_ups_count 2.559522 1 1.599851
## fit_class 2.504954 3 1.165378
fullFit = lm(body_fat_perc ~ .^2, data = performancedataRed)
reducedFit = step(fullFit)
## Start: AIC=2949.03
## body_fat_perc ~ (age + height_cm + weight_kg + diastolic + sit_bend_cm +
## sit_ups_count + fit_class)^2
##
## Df Sum of Sq RSS AIC
## - height_cm:fit_class 3 38.61 17586 2945.2
## - weight_kg:fit_class 3 61.98 17609 2946.6
## - age:height_cm 1 0.44 17548 2947.1
## - age:sit_bend_cm 1 0.83 17548 2947.1
## - diastolic:sit_bend_cm 1 1.21 17548 2947.1
## - sit_bend_cm:sit_ups_count 1 10.06 17557 2947.6
## - diastolic:sit_ups_count 1 12.33 17560 2947.7
## - height_cm:sit_bend_cm 1 16.12 17563 2947.9
## - weight_kg:diastolic 1 16.29 17564 2948.0
## - height_cm:diastolic 1 17.68 17565 2948.0
## - age:diastolic 1 19.70 17567 2948.2
## - age:weight_kg 1 21.27 17568 2948.2
## - height_cm:weight_kg 1 22.52 17570 2948.3
## <none> 17547 2949.0
## - height_cm:sit_ups_count 1 37.47 17585 2949.2
## - sit_bend_cm:fit_class 3 108.30 17656 2949.2
## - age:sit_ups_count 1 41.45 17589 2949.4
## - weight_kg:sit_bend_cm 1 42.08 17589 2949.4
## - weight_kg:sit_ups_count 1 48.45 17596 2949.8
## - age:fit_class 3 137.28 17684 2950.8
## - diastolic:fit_class 3 199.21 17746 2954.3
## - sit_ups_count:fit_class 3 358.66 17906 2963.2
##
## Step: AIC=2945.23
## body_fat_perc ~ age + height_cm + weight_kg + diastolic + sit_bend_cm +
## sit_ups_count + fit_class + age:height_cm + age:weight_kg +
## age:diastolic + age:sit_bend_cm + age:sit_ups_count + age:fit_class +
## height_cm:weight_kg + height_cm:diastolic + height_cm:sit_bend_cm +
## height_cm:sit_ups_count + weight_kg:diastolic + weight_kg:sit_bend_cm +
## weight_kg:sit_ups_count + weight_kg:fit_class + diastolic:sit_bend_cm +
## diastolic:sit_ups_count + diastolic:fit_class + sit_bend_cm:sit_ups_count +
## sit_bend_cm:fit_class + sit_ups_count:fit_class
##
## Df Sum of Sq RSS AIC
## - weight_kg:fit_class 3 39.76 17626 2941.5
## - diastolic:sit_bend_cm 1 0.73 17587 2943.3
## - age:sit_bend_cm 1 1.80 17588 2943.3
## - age:height_cm 1 3.61 17589 2943.4
## - sit_bend_cm:sit_ups_count 1 7.06 17593 2943.6
## - diastolic:sit_ups_count 1 12.33 17598 2943.9
## - age:weight_kg 1 12.71 17598 2943.9
## - height_cm:weight_kg 1 14.61 17600 2944.1
## - height_cm:diastolic 1 21.50 17607 2944.4
## - weight_kg:diastolic 1 21.93 17608 2944.5
## - age:diastolic 1 22.01 17608 2944.5
## <none> 17586 2945.2
## - sit_bend_cm:fit_class 3 119.05 17705 2946.0
## - age:sit_ups_count 1 57.66 17644 2946.5
## - height_cm:sit_bend_cm 1 64.93 17651 2946.9
## - weight_kg:sit_ups_count 1 84.18 17670 2948.0
## - age:fit_class 3 155.50 17741 2948.0
## - weight_kg:sit_bend_cm 1 84.62 17670 2948.0
## - height_cm:sit_ups_count 1 101.05 17687 2948.9
## - diastolic:fit_class 3 196.08 17782 2950.3
## - sit_ups_count:fit_class 3 344.30 17930 2958.6
##
## Step: AIC=2941.48
## body_fat_perc ~ age + height_cm + weight_kg + diastolic + sit_bend_cm +
## sit_ups_count + fit_class + age:height_cm + age:weight_kg +
## age:diastolic + age:sit_bend_cm + age:sit_ups_count + age:fit_class +
## height_cm:weight_kg + height_cm:diastolic + height_cm:sit_bend_cm +
## height_cm:sit_ups_count + weight_kg:diastolic + weight_kg:sit_bend_cm +
## weight_kg:sit_ups_count + diastolic:sit_bend_cm + diastolic:sit_ups_count +
## diastolic:fit_class + sit_bend_cm:sit_ups_count + sit_bend_cm:fit_class +
## sit_ups_count:fit_class
##
## Df Sum of Sq RSS AIC
## - diastolic:sit_bend_cm 1 0.74 17626 2939.5
## - age:sit_bend_cm 1 2.59 17628 2939.6
## - age:height_cm 1 4.31 17630 2939.7
## - sit_bend_cm:sit_ups_count 1 9.21 17635 2940.0
## - age:weight_kg 1 12.93 17638 2940.2
## - diastolic:sit_ups_count 1 14.19 17640 2940.3
## - height_cm:weight_kg 1 15.78 17641 2940.4
## - age:diastolic 1 21.03 17647 2940.7
## - weight_kg:diastolic 1 23.32 17649 2940.8
## - height_cm:diastolic 1 25.30 17651 2940.9
## <none> 17626 2941.5
## - sit_bend_cm:fit_class 3 113.14 17739 2941.9
## - age:sit_ups_count 1 51.97 17678 2942.4
## - height_cm:sit_bend_cm 1 63.84 17689 2943.1
## - age:fit_class 3 148.39 17774 2943.9
## - height_cm:sit_ups_count 1 99.97 17726 2945.1
## - weight_kg:sit_bend_cm 1 114.11 17740 2945.9
## - weight_kg:sit_ups_count 1 120.36 17746 2946.3
## - diastolic:fit_class 3 194.50 17820 2946.4
## - sit_ups_count:fit_class 3 445.87 18072 2960.4
##
## Step: AIC=2939.52
## body_fat_perc ~ age + height_cm + weight_kg + diastolic + sit_bend_cm +
## sit_ups_count + fit_class + age:height_cm + age:weight_kg +
## age:diastolic + age:sit_bend_cm + age:sit_ups_count + age:fit_class +
## height_cm:weight_kg + height_cm:diastolic + height_cm:sit_bend_cm +
## height_cm:sit_ups_count + weight_kg:diastolic + weight_kg:sit_bend_cm +
## weight_kg:sit_ups_count + diastolic:sit_ups_count + diastolic:fit_class +
## sit_bend_cm:sit_ups_count + sit_bend_cm:fit_class + sit_ups_count:fit_class
##
## Df Sum of Sq RSS AIC
## - age:sit_bend_cm 1 2.16 17628 2937.7
## - age:height_cm 1 3.94 17630 2937.8
## - sit_bend_cm:sit_ups_count 1 9.01 17635 2938.0
## - age:weight_kg 1 13.38 17640 2938.3
## - diastolic:sit_ups_count 1 14.13 17640 2938.3
## - height_cm:weight_kg 1 15.47 17642 2938.4
## - age:diastolic 1 20.81 17647 2938.7
## - weight_kg:diastolic 1 22.90 17649 2938.8
## - height_cm:diastolic 1 24.56 17651 2938.9
## <none> 17626 2939.5
## - sit_bend_cm:fit_class 3 114.29 17741 2940.0
## - age:sit_ups_count 1 52.31 17679 2940.5
## - height_cm:sit_bend_cm 1 64.06 17690 2941.2
## - age:fit_class 3 147.65 17774 2941.9
## - height_cm:sit_ups_count 1 99.23 17726 2943.1
## - weight_kg:sit_ups_count 1 119.72 17746 2944.3
## - diastolic:fit_class 3 193.77 17820 2944.4
## - weight_kg:sit_bend_cm 1 123.46 17750 2944.5
## - sit_ups_count:fit_class 3 447.85 18074 2958.6
##
## Step: AIC=2937.65
## body_fat_perc ~ age + height_cm + weight_kg + diastolic + sit_bend_cm +
## sit_ups_count + fit_class + age:height_cm + age:weight_kg +
## age:diastolic + age:sit_ups_count + age:fit_class + height_cm:weight_kg +
## height_cm:diastolic + height_cm:sit_bend_cm + height_cm:sit_ups_count +
## weight_kg:diastolic + weight_kg:sit_bend_cm + weight_kg:sit_ups_count +
## diastolic:sit_ups_count + diastolic:fit_class + sit_bend_cm:sit_ups_count +
## sit_bend_cm:fit_class + sit_ups_count:fit_class
##
## Df Sum of Sq RSS AIC
## - age:height_cm 1 3.21 17632 2935.8
## - sit_bend_cm:sit_ups_count 1 6.85 17635 2936.0
## - age:weight_kg 1 12.74 17641 2936.4
## - diastolic:sit_ups_count 1 13.78 17642 2936.4
## - height_cm:weight_kg 1 16.26 17645 2936.6
## - age:diastolic 1 20.65 17649 2936.8
## - weight_kg:diastolic 1 22.61 17651 2936.9
## - height_cm:diastolic 1 23.51 17652 2937.0
## <none> 17628 2937.7
## - sit_bend_cm:fit_class 3 112.13 17741 2938.0
## - age:sit_ups_count 1 51.74 17680 2938.6
## - height_cm:sit_bend_cm 1 61.90 17690 2939.2
## - age:fit_class 3 159.44 17788 2940.6
## - height_cm:sit_ups_count 1 98.77 17727 2941.2
## - weight_kg:sit_bend_cm 1 122.07 17751 2942.5
## - weight_kg:sit_ups_count 1 122.17 17751 2942.6
## - diastolic:fit_class 3 193.50 17822 2942.6
## - sit_ups_count:fit_class 3 478.24 18107 2958.4
##
## Step: AIC=2935.83
## body_fat_perc ~ age + height_cm + weight_kg + diastolic + sit_bend_cm +
## sit_ups_count + fit_class + age:weight_kg + age:diastolic +
## age:sit_ups_count + age:fit_class + height_cm:weight_kg +
## height_cm:diastolic + height_cm:sit_bend_cm + height_cm:sit_ups_count +
## weight_kg:diastolic + weight_kg:sit_bend_cm + weight_kg:sit_ups_count +
## diastolic:sit_ups_count + diastolic:fit_class + sit_bend_cm:sit_ups_count +
## sit_bend_cm:fit_class + sit_ups_count:fit_class
##
## Df Sum of Sq RSS AIC
## - sit_bend_cm:sit_ups_count 1 6.48 17638 2934.2
## - diastolic:sit_ups_count 1 13.27 17645 2934.6
## - height_cm:weight_kg 1 15.39 17647 2934.7
## - age:diastolic 1 20.63 17652 2935.0
## - height_cm:diastolic 1 21.06 17653 2935.0
## - weight_kg:diastolic 1 21.21 17653 2935.0
## <none> 17632 2935.8
## - age:weight_kg 1 37.56 17669 2935.9
## - sit_bend_cm:fit_class 3 112.73 17744 2936.2
## - age:sit_ups_count 1 58.78 17690 2937.2
## - height_cm:sit_bend_cm 1 66.08 17698 2937.6
## - age:fit_class 3 166.30 17798 2939.2
## - height_cm:sit_ups_count 1 109.07 17741 2940.0
## - diastolic:fit_class 3 192.24 17824 2940.7
## - weight_kg:sit_ups_count 1 123.04 17755 2940.8
## - weight_kg:sit_bend_cm 1 125.08 17757 2940.9
## - sit_ups_count:fit_class 3 475.42 18107 2956.4
##
## Step: AIC=2934.2
## body_fat_perc ~ age + height_cm + weight_kg + diastolic + sit_bend_cm +
## sit_ups_count + fit_class + age:weight_kg + age:diastolic +
## age:sit_ups_count + age:fit_class + height_cm:weight_kg +
## height_cm:diastolic + height_cm:sit_bend_cm + height_cm:sit_ups_count +
## weight_kg:diastolic + weight_kg:sit_bend_cm + weight_kg:sit_ups_count +
## diastolic:sit_ups_count + diastolic:fit_class + sit_bend_cm:fit_class +
## sit_ups_count:fit_class
##
## Df Sum of Sq RSS AIC
## - diastolic:sit_ups_count 1 13.25 17651 2932.9
## - height_cm:diastolic 1 20.62 17659 2933.4
## - weight_kg:diastolic 1 20.99 17659 2933.4
## - height_cm:weight_kg 1 21.10 17659 2933.4
## - age:diastolic 1 21.41 17660 2933.4
## <none> 17638 2934.2
## - sit_bend_cm:fit_class 3 109.06 17747 2934.3
## - age:weight_kg 1 39.10 17677 2934.4
## - age:sit_ups_count 1 56.29 17694 2935.4
## - age:fit_class 3 164.45 17803 2937.5
## - height_cm:sit_bend_cm 1 100.09 17738 2937.8
## - height_cm:sit_ups_count 1 102.59 17741 2938.0
## - diastolic:fit_class 3 189.12 17827 2938.8
## - weight_kg:sit_bend_cm 1 121.76 17760 2939.1
## - weight_kg:sit_ups_count 1 129.70 17768 2939.5
## - sit_ups_count:fit_class 3 559.60 18198 2959.4
##
## Step: AIC=2932.95
## body_fat_perc ~ age + height_cm + weight_kg + diastolic + sit_bend_cm +
## sit_ups_count + fit_class + age:weight_kg + age:diastolic +
## age:sit_ups_count + age:fit_class + height_cm:weight_kg +
## height_cm:diastolic + height_cm:sit_bend_cm + height_cm:sit_ups_count +
## weight_kg:diastolic + weight_kg:sit_bend_cm + weight_kg:sit_ups_count +
## diastolic:fit_class + sit_bend_cm:fit_class + sit_ups_count:fit_class
##
## Df Sum of Sq RSS AIC
## - height_cm:diastolic 1 12.63 17664 2931.7
## - height_cm:weight_kg 1 19.40 17671 2932.0
## - weight_kg:diastolic 1 23.35 17675 2932.3
## <none> 17651 2932.9
## - sit_bend_cm:fit_class 3 110.40 17762 2933.2
## - age:weight_kg 1 40.94 17692 2933.3
## - age:diastolic 1 57.43 17709 2934.2
## - age:sit_ups_count 1 66.39 17718 2934.7
## - age:fit_class 3 170.06 17822 2936.5
## - height_cm:sit_bend_cm 1 105.97 17757 2936.9
## - height_cm:sit_ups_count 1 106.69 17758 2937.0
## - weight_kg:sit_ups_count 1 119.43 17771 2937.7
## - weight_kg:sit_bend_cm 1 131.32 17783 2938.3
## - diastolic:fit_class 3 221.28 17873 2939.4
## - sit_ups_count:fit_class 3 586.78 18238 2959.6
##
## Step: AIC=2931.66
## body_fat_perc ~ age + height_cm + weight_kg + diastolic + sit_bend_cm +
## sit_ups_count + fit_class + age:weight_kg + age:diastolic +
## age:sit_ups_count + age:fit_class + height_cm:weight_kg +
## height_cm:sit_bend_cm + height_cm:sit_ups_count + weight_kg:diastolic +
## weight_kg:sit_bend_cm + weight_kg:sit_ups_count + diastolic:fit_class +
## sit_bend_cm:fit_class + sit_ups_count:fit_class
##
## Df Sum of Sq RSS AIC
## - weight_kg:diastolic 1 10.76 17675 2930.3
## - height_cm:weight_kg 1 14.32 17678 2930.5
## <none> 17664 2931.7
## - age:weight_kg 1 36.71 17701 2931.7
## - sit_bend_cm:fit_class 3 109.80 17774 2931.8
## - age:diastolic 1 45.38 17710 2932.2
## - age:sit_ups_count 1 68.07 17732 2933.5
## - age:fit_class 3 165.12 17829 2935.0
## - height_cm:sit_bend_cm 1 107.71 17772 2935.7
## - height_cm:sit_ups_count 1 114.40 17778 2936.1
## - weight_kg:sit_ups_count 1 125.06 17789 2936.7
## - weight_kg:sit_bend_cm 1 133.18 17797 2937.2
## - diastolic:fit_class 3 212.51 17877 2937.6
## - sit_ups_count:fit_class 3 592.96 18257 2958.7
##
## Step: AIC=2930.27
## body_fat_perc ~ age + height_cm + weight_kg + diastolic + sit_bend_cm +
## sit_ups_count + fit_class + age:weight_kg + age:diastolic +
## age:sit_ups_count + age:fit_class + height_cm:weight_kg +
## height_cm:sit_bend_cm + height_cm:sit_ups_count + weight_kg:sit_bend_cm +
## weight_kg:sit_ups_count + diastolic:fit_class + sit_bend_cm:fit_class +
## sit_ups_count:fit_class
##
## Df Sum of Sq RSS AIC
## - height_cm:weight_kg 1 18.08 17693 2929.3
## <none> 17675 2930.3
## - sit_bend_cm:fit_class 3 110.74 17786 2930.5
## - age:weight_kg 1 50.96 17726 2931.2
## - age:diastolic 1 56.92 17732 2931.5
## - age:sit_ups_count 1 67.00 17742 2932.1
## - age:fit_class 3 162.82 17838 2933.4
## - height_cm:sit_bend_cm 1 108.64 17784 2934.4
## - height_cm:sit_ups_count 1 112.81 17788 2934.6
## - weight_kg:sit_ups_count 1 121.00 17796 2935.1
## - diastolic:fit_class 3 201.88 17877 2935.6
## - weight_kg:sit_bend_cm 1 135.25 17810 2935.9
## - sit_ups_count:fit_class 3 592.27 18267 2957.2
##
## Step: AIC=2929.29
## body_fat_perc ~ age + height_cm + weight_kg + diastolic + sit_bend_cm +
## sit_ups_count + fit_class + age:weight_kg + age:diastolic +
## age:sit_ups_count + age:fit_class + height_cm:sit_bend_cm +
## height_cm:sit_ups_count + weight_kg:sit_bend_cm + weight_kg:sit_ups_count +
## diastolic:fit_class + sit_bend_cm:fit_class + sit_ups_count:fit_class
##
## Df Sum of Sq RSS AIC
## <none> 17693 2929.3
## - sit_bend_cm:fit_class 3 107.54 17800 2929.3
## - age:weight_kg 1 39.33 17732 2929.5
## - age:diastolic 1 53.94 17747 2930.3
## - age:sit_ups_count 1 80.70 17774 2931.8
## - age:fit_class 3 174.39 17867 2933.1
## - height_cm:sit_bend_cm 1 103.03 17796 2933.1
## - weight_kg:sit_ups_count 1 107.18 17800 2933.3
## - height_cm:sit_ups_count 1 120.08 17813 2934.1
## - diastolic:fit_class 3 201.57 17894 2934.6
## - weight_kg:sit_bend_cm 1 162.31 17855 2936.4
## - sit_ups_count:fit_class 3 609.71 18303 2957.1
summary(reducedFit)
##
## Call:
## lm(formula = body_fat_perc ~ age + height_cm + weight_kg + diastolic +
## sit_bend_cm + sit_ups_count + fit_class + age:weight_kg +
## age:diastolic + age:sit_ups_count + age:fit_class + height_cm:sit_bend_cm +
## height_cm:sit_ups_count + weight_kg:sit_bend_cm + weight_kg:sit_ups_count +
## diastolic:fit_class + sit_bend_cm:fit_class + sit_ups_count:fit_class,
## data = performancedataRed)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.9514 -2.6864 0.0187 2.8180 12.8339
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 166.703530 12.886461 12.936 < 2e-16 ***
## age -0.235463 0.102166 -2.305 0.021393 *
## height_cm -0.921805 0.074961 -12.297 < 2e-16 ***
## weight_kg 0.461923 0.069487 6.648 4.96e-11 ***
## diastolic 0.089326 0.043671 2.045 0.041081 *
## sit_bend_cm -0.849463 0.417404 -2.035 0.042112 *
## sit_ups_count -1.008046 0.279637 -3.605 0.000328 ***
## fit_classB -11.884731 4.373865 -2.717 0.006701 **
## fit_classC -18.460473 4.169851 -4.427 1.06e-05 ***
## fit_classD -9.997982 3.928823 -2.545 0.011089 *
## age:weight_kg 0.001697 0.001156 1.468 0.142328
## age:diastolic -0.001724 0.001003 -1.720 0.085805 .
## age:sit_ups_count 0.002217 0.001054 2.103 0.035686 *
## age:fit_classB 0.034059 0.038657 0.881 0.378512
## age:fit_classC 0.111448 0.038884 2.866 0.004245 **
## age:fit_classD 0.091578 0.042681 2.146 0.032150 *
## height_cm:sit_bend_cm 0.006659 0.002802 2.377 0.017665 *
## height_cm:sit_ups_count 0.004475 0.001744 2.566 0.010442 *
## weight_kg:sit_bend_cm -0.005158 0.001729 -2.983 0.002926 **
## weight_kg:sit_ups_count -0.003025 0.001248 -2.424 0.015530 *
## diastolic:fit_classB 0.050473 0.037757 1.337 0.181604
## diastolic:fit_classC 0.082208 0.037176 2.211 0.027247 *
## diastolic:fit_classD -0.029348 0.036878 -0.796 0.426335
## sit_bend_cm:fit_classB 0.184962 0.089861 2.058 0.039827 *
## sit_bend_cm:fit_classC 0.082722 0.078709 1.051 0.293527
## sit_bend_cm:fit_classD 0.148015 0.072983 2.028 0.042827 *
## sit_ups_count:fit_classB 0.078457 0.044418 1.766 0.077655 .
## sit_ups_count:fit_classC 0.162484 0.041714 3.895 0.000105 ***
## sit_ups_count:fit_classD 0.207784 0.037969 5.473 5.65e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.271 on 970 degrees of freedom
## Multiple R-squared: 0.6897, Adjusted R-squared: 0.6807
## F-statistic: 76.99 on 28 and 970 DF, p-value: < 2.2e-16
Model-Building:
and
Model Specification:
Figure / Output 7
Return to Variable Thinning Return to Model Specification
Summary of full model with interactions
fullFit = lm(body_fat_perc ~ .^2, data = performancedataRed)
summary(fullFit)
##
## Call:
## lm(formula = body_fat_perc ~ .^2, data = performancedataRed)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.8630 -2.6857 0.0226 2.8591 12.9568
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.495e+02 3.989e+01 3.749 0.000188 ***
## age -3.171e-01 3.639e-01 -0.872 0.383676
## height_cm -7.038e-01 2.778e-01 -2.534 0.011444 *
## weight_kg -7.441e-02 3.418e-01 -0.218 0.827697
## diastolic 3.720e-01 3.896e-01 0.955 0.339868
## sit_bend_cm -5.113e-01 5.874e-01 -0.870 0.384273
## sit_ups_count -9.695e-01 3.997e-01 -2.426 0.015470 *
## fit_classB -5.718e+00 1.264e+01 -0.453 0.650974
## fit_classC -1.774e+01 1.407e+01 -1.261 0.207623
## fit_classD 5.467e+00 1.688e+01 0.324 0.746123
## age:height_cm 3.810e-04 2.465e-03 0.155 0.877186
## age:weight_kg 1.827e-03 1.697e-03 1.076 0.282016
## age:diastolic -1.347e-03 1.301e-03 -1.036 0.300432
## age:sit_bend_cm 4.245e-04 1.992e-03 0.213 0.831286
## age:sit_ups_count 1.714e-03 1.140e-03 1.503 0.133235
## age:fit_classB 2.454e-02 4.112e-02 0.597 0.550792
## age:fit_classC 1.126e-01 4.433e-02 2.540 0.011232 *
## age:fit_classD 8.010e-02 5.427e-02 1.476 0.140258
## height_cm:weight_kg 1.869e-03 1.687e-03 1.108 0.268254
## height_cm:diastolic -2.622e-03 2.672e-03 -0.982 0.326578
## height_cm:sit_bend_cm 3.570e-03 3.810e-03 0.937 0.348947
## height_cm:sit_ups_count 3.551e-03 2.485e-03 1.429 0.153400
## height_cm:fit_classB -6.043e-02 8.892e-02 -0.680 0.496931
## height_cm:fit_classC -1.469e-02 9.678e-02 -0.152 0.879397
## height_cm:fit_classD -1.285e-01 1.125e-01 -1.143 0.253359
## weight_kg:diastolic 1.557e-03 1.653e-03 0.942 0.346397
## weight_kg:sit_bend_cm -3.405e-03 2.249e-03 -1.514 0.130319
## weight_kg:sit_ups_count -2.767e-03 1.703e-03 -1.625 0.104561
## weight_kg:fit_classB 7.920e-02 6.512e-02 1.216 0.224262
## weight_kg:fit_classC -6.996e-03 7.235e-02 -0.097 0.922981
## weight_kg:fit_classD 8.288e-02 7.741e-02 1.071 0.284611
## diastolic:sit_bend_cm -5.255e-04 2.048e-03 -0.257 0.797550
## diastolic:sit_ups_count 1.236e-03 1.508e-03 0.820 0.412560
## diastolic:fit_classB 4.246e-02 4.095e-02 1.037 0.300031
## diastolic:fit_classC 9.240e-02 4.306e-02 2.146 0.032120 *
## diastolic:fit_classD -2.912e-02 5.726e-02 -0.508 0.611230
## sit_bend_cm:sit_ups_count 1.530e-03 2.067e-03 0.740 0.459321
## sit_bend_cm:fit_classB 2.079e-01 9.326e-02 2.230 0.025991 *
## sit_bend_cm:fit_classC 1.200e-01 8.390e-02 1.431 0.152831
## sit_bend_cm:fit_classD 1.814e-01 8.831e-02 2.054 0.040227 *
## sit_ups_count:fit_classB 6.226e-02 5.309e-02 1.173 0.241180
## sit_ups_count:fit_classC 1.802e-01 5.152e-02 3.497 0.000492 ***
## sit_ups_count:fit_classD 2.181e-01 5.687e-02 3.835 0.000134 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.284 on 956 degrees of freedom
## Multiple R-squared: 0.6922, Adjusted R-squared: 0.6787
## F-statistic: 51.2 on 42 and 956 DF, p-value: < 2.2e-16
Figure / Output 8
Return to Model Specification
Reduced model created by removing high VIF predictors from full model and then running backward step regression
vif(reducedFit)
## there are higher-order terms (interactions) in this model
## consider setting type = 'predictor'; see ?vif
## GVIF Df GVIF^(1/(2*Df))
## age 1.042929e+02 1 10.212387
## height_cm 2.192168e+01 1 4.682059
## weight_kg 3.919055e+01 1 6.260236
## diastolic 1.242130e+01 1 3.524386
## sit_bend_cm 6.891709e+02 1 26.252064
## sit_ups_count 9.249102e+02 1 30.412337
## fit_class 1.265824e+06 3 10.400692
## age:weight_kg 6.709045e+01 1 8.190876
## age:diastolic 8.271813e+01 1 9.094951
## age:sit_ups_count 1.769913e+01 1 4.207034
## age:fit_class 5.800390e+03 3 4.238799
## height_cm:sit_bend_cm 8.646334e+02 1 29.404648
## height_cm:sit_ups_count 1.136245e+03 1 33.708235
## weight_kg:sit_bend_cm 5.357833e+01 1 7.319722
## weight_kg:sit_ups_count 1.204426e+02 1 10.974635
## diastolic:fit_class 2.317788e+05 3 7.837525
## sit_bend_cm:fit_class 8.627231e+02 3 3.085403
## sit_ups_count:fit_class 6.994892e+03 3 4.373175
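The last column of the VIF table rescales each generalized VIF by its degrees of freedom, which makes multi-coefficient terms such as fit_class comparable to single-coefficient terms. As a minimal sketch, the column can be reproduced from the GVIF and Df values printed above. The three terms chosen here are only an illustrative subset, and the vector names gvif, df and adj are ours.

```r
# GVIF and Df values copied from the vif(reducedFit) output above
# (illustrative subset; the object names are assumptions)
gvif <- c(age = 1.042929e+02,
          fit_class = 1.265824e+06,
          `diastolic:fit_class` = 2.317788e+05)
df <- c(age = 1, fit_class = 3, `diastolic:fit_class` = 3)

# Adjusted value reported in the third column: GVIF^(1 / (2 * Df))
adj <- gvif^(1 / (2 * df))
round(adj, 3)

# Squaring the adjusted value puts it on the scale of an ordinary VIF
round(adj^2, 1)
```

For a term with one degree of freedom the adjusted value is simply the square root of the VIF, which is why age shows about 10.2 against a GVIF of about 104.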
summary(reducedFit)
##
## Call:
## lm(formula = body_fat_perc ~ age + height_cm + weight_kg + diastolic +
## sit_bend_cm + sit_ups_count + fit_class + age:weight_kg +
## age:diastolic + age:sit_ups_count + age:fit_class + height_cm:sit_bend_cm +
## height_cm:sit_ups_count + weight_kg:sit_bend_cm + weight_kg:sit_ups_count +
## diastolic:fit_class + sit_bend_cm:fit_class + sit_ups_count:fit_class,
## data = performancedataRed)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.9514 -2.6864 0.0187 2.8180 12.8339
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 166.703530 12.886461 12.936 < 2e-16 ***
## age -0.235463 0.102166 -2.305 0.021393 *
## height_cm -0.921805 0.074961 -12.297 < 2e-16 ***
## weight_kg 0.461923 0.069487 6.648 4.96e-11 ***
## diastolic 0.089326 0.043671 2.045 0.041081 *
## sit_bend_cm -0.849463 0.417404 -2.035 0.042112 *
## sit_ups_count -1.008046 0.279637 -3.605 0.000328 ***
## fit_classB -11.884731 4.373865 -2.717 0.006701 **
## fit_classC -18.460473 4.169851 -4.427 1.06e-05 ***
## fit_classD -9.997982 3.928823 -2.545 0.011089 *
## age:weight_kg 0.001697 0.001156 1.468 0.142328
## age:diastolic -0.001724 0.001003 -1.720 0.085805 .
## age:sit_ups_count 0.002217 0.001054 2.103 0.035686 *
## age:fit_classB 0.034059 0.038657 0.881 0.378512
## age:fit_classC 0.111448 0.038884 2.866 0.004245 **
## age:fit_classD 0.091578 0.042681 2.146 0.032150 *
## height_cm:sit_bend_cm 0.006659 0.002802 2.377 0.017665 *
## height_cm:sit_ups_count 0.004475 0.001744 2.566 0.010442 *
## weight_kg:sit_bend_cm -0.005158 0.001729 -2.983 0.002926 **
## weight_kg:sit_ups_count -0.003025 0.001248 -2.424 0.015530 *
## diastolic:fit_classB 0.050473 0.037757 1.337 0.181604
## diastolic:fit_classC 0.082208 0.037176 2.211 0.027247 *
## diastolic:fit_classD -0.029348 0.036878 -0.796 0.426335
## sit_bend_cm:fit_classB 0.184962 0.089861 2.058 0.039827 *
## sit_bend_cm:fit_classC 0.082722 0.078709 1.051 0.293527
## sit_bend_cm:fit_classD 0.148015 0.072983 2.028 0.042827 *
## sit_ups_count:fit_classB 0.078457 0.044418 1.766 0.077655 .
## sit_ups_count:fit_classC 0.162484 0.041714 3.895 0.000105 ***
## sit_ups_count:fit_classD 0.207784 0.037969 5.473 5.65e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.271 on 970 degrees of freedom
## Multiple R-squared: 0.6897, Adjusted R-squared: 0.6807
## F-statistic: 76.99 on 28 and 970 DF, p-value: < 2.2e-16
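The reduced model has a slightly lower multiple R-squared than the full model (0.6897 versus 0.6922) but a higher adjusted R-squared (0.6807 versus 0.6787). The sketch below reproduces both adjusted values from the printed summaries, assuming n = 999 observations (residual degrees of freedom plus the number of estimated coefficients). The helper adj_r2 is ours and is not part of the original analysis.

```r
# Adjusted R-squared from R-squared, sample size n and number of slope terms p
# (hypothetical helper written for this illustration)
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

n <- 999  # 970 residual df + 28 slopes + 1 intercept

adj_reduced <- adj_r2(0.6897, n, 28)  # reduced model: 28 slope terms
adj_full    <- adj_r2(0.6922, n, 42)  # full model: 42 slope terms

round(c(reduced = adj_reduced, full = adj_full), 4)
```

The penalty factor (n - 1) / (n - p - 1) is larger for the full model, so its small gain in R-squared does not offset the cost of 14 extra coefficients.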
Figure / Output 9
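ANOVA (partial F) test comparing the reduced model to the full model. With F = 0.567 and p = 0.8917, there is no evidence that the 14 additional coefficients in the full model improve the fit, so the simpler reduced model is retained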
anova(reducedFit, fullFit)
## Analysis of Variance Table
##
## Model 1: body_fat_perc ~ age + height_cm + weight_kg + diastolic + sit_bend_cm +
## sit_ups_count + fit_class + age:weight_kg + age:diastolic +
## age:sit_ups_count + age:fit_class + height_cm:sit_bend_cm +
## height_cm:sit_ups_count + weight_kg:sit_bend_cm + weight_kg:sit_ups_count +
## diastolic:fit_class + sit_bend_cm:fit_class + sit_ups_count:fit_class
## Model 2: body_fat_perc ~ (age + height_cm + weight_kg + diastolic + sit_bend_cm +
## sit_ups_count + fit_class)^2
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 970 17693
## 2 956 17547 14 145.7 0.567 0.8917
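The F statistic in the table can be checked by hand from the two residual sums of squares. The sketch below uses the rounded RSS values printed above, so it only matches the reported F = 0.567 and p = 0.8917 approximately. The variable names are ours.

```r
# Residual sums of squares and residual df copied from the ANOVA table above
rss_reduced <- 17693; df_reduced <- 970
rss_full    <- 17547; df_full    <- 956

# Partial F: extra sum of squares per dropped coefficient,
# divided by the full model's residual mean square
df_diff <- df_reduced - df_full
f_stat  <- ((rss_reduced - rss_full) / df_diff) / (rss_full / df_full)

# Upper-tail probability under F(df_diff, df_full)
p_value <- pf(f_stat, df_diff, df_full, lower.tail = FALSE)

round(c(F = f_stat, p = p_value), 3)
```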
Assumption-Checking and Response-Transform:
Figure / Output 10
Assumptions check using the CheckAll function. High VIFs are expected for the interaction terms in the model
source("RegressionOverallAssumptions.R")
## [1] "argument is: lmfit"
## [1] "needs the 'car', 'nortest', 'gridExtra', and 'ggplot2' packages"
CheckAll(reducedFit)
## For the model fit by
## body_fat_perc ~ age + height_cm + weight_kg + diastolic + sit_bend_cm + sit_ups_count + fit_class + age:weight_kg + age:diastolic + age:sit_ups_count + age:fit_class + height_cm:sit_bend_cm + height_cm:sit_ups_count + weight_kg:sit_bend_cm + weight_kg:sit_ups_count + diastolic:fit_class + sit_bend_cm:fit_class + sit_ups_count:fit_class
##
## the adjusted R^2 is 68.07%
## and the overall model is significant, with a Pvalue of 0
## The P-value for the test about error variance is about 0.0869 so
## Residuals have approximately constant variance.
##
## The P-value for the Anderson-Darling test about normality of errors is 0.5279 so
## Residuals are reasonably normal.
##
##
## there are higher-order terms (interactions) in this model
## consider setting type = 'predictor'; see ?vif
## there are higher-order terms (interactions) in this model
## consider setting type = 'predictor'; see ?vif
## The following variables: age height_cm weight_kg diastolic sit_bend_cm sit_ups_count fit_class age:weight_kg age:diastolic age:sit_ups_count age:fit_class height_cm:sit_bend_cm height_cm:sit_ups_count weight_kg:sit_bend_cm weight_kg:sit_ups_count diastolic:fit_class sit_bend_cm:fit_class sit_ups_count:fit_class have large VIF values 104.3 21.9 39.2 12.4 689.2 924.9 1265824 67.1 82.7 17.7 5800.4 864.6 1136.2 53.6 120.4 231778.8 862.7 6994.9
##
## Potential outliers are:
## none
## and potential influential points are
## none
## and LIKELY influential points are
## none
##
## The following point(s) have been identified as high-leverage:
## 16 18 19 32 33 39 95 117 118 129 146 197 207 225 255 256 257 264 272 278 311 328 331 355 366 368 376 386 395 424 442 443 448 452 458 467 490 514 521 539 607 620 624 632 643 644 645 659 661 663 735 741 755 761 808 819 821 827 877 880 917 930 945 956
##
## In power transformation, the range of values for the power is between 0.72 and 1
Final Model:
Figure / Output 11
Final model used:
summary(reducedFit)
##
## Call:
## lm(formula = body_fat_perc ~ age + height_cm + weight_kg + diastolic +
## sit_bend_cm + sit_ups_count + fit_class + age:weight_kg +
## age:diastolic + age:sit_ups_count + age:fit_class + height_cm:sit_bend_cm +
## height_cm:sit_ups_count + weight_kg:sit_bend_cm + weight_kg:sit_ups_count +
## diastolic:fit_class + sit_bend_cm:fit_class + sit_ups_count:fit_class,
## data = performancedataRed)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.9514 -2.6864 0.0187 2.8180 12.8339
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 166.703530 12.886461 12.936 < 2e-16 ***
## age -0.235463 0.102166 -2.305 0.021393 *
## height_cm -0.921805 0.074961 -12.297 < 2e-16 ***
## weight_kg 0.461923 0.069487 6.648 4.96e-11 ***
## diastolic 0.089326 0.043671 2.045 0.041081 *
## sit_bend_cm -0.849463 0.417404 -2.035 0.042112 *
## sit_ups_count -1.008046 0.279637 -3.605 0.000328 ***
## fit_classB -11.884731 4.373865 -2.717 0.006701 **
## fit_classC -18.460473 4.169851 -4.427 1.06e-05 ***
## fit_classD -9.997982 3.928823 -2.545 0.011089 *
## age:weight_kg 0.001697 0.001156 1.468 0.142328
## age:diastolic -0.001724 0.001003 -1.720 0.085805 .
## age:sit_ups_count 0.002217 0.001054 2.103 0.035686 *
## age:fit_classB 0.034059 0.038657 0.881 0.378512
## age:fit_classC 0.111448 0.038884 2.866 0.004245 **
## age:fit_classD 0.091578 0.042681 2.146 0.032150 *
## height_cm:sit_bend_cm 0.006659 0.002802 2.377 0.017665 *
## height_cm:sit_ups_count 0.004475 0.001744 2.566 0.010442 *
## weight_kg:sit_bend_cm -0.005158 0.001729 -2.983 0.002926 **
## weight_kg:sit_ups_count -0.003025 0.001248 -2.424 0.015530 *
## diastolic:fit_classB 0.050473 0.037757 1.337 0.181604
## diastolic:fit_classC 0.082208 0.037176 2.211 0.027247 *
## diastolic:fit_classD -0.029348 0.036878 -0.796 0.426335
## sit_bend_cm:fit_classB 0.184962 0.089861 2.058 0.039827 *
## sit_bend_cm:fit_classC 0.082722 0.078709 1.051 0.293527
## sit_bend_cm:fit_classD 0.148015 0.072983 2.028 0.042827 *
## sit_ups_count:fit_classB 0.078457 0.044418 1.766 0.077655 .
## sit_ups_count:fit_classC 0.162484 0.041714 3.895 0.000105 ***
## sit_ups_count:fit_classD 0.207784 0.037969 5.473 5.65e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.271 on 970 degrees of freedom
## Multiple R-squared: 0.6897, Adjusted R-squared: 0.6807
## F-statistic: 76.99 on 28 and 970 DF, p-value: < 2.2e-16
Figure / Output 12
Model validation with \(R^2_{LOO}\) and \(PRESS\) statistics
library(caret)  # provides train() and trainControl()
# Leave-one-out cross-validation of the final reduced model
train(body_fat_perc ~ age + height_cm + weight_kg + diastolic +
        sit_bend_cm + sit_ups_count + fit_class + age:weight_kg +
        age:diastolic + age:sit_ups_count + age:fit_class + height_cm:sit_bend_cm +
        height_cm:sit_ups_count + weight_kg:sit_bend_cm + weight_kg:sit_ups_count +
        diastolic:fit_class + sit_bend_cm:fit_class + sit_ups_count:fit_class,
      data = performancedata,
      method = "lm",
      trControl = trainControl(method = "LOOCV"))
## Linear Regression
##
## 999 samples
## 7 predictor
##
## No pre-processing
## Resampling: Leave-One-Out Cross-Validation
## Summary of sample sizes: 998, 998, 998, 998, 998, 998, ...
## Resampling results:
##
## RMSE Rsquared MAE
## 4.341163 0.6699735 3.409795
##
## Tuning parameter 'intercept' was held constant at a value of TRUE
# RMSE_LOO = sqrt(PRESS / n), so PRESS = n * RMSE_LOO^2 with n = 999
PRESS <- 999 * (4.341163^2)
PRESS
## [1] 18826.85
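For an ordinary least squares fit, PRESS does not require refitting the model n times: each leave-one-out residual equals the ordinary residual divided by one minus its hat value. The sketch below checks this identity on the built-in mtcars data, used here only as a stand-in because the performancedata set is not reproduced in this document. It then converts the PRESS value above into a predictive R-squared, with SST backed out from the rounded RSS and R-squared of the reduced model. All object names are ours.

```r
# Stand-in example on mtcars (an assumption; not the data analysed above)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Shortcut: PRESS = sum over i of (e_i / (1 - h_ii))^2
press_shortcut <- sum((residuals(fit) / (1 - hatvalues(fit)))^2)

# Explicit leave-one-out loop for comparison
loo_err <- sapply(seq_len(nrow(mtcars)), function(i) {
  fit_i <- lm(mpg ~ wt + hp, data = mtcars[-i, ])
  mtcars$mpg[i] - predict(fit_i, newdata = mtcars[i, ])
})
press_loop <- sum(loo_err^2)
all.equal(press_shortcut, press_loop)

# Predictive R-squared for the reduced model in this report:
# SST recovered from RSS = 17693 and R-squared = 0.6897 (both rounded)
sst_doc     <- 17693 / (1 - 0.6897)
r2_pred_doc <- 1 - 18826.85 / sst_doc
round(r2_pred_doc, 3)
```

The resulting predictive R-squared of about 0.67 is close to the Rsquared reported by caret. To our understanding caret computes that figure as a squared correlation between held-out predictions and observations, so small differences between the two are expected.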