Here are my answers to the Week 3 Quiz activity of the course Inferential Statistics with R, presented on Coursera and taught by Mine Çetinkaya-Rundel.
This is an R Markdown file. For better viewing, it can be forked and knitted in RStudio to an HTML file, or it can be viewed directly as an RPubs publication.
We will use the devtools package to install the statsr package associated with this course. We need to install and load this package:

```r
install.packages("devtools")
library(devtools)
```

Now we can install the rest of the packages we will use during the course. Type the following commands in the Console as well:

```r
install.packages("dplyr")
install.packages("ggplot2")
install.packages("shiny")
install_github("StatsWithR/statsr")
```
Construct bootstrap confidence intervals using one of the following methods:
Recognize that when the bootstrap distribution is extremely skewed and sparse, the bootstrap confidence interval may not be appropriate. For a 90% confidence interval we would want to exclude 10% of samples outside of the confidence interval, i.e. 5% on each tail. With 100 samples, that means we just count off 5 points corresponding to 5 bootstrap sample statistics from each end of the bootstrap distribution to determine what the endpoints of the confidence interval are.
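The percentile method described above can be sketched in R. The data vector `x` below is made up purely for illustration (it is not from the quiz), and `quantile()` is used as a convenient way of counting off 5% of the bootstrap statistics from each end:

```r
# Percentile bootstrap CI sketch; `x` is illustrative data, not from the quiz.
set.seed(1)
x <- c(12, 15, 9, 14, 11, 13, 16, 10, 12, 14)

# 100 bootstrap sample means (resampling from the sample, with replacement)
boot_means <- replicate(100, mean(sample(x, replace = TRUE)))

# 90% interval: cut off 5% of the bootstrap statistics in each tail
ci <- quantile(boot_means, c(0.05, 0.95))
ci
```

With only 100 bootstrap statistics this amounts to counting off roughly 5 points from each end of the sorted bootstrap distribution, as described above.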
This question refers to the following:
It doesn’t make any sense to subtract each observation in one data set from the average of the other data set’s observations; instead, we subtract the paired observations from each other.
Bootstrap distributions are constructed by resampling from the sample, while sampling distributions are constructed by sampling from the population.
This question refers to the following:
With a small sample size our estimate of the standard error as \(s/\sqrt{n}\) is not reliable, since the sample standard deviation, \(s\), may not be a reliable estimate for the population standard deviation \(\sigma\) when the sample size is small. We make up for this by using the t distribution instead of the normal distribution.
This question refers to the following:
As the degrees of freedom increase, the t distribution approaches the normal distribution, and t distributions with lower degrees of freedom have heavier tails than t distributions with higher degrees of freedom.
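This convergence is easy to check numerically. The sketch below (the choice of the upper 2.5% cutoff is just for illustration) compares t quantiles for increasing degrees of freedom against the normal quantile:

```r
# As df grows, the t cutoff shrinks toward the normal cutoff
for (df in c(2, 10, 30, 100)) {
  cat(sprintf("df = %3d: t cutoff = %.3f\n", df, qt(0.975, df)))
}
qnorm(0.975)  # the limiting value, about 1.96
```

The heavier tails at low degrees of freedom show up as larger cutoff values.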
The test is a paired test because the same 10 items were bought at each store; i.e. each observation in one data set has a special correspondence to exactly one observation in the other data set.
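A paired comparison like this can be run with `t.test(..., paired = TRUE)`. The prices below are hypothetical, made up only to illustrate the structure (10 items, two stores), and the second call shows that a paired t-test is equivalent to a one-sample t-test on the within-item differences:

```r
# Hypothetical prices for the same 10 items at two stores (made-up numbers)
store_a <- c(3.2, 5.1, 2.8, 4.0, 6.5, 1.9, 3.3, 4.7, 5.0, 2.5)
store_b <- c(3.0, 5.4, 2.6, 4.2, 6.1, 2.0, 3.1, 4.9, 4.8, 2.7)

# Paired t-test on the item-by-item correspondence
t.test(store_a, store_b, paired = TRUE)

# Equivalent: one-sample t-test on the paired differences
t.test(store_a - store_b)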
This question refers to the following:
The degrees of freedom are \(df = 25 - 1 = 24\), the alternative hypothesis is "less than 100", and the significance level is 5%, so we can use the table to find the cutoff value corresponding to the lower 5% of the \(t_{24}\) curve.
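Rather than reading the cutoff from a table, it can be computed directly with `qt()`:

```r
# Lower 5% cutoff of the t distribution with 24 degrees of freedom
qt(0.05, df = 24)  # about -1.711
```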
List the conditions necessary for performing ANOVA and use graphical diagnostics to check if these conditions are met.
The success-failure condition is relevant for categorical variables, not for ANOVA, which compares means of a numerical variable.
This question refers to the following learning objective(s): Recognize that the test statistic for ANOVA, the F statistic, is calculated as the ratio of the mean square between groups (MSG, variability between groups) and the mean square error (MSE, variability within groups). Also recognize that the F statistic has a right-skewed distribution with two different measures of degrees of freedom: one for the numerator (\({df}_{G} = k - 1\), where \(k\) is the number of groups), and one for the denominator (\({df}_{E} = n - k\), where \(n\) is the total sample size).

The group degrees of freedom is the number of levels (categories) minus 1 (\(k - 1 = 5 - 1 = 4\)) and the error degrees of freedom is the sample size minus the number of levels (\(n - k = 45 - 5 = 40\)).
Recognize that the null hypothesis in ANOVA sets all means equal to each other, and the alternative hypothesis suggests that at least one mean is different.
The p-value is low, so we reject the null hypothesis.
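As a sketch of how such an ANOVA would be run in R, the simulated data below mirror the degrees of freedom discussed above (5 groups, 45 total observations); the data themselves are made up, not from the quiz:

```r
# Simulated one-way ANOVA: 5 groups of 9 observations each (made-up data)
set.seed(42)
grp <- factor(rep(1:5, each = 9))
y <- rnorm(45, mean = as.numeric(grp))  # group means differ by construction

fit <- aov(y ~ grp)
summary(fit)  # F = MSG/MSE, with df 4 (groups) and 40 (error)
```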
This question refers to the following:
where \(SE=s/\sqrt{n}\).
where \(SE=s/\sqrt{n}\). Note that \(\mu_{diff}\) is often 0, since often \(H_0 : \mu_{diff}=0\). As the sample size increases the standard error will decrease, which increases the test statistic, and hence decreases the p-value (the tail area).
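The effect of sample size on the test statistic can be illustrated numerically. The values of \(\bar{x}\) and \(s\) below are arbitrary (held fixed while \(n\) grows), not taken from the quiz:

```r
# Holding xbar and s fixed, a larger n shrinks SE, inflates T, and
# shrinks the p-value (illustrative numbers only)
xbar <- 2; s <- 5; mu_diff <- 0
for (n in c(10, 40, 160)) {
  t_stat <- (xbar - mu_diff) / (s / sqrt(n))
  p <- 2 * pt(-abs(t_stat), df = n - 1)
  cat(sprintf("n = %3d: T = %.2f, p-value = %.4f\n", n, t_stat, p))
}
```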
```r
alpha = 0.05
k = 5
comparisons <- k*(k-1)/2
alpha/comparisons
## [1] 0.005
```

This question refers to the following learning objective(s): Describe why conducting many t-tests for differences between each pair of means leads to an increased Type 1 Error rate, and we use a corrected significance level (Bonferroni correction, \(\alpha^{\star} = \alpha/K\), where \(K\) is the number of comparisons being considered) to combat inflating this error rate.
\(K = (5 \times 4)/2 = 10\), \(\alpha^{\star} = 0.05/10 = 0.005\).