Examtests with the 10th edition of Introduction to the Practice of Statistics by Moore, McCabe & Craig


What are distributions in the realm of statistics? - ExamTests 1

 

Multiple Choice questions

Question 1

Which of the below measures can be calculated from the five-number summary?

A. The mean
B. The interquartile range
C. The standard deviation
D. The variance

Question 2

Person X made a lot of practice exams for statistics. Because of this, X understands the material well and passes the exam. The variable ‘hours spent studying’ is an example of

A. A dependent variable
B. A normally distributed variable
C. An independent variable
D. Qualitative variable

Question 3

A teacher made a stemplot from the scores of 23 students on their statistics exam (range 0-100). In this stemplot it can be seen that the mode equals 61. Which of the below stemplots may be applicable?

A.
3 | 8
4 | 2 8
5 | 4 5 6 7
6 | 1 1 1 6
7 | 3 3 8 8
8 | 0 2 2 5 9
9 | 3 5 9

B.
3 | 8
4 | 2 3 8
5 | 4 5 5 5
6 | 0 0 1 6
7 | 3 3 8 8 9
8 | 0 2 5
9 | 3 5 9

C. None of the above.
D. Both.

Question 4

Which figure can best be used to check if a variable is normally distributed?

A. Q-Q plot
B. Barplot
C. Timeplot
D. Histogram

Question 5

Given are the scores on Statistics 1 for Psychology students. The five-number summary of these scores is: 4, 5, 6, 7, 9. Which statement is true?

A. The scores above the mode are less spread than the scores below the mode.
B. The scores above the mode are more spread than the scores below the mode.
C. The scores above the median are less spread than the scores below the median.
D. The scores above the median are more spread than the scores below the median.

Question 6

What cannot be deduced from a boxplot when the distribution of a variable is skewed?
A. The mean
B. The median
C. The interquartile range
D. The minimum

Question 7

What kind of plot is depicted below?

A. Density plot
B. Normal Quantile plot
C. Line plot
D. Residual plot

Question 8

The scores of 400 participants on an IQ-test have a mean of 300 and a standard deviation of 30. The researcher wants to linearly transform the scores, so that the mean is 100 and the standard deviation is 15. What should the researcher do to obtain this?

A. Divide all scores by 2
B. Divide all scores by 3
C. Divide all scores by 2 and subtract 50 from each value
D. Divide all scores by 2 and subtract 100 from each value

Question 9

Which of the statements below is or are true?

I. The standard deviation is resistant.
II. The standard deviation is zero when no outliers are present.

A. Only statement I is true.
B. Only statement II is true.
C. Both statements are true.
D. Both statements are false.

Question 10

The distribution of house selling prices appears to be right skewed. The mean house price is 223500 euros. Hence, the median is

A. Lower than 223500
B. Equal to 223500
C. Higher than 223500
D. The median cannot be determined based on this information alone

Question 11

Given are the test scores with mean equal to 100 and standard deviation equal to 30. A researcher wishes to transform the data in such a way that the standard deviation becomes 15, but that the mean remains equal to 100. Which transformation should the researcher use to achieve this?

A. Y = 0.50X
B. Y = 0.50X + 50
C. Y = 2X
D. That is not possible

Question 12

Given is the following five-number summary: 20, 25, 28, 35 and 55. Which of the following scores can be regarded as an outlier according to the 1.5-IQR criterion?

A. 15
B. 55
C. Both 15 and 55
D. None of the above

Question 13

A researcher wishes to describe his data with two summary measures: one center measure and one measure for spread. Which measures could he use best, if he strives to use robust measures?

A. Mean and standard deviation
B. Mean and IQR
C. Median and standard deviation
D. Median and IQR

Question 14

A researcher collected data of 500 participants about their monthly gross income and gasoline costs per month. A Q-Q plot has been made for these data. Which of the following conclusions is true?

A. The monthly gross income correlates strongly with the monthly gasoline costs.
B. The monthly gross income appears to be normally distributed.
C. The monthly gross income does not correlate strongly with the monthly gasoline costs.
D. The monthly gross income appears not to be perfectly normally distributed

Question 15

A researcher collected data about the living situation of students and assigned them to four categories: independent (studio), living together with partner, living together with other students (student home), with parents. The researcher wants to display the data graphically. What figure can best be used to display the data?

A. Boxplot
B. Stemplot
C. Bar chart
D. Scatterplot

Question 16

In an international study of men and women from several countries, it is examined to what extent gross income can be predicted from education. What is the independent variable here?

A. Nationality
B. Sex
C. Gross income
D. Education

Question 17

What does an IQR of 16 imply?

A. That the middle 50% of the scores are spread over a range of 4 points.
B. That the middle 50% of the scores are spread over a range of 8 points.
C. That the middle 50% of the scores are spread over a range of 16 points.
D. That the middle 50% of the scores are spread over a range of 32 points.

Question 18

Data are collected for 1500 children about a writing test. It is measured how long each child needs to write a certain text. It is assumed that the variable ‘time’ is normally distributed in the population. From a random sample of 2500 children, 95% children score between 5 and 9 minutes. Which of the below statements is true?

I. The standard deviation in the sample is likely to be 1.
II. The mean in the sample is likely to be 7.

A. Only statement I is true
B. Only statement II is true
C. Both statements are true
D. Both statements are false

Question 19

Due to falling leaves, travelers from NS (Nederlandse Spoorwegen) had to deal with much delay last weekend. Given is the delay in minutes in the past weekend for a random sample of 100 travelers. The data are displayed in a boxplot below. What does the red square display?

A. The median
B. The position of the median after removing outliers
C. The IQR
D. The mean

Question 20

Given are the scores on variable X. A researcher wants to linearly transform the raw data by multiplying each score by 1 and then adding 20. What does change because of this transformation, and what does not change?

A. The shape of the distribution and the mean do not change, but the standard deviation becomes 20 points higher.
B. The shape of the distribution and the standard deviation do not change, but the mean becomes 20 points higher.
C. The shape of the distribution does not change, but the mean and standard deviation become 20 points higher.
D. The distribution will be more normally distributed, the mean and standard deviation become 20 points higher.

Question 21

In a questionnaire the following item is present: ‘How often did you wash your hair in the past week?’. This MC-question consists of the following response categories: 1 = not, 2 = once, 3 = twice, 4 = three times, 5 = four times or more. What is the highest meaningful measurement level of this variable?

A. Nominal
B. Ordinal
C. Interval
D. Ratio

Question 22

For 800 students, data are collected about which sports they primarily play. Results are presented in the pie chart below. Based on this information, how many students play rugby?

 

A. 7
B. 56
C. 80
D. 560

Question 23

The ages of 500 attendees of a Justin Bieber concert are displayed in the table below. What is the median age?

Age | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 22
Number of attendees | 32 | 83 | 90 | 100 | 87 | 32 | 16 | 56 | 4

A. 11
B. 11.5
C. 12
D. 12.2

Question 24

Three children of age 1, 3 and 5 are present in a room. If a 3-year-old enters the room, how does this influence the mean and standard deviation?

A. The mean remains equal, the standard deviation increases.
B. The mean remains equal, the standard deviation decreases.
C. The mean and standard deviation remain equal.
D. The mean and standard deviation decrease

Question 25

A teacher receives the following grades from 5 students: 4, 6, 7, 7, 8. What is the variance for these scores?

A. 0
B. 0.76
C. 1.40
D. 2.30

Question 26

When is it better to use the five-number summary instead of the mean and standard deviation to describe the distribution of a variable?

A. Never, the mean and standard deviation are always better.
B. When the distribution of the variable is fairly symmetric.
C. When the distribution of the variable is strongly skewed with strong outliers.
D. When the distribution of the variable is slightly skewed without outliers

Question 27

A random variable X has a mean of 10 and a standard deviation of 2. The variable X is multiplied by 2 to create Y: Y = 2X. What is the variance of the new variable Y?

A. 2
B. 4
C. 16
D. 32

Question 28

In a study it appears that people who drink more beer, are less often sick. In addition, it appears that people that drink more beer, also drink more orange juice. The variables “drinking beer” and “drinking orange juice” are ………… variables as explanation for being less often sick.

A. Skewed
B. Normally distributed
C. Explanatory
D. Confounding

Question 29

A group of students thinks that drinking orange juice is good for physical recovery. To test this hypothesis, the students visit a retirement home weekly and talk with the elderly while drinking some orange juice. After a couple of weeks, the elderly are happy and healthy. What is the explanatory variable in this study?

A. Orange juice
B. The living situation (retirement home)
C. The emotional well-being of the elderly
D. All of the above answers

Question 30

In a large-scale study in The United States, various variables have been measured. Which of the following variables is a nominal variable?

A. The state in which one lives
B. The age of the respondent
C. The number of people within a household
D. The annual income of a household per year

Question 31

What can one use best to examine to what extent the scores on two variables are equal?

A. The correlation
B. Kendall’s tau
C. The IQR
D. The mean absolute difference

Question 32

Kees has put the scores of 10 participants on a certain test in a stemplot. He now wants to expand the figure by adding the distinction between men and women. Which figure can Kees use best?

A. A scatterplot
B. A histogram
C. A time plot
D. A back-to-back stemplot

Answers

Question | Answer | Explanation
1 | B | The interquartile range is the third quartile minus the first quartile: IQR = Q3 - Q1. Both quartiles are part of the five-number summary.
2 | C | The variable ‘hours spent studying’ explains (partly) whether or not someone passes the exam and is therefore an independent variable (also called an explanatory variable). This reveals nothing about the distribution of the variable, so no claims can be made about it.
3 | A | For the first stemplot, the median (middle number) is 73 and the mode (most frequent number) is 61. For the second stemplot, the median is 66 and the mode is 55. Hence only stemplot A has a mode of 61.
4 | A |
5 | D | The median is 6, the minimum score is 4 and the maximum score is 9. The lower half of the scores therefore ranges from 4 to 6 and the upper half from 6 to 9, so the spread is larger above the median. The five-number summary gives no direct information about the mode.
6 | A | A boxplot shows the median, Q1 and Q3, and outliers if present. If the distribution of a variable is skewed, the mean does not equal the median and hence the mean cannot be deduced directly from the boxplot.
7 | B |
8 | C | x_new = a + bx. Multiplying each observation by b (here: b = 0.5) multiplies both the measures of center (e.g. the mean) and the standard deviation by |b|. Adding the same number a (here: a = -50) to all observations adds a to the measures of center, but does not change the measures of spread. This gives a mean of 0.5 * 300 - 50 = 100 and a standard deviation of 0.5 * 30 = 15.
9 | D | The standard deviation is influenced by outliers and hence is not resistant; a few outliers can make the standard deviation very large. The standard deviation is zero only when there is no spread, i.e. when all observations have the same value. The absence of outliers does not imply this.
10 | A | The mean is ‘pushed’ towards the tail, because it is influenced more by extreme scores. The median is influenced less by extreme scores and hence is lower than the mean.
11 | B | First adapt the standard deviation: s_new = |b| * s_old, so 15 = |b| * 30 gives b = 0.5. Next, adapt the mean: 100 = 0.5 * 100 + a gives a = 50.
12 | B | IQR = 35 - 25 = 10 points and 1.5 * IQR = 15, so outliers lie below 25 - 15 = 10 or above 35 + 15 = 50. Hence 55 is an outlier and 15 is not.
13 | D | The median and the IQR are relatively robust measures.
14 | D | A normal Q-Q plot shows how closely a single variable follows the normal distribution; it says nothing about the correlation between two variables. The points do not lie perfectly on the diagonal line, so only D is true.
15 | C | This is a qualitative (categorical) variable. Of the listed figures, only the bar chart can be used to display a categorical variable; the other figures are used for quantitative variables.
16 | D | The independent variable is the variable that one uses to try to explain the dependent variable.
17 | C |
18 | C | When the population is normally distributed, the sample is likely to be approximately normally distributed as well. According to the 68-95-99.7 rule, the interval from 2 standard deviations below to 2 standard deviations above the mean comprises 95% of the scores. The interval from 5 to 9 therefore spans 4 standard deviations, so 1 standard deviation equals approximately 1. The mean in the sample lies around 7.
19 | D | A is false, because the median is the middle line in the box. B is false, because the median would be lower after removing the outliers. C is false, because the IQR is the length of the box. D is correct: this is a right-skewed distribution, which implies that the mean lies to the right of (i.e. is higher than) the median.
20 | B |
21 | B | This is a categorical variable, so C and D are false. Because the categories have a rank order, ordinal is the highest measurement level.
22 | B | 7%, so 0.07 * 800 = 56.
23 | C | The median is at position (500 + 1)/2 = 250.5, so between the 250th and the 251st observation. Both of these observations have age 12.
24 | B | The extra child has exactly the mean age, so the mean does not change. The sum of squared deviations also remains equal, but it is divided by a larger number, which results in a lower variance (and standard deviation).
25 | D |

First, calculate the mean.
\[ \bar{x} = \frac{4 + 6 + 7 + 7 + 8}{5} = 6.4 \]
Next, calculate the sum of squared deviations of each score from the mean:
\[ \sum (x - \bar{x})^{2} = (4 - 6.4)^{2} + (6 - 6.4)^{2} + (7 - 6.4)^{2} + (7 - 6.4)^{2} + (8 - 6.4)^{2} = 5.76 + 0.16 + 0.36 + 0.36 + 2.56 = 9.2 \]
Finally, divide by n - 1. Do not take the square root; that would give the standard deviation instead of the variance.
\[ s^{2} = \frac{9.2}{4} = 2.30 \]

26 | C |
27 | C |

\[ \sigma_{a+bX} = |b| \sigma_{X} \]
Hence:
\[ \sigma^{2}_{a+bX} = b^{2} \sigma^{2}_{X} \]
So the variance of Y is:
\[ \sigma^{2}_{Y} = 2^{2} \cdot 2^{2} = 4 \cdot 4 = 16 \]

28 | D |
29 | A |
30 | A |
31 | D | The correlation and Kendall’s tau provide information about the association between variables; a strong association does not necessarily imply equal scores. The IQR provides information about the spread of scores, which again does not imply equal scores. The mean absolute difference measures directly how far the paired scores are apart.
32 | D |
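
The arithmetic behind answers 8, 11, 12, 25 and 27 can be checked with a few lines of Python. This is only a minimal sketch using the standard library; the helper names `iqr_fences` and `transform_summary` are made up for this illustration and do not come from the book.

```python
from statistics import mean, variance

# Question 25: sample variance = sum of squared deviations / (n - 1).
grades = [4, 6, 7, 7, 8]
sum_sq = sum((x - mean(grades)) ** 2 for x in grades)
var_grades = sum_sq / (len(grades) - 1)


def iqr_fences(q1, q3):
    """Return the lower and upper fence of the 1.5 * IQR criterion."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def transform_summary(mean_x, sd_x, a, b):
    """Mean and standard deviation of y = a + b * x."""
    return a + b * mean_x, abs(b) * sd_x


# Question 12: quartiles 25 and 35 from the five-number summary.
low, high = iqr_fences(25, 35)

print("variance of grades:", round(var_grades, 2))
print("library check:", round(variance(grades), 2))
print("outlier fences:", low, high)
print("question 8:", transform_summary(300, 30, a=-50, b=0.5))
print("question 11:", transform_summary(100, 30, a=50, b=0.5))
print("question 27 variance:", transform_summary(10, 2, a=0, b=2)[1] ** 2)
```

A score counts as an outlier under this criterion when it falls below `low` or above `high`, which is why 55 qualifies in question 12 and 15 does not.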

What are statistical relationships? - ExamTests 2

 

Multiple Choice questions

Question 1

A regression analysis is performed in SPSS with the variables ‘education’ (in years) and income. The output is presented in the table below. What are the a and b here in the regression formula ŷ = a + bx?

A. a = -1636.364 and b = 237.063
B. a = 237.063 and b = -1636.364
C. a = -0.606 and b = 1.495
D. a = -1636.364 and b = -0.606

Model | | Unstandardized Coefficients: B | Unstandardized Coefficients: Std. Error | Standardized Coefficients: Beta | t | Sig.
1 | (Constant) | -1636.364 | 2699.962 | | -.606 | .561
  | Education | 237.063 | 158.575 | .467 | 1.495 | .173

Question 2

What does one try to minimize when fitting the regression line of Y on X in a scatterplot?

A. The sum of squares of the horizontal distances from the points to the regression line.
B. The sum of squares of the vertical distances from the points to the regression line.
C. The sum of squares of the shortest distances from the points to the regression line.
D. The sum of squares of the horizontal and vertical distances from the points to the regression line.

Question 3

Given is that the correlation between X and Y equals 0.6. Furthermore, the mean of X equals 3, and the mean of Y equals 5. The standard deviation of both X and Y equals 1. What are a and b in the regression equation ŷ = a + bx?

A. a = 0 and b = 0.6
B. a = 0.6 and b = 0
C. a = 0.6 and b = 3.2
D. a = 3.2 and b = 0.6

Question 4

The correlations between four variables are calculated and displayed in the table below. A researcher wants to make a linear regression equation to predict the exam mark on the basis of one other variable. Considering the output below, which variable is the best predictor of the exam mark?

A. Hours_studied
B. Hours_Netflix
C. Previous_exam_mark
D. That cannot be determined based on correlational values only

Correlations
  | | Exam grade | Hours of study | Hours of Netflix | Previous exam grade
Exam grade | Pearson correlation | 1 | -.277 | -.952** | .533
  | Sig. (2-tailed) | | .438 | .000 | .113
  | N | 10 | 10 | 10 | 10
Hours of study | Pearson correlation | -.277 | 1 | .377 | .394
  | Sig. (2-tailed) | .438 | | .283 | .260
  | N | 10 | 10 | 10 | 10
Hours of Netflix | Pearson correlation | -.952** | .377 | 1 | -.379
  | Sig. (2-tailed) | .000 | .283 | | .280
  | N | 10 | 10 | 10 | 10
Previous exam grade | Pearson correlation | .533 | .394 | -.379 | 1
  | Sig. (2-tailed) | .113 | .260 | .280 |
  | N | 10 | 10 | 10 | 10
** Correlation is significant at the 0.01 level (2-tailed).

Question 5

In a study regarding the relation between being overweight and visiting the G.P. (General Practitioner) it is shown that people who are overweight visit the G.P. more often than people with a healthy weight. This finding indicates that

A. Being overweight causes visiting the G.P.
B. People who are overweight will visit the G.P. less frequently when they lose weight.
C. There is a connection between being overweight and visiting the G.P.
D. Among people who are overweight, many people visit the G.P.

Question 6

Given are the scores of 100 participants on variables X and Y. It is known that the variance of X equals 4, and that the variance of Y equals 9. The covariance of X and Y equals 3. What is the correlation between X and Y?

A. 0.08
B. 0.25
C. 0.50
D. 0.75

Question 7

In a study regarding the relation between teeth and memory (Algemeen Dagblad, 2004) it is found that people who still have their own teeth have a better memory than people with a denture. Based on this finding, the researchers conclude that ‘teeth are of utmost importance for our memory’. However, a critic argues that the connection that is found can be explained easily by lurking variables (third variables). Which variable(s) can play the role of third variable in this case?

A. Having a denture (fake teeth)
B. Age
C. Memory
D. All three of the above variables

Question 8

The scores of 20 persons on variables X and Y are plotted in the figure below. Of these 20 persons, one person is quite striking. Are the scores of this person an influential point?

A. Yes, because removing this person results in a considerable change for the correlation between X and Y.
B. Yes, because the score of this person on variable Y is clearly an outlier.
C. No, because removing this person does not result in a change for the correlation between X and Y.
D. No, because the scores of this person on X and Y are clearly no outliers.

Question 9

The correlation between variables X and Y appears to be exactly 1.0. What can you conclude, based on this information?

A. The mean absolute difference equals 0
B. The slope of the regression equation equals 0
C. The scores on X equal the scores on Y
D. The scores on Y are a linear transformation of the scores on X

Question 10

Given are two variables X and Y. To predict Y from X, the following regression equation is made: ŷ = −9 + 3.2X. The correlation between X and Y is 1.0. Consider that someone scores -9 on Y. What can be said about the residual y − ŷ?

A. The residual is positive
B. The residual is negative
C. The residual will be zero
D. No statement can be made about the residual based on this information

Question 11

The correlation between variables X and Y equals -0.40. Both X and Y have a mean of 30. The standard deviation of X equals 6. The standard deviation of Y equals 3. What is the intercept in the regression equation of Y on X?

A. 6
B. 24
C. 36
D. 54

Question 12

The correlation between variables X and Y equals 0. Below are four conclusions that are drawn based on this information. Which conclusion is false?

A. There is no linear relation between X and Y.
B. The scores on X and Y are identical.
C. The regression equation provides a horizontal line (slope equals zero).
D. There is 0% explained variance for a linear regression of Y on X.

Question 13

In which situation is there a Simpson’s paradox present?

A. Hospital X has a lower death rate for terminal patients, whereas hospital Y has a lower death rate for non-terminal patients. When we do not consider whether the patient is terminal or not, hospital X has a lower death rate.
B. Hospital X has a lower death rate for terminal patients, whereas hospital Y has a lower death rate for non-terminal patients. When we do not consider whether the patient is terminal or not, hospital Y has a lower death rate.
C. Hospital X has a lower death rate for terminal patients, and hospital X has a lower death rate for non-terminal patients. When we do not consider whether the patient is terminal or not, hospital X has a lower death rate.
D. Hospital X has a lower death rate for terminal patients, and hospital X has a lower death rate for non-terminal patients. When we do not consider whether the patient is terminal or not, hospital Y has a lower death rate.

Question 14

What is a reasonable estimate of the correlation between body height (in centimeters) and shoe size, according to the scatterplot that is displayed below? [Schoenmaat = shoe size / lichaamslengte = body height]

A. -0.70
B. -0.10
C. 0.10
D. 0.70

Question 15

The following linear regression equation is set up: y = 10 + 0.8x in which y is the end score on a test, and x the partial score. Marleen scored 80 on her partial test. What is her predicted end score?

A. 64
B. 72
C. 74
D. 80

Question 16

Someone examines the association between the body height of women and that of their date partners. The table below displays the body height of six women and their dates in inches (1 inch ≈ 2.5 cm).

Height of woman | 64 | 65 | 65 | 66 | 66 | 68
Height of date | 68 | 69 | 69 | 70 | 72 | 73

Which of the following statements is true?

A. Each body height above 66 inches should be considered an outlier
B. There is a strong positive association between the body height of the women and the body height of their date
C. There is a strong negative association between the body height of the women and the body height of their date
D. If the body height of the women and their dates had been expressed in centimetres, the correlation would be 2.5 times larger

Question 17

In a study about the association between gender and income, the correlation between these two variables appears to be r = -0.61. Which statement is true?

A. Women earn more than men.
B. Men earn more than women.
C. A mistake has been made; the correlation should be positive.
D. The measurement is pointless; r can only be determined for two quantitative variables.

Question 18

Many high-school students in the United States take the SAT and/or the ACT for admission to further education. Data are collected for 60 students who took both tests.

  • The SAT had an average of 888 with a standard deviation of 180.
  • The ACT had an average of 25 with a standard deviation of 5.
  • The correlation between the SAT and ACT is 0.851.

A researcher wants to predict the SAT score from the ACT score by using a linear regression equation. What is the least-squares regression line y = a + bx for these data?

A. y = 122.10 + 30.636x
B. y = 30.636 + 122.10x
C. y = 0.024 + 3.725x
D. y = 3.725 + 0.024x

Question 19

A least squares regression line is estimated for a variable. One of the data-points has a positive residual. Which statement is true?

A. The correlation between all predicted and observed data points is positive
B. This data-point lies above the regression line
C. This data-point has to be an influential point
D. This data-point lies at the right side of the scatterplot

Answers

Question | Answer | Explanation
1 | A | a is the intercept (the B of the constant), b is the slope (the B of education).
2 | B |
3 | D | \[ b = r_{xy} \frac{s_{y}}{s_{x}} = 0.6 \cdot \frac{1}{1} = 0.6 \]
\[ a = \bar{y} - b \bar{x} = 5 - 0.6 \cdot 3 = 3.2 \]
4 | B | r² = (-0.952)² = 0.906. Thus, the hours spent watching Netflix explain about 90% of the variance of the exam mark, more than any other variable in the table.
5 | C | A and B imply a causal relation. D is wrong, because it may be that among people who are overweight, only a small proportion visits the G.P., but that this proportion is larger than among people with a healthy weight. Thus, the finding tells you something about the relative number of G.P. visits, not about the absolute number.
6 | C |

\[ r_{xy} = \frac{cov(x,y)}{s_{x} s_{y}} = \frac{3}{\sqrt{4} \cdot \sqrt{9}} = \frac{3}{2 \cdot 3} = \frac{3}{6} = 0.5 \]

7 | B | A lurking variable is a variable, other than the explanatory or response variable, that influences the relation between the studied variables. Having a denture and memory are the studied variables themselves, so only age can be a third variable here.
8 | A |
9 | D | The correlation tells you to what extent all points lie on one line: a correlation of 1 means that all points lie perfectly on one (rising) line. This, however, does not necessarily imply that all scores are equal, or that the slope is 1. When the scores are not equal, the mean absolute difference is not zero.
10 | C | A correlation of 1 implies that all points lie perfectly on one line (see also question 9). This implies that all residuals are zero.
11 | C |

\[ b = \frac{s_{y}}{s_{x}} r_{xy} = \frac{3}{6} \cdot (-0.40) = -0.20 \]

\[ a = \bar{y} - b \bar{x} = 30 - (-0.20) \cdot 30 = 30 - (-6) = 30 + 6 = 36 \]

12 | B | A correlation of zero implies that there is no linear association between the variables, so A and C are true. The proportion of explained variance is r² and hence is also zero, so D is true. B is false.
13 | D | Discussed in class. See also page 143-145 of the book for a detailed explanation and a different example. Moral: an association that seems to be present can reverse when a third (lurking) variable is taken into account.
14 | D | The regression line has a positive slope, so there is a positive correlation. Moreover, there is a reasonably strong association between body height and shoe size. Answer D is the best approximation.
15 | C | y = 10 + 0.8 * 80 = 74
16 | B |
17 | D |
18 | A | With the SAT as response variable (y) and the ACT as explanatory variable (x):
\[ b = r \frac{s_{SAT}}{s_{ACT}} = 0.851 \cdot \frac{180}{5} = 30.636 \]
\[ a = \bar{y}_{SAT} - b \bar{x}_{ACT} = 888 - 30.636 \cdot 25 = 122.1 \]
19 | B |
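
The regression answers above (3, 6, 11, 15 and 18) all follow from the same two formulas, b = r · s_y / s_x and a = ȳ − b · x̄. The sketch below recomputes them in Python; the function names `regression_from_summary` and `correlation_from_cov` are hypothetical helpers for this illustration, and question 18 is computed with the SAT as response variable, as in the answer key.

```python
import math


def regression_from_summary(mean_x, mean_y, sd_x, sd_y, r):
    """Intercept a and slope b of the least-squares line y = a + b * x."""
    b = r * sd_y / sd_x
    a = mean_y - b * mean_x
    return a, b


def correlation_from_cov(cov_xy, var_x, var_y):
    """Correlation = covariance divided by the product of the standard deviations."""
    return cov_xy / (math.sqrt(var_x) * math.sqrt(var_y))


# Question 3: means 3 and 5, both standard deviations 1, r = 0.6.
a3, b3 = regression_from_summary(3, 5, 1, 1, 0.6)

# Question 6: variances 4 and 9, covariance 3.
r6 = correlation_from_cov(3, 4, 9)

# Question 11: both means 30, s_x = 6, s_y = 3, r = -0.40.
a11, b11 = regression_from_summary(30, 30, 6, 3, -0.40)

# Question 15: predicted end score for a partial score of 80.
pred15 = 10 + 0.8 * 80

# Question 18: x = ACT (mean 25, sd 5), y = SAT (mean 888, sd 180).
a18, b18 = regression_from_summary(25, 888, 5, 180, 0.851)

print("Q3:", round(a3, 3), round(b3, 3))
print("Q6:", r6)
print("Q11:", round(a11, 3), round(b11, 3))
print("Q15:", pred15)
print("Q18:", round(a18, 3), round(b18, 3))
```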

How to collect data for the purpose of statistics? - ExamTests 3

 

Multiple Choice questions

Question 1

What is an example of a matched-pairs design with two conditions?

A. Each participant is matched to a similar participant. These two participants are allocated randomly to a condition and compared.
B. Each participant is allocated to both conditions. The order of the allocation is randomly selected per participant.
C. None of the above
D. Both

Question 2

A random sample is a sample in which

A. The participants are drawn randomly from the population
B. The conditions are allocated randomly to participants
C. The conditions are selected randomly
D. The conditions are allocated in a random order to participants

Question 3

Which of the following statements about experimental research is true?

I. The independent variable is manipulated by the researcher.
II. It is possible to examine a causal relationship with an experimental design

A. Only statement I is true.
B. Only statement II is true.
C. Both statements are true.
D. Both statements are false.

Question 4

A researcher examines the association between income and education. When collecting the data, the researcher wishes to take into account the 50-50 distribution for male/female that is present in the population as well as the 30-60-10 distribution for socioeconomic status (SES). Therefore, the researcher divides the population according to sex and SES, and draws a random sample in which the size of each group is proportional to its share of the population. What kind of sample does the researcher use?

A. Convenience sample
B. Stratified sample
C. Multistage sample
D. Paired sample

Question 5

Anneloes has a cold. Her room mate uses a garlic tablet every day and has not had a cold for over a year now. The aunt of Anneloes knows someone who also uses garlic tablets daily and has not had a cold for a year. Based on this, Anneloes decides to use garlic tablets as soon as she is recovered from her cold. On which kind of study is her decision based?

A. Anecdotal evidence
B. An observational study based on available evidence
C. An observational study based on a sample
D. An experiment

Question 6

The association between drinking Pepsi and weight gain is examined. The study divided 25 participants into two groups: one group followed a Pepsi-free diet and one group followed a Pepsi-rich diet. After 8 weeks the weight gain of each participant is determined. This study is an example of a(n)

A. Observational study
B. Survey
C. Matched-pairs experiment
D. Experiment, which is not double-blind

Question 7

Geertje wants to examine price differences of coffee milk between Albert Heijn, Jumbo and De Spar. How can Geertje best select the products to prevent bias as well as possible?

A. Buy the most bought coffee milk
B. Buy coffee milk of the famous brands
C. Buy both the most bought and famous brands
D. Randomly select a number of available products

Question 8

In a study regarding Ritalin, 100 participants are first divided according to gender. Next, half of the male participants (randomly selected) is assigned the Ritalin, and the other half is assigned a placebo. Equally, half of the female participants (randomly selected) is assigned the Ritalin, and the other half is assigned a placebo. This is an example of

A. Replication
B. Matched-pairs design
C. Confounding, because the effect of gender is confounded with the effect of Ritalin
D. Block-design

Answers chapter 3

Question | Answer | Explanation
1 | D | A matched-pairs design can mean either that matched participants are randomly allocated to different conditions (A), or that each participant undergoes both conditions in a random order (B).
2 | A |
3 | C |
4 | B | The population is subdivided into ‘strata’. Next, a random sample is drawn from each stratum, in proportion to its size. As a result, the population proportions are preserved in the sample.
5 | A |
6 | D |
7 | D |
8 | D |
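
The stratified sample from question 4 can be sketched in Python as follows. This is only an illustration under stated assumptions: the population of 1000 people, the SES labels "low", "middle" and "high", and the helper `stratified_sample` are all made up for this example; only the 50-50 and 30-60-10 proportions come from the question.

```python
import random
from collections import Counter


def stratified_sample(population, stratum_of, n, seed=0):
    """Draw a proportional stratified random sample of roughly n units."""
    rng = random.Random(seed)
    # Group the population into strata.
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    # Draw a simple random sample from each stratum, sized by its share.
    sample = []
    for members in strata.values():
        k = round(n * len(members) / len(population))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical population: 50-50 by sex, 30-60-10 by SES within each sex.
population = []
for sex in ("male", "female"):
    for ses, count in (("low", 150), ("middle", 300), ("high", 50)):
        for _ in range(count):
            population.append({"id": len(population), "sex": sex, "ses": ses})

sample = stratified_sample(population, lambda p: (p["sex"], p["ses"]), n=100)
counts = Counter((p["sex"], p["ses"]) for p in sample)
for stratum, size in sorted(counts.items()):
    print(stratum, size)
```

Because each stratum size is rounded separately, the total can differ slightly from n when the proportions do not divide evenly; with the numbers assumed here it is exactly 100.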

What is probability theory? - ExamTests 4

 

Multiple Choice questions

Question 1

Given are the scores on variable X with a mean of 10 and a standard deviation of 2. Based on this information, we can calculate Y as Y = 10 – 2X. The standard deviation of Y equals

A. 2
B. 4
C. 16
D. 32

Question 2

Given are two events A and B. It is known that P(B) = 0.6, P(A and B) = 0.3 and P(A or B) = 1.0. What is the chance that A occurs, i.e. P(A)?

A. 0.1
B. 0.3
C. 0.6
D. 0.7

Question 3

Given are two events A and B. It is known that P(A) = 0.3 and P(B) = 0.5 and P(B|A) = 0.8. What is the chance of P(A and B)?

A. 0.15
B. 0.24
C. 0.40
D. 0.48

Question 4

A fair die is thrown twice. What is the chance that the sum of these two throws equals 12?

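Question 5

The table below classifies 100 elderly people by living situation and loneliness.

  | Lonely | Not lonely | Total
In retirement home | 40 | 30 | 70
Living independently | 10 | 20 | 30
Total | 50 | 50 | 100

What is the chance that an elderly person, of whom it is known that he or she lives in a retirement home, is lonely?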

A. 40/70
B. 40/100
C. 50/100
D. 70/100

Question 6

People who are psychotic are often depressed too. To examine this relationship, we collected information on 100 patients. In this sample, 30% of the patients are psychotic. Of the psychotic patients, 80% are depressed. Of the patients who are not psychotic, only 20% are depressed. How many patients from this sample are psychotic and depressed?

A. 20
B. 24
C. 30
D. 80

Question 7

When event A and B are independent, then:

A. P(A|B) = 0
B. P(A and B) = 0
C. Both A and B
D. None of the above

Question 8

It is given that 25% of people have a vitamin deficiency. Of the people with a vitamin deficiency, 80% test positive on a certain test. Of the people without a vitamin deficiency, 10% nevertheless test positive. What is the chance that someone with a positive test result actually has a vitamin deficiency?

A. 20%
B. 73%
C. 80%
D. 90%

Question 9

Given is the probability distribution of variable X below. The mean of X equals 2.5. What is the standard deviation of X?

X | 1 | 2 | 3 | 4
P | .30 | .20 | .20 | .30

A. 1.20
B. 1.45
C. 1.80
D. 2.00

Question 10

The following information is provided: P(A) = 0.40 and P(B) = 0.30. Moreover, it is given that A and B are independent events. What is the probability of A, given B?

A. 0.12
B. 0.30
C. 0.40
D. More information is needed to determine this

Question 11

In a certain drinking game, you have to drink if you throw a 1. Someone joins this game for three rounds and throws the same fair die each round. What is the chance that this person has to drink exactly once?

A. \[ 1 * \left(\frac{1}{6}\right)^{1} * \left(\frac{5}{6}\right)^{2} \]
B. \[ 3 * \left(\frac{1}{6}\right)^{1} * \left(\frac{5}{6}\right)^{2} \]
C. \[ \binom{3}{2} \]
D. More information is needed to determine this

Question 12

Imagine two independent events A and B with P(A) = 0.5 and P(B) = 0.2. What is the chance that neither A nor B happens?

A. 0.1
B. 0.3
C. 0.4
D. 0.7

Question 13

Suppose you throw a fair die twice. What is the chance that you throw the same number both times?

A. 1/6
B. 1/12
C. 1/18
D. 1/36

Question 14

Below you find the probability distribution of X. X is the number of courses attended by full-time students in the last period.

X | 1 | 2 | 3 | 4
P | .20 | .30 | .20 | .30

What is the average number of attended courses by full-time students in the last period?

A. 0.65
B. 2
C. 2.6
D. 1.10

Question 15

And what is the standard deviation of X, as presented in question 14?

A. 0.32
B. 0.64
C. 1.04
D. 1.10

Question 16

Hans is often hired to fix computer problems, such as removing viruses. Currently, two viruses are circulating: Dummy and Smarty. The following information is provided:

  • 65% of the customers have problems with virus Dummy, and 35% of the customers have problems with virus Smarty.
  • If the computer is infected with Dummy, there is an 80% chance that Hans can fix the problems.
  • If the computer is infected with Smarty, there is a 30% chance that Hans can fix the problems.

If we randomly select a computer that we know Hans fixed, what is the chance that this computer was infected with Dummy?

A. 0.52
B. 0.53
C. 0.63
D. 0.83

Question 17

Given are two disjoint events A and B. The probability of A is 0.2. The probability of B is 0.8. What is P(A or B)?

A. 0.6
B. 0.8
C. 1.0
D. More information is needed

Answers

Question | Answer | Explanation
1 | B | var(Y) = (−2)^2 * var(X) = (−2)^2 * 2^2 = 4 * 4 = 16, so sd(Y) = √var(Y) = √16 = 4.
2 | D | P(A or B) = P(A) + P(B) − P(A and B). Filling in this formula yields 1.0 = x + 0.6 − 0.3, so x = 1.0 − 0.6 + 0.3 = 0.7.
3 | B | P(A and B) = P(B | A) * P(A) = 0.8 * 0.3 = 0.24
4 | A | The sum of the two throws equals 12 only when a 6 is thrown both times. Hence: 1/6 * 1/6 = 1/36.
5 | A | The question asks for a conditional probability (given that the person lives in a retirement home): 40 of the 70 residents are lonely.
6 | B | 30% are psychotic, that is 30/100 * 100 = 30 patients. Of these 30 patients, 80% are depressed. That is: 80/100 * 30 = 24 patients. Tip: draw a tree diagram.
7 | D | Independent means that event A does not influence the probability of event B, and vice versa. So P(A|B) = P(A) and P(B|A) = P(B), and P(A and B) = P(A) * P(B). Neither of these is 0 in general.
8 | B | Make a tree diagram. Consider n = 1000 participants. Then 250 have a vitamin deficiency, of whom 250 * 0.8 = 200 test positive. Of the 750 without a deficiency, 750 * 0.1 = 75 test positive. In total 275 test positive, so 200/275 * 100 ≈ 73%.
9 | A | Var = 0.30 * (1 − 2.5)^2 + 0.20 * (2 − 2.5)^2 + 0.20 * (3 − 2.5)^2 + 0.30 * (4 − 2.5)^2 = 1.45, so SD = √1.45 ≈ 1.20.
10 | C | A and B are independent, so B does not predict anything about A: P(A|B) = P(A) = 0.40.
11 | B | Drinking exactly once means drinking in the first, the second or the third round. That gives "3 choose 1" (= 3) possible orders, each with probability 1/6 * (5/6)^2.
12 | C | P(not A and not B) = P(not A) * P(not B) = (1 − 0.5) * (1 − 0.2) = 0.5 * 0.8 = 0.4
13 | A | The chance to throw any given number is 1/6. The chance to throw that number twice is 1/6 * 1/6 = 1/36. This can happen for all 6 numbers, so multiply by 6: 6/36 = 1/6.
14 | C | μ = 1 * 0.2 + 2 * 0.3 + 3 * 0.2 + 4 * 0.3 = 2.6
15 | D | Mean = 2.6 (see question 14). Variance = 0.20 * (1 − 2.6)^2 + 0.30 * (2 − 2.6)^2 + 0.20 * (3 − 2.6)^2 + 0.30 * (4 − 2.6)^2 = 0.512 + 0.108 + 0.032 + 0.588 = 1.24. SD = √1.24 ≈ 1.11, which is closest to option D.
16 | D | Make a tree diagram with 100 computers. 100 * 0.65 = 65 have Dummy, of which 65 * 0.8 = 52 are fixed by Hans. 100 * 0.35 = 35 have Smarty, of which 35 * 0.3 = 10.5 are fixed by Hans. So 52 / (52 + 10.5) = 0.832 ≈ 0.83.
17 | C | For disjoint events, P(A or B) = P(A) + P(B) = 0.2 + 0.8 = 1.0.
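
The tree-diagram reasoning in answer 8 and the variance formula in answer 9 can also be checked numerically. The following is a minimal Python sketch, not part of the course material; the helper names `posterior` and `mean_and_sd` are made up for this illustration.

```python
from math import sqrt


def posterior(prior, p_pos_given_yes, p_pos_given_no):
    """Bayes' rule: P(condition | positive), using the law of total probability."""
    true_pos = prior * p_pos_given_yes          # has the condition and tests positive
    false_pos = (1 - prior) * p_pos_given_no    # lacks the condition but tests positive
    return true_pos / (true_pos + false_pos)


def mean_and_sd(values, probs):
    """Mean and standard deviation of a discrete random variable."""
    mu = sum(x * p for x, p in zip(values, probs))
    var = sum(p * (x - mu) ** 2 for x, p in zip(values, probs))
    return mu, sqrt(var)


# Question 8: 25% deficient, 80% true positives, 10% false positives
p_deficient = posterior(0.25, 0.80, 0.10)

# Question 9: X takes the values 1..4 with probabilities .30, .20, .20, .30
mu9, sd9 = mean_and_sd([1, 2, 3, 4], [0.30, 0.20, 0.20, 0.30])

print(round(p_deficient, 2), round(mu9, 2), round(sd9, 2))
```

Note that the probability is weighted once, outside the squared deviation: p * (x − μ)^2, not (p * (x − μ))^2.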

What are sampling distributions? - ExamTests 5

 

Multiple Choice questions

Question 1

The scores on the Cito-test are approximately normally distributed with a mean of 535 and a standard deviation of 5. What percentage of students scored higher than 545?

A. 1%
B. 2.5%
C. 5%
D. 10%

Question 2

Given are the scores on the normally distributed variable ‘Time needed to fall asleep’ for 100 children, with a mean of 1500 seconds and a standard deviation of 300 seconds. What proportion of children needs more than 1000 seconds to fall asleep?

A. 0.0475
B. 0.1423
C. 0.8577
D. 0.9525

Question 3

Which of the below statements about sampling variability is/are true?

I. The sampling variability can be lowered by increasing the sample size.
II. The sampling variability is the spread of a statistic when the statistic is calculated for many randomly drawn samples from the same population.

A. Only statement I is true.
B. Only statement II is true.
C. Both statements are true.
D. Both statements are false.

Question 4

The scores on a developmental test for toddlers are normally distributed with mean 100 and standard deviation 10. What is the chance that a random toddler scores 115 or higher?

A. .0668
B. .4404
C. .5596
D. .9332

Use the following information for questions 5 and 6. The population of Dutch Psychology students has a skewed sex distribution: only 20% are male and 80% are female. We are interested in the proportion of male Dutch Psychology students (so p = 0.20).

Question 5

What is the chance of fewer than 2 male students in a random sample of 8?

A. .1678 + .3355
B. .1678 + .3355 + .2936
C. 1 – (.1678 + .3355)
D. More information is needed

Question 6

What is the chance of at least 30 male students in a random sample of 120 students? Use a normal approximation of the binomial distribution.

A. P(Z > 1.15)
B. P(Z > 1.26)
C. P(Z > 1.37)
D. P(Z > 1.48)

Question 7

Given are the scores on the Cito-test. The scores are normally distributed in the population with mean 100. In a random sample of n = 25, the mean equals 105. The standard deviation in the sample is 3. Which of the statements below is true?

A. 100 is a parameter, 25 is a statistic.
B. 100 is a parameter, 105 is a statistic.
C. 25 is a parameter, 3 is a statistic.
D. 25 is a parameter, 105 is a statistic.

Question 8

An unbiased statistic implies that for a large number of similar, representative samples from the same population and with the same sample size n …

A. All statistics are closely together.
B. The mean of the statistics equals the value of the parameter.
C. The variance of the statistics is zero.
D. The mean of the statistics is zero.

Question 9

What is P(-0.55 < Z < 1.21) if we use Table A for standard normal distributions?

A. 0.2912
B. 0.5957
C. 0.7088
D. 0.8869

Question 10

The scores of students on the American College Test (ACT) are normally distributed in the population with mean 18 and standard deviation 6. Fifty students from a certain school take the ACT. Assume that these 50 scores follow the same distribution as in the population. What is the sampling distribution of the mean on the ACT for samples of n = 50?

A. About normal, but the approximation is bad
B. Exactly normal
C. Skewed to the right
D. Skewed to the left

Question 11

The birth weight of babies is normally distributed with a mean of 7 pounds and a standard deviation of 0.8 pounds. What is the chance that a randomly selected baby weighs more than 7.6 pounds?

A. 0.23
B. 0.75
C. 0.77
D. More information is needed

Question 12

X has a binomial distribution with parameters n = 10 and p = 0.7. What is the average number of successes, and what is the standard deviation?

A. μ = 1.45, σ = 7
B. μ = 1.45, σ = 2.1
C. μ = 7, σ = 2.1
D. μ = 7, σ = 1.45

Question 13

It is given that 30% of the marriages in The Netherlands result in a divorce within 15 years. A large study followed hundreds of marriages over the past 15 years. Imagine that 100 of these marriages are selected at random. What is the chance that fewer than 20 of these marriages result in a divorce?

A. .011
B. .110
C. .890
D. .989

Question 14

It is given that variable X is heavily skewed to the left in the population. What does the sampling distribution of the sample mean of X look like for samples of n = 100 from this population?

A. Heavily skewed to the left, in accordance with the population.
B. More normally distributed than in the population.
C. Exactly normally distributed.
D. More information is needed.

Question 15

An assumption of the binomial distribution is that all observations are

A. Independent
B. Random
C. Dependent
D. Positive

Question 16

A simple random sample is drawn from a large population. The percentage of respondents in the sample with a certain characteristic is determined. What is the best description of this percentage?

A. It is a parameter.
B. It is a statistic.
C. It is a lurking variable.
D. None of the above answers is correct.

Answers

Question | Answer | Explanation
1 | B | 545 − 535 = 10, so the score 545 lies 2 standard deviations above the mean. The interval from two standard deviations below to two standard deviations above the mean contains 95% of all observations. Of the remaining 5%, 2.5% lies to the left (< 525) and 2.5% lies to the right (> 545). Drawing a normal distribution with vertical lines for the mean and the cut-off values gives more insight into the question.
2 | D | z = (x − μ)/σ = (1000 − 1500)/300 = −500/300 = −1.67. Table A gives p = .0475 for this z-value. This is the left-tail probability. Because the question asks for the proportion of children who need more than 1000 seconds, we need 1 − .0475 = .9525.
3 | C |
4 | A | z = (x − μ)/σ = (115 − 100)/10 = 15/10 = 1.5. Looking up z = 1.5 in Table A gives a left-tail probability of p = .9332. We want the right-tail probability, i.e. 1 − .9332 = .0668.
5 | A | Table C: P(X < 2 | p = 0.20, n = 8) = P(X = 0 | p = 0.20, n = 8) + P(X = 1 | p = 0.20, n = 8).
6 | B | First, calculate the mean and standard deviation: μ = np = 120 * 0.20 = 24 and σ = √(np(1 − p)) = √(120 * 0.20 * 0.80) ≈ 4.38. Then use the continuity correction for the normal approximation of the binomial distribution, which here means using 29.5 instead of 30. P(X ≥ 30 | p = 0.20, n = 120) ≈ P(Z > (29.5 − 24)/4.38) = P(Z > 1.26)
7 | B | A parameter describes the population and a statistic describes the sample (mnemonic: PP, SS). Here 100 is the population mean (parameter) and 105 is the sample mean (statistic).
8 | B | Unbiased means that there is no systematic distortion. While a single sample may deviate from the population (parameter), the statistic is on average equal to the parameter.
9 | B | P(−0.55 < Z < 1.21) = P(Z < 1.21) − P(Z < −0.55) = 0.8869 − 0.2912 = 0.5957
10 | B |
11 | A | z = (x − μ)/σ = (7.6 − 7)/0.8 = 0.6/0.8 = 0.75, for which Table A gives .7734. We want the right-tail probability, so P = 1 − .7734 = .2266 ≈ 0.23.
12 | D | μ = np = 10 * 0.7 = 7 and σ = √(np(1 − p)) = √2.1 ≈ 1.45
13 | A | Use the normal approximation of the binomial distribution. μ = np = 0.30 * 100 = 30 and σ = √(np(1 − p)) = √(30 * 0.70) = √21 ≈ 4.58. With the continuity correction, P(X < 20) ≈ P(Z < (19.5 − 30)/4.58) = P(Z < −2.29). Looking this up in Table A gives P = .0110.
14 | B | Central limit theorem (see chapter 5): for n = 100 the sampling distribution of the mean is approximately normal, even though the population is skewed.
15 | A |
16 | B |
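
The Table A and Table C look-ups in answers 2, 5 and 6 can be reproduced with Python's standard library. This is a minimal sketch for checking your own work, not the method prescribed by the book; the helper `binom_pmf` is a name introduced here for illustration.

```python
from math import comb, sqrt
from statistics import NormalDist


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Question 2: P(X > 1000) for X ~ N(1500, 300)
p_q2 = 1 - NormalDist(mu=1500, sigma=300).cdf(1000)

# Question 5: exact binomial P(X < 2) = P(X = 0) + P(X = 1), n = 8, p = 0.20
p_q5 = binom_pmf(0, 8, 0.20) + binom_pmf(1, 8, 0.20)

# Question 6: normal approximation with continuity correction (29.5 instead of 30)
n, p = 120, 0.20
mu = n * p
sigma = sqrt(n * p * (1 - p))
z_q6 = (29.5 - mu) / sigma
p_q6 = 1 - NormalDist().cdf(z_q6)

print(round(p_q2, 4), round(p_q5, 4), round(z_q6, 2), round(p_q6, 4))
```

Small differences from the table values are expected, because Table A requires rounding z to two decimals first while the code uses the unrounded z.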

What is statistical inference? - ExamTests 6

 

Multiple Choice questions

Question 1

Given are the years of education for a random sample of 100 participants from the population of Dutch men. Next, a 95% confidence interval is made for the first quartile. This 95% confidence interval consists of

A. The lowest 25% of the scores on ‘years of education’ in the sample.
B. The lowest 25% of the scores on ‘years of education’ in the population.
C. With 95% confidence the value of the first quartile in the sample.
D. With 95% confidence the value of the first quartile in the population.

Question 2

The mean on a variable X has been calculated for 100 students from the population of students in Groningen. A 95% confidence interval is made for the mean. In this case, the 95% confidence interval is the interval in which we find

A. 95% of the means from the sample
B. 95% of the means from the population
C. With 95% certainty the sample mean of X
D. With 95% certainty the population mean of X

Question 3

Dutch employees work on average 30 hours a week. Assume a normal distribution and a standard deviation of 3 in the population. What percentage of Dutch employees works between 24 and 36 hours a week?

A. 5%
B. 32%
C. 68%
D. 95%

Question 4

Rimmer examines the satisfaction of Psychology students with their exam grade for statistics. He uses a 0-100 range and assumes that the scores are normally distributed. Rimmer makes a 95% confidence interval for the mean from a random sample. The confidence interval is [57, 63]. What does this imply?

A. That 95% of the scores in the sample lie between 57 and 63.
B. That 95% of the scores in the population lie between 57 and 63.
C. That there is a 95% chance that this interval contains the parameter.
D. That there is a 95% chance that this interval contains the statistic.

Question 5

One hundred students are asked how many beers they drank in the past week. The scores are skewed to the left with mean 5 and standard deviation 3. How many beers does a student have to drink to be in the top 2.5%?

A. At least 8
B. At least 11
C. At least 14
D. More information is needed

Question 6

The scores on an exam are normally distributed with mean 60 and standard deviation 8. What is the score that one has to get in order to be in the lowest 5% of the scores?

A. Approximately 44 or lower
B. Approximately 44 or higher
C. Approximately 47 or lower
D. Approximately 47 or higher

Question 7

The time to finish an exam is normally distributed with a mean of 50 minutes and a standard deviation of 10 minutes. Approximately what percentage of students finishes the exam within an hour?

A. 68%
B. 84%
C. 95%
D. 99.7%

Question 8

In a study it is found that Dutch citizens spend on average 1200 euros per year on clothing, with a standard deviation of 14.83. It is given that the margin of error equals 5. What is the minimum required sample size to obtain a 95% confidence interval with this margin of error?

A. 5
B. 6
C. 33
D. 34

Answers

Question | Answer | Explanation
1 | D | A confidence interval is used to say something about the population with a certain degree of (un)certainty. The sample is only a means to an end.
2 | D | See previous question.
3 | D | 24 and 36 lie 2 SDs to the left and right of the mean (68-95-99.7 rule of thumb).
4 | C |
5 | D | It is a left-skewed distribution. Therefore, one cannot make statements based on the 68-95-99.7 rule of thumb for normal distributions.
6 | C | The 68-95-99.7 rule of thumb does not give this value directly, so we need to look up the z-score in Table A. The probability .05 lies between z = −1.64 and z = −1.65, so we use z = −1.645. This results in 60 − 1.645 * 8 = 46.84, which is 47 after rounding.
7 | B | Within an hour means up to 1 SD above the mean. The interval of 1 SD on either side of the mean contains 68%. Add half of the remaining 32% (the left tail), so 68 + 16 = 84%.
8 | D | \[ n = \left(\frac{z^{*} \sigma}{m}\right)^{2} = \left(\frac{1.96 * 14.83}{5}\right)^{2} = 5.813^{2} \approx 33.80 \] So at least 34.
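
Answers 6, 7 and 8 can be verified with inverse and forward normal look-ups. The sketch below is an illustration using Python's standard library; `min_sample_size` is a hypothetical helper name, and the margin of error m = 5 is the value that appears in the worked answer to question 8.

```python
from math import ceil
from statistics import NormalDist

# Critical value for 95% confidence (about 1.96)
z_star = NormalDist().inv_cdf(0.975)


def min_sample_size(sigma, margin, z=z_star):
    """Smallest n for which z * sigma / sqrt(n) does not exceed the margin of error."""
    return ceil((z * sigma / margin) ** 2)


# Question 8: sigma = 14.83, margin of error m = 5
n_q8 = min_sample_size(sigma=14.83, margin=5)

# Question 6: score below which the lowest 5% fall, for N(60, 8)
cutoff_q6 = NormalDist(mu=60, sigma=8).inv_cdf(0.05)

# Question 7: proportion finishing within 60 minutes, for N(50, 10)
p_q7 = NormalDist(mu=50, sigma=10).cdf(60)

print(n_q8, round(cutoff_q6, 2), round(p_q7, 2))
```

Rounding the sample size up (rather than to the nearest integer) is what guarantees the margin of error is not exceeded.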

What are statistical inferences for distributions? - ExamTests 7

 

Multiple Choice questions

Question 1

Given are two independent variables X and Y. It is known that the mean of X equals 20 and the standard deviation equals 10. Variable Y has a mean of 10 and a standard deviation of 5. What is the variance of the variable (X – Y)?

A. 5
B. 15
C. 75
D. 125

Question 2

Given are two independent random variables X and Y. Which of the following statements is not true?

A. The variance of the difference X – Y equals the difference between the variances.
B. The variance of the sum X + Y equals the sum of the variances.
C. The mean of the sum X + Y equals the sum of the means.
D. The mean of the difference X – Y equals the difference of the means.

Question 3

A researcher believes he should run a lower risk of falsely rejecting the null hypothesis. What would you advise him to do?

A. Test at a lower significance level
B. Increase the sample size
C. Investigate a larger effect
D. Keep the standard error as small as possible

Question 4

The scores on a particular variable are normally distributed in the population with a standard deviation of 12. Suppose the null hypothesis that the population mean is equal to 80 is tested one-sided (right-tailed). It is known that the null hypothesis is rejected for sample means of 82.5 or higher. What will the power be if the population mean is 86? Assume a sample size of 64 people. The power is approximately...

A. 0.76
B. 0.82
C. 0.94
D. 0.99

Question 5

It is given that with a pooled t procedure for testing a difference in means, the power is 0.82 with α = 0.05 and a sample of 50 people. The researcher wants the power to be at least 0.90. What could he theoretically do to achieve this?

A. Working with a larger sample in combination with α = 0.01
B. Working with a larger sample in combination with α = 0.10
C. Working with a smaller sample in combination with α = 0.01
D. Working with a smaller sample in combination with α = 0.10

Question 6

In a sample of 81 persons, the sample mean is equal to 104 with a standard deviation of 17.24. For a one-sample t-test of the null hypothesis that the population mean is equal to 100 (H0: μ = 100), a right-sided exceedance probability of 0.02 is found. What does this exceedance probability tell you?

A. There is a 2% chance that the population mean is 100 if you find a sample mean of exactly 104.
B. There is a 2% chance that the population mean is 100 if you find a sample mean higher than 104.
C. If the population mean equals 100, there is a 2% chance that you will find a sample mean of exactly 104.
D. If the population mean is 100, there is a 2% chance that you will find a sample mean higher than 104.

The following information belongs to questions 7 to 14. There are various ways to deal better with stress. One way to reduce stress is to offer some form of help. The researchers recorded the physiological response of subjects during a demanding task in which they had to count backwards (mental arithmetic is a very reliable way to induce stress). The participants were 45 women who all have dogs. The test was performed under three conditions (variable CONDIT):

  1. The experimenter present (CONTROL)
  2. A friend and the experimenter present (FEMALE FRIEND)
  3. The dog and the experimenter present (PET DOG)

One of the physiological responses that was measured is the mean heart rate per person during the math test (MEAN HEART RATE). Below are descriptive data of the three groups on MEAN HEART RATE and the output of the t procedure for the difference between two means, comparing the control group (group 1) with the group of subjects who brought a female friend (group 2). This tests the null hypothesis that there is no difference in population means between the two groups.

Table 1. 'Descriptive statistics'
condit | | N (statistic) | Mean (statistic) | Mean (std. error) | Std. dev. (statistic)
control | Mean heart rate | 15 | 82.52 | 2.386 | 9.242
female friend | Mean heart rate | 15 | 91.33 | 2.154 | 8.341
pet dog | Mean heart rate | 15 | 73.48 | 2.574 | 9.970

Table 2. 't-test for equality of means'
 | | Sig. (2-tailed) | Mean difference | Std. error difference | Lower limit 95% CI for the difference | Upper limit 95% CI for the difference
Mean heart rate | Equal variances assumed | .011 | -8.801 | 3.214 | -15.385 | -2.217
Mean heart rate | Equal variances not assumed | .011 | -8.801 | 3.214 | -15.388 | -2.214

Question 7

The researchers chose to compare the means on the basis of the two sample procedure. Which of the following conditions does not have to be met?

A. The scores on MEAN HEART RATE are normally distributed in both populations.
B. The standard deviations of the scores on MEAN HEART RATE in both populations are equal.
C. The two samples were taken independently of each other from their respective populations.
D. A, B and C are all conditions that must be met in order to interpret the results of the two-sample procedure in a meaningful way.

Question 8

Suppose you want to construct a 99% confidence interval for the difference between the two means (group 1 and group 2) according to the pooled t procedure. What distribution would you use with this output to find the critical value?

A. The pooled t-distribution
B. A distribution with df = 14
C. A distribution with df = 28
D. A distribution with df = 44

Question 9

The researchers want to investigate whether the average heart rate in the group of subjects who brought a female friend is significantly higher than in the control group. What is the smallest of the significance levels below at which the null hypothesis is rejected?

A. α = 0.10
B. α = 0.05
C. α = 0.01
D. α = 0.005

Question 10

What would be the value of the test statistic t that tests the null hypothesis that the population mean on MEAN HEART RATE is 80 for the control group?

A. -2.74
B. -1.06
C. 1.06
D. 2.74

Question 11

Suppose you want to test the hypothesis that the MEAN HEART RATE in the population is exactly 14 points lower in the control group than in the FEMALE FRIEND group (H0: μ1 - μ2 = −14) at a significance level of 5%. What would you decide?

A. This null hypothesis would not be rejected, as the difference in sample means of -8.8 found does not deviate far enough from −14.
B. This null hypothesis would be rejected, as the difference in sample means of -8.8 found is far enough from −14.
C. This null hypothesis would not be rejected, since the exceedance probability of 0.011 found is not small enough.
D. This null hypothesis would be rejected, as the exceedance probability of 0.011 found is small enough.

Question 12

What would be the value of the standard error for the difference in sample means between the group FEMALE FRIEND and PET DOG?

A. 2.154
B. 2.364
C. 2.574
D. 3.356

Question 13

Someone wants to investigate whether there is more spread within the control group than within the group in which a friend is present. She examines this by means of the F test and then wants to look up the value of the test statistic in Table E. Which value does she find?

A. 0.81
B. 0.90
C. 1.11
D. 1.23

Question 14

Imagine that the experiment was structured in a slightly different way: there were only 15 women in total, all of whom would have taken the math test in all three conditions (in random order). You would then have three math test scores for each woman. What procedure would you recommend if you wanted to compare the distribution of scores under the control condition with that of the FEMALE FRIEND condition (assuming that the scores are normally distributed in the population)?

A. The sign test for the difference scores
B. The paired t-test for the difference scores
C. The binomial test for the difference scores
D. Neither A, nor B, nor C is a recommended procedure

Answers

Question | Answer | Explanation
1 | D | The variance of the difference variable equals the sum of the variances of the two variables. So var(X – Y) = var(X) + var(Y) = 10^2 + 5^2 = 125.
2 | A |
3 | A | To reduce the chance of a type I error, you can test at a lower significance level.
4 | D | power = P(X̄ > 82.5 | μ = 86) = P(Z > (82.5 − 86)/(12/√64)) = P(Z > −2.33) = 1 − 0.0099 = 0.99 (Table A)
5 | B | More power: (a) larger sample size; (b) higher significance level.
6 | D | If the population mean equals 100, there is a 2% chance that you will find a sample mean higher than 104.
7 | B | For a two-sample t-test, the standard deviations do not have to be equal.
8 | C | df = N1 + N2 − 2 = 15 + 15 − 2 = 28
9 | C | p = .011 two-sided, so p = .0055 one-sided. This is smaller than 0.01 but not smaller than 0.005.
10 | C | \[ t = \frac{82.52 - 80}{9.242/ \sqrt{15}} = 1.06 \]
11 | A | This null hypothesis would not be rejected, as the difference in sample means of -8.8 found does not deviate far enough from −14: the 95% confidence interval for the difference (−15.385 to −2.217) contains −14.
12 | D | \[ SE = \sqrt{ \frac{8.341^{2}}{15} + \frac{9.970^{2}}{15} } \approx 3.356 \]
13 | D | F = s1^2/s2^2 = 9.242^2 / 8.341^2 = 1.23
14 | B | Since all women took all tests, the paired t-test for difference scores is best suited here.
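
The calculations behind answers 4, 10, 11, 12 and 13 can be reproduced from the numbers in Table 1 and Table 2. The following Python sketch is only an illustration; the critical value t* = 2.048 (df = 28, 95% two-sided) is taken from a standard t table and is an assumption of this sketch rather than part of the output.

```python
from math import sqrt
from statistics import NormalDist

# Question 4: power of the right-tailed z test when mu = 86
se_mean = 12 / sqrt(64)                       # standard error of the sample mean
power = 1 - NormalDist().cdf((82.5 - 86) / se_mean)

# Question 10: one-sample t statistic for the control group against mu = 80
t_q10 = (82.52 - 80) / (9.242 / sqrt(15))

# Question 12: standard error of the difference FEMALE FRIEND vs PET DOG
se_q12 = sqrt(8.341**2 / 15 + 9.970**2 / 15)

# Question 13: F statistic, larger sample variance in the numerator
f_q13 = 9.242**2 / 8.341**2

# Question 11: 95% CI for mu1 - mu2 (control minus female friend)
t_star = 2.048                                # assumed t table value, df = 28
se_diff = sqrt(9.242**2 / 15 + 8.341**2 / 15)
ci = (-8.801 - t_star * se_diff, -8.801 + t_star * se_diff)
contains_h0 = ci[0] < -14 < ci[1]             # True means H0: difference = -14 is not rejected

print(round(power, 2), round(t_q10, 2), round(se_q12, 3), round(f_q13, 2), contains_h0)
```

Because both groups have n = 15, the pooled and unpooled standard errors of the difference coincide, which is why the two rows of Table 2 report the same value.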

What are statistical inferences for proportions? - ExamTests 8

 

Multiple Choice questions

Use the information below to answer questions 1 to 4. A study was carried out to compare the driving skills of students from Groningen, Rotterdam and Leiden. Of the 100 randomly selected students from Groningen, fifteen stated that they had been involved in a car accident in the past year. Twelve of the 100 randomly selected students from Leiden indicated that they had been involved in a car accident in the past year.

The notation used in the exercises below is as follows:

  • PG = the proportion of students from Groningen who were involved in a car accident in the past year.
  • PR = the proportion of students from Rotterdam who were involved in a car accident in the past year.
  • PL = the proportion of students from Leiden who were involved in a car accident in the past year.

Question 1

The researchers want to test whether PL is greater than 10%. Perform an appropriate significance test. Which of the statements below is incorrect?

A. The test statistic is equal to 0.67.
B. The calculated test statistic is t-distributed with df = 99.
C. The critical value at α = 0.05 is equal to 1.645.
D. The P value is equal to 0.2514.

Question 2

The researchers want to test whether PR is less than 25%. To this end, they draw a sample of n randomly selected students from Rotterdam. They report that the 90% confidence interval for PR runs from 0.26 to 0.30. Which of the statements below is correct?

A. The confidence interval widens with a larger sample size n, provided all other quantities remain the same.
B. The confidence interval widens with an increase in the confidence level C, provided all other quantities remain the same.
C. Both A and B are correct.
D. Neither A nor B is correct.

Question 3

Suppose the researchers want to investigate the size of PL. To this end, they draw a sample of n randomly selected students from Leiden. Which of the sample sizes n below is the smallest sample size for which the width of the 90% confidence interval for PL is smaller than 0.04?

A. 500
B. 1000
C. 2000
D. 2500

Question 4

The margin of error of the 95% confidence interval for PG - PL (large sample) is equal to…

A. \[ \sqrt{ \frac{0.15 * 0.85}{100} + \frac{0.12 * 0.88}{100} } \]
B. \[ 1.96 * \sqrt{ \frac{0.15 * 0.85}{100} + \frac{0.12 * 0.88}{100} } \]
C. \[ 1.984 * \sqrt{ \frac{0.15 * 0.85}{100} + \frac{0.12 * 0.88}{100} } \]
D. None of the above answers is correct.

Question 5

The researcher from the previous exercise tests the hypothesis that the sample is a simple random sample (SRS) from a population of 70% women and 30% men. He applies a significance level of 5%. It turns out that the p-value is 0.3. The researcher then concludes that the sample is a simple random sample from a population of 70% women and 30% men. Which of the following statements about this conclusion is correct?

A. The conclusion is correct because the distribution in the sample is not significantly different from the distribution in the population.
B. The conclusion is correct because we cannot reject the null hypothesis.
C. The conclusion is not correct because we cannot reject the null hypothesis.
D. The conclusion is not correct because we have no evidence for the null hypothesis.

Question 6

Suppose you want to construct a confidence interval for a population proportion and no information is available for estimating the population proportion. In this case, why do we use an estimated population proportion of 0.50 when determining the minimum sample size?

A. The minimum sample size based on a population proportion of 0.50 is in any case large enough if the population proportion is found to deviate from 0.50.
B. An estimated population proportion of 0.50 is exactly between 0 and 1 and, if nothing is known about the population proportion, is the best estimate you can give.
C. When determining the sample size with an estimated population proportion of 0.50, the risk of a type I error will be minimized.
D. None of the above alternatives is correct.

Answers

Question | Answer | Explanation
1 | B | H0: p = 0.10, z = (0.12 − 0.10) / √(0.1 * 0.9/100) = 0.67 and p = P(Z > 0.67) = 0.2514. The test statistic is z-distributed, not t-distributed.
2 | B | When all other quantities (such as sample size) remain the same, an increase in the confidence level causes the confidence interval to become wider. You want more reliability without having extra 'resources', so the interval in which that value lies becomes wider.
3 | C | n = (1.645/0.02)^2 * 0.5 * 0.5 = 1691.3, so from n = 1692 onward, the interval is sufficiently small.
4 | B |
5 | D | If H0 is not rejected, it does NOT mean that H0 is accepted.
6 | A | The minimum sample size based on a population proportion of 0.50 is in any case large enough if the population proportion is found to deviate from 0.50.
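
The proportion calculations in answers 1, 3 and 4 can be checked as follows. This is a minimal Python sketch under the assumptions stated in the answers (z* = 1.645 for 90% and 1.96 for 95% confidence); the variable names are chosen for this illustration only.

```python
from math import ceil, sqrt
from statistics import NormalDist

# Question 1: one-sided z test of H0: p = 0.10 with 12 accidents out of 100
p_hat, p0, n = 0.12, 0.10, 100
z_q1 = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # standard error uses p0, not p_hat
p_value = 1 - NormalDist().cdf(z_q1)

# Question 3: CI width below 0.04 means margin of error below 0.02.
# With no prior estimate, p = 0.5 gives the largest (most conservative) n.
z_star_90 = 1.645
n_q3 = ceil((z_star_90 / 0.02) ** 2 * 0.5 * 0.5)

# Question 4: margin of error for the difference of two proportions
moe_q4 = 1.96 * sqrt(0.15 * 0.85 / 100 + 0.12 * 0.88 / 100)

print(round(z_q1, 2), round(p_value, 4), n_q3, round(moe_q4, 4))
```

The P-value from the code differs slightly from the table-based 0.2514 because Table A uses z rounded to 0.67, whereas the code uses the unrounded z.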

What are statistical inferences for categorical data? - ExamTests 9

 

Multiple Choice questions

Question 1

A researcher wants to find out whether his sample is a simple random sample (SRS) from the business administration student population. He knows that the business administration student population consists of 70% women and 30% men. His sample of n = 500 subjects consists of 63% women and 37% men. Which test can best be performed by the researcher to test whether the sample is a simple random sample (SRS) from a population of 70% women and 30% men?

A. The z-test for a population proportion
B. A chi-square test
C. Both A and B are suitable
D. Both A and B are not suitable

Use the information below for questions 2 to 4.

A study was carried out to investigate whether the ease with which the statistical computer program R is taught (variable: R) is related to the statistical knowledge of students (variable: KNOWLEDGE). To gain insight into this, a questionnaire was sent to randomly selected students from Dutch universities. In the tables below you will find some research results and incomplete software output.

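Table 1. Counts

R \ Knowledge | Bad | Average | Good | Total
Bad | 11 | 15 | 2 | 28
Average | 8 | 33 | 11 | 52
Good | 1 | 17 | 14 | 32
Total | 20 | 65 | 27 | 112

Table 2. Column percentages

R \ Knowledge | Bad | Average | Good | Total
Bad | 55% | 23% | 7% | 25%
Average | 40% | 51% | 41% | 46%
Good | 5% | 26% | 52% | 29%
Total | 100% | 100% | 100% | 100%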

Chi-square test

 | Value | Asymp. Sig. (2-sided)
Pearson Chi-Square | 20.413 (a) | .000
Likelihood Ratio | 20.914 | .000
Linear-by-Linear Association | 18.840 | .000
N of Valid Cases | 112 |

a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 5.00.
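
The Pearson chi-square value and the minimum expected count reported in this output follow directly from the table of counts. The sketch below shows the computation in plain Python as an illustration; it reproduces only the numbers already printed in the output, and the variable names are chosen here.

```python
# Observed counts: rows are R (Bad, Average, Good), columns are KNOWLEDGE (Bad, Average, Good)
observed = [
    [11, 15, 2],
    [8, 33, 11],
    [1, 17, 14],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: row total * column total / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson chi-square: sum over all cells of (observed - expected)^2 / expected
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

min_expected = min(min(row) for row in expected)

print(n, round(chi_square, 3), round(min_expected, 2))
```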

Question 2

Which distribution is a correct description of the numbers in the column “KNOWLEDGE = Good” of Table 2 (i.e. the percentages 7%, 41% and 52%)?

A. The marginal probability distribution of R.
B. The joint probability distribution of KNOWLEDGE and R.
C. The conditional probability distribution of R given KNOWLEDGE.
D. The chi-square distribution

Question 3

A chi-square analysis has been performed on the above data. Based on the above data, which of the statements below is correct?

A. The number of degrees of freedom is equal to 8.
B. If there is no correlation between R and KNOWLEDGE, then the expected number of students with "KNOWLEDGE = good" and "R = good" equals 30.
C. The contribution of the cell (KNOWLEDGE = good, R = good) to the chi-square statistic is equal to 5.12.
D. None of the above statements is correct.

Question 4

What can we say about the relationship between the ease with which R is taught and the statistical knowledge, based on the above R output and α = 0.05?

A. There is probably no correlation between the ease with which R is taught and the statistical knowledge.
B. There is a very weak relationship between the ease with which R is taught and the statistical knowledge.
C. There is a strong correlation between the ease with which R is taught and the statistical knowledge.
D. None of the above alternatives is correct.

The following information belongs to questions 5 and 6. A home hunter has kept the following information on the Funda.nl website for 126 houses: whether / not a detached house and the condition of the house (moderate, reasonable, good). This resulted in the following output from SPSS.

                    Condition of the house
                moderate   reasonable   good   total
detached  yes         21           14      7      42
          no           7           42     35      84
total                 28           56     42     126

                               Value
Pearson Chi-Square             28.875
Likelihood Ratio               28.082
Linear-by-Linear Association   22.727
N of valid cases               126

Question 5

What is a condition for using the chi-square procedure with the data in this crosstab?

A. The expected frequencies must be greater than 5 on average.
B. The houses must be selected independently of each other.
C. Both A and B are a condition.
D. Neither A nor B is a condition

Question 6

What is the combined contribution to the chi-square value of the 21 detached and the 7 non-detached houses that are in moderate condition?

A. 7.29
B. 14.58
C. 21.87
D. None of the above alternatives is correct

Question 7

Assume that these 126 houses are a random sample from the population. What is the value of the test statistic for testing the null hypothesis that the proportion of houses in moderate condition (regardless of whether they are detached) in the population is equal to 0.30?

A. t = −1.91
B. t = −2.10
C. z = −1.91
D. z = −2.10

Question 8

Suppose that for the found Chi-square value the probability of exceedance is so small that the null hypothesis can be rejected. What can you conclude then?

A. The population distribution of the condition of the house is probably different for detached houses than for non-detached houses.
B. The probability in the population of a house in a reasonable condition for detached houses is probably different from the probability in the population of a house in reasonable condition for non-detached houses.
C. Both A and B
D. Neither A nor B

Question 9

Someone has found the (rounded) probabilities 0.33 and 0.67 based on the above crosstab. Which probability distribution (s) do these two numbers form?

A. The marginal distribution of detached houses
B. The conditional distribution of detached houses expected according to the null hypothesis, given a moderate condition of the house
C. The conditional distribution of detached houses expected according to the null hypothesis given a reasonable condition of the house
D. Both A, B and C are correct

Answers

Question  Answer  Explanation
1         C       With 2x2 cross tables you have two options, which yield exactly the same results: a chi-square analysis or a z-test for a proportion. This question was very often answered incorrectly in exams.
2         C
3         C       df = (r - 1) * (c - 1) = 2 * 2 = 4
                  Expected = (32 * 27) / 112 = 7.71 (do not round this number in the calculation).
                  Contribution = (14 - 7.71)² / 7.71 = 5.12
4         D       A significance test never provides information about the strength of the relationship. Also, a significance test never provides evidence for the null hypothesis.
5         C       For a chi-square test, both the expected frequencies per cell must be more than five and the observations must be selected independently of each other.
6         C       Expected numbers: 42 × 28/126 = 9.33 and 84 × 28/126 = 18.67
                  χ² contribution = (21 - 9.33)² / 9.33 + (7 - 18.67)² / 18.67 = 14.58 + 7.29 = 21.87
7         C       The sample proportion of moderate houses is:
                  \[ \hat{p} = 28/126 = 0.22 \]
                  This yields the following z-score:
                  \[ z = \frac{0.22 - 0.30}{\sqrt{0.3 \times 0.7 / 126}} = -1.91 \]
8         A       There are two possible null hypotheses that can be tested with a chi-square test: H0: there is no relationship between the condition of the house and whether or not the house is detached, or H0: the distribution of the condition of the house is the same for detached and non-detached houses. Very specific hypotheses such as the one in B are not tested with a chi-square test; you can test such a hypothesis with a test of whether two proportions are equal. With a 2x2 design the two tests would be equivalent, but because we have a 2x3 design, the proportions test and the chi-square test no longer correspond to each other. Hence only A is correct.
9         A       The marginal distribution of detached houses is 42/126 = 0.33 and 84/126 = 0.67. The conditional distribution of detached given moderate condition is 21/28 = 0.75 and 7/28 = 0.25. The conditional distribution of detached given reasonable condition is 14/56 = 0.25 and 42/56 = 0.75.
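
The calculations behind answers 3, 6 and 7 can be checked with a short script. This is a minimal sketch, not part of the original exam; the function names `expected_counts` and `chi_square_contributions` are made up for illustration, and the counts are read from the two crosstabs above (rows and columns in the order shown there).

```python
from math import sqrt

def expected_counts(table):
    """Expected counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square_contributions(table):
    """Per-cell contributions (observed - expected)^2 / expected."""
    expected = expected_counts(table)
    return [[(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
            for obs_row, exp_row in zip(table, expected)]

# Rows: R = bad, average, good; columns: KNOWLEDGE = bad, average, good.
knowledge = [[11, 15, 2],
             [8, 33, 11],
             [1, 17, 14]]
good_good = chi_square_contributions(knowledge)[2][2]   # answer 3

# Rows: detached yes / no; columns: moderate, reasonable, good.
houses = [[21, 14, 7],
          [7, 42, 35]]
contrib = chi_square_contributions(houses)
chi_square = sum(sum(row) for row in contrib)            # Pearson chi-square
moderate_part = contrib[0][0] + contrib[1][0]            # answer 6

# One-sample z-test for a proportion with H0: p = 0.30 (answer 7).
p_hat = 28 / 126
z = (p_hat - 0.30) / sqrt(0.30 * 0.70 / 126)

print(round(good_good, 2), round(chi_square, 3), round(z, 2))
```

Working with unrounded expected counts gives 21.875 for the moderate column, which the answer table reports as 21.87 after rounding the two parts separately.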

What is regression? - ExamTests 10

 

Multiple Choice questions

Question 1

In a simple regression, H0: R2 = 0 versus HA: R2 > 0 is rejected with a p-value of 0.04. What can you say about the p-value of H0: β1 = 0 versus HA: β1 ≠ 0?

A. p = 0.02
B. p = 0.04
C. p = 0.08

Question 2

Complete: “In a simple regression model, the SSMODEL is based on the ...

A. ... differences between the predictions and the mean ”
B. ... differences between measurements and predictions ”
C. ... differences between the measurements and the mean ”

Question 3

A simple regression shows n = 20, r = 0.85. What can you say about the 95%-confidence interval for ρ?

A. This interval is symmetrical around 0.85.
B. The left limit is closer to 0.85 than the right limit.
C. The right limit is closer to 0.85 than the left limit.

Question 4

A study has been conducted to predict the roasting time (m, in minutes) of a turkey from its weight (g, in pounds) and whether the turkey is stuffed (D = 1) or not (D = 0). This study, based on 32 turkeys, yielded yi = 12.0 + 27.0 gi + 36.0 Di. A 7 pound stuffed turkey is placed in the oven for four hours (240 minutes). Which residual ei belongs to this turkey according to the regression equation?

A. ei < 0
B. 0 ≤ ei <15
C. ei ≥ 15

Question 5

To use the results in a Dutch cookbook, the above research is converted to one in which the turkey is measured in kilograms. (1 pound = 0.45 kg). Which statement is true?

A. The regression weight of weight becomes 60. The intercept does change.
B. The regression weight of weight becomes 60. The intercept does not change.
C. The regression weight of weight becomes 12. The intercept does not change.
D. The regression weight of weight becomes 12. The intercept does change.

Question 6

A regression analysis is performed. The explanatory variables are two dummy variables, which are based on a factor with 3 groups. The researchers are mainly interested in the following two contrasts:

\[ \psi_{1}: 1/2 (\mu_{1} + \mu_{2}) - \mu_{3} = 0 \hspace{2mm} and \hspace{2mm} \psi_{2}: \mu_{1} - \mu_{2} = 0 \]

Which coding is best suited for this?

A. D1 = -1 in group 1, 0 in group 3, 1 in group 2; D2 = -1/2 not in group 3, 1 in group 3
B. D1 = -1 in group 2, 0 in group 3, 1 in group 1; D2 = -1/2 not in group 2, 1 in group 2
C. D1 = -1 in group 3, 0 in group 2, 1 in group 1; D2 = -1/2 not in group 3, 1 in group 3

Question 7

Both the slope b1 in a simple regression analysis and the Fisher-z transformed correlation coefficient rz have a sampling distribution that is normally distributed. Yet to test β1 you use a different kind of distribution (t-distribution) than when you test ρz (normal distribution). Why?

A. Because the sampling distribution of the correlation coefficient at ρ ≠ 0 is not normally distributed.
B. Because SEb1 still has to be estimated and SEzr not, you use a t-distribution for b1 and not for rz.
C. Because you do not need the Fisher-z transformation to test H0: β1 = 0.

Question 8

What is the function of cross validation in regression analysis? Cross-validation is used to ...

A. ... determine how many and which independent variables should be included in the model.
B. ... determine how well the estimated model can make predictions in another sample.
C. ... calculate the maximum percentage of variance that can be explained by the estimated model in the sample.

Question 9

Which assumption should you not check when performing a regression analysis in which all independent variables are dummy variables?

A. Constant variance
B. Independent residuals
C. Linearity

Answers

Question  Answer  Explanation
1         B       The R2 cannot be less than 0, so a one-tailed test is basically the same as a two-tailed test.
2         A       SSM is calculated by the formula: Σ(ŷi - ȳ)²
3         C
4         B       ŷi = 12.0 + 27.0 gi + 36.0 Di
                  ŷi = 12.0 + 27.0 * 7 + 36.0 * 1 = 237
                  ei = (observed y-value) – (predicted y-value)
                  ei = 240 - 237 = 3
5         B       The intercept is about the minutes of roasting time and does not change. The regression weight changes as follows: 27 / 0.45 = 60.
6         A
7         B       Because SEb1 still has to be estimated and SEzr not, you use a t-distribution for b1 and not for rz.
8         B       Cross-validation is used to determine how well the estimated model can make predictions in another sample.
9         C       With dummy variables you do not have to check linearity, that only applies to continuous variables.
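
Answers 3, 4 and 5 can be reproduced numerically. The sketch below is illustrative only: the helper name `predicted_minutes` is hypothetical, and the confidence interval for ρ assumes the usual Fisher-z approach with standard error 1/√(n − 3) and the normal critical value 1.96.

```python
from math import atanh, tanh, sqrt

def predicted_minutes(weight_lb, stuffed):
    """Roasting time in minutes from the fitted equation in Question 4."""
    return 12.0 + 27.0 * weight_lb + 36.0 * stuffed

# Answer 4: residual = observed - predicted for a 7 pound stuffed turkey.
residual = 240 - predicted_minutes(7, 1)

# Answer 5: rescaling the predictor from pounds to kilograms changes only
# the slope; the intercept (in minutes) stays the same.
POUND_IN_KG = 0.45
slope_per_kg = 27.0 / POUND_IN_KG

# Answer 3: 95% interval for rho via the Fisher-z transformation.
r, n = 0.85, 20
z_r = atanh(r)                    # Fisher-z transform of r
se_z = 1 / sqrt(n - 3)            # assumed standard error of z_r
lower = tanh(z_r - 1.96 * se_z)   # back-transform to the correlation scale
upper = tanh(z_r + 1.96 * se_z)

print(residual, round(slope_per_kg, 1), round(lower, 2), round(upper, 2))
```

The interval is symmetric on the z scale but not on the correlation scale, which is why the right limit ends up closer to 0.85 than the left limit.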

What is multiple regression? - ExamTests 11

 

Multiple Choice questions

Question 1

In which of the multiple regression situations below is the adjusted R2 greatest? (Here n refers to the sample size and p to the number of independent variables.)

A. n = 90, p = 2, R2 = 0.400
B. n = 90, p = 4, R2 = 0.400
C. n = 70, p = 2, R2 = 0.400

Questions 2 to 6 are about the following SPSS output from a multiple regression analysis. Variables y, x2 and x3 are continuous variables. Variable x1 is a dummy with value 0 for women and value 1 for men.

Model        Sum of Squares   df   Mean Square
Regression   5021.347              1673.782
Residual     3696.188              45.075
Total        8717.535         85

             Unstandardized Coefficients   Standardized Coefficients
Model        B         Std. Error          Beta                        VIF
(Constant)   46.095    15.254
1            5.334     2.710               .145                        1.048
2            .101      .051                .153                        1.155
3            -1.011    .097                -.819                       1.204

Question 2

What is the p-value for H0: β3 = 0 versus HA: β3 > 0?

A. Less than 1%
B. Between 1% and 10%
C. Greater than 10%

Question 3

Person 40 in the sample is a man with scores x2 = 40, x3 = 10 and y = 52. What is the residual associated with this person?

A. Less than 0
B. Between 0 and 2
C. Greater than 2

Question 4

How many degrees of freedom belong to the test of H0: β1 = 0 with a two-sided alternative?

A. 82
B. 83
C. 84

Question 5

Suppose the dummy variable x1 was coded with values -1 and 1 instead of values 0 and 1. What would the regression equation look like?

A. y = 40.761 + 5.334 x1 + 0.101 x2 - 1.011 x3
B. y = 48.762 + 2.667 x1 + 0.101 x2 - 1.011 x3
C. y = 46.095 + 10.668 x1 + 0.101 x2 - 1.011 x3.

Question 6

Which statement is correct with regard to correlations in the data?

A. None of the three x variables has a correlation with y greater than 0.76.
B. The correlation between x2 and x3 is greater than 0.25.
C. The absolute value of the correlation of x2 with y is greater than the absolute value of the correlation of x3 with y.

Answers

Question  Answer  Explanation
1         A       Adjusted R2 = 1 - (1 - R2)(n - 1)/(n - p - 1). With equal R2 it is greatest for the largest n and the smallest p: A gives 0.386, B gives 0.372 and C gives 0.382.
2         C
3         C       ŷ = 46.095 + 5.334 * 1 + 0.101 * 40 - 1.011 * 10
                  ŷ = 45.359
                  residual = y - ŷ = 52 - 45.359 = 6.641
4         A       dftotal = N - 1 = 85, so N = 86
                  dferror = N - p - 1 = 86 - 3 - 1 = 82
5         B       y = 48.762 + 2.667 x1 + 0.101 x2 – 1.011 x3.
                  Filling in x1 = -1 yields:
                  y = 48.762 + 2.667 * -1 .....
                  y = 46.095 .....
                  Filling in x1 = 1 yields:
                  y = 48.762 + 2.667 * 1 .....
                  y = 51.429 (= 46.095 + 5.334)
6         A
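
The numbers in this answer table can be recomputed from the SPSS output. The sketch below is only an illustration; `adjusted_r2` is a hypothetical helper, and the residual follows the answer table in using x1 = 1 for person 40.

```python
from math import sqrt

def adjusted_r2(r2, n, p):
    """Adjusted R^2 for sample size n and p independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Answer 1: same R^2, different n and p.
options = {"A": (90, 2), "B": (90, 4), "C": (70, 2)}
adjusted = {key: adjusted_r2(0.400, n, p) for key, (n, p) in options.items()}
best = max(adjusted, key=adjusted.get)

# Answer 3: predicted value and residual for person 40.
y_hat = 46.095 + 5.334 * 1 + 0.101 * 40 - 1.011 * 10
residual = 52 - y_hat

# Answer 4: error degrees of freedom from the total df and 3 predictors.
n_cases = 85 + 1
df_error = n_cases - 3 - 1

# Answer 5: both codings give the same two predicted intercept levels.
recoded_low = 48.762 + 2.667 * -1
recoded_high = 48.762 + 2.667 * 1

# Answer 6: the multiple correlation R bounds every single correlation with y.
multiple_r = sqrt(5021.347 / 8717.535)

print(best, round(residual, 3), df_error, round(multiple_r, 3))
```

Because no single predictor can correlate with y more strongly than the best linear combination of all predictors, a multiple R just below 0.76 supports statement A in Question 6.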

What is one-way ANOVA? - ExamTests 12

 

Multiple Choice questions

Question 1

In the context of a one-way ANOVA with 4 groups with a sample size of 20 in each group, how many degrees of freedom belong to the contrast ψ: μ1 = μ4?

A. 3
B. 38
C. 76

Question 2

What is not a valid contrast in a one-way ANOVA in three groups?

A. (μ1 + μ2)/2 = 0
B. (μ1 – μ2)/2 = 0
C. μ3 – (μ1 + μ2)/2 = 0

The following data belong to questions 3 to 6. A one-way ANOVA in R produces the following output. The sample size is the same for each group. The group means are, in order: 3.90, 6.82 and 7.80.

ANOVA Table
            Df   Sum Sq.   Mean Sq.   F value   Pr(>F)
Group        2   41.2      20.60      7.03      0.009 **
Residuals   12   35.2      2.93

Question 3

What is the pooled standard deviation?

A. 1.71
B. 2.33
C. 4.85

Question 4

What is the right-hand limit of the 95% confidence interval for the mean of the first group, based on the pooled standard deviation?

A. It is less than 5.0
B. It is between 5.0 and 6.0
C. It is greater than 6.0

Question 5

How much of the variance can be explained by factor Group?

A. More than 50%
B. Between 30% and 50%
C. Less than 30%

Question 6

One wants to perform all multiple comparisons. A choice can be made from the LSD (Least Significant Difference) or the Bonferroni method to counteract chance capitalization. Which method is preferable here?

A. There is something to be said for both
B. Bonferroni
C. LSD

The following information belongs to question 7 to question 9. A one-way ANOVA is performed. In this analysis, the following information applies: the sample size is 83, the independent variable x consists of 3 groups, the variance of dependent variable y is 100.0, and the sum of squares associated with 'between groups' is equal to 5320.

Question 7

What is the value of sp?

A. 6.0
B. 7.2
C. 36.0

Question 8

What is the number of degrees of freedom of the ANOVA test statistic for the test H0: μ1 = μ2 = μ3?

A. 2 and 80
B. 80 and 2
C. 2

Question 9

It is decided to merge groups 2 and 3 into one group. The ANOVA will be performed again. What can you say about the SSerror?

A. It gets smaller.
B. It gets bigger.
C. This can become both larger and smaller.

Question 10

The following incomplete two-way ANOVA table is given. What is true?

                   SS      df   MS     F
Factor A                   2
Factor B                   3    1.93
Interaction AB     13.05
Error              7.61    19
Total              28.08

A. FA = 0.815
B. FA = 2.035
C. FA = 5.490

Answers

Question  Answer  Explanation
1         C       df = N - k = 80 - 4 = 76
                  There are 4 groups of size 20 each, so N = 4 * 20 = 80
2         A
3         A       To find the pooled standard deviation, take the square root of MSE: sp = √MSE = √2.93 ≈ 1.71
4         B
5         A
6         C       The LSD can be dangerous, especially when many populations are being examined. This is because the chance of a type-I error then increases. In that case, the null hypothesis is rejected when in reality it is correct. As a researcher you assume that an effect exists, when this is not the case. To determine t**, we can also choose the Bonferroni method. With this method, the probability of a type-I error does not increase per comparison. The chance always remains 5%.
7         A
8         A       df = I - 1 and N - I
                  df = 3 - 1 and 83 - 3
                  df = 2 and 80
9         B       The model has fewer degrees of freedom and explains less of the variation, hence the SSerror becomes larger.
10        B       FA = MSA / MSE
                  MSA = SSA/dfA
                  MSE = SSE/dfE
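
Several of these answers have no worked explanation, so the sketch below recomputes them from the tables. It is an illustration under stated assumptions: the confidence interval in answer 4 is taken for the first group (mean 3.90), n = 5 per group follows from 15 observations in 3 equal groups, and the critical value t* = 2.179 (12 degrees of freedom) is taken from a t table rather than computed.

```python
from math import sqrt

# Answers 3 to 5: one-way ANOVA output with 3 equally sized groups.
ss_group, df_group = 41.2, 2
ss_resid, df_resid = 35.2, 12
s_pooled = sqrt(ss_resid / df_resid)              # pooled standard deviation
n_per_group = (df_group + df_resid + 1) // 3      # N = 15 observations, 3 groups
T_CRIT_12 = 2.179                                 # assumed table value, df = 12
upper_limit = 3.90 + T_CRIT_12 * s_pooled / sqrt(n_per_group)
r_squared = ss_group / (ss_group + ss_resid)      # proportion explained

# Answer 7: N = 83, 3 groups, variance of y = 100.0, SS between = 5320.
ss_total = 100.0 * (83 - 1)
ms_error = (ss_total - 5320) / (83 - 3)
s_p = sqrt(ms_error)

# Answer 10: fill in the incomplete two-way ANOVA table.
ss_b = 3 * 1.93                                   # SS = df * MS for factor B
ss_a = 28.08 - ss_b - 13.05 - 7.61                # the SS add up to the total
f_a = (ss_a / 2) / (7.61 / 19)                    # F_A = MS_A / MS_E

print(round(s_pooled, 2), round(upper_limit, 2), round(r_squared, 3),
      round(s_p, 1), round(f_a, 3))
```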

What is two-way ANOVA? - ExamTests 13

 

Multiple Choice questions

Question 1

In a two-way ANOVA, the following four population means apply:
(A1, B1): 12
(A2, B1): 12
(A1, B2): 18
(A2, B2): 16

Which statement is true?

A. There is no interaction effect.
B. There is no main effect for factor B.
C. Both main effects as well as an interaction effect are present.

The following information belongs to question 2 to question 5. A study was carried out into the relationship between gender (M / F) and residential area (city, village, or countryside) and the feeling of well-being (measured on a continuous scale). This resulted in the following (incomplete) ANOVA table.

Source             SS      df   MS    F     p
Gender                                5.0   .028
Residential area                4.5         .229
Interaction                                 .004
Error                      80   3.0
Total              300.0

Question 2

Calculate the Mean Square for the factor Gender.

A. 5
B. 15
C. 30

Question 3

Calculate the F-value associated with Residential area.

A. 1.5
B. 3.0
C. 4.5

Question 4

What percentage of the variance in well-being can be explained by the model?

A. 20%
B. 60%
C. 80%

Question 5

What conclusion can be drawn on the basis of the p-values?

A. Because the main effect residential area is not significant, you might as well perform a one-way ANOVA with only factor gender.
B. Post-hoc tests are needed to find where gender differences can be found.
C. Both A and B are true.
D. Both A and B are not true.

Answers

Question  Answer  Explanation
1         C
2         B       F = MSA / MSE
                  MSA = F * MSE
                  MSA = 5 * 3 = 15
3         A       F = MSB / MSE
                  F = 4.5 / 3 = 1.5
4         A
5         D       Both statements are incorrect.
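
Answers 1 and 4 come without an explanation; the sketch below shows one way to check them together with answers 2 and 3. It assumes the Error row of the ANOVA table is read as df = 80 and MS = 3.0, and the variable names are chosen for illustration only.

```python
# Answer 1: population cell means for factors A and B.
means = {("A1", "B1"): 12, ("A2", "B1"): 12,
         ("A1", "B2"): 18, ("A2", "B2"): 16}

# Main effects: difference between the marginal means of the two levels.
main_a = ((means["A2", "B1"] + means["A2", "B2"])
          - (means["A1", "B1"] + means["A1", "B2"])) / 2
main_b = ((means["A1", "B2"] + means["A2", "B2"])
          - (means["A1", "B1"] + means["A2", "B1"])) / 2
# Interaction: the effect of A at B2 minus the effect of A at B1.
interaction = ((means["A2", "B2"] - means["A1", "B2"])
               - (means["A2", "B1"] - means["A1", "B1"]))

# Answers 2 to 4: incomplete ANOVA table for well-being.
ms_error, df_error, ss_total = 3.0, 80, 300.0
ms_gender = 5.0 * ms_error                 # MS = F * MSE
f_area = 4.5 / ms_error                    # F = MS / MSE
ss_error = ms_error * df_error             # SS = MS * df
explained = (ss_total - ss_error) / ss_total

print(main_a, main_b, interaction, ms_gender, f_area, explained)
```

All three effects are nonzero for the cell means in Question 1, and the model accounts for 60 of the 300 units of total variation.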

What is logistic regression? - ExamTests 14

 

Multiple Choice questions

Question 1

What probability is associated with an odds of 0.25?

A. 1/5
B. 1/4
C. 1/3
D. 1/25

Question 2

Which estimation method is used by SPSS for logistic regression?

A. Least squares method
B. Wald statistics
C. Maximum likelihood method

The following information belongs to questions 3 to 7. As a trial, drivers who are fined for a considerable speeding offence for the second time in 12 months are compulsorily sent to a course. It is examined whether these drivers reoffend in the 12 months after the course (dummy recidivism = 0 with no measured speeding offence, recidivism = 1 with another fine). A logistic regression is used to examine whether the chance of recidivism depends on the number of sessions of the course that the driver attended.

            B       S.E.    Wald    df   Sig.   Exp(B)
sessions    -.654   .256    6.529   1    .011   .520
constant    4.779   1.879   6.471   1    .011   118.944

Question 3

According to the results of this model, is there a significant relationship between recidivism and the number of sessions in the course?

A. Yes, p <.001 and so there is a significant relationship.
B. No, p> .001, so there is no significant relationship
C. Nothing can be said about this on the basis of this information.

Question 4

According to the model, what is the chance that someone will reoffend after 6 sessions?

A. 60%
B. 70%
C. 85%

Question 5

From how many sessions is the chance of recidivism smaller than 50%?

A. 6
B. 8
C. 9

Question 6

"Each extra session reduces the ... of recidivism by a factor of about two." Which word should be filled in?

A. Chance
B. Log odds
C. Odds

Question 7

For how many people does the model falsely suggest that they will reoffend?

A. 3
B. 4
C. 7

The following information belongs to questions 8 - 12. A doctor examines a group of 20 patients who have had a heart attack. The dependent variable is whether the patient had a second heart attack within a year (1 = yes, 0 = no). Explanatory variables are treatment, a dummy that indicates whether the patient has been in treatment in the past year to control anger outbursts, and anxiety, a score on a questionnaire that measures how anxious someone is. A logistic regression was performed via SPSS.

            B        S.E.    Wald    df   Sig.   Exp(B)
treatment   -1.024   1.171                       .359
anxiety     .119     .055    4.688   1    .030
constant    -6.363   3.214   3.920   1    .048   .002

Question 8

The third person in the data set received anger treatment and scored 50 points on the anxiety test. According to the regression model, what are the chances that this person will have another heart attack?

A. 19%
B. 24%
C. 61%

Question 9

What value for the test statistic of the Wald test for treatment does SPSS deliver?

A. 0.147
B. 0.765
C. 0.874

Question 10

Give a 95% confidence interval for the odds ratio for anxiety.

A. (1.011, 1.255)
B. (1.053, 1.185)
C. (1.060, 1.192)

Question 11

What is the effect of anger treatment on a second heart attack?

A. The chance of a second heart attack is about three times lower when a treatment is followed.
B. The odds of a second heart attack are about three times smaller when a treatment is followed.
C. The logit for a second heart attack is about three times smaller when a treatment is followed.

Question 12

The model incorrectly predicts that two people will not have a second heart attack. A second heart attack is incorrectly predicted for three people. What is the percentage of agreement that can be read from the classification table?

A. 70%
B. 75%
C. 80%

Question 13

Which probability is associated with an odds of 0.32?

A. Probability between 0% and 25%
B. Probability between 25% and 50%
C. Probability between 50% and 100%

Question 14

Which claim about logistic regression is true?

A. If an odds ratio for a variable does not differ significantly from 1, then the B coefficient for that variable does not differ significantly from 0.
B. An odds ratio greater than 1 means a positive linear relationship between the independent variable and the probability that the dependent variable is 1.
C. The logit transformation is necessary to satisfy the constant variance assumption.

Answers

Question  Answer  Explanation
1         A       odds = p / (1 - p)
                  Filling in p = 1/5 = 0.2 yields:
                  odds = 0.2 / 0.8 = 0.25
2         C
3         B       p = .011 so p is not smaller than .001. Hence, there is no significant relationship between recidivism and the number of sessions of the course.
4         B
5         B
6         C       A logistic regression equation says something about the odds.
7         B
8         A
9         B
10        A
11        B       The odds of a second heart attack are about three times smaller when a treatment is followed.
12        B
13        A       odds = p / (1 - p)
                  0.32 = p / (1 - p)
                  0.32 - 0.32p = p
                  0.32 = p + 0.32p
                  0.32 = 1.32p
                  p = 0.32 / 1.32
                  p ≈ 0.24 (≈ 24%)
14        A       If an odds ratio for a variable does not differ significantly from 1, then the B coefficient for that variable does not differ significantly from 0.
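
Most of the unexplained answers above follow from the logistic function and the coefficient tables. The sketch below is a hedged illustration: the helper names are hypothetical, and the interval in answer 10 assumes the usual normal-based interval exp(B ± 1.96 · S.E.).

```python
from math import exp, floor

def probability_from_odds(odds):
    """Invert odds = p / (1 - p)."""
    return odds / (1 + odds)

def logistic(logit):
    """Probability that corresponds to a given log odds."""
    return 1 / (1 + exp(-logit))

# Answers 1 and 13: odds to probability.
p_quarter = probability_from_odds(0.25)
p_032 = probability_from_odds(0.32)

# Answers 4 to 6: speeding course model.
b0, b1 = 4.779, -0.654
p_six_sessions = logistic(b0 + b1 * 6)
# The probability drops below 50% once the logit is negative.
first_below_half = floor(-b0 / b1) + 1
odds_factor = exp(b1)                      # multiplies the odds per session

# Answers 8 to 11: heart attack model, treated patient with anxiety = 50.
p_patient = logistic(-6.363 - 1.024 * 1 + 0.119 * 50)
wald_treatment = (-1.024 / 1.171) ** 2     # Wald = (B / S.E.)^2
ci_anxiety = (exp(0.119 - 1.96 * 0.055), exp(0.119 + 1.96 * 0.055))
odds_ratio_treatment = exp(-1.024)

# Answer 12: 5 of the 20 patients are misclassified.
agreement = (20 - 2 - 3) / 20

print(round(p_six_sessions, 2), first_below_half, round(p_patient, 2),
      round(wald_treatment, 3), [round(x, 3) for x in ci_anxiety])
```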

What are non-parametric tests? - ExamTests 15

 

Multiple Choice questions

Question 1

Two measurements are made on two groups with n1 = 12 and n2 = 18. The usual parametric assumptions appear to be fulfilled. Which test is best used to compare the two groups?

A. The paired t-test.
B. The regular (unpaired) t-test.
C. A separate t-test per group.
D. Kruskal-Wallis test

Question 2

Measurements are taken in three groups. Given: n1 = 12, n2 = 25 and n3 = 312. The data show, among other things, that s1 = 12.3, s2 = 12.6, s3 = 13. What is a valid reason for choosing the Kruskal-Wallis test over a one-way ANOVA?

A. That the sample size per group is so unevenly distributed.
B. When the assumption of normality appears to have been violated.
C. Both A and B are true.
D. Both A and B are not true.

Question 3

Three measurements are taken in two groups during a study. What is the smallest value that Wilcoxon's W can take?

A. 4
B. 5
C. 6
D. 7

Question 4

Analyses were performed on the scores of five groups. Subsequently, both a one-way ANOVA and the Kruskal-Wallis test were carried out. The Q-Q plot shows deviations at the top right of the figure. These deviations have approximately the same distance to the line. Furthermore, a box plot and output of both ANOVA and Kruskal-Wallis are provided. The box plot shows two outliers. What is a good reason for this study to carry out the Kruskal-Wallis test?

A. The assumption of normality appears to have been violated.
B. The homoscedasticity assumption has been violated.
C. Both A and B are true.

Question 5

The researcher is mainly interested in the differences in groups 1 and 4. Which test cannot be used to make a statement about this?

A. Kruskal-Wallis test
B. Wilcoxon rank sum test
C. Wilcoxon signed rank test

The following information is used for exercises 6 to 9. It is investigated to what extent sleep deprivation influences exam results. The researchers suspect that a lack of sleep has a negative effect on the exam scores. Based on theoretical arguments, the researchers strongly suspect that the data does not come from a normal distribution. Nine subjects are randomly divided into two groups. Group 1 undergoes an undisturbed night and is then asked to take a statistical exam. The participants in group 2 are awakened every hour at night and take the same statistics exam the following morning. This produces the following results:

  • Group 1: 104 96 112 106 100
  • Group 2: 98 104 92 100

Question 6

What is the Z value of Wilcoxon's rank sum test, based on the normal distribution approach?

A. 0.27
B. 1.10
C. 1.22

Question 7

What is the maximum value that the test statistic W1 of Wilcoxon's rank sum test can achieve at n1 = 5 and n2 = 4?

A. 28
B. 30
C. 35

Question 8

Another non-parametric test is the Kruskal-Wallis test. Can it also be used here to see if both groups differ?

A. No, because the rule of thumb that each group must have a minimum of five people has not been met.
B. No, because it has been developed for a situation with at least 3 groups.
C. Yes.

Question 9

Regardless of the answer to the previous question, what is the value of Kruskal-Wallis' H applied to the data?

A. -28.5
B. 1.5
C. 2.25

Answers

Question  Answer  Explanation
1         B       The assumptions have been fulfilled, so a parametric test can be performed. This is a comparison between two groups, so a regular t-test is most suitable here.
2         B       The Kruskal-Wallis test is an alternative non-parametric test. This is used when the assumptions for a parametric test (including normality) are not met.
3         C       W = 1 + 2 + 3 = 6
4         A       The Q-Q plot shows an anomaly, so the normality assumption appears to have been violated. This does not apply to the assumption of equal variances; the deviations above and below the line are at approximately the same distance from it.
5         C
6         B       \[ z = \frac{W - \mu_{W}}{\sigma_{W}} = \frac{W - n_{1} (N + 1) / 2}{\sqrt{n_{1}n_{2} (N+1) / 12 }} \]
7         C       All nine observations are assigned a rank:
                  1 2 3 4 5 6 7 8 9
                  For the maximum score at n1 = 5, take the highest 5 ranks, so Wmax = 5 + 6 + 7 + 8 + 9 = 35
8         C
9         B       \[ H = \frac{12}{N(N + 1)} \sum{\frac{R_{i}^{2}}{n_{i}} } - 3(N + 1) \]
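
The formulas for answers 6, 7 and 9 can be applied to the sleep data as follows. This is a sketch under assumptions: group 2 is taken to consist of the four scores 98, 104, 92 and 100 (n2 = 4, as in Question 7), tied scores receive average ranks, no tie correction is applied to the variance or to H, and the helper `midranks` is written for this example only.

```python
from math import sqrt

def midranks(values):
    """Rank all values, giving tied values the average of their ranks."""
    ordered = sorted(values)
    rank_of = {}
    for v in set(ordered):
        first = ordered.index(v) + 1      # rank of the first occurrence
        count = ordered.count(v)          # number of tied values
        rank_of[v] = first + (count - 1) / 2
    return [rank_of[v] for v in values]

group1 = [104, 96, 112, 106, 100]
group2 = [98, 104, 92, 100]               # assumed: four scores, n2 = 4
n1, n2 = len(group1), len(group2)
N = n1 + n2

ranks = midranks(group1 + group2)
w1 = sum(ranks[:n1])                      # rank sum of group 1
w2 = sum(ranks[n1:])                      # rank sum of group 2

# Answer 6: normal approximation for Wilcoxon's rank sum statistic.
mu_w = n1 * (N + 1) / 2
sigma_w = sqrt(n1 * n2 * (N + 1) / 12)
z = (w1 - mu_w) / sigma_w                 # without continuity correction
z_cc = (w1 - 0.5 - mu_w) / sigma_w        # with continuity correction

# Answer 7: largest possible rank sum for the group of size n1.
w_max = sum(range(N - n1 + 1, N + 1))

# Answer 9: Kruskal-Wallis H for the two groups.
h = 12 / (N * (N + 1)) * (w1 ** 2 / n1 + w2 ** 2 / n2) - 3 * (N + 1)

print(w1, round(z, 2), round(z_cc, 2), w_max, round(h, 2))
```

Without the continuity correction the z value is about 1.22, and with it about 1.10.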

 
