16. Predictions with Time-Series Data
Time series data involves measurements that are ordered over time, in which the sequence of observations is important. Most procedures for data analysis cannot be used for this data, as these procedures are based on the assumption that the errors are independent. Thus, different forms of analysis are needed.
The main goal of analysing time-series data is to make predictions. An important assumption here is that the relations between variables remain constant over time.
16.1. Time-Series Components
Most time series have the following four components:
 Trend component: Values grow or decrease steadily over long periods of time.
 Seasonality component: An oscillatory pattern that is specific to each season (quarter of the year) and repeats itself.
 Cyclical component: An oscillatory or cyclical pattern that is not related to seasonal behaviour.
 Irregular component: No series is regular enough to be described by these predictable patterns alone; each series of data will also have an irregular component (similar to the random error term).
Analysis of time-series data involves constructing a formal model in which most of these components are explicitly or implicitly present, in order to describe the behaviour of the data series. In building this model the series components can either be regarded as being fixed over time, or as steadily evolving over time.
16.2. Moving Averages
Moving averages are the basis for many practical adjustment procedures. They can be used to remove the irregular component or to smooth the seasonal component:
 Removing the irregular component: This is done by replacing each observation with the average of itself and its neighbours. The theory is that this will decrease the effect of the irregular component on each data point.
 Smoothing the seasonal component: For quarterly data this is done by producing four-period moving averages, so that each average contains exactly one value from every season and the seasonal values are combined into one single seasonal moving average. This does mean that the values have shifted in time (in comparison to the original series), but this can be corrected by centring the averages. The specific procedure always depends on the amount of stability the pattern is assumed to have, and on whether seasonality is thought to be additive or multiplicative (in the latter case, work with the logarithms of the series).
If the seasonal pattern is assumed to be stable, a further seasonal-adjustment approach can be used: the seasonal index method. Here the original series is expressed as a percentage of the centred 4-point moving average series.
Additionally, moving averages are very suitable for detecting cyclical components and/or trends.
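The centring step and the percentage calculation used by the seasonal index method can be sketched in a few lines of Python. This is only a minimal illustration, assuming quarterly data held in a plain list; the function names and the sample figures are invented for the example and are not taken from the book.

```python
def centred_moving_average(series, period=4):
    """Centred moving average for an even period (e.g. 4 quarters).

    Returns a list aligned with `series`; positions that lack a full
    window are filled with None.
    """
    half = period // 2
    result = [None] * len(series)
    for t in range(half, len(series) - half):
        # Two adjacent `period`-point averages: each falls between two
        # periods, so their mean is centred exactly on observation t.
        first = sum(series[t - half:t + half]) / period
        second = sum(series[t - half + 1:t + half + 1]) / period
        result[t] = (first + second) / 2
    return result


def percent_of_moving_average(series, period=4):
    """Original series as a percentage of its centred moving average."""
    cma = centred_moving_average(series, period)
    return [None if m is None else 100 * x / m for x, m in zip(series, cma)]


# Hypothetical quarterly data: an upward trend plus a repeating seasonal pattern.
quarterly = [10, 14, 8, 12, 14, 18, 12, 16, 18, 22, 16, 20]
print(centred_moving_average(quarterly))
print(percent_of_moving_average(quarterly))
```

In this made-up series the centred averages rise smoothly because the seasonal swings cancel within each window, while the percentages repeat from year to year, which is what a stable seasonal pattern would look like.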
16.3. Predictions using smoothing
There are various prediction methods, and the choice between them should always depend on the resources, the objectives, and the available data.
Simple exponential smoothing is a basic prediction method that is appropriate when the series is non-seasonal and has no consistent trend. It predicts future values on the basis of an estimate of the current level of the time series. This estimate is a weighted average of current and past values, in which the most recent observation receives the most weight and the weights decrease the older an observation is.
The smoothed series is then \hat{x}, with \hat{x}_{t} = (1 – α)\hat{x}_{t-1} + αx_{t}, where t indexes the moment in the time series and α is the smoothing constant. The smoothing constant is a value between 0 and 1 and differs per situation. It is possible to rely on experience or judgment to choose this value, or to try several different values and see which is most successful.
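The smoothing recursion and the idea of trying several values of α can be illustrated as follows. This is a sketch under two assumptions that the text does not state: the smoothed series is started at the first observation, and "more successful" is measured by the sum of squared one-step-ahead errors. The function names and data are hypothetical.

```python
def simple_exponential_smoothing(series, alpha):
    """Smoothed series: new = (1 - alpha) * previous smoothed + alpha * observation."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie between 0 and 1")
    smoothed = [series[0]]  # assumption: initialise with the first observation
    for x in series[1:]:
        smoothed.append((1 - alpha) * smoothed[-1] + alpha * x)
    return smoothed


def sum_squared_errors(series, alpha):
    """Sum of squared one-step-ahead errors, where the forecast of x_t
    is the smoothed value of the previous period."""
    smoothed = simple_exponential_smoothing(series, alpha)
    return sum((x - f) ** 2 for x, f in zip(series[1:], smoothed[:-1]))


# Hypothetical series without trend or seasonality.
data = [10, 12, 11, 13]
for alpha in (0.2, 0.5, 0.8):
    print(alpha, sum_squared_errors(data, alpha))

# The forecast for every future period is the last smoothed value.
print(simple_exponential_smoothing(data, 0.5)[-1])
```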
The Holt-Winters exponential smoothing procedure is a more advanced prediction method that allows for trend. It functions just like the simple exponential smoothing procedure, but with an added term for the trend estimate, T_{t-1}.
An extension of this method also allows for seasonality. This is done by using a set of recursive estimates from the time series, based on a level factor (α), a trend factor (β), and a multiplicative seasonal factor (γ).
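The non-seasonal version with a trend estimate can be sketched as below. The text does not give the recursions, so this example assumes one common textbook form: the level is updated as (1 – α)(previous level + previous trend) + α·x_t, the trend as (1 – β)·previous trend + β·(change in level), and the recursion starts with level x_2 and trend x_2 – x_1. The function names and data are hypothetical.

```python
def holt_linear_trend(series, alpha, beta):
    """Return the final (level, trend) estimates for a series with trend.

    Assumed initialisation: level = second observation,
    trend = second observation minus first observation.
    """
    if len(series) < 2:
        raise ValueError("at least two observations are needed")
    level = series[1]
    trend = series[1] - series[0]
    for x in series[2:]:
        previous_level = level
        # Level: smooth the new observation against last level plus trend.
        level = (1 - alpha) * (previous_level + trend) + alpha * x
        # Trend: smooth the latest change in level against the old trend.
        trend = (1 - beta) * trend + beta * (level - previous_level)
    return level, trend


def forecast(level, trend, steps):
    """Forecast h periods ahead as level + h * trend, for h = 1..steps."""
    return [level + h * trend for h in range(1, steps + 1)]


# Hypothetical series with a steady upward trend.
sales = [20, 23, 25, 29, 31, 34]
level, trend = holt_linear_trend(sales, alpha=0.5, beta=0.5)
print(forecast(level, trend, 3))
```

For a series that lies exactly on a straight line, these recursions reproduce the line, so the forecasts simply continue it.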
16.4. Predictions using Autoregression
Autoregressive models use the available time-series data to estimate the parameters of a model of the process that could have generated the time series. This is based on autocorrelation: the correlation between observations in adjacent periods. The resulting model is x_{t} = γ + φ_{1}x_{t-1} + ε_{t}, where γ and φ_{1} are fixed parameters. The parameter γ allows for the mean of the series x_{t} to be other than 0. The random variables ε_{t} have a mean of 0 and a fixed variance, and are not correlated with each other.
This is called a first-order autoregressive model. It is possible to extend this model by making the current value of the series dependent on the two most recent observations; this is then called a second-order autoregressive model.
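A first-order autoregressive model can be estimated by regressing each observation on the one before it. The sketch below assumes ordinary least squares as the estimation method, which the text does not specify; the function names and data are hypothetical.

```python
def fit_ar1(series):
    """Least-squares estimates (gamma, phi1) for
    x_t = gamma + phi1 * x_{t-1} + e_t."""
    lagged = series[:-1]    # x_{t-1}
    current = series[1:]    # x_t
    n = len(current)
    mean_lag = sum(lagged) / n
    mean_cur = sum(current) / n
    sxy = sum((a - mean_lag) * (b - mean_cur) for a, b in zip(lagged, current))
    sxx = sum((a - mean_lag) ** 2 for a in lagged)
    phi1 = sxy / sxx
    gamma = mean_cur - phi1 * mean_lag
    return gamma, phi1


def forecast_ar1(last_value, gamma, phi1, steps):
    """Iterate the fitted equation forward, setting future errors to 0."""
    forecasts = []
    value = last_value
    for _ in range(steps):
        value = gamma + phi1 * value
        forecasts.append(value)
    return forecasts


# Hypothetical series.
series = [10, 7.4, 5.3, 4.9, 4.2, 4.3, 3.9]
gamma, phi1 = fit_ar1(series)
print(gamma, phi1)
print(forecast_ar1(series[-1], gamma, phi1, 3))
```

A second-order model would add x_{t-2} as a second regressor and be fitted in the same way with multiple regression.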
16.5. The Box-Jenkins approach
It is worth briefly mentioning the Box-Jenkins approach to predictions from time-series data. This approach (1) defines a broad class of models for prediction, and then (2) develops a methodology for picking a suitable model from that class on the basis of the characteristics of the available data. The methodology has three general stages:
 Selecting a specific model that might be appropriate, based on summary statistics.
 Estimating the unknown coefficients in this model.
 Applying checks to determine whether the model adequately represents the available data.
This approach is useful due to its flexibility.
A general model class that can be used here is that of autoregressive integrated moving average models (ARIMA models).
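The text does not say which summary statistics are used in the selection stage. One statistic that is commonly examined for this purpose is the sample autocorrelation at several lags, so the sketch below shows how it can be computed. Treat this choice as an assumption made for illustration; the function name and data are hypothetical.

```python
def sample_autocorrelation(series, lag):
    """Sample autocorrelation at the given lag: the sum of products of
    deviations `lag` periods apart, divided by the total sum of squares."""
    n = len(series)
    mean = sum(series) / n
    denominator = sum((x - mean) ** 2 for x in series)
    numerator = sum(
        (series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n)
    )
    return numerator / denominator


# Hypothetical series: print the autocorrelations at lags 1 to 3.
observations = [12, 14, 13, 15, 17, 16, 18, 20, 19, 21]
for k in (1, 2, 3):
    print(k, sample_autocorrelation(observations, k))
```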
Summary of Statistics for Business and Economics
Multiple Regression (12)
12. Multiple Regression
Simple regression (see chapter 11) can predict a dependent variable as a function of a single independent variable. But often there are multiple variables at play. In order to determine the simultaneous effect of multiple independent variables on a dependent variable, multiple regression is used. The model is fitted using the least squares principle.
12.1. The model
As with simple regression, the first step in the model development is model specification, the selection of the model variables and functional form of the model. This is influenced by the model objectives, namely: (1) predicting the dependent variable, and/or (2) estimating the marginal effect of each independent variable. The second objective is hard to achieve, however, in a model with multiple independent variables, because these variables are not only related to the dependent variable but also to each other. This leaves a web of effects that is not easily untangled.
To make multiple regression models more accurate an error term “ε” is added, as a way to recognize that none of the described relationships in the model will hold exactly and there are likely to be variables that affect the dependent variable, but are not included in the model.
12.2. Estimating Coefficients
Multiple regression coefficients are calculated with the least squares procedure. However, this is again more complicated than with simple regression, as the independent variables are related not only to the dependent variable but also to each other. As a result, it is not possible to fully isolate the unique effect of each independent variable on the dependent variable. The higher the correlations between two or more of the independent variables in a model, the less reliable the estimated regression coefficients are.
There are 5 assumptions to standard multiple regression. The first 4 are the same as are made for simple regression (see chapter 11). The 5^{th} states that it is not possible to find a set of nonzero numbers c_{0}, c_{1}, ..., c_{K} such that c_{0} + c_{1}x_{1i} + ... + c_{K}x_{Ki} = 0 for every observation i. This assumption excludes the cases in which there is an exact linear relationship among the independent variables. In most cases this assumption will not be violated if the model is properly specified.
Whereas in simple regression the least squares procedure finds a line that best represents the set of points in space, multiple regression finds a plane that best represents these points (as each variable is represented with its own dimension).
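For two independent variables the least squares plane can be computed directly from sums of squares and cross-products. The sketch below uses made-up data and hypothetical function names. It also shows that the slope for x1 changes once a correlated x2 enters the model, and that the calculation breaks down when one independent variable is an exact linear function of the other.

```python
def fit_plane(x1, x2, y):
    """Least-squares estimates (b0, b1, b2) for y = b0 + b1*x1 + b2*x2 + e."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Sums of squares and cross-products around the means.
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    if abs(det) < 1e-12:
        # x1 and x2 are exactly linearly related: no unique plane exists.
        raise ValueError("independent variables are perfectly collinear")
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2


def simple_slope(x, y):
    """Slope of the simple regression of y on x alone."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (c - my) for a, c in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Made-up data generated from y = 1 + 2*x1 + 3*x2, with x1 and x2 correlated.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [9, 8, 19, 18, 29]
print(fit_plane(x1, x2, y))   # coefficients conditional on both variables
print(simple_slope(x1, y))    # slope when x2 is left out of the model
```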
It is important to be aware of the fact that in a multiple regression it is not possible to know which independent variable predicts which change in the dependent variable. After all, each estimated slope coefficient is affected by the correlations between all independent and dependent variables. This also means that any multiple regression coefficient is dependent on all independent variables in the model. These coefficients are thus referred to as conditional coefficients. This is the case in all multiple regression models unless there are two independent variables with a sample correlation of zero (but this is very unlikely). Because of this ...
Several Aspects of Regression Analysis (13)
13. Several Aspects of Regression Analysis
This chapter focusses on topics that add to the understanding of regression analysis. This includes alternative specifications for these models, and what happens in the situations where basic regression assumptions are violated.
13.1. Developing models
The goal when developing a model is to approximate the complex reality as closely as possible with a relatively simple model, which can then be used to provide insight into reality. It is impossible to represent all of the influences in the real situation in a model; instead, only the most influential variables are selected.
Building a statistical model has 4 stages:
 Model Specification: This step involves the selection of the variables (dependent and independent), the algebraic form of the model, and the required data. In order to do this correctly it is important to understand the underlying theory and context for the model. This stage may require serious study and analysis. This step is crucial to the integrity of the model.
 Coefficient Estimation: This step involves using the available data to estimate the coefficients and/or parameters in the model. The desired values are dependent on the objective of the model. Roughly there are two goals:
 Predicting the mean of the dependent variable: In this case it is desirable to have a small standard error of the estimate, s_{e}. The correlations between independent variables need to be steady, and there needs to be a wide spread for these independent variables (as this means that the prediction variance is small).
 Estimating one or more coefficients: In this case a number of problems arise, as there is always a trade-off between estimator bias and variance, within which a proper balance must be found. Including an independent variable that is highly correlated with other independent variables decreases bias but increases variance. Excluding the variable decreases variance, but increases bias. This is the case because both these correlations and the spread of the independent variables influence the standard deviation of the slope coefficients, s_{b}.
 Model Verification: This step involves checking whether the model is still accurate in its portrayal of reality. This is important because simplifications and assumptions are often made while constructing the model, and these can lead to the model becoming (too) inaccurate. It is important to examine the regression assumptions, the model specification, and the selected data. If something is wrong here, we return to step 1.
 Interpretation and Inference: This step involves drawing conclusions from the outcomes of the model. Here it is important to remain critical. Inferences drawn from these outcomes can only be accurate if the previous 3 steps have been completed properly. If these outcomes differ from expectations or previous findings you must be critical about whether this is due to the model or whether you really have found something new.
13.2. Further Application of Dummy Variables
Dummy variables were introduced in chapter 12 as a way to include categorical variables in regression analysis. Further uses for these variables will be ...
Analysis of Variance (15)
15. Analysis of Variance
There are situations and experiments that require processes to be compared at more than two levels. Data from such experiments can be analysed using analysis of variance or ANOVA.
15.1. Comparing Population Means
There are other ways to compare population means than ANOVA, but these are based on the assumption of either paired observations or independent random samples, and can only be used to compare two population means. ANOVA can be used to compare more than two population means. It is also built on assessments of variation, which form a large problem in the other methods.
15.2. One-Way ANOVA
The procedure for testing the equality of population means is called a one-way ANOVA. This procedure is based on the assumption that all included populations have a common variance.
The total sum of squares (SST) in this procedure is made up of a within-groups sum of squares (SSW) and a between-groups sum of squares (SSG): SST = SSW + SSG
This division of the SST forms the basis of the one-way ANOVA, as it expresses the total variability around the mean for the sample observations.
If the null hypothesis is true (all population means are the same) then both SSW and SSG can be used to estimate the common population variance. This is done by dividing by the appropriate number of degrees of freedom.
Because SSW and SSG both provide an unbiased estimate of the common population variance if the null hypothesis is true, a difference between the two values indicates that the null hypothesis is false. The test of the null hypothesis is thus based on the ratio of mean squares:
F = MSG / MSW
Where MSG = SSG / (K – 1) and MSW = SSW / (n – K), with K the number of groups and n the total number of observations. This rests on the assumptions that the population variances are equal and the population distributions are normal.
The closer the ratio is to 1, the less indication there is that the null hypothesis is false.
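The decomposition SST = SSW + SSG and the ratio of mean squares can be computed as in the sketch below, which divides SSG by K – 1 and SSW by n – K degrees of freedom. The function name and the sample data are hypothetical.

```python
def one_way_anova(groups):
    """Return the one-way ANOVA quantities for a list of samples."""
    k = len(groups)                      # number of groups, K
    n = sum(len(g) for g in groups)      # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]

    # Within-groups variability: deviations from each group's own mean.
    ssw = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    # Between-groups variability: group means around the overall mean.
    ssg = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))

    msg = ssg / (k - 1)
    msw = ssw / (n - k)
    return {
        "SSW": ssw, "SSG": ssg, "SST": ssw + ssg,
        "MSG": msg, "MSW": msw, "F": msg / msw,
    }


# Hypothetical samples from three populations.
samples = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
print(one_way_anova(samples))
```

The resulting F value is then compared with the critical value of the F distribution with K – 1 and n – K degrees of freedom, which is looked up separately.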
These results are also summarized in a one-way ANOVA table, which has the following format:
Source of Variation | Sum of Squares | Degrees of Freedom | Mean Squares | F-ratio
Between groups | SSG | K – 1 | MSG | MSG/MSW
Within groups | SSW | n – K | MSW |
Total | SST | n – 1 | |
It is also possible to calculate a minimum significant difference (MSD) between two sample means, as evidence to conclude whether the population means are different. This is done:
With s_{p} being the estimate of variance (), ...
Sampling (17)
17. Sampling
There are various ways of sampling a population, according to research and analysis goals.
17.1. Stratified Sampling
Stratified sampling involves breaking the population into strata (a.k.a. subgroups) according to a specific identifiable characteristic in such a way that each member of the population belongs to only one stratum. Stratified random sampling is the process of selecting independent simple random samples from each stratum. A question that arises here is how to allocate the sampling effort among the strata. There are various possibilities:
 Proportional allocation: The proportion of the sample from a stratum is the same as the proportion of that stratum to the population. This is used if there is little to nothing known about the population and there are no strong requirements for the production of information.
 Optimal allocation: More sample effort is allocated to strata with a higher population variance. This is used if the objective is to estimate an overall population parameter (such as mean, total, or proportion) as precisely as possible. This method is only optimal with this goal in mind.
Analysing the results of stratified random samples is relatively straightforward, and any stratum sample mean (m_{j}) can be used as an unbiased estimator of the corresponding stratum population mean (μ_{j}). These estimates can also be used to estimate the population total, as this is the product of the population mean and the number of population members.
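The two allocation rules and the estimate of the overall mean can be sketched as follows. The text only says that optimal allocation gives more effort to strata with a higher variance; the example assumes the commonly used rule in which the sample size for a stratum is proportional to the stratum size times its standard deviation. The function names and figures are hypothetical, and the resulting sample sizes are left unrounded.

```python
def proportional_allocation(stratum_sizes, sample_size):
    """Sample size per stratum in proportion to the stratum's population share."""
    total = sum(stratum_sizes)
    return [sample_size * size / total for size in stratum_sizes]


def optimal_allocation(stratum_sizes, stratum_sds, sample_size):
    """Assumed rule: sample size proportional to (stratum size) x
    (stratum standard deviation)."""
    weights = [size * sd for size, sd in zip(stratum_sizes, stratum_sds)]
    total = sum(weights)
    return [sample_size * w / total for w in weights]


def stratified_mean(stratum_sizes, stratum_sample_means):
    """Estimate of the overall population mean: the stratum sample means
    weighted by stratum size."""
    total = sum(stratum_sizes)
    return sum(n * m for n, m in zip(stratum_sizes, stratum_sample_means)) / total


# Hypothetical population of 1000 members in three strata.
sizes = [100, 300, 600]
print(proportional_allocation(sizes, 100))
print(optimal_allocation(sizes, [4, 2, 1], 100))

mean_estimate = stratified_mean(sizes, [10, 20, 30])
print(mean_estimate, mean_estimate * sum(sizes))  # mean and population total
```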
17.2. Other Ways to Sample
Various other sampling methods are:
 Cluster Sampling: This method can be used when a population can be subdivided into small geographical units, or clusters. A simple random sample of clusters is then selected, and each member of these clusters is contacted for data. Using this method very little prior information of the population is needed.
 Two-Phase Sampling: In this method the regular data collection is preceded by a smaller pilot study, in which a smaller sample is used. This costs more time but allows for methods and procedures to be improved, and can provide some estimations for the true study.
 Nonrandom sampling: There are two main methods:
 Nonprobabilistic sampling: Sample members are selected by convenience. This often means that the sample is not representative of the population and lacks proper statistical validity.
 Quota sampling: There are specified numbers of people of certain characteristics (race, age, gender etc.) that are contacted. This usually produces quite accurate estimates of population parameters, but it is not possible to determine the reliability of these estimates, because the sample was not randomly chosen.
Bullets Statistics for Business and Economics
12. Multiple Regression
 Regression objectives are either to predict the value of the dependent variable, or to estimate the marginal effect of each independent variable.
 A population multiple regression model is a model that includes multiple independent variables.
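 Standard multiple regression assumptions include the four standard simple regression assumptions, plus a fifth one: It is not possible to find a set of nonzero numbers c_{0}, c_{1}, ..., c_{K} such that c_{0} + c_{1}x_{1i} + ... + c_{K}x_{Ki} = 0 for every observation i (no exact linear relationship among the independent variables).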
 Multiple regression models include an error term, ε, that represents variability caused by variables not included in the model.
 In multiple regression coefficients are estimated using least squares, but these estimates become less reliable the higher the correlations between independent variables are.
 Any regression coefficient in a multiple regression model is dependent on all independent variables in the model; these coefficients are thus referred to as conditional coefficients.
 Mean square regression (MSR) is the sum of squares regression divided by the number of independent variables (SSR/K); it measures the variability in the dependent variable that is explained by the regression model, per independent variable.
 In a multiple regression model the total sum of squares (SST; or sample variability) can be split into the sum of squares regression (SSR; or explained variability) and the sum of squares error (SSE; or unexplained variability). This is referred to as sum-of-squares decomposition.
 The coefficient of determination, R^{2}, describes the strength of the linear relationship between the independent variables and the dependent variable, and is calculated by 1 – SSE/SST.
 Adding more independent variables leads to a misleading increase in R^{2}, which can be avoided by calculating the adjusted coefficient of determination.
 The coefficient variance estimator is denoted s^{2}_{b}; its square root is the coefficient standard error.
 Multiple regression models can be transformed into nonlinear models, namely quadratic models and logarithmic models.
 Dummy variables can be used to represent categorical data in a regression model, and have a value of either 0 or 1.
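The sum-of-squares decomposition and the two coefficients of determination from the bullets above can be checked numerically with the sketch below. The adjusted coefficient is computed as 1 – [SSE/(n – K – 1)] / [SST/(n – 1)], with n observations and K independent variables. The function names and numbers are hypothetical.

```python
def sums_of_squares(observed, fitted):
    """Return (SST, SSR, SSE) for observed values and fitted values."""
    mean = sum(observed) / len(observed)
    sst = sum((y - mean) ** 2 for y in observed)
    sse = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    # SSR = SST - SSE holds for a least-squares fit with an intercept.
    ssr = sst - sse
    return sst, ssr, sse


def r_squared(sst, sse):
    """Coefficient of determination: 1 - SSE/SST."""
    return 1 - sse / sst


def adjusted_r_squared(sst, sse, n, k):
    """Adjusted coefficient of determination for n observations and
    k independent variables."""
    return 1 - (sse / (n - k - 1)) / (sst / (n - 1))


# Hypothetical values: SST = 100, SSE = 20, 25 observations, 2 regressors.
print(r_squared(100, 20))
print(adjusted_r_squared(100, 20, n=25, k=2))
```

Because the adjustment divides SSE by n – K – 1, adding a variable that barely reduces SSE can lower the adjusted value even though R^{2} itself rises.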
13. Additional Topics in Regression Analysis
 Models are developed through four steps: model specification (selecting the variables, the algebraic form, and the data), coefficient estimation, model verification (checking whether the model is still accurate), and interpretation and inference.
 Dummy variables can be used to represent more than two categories by using multiple dummy variables. The rule is: number of dummy variables = number of categories – 1.
 In time-series data the value of the dependent variable in one period is often related to its value in the previous period. When that previous value is included in the model as an independent variable, it is referred to as a lagged dependent variable.
 Not including important independent variables in a model can make any conclusions drawn from this model faulty.
 Multicollinearity is the phenomenon of two or more highly correlated independent variables. This leads to misleading estimated coefficients.
 Correlations between error terms are called autocorrelated errors. This leads to the estimated standard errors for the coefficients being biased, null hypotheses being falsely rejected, and confidence intervals being too narrow. Autocorrelation can be formally tested with the Durbin-Watson test.
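The Durbin-Watson statistic mentioned in the last bullet is computed from the regression residuals as d = Σ(e_{t} – e_{t-1})^{2} / Σe_{t}^{2}. A minimal sketch, with a hypothetical function name and made-up residuals:

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: squared successive differences of the
    residuals divided by the sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Made-up residuals that drift slowly, as positively autocorrelated errors do.
print(durbin_watson([0.5, 0.7, 0.4, -0.1, -0.6, -0.5, -0.2, 0.3]))
```

The statistic lies between 0 and 4. Values near 2 suggest no first-order autocorrelation, values towards 0 suggest positive autocorrelation, and values towards 4 suggest negative autocorrelation. The formal test compares d with tabulated bounds.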
15. Analysis of Variance
 An Analysis of Variance (ANOVA) can be used to analyze data at more ...
Practice Questions for Statistics for Business and Economics
12. Multiple Regression
1. A study was conducted to assess the influence of various factors on the start of new firms in the agricultural industry. For a sample of 70 countries the following model was estimated:
ŷ = 59.31 + 4.983x1 + 2.198x2 + 3.816x3 - 0.310x4
            (1.156)   (0.210)   (2.063)   (0.330)
    - 0.886x5 + 3.215x6 + 0.85x7
      (3.055)   (1.568)   (0.354)
R^{2} = 0.766
where:
ŷ = new business starts in the industry
x1 = population in millions
x2 = industry size
x3 = measure of economic quality of life
x4 = measure of political quality of life
x5 = measure of environmental quality of life
x6 = measure of health and educational quality of life
x7 = measure of social quality of life
The numbers in parentheses under the coefficients are the estimated coefficient standard errors.
a. Interpret the estimated regression coefficients.
b. Interpret the coefficient of determination.
c. Find a 90% confidence interval for the increase in new business starts resulting from a one-unit increase in the economic quality of life, with all other variables unchanged.
d. Test, against a two-sided alternative at the 5% level, the null hypothesis that, all else remaining equal, the environmental quality of life does not influence new business starts.
e. Test, against a two-sided alternative at the 5% level, the null hypothesis that, all else remaining equal, the health and educational quality of life does not influence new business starts.
f. Test the null hypothesis that, taken together, these seven independent variables do not influence new business starts.
2. Based on 25 years of annual data, an attempt was made to explain savings in Japan. The model fitted was as follows:
y = b0 + b1x1 + b2x2 + e
where
y = change in real deposit rate
x1 = change in real per capita income
x2 = change in real interest rate
The least squares parameter estimates (with standard errors in parentheses) were (Ghatak and Deadman 1989) as follows:
b1 = 0.0974 (0.0215)    b2 = 0.374 (0.209)
The adjusted coefficient of determination was as follows:
\bar{R}^{2} = .91
a. Find and interpret a 99% confidence interval for b1.
b. Test, against the alternative that it is positive, the null hypothesis that b2 is 0.
c. Find the coefficient of determination.
d. Test the null hypothesis that b1 = b2 = 0.
e. Find and interpret the coefficient of multiple correlation.
3. Based on data from 63 countries, the following model was estimated by least squares:
ŷ = 0.58 - .052x1 - .005x2    R^{2} = .17
           (.019)   (.042)
where:
ŷ = growth rate in real gross domestic product
x1 = real income per capita
x2 = average tax rate, as a proportion of gross ...