
Tuesday, December 13, 2011

HYPOTHESIS


INTRODUCTION
A hypothesis is a preliminary or tentative explanation, put forward by the researcher, of what the outcome of an investigation is expected to be. It is an informed or educated guess that indicates the researcher's expectations regarding certain variables. It is the most specific way in which an answer to a problem can be stated.
MEANING
A hypothesis is an assumption, supposition, or possibility that is to be proved or disproved. The term has two common senses:
1. A tentative explanation for an observation, phenomenon, or scientific problem that can be tested by further investigation.
2. Something taken to be true for the purpose of argument or investigation; an assumption. A statement that explains or makes generalizations about a set of facts or principles, usually forming a basis for possible experiments to confirm its viability.

DEFINITION
            “A hypothesis is a tentative generalization, the validity of which remains to be tested.”
-          George A. Lundberg

WHEN IS A HYPOTHESIS FORMULATED
A hypothesis is formulated after the problem has been stated and the literature study has been concluded. It is formulated when the researcher is fully aware of the theoretical and empirical background to the problem.
THE PURPOSE AND FUNCTION OF A HYPOTHESIS
  • It gives direction to an investigation.
  • It structures the next phase in the investigation and therefore furnishes continuity to the examination of the problem.
CHARACTERISTICS OF A HYPOTHESIS
  • It must be verifiable.
  • It must be formulated in simple, understandable terms.
  • It should be clear and precise.
  • It should be capable of being tested.
  • A relational hypothesis should state the relationship between variables.
  • It should be specific and limited in scope.
  • It should be consistent with most known facts.
  • It should be amenable to testing within a reasonable time.
  • An important requirement for hypotheses is TESTABILITY.
  • A condition for testability is CLEAR and UNAMBIGUOUS CONCEPTS.
OTHER CHARACTERISTICS
  1. A good hypothesis is based on sound reasoning.
    1. Your hypothesis should be based on previous research.
    2. The hypothesis should follow the most likely outcome, not the exceptional outcome.

  2. A good hypothesis provides a reasonable explanation for the predicted outcome.
    1. Do not look for unrealistic explanations.

  3. A good hypothesis clearly states the relationship between the defined variables.
    1. A clear, simply written hypothesis is easier to test.
    2. Do not be vague.

  4. A good hypothesis defines the variables in easy-to-measure terms.
    1. Who are the participants?
    2. What is different or will be different in your test?
    3. What is the effect?

  5. A good hypothesis is testable in a reasonable amount of time.
    1. Do not plan a test that will take longer than your class project.

TYPES
DESCRIPTIVE HYPOTHESIS:
            Descriptive hypotheses are propositions that describe the existence, size, form or distribution of some variable.
RELATIONAL HYPOTHESIS:
            It describes the relationship between two variables.
WORKING HYPOTHESIS:
            The working hypothesis indicates the nature of the data and the methods of analysis required for the study. Working hypotheses are subject to modification as the investigation proceeds.
NULL HYPOTHESIS:
            When a hypothesis is stated negatively, that is, as a statement of no difference or no relationship, it is called a null hypothesis. A null hypothesis should always be specific. It is the hypothesis that the researcher wishes to disprove.
ALTERNATIVE HYPOTHESIS:
            The set of alternatives to the null hypothesis is referred to as the alternative hypothesis. The alternative hypothesis is usually the one that the researcher wishes to prove.

STATISTICAL HYPOTHESIS:
            It is a quantitative statement about a population. When the researcher derives a hypothesis from a sample and hopes it holds true for the entire population, it is known as a statistical hypothesis.
SIMPLE HYPOTHESIS:
            It states the existence of certain empirical uniformities. Many empirical uniformities are common in sociological research.
COMPOSITE HYPOTHESIS:
            These hypotheses aim at testing the existence of logically derived relationships between the empirical uniformities obtained.
EXPLANATORY HYPOTHESIS:
            It states that the existence of one independent variable causes or leads to an effect on a dependent variable.
PROCEDURE OF TESTING A HYPOTHESIS:
Making a formal statement:
Construct a formal statement of the null hypothesis and also of the alternative hypothesis.
For example:
        Null hypothesis: H0
        Alternative hypothesis: Ha
Selecting a statistical technique:
            Several statistical tests are frequently used in hypothesis testing, including the Z-test, t-test, χ²-test, and F-test. The researcher has to select the test appropriate to the research.
Selecting the significance level:
            Hypotheses are tested at a pre-determined level of significance. In practice, either the 5% level or the 1% level of significance is adopted for accepting or rejecting a hypothesis.
Choosing the two-tailed and one-tailed tests:
            The hypothesis indicates whether we should use a one-tailed test or a two-tailed test. If the alternative hypothesis is of the type “greater than” or of the type “less than”, we use a one-tailed test. If, on the other hand, the alternative hypothesis is of the type “not equal to”, we use a two-tailed test.
Compute the appropriate statistics from the sample data:
            A random sample has to be selected according to the chosen sample design. From the collected data, the appropriate statistic or measure is computed with reference to the research question, the type of hypothesis to be tested, and the level of measurement of the data.
Compute the significance test value:
            After the sample statistic is calculated, the formula for the selected significance test is used to obtain the calculated test value.
Obtain the critical test value:
            We must locate the critical value in the table concerned with the selected probability distribution for the given level of significance for the appropriate number of degrees of freedom. The critical value so located in the table is commonly known as table value.
Deriving the inference:
            The calculated value is then compared with the predetermined critical value. If the calculated value exceeds the critical value at the 5% level, the difference is considered significant. If, on the other hand, the calculated value is less than the critical value at the 5% level, the difference is considered not significant.
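
The comparison of a calculated value with a critical value can be sketched in Python. This is only a minimal illustration, assuming a two-tailed Z-test with a known population standard deviation. The function name `z_test_decision` and the sample figures are hypothetical and do not come from the text above.

```python
from math import sqrt
from statistics import NormalDist


def z_test_decision(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-tailed Z-test: compare the calculated value with the critical (table) value."""
    # Calculated test value: distance of the sample mean from mu0 in standard errors.
    z_calc = (sample_mean - mu0) / (sigma / sqrt(n))
    # Critical value from the standard normal distribution for a two-tailed test.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # The difference is significant when the calculated value exceeds the critical value.
    significant = abs(z_calc) > z_crit
    return z_calc, z_crit, significant


# Hypothetical data: n = 36, sample mean 52, hypothesized mean 50, sigma 6.
z_calc, z_crit, significant = z_test_decision(52, 50, 6, 36)
print(f"calculated = {z_calc:.2f}, critical = {z_crit:.2f}, significant = {significant}")
```

Here the critical value is computed instead of being read from a printed table; both give the same number for a given significance level.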

Hypothesis Tests

Statisticians follow a formal process to determine whether to reject a null hypothesis, based on sample data. This process, called hypothesis testing, consists of four steps.
  • State the hypotheses. This involves stating the null and alternative hypotheses. The hypotheses are stated in such a way that they are mutually exclusive. That is, if one is true, the other must be false.
  • Formulate an analysis plan. The analysis plan describes how to use sample data to evaluate the null hypothesis. The evaluation often focuses around a single test statistic.
  • Analyze sample data. Find the value of the test statistic (mean score, proportion, t-score, z-score, etc.) described in the analysis plan.
  • Interpret results. Apply the decision rule described in the analysis plan. If the value of the test statistic is unlikely, based on the null hypothesis, reject the null hypothesis.

Decision Errors            

Two types of errors can result from a hypothesis test.
  • Type I error. A Type I error occurs when the researcher rejects a null hypothesis when it is true. The probability of committing a Type I error is called the significance level. This probability is also called alpha, and is often denoted by α.
  • Type II error. A Type II error occurs when the researcher fails to reject a null hypothesis that is false. The probability of committing a Type II error is called Beta, and is often denoted by β. The probability of not committing a Type II error is called the Power of the test.
                 H0 true               H0 false
Reject H0        Type I error (α)      OK
Accept H0        OK                    Type II error (β)
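
The relationship between α, β, and power can be illustrated with a short calculation. The sketch below assumes a right-tailed Z-test of H0: μ = μ0 with a known population standard deviation; the function name `power_of_z_test` and the numbers passed to it are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_of_z_test(mu0, mu_true, sigma, n, alpha=0.05):
    """Return (beta, power) for a right-tailed Z-test of H0: mu = mu0."""
    std_normal = NormalDist()
    se = sigma / sqrt(n)
    # Reject H0 when the sample mean exceeds this cutoff; P(reject | H0 true) = alpha.
    cutoff = mu0 + std_normal.inv_cdf(1 - alpha) * se
    # Type II error: the sample mean stays below the cutoff although mu = mu_true.
    beta = NormalDist(mu_true, se).cdf(cutoff)
    return beta, 1 - beta


beta, power = power_of_z_test(mu0=100, mu_true=105, sigma=15, n=36)
print(f"beta = {beta:.3f}, power = {power:.3f}")
```

When `mu_true` equals `mu0`, the returned power equals α, which is the Type I error rate described above.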

Decision Rules

The analysis plan includes decision rules for rejecting the null hypothesis. In practice, statisticians describe these decision rules in two ways - with reference to a P-value or with reference to a region of acceptance.
  • P-value. The strength of evidence in support of a null hypothesis is measured by the P-value. Suppose the test statistic is equal to S. The P-value is the probability of observing a test statistic at least as extreme as S, assuming the null hypothesis is true. If the P-value is less than the significance level, we reject the null hypothesis.
  • Region of acceptance. The region of acceptance is a range of values. If the test statistic falls within the region of acceptance, the null hypothesis is not rejected. The region of acceptance is defined so that the chance of making a Type I error is equal to the significance level.
The set of values outside the region of acceptance is called the region of rejection. If the test statistic falls within the region of rejection, the null hypothesis is rejected. In such cases, we say that the hypothesis has been rejected at the α level of significance.
These approaches are equivalent. Some statistics texts use the P-value approach; others use the region of acceptance approach. The worked problems at the end of this post use the P-value approach.
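
The equivalence of the two decision rules can be checked numerically. The following sketch assumes a two-tailed Z statistic; `decide_both_ways` is a hypothetical helper name, and the z values are arbitrary.

```python
from statistics import NormalDist


def decide_both_ways(z, alpha=0.05):
    """Apply both decision rules to a two-tailed Z statistic and return each verdict."""
    std_normal = NormalDist()
    # P-value rule: probability of a statistic at least as extreme as z under H0.
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    reject_by_p = p_value < alpha
    # Region-of-acceptance rule: H0 is not rejected while -z_crit <= z <= z_crit.
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    reject_by_region = abs(z) > z_crit
    return reject_by_p, reject_by_region


for z in (-2.5, -1.0, 0.3, 1.9, 2.2):
    by_p, by_region = decide_both_ways(z)
    print(f"z = {z:+.1f}: reject by P-value = {by_p}, reject by region = {by_region}")
```

For every statistic in the loop, the two verdicts agree.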

One-Tailed and Two-Tailed Tests

A test of a statistical hypothesis, where the region of rejection is on only one side of the sampling distribution, is called a one-tailed test. For example, suppose the null hypothesis states that the mean is less than or equal to 10. The alternative hypothesis would be that the mean is greater than 10. The region of rejection would consist of a range of numbers located on the right side of sampling distribution; that is, a set of numbers greater than 10.
A test of a statistical hypothesis, where the region of rejection is on both sides of the sampling distribution, is called a two-tailed test. For example, suppose the null hypothesis states that the mean is equal to 10. The alternative hypothesis would be that the mean is less than 10 or greater than 10. The region of rejection would consist of a range of numbers located on both sides of sampling distribution; that is, the region of rejection would consist partly of numbers that were less than 10 and partly of numbers that were greater than 10.
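
How the choice of tail changes the P-value can be sketched as follows. This assumes a Z statistic for simplicity; the function name `p_value`, the labels for the alternatives, and the value z = 1.8 are hypothetical.

```python
from statistics import NormalDist


def p_value(z, alternative):
    """P-value of a Z statistic for 'greater', 'less', or 'two-sided' alternatives."""
    cdf = NormalDist().cdf
    if alternative == "greater":      # region of rejection in the right tail
        return 1 - cdf(z)
    if alternative == "less":         # region of rejection in the left tail
        return cdf(z)
    if alternative == "two-sided":    # region of rejection split across both tails
        return 2 * (1 - cdf(abs(z)))
    raise ValueError(f"unknown alternative: {alternative!r}")


z = 1.8  # hypothetical test statistic
for alt in ("greater", "less", "two-sided"):
    print(f"{alt:>9}: P-value = {p_value(z, alt):.4f}")
```

The two-sided P-value is twice the smaller one-sided P-value, so the same statistic can be significant in a one-tailed test and not significant in a two-tailed test.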

A General Procedure for Conducting Hypothesis Tests

All hypothesis tests are conducted the same way. The researcher states a hypothesis to be tested, formulates an analysis plan, analyzes sample data according to the plan, and accepts or rejects the null hypothesis, based on results of the analysis.
  • State the hypotheses. Every hypothesis test requires the analyst to state a null hypothesis and an alternative hypothesis. The hypotheses are stated in such a way that they are mutually exclusive. That is, if one is true, the other must be false; and vice versa.
  • Formulate an analysis plan. The analysis plan describes how to use sample data to accept or reject the null hypothesis. It should specify the following elements.
    • Significance level. Often, researchers choose significance levels equal to 0.01, 0.05, or 0.10; but any value between 0 and 1 can be used.
    • Test method. Typically, the test method involves a test statistic and a sampling distribution. Computed from sample data, the test statistic might be a mean score, proportion, difference between means, difference between proportions, z-score, t-score, chi-square, etc. Given a test statistic and its sampling distribution, a researcher can assess probabilities associated with the test statistic. If the test statistic probability is less than the significance level, the null hypothesis is rejected.

  • Analyze sample data. Using sample data perform computations called for in the analysis plan.
    • Test statistic. When the null hypothesis involves a mean or proportion, use either of the following equations to compute the test statistic.
Test statistic = (Statistic - Parameter) / (Standard deviation of statistic)
Test statistic = (Statistic - Parameter) / (Standard error of statistic)
where Parameter is the value appearing in the null hypothesis, and Statistic is the point estimate of Parameter. As part of the analysis, you may need to compute the standard deviation or standard error of the statistic. Previously, we presented common formulas for the standard deviation and standard error. When the parameter in the null hypothesis involves categorical data, you may use a chi-square statistic as the test statistic. Instructions for computing a chi-square test statistic are presented in the lesson on the chi-square goodness of fit test.
    • P-value. The P-value is the probability of observing a sample statistic as extreme as the test statistic, assuming the null hypothesis is true.
  • Interpret the results. If the sample findings are unlikely, given the null hypothesis, the researcher rejects the null hypothesis. Typically, this involves comparing the P-value to the significance level, and rejecting the null hypothesis when the P-value is less than the significance level.
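
The four steps can be traced in code for the case of a single proportion. This is a sketch under stated assumptions: a two-tailed test, a sample large enough for the normal approximation, and a standard deviation of the statistic computed from the hypothesized proportion. The name `proportion_z_test` and the data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(successes, n, p0, alpha=0.05):
    """Two-tailed one-sample Z-test of a proportion, following the four steps."""
    # Step 1: H0: p = p0 against Ha: p != p0.
    # Step 2: the analysis plan is fixed by alpha and the choice of a Z statistic.
    # Step 3: test statistic = (Statistic - Parameter) / (standard deviation of statistic).
    p_hat = successes / n
    sd = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / sd
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Step 4: reject H0 when the P-value is below the significance level.
    return z, p_value, p_value < alpha


z, p, reject = proportion_z_test(successes=58, n=100, p0=0.5)
print(f"z = {z:.2f}, P-value = {p:.3f}, reject H0 = {reject}")
```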

Parametric test:
Parametric methods were developed on the assumption that the underlying distribution is of a known form, such as normal or exponential. Important parametric tests of significance include the t-test, F-test, and Z-test. With these tests, conclusions about significance are drawn from the observed values and their distribution, on the basis of the nature and extent of the difference between what is observed and what is expected.
Non-parametric tests:
            Non-parametric methods are distribution-free methods: they make no assumption about the underlying distribution. Hence, they can be used regardless of the shape of the underlying distribution. They are suitable for small samples and can be applied even to nominal and ordinal data.
            Important non-parametric tests of significance include the median test, the Wilcoxon matched-pairs test, the chi-square test, the Mann-Whitney U test, and the Kruskal-Wallis test.
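
As an illustration of a distribution-free test, the sketch below computes the Mann-Whitney U statistic and an exact two-sided P-value by enumerating every way of regrouping the pooled data. It is a minimal version intended only for very small samples; the function names and the data are hypothetical.

```python
from itertools import combinations


def u_statistic(x, y):
    """Mann-Whitney U for sample x: pairs where x beats y, counting ties as one half."""
    return sum((a > b) + 0.5 * (a == b) for a in x for b in y)


def mann_whitney_exact(x, y):
    """Return (U, two-sided exact P-value) by enumerating every regrouping of the data."""
    pooled = list(x) + list(y)
    n1, n2 = len(x), len(y)
    centre = n1 * n2 / 2            # expected U under the null hypothesis
    observed = u_statistic(x, y)
    extreme = total = 0
    for idx in combinations(range(n1 + n2), n1):
        chosen = set(idx)
        gx = [pooled[i] for i in chosen]
        gy = [pooled[i] for i in range(n1 + n2) if i not in chosen]
        total += 1
        # Count regroupings at least as far from the centre as the observed U.
        if abs(u_statistic(gx, gy) - centre) >= abs(observed - centre):
            extreme += 1
    return observed, extreme / total


u, p = mann_whitney_exact([1.1, 2.3, 2.9], [3.4, 4.8, 5.6])
print(f"U = {u}, exact P-value = {p}")
```

No normality assumption enters the calculation: only the ordering of the observations is used.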

Hypothesis Test of the Mean

This section explains how to conduct a hypothesis test of a mean, when the following conditions are met:
  • The sampling method is simple random sampling.
  • The sample is drawn from a normal or near-normal population.
Generally, the sampling distribution will be approximately normally distributed if any of the following conditions apply.
  • The population distribution is normal.
  • The sampling distribution is symmetric, unimodal, without outliers, and the sample size is 15 or less.
  • The sampling distribution is moderately skewed, unimodal, without outliers, and the sample size is between 16 and 40.
  • The sample size is greater than 40, without outliers.
This approach consists of four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4) interpret results.

State the Hypotheses

Every hypothesis test requires the analyst to state a null hypothesis and an alternative hypothesis. The hypotheses are stated in such a way that they are mutually exclusive. That is, if one is true, the other must be false; and vice versa.
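The table below shows three sets of hypotheses. Here M is a placeholder for the hypothesized value of the population mean.

Set     Null hypothesis     Alternative hypothesis     Number of tails
1       μ = M               μ ≠ M                      2
2       μ ≥ M               μ < M                      1
3       μ ≤ M               μ > M                      1

The first set of hypotheses (Set 1) is an example of a two-tailed test, since an extreme value on either side of the sampling distribution would cause a researcher to reject the null hypothesis. The other two sets of hypotheses (Sets 2 and 3) are one-tailed tests, since an extreme value on only one side of the sampling distribution would cause a researcher to reject the null hypothesis.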

Formulate an Analysis Plan

The analysis plan describes how to use sample data to accept or reject the null hypothesis. It should specify the following elements.
  • Significance level. Often, researchers choose significance levels equal to 0.01, 0.05, or 0.10; but any value between 0 and 1 can be used.
  • Test method. Use the one-sample t-test to determine whether the hypothesized mean differs significantly from the observed sample mean.

Analyze Sample Data

Using sample data, conduct a one-sample t-test. This involves finding the standard error, degrees of freedom, test statistic, and the P-value associated with the test statistic.
  • Standard error. Compute the standard error (SE) of the sampling distribution.
SE = s * sqrt{ ( 1/n ) * ( 1 - n/N ) * [ N / ( N - 1 ) ] }
where s is the standard deviation of the sample, N is the population size, and n is the sample size. When the population size is much larger (at least 10 times larger) than the sample size, the standard error can be approximated by:
SE = s / sqrt( n )
  • Degrees of freedom. The degrees of freedom (DF) is equal to the sample size (n) minus one.
Thus, DF = n - 1.
  • Test statistic. The test statistic is a t-score (t) defined by the following equation.
t = (x - μ) / SE
where x is the sample mean, μ is the hypothesized population mean in the null hypothesis, and SE is the standard error.
  • P-value. The P-value is the probability of observing a sample statistic at least as extreme as the test statistic, assuming the null hypothesis is true. Since the test statistic is a t-score, use a t distribution calculator or table to assess the probability associated with the t-score, given the degrees of freedom computed above. (See the sample problems at the end of this lesson for examples of how this is done.)
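
The standard error, degrees of freedom, and t-score formulas above translate directly into code. The sketch below is illustrative only; the function names `standard_error` and `t_score` and the sample figures are hypothetical.

```python
from math import sqrt


def standard_error(s, n, N=None):
    """Standard error of the mean; applies the finite-population form when N is given."""
    if N is None:
        return s / sqrt(n)                       # large-population approximation
    return s * sqrt((1 / n) * (1 - n / N) * (N / (N - 1)))


def t_score(sample_mean, mu0, s, n, N=None):
    """Return (t, degrees of freedom) for a one-sample t-test."""
    se = standard_error(s, n, N)
    return (sample_mean - mu0) / se, n - 1


# Hypothetical sample: n = 25 drawn from N = 200, mean 52, s = 5, H0: mu = 50.
print(standard_error(5, 25), standard_error(5, 25, N=200))
print(t_score(52, 50, 5, 25))
```

With N = 200 the population is only 8 times larger than the sample, so the finite-population form gives a noticeably smaller standard error than the approximation.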

Interpret Results

If the sample findings are unlikely, given the null hypothesis, the researcher rejects the null hypothesis. Typically, this involves comparing the P-value to the significance level, and rejecting the null hypothesis when the P-value is less than the significance level.
Problem 1: Two-Tailed Test
An inventor has developed a new, energy-efficient lawn mower engine. He claims that the engine will run continuously for 5 hours (300 minutes) on a single gallon of regular gasoline. Suppose a simple random sample of 50 engines is tested. The engines run for an average of 295 minutes, with a standard deviation of 20 minutes. Test the null hypothesis that the mean run time is 300 minutes against the alternative hypothesis that the mean run time is not 300 minutes. Use a 0.05 level of significance. (Assume that run times for the population of engines are normally distributed.)
Solution: The solution to this problem takes four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4) interpret results. We work through those steps below:
·         State the hypotheses. The first step is to state the null hypothesis and an alternative hypothesis.
Null hypothesis: μ = 300
Alternative hypothesis: μ ≠ 300
Note that these hypotheses constitute a two-tailed test. The null hypothesis will be rejected if the sample mean is too big or if it is too small.
·         Formulate an analysis plan. For this analysis, the significance level is 0.05. The test method is a one-sample t-test.
·         Analyze sample data. Using sample data, we compute the standard error (SE), degrees of freedom (DF), and the t-score test statistic (t).
SE = s / sqrt(n) = 20 / sqrt(50) = 20/7.07 = 2.83
DF = n - 1 = 50 - 1 = 49
t = (x - μ) / SE = (295 - 300)/2.83 = -1.77
where s is the standard deviation of the sample, x is the sample mean, μ is the hypothesized population mean, and n is the sample size.
Since we have a two-tailed test, the P-value is the probability that the t-score having 49 degrees of freedom is less than -1.77 or greater than 1.77.
We use the t Distribution Calculator to find P(t < -1.77) = 0.04, and P(t > 1.77) = 0.04. Thus, the P-value = 0.04 + 0.04 = 0.08.
·         Interpret results. Since the P-value (0.08) is greater than the significance level (0.05), we cannot reject the null hypothesis.
Problem 2: One-Tailed Test
            Bon Air Elementary School has 300 students. The principal of the school thinks that the average IQ of students at Bon Air is at least 110. To prove her point, she administers an IQ test to 20 randomly selected students. Among the sampled students, the average IQ is 108 with a standard deviation of 10. Based on these results, should the principal accept or reject her original hypothesis? Assume a significance level of 0.01.
Solution: The solution to this problem takes four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4) interpret results. We work through those steps below:
·         State the hypotheses. The first step is to state the null hypothesis and an alternative hypothesis.
Null hypothesis: μ ≥ 110
Alternative hypothesis: μ < 110
Note that these hypotheses constitute a one-tailed test. The null hypothesis will be rejected if the sample mean is too small.
·         Formulate an analysis plan. For this analysis, the significance level is 0.01. The test method is a one-sample t-test.
·         Analyze sample data. Using sample data, we compute the standard error (SE), degrees of freedom (DF), and the t-score test statistic (t).
SE = s / sqrt(n) = 10 / sqrt(20) = 10/4.472 = 2.236
DF = n - 1 = 20 - 1 = 19
t = (x - μ) / SE = (108 - 110)/2.236 = -0.894
where s is the standard deviation of the sample, x is the sample mean, μ is the hypothesized population mean, and n is the sample size.
Since we have a one-tailed test, the P-value is the probability that the t-score having 19 degrees of freedom is less than -0.894.
We use the t Distribution Calculator to find P(t < -0.894) = 0.19. Thus, the P-value is 0.19.
·         Interpret results. Since the P-value (0.19) is greater than the significance level (0.01), we cannot reject the null hypothesis.
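
Both problems can be re-computed without a calculator page. The sketch below uses only the Python standard library and obtains t-distribution probabilities by numerically integrating the density with Simpson's rule. That is a stand-in chosen for this illustration, not the method behind the t Distribution Calculator, and the function names are hypothetical.

```python
from math import exp, lgamma, pi, sqrt


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    norm = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return norm * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(t, df, steps=2000):
    """P(T <= t), integrating the density from 0 to |t| with Simpson's rule."""
    h = abs(t) / steps                     # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area


def one_sample_t_test(sample_mean, mu0, s, n, tail):
    """Return (t, DF, P-value); tail is 'two', 'left', or 'right'."""
    se = s / sqrt(n)                       # large-population standard error
    t = (sample_mean - mu0) / se
    df = n - 1
    if tail == "two":
        p = 2 * t_cdf(-abs(t), df)
    elif tail == "left":
        p = t_cdf(t, df)
    elif tail == "right":
        p = 1 - t_cdf(t, df)
    else:
        raise ValueError(f"unknown tail: {tail!r}")
    return t, df, p


# Problem 1: mean 295, H0: mu = 300, s = 20, n = 50, two-tailed, alpha = 0.05.
t1, df1, p1 = one_sample_t_test(295, 300, 20, 50, "two")
# Problem 2: mean 108, H0: mu >= 110, s = 10, n = 20, left-tailed, alpha = 0.01.
t2, df2, p2 = one_sample_t_test(108, 110, 10, 20, "left")
print(f"Problem 1: t = {t1:.3f}, DF = {df1}, P-value = {p1:.3f}, reject H0 = {p1 < 0.05}")
print(f"Problem 2: t = {t2:.3f}, DF = {df2}, P-value = {p2:.3f}, reject H0 = {p2 < 0.01}")
```

In both cases the P-value is above the significance level, so the null hypothesis is not rejected, in agreement with the worked solutions.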
