
Sample Survey Project

Introduction

In this project, you will perform your own statistical inference using methods described in this book. Specifically, you will take a sample of students from your school to learn about one or more proportions of interest. After you take your sample, you will use inferential methods to see how the observations have modified your beliefs about each proportion. Here we outline the different steps of the project and discuss constructing a prior, selecting a random sample, summarizing the sample information, and computing the posterior probability distribution.

Getting Started

You begin by thinking of one proportion of your student body that you wish to learn about. This proportion might be the fraction of students in agreement with a particular issue, the fraction who prefer one flavor of ice cream to another, or the fraction who participate in a special activity. Suppose, for the sake of illustration, that you are interested in the proportion of the student body who regard their political philosophy as conservative. Then p would denote the proportion of conservative students in the entire student body.

Once you decide on a proportion of interest, you construct a question that will be asked of each student in your sample. If, for example, you are interested in the proportion of students who think of themselves as conservative, you might ask the question

Do you think your political philosophy is conservative?

The possible responses to this question would be "yes," "no," or "I don't know," and you learn about p by counting the number of yes's in your sample.

Constructing a Prior

Before a sample is taken, you likely have some opinions about the location of the proportion value p. Your opinions about this proportion are represented by a prior probability distribution. For simplicity, suppose that p can be one of the eleven values 0, .1, .2, .3, .4, .5, .6, .7, .8, .9, 1. By following the worksheet, you can construct a prior distribution that approximately reflects your knowledge about p.
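The worksheet procedure itself is not reproduced here, but one common way to arrive at a discrete prior is to give each of the eleven values a relative weight (how plausible it seems compared with the others) and then divide each weight by the total so the probabilities sum to 1. The following is a minimal sketch of that normalization step; the weights are hypothetical numbers chosen only for illustration.

```python
# The eleven possible values of p: 0, .1, .2, ..., 1
p_values = [i / 10 for i in range(11)]

# Hypothetical relative weights (illustrative only): larger means "more plausible to me"
weights = [0, 1, 2, 4, 6, 5, 3, 2, 1, 0.5, 0]

# Normalize so the prior probabilities add up to 1
total = sum(weights)
prior = [w / total for w in weights]

for p, prob in zip(p_values, prior):
    print(f"p = {p:.1f}  prior = {prob:.3f}")
```

Here the value p = .4 receives the largest prior probability because it was given the largest weight; values given weight 0 are ruled out entirely, so it is wise to give zero weight only to values you are certain are impossible.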

Taking a Random Sample

In Topic 10, we discussed how to take a Simple Random Sample (SRS), which is one type of random sample. However, the procedure for taking an SRS (labeling each member of the population and using the random digit table) is impractical when the size of the population is large. This will be the case if you have many students at your school.

A Systematic Random Sample is another type of random sample that is easier to take when you have a listing of the population, such as a phonebook containing the names and phone numbers of all students at your school.

Here is how you take a Systematic Random Sample:

· Decide on a step size -- here we'll use 50 but other values can be used.

· Decide on a starting place to sample in the population listing. (This should be done in some random fashion.) Say we decide to start on the 17th listing on page 50 of the phonebook.

· Add the step size (50) to each listing number to get the next one to sample. So we'll sample the 17th, 67th, 117th, 167th, 217th, ... listings.

· Continue until you've got a large enough sample.

· What if a person is not home when you call? I would just skip this person and keep sampling. However, skipping people in this way might introduce a bias into your sample. Why?
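The listing numbers visited by the procedure above can be generated mechanically. The sketch below assumes the listings are numbered 1, 2, 3, ... from your starting page; the helper name `systematic_sample` is hypothetical, and choosing the random start among the first `step` listings is one common convention, not something the procedure above requires.

```python
import random

def systematic_sample(population_size, step, start=None):
    """Return 1-based listing numbers: a starting listing, then every `step`-th one after it."""
    if start is None:
        # Assumed convention: pick the start at random among the first `step` listings
        start = random.randint(1, step)
    return list(range(start, population_size + 1, step))

# The example from the text: start at listing 17 with a step size of 50
print(systematic_sample(population_size=250, step=50, start=17))
```

With a start of 17 and a step of 50 this reproduces the 17th, 67th, 117th, 167th, and 217th listings. In practice you would stop calling once you have a large enough sample rather than working through the whole list.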

Data Analysis

Suppose that you have taken your random sample and you have a list of "yes" or "no" responses to your question. You can summarize these data using the basic techniques described in Topic 1. A count table is helpful for finding the number and proportion of yes's and no's, and a bar graph can be used to display these data.
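If your responses are typed into a computer, a count table and a rough bar graph take only a few lines. This is a minimal sketch in Python rather than Minitab; the list of responses is made up purely for illustration and is not real survey data.

```python
from collections import Counter

# Hypothetical responses (illustrative only, not real survey data)
responses = ["yes", "no", "no", "yes", "don't know", "no", "yes", "no", "no", "yes"]

counts = Counter(responses)
n = len(responses)

# Count table: number and proportion of each response
print(f"{'Response':<12}{'Count':>6}{'Proportion':>12}")
for answer, count in counts.most_common():
    print(f"{answer:<12}{count:>6}{count / n:>12.2f}")

# Simple text bar graph: one '#' per student giving that response
for answer, count in counts.most_common():
    print(f"{answer:<12}{'#' * count}")
```

For these ten made-up responses the table shows 5 no's, 4 yes's, and 1 "don't know." Keeping the "don't know" answers as a separate category lets you report them, even though only the yes and no counts enter the inference for p.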

Statistical Inference

After the prior distribution for the proportion has been constructed and the data are taken, we use the methodology of Topic 16 to compute the posterior probability distribution. There is a Minitab program called p_disc that can be used in this computation. You enter the prior distribution into two columns of the spreadsheet, input the number of yes's and no's, and the program computes the posterior distribution. You use this posterior distribution to construct a probability interval for p as described in Topic 16.
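For readers without Minitab, the same discrete Bayes' rule calculation can be sketched in Python. This is not the p_disc program, only an illustration of the standard computation it is described as performing: for each value of p, multiply the prior probability by the likelihood p^(yes) (1 - p)^(no), then divide every product by their sum. The binomial coefficient is the same for every p, so it cancels and is omitted. The uniform prior and the counts of 12 yes's and 28 no's are hypothetical.

```python
def posterior(p_values, prior, yes, no):
    """Discrete Bayes' rule: posterior is proportional to prior times likelihood."""
    # Likelihood of the observed yes's and no's if the true proportion were p
    products = [pr * p**yes * (1 - p)**no for p, pr in zip(p_values, prior)]
    total = sum(products)
    return [prod / total for prod in products]

p_values = [i / 10 for i in range(11)]
prior = [1 / 11] * 11                             # hypothetical: all values equally likely
post = posterior(p_values, prior, yes=12, no=28)  # hypothetical sample counts

for p, prob in zip(p_values, post):
    print(f"p = {p:.1f}  posterior = {prob:.4f}")
```

With a flat prior the posterior peaks at p = .3, which matches the sample proportion 12/40. The values p = 0 and p = 1 get posterior probability 0, since a sample containing both yes's and no's is impossible under either.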

The Project Report

It is helpful to write a report which describes all parts of this scientific study. This report can be divided into three stages.

The first stage of the report describes the choice of a survey question and the construction of the prior probability distribution. How did you decide on your particular inference problem? Did any personal experience, something you heard, or something you read in the newspaper motivate you to choose your question? Before you took your sample, what did you think you would find out? Include the prior worksheet that you used to construct your prior distribution.

The second stage of the report describes the "data" phase of the investigation. Describe in detail how you took your random sample. Describe the method you used, how many students were contacted, and any difficulties you experienced in collecting these data.

Give the results of your survey, including the number who said yes, no, or something else. Include any graphs that you made to summarize these data.

If you have the raw data for the individual students available, include these in the report.

The third stage of the report describes the statistical inference.

Write down the posterior probability distribution and graph the probabilities.

If you are constructing a probability interval for the proportion, describe the methodology you are using and give all the details of your computation.
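The method in Topic 16 is not reproduced here, so the sketch below shows just one common approach for a discrete posterior, offered as an assumption: rank the values of p from most to least probable and keep adding values until their total posterior probability reaches the target, such as 95%. The helper name `probability_set` and the posterior probabilities are hypothetical.

```python
def probability_set(p_values, probs, level=0.95):
    """Collect the most probable values of p until their total probability reaches `level`."""
    ranked = sorted(zip(p_values, probs), key=lambda pair: pair[1], reverse=True)
    chosen, total = [], 0.0
    for p, prob in ranked:
        chosen.append(p)
        total += prob
        if total >= level:
            break
    return sorted(chosen), total

# Hypothetical posterior probabilities for p = 0, .1, ..., 1 (illustrative only)
p_values = [i / 10 for i in range(11)]
probs = [0.0, 0.02, 0.18, 0.40, 0.28, 0.10, 0.02, 0.0, 0.0, 0.0, 0.0]

values, coverage = probability_set(p_values, probs, level=0.95)
print(values, round(coverage, 3))
```

Here the set {.2, .3, .4, .5} is chosen, with total probability .96. Because p takes only eleven values, the set usually cannot hit exactly 95%, so it is worth reporting the actual probability the set contains.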

Attach Minitab output if this computer package was used in the computations.

It is important to explain, using language that a layman would understand, what your interval estimate means. In particular, if you are computing a 95% interval, explain what 95% means. Interpret this interval in the context of your example.

To conclude this part of the report, you should explain what you learned from this project. Were you surprised by the results? How different were the prior and posterior distributions? What problems did you experience in doing the project? If you had to do the project over again, what would you do differently?