Sample Survey Project
Introduction
In this project, you will perform your own statistical inference using the methods described in this book. Specifically, you will take a sample of students from your school to learn about one or more proportions of interest. After you take your sample, you will use inferential methods to see how the observations have modified your beliefs about each proportion. Here we outline the different steps of the project and discuss constructing a prior, selecting a random sample, summarizing the sample information, and computing the posterior probability distribution.
Getting Started
You begin by thinking of one proportion of your student body that you wish to learn about. This proportion might be the fraction of students who agree with a particular position on an issue, the fraction who prefer one flavor of ice cream to another, or the fraction who participate in a particular activity. Suppose, for the sake of illustration, that you are interested in the proportion of the student body who regard their political philosophy as conservative. Then p denotes the proportion of conservative students in the entire student body.
Once you decide on a proportion of interest, you construct a question to ask each student in your sample. If, for example, you are interested in the proportion of students who think of themselves as conservative, you might ask the question
Do you think your political philosophy is conservative?
The possible responses to this question would be "yes," "no," or "I don't know," and you learn about p by counting the number of "yes" responses in your sample.
Constructing a Prior
Before a sample is taken, you likely have some opinions about the value of the proportion p. These opinions are represented by a prior probability distribution. For simplicity, suppose that p can take only one of the eleven values 0, .1, .2, .3, .4, .5, .6, .7, .8, .9, 1. By following the worksheet, you can construct a prior distribution that approximately reflects your knowledge about p.
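The worksheet itself is not reproduced here. As a rough sketch of the same idea, assume you first give each of the eleven values of p a relative weight (a larger weight meaning a more plausible value); the prior probabilities are then the weights divided by their total. The function name and the weights in the example are hypothetical, chosen only for illustration.

```python
# Sketch: turn relative plausibility weights into a prior distribution
# over the eleven values p = 0, .1, ..., 1.
# The function name and example weights are hypothetical.

P_VALUES = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0


def prior_from_weights(weights):
    """Normalize nonnegative weights (one per p value) so they sum to 1."""
    if len(weights) != len(P_VALUES):
        raise ValueError("need one weight for each of the eleven p values")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be nonnegative")
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one weight must be positive")
    return {p: w / total for p, w in zip(P_VALUES, weights)}


if __name__ == "__main__":
    # Hypothetical beliefs: p = .3 judged most likely, values above .7 ruled out.
    weights = [0, 2, 6, 10, 8, 5, 3, 1, 0, 0, 0]
    prior = prior_from_weights(weights)
    for p, prob in prior.items():
        print(f"p = {p:.1f}   prior = {prob:.3f}")
```

A value given weight 0 receives prior probability 0, so no amount of data can make it plausible afterwards; use 0 only for values you are certain are impossible.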
Taking a Random Sample
In Topic 10, we discussed how to take a Simple Random Sample (SRS), which is one type of random sample. However, the procedure for taking an SRS (labeling each member of the population and using a table of random digits) is impractical when the population is large, as it will be if there are many students at your school.
A Systematic Random Sample is another type of random sample that is easier to take when you have a listing of the population, such as a phonebook containing the names and phone numbers of all students at your school.
Here is how you take a Systematic Random Sample:
· Decide on a step size. Here we'll use 50, but other values can be used.
· Decide on a starting place in the population listing. (This should be chosen in some random fashion.) Say we decide to start with the 17th listing on page 50 of the phonebook.
· Add the step size (50) to the position of each sampled listing to get the next one to sample. So we'll sample the 17th, 67th, 117th, 167th, 217th, ... listings.
· Continue until you have a large enough sample.
· What if a person is not home when you call? I would just skip this person and keep sampling. However, this procedure might introduce bias into your sample. Why?
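The step-size procedure above (taking every 50th listing from a random start) can be sketched in Python, assuming the listings are numbered consecutively 1, 2, ..., N. The function name, the phonebook size of 5000, and the rule of wrapping around to listing 1 when you run past the end are all assumptions made for this sketch, not part of the procedure described above.

```python
import random


def systematic_positions(population_size, step, sample_size, start=None, rng=None):
    """Return the 1-based positions of the listings to contact.

    Begin at `start` (chosen at random if not given) and take every
    `step`-th listing, wrapping around to listing 1 if the end of the
    list is reached (an assumption of this sketch).
    """
    if step <= 0 or sample_size <= 0:
        raise ValueError("step and sample_size must be positive")
    if sample_size * step > population_size:
        # This bound guarantees that no listing is chosen twice.
        raise ValueError("sample_size * step must not exceed population_size")
    if start is None:
        rng = rng or random.Random()
        start = rng.randint(1, population_size)  # random starting listing
    if not 1 <= start <= population_size:
        raise ValueError("start must be between 1 and population_size")
    return [(start - 1 + i * step) % population_size + 1 for i in range(sample_size)]


if __name__ == "__main__":
    # Hypothetical phonebook with 5000 student listings.
    print(systematic_positions(5000, step=50, sample_size=5, start=17))
    # prints [17, 67, 117, 167, 217]
```

Leaving `start` unset lets the function pick the starting listing at random, which is the step the procedure says should be done in some random fashion.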
Data Analysis
Suppose that you have taken your random sample and have a list of responses, such as "yes" or "no," to your question. You can summarize these data using the basic techniques described in Topic 1. A count table is helpful for finding the number and proportion of "yes" and "no" responses, and a bar graph can be used to display these data.
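As a minimal sketch of the count table described above, the Python functions below tally each response with its count and proportion, and print a crude text bar graph in place of a plotted one. The function names and the responses in the example are made up for illustration; they are not real survey data.

```python
from collections import Counter


def count_table(responses):
    """Map each distinct response to (count, proportion of all responses)."""
    counts = Counter(responses)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no responses to summarize")
    # most_common() lists responses from most to least frequent.
    return {response: (n, n / total) for response, n in counts.most_common()}


def print_bar_graph(table, symbol="*"):
    """Print one row per response, with one symbol per respondent."""
    width = max(len(response) for response in table)
    for response, (n, proportion) in table.items():
        print(f"{response:<{width}}  {n:3d}  {proportion:.2f}  {symbol * n}")


if __name__ == "__main__":
    # Hypothetical survey results, not real data.
    responses = ["yes"] * 12 + ["no"] * 25 + ["don't know"] * 3
    table = count_table(responses)
    print_bar_graph(table)
```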
Statistical Inference
After the prior distribution for the proportion has been constructed and the data have been collected, we use the methodology of Topic 16 to compute the posterior probability distribution. There is a Minitab program called p_disc that can be used for this computation. You enter the prior distribution into two columns of the spreadsheet and input the number of yes's and no's, and the program computes the posterior distribution. You then use this posterior distribution to construct a probability interval for p as described in Topic 16.
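The Minitab program p_disc is not reproduced here. The Python sketch below shows the kind of computation involved, assuming Bayes' rule for a discrete prior: each prior probability is multiplied by the likelihood p^yes (1 - p)^no of the observed counts, and the products are rescaled so they sum to 1. The function name and the counts in the example are hypothetical.

```python
def discrete_posterior(prior, yes, no):
    """Posterior probabilities for a prior over a finite set of p values.

    prior: dict mapping each p value to its prior probability.
    yes, no: numbers of "yes" and "no" responses in the sample.
    """
    # Prior times likelihood for each candidate value of p.
    # (Python evaluates 0 ** 0 as 1, so p = 0 or p = 1 is handled when a count is 0.)
    products = {p: prob * p ** yes * (1 - p) ** no for p, prob in prior.items()}
    total = sum(products.values())
    if total == 0:
        raise ValueError("the data have zero probability under every p with positive prior")
    # Rescale so the posterior probabilities sum to 1.
    return {p: product / total for p, product in products.items()}


if __name__ == "__main__":
    # Hypothetical example: a flat prior over p = 0, .1, ..., 1
    # and a sample with 12 yes's and 28 no's.
    prior = {i / 10: 1 / 11 for i in range(11)}
    posterior = discrete_posterior(prior, yes=12, no=28)
    for p, prob in posterior.items():
        print(f"p = {p:.1f}   posterior = {prob:.4f}")
```

With a flat prior, the posterior is largest at the grid value closest to the sample proportion (here 12/40 = .3); an informative prior shifts the posterior toward the values you judged more plausible.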
The Project Report
It is helpful to write a report that describes all parts of this scientific study. This report can be divided into three stages.
The first stage of the report describes the choice of a survey question and the construction of the prior probability distributions. How did you decide on your particular inference problem? Did personal experience, something you heard, or something you read in the newspaper motivate your choice of question? Before you took your sample, what did you think you would find? Include the prior worksheet that you used to construct your prior distribution.
The second stage of the report describes the "data" phase of the investigation. Describe in detail how you took your random sample: the method you used, how many students were contacted, and any difficulties you experienced in collecting the data.
Give the results of your survey, including the number who said yes, no, or something else. Include any graphs that you made to summarize these data.
If you have the raw data for the individual students available, include them in the report.
The third stage of the report describes the statistical inference.
Write down the posterior probability distribution and graph the probabilities.
If you are constructing a probability interval for the proportion, describe the methodology you are using and give all the details of your computation.
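Topic 16 describes the method to use. As a hedged sketch of one common approach for a discrete posterior, you can rank the values of p by posterior probability and keep adding the most probable ones until their total probability reaches the desired level (for example, .90 or .95). The function name and the posterior values in the example are hypothetical.

```python
def probability_interval(posterior, level=0.95):
    """Collect the most probable p values until their total probability reaches `level`.

    Values are added in order of decreasing posterior probability.
    Returns the chosen p values (sorted) and their total probability.
    """
    if not 0 < level <= 1:
        raise ValueError("level must be in (0, 1]")
    ranked = sorted(posterior.items(), key=lambda item: item[1], reverse=True)
    chosen, covered = [], 0.0
    for p, prob in ranked:
        chosen.append(p)
        covered += prob
        if covered >= level:
            break
    return sorted(chosen), covered


if __name__ == "__main__":
    # Hypothetical posterior over a few values of p (other values have probability 0).
    posterior = {0.1: 0.05, 0.2: 0.25, 0.3: 0.40, 0.4: 0.25, 0.5: 0.05}
    values, prob = probability_interval(posterior, level=0.85)
    print(f"p is in {values} with probability {prob:.2f}")
```

Because a discrete posterior rarely puts exactly the target probability on any set of values, the interval's actual probability (returned as the second result) is usually a little above the requested level; report that actual value. The chosen values also need not be adjacent, so check that they form a run such as .2 to .4 before describing them as an interval.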
Attach the Minitab output if you used this package for the computations.
It is important to explain, in language that a layperson would understand, what your interval estimate means. In particular, if you are computing a 95% interval, explain what 95% means. Interpret this interval in the context of your example.
To conclude this part of the report, you should explain what you learned from this project. Were you surprised by the results? How different were the prior and posterior distributions? What problems did you experience in doing the project? If you had to do the project over again, what would you do differently?