Micro-evolution

STATISTICS IN INTRODUCTORY BIOLOGY
This supplement is excerpted with permission from the BIOL 1010 lab manual (Bishop et al., 2012).
References to BIOL 1010 and BIOL 1011 also apply to BIOL 1020 and BIOL 1021,
respectively.
Bishop T, Gass G, Van Dommelen J. 2012. Appendix E: Statistics in Introductory Biology. In: Biology
1010 Laboratory Manual. Halifax (NS): Dalhousie University.
***************************************************************************************************
In virtually every published primary research article in science, the Results section contains the
results of a number of statistical tests performed on the data collected by the researchers. Scientists use
statistics to demonstrate mathematically that their results (for example, that plants treated with
Fertilizer A grew larger than plants treated with Fertilizer B) are meaningful. For example, a biologist
might weigh the plants in the two groups and find that the average weights calculated for each group
differ. The researcher would not stop there, but would then want to find out whether the difference
observed between the two groups is a legitimate or significant one (Fertilizer A really
does promote plant growth better than Fertilizer B), rather than just an accident of chance (that is, the
plants chosen for measurement in treatment A just happened to be heavier than the plants chosen for
measurement in treatment B, even though there was no real difference in weights caused by the
fertilizer used). Another biologist might have used a hypothesis to generate a prediction of the
frequency of a particular phenotype in the offspring of a cross between two plants. When he or she
actually performs that cross by breeding the plants together, do the frequencies match what was
predicted? If they don’t match exactly, are they close enough, or do the expected and observed
frequencies differ significantly?
In this Appendix, you will learn the basic statistical techniques that you may need to use in your BIOL
1010 and 1011 laboratory activities. More advanced biology classes make use of more advanced
statistical techniques, but many of these techniques are based on the same concepts you will use in your
labs this year.
There are three sections to this Appendix:
I. Basic descriptive statistics: mean and standard deviation
II. Statistical tests: the chi-square test
III. Standard error and 95% confidence intervals
I. BASIC DESCRIPTIVE STATISTICS: MEAN AND STANDARD DEVIATION
When a group of measurements is taken, we often want to be able to characterize that group using
descriptive statistics: for example, what was the middle or average weight of a plant in that group? How
much did individual plants in that group tend to differ in weight from one another?
A common measure of the middle or average value used in biology is the mean. You have likely
calculated means in secondary school math: the mean is found by adding up all of the observed values,
then dividing by the number of observed values. The number of observations or data points is referred
to as n. The Greek letter sigma (Σ) indicates that you should sum up whatever comes immediately after
the sigma. We can represent the procedure for finding the mean like this:

$$\text{mean} = \frac{\sum \text{observed values}}{n}$$

In spreadsheet programs such as Microsoft Excel or Google Docs Spreadsheets you can calculate
the mean using the “=AVERAGE” formula.
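
For readers who prefer a scripting language to a spreadsheet, the same calculation can be sketched in Python. This is only an illustration: the plant weights are made-up values, and the variable names are not part of the lab manual.

```python
# Hypothetical plant weights in grams (made-up values for illustration only).
weights = [12.0, 15.0, 11.0, 14.0, 13.0]

n = len(weights)          # n: the number of observations
mean = sum(weights) / n   # sum of the observed values, divided by n

print(f"n = {n}, mean = {mean:.1f} g")
```

The line computing `mean` mirrors the word formula directly: `sum(weights)` plays the role of the sigma, and `n` is the count of observations.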
The variability of the data set (how much the values tend to differ from the mean) is described using
the standard deviation. Together, the mean and the standard deviation tell you about the distribution
of your observations: what value they cluster around, and how narrow or wide that cluster is. The
larger the differences between each observation and the mean, the larger the standard deviation. The
procedure for finding the standard deviation of a sample is more complex than the procedure for finding
the mean. Some values will fall below the mean (so the difference between the value and the mean is a
negative number), while some values will fall above the mean (so the difference is a positive number).
The differences therefore first need to be squared so that all of them are positive. The squared
differences are then added up and divided by n – 1, and a square root is taken. Here is the formula
describing this procedure:

$$\text{standard deviation} = \sqrt{\frac{\sum (\text{observed value} - \text{sample mean})^2}{n - 1}}$$

You can use the “=STDEV” formula in spreadsheet software to calculate the standard deviation. In BIOL
1010, you will need to know what the mean and standard deviation tell you about your set of data. In
BIOL 1011, we will build on this knowledge: you will learn how to use n and standard deviation to
calculate a related value called standard error, so that when graphing your data you can quickly assess
whether the means of two groups are likely to be significantly different, as in the case of the Fertilizer A
and B treatments described above.
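
The sample standard deviation procedure can also be sketched step by step in Python. The weights below are made-up values, and the variable names are chosen only for this illustration. The last line uses the usual definition of standard error (standard deviation divided by the square root of n), which is an assumption here because this excerpt does not give the formula.

```python
import math
import statistics

# Hypothetical plant weights in grams (made-up values for illustration only).
weights = [12.0, 15.0, 11.0, 14.0, 13.0]

n = len(weights)
mean = sum(weights) / n

# Square each difference from the mean so that every term is positive.
squared_diffs = [(w - mean) ** 2 for w in weights]

# Sample standard deviation: divide by n - 1, then take the square root.
sd = math.sqrt(sum(squared_diffs) / (n - 1))

# Python's standard library computes the same sample statistic.
sd_check = statistics.stdev(weights)

# Assumed standard definition of standard error (not stated in this excerpt).
se = sd / math.sqrt(n)

print(f"mean = {mean:.2f}, sd = {sd:.3f}, library sd = {sd_check:.3f}, se = {se:.3f}")
```

Working through the steps by hand once and then comparing with `statistics.stdev` is a quick way to confirm that a spreadsheet or script is using the n – 1 (sample) version of the formula.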
II. STATISTICAL TESTS: THE CHI-SQUARE TEST
In both BIOL 1010 and 1011, you will carry out and interpret a statistical test called the chi-square
(χ²) test of goodness of fit. This test will allow you to test hypotheses by comparing your predictions to
your observations, as in the plant cross example described above. On the next page, you will find
complete instructions for carrying out and interpreting the chi-square test. This test is just one of a very
wide range of statistical tests used in science, and if you take upper-year courses in biology you will
likely encounter many different statistical tests. However, these tests tend to share some common
features:
•   The purpose of the test is to help you decide whether or not to reject some hypothesis. The hypothesis itself will differ depending on the study being performed and the statistical test being used, but at the end of the test you should be able to say whether the hypothesis should be rejected or not. Notice that we do not say that the hypothesis is “supported” or “proven”, simply that we fail to reject it.
•   At the end of the mathematical operations involved in the test, you will have computed what is called a test statistic. In the chi-square test, the test statistic is the χ² value that you calculate by adding up, across all categories, the squared difference between the observed and expected values divided by the expected value; other types of tests (the Student’s t-test or the Mann-Whitney U test, for example) have their own test statistics arrived at by their own procedures.
•   Each test also requires that you find the number of degrees of freedom, which is related to the number of different categories being studied. Together, the test statistic and the degrees of freedom value will allow you to interpret the results of your test.
•   When the test statistic and degrees of freedom have been calculated, you use these values to consult statistical tables (on paper or in computer databases) specific to each statistical test. In your chi-square test, the degrees of freedom value tells you which row of the table to look in.
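
The features listed above can be illustrated with a short Python sketch of a chi-square goodness-of-fit calculation. Everything specific in it is an assumption made for illustration: the plant cross, the predicted 3:1 phenotype ratio, the observed counts, and the function name are invented, and the degrees of freedom are taken as the number of categories minus one, which is the usual rule for a simple goodness-of-fit test. The critical value 3.841 is the standard tabulated value for 1 degree of freedom at P = 0.05.

```python
def chi_square_statistic(observed, expected):
    """Add up (o - e)^2 / e over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical offspring counts for two phenotypes (made-up numbers).
observed = [290, 110]
total = sum(observed)

# The hypothesis being tested predicts a 3:1 ratio of the two phenotypes.
expected = [total * 3 / 4, total * 1 / 4]

chi2 = chi_square_statistic(observed, expected)
df = len(observed) - 1          # categories minus one (assumed rule)

critical_005 = 3.841            # tabulated chi-square value, df = 1, P = 0.05
reject = chi2 > critical_005    # reject only if the statistic exceeds it

print(f"chi-square = {chi2:.3f}, df = {df}, reject hypothesis: {reject}")
```

With these invented counts the test statistic is smaller than the critical value, so the hypothesis would not be rejected.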

BIOL 1020
Lab Assignment: Microevolution

Start by re-saving this file as follows: lab_surname_labtitle.rtf, substituting your own surname. Remember to convert to PDF after you have finished entering your answers and before submitting for grading.

Type your responses to the questions below where indicated. Remember to save your work frequently.

Data Analysis and Interpretation

1. Use your data to estimate the allele frequencies at the longhair locus for the cats in each city of your chosen pair. Use the warm-up exercises in the online content as a guide and show your work clearly. Enter your results in Table 2 (which goes with Question 8 in this document). (2 marks)

(a) City # 1 [replace with city name]

RESPONSE:

PLEASE LEAVE THE SPACE BELOW EMPTY FOR TA COMMENTS

(b) City # 2 [replace with city name]

RESPONSE:

PLEASE LEAVE THE SPACE BELOW EMPTY FOR TA COMMENTS

2. What percentage of the cat population in each city is heterozygous at the longhair locus? Use the warm-up exercises in the online content as a guide and show your work clearly. (2 marks)

(a) City # 1 [replace with city name]

RESPONSE:

PLEASE LEAVE THE SPACE BELOW EMPTY FOR TA COMMENTS

(b) City # 2 [replace with city name]

RESPONSE:

PLEASE LEAVE THE SPACE BELOW EMPTY FOR TA COMMENTS

3. Use your data to calculate the allele frequencies at the spotting locus for the cats in each city of your chosen pair. Exclude unknowns from your totals. Use the warm-up exercises in the online content as a guide and show your work clearly. (2 marks)

(a) City # 1 [replace with city name]

RESPONSE:

PLEASE LEAVE THE SPACE BELOW EMPTY FOR TA COMMENTS

(b) City # 2 [replace with city name]

RESPONSE:

PLEASE LEAVE THE SPACE BELOW EMPTY FOR TA COMMENTS

4. Use your answers from the previous question to calculate the NUMBER of cats with each genotype that would be expected in your sample if the population were in Hardy-Weinberg equilibrium with respect to the spotting locus. Use the warm-up exercises in the online content as a guide and show your work clearly. Add your genotype numbers to the appropriate ‘Expected #’ columns in Tables 1a and 1b. (2 marks)

(a) City # 1 [replace with city name]

RESPONSE:

PLEASE LEAVE THE SPACE BELOW EMPTY FOR TA COMMENTS

(b) City # 2 [replace with city name]

RESPONSE:

PLEASE LEAVE THE SPACE BELOW EMPTY FOR TA COMMENTS

5. From your data sheet, add your observed number of cats for each genotype associated with the spotting locus to the ‘Observed #’ columns in Tables 1a and 1b. Do the ‘Expected’ genotype numbers match the actual genotype numbers that you observed in your samples? Probably not. But are the differences STATISTICALLY significant? If not, then we treat the differences as being due to chance alone, and conclude that they do not represent a meaningful deviation from the equilibrium numbers. If the differences ARE statistically significant, then we conclude that they are unlikely to be due to chance alone, and that some other factor accounts for the differences.

We can use the chi-squared test of goodness of fit to determine whether the observed spotting genotype numbers in your data are significantly different from those expected under equilibrium conditions. In this case, we can say that the null hypothesis is that there is no difference between the observed spotting genotype numbers and those expected under Hardy-Weinberg equilibrium.

Complete Tables 1a and 1b to obtain a test statistic for each city, and answer the questions that accompany them. (5 marks)

Table 1a. Calculation of chi-squared test statistic for three genotypes in cats at shelters in
_____________________________________________[fill in the name of the first city in your pair].

Genotype (class)   Observed # (o)   Expected # (e)   (o-e)   (o-e)²   (o-e)²/e
SS
Ss
ss
Total                                                                 χ² =

You will have to determine the number of degrees of freedom before you proceed. When calculating the degrees of freedom for Hardy-Weinberg, the equation is slightly different than in other genetics problems. Use the formula

df = k – r

where k= the number of classes (genotypes)
and r = the number of alleles in an individual
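
As a rough illustration of how the columns of Tables 1a and 1b fit together, the sketch below works through the calculation in Python. The genotype counts are made up (they are not cat data from any city), the variable names are chosen only for this example, and r is set to 2 on the assumption that the locus has the two alleles S and s.

```python
# Hypothetical observed genotype counts at the spotting locus (made-up numbers).
observed = {"SS": 10, "Ss": 20, "ss": 20}
n_cats = sum(observed.values())

# Count alleles directly: every cat carries two alleles at the locus.
p = (2 * observed["SS"] + observed["Ss"]) / (2 * n_cats)   # frequency of S
q = 1 - p                                                   # frequency of s

# Expected NUMBER of cats of each genotype under Hardy-Weinberg equilibrium.
expected = {
    "SS": p * p * n_cats,
    "Ss": 2 * p * q * n_cats,
    "ss": q * q * n_cats,
}

# Chi-squared test statistic: add up (o - e)^2 / e over the three genotypes.
chi2 = sum((observed[g] - expected[g]) ** 2 / expected[g] for g in observed)

k = len(observed)   # number of classes (genotypes)
r = 2               # number of alleles, assumed here to be S and s
df = k - r

for g in observed:
    print(f"{g}: observed = {observed[g]}, expected = {expected[g]:.1f}")
print(f"chi-squared = {chi2:.3f}, df = {df}")
```

Each dictionary entry corresponds to one row of the table, and the final sum corresponds to the χ² cell in the Total row.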

(a) How many degrees of freedom are there for Table 1a?

RESPONSE:

PLEASE LEAVE THE SPACE BELOW EMPTY FOR TA COMMENTS

(b) What p-value did you obtain for the test statistic in Table 1a (refer to Table 3 near the end of this document)? Give a range if appropriate.

RESPONSE:

PLEASE LEAVE THE SPACE BELOW EMPTY FOR TA COMMENTS

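(c) Should you reject or fail to reject the null hypothesis? With reference to your p-value, justify your decision.

RESPONSE: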

PLEASE LEAVE THE SPACE BELOW EMPTY FOR TA COMMENTS

Table 1b. Calculation of chi-squared test statistic for three genotypes in cats at shelters in
_____________________________________________[fill in the name of the second city in your pair].

Genotype (class)   Observed # (o)   Expected # (e)   (o-e)   (o-e)²   (o-e)²/e
SS
Ss
ss
Total                                                                 χ² =

(d) How many degrees of freedom are there for Table 1b?

RESPONSE:

PLEASE LEAVE THE SPACE BELOW EMPTY FOR TA COMMENTS

(e) What p-value did you obtain for the test statistic in Table 1b? Give a range if appropriate.

RESPONSE:

PLEASE LEAVE THE SPACE BELOW EMPTY FOR TA COMMENTS

(f) Should you reject or fail to reject the null hypothesis? With reference to your p-value, justify your decision.

RESPONSE:

PLEASE LEAVE THE SPACE BELOW EMPTY FOR TA COMMENTS

6. With respect to the spotting locus, is microevolution occurring in the cat population in either city in your pair? State your evidence. (1 mark)

RESPONSE:

PLEASE LEAVE THE SPACE BELOW EMPTY FOR TA COMMENTS

7. List the five assumptions of the Hardy-Weinberg principle. If microevolution is occurring with respect to the spotting locus, which of the assumptions do you think could be violated? Explain your answer. (If your data indicate that microevolution at the spotting locus is NOT occurring, pretend for a moment that it is, and answer the same question.) (1 mark)

RESPONSE:

PLEASE LEAVE THE SPACE BELOW EMPTY FOR TA COMMENTS

8. Table 2 below summarizes your data and calculations. Another test (the chi-squared test of independence) would be required to determine whether any differences between the cities might be statistically significant, but that is beyond the scope of this lab. For our purposes, we’ll consider the differences to be significant.

Propose a hypothesis (explanation) as to why there are differences in cat data between cities. You may propose a general hypothesis (i.e., one that might apply to any or all of the items in Table 2), or a hypothesis specific to a particular item in Table 2. (Hint: the cities weren’t paired at random!) (1 mark)

Table 2. Summary of observations and calculations based on data collected from photos of cats at shelters in two North American cities.

City                        longhair         spotting                   HWE for spotting (yes/no)
                            f (L)   f (l)    f (SS)   f (Ss)   f (ss)
enter name of first city
enter name of second city

RESPONSE:

PLEASE LEAVE THE SPACE BELOW EMPTY FOR TA COMMENTS

9. Name one potential drawback to the data collection procedure and speculate about its potential impact on your data. (1 mark)

RESPONSE:

PLEASE LEAVE THE SPACE BELOW EMPTY FOR TA COMMENTS

Table 3. Critical χ² Values
Degrees of Freedom   Probability (P)
                     0.95     0.8      0.5      0.2      0.05     0.01     0.005
1                    0.004    0.064    0.455    1.642    3.841    6.635    7.879
2                    0.103    0.446    1.386    3.219    5.991    9.210    10.597
3                    0.352    1.005    2.366    4.642    7.815    11.345   12.838
4                    0.711    1.649    3.357    5.989    9.488    13.277   14.860
5                    1.145    2.343    4.351    7.289    11.070   15.086   16.750
6                    1.635    3.070    5.348    8.558    12.592   16.812   18.548
7                    2.167    3.822    6.346    9.803    14.067   18.475   20.278
8                    2.733    4.594    7.344    11.030   15.507   20.090   21.955
                     Non-significant                     Significant

Using the table of Critical χ² Values
1.    Locate the row containing the appropriate degrees of freedom.
2.    Find where your chi-squared test statistic fits within the range of numbers in the row (it may fall outside of the range; i.e. to the left or right ends of the scale).
3.    Note the probability values (p-values) corresponding to your test statistic and determine which p-values your test statistic lies between, or whether the p-value is off the scale.
4.    According to statistical convention, a p-value of less than 0.05 (p < 0.05) means that, if the null hypothesis were true, there would be less than a 5% chance of seeing a difference between what you observed and what you expected at least as large as yours through chance alone. The difference between the actual and expected values is therefore considered to be due to some factor other than chance, so you can reject your null hypothesis that the difference is due to chance alone.
5.    If the p-value is greater than or equal to 0.05 (p ≥ 0.05), there is a 5% or greater chance that a difference this large would arise through chance alone, so you do not reject the null hypothesis that the difference between what you expected and what you observed is due to chance alone.
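
The lookup steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `p_value_range` and the test statistic used in the example are invented, and only the first three rows of Table 3 are copied in to keep the sketch short.

```python
# Probability columns and the first three rows of Table 3.
P_COLUMNS = [0.95, 0.8, 0.5, 0.2, 0.05, 0.01, 0.005]
CRITICAL = {
    1: [0.004, 0.064, 0.455, 1.642, 3.841, 6.635, 7.879],
    2: [0.103, 0.446, 1.386, 3.219, 5.991, 9.210, 10.597],
    3: [0.352, 1.005, 2.366, 4.642, 7.815, 11.345, 12.838],
}


def p_value_range(chi2, df):
    """Return (lower, upper) so that lower <= p < upper.

    None means the statistic falls off that end of the table.
    """
    lower, upper = None, None
    for p, critical in zip(P_COLUMNS, CRITICAL[df]):
        if chi2 > critical:
            upper = p      # statistic is past this column, so p is smaller
        else:
            lower = p      # statistic has not reached this column
            break
    return lower, upper


# Hypothetical test statistic with 1 degree of freedom (made-up value).
lower, upper = p_value_range(1.389, 1)
reject = upper is not None and upper <= 0.05   # True only when p < 0.05

print(f"{lower} <= p < {upper}; reject null hypothesis: {reject}")
```

In this example the statistic falls between the 0.5 and 0.2 columns, so the p-value is reported as a range and the null hypothesis is not rejected.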

Lab Assignment Survey Questions

We’re interested in your feedback! Please visit the Lab Assignments Survey, via the ‘Proctor and Other Surveys’ page in the Course Menu of the class site, to enter your responses to the questions below. This survey is anonymous.

1. Approximately how long did it take you to complete this assignment?

•   less than an hour
•   1-2 hours
•   2-3 hours
•   3-4 hours
•   more than 4 hours

2. Was this a fair amount of time, considering the particulars of the assignment?

•   Yes, it was a fair amount of time.
•   No, the assignment could have been more comprehensive.
•   No, it took too much time.

3. How would you rate the level of difficulty of the assignment?

•   Easy
•   Challenging, but manageable
•   Too challenging

4. How would you rate the learning value of the assignment?

• The assignment helped my learning.
• The assignment did not help my learning.

5. Do you have any additional comments or feedback about this assignment?

Start by re-saving this file as follows: lab_surname_microevolutiondata.xlsx, substituting your own surname. Remember to convert to (or save as) PDF before submitting.

Table 4. Data sheet for recording selected phenotypes of cats in shelters in __________________________ [replace the blank with the first city of your pair].

cat name   Tina T   Oximo

phenotype   E.g. 1   E.g. 2   1   2   3   4   5   6   7   8   9   10   11   12   13   14   15   16   17   18   19   20   Total
short hair (L_)   1   0
long hair (ll)   0   1
100% white (W_)   0   1
<100% white (ww)   1   0
>50% white spotting (SS)   1   0
<50% white spotting (Ss)   0   0
0% white (ss)   0   0
unknown at spotting locus   0   1

Table 5. Data sheet for recording selected phenotypes of cats in shelters in __________________________ [replace the blank with the second city of your pair].