This unit involves students formulating questions about data sets, displaying and analysing data, and reporting their conclusions. Information about the pregnancies of 56 women and their newborn babies is used as a context.
- write and answer a question that compares sets of statistical data
- justify the answer to a question with reference to graphs and statistical calculations
- compare data about two variables by finding the mean, median, mode, range and interquartile range
- compare data about two variables by drawing graphs, including back-to-back stem-and-leaf plots, box-and-whisker plots, scatter graphs and composite bar graphs
- record statistical information in ways that are helpful for drawing conclusions
- comment on statistical processes e.g. bias, limitations and possible improvements
A five-phase approach has been used to help students identify the key processes involved in comparing two sets of data:
- Phase 1 - Planning: Posing questions in order to explore and compare data.
- Phase 2 - Exploring: How will you find out the answer to the questions?
- Phase 3 - Calculations and Graphing: How will you justify your answer to the questions?
- Phase 4 - Concluding: Writing statements and answering the questions with justification.
- Phase 5 - Reflection: Limitations and improvements to the process.
A graphics calculator or spreadsheet would be useful here to help analyse the data.
The qualitative data is number-coded, as is often the case in survey results. The terminology of data types, statistical calculations and data displays (statistical graphs) is explored in this unit. Statistical measures of both the centre and the spread of the data are used to justify comparative statements and to answer questions posed by the students about the data set provided.
Types of data
Qualitative data is information about non-numerical qualities, e.g. colour of school bag, mode of transport to school. You cannot find the mean or median of qualitative data, only the mode (the most common result).
Quantitative data is numerical and can be discrete or continuous, e.g. the number of CDs you own (discrete) or the weight of your school bag (continuous).
Discrete data is obtained by counting, and there are no possible results between the values obtained. Discrete data may be grouped or ungrouped, e.g. birth date (grouped) or number of children in your family (ungrouped).
Continuous data is obtained by measuring. Between any two possible values, other results can be found, e.g. the weight of newborn babies. (You could get a continuous value by sticking a pin in a number line.)
Bi-variate data is where a comparison is made between two variables for the same group. You are looking for a possible connection between the two variables, e.g. comparing birth weight and length for newborn baby boys. The birth weight and the length are the two different variables.
Uni-variate data is where one common variable is compared for two different groups or categories, e.g. birth weight of baby girls compared with baby boys. The only variable is the birth weight.
Measures of the centre (averages) and of the spread of a distribution of data
- The Mean is the total of all the results divided by the number of results. It is often referred to as the average, but all three measures of the centre given here are types of average.
- The Median is the middle result when the results are placed in numerical order.
- The Mode is the most common result.
- The Range is the largest result minus the smallest result. It measures the spread of the data.
- The Interquartile range is the upper quartile minus the lower quartile. It measures the spread of the middle half of the data.
- The upper quartile is the middle value of the top half of the data. Approx 25% of the data lies above the upper quartile (75% below).
- The lower quartile is the middle value of the bottom half of the data. Approx 25% of the data lies below the lower quartile (75% above).
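The measures above can be checked with a short Python sketch. It follows the "middle value of each half" description of the quartiles and assumes that, when there is an odd number of results, the median itself is left out of both halves; graphics calculators and spreadsheets do not all use this convention, so their quartiles may differ slightly. The ages in the usage example are made up for illustration and are not taken from the Stork Delivery data.

```python
from statistics import mean, median, multimode


def split_halves(data):
    """Split sorted data into bottom and top halves.

    Assumed convention: with an odd count, the middle value (the median)
    is left out of both halves.
    """
    half = len(data) // 2
    bottom = data[:half]
    top = data[half + 1:] if len(data) % 2 else data[half:]
    return bottom, top


def summary(values):
    """Measures of centre and spread for a list of at least two numbers."""
    data = sorted(values)
    bottom, top = split_halves(data)
    lower_q = median(bottom)  # middle value of the bottom half
    upper_q = median(top)     # middle value of the top half
    return {
        "mean": mean(data),
        "median": median(data),
        "modes": multimode(data),  # every value tied for most common
        "range": data[-1] - data[0],
        "lower_quartile": lower_q,
        "upper_quartile": upper_q,
        "iqr": upper_q - lower_q,
    }


# Hypothetical mothers' ages, not from Copymaster 1.
print(summary([16, 19, 21, 23, 23, 25, 28, 30, 34]))
```

For these nine ages the median is 23, the quartiles are 20 and 29 (an interquartile range of 9) and the range is 18. `multimode` needs Python 3.8 or later.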
- Back-to-back stem-and-leaf: uni-variate data comparison.
- Box-and-whisker: uni-variate data comparison.
- Composite bar graphs: uni-variate data comparison.
- Scatter: bi-variate data comparison.
Poster paper if required for the report.
Graphics calculators or computers with spreadsheet software.
Copymaster 1 Stork Delivery data set (table of data). The data can be printed to use as a data sheet or students can explore the
justification, limitations, bias, qualitative data, quantitative data, discrete data, continuous data, bi-variate data, uni-variate data, validity, measures of central tendency
Session 1: Getting Started and Review
This session introduces the data set being used and explores some of the ways in which statistical data can be analysed and presented. Here are the model answers for the questions.
Present the students with the Stork Delivery data set (Copymaster 1). Discuss the meaning of the terms used:
Gravida: number of pregnancies
Para: number of births
Term: number of weeks the pregnancy lasted
Sex: male or female
Medic: type of medical personnel who delivered the baby
Deliv: nature of the delivery (caesarean, forceps or normal birth)
Identify the coding system.
Sex: Boys 13, Girls 6
Medic: GP 7, Midwife 8, Obstetrician 15
Deliv: Caesar 3, Forceps 6, Normal 14
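Because the categories are stored as number codes, a mean of the Sex or Deliv column would be meaningless: the codes are labels, not quantities. The same number can also mean different things in different columns (6 is a girl under Sex but forceps under Deliv). A small Python sketch that turns the codes listed above back into words might look like this; the record layout (a dictionary keyed by column name) and the sample row are assumptions for illustration.

```python
# Code tables as listed in the unit.
SEX = {13: "Boy", 6: "Girl"}
MEDIC = {7: "GP", 8: "Midwife", 15: "Obstetrician"}
DELIV = {3: "Caesarean", 6: "Forceps", 14: "Normal"}


def decode_row(row):
    """Return a copy of one record with the coded columns replaced by labels."""
    decoded = dict(row)
    decoded["Sex"] = SEX[row["Sex"]]
    decoded["Medic"] = MEDIC[row["Medic"]]
    decoded["Deliv"] = DELIV[row["Deliv"]]
    return decoded


# A made-up record in an assumed layout, not a row from Copymaster 1.
print(decode_row({"Age": 27, "Sex": 6, "Medic": 8, "Deliv": 14}))
```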
Check the students’ understanding of the meaning of the terms used.
Consider row 11. What does it tell us about this particular mother and her baby?
Which of her babies is represented in this data?
Why are some data items in the Gravida and the Para columns different? (Consider rows 26 and 42)
2. Check the students’ knowledge of the types of data presented.
Ensure they justify their answers using appropriate descriptive terms like numeric data.
Which of the data related to the mother is quantitative? Which is qualitative?
Which of the data related to the babies is quantitative, and which is qualitative?
Which of the quantitative information presented represents discrete data?
Which of the quantitative information presented represents continuous data?
3. Consider individual variables to review Level 5 graphing types before moving to posing comparative questions.
What type of graph would best display each of the given variables?
Age? Baby Mass? Gravida? Para? Term? Sex? Medical Support? Nature of delivery? (E.g. Mother’s Age could be displayed appropriately in a histogram or stem-and-leaf plot.)
If you feel your students need to review drawing these individual graphs, get different groups in the class to choose a type of graph from: bar, pie, histogram or stem-and-leaf. Each group then chooses a suitable variable from the data set to display with their allocated graph type. The groups can share their graphs by presenting them to the class, reminding others of the key features of their graph type and explaining why they chose that variable for it. Alternatively, this could be a homework task following this first lesson.
4. Review the meaning and calculation of basic measures of central tendency (averages).
How do you find the mode of a set of data?
Share the task around the class, with each group finding the total number of women at a selection of specific ages. (E.g. one group finds the number of women at each of the ages 16, 17, 18, 19 and 20, another at 21 to 25, and so on.) Get a student to record the results on the board. Leave this open to see how they record the information from the groups; the difficulty of recording every age should become apparent.
What is the modal age for the mothers?
Is this significant? What does it tell us? Can we make any generalisations about the age of mothers having babies in this data set? Why? Why not?
The need to collate the summarised results in an organised way could lead to grouping the age data in a table. Highlight the difficulty in finding a specific mode if the data is presented in grouped form without knowing the individual data items.
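Grouping the ages into class intervals can be sketched in Python as below. The interval start (15) and width (5 years) are assumptions chosen for illustration, as are the ages. The result gives a modal class rather than a single modal age, which is exactly the limitation described above.

```python
from collections import Counter


def group_ages(ages, start=15, width=5):
    """Frequency table for whole-number ages in classes of `width` years,
    beginning at `start` (assumes no age is below `start`).
    Empty classes in the middle are kept with a count of 0."""
    counts = Counter((age - start) // width for age in ages)
    table = {}
    for k in range(min(counts), max(counts) + 1):
        low = start + k * width
        table[f"{low}-{low + width - 1}"] = counts[k]  # Counter gives 0 if absent
    return table


ages = [16, 19, 21, 23, 23, 25, 28, 30, 34]  # hypothetical ages
table = group_ages(ages)
modal_class = max(table, key=table.get)  # first class with the highest count
print(table, modal_class)
```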
Can you find the median weight of the babies?
What about the mean and the mode?
Which of these statistics is most useful in describing the average (central tendency) of the babies’ weights? Why?
What is the mean term of pregnancy for these babies?
Is the median term about the same? Would you expect this?
What about the modal term?
Which of these statistics is most useful in describing the average (central tendency) of the term of the pregnancy for these babies?
Encourage the students to be selective and critical about the best representation of the central tendency of the data rather than just calculating all three.
5. Review the meaning and calculation of basic measures of spread.
How spread out are the ages of the mothers?
How can you measure it?
What measures of spread do you know about?
6. Phase 1 - Planning: Posing questions in order to explore and compare data.
In groups of 2-4 get the students to make up questions that interest them and that would allow them to compare different aspects of the data. Ensure students do write questions rather than make comparative statements. Allow and encourage both bi-variate and uni-variate comparisons. You may need to share a good example from one group or make one up to get them started.
E.g. Are boys more likely to have normal deliveries than girls?
Do older women have more difficult births? (I.e. forceps and caesareans)
What can we use the data to find out about?
What are you interested in finding out about from this information?
What would be good aspects to compare?
What could we compare about the girl babies and the boy babies?
Which of these comparisons do you think would highlight differences?
Have you written a question?
(E.g. Are the newborn girls lighter than the newborn boys? rather than
Compare the weight of newborn girls and boys.)
7. Get a member from each group to write possible questions on the board. Discuss with the whole class which ones are well worded questions. Identify which are bi-variate and which are uni-variate comparisons.
E.g. 1. Do mothers of boys tend to be older than mothers of girls? This is a uni-variate data comparison, where the age of the mothers is the only variable and it is being compared for two different groups, boys and girls.
E.g. 2. Do the heavier babies come from longer term pregnancies? This is a bi-variate data comparison, where the two variables are the baby mass and the term.
8. How will we answer these questions?
Interesting discussions can develop about how best to do this.
E.g. Do older women have more difficult births? (I.e. forceps and caesareans)
How do you define older?
Will you group forceps and caesarean together or deal with each separately?
How will you organise the data to help you answer the question?
What statistics can you calculate?
How will you display this information?
You could grade the Nature of the Delivery on an Easy-to-Difficult scale, where normal is easiest, then forceps, with caesarean the most difficult. This would allow a form of scatter graph analysis to explore the relationship between the mother's age and the difficulty of the birth.
9. Get each group to settle on two questions to investigate: one that is a bi-variate comparison and one that is a uni-variate comparison. These are to be written out, ready to explore in class next session. Encourage at least one comparison that focuses on quantitative data.
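Here is a sketch of the suggested Easy-to-Difficult grading in Python, producing (age, difficulty) points for a scatter graph. The scores 1, 2 and 3 are an assumption: they keep the order normal, forceps, caesarean, but the equal gaps between them are arbitrary, so any trend should be read with care. The records are hypothetical.

```python
# Assumed ordinal scores keyed by Deliv code (Normal 14, Forceps 6, Caesar 3).
DIFFICULTY = {14: 1, 6: 2, 3: 3}


def age_difficulty_points(records):
    """(mother's age, difficulty score) pairs, one per record."""
    return [(r["Age"], DIFFICULTY[r["Deliv"]]) for r in records]


records = [  # hypothetical records, not from Copymaster 1
    {"Age": 19, "Deliv": 14},
    {"Age": 31, "Deliv": 6},
    {"Age": 38, "Deliv": 3},
]
print(age_difficulty_points(records))
```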
Sessions 2 – 3: The Investigation
Working in pairs the students may explore one question each lesson or work on both questions during the next two lessons.
1. Phase 2 - Exploring: How will you find out the answer to the questions?
Get the students to consider what statistics they can calculate that will help them to make comparisons and build up some supporting evidence that will enable them to answer their questions.
What statistics can you calculate to help you explore the data related to your question?
How will you organise the data to help you calculate the statistics you require?
Tally charts, frequency tables or stem-and-leaf plots may be helpful. For uni-variate comparisons, encourage the students to calculate appropriate measures of central tendency (mean, median) as well as measures of spread (range, upper and lower quartiles, and interquartile range).
Percentages may be useful. E.g. the percentage of each type of delivery attended by each kind of medic may be an appropriate statistic for the question “Are General Practitioners present at more normal births than other medical professionals?”
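One way to build such a table of percentages is sketched below in Python: for each type of delivery it gives the percentage attended by each kind of medic. The (delivery, medic) pairs are invented for illustration. The percentages could also be taken the other way (the share of each medic's deliveries that were normal), and students should decide which reading actually answers their question.

```python
from collections import Counter, defaultdict


def medic_percentages(births):
    """For each delivery type, the percentage of those births attended
    by each medic. `births` is a list of (delivery, medic) pairs."""
    counts = defaultdict(Counter)
    for delivery, medic in births:
        counts[delivery][medic] += 1
    return {
        delivery: {medic: 100 * n / sum(c.values()) for medic, n in c.items()}
        for delivery, c in counts.items()
    }


births = [  # hypothetical data
    ("Normal", "GP"), ("Normal", "GP"), ("Normal", "Midwife"),
    ("Normal", "Obstetrician"), ("Caesarean", "Obstetrician"),
]
print(medic_percentages(births))
```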
2. Phase 3 - Calculations and Graphing: How will you justify your answer to the question?
The students may wish to use statistical graphs from the start to help them calculate these statistics. A stem-and-leaf plot is useful for finding the median, quartiles, range and inter-quartile range.
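For checking students' hand-drawn plots, a back-to-back stem-and-leaf display can be generated with the Python sketch below. It assumes whole-number data with tens as stems and units as leaves, which suits mothers' ages; baby masses in grams would need a different stem unit. The two lists of ages are hypothetical.

```python
from collections import defaultdict


def leaves_by_stem(values):
    """Group whole numbers by tens digit (stem); units digits are the leaves."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return stems


def back_to_back(left_values, right_values):
    """Rows of (left leaves, stem, right leaves). Left leaves are reversed
    so that both sides read outwards from the shared stem."""
    left = leaves_by_stem(left_values)
    right = leaves_by_stem(right_values)
    all_stems = set(left) | set(right)
    rows = []
    for stem in range(min(all_stems), max(all_stems) + 1):  # keep empty stems
        left_leaves = "".join(str(d) for d in reversed(left.get(stem, [])))
        right_leaves = "".join(str(d) for d in right.get(stem, []))
        rows.append((left_leaves, stem, right_leaves))
    return rows


girls_mothers = [18, 22, 25, 25, 31]  # hypothetical ages
boys_mothers = [19, 24, 27, 33, 36]   # hypothetical ages
for l, stem, r in back_to_back(girls_mothers, boys_mothers):
    print(f"{l:>6} | {stem} | {r}")
```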
What type of data display (statistical graph) will best help you show the comparisons between your data?
How will you decide if there is a relationship between the variables?
How will you measure the strength of any relationship between the variables for bi-variate data?
Encourage the students:
- to clearly label their graphs: title, named axes, evenly spaced axis labels
- to check that they have included all the data items, e.g. by checking the total number in each group or category
- to reflect on their calculations and ensure they look sensible. “Don’t just believe the calculator; you may have missed out some values.” “Do your results look reasonable?”
- to support their stem-and-leaf plots with box-and-whisker plots comparing the two groups for uni-variate data
- to sketch trend lines on their scatter graphs and comment on the strength of the relationship
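The unit asks students to sketch trend lines by eye. As a teacher's check that goes beyond what students are expected to calculate, the least-squares line and Pearson's correlation coefficient r can be computed in plain Python as below. These are standard techniques swapped in here, not part of the unit: r near +1 or -1 suggests a strong linear relationship, and r near 0 suggests little or none. The term and mass values are made up.

```python
from math import sqrt


def _sums(xs, ys):
    """Means and centred sums of squares/products for paired data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return mx, my, sxx, syy, sxy


def trend_line(xs, ys):
    """Least-squares line y = slope * x + intercept.
    Needs at least two different x values."""
    mx, my, sxx, _, sxy = _sums(xs, ys)
    slope = sxy / sxx
    return slope, my - slope * mx


def correlation(xs, ys):
    """Pearson's r, between -1 and +1 (x and y must each vary)."""
    _, _, sxx, syy, sxy = _sums(xs, ys)
    return sxy / sqrt(sxx * syy)


term = [37, 38, 39, 40, 41, 42]              # weeks (hypothetical)
mass = [2900, 3100, 3300, 3250, 3600, 3700]  # grams (hypothetical)
print(trend_line(term, mass), correlation(term, mass))
```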
3. Phase 4 – Concluding: Writing statements and answering the questions with justification.
Initially, encourage the students to write brief statements (three to five) comparing the groups or variables using their calculations and graphs. These should be quick key points that compare the different statistics they have calculated and their interpretations of the graphs they have drawn.
Then get the students to write a clear statement that answers their question, based on the statements they have written.
Next, ensure they justify their conclusion with three to five full statements that support the answer to their question. They need to identify the differences in the statistics, and those highlighted by the graphs, for the two groups or variables, and explain what these mean in relation to the context and the question they are considering. They should also note interesting aspects of the data shown by the graphs that make them more or less confident in their conclusion.
E.g. 1. It is not sufficient to say “the mean age of the boys’ mothers is 25 and of the girls’ mothers is 23.” Rather, say “the mean age of the boys’ mothers, at 25 years, is 2 years greater than that of the girls’ mothers, at 23 years. This supports the trend that boy babies have older mothers than girl babies, although the difference is not very great.”
E.g. 2. Yes, heavier babies do tend to come from longer pregnancies. The trend line on my scatter graph goes up steeply showing that the bigger babies usually come from the longer pregnancies, but there are a few exceptions to this.
Share students’ statements for each of the three developmental stages of statement writing above and highlight good in-context statements that clearly relate to and justify the answer to the question.
NB. The answer may be that no relationship exists between two variables and this conclusion still needs to be justified from the scatter graph.
4. Phase 5 - Reflection: Limitations and improvements to the process
Do you think your conclusion would be true for any data set about a group of mothers and their babies? Why? Why not?
What factors could limit the validity of your conclusion?
What other information about the gathering of this set of data would be useful to know?
Could you have improved the way you carried out this process?
Is there a better graph you could have drawn?
Are there any other relevant calculations you could have done?
What relationships have you discovered about pregnancies, births and babies?
What other questions would you like to explore now if you had more time?
What have you learnt while doing this statistical investigation?
Session 4: Write up and Presentation
1. Students are to summarise their findings in poster or report form. This may be done as a pair task, depending on your class arrangement. They may give an oral report to the class on their key findings, supported by their poster, or hand in a more formal written report.
2. Key information to include in their presentation or report:
- Question they are interested in.
- The two data sets to be compared, presented in an organised form.
- Summary statistics if relevant.
- Appropriate graphs that allow the data sets to be compared.
- Clear statement that answers their question.
- Three to five statements that justify their answer by referring to the context of the question. (Note: statements about the numbers or graphs alone are not sufficient; they need to refer to the real situation being considered.)
- Interesting aspects of the data shown by the graphs that affect how confident they can be in their conclusion.
- Limitations to the conclusions drawn and improvements to the process identified if appropriate.
- What have they learnt while doing this statistical investigation?
Two sample reports are included as attached resources.