Representations of Data

We will use creative inquiry techniques to analyze large data sets with many dimensions of data. Work in the same group in which you created your stock market data and graph.

Certain questions are labeled Critical Analysis. These require more thought, and the in-depth responses distinguish the college perspective from what you might have seen in previous stats classes in middle grades. Test 3 questions will be similar to these types of questions.
    Class Data
    Click on this Class Data Excel File. The computer should then download the file. Open the file classdata.xls in Excel. You will see a filled-in Excel table: I took the data from the survey you completed and put it into Excel. (To maximize anonymity I removed some columns, although we'll see the results of that data in other ways.)
    Height and Armspan Averages
  1. Click on O2, an empty box. Calculate the class average of armspan using
    =average(d2:d61)
    What is this average?
  2. Click on O3. Calculate the class average of height. What is this average?
  3. Search the web to find information about Leonardo da Vinci's speculation about the relationship between armspan and height. Summarize what you found in your own words.
  4. Search the web to find information about the "ape index" and its relationship to rock climbing and summarize what you found. Then identify yourself on the class data list, looking at your survey submission online if necessary, but don't tell anyone else which row you are in! Calculate your own index and write down the definition you used.

    Review: Mean and Median as Measures of Center
    Mean/Average: To calculate the average of a bunch of numbers, we add them all up, and then divide by how many numbers we have. It is the center with respect to weight and distance.
    Median: The median is the center with respect to location, the place that divides the data in half. To calculate it, we first put our data in increasing order, and then find the middle place and the number in it. If there is no number in the middle place, we take the 2 numbers next to the middle place and average them.
    For example, 0,0,1,3 has an average of 1, since we would take (0+0+1+3)/4 = 1. It has a median of .5, since the middle place is between the 2nd 0 and the 1. Since there is no actual number in this place, we take the average of 0 and 1, which is .5.
    The median is both a place and a number.
    Mean and median are different measures of the data's "center", and certain circumstances reveal different pros and cons.
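
The same arithmetic can be checked outside Excel. The following is a minimal Python sketch (Python is not part of this lab and is used here only as an illustration) that works through the 0,0,1,3 example by hand and then compares against the standard statistics module.

```python
from statistics import mean, median

data = [0, 0, 1, 3]  # the example list from the review above

# Mean: add everything up, then divide by how many numbers there are.
avg = sum(data) / len(data)

# Median: put the data in increasing order, then take the middle number,
# or the average of the two numbers next to the middle place when the
# count is even.
ordered = sorted(data)
n = len(ordered)
if n % 2 == 1:
    mid = ordered[n // 2]
else:
    mid = (ordered[n // 2 - 1] + ordered[n // 2]) / 2

print(avg, mid)  # 1.0 0.5

# The library functions give the same values as the hand calculation.
print(mean(data) == avg, median(data) == mid)  # True True
```

Excel's =average and =median commands should return the same two values for this list.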

    Number of Siblings    
  5. Click on O4. Calculate the class average of the number of siblings. What is this average?

    Here I have ordered the sibling data, which is in blue, and annotated the median, where 50% of the data is on either side. In this case, since it is in between 2 numbers, we take the average of the ones surrounding it:
    0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [median is here middle of the data] 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 4 4 9

    Scale Balancing Idea: Compare the average to the median. If you think of the median place as the middle of a scale of the ordered numbers, then the lower numbers pull the average down while the higher numbers pull it up. While there are an equal number of items on either side of this scale, the average equals the median only when these pulls (the spreads) cancel each other out, that is, only if the data is evenly weighted about the median. This does not happen in this case.

    What is a reasonable measure of center in this case? Well, the median is the actual center of the data and so it is a good measure of location center. In this case it tells us that at least half the class has that many or fewer siblings. Notice that while the average is not a number in our data set, and it is not exactly the center of the data, it does tell us about the distribution of the data and is the center when considering distances from it.

    Scale Balancing in Our Siblings Data: In our data notice that the average is higher than the median since the data is not weighted evenly about the middle place. To understand this using a scale balancing idea, observe that all those 1s have no pull because they are the same as the median. The 2s and the 0s pull in opposite directions, since they are each 1 away from the median, but there are fewer 0s than 2s, so we already see that the average will be dragged up. In addition, there aren't any numbers below the median to balance out the 3s, 4s, and 9, since nobody has a negative number of siblings. So the average is dragged up.
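
To make the scale balancing concrete, here is a minimal Python sketch. The counts are read off the ordered sibling list above (10 zeros, 25 ones, 18 twos, 4 threes, 2 fours and one 9), so treat them as an assumption and check them against the class data. The sketch adds up the pulls below and above the median and shows that the net pull, shared among all the values, is exactly how far the average sits from the median.

```python
from statistics import mean, median

# Counts read off the ordered sibling list (an assumption; verify against the data).
siblings = [0] * 10 + [1] * 25 + [2] * 18 + [3] * 4 + [4] * 2 + [9]

med = median(siblings)
avg = mean(siblings)

# The pull of each value is its distance from the median.
pull_down = sum(med - x for x in siblings if x < med)  # only the 0s pull down
pull_up = sum(x - med for x in siblings if x > med)    # the 2s, 3s, 4s and the 9

# Net pull divided among all the values = distance from median to average.
shift = (pull_up - pull_down) / len(siblings)

print(med, avg)            # 1.0 1.5
print(pull_down, pull_up)  # 10.0 40.0
print(med + shift)         # 1.5
```

The 1s contribute nothing to either sum, which is the "no pull" observation above, and the upward pull outweighs the downward pull, so the average lands above the median.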



    Distance from Home
  6. Click on O5. Type in
    =median(c2:c61)
    What is this median?
  7. Use Excel to calculate the class average of the distance from home in O6. What is this average?
  8. On your paper copy of the class data I have ordered it by distance from home. Mark where the average and median are (just like I did for the median with the number of siblings).
  9. Critical Analysis: Explain why the average is larger than the median by using the idea of a scale balancing about the median. Refer to the home data and analyze the balancing and pulls of the numbers, as I did for Scale Balancing in Our Siblings Data. Your answer should be several sentences or a paragraph.

    Make sure that I have checked over your numerical answers. Then, under File, release on Close, and click on Don't Save.
    Stock Market Data

    Open up your Excel data on the stock market from your email (do not delete the email - we will use it again in future labs). You may need to click on Sheet1 if the graph opens up.

    The Mean and Median of Volume
    Volume is the number of shares bought or sold in a given day. It is represented on your graph by the bar chart, with one bar for each day and the values read from the axis on the left.
  10. What Excel box contains the first data entry for your Volume data? (is it B2?)
  11. What Excel box contains the last data entry for Volume data?
  12. Click on G2, which is empty, and calculate the average of Volume. What is this value?
  13. Calculate the median of Volume in G3 (Excel can calculate using the median command even when the data is not in order). What is this value?
  14. Take out your physical copy of the graph. The volume is represented as the bar graph. To read the volume on a given day, you go to the top of the corresponding rectangle, go over to the left of the graph, and then read the number from there. Draw a horizontal line at the point on the y-axis which matches the median of Volume on this graph. This may or may not actually correspond to a specific day that attained this volume, but you should see that half the bars are above and half below. Show me so that I can mark off that you have done this correctly.
  15. Critical Analysis: Recall that the median is the middle place after we put the data in increasing order and that if we think of this as the center of a scale, then we can see whether the mean is higher or lower by seeing whether the data tips the scale to one side or another (you may wish to review my explanation in the Sibling section above). Even if the data is not in increasing order (on your stock graph for Volume it is not), we can still use the idea of the median as the center of a scale to see whether the data tips the scale higher or lower than the median. Using only the Volume bars on your stock graph and your knowledge of mean compared to median, discuss how you can tell from these bars whether the mean is above or below the median. Address this using a scale balancing idea like I did for siblings.


    The Mean, Median and Boxplot of High
    High is the highest price that a stock hits each day. On your graph, the high is on the top of the boxplots and is read by looking at the axis on the right.
  16. What is the average of the data in the High column?
  17. What is the median of the data in the High column?
  18. What is the smallest High value? You can use a command like =quartile(D2:Dfillin,0) where you fill in the number of rows of data you have. The quartiles are the 25% markers, so the 0th quartile is the lowest data point.
  19. What is the largest High value? You can use a command like =quartile(D2:Dfillin,4) where you fill in the number of rows of data you have.
  20. Critical Analysis: In questions #16 and #17, many of you may have similar numbers even though your stock fluctuated up and down from one day to the next (as any stock data typically does!). Explain what is going on using the data, including the spread from #18 to #19.
  21. Notice that you now have lo, median, and hi. Compute the other quartiles in Excel, q1 and q3, and write them down.
  22. Use the five numbers lo, q1, median, q3, hi to create an axis that is roughly to scale, and then sketch a boxplot of the High price.
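
As a cross-check for the five numbers, here is a minimal Python sketch. The prices are made-up values, not real stock data, and the helper name five_number_summary is hypothetical. It uses statistics.quantiles with method="inclusive", which treats the smallest value as the 0% marker and the largest as the 100% marker; this should interpolate the same way as Excel's =quartile command, but treat that as an assumption and compare against your spreadsheet.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (lo, q1, median, q3, hi) for a list of numbers."""
    # n=4 splits the data at the 25%, 50% and 75% markers.
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, med, q3, max(values)

# Made-up High prices for illustration only (not real stock data).
highs = [10.0, 12.0, 11.0, 15.0, 14.0, 13.0, 20.0, 16.0, 12.5]

lo, q1, med, q3, hi = five_number_summary(highs)
print(lo, q1, med, q3, hi)  # 10.0 12.0 13.0 15.0 20.0
```

In a boxplot of these made-up values the box would run from 12 to 15 with the median line at 13, and the whiskers would reach down to 10 and up to 20.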