Hi. I was playing around with a dataset in the faraway package in the statistical programming language R. This data set is based on a study to determine the effectiveness of a new teaching method in economics.

**A Look At The Dataset**

In R, we load the faraway dataset package as well as the ggplot2 package for data visualization.

```r
library(faraway)
library(ggplot2)
```

In the faraway dataset package, there is this dataset called spector. I am naming this spector dataset as econData.

```r
# Teaching Methods in Economics:
# A study to determine the effectiveness of a new teaching method in an economics class.

econData <- spector
```

We can take a look at the data using the head(), tail() and str() functions.

```r
> # A look of the data.
>
> head(econData); tail(econData);
 grade psi tuce  gpa
     0   0   20 2.66
     0   0   22 2.89
     0   0   24 3.28
     0   0   12 2.92
     1   0   21 4.00
     0   0   17 2.86
 grade psi tuce  gpa
     1   1   17 3.39
     0   1   24 2.67
     1   1   21 3.65
     1   1   23 4.00
     0   1   21 3.10
     1   1   19 2.39
>
> str(econData)
'data.frame': 32 obs. of 4 variables:
 $ grade: num 0 0 0 0 1 0 0 0 0 1 ...
 $ psi  : num 0 0 0 0 0 0 0 0 0 0 ...
 $ tuce : num 20 22 24 12 21 17 17 21 25 29 ...
 $ gpa  : num 2.66 2.89 3.28 2.92 4 2.86 2.76 2.87 3.03 3.92 ...
```

You can find details about the spector/econData dataset in the faraway package manual at https://cran.r-project.org/web/packages/faraway/faraway.pdf. Page 93 of the PDF describes the variables in the dataset.

The manual contains the following about spector:

**Description**

A study to determine the effectiveness of a new teaching method in Economics

**Usage**

data(spector)

**Format**

A data frame with 32 observations on the following 4 variables.

- **grade**: 1 = exam grades improved, 0 = not improved
- **psi**: 1 = student exposed to PSI (a new teach method), 0 = not exposed
- **tuce**: a measure of ability when entering the class
- **gpa**: grade point average

**Source**

Spector, L. and Mazzeo, M. (1980), “Probit Analysis and Economic Education”, Journal of Economic Education, 11, 37–44.

In statistical terms, a sample of 32 observations is small. For a study that collects data from students in a classroom, however, 32 observations may be considered fairly large.

We do not know who the teachers were, where the economics course(s) were taught, or what the curriculum was. We can only work with what we have.

The tuce variable represents a measure of ability, but the description is vague. Is it measuring learning ability? If so, is it an objective measure or a subjective one?

Notice how the variable psi is a number and not a factor. We can convert the psi variable into a factor using the factor() command in R.

```r
# Convert psi variable from number to as a factor where 0 is not under psi teaching
# method and 1 is student under PSI teaching method.

econData$psi <- factor(econData$psi)
```

As a check, we can call the str() function on econData again.

```r
> str(econData)
'data.frame': 32 obs. of 4 variables:
 $ grade: num 0 0 0 0 1 0 0 0 0 1 ...
 $ psi  : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ tuce : num 20 22 24 12 21 17 17 21 25 29 ...
 $ gpa  : num 2.66 2.89 3.28 2.92 4 2.86 2.76 2.87 3.03 3.92 ...
```

We convert the grade variable into a factor variable as well.

```r
econData$grade <- factor(econData$grade)
```

**A Scatterplot Using The ggplot2 Package In R**

After fixing some things with our data, we want to see what the data looks like. I am interested in the student’s ability measure (tuce) versus the student’s GPA (gpa), split by teaching method group. (One group was exposed to the new teaching method and the other group was not.)

```r
# Take a look of student ability when entering the class versus GPA outcome by
# those under new teaching method versus not.
# Legend Title from http://stats.stackexchange.com/questions/5007/
# how-can-i-change-the-title-of-a-legend-in-ggplot2

ggplot(data = econData, aes(x = tuce, y = gpa)) +
  geom_point(aes(colour = psi)) +
  ggtitle("Student Ability Versus GPA Based On Teaching Method") +
  xlab("Student Ability Measure") +
  ylab("Student's GPA") +
  labs(colour = 'Student PSI Exposure')
```

On the x-axis, the variable tuce is labelled Student Ability Measure using the xlab() command. On the y-axis, gpa is labelled Student’s GPA using the ylab() command. Points are coloured by whether or not the student was exposed to the new PSI teaching method. Points in blue represent the students exposed to the new PSI teaching method, while points in red represent the students who were not exposed to it.

Based on our sample of 32 observations, a higher student ability measure tends to go together with a higher student GPA. This association (correlation) looks moderate in strength. Judging strength by eye is imprecise, though; a correlation coefficient would be needed to quantify it.
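As a rough follow-up, one way to put a number on this association is the Pearson correlation coefficient from base R's cor() and cor.test(). This is only a sketch: the variable names r and ct are my own, and Pearson correlation assumes a roughly linear relationship, which the scatterplot suggests but does not prove.

```r
library(faraway)

econData <- spector

# Pearson correlation between the ability measure and GPA
r <- cor(econData$tuce, econData$gpa)

# Test of the null hypothesis that the true correlation is zero;
# the result also carries a 95% confidence interval
ct <- cor.test(econData$tuce, econData$gpa)

print(r)
print(ct$conf.int)
print(ct$p.value)
```

With only 32 observations the confidence interval is likely to be wide, so the point estimate alone should not be over-interpreted.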

If we compare the two groups, it appears that the students under the new teaching method have somewhat higher GPAs than those not under the new teaching method. However, the difference is not large.
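To check the visual impression of the two groups, a simple comparison of group means could look like the sketch below. The names groupMeans and groupSizes are hypothetical, and a difference in means on its own says nothing about statistical significance.

```r
library(faraway)

econData <- spector
econData$psi <- factor(econData$psi)

# Mean GPA for students not exposed (psi = 0) and exposed (psi = 1)
groupMeans <- aggregate(gpa ~ psi, data = econData, FUN = mean)

# Number of students in each group
groupSizes <- table(econData$psi)

print(groupMeans)
print(groupSizes)
```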

**Fitting A Linear Model**

While still using the ggplot2 package in R, we can add a linear regression line (line of best fit) to the points. With method = 'lm', this line is chosen so that the sum of the squared vertical distances from the points to the line is minimized (ordinary least squares). In that sense, it is the line that is as close as possible to the data points overall.

```r
### Fit linear models:

# Fit regression line (Case: All students):

ggplot(data = econData, aes(x = tuce, y = gpa)) +
  geom_point(aes(colour = psi)) +
  geom_smooth(method='lm', formula = y ~ x) +
  ggtitle("Student Ability Versus GPA Based On Teaching Method") +
  xlab("Student Ability Measure") +
  ylab("Student's GPA") +
  labs(colour = 'Student PSI Exposure')
```
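The line drawn by geom_smooth(method = 'lm') is the same one that lm() would fit. The sketch below fits it directly to read off the intercept and slope, and then checks the least-squares idea by nudging the slope and confirming that the residual sum of squares gets worse. The names fit, rss and perturbedRss, and the nudge of 0.01, are my own choices for illustration.

```r
library(faraway)

econData <- spector

# Ordinary least squares fit of GPA on the ability measure
fit <- lm(gpa ~ tuce, data = econData)
print(coef(fit))

# Residual sum of squares of the fitted line
rss <- sum(residuals(fit)^2)

# Same intercept, slightly steeper slope: the sum of squares must increase
nudgedSlope <- coef(fit)["tuce"] + 0.01
nudgedLine <- coef(fit)["(Intercept)"] + nudgedSlope * econData$tuce
perturbedRss <- sum((econData$gpa - nudgedLine)^2)

print(c(rss = rss, perturbedRss = perturbedRss))
```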

You can remove the confidence interval bands by including the option se = FALSE in geom_smooth(). The extra theme() line at the end centres the plot title.

```r
# Fit regression line without confidence bands (Case: All students):

ggplot(data = econData, aes(x = tuce, y = gpa)) +
  geom_point(aes(colour = psi)) +
  geom_smooth(method='lm', formula = y ~ x, se = FALSE) +
  ggtitle("Student Ability Versus GPA Based On Teaching Method") +
  xlab("Student Ability Measure") +
  ylab("Student's GPA") +
  labs(colour = 'Student PSI Exposure') +
  theme(plot.title = element_text(hjust = 0.5))
```

**Comparing Students Under The Teaching Methods**

With this dataset, you could also investigate how students perform depending on whether or not they were under the new teaching method (psi = 1). Instead of one line as in the plot above, we would have two separate lines: one for the students not under the new teaching method and one for the students under it.

We still have the student ability measure tuce on the x-axis and the student’s GPA score (gpa) on the y-axis.

```r
# Fit regression line (Case: By Teaching Method):

ggplot(data = econData, aes(x = tuce, y = gpa, colour = psi)) +
  geom_point() +
  geom_smooth(method='lm', formula = y ~ x, se = FALSE) +
  ggtitle("Student Ability Versus GPA Based On Teaching Method") +
  xlab("Student Ability Measure") +
  ylab("Student's GPA") +
  labs(colour = 'Student PSI Exposure') +
  theme(plot.title = element_text(hjust = 0.5))
```

The plot has two lines. The red line of best fit corresponds with the red points or the students not under the new teaching method. The blue line of best fit is for the students under the new teaching method.

Based on the given data and this visual, the red line appears to sit higher, and the GPA scores in the red group appear to cover a wider range of values than those in the blue group. However, there are two blue points that look like outliers. These points, at an ability measure of around 22 with a GPA of about 2.1 and at a measure of 18 with a GPA of about 2.4, suggest that the new teaching method does not guarantee a good GPA. (You could argue that regardless of the teaching quality, students are still responsible for their own progress.)
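The two lines in the plot can also be obtained numerically. A minimal sketch, assuming psi has been converted to a factor as above: fitting one model with a tuce-by-psi interaction gives the same two lines as fitting a separate regression within each group. The names fitBoth, fitNoPsi, fitPsi and slopePsi are hypothetical.

```r
library(faraway)

econData <- spector
econData$psi <- factor(econData$psi)

# One model with separate intercepts and slopes per teaching method
fitBoth <- lm(gpa ~ tuce * psi, data = econData)

# Separate regressions within each group, for comparison
fitNoPsi <- lm(gpa ~ tuce, data = subset(econData, psi == "0"))
fitPsi <- lm(gpa ~ tuce, data = subset(econData, psi == "1"))

print(coef(fitNoPsi))
print(coef(fitPsi))

# Slope for the psi = 1 group recovered from the interaction model:
# baseline slope plus the interaction term
slopePsi <- coef(fitBoth)["tuce"] + coef(fitBoth)["tuce:psi1"]
print(unname(slopePsi))
```

Comparing the intercepts and slopes side by side makes it easier to say which line is higher and over what range of tuce, rather than relying on the plot alone.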

**Looking At Grade Improvements Of Students**

We can also investigate whether or not the student’s ability measure (tuce) is related to a grade improvement (grade).

```r
ggplot(data = econData, aes(x = tuce, y = grade)) +
  geom_point(colour = "red") +
  ggtitle("Student Ability Versus Grade Improvement Based On Teaching Method") +
  xlab("Student Ability Measure") +
  ylab("Grade Improvement Indicator") +
  theme(plot.title = element_text(hjust = 0.5))
```

It appears that a higher student ability measure may go together with an improved grade. Now let’s do this plot again, but colour the points by the psi teaching method.

```r
# Colour By Psi Teaching Method

ggplot(data = econData, aes(x = tuce, y = grade)) +
  geom_point(aes(colour = factor(psi))) +
  ggtitle("Student Ability Versus Grade Improvement Based On Teaching Method") +
  xlab("Student Ability Measure") +
  ylab("Grade Improvement Indicator") +
  labs(colour = 'Student PSI Exposure') +
  theme(plot.title = element_text(hjust = 0.5))
```

This plot gives us a better picture. It appears that students with a higher ability measure who were also under the new teaching method were more likely to show a grade improvement.
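Because both grade and psi are yes/no variables, a cross-tabulation is a simple companion to the plot. The sketch below counts improvements by teaching method and converts the counts to row proportions; tab and props are names I made up, and the table ignores tuce entirely.

```r
library(faraway)

econData <- spector

# Counts of grade improvement (columns) by PSI exposure (rows)
tab <- table(psi = econData$psi, grade = econData$grade)

# Share of students in each PSI group whose grade did or did not improve
props <- prop.table(tab, margin = 1)

print(tab)
print(round(props, 2))
```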

The concept of grade improvement in this dataset is somewhat vague. An improvement could mean a boost of 0.01 grade points, a boost of 1.0 grade points and so on. It would have been nice to know how the improvement was measured.

**Summary**

Given our dataset of 32 observations, we can investigate which variables are associated with each other. The plots suggest that the new teaching method may improve grades for a good handful of students. Since a sample of 32 is small in statistical terms, more investigation and context are needed. At this point it is best not to draw definite conclusions.