In R a data frame is a kind of object. Like vectors, data frames store data. However, data frames are differ in that they store multiple vectors. It is important that you understand what a data frame is as it is the most frequently used tool in political statistical analysis.
If you are having a hard time visualizing a dataframe simply think of what a spreadsheet looks like. Each column of the dataframe can be said to be vector, each vector represents a variable and the rows coincide with an observation. In all statistical software variables are represented by columns and observations are by rows.
You may create a data frame manually if you want but living in the age of big data this is rarely the case! There are many example datasets pre-loaded in RStudio.
Let’s have a look at one of these pre-loaded data frames. The data frame is called longley (this is an pre-loaded economic dataset)
Using the View
function let’s see the variables included in the dataset
data("longley")
View(longley)
If we want to see individuals columns, in other words, a specific variable in the data frame, then we use the $ sign between the name of the dataset and the name of the variable (e.g name_of_dataset$name_of_variable). Let’s start by observing the Unemployment column.
longley$Unemployed
In addition, often we want to access only certain observations (rows) or only certain variables (columns). By using the square brackets [ ] we subset the data frame. In the square brackets, we insert the coordinates for a row and a column. The row is always first followed by the column. For example, longey[7, 5] gives us the 7th row and the 5th column. If we leave the column coordinate empty then we want to see all columns longey[7, ]. If we leave the row coordinate empty then we want all columns.
longley[7,5]
Leave the column coordinate empty to see the 7th row
longley[7, ]
Leave the row empty to see the 5th column
longley[ ,5]
We may see the first ten rows of a dataset by adding a colon in the brackets
longley[1:10, ]
Let’s create a plot from our dataset. Let’s start by creating a scatterplot with the one axis (X) representing the Year and the other (Y) axis the Gross National Product
plot(longley$Year,longley$GNP)
to create the same plot but by using a line instead of dots we add the argument type="l"
plot(longley$Year,longley$GNP,type = "l")
Use the title()
function, to give labels to the axes, and a title to your plot.
The examples in the help are particularly informative.