In this tutorial, we will consider one of the most common operations used for manipulating data frames in R: how to add columns to a data frame in base R.
A data frame is one of the basic data structures of the R programming language. It is also a very versatile data structure, since it can store different types of data and can be easily edited and updated.
What is a data frame in R?
Technically, a data frame in R is a special case of a list of vectors of the same length, where different vectors can be (and usually are) of different data types. Since a data frame has a tabular, two-dimensional form, it has columns (variables) and rows (data entries). If you are new to creating data frames in R, you may want to read how to create a data frame in R before continuing with this post.
Adding columns to a data frame in R
We may want to add a new column to an R data frame for various reasons: to calculate a new variable based on the existing ones, to add a new column based on an existing one but in a different format (thereby keeping both columns), to append an empty or placeholder column to be filled later, or to add a column containing completely new information.
Let's explore different ways of adding a new column to a data frame in R. For our experiments, we will mostly use the same data frame called super_sleepers, which we will reconstruct each time from the following initial data frame:
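To make the "list of equal-length vectors" idea concrete, here is a minimal sketch using a small hypothetical data frame `df` (not part of the original tutorial) and only base R functions:

```r
# A small hypothetical data frame with two columns of different types
df <- data.frame(rating = 1:4,
                 animal = c('koala', 'hedgehog', 'sloth', 'panda'))

# Underneath, a data frame is a list...
print(is.list(df))
# ...with one list element (one vector) per column
print(length(df))
# Each column keeps its own type
print(sapply(df, class))
# The table view: number of rows, then number of columns
print(dim(df))
```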
super_sleepers_initial <- data.frame(rating=1:4,
animal=c('koala', 'hedgehog', 'sloth', 'panda'),
country=c('Australia', 'Italy', 'Peru', 'China'))
print(super_sleepers_initial)
Output:
rating animal country
1 1 koala Australia
2 2 hedgehog Italy
3 3 sloth Peru
4 4 panda China
Our job is to add a new column called avg_sleep_hours to this data frame, representing how many hours per day, on average, each of the above animals sleeps, according to the following table:
| Animal | Average daily sleep (hours) |
|---|---|
| Koala | 21 |
| Hedgehog | 18 |
| Sloth | 17 |
| Panda | 10 |
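The hard-coded vector c(21, 18, 17, 10) used throughout this tutorial works only because it follows the row order of super_sleepers_initial. As a sketch of a more robust alternative, assuming the table above is stored as a named vector (the name sleep_lookup is hypothetical), indexing it by animal name aligns each value with the right row even if the rows are reordered. It uses the $ assignment covered in the following section:

```r
super_sleepers_initial <- data.frame(rating = 1:4,
                                     animal = c('koala', 'hedgehog', 'sloth', 'panda'),
                                     country = c('Australia', 'Italy', 'Peru', 'China'))

# Hypothetical lookup table: average daily sleep (hours), keyed by animal name
sleep_lookup <- c(koala = 21, hedgehog = 18, sloth = 17, panda = 10)

super_sleepers <- super_sleepers_initial
# Indexing a named vector with character values picks the matching entries;
# as.character() guards against factor columns, whose integer codes would be used instead
super_sleepers$avg_sleep_hours <- unname(sleep_lookup[as.character(super_sleepers$animal)])
print(super_sleepers)

# The same lookup still lines up when the rows are shuffled
shuffled <- super_sleepers_initial[c(4, 1, 3, 2), ]
shuffled$avg_sleep_hours <- unname(sleep_lookup[as.character(shuffled$animal)])
print(shuffled)
```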
In some examples, we will also experiment with adding two other columns: avg_sleep_hours_per_year and has_tail.
Now, let's dive in.
Adding columns to a data frame in R using the $ symbol
Since a data frame in R is a list of vectors where each vector represents an individual column of that data frame, we can add a column to a data frame just by adding the corresponding new vector to this “list”. The syntax is as follows:
dataframe_name$new_column_name <- vector
Let’s reconstruct our super_sleepers data frame from the initial super_sleepers_initial data frame (we will do so for every subsequent experiment) and add a column called avg_sleep_hours, represented by the vector c(21, 18, 17, 10):
# Reconstructing the `super_sleepers` dataframe
super_sleepers <- super_sleepers_initial
print(super_sleepers)
cat('\n\n') # printing an empty line
# Adding a new column `avg_sleep_hours` to the `super_sleepers` dataframe
super_sleepers$avg_sleep_hours <- c(21, 18, 17, 10)
print(super_sleepers)
Output:
rating animal country
1 1 koala Australia
2 2 hedgehog Italy
3 3 sloth Peru
4 4 panda China
rating animal country avg_sleep_hours
1 1 koala Australia 21
2 2 hedgehog Italy 18
3 3 sloth Peru 17
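4 4 panda China 10
Note that the number of items in the vector must match the current number of rows in the data frame. R does recycle a shorter vector whose length divides the number of rows evenly (for example, a single value), but any other mismatch makes the program throw an error: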
# Reconstructing the `super_sleepers` dataframe
super_sleepers <- super_sleepers_initial
print(super_sleepers)
cat('\n\n')
# Attempting to add a new column `avg_sleep_hours` to the `super_sleepers` dataframe
# with the number of items in the vector NOT EQUAL to the number of rows in the dataframe
super_sleepers$avg_sleep_hours <- c(21, 18, 17)
print(super_sleepers)
Output:
rating animal country
1 1 koala Australia
2 2 hedgehog Italy
3 3 sloth Peru
4 4 panda China
Error in `$<-.data.frame`(`*tmp*`, avg_sleep_hours, value = c(21, 18, 17)): replacement has 3 rows, data has 4
Traceback:
1. `$<-`(`*tmp*`, avg_sleep_hours, value = c(21, 18, 17))
2. `$<-.data.frame`(`*tmp*`, avg_sleep_hours, value = c(21, 18, 17))
3. stop(sprintf(ngettext(N, "replacement has %d row, data has %d",
. "replacement has %d rows, data has %d"), N, nrows), domain = NA)
Instead of assigning a vector, we can assign the same value, whether numeric or character, to all rows of the new column:
# Reconstructing the `super_sleepers` dataframe
super_sleepers <- super_sleepers_initial
print(super_sleepers)
cat('\n\n')
# Adding a new column `avg_sleep_hours` to the `super_sleepers` dataframe and setting it to 0
super_sleepers$avg_sleep_hours <- 0
print(super_sleepers)
Output:
rating animal country
1 1 koala Australia
2 2 hedgehog Italy
3 3 sloth Peru
4 4 panda China
rating animal country avg_sleep_hours
1 1 koala Australia 0
2 2 hedgehog Italy 0
3 3 sloth Peru 0
4 4 panda China 0
In this case, the new column plays the role of a placeholder for the real values of a specific data type (numeric, in the above case), which we can fill in later.
Alternatively, we can calculate a new column based on existing ones. Let’s first add the avg_sleep_hours column to our data frame and then calculate a new column, avg_sleep_hours_per_year, from it, to find out how many hours per year these animals sleep on average:
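The length rule can be explored directly. The sketch below is based on how base R's $<- method for data frames behaves; the column names placeholder and group are purely illustrative. It shows that a single value, or a vector whose length divides the number of rows evenly, is recycled, while the mismatched vector from the earlier example raises an error that tryCatch() can capture without stopping the script:

```r
super_sleepers_initial <- data.frame(rating = 1:4,
                                     animal = c('koala', 'hedgehog', 'sloth', 'panda'),
                                     country = c('Australia', 'Italy', 'Peru', 'China'))
super_sleepers <- super_sleepers_initial

# A single value is recycled to every row; NA_real_ makes an explicit numeric placeholder
super_sleepers$placeholder <- NA_real_

# A vector whose length divides the row count evenly is recycled too:
# 4 rows / 2 values, so the pattern 'A', 'B' is repeated twice
super_sleepers$group <- c('A', 'B')

# Any other length mismatch raises an error; here we capture its message instead
msg <- tryCatch({
  super_sleepers$avg_sleep_hours <- c(21, 18, 17)
  'no error'
}, error = function(e) conditionMessage(e))

print(msg)
print(super_sleepers)
```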
# Reconstructing the `super_sleepers` dataframe
super_sleepers <- super_sleepers_initial
print(super_sleepers)
cat('\n\n')
# Adding a new column `avg_sleep_hours` to the `super_sleepers` dataframe
super_sleepers$avg_sleep_hours <- c(21, 18, 17, 10)
print(super_sleepers)
cat('\n\n')
# Adding a new column `avg_sleep_hours_per_year` calculated from `avg_sleep_hours`
super_sleepers$avg_sleep_hours_per_year <- super_sleepers$avg_sleep_hours * 365
print(super_sleepers)
Output:
rating animal country
1 1 koala Australia
2 2 hedgehog Italy
3 3 sloth Peru
4 4 panda China
rating animal country avg_sleep_hours
1 1 koala Australia 21
2 2 hedgehog Italy 18
3 3 sloth Peru 17
4 4 panda China 10
rating animal country avg_sleep_hours avg_sleep_hours_per_year
1 1 koala Australia 21 7665
2 2 hedgehog Italy 18 6570
3 3 sloth Peru 17 6205
4 4 panda China 10 3650
It is also possible to copy a column from another data frame (provided both data frames have the same number of rows) using the following syntax:
df1$new_col <- df2$existing_col
Let’s look at an example of such a situation:
# Creating the `super_sleepers_1` dataframe with only the `rating` column
super_sleepers_1 <- data.frame(rating=1:4)
print(super_sleepers_1)
cat('\n\n')
# Copying the `animal` column from `super_sleepers_initial` to `super_sleepers_1`
# Note that in the new dataframe, the column is called `ANIMAL` instead of `animal`
super_sleepers_1$ANIMAL <- super_sleepers_initial$animal
print(super_sleepers_1)
Output:
rating
1 1
2 2
3 3
4 4
rating ANIMAL
1 1 koala
2 2 hedgehog
3 3 sloth
4 4 panda
The disadvantage of this approach (i.e., using the $ operator to add columns to a data frame) is that, with the plain syntax, the new column name must be a syntactically valid R name: it can contain only letters (upper or lower case), digits, dots, and underscores, and it must start with a letter or a dot not followed by a digit. Names with white spaces or special symbols have to be wrapped in backticks. Also, this approach cannot add multiple columns at once.
Adding columns to a data frame in R using square brackets
Another way to add a new column to an R data frame is more “data frame style” than “list style”: using bracket notation. Let’s see how it works:
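A minimal sketch of the backtick workaround, assuming a column name with spaces and parentheses chosen purely for illustration. It also shows that $ needs a separate assignment for each new column:

```r
super_sleepers <- data.frame(rating = 1:4,
                             animal = c('koala', 'hedgehog', 'sloth', 'panda'))

# Wrapping a non-syntactic name in backticks lets `$` create the column anyway
super_sleepers$`avg sleep (hours)` <- c(21, 18, 17, 10)

# The same backticks are needed to read the column back with `$`
print(super_sleepers$`avg sleep (hours)` * 365)  # values: 7665 6570 6205 3650

# `$` adds one column per statement, so a second column needs a second assignment
super_sleepers$has_tail <- c('no', 'yes', 'yes', 'yes')
print(names(super_sleepers))
```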
# Reconstructing the `super_sleepers` dataframe
super_sleepers <- super_sleepers_initial
print(super_sleepers)
cat('\n\n')
# Adding a new column `avg_sleep_hours` to the `super_sleepers` dataframe
super_sleepers['avg_sleep_hours'] <- c(21, 18, 17, 10)
print(super_sleepers)
Output:
rating animal country
1 1 koala Australia
2 2 hedgehog Italy
3 3 sloth Peru
4 4 panda China
rating animal country avg_sleep_hours
1 1 koala Australia 21
2 2 hedgehog Italy 18
3 3 sloth Peru 17
4 4 panda China 10
In the piece of code above, we can replace this line:
super_sleepers['avg_sleep_hours'] <- c(21, 18, 17, 10)
with this one:
super_sleepers[['avg_sleep_hours']] <- c(21, 18, 17, 10)
Finally, it can also be replaced with this one:
super_sleepers[, 'avg_sleep_hours'] <- c(21, 18, 17, 10)
The result will be the same; these are just three different versions of the syntax.
As with the previous approach, we can assign a single value instead of a vector to the new column:
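To check that the three variants really are interchangeable, the sketch below applies each one to a fresh copy of super_sleepers_initial and compares the results. The helper same_columns() is hypothetical, written only for this comparison; it compares column names and values rather than calling identical() on the whole data frames, so that differences in how row names are stored internally cannot affect the outcome:

```r
super_sleepers_initial <- data.frame(rating = 1:4,
                                     animal = c('koala', 'hedgehog', 'sloth', 'panda'),
                                     country = c('Australia', 'Italy', 'Peru', 'China'))
sleep <- c(21, 18, 17, 10)

# Hypothetical helper: same column names and same column contents
same_columns <- function(a, b) {
  identical(names(a), names(b)) && identical(as.list(a), as.list(b))
}

v1 <- super_sleepers_initial
v1['avg_sleep_hours'] <- sleep       # single brackets, list style

v2 <- super_sleepers_initial
v2[['avg_sleep_hours']] <- sleep     # double brackets

v3 <- super_sleepers_initial
v3[, 'avg_sleep_hours'] <- sleep     # matrix-style [rows, columns]

print(same_columns(v1, v2) && same_columns(v2, v3))
```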
# Reconstructing the `super_sleepers` dataframe
super_sleepers <- super_sleepers_initial
print(super_sleepers)
cat('\n\n')
# Adding a new column `avg_sleep_hours` to the `super_sleepers` dataframe and setting it to 'Unknown'
super_sleepers['avg_sleep_hours'] <- 'Unknown'
print(super_sleepers)
Output:
rating animal country
1 1 koala Australia
2 2 hedgehog Italy
3 3 sloth Peru
4 4 panda China
rating animal country avg_sleep_hours
1 1 koala Australia Unknown
2 2 hedgehog Italy Unknown
3 3 sloth Peru Unknown
4 4 panda China Unknown
Alternatively, we can calculate a new column based on existing columns:
# Reconstructing the `super_sleepers` dataframe
super_sleepers <- super_sleepers_initial
print(super_sleepers)
cat('\n\n')
# Adding a new column `avg_sleep_hours` to the `super_sleepers` dataframe
super_sleepers['avg_sleep_hours'] <- c(21, 18, 17, 10)
print(super_sleepers)
cat('\n\n')
# Adding a new column `avg_sleep_hours_per_year` calculated from `avg_sleep_hours`
super_sleepers['avg_sleep_hours_per_year'] <- super_sleepers['avg_sleep_hours'] * 365
print(super_sleepers)
Output:
rating animal country
1 1 koala Australia
2 2 hedgehog Italy
3 3 sloth Peru
4 4 panda China
rating animal country avg_sleep_hours
1 1 koala Australia 21
2 2 hedgehog Italy 18
3 3 sloth Peru 17
4 4 panda China 10
rating animal country avg_sleep_hours avg_sleep_hours_per_year
1 1 koala Australia 21 7665
2 2 hedgehog Italy 18 6570
3 3 sloth Peru 17 6205
4 4 panda China 10 3650
Using another option, we can copy a column from another data frame:
# Creating the `super_sleepers_1` dataframe with only the `rating` column
super_sleepers_1 <- data.frame(rating=1:4)
print(super_sleepers_1)
cat('\n\n')
# Copying the `animal` column from `super_sleepers_initial` to `super_sleepers_1`
# Note that in the new dataframe, the column is called `ANIMAL` instead of `animal`
super_sleepers_1['ANIMAL'] <- super_sleepers_initial['animal']
print(super_sleepers_1)
Output:
rating
1 1
2 2
3 3
4 4
rating ANIMAL
1 1 koala
2 2 hedgehog
3 3 sloth
4 4 panda
The advantage of using square brackets over the $ operator to add columns to a data frame is that we can add a column whose name contains white spaces or special symbols.
Adding columns to a data frame in R using the cbind() function
The third way to add a new column to an R data frame is to apply the cbind() function, which stands for “column bind” and can also be used to combine two or more data frames. Using this function is a more universal approach than the previous two, since it makes adding several columns at once straightforward. Its basic syntax is as follows:
df <- cbind(df, new_col_1, new_col_2, ..., new_col_N)
The piece of code below adds the avg_sleep_hours column to the super_sleepers data frame:
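A short sketch of both points, assuming illustrative column names: a quoted name with spaces inside single brackets, and several new columns assigned in one statement by pairing a vector of names with a list of values (the list needs one element per new column, each as long as the data frame has rows):

```r
super_sleepers <- data.frame(rating = 1:4,
                             animal = c('koala', 'hedgehog', 'sloth', 'panda'))

# A quoted name inside brackets may contain spaces and special symbols
super_sleepers['avg sleep (hours)'] <- c(21, 18, 17, 10)

# Several new columns at once: names on the left, a list of column vectors on the right
super_sleepers[c('has_tail', 'country')] <- list(c('no', 'yes', 'yes', 'yes'),
                                                  c('Australia', 'Italy', 'Peru', 'China'))
print(super_sleepers)
```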
# Reconstructing the `super_sleepers` dataframe
super_sleepers <- super_sleepers_initial
print(super_sleepers)
cat('\n\n')
# Adding a new column `avg_sleep_hours` to the `super_sleepers` dataframe
super_sleepers <- cbind(super_sleepers,
avg_sleep_hours=c(21, 18, 17, 10))
print(super_sleepers)
Output:
rating animal country
1 1 koala Australia
2 2 hedgehog Italy
3 3 sloth Peru
4 4 panda China
rating animal country avg_sleep_hours
1 1 koala Australia 21
2 2 hedgehog Italy 18
3 3 sloth Peru 17
4 4 panda China 10
The next piece of code adds two new columns, avg_sleep_hours and has_tail, to the super_sleepers data frame at once:
# Reconstructing the `super_sleepers` dataframe
super_sleepers <- super_sleepers_initial
print(super_sleepers)
cat('\n\n')
# Adding two new columns `avg_sleep_hours` and `has_tail` to the `super_sleepers` dataframe
super_sleepers <- cbind(super_sleepers,
avg_sleep_hours=c(21, 18, 17, 10),
has_tail=c('no', 'yes', 'yes', 'yes'))
print(super_sleepers)
Output:
rating animal country
1 1 koala Australia
2 2 hedgehog Italy
3 3 sloth Peru
4 4 panda China
rating animal country avg_sleep_hours has_tail
1 1 koala Australia 21 no
2 2 hedgehog Italy 18 yes
3 3 sloth Peru 17 yes
4 4 panda China 10 yes
In addition to adding multiple columns at once, another advantage of using the cbind() function is that it lets us assign the result of this operation (i.e., adding one or more columns to an R data frame) directly to a new data frame, leaving the initial one unchanged:
# Reconstructing the `super_sleepers` dataframe
super_sleepers <- super_sleepers_initial
print(super_sleepers)
cat('\n\n')
# Creating a new dataframe `super_sleepers_new` based on `super_sleepers` with new columns `avg_sleep_hours` and `has_tail`
super_sleepers_new <- cbind(super_sleepers,
avg_sleep_hours=c(21, 18, 17, 10),
has_tail=c('no', 'yes', 'yes', 'yes'))
print(super_sleepers_new)
cat('\n\n')
print(super_sleepers)
Output:
rating animal country
1 1 koala Australia
2 2 hedgehog Italy
3 3 sloth Peru
4 4 panda China
rating animal country avg_sleep_hours has_tail
1 1 koala Australia 21 no
2 2 hedgehog Italy 18 yes
3 3 sloth Peru 17 yes
4 4 panda China 10 yes
rating animal country
1 1 koala Australia
2 2 hedgehog Italy
3 3 sloth Peru
4 4 panda China
As with the previous two approaches, inside the cbind() function we can assign a single value to the entire new column:
# Reconstructing the `super_sleepers` dataframe
super_sleepers <- super_sleepers_initial
print(super_sleepers)
cat('\n\n')
# Adding a new column `avg_sleep_hours` to the `super_sleepers` dataframe and setting it to 0.999
super_sleepers <- cbind(super_sleepers,
avg_sleep_hours=0.999)
print(super_sleepers)
Output:
rating animal country
1 1 koala Australia
2 2 hedgehog Italy
3 3 sloth Peru
4 4 panda China
rating animal country avg_sleep_hours
1 1 koala Australia 0.999
2 2 hedgehog Italy 0.999
3 3 sloth Peru 0.999
4 4 panda China 0.999
Another option allows us to calculate a new column based on existing columns:
# Reconstructing the `super_sleepers` dataframe
super_sleepers <- super_sleepers_initial
print(super_sleepers)
cat('\n\n')
# Adding a new column `avg_sleep_hours` to the `super_sleepers` dataframe
super_sleepers <- cbind(super_sleepers,
avg_sleep_hours=c(21, 18, 17, 10))
print(super_sleepers)
cat('\n\n')
# Adding a new column `avg_sleep_hours_per_year` calculated from `avg_sleep_hours`
# `$` extracts a plain vector, so the new column keeps the name given here
super_sleepers <- cbind(super_sleepers,
avg_sleep_hours_per_year=super_sleepers$avg_sleep_hours * 365)
print(super_sleepers)
Output:
rating animal country
1 1 koala Australia
2 2 hedgehog Italy
3 3 sloth Peru
4 4 panda China
rating animal country avg_sleep_hours
1 1 koala Australia 21
2 2 hedgehog Italy 18
3 3 sloth Peru 17
4 4 panda China 10
rating animal country avg_sleep_hours avg_sleep_hours_per_year
1 1 koala Australia 21 7665
2 2 hedgehog Italy 18 6570
3 3 sloth Peru 17 6205
4 4 panda China 10 3650
With the following option, we can copy a column from another data frame:
# Creating the `super_sleepers_1` dataframe with only the `rating` column
super_sleepers_1 <- data.frame(rating=1:4)
print(super_sleepers_1)
cat('\n\n')
# Copying the `animal` column from `super_sleepers_initial` to `super_sleepers_1`
# Note that in the new dataframe, the column is still called `animal` despite setting the new name `ANIMAL`
super_sleepers_1 <- cbind(super_sleepers_1,
ANIMAL=super_sleepers_initial['animal'])
print(super_sleepers_1)
Output:
rating
1 1
2 2
3 3
4 4
rating animal
1 1 koala
2 2 hedgehog
3 3 sloth
4 4 panda
However, unlike with the $ operator and square bracket approaches, we need to pay attention to the following two nuances here:
- We cannot create a new column and calculate another column based on it within the same cbind() function. For example, the piece of code below throws an error:
# Reconstructing the `super_sleepers` dataframe
super_sleepers <- super_sleepers_initial
print(super_sleepers)
cat('\n\n')
# Attempting to add a new column `avg_sleep_hours` to the `super_sleepers` dataframe
# AND another new column `avg_sleep_hours_per_year` based on it
super_sleepers <- cbind(super_sleepers,
avg_sleep_hours=c(21, 18, 17, 10),
avg_sleep_hours_per_year=super_sleepers['avg_sleep_hours'] * 365)
print(super_sleepers)
Output:
rating animal country
1 1 koala Australia
2 2 hedgehog Italy
3 3 sloth Peru
4 4 panda China
Error in `[.data.frame`(super_sleepers, "avg_sleep_hours"): undefined columns selected
Traceback:
1. cbind(super_sleepers, avg_sleep_hours = c(21, 18, 17, 10), avg_sleep_hours_per_year = super_sleepers["avg_sleep_hours"] *
. 365)
2. super_sleepers["avg_sleep_hours"]
3. `[.data.frame`(super_sleepers, "avg_sleep_hours")
4. stop("undefined columns selected")
- When we copy a column from another data frame using single square brackets and try to give it a new name inside cbind(), this new name will be ignored, and the new column will be named exactly as in the original data frame. This happens because single square brackets return a one-column data frame, whose own column name takes precedence; extracting the column as a vector with $ or [[ ]] (e.g., ANIMAL=super_sleepers_initial$animal) keeps the new name. For example, in the piece of code below, the new name ANIMAL was ignored, and the new column was called animal, just like in the data frame from which it was copied:
# Creating the `super_sleepers_1` dataframe with only the `rating` column
super_sleepers_1 <- data.frame(rating=1:4)
print(super_sleepers_1)
cat('\n\n')
# Copying the `animal` column from `super_sleepers_initial` to `super_sleepers_1`
# Note that in the new dataframe, the column is still called `animal` despite setting the new name `ANIMAL`
super_sleepers_1 <- cbind(super_sleepers_1,
ANIMAL=super_sleepers_initial['animal'])
print(super_sleepers_1)
Output:
rating
1 1
2 2
3 3
4 4
rating animal
1 1 koala
2 2 hedgehog
3 3 sloth
4 4 panda
Conclusion
In this tutorial, we discussed various reasons why we may need to add a new column to an R data frame and what information it can store. Then, we explored three different ways of doing it: using the $ symbol, square brackets, and the cbind() function. We considered each of these approaches and its possible variations, pros and cons, possible additional functionality, the most common pitfalls and errors, and how to avoid them. Also, we learned how to add multiple columns to an R data frame at the same time.
It is worth noting that these are not the only ways to add columns to a data frame in R. For example, for the same purpose, we can use the mutate() or add_column() functions. However, to be able to apply these functions, we need to install and load specific R packages (dplyr and tibble, respectively), which, for the simple cases discussed in this tutorial, add little beyond what base R already offers. In contrast, using the $ symbol, square brackets, or the cbind() function requires no installation, since they are all available in base R.
If you would like to find more information about working with data frames in R, check out how to add rows to a data frame in R (with 7 code examples).