Data Frame Manipulation -- Base R data.frame¶
Basic data frame manipulation using base R data frames.
In [1]:
Copied!
library(magrittr)
library(magrittr)
Load csv¶
In [2]:
Copied!
df <- read.csv(here::here(".data", "titanic.csv"))
head(df, 3)
df <- read.csv(here::here(".data", "titanic.csv"))
head(df, 3)
| pclass | survived | name | sex | age | sibsp | parch | ticket | fare | cabin | embarked | boat | body | home.dest | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| <int> | <int> | <chr> | <chr> | <dbl> | <int> | <int> | <chr> | <dbl> | <chr> | <chr> | <chr> | <int> | <chr> | |
| 1 | 1 | 1 | Allen, Miss. Elisabeth Walton | female | 29.0000 | 0 | 0 | 24160 | 211.3375 | B5 | S | 2 | NA | St Louis, MO |
| 2 | 1 | 1 | Allison, Master. Hudson Trevor | male | 0.9167 | 1 | 2 | 113781 | 151.5500 | C22 C26 | S | 11 | NA | Montreal, PQ / Chesterville, ON |
| 3 | 1 | 0 | Allison, Miss. Helen Loraine | female | 2.0000 | 1 | 2 | 113781 | 151.5500 | C22 C26 | S | NA | Montreal, PQ / Chesterville, ON |
Summary¶
In [3]:
Copied!
# Rows
nrow(df)
# Rows
nrow(df)
1309
In [4]:
Copied!
# Columns
length(df)
# Columns
length(df)
14
In [5]:
Copied!
# Inspect types
str(df)
# Inspect types
str(df)
'data.frame': 1309 obs. of 14 variables: $ pclass : int 1 1 1 1 1 1 1 1 1 1 ... $ survived : int 1 1 0 0 0 1 1 0 1 0 ... $ name : chr "Allen, Miss. Elisabeth Walton" "Allison, Master. Hudson Trevor" "Allison, Miss. Helen Loraine" "Allison, Mr. Hudson Joshua Creighton" ... $ sex : chr "female" "male" "female" "male" ... $ age : num 29 0.917 2 30 25 ... $ sibsp : int 0 1 1 1 1 0 1 0 2 0 ... $ parch : int 0 2 2 2 2 0 0 0 0 0 ... $ ticket : chr "24160" "113781" "113781" "113781" ... $ fare : num 211 152 152 152 152 ... $ cabin : chr "B5" "C22 C26" "C22 C26" "C22 C26" ... $ embarked : chr "S" "S" "S" "S" ... $ boat : chr "2" "11" "" "" ... $ body : int NA NA NA 135 NA NA NA NA NA 22 ... $ home.dest: chr "St Louis, MO" "Montreal, PQ / Chesterville, ON" "Montreal, PQ / Chesterville, ON" "Montreal, PQ / Chesterville, ON" ...
In [6]:
Copied!
# Descriptive statistics
summary(df)
# Descriptive statistics
summary(df)
pclass survived name sex
Min. :1.000 Min. :0.000 Length:1309 Length:1309
1st Qu.:2.000 1st Qu.:0.000 Class :character Class :character
Median :3.000 Median :0.000 Mode :character Mode :character
Mean :2.295 Mean :0.382
3rd Qu.:3.000 3rd Qu.:1.000
Max. :3.000 Max. :1.000
age sibsp parch ticket
Min. : 0.1667 Min. :0.0000 Min. :0.000 Length:1309
1st Qu.:21.0000 1st Qu.:0.0000 1st Qu.:0.000 Class :character
Median :28.0000 Median :0.0000 Median :0.000 Mode :character
Mean :29.8811 Mean :0.4989 Mean :0.385
3rd Qu.:39.0000 3rd Qu.:1.0000 3rd Qu.:0.000
Max. :80.0000 Max. :8.0000 Max. :9.000
NA's :263
fare cabin embarked boat
Min. : 0.000 Length:1309 Length:1309 Length:1309
1st Qu.: 7.896 Class :character Class :character Class :character
Median : 14.454 Mode :character Mode :character Mode :character
Mean : 33.295
3rd Qu.: 31.275
Max. :512.329
NA's :1
body home.dest
Min. : 1.0 Length:1309
1st Qu.: 72.0 Class :character
Median :155.0 Mode :character
Mean :160.8
3rd Qu.:256.0
Max. :328.0
NA's :1188
Slice data¶
In [7]:
Copied!
# Select rows by number
df[2:4,]
# Select rows by number
df[2:4,]
| pclass | survived | name | sex | age | sibsp | parch | ticket | fare | cabin | embarked | boat | body | home.dest | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| <int> | <int> | <chr> | <chr> | <dbl> | <int> | <int> | <chr> | <dbl> | <chr> | <chr> | <chr> | <int> | <chr> | |
| 2 | 1 | 1 | Allison, Master. Hudson Trevor | male | 0.9167 | 1 | 2 | 113781 | 151.55 | C22 C26 | S | 11 | NA | Montreal, PQ / Chesterville, ON |
| 3 | 1 | 0 | Allison, Miss. Helen Loraine | female | 2.0000 | 1 | 2 | 113781 | 151.55 | C22 C26 | S | NA | Montreal, PQ / Chesterville, ON | |
| 4 | 1 | 0 | Allison, Mr. Hudson Joshua Creighton | male | 30.0000 | 1 | 2 | 113781 | 151.55 | C22 C26 | S | 135 | Montreal, PQ / Chesterville, ON |
In [8]:
Copied!
# Select rows by condition
df[df[, "survived"] == 1,] %>% head(3)
# Select rows by condition
df[df[, "survived"] == 1,] %>% head(3)
| pclass | survived | name | sex | age | sibsp | parch | ticket | fare | cabin | embarked | boat | body | home.dest | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| <int> | <int> | <chr> | <chr> | <dbl> | <int> | <int> | <chr> | <dbl> | <chr> | <chr> | <chr> | <int> | <chr> | |
| 1 | 1 | 1 | Allen, Miss. Elisabeth Walton | female | 29.0000 | 0 | 0 | 24160 | 211.3375 | B5 | S | 2 | NA | St Louis, MO |
| 2 | 1 | 1 | Allison, Master. Hudson Trevor | male | 0.9167 | 1 | 2 | 113781 | 151.5500 | C22 C26 | S | 11 | NA | Montreal, PQ / Chesterville, ON |
| 6 | 1 | 1 | Anderson, Mr. Harry | male | 48.0000 | 0 | 0 | 19952 | 26.5500 | E12 | S | 3 | NA | New York, NY |
In [9]:
Copied!
# Select rows by multiple conditions
df[df[, "survived"] == 1 & df[, "sex"] == "female",] %>% head(3)
# Select rows by multiple conditions
df[df[, "survived"] == 1 & df[, "sex"] == "female",] %>% head(3)
| pclass | survived | name | sex | age | sibsp | parch | ticket | fare | cabin | embarked | boat | body | home.dest | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| <int> | <int> | <chr> | <chr> | <dbl> | <int> | <int> | <chr> | <dbl> | <chr> | <chr> | <chr> | <int> | <chr> | |
| 1 | 1 | 1 | Allen, Miss. Elisabeth Walton | female | 29 | 0 | 0 | 24160 | 211.3375 | B5 | S | 2 | NA | St Louis, MO |
| 7 | 1 | 1 | Andrews, Miss. Kornelia Theodosia | female | 63 | 1 | 0 | 13502 | 77.9583 | D7 | S | 10 | NA | Hudson, NY |
| 9 | 1 | 1 | Appleton, Mrs. Edward Dale (Charlotte Lamson) | female | 53 | 2 | 0 | 11769 | 51.4792 | C101 | S | D | NA | Bayside, Queens, NY |
In [10]:
Copied!
# Select columns by name
df[, c("survived", "pclass")] %>% head(3)
# Select columns by name
df[, c("survived", "pclass")] %>% head(3)
| survived | pclass | |
|---|---|---|
| <int> | <int> | |
| 1 | 1 | 1 |
| 2 | 1 | 1 |
| 3 | 0 | 1 |
Assign values¶
In [11]:
Copied!
# Assign a (new) column
df$sparkles <- 8
head(df, 3)
# Assign a (new) column
df$sparkles <- 8
head(df, 3)
| pclass | survived | name | sex | age | sibsp | parch | ticket | fare | cabin | embarked | boat | body | home.dest | sparkles | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| <int> | <int> | <chr> | <chr> | <dbl> | <int> | <int> | <chr> | <dbl> | <chr> | <chr> | <chr> | <int> | <chr> | <dbl> | |
| 1 | 1 | 1 | Allen, Miss. Elisabeth Walton | female | 29.0000 | 0 | 0 | 24160 | 211.3375 | B5 | S | 2 | NA | St Louis, MO | 8 |
| 2 | 1 | 1 | Allison, Master. Hudson Trevor | male | 0.9167 | 1 | 2 | 113781 | 151.5500 | C22 C26 | S | 11 | NA | Montreal, PQ / Chesterville, ON | 8 |
| 3 | 1 | 0 | Allison, Miss. Helen Loraine | female | 2.0000 | 1 | 2 | 113781 | 151.5500 | C22 C26 | S | NA | Montreal, PQ / Chesterville, ON | 8 |
In [12]:
Copied!
# Assign value to some rows
df[df[, "survived"] == 1, "sparkles"] <- 3
head(df, 3)
# Assign value to some rows
df[df[, "survived"] == 1, "sparkles"] <- 3
head(df, 3)
| pclass | survived | name | sex | age | sibsp | parch | ticket | fare | cabin | embarked | boat | body | home.dest | sparkles | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| <int> | <int> | <chr> | <chr> | <dbl> | <int> | <int> | <chr> | <dbl> | <chr> | <chr> | <chr> | <int> | <chr> | <dbl> | |
| 1 | 1 | 1 | Allen, Miss. Elisabeth Walton | female | 29.0000 | 0 | 0 | 24160 | 211.3375 | B5 | S | 2 | NA | St Louis, MO | 3 |
| 2 | 1 | 1 | Allison, Master. Hudson Trevor | male | 0.9167 | 1 | 2 | 113781 | 151.5500 | C22 C26 | S | 11 | NA | Montreal, PQ / Chesterville, ON | 3 |
| 3 | 1 | 0 | Allison, Miss. Helen Loraine | female | 2.0000 | 1 | 2 | 113781 | 151.5500 | C22 C26 | S | NA | Montreal, PQ / Chesterville, ON | 8 |
Group by¶
In [ ]:
Copied!
Reshape¶
In [13]:
Copied!
# Gather / melt
# Gather / melt
In [14]:
Copied!
# Spread / cast / pivot
# Spread / cast / pivot
Join¶
In [ ]:
Copied!
Creation¶
In [15]:
Copied!
df2 <- data.frame(
x = c(1,2,3)
, y = c('a', 'b', 'c')
, z = c(TRUE, FALSE, TRUE)
)
df2
df2 <- data.frame(
x = c(1,2,3)
, y = c('a', 'b', 'c')
, z = c(TRUE, FALSE, TRUE)
)
df2
| x | y | z |
|---|---|---|
| <dbl> | <chr> | <lgl> |
| 1 | a | TRUE |
| 2 | b | FALSE |
| 3 | c | TRUE |