R Data Frame

A data frame in R is a list of vectors, factors, and/or matrices, all having the same length (number of rows in the case of matrices). In addition, a data frame generally has a names attribute labeling the variables and a row.names attribute labeling the cases.

A data frame can contain a list that is the same length as the other components. The list can contain elements of differing lengths thereby providing a data structure for ragged arrays. However, as of this writing such arrays are not generally handled correctly.
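As a minimal sketch of such a list component, the snippet below attaches a list with elements of differing lengths to a small data frame. The names ragged and scores are made up for illustration and do not come from the examples in this article.

```r
# Hypothetical example: start with one ordinary column
ragged <- data.frame(id = 1:3)

# Assign a list with one element per row; the elements may differ in length
ragged$scores <- list(c(90, 85), 70, c(60, 75, 80))

nrow(ragged)            # the number of rows is unchanged
lengths(ragged$scores)  # length of each list element
is.list(ragged$scores)  # the column itself is a list
```

Many functions expect atomic columns, so a list column like this is best treated as a special case.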

  • A data frame is a table, implemented as a list with class “data.frame”.
  • A data frame is displayed in matrix form, and its rows and columns can be extracted using matrix indexing conventions.
  • Column names should not be empty.
  • Row names should be unique.
  • The data stored in a data frame can be of numeric, factor, or character type.
  • Each column should contain the same number of data items.
  • A data frame can be created with the data.frame() function.
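The properties listed above can be checked directly. The following is a small sketch using a made-up data frame named df; the tryCatch() wrappers only record whether data.frame() rejects the invalid input.

```r
# Illustrative data frame (names are assumptions for this sketch)
df <- data.frame(id = 1:3, score = c(7.5, 8, 6))

is.list(df)      # a data frame is a list ...
class(df)        # ... with class "data.frame"
df[2, "score"]   # matrix-style indexing: row 2, column "score"

# Columns of unequal length are rejected
unequal_fails <- tryCatch(
  { data.frame(a = 1:3, b = 1:2); FALSE },
  error = function(e) TRUE
)

# Duplicate row names are rejected
duplicate_fails <- tryCatch(
  { data.frame(x = 1:2, row.names = c("a", "a")); FALSE },
  error = function(e) TRUE
)
```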

Create Data Frame in R

    > student_data <- data.frame(
    +   std_id = c(1:4),
    +   std_name = c("George", "Kalpesh", "Karunakar", "David"),
    +   std_age = c(10, 8, 6, 9),
    +   std_city = c("Hyderabad", "Elchuru", "Ongole", "London"))
    > student_data
      std_id  std_name std_age  std_city
    1      1    George      10 Hyderabad
    2      2   Kalpesh       8   Elchuru
    3      3 Karunakar       6    Ongole
    4      4     David       9    London
    > length(student_data)  # Length of the data frame (number of columns)
    [1] 4
    > nrow(student_data)    # Number of rows
    [1] 4
    > ncol(student_data)    # Number of columns
    [1] 4
    > typeof(student_data)  # A data frame is a special case of a list
    [1] "list"
    > class(student_data)   # Check whether the object is a data frame
    [1] "data.frame"

Get Structure of Data Frame:

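We can inspect the structure of a data frame using the str() function. The output below shows the string columns as factors, which is the default in R versions before 4.0.0; from R 4.0.0 onward, data.frame() keeps strings as character vectors (chr) unless stringsAsFactors = TRUE is passed.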

    > student_data <- data.frame(
    +   std_id = c(1:4),
    +   std_name = c("George", "Kalpesh", "Karunakar", "David"),
    +   std_age = c(10, 8, 6, 9),
    +   std_city = c("Hyderabad", "Elchuru", "Ongole", "London"))
    > str(student_data)
    'data.frame':   4 obs. of  4 variables:
     $ std_id  : int  1 2 3 4
     $ std_name: Factor w/ 4 levels "David","George",..: 2 3 4 1
     $ std_age : num  10 8 6 9
     $ std_city: Factor w/ 4 levels "Elchuru","Hyderabad",..: 2 1 4 3
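Whether string columns appear as Factor or chr in str() depends on the stringsAsFactors argument of data.frame(). The sketch below sets it explicitly in both directions so the result does not depend on the R version; the variable names as_factor and as_char are made up for this example.

```r
names_vec <- c("George", "Kalpesh", "Karunakar", "David")

# Explicitly convert strings to factors (the default before R 4.0.0)
as_factor <- data.frame(std_name = names_vec, stringsAsFactors = TRUE)

# Explicitly keep strings as character vectors (the default since R 4.0.0)
as_char <- data.frame(std_name = names_vec, stringsAsFactors = FALSE)

str(as_factor)  # std_name is reported as a Factor with 4 levels
str(as_char)    # std_name is reported as chr
```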

Summary of Data Frame:

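We can get a statistical summary of a data frame using the summary() function. Numeric columns are summarized by their minimum, quartiles, mean, and maximum. The per-level counts for std_name and std_city appear because these columns are factors (the default in R before 4.0.0); for character columns, summary() reports length, class, and mode instead.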

    > student_data <- data.frame(
    +   std_id = c(1:4),
    +   std_name = c("George", "Kalpesh", "Karunakar", "David"),
    +   std_age = c(10, 8, 6, 9),
    +   std_city = c("Hyderabad", "Elchuru", "Ongole", "London"))
    > summary(student_data)
         std_id          std_name    std_age           std_city
     Min.   :1.00   David    :1   Min.   : 6.00   Elchuru  :1
     1st Qu.:1.75   George   :1   1st Qu.: 7.50   Hyderabad:1
     Median :2.50   Kalpesh  :1   Median : 8.50   London   :1
     Mean   :2.50   Karunakar:1   Mean   : 8.25   Ongole   :1
     3rd Qu.:3.25                 3rd Qu.: 9.25
     Max.   :4.00                 Max.   :10.00
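summary() also works on a single column, and individual statistics can be pulled out of the result by name. This is a small sketch using the same ages as above; age_summary is a name chosen for this example.

```r
std_age <- c(10, 8, 6, 9)

# Summarize one numeric vector and extract statistics by name
age_summary <- summary(std_age)
age_summary[["Mean"]]    # same value as mean(std_age)
age_summary[["Median"]]  # same value as median(std_age)
```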

Access data from Data Frame:

    > student_data <- data.frame(
    +   std_id = c(1:4),
    +   std_name = c("George", "Kalpesh", "Karunakar", "David"),
    +   std_age = c(10, 8, 6, 9),
    +   std_city = c("Hyderabad", "Elchuru", "Ongole", "London"))

Extract specific columns from Data Frame.

    > result <- data.frame(student_data$std_id, student_data$std_name)
    > result
      student_data.std_id student_data.std_name
    1                   1                George
    2                   2               Kalpesh
    3                   3             Karunakar
    4                   4                 David
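Building a new data frame from `$` expressions prefixes the column names with student_data. As an alternative sketch, columns can be selected by name with bracket indexing, which keeps the original column names. The variable by_name is an illustrative name.

```r
student_data <- data.frame(
  std_id   = 1:4,
  std_name = c("George", "Kalpesh", "Karunakar", "David"),
  std_age  = c(10, 8, 6, 9),
  std_city = c("Hyderabad", "Elchuru", "Ongole", "London"))

# Select columns by name; all rows are kept
by_name <- student_data[, c("std_id", "std_name")]

names(by_name)  # column names are preserved as-is
```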

Extract First 3 rows and all columns

    > student_data[1:3, ]
      std_id  std_name std_age  std_city
    1      1    George      10 Hyderabad
    2      2   Kalpesh       8   Elchuru
    3      3 Karunakar       6    Ongole

Alternatively, use the head() function to get the first n rows of a data frame.

    > head(student_data, 3)
      std_id  std_name std_age  std_city
    1      1    George      10 Hyderabad
    2      2   Kalpesh       8   Elchuru
    3      3 Karunakar       6    Ongole

Extract 1st row and 3rd row with 2nd and 4th columns

    > student_data[c(1, 3), c(2, 4)]
       std_name  std_city
    1    George Hyderabad
    3 Karunakar    Ongole
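One detail of matrix-style indexing worth knowing: selecting a single column returns a plain vector (or factor) rather than a data frame, unless drop = FALSE is given. The sketch below illustrates this; one_col_vec and one_col_df are names chosen for this example.

```r
student_data <- data.frame(
  std_id   = 1:4,
  std_name = c("George", "Kalpesh", "Karunakar", "David"),
  std_age  = c(10, 8, 6, 9),
  std_city = c("Hyderabad", "Elchuru", "Ongole", "London"))

# A single column is simplified to a vector by default
one_col_vec <- student_data[, 3]

# drop = FALSE keeps the result as a one-column data frame
one_col_df <- student_data[, 3, drop = FALSE]

is.data.frame(one_col_vec)
is.data.frame(one_col_df)
```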

Adding column or row to Data Frame in R

A data frame can be extended by adding columns or rows.

Add column to Data Frame:

    > student_data <- data.frame(
    +   std_id = c(1:4),
    +   std_name = c("George", "Kalpesh", "Karunakar", "David"),
    +   std_age = c(10, 8, 6, 9),
    +   std_city = c("Hyderabad", "Elchuru", "Ongole", "London"))
    > student_data$student_state <- c("TS", "AP", "AP", "BT")
    > student_data
      std_id  std_name std_age  std_city student_state
    1      1    George      10 Hyderabad            TS
    2      2   Kalpesh       8   Elchuru            AP
    3      3 Karunakar       6    Ongole            AP
    4      4     David       9    London            BT
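A new column can also be computed from existing columns, or attached with cbind(). The following is a sketch under the assumption that the data frame has its original four columns; the column names std_age_next_year and std_state are made up for illustration.

```r
student_data <- data.frame(
  std_id   = 1:4,
  std_name = c("George", "Kalpesh", "Karunakar", "David"),
  std_age  = c(10, 8, 6, 9),
  std_city = c("Hyderabad", "Elchuru", "Ongole", "London"))

# Derive a column from an existing one
student_data$std_age_next_year <- student_data$std_age + 1

# cbind() appends a column and returns a new data frame
student_data <- cbind(student_data, std_state = c("TS", "AP", "AP", "BT"))

names(student_data)
```

The new vector must have one value per row (or a length that recycles evenly), otherwise R raises an error.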

Add a row to Data Frame:

Create a new data frame:

    > student_new_data <- data.frame(
    +   std_id = c(5),
    +   std_name = c("Prem"),
    +   std_age = c(10),
    +   std_city = c("Sydney"))
    > student_new_data
      std_id std_name std_age std_city
    1      5     Prem      10   Sydney

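Append the rows of the new data frame to the existing data frame with rbind(). Both data frames must have the same columns, so student_data here is the original four-column version, without the student_state column added earlier: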

    > student_data <- rbind(student_data, student_new_data)
    > student_data
      std_id  std_name std_age  std_city
    1      1    George      10 Hyderabad
    2      2   Kalpesh       8   Elchuru
    3      3 Karunakar       6    Ongole
    4      4     David       9    London
    5      5      Prem      10    Sydney
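rbind() matches data frames by column name, so it fails when the names differ. The sketch below shows one successful and one failing call; students, extra_ok, and extra_bad are hypothetical names, and tryCatch() is used only to record that an error occurred.

```r
students <- data.frame(std_id = 1:2, std_name = c("George", "Kalpesh"))

# Same column names: the row is appended
extra_ok <- data.frame(std_id = 3, std_name = "Karunakar")
combined <- rbind(students, extra_ok)
nrow(combined)

# Different column names: rbind() raises an error
extra_bad <- data.frame(id = 3, name = "Karunakar")
mismatch_fails <- tryCatch(
  { rbind(students, extra_bad); FALSE },
  error = function(e) TRUE
)
mismatch_fails
```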
