A Detailed Guide to DataFrame Operations in R

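A data frame (DataFrame) is R's general-purpose data object for storing tabular data. Data frames are considered the most popular data object in R programming, because analysing data in tabular form is convenient. A data frame can also be thought of as a matrix in which each column may have a different data type. A DataFrame has three main components: the data, the rows, and the columns.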

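The operations that can be performed on a DataFrame are:

  • Creating a DataFrame
  • Accessing rows and columns
  • Selecting a subset of a data frame
  • Editing a data frame
  • Adding extra rows and columns to a data frame
  • Adding new variables to a data frame based on existing ones
  • Deleting rows and columns from a data frame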

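Creating a DataFrame

In the real world, a DataFrame is usually created by loading a dataset from existing storage, such as a SQL database, a CSV file, or an Excel file. A DataFrame can also be created from vectors in R. The following are some of the ways to create a DataFrame:

To create a data frame from vectors, call data.frame() and pass each vector as an argument.

Example: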

# R program to illustrate dataframe
  
# A vector which is a character vector
Name = c( "Amiya" , "Raj" , "Asish" )
  
# A vector which is a character vector
Language = c( "R" , "Python" , "Java" )
  
# A vector which is a numeric vector
Age = c( 22 , 25 , 45 )
  
# To create dataframe use data.frame command and
# then pass each of the vectors 
# we have created as arguments
# to the function data.frame()
df = data.frame(Name, Language, Age)
  
print (df)

Output:

   Name Language Age
1 Amiya        R  22
2   Raj   Python  25
3 Asish     Java  45

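Creating a data frame from a file: a data frame can also be created by importing data from a file. For this, use the function named read.table().

Syntax:

newDF = read.table(file = "Path of the file")

To create a data frame from a CSV file in R, use read.csv():

Syntax:

newDF = read.csv("FileName.csv")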
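
The two import functions can be compared on a small file. The sketch below is only an illustration: the temporary file and its contents are made up so that the example runs on its own, and the header and sep arguments are what read.table() needs to read a comma-separated file with a header row.

```r
# Write a small CSV to a temporary file so the example is self-contained.
# (The file and its contents are invented for illustration.)
csv_path = tempfile(fileext = ".csv")
writeLines(c("Name,Language,Age",
             "Amiya,R,22",
             "Raj,Python,25"), csv_path)

# read.csv() assumes a header row and comma separators by default
fromCsv = read.csv(csv_path)

# read.table() needs both stated explicitly to read the same file
fromTable = read.table(file = csv_path, header = TRUE, sep = ",")

print(fromCsv)
print(all.equal(fromCsv, fromTable))

# Remove the temporary file
unlink(csv_path)
```

In practice the path would point at an existing file rather than a temporary one.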

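Accessing rows and columns

The syntax for accessing rows and columns is given below,

df[val1, val2]

df = data frame object
val1 = rows of the data frame
val2 = columns of the data frame

Here val1 and val2 can be vectors of indices, such as 1:2 or 2:3. If you specify only df[val2], it refers only to the set of columns to access from the data frame.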
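
As a rough illustration of the forms described above, the sketch below applies them to the same three-person data frame used throughout this guide. Selecting columns by name is not covered in the text; it is included here as an assumption about what readers may find convenient.

```r
df = data.frame(
  Name = c("Amiya", "Raj", "Asish"),
  Language = c("R", "Python", "Java"),
  Age = c(22, 25, 45)
)

# Rows 2 to 3 of columns 1 to 2
print(df[2:3, 1:2])

# A single index with no comma selects columns; the result is still a data frame
print(df[2:3])

# Columns can also be chosen by name instead of position
print(df[, c("Name", "Age")])

# Selecting a single column with the comma form returns a plain vector by default
print(df[, 3])
```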

Example: row selection

# R program to illustrate operations
# on a data frame
  
# Creating a dataframe
df = data.frame(
   "Name" = c( "Amiya" , "Raj" , "Asish" ), "Language" = c( "R" , "Python" , "Java" ), "Age" = c( 22 , 25 , 45 )
)
print (df)
  
# Accessing first and second row
cat( "Accessing first and second row\n" )
print (df[ 1 : 2 , ])

Output:

   Name Language Age
1 Amiya        R  22
2   Raj   Python  25
3 Asish     Java  45

Accessing first and second row
   Name Language Age
1 Amiya        R  22
2   Raj   Python  25

Example: column selection

# R program to illustrate operations
# on a data frame
  
# Creating a dataframe
df = data.frame(
   "Name" = c( "Amiya" , "Raj" , "Asish" ), "Language" = c( "R" , "Python" , "Java" ), "Age" = c( 22 , 25 , 45 )
)
print (df)
  
# Accessing first and second column
cat( "Accessing first and second column\n" )
print (df[, 1 : 2 ])

Output:

   Name Language Age
1 Amiya        R  22
2   Raj   Python  25
3 Asish     Java  45

Accessing first and second column
   Name Language
1 Amiya        R
2   Raj   Python
3 Asish     Java

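Selecting a subset of a data frame

A subset of a DataFrame can also be created based on conditions, using the following syntax.

newDF = subset(df, conditions)

df = original data frame
conditions = the conditions that rows must satisfy

Example: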

# R program to illustrate operations
# on a data frame
  
# Creating a dataframe
df = data.frame(
   "Name" = c( "Amiya" , "Raj" , "Asish" ), "Language" = c( "R" , "Python" , "Java" ), "Age" = c( 22 , 25 , 45 )
)
print (df)
  
# Selecting the subset of the data frame
# where Name is equal to Amiya
# OR age is greater than 30
newDf = subset(df, Name == "Amiya" | Age > 30)
  
cat( "After Selecting the subset of the data frame\n" )
print (newDf)

Output:

   Name Language Age
1 Amiya        R  22
2   Raj   Python  25
3 Asish     Java  45

After Selecting the subset of the data frame
   Name Language Age
1 Amiya        R  22
3 Asish     Java  45
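
The same selection can also be written with a logical vector as the row index. This is a sketch of an alternative to subset(), not a replacement for it; the variable name keep is chosen here for illustration.

```r
df = data.frame(
  Name = c("Amiya", "Raj", "Asish"),
  Language = c("R", "Python", "Java"),
  Age = c(22, 25, 45)
)

# Build the condition explicitly: one TRUE/FALSE value per row
keep = df$Name == "Amiya" | df$Age > 30
print(keep)

# Use it as the row index; leaving the column index empty keeps all columns
newDf = df[keep, ]
print(newDf)
```

Writing the condition out this way makes it easy to inspect which rows match before the subset is taken.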

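Editing a data frame

In R, a DataFrame can be edited in two ways:

Editing a data frame by direct assignment: much like a list in R, a data frame can be edited by assigning directly to its elements.

Example: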

# R program to illustrate operation on a data frame
  
# Creating a dataframe
df = data.frame(
   "Name" = c( "Amiya" , "Raj" , "Asish" ), "Language" = c( "R" , "Python" , "Java" ), "Age" = c( 22 , 25 , 45 )
)
cat( "Before editing the dataframe\n" )
print (df)
  
# Editing dataframes by direct assignments
# [[3]] accesses the top-level component,
# the Age column in this case
# [[3]][3] accesses an element inside that component,
# the Age of Asish in this case
df[[ 3 ]][ 3 ] = 30
  
cat( "After edited the dataframe\n" )
print (df)

Output:

Before editing the dataframe
   Name Language Age
1 Amiya        R  22
2   Raj   Python  25
3 Asish     Java  45

After edited the dataframe
   Name Language Age
1 Amiya        R  22
2   Raj   Python  25
3 Asish     Java  30
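
Positional indices such as [[3]][3] are easy to get wrong once columns are reordered. The sketch below shows the same kind of direct assignment with the column addressed by name and the row located by a condition. The second edit, which changes Raj's age to 26, is a made-up value for illustration only.

```r
df = data.frame(
  Name = c("Amiya", "Raj", "Asish"),
  Language = c("R", "Python", "Java"),
  Age = c(22, 25, 45)
)

# Same edit as in the example above, with the column addressed by name
# and the row found by a condition instead of by position
df$Age[df$Name == "Asish"] = 30

# Equivalent form using [row, column] indexing
# (26 is a hypothetical new value, not taken from the guide)
df[df$Name == "Raj", "Age"] = 26

print(df)
```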

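Editing a data frame using the edit() command:

Follow the steps below to edit a DataFrame:

Step 1: Create a data frame instance. Here an instance named "myTable" is created with the data.frame() command, which produces an empty data frame.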

myTable = data.frame()

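Step 2: Next, launch the viewer using the edit() function. Note that the "myTable" data frame is assigned back to the "myTable" object, so that the changes made in the editor are saved to the original object.

myTable = edit(myTable)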

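Running the command above opens a spreadsheet-like data editor window.

Step 3: Fill in the table with the small data set used in this guide.

Note that a variable name is changed by clicking on it and typing the new name. A variable can also be set to numeric or character. Once the data frame holds the values shown in step 4, close the editor. The changes are saved automatically.

Step 4: Check the resulting data frame by printing it.

> myTable

   Name Language Age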
1 Amiya        R  22
2   Raj   Python  25
3 Asish     Java  45

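Adding rows and columns to a data frame

Adding extra rows: extra rows can be added with the rbind() command. The syntax is shown below,

newDF = rbind(df, the entries for the new row you have to add)

df = original data frame

Take care with the entries for the new row when using rbind(): the data type of each column entry should match the data type of the rows already present.

Example: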

# R program to illustrate operation on a data frame
  
# Creating a dataframe
df = data.frame(
   "Name" = c( "Amiya" , "Raj" , "Asish" ), "Language" = c( "R" , "Python" , "Java" ), "Age" = c( 22 , 25 , 45 )
)
cat( "Before adding row\n" )
print (df)
  
# Add a new row using rbind()
newDf = rbind(df, data.frame(Name = "Sandeep" , Language = "C" , Age = 23
                            ))
cat( "After Added a row\n" )
print (newDf)

Output:

Before adding row
   Name Language Age
1 Amiya        R  22
2   Raj   Python  25
3 Asish     Java  45

After Added a row
     Name Language Age
1   Amiya        R  22
2     Raj   Python  25
3   Asish     Java  45
4 Sandeep        C  23
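
The data type caveat for rbind() can be seen with a deliberately wrong row. In the sketch below, Age is supplied as text by mistake; the name badRow and the stringsAsFactors = FALSE setting (added so the result is the same across R versions) are choices made for this illustration.

```r
df = data.frame(
  Name = c("Amiya", "Raj", "Asish"),
  Language = c("R", "Python", "Java"),
  Age = c(22, 25, 45),
  stringsAsFactors = FALSE
)

# Age is given as text here by mistake
badRow = data.frame(Name = "Sandeep", Language = "C", Age = "23",
                    stringsAsFactors = FALSE)
mixed = rbind(df, badRow)

# The whole Age column is converted to character, not just the new entry
print(class(df$Age))
print(class(mixed$Age))
```

No error is raised in this case, so checking column classes after rbind() is a cheap safeguard.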

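Adding extra columns: extra columns can be added with the cbind() command. The syntax is shown below,

newDF = cbind(df, the entries for the new column you have to add)

df = original data frame

Example: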

# R program to illustrate operation on a data frame
  
# Creating a dataframe
df = data.frame(
   "Name" = c( "Amiya" , "Raj" , "Asish" ), "Language" = c( "R" , "Python" , "Java" ), "Age" = c( 22 , 25 , 45 )
)
cat( "Before adding column\n" )
print (df)
  
# Add a new column using cbind()
newDf = cbind(df, Rank = c( 3 , 5 , 1 ))
  
cat( "After Added a column\n" )
print (newDf)

Output:

Before adding column
   Name Language Age
1 Amiya        R  22
2   Raj   Python  25
3 Asish     Java  45

After Added a column
   Name Language Age Rank
1 Amiya        R  22    3
2   Raj   Python  25    5
3 Asish     Java  45    1
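
A column can also be added by assigning to a new name with the $ operator. This is a sketch of an alternative to cbind() that the text above does not cover; unlike cbind(), it modifies df itself rather than returning a new data frame.

```r
df = data.frame(
  Name = c("Amiya", "Raj", "Asish"),
  Language = c("R", "Python", "Java"),
  Age = c(22, 25, 45)
)

# Assigning to a column name that does not exist yet creates the column
df$Rank = c(3, 5, 1)
print(df)

# The new column needs one value per row; nrow() gives that count
print(nrow(df))
```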

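Adding a new variable to a DataFrame

In R, a new variable can be added to a data frame based on existing variables. To do this, first load the dplyr package with the library() command, then call the mutate() function, which adds an extra variable column based on existing columns.

Syntax:

library(dplyr)
newDF = mutate(df, new_var = [existing_var])

df = original data frame
new_var = name of the new variable
existing_var = the modification to apply (for example, taking the log of a value or multiplying it by 10)

Example: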

# R program to illustrate operation on a data frame
  
# Importing the dplyr library
library(dplyr)
  
# Creating a dataframe
df = data.frame(
   "Name" = c( "Amiya" , "Raj" , "Asish" ), "Language" = c( "R" , "Python" , "Java" ), "Age" = c( 22 , 25 , 45 )
)
cat( "Original Dataframe\n" )
print (df)
  
# Creating an extra variable column
# "log_Age" which is log of variable column "Age"
# Using mutate() command
newDf = mutate(df, log_Age = log(Age))
  
cat( "After creating extra variable column\n" )
print (newDf)

Output:

Original Dataframe
   Name Language Age
1 Amiya        R  22
2   Raj   Python  25
3 Asish     Java  45

After creating extra variable column
   Name Language Age  log_Age
1 Amiya        R  22 3.091042
2   Raj   Python  25 3.218876
3 Asish     Java  45 3.806662
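
If dplyr is not installed, a similar result can be obtained with base R's transform(). The sketch below is an alternative to the mutate() call above, not the method this guide describes; the Age_x10 column is a made-up example of the "multiply by 10" modification.

```r
df = data.frame(
  Name = c("Amiya", "Raj", "Asish"),
  Language = c("R", "Python", "Java"),
  Age = c(22, 25, 45)
)

# transform() evaluates each expression using the columns of df,
# so Age can be referred to directly, as in mutate()
newDf = transform(df, log_Age = log(Age), Age_x10 = Age * 10)

print(newDf)
```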

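Deleting rows and columns from a data frame

To delete a row or a column, first access that row or column, then put a negative sign before its index. This indicates that the row or column is to be dropped.

Syntax:

newDF = df[-rowNo, -colNo]

df = original data frame

Example: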

# R program to illustrate operation on a data frame
  
# Creating a dataframe
df = data.frame(
   "Name" = c( "Amiya" , "Raj" , "Asish" ), "Language" = c( "R" , "Python" , "Java" ), "Age" = c( 22 , 25 , 45 )
)
cat( "Before deleting the 3rd row and 2nd column\n" )
print (df)
  
# delete the third row and the second column
newDF = df[ - 3 , - 2 ]
  
cat( "After Deleted the 3rd row and 2nd column\n" )
print (newDF)

Output:

Before deleting the 3rd row and 2nd column
   Name Language Age
1 Amiya        R  22
2   Raj   Python  25
3 Asish     Java  45
After Deleted the 3rd row and 2nd column
   Name Age
1 Amiya  22
2   Raj  25
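
Negative indices depend on knowing the positions of the rows and columns. The sketch below shows one way to drop the same row and column by value and by name instead, and what happens when only a single column is left. The name onlyNames is chosen here for illustration.

```r
df = data.frame(
  Name = c("Amiya", "Raj", "Asish"),
  Language = c("R", "Python", "Java"),
  Age = c(22, 25, 45)
)

# Drop the row for Asish and the Language column without counting positions
newDF = df[df$Name != "Asish", names(df) != "Language"]
print(newDF)

# When only one column remains, the result collapses to a vector
# unless drop = FALSE is given
onlyNames = df[-3, -c(2, 3), drop = FALSE]
print(onlyNames)
```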
