Data Analysis and Visualization with Python | S2

1. Storing a DataFrame in CSV format:

```python
import pandas as pd

# Create three Series: s1, s2, s3
s1 = pd.Series([0, 4, 8])
s2 = pd.Series([1, 5, 9])
s3 = pd.Series([2, 6, 10])

# Build a DataFrame in which each Series becomes one row
dframe = pd.DataFrame([s1, s2, s3])

# Assign column names (the name 'Geeks' is used twice, which pandas allows)
dframe.columns = ['Geeks', 'For', 'Geeks']

# Write the data to CSV files, first without and then with the index column
dframe.to_csv('srcmini.csv', index=False)
dframe.to_csv('srcmini1.csv', index=True)
```

Output files: srcmini.csv (written without the index column) and srcmini1.csv (written with the index column).
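To make the effect of the index parameter visible without opening the files, the following is a minimal sketch that asks to_csv for the CSV text directly (calling it without a path returns a string) and then reads that text back. The column names a, b, c are hypothetical and chosen only so that every name is unique; the data matches the example above.

```python
import io

import pandas as pd

# Same values as the example above, with unique (hypothetical) column names
dframe = pd.DataFrame([[0, 4, 8], [1, 5, 9], [2, 6, 10]],
                      columns=['a', 'b', 'c'])

# Without a path argument, to_csv returns the CSV text instead of writing a file
without_index = dframe.to_csv(index=False)
with_index = dframe.to_csv(index=True)

print(without_index)  # header line is "a,b,c"
print(with_index)     # header line is ",a,b,c": the index is an unnamed first column

# Reading the index=False text back gives the original columns directly
restored = pd.read_csv(io.StringIO(without_index))

# For the index=True text, index_col=0 turns the first column back into the index
restored_with_index = pd.read_csv(io.StringIO(with_index), index_col=0)

print(restored)
print(restored_with_index)
```

If the file is only read back by pandas, either form works; index=False is usually the simpler choice when the index is just the default 0, 1, 2, ... numbering.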

2. Handling missing data

```python
import numpy as np
import pandas as pd

# Create a DataFrame that contains missing values (NaN)
dframe = pd.DataFrame({'Geeks': [23, 24, 22],
                       'For': [10, 12, np.nan],
                       'geeks': [0, np.nan, np.nan]},
                      columns=['Geeks', 'For', 'geeks'])

# Drop every row that contains at least one NaN value.
# If axis is not given, it defaults to 0, i.e. rows.
print(dframe.dropna())

# With axis=1, drop every column that contains at least one NaN value.
# dropna() returns a new DataFrame, so dframe itself is unchanged
# and both calls operate on the original data.
print(dframe.dropna(axis=1))
```

Output with axis=0: rows containing NaN are dropped.

Output with axis=1: columns containing NaN are dropped.
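Dropping every row or column that has any NaN can discard a lot of data, so it may help to count the missing values first and then use the more selective options of dropna. The sketch below uses the same three-column data and the thresh and subset parameters; the variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

dframe = pd.DataFrame({'Geeks': [23, 24, 22],
                       'For': [10, 12, np.nan],
                       'geeks': [0, np.nan, np.nan]})

# Count missing values per column before deciding what to drop
missing_per_column = dframe.isna().sum()
print(missing_per_column)

# Default (axis=0): drop rows with at least one NaN; only row 0 is complete
rows_any = dframe.dropna()

# axis=1: drop columns with at least one NaN; only 'Geeks' is complete
cols_any = dframe.dropna(axis=1)

# thresh=2: keep rows that have at least two non-NaN values
rows_thresh = dframe.dropna(thresh=2)

# subset: only the 'For' column decides which rows are dropped
rows_subset = dframe.dropna(subset=['For'])

print(rows_any)
print(cols_any)
print(rows_thresh)
print(rows_subset)
```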

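```python
import numpy as np
import pandas as pd

# Create a DataFrame that contains missing values (NaN)
dframe = pd.DataFrame({'Geeks': [23, 24, 22],
                       'For': [10, 12, np.nan],
                       'geeks': [0, np.nan, np.nan]},
                      columns=['Geeks', 'For', 'geeks'])

# Use fillna on the complete DataFrame.
# dframe.mean() holds one mean per column, so the NaN values
# of each column are replaced by that column's own mean.
print(dframe.fillna(value=dframe.mean()))

# Fill the NaN values of a single column only.
# The result is assigned back to the column instead of calling
# fillna(inplace=True) on dframe['For'], which recent pandas
# versions warn about because it may act on a temporary copy.
one_column = dframe.copy()
one_column['For'] = one_column['For'].fillna(one_column['For'].mean())
print(one_column)
```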
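The mean is only one possible fill value. As a rough sketch of the alternatives, the example below works through the mean fill by hand and then shows two other strategies that fillna and ffill support: a per-column constant passed as a dict, and forward filling. The constants 0 and -1 are arbitrary values picked for illustration.

```python
import numpy as np
import pandas as pd

dframe = pd.DataFrame({'Geeks': [23, 24, 22],
                       'For': [10, 12, np.nan],
                       'geeks': [0, np.nan, np.nan]})

# mean() skips NaN by default: 'For' -> (10 + 12) / 2 = 11.0, 'geeks' -> 0.0
column_means = dframe.mean()

# Passing the Series of means fills each column with its own mean
mean_filled = dframe.fillna(column_means)

# A dict gives a different constant per column (arbitrary values here)
constant_filled = dframe.fillna({'For': 0, 'geeks': -1})

# ffill propagates the last valid value down each column
forward_filled = dframe.ffill()

print(mean_filled)
print(constant_filled)
print(forward_filled)
```

Which strategy is appropriate depends on the data: a mean fill keeps the column average unchanged, while forward filling tends to suit ordered data such as time series.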

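3. The groupby method (aggregation):

The groupby method lets us group data by any row or column, so that we can then apply aggregate functions to analyze each group. The grouping key can be a mapper (a dict or a key function, which is applied to the labels to decide the group, with the result returned as a Series) or a list of columns.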

```python
import pandas as pd

# Create a DataFrame
dframe = pd.DataFrame({'Geeks': [23, 24, 22, 22, 23, 24],
                       'For': [10, 12, 13, 14, 15, 16],
                       'geeks': [122, 142, 112, 122, 114, 112]},
                      columns=['Geeks', 'For', 'geeks'])

# Apply groupby with the max aggregate function:
# for every distinct value of column "Geeks", find the
# maximum value of column "For" and of column "geeks".
print(dframe.groupby(['Geeks']).max())
```
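The description of groupby mentions mappers (a dict or a key function) as well as columns. The sketch below puts the three forms side by side on the same data, and adds agg to compute several aggregates in one call. The group labels 'first' and 'second' and the even/odd key function are made up for illustration; with a mapper, pandas applies it to the index labels of the DataFrame.

```python
import pandas as pd

dframe = pd.DataFrame({'Geeks': [23, 24, 22, 22, 23, 24],
                       'For': [10, 12, 13, 14, 15, 16],
                       'geeks': [122, 142, 112, 122, 114, 112]})

# Group by a column: one output row per distinct value of 'Geeks'
by_column = dframe.groupby('Geeks').max()

# Group by a dict mapper: each index label is mapped to a group name
mapper = {0: 'first', 1: 'first', 2: 'first',
          3: 'second', 4: 'second', 5: 'second'}
by_dict = dframe.groupby(mapper)['For'].sum()

# Group by a key function: it is called on each index label
# (here: even labels go to group 0, odd labels to group 1)
by_function = dframe.groupby(lambda label: label % 2)['For'].sum()

# Several aggregates at once for a single column
summary = dframe.groupby('Geeks')['geeks'].agg(['min', 'max', 'mean'])

print(by_column)
print(by_dict)
print(by_function)
print(summary)
```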