Copying a Pandas DataFrame

This tutorial shows how to copy a DataFrame object with the DataFrame.copy() method.

import pandas as pd
items_df = pd.DataFrame({
    'Id': [302, 504, 708],
    'Cost': ["300", "400", "350"],
})
print(items_df)

Output:

    Id Cost
0  302  300
1  504  400
2  708  350

We will use the example above to demonstrate the DataFrame.copy() method in Pandas.

pandas.DataFrame.copy() Method Syntax

DataFrame.copy(deep=True)

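It returns a copy of the DataFrame. deep defaults to True, which means that changes made to the copy are not reflected in the original DataFrame. With deep=False, the copy shares its data with the original, so in pandas versions without Copy-on-Write, changes made to the copy are also reflected in the original DataFrame. With Copy-on-Write enabled (optional in pandas 2.x, and the default from pandas 3.0), a shallow copy no longer propagates changes in either direction.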
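To make the difference between the two modes concrete, the sketch below checks whether each copy shares its underlying buffer with the original. It reuses items_df from the example above; the variable names deep, shallow, deep_shares, and shallow_shares are illustrative, and np.shares_memory is a NumPy helper rather than part of the pandas API. No expected output is shown for the shallow copy, because the result can depend on the pandas version.

```python
import numpy as np
import pandas as pd

items_df = pd.DataFrame({
    "Id": [302, 504, 708],
    "Cost": ["300", "400", "350"],
})

deep = items_df.copy()               # deep=True is the default
shallow = items_df.copy(deep=False)  # new object, data not duplicated up front

# Compare the underlying buffers of the numeric "Id" column.
original_ids = items_df["Id"].to_numpy()
deep_shares = np.shares_memory(original_ids, deep["Id"].to_numpy())
shallow_shares = np.shares_memory(original_ids, shallow["Id"].to_numpy())

print("deep copy shares memory with original:", deep_shares)
print("shallow copy shares memory with original:", shallow_shares)
```

A deep copy always owns its own buffer, so the first check reports False.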

Copying a Pandas DataFrame With the pandas.DataFrame.copy() Method

import pandas as pd
import numpy as np
items_df = pd.DataFrame({
    'Id': [302, 504, 708],
    'Cost': ["300", "400", "350"],
})
deep_copy = items_df.copy()
print("Original DataFrame before changing value in copy DataFrame:")
print(items_df, "\n")
print("Copy DataFrame before changing value in copy DataFrame:")
print(deep_copy, "\n")
deep_copy.loc[0, "Cost"] = np.nan
print("Original DataFrame after changing value in copy DataFrame:")
print(items_df, "\n")
print("Copy DataFrame after changing value in copy DataFrame:")
print(deep_copy, "\n")

Output:

Original DataFrame before changing value in copy DataFrame:
    Id Cost
0  302  300
1  504  400
2  708  350
Copy DataFrame before changing value in copy DataFrame:
    Id Cost
0  302  300
1  504  400
2  708  350
Original DataFrame after changing value in copy DataFrame:
    Id Cost
0  302  300
1  504  400
2  708  350
Copy DataFrame after changing value in copy DataFrame:
    Id Cost
0  302  NaN
1  504  400
2  708  350

This creates a copy of the DataFrame items_df named deep_copy. Changing a value in deep_copy leaves the original items_df untouched: we set the Cost value in the first row of deep_copy to NaN, and items_df stays the same.
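One caveat, noted in the pandas documentation, is that a deep copy duplicates the data but does not recursively copy Python objects stored in it; only the references are copied. The sketch below illustrates this with a hypothetical Tags column holding Python lists, which is not part of the original example.

```python
import pandas as pd

# Hypothetical "Tags" column holding Python lists, added for illustration only.
tagged_df = pd.DataFrame({
    "Id": [302, 504],
    "Tags": [["new"], ["sale"]],
})
deep_copy = tagged_df.copy()

# Replacing a cell in the copy does not touch the original.
deep_copy.loc[0, "Id"] = 999

# Mutating a list in place is visible through both frames,
# because both cells reference the same list object.
tagged_df.loc[1, "Tags"].append("clearance")

same_list = tagged_df.loc[1, "Tags"] is deep_copy.loc[1, "Tags"]
print(tagged_df.loc[0, "Id"])    # value in the original frame
print(deep_copy.loc[1, "Tags"])  # list seen through the copy
print(same_list)
```

If the nested objects also need to be independent, one option is to apply Python's copy.deepcopy to the individual values.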

Copying a Pandas DataFrame Column by Assigning It to a Variable

import pandas as pd
import numpy as np
items_df = pd.DataFrame({
    'Id': [302, 504, 708],
    'Cost': ["300", "400", "350"],
})
copy_cost = items_df["Cost"]
print("Cost column of Original DataFrame before changing value in copy DataFrame:")
print(items_df, "\n")
print("Cost column of Copied DataFrame before changing value in copy DataFrame:")
print(copy_cost, "\n")
copy_cost[0] = np.nan
print("Cost column of Original DataFrame after changing value in copy DataFrame:")
print(items_df["Cost"], "\n")
print("Cost column of Copied DataFrame after changing value in copy DataFrame:")
print(copy_cost, "\n")

Output:

Cost column of Original DataFrame before changing value in copy DataFrame:
    Id Cost
0  302  300
1  504  400
2  708  350
Cost column of Copied DataFrame before changing value in copy DataFrame:
0    300
1    400
2    350
Name: Cost, dtype: object
Cost column of Original DataFrame after changing value in copy DataFrame:
0    NaN
1    400
2    350
Name: Cost, dtype: object
Cost column of Copied DataFrame after changing value in copy DataFrame:
0    NaN
1    400
2    350
Name: Cost, dtype: object

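This assigns the Cost column of the DataFrame items_df to copy_cost. Plain assignment does not create an independent copy: copy_cost refers to the same data as items_df["Cost"], so setting its first value to NaN also changes items_df, as the output shows. This holds for pandas versions without Copy-on-Write; with Copy-on-Write enabled, the write to copy_cost leaves items_df unchanged.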
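When an independent copy of a single column is wanted, one option is to call copy() on the selected Series. The following is a minimal sketch using the same items_df; the name cost_copy is illustrative.

```python
import numpy as np
import pandas as pd

items_df = pd.DataFrame({
    "Id": [302, 504, 708],
    "Cost": ["300", "400", "350"],
})

# Series.copy() returns a new Series that owns its own data.
cost_copy = items_df["Cost"].copy()
cost_copy.iloc[0] = np.nan

print("Original Cost column:")
print(items_df["Cost"], "\n")
print("Copied Cost column:")
print(cost_copy, "\n")
```

Because cost_copy owns its data, the first Cost value in items_df remains "300" while the copy holds NaN.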