What is the efficient way to convert a Pandas DataFrame to a PyTorch TensorDataset?
I want to convert this Pandas DataFrame to a TensorDataset
import pandas as pd
import torch
df = pd.DataFrame({'A': [[1, 2, 3], [1, 2, 3], [1, 2, 3]], 'B': [0, 1, 0]})
I figured out I can do it this way without getting an error.
A = torch.tensor(df['A'].values.tolist())
B = torch.tensor(df['B'].values)
dataset = torch.utils.data.TensorDataset(A, B)
However, I get this warning:
UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor.
When I try it this way:
data_numpy = df.to_numpy()
data_tensor = torch.from_numpy(data_numpy)
dataset = torch.utils.data.TensorDataset(data_tensor)
I get the error:
can't convert np.ndarray of type numpy.object_
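A quick check on the same DataFrame suggests why. Column A holds Python lists, so pandas stores it with object dtype, and `df.to_numpy()` then produces an object array, which `torch.from_numpy` cannot convert. A minimal sketch, using only pandas:

```python
import pandas as pd

df = pd.DataFrame({'A': [[1, 2, 3], [1, 2, 3], [1, 2, 3]], 'B': [0, 1, 0]})

# Column A contains lists, so its dtype is the generic "object" dtype.
print(df.dtypes)

# Mixing an object column with an integer column gives an object array.
data_numpy = df.to_numpy()
print(data_numpy.dtype)
```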
So the question is: what is an efficient way to convert a Pandas DataFrame with this structure to a TensorDataset?
Code:
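import pandas as pd
import torch

def get_device() -> torch.device:
    # Use the first GPU if one is available, otherwise fall back to the CPU.
    return torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu")

def df_to_tensor(x: pd.DataFrame) -> torch.Tensor:
    # Works only when every column is numeric (no object-dtype columns).
    return torch.from_numpy(x.values).to(get_device())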
df = pd.DataFrame({"spam": [1, 2, 3, 4], "eggs": [5, 6, 7, 8], "ham": [9, 10, 11, 12]})
tensor = df_to_tensor(df)
print(tensor)
Output:
tensor([[ 1,  5,  9],
        [ 2,  6, 10],
        [ 3,  7, 11],
        [ 4,  8, 12]])
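Note that `df_to_tensor` assumes every column is numeric, so it does not directly handle the DataFrame from the question, where column A holds lists. A minimal sketch for that case follows. It assumes all lists in A have the same length and that 64-bit integers are wanted; the names `features` and `labels` are illustrative only.

```python
import numpy as np
import pandas as pd
import torch
from torch.utils.data import TensorDataset

df = pd.DataFrame({'A': [[1, 2, 3], [1, 2, 3], [1, 2, 3]], 'B': [0, 1, 0]})

# Build one contiguous 2-D array from the list column (assumes equal-length lists).
features = np.array(df['A'].tolist(), dtype=np.int64)   # shape (3, 3)

# Column B is already numeric; copy it into a plain, writable NumPy array.
labels = df['B'].to_numpy(dtype=np.int64, copy=True)    # shape (3,)

# from_numpy shares memory with the arrays, so no slow per-element conversion.
dataset = TensorDataset(torch.from_numpy(features), torch.from_numpy(labels))

x, y = dataset[0]
print(x, y)
```

Converting the list column to a single NumPy array first is the approach the warning in the question suggests, and each column gets its own tensor, so no object-dtype array is ever passed to PyTorch.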