What is an efficient way to convert a pandas DataFrame to a PyTorch TensorDataset?

I want to convert this pandas DataFrame to a TensorDataset:

import pandas as pd
import torch

df = pd.DataFrame({'A': [[1, 2, 3], [1, 2, 3], [1, 2, 3]], 'B': [0, 1, 0]})

I figured out I can do it this way without getting an error.

A = torch.tensor(df['A'].values.tolist())
B = torch.tensor(df['B'].values)
dataset = torch.utils.data.TensorDataset(A, B)

However, I get this warning:

UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor.

When I try it this way:

data_numpy = df.to_numpy()
data_tensor = torch.from_numpy(data_numpy)
dataset = torch.utils.data.TensorDataset(data_tensor)

I get the error:

can't convert np.ndarray of type numpy.object_

So what is an efficient way to convert a pandas DataFrame with this structure to a TensorDataset?
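The error in the second attempt appears to come from the dtype of the array rather than from PyTorch itself. Because column A holds Python lists, pandas stores it as an object column, and df.to_numpy() then returns a single object array that torch.from_numpy cannot convert. A minimal check, reusing the DataFrame from the question:

```python
import pandas as pd

df = pd.DataFrame({'A': [[1, 2, 3], [1, 2, 3], [1, 2, 3]], 'B': [0, 1, 0]})

# Column A holds Python lists, so pandas stores it with the generic object dtype.
print(df.dtypes)

# Combining an object column with an integer column gives one object array,
# which torch.from_numpy refuses to convert.
data_numpy = df.to_numpy()
print(data_numpy.dtype, data_numpy.shape)
```

The array has one cell per DataFrame cell, so each entry of the first column is still a whole list rather than three separate numbers.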


Code:

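import pandas as pd
import torch


def get_device() -> torch.device:
    # Use the first GPU when CUDA is available, otherwise fall back to the CPU.
    return torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu")


def df_to_tensor(x: pd.DataFrame) -> torch.Tensor:
    # Works only when every column is numeric; object columns cannot be converted.
    return torch.from_numpy(x.values).to(get_device())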


df = pd.DataFrame({"spam": [1, 2, 3, 4], "eggs": [5, 6, 7, 8], "ham": [9, 10, 11, 12]})
tensor = df_to_tensor(df)
print(tensor)

Output:

tensor([[ 1,  5,  9],
        [ 2,  6, 10],
        [ 3,  7, 11],
        [ 4,  8, 12]])
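The helper above covers a DataFrame whose columns are all numeric. For the DataFrame in the question, where column A holds lists, one possible adaptation is to turn that column into a single 2-D NumPy array first and convert each column separately. This is only a sketch: it assumes every list in A has the same length, and the names features and labels are illustrative, not from the original post.

```python
import numpy as np
import pandas as pd
import torch
from torch.utils.data import TensorDataset

df = pd.DataFrame({'A': [[1, 2, 3], [1, 2, 3], [1, 2, 3]], 'B': [0, 1, 0]})

# Build one contiguous 2-D array from the column of equal-length lists
# (assumption: all lists in A have the same length).
features = np.array(df['A'].tolist(), dtype=np.int64)

# Copy the label column into its own writable 1-D array.
labels = df['B'].to_numpy(dtype=np.int64, copy=True)

# torch.from_numpy shares memory with the NumPy arrays instead of copying again.
A = torch.from_numpy(features)
B = torch.from_numpy(labels)

dataset = TensorDataset(A, B)
print(len(dataset), A.shape, B.shape)
```

Each item of the dataset is then a pair of one row of A and the matching entry of B. If the tensors are needed on a GPU, they can be moved with .to(get_device()) before building the TensorDataset.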