PySpark createDataFrame TypeError: StructType can not accept object 'id' in type <class 'str'>
An API call returns a dict response similar to the one below:
{'Account': {'id': 123, 'externalIdentifier': None, 'name': 'test acct', 'accountNumber': None, 'Rep': None, 'organizationId': 123, 'streetAddress': '123 Main Road', 'streetAddressCity': 'Town City', 'streetAddressState': 'Texas', 'streetAddressZipCode': '76123', 'contact': [{'id': 10001, 'name': 'Test test', 'extID': '9999999999'}]}}
I am trying to build a DataFrame from the Account record that is returned, but I keep getting TypeError: StructType can not accept object 'id' in type <class 'str'>. I have tried other approaches, including adding .items(), mapping with a lambda, and converting types, but I always end up with the same error.
from pyspark.sql.types import StructType, StructField, StringType
account_schema = StructType([
    StructField('id', StringType(), True),
    StructField('externalIdentifier', StringType(), True),
    StructField('name', StringType(), True),
    StructField('Account_number', StringType(), True),
    StructField('Rep', StructType([
        StructField('firstName', StringType(), True),
        StructField('lastName', StringType(), True),
        StructField('email', StringType(), True),
        StructField('id', StringType(), True),
    ])),
    StructField('streetAddress', StringType(), True),
    StructField('streetAddressCity', StringType(), True),
    StructField('streetAddressState', StringType(), True),
    StructField('streetAddressZipCode', StringType(), True),
])
# Raises the TypeError: the dict itself is passed instead of a list of records
df = spark.createDataFrame(account_response['Account'], schema=account_schema)
Any direction would be appreciated.
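The reason is that the data argument should be an RDD or an iterable of rows, such as a list or tuple, as per the official documentation. A dict is also iterable, but iterating over it yields its keys, so Spark tries to treat each key (starting with the string 'id') as a row, and a str cannot be converted to the StructType schema. That is exactly what the error message reports.
If you enclose the record in square brackets, you get a Spark DataFrame with one record: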
spark.createDataFrame(data=[account_response['Account']], schema=account_schema)
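A quick way to see why the unwrapped dict fails is to look at what iterating over it produces. The plain-Python sketch below (no Spark required; `account` is a trimmed, illustrative copy of the sample record) shows that a dict yields its keys, so the first "row" Spark would receive is the string 'id', which matches the error message. Wrapping the dict in a list yields a single element, the record itself.

```python
# Trimmed, illustrative copy of the sample 'Account' record from the question.
account = {'id': 123, 'name': 'test acct', 'streetAddress': '123 Main Road'}

# Iterating a dict yields its keys, so passing the dict itself as `data`
# hands Spark one string per key instead of one record.
rows_from_dict = list(account)
print(rows_from_dict)  # ['id', 'name', 'streetAddress']

# Wrapping the dict in a list gives exactly one element: the whole record.
rows_from_list = list([account])
print(len(rows_from_list))  # 1
print(rows_from_list[0] is account)  # True
```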
Full working example:
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType
spark = SparkSession.builder.getOrCreate()
account_response = {'Account': {
    'id': 123, 'externalIdentifier': None, 'name': 'test acct', 'accountNumber': None,
    'Rep': None, 'organizationId': 123, 'streetAddress': '123 Main Road',
    'streetAddressCity': 'Town City', 'streetAddressState': 'Texas', 'streetAddressZipCode': '76123',
    'contact': [{'id': 10001, 'name': 'Test test', 'extID': '9999999999'}]}}
account_schema = StructType([
    StructField('id', StringType(), True),
    StructField('externalIdentifier', StringType(), True),
    StructField('name', StringType(), True),
    StructField('Account_number', StringType(), True),
    StructField('Rep', StructType([
        StructField('firstName', StringType(), True),
        StructField('lastName', StringType(), True),
        StructField('email', StringType(), True),
        StructField('id', StringType(), True),
    ])),
    StructField('streetAddress', StringType(), True),
    StructField('streetAddressCity', StringType(), True),
    StructField('streetAddressState', StringType(), True),
    StructField('streetAddressZipCode', StringType(), True),
])
# Wrap the single record in a list so Spark sees one row, not one row per key
df = spark.createDataFrame(data=[account_response['Account']], schema=account_schema)
df.show(truncate=False)
Output:
+---+------------------+---------+--------------+----+-------------+-----------------+------------------+--------------------+
|id |externalIdentifier|name     |Account_number|Rep |streetAddress|streetAddressCity|streetAddressState|streetAddressZipCode|
+---+------------------+---------+--------------+----+-------------+-----------------+------------------+--------------------+
|123|null              |test acct|null          |null|123 Main Road|Town City        |Texas             |76123               |
+---+------------------+---------+--------------+----+-------------+-----------------+------------------+--------------------+
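A related detail, assuming the usual PySpark behaviour for dict rows: each schema field is looked up by name in the record, so a nullable field whose name is absent comes back as null instead of raising, and record keys the schema does not mention are dropped. In the schema above, `Account_number` does not match the response key `accountNumber`, so that column would be null even if the API returned a value. The helper below, `compare_schema_to_record` (a hypothetical name, plain Python), is a minimal sketch that lists both kinds of mismatch; the field list is copied by hand from `account_schema`.

```python
from typing import Any, Dict, List, Tuple


def compare_schema_to_record(
    field_names: List[str], record: Dict[str, Any]
) -> Tuple[List[str], List[str]]:
    """Return (schema fields absent from the record, record keys absent from the schema).

    Hypothetical helper for illustration only; it checks top-level names, not nested structs.
    """
    missing = [name for name in field_names if name not in record]
    ignored = [key for key in record if key not in field_names]
    return missing, ignored


# Top-level field names copied by hand from account_schema in the example above.
schema_fields = [
    'id', 'externalIdentifier', 'name', 'Account_number', 'Rep',
    'streetAddress', 'streetAddressCity', 'streetAddressState', 'streetAddressZipCode',
]

# The 'Account' record from the sample API response.
record = {'id': 123, 'externalIdentifier': None, 'name': 'test acct', 'accountNumber': None,
          'Rep': None, 'organizationId': 123, 'streetAddress': '123 Main Road',
          'streetAddressCity': 'Town City', 'streetAddressState': 'Texas',
          'streetAddressZipCode': '76123',
          'contact': [{'id': 10001, 'name': 'Test test', 'extID': '9999999999'}]}

missing, ignored = compare_schema_to_record(schema_fields, record)
print(missing)  # ['Account_number']
print(ignored)  # ['accountNumber', 'organizationId', 'contact']
```

With PySpark available, `account_schema.fieldNames()` returns the top-level field names, so it could replace the hand-copied list. Fields nested inside `Rep` are not checked by this sketch.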