parse.unquote_plus TypeError

2022-12-10 16:14 Asked by:

I'm trying to format a file so that it can be inserted into a database. The file is originally compressed and is around 1.3 MB in size. Each line looks something like this:

398,%7EAnoniem+001%7E,543,480,7525010,1775,0

This is the code that parses the file:

Village = gzip.open(Root+'\\data'+'\\' +str(Newest_Date[0])+'\\' +str(Newest_Date[1])+'\\' +str(Newest_Date[2])\
               +'\\'+str(Newest_Date[3])+' village.gz');
Village_Parsed = str
for line in Village:
    Village_Parsed = Village_Parsed + urllib.parse.unquote_plus(line);
print(Village.readline());

When I run the program I get this error:

Village_Parsed = Village_Parsed + urllib.parse.unquote_plus(line);
File "C:\Python31\lib\urllib\parse.py", line 404, in unquote_plus
    string = string.replace('+', ' ')
TypeError: expected an object with the buffer interface

Any idea what is wrong here? Thanks in advance for any help :)

PROBLEM 1 is that urllib.parse.unquote_plus doesn't like the line that you have fed it. The message should be "Please supply a str object" :-) I suggest that you fix problem 2 below, and insert:

print('line', type(line), repr(line))

immediately after your for statement so that you can see what you are getting in line.

You will find that iterating over the gzip file yields bytes objects:

>>> [line for line in gzip.open('test.gz')]
[b'nudge nudge\n', b'wink wink\n']

Passing a mode of 'r' makes no difference, because gzip.open treats 'r' as binary mode:

>>> [line for line in gzip.open('test.gz', 'r')]
[b'nudge nudge\n', b'wink wink\n']

I suggest that instead of passing line to the parsing routine you pass line.decode('UTF-8') ... or whatever encoding was used when the gz file was written.
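A minimal sketch of that decoding step. It uses an in-memory gzip stream as a stand-in for the real file, with the sample line from the question; the UTF-8 encoding is an assumption, so substitute the encoding your file was actually written with:

```python
import gzip
import io
import urllib.parse

# Build a small gzip stream in memory as a stand-in for the real 'village.gz' file.
raw = b"398,%7EAnoniem+001%7E,543,480,7525010,1775,0\n"
buffer = io.BytesIO(gzip.compress(raw))

parsed_lines = []
with gzip.open(buffer) as village:
    for line in village:
        # Each line is a bytes object, so decode it before unquoting.
        # 'utf-8' is an assumption: use the encoding the file was written with.
        text = line.decode("utf-8")
        parsed_lines.append(urllib.parse.unquote_plus(text))

print(parsed_lines)
```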

PROBLEM 2 is in this line:

Village_Parsed = str

str is a type. You need an empty str object. To get that, you could call the type i.e. str() which is formally correct but impractical/unusual/scoffable/weird when compared to using a string constant '' ... so do this:

Village_Parsed = ''
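To see the difference between the type and an empty string, here is a small sketch (the variable names are illustrative only):

```python
# str is the type itself, so concatenating text onto it fails.
raised = False
try:
    broken = str + "abc"
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# An empty string literal is the usual starting value for an accumulator.
village_parsed = ""
village_parsed = village_parsed + "abc"
print(village_parsed)
```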

You also have PROBLEM 3: your last statement is trying to read the gz file after EOF.
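A quick sketch of what happens there, again using an in-memory gzip stream as a stand-in for the real file: once the for loop has consumed every line, readline() has nothing left to return and gives back an empty bytes object.

```python
import gzip
import io

# In-memory stand-in for a gzip file containing two lines.
buffer = io.BytesIO(gzip.compress(b"nudge nudge\nwink wink\n"))

with gzip.open(buffer) as village:
    lines = [line for line in village]  # the loop consumes the whole file
    after_eof = village.readline()      # nothing is left to read

print(lines)
print(repr(after_eof))
```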

import gzip, os, urllib.parse

archive_relpath = os.sep.join(map(str, Newest_Date[:4])) + ' village.gz'  
archive_path = os.path.join(Root, 'data', archive_relpath)

with gzip.open(archive_path) as Village:
    Village_Parsed = ''.join(urllib.parse.unquote_plus(line.decode('ascii'))
                             for line in Village)
    print(Village_Parsed)

Output:

398,~Anoniem 001~,543,480,7525010,1775,0

NOTE: RFC 3986 - Uniform Resource Identifier (URI): Generic Syntax says:

This specification does not mandate any particular character encoding for mapping between URI characters and the octets used to store or transmit those characters. When a URI appears in a protocol element, the character encoding is defined by that protocol; without such a definition, a URI is assumed to be in the same character encoding as the surrounding text.

Therefore 'ascii' in the line.decode('ascii') fragment should be replaced by whatever character encoding you've used to encode your text.
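The encoding matters for the percent-escapes as well. urllib.parse.unquote_plus takes an encoding argument (default 'utf-8') that controls how the escaped octets are turned back into characters. A small sketch with a made-up sample string, not taken from the question's data:

```python
import urllib.parse

# Hypothetical sample: 'é' percent-encoded as its two UTF-8 bytes.
quoted = "caf%C3%A9+au+lait"

# The default encoding is UTF-8, which matches how this sample was encoded.
as_utf8 = urllib.parse.unquote_plus(quoted)

# Decoding the same escapes as Latin-1 produces mojibake instead.
as_latin1 = urllib.parse.unquote_plus(quoted, encoding="latin-1")

print(as_utf8)
print(as_latin1)
```

If the decoded names look garbled, trying a different encoding here is a reasonable first check.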
