How can Python detect binary data inside plain text extracted from a database? [closed]
I have to extract some product information from a MySQL database, then construct a SOAP request and send it to a remote server with Python's suds library.
But some of the extracted information is a mix of binary data and text data, such as:
...
Some plain data
...
Content-Type: image/pjpeg
? JFIF H H C
C
P P E !1AQa "q?#2脑$4BCR倯ご?3TVfrs枴贬 - !1A"2QR亼Ba毖q? ? 鮊€
( ?€
( ?€
釱Whf颲[e?喸媼q屧ㄠ蚀厲蹳ZIO痙(r5?-i擯栧剗矹?尴?蝓玁帰XZ鞭#崛攳┸蹵X僦?攅Z@?らM;X藙?N蹮垀s@jQ?Z徸林炑M~?麒]H=颦C胝_р}"?Gixqz坽徸玨?O?Q+谍?w鬪??-囯礥?а|乛聚Zyt>?`[~跲桫?騽D曐縅CmN=?shOU+湫锏竩&6げ?铚扺d)mn?c?X?6RmQ JJ?7繴*v>.捈鉵基d?堌疻熼G肗裪囅w騲癔R?qW鑪陭瞘.C窄贇CkyV瀷1? 柚%W}}?Yz僫芐D嫆1鬊懜赈篽效lq蒟H棯]y|G.;硅憖Ew??栧$?=e菚鏪Rbj?枝}爲釼Z3FE<尒n%C蓎??樋>`I 顝y∥+pP敐慗岻゛\硳诮湣]~??xΔ诩[_?叴b嫜?yz*?=ψk?猝"%Ak?撍滷秠BR?-铈b礖?蟷y[)厌麓4,怈
窧q觏?_ N獛擒F杍q凞画Q襃@镛P$讄k\鏁祘譟㎎*V>W??鵔M嫯q寓y焊閔C杔栽?+鹕瑟qbs:z氱^PJ聣?汜ZU"ス嘔+輔€8楺<夻Uゎ顓瞚?氅豴<]P銨c? +K6]┓gr杺 蝫?VJ能?陭欹殡J倢gS扚?娭酧??gw?膙y矼j折B礕殯
繅捁%撽蛵震挔撲y?3鑪澳N?Ec~涰巽j慭搆锥▓IP?)┤燎鐠懴 €H9瘾F毖l氾+岎6o?殎託炗y尬n??8??黬?4Qbń覦;縢?兟HRONd *紂蚽娖t猦?^?2
庴E$x 譴q箘瘃J檐H筶鷆[?8 ?9颢*髟揤v緜魸擭槧?%msV嬖z瘨摉擀F摫鞍s犮殩H4s咸?S蓉扷濅? V?昋c?u婆SG撙???{臘亞攕曳<\K? D]+#瓃kgw犤?.?惨邔蹓#p(巂s?瘑蜲Q?傻鑟6ce?敟)?9嶔?測誗?yfvp謒NnbmB3齑栘v>RR=拏H'焴j壎e鎨洘?窑??MH单;5?T1倧o)锐认J?QY&7?橥%诤授b?氭\堫轁q)荖no弎閂?添頶5E敌?U瞿??雛柖Q??Ps?冇9'=)J殅朥k%鈌l疆$q}?い$袋蜕~跏綺衄qU玉矰潱v硻e鷵?薭?<爗树q熣?I;ぞ_鬿埗d.握莰俜6渺^貀No-乾R?r\芷<A稙鋆j璲吡累Y错$F梱?镫[猄k\﹋JrRp悇?救
...
end of binary data.
I don't know how this data got into MySQL, but I have to detect this kind of data and replace the binary part with the string EEEEE, otherwise suds will raise an exception.
Can anyone tell me how to test for this type of data?
Thanks in advance.
Mixed text and binary?! Sounds awful...
However, if all the data is in the format you presented in your example (i.e. with a Content-Type declaration), you could do something along the lines of:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
data = '''Some plain data!Content-Type: image/pjpeg ? JFIF H H C C
P P E !1AQa "q?#2脑$4BCR倯ご?3TVfrs枴贬 - 釱W
hf颲[e?喸媼q屧ㄠ蚀厲蹳ZIO痙(r5?-i擯栧剗矹?尴?蝓玁帰XZ鞭#崛攳┸蹵X僦?攅Z@?らM
;X藙?N蹮垀s@jQ?Z徸林炑M~?麒]H=颦C胝_р}"?Gixqz坽徸玨?O?Q+谍?w鬪??'''
tdata = '''This has no binary in it'''
def filter_data(blob):
    # Cut everything from the Content-Type marker onwards
    mixed = blob.find('Content-Type:')
    if mixed != -1:  # -1 ==> not found
        return blob[:mixed] + 'EEEEE'
    return blob

print(filter_data(data))
print(filter_data(tdata))
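On Python 3 the same idea can be applied to `bytes`, which is what a database driver may hand back for a column holding mixed content. This is only a sketch under assumptions: that the value arrives as `bytes`, and that the text before the marker is UTF-8. The names `strip_after_marker` and `placeholder` are illustrative and not from the question.

```python
MARKER = b'Content-Type:'

def strip_after_marker(blob, placeholder='EEEEE'):
    """Keep the text before MARKER and append a placeholder.

    If the marker is absent, the whole blob is decoded as text.
    Undecodable bytes are replaced rather than raising.
    """
    pos = blob.find(MARKER)
    if pos == -1:
        return blob.decode('utf-8', errors='replace')
    return blob[:pos].decode('utf-8', errors='replace') + placeholder

print(strip_after_marker(b'Some plain data\nContent-Type: image/pjpeg\n\xff\xd8\xff\xe0JFIF'))
print(strip_after_marker(b'This has no binary in it'))
```

Searching on the raw bytes before decoding avoids having to guess an encoding for the binary part at all.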
If the binary data is not preceded by a Content-Type declaration, I'm not sure there is a 100% reliable way to distinguish text from binary (a byte of the binary could decode to some sensible character), but you could at least improve the situation by filtering against a pool of valid characters.
For example, assuming that all valid text is alphanumeric (A-Z, a-z and 0-9) plus the space character:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
data = '''Some plain data脑$4BCR倯ご?3TVfrs枴贬 - 釱W hf颲[e?喸媼q屧ㄠ蚀厲蹳
ZIO痙(r5?-i擯栧剗矹?尴?蝓玁帰XZ鞭#崛攳┸蹵X僦?攅Z@?らM;X藙?N蹮垀s@jQ?Z徸
林炑M~?麒]H=颦C胝_р}"?Gixqz坽徸玨?O?Q+谍?w鬪??'''
tdata = '''This has no binary in it'''
bdata = '''炑M~?麒]H=颦C胝'''
pool = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 '
def filter_data(blob):
    # Keep the leading run of characters that are all in the pool
    last_good_one = None
    for i, c in enumerate(blob):
        if c in pool:
            last_good_one = i
        else:
            break
    if last_good_one is None:
        raise ValueError('Only binary data!')
    return blob[:last_good_one + 1]

print(filter_data(data))
print(filter_data(tdata))
print(filter_data(bdata))  # raises ValueError: starts with a non-pool character
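A variation on the pool idea, as a rough Python 3 sketch: instead of listing the allowed characters, look for characters that XML 1.0 does not permit (most control characters), since those are a plausible reason for a SOAP serializer to reject the text. That this is what makes suds fail is an assumption, not something the question confirms, and `cut_at_first_illegal` is a hypothetical helper name. Binary that happens to decode to ordinary printable characters will not be caught by this check.

```python
import re

# A subset of the characters XML 1.0 forbids: C0 control characters
# other than tab, line feed and carriage return, plus U+FFFE and U+FFFF.
ILLEGAL_XML = re.compile(r'[\x00-\x08\x0b\x0c\x0e-\x1f\ufffe\uffff]')

def cut_at_first_illegal(text, placeholder='EEEEE'):
    """Truncate text at the first XML-illegal character and append a placeholder."""
    match = ILLEGAL_XML.search(text)
    if match is None:
        return text
    return text[:match.start()] + placeholder

print(cut_at_first_illegal('Some plain data\x00\x10JFIF\x01'))
print(cut_at_first_illegal('This has no binary in it'))
```

Unlike the fixed pool, this keeps punctuation and non-ASCII text intact, at the cost of missing binary that contains no control characters.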
HTH!