How can Python detect binary data inside plain text extracted from a database? [closed]

It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center. Closed 11 years ago.

I have to extract some product information from a MySQL database, then construct a SOAP request and use Python's suds library to send it to a remote server.

But some of the extracted information mixes binary data with text data, for example:

...
Some plain data
...
Content-Type: image/pjpeg

? JFIF  H H   C 


 C     
  P P                  E       !1AQa "q?#2脑$4BCR倯ご?3TVfrs枴贬             -        !1A"2QR亼Ba毖q?   ? 鮊€
 ( ?€
 ( ?€
 釱Whf颲[e?喸媼q屧ㄠ蚀厲蹳ZIO痙(r5?-i擯栧剗矹?尴?蝓玁帰XZ鞭#崛攳┸蹵X僦?攅Z@?らM;X藙?N蹮垀s@jQ?Z徸林炑M~?麒]H=颦C胝_р}"?Gixqz坽徸玨?O?Q+谍?w鬪??-囯礥?а|乛聚Zyt>?`[~跲桫?騽D曐縅CmN=?shOU+湫锏竩&6げ?铚扺d)mn?c?X?6RmQ   JJ?7繴*v>.捈鉵基d?堌疻熼G肗裪囅w騲癔R?qW鑪陭瞘.C窄贇CkyV瀷1? 柚%W}}?Yz僫芐D嫆1鬊懜赈篽效lq蒟H棯]y|G.;硅憖Ew??栧$?=e菚鏪Rbj?枝}爲釼Z3FE<尒n%C蓎??樋>`I 顝y∥+pP敐慗岻゛\硳诮湣]~??xΔ诩[_?叴b嫜?yz*?=ψk?猝"%Ak?撍滷秠BR?-铈b礖?蟷y[)厌麓4,怈
窧q觏?_   N獛擒F杍q凞画Q襃@镛P$讄k\鏁祘譟㎎*V>W??鵔M嫯q寓y焊閔C杔栽?+鹕瑟qbs:z氱^PJ聣?汜ZU"ス嘔+輔€8楺<夻Uゎ顓瞚?氅豴<]P銨c? +K6]┓gr杺  蝫?VJ能?陭欹殡J倢gS扚?娭酧??gw?膙y矼j折B礕殯
繅捁%撽蛵震挔撲y?3鑪澳N?Ec~涰巽j慭搆锥▓IP?)┤燎鐠懴 €H9瘾F毖l氾+岎6o?殎託炗y尬n??8??黬?4Qbń覦;縢?兟HRONd *紂蚽娖t猦?^?2
庴E$x 譴q箘瘃J檐H筶鷆[?8 ?9颢*髟揤v緜魸擭槧?%msV嬖z瘨摉擀F摫鞍s犮殩H4s咸?S蓉扷濅?    V?昋c?u婆SG撙???{臘亞攕曳<\K? D]+#瓃kgw犤?.?惨邔蹓#p(巂s?瘑蜲Q?傻鑟6ce?敟)?9嶔?測誗?yfvp謒NnbmB3齑栘v>RR=拏H'焴j壎e鎨洘?窑??MH单;5?T1倧o)锐认J?QY&7?橥%诤授b?氭\堫轁q)荖no弎閂?添頶5E敌?U瞿??雛柖Q??Ps?冇9'=)J殅朥k%鈌l疆$q}?い$袋蜕~跏綺衄qU玉矰潱v硻e鷵?薭?<爗树q熣?I;ぞ_鬿埗d.握莰俜6渺^貀No-乾R?r\芷<A稙鋆j璲吡累Y错$F梱?镫[猄k\﹋JrRp悇?救 
...
end of binary data.

I don't know how this data got into MySQL, but I have to detect this kind of data and replace the binary part with the string EEEEE; otherwise suds raises an exception.

Can anyone tell me how to detect this kind of data?

Thanks in advance.


Mixed text and binary?! Sounds awful...

However, if all the data is in the format you showed in your example (i.e. with a Content-Type declaration), you could do something along these lines:

#!/usr/bin/env python
# -*- coding: utf-8  -*-

data = '''Some plain data!Content-Type: image/pjpeg ? JFIF  H H   C  C 
          P P                  E       !1AQa "q?#2脑$4BCR倯ご?3TVfrs枴贬    - 釱W
          hf颲[e?喸媼q屧ㄠ蚀厲蹳ZIO痙(r5?-i擯栧剗矹?尴?蝓玁帰XZ鞭#崛攳┸蹵X僦?攅Z@?らM
          ;X藙?N蹮垀s@jQ?Z徸林炑M~?麒]H=颦C胝_р}"?Gixqz坽徸玨?O?Q+谍?w鬪??'''
tdata = '''This has no binary in it'''

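def filter_data(blob):
    # Position of the first Content-Type header, or -1 if there is none
    mixed = blob.find('Content-Type:')
    if mixed != -1:
        # Keep the plain text before the header and replace the rest
        return blob[:mixed] + 'EEEEE'
    return blob

print(filter_data(data))
print(filter_data(tdata))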
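A Python 3 sketch of the same idea follows. The case-insensitive match and the `placeholder` parameter are my own additions, on the assumption that the header may not always be capitalized exactly as "Content-Type:" (MIME header names are case-insensitive). Like the version above, it throws away everything after the header, including any plain text that might follow the binary part.

```python
import re

# MIME header names are case-insensitive, so match "Content-Type:" in any case.
CONTENT_TYPE = re.compile(r"content-type:", re.IGNORECASE)


def filter_data(blob: str, placeholder: str = "EEEEE") -> str:
    """Keep the text before the first Content-Type header and replace the rest."""
    match = CONTENT_TYPE.search(blob)
    if match is None:
        return blob  # no header found: treat the whole value as plain text
    return blob[:match.start()] + placeholder


print(filter_data("Some plain data!content-type: image/pjpeg \xff\xd8"))  # Some plain data!EEEEE
print(filter_data("This has no binary in it"))  # This has no binary in it
```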

If the binary data is not preceded by a Content-Type declaration, I'm not sure there is a 100% reliable way to distinguish text from binary (bytes from the binary part can decode to perfectly sensible characters), but you could at least improve the situation by filtering against a pool of valid characters.
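To illustrate why no check is fully reliable, the sketch below decodes the first bytes of a JPEG/JFIF file (FF D8 FF E0). Latin-1 maps every byte to some character, so the binary comes out as printable text. A stricter, different heuristic is to accept a value as text only if it is valid UTF-8 and contains no NUL bytes. This assumes that the database driver returns raw `bytes` and that the legitimate text is stored as UTF-8; `looks_like_utf8_text` is a hypothetical helper, not part of any library.

```python
def looks_like_utf8_text(raw: bytes) -> bool:
    """Heuristic only: valid UTF-8 with no NUL bytes is treated as text."""
    if b"\x00" in raw:
        return False  # NUL bytes are common in binary data, rare in text
    try:
        raw.decode("utf-8")  # strict decoding raises on invalid sequences
    except UnicodeDecodeError:
        return False
    return True


# JPEG data starts with FF D8; JFIF files follow it with an FF E0 marker.
jpeg_start = b"\xff\xd8\xff\xe0"

# Latin-1 decodes any byte, so the binary "looks" like printable text...
print(jpeg_start.decode("latin-1").isprintable())  # True
# ...whereas strict UTF-8 decoding rejects these bytes.
print(looks_like_utf8_text(jpeg_start))  # False
print(looks_like_utf8_text(b"Some plain data"))  # True
```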

As an example of filtering against a pool of valid characters, assume that all valid text consists of A-Z, a-z and 0-9 plus the space character:

#!/usr/bin/env python
# -*- coding: utf-8  -*-

data = '''Some plain data脑$4BCR倯ご?3TVfrs枴贬    - 釱W hf颲[e?喸媼q屧ㄠ蚀厲蹳
          ZIO痙(r5?-i擯栧剗矹?尴?蝓玁帰XZ鞭#崛攳┸蹵X僦?攅Z@?らM;X藙?N蹮垀s@jQ?Z徸
          林炑M~?麒]H=颦C胝_р}"?Gixqz坽徸玨?O?Q+谍?w鬪??'''
tdata = '''This has no binary in it'''
bdata = '''炑M~?麒]H=颦C胝'''

pool = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 '

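def filter_data(blob):
    # Keep the longest prefix made only of characters from `pool`;
    # everything from the first character outside the pool is dropped.
    last_good_one = None
    for i, c in enumerate(blob):
        if c in pool:
            last_good_one = i
        else:
            break
    if last_good_one is None:
        # The very first character is already outside the pool
        raise ValueError('Only binary data!')
    return blob[:last_good_one + 1]

print(filter_data(data))
print(filter_data(tdata))
try:
    print(filter_data(bdata))
except ValueError as e:
    print(e)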
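If you want the EEEEE replacement from the question instead of plain truncation, a regular expression over the same character pool can find the valid prefix in one step. This is a sketch under the same A-Z, a-z, 0-9 and space assumption; `replace_binary_tail` is a hypothetical name. Unlike the loop above, it does not raise on all-binary input and returns just the placeholder instead.

```python
import re

# Same pool as above: A-Z, a-z, 0-9 and the space character.
# With "*" the pattern always matches at position 0 (possibly empty).
VALID_PREFIX = re.compile(r"[A-Za-z0-9 ]*")


def replace_binary_tail(blob: str, placeholder: str = "EEEEE") -> str:
    """Keep the longest valid prefix and replace everything after it."""
    prefix = VALID_PREFIX.match(blob).group()
    if len(prefix) == len(blob):
        return blob  # every character is in the pool
    return prefix + placeholder


print(replace_binary_tail("Some plain data\u8111$4BCR"))  # Some plain dataEEEEE
print(replace_binary_tail("This has no binary in it"))  # This has no binary in it
```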

HTH!
