How to: remove part of a Unicode string in Python following a special character

2023-01-18 22:17 问答作者：

first a short summery:

python ver: 3.1 system: Linux (Ubuntu)

I am trying to do some data retrieval through Python and BeautifulSoup.

Unfortunately some of the tables I am trying to process contains cells where the following text string exists:

789.82 ± 10.28

For this i to work i need two things:

How do i handle "weird" symbols such as: ± and how do i remove the par开发者_JS百科t of the string containing: ± and everything to the right of this?

Currently i get an error like: SyntaxError: Non-ASCII charecter '\xc2' in file ......

Thank you for your help

[edit]:

# dataretriveal from html files from DETHERM
# -*- coding: utf8 -*-

import sys,os,re
from BeautifulSoup import BeautifulSoup


sys.path.insert(0, os.getcwd())

raw_data = open('download.php.html','r')
soup = BeautifulSoup(raw_data)


for numdiv in soup.findAll('div', {"id" : "sec"}):
    currenttable = numdiv.find('table',{"class" : "data"})
    if currenttable:
        numrow=0
        for row in currenttable.findAll('td', {"class" : "dataHead"}):
            numrow=numrow+1

        for col in currenttable.findAll('td'):
            col2 = ''.join(col.findAll(text=True))
            if col2.index('±'):
                col2=col2[:col2.indeindex('±')]
            print(col)
        print(numrow)
        ref=numdiv.find('a')
        niceref=''.join(ref.findAll(text=True))
        print(niceref)

Now this code is followed by an error of:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)

Where did the ASCII reference pop up from ?

You need to have your Python file encoded in utf-8. Otherwise, it's quite trivial:

>>> s = '789.82 ± 10.28'
>>> s[:s.index('±')]
'789.82 '
>>> s.partition('±')
('789.82 ', '±', ' 10.28')

继续阅读：python special-characters string unicode

How to: remove part of a Unicode string in Python following a special character

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？