
Python search HTML document for capital letters

So I have all these HTML documents that contain strings of capital letters in various places: alt attributes, title attributes, link text, etc.

<li><a title='BUY FOOD' href='http://www.example.com/food.html'>BUY FOOD</a></li>

What I need to do is convert every letter except the first letter of each word to lowercase, like so:

<li><a title='Buy Food' href='http://www.example.com/food.html'>Buy Food</a></li>

Now, how can I do this, either in Python or with some form of regex? I was told that my editor, Coda, could do something like this, but I can't find any documentation on how to do it.


I suggest you use Beautiful Soup to parse your HTML into a tree of tags, then write Python code to walk the tree of tags and body text and convert the text to title case. You could use a regexp for that, but Python has a built-in string method that will do it:

"BUY FOOD".title()  # returns "Buy Food"
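A quick check of how str.title() behaves on strings like the ones in the question, plus one caveat worth knowing before running it over a whole site: title() treats an apostrophe as a word boundary. The standard library's string.capwords() avoids that particular problem, but it collapses runs of whitespace, which can matter inside HTML text nodes.

```python
import string

# str.title() upper-cases the first letter of each run of letters
# and lower-cases the rest.
print("BUY FOOD".title())             # Buy Food
print("BUY 99 BEERS".title())         # Buy 99 Beers

# Caveat: an apostrophe also starts a new "word" for title().
print("DON'T BUY".title())            # Don'T Buy

# string.capwords() splits on whitespace instead, so apostrophes survive,
# but runs of whitespace are collapsed to a single space.
print(string.capwords("DON'T  BUY"))  # Don't Buy
```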

If you need a pattern to match strings that are all caps, I suggest you use: "[^a-z]*[A-Z][^a-z]*"

This means "match zero or more of anything except a lower-case character, then a single upper-case character, then zero or more of anything except a lower-case character".

This pattern will correctly match "BUY 99 BEERS", for example. It would not match "so very quiet" because that does not have even a single upper-case letter.
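A minimal sketch of using that pattern from Python. The names ALL_CAPS and is_all_caps are hypothetical. The key detail is anchoring: with re.fullmatch() the whole string must fit the pattern, whereas re.match() or re.search() would happily match just the leading "B" of "Buy Food".

```python
import re

# No lower-case letters anywhere, and at least one upper-case letter.
ALL_CAPS = re.compile(r"[^a-z]*[A-Z][^a-z]*")


def is_all_caps(s: str) -> bool:
    # fullmatch() requires the pattern to cover the entire string.
    return ALL_CAPS.fullmatch(s) is not None


for s in ["BUY 99 BEERS", "so very quiet", "Buy Food"]:
    print(repr(s), is_all_caps(s))
# 'BUY 99 BEERS' True
# 'so very quiet' False
# 'Buy Food' False
```

Because [a-z] and [A-Z] cover only ASCII, the pattern would accept a string such as "CAFé"; str.isupper() is a Unicode-aware alternative for the same check.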

P.S. You can actually pass a function to re.sub(), so you could potentially do some crazy powerful processing if you needed it. In your case I think Python's .title() method will do it for you, but here is another answer I posted with more information about passing in a function:

How to capitalize the first letter of each word in a string (Python)?
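As a concrete example of passing a function to re.sub(), here is a titlecase() helper adapted from the workaround shown in Python's str.title() documentation. re.sub() calls the function once per match and uses its return value as the replacement; treating an apostrophe followed by letters as part of the same word fixes title()'s "Don'T" problem. It assumes the input is already known to be all caps: on mixed-case text it would also turn "iPhone" into "Iphone".

```python
import re


def titlecase(s: str) -> str:
    # The lambda receives each match object; its return value replaces
    # the matched text. "DON'T" is one match, so it becomes "Don't".
    return re.sub(r"[A-Za-z]+('[A-Za-z]+)?",
                  lambda mo: mo.group(0).capitalize(),
                  s)


print(titlecase("DON'T BUY FOOD"))  # Don't Buy Food
print(titlecase("BUY 99 BEERS"))    # Buy 99 Beers
```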


I think you need an HTML parser like BeautifulSoup; the rest is details.
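Those details might look roughly like this. Beautiful Soup is a third-party package, so to keep the sketch dependency-free it swaps in the standard library's html.parser.HTMLParser, which fires a callback for each tag and text run. The sketch rebuilds the document while title-casing all-caps text and the values of title and alt attributes. The helper names (fix_caps, CapsFixer, fix_html) and the choice of attributes are hypothetical. The output is normalized rather than byte-identical: tag and attribute names come out lowercase, attribute values are double-quoted and re-escaped, and constructs it does not handle (such as processing instructions) are dropped.

```python
import re
from html import escape
from html.parser import HTMLParser

# Assumption: these attributes hold human-readable text; extend as needed.
TEXT_ATTRS = {"title", "alt"}
# Text inside these elements is code, not prose, so it is left untouched.
RAW_TEXT_TAGS = {"script", "style"}


def titlecase(s: str) -> str:
    # Capitalize each word; an apostrophe plus letters stays in the same word.
    return re.sub(r"[A-Za-z]+('[A-Za-z]+)?",
                  lambda mo: mo.group(0).capitalize(), s)


def fix_caps(text: str) -> str:
    # Only rewrite strings with no lower-case letters ("BUY FOOD"),
    # leaving mixed-case text such as "Hello WORLD" alone.
    return titlecase(text) if text.isupper() else text


class CapsFixer(HTMLParser):
    """Re-emit HTML, title-casing all-caps text nodes and TEXT_ATTRS values."""

    def __init__(self):
        # Keep entity references like &amp; as separate events so they
        # can be written back out unchanged.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.in_raw_text = False

    def _tag(self, tag, attrs, end=">"):
        parts = [tag]
        for name, value in attrs:
            if value is None:  # boolean attribute, e.g. <input disabled>
                parts.append(name)
                continue
            if name in TEXT_ATTRS:
                value = fix_caps(value)
            parts.append(f'{name}="{escape(value, quote=True)}"')
        return "<" + " ".join(parts) + end

    def handle_starttag(self, tag, attrs):
        if tag in RAW_TEXT_TAGS:
            self.in_raw_text = True
        self.out.append(self._tag(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        self.out.append(self._tag(tag, attrs, " />"))

    def handle_endtag(self, tag):
        if tag in RAW_TEXT_TAGS:
            self.in_raw_text = False
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data if self.in_raw_text else fix_caps(data))

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def fix_html(source: str) -> str:
    parser = CapsFixer()
    parser.feed(source)
    parser.close()
    return "".join(parser.out)


print(fix_html("<li><a title='BUY FOOD' "
               "href='http://www.example.com/food.html'>BUY FOOD</a></li>"))
# <li><a title="Buy Food" href="http://www.example.com/food.html">Buy Food</a></li>
```

Only strings with no lower-case letters are touched, so deliberately mixed-case text survives; acronyms inside an all-caps string, however, will still be lower-cased.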


There may be notable exceptions (acronyms, for instance) for which fully automatic editing is not a good idea. If you have a regex-capable editor, you could search for /[A-Z][A-Z]+/ and replace the matches by hand.
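For the review-by-hand route, a small script can list every candidate with its line number so each one can be checked before editing. This is a hypothetical helper (find_caps_runs), not a feature of Coda or any particular editor. Note that it also flags acronyms and, in markup that uses upper-case tag names like <LI>, the tags themselves.

```python
import re

# Same pattern as above: two or more consecutive capital letters.
CAPS_RUN = re.compile(r"[A-Z][A-Z]+")


def find_caps_runs(text: str):
    """Yield (line_number, matched_text) for every run of 2+ capitals."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        for m in CAPS_RUN.finditer(line):
            yield lineno, m.group(0)


sample = """<li><a title='BUY FOOD' href='http://www.example.com/food.html'>BUY FOOD</a></li>
<p>Ask NASA about it</p>"""

for lineno, word in find_caps_runs(sample):
    print(f"line {lineno}: {word}")
# line 1: BUY
# line 1: FOOD
# line 1: BUY
# line 1: FOOD
# line 2: NASA
```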
