Python search HTML document for capital letters
So I have all these HTML documents that have strings of capital letters in various places: alt and title attributes, link text, etc.
<li><a title='BUY FOOD' href="http://www.example.com/food.html">BUY FOOD</a></li>
What I need to do is replace all letters except the first letter of each word with lowercase letters, like so:
<li><a title='Buy Food' href="http://www.example.com/food.html">Buy Food</a></li>
Now, how can I do this, either in Python or with some form of regex? I was told that my editor, Coda, could do something like this, but I can't find any documentation on how.
I suggest you use Beautiful Soup to parse your HTML into a tree of tags, then write Python code to walk the tree of tags and body text and change to title case. You could use a regexp to do that, but Python has a built-in string method that will do it:
"BUY FOOD".title() # returns "Buy Food"
If you need a pattern to match strings that are all caps, I suggest you use: "[^a-z]*[A-Z][^a-z]*"
This means "match zero or more of anything except a lower-case character, then a single upper-case character, then zero or more of anything except a lower-case character".
This pattern will correctly match "BUY 99 BEERS", for example. It would not match "so very quiet" because that does not have even a single upper-case letter.
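A quick way to check that behaviour yourself is sketched below. The helper name is_all_caps is made up for the example. Note the use of re.fullmatch(): the pattern has no anchors of its own, so with re.search() it would also match just the "B" inside "Buy Food".

```python
import re

ALL_CAPS = re.compile(r"[^a-z]*[A-Z][^a-z]*")


def is_all_caps(text):
    # fullmatch() requires the pattern to cover the entire string.
    return ALL_CAPS.fullmatch(text) is not None


print(is_all_caps("BUY 99 BEERS"))   # True
print(is_all_caps("so very quiet"))  # False
print(is_all_caps("Buy Food"))       # False
print(is_all_caps("99"))             # False: no upper-case letter at all
```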
P.S. You can actually pass a function to re.sub(), so you could potentially do crazy powerful processing if you needed it. In your case I think Python's .title() method will do it for you, but here is another answer I posted with information about passing in a function:
How to capitalize the first letter of each word in a string (Python)?
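As a rough sketch of the function-passing idea (the names WORD, lower_tail and fix_caps are invented here, and the pattern is just one possible choice): re.sub() calls the function once per match and uses its return value as the replacement. One reason you might prefer this over .title() is apostrophes, since "DON'T".title() gives "Don'T".

```python
import re

# Two or more capital letters, optionally followed by an apostrophe and more capitals.
WORD = re.compile(r"\b[A-Z]{2,}(?:'[A-Z]+)?\b")


def lower_tail(match):
    # Keep the first letter, lower-case everything after it.
    word = match.group(0)
    return word[0] + word[1:].lower()


def fix_caps(text):
    return WORD.sub(lower_tail, text)


print(fix_caps("BUY FOOD"))     # Buy Food
print(fix_caps("DON'T PANIC"))  # Don't Panic
print(fix_caps("Buy Food"))     # Buy Food (unchanged)
```

This is meant for text you have already pulled out of the document with a parser. Run over raw HTML it would also rewrite upper-case tag and attribute names, and it will happily turn acronyms such as NASA into Nasa.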
I think you need an HTML parser like BeautifulSoup; the rest would be details.
There may be noteworthy exceptions for which fully automatic editing is not a good idea, but if you have a regex-capable editor you might search for /[A-Z][A-Z]+/ and replace by hand.
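If your editor's search is awkward to use, the same review-by-hand approach can be sketched in a few lines of Python. The function name find_caps_runs is made up for this example. It only reports candidates and changes nothing.

```python
import re

CAPS_RUN = re.compile(r"[A-Z][A-Z]+")


def find_caps_runs(text):
    """Return (line number, column, match) for every run of 2+ capital letters."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for m in CAPS_RUN.finditer(line):
            hits.append((lineno, m.start(), m.group(0)))
    return hits


for hit in find_caps_runs("<li>BUY FOOD</li>\nplain text"):
    print(hit)
# (1, 4, 'BUY')
# (1, 8, 'FOOD')
```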