I'm using this code to find all interesting links in a page: soup.findAll('a', href=re.compile('^notizia.php\?idn=\d+'))
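For reference, the same href filter can be tried out without BeautifulSoup. The following is only a minimal sketch using the standard library's html.parser. The names LinkCollector, find_links and SAMPLE_HTML are hypothetical, made up for illustration, and the sample markup is an assumption rather than the page from the question.

```python
import re
from html.parser import HTMLParser

# Pattern copied from the question. Note the unescaped "." matches any
# character; writing "\." would be stricter.
LINK_PATTERN = re.compile(r'^notizia.php\?idn=\d+')


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects href values of <a> tags matching a pattern."""

    def __init__(self, pattern):
        super().__init__()
        self.pattern = pattern
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        for name, value in attrs:
            # value can be None for attributes written without a value
            if name == 'href' and value and self.pattern.search(value):
                self.links.append(value)


def find_links(html, pattern=LINK_PATTERN):
    """Return every matching href found in the given HTML string."""
    collector = LinkCollector(pattern)
    collector.feed(html)
    collector.close()
    return collector.links


# Made-up sample markup, only for illustration.
SAMPLE_HTML = """
<a href="notizia.php?idn=42">news item</a>
<a href="about.php">about</a>
<a href="notizia.php?idn=abc">not numeric</a>
<a href="/notizia.php?idn=7">leading slash, so ^ does not match</a>
"""

print(find_links(SAMPLE_HTML))  # ['notizia.php?idn=42']
```

Because the pattern is anchored with ^, only hrefs that start with notizia.php match. Absolute URLs or hrefs with a leading slash are skipped, which may or may not be what is wanted.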
It's not really scraping, I'm just trying to find the URLs in a web page where the class has a specific value. For example:
I would like to save a web page programmatically. I don't mean merely save the HTML. I would also like to automatically store all associated files (images, CSS files, maybe embedded S
I want to display on a WordPress page the total volume of shares traded on the NYSE stock exchange over the last 2 weeks that it's been open. What is the best way to go about doing this? Ya
I'm trying to extract text from arbitrary HTML pages. Some of the pages (which I have no control over) have malformed HTML or scripts which make this difficult. Also I'm on a shared hosting environm