Is it possible to extract only the text from an HTML file?

I am talking about removing all the formatting and keeping just the text, as if you opened any page of the website, pressed Ctrl+A and Ctrl+C, then pasted everything into Notepad with Ctrl+V. That is what I mean by extracting only the text. Let me use a site to explain better: https://developer.palm.com/content/resources/develop/quick_start_ios.html

And this is the output I want:

jump to navigation
jump to content

Showcase
Why webOS
The Opportunity
Innovative Platform
Cross-Platform
HP Reach
Vibrant Community
Showcase
Device Showcase
App Showcase
Developer Voices
My Apps
Resources
Design
Enyo Design Guide
Advanced Application Guidelines
webOS and Game Development
Development
Download the SDK
Enyo from the Ground Up
Enyo Tutorial
Third-party Tools
Developer Device Program
PDK Development
Unactivated Devices
Glossary
Distribution and Promotion
Distributing with HP
App Content Criteria
App Submission Checklist
International e-commerce FAQ
Submit Your Enyo App
Market Your App
Promo codes
In-App Purchase
FAQs
Developer Program FAQ
International e-commerce FAQ
PDK Technical FAQ
Videos
View All
Community
Connect
Forums
Developer Blog
Events
Twitter
IRC
RSS
Resources
Third-party Developers
webOS on github
Guide to Custom Feeds
webOS101 (external)
Community Sites
mobspot
Cyrket
PreCentral
webOS Roundup
Documentation
SDK Documentation
Index
Developer Guide
API Reference
Sign In Sign Up Search Form
Search   
HomeResourcesQuick Start iOS
Quick Start - iOS Developers
Print
Email
Share
If you've been developing for iOS® and are looking to expand your audience, we're here to help. Getting started with webOS is easy! If your current focus is OpenGL/SDL, then the transition will be simplicity itself. We have lots of great stories of developers porting their OpenGL apps very quickly. You can use the publicly available 3.0 SDK to do OpenGL/SDL development now with the included Plug-in Development Kit (PDK). Best of all, the PDK integrates nicely with  Xcode.

If your focus is web app development, you'll want to look at Enyo, our next-generation JavaScript framework, which is included in the 3.0 SDK.

Ready to get started?

Download the SDK
It's free! (While you're at it, sign up for the Developer Program.)

Try the Enyo tutorial or the OpenGL sample app
Choose the sample that's most appropriate for your skill set.

Check out our Resources pages
Get more information on developing for webOS. Or go straight to the Reference section to get all the details.
Quick Start Guide

iOS Developers
Web Developers
C/C++ Developers
Next Steps

Sign up!
Become a member of the webOS developer community
Watch Dev Day videos
See the talks from the NYC Dev Day
Find a Developer
Check out our list of third-party developers and designers
Support
We are here to help!
Why webOS
Business Case for webOS
Success Stories
App Showcase
Contact Us
Getting Started
Join the HP webOS Developer Program
Download the SDK/PDK
Developing Your First App
Videos
webOS CONNECT Events
MWC Developer Conference
NYC Developer Day
Podcasts
Support
Help
FAQs
Stay up to date
About RSS Feeds
Developer Blog
© 2011 Hewlett-Packard Development Company, L.P.
The information contained herein is subject to change without notice. All screen images simulated. HP Pre 3 planned availability this summer.  Privacy Statement
Supported browsers: Firefox 3.6+; Google Chrome 10+; Safari 5+; Internet Explorer 8+
Palm.comLegal NoticesContact Us


This should work

<?php 

echo strip_tags(file_get_contents("https://developer.palm.com/content/resources/develop/quick_start_ios.html"));

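That's the general idea. To format the result better, convert tags such as <br/> to newlines before calling strip_tags(), for example with str_replace('<br/>', "\n", $html). Note the double quotes: in PHP, '\n' in single quotes is a literal backslash followed by n, not a newline.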
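
The idea of turning tags into line breaks before stripping them can be sketched roughly as follows. This is only a minimal sketch: the function name html_to_text_simple and the list of tags treated as line breaks are assumptions for illustration, not part of the answer above, and regular expressions will not cope with every kind of malformed markup.

```php
<?php
// Hypothetical helper: turn line-breaking tags into newlines, then strip the rest.
function html_to_text_simple(string $html): string
{
    // Drop <script> and <style> blocks entirely; strip_tags() would keep their contents.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', '', $html);
    // Replace <br> and some common closing block tags with a newline before the tags disappear.
    $html = preg_replace('#<br\s*/?>|</(p|div|li|h[1-6]|tr)>#i', "\n", $html);
    // Remove the remaining tags and decode entities such as &amp; or &copy;.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Trim each line and drop the empty ones.
    $lines = array_filter(array_map('trim', explode("\n", $text)), 'strlen');
    return implode("\n", $lines);
}

echo html_to_text_simple('<p>Quick Start<br/>iOS &amp; Web</p><script>var x = 1;</script>'), "\n";
```

For a live page, the input string could come from file_get_contents($url), as in the snippet above.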


I use lynx. Try this from your terminal:

lynx -dump http://www.google.com


Another way to do it is to retrieve the value of the page's body tag:

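<?php
// Real-world pages are rarely valid HTML; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);

$html = new DOMDocument();
$html->loadHTMLFile("https://developer.palm.com/content/resources/develop/quick_start_ios.html");

// getElementsByTagName() returns a list; the first item is the <body> element.
$body = $html->getElementsByTagName("body")->item(0);

// For an element, nodeValue is the concatenated text of all its descendants.
echo $body->nodeValue;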
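
One caveat with reading the body's text this way is that it also returns the contents of any <script> and <style> elements inside <body>. A possible workaround, sketched here under the assumption that the HTML is already in a string, is to remove those elements first. The function name body_text is hypothetical.

```php
<?php
// Hypothetical helper: body text of an HTML string, without <script>/<style> contents.
function body_text(string $html): string
{
    $doc = new DOMDocument();
    // Collect parser warnings instead of printing them, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Copy the live node list to an array first, because removing nodes mutates the list.
    foreach (['script', 'style'] as $tag) {
        foreach (iterator_to_array($doc->getElementsByTagName($tag)) as $node) {
            $node->parentNode->removeChild($node);
        }
    }

    $body = $doc->getElementsByTagName('body')->item(0);
    return $body === null ? '' : trim($body->textContent);
}

echo body_text('<html><body><h1>Quick Start</h1><script>var x = 1;</script><p>Download the SDK</p></body></html>'), "\n";
```

Text from adjacent elements is joined without any separator, which is why the pasted sample in the question contains runs like "HomeResourcesQuick Start iOS".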


You can use the document tree to do this: keep all the text nodes and discard the element nodes.

You can implement this with JavaScript, or with C++ using WebKit.
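
The same tree walk can also be sketched in PHP with DOMDocument, to stay with the language used elsewhere in this thread. This is a swapped-in illustration, not the JavaScript or WebKit implementation the answer has in mind. The function names collect_text and html_text_nodes are hypothetical, and skipping <script> and <style> subtrees is an added assumption.

```php
<?php
// Hypothetical helper: collect the trimmed, non-empty text nodes under $node.
function collect_text(DOMNode $node, array &$out): void
{
    // Skip subtrees whose text is never shown to the reader.
    if ($node instanceof DOMElement
        && in_array(strtolower($node->nodeName), ['script', 'style'], true)) {
        return;
    }
    // Keep text nodes (DOMText also covers CDATA sections, which extend it).
    if ($node instanceof DOMText) {
        $text = trim($node->nodeValue);
        if ($text !== '') {
            $out[] = $text;
        }
        return;
    }
    // Element, document, and other node types: descend into children, if any.
    if (!$node->hasChildNodes()) {
        return;
    }
    foreach ($node->childNodes as $child) {
        collect_text($child, $out);
    }
}

// Hypothetical wrapper: parse an HTML string and return its text nodes in document order.
function html_text_nodes(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $out = [];
    collect_text($doc, $out);
    return $out;
}

$sample = '<html><body><ul><li>Home</li><li>Resources</li></ul>'
        . '<script>var x = 1;</script><p>Quick Start <b>iOS</b></p></body></html>';
echo implode("\n", html_text_nodes($sample)), "\n";
```

Emitting one text node per line gives output closer to the pasted sample in the question than concatenating everything, at the cost of splitting inline elements such as <b> onto their own lines.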
