How to extract text from this kind of HTML source?
I have an HTML source containing about 1000 microblog posts (one tweet per line). Most of the tweets look like the one below. I loaded it into a Delphi memo and tried to strip the HTML markup using the Pos and Delete functions, but failed.
<div id='tweetText'> RT <a onmousedown="return touch(this.href,0)" href="http://twitter.com/HighfashionUK">@HighfashionUK</a> RT: Surprise goody bag up 4 grabs, Ok. <a onmousedown="return touch(this.href,0)" href="http://plixi.com/p/57846587">http://plixi.com/p/57846587</a> when we get 150</div>
I want to strip the HTML markup and keep only:
RT: Surprise goody bag up 4 grabs, Ok. http://plixi.com/p/57846587 when we get 150
How can I extract this text in Delphi?
Thank you very much in advance.
Update:
Cosmin Prund is right. I mistakenly skipped a part. What I want is:
RT @HighfashionUK RT: Surprise goody bag up 4 grabs, Ok. http://plixi.com/p/57846587 when we get 150
Cosmin Prund is great.
Since all HTML markup is between < and >, a routine to strip markup can be trivially written like this. Hopefully this is what you want because, as you can see in my comment, there's an issue with @HighfashionUK: your example skipped that, and I don't know why.
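function StripHtmlMarkup(const source: string): string;
var
  i, count: Integer;
  InTag: Boolean;
  P: PChar;
begin
  // The result can never be longer than the source, so allocate once up front.
  SetLength(Result, Length(source));
  P := PChar(Result);
  InTag := False;
  count := 0;
  for i := 1 to Length(source) do
    if InTag then
    begin
      // Inside a tag: skip everything up to and including the closing '>'.
      if source[i] = '>' then InTag := False;
    end
    else
      if source[i] = '<' then InTag := True
      else
      begin
        // Outside a tag: copy the character into the output buffer.
        P[count] := source[i];
        Inc(count);
      end;
  // Shrink the result to the number of characters actually copied.
  SetLength(Result, count);
end;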
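
As a rough usage sketch, the routine could be applied to every line of a string list (a memo's Lines property is a TStrings too). Everything outside StripHtmlMarkup here is an assumption for illustration: the program name StripDemo, the shortened sample tweet, and the call to Trim, which is added because the text inside the div starts with a space. It does not decode HTML entities such as &amp;, and it would misbehave if an attribute value contained a > character.

```pascal
program StripDemo;
{$IFDEF FPC}{$mode delphi}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Same routine as in the answer, repeated so the sketch compiles on its own.
function StripHtmlMarkup(const source: string): string;
var
  i, count: Integer;
  InTag: Boolean;
  P: PChar;
begin
  SetLength(Result, Length(source));
  P := PChar(Result);
  InTag := False;
  count := 0;
  for i := 1 to Length(source) do
    if InTag then
    begin
      if source[i] = '>' then InTag := False;
    end
    else
      if source[i] = '<' then InTag := True
      else
      begin
        P[count] := source[i];
        Inc(count);
      end;
  SetLength(Result, count);
end;

var
  Lines: TStringList;  // stands in for Memo1.Lines (hypothetical memo name)
  i: Integer;
begin
  Lines := TStringList.Create;
  try
    // Shortened sample tweet (assumption); one tweet per line, as in the question.
    Lines.Add('<div id=''tweetText''> RT <a href="http://twitter.com/HighfashionUK">' +
              '@HighfashionUK</a> RT: Surprise goody bag up 4 grabs, Ok.</div>');

    // Strip the markup in place, then trim the space left behind by the div.
    for i := 0 to Lines.Count - 1 do
      Lines[i] := Trim(StripHtmlMarkup(Lines[i]));

    // Prints: RT @HighfashionUK RT: Surprise goody bag up 4 grabs, Ok.
    Writeln(Lines[0]);
  finally
    Lines.Free;
  end;
end.
```

In a VCL form the same loop would presumably run over Memo1.Lines directly, ideally wrapped in BeginUpdate and EndUpdate so the memo is not repainted for each of the 1000 lines.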