开发者

Remove anchor from URL in C#

I'm trying to pull in an src value from an XML d开发者_如何转开发ocument, and in one that I'm testing it with, the src is:

<content src="content/Orwell - 1984 - 0451524934_split_2.html#calibre_chapter_2"/>

That creates a problem when trying to open the file. I'm not sure what that #(stuff) suffix is called, so I had no luck searching for an answer. I'd just like a simple way to remove it if possible. I suppose I could write a function to search for a # and remove anything after, but that would break if the filename contained a # symbol (or can a file even have that symbol?)

Thanks!


If you had the src in a string you could use

srcstring.Substring(0,srcstring.LastIndexOf("#"));

Which would return the src without the #. If the values you are retreiving are all web urls then this should work, the # is a bookmark in a url that takes you to a specific part of the page.


You should be OK assuming that URLs won't contain a "#"

The character "#" is unsafe and should always be encoded because it is used in World Wide Web and in other systems to delimit a URL from a fragment/anchor identifier that might follow it.

Source (search for "#" or "unsafe").

Therefore just use String.Split() with the "#" as the split character. This should give you 2 parts. In the highly unlikely event it gives more, just discard the last one and rejoin the remainder.


From Wikipedia:

# is used in a URL of a webpage or other resource to introduce a "fragment identifier" – an id which defines a position within that resource. For example, in the URL http://en.wikipedia.org/wiki/Number_sign#Other_uses the portion after the # (Other_uses) is the fragment identifier, in this case indicating that the display should be moved to show the tag marked by ... in the HTML


It's not safe to remove de anchor of the url. What I mean is that ajax like sites make use of the anchor to keep track of the context. For example gmail. If you go to http://www.gmail.com/#inbox, you go directly to your inbox, but if you go to http://www.gmail.com/#all, you'll go to all your mail. The server can give a different response based on the anchor, even if the response is a file.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜