开发者

Python script for URL split

I'm new to python,learning the basics.

My Query : I have multiple pages accessed as a request from a log file like the below,

"GET /img/home/search-user-ico.jpg HTTP/1.1"  
"GET /SpellCheck/am.tlx HTTP/1.1"
"GET /img/plan-comp-nav.jpg HTTP/1.1" 
"GET /ie6.css HTTP/1.1"
"GET /img/portlet/portlet-content-bg.jpg HTTP/1.1"
"GET /SpellCheck/am100k2.clx HTTP/1.1" 
"GET /SpellCheck/am.tlx HTTP/1.1" 

My question is i want only the file part from the page, For example, Let us consider "GET /img/home/search-user-ico.jpg HTTP/1.1" ,"GET /ie6.css HTTP/1.1" as a page then from the above i want to split search-user-ico.jpg HTTP, ie6.css HTTP.

so experts please help me in writing the py开发者_JS百科thon script for the above to split.


Assuming that you don't have spaces in the filenames and that you don't want "HTTP" at the end.

You can split the line by space.

parts = line.split(" ")

and then use the os module to get the filename from the path.

filename = os.path.basename(parts[1])

For example.

>>> line = "GET /img/home/search-user-ico.jpg HTTP/1.1"
>>> parts = line.split(" ")
>>> parts[1]
'/img/home/search-user-ico.jpg'
>>> os.path.basename(parts[1])
'search-user-ico.jpg'


data = [
"GET /img/home/search-user-ico.jpg HTTP/1.1",
"GET /SpellCheck/am.tlx HTTP/1.1",
"GET /img/plan-comp-nav.jpg HTTP/1.1" ,
"GET /ie6.css HTTP/1.1",
"GET /img/portlet/portlet-content-bg.jpg HTTP/1.1",
"GET /SpellCheck/am100k2.clx HTTP/1.1" ,
"GET /SpellCheck/am.tlx HTTP/1.1" 
]

for url in data:
    print url.split(' ')[1].split('/')[-2]


data = [
"GET /img/home/search-user-ico.jpg HTTP/1.1",
"GET /SpellCheck/am.tlx HTTP/1.1",
"GET /img/plan-comp-nav.jpg HTTP/1.1" ,
"GET /ie6.css HTTP/1.1",
"GET /img/portlet/portlet-content-bg.jpg HTTP/1.1",
"GET /SpellCheck/am100k2.clx HTTP/1.1" ,
"GET /SpellCheck/am.tlx HTTP/1.1" 
]

for url in data:
    print url.split(' ')[1].split('/')[-1]


If the format of your links is similar. Another solution would be:

request = "GET /img/home/search-user-ico.jpg HTTP/1.1"
parts = request.split("/")
parts[-2] //returns search-user-ico.jpg HTTP
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜