Hadoop Hive - Split string

I am new to Hive.

My question: in our log file we have a request field like this: "GET /img/home/search-user-ico.jpg HTTP/1.1". There are more than 10,000 records.

Example :

"GET /img/home/search-user-ico.jpg HTTP/1.1"

"GET /JavaScript/jquery-1.4.2.min.js HTTP/1.1"

"GET /ems/home HTTP/1.1"

"POST /ir HTTP/1.1"

"GET /CSS/jquery/themes/base/jquery.ui.button.css HTTP/1.1"

"GET /CSS/jquery/themes/base/images/ui-bg_glass_75_e6e6e6_1x400.png HTTP/1.1"

"GET /JavaScript/jquery/jquery-ui-1.8.5.custom.min.js HTTP/1.0"

From the field "GET /img/home/search-user-ico.jpg HTTP/1.1", I want only the path, /img/home/search-user-ico.jpg. In other words, I want to strip the leading GET or POST and the trailing HTTP/1.1. How can I do this with the string functions listed in the Hive wiki? I tried some of the syntax described there, but I'm stuck.

I tried queries like these:

select regexp_extract(request,'a-zA-Za-zA-Z[a-zA-Z]',2) from logfile limit 10;

select regexp_extract(request,'GET(\s)([a-zA-Z])',2) from logfile limit 10;

select regexp_extract(request,'.?(\s)(.?)(\s)(.*?)',2) from logfile limit 10;

select regexp_extract(request,'.(\s)(.)(\s)(.*)',2) from logfile limit 10;

Thanks,
Joe


I used RegexBuddy with the samples you provided and got just the URLs with this regex: ([\S]*) HTTP. This assumes there are no literal spaces in the URL; encoded spaces are fine.

Plugged into a Hive query, it should look something like this:

select regexp_extract(request, ' (\\S*) HTTP', 1) from logfile;

(Note that there is a space before (\\S*) in the pattern. It may be obvious, but I wanted to point it out in case it was missed.)

I have done a little testing in Hive and it works, at least on inputs similar to the samples provided.
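
If you want to sanity-check the pattern outside Hive before running it over the full table, a small script like the one below may help. It is only a sketch: it uses Python's re module rather than Hive, on the assumption that this simple pattern behaves the same in both, and the extract_path helper is a made-up name for illustration. Returning an empty string when nothing matches is a choice made for this sketch.

```python
import re

# Same pattern as in the Hive query: a space, a run of non-space
# characters (captured), then " HTTP". In a Python raw string the
# backslash is written once, whereas the Hive query above doubles it.
PATTERN = re.compile(r" (\S*) HTTP")


def extract_path(request: str) -> str:
    """Return the captured request path, or "" if the pattern does not match."""
    match = PATTERN.search(request)
    return match.group(1) if match else ""


# A few of the sample request strings from the question.
samples = [
    "GET /img/home/search-user-ico.jpg HTTP/1.1",
    "POST /ir HTTP/1.1",
    "GET /JavaScript/jquery/jquery-ui-1.8.5.custom.min.js HTTP/1.0",
]

for sample in samples:
    print(extract_path(sample))
# prints:
# /img/home/search-user-ico.jpg
# /ir
# /JavaScript/jquery/jquery-ui-1.8.5.custom.min.js
```

Because the pattern only anchors on the space before the path and the " HTTP" after it, it does not care whether the method is GET or POST, or whether the protocol version is 1.0 or 1.1.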
