Vim Regex Duplicate Lines Grouping

I have a log file like this:

12 adsflljl
12 hgfahld
12 ash;al
13 a;jfda
13 asldfj
15 ;aljdf
16 a;dlfj
19 adads
19 adfasf
20 aaaadsf

And I would like to "group" them like one of these two:

12 adsflljl, 12 hgfahld, 12 ash;al
13 a;jfda, 13 asldfj
15 ;aljdf
16 a;dlfj
19 adads, 19 adfasf
20 aaaadsf

Or

12 adsflljl, hgfahld, ash;al
13 a;jfda, asldfj
15 ;aljdf
16 a;dlfj
19 adads, adfasf
20 aaaadsf

I am totally stuck. If Vim can't do it, I also have sed, awk, and bash. I'd rather not write a bash script, though; I want to improve my regex-fu.


In Vim you can use:

:%s/\(\(\d\+\) .*\)\n\2\>/\1, \2/g

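which means: if a line starts with a number and the next line starts with the same number, replace the newline between them with a comma and a space. If you are not familiar with them, \1 and \2 are backreferences to the captured groups: \2 is the leading number and \1 is the whole first line.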

Unfortunately this only merges two occurrences at a time, so you'll have to run it multiple times before achieving your goal.

EDIT: one way to do it in a single go is to loop, exploiting the fact that as soon as the pattern no longer matches anywhere in the file, an error is raised and the loop stops. The error message is a bit annoying, but I couldn't do better with a one-liner:

:while 1 | :%s/\(\(\d\+\) .*\)\n\2\>/\1, \2/g | :endwhile
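
Since sed was mentioned as an option, the same "keep merging until nothing matches" loop can be sketched there too. This is only a sketch under a few assumptions: the helper name group_adjacent is made up for illustration, ids are plain digits followed by a single space, and only adjacent lines are merged. It produces the second of the two requested formats.

```shell
# Hypothetical helper: merge adjacent lines that start with the same number.
# Reads the files given as arguments, or stdin if there are none.
group_adjacent() {
  # :a    label to loop back to
  # $!N   unless this is the last line, append the next line to the pattern space
  # s///  if both lines start with the same number, turn "\n<number> " into ", "
  # ta    if the substitution succeeded, jump back to :a and try the next line
  # P;D   otherwise print the finished first line and restart with the remainder
  sed -e ':a' \
      -e '$!N' \
      -e 's/^\(\([0-9][0-9]*\) .*\)\n\2 /\1, /' \
      -e 'ta' \
      -e 'P' \
      -e 'D' "$@"
}
```

Usage would be something like group_adjacent log.file > new.file. The trailing space after \2 in the pattern keeps an id such as 1 from being merged with 10.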


I'd just use awk:

awk '
  {
    sep = val[$1] ? ", " : ""
    val[$1] = val[$1] sep $2
  }
  END {for (v in val) print v, val[v]}
' log.file | sort > new.file
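
A possible variant of the awk approach, as a sketch rather than a drop-in replacement: it remembers the order in which ids first appear, so no sort is needed (a plain sort compares text, so an id like 100 would land before 12), and it keeps everything after the id instead of only $2. The function name group_log is an assumption for illustration.

```shell
# Hypothetical helper: group lines by their first field, keeping first-seen order.
# Reads the files given as arguments, or stdin if there are none.
group_log() {
  awk '
    NF == 0 { next }                        # skip blank lines
    {
      key = $1
      rest = $0
      sub(/^[ \t]*[^ \t]+[ \t]*/, "", rest) # drop the id and the whitespace after it
      if (!(key in val)) {                  # first time this id shows up
        order[++n] = key
        val[key] = rest
      } else {
        val[key] = val[key] ", " rest
      }
    }
    END { for (i = 1; i <= n; i++) print order[i], val[order[i]] }
  ' "$@"
}
```

Usage would be something like group_log log.file > new.file. Like the version above, it also groups lines with the same id that are not adjacent.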


In Vim, I would use the command

:g/^\d\+/y|if+@"==getline(line('.')-1)|s//,/|-j!

if it is guaranteed that the first column always contains numeric ids.

Otherwise, I would modify that if-condition as follows.

:g/^\S\+/y|if matchstr(@",@/)==matchstr(getline(line('.')-1),@/)|s//,/|-j!


Another way to do this, with a macro this time (I advise you to use another solution, this one just shows that there are plenty of ways to do it):

gg:%s/$/,<Enter>qa0V?<C-r><C-w>\>\&^<Enter>Jjq100@a:%s/.$//<Enter>

explanation:

  • gg => go to start of file
  • :%s/$/, => append comma to every line
  • qa => start recording a macro into register a
  • 0V => go to first column, and start linewise selection
  • ? => search backwards (you must have set wrapscan)
    • <C-r><C-w> (ctrl-r ctrl-w) inserts the word under the cursor.
    • \> ensures end of word
    • \&^ ensures the pattern matches at the start of a line. You cannot put ^ at the beginning of the pattern, because with incsearch set the cursor jumps to the previous line as soon as you type ^, so ctrl-r ctrl-w would then insert the word from that line instead.
  • J will join all lines from the visual selection with spaces.
  • j will go to next line
  • q will stop recording macro
  • 100@a will play macro 100 times.
  • :%s/.$// will remove trailing commas.


I do not think it is a good idea to use regular expressions here. The same idea as in @glenn jackman's solution, written in Vimscript, looks like this:

function JoinLog()
    let d={}
    g/\v^\S+\s/let [ds, k, t; dl]=matchlist(getline('.'), '\v^(\S+)\s+(.*)') |
              \let d[k]=get(d, k, [])+[t]
    %delete _
    call setline(1, map(sort(keys(d)), 'v:val." ".join(d[v:val], ", ")'))
endfunction

You can keep the order instead of sorting:

function JoinLog()
    let d={}
    let ordered=[]
    g/\v^\S+\s/let [ds, k, t; dl]=matchlist(getline('.'), '\v^(\S+)\s+(.*)') |
              \if has_key(d, k) | let d[k]+=[t] |
              \else             | let ordered+=[k] | let d[k]=[t] |
              \endif
    %delete _
    call setline(1, map(copy(ordered), 'v:val." ".join(d[v:val], ", ")'))
endfunction