Extracting list of numbers from plain text file

I have a text file containing a series of numbers following a similar pattern:

<Lorepsum ipsum lores aus Lorep NUM="100" aus Lore>

<Lorepsum ipsum lores aus Lorpsum NUM="101" Lorepsum>

<Lorepsum ipsum lores aus Lorp77dsum NUM="102" ipsum lores aus>

<Lorepsum ipsum lores aus Lopsum NUM="103" lores aus>

Is it possible to write a Windows batch script to extract the numbers from the file and put them into a new file?

The output file should contain:

100
101
102
103


Yes, but it's not very pretty. The obvious candidate for this would be regular expressions, which in batch files you only have for matching (and even then only in a very limited form). If you used PowerShell, it would just be

Get-Content foo.txt | ForEach-Object {
    [Regex]::Match($_,  'NUM="(\d+)"').Groups[1].Value
}

But sadly, in a batch file this is a little more complicated.

You can, however, use for /f to parse the file and then examine the tokens. There is no easy way to parse a line token by token, though. And tokenizing stops after 31 tokens (if I remember correctly). In any case, the following does work:

@echo off
for /f "delims=" %%f in (foo.txt) do call :parse "%%f"
goto :eof

:parse
setlocal enabledelayedexpansion
set i=0
:parseImpl
set /a i+=1
(
  for /f "tokens=%i% delims= " %%l in (%1) do (
    rem Jump out if no more tokens are there
    if "%%l"=="" goto :eof
    rem Remember the token
    set T=%%l
    if "!T:~0,4!"=="NUM=" (
      set N=!T:~4!
      rem add redirection here if needed
      echo !N:"=!
    )
  )
) || goto :eof
rem This above will cause the loop to stop once no more tokens are there.
rem The for loop will return a non-zero exit code then.
goto parseImpl

It's not too pretty, but fairly straightforward. Since each line can be used only once while reading the file, I delegate the work to a subroutine which goes over the line as often as necessary. The variable i keeps track of the current token number, and another for loop extracts the requested token from the string. If the token starts with NUM=, it is assumed to be the number you want; it is then cleaned up and printed.
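
To see the two building blocks in isolation, here is a minimal sketch. The sample string and the token value are made up for illustration (simplified from the data in the question); they are not part of the script above.

```batch
@echo off
setlocal enabledelayedexpansion

rem 1) tokens=N selects only the Nth space-delimited token of a string.
rem    The sample string is hypothetical; this should echo NUM=100.
for /f "tokens=3 delims= " %%l in ("lores aus NUM=100 Lorepsum") do echo %%l

rem 2) Prefix test, substring and quote removal on a single token,
rem    mirroring what the :parseImpl loop does with each token.
set T=NUM="100"
if "!T:~0,4!"=="NUM=" (
  rem Drop the first four characters (NUM=), leaving "100" with quotes
  set N=!T:~4!
  rem Replace every double quote with nothing, leaving 100
  echo !N:"=!
)
```

Incrementing the token number and repeating step 1 until the for loop yields nothing is what lets the subroutine walk a line of unknown length.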

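If you want them written directly to a file, change the respective line to the following. Note the >> to append: a single > would overwrite the file on every echo and leave only the last number.

>>out.txt echo !N:"=!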

The code can also be found in my SVN.


This should get you started:

@echo off
set cnt=0
set max=9
:enter_loop

if %cnt% GTR %max% goto end_loop
echo NUM="%cnt%" >> output.txt
set /a cnt="cnt+1"
goto enter_loop

:end_loop

pause