
Rearranging a string containing repeating patterns of variable length

I have a file with the following layout:

TABLE name_of_table

COLUMNS first_column 2nd_column [..] n-th_column

VALUES 1st_value 2nd_value [...] n-th value

VALUES yet_another_value ... go on

ANOTHER TABLE repeat from begin.....

I want to have this textfile rearranged for me, so I don't have to type in the TABLE and COLUMNS in front of every VALUES line, yielding:

TABLE name_of_table COLUMNS first_column [..] n-th column VALUES 1st_value

TABLE name_of_table COLUMNS first_column [..] n-th column VALUES yetanother_value

I need to take input and rearrange several lines at once here, so getting the entire textfile as a string with hGetContents seems appropriate, yielding a string like this:

TABLE name_of_table COLUMNS first_column [..] n-th_column VALUES 1st_value [..] n-th_value VALUES another_value [..] yet_another VALUES ...... ANOTHER TABLE .... COLUMNS .... VALUES [....]VALUES ...

I have tried doing this with nested case of's and recursion. This gives me a dilemma I need help with:

1) I need recursion in order to avoid endless case nesting problems.

2) With recursion, I can't prepend the earlier parts of the string in a case alternative, because the recursive call only sees the tail of my string!

Illustrating the problem:

myStr :: [[Char]] -> [[Char]]
myStr [] = []
myStr one = case (head one) of
    "table" -> "insert into":(head two):columnRecursion (three) ++ case (head four) of
        "values" -> (head four):valueRecursion (tail three) ++ myStr (tail four)
        _ -> case head (tail four) of
            "values" -> (head (tail four):myStr (tail (tail four))
            _ ->
  where two = tail one
        three = tail two
        four = tail three

columnRecursion :: [[Char]] -> [[Char]]
columnRecursion [] = []
columnRecursion cool = case (head cool) of
    "columns" -> "(":columnRecursion (tail cool)
    "values" -> [")"]
    _ -> (head cool):columnRecursion (tail cool)

valueRecursion :: [[Char]] -> [[Char]]
valueRecursion foo = case head foo of
    "values" -> "insert into":(head two):columnRecursion (three) ++ valueRecursion (tail foo)
    "table" -> []
    "columns" -> []
    _ -> (head foo):valueRecursion (tail foo)

I wind up with FIRSTPART, VALUES bla bla VALUES bla bla, and I can't fetch FIRSTPART again, to create FIRSTPART, VALUES, FIRSTPART, VALUES, FIRSTPART, VALUES.

My attempt to do this by referencing myStr's local names (two and three) from valueRecursion fails, because they are out of scope there.

What to do??


For me this kind of problem would be just past the use-a-real-parsing-tool threshold. Here's a quick working example with Attoparsec:

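import Control.Applicative
-- Data.Attoparsec and Data.Attoparsec.Char8 are the older, deprecated module
-- names; current attoparsec exposes the same functions under ByteString.
import Data.Attoparsec.ByteString (maybeResult)
import Data.Attoparsec.ByteString.Char8
import qualified Data.Attoparsec.ByteString.Char8 as A (takeWhile)
import qualified Data.ByteString.Char8 as B
import Data.Maybe (fromMaybe)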

data Entry = Entry String [String] [[String]] deriving (Show)

entry = Entry <$> table <*> cols <*> many1 vals
items = sepBy1 (A.takeWhile $ notInClass " \n") $ char ' '
table = string (B.pack "TABLE ") *> many1 (notChar '\n') <* endOfLine
cols = string (B.pack "COLUMNS ") *> (map B.unpack <$> items) <* endOfLine
vals = string (B.pack "VALUES ")  *> (map B.unpack <$> items) <* endOfLine

parseEntries :: B.ByteString -> Maybe [Entry]
parseEntries = maybeResult . flip feed B.empty . parse (sepBy1 entry skipSpace)

And a bit of machinery:

pretty :: Entry -> String
pretty (Entry t cs vs)
  = unwords $ ["TABLE", t, "COLUMNS"]
  ++ cs ++ concatMap ("VALUES" :) vs

layout :: B.ByteString -> Maybe String
layout = (unlines . map pretty <$>) . parseEntries

testLayout :: FilePath -> IO ()
testLayout f = putStr . fromMaybe [] =<< layout <$> B.readFile f

And given this input:

TABLE test
COLUMNS a b c
VALUES 1 2 3
VALUES 4 5 6

TABLE another
COLUMNS x y z q
VALUES 7 8 9 10
VALUES 1 2 3 4

We get the following:

*Main> testLayout "test.dat" 
TABLE test COLUMNS a b c VALUES 1 2 3 VALUES 4 5 6
TABLE another COLUMNS x y z q VALUES 7 8 9 10 VALUES 1 2 3 4

Which seems to be what you want?
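
Note that the question's sample output repeats the TABLE and COLUMNS prefix before every VALUES row, whereas pretty above puts all rows of a table on one line. If the per-row layout is what is needed, a minimal sketch might look like the following. prettyRows is a hypothetical name, not part of the code above, and Entry is repeated only so the fragment stands alone.

```haskell
-- Same shape as the Entry type above: table name, column names, value rows.
data Entry = Entry String [String] [[String]] deriving (Show)

-- Hypothetical helper: one output line per VALUES row, each line prefixed
-- with the table's TABLE/COLUMNS header.
prettyRows :: Entry -> [String]
prettyRows (Entry t cs vs) =
  [ unwords (header ++ "VALUES" : row) | row <- vs ]
  where
    header = ["TABLE", t, "COLUMNS"] ++ cs
```

Under that assumption, layout would use concatMap prettyRows in place of map pretty; the parser itself should not need to change.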


This answer is literate Haskell, so you can copy-and-paste it into a file named table.lhs to get a working program.

Beginning with a few imports

> import Control.Arrow ((&&&))
> import Control.Monad (forM_)
> import Data.List (intercalate,isPrefixOf)
> import Data.Maybe (fromJust)

and say we represent a table with the following record:

> data Table = Table { tblName :: String
>                    , tblCols :: [String]
>                    , tblVals :: [String]
>                    }
>   deriving (Show)

That is, we record the table's name, the list of column names, and the list of column values.

Each table in the input starts on a line beginning with TABLE, so separate all lines in the input into chunks accordingly:

> tables :: [String] -> [Table]
> tables [] = []
> tables xs = next : tables ys
>   where next = mkTable (th:tt)
>         (th:rest) = dropWhile (not . isTable) xs
>         (tt,ys) = break isTable rest
>         isTable = ("TABLE" `isPrefixOf`)

Having chunked the input into tables, the name of a given table is the first word on the TABLE line. The column names are all words that appear on COLUMNS lines, and column values come from VALUES lines:

> mkTable :: [String] -> Table
> mkTable xs = Table name cols vals
>   where name = head $ fromJust $ lookup "TABLE" tagged
>         cols = grab "COLUMNS"
>         vals = grab "VALUES"
>         grab t = concatMap snd $ filter ((== t) . fst) tagged
>         tagged = map ((head &&& tail) . words)
>                $ filter (not . null) xs
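
As written, mkTable relies on fromJust and head, so a chunk without a TABLE line, or a TABLE line with no name, would crash at runtime. A possible total variant is sketched below. mkTableSafe is a hypothetical name, and the Table record is repeated (with an extra Eq instance) only to keep the fragment self-contained.

```haskell
import Data.Maybe (listToMaybe)

-- Same record as above; Eq is added here only to make results comparable.
data Table = Table { tblName :: String
                   , tblCols :: [String]
                   , tblVals :: [String]
                   }
  deriving (Show, Eq)

-- Hypothetical total variant of mkTable: returns Nothing when the chunk has
-- no TABLE line or the TABLE line carries no name.
mkTableSafe :: [String] -> Maybe Table
mkTableSafe xs = do
  nameWords <- lookup "TABLE" tagged
  name      <- listToMaybe nameWords
  pure (Table name (grab "COLUMNS") (grab "VALUES"))
  where
    -- Pair each non-blank line's first word with its remaining words;
    -- blank lines fail the (w:ws) pattern and are skipped.
    tagged = [ (w, ws) | (w:ws) <- map words xs ]
    grab t = concat [ ws | (k, ws) <- tagged, k == t ]
```

Callers would then decide what to do with a Nothing, for example by skipping malformed chunks.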

Given a Table record, we print it by pasting the names, values, and SQL keywords together in the appropriate order on a single line:

> main :: IO ()
> main = do
>   input <- readFile "input"
>   forM_ (tables $ lines input) $
>     \t -> do putStrLn $ intercalate " " $
>                 "TABLE"   : (tblName t)  :
>                ("COLUMNS" : (tblCols t)) ++
>                ("VALUES"  : (tblVals t))

Given the unimaginative input of

TABLE name_of_table

COLUMNS first_column 2nd_column [..] n-th_column

VALUES 1st_value 2nd_value [...] n-th value

VALUES yet_another_value ... go on

TABLE name_of_table

COLUMNS first_column 2nd_column [..] n-th_column

VALUES 1st_value 2nd_value [...] n-th value

VALUES yet_another_value ... go on

the output is

$ runhaskell table.lhs
TABLE name_of_table COLUMNS first_column 2nd_column [..] n-th_column VALUES 1st_value 2nd_value [...] n-th value yet_another_value ... go on
TABLE name_of_table COLUMNS first_column 2nd_column [..] n-th_column VALUES 1st_value 2nd_value [...] n-th value yet_another_value ... go on
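
The output above folds both VALUES lines into a single VALUES clause, while the question asks for the TABLE and COLUMNS prefix to be repeated in front of each VALUES line. One way to get that, and to get around the "recursion only sees the tail" dilemma, is to pass the current header along as extra arguments of the recursive function. The following is only a sketch under that assumption; expand and go are hypothetical names.

```haskell
-- Walk the lines once, carrying the most recent TABLE words and the
-- accumulated COLUMNS words as arguments, so that every VALUES line can be
-- emitted with the full header in front of it.
expand :: [String] -> [String]
expand = go [] []
  where
    go _ _ [] = []
    go tbl cols (l:ls) = case words l of
      ("TABLE"   : ws) -> go ws [] ls             -- new table: reset columns
      ("COLUMNS" : ws) -> go tbl (cols ++ ws) ls  -- remember column names
      ("VALUES"  : ws) ->
        unwords (["TABLE"] ++ tbl ++ ["COLUMNS"] ++ cols ++ ["VALUES"] ++ ws)
          : go tbl cols ls
      _                -> go tbl cols ls          -- blank or unknown line
```

It could be applied with something like mapM_ putStrLn . expand . lines on the file contents. Because the header travels with the recursion, no case alternative needs to look back at earlier input.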