开发者

Is there a more efficient way to calculate the next line/column number from a multiline token?

So I am working on constructing a lexer/parser pair using parser combinators which leaves me with some interesting problems. Now the specific problem this question is regarding I have actually solved but I am not completely happy with my solution.

module P开发者_如何转开发rogram =            

    type Token = { value:string; line:int; column:int; }

    let lineTerminators = set ['\u000A'; '\u000D'; '\u2028'; '\u2029']

    let main () =

        let token = { value = "/*\r\n *\r\n *\r\n \n */"; line = 1; column = 1; }

        let chars = token.value.ToCharArray()

        let totalLines =
            chars
            |> Array.mapi(
                fun i c ->
                    if not (lineTerminators.Contains c)  then 
                        0 
                    else
                        if c <> '\n' || i = 0 || chars.[i - 1] <> '\r' then 
                            1 
                        else 
                            0
            )
            |> Array.sum

        let nextLine = token.line + totalLines

        let nextColumn =
            if totalLines = 0 then 
                token.column + token.value.Length 
            else
                1 + (chars 
                     |> Array.rev 
                     |> Array.findIndex lineTerminators.Contains)


        System.Console.ReadKey true |> ignore

    main()


One problem with your implementation is that it originally seems to expect that all line terminators are just single characters, but that's actually not the case - if you treat "\r\n" as a single line terminator (composed from 2 characters), then the situation should be more clear. For example, I would declare terminators like this:

let terminators = [ ['\r'; '\n']; ['\r']; ['\n'] ]

The order is significant - if we find "\r\n" first then we want to skip 2 characters (so that we don't count the next '\n' char as the next terminator). Unfortunately "skip 2 characters" is a bit tricky - it cannot be done using mapi function, which calls the function for each element.

A direct implementation using a recursive function could look like this:

let input = "aaa\nbbb\r\nccc" |> List.ofSeq

// Returns Some() if input starts with token (and the returned
// value is the rest of the input with the starting token removed)
let rec equalStart input token = 
  match input, token with
  | _, [] -> Some(input) // End of recursion
  | a::input, b::token when a = b -> equalStart input token // Recursive call
  | _ -> None // Mismatch

// Counts the number of lines...
let rec countLines count input = 
  // Pick first terminator that matches with the beginning of the input (if any)
  let term = terminators |> List.tryPick (equalStart input)
  match term, input with 
  | None, _::input -> countLines count input  // Nothing found - continue
  | None, [] -> count + 1 // At the end, add the last line & return
  | Some(rest), _ -> countLines (count + 1) rest  // Found - add line & continue

If you were using some parser combinator library (such as FParsec), then you could use built-in parsers for most of the things. I didn't actually try this, but here is a rough sketch - you could store the list of terminators as a list of strings and generate parser for each of the string:

let terminators = [ "\r\n"; "\r"; "\n" ]
let parsers = [ for t in terminators -> 
                  parser { let! _ = pstring t  
                           return 1 } ] // Return '1' because we found next line

This gives you a list of parsers that return 1 when there is a terminator at the end - now you could aggregate all of them using <|> (or combinator) and then run the composed parser. If that fails, you can skip the first character (combine this with another parser) and continue recursively. The only problem is that parser combinators usually return all possible derivations ("\r\n" can be interpreted as two line breaks..), so you would need to get just the first result...

(From your question, it wasn't clear whether you actually want to use some parser combinator library or not, so I didn't elaborate on that topic - if you're interested, you can ask for more details...)

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜