Is there a more efficient way to calculate the next line/column number from a multiline token?

2023-01-16 02:10 问答作者：

So I am working on constructing a lexer/parser pair using parser combinators which leaves me with some interesting problems. Now the specific problem this question is regarding I have actually solved but I am not completely happy with my solution.

module P开发者_如何转开发rogram =            

    type Token = { value:string; line:int; column:int; }

    let lineTerminators = set ['\u000A'; '\u000D'; '\u2028'; '\u2029']

    let main () =

        let token = { value = "/*\r\n *\r\n *\r\n \n */"; line = 1; column = 1; }

        let chars = token.value.ToCharArray()

        let totalLines =
            chars
            |> Array.mapi(
                fun i c ->
                    if not (lineTerminators.Contains c)  then 
                        0 
                    else
                        if c <> '\n' || i = 0 || chars.[i - 1] <> '\r' then 
                            1 
                        else 
                            0
            )
            |> Array.sum

        let nextLine = token.line + totalLines

        let nextColumn =
            if totalLines = 0 then 
                token.column + token.value.Length 
            else
                1 + (chars 
                     |> Array.rev 
                     |> Array.findIndex lineTerminators.Contains)


        System.Console.ReadKey true |> ignore

    main()

One problem with your implementation is that it originally seems to expect that all line terminators are just single characters, but that's actually not the case - if you treat "\r\n" as a single line terminator (composed from 2 characters), then the situation should be more clear. For example, I would declare terminators like this:

let terminators = [ ['\r'; '\n']; ['\r']; ['\n'] ]

The order is significant - if we find "\r\n" first then we want to skip 2 characters (so that we don't count the next '\n' char as the next terminator). Unfortunately "skip 2 characters" is a bit tricky - it cannot be done using mapi function, which calls the function for each element.

A direct implementation using a recursive function could look like this:

let input = "aaa\nbbb\r\nccc" |> List.ofSeq

// Returns Some() if input starts with token (and the returned
// value is the rest of the input with the starting token removed)
let rec equalStart input token = 
  match input, token with
  | _, [] -> Some(input) // End of recursion
  | a::input, b::token when a = b -> equalStart input token // Recursive call
  | _ -> None // Mismatch

// Counts the number of lines...
let rec countLines count input = 
  // Pick first terminator that matches with the beginning of the input (if any)
  let term = terminators |> List.tryPick (equalStart input)
  match term, input with 
  | None, _::input -> countLines count input  // Nothing found - continue
  | None, [] -> count + 1 // At the end, add the last line & return
  | Some(rest), _ -> countLines (count + 1) rest  // Found - add line & continue

If you were using some parser combinator library (such as FParsec), then you could use built-in parsers for most of the things. I didn't actually try this, but here is a rough sketch - you could store the list of terminators as a list of strings and generate parser for each of the string:

let terminators = [ "\r\n"; "\r"; "\n" ]
let parsers = [ for t in terminators -> 
                  parser { let! _ = pstring t  
                           return 1 } ] // Return '1' because we found next line

This gives you a list of parsers that return 1 when there is a terminator at the end - now you could aggregate all of them using <|> (or combinator) and then run the composed parser. If that fails, you can skip the first character (combine this with another parser) and continue recursively. The only problem is that parser combinators usually return all possible derivations ("\r\n" can be interpreted as two line breaks..), so you would need to get just the first result...

(From your question, it wasn't clear whether you actually want to use some parser combinator library or not, so I didn't elaborate on that topic - if you're interested, you can ask for more details...)

继续阅读：f#

Is there a more efficient way to calculate the next line/column number from a multiline token?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

王昌瑞《潜梦追凶》剧组庆生新锐演员未来可期？

Is it allowed to ask users to enter credit card details for own payment method?

Escaping "<" in Perl-generated XML

imessage会显示已读吗？

微信重新建群怎么建？