开发者

Indentation control while developing a small python like language

I'm developing a small python like language using flex, byacc (for lexical and parsing) and C++, but i have a few questions regarding scope control.

just as python it uses white spaces (or tabs) for indentation, not only that but i want to implement index breaking like for instance if you type "break 2" inside a while loop that's inside another while loop it would no开发者_运维问答t only break from the last one but from the first loop as well (hence the number 2 after break) and so on.

example:

while 1
    while 1
        break 2
        'hello world'!! #will never reach this. "!!" outputs with a newline
    end
    'hello world again'!! #also will never reach this. again "!!" used for cout
end
#after break 2 it would jump right here

but since I don't have an "anti" tab character to check when a scope ends (like C for example i would just use the '}' char) i was wondering if this method would the the best:

I would define a global variable, like "int tabIndex" on my yacc file that i would access in my lex file using extern. then every time i find a tab character on my lex file i would increment that variable by 1. when parsing on my yacc file if i find a "break" keyword i would decrement by the amount typed after it from the tabIndex variable, and when i reach and EOF after compiling and i get a tabIndex != 0 i would output compilation error.

now the problem is, whats the best way to see if the indentation got reduced, should i read \b (backspace) chars from lex and then reduce the tabIndex variable (when the user doesn't use break)?

another method to achieve this?

also just another small question, i want every executable to have its starting point on the function called start() should i hardcode this onto my yacc file?

sorry for the long question any help is greatly appreciated. also if someone can provide an yacc file for python would be nice as a guideline (tried looking on Google and had no luck).

thanks in advance.


I am currently implementing a programming language rather similar to this (including the multilevel break oddly enough). My solution was to have the tokenizer emit indent and dedent tokens based on indentation. Eg:

while 1: # colons help :)
    print('foo')
    break 1

becomes:

["while", "1", ":",
    indent,
    "print", "(", "'foo'", ")",
    "break", "1",
    dedent]

It makes the tokenizer's handling of '\n' somewhat complicated though. Also, i wrote the tokenizer and parser from scratch, so i'm not sure whether this is feasable in lex and yacc.

Edit:

Semi-working pseudocode example:

level = 0
levels = []
for c = getc():
    if c=='\n':
        emit('\n')
        n = 0
        while (c=getc())==' ':
            n += 1
        if n > level:
            emit(indent)
            push(levels,n)
        while n < level:
            emit(dedent)
            level = pop(levels)
            if level < n:
                error tokenize
        # fall through
    emit(c) #lazy example


Very interesting exercise. Can't you use the end keyword to check when the scope ends?

On a different note, I have never seen a language that allows you to break out of several nested loops at once. There may be a good reason for that...

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜