
Is there finer granularity than LOAD/NEXT for reading structured data?

Imagine that I have a long file of Rebol-formatted data, a million lines or so, that looks something like this:

REBOL []

[
    [employee name: {Tony Romero} salary: $10,203.04]
    [employee name: {Marcus "Marco" Marcami} salary: default]
    [employee name: {Serena Derella} salary: ($10,000 + $203.04)]

...

    [employee name: {Stacey Christie} salary: (10% * $102,030.40)]
]

If the enclosing block wasn't there, I could use LOAD/NEXT to read through the employee items one at a time (as opposed to parsing the entire file into structured data with LOAD). Is there any way to do something similar if the enclosing block is there?

What if I wanted to go back to a previously visited item? Could there be a "structural seek"?

Is there a viable database solution for this kind of Rebol-structured data, ideally one that also permits random-access insertions?


I recall that it was you who proved that this should be doable in PARSE? ;-)

Nevertheless, to give you a useful answer: the code I wrote for the link text does, in essence, exactly that. It parses REBOL source itself rather than relying on the default LOAD/NEXT, for cases where you need something else. So have a look, read the documentation, run the tests, write some tests of your own, and if you have more questions, just ask.


If you are happy to tweak your file format a little so it is a file with one record per line, no enclosing blocks nor REBOL header:

employee-name: {Tony Romero} salary: $10203.04
employee-name: {Marcus "Marco" Marcami} salary: 'default
employee-name: {Serena Derella} salary: ($10000 + $203.04)
employee-name: {Stacey Christie} salary: (10% * $102030.40)

Then....

data: read/lines %data-file.txt

....gets you a block of unloaded strings

One way to work with them is like this:

foreach record data [
    record: make object! load/all record
    probe record
]

I had to tweak your data format too to make it easily loadable by REBOL:

  • employee-name rather than employee name
  • $10203.04 rather than $10'203.04
  • 10% -- only works with REBOL3, where percent! is a datatype

If you can't tweak the data format like that, you could always do some edits on each string prior to LOAD/ALL to normalise it for REBOL.
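
As a rough illustration of that kind of pre-LOAD edit, here is a minimal sketch assuming REBOL 2 and a file with one unbracketed record per line that still uses the original spellings `employee name:` and `salary: default`. The helper name `normalise` and the two substitutions are my own assumptions, not part of the original data format:

```rebol
; Hypothetical helper: patch one raw line so that MAKE OBJECT! can evaluate it.
normalise: func [line [string!] /local fixed] [
    fixed: copy line                                     ; leave the caller's string untouched
    replace fixed "employee name:" "employee-name:"      ; two words -> one set-word
    replace fixed "salary: default" "salary: 'default"   ; quote the bare word so it is not evaluated
    fixed
]

foreach line read/lines %data-file.txt [
    record: make object! load/all normalise line
    probe record
]
```

Plain string replacement like this is fragile: it would also rewrite a matching phrase that happens to appear inside an employee's name, so it only suits data you control.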


Sunanda's answer breaks down if a record can span multiple lines. You can use something like this instead:

data: {REBOL []

[
    [employee name: {Tony Romero} salary: $10'203.04]
    [employee name: {Marcus "Marco" Marcami} salary: default]
    [employee name: {Serena Derella} salary: ($10'000 + $203.04)]
]}

unless all [
    set [value data] load/next data
    value = 'REBOL
][  print "Not a REBOL data file!" halt ]
set [header data] load/next data
print ["data-file-header:" mold header]
data: find/tail data #"["

attempt [
    ;you must use attempt as there will be at least one error at the end of file!
    ;** Syntax Error: Missing [ at end-of-block
    indexes: copy []
    while [
        append indexes data
        set [loaded-row data] load/next data
        data
    ][
        probe loaded-row
    ]

]
print "done"

remove back tail indexes ;removes the last erroneous position

foreach data-at-pos reverse indexes [
    probe first load/next data-at-pos
]

So the output would be:

[employee name: "Tony Romero" salary: $10203.04]
[employee name: {Marcus "Marco" Marcami} salary: default]
[employee name: "Serena Derella" salary: ($10000.00 + $203.04)]
done
[employee name: "Serena Derella" salary: ($10000.00 + $203.04)]
[employee name: {Marcus "Marco" Marcami} salary: default]
[employee name: "Tony Romero" salary: $10203.04]
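
The saved positions can also serve as a crude "structural seek". The following is only a sketch, assuming REBOL 2 and the `indexes` block built by the code above (after the erroneous last position has been removed); `seek-record` is a hypothetical helper name:

```rebol
; Hypothetical helper: re-load the record whose start position is stored
; in slot N of INDEXES. Each slot refers into the same source string,
; so no text is copied; only that one record is loaded again.
seek-record: func [indexes [block!] n [integer!]] [
    first load/next pick indexes n
]

probe seek-record indexes 2   ; jump straight to the second saved position
```

Note that the helper does no bounds checking: PICK returns NONE for an out-of-range `n`, and LOAD/NEXT will then raise an error.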