JSON to fixed width file

2023-02-14 10:08 Asked by:

I have to extract data from a JSON file based on a specific key. The data then has to be filtered (based on the key's value) and split into separate fixed-width flat files. I have to develop the solution using shell scripting.

Since the data is just key:value pairs, I can extract it by processing each line of the JSON file, checking the type, and writing the values to the corresponding fixed-width file.

My problem is that the input JSON file is approximately 5 GB in size. My method is very basic, and I would like to know whether there is a better way to achieve this using shell scripting.

A sample of the JSON file looks like this:

{"Type":"Mail","id":"101","Subject":"How are you ?","Attachment":"true"}
{"Type":"Chat","id":"12ABD","Mode:Online"}

The above is a sample of the kind of data I need to process.

Give this a try:

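#!/usr/bin/awk -f
{
    line = ""
    file = "missing_type"          # fallback name if a record has no "Type" key
    gsub(/[{}"]/, "")              # strip braces and double quotes from the record
    f = split($0, a, /[:,]/)       # a[1]=key1, a[2]=value1, a[3]=key2, ...
    for (i = 1; i <= f; i++)
        if (a[i] == "Type")
            file = a[++i]          # the value after "Type" becomes the file name
        else
            line = line sprintf("%-15s", a[i])
    print line > (file ".fixed.out")   # parentheses keep the concatenated file name portable
}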

I made assumptions based on the sample data provided, and much of the script may need to change if the real data differs much from what you've shown. In particular, it will not work properly if any field names or values contain colons, commas, quotes or braces. If that is a problem, it is one of the primary reasons to use a proper JSON parser. If it were my assignment, I'd push back hard on this point to get permission to use the proper tools.
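To see why, here is a small sketch that runs the same gsub/split steps as the script above on a single record (the helper name count_split_fields is hypothetical). Two key:value pairs should produce four pieces, but a comma inside the Subject value produces a fifth, which shifts every later field out of place.

```shell
#!/bin/bash
# Count the pieces the awk answer's gsub/split logic produces for one JSON line.
# count_split_fields is a hypothetical helper name used only for illustration.
count_split_fields() {
    printf '%s\n' "$1" | awk '{
        gsub(/[{}"]/, "")             # strip braces and quotes, as the answer does
        n = split($0, parts, /[:,]/)  # split on every colon and comma
        print n
    }'
}

# Two key:value pairs give 4 pieces...
count_split_fields '{"Type":"Mail","Subject":"Hello"}'      # prints 4
# ...but a comma inside a value adds an extra piece.
count_split_fields '{"Type":"Mail","Subject":"Hi, there"}'  # prints 5
```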

This outputs lines that have type "Mail" to a file named "Mail.fixed.out" and type "Chat" to "Chat.fixed.out", etc.

The "Type" field name and field value ("Mail", etc.) are not output as part of the contents. This can be changed.

Otherwise, both the field names and values are output. This can be changed.

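The field widths are all fixed at 15 characters, padded with spaces, with no delimiters. The width can be changed as well. Note that the format %-15s only pads: a value longer than 15 characters is not truncated and will push the following columns out of alignment, so use %-15.15s if every field must be exactly 15 characters wide.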
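If you want values only (no field names) and a configurable width that also truncates long values, a variation along these lines might work. It is a sketch under the same assumptions as the awk script above (no colons, commas, quotes or braces inside keys or values); the function name json_to_fixed and its width argument are hypothetical. The format string is built at runtime, e.g. "%-15.15s", so every field is padded and truncated to exactly the requested width.

```shell
#!/bin/bash
# Sketch: route JSON lines to <Type>.fixed.out as fixed-width, values-only records.
# json_to_fixed and its width argument are hypothetical names for illustration.
# Assumes keys/values contain no colons, commas, quotes or braces.
json_to_fixed() {
    local width="${1:-15}"    # field width in characters (default 15)
    awk -v width="$width" '
    BEGIN { fmt = "%-" width "." width "s" }   # e.g. "%-15.15s": pad AND truncate
    {
        line = ""
        file = "missing_type"                  # used if the record has no Type key
        gsub(/[{}"]/, "")                      # strip braces and quotes
        n = split($0, a, /[:,]/)               # key1, value1, key2, value2, ...
        for (i = 1; i + 1 <= n; i += 2) {      # walk the key/value pairs
            if (a[i] == "Type")
                file = a[i + 1]
            else
                line = line sprintf(fmt, a[i + 1])   # value only, no field name
        }
        print line > (file ".fixed.out")
    }'
}
```

Usage: json_to_fixed 10 < input.json writes 10-character columns to Mail.fixed.out, Chat.fixed.out, and so on in the current directory.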

Let me know how close this comes to what you're looking for and I can make some adjustments.

perl script

#!/usr/bin/perl -w
use strict;
use warnings;

no strict 'refs'; # for FileCache
use FileCache; # avoid exceeding system's maximum number of file descriptors
use JSON;

my $type;
my $json = JSON->new->utf8(1); #NOTE: expect utf-8 strings

while(my $line = <>) { # for each input line
    # extract type
    eval { $type = $json->decode($line)->{Type} };
    $type = 'json_decode_error' if $@;
    $type ||= 'missing_type';

    # print to the appropriate file
    my $fh = cacheout '>>', "$type.out";
    print $fh $line; #NOTE: use cache if there are too many hdd seeks
}

corresponding shell script

#!/bin/bash
#NOTE: bash is used to create non-ascii filenames correctly

__extract_type()
{
    perl -MJSON -e 'print from_json(shift)->{Type}' "$1"
}

__process_input()
{
    local line type
    # for each input line; -r keeps JSON backslash escapes intact,
    # and the -n test keeps a final line that lacks a trailing newline
    while IFS= read -r line || [ -n "$line" ]; do
        # extract type
        type="$(__extract_type "$line" 2>/dev/null ||
            echo json_decode_error)"
        [ -z "$type" ] && type=missing_type

        # print to the appropriate file (printf, unlike echo, never mangles the line)
        printf '%s\n' "$line" >> "$type.out"
    done
}

__process_input

Example:

$ ./script-name < input_file
$ ls -1 *.out
json_decode_error.out
Mail.out
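For a 5 GB input, the shell script above starts a separate perl process for every line, which is likely to be very slow. If validating the JSON is not required, a single awk process can do the same routing. This is a swapped-in technique, not part of the answer: it extracts the value with a regular expression, assumes the exact form "Type":"value" (no whitespace around the colon, no escaped quotes in the value), and does not detect malformed JSON, so the sample's Chat record (whose "Mode:Online" is not a valid key:value pair) would go to Chat.out rather than json_decode_error.out. The function name route_by_type is hypothetical.

```shell
#!/bin/bash
# Sketch: append each input line to <Type>.out using one awk process.
# route_by_type is a hypothetical name. No JSON validation is done: the Type
# value is found with a regex that assumes the exact form "Type":"value".
# Very many distinct Type values may exceed the open-file limit in some awks.
route_by_type() {
    awk '{
        type = "missing_type"
        if (match($0, /"Type":"[^"]*"/)) {
            # skip the 8 characters of "Type":" and drop the closing quote
            type = substr($0, RSTART + 8, RLENGTH - 9)
            if (type == "") type = "missing_type"
        }
        out = type ".out"
        print >> out    # append, like >> in the shell version
    }'
}
```

Usage mirrors the script above: route_by_type < input_file.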
