Read file line-by-line in Amazon S3?

Is it possible to read a file line-by-line with Amazon S3? I'm looking to let people upload large files somewhere, then have some code (probably running on Amazon) read their file line-by-line and do something with it, probably in a multithreaded, map-reduce fashion. Or maybe just being able to load 1000 lines at a time... Any suggestions?


Amazon S3 does support range requests, but it's not designed to read a file line by line.

However, it looks like Amazon Elastic MapReduce might be a good fit for what you are looking for. Transfers between S3 and the EC2 instances it uses are very fast, and you can then divide up the work any way you please.
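For reference, a range request with the AWS SDK for PHP looks roughly like the sketch below (the region, bucket, and key are placeholders, and credentials are assumed to come from the environment). Note that ranges are specified in bytes, not lines, so a chunk will usually end mid-line; the trailing partial line has to be carried over to the next chunk:

```php
<?php
// Sketch only: assumes the AWS SDK for PHP (aws/aws-sdk-php) is installed
// via Composer and credentials are available in the environment.
require 'vendor/autoload.php';

use Aws\S3\S3Client;

$s3 = new S3Client([
    'region'  => 'us-east-1',   // placeholder region
    'version' => 'latest',
]);

// Fetch only the first 64 KiB of the object.
$result = $s3->getObject([
    'Bucket' => 'my-bucket',        // placeholder bucket
    'Key'    => 'large-file.txt',   // placeholder key
    'Range'  => 'bytes=0-65535',
]);

$chunk = (string) $result['Body'];

// Split into lines; the last element is likely an incomplete line,
// so pop it off and prepend it to the next chunk you fetch.
$lines = explode("\n", $chunk);
$partial = array_pop($lines);

foreach ($lines as $line) {
    // do something with each complete line...
}
```

This is why range requests alone are awkward for line-oriented processing: the byte boundaries never line up with line boundaries, so the caller must stitch chunks together.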


Here is a simple example, using PHP 7 and Laravel 5, of how to read a file line-by-line from Amazon S3:

S3StreamReader.php

<?php
declare(strict_types=1);

namespace App\Helpers\Json;

use App\Helpers\S3StreamFactory;
use Generator;
use SplFileObject;

final class S3StreamReader
{
    /**
     * @var \App\Helpers\S3StreamFactory
     */
    private $streamFactory;


    /**
     * @param \App\Helpers\S3StreamFactory $s3StreamFactory
     */
    public function __construct(S3StreamFactory $s3StreamFactory)
    {
        $this->streamFactory = $s3StreamFactory;
    }

    /**
     * @param string $filename
     * @return \Generator
     */
    public function get(string $filename): Generator
    {
        $file = new SplFileObject($this->streamFactory->create($filename), 'r');

        while (!$file->eof()) {
            yield $file->fgets();
        }
    }
}

S3StreamFactory.php

<?php
declare(strict_types=1);

namespace App\Helpers;

use League\Flysystem\AwsS3v3\AwsS3Adapter;

final class S3StreamFactory
{
    /**
     * @var \League\Flysystem\AwsS3v3\AwsS3Adapter
     */
    private $adapter;


    /**
     * @param \League\Flysystem\AwsS3v3\AwsS3Adapter $adapter
     */
    public function __construct(AwsS3Adapter $adapter)
    {
        $this->adapter = $adapter;
        $adapter->getClient()->registerStreamWrapper();
    }

    /**
     * @param string $filename
     * @return string
     */
    public function create(string $filename): string
    {
        return "s3://{$this->adapter->getBucket()}/{$filename}";
    }
}

Example of usage:

$lines = (new S3StreamReader(new S3StreamFactory(Storage::disk('s3')->getAdapter())))->get($sourceFile);

while ($lines->valid()) {
    $line = $lines->current();
    // do something with the current line...
    $lines->next();
}

Even if you don't use Laravel, you can still use this code, since Laravel just uses the league/flysystem-aws-s3-v3 package under the hood.
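Outside Laravel you can construct the adapter yourself, roughly like this (the region, bucket, and file path are placeholders); since `get()` returns a `Generator`, you can also iterate it with a plain `foreach`:

```php
<?php
require 'vendor/autoload.php';

use Aws\S3\S3Client;
use League\Flysystem\AwsS3v3\AwsS3Adapter;

// Placeholder configuration; substitute your own region and bucket.
$client = new S3Client([
    'region'  => 'us-east-1',
    'version' => 'latest',
]);

$adapter = new AwsS3Adapter($client, 'my-bucket');

// S3StreamReader/S3StreamFactory are the classes defined above.
$lines = (new S3StreamReader(new S3StreamFactory($adapter)))->get('path/to/file.txt');

foreach ($lines as $line) {
    // process each line...
}
```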


Here's an example snippet in PHP that seems to do what you're asking (it grabs the first 1000 lines in file.txt and concatenates them). It's a bit contrived, but the idea can be implemented in other languages or using other techniques. The key is to treat S3 the same as you would any other file system, like Windows or Linux, the only difference being that you use your S3 key credentials and set the file path to s3://your_directory_tree/your_file.txt:

<?php
    set_time_limit(0);
    include("gs3.php");
    /* fake keys! please put yours */
    define('S3_KEY', 'DA5S4D5A6S4D');
    define('S3_PRIVATE', 'adsadasd');

    $c = "";
    $d = 0;

    $handle = @fopen('s3://mydir/file.txt', "r");
    if ($handle) {
        while (($buffer = fgets($handle)) !== false && $d < 1000) {
            $c .= $buffer; /* concatenate the string (newlines attached) */
            $d += 1;       /* increment the line count */
        }
        /* only an error if we stopped before both EOF and the 1000-line cap */
        if (!feof($handle) && $d < 1000) {
            echo "Error: unexpected fgets() fail\n";
        } else {
            print $c;
        }

        fclose($handle);
    }
?>