开发者

memory usage export from database to csv in php

I need to export data from mysql to csv. I have to select data from several tables putting them in arrays, then process them and return them to browser as .csv. I noticed that arrays consume huge ammount of rows. For example, I imported a .csv in the database which is 1.8M, then I try to export this data from the database in .csv. The memory_get_peak_usage() shows more than 128M to store arrays with data.

For example this small array takes more than 700 bytes:

$startMemory = memory_get_usage();  
        //get constant fields of the subscriber
        $data = array(array('subscriber_id' => 1315444, 'email_address' => 'test0@gmail.com',
                            'first_name' => 'Michael', 'last_name' => 'Allen'));
        echo memory_get_usage() - $startMemory;

So exporting even several megabytes of data, require hundreds megabytes of memory in php script. Is there a way to solve this problem? Tables:

    CREATE TABLE `subscribers` (
     `subscriber_id` int(10) unsigned NOT NULL auto_increment,
开发者_运维问答     `list_id` int(10) unsigned NOT NULL,
     `account_id` int(10) unsigned NOT NULL,
     `email_address` varchar(100) collate utf8_unicode_ci NOT NULL,
     `first_name` varchar(50) collate utf8_unicode_ci NOT NULL default '',
     `last_name` varchar(50) collate utf8_unicode_ci NOT NULL default '',
     `ip` int(10) unsigned default NULL COMMENT '\nThe ip address of the subscriber that we can get when he opens the \nthe email or subscribe using subsribe form.\nTheoretically it can be used to segment by Location (which is not correct if someone uses proxy).',
     `preferred_format` tinyint(4) NOT NULL default '0' COMMENT 'Preferred format of \n0 - HTML, \n1 -Text,\n2 - Mobile',
     `state` tinyint(4) NOT NULL default '1' COMMENT '1 - subscribed\n2 - unsubscribed\n3 - cleaned\n4 - not confirmed, it means the user subscribed but has not confirmed it yet.\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n',
     `cause_of_cleaning` tinyint(4) NOT NULL default '0' COMMENT '\nThis field is the cause of moving the subscriber to the \n0 - not used\n1 - spam complaint\n2 - hard bounce\n3 - several soft bounces',
     `date_added` datetime NOT NULL COMMENT 'The data when the subscriber was added. I suppose this field can be used in the conditions forming the segment',
     `last_changed` datetime NOT NULL,
     PRIMARY KEY  (`subscriber_id`),
     UNIQUE KEY `email_list_id` (`email_address`,`list_id`),
     KEY `FK_list_id` (`list_id`),
     CONSTRAINT `FK_list_id` FOREIGN KEY (`list_id`) REFERENCES `lists` (`list_id`) ON DELETE CASCADE ON UPDATE CASCADE
    ) ENGINE=InnoDB 
    CREATE TABLE `subscribers_multivalued` (
     `id` int(10) unsigned NOT NULL auto_increment,
     `subscriber_id` int(10) unsigned NOT NULL,
     `field_id` int(10) unsigned NOT NULL,
     `value` varchar(100) collate utf8_unicode_ci NOT NULL,
     `account_id` int(10) unsigned NOT NULL COMMENT '\nThe identifier of the account',
     PRIMARY KEY  (`id`),
     KEY `subscriber_fk` (`subscriber_id`),
     KEY `field_fk` (`field_id`),
     CONSTRAINT `field_fk_string_multivalued` FOREIGN KEY (`field_id`) REFERENCES `custom_fields` (`field_id`) ON DELETE CASCADE ON UPDATE CASCADE,
     CONSTRAINT `subscriber_fk_multivalued` FOREIGN KEY (`subscriber_id`) REFERENCES `subscribers` (`subscriber_id`) ON DELETE CASCADE ON UPDATE CASCADE
    ) ENGINE=InnoDB
CREATE TABLE `subscribers_custom_data_string` (
 `subscriber_id` int(10) unsigned NOT NULL,
 `field_id` int(10) unsigned NOT NULL,
 `value` varchar(255) collate utf8_unicode_ci NOT NULL,
 `account_id` int(10) unsigned NOT NULL COMMENT '\nThe identifier of the account',
 PRIMARY KEY  (`subscriber_id`,`field_id`),
 KEY `subscriber_fk` (`subscriber_id`),
 KEY `field_fk` (`field_id`),
 CONSTRAINT `field_fk_string` FOREIGN KEY (`field_id`) REFERENCES `custom_fields` (`field_id`) ON DELETE CASCADE ON UPDATE CASCADE,
 CONSTRAINT `subscriber_fk_string` FOREIGN KEY (`subscriber_id`) REFERENCES `subscribers` (`subscriber_id`) ON DELETE CASCADE ON UPDATE CASCADE
) 

There are other tables for fields similar to the table with strings for numbers, dates. For them primary key is subscriber_id, field_id.

When query fails (for example we have several custom fields):

SELECT subscribers.email_address, subscribers.first_name, subscribers.last_name, GROUP_CONCAT(t1.value SEPARATOR '|') AS Colors, GROUP_CONCAT(t2.value SEPARATOR '|') AS Languages FROM subscribers LEFT JOIN subscribers_multivalued AS t1 ON subscribers.subscriber_id=t1.subscriber_id AND t1.field_id=112 LEFT JOIN subscribers_multivalued AS t2 ON subscribers.subscriber_id=t2.subscriber_id AND t2.field_id=111 WHERE (list_id=40) GROUP BY subscribers.email_address, subscribers.first_name, subscribers.last_name

It would return this:

test1000@gmail.com Michelle Bush Red|Red|Blue|Blue English|Spanish|English|Spanish instead of test1000@gmail.com Michelle Bush Red|Blue English|Spanish

Thank you for any information.


Using just the two tables:

Your original query:

SELECT subscribers.email_address, 
       subscribers.first_name, 
       subscribers.last_name, 
       t1.value AS Languages 
  FROM subscribers 
  LEFT JOIN (SELECT subscriber_id, 
                    field_id, 
                    GROUP_CONCAT(value SEPARATOR '|') AS value 
               FROM subscribers_multivalued 
              WHERE field_id=37 
              GROUP BY subscriber_id, field_id
            ) AS t1 
         ON subscribers.subscriber_id=t1.subscriber_id 
        AND t1.field_id=37 
 WHERE (list_id=49) 
   AND (state=1)

gives an explain plan of:

id  select_type  table                    type  possible_keys  key         key_len  ref    rows  Extra
1   PRIMARY      subscribers              ref   FK_list_id     FK_list_id  4        const  2     Using where
1   PRIMARY      <derived2>               ALL   NULL           NULL        NULL     NULL   5      
2   DERIVED      subscribers_multivalued  ALL   field_fk       field_fk    4               11    Using filesort

My join suggestion:

SELECT subscribers.email_address, 
       subscribers.first_name, 
       subscribers.last_name, 
       GROUP_CONCAT(t1.value SEPARATOR '|') AS Languages 
  FROM subscribers 
  LEFT JOIN subscribers_multivalued t1 
         ON subscribers.subscriber_id=t1.subscriber_id 
        AND t1.field_id=37 
 WHERE (list_id=49) 
   AND (state=1)
 GROUP BY subscribers.email_address, 
          subscribers.first_name, 
          subscribers.last_name

gives an explain plan of:

id  select_type  table        type  possible_keys           key            key_len  ref                             rows  Extra
1   SIMPLE       subscribers  ref   FK_list_id              FK_list_id     4        const                           2     Using where; Using filesort
1   SIMPLE       t1           ref   subscriber_fk,field_fk  subscriber_fk  4        test.subscribers.subscriber_id  1      

While I only populated those two tables with a very small volume of data, this suggests to me that my version of the query will execute more efficiently against the database, because it isn't using the derived table that your query generates.

The other tables could be linked into the query in much the same way, and the entire result spooled directly to a csv file rather than parsed further with PHP.

This should give you a run that is both faster and more memory efficient.

EDIT

SELECT subscribers.email_address, 
       subscribers.first_name, 
       subscribers.last_name, 
       GROUP_CONCAT(DISTINCT t1.value SEPARATOR '|') AS Colors, 
       GROUP_CONCAT(DISTINCT t2.value SEPARATOR '|') AS Languages 
  FROM subscribers 
  LEFT JOIN subscribers_multivalued AS t1 
         ON subscribers.subscriber_id=t1.subscriber_id 
        AND t1.field_id=112 
  LEFT JOIN subscribers_multivalued AS t2 
    ON subscribers.subscriber_id=t2.subscriber_id 
   AND t2.field_id=37 
 WHERE (list_id=49) 
 GROUP BY subscribers.email_address, 
          subscribers.first_name, 
          subscribers.last_name

Note the use of DISTINCT in the GROUP_CONCAT() function


If possible by your business logic, you could do the transformations in mysql and do a

SELECT * from table INTO OUTFILE 'file_name.csv'

it has the same options as LOAD DATA INFILE, the file you are writing to must not exist.


Refactor your code into a single function that reads the data for the Nth line, processes it, outputs that line without output buffering, and the discards all temporary data. Call that function repeatedly. This should reduce your memory usage to only what is needed on a given line, instead of handling all lines together.

Once this is done, you can extend it to read an arbitrary number of lines at the same time, in order to tweak the memory usage and the performance tradeoff this involves.


It looks like to prevent supeflous grouping it's possible to use DISTINCT in aggregate function GROUP_CONCAT:

SELECT `subscribers`.`email_address`, `subscribers`.`first_name`, `subscribers`.`last_name`,
GROUP_CONCAT(DISTINCT t1.value SEPARATOR '|') AS Colors, GROUP_CONCAT(DISTINCT t2.value SEPARATOR '|') AS Languages
FROM `subscribers`
LEFT JOIN `subscribers_multivalued` AS `t1` ON subscribers.subscriber_id=t1.subscriber_id AND t1.field_id=49
LEFT JOIN `subscribers_multivalued` AS `t2` ON subscribers.subscriber_id=t2.subscriber_id AND t2.field_id=48
WHERE (list_id=63)
GROUP BY `subscribers`.`email_address`, `subscribers`.`first_name`, `subscribers`.`last_name`


So exporting even several megabytes of data, require hundreds megabytes of memory in php script. Is there a way to solve this problem?

Since another post asking that same question got marked as a duplicate of this, I'll give an answer to that part of your question without regard to all the other specific details you asked.

Row-based processing in PHP

In order to reduce the amount of memory PHP requires at any time, you can fetch rows from the MySQL server one at a time, and pass them on to the client browser (or some file on the server) without buffering.

To fetch rows one at a time, add MYSQLI_USE_RESULT as the resultmode parameter to the mysqli::query call in order to iterate over the results one row at a time, without transferring them all to PHP in one bunch. See also the documentation for mysqli::use-result.

Make sure you're not using PHP output buffering, so that you won't need memory for the whole document. If you write the content to a file on the server (e.g. using fwrite), you can later on pass that file to the client using readfile or similar. You may use the file as a cache, in case you need to transfer the same result several times.

MySQL writing to file

If you have the FILE privilege, then you can issue a SELECT … INTO OUTFILE … query to have the MySQL server write the result directly to some (temporary) file on the server.

You can then either redirect the client to that file so that the web server will serve that as a plain static file, or you can pass the content to the client yourself using readfile. The latter has the benefit that you can remove the file immediately after it has been passed to the client, in cases where you know you won't need it again.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜