Memory usage when exporting from a database to CSV in PHP
I need to export data from MySQL to CSV. I have to select data from several tables into arrays, process them, and return them to the browser as a .csv file. I noticed that the arrays consume a huge amount of memory. For example, I imported a 1.8 MB .csv file into the database, and when I try to export that data back to .csv, memory_get_peak_usage() shows more than 128 MB used just to store the arrays of data.
For example this small array takes more than 700 bytes:
$startMemory = memory_get_usage();
//get constant fields of the subscriber
$data = array(array('subscriber_id' => 1315444, 'email_address' => 'test0@gmail.com',
'first_name' => 'Michael', 'last_name' => 'Allen'));
echo memory_get_usage() - $startMemory;
So exporting even a few megabytes of data requires hundreds of megabytes of memory in the PHP script. Is there a way to solve this problem? Tables:
CREATE TABLE `subscribers` (
`subscriber_id` int(10) unsigned NOT NULL auto_increment,
`list_id` int(10) unsigned NOT NULL,
`account_id` int(10) unsigned NOT NULL,
`email_address` varchar(100) collate utf8_unicode_ci NOT NULL,
`first_name` varchar(50) collate utf8_unicode_ci NOT NULL default '',
`last_name` varchar(50) collate utf8_unicode_ci NOT NULL default '',
`ip` int(10) unsigned default NULL COMMENT 'The IP address of the subscriber, captured when he opens the email or subscribes using the subscribe form. Theoretically it can be used to segment by location (which is not correct if someone uses a proxy).',
`preferred_format` tinyint(4) NOT NULL default '0' COMMENT 'Preferred format: 0 - HTML, 1 - Text, 2 - Mobile',
`state` tinyint(4) NOT NULL default '1' COMMENT '1 - subscribed, 2 - unsubscribed, 3 - cleaned, 4 - not confirmed (the user subscribed but has not confirmed it yet)',
`cause_of_cleaning` tinyint(4) NOT NULL default '0' COMMENT 'The cause of moving the subscriber to the cleaned state: 0 - not used, 1 - spam complaint, 2 - hard bounce, 3 - several soft bounces',
`date_added` datetime NOT NULL COMMENT 'The date when the subscriber was added. I suppose this field can be used in the conditions forming the segment',
`last_changed` datetime NOT NULL,
PRIMARY KEY (`subscriber_id`),
UNIQUE KEY `email_list_id` (`email_address`,`list_id`),
KEY `FK_list_id` (`list_id`),
CONSTRAINT `FK_list_id` FOREIGN KEY (`list_id`) REFERENCES `lists` (`list_id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB
CREATE TABLE `subscribers_multivalued` (
`id` int(10) unsigned NOT NULL auto_increment,
`subscriber_id` int(10) unsigned NOT NULL,
`field_id` int(10) unsigned NOT NULL,
`value` varchar(100) collate utf8_unicode_ci NOT NULL,
`account_id` int(10) unsigned NOT NULL COMMENT 'The identifier of the account',
PRIMARY KEY (`id`),
KEY `subscriber_fk` (`subscriber_id`),
KEY `field_fk` (`field_id`),
CONSTRAINT `field_fk_string_multivalued` FOREIGN KEY (`field_id`) REFERENCES `custom_fields` (`field_id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `subscriber_fk_multivalued` FOREIGN KEY (`subscriber_id`) REFERENCES `subscribers` (`subscriber_id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB
CREATE TABLE `subscribers_custom_data_string` (
`subscriber_id` int(10) unsigned NOT NULL,
`field_id` int(10) unsigned NOT NULL,
`value` varchar(255) collate utf8_unicode_ci NOT NULL,
`account_id` int(10) unsigned NOT NULL COMMENT 'The identifier of the account',
PRIMARY KEY (`subscriber_id`,`field_id`),
KEY `subscriber_fk` (`subscriber_id`),
KEY `field_fk` (`field_id`),
CONSTRAINT `field_fk_string` FOREIGN KEY (`field_id`) REFERENCES `custom_fields` (`field_id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `subscriber_fk_string` FOREIGN KEY (`subscriber_id`) REFERENCES `subscribers` (`subscriber_id`) ON DELETE CASCADE ON UPDATE CASCADE
)
There are other tables for number and date fields, similar to the string table. For those, the primary key is (subscriber_id, field_id).
The query fails when, for example, there are several custom fields:
SELECT subscribers.email_address,
       subscribers.first_name,
       subscribers.last_name,
       GROUP_CONCAT(t1.value SEPARATOR '|') AS Colors,
       GROUP_CONCAT(t2.value SEPARATOR '|') AS Languages
FROM subscribers
LEFT JOIN subscribers_multivalued AS t1
    ON subscribers.subscriber_id=t1.subscriber_id AND t1.field_id=112
LEFT JOIN subscribers_multivalued AS t2
    ON subscribers.subscriber_id=t2.subscriber_id AND t2.field_id=111
WHERE (list_id=40)
GROUP BY subscribers.email_address,
         subscribers.first_name,
         subscribers.last_name
It would return this:
test1000@gmail.com  Michelle  Bush  Red|Red|Blue|Blue  English|Spanish|English|Spanish
instead of
test1000@gmail.com  Michelle  Bush  Red|Blue  English|Spanish
Thank you for any information.
Using just the two tables:
Your original query:
SELECT subscribers.email_address,
subscribers.first_name,
subscribers.last_name,
t1.value AS Languages
FROM subscribers
LEFT JOIN (SELECT subscriber_id,
field_id,
GROUP_CONCAT(value SEPARATOR '|') AS value
FROM subscribers_multivalued
WHERE field_id=37
GROUP BY subscriber_id, field_id
) AS t1
ON subscribers.subscriber_id=t1.subscriber_id
AND t1.field_id=37
WHERE (list_id=49)
AND (state=1)
gives an explain plan of:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY subscribers ref FK_list_id FK_list_id 4 const 2 Using where
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 5
2 DERIVED subscribers_multivalued ALL field_fk field_fk 4 11 Using filesort
My join suggestion:
SELECT subscribers.email_address,
subscribers.first_name,
subscribers.last_name,
GROUP_CONCAT(t1.value SEPARATOR '|') AS Languages
FROM subscribers
LEFT JOIN subscribers_multivalued t1
ON subscribers.subscriber_id=t1.subscriber_id
AND t1.field_id=37
WHERE (list_id=49)
AND (state=1)
GROUP BY subscribers.email_address,
subscribers.first_name,
subscribers.last_name
gives an explain plan of:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE subscribers ref FK_list_id FK_list_id 4 const 2 Using where; Using filesort
1 SIMPLE t1 ref subscriber_fk,field_fk subscriber_fk 4 test.subscribers.subscriber_id 1
While I only populated those two tables with a very small volume of data, this suggests to me that my version of the query will execute more efficiently against the database, because it isn't using the derived table that your query generates.
The other tables could be linked into the query in much the same way, and the entire result spooled directly to a csv file rather than parsed further with PHP.
This should give you a run that is both faster and more memory efficient.
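As a sketch of the spooling step, here is a hypothetical helper (not from the original code) that writes any iterable of rows straight to a CSV stream, so peak memory stays near the size of a single row. In the demo the row source is an array and the stream is in-memory; in the real export the rows would come from a mysqli result and the stream would be php://output or a server-side file.

```php
<?php
// Stream rows straight to CSV instead of accumulating them in arrays.
// Works with any iterable of associative rows (array, generator, result set).
function writeCsv($stream, iterable $rows): int
{
    $count = 0;
    $headerWritten = false;
    foreach ($rows as $row) {
        if (!$headerWritten) {
            // Derive the header line from the column names of the first row.
            fputcsv($stream, array_keys($row), ',', '"', '\\');
            $headerWritten = true;
        }
        fputcsv($stream, array_values($row), ',', '"', '\\');
        $count++;
    }
    return $count;
}

// Demo with an in-memory stream and a single hard-coded row.
$stream = fopen('php://memory', 'r+');
$rows = [
    ['email_address' => 'test0@gmail.com', 'first_name' => 'Michael', 'last_name' => 'Allen'],
];
writeCsv($stream, $rows);
rewind($stream);
echo stream_get_contents($stream); // header line followed by one data row
```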
EDIT
SELECT subscribers.email_address,
subscribers.first_name,
subscribers.last_name,
GROUP_CONCAT(DISTINCT t1.value SEPARATOR '|') AS Colors,
GROUP_CONCAT(DISTINCT t2.value SEPARATOR '|') AS Languages
FROM subscribers
LEFT JOIN subscribers_multivalued AS t1
ON subscribers.subscriber_id=t1.subscriber_id
AND t1.field_id=112
LEFT JOIN subscribers_multivalued AS t2
ON subscribers.subscriber_id=t2.subscriber_id
AND t2.field_id=37
WHERE (list_id=49)
GROUP BY subscribers.email_address,
subscribers.first_name,
subscribers.last_name
Note the use of DISTINCT in the GROUP_CONCAT() function.
If your business logic allows it, you could do the transformations in MySQL and run a
SELECT * FROM table INTO OUTFILE 'file_name.csv'
It accepts the same export options as LOAD DATA INFILE; note that the file you are writing to must not already exist.
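For example, with explicit export options (the column list, filter, and output path below are illustrative, not from the original post):

```sql
SELECT email_address, first_name, last_name
FROM subscribers
WHERE list_id = 40
INTO OUTFILE '/tmp/subscribers.csv'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n';
```

The file is written on the MySQL server host (not the client), and the server process needs write access to the target directory.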
Refactor your code into a single function that reads the data for the Nth line, processes it, outputs that line without output buffering, and then discards all temporary data. Call that function repeatedly. This should reduce your memory usage to only what is needed for a given line, instead of holding all lines at once.
Once this is done, you can extend it to read an arbitrary number of lines at a time, so you can tune the tradeoff between memory usage and performance.
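The batched variant described above can be sketched as follows; the function and callback names are illustrative, not from the original code. The point is that each batch is handed off and then freed before the next one is fetched, so memory stays bounded by the chunk size:

```php
<?php
// Pull rows from any source in batches of $chunkSize, hand each batch to
// $handle, and let the batch be garbage-collected before fetching the next.
function processInChunks(iterable $rows, int $chunkSize, callable $handle): int
{
    $batch = [];
    $total = 0;
    foreach ($rows as $row) {
        $batch[] = $row;
        if (count($batch) === $chunkSize) {
            $handle($batch);      // e.g. format these rows and fwrite() them
            $total += $chunkSize;
            $batch = [];          // free the memory before the next batch
        }
    }
    if ($batch !== []) {          // flush the final, possibly short batch
        $handle($batch);
        $total += count($batch);
    }
    return $total;
}
```

In the real export, $rows would be rows fetched from MySQL and $handle would write the corresponding CSV lines to the output stream.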
It looks like it's possible to prevent the superfluous grouping by using DISTINCT in the aggregate function GROUP_CONCAT:
SELECT `subscribers`.`email_address`, `subscribers`.`first_name`, `subscribers`.`last_name`,
GROUP_CONCAT(DISTINCT t1.value SEPARATOR '|') AS Colors, GROUP_CONCAT(DISTINCT t2.value SEPARATOR '|') AS Languages
FROM `subscribers`
LEFT JOIN `subscribers_multivalued` AS `t1` ON subscribers.subscriber_id=t1.subscriber_id AND t1.field_id=49
LEFT JOIN `subscribers_multivalued` AS `t2` ON subscribers.subscriber_id=t2.subscriber_id AND t2.field_id=48
WHERE (list_id=63)
GROUP BY `subscribers`.`email_address`, `subscribers`.`first_name`, `subscribers`.`last_name`
So exporting even a few megabytes of data requires hundreds of megabytes of memory in the PHP script. Is there a way to solve this problem?
Since another post asking that same question got marked as a duplicate of this, I'll give an answer to that part of your question without regard to all the other specific details you asked.
Row-based processing in PHP
In order to reduce the amount of memory PHP requires at any time, you can fetch rows from the MySQL server one at a time, and pass them on to the client browser (or some file on the server) without buffering.
To fetch rows one at a time, pass MYSQLI_USE_RESULT as the resultmode parameter to mysqli::query, so that you iterate over the results one row at a time instead of transferring them all to PHP in one batch. See also the documentation for mysqli::use_result.
Make sure you're not using PHP output buffering, so that you won't need memory for the whole document. If you write the content to a file on the server (e.g. using fwrite), you can later pass that file to the client using readfile or similar. You may keep the file as a cache, in case you need to transfer the same result several times.
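Putting those pieces together, a minimal sketch might look like this. It assumes a reachable MySQL server; the connection parameters, query, and filename are placeholders, and it cannot run without a live database:

```php
<?php
// Sketch: stream an unbuffered result straight to the client as CSV.
// Connection details and the query are placeholders; adapt to your schema.
$mysqli = new mysqli('localhost', 'user', 'password', 'database');

header('Content-Type: text/csv');
header('Content-Disposition: attachment; filename="subscribers.csv"');

$out = fopen('php://output', 'w');

// MYSQLI_USE_RESULT: rows stay on the server until fetched one by one,
// so PHP never holds the whole result set in memory.
$result = $mysqli->query(
    'SELECT email_address, first_name, last_name FROM subscribers',
    MYSQLI_USE_RESULT
);

while ($row = $result->fetch_assoc()) {
    fputcsv($out, $row, ',', '"', '\\');
}

$result->free(); // required before issuing another query on this connection
fclose($out);
```

Note that with an unbuffered result you must fetch all rows and free the result before running another query on the same connection.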
MySQL writing to file
If you have the FILE privilege, you can issue a SELECT … INTO OUTFILE … query to have the MySQL server write the result directly to a (temporary) file on the server.
You can then either redirect the client to that file, so that the web server serves it as a plain static file, or pass the content to the client yourself using readfile. The latter has the benefit that you can remove the file immediately after it has been sent, in cases where you know you won't need it again.