Need more speed in PHP foreach loop
I'm integrating with some MS-based web applications, which forces me to fetch the data into my PHP application via SOAP. That part is fine.
I get the structure of a file system as XML, which I convert to an object. Every document has an ID and its path. To be able to place the documents in a tree view, I've built some methods that work out each document's location within the file and folder structure. This worked fine until I started trying it with large file lists.
What I need is a faster method (or way to do things) than a foreach loop.
The method below is the troublemaker.
/**
 * Find parent id based on path
 * @param array $documents
 * @param string $parentPath
 * @return int
 */
private function getParentId($documents, $parentPath) {
    $parentId = 0;
    foreach ($documents as $document) {
        if ($parentPath == $document->ServerUrl) {
            $parentId = $document->ID;
            break;
        }
    }
    return $parentId;
}
// With 20 documents nested in different folders this method takes 0.00033712387084961 seconds
// With 9000 documents nested in different folders it takes 60 seconds
The array sent to the object looks like this:
Array
(
[0] => testprojectDocumentLibraryObject Object
(
[ParentID] => 0
[Level] => 1
[ParentPath] => /Shared Documents
[ID] => 163
[GUID] => 505d70ea-51d7-4ef0-bf79-8e912553249e
[DocIcon] =>
[FileType] =>
[Title] => Folder1
[BaseName] => Folder1
[LinkFilename] => Folder1
[ContentType] => Folder
[FileSizeDisplay] =>
[_UIVersionString] => 1.0
[ServerUrl] => /Shared Documents/Folder1
[EncodedAbsUrl] => http://dev1.example.com/Shared%20Documents/Folder1
[Created] => 2011-10-08 20:57:47
[Modified] => 2011-10-08 20:57:47
[ModifiedBy] =>
[CreatedBy] =>
[_ModerationStatus] => 0
[WorkflowVersion] => 1
)
...
A larger example of the data array is available here: http://www.trikks.com/files/testprojectDocumentLibraryObject.txt
Thanks for any help!
=== UPDATE ===
To illustrate how long each step takes, I've added this breakdown:
- Packet downloaded in 8.5031080245972 seconds
- Packet decoded in 1.2838368415833 seconds
- Packet unpacked in 0.051079988479614 seconds
- List data organized in 3.8216209411621 seconds
- Standard properties filled in 0.46236896514893 seconds
- Custom properties filled in 40.856066942215 seconds
- TOTAL: This page was created in 55.231353998184 seconds!
Now, it is the custom properties step that I'm describing here; the other steps are already somewhat optimized. The data sent from the WCF service is compressed and encoded at a ratio of about 10:1 (e.g. 10 MB uncompressed to 1 MB compressed).
The current priority is to optimize the custom properties part, where the getParentId method takes 99% of the execution time!
You may see faster results by using XMLReader or expat instead of SimpleXML. Both of these read the XML sequentially and won't store the entire document in memory.
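As a rough sketch of what sequential reading looks like with XMLReader: the XML layout below (a `Documents` root with `Document` elements carrying `ID` and `ServerUrl` attributes) is purely an assumption for illustration, since the real SOAP payload is not shown in the question.

```php
<?php
// Hypothetical payload; the real element and attribute names are not known.
$xml = <<<XML
<Documents>
  <Document ID="163" ServerUrl="/Shared Documents/Folder1" />
  <Document ID="164" ServerUrl="/Shared Documents/Folder1/report.docx" />
</Documents>
XML;

$reader = new XMLReader();
$reader->XML($xml); // $reader->open($uri) would stream from a file or URL instead

// Walk the document node by node; only the current node is held in memory.
$idsByUrl = [];
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'Document') {
        $idsByUrl[$reader->getAttribute('ServerUrl')] = (int) $reader->getAttribute('ID');
    }
}
$reader->close();

echo count($idsByUrl), PHP_EOL; // 2
```

Whether this helps in practice depends on how much of the time is actually spent in parsing, which is worth measuring first.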
Also make sure the APC extension is enabled; it can make a big difference for the loop itself. Some benchmarks of the actual loop would be nice.
Lastly, if you cannot make it faster, then rather than trying to optimize reading the large XML document, look for ways to make this slowness a non-issue. Some ideas include an asynchronous process, proper caching, etc.
Edit
Are you actually calling getParentId for every document? This just occurred to me. If you have 1000 documents, that already implies up to 1000*1000 loop iterations. If this is truly the case, you need to rewrite your code so it becomes a single loop.
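A minimal sketch of the single-loop idea, assuming each document object has the ServerUrl, ParentPath, ID and ParentID properties shown in the question's dump. The function names buildUrlIndex and assignParentIds are made up for this example. The index is built once, so each parent lookup becomes an array key lookup instead of a scan over all documents.

```php
<?php
/**
 * Build a lookup table from ServerUrl to ID in one pass.
 * (Hypothetical helper, not part of the original code.)
 */
function buildUrlIndex(array $documents): array {
    $index = [];
    foreach ($documents as $document) {
        // Keep the first match, like the break in the original getParentId()
        if (!isset($index[$document->ServerUrl])) {
            $index[$document->ServerUrl] = $document->ID;
        }
    }
    return $index;
}

/**
 * Set ParentID on every document via the index.
 * 0 means "no parent found", as in the original method.
 */
function assignParentIds(array $documents): void {
    $index = buildUrlIndex($documents);
    foreach ($documents as $document) {
        $document->ParentID = $index[$document->ParentPath] ?? 0;
    }
}

// Illustrative data shaped like the dump in the question.
$folder = (object) ['ID' => 163, 'ParentID' => 0,
    'ParentPath' => '/Shared Documents',
    'ServerUrl'  => '/Shared Documents/Folder1'];
$file = (object) ['ID' => 164, 'ParentID' => 0,
    'ParentPath' => '/Shared Documents/Folder1',
    'ServerUrl'  => '/Shared Documents/Folder1/report.docx'];

assignParentIds([$folder, $file]);
echo $file->ParentID, PHP_EOL; // 163
```

With n documents this does two passes of n steps each, rather than up to n*n comparisons.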
How are you populating the array in the first place? Perhaps you could arrange the items in a hierarchy of nested arrays, where each key relates to one part of the path.
e.g.
['Shared Documents']
['Folder1']
['Yet another folder']
['folderA']
['folderB']
Then in your getParentId() method, extract the various parts of the path and just search that section of data:
private function getParentId($documents, $parentPath) {
    // Drop the leading slash so explode() does not yield an empty first key
    $keys = explode('/', trim($parentPath, '/'));
    $docs = $documents;
    foreach ($keys as $key) {
        if (isset($docs[$key])) {
            $docs = $docs[$key];
        } else {
            return 0;
        }
    }
    foreach ($docs as $document) {
        if ($parentPath == $document->ServerUrl) {
            return $document->ID;
        }
    }
    return 0; // no match in this section
}
I haven't fully checked that this will do what you're after, but it might help set you on a helpful path.
Edit: I missed that you're not populating the array yourself initially; but doing some sort of indexing ahead of time might still save you time overall, especially if getParentId is called on the same data multiple times.
As usual, this was a matter of program design, and there are a few lessons to be learned from it.
In a file system the parent is always a folder. To speed up this kind of lookup in PHP, you can put all the folders in a separate array, with each folder's ID as the key, and search that array when you want to find the parent of a file, instead of searching the entire file structure array!
- Packet downloaded in 6.9351849555969 seconds
- Packet decoded in 1.2411289215088 seconds
- Packet unpacked in 0.04874587059021 seconds
- List data organized in 3.7993721961975 seconds
- Standard properties filled in 0.4488160610199 seconds
- Custom properties filled in 0.15889382362366 seconds
- This page was created in 11.578738212585 seconds!
Compare the custom properties time (about 0.16 seconds) with the one from my original post (about 40.86 seconds).
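For anyone wanting to try the folder-only lookup, here is a rough sketch of one way it could look. It assumes folders can be recognised by ContentType being 'Folder', as in the dump in the question; the function names collectFolders and getParentIdFromFolders are made up for this example and are not the code actually used.

```php
<?php
/**
 * Collect only the folders, keyed by ID, with the ServerUrl as value.
 * Assumption: folders have ContentType === 'Folder'.
 */
function collectFolders(array $documents): array {
    $folders = [];
    foreach ($documents as $document) {
        if ($document->ContentType === 'Folder') {
            $folders[$document->ID] = $document->ServerUrl;
        }
    }
    return $folders;
}

/** Search the (much smaller) folder array; 0 means no parent found. */
function getParentIdFromFolders(array $folders, string $parentPath): int {
    $id = array_search($parentPath, $folders, true);
    return $id === false ? 0 : (int) $id;
}

// Illustrative data shaped like the dump in the question.
$documents = [
    (object) ['ID' => 163, 'ContentType' => 'Folder',
        'ServerUrl' => '/Shared Documents/Folder1'],
    (object) ['ID' => 164, 'ContentType' => 'Document',
        'ServerUrl' => '/Shared Documents/Folder1/report.docx'],
];

$folders = collectFolders($documents);
echo getParentIdFromFolders($folders, '/Shared Documents/Folder1'), PHP_EOL; // 163
```

array_search is still a linear scan, just over far fewer entries. Keying the folder array by ServerUrl instead would probably turn each lookup into a direct key access.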
Cheers