out of memory - What is the best way to process large data in PHP


I have a daily cron job that fetches XML from a web service. The XML is large: it contains information on more than 10k products, and the file is around 14 MB, for example.

What I need is to parse the XML into objects and then process them. The processing is quite complicated: I can't put them directly into the database. I need to do a lot of operations on them, and then put them into many database tables.

It is all in one PHP script. I don't have any experience dealing with large data.

So the problem is that it takes a lot of memory, and a long time to run. On localhost I raised the PHP memory_limit to 4G, and after running for 3.5 hours it finished successfully. But my production host does not allow that amount of memory.

I have done some research, but I am confused about the right way to deal with this situation.

Here is a sample of my code:

    function my_items_import($xml) {
        $results = new SimpleXMLElement($xml);
        $results->registerXPathNamespace('i', 'http://schemas.microsoft.com/dynamics/2008/01/documents/item');

        // this loops over 10k items
        foreach ($results->xpath('//i:item') as $data) {
            $data->registerXPathNamespace('i', 'http://schemas.microsoft.com/dynamics/2008/01/documents/item');

            // my processing code here; it calls other functions that do a lot of things
            processing($data);
        }
        unset($results);
    }

As a start, don't use SimpleXMLElement on the whole document. SimpleXMLElement loads everything into memory and is not efficient for large data. Here is a snippet from real code. You'll need to adapt it to your case, but I hope you'll get the general idea.

    $reader = new XMLReader();
    $reader->XML($xml);

    // move the cursor to the first article
    while ($reader->read() && $reader->name !== 'article');

    // iterate over the articles
    while ($reader->name === 'article')
    {
        $doc = new DOMDocument('1.0', 'UTF-8');
        // expand only the current article into a standalone SimpleXMLElement
        $article = simplexml_import_dom($doc->importNode($reader->expand(), true));
        processing($article);
        // jump to the next article, skipping the subtree just processed
        $reader->next('article');
    }
    $reader->close();

$article is a SimpleXMLElement that can be processed further. This way you save a lot of memory, because only a single article node is loaded into memory at a time. Additionally, if each processing() call takes a long time, you can turn it into a background process that runs separately from the main script, so that several processing() calls can be started in parallel.
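To adapt the snippet to the namespaced elements from the question, the same XMLReader approach could be wrapped in a generator. This is only a sketch under assumptions: streamItems() and ITEM_NS are hypothetical names that do not appear in the original code, and the element's local name is assumed to be item, in the namespace used in the question's XPath. Adjust both to the actual feed.

```php
<?php
// Namespace URI copied from the question; the constant name is hypothetical.
const ITEM_NS = 'http://schemas.microsoft.com/dynamics/2008/01/documents/item';

/**
 * Hypothetical helper: yields each matching element as its own
 * SimpleXMLElement, so only one item is expanded in memory at a time.
 */
function streamItems(string $xml, string $localName = 'item'): Generator
{
    $reader = new XMLReader();
    $reader->XML($xml);

    try {
        $more = $reader->read();
        while ($more) {
            $isItem = $reader->nodeType === XMLReader::ELEMENT
                && $reader->localName === $localName
                && $reader->namespaceURI === ITEM_NS;

            if ($isItem) {
                // Copy the current element into a small standalone document.
                $doc  = new DOMDocument('1.0', 'UTF-8');
                $node = $doc->importNode($reader->expand(), true);
                $doc->appendChild($node);

                yield simplexml_import_dom($node);

                // Skip the subtree that was just handed out.
                $more = $reader->next();
            } else {
                $more = $reader->read();
            }
        }
    } finally {
        // Runs even if the caller stops iterating early.
        $reader->close();
    }
}
```

With this, the loop in my_items_import() could become foreach (streamItems($xml) as $data) { processing($data); }. Note that $xml is still held in memory as one string here. If the feed is saved to a file first, XMLReader::open() can read from the file path instead.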

