Read large text file efficiently?


#1
FireGator

    Learning Programmer

  • Members
  • 37 posts
I'm fairly new to PHP, so I want to learn it right from the start. I have a large log file that I want to read efficiently without exhausting memory. I had been using the simple file_get_contents() function, but I read in the manual that it can exhaust memory on large files:

<?php

    //my method..

    $logcontents = file_get_contents('logfile.txt');


Could someone show an example of a more efficient method?

#2
Alexander

    It's Science!

  • Moderators
  • 4,124 posts
A PHP homebrew dev to the rescue!
file_get_contents() employs a persistent seekable stream wrapper (php_stream_copy_to_mem to be specific) which attempts to load the whole seekable stream directly into memory, using mmap() if available. As it is an "all in one URL wrapper stream function", it was not designed for file parsing.

Here we want to use fopen() and fread() to read the file in chunks, so that only the part currently being read occupies memory.

An example reading your logfile.txt into a string:
<?php
//initialize the buffer
$buffer = '';
//set the mode to `read`: r = read, rb = read binary, w = write, r+ = read and write, a = append, x = create and write
$handle = fopen('./logfile.txt', 'r');
//we need to ensure validity to prevent an infinite loop with feof()
if ($handle === false) {
    trigger_error('Fatal Error: Could not open file', E_USER_ERROR);
    exit();
}
//iterate through the file until EOF (end of file) is reached
while (!feof($handle)) {
    //we read up to 4096 bytes (4 kilobytes) of the file into memory at a time until EOF
    //note: appending every chunk means $buffer still grows to the size of the whole file
    $buffer .= fread($handle, 4096);
}
//we close the file handle to free resources
fclose($handle);

//do something with the buffer
print "File is " . strlen($buffer) . " bytes.";
?>
We could of course use fread() to read the whole file in one call by passing filesize($filename) as the length parameter, but that puts the entire file in memory at once, so it is no more efficient than file_get_contents().
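
If the file only needs to be scanned rather than held as one string, each line can be handled as soon as it is read, so memory use stays close to the length of the longest line. Here is a minimal sketch of that idea. The function name count_matching_lines() and the search term 'ERROR' are hypothetical, chosen only for illustration; is_readable(), fopen(), fgets(), strpos() and fclose() are the standard PHP functions.

```php
<?php
// Hypothetical helper: count the lines in $path that contain $needle,
// reading one line at a time instead of loading the whole file.
function count_matching_lines($path, $needle)
{
    // Bail out early so fopen() does not raise a warning on a bad path
    if (!is_readable($path)) {
        return false;
    }
    $handle = fopen($path, 'r');
    if ($handle === false) {
        return false;
    }

    $count = 0;
    // fgets() returns false at EOF (or on error), which ends the loop
    while (($line = fgets($handle)) !== false) {
        if (strpos($line, $needle) !== false) {
            $count++;
        }
    }

    fclose($handle);
    return $count;
}

// Usage: 'logfile.txt' is the file from the original question
$matches = count_matching_lines('logfile.txt', 'ERROR');
if ($matches === false) {
    print "Could not open logfile.txt\n";
} else {
    print "Found $matches matching lines.\n";
}
```

Looping on the return value of fgets() instead of feof() also sidesteps the infinite-loop risk mentioned above, since the loop is never entered without a valid handle.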

Edited by Alexander, 14 September 2010 - 10:14 AM.
bits->bytes

Be sure to read the updated FAQ! || Health is achieved through the same 10,000 steps.
If a suggested code/method fails, informing us is less important than telling us why or what errors occurred.

#3
Orjan

    Writes binary right handed and hex left handed

  • Moderators
  • 3,299 posts
fread($handle, 4096); will read 4096 BYTES not bits
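
To make the unit concrete, here is a small sketch that records how many bytes each fread() call actually returns. The temporary file and the 10,000-byte size are assumptions made only for this demonstration.

```php
<?php
// Create a throwaway 10,000-byte file for the demonstration
$path = tempnam(sys_get_temp_dir(), 'fread');
file_put_contents($path, str_repeat('x', 10000));

$handle = fopen($path, 'rb');
$sizes = array();
while (!feof($handle)) {
    $chunk = fread($handle, 4096); // the length argument is a byte count
    $sizes[] = strlen($chunk);
}
fclose($handle);
unlink($path);

// No chunk is longer than 4096 bytes, and the chunks add up to the file size
print implode(', ', $sizes) . "\n";
print "Total: " . array_sum($sizes) . " bytes\n";
```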
__________________________________________
I study Information Systems at Karlstad University when I'm not on CodeCall

#4
Alexander

    It's Science!

  • Moderators
  • 4,124 posts
A childish mistake there! Thank you Orjan.

Edited by Alexander, 06 September 2010 - 07:58 PM.


#5
FireGator

    Learning Programmer

  • Members
  • 37 posts
Nullw0rm - Thank you! And thank you for explaining how it works. While I was posting I was looking around the net and couldn't find anything more than answers saying that fopen would be more efficient.

I'm interested in the C components. Where might I find the source files for the streams and fopen? I might end up more confused, but I think it's worth it for my understanding.

Edited by FireGator, 06 September 2010 - 05:19 PM.


#6
Alexander

    It's Science!

  • Moderators
  • 4,124 posts

FireGator said:

where might I find the source files regarding the streams, and fopen? I might be more confused but I think it's worth it for my understanding.

I'm glad you are interested in such things. Check the source here: http://www.php.net/g...2/from/a/mirror

The main file functions (feof, fscanf, fopen, ftell, mkdir, readfile, unlink, ...) are located in /ext/standard/file.c. You can follow the references using something like grep -lr "php_stream*" ./* . All standard functions are in /ext/standard, and the main stream core (where php_stream * is used much as FILE * is in ANSI stdio) is located in /main/streams/streams.c and the /main/php_stream* files.

I'd be willing to share a few developer notes (old as they be); they may be useful for deciphering how the various PHP APIs relate to each other. Just ask.

#7
FireGator

    Learning Programmer

  • Members
  • 37 posts
I'm glad I got a few hard C lessons in school before I went into telecommunications; I was able to understand a lot of the theory. I had written a few client/server relational implementations for the Unix computers in my current class and my teacher thought they were amazing.

Nullw0rm said:

I'd be willing to share a few developer notes (old as they be)

Would you mind PMing me your notes for SPL (Standard PHP Library)? Since PHP is based on C, I'm sure there is a plethora of similarities in how it works. Tomorrow (if I am free) I'll go down to my local book store (a friend works there) and see what he has on PHP theory and programming. I'd love to get some inside input from you alongside my studies.

Thanks again, Nullw0rm.