Monday, July 15, 2013

PHP - Log Maintenance

Running a PHP file from cron is easy enough, simply specify the PHP bin path, the path to the PHP script and where to send the output of the script. The issue comes with how to manage the size of the log file. You can set a job to truncate the file as opposed to append it but that means you loose any of the log lines that were in it. A better option is to use one of the cron jobs to rotate the logs into an archive of some sort.

This script first checks when it was last run, if it's under an hour ago, it skips to the remainder of the job.

We then check the file size, if it's over 1MB it starts the process of archiving it and creating a blank replacement. To be sure we can know when the archived log file was created, we put the epoch time that it was created in the file name.

Every time this portion of the job runs, it also checks to see if there are any log files that are older than seven days.  Because we put the epoch time that the archive was created, this is a simple matter of getting the epoch time back out of the file name and comparing it to our cutoff value. If the file is older than cutoff date, we simply delete it then log that we did so.

// only check log files for maintenance once an hour at most
if($process->lastRan < (time() - 3600)) {
    $logFile = LOG_PATH.'/cron.log';
    // if log file is over 1MB, archive it and create a new one
    if(filesize($logFile) > (1024 * 1024)) {
        // put the epoch time into the name so we can tell when it was archived
        $logFileBk = 'cron.'.time().'.log';
        // duplicate the file with the new name
        copy($logFile, LOG_PATH.'/'.$logFileBk);
        // create a blank 
        file_put_contents($logFile, date('M d, Y H:i:s')."\t[cron_controller]\tTruncated log file - $logFileBk\n");
    }
    // look for any log file that is oder than 7 days
    $end = strtotime('00:00:00 - 7 days');
    $list = scandir(LOG_PATH);
    foreach($list as $i) {
        // get the date date from the file name if it matches the correct form
        if(preg_match('/^cron\.([0-9]{10})\.log$/',$i,$matches) === 1) {
            // if the found date is older than 7 days ago
            if((int)$matches[1] < $end) {
                // delete the file
                unlink(LOG_PATH.'/'.$i);
                echo date('M d, Y H:i:s')."\t[cron_controller]\tDeleted log file - $i\n";
            }
        }
    }
}

This script could be put in either the beginning of the cron job or at the end. But if you are using the PHP Cron Controller that I outlined in this post, I highly recommend doing this log maintenance as early in the script as possible. If you do it at the end of the script, you could run into a situation where a job called from the PHP Cron Controller is working and writing lines when this maintenance is being done.

Monday, July 8, 2013

PHP Logging Class

The last post talked about a PHP Cron Controller, the problem is that because it's a fire and forget controller, it does not capture the output (including errors) of the jobs and do anything with it. That's why we need a way to log the output and the errors to a file for later review.

Since I prefer to do most everything in classes to keep functions and variables isolated,  that's what I did for this set of functions.

The first thing we need to do is to initiate the class. In this function we will be determining the name of the cron job that is to be logged. We can either do this by submitting the job name (process) or it can be determined by the file name that initiated the function. Obviously it's faster to submit the process name, but I like flexibility. If you capture the output of debug_backgrace function, you'll see that it has the file name of the function that called the current function. This is what we use to get the process name if it's not submitted to the class constructor.

The constructor also allows us to change the file that the logs are written into. For safety, we perform some tests on the file specified to make sure it exists and that we can write to it. If either of these tests fail, we use PHP to see if we can fix the problem. We could use shell commands, but my experience is that most servers don't allow PHP to execute shell commands so I'd say, don't bother unless you know you have the ability.

If after all the tests we determine that we can't write to the log file, there is no point in letting the later functions work so we set a flag (writeError) in the class so we can kill other functions immediately.
    public function __construct($process=null, $logFile='cron.log') {
        if($process == null || $process == '') {
            $callers = debug_backtrace();
            $file = $callers[0]['file'];
            $fA = explode('/', $file);
            $file = array_pop($fA);
            $fA = explode('.', $file);
            array_pop($fA);
            if(count($fA) > 1) {
                $this->process = implode('.', $fA);
            } else {
                $this->process = $fA[0];
            }
        } else {
            $this->process = $process;
        }
        
        $this->logFile = LOG_PATH . '/' . $logFile;
        if(!file_exists($this->logFile)) {
            if(file_put_contents($this->logFile, 'Log Created '.date('M d, Y H:i:s')."\n") === false) {
                $this->writeError = true;
            }
        } elseif(!is_writable($this->logFile)) {
            $t2 = @chmod($this->logFile, 0777);
            if($t2 === false) {
                $this->writeError = true;
            }
        }
    }
The class is initiated in the cron job by calling:
$log = new Log_Model('metrics');

Next we want to be able to actually write some useful lines to the log file. The only thing that this function requires is the string to output to the log file. We'll discuss the $die in a little bit. For safety, we perform a very basic test on the input to make sure it is a string and that it's not empty. If the constructor couldn't write to the log file, we don't even attempt to write to the file here, but we do let the other portions of the function work. For standards sake, I like to begin each line with the date and time then the process that this line relates to.

I like to use the file_put_contents function as opposed to using the file handle method because it's simpler. We still have the ability to do the append by adding the FILE_APPEND flag. Since multiple jobs could be writing to the log file at once, we add the LOCK_EX flag.

The die portion of this function has to do with the error handling but it can also be used in the cron job if for instance a database call yielded no results. But since this is less clear in the cron job, I like to implicitly put the exit in the job itself rather than a true in the function call like $log->writeLogLine('Log this line', ture).
    public function writeLogLine($str, $die=false) {
        if(!is_string($str) || $str == '') {
            return false;
        }
        $r = false;
        if($this->writeError == false) {
            $line = date('M d, Y H:i:s')."\t[$this->process]\t$str\n";
            if(file_put_contents($this->logFile, $line, FILE_APPEND | LOCK_EX) !== false) {
                $r = true;
            }
        }
        if($die === true) {
            exit;
        }
        return $r;
    }

The last portion of the class is the error handling. To be fair, this is mostly stolen from the PHP documentation. I've done some minor changes in function where it will send the error output to the log file (via the writeLogLine function) and send the die command to the function if it is a fatal error.
    public function errorOut($errno, $errstr, $errfile, $errline) {
        if (!(error_reporting() & $errno)) {
            // This error code is not included in error_reporting
            return;
        }

        $die = false;
        $str = '';
        switch ($errno) {
            case E_USER_ERROR:
                $str .= "FATAL ERROR [$errno] $errstr in $errfile on line $errline\n";
                $die = true;
                break;

            case E_USER_WARNING:
                $str .= "WARNING [$errno] $errstr in $errfile on line $errline\n";
                break;

            case E_USER_NOTICE:
                $str .= "NOTICE [$errno] $errstr in $errfile on line $errline\n";
                break;

            default:
                $str .= "Unknown error type: [$errno] $errstr in $errfile on line $errline\n";
                break;
        }
        
        $this->writeLogLine($str, $die);

        // if you return false, it will still execute PHP's error handling
        return true;
    }

To enable the error handling, I prefer to explicitly call it in the cron job as opposed to doing it in the class constructor because it's much clearer in the cron job. To do this simply call:
set_error_handler(array($log, 'errorOut'));
The $log is the instance of the class and the errorOut is the function to use in the event of an error.

This method of handling logging and errors works well for jobs that work in the background that you don't typically see the output of. If you also wanted to see the output, it would be easy to add an echo into the writeLogLine function, but for my application, it was completely unnecessary.

Monday, July 1, 2013

PHP Cron Controller

I've just finished working on a distributed web application that required fourteen different cron jobs. As this project was developing and released to beta testers, we did not have the ability to modify when those cron jobs executed or even which ones executed since the user installed them on their server. Users also complained that it was the hardest part of the installation. Just before public release of the project, we eliminated one of the cron jobs but because beta users had it setup in their system, the only thing we could do was wipe the file so it could still run without error.

With a traditional cron scheduler, I was not able to easily control what jobs fired when with a simple application update. Also, cron setup seemed to be difficult for some users to manage, even though I gave them the exact values to put in. Part of the problem was that there were so many jobs and when a schedule value changed, some users didn't see the change in value. This ment that jobs that should be executed once a day were getting executed every hour. While this is fixable in the PHP script itself, I'd rather have it done correctly the first time.

The solution I've come up with for this next project is a fire and forget cron controller written in PHP. There is one standard cron job that fires every minute. That job is a controller that has a list of every job with a schedule for it. For flexibility, I wanted to use similar notation as cron so to get a job to run every five minutes I'd use */5 and to get it to run five after the hour it would simply be 5. For simplicity, I put all the jobs into an array.
$jobs = array(
    'metrics-1' => array(
        'minInt'    => '*/5',
        'hourInt'   => '*',
        'url'       => '/app/cron/metrics.php?block=1'
    ),
    'metrics-2' => array(
        'minInt'    => '*/5',
        'hourInt'   => '*',
        'offset'    => 1,
        'url'       => '/app/cron/metrics.php?block=2'
    )
);
The offset allows a job to easily have a one minute delay, in this case so it will run every five minutes but one minute after the other job. Here's the main potion of the controller.
$batch = array();
foreach($jobs as $name => $data) {
    $min = (int)date('i');
    $hr = (int)date('H');
    $runJob = false;
    $mI = $data['minInt'];
    $mH = $data['hourInt'];
    
    if(isset($data['enable']) && $data['enable'] == 0) {
        continue;
    }
    
    // evaluate minute intervals
    if(strpos($mI, '*') !== false) {
        if(strpos($mI, '/') !== false) {
            $i = str_replace('*/','',$mI);
            if(isset($data['offset'])) {
                $min = $min - (int)$data['offset'];
            }
            if(($min % $i) == 0) {
                $runJob = true;
            }
        } else {
            $runJob = true;
        }
    } elseif(strpos($mI, ',') !== false) {
        $i = explode(',', $mI);
        if(in_array($min, $i)) {
            $runJob = true;
        }
    } else {
        $i = (int)$mI;
        if($min == $i) {
            $runJob = true;
        }
    }
    
    // evalute hour intervals
    if($runJob === true) {
        if(strpos($mH, '*') !== false) {
            if(strpos($mH, '/') !== false) {
                $i = str_replace('*/','',$mH);
                if(($hr % $i) != 0) {
                    $runJob = false;
                }
            }
        } else {
            $i = (int)$mH;
            if($hr != $i) {
                $runJob = false;
            }
        }
    }
    
    if($runJob === true) {
        echo date('M d, Y H:i:s')."\t[$name]\tStarting\n";
        $batch[] = $data['url'];
    }
}
Since I only needed the minute and hour setting, that's all this controller will handle.

Once the controller determines which jobs to run at that time, they are called via cURL. The advantage of curl is that the jobs can be called and the output from those jobs can be ignored.
$hArr = array();
foreach($batch as $url) {
    $h = curl_init();
    
    curl_setopt($h, CURLOPT_SSL_VERIFYHOST, false);
    curl_setopt($h, CURLOPT_SSL_VERIFYPEER, false);

    curl_setopt($h, CURLOPT_USERAGENT, "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.82 Safari/537.1");
    curl_setopt($h, CURLOPT_ENCODING, 'gzip,deflate');
    curl_setopt($h, CURLOPT_RETURNTRANSFER, false);
    curl_setopt($h, CURLOPT_AUTOREFERER, true);
    curl_setopt($h, CURLOPT_HEADER, false);
    curl_setopt($h, CURLOPT_URL,$url);
    curl_setopt($h, CURLOPT_TIMEOUT, 10);

    $hArr[] = $h;
}
$mh = curl_multi_init();
foreach($hArr as $h) {
    curl_multi_add_handle($mh,$h);
}

$running = null;
do {
    curl_multi_exec($mh,$running);
} while($running > 0);
Since we don't care about the output, we can set the timeout to ten seconds. I felt ten seconds was safer so that DNS could be resolved and the server could be sure to start the request. Inside the actual jobs, the time limit is set to what ever is needed for that job.

Because the actual job are called via standard web services, the usual issues of what user is executing the cron job and server document root are all avoided. The issue of logging for these jobs is harder but also easily handled with a logging class that does error handling (I'll discuss that in a later post).

Update

After deploying this to a couple of dozen servers, I've come to the conclusion that some servers don't deal well with the fire and forget method. What appears to be happening is that after the controller stops looking for a response, the server kills the script that was called. For safety I've increased the CURLOPT_TIMEOUT to 300 to ensure that the scripts are not killed by the server. This only seems to be happening on one hosting company that I know of so far so I assume it's a special setup they have.