Thursday, December 24, 2009

Progressive Upload with PHP and APC

A year or two ago, I was given the task to make a Progress Bar work with PHP. From my own research a few years ago, this task was impossible and so I thought I would need some sort of Flash + Javascript + PHP combination to make this happen.

Until now.

I found a nice recipe from IBM that actually makes this process much simpler. It uses a simple PECL extension called Alternative PHP Cache (APC for short). This brilliantly simple extension allows us to FINALLY be able to monitor file upload progress! I read the tutorial and right away, I was throwing my fists into the air: it's finally possible to do progress bars with PHP WITHOUT THE NEED OF CGI !!!!

For anyone who has ever run into this issue, you are probably the only ones to truly respect this awesome new toy. What's even more bizarre is that this functionality has been out since PHP 5.2 and this is truly the first I've ever heard about it.

My problems though soon caught up with me. Apparently IBM's little "tutorial" is a little short-winded and doesn't exactly cover all the bases that are needed to make this new extension work. So, being the great guy I am, I decided to write a step-by-step tutorial on how to use this for the weak at heart.

==========================================================

*** INSTALLATION ***

First, you need to get the APC module. There are 2 downloads available:
Linux and Windows

* Please note that the Windows link above may be broken as the team is trying to move this box over to a new server... *

Next, you need to install the module.

-- FreeBSD/Linux Users only --
First things first: you'll need root to make all this work, so su into the root user.

After downloading the APC package (I downloaded the 3.0.18 stable bundle), you need to unpackage it. Once you have untar'ed the file, you need to cd into the apc directory and run the following set of commands:

phpize
./configure --enable-apc --enable-mmap
make install
cp modules/apc.so /usr/local/lib/php/
The phpize command is used to prepare the build environment for a PHP extension. If you don't have phpize installed, you will want to install it via RPM, ports, or whatever method your *nix distro supports.

Here are some commands for a few linux distros:

Fedora Core x:
yum -y install php-devel
Gentoo Click here for the gentoo quick guide

FreeBSD Installed by default with PHP

I can't cover them all, but these 3 seem to be the most popular (along with Ubuntu but I'm not familiar with it)

Next, you'll need to open the php.ini file to make a few quick changes.

vi /usr/local/etc/php.ini
Locate where the Windows Extensions are loaded (yes I know you aren't using Winblows, but you'll see what I'm getting at next).

Below the ;extension=php_zip.dll line, add a new line and enter the following extension:

extension=apc.so
Return a few lines down and enter the following:
;;;;;;;;;;;;;;;;;;;;;;;
; APC LAYER CONTROL ;
;;;;;;;;;;;;;;;;;;;;;;;
apc.rfc1867=on
apc.max_file_size=200M
upload_max_filesize=200M
post_max_size=200M

Now, you do NOT have to use 200M. I simply did to test out uploading huge 191M video files to see if this progress bar truly works. Adjust this to suit your own personal needs.

Finally, save the file and exit by typing:

:wq!
You'll need the bang(!) because php.ini is by default a read-only file.

The last step now is just to restart apache:

apachectl restart
apachectl is the command that restarts Apache in FreeBSD. However, it may or may not be available in your distro/PATH (user's profile), so you may need to browse into the right directory and run the appropriate server command used in your distro (example: /etc/rc.d/init.d/httpd restart).

Since I have never done this on Windows, I'm just going to copy/paste IBM's installation method here. NOTICE: It's asking you to use the WAMP install package, but most developers here have theirs installed manually. If you have issues, just post it here and we'll see if a solution can't be found.

--- WINDOWS USERS ONLY ---

APC is not enabled by default in PHP V5.2. Since the new hooks are a part of APC, we need to make sure to install the extension and make it available to the PHP interpreter. This is accomplished by downloading the php_apc extension files. In our case, we are using an installation of WAMP, a freely available packaged PHP for Windows®, which includes Apache and MySQL. It offers a nice user interface and is easy to manage with menus that support configuration options.

To set up APC on WAMP:

1. See Resources to download the libraries and WAMP.
2. Install WAMP.
3. Put the php_apc.dll file in the extensions folder for PHP. This is /php/ext by default.
4. Use the system tray WAMP menu to select PHP settings>PHP Extensions>Add Extension.
5. In the command-line interface that pops up, type php_apc.dll and press Enter.
6. Using a text editor, open /php/php.ini and add the line apc.rfc1867 = on (it doesn't matter where). If you're trying to test locally and plan to upload large files so you can actually see progress, you'll also want to add the following directives: apc.max_file_size = 200M, upload_max_filesize = 200M, and post_max_size = 200M. Don't do this on a live production server, though, or you're likely to use up bandwidth and disk space allotments, not to mention slowing everyone else down to a crawl.
7. Restart the webserver.

APC should now be set up and initialized. The RFC1867 features of APC — the features that enable you to track file uploads — should now be enabled as an option, and you should be ready to look into our file uploads to enable real-time status.

If you have trouble installing this on windows, just post and I'm sure we can figure it out. In the meantime, I think these steps are self-explanatory enough to get you started.

==========================================================

Testing out the installation

Testing out the installation is pretty simple. Just go into the directory where you initially unpackaged the APC bundle and find the file apc.php. Copy this file into your docroot and open it up in your browser. If the installation worked, you should see a big long explanation about all of APC's features, along with a nice pie chart if the GD library is also installed.
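
If you would rather check from a script than eyeball apc.php, a minimal sketch like the one below reports whether the extension is loaded and whether upload tracking is switched on. The function name apc_upload_status is made up for this example; extension_loaded() and ini_get() are standard PHP, and apc.rfc1867 is the directive we set in php.ini earlier.

```php
<?php
// Hypothetical helper (not part of APC): summarizes whether upload tracking can work.
// $loaded  - result of extension_loaded('apc')
// $rfc1867 - result of ini_get('apc.rfc1867'); false when the directive is unknown
function apc_upload_status($loaded, $rfc1867) {
    if (!$loaded) {
        return "APC is not loaded";
    }
    // A boolean directive set to "on" normally reads back as "1";
    // the other spellings are accepted just in case.
    $enabled = in_array(strtolower((string) $rfc1867), array("1", "on", "true", "yes"), true);
    return $enabled ? "APC loaded, upload tracking enabled"
                    : "APC loaded, but apc.rfc1867 is off";
}

echo apc_upload_status(extension_loaded('apc'), ini_get('apc.rfc1867')), "\n";
```

Drop it into your docroot next to apc.php and load it in the browser, or run it from the command line (keeping in mind the CLI may read a different php.ini than Apache does).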

==========================================================

Making A Progress Bar Test

Finally, we're ready to test out this progress bar stuff! It's very freaking easy, and I'm first going to just give you all the code you'll need.


__________________________________________________________


progress.php

<?php
$id = uniqid("");
?>
<html>
<head><title>Upload Example</title>
<script type="text/javascript">
function getProgress() {
CDownloadUrl('get', "getprogress.php?progress_key=<?php echo($id)?>",
function(percent) {
document.getElementById("progressinner").style.width = percent+"%";
if (percent < 100) {
setTimeout("getProgress()", 100);
}
});
}

function CDownloadUrl(method, url, func) {
var httpObj;
var browser = navigator.appName;
if(browser.indexOf("Microsoft") > -1)
httpObj = new ActiveXObject("Microsoft.XMLHTTP");
else
httpObj = new XMLHttpRequest();

httpObj.open(method, url, true);
httpObj.onreadystatechange = function() {
if(httpObj.readyState == 4) {
if(httpObj.status == 200) {
var contenttype = httpObj.getResponseHeader('Content-Type');
if(contenttype.indexOf('xml')>-1) {
func(httpObj.responseXML);
} else {
func(httpObj.responseText);
}
} else {
func('Error: '+httpObj.status);
}
}
};
httpObj.send(null);
}

function startProgress(){
document.getElementById("progressouter").style.display="block";
setTimeout("getProgress()", 1000);
}

</script>
</head>
<body>
<iframe id="theframe" name="theframe" src="upload.php?id=<?php echo($id); ?>" style="border: none;
height: 100px; width: 400px;" > </iframe><br/><br/>
<div id="progress_win"></div>

<div id="progressouter" style="width: 500px; height: 20px; border: 6px solid red; display:none;">
<div id="progressinner" style="position: relative; height: 20px; background-color: purple;
width: 0%;"></div>
</div>

</body>
</html>

__________________________________________________________

upload.php
<?php
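// Fall back to an empty key if the page is opened without ?id=...
$id = isset($_GET['id']) ? $_GET['id'] : '';
?>
<form enctype="multipart/form-data" id="upload_form" action="target.php" method="POST">
<input type="hidden" name="APC_UPLOAD_PROGRESS" id="progress_key" value="<?php echo htmlspecialchars($id, ENT_QUOTES); ?>"/>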
<input type="file" id="test_file" name="test_file"/><br/>
<input onclick="window.parent.startProgress(); return true;" type="submit" value="Upload!"/>
</form>

__________________________________________________________

target.php
<?php
if($_SERVER['REQUEST_METHOD']=='POST') {
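$filename = $_FILES["test_file"]["tmp_name"];
// basename() strips any directory parts a client may have put in the file name
$destination = "/path/to/apache22/data/progressbar/uploads/" . basename($_FILES["test_file"]["name"]);
if(move_uploaded_file($filename, $destination)) {
echo "<p>File uploaded. Thank you!</p>";
} else {
echo "<p>Upload failed.</p>";
}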
}
?>

__________________________________________________________

getprogress.php
<?php
header('Expires: Mon, 26 Jul 1997 05:00:00 GMT');
header("Last-Modified: " . gmdate("D, d M Y H:i:s") . " GMT");
header('Cache-Control: no-store, no-cache, must-revalidate');
header('Cache-Control: post-check=0, pre-check=0', FALSE);
header('Pragma: no-cache');

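if(isset($_GET['progress_key'])) {
$status = apc_fetch('upload_'.$_GET['progress_key']);
// apc_fetch() returns false until APC has an entry for this key
if(is_array($status) && $status['total'] > 0) {
echo $status['current']/$status['total']*100;
} else {
echo 0;
}
}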
?>
==========================================================

Some Explaining to Do

First, let me say that the above code is taken directly from IBM's examples, but altered a little bit to make our lives much simpler.

In the first file, progress.php, IBM originally had a Google MAPS API work-around using the GDownloadUrl function. As anyone knows who has tried to use this behind a firewalled web server, you can't validate your KEY (i.e.: <script src="http://maps.google.com/maps?file=api&v=2&key=YOURKEYHERE" type="text/javascript"></script>) without Google being able to access your server. That makes the otherwise awesome GDownloadUrl function unusable. Fortunately I found that the same idea they use for this function already exists in a separate implementation called CDownloadUrl that does not require Google Authentication. As I suspected, it's basically just an XMLHttpRequest call that asks for the results of getprogress.php.

Also in progress.php there is a bit of confusion as to what exactly the $id value does. Well, the $id value, even though it appears blank, is anything but. It holds a unique ID value, so that we can track our current session without worrying about compromising it with someone else's results. To see its value, just put "echo $id;" somewhere after the tag to view its result.

Next is target.php. This file basically just tells our application WHERE to store the newly uploaded file.

Finally, getprogress.php. In this file is one more foreign entry that IBM does not give you: no-caching.

header('Expires: Mon, 26 Jul 1997 05:00:00 GMT');
header("Last-Modified: " . gmdate("D, d M Y H:i:s") . " GMT");
header('Cache-Control: no-store, no-cache, must-revalidate');
header('Cache-Control: post-check=0, pre-check=0', FALSE);
header('Pragma: no-cache');
//....
We HAVE to put the no-cache statement in there for this to work in both Firefox and IE 6+. IE 7 (maybe IE 6, I haven't tested this in that browser yet) CACHES your value. So unless you put this no-cache statement in there, the result you are going to see is the progress bar jump 1 time only (usually around 4-7% on big files or 90% to 100% on smaller files). This is because once IE's Ajax call retrieves the value once, it freaking caches the value and just sits there. And this little annoyance smurf ME OFF (sorry, but it did). I can't believe that not only I have to put hacks in place for IE in css, but now also because it caches VALUES when I don't want it to??????? Sigh........
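
One more thing worth knowing about getprogress.php: apc_fetch() returns false when there is no entry for the key, which is easy to hit on the very first poll. Here is a small sketch of the percentage math with that case handled in one place. upload_percent is a hypothetical helper name of mine; the 'current' and 'total' keys are the same ones getprogress.php reads.

```php
<?php
// Hypothetical helper: turns the array stored by APC into a whole percentage.
// $status is whatever apc_fetch('upload_' . $key) returned (false when no entry exists).
function upload_percent($status) {
    if (!is_array($status) || empty($status['total'])) {
        return 0; // nothing tracked yet, or total unknown: avoid dividing by zero
    }
    $percent = (int) floor($status['current'] / $status['total'] * 100);
    return max(0, min(100, $percent)); // clamp to the 0..100 range
}
```

In getprogress.php this would be used as echo upload_percent(apc_fetch('upload_' . $_GET['progress_key']));, so the JavaScript side always gets a plain number back.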

==========================================================

Conclusion

Run the progress bar example. It works. Works like a freaking dream!! I couldn't believe it, but it has finally made me smile once again.

It should also be noted here that APC is NOT just for progressive uploads, but instead a caching service. We're basically just using APC's caching ability in order to grant us the nice method of progressive uploads...

Anyways, I hope you enjoyed this tutorial. Take it easy.

==========================================================

Resources
What's new in PHP V5.2, Part 5: Tracking file upload progress
PHP Manual for APC
CDownloadURL Javascript Function

@author: Jonathon Hibbard


I originally wrote this article in April, 2008, which can be found here.

Friday, December 11, 2009

Simple Optimizing Tips for PHP Part III - Commenting

As we write PHP code (or any language for that matter) we begin to start thinking about optimizing code. But even though our intentions are for "the greater good", we sometimes miss things that should be obvious.

Today's article takes a look at:

Commenting

Commenting, to some, doesn't make sense to be an "optimizing" tip. I'm here to tell you that this is wrong. Commenting is, without a doubt, a key part to optimizing your code!

If we consider all the scripts we have built over the years, one of the fundamental things that allow a script to be a "strong" and "well written" script is its commenting structure.

Comments allow developers to find key methods and components that make a script operate much easier. Comments also allow for maintaining code and, in the process, make debugging these scripts easier. Thus it optimizes not only PHP, but also YOU!

There are 3 supported methods of commenting blocks in PHP. They are:

// (2 forward slashes) - This comment marker comments out one line.

# (or hash) - This comment marker also comments out one line. The only difference is that this style was (and still is) used in Unix programming.

/* */ - This commenting style is for one to many lines of code.
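
Here is a quick sketch showing all three styles in one place. The add_tax function and its 7% rate are made up purely for illustration.

```php
<?php
// A single-line comment using two forward slashes.
# A single-line comment using a hash, shell-script style.
/* A block comment
   that spans
   several lines. */
function add_tax($price) {   // comments can also trail a statement
    return $price * 1.07;    # hypothetical 7% rate, for illustration only
}
```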



While knowing the commenting blocks is all good and great, there exist 3 main questions that are asked by all levels of developers:
1) Where should I comment?

2) How often should I comment?

3) What kind of commenting style should I use?

Where should I comment?
This is perhaps the trickiest question of all. The key here is that if we define where to make comments, we could inadvertently convince a developer to make comments more often than he should. So to answer this question without misleading and causing more harm than good, I have a few examples of "where" one should comment.

You should comment on functions, variables, or complex blocks of code:
a) where the meaning of the definition is unclear,
b) where it wasn't written in the most optimal way,
c) where it was a "hack",
d) where an "important" piece solves a complex issue,
e) where a "change" from a previous method was made,
f) when commenting "out" an old method,
g) when setting ground rules for said definitions.

Since there are obviously a lot of instances where commenting is great, we must be careful not to overdo our comments. Which leads us to the next question...



How often should I comment?
You should not comment too heavily, and you should not comment too lightly. A happy medium is suggested, leaning toward the lighter side.

While some will say lots of comments help people understand things, I will disagree and say it causes more complexity than is needed. If only 100 lines of a 300 line file are code, and the other 200 lines are comments and whitespace, it is very evident that either the code is written so poorly that lots of comments are needed to explain how it works, or the author himself is not sure how the code works.

While some will say this is not true and that they just "like" to make long, eloquent comments, I can respond that there is a time and place (such as a wiki) for such long-winded explanations.

So, how do we solve this issue of not writing too many, or too few, comments?

It's very simple. As stated in the "Where to comment" above, you should mainly be commenting on pieces of code that are not very easy to follow. So, to explain these pieces, here are a few scenarios that should help you on your way:

1) Imagine that you have a variable that has been defined by an included file. A comment where the variable first appears would be an extremely helpful spot to point a new developer (or even your future self!) in the right direction.

2) Say we have a new class or function that we have written to save ourselves some time. Make a comment explaining the purpose of this class/function!

3) Imagine that you have a few included files that perform other operations that will make the script you're currently viewing operate correctly. You could make 1 comment for all these includes, explaining why they are there and what purpose they have.

4) One day you have to debug a piece of code but you forgot how it works. So you start back-tracking through the code and figure out, piece by piece, everything that was needed to make the end result work. Find the most vital part of your code, and create a comment there explaining just what you found! There may be a workaround later.

While these examples are small, the concept should be quite clear. Try to consolidate your comments, and keep them short and sweet and to the point! The comments are there to leave breadcrumb trails and help debug, NOT to create a new bestseller book!
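
To make a couple of those scenarios concrete, here is a tiny sketch. Everything in it (the config.php file name, the $config array, the fits_upload_limit function) is invented for the example; the point is only where the comments sit and how short they are.

```php
<?php
// $config would normally be defined in an included file such as config.php
// (hypothetical name); it is inlined here so the sketch runs on its own.
$config = array('max_upload_mb' => 200);

// HACK: the limit is stored in megabytes but compared against bytes below,
// so convert once here instead of at every call site.
$max_upload_bytes = $config['max_upload_mb'] * 1024 * 1024;

// Returns true when a file of $size_in_bytes fits under the given limit.
function fits_upload_limit($size_in_bytes, $limit_in_bytes) {
    return $size_in_bytes <= $limit_in_bytes;
}
```

Three comments, each one answering a question the next developer would otherwise have to go digging for: where did this variable come from, why is this odd conversion here, and what is this function for.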

What kind of commenting style should I use?
The commenting style I will suggest here is using phpDoc commenting format. To learn about phpDoc, I would recommend their website for more information.

Why phpDoc? Because, for me and the teams I've worked with, the style is clean, organized, and even supported in IDEs such as Zend and Eclipse!
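
For anyone who hasn't seen one, a phpDoc block looks roughly like this. The full_name function is a made-up example; @param and @return are standard phpDoc tags.

```php
<?php
/**
 * Builds a display name from a first and last name.
 *
 * @param string $first First name, e.g. "John"
 * @param string $last  Last name, e.g. "Doe"
 * @return string The two names joined by a single space
 */
function full_name($first, $last) {
    return trim($first) . ' ' . trim($last);
}
```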

While I know this article was very short, I must break here. A whole book could be focused on the issue of commenting methods (and books DO exist on the subject!), but today just isn't the time for that sort of article. For now, have a Merry Christmas!

Be sure to check out my next optimizing tutorial : By Example

Thursday, December 3, 2009

How to Fix svn: Directory __DIR__ containing working copy admin area is missing (and other misc SVN shortcuts)

So, you have a directory called myclass, you've added the directory to SVN, and then you issue a commit. During the commit, you get a message saying either a file is outdated, or a directory already exists that causes this commit to fail. No big deal right? So you fix those issues and try to commit again... Only this time, you get a totally different error that can cause a pretty ugly headache when you're trying to solve it.

Problem: svn: Directory 'myclass' containing working copy admin area is missing

Solutions
1) You could try and "checkout" the whole project again with the command svn co http://svn.mydomain/my_project/ my_project. This works for some, but in the case above, it may or may not work...

2) You could try and "checkout" only the folder that's having the issues into a temporary directory, then move the .svn folder into the SVN controlled directory where your project lives with the commands:
svn co http://svn.mydomain/my_project/myclass my_temp_folder
mv my_temp_folder/.svn /path/to/my_project/myclass/

Again, this will work for some, but in the case above it may yet again not solve your issue.

3) Finally, you could copy the folder (in this case, myclass) to a temporary location, force delete it from svn's repository, and copy the folder back in, issue the add, and then commit again with the following commands:
cp -r myclass /home/myuser/myclass_temp/
svn --force delete myclass
cp -r /home/myuser/myclass_temp/ /path/to/my_project/myclass
svn add myclass
svn commit -m "Added myclass folder"

This 3rd solution works for me every time, though the others could work for you as well.


Since we're talking about SVN, here are some helpful commands should you need them...

Checkout a SVN Repository : svn co http://svn.mydomain.com/myrepo/my_project myproject

Force an ADD recursively on all files and directories inside a working SVN folder: svn add --force ./*

UPDATE a SVN project: svn update

Commit a change to your SVN project: svn commit -m "My Message here..."

Delete some files or folders from SVN: svn --force delete my_file_or_folder

Friday, November 20, 2009

MySQL Quick Reference

This is just a simple quick reference for normal mysql routines that I forget from time to time and have to lookup.

MySQL

Indexes (from SitePoint)

Adding a “normal” index via CREATE INDEX:

mysql> CREATE INDEX [index_name] ON tablename (index_columns);
Example: mysql> CREATE INDEX fname_lname_age ON people (firstname,lastname,age);

Adding a unique index via CREATE INDEX:
mysql> CREATE UNIQUE INDEX [index_name] ON tablename (index_columns);
Example: mysql> CREATE UNIQUE INDEX fname_lname_age ON people (firstname,lastname,age);

Adding a “normal” index via ALTER TABLE:
mysql> ALTER TABLE tablename ADD INDEX [index_name] (index_columns);
Example: mysql> ALTER TABLE people ADD INDEX fname_lname_age (firstname,lastname,age);

Adding a unique index via ALTER TABLE:
mysql> ALTER TABLE tablename ADD UNIQUE [index_name] (index_columns);
Example: mysql> ALTER TABLE people ADD UNIQUE fname_lname_age (firstname,lastname,age);

Adding a primary key via ALTER TABLE:
mysql> ALTER TABLE tablename ADD PRIMARY KEY (index_columns);
Example: mysql> ALTER TABLE people ADD PRIMARY KEY (peopleid);

Adding a “normal” index via CREATE TABLE:
mysql> CREATE TABLE tablename (
rest of columns,
INDEX [index_name] (index_columns)
[other indexes]
);

Example:
mysql> CREATE TABLE people (
peopleid SMALLINT UNSIGNED NOT NULL,
firstname CHAR(50) NOT NULL,
lastname CHAR(50) NOT NULL,
age SMALLINT NOT NULL,
townid SMALLINT NOT NULL,
INDEX fname_lname_age (firstname,lastname,age)
);

Adding a unique index via CREATE TABLE:
mysql> CREATE TABLE tablename (
rest of columns,
UNIQUE [index_name] (index_columns)
[other indexes]
);

Example:
mysql> CREATE TABLE people (
peopleid SMALLINT UNSIGNED NOT NULL,
firstname CHAR(50) NOT NULL,
lastname CHAR(50) NOT NULL,
age SMALLINT NOT NULL,
townid SMALLINT NOT NULL,
UNIQUE fname_lname_age (firstname,lastname,age)
);

Adding a primary key via CREATE TABLE:
mysql> CREATE TABLE tablename (
rest of columns,
PRIMARY KEY (index_columns)
[other indexes]
);

Example:
mysql> CREATE TABLE people (
peopleid SMALLINT NOT NULL AUTO_INCREMENT,
firstname CHAR(50) NOT NULL,
lastname CHAR(50) NOT NULL,
age SMALLINT NOT NULL,
townid SMALLINT NOT NULL,
PRIMARY KEY (peopleid)
);

Dropping (removing) a “normal” or unique index via ALTER TABLE:
mysql> ALTER TABLE tablename DROP INDEX index_name;
Example: mysql> ALTER TABLE people DROP INDEX fname_lname_age;

Dropping (removing) a primary key via ALTER TABLE:
mysql> ALTER TABLE tablename DROP PRIMARY KEY;
Example: mysql> ALTER TABLE people DROP PRIMARY KEY;

-----------------------------------------------------------------------

Add Column
mysql> ALTER TABLE tablename ADD COLUMN [Col_Name] [Col_Definitions]

Add TinyINT Column
mysql> ALTER TABLE customers ADD COLUMN age tinyint unsigned NOT NULL default '0';

Add Varchar Column
mysql> ALTER TABLE customers ADD COLUMN first_name varchar(40) NULL;


Drop Column
mysql> ALTER TABLE tablename DROP COLUMN [Col_Name]
Example
mysql> ALTER TABLE customers DROP COLUMN age;

-----------------------------------------------------------------------

Import/Export DB and Tables

DB Import Example

mysql -u [username] -p[password] [database] < [filename]
(no space between -p and the password; use -p on its own to be prompted for it)
~$ mysql -u my_user -p customers < customers_backup.sql

Full DB Export Example
mysqldump [options...] -u [username] -p[password] [database] > [filename]
~$ mysqldump -u my_user -p customers > customers_backup.sql

Full DB Export with Drop Table
mysqldump [options...] -u [username] -p[password] [database] > [filename]
~$ mysqldump --add-drop-table -u my_user -p customers > customers_backup.sql

DB Export with Specific Tables
mysqldump -u [username] -p[password] [databasename] [table1 table2 ...] > [filename]
~$ mysqldump --add-drop-table -u my_user -p customers account_info blog_entries > customers_backup.sql

-----------------------------------------------------------------------

Table Stats

SHOW TABLE STATUS
SHOW TABLE STATUS [ {FROM | IN} DB_NAME ] [LIKE 'pattern' | WHERE expr]
The following example uses a database name of "stats" and a table name of "Server"
mysql> SHOW TABLE STATUS FROM `stats` WHERE `Name` = 'Server';


-----------------------------------------------------------------------

Misc

Create User
CREATE USER user [IDENTIFIED BY [PASSWORD] 'password'];
mysql> CREATE USER my_username IDENTIFIED BY 'my_password';

Create Database
$ mysqladmin -u [username] -p create [database]
~$ mysqladmin -u my_username -p create my_database

Drop/Delete Database
$ mysqladmin -u [username] -p drop [database]
~$ mysqladmin -u my_username -p drop my_database

Monday, November 16, 2009

Simple Optimizing Tips for PHP Part II - Recursive Functions

As we write PHP code (or any language for that matter) we begin to start thinking about optimizing code. But even though our intentions are for "the greater good", we sometimes miss things that should be obvious.

Today's article takes a look at:

Recursive Functions

Functions are one of the most important tools we have for developing reusable, structured code. When we build our functions, we are mainly looking at ways to reduce redundancy and create effective ways to perform the same operations we use in either 1 script or all our scripts.

In our attempts to reduce redundancy, we inadvertently create more redundancy or miss opportunities to simplify how our code executes.

Consider this situation. We have a multidimensional array set that we want to display not only in a div, but also divs inside of divs. One method we could solve this with involves something like the following code:


$array = array("info" => array("employees" => array("name" => array(array("first" => "John",
"last" => "Doe",
),
array("first" => "Bob",
"last" => "Henry",
),
),
),
"departs" => array("building_1" => array("Sales",
"PR",
"HR",
),
"building_2" => array("Developers",
"Management",
),
),
),
"misc" => "Created in 2009",
);

function display_information(array $data_array) {
foreach($data_array as $key => $value) { // info
if(is_array($value)) {
foreach($value as $sub_key => $sub_value) {
if(is_array($sub_value)) {
foreach($sub_value as $sub_key2 => $sub_value2) {
if(is_array($sub_value2)) {
foreach($sub_value2 as $sub_key3 => $sub_value3) {
if(is_array($sub_value3)) {
foreach($sub_value3 as $sub_key4 => $sub_value4) {
echo "$sub_key4 => $sub_value4";
}
} else {
echo "$sub_key3 => $sub_value3";
}
}
} else {
echo "$sub_key2 => $sub_value2";
}
}
} else {
echo "$sub_key => $sub_value";
}
}
} else {
echo "$key => $value";
}
}
}
While the above code works exactly as we need it to, it isn't exactly written in the most efficient or optimized way. If you look at the code above, you can already start to see the redundancy involved with it. It's the exact same foreach statement whenever our condition of "is_array" rings true, and the exact same statement when it rings false. Plus, since we have to keep track of where we are in each of our loops, we get to the point where our foreach loop parameters are extremely difficult to keep track of and the code in general is just a mess.

So how can we optimize the code above to be less redundant, and in return create a shorter, cleaner, optimized solution? We do it by creating a recursive function.

A recursive function is just a method that calls itself from within itself. It helps reduce redundancy by applying similar logic to a result set without having to rewrite the exact same methods over and over again (as in the above code).


$array = array("info" => array("employees" => array("name" => array(array("first" => "John",
"last" => "Doe",
),
array("first" => "Bob",
"last" => "Henry",
),
),
),
"departs" => array("building_1" => array("Sales",
"PR",
"HR",
),
"building_2" => array("Developers",
"Management",
),
),
),
"misc" => "Created in 2009",
);

function display_information(array $data_array) {
foreach($data_array as $key => $value) { // info
if(is_array($value)) {
display_information($value);
} else {
echo "$key => $value";
}
}
}


Nice, isn't it? We took a 29 line function from our original example and easily turned it into a 9 line optimized function. This not only reduces the lines dedicated to our function, but it also makes it easier to read, easier to maintain, and above all optimized!

Recursive functions can be used in many instances. The key for the developer is to recognize when one is needed by carefully examining the code. This example (while a bit exaggerated) will hopefully open your eyes and also teach you just one more simple thing you can do to create more optimized code for PHP!
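
Since the original goal was divs inside of divs, here is a rough sketch of how the same recursion could build that markup instead of just echoing key/value pairs. The render_divs name and the "group" class are my own inventions for the example, and it returns a string rather than echoing so the caller decides where the output goes.

```php
<?php
// Hypothetical helper: renders a nested array as nested <div> elements.
function render_divs(array $data) {
    $html = '';
    foreach ($data as $key => $value) {
        if (is_array($value)) {
            // Recurse: a nested array becomes a div containing more divs.
            $html .= '<div class="group">' . htmlspecialchars((string) $key)
                   . render_divs($value) . '</div>';
        } else {
            // Leaf value: one div holding "key => value", escaped for HTML.
            $html .= '<div>' . htmlspecialchars("$key => $value") . '</div>';
        }
    }
    return $html;
}
```

Calling echo render_divs($array); with the array from the example above would produce one outer div per top-level key, with the nesting of the divs following the nesting of the array.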

Tune in next time for my next optimizing article, on Commenting.

Thursday, November 12, 2009

Zend Studio 7.0.2 - Lot of new stuff, same old issues

I've been a Zend Studio IDE user for about 5 years now. I've always loved the IDE since it was so well tuned for PHP projects. Last year I made the blind leap of faith into using Zend Studio 6.0 and was extremely disappointed in what it provided to the developer. The release had so many bugs and never really stood up well against their previous 5.x release...

When 6.0 was released with all the bugs and issues it contained, I went out of my way to write a 3 page bug list for the team. Why? Because I love Zend and their IDE. It is what makes my life as a developer much easier. I was hoping that pointing out the "features" of 6.0 that made it so painful for me to even consider using their product would be enough to send them a message: Change it, or lose yet another developer.

Today I installed Zend Studio 7.0.2 Professional, totally expecting to see these issues resolved. While the IDE shows a drastic change in speed, the same errors and exceptions remain, thus making working in Zend Studio a much more complicated headache than it ever should be.

Granted I would never be able to create the things Zend has for an IDE, however this is not even the case this time. Zend has decided (as of their 6.0 release) to use the Eclipse engine for their IDE. It was a mistake for their 6.0 version, and it's still a mistake today.

The project management piece of Zend Studio (which is the most used functionality for me) is still broken, and management of files is still a clustered mess.

Sigh. While there are a small number of issues with the 5.5.1 build (which is what I STILL use today), these are NOTHING compared to working with 7.0.2.

I hate to say it, but Zend Studio has lost another developer. I'm done. Their 5.x release was so GREAT! It made life so nice! But now they have continued to stick with the failed philosophy of using Eclipse as their engine (which is actually a decent enough engine), and they just cannot seem to make it work with Zend Studio.

I cannot recommend this IDE to anyone, and I definitely cannot justify spending the amount of cash they (Zend) are calling me asking me to purchase for 7.0.2. There is just no way.

I'll continue to use 5.5.1. Unfortunately though, I can only see nothing but doom for Zend's IDE user base, which is really too bad...

Monday, November 9, 2009

Simple Optimizing Tips for PHP - Part I

As we write PHP code (or any language for that matter) we begin to start thinking about optimizing code. But even though our intentions are for "the greater good", we sometimes miss things that should be obvious. This article is one of many to come that hopes to point these things out to you, and in the process train your mind to think of quicker steps you might not otherwise have thought about.

Today's article takes a look at:

Loop Arrays
When we use loops, the goal is simple: iterate through an array set, and do something with the results. Most developers understand this quite well but they build the basic looping structure without considering how poor methods of iteration will affect their code's performance.

Example:

$sql = "SELECT * FROM `user`";
$rst = $dbh->fetch_rows($sql);

for($i = 0; $i < count($rst); $i++) {
  // do something
}


For some, the problem is quite clear. For others, the code above seems quite valid and perfectly fine. However, the above can cause serious latency in your application(s) if you construct your loops this way often enough.

The issue exists with the loop piece $i < count($rst);

Consider that the query we used returns 2000 records. While the for loop is completely valid, the issue is that you are now counting the number of results ($rst) on each and every iteration! This means that if we have 2000 records, it's going to get a count of our $rst array 2000 times!

What's worse is that count() is a function call, and every call carries overhead: PHP has to set up the call, hand it the array, and return the result. PHP arrays keep track of their own size, so a single count() is cheap, but when we are doing it 2000 consecutive times just for the sake of iteration, that overhead adds up.

The solution for this is quite simple. You should first get a count of the number of records, assign it to a variable before the loop, and then use this variable for your iteration. For the PHP engine, it is much faster to check a flat numeric value 2000 times than to call count() 2000 times.


The Full Solution

$sql = "SELECT * from `user`";
$rst = $dbh->fetch_rows($sql);

$total_users = count($rst);
for($i = 0; $i < $total_users; $i++) {
  // do something
}
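
As a rough illustration of how often the loop condition is evaluated, the sketch below wraps count() in a hypothetical helper, counted_count(), that records each call. The helper and the range() stand-in for the query result are assumptions made for this example, not part of the code above.

```php
<?php
// Hypothetical helper: wraps count() and records how often it is called.
function counted_count(array $rows, &$calls) {
  $calls++;
  return count($rows);
}

// Stand-in for the 2000 rows returned by the query.
$rst = range(1, 2000);

// Version 1: the count is re-evaluated in the loop condition.
$calls_in_condition = 0;
for($i = 0; $i < counted_count($rst, $calls_in_condition); $i++) {
  // do something
}

// Version 2: the count is taken once, before the loop.
$calls_hoisted = 0;
$total_users   = counted_count($rst, $calls_hoisted);
for($i = 0; $i < $total_users; $i++) {
  // do something
}

echo "in condition: $calls_in_condition, hoisted: $calls_hoisted\n";
```

The condition runs once more than the number of rows (the final check that ends the loop), so the first counter ends at 2001 while the second stays at 1.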


Where the above will catch some flak from experienced users is the situation where you really do need a fresh count of an array. Such situations include: removing a key from an array, adding a new key to an array, popping values off an array, etc...

For these situations, the solution is not to use a for loop, but rather a foreach loop. Since a foreach loop iterates one array key at a time, we can remove/add array keys without the need to keep a count. Plus, if you really want to keep track of your $i variable's count, you can just assign $i before the foreach loop and increment it within the foreach.

Example:

$sql = "SELECT * FROM `user`";
$rst = $dbh->fetch_rows($sql);

$i = 0;
foreach($rst as $key => $value) {
  echo "|| $key => $value => $i || ";
  $i++;
}
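
A minimal sketch of removing keys during a foreach, assuming a small hypothetical $users array in place of the query result. Removing the current key inside the loop does not disturb the iteration, and $i still tracks how many entries were visited.

```php
<?php
// Hypothetical data standing in for rows fetched from the `user` table.
$users = array('alice' => 'active', 'bob' => 'banned', 'carol' => 'active');

$i = 0;
foreach($users as $name => $status) {
  // Drop banned users without keeping a running count of the array.
  if($status === 'banned') {
    unset($users[$name]);
  }
  $i++;
}

echo "visited $i users, kept " . count($users) . "\n";
```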


While the above can be optimized a hundred different ways, the point is that we should be taking into consideration how a script in PHP operates. Instead of just forcing it to carry out redundant operations and continue to put more overhead/waste of processor cycles on our server, why not try to optimize the code and make a cleaner, faster application?

Tune in next week for my next article: Recursive Functions.

Wednesday, November 4, 2009

PHP Coding Standards

I wrote this a few years ago in order to create a more standard approach to developing PHP code. Enjoy...

/**
* @author Jonathon Hibbard
*/

PHP Coding Standards

Contents
1 PHP Coding Standards
2 Benefits of Coding Standards
3 Coding Standard Used in this Document
4 Identifier Naming
5 Keep Functions to a Maximum of One Page
6 Whitespace and Indentation
7 Placement of Parenthesis and Curly Braces
8 Line Up Like Values
9 Add a Space between Variable Definitions
10 File Structure and File Naming (More Work Needed in this section)
11 Comments
12 A Full Example
13 Conclusion
14 References



PHP Coding Standards
This document provides a complete coding standard to resolve conflicts that may arise when developers share and/or debug code. It is important for every developer to understand the benefits of such standards, how to use them and how not doing so affects others. Each developer settles into their own methods, and forcing them to change usually ends badly.

With that being said, it is encouraged rather than required that you follow this standard. Coding standards do not have to be a difficult thing to conform to. They actually do more good than bad, especially in an environment where such standards do not already exist. This document's intention is to put standards to use and make for a more team-friendly development environment.


Benefits of Coding Standards
Standards benefit both developers and businesses alike. They apply conventions that make it significantly easier for both current and subsequent developers to understand how the application they work on operates. They also make it easier to fix or extend existing code for new needs.

Example:
$a = $b * $c;

The above is 100% valid and works, but its true meaning is very unclear. What do the variables actually represent? How hard is it going to be when we have more variables like $aa, $az, $z, or others that obfuscate this further?

A Standard-Correct Example:
$weekly_pay = $hours_worked * $pay_rate;

This describes both the intent and meaning of the variables, the end result and makes it very easy for anyone else to decipher who may stumble upon this line.

Coding standards are also beneficial in that, when used, they bring transparency to the code, ease debugging and can increase code maintainability.

Quote from the Java Standards Webpage:
  "Why have code conventions?
Code conventions are important to programmers for a number of reasons:
* 80% of the lifetime cost of a piece of software goes to maintenance.
* Hardly any software is maintained for its whole life by the original author
* Code conventions improve the readability of the software,
allowing engineers to understand new code more quickly and thoroughly."



Coding Standard Used in this Document
The coding standard used in this document closely follows the Kernighan and Ritchie (K&R) style, which comes from the C (and C++) world.

There are other standards (including the ANSI C++ standard, the Java standard, the Pascal standard and hundreds of others) that developers use, but this standard is more universal in many ways. The primary reason is that it fits conventions adopted outside of general programming (such as those commonly used with MySQL and other SQL database servers) and has been widely used for well over a decade now.


Identifier Naming
Identifiers are names given to variables, functions, objects, etc.

Given this understanding, you should follow these rules when defining an Identifier:
1) Do not use a leading underscore. They are usually reserved for compiler or internal variables of a language and can mislead a developer into thinking it's a SYSTEM identifier instead of a custom built one. Example of an incorrect name: $_first_name;

2) Do not use names that are like standard system identifiers as these words are reserved words for a reason, and doing so will make the application more difficult to understand. Example of an incorrect name: $for;

3) Give descriptive meaning to identifier names. Example of a bad name: $fn Example of a good name: $first_name. However, make sure to keep in mind not to get too descriptive since this will result in a lot of typing for the rest. Try to keep it short, but to the point. This rule can also be relaxed a little more for instances where short variable names such as $i are used in for loops.

4) Give functions descriptive names that mix verbs and nouns separated by underscores.
Bad Example:
function first_name() {

Good Example:
function get_first_name() {

Once again, this is considered the K&R standard. The other method is the ANSI C++ Standard, which capitalizes the first letter in each word (getFirstName). However, since we are using the K&R Standard in this document, it is required you follow the K&R naming conventions. Function names can easily be dismissed by a developer who is in the heat of the moment in code. It's very easy to call a function "doit" instead of "get_first_name".


Keep Functions to a Maximum of One Page
This is not an absolute rule as much as it is a helpful practice. Functions that exceed 1 page are by default a little harder to decipher both during creation as well as debugging. Scrolling up and down a page can cause confusion as to what function you are in, where the braces line up, and also how something in the beginning of the function worked. This becomes more evident when opening a file in a console editor like vi or Emacs.

Plus, if you find yourself writing more than a page for a function, it may be a very good indication that you might need to break the function down to 2 or more separate functions, that the function could instead be used as a recursive function, or simply that the method you are choosing is not the quickest route.

This does not mean that you should obfuscate your code in order to meet this requirement. If the function absolutely must extend a page (and usually this is the case when formatting arrays correctly) then go ahead and do it. Just keep it in your head that if you can put a piece of code in its own separate function to be used later, do it.


Whitespace and Indentation
Most modern editors today allow you to define how many spaces a tab can be. They also offer the ability to decide if you would rather use the tab (\t) character when the TAB key is pressed, or if the TAB should be interpreted as X number of spaces.

The preferred method is that all TABS be handled as spaces with 2 spaces as the default (4 is usually the system default). If you are used to the standard Windows tab of 4 spaces, this may seem a little awkward at first. Rest assured, it will not only reduce the amount of column space your code takes up, but also give it a more structured and easier flow.

Finally, it should also be common practice to change how new lines are handled in your IDE. The default should be set to treat new lines as the Unix \n (new line) instead of the Microsoft default \r\n (carriage return followed by new line). The reason is that \n is understood by virtually every operating system and editor, so no matter what OS your code is opened up in, each line will appear on its own line instead of everything running together on one continuous line.
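
For files that already contain mixed line endings, a small sketch of normalizing them to \n follows. The function name normalize_newlines() is hypothetical; it handles both \r\n and a bare \r.

```php
<?php
// Hypothetical helper: convert \r\n and bare \r line endings to \n.
function normalize_newlines($text) {
  // Replace \r\n first so that it does not turn into two new lines.
  return str_replace(array("\r\n", "\r"), "\n", $text);
}

$mixed = "line one\r\nline two\rline three\n";
echo normalize_newlines($mixed);
```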

Placement of Parenthesis and Curly Braces
This is perhaps one of the most difficult subjects, not because of its complexity, but more of an infringement of a developer's habit. The ANSI C++ standard has us put curly braces on their own separate lines after ifs, loops, function declarations, etc. The K&R standard, however, has us put curly braces on the same line.

Example ANSI C++ Standard
if (this_condition == true)
{
  //do something
}
else
{
  //do something else
}

K&R 1TBS Standard Example

if(this_condition == true) {
  //do something
} else {
  //do something else
}

This document requires you to use the K&R Standard as was first defined in the beginning of this text. It is very important that this method be used in order to reduce the need of other developers to reformat your code in order to bring it up to scope with this standard.


Line Up Like Values
When you have a list of variable declarations, or a large array definition, it's a good idea to line these values up. It helps with readability and is key to making your code clean for other developers.

Bad Example
$first_name = 'John';
$last_name = 'Doe';
$city = 'Cincinnati';

Good Example

$first_name = 'John';
$last_name  = 'Doe';
$city       = 'Cincinnati';

As you can see, the second example is much more legible. One more example would be for an array:

Bad Example:
Array('first_name' => 'John', 'last_name' => 'Doe', 'city' => 'Cincinnati', 'address' => array('street' => 'PO Box 20', 'zip' => '45241'));

Good Example:

Array('first_name' => 'John',
      'last_name'  => 'Doe',
      'city'       => 'Cincinnati',
      'address'    => array('street' => 'PO Box 20',
                            'zip'    => '45241',
                           ),
     );



Add a Space between Variable Definitions
When defining a variable to be equal to a value, always make sure to put a space between the variable and the equal sign, and also a space between the equal sign and the value.

Bad Example:
$i=0;
function foo($bar=1){
}

Good Example:
$i = 0;
function foo($bar = 1) {
}


Filesystem Naming Conventions (Files/Directories)
Both files and directories should be named in all lower-case by default. The names should be descriptive, but also limited in length. This is not only because file systems on Linux and Unix servers are case-sensitive, but also because domain names typed in capital letters are treated as lower-case anyway (e.g. WWW.GOOGLE.COM will be switched to www.google.com on Enter).

A typical file structure should have the following structure:

/admin/
/apps/
|- classes
|- includes
/css/
/js/
/utils/
/xml/


The admin directory is where backend application files that require a login and administer the site should go. The apps directory has 2 sub directories: includes and classes. Includes are files that are included simply to reduce repetition (such as header or footer files), whereas classes is where files containing class definitions are stored. The css directory, as the name implies, is where style sheet (css) files will go. The js directory is where JavaScript files would go. The utils directory is where either 3rd party applications or reusable applications (such as thumbnail generators) would go. The xml directory is where xml files will go.

Example of a bad filename
1.php

Example of a GOOD filename
list_users.php


Comments
Comments play a major role in any development environment. They not only help an individual debug their own code, but are basically a manual for how your code works. Commenting should follow the PHPDoc standard.

The following is an example of this:

/**
* A brief description of what you are commenting goes here.
*/


Commenting is vital with any team. Over-commenting can make your code very confusing, while under-commenting can leave many questions about what your code actually does.

You should always comment in these circumstances:

* Whenever you define a function or class
* Whenever you define a variable that does not describe itself properly.
* Whenever you write a "hack".
* Whenever a block of code is foreign to you.
* Every new page you create should have a description at the top indicating its purpose.
* If something you do is not 100% finished and needs to be rewritten in the future.
* If something you do is not 100% obvious.


The PHPDoc comment is very intuitive and supported by most popular IDEs.

Some of the most popular keys used with this commenting style are:

@author // Indicates the author's name
@param // Describes a parameter for a function
@copyright // The copyright information (if applicable).
@return // The return value of a function
@todo // Gives you the ability to write-up a "todo" list.



A Full Example

/**
 * @author John Doe
 * This class' purpose is to provide a control area for handling database connections,
 * errors and results.
 *
 * @copyright ACME
 * @todo This class requires a new method of error control.
 */
class db_handle {
  /**
   * This is the database handler variable.
   */
  public $dbh;

  /**
   * This is the constructor of the class.
   * @param string $strDSN // String containing connection information
   * @return object // Returns the db_handle object.
   */
  public function __construct($strDSN) {
    // Do Something.
  }
}



Conclusion
This is the end of the current Coding Standards document. There are a few other concepts, such as private/public variables, unit naming, and other high-end programming techniques, that could be covered, but they are left out because this document is not about "how to program"; it is about coding standards.


References
http://en.wikipedia.org/wiki/Coding_standard - Wikipedia Definition of a Coding Standard
http://en.wikibooks.org/wiki/C%2B%2B_Programming/Code_Style - Code Standards
http://en.wikipedia.org/wiki/Indent_style - K&R Coding Style Definition
http://dn.codegear.com/article/10280/Code - Great Article relating to the K&R Coding Style

Friday, January 9, 2009

When and How to Use Cookies

(Disclaimer: The methods described below are for educational purposes only! These methods are not certified by me to stop or prevent attacks on your site. Instead, this blog tries to explain some ideas for properly using cookies. In other words, use these methods at your own risk, but don't come looking to sue me if you or your site are compromised).

Cookies, by some, are one of God's greatest inventions for the Internet. To others, cookies represent all that is bad and evil in the world today.

In reality, they are both.

These little files are only as good or bad as the way you, the developer, use them. With the growing popularity of XSS attacks and CSRF attacks, it is no wonder developers are having trouble associating anything good with cookies, and also why even more users are refusing to accept them.



In this blog entry, I'll explain to you how to make using cookies on your site safer by explaining the following:
When to use cookies, and when not to use cookies
How to properly validate users with cookies
Alternatives to using client-side cookies
References/Further Reading



When to use cookies
Many sites, in the past and unfortunately in the present, use cookies mainly to identify a user who has access to a website. Cookies are intended to be beneficial files that grant users a more pleasant experience while they are browsing, shopping, or later returning to a site, without the need to retype their credentials. However, the same helpful intentions are continually being turned into unpleasant security vulnerabilities for users and websites alike.

We first start with when not to use cookies.

Cookies should not be used to store sensitive information. To be more blunt, you shouldn't store unencrypted (or plain-text) values for usernames, credit card information, first/last names, addresses, phone/fax/mobile numbers, passwords, email addresses, websites, social security numbers...you get the drift. Nothing personally related to your user nor anything that will be used to automatically authenticate/login a user should be available in a cookie.

While a select few will scream “blasphemy” at such an idea, I assure you it is not I who is blaspheming. It is a common misconception among developers that cookies are a safe warehouse for information. Such carelessness is what makes the current craze of XSS and CSRF attacks among crackers (poorly and misleadingly labeled hackers) a workable way to attack a user or website.

So what should cookies be used for then?

Cookies should be used for simple and harmless storage, and/or for validating that an actual user is submitting a form from within your website (for examples using the form method, please see the Preventing CSRF and XSRF Attacks link below).

Some Ideas of simple storage:
Keeping track of a shopping cart of an unregistered user (registered users have the luxury of that information being stored in a database).

Remembering which test/quiz/questionnaire/poll a user has already completed. Of course, I do not mean for use with prize-based polls/questionnaires/etc. These should be limited to registered users and, like before, stored in a database.

Tracking when a user is logged into your site to assist and report “Number of active/Logged In Users” status.

All of these methods are simple, and harmless uses to store cookie data. But like any other data coming from a user to your site, the above must always be validated. You must be as paranoid as possible when dealing with data coming from your client, and validate everything, always. Assuming that no one cares about your website, or that it's unlikely a cracker would ever want to take over your website is exactly why a cracker would choose to take over your site. The majority of crackers do this in order to obtain popularity within their group (or bragging rights if you will), and when one finds out it's possible, the majority of the others will follow in order to gain the same bragging rights. All courtesy of your website's vulnerabilities.

In general, remember to keep it simple, keep it light, and obfuscate the values you store in a cookie as well. If you want to store the ID of an item, give the cookie an unrelated name such as “session_id_for_user” and set its value to the item_id multiplied by some integer, combined with letters/numbers.

Example:
Assume you sell Bourbon, and the George T. Stagg has an item_id of 15. The value you store in the cookie would be b375gts. In other words, b (for bourbon), 375 (which is 15 x 25), and gts (the initials for George T. Stagg).

Don't go using my example though. Be creative! Be obtrusive! The harder it is for a cracker to understand just exactly what your cookie values do, the harder it's going to be for them to realize that it's a worthless cookie to begin with.
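
Purely to illustrate the b375gts scheme, here is a minimal sketch. The function names are hypothetical, and the multiplier 25 and the letter/digits/initials layout are simply the ones from the example; the decoder rejects anything that does not fit that layout, in keeping with the advice to validate everything coming from the client.

```php
<?php
// Hypothetical helper: builds a value such as b375gts from item_id 15.
function encode_item_cookie($category_letter, $item_id, $initials) {
  return $category_letter . ($item_id * 25) . $initials;
}

// Hypothetical helper: returns the item_id, or false if the value is malformed.
function decode_item_cookie($cookie_value) {
  // Accept only: one lowercase letter, digits, then lowercase initials.
  if(!preg_match('/^[a-z](\d+)[a-z]+$/', $cookie_value, $matches)) {
    return false;
  }

  $number = (int) $matches[1];
  // A value that is not a multiple of 25 was not produced by the encoder.
  if($number % 25 !== 0) {
    return false;
  }

  return (int) ($number / 25);
}

echo encode_item_cookie('b', 15, 'gts') . "\n";
```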

How to properly validate users with cookies
One of the most popular things a developer wants to do with cookies these days is allow a returning user to have a more pleasant experience on their site by not needing to enter as much data to gain access. But since we've already stated that storing information used to authenticate a user is not only dangerous, but shouldn't be done, how should we go about granting users some sort of relief?

The answer, just like above, is an obfuscated encryption key that is used to lookup a user_id in a database, but requires the user to type in their password. Once the encryption key is used and the correct password entered, the current encrypted key is discarded, and a new one is generated.

While this may sound like it contradicts my statements earlier, it is in fact a legitimate method! How can this be? Well, first of all, the key is a hash generated from a combination of the user's browser agent (ie: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.5) Gecko/2008121622 Ubuntu/8.04 (hardy) Firefox/3.0.5), a current timestamp (12388445), and parts of the person's username and email address (smashed123 becomes maed23 and bob@gmail.com becomes omilom), all run through SHA or MD5 (although MD5 has known collision weaknesses and is best avoided). Strictly speaking these are one-way hash functions rather than encryption, which is the point: now you have a nice, simple, and (for now) safe key. One that, even if cracked, would be meaningless/useless to the cracker who attempts to hijack it.

And since we are still requiring the user to type in their password to the site in order to authenticate them, off-site attacks are even less likely.
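
A rough sketch of how such a lookup key could be put together and rotated, under stated assumptions: the function name build_lookup_key(), the '|' separator and the choice of SHA-256 through PHP's hash() function are illustrative, not a prescribed recipe.

```php
<?php
// Hypothetical helper: hashes the ingredients described above into one key.
function build_lookup_key($user_agent, $timestamp, $username_part, $email_part) {
  // The '|' separator is an assumption made for this sketch.
  $raw = $user_agent . '|' . $timestamp . '|' . $username_part . '|' . $email_part;
  return hash('sha256', $raw);
}

$user_agent = 'Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.5) Gecko/2008121622 Ubuntu/8.04 (hardy) Firefox/3.0.5';

// The key stored in the cookie and alongside the user_id in the database.
$old_key = build_lookup_key($user_agent, 12388445, 'maed23', 'omilom');

// Once the key has been used and the correct password entered,
// the old key is discarded and a fresh one is generated.
$new_key = build_lookup_key($user_agent, time(), 'maed23', 'omilom');

echo "$old_key\n$new_key\n";
```

The key only identifies which user_id to look up; the password is still what authenticates the user.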

Even after all of this, I still do not use cookies for validating a user's login credentials. Unfortunately, I would rather lose a lazy user than give an attacker even 1 iota of an advantage, no matter how good the method appears.

Alternatives to using client-side cookies
There are of course alternatives to using cookies. Since I am more of a PHP fan, I will use PHP's server-side sessions as an example.

PHP stores sessions 3 ways: via cookies on the client side (which, if used, should be used in conjunction with the session.cookie_httponly setting to help reduce XSS vulnerability), on the filesystem of the server (usually located in /tmp with a hash value), or in the server's memory (not covered or recommended).
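
As a small sketch, the session.cookie_httponly setting can be switched on at runtime with ini_set(), assuming it has not already been set in php.ini. It needs to run before session_start() and before any output is sent.

```php
<?php
// Ask PHP to mark the session cookie as HttpOnly so that scripts
// running in the page cannot read it.
ini_set('session.cookie_httponly', '1');

// session_start(); // would follow here in a real page
```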

Since we're only looking at server-side methods, that obviously leaves sessions stored on the server's filesystem. While this method is much more secure and reliable, it is still strongly recommended to follow the above guidelines when determining what sort of data to store inside of a session.

Resources/Further Reading
Preventing CSRF and XSRF Attacks
The Felten and Zeller Paper (PDF)
Cross-site Scripting Prevention/Awareness


In conclusion, I hope this has been helpful in some way to anyone interested in security. While some of the above almost certainly can be improved on (and I truly hope the development community tries to assist in doing just that), I welcome any and all comments.

Wednesday, January 7, 2009

Facebook - You're killing me smalls!

About 4 months ago, I began making my transition from MySpace over to the Facebook realm. Until recently, I had the utmost respect for Facebook, its interface, how it works, and all that jazz.

As I said, that is until recently.

That is until I started getting all sorts of emails out of the blue. "What the hell??", I asked myself. Was Facebook selling my account information to a 3rd party? Were they spamming me to make an extra buck?

The answer to all of these questions is "No" - at least not on purpose.

For any tech nerd out there who knows anything at all about Cookies, Sessions, and Security from building a website, you know some pretty common rules when it comes to user information.

1) Don't put any sensitive information in the cookie that you yourself wouldn't want the whole ever-loving world to know about.

2) Don't trust that cookie data. It should be there as a means of "Quick Lookup" ONLY!

3) DON'T PUT ANY SENSITIVE INFORMATION IN THE COOKIE!!

Facebook, being the big powerhouse social networking site that they are, should of all people be more than aware of this. They should know that setting a cookie called "login_x" with my username (which is my email address too by the way) in there is a HUGE no-no! Worse yet, it's in plain text to boot!!

Well, maybe I'm over-reacting a little bit. I mean, it is quote/unquote encrypted with url encode!
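
To show why URL encoding hides nothing, here is a tiny sketch with a placeholder address (not taken from any real cookie): anyone who can read the cookie can reverse it with a single call.

```php
<?php
// Placeholder address used only for this illustration.
$email = 'someone@example.com';

$encoded = urlencode($email);   // what would sit in the cookie
$decoded = urldecode($encoded); // what anyone reading the cookie gets back

echo "$encoded\n$decoded\n";
```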



COME ON GUYS! What the heck gives with this crap?

Given the fact that a LOT more Cookie Sniffing sites are coming out, wouldn't it be pretty obvious that this is yet another great way to have the security on your site compromised, or even worse, a great way for my personal information to get out?! A simple lookup of the email on Google, and more than likely you're also going to know that person's first and last name. With that, if the user, say, owns a few domain names? A quick whois on those domains and you now have their address and phone number too! All this information in a matter of seconds if you have written a script to just do the leg work for you, or a couple minutes if you're a teenager with a desire to be a haxx0r.

Now, in case there are any naysayers out there who think that cookies are part of Satan's toolkit and that only a moron would store them, just hold on a second. Cookies can be good for a number of reasons, like storing a counter, the number of the last post you read, or something else insignificant that would just make your user's experience a tad bit better. Hell, maybe I'll even write a blog on the goods and bads of cookies. Either way, there are ways to use a cookie and there are ways NOT to use a cookie. For a good show in how not to, just use Facebook as an example.

Don't get me wrong, I still love Facebook as an application, and I'll still use it since I've put way too much time into it as it is. However, I for one am going to try and block any and all cookies from this moment on from Facebook.

I mean, MySpace isn't exactly the greatest site on the planet, but hey, at least they aren't storing sensitive information like this in plain text for the world to see...