Usage
The include statement is one of the most widely used statements in PHP. As the name suggests, it includes one file within another (think of #include from C++). It does more than copy the contents, however: PHP also parses the file it includes. A file can be included either locally or remotely. For remote inclusion, the URL of the file is provided. Note that remote inclusion works only if the server allows it and the content returned by the URL can be parsed.
For local inclusions, either the full path or the relative path of the file must be provided. The include statement lets developers split their code into different files for better organization, and it is a major asset for template systems. Even though the operating system may not consider a file to be a PHP file, the file will still be parsed when included within a PHP script if it contains PHP code. This is as useful as it is dangerous if the included paths are not properly filtered and monitored.
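To illustrate that the file extension does not matter, here is a minimal sketch. The temporary file name and its contents are made up for the demonstration; the point is only that PHP code inside an ".html" file runs when the file is included.

```php
<?php
// Hypothetical demo: create a throwaway ".html" file that contains PHP code.
$path = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'include_demo_' . uniqid() . '.html';
file_put_contents($path, '<p>Sum: <?php echo 2 + 3; ?></p>');

// Capture what the included file prints so it can be inspected.
ob_start();
include $path;
$output = ob_get_clean();

unlink($path);  // remove the throwaway file
echo $output;   // prints "<p>Sum: 5</p>"
```

The expression inside the ".html" file is evaluated rather than printed verbatim, which is exactly why unfiltered inclusion is risky.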
Since the include statement is a language construct rather than a function, parentheses are not required. Both of the following forms work:
<?php
include "file1.html";    // without parentheses
include ("file2.html");  // with parentheses
?>
require, include_once and require_once
Besides include, there are three related statements that do the same thing with slight differences. The difference between include and require is how they handle failure: if the file cannot be included, require stops the script with a fatal error, whereas include only emits a warning and lets the script continue. require is therefore useful when your script needs those files in order to function. include_once and require_once, as the names suggest, include a file only if it has not already been included (useful when inclusions are handled dynamically). In practice, it may be best to get used to using include_once and require_once instead of the other two.
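The differences can be seen in a small sketch. The file names below are hypothetical and generated only for the demonstration; the missing file is assumed not to exist.

```php
<?php
// Hypothetical path that is assumed not to exist.
$missing = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'no_such_file_' . uniqid() . '.php';

// include: emits a warning (suppressed here with @) and returns false; the script continues.
$result = @include $missing;
var_dump($result); // bool(false)

// Throwaway helper file that counts how many times it has been executed.
$helper = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'helper_' . uniqid() . '.php';
file_put_contents($helper, '<?php $counter = isset($counter) ? $counter + 1 : 1;');

// include_once: the second attempt is skipped and simply returns true.
include_once $helper;
$second = include_once $helper;
unlink($helper);

var_dump($counter); // int(1): the helper ran only once
var_dump($second);  // bool(true)

// require $missing; // would stop the script here with a fatal error
```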
Security Issues
Because practically any file can be included, proper steps need to be taken to prevent exploits. These exploits are generally not the fault of PHP but of the developer. Because PHP is so easy to learn, badly written or insecure code is common. It is in the developer's best interest to study the ways the include statement can be exploited.
Many tutorials take the easy route when explaining the include statement, which generally involves the $_GET variable. $_GET stores parameters taken from the URL, so anyone can change its values simply by editing the address. For this reason, $_GET is a lot easier to manipulate than, for example, $_POST. Tutorials should therefore go beyond the simplest possible example.
You might come across a tutorial with code similar to this:
<?php include $_GET['file']; ?>
It is understandable that a tutorial would show such simple code, since its point is often to teach the include statement briefly. However, unless the risk is stated, this lowers quality and promotes bad code. Adding a file extension does not help either:
<?php include $_GET['file'].".html"; ?>
As mentioned before, any file that contains PHP code is parsed by PHP regardless of its file extension. If the server allows remote inclusion, arbitrary code can be executed and information about the server can be exposed. If the developer wants only the files they have in mind to be included, they should filter $_GET or whatever dynamic variable is used. Note also that if a user manages to upload a file to the server, that very file can be included with the include statement.
Anyone who knows a little computer science can tell you that, in C, a string is just an array of characters that ends with a null byte. What you are giving the include statement is a string. If a user manages to pass a null byte (for example, %00 in the URL), everything after it, including the extra ".html", may be ignored. Current versions of PHP reject paths that contain null bytes, but older ones do not. What, then, is the point of adding a file extension? Learn to filter all input, as a general rule of thumb.
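One common way to filter this kind of input is a whitelist, where the value from the URL is only used as a lookup key and never as a path. The sketch below is an illustration under assumptions: the page names, the file paths and the resolve_page() helper are all hypothetical.

```php
<?php
// Hypothetical whitelist mapping public page names to files on disk.
$pages = array(
    'home'    => 'pages/home.html',
    'about'   => 'pages/about.html',
    'contact' => 'pages/contact.html',
);

/**
 * Return the file for a requested page name, or null if the name is not whitelisted.
 * resolve_page() is an illustrative helper, not a PHP built-in.
 */
function resolve_page($name, array $pages)
{
    // Only exact, known keys are accepted; the user's string never becomes a path.
    if (is_string($name) && array_key_exists($name, $pages)) {
        return $pages[$name];
    }
    return null;
}

$requested = isset($_GET['file']) ? $_GET['file'] : 'home';
$target = resolve_page($requested, $pages);

if ($target === null) {
    exit("Invalid path.");
}
// include $target; // left commented out because the page files are hypothetical
```

With this approach a request such as index.php?file=../../etc/passwd fails the lookup, and so does a name with a trailing null byte, because neither is a key in the list.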
Filtering Data
With static paths, filtering isn't a concern; dynamic paths that anyone can manipulate are. One of the steps in figuring out what to filter is thinking about what you don't want included. Do you want people to include things from a remote source? How about files that shouldn't be publicly accessible (e.g. /etc/passwd)? How about the very file doing the inclusion (e.g. index.php?file=index.php), which wastes server resources because of its recursive nature?
If you have made your own flat-file content management system, it should not suffer from these problems. All of the problems mentioned here are easy to solve. The solution is to check the path provided, whether it comes from the URL or elsewhere. The following example shows one way to stop all three problems at once:
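<?php
if (isset($_GET['file']) && file_exists($_GET['file']))
{
    // Resolve symbolic links and "../" segments to an absolute path.
    $_GET['file'] = realpath($_GET['file']);
    // Allow only paths under the document root, and never this script itself.
    // preg_quote() escapes any regex metacharacters in the document root.
    if (preg_match("#^".preg_quote($_SERVER['DOCUMENT_ROOT'], "#")."#", $_GET['file'])
        && $_GET['file'] != $_SERVER['SCRIPT_FILENAME'])
        include_once $_GET['file'];
    else
        exit("Invalid path.");
}
else exit("Invalid path.");
?>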
file_exists() lets us tell a file system path apart from a URL, since it returns false for an HTTP URL. realpath() returns the absolute path of the file, which tells us where the file is actually located. We then check whether that path lies within our document root (the location where files can be accessed publicly) by matching it against $_SERVER['DOCUMENT_ROOT']. If it does, we also check that $_GET['file'] is not equal to $_SERVER['SCRIPT_FILENAME'] (so we can avoid instances like index.php?file=index.php). If any check fails, the script exits.
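One detail worth noting is that a plain prefix match on the document root would also accept a sibling directory whose name merely starts with the same characters (for example /var/www-private when the root is /var/www). The following sketch shows one possible stricter check; resolve_inside_root() and the directory names are hypothetical and exist only for illustration.

```php
<?php
/**
 * Return the resolved path if $path is an existing file inside $root, otherwise false.
 * resolve_inside_root() is an illustrative helper, not a PHP built-in.
 */
function resolve_inside_root($path, $root)
{
    $realRoot = realpath($root);
    $realPath = realpath($path);  // false when the file does not exist
    if ($realRoot === false || $realPath === false) {
        return false;
    }
    // Compare against the root plus a trailing separator so that a sibling
    // such as "www-private" is not mistaken for something inside "www".
    $prefix = rtrim($realRoot, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR;
    return strncmp($realPath, $prefix, strlen($prefix)) === 0 ? $realPath : false;
}

// Demo with throwaway directories standing in for a document root and a sibling.
$base = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'root_demo_' . uniqid();
mkdir($base . '/www', 0777, true);
mkdir($base . '/www-private', 0777, true);
file_put_contents($base . '/www/page.html', 'public');
file_put_contents($base . '/www-private/secret.html', 'private');

$inside  = resolve_inside_root($base . '/www/page.html', $base . '/www');
$sibling = resolve_inside_root($base . '/www-private/secret.html', $base . '/www');
$climb   = resolve_inside_root($base . '/www/../www-private/secret.html', $base . '/www');
$absent  = resolve_inside_root($base . '/www/missing.html', $base . '/www');

var_dump($inside !== false); // bool(true)
var_dump($sibling);          // bool(false)
var_dump($climb);            // bool(false)
var_dump($absent);           // bool(false)

// Clean up the throwaway files and directories.
unlink($base . '/www/page.html');
unlink($base . '/www-private/secret.html');
rmdir($base . '/www');
rmdir($base . '/www-private');
rmdir($base);
```

A caller would include the returned path only when the result is not false, and could still compare it with the running script's path as described above.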
As an extra precaution, try to avoid special characters in the URL, particularly at the end of the URL.
Tips and Tricks
Now that the essentials are covered, let's look at some interesting facts about the include statement.
Object Oriented Programming
Whether or not everyone realizes it, the include statement can be used in object-oriented programming to give other scripts access to protected and private data. When a file is included within a method, that file has access to the object through the $this variable. Of course, this is dangerous if people other than the developers have access to this file. Therefore, be sure to check file permissions and take extra precautions to avoid exploits.
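A minimal sketch of this behaviour follows. The Page class, its property and the template file are hypothetical and created only for the demonstration.

```php
<?php
// Throwaway file standing in for a template; it reads a private property through $this.
$template = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'tpl_' . uniqid() . '.php';
file_put_contents($template, 'Hello, <?php echo $this->name; ?>!');

// Page is a hypothetical class used only for this illustration.
class Page
{
    private $name;

    public function __construct($name)
    {
        $this->name = $name;
    }

    public function render($file)
    {
        // The included file runs in the scope of this method,
        // so it can read private members through $this.
        ob_start();
        include $file;
        return ob_get_clean();
    }
}

$page = new Page('world');
$html = $page->render($template);
unlink($template);
echo $html; // prints "Hello, world!"
```

Anyone able to change the template file could read or alter the object's private state in the same way, which is why file permissions matter here.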
