Why HTTP_HOST is evil

When browsing Stack Overflow I often notice users asking questions that somehow involve HTTP_HOST. I nonchalantly hint at its vulnerable nature and then fail to point to an article explaining why. Which is why I decided to take matters into my own hands.

The origin

In PHP, our protagonist is accessible via:

$http_host = $_SERVER['HTTP_HOST'];

The value in the $_SERVER superglobal is taken from the HTTP request’s Host: header. HTTP/1.1 requires clients to send this header; only HTTP/1.0 clients may omit it, but what browser uses that outdated protocol anymore (unless you tell it to)?

For an Apache server responsible for multiple sites, the information in the Host: header is crucial to determine which virtual host to route the request to. After all, the client only connects to an IP address and multiple domain names can resolve to this address.
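
To make the routing step concrete, here is a minimal PHP sketch of name-based virtual host selection. It is not Apache's actual implementation: the function name pick_virtual_host, the $vhosts map and the paths are hypothetical and exist only for illustration. It assumes the first configured host acts as the default, which is how Apache treats a request whose Host: header matches nothing.

```php
<?php
// Hypothetical map of configured server names to document roots.
$vhosts = array(
    'perfect-co.de' => '/var/www/perfect-co.de',
    'example.org'   => '/var/www/example.org',
);

/**
 * Pick the document root for a request based on its Host: header.
 * Falls back to the first configured host when nothing matches.
 */
function pick_virtual_host(array $vhosts, $hostHeader)
{
    // Strip an optional port and compare case-insensitively.
    $name = strtolower(preg_replace('/:\d+$/', '', $hostHeader));

    if (isset($vhosts[$name])) {
        return $vhosts[$name];
    }

    // No match: the default (first) virtual host answers the request.
    return reset($vhosts);
}

echo pick_virtual_host($vhosts, 'example.org'), "\n";
echo pick_virtual_host($vhosts, 'anything-at-all'), "\n";
```

Note that the second call still gets an answer: an unknown Host: value is not rejected, it simply lands on the default host.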

The assumption

Since the Apache server is doing all the work of finding the correct virtual host to serve our request and passes HTTP_HOST along to the script to be executed, many assume HTTP_HOST to now contain the correct domain name to which a client has connected.

This assumption is wrong under certain circumstances.

The common use

Let’s assume you have a template-engine-driven website, you’re aware that hard-coding URLs into your templates is a Bad Thing™, and you set up globally available template variables instead. In this example, our template engine is Smarty:

The globally included setup script:

$smarty = new Smarty();
/* later on.... */
$smarty->assign(array(
    'title' => 'My Homepage',
    'page_base' => $_SERVER['HTTP_HOST'], // taken straight from the request
));

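Later on, in a template:

<div class="links">
    <a href="http://{$page_base}">Home</a>
    <a href="http://{$page_base}/about">About</a>
    <!-- and more links -->
</div>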

A non-malicious HTTP request would have generated two or more links to URLs on the current domain.

What usually happens:

The following HTTP request could have been sent by a Firefox browser:

GET / HTTP/1.1
Host: perfect-co.de
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; de) ...
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,de;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive

What could have happened:

Someone could have come along, opened a telnet session to your domain’s IP address and sent the following manually:

GET / HTTP/1.1
Host: "><iframe src="about:blank" onload="alert('XSS')"

That’s all. Your template’s result should now look like:

<div class="links">
    <a href="http://"><iframe src="about:blank" onload="alert('XSS')"">Home</a>
    <a href="http://"><iframe src="about:blank" onload="alert('XSS')"/about">About</a>
    <!-- and more links -->
</div>

Tada! XSS galore. This is also easily achievable through a Firefox plugin called TamperData.
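
The effect can be reproduced without a web server. The following is a rough sketch, not the Smarty setup itself: render_link is a hypothetical stand-in for the template, and the $escape flag is only there to compare raw output with HTML-escaped output.

```php
<?php
/**
 * Hypothetical stand-in for the template: builds a link from a host value.
 * $escape toggles HTML escaping of the host.
 */
function render_link($host, $path, $label, $escape)
{
    if ($escape) {
        // ENT_QUOTES also converts single quotes, not just double quotes.
        $host = htmlspecialchars($host, ENT_QUOTES, 'UTF-8');
    }
    return '<a href="http://' . $host . $path . '">' . $label . '</a>';
}

// The Host: header from the forged request above.
$forged = '"><iframe src="about:blank" onload="alert(\'XSS\')"';

echo render_link($forged, '/about', 'About', false), "\n"; // raw
echo render_link($forged, '/about', 'About', true), "\n";  // escaped
```

The escaped variant no longer breaks out of the href attribute, but the link still points to whatever the client sent. Escaping limits the damage; it doesn't make the value trustworthy.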

The prevention

The above example actually only works for the default virtual host, as it’s the one that all non-matching requests are routed to.

Apache does its CGI scripts the favor of providing SERVER_NAME as well. Given a correct setup, it’ll actually contain the virtual host’s domain name, as configured. No buts. The correct setup includes this tiny little directive:

UseCanonicalName On

Without this, $_SERVER['SERVER_NAME'] would at least have contained an escaped variant of our injection. With the above directive in place, we force $_SERVER['SERVER_NAME'] to actually be the domain name configured for the targeted virtual host (its ServerName).
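
In the setup script this boils down to reading a different array key. The sketch below goes one step further and sanity-checks the value, since SERVER_NAME is only trustworthy when Apache is configured as described. The helper name canonical_host, its fallback parameter and the hostname pattern are assumptions made for illustration; none of them are part of PHP or Smarty.

```php
<?php
/**
 * Return a host name that is safe to embed in links.
 * Prefers SERVER_NAME (set from the virtual host's configuration when
 * UseCanonicalName is On) and never looks at HTTP_HOST.
 */
function canonical_host(array $server, $fallback)
{
    $name = isset($server['SERVER_NAME']) ? $server['SERVER_NAME'] : '';

    // Conservative pattern: dot-separated labels of letters, digits, hyphens.
    $pattern = '/^[a-z0-9]([a-z0-9-]*[a-z0-9])?(\.[a-z0-9]([a-z0-9-]*[a-z0-9])?)*$/i';
    if (preg_match($pattern, $name) === 1) {
        return strtolower($name);
    }

    // Anything unexpected: use the hard-coded fallback instead.
    return $fallback;
}

// The fallback domain here is just the example domain used in this article.
$pageBase = canonical_host($_SERVER, 'perfect-co.de');
```

The result can then be assigned to the page_base template variable in place of $_SERVER['HTTP_HOST'].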