About Apache Compression and Content-Length Header

Just resolved an interesting problem today, one of our code breaks because the response header set by the web server did not include Content-Length.

Spent quite a while investigating and turns out this is due to gzip compression. As seen below Content-Encoding is gzip and I think this causes Content-Length to be omitted.

apache-resp-headers

Gzip compression can be disabled on apache via .htaccess config. In my case I disabled all compression to swf file by adding following configuration

<FilesMatch "\.swf$">
  SetEnv no-gzip 1
</FilesMatch>

VirtualBox, Ubuntu and LAMP Stack

Came accross VirtualBox, a free & excellent virtual machine software. I decided to take it for a spin creating a Ubuntu virtual machine LAMP stack on it..

Here We Go

  1. Download and install VirtualBox
  2. Download latest Ubuntu iso installation file
  3. From VirtualBox create a new Virtual Machine. Select type: Linux and version: Ubuntu. On the next step you will be prompted with dvd drive containing the installaion disk, but instead just select the iso downloaded on step 2
  4. Go through the Ubuntu installation steps
  5. It’s also very helpful to install ssh server so you can ssh into your VM later on: sudo apt-get install openssh-server

Voila! You have ubuntu running on your Windows PC

Host and Guest

In virtualization realm, host indicates your physical PC (Windows 7 in my case), and guest is the virtual machine (Ubuntu). Most of virtual machine software documentation uses host and guest terminology heavily so make sure you’re familiar with it

Networking

This is where things get tricky. Virtual machine comes with virtual network adapters, and you have to do few configuration to setup connectivity between your virtual and physical adapters.

By default VirtualBox allows the guest machine to connect to the internet through NAT, so you can download data, browse internet etc. However if you want to run servers from the guest, it won’t be discoverable by the host or other PC in the host’s network immediately.

One approach to make them discoverable is by setting up port forwarding. You get here by going to networking section on the machine’s setting on Virtual Box

portforwarding

Note that setting port forwarding requires the port is actually free on your host machine. Hence I find it very useful to add an IP to your host’s network interface specifically for the VM so you don’t have port conflicts. In this example I added the IP 192.168.16.201 on my interface:

addip

The “AMP”

So there’s the “L – Linux” done. Now for the Apache, Mysql and Php, it can simply be done by using Ubuntu’s apt-get package manager:

  1. Open a terminal / SSH session to your Ubuntu machine
  2. Elevate into root using sudo su root
  3. apt-get install apache2
  4. apt-get install php5
  5. apt-get install mysql-server mysql-client

Few helpful notes:

  • Default doc root is /var/www
  • To start / stop apache: sudo service apache2 stopsudo service apache2 start
  • To start / stop mysql: sudo service mysql stop / sudo service mysql start

Useful Apache Configuration

Redirecting HTTPS to HTTP (and vice versa)

RewriteEngine On
RewriteCond %{HTTPS} on
RewriteRule (.*) http://%{HTTP_HOST}%{REQUEST_URI}

http://stackoverflow.com/questions/8371/how-do-you-redirect-https-to-http

WordPress .htaccess Rewrite Rule

This .htaccess will rewrite any path not resolving to actual file or directory. It will add “/index.php” prefix into the URL. This is required for wordpress permalink

RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]

Redirecting mydomain.com to http://www.mydomain.com (or the opposite)

Your user often access your site using www prefix or without it, hence you setup both URL to resolve into your webhost in your DNS. However if you don’t redirect one into the other, search engine might think it’s a completely different site (hence website statistics etc will be wrong). One approach is to do external (301) redirect from one into the other.

RewriteCond %{HTTP_HOST} ^mydomain\.com.au [NC]
RewriteRule (.*) http://www.mydomain.com.au/$1 [L,R=301]

Monitoring Apache

To enable apache monitoring, firstly make sure status module is enabled. Find following line on your httpd.conf

LoadModule status_module modules/mod_status.so

Then add following configuration section. The “Allow from” restriction will prevent arbitary IP to view this, so if your ISP provide you with static IP, put it here.

<Location /server-status>
    SetHandler server-status
    Order deny,allow
    Deny from all
    Allow from 220.245.123.18
</Location>

You can monitor your apache server (see worker threads status etc) by going to http://mydomain.com/server-status

 

Stay Tuned!

More to come when I stumble accross them

See Also

Benchmarking Web Page Load Time Using Apache AB Tool

Any apache httpd installation comes with ab tool on the bin folder. This handy tool can be used to perform benchmark testing:

ab -n 10 -c 2 http://www.mycoolwebsite.com/

The output you get is quite self-describing:

This is ApacheBench, Version 2.3 
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking www.vantagefx.com (be patient).....done


Server Software:        Apache
Server Hostname:        www.mycoolwebsite.com
Server Port:            80

Document Path:          /
Document Length:        29320 bytes

Concurrency Level:      2
Time taken for tests:   4.524 seconds
Complete requests:      10
Failed requests:        0
Write errors:           0
Total transferred:      297360 bytes
HTML transferred:       293200 bytes
Requests per second:    2.21 [#/sec] (mean)
Time per request:       904.890 [ms] (mean)
Time per request:       452.445 [ms] (mean, across all concurrent requests)
Transfer rate:          64.18 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    1   1.3      0       4
Processing:   806  887  90.4    852    1063
Waiting:      804  885  90.4    850    1061
Total:        806  887  91.3    852    1067

Percentage of the requests served within a certain time (ms)
  50%    852
  66%    888
  75%    918
  80%   1027
  90%   1067
  95%   1067
  98%   1067
  99%   1067
 100%   1067 (longest request)

Note that however I believe this tool will only request specified html page, not external resources associated with the page (no external images, javascript, css, etc.).

If you want to test https (SSL) page, make sure you have a version of Apache httpd with ssl support, and use abs instead.

Tomcat & Apache Reverse Proxy

I have to admit Apache HTTPD has one of the most confusing documentation among other tools out there. Here are my Apache & Tomcat requirements:

  • A domain name mycoolwebsite.com pointing to my hosting server
  • Tomcat running on http://localhost:8080 on my hosting (default)
  • Apache HTTP server running on http://localhost:80 on hosting (default)
  • Request to http://mycoolwebsite.com should be reverse proxied into Tomcat without the end-user noticing the port difference

After 2 hours of reading documentation, trial and error, I managed to get this setup working using following Apache Reverse Proxy and VirtualHost configuration:

<VirtualHost *:80>
   ServerName mycoolwebsite.com
   ProxyPass        / http://localhost:8080/
   ProxyPassReverse / http://localhost:8080/
   ProxyPassReverseCookieDomain localhost mycoolwebsite.com
   ProxyPreserveHost On
</VirtualHost>

Configuration Explained

  1. ProxyPass directive will cause request to http://mycoolwebsite.com/foo/bar (and any other paths) be internally forwarded to http://localhost:8080/foo/bar. Internally here means the user will never know — it will look as if the response came from http://mycoolwebsite.com. Behind the screen (on the server) Apache forwards the request and obtain the response to/from Tomcat. The trailing slash (/) at the end of http://localhost:8080/ is important because without it request to http://mycoolwebsite.com/style.css will be internally translated to http://localhoststyle.css (which is wrong).
  2. ProxyPassReverse directive is useful when Tomcat sends HTTP 3xx redirect response. If Tomcat redirects http://localhost:8080/logout into http://localhost:8080/login, your end user will see http://localhost:8080/login unless you place this directive.
  3. ProxyPassReverseCookieDomain directive will cause the cookie domain be re-written into the intended domain. Without this end-user would see localhost as their cookie domain (which is again wrong).
  4. ProxyPreserveHost directive will cause the domain header be forwarded into Tomcat. Without this Tomcat will see the incoming request coming from localhost. This is handy when you’re planning to serve multiple host name under Tomcat.

You can try this on be adding the above configuration into your httpd.conf file (typically located on /etc/httpd/conf/httpd.conf). Once you’ve edited the config, restart apache using sudo apachectl -k restart command.

Reverse Proxy and Load Balancing using Apache mod_proxy

Reverse Proxy is a type of proxy server that retrieves the resource on behalf of the client such that the response appears to come from the reverse proxy server itself. One useful application of reverse proxy is when you want to expose a server on your internal network to the internet.

Load Balancing is a technique of distributing requests into several backend servers, hence providing redundancy and scalability. Many Java EE app servers came built-in with clustering feature allowing session to be replicated across all nodes in the cluster.

Apache has few modules that can be used to setup the two above. Following is a guide on how to set it up.

  1. Ensure you have an apache-httpd installed. This guide is written against httpd 2.4. If you’re on OSX most likely you’ve already got httpd installed.
  2. Figure out where is your httpd.conf file located. Typically it’s at $HTTPD_ROOT/conf/httpd.conf.
  3. Enable following modules. Module can be enabled by uncommenting (removing the ‘#’ character at the beginning) the LoadModule directive on your httpd.conf:
    LoadModule proxy_module modules/mod_proxy.so
    LoadModule proxy_http_module modules/mod_proxy_http.so
    LoadModule proxy_balancer_module modules/mod_proxy_balancer.so
    LoadModule slotmem_shm_module modules/mod_slotmem_shm.so
    LoadModule lbmethod_byrequests_module modules/mod_lbmethod_byrequests.so
    
  4. (Optional) depending on your operating system setup, you might need to change the user & group httpd will use to start the process. On my OSX I have to change it into my own user / group (or root). Use the User and Group directive for this
    User gerrytan
    Group staff
    
  5. Add the following proxy / load balancer configuration
    <Proxy balancer://mycluster>
      BalancerMember http://192.168.1.101:8080 route=node1
      BalancerMember http://192.168.1.102:8080 route=node2
      ProxySet stickysession=BALANCEID
    </Proxy>
    
    ProxyPassMatch ^(/.*)$ balancer://mycluster$1
    ProxyPassReverse / http://192.168.1.101:8080/
    ProxyPassReverse / http://192.168.1.102:8080/
    ProxyPassReverseCookieDomain 192.168.1.101 localhost
    ProxyPassReverseCookieDomain 192.168.1.102 localhost
    

    The <Proxy> directive specifies we are defining a load balancer with a name balancer://mycluster with two backend servers (each specified by the BalancerMember directive). In my case my backend servers are 192.168.1.101:8080 and 192.168.1.102:8080. You can add additional IPs to your network interface to simulate multiple backend servers running in different hosts. Note the existence of the route attribute which will be explained shortly.

    The ProxySet directive is used to set the stickysession attribute specifying a cookie name that will cause request to be forwarded to the same member. In this case I used BALANCEID, and in my webapp I wrote a code that will set the cookie value to balancer.node1 or balancer.node2 respectively. When apache detects the existence of this cookie, it will only redirect request to BalancerMemeber which route attribute matches the string after the dot (in this case either node1 or node2). Thanks to this blog article for explaining the step: http://www.markround.com/archives/33-Apache-mod_proxy-balancing-with-PHP-sticky-sessions.html.

    The ProxyPassMatch directive is used to capture and proxy the requests that came into the apache httpd server (the proxy server). It takes a regular expression to match the request path. For example, assuming I installed my httpd on localhost:80, and I made a request to http://localhost/foo, then this directive will (internally) forward the request to http://192.168.1.101:8080/foo.

    The ProxyPassReverse directive is used to rewrite http redirect (302) sent by the backend server, so the client does not bypass the reverse proxy.

    ProxyPassReverseCookieDomain is similar to ProxyPassReverse but applied to cookies.

Few words of warning

There are plenty methods of configuring reverse proxy, this guide uses http reverse proxy which might not deliver the best performance. You might also want to consider AJP reverse proxy for better performance.