Tag Archives: apache

Apache SSL Offloading

Application servers such as Jetty and Tomcat are widely used today. If SSL is terminated on the application server itself, such as Jetty, the encryption work can hurt Jetty’s performance. You can terminate SSL on the load balancer instead, but load balancers often have a limit on the amount of SSL traffic they can handle. For instance, a load balancer might be able to handle 10Gbps, so traffic beyond that becomes a problem. Another way of reducing load on Jetty is to offload SSL termination to Apache, which then passes the traffic to Jetty over an unencrypted connection. Say you have a load balancer in front of your Jetty servers: you install Apache on each server that runs Jetty, so every host runs both. The connection then looks something like this:

Clients--(SSL)-->Load Balancer--(SSL)-->Apache--(Unencrypted)-->Jetty

Apache listens on port 443 and Jetty on port 8080. Apache acts as a proxy server and forwards the traffic to port 8080 on the same host. With the load balancer configured to use DSR, or direct server return, the Apache logs will show the source IP addresses of the clients. You can find a configuration for Apache here https://github.com/syedaali/configs/blob/master/apache-proxy.conf. I will explain some of the configuration settings below.

#Load the SSL module that is needed to terminate SSL on Apache
LoadModule ssl_module modules/mod_ssl.so

#We don't want to pass Apache server status to the Jetty server
ProxyPass /server-status !

#Apache is listening on port 443
Listen 443

SSLSessionCache         shmcb:/var/cache/mod_ssl/scache(512000)
SSLSessionCacheTimeout  300
SSLMutex default
SSLRandomSeed startup file:/dev/urandom  256
SSLRandomSeed connect builtin
SSLCryptoDevice builtin

#Location for SSL error logs
ErrorLog logs/ssl_error_log

#Location for SSL traffic logs
TransferLog logs/ssl_access_log

#Log level, this can be emerg, alert, crit, error, warn, notice, info, or debug
#See https://httpd.apache.org/docs/2.2/mod/core.html#loglevel for details
LogLevel warn

#Don't use SSLv2. This leaves SSLv3 and TLSv1 enabled. SSLv3 is now considered insecure as well; add -SSLv3 to disable it.
SSLProtocol all -SSLv2

#When choosing a cipher during an SSLv3 or TLSv1 handshake, normally the client's preference is used. We want Apache to use the server's preference.
SSLHonorCipherOrder On

#SSL Tuning. We want to optimize our SSL ciphers by removing some and adding others
SSLCipherSuite ALL:!ADH:!EXP:!LOW:!RC2:!3DES:!SEED:!RC4:+HIGH:+MEDIUM

#SSL certificate file location
#You can generate a self-signed certificate file using this command:
#$ sudo openssl req -x509 -newkey rsa:2048 -keyout apache.key -out apache.crt -days 999 -nodes
#the ca.crt file is a certificate chain file
SSLCertificateFile /etc/httpd/certs/apache.crt

#SSL private key file location
SSLCertificateKeyFile /etc/httpd/certs/apache.key

#This directive sets the all-in-one file where you can assemble the Certificates of Certification Authorities (CA) whose clients you deal with.
SSLCACertificateFile /etc/httpd/certs/ca.crt

#The * is for listening on all IP interfaces.
<VirtualHost *:443>
ServerName <fill-in-server-name>
SSLEngine on

#Allow %2F in URLs. See https://httpd.apache.org/docs/2.2/mod/core.html#allowencodedslashes
AllowEncodedSlashes On
</VirtualHost>

#Disable forward proxy. See https://httpd.apache.org/docs/2.2/mod/mod_proxy.html#proxyrequests
ProxyRequests Off

#Any incoming URL is forwarded to localhost port 8080 where Jetty is listening. The nocanon option passes the raw URL path to Jetty without canonicalising it.
# See https://httpd.apache.org/docs/2.2/mod/mod_proxy.html#proxypassmatch.
ProxyPassMatch  (.*) http://localhost:8080 nocanon

#"When enabled, this option will pass the Host: line from the incoming request to the proxied host, instead of the hostname specified in the ProxyPass line." See https://httpd.apache.org/docs/2.2/mod/mod_proxy.html#proxypreservehost
ProxyPreserveHost On

#Display proxy status on the Apache server status page. See https://httpd.apache.org/docs/2.2/mod/mod_proxy.html#proxystatus.
ProxyStatus On

#Keep alive proxy connections
SetEnv proxy-nokeepalive 0

#Set the X-Forwarded-Proto to be https for Jetty to understand. See https://httpd.apache.org/docs/2.2/mod/mod_headers.html#requestheader.
RequestHeader set X-Forwarded-Proto "https" env=HTTPS
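
Because Apache talks to Jetty over plain HTTP, the backend only learns that the client used HTTPS from the X-Forwarded-Proto header set above. Here is a minimal sketch of that idea in Python. The function name `original_scheme` and the plain dictionary of headers are assumptions for illustration; Jetty has its own configuration for reading forwarded headers, and this is not it.

```python
def original_scheme(headers, default="http"):
    """Return the scheme the client used, based on X-Forwarded-Proto.

    `headers` is a plain dict of request headers (hypothetical input shape).
    """
    # Header names are case-insensitive, so normalise them before the lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    proto = normalised.get("x-forwarded-proto", default).strip().lower()
    # Fall back to the default for anything that is not http or https.
    return proto if proto in ("http", "https") else default


print(original_scheme({"X-Forwarded-Proto": "https"}))
print(original_scheme({"Host": "www.example.com"}))
```

Only trust this header when the backend is reachable solely through the proxy, as in the localhost setup described here, since a client could otherwise send the header itself.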

What happens when you type in ‘www.cnn.com’ in your browser?

The communication between your client and a web server can be divided into the following components.
Assumptions: You are on a Linux client and trying to connect from your Comcast home cable connection.

– DNS
– Network communication
– HTTP

DNS: A DNS lookup resolves www.cnn.com to an IP address. The Linux client first checks its name service cache daemon, if one is running, to see whether the answer has been cached from an earlier request. If not, the query is sent to the name server specified in /etc/resolv.conf on the client. At home this resolver is usually your local router, which forwards all DNS queries to the resolver specified in its configuration, which it may have gotten from Comcast. If the Comcast DNS server has no cached answer, it asks a root name server, which refers it to the .com name servers. These look up the NS records of cnn.com and refer the resolver to the cnn.com name servers, which were specified when the domain was registered with the registrar. The query is then sent to the cnn.com name servers, which look up www.cnn.com in their zone files and reply with the answer. There is a forward lookup and a reverse lookup zone file. The forward lookup zone maps hostnames to IP addresses, and the reverse lookup zone maps IP addresses back to hostnames.

There are two kinds of queries, recursive and non-recursive. In a recursive query the client asks for a final answer, which the DNS server has to find out and return. In a non-recursive query the DNS server does not return the final answer; it returns a referral, and it is up to the client to find the answer by talking to the next name server in line. Keep in mind that each DNS server may have its own cache, so at any point the reply may come from a cache if an answer is available. Caching has its own intricacies, and I will cover that in a DNS related blog.
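
The lookup order described above (cache first, then root, then .com, then the cnn.com name servers) can be sketched as a toy model. Nothing here talks to real DNS: the three dictionaries, the server labels, and the address 192.0.2.10 (from the range reserved for documentation) are all made up for illustration.

```python
# Toy delegation data. Every server label and the address are made up.
ROOT_ZONE = {"com": "com-tld-server"}
TLD_ZONE = {"cnn.com": "cnn-name-server"}
AUTHORITATIVE_ZONE = {"www.cnn.com": "192.0.2.10"}

# Answers seen before, keyed by hostname.
cache = {}


def resolve(hostname):
    """Return (address, steps), where steps lists who was asked, in order."""
    name = hostname.rstrip(".").lower()
    if name in cache:
        return cache[name], ["cache"]

    labels = name.split(".")
    steps = []

    # Step 1: the root refers us to the name servers for the top-level domain.
    if labels[-1] not in ROOT_ZONE:
        raise LookupError(f"no delegation for top-level domain of {name}")
    steps.append("root")

    # Step 2: the TLD servers refer us to the domain's own name servers.
    if ".".join(labels[-2:]) not in TLD_ZONE:
        raise LookupError(f"no name servers registered for {name}")
    steps.append("tld")

    # Step 3: the domain's name servers answer from their zone data.
    if name not in AUTHORITATIVE_ZONE:
        raise LookupError(f"{name} not found in zone")
    steps.append("authoritative")

    cache[name] = AUTHORITATIVE_ZONE[name]
    return cache[name], steps


print(resolve("www.cnn.com"))  # walks root, tld, authoritative
print(resolve("www.cnn.com"))  # answered from the cache
```

The second call never leaves the cache, which is why a repeated lookup is so much cheaper than the first one.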

Network communication: Once the client has the IP address from the DNS lookup above, it opens a TCP connection to the server using a 3 way handshake: a SYN, a SYN-ACK, and then an ACK. The client uses its routing table to see if there is an entry for CNN.com’s network; if there is no entry, the client sends the SYN packet to its default gateway. The default gateway does the same thing: it tries to find an entry for the destination network, and when it cannot find one it sends the packet to its own default gateway. This happens until the packet reaches an Internet gateway which is running BGP. The BGP routing table lists the public IP prefixes announced on the Internet, including the one that contains CNN.com’s public IP address, and the way to get to it. Comcast then forwards the packet across the Internet toward CNN.com’s network; the destination address in the packet is still the one the client set. Once the packet reaches www.cnn.com, the destination acknowledges the SYN with an ACK and also sends its own SYN. This is the 2nd part of the 3 way TCP handshake (SYN-ACK). The same routing happens on the way back, at which point the client sends an ACK and the connection is established.

One important question to ask here is how the client knows which hosts are in its network and which are not. The answer is based on the netmask. For instance, if the client IP is 10.1.1.100 and the netmask is 255.255.255.0, the client knows that the range 10.1.1.[0-255] is its local area subnet, and the rest is outside of it. The client uses ARP, or address resolution protocol, to figure out which MAC address to send the packet to at layer 2. If the default gateway is 10.1.1.1, the client sends an ARP request saying ‘Who has 10.1.1.1?’. This ARP request is heard by the router, which responds with its MAC address, and the client then puts the packet in a frame addressed to the MAC address of the router.
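
The netmask decision can be checked with Python’s standard `ipaddress` module. This is a minimal sketch using the addresses from the example above; the function name `next_hop` and the destination 192.0.2.10 (a documentation address, not CNN’s) are assumptions for illustration.

```python
import ipaddress


def next_hop(client_ip, netmask, gateway, destination):
    """Return the IP the client must ARP for to deliver a packet."""
    # strict=False lets us pass a host address and derive its network.
    network = ipaddress.ip_network(f"{client_ip}/{netmask}", strict=False)
    dest = ipaddress.ip_address(destination)
    # Local destinations are reached directly; everything else goes
    # to the default gateway.
    return str(dest) if dest in network else gateway


print(next_hop("10.1.1.100", "255.255.255.0", "10.1.1.1", "10.1.1.42"))
print(next_hop("10.1.1.100", "255.255.255.0", "10.1.1.1", "192.0.2.10"))
```

The first call returns 10.1.1.42, since that host is on the local subnet, and the second returns 10.1.1.1, the default gateway, which is the case the ARP example describes.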

HTTP: Once the destination receives the packet, let’s say www.cnn.com is running behind an HTTP load balancer, in which case the load balancer can be using in-line or DSR mode. In-line means that the load balancer handles all incoming and outgoing traffic between the client and the HTTP server. DSR, or direct server return, means that incoming traffic comes through the load balancer, but responses go directly from the web server to the client. Further details of this will be in my load balancer blog. When Apache receives the incoming request on port 80, it hands the request to either a forked process or a thread. Apache has two modes of running, one is worker (worker.c) and the other is prefork. In prefork, Apache uses processes that have been forked ahead of time. In worker, it uses threads. Threads consume fewer resources, but are more complex. Since prefork is the default, let’s say Apache uses one of its forked processes to handle our request.

If you are asked the question ‘What happens when you type in www.cnn.com in your browser?’, the above is a good start. At various points in the conversation you can talk more in depth about a particular topic. For instance, with networking you can break things down further and go into TCP/UDP differences and also talk about routing protocols such as RIP, OSPF, and BGP. When it comes to DNS you can talk about the SOA record, what it means, and also about the other DNS records. Additionally you can talk about how to set up a BIND server. In regards to HTTP you can talk about how to set up Apache, the difference between encrypted and unencrypted traffic, and also go more in depth about SSL.

Improving Apache performance

Remove modules
Apache includes many modules that are enabled by default, and you may not need all of them. Disable the ones you do not need; this will help speed up Apache. (By default in CentOS when you install Apache, over 50 modules are loaded.) The modules that are loaded are listed in /etc/httpd/conf/httpd.conf, on lines that begin with the word LoadModule, as in:

LoadModule auth_basic_module modules/mod_auth_basic.so

For instance if you don’t use LDAP to authenticate with Apache, you can disable the authnz_ldap_module module.
If you are unsure about disabling an Apache module whose name is perhaps not self explanatory, you can take a look here http://httpd.apache.org/docs/2.2/mod/ for a more detailed description.
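
To review what is being loaded, you can pull the module names out of the LoadModule lines. This is a minimal sketch that works on the text of a config file; the function name `loaded_modules` and the sample text are assumptions for illustration, and it treats a line starting with # as a disabled module.

```python
import re

# Matches an active "LoadModule <name> <path>" line; lines starting
# with "#" do not match, so commented-out modules are skipped.
LOADMODULE = re.compile(r"^\s*LoadModule\s+(\S+)\s+(\S+)", re.IGNORECASE)


def loaded_modules(config_text):
    """Return the module names from active LoadModule lines, in file order."""
    modules = []
    for line in config_text.splitlines():
        match = LOADMODULE.match(line)
        if match:
            modules.append(match.group(1))
    return modules


# Hypothetical excerpt of httpd.conf with the LDAP module commented out.
sample = """\
LoadModule auth_basic_module modules/mod_auth_basic.so
#LoadModule authnz_ldap_module modules/mod_authnz_ldap.so
LoadModule ssl_module modules/mod_ssl.so
"""

print(loaded_modules(sample))
```

On a real server you would read /etc/httpd/conf/httpd.conf and pass its contents in. The sketch does not follow Include directives, so modules loaded from other files are not counted.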

DNS tuning
Each DNS lookup takes time, so make sure that Apache is not doing hostname lookups. You can disable them with the directive ‘HostnameLookups Off’. This is normally off by default.

Don’t use .htaccess if there is no need to
Use ‘AllowOverride None’, since allowing overrides forces Apache to look for a .htaccess file on every request, even if you are not using one. This speeds up Apache, since it’s one less thing Apache has to do before serving content.
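
To get a feel for the cost, note that when overrides are allowed Apache checks for a .htaccess file in every directory on the way down to the one being served. This is a minimal sketch of that list of lookups; the function name `htaccess_candidates` and the path /www/htdocs/example are assumptions for illustration.

```python
from pathlib import PurePosixPath


def htaccess_candidates(directory):
    """List the .htaccess files checked for a directory, top-most first."""
    path = PurePosixPath(directory)
    # path.parents runs from the nearest parent up to "/", so reverse
    # the combined list to start at the filesystem root.
    ancestors = [path, *path.parents][::-1]
    return [str(p / ".htaccess") for p in ancestors]


for candidate in htaccess_candidates("/www/htdocs/example"):
    print(candidate)
```

For a file served out of /www/htdocs/example that is four extra file system checks per request, all of which go away with ‘AllowOverride None’.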

Avoid content negotiation
When you access a directory on a web server, such as the root directory, Apache usually looks for an index file, which is basically the ‘home-page’ of that directory. A wildcard such as ‘DirectoryIndex index’ relies on content negotiation to find a matching file, such as index.html, which costs time. Specify the complete filenames instead so that Apache does not have to negotiate. So replace ‘DirectoryIndex index’ with ‘DirectoryIndex index.cgi index.pl index.shtml index.html’

How do you improve Apache’s performance? Share your comments in the blog.

Reference
http://httpd.apache.org/docs/current/misc/perf-tuning.html