Synopsis: this blog describes the concept of reverse-proxying web URLs, and how to implement it using Apache
Prerequisites: basic understanding of web-sites, administrative knowledge of target operating system (Unix-type or Windows)
Why should I reverse-proxy?
In a nutshell, a reverse proxy set-up can be described as a go-between that looks like a server to a client, and like a client to a server. There are several scenarios that would require this. Each of these scenarios is expanded upon below:
- segregating the serving of dynamic resources from static resources
- serving static resources from a different source to the current domain
- enabling single-sign-on to multiple systems, each with their own web frontend
- mitigating against cross-origin restrictions
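To make the “server to a client, client to a server” idea concrete, here is a minimal sketch of a reverse proxy written with Python's standard library. It is purely illustrative and no substitute for Apache. It only handles GET, copies a single response header, follows upstream redirects itself instead of relaying them, and has no streaming or security hardening. The function name make_proxy_handler and any upstream address you pass to it are hypothetical.

```python
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Ignore any http_proxy environment settings: the reverse proxy talks to the upstream directly.
_UPSTREAM_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def make_proxy_handler(upstream_base):
    """Build a handler class that forwards every GET to upstream_base + request path.

    upstream_base is a hypothetical backend address such as "http://127.0.0.1:8080".
    """

    class ReverseProxyHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Client role: request the same path (and query string) from the upstream server.
            target = upstream_base.rstrip("/") + self.path
            try:
                with _UPSTREAM_OPENER.open(target, timeout=10) as upstream:
                    status = upstream.status
                    content_type = upstream.headers.get("Content-Type", "application/octet-stream")
                    body = upstream.read()
            except urllib.error.HTTPError as err:
                # The upstream answered with an error status (404, 500, ...): relay it unchanged.
                status = err.code
                content_type = err.headers.get("Content-Type", "text/plain")
                body = err.read()
            # Server role: answer the original caller as if the content were local.
            self.send_response(status)
            self.send_header("Content-Type", content_type)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # silence per-request logging for the example

    return ReverseProxyHandler
```

To try it, you might run `ThreadingHTTPServer(("127.0.0.1", 8000), make_proxy_handler("http://127.0.0.1:8080")).serve_forever()` with some application listening on port 8080; the browser only ever talks to port 8000.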
Segregating the serving of dynamic resources from static resources
Application servers like JBoss and WebSphere are used to dynamically generate custom content in a web application. Under heavy load this becomes hard work, and serving resources like static images, style sheets, and JavaScript files detracts from the server’s ability to perform it. While clustering will help with the problem, there is no real value gained by using the app server for such menial labour. Instead, this work should be delegated to a web server like Apache, which is designed specifically for it. In addition, the web server can be configured to add headers to these resources that encourage browsers to cache them, and to fetch fresh versions only once the recommended expiry date has passed. Delegating this responsibility can have a dramatically positive impact on your application server’s performance.
Serving static resources from a different source to the current domain
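As a rough illustration of what such caching headers look like, the Python sketch below builds a Cache-Control/Expires pair for a static file and shows the simplified freshness test a browser applies before re-fetching. In Apache itself this is normally configured with modules such as mod_expires or mod_headers rather than written by hand; the function names and the one-day lifetime used in the usage line are illustrative assumptions.

```python
from email.utils import formatdate


def caching_headers(now, max_age_seconds):
    """Return caching headers for a static resource served at Unix time `now`."""
    return {
        # HTTP/1.1: how many seconds the browser may reuse its cached copy.
        "Cache-Control": f"public, max-age={max_age_seconds}",
        # Older equivalent: an absolute expiry date, formatted as an HTTP date in GMT.
        "Expires": formatdate(now + max_age_seconds, usegmt=True),
    }


def is_fresh(response_time, max_age_seconds, now):
    """Simplified browser check: reuse the cached copy while it is younger than max-age."""
    return now - response_time < max_age_seconds
```

For example, `caching_headers(time.time(), 86400)` would ask browsers to keep a style sheet for one day before asking the server for it again.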
In a previous blog, we discussed how to upload images and other media files to a virtual file system. Mounting a link from a web server to that file system, and then reverse-proxying requests to the mounted link, allows those resources to be served as if they are contained in the current web-site.
Enabling single-sign-on to multiple systems, each with their own web frontend
In a portal-like web-site, multiple different web applications may be exposed via a single entry point. Reverse-proxying those applications will ensure that the root domain exposed to the end-user remains consistent; this means that SSL sites can use a single certificate for all those apps, and authentication cookies attached to a particular domain can be reused.
Mitigating against cross-origin restrictions
Browsers restrict how a page may use resources from an origin (scheme, host and port) other than its own; in particular, by default a script may not read the response to a request it sends to another origin unless that server explicitly permits it. This is a valid and desirable security feature. However, if you are legitimately attempting to meld multiple resources into a single site, it can be frustrating, if not terminal to your project. Reverse-proxying ensures that all the resources you depend on appear to originate from the same root domain. For example, REST services used to populate dynamic content may be hosted on a completely different physical server from your HTML resources; serving both through a reverse proxy prevents the responses from being blocked by the user’s browser.
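The browser's notion of “same origin” boils down to comparing scheme, host and port. The sketch below, using hypothetical example.com and example.net addresses, shows why a REST call made directly to another server counts as cross-origin, while the same call routed through a path on your own reverse proxy does not. It is a simplification: real browsers layer further rules on top, such as CORS headers that let the other server opt in.

```python
from urllib.parse import urlsplit

# Ports a browser assumes when the URL does not state one.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) triple that browsers compare."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return (scheme, parts.hostname or "", port)  # hostname is already lower-cased


def same_origin(a, b):
    return origin(a) == origin(b)
```

With the page served from https://www.example.com/app/, a call to https://api.example.net/orders is cross-origin, whereas https://www.example.com/api/orders (proxied on to the API server) is not.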
Convinced yet? If not, have a look at “The benefits of a reverse proxy” for some compelling arguments, some of which cover the above. If you are, let’s look at how.
How do I reverse-proxy?
Just to show I’m not completely prejudiced, there are other open source web servers like nginx and jscape that support reverse proxying, and even some commercial options like WebSEAL, one of IBM’s acquisitions. My personal bigotry, however, is based on my happy experiences with Apache, so the following guide will tell you how to set up reverse proxying on that platform, specifically version 2.4. The remainder of this blog assumes that this version of Apache has been downloaded and installed in your environment. A useful starter guide is available, as well as installation instructions for Unix-like operating systems, and for Windows.
At its most basic, reverse-proxying on Apache requires that the proxy modules are enabled, and then that each local path is mapped to the server it should be proxied to. Always make a back-up of the httpd.conf file before making changes, so that you can revert in case of catastrophic disaster.
Enable proxy modules
Open the httpd.conf file in a text editor. Assuming a default installation, this file is located in the “conf” directory under the root Apache folder. Ensure that the following LoadModule directives are enabled (lines beginning with the # symbol are comments and are ignored, so remove any leading # from these lines):
LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_http_module modules/mod_proxy_http.so
Save the file, and restart the Apache server. Make sure the restart was successful by entering the URL for the server in a browser. If the server is running locally, the root URL will be as follows:
http://localhost/
Add reverse proxy directives
To continue the modular theme within Apache, we will make a new file for our new directives. Create a file called “myproxy.conf” in the “conf” directory of your Apache installation (of course, the file could reside anywhere that is accessible to the Apache user, but let’s keep it tidy). In the httpd.conf file, add the following towards the end of the file (again, location is irrelevant except for ease of maintenance):
Include conf/myproxy.conf
Save and restart.
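If you restart often, the browser check described earlier can also be scripted. This is a small sketch, not part of Apache: it assumes Python 3 is available and treats any HTTP response, even an error status, as proof that the server is running. The function name server_is_up is hypothetical.

```python
import urllib.error
import urllib.request


def server_is_up(url="http://localhost/", timeout=5.0):
    """Return True if something answers HTTP requests at url."""
    # Bypass any configured forward proxy so we really talk to the local server.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied, just with an error status (e.g. 404): it is running.
        return True
    except OSError:
        # URLError (connection refused, unknown host) and timeouts are OSError subclasses.
        return False
```

Running `print(server_is_up())` after each restart should print True once Apache is back up.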
Now, open myproxy.conf and add the following lines:
ProxyRequests Off
ProxyPass /apache/docs/ http://httpd.apache.org/docs/2.4/
ProxyPassReverse /apache/docs/ http://httpd.apache.org/docs/2.4/
Note that “ProxyRequests” should always be set to “Off”. It controls forward proxying, which ProxyPass does not need, and enabling it without first securing your server turns it into an open proxy that anyone can use to disguise their traffic as coming from you.
Save this file, and restart Apache again.
Now go to your browser and enter your server’s address plus the path we just mapped (let’s assume you’re running locally again; otherwise, substitute “localhost” with the remote domain or IP address):
http://localhost/apache/docs/
You should see the Apache HTTP Server Version 2.4 Documentation landing page. Any request to your web server whose path begins with the prefix specified in your directive will be forwarded to the associated URL, and the response relayed back to the browser as though it came from your server.
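The second directive of the pair, ProxyPassReverse, works in the opposite direction: it adjusts the URL in the Location, Content-Location and URI headers of redirect responses from the backend, so that the browser is sent back to your server rather than straight to the backend. It does not rewrite links inside the HTML itself. The Python below is a simplified model of that header rewrite, not Apache's implementation; rewrite_location and front_base are hypothetical names.

```python
def rewrite_location(location, rules, front_base):
    """Map a backend redirect target back to the address the browser used.

    rules: (local_path, backend_url) pairs, mirroring ProxyPassReverse lines.
    front_base: scheme://host[:port] of the proxy, e.g. "http://localhost".
    """
    for local_path, backend_url in rules:
        if location.startswith(backend_url):
            # Swap the backend prefix for the proxy's own address and path.
            return front_base + local_path + location[len(backend_url):]
    return location  # not a URL we proxy: leave it untouched


RULES = [("/apache/docs/", "http://httpd.apache.org/docs/2.4/")]
```

With these rules, a backend redirect to http://httpd.apache.org/docs/2.4/mod/ reaches the browser as http://localhost/apache/docs/mod/.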
Note that relative links (“nextpage.html”, rather than “http://hostname/nextpage.html”) are resolved by the browser against the proxied address, so they will still come back through your proxy, unless the link climbs to a folder above the mapped path, such as “../../new/nextpage.html”. Absolute links will just go to the specified address. The latter two scenarios introduce the requirement for a whole new set of tools, such as rewriting the links inside proxied HTML pages, which is supported by the mod_proxy_html module, or more sophisticated rule-based URL rewriting with the mod_rewrite module. Such complexity is beyond the scope of this article.
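Python's urllib.parse.urljoin applies essentially the same (RFC 3986) resolution rules as a browser, so it offers a quick way to check where a given link will lead once a page is served through the proxy. The page address below assumes the local set-up from this example; “/nextpage.html”, a root-relative link, is included because it also escapes the mapped path.

```python
from urllib.parse import urljoin

# Address of the proxied documentation page, as the browser sees it.
PAGE = "http://localhost/apache/docs/"

LINKS = [
    "nextpage.html",                  # relative: stays under /apache/docs/, so it is proxied
    "../../new/nextpage.html",        # climbs above the mapped path: no longer proxied
    "/nextpage.html",                 # root-relative: also outside the mapped path
    "http://hostname/nextpage.html",  # absolute: bypasses the proxy entirely
]

resolved = {link: urljoin(PAGE, link) for link in LINKS}
for link, url in resolved.items():
    print(f"{link:32} -> {url}")
```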
In our simple world, then, we have successfully reverse-proxied the reverse-proxy documentation. Similar pairs of directives will enable reverse proxying to solve most of the requirements detailed above.
When you own the web application to be reverse-proxied, ensuring that static resources are located in easily identifiable and isolatable folders will make your life a whole lot easier. Then, you can copy the appropriate resources to the Apache server, and use reverse-proxying to service requests for such resources from the local web server, and not the over-worked and under-paid application server:
ProxyPass /myapp/css http://localhost/myappstatic/css
ProxyPassReverse /myapp/css http://localhost/myappstatic/css
ProxyPass /myapp/ http://someotherserver:8080/myapp/
ProxyPassReverse /myapp/ http://someotherserver:8080/myapp/
Note that ProxyPass rules are checked in the order in which they appear, and the first match wins, so more specific paths must be listed before more general ones. Thus, in the configuration above, requests to the web server that begin with the “myapp” context root will be forwarded to the application server running on port 8080 on someotherserver, unless they begin with /myapp/css, in which case they will be served from the local myappstatic/css path.
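The ordering rule can be modelled in a few lines: walk the rules in configuration order and use the first whose prefix matches. This is a simplified sketch (Apache also normalises slashes and offers regular-expression matching via ProxyPassMatch), using the two mappings from the configuration above; map_request is a hypothetical name.

```python
# (local prefix, target) pairs in configuration order, as in the ProxyPass lines above.
RULES = [
    ("/myapp/css", "http://localhost/myappstatic/css"),
    ("/myapp/", "http://someotherserver:8080/myapp/"),
]


def map_request(path, rules=RULES):
    """Return the URL a request path is proxied to, or None if no rule matches."""
    for prefix, target in rules:
        if path.startswith(prefix):
            # First match wins: later rules are never consulted.
            return target + path[len(prefix):]
    return None  # no rule matched: the web server handles the request itself
```

Listing the general /myapp/ rule first would send style-sheet requests to the application server as well, which is exactly what the ordering above avoids.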
If you need to serve resources residing on external storage, mount the external drive at a path within the web server’s document structure, then add a reverse proxy directive using the new path, such as:
ProxyPass /uploaded/images http://localhost/mountedpath/images
ProxyPassReverse /uploaded/images http://localhost/mountedpath/images
The steps outlined above give a simple view into the power of reverse-proxying. The documentation on mod_proxy on the Apache site provides a whole lot more detail on the potential capabilities and application of this concept. Hopefully, though, you now have sufficient insight to identify when this approach is appropriate to your own environment, and will be able to harness some of that power for good.