more .htaccess tips and tricks

 <IfModule mod_rewrite.c>
 more clever stuff here
 </IfModule>

(in progress)

redirecting and rewriting

"The great thing about mod_rewrite is it gives you all the configurability and flexibility of Sendmail. The downside to mod_rewrite is that it gives you all the configurability and flexibility of Sendmail."

Brian Behlendorf - Apache Group

One of the more powerful tricks of the .htaccess hacker is the ability to rewrite URLs. This enables us to do some mighty manipulations on our links; useful stuff like transforming very long URLs into short, cute URLs, transforming dynamic ?generated=page&URLs into /friendly/flat/links, redirecting missing pages, preventing hot-linking, performing automatic language translation, and much, much more.

Make no mistake, mod_rewrite is complex. this isn't the subject for a quick bite-size tech-snack, probably not even a week-end crash-course. I've seen guys pull off some real cute stuff with mod_rewrite, but with kudos-hat tipped firmly towards that bastard operator from hell, Ralf S. Engelschall, author of the magic module itself, I have to admit that a great deal of it still seems like so much voodoo to me.

The way that rules can work one minute and then seem not to the next, and the way browser and other in-between network caches interact with rules, can make testing rules baffling, maddening. When I feel the need to bend my mind completely out of shape, I mess around with mod_rewrite!

after all this, it does work, and while I'm not planning on taking that week-end crash-course any time soon, I have picked up a few wee tricks myself, messing around with webservers and web sites, this place..

The plan here is to just drop some neat stuff; examples, things that have proven useful, stuff that works on a variety of server setups. there are apaches all over my LAN, and I keep coming across old .htaccess files stuffed with past rewriting experiments that either worked, and I add them to my list (gramotki), or failed dismally, and more often these days, I'm surprised to find I can see exactly why!

Nothing here is my own invention. Even the bits I figured out myself were already well documented, I just hadn't understood the documents, or couldn't find them. Sometimes, just looking at the same thing from a different angle can make all the difference, so perhaps this humble stab at URL Rewriting might be of some use. I'm writing it for me, of course. but I do get some credit for this..

# time to get dynamic, see..
rewriterule ^(.*)\.htm$ $1.php

beginning rewriting..

Whenever you use mod_rewrite (the part of apache that does all this magic), you need to do this first, and you only need to do it once per .htaccess file:
 Options +FollowSymlinks
 RewriteEngine on
before any ReWrite rules. +FollowSymLinks must be enabled for any rules to work, this is a security requirement of the rewrite engine. Normally it's enabled in the root and you shouldn't have to add it, but it doesn't hurt to do so, and I'll insert it into all the examples on this page, just in case.

The next line simply switches on the rewrite engine for that folder. if this directive is in your main .htaccess file, then the rewrite engine is theoretically enabled for your entire site, but it's wise to always add that line before you write any redirections, anywhere.

note: while some of the directives on this page may appear split onto two lines, in your .htaccess file, they must exist completely on one line.


simple rewriting

Simply put, Apache scans all incoming URL requests, checks for matches in our .htaccess file and rewrites those matching URLs to whatever we specify. something like this..
all requests to whatever.htm will be sent to whatever.php:
 Options +FollowSymlinks
 RewriteEngine on
 RewriteRule ^(.*)\.htm$ $1.php [nc]
handy for anyone updating a site from static htm (you could use .html, or .htm(.*)) to dynamic php pages; requests to the old pages are automatically rewritten to our new urls. no one notices a thing, visitors and search engines can access your content either way. leave the rule in; as an added bonus, this enables us to easily split php code and its included html structures into two separate files, a nice idea; makes editing and updating a breeze. The [nc] part at the end means "No Case", or "case-insensitive", but we'll get to that.

folks can link to whatever.htm or whatever.php, but they always get whatever.php in their browser, and this works even if whatever.htm doesn't exist! but I'm straying..
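incidentally, if you'd rather only rewrite when the requested .htm file really is gone from disk, mod_rewrite can check the filesystem first; a minimal sketch, using the -f test (which is true when the request matches a real, regular file)..
only rewrite when the .htm file doesn't actually exist:
 Options +FollowSymlinks
 RewriteEngine on
 # -f is true for a real file; the ! negates it
 RewriteCond %{REQUEST_FILENAME} !-f
 RewriteRule ^(.*)\.htm$ $1.php [nc]
with this in place, any .htm files you do leave lying around get served as-is, and only the missing ones fall through to their .php equivalents.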

as it stands, it's a bit tricky; folks will still have whatever.htm in their browser address bar, and will still keep bookmarking your old .htm URL's. Search engines, too, will keep on indexing your links as .htm, some have even argued that serving up the same content from two different places could have you penalized by the search engines. This may or not bother you, but if it does, mod_rewrite can do some more magic..
this will do a "real" http redirection:
 Options +FollowSymlinks
 rewriteengine on
 rewriterule ^(.*)\.htm$ http://corz.org/$1.php [r=301,nc]
this time we instruct mod_rewrite to send a proper HTTP "permanently moved" redirection, aka; "301". Now, instead of just redirecting on-the-fly, the user's browser is physically redirected to a new URL, and whatever.php appears in their browser's address bar, and search engines and other spidering entities will automatically update their links to the .php versions; everyone wins. and you can take your time with the updating, too.


not-so-simple rewriting

You may have noticed, the above examples use regular expressions to match parts of the URL. what that simply means is.. match the part inside (.*) and use it to construct "$1" in the new URL. in other words, (.*) = $1. you could have multiple (.*) parts, and for each, mod_rewrite automatically creates a matching $1, $2, $3, etc, in your target URL, something like..
a more complex rewrite rule:
 Options +FollowSymlinks
 RewriteEngine on
 RewriteRule ^files/(.*)/(.*)\.zip$ /download.php?section=$1&file=$2 [nc]
would allow you to present a link as..

  http://mysite/files/games/hoopy.zip

and in the background have that translated to..

  http://mysite/download.php?section=games&file=hoopy

which some script could process. you see, many search engines simply don't follow our ?generated=links, so if you create generating pages, this is useful. However, it's only the dumb search engines that can't handle these kinds of links; we have to ask ourselves.. do we really want to be listed by the dumb search engines? Google will handle a good few parameters in your URL without any problems, and the (hungry hungry) yet-to-actually-have-a-search-engine msn-bot stops at nothing to get that page, sometimes again and again and again…
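by the way, if a request like that might carry its own query string too (say, /files/games/hoopy.zip?version=2), mod_rewrite would normally discard it, because our substitution already has a query string of its own. the [qsa] ("query string append") flag merges the two instead; a wee sketch along the same lines as above..
keep any extra query string the visitor sends:
 Options +FollowSymlinks
 RewriteEngine on
 # [qsa] appends the original query string to the one we build
 RewriteRule ^files/(.*)/(.*)\.zip$ /download.php?section=$1&file=$2 [nc,qsa]
so the request above would arrive at your script as /download.php?section=games&file=hoopy&version=2.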

I personally feel it's the search engines that should strive to keep up with modern web technologies, in other words; we shouldn't have to dumb-down for them. But that's just my opinion. Many users will prefer /files/games/hoopy.zip to /download.php?section=games&file=hoopy but I don't mind either way. As someone pointed out to me recently, presenting links as /standard/paths means you're less likely to get folks doing typos in typed URLs, so something like..
an even more complex rewrite rule:
 Options +FollowSymlinks
 RewriteEngine on
 RewriteRule ^blog/([0-9]+)-([a-z]+) http://corz.org/blog/index.php?archive=$1-$2 [nc]
would be a neat trick, enabling anyone to access my blog archives by doing..
http://corz.org/blog/2003-nov

in their browser, and have it automagically transformed server-side into..

 http://corz.org/blog/index.php?archive=2003-nov

which corzblog would understand. It's easy to see that with a little imagination, and a basic understanding of POSIX regular expressions, you can perform some highly cool URL manipulations.
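for instance, you can tighten that blog rule with quantifiers, so it only fires for a four-digit year and a three-letter month, and leaves any other /blog/ requests well alone; a sketch, same idea as above..
a stricter version of the blog archive rule:
 Options +FollowSymlinks
 RewriteEngine on
 # exactly four digits, a dash, then exactly three letters
 RewriteRule ^blog/([0-9]{4})-([a-z]{3})$ http://corz.org/blog/index.php?archive=$1-$2 [nc]
the {4} and {3} are ordinary regex quantifiers; requests like /blog/junk sail past untouched.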


shortening URLs

One common use of mod_rewrite is to shorten URLs. shorter URLs are easier to remember and, of course, easier to type. an example..
beware the regular expression:
 Options +FollowSymlinks
 RewriteEngine On
 RewriteRule ^grab(.*) /public/files/download/download.php$1
this rule would transform this user's URL..

  http://mysite/grab?file=my.zip

server-side, into..

  http://mysite/public/files/download/download.php?file=my.zip

which is a wee trick I use for my distro machine, among other things. everyone likes short URL's. and so will you; using this technique, you can move /public/files/download/ to anywhere else in your site, and all the old links still work fine. just alter your .htaccess file to reflect the new location. edit one line, done. nice. means even when stuff is way deep in your site you can have cool links like this.. /trueview/sample.php


cooler access denied

In part one I demonstrated a drop-dead simple mechanism for denying access to particular files and folders. The trouble with this is the way our user gets a 403 "Access Denied" error, which is a bit like having a door slammed in your face. Fortunately, mod_rewrite comes to the rescue again and enables us to do less painful things. One method I often employ is to redirect the user to the parent folder..

they go "huh?.. ahhh!"
 # send them up!
 Options +FollowSymlinks
 RewriteEngine on
 RewriteRule ^(.*)$ ../ [nc]

It works great, though it can be a wee bit tricky with the URLs, and you may prefer to use a harder location, which avoids potential issues in indexed directories, where folks can get in a loop..

they go "damn!"
 # send them exactly there!
 Options +FollowSymlinks
 RewriteEngine on
 RewriteRule ^(.*)$ /comms/hardware/router/ [nc]

Sometimes you'll only want to deny access to most of the files in the directory, but allow access to maybe one or two files, or file types, easy..
deny with style!
 # users can load only "special.zip", and the css and js files.
 Options +FollowSymlinks
 RewriteEngine On
 RewriteCond %{REQUEST_FILENAME} !^(.+)\.css$
 RewriteCond %{REQUEST_FILENAME} !^(.+)\.js$
 RewriteCond %{REQUEST_FILENAME} !special\.zip$
 RewriteRule ^(.+)$ /chat/ [nc]

Here we take the whole thing a stage further. Users can access .css (stylesheet) and javascript files without problem, and also the file called "special.zip", but requests for any other filetypes are immediately redirected back up to the main "/chat/" directory. You can add as many types as you need.
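you can also collapse those conditions down, using regex alternation; a minimal sketch of the same deny-with-exceptions idea..
the same thing, with fewer lines:
 Options +FollowSymlinks
 RewriteEngine On
 # anything that isn't css, js, or special.zip gets bounced to /chat/
 RewriteCond %{REQUEST_FILENAME} !\.(css|js)$
 RewriteCond %{REQUEST_FILENAME} !special\.zip$
 RewriteRule ^(.+)$ /chat/ [nc]
the (css|js) part simply means "css or js"; add more extensions inside the brackets as you need them.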


prevent hot-linking

Believe it or not, there are some webmasters who, rather than coming up with their own content, will steal yours. really! even worse, they won't even bother to copy it to their own server to serve it up, they'll just link to your content! no, it's true; in fact, it used to be incredibly common. these days most people like to prevent this sort of thing, and .htaccess is one of the best ways to do it.

This is one of those directives where your mileage may vary wildly, but something like this works fine for me..
how DARE they!
 Options +FollowSymlinks
 # no hot-linking
 RewriteEngine On
 RewriteCond %{HTTP_REFERER} !^$
 RewriteCond %{HTTP_REFERER} !^http://(www\.)?corz\.org/ [nc]
 # don't match the replacement image itself, or we'd redirect in a loop
 RewriteCond %{REQUEST_URI} !hotlink\.png$
 RewriteRule .*\.(gif|jpg|png)$ http://corz.org/img/hotlink.png [nc]
you may see the last line broken into two, but it's all one line (all the directives on this page are). let's have a wee look at what it does..

we begin by enabling the rewrite engine, as always.

the first RewriteCond line allows direct requests (not from other pages - an "empty referrer") to pass unmolested. The next line means; if the browser did send a referrer header, and the word "corz" is not in the domain part of it, then DO rewrite this request.

the all-important final RewriteRule line instructs mod_rewrite to rewrite all matched requests (anything without "corz" in its referrer) asking for gifs, jpegs, or pngs, to an alternative image. mine says "no hotlinking!". You can see it in action here. there are loads of ways you can write this rule. google for "hot-link protection" and get a whole heap. simple is best. you could send a wee message instead, or direct them to some evil script, or something.
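if you'd rather slam the door than serve up an alternative image, the [f] flag sends a plain 403 forbidden instead; a sketch, using the same conditions as above..
no substitute image, just forbidden:
 Options +FollowSymlinks
 RewriteEngine On
 RewriteCond %{HTTP_REFERER} !^$
 RewriteCond %{HTTP_REFERER} !^http://(www\.)?corz\.org/ [nc]
 # "-" means no substitution; [f] makes it a 403
 RewriteRule .*\.(gif|jpg|png)$ - [f,nc]
it's cheaper for your bandwidth, too, though less fun than the "no hotlinking!" image.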


inheritance..

If you are creating rules in sub-folders of your site, you need to read this.

you'll remember how rules in top folders apply to all the folders inside those folders too. we call this "inheritance". normally this just works, but if you start creating other rules inside subfolders you will, in effect, obliterate the rules already applying to that folder through inheritance, or "descendancy", if you prefer. not all the rules, just the ones applying to that subfolder. a wee demonstration..

let's say I have a rule in my main /.htaccess which redirects requests for files ending .htm to their .php equivalent, just like the example at the top of this very page. now, if for any reason I need to add some rewrite rules to my /osx/.htaccess file, the .htm >> .php redirection will no longer work for the /osx/ subfolder; I'll need to reinsert it, but with a crucial difference..

this works fine, site-wide, in my main .htaccess file
 # main (top-level) .htaccess file..
 # requests to file.htm goto file.php
 Options +FollowSymlinks
 rewriteengine on
 rewriterule ^(.*)\.htm$ http://corz.org/$1.php [r=301,nc]

here's my updated /osx/.htaccess file, with the .htm >> .php redirection rule reinserted..

but I'll need to reinsert the rules for it to work in this sub-folder
 # /osx/.htaccess file..
 Options +FollowSymlinks
 rewriteengine on
 rewriterule some rule that I need here
 rewriterule some other rule I need here
 rewriterule ^(.*)\.htm$ http://corz.org/osx/$1.php [r=301,nc]
spot the difference in the subfolder rule; you must add the current path to the new rule. now it works again. if you remember this, you can go replicating rewrite rules all over the place.
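alternatively, mod_rewrite has a RewriteOptions directive that can pull the parent folder's rules into a subfolder for you; a sketch, and note that inherited patterns are matched against the subfolder's own paths, so test carefully before you trust it..
let the subfolder inherit the parent's rules:
 # /osx/.htaccess file..
 Options +FollowSymlinks
 RewriteEngine on
 # pull in the rules from the parent .htaccess, then add our own
 RewriteOptions inherit
 rewriterule some rule that I need here
whether you inherit or reinsert is a matter of taste; reinserting is more typing, but you can see exactly what applies where.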


conclusion

In short, mod_rewrite allows you to send browsers from anywhere to anywhere. You can create rules based not simply on the requested URL, but also on such things as IP address, browser agent (send old browsers to different pages, for instance), and even the time of day; the possibilities are practically limitless.
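to give you a wee taste, server variables like %{HTTP_USER_AGENT} and %{TIME_HOUR} plug into the same RewriteCond machinery; a hedged sketch (the page names and browser string are made up, of course)..
rules based on browser and time of day:
 Options +FollowSymlinks
 RewriteEngine on
 # send ancient netscape 4 browsers to a simpler page..
 RewriteCond %{HTTP_USER_AGENT} ^Mozilla/4\.7
 RewriteRule ^index\.html$ /basic/index.html [nc,l]
 # ..and serve the night-time front page in the wee hours (00-05)
 RewriteCond %{TIME_HOUR} ^0[0-5]$
 RewriteRule ^index\.html$ /night/index.html [nc,l]
%{TIME_HOUR} is always two digits, 00 through 23, hence the ^0[0-5]$ pattern for the small hours.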

the ins and outs of mod_rewrite syntax are a topic for a much longer document than this, and if you fancy experimenting with more advanced rewriting rules, I urge you to check out the apache documentation. if you are running some *nix operating system (in fact, if you have apache installed on any operating system), there will likely be a copy of the apache manual right on your own machine, along with the excellent mod_rewriting guide. do check out the URL Rewriting Engine notes for the juicy syntax bits. that's where I got the cute quote for the top of the page, too.
;o)
cor
part one - .htaccess tricks: authentication, indexing, and more
Apache mod_rewrite docs THE reference document for all things mod_rewrite
Apache 2 mod_rewrite docs As above, but for Apache 2
Apache URL rewriting guide In more easily understood language, very useful.
Apache 2 URL rewriting guide As above but for Apache 2
modrewrite.com forum a forum-full of mod_rewrite help
webmasterworld forum more help from those in the know
David Mertz's regex tutorial get cubed up on regular expressionism!