Readme for analog 6.0


Introduction

Analog is a program which analyses logfiles from WWW servers. It works on almost any operating system. It is designed to be fast and to produce accurate and attractive statistics, and combined with Report Magic it can generate even prettier reports. It's free software.

Although analog is free software, its distribution and modification are covered by the terms of the GNU General Public License. You are not required to accept this licence, but nothing else gives you permission to modify or distribute the program. Analog comes with no warranty.

Although analog is free, if you like it, please consider making a donation towards its development. Thank you.

This Readme describes analog 6.0. For the latest version of analog, see the analog home page. For examples of the output see

This is a version of the Readme in one page. If you're reading it on line, you might prefer the version on several smaller pages. Beginners should start with the licence followed by the section on Starting to use analog. There is an index at the end of this document.

You might also find the How-To's helpful; these are descriptions by other authors of how to use analog for particular tasks.



Starting to use analog

The only thing you need to run analog is to be able to read the logfiles which are produced by your web server. If you don't know what these logfiles are and where to find them, contact your internet service provider (ISP) or system administrator. Analog doesn't write the logfiles: it only reads them.

If you log in to your ISP's machine from your home machine, you have two options. If you have the right permissions, you can run analog on your ISP's machine. Otherwise, you can download (e.g., ftp) the logfiles from their machine to yours, and then run analog on your machine.

Once you've downloaded the right version of analog for your computer from the analog home page (or a mirror site), you need to know how to set it up and run it. This is very easy, but the instructions are slightly different depending on which platform you're using.

If you can't manage to set up analog after reading the instructions, send a message to the analog-help mailing list.


Starting to use analog on a Mac

Here is the really short summary:
  1. Edit analog.cfg
  2. Run analog
  3. Read Report.html

When you download the Mac version of analog, it should unpack itself. (If it doesn't, you might have to run StuffIt Expander on it). You should then find in the analog directory a configuration file called analog.cfg and the analog application itself, as well as the Readme, the Licence and a couple of other files. When you double-click on the analog icon, it will run in its own window, and produce an output file called Report.html. (For help in interpreting the output, see What the results mean.) The window will then close if there weren't any warning messages, or stay open for you to read them if there were.
You can configure analog by putting commands in the configuration file, analog.cfg. Although this is less familiar to Mac users than pressing buttons etc., it's really much simpler and more flexible when you get used to it. One command you will need straight away is
LOGFILE logfilename    # to set where your logfile lives
The logfile must be stored locally -- analog won't use FTP or HTTP to fetch it from the internet. There's a sample logfile supplied with the program.

There are already some configuration commands to get you started in the configuration file, but there are lots of others available. You can find the most common ones in the section on basic commands later in the Readme, and you can read about all of them in the section on customising analog. There are also some sample configuration files in the examples folder.


Another way to start analog is to drag a logfile onto the analog icon, in which case analog will try to analyse it, or drag a configuration file onto the icon, in which case analog will use the commands in that configuration file. (Analog detects whether it's a configuration file or a logfile by whether it starts with "# " or not.) This enables you to create different reports without having two copies of the application.

There is another way to give options, via command line arguments. You'll see these mentioned in this Readme from time to time, but MacOS before MacOS X doesn't have a command line, so ignore these unless you've downloaded the Darwin version of analog.

If you want to compile your own version of analog (it's written in C), or just to read the source code, it's available from the analog home page. (It's the same source code for all versions).


Starting to use analog under Windows

This describes how to set up analog under Windows 95/NT or later. Windows 3.1 users will have to read the section on other platforms instead.

Here is the really short summary:

  1. Edit analog.cfg
  2. Run analog (a DOS window flashes up).
  3. Read Report.html

There's also a How-To written by Simon Handfield, which explains how to get started in more detail with lots of pictures.


When you've downloaded analog, and either you or your browser has unzipped it, you will find in the analog folder a configuration file called analog.cfg and the analog executable itself, as well as the Readme, the Licence and a couple of other files. There is no setup.exe: analog is already ready to run without one.

(Some unzip programs are broken, and do not create folders when they should. If you don't have a folder called lang inside the analog folder, create one and put all the files called *.lng and *.tab into it.)

There are two ways of running analog. You can either run it from Windows (by single-clicking or double-clicking on its icon, depending on your setup), or you can run it from the DOS command prompt (under Start-Programs). If you run it from Windows, it will create a DOS window to run in. When it's finished, it will produce an output file called Report.html and some graphics; and a file called errors.txt which contains any errors there might have been. The first time you run it, this will all happen almost instantly. This is not a bug. For help in interpreting the output, see What the results mean.


You can configure analog by putting commands in the configuration file, analog.cfg. Although this is less familiar to Windows users than pressing buttons etc., it's really much simpler and more flexible when you get used to it. You can edit analog.cfg using any plain text editor, for example Notepad. One command you will need straight away is
LOGFILE logfilename    # to set where your logfile lives
The logfile must be stored locally -- analog won't use FTP or HTTP to fetch it from the internet. There's a sample logfile supplied with the program.

There are already some configuration commands to get you started in the configuration file, but there are lots of others available. You can find the most common ones in the section on basic commands later in the Readme, and you can read about all of them in the section on customising analog. There are also some sample configuration files in the examples folder.

If you run analog from the DOS command prompt, there is another way to give options, via command line arguments, given on the command line after the program name. These are just shortcuts for configuration file commands. You can use the command line arguments if you run analog from a batch file too.

If you want to compile your own version of analog (it's written in C), or just to read the source code, it's available from the analog home page. (It's the same source code for all versions).


Starting to use analog on other platforms

Here is the really short summary:
  1. Edit anlghead.h and compile, if necessary
  2. Edit analog.cfg
  3. Run analog

Many platforms have a precompiled version of analog available. Before compiling analog, have a look at the analog home page to see if yours does.

If you're not using one of the platforms for which a precompiled version is available, you'll have to compile your own version from the source. But don't worry -- it's written in standard C throughout, so it will compile out of the box on most platforms. (The source code is the same for all platforms.)

First, change to the src/ directory.

Then look at the file anlghead.h, and see if there's anything you want to edit.

When you have done that, you need to compile the program. How to do that depends on which operating system you're using.


Compiling under Unix. First edit anlghead.h as described above. Then just type
make
within the src/ directory to compile the program. On most systems, that will be sufficient, and the compiled program should appear in the parent directory. If it fails to compile, have a look in the Makefile to see if there's anything that you need to change to suit your configuration, and try again. It says in that file what to do. In particular, Solaris 2 (SunOS 5+) users need to change the LIBS= line.

(Experts can pass some arguments in on the make command line instead of by editing anlghead.h: e.g.

make DEFS='-DLANGDIR=\"/usr/etc/apache/analog/lang/\"'
This is useful if you have a script to compile analog.)

If you haven't got gcc, you will need to change the compiler - try acc or cc instead.

Compiling under OpenVMS. You can find OpenVMS build scripts within the src/build directory. Unzip them within the src directory. Then to build Analog interactively from the command line, type

$ @ Build_Analog
or to submit the Build_Analog procedure to a batch queue, type
$ Submit /NoPrint /Keep Batch.com
The command procedure will use MMS (or MMK) if it is available, otherwise it will compile everything from raw command procedures.

Compiling under Acorn RiscOS. The Makefile can be found in the src/build directory, although at this point it has not been updated for version 5 of analog. You will have to make directories called C, H and O, and move the source files into the appropriate directories: e.g., alias.c must be renamed C.alias. And you will find that there are some filenames in the header file anlghead.h that you want to change to fit into the RiscOS directory structure.

Compiling under OS/2. To compile analog for OS/2, you will need the EMX package. You should edit the Makefile to have OS=OS2 and LIBS=-lsocket. Then after editing anlghead.h and running Make, you need to run the command

EMXBIND -b ANALOG
to generate the analog.exe executable.

After you've compiled the program, leave the src/ directory and then just type
analog
to run the program. (Or ./analog if for some reason . isn't in your $PATH.)

You can configure analog by putting commands in the configuration file, which is called analog.cfg by default. Two commands you will need straight away are

LOGFILE logfilename      # to set where your logfile lives
OUTFILE outputfile.html  # to send the output to a file instead of the screen
The logfile must be stored locally -- analog won't use FTP or HTTP to fetch it from the internet. There's a sample logfile supplied with the program. For help in interpreting the output, see What the results mean.

There are already some configuration commands to get you started in the configuration file, but there are lots of others available. You can find the most common ones in the section on basic commands later in the Readme, and you can read about all of them in the section on customising analog. There are also some sample configuration files in the examples directory.

There is one other way to give options to analog, via command line arguments, given on the command line after the program name. These are just shortcuts for configuration file commands.


Customising analog

This is the bulk of the Readme. It tells you all the commands you can give to analog, and what they all do. First there's a list of basic commands, which is as much as beginners need to read, until they want to do something which isn't listed there, or are curious to find out what they could do.

The following section is a technical (i.e., dull but important) one on the syntax of configuration commands.

Then there's documentation on all the configuration commands in the following categories. Analog has over 200 configuration commands and over 40 command line options, so sometimes these sections turn into lists of commands. But here's where you find out everything you can do with analog.

Later there's an index of all the commands and topics, and also a quick reference containing the syntax of all the commands and examples.


Basic commands

Here is a list of basic configuration commands to get you started with analog. These commands should be added to your configuration file, analog.cfg, as explained in the section on Starting to use analog. We'll see all the possible configuration commands in later sections. Or you can read a summary of the commands which control each report in the section on Analog's reports.
Analog reads logfiles produced by your web server, and produces an output file based on the data in them. So you need to know how to specify which logfile to read, and which file to send the output to. The relevant commands look like
LOGFILE my_logfile
OUTFILE output.html
where, of course, you should substitute the names of the files you want to use. The logfile must be stored locally -- analog won't use FTP or HTTP to fetch it from the internet, so you may have to fetch it yourself first. You can read several logfiles by giving several logfile commands, or by giving a comma-separated list, or by using wildcards in the logfile name. So, for example, if you use the commands
LOGFILE new1.log,old*.log
LOGFILE new2.log
analog will analyse the logfiles new1.log, new2.log, and all the old logfiles. Analog will recognise logfiles in several different formats. You can read more about this in the section on Choosing a logfile.
There are a couple of other commands you need to know right at the beginning, not because they're particularly important in themselves, but because the output will look silly if you don't know them. First, you need to know how to put your own organisation's name and URL at the top of the output. For this, you need two commands such as
HOSTNAME "Spam Widgets Inc."
HOSTURL http://www.spam-widgets.com/

If you have broken images in the output instead of graphs, you need to say in which directory on your server the images are stored. You do this by a command like

IMAGEDIR /analog/images/
(This is just put in the <img> tags in the output page, so it's the URL of a directory, not the name of the directory on your disk. The images are distributed with the program - you will have to move them to whichever directory you choose.)
Next you will want to know how to turn individual reports on and off. Analog can produce up to 44 different reports if your web server has been configured to record the necessary data in your logfiles, but here are the most important ones. Try them and see what happens. You can turn each report on with an ON command, or off with an OFF command. You can also use the commands ALL ON and ALL OFF to turn all reports on or off.
MONTHLY ON       # one line for each month
WEEKLY ON        # one line for each week
DAILYREP ON      # one line for each day
DAILYSUM ON      # one line for each day of the week
HOURLYREP ON     # one line for each hour of the day
GENERAL ON       # the General Summary at the top
REQUEST ON       # which files were requested
FAILURE ON       # which files were not found
DIRECTORY ON     # Directory Report
HOST ON          # which computers requested files
ORGANISATION ON  # which organisations they were from
DOMAIN ON        # which countries they were in
REFERRER ON      # where people followed links from
FAILREF ON       # where people followed broken links from
SEARCHQUERY ON   # the phrases and words they used...
SEARCHWORD ON    # ...to find you from search engines
BROWSERSUM ON    # which browser types people were using
OSREP ON         # and which operating systems
FILETYPE ON      # types of file requested
SIZE ON          # sizes of files requested
STATUS ON        # number of each type of success and failure
The full list of reports is in the section on Configuring the output. Some reports, for example the Referrer, Browser and Operating System Reports, will only appear if your web server has been configured to record the necessary data in its logfiles.

You can configure lots of other things about each report, such as how many rows are listed, which columns are included, and how the reports are sorted. For example, the command

REQINCLUDE pages
tells analog only to list pages, rather than all files, in the Request Report, and
REQFLOOR 10r
tells analog to include in the Request Report all files with at least 10 requests. You can read a summary of all the reports and the commands which control them in the section on Analog's reports.
You can have the output in several different languages, by using a LANGUAGE command. For example, the command
LANGUAGE FRENCH
will give you the output in French. The available languages at the moment include ARMENIAN, BASQUE, BULGARIAN, CATALAN, SIMP-CHINESE (GB2312), TRAD-CHINESE (Big5), CZECH, DANISH, DUTCH, ENGLISH, US-ENGLISH, FINNISH, FRENCH, GERMAN, HUNGARIAN, INDONESIAN, ITALIAN, JAPANESE, KOREAN, LATVIAN, NORWEGIAN (Bokmål), NYNORSK, POLISH, PORTUGUESE, BR-PORTUGUESE, RUSSIAN, SERBIAN, SLOVAK, SLOVENE, SPANISH, SWEDISH, TURKISH and UKRAINIAN.

The following languages were available for previous versions of analog, but have not yet been translated for version 5: BOSNIAN, CROATIAN, GREEK, ICELANDIC, LITHUANIAN and ROMANIAN. As and when they are translated, they will be added to the analog home page. See the section on Configuring the output for how to download, or even translate, new languages.


Two other common things you might want to do are to alias files or hosts (for example, to tell analog that two different filenames are really the same file), or to include or exclude certain files, hosts or dates (to ignore accesses from your site, for example, or to do an analysis only of a certain subdirectory or a certain time period). For these, see the later sections on Aliases and Inclusions and exclusions.

As I said, these are only a few of the commands available. To find out about all the commands, you'll have to read the remaining sections of the Readme, starting with a short section on the syntax of configuration commands.


Syntax of configuration commands

This section describes how analog finds configuration commands, and what the syntax of a configuration file should be. The syntax of individual commands is given in the Quick reference section later.
When analog starts up, it first reads options from configuration files and the command line (assuming that you are running analog from an operating system with a command line). Defaults for many of these options will have already been set in the files anlghead.h and anlghea2.h at the time the program was compiled. So if you compile your own version of analog, rather than downloading a pre-compiled executable, you can also set some options in those files before compiling. Those options are all documented there.
The first file which analog reads is the default configuration file, normally called analog.cfg. You can stop this file being read by specifying the option -G on the command line. Then the command line arguments are read, in the order in which they appear. Finally, the mandatory configuration file is read, if you specified one when you compiled the program. This is a configuration file which cannot be overridden by the user: if it is not found, analog exits immediately. This allows a system administrator to prevent users analysing certain files or producing certain reports, for example. However, note that the only certain way to prevent users analysing things is to deny them access to the logfile. Otherwise there is nothing to stop them analysing the logfile using another copy of analog or another program.
You can include another configuration file by a command like
CONFIGFILE other.cfg
The commands in the other configuration file are read immediately, in order. The program then continues reading the first configuration file where it left off. Note that reading in several configuration files does not produce several output pages, but a single output page based on all the options.

You can also include another configuration file from the command line by using a command like +gother.cfg. (Note that there is no space between +g and the filename; this is true of all command line arguments.) But note that reading an alternative configuration file does not stop the default configuration file (usually analog.cfg) being read as well. To do that you have to specify -G as well as the +g command. This is because if you want several different configurations, it's most convenient to put all the common options in analog.cfg, and options specific to each configuration in a separate file. Then the +g command line option will read both those files.
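The reading order described in this section can be sketched in a few lines of Python. This is only an illustration of the rules as stated here, not analog's own code. The function name configuration_sources and the mandatory file name in the usage note are made up for the example.

```python
def configuration_sources(args, default="analog.cfg", mandatory=None):
    """Return the sources of options in the order described in this section."""
    sources = []
    # The default configuration file comes first, unless -G is given.
    if "-G" not in args:
        sources.append(("file", default))
    # Command line arguments follow, in the order in which they appear.
    for arg in args:
        if arg == "-G":
            continue
        if arg.startswith("+g"):
            # +g is followed directly by the file name, with no space.
            sources.append(("file", arg[2:]))
        else:
            sources.append(("argument", arg))
    # The mandatory configuration file, if one was compiled in, comes last.
    if mandatory is not None:
        sources.append(("file", mandatory))
    return sources
```

For example, configuration_sources(["+gother.cfg"]) lists analog.cfg followed by other.cfg, whereas configuration_sources(["-G", "+gother.cfg"]) lists other.cfg alone. Passing a hypothetical mandatory="forced.cfg" puts that file at the end of either list.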

If the name of a configuration file given in a CONFIGFILE command doesn't include a directory, it will be looked for wherever analog expects to find its configuration files. (This location is a compile-time option.) For example, in the Windows version it would be in the same folder as the analog executable. This applies to the default and mandatory configuration files as well. But configuration files given with +g are relative to the current directory at the time you run the program.

In the Mac version, you can start up a program with a particular configuration file instead of the default one by dragging the configuration file onto the analog icon. The file must start with "# ".

You can also specify any configuration command on the command line even if it doesn't have a command line abbreviation, by use of the +C command. (NB The C must be upper case.) For example, +C"UNCOMPRESS *.gz gzcat" will include that command.


Here are the syntax rules for configuration commands. A configuration file contains several commands, normally on separate lines; any text after a hash (#) on a line is ignored as a comment. Configuration commands can be continued across lines by using a backslash as the last character on the line (but can't then have comments until the end of all the lines; also the total length can't be more than 254 characters). Each command consists of the command name followed by one or two arguments. An argument to a command may optionally be placed in single or double quotes or parentheses, and it must be if the argument contains a hash or a space, or if the last character of the last argument is a backslash. So, for example, here are some valid configuration commands.
DAILYSUM   OFF   # We don't want a Daily Summary
DAILYREP  "ON"   # We want a full Daily Report instead 
HOSTNAME (Spam Widgets Inc.)  # Spaces, so quotes or brackets needed
LOGFILE logfile1.log,\
logfile2.log     # This line and the previous one are one command
Generally later commands override earlier ones if you can have only one of that thing (e.g., for the OUTFILE), or supplement them if you can have several (e.g., for the LOGFILE, because you can read several logfiles). Apart from that, the order of commands doesn't matter, except that LOGFORMAT and LOGTIMEOFFSET commands must come earlier in the same configuration file than the LOGFILE to which they refer.
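If you want to generate or check configuration files with a script, the syntax rules above can be approximated as follows. This is a minimal Python sketch, not analog's real parser. It assumes that a quoted argument ends at the first matching closing character and that there are no escape sequences inside quotes, and it ignores the 254-character limit. The function names are invented for the example.

```python
# Opening character of a quoted argument mapped to its closing character.
CLOSERS = {'"': '"', "'": "'", "(": ")"}


def join_continuations(lines):
    """Join each line ending in a backslash with the line that follows it."""
    joined, buffer = [], ""
    for line in lines:
        line = line.rstrip("\n")
        if line.endswith("\\"):
            buffer += line[:-1]  # drop the backslash and keep collecting
        else:
            joined.append(buffer + line)
            buffer = ""
    if buffer:
        joined.append(buffer)
    return joined


def parse_command(line):
    """Split one logical line into the command name and its arguments."""
    tokens, i, n = [], 0, len(line)
    while i < n:
        ch = line[i]
        if ch.isspace():
            i += 1
        elif ch == "#":
            break  # the rest of the line is a comment
        elif ch in CLOSERS:
            end = line.find(CLOSERS[ch], i + 1)
            if end == -1:
                raise ValueError("unterminated quoted argument")
            tokens.append(line[i + 1:end])
            i = end + 1
        else:
            start = i
            while i < n and not line[i].isspace() and line[i] != "#":
                i += 1
            tokens.append(line[start:i])
    return tokens
```

With this sketch, the line HOSTNAME (Spam Widgets Inc.) parses to the command name HOSTNAME and the single argument Spam Widgets Inc., and the two-line LOGFILE example is joined into one command before it is parsed.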
If all the options seem a bit confusing, just run
analog -settings [other options]
from the command line, or include SETTINGS ON in the configuration commands. Then instead of running normally, analog will just tell you what the values of all the variables will be, based on the defaults in anlghead.h and anlghea2.h, the configuration commands, and the command line options. If you're on Unix or Windows, remember that you can send the output to a file with
analog -settings > file
Also, analog -version will just give the version number.

Choosing a logfile

The basic command for selecting a logfile is
LOGFILE logfilename
or you can just put the logfile name on the command line by itself, e.g., analog logfilename. In the Mac version, you can also analyse a particular single logfile by dragging it onto the analog icon. All logfiles must be within your computer's file system (on disk, or at least mounted under Unix, or on a mapped drive under NT) -- analog won't use FTP or HTTP to fetch them from the internet.

A - sign or the word stdin is interpreted as standard input: this is useful on Unix systems for constructing pipes. There is also an optional second argument to the LOGFILE command which is explained below.

You can have several LOGFILE commands. You can include wildcards in the logfile name (but not necessarily in the directory name: this is system-dependent), and you can use a list of logfiles separated by commas (without spaces). So the following commands would tell analog to read logfile1, c:\logs\logfile2, and all files ending in .log:

LOGFILE logfile1,*.log
LOGFILE c:\logs\logfile2
Or if you were on a Mac, you might use something like
LOGFILE "Hard Drive:Internet Applications:Analog:Logs:*"
You can also use the special command
LOGFILE none
to erase the list of logfiles specified so far.
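
To see how several LOGFILE commands combine, here is a rough Python model of the rules above. It is an illustration only, not analog's implementation. It matches names against a list you supply rather than looking at the disk, and it uses Python's fnmatch wildcards, which may differ in detail from the wildcards your system gives analog. The function name resolve_logfiles is invented.

```python
from fnmatch import fnmatchcase


def resolve_logfiles(arguments, available):
    """Model the cumulative effect of a sequence of LOGFILE arguments.

    arguments -- the argument of each LOGFILE command, in order
    available -- the file names that exist (supplied by the caller)
    """
    selected = []
    for argument in arguments:
        if argument == "none":
            # LOGFILE none erases the list of logfiles specified so far.
            selected.clear()
            continue
        # A comma-separated list (without spaces) names several logfiles.
        for pattern in argument.split(","):
            selected.extend(
                name for name in available if fnmatchcase(name, pattern)
            )
    return selected
```

Given the files logfile1, a.log, b.log and notes.txt, the argument logfile1,*.log selects logfile1, a.log and b.log. The sequence logfile1, none, b.log selects b.log alone.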

If the name of a logfile in a LOGFILE command doesn't include a directory, it will be looked for wherever analog expects to find logfiles. (This location is built in when the program is compiled.) For example, on Windows it would be in the same folder as the analog executable. But logfile names given on the command line are within the current directory.

You can also include the date in the LOGFILE name, by using the following codes.

%D  date of month
%m  month name, in English
%M  month number
%y  two-digit year
%Y  four-digit year
%H  hour
%n  minute
%w  day of week, in English
So for example,
LOGFILE access_log%Y%M.log
will look for the logfile access_log200109.log, if it's September 2001. The date used is actually the TO date if one was specified, and otherwise the time of the start of the program. So for example, you can look at all of last month's logfiles with the commands
TO -00-0131                   # to end of last month
LOGFILE access_log%Y%M??.log  # finds access_log200108??.log in Sep 2001
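
If you want to check from a script which filename a given pattern will produce, the date codes above can be imitated in Python. This is a sketch under assumptions: the example in this section shows that %Y%M gives 200109 in September 2001, but the zero-padding of %D, %H and %n and the three-letter English names used for %m and %w are guesses, so check them against your own setup. The function name expand_logfile_name is invented.

```python
import re
from datetime import datetime

# Assumed spellings: the Readme only says "in English".
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]  # weekday() 0 is Monday


def expand_logfile_name(pattern, when):
    """Replace the date codes in a LOGFILE name using the date `when`."""
    codes = {
        "D": f"{when.day:02d}",         # date of month (padding assumed)
        "m": MONTHS[when.month - 1],    # month name (abbreviation assumed)
        "M": f"{when.month:02d}",       # month number
        "y": f"{when.year % 100:02d}",  # two-digit year
        "Y": f"{when.year:04d}",        # four-digit year
        "H": f"{when.hour:02d}",        # hour (padding assumed)
        "n": f"{when.minute:02d}",      # minute (padding assumed)
        "w": DAYS[when.weekday()],      # day of week (abbreviation assumed)
    }
    return re.sub(r"%([DmMyYHnw])", lambda m: codes[m.group(1)], pattern)
```

For example, expand_logfile_name("access_log%Y%M.log", datetime(2001, 9, 15)) returns access_log200109.log. Wildcards such as ?? are left untouched.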

The LOGFILE commands are cumulative, except that any logfiles on the command line or in configuration files specified on the command line override any in the default configuration file or configuration files loaded from there, and are themselves overridden by any in the mandatory configuration file or configuration files loaded from there. Usually you don't need to worry about this, and it will do what you expect! (Actually I should have said "logfiles or cache files" -- but we'll get on to that later).


Analog knows about several different types of logfile. By default it will attempt to see if your logfile is of one of the types it knows about, based on the first line. The types it can usually diagnose are the common log format, the NCSA combined format, referrer log and browser log, the W3 extended log format, the Microsoft IIS format, the Netscape format, the WebSTAR format, the WebSite format and the MacHTTP format. Examples of all these formats are given at the end of this section. If you have debugging on, analog will report what type of logfile it thinks yours is.

If your logfile is not in one of the standard formats, you will probably still be OK, because it is possible to tell analog about other formats using a LOGFORMAT command. This is explained in the next section. But most users don't ever need to know about this because they have logfiles in a standard format. So the best thing to do is just to try analysing your logfile and see if analog will understand it. If it does, you don't need to worry about LOGFORMATs.

If analog can't understand your logfile, it will warn you that it can't detect the format, or possibly that it found a lot of corrupt lines. There are basically five reasons why this might happen:

  1. Many people try and use a LOGFORMAT command when they don't need one. Always try without one first.
  2. Some log formats are not very well designed and analog can't analyse them reliably. In this case it will give up, usually with a helpful message, rather than risk doing a bad job. For example, you might get "Logfile with ambiguous dates" or "Time without date." In this case you should read the notes on all the built-in formats below where some common problems with those formats are described.
  3. Since analog tries to deduce the format based on the first line of the logfile, it could just be that the first line is corrupt. In this case, you could tell analog the format, or you could just fix the first line.
  4. For the same reason, if the format changes midway through the log, analog will count the remaining lines as corrupt. In this case, you will find that your output page contains a partial analysis but with a large number of corrupt lines too. You will need to give analog two LOGFORMAT commands to tell it about the two different formats.
  5. Finally, some logfiles really aren't in one of the standard formats. In this case you will need to read the next section and learn how to tell analog about your format.
If you can't see what's wrong with your logfile, you can specify DEBUG ON, and analog will report where each line was corrupt.
There is also an optional second argument to the LOGFILE command, which specifies a prefix to add to all the filenames in that logfile. This is useful if you've got several different servers or virtual hosts, when the same filename may occur on each of the servers. For example,
LOGFILE mydomain.log http://www.mydomain.com
would translate a filename /file.html in mydomain.log to http://www.mydomain.com/file.html. (If you only have logfiles from one server, and you just want the prefix so that you can host the output on a different server, then you probably want the BASEURL command instead.)

Note that because this actually changes the name of the file, any FILEINCLUDE, FILEEXCLUDE or FILEALIAS command will have to refer to the new name, including the prefix.

If you are using this command to combine logfiles from several different virtual hosts, then the Virtual Host Report doesn't tell you about the different virtual hosts. The virtual host name has just become part of the filename. So you want to look in the Directory Report instead. (And you will probably want to use the SUBDIR command as well.)

If the logfile contains the name of the virtual host on each line, then the argument can contain a %v, and the name of the virtual host will be inserted at that point. If %v is included and the logfile line doesn't have a virtual host, then that line will be marked as corrupt.
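
The effect of the prefix argument, including %v, can be summarised by this small Python sketch. It illustrates the behaviour described above and is not taken from analog's source. The function name prefixed_name and the host www.a.com are made up.

```python
def prefixed_name(filename, prefix, virtual_host=None):
    """Return the filename as it would appear after applying the prefix.

    Returns None where the line would be marked as corrupt, i.e. when the
    prefix contains %v but the logfile line has no virtual host.
    """
    if "%v" in prefix:
        if virtual_host is None:
            return None
        prefix = prefix.replace("%v", virtual_host)
    return prefix + filename
```

So prefixed_name("/file.html", "http://www.mydomain.com") gives http://www.mydomain.com/file.html, and prefixed_name("/file.html", "http://%v", "www.a.com") gives http://www.a.com/file.html. Remember that FILEINCLUDE, FILEEXCLUDE and FILEALIAS commands then have to match the longer name.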


It is often convenient to store logfiles compressed to save disk space. Analog will automatically read logfiles compressed using gzip, zip or bzip2. But if you have logfiles compressed using some other program, analog can still read them provided that you use an UNCOMPRESS command to say how to uncompress them.

You need to supply the types of file that you want to uncompress in a comma-separated list, together with the name of a command that will uncompress the files to standard output (rather than to a file). For example, on Unix you might use

UNCOMPRESS *.Z "/usr/bin/uncompress -c"
whereas on Windows NT, you might use
UNCOMPRESS *.Z ("c:\Program Files\uncompress\uncompress" -c)

If analog determines that a logfile which it's uncompressing isn't wanted for the analysis, a "broken pipe" error can be reported. This is produced by the uncompressing command and is out of analog's control, but it's harmless.

(Hint: There's nothing to stop you using the UNCOMPRESS command for other types of preprocessing, for example DNS resolution.)


Logfile formats

Here is a summary of the various logfile formats which analog knows about. To illustrate them, I have used the same (fictional) request as it might be recorded in the different formats.

The common logfile format is written by most servers. Its lines look like

jay.bird.com - fred [25/Dec/1998:17:45:35 +0000]
"GET /~sret1/ HTTP/1.0" 200 1243
(except all on one line). Some versions of Microsoft software have a buggy version of this with an extra quote mark before the HTTP like this:
jay.bird.com - fred [25/Dec/1998:17:45:35 +0000]
"GET /~sret1/ "HTTP/1.0" 200 1243
Analog will understand these, but (as with any two formats) it will reject lines if the format changes half way through.
The NCSA referrer log looks like
[25/Dec/1998:17:45:35] http://www.site.com/ -> /~sret1/
and the browser (or agent) log looks like
[25/Dec/1998:17:45:35] Mozilla/2.0 (X11; I; HP-UX A.09.05)
In the referrer log, the date can be omitted.
The NCSA combined log is the same as the common log, except that it has the referrer and browser on the end in quotes, like this:
jay.bird.com - fred [25/Dec/1998:17:45:35 +0000] "GET /~sret1/ HTTP/1.0"
200 1243 "http://www.site.com/" "Mozilla/2.0 (X11; I; HP-UX A.09.05)"
(except all on one line). If you are using the Apache server, you can generate this with the mod_log_config module, using the Apache command
LogFormat "%h %l %u %t \"%r\" %s %b \"%{Referer}i\" \"%{User-Agent}i\""
It is usually better to use the combined log than separate logs, because it stores more information in less space.
The Microsoft IIS logfile looks like
192.64.25.41, -, 25/12/98, 17:45:35, W3SVC1, HOST1, 192.16.225.10,
2178, 303, 1243, 200, 0, GET, /~sret1/, -,
(except all on one line; and sometimes with four-digit years). However, the format is extremely badly designed, in that the date follows local conventions: in other words, in North America the above example would have the date 12/25/98 instead. Analog will diagnose which form the logfile is in if possible: but if both the day and the month are at most 12, there is no way to tell which format it is. In this case, it will advise you to use the command LOGFORMAT MICROSOFT-NA for North American date format, or LOGFORMAT MICROSOFT-INT for international date format. In some countries, the date will not be in either of these formats, in which case you need to write your own LOGFORMAT command, based on the examples in the next section.
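
To picture how such a diagnosis can work, here is a rough sketch in Python. It is only an illustration of the idea, not analog's own code, and the function name diagnose_date_order is made up for this example. It looks for a date in which one of the first two numbers is greater than 12, and gives up if it can't find one.

```python
def diagnose_date_order(dates):
    """Guess the date order from strings such as '25/12/98'.

    Returns 'MICROSOFT-INT', 'MICROSOFT-NA', or None if every date is ambiguous.
    """
    for text in dates:
        first, second, _year = text.split("/")
        if int(first) > 12:       # the first number cannot be a month
            return "MICROSOFT-INT"
        if int(second) > 12:      # the second number cannot be a month
            return "MICROSOFT-NA"
    return None                   # e.g. only dates like 01/02/98 were seen
```

When the result is None, the only safe course is the one described above: say explicitly which format you mean.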

There are also various third-party extensions to the Microsoft format to include, for example, the browser and referrer. But they all do it in different ways, so analog can't automatically diagnose them, and again, you need to write a LOGFORMAT command for them.


The WebSite format looks like
12/25/98 17:45:35  jay.bird.com  host1  Server  fred  GET  /~sret1/
http://www.site.com/    Mozilla/2.0 (X11; I; HP-UX A.09.05)  200  1243  2178
(except all on one line, and with the fields separated by tabs). It suffers from the same problem with ambiguous dates as the IIS logfile (above), so again you might have to use LOGFORMAT WEBSITE-NA or LOGFORMAT WEBSITE-INT, or even have to write your own LOGFORMAT command.
The MacHTTP format looks like
12/25/98  17:45:35   OK    jay.bird.com  /~sret1/  1243
with the fields separated by tabs.
The W3 extended log, the Netscape log, and the WebSTAR log can be recognised because they must include at or near the top a line telling analog what format to expect on subsequent lines. (They may also contain later lines changing the format). If the header line is missing, analog won't be able to interpret the subsequent lines and so won't be able to analyse the logfile. In this case, you will have to either replace the missing header or use a LOGFORMAT command to tell analog your format.

If analog finds that the header line is corrupt, it will usually tell you what was wrong with it. The most common problem is that you're not allowed the time without the date or vice versa -- in particular, having the date just at the top of the logfile is not sufficient; you must have it on each line. By default, Microsoft servers produce extended logs with the date only at the top. But if the date changes during the logfile, the server doesn't then write a new date line. This means that missing days or corrupt entries can make analog get a day out in either direction, with no way to rescue or even recognise the situation!

For this reason analog knows that it can't analyse such logfiles safely, so instead it insists that the date should be on every line. There are some programs on the helper applications page to put the date on each line. If you already have such a logfile you might want to use one of these programs, but they have to assume that the date doesn't change during the logfile, so it would be much safer to tell your server to log the date on every line in future.

The extended log is described at http://www.w3.org/TR/WD-logfile.html. Its header line looks like

#Fields: date time cs-uri
In the rest of the logfile, the fields can be separated by spaces or tabs. Remember the logfile must contain the date as well as the time on every line -- see above.

There is also Microsoft's attempt at the extended format -- unfortunately they didn't read the spec., so they didn't enclose the browser and referrer in quotes, they replaced spaces in the browser name with +'s, and they put the time taken to serve the request in milliseconds instead of seconds. And there is WebSTAR's attempt which is very nearly right except that they erroneously used the CS-HOST field as the client hostname instead of the server hostname. Analog will understand all of these versions.

Extended logs always record the time in GMT, so you will probably need to use a LOGTIMEOFFSET command to convert to your local timezone.

The WebSTAR format is described at http://www.starnine.com/webstar/docs/ws4manual.3f.html. It has a header line like

!!LOG_FORMAT DATE TIME RESULT URL BYTES_SENT HOSTNAME
In the rest of the logfile, the fields are separated by tabs. The WebSTAR server also records the time in GMT, so again you will probably need to use a LOGTIMEOFFSET command to convert to your local timezone. Some other Mac servers also use the WebSTAR format, or something looking like it. Analog will understand these too.

Finally, the Netscape header line looks like

format=%Ses->client.ip% [%SYSDATE%] "%Req->reqpb.clf-request%"
%Req->srvhdrs.clf-status% %Req->srvhdrs.content-length%

Log formats

This section is about how to tell analog the format of your logfile. Most people don't need to do this because analog can detect the format automatically -- try it first and see, because you will save yourself a lot of trouble! But if you do need to specify the log format explicitly, here is how to do it.

The basic command to specify a log format looks like

LOGFORMAT format
-- we'll discuss what the formats can be in a minute. Or if you are using the Apache server, you will probably find it more convenient to use
APACHELOGFORMAT apacheformat
instead.

The LOGFORMAT and APACHELOGFORMAT commands only apply to logfiles specified with a LOGFILE command later in the same configuration file. So you must put the LOGFORMAT above the LOGFILE to which it refers. If you declare your logfiles on the command line, or drag them onto the app on the Mac, you must use DEFAULTLOGFORMAT or APACHEDEFAULTLOGFORMAT instead. This is so that different logfiles can have different formats, like this:

LOGFILE log0
LOGFORMAT format1
LOGFILE log1
LOGFORMAT format2
LOGFILE log2
LOGFILE log3
In this example, log1 is in format1, log2 and log3 are in format2, and log0 isn't in either format -- analog will try and detect which format it's in.
The APACHELOGFORMAT command is followed by the LogFormat from your Apache httpd.conf file. For example, if your httpd.conf contained the following lines:
LogFormat "%h %l %u %t %v \"%r\" %>s %b" myformat
CustomLog /var/log/apache/access.log myformat
then your analog.cfg should contain
APACHELOGFORMAT (%h %l %u %t %v \"%r\" %>s %b)
LOGFILE /var/log/apache/access.log
(Use parentheses instead of quotes round the argument if the argument already contains quotes.) Analog understands all Apache log formats, with the exception that it won't parse Apache's "%...{format}t" construction for customised times: if you have this construction, you will have to use ordinary LOGFORMAT instead. (This is because "%...{format}t" is sometimes localised.)
The possible formats for use with the LOGFORMAT command are of two types. First there are some symbolic words, and then there are log format strings. We'll look at the words first.

There are format words for all the built-in formats analog knows about. You might need one of these words if your logfile is in a standard format, but analog can't detect which format it's in for some reason; for example, maybe the first line is corrupt; or maybe analog can't tell whether you're using North American or international dates. So for example

LOGFORMAT COMMON
will select common format; you can also have COMBINED, REFERRER, BROWSER, EXTENDED, MICROSOFT-NA (North American date format), MICROSOFT-INT (international date format), WEBSITE-NA, WEBSITE-INT, MS-EXTENDED (Microsoft's attempt at extended format), WEBSTAR-EXTENDED (WebSTAR's version of extended format), MS-COMMON (a buggy version of common format in some versions of Microsoft software), NETSCAPE, WEBSTAR or MACHTTP. All these formats were defined at the end of the previous section. You can also use the special word AUTO to return to automatic detection.

If your logfile is not in one of the recognised formats, you can tell analog about your format using a log format string. You only ever need this if your logfile has lines which are not in one of the standard formats. (And even if it isn't in a standard format, if you're using the Apache web server, you will find APACHELOGFORMAT easier.)

The format string consists of a template for the logfile line, with the various fields and special characters replaced by codes as follows. Please note that these codes are case sensitive -- for example, %b is completely different from %B!

%S    host (the client hostname, or address of the computer making the request)
%s    numerical IP address of client (if recorded in a separate field; used when %S is empty)
%r    file requested
%q    query string (part of filename after ?, if recorded in a separate field)
%B    browser
%A    browser with +'s instead of spaces
%f    referrer
%u    user (tip: a cookie or session id can usefully be defined as %u too)
%v    virtual host (the server hostname, also called the virtual domain)
%d    day of the month
%m    month in digits
%M    month, three letter English abbreviation
%y    year, last two digits
%Y    year, four digits
%Z    year, two or four digits (less efficient)
%h    hour of the day
%n    minute of the hour
%a    a or A for am, or p or P for pm, if %h is in the 12-hour clock. (So to match "am" you need %am and to match "AM" you need %aM)
%U    "Unix time" (seconds since beginning of 1970, GMT). If it includes decimals, use %U.%j
%b    number of bytes transferred
%t    processing time in seconds
%T    processing time in milliseconds
%D    processing time in microseconds
%c    HTTP status code
%C    code words used instead of HTTP status code in some servers -- only used internally
%j    junk: ignore this field (field can be empty too)
%w    white space: spaces or tabs
%W    optional white space
%%    % sign
\n    new line
\t    tab stop
\\    single backslash

So for example, the common log format, which looks like
jay.bird.com - fred [25/Dec/1998:17:45:35 +0000]
"GET /~sret1/ HTTP/1.0" 200 1243
(except all on one line) could be represented by the LOGFORMAT command
LOGFORMAT (%S - %u [%d/%M/%Y:%h:%n:%j %j] "%j %r %j" %c %b)
In other words, it's just the sample line but with the hostname replaced by %S, the username by %u etc. (The parentheses are needed because the argument contains spaces.) Or take another example: if you had lines which looked like
Fri 25/12/98 5:45pm, /~sret1/, jay.bird.com, 200, 1243,
http://www.site.com, Mozilla/2.0 (X11; I; HP-UX A.09.05)
(all on one line again), you could use the format
LOGFORMAT (%j %d/%m/%y %h:%n%am, %r, %S, %c, %b, %f, %B)
Remember: if you have trouble writing a LOGFORMAT string, you can turn debugging on, and analog will report where each line was corrupt. If you still have trouble, you can write to the analog-help mailing list.
A logfile can sometimes have lines in several different formats. So you can specify several LOGFORMAT commands in a row, and they will all apply to the next logfile. This is also useful if the format of your logfile changes half way through. So in this example:
LOGFORMAT COMMON
LOGFORMAT COMBINED
LOGFILE log1
LOGFORMAT (%j %d/%m/%y %h:%n%am, %r, %S, %c, %b, %f, %B)
LOGFILE log2
LOGFILE log3
log1 has lines in both common and combined format, whereas log2 and log3 have lines just in the format in the previous example.

If you specify several formats, analog tries to match each line to the first format first, then if that fails the next, and so on, so the order of the formats is important. Usually you want to specify the most common one first, to minimise the time spent trying to match lines to inappropriate formats.


I suggested above that any logfile which doesn't have a LOGFORMAT command earlier in the same configuration file, or is specified on the command line, is auto-detected. But this isn't quite true. Actually such logfiles get a special format called the default log format. The default format starts off as auto-detection, but you can change it if you want with the DEFAULTLOGFORMAT command. This command works exactly the same as the LOGFORMAT command -- it understands the same formats, and if you have several DEFAULTLOGFORMAT commands, they accumulate in the same way. The difference is that they don't need to be put in any particular place. (There is also APACHEDEFAULTLOGFORMAT, which has the same effect but uses the Apache LogFormat strings.)

So let's go back to the first example:

LOGFILE log0
LOGFORMAT format1
LOGFILE log1
LOGFORMAT format2
LOGFILE log2
LOGFILE log3
Here log0 actually gets the default log format. If there are no DEFAULTLOGFORMAT commands, the default will be auto-detection. But if there are DEFAULTLOGFORMAT commands, even in another configuration file, that will be the format of log0.
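
The scoping rules for LOGFORMAT and LOGFILE can be summarised in a few lines of Python. This is just a sketch of the rules as described in this section, not analog's implementation; the function assign_formats and its (name, argument) input are invented for the example, and the default is passed in rather than read from DEFAULTLOGFORMAT commands.

```python
def assign_formats(commands, default=("AUTO",)):
    """Map each logfile to the list of formats that applies to it.

    commands: (name, argument) pairs in configuration-file order.
    """
    assigned = {}
    current = []       # formats collected by the latest run of LOGFORMATs
    in_run = False     # True while consecutive LOGFORMAT commands are being read
    for name, arg in commands:
        if name == "LOGFORMAT":
            if not in_run:
                current = []          # a new run replaces the earlier formats
            current.append(arg)       # formats in the same run accumulate
            in_run = True
        elif name == "LOGFILE":
            # No LOGFORMAT seen yet: the logfile gets the default log format.
            assigned[arg] = list(current) if current else list(default)
            in_run = False
    return assigned

example = [("LOGFILE", "log0"), ("LOGFORMAT", "format1"), ("LOGFILE", "log1"),
           ("LOGFORMAT", "format2"), ("LOGFILE", "log2"), ("LOGFILE", "log3")]
formats = assign_formats(example)
```

Here formats["log0"] is the default, formats["log1"] is just format1, and log2 and log3 both get format2.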

You need DEFAULTLOGFORMAT instead of LOGFORMAT when you want to change the format of logfiles which aren't given in a LOGFILE command -- for example, ones specified on the command line, or dragged onto the program icon on a Mac, or compiled in.


A couple more technical details and tips about LOGFORMAT commands.

The "Unix time", %U, is always recorded in GMT. So you will probably need to use a LOGTIMEOFFSET command to convert to your local timezone. Also, it's just the integer part of the time, so if you have decimals you will have to use %U.%j.

The log formats which analog can handle are those which are known as instantaneously decipherable: in practice, this means that the character which terminates a string can never occur in the string. So for example, in common format, which looks like

LOGFORMAT (%S - %u [%d/%M/%Y:%h:%n:%j %j] "%j %r %j" %c %b)
if the hostname ever contained a space, the line would be marked as corrupt, because analog terminates the host at the first space, not at the first occurrence of space-dash-space, and then the rest of the line wouldn't match. Of course, hostnames should never contain spaces, so this shouldn't be a problem. There are a couple of other restrictions: if there is any date or time information, then the year, month, date, hour and minute must all be present: and the same information may not occur twice in the format (so you can't have both %m and %M, for example, because these both represent the month; make one of them a %j to have it ignored).
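
To see what "instantaneously decipherable" means in practice, here is a small Python sketch that turns a format string into a regular expression in which every field stops at the first occurrence of the character following it. It is an illustration under my own assumptions, not analog's parser: it handles only a handful of the codes, it refuses two codes in a row, and the names compile_logformat, parse_line and FIELD_NAMES are made up.

```python
import re

# Names for a subset of the codes; %j (junk) is handled separately below.
FIELD_NAMES = {"S": "host", "u": "user", "r": "file", "c": "status", "b": "bytes",
               "d": "day", "M": "month", "Y": "year", "h": "hour", "n": "minute"}

def compile_logformat(fmt):
    """Translate a LOGFORMAT-style template into a compiled regular expression."""
    parts = []
    i = 0
    while i < len(fmt):
        if fmt[i] == "%" and i + 1 < len(fmt):
            code = fmt[i + 1]
            terminator = fmt[i + 2] if i + 2 < len(fmt) else None
            if terminator == "%":
                raise ValueError("adjacent codes are not handled in this sketch")
            # A field ends at the first occurrence of the character after it.
            body = ".*" if terminator is None else "[^" + re.escape(terminator) + "]*"
            if code == "j":
                parts.append("(?:" + body + ")")
            elif code in FIELD_NAMES:
                parts.append("(?P<" + FIELD_NAMES[code] + ">" + body + ")")
            else:
                raise ValueError("unsupported code %" + code)
            i += 2
        else:
            parts.append(re.escape(fmt[i]))   # literal character in the template
            i += 1
    return re.compile("".join(parts))

def parse_line(regex, line):
    """Return a dict of fields, or None if the line would count as corrupt."""
    match = regex.fullmatch(line)
    return match.groupdict() if match else None

common = compile_logformat('%S - %u [%d/%M/%Y:%h:%n:%j %j] "%j %r %j" %c %b')
sample = ('jay.bird.com - fred [25/Dec/1998:17:45:35 +0000] '
          '"GET /~sret1/ HTTP/1.0" 200 1243')
fields = parse_line(common, sample)
```

With this sketch, changing the hostname in the sample line to one containing a space makes parse_line return None, which mirrors the "corrupt" case described above.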

Sometimes you need to read one of the fields in a logfile, but not analyse it. For example, if you have a separate common log and referrer log, the referrer log might look like

http://guide-p.infoseek.com/Titles -> /~sret1/analog/
But the requests for /~sret1/analog/ would already have been counted when reading the main logfile, so you don't want to count them again now. You get round this by specifying a * in that item in the format string, like this:
LOGFORMAT (%f -> %*r)

A tip: sometimes it is more efficient to specify two or more adjacent fields to ignore with a single %j, as long as the whole group ends with a recognisable character. So common format is more efficiently specified as

LOGFORMAT (%S - %u [%d/%M/%Y:%h:%n:%j] "%j %r %j" %c %b)
-- in the date and time [25/Dec/1998:17:45:35 +0000], the seconds and the timezone can be ignored with a single %j, extending until the close-bracket.

Another tip: %j can also be used to ignore whole lines, rather than just fields analog doesn't use. For example, the extended log format ignores lines beginning with # by using

LOGFORMAT #%j
and the Microsoft format ignores lines corresponding to FTP requests with
LOGFORMAT (%*S, %*u, %m/%d/%y, %h:%n:%j, %j)
If those formats had not been used, the lines would have been incorrectly marked as corrupt.
Finally, both for reference and as examples, here is a list of all the fixed formats that analog understands, together with the example lines from the previous section and their built-in definitions (split over two lines where necessary).
Common format, LOGFORMAT COMMON
jay.bird.com - fred [25/Dec/1998:17:45:35 +0000]
      "GET /~sret1/ HTTP/1.0" 200 1243
LOGFORMAT (%S %j %u [%d/%M/%Y:%h:%n:%j] "%j%w%r%wHTTP%j" %c %b)
LOGFORMAT (%S %j %u [%d/%M/%Y:%h:%n:%j] "%j%w%r" %c %b)
LOGFORMAT (%S %j %u [%d/%M/%Y:%h:%n:%j] "%r" %c %b)
Microsoft common format, LOGFORMAT MS-COMMON
jay.bird.com - fred [25/Dec/1998:17:45:35 +0000]
      "GET /~sret1/ "HTTP/1.0" 200 1243
LOGFORMAT (%S %j %u [%d/%M/%Y:%h:%n:%j] "%j%w%r%w"HTTP%j" %c %b)
LOGFORMAT (%S %j %u [%d/%M/%Y:%h:%n:%j] "%j%w%r" %c %b)
LOGFORMAT (%S %j %u [%d/%M/%Y:%h:%n:%j] "%r" %c %b)
Combined log, LOGFORMAT COMBINED
jay.bird.com - fred [25/Dec/1998:17:45:35 +0000] "GET /~sret1/ HTTP/1.0" 200
      1243 "http://www.site.com/" "Mozilla/2.0 (X11; I; HP-UX A.09.05)"
LOGFORMAT (%S %j %u [%d/%M/%Y:%h:%n:%j] "%j%w%r%wHTTP%j" %c %b "%f" "%B")
LOGFORMAT (%S %j %u [%d/%M/%Y:%h:%n:%j] "%j%w%r" %c %b "%f" "%B")
LOGFORMAT (%S %j %u [%d/%M/%Y:%h:%n:%j] "%r" %c %b "%f" "%B")
Referrer log, LOGFORMAT REFERRER
[25/Dec/1998:17:45:35] http://www.site.com/ -> /~sret1/
or http://www.site.com/ -> /~sret1/
LOGFORMAT ([%d/%M/%Y:%h:%n:%j] %f -> %*r)
LOGFORMAT (%f -> %*r)
Browser log, LOGFORMAT BROWSER
[25/Dec/1998:17:45:35] Mozilla/2.0 (X11; I; HP-UX A.09.05)
LOGFORMAT ([%d/%M/%Y:%h:%n:%j] %B)
Microsoft log, North American dates, LOGFORMAT MICROSOFT-NA
192.64.25.41, -, 12/25/98, 17:45:35, W3SVC1, HOST1, 192.16.225.10,
      2178, 303, 1243, 200, 0, GET, /~sret1/, -,
192.64.25.41, -, 12/25/2001, 17:45:35, W3SVC1, HOST1, 192.16.225.10,
      2178, 303, 1243, 200, 0, GET, /~sret1/, -,
LOGFORMAT (%S, %u, %m/%d/%Z, %h:%n:%j, W3SVC%j, %j, %v,
      %T, %j, %b, %c, %j, %j, %r, %q,)
LOGFORMAT (%*S, %*u, %m/%d/%Z, %h:%n:%j, %j)
Microsoft log, international dates, LOGFORMAT MICROSOFT-INT
192.64.25.41, -, 25/12/98, 17:45:35, W3SVC1, HOST1, 192.16.225.10,
      2178, 303, 1243, 200, 0, GET, /~sret1/, -,
192.64.25.41, -, 25/12/2001, 17:45:35, W3SVC1, HOST1, 192.16.225.10,
      2178, 303, 1243, 200, 0, GET, /~sret1/, -,
LOGFORMAT (%S, %u, %d/%m/%Z, %h:%n:%j, W3SVC%j, %j, %v,
      %T, %j, %b, %c, %j, %j, %r, %q,)
LOGFORMAT (%*S, %*u, %d/%m/%Z, %h:%n:%j, %j)
WebSite log, North American dates, LOGFORMAT WEBSITE-NA
12/25/98 17:45:35  jay.bird.com  host1  Server  fred  GET  /~sret1/
   http://www.site.com/    Mozilla/2.0 (X11; I; HP-UX A.09.05)  200  1243  2178
LOGFORMAT (%m/%d/%y %h:%n:%j\t%S\t%v\t%j\t%u\t%j\t%r\t%f\t%j\t%B\t%c\t%b\t%T)
WebSite log, international dates, LOGFORMAT WEBSITE-INT
25/12/98 17:45:35  jay.bird.com  host1  Server  fred  GET  /~sret1/
   http://www.site.com/    Mozilla/2.0 (X11; I; HP-UX A.09.05)  200  1243  2178
LOGFORMAT (%d/%m/%y %h:%n:%j\t%S\t%v\t%j\t%u\t%j\t%r\t%f\t%j\t%B\t%c\t%b\t%T)
MacHTTP format, LOGFORMAT MACHTTP
12/25/98  17:45:35   OK    jay.bird.com  /~sret1/  1243
LOGFORMAT (%m/%d/%y\t%h:%n:%j \t%C%w%S\t%r\t%b)
The extended log, Netscape log and WebSTAR log don't have any built-in formats: analog constructs their formats from their header lines.

Aliases

After analog has read each logfile entry, it then applies aliases to each of the items. First, if you have a case insensitive filesystem, analog converts the filename to lower case. Usually analog assumes that Unix and BeOS filesystems are case sensitive and other systems are case insensitive. You might want to override its choice, if, for example, you have transferred files from one machine to another, so as to use the convention on the original machine. You can do this by the commands
CASE INSENSITIVE
CASE SENSITIVE
There are similar commands for usernames, if your logfile records these. By default, usernames are always case insensitive, but you can specify
USERCASE SENSITIVE
to override this.
Next it applies built-in aliases to each item. For example, it knows that %7E in a filename or referrer is equivalent to ~ and translates it accordingly. It also strips off the directory suffix from any filenames which have it. This suffix is normally index.html, but you can specify another one instead with a command such as
DIRSUFFIX default.htm
(You can only have one DIRSUFFIX.) There are other built-in aliases for other items: for example, hostnames are converted to lower case at this point.
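
As a rough picture of the two built-in aliases just mentioned, here is a Python sketch. It covers only those two translations and is not analog's code; the function name apply_builtin_aliases is invented, and treating the suffix as a whole final path component is my assumption.

```python
def apply_builtin_aliases(filename, dirsuffix="index.html"):
    """Translate %7E to ~ and strip the directory suffix, if present."""
    filename = filename.replace("%7E", "~")
    # Only strip the suffix when it is the whole last component of the path.
    if filename.endswith("/" + dirsuffix):
        filename = filename[: -len(dirsuffix)]
    return filename
```

So /%7Esret1/index.html and /~sret1/ end up as the same item.
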
After this, it applies user-specified aliases to each item. These aliases are useful if, for example, you know that two filenames correspond to the same file, or if you want to translate local hostnames to their internet equivalents. You specify aliases by commands like
FILEALIAS /football.html /soccer.html
HOSTALIAS lion lion.statslab.cam.ac.uk
There is also the special command FILEALIAS none, which cancels any other file aliases which might have been specified.

The alias commands for the other items are called BROWALIAS, REFALIAS, USERALIAS and VHOSTALIAS. Only one alias is ever applied to any item. So after

FILEALIAS /football.html /soccer.html
FILEALIAS /soccer.html /brazil.html
the file /soccer.html would get translated to /brazil.html, but /football.html would only get translated to /soccer.html and would not see the second alias.

You can also use wildcards in ALIAS commands: ? matches any one character and * matches any number of characters (including none). And on the right-hand side, you can use $1, $2 etc. to represent the parts of the original name matched by the *'s. As a special abbreviation, if there is exactly one * on the left-hand side, then a * on the right-hand side can be used to represent $1. So, for example,

FILEALIAS /*/football/* /soccer/
would translate /sport/football/rules.html to just /soccer/, but either of
FILEALIAS /*/football/* /$1/soccer/$2         # or
FILEALIAS /sport/football/* /sport/soccer/*
would translate /sport/football/rules.html to /sport/soccer/rules.html.

You can use $$ to get an actual $ on the right-hand side. Or you can prefix the right-hand side with "PLAIN:" to treat any $'s and *'s on the right-hand side literally. For example

FILEALIAS /*/football/* PLAIN:/$1/soccer/$2
would translate /sport/football/rules.html to exactly /$1/soccer/$2

Analog's *'s are un-greedy: if there are two possible ways of matching, the part of the expression on the left matches as little as possible. This is more often what you want. But it contrasts with Perl's regular expressions, for example. (Oh, two consecutive *'s are completely useless, but if you try it they are collapsed into one before counting the $1, $2, etc.)
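
The wildcard rules above can be imitated in Python as follows. This is a sketch of the documented behaviour rather than analog's implementation: the helper names compile_alias and apply_aliases are hypothetical, and where several aliases could match I have assumed that the first one in the list is the one applied.

```python
import re

def compile_alias(pattern, replacement):
    """Prepare one wildcard alias as (regex, replacement, is_plain)."""
    pattern = re.sub(r"\*+", "*", pattern)     # consecutive *'s collapse into one
    regex, stars = "", 0
    for ch in pattern:
        if ch == "*":
            regex += "(.*?)"                   # un-greedy: match as little as possible
            stars += 1
        elif ch == "?":
            regex += "."                       # exactly one character
        else:
            regex += re.escape(ch)
    plain = replacement.startswith("PLAIN:")
    if plain:
        replacement = replacement[len("PLAIN:"):]
    elif stars == 1:
        replacement = replacement.replace("*", "$1")   # * abbreviates $1
    return re.compile(regex), replacement, plain

def apply_aliases(name, aliases):
    """Apply at most one alias to name; later aliases never see the result."""
    for regex, replacement, plain in aliases:
        match = regex.fullmatch(name)
        if match is None:
            continue
        if plain:
            return replacement
        def expand(token):
            text = token.group(0)
            return "$" if text == "$$" else match.group(int(text[1:]))
        return re.sub(r"\$\$|\$\d+", expand, replacement)
    return name

aliases = [compile_alias("/*/football/*", "/$1/soccer/$2")]
result = apply_aliases("/sport/football/rules.html", aliases)
```

Because each name goes through apply_aliases only once, the /football.html and /soccer.html example above behaves as described: the output of one alias is never fed into another.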

The behaviour of FILEALIAS and REFALIAS can be slightly unintuitive if the file has search arguments.

A warning to Unix users: if you put an ALIAS command on the command line with +C, the shell may try and expand $1 etc., which is not what you want. To stop the shell doing this, put the command in single quotes instead of double quotes.


There is another set of alias commands, called output aliases. They don't alias items, but individual lines from particular reports (and they never combine lines, even if two lines end up with the same name). For example, the command
TYPEALIAS .txt ".txt (Plain text files)"
would provide an explanation of that line in the File Type Report.

There can be some confusion between some normal alias and output alias commands. For example, what is the difference between FILEALIAS and REQALIAS? In fact, there are several differences because of the different things the aliases are doing. FILEALIAS applies to the files themselves, but REQALIAS only applies to the lines in the Request Report. This means that FILEALIAS also affects the other reports which use the filenames, such as the Directory Report, whereas REQALIAS only affects the Request Report.

Another difference is that REQALIAS applies separately to each line of the Request Report. This means that if two separate files translate to the same thing in a FILEALIAS command, they will become one file for all the reports. But if you were to use the same REQALIAS command, they would still be two files, and would still be listed on separate lines in the Request Report, but with the same name.

So in summary, when should you use each command? FILEALIAS should be used if a single file has two different names; i.e., if your web server returns the same file for two different URLs. REQALIAS, on the other hand, would typically be used to annotate or clarify the Request Report. Sometimes it's useful to use both; first combine some files with FILEALIAS, and then annotate them in the Request Report with REQALIAS.

The full list of output aliases is REQALIAS, REDIRALIAS, FAILALIAS, TYPEALIAS, DIRALIAS, HOSTREPALIAS, REDIRHOSTALIAS, FAILHOSTALIAS, DOMALIAS, ORGALIAS, REFREPALIAS, REFSITEALIAS, REDIRREFALIAS, FAILREFALIAS, BROWREPALIAS, BROWSUMALIAS, OSALIAS, VHOSTREPALIAS, REDIRVHOSTREPALIAS, FAILVHOSTREPALIAS, USERREPALIAS, REDIRUSERALIAS and FAILUSERALIAS.

There is one known bug with the output aliases. The report is sorted before the alias is applied. This means that if the SORTBY for the report is set to ALPHABETICAL, then the report will not be sorted correctly.


You can also use regular expressions in the ALIAS commands. Sorry, I'm not going to teach you how to use regular expressions here if you don't already know: if you're on Unix try typing man perlre or man regex or man grep. There are lots of implementations of regular expressions. The ones which analog uses are Perl-syntax regular expressions. In general, these are a superset of the extended regular expressions used by Unix egrep or GNU grep -E.

You include regular expressions in an ALIAS command by prefixing the left-hand side of the alias with "REGEXP:". Or you can specify a case-insensitive match, like Perl m//i or Unix egrep -i, by using "REGEXPI:". (It's automatically case-insensitive for many items, such as hostnames, or filenames if you have specified CASE INSENSITIVE.)

On the right-hand side of the alias you can use $1, $2 etc. to represent the first, second etc. bracketed expression on the left-hand side, counting in order of the left brackets. (Again, you can't put $1, $2 etc. on the command line unless you put them in single quotes.)

Regular expressions match if they match just part of the string. If you want them to have to match the whole of the string, you have to anchor them to the ends of the string with ^ and $.

For example,

REQALIAS REGEXP:^(/~(.+?)/.*) "[$2] $1"
would translate
/~sret1/backgammon/rules.html
to
[sret1] /~sret1/backgammon/rules.html
in the Request Report. Or
HOSTALIAS REGEXP:^([^.]*)$ $1.mycompany.com
would add .mycompany.com to all hostnames not containing a dot. (See the FAQ for a discussion about whether this is a good idea.)

Regular expressions are greedy: if there are two possible ways of matching, the part of the expression on the left matches as much as possible.
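
If you want to try out a regular expression before putting it in your configuration, something like the following Python sketch may help. Python's re module is close to Perl syntax for simple expressions like these, but it is not the library analog uses, so treat it as an approximation. The function regex_alias is made up for the example, and it assumes the whole name is replaced by the right-hand side when the expression matches.

```python
import re

def regex_alias(name, pattern, replacement, ignore_case=False):
    """Return the aliased name, or name unchanged if the pattern doesn't match."""
    flags = re.IGNORECASE if ignore_case else 0      # REGEXPI: versus REGEXP:
    match = re.search(pattern, name, flags)          # a partial match is enough
    if match is None:
        return name
    # Rewrite $1, $2 ... into the \g<1>, \g<2> ... form that expand() understands.
    template = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    return match.expand(template)

annotated = regex_alias("/~sret1/backgammon/rules.html",
                        r"^(/~(.+?)/.*)", "[$2] $1")
```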


Inclusions and exclusions

After aliasing each item, analog decides whether that item is wanted or not. The whole line is only counted if all the items are wanted. Whether an item is wanted or not is determined by INCLUDE and EXCLUDE commands specified by the user. These commands can be used to exclude requests from your local users, for example, or to analyse only files in a subdirectory. For example
HOSTEXCLUDE mycomputer.myisp.com
would exclude all requests by that computer from the statistics. (To exclude lines just from one specific report, see below.)

The rule for determining whether an item is included or excluded is as follows. All the INCLUDE and EXCLUDE commands for that item are considered one by one in order, and the item is included or excluded according to the last command it matched. Items which don't match any of the INCLUDE or EXCLUDE commands are included if the first command was an exclusion, and excluded if the first command was an inclusion. For example, the configuration

FILEINCLUDE /~sret1/*
FILEEXCLUDE /~sret1/backgammon/*,/~sret1/analog/*
FILEINCLUDE /~sret1/backgammon/*.gif
would instruct the program to examine only my files, excluding my backgammon and analog files, but including gifs in my backgammon directory. On the other hand,
FILEEXCLUDE /~sret1/*/img/*
would analyse all files, except for images in my various directories. (If you get confused with all the inclusions and exclusions, remember that you can always use SETTINGS ON to see what the options you have specified represent.) Note that inclusions and exclusions can contain any number of wildcards, and can be lists separated by commas (but no spaces).
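
The inclusion rule is easier to check against a few test names than to reason about in the abstract, so here is a Python sketch of it. It follows the rule as stated above (the last matching command wins, and unmatched items take the opposite sense of the first command); the names wildcard_match and is_wanted and the (action, patterns) input are my own invention, not part of analog.

```python
import re

def wildcard_match(pattern, name):
    """True if name matches pattern, where * is any string and ? any one character."""
    regex = "".join(".*" if c == "*" else "." if c == "?" else re.escape(c)
                    for c in pattern)
    return re.fullmatch(regex, name) is not None

def is_wanted(name, commands):
    """commands: ("include" or "exclude", "pattern,pattern,...") in config order."""
    if not commands:
        return True                             # nothing specified: keep everything
    # Items matching no command are included only if the first command excludes.
    wanted = commands[0][0] == "exclude"
    for action, patterns in commands:
        if any(wildcard_match(p, name) for p in patterns.split(",")):
            wanted = action == "include"        # the last matching command wins
    return wanted

rules = [("include", "/~sret1/*"),
         ("exclude", "/~sret1/backgammon/*,/~sret1/analog/*"),
         ("include", "/~sret1/backgammon/*.gif")]
```

With these rules, is_wanted("/~sret1/backgammon/board.gif", rules) is true, while the other backgammon files and anything outside /~sret1/ are not wanted.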

The full list of these commands is HOSTINCLUDE and HOSTEXCLUDE; FILEINCLUDE and FILEEXCLUDE; BROWINCLUDE and BROWEXCLUDE; REFINCLUDE and REFEXCLUDE; USERINCLUDE and USEREXCLUDE; VHOSTINCLUDE and VHOSTEXCLUDE; and STATUSINCLUDE and STATUSEXCLUDE.


Some notes on these commands.

Because the inclusions and exclusions take place after the aliasing, the name you must use is the aliased name. (In the absence of output alias commands, this is the name of the item in the output.)

Sometimes a line doesn't contain a particular sort of item, either because there is no field reserved for it on the line, or because the browser didn't send it for that request, or because it was present but corrupt. You can include or exclude these lines by making a special blank entry in the INCLUDE or EXCLUDE command. For example,

USERINCLUDE jim
USERINCLUDE ""
would include lines from user jim and lines without any user specified.

The behaviour of FILEINCLUDE and REFINCLUDE can be slightly unintuitive if the file has search arguments.

You can also use regular expressions for the inclusions and exclusions by prefixing the expression with "REGEXP:" or "REGEXPI:". I've already described this at length in the context of aliases, so you can look there for all the details. A regular expression must be on a line on its own, not within a comma-separated list.


With HOSTINCLUDE and HOSTEXCLUDE, you have to use numerical addresses if your web server records numerical addresses in the logfile, or names if it records names (or if you're resolving the numerical addresses with analog's DNS resolution). For numerical addresses, you can use some special formats, like this:
HOSTINCLUDE 131.111.20.18      # simple IP address
HOSTINCLUDE 131.111.20.*       # wildcard
HOSTINCLUDE 131.111.20         # the same meaning
HOSTINCLUDE 131.111.20-23      # a range of class C addresses
HOSTINCLUDE 131.111.20.18/23   # subnet mask
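
If you are unsure which addresses one of these forms covers, you can experiment with a sketch like this one in Python. It reflects my reading of the five forms above (IPv4 only), not analog's own matching code, and the function name host_matches is hypothetical. The subnet form is handled by the standard ipaddress module.

```python
import ipaddress

def host_matches(spec, addr):
    """True if the IPv4 address addr is covered by the HOSTINCLUDE-style spec."""
    if "/" in spec:
        # Subnet mask: strict=False lets the spec name a host inside the subnet.
        network = ipaddress.ip_network(spec, strict=False)
        return ipaddress.ip_address(addr) in network
    # Otherwise compare octet by octet; a short spec acts as a prefix.
    for part, octet in zip(spec.split("."), addr.split(".")):
        if part == "*":
            continue
        if "-" in part:
            low, high = part.split("-")
            if not int(low) <= int(octet) <= int(high):
                return False
        elif part != octet:
            return False
    return True
```

For example, host_matches("131.111.20.18/23", "131.111.21.200") is true, because a 23-bit mask covers both 131.111.20.* and 131.111.21.*.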

The STATUSINCLUDE and STATUSEXCLUDE commands are slightly different from the rest. They work on HTTP status codes. (These codes are defined in the HTTP spec, and viewable in the Status Code Report. But if you don't already know about them, you really don't want to use these commands anyway!) The arguments to the commands are a comma-separated list of ranges. One end of the range can be blank, meaning from the first, or to the last, status code. For example
STATUSINCLUDE 200-206,304,500-
would mean only look at lines with status codes 200-206, 304 or 500-599.
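
The range syntax can be spelt out in a few lines of Python. This is only a sketch of the rule described above, not how analog implements it; the function names are hypothetical, and the lower bound of 100 for an open-ended range is my assumption (the text only confirms that an open upper end means 599).

```python
def parse_status_ranges(spec: str, first: int = 100, last: int = 599):
    """Turn '200-206,304,500-' into a list of inclusive (low, high) pairs."""
    ranges = []
    for item in spec.split(","):
        item = item.strip()
        if "-" in item:
            low, high = item.split("-", 1)
            # A blank end means "from the first" or "to the last" status code.
            ranges.append((int(low) if low else first,
                           int(high) if high else last))
        else:
            ranges.append((int(item), int(item)))
    return ranges

def status_included(code: int, spec: str) -> bool:
    """True if the status code falls in any of the listed ranges."""
    return any(low <= code <= high for low, high in parse_status_ranges(spec))
```

With the specification 200-206,304,500- this accepts codes such as 200, 304 and 503, and rejects 404.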

Some people want to exclude status code 304 (Not Modified) to stop those requests appearing in the Request Report. But there is a better solution. By default, analog counts code 304 as a successful request, because it assumes that the cached version of the document is then presented to the user. But you can count it as a redirected request with the command

304ISSUCCESS OFF
For most people this is the wrong option, because code 304 is really the same as code 200 to the user. So again, if you don't understand this, stick with the default.

There is also one other pair of commands which belongs in this category, namely the FROM and TO commands. These specify a time period to restrict the analysis to. The simplest usage of these commands is FROM yyMMdd or FROM yyMMdd:hhmm, where yy represents the last two digits of the year (analog assumes that the year is between 1970 and 2069), MM represents the month, dd the day of the month, hh the hour, and mm the minute. So, for example, to analyse only requests from 1st July 1999 to 1pm on 15th June 2000 I would use the configuration
FROM 990701
TO   000615:1300
Alternatively, each of the components can be preceded by + or - to represent time relative to the time at which the program was invoked. In this case, the date can have more than 2 digits. This allows constructions like
FROM -01-00+01   # from tomorrow last year
TO -00-0131  # to the end of last month (OK even if last month
             # didn't have 31 days)
FROM -00-00-112
TO   -00-00-01  # statistics for the last 16 weeks
FROM -00-00-00:-06+01  # statistics for the last 6 hours
There are command line abbreviations +F and +T for the FROM and TO commands; for example, +T-00-00-01:1800 looks at statistics until 6pm yesterday. -F and -T turn off the from and to, as do FROM OFF and TO OFF.
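
To make the two-digit year rule concrete, here is a small Python sketch of the absolute form (yyMMdd or yyMMdd:hhmm) only. It is an illustration under stated assumptions, not analog's parser: the name parse_absolute is hypothetical, the relative + and - forms are not handled, and a missing time is simply taken as 00:00, which may not be what analog does for a TO command.

```python
from datetime import datetime

def parse_absolute(spec: str) -> datetime:
    """Parse 'yyMMdd' or 'yyMMdd:hhmm' using the 1970-2069 year window."""
    date_part, _, time_part = spec.partition(":")
    if len(date_part) != 6 or (time_part and len(time_part) != 4):
        raise ValueError(f"expected yyMMdd or yyMMdd:hhmm, got {spec!r}")
    yy = int(date_part[0:2])
    # Years 70-99 belong to the 1900s, years 00-69 to the 2000s.
    year = 1900 + yy if yy >= 70 else 2000 + yy
    month, day = int(date_part[2:4]), int(date_part[4:6])
    # Assumption: no time given means the very start of the day.
    hour = int(time_part[0:2]) if time_part else 0
    minute = int(time_part[2:4]) if time_part else 0
    return datetime(year, month, day, hour, minute)
```

So 990701 is read as 1st July 1999, and 000615:1300 as 1pm on 15th June 2000.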
There are also INCLUDE and EXCLUDE commands for most of the reports. Unlike the INCLUDE and EXCLUDE commands discussed above, these don't exclude logfile lines but individual lines from particular reports.

So, for example, the command

REFREPEXCLUDE http://your.site.com/*
would exclude your internal referrers from the Referrer Report. However, it would not exclude them from the Failed Referrer Report, the Referring Site Report, etc. (you need to use FAILREFEXCLUDE, REFSITEEXCLUDE etc. for that); nor would it prevent other analysis of logfile lines with those referrers, as REFEXCLUDE would.

The full list of these commands is REQINCLUDE and REQEXCLUDE; REDIRINCLUDE and REDIREXCLUDE; FAILINCLUDE and FAILEXCLUDE; TYPEINCLUDE and TYPEEXCLUDE; DIRINCLUDE and DIREXCLUDE; HOSTREPINCLUDE and HOSTREPEXCLUDE; REDIRHOSTINCLUDE and REDIRHOSTEXCLUDE; FAILHOSTINCLUDE and FAILHOSTEXCLUDE; DOMINCLUDE and DOMEXCLUDE; ORGINCLUDE and ORGEXCLUDE; REFREPINCLUDE and REFREPEXCLUDE; REFSITEINCLUDE and REFSITEEXCLUDE; SEARCHQUERYINCLUDE and SEARCHQUERYEXCLUDE; SEARCHWORDINCLUDE and SEARCHWORDEXCLUDE; INTSEARCHQUERYINCLUDE and INTSEARCHQUERYEXCLUDE; INTSEARCHWORDINCLUDE and INTSEARCHWORDEXCLUDE; REDIRREFINCLUDE and REDIRREFEXCLUDE; FAILREFINCLUDE and FAILREFEXCLUDE; BROWSUMINCLUDE and BROWSUMEXCLUDE; BROWREPINCLUDE and BROWREPEXCLUDE; OSINCLUDE and OSEXCLUDE; VHOSTREPINCLUDE and VHOSTREPEXCLUDE; REDIRVHOSTREPINCLUDE and REDIRVHOSTREPEXCLUDE; FAILVHOSTREPINCLUDE and FAILVHOSTREPEXCLUDE; USERREPINCLUDE and USERREPEXCLUDE; REDIRUSERREPINCLUDE and REDIRUSERREPEXCLUDE; and FAILUSERINCLUDE and FAILUSEREXCLUDE.

The inclusion or exclusion applies to the unaliased name, if you are doing any output aliases. (This contrasts with the behaviour of normal INCLUDE and EXCLUDE commands, which apply to the aliased name.)

All directory names end in slashes, so DIRINCLUDE and DIREXCLUDE, and REFSITEINCLUDE and REFSITEEXCLUDE, implicitly add a trailing slash even if you don't give one. This sometimes catches people out in the following situation.

REFSITEEXCLUDE http://my.host.com/*     # probably not what you want
means not to list subdirectories of the referring site http://my.host.com/, but to keep the site itself in the list. To exclude the site completely, just use
REFSITEEXCLUDE http://my.host.com/

You can also use the symbolic word pages in suitable INCLUDE and EXCLUDE commands; one very common command is

REQINCLUDE pages
to include only pages in the Request Report.
There are some miscellaneous INCLUDE and EXCLUDE commands which I'll describe now. First, analog determines which files should count as pages (and thus which requests count as page requests) using an INCLUDE/EXCLUDE pair called PAGEINCLUDE and PAGEEXCLUDE. By default, (case insensitive) *.html and *.htm, and directories (*/) count as pages. But you can change the list by commands like
PAGEINCLUDE *.asp
PAGEEXCLUDE /sret1.html
I.e., *.asp are pages, but /sret1.html isn't. (If the file has search arguments, the PAGEINCLUDE and PAGEEXCLUDE are reckoned just on the part of the filename before the question mark.)
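
As a rough illustration of the page test, here is a Python sketch. It is not analog's code: the rule list and the name is_page are hypothetical, I am assuming that a later matching rule overrides an earlier one and that all patterns are compared without regard to case, and Python's fnmatch wildcards are not identical to analog's (in fnmatch, ? and [ are also special).

```python
from fnmatch import fnmatchcase

# Defaults from the text, followed by the two example commands.
RULES = [
    ("include", "*.html"),
    ("include", "*.htm"),
    ("include", "*/"),
    ("include", "*.asp"),        # PAGEINCLUDE *.asp
    ("exclude", "/sret1.html"),  # PAGEEXCLUDE /sret1.html
]

def is_page(filename: str, rules=RULES) -> bool:
    """Decide whether a requested file counts as a page."""
    # Only the part before the question mark is considered.
    name = filename.split("?", 1)[0].lower()
    page = False
    for action, pattern in rules:
        if fnmatchcase(name, pattern.lower()):
            page = (action == "include")  # assumption: last match wins
    return page
```

Under these rules /form.asp?x=1 and /docs/ are pages, while /sret1.html and /logo.png are not.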
In some of the reports, analog can link to the files which it's listing. You can specify exactly which files are linked to with the LINKINCLUDE family of commands. For example,
REQLINKINCLUDE pages,*.pdf
would link to pages and PDF files in the Request Report. The full set of these commands is REQLINKINCLUDE and REQLINKEXCLUDE (Request Report), REDIRLINKINCLUDE and REDIRLINKEXCLUDE (Redirection Report), FAILLINKINCLUDE and FAILLINKEXCLUDE (Failure Report), REFLINKINCLUDE and REFLINKEXCLUDE (Referrer Report), REDIRREFLINKINCLUDE and REDIRREFLINKEXCLUDE (Redirected Referrer Report), and FAILREFLINKINCLUDE and FAILREFLINKEXCLUDE (Failed Referrer Report). Note that the target of the links is also affected by the BASEURL command.
Finally, there is a pair of commands called ROBOTINCLUDE and ROBOTEXCLUDE, which determine which browsers count as "robots" in the Operating System Report. For example,
ROBOTINCLUDE Googlebot/*

There is one final set of INCLUDE and EXCLUDE commands to include or exclude the search arguments at the end of URLs. But there are some slightly complicated issues surrounding those, so they deserve a new section.

Search arguments

Sometimes a URL contains arguments after a question mark. For example, the URL
/cgi-bin/script.pl?x=1&y=2
runs the /cgi-bin/script.pl program with arguments x=1 and y=2. (Sometimes the server records these arguments in a separate field in the logfile, but if so you can use the %q field in the LOGFORMAT command, and analog will translate the filename to the above format.)

You can tell analog either to read or to ignore the arguments using the commands ARGSINCLUDE and ARGSEXCLUDE which we'll discuss in a minute. But by default, all arguments are read, and as this is usually what you want, you don't usually need those commands.

You don't always see the arguments in the reports, even if they're being read, because analog doesn't show them if there aren't enough of them. In order to see them, you have to set the corresponding ARGSFLOOR parameter low enough.

Also note that within a report, the search arguments are listed immediately under the file to which they refer. This temporarily interrupts the normal order of the files. It may be clearer if you turn the N column on.


Assuming that the arguments are being read, analog treats the file /cgi-bin/script.pl?x=1&y=2 as a different file from /cgi-bin/script.pl (or from /cgi-bin/script.pl?y=2&x=1 for that matter). It doesn't look like that in the Request Report because you see a grand total for /cgi-bin/script.pl with all its different arguments. But it matters if you want to do inclusions and exclusions or aliases on the file.

The reason is that, for example, the command

FILEINCLUDE /cgi-bin/script.pl
doesn't match the file /cgi-bin/script.pl?x=1&y=2. To match that, you would have to use something like
FILEINCLUDE /cgi-bin/script.pl*
instead. Similarly
FILEALIAS /cgi-bin/script.pl /script.pl
will change /cgi-bin/script.pl itself, but not /cgi-bin/script.pl?x=1&y=2. You might want to use something like
FILEALIAS /cgi-bin/script.pl?* /script.pl?$1
as well. (However, PAGEINCLUDE and PAGEEXCLUDE always refer to the part of the filename before the question mark.)

Conversely, because in the Request Report files with arguments are only included if their parent file is included, you can't just

REQINCLUDE /cgi-bin/script.pl?*x=1*
or you will end up with nothing listed. You have to
REQINCLUDE /cgi-bin/script.pl
as well.
The alternative is to tell analog not to read the search arguments. There are commands called ARGSINCLUDE and ARGSEXCLUDE, and REFARGSINCLUDE and REFARGSEXCLUDE, to do this. They work the same as the other INCLUDE and EXCLUDE commands which we discussed in the previous section. So, for example, if the command
ARGSEXCLUDE /cgi-bin/script.pl
were given, analog would ignore the arguments to that file, and so read /cgi-bin/script.pl?x=1&y=2 as just /cgi-bin/script.pl. On the other hand, if
ARGSINCLUDE /cgi-bin/script.pl
were specified, analog would read the arguments, and so treat /cgi-bin/script.pl?x=1&y=2 as a different file from /cgi-bin/script.pl. REFARGSINCLUDE and REFARGSEXCLUDE are the same for referrers.
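
The effect of ARGSEXCLUDE on what counts as "the same file" can be sketched in Python as follows. This is a simplified model, not analog's implementation: the name counted_name is hypothetical, and I am assuming the pattern is compared with the part of the name before the question mark.

```python
from fnmatch import fnmatchcase

def counted_name(request: str, args_exclude=()) -> str:
    """Return the name under which a request would be counted."""
    filename, question_mark, _arguments = request.partition("?")
    if question_mark and any(fnmatchcase(filename, p) for p in args_exclude):
        return filename   # arguments ignored for this file
    return request        # arguments kept, so this is a distinct file
```

With no exclusions, /cgi-bin/script.pl?x=1&y=2 and /cgi-bin/script.pl?y=2&x=1 stay as two different names; with /cgi-bin/script.pl excluded, both collapse to /cgi-bin/script.pl.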

Technical note: the check for whether the arguments should be included happens before the filename has been subject to either built-in or user-specified aliases. So you have to use the unaliased name, exactly as it occurs in the logfile. For example, ARGSINCLUDE /~sret1/script.pl won't match /%7Esret1/script.pl even though they are really the same file. It also means that you can't use "pages" in the ARGSINCLUDE or ARGSEXCLUDE command, because we don't know whether a file is a page until after it's been aliased.


There are related commands called SEARCHENGINE and INTSEARCHENGINE. If you have referrers with search arguments, usually from search engines, you can tell analog which field corresponds to the search term. It uses this information to compile the Search Query Report and the Search Word Report. For example, consider the referrer
http://www.altavista.com/cgi-bin/query?pg=q&kl=XX&q=carrot+cake
The search term is in the field q= so the appropriate SEARCHENGINE command is
SEARCHENGINE http://www.altavista.com/cgi-bin/query q
(or even better
SEARCHENGINE http://*altavista.*/* q
to allow for all their mirror sites in different countries.)

The command INTSEARCHENGINE is the same for search engines, or other scripts which take arguments, within your site. For example, you might have requests for files like

/cgi-bin/search?trm=chocolate+cake
in which case you would specify
INTSEARCHENGINE /cgi-bin/search trm
and (assuming you haven't done an ARGSEXCLUDE for that file) "chocolate cake" would then appear in your Internal Search Query Report.

Sometimes a search engine has two or more possible fields for the search term. In that case you can list all of them separated by commas, like this:

SEARCHENGINE http://*webcrawler.*/* search,searchText
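
To show what these commands ask analog to do, here is a Python sketch that pulls the search term out of a referrer. It only models the idea: the ENGINES list and the name search_query are hypothetical, matching the pattern against the URL without its arguments is my assumption, and Python's fnmatch stands in for analog's own wildcard matching.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit, parse_qs

# (referrer pattern, candidate field names), mirroring the examples above.
ENGINES = [
    ("http://*altavista.*/*", ["q"]),
    ("http://*webcrawler.*/*", ["search", "searchText"]),
]

def search_query(referrer: str, engines=ENGINES):
    """Return the decoded search term, or None if no rule applies."""
    parts = urlsplit(referrer)
    base = f"{parts.scheme}://{parts.netloc}{parts.path}"
    fields = parse_qs(parts.query)  # decodes + and %nm escapes
    for pattern, names in engines:
        if fnmatchcase(base, pattern):
            for name in names:      # try each possible field in turn
                if name in fields:
                    return fields[name][0]
    return None
```

For the AltaVista referrer quoted earlier, this returns the string "carrot cake".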

The rest of this section is a bit technical, and you usually don't need to worry about it. On a first reading, you probably want to skip it.

I said previously that %7E in a URL is automatically converted to ~, etc. In fact this is only done to the ASCII-printable characters %20-%7E, because these are the only characters that are the same in every character set. (In fact, even that isn't true. Experts might want to know that ?, &, ; and = aren't converted either, to distinguish them from query-string delimiters: an encoded ?, &, ; or = is one that is not intended to be a delimiter. Also % isn't converted, to avoid confusing %25nm with %nm.)

But in the Search Query Report and Search Word Report it is useful to be able to convert non-ASCII characters too, so that you can see the actual words people typed, rather than get the %nm codes in place of all accented letters. So in these reports analog also converts characters %A0-%FF (if you are using an ISO-8859-* character set) or %80-%FF (for most other character sets).

However, there are reasons why you might not want this feature, and you can turn it off with the command

SEARCHCHARCONVERT OFF
These reasons include:
  1. The character set in which the query was submitted to the search engine may not be the same as that in which the page reached was written, or that in which the analog output page is being written. So converting to the character set of the analog output page may give garbage anyway. This is particularly a problem with languages, such as Russian, which have two or more character sets in common use. It is also a problem for sites which host resources in many languages.
  2. Not all of the character positions correspond to printable characters in every character set. Analog knows that %80-%9F are non-printable in the ISO-8859-* character sets, but apart from that it converts everything in %80-%FF. So you may end up with non-printable characters in your output.
SEARCHCHARCONVERT is always turned off if the output is in ASCII; and it defaults to off if the output is in a multibyte character set because it doesn't work well in that case.

Configuring the output

So far we have mainly discussed commands which control how analog reads the logfiles. We now get on to commands for configuring the output.
First, you can change the style of the output using the OUTPUT command. There are seven possible output styles, called XHTML, HTML, PLAIN, ASCII, XML, LATEX and COMPUTER.

XHTML is the default. It produces web pages in XHTML 1.0. HTML produces web pages in HTML 2.0.

PLAIN produces plain text files, and ASCII is the same as PLAIN except that it uses all ASCII characters (no accents etc.) if possible. (This is because some applications don't understand accented characters).

LATEX produces LaTeX code which can be turned into PDF if you have the pdflatex command installed. (If you want to use the ordinary latex command, specify PDFLATEX OFF.) It's only available with certain European languages (US-ASCII, ISO-8859-1 and ISO-8859-2 character sets). Yes, I know it gives overfull hboxes sometimes.

COMPUTER is a special format suitable for reading by a computer (useful for reading into a spreadsheet, or post-processing with a graphics package, for example). There is a separate section about this format later.

XML produces an XML output which is an alternative format for post-processing. The DTD for the XML output is distributed with the program. You can find more information about the XML style, and an example of a post-processing program, at http://timian.jessen.ch/.

As well as a command like

OUTPUT PLAIN
you can also select PLAIN style with the command line argument +a, and XHTML with the command line argument -a.

You can also specify OUTPUT NONE for no output, if you are producing a cache file.


Next, you can change the language of the output. There are two ways to do this. The usual way is to use the LANGUAGE command. For example, the command
LANGUAGE FRENCH
will give you the output in French. The available languages at the moment are ARMENIAN, BASQUE, BULGARIAN (Windows-1251), BULGARIAN-MIK (MIK-16), CATALAN, SIMP-CHINESE (GB2312), TRAD-CHINESE (Big5), CZECH (ISO Latin 2), CZECH-1250 (Windows-1250), DANISH, DUTCH, ENGLISH, US-ENGLISH, FINNISH, FRENCH, GERMAN, HUNGARIAN, INDONESIAN, ITALIAN, JAPANESE-EUC (EUC-JP), JAPANESE-JIS (ISO-2022-JP), JAPANESE-SJIS (SJIS), JAPANESE-UTF (UTF-8), KOREAN, LATVIAN, NORWEGIAN (Bokmål), NYNORSK, POLISH, PORTUGUESE, BR-PORTUGUESE, RUSSIAN (KOI8-R), RUSSIAN-1251 (Windows-1251), SERBIAN, SLOVAK (ISO Latin 2), SLOVAK-1250 (Windows-1250), SLOVENE (ISO Latin 2), SLOVENE-1250 (Windows-1250), SPANISH, SWEDISH, SWEDISH-ALT (alternative translation avoiding Anglicisms), TURKISH and UKRAINIAN.

The following languages were available for previous versions of analog, but have not yet been translated for version 5: BOSNIAN, CROATIAN, GREEK, ICELANDIC, LITHUANIAN and ROMANIAN. As and when they are translated, they will be added to the analog home page. If you want to translate any of them (or any other language), I would be delighted! See below.

The other way to specify a language is to use the LANGFILE command. This is useful if you want to download a new language from the analog home page, or if you want to translate one yourself, or even if you want to change some words or phrases or the way the dates and times are formatted in the output. The LANGFILE command tells analog in which file to find the various words and phrases for a new language. For example, the command

LANGFILE guarani.lng   # or
LANGFILE /usr/etc/httpd/analog/lang/guarani.lng
would read from that file. If the name of the file doesn't include a directory, it will be looked for wherever analog normally expects to find its language files.

Some languages also have domains files or report descriptions files available. These are normally selected automatically by the LANGUAGE command. But you can tell analog to use different ones with the DOMAINSFILE and DESCFILE commands. Also, some languages have translations of the form interface or configuration file.

If you want to translate another language, I would be delighted! Do contact me first to make sure that no-one else is already translating the same language. The file README.txt in the language directory, and the English language file, contain some brief instructions for translating new languages.

Equally, if you find any mistakes in the output in different languages, please do let me know because I'm not able to check them all myself!


You can change which file the output goes to with a command like
OUTFILE stats.htm
or with a command line argument like +Ostats.htm. If you use the filename - or stdout, the output will go to standard output, which is normally the screen, but Unix users might like to redirect it to another file or even into a pipe. You can also use an absolute path name, like
OUTFILE /usr/bin/httpd/htdocs/stats.html  # Unix
OUTFILE "Hard Disk:Server Apps:WebSTAR:Analog:Report.html" # Mac
If the name of the OUTFILE doesn't include a directory, it will be put wherever analog expects to put its output files. (This location is built in when the program is compiled.) For example, on Windows it would be in the same folder as the analog executable. But if you use the +O command line argument, the file is put in the current directory.

You can include date codes in the OUTFILE in exactly the same way as for the LOGFILE. So for example,

OUTFILE stats%y%M%D.html
will produce filenames like stats990501.html. As with the LOGFILE, the date used is the TO date if one was specified, and otherwise the time of the start of the program.
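
The expansion in this example can be mimicked in Python like this. It is a sketch only: the name expand_outfile is hypothetical, it handles just the three codes used above, and their meanings (two-digit year, month, day) are inferred from the sample filename stats990501.html.

```python
from datetime import date

def expand_outfile(template: str, when: date) -> str:
    """Expand the %y, %M and %D codes used in the example above."""
    return (template
            .replace("%y", f"{when.year % 100:02d}")  # two-digit year
            .replace("%M", f"{when.month:02d}")       # month, zero padded
            .replace("%D", f"{when.day:02d}"))        # day, zero padded
```

Calling expand_outfile("stats%y%M%D.html", date(1999, 5, 1)) gives stats990501.html.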
Next, you need to know how to turn the different reports on and off. There are 44 different reports which analog can produce, if your web server has been configured to record the necessary data in the logfiles. Each one has a short name, and a code letter or number, as follows. (Note that the code letters are case sensitive: Z is quite different from z, for example).
x  GENERAL         General Summary
1  YEARLY          Yearly Report
Q  QUARTERLY       Quarterly Report
m  MONTHLY         Monthly Report
W  WEEKLY          Weekly Report
D  DAILYREP        Daily Report
d  DAILYSUM        Daily Summary
H  HOURLYREP       Hourly Report
h  HOURLYSUM       Hourly Summary
w  WEEKHOUR        Hour of the Week Summary
4  QUARTERREP      Quarter-Hour Report
6  QUARTERSUM      Quarter-Hour Summary
5  FIVEREP         Five-Minute Report
7  FIVESUM         Five-Minute Summary
S  HOST            Host Report
l  REDIRHOST       Host Redirection Report
L  FAILHOST        Host Failure Report
Z  ORGANISATION    Organisation Report
o  DOMAIN          Domain Report
r  REQUEST         Request Report
i  DIRECTORY       Directory Report
t  FILETYPE        File Type Report
z  SIZE            File Size Report
P  PROCTIME        Processing Time Report
E  REDIR           Redirection Report
I  FAILURE         Failure Report
f  REFERRER        Referrer Report
s  REFSITE         Referring Site Report
N  SEARCHQUERY     Search Query Report
n  SEARCHWORD      Search Word Report
Y  INTSEARCHQUERY  Internal Search Query Report
y  INTSEARCHWORD   Internal Search Word Report
k  REDIRREF        Redirected Referrer Report
K  FAILREF         Failed Referrer Report
B  BROWSERREP      Browser Report
b  BROWSERSUM      Browser Summary
p  OSREP           Operating System Report
v  VHOST           Virtual Host Report
R  REDIRVHOST      Virtual Host Redirection Report
M  FAILVHOST       Virtual Host Failure Report
u  USER            User Report
j  REDIRUSER       User Redirection Report
J  FAILUSER        User Failure Report
c  STATUS          Status Code Report
For details on what the various reports mean, and a summary of the commands which control them, see the section on Analog's reports.

You can turn each report on or off with configuration commands like

FIVEREP OFF
REFSITE ON
or by using command line arguments like -5 and +s. You can also turn all reports except the General Summary on or off with the commands ALL ON and ALL OFF, or with the command line arguments +A and -A.
You can turn the descriptions of each report off with the command
DESCRIPTIONS OFF
Even if DESCRIPTIONS is ON, the descriptions will only appear if analog can find a report descriptions file in your language, or if you specify one using the DESCFILE command: for example,
DESCFILE descriptions.txt
If the name of the descriptions file doesn't include a directory, it will be looked for wherever analog normally expects to find its language files.

You can turn the "Go To" lines in the output off with the command

GOTOS OFF
GOTOS ON turns them on again, and GOTOS FEW puts the "Go To" lines just at the top and bottom. GOTOS OFF can be abbreviated with the -X command line argument, and GOTOS ON with +X.

You can turn off the "Program started at" line at the top of the output, and the "Running Time" line at the bottom, with the command

RUNTIME OFF
and turn them on again with RUNTIME ON.

The figures in parentheses in the General Summary are for the last seven days: either the seven days before the TO time, or if no TO time is given, the seven days before the time of the program start. The figures for the last seven days are normally included if some, but not all, of the requests fall in those seven days; but you can turn them off by means of the command

LASTSEVEN OFF
Of course LASTSEVEN ON turns them on again.

You can change the order of the reports by means of the REPORTORDER command. You should list the code letters for all possible reports in the order you want them. Non-alphanumeric characters are ignored and so can be used as separators. For example,

REPORTORDER x-1QmdDhHw4567W-cPz-ritEIYy-SlLZo-sNnfKk-ujJ-vMR-bBp
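
Because every report code has to appear, it is easy to drop one by accident. The following Python sketch checks a REPORTORDER string against the 44 codes listed in the table above. The function names are hypothetical, and what analog itself does with an incomplete list is not covered here.

```python
# The 44 report codes, in the order of the table above.
ALL_CODES = set("x1QmWDdHhw4657SlLZoritzPEIfsNnYykKBbpvRMujJc")

def report_order(spec: str):
    """Drop separator characters and return the codes in the order given."""
    return [ch for ch in spec if ch.isalnum()]

def missing_codes(spec: str):
    """Return the report codes that the string does not mention."""
    return sorted(ALL_CODES - set(report_order(spec)))
```

The example string above passes this check: it contains all 44 codes, so missing_codes returns an empty list.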

You can turn the lines in General Summary on and off individually using the GENSUMLINES command. The default is
GENSUMLINES ALL
meaning all available lines. (You always only get the ones relevant to your logfile though.) You can turn lines off using a command like
GENSUMLINES -KL
(to turn off lines K & L) and turn them on again with a command like
GENSUMLINES +K
You can specify the exact set of lines to include with a command like
GENSUMLINES CDFGHM
You now just need to know which lines have which code letters, which is given in the following table.
 
   Successful requests (always listed)
B  Average successful requests per day
C  Logfile lines without status code
D  Successful requests for pages
E  Average successful requests for pages per day
F  Failed requests
G  Redirected requests
H  Requests with informational status code
I  Distinct files requested
J  Distinct hosts served
K  Corrupt logfile lines
L  Unwanted logfile entries
M  Data transferred
N  Average data transferred per day
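
The three forms of the GENSUMLINES argument (ALL, a list preceded by + or -, and a plain list) behave like simple set operations. Here is a Python sketch of that idea; the function name is hypothetical and this is a model of the description above, not analog's code.

```python
ALL_LINES = set("BCDEFGHIJKLMN")

def gensumlines(current: set, arg: str) -> set:
    """Return the set of line codes after applying one GENSUMLINES command."""
    if arg == "ALL":
        return set(ALL_LINES)
    if arg.startswith("-"):
        return current - set(arg[1:])   # turn the listed lines off
    if arg.startswith("+"):
        return current | set(arg[1:])   # turn the listed lines on
    return set(arg)                     # exactly the listed lines
```

Starting from ALL, applying -KL and then +K leaves every line except L switched on.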

There is a command called IMAGEDIR which tells analog where the various images used to make the output page should live. It should be a URL, not the actual location on your disk, and it should include the final slash. For example, you could have
IMAGEDIR img/   # relative URL: within the same directory as the output
IMAGEDIR /img/  # off the root directory of your server
IMAGEDIR http://www.myother.server.com/img/  # on another server
Some people are confused about the IMAGEDIR. It's just put in the <img> tags in the output. You can see its effect if you look at the HTML source of the output page.

You can use gif images instead of png's for the bar charts by specifying

PNGIMAGES OFF
PNGIMAGES doesn't affect the pie charts, which are always png's: but see the JPEGCHARTS command for something similar.
There are four commands which affect the top line of the output. First, the LOGO and LOGOURL commands allow you to replace the analog logo with another image (for example, your organisation's logo). You can say
LOGO picture.gif  # for this file
LOGO /images/picture2.gif  # a different file
LOGO none         # for no logo
The logo is assumed to be inside the IMAGEDIR unless it starts with a slash, or contains ://

The LOGOURL command specifies a URL to link the logo to. If you change the LOGO, you probably want to change the LOGOURL as well. For example,

LOGOURL http://www.mycompany.com/
LOGOURL none   # for no link
The LOGOURL command only works with the XHTML output style, not HTML 2.0.

There are commands HOSTNAME and HOSTURL which affect the name and link at the end of the title line. For example, I might specify

HOSTNAME "Stephen Turner"
HOSTURL  http://homepage.ntlworld.com/adelie/stephen/
to generate the title "Web Server Statistics for Stephen Turner". Again, you can use none as the HOSTURL to specify no link. Analog will normally translate characters in the hostname to HTML if necessary. So to include literal HTML, such as accented characters, in the output you need to precede them by a backslash, like this:
HOSTNAME "M\&uuml;ller & S\&ouml;hne"

There are commands called HEADERFILE and FOOTERFILE. These let you specify files to be inserted near the top and bottom of your output. You can also specify
HEADERFILE none
to cancel a previously-specified header file. Again, if the name of the HEADERFILE or FOOTERFILE doesn't include a directory, analog will assume a directory, specified when the program was compiled.
There is a command called STYLESHEET to specify the URL of a style sheet for the output. This allows you to change the colours etc. (See http://www.w3.org/Style/css/ for how to write a style sheet.) For example,
STYLESHEET /housestyle.css
STYLESHEET none   # to cancel it
In the XHTML output style, if you specify a style sheet, it will replace the default one, so you might prefer to use the default one as a base -- you can find it in the directory examples/css, along with some other style sheets contributed by users.

There is a command CSSPREFIX to add a prefix to all the CSS class names used in the XHTML output style. This is useful to avoid clashes with other style sheets: the disadvantage is that it will make your output longer. For example,

CSSPREFIX anlg
CSSPREFIX none    # to cancel it
Of course, if you use your own style sheet, you will have to add the CSSPREFIX to all the class names in the style sheet.
There are three related commands called SEPCHAR, REPSEPCHAR and DECPOINT. These specify single characters to be used as the thousands separator in numbers, the thousands separator within the columns in the reports, and the decimal point. Normally, these will be set automatically for the language you choose, but you can change them if you want. For example, a French user might choose
SEPCHAR " "
REPSEPCHAR none
DECPOINT ,
to make "three thousand and a quarter" look like "3 000,25" in text and "3000,25" in the reports.
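
The effect of the separator commands on a number can be imitated in Python as below. This is a sketch with a hypothetical function name, and it treats the value none as an empty string; it is not how analog formats numbers internally.

```python
def format_number(value: float, decimals: int, sepchar: str, decpoint: str) -> str:
    """Format a number with the given thousands separator and decimal point."""
    # Start from Python's own grouping ("3,000.25"), then swap the characters.
    whole, _, fraction = f"{value:,.{decimals}f}".partition(".")
    whole = whole.replace(",", sepchar)
    return whole + (decpoint + fraction if fraction else "")
```

With a space as separator and a comma as decimal point, 3000.25 comes out as "3 000,25"; with an empty separator it comes out as "3000,25".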

There is a command called RAWBYTES. Specify RAWBYTES ON if you want the exact number of bytes to be listed, or RAWBYTES OFF if you want the number of kilobytes or megabytes as appropriate to be listed instead.

If RAWBYTES is OFF (which is the default), then you can use the BYTESDP command to specify how many decimal places you want the bytes rounded to. The default is 2, which will display numbers like "91.26 kilobytes".


There are commands called HTMLPAGEWIDTH, PLAINPAGEWIDTH and LATEXPAGEWIDTH which specify the width of the page. Which one is used depends on whether the output style is HTML/XHTML, PLAIN/ASCII, or LATEX. The output is not guaranteed to fit in this width, but analog will take notice of it when choosing the width of the time graphs, when sorting the Host Report alphabetically, when drawing horizontal rules, and when writing some bits of text.

There is a command called NOROBOTS which stops robots which obey the robots META tag from indexing your output page or following its links. Normally this is set to ON but you can specify NOROBOTS OFF if you don't mind robots finding your other pages this way. Note that you will stop far more robots if you also put your stats page in your robots.txt file; on the other hand, this file has to be kept up to date by the server administrator.

Sometimes your server is not in the same timezone as you, or at least records the times in its logfiles in a different timezone (for example GMT). So that you can get your statistics in your local time, there is a command called LOGTIMEOFFSET to change the time by a certain number of minutes. As with the LOGFORMAT command, this only affects logfiles which come later in the same configuration file.

You have to be careful using this command. Because of daylight savings time in operation in different parts of the world at different times, analog cannot attempt to convert between different timezones. So it's your responsibility to set the right offset for different times of year. For example, if you were in Chicago, but your server was recording time in GMT, you would need to specify two different time offsets, one of minus five hours for summer and one of minus six hours for winter. You would need to split your logfiles in the right places and then run commands like

LOGTIMEOFFSET -300
LOGFILE summer*.log
LOGTIMEOFFSET -360
LOGFILE winter*.log
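
The arithmetic behind these offsets is just the addition of a number of minutes to each logged time. A minimal Python sketch (the function name is hypothetical):

```python
from datetime import datetime, timedelta

def apply_offset(logged: datetime, offset_minutes: int) -> datetime:
    """Shift a logged time by a LOGTIMEOFFSET-style number of minutes."""
    return logged + timedelta(minutes=offset_minutes)
```

So with an offset of -300, a request logged at 18:00 GMT in summer is counted at 13:00 Chicago time. Note that an offset can move a request across midnight, and so into a different day, month or year in the reports: with -360, a request logged at 03:00 on 1st January is counted at 21:00 on 31st December.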

There is also a related command called TIMEOFFSET. This tells analog how much to offset the time of the computer on which it is running (rather than the computer running the server), to get your local time.


In the following sections we shall look at some commands for configuring the output of particular reports, under the following headings: Time reports, Other reports and Hierarchical reports.

Time reports

This section is about commands which control the appearance of the time reports. There are thirteen such reports, which show the pattern of usage over time. Eight of them (the ones with "Report" in their name) show the usage at specific times, whilst the other five (the "Summaries") show the total (not average) activity at particular times of day and week over the whole time period of the report.

By the way, in the following lists, don't get confused between the commands for the Quarterly Report (which begin with QUARTERLY) and those for the Quarter-Hour Report and Quarter-Hour Summary (which begin with QUARTERREP and QUARTERSUM respectively).


Each time report can contain columns listing the requests, requests for pages, and bytes transferred at that time, using the following code letters.
R  Number of requests
r  Percentage of the requests
P  Number of page requests
p  Percentage of the page requests
B  Number of bytes transferred
b  Percentage of the bytes
Which columns appear in which reports is controlled by various COLS commands. For example, the command
HOURSUMCOLS Pb
tells analog to include the number of page requests and percentage of the bytes, in that order, as the columns for the Hourly Summary. The full list of these COLS commands is YEARCOLS, QUARTERLYCOLS, MONTHCOLS, WEEKCOLS, DAYREPCOLS, DAYSUMCOLS, HOURREPCOLS, HOURSUMCOLS, WEEKHOURCOLS, QUARTERREPCOLS, QUARTERSUMCOLS, FIVEREPCOLS and FIVESUMCOLS. There is also a TIMECOLS command, which specifies that all the time reports are to have the specified columns.
Similarly, analog can plot the bar charts in the time reports according to the number of requests, number of page requests, or number of bytes. This is controlled by the GRAPH family of commands. So, for example,
DAYREPGRAPH P
tells analog to plot the bar charts in the Daily Report by the number of page requests. This also controls how analog decides which is the busiest time period in the bottom line of the report. Using a lower case letter tells analog to plot the bar charts with ASCII characters instead of the normal red bars. (This produces shorter output, and it is how they appear anyway in PLAIN and ASCII output styles, or when viewed with a non-graphical browser.) So, for example,
DAYREPGRAPH b
would plot the Daily Report by bytes, without using the graphics. The full list of GRAPH commands is YEARGRAPH, QUARTERLYGRAPH, MONTHGRAPH, WEEKGRAPH, DAYREPGRAPH, DAYSUMGRAPH, HOURREPGRAPH, HOURSUMGRAPH, WEEKHOURGRAPH, QUARTERREPGRAPH, QUARTERSUMGRAPH, FIVEREPGRAPH and FIVESUMGRAPH. There's also an ALLGRAPH command to set all of them simultaneously.
There are various possible graphics available for the graphs, controlled by the BARSTYLE command, as follows. (They will all look the same if you have a non-graphical browser.)

BARSTYLE a  +++++++++++
BARSTYLE b  +++++++++++
BARSTYLE c  +++++++++++
BARSTYLE d  +++++++++++
BARSTYLE e  +++++++++++
BARSTYLE f  +++++++++++
BARSTYLE g  +++++++++++
BARSTYLE h  +++++++++++
BARSTYLE i  +++++++++++
BARSTYLE j  +++++++++++
The default style is b.
You can plot the graphs either forwards in time (starting from the earliest date) or backwards (starting from the latest date). Use commands like
MONTHBACK ON  # Monthly Report backwards
WEEKBACK OFF  # Weekly Report forwards
The full list of BACK commands is YEARBACK, QUARTERLYBACK, MONTHBACK, WEEKBACK, DAYREPBACK, HOURREPBACK, QUARTERREPBACK and FIVEREPBACK. It tends to be confusing to mix directions (and analog will warn you if you attempt it), so usually you want to use the ALLBACK command, which sets all of them at once.
For the more detailed time reports, you usually only want to list the last few time periods. (Every five minutes for the last three years?? I think not.) So analog provides some ROWS commands to let you specify how many rows you want in the time reports. For example
QUARTERREPROWS 96  # only the last day's worth
MONTHROWS 0        # 0 means no restriction: show all time
The full list of ROWS commands is YEARROWS, QUARTERLYROWS, MONTHROWS, WEEKROWS, DAYREPROWS, HOURREPROWS, QUARTERREPROWS and FIVEREPROWS. Even if a ROWS command is given, the line at the bottom of the report will still show the busiest time period ever, not just the busiest one in that many rows.
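Choosing a row count is just a matter of multiplying out how many periods fit into the span you want to see. As a rough sketch (the spans below are arbitrary examples, not recommendations):
HOURREPROWS 72      # 3 days x 24 hours
QUARTERREPROWS 192  # 2 days x 96 quarter-hours
FIVEREPROWS 288     # 1 day x 24 hours x 12 five-minute periods
DAYREPROWS 42       # 6 weeks x 7 days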
The character which is used for plotting the graphs in PLAIN and ASCII styles or on a non-graphical browser is specified by means of the MARKCHAR command. For example,
MARKCHAR =
tells analog to use the equals sign.

There is a parameter called MINGRAPHWIDTH which sets the minimum nominal size of the graphs. For example, if you set

MINGRAPHWIDTH 10
then the graph will be allowed to be up to 10 characters wide, even if that would exceed the PAGEWIDTH.

There is one more command which affects the time reports. You can specify which day should be counted as the first day of the week. This affects the layout of the Daily Report, Daily Summary, Weekly Report and Hour of the Week Summary. For example, our local student newspaper publishes a new edition on the web every Friday, so they like to specify WEEKBEGINSON FRIDAY for their reports.
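
Pulling the commands in this section together, a configuration fragment for the time reports might look something like the following. This is only a sketch: the values are chosen for illustration, and each command is the one described above.
ALLBACK ON           # all time reports backwards, starting from the latest date
ALLGRAPH P           # plot every time report by number of page requests
TIMECOLS RPb         # columns: requests, page requests, percentage of the bytes
MARKCHAR =           # character used when the graphs are plotted in ASCII
WEEKBEGINSON FRIDAY  # count Friday as the first day of the week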

In the next section, we'll look at commands relating to the non-time reports.


Other reports

This section deals with the non-time reports. There are quite a lot of commands which control these reports, although we've seen some of them already.

First, these reports have COLS commands, just like the time reports. (See the section on Time reports for how to use these commands.) But for these reports, several additional columns are available. Here is the full list of columns for the non-time reports:

R   Number of requests
r   Percentage of the requests
S   Number of requests in the last 7 days
s   Percentage of the requests in the last 7 days
P   Number of page requests
p   Percentage of the page requests
Q   Number of page requests in the last 7 days
q   Percentage of the page requests in the last 7 days
B   Number of bytes transferred
b   Percentage of the bytes
C   Number of bytes transferred in the last 7 days
c   Percentage of the bytes in the last 7 days
d   Date of last access
D   Date and time of last access
e   Date of first access
E   Date and time of first access
N   The number of the item in the list
So, for example,
REQCOLS NRSD
numbers the files in the Request Report, and lists for each one the number of requests, the number of requests in the last 7 days, and the date and time when it was last requested. The full list of COLS commands for non-time reports is HOSTCOLS, REDIRHOSTCOLS, FAILHOSTCOLS, ORGCOLS, DOMCOLS, REQCOLS, DIRCOLS, TYPECOLS, SIZECOLS, PROCTIMECOLS, REDIRCOLS, FAILCOLS, REFCOLS, REFSITECOLS, SEARCHQUERYCOLS, SEARCHWORDCOLS, INTSEARCHQUERYCOLS, INTSEARCHWORDCOLS, REDIRREFCOLS, FAILREFCOLS, BROWREPCOLS, BROWSUMCOLS, OSCOLS, VHOSTCOLS, REDIRVHOSTCOLS, FAILVHOSTCOLS, USERCOLS, REDIRUSERCOLS, FAILUSERCOLS and STATUSCOLS. Not every column is allowed in every report, but if you specify an illegal one, analog will warn you about it.
Next you need to know how to use a SORTBY command to specify how the reports should be sorted. There are ten possible ways of sorting reports:
REQUESTS      total number of requests
REQUESTS7     requests within the last 7 days
PAGES         total requests for pages
PAGES7        requests for pages within the last 7 days
BYTES         total bytes transferred
BYTES7        bytes transferred within the last 7 days
FIRSTDATE     time of first request
DATE          time of most recent request
ALPHABETICAL  alphabetically
RANDOM        unsorted, sometimes useful for speed in very long reports
For example, the command
HOSTSORTBY ALPHABETICAL
will sort the Host Report alphabetically. The full list of SORTBY commands is HOSTSORTBY, REDIRHOSTSORTBY, FAILHOSTSORTBY, ORGSORTBY, DOMSORTBY, REQSORTBY, DIRSORTBY, TYPESORTBY, REDIRSORTBY, FAILSORTBY, REFSORTBY, REFSITESORTBY, SEARCHQUERYSORTBY, SEARCHWORDSORTBY, INTSEARCHQUERYSORTBY, INTSEARCHWORDSORTBY, REDIRREFSORTBY, FAILREFSORTBY, BROWREPSORTBY, BROWSUMSORTBY, OSSORTBY, VHOSTSORTBY, REDIRVHOSTSORTBY, FAILVHOSTSORTBY, USERSORTBY, REDIRUSERSORTBY, FAILUSERSORTBY and STATUSSORTBY. Again, not every sort method is possible in every report, but you'll be warned if you choose an illegal one.
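
It often makes sense to sort a report on one of the columns it actually displays. As a sketch, with the choice of reports and sort methods purely illustrative (analog will warn you if a particular combination is not allowed):
REQCOLS NRSD         # item number, requests, requests in last 7 days, date and time of last access
REQSORTBY REQUESTS7  # sort the Request Report by requests within the last 7 days
DOMCOLS Rb           # requests and percentage of the bytes
DOMSORTBY BYTES      # sort the Domain Report by total bytes transferred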

There is one known bug concerned with SORTBY ALPHABETICAL. The report is sorted before any output alias is applied. This means that if an output alias has been specified for the report, then the report may appear not to be sorted correctly.


You can also specify a FLOOR for most reports, saying how much activity an item needs before it is listed on the report. (Other items will just be accumulated together in the "not listed" line at the bottom of the report.) There are lots of possible ways of specifying floors, which I'll list here, using the DOMFLOOR (Domain Report FLOOR) command as an example. Essentially each one consists of a number indicating the level of the floor, followed by a letter indicating the floor criterion.
DOMFLOOR 1000r       # all domains with at least 1000 requests
DOMFLOOR 100s        # at least 100 requests within the last 7 days
DOMFLOOR 1000p       # at least 1000 requests for pages
DOMFLOOR 100q        # at least 100 requests for pages within the last 7 days
DOMFLOOR 1000000b    # at least 1,000,000 bytes transferred
DOMFLOOR 1kb         # at least 1 kilobyte (1024 bytes)
DOMFLOOR 10.5Mc      # at least 10.5Mb within the last 7 days
DOMFLOOR 0.5%r       # 0.5% of the total requests in the Domain Report
                     # (ditto %s, %p etc.)
DOMFLOOR 0.5:r       # 0.5% of the maximum number of requests for any domain
                     # (ditto :s, :p etc.)
DOMFLOOR 970701d     # last access since 1st July 1997
DOMFLOOR 970701e     # first access since 1st July 1997
DOMFLOOR -00-01-00d  # last access in last month (see
                     # documentation on FROM and TO commands)
DOMFLOO