Friday, August 31, 2007

Scene Cut Detection

Some time ago I integrated the following software, released by this
well-known French laboratory, on Windows. The purpose was to perform scene cut
detection using a frame-by-frame video engine written for Windows and DirectShow.
The algorithm was very CPU intensive but quite interesting.

http://www.irisa.fr/vista/Themes/Logiciel/MdShots/MdShots.english.html

Monday, August 13, 2007

HTTP performance testing with httperf, autobench


* httperf is a benchmarking tool that measures the HTTP request throughput of a web server. The way it achieves this is by sending requests to the server at a fixed rate and measuring the rate at which replies arrive. Running the test several times and with monotonically increasing request rates, one can see the reply rate level off when the server becomes saturated, i.e., when it is operating at its full capacity.
* autobench is a Perl wrapper around httperf. It runs httperf a number of times against a Web server, increasing the number of requested connections per second on each iteration, and extracts the significant data from the httperf output, delivering a CSV format file which can be imported directly into a spreadsheet for analysis/graphing.
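
The "level off" idea can be sketched in a few lines of Python. This is only an illustration, not part of httperf or autobench: the function name, the 5% tolerance, and the sample numbers are all assumptions made up for the example.

```python
def saturation_rate(samples, tolerance=0.05):
    """Return the first requested rate at which the reply rate falls short.

    samples: list of (requested_rate, measured_reply_rate) pairs, sorted by
    requested rate. tolerance is the fractional shortfall still treated as
    "keeping up" (an arbitrary choice for this sketch).
    """
    for requested, replied in samples:
        if replied < requested * (1.0 - tolerance):
            return requested
    return None  # the server kept up at every tested rate


# Made-up numbers for illustration only, not real measurements.
hypothetical = [(10, 9.9), (20, 19.8), (30, 29.7), (40, 31.2), (50, 31.0)]
print(saturation_rate(hypothetical))
```

With these invented numbers, the reply rate stops tracking the request rate at 40 requests/sec, which is where the function reports saturation.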


I ran a series of autobench/httperf and openload tests against a Web site I'll call site2 in the following discussion (site2 is a beta version of a site I'll call site1). For comparison purposes, I also ran similar tests against site1 and against www.example.com. The machine I ran the tests from is a Red Hat 9 Linux server co-located in downtown Los Angeles.

Here is an example of running httperf against www.example.com:

# httperf --server=www.example.com --rate=10 --num-conns=500

httperf --client=0/1 --server=www.example.com --port=80 --uri=/ --rate=10 --send-buffer=4096 --recv-buffer=16384 --num-conns=500 --num-calls=1
Maximum connect burst length: 1

Total: connections 500 requests 500 replies 500 test-duration 50.354 s

Connection rate: 9.9 conn/s (100.7 ms/conn, <=8 concurrent connections)
Connection time [ms]: min 449.7 avg 465.1 max 2856.6 median 451.5 stddev 132.1
Connection time [ms]: connect 74.1
Connection length [replies/conn]: 1.000

Request rate: 9.9 req/s (100.7 ms/req)
Request size [B]: 65.0

Reply rate [replies/s]: min 9.2 avg 9.9 max 10.0 stddev 0.3 (10 samples)
Reply time [ms]: response 88.1 transfer 302.9
Reply size [B]: header 274.0 content 54744.0 footer 2.0 (total 55020.0)
Reply status: 1xx=0 2xx=500 3xx=0 4xx=0 5xx=0

CPU time [s]: user 15.65 system 34.65 (user 31.1% system 68.8% total 99.9%)
Net I/O: 534.1 KB/s (4.4*10^6 bps)

Errors: total 0 client-timo 0 socket-timo 0 connrefused 0 connreset 0
Errors: fd-unavail 0 addrunavail 0 ftab-full 0 other 0

The 3 arguments I specified on the command line are:

* server: the name or IP address of your Web site (you can also specify a particular URL via the --uri argument)
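* rate: specifies the number of new connections per second opened to the Web server; with the default of one request per connection, this is also the number of HTTP requests/second -- it gives a rough indication of how many clients are accessing the server at the same time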
* num-conns: specifies how many total HTTP connections will be made during the test run -- this is a cumulative number, so the higher the number of connections, the longer the test run

Here is a detailed interpretation of an httperf test run. In short, the main numbers to look for are the connection rate, the request rate and the reply rate. Ideally, all of these numbers should be very close to the request rate specified on the command line. If the actual request rate and the reply rate start to decline, that's a sign your server has become saturated and can't handle any new connections. It could also be a sign that your client has become saturated, which is why it is worth first testing your client against a fast Web site in order to gauge how many outgoing HTTP requests it can sustain.
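
As a rough sketch of that check, the following Python snippet pulls the three rates out of httperf's text output and compares them with the requested rate. The regular expressions are based only on the sample output shown above; the function names and the 5% tolerance are assumptions for illustration.

```python
import re


def parse_httperf(text):
    """Extract the three headline rates from httperf's summary text."""
    patterns = {
        "connection_rate": r"^Connection rate:\s+([\d.]+)\s+conn/s",
        "request_rate": r"^Request rate:\s+([\d.]+)\s+req/s",
        "reply_rate_avg": r"^Reply rate \[replies/s\]:.*?\bavg\s+([\d.]+)",
    }
    result = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, text, re.MULTILINE)
        result[name] = float(match.group(1)) if match else None
    return result


def keeps_up(rates, target, tolerance=0.05):
    """True if every parsed rate is within `tolerance` of the target rate."""
    floor = target * (1.0 - tolerance)
    return all(v is not None and v >= floor for v in rates.values())


# Three lines copied from the test run above.
sample = """\
Connection rate: 9.9 conn/s (100.7 ms/conn, <=8 concurrent connections)
Request rate: 9.9 req/s (100.7 ms/req)
Reply rate [replies/s]: min 9.2 avg 9.9 max 10.0 stddev 0.3 (10 samples)
"""

rates = parse_httperf(sample)
print(rates)
print(keeps_up(rates, target=10))
```

For the run above, all three rates come out at 9.9 against a requested rate of 10, so neither the server nor the client looks saturated.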

Autobench is a simple Perl script that facilitates multiple runs of httperf and automatically increases the HTTP request rate. Autobench can be configured, for example, through the ~/.autobench.conf file. Here is what my file looks like:

# Autobench Configuration File

# host1, host2
# The hostnames of the servers under test
# Eg. host1 = iis.test.com
# host2 = apache.test.com

host1 = testhost1
host2 = testhost2

# uri1, uri2
# The URI to test (relative to the document root). For a fair comparison
# the files should be identical (although the paths to them may differ on the
# different hosts)

uri1 = /
uri2 = /

# port1, port2
# The port number on which the servers are listening

port1 = 80
port2 = 80

# low_rate, high_rate, rate_step
# The 'rate' is the number of connections to open per second.
# A series of tests will be conducted, starting at low rate,
# increasing by rate step, and finishing at high_rate.
# The default settings test at rates of 20,30,40,50...180,190,200

low_rate = 10
high_rate = 50
rate_step = 10

# num_conn, num_call
# num_conn is the total number of connections to make during a test
# num_call is the number of requests per connection
# The product of num_call and rate is the approximate number of
# requests per second that will be attempted.

num_conn = 200
#num_call = 10
num_call = 1

# timeout sets the maximum time (in seconds) that httperf will wait
# for replies from the web server. If the timeout is exceeded, the
# reply concerned is counted as an error.

timeout = 60

# output_fmt
# sets the output type - may be either "csv", or "tsv";

output_fmt = csv

## Config for distributed autobench (autobench_admin)
# clients
# comma separated list of the hostnames and portnumbers for the
# autobench clients. No whitespace can appear before or after the commas.
# clients = bench1.foo.com:4600,bench2.foo.com:4600,bench3.foo.com:4600

clients = localhost:4600

The only variable I usually tweak from one test run to another is num_conn, which I set to the desired number of total HTTP connections to the server for that test run. In the example file above it is set to 200.

I changed the default num_call value from 10 to 1 (num_call specifies the number of HTTP requests per connection; I like to set it to 1 to keep things simple). I started my test runs with low_rate set to 10, high_rate set to 50 and rate_step set to 10. What this means is that autobench will run httperf 5 times, starting with 10 requests/sec and going up to 50 requests/sec in increments of 10.
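
To make the schedule concrete, here is a small Python sketch that expands low_rate, high_rate and rate_step into the list of rates and builds one httperf command line per rate. It is not autobench's actual code (autobench is written in Perl and may pass further options, such as the timeout); the function names are hypothetical and only httperf flags already shown above are used.

```python
def rate_schedule(low_rate, high_rate, rate_step):
    """Rates tested in sequence, from low_rate up to and including high_rate."""
    return list(range(low_rate, high_rate + 1, rate_step))


def httperf_commands(server, rates, num_conn, num_call=1, uri="/", port=80):
    """One httperf command line per rate (hypothetical helper)."""
    return [
        f"httperf --server={server} --port={port} --uri={uri} "
        f"--rate={rate} --num-conns={num_conn} --num-calls={num_call}"
        for rate in rates
    ]


rates = rate_schedule(low_rate=10, high_rate=50, rate_step=10)
commands = httperf_commands("www.example.com", rates, num_conn=200)

print(rates)
for command in commands:
    print(command)
```

With the values from the configuration file above, this yields the five rates 10, 20, 30, 40 and 50, each run with 200 total connections and one request per connection.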

When running the following command line...

# autobench --single_host --host1=www.example.com --file=example.com.csv

Wednesday, August 08, 2007

Source Code Beautifier for C, C++, C#, D, Java, and Pawn

A must-use!

http://uncrustify.sourceforge.net/

How to convert an IIS log to an Apache log using Perl

# http://www.jammed.com/~jwa/hacks/iis2apache/iis2apache

# 1. Go to Start -> Control Panel -> Administrative Tools
# 2. Run Internet Information Services (IIS).
# 3. Find your Web site under the tree on the left.
# 4. Right-click on it and choose Properties.
# 5. On the Web site tab, you will see an option near the bottom that says "Active Log Format." Click on the Properties button.
# 6. At the bottom of the General Properties tab, you will see a box that contains the log file directory and the log file name. The full log path consists of the log file directory plus the first part of the log file name.
#
#For example, if the dialog box displayed the following values:
#
# * Log file directory: C:\Windows\System32\LogFiles
# * Log file name: W3SVC1\exyymmdd.log
#Then your full log path would be:
#C:\Windows\System32\LogFiles\W3SVC1

#!/usr/bin/perl
#
# make an IIS log look strikingly like an apache log
# we make a vague attempt to interpret the Fields: header
# in an IIS logfile and use that to make IIS fields match up
# with apache fields.
#
# jwa@jammed.com 12 Dec 2000
#

while ($arg = shift @ARGV) {
    $tzoffset = shift @ARGV if ($arg eq "--faketz");
    $vhost = shift @ARGV if ($arg eq "--vhost");
    $debug = 1 if ($arg eq "--debug"); # show field interpretation
}

if ($tzoffset eq "") {
    print STDERR "Will use -0000 as a fake tzoffset\n";
    $tzoffset = "-0000";
}

# build month hash
@m = ('Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec');
while ($m = shift @m) {
    $month{++$n} = $m;
}

# an IIS log adheres to what's defined by 'Fields:'
# attempt to parse this, tagging


LINE:
while ($line = <STDIN>) {
    $line =~ s/\r|\n//g; # cooky DOS format
    if ($line =~ /^#Fields: /) {
        @line = split(" ", $line);
        shift @line; # shifts off #Fields:
        # build a hash so we can look up a fieldname and
        # have it return a position in the string
        undef %fieldh;
        $n = 0; # zero-based array for split
        while ($l = shift @line) {
            $fieldh{$l} = $n++;
            print STDERR "$l is position $fieldh{$l}\n" if ($debug);
        }

    }
    next LINE if ($line =~ /^#/);

    #Fields: date time c-ip cs-username s-sitename s-computername s-ip cs-method
    # cs-uri-stem cs-uri-query sc-status sc-win32-status sc-bytes cs-bytes time-taken
    # s-port cs-version cs(User-Agent) cs(Cookie) cs(Referer)

    # this is really slow.

    $date = yankfield("date");
    $time = yankfield("time");
    $ip = yankfield("c-ip");
    $username = yankfield("cs-username");
    $method = yankfield("cs-method");
    $stem = yankfield("cs-uri-stem");
    $query = yankfield("cs-uri-query");
    $status = yankfield("sc-status");
    #$bytes = yankfield("cs-bytes");
    # Which is it? sc-bytes or cs-bytes?
    # cs-bytes only appears in some of the IIS logs I've seen.
    # I'll assume that sc-bytes is "server->client bytes", which
    # is what we want anyway.
    $bytes = yankfield("sc-bytes"); # I'm gonna go with this.
    $useragent = yankfield("cs(User-Agent)");
    $referer = yankfield("cs(Referer)");

    $useragent =~ s/\+/\ /g;

    # our modified CLF sez:
    # IP - - [DD/MMM/YYYY:HH:MM:SS TZOFFSET] "method stem[?query]" status bytes "referer" "user agent" "vhost"

    # convert date
    # 2000-07-19 00:00:01
    ($y, $m, $d) = split("-", $date);
    $m =~ s/^0//g;
    $mname = $month{$m};

    # build url
    $url = $stem;
    if ($query ne "-") {
        $url .= "?$query";
    }

    # all done, print it out
    print "$ip - - [${d}/${mname}/${y}:${time} ${tzoffset}] \"$method $url\" $status $bytes \"$referer\" \"$useragent\" \"${vhost}\"\n";
}


# return the proper field, or "-" if it's not defined.
# (unfortunately ($date) = (split(" ", $line))[$fieldh{date}];
# will return element 0 if $fieldh{date} is undefined . . .)

sub yankfield {
    my ($field) = shift @_;

    print STDERR "Looking at $field; position [$fieldh{$field}]\n" if ($debug);

    if ($fieldh{$field} ne "") {
        return (split(" ", $line))[$fieldh{$field}];
    } else {
        print STDERR "$field undefined\n" if ($debug);
        return "-";
    }
}
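
For readers who prefer to see the field mapping on a single record, here is a minimal Python sketch of the same idea: use the #Fields: header to find each value by name, then assemble the modified CLF line. It is a re-expression for illustration, not the script above. The function name and the sample header and record are made up, and it assumes the record has a date field.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def iis_to_clf(fields_header, line, tzoffset="-0000", vhost=""):
    """Convert one IIS log record to the modified CLF layout (sketch)."""
    names = fields_header.split()[1:]  # drop the leading "#Fields:" token
    position = {name: index for index, name in enumerate(names)}
    values = line.split()

    def yank(name):
        # Return the named field, or "-" when the header does not list it.
        index = position.get(name)
        if index is None or index >= len(values):
            return "-"
        return values[index]

    year, month, day = yank("date").split("-")  # e.g. 2000-07-19
    month_name = MONTHS[int(month) - 1]

    url = yank("cs-uri-stem")
    query = yank("cs-uri-query")
    if query != "-":
        url += "?" + query

    useragent = yank("cs(User-Agent)").replace("+", " ")
    ip, time = yank("c-ip"), yank("time")
    method, status = yank("cs-method"), yank("sc-status")
    size, referer = yank("sc-bytes"), yank("cs(Referer)")

    return (f'{ip} - - [{day}/{month_name}/{year}:{time} {tzoffset}] '
            f'"{method} {url}" {status} {size} '
            f'"{referer}" "{useragent}" "{vhost}"')


# Made-up header and record, for illustration only.
header = ("#Fields: date time c-ip cs-method cs-uri-stem cs-uri-query "
          "sc-status sc-bytes cs(User-Agent) cs(Referer)")
record = ("2000-07-19 00:00:01 192.0.2.1 GET /index.html - 200 1234 "
          "Mozilla/4.0+(compatible) -")
clf = iis_to_clf(header, record)
print(clf)
```

As in the Perl version, fields missing from the header come back as "-", and the "+" signs IIS uses in place of spaces in the user agent are turned back into spaces.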