Network redundancy
When setting up a new server or server cluster, you should always consider what you can do to prevent downtime: hardware failures, network failures, etc.
Most people will configure their servers with RAID, either hardware RAID provided by a controller in the server or on attached storage. But many people forget how important it is to make the network redundant too.
Network guys are ahead of systems people here because they've been using Spanning Tree, VRRP, HSRP and many other nice acronyms with a brilliant function. But what about the people in charge of servers? It's not that the technologies are not available; it's just that we get so involved in building the server and its services that we forget about the network.
These are the three methods I've seen the most:
1. Floating IP
This is quite a simple configuration. You basically configure two physical network interfaces (usually connected to different network switches) and a virtual one. The virtual interface is the one with the outside-facing IP and it is configured on just one of the real interfaces.
The idea behind it is that if one of your network links fails, you just need to move the floating IP to the other physical interface to restore service.
Pros: very easy setup supported by any OS
Cons: it needs manual intervention when the interface fails
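The manual step boils down to a small decision: leave the floating IP where it is while that link is up, otherwise move it to another interface whose link is still up. Here is a minimal Python sketch of that decision. The interface names and the link-state dictionary are made up for illustration; on a real server you would feed in the actual link status and then use your OS's own commands to move the address.

```python
def pick_interface(current, link_up):
    """Return the interface that should hold the floating IP.

    current: name of the interface holding the IP now.
    link_up: dict mapping interface name -> True if its link is up.
    """
    if link_up.get(current, False):
        return current  # link is healthy, nothing to do
    for name in sorted(link_up):
        if link_up[name]:
            return name  # first healthy alternative
    return None  # every link is down, nowhere to move the IP


# Hypothetical interfaces: eth0 has lost its link, eth1 is still up.
print(pick_interface("eth0", {"eth0": False, "eth1": True}))
```

Something like this could drive a small watchdog script, which would take away the need for manual intervention, but treat it as a starting point rather than a finished tool.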
2. IPMP (IP network multipathing)
This is a method provided by Solaris. As far as I know there is no direct equivalent on Linux, but the bonding driver could in theory do something similar.
What you do here is configure at least two network interfaces within the same IPMP group. Depending on whether you configured probe-based or link-based failure detection, the in.mpathd daemon will detect the problem and move the IP between interfaces.
bge0: flags=1000843
Contrary to common belief, you don't need IP addresses on both interfaces. One of them will suffice as long as both interfaces are UP.
Pros: very easy configuration and it's very reliable
Cons: Solaris specific
3. Network aggregation
By configuring network aggregation you are basically joining together two or more interfaces. They will not only provide you with failover support but also combine their bandwidth, giving you faster throughput. This is in my opinion the best method, and it's supported natively by Solaris and through the bonding driver on Linux. It is also an industry standard and you'll find it supported on most network kit such as Cisco and Juniper.
Pros: easy configuration, improved network output through bandwidth joining, industry standard
Cons: I can't think of any
I found a couple of good documents explaining how to configure aggregation on Ubuntu/Debian, RedHat/CentOS and Solaris. If you have any questions, do let me know.
A world of files
I have always had a lot of interest in the different filesystems available on UNIX systems, ever since I worked as a developer creating a Linux distribution. Back then reiserfs was the hottest thing around as the first widely available journaling filesystem on Linux.
Over time I got involved in many different projects where I could use my filesystem knowledge to improve a service. I was quite surprised to find that many sysadmins do not take filesystem capabilities into account when building a new platform. I also found a lot of old-school sysadmins who had been using the same filesystem for many years and would not change it!
A few years ago I was in charge of the USENET platform at a well-known ISP. The amount of data coming in and out of the platform was massive and the servers were spending 99% of their time either writing or reading.
I then did a lot of investigation and discovered that some colleagues running Usenet platforms were using XFS as an alternative. After quite a lot of time spent on it, I managed to apply a few patches, some provided to me and some written by me, to increase XFS performance even further. The result was a drop in usage down to 60%.
Don't get me wrong here. I'm not saying you should use XFS. What I'm saying is that you should carefully choose the filesystem you're going to use depending on the platform you are building. Some platforms need to be fast at opening and closing small files, others at handling large files, etc. No filesystem is good at all of them, so you should get some benchmarks and give it some thought.
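If you want a first rough feel for how a filesystem behaves with many small files versus one large file, a few lines of Python are enough. This is only a crude sketch and no substitute for a proper benchmarking tool or a realistic workload. The file counts and sizes are arbitrary values I picked, and it runs against whatever filesystem holds the temporary directory.

```python
import os
import tempfile
import time


def time_small_files(directory, count=200, size=4096):
    """Create, write and close `count` small files; return seconds taken."""
    payload = b"x" * size
    start = time.perf_counter()
    for i in range(count):
        with open(os.path.join(directory, "small_%d" % i), "wb") as f:
            f.write(payload)
    return time.perf_counter() - start


def time_large_file(directory, size=8 * 1024 * 1024):
    """Write one large file in 1 MiB chunks; return seconds taken."""
    chunk = b"x" * (1024 * 1024)
    start = time.perf_counter()
    with open(os.path.join(directory, "large"), "wb") as f:
        for _ in range(size // len(chunk)):
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())  # make sure the data really reaches the disk
    return time.perf_counter() - start


# Pass dir=... to TemporaryDirectory to test a specific mount point.
with tempfile.TemporaryDirectory() as workdir:
    small = time_small_files(workdir)
    large = time_large_file(workdir)
    print("200 small files: %.4fs" % small)
    print("one 8 MiB file:  %.4fs" % large)
```

Run it a few times on each candidate filesystem and compare the numbers, keeping in mind that caching and the underlying hardware will influence the results as much as the filesystem does.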
Lately I'm finding more and more companies turning to Sun ZFS, and with good reason. It's one of the most powerful filesystems I've seen. It's very fast, very resilient and it has some brilliant, well-made features such as snapshots, utilities to duplicate filesystems (cloning), etc.
You would expect something this good to be very difficult to use or administer, but it couldn't be easier. It takes just a couple of commands to get it up and running. This is a bit of my cheatsheet:
- Create disk mirror (RAID1 with disks c0d1 & c1d1)
zpool create pool mirror c0d1 c1d1
- Create filesystem in the pool
zfs create pool/test1
- Change mountpoints
zfs set mountpoint=none pool
zfs set mountpoint=/test1 pool/test1
- Create snapshot
zfs snapshot pool/test1@my_snap
- Copy filesystem to another server
zfs send pool/test1@my_snap | ssh root@other_server zfs receive pool/newtest1
- Destroy snapshot
zfs destroy pool/test1@my_snap
The problem of SPAM
This is a subject that annoys many people and worries many businesses. I have worked for companies that invested large quantities of cash in improving their mail systems, all because of SPAM. According to some estimates by large corporations and ISPs, around 80% to 90% of all mail is actually SPAM.
But what are the implications? There are many, but in my opinion the three most important are:
- Storage
- Processing
- Security
Storage
By default, all email received by a mail server is stored until the owner reads it and deletes it. Let's do some easy calculations. Suppose the average mail size is about 40 KB, a business user receives around 100 emails per day, and 90% of them are Spam.
Real mail: (10*40) = 400 KB = 0.39 MB
Spam: (90*40) = 3600 KB = 3.51 MB
It may not sound like a lot, but the company probably has more than one employee. For example, with 100 employees we're talking about 351 MB of Spam per day. If it is never deleted it will grow to 128,115 MB (roughly 125 GB) in a single year. And this is a very small example. A small service provider could easily have 200,000 customers, and often they'll have more than one e-mail address each. Just do the maths!
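If you'd rather let the computer do the maths, the same estimate fits in a few lines of Python. The figures are just the assumptions from the example (40 KB per message, 100 messages per day, 90% spam), so change them to match your own platform. The results differ very slightly from the numbers quoted here because the per-user figure was cut off at 3.51 MB.

```python
AVG_MAIL_KB = 40        # average message size assumed in the text
MAILS_PER_DAY = 100     # messages per user per day
SPAM_RATIO = 0.90       # share of messages that are spam


def spam_mb_per_day(users):
    """Spam volume in MB per day for the given number of mailboxes."""
    spam_kb = MAILS_PER_DAY * SPAM_RATIO * AVG_MAIL_KB * users
    return spam_kb / 1024.0


for users in (1, 100, 200000):
    daily = spam_mb_per_day(users)
    yearly_gb = daily * 365 / 1024.0
    print("%7d users: %10.1f MB/day, %9.1f GB/year" % (users, daily, yearly_gb))
```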
Processing
Each email sent and received by a service provider needs to be processed to determine its final destination and to store it. This takes processing time, and the servers can only handle a limited number of e-mails per second. If the servers can't cope with the load, they'll start failing and e-mails could get lost or returned to the sender.
Although servers are cheaper and more powerful than they used to be, it is still money to invest in equipment and network.
Security
I was unsure whether to include this section, as it's not strictly about Spam, but Spam-like messages are often used to deliver viruses or to obtain private information from users.
What are companies and professionals doing about it?
Most companies will by now have an anti-spam platform. Others prefer to hire another company to do the cleaning, as Plusnet has recently done by hiring Postini.
The cheapest option is usually to do it in-house, but it can be very time consuming and it takes time to create a good system.
The internet and the free/open source communities provide us with all the tools we need to achieve this. These are the ones I have used and rely on the most:
E-Mail content tests
One way of identifying spam is by running different tests against each email, such as DNS verification or Bayesian filtering.
In this category we'll include software such as SpamAssassin.
Cons: As each email needs to be analyzed, this method requires powerful servers to keep checking fast.
Greylisting
This is a very clever way of avoiding Spam. When an e-mail is sent but for some reason cannot reach its destination, the originating server will retry after a little while.
However, spammers' systems usually don't do that. The greylisting method uses this principle. When the mail server receives an email from an unknown server it replies with a "try again later". If the sender does indeed try again, the mail will be delivered successfully and the IP address will be added to the allowed list.
Cons: The first email from a new IP is always delayed for some time. Spammers are adapting.
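The logic is simple enough to sketch in a few lines of Python. This is a toy model and not how any particular mail server implements it. The class name, the return values and the minimum delay of 300 seconds are my own assumptions, and real implementations often key on more than just the IP address.

```python
import time


class Greylist:
    """Toy greylist keyed on the sending server's IP address."""

    def __init__(self, min_delay=300):
        self.min_delay = min_delay   # seconds a new sender must wait
        self.first_seen = {}         # ip -> time of first attempt
        self.allowed = set()         # ips that have retried correctly

    def check(self, ip, now=None):
        """Return 'accept' or 'tempfail' for a delivery attempt."""
        now = time.time() if now is None else now
        if ip in self.allowed:
            return "accept"
        if ip not in self.first_seen:
            self.first_seen[ip] = now
            return "tempfail"        # unknown server: ask it to retry later
        if now - self.first_seen[ip] >= self.min_delay:
            self.allowed.add(ip)     # it came back after the delay
            return "accept"
        return "tempfail"            # retried too quickly


g = Greylist(min_delay=300)
print(g.check("192.0.2.10", now=0))     # first attempt
print(g.check("192.0.2.10", now=60))    # retry too early
print(g.check("192.0.2.10", now=600))   # retry after the delay
print(g.check("192.0.2.10", now=601))   # now on the allowed list
```

A legitimate server that retries after the delay gets through and stays on the allowed list, while a sender that never retries is never accepted.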
Sender Policy Framework (SPF)
Today, nearly all abusive e-mail messages carry fake sender addresses. The Sender Policy Framework (SPF) is an open standard specifying a technical method to prevent sender address forgery. The technology requires two sides to play together: (1) the domain owner publishes its sending policy (which servers are allowed to send mail for the domain) in an SPF record in the domain's DNS zone, and when someone else's mail server receives a message claiming to come from that domain, then (2) the receiving server can check whether the message complies with the domain's stated policy. If, e.g., the message comes from an unknown server, it can be considered a fake.
Cons: It requires DNS changes to add SPF records. This may not be suitable for some as many hosting providers don't offer this option.
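To give an idea of what the receiving side does, here is a heavily simplified Python sketch that checks a sender's IP against an SPF record. It only understands ip4 entries and a closing -all or ~all, and it takes the record as a string instead of looking it up in DNS. The record and the addresses are hypothetical examples, and a real SPF checker handles many more mechanisms.

```python
import ipaddress


def spf_ip4_check(record, sender_ip):
    """Check sender_ip against the ip4 mechanisms of an SPF record.

    Only 'ip4:' mechanisms and a final '-all' / '~all' are understood;
    a real SPF evaluator also handles a, mx, include, ip6, redirect, etc.
    """
    ip = ipaddress.ip_address(sender_ip)
    terms = record.split()
    if not terms or terms[0] != "v=spf1":
        return "none"                      # not an SPF version 1 record
    for term in terms[1:]:
        if term.startswith("ip4:"):
            network = ipaddress.ip_network(term[4:], strict=False)
            if ip in network:
                return "pass"
        elif term == "-all":
            return "fail"
        elif term == "~all":
            return "softfail"
    return "neutral"


# Hypothetical record, as it might be published in a domain's TXT record.
record = "v=spf1 ip4:192.0.2.0/24 ip4:198.51.100.25 -all"
print(spf_ip4_check(record, "192.0.2.77"))
print(spf_ip4_check(record, "203.0.113.5"))
```

The first address falls inside a listed network, so the message complies with the policy. The second one is not listed anywhere and runs into the closing -all, so it can be treated as a fake.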
To conclude, SPAM is still a growing problem. As the technology to combat it improves, so do the spammers. There is no easy way to win this war, but the right implementation can definitely save your company loads of money.
Python SQLObject
A few days ago I talked about using Python for web development together with CherryPy. I didn't mention at that time the other great thing about Python web development as opposed to PHP or Perl: SQLObject.
It's rare nowadays to do web development without a back-end database. And when we talk about databases we have to talk about all the usual chores we are tired of doing:
- Create database
- Create tables
- Write all the queries you'll need
- Handle foreign-keys
And many more. Most programmers (including myself) write their own library to handle all of this and keep reusing it across projects. But still, it won't be as good as a well-designed, purpose-built, object-oriented library such as SQLObject.
In SQLObject each table becomes a Python class, and the library produces the code to create, access, insert into, update and delete tables and their contents.
#!/usr/bin/env python
from sqlobject import *

# Each class maps to a table; each attribute maps to a column
class Person(SQLObject):
    firstname = StringCol(length=30)
    lastname = StringCol(length=30)

# Connect to a SQLite database file
connection_string = 'sqlite:/tmp/test.db'
connection = connectionForURI(connection_string)
sqlhub.processConnection = connection

# Create table (skipped if it already exists)
Person.createTable(ifNotExists=True)

# Add new entries
Person(firstname='Sergio', lastname='Rua')
Person(firstname='John', lastname='Doe')

# Get single entry by ID
p = Person.get(1)
print("1: Got entry: name=%s - Surname=%s" % (p.firstname, p.lastname))

# Get selected entries
p = Person.select(Person.q.firstname=='John')
print("2: Got entry: name=%s - Surname=%s" % (p[0].firstname, p[0].lastname))

# Get all entries
people = Person.select()
for p in people:
    print("3: Got entry: name=%s - Surname=%s" % (p.firstname, p.lastname))
This is a very basic example to illustrate how easy it is to use. Even if you don't know Python I hope you can see that there is no SQL involved in any of this. The class Person will in fact be a table in the database with the fields listed. You can then add or get rows with a very simple line of code.
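For comparison, here is roughly the same work done by hand with Python's built-in sqlite3 module. This is my own sketch of the SQL you would otherwise write yourself, not the SQL that SQLObject generates. The table and column names simply mirror the Person class, and it uses an in-memory database so it leaves nothing behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Create table
conn.execute(
    "CREATE TABLE IF NOT EXISTS person ("
    "id INTEGER PRIMARY KEY, firstname VARCHAR(30), lastname VARCHAR(30))"
)

# Add new entries
conn.executemany(
    "INSERT INTO person (firstname, lastname) VALUES (?, ?)",
    [("Sergio", "Rua"), ("John", "Doe")],
)

# Get single entry by ID
row = conn.execute(
    "SELECT firstname, lastname FROM person WHERE id = ?", (1,)
).fetchone()
print("1: Got entry: name=%s - Surname=%s" % row)

# Get selected entries
johns = conn.execute(
    "SELECT firstname, lastname FROM person WHERE firstname = ?", ("John",)
).fetchall()
print("2: Got entry: name=%s - Surname=%s" % johns[0])

# Get all entries
for person in conn.execute("SELECT firstname, lastname FROM person"):
    print("3: Got entry: name=%s - Surname=%s" % person)
```

Every one of those SQL strings is something you have to write, quote and keep in sync with your schema yourself, which is exactly the chore an object mapper takes away.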
I don't intend to make a tutorial out of this. I just want to open your eyes to the possibilities. The website has a very good tutorial with everything you need to know. Otherwise, I'm here to help. You can always contact me or request an invitation to the free advice section.
The author, Ian Bicking, has done an excellent job.
The OS war
There is and there always will be a war between operating systems. They compete with each other for a place at the top. Some are driven by large companies, others by user groups and communities with company support (e.g. RedHat).
The competition is hard and each OS will claim to be better than the others. But the winner (if any) will not earn its place by being the best among them, but by having the largest number of followers.
When I design a platform I have my preferences, of course, like any other IT consultant, and it is choices like this that drive the war. However, there are limits to this. From time to time, for one reason or another, you can't use the OS of your choice and you have to go for another one.
At the top of the league table we find Windows, Solaris, Linux and the BSD family. I don't choose Windows unless there is an unavoidable reason for it (e.g. the server is meant to run a Windows-only application). I tend to use Solaris for large platforms with demanding applications (e.g. databases) when the budget allows for it. Linux is my favourite and I use it for nearly everything when the budget is not very big and I will therefore be using cheap Intel-based hardware. The best thing about Linux for me is how flexible and simple it is, so I can get any platform up and running in no time.
I used FreeBSD quite a lot in the past, but in recent years I stopped because I can do the same with Linux, which has better support from hardware vendors. I quite like it, though, and I found it to be very fast.
Not long ago I gave a presentation to help people understand which OS to choose and why. I came up with a long list of reasons to use Solaris and a similar one for Linux, but when it came to Windows I could only think of one reason: it's a very well-known brand.
I'm sure there are more reasons and that people working with Windows would know them. Actually, during the talk I remembered another one: from time to time you have to run a third-party application made for Windows only. For me, unfortunately, compared with Unix-based OSes it's a no-no. Too many problems.
But we have to be cold-hearted and choose the right OS for each platform. Once chosen, it's not easy to roll back. Make sure you take your time looking at the pros and cons.
Python web development
Whenever I'm asked what to use for a web development project my reply is usually PHP + MySQL. Why?
- Powerful yet simple
- Very cheap or free
- Largely used and therefore it's easy to find people with the skills required to do the development
However, I've always been more impressed by Python as a programming language. It's very powerful and very versatile: you can make a desktop application that'll work perfectly fine on Linux, Solaris, Windows or MacOS, or you can do web development.
When I use PHP I personally like Smarty as my template engine. It is always best to separate the code from the HTML, or you'll end up with a mess that is very difficult to maintain.
In the case of Python, I tried many of the available template libraries and engines, and my favourite right now is Mako.
Going a bit further into the development subject, you can use Mako together with CherryPy. It's an HTTP development framework with an embedded HTTP server (using it is optional, and I would only recommend it for development).
It's a bit difficult to explain so I'll use an example comparing it with PHP. In a PHP project you'll usually create a separate .php file for each of the actions you'll be programming.
- index.php
- contact.php
- blog.php
And you'll add the required code to each of the files. You then need a web server such as Apache or lighttpd to run your program. You would then access each of them using a URL such as http://yourserver/index.php
With CherryPy, though, you're more in control because CherryPy can use its own internal web server and every function in the code can become a web page.
import cherrypy

class Root:
    # A method can be exposed by setting the attribute by hand...
    def index(self):
        return "Hello World!!"
    index.exposed = True

    # ...or with the decorator
    @cherrypy.expose
    def contact(self):
        pass

    @cherrypy.expose
    def blog(self):
        pass

cherrypy.quickstart(Root())
If you just run this bit of code, it'll start a web server on localhost by default on port 8080. You can then access your page just by typing http://localhost:8080/index.
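To make the idea of "every function can become a web page" more concrete, here is a toy dispatcher in plain Python. It is only a sketch of the concept and not CherryPy's actual implementation. The expose decorator, the Root class and the dispatch function below are all my own stand-ins.

```python
def expose(func):
    """Mark a method as reachable from a URL (stand-in for cherrypy.expose)."""
    func.exposed = True
    return func


class Root:
    @expose
    def index(self):
        return "Hello World!!"

    @expose
    def contact(self):
        return "Contact page"

    def helper(self):
        return "not reachable"      # no exposed flag, so never served


def dispatch(root, path):
    """Map a URL path such as '/contact' to an exposed method of root."""
    name = path.strip("/") or "index"   # '/' falls back to index
    handler = getattr(root, name, None)
    if callable(handler) and getattr(handler, "exposed", False):
        return 200, handler()
    return 404, "Not Found"


root = Root()
for path in ("/", "/index", "/contact", "/helper", "/missing"):
    print(path, dispatch(root, path))
```

The URL path picks the method, and only methods that have been marked as exposed are ever served, which is why the helper method stays private.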
I'm very glad to be able to use Python for web development in such a clever way. As I said before, I do believe Python is a better programming language. However, in the professional world I still recommend PHP over it to my customers for just one reason: you cannot find Python programmers as easily as PHP ones, and unless you have the in-house skills this may prove a little painful. I hope someone proves me wrong here and we can use more Python from now on. There are, after all, quite a few websites running on Python code and many companies using it (Google, RedHat, etc).
To finish, please give Python a chance. If you are a programmer, learn it!
A new beginning
Reviewing lighttpd
The first thing I noticed is how easy it is to install and configure. In just 5 minutes I had it up and running. It didn't take me very long either to discover how to do the same things I'm used to doing on Apache.
Because they named their modules in a similar fashion to Apache, they were very easy to identify. Very quickly I was trying out mod_rewrite, creating some aliases on a couple of test virtual hosts and configuring it to use PHP.
I very much like their configuration file. It is well structured and easy to read. I found it very handy that you can use Perl-like regular expressions to create conditions and secure access. For example, if you want to allow access only from your internal IPs:
$HTTP["remoteip"] !~ "^(192\.168\.88\..*|10\.10\.10\..*)" {
url.access-deny = ("")
}
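Because the condition is an ordinary regular expression, it is easy to check which addresses it matches before deploying it. The sketch below uses Python's re module rather than lighttpd itself, and assumes the dots in the pattern are escaped so that they only match a literal dot. The sample addresses are made up.

```python
import re

# Same alternation as the lighttpd condition, with the dots escaped.
INTERNAL = re.compile(r"^(192\.168\.88\..*|10\.10\.10\..*)")


def is_denied(remote_ip):
    """Mimic the '!~' test: deny when the address does NOT match."""
    return INTERNAL.search(remote_ip) is None


for ip in ("192.168.88.14", "10.10.10.200", "10.10.100.1", "203.0.113.9"):
    print(ip, "denied" if is_denied(ip) else "allowed")
```

The address 10.10.100.1 is worth a look: with unescaped dots it would be let in, because a bare dot matches any character, including the extra 0.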
The other thing I noticed is that my test PHP applications seem to be faster than under Apache's care. I don't know, though, whether this is because of lighttpd itself or because it uses FastCGI.
I'll definitely do more testing and I'll consider running some production servers on it instead of Apache.
IE problem resolved
Nevertheless, it is working now. You can download the JavaScript from this link.



