POPFile 0.21.0 has been released, and with very little trouble I have it up and running in Mac OS X 10.3.
The big change in this version is that the BerkeleyDB back end is gone: POPFile now uses a SQL database, and the easiest way to get one in OS X is to use SQLite. Thankfully, Michael Tsai (the SpamSieve author) has a pre-built version of SQLite for Mac OS X, which means you do not have to compile and install it yourself. A schema is also available if you want to use something like MySQL, but why bother?
Next up, you have some Perl modules to install.
These you will need to download, compile, and install, and the order matters: DBI needs to be installed before you can install the DBD::SQLite module.
More on installing SQLite and the associated Perl modules below.
There are a couple of really cool new features in POPFile, and in my opinion they are worth the work of upgrading. They are:
- The global options for Subject Modification, X-Text-Classification insertion and X-POPFile-Link insertion have been removed and replaced with individual options on a per bucket basis to give greater choice in configuring POPFile.
- The 'unclassified' bucket is now visible in the UI so that you can see how many messages were unclassified, and configure header modification. This also means that unclassified messages are counted in the accuracy statistics; previously they were not counted which could have skewed the accuracy statistics if there were unclassified messages. -- This was something I had asked for, and I assume others did as well.
- The history "page" bar has been simplified so that it uses a fixed amount of screen space, while making navigation easy. Filters and searches on the history are now persistent; for example, you can click on the Buckets page and return to the History page without losing your filter or search settings.
- The Buckets page has been modified to only show the 'distinct word' count per bucket and to show the total number of distinct words in the database. Previously we showed two counts with confusing titles: now we show the true number of words in the database, not the "word counts" (which was the number of times each word occurred).
- We've recently seen spam start to use CSS to obscure messages and fool filters like POPFile; in response, this version of POPFile does analysis of CSS in HTML encoded messages. POPFile now correctly uses the SpamAssassin headers to make POPFile more efficient when used in conjunction with SpamAssassin. We now also look at TLDs (Top Level Domains) and store them as pseudowords (most useful for TLDs like .biz). -- My favorite new feature.
- It's possible that you might see a drop in accuracy as your corpus gets trained up on the new anti-spam features. This drop in accuracy will be corrected once you've retrained POPFile a little.
- I'm suggesting that if you have a corpus with GREATER THAN 30,000 unique words (you can figure this out from the Buckets page) please wait for v0.21.1.
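
The difference between the 'distinct word' count and the old "word counts" on the Buckets page is easy to mix up, so here is a tiny sketch of the idea. This is only an illustration in Python, not POPFile's own code (POPFile is written in Perl and keeps its words in the database); the function name word_stats and the sample words are made up for the example.

```python
from collections import Counter


def word_stats(words):
    """Return (distinct_words, total_occurrences) for an iterable of words.

    distinct_words    -> how many different words were seen
    total_occurrences -> how many times words were seen in total
    """
    counts = Counter(words)
    return len(counts), sum(counts.values())


# Hypothetical words trained into one bucket.
spam_words = ["free", "offer", "free", "click", "free", "offer"]

distinct, occurrences = word_stats(spam_words)
print(f"distinct words: {distinct}, total occurrences: {occurrences}")
# prints "distinct words: 3, total occurrences: 6"
```

The first number is the kind of figure the Buckets page now shows, and it is also the one to compare against the 30,000 unique word limit mentioned above.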

Comments (4)
thanks for the clear and concise upgrade tutorial... I had v0.21 purring in less than 10mins!
Posted by greg | March 19, 2004 11:18 PM
I am glad to help! I think more people should try POPFile. Just a little work can save you $25 (the cost of SpamSieve).
Posted by Ken Edwards | March 20, 2004 5:35 AM
I wonder if anyone can offer me some help installing the BerkeleyDB Perl module (BerkeleyDB-0.25). I keep getting an error message, even after changing the config.in file as stated in the POPFile installation instructions. Here's a copy of what happens in the terminal.
# perl Makefile.pl
Parsing config.in...
Looks Good.
Note (probably harmless): No library found for -ldb
Writing Makefile for BerkeleyDB
Richard-Yips-Computer:/Users/richard/Documents/SD Downloads/BerkeleyDB-0.25 root# make
cc -c -I./libraries/4.2.41/include -g -pipe -pipe -fno-common -DPERL_DARWIN -no-cpp-precomp -fno-strict-aliasing -I/usr/local/include -Os -DVERSION=\"0.25\" -DXS_VERSION=\"0.25\" "-I/System/Library/Perl/5.8.1/darwin-thread-multi-2level/CORE" BerkeleyDB.c
BerkeleyDB.xs:74:2: #error db.h is from Berkeley DB 1.x - need at least Berkeley DB 2.6.4
Posted by Richard | May 4, 2004 2:26 AM
It is my understanding that BerkeleyDB is no longer required. POPFile uses SQLite now. Have you tried just installing SQLite and the required Perl modules?
I never did try installing this latest version of POPFile clean. I should do that when I have the time (ha ha).
Posted by Ken Edwards | May 4, 2004 3:30 AM