Where Everybody's Crazy

I'm a missionary in Japan. The name of my mission agency is WEC International. That's supposedly Worldwide Evangelisation for Christ, but I think I have a better idea about what it stands for...

2006-03-23

Who made Windows Perl easy?

So I had a gig on Monday installing a TT/Plucene/Perl site on a laptop so our client could get at their data and search it even when they were out on house calls. On their Windows laptop.

Great, I thought, I can bill by the hour, and this job will be a nightmare.

It took two hours. I had settled in for the day, but there you go. I'm a couple of hundred quid out of pocket, thanks to CamelPack. The hardest part was connecting the laptop to the network.

I'm waiting on a couple of other jobs which may or may not come in next week, so today I continued hacking on Email::Received. Now I have a parser and an unparser, which turns the little language back into Perl code. Then we have a very neat trick which essentially goes:

    use base 'Exporter';
    our @EXPORT = qw(parse_received);
    *parse_received = eval unparse_rules( parse_rules(  ));

So the code is partially static (because it's compiled once) and partially dynamic, because you're not compiling it and dumping it into another file, or whatever. But the best thing about this is that you can instrument the code you generate so it tells you which rules were matched for which input. I gave a brief talk about this at London.pm tonight.
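As a rough illustration of that trick (not the actual Email::Received code), here is a minimal sketch that generates the source of a subroutine from rule data, compiles it once with a string eval, and optionally instruments it to record which rules matched. The rule list, the field names, the `instrument` option and this stand-in version of `unparse_rules` are all assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical, already-parsed rules; the real Email::Received data differs.
my @rules = (
    { name => 'id',   regex => '\sid\s+(\w{3,})',         field => 'id'   },
    { name => 'auth', regex => ' with (ESMTPA|ESMTPSA) ', field => 'auth' },
);

# Turn the rules into the source text of an anonymous subroutine.
sub unparse_rules {
    my ($rules, %opt) = @_;
    my $code = "sub {\n    my (\$line, \$trace) = \@_;\n    my %r;\n";
    for my $rule (@$rules) {
        $code .= "    if (\$line =~ /$rule->{regex}/) {\n";
        $code .= "        \$r{'$rule->{field}'} = \$1;\n";
        # Instrumentation: note the name of every rule that fired.
        $code .= "        push \@\$trace, '$rule->{name}';\n" if $opt{instrument};
        $code .= "    }\n";
    }
    return $code . "    return \\%r;\n}\n";
}

# Compile the generated source once; after this it is an ordinary code ref.
my $parse = eval( unparse_rules(\@rules, instrument => 1) ) or die $@;

my @trace;
my $result = $parse->(' by mx.example.com with ESMTPA id ABC123;', \@trace);
print "matched rules: @trace\n";
print "$_ = $result->{$_}\n" for sort keys %$result;
```

Because the generated text is plain Perl, printing the return value of `unparse_rules` is also a cheap way to see exactly what will be compiled.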

Also, after popular demand at the London.pm meeting, SpamMonkey escaped to CPAN.


Posted at 23:47:03 in technology spammonkey whats-going-on | 0 Comments

2006-03-22

Little languages

I have a little black book listing all of the projects I'm working on, on and off, and all the things I need to do for them. When, as now, I don't have much else I should be doing, I go through it and finish something off. The project I've been playing with for the past couple of days has been SpamMonkey, my cut-down SA clone for blog/mail/whatever spam detection.

I got the idea at YAPC Europe, while battling blog spam and listening to Stowe's talk about using DNS BLs to block web spam. I coded up a basic SpamAssassin replacement in a couple of days and it's been deflecting a bit of spam from this very site. (Although not all of it, and I've other measures in use too.)

One thing that stops it being useful for mail too is the lack of RBL support, and the reason I haven't done that is that there isn't anything like SpamAssassin's Received header parsing, which you really need for this job. So I've been working on Email::Received.

Unfortunately I couldn't just pull out the relevant subroutine from SA because it's 900 lines of ugly code. So I've been trying to make it less ugly, by turning it into something data-driven rather than code-driven. To do this, I invented a little ad hoc language - basically AWK-with-a-vengeance - and wrote all the parsing rules in that. So this:

  if (/^\(/) { return; }
  if (/\sid\s+<?([^\s<>;]{3,})/) { $id = $1; }
  if (/ by .*? with (ESMTPA|ESMTPSA|LMTPA|LMTPSA|ASMTP|HTTP)\;? /i) { $auth = $1; }

becomes the slightly less horrific:

  /^\(/                          IGNORE "gateway noise";
  /\sid\s+<?([^\s<>;]{3,})/      SET id = $1;
  / by .*? with (ESMTPA|ESMTPSA|LMTPA|LMTPSA|ASMTP|HTTP)\;? /i SET auth = $1;

I now have the functionality of 800 lines of code expressed in 200 lines of data. The data is easier to edit and to verify, and this process also makes it easier to detect redundancy in the rules, of which I found quite a bit. The next stage is probably to write a translator from this into Perl; although thinking about it, the rules are now decoupled from the code and it would be possible to use PCRE and generate a really fast Received line parser in C from this data. (I'm not going to do that.)
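To make the idea of rules-as-data concrete, here is a minimal sketch of reading rules of this shape into a data structure and interpreting them directly. It assumes a grammar of one `/regex/flags ACTION argument;` rule per line with only the `IGNORE` and `SET` actions; the real rule language may well have more to it. The `id` pattern is simplified, and `parse_rules` and `apply_rules` here are stand-ins written for this example.

```perl
use strict;
use warnings;

# Rules modelled on the ones above; the id pattern is simplified.
my $RULES = <<'RULES';
/^\(/                  IGNORE "gateway noise";
/\sid\s+([^\s;]{3,})/  SET id = $1;
/ by .*? with (ESMTPA|ESMTPSA|LMTPA|LMTPSA|ASMTP|HTTP)\;? /i SET auth = $1;
RULES

# Parse the rule text into a list of hashes (hypothetical grammar).
sub parse_rules {
    my ($text) = @_;
    my @rules;
    for my $line (split /\n/, $text) {
        next unless $line =~ /\S/;    # skip blank lines
        my ($re, $flags, $action, $arg) = $line =~ m{
            \A \s* / ( (?: \\. | [^/\\] )* ) / ([a-z]*)   # /pattern/flags
            \s+ (IGNORE|SET) \s+ (.*?) \s* ; \s* \z       # ACTION argument;
        }x or die "Cannot parse rule: $line\n";
        my $prefix = length($flags) ? "(?$flags)" : '';
        my %rule = (regex => qr/$prefix$re/, action => $action);
        if ($action eq 'SET') {
            # "name = $N" means: store capture group N under that name
            @rule{qw(field group)} = $arg =~ /\A(\w+)\s*=\s*\$(\d+)\z/
                or die "Bad SET argument: $arg\n";
        }
        else {
            ($rule{reason}) = $arg =~ /\A"(.*)"\z/;
        }
        push @rules, \%rule;
    }
    return \@rules;
}

# Interpret the rules against one input line. Returns a hash ref of the
# fields that were set, or nothing if an IGNORE rule matched.
sub apply_rules {
    my ($rules, $input) = @_;
    my %fields;
    for my $rule (@$rules) {
        my @captures = $input =~ $rule->{regex} or next;
        return if $rule->{action} eq 'IGNORE';
        $fields{ $rule->{field} } = $captures[ $rule->{group} - 1 ];
    }
    return \%fields;
}

my $rules  = parse_rules($RULES);
my $fields = apply_rules($rules,
    'from mail.example.org by mx.example.com with ESMTPA id ABC12345; Mon, 20 Mar 2006');
print "$_ = $fields->{$_}\n" for sort keys %$fields;
```

Interpreting the parsed rules like this is the simplest option; translating them to Perl source and compiling that once trades a little complexity for speed.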

Little languages like this are a great way to turn code into data, and by developing them in an ad-hoc way you tend to produce a language that best expresses the task at hand - and with a little bit of Perl it's not too difficult to turn them into something executable, either interpreted or translated.


Posted at 22:31:21 in spammonkey technology yak-shaving | 2 Comments

2005-09-28

SpamMonkey considered "Good Enough"

So a few days ago I finished the first plugin for SpamMonkey, SpamMonkey::Test::check_uridnsbl. SpamAssassin people already know what that does - looks up URIs in a message in a DNS blacklist. This is basically what I wanted to stop comment spam on the blog.
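For readers who don't know SpamAssassin, the general shape of a URI DNS blacklist check can be sketched as follows. This is not the SpamMonkey plugin's code: the zone name, the URI-matching pattern and both function names are assumptions for illustration. It also queries the full hostname, where a real check would first reduce it to the registered domain.

```perl
use strict;
use warnings;

# Assumed blacklist zone, for illustration only.
my $ZONE = 'multi.surbl.org';

# Pull the distinct hostnames out of any http(s) URIs in a piece of text.
sub uri_hosts {
    my ($text) = @_;
    my %seen;
    return grep { !$seen{$_}++ }
           map  { lc $_ }
           $text =~ m{https?://([A-Za-z0-9.-]+)}g;
}

# A listed host resolves under the blacklist zone; an unlisted one does not.
# This performs a live DNS query, so it is defined here but not called.
sub is_listed {
    my ($host) = @_;
    my $addr = gethostbyname("$host.$ZONE");
    return defined $addr;
}

my $comment = 'Cheap pills at http://Spam.example.com/buy '
            . 'and https://spam.example.com/more or http://other.example.net';

# Show the names that would be looked up for this comment.
print "$_.$ZONE\n" for uri_hosts($comment);
```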

And as of now, SpamMonkey is doing what I intended it to do - it's a SpamAssassin clone which can filter other types of spam, such as comment spam. It's in operation now on this very Bryar blog. The code to make it connect to Bryar was very simple:

    use SpamMonkey;

    my $sm = SpamMonkey->new;
    $sm->ready();

    # Test the submitted comment text
    my $res = $sm->test($params{content});
    if ($res->is_spam) {
        # Reject the comment and list the rules it hit
        $self->report_error("I think you're a spammer, because your comment: ".
            join("\n", $res->describe_hits));
    }

That should deal with the spammers... for the moment.


Posted at 12:46:46 in mail-handling spammonkey technology | 4 Comments

2005-09-07

SpamMonkey

Yeah, yeah, I'm not programming... but I wrote something today called SpamMonkey, which is a very very cut down ground-up reimplementation of SpamAssassin. I said I was going to do it, and I did. (I was going to call it Barcelos, or Spam<something else>, but that's another story...)

Specifically it lacks:

  • Meta rules (because I can't be bothered to implement the logic in them yet)
  • Conditional compilation of rules (because I haven't worked out how to emulate the plugin infrastructure yet)
  • Any of the eval tests (because they're all really ugly).

Unfortunately that last one means there are no DNS BL checks, which I really do need to fix, and there's no Bayes yet. I might fix that.

OK, so why? First because the SA code is somewhat baroque, to say the least. To be fair, most of the complexity is justified, for reasons of optimization or having to handle pathological (MIME) messages. But there's not really any reason to implement your own MIME parser in this day and age. I've already done that.

Second because I want to be able to use it to test things that aren't mail (like, say, blog comments) for spaminess using the same ruleset. And since this reads SA config files, it does indeed use the same ruleset.

I might release this in a few days, but I have this horrible premonition that everyone will miss the point and complain that it's just like SpamAssassin but it doesn't do X, Y, and Z. What do you think?


Posted at 23:50:24 in technology spammonkey mail-handling | 8 Comments