Where Everybody's Crazy

I'm a missionary in Japan. The name of my mission agency is WEC International. That's supposedly Worldwide Evangelisation for Christ, but I think I have a better idea about what it stands for...

2006-03-22

Little languages

I have a little black book listing all of the projects I'm working on, on and off, and all the things I need to do for them. When, as now, I don't have much else I should be doing, I go through it and finish something off. The project I've been playing with for the past couple of days is SpamMonkey, my cut-down SpamAssassin (SA) clone for blog/mail/whatever spam detection.

I got the idea at YAPC Europe, while battling blog spam and listening to Stowe's talk about using DNS BLs to block web spam. I coded up a basic SpamAssassin replacement in a couple of days and it's been deflecting a bit of spam from this very site. (Although not all of it, and I've other measures in use too.)

One thing that stops it being useful for mail too is the lack of RBL support, and the reason I haven't done that is that there isn't anything like SpamAssassin's Received header parsing, which you really need for this job. So I've been working on Email::Received.

Unfortunately I couldn't just pull out the relevant subroutine from SA because it's 900 lines of ugly code. So I've been trying to make it less ugly, by turning it into something data-driven rather than code-driven. To do this, I invented a little ad hoc language - basically AWK-with-a-vengeance - and wrote all the parsing rules in that. So this:

  if (/^\(/) { return; }
  if (/\sid\s+<?([^\s<>;]{3,})/) { $id = $1; }
  if (/ by .*? with (ESMTPA|ESMTPSA|LMTPA|LMTPSA|ASMTP|HTTP)\;? /i) { $auth = $1; }

becomes the slightly less horrific:

/^\(/                                 IGNORE "gateway noise";
/\sid\s+<?([^\s<>;]{3,})/             SET id = $1;
/ by .*? with (ESMTPA|ESMTPSA|LMTPA|LMTPSA|ASMTP|HTTP)\;? /i SET auth = $1;

I now have the functionality of 800 lines of code expressed in 200 lines of data. The data is easier to edit and to verify, and this process also makes it easier to detect redundancy in the rules, of which I found quite a bit. The next stage is probably to write a translator from this into Perl; although thinking about it, the rules are now decoupled from the code, so it would be possible to use PCRE and generate a really fast Received line parser in C from this data. (I'm not going to do that.)
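To give a feel for how small that translation step can be, here is a rough sketch. It is written in Python rather than Perl purely for illustration, and it is not the Email::Received code. The line shape it assumes (a slash-delimited regex, optional flags, then IGNORE or SET and a trailing semicolon) is guessed from the three sample rules above, and the names RULE_RE and translate_rule are made up for this example.

```python
import re

# Assumed rule shape: /regex/flags  ACTION argument;
# The regex part may contain escaped characters such as \/ or \;
RULE_RE = re.compile(r'^/((?:\\.|[^/\\])*)/([a-z]*)\s+(IGNORE|SET)\s+(.*?);\s*$')


def translate_rule(line):
    """Turn one rule line into one line of Perl source, returned as text."""
    m = RULE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"cannot parse rule: {line!r}")
    pattern, flags, action, arg = m.groups()
    test = f"/{pattern}/{flags}"
    if action == "IGNORE":
        # Assumption: IGNORE means "stop processing this header line".
        reason = arg.strip('"')
        return f"if ({test}) {{ return; }}  # {reason}"
    # SET name = $N  becomes a Perl assignment from the capture group.
    name, value = (part.strip() for part in arg.split("=", 1))
    return f"if ({test}) {{ ${name} = {value}; }}"


print(translate_rule(r'/^\(/   IGNORE "gateway noise";'))
print(translate_rule(r'/ by .*? with (ESMTPA|HTTP)\;? /i SET auth = $1;'))
```

Because the regex text is copied through untouched, the generated lines have the same form as the hand-written Perl the rules came from; a real translator would also need to handle whatever other actions the full rule set uses.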

Little languages like this are a great way to turn code into data, and by developing them in an ad-hoc way you tend to produce a language that best expresses the task at hand - and with a little bit of Perl it's not too difficult to turn them into something executable, either interpreted or translated.
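The interpreted route can be sketched in a few dozen lines. The example below uses Python instead of Perl, only as an illustration of the idea and not as the actual Email::Received implementation. It assumes the rule shape seen in the three sample rules, assumes that IGNORE means "discard this header line", and uses made-up names (parse_rules, apply_rules) and made-up sample headers. The three regexes happen to be valid in Python's re module as well; other Perl patterns might not be.

```python
import re

# Sample rules in the post's notation (illustrative subset).
RULES = r'''
/^\(/                          IGNORE "gateway noise";
/\sid\s+<?([^\s<>;]{3,})/      SET id = $1;
/ by .*? with (ESMTPA|ESMTPSA|LMTPA|LMTPSA|ASMTP|HTTP)\;? /i SET auth = $1;
'''

# Assumed rule shape: /regex/flags  ACTION argument;
RULE_RE = re.compile(r'^/((?:\\.|[^/\\])*)/([a-z]*)\s+(IGNORE|SET)\s+(.*?);\s*$')
SET_RE = re.compile(r'^(\w+)\s*=\s*\$(\d+)$')


def parse_rules(text):
    """Parse rule text into a list of (compiled regex, action, argument)."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        m = RULE_RE.match(line)
        if m is None:
            raise ValueError(f"cannot parse rule: {line!r}")
        pattern, flags, action, arg = m.groups()
        regex = re.compile(pattern, re.IGNORECASE if "i" in flags else 0)
        if action == "SET":
            s = SET_RE.match(arg)
            if s is None:
                raise ValueError(f"cannot parse SET argument: {arg!r}")
            rules.append((regex, "SET", (s.group(1), int(s.group(2)))))
        else:
            rules.append((regex, "IGNORE", arg.strip('"')))
    return rules


def apply_rules(rules, header):
    """Run every rule over one header line; None means the line was ignored."""
    fields = {}
    for regex, action, arg in rules:
        m = regex.search(header)
        if m is None:
            continue
        if action == "IGNORE":
            return None
        name, group = arg
        fields[name] = m.group(group)
    return fields


rules = parse_rules(RULES)
print(apply_rules(rules, "from a.example.org by mx.example.net with ESMTPA id 1AbCdE-0001; Wed, 22 Mar 2006"))
print(apply_rules(rules, "(qmail 27981 invoked by uid 225); 14 Mar 2003"))
```

The rules stay plain data throughout: the same list could be fed to a code generator instead of this loop without touching the rule text.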


Posted at 22:31:21 in spammonkey technology yak-shaving