On Mon, 2006-10-02 at 13:03 +0300, Matti Aarnio wrote:
I do think that Markov Chains combined with Bayes Statistics
might do a wee bit better. (Except with very short emails.)
However, all that these things are able to do is essentially
grow the key database when spammers are producing new mutated
(mis-spelled) texts by mixing in spaces, punctuation, and even
occasional characters.
For recognizing those pill merchants one needs complex software
to read the site at the URL, and to read texts out of the IMAGES
at the site. Captcha to get through spam filters...
Could a heuristic be added to reject messages with wildly incorrect
dates? I notice that the last 5-10 messages in my LKML folder every
morning are spam with a date that's ~24 hours in the future.
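
A rough sketch of what such a check might look like, in Python rather
than whatever the list server actually runs. The function name, the
12-hour threshold, and the choice to treat a missing or unparseable
Date header as suspicious are all my own assumptions, not anything
the list software is known to do:

```python
from datetime import datetime, timedelta, timezone
from email import message_from_string
from email.utils import parsedate_to_datetime

# Assumed threshold: anything more than 12 hours away from the time
# the server received the message counts as "wildly incorrect".
MAX_SKEW = timedelta(hours=12)


def date_is_implausible(raw_message, received_at, max_skew=MAX_SKEW):
    """Return True if the Date header is missing, unparseable, or too
    far (past or future) from the timezone-aware time of receipt."""
    msg = message_from_string(raw_message)
    header = msg.get("Date")
    if header is None:
        return True  # assumption: no Date header is treated as suspicious
    try:
        sent_at = parsedate_to_datetime(header)
    except (TypeError, ValueError):
        return True  # assumption: a garbage Date header is suspicious too
    if sent_at.tzinfo is None:
        # A "-0000" zone yields a naive datetime; interpret it as UTC.
        sent_at = sent_at.replace(tzinfo=timezone.utc)
    return abs(sent_at - received_at) > max_skew


if __name__ == "__main__":
    received = datetime(2006, 10, 2, 10, 0, tzinfo=timezone.utc)
    spam = "Date: Tue, 03 Oct 2006 10:00:00 +0000\nSubject: pills\n\nbody\n"
    print(date_is_implausible(spam, received))
```

A message dated ~24 hours ahead would trip this check, while ordinary
clock drift and timezone mistakes of a few hours would pass. Whether
such mail should be rejected outright or only scored is a separate
policy question.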