Re: Spam, bogofilter, etc

From: Martin J. Bligh
Date: Mon Oct 02 2006 - 11:26:26 EST


Lee Revell wrote:
> On Mon, 2006-10-02 at 13:03 +0300, Matti Aarnio wrote:
>> I do think that Markov Chains combined with Bayes Statistics might
>> do a wee bit better. (Except with very short emails.)
>> However all that these things are able to do is essentially
>> grow the key database when spammers are producing new mutated
>> (mis-spelled) texts by mixing in spaces, punctuation, and even
>> occasional characters.
>>
>> For recognizing those pill merchants one needs complex software
>> to read the site at the URL, and to read texts out of the IMAGES
>> at the site. Captcha to get thru spam filters...
>
> Could a heuristic be added to reject messages with wildly incorrect
> dates? I notice that the last 5-10 messages in my LKML folder every
> morning are spam with a date that's ~24 hours in the future.
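
For illustration, the date check Lee suggests could look roughly like the
sketch below. Nothing here is what the list server actually runs: the
function name, the 12-hour threshold, and the choice to treat a missing or
unparseable Date header as suspicious are all assumptions.

```python
from datetime import datetime, timedelta, timezone
from email import message_from_string
from email.utils import parsedate_to_datetime

# Assumed threshold: anything dated more than 12 hours ahead of the
# time we received it is treated as implausible.
MAX_FUTURE_SKEW = timedelta(hours=12)


def has_implausible_date(raw_message, received_at, max_future_skew=MAX_FUTURE_SKEW):
    """Return True if the Date header is missing, unparseable, or too far
    in the future relative to received_at (a timezone-aware datetime)."""
    msg = message_from_string(raw_message)
    header = msg.get("Date")
    if header is None:
        return True  # assumption: no Date header counts as suspicious
    try:
        sent_at = parsedate_to_datetime(header)
    except (TypeError, ValueError):
        return True  # unparseable date
    if sent_at.tzinfo is None:
        # A "-0000" zone yields a naive datetime; interpret it as UTC.
        sent_at = sent_at.replace(tzinfo=timezone.utc)
    return sent_at - received_at > max_future_skew


if __name__ == "__main__":
    received = datetime(2006, 10, 2, 16, 30, tzinfo=timezone.utc)
    spam = "Date: Tue, 03 Oct 2006 11:26:26 -0500\nSubject: x\n\nbody\n"
    print(has_implausible_date(spam, received))
```

The comparison uses the receiving server's clock rather than anything in
the message, so a sender cannot forge their way around it.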

If you got rid of "slut" and "schoolgirl" that'd get rid of half of it.
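
A plain word filter like that only helps if it sees through the space and
punctuation tricks Matti mentions. A rough sketch, assuming a hypothetical
blocklist and a made-up table of character substitutions (neither comes
from any real filter):

```python
import re

# Hypothetical digit/symbol substitutions spammers use in place of letters.
LOOKALIKES = str.maketrans(
    {"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "@": "a", "$": "s"}
)


def normalize(text):
    """Lowercase, map look-alike characters to letters, then drop
    everything that is not a-z (spaces, dots, dashes, and so on)."""
    text = text.lower().translate(LOOKALIKES)
    return re.sub(r"[^a-z]+", "", text)


def matches_blocklist(subject, blocklist):
    """Return True if any blocklisted word appears in the squashed subject."""
    squashed = normalize(subject)
    return any(word in squashed for word in blocklist)


if __name__ == "__main__":
    blocklist = {"pills"}  # placeholder entry, not a real list
    print(matches_blocklist("Cheap p.1.l.l.s here", blocklist))
    print(matches_blocklist("[PATCH] fix scheduler race", blocklist))
```

Squashing out the spaces also joins neighbouring words, so "spill stop"
would match "pills" too. That is the price of catching "p i l l s", and a
real filter would want to weigh it rather than reject outright.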

M.