read-text and see and the variable punc. The function read-text reads one word at a time, the function see updates the count of how many times a word has been seen, and punc specifies the punctuation characters that should be treated as separate words.
bad] and one *good* for the words from the good email file [called good by Paul Graham].
I have removed the call to string-downcase in the function read-text since words should be kept in their original case. I have added additional characters to the list in punc since I want them to be stored as separate symbols. You are free to add more characters if needed.
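To make the effect of these two changes concrete, here is a minimal sketch of case-preserving tokenization with punctuation split off as separate tokens. It is not the read-text or see provided with the assignment: the names *example-punc*, split-tokens and count-token are hypothetical, the punctuation list is only an example, and tokens are kept as strings in an EQUAL hash table rather than as symbols.

```lisp
;; Hypothetical stand-in for punc; the real list may contain other characters.
(defparameter *example-punc* '(#\. #\, #\! #\? #\$))

(defun split-tokens (line &optional (punc *example-punc*))
  "Split LINE on whitespace and return a list of strings.
Each character in PUNC becomes a token of its own.
Case is preserved: string-downcase is never called."
  (let ((tokens '())
        (current '()))
    (flet ((flush ()
             ;; Emit the word collected so far, if any.
             (when current
               (push (coerce (reverse current) 'string) tokens)
               (setf current '()))))
      (loop for ch across line
            do (cond ((member ch '(#\Space #\Tab #\Newline)) (flush))
                     ((member ch punc)
                      (flush)
                      (push (string ch) tokens))
                     (t (push ch current))))
      (flush))
    (nreverse tokens)))

(defun count-token (token table)
  "Increment the count stored for TOKEN in TABLE (an EQUAL hash table)."
  (incf (gethash token table 0)))

;; Usage: "FREE" and "free" are counted separately, and "!" is its own token.
(defparameter *counts* (make-hash-table :test #'equal))
(dolist (tok (split-tokens "FREE money!!! Click, now."))
  (count-token tok *counts*))
;; (split-tokens "FREE money!!! Click, now.")
;; => ("FREE" "money" "!" "!" "!" "Click" "," "now" ".")
```

Keeping the case means that a word written in capitals gets its own count, separate from the lowercase form.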
;; ngood = number of good messages
;; good = hash table of good words
(let ((g (* 2 (or (gethash word good) 0)))
      (b (or (gethash word bad) 0)))
  (unless (< (+ g b) 5)
    (max .01
         (min .99 (float (/ (min 1 (/ b nbad))
                            (+ (min 1 (/ g ngood))
                               (min 1 (/ b nbad)))))))))
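As a worked example, the expression above can be wrapped in a function and tried on a few counts. The function name word-probability, the tables and the counts below are made up for illustration; the body is the expression itself, unchanged. Strings are used as keys here, so the tables are created with an EQUAL test.

```lisp
(defun word-probability (word good bad ngood nbad)
  "Return the spam probability of WORD, or NIL if it has been seen too rarely."
  (let ((g (* 2 (or (gethash word good) 0)))   ; good counts are doubled
        (b (or (gethash word bad) 0)))
    (unless (< (+ g b) 5)                      ; ignore words with too little data
      (max .01
           (min .99 (float (/ (min 1 (/ b nbad))
                              (+ (min 1 (/ g ngood))
                                 (min 1 (/ b nbad))))))))))

;; Made-up counts: 100 good and 100 bad messages.
(defparameter *good-example* (make-hash-table :test #'equal))
(defparameter *bad-example* (make-hash-table :test #'equal))
(setf (gethash "hello" *good-example*) 10
      (gethash "hello" *bad-example*) 5
      (gethash "viagra" *bad-example*) 10
      (gethash "rare" *bad-example*) 2)

;; "hello":  g = 20, b = 5  -> (5/100) / (20/100 + 5/100) = 0.2
;; "viagra": g = 0,  b = 10 -> ratio 1.0, clamped down to 0.99
;; "rare":   g + b = 2 < 5  -> NIL
(word-probability "hello" *good-example* *bad-example* 100 100)
(word-probability "viagra" *good-example* *bad-example* 100 100)
(word-probability "rare" *good-example* *bad-example* 100 100)
```

Because of the clamping, a word never gets a probability of exactly 0 or 1, however one-sided its counts are.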
;; call them probs
(let ((prod (apply #'* probs)))
  (/ prod (+ prod (apply #'* (mapcar #'(lambda (x) (- 1 x)) probs)))))
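The combining step can be tried out the same way. The function name combined-probability and the probability lists below are assumptions for illustration; the body is the expression above, unchanged.

```lisp
(defun combined-probability (probs)
  "Combine a list of per-word spam probabilities into a single probability."
  (let ((prod (apply #'* probs)))
    (/ prod (+ prod (apply #'* (mapcar #'(lambda (x) (- 1 x)) probs))))))

;; Two strongly spammy words outweigh one mildly innocent word:
;; the result is well above 0.9.
(combined-probability '(0.99 0.99 0.2))

;; Evidence that cancels out gives 0.5:
;; 0.25 / (0.25 + 0.25)
(combined-probability '(0.5 0.5))
```

Since the individual probabilities are clamped to the range .01 to .99, neither product can be zero, so the division is always defined.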