see and the variable punc. The function read-text reads one word at a time, the function see updates the count of how many times a word has been seen, and punc specifies the punctuation characters that should be treated as separate words.
bad] and one *good* for the words from the good email file [called good by Paul Graham]. I have removed the call to string-downcase in the function read-text since words should be kept with the original case. I have added additional characters to the list in punc since I want them to be stored as separate symbols. You are free to add more characters if needed.
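To make the effect of these two changes concrete, here is a minimal sketch of a tokenizer that keeps the original case and emits punctuation characters as words of their own. It is a simplified stand-in that works on a string rather than a file, not the actual read-text, see, or punc: the names *punc-sketch*, tokenize-sketch, and count-words-sketch, the particular character list, and the use of strings in an equal hash table (instead of symbols) are all assumptions made for illustration.

```lisp
;; Hypothetical sketch: every name ending in -SKETCH is invented here and is
;; not part of the article's code.
(defparameter *punc-sketch* '(#\. #\, #\! #\? #\$ #\( #\))
  "Characters stored as one-character words of their own (an assumed list).")

(defun tokenize-sketch (line)
  "Split LINE on whitespace, keep case, and emit punctuation as separate words."
  (let ((words '())
        (current '()))
    (flet ((flush ()
             ;; Turn the characters collected so far into one word.
             (when current
               (push (coerce (reverse current) 'string) words)
               (setf current '()))))
      (loop for ch across line
            do (cond ((member ch *punc-sketch*)
                      (flush)
                      (push (string ch) words))
                     ((member ch '(#\Space #\Tab #\Newline))
                      (flush))
                     (t (push ch current))))
      (flush))
    (nreverse words)))

(defun count-words-sketch (words table)
  "Increment the count in TABLE for every word in WORDS; TABLE must use EQUAL."
  (dolist (w words table)
    (incf (gethash w table 0))))
```

With this sketch, (tokenize-sketch "Free $$$ now, Free!") returns ("Free" "$" "$" "$" "now" "," "Free" "!"). Because no downcasing takes place, "Free" and "free" would be counted as different words.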
; ngood = number of good messages -- good = hash table of good words
(let ((g (* 2 (or (gethash word good) 0)))
      (b (or (gethash word bad) 0)))
  (unless (< (+ g b) 5)
    (max .01
         (min .99 (float (/ (min 1 (/ b nbad))
                            (+ (min 1 (/ g ngood))
                               (min 1 (/ b nbad)))))))))
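The fragment above has free variables (word, good, bad, ngood, nbad), so it cannot be evaluated on its own. As a rough sketch, it could be wrapped in a function as shown here. The name word-prob-sketch, the argument order, and the sample counts are assumptions made for illustration; the arithmetic is the same as in the fragment.

```lisp
;; Hypothetical wrapper around the fragment above; the function name and
;; argument order are assumptions, not part of the original code.
(defun word-prob-sketch (word good bad ngood nbad)
  "Return the spam probability of WORD, or NIL if it was seen too rarely."
  (let ((g (* 2 (gethash word good 0)))   ; good counts are doubled
        (b (gethash word bad 0)))
    (unless (< (+ g b) 5)                 ; too few sightings: no estimate
      (max .01
           (min .99 (float (/ (min 1 (/ b nbad))
                              (+ (min 1 (/ g ngood))
                                 (min 1 (/ b nbad))))))))))

;; Sample data, invented for this example: 100 good and 100 bad messages.
(defparameter *good-sketch* (make-hash-table :test #'equal))
(defparameter *bad-sketch* (make-hash-table :test #'equal))
(setf (gethash "meeting" *good-sketch*) 5
      (gethash "meeting" *bad-sketch*) 10
      (gethash "viagra" *bad-sketch*) 10
      (gethash "rare" *good-sketch*) 1
      (gethash "rare" *bad-sketch*) 1)
```

With these counts, "viagra" (never seen in good mail) is clamped to .99, "meeting" comes out at 0.5 because its doubled good count equals its bad count, and "rare" returns NIL because its weighted total of 3 is below the threshold of 5.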
;; call them probs
(let ((prod (apply #'* probs)))
  (/ prod
     (+ prod
        (apply #'* (mapcar #'(lambda (x) (- 1 x))
                           probs)))))
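To see how the combining step behaves, the fragment above can be wrapped in a function and tried on a few hand-made lists. This is only a sketch: the name combined-prob-sketch and the sample probabilities are assumptions made for illustration.

```lisp
;; Hypothetical wrapper around the fragment above; the name is an assumption.
(defun combined-prob-sketch (probs)
  "Combine the individual word probabilities in PROBS into one probability."
  (let ((prod (apply #'* probs)))
    (/ prod
       (+ prod
          (apply #'* (mapcar #'(lambda (x) (- 1 x)) probs))))))

;; Two neutral words leave the result neutral.
(combined-prob-sketch '(0.5 0.5))        ; 0.25 / (0.25 + 0.25)

;; Two strongly spammy words outweigh one neutral word.
(combined-prob-sketch '(0.99 0.99 0.5))  ; well above 0.99

;; Two strongly innocent words push the result close to 0.
(combined-prob-sketch '(0.01 0.01))      ; well below 0.01
```

Note that apply passes every element of probs as a separate argument, so the list length is bounded by call-arguments-limit. If probs could be long, (reduce #'* probs) computes the same product without that limit.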