Spam cleaning thoughts
----------------------

JMF 15 Jan 2006
---------------

Need signature and clean regexps... options to clean automatically or offer for human check

For each page,
  get the contents of the Edit form
  check it for match with any signature
  if match, select the offending part, allow human check.

Should use TDD - need JavaScript unit test framework.

Should do some analysis of the problem - build database of
page name, updated by, IP address, date-time, versions, size

TDD of this, too! Or at least, build from parts tried in IRB.

Two key aspects:
- find all page URLs
- for given page, extract required data

Relevant snippets - from list all http://wiki.rubyonrails.com/rails/pages
check_box
function does just that, insert the checkbox. I'd like to see an added parameter "label" which outputs
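
Rough sketch of the per-page check from the 15 Jan notes (match the edit-form text against each signature, pull out the offending part, then either clean automatically or hand it to a human). The two signatures and the names SIGNATURES, find_spam and clean are made up for illustration; the real signatures still have to come from the spam actually seen on the wiki.

```ruby
# Hypothetical signatures, for illustration only.
SIGNATURES = [
  /<div style="display:\s*none">.*?<\/div>/mi,   # hidden block of links
  /\[https?:\/\/[^\s\]]*casino[^\]]*\]/i          # bracketed link to a "casino" URL
]

# Returns [signature, offending_text] pairs found in one page's edit-form text.
# An empty array means the page matched no signature.
def find_spam(text, signatures = SIGNATURES)
  hits = []
  signatures.each do |sig|
    # Block form of scan sets the match data for every hit.
    text.scan(sig) { hits << [sig, Regexp.last_match[0]] }
  end
  hits
end

# Automatic clean: delete everything any signature matches.
def clean(text, signatures = SIGNATURES)
  signatures.inject(text) { |cleaned, sig| cleaned.gsub(sig, "") }
end

page = "Welcome.\n<div style=\"display:none\">buy pills</div>\nReal content."
find_spam(page).each do |sig, offending|
  puts "matched #{sig.inspect}: #{offending}"   # this is what the human check would see
end
```

Keeping find_spam separate from clean means the same signatures serve both modes: show the hits for a human check, or call clean directly.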
Should I harvest the escaped or the unescaped version?
I.e. should I harvest by fetching page source, or by driving a browser and
saving textarea content? (Watir?)
N.B. Should be using XML access rather than pulling out content through regexps!
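
A possible way to try the XML route in IRB, which also shows the escaped/unescaped difference. This assumes the edit page is well-formed XHTML (REXML will refuse anything else) and that the textarea is named "content"; both are guesses, and the sample markup and the name textarea_content are invented.

```ruby
require "rexml/document"

# Invented sample of an edit page; the real markup may differ.
SAMPLE_EDIT_PAGE = <<XML
<html><body>
<form id="edit"><textarea name="content">See &lt;b&gt;HomePage&lt;/b&gt; &amp; more</textarea></form>
</body></html>
XML

# Returns both forms of the textarea text, or nil if no such textarea exists.
def textarea_content(xhtml, name = "content")
  doc = REXML::Document.new(xhtml)
  textarea = REXML::XPath.first(doc, "//textarea[@name='#{name}']")
  return nil if textarea.nil?
  {
    :escaped   => textarea.get_text.to_s,  # as it appears in the page source
    :unescaped => textarea.text            # entities resolved, as a browser shows it
  }
end

content = textarea_content(SAMPLE_EDIT_PAGE)
puts content[:escaped]
puts content[:unescaped]
```

Signatures written against what a person sees in the edit box would need the unescaped form; signatures copied from page source would need the escaped one.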
There are 13485 page versions.
JMF 18th January 2006
---------------------
Spam signatures: