Here are some sources of data that can be used with machine learning
or data mining algorithms.
If you want to make statistically valid comparisons, you need to be
aware of the potential pitfalls. For a good introduction to these, see the
paper "On Comparing Classifiers: Pitfalls to Avoid and a Recommended
Approach" ( http://www.cs.jhu.edu/~salzberg/critique.ps )
by Steven Salzberg ( http://www.cs.jhu.edu/~salzberg/ ).