THE FOLLOWING SOURCES WERE FOUND VIA A GOOGLE SEARCH FOR "data mining sample data" ON JUL 12, 2019.
THE FOLLOWING SOURCES WERE FOUND IN 2017.
SNAP (Stanford Network Analysis Project)
Stanford Network Analysis Platform (SNAP) is a general-purpose network analysis and graph mining library.
http://snap.stanford.edu/
Bureau of Transportation Statistics
https://www.bts.gov/
The SuiteSparse Matrix Collection (formerly known as the University of Florida Sparse Matrix Collection)
https://www.cise.ufl.edu/research/sparse/matrices/
MovieLens (GroupLens)
https://grouplens.org/datasets/
Kaggle - data science and machine learning competitions and open datasets
https://www.kaggle.com
https://www.kaggle.com/datasets
https://www.kaggle.com/c/word2vec-nlp-tutorial/data
Use Google's Word2Vec for movie reviews
BSDS (The Berkeley Segmentation Dataset and Benchmark)
https://www.eecs.berkeley.edu/Research/Projects/CS/vision/bsds/
Testbed images for image segmentation and edge detection in natural images.
Network Data
http://openflights.org/data.html#route
Airport-to-airport route network: 539 airlines, 2939 airport pairs.
Word count data
http://www.ngrams.info/intro.asp
To get 3-grams of English text.
https://books.google.com/ngrams
sample data
MSR GPS Privacy Dataset 2009, March 2017
Seattle region GPS tracking data.
https://www.microsoft.com/en-us/download/details.aspx?id=54965
http://www-personal.umich.edu/~mejn/netdata/
Links to network datasets compiled over the years.
http://socialcomputing.asu.edu/datasets/YouTube
Co-occurrence data on users' access to YouTube videos.
http://socialcomputing.asu.edu/datasets/BlogCatalog
Friendship information among bloggers.
10,000 Facebook status updates of 250 users + personality + Facebook social network properties, including network size, betweenness centrality, density and transitivity.
http://mypersonality.org/wiki/lib/exe/fetch.php?media=wiki:mypersonality_final.zip
Cite: Celli F., Pianesi F., Stillwell D., Kosinski M. (2013) Workshop on Computational Personality Recognition (Shared Task). In Proceedings of WCPR13, in conjunction with ICWSM-13.
Image and Video Data
Detection of Moving Objects
http://limu.ait.kyushu-u.ac.jp/dataset/en/
http://wordpress-jodoin.dmi.usherb.ca/dataset2012/
Identification of changing or moving areas in a camera's field of view.
This dataset contains 6 video categories with 4 to 6 video sequences in each category.
Test Images for Wallflower Paper (background subtraction) February 2017
http://research.microsoft.com/en-us/um/people/jckrumm/wallflower/testimages.htm
See the "Downloads" tab.