Description
Linear Algebra has contributed many methods for handling very large quantities
of numerical data.
Here we examine many of these linear algebra methods
and how they have been applied to the exploration and analysis of very large
data collections.
After a brief review of some basic concepts in linear algebra,
most of the class will be devoted to how these linear
algebra methods have been used in information retrieval,
data mining,
unsupervised clustering,
bioinformatics,
social networking,
machine learning
and the like.
Examples of methods we will examine are Latent Semantic Indexing,
Least Squares Fit (possibly under a sparsity constraint),
Spectral Partitioning,
PageRank,
Support Vector Machines,
and recent ideas on sparse approximation methods using L1 regularization.
A collection of basic
research papers, some of a tutorial nature, will be used for the class.
Examples will be taken from vision recognition systems, biological gene
analysis, and document retrieval.
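As a small taste of the kind of computation involved, the sketch below shows
the truncated SVD at the heart of Latent Semantic Indexing: by the
Eckart-Young theorem, the rank-k truncation is the best rank-k approximation
in the Frobenius norm. The tiny term-document matrix is invented purely for
illustration.

```python
import numpy as np

# Tiny term-document matrix (rows = terms, columns = documents).
# Entries are raw term counts, made up for this example.
A = np.array([
    [2., 0., 1., 0.],
    [1., 1., 0., 0.],
    [0., 2., 0., 1.],
    [0., 0., 1., 2.],
])

# Thin SVD: A = U @ diag(s) @ Vt, singular values sorted in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rank-2 truncation: the best rank-2 approximation of A in the
# Frobenius norm, the core computation in LSI.
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# The approximation error equals the energy in the discarded singular values.
err = np.linalg.norm(A - A_k, 'fro')
print(err, np.sqrt(np.sum(s[k:] ** 2)))  # these two agree
```

In LSI, queries and documents are then compared in the k-dimensional
"concept" space spanned by the leading singular vectors rather than in the
original term space.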
Prerequisites
Students should be familiar with basic linear algebra concepts and methods
such as Gaussian elimination for systems of linear equations, plus some
familiarity with concepts such as matrix eigenvalues, singular values, and
matrix least squares problems, though some time will be spent reviewing
these latter topics.
Basic concepts in optimization like first order optimality conditions
and duality will also be useful.
Work Plan
Students will be expected to do the following.
 Students are to present [at least] one research
or tutorial paper during the course of the
semester, by rotation.
Talks should highlight the main points and summarize the theoretical and
experimental results presented in the paper. For highly technical papers,
you should at least be able to explain what the main results are and what
they mean, even if the detailed derivations are too complex for a short
presentation.
Long papers may be split up into parts presented separately (e.g.,
basic results/algorithms and examples/applications).
(30% of the final grade)
 The papers listed here are a mix of historical foundational papers and
recent papers showing variations on the original ideas or how the abstract
methods have been applied to specific applications. Students should either
select from among these papers, or may select another similar relevant paper
after checking first with the instructor.

Submit a weekly synopsis of each week's material, with your own reactions.
This should be limited to a paragraph or two pointing out the main
takeaway message you got from each lecture.
(10% of the final grade)
 Also submit, as a separate item, some comments on how the lecture
was presented, to be passed on to the speaker (without attribution).
 Develop and carry out a research project based on one or more recent
research papers devoted to topics studied in this class.
A research project can be a literature survey, an experimental study of
methods proposed in a paper, or an experimental study of an application of
one of the methods studied in this class. To give an approximate scale of
the effort required, you should expect to devote about 50 hours during the
course of the semester. In about 3 weeks, you should submit a 1-page
description of your proposed project.
(10% of the final grade)
 Write a 10-15 page report on your research project at a level that would
be appropriate for publication in a workshop or conference in Computer
Science.
(40% of the final grade)
 Give a short presentation on your project during the last 2 weeks of the
semester.
(10% of the final grade)
 Your project will count toward the Project Requirements for a Plan C MS
degree.
 Extra effort on the project or participation in class will count as an
extra boost to your final grade.
 All submissions should be via
Moodle.
Topics
Link to Current Semester Plan and List of Papers
(Link to Old Semester Plan and List of Papers from the previous offering of
this class in 2012)
 Review of SVD and related topics
 Classical Dimensionality Reduction
 Latent Semantic Indexing
 Principal Component Analysis
 Tensor decompositions
 Concept Decomposition
 Randomized Matrix Approximations
 Specialized Matrix Approximations in applied statistics
 Fisher Linear Discriminant Analysis (LDA)
 Multi-Dimensional Scaling (MDS)
 Isomap and Locally Linear Embedding (LLE)
 Non-Negative Matrix Factorization (NMF)
 Graphs and Link Analysis → Web Searches
 Power Law Distribution
 Importance of Nodes/Links via PageRank, Commute Times.
 Hubs and Authorities (HITS)
 Connectivity, Almost Shortest Paths, Partitioning.
 Spectral Graph Partitioning
 Co-Clustering
 Dynamics: Consensus Computations.
 Theoretically fast matrix inversion.
 Sparse Approximations and Optimization
 Review of First Order Optimality Conditions in Optimization.
 Basis Pursuit, Compressed Sensing
 L1 Regularization (LASSO)
 Group LASSO → sparse basis
 Variations: low rank (nuclear norm), low variation (image processing)
 Clustering with constraints.
 Iterative Algorithms, Proximal Methods.
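As a pointer to the flavor of the last group of topics, here is a minimal
sketch (in NumPy, on synthetic data with invented dimensions) of the
iterative soft-thresholding algorithm (ISTA), a basic proximal method for
the L1-regularized least squares (LASSO) problem.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic sparse recovery problem: observe b = A @ x_true + noise,
# where x_true has only a few nonzero entries.
m, n = 40, 100
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[[3, 30, 70]] = [2.0, -1.5, 1.0]
b = A @ x_true + 0.01 * rng.standard_normal(m)

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: elementwise shrinkage toward 0."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

# ISTA for min_x  0.5 * ||A x - b||^2 + lam * ||x||_1:
# alternate a gradient step on the smooth term with soft-thresholding.
lam = 0.05                       # L1 regularization weight
L = np.linalg.norm(A, 2) ** 2    # Lipschitz constant of the gradient
x = np.zeros(n)
for _ in range(500):
    x = soft_threshold(x - (A.T @ (A @ x - b)) / L, lam / L)

print(np.count_nonzero(np.abs(x) > 1e-6))  # a sparse solution
```

The soft-thresholding step is what produces exact zeros in the iterates,
which is the mechanism behind the sparsity-inducing effect of the L1 penalty
discussed in the sparse approximation topics above.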
For Further Information
Contact Daniel Boley, 6209 KHKH, boley@cs.umn.edu,
625-3887.
http://www-users.cs.umn.edu/~boley