# NUMERICAL LINEAR ALGEBRA IN DATA EXPLORATION

### MW 4-5:15pm, AkerH 227

Linear algebra has contributed many methods for handling very large quantities of numerical data. Here we examine a number of these methods and how they have been applied to the exploration and analysis of very large data collections. After a brief review of some basic concepts in linear algebra, most of the class will be devoted to how these methods have been used in information retrieval, data mining, unsupervised clustering, bioinformatics, social networking, machine learning, and the like. Examples of methods we will examine are Latent Semantic Indexing, Least Squares Fit (possibly under a sparsity constraint), Spectral Partitioning, PageRank, Support Vector Machines, and recent ideas on sparse approximation methods using L1 regularization. A collection of basic research papers, some of a tutorial nature, will be used for the class. Examples will be taken from vision recognition systems, biological gene analysis, and document retrieval.

### Prerequisites

Students should be familiar with basic linear algebra concepts and methods such as Gaussian elimination for systems of linear equations, and should have some familiarity with concepts such as matrix eigenvalues, singular values, and matrix least squares problems, though some time will be spent reviewing these latter topics. Basic concepts in optimization, such as first-order optimality conditions and duality, will also be useful.

### Work Plan

Students will be expected to do the following.
• Present one or two research or tutorial papers during the course of the semester, by rotation. Talks should highlight the main points and summarize the theoretical and experimental results presented in the paper. For very technical theoretical papers, you should at least be able to explain what the main results are and what they mean, even if the detailed derivations are too complex for a short presentation. Some conference papers are short and will be presented via a short talk (two per class period). Very long papers will be split into parts presented separately (e.g., basic results/algorithms and examples/applications), and these parts might be presented by different students.
• Submit a short synopsis of each week's material, with your own reactions.
• Develop and carry out a research project based on one or more recent research papers devoted to topics studied in this class. A research project can be a literature survey, an experimental study of methods proposed in a paper, or an experimental study of an application of one of the methods studied in this class. As an approximate scale of the effort required, you should expect to devote about 50 hours to the project over the course of the semester.
• Write a 10-15 page report on your research project.
• Give a short presentation on your project during the last 2-3 weeks of the semester.
Your project will count toward the Project Requirements for a Plan C MS degree in Computer Science.

### Sample Topics

• Intro: Basics of Eigenvalues, PCA definition
• Text Mining
• Tensor Decompositions
• Feature extraction
• Kernel Methods
• Dimensionality Reduction
• Multidimensional Scaling and non-linear dimensionality reduction
• Methods related to Spectral Graph Partitioning
• Importance Ranking in Graphs and Link Analysis
• Bounds on eigenvalues related to how well a graph can be cut
• Graph Related Methods + a sparse paper
• Convolutional Neural Nets