by A. W. van der Vaart
About the book: Here is a practical and mathematically rigorous introduction to the field of asymptotic statistics. In addition to most of the standard topics of an asymptotics course--likelihood inference, M-estimation, the theory of asymptotic efficiency, U-statistics, and rank procedures--the book also presents recent research topics such as semiparametric models, the bootstrap, and empirical processes and their applications. The topics are organized from the central idea of approximation by limit experiments, one of the book's unifying themes that mainly entails the local approximation of the classical i.i.d. set up with smooth parameters by location experiments involving a single, normally distributed observation.
Notes: This book is used to teach students at Berkeley. As a book it shows how many ideas in inference (M estimation---which includes maximum likelihood and empirical risk minimization-the bootstrap, semiparametrics, etc) repose on top of empirical process theory.
by Y. Nesterov
About the book: It was in the middle of the 1980s, when the seminal paper by Kar markar opened a new epoch in nonlinear optimization. The importance of this paper, containing a new polynomial-time algorithm for linear op timization problems, was not only in its complexity bound. At that time, the most surprising feature of this algorithm was that the theoretical pre diction of its high efficiency was supported by excellent computational results. This unusual fact dramatically changed the style and direc tions of the research in nonlinear optimization. Thereafter it became more and more common that the new methods were provided with a complexity analysis, which was considered a better justification of their efficiency than computational experiments. In a new rapidly develop ing field, which got the name "polynomial-time interior-point methods", such a justification was obligatory. Afteralmost fifteen years of intensive research, the main results of this development started to appear in monographs [12, 14, 16, 17, 18, 19]. Approximately at that time the author was asked to prepare a new course on nonlinear optimization for graduate students. The idea was to create a course which would reflect the new developments in the field. Actually, this was a major challenge. At the time only the theory of interior-point methods for linear optimization was polished enough to be explained to students. The general theory of self-concordant functions had appeared in print only once in the form of research monograph .
Notes: Y Nesterov's book gives a way to start to understand lower bounds in optimization.
by Kevin P. Murphy
About the book: Today's Web-enabled deluge of electronic data calls for automated methods of data analysis. Machine learning provides these, developing methods that can automatically detect patterns in data and then use the uncovered patterns to predict future data. This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach. The coverage combines breadth and depth, offering necessary background material on such topics as probability, optimization, and linear algebra as well as discussion of recent developments in the field, including conditional random fields, L1 regularization, and deep learning. The book is written in an informal, accessible style, complete with pseudo-code for the most important algorithms. All topics are copiously illustrated with color images and worked examples drawn from such application domains as biology, text processing, computer vision, and robotics. Rather than providing a cookbook of different heuristic methods, the book stresses a principled model-based approach, often using the language of graphical models to specify models in a concise and intuitive way. Almost all the models described have been implemented in a MATLAB software package -- PMTK (probabilistic modeling toolkit) -- that is freely available online. The book is suitable for upper-level undergraduates with an introductory-level college math background and beginning graduate students.
Notes: Nando de Freitas recommends this book for machine learning. Kevin Murphy will be coming out with an updated version of the same book that will have a much better deep learning section.
by Gareth James,Daniela Witten,Trevor Hastie,Robert Tibshirani
About the book: An Introduction to Statistical Learning provides an accessible overview of the field of statistical learning, an essential toolset for making sense of the vast and complex data sets that have emerged in fields ranging from biology to finance to marketing to astrophysics in the past twenty years. This book presents some of the most important modeling and prediction techniques, along with relevant applications. Topics include linear regression, classification, resampling methods, shrinkage approaches, tree-based methods, support vector machines, clustering, and more. Color graphics and real-world examples are used to illustrate the methods presented. Since the goal of this textbook is to facilitate the use of these statistical learning techniques by practitioners in science, industry, and other fields, each chapter contains a tutorial on implementing the analyses and methods presented in R, an extremely popular open source statistical software platform.Two of the authors co-wrote The Elements of Statistical Learning (Hastie, Tibshirani and Friedman, 2nd edition 2009), a popular reference book for statistics and machine learning researchers. An Introduction to Statistical Learning covers many of the same topics, but at a level accessible to a much broader audience. This book is targeted at statisticians and non-statisticians alike who wish to use cutting-edge statistical learning techniques to analyze their data. The text assumes only a previous course in linear regression and no knowledge of matrix algebra.
Notes: For those who have worked on Unsupervised, Supervised and LSTM based models, and have a great conceptual understanding of the models, Nando De Freitas recommends this book before pursuing their PhD in this field.
by Richard S. Sutton,Andrew G. Barto
About the book: Reinforcement learning, one of the most active research areas in artificial intelligence, is a computational approach to learning whereby an agent tries to maximize the total amount of reward it receives when interacting with a complex, uncertain environment. In Reinforcement Learning, Richard Sutton and Andrew Barto provide a clear and simple account of the key ideas and algorithms of reinforcement learning. Their discussion ranges from the history of the field's intellectual foundations to the most recent developments and applications. The only necessary mathematical background is familiarity with elementary concepts of probability.The book is divided into three parts. Part I defines the reinforcement learning problem in terms of Markov decision processes. Part II provides basic solution methods: dynamic programming, Monte Carlo methods, and temporal-difference learning. Part III presents a unified view of the solution methods and incorporates artificial neural networks, eligibility traces, and planning; the two final chapters present case studies and consider the future of reinforcement learning.
Notes: This book is a survey of traditional Reinforcement Learning. It's perfect for those who have just started in this field and want to learn more about machine learning.
by Christopher M. Bishop
About the book: This is the first textbook on pattern recognition to present the Bayesian viewpoint. The book presents approximate inference algorithms that permit fast approximate answers in situations where exact answers are not feasible. It uses graphical models to describe probability distributions when no other books apply graphical models to machine learning. No previous knowledge of pattern recognition or machine learning concepts is assumed. Familiarity with multivariate calculus and basic linear algebra is required, and some experience in the use of probabilities would be helpful though not essential as the book includes a self-contained introduction to basic probability theory.
Notes: Said to have a probabilistic view, Juergen Schmidhuber calls this book the bible of traditional machine learning,
by Ian Goodfellow,Yoshua Bengio,Aaron Courville
About the book: "Written by three experts in the field, Deep Learning is the only comprehensive book on the subject." -- Elon Musk, cochair of OpenAI; cofounder and CEO of Tesla and SpaceXDeep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts. Because the computer gathers knowledge from experience, there is no need for a human computer operator to formally specify all the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones; a graph of these hierarchies would be many layers deep. This book introduces a broad range of topics in deep learning. The text offers mathematical and conceptual background, covering relevant concepts in linear algebra, probability theory and information theory, numerical computation, and machine learning. It describes deep learning techniques used by practitioners in industry, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling, and practical methodology; and it surveys such applications as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames. Finally, the book offers research perspectives, covering such theoretical topics as linear factor models, autoencoders, representation learning, structured probabilistic models, Monte Carlo methods, the partition function, approximate inference, and deep generative models. Deep Learning can be used by undergraduate or graduate students planning careers in either industry or research, and by software engineers who want to begin using deep learning in their products or platforms. A website offers supplementary material for both readers and instructors.
by Tom M. Mitchell
About the book: This book covers the field of machine learning, which is the study of algorithms that allow computer programs to automatically improve through experience. The book is intended to support upper level undergraduate and introductory level graduate courses in machine learning.
Notes: This book is used mostly at a university level. It covers just the fundamentals and would be perfect for a beginner in this field.
by Aurélien Géron
About the book: Through a series of recent breakthroughs, deep learning has boosted the entire field of machine learning. Now, even programmers who know close to nothing about this technology can use simple, efficient tools to implement programs capable of learning from data. This practical book shows you how.By using concrete examples, minimal theory, and two production-ready Python frameworks—scikit-learn and TensorFlow—author Aurélien Géron helps you gain an intuitive understanding of the concepts and tools for building intelligent systems. You’ll learn a range of techniques, starting with simple linear regression and progressing to deep neural networks. With exercises in each chapter to help you apply what you’ve learned, all you need is programming experience to get started.Explore the machine learning landscape, particularly neural netsUse scikit-learn to track an example machine-learning project end-to-endExplore several training models, including support vector machines, decision trees, random forests, and ensemble methodsUse the TensorFlow library to build and train neural netsDive into neural net architectures, including convolutional nets, recurrent nets, and deep reinforcement learningLearn techniques for training and scaling deep neural netsApply practical code examples without acquiring excessive machine learning theory or algorithm details
Notes: It provides the one of the best overviews on how to think about and use TensorFlow. The chapters provide not just theory, but even full-blown practice. It skips over the very basic Python theory, most of the times used as a filler by other books.
by Igor Aizenberg,Naum N. Aizenberg,Joos P.L. Vandewalle
About the book: Multi-Valued and Universal Binary Neurons deals with two new types of neurons: multi-valued neurons and universal binary neurons. These neurons are based on complex number arithmetic and are hence much more powerful than the typical neurons used in artificial neural networks. Therefore, networks with such neurons exhibit a broad functionality. They can not only realise threshold input/output maps but can also implement any arbitrary Boolean function. Two learning methods are presented whereby these networks can be trained easily. The broad applicability of these networks is proven by several case studies in different fields of application: image processing, edge detection, image enhancement, super resolution, pattern recognition, face recognition, and prediction. The book is hence partitioned into three almost equally sized parts: a mathematical study of the unique features of these new neurons, learning of networks of such neurons, and application of such neural networks. Most of this work was developed by the first two authors over a period of more than 10 years and was only available in the Russian literature. With this book we present the first comprehensive treatment of this important class of neural networks in the open Western literature. Multi-Valued and Universal Binary Neurons is intended for anyone with a scholarly interest in neural network theory, applications and learning. It will also be of interest to researchers and practitioners in the fields of image processing, pattern recognition, control and robotics.
Notes: The author wrote about “deep learning of the features of threshold Boolean functions", one of the most important objects considered in the theory of perceptrons.
by Michael Sipser
About the book: Now you can clearly present even the most complex computational theory topics to your students with Sipser's distinct, market-leading INTRODUCTION TO THE THEORY OF COMPUTATION, 3E. The number one choice for today's computational theory course, this highly anticipated revision retains the unmatched clarity and thorough coverage that make it a leading text for upper-level undergraduate and introductory graduate students. This edition continues author Michael Sipser's well-known, approachable style with timely revisions, additional exercises, and more memorable examples in key areas. A new first-of-its-kind theoretical treatment of deterministic context-free languages is ideal for a better understanding of parsing and LR(k) grammars. This edition's refined presentation ensures a trusted accuracy and clarity that make the challenging study of computational theory accessible and intuitive to students while maintaining the subject's rigor and formalism. Readers gain a solid understanding of the fundamental mathematical properties of computer hardware, software, and applications with a blend of practical and philosophical coverage and mathematical treatments, including advanced theorems and proofs. INTRODUCTION TO THE THEORY OF COMPUTATION, 3E's comprehensive coverage makes this an ideal ongoing reference tool for those studying theoretical computing.Important Notice: Media content referenced within the product description or the product text may not be available in the ebook version.
Notes: Juergen Schmidhuber recommends this book to understand the basics of computation, essential for deep learning.
by Alex Graves
About the book: Supervised sequence labelling is a vital area of machine learning, encompassing tasks such as speech, handwriting and gesture recognition, protein secondary structure prediction and part-of-speech tagging. Recurrent neural networks are powerful sequence learning tools―robust to input noise and distortion, able to exploit long-range contextual information―that would seem ideally suited to such problems. However their role in large-scale sequence labelling systems has so far been auxiliary. The goal of this book is a complete framework for classifying and transcribing sequential data with recurrent neural networks only. Three main innovations are introduced in order to realise this goal. Firstly, the connectionist temporal classification output layer allows the framework to be trained with unsegmented target sequences, such as phoneme-level speech transcriptions; this is in contrast to previous connectionist approaches, which were dependent on error-prone prior segmentation. Secondly, multidimensional recurrent neural networks extend the framework in a natural way to data with more than one spatio-temporal dimension, such as images and videos. Thirdly, the use of hierarchical subsampling makes it feasible to apply the framework to very large or high resolution sequences, such as raw audio or video. Experimental validation is provided by state-of-the-art results in speech and handwriting recognition.
Notes: Thesis of Alex Graves, this book covers much of RNN (Recurrent Neural Networks) which is not covered so much in Bishop's 'Pattern Recognition and Machine Learning' book. Recommended by Juergen Schmidhuber for students starting out in his lab.
by Marcus Hutter
About the book: This book presents sequential decision theory from a novel algorithmic information theory perspective. While the former is suited for active agents in known environments, the latter is suited for passive prediction in unknown environments. The book introduces these two different ideas and removes the limitations by unifying them to one parameter-free theory of an optimal reinforcement learning agent embedded in an unknown environment. Most AI problems can easily be formulated within this theory, reducing the conceptual problems to pure computational ones. Considered problem classes include sequence prediction, strategic games, function minimization, reinforcement and supervised learning. The discussion includes formal definitions of intelligence order relations, the horizon problem and relations to other approaches. One intention of this book is to excite a broader AI audience about abstract algorithmic information theory concepts, and conversely to inform theorists about exciting applications to AI.
Notes: It's a book on mathematically optimal universal AI. It also contains general problem solvers and universal reinforcement learners. The book goes far beyond traditional RL and previous AI method.
by Ming Li,Paul M.B. Vitányi
About the book: “The book is outstanding and admirable in many respects. ... is necessary reading for all kinds of readers from undergraduate students to top authorities in the field.” Journal of Symbolic Logic Written by two experts in the field, this is the only comprehensive and unified treatment of the central ideas and applications of Kolmogorov complexity. The book presents a thorough treatment of the subject with a wide range of illustrative applications. Such applications include the randomness of finite objects or infinite sequences, Martin-Loef tests for randomness, information theory, computational learning theory, the complexity of algorithms, and the thermodynamics of computing. It will be ideal for advanced undergraduate students, graduate students, and researchers in computer science, mathematics, cognitive sciences, philosophy, artificial intelligence, statistics, and physics. The book is self-contained in that it contains the basic requirements from mathematics and computer science. Included are also numerous problem sets, comments, source references, and hints to solutions of problems. New topics in this edition include Omega numbers, Kolmogorov–Loveland randomness, universal learning, communication complexity, Kolmogorov's random graphs, time-limited universal distribution, Shannon information and others.
Notes: The book is a survey of algorithmic information theory, based on the original work by Kolmogorov and Solomonoff. Foundation of universal optimal predictors and compressors and general inductive inference machines.
by David J. C. MacKay
About the book: Information theory and inference, often taught separately, are here united in one entertaining textbook. These topics lie at the heart of many exciting areas of contemporary science and engineering - communication, signal processing, data mining, machine learning, pattern recognition, computational neuroscience, bioinformatics, and cryptography. This textbook introduces theory in tandem with applications. Information theory is taught alongside practical communication systems, such as arithmetic coding for data compression and sparse-graph codes for error-correction. A toolbox of inference techniques, including message-passing algorithms, Monte Carlo methods, and variational approximations, are developed alongside applications of these tools to clustering, convolutional codes, independent component analysis, and neural networks. The final part of the book describes the state of the art in error-correcting codes, including low-density parity-check codes, turbo codes, and digital fountain codes -- the twenty-first century standards for satellite communications, disk drives, and data broadcast. Richly illustrated, filled with worked examples and over 400 exercises, some with detailed solutions, David MacKay's groundbreaking book is ideal for self-learning and for undergraduate or graduate courses. Interludes on crosswords, evolution, and sex provide entertainment along the way. In sum, this is a textbook on information, communication, and coding for a new generation of students, and an unparalleled entry point into these subjects for professionals in areas as diverse as computational biology, financial engineering, and machine learning.
Notes: Nando de freitas talks about David McKay as being very influential on his way of thinking. He is a phenomenal researcher and a great human being.
by Alexandre B. Tsybakov
About the book: This is a concise text developed from lecture notes and ready to be used for a course on the graduate level. The main idea is to introduce the fundamental concepts of the theory while maintaining the exposition suitable for a first approach in the field. Therefore, the results are not always given in the most general form but rather under assumptions that lead to shorter or more elegant proofs.The book has three chapters. Chapter 1 presents basic nonparametric regression and density estimators and analyzes their properties. Chapter 2 is devoted to a detailed treatment of minimax lower bounds. Chapter 3 develops more advanced topics: Pinsker’s theorem, oracle inequalities, Stein shrinkage, and sharp minimax adaptivity.This book will be useful for researchers and grad students interested in theoretical aspects of smoothing techniques. Many important and useful results on optimal and adaptive estimation are provided. As one of the leading mathematical statisticians working in nonparametrics, the author is an authority on the subject.
Notes: A very readable source for the tools for obtaining lower bounds on estimators.
by Bradley Efron
About the book: We live in a new age for statistical inference, where modern scientific technology such as microarrays and fMRI machines routinely produce thousands and sometimes millions of parallel data sets, each with its own estimation or testing problem. Doing thousands of problems at once is more than repeated application of classical methods. Taking an empirical Bayes approach, Bradley Efron, inventor of the bootstrap, shows how information accrues across problems in a way that combines Bayesian and frequentist ideas. Estimation, testing, and prediction blend in this framework, producing opportunities for new methodologies of increased power. New difficulties also arise, easily leading to flawed inferences. This book takes a careful look at both the promise and pitfalls of large-scale statistical inference, with particular attention to false discovery rates, the most successful of the new statistical techniques. Emphasis is on the inferential ideas underlying technical developments, illustrated using a large number of real examples.
Notes: Michael I Jordon acclaims it to be a thought-provoking book.