MTA SZTAKI, Kende utca 13-17, Nagytanács terem

Description

Since its early applications by Gauss and Legendre around 1800, the least-squares method has been used in countless settings, and today it is a standard tool in many fields, including control theory, system identification, machine learning and econometrics. In this talk, we regard least squares as a methodology for making decisions based on a finite sample of data. For example, the problem of choosing the location of a station that serves a population can be addressed as follows: sample a set of potential clients, then choose the location that minimises the sum of quadratic cost functions representing the squared home-service distances, possibly weighted by client-dependent penalisation parameters. Once the location of the service station has been determined from a sample, one may ask how good the decision is for the rest of the population. For instance, one can evaluate the costs, as measured by the home-service distance, paid by the individuals in the sample, and ask how representative these costs are of the level of dissatisfaction of the whole population. Despite the popularity of the least-squares method and the many theoretical investigations related to it, this problem has so far received little attention from the statistical community.
We begin by considering whether the empirical proportion of sample members who pay a cost above a given value is a valid statistic for the proportion of the whole population that pays a cost above that value. The answer is "no", because the least-squares solution is biased towards keeping the costs of the sample members small. However, we will show that, by introducing suitable margins, valid and tight statistics can be obtained that hold distribution-free, that is, they can be applied without any extra knowledge of how the population is distributed.
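The bias described above can be seen in a small simulation. The sketch below is purely illustrative and is not the method presented in the talk: it assumes clients live in the plane and are drawn from a standard Gaussian, which is a distributional assumption the talk's results do not need. The weighted least-squares location, the minimiser of the sum of w_i times the squared distance to client i, is the weighted mean of the sampled clients. The helper names (`ls_location`, `costs`, `empirical_exceedance`, `compare`) and the parameter values (5 sampled clients, cost threshold 2.0) are hypothetical choices made here. In this Gaussian setup the two averages also have closed forms, exp(-1.25) ≈ 0.29 for the sample and exp(-5/6) ≈ 0.43 for fresh individuals, so the in-sample proportion clearly underestimates the population proportion. The margins that correct for this are not implemented.

```python
import numpy as np


def ls_location(points, weights=None):
    """Minimiser of sum_i w_i * ||x - p_i||^2, i.e. the weighted mean of the points."""
    points = np.asarray(points, dtype=float)
    if weights is None:
        weights = np.ones(len(points))
    weights = np.asarray(weights, dtype=float)
    return weights @ points / weights.sum()


def costs(location, points):
    """Squared home-service distance paid by each individual."""
    diff = np.asarray(points, dtype=float) - location
    return np.sum(diff**2, axis=1)


def empirical_exceedance(location, points, threshold):
    """Fraction of the given individuals whose cost exceeds the threshold."""
    return float(np.mean(costs(location, points) > threshold))


def compare(n_sample=5, threshold=2.0, n_trials=5000, n_fresh=200, seed=0):
    """Average in-sample vs. out-of-sample exceedance proportion.

    Assumption (illustration only): clients are i.i.d. standard Gaussian in 2-D.
    """
    rng = np.random.default_rng(seed)
    in_sample, out_of_sample = [], []
    for _ in range(n_trials):
        sample = rng.standard_normal((n_sample, 2))
        loc = ls_location(sample)  # decision based on the sample only
        in_sample.append(empirical_exceedance(loc, sample, threshold))
        fresh = rng.standard_normal((n_fresh, 2))  # rest of the population
        out_of_sample.append(empirical_exceedance(loc, fresh, threshold))
    return float(np.mean(in_sample)), float(np.mean(out_of_sample))


if __name__ == "__main__":
    in_s, out_s = compare()
    print(f"in-sample: {in_s:.3f}, out-of-sample: {out_s:.3f}")
```

Sampled clients are systematically closer to a location fitted to them than fresh individuals are. This gap is what the talk's margins are meant to account for without any distributional assumption.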
In the course of the seminar, this result will be put in the wider context of the quest for guaranteed data-driven decision methods, with references to recent applications, such as control and machine learning algorithms.