Linear least squares approximation. Least squares method in Excel

3. Function approximation using the least squares method

The least squares method is used when processing experimental results in order to approximate the experimental data by an analytical formula. The specific form of the formula is chosen, as a rule, from physical considerations. Such formulas may be linear, quadratic, exponential and other dependences.

The essence of the least squares method is as follows. Let the measurement results be presented in the table:

Table 4

x:  x1   x2   ...   xn
y:  y1   y2   ...   yn

It is required to approximate this dependence by a function

$$y = f(x; a_0, a_1, \dots, a_m), \qquad (3.1)$$

where f is a known function and a_0, a_1, ..., a_m are unknown constant parameters whose values must be found. In the least squares method, the approximation of function (3.1) to the experimental dependence is considered the best if the condition

$$Q = Q(a_0, a_1, \dots, a_m) = \sum_{i=1}^{n} \bigl[ f(x_i; a_0, a_1, \dots, a_m) - y_i \bigr]^2 \rightarrow \min \qquad (3.2)$$

is satisfied, that is, the sum of the squares of the deviations of the sought analytical function from the experimental dependence must be minimal.

Note that the function Q is called the residual (discrepancy).

Since the residual Q is non-negative and is a differentiable function of the parameters, it has a minimum. A necessary condition for the minimum of a function of several variables is that all partial derivatives of this function with respect to the parameters equal zero. Thus, finding the best values of the parameters of the approximating function (3.1), that is, the values for which Q = Q(a_0, a_1, ..., a_m) is minimal, reduces to solving the system of equations

$$\frac{\partial Q}{\partial a_k} = 0, \qquad k = 0, 1, \dots, m. \qquad (3.3)$$

The method of least squares can be given the following geometric interpretation: among an infinite family of lines of a given type, one line is found for which the sum of the squares of the differences between the ordinates of the experimental points and the corresponding ordinates of points found by the equation of this line will be the smallest.

Finding the parameters of a linear function

Let the experimental data be approximated by a linear function:

$$y = a x + b.$$

It is required to select values of a and b for which the function

$$Q(a, b) = \sum_{i=1}^{n} (a x_i + b - y_i)^2 \qquad (3.4)$$

is minimal. The necessary conditions for the minimum of function (3.4) reduce to the system of equations

$$\frac{\partial Q}{\partial a} = 2 \sum_{i=1}^{n} (a x_i + b - y_i) x_i = 0, \qquad \frac{\partial Q}{\partial b} = 2 \sum_{i=1}^{n} (a x_i + b - y_i) = 0.$$

After transformations, we obtain a system of two linear equations with two unknowns:

$$\begin{cases} a \sum x_i^2 + b \sum x_i = \sum x_i y_i, \\ a \sum x_i + b n = \sum y_i, \end{cases} \qquad (3.5)$$

solving which we find the required values of the parameters a and b.
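As an illustration only (not part of the worked example below), here is a minimal sketch of solving the normal equations (3.5), assuming numpy is available; the data points are hypothetical.

import numpy as np

# hypothetical experimental data (illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])

n = len(x)
# system (3.5): a*sum(x^2) + b*sum(x) = sum(x*y);  a*sum(x) + b*n = sum(y)
A = np.array([[np.sum(x**2), np.sum(x)],
              [np.sum(x),    n        ]])
rhs = np.array([np.sum(x * y), np.sum(y)])
a, b = np.linalg.solve(A, rhs)

Q = np.sum((a * x + b - y)**2)   # residual (3.2)
print(a, b, Q)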

Finding the parameters of a quadratic function

If the approximating function is the quadratic dependence

$$y = a x^2 + b x + c,$$

then its parameters a, b, c are found from the condition of the minimum of the function

$$Q(a, b, c) = \sum_{i=1}^{n} (a x_i^2 + b x_i + c - y_i)^2. \qquad (3.6)$$

The conditions for the minimum of function (3.6) reduce to the system of equations

$$\frac{\partial Q}{\partial a} = 0, \qquad \frac{\partial Q}{\partial b} = 0, \qquad \frac{\partial Q}{\partial c} = 0.$$

After transformations, we obtain a system of three linear equations with three unknowns:

$$\begin{cases} a \sum x_i^4 + b \sum x_i^3 + c \sum x_i^2 = \sum x_i^2 y_i, \\ a \sum x_i^3 + b \sum x_i^2 + c \sum x_i = \sum x_i y_i, \\ a \sum x_i^2 + b \sum x_i + c n = \sum y_i, \end{cases} \qquad (3.7)$$

by solving which we find the required values of the parameters a, b and c.
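A sketch of the quadratic case under the same assumptions (numpy, hypothetical data) follows; np.polyfit(x, y, 2) would give the same coefficients.

import numpy as np

# hypothetical data points (illustration only, not the data of Table 5)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 0.9, 1.2, 3.8, 9.1])

n = len(x)
# normal equations (3.7)
A = np.array([[np.sum(x**4), np.sum(x**3), np.sum(x**2)],
              [np.sum(x**3), np.sum(x**2), np.sum(x)   ],
              [np.sum(x**2), np.sum(x),    n           ]])
rhs = np.array([np.sum(x**2 * y), np.sum(x * y), np.sum(y)])
a, b, c = np.linalg.solve(A, rhs)

Q = np.sum((a * x**2 + b * x + c - y)**2)
print(a, b, c, Q)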

Example. Let the experiment result in the following table of values of x and y:

Table 5

y_i:  0.705   0.495   0.426   0.357   0.368   0.406   0.549   0.768

It is required to approximate the experimental data with linear and quadratic functions.

Solution. Finding the parameters of the approximating functions reduces to solving the systems of linear equations (3.5) and (3.7). To solve the problem, we will use the Excel spreadsheet processor.

1. First, let's group Sheets 1 and 2. Enter the experimental values x_i and y_i into columns A and B, starting from the second row (the first row will hold the column headings). Then compute the sums of these columns and place them in the tenth row.

In columns C-G we place the calculation and summation of x_i^2, x_i^3, x_i^4, x_i*y_i and x_i^2*y_i, respectively.

2. Ungroup the sheets. Further calculations are carried out in the same way: for the linear dependence on Sheet 1 and for the quadratic dependence on Sheet 2.

3. Under the resulting table, form the matrix of coefficients and the column vector of free terms. The system of linear equations is solved according to the following algorithm:

To compute the inverse matrix and the matrix product, we use the Function Wizard and the MINVERSE (МОБР) and MMULT (МУМНОЖ) functions.

4. In the block of cells H2:H9, using the obtained coefficients, we calculate the values of the approximating polynomial y_i calc; in the block I2:I9 the deviations Δy_i = y_i exp - y_i calc; and in column J the residual Q.

The resulting tables and the graphs built with the Chart Wizard are shown in Figures 6, 7 and 8.


Fig. 6. Table for calculating the coefficients of a linear function approximating the experimental data.


Fig. 7. Table for calculating the coefficients of a quadratic function approximating the experimental data.


Fig. 8. Graphical presentation of the results of approximating the experimental data by linear and quadratic functions.

Answer. The experimental data are approximated by the linear dependence y = 0.07881x + 0.442262 with residual Q = 0.165167 and by the quadratic dependence y = 3.115476x^2 - 5.2175x + 2.529631 with residual Q = 0.002103.

Tasks. Approximate the function given in Table 6 by linear and quadratic functions.

Table 6

 №     x:   0.1     0.2     0.3     0.4     0.5     0.6     0.7     0.8
 0     y:   3.030   3.142   3.358   3.463   3.772   3.251   3.170   3.665
 1     y:   3.314   3.278   3.262   3.292   3.332   3.397   3.487   3.563
 2     y:   1.045   1.162   1.264   1.172   1.070   0.898   0.656   0.344
 3     y:   6.715   6.735   6.750   6.741   6.645   6.639   6.647   6.612
 4     y:   2.325   2.515   2.638   2.700   2.696   2.626   2.491   2.291
 5     y:   1.752   1.762   1.777   1.797   1.821   1.850   1.884   1.944
 6     y:   1.924   1.710   1.525   1.370   1.264   1.190   1.148   1.127
 7     y:   1.025   1.144   1.336   1.419   1.479   1.530   1.568   1.248
 8     y:   5.785   5.685   5.605   5.545   5.505   5.480   5.495   5.510
 9     y:   4.052   4.092   4.152   4.234   4.338   4.468   4.599

COURSE WORK

by discipline: Informatics

Topic: Least Squares Function Approximation

Introduction

1. Statement of the problem

2. Calculation formulas

3. Calculation using tables made using Microsoft Excel

4. Algorithm diagram

5. Calculation in the MathCad program

6. Results obtained using the LINEST function

7. Presentation of results in the form of graphs

Introduction

The aim of the course work is to deepen the knowledge of computer science, develop and consolidate the skills of working with the Microsoft Excel spreadsheet processor and the MathCAD software product and their use for solving problems using a computer from the subject area related to research.

Approximation (from the Latin "approximare" - "to approach") - an approximate expression of any mathematical objects (for example, numbers or functions) through other simpler, more convenient to use, or simply better known. In scientific research, approximation is used to describe, analyze, generalize and further use empirical results.

As is known, there can be an exact (functional) relationship between quantities, when one value of the argument corresponds to one definite value of the function, and a less precise (correlation) relationship, when one specific value of the argument corresponds to an approximate value or a set of values of the function that are more or less close to each other. When conducting scientific research and processing the results of observation or experiment, one usually has to deal with the second case.

When studying the quantitative dependences of various indicators whose values are determined empirically, there is, as a rule, some variability. It is partly caused by the heterogeneity of the studied objects of inanimate and, especially, living nature, and partly by observation errors and the quantitative processing of materials. The last component cannot always be excluded completely; it can only be minimized by careful selection of an adequate research method and by accuracy of work. Therefore, when performing any research work, the problem arises of revealing the true nature of the dependence of the studied indicators, which is masked to one degree or another by unaccounted-for variability of the values. For this, approximation is used: an approximate description of the correlation dependence of variables by a suitable equation of a functional dependence that conveys the main tendency of the dependence (its "trend").

When choosing an approximation, one should proceed from the specific research problem. Usually, the simpler the equation used for the approximation, the more approximate the resulting description of the dependence. Therefore, it is important to assess how significant the deviations of specific values from the resulting trend are and what caused them. When describing the dependence of empirically determined values, much greater accuracy can be achieved by using some more complex, multi-parameter equation. However, it makes no sense to strive to convey with maximum accuracy the random deviations of values in specific series of empirical data. It is much more important to grasp the general regularity, which in this case is most logically and with acceptable accuracy expressed precisely by the two-parameter equation of a power function. Thus, in choosing an approximation method the researcher always makes a compromise: he decides to what extent it is expedient and appropriate to "sacrifice" the details and, accordingly, how generally the dependence of the compared variables should be expressed. Along with revealing regularities masked by random deviations of the empirical data from the general pattern, approximation also makes it possible to solve many other important problems: to formalize the found dependence; to find unknown values of the dependent variable by interpolation or, if applicable, by extrapolation.

Each task formulates the conditions of the problem, the initial data and the form in which the results are to be presented, and indicates the main mathematical relations for solving the problem. In accordance with the method for solving the problem, a solution algorithm is developed and presented in graphical form.

1. Statement of the problem

1. Using the least squares method, approximate the function given in the table by:

a) a polynomial of the first degree ;

b) a polynomial of the second degree;

c) exponential dependence.

Calculate the coefficient of determinism for each dependency.

Calculate the correlation coefficient (only in case a).

Draw a trend line for each dependency.

Using the LINEST function, calculate the numerical characteristics of the dependence of y on x.

Compare your calculations with the results obtained using LINEST.

Draw a conclusion which of the obtained formulas best approximates the function.

Write a program in one of the programming languages ​​and compare the counting results with those obtained above.

Variant 3. The function is given in Table 1.

Table 1.


2. Calculation formulas

Often, when analyzing empirical data, it becomes necessary to find a functional relationship between the values ​​of x and y, which are obtained as a result of experience or measurements.

The values x_i (the independent variable) are set by the experimenter, while the values y_i, called empirical or experimental values, are obtained as a result of the experiment.

The analytical form of the functional dependence between the quantities x and y is usually unknown; therefore a practically important problem arises: to find an empirical formula

$$y = f(x; a_1, a_2, \dots, a_m), \qquad (1)$$

where a_1, a_2, ..., a_m are parameters, whose values at the points x_i would differ as little as possible from the experimental values y_i.

According to the least squares method, the best coefficients a_1, a_2, ..., a_m are those for which the sum of the squares of the deviations of the found empirical function from the given values of the function,

$$S(a_1, a_2, \dots, a_m) = \sum_{i=1}^{n} \bigl[ f(x_i; a_1, \dots, a_m) - y_i \bigr]^2, \qquad (2)$$

is minimal.

Using the necessary condition for an extremum of a function of several variables, namely that its partial derivatives are equal to zero, we find the set of coefficients that delivers the minimum of the function defined by formula (2) and obtain the normal system for determining the coefficients a_1, ..., a_m:

$$\frac{\partial S}{\partial a_k} = 0, \qquad k = 1, 2, \dots, m. \qquad (3)$$

Thus, finding the coefficients reduces to solving system (3).

The form of system (3) depends on the class of empirical formulas from which we seek the dependence (1). In the case of a linear dependence $y = a_1 + a_2 x$, system (3) takes the form

$$\begin{cases} a_1 n + a_2 \sum x_i = \sum y_i, \\ a_1 \sum x_i + a_2 \sum x_i^2 = \sum x_i y_i. \end{cases} \qquad (4)$$

In the case of a quadratic dependence $y = a_1 + a_2 x + a_3 x^2$, system (3) takes the form

$$\begin{cases} a_1 n + a_2 \sum x_i + a_3 \sum x_i^2 = \sum y_i, \\ a_1 \sum x_i + a_2 \sum x_i^2 + a_3 \sum x_i^3 = \sum x_i y_i, \\ a_1 \sum x_i^2 + a_2 \sum x_i^3 + a_3 \sum x_i^4 = \sum x_i^2 y_i. \end{cases} \qquad (5)$$

In some cases, a function in which the undetermined coefficients enter nonlinearly is taken as the empirical formula. In this case the problem can sometimes be linearized, i.e. reduced to a linear one. Such dependences include the exponential dependence

$$y = a_1 e^{a_2 x}, \qquad (6)$$

where a_1 and a_2 are undetermined coefficients.

Linearization is achieved by taking the logarithm of equality (6), after which we obtain the relation

$$\ln y = \ln a_1 + a_2 x. \qquad (7)$$

Denoting $\ln y$ by $t$ and $\ln a_1$ by $c$, dependence (6) can be written in the form $t = c + a_2 x$, which makes it possible to apply formulas (4) with $a_1$ replaced by $c$ and $y$ by $t$.
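A minimal sketch of this linearization (assuming numpy; the data are hypothetical and all y values are positive so that the logarithms exist):

import numpy as np

# hypothetical data (illustration only)
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
y = np.array([1.1, 1.8, 3.1, 4.9, 8.2])

t = np.log(y)                       # t = ln y, see (7)
n = len(x)
# system (4) written for the pair (x, t); unknowns c = ln a1 and a2
A = np.array([[n,         np.sum(x)   ],
              [np.sum(x), np.sum(x**2)]])
rhs = np.array([np.sum(t), np.sum(x * t)])
c, a2 = np.linalg.solve(A, rhs)
a1 = np.exp(c)                      # back-transform (potentiation)
print(a1, a2)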

The graph of the restored functional dependence y (x) according to the measurement results (xi, yi), i = 1,2,…, n is called the regression curve. To check the agreement of the constructed regression curve with the experimental results, the following numerical characteristics are usually introduced: the correlation coefficient (linear dependence), the correlation ratio, and the determinism coefficient.

The correlation coefficient is a measure of the linear relationship between dependent random variables: it shows how well, on average, one of the variables can be represented as a linear function of the other.

The correlation coefficient is calculated by the formula

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \; \sum_{i=1}^{n} (y_i - \bar{y})^2}}, \qquad (8)$$

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i, \qquad (9)$$

where $\bar{x}$ and $\bar{y}$ are the arithmetic means of x and y, respectively.

The correlation coefficient between random variables does not exceed 1 in absolute value. The closer |r| is to 1, the closer the linear relationship between x and y.

In the case of a nonlinear correlation, the conditional mean values ​​are located near the curved line. In this case, it is recommended to use the correlation ratio as a characteristic of the bond strength, the interpretation of which does not depend on the type of the studied dependence.

The correlation ratio is calculated by the formula

$$\eta^2 = \frac{\sum n_x (\bar{y}_x - \bar{y})^2}{\sum (y_i - \bar{y})^2}, \qquad (10)$$

where the numerator characterizes the scatter of the conditional means $\bar{y}_x$ around the unconditional mean $\bar{y}$.

Always $0 \le \eta^2 \le 1$. The equality $\eta^2 = 0$ corresponds to uncorrelated random variables; $\eta^2 = 1$ holds if and only if there is an exact functional relationship between y and x. In the case of a linear dependence of y on x, the correlation ratio coincides with the square of the correlation coefficient. The value $\eta^2 - r^2$ is used as an indicator of the deviation of the regression from linearity.

The correlation ratio is a measure of the correlation between y and x in any form, but it cannot give an idea of the degree of closeness of the empirical data to a particular form. To find out how accurately the constructed curve reflects the empirical data, one more characteristic is introduced: the coefficient of determinism.

The coefficient of determinism is determined by the formula

$$r^2 = 1 - \frac{S_{\text{res}}}{S_{\text{tot}}},$$

where $S_{\text{res}} = \sum_i (y_i - \hat{y}_i)^2$ is the residual sum of squares, which characterizes the deviation of the experimental data from the theoretical ones, and $S_{\text{tot}} = \sum_i (y_i - \bar{y})^2$ is the total sum of squares, $\bar{y}$ being the mean value of $y_i$.

$S_{\text{reg}} = S_{\text{tot}} - S_{\text{res}}$ is the regression sum of squares, which characterizes the scatter of the predicted values around the mean.

The smaller the residual sum of squares compared to the total sum of squares, the greater the coefficient of determinism r2, which shows how well the equation obtained using regression analysis explains the relationship between the variables. If it is equal to 1, then there is a complete correlation with the model, i.e. there is no difference between actual and estimated y-values. Otherwise, if the coefficient of determinism is 0, then the regression equation fails to predict y values.
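As a cross-check outside Excel (a minimal sketch only, assuming numpy; the data are hypothetical), the correlation coefficient (8)-(9) and the coefficient of determinism for a linear fit can be computed as follows:

import numpy as np

# hypothetical data (illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.1, 4.2, 4.8])

# linear fit y_hat = a1 + a2*x via the normal equations (4)
A = np.array([[len(x),    np.sum(x)   ],
              [np.sum(x), np.sum(x**2)]])
a1, a2 = np.linalg.solve(A, np.array([np.sum(y), np.sum(x * y)]))
y_hat = a1 + a2 * x

# correlation coefficient (8)-(9)
r = np.sum((x - x.mean()) * (y - y.mean())) / np.sqrt(
        np.sum((x - x.mean())**2) * np.sum((y - y.mean())**2))

# coefficient of determinism r^2 = 1 - S_res / S_tot
s_res = np.sum((y - y_hat)**2)
s_tot = np.sum((y - y.mean())**2)
r2 = 1 - s_res / s_tot
print(r, r2)    # for a linear fit with intercept, r2 equals r**2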

The coefficient of determinism never exceeds the correlation ratio. When equality between them holds, we can assume that the constructed empirical formula reflects the empirical data most accurately.

3. Calculation using tables made using Microsoft Excel

To carry out calculations, it is advisable to arrange the data in the form of Table 2 using the tools of the Microsoft Excel spreadsheet processor.

table 2












Let us explain how table 2 is compiled.

Step 1. In cells A1:A25, enter the values x_i.

Step 2. In cells B1:B25, enter the values y_i.

Step 3. In cell C1, enter the formula =A1^2.

Step 4. Copy this formula into cells C2:C25.

Step 5. In cell D1, enter the formula =A1*B1.

Step 6. Copy this formula into cells D2:D25. (Similarly, column E holds the values x_i^3: enter =A1^3 in cell E1 and copy it into cells E2:E25; its sum is needed for system (5).)

Step 7. In cell F1, enter the formula =A1^4.

Step 8. Copy this formula into cells F2:F25.

Step 9. In cell G1, enter the formula =A1^2*B1.

Step 10. Copy this formula into cells G2:G25.

Step 11. In cell H1, enter the formula =LN(B1).

Step 12. Copy this formula into cells H2:H25.

Step 13. In cell I1, enter the formula =A1*LN(B1).

Step 14. Copy this formula into cells I2:I25.

The next steps are performed using AutoSum (Σ).

Step 15. In cell A26, enter the formula =SUM(A1:A25).

Step 16. In cell B26, enter the formula =SUM(B1:B25).

Step 17. In cell C26, enter the formula =SUM(C1:C25).

Step 18. In cell D26, enter the formula =SUM(D1:D25).

Step 19. In cell E26, enter the formula =SUM(E1:E25).

Step 20. In cell F26, enter the formula =SUM(F1:F25).

Step 21. In cell G26, enter the formula =SUM(G1:G25).

Step 22. In cell H26, enter the formula =SUM(H1:H25).

Step 23. In cell I26, enter the formula =SUM(I1:I25).

Let us approximate the function by a linear function $y = a_1 + a_2 x$. To determine the coefficients $a_1$ and $a_2$ we use system (4). Using the total sums of Table 2 located in cells A26, B26, C26 and D26, we write system (4) in the form

$$\begin{cases} 25\,a_1 + a_2 \sum x_i = \sum y_i, \\ a_1 \sum x_i + a_2 \sum x_i^2 = \sum x_i y_i, \end{cases} \qquad (11)$$

having solved which, we obtain the values of $a_1$ and $a_2$.

The system was solved by Cramer's method, the essence of which is as follows. Consider a system of n linear algebraic equations with n unknowns:

$$\begin{cases} a_{11} x_1 + a_{12} x_2 + \dots + a_{1n} x_n = b_1, \\ \dots \\ a_{n1} x_1 + a_{n2} x_2 + \dots + a_{nn} x_n = b_n. \end{cases} \qquad (12)$$

The determinant of the system is the determinant of the matrix of the system:

$$\Delta = \begin{vmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{vmatrix}. \qquad (13)$$

Let $\Delta_j$ denote the determinant obtained from the determinant of the system $\Delta$ by replacing the j-th column with the column of free terms; then the solution is given by Cramer's formulas $x_j = \Delta_j / \Delta$, $j = 1, \dots, n$.
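A minimal sketch of Cramer's rule for a 2x2 system like (11), assuming numpy; the sums used below are hypothetical, not the actual sums from Table 2:

import numpy as np

# hypothetical totals (illustration only)
n, sum_x, sum_x2 = 25, 50.0, 120.0
sum_y, sum_xy = 260.0, 560.0

A = np.array([[n,     sum_x ],
              [sum_x, sum_x2]])
b = np.array([sum_y, sum_xy])

# Cramer's rule: x_j = det(A_j) / det(A), where A_j has column j replaced by b
delta = np.linalg.det(A)
a1 = np.linalg.det(np.column_stack([b, A[:, 1]])) / delta
a2 = np.linalg.det(np.column_stack([A[:, 0], b])) / delta
print(a1, a2)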

Thus, the linear approximation has the form $y = a_1 + a_2 x$ with the coefficients found above.

System (11) is solved using Microsoft Excel tools. The results are shown in Table 3.

Table 3











inverse matrix






In Table 3, cells A32:B33 contain the array formula =MINVERSE(A28:B29) (МОБР in the Russian version of Excel).

Cells E32:E33 contain the array formula =MMULT(A32:B33, C28:C29) (МУМНОЖ).

Next, we approximate the function by a quadratic function $y = a_1 + a_2 x + a_3 x^2$. To determine the coefficients $a_1$, $a_2$ and $a_3$, we use system (5). Using the total sums of Table 2 located in cells A26, B26, C26, D26, E26, F26 and G26, we write system (5) in the form

$$\begin{cases} 25\,a_1 + a_2 \sum x_i + a_3 \sum x_i^2 = \sum y_i, \\ a_1 \sum x_i + a_2 \sum x_i^2 + a_3 \sum x_i^3 = \sum x_i y_i, \\ a_1 \sum x_i^2 + a_2 \sum x_i^3 + a_3 \sum x_i^4 = \sum x_i^2 y_i, \end{cases} \qquad (16)$$

solving which we get $a_1 = 10.663624$, together with the values of $a_2$ and $a_3$ shown in Table 4.

Thus, the quadratic approximation has the form

System (16) is solved using Microsoft Excel tools. The results are shown in Table 4.

Table 4














inverse matrix







In Table 4, cells A41:C43 contain the array formula =MINVERSE(A36:C38).

Cells F41:F43 contain the array formula =MMULT(A41:C43, D36:D38).

Now we approximate the function by the exponential function $y = a_1 e^{a_2 x}$. To determine the coefficients $a_1$ and $a_2$ we take the logarithms of the values $y_i$ and, using the total sums of Table 2 located in cells A26, C26, H26 and I26, obtain the system

$$\begin{cases} 25\,c + a_2 \sum x_i = \sum \ln y_i, \\ c \sum x_i + a_2 \sum x_i^2 = \sum x_i \ln y_i, \end{cases} \qquad (18)$$

where $c = \ln a_1$. Having solved system (18), we obtain $c$ and $a_2$.

After potentiation (taking the exponential) we get $a_1 = e^c$.

Thus, the exponential approximation has the form $y = a_1 e^{a_2 x}$ with the coefficients found.

System (18) is solved using Microsoft Excel tools. The results are shown in Table 5.

Table 5











inverse matrix




Cells A50:B51 contain the array formula =MINVERSE(A46:B47).

Cells E49:E50 contain the array formula =MMULT(A50:B51, C46:C47).

Cell E51 contains the formula =EXP(E49).

Let us calculate the arithmetic means using the formulas

$$\bar{x} = \frac{1}{25} \sum_{i=1}^{25} x_i, \qquad \bar{y} = \frac{1}{25} \sum_{i=1}^{25} y_i.$$

The calculation results using Microsoft Excel are presented in Table 6.

Table 6



Cell B54 contains the formula =A26/25.

Cell B55 contains the formula =B26/25.

Table 7


Step 1. In cell J1, enter the formula =(A1-$B$54)*(B1-$B$55).

Step 2. Copy this formula into cells J2:J25.

Step 3. In cell K1, enter the formula =(A1-$B$54)^2.

Step 4. Copy this formula into cells K2:K25.

Step 5. In cell L1, enter the formula =(B1-$B$55)^2.

Step 6. Copy this formula into cells L2:L25.

Step 7. In cell M1, enter the formula =($E$32+$E$33*A1-B1)^2.

Step 8. Copy this formula into cells M2:M25.

Step 9. In cell N1, enter the formula =($F$41+$F$42*A1+$F$43*A1^2-B1)^2.

Step 10. Copy this formula into cells N2:N25.

Step 11. In cell O1, enter the formula =($E$51*EXP($E$50*A1)-B1)^2.

Step 12. Copy this formula into cells O2:O25.

The next steps are performed using AutoSum (Σ).

Step 13. In cell J26, enter the formula =SUM(J1:J25).

Step 14. In cell K26, enter the formula =SUM(K1:K25).

Step 15. In cell L26, enter the formula =SUM(L1:L25).

Step 16. In cell M26, enter the formula =SUM(M1:M25).

Step 17. In cell N26, enter the formula =SUM(N1:N25).

Step 18. In cell O26, enter the formula =SUM(O1:O25).

Now let us calculate the correlation coefficient using formula (8) (only for the linear approximation) and the coefficients of determinism for all three approximations. The results of the calculations using Microsoft Excel are presented in Table 8.

Table 8


Correlation coefficient

Determinism coefficient (linear approximation)



Determinism coefficient (quadratic approximation)



Determinism coefficient (exponential approximation)



Cell E57 contains the formula =J26/(K26*L26)^(1/2).

Cell E59 contains the formula =1-M26/L26.

Cell E61 contains the formula =1-N26/L26.

Cell E63 contains the formula =1-O26/L26.

Analysis of the calculation results shows that the quadratic approximation best describes the experimental data.

4. Algorithm diagram

Fig. 1. Flowchart of the algorithm of the calculation program.

5. Calculation in the MathCad program

Linear regression

· line(x, y) - a vector of two elements (b, a), the coefficients of the linear regression b + a·x;

· x - vector of real argument values;

· y - vector of real data values of the same size.

Figure 2.

Polynomial regression means approximating the data (x_i, y_i) by a polynomial of degree k. With k = 1 the polynomial is a straight line, with k = 2 a parabola, with k = 3 a cubic parabola, and so on. As a rule, in practice k < 5.

· regress(x, y, k) - vector of coefficients for constructing the polynomial regression of the data;

· interp(s, x, y, t) - the result of the polynomial regression;

· s = regress(x, y, k);

· x - vector of real argument values, whose elements are arranged in ascending order;

· y - vector of real data values of the same size;

· k - degree of the regression polynomial (a positive integer);

· t - the value of the argument of the regression polynomial (an equivalent computation outside MathCAD is sketched right after this list).
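For comparison only (not part of the MathCAD worksheet), a minimal sketch of the same polynomial regression in Python, assuming numpy and hypothetical data:

import numpy as np

# hypothetical data, x sorted in ascending order (illustration only)
x = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8])
y = np.array([1.0, 1.3, 1.9, 2.8, 3.9, 5.2, 6.9, 8.8])

k = 2                               # degree of the regression polynomial
coeffs = np.polyfit(x, y, k)        # analogue of regress(x, y, k)

t = 0.45                            # argument value, analogue of interp(s, x, y, t)
y_t = np.polyval(coeffs, t)
print(coeffs, y_t)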

Figure 3

In addition to those considered, several more types of three-parameter regression are built into Mathcad, their implementation is somewhat different from the above regression options in that, in addition to the data array, it is required to set some initial values ​​of the coefficients a, b, c for them. Use the appropriate type of regression if you have a good idea of ​​what kind of dependency describes your data set. When the type of regression does not reflect the sequence of data well, then its result is often unsatisfactory and even very different depending on the choice of initial values. Each of the functions produces a vector of specified parameters a, b, c.

6. Results obtained using the LINEST function

Let's look at the purpose of the LINEST function.

This function uses the least squares method to calculate the straight line that best fits the available data.

The function returns an array that describes the resulting line. The equation for a straight line is as follows:

y = m1x1 + m2x2 + ... + b, or, in the single-variable case, y = mx + b,

where the dependent y-value is a function of the independent x-value. The m values are the coefficients corresponding to each independent variable x, and b is a constant. Note that y, x and m can be vectors.

To get the results, you need to enter an array formula that occupies an interval of 5 rows and 2 columns. This interval can be located anywhere on the worksheet. The LINEST function must be entered over this interval as an array formula.

As a result, all cells of the A65: B69 interval should be filled (as shown in Table 9).

Table 9.



Let us explain the purpose of some of the values ​​in Table 9.

The values located in cells A65 and B65 characterize the slope and the shift (intercept), respectively; the remaining rows contain the coefficient of determinism, the observed F value, the number of degrees of freedom, the regression sum of squares and the residual sum of squares.
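For reference only (not the Excel workflow described above), a minimal numpy sketch reproducing the main LINEST outputs on hypothetical data:

import numpy as np

# hypothetical data (illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2])

n = len(x)
X = np.column_stack([x, np.ones(n)])         # columns: x, constant
(slope, intercept), *_ = np.linalg.lstsq(X, y, rcond=None)

y_hat = slope * x + intercept
ss_res = np.sum((y - y_hat)**2)              # residual sum of squares
ss_tot = np.sum((y - y.mean())**2)           # total sum of squares
ss_reg = ss_tot - ss_res                     # regression sum of squares
r2 = 1 - ss_res / ss_tot                     # coefficient of determinism
df = n - 2                                   # degrees of freedom
f_obs = ss_reg / (ss_res / df)               # observed F value
print(slope, intercept, r2, df, ss_reg, ss_res, f_obs)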

7. Presentation of results in the form of graphs

Fig. 4. Graph of the linear approximation

Fig. 5. Graph of the quadratic approximation

Fig. 6. Graph of the exponential approximation

Conclusions

Let's draw conclusions based on the results of the data obtained.

Analysis of the calculation results shows that the quadratic approximation best describes the experimental data, since the trend line for it most accurately reflects the behavior of the function in this area.

Comparing the results obtained using the LINEST function, we see that they completely coincide with the calculations carried out above. This indicates that the calculations are correct.

The results obtained using the MathCad program completely coincide with the values ​​given above. This indicates the correctness of the calculations.


I am a mathematician-programmer. The biggest leap in my career was when I learned to say: "I do not understand anything!" Now I am not ashamed to tell a luminary of science who is giving me a lecture that I do not understand what he is talking about. And this is very difficult. Yes, it is difficult and embarrassing to admit your ignorance. Who likes to admit that he does not know the basics of something? By virtue of my profession, I have to attend a large number of presentations and lectures, where, I confess, in the overwhelming majority of cases I want to sleep, because I do not understand anything. And I do not understand because the huge problem of the current situation in science lies in mathematics: it assumes that all listeners are familiar with absolutely all areas of mathematics (which is absurd). It is a shame to admit that you do not know what a derivative is (more on that a little later).

But I learned to say that I don't know what multiplication is. Yes, I don't know what a subalgebra over a Lie algebra is. Yes, I do not know why quadratic equations are needed in life. By the way, if you are sure that you know, then we have something to talk about! Mathematics is a series of tricks. Mathematicians try to confuse and intimidate the public; where there is no confusion, there is no reputation, there is no authority. Yes, it is prestigious to speak in as abstract language as possible, which is complete nonsense in itself.

Do you know what a derivative is? Most likely you will tell me about the limit of the difference quotient. In the first year of Mathematics and Mechanics at St. Petersburg State University, Viktor Petrovich Khavin defined the derivative as the coefficient of the first term of the Taylor series of the function at a point (it was a separate gymnastics to define the Taylor series without derivatives). I laughed at this definition for a long time, until I finally understood what it was about. The derivative is nothing more than a measure of how much the function we are differentiating resembles the functions y = x, y = x^2, y = x^3.

I now have the honor to lecture to students who fear mathematics. If you are afraid of mathematics, we are on the same path. As soon as you try to read some text, and it seems to you that it is overly complicated, then know that it is badly written. I argue that there is not a single area of ​​mathematics that cannot be talked about "on the fingers" without losing accuracy.

The task for the near future: I instructed my students to understand what a linear-quadratic regulator is. Do not hesitate, spend three minutes of your life, follow the link. If you do not understand anything, then we are on the way with you. I (a professional mathematician-programmer) didn't understand anything either. And I assure you that you can figure it out on the fingers. At the moment I do not know what it is, but I assure you that we will be able to figure it out.

So, the first lecture that I am going to give to my students, after they come running to me in horror saying that a linear-quadratic regulator is a terrible beast they will never master in their life, is about least squares methods. Can you solve linear equations? If you are reading this text, then most likely not.

So, given two points (x0, y0), (x1, y1), for example, (1,1) and (3,2), the problem is to find the equation of a straight line passing through these two points:

illustration

This straight line must have an equation of the form

$$y = \alpha x + \beta.$$

Here alpha and beta are unknown to us, but we know two points of this straight line:

$$y_0 = \alpha x_0 + \beta, \qquad y_1 = \alpha x_1 + \beta.$$

You can write this system in matrix form:

$$\begin{pmatrix} x_0 & 1 \\ x_1 & 1 \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \begin{pmatrix} y_0 \\ y_1 \end{pmatrix}.$$

A lyrical digression should be made here: what is a matrix? A matrix is nothing more than a two-dimensional array. This is a way of storing data; you should not attach any more importance to it. It is up to us how exactly to interpret a certain matrix. Periodically I will interpret it as a linear map, periodically as a quadratic form, and sometimes just as a set of vectors. This will all be clarified in context.

Let's replace the specific matrices with their symbolic representations:

$$A x = b, \qquad x = (\alpha, \beta)^T.$$

Then (alpha, beta) can be found easily:

$$x = A^{-1} b.$$

More specifically, for our previous data:

$$\begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 3 & 1 \end{pmatrix}^{-1} \begin{pmatrix} 1 \\ 2 \end{pmatrix} = \begin{pmatrix} 1/2 \\ 1/2 \end{pmatrix},$$

which leads to the following equation of the straight line passing through the points (1,1) and (3,2):

$$y = \tfrac{1}{2} x + \tfrac{1}{2}.$$

Okay, everything is clear here. Let's find the equation of the straight line passing through three points: (x0, y0), (x1, y1) and (x2, y2):

Oh-oh-oh, but we have three equations for two unknowns! A standard mathematician will say that there is no solution. What will the programmer say? First, he will rewrite the previous system of equations in the following form:

$$\alpha \, i + \beta \, j = b, \qquad i = (x_0, x_1, x_2)^T, \quad j = (1, 1, 1)^T, \quad b = (y_0, y_1, y_2)^T.$$

In our case the vectors i, j, b are three-dimensional, therefore (in the general case) this system has no solution. Any vector (alpha*i + beta*j) lies in the plane spanned by the vectors (i, j). If b does not belong to this plane, then the solution does not exist (equality in the equation cannot be achieved). What to do? Let's look for a compromise. Let's denote by e(alpha, beta) exactly how far we have fallen short of equality:

$$e(\alpha, \beta) = \alpha \, i + \beta \, j - b.$$

And we will try to minimize this error:

$$\|e(\alpha, \beta)\|^2 \rightarrow \min.$$

Why square?

We are looking not just for the minimum of the norm, but for the minimum of the square of the norm. Why? The minimum point itself coincides, and the square gives a smooth function (a quadratic function of the arguments (alpha, beta)), while simply the length gives a cone-like function that is not differentiable at the minimum point. Brr. The square is more convenient.

Obviously, the error is minimized when the vector e is orthogonal to the plane spanned by the vectors i and j.

Illustration

In other words: we are looking for a line such that the sum of the squared lengths of the distances from all points to this line is minimal:

UPDATE: I made a mistake here - the distance to the straight line should be measured vertically, not as an orthogonal projection. This commenter is right.

Illustration

Quite differently (carefully, poorly formalized, but it should be clear on the fingers): we take all possible straight lines between all pairs of points and look for the average straight line between all:

Illustration

Another explanation on the fingers: we attach a spring between all data points (here we have three) and the straight line that we are looking for, and the straight line of the equilibrium state is exactly what we are looking for.

Minimum of a quadratic form

So, given the vector b and the plane spanned by the column vectors of the matrix A (in this case (x0, x1, x2) and (1,1,1)), we are looking for a vector e of minimum squared length. Obviously, the minimum is attainable only for a vector e that is orthogonal to the plane spanned by the column vectors of the matrix A:

$$A^T e = A^T (A x - b) = 0.$$

In other words, we are looking for a vector x = (alpha, beta) such that:

$$A^T A \, x = A^T b.$$

Let me remind you that this vector x = (alpha, beta) is the minimum of the quadratic function ||e(alpha, beta)||^2:

$$\|e(\alpha, \beta)\|^2 = (A x - b)^T (A x - b).$$

Here it is useful to recall that a matrix can also be interpreted as a quadratic form; for example, the identity matrix ((1,0), (0,1)) can be interpreted as the function x^2 + y^2:

quadratic form

All this gymnastics is known as linear regression.
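A minimal sketch of this computation (assuming numpy; only the points (1,1) and (3,2) come from the text, the third point is hypothetical):

import numpy as np

# two points from the article plus a hypothetical third one
x = np.array([1.0, 3.0, 4.0])
y = np.array([1.0, 2.0, 4.0])

A = np.column_stack([x, np.ones_like(x)])   # columns: i = x values, j = (1,1,1)
b = y

# normal equations A^T A x = A^T b
alpha, beta = np.linalg.solve(A.T @ A, A.T @ b)
# the same answer via the built-in least-squares solver
alpha2, beta2 = np.linalg.lstsq(A, b, rcond=None)[0]
print(alpha, beta, alpha2, beta2)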

Laplace's equation with the Dirichlet boundary condition

Now the simplest real task: there is a certain triangulated surface, you need to smooth it. For example, let's load my face model:

The initial commit is available. To minimize external dependencies, I took the code of my software renderer, already described on Habr. To solve the linear system I use OpenNL, an excellent solver which is very easy to install: you just copy two files (.h + .c) into your project folder. All the smoothing is done with the following code:

for (int d=0; d<3; d++) {
    nlNewContext();
    nlSolverParameteri(NL_NB_VARIABLES, verts.size());
    nlSolverParameteri(NL_LEAST_SQUARES, NL_TRUE);
    nlBegin(NL_SYSTEM);
    nlBegin(NL_MATRIX);
    // data rows: every vertex should stay close to its original position
    for (int i=0; i<(int)verts.size(); i++) {
        nlBegin(NL_ROW);
        nlCoefficient(i, 1);
        nlRightHandSide(verts[i][d]);
        nlEnd(NL_ROW);
    }
    // smoothing rows: the two endpoints of every triangle edge are pulled together
    for (unsigned int i=0; i<faces.size(); i++) {
        auto &face = faces[i];
        for (int j=0; j<3; j++) {
            nlBegin(NL_ROW);
            nlCoefficient(face[ j     ],  1);
            nlCoefficient(face[(j+1)%3], -1);
            nlEnd(NL_ROW);
        }
    }
    nlEnd(NL_MATRIX);
    nlEnd(NL_SYSTEM);
    nlSolve();
    for (int i=0; i<(int)verts.size(); i++) {
        verts[i][d] = nlGetVariable(i);
    }
}

The X, Y and Z coordinates are separable, I smooth them separately. That is, I solve three systems of linear equations, each with the number of variables equal to the number of vertices in my model. The first n rows of matrix A have only one unit per row, and the first n rows of vector b have original model coordinates. That is, I spring-tie between the new vertex position and the old vertex position - the new ones should not stray too far from the old ones.

All subsequent rows of the matrix A (faces.size () * 3 = the number of edges of all triangles in the grid) have one occurrence 1 and one occurrence -1, and the vector b has zero components opposite. This means I hang a spring on each edge of our triangular mesh: all edges try to get the same vertex as a starting and ending point.

Once again: all vertices are variables, and they cannot move far from their original position, but at the same time they try to become similar to each other.

Here's the result:

Everything would be fine, the model is really smoothed, but it has moved away from its original edge. Let's change the code a bit:

for (int i=0; i<(int)verts.size(); i++) {
    float scale = border[i] ? 1000 : 1;
    nlBegin(NL_ROW);
    nlCoefficient(i, scale);
    nlRightHandSide(scale * verts[i][d]);
    nlEnd(NL_ROW);
}

In our matrix A, for the vertices that are on the edge, I add not a row of the form v_i = verts[i][d], but 1000*v_i = 1000*verts[i][d]. What does that change? It changes our quadratic error measure. Now a unit deviation at a boundary vertex will cost not one unit, as before, but 1000*1000 units. That is, we hung a stronger spring on the boundary vertices, so the solution prefers to stretch the others more. Here's the result:

Let's double the springs between the vertices:

nlCoefficient(face[ j     ],  2);
nlCoefficient(face[(j+1)%3], -2);

It is logical that the surface has become smoother:

And now it is even a hundred times stronger:

What's this? Imagine dipping a wire ring in soapy water. As a result, the formed soapy film will try to have the smallest curvature, as far as possible, touching the border - our wire ring. This is exactly what we got by fixing the border and asking for a smooth surface on the inside. Congratulations, we just solved the Laplace equation with Dirichlet boundary conditions. Sounds cool? But in fact, only one system of linear equations to solve.

Poisson's equation

Let's remember another cool name.

Suppose I have a picture like this:

Everyone is good, only I don't like the chair.

I will cut the picture in half:



And I will highlight the chair with my hands:

Then I will pull everything that is white in the mask to the left side of the picture, and at the same time throughout the picture I will say that the difference between two neighboring pixels should be equal to the difference between two neighboring pixels of the right picture:

For (int i = 0; i

Here's the result:

Real life example

I deliberately didn’t do the polished results. I just wanted to show you exactly how you can apply the least squares methods, this is a tutorial code. Let me now give an example from life:

I have a number of photos of fabric samples like this:

My task is to make seamless textures from photos of this quality. First, I (automatically) look for a repeating pattern:

If I cut out this quadrilateral directly, then due to distortion, the edges will not converge, here is an example of a pattern repeated four times:

Hidden text

Here is a snippet where the seam is clearly visible:

Therefore, I will not cut along a straight line, here is the cut line:

Hidden text

And here is a pattern repeated four times:

Hidden text

And a fragment of it, to make it clearer:

Even better, the cut did not go in a straight line, bypassing all sorts of curls, but still the seam is visible due to the uneven lighting in the original photo. This is where the least squares method for Poisson's equation comes in. Here is the final result after leveling the lighting:

The texture came out perfectly seamless, and it was all automatic from a very mediocre photo. Do not be afraid of math, look for simple explanations, and you will have engineering happiness.

Ordinary least squares (OLS) is a mathematical method used to solve various problems, based on minimizing the sum of the squares of the deviations of certain functions from the sought variables. It can be used to "solve" overdetermined systems of equations (when the number of equations exceeds the number of unknowns), to find a solution in the case of ordinary (not overdetermined) nonlinear systems of equations, and to approximate the point values of some function. OLS is one of the basic methods of regression analysis for estimating the unknown parameters of regression models from sample data.


History

Until the beginning of the 19th century scientists did not have definite rules for solving a system of equations in which the number of unknowns is less than the number of equations; until that time particular methods were used that depended on the type of equations and on the ingenuity of the calculators, so different calculators, starting from the same observational data, came to different conclusions. Gauss (1795) was the first to apply the method, and Legendre (1805) independently discovered and published it under its modern name (French: Méthode des moindres quarrés). Laplace linked the method with probability theory, and the American mathematician Adrain (1808) considered its theoretical and probabilistic applications. The method was spread and improved by further research by Encke, Bessel, Hansen and others.

The essence of the least squares method

Let $x$ be a set of $n$ unknown variables (parameters), and let $f_i(x)$, $i = 1, \dots, m$, $m > n$, be a set of functions of these variables. The task is to select values of $x$ such that the values of these functions are as close as possible to some given values $y_i$. In essence, we are talking about the "solution" of the overdetermined system of equations $f_i(x) = y_i$, $i = 1, \dots, m$, in the indicated sense of maximum proximity of the left- and right-hand sides of the system. The essence of the least squares method is to take as the "measure of proximity" the sum of the squares of the deviations of the left- and right-hand sides $|f_i(x) - y_i|$. Thus, the essence of OLS can be expressed as follows:

$$\sum_i e_i^2 = \sum_i (y_i - f_i(x))^2 \rightarrow \min_x.$$

If the system of equations has a solution, then the minimum of the sum of squares is equal to zero and exact solutions of the system of equations can be found analytically or, for example, by various numerical optimization methods. If the system is overdetermined, that is, loosely speaking, the number of independent equations is greater than the number of sought variables, then the system does not have an exact solution and the least squares method allows one to find some "optimal" vector $x$ in the sense of maximum proximity of the vectors $y$ and $f(x)$, or maximum proximity of the deviation vector $e$ to zero (proximity is understood in the sense of Euclidean distance).

Example - a system of linear equations

In particular, the least squares method can be used to "solve" a system of linear equations

$$A x = b,$$

where $A$ is a rectangular matrix of size $m \times n$, $m > n$ (that is, the number of rows of the matrix A is greater than the number of sought variables).

In the general case, such a system of equations has no solution. Therefore, this system can be "solved" only in the sense of choosing a vector $x$ that minimizes the "distance" between the vectors $Ax$ and $b$. To do this, one can apply the criterion of minimizing the sum of squares of the differences between the left- and right-hand sides of the equations of the system, that is, $(Ax - b)^T (Ax - b) \rightarrow \min_x$. It is easy to show that solving this minimization problem leads to the following system of equations:

$$A^T A x = A^T b \;\Rightarrow\; x = (A^T A)^{-1} A^T b.$$
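A minimal sketch of this "solution" of an overdetermined linear system, assuming numpy and using a hypothetical A and b:

import numpy as np

# hypothetical overdetermined system: 5 equations, 2 unknowns (illustration only)
A = np.array([[1.0, 1.0],
              [2.0, 1.0],
              [3.0, 1.0],
              [4.0, 1.0],
              [5.0, 1.0]])
b = np.array([1.1, 1.9, 3.2, 3.9, 5.1])

# x = (A^T A)^{-1} A^T b  (normal equations)
x_normal = np.linalg.solve(A.T @ A, A.T @ b)
# the dedicated least-squares routine gives the same answer more stably
x_lstsq = np.linalg.lstsq(A, b, rcond=None)[0]
print(x_normal, x_lstsq)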

OLS in regression analysis (data fit)

Let there be $n$ values of some variable $y$ (these can be the results of observations, experiments, etc.) and corresponding values of the variables $x$. The task is to approximate the relationship between $y$ and $x$ by some function $f(x, b)$ known up to some unknown parameters $b$, that is, in fact, to find the best values of the parameters $b$ that bring the values $f(x, b)$ as close as possible to the actual values $y$. In fact, this reduces to the case of a "solution" of an overdetermined system of equations with respect to $b$:

$$f(x_t, b) = y_t, \qquad t = 1, \dots, n.$$

In regression analysis, and in econometrics in particular, probabilistic models of the relationship between variables are used:

$$y_t = f(x_t, b) + \varepsilon_t,$$

where $\varepsilon_t$ are the so-called random errors of the model.

Accordingly, the deviations of the observed values $y$ from the model values $f(x, b)$ are assumed already in the model itself. The essence of OLS (ordinary, classical) is to find parameters $b$ for which the sum of squares of the deviations (errors; for regression models they are often called regression residuals) $e_t$ is minimal:

$$\hat{b}_{OLS} = \arg\min_b RSS(b),$$

where $RSS$ (Residual Sum of Squares) is defined as

$$RSS(b) = e^T e = \sum_{t=1}^{n} e_t^2 = \sum_{t=1}^{n} (y_t - f(x_t, b))^2.$$

In the general case, this problem can be solved by numerical optimization (minimization) methods. In this case one speaks of nonlinear least squares (NLS or NLLS, Non-Linear Least Squares). In many cases an analytical solution can be obtained. To solve the minimization problem, it is necessary to find the stationary points of the function $RSS(b)$ by differentiating it with respect to the unknown parameters $b$, equating the derivatives to zero and solving the resulting system of equations:

$$\sum_{t=1}^{n} (y_t - f(x_t, b)) \frac{\partial f(x_t, b)}{\partial b} = 0.$$

OLS for Linear Regression

Let the regression dependence be linear:

$$y_t = \sum_{j=1}^{k} b_j x_{tj} + \varepsilon_t = x_t^T b + \varepsilon_t.$$

Let $y$ be the column vector of observations of the explained variable, and $X$ the $(n \times k)$ matrix of observations of the factors (the rows of the matrix are the vectors of factor values in a given observation, the columns are the vector of values of a given factor in all observations). The matrix representation of the linear model is:

$$y = X b + \varepsilon.$$

Then the vector of estimates of the explained variable and the vector of regression residuals will be

$$\hat{y} = X b, \qquad e = y - \hat{y} = y - X b.$$

Accordingly, the sum of the squares of the regression residuals will be

$$RSS = e^T e = (y - X b)^T (y - X b).$$

Differentiating this function with respect to the parameter vector $b$ and equating the derivatives to zero, we obtain a system of equations (in matrix form):

$$(X^T X) b = X^T y.$$

In expanded matrix form, this system of equations looks like this:

$$\begin{pmatrix}
\sum x_{t1}^2 & \sum x_{t1} x_{t2} & \sum x_{t1} x_{t3} & \ldots & \sum x_{t1} x_{tk} \\
\sum x_{t2} x_{t1} & \sum x_{t2}^2 & \sum x_{t2} x_{t3} & \ldots & \sum x_{t2} x_{tk} \\
\sum x_{t3} x_{t1} & \sum x_{t3} x_{t2} & \sum x_{t3}^2 & \ldots & \sum x_{t3} x_{tk} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\sum x_{tk} x_{t1} & \sum x_{tk} x_{t2} & \sum x_{tk} x_{t3} & \ldots & \sum x_{tk}^2
\end{pmatrix}
\begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_k \end{pmatrix}
=
\begin{pmatrix} \sum x_{t1} y_t \\ \sum x_{t2} y_t \\ \sum x_{t3} y_t \\ \vdots \\ \sum x_{tk} y_t \end{pmatrix},$$

where all the sums are taken over all admissible values of $t$.

If a constant is included in the model (as usual), then $x_{t1} = 1$ for all $t$; therefore, in the upper left corner of the matrix of the system of equations there is the number of observations $n$, and the remaining elements of the first row and first column are simply the sums of the values of the variables $\sum x_{tj}$, while the first element of the right-hand side of the system is $\sum y_t$.

The solution of this system of equations gives the general formula of the OLS estimates for the linear model:

$$\hat{b}_{OLS} = (X^T X)^{-1} X^T y = \left( \tfrac{1}{n} X^T X \right)^{-1} \tfrac{1}{n} X^T y = V_x^{-1} C_{xy}.$$

For analytical purposes, the last representation of this formula turns out to be useful (in the system of equations, when dividing by n, arithmetic means appear instead of sums). If the data in the regression model are centered, then in this representation the first matrix has the meaning of the sample covariance matrix of the factors, and the second is the vector of covariances of the factors with the dependent variable. If, in addition, the data are also normalized by the standard deviation (that is, ultimately standardized), then the first matrix has the meaning of the sample correlation matrix of the factors, and the second vector is the vector of sample correlations of the factors with the dependent variable.

An important property of OLS estimates for models with a constant is that the line of the constructed regression passes through the center of gravity of the sample data, that is, the equality

$$\bar{y} = \hat{b}_1 + \sum_{j=2}^{k} \hat{b}_j \bar{x}_j$$

is fulfilled.

In particular, in the extreme case when the only regressor is a constant, we find that the OLS estimate of the single parameter (the constant itself) is equal to the mean value of the explained variable. That is, the arithmetic mean, known for its good properties from the laws of large numbers, is also an OLS estimate: it satisfies the criterion of the minimum sum of squared deviations from it.

The simplest special cases

In the case of paired linear regression $y_t = a + b x_t + \varepsilon_t$, when the linear dependence of one variable on another is estimated, the calculation formulas are simplified (one can do without matrix algebra). The system of equations has the form:

$$\begin{pmatrix} 1 & \bar{x} \\ \bar{x} & \overline{x^2} \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} \bar{y} \\ \overline{xy} \end{pmatrix}.$$

Hence it is easy to find estimates of the coefficients:

$$\begin{cases} \hat{b} = \dfrac{\operatorname{Cov}(x, y)}{\operatorname{Var}(x)} = \dfrac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^2} - \bar{x}^2}, \\[2mm] \hat{a} = \bar{y} - \hat{b}\,\bar{x}. \end{cases}$$
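A minimal sketch of these paired-regression formulas, assuming numpy and hypothetical data:

import numpy as np

# hypothetical paired observations (illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 2.9, 4.2, 4.8, 6.1])

b_hat = ((x * y).mean() - x.mean() * y.mean()) / ((x**2).mean() - x.mean()**2)
a_hat = y.mean() - b_hat * x.mean()
print(a_hat, b_hat)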

Although in the general case a model with a constant is preferable, in some cases it is known from theoretical considerations that the constant $a$ should be zero. For example, in physics the relationship between voltage and current has the form $U = I \cdot R$; measuring the voltage and the current, it is necessary to estimate the resistance. In this case we are talking about the model $y = b x$. Then, instead of a system of equations, we have the single equation

$$\left( \sum x_t^2 \right) b = \sum x_t y_t.$$

Consequently, the formula for estimating the single coefficient has the form

$$\hat{b} = \frac{\sum_{t=1}^{n} x_t y_t}{\sum_{t=1}^{n} x_t^2} = \frac{\overline{xy}}{\overline{x^2}}.$$

Polynomial model case

If the data are fitted by a polynomial regression function of one variable $f(x) = b_0 + \sum_{i=1}^{k} b_i x^i$, then, treating the powers $x^i$ as independent factors for each $i$, it is possible to estimate the parameters of the model based on the general formula for estimating the parameters of a linear model. To do this, it is sufficient to take into account in the general formula that with such an interpretation $x_{ti} x_{tj} = x_t^i x_t^j = x_t^{i+j}$ and $x_{tj} y_t = x_t^j y_t$. Consequently, the matrix equations in this case take the form:

$$\begin{pmatrix}
n & \sum_n x_t & \ldots & \sum_n x_t^k \\
\sum_n x_t & \sum_n x_t^2 & \ldots & \sum_n x_t^{k+1} \\
\vdots & \vdots & \ddots & \vdots \\
\sum_n x_t^k & \sum_n x_t^{k+1} & \ldots & \sum_n x_t^{2k}
\end{pmatrix}
\begin{bmatrix} b_0 \\ b_1 \\ \vdots \\ b_k \end{bmatrix}
=
\begin{bmatrix} \sum_n y_t \\ \sum_n x_t y_t \\ \vdots \\ \sum_n x_t^k y_t \end{bmatrix}.$$
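A minimal sketch of the polynomial case, assuming numpy and hypothetical data; the design matrix is a Vandermonde matrix in the powers of x, and the general linear-model formula applies unchanged:

import numpy as np

# hypothetical data (illustration only)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 0.8, 1.9, 4.1, 8.2, 13.9])
k = 2                                         # polynomial degree

X = np.vander(x, k + 1, increasing=True)      # columns: 1, x, x^2, ..., x^k
b_hat = np.linalg.solve(X.T @ X, X.T @ y)     # (X^T X) b = X^T y
print(b_hat)                                  # b_0, b_1, ..., b_k

# np.polyfit(x, y, k) gives the same coefficients in reverse order
print(np.polyfit(x, y, k)[::-1])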

Statistical properties of OLS estimates

First of all, we note that for linear models, OLS estimates are linear estimates, as follows from the above formula. For the unbiasedness of the OLS estimates, it is necessary and sufficient to fulfill the most important condition of regression analysis: the mathematical expectation of a random error, conditional in terms of factors, should be equal to zero. This condition, in particular, is satisfied if

  1. the mathematical expectation of random errors is zero, and
  2. factors and random errors are independent random variables.

The second condition - the condition of exogeneity of the factors - is fundamental. If this property is not met, then we can assume that almost any estimates will be extremely unsatisfactory: they will not even be consistent (that is, even a very large amount of data does not allow obtaining qualitative estimates in this case). In the classical case, a stronger assumption is made about the determinism of the factors, as opposed to a random error, which automatically means that the exogeneity condition is met. In the general case, for the consistency of the estimates, it is sufficient to satisfy the exogeneity condition together with the convergence of the matrix $V_x$ to some non-degenerate matrix as the sample size increases to infinity.

In order for, in addition to consistency and unbiasedness, estimates of (ordinary) least squares to be effective (the best in the class of linear unbiased estimates), it is necessary to fulfill additional properties of a random error:

These assumptions can be formulated for the covariance matrix of the vector of random errors: $V(\varepsilon) = \sigma^2 I$.

A linear model satisfying these conditions is called classical. OLS estimates for classical linear regression are unbiased, consistent and the most efficient estimates in the class of all linear unbiased estimates (in the English literature the abbreviation BLUE, Best Linear Unbiased Estimator, is sometimes used; in the Russian literature the Gauss-Markov theorem is more often cited). As is easy to show, the covariance matrix of the vector of coefficient estimates is equal to:

$$V(\hat{b}_{OLS}) = \sigma^2 (X^T X)^{-1}.$$

Efficiency means that this covariance matrix is ​​"minimal" (any linear combination of coefficients, and in particular the coefficients themselves, have the minimum variance), that is, in the class of linear unbiased estimates, the OLS estimates are the best. The diagonal elements of this matrix - the variances of the coefficient estimates - are important parameters of the quality of the estimates obtained. However, it is impossible to calculate the covariance matrix, since the variance of the random errors is unknown. It can be proved that the unbiased and consistent (for the classical linear model) estimate of the variance of random errors is the value:

$s^2 = RSS/(n - k).$

Substituting this value into the formula for the covariance matrix, we obtain an estimate of the covariance matrix. The estimates obtained are also unbiased and consistent. It is also important that the estimate of the error variance (and hence of the variances of the coefficients) and the estimates of the model parameters are independent random variables, which makes it possible to obtain test statistics for testing hypotheses about the model coefficients.
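For illustration, a short NumPy sketch (names are illustrative, not from the source) that computes the OLS estimates, the residual sum of squares, the estimate $s^2 = RSS/(n - k)$ and the estimated covariance matrix of the coefficients:

import numpy as np

def ols_with_covariance(X, y):
    # b = (X'X)^{-1} X'y, s^2 = RSS/(n - k), Cov(b) = s^2 (X'X)^{-1}
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape                      # k counts all regressors, including the constant
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y               # OLS coefficient estimates
    resid = y - X @ b                   # residuals
    rss = resid @ resid                 # residual sum of squares
    s2 = rss / (n - k)                  # unbiased estimate of the error variance
    return b, s2, s2 * XtX_inv          # estimates, s^2, estimated covariance matrix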

It should be noted that if the classical assumptions are not met, the OLS estimates of the parameters are not the most efficient. More generally, one can minimize a quadratic form of the residuals $e^T W e$, where $W$ is some symmetric positive definite weight matrix. Ordinary OLS is a special case of this approach, in which the weight matrix is proportional to the identity matrix. As is known, a symmetric matrix (or operator) admits a decomposition $W = P^T P$. Therefore, this functional can be represented as $e^T P^T P e = (Pe)^T Pe = e_*^T e_*$, that is, as the sum of the squares of certain transformed "residuals". Thus, one can distinguish a whole class of least squares methods: LS-methods (Least Squares).

It has been proved (Aitken's theorem) that for a generalized linear regression model (in which no restrictions are imposed on the covariance matrix of the random errors), the most efficient estimates (in the class of linear unbiased estimates) are the estimates of so-called generalized least squares (GLS): the LS-method with a weight matrix equal to the inverse of the covariance matrix of the random errors, $W = V_\varepsilon^{-1}$.

It can be shown that the formula for the GLS estimates of the parameters of a linear model has the form

$\hat{b}_{GLS} = (X^T V^{-1} X)^{-1} X^T V^{-1} y.$

The covariance matrix of these estimates will accordingly be equal to

$V(\hat{b}_{GLS}) = (X^T V^{-1} X)^{-1}.$

In essence, GLS amounts to a certain (linear) transformation (P) of the original data followed by application of ordinary OLS to the transformed data. The purpose of this transformation is that for the transformed data the random errors already satisfy the classical assumptions.
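A sketch, under the assumption that the error covariance matrix V is known, of GLS computed both by the direct formula and as exactly this transformation (a Cholesky factor of V^{-1} plays the role of P; names are illustrative, not from the source):

import numpy as np

def gls(X, y, V):
    # b_GLS = (X' V^{-1} X)^{-1} X' V^{-1} y, with Cov(b_GLS) = (X' V^{-1} X)^{-1}
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    V_inv = np.linalg.inv(V)
    XtVX = X.T @ V_inv @ X
    b = np.linalg.solve(XtVX, X.T @ V_inv @ y)
    return b, np.linalg.inv(XtVX)

def gls_via_transform(X, y, V):
    # The same estimate via the transformation P with P'P = V^{-1}, then ordinary OLS.
    P = np.linalg.cholesky(np.linalg.inv(V)).T   # P'P = V^{-1}
    Xs, ys = P @ X, P @ y                        # transformed data
    b, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
    return b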

Weighted OLS

In the case of a diagonal weight matrix (and hence a diagonal covariance matrix of the random errors), we have so-called weighted least squares (WLS). In this case, the weighted sum of squared residuals of the model is minimized, that is, each observation receives a "weight" inversely proportional to the variance of the random error in that observation: $e^T W e = \sum_{t=1}^{n} \frac{e_t^2}{\sigma_t^2}$. In effect, the data are transformed by weighting the observations (dividing by a value proportional to the assumed standard deviation of the random errors), and ordinary OLS is applied to the weighted data.
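A minimal sketch of this idea, assuming the per-observation error variances (or values proportional to them) are known; names are illustrative, not from the source:

import numpy as np

def wls(X, y, sigma2):
    # Divide each observation by its error standard deviation, then apply ordinary OLS.
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.sqrt(np.asarray(sigma2, dtype=float))  # per-observation weights
    b, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)
    return b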

    Linear regression is widely used in econometrics owing to the clear economic interpretation of its parameters.

    Linear regression reduces to finding an equation of the form

    $\hat{y} = a + bx$ or $y = a + bx + \varepsilon$.

    An equation of this form makes it possible, for given values of the factor x, to obtain theoretical values of the effective indicator by substituting the actual values of the factor into it.

    The construction of linear regression reduces to estimating its parameters a and b. Estimates of the parameters of linear regression can be found by different methods.

    The classical approach to estimating the parameters of linear regression is based on the method of least squares (OLS).

    OLS makes it possible to obtain estimates of the parameters a and b for which the sum of the squared deviations of the actual values of the resultant attribute y from the calculated (theoretical) values is minimal: $\sum (y - \hat{y})^2 \to \min$.

    To find the minimum of the function, it is necessary to calculate the partial derivatives with respect to each of the parameters a and b and set them to zero.

    Denoting this sum of squares by S, we obtain:

    Transforming these equations, we obtain the following system of normal equations for estimating the parameters a and b:

    Solving this system of normal equations (3.5) either by successive elimination of variables or by the method of determinants, we find the required estimates of the parameters a and b.
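    For illustration, a minimal Python sketch (names and data are illustrative, not from the source) that solves the 2x2 system of normal equations by the method of determinants (Cramer's rule):

import numpy as np

def linear_fit(x, y):
    # Estimate a and b in y = a + b*x from the system of normal equations.
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    sx, sy = x.sum(), y.sum()
    sxx, sxy = (x * x).sum(), (x * y).sum()
    # Cramer's rule for the system:  a*n + b*sx = sy;  a*sx + b*sxx = sxy
    det = n * sxx - sx * sx
    a = (sy * sxx - sx * sxy) / det
    b = (n * sxy - sx * sy) / det
    return a, b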

    The parameter b is called the regression coefficient. Its value shows the average change in the result when the factor changes by one unit.

    The regression equation is always supplemented with an indicator of the closeness of the relationship. When linear regression is used, the linear correlation coefficient serves as such an indicator. There are various modifications of the formula for the linear correlation coefficient; some of them are given below:

    As is known, the linear correlation coefficient lies within the limits $-1 \le r \le 1$.

    To assess the quality of the fit of the linear function, the square of the linear correlation coefficient, called the coefficient of determination, is calculated. The coefficient of determination characterizes the proportion of the variance of the effective indicator y explained by the regression in the total variance of the effective trait:

    Accordingly, the value $1 - R^2$ characterizes the proportion of the variance of y caused by the influence of other factors not taken into account in the model.
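    A short sketch of these two indicators for pairwise linear regression (names are illustrative, not from the source):

import numpy as np

def correlation_and_determination(x, y):
    # Linear correlation coefficient r (-1 <= r <= 1) and coefficient of determination R^2 = r^2.
    r = np.corrcoef(x, y)[0, 1]
    return r, r ** 2

    The value 1 - r**2 then gives the share of the variance left unexplained by the model.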

    Questions for self-control

    1. What is the essence of the least squares method?

    2. How many variables does paired regression involve?

    3. Which coefficient determines the closeness of the relationship between the changes?

    4. Within what limits does the coefficient of determination lie?

    5. How is the parameter b estimated in correlation-regression analysis?


    Nonlinear economic models. Nonlinear regression models. Transformation of variables.

    Nonlinear economic models.

    Transformation of variables.

    Elasticity coefficient.

    If the relations between economic phenomena are nonlinear, they are expressed using the corresponding nonlinear functions: for example, an equilateral hyperbola $y = a + b/x$, a second-degree parabola $y = a + bx + cx^2$, etc.

    There are two classes of nonlinear regressions:

    1. Regressions that are nonlinear with respect to the explanatory variables included in the analysis, but linear with respect to the estimated parameters, for example:

    Polynomials of various degrees: $y = a + bx + cx^2$, $y = a + bx + cx^2 + dx^3$;

    Equilateral hyperbola: $y = a + b/x$;

    Semi-logarithmic function: $y = a + b \ln x$.

    2. Regressions that are nonlinear in the estimated parameters (a linearization sketch follows this list), for example:

    Power: $y = a x^b$;

    Exponential: $y = a b^x$;

    Exponential (natural base): $y = e^{a + bx}$.
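    For models that are nonlinear in the parameters, a standard device is a transformation of variables. A sketch, assuming a power model $y = a x^b$ with positive data, fitted by taking logarithms and then applying ordinary least squares (names and data are illustrative, not from the source):

import numpy as np

def fit_power_model(x, y):
    # Fit y = a * x**b by linearization: ln y = ln a + b * ln x, then ordinary OLS.
    lx, ly = np.log(x), np.log(y)
    slope, intercept = np.polyfit(lx, ly, 1)   # b and ln(a) of the linearized model
    return np.exp(intercept), slope            # a, b

# Hypothetical positive data
a, b = fit_power_model(np.array([1.0, 2.0, 3.0, 4.0]), np.array([2.1, 4.3, 5.9, 8.2]))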

    The total sum of the squared deviations of the individual values of the effective indicator y from its mean is due to the influence of many causes. Let us conditionally divide the whole set of causes into two groups: the studied factor x and other factors.

    If the factor does not affect the result, then the regression line on the graph is parallel to the Ox axis and $\hat{y} = \bar{y}$.

    Then the entire variance of the effective trait is due to the influence of other factors, and the total sum of squared deviations coincides with the residual sum. If other factors do not affect the result, then y is functionally related to x and the residual sum of squares is zero. In this case, the sum of squared deviations explained by the regression coincides with the total sum of squares.

    Since not all points of the correlation field lie on the regression line, their scatter always arises both from the influence of the factor x, that is, the regression of y on x, and from other causes (unexplained variation). The suitability of the regression line for forecasting depends on what part of the total variation of the trait y is accounted for by the explained variation.

    Obviously, if the sum of squared deviations due to the regression is greater than the residual sum of squares, then the regression equation is statistically significant and the factor x has a significant effect on the result y.


    The significance of the regression equation as a whole is assessed using Fisher's F-criterion. In this case, the null hypothesis is put forward that the regression coefficient is zero, i.e., b = 0, and hence the factor x has no effect on the result y.

    The direct calculation of the F-criterion is preceded by an analysis of variance. Its central element is the decomposition of the total sum of squared deviations of the variable y from its mean $\bar{y}$ into two parts, "explained" and "unexplained":

    $\sum (y - \bar{y})^2$ - the total sum of squared deviations;

    $\sum (\hat{y} - \bar{y})^2$ - the sum of squared deviations explained by the regression;

    $\sum (y - \hat{y})^2$ - the residual sum of squared deviations.

    Any sum of squared deviations is related to the number of degrees of freedom, that is, to the number of independent variations of the trait. The number of degrees of freedom is related to the number of units in the population n and to the number of constants determined from it. In relation to the problem under study, the number of degrees of freedom should show how many independent deviations out of n possible are required to form a given sum of squares.

    Variance per one degree of freedom, D:

    F-ratio (F-criterion):

    If the null hypothesis is true, then the factor variance and the residual variance do not differ from each other. For H0 to be rejected, the factor variance must exceed the residual variance several times over. The American statistician Snedecor developed tables of critical values of the F-ratio for different significance levels of the null hypothesis and different numbers of degrees of freedom. The tabular value of the F-criterion is the maximum value of the ratio of the variances that can occur by chance at the given probability level if the null hypothesis is true. The calculated value of the F-ratio is considered reliable if it is greater than the tabular one.

    In this case, the null hypothesis of the absence of a relationship between the traits is rejected and a conclusion is drawn that this relationship is significant: if F_fact > F_table, H0 is rejected.

    If the value is less than the tabular one, F_fact < F_table, then the probability of the null hypothesis is higher than the specified level, and it cannot be rejected without a serious risk of drawing an incorrect conclusion about the presence of a relationship. In this case, the regression equation is considered statistically insignificant, and H0 is not rejected.
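    To illustrate, a sketch of this decomposition and of the F-test for a pairwise regression (the critical value is taken from scipy.stats; names are illustrative, not from the source):

import numpy as np
from scipy import stats

def f_test_pairwise(x, y, alpha=0.05):
    # Decompose the sum of squares and test the significance of a pairwise regression.
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    b, a = np.polyfit(x, y, 1)                 # slope b and intercept a
    y_hat = a + b * x
    ess = ((y_hat - y.mean()) ** 2).sum()      # sum of squares explained by the regression
    rss = ((y - y_hat) ** 2).sum()             # residual sum of squares
    f_fact = (ess / 1) / (rss / (n - 2))       # variances per degree of freedom
    f_tab = stats.f.ppf(1 - alpha, 1, n - 2)   # tabular (critical) value
    return f_fact, f_tab, f_fact > f_tab       # True means H0 (b = 0) is rejected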

    Regression coefficient standard error

    To assess the significance of the regression coefficient, its value is compared with its standard error; that is, the actual value of Student's t-criterion is determined and then compared with the tabular value at a certain significance level and number of degrees of freedom (n - 2).

    Standard error of the parameter a:

    The significance of the linear correlation coefficient is checked on the basis of the magnitude of the error of the correlation coefficient $t_r$:

    Total variance of the trait x:
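    A sketch of the standard error of the regression coefficient and the corresponding t-statistic for pairwise regression (critical value from scipy.stats; names are illustrative, not from the source):

import numpy as np
from scipy import stats

def t_test_slope(x, y, alpha=0.05):
    # Standard error of b and Student's t-test for the significance of the regression coefficient.
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    b, a = np.polyfit(x, y, 1)
    resid = y - (a + b * x)
    s2 = (resid ** 2).sum() / (n - 2)                  # residual variance per degree of freedom
    se_b = np.sqrt(s2 / ((x - x.mean()) ** 2).sum())   # standard error of b
    t_fact = b / se_b
    t_tab = stats.t.ppf(1 - alpha / 2, n - 2)          # two-sided tabular value
    return se_b, t_fact, t_tab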

    Multiple Linear Regression

    Building the model

    Multiple regression is a regression of an effective trait on two or more factors, that is, a model of the form $y = f(x_1, x_2, \ldots, x_p)$.

    Regression can give a good result in modeling if the influence of other factors affecting the object of study can be neglected. The behavior of individual economic variables cannot be controlled, that is, it is impossible to ensure that all other conditions are equal when assessing the influence of one factor under study. In this case, one should try to identify the influence of other factors by introducing them into the model, that is, to construct a multiple regression equation: $y = a + b_1 x_1 + b_2 x_2 + \ldots + b_p x_p + \varepsilon$.

    The main goal of multiple regression is to build a model with a large number of factors, determining the influence of each of them separately as well as their combined effect on the modeled indicator. Model specification includes two groups of issues: the selection of factors and the choice of the type of regression equation.
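    A sketch of estimating such a multiple regression by ordinary least squares (a column of ones is added for the constant a; names are illustrative, not from the source):

import numpy as np

def multiple_regression(factors, y):
    # Estimate a, b1, ..., bp in y = a + b1*x1 + ... + bp*xp + eps by ordinary least squares.
    factors = np.asarray(factors, dtype=float)          # shape (n, p): one column per factor
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), factors])     # add a column of ones for the constant a
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0], coef[1:]                            # a, (b1, ..., bp)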
