Optical flow methods. Calculating the optical flow with the Lucas-Kanade method


In computer vision and image processing systems, the problem often arises of determining the movements of objects in three-dimensional space using an optical sensor, that is, a video camera. Having a sequence of frames as input, it is necessary to recreate the three-dimensional space imprinted on them and the changes that occur to it over time. It sounds complicated, but in practice it is often enough to find the offsets of two-dimensional projections of objects in the frame plane.

If we want to know how much this or that object has shifted relative to its position in the previous frame during the time between frame captures, then most likely the first thing that comes to mind is the optical flow. To find the optical flow, you can safely use a ready-made, tested, and optimized implementation of one of the algorithms, for example, from the OpenCV library. It does not hurt, however, to understand the theory, so I invite everyone interested to look inside one of the popular and well-studied methods. This article focuses on formulas and a number of mathematical derivations rather than on practical advice.
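For context, a minimal sketch of calling the ready-made OpenCV implementations might look like this in Python. The video path and the feature-detector parameters are placeholders; cv2.calcOpticalFlowPyrLK and cv2.calcOpticalFlowFarneback are the library's sparse and dense flow functions.

```python
# Minimal sketch of using the ready-made OpenCV implementations mentioned above.
# "video.avi" is a placeholder path; the detector parameters are illustrative.
import cv2

cap = cv2.VideoCapture("video.avi")
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

# Sparse flow: track corner points with the pyramidal Lucas-Kanade tracker.
points = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                 qualityLevel=0.01, minDistance=7)

ok, frame = cap.read()
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
new_points, status, err = cv2.calcOpticalFlowPyrLK(prev_gray, gray, points, None)

# Dense flow: one displacement vector per pixel (Farneback's algorithm).
flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                    pyr_scale=0.5, levels=3, winsize=15,
                                    iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
```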

There are several approaches to determining offsets between two adjacent frames. For example, for each small fragment (say, 8 by 8 pixels) of one frame, you can find the most similar fragment in the next frame. In this case, the difference between the coordinates of the original and found fragments will give us the offset. The main difficulty here is how to quickly find the desired fragment without going through the entire frame pixel by pixel. Various implementations of this approach solve the problem of computational complexity in one way or another. Some are so successful that they are used, for example, in common video compression standards. The price of speed is, of course, quality. We will consider another approach, which allows us to obtain offsets not for fragments, but for each individual pixel, and is used when the speed is not so critical. The term “optical flow” is often associated with it in the literature.
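To make the fragment-matching idea concrete, here is a rough sketch of exhaustive block matching by sum of squared differences. The block size, search radius and function name are illustrative; real codecs use far smarter search strategies than this brute-force loop.

```python
# Rough sketch of the block-matching idea: for an 8x8 block of frame1, search a
# small window of frame2 for the most similar block by sum of squared differences.
import numpy as np

def match_block(frame1, frame2, top, left, block=8, radius=7):
    patch = frame1[top:top + block, left:left + block].astype(np.float32)
    best_offset, best_cost = (0, 0), np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + block > frame2.shape[0] or x + block > frame2.shape[1]:
                continue
            candidate = frame2[y:y + block, x:x + block].astype(np.float32)
            cost = np.sum((patch - candidate) ** 2)  # SSD similarity measure
            if cost < best_cost:
                best_cost, best_offset = cost, (dx, dy)
    return best_offset  # displacement of the block between the two frames
```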

This approach is often called differential, since it is based on calculating partial derivatives in the horizontal and vertical directions of the image. As we will see later, the derivatives alone are not enough to determine the displacements. That is why, building on one simple idea, a great many methods have appeared, each of which uses its own mathematical tricks to reach the goal. Let's focus on the Lucas-Kanade method, proposed in 1981 by Bruce Lucas and Takeo Kanade.

Lucas-Kanade method
All further reasoning rests on one very important and not entirely fair assumption: we assume that pixel values pass from one frame to the next without change. In other words, pixels belonging to the same object may move in any direction, but their values remain the same. Of course, this assumption has little to do with reality: global lighting conditions and the illumination of the moving object itself can change from frame to frame. There are plenty of problems with this assumption but, oddly enough, it works quite well in practice.

In mathematical language, this assumption can be written as $I(x, y, t) = I(x + d_x, y + d_y, t + 1)$, where $I$ is the pixel brightness as a function of position in the frame and time. In other words, $x$ and $y$ are the pixel coordinates in the frame plane, $d_x$ and $d_y$ are the offsets, and $t$ is the number of the frame in the sequence. Let us agree that a single time interval passes between two adjacent frames.

One-dimensional case
Let us first consider the one-dimensional case. Imagine two one-dimensional frames, 1 pixel high and 20 pixels wide (figure on the right). In the second frame, the image is slightly shifted to the right. It is this shift that we want to find. To do this, let us represent the same frames as functions $I_1(x)$ and $I_2(x)$ (figure on the left): the input is the position of a pixel, the output is its intensity. In such a representation the required displacement $d$ can be seen even more clearly. According to our assumption, $I_2$ is just a shifted $I_1$, that is, we can say that $I_2(x + d) = I_1(x)$.

Please note that, if you wish, this can also be written in the general form $I_1(x) = I(x, y, t)$, $I_2(x) = I(x, y, t + 1)$, where $y$ and $t$ are fixed and equal to zero.

For each coordinate $x$ we know the values $I_1(x)$ and $I_2(x)$ at that point; in addition, we can compute their derivatives. We need to relate these known values to the offset $d$. To do this, we write the Taylor expansion for $I_2(x + d)$:

$I_2(x + d) = I_2(x) + d\,\frac{\partial I_2}{\partial x}(x) + \frac{d^2}{2}\,\frac{\partial^2 I_2}{\partial x^2}(x) + \dots$

Let's make a second important assumption: assume that $I_2(x + d)$ is approximated well enough by the first derivative. Having made this assumption, we discard everything after the first-order term:

$I_2(x + d) \approx I_2(x) + d\,\frac{\partial I_2}{\partial x}(x).$

How legitimate is this? In general, not very: we lose accuracy here unless our function / image is strictly linear, as in our artificial example. But it greatly simplifies the method, and to achieve the required accuracy a successive approximation can be made, which we will consider later.

We are almost there. The offset $d$ is our target value, so we need to do something with $I_2(x + d)$. As we agreed earlier, $I_2(x + d) = I_1(x)$, so we simply rewrite:

$I_1(x) \approx I_2(x) + d\,\frac{\partial I_2}{\partial x}(x), \qquad \text{hence} \qquad d \approx \frac{I_1(x) - I_2(x)}{\partial I_2(x)/\partial x}.$
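A tiny numerical illustration of this one-dimensional formula on a synthetic linear signal. The signal and the shift value are made up for the example; on real images the estimate is reliable only where the derivative is not close to zero.

```python
# Illustration of the 1D formula: d ~ (I1(x) - I2(x)) / dI2/dx.
import numpy as np

x = np.arange(20, dtype=np.float32)
true_d = 2.0
I1 = 0.5 * x                # a linear ramp, as in the artificial example
I2 = 0.5 * (x - true_d)     # the same ramp shifted right by true_d pixels

dI2 = np.gradient(I2)       # finite-difference approximation of the derivative
d_est = (I1 - I2) / dI2     # per-pixel displacement estimate
print(d_est[5])             # ~2.0 for this strictly linear signal
```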

Two-dimensional case
Let us now pass from the one-dimensional case to the two-dimensional one. We write the Taylor expansion for $I(x + d_x, y + d_y, t + 1)$ and immediately discard all higher-order derivatives. Instead of the first derivative, the gradient appears:

$I(x + d_x, y + d_y, t + 1) \approx I(x, y, t + 1) + \nabla I \cdot \vec{d},$

where $\vec{d} = (d_x, d_y)^T$ is the displacement vector.
In accordance with the assumption we made, $I(x + d_x, y + d_y, t + 1) = I(x, y, t)$. This is what we need. Let's rewrite:

$I(x, y, t + 1) + \nabla I \cdot \vec{d} \approx I(x, y, t), \qquad \text{that is,} \qquad \nabla I \cdot \vec{d} \approx -\big(I(x, y, t + 1) - I(x, y, t)\big).$

Since a single time interval passes between two frames, we can say that $I(x, y, t + 1) - I(x, y, t)$ is nothing more than the derivative with respect to time, $I_t$.
Let's rewrite:

$\nabla I \cdot \vec{d} \approx -I_t.$

Let's rewrite it once more, opening up the gradient:

$I_x\,d_x + I_y\,d_y \approx -I_t, \qquad \text{or} \qquad I_x\,d_x + I_y\,d_y + I_t \approx 0.$

We have obtained an equation telling us that the sum of the partial derivatives must equal zero. The only problem is that we have one equation with two unknowns: $d_x$ and $d_y$. At this point, a flight of fancy and a variety of approaches begin.

Let's make a third assumption: assume that adjacent pixels are shifted by the same distance. Take a fragment of the image, say 5 by 5 pixels, and agree that for each of its 25 pixels $d_x$ and $d_y$ are the same. Then instead of one equation we get 25 equations at once! Obviously, in the general case this system has no exact solution, so we will look for the $d_x$ and $d_y$ that minimize the error

$E(d_x, d_y) = \sum_{x, y} g(x, y)\,\big(I_x(x, y)\,d_x + I_y(x, y)\,d_y + I_t(x, y)\big)^2.$

Here $g$ is a function that determines the weights of the pixels. The most common choice is a two-dimensional Gaussian, which gives the greatest weight to the central pixel and less and less weight as you move away from the center.
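One possible way to build such a weighting function $g$ is a small normalized Gaussian window; the window size and sigma below are arbitrary illustrative choices.

```python
# Build a 5x5 Gaussian weight window: largest weight in the center, decaying outward.
import numpy as np

def gaussian_window(size=5, sigma=1.0):
    half = size // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1]
    g = np.exp(-(xs ** 2 + ys ** 2) / (2.0 * sigma ** 2))
    return g / g.sum()  # normalize so the weights sum to one

g = gaussian_window()
```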

To find the minimum we use the least-squares approach: take the partial derivatives of $E$ with respect to $d_x$ and $d_y$:

$\frac{\partial E}{\partial d_x} = \sum_{x, y} 2\,g(x, y)\,I_x\,\big(I_x\,d_x + I_y\,d_y + I_t\big), \qquad \frac{\partial E}{\partial d_y} = \sum_{x, y} 2\,g(x, y)\,I_y\,\big(I_x\,d_x + I_y\,d_y + I_t\big).$

We rewrite in a more compact form and equate to zero:

$\sum g\,I_x I_x\;d_x + \sum g\,I_x I_y\;d_y = -\sum g\,I_x I_t,$

$\sum g\,I_x I_y\;d_x + \sum g\,I_y I_y\;d_y = -\sum g\,I_y I_t.$

Let's rewrite these two equations in matrix form:

$M \begin{pmatrix} d_x \\ d_y \end{pmatrix} = \begin{pmatrix} -\sum g\,I_x I_t \\ -\sum g\,I_y I_t \end{pmatrix}, \qquad M = \begin{pmatrix} \sum g\,I_x I_x & \sum g\,I_x I_y \\ \sum g\,I_x I_y & \sum g\,I_y I_y \end{pmatrix}.$

If the matrix $M$ is invertible (has rank 2), we can calculate the $d_x$ and $d_y$ that minimize the error $E$:

$\begin{pmatrix} d_x \\ d_y \end{pmatrix} = M^{-1} \begin{pmatrix} -\sum g\,I_x I_t \\ -\sum g\,I_y I_t \end{pmatrix}.$

That's actually all. We know the approximate pixel offset between two adjacent frames.

Since the neighboring pixels are also involved in finding the offset of each pixel, when implementing this method, it is advisable to preliminarily calculate the horizontal and vertical derivatives of the frame.
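Putting the previous steps together, here is a compact sketch of one possible per-pixel implementation: the derivatives are computed once for the whole frame, Gaussian-weighted sums form the entries of $M$ and of the right-hand side, and the 2x2 system is solved wherever $M$ is well conditioned. The function name, the Gaussian sigma and the eigenvalue threshold are illustrative choices, not part of the method's definition.

```python
# Sketch of a dense Lucas-Kanade estimator: derivatives once per frame, then a
# weighted 2x2 system per pixel.  scipy is used only for the Gaussian weighting.
import numpy as np
from scipy.ndimage import gaussian_filter

def lucas_kanade(frame1, frame2, sigma=1.5, eig_threshold=1e-2):
    I1 = frame1.astype(np.float64)
    I2 = frame2.astype(np.float64)

    # Horizontal, vertical and temporal derivatives, computed once for the frame.
    Iy, Ix = np.gradient(I2)
    It = I2 - I1

    # Gaussian-weighted sums over the local neighbourhood: entries of M and of b.
    def wsum(a):
        return gaussian_filter(a, sigma)

    Sxx, Sxy, Syy = wsum(Ix * Ix), wsum(Ix * Iy), wsum(Iy * Iy)
    Sxt, Syt = wsum(Ix * It), wsum(Iy * It)

    det = Sxx * Syy - Sxy ** 2
    trace = Sxx + Syy
    # Smallest eigenvalue of M; where it is tiny, M is (nearly) degenerate and the
    # system cannot be solved reliably.
    min_eig = 0.5 * (trace - np.sqrt(np.maximum(trace ** 2 - 4.0 * det, 0.0)))
    ok = min_eig > eig_threshold

    dx = np.zeros_like(I1)
    dy = np.zeros_like(I1)
    dx[ok] = (-Syy[ok] * Sxt[ok] + Sxy[ok] * Syt[ok]) / det[ok]
    dy[ok] = ( Sxy[ok] * Sxt[ok] - Sxx[ok] * Syt[ok]) / det[ok]
    return dx, dy  # per-pixel displacement estimates
```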

Disadvantages of the method
The method described above is based on three significant assumptions, which, on the one hand, give us the fundamental ability to determine the optical flow but, on the other hand, introduce an error. The good news for perfectionists is that one of the assumptions is needed only to simplify the method, and we can deal with its consequences. We assumed that the first derivative would be enough to approximate the displacement. In the general case this is, of course, not so (figure on the left). To achieve the required accuracy, the offset for each pair of frames (let's call them $I_1$ and $I_2$) can be calculated iteratively. In the literature this is called warping. In practice it means that, having calculated the offsets at the first iteration, we move each pixel of the frame in the opposite direction so that this offset is compensated. At the next iteration, instead of the original frame, we use its warped version. And so on, until at the current iteration all the resulting offsets are less than the specified threshold value. The final offset for each particular pixel is the sum of its offsets over all iterations.
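A sketch of such warping iterations might look as follows. It takes the flow estimator as a parameter (for example, the hypothetical lucas_kanade() function from the previous sketch); the interpolation routine, iteration limit and stopping threshold are illustrative choices.

```python
# Sketch of the warping iterations: warp frame2 back by the accumulated offset,
# re-estimate the residual flow, and sum the offsets over all iterations.
import numpy as np
from scipy.ndimage import map_coordinates

def iterative_flow(frame1, frame2, estimate_flow, max_iters=10, eps=0.01):
    """estimate_flow(frame1, frame2) -> (dx, dy), e.g. the lucas_kanade sketch above."""
    h, w = frame1.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    total_dx = np.zeros((h, w))
    total_dy = np.zeros((h, w))

    for _ in range(max_iters):
        # Warp the second frame back by the displacement accumulated so far, so that
        # the remaining offset between frame1 and the warped frame shrinks.
        warped = map_coordinates(frame2.astype(np.float64),
                                 [ys + total_dy, xs + total_dx],
                                 order=1, mode='nearest')
        dx, dy = estimate_flow(frame1, warped)
        total_dx += dx
        total_dy += dy
        # Stop once all per-iteration offsets fall below the threshold.
        if max(np.abs(dx).max(), np.abs(dy).max()) < eps:
            break
    return total_dx, total_dy  # sum of the offsets over all iterations
```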

By its nature, this method is local, that is, when determining the offset of a particular pixel, only the area around this pixel is taken into account - the local neighborhood. As a consequence, it is impossible to determine displacements inside sufficiently large (greater than the size of the local neighborhood) evenly colored areas of the frame. Fortunately, such areas are not often found on real frames, but this feature still introduces an additional deviation from the true displacement.

Another problem is related to the fact that some textures in the image produce a degenerate matrix M, for which the inverse matrix cannot be found. Accordingly, for such textures we will not be able to determine the offset: there seems to be movement, but it is not clear in which direction. In general, not only the method considered here suffers from this problem; even the human eye perceives such motion ambiguously (the barber pole illusion).
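One practical way to spot such problematic textures in advance is to look at the smaller eigenvalue of the matrix M. OpenCV's cornerMinEigenVal computes exactly this quantity for every pixel; the file name and threshold below are placeholders.

```python
# Mask out pixels whose 2x2 matrix M is (nearly) degenerate, i.e. where the
# smaller eigenvalue is close to zero and the offset cannot be determined.
import cv2

gray = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)   # placeholder file name
min_eig = cv2.cornerMinEigenVal(gray, blockSize=5)
reliable = min_eig > 1e-3   # mask of pixels where the flow can be estimated
```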

Conclusion
We have analyzed the theoretical foundations of one of the differential methods for finding the optical flow. There are many other interesting methods, some of which, as of today, give more reliable results. However, the Lucas-Kanade method, while reasonably effective, remains quite simple to understand, and is therefore well suited for a first acquaintance with the mathematical foundations.

Although the problem of finding the optical flow has been studied for several decades, the methods are still being improved. The work continues because, on closer examination, the problem turns out to be very difficult, and the quality of the displacement estimates determines the stability and efficiency of many other algorithms in video and image processing.

On this lofty note, let me wrap up and move on to sources and useful links.

Optical flow is the pattern of apparent motion of objects, surfaces, or edges of a scene caused by the motion of the observer (an eye or a camera) relative to the scene. Techniques based on optical flow, such as motion detection, object segmentation, motion-compensated encoding, and stereo disparity estimation, exploit this motion of objects, surfaces, and edges.

Optical Flow Estimation

Optical flow based methods calculate the motion between two frames taken at times $t$ and $t + \Delta t$ at each pixel. These methods are called differential, since they are based on approximating the signal with a segment of its Taylor series; thus, they use partial derivatives with respect to time and the spatial coordinates.

In the 2D+t case (higher-dimensional cases are similar), a pixel at position $(x, y, t)$ with intensity $I(x, y, t)$ moves by $\Delta x$, $\Delta y$, $\Delta t$ between the two frames, and the following equation can be written:

$I(x, y, t) = I(x + \Delta x,\; y + \Delta y,\; t + \Delta t).$

Assuming that the displacement is small, and using the Taylor series, we get:

$I(x + \Delta x,\; y + \Delta y,\; t + \Delta t) = I(x, y, t) + \frac{\partial I}{\partial x}\,\Delta x + \frac{\partial I}{\partial y}\,\Delta y + \frac{\partial I}{\partial t}\,\Delta t + \text{higher-order terms}.$

From these equalities it follows that

$\frac{\partial I}{\partial x}\,\Delta x + \frac{\partial I}{\partial y}\,\Delta y + \frac{\partial I}{\partial t}\,\Delta t = 0,$

hence, dividing by $\Delta t$, it turns out that

$\frac{\partial I}{\partial x}\,V_x + \frac{\partial I}{\partial y}\,V_y + \frac{\partial I}{\partial t} = 0,$

where $V_x$ and $V_y$ are the components of the optical flow velocity at $(x, y, t)$, and

$\frac{\partial I}{\partial x}$, $\frac{\partial I}{\partial y}$, $\frac{\partial I}{\partial t}$ are the derivatives of the image in the respective directions.

In this way:

$I_x V_x + I_y V_y = -I_t, \qquad \text{or} \qquad \nabla I^{T} \cdot \vec{V} = -I_t.$

The resulting equation contains two unknowns and cannot be solved uniquely. This circumstance is known as the aperture problem. The problem is solved by imposing additional constraints (regularization).

Methods for determining the optical flow:

    Phase correlation is the inversion of the normalized cross spectrum.

    Block methods - minimizing the sum of squared differences or the sum of absolute differences

    Differential methods for estimating the optical flow based on the partial derivatives of the signal:

    Lucas-Kanade algorithm - considers image patches and an affine motion model

    Horn–Schunck - minimization of the functional describing the deviation from the assumption of constant brightness and the smoothness of the resulting vector field.

    Buxton–Buxton - based on the model of movement of the boundaries of objects in a sequence of images

    General variational methods are modifications of the Horn-Schunck method that use different constraints on the data and other constraints on smoothness.

    Discrete optimization methods - the search space is quantized, then a label is assigned to each pixel of the image so that the distance between successive frames is minimal. The optimal solution is often sought using algorithms for finding the minimum cut and maximum flow in a graph, linear programming, or belief propagation.

Optical flow tracking is often used with fixed cameras such as airport or building cameras, and fixed DVR cameras.

In this work, a method based on the Lucas-Kanade algorithm was used (Fig. 4-6).

Fig. 4. Main model window



Fig. 5. Model blocks

Fig. 6. The result of the model

This model uses an optical flow estimation method to determine the motion vectors in each frame of a video file. By thresholding the motion vectors and applying morphological operations to them, the model creates binary feature images. The model then finds the cars in each binary image with the Blob Analysis block, and the Draw Shapes block draws a green box around the cars that pass the white line.
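For readers who prefer code to block diagrams, a rough OpenCV analogue of this pipeline might look as follows. It is not the Simulink model itself, and all numeric parameters are illustrative placeholders.

```python
# Rough analogue of the pipeline: dense optical flow -> thresholding of the motion
# magnitude -> morphological clean-up -> blob (connected component) analysis -> boxes.
import cv2
import numpy as np

def detect_moving_objects(prev_gray, gray, mag_threshold=1.0, min_area=200):
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    magnitude = np.linalg.norm(flow, axis=2)

    # Binary image of "moving" pixels, cleaned up with a morphological closing.
    mask = (magnitude > mag_threshold).astype(np.uint8) * 255
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)

    # Blob analysis: keep connected components that are large enough to be a car.
    boxes = []
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        if cv2.contourArea(c) >= min_area:
            boxes.append(cv2.boundingRect(c))  # (x, y, w, h)
    return boxes
```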

The disadvantage of the method is that the camera must be stationary, otherwise the result of recognition and tracking becomes unpredictable.

The construction of an optical flow is traditionally treated as a procedure for estimating the brightness and geometric changes between the current and the previous frame. The movement of objects in front of a stationary camera, as well as the movement of the camera through the environment, leads to corresponding changes in the image. The apparent movement of the visible field (the two-dimensional distribution of brightness), observed when the camera moves relative to the imaged objects or the objects move relative to the camera, is called optical flow. Let us define the motion field by assigning a velocity vector to each point of the image. At some selected moment in time, a point in the image corresponds to some point on the surface of an object; these two points are related by the projection equations. The object point moves relative to the camera with some velocity, and this generates a movement of the corresponding point in the image: during a small time interval the object point moves some distance, and its image moves a corresponding distance (see Fig. 1).

Fig. 1.

The brightness distributions move along with the observed objects. The optical flow, as mentioned earlier, is the apparent movement of the brightness pattern. Ideally, the optical flow corresponds to the previously defined field of motion, however, in practice this is not always the case.

Let now $E(x, y, t)$ be the brightness of a pixel at the point $(x, y)$ of the image at time $t$. Then, if $u$ and $v$ are the components of the optical flow vector at this point, we can expect that

$E(x + u\,\delta t,\; y + v\,\delta t,\; t + \delta t) = E(x, y, t).$

Expanding the left-hand side in a Taylor series, we cancel $E(x, y, t)$ on the left and right sides, divide by $\delta t$, and pass to the limit as $\delta t \to 0$. We get

$E_x u + E_y v + E_t = 0.$

The derivatives $E_x$, $E_y$, and $E_t$ are easy to obtain from the image using finite-difference numerical approximations.

Let us rewrite (4) in the form of a scalar product:

$(E_x, E_y) \cdot (u, v) = -E_t.$

Here, the region is the area in which the optical flow is sought. The value of the coefficient determines the weight of the smoothing term of the functional (11). Note that recommendations in the literature for choosing this value differ dramatically: different books propose noticeably different constants.

The task of minimizing the functional (6) is solved using the iterative procedure (7)-(8). The sequence of velocities minimizing functional (6) has the form:

$u^{k+1} = \bar{u}^{k} - \frac{E_x\,(E_x \bar{u}^{k} + E_y \bar{v}^{k} + E_t)}{\alpha^2 + E_x^2 + E_y^2}, \qquad v^{k+1} = \bar{v}^{k} - \frac{E_y\,(E_x \bar{u}^{k} + E_y \bar{v}^{k} + E_t)}{\alpha^2 + E_x^2 + E_y^2},$

where $\bar{u}^{k}$ and $\bar{v}^{k}$ are local averages of the velocity components and $\alpha$ is the regularization coefficient mentioned above.

Here the superscript $k$ denotes the number of the current iteration, and $i, j$ are the indices of the current grid node.

The iterative process ends when the discrepancy (9) between two successive iterations becomes less than a predetermined threshold.

However, this condition is rather inconvenient to use explicitly because of the significant computational cost of checking it at each iteration. For this reason, a fixed number of iterations is usually used; in practice, for images with good object contrast, a small number of iterations is sufficient. Using a significantly larger number of iterations can lead to the appearance of erroneous nonzero velocities in regions where the velocity field is actually zero. In particular, this happens when two different objects move a small distance from each other: the velocity field between them is actually zero, but the optical flow calculated with a large number of iterations can be nonzero because of the assumption that motion is continuous.

The optical flow can also be zero where the velocity field is not. Such a case occurs, for example, when an object with constant brightness over the entire area it occupies in the image moves across the frame. The optical flow calculated at the edges of the object will be non-zero, while the flow calculated at its center will be close to zero, although the true velocity field should be the same over the entire surface of the object. This problem is called the "aperture problem".

The derivatives of the image brightness are computed as grid (finite-difference) approximations, for example:

$E_x \approx \frac{E^{k}_{i+1,j} - E^{k}_{i-1,j}}{2\,\Delta x}, \qquad E_y \approx \frac{E^{k}_{i,j+1} - E^{k}_{i,j-1}}{2\,\Delta y}, \qquad E_t \approx E^{k+1}_{i,j} - E^{k}_{i,j}.$

Here a grid approximation of the derivatives is used: the superscript $k$ denotes the number of the current frame, and $i, j$ are the indices of the current grid node. The spacings $\Delta x$ and $\Delta y$ used when calculating the partial derivatives (10) can be chosen arbitrarily; usually a grid with unit spacing is used.

Now, by substituting the partial derivatives (10) and the average velocities (8) into the iterative process (7) with zero initial conditions for all points of the region, it is easy to find the velocities at all grid nodes in the observed frames of the video sequence.
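As an illustration, a minimal sketch of this iterative scheme (the classical Horn-Schunck update with neighbour averaging and a fixed number of iterations) might look as follows; the regularization coefficient alpha and the iteration count are illustrative choices.

```python
# Minimal sketch of the classical Horn-Schunck iteration: finite-difference
# derivatives, local velocity averages and a fixed number of iterations.
import numpy as np
from scipy.ndimage import convolve

def horn_schunck(frame1, frame2, alpha=10.0, n_iters=30):
    E1 = frame1.astype(np.float64)
    E2 = frame2.astype(np.float64)

    # Grid approximations of the brightness derivatives.
    Ey, Ex = np.gradient(E1)
    Et = E2 - E1

    # Kernel that averages the four nearest neighbours of each grid node.
    avg_kernel = np.array([[0.0, 0.25, 0.0],
                           [0.25, 0.0, 0.25],
                           [0.0, 0.25, 0.0]])

    u = np.zeros_like(E1)   # zero initial conditions for all grid nodes
    v = np.zeros_like(E1)
    for _ in range(n_iters):
        u_avg = convolve(u, avg_kernel, mode='nearest')
        v_avg = convolve(v, avg_kernel, mode='nearest')
        # Common factor of the update: deviation from the constant-brightness assumption.
        resid = (Ex * u_avg + Ey * v_avg + Et) / (alpha ** 2 + Ex ** 2 + Ey ** 2)
        u = u_avg - Ex * resid
        v = v_avg - Ey * resid
    return u, v
```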

The described computational scheme follows traditional methods for estimating optical flow. However, experiments on a large volume of real video recordings showed that when optical flow analysis algorithms operate directly on the original grayscale images, the quality of their output is not high enough, because noise and other interference strongly affect the quality of object detection. For this reason, this paper proposes using a special difference-accumulation procedure, described in the next section, for preprocessing the video data. The idea of this procedure is a robust preliminary extraction of the contours of moving objects, on which the optical flow estimate is then computed; this estimate is used at the stage of forming hypotheses and tracking moving objects.

Brightness patterns move along with the observed objects. The optical flow is the apparent movement of the brightness pattern. Ideally, the optical flow corresponds to the motion field; however, as we show below, this is not always the case.

Fig. 12.2. Optical flow does not always coincide with the field of motion. a - a smooth sphere rotates under constant illumination - the image does not change, although the field of motion is non-zero; b - a stationary sphere is illuminated by a moving source - the image illumination distribution changes, although the motion field is zero.

Consider a perfectly homogeneous sphere rotating in front of the lens of the visual system (Fig. 12.2, a). Since the surface is curved, spatial variations in brightness will be observed in the image of the sphere. However, this brightness pattern does not move with the surface and the image does not change with time. In this case, the optical flow is zero everywhere, although the field of motion is nonzero. Now consider a fixed sphere illuminated by a moving light source (Fig. 12.2, b). The brightness in the image changes with the movement of the source. In this case, the optical flow is obviously non-zero, although the field of motion is zero everywhere. Specular reflections and shadows provide other examples where the optical flow does not match the field of motion.

Only the optical flow is available to us, and we will proceed from the assumption that in the usual case the optical flow does not differ too much from the field of motion. This allows us to estimate relative motion based on how the image changes over time.

What do we mean by the apparent movement of the brightness pattern? Consider a point P in the image that has brightness E at time t (Fig. 12.3). Which point P′ of the image will correspond to it at time t + δt; in other words, how does the brightness pattern move during this time interval? Usually in the vicinity of the point P there are many points with the same brightness E. If the brightness changes continuously in the part of the image of interest to us, then the point P lies on a line of equal brightness C. At time t + δt there will be a corresponding line of equal brightness C′ with the same value E. However, what is the correspondence between the points of the lines C and C′? This question is not easy to answer, since the shape of this line usually changes with the motion.

Thus, we note that the optical flow in a changing image is not uniquely determined by local information. Let's explain this with another example. Consider a region of uniform brightness in the image that does not change with time. The "most plausible" optical flow in this case would be zero. In fact, inside a homogeneous patch we can assign any velocity we want to the points, although we would apparently prefer the simplest explanation of the observed changes in the image (in this case, the absence of such changes).

Let $E(x, y, t)$ be the illumination at the point $(x, y)$ of the image at time $t$. Then, if $u(x, y)$ and $v(x, y)$ are the $x$- and $y$-components of the optical flow vector at this point, we can expect that the illumination will be the same at time $t + \delta t$ at the point $(x + u\,\delta t,\; y + v\,\delta t)$. In other words, for a small time interval $\delta t$,

$E(x + u\,\delta t,\; y + v\,\delta t,\; t + \delta t) = E(x, y, t).$

This single constraint is not enough to uniquely determine the two unknowns $u$ and $v$. It is also clear that we would like to take advantage of the fact that the field of motion is continuous almost everywhere.

If the brightness changes smoothly with $x$, $y$, and $t$, then we can expand the left side of the equation in a Taylor series and from here we get

$E(x, y, t) + \delta x\,\frac{\partial E}{\partial x} + \delta y\,\frac{\partial E}{\partial y} + \delta t\,\frac{\partial E}{\partial t} + e = E(x, y, t),$

where $e$ contains terms of higher orders of smallness in $\delta x$, $\delta y$, and $\delta t$.

Fig. 12.3. The apparent movement of the brightness pattern.

It is not easy to decide which point P′ on the brightness level line C′ of the second image corresponds to a given point P on the brightness level line C of the first image.

Fig. 12.4. Local information about the brightness gradient and the rate of its change in time imposes only one constraint on the components of the optical flow vector: the flow velocity must lie on a straight line perpendicular to the direction of the brightness gradient. We can determine only the component directed along the brightness gradient; nothing is known about the component of the flow velocity in the perpendicular direction.

We cancel $E(x, y, t)$, divide by $\delta t$, and pass to the limit as $\delta t \to 0$. Then we get

$\frac{\partial E}{\partial x}\,\frac{dx}{dt} + \frac{\partial E}{\partial y}\,\frac{dy}{dt} + \frac{\partial E}{\partial t} = 0.$