What you wanted to know about optical flow, but were embarrassed to ask: methods for determining optical flow

  • 14.01.2024


In computer vision and image-processing systems, the task of determining the motion of objects in three-dimensional space using an optical sensor, that is, a video camera, arises frequently. Given a sequence of frames as input, we need to reconstruct the three-dimensional space captured in them and the changes it undergoes over time. It sounds complicated, but in practice it is often enough to find the displacements of two-dimensional projections of objects in the frame plane.

If we want to find out how much an object has shifted relative to its position in the previous frame during the time that passed between capturing the frames, we will most likely think of optical flow first. To find the optical flow you can safely use a ready-made, tested and optimized implementation of one of the algorithms, for example, from the OpenCV library. Still, it does no harm to understand the theory, so I invite everyone interested to look inside one of the popular and well-studied methods. This article focuses on the formulas and a number of mathematical derivations rather than on practical advice.

There are several approaches to determining the offsets between two adjacent frames. For example, for each small fragment (say, 8 by 8 pixels) of one frame, you can find the most similar fragment in the next frame. In this case, the difference between the coordinates of the original and found fragments will give us a displacement. The main difficulty here is how to quickly find the desired fragment without going through the entire frame pixel by pixel. Various implementations of this approach solve the problem of computational complexity in one way or another. Some are so successful that they are used, for example, in common video compression standards. The price for speed is, of course, quality. We will consider another approach, which allows us to obtain offsets not for fragments, but for each individual pixel, and is used when speed is not so critical. It is with this that the term “optical flow” is often associated in the literature.
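To make the fragment-matching idea concrete, here is a minimal brute-force sketch in Python/NumPy (sum of squared differences over a full search window). The block size and search radius are illustrative values of mine, and real codecs use much faster search strategies than this exhaustive scan:

```python
import numpy as np

def block_matching(prev, curr, block=8, search=16):
    """Exhaustive block matching sketch: for each block of `prev`, find the
    most similar block of `curr` by sum of squared differences.
    Returns an array of integer (dy, dx) offsets, one per block."""
    h, w = prev.shape
    flow = np.zeros((h // block, w // block, 2), dtype=int)
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            patch = prev[by:by + block, bx:bx + block]
            best, best_off = np.inf, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = by + dy, bx + dx
                    if 0 <= y <= h - block and 0 <= x <= w - block:
                        cand = curr[y:y + block, x:x + block]
                        cost = np.sum((patch - cand) ** 2)
                        if cost < best:
                            best, best_off = cost, (dy, dx)
            flow[by // block, bx // block] = best_off
    return flow
```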

This approach is often called differential, since it is based on computing partial derivatives of the image along the horizontal and vertical directions. As we will see later, derivatives alone are not enough to determine displacements. That is why, on the basis of one simple idea, a great variety of methods have appeared, each performing its own mathematical sleight of hand to reach the goal. Let's focus on the Lucas-Kanade method, proposed in 1981 by Bruce Lucas and Takeo Kanade.

Lucas-Kanade method
All further reasoning rests on one very important and not entirely realistic assumption: let's assume that pixel values move from one frame to the next without changing. In other words, pixels belonging to the same object may shift in any direction, but their values remain the same. Of course, this assumption has little to do with reality, because global lighting conditions and the illumination of the moving object itself can change from frame to frame. There are plenty of problems with this assumption, but, oddly enough, it works quite well in practice.

In mathematical language, this assumption can be written as follows: $I(x + d_x, y + d_y, t + 1) = I(x, y, t)$, where $I$ is the pixel brightness as a function of position in the frame and of time. In other words, $x$ and $y$ are the pixel coordinates in the frame plane, $(d_x, d_y)$ is the offset, and $t$ is the frame number in the sequence. Let us agree that a unit period of time passes between two adjacent frames.

One-dimensional case
First, let's consider the one-dimensional case. Imagine two one-dimensional frames, 1 pixel high and 20 pixels wide (picture on the right). In the second frame the image is slightly shifted to the right, and it is this displacement $d$ that we want to find. To do this, let's treat these frames as functions (figure on the left): the input is the pixel position, the output is its intensity. In this representation the desired displacement $d$ is seen even more clearly. In accordance with our assumption, the second frame is simply the first one shifted by $d$, that is, we can say that $I(x + d, t + 1) = I(x, t)$.

Please note that, if desired, this can also be written in the general form $I(x + d, y, t + 1) = I(x, y, t)$, where $y$ and $t$ are fixed and equal to zero.

For each coordinate $x$ we know the values $I(x, t)$ and $I(x, t+1)$; in addition, we can compute their derivatives. Let us relate the known values to the offset $d$. To do this, we write the Taylor series expansion for $I(x + d, t + 1)$:

$I(x + d, t + 1) = I(x, t + 1) + d\,\frac{\partial I(x, t + 1)}{\partial x} + \frac{d^2}{2}\,\frac{\partial^2 I(x, t + 1)}{\partial x^2} + \dots$

Let's make a second important assumption: assume that $I(x + d, t + 1)$ is sufficiently well approximated by the first derivative. Having made this assumption, we discard everything after the first-order term:

$I(x + d, t + 1) \approx I(x, t + 1) + d\,\frac{\partial I(x, t + 1)}{\partial x}$

How correct is this? In general, not very: we lose accuracy here unless our function/image is strictly linear, as in our artificial example. But it significantly simplifies the method, and the required accuracy can be reached by successive approximation, which we will consider later.

We're almost there. The displacement $d$ is our desired value, so we need to express it through the known quantities. As we agreed earlier, $I(x + d, t + 1) = I(x, t)$, so we can simply rewrite:

$I(x, t + 1) + d\,\frac{\partial I(x, t + 1)}{\partial x} \approx I(x, t)$,

which gives

$d \approx \frac{I(x, t) - I(x, t + 1)}{\partial I(x, t + 1) / \partial x}$
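A tiny numeric sketch of this one-dimensional estimate on a made-up linear ramp, where the first-order approximation is exact (the signal, shift and names below are illustrative, not from the article):

```python
import numpy as np

x = np.arange(20, dtype=float)
frame1 = 0.5 * x                 # I(x, t): a linear ramp
true_d = 2.0
frame2 = 0.5 * (x - true_d)      # I(x, t+1): the same ramp shifted right by true_d

# derivative of the second frame via central differences
dI_dx = np.gradient(frame2)

# displacement estimate at an interior pixel: d = (I(x,t) - I(x,t+1)) / (dI/dx)
px = 10
d_est = (frame1[px] - frame2[px]) / dI_dx[px]
print(d_est)                     # 2.0 for this strictly linear signal
```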

Two-dimensional case
Now let's move from the one-dimensional case to the two-dimensional one. We write the Taylor expansion for $I(x + d_x, y + d_y, t + 1)$ and immediately discard all higher-order derivatives. Instead of the first derivative, a gradient appears:

$I(x + d_x, y + d_y, t + 1) \approx I(x, y, t + 1) + \nabla I \cdot \mathbf{d}$,

where $\mathbf{d} = (d_x, d_y)^T$ is the displacement vector and $\nabla I = \left(\frac{\partial I}{\partial x}, \frac{\partial I}{\partial y}\right)^T$.
In accordance with the assumption made, $I(x + d_x, y + d_y, t + 1) = I(x, y, t)$. Note that this expression is therefore equivalent to $I(x, y, t + 1) + \nabla I \cdot \mathbf{d} = I(x, y, t)$, which is exactly what we need. Let's rewrite:

$\nabla I \cdot \mathbf{d} + \bigl(I(x, y, t + 1) - I(x, y, t)\bigr) = 0$

Since a unit time interval passes between two frames, the difference $I(x, y, t + 1) - I(x, y, t)$ is nothing more than the derivative with respect to time, $\frac{\partial I}{\partial t}$.
Let's rewrite:

$\nabla I \cdot \mathbf{d} + \frac{\partial I}{\partial t} = 0$

Let's rewrite it once more, expanding the gradient:

$\frac{\partial I}{\partial x}\, d_x + \frac{\partial I}{\partial y}\, d_y + \frac{\partial I}{\partial t} = 0$

We have obtained an equation telling us that this combination of partial derivatives must equal zero. The only problem is that it is one equation with two unknowns: $d_x$ and $d_y$. At this point, flights of fancy and a variety of approaches begin.

Let's make a third assumption: assume that neighboring pixels are shifted by the same distance. Take a fragment of the image, say 5 by 5 pixels, and agree that $d_x$ and $d_y$ are the same for each of the 25 pixels. Then instead of one equation we get 25 at once! Obviously, in the general case the system has no exact solution, so we will look for the $d_x$ and $d_y$ that minimize the error:

$E(d_x, d_y) = \sum_{x, y} g(x, y)\,\left(\frac{\partial I}{\partial x} d_x + \frac{\partial I}{\partial y} d_y + \frac{\partial I}{\partial t}\right)^2$

Here g is a function that determines the weighting coefficients for the pixels. The most common option is a two-dimensional Gaussian, which gives the greatest weight to the central pixel and less and less as it moves away from the center.

To find the minimum, we use the least squares method and take the partial derivatives of $E$ with respect to $d_x$ and $d_y$:

$\frac{\partial E}{\partial d_x} = \sum_{x, y} 2\,g(x, y)\,\left(\frac{\partial I}{\partial x} d_x + \frac{\partial I}{\partial y} d_y + \frac{\partial I}{\partial t}\right)\frac{\partial I}{\partial x}$

$\frac{\partial E}{\partial d_y} = \sum_{x, y} 2\,g(x, y)\,\left(\frac{\partial I}{\partial x} d_x + \frac{\partial I}{\partial y} d_y + \frac{\partial I}{\partial t}\right)\frac{\partial I}{\partial y}$

Let's rewrite this in a more compact form and equate it to zero (writing $I_x$, $I_y$, $I_t$ for the partial derivatives):

$\sum g\, I_x^2\, d_x + \sum g\, I_x I_y\, d_y = -\sum g\, I_x I_t$

$\sum g\, I_x I_y\, d_x + \sum g\, I_y^2\, d_y = -\sum g\, I_y I_t$

Let's rewrite these two equations in matrix form:

$M \mathbf{d} = \mathbf{b}, \qquad M = \begin{pmatrix} \sum g I_x^2 & \sum g I_x I_y \\ \sum g I_x I_y & \sum g I_y^2 \end{pmatrix}, \qquad \mathbf{b} = -\begin{pmatrix} \sum g I_x I_t \\ \sum g I_y I_t \end{pmatrix}$

If the matrix $M$ is invertible (has rank 2), we can compute the $d_x$ and $d_y$ that minimize the error $E$:

$\mathbf{d} = M^{-1}\mathbf{b}$

That's all. We know the approximate pixel displacement between two adjacent frames.

Since the neighboring pixels also take part in finding the offset of each pixel, when implementing this method it is advisable to compute the horizontal and vertical derivatives of the whole frame once, in advance.
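As an illustration, here is a minimal per-pixel sketch of such an implementation in Python/NumPy, with no pyramids or warping. The Gaussian weighting g is realized here by smoothing the products of the derivatives; the parameter values are illustrative, not recommendations:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def lucas_kanade_dense(frame1, frame2, sigma=1.5, eps=1e-6):
    """Per-pixel Lucas-Kanade sketch: returns (dx, dy) arrays, NaN where the
    2x2 matrix M is (nearly) degenerate."""
    # spatial derivatives, computed once for the whole frame
    Iy, Ix = np.gradient(frame1)
    It = frame2 - frame1            # unit time step between frames

    # entries of the matrix M and the right-hand side b, weighted by a Gaussian g
    Ixx = gaussian_filter(Ix * Ix, sigma)
    Ixy = gaussian_filter(Ix * Iy, sigma)
    Iyy = gaussian_filter(Iy * Iy, sigma)
    Ixt = gaussian_filter(Ix * It, sigma)
    Iyt = gaussian_filter(Iy * It, sigma)

    # solve the 2x2 system M d = b for every pixel via the explicit inverse
    det = Ixx * Iyy - Ixy * Ixy
    det = np.where(np.abs(det) < eps, np.nan, det)   # mark degenerate pixels
    dx = (-Iyy * Ixt + Ixy * Iyt) / det
    dy = ( Ixy * Ixt - Ixx * Iyt) / det
    return dx, dy
```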

Disadvantages of the method
The method described above is based on three significant assumptions, which on the one hand give us the fundamental ability to determine the optical flow, but on the other hand introduce error. The good news for perfectionists is that only one of the assumptions is needed to simplify the method, and its consequences can be dealt with. We assumed that the first derivative is enough to approximate the displacement. In the general case this is certainly not so (figure on the left). To achieve the required accuracy, the offset for each pair of frames (let's call them $I_1$ and $I_2$) can be computed iteratively; in the literature this is called warping. In practice this means that, having computed the displacements at the first iteration, we move each pixel of the frame in the opposite direction so as to compensate for this displacement. At the next iteration, instead of the original frame we use its warped version, and so on until at some iteration all the resulting displacements turn out to be smaller than a specified threshold. The final displacement of each particular pixel is obtained as the sum of its displacements over all iterations.
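A minimal sketch of that refinement loop might look as follows; it assumes the lucas_kanade_dense helper from the earlier sketch (a name I introduced) and uses bilinear resampling to warp the second frame back. A fixed iteration count stands in for the threshold test:

```python
import numpy as np
from scipy.ndimage import map_coordinates

def iterate_flow(frame1, frame2, flow_step, n_iter=5):
    """Iterative warping: accumulate the displacement over several passes.

    flow_step(f1, f2) -> (dx, dy) is any single-step estimator,
    e.g. lucas_kanade_dense; n_iter is an illustrative iteration cap.
    """
    h, w = frame1.shape
    yy, xx = np.mgrid[0:h, 0:w].astype(float)
    dx_total = np.zeros((h, w))
    dy_total = np.zeros((h, w))
    for _ in range(n_iter):
        # warp the second frame back by the accumulated displacement
        warped = map_coordinates(frame2, [yy + dy_total, xx + dx_total],
                                 order=1, mode='nearest')
        dx, dy = flow_step(frame1, warped)
        dx_total += np.nan_to_num(dx)   # residual shift of this iteration
        dy_total += np.nan_to_num(dy)
    return dx_total, dy_total
```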

By its nature, this method is local, that is, when determining the offset of a particular pixel, only the area around this pixel is taken into account - the local neighborhood. As a consequence, it is impossible to determine displacements within sufficiently large (larger than the size of the local neighborhood) uniformly colored areas of the frame. Fortunately, such areas are not often found in real frames, but this feature still introduces an additional deviation from the true offset.

Another problem is that some textures in the image produce a singular matrix M for which the inverse cannot be found. Accordingly, for such textures we will not be able to determine the offset: there seems to be movement, but it is not clear in which direction. Not only the method considered here suffers from this problem; even the human eye perceives such movement ambiguously (the barber's pole illusion).

Conclusion
We examined the theoretical foundations of one of the differential methods for finding optical flow. There are many other interesting methods, some of which today give more reliable results. However, the Lucas-Kanade method, while quite effective, remains quite simple to understand, and therefore is well suited for introducing mathematical fundamentals.

Although the problem of finding optical flow has been studied for several decades, methods are still being improved. The work continues because, on closer examination, the problem turns out to be very difficult, and the stability and efficiency of many other video- and image-processing algorithms depend on how well the displacements are determined.

On this lofty note, let me wrap up and move on to sources and useful links.

    Agafonov V.Yu. 1

    1 Agafonov Vladislav Yurievich - graduate student of the department of "computer-aided design and search design systems", Volgograd State Technical University,

    Volgograd

    Annotation: this article discusses two approaches to finding offsets between images in a video stream. The first approach is based on minimizing the error by the least squares method when searching for the displacement of key points in the image. The second approach is based on approximating a neighborhood of each pixel by a quadratic polynomial, which can be used effectively in motion estimation. The main stages of implementing the methods are described. Test sets of images were prepared taking into account the specifics of the task, and the results of these tests are presented.

    Keywords: image processing, image displacement search

    APPLICATION OF OPTICAL FLOW METHODS TO IMAGE SHIFT ESTIMATION

    Agafonov V.U. 1

    1 Agafonov Vladislav Urevich - post-graduate student of “systems of computer-aided design and search design” department,

    Volgograd State Technical University, Volgograd

    Abstract: in this article, methods for image shift estimation are described. The first approach is based on minimizing the mean squared error when searching for the displacement of image key points. The second approach is based on approximating each neighborhood of both frames by quadratic polynomials, which can be done efficiently using the polynomial expansion transform. The main stages of the implementation of these methods are described. Test sets of images are prepared taking into account the specific nature of the problem. Results of the work are presented.

    Keywords: image processing, image shift estimation

    UDC 004.932.2

    Introduction

    When solving computer vision problems, the problem of estimating the movement of an object in frames or the displacement of two consecutive frames in a video stream often arises. Accurate estimation of scene motion parameters in frames is an important factor for many algorithms, so motion detection methods must provide sub-pixel accuracy.

    There are several approaches to estimating image displacement. The main ones are: identifying key features in the image, followed by comparison of key points, determining the phase correlation of the frequency representation of the image signal and optical flow. The latter in its main form is not used to search for image displacements; it is used to determine the presence of movement in the image. Thus, optical flow is used in systems for detecting and tracking objects.

    Optical flow is the pattern of apparent motion of objects, surfaces, or edges of a scene caused by the relative motion of an observer (eye or camera) relative to the scene. Optical flow-based algorithms—such as motion detection, object segmentation, and motion encoding—use this motion of objects, surfaces, and edges.

    In this work, two approaches are considered, one of which allows you to obtain offsets for individual key points, and the second - for each individual pixel.

    Optical flow by Lucas-Kanade method

    The basic concept of the method is to assume that pixel values move from one frame to the next without changing:

    $I(x + d_x, y + d_y, t + 1) = I(x, y, t),$

    where $I(x, y, t)$ is the pixel brightness. Expanding the left-hand side in a Taylor series and ignoring the remainder term, we obtain the first-order approximation

    $I(x + d_x, y + d_y, t + 1) \approx I(x, y, t + 1) + \frac{\partial I}{\partial x} d_x + \frac{\partial I}{\partial y} d_y,$

    where $(d_x, d_y)$ determines the offset. In accordance with the assumption made, we obtain that the two sides must be equal. Then we write the resulting equation as

    $\frac{\partial I}{\partial x} d_x + \frac{\partial I}{\partial y} d_y + \frac{\partial I}{\partial t} = 0.$

    It turns out that the sum of partial derivatives is zero. However, since we have only one equation and two unknowns, we have to impose additional conditions. One common way is to impose restrictions on nearby pixels. Let's assume that neighboring pixels move equally. Obviously, there cannot be an offset that satisfies all pixels, so we minimize the error using the least squares method.

    Optical flow by the Farneback method

    The second approach approximates the neighborhood of each pixel by a quadratic polynomial

    $f(\mathbf{x}) = \mathbf{x}^T A \mathbf{x} + \mathbf{b}^T \mathbf{x} + c,$

    where $A$ is a symmetric matrix, $\mathbf{b}$ is a vector, and $c$ is a scalar. The coefficients are calculated by weighted least squares over the signal values in a given neighborhood. The weighting function consists of two components called certainty and applicability. Certainty relates to the signal values in the neighborhood; applicability determines the relative contribution of points according to their position in the neighborhood. Usually the central point has the greatest weight, and the weight of the remaining points decreases in the radial direction. Since the image is expanded as a quadratic polynomial in a certain neighborhood, it is necessary to understand how the polynomial behaves under an ideal displacement. Let the first signal be a quadratic polynomial

    $f_1(\mathbf{x}) = \mathbf{x}^T A_1 \mathbf{x} + \mathbf{b}_1^T \mathbf{x} + c_1$

    and let the second signal be its exact shift by $\mathbf{d}$:

    $f_2(\mathbf{x}) = f_1(\mathbf{x} - \mathbf{d}) = \mathbf{x}^T A_1 \mathbf{x} + (\mathbf{b}_1 - 2 A_1 \mathbf{d})^T \mathbf{x} + \mathbf{d}^T A_1 \mathbf{d} - \mathbf{b}_1^T \mathbf{d} + c_1.$

    Equating the corresponding coefficients of the two quadratic polynomials, we have:

    $A_2 = A_1, \qquad \mathbf{b}_2 = \mathbf{b}_1 - 2 A_1 \mathbf{d}, \qquad c_2 = \mathbf{d}^T A_1 \mathbf{d} - \mathbf{b}_1^T \mathbf{d} + c_1,$

    and, in particular, the displacement can be recovered as

    $\mathbf{d} = -\tfrac{1}{2} A_1^{-1} (\mathbf{b}_2 - \mathbf{b}_1).$

    This is true for any signal.

    Obviously, the assumption that the whole signal can be represented by a single polynomial is quite unrealistic. However, the relation above can still be applied to real signals, even though it introduces errors. The main question is whether the errors are small enough for the algorithm to be useful.

    Let us replace the global polynomial representation with a local one: we calculate the polynomial coefficients $A_1(\mathbf{x}), \mathbf{b}_1(\mathbf{x})$ for the first image and $A_2(\mathbf{x}), \mathbf{b}_2(\mathbf{x})$ for the second. Ideally the relation above should hold exactly, but in practice the approximation

    $A(\mathbf{x}) = \frac{A_1(\mathbf{x}) + A_2(\mathbf{x})}{2}, \qquad \Delta\mathbf{b}(\mathbf{x}) = -\tfrac{1}{2}\bigl(\mathbf{b}_2(\mathbf{x}) - \mathbf{b}_1(\mathbf{x})\bigr), \qquad A(\mathbf{x})\,\mathbf{d}(\mathbf{x}) = \Delta\mathbf{b}(\mathbf{x})$

    is used, where $\mathbf{d}(\mathbf{x})$ reflects the replacement of the global displacement by a spatially varying displacement field.

    In practice this pointwise equation can be solved element by element, but the results turn out to be too noisy. Instead, the assumption is made that the displacement field changes slowly, so that information from neighboring pixels can be integrated. Thus we try to find a $\mathbf{d}(\mathbf{x})$ satisfying the equation over a whole neighborhood, or more formally, minimizing

    $\sum_{\Delta\mathbf{x} \in N} w(\Delta\mathbf{x})\,\bigl\| A(\mathbf{x} + \Delta\mathbf{x})\,\mathbf{d}(\mathbf{x}) - \Delta\mathbf{b}(\mathbf{x} + \Delta\mathbf{x}) \bigr\|^2.$

    The main problem with the method is the assumption that the local polynomials are identical except for the offset. Since the local polynomial expansion changes with the surrounding content, this introduces an error into the constraint above. For small displacements this is not important, but as the displacement grows, so does the error. If we have a priori knowledge of the displacement, we can compare the two polynomials at shifted positions: the first at $\mathbf{x}$, the second at $\mathbf{x} + \tilde{\mathbf{d}}(\mathbf{x})$, where $\tilde{\mathbf{d}}$ is the prior displacement rounded to integer values. This way the method can be executed iteratively: a better prior displacement means a smaller residual displacement, which in turn increases the chances of a good estimate of the true displacement.
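    A toy numerical check of the displacement relation above, under the idealized assumption that the two signals really are one quadratic polynomial and its exact shift (all values below are made up):

```python
import numpy as np

A1 = np.array([[2.0, 0.5],
               [0.5, 1.0]])       # symmetric matrix of the first polynomial
b1 = np.array([1.0, -2.0])
d_true = np.array([0.7, -0.3])

# coefficients of the shifted polynomial f(x - d): A stays the same, b2 = b1 - 2*A*d
A2 = A1
b2 = b1 - 2.0 * A1 @ d_true

# recover the displacement: d = -1/2 * A1^{-1} (b2 - b1)
d_est = -0.5 * np.linalg.solve(A1, b2 - b1)
print(d_est)                      # recovers [0.7, -0.3]
```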

    Application of optical flow methods to the problem of searching for image displacements

    To test these methods, experiments were carried out on an artificial data set, i.e., one obtained with a shifted-image generator. To obtain a subpixel shift, regions with a slight shift relative to each other were cut out of the original image. The resulting test images were then downscaled several times, so some of the information in them was lost and the displacement was reduced to a subpixel one.

    For the Lucas-Kanade method it is necessary to select a set of key points whose displacement is to be found; for this, a classical edge-and-corner detector is used. After obtaining the displacement of each of these points, the results are averaged.

    For the Farneback method, it is enough to average the displacement values over all pixels.

    The experiment was carried out on a sample of 20 test pairs and the standard deviation for each method was calculated.

    Table 1. Standard deviation of displacements: OF Lucas-Kanade, OF Farneback.

    The experimental results show that both methods estimate the image displacement with high accuracy. The Farneback method gives a somewhat poorer result because it estimates the displacement for all pixels and can make errors on uniformly colored parts of the image.

    Each method has its own advantages and disadvantages, as well as its scope and limitations. The Lucas-Kanade method produces a sparse optical flow and can achieve sub-pixel precision. But with non-plane-parallel motion another problem arises: how to estimate the displacement of points that are not key points. The first solution is to construct a Voronoi diagram of the image plane with the given key points. This approach rests on the assumption that areas located close to a key point move in the same way as the key point, which in general is not always true. In practice, key points mostly lie in areas of sharp gradient changes, meaning they are concentrated on detailed objects, so the displacement of uniformly colored areas will have a large error. The second solution is to find the homography between the two images; this solution also suffers from inaccurate displacement estimates at the edges of the image.

    The advantage of the Farneback optical flow method is that the displacement is found for each pixel. However, this method also makes mistakes on periodic and weakly textured objects, but allows us to evaluate non-plane-parallel motion.

    Conclusion

    This article discussed an approach to estimating image displacement based on optical flow methods. The applicability of the methods is demonstrated and the results of a comparative analysis are presented. Experimental studies have shown that this approach is highly accurate and can be used to estimate scene motion parameters.

    Bibliography / References

    1. Fleet D., Weiss Y. Optical flow estimation // Handbook of Mathematical Models in Computer Vision. Springer US, 2006. pp. 237-257.
    2. Farnebäck G. Two-frame motion estimation based on polynomial expansion // Image Analysis. 2003. pp. 363-370.

    Optical flow is a technology used in various areas of computer vision to determine shifts, segmentation, object selection, and video compression. However, if we want to quickly implement it in our project, having read about it on Wikipedia or somewhere else, then, most likely, we will very quickly stumble upon the fact that it works very poorly and fails when determining shifts of the order of 1-2 pixels (at least that was the case for me). Then let's turn to ready-made implementations, for example, in OpenCV. There it is implemented using various methods and it is completely unclear why the abbreviation PyrLK is better or worse than the designation Farneback or something like that, and you will have to figure out the meaning of the parameters, of which there are a lot in some implementations. Moreover, what’s interesting is that these algorithms somehow work, unlike what we wrote ourselves. What's the secret?

    What is optical flow

    Optical flow (OP) is an image of apparent motion that represents the shift of each point between two images. In essence, it represents a velocity field (since the shift, up to scale, is equivalent to the instantaneous velocity). The essence of the OP is that for each point in the image a shift (dx, dy) is found so that the point on the second image corresponds to the original point. How to determine the correspondence of points is a separate question. To do this, you need to take some kind of point function that does not change as a result of the displacement. It is usually considered that a point retains its intensity (i.e., brightness or color for color images), but points that preserve the magnitude of the gradient, the Hessian, its magnitude or its determinant, the Laplacian, and other characteristics can be considered identical. Obviously, maintaining the intensity fails if the illumination or the angle of incidence of the light changes. However, if we are talking about a video stream, then, most likely, the lighting will not change much between two frames, if only because a short period of time passes between them. Therefore, intensity is often used as a function that is conserved at a point.

    Based on this description, one can confuse OP with the search and comparison of characteristic points. But these are different things, the essence of optical flow is that it does not look for any special points, but, based on the image parameters, tries to determine where an arbitrary point has shifted.

    There are two variants of optical flow: dense and sparse. A sparse flow computes the shift of individual specified points (for example, points selected by some feature detector), while a dense flow computes the shift of every image point. Naturally, the sparse flow is computed faster; however, for some algorithms the difference is not that big, and some tasks require the flow at every point of the image.

    For degenerate cases, simpler methods of determining the shift can be used. In particular, if all image points have the same shift (the entire image is shifted), the phase correlation method can be applied: compute the Fourier transform of both images, find the convolution of their phases and use it to determine the shift (see en.wikipedia.org/wiki/Phase_correlation). One can also use block matching: find the shift that minimizes the norm of the difference between the images in a window. In its pure form such an algorithm is slow and not robust to rotations and other distortions. The English Wikipedia lists these algorithms as different ways of computing optical flow, which does not seem quite correct to me, since they can be used for other purposes and do not fully solve this problem. Here we will call optical flow the methods based on local characteristics of images (what the English Wikipedia calls differential methods).
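    For reference, a minimal NumPy sketch of phase correlation for a purely global integer translation might look like this (sub-pixel refinement and windowing are omitted):

```python
import numpy as np

def phase_correlation_shift(img1, img2):
    """Estimate the integer (dy, dx) by which img2 is shifted relative to img1."""
    F1 = np.fft.fft2(img1)
    F2 = np.fft.fft2(img2)
    cross_power = np.conj(F1) * F2
    cross_power /= np.abs(cross_power) + 1e-12      # keep only the phase
    corr = np.fft.ifft2(cross_power).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # peaks in the far half of the array correspond to negative shifts (wrap-around)
    if dy > img1.shape[0] // 2:
        dy -= img1.shape[0]
    if dx > img1.shape[1] // 2:
        dx -= img1.shape[1]
    return dy, dx
```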

    Standard approach (Lucas-Kanade method)

    A mathematical description of the algorithm is given in sufficient detail in this article, but it only touches on theoretical aspects.

    Let's consider a mathematical model of optical flow, assuming that the intensity of the point has not changed as a result of the displacement.

    Let $I(x, y, t)$ be the intensity at some point (x, y) in the first image (i.e., at time t). In the second image this point has moved by (dx, dy) while time dt has passed; expanding the intensity function in a Taylor series up to the first-order term (later it will be mentioned why only the first), we get

    $I(x + dx, y + dy, t + dt) \approx I(x, y, t) + I_x\,dx + I_y\,dy + I_t\,dt,$

    where $I_x$, $I_y$, $I_t$ are the partial derivatives with respect to the coordinates and time; $I_t$ is, in essence, the change in brightness at the point (x, y) between the two frames.

    We assume that the point has retained its intensity, which means $I(x + dx, y + dy, t + dt) = I(x, y, t)$, and therefore

    $I_x\,dx + I_y\,dy + I_t\,dt = 0.$

    We get one equation with two unknowns (dx and dy), which is not enough to solve it; with this equation alone you won't get far.

    The simplest solution to the problem is the Lucas-Kanade algorithm. Our image contains objects larger than 1 pixel, so, most likely, in the vicinity of the current point other points have approximately the same shifts. Therefore we take a window around this point and minimize (by least squares) the total error in it, with weighting coefficients distributed according to a Gaussian, that is, so that the pixels closest to the one under study have the greatest weight. After the simplest transformations we obtain a system of 2 equations with 2 unknowns:

    $\begin{pmatrix} \sum w\,I_x^2 & \sum w\,I_x I_y \\ \sum w\,I_x I_y & \sum w\,I_y^2 \end{pmatrix} \begin{pmatrix} dx \\ dy \end{pmatrix} = -\begin{pmatrix} \sum w\,I_x I_t \\ \sum w\,I_y I_t \end{pmatrix}$

    As is known, this system does not always have a unique solution (although it usually does): if the determinant of the system is zero, there are either no solutions or infinitely many. This problem is known as the aperture problem: the ambiguity of the shift when the field of view is limited and the image is periodic. It corresponds to the case when a fragment of an image with some cyclic pattern comes into view; here even a person would not be able to determine unambiguously where the picture has shifted. The trouble is that because of noise, in such ambiguous situations we will get not a zero determinant but a very small one, which will most likely lead to very large shift values that bear little relation to reality. So at a certain stage you just need to check whether the determinant of the system is small enough and, if so, not consider such points or mark them as erroneous.
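    A short sketch of that check for a single window, with an illustrative (not recommended) threshold:

```python
import numpy as np

def solve_lk_window(Ix, Iy, It, w, det_eps=1e-3):
    """Solve the 2x2 Lucas-Kanade system for one window, rejecting
    near-singular cases. Ix, Iy, It: flattened derivative values inside the
    window; w: the corresponding Gaussian weights."""
    M = np.array([[np.sum(w * Ix * Ix), np.sum(w * Ix * Iy)],
                  [np.sum(w * Ix * Iy), np.sum(w * Iy * Iy)]])
    b = -np.array([np.sum(w * Ix * It), np.sum(w * Iy * It)])
    if abs(np.linalg.det(M)) < det_eps:
        return None                      # aperture problem: mark the point as unreliable
    return np.linalg.solve(M, b)         # (dx, dy)
```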

    Why doesn't it work?
    If we stop at this stage and implement this algorithm, it will work successfully, but only if the shift between adjacent images is very small, on the order of 1 pixel, and even then not always. (To analyze the quality, synthetic sequences with various relative shifts were generated; the shift can be a non-integer number of pixels, in which case the resulting image is interpolated accordingly.) Already at a shift of 2 pixels the error is large, and at 3 or more the result is generally inadequate. What's the matter?

    Here mathematics has set us up. It instilled in us the feeling that all functions around us are continuous and differentiable many times over. At the institute we were taught to write the approximation of a function in the neighborhood of a point using the Taylor formula, and we thoughtlessly and joyfully use it everywhere. Now let's think about the physical meaning of the derivatives here. We want to use them to determine the change in the value of a function in a finite neighborhood of a point, but the derivative gives an idea only of an infinitesimal neighborhood. To expand this neighborhood we could add higher-order derivatives to the Taylor expansion, but this would lead to nonlinearities in the system that make it much harder to solve, and the benefit would be questionable, especially since in practice we are dealing not with continuous, repeatedly differentiable functions but with discrete functions of who knows what kind. Therefore it would be more logical to look for a function g(x) for which, in our discrete case, f(x) + g(x) = f(x+1), f(x) + 2g(x) = f(x+2), f(x) - g(x) = f(x-1), and so on, are satisfied as accurately as possible. Thus, in this case we need not a derivative but some linear function that fits the points of the original function best. Simple calculations lead to the least-squares solution g(x) = (Σ k·(f(x+k) - f(x))) / (Σ k²), where the sum runs over the offsets k of the points used in the fit. If we construct this function from one adjacent point on each side, then we are lucky: the formula coincides with the formula for approximating the derivative, g(x) = (f(x+1) - f(x-1)) / 2. It is usually this formula that OpenCV uses when computing the Lucas-Kanade optical flow; we will return to this later. But if you take more points, the formula becomes quite different from the classical difference schemes for the first derivative.
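    A small sketch of this least-squares slope, with the "three neighbors on each side" case as the default (the function and variable names are mine):

```python
import numpy as np

def local_slope(f, x, n=3):
    """Slope g(x) of the line that best fits f at the 2*n points around x
    (least squares), instead of the classical derivative."""
    k = np.arange(-n, n + 1)
    k = k[k != 0]
    values = np.array([f[x + i] for i in k], dtype=float)
    # minimising sum_k (f(x) + k*g - f(x+k))^2 gives g = sum_k k*(f(x+k)-f(x)) / sum_k k^2
    return np.sum(k * (values - f[x])) / np.sum(k * k)

# for n=1 this reduces to the familiar (f(x+1) - f(x-1)) / 2
```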

    Obviously, if we build this function, for example, from three neighboring points on each side of the original one, it does not depend in any way on points located further away, and accordingly, for shifts of more than three points we will still often get inadequate results. Also, the more points we use to construct this function, the greater the average deviation of the resulting line from the points used, again because our images do not vary linearly but in who knows what way. In practice, shifts of more than 2 pixels already give an unacceptably large error, no matter how many points we take.

    Another weak point of the algorithm is that, once again, we are dealing not with smooth continuous functions but with arbitrary and, moreover, discrete ones. Therefore, in some image fragments the intensity can "jump" without any obvious pattern, for example at the boundaries of objects or because of noise. In this case no function g(x) can describe the image changes in the neighborhood of the point accurately enough. To combat this (at least partially), it is proposed to blur the original image, and it is useful to blur it quite strongly; that is, it is better to use not even everyone's favorite Gaussian blur (averaging with weighting coefficients) but just a box filter (uniform averaging over a window), and several times in a row. The smoothness of the image is now more important to us than its detail.

    However, these measures will also not save us from limiting the detected shift to 2-3 pixels. And by the way, in OpenCV 1.0 there was such an implementation of optical flow, and it only worked under ideal conditions and at very small shifts.

    What to do?
    To sum up, plain Lucas-Kanade is good at detecting small shifts, ones for which the picture is similar to its linear approximation. To get around this, we use the standard CV technique of multi-scaling: we build a "pyramid" of images at different scales (almost always downscaling by a factor of 2 along each axis, which is easier to compute) and run the optical flow over them from the smaller image to the larger one; then a small detected shift in a small image corresponds to a large shift in a large one. On the smallest image we detect a shift of no more than 1-2 pixels, and, moving from the smaller scale to the larger one, we use the result from the previous step and refine the shift values. In OpenCV this is implemented by the calcOpticalFlowPyrLK function. Using this pyramidal algorithm lets us avoid bothering with a linear approximation over many points: it is easier to take more pyramid levels and use a rather rough approximation of the function at each level. That is why OpenCV computes the derivative from only two neighboring points, and why, for this implementation of the algorithm, our conclusions about the advantage of the approximating function over the derivative turn out to be moot: with that number of reference points the derivative is the best approximating function.

    What other ones are there?

    This algorithm is not the only option for calculating optical flow. In OpenCV, in addition to the Lucas-Kanade flow, there is also a Farneback and SimpleFlow flow, and the Horn–Schunck algorithm is also often referred to.

    The Horn–Schunck method is somewhat more global in nature than the Lucas-Kanade method. It relies on the assumption that the optical flow is fairly smooth over the entire image. From the same constraint equation it is proposed to pass to a functional, that is, to add a requirement that the shifts have no sharp changes, with a weighting coefficient α:

    $E = \iint \Bigl[ (I_x u + I_y v + I_t)^2 + \alpha^2 \bigl( |\nabla u|^2 + |\nabla v|^2 \bigr) \Bigr]\, dx\, dy.$

    Minimizing this functional leads us to a system of two equations:

    $I_x (I_x u + I_y v + I_t) - \alpha^2 \Delta u = 0,$
    $I_y (I_x u + I_y v + I_t) - \alpha^2 \Delta v = 0.$

    In these equations the Laplacian is proposed to be computed approximately as the difference with the average value over the neighbors: $\Delta u \approx \bar{u} - u$. We obtain a system of equations, which we write for each pixel and solve iteratively:

    $u^{k+1} = \bar{u}^k - \frac{I_x (I_x \bar{u}^k + I_y \bar{v}^k + I_t)}{\alpha^2 + I_x^2 + I_y^2}, \qquad v^{k+1} = \bar{v}^k - \frac{I_y (I_x \bar{u}^k + I_y \bar{v}^k + I_t)}{\alpha^2 + I_x^2 + I_y^2}.$
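    A compact NumPy sketch of this iteration scheme; alpha and the iteration count are illustrative, and the 3x3 averaging kernel is the commonly used neighbor weighting, not something prescribed by the text above:

```python
import numpy as np
from scipy.ndimage import convolve

def horn_schunck(I1, I2, alpha=1.0, n_iter=100):
    """Horn-Schunck sketch: returns the flow components (u, v)."""
    Iy, Ix = np.gradient(I1)
    It = I2 - I1
    u = np.zeros_like(I1)
    v = np.zeros_like(I1)
    # averaging kernel: the Laplacian is approximated as (mean of neighbors - value)
    kernel = np.array([[1/12, 1/6, 1/12],
                       [1/6,  0.0, 1/6],
                       [1/12, 1/6, 1/12]])
    for _ in range(n_iter):
        u_avg = convolve(u, kernel, mode='nearest')
        v_avg = convolve(v, kernel, mode='nearest')
        common = (Ix * u_avg + Iy * v_avg + It) / (alpha**2 + Ix**2 + Iy**2)
        u = u_avg - Ix * common
        v = v_avg - Iy * common
    return u, v
```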

    In this algorithm they also suggest using multi-scaling, and they recommend scaling images not by 2 times, but by a factor of 0.65

    This algorithm was implemented in the first versions of OpenCV, but was later abandoned.

    Farneback proposed approximating the change of intensity in a neighborhood by a quadratic form, I = xᵀAx + bᵀx + c, with a symmetric matrix A (in fact, by taking the Taylor expansion up to the first term we were using the linear approximation I = bᵀx + c, so now we are simply increasing the accuracy of the approximation). If the image has shifted by d within this neighborhood, then, substituting into the quadratic expansion and opening the brackets, we get

    $I_2(\mathbf{x}) = I_1(\mathbf{x} - \mathbf{d}) = \mathbf{x}^T A_1 \mathbf{x} + (\mathbf{b}_1 - 2 A_1 \mathbf{d})^T \mathbf{x} + \mathbf{d}^T A_1 \mathbf{d} - \mathbf{b}_1^T \mathbf{d} + c_1,$

    i.e. $A_2 = A_1$, $\mathbf{b}_2 = \mathbf{b}_1 - 2 A_1 \mathbf{d}$, $c_2 = \mathbf{d}^T A_1 \mathbf{d} - \mathbf{b}_1^T \mathbf{d} + c_1$.

    Now we can compute the values of A, b, c in both pictures, and then this system becomes redundant with respect to d (the first equation, which requires $A_1 = A_2$, is especially confusing); in general d can be obtained from the second equation:

    $\mathbf{d} = -\tfrac{1}{2} A_1^{-1} (\mathbf{b}_2 - \mathbf{b}_1).$

    We have to resort to the approximation $A = \frac{A_1 + A_2}{2}$. Let us also denote for simplicity $\Delta\mathbf{b} = -\tfrac{1}{2}(\mathbf{b}_2 - \mathbf{b}_1)$; then we simply get

    $A\,\mathbf{d} = \Delta\mathbf{b}.$

    To compensate for noise in the computation, we again appeal to the assumption that in the neighborhood of the point under study all points have more or less the same shift. Therefore we once more integrate the error over a window with Gaussian weighting coefficients w and find the vector d that minimizes this total error. This gives the optimal value $\mathbf{d} = \bigl(\sum w\,A^T A\bigr)^{-1} \sum w\,A^T \Delta\mathbf{b}$ and the corresponding minimum error. That is, for each point we need to compute the products $A^T A$ and $A^T \Delta\mathbf{b}$, average them over the window, invert the matrix and get the result. Accordingly, these products can be computed for the whole picture once and the pre-computed values reused for different points, which is exactly why it makes sense to compute a dense flow here.

    As usual, this algorithm has a number of modifications and improvements, primarily allowing the use of known a priori information - a given initial approximation of the flow - and, again, multi-scaling.

    The SimpleFlow method is based on the following idea: if we still cannot determine a shift larger than the size of the window over which we computed the derivatives, then why bother with derivatives at all? Let's just find the most similar point in the window! And to resolve ambiguities and compensate for noise, we take into account that the flow is continuous and that in the neighborhood of a given point all points have almost the same shift. The window-size problem will again be solved by multi-scaling.

    More strictly, the algorithm sounds like this: for each point in the window there is an "energy" function, essentially the difference in intensities between the initial point and the candidate point, which determines (via an inverse logarithmic dependence) the probability of the initial point transitioning to this point. Next, the convolution of this energy with a Gaussian window is taken and the values (dx, dy) that minimize this function are found. To obtain sub-pixel accuracy, a small neighborhood of the found optimum (dx, dy) is considered, and the peak of the energy function is sought in it as the peak of a paraboloid. And, as mentioned above, this procedure is performed on a pyramid of scaled images. The paper also offers clever methods for speeding up the computation, but those who are interested can work that out themselves. What matters for us is that, thanks to this, the algorithm is (theoretically) quite fast with good accuracy, and it does not suffer from the problem the previous ones have, where the larger the shift, the worse it is detected.

    What if we don’t take intensity?

    It was said above that the correspondence between points can be determined by different quantities, so why do we consider only intensity? But because any other value can be reduced to it: we simply filter the images with the appropriate filter and feed the filtered images to the input of the algorithms described above. Accordingly, if you want to use optical flow, then first think about which image characteristic will be the most stable under your conditions, and carry out appropriate filtering so that the input of the algorithm is not intensity, but this characteristic.

    Practice

    Let's try out in practice the algorithms that OpenCV offers us.

    One could carry out many different studies of each algorithm here, varying parameters and changing input sequences: with different shifts, rotations, projective transformations, segments, with different noise, and so on. That would take a lot of time, and the resulting report would outgrow this article, so here I propose to limit ourselves to the simple case of a parallel shift of the image by a fixed distance plus a small amount of added noise. This will let us understand in general terms how to run the algorithms and which of them is cooler.

    The syntax of the procedures is described in detail on the page with the manual; here I will provide a summary translation with my comments.

    The classic Lucas-Kanade is implemented with a pyramid in the calcOpticalFlowPyrLK procedure. The algorithm computes a sparse flow, that is, for a given set of points in the first image it estimates their position in the second. The input parameters are fairly obvious: two input images, an input and an output set of points; status is an output vector indicating whether the corresponding point was successfully found; err is an output vector of the estimated errors of the corresponding points; winSize is the size of the window over which the Gaussian averaging is done (I took 21x21 and it worked well); maxLevel is the number of layers in the pyramid minus one, i.e. the index of the last layer (I took 5); criteria is the exit condition for the iterative shift-estimation process (the error is minimized iteratively), which I left at its default; flags are additional flags, for example you can use an initial flow approximation or select the error estimation method; minEigThreshold is the gradient threshold below which the matrix is considered degenerate, which I also left at its default. Since OpenCV 2.4.1 a pre-computed pyramid of scaled images can be passed in when computing the flow.
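    A minimal usage sketch (the file names are placeholders; the window and pyramid sizes roughly follow the values mentioned above):

```python
import cv2
import numpy as np

prev_img = cv2.imread('prev.png', cv2.IMREAD_GRAYSCALE)
next_img = cv2.imread('next.png', cv2.IMREAD_GRAYSCALE)

# pick points to track, e.g. with the classic corner detector
p0 = cv2.goodFeaturesToTrack(prev_img, maxCorners=400, qualityLevel=0.01, minDistance=7)

p1, status, err = cv2.calcOpticalFlowPyrLK(
    prev_img, next_img, p0, None,
    winSize=(21, 21), maxLevel=5,
    criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 30, 0.01))

good_new = p1[status.flatten() == 1]
good_old = p0[status.flatten() == 1]
shifts = (good_new - good_old).reshape(-1, 2)   # per-point (dx, dy)
print(shifts.mean(axis=0))                      # average shift over tracked points
```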

    As a result, both small and large shifts are detected successfully and stably, the method is resistant to fairly strong noise, and the running time is about 10 ms for 400 points with a 5-layer pyramid (on a Core i7 950).

    By the way, this algorithm is also implemented on GPU (CUDA), both dense and sparse versions.

    The Farneback flow is implemented by the calcOpticalFlowFarneback procedure; it computes a dense flow, that is, the shift of every point. Parameters: the input images; the output flow in the format of a two-channel float matrix; pyr_scale determines the scale ratio between pyramid layers; levels is the number of levels in the pyramid; winsize is the size of the window over which averaging is performed; iterations is the number of iterations at each level; poly_n is the size of the polynomial neighborhood from which the values of A and b are estimated; poly_sigma is the sigma of the Gaussian smoothing of the derivatives; recommended parameter values are given in the manual; flags are additional flags, for example you can use an initial flow approximation or a different averaging over the window.
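    And a minimal usage sketch for the dense Farneback flow (the file names are placeholders; the parameter values follow the commonly suggested defaults rather than any tuning of mine):

```python
import cv2
import numpy as np

prev_img = cv2.imread('prev.png', cv2.IMREAD_GRAYSCALE)
next_img = cv2.imread('next.png', cv2.IMREAD_GRAYSCALE)

flow = cv2.calcOpticalFlowFarneback(
    prev_img, next_img, None,
    pyr_scale=0.5, levels=3, winsize=15,
    iterations=3, poly_n=5, poly_sigma=1.1, flags=0)

dx, dy = flow[..., 0], flow[..., 1]             # per-pixel shift in x and y
print(dx.mean(), dy.mean())                     # average shift over the frame
```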

    This algorithm is much less stable (according to my observations), it misses more easily on fairly uniform pictures (apparently, the problem is the lack of filtering of unsuccessful points), and it does not detect large shifts well. It worked for me in 600 ms on a 512x512 image.

    The SimpleFlow flow is implemented by the calcOpticalFlowSF procedure (again, a dense flow is computed), and it has many mysterious parameters without default values; at the moment the information on its page is rather terse. Let's try to figure it out. The first three are the input images and the output two-channel matrix; layers is the number of layers in the pyramid, that is, how many times we scale the original image; averaging_block_size is the size of the window in which the pixel energy function is computed; max_flow is the maximum shift we want to be able to detect at each step, essentially determined by the window size (although it is not entirely clear why it is an int). You can stop here, or you can set a few more parameters, the meaning of some of which escapes me.

    The site offers an example of its use, in which it is launched with the following parameters: calcOpticalFlowSF(frame1, frame2, flow, 3, 2, 4, 4.1, 25.5, 18, 55.0, 25.5, 0.35, 18, 55.0, 25.5, 10);

    For me this algorithm works much slower than the others, about 9-12 seconds per 512x512 image. Its result looks more plausible than Farneback's; at least the shift on uniform images is determined better, and it works noticeably better with large shifts.

    Conclusions

    If you want to use optical flow somewhere, first consider whether you really need it: you can often get by with simpler methods. Undertaking to implement a flow yourself is worth thinking over several times: each algorithm has many tricks, subtleties and optimizations, and whatever you write will probably work worse than what is in OpenCV (assuming your algorithm is there, of course). Moreover, OpenCV makes full use of logical and hardware optimizations such as SSE instructions, multithreading, the ability to compute with CUDA or OpenCL, and so on. If you only need to compute the shift of a certain set of points (i.e. a sparse flow), you can safely use calcOpticalFlowPyrLK: it works well, reliably and quite quickly. For computing a dense flow calcOpticalFlowSF is a good choice, but it is very slow. If performance is critical, use calcOpticalFlowFarneback, but make sure its results suit you.


    Optical flow is an image of the apparent movement of objects, surfaces, or edges of a scene resulting from the movement of the observer (eye or camera) relative to the scene. Optical flow-based algorithms—such as motion detection, object segmentation, motion encoding, and stereo disparity calculation—exploit this motion of objects, surfaces, and edges.

    Optical Flow Estimation

    Sequences of ordered images allow motion to be estimated as either instantaneous image velocity or discrete displacement. Fleet and Weiss compiled a tutorial on the gradient method of optical flow estimation.

    An analysis of methods for calculating optical flow was carried out in the work of John L. Barron, David J. Fleet and Steven Beauchemin. They review the methods both from the point of view of accuracy and from the point of view of the density of the resulting vector field.

    Optical flow-based methods calculate the motion between two frames taken at times $t$ and $t + \Delta t$ at each pixel. These methods are called differential, since they are based on approximating the signal by a segment of a Taylor series; thus, they use partial derivatives with respect to time and the spatial coordinates.

    In the 2D+t case (cases of higher dimension are similar), a pixel at position $(x, y, t)$ with intensity $I(x, y, t)$ will have moved by $\Delta x$, $\Delta y$ and $\Delta t$ between the two frames, and the following equation can be written:

    $I(x, y, t) = I(x + \Delta x, y + \Delta y, t + \Delta t).$

    Assuming that the displacement is small and using the Taylor series, we obtain:

    $I(x + \Delta x, y + \Delta y, t + \Delta t) = I(x, y, t) + \frac{\partial I}{\partial x}\Delta x + \frac{\partial I}{\partial y}\Delta y + \frac{\partial I}{\partial t}\Delta t + \text{higher-order terms}.$

    From these equalities it follows:

    $\frac{\partial I}{\partial x}\Delta x + \frac{\partial I}{\partial y}\Delta y + \frac{\partial I}{\partial t}\Delta t = 0,$

    from which, dividing by $\Delta t$, it turns out that

    $\frac{\partial I}{\partial x}V_x + \frac{\partial I}{\partial y}V_y + \frac{\partial I}{\partial t} = 0,$

    where $V_x$ and $V_y$ are the components of the optical flow velocity in $I(x, y, t)$, and $I_x = \frac{\partial I}{\partial x}$, $I_y = \frac{\partial I}{\partial y}$, $I_t = \frac{\partial I}{\partial t}$ are the derivatives of the image in the corresponding directions.

    Thus:

    $I_x V_x + I_y V_y = -I_t \quad \text{or} \quad \nabla I^T \cdot \vec{V} = -I_t.$
    The resulting equation contains two unknowns and cannot be resolved unambiguously. This circumstance is known as the aperture problem. The problem is solved by imposing additional restrictions: regularization.

    Methods for determining optical flow

    Using Optical Flow

    Optical flow research is widely conducted in the fields of video compression and motion analysis. Optical flow algorithms not only determine the flow field, but also use optical flow in analyzing the 3D essence and structure of the scene, as well as the 3D motion of objects and the observer relative to the scene.

    Optical flow is used in robotics for object recognition, object tracking, motion detection, and robot navigation.

    In addition, optical flow is used to study the structure of objects. Since detecting motion and creating maps of the structure of the environment are an integral part of animal (human) vision, the implementation of this innate ability by means of a computer is an integral part of computer vision.

    Imagine a five-frame video of a ball moving from the bottom left to the top right. Motion detection methods can determine that on a two-dimensional plane a ball is moving up and to the right, and vectors describing this motion can be obtained from a sequence of frames. In video compression, this is the correct description of the sequence of frames. However, in the field of computer vision, without additional information, it is impossible to tell whether the ball is moving to the right and the observer is standing still, or the ball is at rest and the observer is moving to the left.


    The construction of optical flow is traditionally viewed as a procedure for estimating the brightness-geometric changes between the current and the previous frame. The movement of objects in front of a stationary camera, as well as the movement of the camera through the surrounding environment, leads to corresponding changes in the image. The apparent movement of the visible field (the two-dimensional brightness distribution), observed when the camera moves relative to the imaged objects or the objects move relative to the camera, is called optical flow. Let us define the motion field by assigning a velocity vector to each point in the image. At some chosen moment of time a point in the image corresponds to a certain point on the surface of an object; these two points are related by the projection equations. The object point moves relative to the camera with some velocity, and this generates a motion of the corresponding point in the image: over a small interval of time the object point moves a certain distance and its image moves a corresponding distance (see Fig. 1).

    Fig. 1.

    Brightness distributions move along with the observed objects. Optical flow, as mentioned earlier, is the apparent movement of the brightness pattern. Ideally, the optical flow corresponds to the previously determined motion field, however, in practice this is not always the case.

    Now let $E(x, y, t)$ be the brightness of a pixel at the point $(x, y)$ of the image at the instant of time $t$. Then, if $u(x, y)$ and $v(x, y)$ are the components of the optical flow vector at this point, we can expect that

    $E(x + u\,\delta t,\; y + v\,\delta t,\; t + \delta t) = E(x, y, t).$

    Expanding the left-hand side in a Taylor series, cancelling $E(x, y, t)$ on the left and right sides, dividing by $\delta t$ and passing to the limit as $\delta t \to 0$, we get

    $E_x u + E_y v + E_t = 0.$

    The derivatives $E_x$, $E_y$, $E_t$ can easily be obtained from the image using numerical approximations of the derivatives by finite differences.

    Let us rewrite this constraint as the problem of minimizing a functional over the image:

    $F(u, v) = \iint_{\Omega} \Bigl[ (E_x u + E_y v + E_t)^2 + \alpha \bigl( |\nabla u|^2 + |\nabla v|^2 \bigr) \Bigr]\, dx\, dy.$

    Here $\Omega$ is the region in which the optical flow is sought. The value of the coefficient $\alpha$ determines the relative significance of the smoothing part of the functional. Note that in the literature, recommendations for choosing this constant differ radically from one source to another.

    The problem of minimizing this functional is solved by an iterative procedure. The sequence of velocities that minimizes the functional has the form:

    $u^{n+1}_{i,j} = \bar{u}^{n}_{i,j} - \frac{E_x \bigl( E_x \bar{u}^{n}_{i,j} + E_y \bar{v}^{n}_{i,j} + E_t \bigr)}{\alpha + E_x^2 + E_y^2}, \qquad v^{n+1}_{i,j} = \bar{v}^{n}_{i,j} - \frac{E_y \bigl( E_x \bar{u}^{n}_{i,j} + E_y \bar{v}^{n}_{i,j} + E_t \bigr)}{\alpha + E_x^2 + E_y^2},$

    where $\bar{u}$ and $\bar{v}$ are the velocities averaged over the neighboring grid nodes. Here the superscript $n$ shows the number of the current iteration and $(i, j)$ are the indices of the current grid node.

    The iterative process ends when the discrepancy between two successive iterations becomes less than a predetermined number $\varepsilon$, for example

    $\max_{i,j} \bigl( |u^{n+1}_{i,j} - u^{n}_{i,j}| + |v^{n+1}_{i,j} - v^{n}_{i,j}| \bigr) < \varepsilon.$

    However, this condition is rather inconvenient to use directly because of the significant computational cost of evaluating it at every iteration if it has to be checked every time. Therefore a fixed number of iterations is usually used instead; specific values are proposed in the literature. In practice, for images with good object contrast a relatively small number of iterations is sufficient. Using a significantly larger number of iterations can lead to the appearance of spurious non-zero velocities in regions where the velocity field is actually zero. In particular, this happens when two different objects move at a short distance from each other: the velocity field between them is actually zero, but the optical flow computed over a large number of iterations may become non-zero because of the assumption of continuity of motion.

    Optical flow can also be zero where the velocity field is not zero. This case, for example, occurs when moving an object characterized by constant brightness over the entire area of ​​the occupied image area. In this case, the optical flux calculated at the boundary of the object will be non-zero, and the one calculated at the center of the object will be close to zero, while the true velocity field should be the same over the entire surface of the object. This problem is called the “aperture problem.”

    The derivatives of the image brightness are computed using finite-difference approximations on the image grid. The index $k$ denotes the number of the current frame and $(i, j)$ the indices of the current grid node. The steps $\Delta x$ and $\Delta y$ used when calculating the partial derivatives can be chosen arbitrarily; typically a uniform unit grid is used.

    Now, substituting the partial derivatives and the averaged velocities into the iterative process with zero initial conditions for all points of the region, it is easy to find the velocities at all grid points in the observed frames of the video sequence.

    The described computational scheme follows the traditional optical flow estimation methods. However, experiments conducted on a large volume of real video recordings have shown that when optical flow algorithms operate directly on the original digital grayscale images, the quality of their output is not high enough, because noise and other interference significantly affect the quality of object detection. In this regard, this work proposes using a special difference-accumulation procedure, described in the next section, for preprocessing the video data. The point of this procedure is a robust preliminary extraction of the contours of moving objects, from which the optical flow estimate is then computed and used at the stage of forming hypotheses and tracking moving objects.