Introduction
The original project proposal discussed applying computer vision to
sports. In particular, I had proposed to extract 3D position
data from screenshots of a televised sporting event. However, based on
the instructor's feedback and initial experimentation, it soon became apparent
that the problem would be too difficult to solve in a few weeks. Thus, the scope
of the project was reduced to tracking the ball during the game.
Although this might sound trivial, many complications make the task
difficult for a computer vision system. For example, there are
often occlusions where the ball is no longer visible and the viewer must
make assumptions about the ball's location. The hybrid ball-tracking
algorithm I developed worked remarkably well on the test cases, and I present
the results in this paper.
Description of the Problem
We have tracked objects with our eyes from the moment we first looked at the
world. After years of practice, we have become quite good at it. We have
watched many sporting events on television, following the ball as it was
thrown back and forth to score points and win games. Perhaps some adventurous
readers have even partaken in these physical activities on occasion,
and experienced the thrill of having to track an oncoming ball.
Bayesian theory postulates that we make decisions based on previous experience.
This is certainly true when tracking a moving ball. When we observe the ball in
motion, we are able to track it not because we see it perfectly at every step
of the way, but largely because we know where it is going to go. Our knowledge
of the physical properties of the ball, the nature and rules of the sport we
are watching, and the conditions of the field allows us to make these
predictions. This is why new viewers of a sport often have more trouble
following the ball than seasoned fans.
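This kind of prediction from experience can be illustrated with a toy constant-velocity model. This is my own hypothetical sketch, not part of the tracking system described later; the function name and sample positions are invented for illustration:

```python
import numpy as np

def predict_next(positions):
    """Predict the next ball position from the last two observations,
    assuming roughly constant velocity between frames (a crude prior)."""
    positions = np.asarray(positions, dtype=float)
    velocity = positions[-1] - positions[-2]   # displacement over the last frame
    return positions[-1] + velocity            # extrapolate one frame ahead

# Observed (x, y) positions over three frames; the prediction
# simply continues the most recent motion: (16, 8).
track = [(10.0, 5.0), (12.0, 6.0), (14.0, 7.0)]
print(predict_next(track))
```

Even this trivial prior is enough to keep "looking" in roughly the right place for a frame or two when the ball is briefly hidden.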
A dog does the same thing. If you pretend to throw a stick, the
dog will follow the phantom stick with its eyes for a second. It may then
become confused when its perception of reality
does not match what its eyes are seeing. Nevertheless, it began to
track the stick based on a path it had seen sticks take in the past, not
because of any visual cues it was seeing (except for the moving arm, of
course).
Computers, on the other hand, do not have the luxury of relying on years of
experience. Unless one can implement an extensive knowledge base of the physics
of ball movement, the rules of the game, player behavior, and so on, one must
set aside the Bayesian approach and deal only with what can be seen at the
present time. Such a system can never be as good as a person, since occlusion
will always yield an unpredictable state. Despite these drawbacks, however, my
initial results suggest that it is possible to get reasonable results from a
computer vision system.
My basic algorithm is a hybrid of several major computer vision
techniques: edge detection, optical flow, and segmentation.
This seemed reasonable since humans also use many different cues to see
objects. In the section that follows, I describe the algorithm in detail.
Algorithm and Implementation
After some testing, I arrived at an algorithm robust enough
to do well in my test cases. The outline of the algorithm is as follows:
For every frame:

Setup:
1. Find edges in the image using a modified Canny edge detector; call the
   resulting matrix the "edge image".
2. Segment the image into two layers (players and ball) using a
   mixture-of-Gaussians model.
3. Use the edge image and the player layer as masks on the ball layer to
   remove false hits. The final image should (in theory) highlight only the
   ball; call this the "ball mask".

Tracking:
4. If the ball moved more than a certain amount in the last frame, assume
   that it moved forward by the same amount again, and begin tracking there.
5. Starting from this position, look for the ball by running optical flow on
   the edge images, and form a best estimate of the ball's location.
6. Move the cursor to the nearest point on the ball mask from where the
   Lucas-Kanade optical flow algorithm predicted it would be, but only if that
   point is closer than a certain distance. If it is too far away, it is
   probably random noise, so remain where the LK algorithm predicted.
7. Save the parameters you need, move to the next frame, and start again.
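The mask combination in the setup phase can be sketched with plain NumPy boolean arrays. The array names and the tiny 4x4 example below are my own illustrative choices, not the actual implementation:

```python
import numpy as np

def combine_masks(ball_layer, player_layer, edge_image):
    """Keep only ball-layer pixels that lie on an edge and are not
    already explained by the player layer (a sketch of the masking step)."""
    return ball_layer & edge_image & ~player_layer

# Tiny 4x4 example: True = foreground/edge present at that pixel.
ball   = np.array([[0, 1, 0, 0],
                   [0, 1, 0, 0],
                   [0, 0, 0, 1],
                   [0, 0, 0, 0]], dtype=bool)
player = np.array([[0, 0, 0, 0],
                   [0, 1, 0, 0],
                   [0, 0, 0, 0],
                   [0, 0, 0, 0]], dtype=bool)
edges  = np.array([[0, 1, 0, 0],
                   [0, 1, 0, 0],
                   [0, 0, 0, 0],
                   [0, 0, 0, 0]], dtype=bool)

# Only the pixel at row 0, col 1 survives: it is on an edge,
# in the ball layer, and not covered by a player.
ball_mask = combine_masks(ball, player, edges)
print(ball_mask.astype(int))
```

The two masks play different roles: the player mask removes candidate pixels that belong to the larger foreground objects, while the edge mask removes smooth false hits that segmentation alone lets through.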
As the outline shows, I have broken the steps of the algorithm into a setup
phase and a tracking phase.
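The "snap to the nearest mask point unless it is too far away" rule from the tracking phase can be sketched as follows. The function name, coordinate convention, and distance threshold are illustrative assumptions, not values from the actual system:

```python
import numpy as np

def snap_to_mask(predicted, ball_mask, max_dist=10.0):
    """Snap the flow-predicted position to the nearest ball-mask pixel,
    but only if that pixel is within max_dist; otherwise trust the
    optical-flow prediction (treat distant hits as noise)."""
    predicted = np.asarray(predicted, dtype=float)
    ys, xs = np.nonzero(ball_mask)
    if len(xs) == 0:
        return predicted                       # no mask pixels at all
    candidates = np.stack([xs, ys], axis=1).astype(float)  # (x, y) pairs
    dists = np.linalg.norm(candidates - predicted, axis=1)
    i = np.argmin(dists)
    return candidates[i] if dists[i] <= max_dist else predicted

mask = np.zeros((20, 20), dtype=bool)
mask[5, 7] = True                              # one mask pixel at (x=7, y=5)
print(snap_to_mask((6.0, 5.0), mask))          # close enough: snaps to the pixel
print(snap_to_mask((6.0, 5.0), mask, 0.5))     # too far: stays at the prediction
```

The threshold is what keeps isolated segmentation noise from yanking the tracker away from a plausible flow-based estimate.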
Testing and Results
I needed to test my code with actual data from a soccer match, so I taped one
off television and edited a few clips to use as test cases. I did not want
to pick easy clips, so I just picked clips that just featured the kind of
typical action that can be expected at a soccer game. In addition, the
quality of the videos is not very good at all, yet the algorithm still is able
to perform remarkably well in most cases. I used
four main scenes that featured different challenges to test my algorithm.
Test 1: Initial test case; fairly easy, with one minor occlusion.
Test 2: The free kick. Tough to track because the ball moves so quickly.
Test 3: A long sequence. Checks whether the code can track over a long period
of time.
Test 4: Deflections and occlusions. A chaotic sequence where the ball bounces
every which way and is hidden from view several times.
Conclusions
Overall, the ball-tracking algorithm performed reasonably well in the test
cases, considering the poor quality of the video and the difficulties involved
in tracking a moving ball. Further work could focus on making the system more
robust. One approach would be to track multiple "possible" balls for several
frames when uncertainty arises, then collapse them back once the true ball is
found again. Another would be to use an optical flow algorithm to remove the
pan of the camera and thus detect only objects that have moved on their own.
This could be used as yet another layer to try to locate the ball.
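The camera-pan idea could work roughly as follows: estimate the dominant flow vector over the frame, treat it as the pan, and flag pixels whose residual motion is large. This is a hypothetical NumPy sketch of that suggestion, not something the project implements:

```python
import numpy as np

def residual_motion(flow, thresh=2.0):
    """Estimate camera pan as the median flow vector over all pixels,
    subtract it, and flag pixels whose residual motion exceeds thresh."""
    pan = np.median(flow.reshape(-1, 2), axis=0)       # dominant motion = pan
    residual = flow - pan
    moving = np.linalg.norm(residual, axis=-1) > thresh
    return pan, moving

# 8x8 flow field: everything pans right by 3 px, except one "ball" pixel.
flow = np.tile(np.array([3.0, 0.0]), (8, 8, 1))
flow[4, 4] = [9.0, 2.0]                                # ball moves differently
pan, moving = residual_motion(flow)
print(pan)                  # recovered pan vector
print(np.argwhere(moving))  # only the independently moving pixel is flagged
```

Using the median (rather than the mean) makes the pan estimate robust to the minority of pixels covering players and the ball.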