An Explanation of Film-to-Video Frame Rate
Conversion for NTSC
To better understand the upcoming concepts, one must be
armed with some basic knowledge of how film gets transferred to video, as
well as the nature of interlaced versus progressive display. As such, the
following information is not intended to be a definitive paper on the
subject, but should serve as a good introduction for all.
The visuals and animations presented here, though large
in file size, are key and will reward repeat viewing.
Motion pictures are comprised not of motion at all, but
numerous stills shown in rapid succession. For the films we all watch at
the theater, 24 frames are shown in one second (24 frames per second, or
24fps). The NTSC television system differs from film in this regard,
making it complicated to show film on video.
Televisions create their image by drawing (scanning) lines of light on the
CRT face, left to right, top to bottom, to produce a picture over the
entire screen.
The resultant images that make up the motion picture are comprised of two
interlaced fields: that is, the first field consists of all the odd lines
(1 through 525), and the second field consists of all the even lines (2
through 524). The result is that only half of the video's display is drawn
every 60th of a second. A simulation of this is shown on the left. Field 1
is scanned, and then Field 2 is scanned. Traditional talk quotes NTSC
television as having 30 frames per second (as opposed to film's 24), each
being comprised of two interlaced fields. This is actually misleading: The
NTSC interlaced system shows 60 unique images per second, but each one
uses only half of the vertical resolution available on the display. Only
if the source material contained 30 unique frames per second could you
say that two fields form a single frame. In reality, video
material such as the evening news is true 60 fields per second. So we
don't want to think of interlaced televisions in terms of frames but
rather in terms of fields, interlaced fields, and 60 of them per second.
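To make the idea of fields concrete, here is a minimal sketch that splits
one full picture into its two interlaced fields. It follows the simplified
line numbering used above (525 scan lines, numbered from 1). The function
name split_into_fields and the list-of-strings frame are our own
illustrative assumptions, not part of any video standard.

```python
def split_into_fields(frame_lines):
    """Return (odd_field, even_field) from a list of scan lines.

    Scan lines are numbered from 1, so list index 0 holds line 1 (odd).
    """
    odd_field = frame_lines[0::2]   # lines 1, 3, 5, ...
    even_field = frame_lines[1::2]  # lines 2, 4, 6, ...
    return odd_field, even_field


# A stand-in "picture": 525 labelled scan lines.
frame = [f"line {n}" for n in range(1, 526)]
odd_field, even_field = split_into_fields(frame)

print(len(odd_field), len(even_field))   # each field holds about half the lines
print(odd_field[0], "to", odd_field[-1])
print(even_field[0], "to", even_field[-1])
```

Each field carries only about half the picture's lines, which is why a
single field has half the vertical resolution of the full display.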
The principal drawbacks of an interlaced display are
(A) visible line structure, (B) flicker caused by the rapid alternating of
the fields, and most important, (C) artifacts such as 'feathering' (also
referred to as 'combing') and 'line twitter'. Visual artifacts like these
last two occur anytime the subject or the camera is in a different
position from field to field. The subject will be in one position for one
field, and in another position for the next, resulting in jagged edges
(feathering) or shimmering horizontal lines (twitter).
The animation on the right shows an example of an
interlaced display trying to show a tomato moving from left to right. Each
field shows the tomato a little farther to the right than the previous.
Because the fields are interlaced, jagged vertical edges can't help but
exist, except during the last two fields (5 and 6), where the tomato is
stationary. The further back you are from an interlaced display (or the
smaller the display is), the less this and other artifacts are noticed. If
you want to see the effect in real life, just stick your nose up to an
interlaced TV. Focus on the edge of a stationary object and wait for
it to move. You will notice this right away.
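The feathering effect can also be imitated in a few lines of code. The
sketch below is a toy model only: it draws a small block on a 16 x 6
character grid, renders each field with the block at a different
horizontal position, and then shows the two fields together as the eye
would merge them. The grid size, the block width, and the names
render_field and on_screen are all assumptions made for illustration.

```python
WIDTH, HEIGHT, OBJ_WIDTH = 16, 6, 4


def render_field(x_pos, parity):
    """Draw only the rows of one field (parity 0: rows 0, 2, 4; parity 1: rows 1, 3, 5)."""
    rows = {}
    for y in range(parity, HEIGHT, 2):
        rows[y] = "." * x_pos + "#" * OBJ_WIDTH + "." * (WIDTH - x_pos - OBJ_WIDTH)
    return rows


def on_screen(field_a, field_b):
    """Merge two fields into the picture the viewer perceives."""
    combined = {**field_a, **field_b}
    return [combined[y] for y in range(HEIGHT)]


# The object moved 3 columns between the two fields: the edges comb.
moving = on_screen(render_field(2, 0), render_field(5, 1))
# The object stayed put between the two fields: the edges are clean.
still = on_screen(render_field(5, 0), render_field(5, 1))

print("\n".join(moving))
print()
print("\n".join(still))
```

In the first picture, alternate rows disagree about where the object is,
which is the jagged edge described above. In the second, every row agrees.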
At left is an interlaced image of a skier. Not only is the flicker
annoying, but have a good look at the ski pole: it comes and goes because
it's so fine it can only be found in one of the two interlaced fields.
This is line twitter. This artifact manifests itself when fine detail is
less than 2 scan lines high. It is exacerbated during vertical movement as the
fields alternate. Often fine detail is filtered before being encoded to
minimize these artifacts when played back at home on your interlaced
display device. Because of this, we have yet to experience the full
potential of DVD.
The preceding basic knowledge of interlacing is
necessary to understand the transfer of film to video, because it is an
important factor in what we end up seeing.
Motion picture photography is based on 24 frames per
second. Time to call to mind all that math you learned in school and
realize that 24 doesn't go into 60 very easily. To boil it down a little,
our challenge is to make 4 frames from the film fit as evenly as possible
across 10 video fields. We can't just double up the fields on every fourth
film frame or we'd get a real 'stuttered' look. Instead, a process is used
known as 3-2 pulldown to create 10 video fields from 4 film frames. This
form of telecine alternates between creating 3 fields from a film frame
and 2 fields from a film frame. Hence the name 3-2.
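As a rough sketch of the 3-2 cadence just described, the code below turns
four film frames (labelled A to D, our own labels) into ten video fields
and then pairs the fields into video frames. The function name
pulldown_32 and the tuple layout are illustrative assumptions. A real
telecine works on images, not labels.

```python
from itertools import cycle


def pulldown_32(film_frames):
    """Map film frames to a flat list of (film_frame, field_parity) video fields."""
    fields = []
    parity = cycle(["odd", "even"])  # video fields always alternate odd/even
    for frame, count in zip(film_frames, cycle([3, 2])):
        for _ in range(count):
            fields.append((frame, next(parity)))
    return fields


fields = pulldown_32(["A", "B", "C", "D"])
print(len(fields), "fields:", " ".join(f[0] for f in fields))

# Pair consecutive fields into video frames and mark the mixed ones.
video_frames = [fields[i:i + 2] for i in range(0, len(fields), 2)]
for number, (first, second) in enumerate(video_frames, start=1):
    flag = " !" if first[0] != second[0] else ""
    print(f"video frame {number}: {first[0]}{second[0]}{flag}")
```

Four film frames become ten fields (3 + 2 + 3 + 2). Two of the five
resulting video frames are built from fields of two different film frames.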
Consider now our flow chart of the 3-2 pulldown
performed on four frames of this movie scene:
Pretty cool right? It is and it isn't. 3-2 pulldown
inherits many of the artifacts we described when talking about interlaced
video. Anytime a field follows one made from a different film frame
(noted above by the "!" icon), there exists the possibility for anomalies
in what we see, feathering and twittering being great examples. Absolutely
any differences between the two film frames that make up the video frame
(the last field of one frame and the first field of the next frame), be it
brightness, color, or especially motion, are going to result in some
artifact as the two fields merge on screen. Even our little animated
synthesis of the final interlaced product, which actually contains 10
interlaced pieces, shows evidence of such anomalies as the flying police
cars move ahead. Such is life.
As long as you are watching your movies on an ordinary
interlaced display, there is not much more to tell you. What you see at
home is pretty much what we've shown as the interlaced content in the
above illustration. But should you have the fortune to be using a
progressive display TV, the following comes into play.
Progressive displays, such as high-performance CRT/LCD/DLP/D-iLA
projectors and the new HDTV-ready TVs, can show progressive scanned images
as opposed to interlaced. In order to do this, the display must scan at a
higher rate, 2x the speed of NTSC. Because we are scanning at twice the
speed, we can draw an entire frame in the same amount of time it takes an
interlaced system to draw a single field. We learned above that an
interlaced display shows 60 fields per second. But with progressive, each
"field" is now a complete picture including all scan lines, top to bottom,
so we will now call it a frame, and we are showing 60 of those per second.
(Of course, only 24 of those are unique if the source is film based.) The
benefits of a progressive display are the absence of flicker, much less
visible scan lines (permitting closer seating to the display), and none of
the artifacts we described for the interlaced display, as long as the
source material is progressive in nature (film or a progressive video
camera).
But sources which are truly progressive in nature are
hard to come by right now. Movies on DVD are almost always decoded
as interlaced fields yet all of the film's original frames are there, just
broken up. What we're going to talk about next is how we take the
interlaced content of DVD and recreate the full film frames so we can
display them progressively. The term commonly used for restoring the
progressive image is deinterlacing, though we think it is more correct to
call it re-interleaving, which is a subset of deinterlacing.
Deinterlacing (or re-interleaving) involves assembling
pairs of interlaced fields into one progressive frame (1/60 of a second
long), and showing it at least twice to use up the same amount of time as
two fields. The need for 60 flashes on the screen each second stems from a
biological property called the Flicker Fusion Frequency, meaning how many
flashes that we need to see each second so that we (our brains) fuse the
image into one where we don't see a flicker.
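The assembling step itself is simple to sketch. Assuming the two fields
really do come from the same film frame, re-interleaving is just a matter
of alternating their lines. The function name weave and the tiny
five-line picture are our own illustrative assumptions.

```python
def weave(odd_field, even_field):
    """Interleave two fields into one progressive frame, odd line first."""
    frame = []
    for odd_line, even_line in zip(odd_field, even_field):
        frame.append(odd_line)
        frame.append(even_line)
    # With an odd total line count, the odd field has one line left over.
    frame.extend(odd_field[len(even_field):])
    return frame


odd_field = ["line 1", "line 3", "line 5"]
even_field = ["line 2", "line 4"]
print(weave(odd_field, even_field))
```

This only gives a clean picture when both fields were made from the same
film frame, which is exactly why picking the right pairs matters.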
For every film frame that had three fields made from
it, the third field is a duplicate of the first, and (if the MPEG-2
encoder is behaving properly) won't even be stored on the DVD. Instead of
encoding the duplicate fields, the DVD flags repeat_first_field and
top_field_first are used to instruct the MPEG decoder where to place these
duplicate fields during playback.
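Here is a simplified model of how those two flags could be expanded back
into a field sequence on playback. It is a sketch of the flag semantics
only, not a real decoder: the (name, top_field_first, repeat_first_field)
tuple layout and the function name expand_flags are hypothetical, and the
flag values shown are the pattern we would expect for film encoded with
the 3-2 cadence.

```python
def expand_flags(pictures):
    """Expand stored pictures into the list of fields to display.

    pictures: list of (name, top_field_first, repeat_first_field) tuples.
    """
    fields = []
    for name, top_field_first, repeat_first_field in pictures:
        first, second = ("top", "bottom") if top_field_first else ("bottom", "top")
        fields.append((name, first))
        fields.append((name, second))
        if repeat_first_field:
            fields.append((name, first))  # the duplicate that was never stored
    return fields


# Four stored film frames, flagged so that the output alternates 3 and 2 fields.
stored = [
    ("A", True, True),
    ("B", False, False),
    ("C", False, True),
    ("D", True, False),
]
for name, parity in expand_flags(stored):
    print(name, parity)
```

Only four pictures are stored, yet ten fields come out, and the top and
bottom fields keep alternating as an interlaced display requires.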
The progressive output of a DVD player should assemble
2 fields from each film frame and create a complete progressive one that
looks just like the original film frame. You should now be thinking that
the DVD will once again have 24 frames to show in one second. But the
progressive display is still expecting 60 complete frames per second. In
order to space them out, the DVD player shows the complete frames in this
order: 1, 1, 1, 2, 2, 3, 3, 3, 4, 4 and so on.
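That display order can be generated with a short sketch. The function
name display_order and its repeats parameter are our own illustrative
choices, not anything a DVD player exposes.

```python
from itertools import cycle


def display_order(film_frames, repeats):
    """Repeat each film frame according to a cycling list of repeat counts."""
    order = []
    for frame, count in zip(film_frames, cycle(repeats)):
        order.extend([frame] * count)
    return order


print(display_order([1, 2, 3, 4], [3, 2]))
# [1, 1, 1, 2, 2, 3, 3, 3, 4, 4]
```

Every four film frames fill ten display frames, so 24 film frames fill
exactly 60.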
This form of display gives us a moving image very close
to the original film. It has a tendency to "judder" a bit though, as every
other film frame lasts 1/60 of a second longer than the previous one. Even
our little synthesis of the final product, which actually contains 10
pieces, shows this judder. In the future, both the player and the display
could increase their display rate from 60 frames per second to 72 per
second. At that point, each frame would last only 1/72 of a second,
permitting the player to show every film frame three times (24 x 3 = 72),
eliminating the motion judder, and also helping us with the Flicker Fusion
Frequency problem (60 flashes per second are just barely enough in a well
lit viewing environment). This would look like: 1, 1, 1, 2, 2, 2, 3, 3,
3, 4, 4, 4 and so on. 72 fps will only work with film based sources
though, as it is a multiple of 24. It will not work well with video
sources, which run at 60 fields per second.
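The difference between the two cadences is easy to see if we work out how
long each film frame stays on screen. The sketch below uses exact
fractions. The function name frame_durations and its parameters are
assumptions made for illustration.

```python
from fractions import Fraction


def frame_durations(refresh_hz, repeats, n_frames=4):
    """On-screen time of each film frame, given a cycling list of repeat counts."""
    tick = Fraction(1, refresh_hz)  # duration of one display refresh
    return [repeats[i % len(repeats)] * tick for i in range(n_frames)]


at_60 = frame_durations(60, [3, 2])
at_72 = frame_durations(72, [3])

print("60 Hz:", [str(d) for d in at_60])  # durations alternate: judder
print("72 Hz:", [str(d) for d in at_72])  # every frame lasts 1/24 second
print(sum(at_60) == sum(at_72))           # same total running time
```

At 60 Hz the film frames alternate between 1/20 and 1/30 of a second,
while at 72 Hz each one lasts 1/24 of a second, just as it does in the
theater.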
The re-interleaving process we've just covered is
specific to 24fps film material which is MPEG-2 decoded (as interlaced
fields). It's really a matter of putting the right fields together so it's
fairly simple. Deinterlacing native NTSC interlaced video material is much
more complicated. In such video material, each field is a unique image in
time, and in order to be deinterlaced at an acceptable level, it requires
getting into motion-adaptive and motion-compensated algorithms to
overcome the inherent problems of the interlaced material. There is no
best method, and the two mentioned are expensive to implement.
(Note: NTSC does not really run at 60 Hz; it is
technically 59.94 Hz. The industry rounds it up to make it easier to read.
If you did play back video at 60 Hz instead of 59.94 Hz, you would end up
one field out of step roughly every 17 seconds, or one full frame about
every 33 seconds.)
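The size of that drift can be checked with a little arithmetic. The
sketch below assumes the exact NTSC field rate is 60000/1001 Hz, which is
the value usually rounded to 59.94. The function name
seconds_per_field_of_drift is our own. The answer depends on whether you
count fields or full frames.

```python
from fractions import Fraction

# Assumption: the exact NTSC field rate is 60000/1001 Hz (the "59.94" above).
NTSC_FIELD_RATE = Fraction(60000, 1001)


def seconds_per_field_of_drift(playback_hz=60):
    """Seconds until playback at playback_hz is one whole field ahead of NTSC."""
    drift_per_second = playback_hz - NTSC_FIELD_RATE  # fields gained each second
    return 1 / drift_per_second


per_field = seconds_per_field_of_drift()
print(round(float(per_field), 2))      # seconds to drift by one field
print(round(float(2 * per_field), 2))  # seconds to drift by one full frame
```

Under that assumption, a 60 Hz clock gains one field on the NTSC clock
roughly every 16.7 seconds, which is one full frame roughly every 33
seconds.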
- Brian Florian - Secrets
of Home Theater and Hi Fidelity