Coding Adventure: Rendering Text
Summary
TLDR: In this Coding Adventures episode, the focus is on the intricacies of rendering text using TrueType font files. The host delves into the complexities of font file formats, exploring the challenges of decoding and rendering glyphs. Through trial and error, various methods are tested, including parsing font tables, handling bezier curves, and implementing anti-aliasing for smoother edges. The video also touches on the patent issues surrounding certain rendering techniques and the host's journey in overcoming various rendering artifacts and issues with overlapping shapes and complex fonts.
Takeaways
- 🌟 The video is about the process of rendering text using font files and the complexities involved in ensuring clear and legible display.
- 📚 The journey begins with understanding the TrueType font format and the importance of tables like 'glyf' for storing glyph shapes.
- 🔍 Parsing font files involves dealing with endianness issues, byte offsets, and various data structures that define the properties and contours of glyphs.
- 🎨 Rendering glyphs involves interpreting the font's bytecode instructions and handling the intricacies of bezier curves and line segments.
- 🔄 The importance of handling compound glyphs and understanding the mapping between glyph indices and unicode values is highlighted.
- 📐 The video emphasizes the challenge of rendering text at low resolutions and the need for techniques like anti-aliasing to improve legibility.
- 💡 The exploration of different rendering techniques, such as using textures, signed distance fields, and direct bezier rendering, is discussed.
- 🐛 Debugging font rendering issues involves creating 'evil artifact detectors' and refining algorithms to handle edge cases and font peculiarities.
- 🚀 The video showcases the improvement in performance and visual quality through the use of shaders and optimized algorithms.
- 📖 The script concludes with the acknowledgment of remaining challenges in rendering complex fonts and the need for further exploration and refinement.
- 🎓 The content is a rich resource for understanding the technical aspects of font rendering and the trade-offs involved in different approaches.
Q & A
What is the primary focus of the video?
-The primary focus of the video is to explore the process of rendering text using font files, specifically focusing on the technical aspects of decoding and rendering fonts, and handling various challenges such as anti-aliasing and overlapping shapes.
Which font file format does the video begin with?
-The video begins with the TrueType font file format, which was developed by Apple in the late 1980s.
What is the significance of the 'glyf' table in a font file?
-The 'glyf' table is significant because it stores the shapes of the glyphs, or characters, in the font.
What is the main problem that the video tries to address with low-resolution text rendering?
-The main problem addressed is ensuring that text is displayed or printed clearly at low resolutions, where unlucky scaling could cause parts of a letter to appear twice as thick as others, making the letters difficult to read.
How does the video tackle the endianness issue when reading font tables?
-The video creates a FontReader class in C# that handles the conversion from big endian to little endian, allowing the program to correctly read the font table data.
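The byte-order conversion the FontReader class performs can be sketched in Python, where the `struct` module's big-endian format flag does the swapping (the video's actual class is C#, so the function names here are illustrative only):

```python
import struct

def read_uint16_be(data: bytes, offset: int) -> int:
    # TrueType files store values big endian (most significant byte
    # first); the '>' format flag tells struct to convert for us.
    return struct.unpack_from(">H", data, offset)[0]

def read_uint32_be(data: bytes, offset: int) -> int:
    return struct.unpack_from(">I", data, offset)[0]
```

For example, the two bytes `0x00 0x11` decode to 17 rather than the 4352 a naive little-endian read produces.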
What is the role of the 'head' and 'maxp' tables in a font file?
-The 'head' table contains metadata about the font, while the 'maxp' table provides information about the maximum number of glyphs and other data useful for font processing.
What is a compound glyph, and how is it handled in the video?
-A compound glyph is a character made up of other glyphs. In the video, compound glyphs are handled by reading in each component glyph and combining their points and contour end indices into a single glyph.
How does the video address the issue of text legibility at small sizes?
-The video acknowledges that legibility at small sizes is a challenge and suggests that delving into the bytecode interpreter aspects of font rendering might be necessary for improvement, though it does not provide a concrete solution within the scope of the video.
What is the basic idea behind the anti-aliasing technique implemented in the video?
-The basic idea behind the anti-aliasing technique is to use a grid of points within each pixel and calculate a coverage value based on how many of those points are inside the glyph, resulting in smoother edges and improved appearance at larger magnifications.
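The grid-sampling idea can be sketched as follows. This is a minimal CPU version in Python (the video ultimately does this work in a shader), and the `inside` callback stands in for whatever point-in-glyph test is being used:

```python
def pixel_coverage(px, py, inside, grid=4):
    # Sample a grid x grid lattice of points within the pixel and
    # return the fraction that land inside the glyph (0.0 to 1.0).
    hits = 0
    for j in range(grid):
        for i in range(grid):
            # Offset each sample to the centre of its grid cell.
            x = px + (i + 0.5) / grid
            y = py + (j + 0.5) / grid
            if inside(x, y):
                hits += 1
    return hits / (grid * grid)
```

The coverage fraction can then be used directly as the pixel's alpha, which is what produces the smoother edges.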
How does the video propose to optimize the rendering performance for complex fonts?
-The video suggests optimizing the rendering performance by splitting each glyph into multiple bands so that each pixel only has to test the bezier segments inside its band, and by constructing tighter fitting meshes tailored for each individual glyph through pre-processing.
Outlines
🚀 Introduction to Font Rendering
The video begins with the host expressing excitement about exploring the process of rendering text, starting with the need to select a font. The chosen font, JetBrainsMono-Bold, is opened in a text editor to inspect its contents, which are found to be unintelligible. The host decides to refer to a reference manual to understand the TrueType font format, developed by Apple in the 1980s. The complexity of the font file is discussed, including its tables and bytecode instructions. The focus is on the 'glyf' table, which is believed to contain the shapes of the characters. The host encounters an issue with the number of tables read from the font file but resolves it by understanding the endianness of the file format. The video then delves into the structure of the tables and how to interpret them.
📚 Understanding the Font Table Structure
The host continues to explore the font tables, focusing on the table directory and the metadata it provides. The discussion includes the identification tags and byte offsets of the tables. The host experiences initial difficulties with reading the tags due to skipping over unnecessary data. After adjusting the code, the host successfully reads the tags and identifies the glyph table entry. The structure of the glyph data is explained, including contours, bounding boxes, and instructions. The host also discusses the concept of 'FWords' and how they relate to the glyph data. The video then demonstrates how to read the glyph data and plot it in the Unity engine, revealing a distorted character glyph. The host refines the code to correctly display the glyphs and expresses eagerness to render all characters in the font.
🎨 Bezier Curves and Glyph Transformation
The host introduces Bezier curves as a method to smooth out the blocky appearance of the glyphs. A brief explanation of Bezier curves is provided, including the concept of control points and how they affect the curvature of a line. The host visualizes the curves with a creative laser-beam analogy. The mathematical aspect of Bezier curves is simplified with a linear interpolation function. The video then shows how to implement Bezier curves in the code to replace the blocky glyphs with smooth curves. The host tests the code with a 'Hello World!' message, but encounters issues with character mapping and spacing, which are addressed next.
🔍 Debugging and Refining Glyph Rendering
The host addresses the issues from the previous attempt by implementing a character map to correctly display the 'Hello World!' message. The video discusses the various formats of character mapping and focuses on the most common ones. The host writes code to handle these mappings and successfully displays the message. However, the host notices that certain letters are missing and deduces that they are compound glyphs. The video explains the concept of compound glyphs and how they are stored in the font file. The host implements a solution to render compound glyphs and tests it with lower-case letters and diacritics. The video concludes with the host planning to test different fonts and explore more rendering techniques.
🖋️ Advanced Rendering Techniques
The host experiments with different rendering techniques for text, starting with a discussion on increasing the line thickness to create solid shapes. The video explores polygon triangulation and the use of a shader to fill in the curves. The host also considers the idea of pre-rendering glyph meshes to a texture file for efficient rendering. The video then discusses the concept of signed distance fields for text rendering and their benefits in scaling. The host concludes that direct rendering from Bezier data is preferred for quality and begins to discuss the challenges of text rendering, including the need for anti-aliasing and handling overlapping shapes.
🐞 Artifact Detection and Algorithm Refinement
The host creates an 'artifact detector' to identify and quantify rendering issues. The video explains the process of testing each pixel within a glyph and how the results are used to identify artifacts. The host identifies a flickering issue when zooming and panning the text and attempts to understand its cause. The video delves into the problem of double-counting intersections and presents a solution to record and ignore close intersection points. The host also discusses the limitations of the 'closest curve' approach and considers returning to the intersection counting method. The video concludes with the host expressing a desire to improve the rendering algorithm further.
🔄 Revising the Rendering Approach
The host revisits the intersection counting method, using insights gained from debugging to refine the approach. The video explains the new method for skipping curves and handling curves going in opposite directions. The host addresses issues with overlapping shapes and introduces a method to track the closest distance to the exit of shapes. The video demonstrates the improved rendering with different fonts and discusses the challenges of legibility at small sizes. The host acknowledges the limitations of the current approach and suggests potential optimizations. The video concludes with a test of the latest rendering method on a large text wall and a reflection on the journey of text rendering exploration.
Mindmap
Keywords
💡Font Rendering
💡TrueType Font (TTF)
💡Glyph
💡Endianness
💡Bytecode Instructions
💡Font Directory
💡Contours
💡Bytecode
💡Font Reader Class
💡Glyph Table
💡Bezier Curves
Highlights
Exploring the complexities of rendering text from font files, specifically focusing on the challenges of interpreting TrueType Font (TTF) files.
Discussing the historical development of TTF files, originally created by Apple in the late 1980s.
The importance of understanding the structure of font files, including mandatory tables like 'glyf' that store glyph shapes.
Addressing endianness issues when reading font files, with a focus on big endian and little endian systems.
Decoding the font directory and table entries, including handling 4-letter tags and byte offsets.
Understanding the concept of 'contours' that make up glyphs and the potential complexity of compound glyphs.
The role of 'FWords' in defining the bounding box of a glyph and the intricacies of handling x and y coordinates.
The challenges of interpreting bytecode instructions and flags within font files for rendering purposes.
The significance of the 'maxp' and 'head' tables in determining the total number of glyphs and glyph locations.
The process of extracting glyph locations from the 'loca' table and the importance of mapping glyphs to Unicode values.
The introduction of Bezier curves for smoothing out the edges of rendered glyphs, enhancing visual appeal.
The exploration of different rendering techniques, such as increasing line thickness versus using polygon triangulation.
The challenges of rendering text at different sizes and the need for anti-aliasing to improve legibility.
The development of a method for detecting and correcting faulty contours in fonts to improve rendering accuracy.
The creation of an 'artifact detector' to systematically identify and address rendering errors.
The implementation of a shader-based approach for efficient, parallel computation of pixel coverage.
The exploration of sub-pixel anti-aliasing as a technique to enhance the smoothness of rendered text.
The final demonstration of rendering a variety of fonts with the developed techniques, showcasing the potential and limitations of the approach.
Transcripts
Hello everyone, and welcome to another episode of Coding Adventures. Today, I’d like to
try and render some text, since it’s one of those things that I always just take for
granted. To begin with, we’re going to need a font.
And I’ve just grabbed the first one I found on my hard-drive, which happens to be JetBrainsMono-Bold.
So let me open that up in a text editor just to see what’s inside, and not unexpectedly,
the raw contents are highly unilluminating to say the least, so we’re going to need
to find some kind of guide for decoding this. This is a ttf, or TrueType font file, which
is a format, I’m told, developed by Apple in the late 1980s. And so, I had a look on
their developer website, and located this handy reference manual. Hopefully it’s nice
and simple to understand. Okay, I’ve just been skimming through it
a little, and I’m becoming increasingly baffled and bewildered by all the mysterious
diagrams, the long lists of arcane bytecode instructions, not to mention the strange terminology
such as freedom vectors… ‘ploopvalues’, and twilight zones… What is the twilight
zone? *“It is the middle ground between light and shadow”* — You’re not helping!
Anyway, after slowing down a bit and reading more carefully, I realized that most of this
complexity is focused on a single problem, which is making sure that text is displayed
or printed clearly at low resolutions, where unlucky scaling could cause one part of a
letter to appear twice as thick as another, for example, making the letters difficult
to read. So I don’t think it’s actually something
we need to worry about for our little experiment today at least, which is definitely a relief!
Let’s move on then to the next section of the manual, which gives us this list of mandatory
tables that all fonts need to include, and I’m most excited about this ‘glyf’ table,
since that sounds like where the shapes are stored. So our first goal is to locate that
table. Aand — this looks promising! “The Font
Directory is a guide to the contents of the font file”. Music to my ears.
So this is the very first block of data we’ll encounter in the file, and I don’t think
we really care about any of it, except for the number of tables. So we’re going to
want to skip over one 32bit integer, and then read this 16bit integer.
And here’s some quick C# code for doing exactly that. I’m printing out the number
of tables here by the way, just to make sure it’s a plausible value, which from the tables
listed in the manual, I’m thinking would be somewhere between the mandatory 9 and around
50 at a maximum maybe. So let’s run this quickly — and we can
see the number of tables is…. 4352. How am I doing something wrong already?
Ookay, I finally found the answer on this osdev wiki entry — which is that the file
format is big endian — which just means that the individual bytes of each value are
stored back-to-front from what my little-endian computer is expecting.
So I’ve quickly made this simple FontReader class, which lets us read in a 16bit value,
and have this conversion to little-endian happen behind the scenes, so we don’t need
to worry about it every time. Alright, so using that, let’s try running the program
again — and now the number of tables comes out as… 17. Which is much more believable.
That means we can proceed to the table directory. And what this gives us is a 4-letter identifying
tag for each table — and we’ve briefly seen what those look like already — and
then also the table’s location in the file, in the form of a byte offset.
So in the code I’ve just added this little loop over the total number of tables, and
in there it’s simply reading in this metadata for each of them. Of course I’ve had to
expand our little reader class in the process, with these two new functions for actually
reading a tag, and a 32bit integer. And you can see here my, possibly somewhat clunky
approach to reversing the order of the bytes in there.
So, hopefully this is going to work… Let’s try running it, aand these tags look a little
strange. For example we seem to have a tiny head over here with a big question mark next to it. Which does incidentally sum up how I’m feeling right now. Oh wait! I just
realized that I forgot to skip over all that other stuff in that first table that we didn’t
really care about. Let me just do that quickly — I think it was 3 16bit values, so that’s
6 bytes we want to hop over here. Alright… And now… We’ve got the tables!
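The directory-reading loop just described might look like this in Python (the video's code is C#; the field sizes come from the reference manual, and the checksum and length fields are skipped here since we only need each table's offset):

```python
import struct

def read_table_directory(data: bytes) -> dict:
    # Offset subtable: a 32-bit scaler type, a uint16 table count,
    # then three uint16 values (searchRange etc.) we skip over.
    num_tables = struct.unpack_from(">H", data, 4)[0]
    tables = {}
    pos = 12  # 4 + 2 + 6 bytes consumed so far
    for _ in range(num_tables):
        tag = data[pos:pos + 4].decode("ascii")  # 4-letter tag
        # Skip the 4-byte checksum, then read the byte offset.
        offset = struct.unpack_from(">I", data, pos + 8)[0]
        tables[tag] = offset
        pos += 16  # each directory entry is 16 bytes long
    return tables
```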
I’ve spotted the glyph table entry over here, which is pointing us to byte location
35436, so that’s where we’re headed next! Let’s take a quick look at the manual first
though — and it looks like the first bit of information we’ll be given is the number
of contours that make up the first glyph in the table. Interestingly, that value can be
negative, which would indicate that this is a ‘compound’ glyph made up of other glyphs,
but let’s worry about that later! We then have a bunch of 16bit ‘FWords’,
which is an interesting name choice, and these tell us the bounding box of the glyph.
Following that comes the actual data such as these x and y coordinates in here, which
I’m eager to get my hands on. Although we should take note that these coordinates
are relative to the previous coordinate, so they’re more like offsets I guess we could
say. They can also be in either an 8bit or 16bit format, which I assume these flags here
are hopefully going to tell us. Then there’s also an array of instructions,
which I believe is the scary bytecode stuff we’re skipping over, and finally, or I guess
firstly — I’m not sure why I read this table backwards — we have indices for the
end points of each contour. So if I’m understanding this correctly,
we’re going to read in a bunch of offsets, from which we can construct the actual points,
and then say we get two contour end indices — like 3 and 6 for example — that just
means that the first contour connects points 0, 1, 2, 3 and back to 0. While the second
contour connects points 4, 5, 6, and back to 4.
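That splitting rule can be written down directly. A small Python sketch with hypothetical names:

```python
def split_contours(points, contour_end_indices):
    # Each entry in contour_end_indices is the index of the LAST
    # point of a contour: [3, 6] means points 0..3 form the first
    # contour (closing back to 0) and points 4..6 form the second.
    contours = []
    start = 0
    for end in contour_end_indices:
        contours.append(points[start:end + 1])
        start = end + 1
    return contours
```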
That seems easy enough, so the other thing to wrap our heads around is that 8bit flag
value we saw. Although only 6 of the bits are actually used. So we’re going to have
one of these for each point, and according to the guide, bits 1 and 2 tell us whether
the offsets for that point are stored in an unsigned 1 byte format, or a signed 2 byte
format. Then, getting just slightly convoluted here,
bits 4 and 5 have two different meanings, depending on whether the 1 byte, or 2 byte
representation is being used. In the first case, it tells us whether that byte should
be considered positive or negative. And in the second case, that’s not necessary since
the sign is included in the offset itself, and so it instead tells us whether or not
we should skip the offset. That way, if the offset is zero, it doesn’t have to actually
be stored in the file. So we can see there’s some basic compression happening here.
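For the x coordinates, the bit rules above can be sketched like this (bits 2 and 5 play the same roles for y; `byte_stream` is a hypothetical iterator over the raw bytes, and the video's actual C# differs in its details):

```python
def read_x_coords(flags, byte_stream):
    coords = []
    x = 0  # each offset is relative to the previous coordinate
    for flag in flags:
        if (flag >> 1) & 1:
            # Bit 1 set: unsigned 1-byte offset; bit 4 gives the sign.
            offset = next(byte_stream)
            x += offset if (flag >> 4) & 1 else -offset
        elif not (flag >> 4) & 1:
            # Bit 1 clear, bit 4 clear: signed 16-bit offset follows.
            hi, lo = next(byte_stream), next(byte_stream)
            value = (hi << 8) | lo
            if value >= 0x8000:
                value -= 0x10000  # reinterpret as signed 16-bit
            x += value
        # Bit 1 clear, bit 4 set: offset is zero and isn't stored.
        coords.append(x)
    return coords
```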
On that note, let’s take a look at bit 3. If this is on, it tells us to read the next
byte in the file, to find out how many times this flag should be repeated, so that it doesn’t
have to waste space actually storing repeated instances of the same flag.
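That repeat scheme amounts to a tiny run-length decoder — a sketch under the same assumptions as before:

```python
def read_flags(num_points, byte_stream):
    # Bit 3 set means the NEXT byte says how many extra times this
    # same flag value repeats, so runs aren't stored explicitly.
    flags = []
    while len(flags) < num_points:
        flag = next(byte_stream)
        flags.append(flag)
        if (flag >> 3) & 1:
            repeat_count = next(byte_stream)
            flags.extend([flag] * repeat_count)
    return flags
```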
And finally, bit 0 tells us whether a point is on the curve or off the curve — and I’m
not entirely certain what that means right now, but we can worry about it later — let’s
first just get these points loaded in. By the way, to actually test if a particular
bit in the flag is on or off — I’m using this tiny function here which simply shifts
all the bits over so that the bit we’re interested in is in the first spot, then masks
it out, meaning all the other bits get turned off, and finally checks if the resulting value
is equal to 1. Alright, and then here’s a function I’ve
been working on for actually loading one of these simple glyphs. And this does exactly
what we just talked about: it reads in the contourEndIndices, then reads in all of the
flag values — of course checking each of them to see if they should be repeated some
number of times — and finally reading in the x and y coordinates, before simply returning
all of that data. And for reading in the coordinates, I have
another little function here — where each coordinate starts out with the value of the
previous coordinate, and then we grab the flag for this point, and depending on that,
we either read in a single byte for the offset, and add or subtract it based again on what
the flag tells us to do; or we read in a 16bit offset, but only if the flag isn’t telling
us to skip this one. Alright, I’m excited to see what comes out
of this — so back over here I’ve just made a dictionary for mapping table tags to
locations, which we can use to set the reader’s position to the start of the glyph table,
and then simply read in the first glyph, and print out its data.
So let’s see what that looks like quickly. Okay, that seems promising… but we won’t really
know if it’s nonsense or not until we draw it. So in the Unity engine I’ve just quickly
plotted these points — and I don’t know what this is supposed to be, but I’m sure
it will become clear once we draw in the contours. Okay, it’s looking a little dishevelled,
but I do believe it’s the missing character glyph — and I actually remember now reading
that it’s required to be at the very start of the glyph table, so that makes sense. I
just need to iron out some bugs clearly, so I’ve been fiddling with the code a bit,
and here’s how it’s working now. We simply loop over all the end indices, and
then create this little window into the array, which allows us to access only the points
in the current contour — since evidently I can’t be trusted — and then we simply
draw lines between those, looping back to the first point in the contour to close the
contour at the end. Alright, let’s try it out — and that’s
looking much better! One glyph is not enough though, I want all
the glyphs! And to get them, we just hop over to the ‘maxp’ table, where we can look
up the total number of glyphs in the font. Then we head over to the ‘head’ table
and leap over several entries we don’t care about right now, to find out whether the locations
of the glyphs are stored in a 2 byte or 4 byte format — and finally we can dive into
the ‘loca’ table, from which we can extract the locations of all of the glyphs in the
glyf table. And then we can just read them in.
I must admit, even though parsing a font file must be pretty low on the list of thrilling
things one could do with one’s day, I’m quite excited to see all these letters appearing
here. Anyway, let’s zoom in on one of these glyphs,
such as the big B for instance, and we can see that while the glyphs are obviously very
beautiful, they are perhaps, a bit blocky. So I believe our next step should be to bezier
these bad boys. I’ve babbled about beziers a bunch before
on this channel, but briefly — if we have two points, and want to draw a curve between
them — we need at least one extra control point to control how the curve should curve.
Now to visualize this curve, let’s imagine a point starting out at the start point, and
moving at a constant speed towards this intermediate control point. From which, a second point
moves simultaneously towards the end point, in the same amount of time that it takes the
first point to complete its journey. Like this.
That’s not terribly interesting, but like many things in life, it can be made more exciting
with the addition of lasers. So let’s draw a laser-beam between our two moving points.
And that’s a bit better already, but for some added flair, let’s then also leave
a ghost trail of the old beams behind… and now we’re really getting somewhere.
So the curve that these beams trace out is our bezier curve. And all we need now is a
way to describe this curve mathematically, which is surprisingly easy to do. We just
need to imagine one last travelling point — this one moving between the first two
travellers, and also completing its journey in the same amount of time. Observing the
path of this third point, we can see how it perfectly traces out our curve.
So here’s a little function to help calculate the positions of these points. It takes in
both the start position, and the destination, as well as a sort of ‘time’ value, which
can be anywhere between 0 and 1, as a measure of the progress of the journey.
The point’s position at the given time is then calculated as the starting point, plus
the offset that takes us from the start to the ending point, multiplied by the time value.
From this linear interpolation, we can build our Bezier interpolation, which just calculates
these intermediate A and B points like we saw, and then interpolates between those to
get the actual point along the curve at the current time.
So, if we want to draw one of these curves, one way to do it would be like this — simply
chopping the curve up into a bunch of discrete points based on this resolution value, and
then drawing straight lines between them. So just testing this out quickly — we can
see that with a resolution of 1 we of course just have a straight line; 2 gives us just
the faintest idea of a curve; and from there it quickly starts to look pretty smooth.
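The lerp, the nested bezier interpolation, and the chopping into discrete points can all be sketched in a few lines (Python here; the video works in C#, and points are plain `(x, y)` tuples):

```python
def lerp(a, b, t):
    # Start point plus t times the offset from start to end.
    return (a[0] + (b[0] - a[0]) * t, a[1] + (b[1] - a[1]) * t)

def bezier_point(p0, p1, p2, t):
    # Quadratic bezier: move two points along the control legs,
    # then interpolate between those two travellers.
    return lerp(lerp(p0, p1, t), lerp(p1, p2, t), t)

def bezier_polyline(p0, p1, p2, resolution):
    # Chop the curve into straight segments for drawing.
    return [bezier_point(p0, p1, p2, i / resolution)
            for i in range(resolution + 1)]
```

With `resolution=1` this degenerates to a straight line between the endpoints, exactly as seen in the video.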
Anyway, let’s get back to our blocky B — now that we have beziers working — and I’ll
just quickly jump into the code here, and modify it so that every set of 3 points in
the contour now gets drawn as a curve. And let’s take a moment to admire how that
looks. Wait what… This is not quite what I had in mind, but
it looks kind of interesting, so let’s try writing some text with it, and we can come
back and fix it in a minute. So I’ve created a string here that says
“Hello World!”, and we simply go through each character in that string, and of course
to the computer each of these characters is just a number defined by the unicode standard
— and I’m using that number as an index into the array of glyphs that we’ve loaded
in. We then draw each of these, and add some hardcoded
spacing between them, because I still need to figure out where they’ve hidden the actual
spacing information. Alright, let’s take a look at the result.
Okay the letter spacing is too small, so I’ll just increase that quickly, but unfortunately
the text is still not particularly comprehensible. Clearly the glyphs in the font are not in
the same order as unicode — which makes total sense actually, since different fonts
support different subsets of characters, so there’s never going to be a 1 to 1 mapping.
Instead, the font file needs to include a character map, to tell us which glyph indices
map to which unicode values. This part seems like a bit of a pain because
there are 9 different formats in which this mapping information can be stored. Although,
I then read to my great relief that many of these are either obsolete, or were designed
to meet anticipated needs which never materialized. And reading a bit further, it seems that 4
and 12 are the most important ones to cover, so I’m going to get cracking on those.
Okay — here’s the code I’ve written to handle them. And I’m not going to bore
you with the details, all it’s doing is looking up, in a slightly convoluted way,
the unicode value that corresponds to each glyph index.
With that mapping information, we can now retrieve the correct glyphs to display our
little Hello World message. I really like how stylish some of these buggy
beziers look. Perhaps this is the font of the future… Oh and by the way, if you’d
like to learn more about the beauty of bezier curves, I highly recommend a video called
the “Beauty of Bezier Curves”, and its sequel too — “The Continuity of Splines”.
Right now though, let’s see if we can fix this mistake that I’ve made.
So first of all, we can see that here, where there’s supposed to be a straight line, this point
is actually just controlling how the curve bends as it moves towards this next point
here. So it looks like the glyphs aren’t made entirely out of beziers as I was kind
of assuming, but rather have some ordinary line segments mixed in.
Also, taking another look at the manual, it shows this example where we have a spline
made up of two consecutive bezier curves. And what it’s pointing out, is that this
shared point between the two curves is exactly at the midpoint between these two control
points. This is usually where you’d want it to be,
simply because if it’s not at the midpoint, we can see that we no longer get a nice smooth
join between the two curves. They become discontinuous. And so what the manual recommends doing for
these points that lie right in the middle, is to save space by simply deleting them.
As it puts it, the existence of that middle point is implied.
So, we’re going to need to infer these implied points when we load the font. And I believe
that this is where that ‘on curve’ flag that we saw earlier is going to come in handy.
So in this example, points 1 and 2 would be flagged as ‘off the curve’, whereas
points 0 and 3 would be on the curve, since the curve actually touches them.
What we can do then is say that if we find two consecutive off-curve points, we’ll
insert a new on-curve point between them. Also, just to turn everything into beziers
for consistency’s sake, let’s also say that if we find a straight line segment — which
would be two consecutive on-curve points, then we’ll insert an extra control point
in between them. Alright, so I’ve been writing some code
to handle that quickly, which as we just talked about simply checks if we have two consecutive
on or off-curve points, and if so, it inserts a new point in between them. And here’s
a quick test to see if that’s working as expected. So these are the on and off curve
points that are actually stored within the font file for our letter B. And then here
are the implied on curve points that we’ve now inserted into the contours, as well as
these extra points between the straight line segments, to turn them into valid bezier curves.
With that, our code for drawing these curves finally works as expected. And just for fun,
I made it so that we can grab these points and move them around to edit the glyphs if
we want. Anyway, let’s put this to the test quickly
by drawing all the letters of the English alphabet here, and these all seem to have
come out correct. So let’s then also try the lowercase letters,
and these are looking good too… except I notice that we are missing the i and the j.
I guess they must be compound glyphs since we haven’t handled those yet, and I think
the idea is just that the ‘dot’ would be stored as a separate glyph, and then the
i and the j would both reference that, instead of each having to store their own copy of
it. So I’ve just spent some time implementing
compound glyphs over here, and since these are made up of multiple component glyphs,
essentially all we need to do is keep reading in each component glyph, until we receive
word that we’ve reached the last one. And all the points and contour end indices from
those, are simply squished together into a single glyph, which is returned at the end
here. Then here is the function for actually reading
in a component glyph, and this essentially just gives us the index to go and look up
the data of a simple glyph. Actually, I’m wondering now if a compound glyph can contain
another compound glyph. I’ll need to tweak this later if that’s the case…
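The squishing-together step described above can be sketched as follows (translation only, matching what's implemented at this point; the tuple layout for a component is hypothetical):

```python
def combine_components(components):
    # components: list of (points, contour_end_indices, (dx, dy)),
    # i.e. a simple glyph plus the offset that positions it.
    all_points, all_ends = [], []
    for points, ends, (dx, dy) in components:
        base = len(all_points)  # shift indices past merged points
        all_points.extend((x + dx, y + dy) for x, y in points)
        all_ends.extend(base + e for e in ends)
    return all_points, all_ends
```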
But anyway the simple glyph we’ve just read in can then be moved around, scaled, or even
rotated — if I had implemented that yet that is — to orient it correctly relative
to the other components. And we can see that transformation being applied to the points
over here. So with that bit of code, we can at last read
in our missing i and j. And this has also unlocked a whole world of
diacritics, since of course these all make use of compound glyphs as well, to save having
loads of duplicate data. Okay, we’re making good progress I think,
so I’d like to start trying out some different fonts, just to see if they’re even working
properly. Let’s write a little test sentence here,
such as the classic one concerning the fox and the lazy dog, or this one I came across
concerning discotheques and jukeboxes. These are good test sentences because of course
they contain all the letters of the alphabet. A sentence like that, I learned today, is called a pangram.
Anyway, this is JetBrainsMono that we’ve been using, and I’m going to try switching
now to maybe… Open Sans. Okay, there’s an issue with the size of
the glyphs clearly — but I found this helpful scaling factor in the head table, which should
take care of that… And that’s looking much better — but obviously the spacing
is a little off. The problem is that this isn’t a monospaced
font like we had before, so our hard-coded spacing value is no longer going to cut it.
After some rooting around though, I unearthed this horizontal metrics table, which tells
us how much space we need to leave after each glyph, which is known as the ‘advance width’.
So here’s some code for just reading in that information. And if we use those values,
our text now obviously looks a lot more sensible. Okay I’ve just spent a little while refactoring
and cleaning up the project, because things were getting a little out of hand. Let’s
quickly make sure I haven’t messed anything up in the process… and of course I have.
Alright — I tracked down the issue, so now we’re back on track.
And now that we’re able to write some words and display their outlines, let’s finally
figure out how to properly render the text. The solution is actually surprisingly simple
— we just need to increase the thickness of the lines until they turn into a solid
shape. Alright, thanks for watching! Well, wait a
minute, this is perhaps not the most legible approach, so we should maybe take some time
to explore our options. For example, some years ago I wrote this little
polygon triangulation script, which uses an algorithm called ear-clipping. And one idea
might be to simply feed the points that we’re currently using to draw the outlines into
that, and get out a mesh for the computer to draw. This doesn’t seem ideal though
because if we want the letters to be nice and smooth, we’ll have to have lots and
lots of little triangles on the screen which the computer might not particularly appreciate.
So a much smarter mesh-based approach is presented in this paper on “Resolution Independent
Curve Rendering”. Basically, we triangulate a mesh of just the sort of straight-line inner
part of the glyph, and then render a single extra triangle for each of the curvy bits.
So to be clear, for each bezier curve on the contour, a triangle is simply being drawn
like this. The trick now is to find some way to display only the section of the triangle
that is inside of the curve, so that we can get a smooth result without having to add
even more triangles. For this, we’re going to need to think about
what the shape of our bezier curve actually is —and just eye-balling it, it looks kind
of parabolic, but let’s have a look at our equation to be sure…
Currently we have this defined as 3 separate linear interpolations, which paints a nice
geometric picture, but it’s a little clunky mathematically. So I’ve quickly just written
out those interpolations as one giant equation, and then simplified it a bit to arrive at
this, which we can of course recognize as a quadratic equation. It’s just this coefficient
‘a’ times t^2, plus the coefficient ‘b’ times t, plus a constant ‘c’. For that
reason, this particular type of Bezier curve we’re working with is called a quadratic
bezier curve, which confirms our parabolic suspicions.
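To make that expansion concrete, here's a small Python sketch (my own illustration) showing that the three-lerp form and the polynomial form a·t² + b·t + c, with a = p0 − 2·p1 + p2, b = 2·(p1 − p0), and c = p0, trace out exactly the same curve:

```python
def lerp(p, q, t):
    """Linear interpolation between 2D points p and q."""
    return (p[0] + (q[0] - p[0]) * t, p[1] + (q[1] - p[1]) * t)

def bezier_lerp(p0, p1, p2, t):
    """Three nested linear interpolations — the geometric picture."""
    return lerp(lerp(p0, p1, t), lerp(p1, p2, t), t)

def bezier_poly(p0, p1, p2, t):
    """The same curve written as a*t^2 + b*t + c."""
    a = (p0[0] - 2 * p1[0] + p2[0], p0[1] - 2 * p1[1] + p2[1])
    b = (2 * (p1[0] - p0[0]), 2 * (p1[1] - p0[1]))
    c = p0
    return (a[0] * t * t + b[0] * t + c[0],
            a[1] * t * t + b[1] * t + c[1])
```

Evaluating both at the same t for the same control points gives identical results, which is just the algebra of multiplying out the interpolations.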
We can appreciate that fact even better if we allow the lines to extend out past this 0
to 1 time interval of our bezier segment. Now, it is a little bit fancier than your
standard parabola though in that we can actually rotate it around. And that’s because our
equation of course is outputting 2 dimensional points, rather than a single value. We could
break it down if we wanted into separate equations for the x and y values of the curve respectively.
And if we were to graph those, they would just be regular old, non-rotated parabolas
like this. So nothing mysterious here — if we move
a point on the x axis, we can see how that’s reflected in the x graph, and if we move a
point on the y axis, we can see how that’s reflected in the y graph.
Alright, so I reckon we’re ready now to think about this problem of how to fill in
the region inside of the curve. That’s kind of confusing when the curve is all rotated
like this though, so let’s just position the points so that it looks like a perfectly
normal y = x^2 parabola. And one way to do this is to put p1 at coordinates (1/2, 0),
and p0 at coordinates (0, 0), and p2 at coordinates (1,1).
And that indeed has given us this nice simplified case where the y values of the curve are equal
to the square of the x values. So if we want to fill in this region inside
the curve — that’s very easy in this case because if the points on the curve are where
y is equal to x^2, then obviously inside is just where y is greater than x^2.
To actually draw this region though, we’ll need a mesh and a shader.
For the mesh, we can simply construct a triangle like we saw earlier out of the 3 bezier points,
and I’ll give those points texture coordinates corresponding to the positions we put the
points in to get that simple y=x^2 parabola. Then in a shader, we can get the interpolated
x and y texture coordinates for the current pixel, and let’s visualize the x value for
now as red, which looks like this. And here’s the y value as green. And here’s them both
together. I actually find it a bit easier to see what’s going on if we chop the values
up into discrete bands, by multiplying by 10 for example, then taking the integer part
of that, and then dividing by 10. And now we can see our x and y values as a
little grid here, instead of a continuous gradient.
Of course we can move our points around, and we can see how this grid gets transformed
along with the points, since again the shader automatically interpolates the texture coordinates to give
us the value for the current pixel. In texture space though, things always just look like
this, since these are the texture coordinates we defined.
So back in the shader, let’s test for y being greater than x^2, and draw the result
of that into the blue channel here. As hoped, that now fills in the curve — and
what’s more of course, it will also be transformed along as we move our points around. So we
only really had to think about the simplest possible case, and by taking advantage of
the way that shaders work, we get the rest for free.
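To see the trick outside of a shader, here's a tiny Python sketch of that fragment test — my own illustration, standing in for the shader. The triangle's three vertices carry the canonical texture coordinates (0,0), (1/2,0), (1,1), the "interpolation" is done by hand with barycentric weights, and the fill test is simply y > x², flipped for counterclockwise contours:

```python
def inside_curve(bary, flip=False):
    """Given barycentric weights for the triangle (p0, p1, p2), interpolate
    the canonical texture coords (0,0), (1/2,0), (1,1) — which is what the
    GPU does for us automatically — and test y > x^2. `flip` handles the
    counterclockwise (hole) case, where the opposite region is filled."""
    w0, w1, w2 = bary
    u = 0.0 * w0 + 0.5 * w1 + 1.0 * w2  # interpolated x texture coord
    v = 0.0 * w0 + 0.0 * w1 + 1.0 * w2  # interpolated y texture coord
    filled = v > u * u
    return not filled if flip else filled
```

The centre of the triangle interpolates to (0.5, 1/3), which lies above the parabola, so it's filled; the p1 corner interpolates to (0.5, 0), below it, so it isn't.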
I thought that was a cool idea, so with that, we can now go back to our mesh of just
the straight lines of the glyph, then add in those triangles for all the curves, and
then use this clever little idea to shade in just the parts inside the curve.
Hang on, that’s not quite right. It looks like along the outside edge of the glyph,
where the points go in a clockwise order, everything is correct — but along the inside
here where the points are now going counterclockwise we actually want to fill in the opposite region
of the curve. So in the shader I’ve just quickly added
this little parameter which’ll tell us the direction of the triangle, and we can flip
the fill flag based on that. Then just testing that here — we can see that when the points
are clockwise, we have the inside of the curve being filled, and when counterclockwise, the
outside is now filled. Aaand that seems to have fixed our problem
here. So this is a pretty interesting way to render text I think, but one potential
problem with the approach is that Microsoft has a patent on this research. Skimming through
this, it does seem though that it only covers the more complicated case of cubic bezier
curves — which is when the curves are defined by four points instead of the three that we’re
working with. So quite possibly it is actually okay to use.
But, I’m anyway curious to experiment with different ideas today, so let’s put this
one to the side for now. What if instead we took our old glyph meshes
made out of ridiculous numbers of triangles, but claimed that we don’t actually care about
the triangle count because instead of rendering them to the screen, we pre-render all of them
to a texture file. That way we ultimately only need to render a single quad for each
letter, and use a shader that automatically crops out the appropriate part of that texture
atlas to display the letter we want. This’d be nice and fast, but it does of course have
some issues with scaling. We’d need to make our atlas truly gigantic if we want to be
able to display large text for example. Or alternatively, we could try using signed
distance fields. In one of my old videos we took this map of the earth, and wrote a little
function to calculate the distance of every pixel to the shoreline. Which could then be
used for making a simple wave effect. So we could use that same code here to turn
our image of glyphs, into an image of glyph distances.
A simple shader could then sample the distance, and output a colour if that distance is within
some threshold. Which looks like this. This scales better than before, because now
the shader is blending between distances rather than hard on-off values. The outlines are
a little lumpy, but I’m sure we could improve that with some refinements to the technique.
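The core of the distance-field idea fits in a few lines. Here's a deliberately brute-force Python sketch — my own illustration, not the video's implementation, which would use something far faster than this O(n²)-per-pixel search: every cell stores the distance to the nearest cell of the opposite state, negated inside the shape, and the "shader" just thresholds that value.

```python
import math

def signed_distance_field(bitmap):
    """Brute-force SDF from a binary glyph bitmap: for each cell, distance
    to the nearest cell of the opposite state, negative inside the shape.
    Quadratic per pixel — fine for a sketch, not for production."""
    h, w = len(bitmap), len(bitmap[0])
    field = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            best = math.inf
            for yy in range(h):
                for xx in range(w):
                    if bitmap[yy][xx] != bitmap[y][x]:
                        best = min(best, math.hypot(xx - x, yy - y))
            field[y][x] = -best if bitmap[y][x] else best
    return field

def shade(field, threshold=0.0):
    """Minimal 'shader': a cell is lit where its distance is within the
    threshold. Blending near zero is what gives the smooth scaled edges."""
    return [[d <= threshold for d in row] for row in field]
```

In a real renderer the field is baked into a texture once, and the shader samples it with bilinear filtering — the blending between neighbouring distance values is what makes this scale better than a plain bitmap.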
These texture-based approaches are slightly annoying though, in the sense that it’s
not really practical to support all the characters in a font because then the texture would have
to be too large. So we always have to compromise on some subset, in order to get good quality.
And even still, the edges tend to have some obvious imperfections, at least if you’re
zooming unreasonably close to the text — and we tend to lose our nice crisp corners.
With that said, I have seen some interesting-looking research about using multiple colour channels
to store additional information that can help to increase the quality.
So I would like to experiment with this at some point because it’s definitely an interesting
technique, but for today I’d prefer to render the text directly from the bezier data, without
any textures compromising the quality in between. So focusing on a single letter for a moment,
let’s imagine a grid of pixels behind it, and our goal is obviously to light up the
pixels that are inside of the letter. Therefore, we need a way to detect whether a point is
inside or not, and happily there’s a very intuitive way to do this.
Say we’re interested in this point over here. All we need to do is pick any
direction, and imagine a ray going out until it hits one of the contours of the shape.
When it hits that contour, it must either be entering or exiting the shape. And if it’s
entering it, then of course it’s bound to exit it at some point. But in this case the
ray doesn’t hit any other contours along its journey, which means it must have been
exiting the shape, and so obviously it must have started out inside of it.
Now moving our point of interest back a bit into the ‘B’ hole here, for want of a
better term, the ray now intersects with two contours along its path. From that we can
deduce it must have first entered the shape, and then exited it, and so clearly the starting
point this time was on the outside. The simple rule we can gather from this is
that if the number of intersections is even, we’re outside the glyph, while if it’s odd,
we’re inside of it. So all we really need to do is figure out
the maths for calculating the intersection of a horizontal ray with a bezier curve.
For example, say we have this curve over here, and we want to figure out where this ray,
at a height of 0.5, intersects with it. Since the ray is horizontal, we only need
to care about the y values of the curve, so we’re basically just asking where this function
is equal to 0.5. Equivalently, we could shift the whole curve down by 0.5, and then ask
where the function is equal to zero. Conveniently, that’s exactly the question
that the quadratic formula is there to answer.
So, if we just take our equation for the y component of the bezier curve, and plug the
values in for a, b, and c, we’re going to get back the values of t where the corresponding
y value is 0. In this case, the t values come out as 0.3,
and 1.2. And since values outside of the range 0 to 1 are not on our curve segment, that
means that we’d count 1 intersection in this case.
Alright, so here’s some code implementing the quadratic formula. And one small detail
here is that if a is zero, we get a division error — this is the case where the curve
is actually a straight line — and so then we can solve it like this instead.
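Putting those pieces together in Python — an illustrative sketch of the approach as described, not the project's actual C# code — we solve the curve's y-polynomial against the ray height (falling back to a linear solve when 'a' vanishes), keep roots that lie on the segment and to the right of the point, and take the parity of the count:

```python
import math

def ray_curve_roots(p0, p1, p2, ray_y, eps=1e-9):
    """t values where the curve's y component equals ray_y. Falls back to a
    linear solve when 'a' is (near) zero — the straight-line case, where
    the quadratic formula would divide by (almost) nothing."""
    a = p0[1] - 2 * p1[1] + p2[1]
    b = 2 * (p1[1] - p0[1])
    c = p0[1] - ray_y          # shift the curve down by the ray height
    if abs(a) < eps:
        return [-c / b] if abs(b) > eps else []
    disc = b * b - 4 * a * c
    if disc < 0:
        return []              # the ray misses this curve entirely
    s = math.sqrt(disc)
    return [(-b + s) / (2 * a), (-b - s) / (2 * a)]

def curve_x(p0, p1, p2, t):
    """x position on the curve at time t, via the same a,b,c coefficients."""
    ax = p0[0] - 2 * p1[0] + p2[0]
    bx = 2 * (p1[0] - p0[0])
    return ax * t * t + bx * t + p0[0]

def is_inside(curves, px, py):
    """Even-odd test: cast a ray to the right and count the crossings."""
    count = 0
    for p0, p1, p2 in curves:
        for t in ray_curve_roots(p0, p1, p2, py):
            # the root must lie on the segment, and the hit to our right
            if 0 <= t <= 1 and curve_x(p0, p1, p2, t) > px:
                count += 1
    return count % 2 == 1
```

Straight edges slot in naturally if we represent them as degenerate beziers with p1 at the midpoint, which is what the implied-point preprocessing gives us anyway.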
Okay, now I’ve also written this little test just to see that this is working properly
— and all it does is calculate the a b and c values from the positions of the bezier
points, and then it asks what the roots of that equation are — so where it’s equal
to zero, but first the height of our horizontal ray is of course subtracted over here like
we saw earlier, to take its position into account.
Then, assuming a solution exists, we just draw a point at that time value along the
curve to visualize it — using this little function over here. So, let’s see how it
goes! I’m just going to grab some points and move
them around a bit to see whether the intersection points look correct or not. It’s maybe not
the most rigorous testing methodology in the world, but we should catch anything obvious
at least. So far so good though. And let me also try changing the height of the ray, and
that seems to be working as well. So using this horizontal intersection test,
we can now implement that simple algorithm we were talking about earlier, where an even
number of intersections means that the point is on the outside of the glyph, while an odd
number means that it’s inside. So I’ve basically just copy pasted our intersection
test here — only after calculating the roots, instead of visualizing them, we now increment
a counter if the root is valid. And by valid I mean it needs to lie between 0 and 1 to
be on the bezier curve, and we’re also only interested in intersections to the right of
the ray’s starting point; since that’s the arbitrary direction I picked for the ray
to be traveling. Now we just need this tiny function here to
loop over all the curves in the glyph, count the total number of intersections, and return
whether that value is even or odd. Alright — let’s try it out. And, isn’t
that beautiful? From here, I guess we basically just need to increase the number of dots!
But hang on a second… what are these weird artefacts that keep appearing, and also from
the noises I’m hearing, I fear my computer’s about to burst into flames.
Okay, no need to panic, let’s just start with the most serious issue first — getting
rid of this ugly line. So let’s take another look at
our visualization. Alright… it just needed a good old-fashioned
restart it appears, and we’re back in business. So let me set the grid size to one hundred
here, which is where that issue was occurring — and I believe what’s happening is just
that there’s obviously a meeting point of two curves over here, and this row of pixels
just happens to line up precisely for both of them to be counted in this case. So in
the code, I’ll just change the condition for being a valid intersection to discount
the very end of the curve, meaning that where one curve ends and another begins, we’ll
only count one of them. Aaand running this again — we can see: problem solved.
With that fixed, let’s now tackle the performance problems, and I want to try simply moving
all of our calculations into a shader so that we can compute loads of pixels in parallel.
Firstly I’m just trying to get the bounding boxes of each glyph showing up — so basically
there’s this buffer of instance data, which just contains each glyph’s position and
size at the moment, and that gets sent to the shader. Then we’re requesting that a
quad mesh be drawn for however many glyphs there are in the input string.
So when the shader is processing the vertices of the mesh, it’ll be told which copy or
instance of the mesh it’s currently working with, meaning that we can get the data for
that instance, and use it to position each vertex.
Here’s how it’s looking at the moment. This says “Coding Adventures”, by the
way, in case you can’t tell. To improve the readability, we’ll need to
get the rest of the data over to the shader. So I’ve been working on this little function
here which creates a big list of all the bezier points that we need, along with a list of
integers for storing some metadata about each unique glyph in the text. In particular, at
what index it can find its bezier data, the number of contours it has, and also the number
of points in each of those contours. Obviously we also need to record the index at which
each glyph’s metadata begins, so that the glyphs can be told where to find it.
Not very exciting stuff, but it needs to be done. And all of this ultimately gets packed
up and sent to the shader, to live in these buffers over here.
Another slightly tedious thing I’ve been doing is translating all our C# code into
shader language for things like the quadratic formula, intersection counting, and the point
inside glyph test. Okay with that done, let’s try it out…
And it’s working! Well mostly anyway… I don’t know why the bitcoin logo has shown
up — that’s just supposed to be a space; and also the i has gone missing, but I’m
sure these’ll be easy enough to fix. I’m more concerned about whatever’s happening
over here! So I just want to step through the different
segments of the glyph to see what values we’re getting for our quadratic coefficients — aaand
I should have known. If it isn’t my old nemesis floating point imprecision….
“…as we can see, everything is jiggling like there’s no tomorrow”
So computers, tragically, represent numbers with a finite number of bits, meaning that
precision is limited. On this segment for instance, we calculate the coefficient ‘a’
by taking p0, adding p2, and subtracting twice p1. Which is precisely zero for the y values.
Except… the computer begs to differ. It says it’s extremely close, but not quite,
equal to zero. As a result, when we apply the quadratic formula, we’re dividing by
this tiny, incorrect value, and getting incorrect results. So, we need to get a little
dirty I guess, and just say that we’ll consider our curve to be a straight line if ‘a’
is very close to zero. And now those horrible little lines have disappeared.
By the way, we should check if all of this has actually improved the performance, and
indeed our efforts have taken us from barely 12 frames a second or whatever it was, to
being comfortably in the 800 to 900 frames per second range.
Okay, after a few misadventures… I’ve managed to fix that missing i and uninvited
bitcoin, and so at last, we’re able to properly render some text to the screen with our shader.
Since we’re doing this directly from the curve data, we can also zoom in pretty much
to our heart’s content, and it remains perfectly smooth. Well I say perfectly smooth, but of
course there’re still not enough pixels in modern displays to prevent edges from looking
unpleasantly jagged — so we’ll definitely need to implement some kind of anti-aliasing
to improve that. I also spotted a weird flicker when I was
zooming in. I’m struggling to see it again now, but here’s a freeze frame from the
recording. So I’ve been zooming in and out and panning around trying to see how common
these flickers are, and where they’re occurring. There’s another one! By the V… Did you
see that? This is a bit frustrating because it only
happens when the pixels perfectly line up in some way, so it’s hard to capture the
exact moment. It’s right in the sweet-spot I’d say of
being rare enough to be a nightmare to debug, but not quite rare enough to just pretend
it doesn’t exist. So I really want to have some way of quantifying how many of these
artefacts are occurring, so that I don’t have to spend multiple minutes after each
tweak to the code trying desperately to divine whether I’ve made things better or worse.
So, I’ve decided to build an evil artifact detector.
What I’ve done is simply taken a glyph, and written some code to randomly generate
these little test windows. The windows in black are entirely outside of the glyph, while
the windows in white are entirely inside of it.
And I’ve run this across a bunch of different glyphs from several different fonts. The idea
then is that each of these windows gets fed one by one to a compute shader, which looks
at all the texels — which is to say, all the pixels inside the window texture — and
runs our algorithm for each of them to figure out whether they’re inside the glyph or
not. It then draws the result to the window texture, and also flags the test as a failure
if it doesn’t match the expected value. By the way, a small detail here is that I’m
offsetting the y position by an increasing fraction of a texel as we go from the left
to right edge of the window — because that way we’re not just running the same test
over and over, but rather covering a whole extra range of values, which’ll hopefully
help us catch more artifacts! In total it’s running just over 74000 tests;
which takes about 10 seconds to complete, and our current algorithm is failing 7166
of them, so roughly 10 percent. To help hunt down the issues, I’ve also
set up this little debug view, where we can see a list of the failed cases down the side
here. And each case is defined by 3 numbers. There’s the glyph index, which we can set
over here. Then the resolution index, which determines how many pixels are tested inside
of the window. And finally the window index of course tells us which window we’re looking
at. So let’s go to one of the failed cases,
such as 0, 2, 19. And zooming in on the window, we can see this little pixel over here where
it mistakenly thought that it was inside of the glyph. I’m going to draw in a quick
debug line, so that we can then zoom out and see where exactly the issue is arising. And
it appears to be this troublesome meeting point of two curves again, where the intersection
is being double-counted. I thought I had fixed that earlier when we
told it to ignore time values exactly equal to one; but clearly that was overly optimistic.
It makes sense in theory, but as we saw recently — with every bit of maths we do, numerical
imprecision has an opportunity to creep in — and so we can’t exactly trust the values
that we’re getting. So I’ve been working on a dirty little fix
for the double counting, where we simply record the positions of the intersections that occur
on each segment, and then when we’re examining the next segment, we ignore any intersection
points that are extremely close to those previous ones. Alright, let’s try this out… And
we get a failure count of over 30000. Well that’s not great… Okay, I see what
went wrong though. We’re recording the positions over here regardless of whether they were
actually on the curve segment or not. So let me fix that quickly, and then redo the test
— and this time…. 4943. And with some trial and error just tweaking the value of
the distance threshold, I was able to get the failure rate down to 3647.
Alright, now I’ve been taking a look at some of these remaining failures — such
as this one over here — and in this case, we have a few spots that falsely believe they’re
outside of the glyph — meaning the intersection count is even when it should be odd. And looking
at this, we can see we have 1, 2, 3 simple intersections, and then we avoid the double
counting over here, giving us a total of 4. The count is supposed to be odd though, so
I guess in this particular case we actually do need to count both of them. I think because
they’re forming this kind of vertical wedge, so in a sense the ray is both exiting and
entering the glyph at that point, meaning they cancel each other out.
I’m not sure if I’m making any sense, but regardless, this whole double-counting
business is becoming annoying, so I’ve been brainstorming a bit how we could approach
the problem slightly differently to get around it.
The new plan is to forget about counting the number of intersections, but rather look for
the closest intersection. The idea is that regular contours are defined in a clockwise
direction, while the contours of holes go anti-clockwise. And this means that if the
closest curve crosses our ray in an upwards direction, we must be outside the glyph, or
inside a hole — but that’s the same thing really. Whereas if the closest curve crosses
our ray in a downwards direction, then we know we’re inside the glyph.
To determine which direction the curve crosses the ray, we can take our equation and just
apply the power rule from calculus to figure out the gradient, which comes out as 2at plus
b. Calculating that for any value t along our
curve gives us this vector here, which tells us how the x and y positions along the curve
change with respect to t. So over here, the y position of the curve is increasing as t
increases, so the gradient has a positive y value, whereas over here y is decreasing
as t increases, so the gradient has a negative y value. That means we can simply check the
sign of that value to find out whether the curve crosses our ray in an upwards or downwards
direction. So here’s the new implementation — as
you can see we’re simply searching for the closest point that our ray intersects, then
calculating the gradient of the curve at that point, and saying we’re inside the glyph
if that closest curve crosses the ray in a downwards direction.
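As a self-contained Python sketch of this new scheme — again my own illustration of the idea, not the project's code — we gather every valid crossing to the right, keep only the nearest one, and read off the sign of dy/dt = 2·a·t + b there:

```python
import math

def is_inside_winding(curves, px, py, eps=1e-9):
    """Closest-crossing test: find the nearest crossing to the right of the
    point, then check the curve's vertical direction there via
    dy/dt = 2*a*t + b. With outer contours wound clockwise and holes
    anti-clockwise, a downward crossing means we're inside the glyph."""
    best_x = math.inf
    inside = False                      # no crossing at all means outside
    for p0, p1, p2 in curves:
        a = p0[1] - 2 * p1[1] + p2[1]
        b = 2 * (p1[1] - p0[1])
        c = p0[1] - py
        if abs(a) < eps:                # straight-line-in-y fallback
            roots = [-c / b] if abs(b) > eps else []
        else:
            disc = b * b - 4 * a * c
            if disc < 0:
                continue
            s = math.sqrt(disc)
            roots = [(-b + s) / (2 * a), (-b - s) / (2 * a)]
        ax = p0[0] - 2 * p1[0] + p2[0]
        bx = 2 * (p1[0] - p0[0])
        for t in roots:
            if 0 <= t <= 1:
                x = ax * t * t + bx * t + p0[0]
                if px < x < best_x:     # nearest crossing so far wins
                    best_x = x
                    inside = 2 * a * t + b < 0   # curve heading downwards?
    return inside
```

Unlike the even-odd count, a double-counted crossing at a shared endpoint no longer flips the answer — only the single nearest crossing matters.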
Let’s run the test to see how this compares… And we’re getting a failure count now of
1126, so not a bad improvement! There are some downsides to this approach
though. For instance — here’s a font I was experimenting with, and we can see that
the ‘j’ is looking a little strange! That’s due to an error in the font where the contour
has been wound anti-clockwise like a hole. If we were just counting the intersections
it would give the correct result regardless, but this new approach fails catastrophically.
Also, over here the contour contains a slight self-intersection, which again causes an issue.
So that’s definitely a big weakness to keep in mind, but I’m going to run with it anyway
since it seems promising in terms of eliminating those pesky artifacts.
With that said, this same case we had before is actually still failing, but now it’s because
right at the end point these two curves have the same position, so it doesn’t know which
one is closer. I think it makes sense then to favour the curve that’s most on the left,
since our rightwards ray should hit that first. So in the code I’ve just added a quick test
to see if two points are the same, or at least very near to being the same — and then if
that’s the case, we skip the curve that has its ‘p1’ point further to the right.
And running this now, we’ve eliminated a few more problems — down to 1024.
Alright here is our next problem…. and I’m just trying to spot the artifact here. It’s
quite tiny, but there it is. Okay I’m not sure what’s going on in this case — I
don’t even know which of these curves the ray is intersecting. So I’ve quickly added
a way to highlight the curves of each contour by their index, and we can see here that the
possibilities are curves 2, 3, or 4. Now shaders can be a little tricky to debug
— but we can get creative and do something like output a colour based on the index of
the closest curve we found— so red if index 2, green if index 3, and blue if index 4.
And now we can zoom in on our debug texture again to see which it is. And it must be our
lucky day — two different problems for the price of one! As we can see, from one spot
it’s hitting curve 2, and from the other it’s
Let’s think first about the ray that’s hitting curve number 2. It’s a little odd
that our ray is able to hit that curve, but evidently miss the closer curve over here
— given the fact that these endpoints are at exactly the same height.
I think what might help in at least some situations like this is to test whether all the points
of a curve are below our ray, and simply skip the curve in that case. And same story if
all the points are above the ray. These raw position values are really the only reliable
thing we have — as we’ve seen, once we start doing maths with them, all bets are
off — and so by skipping curves early on based on the positions, we can avoid ‘accidentally’
hitting them due to numerical issues. It would really be wise to use these positions
directly as much as possible — for example at one point I read a bit about the Slug algorithm,
which has this way of classifying different curves based solely on the positions of their
control points. This technique is very much patented though, so I’m going to just forge
ahead with my own hacky approach instead. Alright, so running the test again with our
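That early skip is cheap to express. A hedged Python sketch of the idea (my own phrasing of it): since a quadratic bezier always stays inside the convex hull of its three control points, if every control point's y is strictly on one side of the ray, the curve cannot possibly be hit, and we can reject it before any arithmetic has a chance to go wrong.

```python
def curve_straddles_ray(p0, p1, p2, ray_y):
    """Pre-test using only the raw control-point positions: if all three
    points sit strictly above, or strictly below, the ray's height, the
    curve can't intersect it (the curve never leaves the convex hull of
    its control points), so we skip it before doing any root-finding."""
    ys = (p0[1], p1[1], p2[1])
    if max(ys) < ray_y:
        return False   # whole curve below the ray
    if min(ys) > ray_y:
        return False   # whole curve above the ray
    return True
```

Only curves that pass this test get fed to the quadratic formula, which removes a whole class of 'accidental' hits caused by numerical noise.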
curve skipping enabled now, we get a failure count of 882.
And looking at our case from before, we can see that only the intersection with curve
4 remains. This is actually the curve that we want it to intersect with, but since it
completely flattens out over here, we must be getting a gradient of zero, which is ambiguous
as to whether the curve is going up or down. But from the fact that p0 and p1 are in a
straight line here, and p2 is *below* them, I think we could reasonably consider this
curve as going purely downwards. So in general, let’s say a curve is
decreasing for its entire duration if p2 is below p0, and increasing if it’s the
other way around. Although this only holds of course if p1 is somewhere in between them
on the y axis. As soon as p1 becomes the lowest, or highest
point, the curve starts curving in both directions. And this is kind of troublesome actually,
because let’s say we have a ray which should theoretically intersect precisely at this
turning point over here where the y gradient is 0… Any tiny imprecision might cause the
program to actually think it intersected a tiny bit behind or ahead of that point, so
we could very easily end up with artifacts. Most fonts I’ve come across so far seem
to actually avoid using this kind of curve, but if we encounter one, maybe we could simply
split it into two segments: one where it’s going purely upwards with respect to t, and
one where it’s going purely downwards. Doing that should be reasonably straightforward.
First of all, we can figure out the turning point simply by setting the gradient to zero,
and then solving for t — from which we can of course compute the actual point.
Then say we want to construct this segment here that goes from p0 to the turning point.
Obviously we’re going to place the two endpoints of the new curve at p0 and the turning point,
but the big question is, where does the new p1 point go?
Well, we need the gradient at p0 to still point towards the p1 point of the original
curve, in order to maintain the same shape, which means that our new p1 point is constrained
to be somewhere along this line here. And where exactly, is where it forms a horizontal
line with the turning point, since of course the gradient there, by definition, is 0 on
the y axis. So I’ve been writing some code to do these
calculations for us automatically, and it’s such a great feeling to figure out the
maths for something, and then just have it work perfectly the first time. Which is why
I was a bit disappointed when this happened. Anyway, it turned out I just forgot some important
parentheses… and while we’re here let’s take a quick look at the actual maths, which
is super simple. We just say that if we start at p0 and move along the gradient vector of
the original curve at p0 for some unknown distance, we’re eventually going to hit
a point with a y value equal to the turning point. We can then rearrange that to solve
for the unknown distance, which then lets us calculate our new p1 point. And the exact
same goes for the other segment, of course. Aaaand now we can see that this is working.
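As a sketch of that splitting step in Python (not the video’s actual code): the de Casteljau construction used below places the new p1 points exactly where the gradient argument above says they must go, with both new p1 points and the turning point sharing the same y value.

```python
def split_at_turning_point(p0, p1, p2):
    """Split a quadratic bezier into y-monotone halves (a sketch, not the
    video's code). Points are (x, y) tuples. The de Casteljau split puts
    the new p1 points exactly where the gradient constraint described
    above requires: level with the turning point on the y axis."""
    denom = p0[1] - 2 * p1[1] + p2[1]
    if denom == 0:
        return [(p0, p1, p2)]        # gradient never flips: already monotone
    # solve d/dt B(t).y = 0 for the turning point's t value
    t = (p0[1] - p1[1]) / denom
    if not 0 < t < 1:
        return [(p0, p1, p2)]        # turning point lies outside the curve
    lerp = lambda a, b, s: (a[0] + (b[0] - a[0]) * s, a[1] + (b[1] - a[1]) * s)
    q0 = lerp(p0, p1, t)             # new p1 for the first half
    q1 = lerp(p1, p2, t)             # new p1 for the second half
    turn = lerp(q0, q1, t)           # the turning point itself
    return [(p0, q0, turn), (turn, q1, p2)]
```

For example, the symmetric arch p0=(0,0), p1=(1,2), p2=(2,0) turns at t=0.5, i.e. at the point (1,1), and splits into one purely rising and one purely falling half.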
So this can be done as a pre-processing step on the glyph contours — though like I said,
most of the fonts I’ve tried actually keep the p1 points strictly between p0 and p2,
so it’s usually not necessary. Alright, then in the shader, we can now determine
if the curve is going up or down simply based on the relative y positions of p0 and p2.
Hopefully this will improve the accuracy somewhat… and running the test on this, we now have
a failure count of 755. I’m slightly underwhelmed, but it’s progress nonetheless.
Okay, let’s keep going! The next error on our list is this one here, where this ray
is managing somehow to dodge both of the segments at their meeting point. I believe what’s
happening is that the roots we’re calculating there are just ever so slightly outside of
the valid bounds — due to precision errors of course.
So I’ve simply added in a little fudge factor to catch those cases. And running the test
again, we’re now down to just 404 remaining failures.
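A root finder with this kind of tolerance might look like the following Python sketch (the epsilon value and names are my own assumptions, not the video’s code). It solves B(t).y = ray_y as a quadratic in t, relaxing both the valid t range and the discriminant before clamping:

```python
import math

EPS = 1e-6  # fudge factor; the exact value is a tuning choice

def curve_crossings(p0, p1, p2, ray_y):
    """t values where a horizontal ray at height ray_y meets the quadratic
    bezier (p0, p1, p2), with small tolerances for precision errors (a
    sketch, not the video's code). Solves a*t^2 + b*t + c = 0, where the
    coefficients come from the y components of the control points."""
    a = p0[1] - 2 * p1[1] + p2[1]
    b = 2 * (p1[1] - p0[1])
    c = p0[1] - ray_y
    if abs(a) < EPS:                    # y is (nearly) linear in t
        roots = [] if abs(b) < EPS else [-c / b]
    else:
        disc = b * b - 4 * a * c
        if disc < -EPS:                 # allow a slightly negative discriminant
            return []
        s = math.sqrt(max(disc, 0.0))   # ...but clamp it to zero here
        roots = [(-b - s) / (2 * a), (-b + s) / (2 * a)]
    # accept roots just outside [0, 1] too, then clamp them back in
    return [min(max(t, 0.0), 1.0) for t in roots if -EPS <= t <= 1 + EPS]
```

For the arch p0=(0,0), p1=(1,2), p2=(2,0), a ray at y=1 grazes the apex (discriminant exactly zero, both roots at t=0.5), while a ray at y=2 misses cleanly.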
Okay, here’s the next case on the list, and in this one the ray should be hitting
segment six, but after some debugging I discovered it’s actually just missing it due to precision
issues, wouldn’t you know — and instead hitting only segment seven. So in our quadratic
formula, missing the curve means that the discriminant is negative — so let’s try
actually allowing it to be a tiny bit negative to catch these cases — and just clamp it
to zero in our actual calculation. Then let’s run the tests once again, and
that’s actually brought us all the way down to just 10 failed cases. So let’s have a
look at what remains. Alright, our ray is hitting the tip of these
two segments here — and this looks very similar to a problem we dealt with earlier
actually. Remember this one — where the ray was also hitting the tip of two segments,
and we told it that it should prefer the one that had its p1 point further on the left;
the idea being that a ray traveling rightwards should hit that segment first.
But now we have a bit of an edge case here where the p1 point of the segment we want
the ray to hit, is actually not further to the left than the other segment.
So I’ve set up this little test area with different configurations of curves to try
and come up with a better approach, and I think I finally have something that’ll do
the trick. To be clear about what we’re trying to achieve
in case I wasn’t making sense earlier — the problem is that if our ray is coming in and
hitting the tip of these two curve segments, we’re not going to get a clear answer as
to which curve it hit first. But intuitively, it should be the blue curve, since if the
ray was just a tiny bit lower down, then it would very clearly be hitting the blue curve
first. So my thinking now is that we’ll consider
the gradients of the two curves at our point of intersection, and then imagine kind of
winding in a clockwise direction from the ray, and seeing which gradient vector we run
into first. Whichever curve that gradient belongs to, is the curve we want.
By the way, if the curves originate from above the ray instead of below it like this case
here, the idea still works the same, only we imagine winding in a counterclockwise direction
instead. And here’s that idea expressed in code.
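The code shown on screen isn’t visible in this transcript, so here is a hedged Python sketch of the winding idea only (the function name and exact angle handling are assumptions):

```python
import math

def preferred_curve(grad_a, grad_b, curves_below_ray):
    """Tie-break for a rightward ray hitting the shared tip of two curves
    (a sketch of the winding idea; details are assumptions). Sweep from the
    ray direction (+x) clockwise if the curves come from below the ray, or
    counterclockwise if from above; whichever curve's gradient vector we
    reach first is the one the ray should hit. Returns 0 or 1."""
    def sweep_angle(g):
        ang = math.atan2(g[1], g[0])      # CCW angle from +x
        if curves_below_ray:
            ang = -ang                    # measure clockwise instead
        return ang % (2 * math.pi)        # first reached = smallest angle
    return 0 if sweep_angle(grad_a) < sweep_angle(grad_b) else 1
```

With curves below the ray, for instance, a gradient pointing 45 degrees below +x is reached before one pointing straight down, so its curve wins.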
So… running the test again, we’re now at long last down to zero errors, which I’m
very happy to see. Obviously, scoring perfectly on the test does
not mean it’s perfect — in fact I’d be shocked if there aren’t still plenty
of edge cases lurking in the shadows, but we’ve definitely come a long way in destroying
those artifacts that were quite prevalent before. At least I’ve been zooming and panning
around this text for quite a while, and not yet noticed a single one.
I did come across a more fundamental issue while testing some new fonts though. And this
arises when we have overlapping shapes, like with this particular letter ‘K’ over here.
We’re using the closest segment to determine whether we’re inside or outside of the glyph,
but from over here for example, the ray would hit into this as the closest segment, which
incorrectly tells us that we’re outside the glyph. And if we try rendering it, it
looks like this. After some contemplation though, I’ve tweaked
the algorithm slightly to now keep track of the closest distance to the exit of any shape
that we’re inside of, as well as to the exit of any hole that we might be inside of. And
then by comparing those distances we can determine whether our point is inside the glyph or not.
So it seems like our rendering is now working okay.
But it definitely is in need of some anti-aliasing as I mentioned before, so that it doesn’t
look so unpleasantly jaggedy. Currently, each pixel decides what colour
to display based on whether a single point at its centre is inside of the glyph or not.
So the easiest improvement I can think of would be to go from a single point deciding
the fate of the pixel, to perhaps a grid of 9 points, each contributing a little to the
final colour of the pixel. We could even get fancy and consider the fact
that each pixel is made up of a red, green, and blue component. So over here for example,
instead of just dimming all 3 colours uniformly because only 6 of the 9 points are inside
the glyph — we could say that since red and green are within the glyph, we’ll just
light those two up fully, by outputting yellow, and leave blue turned off.
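As a rough sketch of that idea (assuming a standard left-to-right RGB stripe layout, which is itself an assumption about the display), each column of a 3x3 sample grid could drive one colour channel:

```python
def subpixel_colour(samples):
    """Map a 3x3 grid of inside/outside samples (rows of 0s and 1s) to an
    (r, g, b) pixel value, one sample column per colour stripe. A sketch of
    the sub-pixel idea only; assumes an RGB-ordered display."""
    return tuple(sum(column) / 3 for column in zip(*samples))
```

So if only the left two thirds of the pixel are inside the glyph, the red and green stripes light fully and blue stays off, i.e. the pixel outputs yellow.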
This is a technique called sub-pixel anti-aliasing, and we can see it in action if we take a very
close look at the text in my code editor for example, where along the left edge of the h for
instance we can notice how mainly just the green and blue parts of the pixel are lit
up; while along the right edge here it looks like mainly just the red part of the pixel
is lit up. So this is a clever way of essentially tripling the horizontal resolution of the
display. There are some subtleties to this though—
in terms of filtering the result so that the colour fringes aren’t overly harsh, and
I guess detecting and handling the possibility of different pixel layouts on different displays.
So I’m going to actually leave this subpixel idea for future experiments.
But here’s our super simple version for today, we just loop over a 3x3 grid, and calculate
the different sample points within the pixel. This can be done by the way thanks to this
derivative function, which just tells us how much the position value changes between the
current pixel we’re working on, and its next-door neighbour. In other words, it’s
the size of the pixel in our glyph’s coordinate space.
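A Python sketch of that loop (the real version lives in a fragment shader, where pixel_size would come from the derivative function; the names here are assumptions):

```python
def pixel_coverage(is_inside, centre, pixel_size, n=3):
    """Average an n x n grid of samples across one pixel (a sketch of the
    idea, not the actual shader). is_inside(point) is the glyph test, and
    pixel_size is the pixel's extent in glyph coordinates."""
    hits = 0
    for j in range(n):
        for i in range(n):
            # offsets spread evenly over the pixel, centred on (0, 0)
            ox = (i + 0.5) / n - 0.5
            oy = (j + 0.5) / n - 0.5
            if is_inside((centre[0] + ox * pixel_size[0],
                          centre[1] + oy * pixel_size[1])):
                hits += 1
    return hits / (n * n)   # fraction of samples inside the glyph
```

A pixel straddling a vertical edge, for example, ends up with a coverage of 3/9 when only its left column of samples falls inside the glyph.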
So we’re just adding one for each of these points that is inside of the glyph, and then
dividing by 9 to get the average of those samples. And now, instead of being all jaggedy,
our edges become nicely smoothed out. This is at a very high magnification of course
just to show the effect better — it’s obviously a lot more subtle in reality, and
while I’m not sure how well it’ll show up in the video, it definitely does make a
big difference to how smooth the text feels. Running our algorithm 9 times for each pixel
feels a bit exorbitant though, so I’ve been experimenting with a little improvement where
we actually only run it three times — along the vertical axis — and instead sum up
a horizontal coverage value. Basically, instead of simply returning a bool
for whether the point is inside the glyph or not, I’ve slightly modified our function
to return a coverage value between 0 and 1. This is calculated by adding together the
distance to the nearest curve segment on either side of the input point, both of which have
been clamped to half of the width of the pixel, and then dividing by the width of the pixel.
This means that if the pixel is far away from any edge, then it will get a value of 1, meaning
it’s completely inside the glyph — or inverted to zero if we’re actually outside
of the glyph. And for pixels closer and closer to an edge, that value will get closer and
closer to 0.5, so that we get a smooth transition between 0 and 1 across the edge.
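A sketch of that coverage calculation (names are assumptions; the two distances would come from the closest intersections on either side of the point along the ray):

```python
def horizontal_coverage(dist_left, dist_right, pixel_width, inside):
    """Coverage in [0, 1] from the distances to the nearest curve on each
    side of the sample point (a sketch of the idea described above). Each
    distance is clamped to half a pixel, summed, and divided by the pixel
    width; the result is inverted when the point is outside the glyph."""
    half = pixel_width / 2
    coverage = (min(dist_left, half) + min(dist_right, half)) / pixel_width
    return coverage if inside else 1 - coverage
```

Deep inside the glyph both clamps saturate and the coverage is 1; sitting exactly on an edge it is 0.5 whether approached from inside or outside, which gives the smooth blend.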
Here that is in action with our test text from earlier. And if we magnify this a bit,
we can see the anti-aliasing is looking pretty decent I think.
It’s maybe a slightly strange implementation though — I later came across this writeup
which talks about sending out the rays in different directions to estimate what fraction
of this little circle is covered by the glyph, which sounds a bit more flexible and accurate,
so I might use that instead in the future. In any case let’s do a super quick performance
test now. So I’ll paste in the entire 12 thousand word script for this video as the
input, and try running it. Alright, taking a look at the statistics,
it seems to be running happily enough at around 800 frames per second — and I tried comparing
the average frame duration here to that of a blank project, and it seems like the text
is taking roughly 0.2 milliseconds to render, although I’m honestly not sure how to properly
profile these sorts of things. The performance does fall off a cliff unsurprisingly
though for very complex fonts. Here’s a random font I found with lots of little wiggly
details in the glyphs — this g for instance contains 440 bezier curves, which has taken
us from roughly 800 frames per second down to about 150.
There are ways we could optimize this though — the obvious idea that comes to mind being
splitting each glyph up into multiple bands, so that each pixel only has to test the bezier
segments inside of the band it’s in. Additionally, each glyph is currently rendered as a quad
— which is nice and simple, but in a lot of cases can leave quite a bit of empty space
that the shader still needs to crunch the numbers for. With a bit of pre-processing,
we could construct much tighter fitting meshes, tailored specifically for each individual
glyph. Far more challenging than improving the performance
I suspect is improving the legibility of the text at small sizes.
Right now it’s very easy to read — on my screen at least — but if I just zoom
out a little more, it really starts to go downhill quite fast. Let me magnify this view
so we can see the problem better, and take this word here for example. We can see how
some parts of it show up nice and bright, while other parts are much dimmer — because
maybe they fall awkwardly between two pixels for example, so both pixels end up just being
kind of greyish. If I pan the camera slightly, we can see how the legibility of each letter
actually changes as its alignment with the pixels changes.
And that’s kind of a tough thing to fix! I believe we’d need to delve into that scary
bytecode interpreter stuff that I was so happy to skip in the beginning. So definitely a
problem for some other day. What’s still bugging me a bit though is
those downsides I showed earlier to our ‘closest curve’ rendering approach, namely failing
if the contour has the wrong winding direction, and being very sensitive to self-intersections.
So I decided to take everything I learned from that artifact hunting process, and basically
start over from scratch with the counting method again. And I won’t bore you with
another blow-by-blow account, but armed with a much better understanding of the problem
now, I was able to quickly get it working with zero issues on the test set.
A bunch of the same ideas are still in there, and the main new addition is really just a
small detail in the way that curves are skipped, which actually helped immensely with the double-counting
issues that were plaguing us previously. Basically, when we’re testing if all the
points of a curve are above or below the ray, we need to decide whether we want to include
zero in the test; which means skipping the curve even if one of the points is precisely
at the same height as the ray. If we don’t skip those cases we get issues
with double counting at the meeting point of two curves, while if we do skip them, then
we essentially create tiny holes that the ray can sneak through at those meeting points.
Of course we could try to solve that issue by only including zero for the p2 points perhaps,
since then if we imagine a contour like this, a ray that perfectly lines up at the meeting
point will only intersect one of the curves. The problem though is that if the contour
looks like this instead, then suddenly the ray isn’t really crossing the contour anymore,
it’s just kind of scraping past it, so we want it to hit either neither of the curves,
or both of them, so that it doesn’t affect the result.
I believe it was this kind of issue that drove me to stop using this counting method originally,
but now that we’ve processed the curves to ensure they can only go strictly upwards
or strictly downwards — it suddenly doesn’t seem like such a big deal anymore. The problem
simply occurs when the curves are going in opposite directions: one up, and one down.
And so the way we can handle this in the code is to just check the direction of the current
curve, and for one direction we include zero only for the p0 points, and for the other
direction we include zero only for the p2 points. And that solves our issue.
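Since the curves are now strictly monotone in y, the rule can be sketched as a half-open comparison whose closed end depends on the curve’s direction (my own formulation of what’s described above, not the video’s code):

```python
def ray_counts_curve(p0_y, p2_y, ray_y):
    """Should a horizontal ray at ray_y count this y-monotone curve?
    (A sketch of the direction-dependent zero-inclusion rule.) Upward
    curves include equality only at the p0 end, downward ones only at the
    p2 end, so a tip where two curves meet is counted exactly once when
    the contour crosses the ray, and zero or two times when it grazes."""
    if p0_y < p2_y:                  # going up: include the p0 end
        return p0_y <= ray_y < p2_y
    if p0_y > p2_y:                  # going down: include the p2 end
        return p2_y <= ray_y < p0_y
    return False                     # horizontal curves never count
```

For example, two upward curves meeting exactly at ray height count once in total (the first excludes its p2, the second includes its p0), while an up curve followed by a down curve grazing the ray from below counts neither.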
So now that we’re using the counting method again, we can see that the ‘j’ renders
correctly, and the tiny self intersection is no longer causing any visible issues.
I was quite happy about this, but victory was short-lived because I then realized that
this approach actually fails in the case of overlapping shapes again. We can fix this
with a slight tweak — where we say that whenever the ray crosses a downwards curve,
which means that it’s exiting the current shape, we add one to this counter. And otherwise
subtract one if it’s entering the shape. Then, if the ray exited more shapes than it
entered, it must have started out inside of a shape. This is essentially the same idea
as the counting method, but should be able to handle overlaps.
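Putting those pieces together, the final counting rule might look like this sketch (straight segments stand in for the monotone bezier halves just to keep the crossing maths short; the names are assumptions):

```python
def inside_glyph(segments, px, py):
    """Nonzero-winding sketch of the final approach: a rightward ray from
    (px, py) adds 1 for every downward crossing (exiting a shape) and
    subtracts 1 for every upward one (entering). A positive total means
    the point started inside. Straight ((x0, y0), (x1, y1)) segments
    stand in for the curves."""
    winding = 0
    for (x0, y0), (x1, y1) in segments:
        going_up = y1 > y0
        # direction-dependent zero inclusion, as described earlier
        if going_up:
            hit = y0 <= py < y1
        elif y0 > y1:
            hit = y1 <= py < y0
        else:
            continue                 # horizontal: never counts
        if hit:
            x = x0 + (py - y0) / (y1 - y0) * (x1 - x0)
            if x > px:               # only crossings ahead of the point
                winding += -1 if going_up else 1
    return winding > 0
```

With two overlapping outlines, a point inside the overlap exits two shapes and enters none, so the winding count is 2 and the point still correctly reads as inside.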
By the way, this can also be extended very nicely to work with our current anti-aliasing
approach, like this. It’s still just adding or subtracting one like before, unless the
distance to the intersection becomes extremely small — like within the current pixel. And
in that case we’re now adding or subtracting some fractional amount — meaning that in
the end, we again get our nice smooth blend from 0 to 1 across the edge of the glyph.
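That fractional contribution might be sketched like so (the exact blend formula here is an assumption on my part):

```python
def crossing_contribution(distance, pixel_width, going_down):
    """Winding contribution of one crossing at signed `distance` ahead of
    the sample point (a sketch; the blend formula is an assumption). Far
    crossings contribute a full +/-1, while crossings within about a pixel
    of the point contribute a fraction, which smooths the glyph's edge."""
    fraction = min(max(distance / pixel_width + 0.5, 0.0), 1.0)
    return fraction if going_down else -fraction
```

So a downward crossing well ahead of the point contributes 1, one exactly at the point contributes 0.5, and one just behind contributes 0.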
This idea comes from that writeup I mentioned earlier when we were first implementing the
anti-aliasing. Anyway, as we can see this little tweak has
indeed resolved the overlapping shape issue. … While unfortunately breaking the problematic
‘j’ again, which is quite annoying. On the other hand, the self-intersection is looking
better than ever, so I’d say we can count this as a half victory at least.
I did try a couple different things to get the wonky j to render correctly, since it
is a recurring issue that I’ve seen in several fonts at this point, but none of my attempts
came particularly close to working. This one perhaps least close of all. Maybe we should
try to automatically detect faulty contours and correct them when importing the font instead.
But I need to stop worrying about these little details for now; this was just supposed
to be a quick experiment after all, and I maybe
got a little lost in the weeds along the way — sorry about that.
Anyhow, after all this, I definitely feel like I’ve gained a much greater appreciation
for some of the intricacies of rendering text. Here is one last test with our latest approach
on a big wall of text with several different fonts, and I think I’m happy enough with
how it’s looking to call it a day for now at least.
So, to end things off, I’ve just been having some fun in the vertex shader here, making
the letters wave and spin around to say farewell, and thanks for watching.
I hope you’ve enjoyed following this little journey into the surprisingly deep rabbit
hole of rendering text, and if you have any thoughts as to how the current implementation
could be improved, I’d love to hear them. Alright, until next time — cheers!