Coding Adventure: Rendering Text

Sebastian Lague
13 Apr 202470:54

Summary

TLDRIn this Coding Adventures episode, the focus is on the intricacies of rendering text using TrueType font files. The host delves into the complexities of font file formats, exploring the challenges of decoding and rendering glyphs. Through trial and error, various methods are tested, including parsing font tables, handling bezier curves, and implementing anti-aliasing for smoother edges. The video also touches on the patent issues surrounding certain rendering techniques and the host's journey in overcoming various rendering artifacts and issues with overlapping shapes and complex fonts.

Takeaways

  • 🌟 The video is about the process of rendering text using font files and the complexities involved in ensuring clear and legible display.
  • 📚 The journey begins with understanding the TrueType font format and the importance of tables like 'glyf' for storing glyph shapes.
  • 🔍 Parsing font files involves dealing with endianness issues, byte offsets, and various data structures that define the properties and contours of glyphs.
  • 🎨 Rendering glyphs involves interpreting the font's bytecode instructions and handling the intricacies of bezier curves and line segments.
  • 🔄 The importance of handling compound glyphs and understanding the mapping between glyph indices and unicode values is highlighted.
  • 📐 The video emphasizes the challenge of rendering text at low resolutions and the need for techniques like anti-aliasing to improve legibility.
  • 💡 The exploration of different rendering techniques, such as using textures, signed distance fields, and direct bezier rendering, is discussed.
  • 🐛 Debugging font rendering issues involves creating 'evil artifact detectors' and refining algorithms to handle edge cases and font peculiarities.
  • 🚀 The video showcases the improvement in performance and visual quality through the use of shaders and optimized algorithms.
  • 📖 The script concludes with the acknowledgment of remaining challenges in rendering complex fonts and the need for further exploration and refinement.
  • 🎓 The content is a rich resource for understanding the technical aspects of font rendering and the trade-offs involved in different approaches.

Q & A

  • What is the primary focus of the video?

    -The primary focus of the video is to explore the process of rendering text using font files, specifically focusing on the technical aspects of decoding and rendering fonts, and handling various challenges such as anti-aliasing and overlapping shapes.

  • Which font file format does the video begin with?

    -The video begins with the TrueType font file format, which was developed by Apple in the late 1980s.

  • What is the significance of the 'glyf' table in a font file?

    -The 'glyf' table is significant because it stores the shapes of the glyphs, or characters, in the font.

  • What is the main problem that the video tries to address with low-resolution text rendering?

    -The main problem addressed is ensuring that text is displayed or printed clearly at low resolutions, where unlucky scaling could cause parts of a letter to appear twice as thick as others, making the letters difficult to read.

  • How does the video tackle the endianness issue when reading font tables?

    -The video creates a FontReader class in C# that handles the conversion from big endian to little endian, allowing the program to correctly read the font table data.

  • What is the role of the 'head' and 'maxp' tables in a font file?

    -The 'head' table contains metadata about the font, while the 'maxp' table provides information about the maximum number of glyphs and other data useful for font processing.

  • What is a compound glyph, and how is it handled in the video?

    -A compound glyph is a character made up of other glyphs. In the video, compound glyphs are handled by reading in each component glyph and combining their points and contour end indices into a single glyph.

  • How does the video address the issue of text legibility at small sizes?

    -The video acknowledges that legibility at small sizes is a challenge and suggests that delving into the bytecode interpreter aspects of font rendering might be necessary for improvement, though it does not provide a concrete solution within the scope of the video.

  • What is the basic idea behind the anti-aliasing technique implemented in the video?

    -The basic idea behind the anti-aliasing technique is to use a grid of points within each pixel and calculate a coverage value based on how many of those points are inside the glyph, resulting in smoother edges and improved appearance at larger magnifications.

  • How does the video propose to optimize the rendering performance for complex fonts?

    -The video suggests optimizing the rendering performance by splitting each glyph into multiple bands so that each pixel only has to test the bezier segments inside its band, and by constructing tighter fitting meshes tailored for each individual glyph through pre-processing.

Outlines

00:00

🚀 Introduction to Font Rendering

The video begins with the host expressing excitement about exploring the process of rendering text, starting with the need to select a font. The chosen font, JetBrainsMono-Bold, is opened in a text editor to inspect its contents, which are found to be unintelligible. The host decides to refer to a reference manual to understand the TrueType font format, developed by Apple in the 1980s. The complexity of the font file is discussed, including its tables and bytecode instructions. The focus is on the 'glyf' table, which is believed to contain the shapes of the characters. The host encounters an issue with the number of tables read from the font file but resolves it by understanding the endianness of the file format. The video then delves into the structure of the tables and how to interpret them.

05:02

📚 Understanding the Font Table Structure

The host continues to explore the font tables, focusing on the table directory and the metadata it provides. The discussion includes the identification tags and byte offsets of the tables. The host experiences initial difficulties with reading the tags due to skipping over unnecessary data. After adjusting the code, the host successfully reads the tags and identifies the glyph table entry. The structure of the glyph data is explained, including contours, bounding boxes, and instructions. The host also discusses the concept of 'FWords' and how they relate to the glyph data. The video then demonstrates how to read the glyph data and plot it in the Unity engine, revealing a distorted character glyph. The host refines the code to correctly display the glyphs and expresses eagerness to render all characters in the font.

10:03

🎨 Bezier Curves and Glyph Transformation

The host introduces Bezier curves as a method to smooth out the blocky appearance of the glyphs. A brief explanation of Bezier curves is provided, including the concept of control points and how they affect the curvature of a line. The host visualizes the curves with a creative lazer-beam analogy. The mathematical aspect of Bezier curves is simplified with a linear interpolation function. The video then shows how to implement Bezier curves in the code to replace the blocky glyphs with smooth curves. The host tests the code with a 'Hello World!' message, but encounters issues with character mapping and spacing. The video ends with the host planning to address these issues in future videos.

15:04

🔍 Debugging and Refining Glyph Rendering

The host addresses the issues from the previous attempt by implementing a character map to correctly display the 'Hello World!' message. The video discusses the various formats of character mapping and focuses on the most common ones. The host writes code to handle these mappings and successfully displays the message. However, the host notices that certain letters are missing and deduces that they are compound glyphs. The video explains the concept of compound glyphs and how they are stored in the font file. The host implements a solution to render compound glyphs and tests it with lower-case letters and diacritics. The video concludes with the host planning to test different fonts and explore more rendering techniques.

20:07

🖋️ Advanced Rendering Techniques

The host experiments with different rendering techniques for text, starting with a discussion on increasing the line thickness to create solid shapes. The video explores polygon triangulation and the use of a shader to fill in the curves. The host also considers the idea of pre-rendering glyph meshes to a texture file for efficient rendering. The video then discusses the concept of signed distance fields for text rendering and their benefits in scaling. The host concludes that direct rendering from Bezier data is preferred for quality and begins to discuss the challenges of text rendering, including the need for anti-aliasing and handling overlapping shapes.

25:09

🐞 Artifact Detection and Algorithm Refinement

The host creates an 'artifact detector' to identify and quantify rendering issues. The video explains the process of testing each pixel within a glyph and how the results are used to identify artifacts. The host identifies a flickering issue when zooming and panning the text and attempts to understand its cause. The video delves into the problem of double-counting intersections and presents a solution to record and ignore close intersection points. The host also discusses the limitations of the 'closest curve' approach and considers returning to the intersection counting method. The video concludes with the host expressing a desire to improve the rendering algorithm further.

30:13

🔄 Revising the Rendering Approach

The host revisits the intersection counting method, using insights gained from debugging to refine the approach. The video explains the new method for skipping curves and handling curves going in opposite directions. The host addresses issues with overlapping shapes and introduces a method to track the closest distance to the exit of shapes. The video demonstrates the improved rendering with different fonts and discusses the challenges of legibility at small sizes. The host acknowledges the limitations of the current approach and suggests potential optimizations. The video concludes with a test of the latest rendering method on a large text wall and a reflection on the journey of text rendering exploration.

Mindmap

Keywords

💡Font Rendering

Font rendering refers to the process of displaying text on a digital screen. In the video, the speaker explores the intricacies of rendering text, including the handling of TrueType font files and the decoding of their contents to render the text visually. The process involves understanding the structure of font files, the tables within them, and how to interpret the glyph data to display text correctly on various digital platforms.

💡TrueType Font (TTF)

TrueType Font is a digital font format developed by Apple in the late 1980s. It is used to store font data that defines the appearance of text. TTF files contain a series of tables with metadata about the font, including the glyph data for individual characters. In the video, the speaker focuses on the 'glyf' table, which stores the shapes of the characters, and the challenges of interpreting this data for rendering purposes.

💡Glyph

A glyph is a visual representation of a character in a typeface or font. Glyphs are the individual elements that make up the text when rendered on a screen or printed on paper. The speaker is particularly interested in the 'glyf' table of the TTF file, which contains the contour data for each glyph in the font.

💡Endianness

Endianness refers to the order in which bytes are stored in memory or processed by a computer system. In the context of the video, the speaker discovers that the font file format is big endian, which means that the bytes of each value are stored in a reverse order compared to what their little-endian computer expects, requiring a conversion to correctly interpret the number of tables in the font file.

💡Bytecode Instructions

Bytecode instructions are low-level commands used by a computer's processor to perform specific tasks. In the context of font files, they are part of the font's instruction set that controls how the outlines of glyphs are drawn. The speaker mentions these as part of the complex diagrams and terminology found within the font file reference manual.

💡Font Directory

The Font Directory is the first block of data encountered in a font file, which serves as a guide to the contents of the font file. It includes information such as the number of tables and the location of each table within the file. The speaker uses this directory to navigate through the font file and locate the necessary tables for rendering the text.

💡Contours

Contours are the outlines or boundaries that define the shape of a glyph in a font. They are made up of a series of points and are used to construct the visual representation of each character. The speaker discusses the number of contours that make up the first glyph in the table and how they are used to render the text.

💡Bytecode

Bytecode is a lower-level programming language in the form of binary code that is used to define the operations of a computer. In the context of font files, it refers to the set of instructions that control the rendering of the font's glyphs. The speaker skips over the bytecode in the font file, as it is complex and not directly related to the immediate goal of rendering the text.

💡Font Reader Class

The Font Reader Class is a custom-made software component developed by the speaker to read and interpret the data from the font file. This class handles the conversion between different byte orders and provides functions to read specific data types from the font file, such as tags and integers.

💡Glyph Table

The Glyph Table is a part of the font file that contains the data for all the glyphs in the font. It includes information about the contours, points, and instructions needed to render each character. The speaker focuses on locating and interpreting the Glyph Table to extract the shapes of the characters for rendering.

💡Bezier Curves

Bezier curves are mathematical curves used in computer graphics to create smooth, continuous shapes. In the context of font rendering, they are used to define the outlines of glyphs. The speaker discusses the need to convert the blocky outlines of the glyphs into smoother Bezier curves to improve the appearance of the rendered text.

Highlights

Exploring the complexities of rendering text from font files, specifically focusing on the challenges of interpreting TrueType Font (TTF) files.

Discussing the historical development of TTF files, originally created by Apple in the late 1980s.

The importance of understanding the structure of font files, including mandatory tables like 'glyf' that store glyph shapes.

Addressing endianness issues when reading font files, with a focus on big endian and little endian systems.

Decoding the font directory and table entries, including handling 4-letter tags and byte offsets.

Understanding the concept of 'contours' that make up glyphs and the potential complexity of compound glyphs.

The role of 'FWords' in defining the bounding box of a glyph and the intricacies of handling x and y coordinates.

The challenges of interpreting bytecode instructions and flags within font files for rendering purposes.

The significance of the 'maxp' and 'head' tables in determining the total number of glyphs and glyph locations.

The process of extracting glyph locations from the 'loca' table and the importance of mapping glyphs to Unicode values.

The introduction of Bezier curves for smoothing out the edges of rendered glyphs, enhancing visual appeal.

The exploration of different rendering techniques, such as increasing line thickness versus using polygon triangulation.

The challenges of rendering text at different sizes and the need for anti-aliasing to improve legibility.

The development of a method for detecting and correcting faulty contours in fonts to improve rendering accuracy.

The creation of an 'artifact detector' to systematically identify and address rendering errors.

The implementation of a shader-based approach for efficient, parallel computation of pixel coverage.

The exploration of sub-pixel anti-aliasing as a technique to enhance the smoothness of rendered text.

The final demonstration of rendering a variety of fonts with the developed techniques, showcasing the potential and limitations of the approach.

Transcripts

00:01

Hello everyone, and welcome to another episode of Coding Adventures. Today, I’d like to

00:05

try and render some text, since it’s one of those things that I always just take for

00:10

granted. To begin with, we’re going to need a font.

00:13

And I’ve just grabbed the first one I found on my hard-drive, which happens to be JetBrainsMono-Bold.

00:18

So let me open that up in a text editor just to see what’s inside, and not unexpectedly,

00:23

the raw contents is highly un-illuminating to say the least, so we’re going to need

00:27

to find some kind of guide for decoding this. This is a ttf, or TrueType font file, which

00:32

is a format, I’m told, developed by Apple in the late 1980s. And so, I had a look on

00:37

their developer website, and located this handy reference manual. Hopefully it’s nice

00:42

and simple to understand. Okay, I’ve just been skimming through it

00:50

a little, and I’m becoming increasingly baffled and bewildered by all the mysterious

00:55

diagrams, the long lists of arcane bytecode instructions, not to mention the strange terminology

01:00

such as freedom vectors… ‘ploopvalues’, and twilight zones… What is the twilight

01:06

zone? *“It is the middle ground between light and shadow”* — You’re not helping!

01:12

Anyway, after slowing down a bit and reading more carefully, I realized that most of this

01:16

complexity is focused on a single problem, which is making sure that text is displayed

01:21

or printed clearly at low resolutions, where unlucky scaling could cause one part of a

01:27

letter to appear twice as thick as another, for example, making the letters difficult

01:31

to read. So I don’t think it’s actually something

01:34

we need to worry about for our little experiment today at least, which is definitely a relief!

01:39

Let’s move on then to the next section of the manual, which gives us this list of mandatory

01:44

tables that all fonts need to include, and I’m most excited about this ‘glyf’ table,

01:50

since that sounds like where the shapes are stored. So our first goal is to locate that

01:55

table. Aand — this looks promising! “The Font

01:58

Directory is a guide to the contents of the font file”. Music to my ears.

02:04

So this is the very first block of data we’ll encounter in the file, and I don’t think

02:08

we really care about any of it , except for the number of tables. So we’re going to

02:13

want to skip over one 32bit integer, and then read this 16bit integer.

02:19

And here’s some quick C# code for doing exactly that. I’m printing out the number

02:24

of tables here by the way, just to make sure it’s a plausible value, which from the tables

02:28

listed in the manual, I’m thinking would be somewhere between the mandatory 9 and around

02:32

50 at a maximum maybe. So let’s run this quickly — and we can

02:37

see the number of tables is…. 4352. How am I doing something wrong already?

02:44

Ookay, I finally found the answer on this osdev wiki entry — which is that the file

02:49

format is big endian — which just means that the individual bytes of each value are

02:54

stored back-to-front from what my little-endian computer is expecting.

02:58

So I’ve quickly made this simple FontReader class, which let’s us read in a 16bit value,

03:03

and have this conversion to little-endian happen behind the scenes, so we don’t need

03:07

to worry about it every time. Alright, so using that, let’s try running the program

03:12

again — and now the number of tables comes out as… 17. Which is much more believable.

03:19

That means we can proceed to the table directory. And what this gives us is a 4-letter identifying

03:24

tag for each table — and we’ve briefly seen what those look like already — and

03:29

then also the table’s location in the file, in the form of a byte offset.

03:34

So in the code I’ve just added this little loop over the total number of tables, and

03:39

in there it’s simply reading in this metadata for each of them. Of course I’ve had to

03:44

expand our little reader class in the process, with these two new functions for actually

03:47

reading a tag, and a 32bit integer. And you can see here my, possibly somewhat clunky

03:50

approach to reversing the order of the bytes in there.

03:52

So, hopefully this is going to work… Let’s try running it, aand these tags look a little

03:58

strange. For example we seem to have a tiny head over here with a big question mark mark

04:03

next to it. Which does incidentally sum up how I’m feeling right now. Oh wait! I just

04:10

realized that I forgot to skip over all that other stuff in that first table that we didn’t

04:14

really care about. Let me just do that quickly — I think it was 3 16bit values, so that’s

04:20

6 bytes we want to hop over here. Alright… And now… We’ve got the tables!

04:26

I’ve spotted the glyph table entry over here, which is pointing us to byte location

04:31

35436, so that’s where we’re headed next! Let’s take a quick look at the manual first

04:38

though — and it looks like the first bit of information we’ll be given is the number

04:42

of contours that make up the first glyph in the table. Interestingly, that value can be

04:47

negative, which would indicate that this is a ‘compound’ glyph made up of other glyphs,

04:53

but let’s worry about that later! We then have a bunch of 16bit ‘FWords’,

04:58

which is an interesting name choice, and these tell us the bounding box of the glyph.

05:01

Following that comes the actual data such as these x and y coordinates in here, which

05:06

I’m eager to get get my hands on. Although we should take note that these coordinates

05:10

are relative to the previous coordinate, so they’re more like offsets I guess we could

05:15

say. They can also be in either an 8bit or 16bit format, which I assume these flags here

05:22

are hopefully going to tell us. Then there’s also an array of instructions,

05:26

which I believe is the scary bytecode stuff we’re skipping over, and finally, or I guess

05:31

firstly — I’m not sure why I read this table backwards — we have indices for the

05:35

end points of each contour. So if I’m understanding this correctly,

05:38

we’re going to read in a bunch offsets, from which we can construct the actual points,

05:43

and then say we get two contour end indices — like 3, and 6 for example —that just

05:49

means that the first contour connects points 0, 1, 2, 3 and back to 0. While the second

05:56

contour connects points 4, 5, 6, and back to 4.

06:00

That seems easy enough, so the other thing to wrap our heads around is that 8bit flag

06:04

value we saw. Although only 6 of the bits are actually used. So we’re going to have

06:10

one of these for each point, and according to the guide, bits 1 and 2 tell us whether

06:16

the offsets for that point are stored in an unsigned 1 byte format, or a signed 2 byte

06:22

format. Then, getting just slightly convoluted here,

06:26

bits 4 and 5 have two different meanings, depending on whether the 1 byte, or 2 byte

06:31

representation is being used. In the first case, it tells us whether that byte should

06:35

be considered positive or negative. And in the second case, that’s not necessary since

06:40

the sign is included in the offset itself, and so it instead tells us whether or not

06:45

we should skip the offset. That way, if the offset is zero, it doesn’t have to actually

06:50

be stored in the file. So we can see there’s some basic compression happening here.

06:55

On that note, let’s take a look at bit 3. If this is on, it tells us to read the next

07:00

byte in the file, to find out how many times this flag should be repeated, so that it doesn’t

07:06

have to waste space actually storing repeated instances of the same flag.

07:10

And finally, bit 0 tells us whether a point is on the curve or off the curve — and I’m

07:16

not entirely certain what that means right now, but we can worry about it later — let’s

07:20

first just get these points loaded in. By the way, to actually test if a particular

07:25

bit in the flag is on or off — I’m using this tiny function here which simply shifts

07:30

all the bits over so that the bit we’re interested in is in the first spot, then masks

07:36

it out, meaning all the other bits get turned off, and finally checks if the resulting value

07:41

is equal to 1. Alright, and then here’s a function I’ve

07:44

been working on for actually loading one of these simple glyphs. And this does exactly

07:48

what we just talked about: it reads in the contourEndIndices, then reads in all of the

07:54

flag values — of course checking each of them to see if they should be repeated some

07:57

number of times — and finally reading in the x and y coordinates, before simply returning

08:02

all of that data. And for reading in the coordinates, I have

08:06

other little function here — where each coordinate starts out with the value of the

08:10

previous coordinate, and then we grab the flag for this point, and depending on that,

08:15

we either read in a single byte for the offset, and add or subtract it based again on what

08:21

the flag tells us to do; or we read in a 16bit offset, but only if the flag isn’t telling

08:27

us to skip this one. Alright, I’m excited to see what comes out

08:31

of this — so back over here I’ve just made a dictionary for mapping table tags to

08:36

locations, which we can use to set the reader’s position to the start of the glyph table,

08:41

and then simply read in the first glyph, and print out its data.

08:45

So let’s what that looks like quickly. Okay, that seems promising… but we won’t really

08:51

know if it’s nonsense or not until we draw it. So in the Unity engine I’ve just quickly

08:56

plotted these points — and I don’t know what this is supposed to be, but I’m sure

09:00

it will become clear once we draw in the contours. Okay, it’s looking a little dishevelled,

09:08

but I do believe it’s the missing character glyph — and I actually remember now reading

09:12

that it’s required to be at the very start of the glyph table, so that makes sense. I

09:18

just need to iron out some bugs clearly, so I’ve been fiddling with the code a bit,

09:22

and here’s how it’s working now. We simply loop over all the end indices, and

09:27

then create this little window into the array, which allows us to access only the points

09:31

in the current contour — since evidently I can’t be trusted — and then we simply

09:35

draw lines between those, looping back to the first point in the contour to close the

09:40

contour at the end. Alright, let’s try it out — and that’s

09:45

looking much better! One glyph is not enough though, I want all

09:49

the glyphs! And to get them, we just hop over to the ‘maxp’ table, where we can look

09:53

up the total number of glyphs in the font. Then we head over to the ‘head’ table

09:58

and leap over several entries we don’t care about right now, to find out whether the locations

10:02

of the glyphs are stored in a 2 byte or 4 byte format — and finally we can dive into

10:07

the ‘loca’ table, from which we can extract the locations of all of the glyphs in the

10:12

glyf table. And then we can just read them in.

10:16

I must admit, even though parsing a font file must be pretty low on the list of thrilling

10:21

things one could do with one’s day, I’m quite excited to see all these letters appearing

10:25

here. Anyway, let’s zoom in on one of these glyphs,

10:29

such as the big B for instance, and we can see that while the glyphs are obviously very

10:33

beautiful, they are perhaps, a bit blocky. So I believe our next step should be to bezier

10:39

these bad boys. I’ve babbled about beziers a bunch before

10:43

on this channel, but briefly — if we have two points, and want to draw a curve between

10:48

them — we need at least one extra control point to control how the curve should curve.

10:54

Now to visualize this curve, let’s imagine a point starting out at the start point, and

10:58

moving at a constant speed towards this intermediate control point. From which, a second point

11:04

moves simultaneously towards the end point, in the same amount of time that it takes the

11:09

first point to complete it’s journey. Like this.

11:12

That’s not terribly interesting, but like many things in life, it can be made more exciting

11:17

with the addition of lazers. So let’s draw a lazer-beam between our two moving points.

11:23

And that’s a bit better already, but for some added flair, let’s then also leave

11:27

a ghost trail of the old beams behind… and now we’re really getting somewhere.

11:34

So the curve that these beams trace out is our bezier curve. And all we need now is a

11:39

way to describe this curve mathematically, which is surprisingly easy to do. We just

11:44

need to imagine one last travelling point — this one moving between the first two

11:49

travellers, and also completing its journey in the same amount of time. Observing the

11:56

path of this third point, we can see how it travels perfectly out our curve.

12:01

So here’s a little function to help calculate the positions of these points. It takes in

12:05

both the start position, and the destination, as well as a sort of ‘time’ value, which

12:10

can be anywhere between 0 and 1, as measure of the progress of the journey.

12:15

The point’s position at the given time is then calculated as the starting point, plus

12:20

the offset that takes us from the start to the ending point, multiplied by the time value.

12:27

From this linear interpolation, we can build our Bezier interpolation, which just calculates

12:32

these intermediate A and B points like we saw, and then interpolates between those to

12:37

get the actual point along the curve at the current time.

12:41

So, if we want to draw one of these curves, one way to do it would be like this — simply

12:46

chopping the curve up into a bunch of discrete points based on this resolution value, and

12:51

then drawing straight lines between them. So just testing this out quickly — we can

12:57

see that with a resolution of 1 we of course just have a straight line; 2 gives us just

13:02

the faintest idea of a curve; and from there it quickly starts to look pretty smooth.

13:08

Anyway, let’s get back to our blocky B — now that we have beziers working — and I’ll

13:16

just quickly jump into the code here, and modify it so that every set of 3 points in

13:21

the contour now gets drawn as a curve. And let’s take a moment to admire how that

13:29

looks. Wait what… This is not quite what I had in mind, but

13:36

it looks kind of interesting, so let’s try writing some text with it, and we can come

13:40

back and fix it in a minute. So I’ve created a string here that says

13:44

“Hello World!”, and we simply go through each character in that string, and of course

13:49

to the computer each of these characters is just a number defined by the unicode standard

13:53

— and I’m using that number as an index into the array of glyphs that we’ve loaded

13:59

in. We then draw each of these, and add some hardcoded

14:02

spacing between them, because I still need to figure out where they’ve hidden the actual

14:06

spacing information. Alright, let’s take a look at the result.

14:11

Okay the letter spacing is too small, so I’ll just increase that quickly, but unfortunately

14:16

the text is still not particularly comprehensible. Clearly the glyphs in the font are not in

14:22

the same order as unicode — which makes total sense actually, since different fonts

14:27

support different subsets of characters, so there’s never going to be a 1 to 1 mapping.

14:32

Instead, the font file needs to include a character map, to tell us which glyph indices

14:38

map to which unicode values. This part seems like a bit of a pain because

14:42

there are 9 different formats in which this mapping information can be stored. Although,

14:46

I then read to my great relief that many of these are either obsolete, or were designed

14:51

to meet anticipated needs which never materialized. And reading a bit further, it seems that 4

14:57

and 12 are the most important ones to cover, so I’m going to get cracking on those.

15:03

Okay — here’s the code I’ve written to handle them. And I’m not going to bore

15:07

you with the details, all it’s doing is looking up, in a slightly convoluted way,

15:13

the unicode value that corresponds to each glyph index.

15:17

With that mapping information, we can now retrieve the correct glyphs to display our

15:22

little Hello World message. I really like how stylish some of these buggy

15:26

beziers look. Perhaps this is the font of the future… Oh and by the way, if you’d

15:31

like to learn more about the beauty of bezier curves, I highly recommend a video called

15:36

the “Beauty of Bezier Curves”, and it’s sequel too — the continuity of splines.

15:43

Right now though, let’s see if we can fix this mistake that I’ve made.

15:47

So first of all, we can that here where there’s supposed to be a straight line, this point

15:51

is actually just controlling how the curve bends as it moves towards this next point

15:56

here. So it looks like the glyphs aren’t made entirely out of beziers as I was kind

16:01

of assuming, but rather have some ordinary line segments mixed in.

16:05

Also, taking another look at the manual, it shows this example where we have a spline

16:09

made up of two consecutive bezier curves. And what it’s pointing out, is that this

16:14

shared point between the two curves is exactly at the midpoint between these two control

16:20

points. This is usually where you’d want it to be,

16:24

simply because if it’s not at the midpoint, we can see that we no longer get a nice smooth

16:28

join between the two curves. They become discontinuous. And so what the manual recommends doing for

16:34

these points that lie right in the middle, is to save space by simply deleting them.

16:40

As it puts it, the existence of that middle point is implied.

16:44

So, we’re going to need to infer these implied points when we load the font. And I believe

16:49

that this is where that ‘on curve’ flag that we saw earlier is going to come in handy.

16:54

So in this example, points 1 and two would be flagged as ‘off the curve’, whereas

16:59

points 0 and 3 would be on the curve, since the curve actually touches them.

17:03

What we can do then is say that if we find two consecutive off-curve points, we’ll

17:09

insert a new on-curve point between them. Also, just to turn everything into beziers

17:15

for consistency’s sake, let’s also say that if we find a straight line segment — which

17:20

would be two consecutive on-curve points, then we’ll insert an extra control point

17:25

in between them. Alright, so I’ve been writing some code

17:29

to handle that quickly, which as we just talked about simply checks if we have two consecutive

17:34

on or off-curve points, and if so, it inserts a new point in between them. And here’s

17:40

a quick test to see if that’s working as expected. So these are the on and off curve

17:45

points that are actually stored within the font file for our letter B. And then here

17:49

are the implied on curve points that we’ve now inserted into the contours, as well as

17:54

these extra points between the straight line segments, to turn them into valid bezier curves.

18:00

With that, our code for drawing these curves finally works as expected. And just for fun,

18:05

I made it so that we can grab these points and move them around to edit the glyphs if

18:09

we want. Anyway, let’s put this to the test quickly

18:13

by drawing all the letters of the English alphabet here, and these all seem to have

18:18

come out correct. So let’s then also try the lowercase letters,

18:22

and these are looking good too… except I notice that we are missing the i and the j.

18:29

I guess they must be compound glyphs since we haven’t handled those yet, and I think

18:33

the idea is just that the ‘dot’ would be stored as a separate glyph, and then the

18:39

i and the j would both reference that, instead of each having to store their own copy of

18:43

it. So I’ve just spent some time implementing

18:46

compound glyphs over here, and since these are made up of multiple component glyphs,

18:51

essentially all we need to do is keep reading in each component glyph, until we receive

18:55

word that we’ve reached the last one. And all the points and contour end indices from

19:00

those, are simply squished together into a single glyph, which is returned at the end

19:05

here. Then here is the function for actually reading

19:08

in a component glyphs, and this essentially just gives us the index to go and look up

19:12

the data of a simple glyph. Actually, I’m wondering now if a compound glyph can contain

19:18

another compound glyph. I’ll need to tweak this later if that’s the case…

19:22

But anyway the simple glyph we’ve just read in can then be moved around, scaled, or even

19:27

rotated — if I had implemented that yet that is — to orient it correctly relative

19:32

to the other components. And we can see that transformation being applied to the points

19:36

over here. So with that bit of code, we can at last read

19:40

in our missing i and j. And this has also unlocked a whole world of

19:45

diacritics, since of course these all make use of compound glyphs as well, to save having

19:50

loads of duplicate data Okay, we’re making good progress I think,

19:55

so I’d like to start trying out some different fonts, just to see if they’re even working

19:59

properly. Let’s write a little test sentence here,

20:02

such as the classic one concerning the fox and the lazy dog, or this one I came across

20:07

concerning discotheques and jukeboxes. These are good test sentences because of course

20:13

they contain all the letters of the alphabet. Which I learned today is called a pangram.

20:18

Anyway, this is JetBrainsMono that we’ve been using, and I’m going to try switching

20:22

now to maybe… Open Sans. Okay, there’s an issue with the size of

20:28

the glyphs clearly — but I found this helpful scaling factor in the head table, which should

20:32

take care of that… And that’s looking much better — but obviously the spacing

20:38

is a little off. The problem is that this isn’t a monospaced

20:42

font like we had before, so our hard-coded spacing value is no longer going to cut it.

20:48

After some rooting around though, I unearthed this horizontal metrics table, which tells

20:53

us how much space we need to leave after each glyph, which is known as the ‘advance width’.

20:58

So here’s some code for just reading in that information. And if we use those values,

21:04

our text now obviously looks a lot more sensible. Okay I’ve just spent a little while refactoring

21:13

and cleaning up the project, because things were getting a little out of hand. Let’s

21:16

quickly make sure I haven’t messed anything up in the process… and of course I have.

21:25

Alright — I tracked down the issue, so now we’re back on track.

21:28

And now that we’re able to write some words and display their outlines, let’s finally

21:32

figure out how to properly render the text. The solution is actually surprisingly simple

21:37

— we just need to increase the thickness of the lines until they turn into a solid

21:41

shape. Alright, thanks for watching! Well, wait a

21:46

minute, this is perhaps not the most legible approach, so we should maybe take some time

21:50

to explore our options. For example, some years ago I wrote this little

21:55

polygon triangulation script, which uses an algorithm called ear-clipping. And one idea

22:00

might be to simply feed the points that we’re currently using to draw the outlines into

22:04

that, and get out a mesh for the computer to draw. This doesn’t seem ideal though

22:09

because if we want the letters to be nice and smooth, we’ll have to have lots and

22:13

lots of little triangles on the screen which the computer might not particularly appreciate.

22:19

So a much smarter mesh-based approach is presented in this paper on “Resolution Independent

22:24

Curve Rendering”. Basically, we triangulate a mesh of just the sort of straight-line inner

22:30

part of the glyph, and then render a single extra triangle for each of the curvy bits.

22:37

So to be clear, for each bezier curve on the contour, a triangle is simply being drawn

22:42

like this. The trick now is to find some way to display only the section of the triangle

22:48

that is inside of the curve, so that we can get a smooth result without having to add

22:52

even more triangles. For this, we’re going to need to think about

22:56

what the shape of our bezier curve actually is —and just eye-balling it, it looks kind

23:02

of parabolic, but let’s have a look at our equation to be sure…

23:06

Currently we have this defined as 3 separate linear interpolations, which paints a nice

23:11

geometric picture, but it’s a little clunky mathematically. So I’ve quickly just written

23:16

out those interpolations as one giant equation, and then simplified it a bit to arrive at

23:21

this, which we can of course recognize as a quadratic equation. It’s just this coefficient

23:26

‘a’ times t^2, plus the coefficient ‘b’ times t, plus a constant ‘c’. For that

23:33

reason, this particular type of Bezier curve we’re working with is called a quadratic

23:37

bezier curve, which confirms our parabolic suspicions.

23:42

We can appreciate that fact even better if allow the lines to extend out past this 0

23:46

to 1 time interval of our bezier segment. Now, it is a little bit fancier than your

23:51

standard parabola though in that we can actually rotate it around. And that’s because our

23:56

equation of course is outputting 2 dimensional points, rather than a single value. We could

24:03

break it down if we wanted into separate equations for the x and y values of the curve respectively.

24:08

And if we were to graph those, they would just be regular old, non-rotated parabolas

24:12

like this. So nothing mysterious here — if we move

24:16

a point on the x axis, we can see how that’s reflected in the x graph, and if we move a

24:21

point on the y axis, we can see how that’s reflected in the y graph.

24:26

Alright, so I reckon we’re ready now to think about this problem of how to fill in

24:31

the region inside of the curve. That’s kind of confusing when the curve is all rotated

24:35

like this though, so let’s just position the points so that it looks like a perfectly

24:39

normal y = x^2 parabola. And one way to do this is to put p1 at coordinates (1/2, 0),

24:46

and p0 at coordinates (0, 0), and p2 at coordinates (1,1).

24:53

And that indeed has given us this nice simplified case where the y values of the curve are equal

24:59

to the square of the x values. So if we want to fill in this region inside

25:04

the curve — that’s very easy in this case because if the points on the curve are where

25:08

y is equal to x^2, then obviously inside is just where y is greater than x^2.

25:15

To actually draw this region though, we’ll need a mesh and a shader.

25:19

For the mesh, we can simply construct a triangle like we saw earlier out of the 3 bezier points,

25:24

and I’ll give those points texture coordinates corresponding to the positions we put the

25:29

points in to get that simple y=x^2 parabola. Then in a shader, we can get the interpolated

25:37

x and y texture coordinates for the current pixel, and let’s visualize the x value for

25:41

now as red, which looks like this. And here’s the y value as green. And here’s them both

25:49

together. I actually find it a bit easier to see what’s going on if we chop the values

25:54

up into discrete bands, by multiplying by 10 for example, then taking the integer part

25:59

of that, and then dividing by 10. And now we can see our x and y values as a

26:05

little grid here, instead of a continuous gradient.

26:09

Of course we can move our points around, and we can see how this grid gets transformed

26:14

along with the , since again the shader automatically interpolates the texture coordinates to give

26:19

us the value for the current pixel. In texture space though, things always just look like

26:24

this, since these are the texture coordinates we defined.

26:30

So back in the shader, let’s test for y being greater than x^2, and draw the result

26:34

of that into the blue channel here. As hoped, that now fills in the curve — and

26:41

what’s more of course, it will also be transformed along as we move our points around. So we

26:47

only really had to think about the simplest possible case, and by taking advantage of

26:51

the way that shaders work, we get the rest for free.

26:54

I thought that was a cool idea, so with that, we can now now go back to our mesh of just

26:59

the straight lines of the glyph, then add in those triangles for all the curves, and

27:03

then use this clever little idea to shade in just the parts inside the curve.

27:08

Hang on, that’s not quite right. It looks like along the outside edge of the glyph,

27:13

where the points go in a clockwise order, everything is correct — but along the inside

27:18

here where the points are now going counterclockwise we actually want to fill in the opposite region

27:23

of the curve. So in the shader I’ve just quickly added

27:27

this little parameter which’ll tell us the direction of the triangle, and we can flip

27:31

the fill flag based on that. Then just testing that here — we can see that when the points

27:38

are clockwise, we have the inside of the curve being filled, and when counterclockwise, the

27:45

outside is now filled. Aaand that seems to have fixed our problem

27:50

here. So this is a pretty interesting way to render text I think, but one potential

27:56

problem with the approach is that Microsoft has a patent on this research. Skimming through

28:01

this, it does seem though that it only covers the more complicated case of cubic bezier

28:06

curves — which is when the curves are defined by four points instead of the three that we’re

28:11

working with. So quite possibly it is actually okay to use.

28:15

But, I’m anyway curious to experiment with different ideas today, so let’s put this

28:20

one to the side for now. What if instead we took our old glyph meshes

28:25

made out of ridiculous numbers of triangles, but claim that we don’t actually care about

28:29

the triangle count because instead of rendering them to the screen, we pre-render all of them

28:34

to a texture file. That way we ultimately only need to render a single quad for each

28:39

letter, and use a shader that automatically crops out the appropriate part of that texture

28:43

atlas to display the letter we want. This’d be nice and fast, but it does of course have

28:49

some issues with scaling. We’d need to make our atlas truly gigantic if we want to be

28:54

able to display large text for example. Or alternatively, we could try using signed

28:59

distance fields. In one of my old videos we took this map of the earth, and wrote a little

29:06

function to calculate the distance of every pixel to the shoreline. Which could then be

29:10

used for making a simple wave effect. So we could use that same code here to turn

29:17

our image of glyphs, into an image of glyph distances.

29:22

A simple shader could then sample the distance, and output a colour if that distance is within

29:26

some threshold. Which looks like this. This scales better than before, because now

29:35

the shader is blending between distances rather than hard on-off values. The outlines are

29:41

a little lumpy, but I’m sure we could improve that with some refinements to the technique.

29:46

These texture-based approaches are slightly annoying though, in the sense that it’s

29:50

not really practical to support all the characters in a font because then the texture would have

29:54

to be too large. So we always have to compromise on some subset, in order to get good quality.

30:00

And even still, the edges tend to have some obvious imperfections, at least if you’re

30:05

zooming unreasonably close to the text — and we tend to lose our nice crisp corners.

30:12

With that said, I have seen some interesting-looking research about using multiple colour channels

30:17

to store additional information that can help to increase the quality.

30:22

So I would like to experiment with this at some point because it’s definitely an interesting

30:26

technique, but for today I’d prefer to render the text directly from the bezier data, without

30:31

any textures comprising the quality in between. So focusing on a single letter for a moment,

30:38

let’s imagine a grid of pixels behind it, and our goal is obviously to light up the

30:42

pixels that are inside of the letter. Therefore, we need a way to detect whether a point is

30:48

inside or not, and happily there’s a very intuitive way to do this.

30:52

Say we’re interested in this point over here. All we need to do is do is pick any

30:57

direction, and imagine a ray going out until it hits one of the contours of the shape.

31:02

When it hits that contour, it must either be entering or exiting the shape. And if it’s

31:08

entering it, then of course it’s bound to exit it at some point. But in this case the

31:13

ray doesn’t hit any other contours along its journey, which means it must have been

31:17

exiting the shape, and so obviously it must have started out inside of it.

31:23

Now moving our point of interest back a bit into the ‘B’ hole here, for want of a

31:27

better term, the ray now intersects with two contours along its path. From that we can

31:32

deduce it must have first entered the shape, and then exited it, and so clearly the starting

31:37

point this time was on the outside The simple rule we can gather from this is

31:42

that if the number of intersections is even, we’re outside the glyph, while if its odd,

31:48

we’re inside of it. So all we really need to do is figure out

31:52

the maths for calculating the intersection of a horizontal ray with a bezier curve.

32:00

For example, say we have this curve over here, and we want to figure out where this ray,

32:05

at a height of 0.5, intersects with it. Since the ray is horizontal, we only need

32:11

to care about the y values of the curve, so we’re basically just asking where this function

32:16

is equal to 0.5. Equivalently, we could shift the whole curve down by 0.5, and then ask

32:23

where the function is equal to zero. Conveniently, that’s exactly the question

32:28

that that the quadratic formula is there to answer.

32:30

So, if we just take our equation for the y component of the bezier curve, and plug the

32:35

values in for a, b, and c, we’re going to get back the values of t where the corresponding

32:42

y value is 0. In this case, the t values come out as 0.3,

32:48

and 1.2. And since values outside of the range 0 to 1 are not on our curve segment, that

32:54

means that we’d count 1 intersection in this case.

32:57

Alright, so here’s some code implementing the quadratic formula. And one small detail

33:02

here is that if a is zero, we get a division error — this is the case where the curve

33:07

is actually a straight line — and so then we can solve it like this instead.

33:12

Okay, now I’ve also written this little test just to see that this is working properly

33:16

— and all it does is calculate the a b and c values from the positions of the bezier

33:22

points, and then it asks what the roots of that equation are — so where it’s equal

33:26

to zero, but first the height of our horizontal ray is of course subtracted over here like

33:31

we saw earlier, to take its position into account.

33:34

Then, assuming a solution exists, we just draw a point at that time value along the

33:39

curve to visualize it — using this little function over here. So, let’s see how it

33:44

goes! I’m just going to grab some points and move

33:49

them around a bit to see whether the intersection points look correct or not. It’s maybe not

33:54

the most rigorous testing methodology in the world, but we should catch anything obvious

33:59

at least. So far so good though. And let me also try changing the height of the ray, and

34:07

that seems to be working as well. So using this horizontal intersection test,

34:14

we can now implement that simple algorithm we were talking about earlier, where an even

34:17

number of intersections means that the point is on the outside of the glyph, while an odd

34:22

number means that it’s inside. So I’ve basically just copy pasted our intersection

34:27

test here — only after calculating the roots, instead of visualizing them, we now increment

34:33

a counter if the root is valid. And by valid I mean it needs to lie between 0 and 1 to

34:38

be on the bezier curve, and we’re also only interested in intersections to the right of

34:43

the ray’s starting point; since that’s the arbitrary direction I picked for the ray

34:47

to be traveling. Now we just need this tiny function here to

34:51

loop over all the curves in the glyph, count the total number of intersections, and return

34:56

whether that value is even or odd. Alright — let’s try it out. And, isn’t

35:03

that beautiful? From here, I guess we basically just need to increase the number of dots!

35:10

But hang on a second… what are these weird artefacts that keep appearing, and also from

35:14

the noises I’m hearing, I fear my computer’s about to burst into flames.

35:18

Okay, no need to panic, let’s just start with the most serious issue first — getting

35:23

rid of this ugly line. So let’s take another look at

35:48

our visualization. Alright… it just needed a good old-fashioned

35:54

restart it appears, and we’re back in business. So let me set the grid size to one hundred

36:00

here, which is where that issue was occurring — and I believe what’s happening is just

36:04

that there’s obviously a meeting point of two curves over here, and this row of pixels

36:09

just happens to line up precisely for both of them to be counted in this case. So in

36:15

the code, I’ll just change the condition for being a valid intersection to discount

36:19

the very end of the curve, meaning that where one curve ends and another begins, we’ll

36:25

only count one of them. Aaand running this again — we can see: problem solved.

36:33

With that fixed, let’s now tackle the performance problems, and I want to try simply moving

36:38

all of our calculations into a shader so that we can compute loads of pixels in parallel.

36:44

Firstly I’m just trying to get the bounding boxes of each glyph showing up — so basically

36:48

there’s this buffer of instance data, which just contains each glyph’s position and

36:52

size at the moment, and that gets sent to the shader. Then we’re requesting that a

36:58

quad mesh be drawn for however many glyphs there are in the input string.

37:02

So when the shader is processing the vertices of the mesh, it’ll be told which copy or

37:08

instance of the mesh it’s currently working with, meaning that we can get the data for

37:12

that instance, and use it to position each vertex.

37:16

Here’s how it’s looking at the moment. This says “Coding Adventures”, by the

37:21

way, in case you can’t tell. To improve the readability, we’ll need to

37:26

get the rest of the data over to the shader. So I’ve been working on this little function

37:31

here which creates a big list of all the bezier points that we need, along with a list of

37:36

integers for storing some metadata about each unique glyph in the text. In particular, at

37:43

what index it can find its bezier data, the number of contours it has, and also the number

37:49

of points in each of those contours. Obviously we also need to record the index at which

37:54

each glyph’s metadata begins, so that the glyphs can be told where to find it.

37:58

Not very exciting stuff, but it needs to be done. And all of this ultimately gets packed

38:03

up and sent to the shader, to live in these buffers over here.

38:09

Another slightly tedious thing I’ve been doing is translating all our C# code into

38:12

shader language for things like the quadratic formula, intersection counting, and the point

38:18

inside glyph test. Okay with that done, let’s try it out…

38:27

And it’s working! Well mostly anyway… I don’t know why the bitcoin logo has showed

38:32

up — that’s just supposed to be a space; and also the i has gone missing, but I’m

38:37

sure these’ll be easy enough to fix. I’m more concerned about whatever’s happening

38:41

over here! So I just want to step though the different

38:44

segments of the glyph to see what values we’re getting for our quadratic coefficients — aaand

38:51

I should have known. If it isn’t my old nemesis floating point imprecision….

38:59

“…as we can see, everything is jiggling like there’s no tomorrow”

39:03

So computers, tragically, represent numbers with a finite number of bits, meaning that

39:09

precision is limited. On this segment for instance, we calculate the coefficient ‘a’

39:14

by taking p0, adding p2, and subtracting twice p1. Which is precisely zero for the y values.

39:22

Except… the computer begs to differ. It says it’s extremely close, but not quite,

39:28

equal to zero. As a result, when we apply the quadratic formula, we’re dividing by

39:33

this tiny, incorrect value, and getting incorrect results. So, we need to get need a little

39:39

dirty I guess, and just say that we’ll consider our curve to be a straight line if as ‘a’

39:44

is very close to zero. And now those horrible little lines have disappeared.

39:50

By the way, we should check if all of this has actually improved the performance, and

39:57

indeed our efforts have taken us from barely 12 frames a second or whatever it was, to

40:00

being comfortably in the range of 800 / 900 frames per second range.

40:01

Okay, after a few misadventures… I’ve managed to fix that missing I and unvisited

40:06

bitcoin, and so at last, we’re able to properly render some text to the screen with our shader.

40:12

Since we’re doing this directly from the curve data, we can also zoom in pretty much

40:16

to our heart’s content, and it remains perfectly smooth. Well I say perfectly smooth, but of

40:22

course there’re still not enough pixels in modern displays to prevent edges from looking

40:26

unpleasantly jagged — so we’ll definitely need to implement some kind of anti-aliasing

40:31

to improve that. I also spotted a weird flicker when I was

40:35

zooming in. I’m struggling to see it again now, but here’s a freeze frame from the

40:39

recording. So I’ve been zooming in and out and panning around trying to see how common

40:44

these flickers are, and where they’re occurring. There’s another one! By the V… Did you

40:53

see that? This is a bit frustrating because it only

40:56

happens when the pixels perfectly line up in some way, so it’s hard to capture the

41:01

exact moment. It’s right in the sweet-spot I’d say of

41:04

being rare enough to be a nightmare to debug, but not quite rare enough to just pretend

41:10

it doesn’t exist. So I really want to have some way of quantifying how many of these

41:14

artefacts are occurring, so that I don’t have to spent multiple minutes after each

41:18

tweak to the code trying desperately to divine whether I’ve made things better or worse.

41:23

So, I’ve decided to build an evil artifact detector.

41:29

What I’ve done is simply taken a glyph, and written some code to randomly generate

41:33

these little test windows. The windows in black are entirely outside of the glyph, while

41:39

the windows in white are entirely inside of it.

41:42

And I’ve run this across a bunch of different glyphs from several different fonts. The idea

41:48

then is that each of these windows gets fed one by one to a compute shader, which looks

41:53

at all the texels — which is to say, all the pixels inside the window texture — and

41:59

runs our algorithm for each of them to figure out whether they’re inside the glyph or

42:03

not. It then draws the result to the window texture, and also flags the test as a failure

42:09

if it doesn’t match the expected value. By the way, a small detail here is that I’m

42:15

offsetting the y position by an increasing fraction of a texel as we go from the left

42:20

to right edge of the window — because that way we’re not just running the same test

42:24

over and over, but rather covering a whole extra range of values, which’ll hopefully

42:28

help us catch more artifacts! In total it’s running just over 74000 tests;

42:35

which takes about 10 seconds to complete, and our current algorithm is failing 7166

42:42

of them, so roughly 10 percent. To help hunt down the issues, I’ve also

42:47

set up this little debug view, where we can see a list of the failed cases down the side

42:51

here. And each case is defined by 3 numbers. There’s the glyph index, which we can set

42:57

over here. Then the resolution index, which determines how many pixels are tested inside

43:03

of the window. And finally the window index of course tells us which window we’re looking

43:07

at. So let’s go to one of the failed cases,

43:11

such as 0, 2, 19. And zooming in on the window, we can see this little pixel over here where

43:17

it mistakenly thought that it was inside of the glyph. I’m going to draw in a quick

43:21

debug line, so that we can then zoom out and see where exactly the issue is arising. And

43:27

it appears to be this troublesome meeting point of two curves again, where the intersection

43:32

is being double-counted. I thought I had fixed that earlier when we

43:36

told it to ignore time values exactly equal to one; but clearly that was overly optimistic.

43:43

It makes sense in theory, but as we saw recently — with every bit of maths we do, numerical

43:49

imprecision has an opportunity to creep in — and so we can’t exactly trust the values

43:53

that we’re getting. So I’ve been working on a dirty little fix

43:57

for the double counting, where we simply record the positions of the intersections that occur

44:02

on each segment, and then when we’re examining the next segment, we ignore any intersection

44:08

points that are extremely close to those previous ones. Alright, let’s try this out… And

44:14

we get a failure count of over 30000. Well that’s not great… Okay, I see what

44:19

went wrong though. We’re recording the positions over here regardless of whether they were

44:23

actually on the curve segment or not. So let me fix that quickly, and then redo the test

44:29

— and this time…. 4943. And with some trial and error just tweaking the value of

44:37

the distance threshold, I was able to get the failure rate down to 3647.

44:42

Alright, now I’ve been taking a look at some of these remaining failures — such

44:48

as this one over here — and in this case, we have a few spots that falsely believe they’re

44:53

outside of the glyph — meaning the intersection count is even when it should be odd. And looking

45:00

at this, we can see we have 1, 2, 3 simple intersections, and then we avoid the double

45:06

counting over here, giving us a total of 4. The count is supposed to be odd though, so

45:13

I guess in this particular case we actually do need to count both of them. I think because

45:18

they’re forming this kind of vertical wedge, so in a sense the ray is both exiting and

45:23

entering the glyph at that point, meaning they cancel each other out.

45:27

I’m not sure if I’m making any sense, but regardless, this whole double-counting

45:31

business is becoming annoying, so I’ve been brainstorming a bit how we could approach

45:35

the problem slightly differently to get around it.

45:39

The new plan is to forget about counting the number of intersections, but rather look for

45:44

the closest intersection. The idea is that regular contours are defined in a clockwise

45:49

direction, while the contours of holes go anti-clockwise. And this means that if the

45:55

closest curve crosses our ray in an upwards direction, we must be outside the glyph, or

46:01

inside a hole — but that’s the same thing really. Whereas if the closest curve crosses

46:06

our ray in a downwards direction, then we know we’re inside the glyph.

46:11

To determine which direction the curve crosses the ray, we can take our equation and just

46:15

apply the power rule from calculus to figure out the gradient, which comes out as 2at plus

46:21

b. Calculating that for any value t along our

46:25

curve gives us this vector here, which tells us how the x and y positions along the curve

46:30

change with respect to t. So over here, the y position of the curve is increasing as t

46:38

increases, so the gradient has a positive y value, whereas over here y is decreasing

46:44

as t increases, so the gradient has a negative y value. That means we can simply check the

46:50

sign of that value to find out whether the curve crosses our ray in an upwards or downwards

46:55

direction. So here’s the new implementation — as

46:58

you can see we’re simply searching for the closest point that our ray intersects, then

47:03

calculating the gradient of the curve at that point, and saying we’re inside the glyph

47:08

if that closest curve crosses the ray in a downwards direction.

47:12

Let’s run the test to see how this compares… And we’re getting a failure count now of

47:17

1126, so not a bad improvement! There are some downsides to this approach

47:25

though. For instance — here’s a font I was experimenting with, and we can see that

47:29

the ‘j’ is looking a little strange! That’s due to an error in the font where the contour

47:35

has been wound anti-clockwise like a hole. If we were just counting the intersections

47:40

it would give the correct result regardless, but this new approach fails catastrophically.

47:45

Also, over here the contour contains a slight self-intersection, which again causes an issue.

47:51

So that’s definitely a big weakness to keep in mind, but I’m going to run with it anyway

47:55

since it seems promising in terms of eliminating those pesky artifacts.

48:01

With that said, this same case we had before is actually still failing, but now its because

48:06

right at the end point these two curves have the same position, so it doesn’t know which

48:11

one is closer. I think it makes sense then to favour the curve that’s most on the left,

48:17

since our rightwards ray should hit that first. So in the code I’ve just added a quick test

48:22

to see if two points are the same, or at least very near to being the same — and then if

48:27

that’s the case, we skip the curve that has it’s ‘p1’ point further to the right.

48:32

And running this now, we’ve eliminated a few more problems — down to 1024.

48:39

Alright here is our next problem…. and I’m just trying to spot the artifact here. It’s

48:44

quite tiny, but there it is. Okay I’m not sure what’s going on in this case — I

48:51

don’t even know which of these curves the ray is intersecting. So I’ve quickly added

48:56

a way to highlight the curves of each contour by their index, and we can see here that the

49:01

possibilities are curves 2, 3, or 4. Now shaders can be a little tricky to debug

49:06

— but we can get creative and do something like output a colour based on the index of

49:11

the closest curve we found— so red if index 2, green if index 3, and blue if index 4.

49:19

And now we can zoom in on our debug texture again to see which it is. And it must be our

49:23

lucky day — two different problems for the price of one! As we can see, from one spot

49:28

its hitting curve 2, and from the other its hitting curve 4.

49:32

Let’s think first about the ray that’s hitting curve number 2. It’s a little odd

49:38

that our ray is able to hit that curve, but evidently miss the closer curve over here

49:43

— given the fact that these endpoints are at exactly the same height.

49:47

I think what might help in at least some situations like this is to test whether all the points

49:52

of a curve are below our ray, and simply skip the curve in that case. And same story if

49:58

all the points are above the ray. These raw position values are really the only reliable

50:05

thing we have — as we’ve seen, once we start doing maths with them, all bets are

50:09

off — and so by skipping curves early on based on the positions, we can avoid ‘accidentally’

50:15

hitting them due to numerical issues. It would really be wise to use these positions

50:20

directly as much as possible — for example at one point I read a bit about the Slug algorithm,

50:25

which has this way of classifying different curves based solely on the positions of their

50:30

control points. This technique is very much patented though, so I’m going to just forge

50:35

ahead with my own hacky approach instead. Alright, so running the test again with our

50:40

curve skipping enabled now, we get a failure count of 882.

50:45

And looking at our case from before, we can see that only the intersection with curve

50:49

4 remains. This is actually the curve that we want it to intersect with, but since it

50:56

completely flattens out over here, we must be getting a gradient of zero, which is ambiguous

51:01

as to whether the curve is going up or down. But from the fact that p0 and p1 are in a

51:07

straight line here, and p2 is *below* them, I think we could reasonably consider this

51:12

curve as going purely downwards. So in general, let’s say the a curve is

51:17

decreasing for it’s entire duration if p2 is below p0, and increasing if it’s the

51:23

other way around. Although this only holds of course if p1 is somewhere in between them

51:28

on the y axis. As soon as p1 becomes the lowest, or highest

51:33

point, the curve starts curving in both directions. And this is kind of troublesome actually,

51:38

because let’s say we have a ray which should theoretically intersect precisely at this

51:43

turning point over here where the y gradient is 0… Any tiny imprecision might cause the

51:49

program to actually think it intersected a tiny bit behind or ahead of that point, so

51:54

we could very easily end up with artifacts. Most fonts I’ve come across so far, seem

51:59

to actually avoid using this kind of curve, but if we encounter one, maybe we could simply

52:04

split it into two segments, one where it’s going purely upwards with respect to t. And

52:10

one where it’s going purely downwards. Doing that should be reasonably straight-forward.

52:15

First of all, we can figure out the turning point simply by setting the gradient to zero,

52:21

and then solving for t — from which we can of course compute the actual point.

52:27

Then say we want to construct this segment here that goes from p0 to the turning point.

52:32

Obviously we’re going to place the two endpoints of the new curve at p0 and the turning point,

52:38

but the big question is, where does the new p1 point go?

52:42

Well, we need the gradient at p0 to still point towards the p1 point of the original

52:48

curve, in order to maintain the same shape, which means that our new p1 point is constrained

52:53

to be somewhere along this line here. And where exactly, is where it forms a horizontal

52:59

line with the turning point, since of course the gradient there, by definition, is 0 on

53:05

the y axis. So I’ve been writing some code to do these

53:08

calculations for us automatically, and it’s such a great to feeling to figure out the

53:13

maths for something, and then just have it work perfectly the first time. Which is why

53:17

I was a bit disappointed when this happened. Anyway, it turned out I just forgot some important

53:24

parentheses… and while we’re here let’s take a quick look at the actual maths, which

53:28

is super simple. We just say that if we start at p0 and move along the gradient vector of

53:34

the original curve at p0 for some unknown distance, we’re eventually going to hit

53:39

a point with a y value equal to the turning point. We can then rearrange that to solve

53:44

for the unknown distance, which then let’s us calculate our new p1 point. And the exact

53:50

same goes for the other segment of course Aaaand now we can see that this is working.

53:59

So this can be done as a pre-processing step on the glyph contours — though like I said,

54:03

most the fonts I’ve tried actually keep the p1 points strictly between p0 and p2,

54:08

so it’s usually not necessary. Alright, then in the shader, we can now determine

54:13

if the curve is going up or down simply based on the relative y positions of p0 and p2.

54:21

Hopefully this will improve the accuracy somewhat… and running the test on this, we now we have

54:25

a failure count of 755. I’m slightly underwhelmed, but it’s progress nonethless.

54:33

Okay, let’s keep going! The next error on our list is this one here, where this ray

54:38

is managing somehow to dodge both of the segments at their meeting point. I believe what’s

54:44

happening is that the roots we’re calculating there are just ever so slightly outside of

54:49

the valid bounds — due to precision errors of course.

54:54

So I’ve simply added in a little fudge factor to catch those cases. And running the test

55:00

again, we’re now down to just 404 remaining failures.

55:05

Okay, here’s the next case on the list, and in this one the ray should be hitting

55:11

segment six, but after some debugging I discovered its actually just missing it due to precision

55:18

issues wouldn’t you know — and instead hitting only segment seven. So in our quadratic

55:24

formula, missing the curve means that the disciminant is negative — so let’s try

55:29

actually allowing it to be a tiny bit negative to catch these cases — and just clamp it

55:34

to zero in our actual calculation. Then let’s run the tests once again, and

55:47

that’s actually brought us all the way down to just 10 failed cases. So let’s have a

56:02

look at what remains. Alright, our ray is hitting the tip of these

56:10

two segments here — and this looks very similar to a problem we dealt with earlier

56:14

actually. Remember this one — where the ray was also hitting the tip of two segments,

56:19

and we told it that it should prefer the one that had its p1 point further on the left;

56:23

the idea being that a ray traveling rightwards should hit that segment first.

56:28

But now we have a bit of an edge case here where the p1 point of the segment we want

56:33

the ray to hit, is actually not further to the left than the other segment.

56:38

So I’ve set up this little test area with different configurations of curves to try

56:42

and come up with a better approach, and I think I finally have something that’ll do

56:47

the trick. To be clear about what we’re trying to achieve

56:51

in case I wasn’t making sense earlier — the problem is that if our ray is coming in and

56:56

hitting the tip of these two curve segments, we’re not going to get a clear answer as

57:00

to which curve it hit first. But intuitively, it should be the blue curve, since if the

57:05

ray was just a tiny bit lower down, then it would very clearly be hitting the blur curve

57:10

first. So my thinking now is that we’ll consider

57:14

the gradients of the two curves at our point of intersection, and then imagine kind of

57:19

winding in a clockwise direction from the ray, and seeing which gradient vector we run

57:25

into first. Whichever curve that gradient belongs to, is the curve we want.

57:29

By the way, if the curves originate from above the ray instead of below it like this case

57:35

here, the idea still works the same, only we imagine winding in a counterclockwise direction

57:40

instead. And here’s that idea expressed in code.

57:45

So… running the test again, we’re now at long last down to zero errors, which I’m

57:51

very happy to see. Obviously, scoring perfectly on the test does

57:57

not mean it’s perfect — in fact I’d be shocked if there’re not still plenty

58:00

of edge cases lurking in the shadows, but we’ve definitely come a long way in destroying

58:04

those artifacts that were quite prevalent before. At least I’ve been zooming and panning

58:09

around this text for quite a while, and not yet noticed a single one.

58:14

I did come across a more fundamental issue while testing some new fonts though. And this

58:18

arises when we have overlapping shapes, like with this particular letter ‘K’ over here.

58:24

We’re using the closest segment to determine whether we’re inside or outside of the glyph,

58:28

but from over here for example, the ray would hit into this as the closest segment, which

58:33

incorrectly tells us that we’re outside the glyph. And if we try rendering it, it

58:38

looks like this. After some contemplation though, I’ve tweaked

58:43

the algorithm slightly to now keep track of the closest distance to the exit of any shape

58:48

that we’re inside of, as well to the exit of any hole that we might be inside of. And

58:52

then by comparing those distances we can determine whether our point is inside the glyph or not.

58:59

So it seems like our rendering is now working okay.

59:03

But it definitely is in need of some anti-aliasing as I mentioned before, so that it doesn’t

59:07

look so unpleasantly jaggedy. Currently, each pixel decides what colour

59:13

to display based on whether a single point at its centre is inside of the glyph or not.

59:19

So the easiest improvement I can think of would be to go from a single point deciding

59:23

the fate of the pixel, to perhaps a grid of 9 points, each contributing a little to the

59:29

final colour of the pixel. We could even get fancy and consider the fact

59:33

that each pixel is made up of a red, green, and blue component. So over here for example,

59:39

instead of just dimming all 3 colours uniformly because only 6 of the 9 points are inside

59:45

the glyph — we could say that since red and green are within the glyph, we’ll just

59:49

light those two up fully, by outputting yellow, and leave blue turned off.

59:55

This is a technique called sub-pixel anti-aliasing, and we can see it in action if we take a very

60:01

close look at the text in my code editor for example, where along left edge of the h for

60:06

instance we can notice how mainly just the green and blue parts of the pixel are lit

60:11

up; while along the right edge here it looks like mainly just the red part of the pixel

60:15

is lit up. So this is a clever way of essentially tripling the horizontal resolution of the

60:21

display. There are some subtleties to this though—

60:24

in terms of filtering the result so that the colour fringes aren’t overly harsh, and

60:28

I guess detecting and handling the possibility of different pixel layouts on different displays.

60:33

So I’m going to actually leave this subpixel idea for future experiments.

60:37

But here’s our super simple version for today, we just loop over a 3x3 grid, and calculate

60:43

the different sample points within the pixel. This can be done by the way thanks to this

60:48

derivative function, which just tell us how much the position value changes between the

60:53

current pixel we’re working on, and its next-door neighbour. In other words, it’s

60:57

the size of the pixel in our glyph’s coordinate space.

61:01

So we’re just adding one for each of these points that is inside of the glyph, and then

61:05

dividing by 9 to get the average of those samples. And now, instead of being all jaggedy,

61:11

our edges become nicely smoothed out. This is at a very high magnification of course

61:16

just to show the effect better — it’s obviously a lot more subtle in reality, and

61:20

while I’m not sure how well it’ll show up in the video, it definitely does make a

61:24

big difference to how smooth the text feels. Running our algorithm 9 times for each pixel

61:30

feels a bit exorbitant though, so I’ve been experimenting with a little improvement where

61:34

we actually only run it three times — along the vertical axis — and sum up now this

61:39

horizontal coverage value. Basically, instead of simply returning a bool

61:43

for whether the point is inside the glyph or not, I’ve slightly modified our function

61:48

to return a coverage value between 0 and 1. This is calculated by adding together the

61:53

distance to the nearest curve segment on either side of the input point, both of which have

61:58

been clamped to half of the width of the pixel, and then dividing by the width of the pixel.

62:04

This means that if the pixel is far away from any edge, then it will get a value of 1, meaning

62:09

it’s completely inside the glyph — or inverted to zero if we’re actually outside

62:13

of the glyph. And for pixels closer and closer to an edge, that value will get closer and

62:17

closer to 0.5, so that we get a smooth transition between 0 and 1 across the edge.

62:24

Here that is in action with our test text from earlier. And if we magnify this a bit,

62:30

we can see the anti-aliasing is looking pretty decent I think.

62:34

It’s maybe a slightly strange implementation though — I later came across this writeup

62:38

which talks about sending out the rays in different directions to estimate what fraction

62:43

of this little circle is covered by the glyph, which sounds a bit more flexible and accurate,

62:48

so I might use that instead in the future. In any case let’s do a super quick performance

62:55

test now. So I’ll paste in the entire 12 thousand word script for this video as the

63:01

input, and try running it. Alright, taking a look at the statistics,

63:16

it seems to be running happily enough at around 800 frames per second — and I tried comparing

63:20

the average frame duration here to that of a blank project, and it seems like the text

63:25

is taking roughly 0.2 milliseconds to render, although I’m honestly not sure how to properly

63:31

profile these sorts of things. The performance does fall off a cliff unsurprisingly

63:35

though for very complex fonts. Here’s a random font I found with lots of little wiggly

63:40

details in the glyphs — this g for instance contains 440 bezier curves, which has taken

63:46

us from roughly 800 frames per second down to about 150.

63:52

There are ways we could optimize this though — the obvious idea that comes to mind being

63:56

splitting each glyph up into multiple bands, so that each pixel only has to test the bezier

64:02

segments inside of the band its in. Additionally, each glyph is currently rendered as a quad

64:07

— which is nice and simple, but in a lot of cases can leave quite a bit of empty space

64:12

that the shader still needs to crunch the numbers for. With a bit of pre-processing,

64:17

we could construct much tighter fitting meshes, tailored specifically for each individual

64:21

glyph. Far more challenging than improving the performance

64:25

I suspect is improving the legibility of the text at small sizes.

64:31

Right now it’s very easy to read — on my screen at least — but if I just zoom

64:35

out a little more, it really starts to go downhill quite fast. Let me magnify this view

64:41

so we can see the problem better, and take this word here for example. We can see how

64:46

some parts of it show up nice and bright, while other parts are much dimmer — because

64:50

maybe they fall awkwardly between two pixels for example, so both pixels end up just being

64:55

kind of greyish. If I pan the camera slightly, we can see how the legibility of each letter

65:00

actually changes as its alignment with the pixels changes.

65:04

And that’s kind of a tough thing to fix! I believe we’d need to delve into that scary

65:08

bytecode interpreter stuff that I was so happy to skip in the beginning. So definitely a

65:13

problem for some other day. What’s still bugging me a bit though is

65:17

those downsides I showed earlier to our ‘closest curve’ rendering approach, namely failing

65:22

if the contour has the wrong winding direction, and being very sensitive to self-intersections.

65:29

So I decided to take everything I learned from that artifact hunting process, and basically

65:34

start over from scratch with the counting method again. And I won’t bore you with

65:38

another blow-by blow account, but armed with a much better understanding of the problem

65:42

now, I was able to quickly get it working with zero issues on the test set.

65:48

A bunch of the same ideas are still in there, and the main new addition is really just a

65:54

small detail in the way that curves are skipped, which actually helped immensely with the double-counting

65:59

issues that were plaguing us previously. Basically, when we’re testing if all the

66:05

points of a curve are above or below the ray, we need to decide whether we want to include

66:10

zero in the test; which means skipping the curve even if one of the points is precisely

66:15

at the same height as the ray. If we don’t skip those cases we get issues

66:20

with double counting at the meeting point of two curves, while if we do skip them, then

66:26

we essentially create tiny holes that the ray can sneak through at those meeting points.

66:31

Of course we could try to solve that issue by only including zero for the p2 points perhaps,

66:37

since then if we imagine a contour like this, a ray that perfectly lines up at the meeting

66:41

point will only intersect one of the curves. The problem though is that if the contour

66:47

looks like this instead, then suddenly the ray isn’t really crossing the contour anymore,

66:52

it’s just kind of scraping past it, so we want it to hit either neither of the curves,

66:57

or both of them, so that it doesn’t affect the result.

67:01

I believe it was this kind of issue that drove me to stop using this counting method originally,

67:06

but now that we’ve processed the curves to ensure they can only go strictly upwards

67:10

or strictly downwards — it suddenly doesn’t seem like such a big deal anymore. The problem

67:15

simply occurs when the curves are going in opposite directions: one up, and one down.

67:21

And so the way we can handle this in the code is to just check the direction of the current

67:25

curve, and for one direction we include zero only for the p0 points, and for the other

67:31

direction we include zero only for the p2 points. And that solves our issue.

67:38

So now that we’re using the counting method again, we can see that the ‘j’ renders

67:44

correctly, and the tiny self intersection is no longer causing any visible issues.

67:51

I was quite happy about this, but victory was short-lived because I then realized that

67:55

this approach actually fails in the case of overlapping shapes again. We can fix this

68:03

with a slight tweak — where we say that whenever the ray crosses a downwards curve,

68:07

which means that it’s exiting the current shape, we add one to this counter. And otherwise

68:12

subtract one if it’s entering the shape. Then, if the ray exited more shapes than it

68:18

entered, it must have started out inside of a shape. This is essentially the same idea

68:23

as the counting method, but should be able to handle overlaps.

68:27

By the way, this can also be extended very nicely to work with our current anti-aliasing

68:32

approach, like this. It’s still just adding or subtracting one like before, unless the

68:37

distance to the intersection becomes extremely small — like within the current pixel. And

68:43

in that case we’re now adding or subtracting some fractional amount — meaning that in

68:47

the end, we again get our nice smooth blend from 0 to 1 across the edge of the glyph again.

68:53

This idea comes from that writeup I mentioned earlier when we were first implementing the

68:57

anti-aliasing. Anyway, as we can see this little tweak has

69:02

indeed resolved the overlapping shape issue. … While unfortunately breaking the problematic

69:08

‘j’ again, which is quite annoying. On the other hand, the self-intersection is looking

69:13

better than ever, so I’d say we can count this as a half victory at least.

69:18

I did try a couple different things to get the wonky j to render correctly, since it

69:22

is a recurring issue that I’ve seen in several fonts at this point, but none of my attempts

69:27

came particularly close to working. This one perhaps least close of all. Maybe we should

69:33

try to automatically detect faulty contours and correct them when importing the font instead,

69:39

but I need to stop worrying about these little

69:41

details for now, this was just supposed to be a quick experiment after all, and I maybe

69:45

got a little lost in the weeds along the way — sorry about that.

69:50

Anyhow, after all this, I definitely feel like I’ve gained a much greater appreciation

69:54

for some of the intricacies of rendering text. Here is one last test with our latest approach

70:03

on a big wall of text with several different fonts, and I think I’m happy enough with

70:08

how it’s looking to call it a day for now at least.

70:12

So, to end things off, I’ve just been having some fun in the vertex shader here, making

70:19

the letters wave and spin around to say farewell, and thanks for watching

70:23

I hope you’ve enjoyed following this little journey into the surprisingly deep rabbit

70:27

hole of rendering text, and if you have any thoughts as to how the current implementation

70:32

could be improved, I’d love to hear them. Alright, until next time — cheers!

Rate This

5.0 / 5 (0 votes)

相关标签
Font ParsingC# CodingBezier CurvesText RenderingTrueType FontsGraphic DesignSoftware DevelopmentDigital TypographyCoding TutorialTech Education
您需要『中文』的总结吗?