In my early days at MathWorks, learning to be a software developer, there were several books that I found particularly influential and helpful. One was *Writing Solid Code* by Steve Maguire. There was one chapter in particular that I incorporated directly into my routine: “Step through Your Code.”

The best way to write bug-free code is to actively step through all new or modified code to watch it execute, and to verify that every instruction does exactly what you intend it to do.

–Steve Maguire

Just watch your code execute, one line at a time, by stepping through it in the debugger. It seems almost too simple to have a big impact, but in my repeated experience, I often did find bugs, almost-bugs, or worthwhile code improvements to make. The process helped me gain confidence that my code was solid and ready for code review, testing, and submission, and so I did it every time.

I got out of the habit because of a years-long period during which I did not write product code. Then, by the time I returned to the Image Processing Toolbox team and started writing product code again, I had gotten out of the habit of printing things.

Not printing mattered because of the way I used code printouts for my step-through process. I would pick a set of inputs and then start the code in the debugger. As I stepped through the code execution, I would highlight the executed lines on a printout. Very frequently, I would inspect values in the code to make sure they were as I expected.

Then, I would choose a different set of inputs so that I could watch code lines that hadn’t been executed yet, and I would start again. I would keep doing this until all the code lines were highlighted.

Here is a recent example, from my work on “Initialize a MATLAB Toolbox”. From my handwritten scribbles, you might be able to tell that I identified a case to add to my test suite, and I also had an idea for a possible code improvement. The highlights show that there is one code line, a call to `error`, that I haven’t watched yet.

When developers first hear about this technique, they almost always think that it would be too much trouble, or that it would take too much time. It really only takes a few minutes, though, in my experience. It takes way more time than that if you have to fix something later, after the code has been submitted.

A more significant objection, I think, is that many developers today rarely print anything. There might not even be a printer nearby. To solve that problem, I would love to see a code editor feature that would support this kind of visual inspection of code execution. Perhaps the editor could visibly mark code lines whenever the debugger stops on them.

Many people will see this suggestion and think that a code coverage report will do the trick, but I don’t think so. I’m looking for something that is more interactive, something that will update and keep track as I watch the code execute, a line at a time.

I have submitted an enhancement request to MathWorks to consider adding something like this to the MATLAB Editor.

In my previous post in this series, I described a method for choosing the FFT transform length when computing a full convolution. The method is based on the idea of choosing a length whose prime factors are no greater than 7.

To find such a length, I wrote some code that used repeated division to determine whether $N$ has factors greater than 7. The code would then simply increment $N$ until it found a suitable length.

```
function Np = fftTransformLength_repeated_division(N)
    Np = N;
    while true
        % Divide Np evenly, as many times as possible, by the factors 2, 3,
        % 5, and 7.
        r = Np;
        for p = [2 3 5 7]
            while (r > 1) && (mod(r, p) == 0)
                r = r / p;
            end
        end
        if r == 1
            break
        else
            Np = Np + 1;
        end
    end
end
```

Because I was interested in 2-D applications with relatively small transform sizes (in the thousands), I didn’t think too much about the efficiency of this function. For 1-D applications with very long vectors, though, the time required to compute the transform size could be significant. For example, $N=8505001$ is prime, and the nearest larger integer with prime factors no larger than 7 is $N_p=8573040$, which is 68039 integers away. Let’s see how much time that would take.

```
N = 8505001;
Np = fftTransformLength_repeated_division(N)
```

```
Np = 8573040
```

```
f = @() fftTransformLength_repeated_division(N);
timeit(f)
```

```
ans = 0.0322
```

Is 32 milliseconds a long time?

It is. It’s about a third of the time required for the corresponding FFT computation:

```
x = rand(1,N);
g = @() fft(x,Np);
timeit(g)
```

```
ans = 0.0946
```

I’d like to thank Chris Turnes (MATLAB Math team developer at MathWorks; fellow Georgia Tech grad) and Cris Luengo (principal data scientist at Deepcell; creator of DIPlib) for sharing their thoughts with me about alternative computation methods. Chris taught me how to construct a lookup table containing all integers (up to a specified maximum value) whose prime factors are no larger than 7. He suggested that such a table would have only a modest size, even for very large maximum transform sizes. Cris pointed out that OpenCV uses the lookup table approach with a binary search and that PocketFFT uses only multiplications and divisions by 2. Chris and Cris both did experiments that suggest there is value in using these other methods.

Here is a function that lists all the integers, up to some maximum value, whose prime factors are no greater than 7. The implementation concept here is from Chris Turnes.

```
function P = generateLookupTable(N)
    P = 2.^(0:ceil(log2(N)));
    P = P(:) * 3.^(0:ceil(log(N)/log(3)));
    P = P(:) * 5.^(0:ceil(log(N)/log(5)));
    P = P(:) * 7.^(0:ceil(log(N)/log(7)));
    % Sort and trim P so that it ends at the first value that is greater
    % than or equal to N. Add 0 to the set of values.
    P = sort(P(:));
    i = find(P >= N, 1, "first");
    P = [0 ; P(1:i)];
end
```

Let’s find all of the desirable transform lengths up to approximately a billion.

```
P = generateLookupTable(2^30);
size(P)
```

```
ans = 1x2
5261 1
```

How are those values spread out?

```
plot(diff(P))
title("Distance between adjacent values of P")
```

Now let’s try using a simple search of `P` to implement an alternative method to find a good transform length.

```
function Np = fftTransformLength_lookup_table(N,P)
    Np = P(find(P >= N, 1, "first"));
end
```

How long does this take for the example above, $N=8505001$?

```
N = 8505001;
P = generateLookupTable(1e9);
h = @() fftTransformLength_lookup_table(N,P);
timeit(h)
```

```
ans = 7.1154e-06
```

Further optimizations could be explored, but already it takes only a few microseconds, about 4,500 times faster than the repeated division method for this particular case.
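One such optimization is a binary search, which is possible because `P` is sorted. The following is only a minimal sketch under stated assumptions, not the code used in `fftTransformLength`: the table is built with a compact loop variant of `generateLookupTable`, the variable names `maxN`, `lo`, `hi`, and `mid` are chosen here for illustration, and the search assumes that `N` is between 1 and `maxN`.

```matlab
% Build a sorted table of integers whose prime factors are no greater than 7.
maxN = 2^30;
P = 1;
for p = [2 3 5 7]
    P = P(:) * p.^(0:ceil(log(maxN)/log(p)));
end
P = sort(P(:));
P = P(P <= 2*maxN);   % a power of two >= maxN always survives this trim

% Binary search for the first table entry that is >= N.
% Assumes 1 <= N <= maxN, so an answer is always present in the table.
N = 8505001;
lo = 1;
hi = numel(P);
while lo < hi
    mid = floor((lo + hi)/2);
    if P(mid) >= N
        hi = mid;       % P(mid) is a candidate; keep looking to the left
    else
        lo = mid + 1;   % everything up to and including mid is too small
    end
end
Np = P(lo)
```

The result should match what `find(P >= N, 1, "first")` returns. Whether the binary search is actually faster in MATLAB than the vectorized `find` for a table of this size would need to be measured with `timeit`.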

Let’s check performance for a large power of two, which I would expect to be a much better case for the repeated division method.

```
N = 2^30;
timeit(@() fftTransformLength_repeated_division(N))
```

```
Warning: The measured time for F may be inaccurate because it is running too fast. Try measuring something that takes longer.
ans = 0
```

```
timeit(@() fftTransformLength_lookup_table(N,P))
```

```
ans = 7.4774e-06
```

The repeated division method is so fast, in this case, that the time required is not measurable.

However, I am inclined to always use the lookup table method, and I have updated my `fftTransformLength` function (GitHub link, File Exchange link) to do that.

I have submitted an enhancement request to MathWorks to add a function for computing good FFT transform lengths, as described in this series.

In my previous post, I wrote about using FFTs to compute a full convolution. If the vector `x` has $K$ points and the vector `h` has $L$ points, then the convolution of `x` and `h` has $K+L-1$ points. FFTs can be used to compute this convolution, but **only** if `x` and `h` are zero-padded to have **at least** $K+L-1$ points in the FFT computation.

Early in my career, the commonly available FFT algorithms, which were usually written in Fortran, were limited to computing FFTs with transform lengths that were powers of two. Because of this limitation, using FFTs to compute convolution required zero-padding to the smallest power of two that was greater than or equal to $K+L-1$. Here’s how that might look in MATLAB code.

```
x = [1 -1 0 4];
h = [1 0 -1];
K = length(x);
L = length(h);
N = K + L - 1
```

```
N = 6
```

```
Np = 2^nextpow2(N)
```

```
Np = 8
```

```
X = fft(x,Np); % Computes Np-point zero-padded FFT
H = fft(h,Np); % Computes Np-point zero-padded FFT
Y = ifft(X .* H);
y = Y(1:N)
```

```
y = 1x6
1.0000 -1.0000 -1.0000 5.0000 0 -4.0000
```
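As a sanity check, the FFT-based result can be compared with direct convolution. This is a small sketch using the same `x` and `h` as above; the names `y_fft`, `y_direct`, and `max_err` are introduced here only for illustration.

```matlab
x = [1 -1 0 4];
h = [1 0 -1];
N = length(x) + length(h) - 1;    % full convolution length
Np = 2^nextpow2(N);               % power-of-two transform length

y_fft = ifft(fft(x,Np) .* fft(h,Np));
y_fft = real(y_fft(1:N));         % inputs are real; drop any round-off imaginary part

y_direct = conv(x,h);             % direct (time-domain) convolution
max_err = max(abs(y_fft - y_direct))
```

The difference should be at the level of floating-point round-off.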

More than 20 years ago now, I integrated the FFTW library into MATLAB. FFTW, or “Fastest Fourier Transform in the West,” won the 1999 J. H. Wilkinson Prize for Numerical Software. I spent a long time deep in the FFTW documentation at the time, and this statement caught my attention: “The standard FFTW distribution works most efficiently for arrays whose size can be factored into small primes (2, 3, 5, and 7), and otherwise it uses a slower general-purpose routine.”

This observation suggests the interesting possibility of using transform lengths whose only prime factors are 2, 3, 5, and 7, instead of using only transform lengths that are powers of 2.

Here is a function that computes a transform length based on this idea.

```
function np = transformLength(n)
    np = n;
    while true
        % Divide np evenly, as many times as possible, by the factors 2, 3,
        % 5, and 7.
        r = np;
        for p = [2 3 5 7]
            while (r > 1) && (mod(r, p) == 0)
                r = r / p;
            end
        end
        if r == 1
            % If the result after the above divisions is 1, then we have found
            % the desired number, so break out of the loop.
            break;
        else
            % np has one or more prime factors greater than 7, so try the
            % next integer.
            np = np + 1;
        end
    end
end
```

Suppose `n` is 37, a prime number. The next power of 2 is 64. The next number whose largest prime factor is 7 or less, on the other hand, is 40.

```
n = 37;
np = transformLength(n)
```

```
np = 40
```

```
factor(np)
```

```
ans = 1x4
2 2 2 5
```

To see how different padding strategies affect performance, I will set up some different FFT-based 2-D convolution functions and measure their execution time for different input sizes. Here is a function that zero-pads only to the minimum amount required, $K+L-1$.

```
function C = conv2_fft_minpad(A,B)
    [K1,K2] = size(A);
    [L1,L2] = size(B);
    N1 = K1 + L1 - 1;
    N2 = K2 + L2 - 1;
    C = ifft2(fft2(A,N1,N2) .* fft2(B,N1,N2));
end
```

Let’s measure the time required to convolve a 1170x1170 input with a 101x101 input. The output will be 1270x1270 (since $1170+101-1=1270$), and 1270 has prime factors 2, 5, and 127.

```
A = rand(1170,1170);
B = rand(101,101);
f_minpad = @() conv2_fft_minpad(A,B);
timeit(f_minpad)
```

```
ans = 0.0329
```

Here is a function that zero-pads to the next power of 2. (The next power of 2 above 1270 is 2048.)

```
function C = conv2_fft_pow2pad(A,B)
    [K1,K2] = size(A);
    [L1,L2] = size(B);
    N1 = K1 + L1 - 1;
    N2 = K2 + L2 - 1;
    N1p = 2^nextpow2(N1);
    N2p = 2^nextpow2(N2);
    C = ifft2(fft2(A,N1p,N2p) .* fft2(B,N1p,N2p));
    C = C(1:N1,1:N2);
end
```

Repeat the timing experiment using this second padding method:

```
f_pow2pad = @() conv2_fft_pow2pad(A,B);
timeit(f_pow2pad)
```

```
ans = 0.0553
```

So, power-of-2 padding is not helping in this case. The fact that power-of-2 padding is slower than using a transform length with relatively high prime factors represents a sea change from the way FFT computations in MATLAB performed prior to 2004, which is when MATLAB started using FFTW.

Can small-primes padding do better?

```
function C = conv2_fft_smallprimespad(A,B)
    [K1,K2] = size(A);
    [L1,L2] = size(B);
    N1 = K1 + L1 - 1;
    N2 = K2 + L2 - 1;
    N1p = transformLength(N1);
    N2p = transformLength(N2);
    C = ifft2(fft2(A,N1p,N2p) .* fft2(B,N1p,N2p));
    C = C(1:N1,1:N2);
end
```

Run the timing again with the small-primes pad method:

```
f_smallprimespad = @() conv2_fft_smallprimespad(A,B);
timeit(f_smallprimespad)
```

```
ans = 0.0230
```

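Yes. For this case, the convolution function using the small-primes method runs about 1.4 times as fast as minimum padding (0.0230 s versus 0.0329 s) and about 2.4 times as fast as power-of-2 padding (0.0553 s).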
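Speed only matters if the padded computation still gives the right answer, so here is a small correctness check against `conv2`. It is a self-contained sketch: the input sizes are arbitrary small values chosen for illustration, and the transform lengths are found with an inline version of the repeated-division loop rather than by calling `transformLength`.

```matlab
A = rand(37,41);
B = rand(5,7);
N = size(A) + size(B) - 1;     % full output size, [41 47]

% Find the small-primes transform length for each dimension.
Np = N;
for d = 1:2
    while true
        r = Np(d);
        for p = [2 3 5 7]
            while mod(r,p) == 0
                r = r / p;
            end
        end
        if r == 1
            break              % only factors 2, 3, 5, and 7 remain
        end
        Np(d) = Np(d) + 1;
    end
end

C_fft = real(ifft2(fft2(A,Np(1),Np(2)) .* fft2(B,Np(1),Np(2))));
C_fft = C_fft(1:N(1),1:N(2));  % crop back to the full convolution size

C_ref = conv2(A,B);
max_err = max(abs(C_fft(:) - C_ref(:)))
```

Both 41 and 47 are prime, so the padded lengths come out as 42 and 48, and the cropped result should agree with `conv2` to within round-off.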

Now let’s compare the results for a range of sizes.

```
nn = 1100:1200;
t_minpad = zeros(size(nn));
t_pow2pad = zeros(size(nn));
t_smallprimespad = zeros(size(nn));
for k = 1:length(nn)
    n = nn(k);
    A = rand(n,n);
    t_minpad(k) = timeit(@() conv2_fft_minpad(A,B));
    t_pow2pad(k) = timeit(@() conv2_fft_pow2pad(A,B));
    t_smallprimespad(k) = timeit(@() conv2_fft_smallprimespad(A,B));
end
```

Plot the results.

```
plot(nn,t_minpad,nn,t_pow2pad,nn,t_smallprimespad)
ax = gca;
ax.YLim(1) = 0;
legend(["min padding","power-of-2","small-primes"],...
Location="southeast")
title("Execution time, nxn convolution with 101x101")
xlabel("n")
ylabel("time (s)")
grid on
```

While this is admittedly not anything like a comprehensive performance study, it does suggest that:

- There is no longer a reason to do power-of-2 padding for 2-D convolution problems.
- Small-primes padding is almost always faster than minimum-length padding, even when considering the extra steps needed (computing the transform length and cropping the output of `ifft2`). In a few cases, it is about the same or just slightly worse.

In my next post in this series, I plan to investigate a possible way to speed up the computation of the small-primes transform length.

This post is the first in a short series that will present an implementation of FFT-based convolution that is faster than what is typically done in MATLAB. The improvement is achieved by using a different *zero-padding strategy* than what is commonly used.


If the vector `x` has $k$ elements, then `fft(x)` computes the $k$-point FFT. The output has the same length as the input. For example:

```
x = [1 2 3];
fft(x)
```

```
ans = 1x3 complex
6.0000 + 0.0000i -1.5000 + 0.8660i -1.5000 - 0.8660i
```

The `fft` function has another syntax, `fft(x,n)`. With this syntax, `fft` computes the $n$-point FFT. Typically, this syntax has the effect of *zero-padding* `x` so that it has length $n$ and then computing the $n$-point FFT of the zero-padded vector. For example:

```
X1 = fft(x,5)
```

```
X1 = 1x5 complex
6.0000 + 0.0000i -0.8090 - 3.6655i 0.3090 + 1.6776i 0.3090 - 1.6776i -0.8090 + 3.6655i
```

That computation is equivalent to:

```
X2 = fft([x 0 0])
```

```
X2 = 1x5 complex
6.0000 + 0.0000i -0.8090 - 3.6655i 0.3090 + 1.6776i 0.3090 - 1.6776i -0.8090 + 3.6655i
```

```
isequal(X1,X2)
```

```
ans = logical
1
```
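The word "typically" matters here. When `n` is smaller than the length of `x`, `fft(x,n)` truncates `x` to its first `n` elements instead of padding it. A quick sketch (the variable names `X3` and `X4` are introduced here for illustration):

```matlab
x = [1 2 3];
X3 = fft(x,2);        % n is smaller than length(x), so x is truncated to [1 2]
X4 = fft(x(1:2));     % explicit truncation
isequal(X3,X4)
```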

One application of zero-padded FFTs is using FFTs to compute convolution. If the vector `x` has $K$ points and the vector `h` has $L$ points, then the convolution of `x` and `h` has $K+L-1$ points. FFTs can be used to compute this convolution, but **only** if `x` and `h` are zero-padded to have **at least** $K+L-1$ points in the FFT computation. The code would look something like this:

```
x = [1 -1 0 4];
h = [1 0 -1];
K = length(x);
L = length(h);
N = K + L - 1
```

```
N = 6
```

```
X = fft(x,N); % Computes N-point zero-padded FFT
H = fft(h,N); % Computes N-point zero-padded FFT
y = ifft(X .* H)
```

```
y = 1x6
1.0000 -1.0000 -1.0000 5.0000 -0.0000 -4.0000
```
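To see why the "at least $K+L-1$" condition matters, here is a sketch of what happens with a transform length that is too short. The length 4 and the names `n_short`, `y_short`, `y_full`, and `y_wrapped` are chosen here for illustration. With too little padding, the FFT product computes a circular convolution, so the tail of the full result wraps around and is added to the first samples.

```matlab
x = [1 -1 0 4];
h = [1 0 -1];
n_short = 4;                       % shorter than K + L - 1 = 6
y_short = real(ifft(fft(x,n_short) .* fft(h,n_short)));

y_full = conv(x,h);                % full convolution, 6 samples

% Model the wraparound: samples 5 and 6 of the full result are added to
% samples 1 and 2.
y_wrapped = y_full(1:n_short);
y_wrapped(1:2) = y_wrapped(1:2) + y_full(5:6);

max(abs(y_short - y_wrapped))
```

The short-transform output matches the wrapped version of the full convolution, not its first four samples.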

You can zero-pad **more** and still get the same result (except perhaps for some floating-point round-off differences). The code below zero-pads to length 10 instead of 6:

```
X = fft(x,10);
H = fft(h,10);
y = ifft(X .* H);
y = y(1:6) % Extract the first 6 elements
```

```
y = 1x6
1.0000 -1.0000 -1.0000 5.0000 0.0000 -4.0000
```

When I see MATLAB implementations of FFT-based convolution, I typically see one of these two implementation strategies:

- Use $n=K+L-1$ as the transform length.
- Use the smallest power of two that is greater than or equal to $K+L-1$ as the transform length.

I don’t use either method. I have a third way of picking the zero-padded transform length.

Next time, I’ll explain this third method and why I think it is generally better.

If you’d like a preview, check out my new *FFT Transform Length* submission on the File Exchange. (File Exchange link, GitHub link)

As I considered what the code might look like, I realized that I would need a utility function related to extracting a subarray from an array of arbitrary dimension. And I thought that would be worth another File Exchange submission.

That got me to thinking about how I wanted to do File Exchange submissions, since the procedure would be somewhat different from what I did way back when I still worked for MathWorks (12 days ago). I recalled that there is a document, “MATLAB Toolbox Best Practices,” authored by my MathWorks colleagues, that recommended a folder and file structure for a MATLAB “toolbox” that was intended to be hosted on GitHub (which was my plan).

In this context, what is a *toolbox*? The document explains it this way:

We use the term “toolbox” here to mean a collection of reusable MATLAB code that you want to share with other people. Toolboxes contain not just code files, but also data, apps, tests, and examples. Some toolboxes will be just a few files; others may be extensive and represent years of effort by multiple people. The guidelines we present can be adapted for toolboxes that are large or small, casual or sophisticated.

I wanted to try following the recommended practices, at least for the kind of projects I initially had in mind. As the document says: “The guidelines we present can be adapted for toolboxes that are large or small, casual or sophisticated.”

The document is fairly lengthy, and it covers a wide variety of possible “toolboxes” and their contents. Combining several of the recommendations results in a set of folders and files that might look something like this.

```
arithmetic/
│   .gitattributes
│   .gitignore
│   README.md
│   license.txt
│   toolboxPackaging.prj
├───buildUtilities/
├───release/
│       Arithmetic Toolbox.mltbx
├───tests/
│       testAdd.m
└───toolbox/
    │   add.m
    │   functionSignatures.json
    │   gettingStarted.mlx
    ├───+describe/
    │       add.m
    ├───apps/
    │       arithmetic.mlapp
    ├───examples/
    │       usingAdd.mlx
    └───internal/
        │   addLiveTask.m
        │   intToWord.m
        └───resources/
                liveTasks.json
```

This is more than what I needed for the simple File Exchange submissions that I planned. Even a simplified organization, though, seemed like a lot of steps to me. So my software developer instincts kicked in, and I decided to try automating it. As I learned during my MathWorks career, a set of recommended practices is much more likely to take hold if it can be automated.

After some experimentation, I revised and simplified the structure to suit my needs as follows:

- Changed `license.txt` to `LICENSE.md`.
- Changed to use `matlab.addons.toolbox.packageToolbox` instead of using a packaging file, `toolboxPackaging.prj`, because `packageToolbox` is easier to automate.
- Put the build utilities at the top level, instead of using a `buildUtilities` subfolder, to make it easier to use `buildtool`.
- Eliminated the namespace, apps, and internal subfolders.
- Used function argument validation, and the automatic tab completion it provides, instead of `functionSignatures.json`.
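To illustrate the last item, here is a hypothetical `add.m` that relies on an `arguments` block. This is only a sketch and not the stub that `inittbx` generates; the `Rounding` option and its allowed values are invented for the example. Because the allowed values are declared with `mustBeMember`, the MATLAB editor can offer them as completions without a separate `functionSignatures.json` file.

```matlab
function c = add(a,b,options)
% ADD Add two numbers, optionally rounding the result.
% Hypothetical example for illustration; not the code generated by inittbx.
    arguments
        a (1,1) double
        b (1,1) double
        % Invented option, used only to show validation and completion.
        options.Rounding (1,1) string ...
            {mustBeMember(options.Rounding,["none","nearest"])} = "none"
    end

    c = a + b;
    if options.Rounding == "nearest"
        c = round(c);
    end
end
```

A call such as `add(1.2,2.4,Rounding="nearest")` would then be validated automatically, and an unsupported value for `Rounding` would produce an error before the function body runs.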

I wrote one function to make all this happen, and I called it `inittbx`. It is used this way:

```
inittbx("add")
cd add
buildtool
```

```
** Starting check
Analysis Summary:
Total Files: 3
Errors: 0 (Threshold: 0)
Warnings: 0 (Threshold: Inf)
** Finished check
** Starting test
.
Test Summary:
Total Tests: 1
Passed: 1
Failed: 0
Incomplete: 0
Duration: 0.00061558 seconds testing time.
** Finished test
** Starting package
** Finished package
```

Those steps create the following:

```
add/
│   .gitattributes
│   .gitignore
│   buildfile.m
│   CHECKLIST.md
│   LICENSE.md
│   packageToolbox.m
│   README.md
│   toolboxOptions.m
├───release/
│       add Toolbox.mltbx
├───tests/
│       add_test.m
└───toolbox/
    │   add.m
    │   gettingStarted.mlx
    └───examples/
            HelpfulExample.mlx
```

The created files are mostly just stubs. The file `CHECKLIST.md` reminds me of all the steps needed to complete the job.

```
# Checklist for Completing Toolbox
After calling `inittbx` to initialize your toolbox folder hierarchy,
you can use this checklist as a reference for making the additional
changes that are needed. After you have made these changes, you can
delete the checklist file.
- [ ] Initialize Git repository and commit the initial files produced
by `inittbx`.
- [ ] Edit README.md. Include a brief toolbox summary, installation
instructions, and a pointer to gettingStarted.mlx. Optionally, follow
with information for toolbox collaborators.
- [ ] Edit LICENSE.md. The license file helps everyone understand how
to use, change, and distribute the toolbox. If you do not provide a
license, then normal copyright rules will prohibit others from
copying, distributing, or modifying your work. If you accept
repository contributions from others, then your own use of the
repository may also become restricted.[^1] Avoid writing your own
license text. It is better to choose an existing license that meets
your needs. See https://choosealicense.com for help choosing a
license.
- [ ] Review and revise the options in toolboxOptions.m, especially
`ToolboxName` and `ToolboxVersion`.
- [ ] Replace the stub code in the `toolbox` folder with your own.
- [ ] Revise the test file in `tests` to test your own code.
- [ ] Revise `gettingStarted.mlx`. This file introduces your toolbox
and illustrates the primary workflows. It should link to any examples
that you create.
- [ ] Add one or more example files to the `toolbox/examples` folder.
- [ ] Add an "Open in MATLAB Online" badge using [this
tool](https://www.mathworks.com/products/matlab-online/git.html)
- [ ] If your GitHub repository has been linked to a MATLAB Central
File Exchange submission, then add a File Exchange badge to your
README.md. See the instructions at the top of your GitHub-linked File
Exchange submission.
```

This toolbox initialization helper, `inittbx`, is now available on File Exchange as well as on GitHub. I prepared the submission with the help of `inittbx`, of course.

Don’t worry, I will get back to the FFT padding question soon.

In the course of working on `inittbx`, I came across two issues that I would like MathWorks to address. One is a workflow gap, and one is a needed document clarification. I have sent these issues to MathWorks support.

The function `matlab.addons.toolbox.packageToolbox` can take a `ToolboxOptions` input, and that is the syntax I use for automating the toolbox packaging. Initializing that input requires that you provide a unique identifier for the toolbox. Furthermore:

If you want to share your toolbox on MATLAB File Exchange, identifier must follow the RFC 4122 specification which defines a Uniform Resource Name namespace for UUIDs (Universally Unique Identifier). For more information, see https://www.rfc-editor.org/info/rfc4122.

However, as far as I know, MATLAB does not provide a documented way to generate a UUID that conforms to this requirement. The documentation examples for `ToolboxOptions` make this gap fairly obvious.
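One possible workaround is to build a version 4 (random) UUID by hand, following the layout in RFC 4122. The sketch below is an assumption-laden illustration, not a MathWorks-provided method and not necessarily what `inittbx` does. It uses `randi`, which is not a cryptographically strong source of randomness, and the variable names are invented for the example.

```matlab
% Sketch: generate an RFC 4122 version 4 UUID from 16 random bytes.
% Assumption: randi is an acceptable source of randomness for this purpose.
bytes = randi([0 255],1,16);

% Byte 7 (1-based): set the high nibble to 0100 to mark version 4.
bytes(7) = bitor(bitand(bytes(7),15),64);

% Byte 9 (1-based): set the two high bits to 10 to mark the RFC 4122 variant.
bytes(9) = bitor(bitand(bytes(9),63),128);

% Convert to 32 lowercase hex digits, then insert hyphens in 8-4-4-4-12 form.
hex = lower(reshape(dec2hex(bytes,2).',1,[]));
uuid = [hex(1:8) '-' hex(9:12) '-' hex(13:16) '-' hex(17:20) '-' hex(21:32)]
```

The resulting character vector could then be used as the identifier when constructing the `ToolboxOptions` object.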

I learned a lot about `buildtool` in this project. You can define a set of named tasks that `buildtool` can perform. One way to define a task’s actions is to provide one or more function handles. The documentation should mention that these function handles must accept an input argument, and it should explain what that input argument is.
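For illustration, here is a minimal `buildfile.m` sketch with one task whose action is a function handle. The task name `hello` and its message are invented for the example. My understanding is that the input passed to the action is a `matlab.buildtool.TaskContext` object that gives access to the task being run; treat that as an assumption to check against the documentation for your release.

```matlab
function plan = buildfile
% Hypothetical build file for illustration only.
    plan = buildplan;

    % The action takes one input. It is assumed here to be a task context
    % object whose Task property describes the running task.
    plan("hello") = matlab.buildtool.Task( ...
        Description="Print the name of the running task", ...
        Actions=@(context) disp("Running task: " + context.Task.Name));

    plan.DefaultTasks = "hello";
end
```

With this file in the current folder, calling `buildtool` with no arguments would run the `hello` task.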

Now, I spend much of my time studying and performing on the French horn, usually in amateur orchestras. I play in the Melrose Symphony and the Concord Orchestra. I also write about my musical path at my *Horn Journey* blog. And, I am a member of the Board of Directors, as well as Technology Advisor, for Cormont Music, a nonprofit organization that sponsors the annual Kendall Betts Horn Camp in New Hampshire.

Here is a recording of the Concord Orchestra performing Beethoven’s Coriolan Overture in November 2023.

I am also continuing to work with MATLAB and image processing, and I am continuing to write about those things, because I still enjoy them. That’s what this website, *Matrix Values*, is about—MATLAB, image processing, and anything else related that happens to catch my interest. You might think of it as a continuation of *Steve on Image Processing*. Only, now it is my hobby, an avocation, not my day job.

If this sounds interesting to you, then I invite you to follow along. You can sign up for the mailing list, or use the RSS feed, or follow me on LinkedIn. Together, we’ll see where this goes.
