Celebrating Pi Day on the Commander X16

Guess what day it is, will be, or was! That’s right — the 14th of March, which can be written as 3.14 if you have trouble writing dates. In other words, π.

To celebrate this momentous occasion, I have created a pi generator (or, more accurately, adapted an existing one that I don’t fully understand).

From NES to X16

The idea came to me when I saw this video by NESHacker, in which he generates pi on the Nintendo Entertainment System. Since the NES uses a 6502 CPU and the X16 uses a 65C02 (slightly better but pretty much the same thing) there was no reason to think it wouldn’t also be up to the challenge.

This uses a spigot algorithm (by Stanley Rabinowitz and Stan Wagon) and doesn’t require any floating-point maths — good news for the 6502, which doesn’t have instructions for much more than integer addition, subtraction and the usual bitwise operations. Each digit is produced at a constant rate, giving something to look at while it works away.
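For anyone who wants to see the shape of it, here is a minimal Python sketch of the Rabinowitz and Wagon spigot as it is usually presented. It is not the code from 4800Pi or from NESHacker’s version; the function name `spigot_pi` and the array length of `10 * n // 3 + 1` are assumptions taken from the common textbook form.

```python
def spigot_pi(n):
    """Run n passes of the spigot and return the digits produced (3, 1, 4, ...).

    Sketch only: the last digit or two can be unreliable, and digits still
    being held back as a run of 9s when the loop ends are not emitted.
    """
    length = 10 * n // 3 + 1      # assumed array size (textbook formula)
    a = [2] * length              # pi in mixed radix: every "digit" starts at 2
    out = []
    predigit = 0                  # last digit, held back in case of a carry
    nines = 0                     # count of 9s held back after predigit

    for _ in range(n):
        q = 0
        # Multiply the whole array by 10, carrying from right to left.
        for i in range(length, 0, -1):
            x = 10 * a[i - 1] + q * i
            q, a[i - 1] = divmod(x, 2 * i - 1)
        a[0] = q % 10
        q //= 10                  # q is now the next decimal digit (or 10)

        if q == 9:
            nines += 1            # might still be bumped by a later carry
        elif q == 10:
            out.append(predigit + 1)   # the carry ripples through the 9s
            out.extend([0] * nines)
            predigit, nines = 0, 0
        else:
            out.append(predigit)
            out.extend([9] * nines)
            predigit, nines = q, 0

    out.append(predigit)
    return out[1:]                # drop the placeholder 0 emitted first
```

Only integer multiplication, division and remainder appear in the loop, which is the whole appeal on a 6502.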

There are a few ways in which our implementations differ. Mine has fewer features, for a start: none of this menu business or a spinner animation for each digit. In fact, interrupts are disabled in 4800Pi so that the CPU has as much time as possible to do its calculations, since it’s slow enough as it is!

Another difference is that mine supports up to 4800 digits (4800 characters, 4799 digits, 4798 decimal places… I’m just going to say “digits” because it’s easier) whereas the NES version was designed for 960. On paper, this doesn’t seem like a big deal, but it means that the program has to cope with bigger numbers. For instance, the left variable goes up to 64,000 when generating 960 digits, which fits into 16 bits nicely, but for 4800 digits it goes up to 320,020, which needs 19 bits. So a 24-bit multiplication routine had to be introduced.
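To put numbers on that, here is a small Python sketch that checks the bit widths and shows the general idea of a shift-and-add multiply truncated to 24 bits. The helper names `bits_needed` and `mul24` are made up for illustration, and this is not the Codebase 64 routine that 4800Pi is based on, just the same broad technique.

```python
def bits_needed(value):
    """Number of bits required to hold a non-negative integer."""
    return value.bit_length()


def mul24(a, b):
    """Shift-and-add multiply, keeping only the low 24 bits of the result.

    Hypothetical helper: mimics what a three-byte routine on an 8-bit CPU
    does, one bit of b at a time.
    """
    mask = 0xFFFFFF
    result = 0
    while b:
        if b & 1:                      # low bit of b set: add the shifted a
            result = (result + a) & mask
        a = (a << 1) & mask            # shift a left for the next bit
        b >>= 1
    return result


print(bits_needed(64_000))    # 16, fits in two bytes
print(bits_needed(320_020))   # 19, needs a third byte
```

Since the 6502 works a byte at a time, anything past 16 bits may as well be rounded up to 24.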

This also ends up consuming more memory: about 32K for 4800 digits. That still fits into low RAM, with enough wiggle room for the program and its variables to be quite a bit larger than they are. The additional RAM banks remain unused.
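The 32K figure can be sanity-checked with a rough Python sketch. It assumes the usual spigot array length of `10 * digits // 3 + 1` and two bytes per entry; both are my assumptions for illustration, and `spigot_memory_bytes` is a made-up name.

```python
def spigot_memory_bytes(digits, bytes_per_entry=2):
    """Estimate the working array size for a given digit count.

    Assumes the textbook array length and 16-bit entries.
    """
    entries = 10 * digits // 3 + 1
    return entries * bytes_per_entry


print(spigot_memory_bytes(4800))  # 32002
print(spigot_memory_bytes(960))   # 6402
```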

I say “up to 4800 digits” because it can also generate fewer. Depending on the screen mode chosen, generation times can range from several minutes to several hours. Yep, the amount of time required doesn’t scale linearly with the number of digits to generate.
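A back-of-the-envelope Python sketch shows why. Every digit needs one pass over the whole array, and the array itself grows with the digit count, so the work is roughly quadratic. The array length formula and the two example digit counts (4800 for an 80×60 screen, 1200 for 40×30) are assumptions for illustration, and `inner_steps` is a made-up name.

```python
def inner_steps(digits):
    """Count inner-loop iterations: one full array pass per digit."""
    length = 10 * digits // 3 + 1   # assumed textbook array length
    return digits * length


big = inner_steps(4800)     # e.g. an 80x60 screen
small = inner_steps(1200)   # e.g. a 40x30 screen
print(big, small, big / small)
```

Four times the digits works out to roughly sixteen times the inner-loop work.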

Despite NESHacker’s NES assembly being open source, I didn’t end up using anything from it. The code is based upon his JavaScript implementation and the multiplication/division subroutines are based upon routines hosted at Codebase 64.

Size matters

To make up for it being rather pointless, I focused my efforts on reducing the size of the resulting PRG. Such optimisations include:

  • Heavily utilising the zeropage
  • Having constants wherever possible
  • Doing OR 1 to add 1 to an even 16-bit number
  • Using inline assembly to get the x and y registers directly from the SCREEN KERNAL call instead of calling it twice via txt.width() and txt.height()
  • Using chrout instead of print_ub to print numbers by adding 48 to convert 0-9 to the correct PETSCII codes
  • Bothering the author of Prog8 to make the compiler better
  • Gutting some of the unnecessary stuff Prog8 puts in the assembly and BASIC stub
  • Using the 24-bit multiplication and division routines everywhere instead of letting Prog8 insert additional ones, such as 16-bit versions (24-bit is required most of the time anyway, so the performance impact is practically zero)
  • Adapting the 24-bit multiplication and division routines I found to work with smaller inputs (multiplying 16 and 8-bit values; dividing by a 16-bit value)
  • Stopping CHROUT from scrolling the screen when printing to the bottom-right corner by raising nlines (address $0387) instead of faffing about with VPOKE
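Two of the tricks in that list are easy to show in a few lines of Python. This is only an illustration of the arithmetic, not the Prog8 source, and the function names are made up.

```python
def add_one_to_even(n):
    """For an even number the lowest bit is 0, so OR 1 equals adding 1."""
    assert n % 2 == 0, "the trick only works on even numbers"
    return n | 1


def digit_to_petscii(d):
    """Digits 0-9 sit at codes 48-57 in PETSCII (same as ASCII)."""
    assert 0 <= d <= 9
    return d + 48


print(add_one_to_even(32000))    # 32001
print(digit_to_petscii(7))       # 55, the code for "7"
```

On the CPU, the OR touches only the low byte and can never carry into the high one, which is where the saving comes from.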

I’m not too familiar with 65(C)02 assembly, or any assembly for that matter, so much of it is written in Prog8, a high-level language for the Commander X16 and Commodore 64. I’m confident that more could be done to reduce its size further, but it’s at the point where I don’t know how.

Currently it’s sitting at 697 bytes, which I’m very happy with since there’s no way 4800 digits could fit in that amount of space. At best you’d be looking at around 2000 bytes for the data alone, since ten roughly equally likely digits need about 3.3 bits each even with some sort of Huffman approach. If you’d like to see if smaller is possible (without feature sacrifices) it’s open source on GitHub.

Other notes

It wouldn’t be much of a challenge to bring this to other Prog8-supported platforms. On the Commodore 64, this could use the same memory space even if it were to also do 4800 digits (though only 1000 would fit on-screen with the C64; 2000 on the C128 in 80-column mode). I just don’t have much incentive to, especially since someone will have done a better job there.

I saw a few of those Monte Carlo pi approximation programs in my research, which mostly aren’t very interesting, but I like the one in More than 32 Basic Programs for the VIC 20 Computer. It turns the concept into a bizarre darts simulation, which fails to approximate it as 3.14 in the provided screenshot.
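For reference, the darts idea boils down to something like this Python sketch. It is not the VIC 20 listing from the book, just the general technique: throw random points at a unit square and count how many land inside the quarter circle. The function name and seed are arbitrary choices of mine.

```python
import random


def monte_carlo_pi(throws, seed=314):
    """Estimate pi from the fraction of random points inside a quarter circle."""
    rng = random.Random(seed)     # fixed seed so runs are repeatable
    hits = 0
    for _ in range(throws):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:  # the dart landed inside the quarter circle
            hits += 1
    return 4 * hits / throws


print(monte_carlo_pi(100_000))
```

It converges painfully slowly, which goes some way to explaining a screenshot that doesn’t even manage 3.14.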

Going by litwr2’s pi spigot benchmark page, 4800Pi is excruciatingly slow, even next to 1MHz hardware. It appears they traded executable size for speed, utilising lookup tables or something or other to help the 6502 along. Speed wasn’t really the goal with 4800Pi: if the program is large, I feel like you might as well just jam a giant PRINT “3.14159…” in there. It would be nice if it were faster, though.

There’s a BASIC V2 implementation of the algorithm at Rosetta Code, which runs just fine on the Commander X16. It’s both slower and larger than 4800Pi — not really surprising, given the fact it’s written in BASIC, but still an uplifting note to end on.

Oh, and famous internet personality SlithyMatt let it run overnight on his X16 — not emulated! Even though this is just a silly little thing that was unlikely to suffer any issues from emulation inaccuracies, it’s still nice to see.

Last year’s adventures

This isn’t my first venture into the world of Commander X16. I’d seen The 8-Bit Guy’s older videos on the project but had frankly forgotten all about it until he released the update/announcement video in October 2022.

That same month I released TADA.PRG, also written in Prog8, which is nothing special — it just plays the tada.wav startup sound from Windows 3.1.

Apart from that, I’ve been playing around with doing more exciting things on the X16 on and off, such as bringing a couple of my existing projects to it, but have yet to have anything even close to ready for release.

It really is new territory for me, all this X16 business. I follow YouTube channels about retro technology as a whole, but when it comes to ’80s computing I have near-zero experience. I dabbled in Commodore 64 development at one point but didn’t get very far.

Anyway, I hope to continue making X16 stuff. It’s a niche market, but that also means less competition, so it all balances out. I won’t be getting a real unit anytime soon, rather opting to wait until at least Gen-2 when it becomes more affordable.


Comment on this article at itch.io

See the follow-up post: Faster 4800Pi