Falvotech.
Conquering the world is easy — what do you do with it afterwards?

L I N K S


⇐ Soapbox: KFC: Heart Attack in a Wrapper

⇒ An Opinion Offering Arguments Against Boycotting British Petroleum


Atari BASIC vs. Python: Let's Be Fair About This
Samuel A. Falvo II
kc5tja -at- arrl.net
2010 Apr 25 10:35 PDT

James Hague posted a blog article boasting about how much faster Python executes code over Atari BASIC, introduced (at the time of the blog posting) some 25 years prior. He claims Python achieves 108 000 times faster run-time performance than Atari's BASIC implementation. I find this utterly disingenuous, because James completely fails to compensate for the hardware on which these language implementations are running. Without considering the hardware implementations used, all James seems to succeed in accomplishing is increasing the entropy of the universe.

We cannot discern from James' article what hardware he's sicking against the humble Atari. We only know that he used hardware that's a generation back--no i7 processor or anything like that. Considering that James published this article in November 2009, we must conclude he's running Python on a Pentium IV processor, most likely at the 1.8GHz range.

What do we know about the differences between a 1.79MHz 8-bit 6502 and a 1.8GHz 32-bit Pentium IV? First, it's (1 800 000 000 ÷ 1 790 000 =) 1005 and some change times faster in bus fetch rate. This means that the CPU can accomplish cache fetches 1005 times faster than the 6502 can from RAM. But, wait a minute!, I hear you scream — you cannot compare cache hits against a 6502's RAM hits! Sure I can, because the 6502 fetches from external RAM in a single clock cycle. Yes, that's right — unlike the Zilog Z80 and anything at all from Intel until the 80486 integrated caches on-die, the 6502 can hit memory productively in a single clock cycle. So, since cache hits are also single-cycle, this isn't just fair, it's only meaningful comparison in fetch performance you can make!1

Next, the 6502 takes an average of 3.5 clock cycles to execute instructions, since there'll be a healthy mix of 2-, 3-, 4-, and 5-cycle latency instructions in the dynamic mix. Last but not least, the Pentium IV has no less than four integer pipes (two load/store, two ALU) which can run concurrently. These don't just add up, they multiply, yielding an instruction execution engine 1005 × 3.5 × 4 = 14 070 times faster than the 6502's. And, remember, I'm making conservative estimates. Note that I'm ignoring the performance gains offered through out-of-order and/or speculative instruction execution, branch prediction, et. al. I'm just looking at straight-ahead, "dumb" instruction execution benefits made available through the architecture.

The Pentium IV also operates with a 32-bit wide bus and instruction set, while the 6502 makes do with a paltry 8-bit bus and instruction set. This provides another factor of four (estimated) improvement in performance (14 070 × 4 = 56 280). The wider bus and instruction set allows the Pentium IV to execute fewer instructions to arrive at a mathematical solution than the 6502.

So, really, we're not looking at 108 000 times faster performance, contrary to James' assertion. We're actually looking at a performance benefit of only 108 000 ÷ 56 280 = 1.9. In other words, if we took Atari BASIC as currently written, and place it on the exact same Pentium IV that James used to run his Python performance test, it would run no slower than 0.52 times the Python performance &mdash within a factor of two.

OK, you say, but Python is still faster! But, is it really?

You see, there's another factor that needs consideration: the BASIC used by Atari, Commodore, and Apple all were tokenized due to RAM constraints (30KB then versus 3.0GB today), which means that they encoded the raw source code of the program as tokens. None of them made any effort to actually parse the program until run-time.2 If you tried, you would need to keep two copies of your program in memory3 (remember, only 30KB available to hold user programs!), and you'd need more ROM to hold that part of the interpreter4 (remember, on a 64K address space, 16K for ROM, plus another 4K for OS variables, and another 4K to 8K for video memory). Using tokens, you can literally reconstruct the exact program listing by simply expanding the tokens using a brain-dead table-lookup5. In effect, programs were stored using LZ78 compression with a predetermined dictionary.

Python, in contrast, has a byte-code compiler, which actually parses high-level constructs and produces smaller, simpler byte-codes as its output. It does most of the interpretation work once, up-front, as the Python interpreter only works with these (now significantly simplified) byte-codes. There is no way to reconstruct a precise Python source listing from a bytecoded .pyc file, just as there is no way to reconstruct a Java source file, in its entirety, from a .class file.

The BASICs of old are only 1.9 times slower and performing a hell of a lot more work per program statement than Python will ever see for that same statement. This is because Atari (and other contemporary) BASIC has to interpret what is, in effect, raw program text every time. It just didn't have the RAM to do anything else. But, today, we do have the RAM resources, and this means, in no uncertain terms or conditions, that if BASIC used a pre-compilation phase just like Python, then BASIC would soundly kick Python's ass from here to the moon and back again. I will go so far as to bet you money that a BASIC interpreter made with contemporary interpreter techniques will be at least as fast as Forth, if not faster.6 James, I encourage you to try this experiment: port the Atari BASIC version of the seive into Microsoft Visual Basic, and run it on the same hardware.

So, given all this, which language do I prefer to work with? I still prefer to work with Python and Ruby despite their performance hits over BASIC. Why? BASIC has other productivity issues which makes it inferior to these languages (a list of which is surprisingly short, but nonetheless beyond the scope of this article). It's as simple as that. Performance simply isn't in that list, however. It never has been, and never will be.

To recap, you can't just go boasting about how fast your favorite language performs without also considering two other things: (1) raw hardware performance, and (2) how the interpreter (or compiler) is written to take advantage of the underlying hardware. Doing anything else just makes you look stupid, especially when, after equalizing all the environmental dependencies, we find the victim of your unabashed bullying actually turns out to be your total and utter humiliator. Sorry, James.


1  This is also why a 1.0MHz 6502 could trivially snow a 4.0MHz Z80A in runtime performance, too. You see, hardware really does matter after all.

2  Wozniak's Integer BASIC for the Apple computers had a quasi-smart parser which compiled different tokens for different broad forms of statements. As a single example, note that it had different tokens for PRINT depending on the data type of the first argument. However, despite this, ultimately, even Integer BASIC resorted to parsing program source token-by-token at run-time for more sophisticated program constructs.

3  Remember, once a high-level program is compiled into a lower-level representation, you lose information. Hence, you can never recover the original program source precisely. Thus, to prevent user surprises, you need the two copies in RAM. Why RAM? Because 25 years ago, people were still using cassettes as their primary means of storage, and unreliable floppy disks still cost hundreds of dollars.

4  Most computers back then used Microsoft BASIC, which consumed 11KB of ROM. Commodore, for example, split BASIC into two pieces: 8KB and another 4KB that sat in the KERNAL ROM. Apple's version of Microsoft BASIC came on disk. But Atari didn't have that luxury — they had to use their own version of BASIC because anything else would have exceeded the 8KB of ROM space they had left. Anything else would have increased the already dear price-tag of the machine, while compromising still more of the dear RAM left available for your programs!

5  In fact, Commodore's implementation of LIST works precisely this way.

6  Remember, BASIC is statically, not dynamically, typed. All that time Python spends doing dictionary lookups to resolve methods on objects just doesn't exist in BASIC.