Microsoft GW-Basic Interpreter Source Code (github.com/microsoft)
238 points by susam on March 7, 2022 | 97 comments


The GW-BASIC source code was published 2 years ago: https://devblogs.microsoft.com/commandline/microsoft-open-so...


Thanks! Discussed at the time:

The original source code of Microsoft GW-BASIC from 1983 - https://news.ycombinator.com/item?id=23266917 - May 2020 (266 comments)

Related:

GW-Basic Source Notes - https://news.ycombinator.com/item?id=23619590 - June 2020 (17 comments)

Converting GW-BASIC to the Z80 - https://news.ycombinator.com/item?id=23605821 - June 2020 (4 comments)

Help assemble the released GW-BASIC source code - https://news.ycombinator.com/item?id=23275225 - May 2020 (1 comment)

GW-Basic creator Greg Whitten on Joel Spolsky and other MS things - https://news.ycombinator.com/item?id=506466 - March 2009 (31 comments)


Ah. GW Basic. Memories.

On the very first day our household had a PC, they called "the guy". "The guy" was supposedly a PC guru, and he explained to me how to load a program in GW basic. And he put GW Basic in my autoexec.bat so it would launch automatically.

On day #2, I had gotten hold of some games on the school yard and naturally attempted to load them in GW basic: "LOAD a:\alleycat.exe". It didn't work well.

But I quickly saw the errors of my ways, and was soon whipping up really cool programs in GW Basic, and playing IBM Alley Cat in black, white, cyan and magenta.

Good times. Sometimes I think IT as a whole went down hill from there.


Oooh Alley Cat! That game was so good!…


I never got what to do in that room with the sleeping dogs :D Failed hard!

Good times.


Drink the milk!

Unfortunately the action key is Alt which does not play along well with browsers (they want to show the menu instead): https://archive.org/details/msdos_Alley_Cat_1984


  --------- ---- -- ---- ----- --- ---- -----
  COPYRIGHT 1975 BY BILL GATES AND PAUL ALLEN
  --------- ---- -- ---- ----- --- ---- -----
  
  ORIGINALLY WRITTEN ON THE PDP-10 FROM
  FEBRUARY 9 TO  APRIL 9 1975
  
  BILL GATES WROTE A LOT OF STUFF.
  PAUL ALLEN WROTE A LOT OF OTHER STUFF AND FAST CODE.


According to Paul Allen's book [1] about his time at Microsoft (admittedly a biased source), his particularly critical contribution was an 8088 emulator/simulator for the PDP-10. That allowed them to write and even interactively debug (if I remember right) BASIC for the 8088 on the PDP. It would've been hopeless to develop directly on a microcomputer; otherwise they would have had to write the code on the minicomputer, transfer the binary across, see if it worked, and iterate like that.

That contribution wouldn't show up directly in the BASIC source code, since the emulator wasn't part of BASIC itself.

[1] https://www.amazon.co.uk/Idea-Man-Memoir-Co-founder-Microsof...


Yes.

There's an even more extraordinary story about them travelling to demonstrate their implementation of BASIC to the MITS team.

They realise they have not written a bootloader for the Altair, and Allen writes out a bootloader on the plane, which works when they get to MITS.

https://en.wikipedia.org/wiki/Altair_BASIC#Origin_and_develo...

My first encounter with a real Microsoft BASIC wasn't for another 15 years; the interpreter-only QBASIC that came with Halvorson & Rygmyr's Learn BASIC Now, which was a great book I gave away and now feel the urge to repurchase for nostalgia's sake. I used it for my GCSE Computer Science project.


Writing an 8080 emulator on the -10 is not extraordinary, after all, it's a trivial instruction set.

What was extraordinary is Allen realizing he could do this as a shortcut.


An 8080 has what, 40 opcodes? All doing simple things like "add" and "mov".

https://altairclone.com/downloads/manuals/8080%20Programmers...


I learned on an 8080 system (Interact) that also had MS BASIC. Almost every one of the 256 opcodes is used on the 8080. That's not to say there were that many instructions. For example, a register-to-register move might be considered one instruction, but the source and destination registers are encoded into that one byte, so there are many reg-reg moves. Same with add, sub, etc. My father wrote a disassembler in BASIC to read out the ROM and he made his own mnemonics, which amounted to probably 20-40 actual instructions (I still have it and could look it up), but there were maybe 8 entries in the opcode table that did nothing.
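To make the encoding point concrete, here is a minimal Python sketch of how one instruction, the register-to-register MOV, fans out over many opcode bytes. The bit layout (01 DDD SSS) and the register codes are the standard 8080 ones; the function name `decode_mov` is made up for illustration and is not taken from any disassembler mentioned in this thread.

```python
# Register field codes used by the Intel 8080 ("M" means memory addressed by HL).
REGS = ["B", "C", "D", "E", "H", "L", "M", "A"]

def decode_mov(opcode):
    """Return 'MOV dst,src' for opcodes of the form 01 DDD SSS, else None."""
    if (opcode & 0xC0) != 0x40:   # top two bits must be 01
        return None
    if opcode == 0x76:            # 01 110 110 would be MOV M,M; the 8080 uses it for HLT
        return None
    dst = REGS[(opcode >> 3) & 7]  # bits 5..3 select the destination
    src = REGS[opcode & 7]         # bits 2..0 select the source
    return f"MOV {dst},{src}"

mov_opcodes = [op for op in range(256) if decode_mov(op) is not None]
print(len(mov_opcodes))    # 63 opcode bytes for a single instruction
print(decode_mov(0x78))    # MOV A,B
```

Counted this way, a quarter of the opcode table is a single mnemonic, which is how "nearly all 256 bytes used" and "a few dozen instructions" can both be true.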


https://pastraiser.com/cpu/i8080/i8080_opcodes.html

As a pragmatic matter, I don't regard having the register and/or addressing mode encoded into the opcode as a separate opcode.

Emulating a modern 64 bit processor would be a major chunk of work, but the old 8 bit ones are simple.
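As a rough illustration of how small the core of such an emulator can be, here is a Python sketch of a fetch-decode-execute loop for four 8080 instructions (MVI, MOV, ADD, HLT). It is a toy under stated assumptions: flags, memory operands, and every other instruction are left out, and the names `run` and `reg_name` are invented for this example. It says nothing about how Allen's PDP-10 simulator was actually written.

```python
REGS = ["B", "C", "D", "E", "H", "L", "M", "A"]  # 8080 register field codes

def reg_name(code):
    """Map a 3-bit register field to a name; the memory operand M is not modeled."""
    if REGS[code] == "M":
        raise NotImplementedError("memory operand (M) is outside this sketch")
    return REGS[code]

def run(program):
    """Execute a tiny 8080 subset and return the register file at HLT."""
    reg = {name: 0 for name in REGS if name != "M"}
    pc = 0
    while True:
        op = program[pc]
        pc += 1
        if op == 0x76:                      # HLT
            return reg
        if (op & 0xC7) == 0x06:             # 00 DDD 110: MVI r,d8
            reg[reg_name((op >> 3) & 7)] = program[pc]
            pc += 1
        elif (op & 0xC0) == 0x40:           # 01 DDD SSS: MOV dst,src
            reg[reg_name((op >> 3) & 7)] = reg[reg_name(op & 7)]
        elif (op & 0xF8) == 0x80:           # 10 000 SSS: ADD r (flags ignored here)
            reg["A"] = (reg["A"] + reg[reg_name(op & 7)]) & 0xFF
        else:
            raise NotImplementedError(hex(op))

# MVI A,2 ; MVI B,3 ; ADD B ; MOV C,A ; HLT
regs = run(bytes([0x3E, 0x02, 0x06, 0x03, 0x80, 0x4F, 0x76]))
print(regs["A"], regs["C"])   # 5 5
```

A usable emulator needs the flags, the stack, memory, and I/O on top of this, but the shape of the loop stays the same.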


If we exclude addressing mode then I think there are fewer than 26 instructions. Our mnemonic format was one letter (verb/op), one for source (a,b,c,d,e,h,l,w,x,y) and one for destination. I don't recall the letter for immediate, probably i, but MIA for move immediate to a doesn't ring a bell for me. :-)


Yes, around 8 spare opcodes, and they were the ones Zilog used to escape into a large set of extensions for the (backwardly compatible to 8080) Z80.


Can a Z80 be dropped into an 8080A socket? I'm thinking there were hardware differences.


No, absolutely not. From memory, the 8080 was a 3-chip solution. The 8085 was, like the Z80, a 40-pin DIP, but even here the compatibility was strictly software, not hardware. And in fact the software compatibility was a little less than 100%, because Zilog decided to add an overflow flag by changing the 8080/8085 parity flag to actually reflect overflow instead of parity on arithmetic instructions.
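A small Python sketch of where the two flag meanings diverge, assuming a plain 8-bit add. The helper `add8` and the variable names are made up for illustration; the only facts relied on are the ones stated above (parity of the result on the 8080, signed overflow on the Z80 for arithmetic).

```python
def add8(a, b):
    """Add two 8-bit values; return (result, parity_even, signed_overflow)."""
    result = (a + b) & 0xFF
    # Parity: True when the result has an even number of 1 bits.
    parity_even = bin(result).count("1") % 2 == 0
    # Signed overflow: both operands have a sign bit that differs from the result's.
    overflow = ((a ^ result) & (b ^ result) & 0x80) != 0
    return result, parity_even, overflow

result, parity_even, overflow = add8(0x7F, 0x01)
flag_8080 = parity_even   # 8080: the P flag reflects parity of the result
flag_z80 = overflow       # Z80: the P/V flag reflects overflow after arithmetic
print(hex(result), flag_8080, flag_z80)   # 0x80 False True
```

Code that branches on that flag after an ADD can therefore take different paths on the two CPUs for the same inputs.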


Interesting! Could you explain in a bit more detail how you used it for your CS project?


Oh I just mean, QBasic was the language I used for that project.

I can't actually remember it in detail now, but this was when I was 15 or 16. It was a database application; I guess mine would have been music lending.

I wonder if I still have the code. I definitely still wish I had the book.

I really enjoyed QBASIC because of the gentle way you were exposed to better programming practices; I've always admired that.

Five or six years later, my uni project was a BASIC-to-C transpiler for a simple dialect of BASIC as a teaching language with some specific language extensions for message-passing parallel programming. It had a simple IDE, too.

In some ways it is a shame BASIC itself is really gone; modern implementations only have a whisper of its original simplicity.


Python seems to have taken its place, especially with all school calculators having some variant of MicroPython.

It isn't the same thing though.


Yes -- I think sort of the most basic BASIC programming is done with Scratch, and then they graduate to Python.

In the UK I gather kids are taught Scratch and Python in an overlapping fashion, so they see the structure on the Scratch view and then have a better chance of understanding how the code works without being thrown off by the symbols.

This is probably better in a pedagogical sense, and I am sure Scratch programmers have a better early understanding of flow control and things like loop termination conditions than BASIC programmers did; Scratch makes it more physical.

But it does lose that absolute immediacy of being able to type simple words onto an empty screen and see it do things.


Scratch and especially Python are a lot more complex than BASIC. OTOH, Scratch having a structural rather than solely text-based interface might also make a more complex language immediately usable, by removing the difficulty associated with keywords and surface syntax. This is the kind of thing that would really benefit from a well-designed usability experiment.


Using a minicomputer as a dev environment for micros was particularly popular in the late 1970s and early 1980s. See Infocom

https://www.filfre.net/2013/03/the-top-of-its-game/

which used virtual machine technology to deliver games developed on a Decsystem 20 to many different kinds of Micro.

When Microsoft BASIC came out in 1976 it was rare for a micro to fill out the 16-bit address space with a full 64k. I had a TRS-80 Color Computer which ran a multitasking operating system

https://en.wikipedia.org/wiki/OS-9

With 64k of RAM I wrote a FORTH interpreter that ran under OS-9 with a good standard library in about 3000 lines of assembly. OS-9 looked a lot like UNIX or VMS. Dev tools in 1984 were good enough that writing a BASIC interpreter on a micro in assembly would have been straightforward, I'm sure you could on a C-64.

No way you could do that on the base configuration Altair that Microsoft BASIC originally targeted.

The first time I used emulation was circa 1987 when I developed a BASIC program for a high school teacher who had a Z-80 based CP/M system on my generic 286 machine... Even then there was a CP/M emulator for the 286 which could destroy any Z-80 machine on the market -- even though the protected mode of the 286 was as "brain damaged" as Bill Gates said it was, the raw performance of the 286 was the beginning of the end for the 6502, 68--, 68---, Z80 and all the other architectures.


I had an obscure machine ("Interact") in the early 80's that had an 8080, 16 kb of memory, and a cassette drive. There was an assembler for it. If I remember right, you read the editor from tape, edit and save your file, then read the assembler from tape, it reads the tape with your code and then writes the output to tape.

I have no idea what the people ("Micro Video") producing software for this thing did, but I can't imagine they used this on-machine assembler.

(The tape loading mechanism was basically blocks of address + data. They put loading screens on the tapes by populating video memory first.)
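A minimal Python sketch of that kind of loader, assuming each tape block is just a load address plus its data bytes. The addresses, the `load_tape` name, and the block contents are hypothetical; the real Interact tape format is not documented in this thread.

```python
def load_tape(blocks, memory):
    """Copy each (address, data) block into memory, in the order it comes off the tape."""
    for address, data in blocks:
        memory[address:address + len(data)] = data

VIDEO_BASE = 0x4000     # hypothetical start of video memory
PROGRAM_BASE = 0x5000   # hypothetical load address for the program

memory = bytearray(0x10000)   # the 8080's 64K address space
tape = [
    (VIDEO_BASE, b"LOADING..."),                # first block: lands in video memory
    (PROGRAM_BASE, bytes([0x3E, 0x2A, 0x76])),  # later block: the actual program
]
load_tape(tape, memory)
```

Because the loader does not care what an address means, putting a block aimed at video memory first is enough to get a loading screen for free.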


... an 8080 emulator, not 8088 :)


So I might be able to use a PiDP-11 microcomputer, presuming it is compatible with the PDP-10, to run an 8088 emulator, on which I could run early DOS programs?


PDP-11 and PDP-10 are completely different. The 11 was in the DEC 16-bit series and the 10 was in the DEC 36-bit series.


While this is completely true, the PiDP-11 mentioned by the parent is based on SimH, which does simulate PDP-10 (as well as dozens of other early systems).

[1] https://obsolescence.wixsite.com/obsolescence/pidp-11 [2] https://github.com/simh/simh


They wrote this in two months? In assembly? Impressive! I wonder if they used code generators or wrote plain assembly?


Assembly language isn't that hard. I wrote a FORTH interpreter with a pretty good standard library in 3000 lines of 6809 assembly that pretty much worked right the first time.

Today you could spend as much time getting the build to work with npm or maven for something very simple.

Macro assemblers are nowhere near as much fun on the x86 (and other 80's era micros) as they are on something like the IBM 360 or AVR8 with a big register file... If you've got 32 registers you can write macros where the register names are parameters and avoid a lot of the meaningless activity involved in "calling conventions" in programming languages like C.


Was the FORTH interpreter for your own use or did you sell it? I ask because it's always fun to see someone on HN that wrote products I saw in the pages of Rainbow long ago.


It was for my own use.

It was unusual in quite a few ways. For one thing it was subroutine threaded which made it a little faster than most FORTHs, also it used self-modifying code in an inner loop to save a cycle in an important place.

Most FORTHs at the time used block-based disk I/O because you could do that without a real operating system and have some nice benefits such as being able to copy a block of source code to the screen buffer to edit in place, easy memory allocation, etc.

OS-9 had a handle-based API for filesystem access basically the same as UNIX and MS-DOS 2.0 so my FORTH used that. OS-9 had a choice of text editors, including a vi clone so I didn't feel the need to embed an editor in the interpreter.


Everyone wrote plain assembly in those days. If you were lucky your assembler did nice things like symbolic constants.

It was a skill pretty much everyone knew.


It's fair to assume this is handcrafted and hand-optimized code, as machines back then were typically too small to use for many tasks. Any code generated would have added bloat making the whole project untenable.


For a while googling for GWBasic would find my homepage. (I used to have a video call widget and my username was "GWBasic.")

Someone once emailed me looking for support for GWBasic. They apparently got ahold of an old GWBasic program that was saved in an encrypted manner. It was run-only, so they couldn't list out the source code.

I just dug out my old GWBasic manual. The command was "Save P"


Did the manual have the incredibly easy "decryption" process?

https://groups.google.com/g/comp.os.msdos.misc/c/PA9sve0eKAk


No. At the time I got the email my GWBasic manual was 3000 miles away.

But it was one of those cases of someone not having good internet etiquette. There was no reference to "GW-Basic" on my home page, other than metadata in the HTML. I should have responded with a link to let me google that for you.


There are forked versions which have been made to work, more or less. [1] [2] The original version (the linked article) won't work as-is, since there are no build scripts, instructions, or executables.

[1] https://github.com/dspinellis/GW-BASIC

[2] https://github.com/tkchia/GW-BASIC


All of the "comments are useless because they get out of date" folks should look at this code, which is still comprehensible 40 years later because of the comments.


Well, they get out of date because the code around it changes. If the code doesn't change either...

I do agree with you though that commenting is a good thing.


I couldn't agree more - and think it is asinine to tell developers to not comment code; most of the comments I write, are for me as much as for the next person who works on it. I find them invaluable.

Recent contract at a big Fortune50 company, and they forbid developers from using comments - "any code that you think needs some explanation should be put in a separate readme file in a 'docs' directory off of the root" - how can that be better, or more useful, than a couple of lines of comments right above the code in question?

They didn't want us to use comments, because they were afraid the code would change, and the comments would become out of date - so put those comments in a separate file, that almost nobody will remember to read, much less update, and that would solve the problem.

You can't make this stuff up, utter insanity.


Heh, now imagine that knowledge is not in README files, but in a certain popular slow WYSIWYG SaaS document product.


Updating comments as needed is just something I consider part of being a professional programmer.


This is why I've always been an advocate of inline code comments.

With just the slightest amount of professionalism, you can keep them up to date, and only a real psychopath would update code without updating the comments directly above it.

(So, of course, 50% of programmers don't...)


I can’t count the number of PRs I’ve rejected because code was changed but the corresponding inline comments were not.


Thank you for being part of the professional 50%!


75% of assembly programming is writing comments.


Agreed, commenting assembly is a bit different "environment" to explain the "why?". You're explaining why you're putting a value in a register vs why you're calling some particular API or whatever in a higher-level language. And then you have COBOL...


And figuring out meaningful jump target labels. :)


Michal Necasek has some notes on this code on his blog: https://www.os2museum.com/wp/gw-basic-source-notes/


Big thread from when it was released in 2020:

https://news.ycombinator.com/item?id=23266917


For those interested in building this repo, TKChia has a repo with changes for building the repo with MASM 5.1A or JWASM.

https://github.com/tkchia/GW-BASIC


Does everybody know the duality of "Gee Whiz BASIC" and "Gordon Whitten BASIC"?


Yes. And, there are other possible expansions, e.g. Gates William.

This is likely no different than the MZ in the DOS exe header. Nobody actually knows (and says!) what MZ stands for, but it is believed to be Mark Zbikowski.


Pretty sure it is Mark Zbikowski.


I believe that, too. However, for something like a zip file, I KNOW that the PK in the header stands for Phil Katz.


To be fair, he did design the file format
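For anyone who has not looked at these headers, a short Python sketch of sniffing a file type from its leading bytes. The two signatures are the well-known ones ("MZ" at the start of a DOS executable, "PK" followed by 0x03 0x04 at the start of a ZIP local file header); the function name `sniff` and its return strings are made up for this example, and it only covers the common case.

```python
def sniff(header):
    """Guess a file type from the first few bytes of a file."""
    if header[:4] == b"PK\x03\x04":   # ZIP local file header signature
        return "zip archive"
    if header[:2] == b"MZ":           # DOS executable header signature
        return "DOS/Windows executable"
    return "unknown"

print(sniff(b"MZ\x90\x00"))           # DOS/Windows executable
print(sniff(b"PK\x03\x04\x14\x00"))   # zip archive
```

In practice you would read the first four bytes of the file with `open(path, "rb").read(4)` and pass them in.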


Greg Whitten. I don't know the answer.


Whoops, yeah.


(From the linked announcement blog:) > (Alas, sorry, we’re unable to open-source the ISA translator.)

Awww...


I wonder why. Uses third-party code? Some contractor/consultant/etc wrote it and they can't find the contract so they don't know what it says about copyrights? Lost it completely?


Likely something like that. My money is on them being unable to locate someone who needs to give permission, and it's probably not a very high priority issue.

I vaguely recall reading a blog post when Sun opened the Solaris source code, that it was a tremendous effort tracking down every single person they needed permission from.


And then Oracle went and undid almost all that work :'(


To make it even worse, they laid off most of their Solaris developers a couple years ago.

Somebody who used to work on Solaris before the Oracle takeover commented at the time that just by looking at the number of layoffs, Oracle Solaris was effectively dead.

I am not even a big Solaris fan (although it was the most beautifully named OS ever), but I cannot help but think of Rutger Hauer's dying speech from Blade Runner: I have seen things you people would not believe [...] All these moments will be lost in time like tears in the rain.

If somebody who has been there would write a book about the development of SunOS and Solaris, I would probably buy two copies. If the title included the word Eclipse, I'd buy three copies.


There’s a fun GW-BASIC emulator written in Python that even has cassette tape audio file read access:

http://pc-basic.org


A 13 year old me brought down the entire showcase of Kmart's PCs with this program back in the 80's. Good times.


I'd love to hear how. Did you go computer to computer, since LAN in the early 80s was not really a common thing except for big businesses. Was this late 80s? I was a Banyan Vines guy vs. Netware by then.


Deployed the program via floppy. Edited the autoexec.bat to load the program straight away and interrupted any keyboard input such as Ctrl-C. Program lay dormant until the system clock reached a certain time then back to dos to format c:


I must admit we did a similar thing without BASIC.

The local super groceries-and-more store had a bank of demo PCs running DOS, Geoworks or early Windows, and a screen saver. We used to type "format c: /autotest", the latter being an undocumented MS-DOS parameter that skipped the "are you sure?" question, and left it standing on screen like that.

The sales personnel would occasionally terminate the screen saver by pressing "enter". Hilarity ensued (for us! Not for the poor sales person trying to give a demo to a customer) :)

As for programs and the clock, my school (which had HORRIBLE IT courses at the time) had caught a virus (not from me, just like that!).

I removed it for them (using F-PROT and McAfee for DOS back in the day), but they didn't trust that and called an "expert" business (the local typewriter guy), who ran exactly the same virus scanners and charged them dearly. But they didn't trust him either (ahoy computer virus craze!), so they made me reinstall nine school machines, with Windows 3.1, Word 6, Excel 5, Access 2.0, from floppy disks, because I was apparently the only one capable of installing Windows.

I was so angry about the fact they didn't even properly thank me, I made a Turbo Pascal program that would also check the system clock, and do interesting things. I couldn't get a TSR working properly at the time, so I ended up with an exe in the search path called " .exe" (the blank being ASCII 255, not ASCII 32 (hex 20), which wasn't allowed), and autoexec.bat would have loads of empty lines at the end, and then finally, the call to the program named with "ASCII 255", which was invisible in the editor even if someone scrolled all the way down and was looking right at that particular line.

The program was reportedly still there many years thereafter. Figures :)


I did this with just batch scripts. Little `tty > null` and then just an infinite loop of "You've been owned" or whatever I thought was edgy back then.

Another fun one (later in the Windows '95 era) was to create a shortcut to every program on the computer in the startup folder. Then copy / paste it about a dozen times and reboot.


5-1/4 in. floppy? I had a cassette drive in my Commodore PET 2001 in 1978. Micro floppies came in 1986 or 87 I think. You booted from the floppy vs. the MBR of the system's hard drive?


Like so many, I gathered my first experiences with programming in Basic (or is it BASIC?). My first computer was a Commodore 128, and my mother sent me to a week-long programming course that for some reason the local Sparkasse (quasi-public bank in Germany) held over the Easter holidays.

And here we used PCs running DOS, so I almost certainly did my first baby steps programming using GW-BASIC. Little did I know the Basic on my C128 had also been written by Microsoft. Ah, to be that young again...

I expect Microsoft is not going to get many PRs, but it's a nice gesture. One day, some digital archeologist is going to have fun (or curse like a sailor) inspecting this code.


It is supposed to be "BASIC", from Beginner's All-purpose Symbolic Instruction Code no less [1].

[1]: https://en.wikipedia.org/wiki/BASIC


I knew that it used to stand for that (and as an acronym should be upper-case), but I thought they maybe dropped that somewhere along the way.

Fortran and Cobol used to be "FORTRAN" and "COBOL", too, but once systems with support for upper- and lowercase-letters became widespread, they went mixed-case.

I recall there was some debate if the name BASIC had been intended as an acronym initially, but in time, evidence (AFAIR, anyway) tended to support the claim that it was an acronym from the very beginning. (-:


Dartmouth celebrated the 50th anniversary of BASIC a few years ago and they were pretty consistent in keeping it all upper case.

https://www.dartmouth.edu/basicfifty/basic.html

True BASIC, a company that Kemeny and Kurtz founded in 1983, is all upper case too.

https://www.truebasic.com/about


Well, Happy Birthday BASIC!

Love it or hate it, BASIC has left a footprint in the IT world that few languages could match.


> Fortran and Cobol used to be "FORTRAN" and "COBOL", too, but once systems with support for upper- and lowercase-letters became widespread, they went mixed-case

COBOL is still “COBOL” (at least per the IBM page on the language and the ISO/IEC 1989:2014 standard for the language).


My bad then. Thank you for pointing that out.


I suspect they'll get zero PRs:

"The files in this repo are for historical reference only and will remain read-only and unmodified in their original state.

Please do not send Pull Requests suggesting any modifications to the source files."


Ah, yes, I should have read the README. ;-)


Why does GitHub say 40 years ago when 1983 is 39 years ago?


Because the relative date function that powers that label rounds to the nearest year (it also treats 360 days as a year):

    timeAgoFromMs(ms) {
        const sec = Math.round(ms / 1000);
        const min = Math.round(sec / 60);
        const hr = Math.round(min / 60);
        const day = Math.round(hr / 24);
        const month = Math.round(day / 30);
        const year = Math.round(month / 12);

From https://github.githubassets.com/assets/javascript/node_modul...
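To see the effect of that rounding chain, here is a Python port of the excerpt above, offered as a sketch. `js_round` is a stand-in for JavaScript's `Math.round` (Python's built-in `round` uses banker's rounding, so it is not a drop-in), and `years_ago` is a name invented here; only the seven arithmetic steps come from the quoted code.

```python
import math

def js_round(x):
    """Mimic JavaScript's Math.round for non-negative values (halves round up)."""
    return math.floor(x + 0.5)

def years_ago(ms):
    """Follow the same chain as the excerpt: 30-day months, 12-month (360-day) years."""
    sec = js_round(ms / 1000)
    minute = js_round(sec / 60)
    hr = js_round(minute / 60)
    day = js_round(hr / 24)
    month = js_round(day / 30)
    return js_round(month / 12)

MS_PER_DAY = 24 * 60 * 60 * 1000
elapsed = 39 * 365.25 * MS_PER_DAY   # 39 calendar years, in milliseconds
print(years_ago(elapsed))            # 40
```

Thirty-nine calendar years is about 14,245 days, which the chain turns into 475 thirty-day months and then 39.58 "years", and that rounds up to 40.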


Wow! They added a markdown formatted code of conduct 40 years ago!!


They even reference "Windows Store app package directories and files"[0]. Very futurist thinking!

[0]: https://github.com/microsoft/GW-BASIC/blob/edf82c2ebf6bfe099...


The numeric scanning sub is called NUMNUM. Love it.


There are machine translator sounding comments and references to nonexistent source .mac files - is the real source code lost?


I remember being shocked to find that every GOTO (and even a NEXT) would do a linear search for the right line number starting from the beginning of the program image.

They really, actively did not care about performance. Manifestly, it was the right choice for the time, place, and circumstances, but it is hard to see how, today.
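A Python sketch of the lookup being described, assuming the program is kept as an ordered sequence of (line number, text) pairs that is walked from the top on every jump. The names `find_line_linear`, `program`, and `index` are invented for illustration, and the list is only a stand-in for the interpreter's in-memory program image, not a copy of its real layout.

```python
def find_line_linear(program, target):
    """Scan from the first stored line until the target line number is found."""
    steps = 0
    for number, text in program:
        steps += 1
        if number == target:
            return text, steps
    raise LookupError("Undefined line number")

# A 1000-line program numbered 10, 20, ..., 10000.
program = [(n, f"REM line {n}") for n in range(10, 10010, 10)]

text, steps = find_line_linear(program, 10000)
print(steps)   # 1000 comparisons to reach the last line

# The faster alternative costs extra memory: one table entry per program line.
index = {number: line_text for number, line_text in program}
```

On a machine with a few kilobytes of RAM, that extra table would compete directly with the user's program for space, which is the other side of the trade.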


Optimizing for time and optimizing for memory are different things. Choosing one over the other doesn't guarantee the author "did not care about performance".


Nothing guarantees anything. But the extra stuff needed to compensate for being extremely slow can often cost more than the cost not to be extremely slow.


Oh - The nostalgia - the memories...

My first foray into programming. I remember skipping breaks to go to our computer room (super chilled due to air conditioning) and popping in a 5 1/4 inch floppy diskette and loading this...


I love how they went through all the internal hoops to make this public and consequently put the SECURITY.md in there. Just in case someone finds a vulnerability in GW-Basic after all these years...


Greg Whitten still lives in the area and brings one of his Ferraris to the Microsoft charity car show every year (last time it was a LaFerrari!). I hope that happens again this year.


Still looking forward to https://github.com/Microsoft/Windows .


I wonder how much more diversity in programming languages we would have had if BASIC wasn't pre-installed on home computers.


Ah, memories.

Also, love that the commits are set to '40 years ago'.


>Will not be modified - please do not submit PR's or request changes

Why does GitHub still not allow users to deactivate the "pull requests" tab?


Seems to me like better UX as the users are being informed why they can't make pull requests. Of course this could be done differently but I don't have an issue with it.



