Ford & Amolak:
With demo day rapidly approaching and our audio codec configuration
still unreliable, we resorted to a last-minute hack:
Instead of configuring the audio codec ourselves, we let Terasic's
reference design do it for us. This works reliably, although it's an
inelegant solution. We plan to fix this at a later date, but for now,
this will have to do. Debugging audio also meant we had to shelve
plans for the filters. It sounds okay without the filters, but they
would be nice to have.
This will be our last weekly update. Working on this project was a
joy at times and a royal pain at others. Getting a game
booting was a real thrill, and we look forward to future FPGA fun.
Perhaps a SuperGrafx, next time :-)
Ford & Amolak:
We spent the last few weeks attempting to get audio working. It works
(to some degree, anyway), but initializes the audio codec unreliably,
sometimes makes awful clicking noises, and has issues with aliasing.
To resolve the issues with aliasing, Ford met with Professor Richard
Stern, who suggested building a 3-stage chain of FIR filters to
correctly perform decimation of the output stream. The clicking
and init issues will have to be resolved through standard debugging
techniques. We're almost there!
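Professor Stern's suggestion can be illustrated with a small software model. The C sketch below is hypothetical and is not our filter design: `fir_stage`, `fir_decimate` and `fir_chain3` are names invented here, the 4-tap boxcar coefficients in the usage check are placeholders for a properly designed low-pass filter, and it uses floating point where hardware would use fixed point. Each stage low-pass filters its input and keeps every Mth sample, computing only the outputs it keeps. Splitting a large decimation factor into several smaller stages lets each stage get away with a shorter filter.

```c
#include <stddef.h>

/* Hypothetical decimation stage: N-tap FIR low-pass, then keep every Mth sample. */
typedef struct {
    const double *taps;   /* filter coefficients (placeholder values in the example) */
    size_t ntaps;
    size_t factor;        /* decimation factor M */
} fir_stage;

/* Filters `in` (length n) and writes every `factor`-th output sample to `out`.
   Samples before the start of `in` are treated as zero.
   Returns the number of samples written. */
size_t fir_decimate(const fir_stage *s, const double *in, size_t n, double *out)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i += s->factor) {
        double acc = 0.0;
        /* Only the kept output samples are ever computed. */
        for (size_t k = 0; k < s->ntaps && k <= i; k++)
            acc += s->taps[k] * in[i - k];
        out[count++] = acc;
    }
    return count;
}

/* Runs three stages back to back; each buffer must hold at least n samples. */
size_t fir_chain3(const fir_stage st[3], const double *in, size_t n,
                  double *tmp1, double *tmp2, double *out)
{
    size_t n1 = fir_decimate(&st[0], in, n, tmp1);
    size_t n2 = fir_decimate(&st[1], tmp1, n1, tmp2);
    return fir_decimate(&st[2], tmp2, n2, out);
}
```

With three stages that each decimate by 2, a 64-sample constant input produces 8 output samples, which settle back to the input level once the filters' start-up transient has passed.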
Amolak:
This week was a slow one for me, since I had to focus on midterms in
two other classes. I had planned to finally finish the controller
interface using the GPIO interface on the board, but I fell a bit behind
and was mainly able to go through the code and find a few smaller bugs
in the sprite logic.
This next week will be spent on finalizing the controller input and then
working on the PSG with Ford, which will be the final touches on the project.
Ford:
This week was mostly spent trying to fix compatibility issues with
our hardware. Thanks to the kind citizens of the Mednafen emulator
developer IRC channel (especially rootabaga), we learned that
there is an undocumented scrolling behavior where writing to the
YScroll register (BYR) resets the vertical counter. Without emulating
this behavior, mid-frame BYR writes result in incorrectly scrolled
graphics.
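To make the BYR behavior concrete, here is a hypothetical C model of the background vertical counter. The names, the 9-bit BYR width, and the exact point at which the reload takes effect are assumptions for illustration. The key point is that a BYR write reloads the counter, rather than the background row being computed as BYR plus the current line number.

```c
#include <stdint.h>

/* Hypothetical model of the VDC background vertical counter.
   Assumption: the counter is loaded from BYR at the start of the frame and,
   per the undocumented behavior, reloaded whenever BYR is written mid-frame;
   otherwise it advances by one per displayed scanline. */
typedef struct {
    uint16_t byr;      /* last value written to the YScroll register */
    uint16_t y_count;  /* background row fetched on the current line */
} bg_vscroll;

void vscroll_frame_start(bg_vscroll *v) { v->y_count = v->byr; }

/* CPU write to BYR: the new value immediately becomes the counter value. */
void vscroll_write_byr(bg_vscroll *v, uint16_t value)
{
    v->byr = value & 0x1FF;   /* BYR treated as 9 bits wide */
    v->y_count = v->byr;
}

/* Called at the end of each displayed scanline. */
void vscroll_next_line(bg_vscroll *v) { v->y_count = (v->y_count + 1) & 0x1FF; }
```

A game that writes BYR = 100 partway down the screen (for example, to separate a status bar from a scrolling playfield) expects the following lines to show rows 100, 101, and so on. A naive model computing BYR + line number would instead jump to row 100 plus the current line, which is exactly the incorrect scrolling described above.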
We also managed to get Final Soldier to boot! After looking at
the game's boot sequence in the waveform viewer, we discovered that
the game was stuck in an infinite loop of entering an interrupt
handler, enabling interrupts, and re-entering the same handler.
Looking at the game code, we discovered the following sequence at the
start of the interrupt handler (label + comments by me):

    VDC_IRQ_handler:
        PHA         ;save registers
        PHX
        PHY
        CLI         ;enable interrupts
        LDA $0000   ;acknowledge VDC interrupt

That's right, the game is re-enabling interrupts before
acknowledging the interrupt it's currently servicing! Strange, isn't
it?
After some research, we learned of an interesting quirk of the
HuC6280 (and possibly other 65xx family members): the effect of an
instruction that modifies the I (interrupt disable) flag is delayed by
one instruction. This means that the CPU will not handle interrupts
until it decodes the second instruction after the CLI, so the
LDA $0000 that acknowledges the VDC interrupt always runs before the
interrupt can be taken again. This effectively creates a strange sort
of delay slot, except with interrupts instead of branches. Changing our
logic to emulate this quirk was a simple 1-line change: moving I flag
modification from fetch to decode. With this change, Final Soldier
booted successfully!
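The effect of the quirk on the handler above can be checked with a toy model. This C sketch is hypothetical and far simpler than a pipelined core: it tracks only the I flag, a pending IRQ, and a few stand-in opcodes (`OP_ACK` plays the role of `LDA $0000`). Rather than moving the flag update from fetch to decode, it models the delay directly: the interrupt check before each instruction sees the I flag as it was before the previous instruction executed.

```c
#include <stdbool.h>

/* Hypothetical stand-in opcodes; OP_ACK represents LDA $0000 (VDC status read). */
typedef enum { OP_NOP, OP_CLI, OP_SEI, OP_ACK } opcode;

typedef struct {
    bool i_flag;       /* architectural I flag (true = IRQs masked) */
    bool i_flag_seen;  /* delayed copy used by the interrupt check */
    bool irq_line;     /* pending VDC interrupt */
} cpu;

/* Executes one instruction. Returns true if an IRQ was taken instead; in that
   case the instruction is not executed here (a real CPU resumes it after RTI). */
bool cpu_step(cpu *c, opcode op)
{
    bool took_irq = false;
    if (c->irq_line && !c->i_flag_seen) {
        took_irq = true;          /* would push PC/flags and jump to the vector */
        c->i_flag = true;         /* entering a handler masks interrupts */
    }
    /* The I flag written by the previous instruction only becomes visible now. */
    c->i_flag_seen = c->i_flag;
    if (!took_irq) {
        switch (op) {
        case OP_CLI: c->i_flag = false;   break;
        case OP_SEI: c->i_flag = true;    break;
        case OP_ACK: c->irq_line = false; break;
        default: break;
        }
    }
    return took_irq;
}
```

Starting inside the handler with the IRQ still pending, `CLI` followed by `OP_ACK` never re-enters the handler, because the acknowledge executes before the cleared I flag becomes visible. Without an acknowledge, the IRQ is taken just before the second instruction after `CLI`, matching the behavior described above.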
Ford and Amolak:
Note: We're writing this status update together since most of our work (now and from here on out) is a group effort and isn't so cleanly divided anymore.
The biggest update is that we are running on the FPGA!!! We had budgeted at least 2-3 weeks just to get the most basic functionality running on the board (i.e. The Kung Fu running at least recognizably), but we pushed hard last weekend and reached that point on Sunday, with The Kung Fu running with only minor graphical glitches.
After some reasoning about what might be going wrong, we determined that the graphical issues were caused by a defect in our implementation of CPU VRAM access. We took a shortcut and configured the M9K block used for VRAM to have two read/write ports. This allowed us to isolate the CPU side of things and modify its access timings without changing anything in the graphics pipeline. Changing the VRAM access circuitry fixed The Kung Fu completely and improved several other games. We're still having issues with games that perform large numbers of VRAM reads, and we still need to determine why.
For next week, we're going to attempt to fix some bugs in the sprite logic, debug the VRAM read issues, and pick up work on the PSG.
Amolak:
Sprites work completely! We've fully merged our sprite logic with the rest of the CPU and VDC MMIO (which in previous updates only showed backgrounds). With this, we can now load VRAM and CRAM entirely through CPU execution of the ROM and render complete frames, including both the background and the sprites! When running "The Kung Fu", for instance, we're able to show the entire screen loaded:
We've noticed small glitches in more rigorous game tests (surprisingly far fewer than we initially expected). We're testing sprites on Gunhed and are in the process of fixing some small graphical glitches that show up there. Aside from that and some changes to CRAM (which Ford will finish up; it's a quick job), we're ready to move onto the board! We have an ambitious goal of getting this running on the board by the Status Checkpoint on Wednesday, and we're going to shoot for it, since we're already at a point we're very comfortable showing on Wednesday.
Ford:
Not much has changed since my Monday update, but I figure I should
write something here to direct whoever is grading these to look there.
Of note, I've figured out a better way to do VGA scaling:
The PCE has a nice property where no matter what configuration the
VDC and VCE are in, the video signal has constant sync timing. On the
real console, the sync timing is generated by the VCE; we generate it
in the VDC (due to some very confusing documentation), but we can
easily fix this. We can then construct a second copy of the sync
circuitry that runs at the full 21MHz master clock speed instead of
half (within 15% of a 640x480 pixel clock, well within VGA
tolerances), along with two line buffers. By saving each line as it
is sent out of the VCE and playing it back at double speed twice (or
once, followed by a line of black to simulate scanlines), swapping
buffers every time we finish a VCE scanline, we can line-double the
240p output to a modern-TV-friendly 480p without any complicated sync
locking, PLLs, or framebuffers. This also means that we can scale the
image without any noticeable input latency (a mere scanline of input
lag!). This should remove the
need for the external converter box we considered using to scale our
video output.
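The line-doubling scheme can be modeled in software. This hypothetical C sketch (all names and the 1024-pixel buffer size are invented for illustration) keeps two line buffers: the VCE side fills one at the original rate while the VGA side reads the other twice per VCE line, and the roles swap at the end of every VCE scanline. As a sanity check on the clock claim, the 21.477 MHz master clock is about 14.7% below the standard 25.175 MHz 640x480 pixel clock.

```c
#include <stdint.h>

#define LINE_MAX_PIXELS 1024   /* hypothetical buffer size */

/* Ping-pong line buffers: the VCE writes one while the VGA side reads the other. */
typedef struct {
    uint16_t buf[2][LINE_MAX_PIXELS];
    int write_sel;   /* index of the buffer the VCE is currently filling */
    int line_len;    /* pixels captured on the last completed VCE line */
} line_doubler;

/* Called once per VCE pixel at the original (240p) rate. */
void ld_capture(line_doubler *ld, int x, uint16_t pixel)
{
    if (x >= 0 && x < LINE_MAX_PIXELS)
        ld->buf[ld->write_sel][x] = pixel;
}

/* Called at the end of each VCE scanline: the finished line becomes readable. */
void ld_end_vce_line(line_doubler *ld, int len)
{
    ld->line_len = len;
    ld->write_sel ^= 1;
}

/* Returns a VGA output pixel. vga_line_phase is 0 for the first of the two
   double-speed lines and 1 for the second; with scanlines enabled, the
   second copy is output as black (0). */
uint16_t ld_output(const line_doubler *ld, int x, int vga_line_phase, int scanlines)
{
    if (vga_line_phase == 1 && scanlines)
        return 0;
    if (x < 0 || x >= ld->line_len)
        return 0;
    return ld->buf[ld->write_sel ^ 1][x];
}
```

Because the VGA side only ever reads a VCE line that has already been completed, the output trails the input by roughly one 240p scanline and no full framebuffer is needed.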
Ford:
Sometimes, a bug that is totally a CPU bug just isn't a CPU
bug. For the past week, we were puzzled as to why
Gunhed (known in the US under the far more '80s title
Blazing Lazers[sic]) refused to boot, instead getting stuck in
an infinite loop overwriting all of VRAM (including the 8000-8FFF
region that isn't even backed by physical memory) with 0s. Today, I
finally tracked down the reason why. It turns out that Gunhed
is a 384KiB ROM. On the PCE/TG16, ROMs of this size are mapped into
memory in a rather interesting way. The first MiB of the address
space is mapped to the game ROM. For a ROM of size N that
evenly divides 1MiB, the game is simply mirrored every N bytes. For a
384KiB ROM, however, this is not the case. Instead, the first 512KiB
of the address space consists of the first 256KiB of the ROM image
mirrored twice. Then, the upper 512KiB of the address space is filled
with mirrored copies of the upper 128KiB of the ROM image. I believe
this may be because the 384KiB cards were implemented using a 256KiB
mask ROM and a 128KiB mask ROM with certain address lines used as
chip enables, but I don't know for certain. As it turns out, we were
reading an incorrect value from ROM, setting it as a base address in
a pointer table, and then corrupting a whole load of memory, later
triggering the VRAM clearing loop.
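The mapping rules above fit in a few lines. This is a hypothetical C sketch (the function name and `KIB` macro are invented here) that translates an address in the first MiB into an offset within the ROM image, special-casing 384KiB images as described and mirroring every other size that evenly divides 1MiB.

```c
#include <stdint.h>

#define KIB 1024u

/* Maps an address in the first MiB of the address space to an offset in the
   ROM image. rom_size must either be 384 KiB or evenly divide 1 MiB. */
uint32_t rom_offset(uint32_t addr, uint32_t rom_size)
{
    addr &= 0xFFFFF;                              /* only the first MiB maps to ROM */
    if (rom_size == 384 * KIB) {
        if (addr < 512 * KIB)
            return addr % (256 * KIB);            /* first 256 KiB, mirrored twice */
        return 256 * KIB + addr % (128 * KIB);    /* upper 128 KiB, mirrored 4 times */
    }
    return addr % rom_size;                       /* plain mirroring every N bytes */
}
```

For a 384KiB image, address 0x80000 (the start of the upper 512KiB) lands at ROM offset 0x40000, the first byte of the upper 128KiB, whereas naive mirroring would map it to offset 0x20000.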
After discovering a mention of "strange" behavior with 384KiB ROM
mappings in a random forum post from 15+ years ago, I played around
with Mednafen to figure out the actual mapping, and then fixed my
memory mapping logic. I was then greeted by the title screen of
Gunhed (minus sprites)!
On Wednesday, Amolak and I will merge his sprite code branch into the
master branch and attempt to boot a game with sprites enabled. If all
goes well, we will then proceed to the next phase in our project:
synthesizing our Verilog and testing it on the FPGA board.
On an unrelated note, I ended up implementing the SET instruction and
the modifications it performs to immediate-mode arithmetic operations.
The peace of mind of having the full ISA implemented was worth the
extra annoyance. :-)
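As a rough illustration of what SET changes, here is a hypothetical C sketch of `ORA #imm` with the HuC6280 T flag. It assumes the commonly documented behavior: when the preceding instruction was SET, the operation uses the zero-page byte addressed by X as both operand and destination instead of A, and T is cleared after every instruction. The struct and function names are invented, and status flag updates are omitted.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint8_t a, x;
    bool t;            /* T flag, set by SET */
    uint8_t zp[256];   /* zero page (mapped at $2000 on the PCE) */
} hu_cpu;

void op_set(hu_cpu *c) { c->t = true; }

/* ORA #imm: with T set, the result goes to zero page (X) and A is untouched. */
void op_ora_imm(hu_cpu *c, uint8_t imm)
{
    if (c->t)
        c->zp[c->x] |= imm;
    else
        c->a |= imm;
    c->t = false;      /* T only affects the instruction immediately after SET */
}
```

The same redirection applies in this model to the other affected operations (AND, EOR, ADC); only ORA is shown to keep the sketch short.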
Ford:
All instructions implemented except SET. No games we've tested use
SET and it looks like a pain to deal with, so I'm going to avoid
implementing it unless we find a game using it that we feel we
absolutely have to run. With fixes to the memory subsystem, we now
boot The Kung Fu to the title screen! Sprites are not visible
as my fork currently does not have Amolak's changes. This will be
resolved over the next few days. Full system tests run at a rate of
between 3 and 5 seconds per frame of video, so full-system testing is
a long process. Note that the large black borders on the top and
left of the image below are a quirk of TVEmu (our TV emulator) and
are not a result of incorrect video rendering.
Additionally, I implemented a preliminary version of the PSG. It is
completely untested, as we've been focusing on the video hardware, but
I wanted to get something written down so that it would be easier to
continue with later.
Amolak:
My goal for this status update was to have sprite data visibly showing up in frame dumps. As mentioned in our project proposal, the VDC output logic is split into background logic and sprite logic, and the sprite logic is considerably more complicated than the background logic. As shown in Ford's update, the background logic has mostly been completed, at least enough to get something visually recognizable, and I wanted to achieve the same thing with the sprite logic.
There were a few complications in achieving proper Sprite output. I had to consider the following traits from the Sprite Attribute Table:
- Sprite Width and Height
- Sprite Background Flag
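The size and background-flag fields live in the last word of each Sprite Attribute Table entry. The C sketch below decodes that word under the commonly documented HuC6270 layout (bit 15 Y-flip, bits 13-12 height code, bit 11 X-flip, bit 8 width code, bit 7 SPBG, bits 3-0 palette); the struct and function names are hypothetical, and mapping height code 2 to 64 pixels is a simplification.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    int width, height;   /* in pixels */
    bool in_front;       /* SPBG: true = sprite drawn in front of the background */
    bool flip_x, flip_y;
    int palette;         /* sprite palette 0-15 */
} sprite_attr;

/* Decodes the fourth (attribute) word of a SAT entry. */
sprite_attr decode_sat_attr(uint16_t w)
{
    static const int heights[4] = {16, 32, 64, 64};  /* code 2 treated as 64 */
    sprite_attr s;
    s.width    = (w & 0x0100) ? 32 : 16;
    s.height   = heights[(w >> 12) & 3];
    s.flip_x   = (w >> 11) & 1;
    s.flip_y   = (w >> 15) & 1;
    s.in_front = (w >> 7) & 1;
    s.palette  = w & 0xF;
    return s;
}
```

For example, an attribute word of 0x9185 decodes to a 32x32, vertically flipped sprite drawn in front of the background using palette 5.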
The main difficulty is actually prefetching all of the sprite data before each line is rendered. As we learned from the documentation, with the typical widths of HSYNC and HWAIT before HDISP, we have EXACTLY enough cycles to prefetch all of the sprite data. For my initial attempt, I simplified the prefetching logic just to get something output to the screen.
I started with the Parasol Stars title screen, which has a wave animation beneath the parasol made up of two 32x32 sprites. I simplified the prefetching logic to handle just those 2 sprites and to properly handle the background flag (since these sprites in particular are drawn behind the background).
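The per-pixel decision behind the background flag can be sketched as follows. This hypothetical C helper assumes that color index 0 is transparent in both layers and that a sprite with its background flag cleared (SPBG = 0) only shows through where the background pixel is transparent; the names are invented for illustration.

```c
#include <stdint.h>

typedef enum { LAYER_BACKDROP, LAYER_BG, LAYER_SPRITE } layer;

/* bg_index and spr_index are 4-bit pixel values within their palettes
   (0 = transparent); spbg is the sprite's background/priority flag. */
layer pick_layer(uint8_t bg_index, uint8_t spr_index, int spbg)
{
    int bg_opaque  = bg_index != 0;
    int spr_opaque = spr_index != 0;
    if (spr_opaque && (spbg || !bg_opaque))
        return LAYER_SPRITE;     /* sprite in front, or background is transparent */
    if (bg_opaque)
        return LAYER_BG;
    return LAYER_BACKDROP;       /* neither layer has an opaque pixel */
}
```

Under this rule, behind-background sprites like the ripples only appear where the background pixels are transparent, which is why handling the flag correctly matters for the Parasol Stars title screen.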
It worked! You can clearly see the ripples underneath the Parasol.
Our next step is to properly handle the sprite prefetching logic line by line, which is a bit more complicated given the planar layout of sprite data in memory. Then we can combine this with the booting CPU and, in theory, boot the game in simulation frame by frame!
Amolak:
The VDC now has a dummy MMIO interface. This will be filled in as we
get closer to integration testing. Preliminary work has been done on
the sprite system. Structures for the sprite attribute table and its
entries have been created.
Ford:
Almost all additional instructions have been implemented. The CPU now
has an MMU, interrupt controller, and timer, as well as a dummy MMIO
interface with the external peripherals and dummy PSG. This has
allowed us to boot The Kung Fu up to the point where it waits
for a VDC interrupt!