Thursday, December 8, 2016

Archiving "Convict" and dealing with data corruption

Every now and again I get asked to do archiving of Apple II disks for people who don't have their physical computers anymore. I've archived personal files, commercial software and even disks for court cases. In all that time I've never come across disk corruption when archiving disks for others. That is until now. These disks tended not to be the your everyday usage disks but instead the ones that got written to once and were stored away. So I guess the probability of corruption was low. Sure, I've had corruption on my own disks but I've always been able to work around it by having backups or having software that is readily available from other sources such as online archives. As time goes by and disk media degrades further, having a bit of knowledge on disk corruption can't be a bad thing. This is a write up of my experiences in rescuing a particular software package. I hope it gives others confidence in attempting to repair their own disks when needed.

A few weeks ago I responded to a request that came through from the Australian Apple II enthusiast's email list. Matt was looking for someone local to archive software that had been written by the local state education department. He managed to obtain the physical disk by contacting the developer. This software was used locally in schools so it was not wildly dispersed. I had a decent search for information about this software but I could't find any references to it. Since I have an Apple II system setup that can easily generate images from disks or the reverse I didn't hesitate to call and offer my assistance.

The disk that Matt handed me is the game "Convict" by Paul Holland. It comes in two parts. The first being the research part which I could best describe as an interactive history book. The focus of which is on early European settlement in Brisbane, Queensland, Australia. The second part is a penal colony simulator where your objective is to allocate resources and punishment in the pursuit of setting up and growing a convict settlement. Even before evaluating the contents of the material I prefer to do the copying. That way if the disk is fragile and on it's last legs this will result in extracting the best possible information before the disk gets any worse. So far this hasn't been the case. Most 5.25 inch disks that I have dealt with have stood up pretty well.

Side A of the software copied without a hitch. I was on a roll and thought that within a few minutes I'd be finished. Well that didn't go to plan. Side B had corruption. At first three sectors were showing up as containing errors. To my surprise, the more I tried copying the disk the better it got. It got to a stage where only one sector was showing up as being corrupt. I then tried using different disk drives and different software packages to copy the software but with no luck. The bad sector was there to stay. I wondered if this was a copy protection issue so I tried to run the software to see what I got. The software would not run. Not in the sense that it was broken but in the way that it gave me a warning saying "Do no Write-Protect Program Disk!". The software was protected from being run off this floppy. This was a master disk. This disk was meant to be copied and yet it was now corrupt.

Looking at the bad sector can be done with the software like Copy ][+ however I prefer to use more modern solutions like CiderPress and Omnivore because these reduce the high ASCII codes into lower ones which results in being able to view readable characters. Looking at the corrupt sector and surrounding sectors I could see that this area is comprised of text data. This means that restoring the corruption was going to be easier than restoring program data. The text would be instant confirmation that the reconstruction was moving in the right direction. With the corrupt sector itself I could see that the first few lines looked fine but then a third of the way through the sector it reaches the corrupted area.

I ran with what I had imaged in an emulator and the game was playable but I knew that at some point into the game it would either crash or display unexpected behaviour. I wasn't confident in being able to restore the disk as I had no idea as to how corrupt the data might be. However I decided that until I ran out of options I would give it my best shot.

I exhausted all the copy programs that I had available and since I did not have any specialist archiving tools such as a KryoFlux or an EDD Plus card (not that these would guarantee success) I went with the only option I had left. That was with a logic analyser and some software I had written four years earlier for diagnosing CHED, a floppy disk emulator, I had built at the time. The logic analyser samples the disk data that is being transferred between the Apple II and the Disk II while the software that I wrote processes and evaluates that bitstream.

While using a sector editor on the Apple II I was able to reread the corrupt sector (Track $18 Sector $C) at the same time as using the logic analyser to record the bitstream. The analyser records multiple input lines such as the stepper motors pulses, read request and write data however we are only interested in looking at one line, the data read line. I had already loaded the "AppleIIDiskIIAnalyzer" analyser into the logic analyser software so it processes the Disk II data bytes as soon as the data is sampled. This shows up as disk bytes above the bitstream.

I exported the disk byte data into a text file then loaded it into a Microsoft Access database application for processing. I've built the processing around a database and not just a text file so that I could view the data at each of the conversion steps. It also makes it easy for me to modify the data in the database to test different scenarios, without having to reload the data each time. After processing the data I found that I had gone backwards. The corruption was worse then before. The reason for this was that "AppleIIDiskIIAnalyzer" was treating the timings as being close to optimum like all the other disks I had played around with. However the "Convict" disk was different. The timings had greater variances.

To understand what corruption is and how to fix it the above two diagrams are the most critical parts to understand. Since I'm only dealing with fixing a single corrupted sector I didn't need to worry about sync bytes, the address field or sector skew (Dos 3.3 reads a physical sector then skips two physical sectors to give itself time to processes the data hence Dos 3.3 sectors are numbered differently to physical sectors) so if you wish to cover these parts of the disk II protocol then the best book to read is Don Worth's "Beneath Apple DOS".
I'll just be covering the data field. Each disk sector is just a stream of bits. "Convict" is a 16 sector per track disk (not the older 13 sector) so each disk byte is eight bits long, it always starts with a high bit and there can be no more than two consecutive zeros (absence of a high bit). Therefore all "AppleIIDiskIIAnalyzer" does to convert bits to bytes is to wait for a high bit at the start of the byte then process three different types of gap lengths (approx 3 microseconds to denote two consecutive high bits, 7 microseconds to denote a low bit and 11 microseconds to denote two consecutive low bits). Corruption happens when the magnetic structure of the disk is changed so that a high bit is read as a low bit or a low bit is read as a high bit (possibly less likely) or the gap length timings expand / contract. Once the disk bytes are calculated one has to look at the data field as a whole and follow the steps that the disk operating system would take to convert disk bytes into user bytes ie the memory bytes that the Apple II users would be used to when dealing with memory at the assembly language level or peeking / poking in higher level languages. The conversion process takes disk bytes, runs them through a lookup table with a one to one translation, exclusive ORs the result then performs a 6and2 prenibbleisation calculation. All these stages help in determining where the corruption is in the sector.

a. The one to one translation can help by determining if a disk byte is valid because it has to match only one of the sixty four valid disk bytes.
b. The exclusive OR step is probably the best at determining if corruption has happened because every byte is included in the calculation to determine the checksum.
c. The 6and2 prenibbleisation stage is helpful if corruption is contained to a small area because this stage splits the user bytes into different parts of the disk sector.

Because of the corruption in the data, when importing and processing the data bytes we end up with an invalid data field. The prenibble byte calculations get pushed past the checksum and field end bytes. To fix the field size I inserted in some dummy bytes to make the size equal 342 disk bytes plus the checksum.

I rewrote parts of the "AppleIIDiskIIAnalyzer" so that it was more forgiving on timings. What I ended up with was what had initially been produced by the copy programs.

To track the corrupt data all I needed to do was to find the bytes "6F 66 20 61 67" within the "Disk II Data Stream Processor". This I knew had been read without corruption. This gave me the start time of the data field within the bitstream recording. After counting 3 bytes for the data field header then 86 disk bytes which comprise of the compressed "0 0 C0 C1 B0 B1 A0 A1" bytes put me at the beginning of the user prenibble bytes. Traversing another eighty odd characters I arrived at the corruption area. Counting the bitstream bytes showed that indeed this was true. The "AppleIIDiskIIAnalyzer" has difficulty converting the bytes when it hits the bitstream containing the corruption. Looking further afield, the bitstream seems to suggest that the corruption is contained to one area and the protocol finally recovers towards the end of the sector.

To recover the bottom part of the sector I setup the processing to work backwards. If we assume the checksum exclusive OR result is zero and run the exclusive OR calculation in reverse we get the above result. Again I had to tinker with the "AppleIIDiskIIAnalyzer" timings to get a better outcome. The protocol does not recover straight way after the corruption area but a least we have a good chunk that can be worked out automatically. At this point we are left with just seventeen bytes to fix.

Still working from the end of the sector towards the corrupted area the rest of the calculations needed to be done manually. Out came the pen and paper. Working with two or three bytes at a time I was able to reconstruct the data all the way to the corrupted area.

The remaining eight bytes would end up being the most difficult to recover because the actual data was missing ie the disk was corrupt in this area. I placed a template of good looking bytes over the top of the corrupted bytes to give me an idea of how long normal byte lengths were. From this I continued to do byte recognition based from optimal timings. I was more confident over some bytes than others. I played around with different combinations until I stumbled across a solution that fulfilled the exclusive OR calculation. The displayed text was "caibu,ne". I didn't believe I had the right combination of bytes but the exclusive OR calculation worked. I could now turn off the reverse processing code and only deal with processing the data in one direction. The last two bytes made sense in the text so I was confident I was down to six bytes.

How hard can it be to brute force six bytes ie find every valid combination of data bytes? It turns out to be harder than it looks. The program is not just a bunch of "for" loops but instead something that knows the available disk bytes to use, 6and2 byte processing and the bytes it needs from the first 86 bytes of the data field. By brute forcing just two bytes I got a result that showed me sixty four possible combinations that satisfied the exclusive OR calculation. When four bytes were brute forced it generated a file that was half a gigabyte in size. It would have taken me months just to go through the results. I did notice that patterns were appearing in the output results and therefore I didn't need to process all six bytes at a time. In terms of just readable characters I determined that the lists (c,g,k,o,s,w), (a,e,i,m,q,u,y) and (b,f,j,n,r,v,z) could be used so I modified the program to output all the possible readable options. This didn't take too long and only took me about fifteen minutes to go through the results. The only word that worked well was "many" in terms of size and matching the existing converted text. Hooray that was the all the text worked out. I thought I was done.

Omnivore was used to fix the sector on the corrupted disk image. All I had to do now was to check and see if this worked. In the emulator I played the game several times and worked out where Track $18 Sector $C was being used. My celebrations were shorted lived. I realised that the fix did not work as expected. It didn't crash but it cut short the line in question ("build" was displayed instead of "buildings"). So even though the text was fixed, the character in front of the word "many" needed adjusting. I thought this character was a page or paragraph separator but in fact it must be some sort of length descriptor.

I changed the program to only look at solutions that contained the word "many" however it still produced a huge list of results that satisfied the exclusive OR test. I was lucky yet again. The corrupted sector and the text within it formed the part of the game where you were reading a letter (a message of your progress). After you read the letter you were asked if you wanted to read the letter again. I produced several disk images with what I thought may have been the correct replacement bytes. Back in the game I confirmed that I wanted to reread the letter and every time it showed me the letter I was able to swap another disk image and test out another option. Some options produced text but with incorrect line lengths, some options produced blank lines but one option got it spot on. Finally the right bytes were found and tested. I was able to use this same method to test the original disk and sure enough when the letter (message) was being displayed the hard drive generates that distinctive grinding sound and stops the program from continuing. When attempting to continue it keeps trying to reread the corrupted sector.

The images above show the six bytes used in the final cut and their results. I also generated a physical disk with the reconstructed image and then used the logic anlayser to read Track $18 Sector $C. The corrupted bitstream can now be compared with the reconstructed bitstream. The bitstreams do not differ by much. I knew that there were missing bits within the corrupted gaps however what threw me was the gap length variations. In hindsight it looks like I should have been able to reconstruct the bitstream without having to do the brute force option. Next time that may be the case.

You may be thinking this is all well and good but not everybody has a logic analyser. That's true but I think a device could be constructed for under $20 to do the job. I have used a Cypress FX2LP (a $10 breakout board) to sample video signals at 14MHz and send them over USB to a modern computer. To do the same with slow disk drive signals would be a push over. Think poor mans EDD Plus solution. This maybe a project I'll tackle in the future some day.  

The "Convict" software can be located here.
My tools can be found here if one wishes to check out the source code.

Good luck with you own endeavours.


sicklittlemonkey said...

Did the corrupted disk read wrong consistently on all reads, even on different drives, or did the values change? I had a similar problem with a disk years ago, but found that the sector would change only from the point of bit-slip. I assumed it was one bit and managed to correct it.

Anyway, nice work Alex. I hope you get around to the poor man's EDD card. I used the same term when I had a go at rewriting the Disk II sequencer to do that, but the ROM is just too small. ; - )


Lukazi said...

Hi Nick,

The values changed on every bad read whether I was using a different drive or the same drive. The values changed after the first corrupted byte. Luckily the three bad bits that I had were only a few bytes apart. What I found with this disk was that the bits were not spaced well like the ones I get to see using my grey drives. I suspect this disk was created using an older drive like a Disk II or maybe a clone drive.

Unfortunately I can't see myself getting to the EDD card any time soon. Number one priority now is getting a special project together for the next OzKFest (in September).