Sizecoding Blog by TomCat

Occasionally I will share only one thought about one of my tiny intro...

In low resolution black & white pattern would have look awful in slow motion without antialiasing. This short routine uses 16 grey colors of the standard VGA color palette to make the movement smooth:
    TEST AL,16+32+64
    JZ .1
    MOV AL,255
    AND AL,15
    ADD AL,16
download the full source code
Quick compofiller entry with softsynth drums. I store the drum grid as a bitfield. You can make your own drum loop here: learningmusic@ableton
DRUMS: DW 0000100010001000B   ; open hihat
       DW 0101011001010110B   ; closed hihat
       DW 1000000000000000B   ; snare (clap)
       DW 0111000100110001B   ; kick drum

       JNC DRUM2
       JNC DRUM3
download the full source code
To copy a line with 32 unrolled LDI instructions is quite fast and only have to decide that, which stored line should copy for next.
          LD DE,$4000    ; video mem address
          LD BC,192*32   ; video mem length
nextline: LD HL,$8000
          ADD HL,DE      ; stored f(y) at memory $C000
          LD A,(HL)      ; read f(y)
timer:    ADD A,254      ; add frame counter
          ADD A,$A0      ; stored lines at memory $A000
          LD H,A
part:     LD L,0         ; xoffset (part selector)
unroll:   LDI ...        ; unrolled x32
          JP PE,nextline
          JR nextframe
download the full source code
Five lines intersect each other, rotated by 72 degrees and if we are closer to the center than to 4 lines (at least), then we are inside the star.
       MOV CX,405H  ; CL: total lines, CH: min lines for inside
lines: MOV AL,[BX]  ; BX: sinus
       IMUL DL      ; x*sinus
       XCHG DI,AX   ; save result
       ADD BL,64    ; BX: sinus->cosinus
       MOV AL,[BX]  ; BX: cosinus
       IMUL DH      ; y*cosinus
       ADD AX,DI    ; x*sin+y*cos
       CMP AX,BP    ; BP: zoom constant
       JG next      ; > outside
       DEC CH       ; < inside
       JZ putpixel  ; 4 inside -> draw star
next:  SUB BL,64+51 ; next line rotated by 72 deegre
       ...          ; 5x loop
download the full source code
Every movement repeats after 256 frames while the pointer walks through the whole sinus table.
@@: FLDPI               ; PI
    FMUL   DWORD [SI]   ; PI/128
    FIMUL  WORD [DI]    ; counter*PI/128
    FSIN                ; SIN(counter*PI/128)
    FIMUL  WORD [SI]    ; 127*SIN(counter*PI/128)
    FISTP  WORD [BX+SI] ; -
    INC    BX
    DEC    BYTE [DI]    ; loop 256x
    JNZ    @B           ; BX = sinus table
    DEC    BL           ; BL = my own timer counter, BX = angle
    HLT                 ; wait for timing (cheaper than 46CH BIOS var)
download the full source code
Real shadows? of an unreal 3D object? Yeah, shadows are always fake and easily derivated from the previous pixels.
    SUB    AL,45           ; AX: Y coordinate, BX: sinus table
    JNG    @F              ; if we are at 45 lines lower than the center
    IMUL   BP,AX,-640      ; then check the screenbuffer for triangle
    IMUL   BYTE [BX]       ; shift the shadow in x direction
    SAR    AX,6            ; in opposite of the light
    SUB    BP,AX
    CMP    [ES:DI+BP],CH   ; if screenbuffer pixel > 127
    JBE    @F              ; then screenbuffer pixel = triangle
    SHR    BYTE [SI],1     ; and decrase the intensity of the background
@@: MOVSB                  ; PutPixel
download the full source code
I use only one channel of the AY-3-8912 music chip. The one byte frame counter gives the pitch (2 bits), the volume (4 bits) and the echo fade as well (2 bits).
      LD A, ($5C78)  ; frame counter (incrasing) -> A
      OR $3F         ; upper 2 bits + 63 -> AY Channel A fine pitch
      LD A, ($5C78)  ; frame counter (incrasing) -> A
      LD (HL), A     ; save frame counter
      CPL            ; flip frame counter (decrasing) -> A
      AND $0F       
      LD C, A        ; lower 4 bits -> echo volume
      AND $03
      LD B, A        ; middle 2 bits > amount of fade        
      JR Z, nofade  
fade: SRL C          ; fading out
      DJNZ fade      ; C -> AY Channel A volume
download the full source code
The easiest way to play music under Windows is using general MIDI. I'm storing 64 notes (2 notes per bytes) so this is 32 bytes in total. And the whole stored data is played once again, but higher. We had to change only one note by extra during the repeat.
   MOV EDX,[ESI-start+MIDIMSG]    ; 007F1090H
@@:BT [ESI-start+score],EBX
   RCL DH,1
   INC BL                         ; bit counter
   JNC @B                         ; loop x4
   JNZ @F                         ; last note?
   XOR WORD [ESI-start+last],7402H; then we have to change it for the 2nd play
@@:ADD DH,[ESI-start+adder]
   PUSH EDX                       ; send note on channel 0
   PUSH DWORD [EDI]               ; handle (given by midiOutOpen)
   CALL DWORD [ESI-start+midiOutShortMsg]
download the full source code
For the scroll first I copy an empty line and the printed text and another empty line to the back buffer (every pixel by four times). Then I use the back buffer as a mask. The scroll is moving by three pixels.
 MOVSX SI,DL                    ; SI=Y coord
 IMUL SI,SI,56                  ; 56 -> 1/height of letters
 AND SI,1111110000000000B       ; SI=Y*56/256*1024
 ADD SI,BX                      ; SI+=X coord
 IMUL BP,BP,-3                  ; BP=3*framecounter (fcnt is a neg num)
 CMP BYTE [BP+SI+0D360H],7      ; check back buffer
 DB 10,'Hi iONic rascy$'        ; an empty line and the greeting text
download the full source code
Altough I recycle any possible part of the PE header, less code fits into a 512byte win32 EXE file than into a DOS 256byte COM file. On the other hand I can use such virtual resolution as I want. To get smooth moving thin grid in vertical direction, I have 768 lines, and in horizontal direction I have only 256 pixels to get an acceptable drawing speed.
 CALL DWORD [ESI-start+StretchDIBits]
 SUB ESP,13*4  ; repair the stack frame (preserves StretchDIBits arguments)
download the full source code
After playing a complete Bach music with 549 notes, only 20 bytes left for me to make some visualisation. Scrolling soundbars in horizontal direction is better choice then scrolling verticaly, because we don't need to copy the whole screen.
 PUSH 0A000H            ; Horizontal scroll
 POP ES                 ; ES=A000: video segment
 SUB DI,DI              ; Cheap destination address: 0
 MOV SI,1               ; Source, scrolling left by 1 pixel
 MOV CH,54H             ; 54H * 256 * 2 pixels are enough
 ES:                    ; ES prefix: cheaper than set DS
 REP MOVSW              ; Copying by words
 IMUL DI,AX,-320*2      ; AX * -1: mirror, AX * 2: scale
 REP STOSB              ; AL: MIDI note = bar color
download the full source code
To change a pixel in Truecolor video mode normally we write 4 bytes to the video memory by accessing the VRAM 4 times. But if we setup the CPU's MSR register to combine of these memory writes, then we can achieve 3-4x speedup by this. Computing every pixel at smooth animation speed is not possible without this speedup. But on the other hand this means the intro won't run under DOSBox.
 INC    SI              ; Write Combine speedup
 PUSH   SI              ; greets to HellMood
 PUSH   SI              ; MOV EAX,01010101H
 POP    EAX             ; greets to Rrrola
 DB     66H             ; MOV ECX,00000259H
 LEA    CX,[259H]       ; greets to Ervin
 CDQ                    ; MOV EDX,00000000H
 WRMSR                  ; (kills DOSBox)
download the full source code
First I draw the path of the scroll with 7 x 35 elements. Then we have no other job, just rotating the colors in the color palette:
 MOV    CH,3            ; 256*3 color components
 REP    OUTSB           ; changing the whole color palette
 MOV    CX,3*7*35	; rotating the colors in the puffer
 DEC    SI
 MOV    DL,[SI-7*3]     ; by 7 colors forward
 MOV    [SI],DL
 LOOP   .2
 ...                    ; we put one new column to the puffer
download the full source code
I store characters in 5 bits. I have less than 32 different characters: 25 lowercase letters (no Z in the text), the point, the space, the double space and the new line character. After skipping the first 8 letters the title's 15 chars are converted to uppercase, and every first and every fifth character of the paragraphs too:
 MOV    CL,8            ; offset of the title
 MOV    DH,00100001B    ; flags for capital letters
 LOOP   skip            ; no converting
 INC    CX              ; CX = 1 -> LOOP doesn't skip anymore
 SHL    DI,1            ; DI:0FFFEH -> 15 captital letters
 JB     convert
 SHR    DH,1            ; every first and fifth chars
 JC     convert
download the full source code
I was browsing the Mandelbrot set as deep what is allowed by the FPU's 80-bit precision floatingpoint numbers. I was searching for nice and interesting forms and structures. The intro zooms in and out to these coordinates:
+0.34390699597256746411, -0.70062002023500613567
-1.25764648790205013639, +0.11831488894193964434
-0.93789936639955584496, +0.31736094066985742756
-0.72568075954437072372, -0.27254962894836931575
-0.15713601278801156424, -1.10494558202452419770
+0.34390699597256746411, +0.70062002023500613567
-1.25764648790205013639, -0.11831488894193964434
-0.93789936639955584496, -0.31736094066985742756
-0.72568075954437072372, +0.27254962894836931575
-0.15713601278801156424, +1.10494558202452419770
download the full source code
Storing the bits in horizontal direction it would be logic, but I store them from the upper-right corner to the lower-left corner in vertical direction. Because in this way I can spare 3 bits at the start of data and 18 bits at the end.
    MOV    SI,26*14-16          ; number of bits
    MOV    DL,14                ; coloums of the image
.3: MOV    CL,26                ; raws of the image
.4: BT     [BP-text+bits],SI    ; the bit means color #0 or #255
    DEC    SI                   ; counting the bits left
    LOOPNZ .4                   ; repeat until bottom of scr or end of bits
    SUB    DI,320*7*26+8
    DEC    DX                   ; go to next coloum
    JNZ    .3
download the full source code
8 gray shaded photo looked poor quality, but 16 shades wasn't needed. I used 12 shades with a small blur which allowed me storing 54 circles (and ellipsoids). To find the shapes, I had a formula to summarize the differences between the original photo and the current image. With brute force I've tried every possible object, and kept the 2d-shape which gave the lowest result... and again... the next shape.
 ADC    AL,12H          ; AL = previous+shift+correction
 MOV    AH,BH           ; AH = new color
 AAD    1               ; AL = new+prev color, AH = 0
 SHR    AL,1            ; horizontal blur
 STOSB                  ; put pixel
 LOOP  .1               ; next pixel
download the full source code
The screen consists modulo 256 cells. When the radius of spheres is less than 256, then easy to make 3D textures with cheap 16-bit instructions:
 CMP    AX,633          ; AX:X2+Y2
 JNA    white
 CMP    AX,1392
 JNA    black
 CWD                    ; AX:Y
 XOR    DX,AX           ; DX = abs(Y)
 CMP    DL,11
 JNA    black
 JS     white
download the full source code
I wasn't able to select the right tempo, so finally I left many variations in the intro. The speed of the music will be faster and faster, because I'm speeding up the system BIOS timer:
 MOV    AX,[GS:046CH]   ; get the BIOS timer counter
 SUB    DI,AX           ; negate the counter to count down
 SHR    AX,8+5          ; the tempo will change after 2^13 ticks
 INC    AX              ; at least one -> freq divisor will be at least 257
 OUT    40H,AL          ; IRQ0 speedup (faster BIOS timer tick)
;OUT    40H,AL          ; lower and higher bytes will be the same (in 2 pass)
download the full source code
There are no separate parts and transitions between them. The whole intro is only one effect. From the beginning to the end, the red constant is decreased from 9 to 0 (and there is a pause at value 4).
void mainImage(out vec4 o,vec2 u)
    vec3 R = iResolution, 
         p = R-R; p.z = 4.;
    while (R.z++<64.)
        p +=  vec3((u+u-R.xy)/R.x,.5) 
            * (length(vec2(o.a=length(p.xz)-4.,p.y))-4.);
    o = vec4 ( 1&int(7.*(atan(p.y,o.a)-atan(p.z,p.x)-iTime)) );                 
download the full source code
The first and the 3rd effect run the same routine, but calculate with more or less bars. For rotating the bars I use the triangle wave function instead of sine&cosine.
.1: MOV    AX,BX       ; AL: time = [0...255], CH: 0 
    SUB    AL,CH       ; shift triangle wave by PI/2
    XOR    AL,AH       ; AL = triangle wave = [0...127]
    SUB    AL,64       ; AL = [-64...63]
    IMUL   DH          ; DH: dy, DL: dx
    XCHG   BP,AX       ; BP = b*dy
    XCHG   DH,DL
    XOR    CH,64       ; 64 -> PI/2 (256 -> 2PI)
    JNZ    .1          ; loop 2x
    ADD    AX,BP       ; AX = c = a*dx + b*dy
download the full source code
Spheres are ordered by the distance from eye. So the first hit gives the closest object. The reflected hit order is not correct, but who cares?
.2: DEC    DX           ; AX:16,  DX:16
.3: XCHG   AX,DX
    STOSW               ; X = DX = [15...-15]
    XCHG   AX,DX
    XCHG   AX,CX
    STOSW               ; Y = CX = [24...1]
    XCHG   AX,CX
    STOSW               ; Z = AX = 16
    NEG    DX
    JS     .3
    JNZ    .2
    LOOP   .1           ; 31x24 spheres
download the full source code
I play arpeggios alternately from two kind of chords, and I vary the tempo of arpeggios. Thanks to this, 2x3 notes are enough for a whole song :)
 SHR    EBP,CL          ; EBP: time counter, CL: tempo of arpeggio
 AND    BP,3            ; BP = index of the note in a chord
 TEST   SI,8192*2       ; depends on time counter (SI)
 JNZ    @F              ; alternate between chord1 and chord2
 MOV    AH,[BP+NOTESA1] ; get the note
NOTESA1: DB 128,144,192,0
NOTESA2: DB 128,152,192,0
download the full source code
The standard VGA 256 color palette has some structures. If you increase the color index by 72 and 72, you get a darker and darker color. When the background pattern is behind a moving sphere, I do this.
 CS LODSW               ; get the mirrored offset
 MOV    AL,[BX+DI]      ; get the background color
 TEST   BX,BX           ; sphere hit test 
 JZ     .3
 ADD    AL,72           ; make it darker
 STOSB                  ; put pixel
download the full source code
Only a big gradient drawn on the screen from left to right, and the color palette continuously renewing along triangle waves. Each components of a color has different phase. This difference given by the bytebeat sample.
 MOV    BX,SI           ; BX = 256
 MOV    AL,[BX]         ; phase of component
 SUB    AL,CL           ; next angle
 XOR    AL,AH           ; triangle wave [0...127]
 SHR    AL,1            ; AL = [0...63]
 OUT    DX,AL           ; set color component
 INC    BX
 JPO    rgb             ; loop 3x
download the full source code
Drawing in HiRes TrueColor video mode we need a fast code, so symmetry is our friend again. The left side of a line is mirrored to the right, and for more speed every line is doubled:
 MOV    [SI+BX],AL      ; horizontal mirror
 MOV    [SI+BP+RESX*4/2-1],AL
 REP    MOVSD           ; twice as fast than REP MOVSW
 POP    SI
 DEC    BP
 JPE    twice           ; duplicate scanline
download the full source code
First Lyric Video in 256 byte :) Music data including the markers of lyric is compressed by a special method: 1 => fill by zero (silent); pos values => 1 byte copy then fill by the next byte (text marker and note); neg value => copy already uncompressed data (repeated patterns).
 JS	copy
 JZ	fill
download the full source code
I store 3 different properties only in one byte: shape (0-2), size (0-6), color (0-12). If you have only three different kind of shapes, You don't need to store this data on 2 bits (one and a half bits are enough :) and AAM instruction is more powerful than AND or SHR:
 AAM    21
 MOV    CH,AH           ; CH = color
 AAM    7               ; AL = radius, AH = shape
 DB 10*21+7*1+6         ; color(*21) + (7*)shape + radius
 DB  1*21+7*1+4
download the full source code
This was my first intro when any sound and any pixel in the intro depends on the time. So it will run at the same speed on every PC or DOSBox config, but the slower PC draws the less frames. The soundsample and the pixelcolor has it's own math formula. This kind of sound generation called bytebeat:
t = time
sound = 1500/(y=t&8191)&1)*35
 + (x=t*"6689"[t>>15&3]/24&127)*y/40000
 + ((t>>7^t>>9|t>>13|x)&63
download the full source code
Because not enough free DOS memory, I use the 4th byte of every pixel in the 32 bit video mode and my Z buffer stored in the video memory.
 CMP    [ES:DI+3],DL    ; Z buffer test
 JA     pixelok
 STOSB                  ; write RGB color
 INC    BX
 JPO    bgr             ; loop 3x
 XCHG   AX,DX           ; AL = Z coord
 STOSB                  ; write Z buffer
download the full source code
Every visual effect in the intro is symmetric about the center of the screen. So I'm drawing the pixels from the topleft corner and from the bottomright corner at the same time.
 MOV    BH,320*200/256
 DEC    BX
 MOV    [ES:BX],AL      ; mirroring for speedup
 CMP    DI,BX           ; check halfscreen
 JC     nextpixel
download the full source code

Please write comments and questions here: