Sizecoding Blog by TomCat |
2023.09.06. | |
In low resolution black & white pattern would have look awful in slow motion without antialiasing. This short routine uses 16 grey colors of the standard VGA color palette to make the movement smooth:
aalias: CBW TEST AL,16+32+64 JZ .1 MOV AL,255 .1: XOR AL,AH AND AL,15 ADD AL,16download the full source code |
2020.02.20. | |
Quick compofiller entry with softsynth drums. I store the drum grid as a bitfield. You can make your own drum loop here: learningmusic@ableton
DRUMS: DW 0000100010001000B ; open hihat DW 0101011001010110B ; closed hihat DW 1000000000000000B ; snare (clap) DW 0111000100110001B ; kick drum DRUM1: BT [BP-BASE+DRUMS],DI JNC DRUM2 ... DRUM2: BT [BP-BASE+DRUMS+2],DI JNC DRUM3download the full source code |
2019.11.03. | |
To copy a line with 32 unrolled LDI instructions is quite fast and only have to decide that, which stored line should copy for next.
LD DE,$4000 ; video mem address LD BC,192*32 ; video mem length nextline: LD HL,$8000 ADD HL,DE ; stored f(y) at memory $C000 LD A,(HL) ; read f(y) timer: ADD A,254 ; add frame counter ADD A,$A0 ; stored lines at memory $A000 LD H,A part: LD L,0 ; xoffset (part selector) unroll: LDI ... ; unrolled x32 JP PE,nextline JR nextframedownload the full source code |
2019.10.29. | |
Five lines intersect each other, rotated by 72 degrees and if we are closer to the center than to 4 lines (at least), then we are inside the star.
MOV CX,405H ; CL: total lines, CH: min lines for inside lines: MOV AL,[BX] ; BX: sinus IMUL DL ; x*sinus XCHG DI,AX ; save result ADD BL,64 ; BX: sinus->cosinus MOV AL,[BX] ; BX: cosinus IMUL DH ; y*cosinus ADD AX,DI ; x*sin+y*cos CMP AX,BP ; BP: zoom constant JG next ; > outside DEC CH ; < inside JZ putpixel ; 4 inside -> draw star next: SUB BL,64+51 ; next line rotated by 72 deegre ... ; 5x loopdownload the full source code |
2019.10.26. | |
Every movement repeats after 256 frames while the pointer walks through the whole sinus table.
@@: FLDPI ; PI FMUL DWORD [SI] ; PI/128 FIMUL WORD [DI] ; counter*PI/128 FSIN ; SIN(counter*PI/128) FIMUL WORD [SI] ; 127*SIN(counter*PI/128) FISTP WORD [BX+SI] ; - INC BX DEC BYTE [DI] ; loop 256x JNZ @B ; BX = sinus table ... DEC BL ; BL = my own timer counter, BX = angle HLT ; wait for timing (cheaper than 46CH BIOS var)download the full source code |
2019.10.24. | |
Real shadows? of an unreal 3D object? Yeah, shadows are always fake and easily derivated from the previous pixels.
SUB AL,45 ; AX: Y coordinate, BX: sinus table JNG @F ; if we are at 45 lines lower than the center IMUL BP,AX,-640 ; then check the screenbuffer for triangle IMUL BYTE [BX] ; shift the shadow in x direction SAR AX,6 ; in opposite of the light SUB BP,AX CMP [ES:DI+BP],CH ; if screenbuffer pixel > 127 JBE @F ; then screenbuffer pixel = triangle SHR BYTE [SI],1 ; and decrase the intensity of the background @@: MOVSB ; PutPixeldownload the full source code |
2019.10.23. | |
I use only one channel of the AY-3-8912 music chip. The one byte frame counter gives the pitch (2 bits), the volume (4 bits) and the echo fade as well (2 bits).LD A, ($5C78) ; frame counter (incrasing) -> A OR $3F ; upper 2 bits + 63 -> AY Channel A fine pitch LD A, ($5C78) ; frame counter (incrasing) -> A LD (HL), A ; save frame counter CPL ; flip frame counter (decrasing) -> A AND $0F LD C, A ; lower 4 bits -> echo volume RLD AND $03 LD B, A ; middle 2 bits > amount of fade JR Z, nofade fade: SRL C ; fading out DJNZ fade ; C -> AY Channel A volumedownload the full source code |
2019.10.21. | |
The easiest way to play music under Windows is using general MIDI. I'm storing 64 notes (2 notes per bytes) so this is 32 bytes in total. And the whole stored data is played once again, but higher. We had to change only one note by extra during the repeat.
MOV EDX,[ESI-start+MIDIMSG] ; 007F1090H @@:BT [ESI-start+score],EBX RCL DH,1 INC BL ; bit counter JNC @B ; loop x4 JNZ @F ; last note? XOR WORD [ESI-start+last],7402H; then we have to change it for the 2nd play @@:ADD DH,[ESI-start+adder] PUSH EDX ; send note on channel 0 PUSH DWORD [EDI] ; handle (given by midiOutOpen) CALL DWORD [ESI-start+midiOutShortMsg]download the full source code |
2019.10.20. | |
For the scroll first I copy an empty line and the printed text and another empty line to the back buffer (every pixel by four times). Then I use the back buffer as a mask. The scroll is moving by three pixels.
scrolling: MOVSX SI,DL ; SI=Y coord IMUL SI,SI,56 ; 56 -> 1/height of letters AND SI,1111110000000000B ; SI=Y*56/256*1024 ADD SI,BX ; SI+=X coord MOVZX BP,CL IMUL BP,BP,-3 ; BP=3*framecounter (fcnt is a neg num) CMP BYTE [BP+SI+0D360H],7 ; check back buffer ... greetings: DB 10,'Hi iONic rascy$' ; an empty line and the greeting textdownload the full source code |
2019.10.19. | |
Altough I recycle any possible part of the PE header, less code fits into a 512byte win32 EXE file than into a DOS 256byte COM file. On the other hand I can use such virtual resolution as I want. To get smooth moving thin grid in vertical direction, I have 768 lines, and in horizontal direction I have only 256 pixels to get an acceptable drawing speed.
PUSH 00CC0020H; SRCCOPY PUSH EBX ; DIB_RGB_COLORS PUSH EBP ; BITMAPINFOHEADER PUSH start+RESX*RESY PUSH EDX ; RESY PUSH EAX ; RESX ... ;hdc,rc.left,rc.top,rc.right,rc.bottom,0,0,ResX,ResY,pixels,bmpnfo,0,SRCCOPY CALL DWORD [ESI-start+StretchDIBits] SUB ESP,13*4 ; repair the stack frame (preserves StretchDIBits arguments)download the full source code |
2019.10.17. | |
After playing a complete Bach music with 549 notes, only 20 bytes left for me to make some visualisation. Scrolling soundbars in horizontal direction is better choice then scrolling verticaly, because we don't need to copy the whole screen.
PUSH 0A000H ; Horizontal scroll POP ES ; ES=A000: video segment SUB DI,DI ; Cheap destination address: 0 MOV SI,1 ; Source, scrolling left by 1 pixel MOV CH,54H ; 54H * 256 * 2 pixels are enough ES: ; ES prefix: cheaper than set DS REP MOVSW ; Copying by words ... IMUL DI,AX,-320*2 ; AX * -1: mirror, AX * 2: scale REP STOSB ; AL: MIDI note = bar colordownload the full source code |
2019.10.16. | |
To change a pixel in Truecolor video mode normally we write 4 bytes to the video memory by accessing the VRAM 4 times. But if we setup the CPU's MSR register to combine of these memory writes, then we can achieve 3-4x speedup by this. Computing every pixel at smooth animation speed is not possible without this speedup. But on the other hand this means the intro won't run under DOSBox.
INC SI ; Write Combine speedup PUSH SI ; greets to HellMood PUSH SI ; MOV EAX,01010101H POP EAX ; greets to Rrrola DB 66H ; MOV ECX,00000259H LEA CX,[259H] ; greets to Ervin CDQ ; MOV EDX,00000000H WRMSR ; (kills DOSBox)download the full source code |
2019.01.27. | |
First I draw the path of the scroll with 7 x 35 elements. Then we have no other job, just rotating the colors in the color palette:
MOV CH,3 ; 256*3 color components REP OUTSB ; changing the whole color palette MOV CX,3*7*35 ; rotating the colors in the puffer .2: DEC SI MOV DL,[SI-7*3] ; by 7 colors forward MOV [SI],DL LOOP .2 ... ; we put one new column to the pufferdownload the full source code |
2019.01.24. | |
I store characters in 5 bits. I have less than 32 different characters: 25 lowercase letters (no Z in the text), the point, the space, the double space and the new line character. After skipping the first 8 letters the title's 15 chars are converted to uppercase, and every first and every fifth character of the paragraphs too:
MOV CL,8 ; offset of the title newpar: MOV DH,00100001B ; flags for capital letters ... LOOP skip ; no converting INC CX ; CX = 1 -> LOOP doesn't skip anymore SHL DI,1 ; DI:0FFFEH -> 15 captital letters JB convert SHR DH,1 ; every first and fifth chars JC convertdownload the full source code |
2019.01.23. | |
I was browsing the Mandelbrot set as deep what is allowed by the FPU's 80-bit precision floatingpoint numbers. I was searching for nice and interesting forms and structures. The intro zooms in and out to these coordinates:
+0.34390699597256746411, -0.70062002023500613567 -1.25764648790205013639, +0.11831488894193964434 -0.93789936639955584496, +0.31736094066985742756 -0.72568075954437072372, -0.27254962894836931575 -0.15713601278801156424, -1.10494558202452419770 +0.34390699597256746411, +0.70062002023500613567 -1.25764648790205013639, -0.11831488894193964434 -0.93789936639955584496, -0.31736094066985742756 -0.72568075954437072372, +0.27254962894836931575 -0.15713601278801156424, +1.10494558202452419770download the full source code |
2019.01.22. | |
Storing the bits in horizontal direction it would be logic, but I store them from the upper-right corner to the lower-left corner in vertical direction. Because in this way I can spare 3 bits at the start of data and 18 bits at the end.MOV SI,26*14-16 ; number of bits MOV DL,14 ; coloums of the image .3: MOV CL,26 ; raws of the image .4: BT [BP-text+bits],SI ; the bit means color #0 or #255 ... DEC SI ; counting the bits left LOOPNZ .4 ; repeat until bottom of scr or end of bits SUB DI,320*7*26+8 DEC DX ; go to next coloum JNZ .3download the full source code |
2019.01.21. | |
8 gray shaded photo looked poor quality, but 16 shades wasn't needed. I used 12 shades with a small blur which allowed me storing 54 circles (and ellipsoids). To find the shapes, I had a formula to summarize the differences between the original photo and the current image. With brute force I've tried every possible object, and kept the 2d-shape which gave the lowest result... and again... the next shape.
.1: ADC AL,12H ; AL = previous+shift+correction ... MOV AH,BH ; AH = new color AAD 1 ; AL = new+prev color, AH = 0 SHR AL,1 ; horizontal blur STOSB ; put pixel LOOP .1 ; next pixeldownload the full source code |
2019.01.20. | |
The screen consists modulo 256 cells. When the radius of spheres is less than 256, then easy to make 3D textures with cheap 16-bit instructions:
CMP AX,633 ; AX:X2+Y2 JNA white CMP AX,1392 JNA black ... CWD ; AX:Y XOR DX,AX ; DX = abs(Y) CMP DL,11 JNA black TEST AX,AX JS whitedownload the full source code |
2019.01.19. | |
I wasn't able to select the right tempo, so finally I left many variations in the intro. The speed of the music will be faster and faster, because I'm speeding up the system BIOS timer:
MOV AX,[GS:046CH] ; get the BIOS timer counter SUB DI,AX ; negate the counter to count down ... XCHG AX,DI SHR AX,8+5 ; the tempo will change after 2^13 ticks INC AX ; at least one -> freq divisor will be at least 257 OUT 40H,AL ; IRQ0 speedup (faster BIOS timer tick) ;OUT 40H,AL ; lower and higher bytes will be the same (in 2 pass)download the full source code |
2019.01.18. | |
There are no separate parts and transitions between them. The whole intro is only one effect. From the beginning to the end, the red constant is decreased from 9 to 0 (and there is a pause at value 4).
void mainImage(out vec4 o,vec2 u) { vec3 R = iResolution, p = R-R; p.z = 4.; while (R.z++<64.) p += vec3((u+u-R.xy)/R.x,.5) * (length(vec2(o.a=length(p.xz)-4.,p.y))-4.); o = vec4 ( 1&int(7.*(atan(p.y,o.a)-atan(p.z,p.x)-iTime)) ); }download the full source code |
2019.01.17. | |
The first and the 3rd effect run the same routine, but calculate with more or less bars. For rotating the bars I use the triangle wave function instead of sine&cosine.
.1: MOV AX,BX ; AL: time = [0...255], CH: 0 SUB AL,CH ; shift triangle wave by PI/2 CBW XOR AL,AH ; AL = triangle wave = [0...127] SUB AL,64 ; AL = [-64...63] IMUL DH ; DH: dy, DL: dx XCHG BP,AX ; BP = b*dy XCHG DH,DL XOR CH,64 ; 64 -> PI/2 (256 -> 2PI) JNZ .1 ; loop 2x ADD AX,BP ; AX = c = a*dx + b*dydownload the full source code |
2019.01.16. | |
Spheres are ordered by the distance from eye. So the first hit gives the closest object. The reflected hit order is not correct, but who cares?
.2: DEC DX ; AX:16, DX:16 .3: XCHG AX,DX STOSW ; X = DX = [15...-15] XCHG AX,DX XCHG AX,CX STOSW ; Y = CX = [24...1] XCHG AX,CX STOSW ; Z = AX = 16 NEG DX JS .3 JNZ .2 LOOP .1 ; 31x24 spheresdownload the full source code |
2019.01.15. | |
I play arpeggios alternately from two kind of chords, and I vary the tempo of arpeggios. Thanks to this, 2x3 notes are enough for a whole song :)
SHR EBP,CL ; EBP: time counter, CL: tempo of arpeggio AND BP,3 ; BP = index of the note in a chord TEST SI,8192*2 ; depends on time counter (SI) JNZ @F ; alternate between chord1 and chord2 ADD BP,NOTESA2-NOTESA1 @@: MOV AH,[BP+NOTESA1] ; get the note ... NOTESA1: DB 128,144,192,0 NOTESA2: DB 128,152,192,0download the full source code |
2019.01.14. | |
The standard VGA 256 color palette has some structures. If you increase the color index by 72 and 72, you get a darker and darker color. When the background pattern is behind a moving sphere, I do this.
CS LODSW ; get the mirrored offset XCHG AX,BX MOV AL,[BX+DI] ; get the background color TEST BX,BX ; sphere hit test JZ .3 ADD AL,72 ; make it darker .3: STOSB ; put pixeldownload the full source code |
2019.01.13. | |
Only a big gradient drawn on the screen from left to right, and the color palette continuously renewing along triangle waves. Each components of a color has different phase. This difference given by the bytebeat sample.MOV BX,SI ; BX = 256 rgb: MOV AL,[BX] ; phase of component SUB AL,CL ; next angle CBW XOR AL,AH ; triangle wave [0...127] SHR AL,1 ; AL = [0...63] OUT DX,AL ; set color component INC BX JPO rgb ; loop 3xdownload the full source code |
2019.01.12. | |
Drawing in HiRes TrueColor video mode we need a fast code, so symmetry is our friend again. The left side of a line is mirrored to the right, and for more speed every line is doubled:
MOV [SI+BX],AL ; horizontal mirror MOV [SI+BP+RESX*4/2-1],AL ... twice: PUSH SI ... REP MOVSD ; twice as fast than REP MOVSW POP SI DEC BP JPE twice ; duplicate scanlinedownload the full source code |
2019.01.11. | |
First Lyric Video in 256 byte :) Music data including the markers of lyric is compressed by a special method: 1 => fill by zero (silent); pos values => 1 byte copy then fill by the next byte (text marker and note); neg value => copy already uncompressed data (repeated patterns).
LODSW ADD CL,AH CBW DEC AX JS copy STOSB JZ fill MOVSB ... REP MOVSBdownload the full source code |
2019.01.10. | |
I store 3 different properties only in one byte: shape (0-2), size (0-6), color (0-12). If you have only three different kind of shapes, You don't need to store this data on 2 bits (one and a half bits are enough :) and AAM instruction is more powerful than AND or SHR:
LODSB AAM 21 MOV CH,AH ; CH = color AAM 7 ; AL = radius, AH = shape ... circles: DB 10*21+7*1+6 ; color(*21) + (7*)shape + radius DB 1*21+7*1+4 ...download the full source code |
2019.01.09. | |
This was my first intro when any sound and any pixel in the intro depends on the time. So it will run at the same speed on every PC or DOSBox config, but the slower PC draws the less frames. The soundsample and the pixelcolor has it's own math formula. This kind of sound generation called bytebeat:
t = time sound = 1500/(y=t&8191)&1)*35 + (x=t*"6689"[t>>15&3]/24&127)*y/40000 + ((t>>7^t>>9|t>>13|x)&63download the full source code |
2019.01.08. | |
Because not enough free DOS memory, I use the 4th byte of every pixel in the 32 bit video mode and my Z buffer stored in the video memory.
CMP [ES:DI+3],DL ; Z buffer test JA pixelok bgr: ... STOSB ; write RGB color INC BX JPO bgr ; loop 3x XCHG AX,DX ; AL = Z coord STOSB ; write Z bufferdownload the full source code |
2019.01.07. | |
Every visual effect in the intro is symmetric about the center of the screen. So I'm drawing the pixels from the topleft corner and from the bottomright corner at the same time.
MOV BH,320*200/256 SUB DI,DI nextpixel: DEC BX ... STOSB MOV [ES:BX],AL ; mirroring for speedup CMP DI,BX ; check halfscreen JC nextpixeldownload the full source code |