• Welcome to Overclockers Forums! Join us to reply in threads, receive reduced ads, and to customize your site experience!

Translation

Overclockers is supported by our readers. When you click a link to make a purchase, we may earn a commission. Learn More.

JigPu

Inactive Pokémon Moderator
Joined
Jun 20, 2001
Location
Vancouver, WA
I have a prime number program witten in BASIC, which I believe I have sped up to the limits of what I can do (other than using a seive... too hard to program). I was wondering if anybody here could translate my BASIC source into C or C++ (or any other language) to speed up execution time.

Also, if you could compile it with 486 or above code, I would be very interested in how it performs (since my program can only be compiled up to 386 with my compiler). Please make sure that you do NOT use FPU emulation in the compiled code if you can help it (I have the option in both my BASIC and Pascal compilers) as it GREATLY slows down the program.

Boy I sound nit-picky today!! Perhaps I should just upload the code and trust you to do your best :rolleyes:
Post your times running 1 - 1,000,000 in a DOS window (not full-screen!).
JigPu
 
LOL! Forgot the code!!

BTW: I can also post some psudo-code for those who want it...

JigPu
 
Timing for C++ and C++ with asm main loop on a 1533MHz tbird. Compiled using visual C++6, target cpu was pentium.

C++ Timings:

Screen output, 1-1000000 in a window:
4.877 seconds, 78498 primes found, 16095.551 primes/sec, 7.850% prime

File output, 1-1000000:
1.232 seconds, 78498 primes found, 63715.909 primes/sec, 7.850% prime

No Output 1-1000000:
1.132 seconds, 78498 primes found, 69344.523 primes/sec, 7.850% prime


ASM core timings:

Screen output, 1-1000000 in a window:
4.887 seconds, 78498 primes found, 16062.615 primes/sec, 7.850% prime

File output, 1-1000000:
1.112 seconds, 78498 primes found, 70591.727 primes/sec, 7.850% prime

No Output 1-1000000:
1.012 seconds, 78498 primes found, 77567.194 primes/sec, 7.850% prime

BTW what were your timings in BASIC?
 
Ok just grabbed a basic compiler (firstbasic off qbasic.com), timings (286 code, no FP emulation):

Screen output, 1-1000000 in a window:
10.71 Seconds, 78498 Primes Found, 7329.14 Primes/Sec., 7.85 % Prime

File output, 1-1000000:
10.71 Seconds, 78498 Primes Found, 7329.277 Primes/Sec., 7.85 % Prime

No Output 1-1000000:
10.656 Seconds, 78498 Primes Found, 7366.715 Primes/Sec., 7.85 % Prime

Now either that's not much of a basic compiler, or basic really sucks... or both ;)
Interestingly the basic code runs the same speed regardless of output, whereas the C++/asm is much faster without it.
 
I think the main thing that is slowing down your BASIC version is emulation code. As you will see with my posted times, emulation REALLY slows down the code. Of course I don't have a very high opinion of the QBasic compiler either...

EDIT: Wait... You said you had FP emulation off.... Must be QBasic... Or BASIC in general :D

Here is my whole package.. Run PRM32.exe. The 'b' version is a MUCH faster version but I have not verified wether it gives correct results or not (I turned off math error checking).

JigPu
 
Last edited:
Intel Celeron: 466MHz

Screen Output:
386 Code: 20.983 sec
386 Code (FPU Emulate): 34.877 sec.

(very little diffrence in times between generations of code)


File Output:
386 Code: 21.696 sec.
386 Code (FPU Emulate): 35.153 sec.

JigPu
 
Well I tested your version, the normal version gets similar times to my build of it, the 'b' version gets much lower times: 4.83 Seconds. Didn't make much difference file or screen.
Still a lot slower than my C++ and asm.
Here's my program if you want to try it, should run on any windows based computer running a 486 or higher (I think, might need a pentium class cpu though).
 
Just optimized my asm some, new timings:

Screen: 4.426 seconds, 78498 primes found, 17735.653 primes/sec, 7.850% prime
File: 0.681 seconds, 78498 primes found, 115268.722 primes/sec, 7.850% prime
None: 0.580 seconds, 78498 primes found, 135341.379 primes/sec, 7.850% prime
 
MAN!! That program is FAST when writing to a file!!! 5 Seconds!!

However, for some reason, the screen routines are really SLOW... 43 seconds on my system when going to screen... Very odd...

Now to try to 1 Billion... :D
JigPu
 
I didn't bother to do anything fancy with screen output. If I put the effort in I could probably increase the speed quite a bit.
 
1 billion eh...
Ok if it takes 0.681s to do 1 million, and the problem has O(n sqrt(n)) complexity (which it does, trust me if you don't know how to work it out).
So for 100 million it should take 10^8*sqrt(10^8)/(10^6*sqrt(10^6)/0.681) seconds, or 681 seconds. I did a run to verify this and got:
469.355 seconds, 5761455 primes found, 12275.261 primes/sec, 5.761% prime
Incidentally, 5.7 million primes is nearly 55 megs of output.

So this shows my calculations are in the right ballpark.
Using the 100 million number to get a 1 billion figure gives 10^9*sqrt(10^9)/(10^8*sqrt(10^8)/469.355) seconds, or 14842 seconds. That's 4 hours and 7 minutes. Too long for my taste, others are welcome to run the program for that long though. :)
 
ButcherUK said:
1 billion eh...
Ok if it takes 0.681s to do 1 million, and the problem has O(n sqrt(n)) complexity (which it does, trust me if you don't know how to work it out).
If you say so... :D
ButcherUK said:
That's 4 hours and 7 minutes. Too long for my taste, others are welcome to run the program for that long though. :)
Ick... And your computer runs it about 10x faster than mine, so 40 hours of my time!!! Guess I shouldn't try after all :D :D

Oh BTW, I just dug out my aincient Assebly Language book (Peter Norton's Assebly Language Book for the IBM PC... copyright 1986 :D) and started learning it to see if I can make it any faster. Early results... FAST. Made a short program that will go up to about 65,000 (limit of the registers) and can calculate (no output) in no time!! Now if I could come up with a way to run it for larger numbers...


ALSO: If anyone else would like to chip in their own version (I know there are more people who know BASIC out there!) of my program, feel free!! I really want to see how diffrent languages compile!

JigPu
 
Here's my code, since you're getting into the asm. It's all 32-bit code, so register size isn't an issue. Bit of a long post... ;)

Code:
  int r, i, j;
  const char* fmt = "%10d";
  WORD fpuctrl, t;
  _asm {
    // for (int i = StartNum; i <= EndNum; i += 2)

    fldpi
    fstcw fpuctrl
    movzx eax,fpuctrl
    and eax,0xfcff
    mov t,ax
    fldcw t
    push 3
    mov eax,StartNum
    xor edi,edi
    pop esi
    cmp eax,EndNum
    mov i,eax
    jg  iloopend

  iloopstart:

    // int r = sqrt(i);

    fild  i
    fsqrt
    fistp r
    mov ebx,r

    // for (int j = 3; j <= r; j += 2)

    mov ecx,esi
    cmp ecx,ebx
    jg  jloopend

  jloopstart:

    // if (!(i % j)) break; 

/* This code did the above
    I recoded it to use the FPU and it's twice as fast on an athlon ;)

    mov eax,edi
    cdq
    idiv  ecx
    test  edx,edx*/

    mov j,ecx
    fild j
    fild i
    fprem1
    fldz
    fcomip  st,st(1)
    fstp st(0)
    fstp st(0)
    jz  nexti
    inc ecx
    inc ecx
    cmp ecx,ebx
    jle jloopstart

  jloopend:
    // if (j > r)

	  cmp	ecx,ebx
	  jle nexti

    // ++primes;

    inc primes

    // if (OutMeth != 3)

	  cmp	OutMeth,esi
	  je  noout

    // fprintf(fp, "%8d", i);

    push  i
    push  fmt
    push  fp
	  call  fprintf
	  add esp,12
    cmp	OutMeth,1
    je  nexti

  noout:
  // print a '.' every 2^20 (~1 million) loops if not doing screen output
    mov eax,i
    shr eax,20
    cmp eax,edi
    jle nexti
    inc edi
    xor eax,eax
    mov al,'.'
    push  eax
    call  putch

  nexti:
    mov eax,i
    inc eax
    inc eax
    cmp eax,EndNum
    mov i,eax
    jle iloopstart

  iloopend:
    fldcw fpuctrl
  }
 
:eek: :eek: :eek: :eek:
LOOK AT ALL THAT CODE!!!

[drops dead]

Here's my futile attempt at my program... No input or output (hard coded 7 - 65536). Unfortunatly, my assembler hates my program for some reason (the labes specificly) and so this is straight from good ol' DEBUG...

????:???? ??????--------MOV CX,0005
????:???? ??????--------ADD CX,0002
0E7C:0106 BB0100--------MOV BX,0001
0E7C:0109 83C302--------ADD BX,+02
0E7C:010C 39CB----------CMP BX,CX
0E7C:010E 740C----------JZ 011C
0E7C:0110 89C8----------MOV AX,CX
0E7C:0112 BA0000--------MOV DX,0000
0E7C:0115 F7F3-----------DIV BX
0E7C:0117 83FA00--------CMP DX,+00
0E7C:011A 77ED----------JA 0109
0E7C:011C 83F9F9--------CMP CX,-07
0E7C:011F 7EE2----------JLE 0103
0E7C:0121 CD20----------INT 20

EDIT: Seemed to have missed some code...

JigPu
 
Last edited:
Heh, that's only the main loop the whole program is 182 lines.

BTW, instead of
MOV DX,0000 do XOR DX,DX
CMP DX,0 do TEST DX,DX
They're faster :)

If you like I can break down my code and give an analysis of it... that'd be a monster post though.
 
ButcherUK said:
Heh, that's only the main loop the whole program is 182 lines.

BTW, instead of
MOV DX,0000 do XOR DX,DX
CMP DX,0 do TEST DX,DX
They're faster :)

If you like I can break down my code and give an analysis of it... that'd be a monster post though.
Perhaps we should move this to PMs...

YGPM :)
JigPu
 
Last edited:
Back