Direct CC Compiler

Parabix applications often have two initial stages:

  1. Transposition of byte streams to 8 parallel basis bit streams.
  2. Using the Parabix CC compiler to produce character class bit streams.
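
The two stages above can be illustrated with a minimal scalar sketch. This is not the Parabix implementation, which works on SIMD registers. The names `BasisBits`, `transpose64` and `linefeedStream` are hypothetical, and so is the convention that basis stream k holds bit k of each byte, counting from the least significant bit.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical type: 8 basis bit streams for a block of up to 64 bytes.
// Bit i of basis[k] is bit k of byte i (k = 0 is the least significant bit;
// this numbering is an assumption made for the sketch).
using BasisBits = std::array<std::uint64_t, 8>;

// Stage 1: transpose a byte stream into 8 parallel basis bit streams.
BasisBits transpose64(const unsigned char* bytes, std::size_t n) {
    assert(n <= 64);
    BasisBits basis{};
    for (std::size_t i = 0; i < n; ++i) {
        for (unsigned k = 0; k < 8; ++k) {
            std::uint64_t bit = (bytes[i] >> k) & 1u;
            basis[k] |= bit << i;
        }
    }
    return basis;
}

// Stage 2: a character class stream as bitwise logic on the basis streams.
// 0x0A is 0000 1010 in binary: bits 1 and 3 are set, all other bits are clear.
std::uint64_t linefeedStream(const BasisBits& b) {
    std::uint64_t mustBeSet = b[1] & b[3];
    std::uint64_t mustBeClear = b[0] | b[2] | b[4] | b[5] | b[6] | b[7];
    return mustBeSet & ~mustBeClear;
}
```

For the 6-byte input "ab\ncd\n", the resulting stream has bits 2 and 5 set, one for each line feed position.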

Avoiding S2P Overhead

  • Transposition (S2P, serial to parallel) imposes an overhead on Parabix applications.
  • The cost is about 0.5-0.7 cycles per byte.
  • In some cases, the character classes required are very simple.
    • Example: counting lines (wc -l) needs only the [\x{0A}] stream.
    • This stream can be computed directly by using hsimd_signmask on the byte stream.
  • Avoiding transposition in such simple cases may improve performance.
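
The direct approach for the line-counting case can be sketched as follows. The sketch assumes that hsimd_signmask gathers the sign bit of each 8-bit field, which is what `_mm_movemask_epi8` does on SSE2, and that the bytes are first compared with 0x0A so that matching fields have their sign bit set. The function names `linefeedMask16` and `countLines` are hypothetical. A scalar fallback is used where SSE2 is not available.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Bit i of the result is 1 iff bytes[i] == 0x0A, for one 16-byte block.
inline std::uint32_t linefeedMask16(const unsigned char* bytes) {
#if defined(__SSE2__)
    const __m128i lf = _mm_set1_epi8(0x0A);
    const __m128i data =
        _mm_loadu_si128(reinterpret_cast<const __m128i*>(bytes));
    const __m128i eq = _mm_cmpeq_epi8(data, lf);  // 0xFF where byte is LF
    // Gather the sign bit of each byte into a 16-bit mask.
    return static_cast<std::uint32_t>(_mm_movemask_epi8(eq));
#else
    std::uint32_t mask = 0;
    for (unsigned i = 0; i < 16; ++i) {
        mask |= static_cast<std::uint32_t>(bytes[i] == 0x0A) << i;
    }
    return mask;
#endif
}

// Count line feeds without transposing to basis bit streams.
std::size_t countLines(const unsigned char* data, std::size_t n) {
    std::size_t count = 0;
    std::size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        std::uint32_t m = linefeedMask16(data + i);
        for (; m != 0; m &= m - 1) {  // clear the lowest set bit each step
            ++count;
        }
    }
    for (; i < n; ++i) {  // remaining bytes after the last full block
        count += (data[i] == 0x0A);
    }
    return count;
}
```

No basis bit streams are built here: one comparison and one sign-mask step per block yield the character class bits.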

The Direct CC Compiler

The wc application was recently updated to use only the Direct CC compiler when it just counts lines (wc -l).

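    // Word and character counts need several character classes, so use the
    // Parabix CC compiler on the transposed basis bit streams.
    if (CountWords || CountChars) {
        ccc = make_unique<cc::Parabix_CC_Compiler>
                (getEntryScope(), getInputStreamSet("u8bit"));
    } else {
        // Only lines are counted: work directly on the byte stream
        // (stream 0 of the first input), with no transposition.
        ccc = make_unique<cc::Direct_CC_Compiler>
                (getEntryScope(), pb.createExtract(getInput(0), pb.getInteger(0)));
    }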

Performance Comparisons

Standard wc program.

cameron@cs-osl-10:~/icgrep-devel$ perf stat wc -l ~/Wikibooks/*l
    149060 /home/cameron/Wikibooks/arwikibooks-20150102-pages-articles.xml
   3822595 /home/cameron/Wikibooks/dewikibooks-20141216-pages-articles.xml
    149198 /home/cameron/Wikibooks/elwikibooks-20141226-pages-articles.xml
   1421406 /home/cameron/Wikibooks/eswikibooks-20141223-pages-articles.xml
    256564 /home/cameron/Wikibooks/fawikibooks-20141217-pages-articles.xml
    380247 /home/cameron/Wikibooks/fiwikibooks-20141221-pages-articles.xml
   1726088 /home/cameron/Wikibooks/frwikibooks-20150106-pages-articles.xml
    281479 /home/cameron/Wikibooks/idwikibooks-20141221-pages-articles.xml
    916939 /home/cameron/Wikibooks/jawikibooks-20150103-pages-articles.xml
    207600 /home/cameron/Wikibooks/kowikibooks-20141223-pages-articles.xml
    634492 /home/cameron/Wikibooks/ruwikibooks-20150123-pages-articles.xml
    133998 /home/cameron/Wikibooks/thwikibooks-20150104-pages-articles.xml
    196735 /home/cameron/Wikibooks/trwikibooks-20141227-pages-articles.xml
    245338 /home/cameron/Wikibooks/viwikibooks-20141221-pages-articles.xml
  10912107 /home/cameron/Wikibooks/wiki-books-all.xml
    390368 /home/cameron/Wikibooks/zhwikibooks-20141225-pages-articles.xml
  21824214 total

 Performance counter stats for 'wc -l':

        730.262254      task-clock (msec)         #    1.000 CPUs utilized          
                66      page-faults               #    0.090 K/sec                  
     1,457,095,087      cycles                    #    1.995 GHz                    
     1,398,712,316      instructions              #    0.96  insns per cycle        
       317,362,991      branches                  #  434.588 M/sec                  
        23,900,552      branch-misses             #    7.53% of all branches        

       0.730538577 seconds time elapsed

Parabix wc, revision 5854

cameron@cs-osl-10:~/icgrep-devel$ perf stat build5854/wc -l ~/Wikibooks/*l
  149060         /home/cameron/Wikibooks/arwikibooks-20150102-pages-articles.xml
...
21824214    total

 Performance counter stats for 'build5854/wc -l':

        542.880842      task-clock (msec)         #    0.999 CPUs utilized          
            20,573      page-faults               #    0.038 M/sec                  
     1,082,442,226      cycles                    #    1.994 GHz                    
     2,725,327,731      instructions              #    2.52  insns per cycle        
       170,912,946      branches                  #  314.826 M/sec                  
           430,236      branch-misses             #    0.25% of all branches        

       0.543234219 seconds time elapsed

Parabix wc, current

cameron@cs-osl-10:~/icgrep-devel$ perf stat icgrep-build/wc -l ~/Wikibooks/*l
  149060         /home/cameron/Wikibooks/arwikibooks-20150102-pages-articles.xml
...
21824214    total

 Performance counter stats for 'icgrep-build/wc -l':

        231.530436      task-clock (msec)         #    0.999 CPUs utilized          
            20,547      page-faults               #    0.089 M/sec                  
       461,674,778      cycles                    #    1.994 GHz                    
       512,511,462      instructions              #    1.11  insns per cycle        
        58,107,345      branches                  #  250.971 M/sec                  
           254,124      branch-misses             #    0.44% of all branches        

       0.231848515 seconds time elapsed

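The use of the Direct CC compiler can substantially speed up applications when the number of CC streams is small: here the cycle count falls from about 1.08 billion (revision 5854) to about 0.46 billion, roughly a 2.3x improvement, and about 3.2x compared with the standard wc.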

Direct CC Compiler with icgrep

Work is underway to optimize icgrep using the Direct CC compiler:

  • For simple regular expressions.
  • For complex regular expressions that have a simple prefix.

Updated Mon Feb. 19 2018, 08:57 by cameron.