Direct CC Compiler
Parabix applications often have two initial stages:
- Transposition of byte streams to 8 parallel basis bit streams.
- Using the Parabix CC compiler to produce character class bit streams.
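The scalar C++ sketch below makes these two stages concrete for a single [\x{0A}] class. It assumes 64-byte blocks with one 64-bit word per stream, and a bit numbering in which basis stream k holds bit k of each byte, with k = 0 the least significant bit. Parabix's own numbering may differ. `transpose` and `newlineFromBasis` are hypothetical names, and real S2P uses SIMD operations rather than this per-bit loop. The point is that even one character class must be assembled from all eight basis streams once the bytes have been transposed.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed block size: one 64-bit word per stream, so one block covers 64 bytes.
constexpr std::size_t kBlock = 64;

// Basis bit streams for one block: bit i of basis[k] is bit k of input byte i
// (k = 0 least significant; an assumption made for this sketch).
using Basis = std::array<std::uint64_t, 8>;

// Scalar stand-in for S2P transposition: the 8 bits of each of 64 bytes
// are scattered into 8 parallel 64-bit streams.
Basis transpose(const unsigned char* bytes) {
    Basis basis{};
    for (std::size_t i = 0; i < kBlock; ++i) {
        for (int k = 0; k < 8; ++k) {
            if ((bytes[i] >> k) & 1u) {
                basis[k] |= std::uint64_t{1} << i;
            }
        }
    }
    return basis;
}

// Character class [\x{0A}] computed from basis bits: 0x0A = 0b00001010,
// so bits 1 and 3 must be set and the other six bits must be clear.
std::uint64_t newlineFromBasis(const Basis& b) {
    return b[1] & b[3] & ~(b[0] | b[2] | b[4] | b[5] | b[6] | b[7]);
}
```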
Avoiding S2P Overhead
- Transposition (S2P, serial-to-parallel) imposes an overhead on Parabix applications.
- Cost is about 0.5-0.7 cycles per byte.
- In some cases, the character classes required may be very simple.
- Example: counting lines (wc -l) only needs the [\x{0A}] stream.
- This stream can be computed directly on the byte stream: compare each byte with 0x0A, then use hsimd_signmask to gather one result bit per byte.
- Avoiding transposition in simple cases may improve performance.
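The scalar C++ sketch below shows the direct approach. It emulates, one byte at a time, what a SIMD byte comparison followed by hsimd_signmask produces: one bit per byte, set where the byte equals 0x0A. No basis bits are computed. The function names and the 64-byte block size are assumptions for illustration. On x86, SSE2's `_mm_movemask_epi8` performs the same sign-bit gather for 16 bytes at a time.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Direct CC sketch: build the [\x{0A}] stream straight from the bytes of one
// block (n <= 64 assumed), with no transposition.
std::uint64_t newlineMaskDirect(const unsigned char* bytes, std::size_t n) {
    std::uint64_t mask = 0;
    for (std::size_t i = 0; i < n; ++i) {
        // A SIMD byte compare yields 0xFF (sign bit set) or 0x00 per byte.
        unsigned char cmp = (bytes[i] == 0x0A) ? 0xFF : 0x00;
        // Signmask step: keep only the sign bit of each byte, at position i.
        mask |= static_cast<std::uint64_t>(cmp >> 7) << i;
    }
    return mask;
}

// wc -l style counting: popcount the newline stream block by block.
std::size_t countLines(const unsigned char* data, std::size_t len) {
    std::size_t total = 0;
    for (std::size_t pos = 0; pos < len; pos += 64) {
        std::size_t n = (len - pos < 64) ? (len - pos) : 64;
        total += std::bitset<64>(newlineMaskDirect(data + pos, n)).count();
    }
    return total;
}
```

Compared with the transposition sketch, only one comparison per byte is needed to produce the single stream that line counting uses.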
The Direct CC Compiler
The wc application has recently been updated to use only the Direct CC compiler when it is just counting lines.
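if (CountWords || CountChars) {
    // Word or character counting: use the Parabix CC compiler
    // over the transposed basis bits (the "u8bit" stream set).
    ccc = make_unique<cc::Parabix_CC_Compiler>(getEntryScope(), getInputStreamSet("u8bit"));
} else {
    // Line counting only: the Direct CC compiler works on the
    // untransposed byte stream (stream 0 of input 0).
    ccc = make_unique<cc::Direct_CC_Compiler>(getEntryScope(), pb.createExtract(getInput(0), pb.getInteger(0)));
}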
Performance Comparisons
Standard wc program
cameron@cs-osl-10:~/icgrep-devel$ perf stat wc -l ~/Wikibooks/*l
   149060 /home/cameron/Wikibooks/arwikibooks-20150102-pages-articles.xml
  3822595 /home/cameron/Wikibooks/dewikibooks-20141216-pages-articles.xml
   149198 /home/cameron/Wikibooks/elwikibooks-20141226-pages-articles.xml
  1421406 /home/cameron/Wikibooks/eswikibooks-20141223-pages-articles.xml
   256564 /home/cameron/Wikibooks/fawikibooks-20141217-pages-articles.xml
   380247 /home/cameron/Wikibooks/fiwikibooks-20141221-pages-articles.xml
  1726088 /home/cameron/Wikibooks/frwikibooks-20150106-pages-articles.xml
   281479 /home/cameron/Wikibooks/idwikibooks-20141221-pages-articles.xml
   916939 /home/cameron/Wikibooks/jawikibooks-20150103-pages-articles.xml
   207600 /home/cameron/Wikibooks/kowikibooks-20141223-pages-articles.xml
   634492 /home/cameron/Wikibooks/ruwikibooks-20150123-pages-articles.xml
   133998 /home/cameron/Wikibooks/thwikibooks-20150104-pages-articles.xml
   196735 /home/cameron/Wikibooks/trwikibooks-20141227-pages-articles.xml
   245338 /home/cameron/Wikibooks/viwikibooks-20141221-pages-articles.xml
 10912107 /home/cameron/Wikibooks/wiki-books-all.xml
   390368 /home/cameron/Wikibooks/zhwikibooks-20141225-pages-articles.xml
 21824214 total

 Performance counter stats :

       730.262254 task-clock (msec)   # 1.000 CPUs utilized
               66 page-faults         # 0.090 K/sec
    1,457,095,087 cycles              # 1.995 GHz
    1,398,712,316 instructions        # 0.96 insns per cycle
      317,362,991 branches            # 434.588 M/sec
       23,900,552 branch-misses       # 7.53% of all branches

      0.730538577 seconds time elapsed
Parabix wc, revision 5854
cameron@cs-osl-10:~/icgrep-devel$ perf stat build5854/wc -l ~/Wikibooks/*l
   149060 /home/cameron/Wikibooks/arwikibooks-20150102-pages-articles.xml
 ...
 21824214 total

 Performance counter stats for 'build5854/wc -l::

       542.880842 task-clock (msec)   # 0.999 CPUs utilized
           20,573 page-faults         # 0.038 M/sec
    1,082,442,226 cycles              # 1.994 GHz
    2,725,327,731 instructions        # 2.52 insns per cycle
      170,912,946 branches            # 314.826 M/sec
          430,236 branch-misses       # 0.25% of all branches

      0.543234219 seconds time elapsed
Parabix wc, current
cameron@cs-osl-10:~/icgrep-devel$ perf stat icgrep-build/wc -l ~/Wikibooks/*l
   149060 /home/cameron/Wikibooks/arwikibooks-20150102-pages-articles.xml
 ...
 21824214 total

 Performance counter stats for 'icgrep-build/wc -l:

       231.530436 task-clock (msec)   # 0.999 CPUs utilized
           20,547 page-faults         # 0.089 M/sec
      461,674,778 cycles              # 1.994 GHz
      512,511,462 instructions        # 1.11 insns per cycle
       58,107,345 branches            # 250.971 M/sec
          254,124 branch-misses       # 0.44% of all branches

      0.231848515 seconds time elapsed
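In these line-counting runs, elapsed time falls from 0.543 s (revision 5854) to 0.232 s for the current build, about 2.3x faster, and the current build is more than 3x faster than the standard wc (0.731 s). The Direct CC compiler can therefore substantially speed up applications when only a small number of CC streams is needed.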
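The ratios quoted above can be recomputed from the perf figures. The small C++ helper below does that arithmetic. The struct, constants, and function names are illustrative only, and the values are copied verbatim from the three runs.

```cpp
#include <cassert>
#include <cstdio>

// Figures copied from the three perf runs above (wc -l over the same files).
struct PerfRun {
    const char* label;
    double seconds;       // "seconds time elapsed"
    double cycles;
    double instructions;
};

const PerfRun kStandard{"standard wc", 0.730538577, 1457095087.0, 1398712316.0};
const PerfRun kRev5854{"Parabix wc r5854", 0.543234219, 1082442226.0, 2725327731.0};
const PerfRun kCurrent{"Parabix wc current", 0.231848515, 461674778.0, 512511462.0};

// How many times faster `fast` is than `base` in elapsed time.
double speedup(const PerfRun& base, const PerfRun& fast) {
    return base.seconds / fast.seconds;
}

// Instructions per cycle, the same ratio perf reports as "insns per cycle".
double ipc(const PerfRun& r) { return r.instructions / r.cycles; }

// Prints one summary line per run.
void printSummary() {
    const PerfRun* runs[] = {&kStandard, &kRev5854, &kCurrent};
    for (const PerfRun* r : runs) {
        std::printf("%-20s %.3f s  IPC %.2f  speedup vs standard %.2fx\n",
                    r->label, r->seconds, ipc(*r), speedup(kStandard, *r));
    }
}
```

From revision 5854 to the current build, instructions drop by about 5.3x while cycles drop by about 2.3x, so IPC falls from 2.52 to 1.11. This suggests the gain comes from executing far fewer instructions rather than from executing them faster.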
Direct CC Compiler with icgrep
Work is underway to optimize icgrep using the Direct CC compiler:
- For simple regular expressions.
- For complex regular expressions that have a simple prefix.
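The sketch below illustrates the simple-regular-expression case. A fixed string can be matched using only direct CC streams, one per byte value in the pattern, combined with a Parabix-style Advance (a one-position shift of a marker stream). It makes several assumptions: a single 64-byte block, a plain byte-string pattern, and no handling of matches that cross block boundaries. `byteMask` and `matchLiteral` are hypothetical names, and this is not icgrep's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Direct CC stream for a single byte value: bit i set iff bytes[i] == c.
// Assumes n <= 64 (one block).
std::uint64_t byteMask(const unsigned char* bytes, std::size_t n, unsigned char c) {
    std::uint64_t mask = 0;
    for (std::size_t i = 0; i < n; ++i)
        if (bytes[i] == c) mask |= std::uint64_t{1} << i;
    return mask;
}

// Marks the position of the last byte of every occurrence of `lit` in the block.
// Advance(m) is modelled as m << 1 (stream position i is bit i); the bit shifted
// out of the top of the block is dropped rather than carried into the next block.
std::uint64_t matchLiteral(const unsigned char* bytes, std::size_t n, const char* lit) {
    std::size_t len = std::strlen(lit);
    if (len == 0) return 0;
    std::uint64_t marker = byteMask(bytes, n, static_cast<unsigned char>(lit[0]));
    for (std::size_t k = 1; k < len; ++k)
        marker = (marker << 1) & byteMask(bytes, n, static_cast<unsigned char>(lit[k]));
    return marker;
}
```

For the text "xabab" and pattern "ab", the result has bits 2 and 4 set, which are the positions of the two 'b' bytes that end a match. Only as many CC streams are built as the pattern has bytes, so no transposition is needed.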
Updated Mon Feb. 19 2018, 08:57 by cameron.