Direct CC Compiler
Parabix applications often have two initial stages:
- Transposition of byte streams to 8 parallel basis bit streams.
- Using the Parabix CC compiler to produce character class bit streams.
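To make the two stages concrete, here is a minimal scalar sketch of what they compute for one block of up to 64 bytes. It is not the Parabix implementation: the names `transpose_block` and `line_feed_class` are hypothetical, and the bit numbering (stream k holds bit k of each byte, with k = 0 the least significant bit) is an assumption chosen for this illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Eight basis bit streams for one block: bit i of basis[k] is bit k of byte i.
using Basis = std::array<uint64_t, 8>;

// Stage 1 (hypothetical helper): transpose up to 64 bytes into 8 bit streams.
Basis transpose_block(const unsigned char* block, size_t n) {
    Basis basis{};  // all streams start as zero
    for (size_t i = 0; i < n; ++i) {
        for (int k = 0; k < 8; ++k) {
            basis[k] |= static_cast<uint64_t>((block[i] >> k) & 1u) << i;
        }
    }
    return basis;
}

// Stage 2 (hypothetical helper): the character class [\x{0A}] as bitwise logic.
// 0x0A is 00001010 in binary, so bits 1 and 3 must be set and all others clear.
uint64_t line_feed_class(const Basis& b) {
    return b[1] & b[3] & ~(b[0] | b[2] | b[4] | b[5] | b[6] | b[7]);
}
```

Each bit of the result marks one byte position, so a single pass of bitwise operations classifies 64 positions at once. The transposition step is paid whether the application needs one character class or many.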
Avoiding S2P Overhead
- Transposition imposes an overhead on Parabix applications.
- Cost is about 0.5-0.7 cycles per byte.
- In some cases, the character classes required may be very simple.
- Example: counting lines (wc -l) only needs the [\x{0A}] stream.
  - Can directly compute this stream using hsimd_signmask on the byte stream.
- Avoiding transposition in simple cases may improve performance.
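The direct approach can be sketched as follows. This is a portable scalar illustration, not the Parabix code: it assumes that the direct computation amounts to a byte-wise equality test whose per-byte result (0xFF or 0x00) is reduced to one bit per byte by collecting sign bits, which is how the role of `hsimd_signmask` is interpreted here. On x86 with SSE2, the analogous steps would be `_mm_cmpeq_epi8` followed by `_mm_movemask_epi8`. The names `line_feed_mask`, `popcount64` and `count_lines` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Bit i of the result is set when block[i] is a line feed (0x0A).
// No transposition is performed: the bytes are compared as they are.
static uint64_t line_feed_mask(const unsigned char* block, size_t n) {
    uint64_t mask = 0;
    for (size_t i = 0; i < n; ++i) {
        // Byte-wise compare: 0xFF on a match, 0x00 otherwise.
        unsigned char eq = (block[i] == 0x0A) ? 0xFF : 0x00;
        // Sign-mask step: move the high bit of the compare result to bit i.
        mask |= static_cast<uint64_t>(eq >> 7) << i;
    }
    return mask;
}

// Count the set bits of a 64-bit word (one bit is cleared per iteration).
static int popcount64(uint64_t x) {
    int count = 0;
    while (x != 0) {
        x &= x - 1;
        ++count;
    }
    return count;
}

// Count line feeds by processing the text in blocks of up to 64 bytes.
size_t count_lines(const std::string& text) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(text.data());
    size_t total = 0;
    for (size_t pos = 0; pos < text.size(); pos += 64) {
        size_t remaining = text.size() - pos;
        size_t n = remaining < 64 ? remaining : 64;
        total += static_cast<size_t>(popcount64(line_feed_mask(p + pos, n)));
    }
    return total;
}
```

For example, `count_lines("one\ntwo\n")` counts two line feeds. The point of the sketch is that a single character class needs only one compare and one mask extraction per block, with none of the work of producing eight basis streams.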
The Direct CC Compiler
The wc application has recently been updated to use only the Direct CC compiler when just counting lines, that is, when neither words nor characters are requested.
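if (CountWords || CountChars) {
    // Counting words or characters: use the Parabix CC compiler on the "u8bit" input stream set.
    ccc = make_unique<cc::Parabix_CC_Compiler>(getEntryScope(), getInputStreamSet("u8bit"));
} else {
    // Counting lines only: use the Direct CC compiler on a stream extracted from input 0.
    ccc = make_unique<cc::Direct_CC_Compiler>(getEntryScope(), pb.createExtract(getInput(0), pb.getInteger(0)));
}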
Performance Comparisons
Standard wc program.
cameron@cs-osl-10:~/icgrep-devel$ perf stat wc -l ~/Wikibooks/*l
149060 /home/cameron/Wikibooks/arwikibooks-20150102-pages-articles.xml
3822595 /home/cameron/Wikibooks/dewikibooks-20141216-pages-articles.xml
149198 /home/cameron/Wikibooks/elwikibooks-20141226-pages-articles.xml
1421406 /home/cameron/Wikibooks/eswikibooks-20141223-pages-articles.xml
256564 /home/cameron/Wikibooks/fawikibooks-20141217-pages-articles.xml
380247 /home/cameron/Wikibooks/fiwikibooks-20141221-pages-articles.xml
1726088 /home/cameron/Wikibooks/frwikibooks-20150106-pages-articles.xml
281479 /home/cameron/Wikibooks/idwikibooks-20141221-pages-articles.xml
916939 /home/cameron/Wikibooks/jawikibooks-20150103-pages-articles.xml
207600 /home/cameron/Wikibooks/kowikibooks-20141223-pages-articles.xml
634492 /home/cameron/Wikibooks/ruwikibooks-20150123-pages-articles.xml
133998 /home/cameron/Wikibooks/thwikibooks-20150104-pages-articles.xml
196735 /home/cameron/Wikibooks/trwikibooks-20141227-pages-articles.xml
245338 /home/cameron/Wikibooks/viwikibooks-20141221-pages-articles.xml
10912107 /home/cameron/Wikibooks/wiki-books-all.xml
390368 /home/cameron/Wikibooks/zhwikibooks-20141225-pages-articles.xml
21824214 total
Performance counter stats :
730.262254 task-clock (msec) # 1.000 CPUs utilized
66 page-faults # 0.090 K/sec
1,457,095,087 cycles # 1.995 GHz
1,398,712,316 instructions # 0.96 insns per cycle
317,362,991 branches # 434.588 M/sec
23,900,552 branch-misses # 7.53% of all branches
0.730538577 seconds time elapsed
Parabix wc, revision 5854
cameron@cs-osl-10:~/icgrep-devel$ perf stat build5854/wc -l ~/Wikibooks/*l
149060 /home/cameron/Wikibooks/arwikibooks-20150102-pages-articles.xml
...
21824214 total
Performance counter stats for 'build5854/wc -l::
542.880842 task-clock (msec) # 0.999 CPUs utilized
20,573 page-faults # 0.038 M/sec
1,082,442,226 cycles # 1.994 GHz
2,725,327,731 instructions # 2.52 insns per cycle
170,912,946 branches # 314.826 M/sec
430,236 branch-misses # 0.25% of all branches
0.543234219 seconds time elapsed
Parabix wc, current
cameron@cs-osl-10:~/icgrep-devel$ perf stat icgrep-build/wc -l ~/Wikibooks/*l
149060 /home/cameron/Wikibooks/arwikibooks-20150102-pages-articles.xml
...
21824214 total
Performance counter stats for 'icgrep-build/wc -l:
231.530436 task-clock (msec) # 0.999 CPUs utilized
20,547 page-faults # 0.089 M/sec
461,674,778 cycles # 1.994 GHz
512,511,462 instructions # 1.11 insns per cycle
58,107,345 branches # 250.971 M/sec
254,124 branch-misses # 0.44% of all branches
0.231848515 seconds time elapsed
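The use of the Direct CC compiler can substantially speed up applications when the number of CC streams is small. In the runs above, wc -l took about 232 ms with the current build, compared with about 543 ms for revision 5854 (roughly 2.3x faster) and about 730 ms for the standard wc (roughly 3.2x faster).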
Direct CC Compiler with icgrep
Work is underway to optimize icgrep using the Direct CC compiler:
- For simple regular expressions.
- For complex regular expressions, with a simple prefix.
Updated Mon Feb. 19 2018, 08:57 by cameron.