Feature #14328: SIMD vectorization - Ruby - Ruby Issue Tracking System

Feature #14328

open

SIMD vectorization

Added by ahorek (Pavel Rosický) over 7 years ago. Updated about 7 years ago.

Status:

Open

Assignee:

Target version:

[ruby-core:84688]

Description

Hello,
in order to make Ruby faster, I'd like to propose an optional SIMD optimization for some cases. I want to target SSE2, which is available in all modern x86 processors (Pentium 4, Athlon 64 and newer).

This is usually handled automatically by GCC at compile time, but because of the dynamic nature of Ruby (method redefinitions etc.) it's very hard to optimize the code ahead of the actual execution.

use auto-vectorization provided by JIT ( https://bugs.ruby-lang.org/issues/12589 )

GCC can do that, but I'm not sure how reliable and effective it is today

Pros:
we don't have to do anything, let GCC do the job
bigger scope for optimizations

Cons:
slower compilation

gcc docs:
https://gcc.gnu.org/projects/tree-ssa/vectorization.html
PyPy has had this feature implemented for some time now:
https://morepypy.blogspot.cz/2015/10/pypy-400-released-jit-with-simd.html

specialize known bottlenecks by hand

Pros:
predictable performance
no increase in compilation time

Cons:
code complexity

Unfortunately, using SIMD isn't free: there's an overhead, so it needs a large data set to be effective. It's useful mainly for math operations, sum, min, max, arrays, matrices, string manipulations etc. There probably won't be any significant benefit for applications like Rails.
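To make the overhead point concrete, here is a minimal sketch of a hand-written SSE2 sum. `sum_f64` and the size threshold are made up for illustration and are not Ruby code; the threshold in particular is a guess that would have to be measured.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Hypothetical helper, not part of Ruby: sums n doubles. */
double sum_f64(const double *a, size_t n)
{
    double total = 0.0;
    size_t i = 0;
#if defined(__SSE2__)
    /* For small inputs the setup and the final reduction are not worth it.
       The value 16 is an assumption, not a measured threshold. */
    if (n >= 16) {
        __m128d acc = _mm_setzero_pd();          /* two partial sums */
        double lanes[2];
        for (; i + 2 <= n; i += 2)
            acc = _mm_add_pd(acc, _mm_loadu_pd(a + i));
        _mm_storeu_pd(lanes, acc);               /* reduce the two lanes */
        total = lanes[0] + lanes[1];
    }
#endif
    for (; i < n; i++)                           /* scalar path and tail */
        total += a[i];
    return total;
}
```

Note that the vector path adds the elements in a different order than the scalar loop, so floating-point results can differ in the last bits.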

what do you think about it?

Related issues 1 (1 open — 0 closed)

#1 [ruby-core:84832]

Updated by naruse (Yui NARUSE) over 7 years ago

I have tried using SIMD in some parts,
but the performance improvement was limited.

Of course it can improve performance a lot, but only in special use cases.
Ruby usually handles small data, where the SIMD overhead can't be ignored.

math operations

Ruby uses GMP if it is available.

sum, min, max, arrays, matrixes

A normal array can store values of any type.
To use the power of SIMD, the array should be a typed array like NArray.
That's not an issue of Ruby itself.
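A rough sketch of why the element type matters. The `struct boxed` layout below is a hypothetical tagged value invented for this example; it is not Ruby's actual VALUE representation.

```c
#include <stddef.h>

/* Hypothetical tagged value, NOT Ruby's real object representation. */
enum tag { TAG_INT, TAG_FLOAT, TAG_OTHER };

struct boxed {
    enum tag tag;
    union { long i; double f; } as;
};

/* Generic array: every element needs a type check, so the loop body
   branches per element and does not map onto one vector instruction. */
double sum_boxed(const struct boxed *v, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        switch (v[i].tag) {
        case TAG_INT:   total += (double)v[i].as.i; break;
        case TAG_FLOAT: total += v[i].as.f;         break;
        default:        break; /* a real VM would have to dispatch here */
        }
    }
    return total;
}

/* Typed array: contiguous doubles and no per-element check. This is the
   loop shape that hand-written SIMD or a vectorizer can work with. */
double sum_typed(const double *v, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}
```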

string manipulations

I tried to use SSE2 for coderange_scan() in string.c, but it didn't improve performance much.
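For readers who haven't seen this kind of scan, here is a minimal sketch of an SSE2 "is everything 7-bit ASCII" check. `is_ascii` is a made-up helper for illustration; it is not the code that was tried in string.c.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Hypothetical helper: returns 1 if all n bytes are 7-bit ASCII, else 0. */
int is_ascii(const unsigned char *p, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    for (; i + 16 <= n; i += 16) {
        __m128i chunk = _mm_loadu_si128((const __m128i *)(p + i));
        /* movemask collects the top bit of each of the 16 bytes */
        if (_mm_movemask_epi8(chunk) != 0)
            return 0;
    }
#endif
    for (; i < n; i++)          /* scalar tail (or whole input without SSE2) */
        if (p[i] & 0x80)
            return 0;
    return 1;
}
```

For strings shorter than 16 bytes the vector loop never runs, which may be one reason the gain is small on typical inputs.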

SSE 4.2 STTNI is also interesting, but I haven't found a good use case that would pay for the increased code complexity.

Updated by nobu (Nobuyoshi Nakada) about 7 years ago

Status changed from Open to Closed

Updated by nobu (Nobuyoshi Nakada) about 7 years ago

Status changed from Closed to Open

#4 [ruby-core:87958]

Updated by ahorek (Pavel Rosický) about 7 years ago

@naruse (Yui NARUSE) I saw your blank implementation, impressive
https://github.com/ruby/ruby/commit/e6bc209abf81d53c2e3374dc52c2a128570c6055

The complexity of hand-written SIMD code is probably too high. Ruby supports a lot of platforms, so we would have to duplicate the code (compatibility paths) or make a portable interface for it.

Here's also an interesting implementation of a "strip" method:
https://github.com/lemire/despacer

I don't like the idea of exposing SIMD types like NArray to the developer, but some languages did it this way (like Dart).

The best solution is to teach the JIT how to vectorize at least basic loops like

for (int i = 0; i < N; ++i)
  A[i] = B[i] + C[i];

->

/* VECTOR_ADD handles 8 elements per call; assumes N is a multiple of 8 */
for (int i = 0; i < N; i += 8)
  VECTOR_ADD(A + i, B + i, C + i);

Unfortunately, it's not always as simple as this example.
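As a sketch of what "not as simple" means even for this loop, here is a runnable SSE2 version. `add_i32` is a hypothetical name; with 128-bit SSE2 registers and 32-bit integers one step covers 4 elements rather than 8, the leftover elements need a scalar tail, and the code assumes the output does not partially overlap the inputs.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Hypothetical helper: a[i] = b[i] + c[i] for i in [0, n).
   Assumes a does not partially overlap b or c. */
void add_i32(int32_t *a, const int32_t *b, const int32_t *c, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    for (; i + 4 <= n; i += 4) {                 /* 4 x int32 per step */
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        __m128i vc = _mm_loadu_si128((const __m128i *)(c + i));
        _mm_storeu_si128((__m128i *)(a + i), _mm_add_epi32(vb, vc));
    }
#endif
    for (; i < n; i++)                           /* scalar tail: n % 4 items */
        /* unsigned add so overflow wraps, like the vector path */
        a[i] = (int32_t)((uint32_t)b[i] + (uint32_t)c[i]);
}
```

A JIT would additionally have to prove that the element types are stable and that nothing has been redefined before it could emit a loop like this.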

Updated by naruse (Yui NARUSE) over 5 years ago

Related to Misc #16487: Potential for SIMD usage in ruby-core added
