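Deploy more than 300 AI models, more easily and quickly

With the RBLN SDK, you can quickly deploy AI models to Rebellions NPUs. For details on the Compiler, Runtime, Model Library, Serving Framework, and more, please refer to the guides. The SDK supports a smooth workflow from development through operation.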

Get started with industry-standard frameworks

Hugging Face

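The RBLN SDK supports Hugging Face Transformers and Diffusers models. With the Optimum RBLN library, you can download recent models such as Llama3-8B and SDXL from the Hugging Face Hub and deploy them.

💡 Run Hugging Face models on Rebellions hardware

  • Compile and run inference on Hugging Face models optimized for Rebellions NPUs
  • Developer-friendly API built on the RBLN Runtime
  • Multi-chip configurations supported for models such as Llama and SDXL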

PyTorch

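The RBLN SDK fully supports PyTorch 2.0. You can accelerate a wide range of PyTorch-based workloads on Rebellions NPUs, including natural language processing (NLP), speech, and vision models.

💡 Integrating PyTorch models

  • Compile PyTorch models optimized for Rebellions hardware
  • Developer-friendly API built on the RBLN Runtime
  • Connect Torch 2.0 models directly to a serving pipeline without prior tuning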

TensorFlow

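The RBLN SDK supports TensorFlow and optimizes inference for a variety of models, including LLMs, ImageNet classification models, and YOLO.

💡 Integrating TensorFlow models

  • Use the wide range of pretrained models in Keras Applications right away
  • Developer-friendly API built on the RBLN Runtime
  • Connect TensorFlow models as-is to a serving pipeline without prior tuning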

The Rebellions software stack

Introducing the Rebellions software stack, designed to maximize hardware performance.

Machine Learning Framework

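Key development tools for natural language processing, vision, speech, and generative models

Smooth integration with industry-standard frameworks such as PyTorch, TensorFlow, and Hugging Face

Maximizes developer productivity and simplifies the end-to-end workflow through deployment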

Compiler

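The frontend compiler converts models written in PyTorch and TensorFlow into Rebellions' intermediate representation (IR), providing a smooth path to a model that can run on the device

The backend compiler takes the converted model and generates the command stream and program binary that maximize device efficiency

Advanced optimization techniques reduce AI inference cost, simplify operator optimization and memory management, and improve operational efficiency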

Compute Library

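Optimizes the operations required for inference across diverse AI models, including vision and transformer models

Supports CNNs (convolutional neural networks) and the latest generative AI models

Optimizes advanced operations on top of the scalable Neural Engine architecture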

Runtime Module

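A core component that mediates execution between compiled models and the hardware

Handles data transfer, instruction execution, and performance monitoring to provide an optimal AI inference environment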

Driver

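Provides an optimal interface between the OS and the NPU to maximize hardware utilization

Consists of a kernel driver and a user driver, supporting smooth communication between hardware and software

Delivers the command stream generated by the RBLN Compiler to the ATOM™ device and manages its execution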

Firmware

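The lowest-level interface between software and hardware

Coordinates work across the memory hierarchy and monitors hardware state to ensure stable AI inference

Supports stable workload distribution and predictable performance even when running large-scale AI models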

RBLN Backend Rebellions Hardware

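Delivers 32 TFLOPS at FP16 and 128 TOPS at INT8, and uses 64 MB of on-chip SRAM to achieve optimal memory bandwidth and low latency

Provides top performance and cost-effectiveness for data center, cloud AI, and on-premises AI workloads where power efficiency matters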

Frequently Asked Questions

Can’t find what you’re looking for? Contact us here!

Q. Which AI frameworks and libraries does RBLN SDK support?
A.
RBLN SDK supports models based on PyTorch and TensorFlow and is also compatible with the Hugging Face Transformers/Diffusers libraries.

We are continuously improving compatibility with major AI frameworks through regular updates.
Q. Can I compile PyTorch or TensorFlow models with RBLN SDK without code modifications?
A.

In most cases, you can use the RBLN SDK with minimal code changes.


  • For officially supported Model Zoo models, you can use the provided example code right away.
  • Other models can also be compiled by referring to the Model Zoo code.

Check the list of supported operations (Supported Ops) in advance.

Q. How can I maximize the performance of transformer-based models?
A.

To maximize the performance of transformer-based models, consider the following:


  • Set the rbln_tensor_parallel_size value appropriately to utilize NPU parallelism
  • Tune the input sequence length and batch size
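
As a rough sketch, the settings above might be collected and passed to Optimum RBLN when compiling a Llama model. Only rbln_tensor_parallel_size is named in this FAQ. The class name RBLNLlamaForCausalLM, the export flag, and the rbln_batch_size and rbln_max_seq_len keyword names are assumptions to verify against the Optimum RBLN documentation, and the values and model ID are placeholders.

```python
def build_compile_options(tensor_parallel_size, batch_size, max_seq_len):
    """Collect the compile-time settings discussed above into keyword arguments."""
    if min(tensor_parallel_size, batch_size, max_seq_len) < 1:
        raise ValueError("all settings must be positive integers")
    return {
        "rbln_tensor_parallel_size": tensor_parallel_size,  # named in the FAQ
        "rbln_batch_size": batch_size,    # assumed keyword name
        "rbln_max_seq_len": max_seq_len,  # assumed keyword name
    }


def compile_llama(model_id, **options):
    """Compile a Hugging Face Llama checkpoint for the NPU (needs optimum-rbln)."""
    # Imported lazily so the option-building code runs without the SDK installed.
    from optimum.rbln import RBLNLlamaForCausalLM  # assumed class name

    return RBLNLlamaForCausalLM.from_pretrained(model_id, export=True, **options)


options = build_compile_options(tensor_parallel_size=4, batch_size=1, max_seq_len=8192)
print(options)
# On a machine with the SDK and at least four NPUs (placeholder model ID):
# model = compile_llama("meta-llama/Meta-Llama-3-8B-Instruct", **options)
```

Keeping the options in a plain dictionary makes it easy to compile the same model with several combinations and compare them with the Profiler.
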
Q. Does RBLN Runtime API support C/C++?
A.

The RBLN SDK provides a C/C++-bound runtime for applications where a Python runtime is unavailable or extremely low latency is required.
Please refer to the C/C++ guide for more information.

Q. How do I ensure version compatibility with AI frameworks?
A.

The RBLN SDK and Compiler are regularly updated to maintain API compatibility with the latest versions of major frameworks.
For details, please refer to the respective Release Notes.

Q. Is RBLN SDK compatible with PyTorch?
A.

RBLN SDK offers high compatibility with PyTorch-based models.


• torch.compile() Support: Fully compatible with PyTorch 2.0’s torch.compile() feature, and supports models compiled using TorchDynamo and TorchInductor backends.


• Extensive Operator Support: The RBLN Compiler supports most PyTorch operators. You can check the full list in Supported Ops. It also includes major operators for Vision, NLP, and Audio, making it suitable for a wide range of deep learning models.


• PyTorch Model Zoo Compatibility: Popular models such as ResNet, YOLO, LLaMA, and BERT are supported. See the PyTorch Model Zoo page for more details.


• JIT/Scripted Model Support: Models converted using TorchScript can also be processed by the RBLN Compiler.

Q. How do I install RBLN Driver?
A.

The RBLN Driver can be installed using the provided deb or rpm installation files and requires root privileges. During installation, you must ensure that the kernel version is compatible with the driver.


Q. What is the required Python version and are there additional dependencies?
A.

Python 3.9 or higher is recommended, and there are key package dependencies such as numpy, torch, and onnx.


Please refer to the Support Matrix page for the supported OS and Python versions.
Required packages may vary by model, so refer to the requirements.txt file included in the Model Zoo code for details.
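
A quick way to check an environment against the answer above is sketched below. It uses only the Python standard library; the minimum version and the package names come from the answer, while the helper names are illustrative. It does not replace the Support Matrix or a model's requirements.txt.

```python
import sys
from importlib import metadata

MIN_PYTHON = (3, 9)                    # recommended minimum from the answer above
PACKAGES = ("numpy", "torch", "onnx")  # dependencies named in the answer above


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True when the interpreter meets the recommended minimum version."""
    return tuple(version_info[:2]) >= minimum


def installed_versions(packages=PACKAGES):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


print("Python version OK:", python_ok())
for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```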

Q. Does RBLN SDK support Windows?
A.

Currently, RBLN SDK only supports Linux. Windows support will be determined based on our technical roadmap.


Q. Can I run inference on multiple devices?
A.

The RBLN SDK supports distributed inference based on tensor parallelism, called RSD (Rebellions Scalable Design).
Please first check the list of models that support multi-device inference, then refer to the provided example for compilation instructions.

Q. Can I measure and analyze model performance?
A.

You can analyze metrics such as latency, throughput, and memory usage using the Profiler included in the SDK.


With rbln-stat, you can also monitor power consumption and utilization.

Q. How do I determine the optimal batch size?
A.

The optimal batch size may vary depending on the type of NPU used, server configuration, and service requirements.
We recommend using the Profiler tool and conducting various experiments for fine-tuning.
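
The kind of experiment suggested above can be sketched as a simple throughput sweep. This is a generic timing harness, not the RBLN Profiler: run_batch stands for any callable that runs inference on a given number of samples, and fake_inference is a stand-in workload so the sketch runs without an NPU.

```python
import time


def measure_throughput(run_batch, batch_size, repeats=10):
    """Return samples per second for one batch size."""
    run_batch(batch_size)  # warm-up call, excluded from timing
    start = time.perf_counter()
    for _ in range(repeats):
        run_batch(batch_size)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against zero
    return batch_size * repeats / elapsed


def sweep_batch_sizes(run_batch, batch_sizes, repeats=10):
    """Measure every candidate and return (best_batch_size, results)."""
    results = {b: measure_throughput(run_batch, b, repeats) for b in batch_sizes}
    best = max(results, key=results.get)
    return best, results


def fake_inference(batch_size):
    """Stand-in workload: fixed overhead plus a small per-sample cost."""
    time.sleep(0.002 + 0.0005 * batch_size)


best, results = sweep_batch_sizes(fake_inference, [1, 2, 4, 8], repeats=5)
for size, rate in results.items():
    print(f"batch={size}: {rate:.0f} samples/s")
print("highest throughput at batch size", best)
```

The sweep ranks candidates by throughput only. A service with strict latency requirements may still prefer a smaller batch size than the one reported here.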

Q. Are there profiling and optimization tools?
A.

RBLN SDK includes the RBLN Profiler for performance bottleneck analysis, collecting key metrics such as execution time, memory usage, and operation dependencies.


  • .pb format trace files can be visualized with Perfetto.
  • You can analyze bottlenecks, inter-operation dependencies, and layer-by-layer latency to identify optimization directions.

For detailed usage, refer to the Profiler Guide.
Q. How do I process video input files (.mp4)?
A.

To process video files, you can use libraries like OpenCV (cv2) to extract each frame from an .mp4 file as an image, and then feed those frames into the model for inference.


For example, when using an object detection model like YOLOX, the typical procedure is as follows:


1. Load the video file using cv2.VideoCapture
2. Extract frames one by one
3. Preprocess each frame to match the model’s input format
4. Perform object detection using the model
5. Visualize the results and either save them or display them in real time
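
The five steps can be sketched roughly as follows. cv2.VideoCapture and its read(), isOpened(), and release() methods are standard OpenCV calls. The preprocess, detect, and handle_result callables are hypothetical placeholders for model-specific code such as YOLOX preprocessing and the compiled model's inference call.

```python
def iter_frames(capture):
    """Yield frames from a cv2.VideoCapture-like object until the stream ends."""
    try:
        while True:
            ok, frame = capture.read()  # read() returns (success_flag, frame)
            if not ok:
                break
            yield frame
    finally:
        capture.release()  # always free the video handle


def detect_in_video(path, preprocess, detect, handle_result):
    """Run the five steps above on a video file (requires opencv-python)."""
    import cv2  # imported lazily so iter_frames works without OpenCV installed

    capture = cv2.VideoCapture(path)  # step 1: load the video
    if not capture.isOpened():
        capture.release()
        raise IOError(f"cannot open video: {path}")
    for index, frame in enumerate(iter_frames(capture)):  # step 2: extract frames
        model_input = preprocess(frame)          # step 3: match the model's input format
        detections = detect(model_input)         # step 4: run object detection
        handle_result(index, frame, detections)  # step 5: visualize, save, or display
```

Passing the model-specific parts in as callables keeps the frame loop independent of any particular detector, so the same loop can be reused with a different model.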

Q. Which FP16 formats does RBLN SDK support?
A.
RBLN SDK supports BFloat16, IEEE 754 FP16, and custom FP16 formats. FP32 models can be automatically cast to FP16 during compilation with the RBLN Compiler.
Q. How are ATOM and REBEL different?
A.

Both are AI inference NPUs developed by Rebellions, but REBEL is a next-generation product designed with a chiplet-based architecture. A detailed comparison chart is available on the product page.

Q. Can I train models with RBLN SDK?
A.
The current RBLN SDK is designed for inference-only use. Plans for training support will be announced through the roadmap once they are finalized.
Q. Do you support Kubernetes?
A.

Yes. You can use Rebellions AI processor resources via the Kubernetes Plugin.

Q. What Kubernetes tools are available?
A.
  • Kubernetes Device Plugin: Enables RBLN NPUs in a Kubernetes cluster environment.
  • NPU Feature Discovery: Labels Kubernetes nodes with RBLN NPUs for scheduling.
  • RBLN Metrics Exporter: Exposes NPU metrics (temperature, power, DRAM, utilization) in Prometheus format for Grafana dashboards.
Q. Which NPUs are officially supported by the RBLN SDK?
A.
As of May 30, 2025, the SDK supports ATOM™+ (RBLN-CA22) and ATOM™-Max (RBLN-CA25). Support for ATOM™ (RBLN-CA02) ended on June 30, 2025.
Q. Do you support V1 Engine?
A.

Yes. The vLLM V1 Engine brings improvements for generative and multimodal models. Enable it by setting the following environment variable:


export VLLM_USE_V1=1
Q. Which serving frameworks do you support?
A.

RBLN SDK is compatible with vLLM, NVIDIA Triton Inference Server, and TorchServe. Container-based deployments can also be integrated with Kubernetes.

Q. How are NPUs and GPUs different?
A.
While both NPUs (Neural Processing Units) and GPUs (Graphics Processing Units) perform parallel computations, they differ in their optimized computation methods and intended use cases.

GPUs were originally designed for graphics rendering but have been widely adopted for AI training and high-performance computing (HPC) due to their large-scale parallel processing capabilities. They typically use FP32/FP16 operations and support various types of computation through CUDA cores and Tensor Cores.

NPUs are processors specialized for AI and deep learning, designed to perform efficient computations at low power. They are optimized for low-bit operations such as INT8 and FP16 and include dedicated hardware architectures that accelerate neural network computations.
Q. How can I fine-tune models or optimize inference?
A.

Rebellions devices are designed exclusively for inference, and fine-tuning is not currently supported.


To maximize inference performance, we recommend the following optimization strategies:


  • Use Mixed Precision and Quantization: Improve memory efficiency and compute speed by using FP16 or INT8 quantized models.
  • Adjust Batch Size: Find the optimal batch size based on model characteristics and input data to increase throughput.
  • Refactor Model Architecture: Simplify the computation graph through layer fusion and removal of redundant operations to boost performance.
  • Double Buffering: Utilize double buffering in AsyncRuntime to improve execution efficiency.
  • Apply Continuous Batching for LLM Serving: For large language model (LLM) serving, maximize hardware utilization by applying continuous batching techniques using vllm-rbln.
Q. Is there a forum or a support channel?
A.

You can ask questions or discuss technical issues on the Rebellions Dev Forum. You can also reach out to us directly here.

Q. How often are the firmware and driver updated?
A.

The SDK is updated approximately every month, and the driver is updated every three months, although the schedule is subject to change.

Q. Model compilation fails.
A.

Currently, for officially supported models listed in the RBLN Model Zoo, you can use the provided compilation and inference example code.


If you’re using a modified model or a model not included in the Model Zoo, technical support may be limited, and compilation may fail.
First, check the error code to identify the cause. If further assistance is required, please reach out via the Rebellions Dev Forum.

Q. I get errors during language model compilation and inference.
A.

Please check the following items:



  • Memory Usage: If the system runs out of memory during compilation, the process may fail.
  • NPU Configuration: Ensure that the value of rbln_tensor_parallel_size is not greater than the actual number of devices installed in your system. You can verify the number of devices by running the rbln-stat command in your terminal.
  • Docker Environment: Refer to the Docker Guide for more details.
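
The NPU configuration check can be scripted along these lines. The sketch runs rbln-stat only if it is on the PATH and returns its raw output for you to read; it does not parse the output, since the format is not described here. The helper names are illustrative.

```python
import shutil
import subprocess


def rbln_stat_output():
    """Return the text printed by rbln-stat, or None if the tool is not on PATH."""
    if shutil.which("rbln-stat") is None:
        return None
    result = subprocess.run(["rbln-stat"], capture_output=True, text=True, check=False)
    return result.stdout


def check_tensor_parallel_size(requested, installed_devices):
    """Raise ValueError when more devices are requested than are installed."""
    if requested < 1:
        raise ValueError("rbln_tensor_parallel_size must be at least 1")
    if requested > installed_devices:
        raise ValueError(
            f"rbln_tensor_parallel_size={requested} exceeds the "
            f"{installed_devices} installed device(s)"
        )
    return requested


print(rbln_stat_output() or "rbln-stat not found on PATH")
# Read the device count from the output above, then validate before compiling:
print(check_tensor_parallel_size(requested=2, installed_devices=4))
```
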
Q. CPU usage is too high during model inference.
A.

You can limit the number of CPU threads used during inference by setting the RBLN_NUM_THREADS environment variable. Specifying an appropriate number of threads can reduce CPU load and help stabilize performance.


Please refer to this document for more details.
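
From Python, the variable can be set as in the minimal sketch below. It assumes the variable must be set before the RBLN runtime is imported in the same process, which is worth confirming in the referenced document. Using half of the visible cores is only a placeholder policy.

```python
import os


def limit_rbln_threads(num_threads):
    """Set RBLN_NUM_THREADS for this process and any child processes."""
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    os.environ["RBLN_NUM_THREADS"] = str(num_threads)
    return os.environ["RBLN_NUM_THREADS"]


# Placeholder policy: use half of the visible CPU cores, but at least one.
threads = max(1, (os.cpu_count() or 2) // 2)
print("RBLN_NUM_THREADS =", limit_rbln_threads(threads))
# Import and start the inference runtime only after this point (assumption).
```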

Q. I get errors after driver/compiler updates.
A.

Issues may arise due to version mismatches between the driver and compiler.


  • Refer to the Release Notes of the RBLN SDK to ensure that all components are installed with compatible versions.
  • After aligning all libraries to their compatible versions, try recompiling the model.

Developer resources and support