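Over 300 AI models, made simpler and faster
With the RBLN SDK, you can quickly deploy AI models to Rebellions NPUs. See the guides for details on the Compiler, Runtime, Model Library, Serving Framework, and more. The SDK covers the path from development to production.
Start with industry-standard frameworks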
Hugging Face
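The RBLN SDK supports Hugging Face Transformers and Diffusers models. With the Optimum RBLN library, you can download the latest models such as Llama3-8B and SDXL and deploy them through Hugging Face Hub.
💡 Run Hugging Face models on Rebellions hardware
- Compile and run inference on Hugging Face models optimized for Rebellions NPUs
- Developer-friendly API built on the RBLN Runtime
- Multi-chip configurations supported for models such as Llama and SDXL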
PyTorch
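The RBLN SDK fully supports PyTorch 2.0. It accelerates a wide range of PyTorch-based workloads on Rebellions NPUs, including natural language processing (NLP), speech, and vision models.
💡 Integrating PyTorch models
- Compile PyTorch models optimized for Rebellions hardware
- Developer-friendly API built on the RBLN Runtime
- Connect Torch 2.0 models directly to a serving pipeline without prior tuning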
TensorFlow
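The RBLN SDK supports TensorFlow and optimizes inference for a variety of models, including LLMs, ImageNet classification models, and YOLO.
💡 Integrating TensorFlow models
- Use the wide range of pretrained models in Keras Applications right away
- Developer-friendly API built on the RBLN Runtime
- Connect TensorFlow models as-is to a serving pipeline without prior tuning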
The Rebellions software stack
Meet the Rebellions software stack, designed to maximize hardware performance.
Machine Learning Framework
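Core development tools for natural language processing, vision, speech, and generative models
Integrates smoothly with industry-standard frameworks such as PyTorch, TensorFlow, and Hugging Face
Maximizes developer productivity and simplifies the end-to-end workflow through deployment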
Compiler
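The front-end compiler converts models written in PyTorch and TensorFlow into the Rebellions intermediate representation (IR), easing the transition to a model that can run on the device
The back-end compiler takes the converted model and generates the command stream and program binary that maximize device efficiency
Advanced optimization techniques reduce AI inference cost and simplify operation optimization and memory management, improving operational efficiency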
Compute Library
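Optimizes the operations required for inference across a range of AI models, including vision and transformer models
Supports CNNs (convolutional neural networks) and the latest generative AI models
Optimizes advanced operations on top of the scalable Neural Engine architecture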
Runtime Module
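A core component that mediates execution between the compiled model and the hardware
Handles data transfer, instruction execution, and performance monitoring to provide an optimal AI inference environment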
Driver
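Provides an optimal interface between the OS and the NPU, maximizing hardware utilization
Consists of a kernel driver and a user driver, supporting smooth communication between hardware and software
Delivers the command stream generated by the RBLN Compiler to the ATOM™ device and manages its execution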
Firmware
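The lowest-level interface between software and hardware
Coordinates work across the memory hierarchy and monitors hardware state to keep AI inference stable
Supports stable workload distribution and predictable performance, even when running large-scale AI models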
RBLN Backend Rebellions Hardware
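Delivers 32 TFLOPS at FP16 and 128 TOPS at INT8, with 64 MB of on-chip SRAM for optimal memory bandwidth and low latency
Provides top performance and cost-effectiveness for data center, cloud AI, and on-premises AI workloads where power efficiency matters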
Frequently Asked Questions
Can’t find what you’re looking for? Contact us here!
We are continuously improving compatibility with major AI frameworks through regular updates.
In most cases, you can use the RBLN SDK with minimal code changes.
- For officially supported Model Zoo models, you can use the provided example code right away.
- Other models can also be compiled by referring to the Model Zoo code.
Check the list of supported operations in advance:
To maximize the performance of transformer-based models, consider the following:
- Set the rbln_tensor_parallel_size value appropriately to utilize NPU parallelism
- Tune the input sequence length and batch size
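As a rough sketch of how these settings might be passed at compile time, the snippet below gathers them into keyword arguments for an Optimum RBLN model class. Only rbln_tensor_parallel_size is named in the text above. The class name RBLNLlamaForCausalLM, the export flag, and the rbln_batch_size and rbln_max_seq_len parameter names are assumptions about the Optimum RBLN API and should be checked against the Model Zoo examples. build_compile_kwargs and compile_llama are hypothetical helpers introduced for illustration.

```python
def build_compile_kwargs(tensor_parallel_size, batch_size, max_seq_len):
    """Collect the tuning knobs into keyword arguments (hypothetical helper)."""
    for name, value in (
        ("tensor_parallel_size", tensor_parallel_size),
        ("batch_size", batch_size),
        ("max_seq_len", max_seq_len),
    ):
        if not isinstance(value, int) or value < 1:
            raise ValueError(f"{name} must be a positive integer, got {value!r}")
    return {
        "export": True,  # assumed flag: compile rather than load a compiled model
        "rbln_tensor_parallel_size": tensor_parallel_size,  # named in the text above
        "rbln_batch_size": batch_size,    # assumed parameter name
        "rbln_max_seq_len": max_seq_len,  # assumed parameter name
    }


def compile_llama(model_id, tensor_parallel_size, batch_size, max_seq_len):
    """Compile a Llama-style model; needs optimum-rbln and an RBLN NPU host."""
    # Imported lazily so build_compile_kwargs works without the SDK installed.
    from optimum.rbln import RBLNLlamaForCausalLM  # assumed class name

    kwargs = build_compile_kwargs(tensor_parallel_size, batch_size, max_seq_len)
    return RBLNLlamaForCausalLM.from_pretrained(model_id, **kwargs)
```

Keeping the argument construction separate from the compile call makes it easy to log or sweep over candidate settings before committing to a long compilation.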
The RBLN SDK provides a C/C++-bound runtime for applications where a Python runtime is unavailable or extremely low latency is required.
Please refer to the C/C++ guide for more information.
The RBLN SDK and Compiler are regularly updated to maintain API compatibility with the latest versions of major frameworks.
For details, please refer to the respective Release Notes.
RBLN SDK offers high compatibility with PyTorch-based models.
• torch.compile() Support: Fully compatible with PyTorch 2.0’s torch.compile() feature, and supports models compiled using TorchDynamo and TorchInductor backends.
• Extensive Operator Support: The RBLN Compiler supports most PyTorch operators. You can check the full list in Supported Ops. It also includes major operators for Vision, NLP, and Audio, making it suitable for a wide range of deep learning models.
• PyTorch Model Zoo Compatibility: Popular models such as ResNet, YOLO, LLaMA, and BERT are supported. See the PyTorch Model Zoo page for more details.
• JIT/Scripted Model Support: Models converted using TorchScript can also be processed by the RBLN Compiler.
The RBLN Driver can be installed using the provided deb or rpm installation files and requires root privileges. During installation, you must ensure that the kernel version is compatible with the driver.
Python 3.9 or higher is recommended, and there are key package dependencies such as numpy, torch, and onnx.
Please refer to the Support Matrix page for the supported OS and Python versions.
Required packages may vary by model, so refer to the requirements.txt file included in the Model Zoo code for details.
Currently, RBLN SDK only supports Linux. Windows support will be determined based on our technical roadmap.
The RBLN SDK supports distributed inference based on tensor parallelism, called RSD (Rebellions Scalable Design).
First check the list of models that support multi-device execution, then refer to the provided example for compilation instructions.
The optimal batch size may vary depending on the type of NPU used, server configuration, and service requirements.
We recommend using the Profiler tool and conducting various experiments for fine-tuning.
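One simple way to run such experiments is a throughput sweep over candidate batch sizes. The sketch below is generic Python and does not use the Profiler. infer stands in for whatever call runs your compiled model, and make_batch for your own input builder; both are hypothetical placeholders.

```python
import time


def sweep_batch_sizes(infer, make_batch, batch_sizes, repeats=10):
    """Return {batch_size: samples_per_second} for each candidate batch size."""
    results = {}
    for batch_size in batch_sizes:
        batch = make_batch(batch_size)
        infer(batch)  # warm-up run, excluded from timing
        start = time.perf_counter()
        for _ in range(repeats):
            infer(batch)
        elapsed = time.perf_counter() - start
        # Guard against a zero elapsed time on very fast stand-in callables.
        results[batch_size] = (batch_size * repeats) / max(elapsed, 1e-9)
    return results


def best_batch_size(results):
    """Pick the batch size with the highest measured throughput."""
    return max(results, key=results.get)
```

Throughput alone does not capture latency targets, so a service with strict response-time requirements may still prefer a smaller batch than the one this sweep selects.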
RBLN SDK includes the RBLN Profiler for performance bottleneck analysis, collecting key metrics such as execution time, memory usage, and operation dependencies.
- .pb format trace files can be visualized with Perfetto.
- You can analyze bottlenecks, inter-operation dependencies, and layer-by-layer latency to suggest optimization directions.
For detailed usage, refer to the Profiler Guide.
To process video files, you can use libraries like OpenCV (cv2) to extract each frame from an .mp4 file as an image, and then feed those frames into the model for inference.
For example, when using an object detection model like YOLOX, the typical procedure is as follows:
1. Load the video file using cv2.VideoCapture
2. Extract frames one by one
3. Preprocess each frame to match the model’s input format
4. Perform object detection using the model
5. Visualize the results and either save them or display them in real time
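The steps above can be sketched as follows. This is a minimal outline rather than a YOLOX-specific implementation: preprocess, detect, and handle_result are hypothetical callables you would supply (for example, a resize step, a call into the compiled model, and a drawing or saving routine). Only the OpenCV calls are real API.

```python
def iter_frames(capture):
    """Yield frames from a cv2.VideoCapture-like object until it is exhausted."""
    try:
        while True:
            ok, frame = capture.read()
            if not ok:
                break
            yield frame
    finally:
        capture.release()  # always free the video handle


def detect_on_video(path, preprocess, detect, handle_result):
    """Run detection frame by frame and return the number of frames processed."""
    import cv2  # requires opencv-python; imported lazily

    capture = cv2.VideoCapture(path)          # step 1: load the video
    if not capture.isOpened():
        capture.release()
        raise IOError(f"could not open video: {path}")
    count = 0
    for frame in iter_frames(capture):        # step 2: extract frames
        model_input = preprocess(frame)       # step 3: match the model input
        detections = detect(model_input)      # step 4: run the detector
        handle_result(frame, detections)      # step 5: visualize, save, or show
        count += 1
    return count
```

Separating frame iteration from the detection loop keeps the OpenCV dependency in one place and lets the loop be exercised with any object that offers read() and release().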
Both are AI inference NPUs developed by Rebellions, but REBEL is a next-generation product designed with a chiplet-based architecture. A detailed comparison chart is available on the product page.
Yes. You can use Rebellions AI processor resources via the Kubernetes Plugin.
- Kubernetes Device Plugin: Supports RBLN NPUs in a Kubernetes cluster environment.
- NPU Feature Discovery: Labels Kubernetes nodes with RBLN NPUs for scheduling.
- RBLN Metrics Exporter: Exposes NPU metrics (temperature, power, DRAM, utilization) in Prometheus format for Grafana dashboards.
RBLN SDK is compatible with vLLM, NVIDIA Triton Inference Server, and TorchServe. Container-based deployment also supports integration with Kubernetes.
GPUs were originally designed for graphics rendering but have been widely adopted for AI training and high-performance computing (HPC) due to their large-scale parallel processing capabilities. They typically use FP32/FP16 operations and support various types of computation through CUDA cores and Tensor Cores.
NPUs are processors specialized for AI and deep learning, designed to perform efficient computations at low power. They are optimized for low-bit operations such as INT8 and FP16 and include dedicated hardware architectures that accelerate neural network computations.
Rebellions devices are designed exclusively for inference, and fine-tuning is not currently supported.
To maximize inference performance, we recommend the following optimization strategies:
- Use Mixed Precision and Quantization: Improve memory efficiency and compute speed by using FP16 or INT8 quantized models.
- Adjust Batch Size: Find the optimal batch size based on model characteristics and input data to increase throughput.
- Refactor Model Architecture: Simplify the computation graph through layer fusion and removal of redundant operations to boost performance.
- Double Buffering: Utilize double buffering in AsyncRuntime to improve execution efficiency.
- Apply Continuous Batching for LLM Serving: For large language model (LLM) serving, maximize hardware utilization by applying continuous batching techniques using vllm-rbln.
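To illustrate the idea behind double buffering, the sketch below overlaps loading of the next input with inference on the current one, using a standard-library thread pool. It does not use the AsyncRuntime API, which is not shown in this document. load and infer are hypothetical stand-ins for your input preparation and model call.

```python
from concurrent.futures import ThreadPoolExecutor


def run_double_buffered(items, load, infer):
    """Run infer(load(item)) for each item, prefetching the next input."""
    results = []
    iterator = iter(items)
    try:
        first = next(iterator)
    except StopIteration:
        return results  # nothing to process
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(load, first)  # fill the first buffer
        for item in iterator:
            ready = pending.result()            # wait for the current buffer
            pending = pool.submit(load, item)   # start filling the other buffer
            results.append(infer(ready))        # compute while the load runs
        results.append(infer(pending.result())) # drain the last buffer
    return results
```

The benefit depends on load and infer being able to run concurrently, for example when loading is I/O-bound and inference runs on the device.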
You can ask questions or discuss technical issues on the Rebellions Dev Forum. You can also reach out to us directly here.
The SDK is updated approximately every month, and the driver is updated every three months, although the schedule is subject to change.
Currently, for officially supported models listed in the RBLN Model Zoo, you can use the provided compilation and inference example code.
If you’re using a modified model or a model not included in the Model Zoo, technical support may be limited, and compilation may fail.
First, check the error code to identify the cause. If further assistance is required, please reach out via the Rebellions Dev Forum.
Please check the following items:
- Memory Usage: If the system runs out of memory during compilation, the process may fail.
- NPU Configuration: Ensure that the value of rbln_tensor_parallel_size is not greater than the actual number of devices installed in your system. You can verify the number of devices by running the rbln-stat command in your terminal.
- Docker Environment: Refer to the Docker Guide for more details.
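The NPU configuration check can be made explicit before starting a long compilation. The helper below is a hypothetical sketch: it takes the device count as an argument (read manually from rbln-stat) rather than parsing the command's output, whose format is not described here.

```python
def check_tensor_parallel_size(requested, available_devices):
    """Validate rbln_tensor_parallel_size against the installed device count."""
    if requested < 1:
        raise ValueError("rbln_tensor_parallel_size must be at least 1")
    if requested > available_devices:
        raise ValueError(
            f"rbln_tensor_parallel_size={requested} exceeds the "
            f"{available_devices} device(s) installed in this system"
        )
    return requested


# Example: 4 devices reported by rbln-stat, 2 requested for the model.
tensor_parallel_size = check_tensor_parallel_size(2, available_devices=4)
```

Failing early with a clear message is cheaper than discovering the mismatch partway through compilation.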
You can limit the number of CPU threads used during inference by setting the RBLN_NUM_THREADS environment variable. Specifying an appropriate number of threads can reduce CPU load and help stabilize performance.
Please refer to this document for more details.
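As a minimal sketch, the variable can also be set from Python. This assumes the runtime reads RBLN_NUM_THREADS when it is initialized, so the value is set before the runtime is imported or created. limit_rbln_threads is a hypothetical helper, and the value 4 is only an example.

```python
import os


def limit_rbln_threads(num_threads):
    """Set RBLN_NUM_THREADS for this process and return the stored value."""
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    os.environ["RBLN_NUM_THREADS"] = str(num_threads)
    return os.environ["RBLN_NUM_THREADS"]


# Call this before importing or creating the runtime (assumed ordering).
limit_rbln_threads(4)
```

Setting the variable in the shell or in the container spec before launching the process has the same effect and avoids any dependence on import order.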
Issues may arise due to version mismatches between the driver and compiler.
- Refer to the Release Notes of the RBLN SDK to ensure that all components are installed with compatible versions.
- After aligning all libraries to their compatible versions, try recompiling the model.