Amid the high-performance GPU turf tussle between AMD and Nvidia (and soon, Intel), a new, China-based player is emerging: Biren Technology, founded in 2019 and headquartered in Shanghai. At Hot Chips 34, Biren co-founder and president Lingjie Xu and Biren CTO Mike Hong took the (virtual) stage to detail the company’s inaugural product: the Biren BR100 general-purpose GPU (GPGPU).


“It is my honor to present our first-generation compute product: BR100,” Xu said. “BR100 is dedicated to addressing the challenges of AI training and inference in the datacenter, with the additional goals of boosting productivity and reducing overall cost of ownership.”


At 1074mm², the 77 billion-transistor, dual-die Biren BR100 (pictured in the header) is manufactured on TSMC’s 7nm process and capable of 256 FP32 teraflops. The die-to-die interconnect provides 896GB/s of bandwidth. The BR100 comes with up to 64GB of HBM2E memory (across four stacks) and can manage up to 2.3TB/s in external I/O bandwidth across its multiple BLink connections. All of this adds up to a maximum TDP of 550W and a targeted clock frequency of 1GHz.


Given the BR100’s targeted use cases, the point of comparison was no surprise: Nvidia’s A100 GPU, which has become the de facto reference in the widening field of accelerators. The BR100’s peak teraflops, of course, compare very favorably to the A100: 19.5 for the A100, 256 for the BR100 (“one of the fastest GPUs in the world,” Xu said). Beyond the flops, Xu said they had seen promising results on workloads and benchmarks.
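For readers who want the size of that gap as a single number, the short Python sketch below divides the two peak figures quoted in the talk. The constant and function names are illustrative only, and the inputs are vendor-quoted peak numbers, so the result says nothing about measured performance on real workloads.

```python
# Peak throughput figures as quoted in the talk, in teraflops.
# These are vendor-stated peaks, not measured benchmark results.
A100_PEAK_TFLOPS = 19.5
BR100_PEAK_TFLOPS = 256.0


def peak_ratio(candidate_tflops: float, baseline_tflops: float) -> float:
    """Return how many times larger the candidate's peak figure is."""
    return candidate_tflops / baseline_tflops


ratio = peak_ratio(BR100_PEAK_TFLOPS, A100_PEAK_TFLOPS)
print(f"BR100 quoted peak is about {ratio:.1f}x the A100 figure")
```

On paper, that puts the BR100 at roughly thirteen times the A100 figure cited here.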


“We had two design goals for the BR100,” Hong said. “The first: it has to reach one petaflops horsepower. The second: it should be a GPGPU, not a pure AI accelerator.”

With that, let’s return to flops for a moment. The BR100 supports FP32, BF16, FP16, INT32, INT16 and other formats, but there are two additional points of note. First, the BR100 does not support FP64 (“We decided to dedicate chip area to our targeted markets and use cases,” Xu commented); second, the BR100 does support a new 24-bit data type called TF32+. And, with 1024 teraflops of performance at BF16, it looks like the BR100 fits Biren’s bill of “one petaflops horsepower.”
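The “one petaflops” claim is a unit conversion, which the Python sketch below spells out under the assumption of decimal SI prefixes (1 petaflops = 1000 teraflops). The dictionary and helper names are made up for illustration; the two throughput values are the peak figures quoted in the presentation.

```python
# Assumption: decimal SI prefixes, so 1 petaflops = 1000 teraflops.
TFLOPS_PER_PFLOPS = 1000.0

# Peak figures quoted for the BR100, in teraflops (vendor-stated).
BR100_PEAK_TFLOPS = {"FP32": 256.0, "BF16": 1024.0}


def to_petaflops(tflops: float) -> float:
    """Convert a teraflops figure to petaflops."""
    return tflops / TFLOPS_PER_PFLOPS


bf16_pflops = to_petaflops(BR100_PEAK_TFLOPS["BF16"])
meets_goal = bf16_pflops >= 1.0
bf16_over_fp32 = BR100_PEAK_TFLOPS["BF16"] / BR100_PEAK_TFLOPS["FP32"]

print(f"BF16 peak: {bf16_pflops:.3f} petaflops (goal met: {meets_goal})")
print(f"BF16 peak is {bf16_over_fp32:.0f}x the FP32 peak")
```

The target is therefore met at BF16 precision, where the quoted peak is four times the FP32 figure, rather than at FP32.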


The BR100 also comes in another flavor: the BR104, a single-die variant designed for use in PCIe cards. Xu said that Biren is also working with manufacturers to build reference cluster designs. The chip itself has already been tested on real silicon. What’s more: “We already submitted to the latest round of MLPerf inference, and you should be able to see our results in two or three weeks,” Xu said. (Biren is a member of MLCommons.)

Biren Technology also released the 8-way Hearten OAM server in partnership with Inspur. The companies expect to start sampling the hardware in the fourth quarter of this year.

Devices will ship with Biren’s own software platform and programming model, called BIRENSUPA. “Developers who are familiar with (Nvidia’s) CUDA can easily write code for SUPA,” said Hong. Supported AI frameworks include PyTorch, TensorFlow and PaddlePaddle. The company also provides an OpenCL compiler. The dual-die BR100 appears as one GPU to the software layer.

As of its Series B funding round, Biren has raised over 5 billion CNY (~$730 million USD).