KAN We Flow?
Advancing Robotic Manipulation with 3D Flow Matching via KAN & RWKV
Anonymous ICRA submission
Overview of KAN-We-Flow. The policy receives a noised action together with a condition composed of three encoded parts: a point-cloud perception embedding, a robot-state embedding, and a time embedding. The concatenated representation is processed by a lightweight RWKV-KAN U-shaped backbone instead of a large UNet-style backbone: RWKV mixes long-range sequential/spatial context with linear complexity, while KAN performs learnable spline-based feature calibration. A straight-line flow is then learned with conditional consistency flow matching to produce a one-step velocity field, so actions are generated at real-time inference speed; an additional action-consistency regularization aligns Euler-extrapolated trajectories with demonstrations to stabilize training.
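To make the one-step generation concrete, the PyTorch-style sketch below shows how a straight-line (rectified) flow turns sampled noise into actions with a single Euler step of the learned velocity field. The `velocity_net` interface, the `OneStepFlowPolicy` wrapper, and the tensor shapes are illustrative assumptions, not the released implementation.

```python
import torch
import torch.nn as nn


class OneStepFlowPolicy(nn.Module):
    """Sketch of one-step action generation from a learned velocity field.

    `velocity_net` (assumed interface) maps (noised_action, time, condition)
    to a velocity with the same shape as the action.
    """

    def __init__(self, velocity_net: nn.Module):
        super().__init__()
        self.velocity_net = velocity_net

    @torch.no_grad()
    def act(self, cond: torch.Tensor, horizon: int, action_dim: int) -> torch.Tensor:
        batch = cond.shape[0]
        a0 = torch.randn(batch, horizon, action_dim, device=cond.device)  # noise at t = 0
        t = torch.zeros(batch, device=cond.device)                        # flow time
        v = self.velocity_net(a0, t, cond)                                # predicted straight-line velocity
        return a0 + v                                                     # single Euler step from t = 0 to t = 1
```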
Simulation Robot Experiments
Simulation results for Hammer, Door, and Pen.
Simulation results for Assembly, Disassemble, and Hammer.
Simulation results for Door Close, Door Open, and Button Press Wall.
Simulation results for Drawer Close, Drawer Open, and Hand Insert.
Simulation results for Faucet Close, Faucet Open, and Stick Pull.
Simulation results for Window Close, Window Open, and Stick Push.
Abstract
Diffusion-based visuomotor policies excel at modeling action distributions but are inference-inefficient: iteratively denoising from noise to actions requires many steps and heavy UNet backbones, which hinders deployment on resource-constrained robots. Flow matching alleviates the sampling burden by learning a velocity field that supports one-step generation, yet prior implementations still inherit large UNet-style architectures. In this work, we present KAN-We-Flow, a flow-matching policy that draws on recent advances in Receptance Weighted Key Value (RWKV) and Kolmogorov-Arnold Networks (KAN) from vision to build a lightweight yet highly expressive backbone for 3D manipulation. Concretely, we introduce an RWKV-KAN block: an RWKV layer first performs efficient sequence/spatial mixing to propagate task context, and a subsequent GroupKAN layer applies learnable spline-based, groupwise functional mappings to the RWKV outputs, providing feature-wise nonlinear calibration of the action mapping. Moreover, we introduce Action Consistency Regularization (ACR), a lightweight auxiliary loss that aligns Euler-extrapolated action trajectories with expert demonstrations, providing additional supervision that stabilizes training and improves policy precision. Without resorting to large UNets, our design reduces parameters by 86.8%, maintains fast runtime, and achieves state-of-the-art success rates on the Adroit, Meta-World, and DexArt benchmarks.
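As a rough sketch of how the flow-matching objective and the ACR term described above might combine during training: Euler-extrapolating the noisy action along the predicted velocity yields a one-step action estimate that can be regressed onto the expert action. The linear noise-to-action interpolation, the MSE forms, and the weight `lambda_acr` below are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F


def flow_matching_with_acr(velocity_net, expert_actions, cond, lambda_acr=0.1):
    """One training step of straight-line flow matching plus an ACR-style term.

    Assumed setup: noise at t = 0, expert action at t = 1, linear interpolation
    in between, and MSE for both the flow-matching and consistency terms.
    """
    noise = torch.randn_like(expert_actions)
    t = torch.rand(expert_actions.shape[0], device=expert_actions.device)
    t_exp = t.view(-1, *([1] * (expert_actions.dim() - 1)))

    a_t = (1.0 - t_exp) * noise + t_exp * expert_actions   # point on the straight path
    v_target = expert_actions - noise                       # constant target velocity
    v_pred = velocity_net(a_t, t, cond)

    fm_loss = F.mse_loss(v_pred, v_target)                  # standard flow-matching loss
    a_hat = a_t + (1.0 - t_exp) * v_pred                     # Euler extrapolation to t = 1
    acr_loss = F.mse_loss(a_hat, expert_actions)             # align with demonstrations
    return fm_loss + lambda_acr * acr_loss
```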
Comparison of Accuracy, Parameters, and Inference Time
Comparison of KAN-We-Flow with the state-of-the-art methods FlowPolicy and DP3 in terms of accuracy, parameter count, and inference time. (a) KAN-We-Flow achieves superior success rates on challenging tasks across benchmarks; (b) our approach reduces parameters by 86.8% compared with FlowPolicy and DP3; (c) compared with DP3, KAN-We-Flow cuts inference time by 92.6% on the Adroit–Pen task, enabling real-time control.