RobotApp is the UI tier of a three-mode robot stack in which the UI, CLI, and Python API all share one runtime (robot_agent). The dashboard turns any robot_agent-based robot (reference: kcare_robot, an assistive 6-DOF mobile manipulator) into a network-addressable agent that can be driven from a browser on the other side of the world. The dashboard itself is deployed globally on Cloudflare Pages.
Live RGB-D streaming
WebSocket bridges to RealSense D405 and Femto Bolt RGB-D cameras.
zlib-compressed uint16 depth is decoded in the browser and rendered
with a JET colormap. Pixel-hover gives a live millimetre readout.
Open-vocabulary perception
Type "apple" or "the red mug on the table";
the backend routes to a TCP VLM service (GroundingDINO + GroundedSAM +
mask2grasps) and lifts detections to 6-DOF grasp poses.
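One plausible reading of "lifting" a detection is: pick a pixel inside the mask, look up its depth, and back-project it through the pinhole camera model into the camera frame. The sketch below covers only that position step. The `Intrinsics` values are illustrative rather than calibration data, `deproject` and `mask_centroid` are hypothetical helper names, and the orientation half of a 6-DOF pose (which mask2grasps would supply) is not modelled.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Intrinsics:
    """Pinhole intrinsics in pixels (hypothetical container, not a robot_agent type)."""
    fx: float
    fy: float
    cx: float
    cy: float


def deproject(u: float, v: float, depth_mm: float, k: Intrinsics) -> tuple[float, float, float]:
    """Back-project pixel (u, v) with depth in millimetres to camera-frame XYZ in metres."""
    z = depth_mm / 1000.0
    x = (u - k.cx) * z / k.fx
    y = (v - k.cy) * z / k.fy
    return (x, y, z)


def mask_centroid(mask: list[list[int]]) -> tuple[float, float]:
    """Mean (u, v) of the non-zero pixels of a binary mask given as rows of 0/1."""
    pts = [(u, v) for v, row in enumerate(mask) for u, val in enumerate(row) if val]
    if not pts:
        raise ValueError("empty mask")
    n = len(pts)
    return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)


if __name__ == "__main__":
    k = Intrinsics(fx=600.0, fy=600.0, cx=320.0, cy=240.0)  # made-up values
    u, v = mask_centroid([[0, 1, 1], [0, 1, 1]])
    print(deproject(u, v, 500.0, k))
```

A real pipeline would likely use a robust depth statistic over the whole mask rather than a single centroid pixel, since the centroid can fall on a hole or an invalid depth sample.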
Streaming task plans
A single WebSocket carries the full lifecycle:
start → plan → step_start → step_log → step_done → done.
Inline frames render next to the step that produced them.
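To make the lifecycle concrete, here is a minimal sketch of how a client might fold those six event types into a timeline. Only the event names come from the sequence above. The JSON field names (`type`, `task`, `steps`, `index`, `text`, `ok`, `result`) and the `Step` / `Timeline` classes are assumptions made for illustration, not the actual wire format.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field


@dataclass
class Step:
    name: str
    status: str = "pending"  # pending -> running -> done / failed
    logs: list[str] = field(default_factory=list)
    result: dict | None = None


@dataclass
class Timeline:
    task: str = ""
    steps: list[Step] = field(default_factory=list)
    finished: bool = False


def apply_event(tl: Timeline, raw: str) -> Timeline:
    """Fold one JSON text frame into the timeline (field names are assumed)."""
    msg = json.loads(raw)
    kind = msg["type"]
    if kind == "start":
        tl.task = msg.get("task", "")
    elif kind == "plan":
        tl.steps = [Step(name=n) for n in msg["steps"]]
    elif kind == "step_start":
        tl.steps[msg["index"]].status = "running"
    elif kind == "step_log":
        tl.steps[msg["index"]].logs.append(msg["text"])
    elif kind == "step_done":
        step = tl.steps[msg["index"]]
        step.status = "done" if msg.get("ok", True) else "failed"
        step.result = msg.get("result")
    elif kind == "done":
        tl.finished = True
    else:
        raise ValueError(f"unknown event type: {kind}")
    return tl


if __name__ == "__main__":
    frames = [
        '{"type": "start", "task": "pick the apple"}',
        '{"type": "plan", "steps": ["detect", "pick"]}',
        '{"type": "step_start", "index": 0}',
        '{"type": "step_log", "index": 0, "text": "1 candidate found"}',
        '{"type": "step_done", "index": 0, "ok": true, "result": {"count": 1}}',
        '{"type": "done"}',
    ]
    tl = Timeline()
    for frame in frames:
        apply_event(tl, frame)
    print(tl)
```

Keeping the reducer free of any socket code means the same function can be driven by a live connection or by a recorded list of frames.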
Hot-swappable skills
Register new ROS service / topic / action / WebRTC / TCP / LLM clients from the UI. Reload skills with one POST — no robot restart.
Three execution modes
Same skill, three entry points: POST /skill/<name> over REST,
kcare_robot pick::apple from the CLI, or
from kcare_robot.skills.pick import pick in Python.
Edge deployment
Static export, served from Cloudflare Pages with a custom domain (robot.aistations.org) — sub-100 ms TTFB anywhere.
The interface is split into four functional zones, described below.

Multi-camera tabs (head_rgb, head_depth, arm_rgb, arm_depth, log_image)
persist their order in localStorage. Drag to reorder. RGB
streams as JPEG; depth streams as raw uint16 +
zlib, decoded and colorised client-side. Rectangle-draw
overlay is wired to detector inputs.
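
The depth path described above can be sketched end to end with the standard library. This is written in Python for brevity, whereas the dashboard does the decode step in the browser. Little-endian byte order, the 100 to 2000 mm display range, treating 0 as "no data", and the piecewise-linear JET approximation are all assumptions of this sketch, and every function name here is hypothetical.

```python
import struct
import zlib


def encode_depth(depth_mm: list[int]) -> bytes:
    """Pack depth samples as little-endian uint16 and zlib-compress them."""
    return zlib.compress(struct.pack(f"<{len(depth_mm)}H", *depth_mm))


def decode_depth(payload: bytes) -> list[int]:
    """Inverse of encode_depth: inflate, then unpack uint16 millimetre values."""
    raw = zlib.decompress(payload)
    return list(struct.unpack(f"<{len(raw) // 2}H", raw))


def jet(t: float) -> tuple[int, int, int]:
    """Piecewise-linear approximation of the JET colormap for t in [0, 1]."""
    t = min(max(t, 0.0), 1.0)

    def channel(center: float) -> float:
        return min(max(1.5 - abs(4.0 * t - center), 0.0), 1.0)

    return (round(255 * channel(3.0)), round(255 * channel(2.0)), round(255 * channel(1.0)))


def colorize(depth_mm: list[int], near_mm: int = 100, far_mm: int = 2000) -> list[tuple[int, int, int]]:
    """Map each depth sample to RGB; zero depth (assumed invalid) is drawn black."""
    span = far_mm - near_mm
    return [(0, 0, 0) if d == 0 else jet((d - near_mm) / span) for d in depth_mm]


def depth_at(depth_mm: list[int], width: int, x: int, y: int) -> int:
    """Millimetre readout for the pixel under the cursor in a row-major frame."""
    return depth_mm[y * width + x]


if __name__ == "__main__":
    frame = [0, 100, 1050, 2000]  # a 2x2 frame in row-major order
    decoded = decode_depth(encode_depth(frame))
    print(colorize(decoded))
    print(depth_at(decoded, width=2, x=0, y=1), "mm")
```

Colorising on the client keeps the wire format at two bytes per pixel before compression, and leaves the raw millimetre values available for the hover readout.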

Two prompt modes — Structured for direct skill calls
(moveh::down + key=value params) and
Unstructured for natural-language tasks routed through
an LLM planner. Language picker (EN / KO / VI). Ctrl+Enter to dispatch.
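
As an illustration of the Structured mode, the sketch below parses a prompt of the form `skill::target key=value ...` into a call description. The exact grammar is not specified here, so whitespace-separated parameters, JSON-style value coercion, and the `parse_structured` name with its returned keys are all assumptions.

```python
import json
import shlex


def parse_structured(prompt: str) -> dict:
    """Split 'skill::target key=value ...' into skill, target, and typed params."""
    tokens = shlex.split(prompt)  # honours quotes, e.g. label="red mug"
    if not tokens:
        raise ValueError("empty prompt")
    head, *rest = tokens
    skill, _, target = head.partition("::")
    if not skill:
        raise ValueError(f"missing skill name in {head!r}")
    params = {}
    for tok in rest:
        key, eq, value = tok.partition("=")
        if not eq or not key:
            raise ValueError(f"expected key=value, got {tok!r}")
        try:
            params[key] = json.loads(value)  # numbers, true/false, null
        except json.JSONDecodeError:
            params[key] = value  # anything else stays a plain string
    return {"skill": skill, "target": target or None, "params": params}


if __name__ == "__main__":
    print(parse_structured('moveh::down speed=0.2 wait=true label="red mug"'))
```

Parsing into a plain dictionary keeps the result easy to serialise as the JSON body of a skill call.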

This zone shows the LLM plan and the step-by-step execution timeline. Each step shows
its status (✓ done / ✗ failed / ⟳ running),
a JSON result, and any log image emitted mid-skill, all streamed over
one WebSocket.

Live list of every registered skill, grouped by category
(recognition, pick, place, mobile, arm, head, lift…). CRUD on each
skill, JSON-edit per-skill configs, hot-reload via
POST /skills/reload.
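
A toy model of what a category-grouped, hot-reloadable registry could look like is sketched below. `ToySkillRegistry` and its loader callback are hypothetical and do not reflect robot_agent's actual SkillRegistry. The point is only that a reload can rebuild the skill table in-process and roll back if loading fails.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable


class ToySkillRegistry:
    """In-memory registry that is rebuilt from a loader callback on reload()."""

    def __init__(self, loader: Callable[[ToySkillRegistry], None]) -> None:
        self._loader = loader
        self._skills: dict[str, tuple[str, Callable[..., dict]]] = {}
        self.reload()

    def register(self, name: str, category: str, fn: Callable[..., dict]) -> None:
        self._skills[name] = (category, fn)

    def reload(self) -> int:
        """Re-run the loader; keep the previous table if the loader raises."""
        previous = self._skills
        self._skills = {}
        try:
            self._loader(self)
        except Exception:
            self._skills = previous
            raise
        return len(self._skills)

    def by_category(self) -> dict[str, list[str]]:
        """Skill names grouped by category, sorted by name within each group."""
        groups: dict[str, list[str]] = defaultdict(list)
        for name, (category, _) in sorted(self._skills.items()):
            groups[category].append(name)
        return dict(groups)

    def call(self, name: str, **params) -> dict:
        return self._skills[name][1](**params)


if __name__ == "__main__":
    available = [("pick", "pick", lambda target="": {"picked": target})]

    def loader(reg: ToySkillRegistry) -> None:
        for name, category, fn in available:
            reg.register(name, category, fn)

    reg = ToySkillRegistry(loader)
    available.append(("detect", "recognition", lambda label="": {"label": label}))
    print(reg.reload(), reg.by_category())
```

In a real service the loader would presumably re-import skill modules or re-read config files, with an HTTP handler doing little more than calling `reload()` and returning the count.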

Every ROS service / topic / action / WebRTC / TCP / LLM client
registered with the robot. Ping All health-checks the
bus; Scan ROS discovers nodes via
get_ros2_node_names_and_types().
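
A "Ping All" style health check can be sketched as a set of probes run concurrently, each with its own timeout. The probe callables, the result fields, and the `ping_one` / `ping_all` names are hypothetical. A real probe would wrap a ROS service call, a TCP connect, or an HTTP request.

```python
import asyncio
import time
from typing import Awaitable, Callable

Probe = Callable[[], Awaitable[None]]


async def ping_one(name: str, probe: Probe, timeout_s: float = 1.0) -> dict:
    """Run one probe and report success, error text, and elapsed milliseconds."""
    t0 = time.perf_counter()
    try:
        await asyncio.wait_for(probe(), timeout=timeout_s)
        ok, error = True, None
    except asyncio.TimeoutError:
        ok, error = False, "timeout"
    except Exception as exc:  # a failing client must not abort the whole sweep
        ok, error = False, repr(exc)
    latency_ms = round((time.perf_counter() - t0) * 1000, 1)
    return {"name": name, "ok": ok, "error": error, "latency_ms": latency_ms}


async def ping_all(probes: dict[str, Probe], timeout_s: float = 1.0) -> list[dict]:
    """Ping every registered client concurrently; results keep insertion order."""
    return await asyncio.gather(*(ping_one(n, p, timeout_s) for n, p in probes.items()))


if __name__ == "__main__":
    async def healthy() -> None:
        await asyncio.sleep(0.01)

    async def hung() -> None:
        await asyncio.sleep(10)

    for row in asyncio.run(ping_all({"camera": healthy, "vlm": hung}, timeout_s=0.1)):
        print(row)
```

Because each probe is wrapped individually, one hung client costs at most one timeout for the whole sweep instead of one timeout per client.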
┌──────────────────────────────────────┐
│  robotapp · Next.js 14 + Tailwind    │  browser / Cloudflare Pages
│  WebSocket + REST client (TypeScript)│
└────────────────┬─────────────────────┘
                 │ HTTP + WS
                 ▼
┌──────────────────────────────────────────────────┐
│  robot_agent · FastAPI                           │  robot host · port 8001
│  · SkillRegistry · DeviceManager · UnifiedAgent  │
│  · /ws/camera/{id}   /ws/agent   30+ REST        │
└────────────┬──────────────────────┬──────────────┘
             │                      │
             ▼                      ▼
ROS2 Humble · rclpy · Nav2          Devices · RealSense D405
joint_states · /navigate_to_pose    Femto Bolt · KAAIR cobot
/kaair_worker/*                     TCP VLM service (GPU)

Want the full diagram with sequence flow? Read the architecture deep dive →
Trung Bui · Robotics + Vision Engineer
Built RobotApp, robot_agent, kcare_robot, and robot_template end-to-end. Open to roles in robotics, computer vision, and full-stack AI infrastructure.
bmtrungvp@gmail.com · github.com/mtbui2010