kcare_robot — Reference Implementation

Reference robot · 6-DOF cobot · RealSense D405 · Femto Bolt · Nav2

A real, working assistive mobile manipulator — the proof that the robot_agent runtime carries production robotics workloads.

23 skills · 6-DOF cobot arm · 2 RGB-D cameras · Nav2 mobile base

Hardware

| Subsystem | Hardware | ROS2 interface |
| --- | --- | --- |
| Manipulator | 6-DOF KAAIR cobot arm | /kaair_worker/arm_moveJ, /arm_moveT (actions) |
| End-effector | Two-finger gripper + suction | /body/tool_controller/gripper_cmd |
| Wrist camera | Intel RealSense D405 | /hand/d405/color/..., /depth/image_rect_raw |
| Head cameras | Orbbec Femto Bolt RGB-D | /femto/color/..., /femto/depth/...compressedDepth |
| Mobile base | 2-wheel diff-drive, LiDAR | Nav2 /navigate_to_pose |
| Lift | Linear actuator | /kaair_worker/lift_move |
| Head | 2-DOF pan-tilt | /kaair_worker/head_move (rz, ry) |
| Proprioception | Joint states, tool pose, mobile odom | /joint_states, /robot_pose/* |
| Perception backend | TCP VLM service | GroundingDINO + GroundedSAM + mask2grasps |

The 23 skills

| Group | Skills |
| --- | --- |
| Perception | find, detect, find_arm, grasp_succeed, get3d, inform |
| Manipulation | pick, pick_no_sound, pick_card, fine_move, place, placeat, placep, open_drawer, close_drawer, collect_card, return_card, stack, wipe |
| Arm motion | arm_joints, arm_pose, movel, movej, movet, movelf |
| Mobile base | move, forward, turn, rotate, moveb, mobile_pose |
| Head / lift / gripper | moveh, head_state, lift, lift_state, dlift, grip |
| Interaction | select_response, llm |

Every skill follows one contract:

def skill(node, **params) -> dict:
    return {'isdone': bool, 'msg': str, ...}  # planner-readable
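
As an illustration of the contract, here is a minimal sketch of a skill that satisfies it. The skill name `wait_skill`, its `seconds` parameter, and the extra `elapsed` key are hypothetical and not part of kcare_robot; reporting failures through the dict instead of raising is also an assumption about how the planner prefers to consume results.

```python
import time


def wait_skill(node, **params) -> dict:
    """Hypothetical skill: pause for `seconds`, then report in the contract's shape."""
    seconds = params.get('seconds')
    if not isinstance(seconds, (int, float)) or seconds < 0:
        # Assumption: bad input is reported in the dict so the planner can read it.
        return {'isdone': False, 'msg': f'invalid seconds: {seconds!r}'}
    start = time.monotonic()
    time.sleep(seconds)
    # Extra keys beyond 'isdone' and 'msg' are allowed by the contract.
    return {
        'isdone': True,
        'msg': f'waited {seconds} s',
        'elapsed': time.monotonic() - start,
    }
```

The `node` argument is unused here; a real skill would presumably use it to reach ROS publishers, subscribers, and action clients.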

What’s interesting under the hood

Open-vocabulary 3D perception

skills/recognition.py — 415 lines — runs the full pipeline:

  1. Fetch RGB-D from wrist or head camera
  2. Detect via TCP to the VLM service (GroundingDINO for text queries, GroundedSAM for masks)
  3. Lift to 3D: attach_3d_features() reconstructs per-cluster normals, min/median/max depth, and 3D centroids via the inverse projection Ixy2xyz()
  4. Classify pose: detects lying objects from normal-vector dispersion; estimates mass-center percentages for handle-equipped items
  5. Grasp: mask2grasps returns 2D pixel endpoints; the skill lifts them to a 6-DOF grasp pose using depth + camera intrinsics + wrist-offset geometry
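
The "lift to 3D" step rests on inverse projection. The sketch below shows the standard pinhole back-projection of a pixel plus depth into a camera-frame point, and a centroid over a cluster of pixels. It is not the source of Ixy2xyz() or attach_3d_features(); the function names `pixel_to_xyz` and `centroid_3d` are hypothetical, lens distortion is ignored, and depth is assumed to be in metres.

```python
def pixel_to_xyz(u, v, depth_m, fx, fy, ppx, ppy):
    """Back-project pixel (u, v) with depth in metres to camera-frame XYZ (pinhole model)."""
    x = (u - ppx) * depth_m / fx
    y = (v - ppy) * depth_m / fy
    return (x, y, depth_m)


def centroid_3d(pixels, depths_m, fx, fy, ppx, ppy):
    """Mean of the back-projected points, skipping invalid (non-positive) depth readings."""
    points = [
        pixel_to_xyz(u, v, d, fx, fy, ppx, ppy)
        for (u, v), d in zip(pixels, depths_m)
        if d > 0
    ]
    if not points:
        return None  # no valid depth in this cluster
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))
```

A pixel at the principal point maps onto the optical axis, so `pixel_to_xyz(320, 240, 1.0, 600, 600, 320, 240)` lies 1 m straight ahead of the camera.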

Head-to-base calibration

skills/calibrattion.py ships a Head2BaseCalibration class with:

  • Intrinsic camera parameters (fx, fy, ppx, ppy) per stream
  • 4×4 link-to-base and base-to-lift transforms
  • Per-mode (front / left / right) linear error corrections

This is what turns “the apple your wrist camera sees” into “an XYZ in the base frame the arm can actually move to.”
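
To make the frame change concrete, here is a minimal sketch of chaining 4×4 homogeneous transforms to carry a camera-frame point into the base frame. It does not reproduce Head2BaseCalibration: the helper names, the frame names, and every number below are hypothetical, and the per-mode corrections are left out.

```python
def transform_point(T, p):
    """Apply a 4x4 homogeneous transform T (row-major nested lists) to a 3D point p."""
    x, y, z = p
    return tuple(T[r][0] * x + T[r][1] * y + T[r][2] * z + T[r][3] for r in range(3))


def compose(A, B):
    """Matrix product A @ B for 4x4 transforms: B is applied first, then A."""
    return [[sum(A[r][k] * B[k][c] for k in range(4)) for c in range(4)] for r in range(4)]


# Hypothetical example values, for illustration only.
# Camera sits 0.10 m along x from its mounting link.
T_LINK_FROM_CAM = [
    [1, 0, 0, 0.10],
    [0, 1, 0, 0.0],
    [0, 0, 1, 0.0],
    [0, 0, 0, 1],
]
# Link is rotated 90 degrees about z and raised 1.20 m relative to the base.
T_BASE_FROM_LINK = [
    [0, -1, 0, 0.0],
    [1, 0, 0, 0.0],
    [0, 0, 1, 1.20],
    [0, 0, 0, 1],
]

T_BASE_FROM_CAM = compose(T_BASE_FROM_LINK, T_LINK_FROM_CAM)
point_in_base = transform_point(T_BASE_FROM_CAM, (1.0, 0.0, 0.0))
```

Composing the matrices once and reusing the product avoids redoing the chain for every detected point.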

Closed-loop grasping with self-correction

skills/pick.py — 422 lines — orchestrates the full pick:

  1. find_arm() — wrist-camera detection
  2. fine_move() — wrist-guided approach with up to 2 self-correction trials if the object drifts out of frame
  3. grip() — close gripper
  4. grasp_succeed() — verify by re-imaging the gripper ROI and checking depth in a ±0.27 m window
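
The four steps above can be sketched as a small orchestrator. This is a simplified illustration, not the contents of skills/pick.py: the skills are passed in as callables so the block runs without ROS, their call signatures are assumed, and the retry loop is hoisted out of fine_move() to make the "up to 2 self-correction trials" visible. The `corrections` key is a hypothetical extra.

```python
def pick_sequence(target, find_arm, fine_move, grip, grasp_succeed, max_corrections=2):
    """Run detect -> approach (with retries) -> grip -> verify; stop at the first failure."""
    found = find_arm(target)
    if not found['isdone']:
        return {'isdone': False, 'msg': 'find_arm failed: ' + found['msg']}

    # One initial approach, then up to `max_corrections` re-detect-and-retry trials.
    approach = fine_move(target)
    trials = 0
    while not approach['isdone'] and trials < max_corrections:
        trials += 1
        redetect = find_arm(target)
        if not redetect['isdone']:
            return {'isdone': False, 'msg': 'object lost during approach', 'corrections': trials}
        approach = fine_move(target)
    if not approach['isdone']:
        return {'isdone': False, 'msg': 'approach failed after retries', 'corrections': trials}

    grip()
    check = grasp_succeed()
    return {'isdone': check['isdone'], 'msg': check['msg'], 'corrections': trials}
```

Returning a contract-style dict from every exit path keeps the orchestrator itself usable as a skill.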

Place is the mirror: placeat() / placep() plus retraction choreography. Drawer skills detect the handle as a separate class and run open/close as a constrained Cartesian movement.

Parallel actuator coordination

# Common pattern: lift + arm + head move simultaneously
run_parallel_check([
    ('lift', {'height': 0.4}),
    ('movej', {'joints': ARM_PRE_PICK}),
    ('moveh', {'rz': -30, 'ry': 20}),
])

run_parallel_check() (from pyconnect) fires ROS actions in parallel and waits for all of them to converge before continuing. This cuts a typical pick from ~7 s when run sequentially to ~3 s.
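
The fire-all-then-wait pattern can be sketched with the standard library. This is not pyconnect's implementation of run_parallel_check(): `run_parallel` and the `skills` registry are hypothetical, plain threads stand in for ROS action clients, and skill names are assumed to be unique within one call.

```python
from concurrent.futures import ThreadPoolExecutor


def run_parallel(skills, calls):
    """Start every (name, params) call at once and wait for all of them to finish.

    `skills` maps a skill name to a callable that takes keyword parameters
    and returns a contract-style dict with 'isdone' and 'msg'.
    """
    if not calls:
        return {'isdone': True, 'msg': 'nothing to run', 'results': {}}
    with ThreadPoolExecutor(max_workers=len(calls)) as pool:
        # Submit everything before collecting any result, so the calls overlap.
        futures = [(name, pool.submit(skills[name], **params)) for name, params in calls]
        results = {name: future.result() for name, future in futures}
    failed = [name for name, res in results.items() if not res['isdone']]
    msg = 'all done' if not failed else 'failed: ' + ', '.join(failed)
    return {'isdone': not failed, 'msg': msg, 'results': results}
```

With this structure the wall-clock time is bounded by the slowest actuator instead of the sum of all of them.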

Quick start

make install # editable-install everything
make run # uvicorn on :8001
# auto-sources /opt/ros/humble/setup.bash
# HTTP
curl -X POST http://localhost:8001/skill/find -d '{"inputs":"apple"}'
# CLI
kcare_robot --list
kcare_robot find::apple
kcare_robot pick::apple
# Python
python -c "from kcare_robot.skills.pick import pick; print(pick('apple'))"
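
For scripted use, the curl call above can be mirrored from Python with the standard library. This is a sketch under assumptions: the helper names are hypothetical, the `Content-Type: application/json` header is an addition the curl example does not send, and the response is assumed to be a JSON object in the skill-contract shape.

```python
import json
import urllib.request


def build_skill_request(skill, inputs, base_url='http://localhost:8001'):
    """Build the POST request for /skill/<skill> with the same JSON body as the curl example."""
    body = json.dumps({'inputs': inputs}).encode('utf-8')
    return urllib.request.Request(
        f'{base_url}/skill/{skill}',
        data=body,
        headers={'Content-Type': 'application/json'},  # assumption: server accepts JSON
        method='POST',
    )


def call_skill(skill, inputs, base_url='http://localhost:8001', timeout=30.0):
    """Send the request and decode the reply (assumed to be a JSON object)."""
    request = build_skill_request(skill, inputs, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode('utf-8'))
```

With the server from `make run` listening on :8001, `call_skill('find', 'apple')` corresponds to the curl example.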