kcare_robot — Reference Implementation
A real, working assistive mobile manipulator — the proof that the
robot_agent runtime carries production
robotics workloads.
Hardware
| Subsystem | Hardware | ROS2 interface |
|---|---|---|
| Manipulator | 6-DOF KAAIR cobot arm | /kaair_worker/arm_moveJ, /arm_moveT (actions) |
| End-effector | Two-finger gripper + suction | /body/tool_controller/gripper_cmd |
| Wrist camera | Intel RealSense D405 | /hand/d405/color/..., /depth/image_rect_raw |
| Head cameras | Orbbec Femto Bolt RGB-D | /femto/color/..., /femto/depth/...compressedDepth |
| Mobile base | 2-wheel diff-drive, LiDAR | Nav2 /navigate_to_pose |
| Lift | Linear actuator | /kaair_worker/lift_move |
| Head | 2-DOF pan-tilt | /kaair_worker/head_move (rz, ry) |
| Proprioception | Joint states, tool pose, mobile odom | /joint_states, /robot_pose/* |
| Perception backend | TCP VLM service | GroundingDINO + GroundedSAM + mask2grasps |
The 23 skills
| Group | Skills |
|---|---|
| Perception | find, detect, find_arm, grasp_succeed, get3d, inform |
| Manipulation | pick, pick_no_sound, pick_card, fine_move, place, placeat, placep, open_drawer, close_drawer, collect_card, return_card, stack, wipe |
| Arm motion | arm_joints, arm_pose, movel, movej, movet, movelf |
| Mobile base | move, forward, turn, rotate, moveb, mobile_pose |
| Head / lift / gripper | moveh, head_state, lift, lift_state, dlift, grip |
| Interaction | select_response, llm |
Every skill follows one contract:
    def skill(node, **params) -> dict:
        return {'isdone': bool, 'msg': str, ...}  # planner-readable

What’s interesting under the hood
Open-vocabulary 3D perception
skills/recognition.py — 415 lines — runs the full pipeline:
- Fetch RGB-D from wrist or head camera
- Detect via TCP to the VLM service (GroundingDINO for text queries, GroundedSAM for masks)
- Lift to 3D: attach_3d_features() reconstructs per-cluster normals, min/median/max depth, and 3D centroids via the inverse projection Ixy2xyz()
- Classify pose: detects lying objects from normal-vector dispersion; estimates mass-center percentages for handle-equipped items
- Grasp: mask2grasps returns 2D pixel endpoints; the skill lifts them to a 6-DOF grasp pose using depth, camera intrinsics, and wrist-offset geometry
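The "lift to 3D" and "grasp" steps both rest on pinhole back-projection. The sketch below shows that idea only; it is not the code in skills/recognition.py. The names `Intrinsics`, `pixel_to_xyz`, and `grasp_from_endpoints` are hypothetical, the intrinsic values are made up, and a single depth is assumed for both endpoints. The wrist offset and the full 6-DOF orientation are left out.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Intrinsics:
    """Pinhole parameters in pixels (hypothetical container)."""
    fx: float
    fy: float
    ppx: float
    ppy: float


def pixel_to_xyz(u, v, depth_m, k):
    """Back-project pixel (u, v) at depth_m metres into the camera frame."""
    x = (u - k.ppx) * depth_m / k.fx
    y = (v - k.ppy) * depth_m / k.fy
    return (x, y, depth_m)


def grasp_from_endpoints(p0, p1, depth_m, k):
    """Lift two 2D grasp endpoints to a 3D center, opening width, and in-plane angle."""
    a = pixel_to_xyz(p0[0], p0[1], depth_m, k)
    b = pixel_to_xyz(p1[0], p1[1], depth_m, k)
    center = tuple((ai + bi) / 2.0 for ai, bi in zip(a, b))
    width = math.dist(a, b)                      # gripper opening in metres
    yaw = math.atan2(b[1] - a[1], b[0] - a[0])   # rotation about the optical axis
    return {"center": center, "width": width, "yaw": yaw}


# Illustrative intrinsics, not values from the real cameras.
k = Intrinsics(fx=600.0, fy=600.0, ppx=320.0, ppy=240.0)
grasp = grasp_from_endpoints((290, 240), (350, 240), depth_m=0.30, k=k)
print(grasp)
```

With these numbers, two endpoints 60 px apart at 0.30 m come out as a grasp about 3 cm wide, centered on the optical axis.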
Head-to-base calibration
skills/calibrattion.py ships a Head2BaseCalibration class with:
- Intrinsic camera parameters (fx, fy, ppx, ppy) per stream
- 4×4 link-to-base and base-to-lift transforms
- Per-mode (front / left / right) linear error corrections
This is what turns “the apple your wrist camera sees” into “an XYZ in the base frame the arm can actually move to.”
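As a rough illustration of that conversion, the sketch below chains two 4×4 homogeneous transforms and then applies a per-axis linear correction. Everything here is an assumption for illustration: the helper names, the frame names (`base_T_lift`, `lift_T_cam`), the numeric values, and the `gain * measured + offset` form of the correction are not taken from the Head2BaseCalibration class.

```python
import math


def matmul4(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transform_point(t, p):
    """Apply a 4x4 homogeneous transform to a 3D point."""
    x, y, z = p
    return tuple(t[i][0] * x + t[i][1] * y + t[i][2] * z + t[i][3]
                 for i in range(3))


def translation(tx, ty, tz):
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]


def rot_z(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]


def linear_correction(p, gain, offset):
    """Assumed error model: corrected = gain * measured + offset, per axis."""
    return tuple(g * v + o for v, g, o in zip(p, gain, offset))


# Illustrative values only: lift raised 0.4 m, head panned 90 degrees.
base_T_lift = translation(0.0, 0.0, 0.4)
lift_T_cam = matmul4(translation(0.1, 0.0, 0.5), rot_z(90.0))
base_T_cam = matmul4(base_T_lift, lift_T_cam)

p_cam = (0.3, 0.0, 0.0)                       # point seen by the camera
p_base = transform_point(base_T_cam, p_cam)   # same point in the base frame
p_fixed = linear_correction(p_base, gain=(1.0, 1.0, 1.0), offset=(0.0, 0.0, -0.01))
print(p_base, p_fixed)
```

Composing the transforms once and reusing `base_T_cam` keeps the per-detection cost to a single matrix-vector product.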
Closed-loop grasping with self-correction
skills/pick.py — 422 lines — orchestrates the full pick:
- find_arm(): wrist-camera detection
- fine_move(): wrist-guided approach with up to 2 self-correction trials if the object drifts out of frame
- grip(): close gripper
- grasp_succeed(): verify by re-imaging the gripper ROI and checking depth in a ±0.27 m window
Place is the mirror: placeat() / placep() plus retraction choreography.
Drawer skills detect the handle as a separate class and run open/close as a
constrained Cartesian movement.
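The pick sequence in this section can be sketched as a bounded retry loop. This is a simplified model, not the contents of skills/pick.py: the skills are passed in as plain callables without the `node` argument, `make_fake_skills` is a hypothetical test double, and the re-detect-then-retry policy is an assumption about how the self-correction trials work. Each step returns the `isdone` / `msg` dict from the skill contract.

```python
def pick(skills, target, max_corrections=2):
    """Run detect -> approach -> grip -> verify and return a contract dict."""
    found = skills["find_arm"](target)
    if not found["isdone"]:
        return {"isdone": False, "msg": f"{target} not detected"}

    attempts = 0
    while True:
        attempts += 1
        if skills["fine_move"](found)["isdone"]:
            break
        if attempts > max_corrections:
            return {"isdone": False, "msg": "approach failed after corrections"}
        # Object drifted out of frame: detect again before the next approach.
        found = skills["find_arm"](target)
        if not found["isdone"]:
            return {"isdone": False, "msg": f"{target} lost during approach"}

    skills["grip"]()
    held = skills["grasp_succeed"]()
    return {"isdone": held["isdone"], "msg": held["msg"], "attempts": attempts}


def make_fake_skills(fail_first_n):
    """Stand-in skills whose approach fails a fixed number of times."""
    state = {"approaches": 0}

    def find_arm(target):
        return {"isdone": True, "msg": "ok", "target": target}

    def fine_move(detection):
        state["approaches"] += 1
        ok = state["approaches"] > fail_first_n
        return {"isdone": ok, "msg": "centered" if ok else "drifted out of frame"}

    def grip():
        return {"isdone": True, "msg": "closed"}

    def grasp_succeed():
        return {"isdone": True, "msg": "object in gripper"}

    return {"find_arm": find_arm, "fine_move": fine_move,
            "grip": grip, "grasp_succeed": grasp_succeed}


result = pick(make_fake_skills(fail_first_n=1), "apple")
print(result)
```

Because every step reports through the same dict shape, a planner can read the failure reason from `msg` without knowing which stage produced it.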
Parallel actuator coordination
    # Common pattern: lift + arm + head move simultaneously
    run_parallel_check([
        ('lift', {'height': 0.4}),
        ('movej', {'joints': ARM_PRE_PICK}),
        ('moveh', {'rz': -30, 'ry': 20}),
    ])

run_parallel_check() (from pyconnect) fires ROS actions in parallel and
waits for all to converge before continuing, which drops a typical pick from
~7 s sequential to ~3 s.
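The fan-out-and-wait pattern can be approximated with the standard library. The sketch below is not pyconnect's implementation: `run_parallel` and `fake_actuator` are hypothetical names, threads stand in for ROS action clients, and the joint list is a placeholder for ARM_PRE_PICK.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_parallel(commands, skills, timeout_s=10.0):
    """Start every (name, params) command at once and wait for all results."""
    with ThreadPoolExecutor(max_workers=len(commands)) as pool:
        futures = [(name, pool.submit(skills[name], **params))
                   for name, params in commands]
        results = {name: fut.result(timeout=timeout_s) for name, fut in futures}
    ok = all(r["isdone"] for r in results.values())
    return {"isdone": ok,
            "msg": "all converged" if ok else "at least one failed",
            "results": results}


def fake_actuator(duration_s):
    """Stand-in for a blocking actuator call that takes duration_s seconds."""
    def skill(**params):
        time.sleep(duration_s)
        return {"isdone": True, "msg": "reached", "params": params}
    return skill


skills = {"lift": fake_actuator(0.3),
          "movej": fake_actuator(0.3),
          "moveh": fake_actuator(0.3)}

start = time.perf_counter()
outcome = run_parallel(
    [("lift", {"height": 0.4}),
     ("movej", {"joints": [0, 0, 0, 0, 0, 0]}),   # placeholder joint target
     ("moveh", {"rz": -30, "ry": 20})],
    skills,
)
elapsed = time.perf_counter() - start
print(outcome["msg"], f"{elapsed:.2f} s")
```

Three 0.3 s moves finish in roughly the time of the slowest one rather than the sum, which is the same effect behind the sequential-versus-parallel pick times quoted above.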
Quick start
    make install   # editable-install everything
    make run       # uvicorn on :8001
                   # auto-sources /opt/ros/humble/setup.bash

    # HTTP
    curl -X POST http://localhost:8001/skill/find -d '{"inputs":"apple"}'

    # CLI
    kcare_robot --list
    kcare_robot find::apple
    kcare_robot pick::apple

    # Python
    python -c "from kcare_robot.skills.pick import pick; print(pick('apple'))"
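The HTTP call can also be made from Python with only the standard library. This is a minimal sketch: `build_skill_request` and `call_skill` are hypothetical helpers, and the JSON `Content-Type` header and JSON response body are assumptions, since the curl example above does not show either.

```python
import json
import urllib.request


def build_skill_request(skill, inputs, base_url="http://localhost:8001"):
    """Build a POST to /skill/<skill> carrying the same body as the curl example."""
    body = json.dumps({"inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/skill/{skill}",
        data=body,
        headers={"Content-Type": "application/json"},  # assumed, not in the curl example
        method="POST",
    )


def call_skill(skill, inputs, timeout_s=30.0):
    """Send the request and decode the reply; needs the server from `make run`."""
    request = build_skill_request(skill, inputs)
    with urllib.request.urlopen(request, timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))  # assumes a JSON reply


req = build_skill_request("find", "apple")
print(req.get_method(), req.full_url, req.data)
```

Splitting request construction from sending keeps the first half checkable without a running robot; `call_skill("find", "apple")` only works once the server is up.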