Tony

Mijia Camera Technology: A Complete Breakdown

Tech, Audio & Video · 15 min read

This article systematically walks through how to build an industrial-grade Mijia smart camera product from three perspectives: hardware design, software architecture, and cloud services. It uses the PTZ indoor camera as the main thread while covering the differences of other form factors such as the bullet-and-ball dual-lens camera.

The technology stacks covered here have dedicated articles in previous blog posts, listed at the end of this article; read them together for deeper understanding.


                           ┌─────────────┐
                           │ Lens Module │
                           │ (Lens+CMOS) │
                           └──────┬──────┘
                                  │ MIPI CSI
┌─────────┐    I2S     ┌──────────┴──────────┐    SDIO/USB     ┌──────────┐
│   Mic   │───────────►│                     │◄───────────────►│  Wi-Fi   │
│  (MIC)  │            │      Main SoC       │                 │  Module  │
└─────────┘            │ (SigmaStar/Ingenic) │                 └──────────┘
                       │                     │    SPI/eMMC
┌─────────┐    PWM     │   CPU + ISP + NPU   │◄───────────────►┌──────────┐
│ Speaker │◄───────────│   + Video Encoder   │                 │  Flash   │
│         │    DAC     │                     │                 │(NOR/NAND)│
└─────────┘            └──┬───┬───┬───┬──────┘                 └──────────┘
                          │   │   │   │
                     GPIO │   │   │   │ DDR
              ┌───────────┘   │   │   └────────┐
              │               │   │            │
        ┌─────┴─────┐    ┌────┴───┴──┐    ┌────┴────┐
        │  IR LED   │    │ PTZ Motor │    │   DDR   │
        │ + IR-CUT  │    │ (Stepper/ │    │ Memory  │
        └───────────┘    │    DC)    │    └─────────┘
                         └───────────┘

The main SoC is the camera’s core, integrating the CPU, ISP (Image Signal Processor), video encoding engine, and NPU (AI inference) on a single chip.

Common solutions:

| Chip | Vendor | CPU | NPU Performance | Video Capability | Target Scenarios |
|---|---|---|---|---|---|
| SSC337DE | SigmaStar | Cortex-A7 Dual-core | 0.5~1 TOPS | 3MP@30fps H.265 | Low-to-mid-range PTZ |
| SSC377 | SigmaStar | Cortex-A7 Dual-core | ~2 TOPS | 5MP@30fps H.265 | Mid-to-high-end / Dual-cam |
| T40XP | Ingenic | MIPS XBurst2 Dual-core | 3.2 TOPS | 5MP@30fps H.265/H.264 | High-end / AI-enhanced |
| T31 | Ingenic | MIPS XBurst Single-core | Limited | 3MP@25fps | Low-cost / Low-power |

Key selection criteria:

  • ISP quality: Directly determines image quality (noise reduction, WDR, 3D-DNR)
  • Encoder capability: Must support simultaneous multi-stream encoding (main stream + substream + AI frames)
  • NPU performance: Determines how many AI models can run (human detection + face recognition requires ≥1 TOPS)
  • Memory bandwidth: Video + AI + ISP working concurrently demands high bandwidth

| Type | Capacity | Notes |
|---|---|---|
| DDR2 | 64MB | Low-end models, barely enough for basic video |
| DDR3/DDR3L | 128~256MB | Mainstream solution, sufficient for video + AI + P2P |
| LPDDR4 | 256~512MB | High-end dual-cam or multi-tasking scenarios |

Memory allocation example (128MB):

  • Linux kernel + userspace: ~30MB
  • Video encoding buffers (main/substream): ~40MB
  • ISP image pipeline buffers: ~20MB
  • AI inference tensor buffers: ~20MB
  • P2P / network buffers: ~10MB
  • Reserved / fragmentation: ~8MB

| Type | Capacity | Typical Use |
|---|---|---|
| SPI NOR Flash | 8~32MB | Bootloader + kernel + rootfs (minimal system) |
| SPI NAND Flash | 128~256MB | Full system + model files + recording cache |
| eMMC | 512MB~8GB | High-end solution, supports local recording |
| TF Card Slot | User-expandable | Local recording storage (up to 256GB) |

Partition layout (typical SPI NAND 128MB):

┌──────────────────────────────────────────────────────┐
│ boot (1MB) │ kernel (4MB) │ rootfs (40MB) │ data (80MB) │
│ U-Boot │ uImage │ squashfs │ jffs2/ubifs │
└──────────────────────────────────────────────────────┘
  • rootfs: Read-only squashfs, immune to corruption from power loss
  • data: Writable partition for configuration files, AI models, logs, etc.
  • Dual A/B partition scheme prevents bricking during OTA
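
The layout above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming the partitions are packed back to back from offset 0 in the order shown; the names `PARTITIONS` and `layout` are illustrative and not part of any vendor tool.

```python
MB = 1024 * 1024
FLASH_SIZE = 128 * MB

# Sizes taken from the typical 128MB SPI NAND layout described above.
PARTITIONS = [("boot", 1 * MB), ("kernel", 4 * MB), ("rootfs", 40 * MB), ("data", 80 * MB)]


def layout(partitions, flash_size):
    """Return (name, offset, size) tuples, packing partitions back to back."""
    table, offset = [], 0
    for name, size in partitions:
        table.append((name, offset, size))
        offset += size
    if offset > flash_size:
        raise ValueError("partitions exceed flash size")
    return table


table = layout(PARTITIONS, FLASH_SIZE)
for name, offset, size in table:
    print(f"{name:<7} offset=0x{offset:08x} size=0x{size:08x}")

# Space left over after the four partitions.
unused = FLASH_SIZE - sum(size for _, _, size in table)
print(f"unallocated: {unused // MB}MB")
```

With these sizes the four partitions occupy 125MB, leaving 3MB unallocated. A real A/B layout would need duplicate kernel and rootfs slots, so the actual numbers on a shipping device will differ.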

| Model | Vendor | Resolution | Pixel Size | Target Scenarios |
|---|---|---|---|---|
| SC3336 | SmartSens | 3MP (2304x1296) | 2.5μm | Mainstream home use |
| SC5235 | SmartSens | 5MP (2592x1944) | 2.0μm | High-definition |
| IMX307 | Sony | 2MP (1920x1080) | 2.9μm | Starlight night vision |
| OS04A10 | OmniVision | 4MP (2560x1440) | 2.0μm | Mid-to-high-end |

Selection considerations:

  • Pixel size: Larger means more light intake, better night vision
  • Sensitivity: Determines noise levels in low-light conditions
  • Shutter mode: Rolling shutter (lower cost) vs. Global shutter (no jelly effect on motion)
  • Interface: MIPI CSI-2, 2-lane or 4-lane
  • Power consumption: Affects overall thermal design

| Parameter | Typical Value | Notes |
|---|---|---|
| Focal length | 3.6mm / 2.8mm | Shorter focal length = wider field of view |
| Aperture | F2.0 / F1.6 | Wider aperture = more light, better night vision |
| FOV | Horizontal 110°~130° | Home use generally requires ≥110° |
| IR-CUT | Dual filter switching | Daytime: blocks IR for natural color; Nighttime: removes filter for IR sensitivity |
| Focus | Fixed | PTZ indoor cameras generally use fixed focus to reduce cost |

Lens mount: Typically M12 (S-Mount), secured with thread-locking adhesive after focus adjustment.

| Solution | Band | Notes |
|---|---|---|
| RTL8189FTV | 2.4GHz | Low-cost SDIO interface, supports 802.11 b/g/n |
| RTL8733BU | 2.4G + 5G + BLE | Dual-band + Bluetooth, USB interface, supports BLE provisioning |
| SSW101B | 2.4GHz | SigmaStar companion solution, SDIO |

Key selection criteria:

  • Throughput: 1080P@30fps H.265 main stream is about 2~4Mbps, needs stable Wi-Fi bandwidth
  • Interference resistance: 2.4G band is crowded in homes; MIMO or 5G support recommended
  • Power consumption: Affects device temperature
  • BLE provisioning: Dual-mode chip enables BLE provisioning for better user experience

PTZ cameras use mechanical rotation for horizontal (Pan) and vertical (Tilt) movement:

| Parameter | Pan | Tilt |
|---|---|---|
| Motor type | Stepper motor | Stepper motor |
| Rotation range | 360° | 90°~120° |
| Step angle | Typically 1/16 microstep | Same |
| Gear ratio | Gear reduction | Gear reduction |
| Homing method | Optocoupler / Hall sensor | Same |
| Driver IC | e.g. MS41929 | Same |

Control logic:

  • On power-up, determine zero position via homing sensor
  • App sends rotation angle command, converted to step pulse count
  • Supports presets, patrol, tracking, and other advanced features
  • Noise optimization required (microstep driving + gear ratio design)
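
The angle-to-pulse conversion in the control logic above can be sketched as follows. The full-step angle (7.5°) and gear ratio (50:1) are assumed values chosen for illustration, not figures from any specific Mijia camera; only the 1/16 microstepping comes from the table above. The function names are hypothetical.

```python
def angle_to_steps(delta_deg, full_step_deg=7.5, microsteps=16, gear_ratio=50.0):
    """Convert an output-shaft rotation in degrees to a signed microstep count."""
    # One output degree needs gear_ratio motor degrees, and each motor
    # full step is split into `microsteps` pulses by the driver.
    steps_per_degree = microsteps * gear_ratio / full_step_deg
    return round(delta_deg * steps_per_degree)


def plan_move(current_deg, target_deg, min_deg, max_deg):
    """Clamp the target to the mechanical range, then return (target, steps)."""
    target = max(min_deg, min(max_deg, target_deg))
    return target, angle_to_steps(target - current_deg)


print(angle_to_steps(90))           # pulses for a 90 degree pan
print(plan_move(0, 200, 0, 120))    # tilt request beyond the 120 degree limit is clamped
```

The sign of the step count gives the direction. Because positions are tracked open-loop by counting pulses, the homing step at power-up is what anchors the count to a known zero.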

Bullet-and-ball dual-lens differences: The dual-lens form factor typically has one fixed wide-angle lens + one varifocal PTZ lens, working together to achieve “panoramic tracking + close-up capture.”

Day mode:                          Night mode:
┌─────────┐                        ┌─────────┐
│  Lens   │                        │  Lens   │
│    ↓    │                        │    ↓    │
│ IR-CUT  │ ← Filter blocks IR     │ IR-CUT  │ ← Filter moved away
│ (block) │                        │ (open)  │
│    ↓    │                        │    ↓    │
│  CMOS   │                        │  CMOS   │ ← Receives 850nm/940nm IR light
└─────────┘                        └─────────┘
                                   IR LED illumination
  • 850nm IR LED: Faint red glow, longer illumination range
  • 940nm IR LED: Completely invisible to the naked eye, suitable for covert scenarios
  • IR-CUT switch: Automatically controlled by ISP based on ambient light levels
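
Automatic day/night switching is usually done with hysteresis so the filter does not chatter around a single threshold. The sketch below is one possible approach, not the ISP vendor's actual algorithm: the lux thresholds, the three-sample hold, and the class name `DayNightSwitch` are all assumptions.

```python
class DayNightSwitch:
    """Day/night decision with two thresholds and a debounce counter."""

    def __init__(self, night_below=5.0, day_above=15.0, hold_samples=3):
        self.night_below = night_below    # switch to night below this level
        self.day_above = day_above        # switch back to day above this level
        self.hold_samples = hold_samples  # consecutive samples required
        self.mode = "day"
        self._streak = 0

    def update(self, lux):
        """Feed one ambient-light sample and return the current mode."""
        if self.mode == "day":
            wants_switch = lux < self.night_below
        else:
            wants_switch = lux > self.day_above
        self._streak = self._streak + 1 if wants_switch else 0
        if self._streak >= self.hold_samples:
            self.mode = "night" if self.mode == "day" else "day"
            self._streak = 0
        return self.mode


switch = DayNightSwitch()
modes = [switch.update(lux) for lux in [20, 4, 3, 2, 8, 12, 16, 17, 18]]
print(modes)
```

The gap between the two thresholds matters because turning on the IR LEDs changes what the sensor sees; readings between 5 and 15 leave the mode unchanged in this sketch.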

| Component | Specification | Notes |
|---|---|---|
| Microphone | Electret / MEMS | Audio pickup, cry detection, two-way talk |
| Speaker | 8Ω 1W~2W | Talkback playback, alarm siren |
| Audio Codec | SoC-integrated / External e.g. ES8388 | ADC (MIC → digital) + DAC (digital → speaker) |
| Echo Cancellation | Software AEC algorithm | Eliminates speaker crosstalk into microphone during two-way talk |

Talkback path: App voice → P2P → device decode → DAC → speaker; device MIC → ADC → AEC → encode → P2P → App

| Solution | Input | Notes |
|---|---|---|
| DC 5V/2A | Micro-USB / USB-C | Standard indoor PTZ camera power |
| DC 12V/1A | DC barrel jack | Outdoor bullet camera, PoE power |
| Battery powered | 18650 Li-ion pack | Low-power battery camera, PIR wake-up |

Power path:

USB 5V → DCDC (3.3V/1.8V/1.2V) → SoC / DDR / Wi-Fi / Motor / IR LED
└→ LDO (analog circuits: CMOS sensor, Audio Codec)

Notes:

  • Motor startup draws high transient current — leave margin
  • IR LED high current → independent MOSFET switch control
  • Separate digital and analog grounds to avoid interference with audio and video

| Element | Notes |
|---|---|
| Material | ABS / PC+ABS (flammability rating V0) |
| Thermal management | SoC uses thermal paste + heatsink, or conducts heat through the housing |
| Dust protection | Lens panel sealed to prevent dust adhesion |
| Waterproof (outdoor) | IP65/IP66 rating, O-ring seals |
| Antenna placement | Wi-Fi antenna away from motor and metal parts to avoid shielding |
| TF card slot | Hidden slot with eject mechanism |
| Indicator light | Status LED (blue/orange), can be turned off via command |

┌─────────────────────────────────────────────────────────────┐
│ App Layer / Cloud │
│ Mijia App │ Xiaomi Cloud Storage │ IoT Platform │
└────────────┬────────┴────────┬───────────────┴──────────────┘
│ P2P (MISS) │ HTTPS │ MQTT
│ │ │
┌────────────┴─────────────────┴─────────────────┴────────────┐
│ Device Software Stack │
├─────────────────────────────────────────────────────────────┤
│ ┌──────────┐ ┌──────────┐ ┌─────────┐ ┌─────────────┐ │
│ │ MISS P2P │ │ Cloud │ │ IoT │ │ OTA Client │ │
│ │ SDK │ │ Upload │ │ SPEC │ │ │ │
│ └────┬─────┘ └────┬─────┘ └────┬────┘ └──────┬──────┘ │
│ │ │ │ │ │
│ ┌────┴──────────────┴─────────────┴───────────────┴──────┐ │
│ │ Business Logic Layer (C/C++) │ │
│ │ Video Capture │ Encode Mgmt │ Storage Mgmt │ AI │ │
│ │ Scheduler │ PTZ Control │ │
│ └────────────────────────┬────────────────────────────────┘ │
│ │ │
│ ┌────────────────────────┴────────────────────────────────┐ │
│ │ Middleware / HAL Layer │ │
│ │ ISP Driver │ Encoder API │ Audio API │ GPIO │ Motor │ │
│ │ NPU Driver │ │
│ └────────────────────────┬────────────────────────────────┘ │
│ │ │
├───────────────────────────┴──────────────────────────────────┤
│ Linux Kernel (4.9 / 5.x) │
│ V4L2 │ ALSA │ SPI │ I2C │ SDIO │ USB │ MTD │ NetFilter │
├─────────────────────────────────────────────────────────────┤
│ Bootloader (U-Boot) │
└─────────────────────────────────────────────────────────────┘

Why Linux:

  • Mature and stable, rich driver support
  • Strong community support, chip vendors provide BSP (Board Support Package)
  • Supports multi-threading/multi-processing, suitable for complex camera workloads
  • Open source, customizable and tunable

Buildroot build system:

Buildroot is used to build a minimal embedded Linux root filesystem — lighter than Yocto and faster to build.

# Typical build flow
$ make <board>_defconfig # Load board configuration
$ make menuconfig # Configure kernel / userspace packages
$ make # Build, generating firmware
# Output artifacts
output/images/
├── u-boot.bin # Bootloader
├── uImage # Linux kernel
├── rootfs.squashfs # Read-only root filesystem
└── userdata.ubifs # Writable data partition

System tuning tips:

  • Remove unnecessary kernel modules (USB gadget, Bluetooth stack, etc.)
  • Use BusyBox instead of full coreutils
  • Choose musl libc over glibc (~2MB smaller)
  • Disable kernel printk to reduce serial output overhead

Power-on → BootROM → U-Boot → Kernel → Init → Application
           (~10ms)  (~500ms)  (~2s)    (~1s)  (~2s)
Total: ~6s to first frame

Fast boot optimizations:

  • U-Boot: skip unnecessary device detection, load kernel directly
  • Kernel: trim unused drivers, use initramfs instead of init scripts
  • Userspace: start ISP + encoder first, defer AI/P2P module loading
  • Goal: ≤3 seconds from power-on to first video frame

Firmware flashing happens during factory production:

| Method | Notes | Scenario |
|---|---|---|
| USB flashing | PC connects via USB, uses chip vendor’s tool | Mass production |
| SD card flashing | Put firmware on TF card, auto-flashes on power-up | Small batch / R&D |
| UART flashing | Serial + TFTP to download firmware | Debugging / brick recovery |
| Network flashing | Batch firmware delivery over network | Large-scale production line |

Production flashing workflow:

Flash firmware → Write unique device info (DID/MAC/Key) → Functional self-test → Labeling and warehousing

Each device is provisioned with:

  • DID (Device ID): Unique device identifier on the Mijia platform
  • MAC address: Physical address of the Wi-Fi module
  • Device key: Used for secure communication with the Mijia cloud

Xiaomi provides a complete development kit MIKE (Mi IPC Kit Environment) for the camera ecosystem, covering the full chain from low-level chip adaptation to high-level business APIs:

┌─────────────────────────────┐
┌───────────────────────────────────────────┐ │ │
│ MIKE Upper-layer API │ │ │
│ (Unified business interface: AV, AI, │ │ │
│ Storage, Provisioning, etc.) │ │ │
├───────────────────────────────────────────┤ │ │
│ Middleware Modules │ │ Tools Toolkit │
│ ┌──────┐ ┌────┐ ┌─────┐ ┌────┐ ┌─────┐ │ │ │
│ │ MISS │ │ OT │ │Cloud│ │Prov│ │Rec │ │ │ Emulator (device emulator) │
│ │(P2P) │ │Svc │ │Stor │ │ │ │ │ │ │ Monitor (runtime monitor) │
│ └──────┘ └────┘ └─────┘ └────┘ └─────┘ │ │ Auto Test(automation test) │
│ ┌──────┐ ┌──────┐ ┌─────┐ ┌─────────┐ │ │ Logger Debugger │
│ │ OTA │ │Playbk│ │Codec│ │Local AI │ │ │ │
│ │ │ │ │ │ │ │NAS │ │ │ Cross-layer tools: │
│ └──────┘ └──────┘ └─────┘ └─────────┘ │ │ - Business-layer simulation│
├──────────────────────────────────────────┤ │ - Middleware unit testing │
│ Chip Platform Adaptation Layer (HAL) │ │ - Platform HAL validation │
│ SigmaStar │ Ingenic │ Others │ └─────────────────────────────┘
└──────────────────────────────────────────┘

MIKE module responsibilities:

| Module | Description |
|---|---|
| MISS (P2P) | Streaming media transport SDK: handles audio/video and command channels between device and App |
| OT Service | Device heartbeat, online status management |
| Cloud Storage | Encrypted segmented upload of event recordings |
| Provisioning | Wi-Fi AP scan / BLE provisioning flow |
| Recording | Local TF card recording management (continuous / event-based) |
| OTA | Firmware upgrade (A/B partition, delta OTA) |
| Playback | Timeline playback of cloud / local recordings |
| Codec | Audio/video encode/decode wrapper (H.265/Opus) |
| Local AI | Human / face / pet / cry detection model inference scheduling |
| NAS Storage | LAN NAS video storage (SMB/NFS) |
| Chip Platform Adaptation | Abstracts ISP/encoder/NPU differences across SoCs, provides unified HAL |
| Tools | Cross-layer dev/debug tools: Emulator (PC-side device sim), Monitor (runtime status), Auto Test (automation framework), Logger Debugger (log capture and analysis) |

MIKE’s design allows business developers to work without worrying about underlying chip differences — they use the upper-layer API for feature development. The Tools toolkit also supports emulating device operation on a PC, significantly improving development and debugging efficiency.

The fundamentals of P2P (NAT types, UDP hole punching, STUN/TURN) are covered in Introduction to P2P Technology. Here we focus on the specific implementation in Xiaomi cameras.

MISS (MIoT Streaming SDK) is Xiaomi’s P2P streaming transport SDK, implemented in C, cross-platform (device-side Linux + App-side iOS/Android), and is the core communication module in the MIKE kit.

┌──────────────────────────────────────────┐
│        MISS SDK (Upper-layer API)        │
│  Create Channel │ Send Data │ Recv Data  │
│              Event Callback              │
├──────────────────────────────────────────┤
│      Channel Management / Scheduler      │
│  Connection Mgmt │ Reconnect Strategy    │
│  Flow Control │ QoS                      │
├──────────────────────────────────────────┤
│     P2P Transport Layer (Pluggable)      │
│  ┌────────┐  ┌──────────┐  ┌──────────┐  │
│  │  TUTK  │  │ Shangyun │  │  Xiaomi  │  │
│  │(Kalay) │  │          │  │ Self-Dev │  │
│  │        │  │          │  │   P2P    │  │
│  └────────┘  └──────────┘  └──────────┘  │
├──────────────────────────────────────────┤
│         Network Layer (UDP/TCP)          │
└──────────────────────────────────────────┘

┌──────────┐ ┌──────────────┐ ┌──────────┐
│ Camera │ │ P2P Server │ │ App │
└─────┬────┘ └──────┬───────┘ └─────┬────┘
│ 1. Register online │ │
│──────────────────────►│ │
│ │ │
│ │ 2. Query device │
│ │ address │
│ │◄─────────────────────│
│ │ │
│ │ 3. Return device │
│ │ IP/port │
│ │─────────────────────►│
│ │ │
│ 4. NAT Traversal / P2P Direct │
│◄════════════════════════════════════════════►│
│ │
│ 5. If P2P fails, relay via relay server │
│◄═══════════[ Relay Server ]═══════════════►│

MISS SDK supports multiplexing multiple logical channels over a single P2P connection:

| Channel | Purpose | Characteristics |
|---|---|---|
| Video channel | Main / substream transport | High bandwidth, frame drops acceptable |
| Audio channel | Two-way talk audio | Low latency priority |
| Command channel | Control commands (PTZ, snapshot, etc.) | Reliable delivery |
| File channel | Recording playback / download | Reliable, supports resumable transfer |
| Alert channel | Event notifications | Low latency |
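
To make the idea of multiplexing concrete, here is a minimal sketch of tagging each payload with a channel ID and a length so that several logical channels can share one byte stream. The 5-byte header and the channel numbers are invented for illustration; the actual MISS wire format is not described in this article.

```python
import struct
from enum import IntEnum


class Channel(IntEnum):
    # Hypothetical channel IDs, one per logical channel in the table above.
    VIDEO = 1
    AUDIO = 2
    COMMAND = 3
    FILE = 4
    ALERT = 5


# Assumed header: 1-byte channel ID + 4-byte payload length, network byte order.
HEADER = struct.Struct("!BI")


def pack_frame(channel, payload):
    """Prefix a payload with its channel ID and length."""
    return HEADER.pack(channel, len(payload)) + payload


def unpack_frames(buffer):
    """Split a buffer into (channel, payload) pairs plus any incomplete tail."""
    frames, pos = [], 0
    while len(buffer) - pos >= HEADER.size:
        channel, length = HEADER.unpack_from(buffer, pos)
        end = pos + HEADER.size + length
        if end > len(buffer):
            break  # frame not fully received yet
        frames.append((Channel(channel), buffer[pos + HEADER.size:end]))
        pos = end
    return frames, buffer[pos:]


stream = pack_frame(Channel.COMMAND, b"ptz:left") + pack_frame(Channel.VIDEO, b"\x00\x01") + b"\x02"
frames, rest = unpack_frames(stream)
print(frames, rest)
```

A real implementation would also carry sequence numbers and timestamps, and would apply different delivery policies per channel (retransmit commands, drop late video frames), as the table suggests.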

The pluggable transport layer enables:

  • TUTK (Kalay): Covers global nodes, high connectivity overseas
  • Shangyun: Dense node deployment in China, low latency
  • Xiaomi Self-Dev: Full control, continuous optimization
  • SDK automatically selects the optimal engine, transparent to the business layer

Mijia cameras support two provisioning modes: QR code scanning and BLE provisioning. The QR code flow is shown first:

┌───────────┐ ┌───────────┐
│ Mijia App │ │ Camera │
└─────┬─────┘ └─────┬─────┘
│ │
│ 1. App generates QR code │
│ (Contains Wi-Fi SSID + Password │
│ + Token) │
│ Displayed on phone screen │
│ │
│ 2. Camera powered on,
│ camera scans QR
│ code on phone
│ │
│ 3. Camera decodes QR code, │
│ extracts Wi-Fi credentials, │
│ connects to router │
│ │
│ 4. Camera connects to router, │
│ handshake with App via LAN/Cloud │
│◄────────────────────────────────────────►│
│ │
│ 5. Bind device to Mijia account │
│─────────────(Cloud)────────────────────►│

Advantage: No extra hardware (Bluetooth chip) required — uses the camera’s own capability for provisioning.

1. App discovers device via BLE
2. BLE channel sends Wi-Fi SSID + Password
3. Device connects to Wi-Fi
4. Binding completed via cloud

Advantage: Does not rely on the camera’s image — works in dark or obstructed scenarios. Suitable for dual-mode Wi-Fi+BLE chip solutions.

The MIoT SPEC data model (Device → Service → Property/Action/Event) is architecturally very close to the Matter protocol’s Cluster model. See Introduction to Matter Protocol for a detailed comparison.

As a Mijia device, the camera must implement the Service/Property/Action/Event model defined by the MIoT SPEC protocol.

Typical camera SPEC definition:

Device: camera
├── Service: camera-control
│ ├── Property: on (bool) — Camera power switch
│ ├── Property: night-shot (enum) — Night vision mode (auto/on/off)
│ ├── Property: watermark (bool) — Watermark toggle
│ ├── Action: start-recording — Start recording
│ └── Action: stop-recording — Stop recording
├── Service: ptz-control
│ ├── Property: pan-position (int) — Pan angle
│ ├── Property: tilt-position (int) — Tilt angle
│ ├── Action: rotate (direction, speed)— Rotate
│ └── Action: go-to-preset (id) — Go to preset position
├── Service: motion-detection
│ ├── Property: sensitivity (enum) — Sensitivity (low/medium/high)
│ ├── Property: detection-area (struct)— Detection zone
│ └── Event: motion-detected — Motion detection event
├── Service: ai-detection
│ ├── Property: human-detect-on (bool) — Human detection toggle
│ ├── Property: face-detect-on (bool) — Face detection toggle
│ ├── Property: pet-detect-on (bool) — Pet detection toggle
│ ├── Property: cry-detect-on (bool) — Cry detection toggle
│ ├── Event: human-detected — Human detection event
│ ├── Event: face-detected — Face detection event
│ ├── Event: pet-detected — Pet detection event
│ └── Event: cry-detected — Cry detection event
├── Service: storage
│ ├── Property: sd-card-status (enum) — TF card status
│ ├── Property: cloud-storage-on (bool)— Cloud storage toggle
│ └── Action: format-sd-card — Format TF card
└── Service: indicator-light
└── Property: on (bool) — Indicator light toggle

The device maintains a persistent connection with the Xiaomi IoT cloud via MQTT, reporting property changes and events. Control commands issued by the App are also delivered through the MQTT → device path.
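
The Service/Property/Event structure and the report-on-change behavior can be modeled in a few lines. This is a simplified sketch: the `Device` class, the message dictionaries, and the `outbox` list (standing in for an MQTT publish call) are all hypothetical and do not reflect the real MIoT SDK API or message schema.

```python
class Device:
    """Toy model of a SPEC device: services hold properties, changes are reported."""

    def __init__(self, name):
        self.name = name
        self.services = {}  # service name -> {property name -> value}
        self.outbox = []    # stand-in for messages published to the cloud

    def add_property(self, service, prop, value):
        self.services.setdefault(service, {})[prop] = value

    def set_property(self, service, prop, value):
        props = self.services[service]
        if prop not in props:
            raise KeyError(f"{service}.{prop} is not defined")
        if props[prop] != value:
            # Only real changes are reported upstream.
            props[prop] = value
            self.outbox.append({"type": "property", "service": service, "name": prop, "value": value})

    def emit_event(self, service, event):
        self.outbox.append({"type": "event", "service": service, "name": event})


camera = Device("camera")
camera.add_property("camera-control", "night-shot", "auto")
camera.add_property("indicator-light", "on", True)
camera.set_property("indicator-light", "on", False)  # changed: reported
camera.set_property("indicator-light", "on", False)  # unchanged: not reported
camera.emit_event("motion-detection", "motion-detected")
print(camera.outbox)
```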

The video segmentation and playback mechanism of cloud storage shares design concepts with the HLS protocol (splitting video into small segments + index manifest). See HLS and Streaming Media Storage for details.

Xiaomi cloud storage uploads recording segments from the camera to the cloud, allowing users to view them in the App.

Camera side:
ISP → YUV420 → Encoder (H.265, 20fps) → Ring buffer → Event trigger → Segment (10s each) → HTTPS upload
Upload flow:
1. Event triggers (motion detection / AI detection / user manual)
2. Extract video segments before and after the event from the ring buffer
3. Encrypt segments (AES-128)
4. Upload via HTTPS POST to Xiaomi cloud storage
5. Server returns index info, device reports event metadata
Playback flow:
App requests timeline → Cloud returns segment list → App pulls stream via HTTPS → Decrypt and play

  • Ring buffer: Keeps the most recent 10~30 seconds of video in memory, ensuring event-triggered capture can reach back before the event
  • End-to-end encryption: Video is encrypted on the device; the cloud cannot decrypt it — the user-side key handles decryption
  • Resumable upload: Automatically retries unfinished segments during network fluctuations
  • Flow control: Dynamically adjusts upload bitrate (substream/main stream) based on available bandwidth
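
The ring buffer behavior can be sketched with a bounded deque. This is an illustrative model rather than the device implementation: it assumes encoded frames carry a millisecond timestamp and a keyframe flag, and it starts each clip on a keyframe because a decoder cannot begin mid-GOP. The class and method names are hypothetical.

```python
from collections import deque


class PreEventBuffer:
    """Keeps only the most recent `seconds` of encoded frames in memory."""

    def __init__(self, seconds, fps):
        # Oldest frames fall off automatically once maxlen is reached.
        self.frames = deque(maxlen=seconds * fps)

    def push(self, ts_ms, is_keyframe, data):
        self.frames.append((ts_ms, is_keyframe, data))

    def clip_from(self, start_ms):
        """Return frames from the latest keyframe at or before start_ms.

        If every buffered keyframe is newer than start_ms, start at the
        oldest keyframe still available.
        """
        frames = list(self.frames)
        start = None
        for i, (ts_ms, is_keyframe, _) in enumerate(frames):
            if is_keyframe and (start is None or ts_ms <= start_ms):
                start = i
        return frames[start:] if start is not None else []


buf = PreEventBuffer(seconds=2, fps=20)
for n in range(100):  # 5 seconds of video at 20fps, one keyframe per second
    buf.push(ts_ms=n * 50, is_keyframe=(n % 20 == 0), data=b"")
clip = buf.clip_from(start_ms=4200)
print(len(buf.frames), clip[0][0], len(clip))
```

In this run the buffer retains only the last 40 frames, and the clip requested from 4200ms starts at the keyframe at 4000ms.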

| Algorithm | Input | Inference Hardware | Frame Rate | Description |
|---|---|---|---|---|
| Human detection | Video frame (substream 640x360) | NPU | 10~15fps | Detects if people are in the frame |
| Face recognition | Cropped human region | NPU | On-demand | Recognizes strangers / family members |
| Pet detection | Video frame | NPU | 10~15fps | Cat / dog detection |
| Cry detection | Audio frame (16kHz PCM) | CPU/DSP | Real-time | Baby cry recognition |
| Motion detection | Frame differencing | CPU | Main stream frame rate | Low-computation basic detection |

Model training (cloud GPU)
▼ Model quantization / conversion (float → INT8)
▼ Chip vendor toolchain conversion (ONNX → chip-private format)
│ - SigmaStar: IPU Toolkit → .img model
│ - Ingenic: Magik → .bin model
▼ Model files packaged into firmware data partition
▼ Device-side NPU Runtime loads and runs inference

ISP outputs YUV frame
├──► Main stream encode (1080P/1296P) ──► P2P / Recording
└──► Substream (360P/VGA) ──► AI preprocessing (Resize/Normalize)
NPU inference (human / pet)
Post-processing (NMS / threshold filtering)
├──► Target detected → trigger event report + cloud storage
├──► Face region → crop → face recognition model
└──► No target → wait for next frame
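
The post-processing step (threshold filtering followed by NMS) in the pipeline above can be sketched in plain Python. The box format `(x1, y1, x2, y2)` and the two thresholds are assumptions for illustration; a production pipeline would take them from the model's configuration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(detections, score_threshold=0.5, iou_threshold=0.45):
    """Greedy non-maximum suppression over (box, score) pairs."""
    # Drop low-confidence boxes, then visit the rest from best to worst.
    candidates = sorted(
        (d for d in detections if d[1] >= score_threshold),
        key=lambda d: d[1],
        reverse=True,
    )
    kept = []
    for box, score in candidates:
        # Keep a box only if it does not overlap too much with a better one.
        if all(iou(box, kept_box) <= iou_threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept


detections = [
    ((100, 100, 200, 300), 0.9),  # person
    ((110, 105, 205, 310), 0.8),  # near-duplicate of the first box
    ((400, 120, 460, 280), 0.7),  # second person
    ((0, 0, 50, 50), 0.3),        # below the score threshold
]
kept = nms(detections)
print(kept)
```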

Audio AI (cry detection) runs independently of the video pipeline in the audio thread:

MIC → ADC → 16kHz PCM → Sliding window framing → Feature extraction (MFCC) → Classification model → Cry / not cry
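
The sliding window framing stage of this pipeline can be sketched as follows. The 25ms window and 10ms hop are common choices for speech features and are assumed here; the article does not state the values the cry-detection model actually uses. Feature extraction (MFCC) and classification would consume the frames this function produces.

```python
def frame_signal(samples, sample_rate=16000, window_ms=25, hop_ms=10):
    """Split PCM samples into overlapping fixed-length frames."""
    window = sample_rate * window_ms // 1000  # 400 samples at 16kHz
    hop = sample_rate * hop_ms // 1000        # 160 samples at 16kHz
    # Only full windows are emitted; a trailing partial window is dropped.
    return [samples[i:i + window] for i in range(0, len(samples) - window + 1, hop)]


one_second = list(range(16000))  # placeholder for 1 second of 16kHz PCM
frames = frame_signal(one_second)
print(len(frames), len(frames[0]))
```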

OTA is the foundation for continuous iteration of smart devices, requiring both safety and reliability during the upgrade process.

┌───────────┐ ┌──────────────┐ ┌───────────┐
│ Xiaomi │ │ Camera │ │ Mijia App │
│ OTA Server│ │ │ │ │
└─────┬─────┘ └──────┬───────┘ └─────┬─────┘
│ │ │
│ 1. Check for updates (scheduled / push) │
│◄─────────────────────│ │
│ │ │
│ 2. Return new version info + download URL │
│─────────────────────►│ │
│ │ │
│ 3. Download firmware package (HTTPS) │
│─────────────────────►│ │
│ │ │
│ 4. Verify firmware signature │
│ (RSA/ECDSA) │
│ │ │
│ 5. Write to standby partition │
│ (A/B) │
│ │ │
│ 6. Switch boot partition, │
│ reboot │
│ │ │
│ 7. Boot successful → report │
│ new version │
│◄─────────────────────│ │
│ │ │
│ │ 8. App shows upgrade │
│ │ complete │
│ │──────────────────────►│

| Mechanism | Description |
|---|---|
| A/B dual partition | New firmware written to standby partition; switch occurs only after verification |
| Watchdog | Hardware watchdog triggers rollback if system fails to run after boot |
| Boot counter | Rollback to old partition after N consecutive boot failures |
| Firmware signature | RSA/ECDSA signature verification prevents tampering |
| Power-loss recovery | Power loss during write does not affect the currently running partition |
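
The boot counter and A/B rollback mechanisms interact roughly as sketched below. The environment keys, the limit of three attempts, and the function names are assumptions for illustration; on a real device this logic would live in the bootloader and use its persistent environment storage.

```python
MAX_BOOT_ATTEMPTS = 3  # assumed value for "N"


def other(slot):
    return "B" if slot == "A" else "A"


def select_slot(env):
    """Bootloader side: pick the slot to boot and update the persisted state."""
    if env["upgrade_pending"]:
        if env["boot_count"] >= MAX_BOOT_ATTEMPTS:
            # The new slot never confirmed a healthy boot: roll back.
            env["active"] = other(env["active"])
            env["upgrade_pending"] = False
            env["boot_count"] = 0
        else:
            env["boot_count"] += 1
    return env["active"]


def confirm_boot(env):
    """Application side: called once the new firmware is running normally."""
    env["upgrade_pending"] = False
    env["boot_count"] = 0


# OTA wrote new firmware to slot B and marked it active but unconfirmed.
env = {"active": "B", "upgrade_pending": True, "boot_count": 0}
# The firmware in B keeps crashing, so the watchdog resets the device each time.
attempts = [select_slot(env) for _ in range(4)]
print(attempts)
```

Here the bootloader tries slot B three times and then falls back to slot A. If the application had called `confirm_boot` after the first boot, the counter would have been cleared and slot B would remain active.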

To save bandwidth and upgrade time, delta OTA is supported:

  • Server generates a delta package from old and new firmware using bsdiff (typically 10%~30% of full package size)
  • Device applies the delta against the current partition data to reconstruct the full new firmware
  • Verifies hash integrity, then writes

For the basics of audio/video digitization (sampling, quantization, encoding, color spaces), see Audio/Video and Its Digital Representation.

Light → Lens → CMOS Sensor → ISP → Encoder → Bitstream output
ISP Pipeline:
Raw Bayer → Black level correction → Bad pixel correction → Demosaic → White balance →
Color correction → Gamma → Noise reduction (2D/3D-DNR) → Sharpening → YUV output

Mainstream Xiaomi cameras use YUV420 color space (see Audio/Video Fundamentals for a detailed explanation), H.265 video encoding, Opus audio encoding, and a video frame rate of 20fps.

Multi-stream design:

| Stream | Resolution | Color Space | Frame Rate | Codec | Bitrate | Purpose |
|---|---|---|---|---|---|---|
| Main stream | 1920x1080 / 2304x1296 | YUV420 | 20fps | H.265 | 1~4 Mbps | P2P HD viewing, cloud storage |
| Substream | 640x360 | YUV420 | 15fps | H.265 | 200~500 Kbps | P2P smooth viewing (weak network), AI inference input |
| JPEG stream | 1920x1080 | - | On-demand | JPEG | - | Snapshots, cover images |
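
The bitrates in the table translate directly into storage requirements. As a rough worked example, assuming a constant 2 Mbps main stream, decimal gigabytes, and ignoring audio and container overhead:

```python
SECONDS_PER_DAY = 86400


def bytes_per_day(bitrate_mbps):
    """Bytes written per day of continuous recording at a constant bitrate."""
    return bitrate_mbps * 1_000_000 / 8 * SECONDS_PER_DAY


def days_on_card(card_gb, bitrate_mbps):
    """Days of continuous recording that fit on a card of card_gb gigabytes."""
    return card_gb * 1_000_000_000 / bytes_per_day(bitrate_mbps)


gb_per_day = bytes_per_day(2) / 1e9
print(f"{gb_per_day:.1f} GB/day, {days_on_card(256, 2):.1f} days on a 256GB card")
```

At 2 Mbps, continuous recording consumes about 21.6 GB per day, so a 256GB TF card holds a little under 12 days before loop recording starts overwriting. Real encoders use variable bitrate, so actual usage depends on scene complexity.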

MIC → ADC (16kHz/16bit) → AEC (Echo Cancellation) → ANR (Noise Reduction) → AGC (Gain Control) → Encode (Opus)

Audio encoding uses Opus, an open-source, royalty-free codec supporting everything from low-bitrate speech (6kbps) to high-bitrate music (510kbps), with encoding latency as low as 5ms — ideal for real-time two-way talk.

  • AEC (Acoustic Echo Cancellation): Essential for two-way talk scenarios — eliminates speaker echo
  • ANR (Audio Noise Reduction): Removes ambient background noise
  • AGC (Automatic Gain Control): Automatically adjusts gain to prevent levels from being too high or too low

Recording strategies:
├── Continuous recording: 24/7 loop recording, overwrites oldest files when full
└── Event recording: Records only when events are detected, saves space
File organization:
/mnt/sdcard/record/
├── 2024/11/10/
│ ├── 14/ # Organized by hour
│ │ ├── 00.mp4 # One file per minute
│ │ ├── 01.mp4
│ │ └── ...
│ └── 15/
└── index.db # SQLite index for timeline queries
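
The directory scheme and the loop-recording policy above can be sketched as two small helpers. The function names are hypothetical, and the eviction rule (delete oldest files until usage fits the capacity) is one straightforward reading of "overwrites oldest files when full".

```python
from datetime import datetime
from pathlib import PurePosixPath

RECORD_ROOT = PurePosixPath("/mnt/sdcard/record")


def segment_path(ts):
    """Map a timestamp to year/month/day/hour/minute.mp4 under the record root."""
    return RECORD_ROOT / f"{ts:%Y/%m/%d/%H}" / f"{ts:%M}.mp4"


def files_to_evict(files, capacity_bytes):
    """Pick the oldest files to delete so the total size fits the capacity.

    `files` is a list of (relative_path, size) pairs. Zero-padded date paths
    sort chronologically, so a plain string sort orders them oldest first.
    """
    total = sum(size for _, size in files)
    evict = []
    for path, size in sorted(files):
        if total <= capacity_bytes:
            break
        evict.append(path)
        total -= size
    return evict


print(segment_path(datetime(2024, 11, 10, 14, 1, 30)))
files = [("2024/11/10/14/01.mp4", 10), ("2024/11/10/14/00.mp4", 10), ("2024/11/10/15/00.mp4", 10)]
print(files_to_evict(files, capacity_bytes=20))
```

In practice the SQLite index would be updated in the same step so the timeline never points at a deleted file.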

| Test | Method | Criteria |
|---|---|---|
| Video image | Align to standard color card, auto-analyze | Color/contrast/clarity meets spec |
| IR night vision | Darkroom environment, check IR LED brightness | Even illumination, no dark corners |
| PTZ movement | Full-range rotation, check step count | Positioning accuracy ≤1°, no jamming |
| Microphone | Play standard audio source, check recording | SNR ≥ 40dB |
| Speaker | Play test audio | No distortion, volume meets spec |
| Wi-Fi | Connect to specified AP, test throughput | ≥ 10Mbps |
| TF card slot | Insert test card, read/write verification | Speed ≥ 10MB/s |
| Button / Reset | Press to test | GPIO level transitions correctly |
| Power consumption | Measure current across scenarios | Standby <2W, active <5W |

| Test | Conditions | Requirements |
|---|---|---|
| High-temperature aging | 50°C continuous operation for 48h | No crashes, no image anomalies |
| Low-temperature startup | -10°C cold start | Normal image output |
| Voltage fluctuation | 4.5V ~ 5.5V | Stable operation |
| Power-loss test | Random power cuts 1000 times | System boots normally, filesystem undamaged |
| Wi-Fi roaming | Signal attenuation / recovery | Auto-reconnect, P2P recovery |
| Long-term run | Continuous operation for 30 days | No memory leaks, services stay alive |

An industrial-grade Mijia camera product spans a wide range of technology stacks, from hardware to software to the cloud:

Hardware: SoC selection → Sensor → Optics → Mechanical → Power → RF
Software: OS/BSP → ISP tuning → Encoding → P2P → AI → IoT → OTA
Cloud: Provisioning → Device management → Cloud storage → Push notifications → OTA distribution
Production: Flashing → Key writing → Automated testing → Burn-in → Packaging

The core challenge is running video capture and encoding, AI inference, P2P transport, cloud storage upload, IoT communication, and other real-time tasks simultaneously within limited hardware resources (a few hundred MHz of CPU and roughly 100MB of RAM), all while maintaining stable 24/7 operation. This demands fine-grained resource scheduling, strict memory management, and robust exception recovery mechanisms.

This article ties together the technical knowledge introduced across previous blog posts into a complete product scenario:

  • Audio/Video Digitization — Sampling, YUV420, H.265 encoding applied to the camera’s video capture and encoding pipeline
  • P2P Technology — NAT traversal applied to MISS SDK’s multi-engine P2P architecture
  • HLS Streaming — Segment-based storage applied to cloud storage slice upload and timeline playback
  • Matter Protocol — Device model applied to Mijia IoT SPEC’s Service/Property/Action design
