Vision and AI Middleware

Embedded Vision Middleware for Safety and Security


Arcturus Vision Middleware transforms cameras from passive observers into active detection systems by delivering a comprehensive set of analytics for public safety and security at the edge.

The core of the middleware consists of optimized vision algorithms and efficient AI networks capable of object detection, embeddings, tracking and reidentification in real time on general-purpose edge hardware. Higher-level analytics applications provide situational awareness, detect behaviour and determine events. Output is provided to upstream clients using a real-time event notification system and stored locally using a time series database.

The solution is built on a highly extensible vision pipeline framework that supports discrete nodes for video source, inference, embeddings, analytics, reidentification and video sink. The pipeline architecture employs cloud-native design methodologies, allowing nodes to be deployed and interconnected across different physical resources. Processing can therefore span seamlessly from fully on-premises to cloud, avoiding the need to over-build edge hardware without constraining the scalability of the AI application. Arcturus Vision and AI Middleware supports inference runtimes optimized for edge processing, including TensorRT™, Arm NN and TensorFlow Lite, with a catalog of state-of-the-art pre-trained models including SSD_MobileNet v3, YOLOv3, Inception v4, FaceSSD and others.
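
The node-based pipeline concept can be sketched as a chain of interchangeable stages. The class and node names below are illustrative assumptions, not the actual Arcturus API:

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Frame:
    """Illustrative frame record passed between pipeline nodes."""
    timestamp: float
    data: bytes
    detections: list = field(default_factory=list)

class Pipeline:
    """Chains discrete nodes (source -> inference -> analytics -> sink).

    Each node is a callable that takes and returns a Frame, so stages can
    be swapped out or relocated to different physical resources without
    changing the surrounding stages.
    """
    def __init__(self, nodes: List[Callable[[Frame], Frame]]):
        self.nodes = nodes

    def process(self, frame: Frame) -> Frame:
        for node in self.nodes:
            frame = node(frame)
        return frame

# Hypothetical stub nodes: "inference" tags a detection, "analytics"
# raises an event when a person is present.
def inference(frame: Frame) -> Frame:
    frame.detections.append({"class": "person", "score": 0.92})
    return frame

def analytics(frame: Frame) -> Frame:
    if any(d.get("class") == "person" for d in frame.detections):
        frame.detections.append({"event": "intrusion"})
    return frame

pipeline = Pipeline([inference, analytics])
result = pipeline.process(Frame(timestamp=0.0, data=b""))
```

Because each node only depends on the Frame contract, the same chain could run on one device or be split across edge and cloud resources.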

The UI/UX has been developed using a Design Thinking process in consultation with police, security and transit authority professionals. The interface presents real-time events and follows intuitive workflows to switch between notifications, live video and event playback seamlessly. An event timeline with clustering makes searching events across large time domains simple. Live streaming is handled using WebRTC, with RTSP used for playback with scrubbing controls.


Vision at the Edge – Overview and Benefits

[Figure: Vision Middleware Diagram]

Reduce Bandwidth

Existing systems rely on the continuous transmission of pixel data from each camera to a central location for processing or monitoring. As deployments of surveillance cameras and their resolutions increase, this places upward pressure on the network, resulting in data bottlenecks, extensibility problems and the requirement for new infrastructure to support capacity expansion. With an edge-based approach, analytics processing occurs closer to the camera source or in the same physical system. Continuous high bit-rate traffic is replaced by small-packet notifications that define important events and provide access to video on-demand. With an architecture that includes vision processing at the edge, it is possible to:

  • Significantly lower bandwidth, resulting in improved network management and resilience
  • Reduce dependency on big, high-availability backbone connections
  • Lower operating costs on pay-per-bit networks, such as LTE
  • Upgrade or expand systems, without replacing existing infrastructure

Increase Effectiveness

Intelligence at the edge transforms video surveillance from passive observation to active detection by identifying specific events and acting on them. Analysts can be notified of events as they occur in real-time, instead of waiting for an observer to report them. This approach allows analysts to quickly identify the event, understand how it transpired, who the actors are and formulate an effective response. Since processing is done close to the video source, detection events can trigger local actions such as lights, sirens, locks or other systems, even if connectivity is lost. This means:

  • Improved focus and situational awareness for analysts
  • Reduced response times during critical events
  • Automated actions and integration with equipment, at the local level
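
The local-action path described above can be sketched as a simple event dispatcher that binds detection events to local equipment. The event names and actions here are hypothetical placeholders:

```python
from typing import Callable, Dict, List

class LocalActionDispatcher:
    """Maps detection events to local actions (e.g. lights, sirens, locks)
    so responses can fire at the edge even if upstream connectivity is lost.
    """
    def __init__(self):
        self._actions: Dict[str, List[Callable[[], str]]] = {}

    def register(self, event: str, action: Callable[[], str]) -> None:
        """Bind an action callable to an event name."""
        self._actions.setdefault(event, []).append(action)

    def dispatch(self, event: str) -> list:
        """Run every action bound to the event; return their results."""
        return [action() for action in self._actions.get(event, [])]

# Hypothetical wiring: an intrusion event triggers a siren and a door lock.
dispatcher = LocalActionDispatcher()
dispatcher.register("intrusion", lambda: "siren:on")
dispatcher.register("intrusion", lambda: "door:lock")
results = dispatcher.dispatch("intrusion")
```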

Easily Search Archived Video

Vision processing is performed continuously and output is written to a database with camera source and video time-code references. This database forms an ongoing narrative of the live video as it is analyzed. A database approach makes it possible to perform forensic searches using standard queries, eliminating the need to review footage frame-by-frame. Database queries can include filters, ranges and classes:

  • Time and date range
  • Region of interest
  • Object classification type
  • Event type
  • Characterization
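
A forensic search of this kind can be sketched with the standard-library sqlite3 module. The schema and field names below are assumptions for illustration, not the product's actual database layout:

```python
import sqlite3

# In-memory analytics database with an illustrative event schema.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE events (
    camera_id TEXT, timecode REAL, object_class TEXT,
    event_type TEXT, region TEXT)""")
db.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?, ?)",
    [("cam-01", 1000.0, "person", "intrusion", "zone-a"),
     ("cam-01", 1450.0, "vehicle", "counterflow", "zone-b"),
     ("cam-02", 2100.0, "person", "loitering", "zone-a")])

# Filter by time range, object class and region of interest --
# no frame-by-frame review of footage required.
rows = db.execute(
    """SELECT camera_id, timecode, event_type FROM events
       WHERE timecode BETWEEN ? AND ?
         AND object_class = ? AND region = ?
       ORDER BY timecode""",
    (900.0, 2200.0, "person", "zone-a")).fetchall()
```

Each result row carries the camera source and video time-code reference, so matching footage can be pulled up directly for playback.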

Vision Applications

  • Intrusion Detection
  • Zone Incursion / Boundary Crossing
  • Direction of Travel and Counterflow
  • Crowd and Occupancy Detection
  • Abandoned Package Detection
  • Loitering and Motionless Behavior Detection
  • Removed Object
  • Semantic Segmentation / Characterization


    Vision Algorithms and Primitives
  • Background subtraction (non-parametric)
  • Spatiotemporal blob or object tracking with Euclidean distance
  • Feature extraction using co-occurrence matrices, Motion History Intensity (MHI), Optical Flow, Bag of Words (BoW), tracklets
  • Hardware acceleration using NEON and FPU optimizations

    Inference and Detection
  • Lightweight deep neural network, pre-trained for general-purpose detection
  • Quantized single-shot detector suitable for low-power applications running on general-purpose CPUs
  • Pre-trained classes for detecting people, vehicles, cyclists, motorcycles, luggage, backpacks, bags
  • Specialized networks for specific detection tasks such as segmentation and re-identification
  • Hardware acceleration using NEON and FPU optimizations

    Secure Web Interface
  • Real-time event notifications from detection applications
  • Live video mode
  • Playback video mode
  • Video player with scrubbing, play and pause controls
  • Event notification timeline
  • Timeline event clustering
  • Workflows to receive notifications and review incidents
  • Analytics database search workflow

    Video Source Acquisition
  • V4L – Video for Linux (local camera)
  • IP cameras
  • MJPEG, H.264
  • Resolutions up to FHD (1080p), 30 fps

    Video Pre-processing
  • Image privacy masking
  • Blur and solid masking types

    Video Post-processing
  • Image overlays, e.g. location, branding, date/time
  • Configurable presentation including bounding boxes
  • Object-based dynamic masking

    Video Output
  • Event video clips
  • WebRTC live video streaming
  • RTSP video playback with scrubbing controls
  • SIP/RTP video communications
  • Local or remote video storage
  • MJPEG, H.264, VP8, VP9 encode and decode (supporting hardware acceleration)

    Video Storage
  • Configurable encoding format
  • Video output to NVR
  • Video event clips
  • Periodic storage syncing
  • Loop recording with event protection

    Vision Analytics Output
  • Analytics database
  • Date and time ranges
  • Region of interest
  • Object classes
  • Event and incident types
  • Faces and other metadata tags

    Connectivity
  • I/O and/or UART for local connectivity
  • Secure system integration and cloud connectivity via network
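
The Euclidean-distance tracking primitive listed above can be sketched as a greedy nearest-neighbour association between existing tracks and new detections. This is a simplified illustration, not the middleware's actual tracker:

```python
import math

def track(prev_objects, detections, max_dist=50.0):
    """Associate current-frame detections with existing tracks using
    Euclidean distance between centroids (greedy nearest-neighbour).

    prev_objects: dict of track_id -> (x, y) last known centroid
    detections:   list of (x, y) centroids from the current frame
    Returns an updated dict of track_id -> centroid; detections that
    match no existing track start new tracks.
    """
    next_id = max(prev_objects, default=-1) + 1
    updated = {}
    unmatched = list(detections)
    for track_id, (px, py) in prev_objects.items():
        if not unmatched:
            break
        # Pick the detection closest to this track's last position.
        best = min(unmatched, key=lambda d: math.hypot(d[0] - px, d[1] - py))
        if math.hypot(best[0] - px, best[1] - py) <= max_dist:
            updated[track_id] = best
            unmatched.remove(best)
    # Remaining detections become new tracks.
    for det in unmatched:
        updated[next_id] = det
        next_id += 1
    return updated

# Track 0 moves slightly; a second object appears and gets a new ID.
tracks = track({0: (100.0, 100.0)}, [(104.0, 98.0), (400.0, 250.0)])
```

A production tracker would add motion prediction and appearance features (embeddings) for re-identification, but the distance-gated association step is the same idea.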


Video encoding, algorithmic and CNN processing availability varies across CPUs based on overall performance and available hardware acceleration.

Processors and Architectures
  • Arm®v8 Cortex®-A53 (64-bit)
Operating System Support
  • Linux 3.x, 4.x
  • GCC
Refer to Voice and Media Middleware for additional information on video options and Mbarx secure IoT for connectivity and integration details.

    Complete Vision System Solution

    Arcturus offers simple engagement packages to help get development moving quickly. This can include specialized algorithm processing, custom analytics applications or tailored workflows. Systems can take advantage of a number of preexisting blocks including video storage, playback, live stream, voice and video communications. A responsive web interface provides streamlined workflows, real-time notifications, live and playback video modes, event searches, filters, queries and a high-level timeline of events.

    Event Detection and Notification

    Analytics applications report events via web-based incident notifications or distribute them via a secure protocol. Live video streams use WebRTC, eliminating the need for plugins in most modern browsers. Streamlined workflows help alert analysts to situations quickly and respond effectively.

    [Figure: Vision System Home Screen]
    Streamlined Incident Workflows

    Incident notifications allow analysts to review the sequence of video that triggered an event from a timeline summary. Video playback supports integrated RTSP-based player controls with play/pause and scrubbing. Detection overlays, location and timestamp information are generated dynamically and applied as an overlay during playback, maintaining the forensic integrity of the stored video. Video sequences can be exported from the system, packaged as clips or as time-code references to external storage.

    [Figure: Vision Intrusion Detection]