Vision and AI Middleware

Embedded Vision Middleware for Safety and Security

Arcturus Vision Middleware transforms passive cameras into active vision systems by enabling real-time detection, actions and analytics for safety, security and surveillance.

The core of the middleware consists of optimized machine vision algorithms for background subtraction, object tracking, feature extraction and facial recognition, with Convolutional Neural Network (CNN) classification for people, vehicles, bags or other objects. The middleware enables systems to detect, classify, record, report and act on events that occur in live video.

Specialized Detection Applications
The vision processing sub-system continuously analyzes live video and provides the output to a database and higher-level vision analytics applications. These analytics detect the occurrence of events such as:

  • Motion / intrusion
  • Boundary crossing / zone incursion
  • Loitering / motionless behavior
  • Abandoned package
  • Face verification
  • Disappeared object
  • Crowd estimation
  • Crowd flow / counter flow
  • Tailgating
  • Or other custom / specialized detection application
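As an illustration of how one of these detections might work on tracker output, the sketch below checks whether an object's centroid crossed a boundary line between two consecutive frames. The function names, the coordinate convention and the example boundary are hypothetical and chosen only for this sketch; it is not the Arcturus implementation.

```python
from typing import Optional, Tuple

Point = Tuple[float, float]


def side_of_line(p: Point, a: Point, b: Point) -> float:
    """Signed cross product: the sign tells which side of line a->b the point p is on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def segments_intersect(p1: Point, p2: Point, a: Point, b: Point) -> bool:
    """True if the movement segment p1->p2 properly crosses the boundary segment a->b."""
    d1 = side_of_line(p1, a, b)
    d2 = side_of_line(p2, a, b)
    d3 = side_of_line(a, p1, p2)
    d4 = side_of_line(b, p1, p2)
    return d1 * d2 < 0 and d3 * d4 < 0


def crossing_direction(prev: Point, curr: Point, a: Point, b: Point) -> Optional[str]:
    """Return the crossing direction, or None if the boundary was not crossed."""
    if not segments_intersect(prev, curr, a, b):
        return None
    return "pos_to_neg" if side_of_line(prev, a, b) > 0 else "neg_to_pos"


# Hypothetical vertical boundary at x = 5, spanning y = 0..10 (pixel coordinates).
BOUNDARY_A: Point = (5.0, 0.0)
BOUNDARY_B: Point = (5.0, 10.0)

crossed = crossing_direction((2.0, 5.0), (8.0, 5.0), BOUNDARY_A, BOUNDARY_B)
stayed = crossing_direction((2.0, 5.0), (4.0, 5.0), BOUNDARY_A, BOUNDARY_B)
missed = crossing_direction((2.0, 20.0), (8.0, 20.0), BOUNDARY_A, BOUNDARY_B)
```

Reporting the direction as well as the crossing lets the same test drive one-way rules, such as counter-flow detection, under the same assumptions.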

The vision system ties directly into Arcturus Voice and Media Middleware, which provides source acquisition from hardware cameras (V4L) or IP video streams, live stream distribution using WebRTC for web compatibility, video playback using RTSP, and SIP for video and voice distribution over VoIP communication systems. It uses the Mbarx Secure IoT endpoint stack as its connectivity, notification and control architecture.


Vision Middleware Diagram

Reduce Bandwidth

Existing systems rely on the continuous transmission of pixel data from each camera to a central location for processing or monitoring. As deployments of surveillance cameras and their resolutions increase, this places upward pressure on the network, resulting in data bottlenecks, extensibility problems and the requirement for new infrastructure to support capacity expansion. With an edge-based approach, analytics processing occurs closer to the camera source or in the same physical system. Continuous high bit-rate traffic is replaced by small-packet notifications that define important events and provide access to video on-demand. With an architecture that includes vision processing at the edge, it is possible to:

  • Significantly lower bandwidth use, improving network management and resilience
  • Reduce dependency on large, high-availability backbone connections
  • Lower operating costs on pay-per-bit networks, such as LTE
  • Upgrade or expand systems, without replacing existing infrastructure
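The bandwidth argument can be made concrete with some rough arithmetic. The sketch below compares a continuously streamed camera with an event-driven one; every number in it (stream bit rate, notification size, event counts, clip length) is an illustrative assumption rather than a measurement of any Arcturus system.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def continuous_stream_bytes_per_day(bitrate_bps: float) -> float:
    """Bytes sent per day by a camera that streams continuously."""
    return bitrate_bps / 8 * SECONDS_PER_DAY


def event_driven_bytes_per_day(
    events_per_day: int,
    notification_bytes: int,
    clips_viewed: int,
    clip_seconds: float,
    bitrate_bps: float,
) -> float:
    """Bytes sent per day when only notifications and on-demand clips cross the network."""
    notifications = events_per_day * notification_bytes
    on_demand_video = clips_viewed * clip_seconds * bitrate_bps / 8
    return notifications + on_demand_video


# Illustrative assumptions only: a 4 Mbit/s stream, 200 events per day,
# 500-byte notifications, and 20 clips of 30 seconds reviewed on demand.
BITRATE_BPS = 4_000_000
continuous = continuous_stream_bytes_per_day(BITRATE_BPS)
event_driven = event_driven_bytes_per_day(200, 500, 20, 30, BITRATE_BPS)

print(f"continuous:   {continuous / 1e9:.1f} GB/day")
print(f"event-driven: {event_driven / 1e9:.1f} GB/day")
```

Under these assumptions the on-demand clips, not the notifications, dominate the event-driven total, so the saving depends mostly on how much video analysts actually pull.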

Increase Effectiveness

Intelligence at the edge transforms video surveillance from passive observation to active detection by identifying specific events and acting on them. Analysts can be notified of events in real time as they occur, instead of waiting for an observer to report them. This allows analysts to respond quickly, using a streamlined workflow to identify the event, how it transpired and who the actors are, and to formulate an effective response. Since processing is done close to the video source, detection events can trigger local actions such as lights, sirens and locks, augmenting the role of personnel even if connectivity is lost. This means:

  • Improved focus, situational awareness and workflows for analysts
  • Reduced response times to critical events
  • Automated real-time local actions and integration with other equipment
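One way to picture local actions that keep working when connectivity is lost is a handler that fires local actions first and queues the upstream notification until the link returns. The class, method and event field names below are hypothetical and exist only for this sketch; the actual notification path is the Mbarx stack, which is not modeled here.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

Event = Dict[str, str]


class EdgeEventHandler:
    """Runs local actions immediately and stores notifications until they can be sent."""

    def __init__(self, send: Callable[[Event], bool]) -> None:
        self._send = send  # assumed to return True when upstream delivery succeeded
        self._actions: Dict[str, List[Callable[[Event], None]]] = {}
        self._pending: Deque[Event] = deque()

    def on(self, event_type: str, action: Callable[[Event], None]) -> None:
        """Register a local action (for example a siren or lock) for an event type."""
        self._actions.setdefault(event_type, []).append(action)

    def handle(self, event: Event) -> None:
        # Local actions run first and do not depend on the network.
        for action in self._actions.get(event["type"], []):
            action(event)
        self._pending.append(event)
        self.flush()

    def flush(self) -> None:
        """Send queued notifications in order; stop at the first failure."""
        while self._pending:
            if not self._send(self._pending[0]):
                break
            self._pending.popleft()

    @property
    def pending(self) -> int:
        return len(self._pending)


# Usage with a simulated link that starts offline.
link = {"up": False}
delivered: List[Event] = []
siren_log: List[str] = []


def send_upstream(event: Event) -> bool:
    if not link["up"]:
        return False
    delivered.append(event)
    return True


handler = EdgeEventHandler(send_upstream)
handler.on("intrusion", lambda e: siren_log.append(e["camera"]))
handler.handle({"type": "intrusion", "camera": "cam-1"})  # siren fires, notification queued
queued_while_offline = handler.pending

link["up"] = True
handler.flush()  # link restored, queued notification is delivered
```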

Easily Search Archived Video

Vision processing is performed continuously and output is written to a database with camera source and video time-code references. This database forms an ongoing narrative of the live video as it is analyzed. A database approach makes it possible to perform forensic searches using standard queries, eliminating the need to review footage frame-by-frame. Database queries can include filters, ranges and classes:

  • Time and date range
  • Region of interest
  • Object classification type
  • Event type
  • Characterization
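A forensic search of this kind might look like the following, using SQLite purely for illustration. The table name, column names and sample rows are assumptions made up for the sketch; the source does not describe the actual database schema or engine.

```python
import sqlite3
from typing import List, Tuple

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE detections (
           camera_id    TEXT,
           timecode     TEXT,  -- ISO 8601, so string comparison matches time order
           event_type   TEXT,
           object_class TEXT,
           x            REAL,  -- object position in the frame
           y            REAL
       )"""
)

# Made-up sample rows for demonstration only.
conn.executemany(
    "INSERT INTO detections VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("cam-1", "2024-01-01T08:00:00", "loitering", "person", 100, 200),
        ("cam-1", "2024-01-01T09:30:00", "boundary_crossing", "person", 320, 240),
        ("cam-2", "2024-01-01T09:45:00", "boundary_crossing", "vehicle", 300, 250),
        ("cam-1", "2024-01-02T10:00:00", "boundary_crossing", "person", 310, 230),
    ],
)


def search(
    db: sqlite3.Connection,
    start: str,
    end: str,
    event_type: str,
    object_class: str,
    roi: Tuple[float, float, float, float],
) -> List[Tuple[str, str]]:
    """Return (camera_id, timecode) pairs matching time range, event, class and region."""
    x_min, y_min, x_max, y_max = roi
    cursor = db.execute(
        """SELECT camera_id, timecode FROM detections
           WHERE timecode BETWEEN ? AND ?
             AND event_type = ?
             AND object_class = ?
             AND x BETWEEN ? AND ?
             AND y BETWEEN ? AND ?
           ORDER BY timecode""",
        (start, end, event_type, object_class, x_min, x_max, y_min, y_max),
    )
    return cursor.fetchall()


hits = search(
    conn,
    "2024-01-01T00:00:00",
    "2024-01-01T23:59:59",
    "boundary_crossing",
    "person",
    (300, 200, 400, 300),
)
```

Each hit carries the camera source and time-code reference, which is what lets a reviewer jump straight to the matching video instead of scanning footage.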


    Vision Algorithms
  • Background subtraction (using spatiotemporal binary similarity)
  • Blob or object tracking
  • Euclidean distance
  • Feature extraction using co-occurrence matrices, Motion History Intensity (MHI), Bag of Words (BoW), tracklets
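To show how Euclidean distance can support blob or object tracking, the sketch below associates existing tracks with new detections by greedy nearest-centroid matching. This is a generic textbook approach assumed for illustration, not a description of the Arcturus tracker; the function name and the distance threshold are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float]


def associate(
    tracks: Dict[int, Point],
    detections: List[Point],
    max_distance: float,
) -> Dict[int, int]:
    """Greedily match each track to its nearest detection.

    Returns {track_id: detection_index}. Tracks or detections with no
    partner within max_distance are left unmatched.
    """
    # All candidate pairs, closest first.
    pairs = sorted(
        (math.dist(position, detection), track_id, index)
        for track_id, position in tracks.items()
        for index, detection in enumerate(detections)
    )
    matches: Dict[int, int] = {}
    used: Set[int] = set()
    for distance, track_id, index in pairs:
        if distance > max_distance:
            break  # remaining pairs are even farther apart
        if track_id in matches or index in used:
            continue
        matches[track_id] = index
        used.add(index)
    return matches


# Two known tracks and three blob centroids from the next frame.
tracks = {1: (10.0, 10.0), 2: (50.0, 50.0)}
detections = [(52.0, 49.0), (11.0, 12.0), (200.0, 200.0)]
matches = associate(tracks, detections, max_distance=20.0)
```

The unmatched third detection would typically start a new track, and a track that stays unmatched for several frames would be dropped.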
    Inference and Classification
  • Lightweight deep neural network, designed for embedded vision applications
  • Streamlined neural network architecture using depth-wise separable convolutions
  • Pre-trained classes of efficient models for detecting people, vehicles, cyclists, motorcycles, luggage, backpacks, bags
    Secure Web Service
  • Real-time event notifications from detection applications
  • Live video mode
  • Playback video mode
  • Video player with scrubbing, play, pause controls
  • Event notification timeline
  • Timeline event clustering
  • Workflows to receive notifications, review incidents
  • Analytics database search workflow
    Video Acquisition
  • V4L – Video for Linux (local camera)
  • IP cameras
  • MJPEG, H.264
  • Resolutions up to FHD (1080p), 30fps
    Video Pre-processing
  • Image privacy masking
  • Blur and solid masking types
    Video Post-processing
  • Image overlays, e.g. location, branding, date/time
  • Configurable presentation including bounding boxes
  • Object-based dynamic masking
    Video Output
  • Event video clips
  • WebRTC live video (VP8/VP9)
  • RTSP video playback
  • SIP/RTP video communications
  • Local or remote video storage
  • MJPEG, H.264, VP8, VP9 encode, decode
    Video Storage
  • Configurable encoding format
  • Video output to NVR
  • Video event clips
  • Periodic storage syncing
  • Loop recording with event protection
    Vision Analytics Output
  • Analytics database
  • Date and time ranges
  • Region of interest
  • Object classes
  • Event and incident types
  • Faces and other metadata tags
    Connectivity
  • I/O and/or UART for local connectivity
  • Secure system integration and cloud connectivity via network


Video encoding, algorithmic and CNN processing availability varies across CPUs based on overall performance and available hardware acceleration.

Processors and Architectures
  • Arm®v8 Cortex®-A53 (64-bit), Arm Cortex-A9 (32-bit), Power® e500
Operating System Support
  • Linux 3.x, 4.x
  • GCC

Refer to Voice and Media Middleware for additional information on video options and Mbarx secure IoT for connectivity and integration details.

    Complete Vision System Solution

    For specialized vision applications or complete system implementations, Arcturus offers simple engagement packages to help get development moving quickly. These can cover any aspect of algorithm processing, analytics applications, interactive workflow, video storage, playback, live streaming and communications. Systems can take advantage of a responsive HTML5 web interface to provide streamlined workflows, real-time notifications, multiple methods of video interaction, event searches, filters, queries and a high-level timeline of events.

    Event Detection and Notification

    The analytics applications report events via incident notifications that can be presented on a web interface or distributed via a simple, secure protocol for further system integration or higher-level aggregation. Web-based live video streams use WebRTC, eliminating the need for plugins in most modern browsers. The web interface is ideal for developing workflows, alerting analysts to situations and providing the video access they need to respond effectively.

    Vision System Home Screen Capture

    Streamlined Incident Workflows

    When an incident occurs, analysts can use the notification to immediately review the sequence of video that triggered the event and view a timeline summary. Video sequence playback uses an integrated RTSP-based video player with play/pause and scrubbing controls. Detection overlays, location and timestamp information can be generated dynamically during playback, which maintains the forensic integrity of the video in storage. Video sequences can be exported from the system, packaged as clips or as time-code references to external storage.
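A timeline summary of this kind can group closely spaced notifications into a single incident. The sketch below shows one simple way to do that, by starting a new cluster whenever the gap between consecutive events exceeds a threshold. The function name and the threshold value are hypothetical, and the clustering rule is an assumption rather than the method the product uses.

```python
from typing import List


def cluster_events(timestamps: List[float], max_gap: float) -> List[List[float]]:
    """Group event times (in seconds) so that neighbors within max_gap share a cluster."""
    clusters: List[List[float]] = []
    for t in sorted(timestamps):
        if clusters and t - clusters[-1][-1] <= max_gap:
            clusters[-1].append(t)  # close enough to the previous event
        else:
            clusters.append([t])  # gap too large, start a new incident
    return clusters


# Six notifications collapse into three incidents with a 10-second gap threshold.
incidents = cluster_events([0, 5, 8, 60, 62, 300], max_gap=10)
```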

    Vision Intrusion Detection