Difference between revisions of "Smart 3D Cameras"

From MIT Technology Roadmapping
 
(85 intermediate revisions by the same user not shown)
Line 1: Line 1:
'''Work in Progress'''
=Smart 3D Cameras for Obstacle Detection, Avoidance and Identification=
=Technology Roadmap Sections and Deliverables=
The Smart 3D Camera roadmap is a level 3 roadmap as it enables the level 2 roadmaps for obstacle detection, avoidance and autonomous navigation of robots, drones and cars.
* '''3S3DCAM - Smart 3D Camera'''


=Technology Roadmap Sections and Deliverables=
[[File:Screen Shot 2019-12-05 at 12.02.21 PM.png|600px]]
The Smart 3D Camera roadmap is a level 2 roadmap as it enables the level 1 roadmaps for autonomous navigation of robots, drones and cars.  
* '''2S3DC - Smart 3D Camera'''


==Roadmap Overview==
==Roadmap Overview==
The robotics and autonomous car industries currently lack an accurate, fast, cheap and reliable sensor that can be used for obstacle detection, for pose estimation and as a safety function. A stereo-vision pipeline that can detect, map and avoid obstacles can consume up to 80% of the robot's power budget (in the case of industrial mobile robots). This roadmap explores the feasibility of augmenting stereo cameras to create a safety-certifiable 3D sensor.
[[File:Robot camera stereo.png|600px]]
Smart 3D Cameras use a pair of identical optical imaging sensors and IR projectors (in certain use-cases) to capture stereo images of the environment. These images are then processed to calculate the disparity and then extract depth information for all pixels. In addition to the depth map, the scene is segmented to extract objects of interest and to identify them using trained neural nets. This roadmap focuses on passive stereo vision cameras, which do not use structured light, because of their benefits for long-range detection.


Smart 3D Cameras use a pair of identical optical imaging sensors and IR projectors (in certain use-cases) to capture stereo images of the environment. These images are then processed to calculate the disparity and then extract depth information for all pixels. In addition to the depth map, the scene is segmented to extract objects of interest and to identify them using trained neural nets. Note that this roadmap will focus on passive stereo vision cameras that DO NOT use structured light.
There are three major areas of interest for augmenting a 3D camera - the sensor, the compute and the algorithms. These are highlighted in red on the system decomposition below.
[[File:Screen Shot 2019-12-05 at 12.02.32 PM.png|800px]]


[[File:Smart3DCamera.png|800px]]


[[File:Smart 3D Camera decomposition.jpg|800px]]
Stereo images are currently processed with a mix of on-board and cloud computing: the depth map and obstacle segmentation are computed on board, while object identification is farmed out to a cloud service because those algorithms tend to be very computationally intensive. Significant progress has been made in reducing processor power consumption, which enables compute-intensive applications on embedded platforms. The Smart 3D Camera roadmap will explore onboard processing of all algorithms, given the rapid progress in processing efficiency (Giga Operations per Second per Watt, the GOPS/W FOM) for processors.
 
[[File:Screen Shot 2019-12-05 .png|800px]]


==Design Structure Matrix (DSM) Allocation==
==Design Structure Matrix (DSM) Allocation==
[[File:Smart3DCam DSM.png|800px]]


The 2S3DCAM roadmap is part of the larger company effort to develop an autonomous navigation stack as it enables 1ANAV.  
[[File:Screen Shot 2019-12-05 at 12.05.50 PM.png|600px]]


The following tree can be discerned:
The 3S3DCAM roadmap is part of the larger company effort to develop an autonomous navigation stack as it enables 2ANAV.
1ANAV
* 2S3DCAM
** 3OPTFIL


==Roadmap Model using OPM==
==Roadmap Model using OPM==
[[File:Smart 3D Camera OPM.png|800px]]
[[File:3S3DCAM OPM Model.png|800px]]


==Figures of Merit==
==Figures of Merit==
Line 31: Line 35:
! Figure of Merit !! Units !! Description
! Figure of Merit !! Units !! Description
|-
|-
| Million Disparity Estimations Per Second (MDE/s) (10^6) || - || Comparison metric defined as:
| '''Power Consumed per Depth Pixel, P_dpx''' || '''W/px''' || The total power consumed by the sensing and processing pipeline divided by the number of depth pixels
MDE/s = Image resolution * disparities * sensing speed
|-
|-
| Power Consumption || Watts (W)||Power consumed by the entire stereo camera and image processing pipeline to produce a depth map  
| '''Energy Consumed per Disparity Estimation per Depth Pixel, E_dpx''' || '''J/pxHz''' || The total energy consumed by the sensing and processing pipeline divided by the Million Disparity Estimations per Second
|-
| Million Disparity Estimations Per Second (MDE/s) (10^6) || pxHz || Comparison metric defined as: MDE/s = Image resolution * disparities * frame rate
|-
| '''Reliability''' || - ||Number of failures as a percentage of usage
|-
| '''Power Consumption''' || Watts (W)||Power consumed by the entire stereo camera and image processing pipeline to produce a depth map  
|-
|-
| Image resolution || Pixel (px) ||Number of pixels in the captured image
| Image resolution || Pixel (px) ||Number of pixels in the captured image
Line 42: Line 51:
| Accuracy (m) || m || The measuring confidence in the depth data point
| Accuracy (m) || m || The measuring confidence in the depth data point
|-
|-
| Sensing Speed (fps or Hz) || fps or Hz || The scanning frame rate of the entire system
| Frame rate (fps or Hz) || fps or Hz || The scanning frame rate of the entire system
|-
|-
| Depth Pixels (px) || px || The number of data points in the generated depth map
| Depth Pixels (px) || px || The number of data points in the generated depth map
|-
|-
| Cost ($) || $ || The commercial price for a customer, at volume
| '''Cost ($)''' || $ || The commercial price for a customer, at volume
|-
|-
| Energy Consumed per Depth Pixel, E_dpx || W/px || The total power consumed by the sensing and processing pipeline divided by the number of depth pixels
|}
 
The fundamental principle for stereo vision is computing distance from image disparity. This can be defined by the following equation where z is the depth, b is the baseline, F is the focal length and d is the disparity. All variables are in meters.
 
'''z = bF/d'''
 
The Energy Consumed per Disparity Estimation per Depth Pixel, E_dpx, is the total energy cost for acquiring and processing the image divided by the product of the image resolution (n_px), number of disparities (n_d) and frame rate (f).


|}
[[File:EnergyEquation.png|200px]]


Besides defining what the FOMs are, this section of the roadmap should also contain the FOM trends over time dFOM/dt as well as some of the key governing equations that underpin the technology. These governing equations can be derived from physics (or chemistry, biology ..) or they can be empirically derived from a multivariate regression model. The table below shows an example of a key governing equation governing (solar-) electric aircraft.
The Power Consumed per Depth Pixel, P_dpx, is the total power consumption of the processing and object detection/identification pipeline divided by the number of depth pixels.


[[File:Section 4_2.JPG]]
'''P_dpx = P / dpx''' (units: W/px)


==Alignment with Company Strategic Drivers==
==Alignment with Company Strategic Drivers==
The table below shows an example of potential strategic drivers and alignment of the 2SEA technology roadmap with it.
{| class="wikitable"
|-
! # !! Strategic Driver !! Alignment and Targets
|-
| 1 || To develop a compact, high performance and low-power smart 3D camera that can detect objects in both indoor and outdoor environments || The 3S3DCAM roadmap will target the development of a passive stereo camera with onboard computing that has a sensing range of >20m and a frame rate of >30fps at a power consumption below 1uW/px in a 15cm x 5cm x 5cm footprint.  <span style="background:#00FF00"> '''ALIGNED'''</span>
|-
| 2 || To enable autonomous classification and identification of relevant objects in the scene || The 3S3DCAM roadmap will enable AI neural nets to run onboard the camera to perform image classification and recognition. <span style="background:#00FF00"> '''ALIGNED BUT LOWER PRIORITY'''</span>


[[File:Section 5.JPG]]
|}


The list of drivers shows that the company views HAPS as a potential new business and wants to develop it as a commercially viable (for profit) business (1). In order to do so, the technology roadmap performs some analysis - using the governing equations in the previous section - and formulates a set of FOM targets that state that such a UAV needs to achieve an endurance of 500 days (as opposed to the world record 26 days that was demonstrated in 2018) and should be able to carry a payload of 10 kg. The roadmap confirms that it is aligned with this driver. This means that the analysis, technology targets, and R&D projects contained in the roadmap (and hopefully funded by the R&D budget) support the strategic ambition stated by driver 1. The second driver, however, which is to use the HAPS program as a platform for developing an autonomy stack for both UAVs and satellites, is not currently aligned with the roadmap.
==Positioning of Company vs. Competition==
[[File:Competition Comparison 3d Camera.png|600px]]


==Positioning of Company vs. Competition==
By attaining those specifications, our Smart 3D Camera gets much closer to the utopia point.  
The figure below shows a summary of other electric and solar-electric aircraft from public data.
 
[[File:Screen Shot 2019-12-05 at 12.12.39 PM.png|800px]]


[[File:Section 6.JPG]]
==Technical Model==
The most important FOM is the Energy Consumed per Disparity Estimation per Depth Pixel, E_dpx, which is the total energy cost for acquiring and processing the image divided by the product of the image resolution (n_px), number of disparities (n_d) and frame rate (f).


The aerobatic aircraft Extra 330LE by Siemens currently has the world record for the most powerful flight certified electric motor (260kW). The Pipistrel Alpha Electro is a small electric training aircraft which is not solar powered, but is in serial production. The Zephyr 7 is the previous version of Zephyr which established the prior endurance world record for solar-electric aircraft (14 days) in 2010. The Solar Impulse 2 was a single-piloted solar-powered aircraft that circumnavigated the globe in 2015-2016 in 17 stages, the longest being the one from Japan to Hawaii (118 hours).  
[[File:EnergyEquation.png|200px]]


SolarEagle and Solara 50 were both very ambitious projects that aimed to launch solar-electric aircraft with very aggressive targets (endurance up to 5 years) and payloads up to 450 kg. Both of these projects were canceled prematurely. Why is that?
Since the image resolution and number of disparities are constants for a comparison, the relationship can be described as:


[[File:Section 6_2.JPG]]
[[File:EnergyDiffe.png|200px]]


The Pareto Front (see Chapter 5, Figure 5-20 for a definition) shown in black in the lower left corner of the graph shows the best tradeoff between endurance and payload for actually achieved electric flights by 2017. The Airbus Zephyr, Solar Impulse 2 and Pipistrel Alpha Electro all have flight records that anchor their position on this FOM chart. It is interesting to note that Solar Impulse 2 overheated its battery pack during its longest leg in 2015-2016 and therefore pushed the limits of battery technology available at that time.  We can now see that both Solar Eagle in the upper right and Solara 50 were chasing FOM targets that were unachievable with the technology available at that time. The progression of the Pareto front shown in red corresponds to what might be a realistic Pareto Front progression by 2020. Airbus Zephyr Next Generation (NG) has already shown with its world record (624 hours endurance) that the upper left target (low payload mass - about 5-10 kg and high endurance of 600+ hours) is feasible. There are currently no plans for a Solar Impulse 3,  which could be a non-stop solar-electric circumnavigation with one pilot (and an autonomous co-pilot) which would require a non-stop flight of about 450 hours. A next generation E-Fan aircraft with an endurance of about 2.5 hours (all electric) also seems within reach for 2020. Then in green we set a potentially more ambitious target Pareto Front for 2030. This is the ambition of the 2SEA technology roadmap as expressed by strategic driver 1. We see that in the upper left the Solara 50 project which was started by Titan Aerospace, then acquired by Google, then cancelled, and which ran from about 2013-2017 had the right targets for about a 2030 Entry-into-Service (EIS), not for 2020 or sooner. The target set by Solar Eagle was even more utopian and may not be achievable before 2050 according to the 2SEA roadmap.


==Technical Model==
The parameters that affect frame rate are the image resolution, the number of disparities and the processor/image sensor technology. The curves below were generated empirically from the data in Andreopoulos et al., which was analyzed for this assignment. The paper is listed in the publications section.
In order to assess the feasibility of technical (and financial) targets at the level of the 2SEA roadmap it is necessary to develop a technical model. The purpose of such a model is to explore the design tradespace and establish what are the active constraints in the system. The first step can be to establish a morphological matrix that shows the main technology selection alternatives that exist at the first level of decomposition, see the figure below.


[[File:Section 7_.JPG]]
[[File:SpeedPower.png|800px]]


It is interesting to note that the architecture and technology selections for the three aircraft (Zephyr, Solar Impulse 2 and E-Fan 2.0) are quite different. While Zephyr uses lithium-sulfur batteries, the other two use the more conventional lithium-ion batteries. Solar Impulse uses the less efficient (but more affordable) single cell silicon-based PV, while Zephyr uses specially manufactured thin film multi-junction cells and so forth.
The normalized model with three controllable parameters is shown in the tornado chart below. The choice of imaging sensor and processor has the largest impact on power consumption, followed by the image resolution and then the number of disparities.
[[File:Tornado.png|600px]]


The technical model centers on the E-range and E-endurance equations and compares different aircraft sizing (e.g. wing span, engine power, battery capacity) taking into account aerodynamics, weights and balance, the performance of the aircraft and also its manufacturing cost. It is important to use Multidisciplinary Design Optimization (MDO) when selecting and sizing technologies in order to get the most out of them and compare them fairly (see below).
This informs the variable selection in the morphological matrix below. The cells highlighted in green are favorable choices and the final choice is boxed in purple. The favorable choices were determined by comparing the current performance available from the technology choice and how it compares against the targets defined for 3S3DCAM. The final choices in purple were chosen because they provide a competitive edge. For example, moving to the neuromorphic event system for the vision processor instead of FPGA enables the company to be on the leading S-curve for low-latency vision processing. Similarly using the neuromorphic imaging sensor and TrueNorth AI processor (neuromorphic) enables a step reduction in power consumption. The remaining parameters such as resolution, shutter, baseline, lens FOV were chosen to optimize for the camera footprint and performance.  


[[File:Section 7_2.JPG]]
[[File:MorphMatrix.png|800px]]


==Financial Model==
==Financial Model==
The figure below contains a sample NPV analysis underlying the 2SEA roadmap. It shows the non-recurring cost (NRC) of the product development project (PDP), which includes the R&D expenditures as negative numbers. A ramp up-period of  4 years is planned with a flat revenue plateau (of 400 million per year) and a total program duration of 24 years.


[[File:Section 8.JPG]]
For the company investing in the development of Smart 3D cameras, two types of financial models matter: 1) the ROI model for a customer, which helps inform the BOM cost target and pricing limits, and 2) the NPV model for prioritizing R&D projects. This analysis is limited to the second model because the customer CONOPS is currently not well understood.
To prioritize the R&D projects, it is important to work from the strategic drivers. To achieve both strategic drivers, innovations are needed for high performance and low power in the imaging sensor, the compute architecture and the classification algorithms.
 
The company has $11 million to spend on R&D programs. The NPV for each of the programs is calculated assuming a discount rate of 7% and a product life of ~10 years. Since the exact monetary benefit of each of these innovations is hard to characterize, the impact on sales and cash flow has been roughly estimated.
Based on the outcome of the financial analysis, the following innovation programs will be funded:
 
[[File:R d table.png|600px]]
 
[[File:Camera Cashflow.png|600px]]
 
[[File:Dcf camera 1.png|600px]]


==List of R&T Projects and Prototypes==
==List of R&T Projects and Prototypes==
In order to select and prioritize R&D (R&T) projects we recommend using the technical and financial models developed as part of the roadmap to rank-order projects based on an objective set of criteria and analysis. The figure below illustrates how technical models are used to make technology project selections, e.g based on the previously stated 2030 target performance and Figure 8-17 (see the Chapter 8 of the text) shows the outcome if none of the three potential projects are selected.
Based on the strategic drivers, the following programs will be funded:


[[File:Section 9.JPG]]
[[File:Screen Shot 2019-12-05 at 11.50.34 AM.png|600px]]
 
 
1. '''Project Morphy (TRL5 -> TRL7)'''
Morphy is an ambitious R&D program to accelerate the technology maturity of neuromorphic sensors and computers for use in production grade cameras. This project will license the two patents listed below and validate concepts from paper #2.
 
2. '''Project Edge (TRL6 -> TRL9)'''
Edge will develop methods to leverage FPGAs and ASICs to perform pixel computation closer to the sensor in lieu of power-hungry GPUs. This project will reproduce and improve upon the results from paper #1.
 
3. '''Project Nimbus (TRL6 -> TRL9)'''
Nimbus is an ambitious project to simplify ML algorithms and models so that they can be performant on embedded devices. This project will build upon ideas from paper #3.
 
These three projects can be classified as follows:
 
[[File:R d Portfolio.png|600px]]
 
The development and deployment timeline is as follows:
 
[[File:Screen Shot 2019-12-05 at 11.58.17 AM.png|800px]]


==Key Publications, Presentations and Patents==
==Key Publications, Presentations and Patents==
A good technology roadmap should contain a comprehensive list of publications, presentations and key patents as shown in Figure 8-19. This includes literature trends, papers published at key conferences and in the trade literature and trade press.


[[File:Section 10 1.JPG]]
===Patents===
 
 
'''Dawson et al. Neuromorphic Digital Focal Plane Array. US Pat Pending. US20180278868A1'''
This patent claims new techniques for creating imaging sensors that leverage the principle of neuromorphism to embed pixel processing directly onto the sensor. For a Smart 3D camera this presents a disruptive option for two FOMs: reduced power consumption and increased frame rate.
 
[[File:Dawson et al.png|600px]]
'''Bobda et al. Reconfigurable 3D Pixel-Parallel Neuromorphic Architecture for Smart Image Sensor. Pending. US20190325250A1'''
This patent also leverages neuromorphism, but instead of only tracking pixel changes it also embeds processing elements into different regions of the image sensor. With this technology, it will be feasible to embed intelligent processing of shapes and features close to the image capture system. By leveraging processing embedded in the image sensor, an order-of-magnitude improvement in power efficiency and performance can be achieved.
[[File:Bobda et al.png|600px]]
 
===Publications===
'''Michalik et al. Real time smart stereo camera based on FPGA-SoC. 2017. IEEE-RAS'''
This work presents a real-time smart stereo camera system implementation resembling the full stereo processing pipeline in a single FPGA device. The paper introduces a novel memory optimized stereo processing algorithm "Sparse Retina Census Correlation" (SRCC) that embodies a combination of two well established window based stereo matching approaches. The presented smart camera solution has demonstrated real-time stereo processing of 1280×720 pixel depth images with 256 disparities on a Zynq XC7Z030 FPGA device at 60fps. This approach is ~3x faster than the nearest competitor.
 
[[File:Michalik et al .png|600px]]
 
'''Andreopoulos et al. A Low Power, High Throughput, Fully Event-Based Stereo System. 2018 IEEE CVF'''
This paper uses neuromorphic event-based hardware to implement stereo vision. This is the first time that an end-to-end stereo pipeline, from image acquisition and rectification, multi-scale spatiotemporal stereo correspondence and winner-take-all to disparity regularization, is implemented fully on event-based hardware. Using a cluster of TrueNorth neurosynaptic processors, the authors demonstrate the ability to process bilateral event-based inputs streamed live by Dynamic Vision Sensors (DVS) at up to 2,000 disparity maps per second, producing high fidelity disparities which are in turn used to reconstruct, at low power, the depth of events produced from rapidly changing scenes. The system consumes ~200x less power, at 0.058mW/pixel.
 
[[File:Andrepoulos et al.png|600px]]
 
'''Shin et al. An 1.92mW Feature Reuse Engine based on inter-frame similarity for low-power object recognition in video frames. 2014 IEEE'''
This paper proposes a Feature Reuse Engine (FReE) to achieve low-power object recognition in video frames. Unlike previous works, the proposed FReE reuses 58% of the features from the previous frame by exploiting inter-frame similarity. The power consumption of the object recognition processor is reduced by 31% with the proposed FReE, which consumes only 1.92mW in a 130nm CMOS technology. This has potential for reducing power consumption in smart stereo cameras.


==Technology Strategy Statement==
==Technology Strategy Statement==
A technology roadmap should conclude and be summarized by both a written statement that summarizes the technology strategy coming out of the roadmap as well as a graphic that shows the key R&D investments, targets and a vision for this technology (and associated product or service) over time. For the 2SEA roadmap the statement could read as follows:
[[File:Swoosh camera.png|800px]]


'''Our target is to develop a new solar-powered and electrically-driven UAV as a HAPS service platform with an Entry-into-Service date of 2030. To achieve the target of an endurance of 500 days and useful payload of 10 kg we will invest in two R&D projects. The first is a flight demonstrator with a first flight by 2027 to demonstrate a full-year aloft (365 days) at an equatorial latitude with a payload of 10 kg. The second project is an accelerated development of Li-S batteries with our partner XYZ with a target lifetime performance of 500 charge-discharge cycles by 2027. This is an enabling technology to reach our 2030 technical and business targets.'''
Our goal is to be the industry leader in safety-rated 3D sensors for autonomous navigation, especially for industrial robots and drones. '''We intend to get there by 2025 by developing safety-certified smart 3D cameras that cost less than $500. These 3D cameras will leverage cutting-edge image sensors, processors and algorithms that can sense and classify the world at greater than 720p resolution, with under 5ms latency and a low power consumption of 1uW/pixel.'''
To achieve these goals, we will invest $11M in three programs (Morphy, Edge and Nimbus) that have an NPV of $81M over 10 years.

Latest revision as of 13:14, 10 December 2019

Smart 3D Cameras for Obstacle Detection, Avoidance and Identification

Technology Roadmap Sections and Deliverables

The Smart 3D Camera roadmap is a level 3 roadmap as it enables the level 2 roadmaps for obstacle detection, avoidance and autonomous navigation of robots, drones and cars.

  • 3S3DCAM - Smart 3D Camera

Screen Shot 2019-12-05 at 12.02.21 PM.png

Roadmap Overview

The robotics and autonomous car industries currently lack an accurate, fast, cheap and reliable sensor that can be used for obstacle detection, for pose estimation and as a safety function. A stereo-vision pipeline that can detect, map and avoid obstacles can consume up to 80% of the robot's power budget (in the case of industrial mobile robots). This roadmap explores the feasibility of augmenting stereo cameras to create a safety-certifiable 3D sensor.

Robot camera stereo.png

Smart 3D Cameras use a pair of identical optical imaging sensors and IR projectors (in certain use-cases) to capture stereo images of the environment. These images are then processed to calculate the disparity and then extract depth information for all pixels. In addition to the depth map, the scene is segmented to extract objects of interest and to identify them using trained neural nets. This roadmap focuses on passive stereo vision cameras, which do not use structured light, because of their benefits for long-range detection.

There are three major areas of interest for augmenting a 3D camera - the sensor, the compute and the algorithms. These are highlighted in red on the system decomposition below.

Screen Shot 2019-12-05 at 12.02.32 PM.png


Stereo images are currently processed with a mix of on-board and cloud computing: the depth map and obstacle segmentation are computed on board, while object identification is farmed out to a cloud service because those algorithms tend to be very computationally intensive. Significant progress has been made in reducing processor power consumption, which enables compute-intensive applications on embedded platforms. The Smart 3D Camera roadmap will explore onboard processing of all algorithms, given the rapid progress in processing efficiency (Giga Operations per Second per Watt, the GOPS/W FOM) for processors.

Screen Shot 2019-12-05 .png

Design Structure Matrix (DSM) Allocation

Screen Shot 2019-12-05 at 12.05.50 PM.png

The 3S3DCAM roadmap is part of the larger company effort to develop an autonomous navigation stack as it enables 2ANAV.

Roadmap Model using OPM

3S3DCAM OPM Model.png

Figures of Merit

{| class="wikitable"
|-
! Figure of Merit !! Units !! Description
|-
| Power Consumed per Depth Pixel, P_dpx || W/px || The total power consumed by the sensing and processing pipeline divided by the number of depth pixels
|-
| Energy Consumed per Disparity Estimation per Depth Pixel, E_dpx || J/pxHz || The total energy consumed by the sensing and processing pipeline divided by the Million Disparity Estimations per Second
|-
| Million Disparity Estimations Per Second (MDE/s) (10^6) || pxHz || Comparison metric defined as: MDE/s = Image resolution * disparities * frame rate
|-
| Reliability || - || Number of failures as a percentage of usage
|-
| Power Consumption || Watts (W) || Power consumed by the entire stereo camera and image processing pipeline to produce a depth map
|-
| Image resolution || Pixel (px) || Number of pixels in the captured image
|-
| Range (m) || m || The maximum sensing distance
|-
| Accuracy (m) || m || The measuring confidence in the depth data point
|-
| Frame rate (fps or Hz) || fps or Hz || The scanning frame rate of the entire system
|-
| Depth Pixels (px) || px || The number of data points in the generated depth map
|-
| Cost ($) || $ || The commercial price for a customer, at volume
|}
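As a worked instance of the MDE/s definition in the table, the short Python sketch below evaluates it for the operating point reported for Michalik et al. in the publications section (1280×720, 256 disparities, 60 fps). The function name `mde_per_second` is only illustrative and is not part of the roadmap.

```python
def mde_per_second(width_px: int, height_px: int, disparities: int, frame_rate_hz: float) -> float:
    """Million disparity estimations per second: resolution * disparities * frame rate / 10^6."""
    return width_px * height_px * disparities * frame_rate_hz / 1e6


# Operating point quoted for Michalik et al. in the publications section
michalik = mde_per_second(1280, 720, 256, 60)
print(f"{michalik:.3f} MDE/s")  # prints 14155.776 MDE/s
```

Because the metric is a plain product, halving any one factor (for example the number of disparities) halves the MDE/s figure.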

The fundamental principle for stereo vision is computing distance from image disparity. This can be defined by the following equation where z is the depth, b is the baseline, F is the focal length and d is the disparity. All variables are in meters.

z = bF/d
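The equation can be tried out with a few lines of Python. This is a minimal sketch: the 12 cm baseline, 4 mm focal length and 3 um pixel pitch are hypothetical values chosen for illustration, not specifications from this roadmap, and the disparity is converted from pixels to meters so that all variables share one unit.

```python
def depth_from_disparity(baseline_m: float, focal_length_m: float, disparity_m: float) -> float:
    """Depth z = b * F / d, with every quantity expressed in meters."""
    if disparity_m <= 0:
        raise ValueError("disparity must be positive; zero disparity is a point at infinity")
    return baseline_m * focal_length_m / disparity_m


# Hypothetical camera parameters (assumptions for illustration only)
baseline_m = 0.12       # 12 cm between the two sensors
focal_length_m = 0.004  # 4 mm lens
pixel_pitch_m = 3e-6    # 3 um pixels

# A disparity of 8 pixels measured on the sensor, converted to meters
z = depth_from_disparity(baseline_m, focal_length_m, 8 * pixel_pitch_m)
print(f"depth = {z:.1f} m")  # about 20 m
```

Since depth is inversely proportional to disparity, distant objects produce disparities of only a few pixels, which is why baseline, focal length and resolution all matter for long-range sensing.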

The Energy Consumed per Disparity Estimation per Depth Pixel, E_dpx, is the total energy cost for acquiring and processing the image divided by the product of the image resolution (n_px), number of disparities (n_d) and frame rate (f).

EnergyEquation.png

The Power Consumed per Depth Pixel, P_dpx, is the total power consumption of the processing and object detection/identification pipeline divided by the number of depth pixels.

P_dpx = P / dpx (units: W/px)
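A small Python sketch of this figure of merit follows. It assumes a depth map with one depth pixel per image pixel at 720p, and the 5 W pipeline is a hypothetical number used only to show the arithmetic; the 1 uW/px value is the target quoted elsewhere in this roadmap.

```python
def power_per_depth_pixel(power_w: float, depth_pixels: int) -> float:
    """P_dpx in W/px: total pipeline power divided by the number of depth pixels."""
    if depth_pixels <= 0:
        raise ValueError("depth_pixels must be positive")
    return power_w / depth_pixels


def power_budget_w(target_w_per_px: float, depth_pixels: int) -> float:
    """Total power allowed by a P_dpx target for a given depth-map size."""
    return target_w_per_px * depth_pixels


depth_pixels_720p = 1280 * 720  # assumes one depth pixel per image pixel

# Hypothetical 5 W sensing and processing pipeline
p_dpx = power_per_depth_pixel(5.0, depth_pixels_720p)
print(f"P_dpx = {p_dpx * 1e6:.2f} uW/px")  # about 5.43 uW/px

# Power budget implied by the 1 uW/px target at 720p
budget = power_budget_w(1e-6, depth_pixels_720p)
print(f"budget = {budget:.4f} W")  # about 0.92 W
```

Read this way, the 1 uW/px target corresponds to a whole-pipeline budget of just under 1 W for a 720p depth map.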

Alignment with Company Strategic Drivers

{| class="wikitable"
|-
! # !! Strategic Driver !! Alignment and Targets
|-
| 1 || To develop a compact, high performance and low-power smart 3D camera that can detect objects in both indoor and outdoor environments || The 3S3DCAM roadmap will target the development of a passive stereo camera with onboard computing that has a sensing range of >20m and a frame rate of >30fps at a power consumption below 1uW/px in a 15cm x 5cm x 5cm footprint. '''ALIGNED'''
|-
| 2 || To enable autonomous classification and identification of relevant objects in the scene || The 3S3DCAM roadmap will enable AI neural nets to run onboard the camera to perform image classification and recognition. '''ALIGNED BUT LOWER PRIORITY'''
|}

Positioning of Company vs. Competition

Competition Comparison 3d Camera.png

By attaining those specifications, our Smart 3D Camera gets much closer to the utopia point.

Screen Shot 2019-12-05 at 12.12.39 PM.png

Technical Model

The most important FOM is the Energy Consumed per Disparity Estimation per Depth Pixel, E_dpx, which is the total energy cost for acquiring and processing the image divided by the product of the image resolution (n_px), number of disparities (n_d) and frame rate (f).

EnergyEquation.png

Since the image resolution and number of disparities are constants for a comparison, the relationship can be described as:

EnergyDiffe.png
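In symbols, the relationship can be sketched as follows. This is reconstructed from the prose rather than read from the figures, and it assumes that the "total energy cost" in the numerator is the pipeline power P (the symbol P and the subscripts 1 and 2 for the two systems being compared are introduced here for illustration).

```latex
E_{dpx} = \frac{P}{n_{px}\, n_d\, f}
\qquad\Longrightarrow\qquad
\frac{E_{dpx,1}}{E_{dpx,2}} = \frac{P_1 / f_1}{P_2 / f_2}
\quad \text{for fixed } n_{px} \text{ and } n_d
```

Under that assumption, comparing two designs at the same resolution and disparity count reduces to comparing their power-to-frame-rate ratios.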


The parameters that affect frame rate are the image resolution, the number of disparities and the processor/image sensor technology. The curves below were generated empirically from the data in Andreopoulos et al., which was analyzed for this assignment. The paper is listed in the publications section.

SpeedPower.png

The normalized model with three controllable parameters is shown in the tornado chart below. The choice of imaging sensor and processor has the largest impact on power consumption, followed by the image resolution and then the number of disparities.

Tornado.png

This informs the variable selection in the morphological matrix below. The cells highlighted in green are favorable choices and the final choice is boxed in purple. The favorable choices were determined by comparing the current performance available from the technology choice and how it compares against the targets defined for 3S3DCAM. The final choices in purple were chosen because they provide a competitive edge. For example, moving to the neuromorphic event system for the vision processor instead of FPGA enables the company to be on the leading S-curve for low-latency vision processing. Similarly using the neuromorphic imaging sensor and TrueNorth AI processor (neuromorphic) enables a step reduction in power consumption. The remaining parameters such as resolution, shutter, baseline, lens FOV were chosen to optimize for the camera footprint and performance.

MorphMatrix.png

Financial Model

For the company investing in the development of Smart 3D cameras, two types of financial models matter: 1) the ROI model for a customer, which helps inform the BOM cost target and pricing limits, and 2) the NPV model for prioritizing R&D projects. This analysis is limited to the second model because the customer CONOPS is currently not well understood. To prioritize the R&D projects, it is important to work from the strategic drivers. To achieve both strategic drivers, innovations are needed for high performance and low power in the imaging sensor, the compute architecture and the classification algorithms.

The company has $11 million to spend on R&D programs. The NPV for each of the programs is calculated assuming a discount rate of 7% and a product life of ~10 years. Since the exact monetary benefit of each of these innovations is hard to characterize, I have roughly estimated the impact on sales and cash flow. Based on the outcome of the financial analysis, the following innovation programs will be funded:

R d table.png

Camera Cashflow.png

Dcf camera 1.png
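The NPV calculation described above can be sketched as follows, using the stated 7% discount rate and 10-year product life. The cash flows in the example are hypothetical placeholders chosen only to show the mechanics; they are not the figures from the tables above, and the function name is an assumption.

```python
def npv(rate, cash_flows):
    """Net present value of a series of yearly cash flows.

    cash_flows[0] occurs now (year 0, undiscounted); cash_flows[t] occurs
    at the end of year t and is discounted by (1 + rate) ** t.
    """
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))


if __name__ == "__main__":
    DISCOUNT_RATE = 0.07   # from the text
    PRODUCT_LIFE = 10      # years, from the text

    # Hypothetical program (in $M): 4.0 invested up front, then a net
    # inflow of 1.0 per year over the product life.
    flows = [-4.0] + [1.0] * PRODUCT_LIFE
    print(f"NPV = ${npv(DISCOUNT_RATE, flows):.2f}M")
```

Ranking candidate programs by this value, subject to the total R&D budget, is one simple way to prioritize them. Because the inflows are rough estimates, it is worth rerunning the calculation with pessimistic and optimistic cash flows to see whether the ranking changes.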

List of R&T Projects and Prototypes

Based on the strategic drivers, the following programs will be funded:

Screen Shot 2019-12-05 at 11.50.34 AM.png


1. Project Morphy (TRL5 -> TRL7) Morphy is an ambitious R&D program to accelerate the technology maturity of neuromorphic sensors and computers for use in production-grade cameras. This project will license the two patents listed below and validate concepts from paper #2.

2. Project Edge (TRL6 -> TRL9) Edge will develop methods to leverage FPGAs and ASICs to perform pixel computation closer to the sensor in lieu of power-hungry GPUs. This project will reproduce and improve upon the results from paper #1.

3. Project Nimbus (TRL6 -> TRL9) Nimbus is an ambitious project to simplify ML algorithms and models so that they can be performant on embedded devices. This project will build upon ideas from paper #3.

These three projects can be classified as follows:

R d Portfolio.png

The development and deployment timeline is as follows:

Screen Shot 2019-12-05 at 11.58.17 AM.png

Key Publications, Presentations and Patents

Patents

Dawson et al. Neuromorphic Digital Focal Plane Array. US Pat Pending. US20180278868A1 This patent claims new techniques for creating imaging sensors that leverage the principle of neuromorphism to embed pixel processing directly onto the sensor. For a Smart 3D camera this presents a disruptive option for two FOMs: reduced power consumption and increased frame rate.

Dawson et al.png

Bobda et al. Reconfigurable 3D Pixel-Parallel Neuromorphic Architecture for Smart Image Sensor. Pending. US20190325250A1 This patent also leverages neuromorphism, but instead of only tracking pixel changes, it also embeds processing elements into different regions of the image sensor. With this technology, it will be feasible to embed intelligent processing of shapes and features close to the image capture system. By leveraging image-sensor-embedded processing, an order-of-magnitude improvement in power efficiency and performance can be achieved.

Bobda et al.png

Publications

Michalik et al. Real time smart stereo camera based on FPGA-SoC. 2017. IEEE-RAS This work presents a real-time smart stereo camera system that implements the full stereo processing pipeline in a single FPGA device. The paper introduces a novel memory-optimized stereo processing algorithm, "Sparse Retina Census Correlation" (SRCC), that combines two well-established window-based stereo matching approaches. The presented smart camera solution has demonstrated real-time stereo processing of 1280×720 pixel depth images with 256 disparities on a Zynq XC7Z030 FPGA device at 60 fps. This approach is ~3x faster than the nearest competitor.

Michalik et al .png

Andreopoulos et al. A Low Power, High Throughput, Fully Event-Based Stereo System. 2018 IEEE CVF This paper uses neuromorphic event-based hardware to implement stereo vision. This is the first time that an end-to-end stereo pipeline, from image acquisition and rectification through multi-scale spatiotemporal stereo correspondence and winner-take-all to disparity regularization, has been implemented fully on event-based hardware. Using a cluster of TrueNorth neurosynaptic processors, the authors demonstrate the ability to process bilateral event-based inputs streamed live by Dynamic Vision Sensors (DVS) at up to 2,000 disparity maps per second, producing high-fidelity disparities which are in turn used to reconstruct, at low power, the depth of events produced from rapidly changing scenes. The system consumes ~200x less power, at 0.058 mW/pixel.

Andrepoulos et al.png

Shin et al. An 1.92mW Feature Reuse Engine based on inter-frame similarity for low-power object recognition in video frames. 2014 IEEE This paper proposes a Feature Reuse Engine (FReE) to achieve low-power object recognition in video frames. Unlike previous works, the proposed FReE exploits inter-frame similarity to reuse 58% of the features from the previous frame. The power consumption of the object recognition processor is reduced by 31% with the proposed FReE, which consumes only 1.92 mW in a 130 nm CMOS technology. This has the potential to reduce power consumption in smart stereo cameras.

Technology Strategy Statement

Swoosh camera.png

Our goal is to be the industry leader in safety-rated 3D sensors for autonomous navigation, especially for industrial robots and drones. We intend to get there by 2025 by developing safety-certified smart 3D cameras that cost less than $500. These 3D cameras will leverage cutting-edge image sensors, processors and algorithms that can sense and classify the world at greater than 720p resolution with under 5 ms latency and a low power consumption of 1 uW/pixel. To achieve these goals, we will invest $11M in three programs (Morphy, Edge and Nimbus) that have a combined NPV of $81M over 10 years.
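
To put the per-pixel power target in absolute terms, the short calculation below converts it into total power at 720p and compares it with the 0.058 mW/pixel figure cited in the publications section. Scaling that cited figure linearly to 1280×720 is an assumption made only for comparison, since the paper's own sensor resolution and operating conditions may differ; the function and constant names are hypothetical.

```python
PIXELS_720P = 1280 * 720  # 921,600 pixels


def total_power_watts(power_per_pixel_watts, pixels=PIXELS_720P):
    """Total power for a sensor, given a per-pixel power figure in watts."""
    return power_per_pixel_watts * pixels


if __name__ == "__main__":
    target = total_power_watts(1e-6)        # 1 uW/pixel target from the strategy statement
    reported = total_power_watts(0.058e-3)  # 0.058 mW/pixel, scaled linearly (assumption)

    print(f"Target at 720p:   {target:.2f} W")
    print(f"Reported at 720p: {reported:.2f} W")
    print(f"Gap to close:     {reported / target:.0f}x")
```

Under this linear-scaling assumption the target corresponds to just under 1 W for the full 720p pipeline, which indicates how much further improvement the three funded programs would need to deliver beyond the cited result.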