Autonomy Software C++ 24.5.1
Welcome to the Autonomy Software repository of the Mars Rover Design Team (MRDT) at Missouri University of Science and Technology (Missouri S&T)! This API reference contains the source code and other resources for the development of the autonomy software for our Mars rover. The Autonomy Software project aims to compete in the University Rover Challenge (URC) by demonstrating advanced autonomous capabilities and robust navigation algorithms.
yolomodel::tensorflow::TPUInterpreter Class Reference

This class is designed to enable quick, easy, and robust inferencing of a .tflite YOLO model. More...

#include <YOLOModel.hpp>


Public Member Functions

 TPUInterpreter (std::string szModelPath, PerformanceModes ePowerMode=PerformanceModes::eHigh, unsigned int unMaxBulkInQueueLength=32, bool bUSBAlwaysDFU=false)
 Construct a new TPUInterpreter object.
 
 ~TPUInterpreter ()
 Destroy the TPUInterpreter object.
 
std::vector< std::vector< Detection > > Inference (const cv::Mat &cvInputFrame, const float fMinObjectConfidence=0.85, const float fNMSThreshold=0.6) override
 Given an input image, forward the image through the YOLO model to run inference on the EdgeTPU, then parse and repackage the output tensor data into a vector of easy-to-use Detection structs.
 
- Public Member Functions inherited from TensorflowTPU< std::vector< std::vector< Detection > >, cv::Mat >
 TensorflowTPU (std::string szModelPath, PerformanceModes ePowerMode=PerformanceModes::eHigh, unsigned int unMaxBulkInQueueLength=32, bool bUSBAlwaysDFU=false)
 Construct a new TensorflowTPU object.
 
 ~TensorflowTPU ()
 Destroy the TensorflowTPU object.
 
void CloseHardware ()
 Release all hardware and reset models and interpreters.
 
TfLiteStatus OpenAndLoad (DeviceType eDeviceType=DeviceType::eAuto)
 Attempt to open the model at the given path and load it onto the EdgeTPU device.
 
bool GetDeviceIsOpened () const
 Accessor for the Device Is Opened private member.
 

Private Member Functions

void ParseTensorOutputYOLOv5 (int nOutputIndex, std::vector< int > &vClassIDs, std::vector< float > &vClassConfidences, std::vector< cv::Rect > &vBoundingBoxes, float fMinObjectConfidence, int nOriginalFrameWidth, int nOriginalFrameHeight)
 Given a TFLite output tensor from a YOLOv5 model, parse its output into something more usable. The parsed output will be in the form of three vectors: one for class IDs, one for the prediction confidence for the class ID, and one for cv::Rects storing the bounding box data for the prediction. A prediction will line up between the three vectors. (vClassIDs[0], vClassConfidences[0], and vBoundingBoxes[0] correspond to the same prediction.)
 
void ParseTensorOutputYOLOv8 (int nOutputIndex, std::vector< int > &vClassIDs, std::vector< float > &vClassConfidences, std::vector< cv::Rect > &vBoundingBoxes, float fMinObjectConfidence, int nOriginalFrameWidth, int nOriginalFrameHeight)
 Given a TFLite output tensor from a YOLOv8 model, parse its output into something more usable. The parsed output will be in the form of three vectors: one for class IDs, one for the prediction confidence for the class ID, and one for cv::Rects storing the bounding box data for the prediction. A prediction will line up between the three vectors. (vClassIDs[0], vClassConfidences[0], and vBoundingBoxes[0] correspond to the same prediction.)
 
InputTensorDimensions GetInputShape (const int nTensorIndex=0)
 Get the input shape of the tensor at the given index. Requires the device to have been successfully opened.
 
OutputTensorDimensions GetOutputShape (const int nTensorIndex=0)
 Get the output shape of the tensor at the given index. Requires the device to have been successfully opened.
 

Private Attributes

cv::Mat m_cvFrame
 

Additional Inherited Members

- Public Types inherited from TensorflowTPU< std::vector< std::vector< Detection > >, cv::Mat >
enum  DeviceType
 
enum  PerformanceModes
 
- Static Public Member Functions inherited from TensorflowTPU< std::vector< std::vector< Detection > >, cv::Mat >
static std::vector< edgetpu::EdgeTpuManager::DeviceEnumerationRecord > GetHardwareDevices ()
 Retrieve a list of EdgeTPU devices from the edge API.
 
static std::vector< std::shared_ptr< edgetpu::EdgeTpuContext > > GetOpenedHardwareDevices ()
 Retrieve a list of already opened EdgeTPU devices from the edge API.
 
- Protected Member Functions inherited from TensorflowTPU< std::vector< std::vector< Detection > >, cv::Mat >
edgetpu::EdgeTpuManager * GetEdgeManager ()
 Retrieves a pointer to an EdgeTPUManager instance from the libedgetpu library.
 
std::string DeviceTypeToString (edgetpu::DeviceType eDeviceType)
 to_string method for converting a device type to a readable string.
 
- Protected Attributes inherited from TensorflowTPU< std::vector< std::vector< Detection > >, cv::Mat >
std::string m_szModelPath
 
edgetpu::EdgeTpuManager::DeviceEnumerationRecord m_tpuDevice
 
edgetpu::EdgeTpuManager::DeviceOptions m_tpuDeviceOptions
 
std::unique_ptr< tflite::FlatBufferModel > m_pTFLiteModel
 
std::shared_ptr< edgetpu::EdgeTpuContext > m_pEdgeTPUContext
 
std::unique_ptr< tflite::Interpreter > m_pInterpreter
 
bool m_bDeviceOpened
 

Detailed Description

This class is designed to enable quick, easy, and robust inferencing of a .tflite YOLO model.

Bug:
This class correctly interfaces with the TPU, loads models, and runs inference, but parsing the output received from inference produces garbage.
Author
clayjay3 (claytonraycowen@gmail.com)
Date
2023-10-24
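
A minimal usage sketch of the intended lifecycle (not taken from the repository; the model path and image file are hypothetical, Detection is assumed to live in the yolomodel namespace, and kTfLiteOk is assumed to be visible through YOLOModel.hpp):

#include "YOLOModel.hpp"

#include <opencv2/opencv.hpp>

int main()
{
    // Construct an interpreter for an EdgeTPU-compiled model (hypothetical path).
    yolomodel::tensorflow::TPUInterpreter tpuInterpreter("models/example_edgetpu.tflite");

    // Open the EdgeTPU device and load the model onto it.
    if (tpuInterpreter.OpenAndLoad() != kTfLiteOk)
    {
        return 1;
    }

    // Inference() expects an RGB frame; OpenCV loads BGR, so convert first.
    cv::Mat cvFrame = cv::imread("frame.jpg");
    cv::cvtColor(cvFrame, cvFrame, cv::COLOR_BGR2RGB);
    std::vector<std::vector<yolomodel::Detection>> vTensorOutputs = tpuInterpreter.Inference(cvFrame, 0.85f, 0.6f);

    // Release the hardware when finished.
    tpuInterpreter.CloseHardware();
    return 0;
}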

Constructor & Destructor Documentation

◆ TPUInterpreter()

yolomodel::tensorflow::TPUInterpreter::TPUInterpreter ( std::string  szModelPath,
PerformanceModes  ePowerMode = PerformanceModes::eHigh,
unsigned int  unMaxBulkInQueueLength = 32,
bool  bUSBAlwaysDFU = false 
)
inline

Construct a new TPUInterpreter object.

Parameters
szModelPath - The path to the model to open and inference on the EdgeTPU.
ePowerMode - The desired power mode of the device.
unMaxBulkInQueueLength - Input queue length for device. Larger queue may improve USB performance going from device to host.
bUSBAlwaysDFU - Whether or not to always reload firmware into the device after this object is created.
Note
The given model must be a .tflite model custom-compiled to map operations to the EdgeTPU. Refer to https://coral.ai/docs/edgetpu/models-intro/#compiling and https://coral.ai/docs/edgetpu/compiler/#system-requirements
Author
clayjay3 (claytonraycowen@gmail.com)
Date
2023-11-11
    : TensorflowTPU<std::vector<std::vector<Detection>>, cv::Mat>(szModelPath, ePowerMode, unMaxBulkInQueueLength, bUSBAlwaysDFU)
{}
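
A hedged construction sketch spelling out all four parameters with their documented defaults (the model path is hypothetical; PerformanceModes is inherited from TensorflowTPU):

// Construct with every parameter explicit (values shown are the documented defaults).
yolomodel::tensorflow::TPUInterpreter tpuInterpreter(
    "models/example_edgetpu.tflite",                                  // szModelPath (hypothetical)
    yolomodel::tensorflow::TPUInterpreter::PerformanceModes::eHigh,   // ePowerMode
    32,                                                               // unMaxBulkInQueueLength
    false);                                                           // bUSBAlwaysDFU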

◆ ~TPUInterpreter()

yolomodel::tensorflow::TPUInterpreter::~TPUInterpreter ( )
inline

Destroy the TPUInterpreter object.

Author
clayjay3 (claytonraycowen@gmail.com)
Date
2023-10-24
{
    // Nothing to destroy.
}

Member Function Documentation

◆ Inference()

std::vector< std::vector< Detection > > yolomodel::tensorflow::TPUInterpreter::Inference ( const cv::Mat &  cvInputFrame,
const float  fMinObjectConfidence = 0.85,
const float  fNMSThreshold = 0.6 
)
inlineoverridevirtual

Given an input image, forward the image through the YOLO model to run inference on the EdgeTPU, then parse and repackage the output tensor data into a vector of easy-to-use Detection structs.

Parameters
cvInputFrame - The RGB camera frame to run detection on.
fMinObjectConfidence - Minimum confidence required for an object to be considered a valid detection.
fNMSThreshold - Threshold for Non-Maximum Suppression, controlling overlap between bounding box predictions.
Returns
std::vector<std::vector<Detection>> - A 2D vector of structs containing information about the valid object detections in the given image. There will be an std::vector<Detection> for each output tensor.
Note
The input image MUST BE in RGB format; otherwise you will likely experience prediction accuracy problems.
This function can automatically decode output from YOLOv5 and YOLOv8 models.
Author
clayjay3 (claytonraycowen@gmail.com)
Date
2023-11-13

Implements TensorflowTPU< std::vector< std::vector< Detection > >, cv::Mat >.

{
    // Create instance variables.
    std::vector<std::vector<Detection>> vTensorObjectOutputs;

    // Get the input tensor shape for the model.
    InputTensorDimensions stInputDimensions = this->GetInputShape(m_pInterpreter->inputs()[0]);

    // Copy given frame to class member variable.
    m_cvFrame = cvInputFrame;

    // Check if model is open and device is ready.
    if (m_bDeviceOpened && m_pEdgeTPUContext->IsReady())
    {
        // Check if the image has the correct type.
        if (m_cvFrame.type() != CV_8UC3)
        {
            // Convert image to unsigned int8 image.
            m_cvFrame.convertTo(m_cvFrame, CV_8UC3);
        }

        // Check if the input image matches the input tensor shape.
        if (m_cvFrame.rows != stInputDimensions.nHeight || m_cvFrame.cols != stInputDimensions.nWidth)
        {
            // Resize the image, and store a local copy of it.
            cv::resize(m_cvFrame,
                       m_cvFrame,
                       cv::Size(stInputDimensions.nWidth, stInputDimensions.nHeight),
                       constants::BASICCAM_RESIZE_INTERPOLATION_METHOD);
        }

        // Create a vector to store reshaped input image in 1 dimension.
        std::vector<int8_t> vInputData(m_cvFrame.data,
                                       m_cvFrame.data + (static_cast<unsigned long>(m_cvFrame.cols) * m_cvFrame.rows * m_cvFrame.elemSize()));
        // Quantize input data.
        // for (long unsigned int nIter = 0; nIter < vInputData.size(); ++nIter)
        // {
        //     // Quantize value.
        //     vInputData[nIter] = std::round((vInputData[nIter] - 128) / stInputDimensions.fQuantScale) + stInputDimensions.nQuantZeroPoint;
        //     // vInputData[nIter] = vInputData[nIter] - 128;
        // }
        // Retrieve a new input tensor from the TPU interpreter and copy data to it. This tensor is automatically quantized because it is typed.
        TfLiteTensor* pInputTensor = m_pInterpreter->tensor(stInputDimensions.nTensorIndex);
        std::memcpy(pInputTensor->data.raw, vInputData.data(), vInputData.size());

        // Run inference on the EdgeTPU.
        if (m_pInterpreter->Invoke() != kTfLiteOk)
        {
            // Submit logger message.
            LOG_WARNING(logging::g_qSharedLogger,
                        "Inferencing failed on an image for model {} with device {} ({})",
                        m_szModelPath,
                        m_tpuDevice.path,
                        this->DeviceTypeToString(m_tpuDevice.type));
        }
        else
        {
            // Create separate vectors for storing class confidences, bounding boxes, and classIDs.
            std::vector<int> vClassIDs;
            std::vector<float> vClassConfidences;
            std::vector<cv::Rect> vBoundingBoxes;
            // Create vector for storing all detections for this tensor output.
            std::vector<Detection> vObjects;

            // Get output indices for output tensors.
            for (int nTensorIndex : m_pInterpreter->outputs())
            {
                // Clear prediction data vectors.
                vClassIDs.clear();
                vClassConfidences.clear();
                vBoundingBoxes.clear();
                // Clear object detections vector.
                vObjects.clear();

                /*
                    Check if the output tensor has a YOLOv5 format.
                */
                // Get the tensor output shape details.
                OutputTensorDimensions stOutputDimensions = this->GetOutputShape(nTensorIndex);
                // Calculate the general stride sizes for YOLO based on input tensor shape.
                int nImgSize = stInputDimensions.nHeight;
                int nP3Stride = std::pow((nImgSize / 8), 2);
                int nP4Stride = std::pow((nImgSize / 16), 2);
                int nP5Stride = std::pow((nImgSize / 32), 2);
                // Calculate the proper prediction length for different YOLO versions.
                int nYOLOv5AnchorsPerGridPoint = 3;
                int nYOLOv8AnchorsPerGridPoint = 1;
                int nYOLOv5TotalPredictionLength =
                    (nP3Stride * nYOLOv5AnchorsPerGridPoint) + (nP4Stride * nYOLOv5AnchorsPerGridPoint) + (nP5Stride * nYOLOv5AnchorsPerGridPoint);
                int nYOLOv8TotalPredictionLength =
                    (nP3Stride * nYOLOv8AnchorsPerGridPoint) + (nP4Stride * nYOLOv8AnchorsPerGridPoint) + (nP5Stride * nYOLOv8AnchorsPerGridPoint);

                // Output tensor is YOLOv5 format.
                if (stOutputDimensions.nAnchors == nYOLOv5TotalPredictionLength)
                {
                    // Parse inferenced output from tensor.
                    this->ParseTensorOutputYOLOv5(nTensorIndex,
                                                  vClassIDs,
                                                  vClassConfidences,
                                                  vBoundingBoxes,
                                                  fMinObjectConfidence,
                                                  cvInputFrame.cols,
                                                  cvInputFrame.rows);
                }
                // Output tensor is YOLOv8 format.
                else if (stOutputDimensions.nAnchors == nYOLOv8TotalPredictionLength)
                {
                    // Parse inferenced output from tensor.
                    this->ParseTensorOutputYOLOv8(nTensorIndex,
                                                  vClassIDs,
                                                  vClassConfidences,
                                                  vBoundingBoxes,
                                                  fMinObjectConfidence,
                                                  cvInputFrame.cols,
                                                  cvInputFrame.rows);
                }

                // Perform NMS to filter out bad/duplicate detections.
                NonMaxSuppression(vObjects, vClassIDs, vClassConfidences, vBoundingBoxes, fMinObjectConfidence, fNMSThreshold);

                // Append object detections to the tensor outputs vector.
                vTensorObjectOutputs.emplace_back(vObjects);
            }
        }
    }
    else
    {
        // Submit logger message.
        LOG_WARNING(logging::g_qSharedLogger,
                    "Inferencing failed on an image for model {} with device {} ({})",
                    m_szModelPath,
                    m_tpuDevice.path,
                    this->DeviceTypeToString(m_tpuDevice.type));
    }

    return vTensorObjectOutputs;
}
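
A short sketch of consuming the return value, continuing the usage example in the Detailed Description; it relies only on the documented shape of the result (one std::vector<Detection> per output tensor) and assumes <iostream> is included:

// Count the valid detections produced by each output tensor.
std::vector<std::vector<yolomodel::Detection>> vTensorOutputs = tpuInterpreter.Inference(cvFrame);
for (size_t szTensorIndex = 0; szTensorIndex < vTensorOutputs.size(); ++szTensorIndex)
{
    std::cout << "Output tensor " << szTensorIndex << ": " << vTensorOutputs[szTensorIndex].size() << " detections" << std::endl;
}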

◆ ParseTensorOutputYOLOv5()

void yolomodel::tensorflow::TPUInterpreter::ParseTensorOutputYOLOv5 ( int  nOutputIndex,
std::vector< int > &  vClassIDs,
std::vector< float > &  vClassConfidences,
std::vector< cv::Rect > &  vBoundingBoxes,
float  fMinObjectConfidence,
int  nOriginalFrameWidth,
int  nOriginalFrameHeight 
)
inlineprivate

Given a TFLite output tensor from a YOLOv5 model, parse its output into something more usable. The parsed output will be in the form of three vectors: one for class IDs, one for the prediction confidence for the class ID, and one for cv::Rects storing the bounding box data for the prediction. A prediction will line up between the three vectors. (vClassIDs[0], vClassConfidences[0], and vBoundingBoxes[0] correspond to the same prediction.)

Parameters
nOutputIndex - The output tensor index from the model containing inference data.
vClassIDs - A reference to a vector that will be filled with class IDs for each prediction. The class ID of a prediction will be chosen by the highest class confidence for that prediction.
vClassConfidences - A reference to a vector that will be filled with the highest class confidence for each prediction.
vBoundingBoxes - A reference to a vector that will be filled with a cv::Rect bounding box for each prediction.
fMinObjectConfidence - The minimum confidence for determining which predictions to throw out.
nOriginalFrameWidth - The pixel width of the normal/original camera frame. This is not the size of the model input or resized image.
nOriginalFrameHeight - The pixel height of the normal/original camera frame. This is not the size of the model input or resized image.
Note
YOLOv5 predicts 25200 grid_cells when fed a (3, 640, 640) image (three detection layers for small, medium, and large objects, the same size as the input with the same bit depth). Each grid_cell is a vector composed of (5 + num_classes) values, where the 5 values are [objectness_score, Xc, Yc, W, H]. The output would be [1, 25200, 13] for a model with eight classes and a 640x640 input size.

Check out https://pub.towardsai.net/yolov5-m-implementation-from-scratch-with-pytorch-c8f84a66c98b for some great info.

Author
clayjay3 (claytonraycowen@gmail.com)
Date
2023-11-15
{
    // Retrieve output tensor from interpreter.
    TfLiteTensor* tfOutputTensor = m_pInterpreter->tensor(nOutputIndex);
    // Get output tensor shape.
    OutputTensorDimensions stOutputDimensions = this->GetOutputShape(nOutputIndex);
    // Create vector for storing temporary values for this prediction.
    std::vector<float> vGridPrediction;
    // Resize the grid prediction vector to match the number of classes + bounding_box + objectness score.
    vGridPrediction.resize(stOutputDimensions.nObjectnessLocationClasses);

    /*
        Loop through each grid cell output of the model output and filter out objects that don't meet conf thresh.
        Then, repackage into nice detection structs.
        For YOLOv5, you divide your image size, i.e. 640, by the P3, P4, P5 output strides of 8, 16, 32 to arrive at grid sizes
        of 80x80, 40x40, 20x20. Each grid point has 3 anchors by default (anchor box values: small, medium, large), and each
        anchor contains a vector 5 + nc long, where nc is the number of classes the model has. So for a 640 image, the output
        tensor will be [1, 25200, 85].
    */
    for (int nIter = 0; nIter < stOutputDimensions.nAnchors; ++nIter)
    {
        // Get objectness confidence. This is the 5th value for each grid/anchor prediction. (4th index)
        float fObjectnessConfidence =
            (tfOutputTensor->data.uint8[(nIter * stOutputDimensions.nObjectnessLocationClasses) + 4] - stOutputDimensions.nQuantZeroPoint) *
            stOutputDimensions.fQuantScale;

        // Check if the object confidence is greater than or equal to the threshold.
        if (fObjectnessConfidence >= fMinObjectConfidence)
        {
            // Loop through the number of object info and class confidences in the 2nd dimension.
            // Predictions have format {center_x, center_y, width, height, object_conf, class0_conf, class1_conf, ...}
            for (int nJter = 0; nJter < stOutputDimensions.nObjectnessLocationClasses; ++nJter)
            {
                // Repackage value into a more usable vector. Also undo quantization of the data.
                vGridPrediction[nJter] =
                    (tfOutputTensor->data.uint8[(nIter * stOutputDimensions.nObjectnessLocationClasses) + nJter] - stOutputDimensions.nQuantZeroPoint) *
                    stOutputDimensions.fQuantScale;
            }

            // Find class ID based on which class confidence has the highest score.
            std::vector<float>::iterator pStartIterator = vGridPrediction.begin() + 5;
            std::vector<float>::iterator pMaxConfidence = std::max_element(pStartIterator, vGridPrediction.end());
            int nClassID = std::distance(pStartIterator, pMaxConfidence);
            // Get prediction confidence for class ID.
            float fClassConfidence = vGridPrediction[nClassID + 5];
            // Scale bounding box to match original input image size.
            cv::Rect cvBoundingBox;
            int nCenterX = vGridPrediction[0] * nOriginalFrameWidth;
            int nCenterY = vGridPrediction[1] * nOriginalFrameHeight;
            int nWidth = vGridPrediction[2] * nOriginalFrameWidth;
            int nHeight = vGridPrediction[3] * nOriginalFrameHeight;
            // Check if the width and height of the object are greater than zero.
            if (nWidth > 0 && nHeight > 0)
            {
                // Repackage bounding box data to be more readable.
                cvBoundingBox.x = int(nCenterX - (0.5 * nWidth));  // Rect.x is the top-left corner not center point.
                cvBoundingBox.y = int(nCenterY - (0.5 * nHeight)); // Rect.y is the top-left corner not center point.
                cvBoundingBox.width = nWidth;
                cvBoundingBox.height = nHeight;
                // Add data to vectors.
                vClassIDs.emplace_back(nClassID);
                vClassConfidences.emplace_back(fClassConfidence);
                vBoundingBoxes.emplace_back(cvBoundingBox);
            }
        }
    }
}
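
To make the note above concrete, here is the prediction-count arithmetic for a 640x640 input, mirroring the stride math computed in Inference():

// Worked stride arithmetic for a 640x640 YOLOv5 input.
int nP3Cells = (640 / 8) * (640 / 8);                         // 80x80 grid -> 6400 cells
int nP4Cells = (640 / 16) * (640 / 16);                       // 40x40 grid -> 1600 cells
int nP5Cells = (640 / 32) * (640 / 32);                       // 20x20 grid -> 400 cells
int nTotalPredictions = 3 * (nP3Cells + nP4Cells + nP5Cells); // 3 anchors per grid point -> 25200
// With 8 classes each prediction holds 5 + 8 = 13 values, giving the [1, 25200, 13] output tensor.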

◆ ParseTensorOutputYOLOv8()

void yolomodel::tensorflow::TPUInterpreter::ParseTensorOutputYOLOv8 ( int  nOutputIndex,
std::vector< int > &  vClassIDs,
std::vector< float > &  vClassConfidences,
std::vector< cv::Rect > &  vBoundingBoxes,
float  fMinObjectConfidence,
int  nOriginalFrameWidth,
int  nOriginalFrameHeight 
)
inlineprivate

Given a TFLite output tensor from a YOLOv8 model, parse its output into something more usable. The parsed output will be in the form of three vectors: one for class IDs, one for the prediction confidence for the class ID, and one for cv::Rects storing the bounding box data for the prediction. A prediction will line up between the three vectors. (vClassIDs[0], vClassConfidences[0], and vBoundingBoxes[0] correspond to the same prediction.)

Parameters
nOutputIndex - The output tensor index from the model containing inference data.
vClassIDs - A reference to a vector that will be filled with class IDs for each prediction. The class ID of a prediction will be chosen by the highest class confidence for that prediction.
vClassConfidences - A reference to a vector that will be filled with the highest class confidence for each prediction.
vBoundingBoxes - A reference to a vector that will be filled with a cv::Rect bounding box for each prediction.
fMinObjectConfidence - The minimum confidence for determining which predictions to throw out.
nOriginalFrameWidth - The pixel width of the normal/original camera frame. This is not the size of the model input or resized image.
nOriginalFrameHeight - The pixel height of the normal/original camera frame. This is not the size of the model input or resized image.
Note
For YOLOv8, you divide your image size, i.e. 640, by the P3, P4, P5 output strides of 8, 16, 32 to arrive at grid sizes of 80x80, 40x40, 20x20. Each grid point has 1 anchor, and each anchor contains a vector 4 + nc long, where nc is the number of classes the model has. So for a 640 image, the output tensor will be [1, 84, 8400] (80 classes). Notice how the larger dimension is swapped when compared to YOLOv5.
Author
clayjay3 (claytonraycowen@gmail.com)
Date
2023-11-15
{
    // Retrieve output tensor from interpreter.
    TfLiteTensor* tfOutputTensor = m_pInterpreter->tensor(nOutputIndex);
    // Get output tensor shape.
    OutputTensorDimensions stOutputDimensions = this->GetOutputShape(nOutputIndex);
    // Create vector for storing temporary values for this prediction.
    std::vector<float> vGridPrediction;
    // Resize the grid prediction vector to match the number of classes + bounding_box.
    vGridPrediction.resize(stOutputDimensions.nObjectnessLocationClasses);

    /*
        Loop through each grid cell output of the model output and filter out objects that don't meet conf thresh.
        Then, repackage into nice detection structs.
        For YOLOv8, you divide your image size, i.e. 640, by the P3, P4, P5 output strides of 8, 16, 32 to arrive at grid sizes
        of 80x80, 40x40, 20x20. Each grid point has 1 anchor, and each anchor contains a vector 4 + nc long, where nc is the number
        of classes the model has. So for a 640 image, the output tensor will be [1, 84, 8400] (80 classes). Notice how the larger
        dimension is swapped when compared to YOLOv5.
    */
    for (int nIter = 0; nIter < stOutputDimensions.nAnchors; ++nIter)
    {
        // Loop through the number of object info and class confidences in the 2nd dimension.
        // Predictions have format {center_x, center_y, width, height, class0_conf, class1_conf, ...}
        for (int nJter = 0; nJter < stOutputDimensions.nObjectnessLocationClasses; ++nJter)
        {
            // Repackage values into a more usable vector. Also undo quantization of the data.
            vGridPrediction[nJter] = (tfOutputTensor->data.int8[nIter + (nJter * stOutputDimensions.nAnchors)] - stOutputDimensions.nQuantZeroPoint) *
                                     stOutputDimensions.fQuantScale;
        }

        // Find class ID based on which class confidence has the highest score.
        std::vector<float>::iterator pStartIterator = vGridPrediction.begin() + 4;
        std::vector<float>::iterator pMaxConfidence = std::max_element(pStartIterator, vGridPrediction.end());
        int nClassID = std::distance(pStartIterator, pMaxConfidence);
        // Get prediction confidence for class ID.
        float fClassConfidence = vGridPrediction[nClassID + 4];

        // Check if class confidence meets threshold.
        if (fClassConfidence >= fMinObjectConfidence)
        {
            // Scale bounding box to match original input image size.
            cv::Rect cvBoundingBox;
            int nCenterX = vGridPrediction[0] * nOriginalFrameWidth;
            int nCenterY = vGridPrediction[1] * nOriginalFrameHeight;
            int nWidth = vGridPrediction[2] * nOriginalFrameWidth;
            int nHeight = vGridPrediction[3] * nOriginalFrameHeight;
            // Repackage bounding box data to be more readable.
            cvBoundingBox.x = int(nCenterX - (0.5 * nWidth));  // Rect.x is the top-left corner not center point.
            cvBoundingBox.y = int(nCenterY - (0.5 * nHeight)); // Rect.y is the top-left corner not center point.
            cvBoundingBox.width = nWidth;
            cvBoundingBox.height = nHeight;
            // Add data to vectors.
            vClassIDs.emplace_back(nClassID);
            vClassConfidences.emplace_back(fClassConfidence);
            vBoundingBoxes.emplace_back(cvBoundingBox);
        }
    }
}
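
The practical difference from the YOLOv5 parser is the transposed tensor layout, which changes the index math; a side-by-side sketch using the same variable names as the two implementations above:

// YOLOv5 output is [1, nAnchors, nValues]: one prediction's values are contiguous.
int nV5Index = (nIter * stOutputDimensions.nObjectnessLocationClasses) + nJter;
// YOLOv8 output is [1, nValues, nAnchors]: one prediction's values are strided nAnchors apart.
int nV8Index = nIter + (nJter * stOutputDimensions.nAnchors);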

◆ GetInputShape()

InputTensorDimensions yolomodel::tensorflow::TPUInterpreter::GetInputShape ( const int  nTensorIndex = 0)
inlineprivate

Get the input shape of the tensor at the given index. Requires the device to have been successfully opened.

Parameters
nTensorIndex - The index of the input tensor to use. YOLO models that have been converted to an edgetpu-quantized .tflite file will only have one input, at index 0.
Returns
InputTensorDimensions - A struct containing the height, width, and channels of the input tensor, plus its index and quantization parameters.
Author
clayjay3 (claytonraycowen@gmail.com)
Date
2023-11-12
{
    // Create instance variables.
    InputTensorDimensions stInputDimensions = {0, 0, 0, 0, 0, 0};

    // Check if interpreter has been built.
    if (m_bDeviceOpened)
    {
        // Get the desired input tensor shape of the model.
        TfLiteTensor* tfInputTensor = m_pInterpreter->tensor(nTensorIndex);
        TfLiteIntArray* tfDimensions = tfInputTensor->dims;

        // Package dimensions into struct.
        stInputDimensions.nHeight = tfDimensions->data[1];
        stInputDimensions.nWidth = tfDimensions->data[2];
        stInputDimensions.nChannels = tfDimensions->data[3];
        stInputDimensions.nTensorIndex = nTensorIndex;
        // Get the quantization zero point and scale for input tensor.
        stInputDimensions.nQuantZeroPoint = tfInputTensor->params.zero_point;
        stInputDimensions.fQuantScale = tfInputTensor->params.scale;
    }

    return stInputDimensions;
}
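A sketch of how this accessor is used from inside the class (it is private, so external callers cannot invoke it; this mirrors the first steps of Inference()):

// Query the model's expected input shape and resize the frame to match.
InputTensorDimensions stInputDimensions = this->GetInputShape(m_pInterpreter->inputs()[0]);
if (m_cvFrame.rows != stInputDimensions.nHeight || m_cvFrame.cols != stInputDimensions.nWidth)
{
    cv::resize(m_cvFrame, m_cvFrame, cv::Size(stInputDimensions.nWidth, stInputDimensions.nHeight));
}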

◆ GetOutputShape()

OutputTensorDimensions yolomodel::tensorflow::TPUInterpreter::GetOutputShape ( const int  nTensorIndex = 0)
inlineprivate

Get the output shape of the tensor at the given index. Requires the device to have been successfully opened.

Parameters
nTensorIndex - The index of the output tensor to use. YOLO models that have been converted to an edgetpu-quantized .tflite file will only have one output, at index 0.
Returns
OutputTensorDimensions - A struct containing the anchor count and per-prediction value count of the output tensor, plus its index and quantization parameters.
Author
clayjay3 (claytonraycowen@gmail.com)
Date
2023-11-12
{
    // Create instance variables.
    OutputTensorDimensions stOutputDimensions = {0, 0, 0, 0, 0};

    // Check if interpreter has been built.
    if (m_bDeviceOpened)
    {
        // Get the desired output tensor shape of the model.
        TfLiteTensor* tfOutputTensor = m_pInterpreter->tensor(nTensorIndex);
        TfLiteIntArray* tfDimensions = tfOutputTensor->dims;

        // Package dimensions into struct. Assume anchors will always be the longer dimension.
        stOutputDimensions.nAnchors = std::max(tfDimensions->data[1], tfDimensions->data[2]);
        stOutputDimensions.nObjectnessLocationClasses = std::min(tfDimensions->data[1], tfDimensions->data[2]);
        stOutputDimensions.nTensorIndex = nTensorIndex;
        // Get the quantization zero point and scale for output tensor.
        stOutputDimensions.nQuantZeroPoint = tfOutputTensor->params.zero_point;
        stOutputDimensions.fQuantScale = tfOutputTensor->params.scale;
    }

    return stOutputDimensions;
}

The documentation for this class was generated from the following file:
YOLOModel.hpp