Autonomy Software C++ 24.5.1
Welcome to the Autonomy Software repository of the Mars Rover Design Team (MRDT) at Missouri University of Science and Technology (Missouri S&T)! This API reference contains the source code and other resources for the development of the autonomy software for our Mars rover. The Autonomy Software project aims to compete in the University Rover Challenge (URC) by demonstrating advanced autonomous capabilities and robust navigation algorithms.
yolomodel::tensorflow::TPUInterpreter Class Reference

This class is designed to enable quick, easy, and robust inferencing of .tflite YOLO models. More...

#include <YOLOModel.hpp>

Inheritance diagram for yolomodel::tensorflow::TPUInterpreter:
Collaboration diagram for yolomodel::tensorflow::TPUInterpreter:

Public Member Functions

 TPUInterpreter (std::string szModelPath, PerformanceModes ePowerMode=eHigh, unsigned int unMaxBulkInQueueLength=32, bool bUSBAlwaysDFU=false)
 Construct a new TPUInterpreter object.
 
 ~TPUInterpreter ()
 Destroy the TPUInterpreter object.
 
std::vector< std::vector< Detection > > Inference (const cv::Mat &cvInputFrame, const float fMinObjectConfidence=0.85, const float fNMSThreshold=0.6) override
 Given an input image, forward it through the YOLO model to run inference on the EdgeTPU, then parse and repackage the output tensor data into a vector of easy-to-use Detection structs.
 
- Public Member Functions inherited from TensorflowTPU< std::vector< std::vector< Detection > >, cv::Mat >
 TensorflowTPU (std::string szModelPath, PerformanceModes ePowerMode=eHigh, unsigned int unMaxBulkInQueueLength=32, bool bUSBAlwaysDFU=false)
 Construct a new TensorflowTPU object.
 
 ~TensorflowTPU ()
 Destroy the TensorflowTPU object.
 
void CloseHardware ()
 Release all hardware and reset models and interpreters.
 
TfLiteStatus OpenAndLoad (DeviceType eDeviceType=eAuto)
 Attempt to open the model at the given path and load it onto the EdgeTPU device.
 
bool GetDeviceIsOpened () const
 Accessor for the Device Is Opened private member.
 

Private Member Functions

void ParseTensorOutputYOLOv5 (int nOutputIndex, std::vector< int > &vClassIDs, std::vector< float > &vClassConfidences, std::vector< cv::Rect > &vBoundingBoxes, float fMinObjectConfidence, int nOriginalFrameWidth, int nOriginalFrameHeight)
 Given a TFLite output tensor from a YOLOv5 model, parse its output into something more usable. The parsed output will be in the form of three vectors: one for class IDs, one for the prediction confidence for the class ID, and one for cv::Rects storing the bounding box data for the prediction. A prediction will line up between the three vectors. (vClassIDs[0], vClassConfidences[0], and vBoundingBoxes[0] correspond to the same prediction.)
 
void ParseTensorOutputYOLOv8 (int nOutputIndex, std::vector< int > &vClassIDs, std::vector< float > &vClassConfidences, std::vector< cv::Rect > &vBoundingBoxes, float fMinObjectConfidence, int nOriginalFrameWidth, int nOriginalFrameHeight)
 Given a TFLite output tensor from a YOLOv8 model, parse its output into something more usable. The parsed output will be in the form of three vectors: one for class IDs, one for the prediction confidence for the class ID, and one for cv::Rects storing the bounding box data for the prediction. A prediction will line up between the three vectors. (vClassIDs[0], vClassConfidences[0], and vBoundingBoxes[0] correspond to the same prediction.)
 
InputTensorDimensions GetInputShape (const int nTensorIndex=0)
 Get the input shape of the tensor at the given index. Requires the device to have been successfully opened.
 
OutputTensorDimensions GetOutputShape (const int nTensorIndex=0)
 Get the output shape of the tensor at the given index. Requires the device to have been successfully opened.
 

Private Attributes

cv::Mat m_cvFrame
 

Additional Inherited Members

- Public Types inherited from TensorflowTPU< std::vector< std::vector< Detection > >, cv::Mat >
enum  DeviceType
 
enum  PerformanceModes
 
- Static Public Member Functions inherited from TensorflowTPU< std::vector< std::vector< Detection > >, cv::Mat >
static std::vector< edgetpu::EdgeTpuManager::DeviceEnumerationRecord > GetHardwareDevices ()
 Retrieve a list of EdgeTPU devices from the edge API.
 
static std::vector< std::shared_ptr< edgetpu::EdgeTpuContext > > GetOpenedHardwareDevices ()
 Retrieve a list of already opened EdgeTPU devices from the edge API.
 
- Protected Member Functions inherited from TensorflowTPU< std::vector< std::vector< Detection > >, cv::Mat >
edgetpu::EdgeTpuManager * GetEdgeManager ()
 Retrieves a pointer to an EdgeTPUManager instance from the libedgetpu library.
 
std::string DeviceTypeToString (edgetpu::DeviceType eDeviceType)
 to_string method for converting a device type to a readable string.
 
- Protected Attributes inherited from TensorflowTPU< std::vector< std::vector< Detection > >, cv::Mat >
std::string m_szModelPath
 
edgetpu::EdgeTpuManager::DeviceEnumerationRecord m_tpuDevice
 
edgetpu::EdgeTpuManager::DeviceOptions m_tpuDeviceOptions
 
std::unique_ptr< tflite::FlatBufferModel > m_pTFLiteModel
 
std::shared_ptr< edgetpu::EdgeTpuContext > m_pEdgeTPUContext
 
std::unique_ptr< tflite::Interpreter > m_pInterpreter
 
bool m_bDeviceOpened
 

Detailed Description

This class is designed to enable quick, easy, and robust inferencing of .tflite YOLO models.

Bug:
This class correctly interfaces with the TPU, loads models, and runs inference, but any attempt to parse the output returned by inference yields garbage.
Author
clayjay3 (claytonraycowen@gmail.com)
Date
2023-10-24

Constructor & Destructor Documentation

◆ TPUInterpreter()

yolomodel::tensorflow::TPUInterpreter::TPUInterpreter ( std::string  szModelPath,
PerformanceModes  ePowerMode = eHigh,
unsigned int  unMaxBulkInQueueLength = 32,
bool  bUSBAlwaysDFU = false 
)
inline

Construct a new TPUInterpreter object.

Parameters
szModelPath - The path to the model to open and inference on the EdgeTPU.
ePowerMode - The desired power mode of the device.
unMaxBulkInQueueLength - Input queue length for the device. A larger queue may improve USB performance from device to host.
bUSBAlwaysDFU - Whether or not to always reload firmware into the device after this object is created.
Note
The given model must be a .tflite model custom-compiled to map operations onto the EdgeTPU; refer to https://coral.ai/docs/edgetpu/models-intro/#compiling and https://coral.ai/docs/edgetpu/compiler/#system-requirements
Author
clayjay3 (claytonraycowen@gmail.com)
Date
2023-11-11
    : TensorflowTPU<std::vector<std::vector<Detection>>, cv::Mat>(szModelPath, ePowerMode, unMaxBulkInQueueLength, bUSBAlwaysDFU)
{}
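For context, a minimal construction sketch under stated assumptions: the model path below is hypothetical, and only the constructor, OpenAndLoad(), and GetDeviceIsOpened() documented on this page are used.

#include "YOLOModel.hpp"

void OpenExampleInterpreter()
{
    // Construct the interpreter for a custom EdgeTPU-compiled .tflite model. (Hypothetical path.)
    yolomodel::tensorflow::TPUInterpreter tpuInterpreter("path/to/model_edgetpu.tflite");

    // Attempt to open the model and load it onto an automatically chosen EdgeTPU device (eAuto default).
    if (tpuInterpreter.OpenAndLoad() == kTfLiteOk && tpuInterpreter.GetDeviceIsOpened())
    {
        // The interpreter is now ready for Inference() calls.
    }
}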

◆ ~TPUInterpreter()

yolomodel::tensorflow::TPUInterpreter::~TPUInterpreter ( )
inline

Destroy the TPUInterpreter object.

Author
clayjay3 (claytonraycowen@gmail.com)
Date
2023-10-24
{
    // Nothing to destroy.
}

Member Function Documentation

◆ Inference()

std::vector< std::vector< Detection > > yolomodel::tensorflow::TPUInterpreter::Inference ( const cv::Mat &  cvInputFrame,
const float  fMinObjectConfidence = 0.85,
const float  fNMSThreshold = 0.6 
)
inline override virtual

Given an input image, forward it through the YOLO model to run inference on the EdgeTPU, then parse and repackage the output tensor data into a vector of easy-to-use Detection structs.

Parameters
cvInputFrame - The RGB camera frame to run detection on.
fMinObjectConfidence - Minimum confidence required for an object to be considered a valid detection.
fNMSThreshold - Threshold for Non-Maximum Suppression, controlling overlap between bounding box predictions.
Returns
std::vector<std::vector<Detection>> - A 2D vector of structs containing information about the valid object detections in the given image. There will be one std::vector<Detection> for each output tensor.
Note
The input image MUST BE in RGB format; otherwise, you will likely experience prediction accuracy problems.
This function can automatically decode output from YOLOv5 and YOLOv8 models.
Author
clayjay3 (claytonraycowen@gmail.com)
Date
2023-11-13

Implements TensorflowTPU< std::vector< std::vector< Detection > >, cv::Mat >.

{
    // Create instance variables.
    std::vector<std::vector<Detection>> vTensorObjectOutputs;

    // Get the input tensor shape for the model.
    InputTensorDimensions stInputDimensions = this->GetInputShape(m_pInterpreter->inputs()[0]);

    // Copy given frame to class member variable.
    m_cvFrame = cvInputFrame;

    // Check if model is open and device is ready.
    if (m_bDeviceOpened && m_pEdgeTPUContext->IsReady())
    {
        // Check if the image has the correct type.
        if (m_cvFrame.type() != CV_8UC3)
        {
            // Convert image to unsigned int8 image.
            m_cvFrame.convertTo(m_cvFrame, CV_8UC3);
        }

        // Check if the input image matches the input tensor shape.
        if (m_cvFrame.rows != stInputDimensions.nHeight || m_cvFrame.cols != stInputDimensions.nWidth)
        {
            // Resize the image, and store a local copy of it.
            cv::resize(m_cvFrame,
                       m_cvFrame,
                       cv::Size(stInputDimensions.nWidth, stInputDimensions.nHeight),
                       constants::BASICCAM_RESIZE_INTERPOLATION_METHOD);
        }

        // Create a vector to store the reshaped input image in one dimension.
        std::vector<int8_t> vInputData(m_cvFrame.data,
                                       m_cvFrame.data + (static_cast<unsigned long>(m_cvFrame.cols) * m_cvFrame.rows * m_cvFrame.elemSize()));
        // Quantize input data.
        // for (long unsigned int nIter = 0; nIter < vInputData.size(); ++nIter)
        // {
        //     // Quantize value.
        //     vInputData[nIter] = std::round((vInputData[nIter] - 128) / stInputDimensions.fQuantScale) + stInputDimensions.nQuantZeroPoint;
        //     // vInputData[nIter] = vInputData[nIter] - 128;
        // }
        // Retrieve a new input tensor from the TPU interpreter and copy data to it. This tensor is automatically quantized because it is typed.
        TfLiteTensor* pInputTensor = m_pInterpreter->tensor(stInputDimensions.nTensorIndex);
        std::memcpy(pInputTensor->data.raw, vInputData.data(), vInputData.size());

        // Run inference on the EdgeTPU.
        if (m_pInterpreter->Invoke() != kTfLiteOk)
        {
            // Submit logger message.
            LOG_WARNING(logging::g_qSharedLogger,
                        "Inferencing failed on an image for model {} with device {} ({})",
                        m_szModelPath,
                        m_tpuDevice.path,
                        this->DeviceTypeToString(m_tpuDevice.type));
        }
        else
        {
            // Create separate vectors for storing class confidences, bounding boxes, and class IDs.
            std::vector<int> vClassIDs;
            std::vector<float> vClassConfidences;
            std::vector<cv::Rect> vBoundingBoxes;
            // Create vector for storing all detections for this tensor output.
            std::vector<Detection> vObjects;

            // Get output indices for output tensors.
            for (int nTensorIndex : m_pInterpreter->outputs())
            {
                // Clear prediction data vectors.
                vClassIDs.clear();
                vClassConfidences.clear();
                vBoundingBoxes.clear();
                // Clear object detections vector.
                vObjects.clear();

                /*
                    Check if the output tensor has a YOLOv5 format.
                */
                // Get the tensor output shape details.
                OutputTensorDimensions stOutputDimensions = this->GetOutputShape(nTensorIndex);
                // Calculate the general stride sizes for YOLO based on input tensor shape.
                int nImgSize = stInputDimensions.nHeight;
                int nP3Stride = std::pow((nImgSize / 8), 2);
                int nP4Stride = std::pow((nImgSize / 16), 2);
                int nP5Stride = std::pow((nImgSize / 32), 2);
                // Calculate the proper prediction length for different YOLO versions.
                int nYOLOv5AnchorsPerGridPoint = 3;
                int nYOLOv8AnchorsPerGridPoint = 1;
                int nYOLOv5TotalPredictionLength =
                    (nP3Stride * nYOLOv5AnchorsPerGridPoint) + (nP4Stride * nYOLOv5AnchorsPerGridPoint) + (nP5Stride * nYOLOv5AnchorsPerGridPoint);
                int nYOLOv8TotalPredictionLength =
                    (nP3Stride * nYOLOv8AnchorsPerGridPoint) + (nP4Stride * nYOLOv8AnchorsPerGridPoint) + (nP5Stride * nYOLOv8AnchorsPerGridPoint);

                // Output tensor is YOLOv5 format.
                if (stOutputDimensions.nAnchors == nYOLOv5TotalPredictionLength)
                {
                    // Parse inferenced output from tensor.
                    this->ParseTensorOutputYOLOv5(nTensorIndex, vClassIDs, vClassConfidences, vBoundingBoxes, fMinObjectConfidence, cvInputFrame.cols, cvInputFrame.rows);
                }
                // Output tensor is YOLOv8 format.
                else if (stOutputDimensions.nAnchors == nYOLOv8TotalPredictionLength)
                {
                    // Parse inferenced output from tensor.
                    this->ParseTensorOutputYOLOv8(nTensorIndex, vClassIDs, vClassConfidences, vBoundingBoxes, fMinObjectConfidence, cvInputFrame.cols, cvInputFrame.rows);
                }

                // Perform NMS to filter out bad/duplicate detections.
                NonMaxSuppression(vObjects, vClassIDs, vClassConfidences, vBoundingBoxes, fMinObjectConfidence, fNMSThreshold);

                // Append object detections to the tensor outputs vector.
                vTensorObjectOutputs.emplace_back(vObjects);
            }
        }
    }
    else
    {
        // Submit logger message.
        LOG_WARNING(logging::g_qSharedLogger,
                    "Inferencing failed on an image for model {} with device {} ({})",
                    m_szModelPath,
                    m_tpuDevice.path,
                    this->DeviceTypeToString(m_tpuDevice.type));
    }

    return vTensorObjectOutputs;
}

See also: NonMaxSuppression (Definition YOLOModel.hpp:67), which performs non-max suppression for the given predictions, eliminating/combining predictions that overlap.
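As a usage sketch, not a definitive implementation: the image path is hypothetical, Detection is assumed to live in the yolomodel namespace, and the Detection member names (nClassID, fConfidence, cvBoundingBox) are illustrative assumptions, since the struct's fields are not documented on this page.

#include <opencv2/opencv.hpp>

#include "YOLOModel.hpp"

void RunExampleInference(yolomodel::tensorflow::TPUInterpreter& tpuInterpreter)
{
    // Load a frame and convert it to RGB, since Inference() expects RGB input. (cv::imread loads BGR.)
    cv::Mat cvFrame = cv::imread("frame.png");
    cv::cvtColor(cvFrame, cvFrame, cv::COLOR_BGR2RGB);

    // Run inference. One std::vector<Detection> is returned per output tensor.
    std::vector<std::vector<yolomodel::Detection>> vOutputs = tpuInterpreter.Inference(cvFrame, 0.85f, 0.6f);
    for (const std::vector<yolomodel::Detection>& vTensorDetections : vOutputs)
    {
        for (const yolomodel::Detection& stDetection : vTensorDetections)
        {
            // e.g. draw stDetection.cvBoundingBox or log stDetection.nClassID and stDetection.fConfidence.
            // (Field names here are assumptions for illustration.)
        }
    }
}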

◆ ParseTensorOutputYOLOv5()

void yolomodel::tensorflow::TPUInterpreter::ParseTensorOutputYOLOv5 ( int  nOutputIndex,
std::vector< int > &  vClassIDs,
std::vector< float > &  vClassConfidences,
std::vector< cv::Rect > &  vBoundingBoxes,
float  fMinObjectConfidence,
int  nOriginalFrameWidth,
int  nOriginalFrameHeight 
)
inline private

Given a TFLite output tensor from a YOLOv5 model, parse its output into something more usable. The parsed output will be in the form of three vectors: one for class IDs, one for the prediction confidence for the class ID, and one for cv::Rects storing the bounding box data for the prediction. A prediction will line up between the three vectors. (vClassIDs[0], vClassConfidences[0], and vBoundingBoxes[0] correspond to the same prediction.)

Parameters
nOutputIndex - The output tensor index from the model containing inference data.
vClassIDs - A reference to a vector that will be filled with class IDs for each prediction. The class ID of a prediction is chosen by the highest class confidence for that prediction.
vClassConfidences - A reference to a vector that will be filled with the highest class confidence for each prediction.
vBoundingBoxes - A reference to a vector that will be filled with a cv::Rect bounding box for each prediction.
fMinObjectConfidence - The minimum confidence used to determine which predictions to throw out.
nOriginalFrameWidth - The pixel width of the normal/original camera frame. This is not the size of the model input or resized image.
nOriginalFrameHeight - The pixel height of the normal/original camera frame. This is not the size of the model input or resized image.
Note
YOLOv5 predicts 25200 grid cells when fed a (3, 640, 640) image (three detection layers for small, medium, and large objects, at the same bit depth as the input). Each grid cell is a vector composed of (5 + num_classes) values, where the 5 values are [objectness_score, Xc, Yc, W, H]. The output would be [1, 25200, 13] for a model with eight classes and a 640x640 input size.

Check out https://pub.towardsai.net/yolov5-m-implementation-from-scratch-with-pytorch-c8f84a66c98b for some great info.
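As a quick check of the 25200 figure from the note above: 640/8 = 80, 640/16 = 40, and 640/32 = 20, giving grids of 80x80, 40x40, and 20x20; with three anchors per grid point, 3 × (6400 + 1600 + 400) = 3 × 8400 = 25200 predictions, which is exactly the nYOLOv5TotalPredictionLength computed in Inference().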

Author
clayjay3 (claytonraycowen@gmail.com)
Date
2023-11-15
{
    // Retrieve output tensor from interpreter.
    TfLiteTensor* tfOutputTensor = m_pInterpreter->tensor(nOutputIndex);
    // Get output tensor shape.
    OutputTensorDimensions stOutputDimensions = this->GetOutputShape(nOutputIndex);
    // Create vector for storing temporary values for this prediction.
    std::vector<float> vGridPrediction;
    // Resize the grid prediction vector to match the number of classes + bounding box + objectness score.
    vGridPrediction.resize(stOutputDimensions.nObjectnessLocationClasses);

    /*
        Loop through each grid cell output of the model output and filter out objects that don't meet the confidence threshold.
        Then, repackage into nice detection structs.
        For YOLOv5, you divide your image size, i.e. 640, by the P3, P4, P5 output strides of 8, 16, 32 to arrive at grid sizes
        of 80x80, 40x40, 20x20. Each grid point has 3 anchors by default (anchor box values: small, medium, large), and each anchor
        contains a vector 5 + nc long, where nc is the number of classes the model has. So for a 640 image, the output tensor
        will be [1, 25200, 85] (80 classes).
    */
    for (int nIter = 0; nIter < stOutputDimensions.nAnchors; ++nIter)
    {
        // Get objectness confidence. This is the 5th value for each grid/anchor prediction. (4th index)
        float fObjectnessConfidence =
            (tfOutputTensor->data.uint8[(nIter * stOutputDimensions.nObjectnessLocationClasses) + 4] - stOutputDimensions.nQuantZeroPoint) *
            stOutputDimensions.fQuantScale;

        // Check if the object confidence is greater than or equal to the threshold.
        if (fObjectnessConfidence >= fMinObjectConfidence)
        {
            // Loop through the number of object info and class confidences in the 2nd dimension.
            // Predictions have format {center_x, center_y, width, height, object_conf, class0_conf, class1_conf, ...}
            for (int nJter = 0; nJter < stOutputDimensions.nObjectnessLocationClasses; ++nJter)
            {
                // Repackage value into a more usable vector. Also undo quantization of the data.
                vGridPrediction[nJter] =
                    (tfOutputTensor->data.uint8[(nIter * stOutputDimensions.nObjectnessLocationClasses) + nJter] - stOutputDimensions.nQuantZeroPoint) *
                    stOutputDimensions.fQuantScale;
            }

            // Find class ID based on which class confidence has the highest score.
            std::vector<float>::iterator pStartIterator = vGridPrediction.begin() + 5;
            std::vector<float>::iterator pMaxConfidence = std::max_element(pStartIterator, vGridPrediction.end());
            int nClassID = std::distance(pStartIterator, pMaxConfidence);
            // Get prediction confidence for class ID.
            float fClassConfidence = vGridPrediction[nClassID + 5];
            // Scale bounding box to match original input image size.
            cv::Rect cvBoundingBox;
            int nCenterX = vGridPrediction[0] * nOriginalFrameWidth;
            int nCenterY = vGridPrediction[1] * nOriginalFrameHeight;
            int nWidth = vGridPrediction[2] * nOriginalFrameWidth;
            int nHeight = vGridPrediction[3] * nOriginalFrameHeight;
            // Check if the width and height of the object are greater than zero.
            if (nWidth > 0 && nHeight > 0)
            {
                // Repackage bounding box data to be more readable.
                cvBoundingBox.x = int(nCenterX - (0.5 * nWidth));  // Rect.x is the top-left corner, not the center point.
                cvBoundingBox.y = int(nCenterY - (0.5 * nHeight)); // Rect.y is the top-left corner, not the center point.
                cvBoundingBox.width = nWidth;
                cvBoundingBox.height = nHeight;
                // Add data to vectors.
                vClassIDs.emplace_back(nClassID);
                vClassConfidences.emplace_back(fClassConfidence);
                vBoundingBoxes.emplace_back(cvBoundingBox);
            }
        }
    }
}

◆ ParseTensorOutputYOLOv8()

void yolomodel::tensorflow::TPUInterpreter::ParseTensorOutputYOLOv8 ( int  nOutputIndex,
std::vector< int > &  vClassIDs,
std::vector< float > &  vClassConfidences,
std::vector< cv::Rect > &  vBoundingBoxes,
float  fMinObjectConfidence,
int  nOriginalFrameWidth,
int  nOriginalFrameHeight 
)
inline private

Given a TFLite output tensor from a YOLOv8 model, parse its output into something more usable. The parsed output will be in the form of three vectors: one for class IDs, one for the prediction confidence for the class ID, and one for cv::Rects storing the bounding box data for the prediction. A prediction will line up between the three vectors. (vClassIDs[0], vClassConfidences[0], and vBoundingBoxes[0] correspond to the same prediction.)

Parameters
nOutputIndex - The output tensor index from the model containing inference data.
vClassIDs - A reference to a vector that will be filled with class IDs for each prediction. The class ID of a prediction is chosen by the highest class confidence for that prediction.
vClassConfidences - A reference to a vector that will be filled with the highest class confidence for each prediction.
vBoundingBoxes - A reference to a vector that will be filled with a cv::Rect bounding box for each prediction.
fMinObjectConfidence - The minimum confidence used to determine which predictions to throw out.
nOriginalFrameWidth - The pixel width of the normal/original camera frame. This is not the size of the model input or resized image.
nOriginalFrameHeight - The pixel height of the normal/original camera frame. This is not the size of the model input or resized image.
Note
For YOLOv8, you divide your image size, i.e. 640, by the P3, P4, P5 output strides of 8, 16, 32 to arrive at grid sizes of 80x80, 40x40, and 20x20. Each grid point has 1 anchor, and each anchor contains a vector 4 + nc long, where nc is the number of classes the model has. So for a 640 image, the output tensor will be [1, 84, 8400] (80 classes). Notice how the larger dimension is swapped compared to YOLOv5.
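Worked out, the same stride arithmetic with one anchor per grid point gives 6400 + 1600 + 400 = 8400 predictions, and each prediction vector holds 4 + nc values because YOLOv8 drops the objectness score; an 80-class model therefore yields the [1, 84, 8400] shape quoted above.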
Author
clayjay3 (claytonraycowen@gmail.com)
Date
2023-11-15
{
    // Retrieve output tensor from interpreter.
    TfLiteTensor* tfOutputTensor = m_pInterpreter->tensor(nOutputIndex);
    // Get output tensor shape.
    OutputTensorDimensions stOutputDimensions = this->GetOutputShape(nOutputIndex);
    // Create vector for storing temporary values for this prediction.
    std::vector<float> vGridPrediction;
    // Resize the grid prediction vector to match the number of classes + bounding box.
    vGridPrediction.resize(stOutputDimensions.nObjectnessLocationClasses);

    /*
        Loop through each grid cell output of the model output and filter out objects that don't meet the confidence threshold.
        Then, repackage into nice detection structs.
        For YOLOv8, you divide your image size, i.e. 640, by the P3, P4, P5 output strides of 8, 16, 32 to arrive at grid sizes
        of 80x80, 40x40, 20x20. Each grid point has 1 anchor, and each anchor contains a vector 4 + nc long, where nc is the number
        of classes the model has. So for a 640 image, the output tensor will be [1, 84, 8400] (80 classes). Notice how the larger
        dimension is swapped when compared to YOLOv5.
    */
    for (int nIter = 0; nIter < stOutputDimensions.nAnchors; ++nIter)
    {
        // Loop through the number of object info and class confidences in the 2nd dimension.
        // Predictions have format {center_x, center_y, width, height, class0_conf, class1_conf, ...}
        for (int nJter = 0; nJter < stOutputDimensions.nObjectnessLocationClasses; ++nJter)
        {
            // Repackage values into a more usable vector. Also undo quantization of the data.
            vGridPrediction[nJter] = (tfOutputTensor->data.int8[nIter + (nJter * stOutputDimensions.nAnchors)] - stOutputDimensions.nQuantZeroPoint) *
                                     stOutputDimensions.fQuantScale;
        }

        // Find class ID based on which class confidence has the highest score.
        std::vector<float>::iterator pStartIterator = vGridPrediction.begin() + 4;
        std::vector<float>::iterator pMaxConfidence = std::max_element(pStartIterator, vGridPrediction.end());
        int nClassID = std::distance(pStartIterator, pMaxConfidence);
        // Get prediction confidence for class ID.
        float fClassConfidence = vGridPrediction[nClassID + 4];

        // Check if class confidence meets threshold.
        if (fClassConfidence >= fMinObjectConfidence)
        {
            // Scale bounding box to match original input image size.
            cv::Rect cvBoundingBox;
            int nCenterX = vGridPrediction[0] * nOriginalFrameWidth;
            int nCenterY = vGridPrediction[1] * nOriginalFrameHeight;
            int nWidth = vGridPrediction[2] * nOriginalFrameWidth;
            int nHeight = vGridPrediction[3] * nOriginalFrameHeight;
            // Repackage bounding box data to be more readable.
            cvBoundingBox.x = int(nCenterX - (0.5 * nWidth));  // Rect.x is the top-left corner, not the center point.
            cvBoundingBox.y = int(nCenterY - (0.5 * nHeight)); // Rect.y is the top-left corner, not the center point.
            cvBoundingBox.width = nWidth;
            cvBoundingBox.height = nHeight;
            // Add data to vectors.
            vClassIDs.emplace_back(nClassID);
            vClassConfidences.emplace_back(fClassConfidence);
            vBoundingBoxes.emplace_back(cvBoundingBox);
        }
    }
}
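Note the transposed indexing relative to ParseTensorOutputYOLOv5(): because the YOLOv8 tensor is laid out [1, values, anchors] rather than [1, anchors, values], the flat-buffer element for anchor i and value j is read at index i + j * nAnchors here, instead of i * nObjectnessLocationClasses + j as in the YOLOv5 parser.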

◆ GetInputShape()

InputTensorDimensions yolomodel::tensorflow::TPUInterpreter::GetInputShape ( const int  nTensorIndex = 0)
inline private

Get the input shape of the tensor at the given index. Requires the device to have been successfully opened.

Parameters
nTensorIndex - The index of the input tensor to use. YOLO models that have been converted to an edgetpu-quantized .tflite file will only have one input at index 0.
Returns
InputTensorDimensions - A struct containing the height, width, channels, tensor index, and quantization parameters of the input tensor.
Author
clayjay3 (claytonraycowen@gmail.com)
Date
2023-11-12
{
    // Create instance variables.
    InputTensorDimensions stInputDimensions = {0, 0, 0, 0, 0, 0};

    // Check if interpreter has been built.
    if (m_bDeviceOpened)
    {
        // Get the desired input tensor shape of the model.
        TfLiteTensor* tfInputTensor = m_pInterpreter->tensor(nTensorIndex);
        TfLiteIntArray* tfDimensions = tfInputTensor->dims;

        // Package dimensions into struct.
        stInputDimensions.nHeight = tfDimensions->data[1];
        stInputDimensions.nWidth = tfDimensions->data[2];
        stInputDimensions.nChannels = tfDimensions->data[3];
        stInputDimensions.nTensorIndex = nTensorIndex;
        // Get the quantization zero point and scale for the input tensor.
        stInputDimensions.nQuantZeroPoint = tfInputTensor->params.zero_point;
        stInputDimensions.fQuantScale = tfInputTensor->params.scale;
    }

    return stInputDimensions;
}

◆ GetOutputShape()

OutputTensorDimensions yolomodel::tensorflow::TPUInterpreter::GetOutputShape ( const int  nTensorIndex = 0)
inline private

Get the output shape of the tensor at the given index. Requires the device to have been successfully opened.

Parameters
nTensorIndex - The index of the output tensor to use. YOLO models that have been converted to an edgetpu-quantized .tflite file will only have one output at index 0.
Returns
OutputTensorDimensions - A struct containing the anchor count, per-anchor value count, tensor index, and quantization parameters of the output tensor.
Author
clayjay3 (claytonraycowen@gmail.com)
Date
2023-11-12
{
    // Create instance variables.
    OutputTensorDimensions stOutputDimensions = {0, 0, 0, 0, 0};

    // Check if interpreter has been built.
    if (m_bDeviceOpened)
    {
        // Get the desired output tensor shape of the model.
        TfLiteTensor* tfOutputTensor = m_pInterpreter->tensor(nTensorIndex);
        TfLiteIntArray* tfDimensions = tfOutputTensor->dims;

        // Package dimensions into struct. Assume anchors will always be the longer dimension.
        stOutputDimensions.nAnchors = std::max(tfDimensions->data[1], tfDimensions->data[2]);
        stOutputDimensions.nObjectnessLocationClasses = std::min(tfDimensions->data[1], tfDimensions->data[2]);
        stOutputDimensions.nTensorIndex = nTensorIndex;
        // Get the quantization zero point and scale for output tensor.
        stOutputDimensions.nQuantZeroPoint = tfOutputTensor->params.zero_point;
        stOutputDimensions.fQuantScale = tfOutputTensor->params.scale;
    }

    return stOutputDimensions;
}
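The zero point and scale packaged here drive the dequantization in both parsers: a raw tensor value q is mapped back to a real value as real = fQuantScale * (q - nQuantZeroPoint), which is TFLite's standard affine quantization scheme.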

The documentation for this class was generated from the following file: YOLOModel.hpp